K-AI 96 ROME 4090
A high-performance configuration for AI inference, LLMs, and deep learning, delivering 2644 TOPS.
Introducing a 4U rack-mount server designed for the most demanding AI workloads. Optimized for running large language models, image generation, and complex data analysis.
Configure and buy
2644 TOPS
Extreme computing power for fast responses from modern AI models.
96 GB VRAM
4× NVIDIA RTX 4090 for smooth running of Llama 3.3, Qwen and DeepSeek models.
32 CORES
AMD EPYC 7542 (Rome) with 64 threads for handling massive data streams.
256 GB RAM
Server-grade ECC memory for system stability under 24/7 load.
Why choose K-AI 96 ROME?
This machine offers an unbeatable price-performance ratio thanks to the use of four NVIDIA GeForce RTX 4090 graphics cards. It is an ideal choice for:
- Inference gateway for businesses: Running internal chatbots (70B models) for 50–200 employees.
- Generative AI: Fast image and video generation using FLUX.1, SDXL or Wan 2.2.
- Fine-tuning: Efficient LoRA/QLoRA tuning of models with 7–34B parameters.
- RAG (Retrieval-Augmented Generation): Real-time querying of company documentation.
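
As a rough check on which model sizes fit in the 96 GB of combined VRAM, the weight footprint can be estimated from the parameter count and the quantization width. The sketch below is a back-of-the-envelope estimate, not a vendor sizing tool: the function names and the 20% reserve for KV cache and activations are assumptions chosen for illustration, and real usage depends on the framework, context length, and batch size.

```python
GPU_COUNT = 4          # from the configuration: 4x RTX 4090
VRAM_PER_GPU_GB = 24   # from the configuration: 24 GB per card


def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int,
         reserve_fraction: float = 0.2) -> bool:
    """True if the weights fit in the VRAM left after the reserve.

    reserve_fraction is an assumed share of VRAM kept free for the
    KV cache, activations, and framework overhead (hypothetical value).
    """
    usable_gb = GPU_COUNT * VRAM_PER_GPU_GB * (1 - reserve_fraction)
    return weights_gb(params_billion, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    for label, params, bits in [("70B INT4", 70, 4),
                                ("70B FP16", 70, 16),
                                ("34B FP16", 34, 16)]:
        verdict = "fits" if fits(params, bits) else "does not fit"
        print(f"{label}: ~{weights_gb(params, bits):.0f} GB of weights, {verdict}")
```

Under these assumptions, a 70B model quantized to 4 bits needs roughly 35 GB for weights and fits comfortably, while the same model in FP16 (about 140 GB) does not.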

Complete Technical Specifications
| Component | Specifications |
|---|---|
| Graphics cards | 4× NVIDIA GeForce RTX 4090 (each 24 GB GDDR6X, PCIe 4.0 x16) |
| Processor | AMD EPYC 7542 (32 cores / 64 threads, TDP 225 W) |
| Motherboard | ASRock Rack ROMED8-2T with IPMI support for remote management |
| System memory | 256 GB DDR4-2666 ECC RDIMM (expandable up to 512 GB) |
| Storage | 2 TB NVMe M.2 (PCIe 4.0 x4) for lightning-fast system startup |
| Power supply | Two synchronized 2 kW ATX power supplies (4000 W total) |
| Cooling | Industrial 120 mm fans with optimized front-to-back airflow |
| Operating system | Ubuntu pre-installed with CUDA, Docker, and AI frameworks (vLLM, ComfyUI) |
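
To see how the 4000 W power supply capacity relates to the listed components, a simple power budget can be sketched. The CPU TDP and PSU total come from the table above. The 450 W per GPU is NVIDIA's rated board power for the RTX 4090, and the 200 W allowance for memory, storage, fans, and motherboard is an assumed placeholder; neither figure is stated in this specification.

```python
PSU_TOTAL_W = 4000   # from the table: two 2 kW supplies
CPU_TDP_W = 225      # from the table: AMD EPYC 7542
GPU_COUNT = 4        # from the table: 4x RTX 4090
GPU_POWER_W = 450    # assumption: rated board power of one RTX 4090
OTHER_W = 200        # assumption: RAM, NVMe, fans, motherboard


def peak_load_w() -> int:
    """Estimated peak draw of all components, in watts."""
    return GPU_COUNT * GPU_POWER_W + CPU_TDP_W + OTHER_W


def headroom_fraction() -> float:
    """Share of PSU capacity left unused at the estimated peak load."""
    return 1 - peak_load_w() / PSU_TOTAL_W


if __name__ == "__main__":
    print(f"Estimated peak load: {peak_load_w()} W")
    print(f"PSU headroom: {headroom_fraction():.0%}")
```

With these assumed values the estimated peak is 2225 W, leaving a wide margin for transient GPU power spikes, which can briefly exceed the rated board power.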
Measured performance in practice:
Our laboratory tests confirm top efficiency:
- Llama 3.3 70B (AWQ INT4): Reaches up to 179 tok/s at batch size 32.
- GPU memory throughput: 920 GB/s per card.
- Deployment time: Ready for operation within 16–20 months (rental/leasing), or available for immediate shipment.
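
The measured figures above can be turned into per-user numbers with simple arithmetic. This sketch assumes that 179 tok/s is the aggregate throughput across all 32 concurrent sequences (the source does not say whether the figure is aggregate or per sequence) and that the 920 GB/s memory throughput adds up linearly across the four cards. The 300-token reply length is a hypothetical example value.

```python
AGGREGATE_TOK_S = 179        # lab figure: Llama 3.3 70B (AWQ INT4)
BATCH_SIZE = 32              # lab figure: batch size used in the test
BANDWIDTH_PER_GPU_GBS = 920  # lab figure: memory throughput per card
GPU_COUNT = 4


def per_stream_tok_s() -> float:
    """Tokens per second seen by one sequence, assuming 179 tok/s is aggregate."""
    return AGGREGATE_TOK_S / BATCH_SIZE


def aggregate_bandwidth_gbs() -> int:
    """Combined memory throughput of all cards, assuming linear scaling."""
    return BANDWIDTH_PER_GPU_GBS * GPU_COUNT


def seconds_per_reply(reply_tokens: int = 300) -> float:
    """Time to generate one reply of the given length at full batch load."""
    return reply_tokens / per_stream_tok_s()


if __name__ == "__main__":
    print(f"Per-stream speed: {per_stream_tok_s():.2f} tok/s")
    print(f"Combined memory throughput: {aggregate_bandwidth_gbs()} GB/s")
    print(f"300-token reply: {seconds_per_reply():.0f} s")
```

If the 179 tok/s figure is in fact per sequence, the per-user numbers would be 32 times better, so the interpretation should be confirmed before capacity planning.
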
Need a custom configuration?
We can adjust the RAM size, NVMe disk capacity, or add networking components according to your needs.
Request a custom quote







