The exact silicon. The exact numbers.
No marketing handwaving. This page covers the GPU SKUs we operate, the per-node specs, the cluster sizing, and the workload guidance our architects use when they price a deal. If your CIO needs the data sheet before the conversation, this is it.
Available GPU SKUs
Three GPU classes. Pick one or mix all three; what matters is the workload.
NVIDIA H200 SXM5
| Spec | Value |
|---|---|
| HBM3e memory | 141 GB |
| Memory bandwidth | 4.8 TB/s |
| FP8 Tensor TFLOPS | 3,958 (with sparsity) |
| FP16 / BF16 TFLOPS | 1,979 (with sparsity) |
| NVLink | 900 GB/s (4th gen) |
| TDP | 700 W |
| Best for | Llama-class pretraining, frontier model fine-tunes, long-context inference |
NVIDIA H100 SXM5
| Spec | Value |
|---|---|
| HBM3 memory | 80 GB |
| Memory bandwidth | 3.35 TB/s |
| FP8 Tensor TFLOPS | 3,958 (with sparsity) |
| FP16 / BF16 TFLOPS | 1,979 (with sparsity) |
| NVLink | 900 GB/s (4th gen) |
| TDP | 700 W |
| Best for | Production inference, RAG, mid-size fine-tunes, computer vision, recommender systems |
NVIDIA A100 SXM4
| Spec | Value |
|---|---|
| HBM2e memory | 80 GB |
| Memory bandwidth | 2.0 TB/s |
| BF16 / FP16 TFLOPS | 624 (with sparsity) |
| FP32 TFLOPS | 19.5 |
| NVLink | 600 GB/s (3rd gen) |
| TDP | 400 W |
| Best for | Dev environments, small-model training, traditional HPC, fraud / anomaly detection |
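A quick way to compare the three SKUs for a given model: check whether the weights plus KV cache fit in aggregate GPU memory. A minimal sketch in Python; the Llama 3.1 70B architecture figures (80 layers, 8 KV heads, head dim 128) are public, while the 10% activation/runtime overhead reserve is an illustrative assumption.

```python
def memory_budget(params_b, bytes_per_param, n_gpus, gpu_mem_gb,
                  layers, kv_heads, head_dim, kv_bytes, overhead=0.9):
    """Estimate weight footprint (GB) and leftover KV-cache token budget."""
    weights_gb = params_b * bytes_per_param        # e.g. 70B params at FP8 = 70 GB
    usable_gb = n_gpus * gpu_mem_gb * overhead     # reserve ~10% for activations etc.
    # KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * bytes
    kv_per_token = 2 * layers * kv_heads * head_dim * kv_bytes
    spare_bytes = (usable_gb - weights_gb) * 1e9
    return weights_gb, int(spare_bytes // kv_per_token)

# Llama 3.1 70B in FP8 on one 8x H100 (80 GB) node:
w, kv_tokens = memory_budget(70, 1, 8, 80, layers=80, kv_heads=8,
                             head_dim=128, kv_bytes=1)
```

The same call with `gpu_mem_gb=141` shows why the H200 is the long-context pick: the extra 61 GB per GPU goes almost entirely to KV cache.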
Per-node configurations
Three node SKUs we ship today. Custom configurations on request for 100+ GPU deployments.
HyperAI Node — H200 (8× SXM5)
- 8× NVIDIA H200 SXM5 141 GB (NVLink + NVSwitch fully meshed)
- 2× Intel Xeon Platinum 8480+ (56C / 112T each, 224 threads/node)
- 2 TB DDR5-4800 ECC RAM
- 30 TB NVMe Gen4 local scratch
- 4× ConnectX-7 NDR 400 Gb/s (1.6 Tb/s aggregate IB) + 2× 100 GbE mgmt
- ~10.2 kW typical / ~12 kW peak; rear-door liquid-assisted
HyperAI Node — H100 (8× SXM5)
- 8× NVIDIA H100 SXM5 80 GB (NVLink + NVSwitch fully meshed)
- 2× Intel Xeon Platinum 8470 (52C each)
- 2 TB DDR5-4800 ECC RAM
- 30 TB NVMe Gen4 local scratch
- 4× ConnectX-7 NDR 400 Gb/s + 2× 100 GbE mgmt
- ~10 kW typical; same cooling envelope as H200 node
HyperAI Node — A100 (8× SXM4)
- 8× NVIDIA A100 SXM4 80 GB (NVLink 3rd gen + NVSwitch)
- 2× AMD EPYC 7763 (64C each)
- 1 TB DDR4-3200 ECC RAM
- 15 TB NVMe Gen4 local scratch
- 2× ConnectX-6 HDR 200 Gb/s + 2× 100 GbE mgmt
- ~6.5 kW typical; air-cooled compatible
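For rack planning, the power figures above translate directly into nodes per rack. A back-of-envelope sketch; the 40 kW rack budget and the 15% derating headroom are assumed figures, while the node draws come from the specs above.

```python
def nodes_per_rack(rack_kw, node_kw, headroom=0.85):
    """How many nodes fit under a rack power budget, with derating headroom."""
    return int((rack_kw * headroom) // node_kw)

# Assumed 40 kW liquid-assisted rack:
h200_nodes = nodes_per_rack(40, 12.0)   # H200 node at ~12 kW peak
a100_nodes = nodes_per_rack(40, 6.5)    # A100 node at ~6.5 kW typical
```

This is why the A100 SKU remains attractive in air-cooled facilities: roughly twice as many nodes fit under the same power envelope.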
Cluster sizing reference
What customers actually buy, and what they get from each tier.
1 node · 8 GPUs
Use: Llama 70B inference plus RAG for one department. Or fine-tune a 7-13B model. Or run a POC.
Throughput: ~30K inference tokens/sec on Llama 3.1 70B with batching.
3 nodes · 24 GPUs
Use: Production inference for an enterprise (multiple LLMs, RAG, embeddings). Or train a 13-34B model.
Throughput: ~90K inference tokens/sec aggregate; train 13B model end-to-end in ~30 days.
8 nodes · 64 GPUs
Use: National-scale sovereign LLM hosting. Multi-tenant inference for a ministry or telco.
Throughput: Train a 70B model in ~60 days; serve roughly 10× the single-node inference load.
16-128+ nodes · 128-1,000+ GPUs
Use: Frontier-model training, GPU-as-a-service for telcos selling AI to their enterprise base, multi-region sovereign HPC.
Custom-engineered. Fabric, storage, power, and cooling designed per deployment.
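The training-time figures in the tiers above can be sanity-checked with the standard ~6 × params × tokens FLOPs rule of thumb. A rough sketch; the 400 TFLOPS sustained-per-GPU figure (H100 at realistic utilisation) and the 300B-token run are illustrative assumptions, not our benchmark settings.

```python
def training_days(params, tokens, n_gpus, sustained_flops_per_gpu=400e12):
    """Rough wall-clock estimate from the ~6*N*D training-FLOPs rule."""
    total_flops = 6 * params * tokens
    return total_flops / (n_gpus * sustained_flops_per_gpu) / 86_400

# 13B model on 300B tokens across 3 H100 nodes (24 GPUs):
days = training_days(13e9, 300e9, 24)   # lands near the ~30-day tier figure
```

Swapping in your own token count and utilisation is usually the fastest way to pick a tier before a formal sizing exercise.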
What you can run on HyperAI today
Validated workloads — not theoretical. Real numbers from MomentumX customer pilots.
| Workload | Cluster | Indicative performance |
|---|---|---|
| Llama 3.1 70B inference (FP8, batch 32) | 1 node H100×8 | ~30K tok/s aggregate, <120 ms TTFT |
| Llama 3.1 70B inference (FP8, batch 64) | 3 nodes H100×24 | ~90K tok/s aggregate, <90 ms TTFT |
| Mixtral 8×22B inference | 1 node H100×8 | ~22K tok/s, FP8 quant |
| Falcon 180B inference | 3 nodes H100×24 | ~14K tok/s, BF16 |
| Llama 13B fine-tune (LoRA, 10B tokens) | 3 nodes H100×24 | ~36 hours wall-clock |
| Stable Diffusion XL inference | 1 node A100×8 | ~80 images/sec at 1024×1024 |
| RAG pipeline (embeddings + retrieval + LLM) | 1 node H100×8 | ~5K end-to-end RAG queries/sec |
Numbers are indicative on production hardware with TensorRT-LLM / vLLM + NCCL-tuned fabric. Your workload, batch shape, sequence length, and quantisation will move them — sometimes meaningfully. We benchmark on your workload during the 14-day POC.
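As one illustration of how a row like the first is typically served, a single-node vLLM launch might look like the following. The model name, flags, and values are illustrative, not our pinned deployment config:

```shell
# 8-way tensor parallel across one H100 node, FP8 weights:
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 8 \
  --quantization fp8 \
  --max-model-len 8192
```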
Need the spec sheet for procurement?
We have a one-page PDF version of this content with our architect’s contact details, cluster pricing model, and SAMA / NCA / PDPL compliance pointers — designed to drop into your tender folder. Ask, and we’ll send it the same day.