The exact silicon. The exact numbers.
No marketing handwaving. This page covers the GPU SKUs we operate, the per-node specs, the cluster sizing, and the workload guidance our architects use when they price a deal. If your CIO needs the data sheet before the conversation, this is it.
Available GPU SKUs
Three GPU classes. Pick one or mix all three; what matters is the workload.
NVIDIA H200 SXM5
| Spec | Value |
|---|---|
| HBM3e memory | 141 GB |
| Memory bandwidth | 4.8 TB/s |
| FP8 Tensor TFLOPS | 3,958 (with sparsity) |
| FP16 / BF16 TFLOPS | 1,979 (with sparsity) |
| NVLink | 900 GB/s (4th gen) |
| TDP | 700 W |
| Best for | Llama-class pretraining, frontier model fine-tunes, long-context inference |
NVIDIA H100 SXM5
| Spec | Value |
|---|---|
| HBM3 memory | 80 GB |
| Memory bandwidth | 3.35 TB/s |
| FP8 Tensor TFLOPS | 3,958 (with sparsity) |
| FP16 / BF16 TFLOPS | 1,979 (with sparsity) |
| NVLink | 900 GB/s (4th gen) |
| TDP | 700 W |
| Best for | Production inference, RAG, mid-size fine-tunes, computer vision, recommender systems |
NVIDIA A100 SXM4
| Spec | Value |
|---|---|
| HBM2e memory | 80 GB |
| Memory bandwidth | 2.0 TB/s |
| BF16 / FP16 TFLOPS | 624 (with sparsity) |
| FP32 TFLOPS | 19.5 |
| NVLink | 600 GB/s (3rd gen) |
| TDP | 400 W |
| Best for | Dev environments, small-model training, traditional HPC, fraud / anomaly detection |
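A quick way to compare the three SKUs for a given model: check whether the weights plus KV cache fit in aggregate GPU memory. A minimal sketch in Python; the Llama 3.1 70B architecture figures (80 layers, 8 KV heads, head dim 128) are public, while the 10% activation/runtime overhead reserve is an illustrative assumption.

```python
def memory_budget(params_b, bytes_per_param, n_gpus, gpu_mem_gb,
                  layers, kv_heads, head_dim, kv_bytes, overhead=0.9):
    """Estimate weight footprint (GB) and leftover KV-cache token budget."""
    weights_gb = params_b * bytes_per_param        # e.g. 70B params at FP8 = 70 GB
    usable_gb = n_gpus * gpu_mem_gb * overhead     # reserve ~10% for activations etc.
    # KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * bytes
    kv_per_token = 2 * layers * kv_heads * head_dim * kv_bytes
    spare_bytes = (usable_gb - weights_gb) * 1e9
    return weights_gb, int(spare_bytes // kv_per_token)

# Llama 3.1 70B in FP8 on one 8x H100 (80 GB) node:
w, kv_tokens = memory_budget(70, 1, 8, 80, layers=80, kv_heads=8,
                             head_dim=128, kv_bytes=1)
```

The same call with `gpu_mem_gb=141` shows why the H200 is the long-context pick: the extra 61 GB per GPU goes almost entirely to KV cache.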
Per-node configurations
Three node SKUs we ship today. Custom configurations on request for 100+ GPU deployments.
HyperAI Node — H200 (8× SXM5)
- 8× NVIDIA H200 SXM5 141 GB (NVLink + NVSwitch fully meshed)
- 2× Intel Xeon Platinum 8480+ (56C / 112T each, 224 threads/node)
- 2 TB DDR5-4800 ECC RAM
- 30 TB NVMe Gen4 local scratch
- 4× ConnectX-7 NDR 400 Gb/s (1.6 Tb/s aggregate IB) + 2× 100 GbE mgmt
- ~10.2 kW typical / ~12 kW peak; rear-door liquid-assisted
HyperAI Node — H100 (8× SXM5)
- 8× NVIDIA H100 SXM5 80 GB (NVLink + NVSwitch fully meshed)
- 2× Intel Xeon Platinum 8470 (52C each)
- 2 TB DDR5-4800 ECC RAM
- 30 TB NVMe Gen4 local scratch
- 4× ConnectX-7 NDR 400 Gb/s + 2× 100 GbE mgmt
- ~10 kW typical; same cooling envelope as H200 node
HyperAI Node — A100 (8× SXM4)
- 8× NVIDIA A100 SXM4 80 GB (NVLink 3rd gen + NVSwitch)
- 2× AMD EPYC 7763 (64C each)
- 1 TB DDR4-3200 ECC RAM
- 15 TB NVMe Gen4 local scratch
- 2× ConnectX-6 HDR 200 Gb/s + 2× 100 GbE mgmt
- ~6.5 kW typical; air-cooled compatible
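For rack planning, the power figures above translate directly into nodes per rack. A back-of-envelope sketch; the 40 kW rack budget and the 15% derating headroom are assumed figures, while the node draws come from the specs above.

```python
def nodes_per_rack(rack_kw, node_kw, headroom=0.85):
    """How many nodes fit under a rack power budget, with derating headroom."""
    return int((rack_kw * headroom) // node_kw)

# Assumed 40 kW liquid-assisted rack:
h200_nodes = nodes_per_rack(40, 12.0)   # H200 node at ~12 kW peak
a100_nodes = nodes_per_rack(40, 6.5)    # A100 node at ~6.5 kW typical
```

This is why the A100 SKU remains attractive in air-cooled facilities: roughly twice as many nodes fit under the same power envelope.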
Cluster sizing reference
What customers actually buy, and what they get from each tier.
1 node · 8 GPUs
Use: Llama 70B inference plus RAG for one department. Or fine-tune a 7-13B model. Or run a POC.
Throughput: ~30K inference tokens/sec on Llama 3.1 70B with batching.
3 nodes · 24 GPUs
Use: Production inference for an enterprise (multiple LLMs, RAG, embeddings). Or train a 13-34B model.
Throughput: ~90K inference tokens/sec aggregate; train 13B model end-to-end in ~30 days.
8 nodes · 64 GPUs
Use: National-scale sovereign LLM hosting. Multi-tenant inference for a ministry or telco.
Throughput: Train a 70B model in ~60 days; serve roughly 10× the single-node inference load.
16-128+ nodes · 128-1,000+ GPUs
Use: Frontier-model training, GPU-as-a-service for telcos selling AI to their enterprise base, multi-region sovereign HPC.
Custom-engineered. Fabric, storage, power, and cooling designed per deployment.
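The training-time figures in the tiers above can be sanity-checked with the standard ~6 × params × tokens FLOPs rule of thumb. A rough sketch; the 400 TFLOPS sustained-per-GPU figure (H100 at realistic utilisation) and the 300B-token run are illustrative assumptions, not our benchmark settings.

```python
def training_days(params, tokens, n_gpus, sustained_flops_per_gpu=400e12):
    """Rough wall-clock estimate from the ~6*N*D training-FLOPs rule."""
    total_flops = 6 * params * tokens
    return total_flops / (n_gpus * sustained_flops_per_gpu) / 86_400

# 13B model on 300B tokens across 3 H100 nodes (24 GPUs):
days = training_days(13e9, 300e9, 24)   # lands near the ~30-day tier figure
```

Swapping in your own token count and utilisation is usually the fastest way to pick a tier before a formal sizing exercise.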
What you can run on HyperAI today
Validated workloads — not theoretical. Real numbers from MomentumX customer pilots.
| Workload | Cluster | Indicative performance |
|---|---|---|
| Llama 3.1 70B inference (FP8, batch 32) | 1 node H100×8 | ~30K tok/s aggregate, <120 ms TTFT |
| Llama 3.1 70B inference (FP8, batch 64) | 3 nodes H100×24 | ~90K tok/s aggregate, <90 ms TTFT |
| Mixtral 8×22B inference | 1 node H100×8 | ~22K tok/s, FP8 quant |
| Falcon 180B inference | 3 nodes H100×24 | ~14K tok/s, BF16 |
| Llama 13B fine-tune (LoRA, 10B tokens) | 3 nodes H100×24 | ~36 hours wall-clock |
| Stable Diffusion XL inference | 1 node A100×8 | ~80 images/sec at 1024×1024 |
| RAG pipeline (embeddings + retrieval + LLM) | 1 node H100×8 | ~5K end-to-end RAG queries/sec |
Numbers are indicative on production hardware with TensorRT-LLM / vLLM + NCCL-tuned fabric. Your workload, batch shape, sequence length, and quantisation will move them — sometimes meaningfully. We benchmark on your workload during the 14-day POC.
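As one illustration of how a row like the first is typically served, a single-node vLLM launch might look like the following. The model name, flags, and values are illustrative, not our pinned deployment config:

```shell
# 8-way tensor parallel across one H100 node, FP8 weights:
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 8 \
  --quantization fp8 \
  --max-model-len 8192
```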
Need the spec sheet for procurement?
We have a one-page PDF version of this content with our architect’s contact details, cluster pricing model, and SAMA / NCA / PDPL compliance pointers — designed to drop into your tender folder. Ask, and we’ll send it the same day.