HyperAI Specifications

The exact silicon. The exact numbers.

No marketing handwaving. This page is the GPU SKUs we operate, the per-node specs, the cluster sizing, and the workload guidance our architects use when they price a deal. If your CIO needs the data sheet before the conversation, this is it.

Available GPU SKUs

Three GPU classes. Pick one or mix all three; the workload decides.

Flagship — training

NVIDIA H200 SXM5

  • HBM3e memory: 141 GB
  • Memory bandwidth: 4.8 TB/s
  • FP8 Tensor TFLOPS: 3,958 (with sparsity)
  • FP16 / BF16 TFLOPS: 1,979 (with sparsity)
  • NVLink: 900 GB/s (4th gen)
  • TDP: 700 W
  • Best for: Llama-class pretraining, frontier model fine-tunes, long-context inference
Workhorse — training + inference

NVIDIA H100 SXM5

  • HBM3 memory: 80 GB
  • Memory bandwidth: 3.35 TB/s
  • FP8 Tensor TFLOPS: 3,958 (with sparsity)
  • FP16 / BF16 TFLOPS: 1,979 (with sparsity)
  • NVLink: 900 GB/s (4th gen)
  • TDP: 700 W
  • Best for: Production inference, RAG, mid-size fine-tunes, computer vision, recommender systems
Value — dev + small models

NVIDIA A100 SXM4

  • HBM2e memory: 80 GB
  • Memory bandwidth: 2.0 TB/s
  • BF16 / FP16 TFLOPS: 624 (with sparsity)
  • FP32 TFLOPS: 19.5
  • NVLink: 600 GB/s (3rd gen)
  • TDP: 400 W
  • Best for: Dev environments, small-model training, traditional HPC, fraud / anomaly detection
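A quick way to sanity-check which SKU a given model lands on is weight-memory arithmetic: parameter count times bytes per parameter against a node's aggregate HBM. A minimal sketch, illustrative only; it ignores KV cache, activations, and framework overhead, which is why real sizing happens during the POC:

```python
# Rough weight-memory check: do a model's weights fit in one node's HBM?
# Illustrative only: ignores KV cache, activations, and framework overhead.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}

NODE_HBM_GB = {
    "H200x8": 8 * 141,  # 1,128 GB aggregate
    "H100x8": 8 * 80,   # 640 GB aggregate
    "A100x8": 8 * 80,   # 640 GB aggregate
}

def weights_gb(params_b: float, dtype: str) -> float:
    """Weight footprint in GB for a model with params_b billion parameters."""
    return params_b * BYTES_PER_PARAM[dtype]

def fits(params_b: float, dtype: str, node: str) -> bool:
    return weights_gb(params_b, dtype) <= NODE_HBM_GB[node]

# Llama 3.1 70B in FP8: ~70 GB of weights, well inside one H100 node.
print(weights_gb(70, "fp8"), fits(70, "fp8", "H100x8"))      # 70.0 True
# Falcon 180B in BF16: ~360 GB, still inside a 640 GB H100 node.
print(weights_gb(180, "bf16"), fits(180, "bf16", "H100x8"))  # 360.0 True
```

This is also why the 141 GB H200 earns its "long-context" billing: the extra HBM per GPU goes almost entirely to KV cache and longer sequences, not weights.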

Per-node configurations

Three node SKUs we ship today. Custom configurations on request for > 100 GPU deployments.

HyperAI Node — H200 (8× SXM5)

  • 8× NVIDIA H200 SXM5 141 GB (NVLink + NVSwitch fully meshed)
  • 2× Intel Xeon Platinum 8480+ (56C / 112T each, 224 threads/node)
  • 2 TB DDR5-4800 ECC RAM
  • 30 TB NVMe Gen4 local scratch
  • 4× ConnectX-7 NDR 400 Gb/s (1.6 Tb/s aggregate IB) + 2× 100 GbE mgmt
  • ~10.2 kW typical / ~12 kW peak; rear-door liquid-assisted

HyperAI Node — H100 (8× SXM5)

  • 8× NVIDIA H100 SXM5 80 GB (NVLink + NVSwitch fully meshed)
  • 2× Intel Xeon Platinum 8470 (52C each)
  • 2 TB DDR5-4800 ECC RAM
  • 30 TB NVMe Gen4 local scratch
  • 4× ConnectX-7 NDR 400 Gb/s + 2× 100 GbE mgmt
  • ~10 kW typical; same cooling envelope as H200 node

HyperAI Node — A100 (8× SXM4)

  • 8× NVIDIA A100 SXM4 80 GB (NVLink 3rd gen + NVSwitch)
  • 2× AMD EPYC 7763 (64C each)
  • 1 TB DDR4-3200 ECC RAM
  • 15 TB NVMe Gen4 local scratch
  • 2× ConnectX-6 HDR 200 Gb/s + 2× 100 GbE mgmt
  • ~6.5 kW typical; air-cooled compatible

Cluster sizing reference

What customers actually buy, and what they get from each tier.

Pilot

1 node · 8 GPUs

Use: Inference for Llama 70B plus RAG for one department. Or fine-tune a 7-13B model. Or run a proof of concept.

Throughput: ~30K inference tokens/sec on Llama 3.1 70B with batching.

Standard

3 nodes · 24 GPUs

Use: Production inference for an enterprise (multiple LLMs, RAG, embeddings). Or train a 13-34B model.

Throughput: ~90K inference tokens/sec aggregate; train a 13B model end-to-end in ~30 days.

Sovereign Pod

8 nodes · 64 GPUs

Use: National-scale sovereign LLM hosting. Multi-tenant inference for a ministry or telco.

Throughput: Train a 70B model in ~60 days; serve 10× the standard inference load.

Custom

16-128+ nodes · 128-1,000+ GPUs

Use: Frontier-model training, GPU-as-a-service for telcos selling AI to their enterprise base, multi-region sovereign HPC.

Custom-engineered. Fabric, storage, power, and cooling designed per deployment.

What you can run on HyperAI today

Validated workloads — not theoretical. Real numbers from MomentumX customer pilots.

| Workload | Cluster | Indicative performance |
| --- | --- | --- |
| Llama 3.1 70B inference (FP8, bsz 32) | 1 node H100×8 | ~30K tok/s aggregate, <120 ms TTFT |
| Llama 3.1 70B inference (FP8, bsz 64) | 3 nodes H100×24 | ~90K tok/s aggregate, <90 ms TTFT |
| Mixtral 8×22B inference | 1 node H100×8 | ~22K tok/s, FP8 quant |
| Falcon 180B inference | 3 nodes H100×24 | ~14K tok/s, BF16 |
| Llama 13B fine-tune (LoRA, 10B tokens) | 3 nodes H100×24 | ~36 hours wall-clock |
| Stable Diffusion XL inference | 1 node A100×8 | ~80 images/sec at 1024×1024 |
| RAG pipeline (embeddings + retrieval + LLM) | 1 node H100×8 | ~5K end-to-end RAG queries/sec |

Numbers are indicative on production hardware with TensorRT-LLM / vLLM + NCCL-tuned fabric. Your workload, batch shape, sequence length, and quantisation will move them — sometimes meaningfully. We benchmark on your workload during the 14-day POC.
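For the inference rows, KV-cache footprint is usually the binding constraint at high batch sizes, which is why batch shape and sequence length move the numbers. A sketch of the standard grouped-query-attention KV-cache arithmetic, using Llama 3.1 70B's published architecture (80 layers, 8 KV heads, head dim 128); the batch and context values are illustrative:

```python
# KV-cache size under grouped-query attention:
# 2 (K and V) * layers * kv_heads * head_dim * bytes, per token per sequence.
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 1) -> float:
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * seq_len * batch / 1e9

# Llama 3.1 70B (80 layers, 8 KV heads, head_dim 128) with an FP8 cache,
# batch 32 at 8K context: ~43 GB of KV cache on top of ~70 GB of FP8
# weights, comfortable inside a 640 GB H100 node's HBM.
print(round(kv_cache_gb(80, 8, 128, 8192, 32)))  # 43
```

Doubling either batch or context doubles the cache linearly, which is where the H200 node's extra HBM pays for itself.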

Need the spec sheet for procurement?

We have a one-page PDF version of this content with our architect’s contact details, cluster pricing model, and SAMA / NCA / PDPL compliance pointers — designed to drop into your tender folder. Ask, and we’ll send it the same day.

Request the HyperAI spec sheet
Or apply for a 14-day POC