HyperAI Architecture

How a sovereign GPU cluster is actually built.

A HyperAI cluster is not a stack of GPUs in a rack. It’s a coordinated system: GPU compute, NVLink and InfiniBand fabrics, parallel storage, a sovereign control plane, and a software stack that makes the whole thing feel like one supercomputer. This page shows what that looks like in production for a MENA enterprise.

The four planes of a HyperAI cluster

Compute, fabric, storage, and control — engineered as one system, not four products bolted together.

01

Compute plane — NVIDIA H100 / H200 / A100

Up to 24 SXM5/SXM4 GPUs in a single cluster, configured per workload. NVLink + NVSwitch provide intra-node bandwidth (900 GB/s per GPU on H100/H200), giving model parallelism real headroom. Mix-and-match SKUs are supported: train on H200, serve inference on H100, develop on A100.
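As a rough illustration of sizing the compute plane per workload, the sketch below estimates the minimum GPU count needed just to hold training state in HBM. It assumes the common rule of thumb of about 16 bytes per parameter for mixed-precision Adam training (weights, gradients, and fp32 optimizer state) and ignores activations, so treat the result as a lower bound. The function name and defaults are illustrative, not part of any HyperAI tooling; the 80 GB default matches the H100 SXM5 parts in the reference pod.

```python
import math

# Rule-of-thumb bytes per parameter for mixed-precision Adam training:
# 2 (bf16 weights) + 2 (bf16 gradients) + 12 (fp32 master weights, momentum, variance).
# This is an assumption for sizing, not a measured figure.
BYTES_PER_PARAM_TRAINING = 16


def min_gpus_for_training(params_billion: float, hbm_gb_per_gpu: float = 80.0) -> int:
    """Lower bound on GPUs needed to hold model state, ignoring activations."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB cancels to this product.
    state_gb = params_billion * BYTES_PER_PARAM_TRAINING
    return math.ceil(state_gb / hbm_gb_per_gpu)


if __name__ == "__main__":
    print(min_gpus_for_training(7))    # 112 GB of state on 80 GB GPUs
    print(min_gpus_for_training(70))   # 1120 GB of state on 80 GB GPUs
```

Activation memory, parallelism overheads, and batch size push the real number higher, which is why the figure is only a starting point for a sizing conversation.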

02

Fabric plane — non-blocking InfiniBand

NDR (400 Gb/s) InfiniBand fabric between nodes, designed non-blocking for collective operations (NCCL all-reduce). Latency under 2 µs node-to-node. RoCEv2 Ethernet option for customers standardised on Ethernet: same throughput, slightly higher latency.
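To show why link bandwidth and latency both matter for collectives, here is a minimal sketch of the textbook cost model for a ring all-reduce: a bandwidth term plus a latency term. It is an idealised estimate under stated assumptions (a pure ring, full link utilisation, no overlap with compute); NCCL selects its own algorithms at runtime, so measured times will differ. The function and parameter names are hypothetical.

```python
def ring_allreduce_seconds(message_bytes: float, nodes: int,
                           link_gbit_per_s: float, latency_s: float) -> float:
    """Idealised ring all-reduce time across `nodes` participants."""
    if nodes < 2:
        return 0.0  # nothing to exchange with a single participant
    bytes_per_s = link_gbit_per_s * 1e9 / 8
    # A ring all-reduce is a reduce-scatter followed by an all-gather,
    # each taking (nodes - 1) steps that move 1/nodes of the message.
    steps = 2 * (nodes - 1)
    bandwidth_term = steps * (message_bytes / nodes) / bytes_per_s
    latency_term = steps * latency_s
    return bandwidth_term + latency_term


if __name__ == "__main__":
    # 1 GB of gradients, 3 nodes, one 400 Gb/s NDR link, 2 µs per hop.
    t = ring_allreduce_seconds(1e9, nodes=3, link_gbit_per_s=400, latency_s=2e-6)
    print(f"{t * 1e3:.2f} ms")  # roughly 26.7 ms under these assumptions
```

For large messages the bandwidth term dominates; for many small messages the per-step latency does, which is where a sub-2 µs fabric pays off.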

03

Storage plane — parallel + tiered

WekaFS or BeeGFS for training data (multi-GB/s per node, parallel I/O). NVMe-oF for inference KV-cache. Object storage (S3-compatible Ceph or MinIO) for cold artefacts and model registry. Sovereign — all tiers in-country.

04

Control plane — Kubernetes + Slurm + OpenStack

Slurm for batch training jobs. Kubernetes (with NVIDIA GPU Operator + KubeRay) for inference, RAG, and dev workloads. OpenStack Nova/Ironic for bare-metal lifecycle. Customer-managed KMS, SAML/OIDC auth, full audit trail.
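The division of labour between the three schedulers can be sketched as a simple routing table. The mapping follows the description above; the workload labels, the `Scheduler` enum, and the `route` function are hypothetical names used only for illustration, not an actual HyperAI API.

```python
from enum import Enum


class Scheduler(Enum):
    SLURM = "slurm"
    KUBERNETES = "kubernetes"
    OPENSTACK_IRONIC = "openstack-ironic"


# Workload labels are illustrative; the targets mirror the control-plane description.
ROUTING = {
    "batch-training": Scheduler.SLURM,
    "inference": Scheduler.KUBERNETES,
    "rag": Scheduler.KUBERNETES,
    "dev": Scheduler.KUBERNETES,
    "bare-metal-lifecycle": Scheduler.OPENSTACK_IRONIC,
}


def route(workload_kind: str) -> Scheduler:
    """Return the scheduler responsible for a given kind of workload."""
    try:
        return ROUTING[workload_kind]
    except KeyError:
        raise ValueError(f"unknown workload kind: {workload_kind!r}") from None


if __name__ == "__main__":
    print(route("batch-training").value)
    print(route("rag").value)
```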

Reference cluster topology — 24-GPU sovereign training pod

The sizing customers actually deploy for production AI in Egypt and KSA.

Figure: HyperAI 24-GPU Sovereign Training Pod: Reference Topology. Three compute nodes with 8 NVLink-meshed GPUs each, connected through a non-blocking InfiniBand fabric, sharing a tiered storage plane and a high-availability control plane, all within a sovereign boundary in Cairo or Riyadh.

Diagram labels:
Pod: 3 nodes · 24 H100/H200 GPUs · 1.6 Tb/s per node IB · customer-managed keys
Control plane: Slurm 23 · Kubernetes 1.30 · OpenStack Nova/Ironic · Vault KMS · SAML/OIDC · DCGM monitoring
Nodes 1 to 3 (DGX-class): 2× Xeon 8480+ · 2 TB DDR5 · 30 TB NVMe · 8 GPUs · NVLink + NVSwitch 900 GB/s mesh
Fabric plane: NVIDIA Quantum-2 NDR 400 Gb/s · non-blocking · NCCL-tuned · < 2 µs node-to-node · RoCEv2 alt
Hot tier: WekaFS / BeeGFS · 500 TB NVMe · multi-GB/s parallel I/O · training data + active checkpoints
Inference tier: NVMe-oF · 50 TB · KV-cache + model weights · production inference + RAG
Cold tier: Ceph / MinIO · 2 PB · S3-compatible · model registry · archive + datasets + artefacts
No traffic traverses an external hyperscaler region · customer holds keys · audit trail end-to-end

Compute: 3× DGX-class nodes, 8× H100 SXM5 80 GB each (24 GPUs total), 2 TB DDR5 RAM per node, 2× Intel Xeon Platinum 8480+ CPUs.
Fabric: 4× ConnectX-7 NDR (400 Gb/s) per node, for 1.6 Tb/s aggregate per node. Non-blocking IB switch (Quantum-2). RoCEv2 alt available.
Storage: 500 TB WekaFS hot tier (NVMe), 2 PB Ceph cold tier (HDD/QLC), NVMe-oF KV-cache (50 TB) for inference.
Control: 3× management nodes (HA Slurm + Kubernetes control plane), customer-managed Vault for KMS, dual-stack networking IPv4 + IPv6.
Power + cooling: ~36 kW per compute node (108 kW pod), liquid-assisted air or direct liquid cooling at the customer DC or Raya DC.
Sovereignty: All four planes in Cairo or Riyadh. No traffic traverses an OpenAI / AWS / Azure region. Customer holds keys.
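The pod-level totals quoted above follow directly from the per-node figures. A minimal sketch of that arithmetic, using a hypothetical `PodSpec` dataclass whose defaults are copied from the reference pod:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSpec:
    """Per-node figures for the reference pod; the class itself is illustrative."""
    nodes: int = 3
    gpus_per_node: int = 8
    hbm_gb_per_gpu: int = 80      # H100 SXM5 80 GB
    nics_per_node: int = 4        # ConnectX-7 NDR
    nic_gbit_per_s: int = 400
    kw_per_node: float = 36.0

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    @property
    def total_hbm_gb(self) -> int:
        return self.total_gpus * self.hbm_gb_per_gpu

    @property
    def node_fabric_tbit_per_s(self) -> float:
        return self.nics_per_node * self.nic_gbit_per_s / 1000

    @property
    def pod_kw(self) -> float:
        return self.nodes * self.kw_per_node


if __name__ == "__main__":
    pod = PodSpec()
    print(pod.total_gpus, "GPUs")
    print(pod.total_hbm_gb, "GB aggregate HBM")
    print(pod.node_fabric_tbit_per_s, "Tb/s fabric per node")
    print(pod.pod_kw, "kW for the pod")
```

Changing `nodes` or `hbm_gb_per_gpu` gives a quick first pass at a larger pod or a different GPU SKU before a proper sizing exercise.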

The software stack

Everything you’d expect from a sovereign GPU cloud — pre-integrated and supported.

Orchestration
Slurm 23.x · Kubernetes 1.30 · Ray 2.x · NVIDIA GPU Operator · NVIDIA Network Operator · KubeRay
Frameworks
PyTorch 2.x · TensorFlow 2.x · JAX · DeepSpeed · Megatron-LM · NeMo · vLLM · TensorRT-LLM · TGI
Models supported
Llama 3 / 3.1 · Mistral · Mixtral · Falcon · Jais · AceGPT · Qwen · DeepSeek · Stable Diffusion XL · custom fine-tunes
Storage
WekaFS · BeeGFS · Lustre · Ceph (S3-compatible) · MinIO · NVMe-oF · POSIX gateways
Networking
NVIDIA Quantum-2 InfiniBand NDR · ConnectX-7 · BlueField DPUs · Cumulus / SONiC for Ethernet · NCCL / RCCL
Observability
NVIDIA DCGM · Prometheus + Grafana · Loki for logs · OpenTelemetry · per-GPU utilisation, per-job cost attribution
Security
Customer-managed KMS (Vault) · BYOK · TPM-backed attestation · network microsegmentation · SAML/OIDC · full audit log
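The per-job cost attribution listed under Observability comes down to summing GPU-hours per job and multiplying by a rate. The sketch below shows that calculation on hand-written samples; in practice the samples would be derived from scheduler accounting and GPU telemetry. The sample format, the function name, and the rate are assumptions for illustration only.

```python
from collections import defaultdict


def attribute_cost(samples, rate_per_gpu_hour: float) -> dict:
    """Sum GPU-hours per job and convert to cost.

    `samples` is an iterable of (job_id, gpu_count, seconds) tuples;
    this format is hypothetical, not a DCGM or Slurm export schema.
    """
    gpu_hours = defaultdict(float)
    for job_id, gpu_count, seconds in samples:
        gpu_hours[job_id] += gpu_count * seconds / 3600
    return {job: hours * rate_per_gpu_hour for job, hours in gpu_hours.items()}


if __name__ == "__main__":
    usage = [
        ("train-a", 8, 7200),   # 8 GPUs for 2 hours
        ("serve-b", 2, 1800),   # 2 GPUs for 30 minutes
        ("train-a", 8, 3600),   # 8 GPUs for 1 more hour
    ]
    # The rate of 2.0 per GPU-hour is a placeholder, not a HyperAI price.
    print(attribute_cost(usage, rate_per_gpu_hour=2.0))
```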

Deployment footprints — pick what fits your constraints

Not every workload runs in a hyperscale region. Sovereign means we deploy where your data lives.

Footprint A

MomentumX-managed in Cairo / Riyadh

You consume the cluster as-a-service from MomentumX-operated facilities. Fastest path from contract to first token (typically 14 days). Same sovereignty guarantees, lower CAPEX.

Footprint B

Customer DC, MomentumX-operated

Cluster lives in your datacentre — Raya DC, Mobily, STC, or your private facility. We install, integrate, and run it. You hold physical control. Most regulated customers choose this.

Footprint C

Air-gapped sovereign deployment

No outbound network from the cluster. All updates, models, and patches are delivered via signed offline bundles. For defence, intelligence, central bank, and high-classification ministries.
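One step in accepting an offline bundle is checking that every file matches the digest recorded in its manifest. The sketch below covers only that integrity check and assumes a hypothetical layout: a `manifest.json` at the bundle root with a `files` map of relative path to SHA-256 hex digest. Verifying the signature on the manifest itself (with whatever signing scheme the deployment uses) is assumed to happen before this step and is not shown.

```python
import hashlib
import hmac
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return the relative paths that are missing or whose digest does not match.

    The manifest format is an assumption for this sketch, not a HyperAI format.
    """
    manifest = json.loads((bundle_dir / manifest_name).read_text())
    mismatches = []
    for rel_path, expected in manifest["files"].items():
        target = bundle_dir / rel_path
        # compare_digest avoids leaking match position through timing.
        if not target.is_file() or not hmac.compare_digest(sha256_file(target), expected):
            mismatches.append(rel_path)
    return mismatches
```

An empty list means every listed file is present and intact; anything else should block the import into the air-gapped cluster.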

Want the architecture review walkthrough?

Bring your model size, your data classification, and your latency targets. We’ll size a cluster, sketch the fabric, and tell you whether HyperAI is the right tool — or whether you should keep your existing stack and add a small inference pod first.

Apply for a 14-day POC
Book an architecture review