A HyperAI cluster is not a stack of GPUs in a rack. It’s a coordinated system: GPU compute, NVLink and InfiniBand fabrics, parallel storage, a sovereign control plane, and a software stack that makes the whole thing feel like one supercomputer. This page shows what that looks like in production for a MENA enterprise.
Compute, fabric, storage, and control — engineered as one system, not four products bolted together.
01
Compute plane — NVIDIA H100 / H200 / A100
Up to 24 SXM5/SXM4 GPUs in a single cluster, configured per workload. NVLink + NVSwitch for intra-node bandwidth (900 GB/s per GPU on H100/H200), giving model parallelism real headroom. Mix-and-match SKUs supported: train on H200, serve inference on H100, develop on A100.
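As a rough illustration of how a GPU mix translates into usable memory, the sketch below totals HBM across a pod and compares it with the footprint of a model's weights. The per-GPU capacities are NVIDIA's published figures for the SXM parts (80 GB for H100, 141 GB for H200; A100 ships as 40 GB or 80 GB, and 80 GB is assumed here). The function names and the 2-bytes-per-parameter figure (FP16/BF16 weights only, ignoring optimizer state, activations, and KV-cache) are assumptions for illustration, not HyperAI sizing guidance.

```python
# Published HBM capacity per GPU in GB.
# Assumption: the 80 GB A100 variant (a 40 GB variant also exists).
HBM_GB = {"H100": 80, "H200": 141, "A100": 80}


def aggregate_hbm_gb(gpu_counts):
    """Total HBM in GB for a GPU mix, e.g. {"H200": 16, "H100": 8}."""
    return sum(HBM_GB[sku] * count for sku, count in gpu_counts.items())


def weights_gb(params_billion, bytes_per_param=2):
    """Memory for model weights alone, in GB (billions of params x bytes each)."""
    return params_billion * bytes_per_param


pod = {"H100": 24}             # hypothetical all-H100 pod at the 24-GPU maximum
total = aggregate_hbm_gb(pod)  # 24 GPUs x 80 GB
need = weights_gb(70)          # hypothetical 70B-parameter model at 2 bytes/param
print(f"pod HBM: {total} GB, weights: {need} GB, fits: {need <= total}")
```

Weights are only the floor: training typically needs several times this for gradients and optimizer state, so a real sizing exercise would add those terms.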
02
Fabric plane — non-blocking InfiniBand
NDR (400 Gb/s) InfiniBand fabric between nodes, designed to be non-blocking for collective operations (NCCL all-reduce). Latency under 2 µs node-to-node. RoCEv2 Ethernet option for customers standardised on Ethernet: same throughput, slightly higher latency.
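To give a feel for what 400 Gb/s and 2 µs mean for collectives, here is a back-of-the-envelope estimate of a ring all-reduce using the standard bandwidth-plus-latency cost model. It is a sketch only: NCCL chooses among several algorithms and exploits NVLink inside a node, so real timings will differ. The function name, the one-link-per-rank assumption, and the example inputs are hypothetical.

```python
def ring_allreduce_seconds(message_bytes, num_ranks, link_gbps=400.0, latency_us=2.0):
    """Estimate ring all-reduce time with a simple bandwidth + latency model.

    A ring all-reduce runs 2*(N-1) steps, and each rank moves 1/N of the
    message per step, so it transfers 2*(N-1)/N of the message in total.
    Assumes one link per rank running at full line rate.
    """
    if num_ranks < 2:
        return 0.0  # nothing to reduce across a single rank
    bytes_per_sec = link_gbps * 1e9 / 8  # 400 Gb/s is 50 GB/s
    steps = 2 * (num_ranks - 1)
    transfer = (steps / num_ranks) * message_bytes / bytes_per_sec
    return transfer + steps * latency_us * 1e-6


# Hypothetical example: a 1 GB gradient buffer reduced across 3 ranks.
estimate = ring_allreduce_seconds(1e9, 3)
print(f"estimated all-reduce time: {estimate * 1e3:.2f} ms")
```

For large messages the bandwidth term dominates, which is why a non-blocking fabric matters more than shaving the last microsecond of latency.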
03
Storage plane — parallel + tiered
WekaFS or BeeGFS for training data (multi-GB/s per node, parallel I/O). NVMe-oF for inference KV-cache. Object storage (S3-compatible Ceph or MinIO) for cold artefacts and model registry. Sovereign — all tiers in-country.
04
Control plane — Kubernetes + Slurm + OpenStack
Slurm for batch training jobs. Kubernetes (with NVIDIA GPU Operator + KubeRay) for inference, RAG, and dev workloads. OpenStack Nova/Ironic for bare-metal lifecycle. Customer-managed KMS, SAML/OIDC auth, full audit trail.
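The split between schedulers described above can be written down as a small routing table. This is a minimal sketch: the workload labels and the `pick_scheduler` helper are hypothetical names for illustration, not part of any HyperAI API.

```python
# Which scheduler handles which workload class, per the description above.
# The keys are hypothetical labels chosen for this example.
SCHEDULER_FOR = {
    "batch-training": "slurm",
    "inference": "kubernetes",
    "rag": "kubernetes",
    "dev": "kubernetes",
}


def pick_scheduler(workload_kind):
    """Return the scheduler for a workload class, or raise on an unknown class."""
    try:
        return SCHEDULER_FOR[workload_kind]
    except KeyError:
        raise ValueError(f"unknown workload kind: {workload_kind!r}") from None


for kind in ("batch-training", "inference"):
    print(kind, "->", pick_scheduler(kind))
```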
Reference cluster topology — 24-GPU sovereign training pod
The sizing customers actually deploy for production AI in Egypt and KSA.
Deployment footprints — pick what fits your constraints
Not every workload runs in a hyperscale region. Sovereign means we deploy where your data lives.
Footprint A
MomentumX-managed in Cairo / Riyadh
You consume the cluster as-a-service from MomentumX-operated facilities. Fastest time-to-first-token (typically 14 days from contract). Same sovereignty guarantees, lower CAPEX.
Footprint B
Customer DC, MomentumX-operated
Cluster lives in your datacentre — Raya DC, Mobily, STC, or your private facility. We install, integrate, and run it. You hold physical control. Most regulated customers choose this.
Footprint C
Air-gapped sovereign deployment
No outbound network from the cluster. All updates, models, and patches delivered via signed offline bundle. For defence, intelligence, central bank, and high-classification ministries.
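The idea of refusing any bundle whose signature does not check out can be sketched with Python's standard library. This example uses HMAC-SHA256 with a shared key purely to stay self-contained; a production offline-update flow would more likely use asymmetric signatures, and the page does not say which scheme HyperAI uses. The function names and the key are hypothetical.

```python
import hashlib
import hmac


def sign_bundle(bundle, key):
    """Return a hex HMAC-SHA256 tag over the bundle bytes."""
    return hmac.new(key, bundle, hashlib.sha256).hexdigest()


def verify_bundle(bundle, tag_hex, key):
    """Check the tag in constant time; True only if the bundle is unmodified."""
    expected = sign_bundle(bundle, key)
    return hmac.compare_digest(expected, tag_hex)


key = b"example-shared-key"          # hypothetical key; never hard-code in practice
bundle = b"model weights and patches"
tag = sign_bundle(bundle, key)

print(verify_bundle(bundle, tag, key))                # untouched bundle
print(verify_bundle(bundle + b"tampered", tag, key))  # modified bundle
```

The point of the check is ordering: verification happens on the air-gapped side before anything in the bundle is unpacked or executed.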
Want the architecture review walkthrough?
Bring your model size, your data classification, and your latency targets. We’ll size a cluster, sketch the fabric, and tell you whether HyperAI is the right tool — or whether you should keep your existing stack and add a small inference pod first.