HyperAI Architecture

How a sovereign GPU cluster is actually built.

A HyperAI cluster is not a stack of GPUs in a rack. It’s a coordinated system: GPU compute, NVLink and InfiniBand fabrics, parallel storage, a sovereign control plane, and a software stack that makes the whole thing feel like one supercomputer. This page shows what that looks like in production for a MENA enterprise.

Apply for a 14-day POC
View GPU specifications

The four planes of a HyperAI cluster

Compute, fabric, storage, and control — engineered as one system, not four products bolted together.

Compute plane — NVIDIA H100 / H200 / A100

Up to 24 GPUs (H100/H200 SXM5 or A100 SXM4) in a single cluster, configured per workload. NVLink and NVSwitch provide intra-node bandwidth of 900 GB/s per GPU on H100/H200, giving model parallelism real headroom. Mixed SKUs are supported: train on H200, serve inference on H100, develop on A100.
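One common way to use that NVLink headroom is to keep tensor-parallel groups inside a single node, so their frequent collectives never leave NVSwitch, and to run data parallelism across nodes. The sketch below assumes the 3-node × 8-GPU layout of the reference pod and node-major rank numbering. `parallel_groups` is a hypothetical helper for illustration; frameworks such as Megatron-LM and DeepSpeed build their own process groups.

```python
"""Sketch: split 24 GPUs into tensor-parallel (TP) groups that stay inside
one NVLink domain, plus data-parallel (DP) groups that span nodes.
Assumes node-major rank numbering; the helper is hypothetical."""


def parallel_groups(num_nodes: int, gpus_per_node: int, tp_size: int):
    """Return (tp_groups, dp_groups) as lists of global rank lists.

    Node n owns ranks n*gpus_per_node .. (n+1)*gpus_per_node - 1.
    Because tp_size divides gpus_per_node, no TP group crosses a node
    boundary, so TP traffic stays on NVLink rather than InfiniBand.
    """
    if gpus_per_node % tp_size != 0:
        raise ValueError("tp_size must divide gpus_per_node")
    world = num_nodes * gpus_per_node
    tp_groups = [list(range(start, start + tp_size))
                 for start in range(0, world, tp_size)]
    # DP group k holds the k-th rank of every TP group (same model shard).
    dp_groups = [[group[k] for group in tp_groups] for k in range(tp_size)]
    return tp_groups, dp_groups


if __name__ == "__main__":
    tp, dp = parallel_groups(num_nodes=3, gpus_per_node=8, tp_size=8)
    print(tp)     # three TP groups of 8, one per node
    print(dp[0])  # [0, 8, 16]: the same shard on each node
```

With `tp_size=8`, each node holds one full model replica sharded over its 8 GPUs, and only the data-parallel gradient all-reduce crosses the InfiniBand fabric.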

Fabric plane — non-blocking InfiniBand

NDR (400 Gb/s) InfiniBand fabric between nodes, designed to be non-blocking for collective operations such as NCCL all-reduce. Node-to-node latency is under 2 µs. A RoCEv2 Ethernet option is available for customers standardised on Ethernet: same throughput, slightly higher latency.
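To see why the fabric's bandwidth matters more than its latency for training, here is a rough alpha-beta estimate of a ring all-reduce. It is a sketch under stated assumptions: gradients of a hypothetical 7B-parameter model in BF16 (14 GB), a hierarchical scheme where each node first reduces over NVLink and then acts as one ring member across InfiniBand at the 1.6 Tb/s (200 GB/s) per-node rate of the reference pod, and the 2 µs hop latency quoted above. Real NCCL chooses between ring, tree, and other algorithms at runtime, so treat this as an order-of-magnitude model only.

```python
def ring_allreduce_seconds(message_bytes: float, ranks: int,
                           link_bytes_per_s: float,
                           latency_s: float = 2e-6) -> float:
    """Alpha-beta estimate of a ring all-reduce.

    Reduce-scatter plus all-gather take 2*(n-1) steps; in each step every
    rank sends message_bytes / n, so each rank moves 2*(n-1)/n * message_bytes.
    """
    if ranks < 2:
        return 0.0  # a single rank has nothing to exchange
    steps = 2 * (ranks - 1)
    bytes_per_rank = steps * message_bytes / ranks
    return steps * latency_s + bytes_per_rank / link_bytes_per_s


if __name__ == "__main__":
    grads = 7e9 * 2        # hypothetical 7B parameters in BF16 (2 bytes each)
    node_bw = 1.6e12 / 8   # 1.6 Tb/s per node expressed in bytes/s (200 GB/s)
    t = ring_allreduce_seconds(grads, ranks=3, link_bytes_per_s=node_bw)
    print(f"{t * 1e3:.1f} ms")
```

This prints `93.3 ms`; the latency term contributes only 8 µs of that, which is why a non-blocking, full-bandwidth fabric dominates collective performance for large gradient buffers.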

Storage plane — parallel + tiered

WekaFS or BeeGFS for training data (multi-GB/s per node, parallel I/O). NVMe-oF for inference KV-cache. Object storage (S3-compatible Ceph or MinIO) for cold artefacts and model registry. Sovereign — all tiers in-country.
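A quick calculation shows why long-context inference needs its own KV-cache tier. The sketch assumes a Llama-3-70B-like attention shape (80 layers, 8 grouped-query KV heads, head dimension 128, FP16/BF16 cache) and uses the 50 TB NVMe-oF tier from the reference pod; the helper names are hypothetical.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache for one sequence of `tokens` tokens.

    The leading factor 2 covers one key and one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem


def sequences_that_fit(tier_bytes: int, per_sequence_bytes: int) -> int:
    """How many full-length sequence caches a storage tier can hold."""
    return tier_bytes // per_sequence_bytes


if __name__ == "__main__":
    per_token = kv_cache_bytes(80, 8, 128, tokens=1)
    per_seq = kv_cache_bytes(80, 8, 128, tokens=131_072)  # 128k context
    print(per_token)                                       # 327680 bytes
    print(per_seq / 2**30)                                 # 40.0 GiB
    print(sequences_that_fit(50 * 10**12, per_seq))        # 1164
```

At roughly 40 GiB per 128k-token sequence, only one or two such caches fit alongside the weights in an 80 GB GPU, so offloading inactive caches to NVMe-oF is what makes long contexts practical.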

Control plane — Kubernetes + Slurm + OpenStack

Slurm for batch training jobs. Kubernetes (with NVIDIA GPU Operator + KubeRay) for inference, RAG, and dev workloads. OpenStack Nova/Ironic for bare-metal lifecycle. Customer-managed KMS, SAML/OIDC auth, full audit trail.
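On the Kubernetes side, the NVIDIA device plugin deployed by the GPU Operator exposes GPUs as the extended resource `nvidia.com/gpu`, which a workload requests through container limits. The sketch below builds a minimal Pod manifest in that shape; the pod name, namespace, and image URL are hypothetical placeholders. Batch training on the Slurm side would instead request GPUs as generic resources (for example `--gres=gpu:8`).

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int,
                     namespace: str = "inference") -> dict:
    """Build a minimal Pod manifest that requests whole GPUs.

    Extended resources such as nvidia.com/gpu are requested via limits
    and must be whole numbers; fractional GPUs need MIG or time-slicing.
    """
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical image in an in-country registry.
    pod = gpu_pod_manifest("vllm-demo",
                           "registry.example.internal/vllm:latest", gpus=2)
    # kubectl accepts JSON manifests, e.g. `python pod.py | kubectl apply -f -`
    print(json.dumps(pod, indent=2))
```

In production an inference server would normally run under a Deployment or a KubeRay `RayService` rather than a bare Pod; the resource request looks the same.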

Reference cluster topology — 24-GPU sovereign training pod

The sizing customers actually deploy for production AI in Egypt and KSA.

Compute: 3× DGX-class nodes, each with 8× H100 SXM5 80 GB (24 GPUs total), 2 TB DDR5 RAM, and 2× Intel Xeon Platinum 8480+ CPUs.

Fabric: 4× ConnectX-7 NDR (400 Gb/s) per node, for 1.6 Tb/s aggregate per node. Non-blocking IB switching (Quantum-2). RoCEv2 alternative available.

Storage: 500 TB WekaFS hot tier (NVMe), 2 PB Ceph cold tier (HDD/QLC), and 50 TB NVMe-oF KV-cache for inference.

Control: 3× management nodes (HA Slurm + Kubernetes control plane), customer-managed Vault for KMS, dual-stack IPv4 + IPv6 networking.

Power + cooling: ~36 kW per compute node (108 kW per pod), liquid-assisted air or direct liquid cooling at the customer DC or Raya DC.

Sovereignty: All four planes in Cairo or Riyadh. No traffic traverses an OpenAI / AWS / Azure region. Customer holds keys.
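The pod-level totals follow directly from the per-node figures above. This small sketch encodes them so they can be re-derived when a customer changes node count or SKU; the class is hypothetical, the power figure is taken as stated rather than measured, and total HBM is derived from 80 GB per GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSpec:
    """Per-node figures from the reference topology (hypothetical helper)."""
    nodes: int = 3
    gpus_per_node: int = 8
    hbm_gb_per_gpu: int = 80
    nics_per_node: int = 4
    nic_gbps: int = 400        # NDR line rate per ConnectX-7 port
    kw_per_node: float = 36.0  # stated power budget per compute node

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    @property
    def total_hbm_gb(self) -> int:
        return self.total_gpus * self.hbm_gb_per_gpu

    @property
    def node_injection_tbps(self) -> float:
        return self.nics_per_node * self.nic_gbps / 1000  # Gb/s -> Tb/s

    @property
    def node_injection_gbytes(self) -> float:
        return self.nics_per_node * self.nic_gbps / 8      # Gb/s -> GB/s

    @property
    def pod_kw(self) -> float:
        return self.nodes * self.kw_per_node


if __name__ == "__main__":
    pod = PodSpec()
    print(pod.total_gpus, pod.total_hbm_gb, pod.node_injection_tbps,
          pod.node_injection_gbytes, pod.pod_kw)
```

For the defaults this prints `24 1920 1.6 200.0 108.0`: 24 GPUs, 1.92 TB of HBM, 1.6 Tb/s (200 GB/s) of fabric bandwidth per node, and a 108 kW pod.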

The software stack

Everything you’d expect from a sovereign GPU cloud — pre-integrated and supported.

Orchestration

Slurm 23.x · Kubernetes 1.30 · Ray 2.x · NVIDIA GPU Operator · NVIDIA Network Operator · KubeRay

Frameworks

PyTorch 2.x · TensorFlow 2.x · JAX · DeepSpeed · Megatron-LM · NeMo · vLLM · TensorRT-LLM · TGI

Models supported

Llama 3 / 3.1 · Mistral · Mixtral · Falcon · Jais · AceGPT · Qwen · DeepSeek · Stable Diffusion XL · custom fine-tunes

Storage

WekaFS · BeeGFS · Lustre · Ceph (S3-compatible) · MinIO · NVMe-oF · POSIX gateways

Networking

NVIDIA Quantum-2 InfiniBand NDR · ConnectX-7 · BlueField DPUs · Cumulus / SONiC for Ethernet · NCCL / RCCL

Observability

NVIDIA DCGM · Prometheus + Grafana · Loki for logs · OpenTelemetry · per-GPU utilization, per-job cost attribution
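Per-job cost attribution ultimately reduces to GPU-hours per job rolled up by owner. The sketch below assumes allocation-based accounting (a job pays for every GPU it holds, busy or idle) and a flat internal chargeback rate; `JobUsage`, the team names, and the rate of 2.5 per GPU-hour are all hypothetical, not HyperAI pricing. In practice the inputs would come from scheduler accounting or DCGM job statistics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobUsage:
    job_id: str
    team: str
    gpus: int            # GPUs allocated to the job
    wall_seconds: float  # elapsed time the allocation was held


def attribute_cost(jobs, rate_per_gpu_hour: float) -> dict:
    """Sum GPU-hours per team and convert them to cost at a flat rate."""
    totals = defaultdict(float)
    for job in jobs:
        gpu_hours = job.gpus * job.wall_seconds / 3600
        totals[job.team] += gpu_hours * rate_per_gpu_hour
    return dict(totals)


if __name__ == "__main__":
    usage = [
        JobUsage("train-001", "nlp", gpus=8, wall_seconds=7200),
        JobUsage("infer-042", "vision", gpus=2, wall_seconds=1800),
        JobUsage("dev-007", "nlp", gpus=1, wall_seconds=3600),
    ]
    print(attribute_cost(usage, rate_per_gpu_hour=2.5))
```

This prints `{'nlp': 42.5, 'vision': 2.5}`. A utilization-weighted variant would multiply each job's GPU-hours by its average DCGM utilization instead of charging for the full allocation.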

Security

Customer-managed KMS (Vault) · BYOK · TPM-backed attestation · network microsegmentation · SAML/OIDC · full audit log

Deployment footprints — pick what fits your constraints

Not every workload runs in a hyperscale region. Sovereign means we deploy where your data lives.

Footprint A

MomentumX-managed in Cairo / Riyadh

You consume the cluster as-a-service from MomentumX-operated facilities. Fastest time-to-first-token (typically 14 days from contract). Same sovereignty guarantees, lower CAPEX.

Footprint B

Customer DC, MomentumX-operated

Cluster lives in your datacentre — Raya DC, Mobily, STC, or your private facility. We install, integrate, and run it. You hold physical control. Most regulated customers choose this.

Footprint C

Air-gapped sovereign deployment

No outbound network from the cluster. All updates, models, and patches are delivered via signed offline bundle. For defence and intelligence agencies, central banks, and high-classification ministries.
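On the receiving side of an air gap, every file in an offline bundle should be checked against a manifest before anything is installed. The sketch below assumes a hypothetical manifest format, `{"files": {"relative/path": "<sha256 hex>"}}`, stored as `MANIFEST.json` in the bundle root. It covers only the integrity step: verifying the manifest's own signature (with whatever signing tool the deployment uses) must happen first and is out of scope here, since the Python standard library has no asymmetric signature verification.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model weights fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_bundle(bundle_dir: Path, manifest_name: str = "MANIFEST.json") -> list:
    """Return sorted paths that are missing, altered, unlisted, or unsafe.

    An empty list means every listed file matches its digest and the
    bundle contains nothing the manifest does not declare.
    """
    root = bundle_dir.resolve()
    manifest = json.loads((root / manifest_name).read_text())
    listed = manifest["files"]
    problems = []
    for rel_path, expected in listed.items():
        target = (root / rel_path).resolve()
        # Reject entries such as "../etc/passwd" that escape the bundle.
        if not target.is_relative_to(root):
            problems.append(rel_path)
        elif not target.is_file() or sha256_file(target) != expected:
            problems.append(rel_path)
    # Files smuggled into the bundle without a manifest entry are also flagged.
    for path in root.rglob("*"):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and rel != manifest_name and rel not in listed:
            problems.append(rel)
    return sorted(problems)
```

Flagging unlisted files, not just mismatched ones, matters in an air-gapped setting: an attacker who can add a file to the bundle should not be able to slip it past the check.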

Want the architecture review walkthrough?

Bring your model size, your data classification, and your latency targets. We’ll size a cluster, sketch the fabric, and tell you whether HyperAI is the right tool — or whether you should keep your existing stack and add a small inference pod first.
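A first-pass sizing usually starts from memory per parameter. The sketch below uses common rules of thumb: 2 bytes per parameter for BF16 inference weights, 1 for INT8, and about 16 for mixed-precision training with Adam (BF16 weights and gradients plus FP32 master weights and two optimizer moments, the estimate popularized by the ZeRO work). It deliberately ignores activations and KV cache, and `usable_fraction` is a hypothetical headroom allowance, so the result is a lower bound on GPU count, not a quote.

```python
import math

# Rough bytes of GPU memory per model parameter (activations excluded).
BYTES_PER_PARAM = {
    "bf16_inference": 2,
    "int8_inference": 1,
    "adam_mixed_precision_training": 16,  # 2 + 2 + 12 bytes of FP32 state
}


def min_gpus(params_billion: float, mode: str,
             gpu_mem_gb: float = 80, usable_fraction: float = 0.9) -> int:
    """Lower bound on GPUs needed to hold model state, assuming it is
    sharded evenly (e.g. tensor parallelism or ZeRO-3/FSDP)."""
    need_gb = params_billion * BYTES_PER_PARAM[mode]  # 1e9 params x bytes = GB
    return math.ceil(need_gb / (gpu_mem_gb * usable_fraction))


if __name__ == "__main__":
    for size, mode in [(70, "bf16_inference"),
                       (70, "adam_mixed_precision_training"),
                       (8, "int8_inference")]:
        print(size, mode, min_gpus(size, mode))
```

On 80 GB GPUs this gives 2, 16, and 1 GPUs respectively: a 70B model can be served from a small inference pod, while full fine-tuning of the same model already consumes most of a 24-GPU pod before activations are counted.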

Book an architecture review