Platform Overview
Cyberun Cloud is not just a collection of servers; it is a cohesive, software-defined operating system for global infrastructure. We abstract the complexity of disparate bare-metal providers into a unified, resilient, and sovereign cloud surface.
This document outlines the architectural philosophy and the physical topology that powers our platform.
The Cyberun Philosophy
Traditional multi-cloud strategies often rely on the lowest common denominator (e.g., just using VMs), losing the advanced features of each provider. Cyberun takes a different approach: We bring our own cloud stack.
- Commodity Hardware, Premium Software: We treat physical servers as interchangeable commodities. All intelligence—routing, storage, identity, and security—is strictly defined in software.
- Federation by Default: We do not build standalone clusters that need manual syncing. We build a single "Federated Control Plane" that treats the entire world as one resource pool.
- GitOps Everything: If it’s not in Git, it doesn’t exist. From the underlying OS configuration (Ansible) to the application layer (FluxCD), every state is version-controlled.
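The GitOps principle boils down to a reconciliation loop: compare what Git declares with what the cluster is actually running, then converge. The sketch below is a minimal, illustrative version of that loop in plain Python; the dictionaries stand in for Git manifests and live cluster state, and none of it reflects FluxCD's actual internals.

```python
# Minimal, illustrative reconciliation loop: the essence of GitOps.
# "desired" stands in for manifests committed to Git; "live" for the
# cluster's current state. FluxCD performs this convergence for real;
# the names and structures here are illustrative only.

from typing import Dict

def reconcile(desired: Dict[str, dict], live: Dict[str, dict]) -> Dict[str, dict]:
    """Return the live state converged onto the desired (Git) state."""
    converged = {}
    for name, spec in desired.items():
        if live.get(name) != spec:
            print(f"apply   {name}")      # create or update a drifted resource
        converged[name] = spec
    for name in live.keys() - desired.keys():
        print(f"prune   {name}")          # not in Git => it doesn't exist
    return converged

if __name__ == "__main__":
    git = {"web": {"replicas": 3}, "api": {"replicas": 2}}
    cluster = {"web": {"replicas": 1}, "legacy": {"replicas": 5}}
    print(reconcile(git, cluster))
```

In our platform, FluxCD runs this convergence continuously at the application layer, while Ansible does the same for the underlying OS configuration.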
Architectural Topology
To balance performance, cost, and resilience, we divide our infrastructure into three strategic tiers: what we call the "Brain-Muscle-Vault" topology.
System Landscape
Our architecture spans the full stack, from Edge GeoDNS down to NVMe storage. For the best viewing experience, we have moved the high-resolution topology map to a dedicated page.
1. The Brain: Control Plane (Carrier)
- Location: Tokyo
- Role: Management, Scheduling, Identity.
- Design:
  - This cluster runs the Karmada federation engine, the Rancher dashboard, and the Keycloak identity provider.
  - Why Separate? By isolating the "Brain" from heavy application workloads, we ensure the management interface is always responsive, even if the compute nodes are under 100% load (see the placement sketch below).
  - Storage: Uses local Longhorn storage for high-speed metadata access and system-state stability.
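To make the isolation argument concrete, here is a toy placement filter with hypothetical node names and a simple role label. In production this separation is enforced with Kubernetes scheduling primitives (taints/tolerations and placement policies), not application code.

```python
# Toy placement filter illustrating the "Brain vs. Muscle" separation.
# In the real platform this is enforced with Kubernetes taints/tolerations
# and federation placement policies; node names and labels are hypothetical.

NODES = [
    {"name": "carrier-1",   "role": "control-plane"},   # Tokyo (Brain)
    {"name": "destroyer-1", "role": "compute"},         # New York (Muscle)
    {"name": "aegis-1",     "role": "compute"},         # Nuremberg (Muscle)
]

def schedulable_nodes(workload_kind: str) -> list[str]:
    """Application workloads never land on the management cluster."""
    if workload_kind == "application":
        return [n["name"] for n in NODES if n["role"] == "compute"]
    return [n["name"] for n in NODES]  # management components may run anywhere

print(schedulable_nodes("application"))   # -> ['destroyer-1', 'aegis-1']
```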
2. The Muscle: Compute Plane (Destroyer & Aegis)
- Location: New York (Destroyer) & Nuremberg (Aegis)
- Role: Application Workloads, AI Training, Data Processing.
- Design:
  - These are high-performance bare-metal or dedicated-core instances.
  - Stateless Nature: Compute nodes are designed to be "disposable." If a node fails, its workloads are automatically rescheduled (see the sketch after this list); they hold no persistent data locally.
  - Specialization:
    - Destroyer: General-purpose microservices and web applications.
    - Aegis: Specialized GPU nodes for AI inference and training tasks.
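The sketch below illustrates what "disposable" means in practice, using hypothetical node and pod names: when a node drops out, its workloads are simply re-placed on healthy peers, because there is no local state to migrate. Kubernetes does this automatically; this is only a conceptual model, not the scheduler's real algorithm.

```python
# Toy rescheduling pass showing why stateless compute nodes are "disposable":
# when a node disappears, its workloads simply move to healthy peers.
# Kubernetes performs this for real; node and pod names are hypothetical.

from collections import defaultdict

placement = {
    "destroyer-1": ["web-a", "web-b"],
    "destroyer-2": ["api-a"],
    "aegis-1":     ["inference-a"],
}

def reschedule(current: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    new = defaultdict(list, {n: list(p) for n, p in current.items() if n != failed})
    healthy = sorted(new, key=lambda n: len(new[n]))        # least-loaded first
    for i, pod in enumerate(current.get(failed, [])):
        target = healthy[i % len(healthy)]
        new[target].append(pod)                             # no local state to move
    return dict(new)

print(reschedule(placement, failed="destroyer-1"))
```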
3. The Vault: Storage Plane (Auxiliary)
- Location: New York (Low-latency link to Destroyer)
- Role: Persistent Data, Object Storage, Disaster Recovery.
- Design:
  - Runs Rook-Ceph in a highly redundant configuration.
  - Storage-Compute Separation: High-speed links connect the Compute Plane (Destroyer) to the Storage Plane (Auxiliary), so compute and storage scale independently. This architecture significantly reduces Total Cost of Ownership (TCO) while providing PB-scale elasticity with stable ~1.0 ms I/O latency.
  - Benefit: You can scale compute (CPU/RAM) and storage (disk) independently. Heavy database I/O does not steal CPU cycles from your application.
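As a rough illustration of independent scaling, the snippet below converts raw Vault capacity into usable capacity under an assumed 3x Ceph replication factor (a common default, not a documented figure for this cluster). Adding OSD disks grows the usable pool without touching a single compute node.

```python
# Back-of-the-envelope capacity math for the Vault. A 3x replication
# factor is a common Ceph default and is assumed here, not a documented
# value for this cluster.

REPLICATION_FACTOR = 3   # assumption: size=3 replicated pools

def usable_tb(raw_tb: float, replication: int = REPLICATION_FACTOR) -> float:
    return raw_tb / replication

# Scaling storage never touches the compute plane: add OSD disks, gain capacity.
for raw in (48, 96, 192):                      # hypothetical raw TB across OSD nodes
    print(f"{raw:>4} TB raw  ->  {usable_tb(raw):.1f} TB usable")
```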
The Tech Stack
We stand on the shoulders of open-source giants to deliver this architecture.
| Component | Technology | Why We Chose It |
|---|---|---|
| Orchestration | Karmada | Allows us to schedule applications across clusters as if they were one. |
| Networking | Cilium (eBPF) | Provides high-performance networking, security policies, and observability without sidecars. |
| Connectivity | WireGuard | Creates a secure, full-mesh encrypted tunnel between all global nodes (see the mesh sizing note below the table). |
| Storage | Rook-Ceph | The industry standard for software-defined storage: self-healing and scalable to petabytes. |
| Edge Routing | HAProxy + Keepalived | A battle-tested, unbreakable front door for ingress traffic. |
| Automation | FluxCD | Ensures our clusters are always in sync with our Git repositories. |
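A short note on the Connectivity row: a full mesh means every node peers with every other node, so the tunnel count grows quadratically with fleet size. The node counts below are hypothetical and only show the scaling behaviour.

```python
# Why full mesh matters operationally: tunnel count grows quadratically.
# Node counts below are hypothetical, not the platform's actual fleet size.

def mesh_tunnels(nodes: int) -> int:
    return nodes * (nodes - 1) // 2

for n in (4, 8, 16):
    print(f"{n:>2} nodes -> {mesh_tunnels(n):>3} WireGuard tunnels")
```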
How Traffic Flows
```mermaid
graph TD
    %% Style Definitions: High Contrast Pastel
    classDef user fill:#ffffff,stroke:#333333,stroke-width:2px,color:#000000;
    classDef edge fill:#e0f2f1,stroke:#00695c,stroke-width:2px,color:#000000;
    classDef core fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000000;
    classDef data fill:#fff8e1,stroke:#fbc02d,stroke-width:2px,color:#000000;

    User((User Request)):::user
    DNS[GeoDNS Resolver]:::edge

    subgraph EdgeLayer [Edge Access Layer]
        direction TB
        LB[HAProxy Load Balancer]:::edge
        WG[WireGuard Tunnel]:::edge
    end

    subgraph ClusterLayer [Kubernetes Cluster]
        direction TB
        Ingress[Cilium Ingress eBPF]:::core
        Svc[Service Mesh]:::core
        Pod[App Workload]:::core
    end

    subgraph StorageLayer [Persistence]
        direction TB
        RBD[Ceph RBD Volume]:::data
    end

    %% Main Flow (Thick Lines)
    User ==>|1. Request api.cyberun.cloud| DNS
    DNS ==>|2. Resolve Nearest IP| LB
    LB ==>|3. Scrub & Forward| Ingress
    Ingress ==>|4. L7 Routing| Svc
    Svc ==>|5. Load Balance| Pod
    Pod <==>|6. High-Speed I/O| RBD

    %% Cross-Region Flow (Dotted Lines)
    LB -.->|Cross-Region Traffic| WG
    WG -.->|Encrypted Transit| LB
```
- Ingress: A user request is resolved by GeoDNS to the nearest region and hits our HAProxy load balancer (Anycast); a toy region-selection sketch follows this list.
- Routing: HAProxy forwards the request to the Cilium Ingress controller inside the nearest cluster.
- Service Mesh: If the service needs to talk to a database in another region, traffic flows securely over the WireGuard mesh, optimized by Cilium's native routing.
- Persistence: If data needs to be written, it travels over the high-speed link to the Ceph storage cluster, where it is replicated for safety.
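As a conceptual stand-in for steps 1 and 2, the sketch below picks the closest healthy region by measured latency. Real GeoDNS/Anycast steering is decided from resolver geolocation and routing, not client-side probes, and the regions and latencies here are made up.

```python
# Conceptual stand-in for steps 1-2 of the flow: steer the user to the
# closest healthy region. Real GeoDNS/Anycast decides this from resolver
# geolocation and routing, not client-side probes; latencies are made up.

REGION_LATENCY_MS = {        # hypothetical probe results from one client
    "tokyo":     182.0,
    "new-york":   12.5,
    "nuremberg":  95.3,
}
HEALTHY = {"tokyo", "new-york", "nuremberg"}

def pick_region(latencies: dict[str, float], healthy: set[str]) -> str:
    candidates = {r: ms for r, ms in latencies.items() if r in healthy}
    return min(candidates, key=candidates.get)

print(pick_region(REGION_LATENCY_MS, HEALTHY))   # -> 'new-york'
```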
Operational Excellence
Architecture is not just about building; it's about running. We treat Day-2 operations as first-class citizens of the platform:
- Full-Stack Observability: Every Pod and every WireGuard tunnel is instrumented by default. See Full-Stack Observability.
- Zero-Downtime Maintenance: From kernel patching to K8s upgrades, our architecture ensures business continuity. See Zero-Downtime Maintenance.
- Designed for Failure: We don't assume hardware is reliable; we assume it will break. See Disaster Recovery Runbook.
Navigate to the Infrastructure section for deep dives into specific components.