
Platform Overview

Cyberun Cloud is not just a collection of servers; it is a cohesive, software-defined operating system for global infrastructure. We abstract the complexity of disparate bare-metal providers into a unified, resilient, and sovereign cloud surface.

This document outlines the architectural philosophy and the physical topology that powers our platform.

The Cyberun Philosophy

Traditional multi-cloud strategies often rely on the lowest common denominator (e.g., just using VMs), losing the advanced features of each provider. Cyberun takes a different approach: We bring our own cloud stack.

  • Commodity Hardware, Premium Software: We treat physical servers as interchangeable commodities. All intelligence—routing, storage, identity, and security—is strictly defined in software.
  • Federation by Default: We do not build standalone clusters that need manual syncing. We build a single "Federated Control Plane" that treats the entire world as one resource pool.
  • GitOps Everything: If it’s not in Git, it doesn’t exist. From the underlying OS configuration (Ansible) to the application layer (FluxCD), every state is version-controlled (see the reconcile-loop sketch below).
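
Under the hood, GitOps is a reconcile loop: the desired state is read from Git, diffed against the live cluster state, and any drift is converged automatically. The sketch below illustrates that loop in Python; the `git_desired_state`, `cluster_live_state`, and `apply` helpers are hypothetical stand-ins, not FluxCD's actual code.

```python
import time

def git_desired_state() -> dict:
    """Hypothetical helper: parse the manifests checked out from Git."""
    return {"deployment/api": {"replicas": 3}, "deployment/worker": {"replicas": 5}}

def cluster_live_state() -> dict:
    """Hypothetical helper: query the cluster for what is actually running."""
    return {"deployment/api": {"replicas": 3}, "deployment/worker": {"replicas": 2}}

def apply(resource: str, spec: dict) -> None:
    """Hypothetical helper: push the desired spec to the cluster."""
    print(f"applying {resource} -> {spec}")

def reconcile_once() -> None:
    desired, live = git_desired_state(), cluster_live_state()
    for resource, spec in desired.items():
        if live.get(resource) != spec:   # drift detected between Git and cluster
            apply(resource, spec)        # converge toward the Git-defined state

if __name__ == "__main__":
    while True:                          # FluxCD runs this on an interval / on push
        reconcile_once()
        time.sleep(60)
```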

Architectural Topology

To balance performance, cost, and resilience, we divide our infrastructure into three strategic tiers. We call this the "Brain-Muscle-Vault" Topology.

System Landscape

Our architecture spans the full stack, from Edge GeoDNS down to NVMe storage. For the best viewing experience, we have moved the high-resolution topology map to a dedicated page.

🗺️ View Full-Screen Topology

1. The Brain: Control Plane (Carrier)

  • Location: Tokyo
  • Role: Management, Scheduling, Identity.
  • Design:
    • This cluster runs the Karmada federation engine, the Rancher dashboard, and the Keycloak identity provider (see the sketch after this list).
    • Why Separate? By isolating the "Brain" from heavy application workloads, we ensure the management interface is always responsive, even if the compute nodes are under 100% load.
    • Storage: Uses local Longhorn storage for high-speed metadata access and system-state stability.
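
To make the "one resource pool" idea concrete, the sketch below walks a set of member clusters from a single vantage point and prints each node's allocatable CPU. It uses the official `kubernetes` Python client; the kubeconfig context names are hypothetical, and Karmada's real scheduling API is far richer than this illustration.

```python
# Requires the official Kubernetes Python client (pip install kubernetes).
from kubernetes import client, config

# Hypothetical kubeconfig context names, one per cluster in the topology.
MEMBER_CLUSTERS = ["carrier", "destroyer", "aegis", "auxiliary"]

def cluster_capacity(context: str) -> None:
    """List the nodes of one member cluster and print their allocatable CPU."""
    api = client.CoreV1Api(api_client=config.new_client_from_config(context=context))
    for node in api.list_node().items:
        cpu = node.status.allocatable.get("cpu", "?")
        print(f"{context:>10} | {node.metadata.name:<30} | allocatable cpu: {cpu}")

if __name__ == "__main__":
    # The "Brain" treats all member clusters as one resource pool;
    # Karmada does this natively, this loop only illustrates the view.
    for ctx in MEMBER_CLUSTERS:
        cluster_capacity(ctx)
```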

2. The Muscle: Compute Plane (Destroyer & Aegis)

  • Location: New York (Destroyer) & Nuremberg (Aegis)
  • Role: Application Workloads, AI Training, Data Processing.
  • Design:
    • These are high-performance bare-metal or dedicated-core instances.
    • Stateless Nature: Compute nodes are designed to be "disposable." If a node fails, its workloads are automatically rescheduled onto healthy nodes (see the sketch after this list). They do not hold persistent data locally.
    • Specialization:
      • Destroyer: General-purpose microservices and web applications.
      • Aegis: Specialized GPU nodes for AI inference and training tasks.
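
The "disposable node" workflow can be sketched with the official `kubernetes` Python client: cordon a suspect node so nothing new lands on it, then list the pods that the scheduler will place elsewhere once they are evicted. The node name below is hypothetical.

```python
# Requires the official Kubernetes Python client; the node name is hypothetical.
from kubernetes import client, config

def cordon_and_list(node_name: str) -> None:
    """Mark a failing compute node unschedulable, then show the pods that will
    be rescheduled onto healthy nodes once they are evicted."""
    config.load_kube_config()            # use the current kubeconfig
    core = client.CoreV1Api()

    # Cordon: no new pods land on this node.
    core.patch_node(node_name, {"spec": {"unschedulable": True}})

    # Because compute nodes hold no persistent data, these pods can move to
    # any healthy node without a data-migration step.
    pods = core.list_pod_for_all_namespaces(
        field_selector=f"spec.nodeName={node_name}"
    )
    for pod in pods.items:
        print(f"to be rescheduled: {pod.metadata.namespace}/{pod.metadata.name}")

if __name__ == "__main__":
    cordon_and_list("destroyer-worker-03")   # hypothetical node name
```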

3. The Vault: Storage Plane (Auxiliary)

  • Location: New York (Low-latency link to Destroyer)
  • Role: Persistent Data, Object Storage, Disaster Recovery.
  • Design:
    • Runs Rook-Ceph in a highly redundant configuration.
    • Storage-Compute Separation: High-speed links connect the Compute Plane (Destroyer) to the Storage Plane (Auxiliary), significantly reducing Total Cost of Ownership (TCO) while providing PB-scale elasticity with stable ~1.0 ms I/O latency (see the probe sketched after this list).
    • Benefit: You can scale compute (CPU/RAM) and storage (disk) independently, and heavy database I/O does not steal CPU cycles from your application.
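
One way to spot-check the ~1.0 ms figure from inside a workload is an fsync-timed write probe against the RBD-backed mount. This is a rough Python sketch with a hypothetical mount path; a real benchmark should use a dedicated tool such as fio.

```python
# Rough latency probe for a mounted volume (e.g. a Ceph RBD PersistentVolume).
# The mount path is hypothetical; run it from a pod that mounts the volume.
import os
import statistics
import time

MOUNT_PATH = "/data/latency-probe.bin"   # hypothetical RBD-backed mount
BLOCK = os.urandom(4096)                 # one 4 KiB block per write

def probe(samples: int = 100) -> None:
    latencies_ms = []
    with open(MOUNT_PATH, "wb", buffering=0) as f:
        for _ in range(samples):
            start = time.perf_counter()
            f.write(BLOCK)
            os.fsync(f.fileno())         # force the write through to the backend
            latencies_ms.append((time.perf_counter() - start) * 1000)
    print(f"median write latency: {statistics.median(latencies_ms):.2f} ms")

if __name__ == "__main__":
    probe()
    os.remove(MOUNT_PATH)
```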

The Tech Stack

We stand on the shoulders of open-source giants to deliver this architecture.

Component        Technology              Why We Chose It
Orchestration    Karmada                 Allows us to schedule applications across clusters as if they were one.
Networking       Cilium (eBPF)           Provides high-performance networking, security policies, and observability without sidecars.
Connectivity     WireGuard               Creates a secure, full-mesh encrypted tunnel between all global nodes (sketched below).
Storage          Rook-Ceph               The industry standard for software-defined storage; self-healing and scalable to petabytes.
Edge Routing     HAProxy + Keepalived    A battle-tested, unbreakable front door for ingress traffic.
Automation       FluxCD                  Ensures our clusters are always in sync with our Git repositories.
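
The full-mesh connectivity row deserves a closer look: every node carries one [Interface] section plus one [Peer] section per other node. The Python sketch below generates that shape for a hypothetical four-node mesh; the hostnames, tunnel IPs, and key placeholders are illustrative, not our production values.

```python
# Illustrative generator for a full-mesh WireGuard topology: every node gets
# one [Interface] section and one [Peer] section per other node.
# All names, keys, and addresses below are hypothetical placeholders.
NODES = {
    "carrier":   {"endpoint": "carrier.example.net:51820",   "tunnel_ip": "10.10.0.1"},
    "destroyer": {"endpoint": "destroyer.example.net:51820", "tunnel_ip": "10.10.0.2"},
    "aegis":     {"endpoint": "aegis.example.net:51820",     "tunnel_ip": "10.10.0.3"},
    "auxiliary": {"endpoint": "auxiliary.example.net:51820", "tunnel_ip": "10.10.0.4"},
}

def render_config(name: str) -> str:
    """Return a wg-quick style config for one node of the mesh."""
    lines = [
        "[Interface]",
        f"Address = {NODES[name]['tunnel_ip']}/24",
        "ListenPort = 51820",
        f"PrivateKey = <{name}-private-key>",   # placeholder; never commit real keys
    ]
    for peer, info in NODES.items():
        if peer == name:
            continue
        lines += [
            "",
            "[Peer]",
            f"PublicKey = <{peer}-public-key>",
            f"Endpoint = {info['endpoint']}",
            f"AllowedIPs = {info['tunnel_ip']}/32",
            "PersistentKeepalive = 25",
        ]
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_config("carrier"))   # config for one node of the full mesh
```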

How Traffic Flows

graph TD
    %% Style Definitions: High Contrast Pastel
    classDef user fill:#ffffff,stroke:#333333,stroke-width:2px,color:#000000;
    classDef edge fill:#e0f2f1,stroke:#00695c,stroke-width:2px,color:#000000;
    classDef core fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000000;
    classDef data fill:#fff8e1,stroke:#fbc02d,stroke-width:2px,color:#000000;

    User((User Request)):::user
    DNS[GeoDNS Resolver]:::edge

    subgraph EdgeLayer [Edge Access Layer]
        direction TB
        LB[HAProxy Load Balancer]:::edge
        WG[WireGuard Tunnel]:::edge
    end

    subgraph ClusterLayer [Kubernetes Cluster]
        direction TB
        Ingress[Cilium Ingress eBPF]:::core
        Svc[Service Mesh]:::core
        Pod[App Workload]:::core
    end

    subgraph StorageLayer [Persistence]
        direction TB
        RBD[Ceph RBD Volume]:::data
    end

    %% Main Flow (Thick Lines)
    User ==>|1. Request api.cyberun.cloud| DNS
    DNS ==>|2. Resolve Nearest IP| LB
    LB ==>|3. Scrub & Forward| Ingress
    Ingress ==>|4. L7 Routing| Svc
    Svc ==>|5. Load Balance| Pod
    Pod <==>|6. High-Speed I/O| RBD

    %% Cross-Region Flow (Dotted Lines)
    LB -.->|Cross-Region Traffic| WG
    WG -.->|Encrypted Transit| LB

  1. Ingress: A user request is resolved by GeoDNS/Anycast to the nearest region and hits our HAProxy load balancer (see the sketch after this list).
  2. Routing: HAProxy forwards the request to the Cilium Ingress controller inside the nearest cluster.
  3. Service Mesh: If the service needs to talk to a database in another region, traffic flows securely over the WireGuard mesh, optimized by Cilium's native routing.
  4. Persistence: If data needs to be written, it travels over the high-speed link to the Ceph storage cluster, where it is replicated for safety.
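
The region selection in step 1 happens at the DNS layer, but its effect is easy to approximate from a client: time a TCP handshake to each regional entry point and pick the fastest. A minimal Python sketch, using hypothetical per-region hostnames rather than our real endpoints:

```python
# Approximate GeoDNS "nearest region" selection from the client side by timing
# a TCP connect to each candidate. Hostnames are hypothetical placeholders.
import socket
import time

REGIONAL_ENDPOINTS = {
    "tokyo":     "jp.api.example.net",
    "new-york":  "us.api.example.net",
    "nuremberg": "eu.api.example.net",
}

def connect_latency(host: str, port: int = 443, timeout: float = 2.0) -> float:
    """Return the TCP connect time in milliseconds (inf on failure)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000
    except OSError:
        return float("inf")

if __name__ == "__main__":
    timings = {region: connect_latency(host)
               for region, host in REGIONAL_ENDPOINTS.items()}
    for region, ms in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{region:>10}: {ms:8.1f} ms")
    print(f"nearest entry point: {min(timings, key=timings.get)}")
```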

Operational Excellence

Architecture is not just about building; it's about running. We treat Day-2 operations as first-class citizens of the platform:

  • Full-Stack Observability: Every Pod and every WireGuard tunnel is instrumented by default. See Full-Stack Observability.
  • Zero-Downtime Maintenance: From kernel patching to K8s upgrades, our architecture ensures business continuity. See Zero-Downtime Maintenance.
  • Designed for Failure: We don't assume hardware is reliable; we assume it will break. See Disaster Recovery Runbook.

Navigate to the Infrastructure section for deep dives into specific components.