Global Cluster Federation

Cyberun Cloud redefines the standard for multi-cloud orchestration. Instead of simple "multi-cluster management," we have built a Hyper-Converged Federated Architecture with a Single Pane of Glass. Through the Karmada engine, we abstract physically dispersed heterogeneous resources into a logically unified computing pool.

Architecture Topology: Functional Plane Isolation

```mermaid
graph TD
    %% Style Definitions
    classDef control fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000;
    classDef compute fill:#ffebee,stroke:#c62828,stroke-width:2px,color:#000;
    classDef storage fill:#fff8e1,stroke:#fbc02d,stroke-width:2px,color:#000;

    subgraph Tokyo [Brain: Control Plane - Tokyo]
        direction TB
        K[Karmada API]:::control
        R[Rancher]:::control
        IAM[Keycloak]:::control
    end

    subgraph Compute [Muscle: Compute Plane - US/EU]
        direction TB
        D[Destroyer - NY]:::compute
        A[Aegis GPU - DE]:::compute
    end

    subgraph Storage [Vault: Storage Plane - NY]
        direction TB
        AUX[Auxiliary Ceph]:::storage
    end

    %% Connections
    K -->|Instruction Propagation| D
    K -->|Instruction Propagation| A
    D -->|Low-Latency Mount| AUX
    A -->|Model Data Read| AUX
```

To eliminate the risk of a "single failure domain," we implement strict physical and logical isolation strategies, ensuring that management traffic and business traffic do not interfere with each other.

1. The Federation Control Plane (The Carrier Cluster)

  • Role: The "Brain" of the system, hosted in the Tokyo high-availability zone.
  • Architectural Standards:
    • High Availability (HA): Core components (API Server, Scheduler, Controller Manager) are deployed in a 3-Replica mode, so the failure of any single instance causes no downtime.
    • Data Consistency: An internal etcd cluster backed by Longhorn persistent storage provides strongly consistent metadata management (see the sketch after this list).
    • Responsibility Boundaries: Handles only scheduling instructions and metadata distribution. Running user-space business containers on this plane is strictly prohibited, keeping the control plane responsive under any load.
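
To make the etcd-on-Longhorn point concrete, here is a minimal sketch; the StorageClass name, namespace, replica counts, and volume size are illustrative assumptions, not our production manifests.

```yaml
# Hypothetical StorageClass backing the control-plane etcd volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-etcd                # assumed name
provisioner: driver.longhorn.io      # Longhorn CSI driver
parameters:
  numberOfReplicas: "3"              # block-level replication across nodes
  staleReplicaTimeout: "2880"        # minutes before a failed replica is purged
reclaimPolicy: Retain                # never auto-delete etcd data
allowVolumeExpansion: true
---
# Fragment of an etcd StatefulSet: 3 members, one Longhorn volume each
# (clustering flags and probes omitted for brevity).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
  namespace: karmada-system
spec:
  serviceName: etcd
  replicas: 3                        # survives the loss of any single member
  selector:
    matchLabels: { app: etcd }
  template:
    metadata:
      labels: { app: etcd }
    spec:
      containers:
        - name: etcd
          image: quay.io/coreos/etcd:v3.5.9
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn-etcd
        resources:
          requests:
            storage: 10Gi
```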

2. The Business Compute Plane (The Destroyer & Aegis Clusters)

  • Role: The "Muscle" of the system, distributed across New York and Nuremberg.
  • Architectural Standards:
    • Stateless Design: Compute nodes are designed as "Disposable" resources. Through the Descheduler component, the system continuously rebalances load to avoid hotspots (see the policy sketch after this list).
    • Hardware Affinity Scheduling:
      • Destroyer (General Compute): CPU-intensive cluster optimized for high-concurrency microservices.
      • Aegis (HPC): GPU cluster equipped with dedicated accelerators; Taints & Tolerations guarantee that AI tasks run there exclusively (see the sketch after this list).
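
The continuous rebalancing above could be expressed as a descheduler policy. A minimal sketch, assuming the kubernetes-sigs descheduler; the utilization thresholds are illustrative:

```yaml
# Hypothetical policy: evict pods from overloaded nodes so the scheduler
# can repack them onto underutilized ones.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:          # nodes below this are "underutilized"
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:    # nodes above this are rebalancing candidates
          cpu: 50
          memory: 50
          pods: 50
```

And a sketch of the taint/toleration pairing that reserves Aegis GPU nodes for AI tasks; the taint key dedicated=gpu and the workload name are assumptions for illustration:

```yaml
# Taint applied to every Aegis GPU node, e.g.:
#   kubectl taint nodes <gpu-node> dedicated=gpu:NoSchedule
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-trainer                  # hypothetical AI workload
spec:
  replicas: 1
  selector:
    matchLabels: { app: ai-trainer }
  template:
    metadata:
      labels: { app: ai-trainer }
    spec:
      tolerations:
        - key: "dedicated"          # only pods tolerating the taint land here
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
      containers:
        - name: trainer
          image: nvcr.io/nvidia/pytorch:24.01-py3
          resources:
            limits:
              nvidia.com/gpu: 1     # request a dedicated accelerator
```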

3. The Persistent Storage Plane (The Auxiliary Cluster)

  • Role: The "Vault" of the system, interconnected with the compute plane via low-latency links.
  • Architectural Standards:
    • Failure Isolation: The storage cluster runs independently of the compute clusters. Even if compute nodes suffer a kernel panic or resource exhaustion, the storage layer's OSDs (Object Storage Daemons) remain stable, preventing data corruption (a mount sketch follows this list).
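
For the low-latency mount shown in the topology, compute clusters would consume the Auxiliary Ceph pool through a CSI StorageClass along these lines. A sketch assuming the ceph-csi RBD driver; the clusterID, pool, and secret names are placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: aux-ceph-rbd
provisioner: rbd.csi.ceph.com                        # ceph-csi RBD driver
parameters:
  clusterID: <ceph-cluster-fsid>                     # placeholder: Auxiliary cluster's FSID
  pool: replicapool                                  # assumed RBD pool name
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
reclaimPolicy: Retain                                # data outlives disposable compute nodes
allowVolumeExpansion: true
```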

Federation Scheduling Policy

Cyberun uses GitOps-defined PropagationPolicies to drive scheduling automatically, rather than relying on manual intervention (a policy sketch follows the list below).

  • Failover: When a member cluster's heartbeat is lost for longer than the eviction threshold (5 minutes by default), the federation control plane automatically evicts stateless workloads and reschedules them onto a healthy cluster.
  • Cross-Region Redundancy: Critical services are configured with SpreadConstraint by default, forcing replicas to be distributed across different continents to withstand regional network outages.
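
A sketch of such a policy, assuming Karmada's policy.karmada.io/v1alpha1 API; the member-cluster names destroyer-ny and aegis-de and the Deployment name are illustrative:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: critical-service-spread
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: critical-service         # hypothetical critical workload
  placement:
    clusterAffinity:
      clusterNames:
        - destroyer-ny               # assumed registered cluster names
        - aegis-de
    spreadConstraints:
      # Spread replica groups across 2 distinct regions (continents)...
      - spreadByField: region
        maxGroups: 2
        minGroups: 2
      # ...and also constrain the number of clusters selected, which
      # Karmada expects when spreading by a field other than cluster.
      - spreadByField: cluster
        maxGroups: 2
        minGroups: 2
```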