Running Kubernetes control planes as pods with Kamaji
I am working on a managed Kubernetes platform, and one of the first problems I had to solve was: where does the control plane live? In most managed Kubernetes setups — including the big cloud providers — each customer gets dedicated VMs running etcd, kube-apiserver, and kube-controller-manager. This works, but it has a few problems that bothered me from the start.
First, it is expensive. Three VMs per cluster minimum, just to run the control plane. Even small VMs add up fast when you have many customers. Second, and more importantly, those VMs exist on the network. Customers can see them, try to reach them, and potentially probe them. If your networking is ever misconfigured, the control plane is exposed. For a platform where security is a core requirement, having control plane VMs that exist and are reachable felt wrong.
I wanted a different model: a control plane that is architecturally unreachable, not just firewalled.
After researching different approaches I found Kamaji by Clastix. The idea is simple but powerful. Instead of running each customer's control plane on dedicated VMs, Kamaji runs them as pods inside a management cluster. Each customer gets their own etcd and kube-apiserver, but they run as Kubernetes workloads — not as separate VMs. From the customer's perspective they get a kubeconfig and a working cluster. From the infrastructure perspective, the control plane is just a set of pods inside a cluster they have no access to.
This is called a TenantControlPlane in Kamaji's model:
apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: customer1
  namespace: kamaji-system
spec:
  dataStore: default
  controlPlane:
    deployment:
      replicas: 2
      additionalMetadata:
        labels:
          customer: customer1
    service:
      serviceType: ClusterIP
  kubernetes:
    version: "1.31.0"
    kubelet:
      cgroupfs: systemd
  networkProfile:
    port: 6443

When you apply this, Kamaji creates the etcd pods, the kube-apiserver pods, and the controller manager pods, all inside the management cluster. No VMs provisioned, no separate machines to manage. The control plane exists only as Kubernetes workloads.
The problem this creates is: how does a customer connect to their API server? The pods are inside the management cluster on a ClusterIP service. Customers have no route there.
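To make that concrete, the ClusterIP Service fronting a tenant's API server looks roughly like this. This is an illustrative sketch: Kamaji generates the actual object, and the names and selector label here are my assumptions, not Kamaji's real ones:

```yaml
# Illustrative sketch: Kamaji generates the real Service; the name,
# namespace, and selector label shown here are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: customer1            # assumed to follow the TenantControlPlane name
  namespace: kamaji-system
spec:
  type: ClusterIP            # as requested in spec.controlPlane.service
  selector:
    kamaji.clastix.io/name: customer1   # assumed selector label
  ports:
  - name: kube-apiserver
    port: 6443
    targetPort: 6443
```

A ClusterIP is only routable inside the management cluster, which is exactly the problem the proxy has to solve.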
The solution I am using is an Envoy proxy VM that sits between the outside world and the management cluster. It listens on port 6443 and uses SNI — Server Name Indication — to route each connection to the correct customer's API server pods without terminating the TLS connection. Envoy reads the hostname from the TLS handshake and proxies the connection through to the right Kamaji pod. The TLS session goes end-to-end between the client and the kube-apiserver pod. Envoy never sees the decrypted traffic.
However, not all traffic to the API server is the same. Customer kubectl traffic comes from the internet; worker node traffic comes from the internal network. There is no reason to route worker node traffic over a public IP, and there are good security reasons not to. So the Envoy proxy VM has two listeners on port 6443 — one on the public IP for customer kubectl access, and one on a private IP for worker node traffic:
listeners:
# Public — customer kubectl access from the internet
- name: apiserver_public
  address:
    socket_address:
      address: 203.0.113.10
      port_value: 6443
  listener_filters:
  - name: envoy.filters.listener.tls_inspector
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
  filter_chains:
  - filter_chain_match:
      server_names: ["customer1.k8s.example.com"]
    filters:
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        stat_prefix: customer1_public
        cluster: customer1_apiserver
# Private — worker node bootstrap and API traffic
- name: apiserver_private
  address:
    socket_address:
      address: 10.0.0.100    # internal management network
      port_value: 6443
  listener_filters:
  - name: envoy.filters.listener.tls_inspector
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
  filter_chains:
  - filter_chain_match:
      server_names: ["customer1.k8s.example.com"]
    filters:
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        stat_prefix: customer1_private
        cluster: customer1_apiserver

Both listeners route to the same Kamaji pod backends. The kubeconfig issued to customers points at the public DNS name. The machine config generated for worker nodes points at the private entry point; note that the private listener also matches on SNI, so workers still use the hostname, resolved internally to the private IP. Two different entry points, same destination.
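The customer1_apiserver cluster both listeners reference still has to be defined somewhere. A minimal sketch, under the assumption that the tenant's Service is reachable from the proxy VM by DNS — how you actually expose it (routable ClusterIPs, a NodePort, or similar) depends on your network, and the backend address shown here is an assumption:

```yaml
clusters:
- name: customer1_apiserver
  type: STRICT_DNS                 # re-resolve the backend name periodically
  connect_timeout: 5s
  load_assignment:
    cluster_name: customer1_apiserver
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              # Assumption: this name resolves from the proxy VM to the
              # tenant's API server Service inside the management cluster.
              address: customer1.kamaji-system.svc.cluster.local
              port_value: 6443
```

Adding a second customer is then just another filter chain per listener and another cluster entry, which is easy to template.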
The security benefit of this split is significant. Worker nodes never need internet access to reach their API server. You can firewall the worker VMs so they have no outbound internet connectivity at all — they only need routes to the internal networks they actually use. That is a much cleaner security posture than having every worker node talking to a public IP just to reach its own control plane.
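On the worker side, a kubeadm-style join configuration shows what this looks like in practice. The token and CA hash are placeholders, and I am assuming split-horizon DNS so that the cluster hostname resolves to the private listener on the internal network:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    # Assumption: internally this name resolves to the private Envoy
    # listener (10.0.0.100), so the worker still presents the SNI name.
    apiServerEndpoint: customer1.k8s.example.com:6443
    token: abcdef.0123456789abcdef   # placeholder bootstrap token
    caCertHashes:
    - sha256:0000000000000000000000000000000000000000000000000000000000000000   # placeholder
```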
From the customer's perspective this looks identical to any other managed Kubernetes cluster. They get a kubeconfig, they run kubectl get nodes, it works. What they cannot see is that their API server is a pod running inside a cluster they have no access to, reachable only through a proxy that does not terminate their TLS session.
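The kubeconfig itself is entirely ordinary. A sketch with illustrative values, credentials elided:

```yaml
apiVersion: v1
kind: Config
clusters:
- name: customer1
  cluster:
    # Resolves to the Envoy public IP; the hostname doubles as the
    # SNI routing key on the proxy.
    server: https://customer1.k8s.example.com:6443
    certificate-authority-data: LS0t...   # base64 CA bundle, elided
contexts:
- name: admin@customer1
  context:
    cluster: customer1
    user: customer1-admin
current-context: admin@customer1
users:
- name: customer1-admin
  user:
    client-certificate-data: LS0t...   # elided
    client-key-data: LS0t...           # elided
```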
What I find interesting about this architecture is that the control plane is not just firewalled — it literally does not exist as a reachable VM. There is no IP address a customer could discover and probe. The only way to interact with the API server is through the SNI proxy, and that only routes connections that present the correct hostname.
I am still in the planning and research phase for this part of the platform. The next step for me is to get Kamaji running in a lab environment and validate the full provisioning flow — a TenantControlPlane is applied and Kamaji reconciles it, Envoy routes to it, worker nodes join via Cluster API. I will write a follow-up post once I have that working, with the actual commands and what I ran into along the way.
If you are building something similar or have experience running Kamaji in production I would be interested to hear about it.