Detailed Summary
This phase of the cluster configuration establishes the foundation for both robust external traffic routing and a fault-tolerant administrative brain. Because this is a bare-metal/local environment, it lacks the native managed load balancers provided by cloud vendors (like AWS ALBs). Therefore, we must implement our own Layer 4 and Layer 7 routing layers.
First, we utilize MetalLB operating in Layer 2 mode to act as our local network load balancer, provisioning Virtual IPs (VIPs) via ARP. We bind this VIP to an NGINX Ingress Controller, which serves as the "smart" Layer 7 router, directing HTTP/HTTPS traffic to internal microservices based on decoupled Ingress routing rules.
Second, we protect the cluster from a single point of failure by establishing a High Availability (HA) Control Plane using a 3-node stacked topology. This ensures the etcd database maintains a quorum (a strict voting majority) even if a node crashes. To route internal and kubectl administrative traffic seamlessly, a secondary VIP sits in front of the API servers, managed by an internal load balancing mechanism (such as HAProxy + Keepalived or kube-vip), ensuring traffic is always sent to a healthy control plane node.
1. External Traffic Management: MetalLB and NGINX Ingress
To expose internal Kubernetes Services to the local network, two distinct components work together to handle the physical network connection and the application-level routing.
MetalLB: The Layer 4 Load Balancer
- Role: Acts as the network traffic cop. It watches for Kubernetes `Service` objects of type `LoadBalancer` and assigns each one an IP from a predefined `IPAddressPool` (e.g., `192.168.1.200-192.168.1.210`).
- Mechanism (Layer 2 Mode): MetalLB deploys a `speaker` pod on every node. The node elected as leader for a specific VIP "shouts" out to the local network using ARP (Address Resolution Protocol), mapping the VIP (e.g., `192.168.1.201`) to that node's physical MAC address.
- Failover: If the node holding the VIP crashes, another node's `speaker` assumes ownership and broadcasts its own MAC address to the network via gratuitous ARP.
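In MetalLB's CRD-based configuration, the address pool and its Layer 2 announcement are declared as two separate resources. A minimal sketch (resource names here are illustrative):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: local-pool              # illustrative name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.210
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: local-l2                # illustrative name
  namespace: metallb-system
spec:
  ipAddressPools:
    - local-pool                # announce VIPs from the pool above via ARP
```

Without an `L2Advertisement` referencing the pool, MetalLB will assign IPs but never announce them on the network.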
NGINX Ingress Controller: The Layer 7 Router
- Role: Acts as the cluster's smart gateway for HTTP/HTTPS traffic.
- Mechanism: NGINX is exposed to the local network via the MetalLB VIP. When a packet arrives at the node holding the VIP, it is handed to the NGINX pod.
- Routing Rules: NGINX continuously monitors the cluster for `Ingress` objects (YAML manifests that define routing rules). It reads the requested host (e.g., `myapp.local`) or path (e.g., `/api`) from the incoming web request and forwards the traffic to the corresponding internal `ClusterIP` Service.
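Such a rule takes the shape of an `Ingress` manifest like the following sketch (the backend Service name and port are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
spec:
  ingressClassName: nginx         # hand this rule to the NGINX controller
  rules:
    - host: myapp.local
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: myapp-api   # illustrative internal ClusterIP Service
                port:
                  number: 8080
```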
Traffic Flow: Client → Router → MetalLB VIP → NGINX Pod → Internal ClusterIP Service → Application Pod
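The glue between the two layers is the controller's own `Service` of type `LoadBalancer`. A sketch, assuming the standard `ingress-nginx` deployment and using MetalLB's annotation to pin a specific VIP (both the namespace and the pinned address are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    metallb.universe.tf/loadBalancerIPs: 192.168.1.201  # request a specific VIP from the pool
spec:
  type: LoadBalancer              # MetalLB watches for this type and assigns the VIP
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
```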
2. High Availability Control Plane (Stacked etcd)
To prevent the cluster from collapsing if the master node fails, the control plane is distributed across three physical or virtual machines.
The 3-Node Quorum Requirement
The cluster's state is stored in a distributed key-value database called etcd. To prevent data corruption or "split-brain" scenarios, etcd uses the Raft consensus algorithm, which requires a strict majority (quorum) of members to commit any write.
- 1 Node: No fault tolerance.
- 2 Nodes: If one dies, the remaining node has 50% of the votes. Quorum is lost; the cluster freezes to protect data.
- 3 Nodes: If one dies, the remaining two nodes have 66% of the votes. Quorum is maintained, and the cluster remains fully operational.
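The pattern generalizes: for n etcd members, quorum = ⌊n/2⌋ + 1, so the cluster tolerates n − quorum failures:

- 1 member → quorum 1 → tolerates 0 failures
- 3 members → quorum 2 → tolerates 1 failure
- 4 members → quorum 3 → still tolerates only 1 failure, which is why even-sized clusters add cost without adding fault tolerance
- 5 members → quorum 3 → tolerates 2 failures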
The Control Plane VIP and Load Balancing
Because there are three identical kube-apiserver instances, worker nodes and administrators should not point their kubeconfig at any single node's IP; if that node failed, all API access would fail with it.
- The VIP: A Virtual IP (e.g., `192.168.1.10`) is created exclusively for the control plane.
- Keepalived / HAProxy (or kube-vip): These tools manage the VIP. Keepalived (or kube-vip) claims the IP via ARP/VRRP, while HAProxy load-balances incoming API requests (port `6443`) in a round-robin fashion across the three internal control plane IP addresses.
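With kubeadm, this VIP becomes the cluster's stable endpoint: kubeconfigs and the API server certificate SANs are generated against it rather than any node's IP. A sketch of the relevant `ClusterConfiguration` fragment (the version string is illustrative):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0                  # illustrative
controlPlaneEndpoint: "192.168.1.10:6443"   # the control-plane VIP, not a node IP
```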
The Stacked etcd Write Flow
In a "stacked" topology, each control plane node runs its own kube-apiserver and its own local etcd instance. When an event occurs (e.g., creating a Pod), the data flows as follows:
- Request Arrival: The request hits the Control Plane VIP and is routed to one of the API servers (e.g., Node 2).
- Local Handoff: Node 2's API server validates the request and attempts to save it by talking only to its local `etcd` instance (`127.0.0.1:2379`).
- Raft Consensus:
  - Node 2's `etcd` forwards the write request to the elected `etcd` Leader (e.g., Node 1).
  - The Leader broadcasts the request to all followers.
  - Once a majority of nodes acknowledge the data, the Leader permanently commits the write.
- Confirmation: The local `etcd` on Node 2 informs its API server that the data is safely committed, and the API server returns a success response to the user.
Key Takeaways
- MetalLB + NGINX Ingress provides a complete L4/L7 load balancing solution for bare-metal clusters
- 3-node etcd quorum is the minimum requirement for true fault tolerance
- Control Plane VIP ensures seamless API access even during node failures
- Stacked topology simplifies management by co-locating etcd with the control plane
Related Resources
- MetalLB Documentation
- NGINX Ingress Controller Docs
- Kubernetes HA Topology