Detailed Summary
This phase of the cluster configuration establishes the foundation for both robust external traffic routing and a fault-tolerant administrative brain. Because this is a bare-metal/local environment, it lacks the native managed load balancers provided by cloud vendors (like AWS ALBs). Therefore, we must implement our own Layer 4 and Layer 7 routing layers.
First, we utilize MetalLB operating in Layer 2 mode to act as our local network load balancer, provisioning Virtual IPs (VIPs) via ARP. We bind this VIP to an NGINX Ingress Controller, which serves as the "smart" Layer 7 router, directing HTTP/HTTPS traffic to internal microservices based on decoupled Ingress routing rules.
Second, we protect the cluster from a single point of failure by establishing a High Availability (HA) Control Plane using a 3-node stacked topology. This ensures the etcd database maintains a quorum (a strict voting majority) even if a node crashes. To route internal and kubectl administrative traffic seamlessly, a secondary VIP sits in front of the API servers, managed by an internal load balancing mechanism (such as HAProxy + Keepalived or kube-vip), ensuring traffic is always sent to a healthy control plane node.
1. External Traffic Management: MetalLB and NGINX Ingress
To expose internal Kubernetes Services to the local network, two distinct components work together to handle the physical network connection and the application-level routing.
MetalLB: The Layer 4 Load Balancer
- Role: Acts as the network traffic cop. It watches for Kubernetes `Service` objects of type `LoadBalancer` and assigns each one an IP from a predefined `IPAddressPool` (e.g., `192.168.1.200-192.168.1.210`).
- Mechanism (Layer 2 Mode): MetalLB deploys a `speaker` pod on every node. The node elected as leader for a specific VIP "shouts" out to the local network using ARP (Address Resolution Protocol), mapping the VIP (e.g., `192.168.1.201`) to that node's physical MAC address.
- Failover: If the node holding the VIP crashes, another node's `speaker` assumes ownership and broadcasts its own MAC address to the network via gratuitous ARP.
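In MetalLB's CRD-based configuration, the address pool and its Layer 2 announcement are declared as two separate resources. A minimal sketch (resource names here are illustrative):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: local-pool              # illustrative name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.210
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: local-l2                # illustrative name
  namespace: metallb-system
spec:
  ipAddressPools:
    - local-pool                # announce VIPs from the pool above via ARP
```

Without an `L2Advertisement` referencing the pool, MetalLB will assign IPs but never announce them on the network.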
NGINX Ingress Controller: The Layer 7 Router
- Role: Acts as the cluster's smart gateway for HTTP/HTTPS traffic.
- Mechanism: NGINX is exposed to the local network via the MetalLB VIP. When a packet arrives at the node holding the VIP, it is handed to the NGINX pod.
- Routing Rules: NGINX continuously monitors the cluster for `Ingress` objects (YAML manifests that define routing rules). It reads the requested host (e.g., `myapp.local`) or path (e.g., `/api`) from the incoming web request and forwards the traffic to the corresponding internal `ClusterIP` Service.
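Such a rule takes the shape of an `Ingress` manifest like the following sketch (the backend Service name and port are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
spec:
  ingressClassName: nginx         # hand this rule to the NGINX controller
  rules:
    - host: myapp.local
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: myapp-api   # illustrative internal ClusterIP Service
                port:
                  number: 8080
```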
Traffic Flow: Client → Router → MetalLB VIP → NGINX Pod → Internal ClusterIP Service → Application Pod
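The glue between the two layers is the controller's own `Service` of type `LoadBalancer`. A sketch, assuming the standard `ingress-nginx` deployment and using MetalLB's annotation to pin a specific VIP (both the namespace and the pinned address are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
  annotations:
    metallb.universe.tf/loadBalancerIPs: 192.168.1.201  # request a specific VIP from the pool
spec:
  type: LoadBalancer              # MetalLB watches for this type and assigns the VIP
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
```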
2. High Availability Control Plane (Stacked etcd)
To prevent the cluster from collapsing if the master node fails, the control plane is distributed across three physical or virtual machines.
The 3-Node Quorum Requirement
The cluster's state is stored in a distributed key-value database called etcd. To prevent data corruption or "split-brain" scenarios, etcd uses the Raft consensus algorithm, which requires a strict majority (quorum) of members to commit any write.
- 1 Node: No fault tolerance.
- 2 Nodes: If one dies, the remaining node has 50% of the votes. Quorum is lost; the cluster freezes to protect data.
- 3 Nodes: If one dies, the remaining two nodes have 66% of the votes. Quorum is maintained, and the cluster remains fully operational.
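The pattern generalizes: for n etcd members, quorum = ⌊n/2⌋ + 1, so the cluster tolerates n − quorum failures:

- 1 member → quorum 1 → tolerates 0 failures
- 3 members → quorum 2 → tolerates 1 failure
- 4 members → quorum 3 → still tolerates only 1 failure, which is why even-sized clusters add cost without adding fault tolerance
- 5 members → quorum 3 → tolerates 2 failures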
The Control Plane VIP and Load Balancing
Because there are three identical kube-apiserver instances, worker nodes and administrators should not point their kubeconfig at any single node's IP; if that node failed, all API access would fail with it.
- The VIP: A Virtual IP (e.g., `192.168.1.10`) is created exclusively for the control plane.
- Keepalived / HAProxy (or kube-vip): These tools manage the VIP. Keepalived (or kube-vip) claims the IP via ARP/VRRP, while HAProxy load-balances incoming API requests (port `6443`) in a round-robin fashion across the three internal control plane IP addresses.
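With kubeadm, this VIP becomes the cluster's stable endpoint: kubeconfigs and the API server certificate SANs are generated against it rather than any node's IP. A sketch of the relevant `ClusterConfiguration` fragment (the version string is illustrative):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0                  # illustrative
controlPlaneEndpoint: "192.168.1.10:6443"   # the control-plane VIP, not a node IP
```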
The Stacked etcd Write Flow
In a "stacked" topology, each control plane node runs its own kube-apiserver and its own local etcd instance. When an event occurs (e.g., creating a Pod), the data flows as follows:
- Request Arrival: The request hits the Control Plane VIP and is routed to one of the API servers (e.g., Node 2).
- Local Handoff: Node 2's API server validates the request and attempts to save it by talking only to its local `etcd` instance (`127.0.0.1:2379`).
- Raft Consensus:
  - Node 2's `etcd` forwards the write request to the elected `etcd` Leader (e.g., Node 1).
  - The Leader broadcasts the request to all followers.
  - Once a majority of nodes acknowledge the data, the Leader permanently commits the write.
- Confirmation: The local `etcd` on Node 2 informs its API server that the data is safely committed, and the API server returns a success response to the user.
Key Takeaways
- MetalLB + NGINX Ingress provides a complete L4/L7 load balancing solution for bare-metal clusters
- 3-node etcd quorum is the minimum requirement for true fault tolerance
- Control Plane VIP ensures seamless API access even during node failures
- Stacked topology simplifies management by co-locating etcd with the control plane
Related Resources
- MetalLB Documentation
- NGINX Ingress Controller Docs
- Kubernetes HA Topology