Reference notes.
Load balancers distribute network traffic across multiple servers to improve availability, throughput, and reliability. They operate at different layers of the OSI model with fundamentally different capabilities.
L4 vs L7
| Aspect | L4 (Transport) | L7 (Application) |
|---|---|---|
| Operates on | TCP/UDP 5-tuple (IPs, ports, protocol) | HTTP headers, URLs, cookies, body |
| Inspects content | No | Yes |
| TLS termination | Pass-through or terminate | Always terminates |
| Speed | Faster (no parsing) | Slower (must parse application data) |
| Routing intelligence | IP and port only | URL path, host header, cookie, gRPC method |
| Connection model | Per-connection | Per-request |
| Best for | Raw throughput, non-HTTP, databases | HTTP routing, canary deploys, API gateways |
Why L7 Matters for HTTP/2 and gRPC
L4 is blind to application-level multiplexing. Two gRPC clients sharing one TCP connection look like a single flow to an L4 balancer — all requests go to the same backend. L7 can distribute individual requests across backends.
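The difference is easy to see in a toy model. Below, assuming a simple round-robin picker with illustrative backend names, six multiplexed requests on one connection all land on the same backend under L4 semantics, but spread across backends under L7 semantics:

```python
from itertools import cycle

backends = ["b1", "b2", "b3"]

# L4 view: the balancing decision is made once, at connect time.
# Every request multiplexed over that connection follows it.
l4_choice = next(cycle(backends))
l4_assignment = [l4_choice] * 6

# L7 view: each HTTP/2 or gRPC request is balanced independently.
rr = cycle(backends)
l7_assignment = [next(rr) for _ in range(6)]

print(l4_assignment)  # ['b1', 'b1', 'b1', 'b1', 'b1', 'b1']
print(l7_assignment)  # ['b1', 'b2', 'b3', 'b1', 'b2', 'b3']
```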
Modern Architecture: Layered Approach
Internet → L4 (edge, DDoS protection, raw speed)
→ L7 (intelligent routing, TLS termination, canary)
→ Backend servers
Algorithms
| Algorithm | Description | Use Case |
|---|---|---|
| Round Robin | Rotate through servers sequentially | Equal-capacity servers, stateless |
| Weighted Round Robin | Rotate proportionally to weight | Servers with different capacities |
| Least Connections | Send to server with fewest active connections | Varying request durations |
| Weighted Least Connections | Least connections adjusted by weight | Mixed capacity + varying load |
| IP Hash | Hash source IP to pick server | Simple session persistence |
| Consistent Hashing | Hash-ring distributes keys evenly, minimal remapping on changes | Caches, stateful services |
| Random Two Choices | Pick two random servers, send to the one with fewer connections | Simple, surprisingly effective |
| EWMA (Exponentially Weighted Moving Average) | Route based on a rolling latency average | Latency-sensitive services; can cut tail latency by 10-30% vs round robin |
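A sketch of how little machinery "random two choices" needs (Python; backend names and starting counts are illustrative). Even when one backend starts heavily loaded, the policy steers traffic away until the counts converge:

```python
import random

def pick_backend(backends, active_conns):
    """Power-of-two-choices: sample two backends at random,
    send the request to the one with fewer active connections."""
    a, b = random.sample(backends, 2)
    return a if active_conns[a] <= active_conns[b] else b

# Toy simulation: s3 starts overloaded; after 1000 requests
# the connection counts end up roughly equal.
conns = {"s1": 0, "s2": 0, "s3": 100}
for _ in range(1000):
    conns[pick_backend(list(conns), conns)] += 1
print(conns)
```

The appeal over full least-connections is that it needs only two counter reads per decision, which matters when the counters live on other machines or are shared across balancer instances.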
Consistent hashing (Maglev-style) is used in production L4 balancers — when a backend is added or removed, only a small percentage of flows remap.
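A minimal hash ring with virtual nodes illustrates the remapping property (this is the classic ring construction, not Maglev's lookup-table variant; node names are illustrative):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring: each node is placed at many pseudo-random
    positions (virtual nodes); a key maps to the next node clockwise."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def lookup(self, key):
        idx = bisect.bisect(self.keys, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

# Removing a backend remaps only the keys it owned (roughly 1/3 here);
# every other key keeps its backend.
before = HashRing(["a", "b", "c"])
after = HashRing(["a", "b"])
moved = sum(before.lookup(str(k)) != after.lookup(str(k))
            for k in range(1000))
print(moved)  # roughly a third of the 1000 keys
```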
Health Checks
| Level | Method | Checks |
|---|---|---|
| L4 | TCP connect / TLS handshake | Port is open, process is listening |
| L7 | HTTP GET to health endpoint | Application is responding correctly |
| Deep | Custom health endpoint | Database connected, dependencies healthy |
Configure short check intervals (5-10 s) with a failure threshold (e.g., 3 consecutive failures before marking a backend unhealthy), so a single dropped packet doesn't eject a healthy server. In Kubernetes, use separate liveness and readiness probes.
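Both pieces can be sketched in Python — an L4 probe (can we complete a TCP handshake?) and a consecutive-failure counter. Names and defaults are illustrative:

```python
import socket

def l4_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """L4 check: succeed if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

class HealthTracker:
    """Mark a backend unhealthy only after N consecutive failures;
    any success resets the counter."""
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        self.failures = 0 if ok else self.failures + 1
        return self.failures < self.threshold  # still considered healthy?
```

A real check loop would call `l4_health_check` every 5-10 s per backend and feed each result to `record()`, removing a backend from rotation when it returns False.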
Session Persistence (Sticky Sessions)
Route all requests from the same client to the same backend.
Methods:
- Source IP — Hash the client IP. Breaks behind NAT (many clients share one IP) and when a client's IP changes mid-session (e.g., mobile networks)
- Cookie — Insert a cookie identifying the backend. Most reliable for HTTP
- Header — Route on a custom header (e.g., user ID)
Best practice: Avoid sticky sessions when possible. Store session state externally (Redis, database) so any backend can serve any request. This enables true stateless scaling and better fault tolerance.
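Cookie-based stickiness amounts to a few lines of routing logic (Python sketch; the cookie name and backend addresses are illustrative):

```python
import random

BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # illustrative addresses
COOKIE = "lb_backend"

def route(request_cookies: dict) -> tuple[str, dict]:
    """Honour an existing routing cookie if it names a live backend;
    otherwise pick a backend and return a cookie to set on the response."""
    backend = request_cookies.get(COOKIE)
    if backend in BACKENDS:
        return backend, {}                 # stick to the same backend
    backend = random.choice(BACKENDS)      # first request: pick one
    return backend, {COOKIE: backend}      # client remembers it from now on

first, set_cookies = route({})             # first request sets the cookie
again, _ = route(set_cookies)              # later requests follow it
assert first == again
```

Note the failure mode this sketch shares with real sticky sessions: if the named backend dies, its clients lose their sessions — exactly why externalising session state is preferred.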
TLS Termination
The load balancer decrypts TLS, inspects/routes the request, then forwards to backends.
Options:
- TLS termination at LB — LB handles all crypto. Backends receive plain HTTP. Simplifies cert management but traffic is unencrypted internally.
- TLS re-encryption — LB terminates and re-encrypts to backends. More secure internally but double the crypto overhead.
- TLS passthrough — L4 only. LB forwards encrypted traffic untouched. No content inspection possible but end-to-end encryption is preserved.
Direct Server Return (DSR)
The load balancer handles only inbound traffic; backends reply directly to clients, bypassing the LB on the return path. Used at L4. Because response traffic typically dwarfs request traffic, skipping it dramatically increases effective throughput (10-40 Gbps per server). Used by Meta’s Katran.
Software Load Balancers
| Software | Layer | Notes |
|---|---|---|
| HAProxy | L4/L7 | De facto standard. HTTP/2, HTTP/3 (2.6+), gRPC. Very high performance. |
| nginx | L7 (L4 with stream) | Web server + reverse proxy. HTTP/2, HTTP/3 (1.25+). Widely deployed. |
| Envoy | L4/L7 | Cloud-native, designed for service meshes. xDS API for dynamic config. Used by Istio. |
| Traefik | L7 | Auto-discovery from Docker/Kubernetes. Built-in Let’s Encrypt. Good for simpler setups. |
| Caddy | L7 | Automatic HTTPS. Simple config. Good for small-medium deployments. |
| Katran | L4 | Meta’s eBPF/XDP-based L4 LB. Extreme throughput. |
eBPF/XDP Load Balancing
Modern L4 load balancers (Katran, Cilium) use eBPF and XDP to process packets at the NIC driver, before they enter the kernel network stack. This achieves near-hardware speeds on commodity servers. See eBPF and XDP.
Cloud Load Balancers
| Service | Layer | Scope | Notes |
|---|---|---|---|
| AWS ALB | L7 | Regional | HTTP/HTTPS, gRPC, WebSocket |
| AWS NLB | L4 | Regional | TCP/UDP, static IPs, extreme throughput |
| GCP Cloud LB | L4/L7 | Global | True anycast, one of the few global L7 LBs |
| Azure LB | L4 | Regional | TCP/UDP |
| Azure App Gateway | L7 | Regional | HTTP/HTTPS, WAF |
| Cloudflare LB | L7 | Global | Anycast, integrated with CDN and DDoS protection |
Global Server Load Balancing (GSLB)
Distributes traffic across geographically distributed data centres. Typically implemented via:
- DNS-based — Return different IPs based on client location (GeoDNS)
- Anycast — Multiple data centres advertise the same IP via BGP; routing sends clients to the nearest one
Kubernetes Load Balancing
- kube-proxy — Default L4 load balancing for Services (iptables or IPVS mode). Being replaced by eBPF in Cilium deployments.
- Ingress controllers — L7 load balancing (nginx, HAProxy, Envoy-based). Match on host/path rules.
- Gateway API — Newer, more expressive replacement for Ingress. Role-oriented, supports TCP/UDP/gRPC natively. Supported by Cilium, Envoy Gateway, nginx Gateway Fabric.
- Service mesh — Envoy sidecars (Istio) or eBPF (Cilium) for per-request L7 load balancing between services, with retries, circuit breaking, and observability.
See Also
- HTTP — L7 load balancers operate at the HTTP layer
- TCP — L4 load balancers operate at the TCP layer
- Container Networking — Kubernetes networking and service meshes
- Firewalls — eBPF/XDP used for both firewalling and load balancing
References
- HAProxy Documentation
- nginx Load Balancing
- Envoy Documentation
- Maglev: A Fast and Reliable Software Network Load Balancer — Google’s consistent hashing for L4
- Introduction to Modern Network Load Balancing — Matt Klein