Reference notes.
Load balancers distribute network traffic across multiple servers to improve availability, throughput, and reliability. They operate at different layers of the OSI model with fundamentally different capabilities.
L4 vs L7
| Aspect | L4 (Transport) | L7 (Application) |
|---|---|---|
| Operates on | TCP/UDP 5-tuple (IPs, ports, protocol) | HTTP headers, URLs, cookies, body |
| Inspects content | No | Yes |
| TLS termination | Pass-through or terminate | Always terminates |
| Speed | Faster (no parsing) | Slower (must parse application data) |
| Routing intelligence | IP and port only | URL path, host header, cookie, gRPC method |
| Connection model | Per-connection | Per-request |
| Best for | Raw throughput, non-HTTP, databases | HTTP routing, canary deploys, API gateways |
Why L7 Matters for HTTP/2 and gRPC
L4 is blind to application-level multiplexing. Two gRPC clients sharing one TCP connection look like a single flow to an L4 balancer — all requests go to the same backend. L7 can distribute individual requests across backends.
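The difference is easy to see in a toy model. Below, assuming a simple round-robin picker with illustrative backend names, six multiplexed requests on one connection all land on the same backend under L4 semantics, but spread across backends under L7 semantics:

```python
from itertools import cycle

backends = ["b1", "b2", "b3"]

# L4 view: the balancing decision is made once, at connect time.
# Every request multiplexed over that connection follows it.
l4_choice = next(cycle(backends))
l4_assignment = [l4_choice] * 6

# L7 view: each HTTP/2 or gRPC request is balanced independently.
rr = cycle(backends)
l7_assignment = [next(rr) for _ in range(6)]

print(l4_assignment)  # ['b1', 'b1', 'b1', 'b1', 'b1', 'b1']
print(l7_assignment)  # ['b1', 'b2', 'b3', 'b1', 'b2', 'b3']
```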
Modern Architecture: Layered Approach
Internet → L4 (edge, DDoS protection, raw speed)
→ L7 (intelligent routing, TLS termination, canary)
→ Backend servers
Algorithms
| Algorithm | Description | Use Case |
|---|---|---|
| Round Robin | Rotate through servers sequentially | Equal-capacity servers, stateless |
| Weighted Round Robin | Rotate proportionally to weight | Servers with different capacities |
| Least Connections | Send to server with fewest active connections | Varying request durations |
| Weighted Least Connections | Least connections adjusted by weight | Mixed capacity + varying load |
| IP Hash | Hash source IP to pick server | Simple session persistence |
| Consistent Hashing | Hash-ring distributes keys evenly, minimal remapping on changes | Caches, stateful services |
| Random Two Choices | Pick two random servers, send to the one with fewer connections | Simple, surprisingly effective |
| EWMA (Exponentially Weighted Moving Average) | Route based on a rolling latency average | Latency-sensitive services; can cut tail latency by 10-30% vs round robin |
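A sketch of how little machinery "random two choices" needs (Python; backend names and starting counts are illustrative). Even when one backend starts heavily loaded, the policy steers traffic away until the counts converge:

```python
import random

def pick_backend(backends, active_conns):
    """Power-of-two-choices: sample two backends at random,
    send the request to the one with fewer active connections."""
    a, b = random.sample(backends, 2)
    return a if active_conns[a] <= active_conns[b] else b

# Toy simulation: s3 starts overloaded; after 1000 requests
# the connection counts end up roughly equal.
conns = {"s1": 0, "s2": 0, "s3": 100}
for _ in range(1000):
    conns[pick_backend(list(conns), conns)] += 1
print(conns)
```

The appeal over full least-connections is that it needs only two counter reads per decision, which matters when the counters live on other machines or are shared across balancer instances.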
Consistent hashing (Maglev-style) is used in production L4 balancers — when a backend is added or removed, only a small percentage of flows remap.
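A minimal hash ring with virtual nodes illustrates the remapping property (this is the classic ring construction, not Maglev's lookup-table variant; node names are illustrative):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring: each node is placed at many pseudo-random
    positions (virtual nodes); a key maps to the next node clockwise."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_hash(f"{n}#{i}"), n)
                           for n in nodes for i in range(vnodes))
        self.keys = [h for h, _ in self.ring]

    def lookup(self, key):
        idx = bisect.bisect(self.keys, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

# Removing a backend remaps only the keys it owned (roughly 1/3 here);
# every other key keeps its backend.
before = HashRing(["a", "b", "c"])
after = HashRing(["a", "b"])
moved = sum(before.lookup(str(k)) != after.lookup(str(k))
            for k in range(1000))
print(moved)  # roughly a third of the 1000 keys
```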
Health Checks
| Level | Method | Checks |
|---|---|---|
| L4 | TCP connect / TLS handshake | Port is open, process is listening |
| L7 | HTTP GET to health endpoint | Application is responding correctly |
| Deep | Custom health endpoint | Database connected, dependencies healthy |
Configure short check intervals (5-10 s) with a failure threshold (e.g., 3 consecutive failures before marking a backend unhealthy), so a single dropped packet doesn't eject a healthy server. In Kubernetes, use separate liveness and readiness probes.
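Both pieces can be sketched in Python — an L4 probe (can we complete a TCP handshake?) and a consecutive-failure counter. Names and defaults are illustrative:

```python
import socket

def l4_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """L4 check: succeed if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

class HealthTracker:
    """Mark a backend unhealthy only after N consecutive failures;
    any success resets the counter."""
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        self.failures = 0 if ok else self.failures + 1
        return self.failures < self.threshold  # still considered healthy?
```

A real check loop would call `l4_health_check` every 5-10 s per backend and feed each result to `record()`, removing a backend from rotation when it returns False.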
Session Persistence (Sticky Sessions)
Route all requests from the same client to the same backend.
Methods:
- Source IP — Hash the client IP. Breaks behind NAT (many clients share one IP) and when a client's IP changes mid-session (e.g., mobile networks)
- Cookie — Insert a cookie identifying the backend. Most reliable for HTTP
- Header — Route on a custom header (e.g., user ID)
Best practice: Avoid sticky sessions when possible. Store session state externally (Redis, database) so any backend can serve any request. This enables true stateless scaling and better fault tolerance.
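Cookie-based stickiness amounts to a few lines of routing logic (Python sketch; the cookie name and backend addresses are illustrative):

```python
import random

BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # illustrative addresses
COOKIE = "lb_backend"

def route(request_cookies: dict) -> tuple[str, dict]:
    """Honour an existing routing cookie if it names a live backend;
    otherwise pick a backend and return a cookie to set on the response."""
    backend = request_cookies.get(COOKIE)
    if backend in BACKENDS:
        return backend, {}                 # stick to the same backend
    backend = random.choice(BACKENDS)      # first request: pick one
    return backend, {COOKIE: backend}      # client remembers it from now on

first, set_cookies = route({})             # first request sets the cookie
again, _ = route(set_cookies)              # later requests follow it
assert first == again
```

Note the failure mode this sketch shares with real sticky sessions: if the named backend dies, its clients lose their sessions — exactly why externalising session state is preferred.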
TLS Termination
The load balancer decrypts TLS, inspects/routes the request, then forwards to backends.
Options:
- TLS termination at LB — LB handles all crypto. Backends receive plain HTTP. Simplifies cert management but traffic is unencrypted internally.
- TLS re-encryption — LB terminates and re-encrypts to backends. More secure internally but double the crypto overhead.
- TLS passthrough — L4 only. LB forwards encrypted traffic untouched. No content inspection possible but end-to-end encryption is preserved.
Direct Server Return (DSR)
The load balancer handles only inbound traffic; backends reply directly to clients, bypassing the LB on the return path. Used at L4. Because response traffic typically dwarfs request traffic, skipping it dramatically increases effective throughput (10-40 Gbps per server). Used by Meta’s Katran.
Software Load Balancers
| Software | Layer | Notes |
|---|---|---|
| HAProxy | L4/L7 | De facto standard. HTTP/2, HTTP/3 (2.6+), gRPC. Very high performance. |
| nginx | L7 (L4 with stream) | Web server + reverse proxy. HTTP/2, HTTP/3 (1.25+). Widely deployed. |
| Envoy | L4/L7 | Cloud-native, designed for service meshes. xDS API for dynamic config. Used by Istio. |
| Traefik | L7 | Auto-discovery from Docker/Kubernetes. Built-in Let’s Encrypt. Good for simpler setups. |
| Caddy | L7 | Automatic HTTPS. Simple config. Good for small-medium deployments. |
| Katran | L4 | Meta’s eBPF/XDP-based L4 LB. Extreme throughput. |
eBPF/XDP Load Balancing
Modern L4 load balancers (Katran, Cilium) use eBPF and XDP to process packets at the NIC driver, before they enter the kernel network stack. This achieves near-hardware speeds on commodity servers. See eBPF and XDP.
Cloud Load Balancers
| Service | Layer | Scope | Notes |
|---|---|---|---|
| AWS ALB | L7 | Regional | HTTP/HTTPS, gRPC, WebSocket |
| AWS NLB | L4 | Regional | TCP/UDP, static IPs, extreme throughput |
| GCP Cloud LB | L4/L7 | Global | True anycast, one of the few global L7 LBs |
| Azure LB | L4 | Regional | TCP/UDP |
| Azure App Gateway | L7 | Regional | HTTP/HTTPS, WAF |
| Cloudflare LB | L7 | Global | Anycast, integrated with CDN and DDoS protection |
Global Server Load Balancing (GSLB)
Distributes traffic across geographically distributed data centres. Typically implemented via:
- DNS-based — Return different IPs based on client location (GeoDNS)
- Anycast — Multiple data centres advertise the same IP via BGP; routing sends clients to the nearest one
Kubernetes Load Balancing
- kube-proxy — Default L4 load balancing for Services (iptables or IPVS mode). Being replaced by eBPF in Cilium deployments.
- Ingress controllers — L7 load balancing (nginx, HAProxy, Envoy-based). Match on host/path rules.
- Gateway API — Newer, more expressive replacement for Ingress. Role-oriented, supports TCP/UDP/gRPC natively. Supported by Cilium, Envoy Gateway, nginx Gateway Fabric.
- Service mesh — Envoy sidecars (Istio) or eBPF (Cilium) for per-request L7 load balancing between services, with retries, circuit breaking, and observability.
See Also
- HTTP — L7 load balancers operate at the HTTP layer
- TCP — L4 load balancers operate at the TCP layer
- Container Networking — Kubernetes networking and service meshes
- Firewalls — eBPF/XDP used for both firewalling and load balancing
References
- HAProxy Documentation
- nginx Load Balancing
- Envoy Documentation
- Maglev: A Fast and Reliable Software Network Load Balancer — Google’s consistent hashing for L4
- Introduction to Modern Network Load Balancing — Matt Klein