Reference notes.

Load balancers distribute network traffic across multiple servers to improve availability, throughput, and reliability. They operate at different layers of the OSI Model with fundamentally different capabilities.

L4 vs L7

| Aspect | L4 (Transport) | L7 (Application) |
|---|---|---|
| Operates on | TCP/UDP 5-tuple (IPs, ports, protocol) | HTTP headers, URLs, cookies, body |
| Inspects content | No | Yes |
| TLS termination | Pass-through or terminate | Always terminates |
| Speed | Faster (no parsing) | Slower (must parse application data) |
| Routing intelligence | IP and port only | URL path, host header, cookie, gRPC method |
| Connection model | Per-connection | Per-request |
| Best for | Raw throughput, non-HTTP, databases | HTTP routing, canary deploys, API gateways |
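The L4 "per-connection" model in the table can be sketched as hashing the 5-tuple, so every packet of a flow reaches the same backend without any payload inspection (backend IPs here are illustrative):

```python
import hashlib

BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # illustrative pool

def l4_pick(src_ip, src_port, dst_ip, dst_port, proto):
    """Stateless L4 choice: hash the connection 5-tuple.
    Same flow -> same hash -> same backend; content is never parsed."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return BACKENDS[digest % len(BACKENDS)]
```

Because the decision depends only on the 5-tuple, two requests multiplexed over one TCP connection always land on the same backend, which is exactly the gRPC limitation discussed next.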

Why L7 Matters for HTTP/2 and gRPC

L4 is blind to application-level multiplexing. Two gRPC clients sharing one TCP connection look like a single flow to an L4 balancer — all requests go to the same backend. L7 can distribute individual requests across backends.

Modern Architecture: Layered Approach

Internet → L4 (edge, DDoS protection, raw speed)
             → L7 (intelligent routing, TLS termination, canary)
                  → Backend servers

Algorithms

| Algorithm | Description | Use Case |
|---|---|---|
| Round Robin | Rotate through servers sequentially | Equal-capacity servers, stateless |
| Weighted Round Robin | Rotate proportionally to weight | Servers with different capacities |
| Least Connections | Send to server with fewest active connections | Varying request durations |
| Weighted Least Connections | Least connections adjusted by weight | Mixed capacity + varying load |
| IP Hash | Hash source IP to pick server | Simple session persistence |
| Consistent Hashing | Hash ring distributes keys evenly; minimal remapping on changes | Caches, stateful services |
| Random Two Choices | Pick two random servers, send to the one with fewer connections | Simple, surprisingly effective |
| EWMA (Exponentially Weighted Moving Average) | Route based on a rolling latency average | Latency-sensitive; reduces tail latency 10-30% vs round robin |
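Random Two Choices is small enough to sketch in full. A toy simulation (server names and request count are illustrative) shows why it is "surprisingly effective" — sampling just two servers keeps load nearly even without scanning the whole pool:

```python
import random

def pick_backend(connections, rng=random):
    """Power-of-two-choices: sample two distinct backends at random
    and route to whichever currently has fewer connections."""
    a, b = rng.sample(list(connections), 2)
    return a if connections[a] <= connections[b] else b

# Toy simulation: 4 backends, 10,000 assignments (counts never drain,
# so this tracks cumulative load rather than in-flight connections).
rng = random.Random(42)
conns = {f"srv{i}": 0 for i in range(4)}
for _ in range(10_000):
    conns[pick_backend(conns, rng)] += 1

print(conns)  # the spread between busiest and idlest stays tiny
```

The appeal over Least Connections is that no backend ever needs a global view: two random samples per request are enough to avoid herding onto one server.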

Consistent hashing (Maglev-style) is used in production L4 balancers — when a backend is added or removed, only a small percentage of flows remap.
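A minimal hash ring illustrates the "minimal remapping" property (vnode count and MD5 are arbitrary choices for the sketch; production balancers like Maglev use their own lookup-table construction):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes: each backend is hashed
    onto the ring many times, and a key is served by the first vnode
    clockwise from the key's own hash."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node, vnodes=100):
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key):
        hashes = [h for h, _ in self._ring]
        i = bisect.bisect(hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]
```

Removing a backend moves only the keys that mapped to it; every other key keeps its owner, which is why caches and stateful services favour this over modulo hashing.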

Health Checks

| Level | Method | Checks |
|---|---|---|
| L4 | TCP connect / TLS handshake | Port is open, process is listening |
| L7 | HTTP GET to health endpoint | Application is responding correctly |
| Deep | Custom health endpoint | Database connected, dependencies healthy |

Configure short check intervals (5-10s) with a failure threshold (e.g. 3 consecutive failures before a backend is marked unhealthy). In Kubernetes environments, use separate liveness and readiness probes.
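The threshold logic above can be sketched as a small state machine (the default thresholds mirror common LB settings; the recovery threshold of 2 is an illustrative choice):

```python
class HealthTracker:
    """Mark a backend unhealthy after `fail_threshold` consecutive
    failed probes; mark it healthy again only after `rise_threshold`
    consecutive successes, so one lucky probe can't flap it back."""

    def __init__(self, fail_threshold=3, rise_threshold=2):
        self.fail_threshold = fail_threshold
        self.rise_threshold = rise_threshold
        self.healthy = True
        self._fails = 0
        self._passes = 0

    def record(self, probe_ok: bool) -> bool:
        if probe_ok:
            self._fails = 0
            self._passes += 1
            if not self.healthy and self._passes >= self.rise_threshold:
                self.healthy = True
        else:
            self._passes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

Requiring consecutive failures (rather than reacting to a single timeout) trades a few seconds of detection latency for immunity to transient blips.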

Session Persistence (Sticky Sessions)

Route all requests from the same client to the same backend.

Methods:

  • Source IP — Hash client IP. Breaks with NAT (many clients share one IP)
  • Cookie — Insert a cookie identifying the backend. Most reliable for HTTP
  • Header — Route on a custom header (e.g., user ID)
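The cookie method can be sketched as follows (the cookie name, backend addresses, and round-robin fallback are illustrative, not any particular balancer's scheme):

```python
import itertools

BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
COOKIE = "lb_backend"            # illustrative cookie name
_rr = itertools.cycle(BACKENDS)  # round robin for first-time clients

def choose_backend(cookies: dict) -> tuple[str, dict]:
    """Return (backend, cookies_to_set). A request carrying a valid
    stickiness cookie keeps its backend; a new client (or one pinned
    to a backend no longer in the pool) gets round robin plus a
    Set-Cookie pinning it to the chosen backend."""
    pinned = cookies.get(COOKIE)
    if pinned in BACKENDS:
        return pinned, {}
    backend = next(_rr)
    return backend, {COOKIE: backend}
```

Note the fallback path: when the pinned backend disappears, the client is silently re-pinned, which is exactly the moment any in-memory session state on the old backend is lost — hence the best practice below.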

Best practice: Avoid sticky sessions when possible. Store session state externally (Redis, database) so any backend can serve any request. This enables true stateless scaling and better fault tolerance.

TLS Termination

The load balancer decrypts TLS, inspects/routes the request, then forwards to backends.

Options:

  • TLS termination at LB — LB handles all crypto. Backends receive plain HTTP. Simplifies cert management but traffic is unencrypted internally.
  • TLS re-encryption — LB terminates and re-encrypts to backends. More secure internally but double the crypto overhead.
  • TLS passthrough — L4 only. LB forwards encrypted traffic untouched. No content inspection possible but end-to-end encryption is preserved.

Direct Server Return (DSR)

A return-path optimisation used at L4: the load balancer handles only inbound traffic, and backends reply directly to clients, bypassing the LB entirely on the return path. Because the LB never processes response data — usually the bulk of the traffic — throughput rises dramatically (on the order of 10-40 Gbps per server). Meta's Katran uses DSR.

Software Load Balancers

| Software | Layer | Notes |
|---|---|---|
| HAProxy | L4/L7 | De facto standard. HTTP/2, HTTP/3 (2.6+), gRPC. Very high performance. |
| nginx | L7 (L4 with stream module) | Web server + reverse proxy. HTTP/2, HTTP/3 (1.25+). Widely deployed. |
| Envoy | L4/L7 | Cloud-native, designed for service meshes. xDS API for dynamic config. Used by Istio. |
| Traefik | L7 | Auto-discovery from Docker/Kubernetes. Built-in Let's Encrypt. Good for simpler setups. |
| Caddy | L7 | Automatic HTTPS. Simple config. Good for small-to-medium deployments. |
| Katran | L4 | Meta's eBPF/XDP-based L4 LB. Extreme throughput. |

eBPF/XDP Load Balancing

Modern L4 load balancers (Katran, Cilium) use eBPF and XDP to process packets in the kernel before they reach the network stack. This achieves near-hardware speeds on commodity servers. See eBPF and XDP.

Cloud Load Balancers

| Service | Layer | Scope | Notes |
|---|---|---|---|
| AWS ALB | L7 | Regional | HTTP/HTTPS, gRPC, WebSocket |
| AWS NLB | L4 | Regional | TCP/UDP, static IPs, extreme throughput |
| GCP Cloud LB | L4/L7 | Global | True anycast, one of the few global L7 LBs |
| Azure LB | L4 | Regional | TCP/UDP |
| Azure App Gateway | L7 | Regional | HTTP/HTTPS, WAF |
| Cloudflare LB | L7 | Global | Anycast, integrated with CDN and DDoS protection |

Global Server Load Balancing (GSLB)

Distributes traffic across geographically distributed data centres. Typically implemented via:

  • DNS-based — Return different IPs based on client location (GeoDNS)
  • Anycast — Multiple data centres advertise the same IP via BGP; routing sends clients to the nearest one

Kubernetes Load Balancing

  • kube-proxy — Default L4 load balancing for Services (iptables or IPVS mode). Being replaced by eBPF in Cilium deployments.
  • Ingress controllers — L7 load balancing (nginx, HAProxy, Envoy-based). Match on host/path rules.
  • Gateway API — Newer, more expressive replacement for Ingress. Role-oriented, supports TCP/UDP/gRPC natively. Supported by Cilium, Envoy Gateway, nginx Gateway Fabric.
  • Service mesh — Envoy sidecars (Istio) or eBPF (Cilium) for per-request L7 load balancing between services, with retries, circuit breaking, and observability.

See Also

  • HTTP — L7 load balancers operate at the HTTP layer
  • TCP — L4 load balancers operate at the TCP layer
  • Container Networking — Kubernetes networking and service meshes
  • Firewalls — eBPF/XDP used for both firewalling and load balancing

References