CAP Theorem

A distributed system can only guarantee two of three:

  • Consistency — every read receives the most recent write
  • Availability — every request receives a response
  • Partition tolerance — system continues despite network failures

In practice, network partitions always happen, so you’re really choosing between CP (consistency) and AP (availability). Most real-world systems are somewhere on the spectrum.

Consistency Patterns

  • Strong consistency — reads always return the latest write. Simpler to reason about, but slower (e.g. ACID databases)
  • Eventual consistency — reads may return stale data temporarily, but will converge. Higher availability and lower latency (e.g. DynamoDB, DNS)
  • Read-your-writes — a user always sees their own writes, even if other users see stale data
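Read-your-writes can be sketched as a routing decision: send a user's reads to the primary for a short window after their own write, since replicas may still be catching up. This is a minimal illustration (the `Router` class, the `REPLICATION_LAG` constant, and the server names are all hypothetical):

```python
import time

REPLICATION_LAG = 1.0  # assumed worst-case replica lag, in seconds

class Router:
    """Route reads to a replica, except right after a user's own write."""
    def __init__(self):
        self.last_write = {}  # user_id -> timestamp of their last write

    def record_write(self, user_id):
        self.last_write[user_id] = time.monotonic()

    def choose_backend(self, user_id):
        wrote_at = self.last_write.get(user_id)
        # If this user wrote recently, replicas may still be stale *for them*:
        # send their read to the primary to guarantee read-your-writes.
        if wrote_at is not None and time.monotonic() - wrote_at < REPLICATION_LAG:
            return "primary"
        return "replica"

router = Router()
router.record_write("alice")
print(router.choose_backend("alice"))  # primary (just wrote)
print(router.choose_backend("bob"))    # replica (no recent write)
```

Other users (who didn't write) still read from replicas, so the system stays eventually consistent overall.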

Scaling

Vertical vs Horizontal

  • Vertical (scale up) — bigger machine. Simple, but has a ceiling
  • Horizontal (scale out) — more machines. Complex, but no ceiling

Database Scaling

  • Read replicas — offload reads to copies
  • Sharding — split data across databases by a key (e.g. user ID). Hard to change later, so avoid until necessary
  • Connection pooling — reuse database connections (PgBouncer, ProxySQL)
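A minimal sketch of why sharding by key is hard to change later: with hash-modulo sharding, changing the shard count remaps most keys, forcing a data migration. (`shard_for` and `NUM_SHARDS` are illustrative names, not a real library API.)

```python
import hashlib

NUM_SHARDS = 4  # changing this later remaps most keys -> expensive re-shard

def shard_for(user_id: str) -> int:
    """Map a key to a shard deterministically via a stable hash."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same key always lands on the same shard:
assert shard_for("user-42") == shard_for("user-42")
print(shard_for("user-42"), shard_for("user-7"))
```

Note the use of a stable hash (`hashlib.md5`) rather than Python's built-in `hash()`, which is randomized per process and would break routing across restarts.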

Load Balancing

Distributes traffic across multiple servers. Strategies:

  • Round robin — simple rotation
  • Least connections — send to the server with fewest active connections
  • Weighted — route more traffic to stronger servers
  • Consistent hashing — useful for caching layers, minimises redistribution when servers are added/removed
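The least-connections strategy can be sketched in a few lines: track in-flight requests per backend and pick the minimum. (A toy sketch with hypothetical server names; real load balancers like HAProxy or NGINX implement this for you.)

```python
class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # min() over the dict keys, comparing active connection counts;
        # ties resolve to the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

lb = LeastConnections(["app1", "app2"])
a = lb.acquire()  # app1 (both idle, tie goes to the first)
b = lb.acquire()  # app2 (app1 now has one active connection)
lb.release(a)
```

Round robin is even simpler (rotate an index); least connections adapts better when request durations vary widely.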

Caching

  • Cache invalidation is one of the hardest problems in CS
  • Cache-aside — application checks cache first, loads from DB on miss, writes to cache
  • Write-through — write to cache and DB simultaneously
  • Write-behind — write to cache, asynchronously persist to DB. Faster writes, risk of data loss
  • TTL (time to live) — expire entries after a duration. Simple but may serve stale data
  • Layers: browser → CDN → reverse proxy → application cache → database cache
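Cache-aside, the most common pattern above, can be sketched with plain dicts standing in for the cache and the database:

```python
cache = {}
db = {"user:1": {"name": "Ada"}}  # stand-in for the real database

def get(key):
    """Cache-aside read: check cache, fall back to DB, populate cache."""
    if key in cache:
        return cache[key]          # hit: skip the database entirely
    value = db.get(key)            # miss: load from the source of truth
    if value is not None:
        cache[key] = value         # populate so the next read is a hit
    return value

get("user:1")              # miss: reads DB, fills cache
assert "user:1" in cache
get("user:1")              # hit: served from cache
```

The invalidation problem shows up here: if `db["user:1"]` changes, the cached copy is stale until it is explicitly deleted or its TTL expires.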

Rate Limiting

  • Protect services from overload
  • Token bucket — tokens refill at a fixed rate, each request consumes a token
  • Sliding window — count requests in a rolling time window
  • Apply at API gateway level, not deep in the application
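The token bucket can be sketched in a few lines: tokens accumulate at a fixed rate up to a burst capacity, and each request spends one. (A single-process toy; production rate limiters usually keep this state in something shared like Redis.)

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`; refill at `rate` tokens per second."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=2)
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
```

Unlike a fixed window, the bucket permits short bursts (up to `capacity`) while still enforcing the average rate.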

Retries

  • Use exponential backoff with jitter — don’t hammer a failing service
  • Set a max retry count
  • Make operations idempotent so retries are safe
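The points above can be combined into one sketch: exponential backoff with full jitter and a max attempt count. (The `retry` helper and its parameters are illustrative, not a library API.)

```python
import random
import time

def retry(fn, max_attempts=5, base=0.1, cap=5.0):
    """Call fn(); on failure, sleep up to base * 2**attempt (capped), then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: sleep a random amount up to the exponential cap,
            # so many synchronized clients don't retry in lockstep.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry(flaky))  # succeeds on the third attempt
```

This only helps if `fn` is idempotent; retrying a non-idempotent write (e.g. "charge the card") can duplicate the effect.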

Communication Patterns

  • Synchronous (request/response) — HTTP/REST, gRPC. Simpler, but creates coupling
  • Asynchronous (event-driven) — message queues, pub/sub. Decoupled, but harder to debug
  • gRPC — binary protocol over HTTP/2, faster than REST, strongly typed via protobuf. Good for service-to-service communication
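The asynchronous pattern can be sketched with an in-process queue standing in for a message broker: the producer enqueues events and moves on, while a worker consumes them independently. (Event names and the sentinel shutdown are illustrative; real systems would use a broker like RabbitMQ or Kafka.)

```python
import queue
import threading

events = queue.Queue()
processed = []

def worker():
    """Consumer: drains events independently of the producer's pace."""
    while True:
        event = events.get()
        if event is None:  # sentinel: shut down
            break
        processed.append(f"handled {event}")

t = threading.Thread(target=worker)
t.start()

# Producer enqueues and returns immediately -- it never waits for
# processing to finish. That's the decoupling (and the debugging cost).
events.put("user.created")
events.put("user.updated")
events.put(None)
t.join()
print(processed)  # ['handled user.created', 'handled user.updated']
```

Contrast with the synchronous case, where the caller blocks until the callee responds and a slow callee stalls the whole chain.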
