Microservices is an architectural style that structures an application as a collection of small, autonomous services modelled around a business domain. Each service runs in its own process, owns its data, and communicates via well-defined APIs.

For detailed best practices and implementation guidance, see The Art of Microservices: A Guide to Best Practices.

Monolith vs Microservices

Aspect           Monolith                       Microservices
Deployment       Single unit                    Independent services
Scaling          Scale entire application       Scale individual services
Technology       Single tech stack              Polyglot (multiple languages/frameworks)
Data             Shared database                Database per service
Complexity       Simpler to develop initially   Distributed systems complexity
Team structure   Larger, centralised teams      Small, autonomous teams

A monolith isn’t inherently bad—it’s often the right choice for small teams or early-stage products. The Monolith First approach suggests starting simple and extracting services when the need arises.

When to Use Microservices

Good fit:

  • Large, complex domains with clear bounded contexts
  • Need for independent scaling of specific components
  • Multiple teams working in parallel
  • Different parts require different technology choices
  • High availability requirements for specific functions

Poor fit:

  • Small teams or early-stage startups
  • Simple domains without clear boundaries
  • When you don’t have DevOps maturity (CI/CD, monitoring, containerisation)
  • Tight coupling between components that can’t be easily separated

Warning signs you’re not ready:

  • Struggling to define service boundaries
  • Lacking automated testing and deployment pipelines
  • No experience operating distributed systems
  • Team doesn’t understand the domain well enough

Key Principles

Single Responsibility

Each service should do one thing well, aligned with a specific business capability. If you can’t describe what a service does in one sentence, it’s probably doing too much.

Loose Coupling

Services should be independent—changes to one shouldn’t require changes to others. This means:

  • No shared databases between services
  • Communication only through published APIs
  • No shared code libraries containing business logic

Independent Deployment

Each service can be deployed without coordinating with other teams. This requires:

  • Backwards-compatible API changes
  • Feature flags for gradual rollouts
  • Contract testing between services

Decentralised Governance

Teams own their services end-to-end (build it, run it). This includes technology choices, deployment schedules, and operational responsibility.

Communication Patterns

Synchronous Communication

Request-response model where the caller waits for a response.

REST — HTTP-based, human-readable, widely supported

  • Best for: CRUD operations, public APIs, simple integrations
  • Trade-offs: Higher latency than binary protocols, temporal coupling (the caller blocks until the callee responds)

gRPC — Binary protocol using Protocol Buffers

  • Best for: Internal service-to-service, high performance, streaming
  • Trade-offs: Requires code generation, less human-readable
  • gRPC Documentation

GraphQL — Query language allowing clients to request specific data

  • Best for: Complex data requirements, mobile clients with bandwidth constraints
  • Trade-offs: Caching complexity, potential for expensive queries

Asynchronous Communication

Fire-and-forget or event-driven patterns that decouple sender and receiver.

Message Queues — Point-to-point messaging (RabbitMQ, Amazon SQS)

  • Best for: Task distribution, work queues, reliable delivery

Event Streaming — Publish-subscribe with event log (Apache Kafka, Amazon Kinesis)

  • Best for: Event sourcing, real-time analytics, decoupling services

Benefits of async:

  • Temporal decoupling (services don’t need to be available simultaneously)
  • Better resilience (messages can be retried)
  • Natural load levelling

Challenges:

  • Eventual consistency
  • Message ordering
  • Debugging distributed transactions
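The temporal decoupling described above can be sketched with an in-process queue standing in for a broker; in production this would be a RabbitMQ queue or Kafka topic, and the event shape here is illustrative:

```python
import json
import queue

# Stand-in for a broker queue; a real deployment would use RabbitMQ,
# SQS, or a Kafka topic rather than an in-process structure.
orders_queue = queue.Queue()

def publish_order_created(order_id, total):
    """Producer: fire-and-forget — it does not wait for any consumer."""
    event = {"type": "OrderCreated", "order_id": order_id, "total": total}
    orders_queue.put(json.dumps(event))

def consume_one():
    """Consumer: processes whenever it comes online; messages wait safely."""
    event = json.loads(orders_queue.get())
    return f"processed {event['type']} for order {event['order_id']}"

publish_order_created("o-42", 99.50)   # sender continues immediately
result = consume_one()                 # receiver may run much later
```

The producer returns as soon as the message is enqueued, which is exactly the temporal decoupling and load levelling the list above describes.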

Data Management

Database per Service

Each service owns its data and exposes it only through its API; no other service may read or write its database directly.

Benefits:

  • Services can choose the best database for their needs (polyglot persistence)
  • Schema changes don’t affect other services
  • Independent scaling

Challenges:

  • No cross-service joins
  • Distributed transactions are complex
  • Data duplication may be necessary

Eventual Consistency

In distributed systems, strong consistency across services is often impractical. Instead, accept that data will be consistent eventually.

Patterns:

  • Saga pattern — Coordinate distributed transactions as a sequence of local transactions with compensating actions for rollback
  • Event sourcing — Store state changes as a sequence of events
  • CQRS — Separate read and write models for different consistency requirements

See Saga Pattern for implementation details.
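As a minimal illustration of the saga pattern, each step below is a local transaction paired with a compensating action, and a failure triggers the compensations in reverse order. The step names (stock, card, shipping) are hypothetical:

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, undo completed
    steps in reverse order. Each action is a local transaction owned
    by a single service."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            for undo in reversed(completed):
                undo()
            return "rolled back"
    return "committed"

log = []
def reserve_stock(): log.append("reserve stock")
def release_stock(): log.append("release stock")
def charge_card():   log.append("charge card")
def refund_card():   log.append("refund card")
def ship_order():    raise RuntimeError("carrier unavailable")

steps = [
    (reserve_stock, release_stock),
    (charge_card, refund_card),
    (ship_order, lambda: None),
]
outcome = run_saga(steps)
```

Because the shipping step fails, the card is refunded and the stock released, leaving every service consistent without a distributed transaction.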

Service Discovery and Load Balancing

Services need to find each other in a dynamic environment where instances come and go.

Service Discovery

Client-side discovery — Client queries a service registry and chooses an instance

  • Examples: Netflix Eureka, Consul

Server-side discovery — Client makes request to a load balancer/router that queries the registry
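Client-side discovery can be sketched as a registry lookup plus a client-side choice of instance. The in-memory registry and addresses below are illustrative; a real system would query Consul or Eureka over an API and expire instances that stop sending heartbeats:

```python
import random

# Toy in-memory registry; Consul or Eureka would serve this data over
# an API and remove instances that fail health checks.
registry = {
    "orders": ["10.0.0.5:8080", "10.0.0.6:8080"],
    "billing": ["10.0.1.9:8080"],
}

def discover(service_name):
    """Client-side discovery: look up live instances, then pick one
    (random choice is a crude form of client-side load balancing)."""
    instances = registry.get(service_name)
    if not instances:
        raise LookupError(f"no instances registered for {service_name}")
    return random.choice(instances)

address = discover("orders")
```

In server-side discovery the same lookup happens behind a load balancer, so the client only ever sees one stable address.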

API Gateway

Single entry point that handles:

  • Request routing
  • Authentication and authorisation
  • Rate limiting
  • Response aggregation
  • Protocol translation

Examples: Kong, AWS API Gateway, Traefik, Ambassador

Resilience Patterns

Distributed systems will fail. Design for failure, not against it.

Circuit Breaker

Prevents cascading failures by stopping requests to a failing service. Three states:

  1. Closed — Requests flow normally
  2. Open — Requests fail immediately (service is down)
  3. Half-open — Limited requests to test recovery

Libraries: Resilience4j, Hystrix (deprecated), Polly (.NET)
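The three states can be sketched as a small class; the thresholds and timings here are illustrative, and the libraries above provide production-grade versions with proper concurrency handling:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: closed -> open after N failures,
    open -> half-open after a cooldown, half-open -> closed on success."""
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn):
        if self.state == "open":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip to open
            raise
        else:
            self.failures = 0
            self.opened_at = None  # success closes the circuit
            return result

# Exercise the state machine with a fake clock.
fake_time = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0,
                         clock=lambda: fake_time[0])

def flaky():
    raise ConnectionError("service down")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
state_after_failures = breaker.state   # open: fail fast, no real calls
fake_time[0] = 11.0
state_after_cooldown = breaker.state   # half-open: probe allowed
recovered = breaker.call(lambda: "ok")
final_state = breaker.state            # closed again
```

Failing fast while open is the point: callers get an immediate error instead of piling up requests against a service that is already struggling.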

Retry with Backoff

Retry failed requests with exponential backoff and jitter to avoid thundering herd problems.

wait_time = min(base * 2^attempt + random_jitter, max_wait)
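The formula above can be sketched in Python as follows; the operation and tuning values are illustrative, and the sleep function is injectable so the waits can be observed:

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base=0.5, max_wait=30.0,
                       sleep=time.sleep):
    """Retry a failing operation with exponential backoff plus jitter:
    wait_time = min(base * 2^attempt + random_jitter, max_wait)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last failure
            jitter = random.uniform(0, base)  # spreads retrying clients apart
            wait_time = min(base * 2 ** attempt + jitter, max_wait)
            sleep(wait_time)

# Example: an operation that fails twice, then succeeds.
calls = {"n": 0}
def sometimes_fails():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "success"

waits = []  # capture the computed delays instead of actually sleeping
result = retry_with_backoff(sometimes_fails, sleep=waits.append)
```

The jitter term matters: without it, many clients that failed at the same moment would all retry at the same moment, recreating the thundering herd.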

Timeout

Always set timeouts on external calls. A missing timeout can cause cascading failures when a downstream service hangs.
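For example, with Python's standard-library HTTP client, a timeout bounds how long a hung downstream service can hold the caller (the localhost URL below is just a way to demonstrate fast failure):

```python
import socket
import urllib.error
import urllib.request

def fetch_with_timeout(url, timeout_seconds=2.0):
    """Never call a downstream service without a deadline: a hung
    service would otherwise hold this caller's thread indefinitely."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return resp.read()
    except (socket.timeout, urllib.error.URLError) as exc:
        # Deadline exceeded or unreachable: fail fast so the caller can
        # retry, fall back, or surface an error instead of hanging.
        raise TimeoutError(f"call to {url} failed: {exc}") from exc

# Example: nothing listens on port 1, so the call fails immediately
# instead of blocking the caller.
try:
    fetch_with_timeout("http://127.0.0.1:1/", timeout_seconds=1.0)
    outcome = "responded"
except TimeoutError:
    outcome = "failed fast"
```

The same principle applies to every client library: database drivers, message producers, and RPC stubs all need explicit deadlines.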

Bulkhead

Isolate failures by partitioning resources. If one service’s connection pool is exhausted, it shouldn’t affect calls to other services.
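A bulkhead can be sketched with a bounded semaphore per downstream dependency, so one dependency can exhaust only its own slots; the pool sizes and service names here are illustrative:

```python
import threading

class Bulkhead:
    """Cap concurrent calls to one dependency; when the cap is reached,
    reject immediately instead of queueing callers indefinitely."""
    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn):
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"bulkhead full for {self.name}")
        try:
            return fn()
        finally:
            self._slots.release()

# Separate pools: exhausting "payments" slots never blocks "catalog".
payments = Bulkhead("payments", max_concurrent=2)
catalog = Bulkhead("catalog", max_concurrent=2)

# Simulate exhaustion by nesting calls so two slots are held at once;
# the third concurrent call is rejected rather than queued.
try:
    payments.call(
        lambda: payments.call(
            lambda: payments.call(lambda: "deep")))
    payments_rejected = False
except RuntimeError:
    payments_rejected = True

catalog_result = catalog.call(lambda: "ok")  # unaffected pool
```

Rejecting at the bulkhead keeps a slow dependency's backlog from consuming threads or connections needed for healthy dependencies.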

Rate Limiting

Protect services from being overwhelmed by limiting request rates per client or globally.
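One common implementation is a token bucket per client; the capacity and refill rate below are illustrative, and the injectable clock makes the refill behaviour visible:

```python
class TokenBucket:
    """Token-bucket rate limiter: tokens refill at `rate` per second up
    to `capacity`; each request spends one token or is rejected."""
    def __init__(self, capacity, rate, clock):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

fake_time = [0.0]
bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: fake_time[0])
burst = [bucket.allow() for _ in range(4)]  # burst of 4: only 3 allowed
fake_time[0] = 2.0                          # 2 seconds pass: 2 tokens refill
after_wait = bucket.allow()
```

The bucket permits short bursts up to its capacity while enforcing the average rate, which is usually what clients expect from an API limit.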

Fallback

Provide degraded functionality when a service is unavailable (cached data, default values, alternative service).
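A fallback can be as simple as wrapping the call and serving last-known-good cached data on failure; the cache contents and function names here are hypothetical:

```python
# Hypothetical last-known-good cache, refreshed on every successful call.
cache = {"recommendations": ["bestseller-1", "bestseller-2"]}

def get_recommendations(fetch):
    """Try the live service; on failure, degrade to cached data rather
    than surfacing an error to the user."""
    try:
        fresh = fetch()
        cache["recommendations"] = fresh  # keep the fallback warm
        return fresh, "live"
    except Exception:
        return cache["recommendations"], "cached fallback"

def service_down():
    raise ConnectionError("recommendations service unavailable")

result, source = get_recommendations(service_down)
```

Fallbacks pair naturally with circuit breakers: once the breaker opens, the fallback supplies the degraded response until the service recovers.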

Observability

The three pillars for understanding distributed systems:

Logs

Structured, contextual logs with correlation IDs to trace requests across services.
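As a sketch of the idea, the handler below reuses the caller's correlation ID when present and propagates it downstream; the `X-Correlation-ID` header name is a common convention, and the field names are illustrative:

```python
import json
import uuid

def handle_request(headers, message, log_sink):
    """Reuse the caller's correlation ID if present, otherwise mint one;
    every log line carries it so one request can be traced end to end."""
    correlation_id = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    log_sink.append(json.dumps({
        "service": "orders",  # illustrative service name
        "correlation_id": correlation_id,
        "message": message,
    }))
    # Return headers to attach to any downstream calls, so the same ID
    # appears in every service's logs for this request.
    return {"X-Correlation-ID": correlation_id}

logs = []
outgoing = handle_request({"X-Correlation-ID": "req-123"},
                          "order received", logs)
```

Searching all services' logs for one correlation ID then reconstructs the request's path, which is the manual counterpart of distributed tracing.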

Metrics

Numeric measurements: request rate, error rate, latency percentiles, saturation.

Distributed Tracing

Follow a request as it flows through multiple services. Essential for debugging latency issues.

Key tools: Jaeger, Zipkin, OpenTelemetry

See Observability for detailed guidance on the three pillars, alerting strategies, and tooling.

Deployment

Containerisation

Package services with their dependencies for consistent deployment across environments.

Deployment Strategies

  • Rolling deployment — Gradually replace old instances
  • Blue-green — Run two environments, switch traffic instantly
  • Canary — Route small percentage of traffic to new version

Infrastructure

  • Service Mesh — Dedicated infrastructure layer for service-to-service communication (Istio, Linkerd)
  • Infrastructure as Code — Version-controlled infrastructure definitions

Anti-Patterns

  • Distributed monolith — Services that must be deployed together
  • Shared database — Multiple services accessing the same database
  • Synchronous chains — Long chains of synchronous calls
  • Nano-services — Services so small they add overhead without benefit
  • Lack of monitoring — Can’t debug what you can’t see