Microservices is an architectural style that structures an application as a collection of small, autonomous services modelled around a business domain. Each service runs in its own process, owns its data, and communicates via well-defined APIs.
For detailed best practices and implementation guidance, see The Art of Microservices: A Guide to Best Practices.
Monolith vs Microservices
| Aspect | Monolith | Microservices |
|---|---|---|
| Deployment | Single unit | Independent services |
| Scaling | Scale entire application | Scale individual services |
| Technology | Single tech stack | Polyglot (multiple languages/frameworks) |
| Data | Shared database | Database per service |
| Complexity | Simpler to develop initially | Distributed systems complexity |
| Team structure | Larger, centralised teams | Small, autonomous teams |
A monolith isn’t inherently bad—it’s often the right choice for small teams or early-stage products. The Monolith First approach suggests starting simple and extracting services when the need arises.
When to Use Microservices
Good fit:
- Large, complex domains with clear bounded contexts
- Need for independent scaling of specific components
- Multiple teams working in parallel
- Different parts require different technology choices
- High availability requirements for specific functions
Poor fit:
- Small teams or early-stage startups
- Simple domains without clear boundaries
- Lack of DevOps maturity (CI/CD, monitoring, containerisation)
- Tight coupling between components that can’t be easily separated
Warning signs you’re not ready:
- Struggling to define service boundaries
- Lacking automated testing and deployment pipelines
- No experience operating distributed systems
- Team doesn’t understand the domain well enough
Key Principles
Single Responsibility
Each service should do one thing well, aligned with a specific business capability. If you can’t describe what a service does in one sentence, it’s probably doing too much.
Loose Coupling
Services should be independent—changes to one shouldn’t require changes to others. This means:
- No shared databases between services
- Communication only through published APIs
- No shared code libraries containing business logic
Independent Deployment
Each service can be deployed without coordinating with other teams. This requires:
- Backwards-compatible API changes
- Feature flags for gradual rollouts
- Contract testing between services
Decentralised Governance
Teams own their services end-to-end (build it, run it). This includes technology choices, deployment schedules, and operational responsibility.
Communication Patterns
Synchronous Communication
Request-response model where the caller waits for a response.
REST — HTTP-based, human-readable, widely supported
- Best for: CRUD operations, public APIs, simple integrations
- Trade-offs: Higher latency than binary protocols; temporal coupling, as the caller blocks until the callee responds
gRPC — Binary protocol using Protocol Buffers
- Best for: Internal service-to-service, high performance, streaming
- Trade-offs: Requires code generation, less human-readable
- gRPC Documentation
GraphQL — Query language allowing clients to request specific data
- Best for: Complex data requirements, mobile clients with bandwidth constraints
- Trade-offs: Caching complexity, potential for expensive queries
Asynchronous Communication
Fire-and-forget or event-driven patterns that decouple sender and receiver.
Message Queues — Point-to-point messaging (RabbitMQ, Amazon SQS)
- Best for: Task distribution, work queues, reliable delivery
Event Streaming — Publish-subscribe with event log (Apache Kafka, Amazon Kinesis)
- Best for: Event sourcing, real-time analytics, decoupling services
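The difference in delivery semantics is the key point. In a work queue, each message is consumed by exactly one worker. In an event log, every subscriber reads every event from its own offset (Kafka tracks offsets per consumer group). The sketch below is a minimal in-memory illustration, not a substitute for either broker. `WorkQueue` and `EventLog` are hypothetical names.

```python
from __future__ import annotations

from collections import deque


class WorkQueue:
    """Point-to-point: each message is delivered to exactly one consumer."""

    def __init__(self) -> None:
        self._messages: deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> str | None:
        # Removing the message means no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class EventLog:
    """Publish-subscribe over an append-only log: every subscriber sees every event."""

    def __init__(self) -> None:
        self._events: list[str] = []
        self._offsets: dict[str, int] = {}

    def publish(self, event: str) -> None:
        self._events.append(event)

    def poll(self, subscriber: str) -> list[str]:
        # Each subscriber tracks its own read position; events are never removed,
        # so a new subscriber can replay history from offset 0.
        start = self._offsets.get(subscriber, 0)
        self._offsets[subscriber] = len(self._events)
        return self._events[start:]
```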
Benefits of async:
- Temporal decoupling (services don’t need to be available simultaneously)
- Better resilience (messages can be retried)
- Natural load levelling
Challenges:
- Eventual consistency
- Message ordering
- Debugging distributed transactions
Data Management
Database per Service
Each service owns its data and exposes it only through its API. Other services cannot directly access another service’s database.
Benefits:
- Services can choose the best database for their needs (polyglot persistence)
- Schema changes don’t affect other services
- Independent scaling
Challenges:
- No cross-service joins
- Distributed transactions are complex
- Data duplication may be necessary
Eventual Consistency
In distributed systems, strong consistency across services is often impractical. Instead, accept that copies of data held by different services may briefly disagree and will converge once all updates have propagated.
Patterns:
- Saga pattern — Coordinate distributed transactions as a sequence of local transactions with compensating actions for rollback
- Event sourcing — Store state changes as a sequence of events
- CQRS — Separate read and write models for different consistency requirements
See Saga Pattern for implementation details.
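The sketch below shows a minimal orchestration-style saga. It assumes each step is a local transaction in one service, paired with a compensating action. If a step fails, the completed steps are compensated in reverse order. The step names (`reserve_stock`, `charge_card`) are hypothetical. A real implementation would also persist saga state so that it can resume after a crash.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # local transaction in one service
    compensate: Callable[[], None]  # semantic undo for that transaction


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


# Usage: the second step fails, so the first step's compensation runs.
log: list[str] = []


def decline() -> None:
    raise RuntimeError("payment declined")


ok = run_saga([
    SagaStep("reserve_stock", lambda: log.append("reserve"), lambda: log.append("release")),
    SagaStep("charge_card", decline, lambda: log.append("refund")),
])
# ok is False and log == ["reserve", "release"]
```

The failed step itself is not compensated, because its local transaction is assumed not to have committed.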
Service Discovery and Load Balancing
Services need to find each other in a dynamic environment where instances come and go.
Service Discovery
Client-side discovery — Client queries a service registry and chooses an instance
- Examples: Netflix Eureka, Consul
Server-side discovery — Client makes request to a load balancer/router that queries the registry
- Examples: AWS ALB, Kubernetes Services, Service Mesh
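The sketch below illustrates client-side discovery with an in-memory registry standing in for something like Eureka or Consul. Instances register and deregister, and the client picks an instance itself using round-robin. `ServiceRegistry` and `DiscoveryClient` are hypothetical names and do not reflect either product's API.

```python
from __future__ import annotations

import itertools
from collections import defaultdict


class ServiceRegistry:
    """Hypothetical in-memory registry mapping service names to instance addresses."""

    def __init__(self) -> None:
        self._instances: dict[str, list[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, []).append(address)

    def deregister(self, service: str, address: str) -> None:
        instances = self._instances.get(service, [])
        if address in instances:
            instances.remove(address)

    def lookup(self, service: str) -> list[str]:
        return list(self._instances.get(service, []))


class DiscoveryClient:
    """Client-side discovery: the caller queries the registry and balances load itself."""

    def __init__(self, registry: ServiceRegistry) -> None:
        self._registry = registry
        # One round-robin counter per service name.
        self._counters: defaultdict[str, itertools.count] = defaultdict(itertools.count)

    def choose(self, service: str) -> str:
        # Look up on every call so that newly added or removed instances are picked up.
        instances = self._registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        return instances[next(self._counters[service]) % len(instances)]
```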
API Gateway
Single entry point that handles:
- Request routing
- Authentication and authorisation
- Rate limiting
- Response aggregation
- Protocol translation
Examples: Kong, AWS API Gateway, Traefik, Ambassador
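This minimal sketch covers two of the gateway responsibilities listed above: authentication at the edge and request routing. It assumes longest-prefix matching on the path and a simple bearer-token check. The route table, internal hostnames and `handle` function are hypothetical. Real gateways such as Kong or Traefik configure routing declaratively and delegate token validation to an identity provider.

```python
from __future__ import annotations

# Hypothetical route table: public path prefix -> internal service base URL.
ROUTES = {
    "/api/orders": "http://orders.internal:8080",
    "/api/orders/export": "http://reporting.internal:8080",
    "/api/users": "http://users.internal:8080",
}

VALID_TOKENS = {"secret-token"}  # stand-in for a real identity provider


def handle(path: str, headers: dict[str, str]) -> tuple[int, str]:
    """Return (status, upstream URL or error message) for an incoming request."""
    # Authenticate once at the edge, before any routing happens.
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer ") or auth[len("Bearer "):] not in VALID_TOKENS:
        return 401, "unauthorised"

    # Longest matching prefix wins, so /api/orders/export goes to reporting.
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    if not matches:
        return 404, "no route"
    prefix = max(matches, key=len)
    return 200, ROUTES[prefix] + path[len(prefix):]
```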
Resilience Patterns
Distributed systems will fail. Rather than trying to prevent every failure, design services to expect failures and contain them.
Circuit Breaker
Prevents cascading failures by stopping requests to a failing service. Three states:
- Closed — Requests flow normally
- Open — Requests fail immediately (service is down)
- Half-open — Limited requests to test recovery
Libraries: Resilience4j, Hystrix (deprecated), Polly (.NET)
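The sketch below is a minimal, single-threaded version of the three-state machine. It assumes a consecutive-failure threshold and a fixed cool-down before the half-open trial request. The class and parameter names are hypothetical. Libraries such as Resilience4j typically use sliding windows and failure-rate thresholds instead. The clock is injectable so the behaviour can be tested without waiting.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open before a trial
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half-open"  # cool-down elapsed: let a trial request through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self._failures = 0
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial in half-open, or too many failures in closed, opens the circuit.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```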
Retry with Backoff
Retry failed requests with exponential backoff and jitter to avoid thundering herd problems.
wait_time = min(base * 2^attempt + random_jitter, max_wait)
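The sketch below applies the formula above. It assumes jitter drawn uniformly from `[0, base]`, which is a common choice but is not specified in the text. The `sleep` function is supplied by the caller so the code can be tested without waiting. `backoff_delay` and `retry_with_backoff` are hypothetical names.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float, max_wait: float,
                  rng: random.Random) -> float:
    """wait_time = min(base * 2^attempt + random_jitter, max_wait)."""
    jitter = rng.uniform(0, base)  # assumption: jitter bounded by the base delay
    return min(base * 2 ** attempt + jitter, max_wait)


def retry_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                       base: float = 0.1, max_wait: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep,
                       rng: random.Random | None = None) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base, max_wait, rng))
    raise AssertionError("unreachable")
```

Retries are only safe for idempotent operations. Otherwise, a request that succeeded but timed out on the way back may be applied twice.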
Timeout
Always set timeouts on external calls. A missing timeout can cause cascading failures when a downstream service hangs.
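The sketch below uses Python's `asyncio.wait_for` and assumes the downstream call is a coroutine. `fetch_inventory` is a hypothetical stand-in that simulates a slow or hung service with `asyncio.sleep`.

```python
from __future__ import annotations

import asyncio


async def fetch_inventory(delay: float) -> dict[str, int]:
    """Hypothetical downstream call; `delay` simulates a slow or hung service."""
    await asyncio.sleep(delay)
    return {"sku-123": 7}


async def get_inventory(delay: float, timeout: float = 0.5) -> dict[str, int] | None:
    try:
        # Bound the wait; wait_for cancels the pending call when the timeout expires.
        return await asyncio.wait_for(fetch_inventory(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Give up promptly instead of holding this request (and its caller) open.
        return None
```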
Bulkhead
Isolate failures by partitioning resources, for example by giving each downstream dependency its own connection or thread pool. If one dependency's pool is exhausted, calls to other dependencies are unaffected.
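The sketch below assumes each downstream service gets its own bounded concurrency budget. A `threading.BoundedSemaphore` stands in for a per-service connection pool. A call that cannot get a slot is rejected immediately rather than queued, so a slow `payments` dependency cannot consume capacity reserved for `catalogue`. The service names and limits are hypothetical.

```python
from __future__ import annotations

import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised when a service's compartment has no free slots."""


class Bulkhead:
    def __init__(self, limits: dict[str, int]) -> None:
        # One independent semaphore (compartment) per downstream service.
        self._slots = {name: threading.BoundedSemaphore(n) for name, n in limits.items()}

    def call(self, service: str, fn: Callable[[], T]) -> T:
        slot = self._slots[service]
        # Non-blocking acquire: reject fast rather than pile up waiting threads.
        if not slot.acquire(blocking=False):
            raise BulkheadFullError(f"{service} compartment is full")
        try:
            return fn()
        finally:
            slot.release()
```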
Rate Limiting
Protect services from being overwhelmed by limiting request rates per client or globally.
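One common technique is the token bucket. The sketch below assumes that choice, since the text does not prescribe an algorithm. Each client gets a bucket holding up to `capacity` tokens, refilled at `rate` tokens per second. A request spends one token or is rejected. A global limit would be a single shared bucket. The class names are hypothetical, and the clock is injectable for testing.

```python
from __future__ import annotations

import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate          # tokens added per second
        self._clock = clock
        self._tokens = capacity   # start full so short bursts are allowed
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class PerClientLimiter:
    """One bucket per client key (for example an API key or IP address)."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._make = lambda: TokenBucket(capacity, rate, clock)
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        if client not in self._buckets:
            self._buckets[client] = self._make()
        return self._buckets[client].allow()
```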
Fallback
Provide degraded functionality when a service is unavailable (cached data, default values, alternative service).
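The sketch below assumes a last-known-good cache backed by a default value. On success, the result is cached. On failure, the caller gets the cached value if one exists, otherwise a generic default. `get_recommendations`, the `fetch` callable and the product IDs are hypothetical.

```python
from __future__ import annotations

from typing import Callable

_last_known: dict[str, list[str]] = {}
DEFAULT_RECOMMENDATIONS = ["bestseller-1", "bestseller-2"]  # generic, non-personalised


def get_recommendations(user_id: str, fetch: Callable[[str], list[str]]) -> list[str]:
    try:
        result = fetch(user_id)
    except Exception:
        # Degrade gracefully: stale-but-personalised beats generic, generic beats an error.
        return list(_last_known.get(user_id, DEFAULT_RECOMMENDATIONS))
    _last_known[user_id] = result
    return result
```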
Observability
The three pillars for understanding distributed systems:
Logs
Structured, contextual logs with correlation IDs to trace requests across services.
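The sketch below uses Python's standard `logging` and `contextvars` modules. It assumes the correlation ID travels in an `X-Correlation-ID` request header, which is a widespread convention rather than a standard, and that each log line is emitted as JSON. `handle_request` and the `orders` service name are hypothetical.

```python
from __future__ import annotations

import contextvars
import json
import logging
import uuid

# Holds the current request's correlation ID; "-" when outside a request.
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "service": "orders",  # hypothetical service name
            "correlation_id": correlation_id.get(),
            "message": record.getMessage(),
        })


logger = logging.getLogger("orders")
_handler = logging.StreamHandler()
_handler.setFormatter(JsonFormatter())
logger.addHandler(_handler)
logger.setLevel(logging.INFO)


def handle_request(headers: dict[str, str]) -> dict[str, str]:
    # Reuse the caller's ID if present; otherwise this service starts a new one.
    cid = headers.get("X-Correlation-ID") or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        logger.info("order received")
        # Return the headers to attach to any outgoing call so the ID propagates.
        return {"X-Correlation-ID": cid}
    finally:
        correlation_id.reset(token)
```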
Metrics
Numeric measurements: request rate, error rate, latency percentiles, saturation.
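Latency is usually reported as percentiles rather than a mean, because a few very slow requests barely move the average. The sketch below assumes the nearest-rank definition of percentile over a window of recorded samples. Systems such as Prometheus instead estimate percentiles from histogram buckets. `summarise` is a hypothetical helper, and the sample values in the usage are invented for illustration.

```python
from __future__ import annotations

import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample >= p% of all samples."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarise(latencies_ms: list[float], errors: int, window_s: float) -> dict[str, float]:
    """Rate, error rate and latency percentiles over one measurement window."""
    total = len(latencies_ms)
    return {
        "request_rate_per_s": total / window_s,
        "error_rate": errors / total if total else 0.0,
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
    }
```

With 98 requests at 10 ms and two at 900 ms and 1200 ms, the mean is 30.8 ms. The p99 is 900 ms, which exposes the slow tail that the mean hides.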
Distributed Tracing
Follow a request as it flows through multiple services. Essential for debugging latency issues.
Key tools: Jaeger, Zipkin, OpenTelemetry
See Observability for detailed guidance on the three pillars, alerting strategies, and tooling.
Deployment
Containerisation
Package services with their dependencies for consistent deployment across environments.
- Docker for packaging
- Kubernetes for orchestration
Deployment Strategies
- Rolling deployment — Gradually replace old instances
- Blue-green — Run two environments, switch traffic instantly
- Canary — Route small percentage of traffic to new version
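The sketch below illustrates canary routing. It assumes users are assigned by a stable hash of their ID, so each user consistently sees the same version while the canary percentage is raised. `bucket`, `route` and the version labels are hypothetical. In practice, a load balancer or a service mesh such as Istio usually does this with weighted routes.

```python
from __future__ import annotations

import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    # Users in the lowest `canary_percent` buckets get the new version.
    return "v2-canary" if bucket(user_id) < canary_percent else "v1-stable"
```

A cryptographic hash is used instead of Python's built-in `hash()`, which is randomised per process for strings and would reassign users on every restart. Because the threshold only grows, users already on the canary stay there as the rollout widens.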
Infrastructure
- Service Mesh — Dedicated infrastructure layer for service-to-service communication (Istio, Linkerd)
- Infrastructure as Code — Version-controlled infrastructure definitions
Anti-Patterns
- Distributed monolith — Services that must be deployed together
- Shared database — Multiple services accessing the same database
- Synchronous chains — Long chains of synchronous calls
- Nano-services — Services so small they add overhead without benefit
- Lack of monitoring — Can’t debug what you can’t see
Links
- Microservices.io — Comprehensive patterns catalogue by Chris Richardson
- Martin Fowler: Microservices — Foundational article
- Sam Newman: Building Microservices — Essential book on the topic
- 12 Factor App — Methodology for building modern applications
- CNCF Cloud Native Landscape — Ecosystem of cloud native tools