Message queues enable asynchronous service-to-service communication, allowing applications to decouple heavy processing, buffer work, and smooth out traffic spikes. They are fundamental to building resilient microservice architectures.
Core Concepts
Producers send messages to a queue or topic. Consumers retrieve and process messages. Messages remain in the queue until processed and acknowledged.
Topics and Partitions: Topics are named channels for messages. Partitions (in systems like Kafka) divide topics for parallel processing and ordering guarantees within each partition.
Acknowledgements: Consumers confirm successful message processing. Without acknowledgement, the message may be redelivered, possibly to a different consumer.
Dead Letter Queues (DLQ): Messages that repeatedly fail processing are moved to a DLQ for inspection and manual intervention, preventing poison messages from blocking the queue.
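The acknowledge/redeliver/DLQ flow can be sketched with a toy in-memory broker. This is an illustrative simulation using only the Python standard library, not a real client API; the names (`publish`, `consume`, `MAX_ATTEMPTS`) are assumptions for the example:

```python
from collections import deque

MAX_ATTEMPTS = 3   # after this many failed deliveries, route to the DLQ

queue = deque()    # main queue holds (message, attempt_count) pairs
dead_letters = []  # DLQ: parked poison messages awaiting inspection

def publish(msg):
    queue.append((msg, 0))

def consume(handler):
    """Deliver one message; redeliver on failure, DLQ after MAX_ATTEMPTS."""
    msg, attempts = queue.popleft()
    try:
        handler(msg)  # returning normally acts as the acknowledgement
    except Exception:
        attempts += 1
        if attempts >= MAX_ATTEMPTS:
            dead_letters.append(msg)       # poison message: stop retrying
        else:
            queue.append((msg, attempts))  # unacked: redeliver later

def handler(msg):
    if msg == "poison":
        raise ValueError("cannot process")

publish("ok")
publish("poison")
while queue:
    consume(handler)

print(dead_letters)  # ['poison']
```

The failing message is retried twice, then moved aside so it cannot block the healthy message behind it.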
Messaging Patterns
Point-to-Point
One producer, one consumer per message. Messages are load-balanced across competing consumers. Classic work queue pattern.
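A minimal sketch of competing consumers, using Python's thread-safe `queue.Queue` to stand in for the broker (worker names and the sentinel-based shutdown are illustrative choices, not part of any specific product):

```python
import threading
from queue import Queue

work = Queue()
processed = {"a": [], "b": []}  # which worker handled which job
lock = threading.Lock()

def worker(name):
    while True:
        job = work.get()
        if job is None:       # sentinel: shut this worker down
            work.task_done()
            return
        with lock:
            processed[name].append(job)
        work.task_done()

threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for job in range(6):          # producer enqueues six jobs
    work.put(job)
for _ in threads:
    work.put(None)
work.join()
for t in threads:
    t.join()

# Each job is consumed exactly once, split between the competing workers
total = sorted(processed["a"] + processed["b"])
print(total)  # [0, 1, 2, 3, 4, 5]
```

The split between workers varies run to run, but no job is delivered to both: that is the defining property of the point-to-point pattern.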
Publish/Subscribe (Pub/Sub)
One producer, multiple consumers. Each subscriber receives a copy of every message. Useful for event notifications and broadcasting.
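Contrast this with a fanout sketch: every subscriber to a topic gets its own copy of each message. Again a deliberately simplified in-process model, with illustrative names:

```python
from collections import defaultdict

subscribers = defaultdict(list)  # topic -> list of handler callbacks

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, message):
    # Every subscriber receives its own copy of the message
    for handler in subscribers[topic]:
        handler(message)

inbox_a, inbox_b = [], []
subscribe("orders", inbox_a.append)
subscribe("orders", inbox_b.append)
publish("orders", "order-created")

print(inbox_a, inbox_b)  # ['order-created'] ['order-created']
```

Unlike the work queue above, adding a subscriber does not take load off the others; it adds another recipient.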
Request-Reply
Synchronous-style communication over async infrastructure. Producer sends a request with a reply-to address; consumer processes and responds to that address.
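The reply-to mechanism can be sketched like this: the producer creates a private reply queue and a correlation ID, and the consumer responds on that queue. The structure (a tuple of payload, reply-to, correlation ID) is an illustrative convention, not a wire format:

```python
import queue
import threading
import uuid

requests = queue.Queue()  # shared request channel

def server():
    """Consumer: process one request, respond on its reply-to queue."""
    payload, reply_to, correlation_id = requests.get()
    reply_to.put((correlation_id, payload.upper()))

threading.Thread(target=server).start()

reply_to = queue.Queue()     # producer's private reply queue
corr_id = str(uuid.uuid4())  # lets the caller match reply to request
requests.put(("hello", reply_to, corr_id))

got_id, result = reply_to.get(timeout=1)  # block until the reply arrives
assert got_id == corr_id
print(result)  # HELLO
```

The correlation ID matters once a caller has several requests in flight: replies can arrive out of order, and the ID pairs each reply with its request.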
Delivery Guarantees
| Guarantee | Description | Trade-off |
|---|---|---|
| At-most-once | Fire and forget. Message may be lost. | Fastest, lowest overhead |
| At-least-once | Message delivered one or more times. Duplicates possible. | Requires idempotent consumers |
| Exactly-once | Message delivered precisely once. | Most complex, highest overhead |
Most systems implement at-least-once with idempotent consumers, as true exactly-once is difficult and expensive.
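Idempotency typically means tracking already-processed message IDs so a duplicate delivery is a no-op. A minimal sketch (in production the seen-ID store would be a database or cache, not an in-memory set; all names here are illustrative):

```python
processed_ids = set()     # IDs already handled (durable store in production)
balance = {"acct": 0}

def handle_deposit(msg):
    """Idempotent handler: redelivery of the same message ID has no effect."""
    if msg["id"] in processed_ids:
        return  # duplicate delivery: skip
    balance[msg["acct"]] += msg["amount"]
    processed_ids.add(msg["id"])

deposit = {"id": "m-1", "acct": "acct", "amount": 50}
handle_deposit(deposit)
handle_deposit(deposit)  # at-least-once duplicate

print(balance["acct"])  # 50, not 100
```

With this guard, at-least-once delivery behaves as effectively-once from the application's point of view.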
Message Brokers Comparison
RabbitMQ
Traditional message broker with advanced routing via exchanges. Supports message TTL, delayed messages, and dead letter exchanges out of the box. Better for complex routing rules and when message timing control matters. Uses the AMQP protocol.
Apache Kafka
Distributed event streaming platform using an append-only log. Excellent for high-throughput, ordered event streams, and message replay. Consumers track their own offset. Best for event sourcing, audit logs, and stream processing. Scales horizontally through partitioning.
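The "consumers track their own offset" model can be sketched against an append-only log: the broker never deletes records on read, and each consumer group just remembers how far it has read. A toy illustration (the log, group names, and `poll` function are assumptions for the example, not Kafka's API):

```python
log = ["evt-0", "evt-1", "evt-2"]      # append-only partition log
offsets = {"billing": 0, "audit": 0}   # each group tracks its own position

def poll(group, max_records=10):
    """Read from the group's committed offset, then advance it."""
    start = offsets[group]
    records = log[start:start + max_records]
    offsets[group] = start + len(records)  # commit the new offset
    return records

first = poll("billing", max_records=2)
second = poll("billing")

print(first, second)  # ['evt-0', 'evt-1'] ['evt-2']
# The "audit" group is unaffected and can still read from offset 0;
# resetting an offset to 0 is how replay works in this model.
```

Because reading never consumes the record, replay is just rewinding an offset, which is what makes the log model fit event sourcing and audit trails.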
AWS SQS / SNS
Fully managed. SQS provides simple queuing with visibility timeouts. SNS provides pub/sub fanout. Often used together. No infrastructure to manage but less flexible than self-hosted options.
Redis Streams
Lightweight streaming built into Redis. Good for simpler use cases where you already have Redis. Supports consumer groups and message acknowledgement.
NATS
Ultra-lightweight, high-performance messaging. Core NATS offers at-most-once; JetStream adds persistence and exactly-once. Tiny footprint (under 20MB), ideal for edge computing and IoT.
When to Use Queues vs Direct Calls
Use queues when:
- Work can be processed asynchronously
- You need to handle traffic spikes gracefully
- Consumers may be temporarily unavailable
- You want to decouple services
- Processing is slow or resource-intensive
Use direct calls when:
- You need immediate responses
- The operation is fast and unlikely to fail
- Strong consistency is required
- Simplicity outweighs resilience benefits
Best Practices
- Design idempotent consumers: Messages may be delivered more than once
- Set appropriate timeouts: Visibility timeouts should exceed expected processing time
- Monitor queue depth: Growing backlogs indicate consumer issues
- Use DLQs: Never lose messages silently; inspect failures
- Keep messages small: Store large payloads elsewhere and reference them
- Consider message ordering: Most queues don’t guarantee global order; design accordingly
- Plan for replay: Design systems that can reprocess historical messages when needed
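The visibility-timeout practice above can be sketched with a toy in-memory queue using SQS-style semantics: a received message is hidden rather than deleted, and if no acknowledgement arrives before the deadline it becomes visible again. All names and the fixed timestamps are illustrative:

```python
import heapq

VISIBILITY_TIMEOUT = 0.05  # must exceed worst-case processing time

available = [("job-1", 0.0)]  # (message, visible_at timestamp)
heapq.heapify(available)
in_flight = {}                # message -> redelivery deadline

def receive(now):
    """Hand out the next visible message, hiding it until its deadline."""
    if available and available[0][1] <= now:
        msg, _ = heapq.heappop(available)
        in_flight[msg] = now + VISIBILITY_TIMEOUT
        return msg

def reap(now):
    """Return unacknowledged messages to the queue after their deadline."""
    for msg, deadline in list(in_flight.items()):
        if now >= deadline:
            del in_flight[msg]
            heapq.heappush(available, (msg, now))

msg = receive(now=0.0)   # consumer takes the message, then crashes (no ack)
reap(now=0.1)            # deadline passed: message becomes visible again
again = receive(now=0.1)

print(msg, again)  # job-1 job-1
```

This is also why the timeout must exceed processing time: too short, and a healthy-but-slow consumer's message is redelivered mid-processing, producing exactly the duplicates that idempotent handlers exist to absorb.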