Traditional request-response architectures tightly couple services. Service A calls Service B and waits. If B is slow or down, A suffers. As systems grow, these synchronous dependencies create fragile, hard-to-scale architectures.
Event-driven architecture offers an alternative. Services communicate through events—facts about what happened. Producers emit events without knowing who consumes them. Consumers process events independently. This loose coupling enables scalability, resilience, and flexibility.
Core Concepts
Events vs. Commands
Events describe facts that happened. They’re named in past tense: OrderPlaced, UserRegistered, PaymentCompleted. Events are immutable—they record history.
Commands request actions. They’re named imperatively: PlaceOrder, RegisterUser, ProcessPayment. Commands may succeed or fail.
This distinction matters:
- Events are facts; consumers can’t reject them
- Commands are requests; handlers can reject them
- Events enable loose coupling; commands couple the sender to a specific handler
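To make the distinction concrete, here is a minimal sketch (the type names and handler are illustrative, not from any framework): the event is an immutable record of a fact, while the command handler validates the request and may reject it.

from dataclasses import dataclass

# Event: an immutable fact, named in past tense. Consumers cannot reject it.
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    total: float

# Command: a request, named imperatively. The handler may reject it.
@dataclass
class PlaceOrder:
    order_id: str
    total: float

def handle_place_order(cmd: PlaceOrder) -> OrderPlaced:
    if cmd.total <= 0:
        raise ValueError("rejected: total must be positive")
    return OrderPlaced(order_id=cmd.order_id, total=cmd.total)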
Producers and Consumers
Producers emit events when something happens in their domain. The order service emits OrderPlaced when a customer places an order. It doesn’t know or care who consumes this event.
Consumers subscribe to events they care about. The inventory service consumes OrderPlaced to reserve inventory. The notification service consumes it to send confirmation emails. The analytics service consumes it to update metrics.
Producers and consumers are decoupled:
- Adding consumers doesn’t change producers
- Consumer failures don’t affect producers
- Services can be developed and deployed independently
Event Brokers
An event broker (Kafka, RabbitMQ, AWS SNS/SQS) mediates between producers and consumers:
- Receives events from producers
- Stores events (durably in Kafka; in queue-based brokers like RabbitMQ, typically only until delivery is acknowledged)
- Delivers events to consumers
- Handles consumer scaling and failure
The broker enables the decoupling—producers and consumers never communicate directly.
Common Patterns
Publish-Subscribe
The simplest pattern. Producers publish events; consumers subscribe to event types.
Order Service → [OrderPlaced] → Broker
                                   │
                ┌──────────────────┼──────────────────┐
                ↓                  ↓                  ↓
           Inventory          Notification        Analytics
            Service             Service            Service
Each consumer receives every event independently. Consumers can process at different speeds.
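A toy in-memory broker makes the flow concrete (real systems would use Kafka or RabbitMQ; all names here are illustrative):

from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Every subscriber receives its own copy of the event.
        for handler in self.subscribers[event_type]:
            handler(payload)

broker = Broker()
broker.subscribe("OrderPlaced", lambda e: print("inventory: reserve", e))
broker.subscribe("OrderPlaced", lambda e: print("notify: send email", e))
broker.subscribe("OrderPlaced", lambda e: print("analytics: record", e))
broker.publish("OrderPlaced", {"order_id": "order_456"})

The producer calls publish once; the broker fans the event out to every subscriber independently.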
Event Sourcing
Instead of storing current state, store the sequence of events that produced it. Current state is derived by replaying events.
Traditional approach (state storage):
-- Current state
SELECT * FROM orders WHERE id = 123;
-- Returns: {id: 123, status: "shipped", total: 99.00}
Event sourcing approach:
Event Stream for Order 123:
1. OrderCreated {total: 99.00}
2. PaymentReceived {amount: 99.00}
3. OrderShipped {carrier: "FedEx"}
Current state = replay(events)
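A minimal replay sketch, following the event shapes above (the fold logic is illustrative):

events = [
    {"type": "OrderCreated",    "total": 99.00},
    {"type": "PaymentReceived", "amount": 99.00},
    {"type": "OrderShipped",    "carrier": "FedEx"},
]

def replay(events):
    # Fold the event stream into the current state.
    state = {}
    for e in events:
        if e["type"] == "OrderCreated":
            state = {"status": "created", "total": e["total"]}
        elif e["type"] == "PaymentReceived":
            state["status"] = "paid"
        elif e["type"] == "OrderShipped":
            state.update(status="shipped", carrier=e["carrier"])
    return state

print(replay(events))  # {'status': 'shipped', 'total': 99.0, 'carrier': 'FedEx'}

Replaying a prefix of the stream yields the state at that point in time, which is what makes temporal queries possible.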
Benefits:
- Complete audit trail
- Can reconstruct state at any point in time
- Enables temporal queries (“what was the state last Tuesday?”)
- Natural fit for event-driven systems
Challenges:
- More complex than CRUD
- Event schema evolution requires care
- Replay can be slow for long streams (mitigated by periodic snapshots)
CQRS (Command Query Responsibility Segregation)
Separate read and write models. Commands modify the write model (events). Queries read from optimized read models built from events.
Commands → Write Model (Event Store)
                  ↓
            Event Stream
                  ↓
         Read Model Builder
                  ↓
Queries ← Read Model (optimized for queries)
Why separate?
Write and read have different requirements:
- Writes need consistency, validation, business rules
- Reads need speed, various projections, different data shapes
Optimizing both in one model forces compromises. Separation lets each be optimized independently.
Example:
Write model stores events:
UserRegistered {id: 1, name: "Alice", email: "alice@example.com"}
UserEmailChanged {id: 1, new_email: "alice@newdomain.com"}
Read model for user lookup (built from events):
CREATE TABLE user_lookup (
    id INT PRIMARY KEY,
    name VARCHAR,
    email VARCHAR
);
Read model for user search (built from same events):
Elasticsearch index with full-text search on name
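A sketch of the read-model builder projecting the events above into the user_lookup table (SQLite stands in for the read store; the projection logic is illustrative):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_lookup (id INT PRIMARY KEY, name TEXT, email TEXT)")

def project(event):
    # Each event type updates the read model into a query-friendly shape.
    if event["event"] == "UserRegistered":
        conn.execute("INSERT INTO user_lookup VALUES (?, ?, ?)",
                     (event["id"], event["name"], event["email"]))
    elif event["event"] == "UserEmailChanged":
        conn.execute("UPDATE user_lookup SET email = ? WHERE id = ?",
                     (event["new_email"], event["id"]))

project({"event": "UserRegistered", "id": 1, "name": "Alice", "email": "alice@example.com"})
project({"event": "UserEmailChanged", "id": 1, "new_email": "alice@newdomain.com"})
print(conn.execute("SELECT * FROM user_lookup").fetchall())
# [(1, 'Alice', 'alice@newdomain.com')]

The same event stream could feed a second, differently shaped projection (such as the search index) without touching the write model.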
Saga Pattern
A saga coordinates a long-running transaction across services. Instead of a distributed transaction, it sequences local steps through events and compensating actions.
Example: Order placement
1. OrderService: Create order (pending) → OrderCreated
2. InventoryService: Reserve inventory → InventoryReserved
3. PaymentService: Charge payment → PaymentCompleted
4. OrderService: Confirm order → OrderConfirmed
If step 3 fails:
PaymentService: → PaymentFailed
InventoryService: Release inventory → InventoryReleased
OrderService: Cancel order → OrderCancelled
Sagas coordinate distributed processes without distributed transactions. Each step can be compensated if later steps fail.
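A compressed sketch of the compensating flow (the service calls are stubs; a real saga would react to events arriving from the broker rather than calling functions directly):

# Stub service calls; real implementations would emit and consume events.
def reserve_inventory(order): print("inventory reserved")
def release_inventory(order): print("inventory released")
def charge_payment(order):    raise RuntimeError("card declined")  # simulate step 3 failing
def refund_payment(order):    print("payment refunded")
def confirm_order(order):     print("order confirmed")
def cancel_order(order):      print("order cancelled")

def place_order_saga(order):
    completed = []  # compensations for steps that succeeded, most recent last
    steps = [
        (reserve_inventory, release_inventory),
        (charge_payment,    refund_payment),
    ]
    try:
        for do, undo in steps:
            do(order)
            completed.append(undo)
        confirm_order(order)
    except Exception:
        # Run compensating actions in reverse order, then cancel.
        for undo in reversed(completed):
            undo(order)
        cancel_order(order)

place_order_saga({"order_id": "order_456"})
# inventory reserved / inventory released / order cancelled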
Implementation Considerations
Event Design
Include sufficient context. Events should be self-contained. Consumers shouldn’t need to call back to producers for context.
// Poor: requires callback for user details
{"event": "OrderPlaced", "user_id": 123}
// Better: includes needed context
{
  "event": "OrderPlaced",
  "order_id": "order_456",
  "user": {"id": 123, "email": "user@example.com"},
  "items": [{"sku": "ABC", "quantity": 2}],
  "total": 99.00
}
Version events. Events are contracts. When changing event structure, version them:
{"event": "OrderPlaced", "version": 2, ...}
Consumers must handle multiple versions during transitions.
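For example, a consumer might tolerate both versions like this (a sketch; the field layout follows the OrderPlaced examples above):

def user_id_from_order_placed(event):
    version = event.get("version", 1)
    if version == 1:
        return event["user_id"]        # v1: flat user_id field
    if version == 2:
        return event["user"]["id"]     # v2: nested user object
    raise ValueError(f"unknown OrderPlaced version: {version}")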
Ordering and Idempotency
Ordering: Events for the same entity should be processed in order. Most brokers provide ordering per partition/key:
producer.send(
    topic="orders",
    key=order_id,  # events with the same key always go to the same partition
    value=event
)
Idempotency: Consumers may receive events multiple times (at-least-once delivery). Make processing idempotent:
def handle_order_placed(event):
    if already_processed(event.id):
        return  # idempotent: skip duplicate delivery
    process_order(event)
    mark_processed(event.id)
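already_processed and mark_processed above are placeholders. A minimal backing store could be a table keyed by event ID (SQLite shown; in production, the check, the processing, and the mark should share one transaction so a crash cannot split them):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

def already_processed(event_id):
    row = conn.execute("SELECT 1 FROM processed_events WHERE event_id = ?",
                       (event_id,)).fetchone()
    return row is not None

def mark_processed(event_id):
    # The primary key also rejects concurrent duplicate inserts.
    conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))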
Consumer Groups
Multiple instances of a consumer service can form a consumer group. The broker distributes events across instances—each event is processed by one instance.
This enables horizontal scaling of consumers.
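With kafka-python, for example, joining a group is a matter of sharing a group_id (a sketch, assuming a local broker and JSON-encoded events):

import json
from kafka import KafkaConsumer  # requires the kafka-python package

# Every instance started with this group_id joins one consumer group;
# the broker assigns each partition to exactly one instance in the group.
consumer = KafkaConsumer(
    "orders",
    group_id="inventory-service",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v),
)
for message in consumer:
    print("processing", message.value)  # replace with the service's event handler

Starting a second identical process doubles throughput without any code change: the broker rebalances partitions across the two instances.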
Dead Letter Queues
When consumers fail to process events, dead letter queues capture failed events for investigation and retry:
Event → Consumer → [Success]          → Ack
                 → [Failure]          → Retry
                 → [Repeated Failure] → Dead Letter Queue
Monitor dead letter queues; accumulation indicates problems.
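A sketch of the retry-then-dead-letter loop on the consumer side (publish_to_dlq and the retry limit are illustrative; some brokers provide this behavior natively):

MAX_ATTEMPTS = 3

def publish_to_dlq(event, reason):
    print("dead-lettered:", event, reason)  # stub; real code publishes to a DLQ topic

def consume(event, handler):
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(event)
            return  # success: the event is acknowledged
        except Exception as exc:
            last_error = exc
    # Repeated failure: park the event for investigation instead of blocking the stream.
    publish_to_dlq(event, reason=str(last_error))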
Schema Management
As events evolve, manage schemas carefully:
- Schema registry validates events against schemas
- Backward-compatible changes add fields without breaking consumers
- Forward-compatible changes let existing consumers read events produced with a newer schema
Tools like Avro, Protobuf, and JSON Schema provide schema evolution support.
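With JSON Schema, for instance, validating an event against its contract can be as small as this (using the jsonschema package; the schema itself is illustrative):

from jsonschema import validate  # requires the jsonschema package

ORDER_PLACED_V2 = {
    "type": "object",
    "required": ["event", "version", "order_id", "total"],
    "properties": {
        "event":    {"const": "OrderPlaced"},
        "version":  {"const": 2},
        "order_id": {"type": "string"},
        "total":    {"type": "number"},
    },
}

# Raises jsonschema.ValidationError if the event does not match the contract.
validate(
    instance={"event": "OrderPlaced", "version": 2, "order_id": "order_456", "total": 99.00},
    schema=ORDER_PLACED_V2,
)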
When to Use Event-Driven Architecture
Good fit:
- Multiple services need to react to the same events
- Services can process independently without immediate response
- You need audit trails or temporal queries
- Scale and resilience matter more than immediate consistency
Poor fit:
- Simple CRUD with a single database
- Flows that require immediate, consistent responses
- Debugging and tracing complexity is unacceptable
- Team lacks event-driven experience
Event-driven architecture adds complexity. It should solve real problems, not add architectural sophistication for its own sake.
Key Takeaways
- Events describe facts that happened; commands request actions
- Producer-consumer decoupling enables independent scaling and resilience
- Event sourcing stores event streams instead of current state
- CQRS separates read and write models for independent optimization
- Sagas coordinate distributed processes without distributed transactions
- Include sufficient context in events; version for schema evolution
- Ensure ordering per entity and idempotent processing
- Use event-driven architecture when you have real decoupling and scalability needs