Traditional request-response architectures tightly couple services. Service A calls Service B and waits. If B is slow or down, A suffers. As systems grow, these synchronous dependencies create fragile, hard-to-scale architectures.
Event-driven architecture offers an alternative. Services communicate through events—facts about what happened. Producers emit events without knowing who consumes them. Consumers process events independently. This loose coupling enables scalability, resilience, and flexibility.
Core Concepts
Events vs. Commands
Events describe facts that happened. They’re named in past tense: OrderPlaced, UserRegistered, PaymentCompleted. Events are immutable—they record history.
Commands request actions. They’re named imperatively: PlaceOrder, RegisterUser, ProcessPayment. Commands may succeed or fail.
This distinction matters:
- Events are facts; consumers can’t reject them
- Commands are requests; handlers can reject them
- Events enable loose coupling; commands couple the sender to a specific handler
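To make the distinction concrete, here is a minimal sketch (the type names and handler are illustrative, not from any framework): the event is an immutable record of a fact, while the command handler validates the request and may reject it.

from dataclasses import dataclass

# Event: an immutable fact, named in past tense. Consumers cannot reject it.
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    total: float

# Command: a request, named imperatively. The handler may reject it.
@dataclass
class PlaceOrder:
    order_id: str
    total: float

def handle_place_order(cmd: PlaceOrder) -> OrderPlaced:
    if cmd.total <= 0:
        raise ValueError("rejected: total must be positive")
    return OrderPlaced(order_id=cmd.order_id, total=cmd.total)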
Producers and Consumers
Producers emit events when something happens in their domain. The order service emits OrderPlaced when a customer places an order. It doesn’t know or care who consumes this event.
Consumers subscribe to events they care about. The inventory service consumes OrderPlaced to reserve inventory. The notification service consumes it to send confirmation emails. The analytics service consumes it to update metrics.
Producers and consumers are decoupled:
- Adding consumers doesn’t change producers
- Consumer failures don’t affect producers
- Services can be developed and deployed independently
Event Brokers
An event broker (Kafka, RabbitMQ, AWS SNS/SQS) mediates between producers and consumers:
- Receives events from producers
- Stores events (durably in Kafka; in queue-based brokers like RabbitMQ, typically only until delivery is acknowledged)
- Delivers events to consumers
- Handles consumer scaling and failure
The broker enables the decoupling—producers and consumers never communicate directly.
Common Patterns
Publish-Subscribe
The simplest pattern. Producers publish events; consumers subscribe to event types.
Order Service → [OrderPlaced] → Broker
                                   │
                ┌──────────────────┼──────────────────┐
                ↓                  ↓                  ↓
           Inventory          Notification        Analytics
            Service             Service            Service
Each consumer receives every event independently. Consumers can process at different speeds.
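A toy in-memory broker makes the flow concrete (real systems would use Kafka or RabbitMQ; all names here are illustrative):

from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Every subscriber receives its own copy of the event.
        for handler in self.subscribers[event_type]:
            handler(payload)

broker = Broker()
broker.subscribe("OrderPlaced", lambda e: print("inventory: reserve", e))
broker.subscribe("OrderPlaced", lambda e: print("notify: send email", e))
broker.subscribe("OrderPlaced", lambda e: print("analytics: record", e))
broker.publish("OrderPlaced", {"order_id": "order_456"})

The producer calls publish once; the broker fans the event out to every subscriber independently.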
Event Sourcing
Instead of storing current state, store the sequence of events that produced it. Current state is derived by replaying events.
Traditional approach (state storage):
-- Current state
SELECT * FROM orders WHERE id = 123;
-- Returns: {id: 123, status: "shipped", total: 99.00}
Event sourcing approach:
Event Stream for Order 123:
1. OrderCreated {total: 99.00}
2. PaymentReceived {amount: 99.00}
3. OrderShipped {carrier: "FedEx"}
Current state = replay(events)
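A minimal replay sketch, following the event shapes above (the fold logic is illustrative):

events = [
    {"type": "OrderCreated",    "total": 99.00},
    {"type": "PaymentReceived", "amount": 99.00},
    {"type": "OrderShipped",    "carrier": "FedEx"},
]

def replay(events):
    # Fold the event stream into the current state.
    state = {}
    for e in events:
        if e["type"] == "OrderCreated":
            state = {"status": "created", "total": e["total"]}
        elif e["type"] == "PaymentReceived":
            state["status"] = "paid"
        elif e["type"] == "OrderShipped":
            state.update(status="shipped", carrier=e["carrier"])
    return state

print(replay(events))  # {'status': 'shipped', 'total': 99.0, 'carrier': 'FedEx'}

Replaying a prefix of the stream yields the state at that point in time, which is what makes temporal queries possible.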
Benefits:
- Complete audit trail
- Can reconstruct state at any point in time
- Enables temporal queries (“what was the state last Tuesday?”)
- Natural fit for event-driven systems
Challenges:
- More complex than CRUD
- Event schema evolution requires care
- Replay can be slow for long streams (mitigated by periodic snapshots)
CQRS (Command Query Responsibility Segregation)
Separate read and write models. Commands modify the write model (events). Queries read from optimized read models built from events.
Commands → Write Model (Event Store)
                  ↓
            Event Stream
                  ↓
         Read Model Builder
                  ↓
Queries ← Read Model (optimized for queries)
Why separate?
Write and read have different requirements:
- Writes need consistency, validation, business rules
- Reads need speed, various projections, different data shapes
Optimizing both in one model forces compromises. Separation lets each be optimized independently.
Example:
Write model stores events:
UserRegistered {id: 1, name: "Alice", email: "alice@example.com"}
UserEmailChanged {id: 1, new_email: "alice@newdomain.com"}
Read model for user lookup (built from events):
CREATE TABLE user_lookup (
    id INT PRIMARY KEY,
    name VARCHAR,
    email VARCHAR
);
Read model for user search (built from same events):
Elasticsearch index with full-text search on name
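A sketch of the read-model builder projecting the events above into the user_lookup table (SQLite stands in for the read store; the projection logic is illustrative):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_lookup (id INT PRIMARY KEY, name TEXT, email TEXT)")

def project(event):
    # Each event type updates the read model into a query-friendly shape.
    if event["event"] == "UserRegistered":
        conn.execute("INSERT INTO user_lookup VALUES (?, ?, ?)",
                     (event["id"], event["name"], event["email"]))
    elif event["event"] == "UserEmailChanged":
        conn.execute("UPDATE user_lookup SET email = ? WHERE id = ?",
                     (event["new_email"], event["id"]))

project({"event": "UserRegistered", "id": 1, "name": "Alice", "email": "alice@example.com"})
project({"event": "UserEmailChanged", "id": 1, "new_email": "alice@newdomain.com"})
print(conn.execute("SELECT * FROM user_lookup").fetchall())
# [(1, 'Alice', 'alice@newdomain.com')]

The same event stream could feed a second, differently shaped projection (such as the search index) without touching the write model.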
Saga Pattern
A saga coordinates a long-running transaction across services. Instead of a distributed transaction, it sequences local steps through events and compensating actions.
Example: Order placement
1. OrderService: Create order (pending) → OrderCreated
2. InventoryService: Reserve inventory → InventoryReserved
3. PaymentService: Charge payment → PaymentCompleted
4. OrderService: Confirm order → OrderConfirmed
If step 3 fails:
PaymentService: → PaymentFailed
InventoryService: Release inventory → InventoryReleased
OrderService: Cancel order → OrderCancelled
Sagas coordinate distributed processes without distributed transactions. Each step can be compensated if later steps fail.
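A compressed sketch of the compensating flow (the service calls are stubs; a real saga would react to events arriving from the broker rather than calling functions directly):

# Stub service calls; real implementations would emit and consume events.
def reserve_inventory(order): print("inventory reserved")
def release_inventory(order): print("inventory released")
def charge_payment(order):    raise RuntimeError("card declined")  # simulate step 3 failing
def refund_payment(order):    print("payment refunded")
def confirm_order(order):     print("order confirmed")
def cancel_order(order):      print("order cancelled")

def place_order_saga(order):
    completed = []  # compensations for steps that succeeded, most recent last
    steps = [
        (reserve_inventory, release_inventory),
        (charge_payment,    refund_payment),
    ]
    try:
        for do, undo in steps:
            do(order)
            completed.append(undo)
        confirm_order(order)
    except Exception:
        # Run compensating actions in reverse order, then cancel.
        for undo in reversed(completed):
            undo(order)
        cancel_order(order)

place_order_saga({"order_id": "order_456"})
# inventory reserved / inventory released / order cancelled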
Implementation Considerations
Event Design
Include sufficient context. Events should be self-contained. Consumers shouldn’t need to call back to producers for context.
// Poor: requires callback for user details
{"event": "OrderPlaced", "user_id": 123}
// Better: includes needed context
{
  "event": "OrderPlaced",
  "order_id": "order_456",
  "user": {"id": 123, "email": "user@example.com"},
  "items": [{"sku": "ABC", "quantity": 2}],
  "total": 99.00
}
Version events. Events are contracts. When changing event structure, version them:
{"event": "OrderPlaced", "version": 2, ...}
Consumers must handle multiple versions during transitions.
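For example, a consumer might tolerate both versions like this (a sketch; the field layout follows the OrderPlaced examples above):

def user_id_from_order_placed(event):
    version = event.get("version", 1)
    if version == 1:
        return event["user_id"]        # v1: flat user_id field
    if version == 2:
        return event["user"]["id"]     # v2: nested user object
    raise ValueError(f"unknown OrderPlaced version: {version}")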
Ordering and Idempotency
Ordering: Events for the same entity should be processed in order. Most brokers provide ordering per partition/key:
producer.send(
    topic="orders",
    key=order_id,  # events with the same key always go to the same partition
    value=event
)
Idempotency: Consumers may receive events multiple times (at-least-once delivery). Make processing idempotent:
def handle_order_placed(event):
    if already_processed(event.id):
        return  # idempotent: skip duplicate delivery
    process_order(event)
    mark_processed(event.id)
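already_processed and mark_processed above are placeholders. A minimal backing store could be a table keyed by event ID (SQLite shown; in production, the check, the processing, and the mark should share one transaction so a crash cannot split them):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

def already_processed(event_id):
    row = conn.execute("SELECT 1 FROM processed_events WHERE event_id = ?",
                       (event_id,)).fetchone()
    return row is not None

def mark_processed(event_id):
    # The primary key also rejects concurrent duplicate inserts.
    conn.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))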
Consumer Groups
Multiple instances of a consumer service can form a consumer group. The broker distributes events across instances—each event is processed by one instance.
This enables horizontal scaling of consumers.
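With kafka-python, for example, joining a group is a matter of sharing a group_id (a sketch, assuming a local broker and JSON-encoded events):

import json
from kafka import KafkaConsumer  # requires the kafka-python package

# Every instance started with this group_id joins one consumer group;
# the broker assigns each partition to exactly one instance in the group.
consumer = KafkaConsumer(
    "orders",
    group_id="inventory-service",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v),
)
for message in consumer:
    print("processing", message.value)  # replace with the service's event handler

Starting a second identical process doubles throughput without any code change: the broker rebalances partitions across the two instances.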
Dead Letter Queues
When consumers fail to process events, dead letter queues capture failed events for investigation and retry:
Event → Consumer → [Success]          → Ack
                 → [Failure]          → Retry
                 → [Repeated Failure] → Dead Letter Queue
Monitor dead letter queues; accumulation indicates problems.
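A sketch of the retry-then-dead-letter loop on the consumer side (publish_to_dlq and the retry limit are illustrative; some brokers provide this behavior natively):

MAX_ATTEMPTS = 3

def publish_to_dlq(event, reason):
    print("dead-lettered:", event, reason)  # stub; real code publishes to a DLQ topic

def consume(event, handler):
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(event)
            return  # success: the event is acknowledged
        except Exception as exc:
            last_error = exc
    # Repeated failure: park the event for investigation instead of blocking the stream.
    publish_to_dlq(event, reason=str(last_error))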
Schema Management
As events evolve, manage schemas carefully:
- Schema registry validates events against schemas
- Backward-compatible changes add fields without breaking consumers
- Forward-compatible changes let existing consumers read events produced with a newer schema
Tools like Avro, Protobuf, and JSON Schema provide schema evolution support.
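With JSON Schema, for instance, validating an event against its contract can be as small as this (using the jsonschema package; the schema itself is illustrative):

from jsonschema import validate  # requires the jsonschema package

ORDER_PLACED_V2 = {
    "type": "object",
    "required": ["event", "version", "order_id", "total"],
    "properties": {
        "event":    {"const": "OrderPlaced"},
        "version":  {"const": 2},
        "order_id": {"type": "string"},
        "total":    {"type": "number"},
    },
}

# Raises jsonschema.ValidationError if the event does not match the contract.
validate(
    instance={"event": "OrderPlaced", "version": 2, "order_id": "order_456", "total": 99.00},
    schema=ORDER_PLACED_V2,
)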
When to Use Event-Driven Architecture
Good fit:
- Multiple services need to react to the same events
- Services can process independently without immediate response
- You need audit trails or temporal queries
- Scale and resilience matter more than immediate consistency
Poor fit:
- Simple CRUD with a single database
- Flows that require immediate, consistent responses
- Debugging and tracing complexity is unacceptable
- Team lacks event-driven experience
Event-driven architecture adds complexity. It should solve real problems, not add architectural sophistication for its own sake.
Key Takeaways
- Events describe facts that happened; commands request actions
- Producer-consumer decoupling enables independent scaling and resilience
- Event sourcing stores event streams instead of current state
- CQRS separates read and write models for independent optimization
- Sagas coordinate distributed processes without distributed transactions
- Include sufficient context in events; version for schema evolution
- Ensure ordering per entity and idempotent processing
- Use event-driven architecture when you have real decoupling and scalability needs