Building Event-Driven Architectures

April 10, 2017

Traditional request-response architectures tightly couple services. Service A calls Service B and waits. If B is slow or down, A suffers. As systems grow, these synchronous dependencies create fragile, hard-to-scale architectures.

Event-driven architecture offers an alternative. Services communicate through events—facts about what happened. Producers emit events without knowing who consumes them. Consumers process events independently. This loose coupling enables scalability, resilience, and flexibility.

Core Concepts

Events vs. Commands

Events describe facts that happened. They’re named in past tense: OrderPlaced, UserRegistered, PaymentCompleted. Events are immutable—they record history.

Commands request actions. They’re named imperatively: PlaceOrder, RegisterUser, ProcessPayment. Commands may succeed or fail.
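One way to make the difference concrete is to model both as small data types. The sketch below is illustrative only: the field names and the `handle_place_order` function are assumptions, not part of any particular framework. The event is frozen because it records history; the command is a plain request that its handler may reject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Event: a fact that already happened, so it is immutable."""
    order_id: str
    total: float


@dataclass
class PlaceOrder:
    """Command: a request that the handler may accept or reject."""
    order_id: str
    total: float


def handle_place_order(command: PlaceOrder) -> OrderPlaced:
    # A command can fail validation; the event is only produced on success.
    if command.total <= 0:
        raise ValueError("order total must be positive")
    return OrderPlaced(order_id=command.order_id, total=command.total)


event = handle_place_order(PlaceOrder(order_id="order_456", total=99.00))
```

Trying to assign to `event.total` afterwards raises `dataclasses.FrozenInstanceError`, which mirrors the idea that a recorded fact is never edited in place.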

This distinction matters:

Producers and Consumers

Producers emit events when something happens in their domain. The order service emits OrderPlaced when a customer places an order. It doesn’t know or care who consumes this event.

Consumers subscribe to events they care about. The inventory service consumes OrderPlaced to reserve inventory. The notification service consumes it to send confirmation emails. The analytics service consumes it to update metrics.

Producers and consumers are decoupled:

Event Brokers

An event broker (Kafka, RabbitMQ, AWS SNS/SQS) mediates between producers and consumers:

The broker enables the decoupling—producers and consumers never communicate directly.

Common Patterns

Publish-Subscribe

The simplest pattern. Producers publish events; consumers subscribe to event types.

Order Service → [OrderPlaced] → Broker
                                   ↓
                   ┌───────────────┼───────────────┐
                   ↓               ↓               ↓
            Inventory       Notification       Analytics
             Service          Service          Service

Each consumer receives every event independently. Consumers can process at different speeds.
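The fan-out in the diagram can be sketched with a toy in-memory broker. This is only an illustration of the pattern: `InMemoryBroker` is a hypothetical class, and it delivers synchronously in one process, whereas a real broker persists events and delivers them asynchronously.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy broker: maps event types to subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Every subscriber of this event type receives its own delivery.
        for handler in self._subscribers[event_type]:
            handler(payload)


broker = InMemoryBroker()
inventory, notifications, analytics = [], [], []

broker.subscribe("OrderPlaced", inventory.append)
broker.subscribe("OrderPlaced", notifications.append)
broker.subscribe("OrderPlaced", analytics.append)

# The producer publishes once and never references the consumers.
broker.publish("OrderPlaced", {"order_id": "order_456"})
```

Adding a fourth consumer here means one more `subscribe` call and no change to the publishing code.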

Event Sourcing

Instead of storing current state, store the sequence of events that produced it. Current state is derived by replaying events.

Traditional approach (state storage):

-- Current state
SELECT * FROM orders WHERE id = 123;
-- Returns: {id: 123, status: "shipped", total: 99.00}

Event sourcing approach:

Event Stream for Order 123:
1. OrderCreated {total: 99.00}
2. PaymentReceived {amount: 99.00}
3. OrderShipped {carrier: "FedEx"}

Current state = replay(events)
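A minimal sketch of `replay` for the stream above, written as a left fold over the events. The dictionary layout and the intermediate status values ("created", "paid") are assumptions made for the example.

```python
from functools import reduce


def apply(state, event):
    """Return the new state after applying one event (no mutation)."""
    kind, data = event["type"], event["data"]
    if kind == "OrderCreated":
        return {"status": "created", "total": data["total"]}
    if kind == "PaymentReceived":
        return {**state, "status": "paid"}
    if kind == "OrderShipped":
        return {**state, "status": "shipped", "carrier": data["carrier"]}
    return state  # unknown event types are ignored


def replay(events):
    # Current state is a fold of apply over the event history.
    return reduce(apply, events, {})


events = [
    {"type": "OrderCreated", "data": {"total": 99.00}},
    {"type": "PaymentReceived", "data": {"amount": 99.00}},
    {"type": "OrderShipped", "data": {"carrier": "FedEx"}},
]

state = replay(events)
# state == {"status": "shipped", "total": 99.0, "carrier": "FedEx"}
```

Replaying only the first two events gives the order as it stood before shipping, which is the kind of historical query that state storage cannot answer on its own.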

Benefits:

Challenges:

CQRS (Command Query Responsibility Segregation)

Separate read and write models. Commands modify the write model (events). Queries read from optimized read models built from events.

Commands → Write Model (Event Store)
                    ↓
              Event Stream
                    ↓
            Read Model Builder
                    ↓
Queries  ← Read Model (Optimized for queries)

Why separate?

Write and read have different requirements:

Optimizing both in one model creates compromises. Separation lets each optimize independently.

Example:

Write model stores events:

UserRegistered {id: 1, name: "Alice", email: "alice@example.com"}
UserEmailChanged {id: 1, new_email: "alice@newdomain.com"}

Read model for user lookup (built from events):

CREATE TABLE user_lookup (
  id INT PRIMARY KEY,
  name VARCHAR,
  email VARCHAR
);

Read model for user search (built from same events):

Elasticsearch index with full-text search on name
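The read model builder for the lookup table could look roughly like the sketch below. A dictionary stands in for the `user_lookup` table, and `project_user_lookup` is a hypothetical name; a real projection would write to the database and track which events it has already applied.

```python
def project_user_lookup(events):
    """Build the user_lookup read model (id -> row) from the event stream."""
    table = {}
    for event in events:
        if event["type"] == "UserRegistered":
            table[event["id"]] = {"name": event["name"], "email": event["email"]}
        elif event["type"] == "UserEmailChanged":
            table[event["id"]]["email"] = event["new_email"]
    return table


events = [
    {"type": "UserRegistered", "id": 1, "name": "Alice", "email": "alice@example.com"},
    {"type": "UserEmailChanged", "id": 1, "new_email": "alice@newdomain.com"},
]

user_lookup = project_user_lookup(events)
# user_lookup == {1: {"name": "Alice", "email": "alice@newdomain.com"}}
```

Because the read model is derived data, it can be dropped and rebuilt from the same events whenever its shape needs to change.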

Saga Pattern

Sagas manage long-running transactions that span services. Instead of using distributed transactions, they coordinate through events and compensating actions.

Example: Order placement

1. OrderService: Create order (pending) → OrderCreated
2. InventoryService: Reserve inventory → InventoryReserved
3. PaymentService: Charge payment → PaymentCompleted
4. OrderService: Confirm order → OrderConfirmed

If step 3 fails:
   PaymentService: → PaymentFailed
   InventoryService: Release inventory → InventoryReleased
   OrderService: Cancel order → OrderCancelled

Sagas coordinate distributed processes without distributed transactions. Each step can be compensated if later steps fail.
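The compensation logic can be sketched as a list of (action, compensation) pairs. Note the assumption: this example uses a central coordinator function (`run_saga`, a hypothetical name) for brevity, while the flow above has services reacting to each other's events. The rollback order is the same in both styles.

```python
def run_saga(steps):
    """Run each action in order; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log = []


def charge_payment():
    # Simulated failure at step 3.
    log.append("PaymentFailed")
    raise RuntimeError("card declined")


steps = [
    (lambda: log.append("OrderCreated"), lambda: log.append("OrderCancelled")),
    (lambda: log.append("InventoryReserved"), lambda: log.append("InventoryReleased")),
    (charge_payment, lambda: log.append("PaymentRefunded")),
]

succeeded = run_saga(steps)
# log == ["OrderCreated", "InventoryReserved", "PaymentFailed",
#         "InventoryReleased", "OrderCancelled"]
```

The failed step itself is not compensated, since it never completed; only the steps before it are undone.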

Implementation Considerations

Event Design

Include sufficient context. Events should be self-contained. Consumers shouldn’t need to call back to producers for context.

// Poor: requires callback for user details
{"event": "OrderPlaced", "user_id": 123}

// Better: includes needed context
{
  "event": "OrderPlaced",
  "order_id": "order_456",
  "user": {"id": 123, "email": "user@example.com"},
  "items": [{"sku": "ABC", "quantity": 2}],
  "total": 99.00
}

Version events. Events are contracts. When the structure of an event changes, increment its version:

{"event": "OrderPlaced", "version": 2, ...}

Consumers must handle multiple versions during transitions.
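One common way to handle several versions is to upgrade old events to the newest shape before the business logic sees them. The sketch below assumes, purely for illustration, that version 1 carried a flat `user_id` and version 2 nests it under `user`; neither shape is defined by the examples above.

```python
def upcast_order_placed(event):
    """Convert any supported OrderPlaced version to the version 2 shape."""
    version = event.get("version", 1)  # assume unversioned events are v1
    if version == 1:
        upgraded = {k: v for k, v in event.items() if k != "user_id"}
        upgraded["user"] = {"id": event["user_id"]}
        upgraded["version"] = 2
        return upgraded
    if version == 2:
        return event
    raise ValueError(f"unsupported OrderPlaced version: {version}")


old_event = {"event": "OrderPlaced", "version": 1, "user_id": 123}
current = upcast_order_placed(old_event)
# current == {"event": "OrderPlaced", "version": 2, "user": {"id": 123}}
```

With this in place the handler only needs to understand the latest version, and the translation code can be deleted once no version 1 events remain in flight.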

Ordering and Idempotency

Ordering: Events for the same entity should be processed in order. Partitioned brokers such as Kafka guarantee order only within a partition, so use the entity ID as the message key:

producer.send(
    topic="orders",
    key=order_id,  # Same order always goes to same partition
    value=event
)
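Why a key preserves ordering is easier to see with the partition choice written out. The function below is a simplified stand-in: it uses CRC32 because it is in the standard library and deterministic, while real client libraries use their own hash functions. `partition_for` is a hypothetical name.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition deterministically (illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


first = partition_for("order_456", 12)
second = partition_for("order_456", 12)
# first == second: every event for order_456 lands on the same partition,
# so one consumer sees them in the order they were produced.
```

Events for different orders may land on different partitions, and no ordering is guaranteed between them.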

Idempotency: Consumers may receive events multiple times (at-least-once delivery). Make processing idempotent:

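def handle_order_placed(event):
    # already_processed / mark_processed should be backed by a durable store
    if already_processed(event.id):
        return  # Idempotent: skip duplicate

    process_order(event)
    # Ideally record this in the same transaction as process_order's writes,
    # so a crash between the two calls cannot cause double processing
    mark_processed(event.id)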

Consumer Groups

Multiple instances of a consumer service can form a consumer group. The broker distributes events across the instances, so each event is delivered to only one instance in the group.

This enables horizontal scaling of consumers.
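With a partitioned broker, the distribution usually happens at the partition level: each partition is owned by one instance in the group. The sketch below shows a simple round-robin assignment; `assign_partitions` is a hypothetical helper, and real brokers offer several assignment strategies and rebalance automatically as instances join or leave.

```python
def assign_partitions(partitions, instances):
    """Spread partitions across consumer instances round-robin."""
    assignment = {instance: [] for instance in instances}
    for index, partition in enumerate(sorted(partitions)):
        owner = instances[index % len(instances)]
        assignment[owner].append(partition)
    return assignment


two = assign_partitions(range(6), ["consumer-a", "consumer-b"])
# two == {"consumer-a": [0, 2, 4], "consumer-b": [1, 3, 5]}

three = assign_partitions(range(6), ["consumer-a", "consumer-b", "consumer-c"])
# three == {"consumer-a": [0, 3], "consumer-b": [1, 4], "consumer-c": [2, 5]}
```

One consequence of this scheme is that instances beyond the number of partitions receive nothing, so the partition count caps how far a group can scale out.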

Dead Letter Queues

When consumers fail to process events, dead letter queues capture failed events for investigation and retry:

Event → Consumer → [Success] → Ack
               → [Failure] → Retry
               → [Repeated Failure] → Dead Letter Queue

Monitor dead letter queues; accumulation indicates problems.
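The retry-then-park flow can be sketched as follows. This is a simplified, in-process illustration: `consume` and the list used as a dead letter queue are assumptions, and a production consumer would add backoff between attempts and publish failures to a real queue.

```python
def consume(event, handler, dead_letters, max_attempts=3):
    """Try the handler up to max_attempts times, then dead-letter the event."""
    last_error = None
    for _ in range(max_attempts):
        try:
            handler(event)
            return True  # success: the caller can ack the event
        except Exception as exc:  # broad catch is deliberate in this sketch
            last_error = exc
    dead_letters.append(
        {"event": event, "error": str(last_error), "attempts": max_attempts}
    )
    return False


dead_letters = []
calls = {"count": 0}


def flaky_handler(event):
    # Fails once, then succeeds: a transient error that a retry fixes.
    calls["count"] += 1
    if calls["count"] < 2:
        raise RuntimeError("temporary failure")


def broken_handler(event):
    # Always fails: a poison message that retries cannot fix.
    raise ValueError("bad payload")


ok = consume({"id": "evt_1"}, flaky_handler, dead_letters)
failed = consume({"id": "evt_2"}, broken_handler, dead_letters)
```

Storing the error alongside the event makes the dead letter queue useful for diagnosis, not just for replay.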

Schema Management

As events evolve, manage schemas carefully:

Tools like Avro, Protobuf, and JSON Schema provide schema evolution support.
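To give a feel for what a compatibility check does, here is a deliberately simplified sketch. It is not the rule set of Avro, Protobuf, or JSON Schema: the schema format (field name mapped to a type and a required flag) and the `is_backward_compatible` function are assumptions for the example. The rule it encodes is that a consumer on the new schema can still read old events if no new required field was added and no existing field changed type.

```python
def is_backward_compatible(old, new):
    """Can a consumer using `new` read events written with `old`?"""
    for name, spec in new.items():
        if name not in old:
            # Old events lack this field, so it must be optional.
            if spec.get("required", False):
                return False
        elif spec["type"] != old[name]["type"]:
            return False
    return True


v1 = {"order_id": {"type": "string", "required": True}}
v2_optional = {**v1, "coupon": {"type": "string", "required": False}}
v2_required = {**v1, "currency": {"type": "string", "required": True}}

safe = is_backward_compatible(v1, v2_optional)      # True
breaking = is_backward_compatible(v1, v2_required)  # False
```

Running a check like this in CI before a schema change is published catches breaking changes before they reach consumers.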

When to Use Event-Driven Architecture

Good fit:

Poor fit:

Event-driven architecture adds complexity. It should solve real problems, not add architectural sophistication for its own sake.

Key Takeaways