Building Multi-Region Applications

When your application serves users in New York and Tokyo, physics becomes your enemy. Even in a vacuum, light needs about 70 milliseconds for a round trip between the two cities, and that’s the theoretical minimum. Real-world network latency is higher, because light travels more slowly in fiber and cables do not follow the shortest path. Add in multiple round trips for a typical page load, and users across the globe experience dramatically different performance.
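The physical floor can be estimated with a few lines of arithmetic. The sketch below is illustrative only: the city coordinates are approximate, the Earth is treated as a sphere, and the 2/3 factor for the speed of light in fiber is a common rule of thumb rather than a measured value.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FACTOR = 2 / 3  # assumption: light in optical fiber travels at roughly 2/3 c


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km, speed_factor=1.0):
    """Lower bound on round-trip time over a straight path at the given speed."""
    return 2 * distance_km / (SPEED_OF_LIGHT_KM_S * speed_factor) * 1000


NEW_YORK = (40.71, -74.01)  # approximate coordinates
TOKYO = (35.68, 139.69)

distance = great_circle_km(*NEW_YORK, *TOKYO)
vacuum_rtt = min_rtt_ms(distance)
fiber_rtt = min_rtt_ms(distance, FIBER_FACTOR)
print(f"{distance:.0f} km, vacuum RTT {vacuum_rtt:.0f} ms, fiber RTT {fiber_rtt:.0f} ms")
```

Under these assumptions the vacuum round trip comes out on the order of 70 ms and the fiber round trip a little over 100 ms, before any routing, queuing, or processing delay is added.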

Multi-region architecture addresses this by placing application components close to users. Done well, it improves performance for everyone. Done poorly, it adds complexity without benefit.

Why Multi-Region

Latency Reduction

Closer servers mean lower latency. A user in Singapore hitting a server in the US typically sees 200-300 ms of round-trip network latency. The same user hitting a server in Singapore sees roughly 10-20 ms.

For interactive applications, this difference is noticeable. For real-time applications, it’s critical.

Disaster Recovery

A single region is a single point of failure. Data center outages, regional network issues, and cloud provider problems can take down everything.

Multi-region provides redundancy. If one region fails, others continue serving traffic.

Regulatory Compliance

Some regulations restrict where data may be stored or processed. GDPR’s cross-border transfer rules, national data sovereignty laws, and industry regulations can make it necessary to keep and serve users’ data within a particular jurisdiction.

Multi-region enables compliance by keeping data where regulations require.

Architecture Patterns

Active-Passive

One region handles all traffic (active). Another region maintains replicas (passive), ready to take over if the active region fails.

Active Region (US)          Passive Region (EU)
     │                            │
     ├── Application ──────────── │ (replica)
     │                            │
     ├── Database ────────────────│ (read replica)
     │                            │
     └── Users ◄───────────────── │ (failover)

Pros:

Simpler than active-active
No cross-region write conflicts (single writer), though asynchronously replicated data can be lost on failover
Clear ownership and operational model

Cons:

Passive region doesn’t serve traffic (wasted capacity)
Failover requires intervention (or complex automation)
Users distant from active region experience latency

When to use:

Primary goal is disaster recovery, not global performance
Data consistency is critical
Simpler operations are preferred
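The automated side of failover can be as small as a counter on health checks. The following is a minimal sketch, not a production controller: the `FailoverMonitor` class, the region names, and the threshold of three consecutive failures are all hypothetical, and a real system would also need to promote the database replica and update routing.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Swaps active and passive regions after N consecutive failed health checks."""

    active: str = "us"
    passive: str = "eu"
    failure_threshold: int = 3  # assumption: tune to your tolerance for false alarms
    consecutive_failures: int = 0

    def record_check(self, healthy: bool) -> str:
        """Record one health check of the active region; return the region to serve from."""
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                # Promote the passive region and reset the counter.
                self.active, self.passive = self.passive, self.active
                self.consecutive_failures = 0
        return self.active


monitor = FailoverMonitor()
results = [monitor.record_check(h) for h in (True, False, False, False)]
print(results)
```

Requiring several consecutive failures trades a slower failover for fewer spurious ones, which matters because failing over on a transient blip can itself cause an outage.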

Active-Active

Multiple regions serve traffic simultaneously. Users are routed to the nearest region.

Region US                       Region EU
    │                               │
    ├── Application                 ├── Application
    │                               │
    ├── Database ◄── sync ──►       ├── Database
    │                               │
    └── Users (US)                  └── Users (EU)

Pros:

Lower latency for all users
Full utilization of all regions
No promotion step on failure (health-checked routing shifts traffic to the surviving regions)

Cons:

Data consistency challenges
More complex operations
Higher cost

When to use:

Performance for global users is important
Application can tolerate eventual consistency
Operations capacity exists to manage complexity

Follow-the-Sun

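Read traffic goes to the nearest region. Write traffic goes to a single primary region. (This layout is more often called read-local/write-global, or a primary with read replicas; “follow-the-sun” usually describes moving the active region with the time of day.)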

Primary (US): Reads + Writes
Secondary (EU): Reads only, replicated from US
Secondary (Asia): Reads only, replicated from US

Pros:

Improved read latency globally
Simpler consistency model (single writer)
Less complex than full active-active

Cons:

Write latency not improved for remote users
Failover still required for primary

When to use:

Read-heavy workloads
Strong consistency requirements for writes
Want better performance without full active-active complexity
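The routing rule for this pattern (reads stay local, writes go to the primary) fits in one small function. This is a sketch under assumed names: `RegionRouter` and the region identifiers are hypothetical, and a real router would also consider replica health and lag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionRouter:
    primary: str
    replicas: frozenset

    def route(self, operation: str, client_region: str) -> str:
        """Return the region that should handle this operation."""
        if operation == "write":
            return self.primary  # single writer keeps the consistency model simple
        if operation == "read":
            # Serve reads locally when the client's region hosts a copy of the data.
            if client_region == self.primary or client_region in self.replicas:
                return client_region
            return self.primary  # no local replica: fall back to the primary
        raise ValueError(f"unknown operation: {operation!r}")


router = RegionRouter(primary="us", replicas=frozenset({"eu", "asia"}))
print(router.route("read", "eu"), router.route("write", "eu"))
```

A client in the EU reads from the EU replica but still pays the cross-region round trip on every write, which is the trade-off listed under the cons above.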

Data Synchronization

Multi-region data is the hard problem. Several approaches:

Asynchronous Replication

Changes replicate from primary to secondaries with some delay.

Write → Primary DB → Replication (async) → Secondary DBs
                          ↓
                    (lag: 100ms-10s)

Trade-offs:

Eventually consistent reads
Possible read-after-write issues
Conflicts possible with active-active

Use when: The application tolerates stale reads and low write latency matters.
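One common mitigation for read-after-write issues is to send a user’s reads to the primary for a short window after that user writes. The sketch below assumes a hypothetical `ReadYourWritesRouter` class and a 10-second pin window (chosen to match the upper end of the lag range in the diagram above); real systems often track replication positions instead of wall-clock time.

```python
import time


class ReadYourWritesRouter:
    """Pins a user's reads to the primary for a window after their last write."""

    def __init__(self, pin_seconds=10.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumption: at least the worst expected lag
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._last_write = {}  # user_id -> time of most recent write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def read_target(self, user_id):
        """Return "primary" while the user's write may not have replicated yet."""
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"


router = ReadYourWritesRouter()
router.record_write("alice")
print(router.read_target("alice"), router.read_target("bob"))
```

Only users who have just written pay the extra latency; everyone else keeps reading from the nearby replica.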

Synchronous Replication

Writes aren’t acknowledged until replicated to all regions.

Trade-offs:

Strong consistency
Higher write latency (cross-region round trips)
Availability reduced (dependent on all regions)

Use when: Strong consistency required, write latency acceptable.
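Both costs can be estimated with simple arithmetic. The sketch below is a rough model, not a benchmark: the round-trip times and availability figures are made-up illustrative numbers, replicas are assumed to be contacted in parallel, and regional failures are assumed to be independent.

```python
import math


def sync_write_latency_ms(local_commit_ms, rtt_to_replicas_ms):
    """Write latency when every replica must acknowledge before the client is told.

    Replicas are contacted in parallel, so the slowest one sets the pace.
    """
    return local_commit_ms + max(rtt_to_replicas_ms.values(), default=0.0)


def sync_write_availability(region_availabilities):
    """Writes succeed only if every region is up (assumes independent failures)."""
    return math.prod(region_availabilities)


rtts = {"eu": 80.0, "asia": 160.0}  # illustrative round-trip times from the primary
sync_ms = sync_write_latency_ms(5.0, rtts)
availability = sync_write_availability([0.999, 0.999, 0.999])
print(f"sync write: {sync_ms} ms, write availability: {availability:.6f}")
```

With these numbers a 5 ms local commit becomes a 165 ms write, and three regions at 99.9% each yield about 99.7% write availability, which is lower than any single region alone.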

Conflict Resolution

Active-active writes can conflict. Strategies:

Last-write-wins: Latest timestamp wins (simple, can lose data)
Merge: Combine conflicting changes (complex, domain-specific)
Application resolution: Application logic decides (most flexible)

Choose based on your data semantics.
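To make the difference concrete, here is a minimal sketch of last-write-wins and a merge strategy applied to the same conflict. The `Version` type and the shopping-cart data are hypothetical, and the merge shown (set union) is only one possible domain-specific rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: frozenset  # e.g. items in a shopping cart
    timestamp: float  # write time as recorded by the writing region
    region: str


def last_write_wins(a, b):
    """Keep the version with the later timestamp; break ties by region name."""
    return max(a, b, key=lambda v: (v.timestamp, v.region))


def merge_union(a, b):
    """Domain-specific merge: keep every item that appears in either version."""
    newer = last_write_wins(a, b)
    return Version(a.value | b.value, newer.timestamp, newer.region)


us = Version(frozenset({"apple", "pear"}), 100.0, "us")
eu = Version(frozenset({"apple", "plum"}), 101.0, "eu")

lww = last_write_wins(us, eu)  # keeps the EU write; "pear" is silently dropped
merged = merge_union(us, eu)   # keeps all three items
print(sorted(lww.value), sorted(merged.value))
```

Last-write-wins also depends on clocks being reasonably synchronized across regions, and a plain union cannot express deletions, so neither strategy is free of pitfalls.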

Regional Data Affinity

Some data has natural regional affinity. US users’ data stays in US. EU users’ data stays in EU.

User in US → US Region → US Data
User in EU → EU Region → EU Data

This avoids cross-region replication for most operations. Cross-region access is only needed for global data or when migrating a user between regions.
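In code, regional affinity is usually a lookup from a user attribute to a home region. The sketch below is illustrative: the country-to-region mapping and the data store names are hypothetical, and real residency rules are typically more detailed than a country code.

```python
# Hypothetical mapping from a user's country to the region that owns their data.
HOME_REGION_BY_COUNTRY = {"US": "us", "CA": "us", "DE": "eu", "FR": "eu"}

# Hypothetical data store names, one per region.
DATA_STORES = {"us": "us-users-db", "eu": "eu-users-db"}


def home_region(country_code):
    """Return the region where this user's data must live."""
    try:
        return HOME_REGION_BY_COUNTRY[country_code]
    except KeyError:
        # Fail loudly: guessing a region could violate a residency requirement.
        raise LookupError(f"no home region configured for {country_code!r}") from None


def data_store_for(country_code):
    return DATA_STORES[home_region(country_code)]


def is_cross_region(serving_region, country_code):
    """True when a request is served outside the user's home region."""
    return serving_region != home_region(country_code)


print(data_store_for("DE"), is_cross_region("us", "DE"))
```

Counting how often `is_cross_region` returns true is a cheap way to check whether the affinity model actually keeps most operations local.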

Routing Traffic

Get users to the right region.

DNS-Based Routing

Amazon Route 53, Cloudflare, and other DNS providers support geo-routing:

example.com →
  US users → us-east-app.example.com
  EU users → eu-west-app.example.com

Pros: Simple, widely supported, no client changes.

Cons: DNS caching (TTLs) delays routing changes; geolocation precision is limited.
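The decision a geo-routing policy makes can be sketched as a table lookup with a health-aware fallback. This is a simplified model, not any provider’s API: the continent codes, the default endpoint, and the `resolve` function are assumptions; only the two hostnames come from the example above.

```python
# Geo-routing table: client location -> preferred endpoint.
GEO_ROUTES = {
    "NA": "us-east-app.example.com",
    "EU": "eu-west-app.example.com",
}
DEFAULT_ENDPOINT = "us-east-app.example.com"  # assumption: used for unmapped locations


def resolve(continent, healthy):
    """Return the endpoint for a client, skipping endpoints that are unhealthy."""
    preferred = GEO_ROUTES.get(continent, DEFAULT_ENDPOINT)
    if preferred in healthy:
        return preferred
    # Preferred endpoint is down: fall back to any healthy endpoint, in table order.
    for endpoint in GEO_ROUTES.values():
        if endpoint in healthy:
            return endpoint
    raise RuntimeError("no healthy endpoints")


all_up = {"us-east-app.example.com", "eu-west-app.example.com"}
print(resolve("EU", all_up))
print(resolve("EU", {"us-east-app.example.com"}))
```

Because resolvers cache answers, a change in the health set only reaches clients after the record’s TTL expires, which is why short TTLs are common on geo-routed names.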

Anycast

Single IP address routed to nearest instance via BGP.

Pros: Transparent, automatic, fast failover.

Cons: Running it yourself requires your own IP address space and BGP peering; complex to set up.

CDN-Based Routing

CDNs (CloudFront, Akamai, Fastly) route to nearest edge location:

User → Nearest CDN Edge → Cache or Origin in Best Region

Pros: Built-in, handles static content excellently.

Cons: Cost, less control for dynamic content.

Operational Considerations

Deployment Coordination

Deploy to all regions consistently. Inconsistent versions cause subtle bugs.

Options:

Serial deployment: One region at a time, with validation
Parallel deployment: All regions simultaneously
Canary by region: New version to one region first
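Serial and canary-by-region deployment share the same control loop: deploy to one region, validate, and stop if validation fails. The sketch below is a hypothetical skeleton; `deploy` and `validate` stand in for whatever your deployment tooling and health checks provide.

```python
from typing import Callable, List, Sequence


def deploy_serially(
    regions: Sequence[str],
    deploy: Callable[[str], None],
    validate: Callable[[str], bool],
) -> List[str]:
    """Deploy one region at a time; halt at the first region that fails validation.

    Returns the regions that were deployed and validated successfully.
    """
    succeeded = []
    for region in regions:
        deploy(region)
        if not validate(region):
            break  # stop the rollout; later regions keep the old version
        succeeded.append(region)
    return succeeded


# Example run with stand-in callables: the "us" region fails validation.
deploy_log = []
done = deploy_serially(["eu", "us", "asia"], deploy_log.append, lambda r: r != "us")
print(done, deploy_log)
```

Putting the lowest-risk region first in the list turns the same loop into a canary-by-region rollout. A halted rollout leaves regions on different versions, so it should be followed promptly by a fix or a rollback.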

Monitoring Across Regions

Monitor each region independently and in aggregate:

Per-region latency, errors, traffic
Cross-region replication lag
Region health for routing decisions
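The reason to look at both views is that an aggregate can hide a regional problem. The sketch below uses made-up numbers and hypothetical thresholds (1% error rate, 10 seconds of replication lag) to show a case where the global error rate looks fine while one region is clearly unhealthy.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    requests: int
    errors: int
    replication_lag_s: float


def error_rate(stats):
    return stats.errors / stats.requests if stats.requests else 0.0


def summarize(stats_by_region, max_error_rate=0.01, max_lag_s=10.0):
    """Return the aggregate error rate plus the regions that breach a threshold."""
    total_requests = sum(s.requests for s in stats_by_region.values())
    total_errors = sum(s.errors for s in stats_by_region.values())
    unhealthy = sorted(
        name
        for name, s in stats_by_region.items()
        if error_rate(s) > max_error_rate or s.replication_lag_s > max_lag_s
    )
    aggregate = total_errors / total_requests if total_requests else 0.0
    return {"aggregate_error_rate": aggregate, "unhealthy": unhealthy}


stats = {
    "us": RegionStats(requests=10_000, errors=10, replication_lag_s=0.0),
    "eu": RegionStats(requests=1_000, errors=50, replication_lag_s=0.4),
}
summary = summarize(stats)
print(summary)
```

Here the aggregate error rate stays below 1% because the US region carries most of the traffic, yet the EU region is failing 5% of its requests. The list of unhealthy regions is also the kind of signal routing decisions can consume.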

Failover Procedures

Document and practice failover:

When to fail over?
Who decides?
What’s the procedure?
How to fail back?

Untested failover procedures tend to fail exactly when they are needed.

Testing

Test cross-region behavior:

Performance from different locations
Behavior during region failure
Data consistency under various scenarios

Global testing is harder than single-region testing. Invest in it.

When to Go Multi-Region

Multi-region adds significant complexity. Justify it:

Strong case:

Users distributed globally with latency requirements
Regulatory requirements for data residency
Business requires high availability beyond single-region capability

Weak case:

“We might go global someday”
Single region meets latency requirements
Complexity isn’t justified by traffic patterns

Start with single region, design for eventual multi-region, and migrate when justified.

Key Takeaways

Multi-region reduces latency, improves reliability, and enables regulatory compliance
Active-passive is simpler but doesn’t improve performance; active-active is complex but better for users
Data synchronization is the hard problem; choose consistency model based on application requirements
Route traffic using DNS, anycast, or CDN-based approaches
Coordinate deployments, monitor per-region, and practice failover procedures
Don’t go multi-region prematurely; justify complexity with actual requirements