Building Multi-Region Applications

October 2, 2017

When your application serves users in New York and Tokyo, physics becomes your enemy. Even in a vacuum, light needs about 36 milliseconds to cover the roughly 10,800 km between them, or about 72 milliseconds for a round trip, and that’s the theoretical minimum. Real-world network latency is higher, because signals travel more slowly in fiber and routes are not straight lines. Add in multiple round trips for a typical page load, and users across the globe experience dramatically different performance.
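The lower bound can be worked out from the great-circle distance. The following is a rough sketch, not a measurement: the city coordinates and the fiber refractive index of 1.47 are assumed values, and real routes are longer than the great circle.

```python
import math

EARTH_RADIUS_KM = 6371.0
C_VACUUM_KM_S = 299_792.458           # speed of light in vacuum
C_FIBER_KM_S = C_VACUUM_KM_S / 1.47   # assumed refractive index for optical fiber


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def one_way_ms(distance_km, speed_km_s):
    """Travel time in milliseconds at a constant speed."""
    return distance_km / speed_km_s * 1000


# Approximate coordinates (assumed values for illustration).
NEW_YORK = (40.71, -74.01)
TOKYO = (35.68, 139.69)

distance = great_circle_km(*NEW_YORK, *TOKYO)
vacuum_ms = one_way_ms(distance, C_VACUUM_KM_S)
fiber_ms = one_way_ms(distance, C_FIBER_KM_S)

print(f"distance:           {distance:,.0f} km")
print(f"vacuum, one way:    {vacuum_ms:.1f} ms")
print(f"vacuum, round trip: {2 * vacuum_ms:.1f} ms")
print(f"fiber, round trip:  {2 * fiber_ms:.1f} ms")
```

Under these assumptions, a single round trip over fiber already costs on the order of 100 ms before any server processing, and a page load that needs several sequential round trips multiplies that figure.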

Multi-region architecture addresses this by placing application components close to users. Done well, it improves performance for everyone. Done poorly, it adds complexity without benefit.

Why Multi-Region

Latency Reduction

Closer servers mean lower latency. A user in Singapore hitting a server in the US typically sees 200-300 ms of network latency. The same user hitting a server in Singapore sees 10-20 ms.

For interactive applications, this difference is noticeable. For real-time applications, it’s critical.

Disaster Recovery

A single region is a single point of failure. Data center outages, regional network issues, and cloud provider problems can take down everything.

Multi-region provides redundancy. If one region fails, others continue serving traffic.

Regulatory Compliance

Some regulations require data residency in specific jurisdictions. GDPR, data sovereignty laws, and industry regulations may require serving users from local regions.

Multi-region enables compliance by keeping data where regulations require.

Architecture Patterns

Active-Passive

One region handles all traffic (active). Another region maintains replicas (passive), ready to take over if the active region fails.

Active Region (US)          Passive Region (EU)
     │                            │
     ├── Application ──────────── │ (replica)
     │                            │
     ├── Database ────────────────│ (read replica)
     │                            │
     └── Users ◄───────────────── │ (failover)
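The failover decision in this pattern can be sketched as a small state machine. This is a minimal illustration, not a production controller: the `FailoverController` class, the region names, and the threshold of three consecutive failed health checks are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    active: str = "us"
    passive: str = "eu"
    failure_threshold: int = 3      # assumed: fail over after 3 bad checks in a row
    consecutive_failures: int = 0

    def record_health_check(self, healthy: bool) -> str:
        """Record one health check of the active region.

        Returns the region that should serve traffic after this check.
        """
        if healthy:
            # Any success resets the streak, so brief blips do not trigger failover.
            self.consecutive_failures = 0
            return self.active

        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Promote the passive region; the old active becomes the standby.
            self.active, self.passive = self.passive, self.active
            self.consecutive_failures = 0
        return self.active


controller = FailoverController()
for result in [True, False, False, True, False, False, False]:
    print(result, "->", controller.record_health_check(result))
```

Requiring several consecutive failures trades slower detection for fewer spurious failovers. A real setup would also have to promote the read replica to a writable primary, which this sketch leaves out.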

Pros:

Cons:

When to use:

Active-Active

Multiple regions serve traffic simultaneously. Users are routed to the nearest region.

Region US                       Region EU
    │                               │
    ├── Application                 ├── Application
    │                               │
    ├── Database ◄── sync ──►       ├── Database
    │                               │
    └── Users (US)                  └── Users (EU)
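Routing each user to the nearest region can be sketched as picking the healthy region with the lowest measured latency. The function name and the latency figures below are illustrative assumptions.

```python
def nearest_region(latency_ms: dict[str, float], healthy: set[str]) -> str:
    """Return the healthy region with the lowest latency for this user."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies measured from a user in Paris.
paris_user = {"us": 95.0, "eu": 12.0}

print(nearest_region(paris_user, healthy={"us", "eu"}))  # normal operation
print(nearest_region(paris_user, healthy={"us"}))        # EU region is down
```

Because every region already serves traffic, losing one region only changes the candidate set. The remaining regions still need enough spare capacity to absorb the redirected users.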

Pros:

Cons:

When to use:

Follow-the-Sun

Read traffic goes to the nearest region. Write traffic always goes to the primary region, which replicates to the others.

Primary (US): Reads + Writes
Secondary (EU): Reads only, replicated from US
Secondary (Asia): Reads only, replicated from US
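The read/write split above can be sketched as a small routing function. The `route` helper and the region identifiers are hypothetical names chosen to mirror the layout shown.

```python
PRIMARY = "us"
READ_REGIONS = {"us", "eu", "asia"}   # the primary also serves reads


def route(operation: str, user_region: str) -> str:
    """Pick the region that should handle a request."""
    if operation == "write":
        # All writes go to the single primary, wherever the user is.
        return PRIMARY
    if operation == "read":
        # Reads stay local when a replica exists; otherwise fall back to the primary.
        return user_region if user_region in READ_REGIONS else PRIMARY
    raise ValueError(f"unknown operation: {operation!r}")


print(route("read", "asia"))
print(route("write", "asia"))
```

A user in Asia gets local reads but pays the full cross-region latency on every write, and may briefly read stale data right after writing.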

Pros:

Cons:

When to use:

Data Synchronization

Multi-region data is the hard problem. Several approaches:

Asynchronous Replication

Changes replicate from primary to secondaries with some delay.

Write → Primary DB → Replication (async) → Secondary DBs
                          ↓
                    (lag: 100ms-10s)
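The effect of replication lag on reads can be shown with a toy model. This is a simplified sketch with hypothetical `Primary` and `Secondary` classes and explicit timestamps instead of a real clock. It is not how any particular database implements replication.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []   # ordered list of (commit_time, key, value)

    def write(self, now, key, value):
        # The write is acknowledged immediately; replication happens later.
        self.data[key] = value
        self.log.append((now, key, value))


class Secondary:
    def __init__(self, lag_s):
        self.lag_s = lag_s
        self.data = {}
        self.applied = 0   # number of log entries applied so far

    def catch_up(self, primary, now):
        """Apply every log entry that is at least lag_s seconds old."""
        while self.applied < len(primary.log):
            commit_time, key, value = primary.log[self.applied]
            if commit_time + self.lag_s > now:
                break
            self.data[key] = value
            self.applied += 1


primary = Primary()
secondary = Secondary(lag_s=2.0)

primary.write(now=0.0, key="cart", value="1 item")
secondary.catch_up(primary, now=1.0)
print(secondary.data.get("cart"))   # not replicated yet: a stale read

secondary.catch_up(primary, now=2.0)
print(secondary.data.get("cart"))   # replicated after the lag has passed
```

The window between the two reads is where users can see stale data, and it is also the data that would be lost if the primary failed before the secondary caught up.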

Trade-offs:

Use when: The application tolerates stale reads and needs low write latency.

Synchronous Replication

Writes aren’t acknowledged until replicated to all regions.
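The cost of waiting for every region can be estimated with simple arithmetic. The sketch below assumes replicas are contacted in parallel, so the slowest round trip dominates. The function names and all latency figures are hypothetical.

```python
def sync_write_latency_ms(local_commit_ms: float, replica_rtt_ms: dict[str, float]) -> float:
    """Latency of a write that waits for acknowledgement from every replica."""
    # Replicas are contacted in parallel, so only the slowest one matters.
    return local_commit_ms + max(replica_rtt_ms.values(), default=0.0)


def async_write_latency_ms(local_commit_ms: float) -> float:
    """Latency of a write that is acknowledged after the local commit only."""
    return local_commit_ms


# Assumed round-trip times from a US primary to its replicas.
rtts = {"eu": 80.0, "asia": 160.0}

print(sync_write_latency_ms(5.0, rtts))   # local commit plus the slowest replica
print(async_write_latency_ms(5.0))        # local commit only
```

With these numbers, each synchronous write costs 165 ms instead of 5 ms, and adding a more distant region raises the latency of every write.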

Trade-offs:

Use when: Strong consistency required, write latency acceptable.

Conflict Resolution

Active-active writes can conflict. Strategies:

Choose based on your data semantics.
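As one illustration, last-write-wins (LWW) is a widely used strategy: each version carries a timestamp, and the newer one survives. The sketch below is a minimal version with a hypothetical `Versioned` type. It assumes region clocks are reasonably synchronized and uses the region name to break ties deterministically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float   # assumed to come from roughly synchronized clocks
    region: str


def last_write_wins(a: Versioned, b: Versioned) -> Versioned:
    """Keep the newer version; break timestamp ties by region name."""
    # Comparing (timestamp, region) makes the result independent of argument
    # order, so every region converges on the same winner.
    return max(a, b, key=lambda v: (v.timestamp, v.region))


us_write = Versioned("shipping: express", timestamp=100.0, region="us")
eu_write = Versioned("shipping: standard", timestamp=100.4, region="eu")

print(last_write_wins(us_write, eu_write).value)
```

LWW silently discards the losing write, which is acceptable for something like a profile field but not for counters or balances, where merging the two updates is the only correct outcome.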

Regional Data Affinity

Some data has natural regional affinity. US users’ data stays in US. EU users’ data stays in EU.

User in US → US Region → US Data
User in EU → EU Region → EU Data

This avoids cross-region replication for most operations. Cross-region access is only needed for global data or when a user migrates between regions.
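Regional affinity can be sketched as a lookup from user to home region, followed by a purely local data access. The directory, the in-memory stores, and the function names are hypothetical stand-ins for real services.

```python
# Small global directory: which region owns each user's data (assumed data).
HOME_REGION = {"alice": "us", "bob": "eu"}

# One independent store per region; dicts stand in for regional databases.
STORES: dict[str, dict[str, dict]] = {"us": {}, "eu": {}}


def save_profile(user_id: str, profile: dict) -> str:
    """Write the profile to the user's home region only; return that region."""
    region = HOME_REGION[user_id]
    STORES[region][user_id] = profile
    return region


def load_profile(user_id: str) -> dict:
    """Read the profile from the user's home region."""
    return STORES[HOME_REGION[user_id]][user_id]


print(save_profile("bob", {"lang": "de"}))
print(load_profile("bob"))
print("bob" in STORES["us"])   # the EU user's data never reaches the US store
```

Only the small user-to-region directory has to be available globally. The bulk of the data is never replicated across regions, which also helps with residency requirements.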

Routing Traffic

Get users to the right region.

DNS-Based Routing

Amazon Route 53, Cloudflare, and other DNS providers support geo-routing:

example.com →
  US users → us-east-app.example.com
  EU users → eu-west-app.example.com

Pros: Simple, widely supported, no client changes.

Cons: DNS caching affects change propagation, limited precision.
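The caching drawback can be illustrated with a toy resolver. This is a sketch of the general behavior rather than any provider's API: the `CachingResolver` class, the default target, and the 300-second TTL are assumptions, and the hostnames are the ones from the example above.

```python
GEO_RECORDS = {
    "US": "us-east-app.example.com",
    "EU": "eu-west-app.example.com",
}
DEFAULT_TARGET = "us-east-app.example.com"   # assumed fallback for other locations


class CachingResolver:
    """Toy resolver that caches geo-routed answers for ttl_s seconds."""

    def __init__(self, records, ttl_s):
        self.records = records
        self.ttl_s = ttl_s
        self._cache = {}   # geo -> (target, expires_at)

    def resolve(self, geo, now):
        cached = self._cache.get(geo)
        if cached is not None and now < cached[1]:
            return cached[0]   # served from cache, even if the record has changed
        target = self.records.get(geo, DEFAULT_TARGET)
        self._cache[geo] = (target, now + self.ttl_s)
        return target


resolver = CachingResolver(dict(GEO_RECORDS), ttl_s=300)
print(resolver.resolve("EU", now=0))

# The EU region fails and the operator repoints EU users to the US.
resolver.records["EU"] = "us-east-app.example.com"
print(resolver.resolve("EU", now=100))   # still the old answer
print(resolver.resolve("EU", now=300))   # the change is visible once the TTL expires
```

A shorter TTL speeds up failover at the cost of more DNS queries, and some resolvers and clients are known to hold answers longer than the TTL says.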

Anycast

A single IP address is announced from multiple locations, and BGP routes each user to the nearest instance.

Pros: Transparent, automatic, fast failover.

Cons: Requires IP address ownership, complex to set up.

CDN-Based Routing

CDNs (CloudFront, Akamai, Fastly) route to nearest edge location:

User → Nearest CDN Edge → Cache or Origin in Best Region

Pros: Built-in, handles static content excellently.

Cons: Cost, less control for dynamic content.

Operational Considerations

Deployment Coordination

Deploy to all regions consistently. Inconsistent versions cause subtle bugs.
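A simple guard against version drift is to compare what each region reports and flag the outliers. The function name and version strings below are hypothetical. In practice the input would come from each region's health or metadata endpoint.

```python
from collections import Counter


def find_version_drift(deployed: dict[str, str]) -> dict[str, str]:
    """Return the regions whose version differs from the most common one."""
    if not deployed:
        return {}
    majority, _ = Counter(deployed.values()).most_common(1)[0]
    return {region: v for region, v in deployed.items() if v != majority}


# Assumed snapshot taken after a partially failed rollout.
versions = {"us": "1.4.2", "eu": "1.4.2", "asia": "1.4.1"}
print(find_version_drift(versions))
```

Running a check like this after every rollout, and alerting when the result is non-empty for longer than the expected rollout window, catches a region that silently stayed behind.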

Options:

Monitoring Across Regions

Monitor each region independently and in aggregate:
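A short calculation shows why both views are needed: an aggregate metric can look healthy while one region is badly degraded. The function name and request counts are made up for illustration.

```python
def error_rates(counts: dict[str, tuple[int, int]]):
    """counts maps region -> (errors, requests).

    Returns (per-region error rates, aggregate error rate).
    """
    per_region = {
        region: (errors / requests if requests else 0.0)
        for region, (errors, requests) in counts.items()
    }
    total_errors = sum(errors for errors, _ in counts.values())
    total_requests = sum(requests for _, requests in counts.values())
    aggregate = total_errors / total_requests if total_requests else 0.0
    return per_region, aggregate


# Hypothetical one-minute window: Asia is failing but carries little traffic.
window = {"us": (10, 10_000), "eu": (5, 10_000), "asia": (200, 1_000)}
per_region, aggregate = error_rates(window)

for region, rate in per_region.items():
    print(f"{region:5s} {rate:.2%}")
print(f"all   {aggregate:.2%}")
```

Here the global error rate is about 1%, which might not page anyone, while a fifth of requests in Asia are failing. Alert thresholds therefore need to exist per region as well as globally.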

Failover Procedures

Document and practice failover:

Untested failover procedures fail when needed.

Testing

Test cross-region behavior:

Global testing is harder than single-region testing. Invest in it.

When to Go Multi-Region

Multi-region adds significant complexity. Justify it:

Strong case:

Weak case:

Start with a single region, design for eventual multi-region, and migrate when justified.

Key Takeaways