Multi-region architectures deploy applications across geographically separated regions. The benefits are clear: better latency for global users, improved availability through regional redundancy, and compliance with data residency requirements.
The complexity is also significant. Here’s how to navigate the trade-offs.
Why Multi-Region
Latency Reduction
User in Tokyo → US-East server: ~150ms RTT
User in Tokyo → Tokyo server: ~10ms RTT
For interactive applications, this difference is dramatic.
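The gap compounds when a page makes several dependent requests in a row. As a rough sketch, the calculation below multiplies the RTT figures above by an assumed number of sequential round trips; `networkWaitMs` and the count of 5 are illustrative, not measured values.

```javascript
// Total network wait for a chain of dependent requests, ignoring server time.
function networkWaitMs(rttMs, sequentialRequests) {
  return rttMs * sequentialRequests;
}

const sequentialRequests = 5; // assumed: connection setup plus a few API calls
const remote = networkWaitMs(150, sequentialRequests); // Tokyo -> US-East
const local = networkWaitMs(10, sequentialRequests); // Tokyo -> Tokyo
console.log(`remote: ${remote}ms, local: ${local}ms`); // remote: 750ms, local: 50ms
```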
High Availability
Single region: One AWS region outage = complete outage
Multi-region: One AWS region outage = traffic shifts to other regions
Regional outages happen. In February 2017, an Amazon S3 outage in US-East-1 disrupted thousands of sites and services.
Compliance
Data residency requirements:
- GDPR (restricts transfers of EU personal data outside the EU)
- Data localization laws
- Industry regulations
Architecture Patterns
Active-Passive
One primary region, failover to secondary:
Normal:
Users → Primary Region (active)
Secondary Region (passive, replicated)
During outage:
Users → Secondary Region (promoted to active)
Pros:
- Simpler than active-active
- No write conflicts
- Clear ownership
Cons:
- A cold standby is rarely exercised, so it may fail in untested ways when promoted
- Failover time can be significant
- Secondary region underutilized
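The promotion decision in an active-passive setup can be sketched as a small state machine. This is a minimal illustration, assuming health checks are fed in by some external prober; `FailoverController` and `failureThreshold` are hypothetical names, and a real failover would also need to promote the database and update routing.

```javascript
// Decide which region should serve traffic in an active-passive setup.
// `failureThreshold` consecutive failed health checks trigger promotion.
class FailoverController {
  constructor(primary, secondary, failureThreshold = 3) {
    this.active = primary;
    this.standby = secondary;
    this.failureThreshold = failureThreshold;
    this.consecutiveFailures = 0;
  }

  // Feed in the result of each health check against the active region.
  // Returns the region that should serve traffic after this check.
  recordHealthCheck(healthy) {
    if (healthy) {
      this.consecutiveFailures = 0;
      return this.active;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) {
      // Promote the standby; the old primary becomes the new standby.
      [this.active, this.standby] = [this.standby, this.active];
      this.consecutiveFailures = 0;
    }
    return this.active;
  }
}
```

Requiring several consecutive failures trades a slower failover for fewer false promotions caused by a single dropped health check.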
Active-Active
Both regions serve traffic:
Users → Global Load Balancer
├── Region A (active, serves traffic)
└── Region B (active, serves traffic)
Pros:
- Better resource utilization
- Lower latency for all users
- No cold start on failover
Cons:
- Write conflicts possible
- More complex data sync
- Higher operational complexity
Follow-the-Sun
Active region follows business hours:
00:00-08:00 UTC: Asia-Pacific active
08:00-16:00 UTC: Europe active
16:00-24:00 UTC: Americas active
Useful for batch processing and off-hours maintenance.
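The schedule above maps directly to a lookup on the UTC hour. A minimal sketch, with hypothetical region names:

```javascript
// Map a point in time to the active region using the schedule above.
function activeRegion(date) {
  const hour = date.getUTCHours(); // 0-23
  if (hour < 8) return "asia-pacific";
  if (hour < 16) return "europe";
  return "americas";
}
```

Calling `activeRegion(new Date())` returns the region that should be active right now; using `getUTCHours` keeps the result independent of the server's local time zone.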
Data Replication
Synchronous Replication
Wait for all regions to acknowledge:
Write → Primary → Replicas → All ACK → Confirm to client
Latency: Primary write + Cross-region RTT + Replica write
Pros: Strong consistency
Cons: High latency, reduced availability (any region down blocks writes)
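The latency formula above extends to several replicas: if the primary contacts them in parallel, the slowest replica sets the commit time. The sketch below models that under assumed RTT and write figures; it is an estimate, not a measurement of any particular database.

```javascript
// Estimated commit latency when the primary waits for every replica to ACK.
// Replicas are contacted in parallel, so the slowest one sets the pace.
function syncWriteLatencyMs(primaryWriteMs, replicas) {
  const slowestReplica = Math.max(
    0,
    ...replicas.map((r) => r.rttMs + r.writeMs)
  );
  return primaryWriteMs + slowestReplica;
}

// Assumed figures, for illustration only.
const replicas = [
  { region: "eu-west", rttMs: 80, writeMs: 5 },
  { region: "ap-northeast", rttMs: 150, writeMs: 5 },
];
console.log(syncWriteLatencyMs(5, replicas)); // 160
```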
Asynchronous Replication
Confirm write, replicate in background:
Write → Primary → Confirm to client
→ Background replication to replicas
Pros: Low latency, high availability
Cons: Eventual consistency, possible data loss on failure
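The data-loss risk comes from writes that were confirmed but not yet shipped. The in-memory model below illustrates that window; `AsyncPrimary` is a hypothetical class for this sketch, not a real database client.

```javascript
// In-memory model of asynchronous replication.
class AsyncPrimary {
  constructor() {
    this.data = new Map();
    this.pending = []; // writes confirmed to clients but not yet replicated
  }

  // Confirm immediately; replication happens later.
  write(key, value) {
    this.data.set(key, value);
    this.pending.push([key, value]);
    return "confirmed";
  }

  // Background step: ship queued writes to a replica.
  replicateTo(replica) {
    for (const [key, value] of this.pending) replica.set(key, value);
    this.pending = [];
  }
}

const primary = new AsyncPrimary();
const replica = new Map();
primary.write("a", 1);
primary.replicateTo(replica);
primary.write("b", 2); // confirmed, but suppose the primary fails before replicating

// After failover, the replica has "a" but not "b".
console.log(replica.has("a"), replica.has("b")); // true false
```

Anything still in `pending` at the moment of failure is lost, which is why replication lag translates directly into potential data loss.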
Conflict Resolution
Active-active writes create conflicts:
Region A: UPDATE users SET name = 'Alice' WHERE id = 1;
Region B: UPDATE users SET name = 'Bob' WHERE id = 1;
Resolution strategies:
- Last-write-wins (LWW): Most recent timestamp wins
- Merge: Application-specific merging
- Conflict-free (CRDTs): Data structures that merge automatically
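Two of these strategies are small enough to sketch. The first function is a last-write-wins merge, which assumes region clocks are reasonably synchronized and silently discards the losing write. The second is a grow-only counter, one of the simplest CRDTs. All names here are illustrative.

```javascript
// Last-write-wins: keep the version with the newer timestamp.
// Ties are broken by region name so every region picks the same winner.
function mergeLww(a, b) {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.region > b.region ? a : b;
}

// Grow-only counter CRDT: each region increments only its own slot,
// and merging takes the per-region maximum.
function mergeGCounter(a, b) {
  const merged = { ...a };
  for (const [region, count] of Object.entries(b)) {
    merged[region] = Math.max(merged[region] ?? 0, count);
  }
  return merged;
}

function gCounterValue(counter) {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}

const fromA = { value: "Alice", timestamp: 1000, region: "A" };
const fromB = { value: "Bob", timestamp: 1005, region: "B" };
console.log(mergeLww(fromA, fromB).value); // Bob

const merged = mergeGCounter({ A: 3, B: 1 }, { A: 2, B: 4 });
console.log(gCounterValue(merged)); // 7
```

Both merges give the same result regardless of argument order, which is what lets each region apply them independently and still converge.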
Traffic Routing
DNS-Based (GeoDNS)
DNS query from EU → EU region IP
DNS query from US → US region IP
Pros: Simple, no infrastructure changes
Cons: TTL caching, not instant failover
Global Load Balancer
AWS Global Accelerator
GCP Global Load Balancer
Cloudflare Load Balancing
Pros: Faster failover, health checking, anycast
Cons: Additional cost and complexity
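Conceptually, a health-aware load balancer picks the closest region that is still passing health checks. The function below is a simplified model of that decision, not the API of any of the services listed; the `healthy` and `latencyMs` fields are assumed inputs.

```javascript
// Pick the healthy region with the lowest measured latency for a client.
// Returns null when no region is healthy.
function pickRegion(regions) {
  const healthy = regions.filter((r) => r.healthy);
  if (healthy.length === 0) return null;
  const best = healthy.reduce((a, b) => (b.latencyMs < a.latencyMs ? b : a));
  return best.name;
}
```

When the nearest region starts failing health checks, the next call simply returns the next-best region, which is the behavior that makes failover faster than waiting for DNS TTLs to expire.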
Application-Level
Client determines region:
const region = determineRegion(user.location);
const apiUrl = `https://${region}.api.example.com`;
Pros: Full control
Cons: Client complexity
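The snippet above leaves `determineRegion` undefined. One possible shape, assuming `user.location` carries a country code, is a lookup table with a default; the table contents and the `country` field are hypothetical.

```javascript
// Hypothetical mapping from a user's country code to an API region.
const REGION_BY_COUNTRY = new Map([
  ["US", "us-east"],
  ["CA", "us-east"],
  ["DE", "eu-west"],
  ["FR", "eu-west"],
  ["JP", "ap-northeast"],
]);
const DEFAULT_REGION = "us-east"; // assumed fallback for unknown locations

function determineRegion(location) {
  return REGION_BY_COUNTRY.get(location?.country) ?? DEFAULT_REGION;
}

const user = { location: { country: "JP" } };
const region = determineRegion(user.location);
const apiUrl = `https://${region}.api.example.com`;
console.log(apiUrl); // https://ap-northeast.api.example.com
```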
Database Strategies
Global Database Services
Amazon Aurora Global Database:
Primary (us-east-1): Read/Write
Secondary (eu-west-1): Read-only, typically <1s replication lag
CockroachDB:
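-- Pin each row to a region (the database must already be multi-region).
-- The column named in REGIONAL BY ROW AS must be crdb_internal_region and NOT NULL.
CREATE TABLE users (
  id UUID PRIMARY KEY,
  region crdb_internal_region NOT NULL
) LOCALITY REGIONAL BY ROW AS region;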
Spanner: Multi-region transactions with strong consistency.
Regional Databases with Sync
Region A: Primary for users A-M
Region B: Primary for users N-Z
Bidirectional replication for reads
Each user's data lives in its home region, and routing sends each request to the region that owns it.
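The A-M / N-Z split above can be expressed as a routing function. This is a sketch only: the region names and the fallback for names outside A-Z are assumptions, and real systems more often partition by a stored home-region attribute than by name.

```javascript
// Route a user to the region that is primary for their data.
function homeRegion(username) {
  const first = username.trim().charAt(0).toUpperCase();
  if (first >= "A" && first <= "M") return "region-a";
  if (first >= "N" && first <= "Z") return "region-b";
  return "region-a"; // assumed fallback for names that do not start with A-Z
}
```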
Caching Strategy
Region A Cache  ←  Invalidation  →  Region B Cache
      ↓                                   ↓
 Regional DB                         Regional DB
      ↓                                   ↓
   Primary       Replication ────────►
Cache invalidation across regions is hard. Either accept bounded staleness (for example, with short TTLs) or invest in explicit cross-region invalidation.
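Accepting staleness usually means putting a bound on it. A minimal sketch of a regional cache that expires entries after a TTL instead of relying on cross-region invalidation; `TtlCache` and its injectable clock are illustrative choices.

```javascript
// Regional cache that bounds staleness with a TTL.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined on a miss or an expired entry,
  // in which case the caller falls back to the regional DB.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

With this approach a reader in one region can see a value up to `ttlMs` older than the latest write elsewhere, plus whatever replication lag the regional database adds.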
Operational Considerations
Monitoring
Per-region metrics:
request_latency{region="us-east"}
error_rate{region="eu-west"}
replication_lag{source="us-east", target="eu-west"}
Cross-region visibility:
- Dashboard showing all regions simultaneously
- Alerts on regional discrepancies
- Replication lag monitoring
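A replication-lag alert reduces to comparing each source/target pair against a threshold. The sketch below assumes lag samples have already been collected into plain objects; the field names and the threshold are hypothetical.

```javascript
// Return the source->target links whose replication lag exceeds a threshold.
function laggingLinks(samples, maxLagSeconds) {
  return samples
    .filter((s) => s.lagSeconds > maxLagSeconds)
    .map((s) => `${s.source}->${s.target}`);
}

// Assumed sample data, for illustration only.
const samples = [
  { source: "us-east", target: "eu-west", lagSeconds: 0.4 },
  { source: "us-east", target: "ap-northeast", lagSeconds: 12 },
];
console.log(laggingLinks(samples, 5)); // [ 'us-east->ap-northeast' ]
```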
Failover Testing
Regular failover drills:
- Scheduled regional failovers
- Game days with simulated outages
- Automated failover testing
Deployment Strategy
Options:
- Simultaneous: Deploy to all regions at once
- Sequential: One region at a time
- Canary: Small traffic in one region first
Sequential is safer, because a bad release is caught before it reaches every region, but slower.
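A sequential rollout can be sketched as a loop with a health gate between regions. The `deploy` and `isHealthy` callbacks are hypothetical hooks the caller would supply; a real pipeline would also wait for a soak period and roll back the failed region.

```javascript
// Deploy region by region, stopping at the first unhealthy one.
function sequentialRollout(regions, deploy, isHealthy) {
  const completed = [];
  for (const region of regions) {
    deploy(region);
    if (!isHealthy(region)) {
      // Stop here so the remaining regions keep running the old version.
      return { completed, failedAt: region };
    }
    completed.push(region);
  }
  return { completed, failedAt: null };
}
```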
When Not to Go Multi-Region
Multi-region is expensive in both complexity and cost.
Skip multi-region if:
- Users are in one geography
- Availability requirements are met with multi-AZ
- Team is small
- Budget is constrained
Consider multi-region if:
- Users are globally distributed
- Availability requirements exceed single-region capability
- Compliance requires data residency
- Business impact of regional outage is severe
Cost Considerations
Multi-region costs:
- Duplicate infrastructure
- Cross-region data transfer (roughly $0.02-0.09/GB, depending on provider and route)
- Cross-region replication
- Operational complexity
Calculate ROI carefully.
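As a starting point for that calculation, the data-transfer line item can be estimated from replication volume and the per-GB range above. The 10 TB monthly volume is an assumed figure; substitute your own.

```javascript
// Monthly cross-region transfer cost for a given replication volume.
function monthlyTransferCostUsd(gbPerMonth, usdPerGb) {
  return gbPerMonth * usdPerGb;
}

const gbPerMonth = 10000; // assumed volume: 10 TB replicated per month
const low = monthlyTransferCostUsd(gbPerMonth, 0.02);
const high = monthlyTransferCostUsd(gbPerMonth, 0.09);
console.log(`$${low.toFixed(2)} to $${high.toFixed(2)} per month`);
// $200.00 to $900.00 per month
```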
Key Takeaways
- Multi-region improves latency, availability, and compliance but adds significant complexity
- Active-passive is simpler; active-active provides better utilization and latency
- Use asynchronous replication for availability and synchronous replication for consistency
- Conflict resolution is critical for active-active writes
- Use global load balancers for fast, health-aware failover
- Test failover regularly; don’t wait for real outages
- Multi-region isn’t always worth it; single-region multi-AZ may suffice
- Calculate costs including data transfer and operational overhead
Multi-region is a significant architectural investment. Make sure the benefits justify the complexity.