Database replication seems straightforward: keep copies of data on multiple servers. The reality is nuanced. Different replication patterns have different tradeoffs in consistency, availability, performance, and complexity.
Here’s how to think about database replication.
Why Replicate
High Availability
Primary fails → Replica promoted → Minimal downtime
┌─────────┐ ┌─────────┐
│ Primary │ ──WAL──►│ Replica │
│ (RW) │ │ (RO) │
└─────────┘ └─────────┘
│ │
▼ ▼
Fails Becomes Primary
Without replication, a single server failure means downtime. With replication, failover can happen in seconds.
Read Scaling
┌─────────────────────────────────────────┐
│ Load Balancer │
│ (routes reads to replicas) │
└─────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Primary │ │ Replica │ │ Replica │
│ (RW) │ │ (RO) │ │ (RO) │
└─────────┘ └─────────┘ └─────────┘
One server handles writes; multiple servers handle reads.
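A minimal sketch of the routing decision the load balancer (or the application's data layer) makes; the Node class and names here are placeholders, not a real driver API:
import itertools

class Node:
    """Stand-in for a database connection; illustrative only."""
    def __init__(self, name, read_only):
        self.name = name
        self.read_only = read_only

primary = Node("primary", read_only=False)
replicas = [Node("replica-1", read_only=True), Node("replica-2", read_only=True)]
read_pool = itertools.cycle(replicas)

def route(is_write: bool) -> Node:
    # Writes always go to the primary; reads rotate across the replicas
    return primary if is_write else next(read_pool)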
Geographic Distribution
US Region EU Region
┌─────────┐ ┌─────────┐
│ Primary │ ───WAL────► │ Replica │
│ (NYC) │ │ (FRA) │
└─────────┘ └─────────┘
Users read from nearby replicas for lower latency.
Replication Modes
Asynchronous Replication
Primary doesn’t wait for replicas to confirm:
Primary: COMMIT → Return success → Send to replica (background)
Characteristics:
- Fast commits
- No added write latency
- Possible data loss on failover
- Replication lag possible
# PostgreSQL async replication
# In postgresql.conf on the primary
wal_level = replica
max_wal_senders = 3
# In pg_hba.conf on the primary
host replication replicator replica_ip/32 md5
# On the replica: primary_conninfo goes in postgresql.conf plus an empty
# standby.signal file (PostgreSQL 12+); older versions use recovery.conf
primary_conninfo = 'host=primary_ip user=replicator'
When to use:
- Read scaling where slight staleness is acceptable
- Geographic replicas where latency makes sync impractical
- Backup replicas not used for failover
Synchronous Replication
Primary waits for at least one replica:
Primary: COMMIT → Wait for replica ACK → Return success
Characteristics:
- No data loss on failover
- Added write latency
- Replica failure can block writes
- Stronger consistency
# PostgreSQL sync replication
# In postgresql.conf on the primary
synchronous_commit = on
synchronous_standby_names = 'replica1'
When to use:
- Financial transactions
- Regulatory requirements
- When data loss is unacceptable
Semi-Synchronous
Hybrid approach:
# PostgreSQL: quorum-based
synchronous_standby_names = 'ANY 1 (replica1, replica2, replica3)'
# Commit succeeds once ANY 1 of the 3 listed replicas confirms
Better availability than full sync, better durability than async.
Replication Topologies
Primary-Replica (Master-Slave)
┌─────────┐
│ Primary │
│ (RW) │
└────┬────┘
┌────┴────┐
▼ ▼
┌─────────┐ ┌─────────┐
│ Replica │ │ Replica │
│ (RO) │ │ (RO) │
└─────────┘ └─────────┘
Simple and common. One writer, multiple readers.
Cascading Replication
┌─────────┐
│ Primary │
└────┬────┘
▼
┌─────────┐
│ Replica │ (also replicates)
└────┬────┘
▼
┌─────────┐
│ Replica │
└─────────┘
Reduces load on primary. Useful for many replicas or geographic distribution.
# PostgreSQL cascading replication
# The intermediate replica needs:
hot_standby = on
max_wal_senders = 3
# Downstream replicas point primary_conninfo at the intermediate replica
Multi-Primary (Multi-Master)
┌─────────┐ ┌─────────┐
│ Primary │◄───►│ Primary │
│ US │ │ EU │
└─────────┘ └─────────┘
Both nodes accept writes. Conflict resolution required.
Challenges:
- Conflict resolution complexity
- Eventual consistency
- Split-brain prevention
Technologies:
- MySQL Group Replication
- PostgreSQL BDR (Bi-Directional Replication)
- CockroachDB, TiDB (distributed SQL)
Ring Replication
┌─────────┐
│ Node 1 │
└────┬────┘
│
┌────▼────┐
│ Node 2 │
└────┬────┘
│
┌────▼────┐
│ Node 3 │───► Node 1
└─────────┘
Each node replicates to the next. Simple but fragile (one failure breaks the ring).
Conflict Resolution
For multi-primary setups:
Last-Writer-Wins (LWW)
-- Concurrent updates
Node A: UPDATE users SET email='a@new.com' WHERE id=1; -- ts: 10:00:01
Node B: UPDATE users SET email='b@new.com' WHERE id=1; -- ts: 10:00:02
-- Node B's change wins (later timestamp)
Simple but can lose data.
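A sketch of the idea in code, assuming each replica stores a (value, timestamp) pair per field and merges by keeping the later timestamp; the class is illustrative, not any particular database's implementation:
class LWWRegister:
    """Last-writer-wins register: the write with the later timestamp survives."""
    def __init__(self):
        self.value = None
        self.ts = 0.0

    def set(self, value, ts):
        if ts >= self.ts:
            self.value, self.ts = value, ts

    def merge(self, other):
        # The concurrent write with the later timestamp silently wins
        if other.ts > self.ts:
            self.value, self.ts = other.value, other.ts

# Two nodes update the same field concurrently, then sync
a, b = LWWRegister(), LWWRegister()
a.set("a@new.com", ts=1.0)
b.set("b@new.com", ts=2.0)
a.merge(b)
b.merge(a)
assert a.value == b.value == "b@new.com"   # node A's write is lost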
Merge/Custom Resolution
-- Application-defined merge (the accounts table here is illustrative)
CREATE OR REPLACE FUNCTION resolve_conflict(old_row accounts, new_row accounts)
RETURNS accounts AS $$
BEGIN
    -- Merge strategy: keep the higher balance
    IF new_row.balance > old_row.balance THEN
        RETURN new_row;
    ELSE
        RETURN old_row;
    END IF;
END;
$$ LANGUAGE plpgsql;
CRDT-Based
Conflict-free Replicated Data Types:
# G-Counter (grow-only counter)
# Each node maintains its own count
class GCounter:
    def __init__(self, node_id):
        self.counts = {}
        self.node_id = node_id

    def increment(self):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
No conflicts by design, but limited data types.
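For example, two nodes can increment independently and exchange state in any order; after merging, both converge on the same total (using the GCounter class above):
a = GCounter("node-a")
b = GCounter("node-b")

a.increment()
a.increment()        # node A has counted 2
b.increment()        # node B has counted 1

a.merge(b)           # merge is commutative and idempotent
b.merge(a)

assert a.value() == b.value() == 3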
Failover Strategies
Manual Failover
# 1. Verify the primary is really down (avoid split brain)
# 2. Fence the old primary so it cannot accept writes
# 3. Promote the replica (run on the replica while it is running)
pg_ctl promote -D /var/lib/postgresql/data
# 4. Update application connection strings
# 5. Rebuild the old primary as a replica (pg_rewind or a fresh base backup)
Safe but slow (minutes to hours).
Automated Failover
# Patroni configuration
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      parameters:
        synchronous_commit: "on"
Tools: Patroni, repmgr, pg_auto_failover (PostgreSQL); MHA, Orchestrator (MySQL). These are typically paired with HAProxy or PgBouncer so clients follow the new primary.
Faster (seconds) but requires careful configuration.
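To see why maximum_lag_on_failover matters, here is a rough sketch (not Patroni's actual code) of the decision an automated failover manager makes before promoting a candidate; replica objects with a replayed_lsn attribute are an assumption for illustration:
MAX_LAG_BYTES = 1_048_576   # mirrors maximum_lag_on_failover above

def pick_failover_candidate(replicas, last_known_primary_lsn):
    """Promote the most caught-up replica, but only if its lag is acceptable."""
    best = max(replicas, key=lambda r: r.replayed_lsn, default=None)
    if best is None:
        return None
    lag_bytes = last_known_primary_lsn - best.replayed_lsn
    if lag_bytes > MAX_LAG_BYTES:
        return None   # too far behind: refuse automatic promotion, page an operator
    return best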
Proxy-Based
┌─────────────────────────────────────────┐
│ PgBouncer │
│ (routes clients to the current primary) │
└─────────────────────────────────────────┘
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│ Primary │ │ Replica │
└─────────┘ └─────────┘
Application connects to proxy; proxy handles routing.
Read-Your-Writes Consistency
With async replication, users might not see their own writes:
User writes → Primary
User reads → Replica (hasn't received write yet)
Solutions:
Session Affinity
Route user to same replica:
def get_connection(user_id, is_write):
    if is_write:
        return primary_connection
    else:
        # Hash the user id so the same user always reads from the same replica
        replica = replicas[hash(user_id) % len(replicas)]
        return replica
Read-from-Primary After Write
def get_connection(user_id, is_write):
    if is_write:
        cache.set(f"recent_write:{user_id}", True, ttl=5)
        return primary_connection
    if cache.get(f"recent_write:{user_id}"):
        return primary_connection  # Read from primary briefly
    return random.choice(replicas)
GTID Tracking
def write_data(data):
    result = primary.execute("INSERT ... RETURNING *")
    gtid = primary.get_gtid()
    return result, gtid

def read_data(user_id, min_gtid=None):
    if min_gtid:
        # Wait for replica to catch up
        replica.wait_for_gtid(min_gtid, timeout=1)
    return replica.execute("SELECT ...")
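In PostgreSQL the equivalent uses WAL positions (LSNs) rather than GTIDs. A rough sketch using psycopg2, assuming primary_conn and replica_conn are open connections; pg_current_wal_lsn(), pg_last_wal_replay_lsn(), and pg_wal_lsn_diff() are the built-in functions involved:
import time

def write_and_get_lsn(primary_conn, sql, params):
    # Run the write on the primary and capture the WAL position it produced
    with primary_conn.cursor() as cur:
        cur.execute(sql, params)
        cur.execute("SELECT pg_current_wal_lsn()")
        return cur.fetchone()[0]

def wait_for_lsn(replica_conn, target_lsn, timeout=1.0):
    # Poll until the replica has replayed past target_lsn, or give up
    deadline = time.monotonic() + timeout
    with replica_conn.cursor() as cur:
        while time.monotonic() < deadline:
            cur.execute(
                "SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), %s::pg_lsn) >= 0",
                (target_lsn,),
            )
            if cur.fetchone()[0]:
                return True
            time.sleep(0.05)
    return False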
Monitoring Replication
Key Metrics
-- PostgreSQL replication lag (run on the primary)
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS byte_lag
FROM pg_stat_replication;
-- Lag in seconds, approximate (run on the replica)
SELECT
    EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
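The lag query is easy to wrap in a small check that feeds whatever alerting pipeline you use; a sketch with psycopg2, using the same thresholds as the alert rules below:
import psycopg2

WARN_SECONDS, CRIT_SECONDS = 30, 300   # match the alert thresholds below

def check_replica_lag(replica_dsn):
    # Run against the replica; returns (lag_seconds, severity)
    with psycopg2.connect(replica_dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))"
        )
        lag = cur.fetchone()[0]
    if lag is None:
        return None, "unknown"   # not a standby, or nothing replayed yet
    if lag > CRIT_SECONDS:
        return lag, "critical"
    if lag > WARN_SECONDS:
        return lag, "warning"
    return lag, "ok"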
Alerts
alerts:
  - name: ReplicationLagHigh
    condition: replication_lag_seconds > 30
    severity: warning
  - name: ReplicationLagCritical
    condition: replication_lag_seconds > 300
    severity: critical
  - name: ReplicaDisconnected
    condition: replica_state != 'streaming'
    severity: critical
Key Takeaways
- Async replication is faster but risks data loss; sync is safer but slower
- Primary-replica is simple and sufficient for most read scaling
- Multi-primary adds complexity; use only when geographically necessary
- Automated failover requires careful configuration and testing
- Read-your-writes consistency needs explicit handling with async replication
- Monitor replication lag continuously; stale reads cause subtle bugs
- Choose replication mode based on durability requirements, not convenience
- Test failover regularly; untested failover often fails
Replication isn’t free. Understand the tradeoffs and choose patterns that match your actual requirements.