Scaling PostgreSQL: Replication, Sharding, and Beyond

November 14, 2016

PostgreSQL handles impressive workloads on a single server. Proper indexing, query optimization, and adequate hardware carry most applications far beyond their initial scale projections. But eventually, growth hits limits: CPU can’t process more queries, disk can’t handle more writes, or data exceeds practical storage limits.

Scaling PostgreSQL requires understanding the bottleneck and choosing appropriate strategies. This isn’t one-size-fits-all; different problems require different solutions.

Understanding the Bottleneck

Before choosing a scaling strategy, identify what’s actually limiting you:

Read capacity: Too many queries saturating CPU. Response times increasing under load. Solution: read replicas.

Write capacity: Transaction log writes can’t keep up. Write operations queuing. Solution: write optimization, partitioning, eventual sharding.

Storage: Data exceeds disk capacity or performance degrades with size. Solution: partitioning, archival, sharding.

Connection count: Too many concurrent connections. Solution: connection pooling.
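
One quick check while diagnosing: pg_stat_activity shows what current connections are doing, which helps separate connection pressure from genuinely slow queries.

-- How many connections are active, idle, or idle in transaction
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;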

Don’t assume you need sharding because “that’s what big companies do.” Most applications never need sharding; simpler approaches solve their problems.

Connection Pooling

PostgreSQL creates a process for each connection. Many connections consume significant memory; connection storms can overwhelm the server.

Connection pooling solves this by maintaining a pool of database connections and multiplexing application connections onto them.

PgBouncer

PgBouncer is the standard PostgreSQL connection pooler:

[databases]
myapp = host=127.0.0.1 dbname=myapp

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

Pool modes:

Session: a server connection is tied to a client for the life of the client connection. Safest, but allows the least sharing.

Transaction: a server connection is assigned for the duration of a transaction and returned to the pool afterward. The usual choice, though session state (prepared statements, SET, advisory locks) does not carry over between transactions.

Statement: a server connection is assigned per statement; multi-statement transactions are not allowed. Rarely appropriate.

Connection pooling is almost always worth implementing. It’s low-risk and solves a common bottleneck.
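
Switching the application over is usually just a connection change: point it at PgBouncer's port instead of PostgreSQL's. A rough psycopg2 example (host, credentials, and database name are placeholders):

import psycopg2

# Connect through PgBouncer (6432 here) rather than PostgreSQL directly;
# application code is otherwise unchanged.
conn = psycopg2.connect(host="127.0.0.1", port=6432,
                        dbname="myapp", user="app", password="secret")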

Read Replicas

When read queries saturate your primary server, add read replicas. Replicas receive all writes from the primary and can serve read queries independently.

Streaming Replication

PostgreSQL streaming replication is straightforward to configure:

Primary (postgresql.conf):

wal_level = replica
max_wal_senders = 5
wal_keep_segments = 32

Replica setup:

pg_basebackup -h primary-host -D /var/lib/postgresql/data -P -R

The -R flag creates a recovery.conf that connects to the primary for streaming. On the 9.x releases, also set hot_standby = on in the replica's postgresql.conf so it can serve read queries while replaying WAL.
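
Once the replica is running, confirm it is attached and streaming; on the primary, pg_stat_replication shows one row per connected replica (column names as in the 9.x releases):

-- Run on the primary
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;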

Application Integration

Applications must route queries appropriately: writes must go to the primary, while reads can be served by replicas.

This requires application changes or middleware. Options:

Application-level routing: Application code explicitly routes reads to replicas (see the sketch after this list). Gives precise control but requires code changes.

Connection middleware: Tools like PgPool-II route queries based on content. More transparent but adds complexity.

Load balancer with read/write awareness: Some load balancers can parse queries and route accordingly.
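
As a sketch of the application-level approach (the connection strings and helper names here are hypothetical), a simple read/write split might look like:

import random

import psycopg2

# Hypothetical connection strings for the primary and its replicas
PRIMARY_DSN = "host=primary-host dbname=myapp"
REPLICA_DSNS = ["host=replica-1 dbname=myapp", "host=replica-2 dbname=myapp"]

def get_write_connection():
    # Every write goes to the primary
    return psycopg2.connect(PRIMARY_DSN)

def get_read_connection():
    # Reads are spread across the replicas
    return psycopg2.connect(random.choice(REPLICA_DSNS))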

Replication Lag

Replicas lag behind the primary: writes take time to propagate. This creates consistency challenges; a user who writes data and immediately reads it back may hit a replica that has not yet replayed the write.

Strategies for lag:

Read-your-writes: After writes, read from primary (or verify replication lag is acceptable).

Causal consistency: Track which replica has seen a particular write; read from that replica.

Accept eventual consistency: For many use cases (dashboards, analytics), slightly stale data is fine.

Monitor replication lag and alert when it exceeds acceptable thresholds.
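
Lag can be measured from either side; on the 9.x releases the relevant functions still carry xlog in their names:

-- On the primary: bytes of WAL each replica has yet to replay
SELECT client_addr,
       pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS lag_bytes
FROM pg_stat_replication;

-- On a replica: time since the last replayed transaction
-- (overestimates lag when the primary is idle)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;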

Partitioning

Partitioning divides a large table into smaller pieces. Queries that touch only one partition access less data; maintenance operations can work on individual partitions.

Table Inheritance (Pre-PostgreSQL 10)

Traditional PostgreSQL partitioning uses table inheritance:

-- Parent table
CREATE TABLE events (
    id SERIAL,
    event_time TIMESTAMP NOT NULL,
    data JSONB
);

-- Child tables for each month
CREATE TABLE events_2016_01 (
    CHECK (event_time >= '2016-01-01' AND event_time < '2016-02-01')
) INHERITS (events);

CREATE TABLE events_2016_02 (
    CHECK (event_time >= '2016-02-01' AND event_time < '2016-03-01')
) INHERITS (events);

-- Index child tables
CREATE INDEX ON events_2016_01 (event_time);
CREATE INDEX ON events_2016_02 (event_time);

This approach requires triggers for insert routing and manual partition management.
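
A minimal version of that routing trigger, matching the child tables above:

CREATE OR REPLACE FUNCTION events_insert_trigger() RETURNS TRIGGER AS $$
BEGIN
    IF NEW.event_time >= '2016-01-01' AND NEW.event_time < '2016-02-01' THEN
        INSERT INTO events_2016_01 VALUES (NEW.*);
    ELSIF NEW.event_time >= '2016-02-01' AND NEW.event_time < '2016-03-01' THEN
        INSERT INTO events_2016_02 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'No partition for event_time %', NEW.event_time;
    END IF;
    RETURN NULL;  -- the row lands in the child table, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert_route
    BEFORE INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();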

Declarative Partitioning (PostgreSQL 10+)

PostgreSQL 10, currently in development, introduces declarative partitioning:

CREATE TABLE events (
    id SERIAL,
    event_time TIMESTAMP NOT NULL,
    data JSONB
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2016_01 PARTITION OF events
    FOR VALUES FROM ('2016-01-01') TO ('2016-02-01');

CREATE TABLE events_2016_02 PARTITION OF events
    FOR VALUES FROM ('2016-02-01') TO ('2016-03-01');

Declarative partitioning is cleaner and more efficient: the server routes rows to the correct partition itself, so no routing triggers or hand-written CHECK constraints are needed.
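
For example, a row is stored in the right partition without any trigger involved:

-- Routed to events_2016_01 automatically
INSERT INTO events (event_time, data)
VALUES ('2016-01-15 09:30:00', '{"type": "signup"}');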

Partitioning Strategies

Range partitioning: Divide by value ranges (dates, IDs). Good for time-series data.

List partitioning: Divide by discrete values (regions, categories). Good for known, stable categories; see the example below.

Hash partitioning: Divide by hash of a column. Distributes data evenly when no natural partitioning key exists.
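
As an illustration of list partitioning in the declarative syntax (the orders table and its regions are made up for the example):

CREATE TABLE orders (
    id BIGINT,
    region TEXT NOT NULL,
    total NUMERIC
) PARTITION BY LIST (region);

CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('us-east', 'us-west');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('eu-west', 'eu-central');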

Benefits

Queries that filter on the partition key touch only the relevant partitions. Old data can be removed by dropping a partition instead of running a slow DELETE. Indexes and maintenance operations work against smaller, per-partition objects.

Limitations

Queries that do not filter on the partition key still scan every partition. Unique constraints cannot span partitions, and partitions must be created and managed over time as new ranges arrive.

Sharding

Sharding distributes data across multiple independent database servers. Unlike replication (all servers have all data), sharding divides data (each server has a subset).

When Sharding Makes Sense

Sharding is the most complex scaling approach. Consider it only when:

Write volume exceeds what a single primary can handle, even after optimization and partitioning.

The data set is too large for one server to store or serve with acceptable performance.

Simpler options (pooling, read replicas, partitioning, better hardware) have already been exhausted.

Most applications never need sharding. The complexity cost is significant.

Sharding Approaches

Application-level sharding: Application code determines which shard holds each piece of data and routes queries accordingly.

NUM_SHARDS = 4  # fixed shard count; changing it later means moving data

def get_shard(user_id):
    # The shard key (user_id) determines which database holds the row
    return "shard_%d" % (user_id % NUM_SHARDS)

def get_user(user_id):
    # get_connection() is the application's own per-shard connection lookup
    conn = get_connection(get_shard(user_id))
    return conn.execute("SELECT * FROM users WHERE id = %s", (user_id,))

Middleware sharding: A proxy layer routes queries to the appropriate shards. Tools like PL/Proxy provide this for PostgreSQL; Vitess plays the same role for MySQL.

Database-level sharding: Some databases handle sharding internally. Citus extends PostgreSQL with distributed table capability.

Sharding Challenges

Schema changes: Must apply to all shards consistently.

Cross-shard queries: Queries spanning multiple shards require aggregation from multiple sources.

Rebalancing: Adding shards requires redistributing data.

Transactions: Distributed transactions across shards are complex or impossible.

Operational complexity: Multiple databases to monitor, backup, upgrade.

Shard Key Selection

The shard key determines data distribution. Choose carefully:

Good shard keys: Provide even distribution and match query patterns. If most queries filter by user_id, shard by user_id.

Bad shard keys: Create hot spots (one shard gets disproportionate traffic) or require cross-shard queries for common operations.

Citus: PostgreSQL Sharding Extension

Citus extends PostgreSQL with distributed table capability. Tables can be distributed across multiple workers while queries use standard PostgreSQL syntax.

-- On coordinator
CREATE TABLE events (
    tenant_id INT,
    event_time TIMESTAMP,
    data JSONB
);

-- Distribute by tenant_id
SELECT create_distributed_table('events', 'tenant_id');

-- Queries work transparently
SELECT * FROM events WHERE tenant_id = 123;

Citus handles routing queries to appropriate shards and aggregating results. It’s particularly effective for multi-tenant applications where tenant_id is a natural shard key.
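
Cross-shard queries work too; a simple aggregate like the one below is executed on the workers and the partial results combined on the coordinator:

-- Fans out to every shard, then merges the per-shard counts
SELECT tenant_id, count(*)
FROM events
GROUP BY tenant_id
ORDER BY count(*) DESC
LIMIT 10;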

Scaling Strategy Progression

A typical scaling progression:

  1. Single server optimization: Proper indexing, query optimization, adequate hardware. This handles more than you expect.

  2. Connection pooling: Low-effort, high-reward. Implement early.

  3. Read replicas: When reads saturate the primary. Moderate application changes.

  4. Partitioning: When table size causes performance issues. Good for time-series data.

  5. Sharding: When nothing else suffices. Significant architecture change.

At each stage, measure whether the problem is actually solved before moving to more complex solutions.

Key Takeaways