Database performance problems trigger panic. Applications slow to a crawl. Users complain. Engineers start trying random optimizations: add an index here, bump the buffer pool there, rewrite that query.
Random optimization wastes time. You might improve something irrelevant while ignoring the actual bottleneck. A systematic approach identifies what’s actually slow, why it’s slow, and how to fix it efficiently.
The Systematic Approach
Step 1: Define the Problem
Before optimizing, define what “slow” means:
- Which queries are slow?
- How slow is slow? (What’s the baseline? What’s acceptable?)
- When does slowness occur? (Always? Under load? At specific times?)
- What changed? (Recent deployments, data growth, traffic increase)
“The database is slow” isn’t a problem definition. “The user search query exceeds 500ms under peak load, started after last week’s deployment” is.
Step 2: Measure, Don’t Guess
Enable query logging and performance monitoring:
PostgreSQL:
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = '100ms';
SELECT pg_reload_conf();
MySQL:
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;
Identify the actual slow queries. The query you think is slow might not be the actual problem.
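With pg_stat_statements installed and preloaded, finding the actual worst offenders is one query. A minimal sketch using the PostgreSQL 13+ column names (older versions call them total_time and mean_time):
-- Top 10 queries by total execution time
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1)  AS mean_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;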
Step 3: Analyze Query Execution
Use EXPLAIN to understand how queries execute:
PostgreSQL:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM users WHERE email = 'user@example.com';
-- Output shows:
-- Seq Scan vs Index Scan
-- Rows estimated vs actual
-- Time per operation
-- Buffer usage
MySQL:
-- EXPLAIN ANALYZE requires MySQL 8.0.18 or later
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';
EXPLAIN reveals:
- Whether indexes are used
- Join strategies
- Where time is spent
- Planner estimation accuracy
Step 4: Identify the Bottleneck
Common bottlenecks:
- Missing indexes: Sequential scans on large tables.
- Poor query structure: Inefficient joins, unnecessary subqueries.
- Lock contention: Queries waiting for locks (see the checks after this list).
- I/O saturation: Disk can’t keep up.
- Memory pressure: Buffer cache misses causing disk reads.
- Network: Remote database or large result sets.
Identify which bottleneck applies before optimizing.
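Two of these, lock contention and missing indexes, can be checked directly from PostgreSQL’s statistics views. A minimal sketch, assuming PostgreSQL 9.6+ for the wait_event columns:
-- Sessions currently waiting on locks, and what they are running
SELECT pid, wait_event_type, wait_event, state, left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
-- Tables with heavy sequential scan activity (missing-index suspects)
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 10;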
Step 5: Apply Targeted Fixes
Fix the specific bottleneck identified. Measure improvement. Repeat.
Common Optimizations
Index Optimization
When to add indexes:
- Columns in WHERE clauses
- Columns in JOIN conditions
- Columns in ORDER BY
Multi-column indexes:
Order matters. Index on (a, b) helps queries filtering on a or a AND b, but not b alone.
-- For queries like: WHERE status = 'active' AND created_at > '2017-01-01'
CREATE INDEX idx_users_status_created ON users(status, created_at);
Partial indexes (PostgreSQL): Index only rows matching a condition:
-- Only index active users
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
Covering indexes: Include all columns the query needs so the table itself is never read (PostgreSQL 11+ INCLUDE syntax):
-- Query only needs id, name, email
CREATE INDEX idx_users_covering ON users(email) INCLUDE (id, name);
Query Optimization
Avoid SELECT *: Fetch only the columns you need. Reduces I/O and memory.
Use LIMIT: If you only need the first N rows, say so. Databases can optimize.
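For example, a recent-users query applying both rules (table and columns are illustrative):
-- Name the columns, bound the result set
SELECT id, name, email
FROM users
ORDER BY created_at DESC
LIMIT 20;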
Optimize subqueries: Often rewritable as JOINs for better performance:
-- Instead of:
SELECT * FROM orders WHERE user_id IN (SELECT id FROM users WHERE status = 'vip');
-- Use:
SELECT o.* FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.status = 'vip';
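An EXISTS semi-join is a third equivalent form worth testing; unlike the JOIN, it cannot duplicate rows from orders if the join key ever stops being unique:
-- Or as a semi-join:
SELECT * FROM orders o
WHERE EXISTS (
  SELECT 1 FROM users u
  WHERE u.id = o.user_id AND u.status = 'vip'
);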
Use appropriate data types: Smaller types = more data fits in memory = faster.
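A hypothetical illustration: compact numeric codes and native timestamps instead of defaulting everything to bigint and text:
-- smallint (2 bytes) instead of a text label; timestamptz instead of a string
CREATE TABLE events (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    type_code   smallint NOT NULL,
    occurred_at timestamptz NOT NULL
);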
Connection Management
Connection overhead adds up. Solutions:
Connection pooling: Reuse connections instead of creating new ones. Use PgBouncer for PostgreSQL.
Reduce connection count: Configure application pools appropriately. More connections isn’t always better.
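To check whether connections are actually the problem, count sessions by state in PostgreSQL; a pile of idle-in-transaction sessions is a common culprit:
-- How many connections exist, and what are they doing?
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;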
Configuration Tuning
Database defaults are conservative. Production workloads benefit from tuning:
PostgreSQL:
# Memory (example values assuming a dedicated 32GB host)
shared_buffers = 8GB              # ~25% of RAM
effective_cache_size = 24GB       # ~75% of RAM
work_mem = 50MB                   # per sort/hash operation, be careful
# Checkpoints
checkpoint_completion_target = 0.9
# Parallelism
max_parallel_workers_per_gather = 4
MySQL:
# InnoDB buffer pool (example value assuming a dedicated 32GB host)
innodb_buffer_pool_size = 22G     # ~70% of RAM
# Query cache: MySQL 5.7 and earlier only; removed entirely in MySQL 8.0
query_cache_size = 128M
Important: Change one thing at a time. Measure impact. Configuration tuning without measurement is guessing.
Caching
Not every query needs to hit the database:
Application caching: Cache frequently-read, rarely-changed data in Redis or Memcached.
Query result caching: Cache expensive query results with appropriate invalidation.
Materialized views: Pre-computed query results stored in the database (PostgreSQL):
CREATE MATERIALIZED VIEW user_stats AS
SELECT user_id, COUNT(*) as order_count, SUM(total) as total_spent
FROM orders GROUP BY user_id;
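-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX user_stats_user_id_idx ON user_stats (user_id);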
-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
Monitoring for Prevention
Key Metrics
Query performance:
- Query duration percentiles (p50, p95, p99)
- Slow query rate
- Query throughput
Resource utilization:
- CPU usage
- Memory usage
- Disk I/O
- Connection count
Database-specific:
- Buffer cache hit ratio (see the query after this list)
- Lock wait time
- Replication lag
- Checkpoint frequency
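The buffer cache hit ratio is one query away in PostgreSQL. A minimal sketch; as a rough rule of thumb, a hot OLTP database should sit near 0.99:
-- Buffer cache hit ratio for the current database
SELECT blks_hit::float / nullif(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();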
Alerting
Alert on:
- Query duration exceeding threshold
- Connection pool exhaustion
- Disk space running low
- Replication lag increasing
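On a PostgreSQL standby, replication lag can be measured directly (returns NULL on a primary):
-- Time since the last replayed transaction on a standby
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;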
Regular Review
Periodically review:
- Slow query logs
- Index usage statistics (see the query after this list)
- Table sizes and growth
- Query patterns
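For index usage, pg_stat_user_indexes shows indexes that have never been scanned. Two caveats: counters reset with pg_stat_reset(), and an index unused on the primary may still serve queries on a replica:
-- Indexes never scanned since the last stats reset (removal candidates)
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;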
Proactive review catches problems before they become incidents.
Anti-Patterns
Premature optimization: Don’t optimize until you have a problem. Measure first.
Index everything: Indexes have costs (storage, write performance). Add indexes for specific query patterns.
Ignoring the ORM: ORMs generate SQL. Understand what SQL your ORM produces.
Optimizing in production: Test optimizations in staging with production-like data first.
Single query focus: Sometimes the problem is 1000 slightly-slow queries, not one very-slow query.
Key Takeaways
- Define the specific performance problem before optimizing
- Measure actual slow queries; don’t guess
- Use EXPLAIN to understand query execution plans
- Identify the specific bottleneck before applying fixes
- Index strategically based on query patterns
- Optimize queries before adding hardware
- Monitor continuously to catch problems early
- Change one thing at a time and measure impact