Database performance problems trigger panic. Applications slow to a crawl. Users complain. Engineers start trying random optimizations: add an index here, bump the buffer pool there, rewrite that query.
Random optimization wastes time. You might improve something irrelevant while ignoring the actual bottleneck. A systematic approach identifies what’s actually slow, why it’s slow, and how to fix it efficiently.
The Systematic Approach
Step 1: Define the Problem
Before optimizing, define what “slow” means:
- Which queries are slow?
- How slow is slow? (What’s the baseline? What’s acceptable?)
- When does slowness occur? (Always? Under load? At specific times?)
- What changed? (Recent deployments, data growth, traffic increase)
“The database is slow” isn’t a problem definition. “The user search query exceeds 500ms under peak load, started after last week’s deployment” is.
Step 2: Measure, Don’t Guess
Enable query logging and performance monitoring:
PostgreSQL:
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = '100ms';
SELECT pg_reload_conf();
MySQL:
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;
Identify the actual slow queries. The query you think is slow might not be the actual problem.
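With pg_stat_statements installed and preloaded, finding the actual worst offenders is one query. A minimal sketch using the PostgreSQL 13+ column names (older versions call them total_time and mean_time):
-- Top 10 queries by total execution time
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1)  AS mean_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;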
Step 3: Analyze Query Execution
Use EXPLAIN to understand how queries execute:
PostgreSQL:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM users WHERE email = 'user@example.com';
-- Output shows:
-- Seq Scan vs Index Scan
-- Rows estimated vs actual
-- Time per operation
-- Buffer usage
MySQL:
-- EXPLAIN ANALYZE requires MySQL 8.0.18 or later
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'user@example.com';
EXPLAIN reveals:
- Whether indexes are used
- Join strategies
- Where time is spent
- Planner estimation accuracy
Step 4: Identify the Bottleneck
Common bottlenecks:
- Missing indexes: Sequential scans on large tables.
- Poor query structure: Inefficient joins, unnecessary subqueries.
- Lock contention: Queries waiting for locks (see the checks after this list).
- I/O saturation: Disk can’t keep up.
- Memory pressure: Buffer cache misses causing disk reads.
- Network: Remote database or large result sets.
Identify which bottleneck applies before optimizing.
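Two of these, lock contention and missing indexes, can be checked directly from PostgreSQL’s statistics views. A minimal sketch, assuming PostgreSQL 9.6+ for the wait_event columns:
-- Sessions currently waiting on locks, and what they are running
SELECT pid, wait_event_type, wait_event, state, left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
-- Tables with heavy sequential scan activity (missing-index suspects)
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 10;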
Step 5: Apply Targeted Fixes
Fix the specific bottleneck identified. Measure improvement. Repeat.
Common Optimizations
Index Optimization
When to add indexes:
- Columns in WHERE clauses
- Columns in JOIN conditions
- Columns in ORDER BY
Multi-column indexes:
Order matters. Index on (a, b) helps queries filtering on a or a AND b, but not b alone.
-- For queries like: WHERE status = 'active' AND created_at > '2017-01-01'
CREATE INDEX idx_users_status_created ON users(status, created_at);
Partial indexes (PostgreSQL): Index only rows matching a condition:
-- Only index active users
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
Covering indexes: Include all columns the query needs so the table itself is never read (PostgreSQL 11+ INCLUDE syntax):
-- Query only needs id, name, email
CREATE INDEX idx_users_covering ON users(email) INCLUDE (id, name);
Query Optimization
Avoid SELECT *: Fetch only the columns you need. Reduces I/O and memory.
Use LIMIT: If you only need the first N rows, say so. Databases can optimize.
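For example, a recent-users query applying both rules (table and columns are illustrative):
-- Name the columns, bound the result set
SELECT id, name, email
FROM users
ORDER BY created_at DESC
LIMIT 20;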
Optimize subqueries: Often rewritable as JOINs for better performance:
-- Instead of:
SELECT * FROM orders WHERE user_id IN (SELECT id FROM users WHERE status = 'vip');
-- Use:
SELECT o.* FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.status = 'vip';
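An EXISTS semi-join is a third equivalent form worth testing; unlike the JOIN, it cannot duplicate rows from orders if the join key ever stops being unique:
-- Or as a semi-join:
SELECT * FROM orders o
WHERE EXISTS (
  SELECT 1 FROM users u
  WHERE u.id = o.user_id AND u.status = 'vip'
);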
Use appropriate data types: Smaller types = more data fits in memory = faster.
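A hypothetical illustration: compact numeric codes and native timestamps instead of defaulting everything to bigint and text:
-- smallint (2 bytes) instead of a text label; timestamptz instead of a string
CREATE TABLE events (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    type_code   smallint NOT NULL,
    occurred_at timestamptz NOT NULL
);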
Connection Management
Connection overhead adds up. Solutions:
Connection pooling: Reuse connections instead of creating new ones. Use PgBouncer for PostgreSQL.
Reduce connection count: Configure application pools appropriately. More connections isn’t always better.
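To check whether connections are actually the problem, count sessions by state in PostgreSQL; a pile of idle-in-transaction sessions is a common culprit:
-- How many connections exist, and what are they doing?
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;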
Configuration Tuning
Database defaults are conservative. Production workloads benefit from tuning:
PostgreSQL:
# Memory (example values assuming a dedicated 32GB host)
shared_buffers = 8GB              # ~25% of RAM
effective_cache_size = 24GB       # ~75% of RAM
work_mem = 50MB                   # per sort/hash operation, be careful
# Checkpoints
checkpoint_completion_target = 0.9
# Parallelism
max_parallel_workers_per_gather = 4
MySQL:
# InnoDB buffer pool (example value assuming a dedicated 32GB host)
innodb_buffer_pool_size = 22G     # ~70% of RAM
# Query cache: MySQL 5.7 and earlier only; removed entirely in MySQL 8.0
query_cache_size = 128M
Important: Change one thing at a time. Measure impact. Configuration tuning without measurement is guessing.
Caching
Not every query needs to hit the database:
Application caching: Cache frequently-read, rarely-changed data in Redis or Memcached.
Query result caching: Cache expensive query results with appropriate invalidation.
Materialized views: Pre-computed query results stored in the database (PostgreSQL):
CREATE MATERIALIZED VIEW user_stats AS
SELECT user_id, COUNT(*) as order_count, SUM(total) as total_spent
FROM orders GROUP BY user_id;
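-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX user_stats_user_id_idx ON user_stats (user_id);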
-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;
Monitoring for Prevention
Key Metrics
Query performance:
- Query duration percentiles (p50, p95, p99)
- Slow query rate
- Query throughput
Resource utilization:
- CPU usage
- Memory usage
- Disk I/O
- Connection count
Database-specific:
- Buffer cache hit ratio (see the query after this list)
- Lock wait time
- Replication lag
- Checkpoint frequency
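The buffer cache hit ratio is one query away in PostgreSQL. A minimal sketch; as a rough rule of thumb, a hot OLTP database should sit near 0.99:
-- Buffer cache hit ratio for the current database
SELECT blks_hit::float / nullif(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();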
Alerting
Alert on:
- Query duration exceeding threshold
- Connection pool exhaustion
- Disk space running low
- Replication lag increasing
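On a PostgreSQL standby, replication lag can be measured directly (returns NULL on a primary):
-- Time since the last replayed transaction on a standby
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;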
Regular Review
Periodically review:
- Slow query logs
- Index usage statistics (see the query after this list)
- Table sizes and growth
- Query patterns
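For index usage, pg_stat_user_indexes shows indexes that have never been scanned. Two caveats: counters reset with pg_stat_reset(), and an index unused on the primary may still serve queries on a replica:
-- Indexes never scanned since the last stats reset (removal candidates)
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;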
Proactive review catches problems before they become incidents.
Anti-Patterns
Premature optimization: Don’t optimize until you have a problem. Measure first.
Index everything: Indexes have costs (storage, write performance). Add indexes for specific query patterns.
Ignoring the ORM: ORMs generate SQL. Understand what SQL your ORM produces.
Optimizing in production: Test optimizations in staging with production-like data first.
Single query focus: Sometimes the problem is 1000 slightly-slow queries, not one very-slow query.
Key Takeaways
- Define the specific performance problem before optimizing
- Measure actual slow queries; don’t guess
- Use EXPLAIN to understand query execution plans
- Identify the specific bottleneck before applying fixes
- Index strategically based on query patterns
- Optimize queries before adding hardware
- Monitor continuously to catch problems early
- Change one thing at a time and measure impact