Load Testing Strategies for Production Systems

August 26, 2019

Load testing validates system behavior under stress before users experience it. But many load tests are unrealistic, poorly designed, or produce misleading results. Here’s how to do load testing effectively.

Types of Load Tests

Baseline Testing

Establish normal performance:

Goal: Understand system behavior at expected load
Load: Current production traffic patterns
Duration: 1-2 hours
Metrics: Latency, throughput, error rate, resource utilization

Stress Testing

Find breaking points:

Goal: Discover where system fails
Load: Increase until failure
Pattern: Ramp up gradually
Metrics: When does latency spike? When do errors occur?
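The ramp-up analysis can be sketched as a simple breaking-point finder: given measured (RPS, p95 latency) pairs from the ramp, report the first load level where latency blows past the budget. A minimal sketch; the measurements below are hypothetical.

```python
def find_breaking_point(samples, p95_budget_ms):
    """Return the first (rps, p95) sample whose p95 latency exceeds
    the budget, walking the ramp in load order; None if it never broke."""
    for rps, p95 in sorted(samples):
        if p95 > p95_budget_ms:
            return rps, p95
    return None

# Hypothetical ramp-up measurements: (requests/sec, p95 latency in ms)
ramp = [(100, 80), (200, 95), (400, 120), (800, 310), (1600, 2400)]
print(find_breaking_point(ramp, p95_budget_ms=500))  # (1600, 2400)
```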

Spike Testing

Handle sudden load increases:

Goal: Validate autoscaling and sudden load handling
Load: Normal → 10x → Normal
Duration: Spike for 5-10 minutes
Metrics: Recovery time, errors during spike

Soak Testing

Find issues that emerge over time:

Goal: Identify memory leaks, connection issues
Load: Sustained moderate load
Duration: 12-24 hours
Metrics: Resource trends, error rate over time
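One way to turn "resource trends" into a number is a least-squares slope over periodic memory samples: a persistently positive slope across a long soak is a leak signal. A minimal sketch with hypothetical RSS readings.

```python
def leak_slope(samples_mb, interval_min=10):
    """Least-squares slope of memory samples, in MB per minute.
    A persistently positive slope over a long soak suggests a leak."""
    n = len(samples_mb)
    xs = [i * interval_min for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_mb))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical RSS samples taken every 10 minutes during a soak
rss = [512, 518, 525, 531, 540, 546]
print(round(leak_slope(rss), 2))  # 0.69 MB/min, worth investigating
```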

Chaos + Load Testing

Combine with failure injection:

Goal: Validate graceful degradation under load
Load: Normal production load
Failures: Kill pods, inject latency, network partition
Metrics: Service behavior during failures

Realistic Load Patterns

Model Real Traffic

// Bad: Constant load
for (let i = 0; i < 1000; i++) {
    makeRequest();
}

// Good: Realistic distribution
const loadProfile = [
    { hour: 0, rps: 100 },
    { hour: 8, rps: 500 },  // Morning peak
    { hour: 12, rps: 800 }, // Lunch peak
    { hour: 18, rps: 600 }, // Evening
    { hour: 23, rps: 150 }, // Night
];
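A load generator needs a target rate for arbitrary points between those hourly anchors. One way is linear interpolation over the profile; here is a sketch with the same numbers re-expressed in Python.

```python
# The hourly profile above, re-expressed as (hour, rps) pairs
LOAD_PROFILE = [(0, 100), (8, 500), (12, 800), (18, 600), (23, 150)]

def target_rps(hour):
    """Linearly interpolate the target RPS for a given hour of day."""
    pts = sorted(LOAD_PROFILE)
    for (h0, r0), (h1, r1) in zip(pts, pts[1:]):
        if h0 <= hour <= h1:
            frac = (hour - h0) / (h1 - h0)
            return r0 + frac * (r1 - r0)
    return pts[-1][1]  # past the last anchor, hold the final rate

print(target_rps(10))  # 650.0, halfway between the 8:00 and 12:00 peaks
```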

Vary Request Types

// Production distribution
const requestMix = {
    'GET /products': 60,      // Most common
    'GET /products/:id': 25,  // Product details
    'POST /cart': 10,         // Add to cart
    'POST /checkout': 5,      // Checkout
};
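Driving that mix from a test script comes down to weighted random selection, so that over many iterations the generated traffic converges on the production split. A sketch using the same weights:

```python
import random

# The production mix above, as selection weights
REQUEST_MIX = {
    'GET /products': 60,
    'GET /products/:id': 25,
    'POST /cart': 10,
    'POST /checkout': 5,
}

def pick_request():
    """Pick the next request type, weighted to match production traffic."""
    endpoints = list(REQUEST_MIX)
    weights = list(REQUEST_MIX.values())
    return random.choices(endpoints, weights=weights, k=1)[0]

random.seed(42)  # seeded only to make this demo repeatable
counts = {ep: 0 for ep in REQUEST_MIX}
for _ in range(10_000):
    counts[pick_request()] += 1
# counts now roughly mirrors the 60/25/10/5 split
```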

Realistic Data

// Use production-like data volumes
const testUser = getTestUser(); // Has real order history
const products = getPopularProducts(); // Actual product IDs

Think Time

Real users pause between actions:

scenario('browse_and_buy', {
    exec: async () => {
        await viewHomepage();
        await sleep(2);  // User browses
        await viewProduct(randomProduct());
        await sleep(5);  // User reads
        await addToCart();
        await sleep(3);  // User decides
        await checkout();
    }
});

Tool Selection

k6

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
    stages: [
        { duration: '2m', target: 100 },  // Ramp up
        { duration: '5m', target: 100 },  // Sustain
        { duration: '2m', target: 0 },    // Ramp down
    ],
    thresholds: {
        http_req_duration: ['p(95)<500'],
        http_req_failed: ['rate<0.01'],
    },
};

export default function () {
    const res = http.get('https://api.example.com/products');
    check(res, {
        'status is 200': (r) => r.status === 200,
        'response time < 500ms': (r) => r.timings.duration < 500,
    });
    sleep(1);
}

Locust

from locust import HttpUser, task, between
import random

# Placeholder test data; replace with production-like IDs and payloads
PRODUCT_IDS = [101, 102, 103]
ORDER_DATA = {"items": [{"product_id": 101, "quantity": 1}]}

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)

    @task(10)
    def view_products(self):
        self.client.get("/products")

    @task(5)
    def view_product(self):
        product_id = random.choice(PRODUCT_IDS)
        self.client.get(f"/products/{product_id}")

    @task(1)
    def checkout(self):
        self.client.post("/checkout", json=ORDER_DATA)

Gatling

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicSimulation extends Simulation {
    val httpProtocol = http.baseUrl("https://api.example.com")

    val scn = scenario("BasicLoad")
        .exec(http("Get Products").get("/products"))
        .pause(2)
        .exec(http("Get Product").get("/products/123"))

    setUp(
        scn.inject(
            rampUsers(100).during(2.minutes),
            constantUsersPerSec(50).during(5.minutes)
        )
    ).protocols(httpProtocol)
}

Infrastructure Considerations

Test Environment

Options:

  1. Production: Most realistic, highest risk
  2. Production clone: Expensive but accurate
  3. Scaled staging: Cost-effective, less accurate

Distributed Load Generation

Single machine has limits:

# k6 Cloud or distributed setup
k6 cloud script.js  # VU count and stages come from the script's options

# Or self-hosted with multiple generators
k6 run --execution-segment "0:1/4" script.js  # Machine 1
k6 run --execution-segment "1/4:2/4" script.js  # Machine 2
k6 run --execution-segment "2/4:3/4" script.js  # Machine 3
k6 run --execution-segment "3/4:1" script.js  # Machine 4
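The segment strings follow a mechanical pattern, so generating them for any fleet size is straightforward. A sketch that reproduces the four commands above:

```python
def execution_segments(machines):
    """Generate k6 --execution-segment values that split the load
    evenly across N generator machines."""
    segs = []
    for i in range(machines):
        start = "0" if i == 0 else f"{i}/{machines}"
        end = "1" if i == machines - 1 else f"{i + 1}/{machines}"
        segs.append(f"{start}:{end}")
    return segs

print(execution_segments(4))  # ['0:1/4', '1/4:2/4', '2/4:3/4', '3/4:1']
```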

Baseline Measurements

Always know what you’re comparing against:

Before test:
- Current CPU utilization
- Current memory usage
- Baseline latency
- Current connection count

Analyzing Results

Key Metrics

Response time:
- Average (less useful)
- Percentiles: p50, p90, p95, p99 (more useful)
- Max (outliers)
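The average-vs-percentile point is easy to see with numbers. Below is a nearest-rank percentile sketch over a hypothetical latency sample: the average is dragged to 127 ms by two outliers while the p50 is only 15 ms, which is exactly why averages mislead.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample at or below which
    at least p percent of the data falls."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 14, 200, 16, 13, 18, 17, 15, 950]
print(sum(latencies_ms) / len(latencies_ms))  # 127.0 (average, misleading)
for p in (50, 90, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
```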

Throughput:
- Requests per second
- Successful requests per second

Errors:
- Error rate
- Error types
- When errors started

Resources:
- CPU utilization
- Memory usage
- Database connections
- Network I/O

Finding Bottlenecks

High CPU → Application or algorithm issue
High Memory → Memory leak or insufficient allocation
High DB connections → Connection pool exhaustion
High latency with low CPU → External dependency or I/O wait
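Those heuristics can be encoded as a first-pass triage function. The thresholds below are illustrative assumptions, not universal values; tune them to your system.

```python
def likely_bottleneck(cpu_pct, mem_pct, db_conn_pct, p95_ms, p95_budget_ms):
    """First-guess triage based on the heuristics above.
    Thresholds are illustrative, not universal."""
    if db_conn_pct >= 90:
        return "connection pool exhaustion"
    if mem_pct >= 90:
        return "memory leak or insufficient allocation"
    if cpu_pct >= 85:
        return "application or algorithm issue"
    if p95_ms > p95_budget_ms and cpu_pct < 40:
        return "external dependency or I/O wait"
    return "no obvious bottleneck"

print(likely_bottleneck(cpu_pct=25, mem_pct=60, db_conn_pct=40,
                        p95_ms=900, p95_budget_ms=500))
# → external dependency or I/O wait
```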

Results Documentation

## Load Test Results: 2019-08-26

### Configuration
- Target: api.example.com
- Duration: 30 minutes
- Peak load: 1000 RPS

### Results
| Metric | Target | Actual |
|--------|--------|--------|
| P95 Latency | < 500ms | 420ms |
| Error Rate | < 1% | 0.3% |
| Max Throughput | 1000 RPS | 1200 RPS |

### Observations
- Database CPU reached 80% at 800 RPS
- Connection pool exhaustion at 1100 RPS

### Recommendations
- Increase connection pool from 50 to 100
- Add read replica for query-heavy endpoints

CI/CD Integration

Automated Load Tests

# GitHub Actions
load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run load test
        # k6 exits non-zero when any threshold defined in the script
        # fails, so a performance regression fails this step (and the
        # pipeline) without any extra result parsing
        run: |
          k6 run --out json=results.json load-test.js

Pre-Production Gates

# Don't deploy if load test fails
stages:
  - build
  - test
  - load_test  # Must pass
  - deploy

Key Takeaways

Load testing prevents production surprises. Invest in realistic, regular testing.