Load Testing — Advanced

Performance testing at scale: throughput, latency, saturation, and breakpoints. Goes beyond simple scripts into realistic traffic shaping, SLO validation, and CI integration.

Load Test Types

Load test:       Ramp to expected peak, hold, measure steady state
Stress test:     Push beyond expected peak to find the breaking point
Soak test:       Hold at moderate load for hours — find memory leaks, connection exhaustion
Spike test:      Instant jump from 0 to peak — cold start, connection pools
Breakpoint test: Gradually increase until something fails — find capacity ceiling (see the sketch after this list)
Volume test:     Large data sets — DB query time with 10M vs 10K rows
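
A breakpoint profile can also be scripted rather than driven by hand. Below is a minimal sketch using Locust's LoadTestShape (the tool covered later in this note); the step size, interval, and ceiling are illustrative assumptions, not tuned values.

# load/breakpoint_shape.py (illustrative; drop into a locustfile)
from locust import LoadTestShape

class BreakpointShape(LoadTestShape):
    """Add 25 users every 60s until a hard ceiling, then stop the run."""
    step_users = 25      # users added per step (assumed)
    step_duration = 60   # seconds per step (assumed)
    max_steps = 40       # hard ceiling: 1,000 users

    def tick(self):
        step = int(self.get_run_time() // self.step_duration) + 1
        if step > self.max_steps:
            return None  # returning None ends the test
        # (user_count, spawn_rate) for the current step
        return (step * self.step_users, self.step_users)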

k6 — Realistic Traffic Scenarios

// load/checkout-flow.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter, Rate, Trend } from 'k6/metrics';

// Custom metrics
const checkoutErrors = new Counter('checkout_errors');
const checkoutDuration = new Trend('checkout_duration', true);
const successRate = new Rate('checkout_success_rate');

export const options = {
  scenarios: {
    // Ramp up to 100 VUs, hold 5 min, ramp down
    load_test: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },    // ramp up
        { duration: '5m', target: 100 },   // hold at 100 VUs
        { duration: '2m', target: 0 },     // ramp down
      ],
    },
    // Spike: jump to 500 VUs instantly
    spike_test: {
      executor: 'constant-vus',
      vus: 500,
      duration: '30s',
      startTime: '10m',  // start after load test
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],  // SLO: p95 < 500ms, p99 < 1s (k6 syntax is p(95), not p95)
    http_req_failed: ['rate<0.01'],                  // < 1% errors
    checkout_success_rate: ['rate>0.99'],
  },
  // Surface p(99) in --summary-export output (default trend stats stop at p(95))
  summaryTrendStats: ['avg', 'min', 'med', 'max', 'p(95)', 'p(99)'],
};

const BASE_URL = __ENV.BASE_URL || 'https://staging.myapp.com';

export function setup() {
  // Seed test data before test starts
  const response = http.post(
    `${BASE_URL}/api/test/seed-products`,
    JSON.stringify({ count: 100 }),  // k6 form-encodes plain objects; this API expects JSON
    { headers: { 'Content-Type': 'application/json' } },
  );
  check(response, { 'seed successful': r => r.status === 200 });
  return { products: response.json().product_ids };
}

export default function (data) {
  // 1. Browse products
  const products = http.get(`${BASE_URL}/api/products`);
  check(products, { 'products loaded': r => r.status === 200 });

  // 2. Add to cart
  const productId = data.products[Math.floor(Math.random() * data.products.length)];
  const cart = http.post(`${BASE_URL}/api/cart`, JSON.stringify({
    product_id: productId, quantity: 1,
  }), { headers: { 'Content-Type': 'application/json' } });

  // 3. Checkout
  const start = Date.now();
  const checkout = http.post(`${BASE_URL}/api/checkout`, JSON.stringify({
    cart_id: cart.json('cart_id'),
    payment_method: 'test_card',
  }), { headers: { 'Content-Type': 'application/json' } });

  checkoutDuration.add(Date.now() - start);

  const success = check(checkout, {
    'checkout 200': r => r.status === 200,
    'has order id': r => r.json('order_id') != null,  // != catches undefined too; !== null passes on a missing key
  });

  successRate.add(success);
  if (!success) checkoutErrors.add(1);

  sleep(1 + Math.random() * 2);  // think time: 1-3 seconds
}

export function teardown(data) {
  http.post(`${BASE_URL}/api/test/cleanup`);
}

Locust — Python Load Testing

# load/locustfile.py
from locust import HttpUser, task, between, events
import random

class ProductUser(HttpUser):
    wait_time = between(1, 3)
    token = None

    def on_start(self):
        """Called once per simulated user at start."""
        response = self.client.post("/api/auth/token", json={
            "email": "loadtest@example.com",
            "password": "testpass"
        })
        self.token = response.json()["access_token"]
        self.client.headers.update({"Authorization": f"Bearer {self.token}"})

    @task(5)  # weight 5: picked 5x as often as checkout (weight 1)
    def browse_products(self):
        self.client.get("/api/products")

    @task(2)
    def view_product_detail(self):
        product_id = random.randint(1, 1000)
        self.client.get(f"/api/products/{product_id}", name="/api/products/[id]")

    @task(1)
    def checkout(self):
        with self.client.post("/api/cart", json={"product_id": 1, "quantity": 1},
                              catch_response=True) as response:
            if response.status_code == 200:
                cart_id = response.json()["cart_id"]
                self.client.post("/api/checkout", json={"cart_id": cart_id})
            else:
                response.failure(f"Cart creation failed: {response.status_code}")

@events.test_start.add_listener
def on_test_start(environment, **kwargs):
    print(f"Load test starting against: {environment.host}")
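
Locust's events API also exposes a per-request hook. A sketch, assuming Locust 2.x's events.request signature; the 500ms budget mirrors the p95 SLO above, and the counter is illustrative:

# Tally SLO-busting responses in-process via the request event (Locust 2.x)
from locust import events

SLO_MS = 500
slow_requests = 0

@events.request.add_listener
def track_slow(request_type, name, response_time, response_length, exception, **kwargs):
    global slow_requests
    if exception is None and response_time > SLO_MS:
        slow_requests += 1

@events.test_stop.add_listener
def report_slow(environment, **kwargs):
    print(f"Requests over {SLO_MS}ms: {slow_requests}")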

# Run headless from the CLI
locust -f load/locustfile.py --headless \
  --users 100 \
  --spawn-rate 10 \
  --run-time 5m \
  --host https://staging.myapp.com \
  --html report.html

SLO Validation

# load/validate_slos.py
# Reads the aggregate file written by: k6 run --summary-export=results.json
# (--out json streams raw per-request data points, which json.load cannot parse)
import json
import sys

with open("results.json") as f:
    results = json.load(f)

metrics = results["metrics"]

# Keys follow the --summary-export layout: trend metrics expose "p(95)"/"p(99)"
# (see summaryTrendStats in the k6 options above), rate metrics expose "value"
SLOS = {
    ("http_req_duration", "p(95)"): 500,    # p95 < 500ms
    ("http_req_duration", "p(99)"): 1000,   # p99 < 1000ms
    ("http_req_failed", "value"): 0.01,     # error rate < 1%
}

failures = []
for (metric, stat), threshold in SLOS.items():
    actual = metrics.get(metric, {}).get(stat, float("inf"))
    if actual > threshold:
        failures.append(f"{metric} {stat}: {actual:.2f} > {threshold}")

if failures:
    print("SLO VIOLATIONS:")
    for f in failures: print(f"  {f}")
    sys.exit(1)

print("All SLOs met")

Database Load Patterns

# load/test_db_load.py
# Test read query performance under concurrent load (TaskGroup needs Python 3.11+)
import asyncio
import os
import statistics
import time

import asyncpg

DATABASE_URL = os.environ["DATABASE_URL"]

async def measure_query_latency(pool, query, iterations=1000):
    latencies = []

    async def run():
        start = time.perf_counter()
        async with pool.acquire() as conn:
            await conn.fetch(query)
        latencies.append((time.perf_counter() - start) * 1000)

    # All iterations contend for the pool at once: deliberate concurrency pressure
    async with asyncio.TaskGroup() as tg:
        for _ in range(iterations):
            tg.create_task(run())

    ordered = sorted(latencies)
    return {
        "p50": statistics.median(ordered),
        "p95": ordered[int(len(ordered) * 0.95)],
        "p99": ordered[int(len(ordered) * 0.99)],
        "max": ordered[-1],
    }

async def test_product_list_query_slo():
    pool = await asyncpg.create_pool(DATABASE_URL, max_size=20)
    try:
        stats = await measure_query_latency(
            pool, "SELECT * FROM products WHERE category='electronics' ORDER BY name LIMIT 20"
        )
    finally:
        await pool.close()
    assert stats["p95"] < 50, f"p95 query time {stats['p95']:.1f}ms exceeds 50ms SLO"
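
The volume dimension (10M vs 10K rows, per the test types above) needs bulk seeding before these measurements mean anything. A sketch using asyncpg's COPY support; the products table, its columns, and the category values are assumptions for illustration:

# Hypothetical bulk seeder; COPY is far faster than per-row INSERTs
import asyncpg

CATEGORIES = ["electronics", "books", "toys"]  # assumed values

async def seed_products(pool, rows=10_000_000, batch=100_000):
    async with pool.acquire() as conn:
        for offset in range(0, rows, batch):
            records = [
                (f"product-{i}", CATEGORIES[i % len(CATEGORIES)])
                for i in range(offset, offset + batch)
            ]
            await conn.copy_records_to_table(
                "products", records=records, columns=["name", "category"]
            )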

Load Test in CI (Nightly)

# .github/workflows/performance.yaml
name: Nightly Performance Tests
on:
  schedule:
    - cron: '0 1 * * *'

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: grafana/setup-k6-action@v1

    - name: Run load tests
      # --summary-export writes the aggregate metrics that validate_slos.py reads
      run: k6 run --summary-export=results.json load/checkout-flow.js
      env:
        BASE_URL: ${{ vars.STAGING_URL }}

    - name: Validate SLOs
      run: python load/validate_slos.py

    - name: Upload results
      uses: actions/upload-artifact@v4
      with:
        name: k6-results
        path: results.json

Common Failure Cases

Load test runs against production instead of staging
Why:    BASE_URL defaults to the production URL when the environment variable is not set in CI, so real users are affected.
Detect: Production error rate spikes during the nightly CI run; the load test output shows the production hostname.
Fix:    Make BASE_URL a required parameter with no default — k6 will exit with a script error if it is unset, forcing explicit configuration (see the preflight sketch after these cases).

Virtual user count set too high for a single load generator
Why:    Running 500+ VUs from a single k6 process on a 2-core CI runner exhausts the generator's own CPU, making latency results meaningless (the tool, not the system under test, is the bottleneck).
Detect: k6 reports http_req_blocked p99 > 100ms, and iteration duration is far higher than expected even at low concurrency.
Fix:    Distribute load across multiple k6 instances (k6 cloud or a distributed runner setup), or reduce VUs to a level the generator can sustain.

Soak test catches no memory leak because test data is too small
Why:    Holding 50 VUs for 2 hours against a seeded database of 1,000 rows does not surface a leak that only appears at production-scale data volumes.
Detect: No memory growth is observed during the soak test, yet production shows gradual memory increase over days.
Fix:    Seed the test environment with production-representative data volumes before running the soak test.

SLO thresholds based on averages hide the long tail
Why:    The k6 threshold http_req_duration: ['avg<500'] passes even when p99 is 5,000ms, because a fast majority drags the average down.
Detect: The threshold passes in CI but users report slow experiences; the raw results show a high p99.
Fix:    Replace average-based thresholds with percentile thresholds: ['p(95)<500', 'p(99)<1000'].
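
For the first failure case, a small preflight script can enforce explicit targeting before k6 ever starts. This is a hypothetical guard; the "staging" substring check is an assumption about hostname conventions:

# load/preflight.py (hypothetical CI guard, run before k6)
import os
import sys

target = os.environ.get("BASE_URL")
if not target:
    sys.exit("BASE_URL is not set; refusing to fall back to any default")
if "staging" not in target:  # assumption: staging hosts contain "staging"
    sys.exit(f"Refusing to load test a non-staging host: {target}")

print(f"Preflight OK, targeting {target}")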

Connections

tqa-hub · technical-qa/chaos-engineering · qa/non-functional-testing · qa/test-automation-strategy · cloud/observability-stack · llms/ae-hub · technical-qa/k6

Open Questions

  • What is the most common failure mode when implementing this at scale?
  • How does this testing approach need to adapt for distributed or microservice architectures?