Performance Testing

Validates that a system behaves acceptably under expected and peak load. Catches performance regressions before they become production incidents. Types: load testing (expected load), stress testing (beyond capacity), spike testing (sudden burst), soak testing (sustained load over time).

Performance Testing Types

Type        | Goal                                                  | Load pattern
------------|-------------------------------------------------------|---------------------------------------------
Load test   | Verify system handles expected peak load within SLA   | Ramp to target, hold, ramp down
Stress test | Find the breaking point; how does it fail?            | Ramp until failure
Spike test  | Validate behaviour under sudden traffic bursts        | Instant jump to peak, then back
Soak test   | Detect memory leaks and degradation over time         | Moderate load, held for 2–24 hours
Volume test | Verify with large data volumes (large DB, big files)  | Target load, but with production-scale data

k6 — Modern Load Testing

k6 (by Grafana Labs) is the modern standard. JavaScript test scripts, clean API, built-in threshold assertions, CI-friendly CLI output. Self-hosted or cloud.

Basic Script

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 100,             // virtual users
  duration: '5m',       // test duration

  thresholds: {
    http_req_duration: ['p(95)<1000'],   // 95% of requests under 1000ms
    http_req_failed: ['rate<0.01'],      // less than 1% error rate
  },
};

export default function() {
  const res = http.get('https://api.example.com/products');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);    // think time between requests (realistic user pacing)
}
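
Run the script with the k6 CLI; --vus and --duration are standard flags that override the values in options (the file path reuses the one from the CI section below):

k6 run tests/load/api-load-test.js
k6 run --vus 200 --duration 10m tests/load/api-load-test.js   # quick override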

Scenarios — Ramp Profiles

export const options = {
  scenarios: {
    // Ramp up gradually — load test
    gradual_ramp: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },    // ramp to 50 VUs
        { duration: '5m', target: 50 },    // hold at 50
        { duration: '2m', target: 100 },   // ramp to 100
        { duration: '5m', target: 100 },   // hold at 100
        { duration: '2m', target: 0 },     // ramp down
      ],
    },

    // Spike test — sudden burst
    spike: {
      executor: 'ramping-vus',
      stages: [
        { duration: '10s', target: 0 },
        { duration: '10s', target: 500 },  // near-instant jump to 500 VUs
        { duration: '3m', target: 500 },
        { duration: '10s', target: 0 },
      ],
    },

    // Constant arrival rate — realistic request rate
    constant_rate: {
      executor: 'constant-arrival-rate',
      rate: 1000,                         // 1,000 iterations per second (one request each here)
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 200,               // enough VUs to sustain the rate
      maxVUs: 500,                        // headroom if responses slow down
    },
  },

  thresholds: {
    http_req_duration: ['p(99)<2000'],
    http_req_failed: ['rate<0.005'],
  },
};

Note: scenarios defined in the same options object all start at t=0 and run concurrently; run one scenario per test, or stagger them with startTime.

Authentication in k6

import http from 'k6/http';

// setup() runs once per test; its return value is passed to every VU iteration
export function setup() {
  const loginRes = http.post('https://api.example.com/auth/token', JSON.stringify({
    username: 'loadtest@example.com',
    password: 'testpassword',
  }), { headers: { 'Content-Type': 'application/json' } });

  return { token: loginRes.json('access_token') };
}

export default function(data) {
  http.get('https://api.example.com/dashboard', {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}

CI Integration

# GitHub Actions
- name: Run k6 load test
  uses: grafana/k6-action@v0.3.1
  with:
    filename: tests/load/api-load-test.js
  env:
    BASE_URL: https://staging.api.example.com

# Or via Docker
- name: Run k6
  run: |
    docker run --rm \
      -v ${{ github.workspace }}/tests:/tests \
      -e BASE_URL=https://staging.api.example.com \
      grafana/k6 run /tests/load/api-load-test.js

Inside the script, read the variable via __ENV, e.g. http.get(`${__ENV.BASE_URL}/products`).

JMeter — Enterprise Standard

Apache JMeter. GUI-based test creation, XML test plans (JMX), wide enterprise adoption. Heavier and older than k6 but ubiquitous in enterprise environments.

# Run from CLI (headless, CI)
jmeter -n -t my-test-plan.jmx \
  -l results.jtl \
  -e -o report/ \
  -Jbase_url=https://staging.example.com \
  -Jthreads=100 \
  -Jduration=300
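# -Jname=value sets a JMeter property; the test plan reads it via ${__P(name)}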

JMeter test plan structure:

  • Thread Group — defines VU count, ramp-up time, loop count
  • HTTP Request Samplers — individual requests
  • Assertions — response code, response body, response time
  • Listeners — results (view in GUI or JTL file for CI)
  • Config Elements — HTTP defaults, CSV data set (parameterisation)

JMeter vs k6:

                     | k6         | JMeter
---------------------|------------|-----------------
Language             | JavaScript | XML/Groovy
Threshold assertions | Native     | Via plugins
CI-friendliness      | Excellent  | Good (CLI mode)
Resource usage       | Low        | High (JVM)
Learning curve       | Low        | Medium
Enterprise adoption  | Growing    | Dominant

Performance Metrics to Track

Metric             | Good target                        | How measured
-------------------|------------------------------------|------------------------------
Throughput         | As high as possible at target VUs  | Requests/second
Response time p50  | < 200ms (API)                      | Median response time
Response time p95  | < 1000ms (API)                     | 95th percentile
Response time p99  | < 2000ms (API)                     | 99th percentile
Error rate         | < 1% at target load                | 5xx responses / total
Saturation point   | Know it before production finds it | VUs at which p99 exceeds SLA
CPU/memory at peak | < 70% CPU, headroom for spikes     | CloudWatch / Prometheus

Track p95 and p99, not just average. Averages hide the long tail. A p50 of 100ms with p99 of 5000ms means 1 in 100 requests is extremely slow.
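
k6 can surface these percentiles directly in its end-of-test summary; a minimal sketch using the standard summaryTrendStats option:

export const options = {
  // show tail latencies in the summary, not just the average
  summaryTrendStats: ['avg', 'med', 'p(95)', 'p(99)', 'max'],
  thresholds: {
    http_req_duration: ['p(95)<1000', 'p(99)<2000'],
  },
};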


Database Performance Under Load

Often the bottleneck. Check during load tests:

  • Connection pool exhaustion (pool wait time > 0 = problem)
  • Slow query log during load
  • Lock contention (pg_stat_activity in Postgres)
  • Index scans vs sequential scans

-- Postgres: find slow queries during load test
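-- (requires the pg_stat_statements extension to be enabled)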
SELECT query, mean_exec_time, calls, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Active connections and states
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

Gatling

JVM-based load testing tool with Scala DSL. Strong for complex scenarios with stateful sessions. Better reporting than JMeter out of the box. Less common than k6 but popular in Java shops.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicSimulation extends Simulation {
  val httpProtocol = http.baseUrl("https://api.example.com")

  val scn = scenario("Browse Products")
    .exec(http("Get Products").get("/products").check(status.is(200)))
    .pause(1)
    .exec(http("Get Product Detail").get("/products/1").check(status.is(200)))

  setUp(scn.inject(
    rampUsers(100).during(30.seconds),
    constantUsersPerSec(50).during(5.minutes)
  ).protocols(httpProtocol))
    .assertions(
      global.responseTime.percentile(95).lt(1000),
      global.failedRequests.percent.lt(1)
    )
}

Common Failure Cases

Authentication tokens expire mid-test
Why: JWTs obtained in setup() have a short TTL; by the time a long soak test reaches hour 2, every request returns 401.
Detect: the error rate spikes suddenly mid-run, at a predictable interval matching the token TTL.
Fix: implement token refresh logic in the VU lifecycle (e.g., each VU re-authenticates when its access token is near expiry; see the sketch below).
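
A minimal per-VU sketch, reusing the endpoint and credentials from the setup() example above (the TTL value and the 30-second refresh margin are assumptions):

import http from 'k6/http';

const TOKEN_TTL_S = 300;     // assumed access-token lifetime
let token = null;
let tokenIssuedAt = 0;       // module state is per-VU in k6

function getToken() {
  const now = Date.now() / 1000;
  if (!token || now - tokenIssuedAt > TOKEN_TTL_S - 30) {   // refresh 30s early
    const res = http.post('https://api.example.com/auth/token', JSON.stringify({
      username: 'loadtest@example.com',
      password: 'testpassword',
    }), { headers: { 'Content-Type': 'application/json' } });
    token = res.json('access_token');
    tokenIssuedAt = now;
  }
  return token;
}

export default function () {
  http.get('https://api.example.com/dashboard', {
    headers: { Authorization: `Bearer ${getToken()}` },
  });
}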

Think time omitted, making concurrency unrealistically high
Why: removing sleep(1) means 100 VUs each completing a request every 50ms generate 2,000 req/s, far beyond realistic user behaviour, producing a stress test when you intended a load test.
Detect: throughput is 10-100x higher than production traffic patterns; results are not comparable to real-world behaviour.
Fix: add think time (sleep(1 + Math.random() * 2)) to model realistic user pacing and match actual observed request rates.

Gatling simulation does not warm up the JVM before measurement starts
Why: the first few thousand requests in a Gatling simulation run on an unwarmed JVM, inflating p99 latency figures and making results unrepeatable.
Detect: p99 latency is significantly higher in the first 60 seconds than in steady state; re-running the test yields different numbers.
Fix: add a warm-up phase (rampUsers(10).during(30.seconds)) before the measurement period begins, and exclude the warm-up data from SLO assertions.

Database connection pool exhausted silently
Why: as VU count climbs, connection pool wait time grows, but the app returns 200 with degraded latency rather than an error, so the error-rate threshold never fires.
Detect: p99 latency climbs steadily while the error rate stays near zero; pg_stat_activity shows many connections in "idle in transaction" state.
Fix: add a custom metric tracking DB pool wait time and set a threshold on it (pool_wait_time: ['p(95)<50']), and monitor pg_stat_activity during the test; a sketch follows.
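
A sketch of the custom-metric fix, assuming the application exposes its pool wait time in a response header (the header name X-Db-Pool-Wait-Ms is hypothetical):

import http from 'k6/http';
import { Trend } from 'k6/metrics';

const poolWait = new Trend('pool_wait_time', true);   // true = values are durations in ms

export const options = {
  thresholds: {
    pool_wait_time: ['p(95)<50'],   // fail the run if p95 pool wait exceeds 50ms
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  const wait = res.headers['X-Db-Pool-Wait-Ms'];      // hypothetical instrumentation header
  if (wait) poolWait.add(parseFloat(wait));
}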

Open Questions

  • What is the most common failure mode when implementing this at scale?
  • How does this testing approach need to adapt for distributed or microservice architectures?