Performance Testing

Validates that a system behaves acceptably under expected and peak load. Catches performance regressions before they become production incidents. Types: load testing (expected load), stress testing (beyond capacity), spike testing (sudden burst), soak testing (sustained load over time).

Performance Testing Types

Type        | Goal                                                  | Load pattern
------------|-------------------------------------------------------|---------------------------------------------
Load test   | Verify system handles expected peak load within SLA   | Ramp to target, hold, ramp down
Stress test | Find the breaking point; how does it fail?            | Ramp until failure
Spike test  | Validate behaviour under sudden traffic bursts        | Instant jump to peak, then back
Soak test   | Detect memory leaks and degradation over time         | Moderate load, held for 2–24 hours
Volume test | Verify with large data volumes (large DB, big files)  | Target load, but with production-scale data

k6 — Modern Load Testing

k6 (by Grafana Labs) is the modern standard. JavaScript test scripts, clean API, built-in threshold assertions, CI-friendly CLI output. Self-hosted or cloud.

Basic Script

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 100,             // virtual users
  duration: '5m',       // test duration

  thresholds: {
    http_req_duration: ['p(95)<1000'],   // 95% of requests under 1000ms
    http_req_failed: ['rate<0.01'],      // less than 1% error rate
  },
};

export default function() {
  const res = http.get('https://api.example.com/products');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);    // think time between requests (realistic user pacing)
}
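
Run the script with the k6 CLI; --vus and --duration are standard flags that override the values in options (the file path reuses the one from the CI section below):

k6 run tests/load/api-load-test.js
k6 run --vus 200 --duration 10m tests/load/api-load-test.js   # quick override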

Scenarios — Ramp Profiles

export const options = {
  scenarios: {
    // Ramp up gradually — load test
    gradual_ramp: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },    // ramp to 50 VUs
        { duration: '5m', target: 50 },    // hold at 50
        { duration: '2m', target: 100 },   // ramp to 100
        { duration: '5m', target: 100 },   // hold at 100
        { duration: '2m', target: 0 },     // ramp down
      ],
    },

    // Spike test — sudden burst
    spike: {
      executor: 'ramping-vus',
      stages: [
        { duration: '10s', target: 0 },
        { duration: '10s', target: 500 },  // near-instant jump to 500 VUs
        { duration: '3m', target: 500 },
        { duration: '10s', target: 0 },
      ],
    },

    // Constant arrival rate — realistic request rate
    constant_rate: {
      executor: 'constant-arrival-rate',
      rate: 1000,                         // 1,000 iterations per second (one request each here)
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 200,               // enough VUs to sustain the rate
      maxVUs: 500,                        // headroom if responses slow down
    },
  },

  thresholds: {
    http_req_duration: ['p(99)<2000'],
    http_req_failed: ['rate<0.005'],
  },
};

Note: scenarios defined in the same options object all start at t=0 and run concurrently; run one scenario per test, or stagger them with startTime.

Authentication in k6

import http from 'k6/http';

// setup() runs once per test; its return value is passed to every VU iteration
export function setup() {
  const loginRes = http.post('https://api.example.com/auth/token', JSON.stringify({
    username: 'loadtest@example.com',
    password: 'testpassword',
  }), { headers: { 'Content-Type': 'application/json' } });

  return { token: loginRes.json('access_token') };
}

export default function(data) {
  http.get('https://api.example.com/dashboard', {
    headers: { Authorization: `Bearer ${data.token}` },
  });
}

CI Integration

# GitHub Actions
- name: Run k6 load test
  uses: grafana/k6-action@v0.3.1
  with:
    filename: tests/load/api-load-test.js
  env:
    BASE_URL: https://staging.api.example.com

# Or via Docker
- name: Run k6
  run: |
    docker run --rm \
      -v ${{ github.workspace }}/tests:/tests \
      -e BASE_URL=https://staging.api.example.com \
      grafana/k6 run /tests/load/api-load-test.js

Inside the script, read the variable via __ENV, e.g. http.get(`${__ENV.BASE_URL}/products`).

JMeter — Enterprise Standard

Apache JMeter. GUI-based test creation, XML test plans (JMX), wide enterprise adoption. Heavier and older than k6 but ubiquitous in enterprise environments.

# Run from CLI (headless, CI)
jmeter -n -t my-test-plan.jmx \
  -l results.jtl \
  -e -o report/ \
  -Jbase_url=https://staging.example.com \
  -Jthreads=100 \
  -Jduration=300
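# -Jname=value sets a JMeter property; the test plan reads it via ${__P(name)}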

JMeter test plan structure:

  • Thread Group — defines VU count, ramp-up time, loop count
  • HTTP Request Samplers — individual requests
  • Assertions — response code, response body, response time
  • Listeners — results (view in GUI or JTL file for CI)
  • Config Elements — HTTP defaults, CSV data set (parameterisation)

JMeter vs k6:

                     | k6         | JMeter
---------------------|------------|-----------------
Language             | JavaScript | XML/Groovy
Threshold assertions | Native     | Via plugins
CI-friendliness      | Excellent  | Good (CLI mode)
Resource usage       | Low        | High (JVM)
Learning curve       | Low        | Medium
Enterprise adoption  | Growing    | Dominant

Performance Metrics to Track

Metric             | Good target                        | How measured
-------------------|------------------------------------|------------------------------
Throughput         | As high as possible at target VUs  | Requests/second
Response time p50  | < 200ms (API)                      | Median response time
Response time p95  | < 1000ms (API)                     | 95th percentile
Response time p99  | < 2000ms (API)                     | 99th percentile
Error rate         | < 1% at target load                | 5xx responses / total
Saturation point   | Know it before production finds it | VUs at which p99 exceeds SLA
CPU/memory at peak | < 70% CPU, headroom for spikes     | CloudWatch / Prometheus

Track p95 and p99, not just average. Averages hide the long tail. A p50 of 100ms with p99 of 5000ms means 1 in 100 requests is extremely slow.
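
k6 can surface these percentiles directly in its end-of-test summary; a minimal sketch using the standard summaryTrendStats option:

export const options = {
  // show tail latencies in the summary, not just the average
  summaryTrendStats: ['avg', 'med', 'p(95)', 'p(99)', 'max'],
  thresholds: {
    http_req_duration: ['p(95)<1000', 'p(99)<2000'],
  },
};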


Database Performance Under Load

Often the bottleneck. Check during load tests:

  • Connection pool exhaustion (pool wait time > 0 = problem)
  • Slow query log during load
  • Lock contention (pg_stat_activity in Postgres)
  • Index scans vs sequential scans

-- Postgres: find slow queries during load test
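-- (requires the pg_stat_statements extension to be enabled)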
SELECT query, mean_exec_time, calls, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Active connections and states
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

Gatling

JVM-based load testing tool with Scala DSL. Strong for complex scenarios with stateful sessions. Better reporting than JMeter out of the box. Less common than k6 but popular in Java shops.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicSimulation extends Simulation {
  val httpProtocol = http.baseUrl("https://api.example.com")

  val scn = scenario("Browse Products")
    .exec(http("Get Products").get("/products").check(status.is(200)))
    .pause(1)
    .exec(http("Get Product Detail").get("/products/1").check(status.is(200)))

  setUp(scn.inject(
    rampUsers(100).during(30.seconds),
    constantUsersPerSec(50).during(5.minutes)
  ).protocols(httpProtocol))
    .assertions(
      global.responseTime.percentile(95).lt(1000),
      global.failedRequests.percent.lt(1)
    )
}

Common Failure Cases

Authentication tokens expire mid-test
Why: JWTs obtained in setup() have a short TTL; by the time a long soak test reaches hour 2, every request returns 401.
Detect: the error rate spikes suddenly mid-run, at a predictable interval matching the token TTL.
Fix: implement token refresh logic in the VU lifecycle (e.g., each VU re-authenticates when its access token is near expiry; see the sketch below).
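
A minimal per-VU sketch, reusing the endpoint and credentials from the setup() example above (the TTL value and the 30-second refresh margin are assumptions):

import http from 'k6/http';

const TOKEN_TTL_S = 300;     // assumed access-token lifetime
let token = null;
let tokenIssuedAt = 0;       // module state is per-VU in k6

function getToken() {
  const now = Date.now() / 1000;
  if (!token || now - tokenIssuedAt > TOKEN_TTL_S - 30) {   // refresh 30s early
    const res = http.post('https://api.example.com/auth/token', JSON.stringify({
      username: 'loadtest@example.com',
      password: 'testpassword',
    }), { headers: { 'Content-Type': 'application/json' } });
    token = res.json('access_token');
    tokenIssuedAt = now;
  }
  return token;
}

export default function () {
  http.get('https://api.example.com/dashboard', {
    headers: { Authorization: `Bearer ${getToken()}` },
  });
}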

Think time omitted, making concurrency unrealistically high
Why: removing sleep(1) means 100 VUs each completing a request every 50ms generate 2,000 req/s, far beyond realistic user behaviour, producing a stress test when you intended a load test.
Detect: throughput is 10-100x higher than production traffic patterns; results are not comparable to real-world behaviour.
Fix: add think time (sleep(1 + Math.random() * 2)) to model realistic user pacing and match actual observed request rates.

Gatling simulation does not warm up the JVM before measurement starts
Why: the first few thousand requests in a Gatling simulation run on an unwarmed JVM, inflating p99 latency figures and making results unrepeatable.
Detect: p99 latency is significantly higher in the first 60 seconds than in steady state; re-running the test yields different numbers.
Fix: add a warm-up phase (rampUsers(10).during(30.seconds)) before the measurement period begins, and exclude the warm-up data from SLO assertions.

Database connection pool exhausted silently
Why: as VU count climbs, connection pool wait time grows, but the app returns 200 with degraded latency rather than an error, so the error-rate threshold never fires.
Detect: p99 latency climbs steadily while the error rate stays near zero; pg_stat_activity shows many connections in "idle in transaction" state.
Fix: add a custom metric tracking DB pool wait time and set a threshold on it (pool_wait_time: ['p(95)<50']), and monitor pg_stat_activity during the test; a sketch follows.
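
A sketch of the custom-metric fix, assuming the application exposes its pool wait time in a response header (the header name X-Db-Pool-Wait-Ms is hypothetical):

import http from 'k6/http';
import { Trend } from 'k6/metrics';

const poolWait = new Trend('pool_wait_time', true);   // true = values are durations in ms

export const options = {
  thresholds: {
    pool_wait_time: ['p(95)<50'],   // fail the run if p95 pool wait exceeds 50ms
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  const wait = res.headers['X-Db-Pool-Wait-Ms'];      // hypothetical instrumentation header
  if (wait) poolWait.add(parseFloat(wait));
}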

Open Questions

  • What is the most common failure mode when implementing this at scale?
  • How does this testing approach need to adapt for distributed or microservice architectures?