Performance Test Reporting
The artefacts, formats, and stakeholder communication a Senior Technical Consultant produces during and after a performance testing engagement — from interim run reports through to go/no-go sign-off packs.
Knowing how to run the tests is only half the job. The other half is translating raw numbers into decisions that non-technical stakeholders can act on and that engineers can debug from.
Report Taxonomy
| Artefact | Primary audience | When produced | Format |
|---|---|---|---|
| Run report | Engineers, Test Lead | After each test run | Markdown / HTML |
| Interim report | Project manager, Tech lead | Mid-engagement (weekly or mid-sprint) | Word / PDF |
| NFR pass/fail summary | All stakeholders | End of each test cycle | Table, one page |
| Trend analysis | Tech lead, Architect | End of sprint / release | Chart + commentary |
| Risk summary | Project manager, Sponsor | Before go/no-go decision | Bullet list, half page |
| Go/no-go sign-off pack | Project sponsor, Release manager | Pre-release | PDF, 3–5 pages |
| Final engagement report | Client, Account manager | End of engagement | Full document, 10–20 pages |
Each artefact has a different job. A run report gets a bug fixed. A go/no-go pack authorises a production deployment. Conflating them produces documents that are too detailed to act on and too vague to debug from.
Run Report — Structure
Produced after every load test execution. Engineers consume this. It should answer three questions in under five minutes of reading: did we pass, where did we fail, what is the likely cause.
# Performance Run Report — [Service Name] — [Date]
## Run metadata
- Scenario: [load / stress / soak / spike / volume]
- Tool: k6 / JMeter / Locust / Gatling
- Environment: staging-eu-west-1
- Build / commit: abc1234
- Duration: 30 minutes
- Ramp profile: 0 → 200 VUs over 5 min, hold 20 min, ramp down 5 min
## Summary
PASS / FAIL — [one sentence reason]
## Results vs NFR thresholds
| Endpoint | NFR (p95) | Actual (p95) | NFR (p99) | Actual (p99) | Error % | Status |
|-----------------------|-----------|--------------|-----------|--------------|---------|--------|
| GET /api/products | 500ms | 312ms | 800ms | 478ms | 0.1% | PASS |
| POST /api/checkout | 3000ms | 2104ms | 5000ms | 3890ms | 0.4% | PASS |
| GET /api/search | 800ms | 1243ms | 1500ms | 2710ms | 2.1% | FAIL |
| GET /api/orders | 1000ms | 887ms | 2000ms | 1654ms | 0.2% | PASS |
## Failures and likely causes
GET /api/search — p95 1243ms (NFR: 800ms), error rate 2.1% (NFR: < 0.5%)
Observed: latency spikes at 80+ VU mark; errors are 503 from the search service.
Probable cause: full-text search lacks an index on products.search_vector; confirmed
by DB slow query log — avg query time 890ms at load vs 45ms at baseline.
Recommended fix: CREATE INDEX CONCURRENTLY on search_vector (estimated 2 hours).
## Environment observations
- CPU spiked to 94% on app-server-02 at peak; other nodes < 60%. Potential scheduling issue.
- Redis cache hit rate: 73% (target: > 85%). Cold-start effect or TTL too low.
## Raw results
[Link to k6 HTML report] [Link to Grafana dashboard snapshot]
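The percentile and status columns in the run report can be derived from the raw samples rather than transcribed by hand, which removes one source of copy errors. A minimal Python sketch under that assumption — the endpoint name, thresholds, and function names are illustrative, not any particular tool's API:

```python
import math

# Illustrative NFR thresholds keyed by endpoint (mirrors the table above).
NFRS = {
    "GET /api/search": {"p95_ms": 800, "p99_ms": 1500, "error_pct": 0.5},
}

def percentile(samples_ms, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def nfr_row(endpoint, samples_ms, failed_count):
    """Build one row of the 'Results vs NFR thresholds' table."""
    nfr = NFRS[endpoint]
    p95 = percentile(samples_ms, 95)
    p99 = percentile(samples_ms, 99)
    error_pct = 100 * failed_count / len(samples_ms)
    passed = (p95 <= nfr["p95_ms"] and p99 <= nfr["p99_ms"]
              and error_pct <= nfr["error_pct"])
    return {"endpoint": endpoint, "p95_ms": p95, "p99_ms": p99,
            "error_pct": round(error_pct, 1),
            "status": "PASS" if passed else "FAIL"}
```

In practice the samples would come from the load tool's own results export; the point is that PASS/FAIL is computed against the NFR, never eyeballed.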
NFR Pass/Fail Summary Table
One-page artefact used in stakeholder reviews. Every NFR defined at the start of the engagement appears. No row should be missing — a blank NFR means it was not tested, which must be called out explicitly.
## NFR Sign-Off Table — Release 2.3.0
Test date: 2026-05-03
Environment: Staging (matched to production spec)
Load profile: 200 concurrent users, 30-minute sustained
| # | NFR ID | Description | Target | Measured | Status |
|----|----------|--------------------------------------------------|----------|----------|----------|
| 1 | NFR-001 | Product list p95 < 500ms @ 200 VU | 500ms | 312ms | PASS |
| 2 | NFR-002 | Checkout p95 < 3s @ 200 VU | 3000ms | 2104ms | PASS |
| 3 | NFR-003 | Search p95 < 800ms @ 200 VU | 800ms | 1243ms | FAIL |
| 4 | NFR-004 | Order history p95 < 1s @ 200 VU | 1000ms | 887ms | PASS |
| 5 | NFR-005 | Error rate < 0.5% across all endpoints | 0.5% | 2.1%* | FAIL |
| 6 | NFR-006 | System stable under 2h soak at 100 VU | Stable | NOT RUN | PENDING |
| 7 | NFR-007 | Peak throughput > 500 req/s | 500 rps | 612 rps | PASS |
* Error rate failure driven entirely by NFR-003 endpoint.
Overall status: FAIL — 2 of 7 NFRs not met. NFR-006 not yet executed.
Blocker for release: NFR-003 and NFR-005 (linked).
NFR-006 must be executed before go/no-go.
Mark NOT RUN explicitly rather than omitting the row. A missing row looks like it passed.
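One way to make the "no missing rows" rule mechanical is to generate the table from the agreed NFR list rather than from the results. A minimal Python sketch under that assumption — the NFR IDs and the result structure are illustrative:

```python
# NFRs agreed at engagement start (IDs are illustrative).
AGREED_NFRS = ["NFR-001", "NFR-002", "NFR-003", "NFR-004",
               "NFR-005", "NFR-006", "NFR-007"]

def sign_off_rows(results):
    """Build the sign-off table, never dropping a row.

    `results` maps NFR ID -> (measured_value, passed) for NFRs that were
    actually executed; anything missing is reported as NOT RUN / PENDING.
    """
    rows = []
    for nfr_id in AGREED_NFRS:
        if nfr_id not in results:
            rows.append((nfr_id, "NOT RUN", "PENDING"))
            continue
        measured, passed = results[nfr_id]
        rows.append((nfr_id, measured, "PASS" if passed else "FAIL"))
    return rows

# Example: NFR-006 has no result yet, so it appears as PENDING, not as a gap.
rows = sign_off_rows({"NFR-001": ("312ms", True), "NFR-003": ("1243ms", False)})
```

Because iteration is over the agreed list, an untested NFR can only ever surface as an explicit PENDING row.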
Presenting p95 / p99 to Different Audiences
The same number means different things depending on who you are talking to. Calibrate the explanation.
To a developer
The p95 for GET /api/search is 1243ms. That means 95% of requests completed
within 1243ms and 5% took longer — 1% took longer than the p99 of 2710ms.
The slow query log shows the search_vector column has no index. At 80+ VU the
planner switches to a seq scan and latency jumps. Create the index and I'd
expect p95 to drop below 200ms.
Give them the percentile, the raw number, the direction of the problem, and a hypothesis. They will find the fix.
To a project manager
The search feature does not meet its performance target. Under the expected
peak load, 1 in 20 search requests takes more than 1.2 seconds — our agreed
target was under 0.8 seconds. The engineering team has identified a fix
(estimated 2 hours) and it is now in the sprint backlog as PROJ-441.
We cannot recommend releasing the current build. If the fix lands and passes
re-test by Thursday, the release timeline is unaffected.
State the user impact (1 in 20 requests), whether the target was met, what the consequence is, and whether the timeline is at risk. Drop the percentile notation — p95 means nothing to most PMs.
To an executive or sponsor
Performance testing has found one issue that blocks this release. The product
search function is slower than agreed under peak load. The team has a fix
ready; we expect to re-test and close this by end of week. All other features
passed. The release date is not currently at risk.
One issue. Whether the date is at risk. What happens next. Nothing else.
Trend Analysis Across Sprints / Releases
Trend analysis answers: is performance getting better or worse over time, and is any endpoint drifting toward its SLO ceiling?
Latency trend table (populated per release)
## p95 Latency Trend — GET /api/products (ms)
| Release | Date | p95 | p99 | NFR (p95) | Headroom |
|---------|------------|------|------|-----------|----------|
| 2.0.0 | 2026-02-14 | 180 | 290 | 500ms | 64% |
| 2.1.0 | 2026-03-01 | 195 | 310 | 500ms | 61% |
| 2.1.2 | 2026-03-15 | 220 | 380 | 500ms | 56% |
| 2.2.0 | 2026-04-05 | 290 | 520 | 500ms | 42% |
| 2.3.0 | 2026-05-03 | 312 | 478 | 500ms | 38% |
The headroom column is the most important. An endpoint can be passing its NFR while the headroom is shrinking consistently — that is a degradation trend that will eventually become a failure. Flag it before the failure, not after.
When to escalate a trend (a programmatic check of these rules is sketched after the list):
- Headroom below 25%: raise in the next sprint review as a risk item.
- Headroom below 10%: block new feature work on that endpoint until the cause is found.
- Two consecutive releases with headroom reduction > 10 percentage points: write a defect even if the NFR is still passing.
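A minimal sketch of the headroom calculation (headroom = (NFR − p95) / NFR) and the escalation rules above, using the illustrative GET /api/products figures from the trend table; the function names and data shapes are assumptions, not an existing tool:

```python
def headroom_pct(nfr_ms, p95_ms):
    """Headroom as shown in the trend table: how far p95 sits below its NFR."""
    return 100 * (nfr_ms - p95_ms) / nfr_ms

def trend_escalations(releases, nfr_ms):
    """Apply the escalation rules above to a chronological list of (release, p95_ms)."""
    headrooms = [(rel, headroom_pct(nfr_ms, p95)) for rel, p95 in releases]
    flags = []
    for i, (rel, hr) in enumerate(headrooms):
        if hr < 10:
            flags.append((rel, "block new feature work on this endpoint"))
        elif hr < 25:
            flags.append((rel, "raise as a risk item at the next sprint review"))
        # Two consecutive releases each losing > 10 percentage points of headroom.
        if i >= 2 and (headrooms[i - 2][1] - headrooms[i - 1][1] > 10
                       and headrooms[i - 1][1] - hr > 10):
            flags.append((rel, "raise a defect even though the NFR still passes"))
    return flags

# Illustrative: the GET /api/products trend from the table above (NFR p95 = 500ms).
releases = [("2.0.0", 180), ("2.1.0", 195), ("2.1.2", 220),
            ("2.2.0", 290), ("2.3.0", 312)]
print(trend_escalations(releases, nfr_ms=500))  # [] — nothing escalates yet, but headroom is shrinking
```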
Error rate trend
Track alongside latency. An error rate that is drifting from 0.1% toward 0.5% across releases deserves attention even if it has not crossed the NFR threshold.
Interim Report Format
Produced mid-engagement — typically weekly on a multi-week performance programme, or once mid-sprint on a shorter cycle. Audience: project manager and technical lead.
# Performance Testing — Interim Report
**Client:** Acme Corp
**Engagement:** Platform Re-platform Performance Validation
**Week:** 2 of 4
**Report date:** 2026-05-03
**Prepared by:** Lewis Elliott, Senior Technical Consultant
---
## Progress this week
- Load test scenarios completed: checkout flow, product search, order history (3 of 7 planned)
- Defects raised: 2 (PERF-001 search latency, PERF-002 connection pool exhaustion)
- Defects resolved and re-tested: 0 (both in dev)
## Results summary
| Scenario | NFRs tested | Pass | Fail | Pending re-test |
|--------------------|-------------|------|------|-----------------|
| Checkout flow | 4 | 4 | 0 | 0 |
| Product search | 3 | 1 | 2 | 0 |
| Order history | 3 | 3 | 0 | 0 |
## Open defects
| ID | Severity | Description | Owner | ETA |
|----------|----------|----------------------------------------|-----------|--------|
| PERF-001 | High | Search p95 exceeds NFR at 80+ VU | Dev team | 06-May |
| PERF-002 | Medium | Connection pool exhausted under soak | Infra | 08-May |
## Risks
- If PERF-001 is not resolved by 08-May, the soak test cannot run on schedule.
This would push the final report date by 3 days.
- Staging environment CPU anomaly on app-server-02 not yet explained.
Results from that node may not be representative. [Action: infra team to investigate.]
## Next week
- Complete remaining 4 scenario groups (user profile, reports, admin, API)
- Re-test PERF-001 and PERF-002 once fixes land
- Begin soak test if environment issue is resolved
---
*Next interim report: 2026-05-10*
Go/No-Go Sign-Off Pack
Produced immediately before a production release decision. Primary audience: release manager, project sponsor, sometimes a change advisory board. It must be self-contained — the reader should not need to find other documents to make the decision.
# Performance Testing — Go/No-Go Sign-Off Pack
**System:** Acme Platform v2.3.0
**Release date:** 2026-05-10
**Prepared by:** Lewis Elliott, Senior Technical Consultant, Resillion
**Date:** 2026-05-07
---
## Recommendation
**GO** — all performance NFRs met. No outstanding performance defects. Soak test
passed. Release is approved from a performance standpoint.
[OR]
**NO-GO** — 2 NFRs unmet (PERF-001, PERF-005). Releasing with known performance
failures under expected peak load carries a risk of user-facing degradation.
See Risk section.
---
## NFR sign-off (final)
[Include full NFR table from section above — all rows, final measured values]
All 7 NFRs: PASS. PERF-001 resolved in commit def5678, re-tested 06-May, now
passing with p95 = 310ms (NFR: 800ms).
---
## Test coverage
| Test type | Executed | Pass | Notes |
|--------------|----------|------|-----------------------------------|
| Load test | Yes | Yes | 200 VU, 30 min, all scenarios |
| Stress test | Yes | Yes | Breaking point: 480 VU |
| Soak test | Yes | Yes | 100 VU, 2h, memory stable |
| Spike test | No | N/A | Out of scope per agreed test plan |
| Volume test | Yes | Yes | 2M product records |
---
## Defect log (performance cycle)
| ID | Severity | Description | Status | Resolution |
|----------|----------|-----------------------------------|----------|------------------------------------|
| PERF-001 | High | Search p95 exceeded NFR at 80+ VU | Resolved | Index added on search_vector |
| PERF-002 | Medium | Connection pool exhausted (soak) | Resolved | Pool size increased from 10 to 50 |
---
## Residual risks
- Spike test not executed (out of scope). A sudden 10x traffic event (e.g. viral
campaign) has not been validated. Mitigation: ensure auto-scaling is enabled
and alert thresholds are set before release.
- Stress test breaking point (480 VU) is 2.4x the expected peak (200 VU). Headroom
is adequate but not large. Recommend monitoring closely at launch.
---
## Sign-off
| Role | Name | Decision | Date |
|------------------------|--------------|----------|------|
| Release Manager | | | |
| Technical Lead | | | |
| Project Sponsor | | | |
| QA Lead (Resillion)    | Lewis Elliott | APPROVED | 2026-05-07 |
The sign-off table is not theatre. If a stakeholder signs GO they are acknowledging the residual risks. If they cannot explain what they are signing, the pack is not clear enough.
Executive Summary Format
The executive summary opens the final report and the go/no-go pack. It is the only section most sponsors read. Keep it to one page, never more.
## Executive Summary
Resillion conducted performance testing of the Acme Platform v2.3.0 between
24 April and 7 May 2026. The engagement validated 7 non-functional requirements
covering response time and throughput under expected and peak load conditions.
Two defects were identified and resolved during the engagement. The most
significant — a missing database index on the product search feature — caused
response times to exceed the agreed threshold under sustained load. Following
the fix, all endpoints now meet or exceed their performance targets.
The system remained stable up to a breaking point of 480 concurrent users —
2.4 times the expected peak. Memory and connection behaviour were stable over
a two-hour sustained test.
Resillion recommends proceeding with the planned release. One residual risk
has been noted: spike testing was out of scope, and sudden traffic surges
above 480 concurrent users have not been validated. This is documented in
section 5 and should be monitored at launch.
Key figures:
- NFRs tested: 7
- NFRs passing: 7 (2 required remediation during testing)
- Performance defects raised: 2
- Performance defects outstanding: 0
- Breaking point: 480 VU (2.4x expected peak)
- Soak test duration: 2 hours — stable
Rules for the executive summary: one recommendation, key figures as bullets, residual risk called out, no percentile notation unless the audience is technical.
Defect Reporting During a Performance Cycle
Performance defects follow the same lifecycle as functional defects but need additional fields that functional bugs do not require.
Title: [Endpoint] [metric] exceeds NFR under [load condition]
Example: GET /api/search p95 latency exceeds 800ms NFR under 80+ concurrent users
Severity classification:
Critical — NFR exceeded by > 50%, or error rate > 5%
High — NFR exceeded by 10–50%, or error rate 1–5%
Medium — NFR exceeded by < 10%, or error rate 0.5–1%
Low — NFR met but headroom < 10%; trend risk
Required fields beyond standard defect:
Load profile at time of failure (VU count, ramp shape, duration)
Measured value at failure (p50, p95, p99, error rate)
NFR threshold being violated
Environment and build ref
Link to the run report and raw results
Grafana / APM snapshot at time of failure
Steps to reproduce:
1. Deploy build abc1234 to staging-eu-west-1
2. Run k6 script /scripts/search-load.js with 80 VUs
3. Observe p95 latency exceeding 800ms after approx 8 minutes
Expected: p95 < 800ms per NFR-003
Actual: p95 = 1243ms, p99 = 2710ms, error rate = 2.1%
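The severity bands can be applied mechanically when the defect is raised, which keeps classification consistent across testers. A minimal Python sketch of that mapping — the function signature and example values are illustrative:

```python
def classify_severity(measured, nfr_threshold, error_rate_pct=None, headroom_pct=None):
    """Map a measurement onto the severity bands above. Returns None if no defect."""
    exceed_pct = 100 * (measured - nfr_threshold) / nfr_threshold
    if exceed_pct > 50 or (error_rate_pct is not None and error_rate_pct > 5):
        return "Critical"
    if exceed_pct >= 10 or (error_rate_pct is not None and error_rate_pct >= 1):
        return "High"
    if exceed_pct > 0 or (error_rate_pct is not None and error_rate_pct >= 0.5):
        return "Medium"
    if headroom_pct is not None and headroom_pct < 10:
        return "Low"   # NFR met, but under 10% headroom: trend risk
    return None        # NFR met with adequate headroom

# e.g. a p95 of 880ms against an 800ms NFR is exceeded by 10% -> "High"
print(classify_severity(measured=880, nfr_threshold=800, error_rate_pct=0.3))
```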
Link every performance defect back to its NFR ID. This makes the NFR pass/fail table easy to maintain — the defect tracks what is broken; the table tracks whether it is fixed.
Interim vs Final Report: Key Differences
| Dimension | Interim | Final |
|---|---|---|
| Purpose | Status update; surface blockers early | Full record; client-deliverable |
| Defects | All open defects listed, status current | All defects from the engagement, final resolution |
| NFR table | Partial (only tested so far) | Complete (every NFR, final result) |
| Recommendations | Provisional | Definitive |
| Sign-off | Not required | Required from QA lead; optional from client |
| Length | 1–2 pages | 10–20 pages including appendices |
| Trend analysis | Omit (too early) | Include if multiple test cycles ran |
| Appendices | None | Raw results, tool configs, test scripts, environment spec |
The interim report is a working document. The final report is a deliverable. Write them differently.
Common Reporting Failures
Reporting average response time instead of percentiles
Averages hide tail latency. A system where 95% of requests complete in 100ms and 5% time out at 30s has an "average" of 1.6s that sounds acceptable. Always report p50, p95, p99 as the baseline set. Add max only when there is a specific reason to highlight tail behaviour.
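The arithmetic is easy to demonstrate with synthetic numbers matching that example (95% of requests at 100ms, 5% at 30s):

```python
from statistics import mean

# 95% of requests at 100 ms, 5% timing out at 30 s (30000 ms)
samples = sorted([100] * 950 + [30000] * 50)

avg = mean(samples)                          # 1595 ms: sounds "acceptable"
p95 = samples[int(0.95 * len(samples)) - 1]  # 100 ms
p99 = samples[int(0.99 * len(samples)) - 1]  # 30000 ms: the story the average hides

print(f"mean={avg:.0f}ms p95={p95}ms p99={p99}ms")
```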
NFR table with rows missing
Any NFR that was agreed at engagement start but not yet tested must appear as PENDING or NOT RUN, never omitted. A missing row reads as a pass to stakeholders scanning the table.
Presenting a FAIL without a cause and an owner
A red cell in the NFR table must be accompanied by a defect ID, a probable cause, and an owner. "FAIL" with no context is unusable — the project manager cannot escalate it and the developer cannot fix it.
Go/no-go pack that requires the reader to have read all prior reports
Each sign-off pack must stand alone. Include the NFR table, the defect log, and the risk summary in the pack itself. A sponsor should not need to hunt for context.
Trend data never reviewed between releases
The trend table exists to catch gradual degradation before it becomes a production failure. If it is only written into the final report and never reviewed mid-engagement, it serves no early-warning function. Review it at each sprint review with the tech lead.
Risk summary buried in appendices
Risks must be in the main body of every stakeholder-facing report. An appendix is where readers stop reading. If a spike test was out of scope, that risk belongs on the first page of the go/no-go pack, not page 12.
Connections
qa/performance-testing-qa · qa/non-functional-testing · qa/test-reporting · qa/qa-metrics · qa/test-planning · qa/test-documentation · qa/risk-based-testing · qa/qa-hub
Open Questions
- At what stage in a release cycle should performance regression thresholds be locked — during design, after the first run, or after three stable baseline runs?
- How should reports distinguish between regressions that exceed the SLO threshold and those within statistical noise — and communicate the difference to non-technical stakeholders?
- What's the right response when a go/no-go sign-off is overridden by a stakeholder despite a failing performance gate?
Related reading