Risk-Based Testing
Prioritise testing effort toward areas of highest risk. You never have enough time to test everything — risk-based testing ensures the most critical and failure-prone areas get the most attention.
The Core Idea
Risk = Likelihood of failure × Impact of failure
High likelihood and high impact → test first, test thoroughly. Low likelihood and low impact → test last, test lightly.
Without risk analysis, teams tend to test what's easiest or what they're most familiar with. Not what matters most.
Risk Identification
Sources of risk in a software product:
Technical risks:
- Complex logic (calculations, state machines, algorithms)
- Recent changes (most bugs are introduced near the change)
- New code (unfamiliar territory; no regression baseline)
- Third-party integrations (external systems, APIs)
- Concurrency and race conditions
- Edge cases in data handling (null, empty, extreme values)
- Security-sensitive paths (auth, payment, PII)
- Performance-sensitive paths (high-traffic endpoints)
Business risks:
- Revenue-critical flows (checkout, billing, subscription management)
- High-visibility features (homepage, sign-up flow)
- Legal/compliance requirements (GDPR, PCI-DSS, accessibility)
- SLA commitments (uptime, response time)
- Regulatory deadlines
Risk Assessment Matrix
Score each area on two axes:
Likelihood (1–5):
1. Very unlikely — stable, well-tested, unchanged in months
2. Unlikely — minor changes, good test coverage
3. Possible — moderate changes, known complexity
4. Likely — significant changes, limited coverage
5. Very likely — new area, no coverage, complex logic
Impact (1–5):
1. Negligible — cosmetic issue, no user impact
2. Minor — user inconvenienced, easy workaround
3. Moderate — partial feature failure, workaround exists
4. Major — key feature unusable, no workaround
5. Critical — data loss, security breach, revenue impact
Risk score = Likelihood × Impact (1–25)
| Risk Score | Priority | Test depth |
|---|---|---|
| 20–25 | Critical | Full coverage; exploratory + automation |
| 12–19 | High | Broad coverage; automate key paths |
| 6–11 | Medium | Happy path + main negatives |
| 1–5 | Low | Basic smoke test or skip |
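A minimal sketch of the scoring and banding in code (Python; the function names are illustrative, the thresholds mirror the table above):

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Both inputs are on the 1-5 scales defined above."""
    return likelihood * impact


def priority(score: int) -> str:
    """Map a 1-25 risk score to the priority bands from the table above."""
    if score >= 20:
        return "Critical"  # full coverage; exploratory + automation
    if score >= 12:
        return "High"      # broad coverage; automate key paths
    if score >= 6:
        return "Medium"    # happy path + main negatives
    return "Low"           # basic smoke test or skip


assert priority(risk_score(4, 4)) == "High"  # e.g. promo code calculation
assert priority(risk_score(2, 2)) == "Low"   # e.g. order history display
```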
Risk Register (Example)
For a checkout feature:
| Area | Likelihood | Impact | Score | Action |
|---|---|---|---|---|
| Payment processing | 3 | 5 | 15 | High — full test coverage, E2E |
| Promo code calculation | 4 | 4 | 16 | High — EP + BVA for all discount types |
| Order confirmation email | 2 | 3 | 6 | Medium — happy path + invalid email |
| Order history display | 2 | 2 | 4 | Low — basic smoke |
| Currency formatting | 3 | 3 | 9 | Medium — check all supported locales |
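Kept as data rather than a spreadsheet, the register can drive execution order directly: sort by score descending and test from the top. A sketch, with entries mirroring the table above:

```python
# Illustrative register entries: (area, likelihood, impact).
register = [
    ("Payment processing", 3, 5),
    ("Promo code calculation", 4, 4),
    ("Order confirmation email", 2, 3),
    ("Order history display", 2, 2),
    ("Currency formatting", 3, 3),
]

# Execution order: highest score first; if time runs out, the
# lowest-scored areas are what gets cut.
for area, likelihood, impact in sorted(register, key=lambda r: r[1] * r[2], reverse=True):
    print(f"{likelihood * impact:>2}  {area}")
```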
Risk-Based Test Planning in a Sprint
Before the sprint:
- Review the user stories and changes in scope
- Identify risk areas using the matrix
- Map test types to risk level (unit for logic, E2E for flows, performance for load paths)
- Assign time budget proportional to risk score
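One way to make "time budget proportional to risk score" concrete; a sketch in which the 40-hour budget and the scores are made up for illustration:

```python
# Hypothetical sprint: split a fixed testing budget (hours) in proportion to risk scores.
budget_hours = 40
scores = {"payment": 15, "promo codes": 16, "emails": 6, "order history": 4}

total = sum(scores.values())
allocation = {area: round(budget_hours * score / total, 1) for area, score in scores.items()}
# -> payment ~14.6h, promo codes ~15.6h, emails ~5.9h, order history ~3.9h
```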
Sprint boundary:
- Critical and High risks: test before release gate
- Medium risks: test before release but can be descoped if schedule pressure
- Low risks: test opportunistically; skip if time-pressured
At release: Risk-based regression. Don't retest everything on every release. Focus regression effort on:
- Areas changed in this release
- Areas that were High/Critical risk in previous sprints
- Areas with historical defect density
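A sketch of that selection as a simple set union (the area names are placeholders):

```python
# Placeholder area names for one release.
changed_this_release = {"promo codes", "currency formatting"}
high_risk_recent_sprints = {"payment", "promo codes"}
historically_defect_dense = {"payment", "emails"}

# Regression scope is the union of the three signals, not the full suite.
regression_scope = changed_this_release | high_risk_recent_sprints | historically_defect_dense
# -> {"promo codes", "currency formatting", "payment", "emails"}
```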
The 80/20 Rule for Testing
Roughly 80% of defects come from 20% of the code. Find that 20% by analysing:
- Defect history — which modules have the most bugs?
- Code complexity metrics (cyclomatic complexity) — which functions are hardest to reason about?
- Change frequency (git history) — which files are edited most often?
- Code review comments — which areas generate the most discussion?
Concentrate test effort on that 20%.
```bash
# Find most-changed files in git history
git log --pretty=format: --name-only | sort | uniq -c | sort -rn | head -20
```

Failure Mode and Effect Analysis (FMEA)
A more rigorous risk-analysis technique used in safety-critical systems (medical, automotive). For each component, assess:
- Failure mode — what could go wrong?
- Effect — what happens when it fails?
- Severity (1–10)
- Occurrence — how likely to occur? (1–10)
- Detection — how likely to be caught before reaching users? (1–10, where 10 = very hard to detect)
- RPN (Risk Priority Number) = Severity × Occurrence × Detection
Actions prioritised by highest RPN.
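A sketch of RPN scoring and prioritisation; the failure modes and scores below are illustrative only:

```python
# Illustrative FMEA rows: (failure mode, severity, occurrence, detection).
# Detection: 10 = very hard to catch before reaching users.
fmea = [
    ("Payment gateway timeout leaves order half-created", 8, 4, 6),
    ("Rounding error in multi-currency totals", 6, 5, 7),
    ("Confirmation email silently not sent", 4, 3, 8),
]

# RPN = Severity x Occurrence x Detection; act on the highest first.
for mode, sev, occ, det in sorted(fmea, key=lambda r: r[1] * r[2] * r[3], reverse=True):
    print(f"RPN {sev * occ * det:>3}  {mode}")
```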
Risk Review During Testing
Risk evolves. Update the risk assessment when:
- A bug is found in a previously "low risk" area — recalibrate
- A feature grows in scope mid-sprint — rerun risk analysis
- New third-party dependency introduced — add to risk register
- Performance issue found — add load tests to scope
Communicating Risk to Stakeholders
When time pressure requires descoping tests, communicate the risk explicitly:
"We are releasing with the following known untested areas:
- Currency conversion in the checkout: Medium risk, potential rounding errors
- Concurrent order creation: Medium risk, potential race condition on inventory
Mitigation: We will monitor Sentry error rates post-release and have a
rollback plan ready. These areas are scheduled for test coverage next sprint."
This makes risk visible. Stakeholders can accept it with awareness, or delay the release. Either is a valid outcome.
Common Failure Cases
Risk scores not updated mid-sprint after scope change — resources stay on original priorities.
- Why: a new third-party dependency is added on day 3 of the sprint but the risk register is not updated; testing effort stays concentrated on the originally scored areas and the new integration ships untested.
- Detect: the risk register was last updated at sprint planning; any story that introduced a new external dependency since then is not represented.
- Fix: make risk register review a standing agenda item at the mid-sprint sync; any new dependency, new endpoint, or changed algorithm added after planning triggers a risk re-score before the sprint ends.
Impact scores consistently underestimated for payment and auth flows.
- Why: teams score payment impact as 4 instead of 5 because "we have fraud detection downstream"; the downstream mitigation is real but does not reduce the immediate impact of a checkout failure on revenue.
- Detect: compare post-incident impact assessments with pre-sprint risk scores; if critical incidents are consistently happening in areas scored 3 or below, the scoring criteria need calibration.
- Fix: define explicit scoring anchors for impact 5 — any flow that directly handles money, credentials, or PII is automatically impact 5 regardless of downstream mitigations; mitigations reduce likelihood, not impact.
Risk register stays as a spreadsheet never referenced during test execution.
- Why: the register is created during test planning as a documentation artefact; testers work from their usual checklists during execution and the risk scores are never used to prioritise which scenarios to run first.
- Detect: ask a tester which risk areas they covered in their last session; if they cannot map their activities to items in the risk register, the register is decorative.
- Fix: derive the test execution order directly from the risk register: sort by risk score descending, and start testing from the top; if time runs out, the lowest-risk items are what gets cut.
FMEA performed once at project start and never revisited.
- Why: the FMEA captures the architecture at launch; six months later, a new caching layer and two new integrations have been added, but RPN scores still reflect the original architecture.
- Detect: compare the FMEA component list against the current architecture diagram; components present in the diagram but absent from the FMEA are unscored risk.
- Fix: trigger an FMEA review whenever a new component is added to the architecture (new service, new dependency, new data store); the review need not redo the whole FMEA — add only the new components and re-score affected neighbours.
Connections
- qa/test-strategy — risk-based testing is the prioritisation layer within the strategy
- qa/test-case-design — risk score determines how many test cases to derive per area
- qa/exploratory-testing — high-risk areas warrant exploratory charters
- qa/qa-metrics — defect density by module informs risk calibration
- qa/bug-lifecycle — high-risk area bugs get Critical/P1 severity treatment
Open Questions
- What testing scenarios does this technique systematically miss?
- How does this approach need to change when delivery cadence moves to continuous deployment?