Risk-Based Testing

Prioritise testing effort toward areas of highest risk. You never have enough time to test everything — risk-based testing ensures the most critical and failure-prone areas get the most attention.

The Core Idea

Risk = Likelihood of failure × Impact of failure

High likelihood and high impact → test first, test thoroughly. Low likelihood and low impact → test last, test lightly.

Without risk analysis, teams tend to test what's easiest or what they're most familiar with — not what matters most.


Risk Identification

Sources of risk in a software product:

Technical risks:

  • Complex logic (calculations, state machines, algorithms)
  • Recent changes (most bugs are introduced near the change)
  • New code (unfamiliar territory; no regression baseline)
  • Third-party integrations (external systems, APIs)
  • Concurrency and race conditions
  • Edge cases in data handling (null, empty, extreme values)
  • Security-sensitive paths (auth, payment, PII)
  • Performance-sensitive paths (high-traffic endpoints)

Business risks:

  • Revenue-critical flows (checkout, billing, subscription management)
  • High-visibility features (homepage, sign-up flow)
  • Legal/compliance requirements (GDPR, PCI-DSS, accessibility)
  • SLA commitments (uptime, response time)
  • Regulatory deadlines

Risk Assessment Matrix

Score each area on two axes:

Likelihood (1–5):

  1. Very unlikely — stable, well-tested, unchanged in months
  2. Unlikely — minor changes, good test coverage
  3. Possible — moderate changes, known complexity
  4. Likely — significant changes, limited coverage
  5. Very likely — new area, no coverage, complex logic

Impact (1–5):

  1. Negligible — cosmetic issue, no user impact
  2. Minor — user inconvenienced, easy workaround
  3. Moderate — partial feature failure, workaround exists
  4. Major — key feature unusable, no workaround
  5. Critical — data loss, security breach, revenue impact

Risk score = Likelihood × Impact (1–25)

| Risk score | Priority | Test depth |
|---|---|---|
| 20–25 | Critical | Full coverage; exploratory + automation |
| 12–19 | High | Broad coverage; automate key paths |
| 6–11 | Medium | Happy path + main negatives |
| 1–5 | Low | Basic smoke test or skip |
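
A minimal sketch of the scoring and banding above (illustrative Python; the band thresholds mirror the table):

```python
def risk_score(likelihood: int, impact: int) -> int:
    """Risk score = likelihood (1-5) x impact (1-5), giving 1-25."""
    assert 1 <= likelihood <= 5 and 1 <= impact <= 5
    return likelihood * impact

def priority(score: int) -> str:
    """Map a risk score to the priority bands in the table above."""
    if score >= 20:
        return "Critical"
    if score >= 12:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"

print(priority(risk_score(3, 5)))  # score 15 -> High
```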

Risk Register (Example)

For a checkout feature:

| Area | Likelihood | Impact | Score | Action |
|---|---|---|---|---|
| Payment processing | 3 | 5 | 15 | High — full test coverage, E2E |
| Promo code calculation | 4 | 4 | 16 | High — EP + BVA for all discount types |
| Order confirmation email | 2 | 3 | 6 | Medium — happy path + invalid email |
| Order history display | 2 | 2 | 4 | Low — basic smoke |
| Currency formatting | 3 | 3 | 9 | Medium — check all supported locales |
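
Turning the register into a test execution order is mostly a sort. A sketch using the example rows above (area names and scores come from the table; the code itself is illustrative):

```python
# Risk register entries from the example: (area, likelihood, impact)
register = [
    ("Payment processing", 3, 5),
    ("Promo code calculation", 4, 4),
    ("Order confirmation email", 2, 3),
    ("Order history display", 2, 2),
    ("Currency formatting", 3, 3),
]

# Sort by risk score (likelihood x impact), highest first:
# this is the order in which areas get tested.
ordered = sorted(register, key=lambda r: r[1] * r[2], reverse=True)
for area, likelihood, impact in ordered:
    print(f"{likelihood * impact:>2}  {area}")
```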

Risk-Based Test Planning in a Sprint

Before the sprint:

  1. Review the user stories and changes in scope
  2. Identify risk areas using the matrix
  3. Map test types to risk level (unit for logic, E2E for flows, performance for load paths)
  4. Assign time budget proportional to risk score
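
Step 4 can be sketched as a proportional split (all numbers are hypothetical; `total_hours` stands in for whatever testing time the sprint allows):

```python
# Hypothetical risk scores for the sprint's areas (from the matrix)
scores = {"payments": 15, "promo codes": 16, "emails": 6, "history": 4}
total_hours = 40  # hypothetical sprint testing budget

# Allocate hours proportionally to each area's share of total risk.
total_score = sum(scores.values())
budget = {area: round(total_hours * s / total_score, 1)
          for area, s in scores.items()}
print(budget)
```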

Sprint boundary:

  • Critical and High risks: test before release gate
  • Medium risks: test before release; can be descoped under schedule pressure
  • Low risks: test opportunistically; skip if time-pressured

At release: Risk-based regression. Don't retest everything on every release. Focus regression effort on:

  • Areas changed in this release
  • Areas that were High/Critical risk in previous sprints
  • Areas with historical defect density
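
A sketch of selecting the regression scope from those three criteria (module names, `density_threshold`, and the data are all hypothetical):

```python
def regression_scope(modules, changed, prior_high_risk, defect_counts,
                     density_threshold=5):
    """Risk-based regression selection: a module is in scope if it was
    changed this release, was High/Critical risk in previous sprints,
    or has a history of dense defects."""
    return sorted(
        m for m in modules
        if m in changed
        or m in prior_high_risk
        or defect_counts.get(m, 0) >= density_threshold
    )

# Hypothetical release data:
modules = ["checkout", "promo", "emails", "history", "currency"]
scope = regression_scope(
    modules,
    changed={"checkout"},
    prior_high_risk={"promo"},
    defect_counts={"currency": 7, "emails": 1},
)
print(scope)  # ['checkout', 'currency', 'promo']
```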

The 80/20 Rule for Testing

Roughly 80% of defects come from 20% of the code. Find that 20% by analysing:

  • Defect history — which modules have the most bugs?
  • Code complexity metrics (cyclomatic complexity) — which functions are hardest to reason about?
  • Change frequency (git history) — which files are edited most often?
  • Code review comments — which areas generate the most discussion?

Concentrate test effort on that 20%.

# Find the most-changed files in git history (filter the blank separator lines)
git log --pretty=format: --name-only | grep -v '^$' | sort | uniq -c | sort -rn | head -20

Failure Mode and Effect Analysis (FMEA)

A more rigorous risk-analysis technique, used in safety-critical systems (medical, automotive). For each component:

  1. Failure mode — what could go wrong?
  2. Effect — what happens when it fails?
  3. Severity (1–10)
  4. Occurrence — how likely to occur? (1–10)
  5. Detection — how likely to be caught before reaching users? (1–10, where 10 = very hard to detect)
  6. RPN (Risk Priority Number) = Severity × Occurrence × Detection

Actions prioritised by highest RPN.
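
The RPN calculation and prioritisation can be sketched as follows (component names and scores are hypothetical):

```python
# Hypothetical FMEA rows: (component, severity, occurrence, detection)
fmea = [
    ("dosage calculator", 9, 3, 4),
    ("audit log writer", 4, 5, 2),
    ("sensor driver", 7, 4, 8),
]

# RPN = Severity x Occurrence x Detection; act on the highest first.
by_rpn = sorted(fmea, key=lambda row: row[1] * row[2] * row[3], reverse=True)
for name, s, o, d in by_rpn:
    print(f"RPN {s * o * d:>3}  {name}")
```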


Risk Review During Testing

Risk evolves. Update the risk assessment when:

  • A bug is found in a previously "low risk" area — recalibrate
  • A feature grows in scope mid-sprint — rerun risk analysis
  • New third-party dependency introduced — add to risk register
  • Performance issue found — add load tests to scope

Communicating Risk to Stakeholders

When time pressure requires descoping tests, communicate the risk explicitly:

"We are releasing with the following known untested areas:
 - Currency conversion in the checkout: Medium risk, potential rounding errors
 - Concurrent order creation: Medium risk, potential race condition on inventory

Mitigation: We will monitor Sentry error rates post-release and have a 
rollback plan ready. These areas are scheduled for test coverage next sprint."

This makes risk visible. Stakeholders can accept it with awareness, or delay the release. Either is a valid outcome.


Common Failure Cases

Risk scores not updated mid-sprint after scope change — resources stay on original priorities.

  • Why: a new third-party dependency is added on day 3 of the sprint, but the risk register is not updated; testing effort stays concentrated on the originally scored areas and the new integration ships untested.
  • Detect: the risk register was last updated at sprint planning; any story that introduced a new external dependency since then is not represented.
  • Fix: make risk register review a standing agenda item at the mid-sprint sync; any new dependency, new endpoint, or changed algorithm added after planning triggers a risk re-score before the sprint ends.

Impact scores consistently underestimated for payment and auth flows.

  • Why: teams score payment impact as 4 instead of 5 because "we have fraud detection downstream"; the downstream mitigation is real but does not reduce the immediate impact of a checkout failure on revenue.
  • Detect: compare post-incident impact assessments with pre-sprint risk scores; if critical incidents consistently occur in areas scored 3 or below, the scoring criteria need calibration.
  • Fix: define explicit scoring anchors for impact 5 — any flow that directly handles money, credentials, or PII is automatically impact 5 regardless of downstream mitigations; mitigations reduce likelihood, not impact.

Risk register stays a spreadsheet that is never referenced during test execution.

  • Why: the register is created during test planning as a documentation artefact; testers work from their usual checklists during execution, and the risk scores are never used to prioritise which scenarios to run first.
  • Detect: ask a tester which risk areas they covered in their last session; if they cannot map their activities to items in the risk register, the register is decorative.
  • Fix: derive the test execution order directly from the risk register: sort by risk score descending and start testing from the top; if time runs out, the lowest-risk items are what gets cut.

FMEA performed once at project start and never revisited.

  • Why: the FMEA captures the architecture at launch; six months later, a new caching layer and two new integrations have been added, but RPN scores still reflect the original architecture.
  • Detect: compare the FMEA component list against the current architecture diagram; components present in the diagram but absent from the FMEA are unscored risk.
  • Fix: trigger an FMEA review whenever a new component is added to the architecture (new service, new dependency, new data store); the review need not redo the whole FMEA — add only the new components and re-score affected neighbours.

Connections

Open Questions

  • What testing scenarios does this technique systematically miss?
  • How does this approach need to change when delivery cadence moves to continuous deployment?