Non-Functional Testing

Testing how the system behaves under conditions such as load, failure, and sustained use, not just whether it does the right thing. Non-functional requirements are often implicit: users expect fast, reliable, secure, usable software even when none of that is in the ticket.


Non-Functional Requirement Categories

Category         Question it answers                     Test approach
Performance      How fast under load?                    Load testing (k6, JMeter, Locust)
Scalability      Does it hold up as load grows?          Stress/spike testing
Reliability      Does it stay up?                        Chaos engineering, SLO tracking
Security         Is it safe from attack?                 SAST, DAST, pen testing
Usability        Can users accomplish their goals?       User testing, heuristic eval
Accessibility    Can disabled users use it?              axe-core, screen reader testing
Compatibility    Does it work across browsers/devices?   Cross-browser, device lab
Maintainability  Can engineers change it?                Code quality metrics, test coverage
Compliance       Does it meet regulations?               GDPR, WCAG, PCI DSS audits

Performance Testing Types

Load test      — sustained expected load (e.g., 500 concurrent users for 30 min)
Stress test    — ramp beyond capacity to find the breaking point
Spike test     — sudden large jump (Black Friday: 10x baseline in 30 seconds)
Soak test      — moderate load sustained over hours to find memory leaks
Volume test    — large data set (1M records) to find database/query bottlenecks
Baseline test  — single user, document response time under no load
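
A minimal load test sketch with Locust (one of the tools in the table above); the host, endpoint, and user counts are illustrative:

# locustfile.py
from locust import HttpUser, task, between

class BrowsingUser(HttpUser):
    wait_time = between(1, 3)  # each simulated user pauses 1-3s between requests

    @task
    def list_products(self):
        self.client.get("/products")

# Load test: locust -f locustfile.py --headless -u 500 -r 50 --run-time 30m --host http://localhost:3000
# Soak test: same command with --run-time 2h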

Reliability — SLOs and Error Budgets

SLI (Service Level Indicator): measured value — "99.2% of requests completed in < 500ms this month"
SLO (Service Level Objective): target — "99.9% of requests under 500ms per 30-day window"
SLA (Service Level Agreement): contractual commitment — penalty if SLO missed

Error budget = 100% - SLO = allowed downtime/failure
  99.9% SLO  → 43.2 min of allowed downtime per 30-day window
  99.95% SLO → 21.6 min per 30-day window
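
The arithmetic is worth scripting so the numbers stay consistent; a quick sketch:

def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Allowed downtime in minutes over the window for a given SLO."""
    return (1 - slo) * window_days * 24 * 60

print(error_budget_minutes(0.999))   # 43.2
print(error_budget_minutes(0.9995))  # 21.6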

When error budget is depleted:
  - Freeze feature work
  - Focus entirely on reliability improvements
  - Post-mortem required

Chaos Engineering

Deliberately injecting failures to verify that systems degrade gracefully.

# AWS Fault Injection Service (FIS) — stop ECS tasks
# (role ARN, alarm ARN, and tag filter below are illustrative)
aws fis create-experiment-template \
  --description "Terminate 25% of ECS tasks" \
  --role-arn arn:aws:iam::123456789:role/fis-experiment-role \
  --stop-conditions '[{"source": "aws:cloudwatch:alarm", "value": "arn:aws:cloudwatch:eu-west-1:123456789:alarm:high-5xx-rate"}]' \
  --actions '{
    "terminateTasks": {
      "actionId": "aws:ecs:stop-task",
      "targets": {"Tasks": "taskTargets"}
    }
  }' \
  --targets '{
    "taskTargets": {
      "resourceType": "aws:ecs:task",
      "resourceTags": {"app": "myapp"},
      "selectionMode": "PERCENT(25)"
    }
  }'
# Toxiproxy — simulate network conditions in tests
# (a sketch using the toxiproxy-python client; assumes a Toxiproxy server is running)
from toxiproxy import Toxiproxy

server = Toxiproxy()
proxy = server.create(name='myapp_db', listen='0.0.0.0:5433', upstream='postgres:5432')

# Add 1s of latency (200ms jitter) to every connection through the proxy
proxy.add_toxic(type='latency', attributes={'latency': 1000, 'jitter': 200})

# Test that app handles 1s DB latency within timeout
result = my_app_function_under_test()
assert result is not None  # degraded, not failed

proxy.destroy()  # tear down the proxy and its toxics

Chaos tools: AWS FIS, Chaos Monkey (Netflix), Toxiproxy, Gremlin, LitmusChaos (Kubernetes).


Usability Testing

Heuristic evaluation (expert review against Nielsen's 10 heuristics):
  1. Visibility of system status
  2. Match between system and real world
  3. User control and freedom
  4. Consistency and standards
  5. Error prevention
  6. Recognition rather than recall
  7. Flexibility and efficiency of use
  8. Aesthetic and minimalist design
  9. Help users recognise, diagnose, and recover from errors
  10. Help and documentation

Guerrilla testing (5 minutes, anyone nearby):
  - Give them a task, observe without helping
  - Note where they hesitate, misclick, or express confusion
  - 5 users reveal ~85% of usability problems (Nielsen)

Performance Budgets

# Lighthouse CI — fail build if metrics regress
# .lighthouserc.yaml
ci:
  collect:
    url: ['http://localhost:3000/', 'http://localhost:3000/products']
    numberOfRuns: 3
  assert:
    assertions:
      first-contentful-paint:
        - warn
        - maxNumericValue: 2000
      largest-contentful-paint:
        - error
        - maxNumericValue: 4000
      cumulative-layout-shift:
        - error
        - maxNumericValue: 0.1
      total-blocking-time:
        - error
        - maxNumericValue: 300
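
With the config above committed, a single command collects the runs and enforces the assertions in CI (assumes the @lhci/cli npm package):

npx @lhci/cli autorun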

Common Failure Cases

NFRs never written, so performance testing has no pass/fail criteria
  Why: performance is treated as a concern only after a slow complaint arrives; no acceptance criteria exist against which a load test result can fail.
  Detect: the load test report shows p95 = 1.2s but there is no SLO to compare it to; the team cannot determine whether to block release.
  Fix: add performance acceptance criteria in Given/When/Then format to every user-facing story before sprint start, as in the example below; the presence of a failing assertion makes the test a real gate.
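
For example (user count and threshold are illustrative):
  Given 500 concurrent users on the product list page
  When each user requests the page
  Then p95 response time is under 800ms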

Soak test skipped because load test passed — memory leak ships to production
  Why: a 5-minute load test does not exercise the leak; only the 2-hour soak test would catch connection pool exhaustion or unbounded cache growth.
  Detect: application memory climbs monotonically in production Grafana charts and requires a restart every 48 hours.
  Fix: make the soak test mandatory (not optional) for any sprint that introduces a new background job, caching layer, or connection pool; schedule it as a nightly CI job rather than a pre-release manual step.

Chaos experiment run against production without a rollback plan
  Why: the FIS experiment template terminates 25% of ECS tasks, but the auto-scaling policy has a misconfigured scale-out delay, leaving the cluster degraded for 12 minutes.
  Detect: a synthetic monitor fires an alert within 2 minutes; no runbook exists to stop the experiment early.
  Fix: always define a stopCondition in the FIS template tied to an alarm (e.g., 5xx rate > 5%), and document the aws fis stop-experiment command in the runbook before executing any chaos experiment.

Lighthouse CI assertions never tighten — performance budget drifts upward
  Why: the initial maxNumericValue thresholds are set generously and never revised; LCP creeps from 2.1s to 3.8s over six months without any CI failure.
  Detect: compare the current Lighthouse CI thresholds against the actual p95 values from last month's runs; if a threshold sits more than 30% above current actuals, the budget is not functioning as a gate.
  Fix: after each quarter, lower the maxNumericValue thresholds to 110% of the measured actuals, treating performance budgets as a ratchet that can only tighten (sketched below).
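
A minimal sketch of that ratchet (function and values are illustrative):

def ratchet(current_threshold_ms: float, measured_p95_ms: float) -> float:
    """New budget: 110% of measured actuals, never looser than before."""
    return min(current_threshold_ms, measured_p95_ms * 1.1)

print(ratchet(4000, 2100))  # 2310.0, so the LCP budget tightens from 4000ms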

Connections

qa-hub · qa/test-strategy · qa/risk-based-testing · qa/accessibility-testing · qa/cross-browser-testing · technical-qa/performance-testing · cloud/cloud-monitoring

Open Questions

  • What testing scenarios does this technique systematically miss?
  • How does this approach need to change when delivery cadence moves to continuous deployment?