Test Reporting and Dashboards
Making test results visible, actionable, and trackable over time.
Why Reporting Matters
"The tests are green" is insufficient information for stakeholders.
Good reporting answers:
- What was tested, and what wasn't? (coverage)
- Which tests failed, and why? (actionable failures)
- Is quality improving over time? (trend)
- Which tests are unreliable? (flaky test tracking)
- How fast is the suite? (execution time)
Good reporting changes behaviour:
- Engineers notice when coverage drops → write more tests
- QA notices flaky test trends → investigates and fixes
- Product sees quality metrics → makes prioritisation decisions
Allure Report with pytest
pip install allure-pytest
pytest tests/ --alluredir=./allure-results
allure serve ./allure-results # opens browser with interactive report
allure generate ./allure-results --clean -o ./allure-report

# Enrich tests with Allure metadata
import allure
import pytest
from playwright.sync_api import expect


@allure.feature("Order Management")
@allure.story("Order Placement")
@allure.severity(allure.severity_level.CRITICAL)
def test_place_order_success(page, client):
    with allure.step("Navigate to checkout"):
        page.goto("http://localhost:3000/checkout")

    with allure.step("Fill payment details"):
        page.get_by_label("Card number").fill("4242424242424242")

    with allure.step("Submit order"):
        page.get_by_role("button", name="Place order").click()

    with allure.step("Verify confirmation"):
        expect(page.get_by_role("heading", name="Order confirmed")).to_be_visible()

    # Attach screenshot to report
    allure.attach(
        page.screenshot(),
        name="Order confirmation",
        attachment_type=allure.attachment_type.PNG,
    )


@allure.feature("Order Management")
@allure.story("Order Cancellation")
@pytest.mark.parametrize("status", ["pending", "confirmed"])
def test_cancel_allowed_states(status: str, make_order, client):
    allure.dynamic.title(f"Cancel order in {status} state")
    ...
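Attaching screenshots inside every test gets repetitive. A hook in conftest.py can attach one automatically whenever a UI test fails; this is a minimal sketch that assumes pytest-playwright's page fixture (the fixture lookup and attachment name are illustrative):

# conftest.py: attach a screenshot to the Allure report when a UI test fails (sketch)
import allure
import pytest


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        page = item.funcargs.get("page")  # None for non-UI tests
        if page is not None:
            allure.attach(
                page.screenshot(),
                name="failure-screenshot",
                attachment_type=allure.attachment_type.PNG,
            )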
CI Pipeline Reporting

# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: pytest tests/ --alluredir=allure-results --junitxml=junit.xml -q
      - name: Upload Allure results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: allure-results
          path: allure-results/
      - name: Publish JUnit results
        uses: EnricoMi/publish-unit-test-result-action@v2
        if: always()
        with:
          files: junit.xml
      - name: Comment PR with test summary
        uses: actions/github-script@v7
        if: github.event_name == 'pull_request' && always()
        with:
          script: |
            const fs = require('fs');
            const xml = fs.readFileSync('junit.xml', 'utf8');
            // Parse and summarise...
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Test Results\n✅ 142 passed | ❌ 2 failed | ⏱ 45s`,
            });

Slack Notifications
# notify_slack.py — called from CI after test run
import httpx, sys, xml.etree.ElementTree as ET


def parse_junit(junit_path: str) -> dict:
    root = ET.parse(junit_path).getroot()
    # pytest nests results as <testsuites><testsuite ...>; the counts live on <testsuite>
    suite = root if root.tag == "testsuite" else root.find("testsuite")
    return {
        "tests": int(suite.attrib.get("tests", 0)),
        "failures": int(suite.attrib.get("failures", 0)),
        "errors": int(suite.attrib.get("errors", 0)),
        "skipped": int(suite.attrib.get("skipped", 0)),
        "time": float(suite.attrib.get("time", 0)),
    }


def notify_slack(webhook_url: str, junit_path: str, build_url: str, branch: str) -> None:
    stats = parse_junit(junit_path)
    passed = stats["tests"] - stats["failures"] - stats["errors"]
    success = stats["failures"] == 0 and stats["errors"] == 0
    colour = "good" if success else "danger"
    status = "PASSED" if success else "FAILED"
    payload = {
        "attachments": [{
            "color": colour,
            "title": f"Test Suite {status} — {branch}",
            "title_link": build_url,
            "fields": [
                {"title": "Tests", "value": str(stats["tests"]), "short": True},
                {"title": "Passed", "value": str(passed), "short": True},
                {"title": "Failed", "value": str(stats["failures"]), "short": True},
                {"title": "Duration", "value": f"{stats['time']:.0f}s", "short": True},
            ],
        }]
    }
    httpx.post(webhook_url, json=payload).raise_for_status()


if __name__ == "__main__":
    notify_slack(
        webhook_url=sys.argv[1],
        junit_path=sys.argv[2],
        build_url=sys.argv[3],
        branch=sys.argv[4],
    )

Flaky Test Tracking
# pytest plugin: track flaky tests across runs
# Write results to a shared store (S3, database, or file)
import json
from pathlib import Path


class FlakyTracker:
    """pytest plugin that records pass/fail per test across runs."""

    def __init__(self, history_path: str) -> None:
        self.history_path = Path(history_path)
        self.results: dict[str, list[str]] = self._load()

    def _load(self) -> dict:
        if self.history_path.exists():
            return json.loads(self.history_path.read_text())
        return {}

    def pytest_runtest_logreport(self, report) -> None:
        # Record only the test body phase; ignore setup/teardown and skipped tests
        if report.when == "call" and not report.skipped:
            node_id = report.nodeid
            outcome = "pass" if report.passed else "fail"
            self.results.setdefault(node_id, []).append(outcome)
            # Keep last 20 results
            self.results[node_id] = self.results[node_id][-20:]

    def pytest_sessionfinish(self) -> None:
        self.history_path.write_text(json.dumps(self.results, indent=2))

    def get_flaky_tests(self, threshold: float = 0.8) -> list[str]:
        """Tests that pass < threshold% of the time are considered flaky."""
        flaky = []
        for test, outcomes in self.results.items():
            if len(outcomes) >= 5:
                pass_rate = outcomes.count("pass") / len(outcomes)
                if 0.1 < pass_rate < threshold:  # not always failing, not always passing
                    flaky.append(test)
        return flaky
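The hooks only fire if the tracker is registered as a plugin. A minimal sketch of the wiring in conftest.py; the flaky_tracker module name and FLAKY_HISTORY environment variable are assumed conventions, adjust them to your project layout:

# conftest.py: register the tracker so its hooks actually run (sketch)
import os

from flaky_tracker import FlakyTracker  # module name assumed


def pytest_configure(config):
    history_path = os.environ.get("FLAKY_HISTORY", ".flaky-history.json")
    config.pluginmanager.register(FlakyTracker(history_path), name="flaky-tracker")

get_flaky_tests() can then be called from a nightly job to publish the current flaky list to a dashboard or Slack channel.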
Dashboard Metrics to Track

Metric                      Target         How to measure
───────────────────────────────────────────────────────────────
Test pass rate              > 99%          JUnit XML / Allure
Suite duration trend        Stable or ↓    CI job timing
Flaky test rate             < 1%           FlakyTracker over 30 days
Code coverage               > 80%          pytest-cov report
Coverage trend              Stable or ↑    Compare across PRs
Time to first test result   < 5 min        Pre-commit hook timing
E2E suite duration          < 30 min       CI job timing
New tests per sprint        Positive       Git diff on tests/
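Most of these numbers fall out of artifacts the suite already produces. A rough sketch that derives pass rate, duration, and coverage from junit.xml and pytest-cov's Cobertura coverage.xml; parse_junit is the helper defined above, and the notify_slack module name is an assumption:

# collect_metrics.py: emit dashboard metrics from existing test artifacts (sketch)
import json
import xml.etree.ElementTree as ET

from notify_slack import parse_junit  # helper defined earlier; module name assumed


def collect_metrics(junit_path: str, coverage_path: str) -> dict:
    stats = parse_junit(junit_path)
    failed = stats["failures"] + stats["errors"]
    # pytest-cov's XML report is Cobertura format; line-rate sits on the root element
    coverage = float(ET.parse(coverage_path).getroot().attrib["line-rate"])
    return {
        "pass_rate": (stats["tests"] - failed) / max(stats["tests"], 1),
        "suite_duration_s": stats["time"],
        "coverage_pct": coverage * 100,
    }


if __name__ == "__main__":
    print(json.dumps(collect_metrics("junit.xml", "coverage.xml"), indent=2))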
Common Failure Cases
Allure report shows ghost tests when stale results are not cleared
Why: pytest --alluredir appends result files to an existing allure-results directory rather than clearing it, and allure generate needs --clean to replace an existing report directory; if a test was renamed or deleted, its old result files survive and appear as ghost tests in the report.
Detect: the Allure report shows tests that no longer exist in the codebase, or a renamed test appears twice.
Fix: pass --clean-alluredir to pytest (or delete allure-results/ before the run) and always pass --clean to allure generate in CI; upload the raw allure-results/ directory as the artifact and regenerate the report on each run.
Slack notification script reports "0 failed" when junit.xml is missing
Why: ET.parse() raises FileNotFoundError if pytest exits before writing the JUnit file (e.g., collection error); if the script catches all exceptions and defaults to success, the Slack message incorrectly shows a green build.
Detect: CI pipeline shows a red pytest step followed by a green Slack notification.
Fix: make the script fail explicitly if junit.xml does not exist (sys.exit(1)); keep the notification step on if: always() so it still runs after failures, but let the missing-file check turn a broken run into a loud error instead of a green message.
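A guard along these lines in notify_slack.py's entry point keeps a missing report file from silently becoming a green build (sketch; Path comes from pathlib, sys is already imported in the script):

# notify_slack.py entry point with an explicit existence check (sketch)
from pathlib import Path

if __name__ == "__main__":
    junit_path = sys.argv[2]
    if not Path(junit_path).exists():
        # A missing report means pytest died before writing results: fail, don't report green
        sys.stderr.write(f"{junit_path} not found; pytest likely failed before reporting\n")
        sys.exit(1)
    notify_slack(sys.argv[1], junit_path, sys.argv[3], sys.argv[4])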
FlakyTracker history file grows unbounded and causes slow CI startup
Why: self.results[node_id][-20:] caps per-test history at 20 runs, but the JSON file accumulates entries for every test ever run, including deleted tests; over months the file can reach tens of MB.
Detect: CI setup time increases gradually; the history file exceeds 10 MB.
Fix: add a periodic cleanup step that removes entries for tests not present in the current pytest --collect-only output, or use a database with a TTL index instead of a flat JSON file.
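One possible shape for that cleanup, pruning history entries whose node IDs no longer collect; the collect-only parsing is deliberately naive and assumes the script runs from the repository root:

# prune_flaky_history.py: drop history entries for tests that no longer exist (sketch)
import json
import subprocess
from pathlib import Path


def prune_history(history_path: str = ".flaky-history.json") -> None:
    path = Path(history_path)
    history = json.loads(path.read_text())
    # `pytest --collect-only -q` prints one node ID per line, plus a summary line
    out = subprocess.run(
        ["pytest", "--collect-only", "-q"], capture_output=True, text=True, check=True
    ).stdout
    current = {line.strip() for line in out.splitlines() if "::" in line}
    pruned = {node: runs for node, runs in history.items() if node in current}
    path.write_text(json.dumps(pruned, indent=2))


if __name__ == "__main__":
    prune_history()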
PR comment shows incorrect test counts when tests run in parallel with -n auto
Why: a single pytest invocation with -n auto still writes one merged junit.xml, but suites sharded across CI jobs (or invoked once per worker with separate --junitxml paths) produce multiple XML files; if the comment step reads only the first file or concatenates them incorrectly, it reports a subset of the actual test count.
Detect: PR comment shows 20 tests when the full suite has 200; failure counts are understated.
Fix: prefer a single --junitxml=junit.xml per pytest invocation and let xdist merge results internally; where multiple XML files are unavoidable, merge them with junitparser before parsing.
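Where per-shard files do exist, a short merge step with junitparser keeps the downstream parsing unchanged; the junit-*.xml glob is an assumed naming convention:

# merge_junit.py: combine per-shard JUnit files before reporting (sketch)
from pathlib import Path

from junitparser import JUnitXml  # pip install junitparser

merged = JUnitXml()
for report in sorted(Path(".").glob("junit-*.xml")):  # per-shard file pattern assumed
    merged += JUnitXml.fromfile(str(report))
merged.write("junit.xml")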
Connections
tqa-hub · technical-qa/flaky-test-management · technical-qa/parallel-test-execution · qa/qa-metrics · qa/test-reporting · cloud/github-actions
Open Questions
- What is the most common failure mode when implementing this at scale?
- How does this testing approach need to adapt for distributed or microservice architectures?
Related reading