Visual Testing
Automated comparison of UI screenshots against approved baselines to catch unintended visual regressions. Complements functional tests. You can test that a button is clickable without testing that it looks correct.
Why Visual Testing
Functional tests verify behaviour: "button exists and is clickable"
Visual tests verify appearance: "button is the right colour, size, and position"
Visual regressions happen when:
- CSS change unintentionally shifts layout
- Font loading changes text reflow
- Z-index change hides an element under another
- Dark mode styling breaks
- Responsive breakpoints misfire
Playwright Screenshot Testing
Built-in, no extra setup. Baseline images stored in the repo.
// tests/visual/homepage.spec.ts
import { test, expect } from '@playwright/test';

test.describe('Visual regression', () => {
  test('homepage matches baseline', async ({ page }) => {
    await page.goto('/');
    await page.waitForLoadState('networkidle');
    await expect(page).toHaveScreenshot('homepage.png', {
      fullPage: true,
      maxDiffPixelRatio: 0.02, // 2% pixel difference allowed (anti-aliasing)
      threshold: 0.2, // per-pixel colour threshold 0-1
      animations: 'disabled', // disable CSS animations for stable captures
    });
  });

  test('product card component', async ({ page }) => {
    await page.goto('/products/1');
    const card = page.locator('[data-testid="product-card"]').first();
    await expect(card).toHaveScreenshot('product-card.png');
  });

  test('mobile homepage', async ({ page }) => {
    await page.setViewportSize({ width: 375, height: 812 });
    await page.goto('/');
    await expect(page).toHaveScreenshot('homepage-mobile.png');
  });
});

# First run: generate baselines
npx playwright test tests/visual/ --update-snapshots
# Subsequent runs: compare against baseline
npx playwright test tests/visual/
# Update specific snapshot
npx playwright test --update-snapshots --grep "homepage"
# View diff report when tests fail
npx playwright show-report

Percy (BrowserStack Visual Testing)
Cloud service with AI-powered comparison. Renders snapshots across multiple browsers/resolutions simultaneously.
// playwright + Percy
import { percySnapshot } from '@percy/playwright';
import { test } from '@playwright/test';

test('checkout page visual snapshot', async ({ page }) => {
  await page.goto('/checkout');
  await page.waitForLoadState('networkidle');
  await percySnapshot(page, 'Checkout Page', {
    widths: [375, 768, 1280], // capture at multiple widths
    enableJavaScript: true,
  });
});

# Run with Percy token, wrapped in the Percy CLI so snapshots are uploaded (requires @percy/cli)
PERCY_TOKEN=your_token npx percy exec -- npx playwright test tests/visual/
# Percy compares against approved baseline, flags regressions in its UI
# Engineers approve or reject each diff in the Percy dashboard

Applitools Eyes
# Python + Playwright + Applitools
import os

from applitools.playwright import Eyes, Target

def test_product_page_visual(page):
    eyes = Eyes()
    eyes.api_key = os.environ["APPLITOOLS_API_KEY"]
    try:
        eyes.open(page, "MyApp", "Product Page", {"width": 1280, "height": 800})
        page.goto("/products/1")
        eyes.check(Target.window().fully())
        eyes.close()
    except Exception:
        eyes.abort()  # release the Eyes session if anything failed before close()
        raise

Applitools uses AI to ignore irrelevant diffs (dynamic dates, adverts) while catching layout regressions.
Component-Level Visual Testing (Storybook + Chromatic)
For component libraries:
# .github/workflows/chromatic.yaml
- name: Publish to Chromatic
  uses: chromaui/action@v1
  with:
    projectToken: ${{ secrets.CHROMATIC_PROJECT_TOKEN }}
    exitOnceUploaded: true # non-blocking in CI; review in Chromatic UI

Chromatic automatically diffs each Storybook story against the previous approved baseline. Reviewers approve/reject in the Chromatic UI before merging.
When Visual Tests Fail
# Playwright — view interactive diff report
npx playwright show-report
# Update baselines after intentional UI changes
npx playwright test --update-snapshots
# Commit updated baselines (must be intentional, not reflexive)
git add tests/visual/**/*.png
git commit -m "chore: update visual baselines after brand refresh"

Policy: never auto-update snapshots in CI without human review. Baseline updates should require PR approval from a designer or QA lead.
Masking Dynamic Content
// Mask dynamic elements before screenshot (timestamps, user IDs, ads)
await page.goto('/dashboard');
await expect(page).toHaveScreenshot('dashboard.png', {
  mask: [
    page.locator('[data-testid="timestamp"]'),
    page.locator('[data-testid="user-avatar"]'),
    page.locator('.ad-container'),
  ],
  maskColor: '#ff00ff', // magenta placeholder colour (visible in diff)
});

Common Failure Cases
Baselines generated on macOS differ from CI Linux renders, causing permanent failures
Why: font rendering, anti-aliasing, and sub-pixel hinting differ between macOS and Linux; a baseline committed from a developer machine will typically fail on a Linux CI runner, even with maxDiffPixelRatio: 0.02, because the differences are spread along every text edge.
Detect: visual tests pass locally but always fail in CI with small pixel differences concentrated around text edges.
Fix: generate and commit baselines only inside CI (run --update-snapshots in a dedicated CI job) and never commit locally-generated baselines; use Docker to ensure the local render environment matches CI.
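To see how thin differences along text edges can add up past a ratio budget, the sketch below counts differing pixels between two RGBA buffers in the spirit of maxDiffPixelRatio and threshold. It is a simplified illustration, not Playwright's actual comparator; the function name diffPixelRatio and the per-channel distance metric are assumptions made for this example.

```typescript
// Simplified sketch: fraction of pixels whose colour differs by more than `threshold`.
// Both buffers hold RGBA bytes (4 per pixel) for images of identical dimensions.
function diffPixelRatio(
  baseline: Uint8Array,
  actual: Uint8Array,
  threshold = 0.2, // 0 = any difference counts, 1 = nothing counts
): number {
  if (baseline.length !== actual.length || baseline.length % 4 !== 0) {
    throw new Error('Buffers must be RGBA data of identical size');
  }
  const pixelCount = baseline.length / 4;
  if (pixelCount === 0) return 0;

  let differing = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    // Largest per-channel distance, normalised to 0..1 (assumed metric).
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      const delta = Math.abs(baseline[i + c] - actual[i + c]) / 255;
      if (delta > maxDelta) maxDelta = delta;
    }
    if (maxDelta > threshold) differing++;
  }
  return differing / pixelCount;
}

// Example: 100 white pixels, 3 of which render black in the second capture.
const white = new Uint8Array(100 * 4).fill(255);
const changed = white.slice();
for (const p of [10, 11, 12]) changed.set([0, 0, 0, 255], p * 4);

const ratio = diffPixelRatio(white, changed);
console.log(ratio > 0.02 ? 'fail' : 'pass'); // prints "fail": 3% differs, above a 2% budget
```

Raising the budget until cross-platform noise passes also lets real regressions of the same size through, which is why matching the render environment is the more reliable fix.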
CSS animations cause non-deterministic screenshots
Why: screenshots captured mid-animation differ from those captured at rest; even animations: 'disabled' in Playwright does not stop JavaScript-driven animations or third-party libraries that use requestAnimationFrame directly.
Detect: intermittent pixel diffs always in the same region of the page (e.g., a loading spinner, a carousel); failures are not reproducible on re-run.
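Fix: add a waitForLoadState('networkidle') plus an explicit wait for an animation-completion attribute (e.g., page.waitForSelector('[data-animation-done]')), or use page.evaluate(() => document.getAnimations().forEach(a => a.finish())) to force animations to completion. document.getAnimations() only covers CSS animations, CSS transitions, and Web Animations API animations, so requestAnimationFrame-driven animations still need the explicit completion signal.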
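One caveat with forcing completion is that finish() throws on animations that never end, such as a looping spinner. The sketch below is one possible way to handle both cases: finite animations are finished and infinite ones are cancelled. The AnimationLike interface and the settleAnimations name are hypothetical, introduced only so the logic can run outside a browser; in a page, the objects would come from document.getAnimations().

```typescript
// Minimal structural view of the Web Animations API members this sketch touches.
interface AnimationLike {
  finish(): void;
  cancel(): void;
  effect: { getComputedTiming(): { endTime?: unknown } } | null;
}

// Finish finite animations; cancel infinite ones, since finish() would throw on them.
function settleAnimations(animations: AnimationLike[]): { finished: number; cancelled: number } {
  let finished = 0;
  let cancelled = 0;
  for (const animation of animations) {
    const endTime = animation.effect?.getComputedTiming().endTime;
    if (endTime === Infinity) {
      animation.cancel();
      cancelled++;
    } else {
      animation.finish();
      finished++;
    }
  }
  return { finished, cancelled };
}

// Example with stand-in objects: a 300 ms fade and an endlessly looping spinner.
const calls: string[] = [];
const fake = (name: string, endTime: number): AnimationLike => ({
  finish: () => { calls.push(`${name}:finish`); },
  cancel: () => { calls.push(`${name}:cancel`); },
  effect: { getComputedTiming: () => ({ endTime }) },
});

const summary = settleAnimations([fake('fade-in', 300), fake('spinner', Infinity)]);
console.log(summary, calls);
```

In a Playwright test the same loop would need to be written inline inside the function passed to page.evaluate, because that function is serialised and cannot reference helpers defined in the test file.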
--update-snapshots committed reflexively in CI, masking real regressions
Why: developers update snapshots to fix a failing build without reviewing the visual diff; a genuine regression (e.g., broken layout) is approved and the new broken state becomes the baseline.
Detect: baseline images change in a PR but no corresponding UI change was made; the diff shows significant structural layout changes.
Fix: enforce a policy where snapshot updates require a separate PR with mandatory design/QA review; never run --update-snapshots in the main CI gate.
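The detection heuristic above can be partly automated. The sketch below takes the list of files changed in a PR (for example, the output of git diff --name-only) and flags baseline changes that arrive without any UI source change. The function name, the directory conventions (tests/visual/ for baselines, src/ for UI source), and the file extensions are assumptions for illustration; adjust them to the repository's layout.

```typescript
type BaselineVerdict = 'no-baseline-change' | 'baseline-with-ui-change' | 'baseline-only';

// Assumed convention: baseline images are PNG files under tests/visual/.
const isBaseline = (path: string): boolean =>
  path.startsWith('tests/visual/') && path.endsWith('.png');

// Assumed convention: UI source lives under src/ with these extensions.
const isUiSource = (path: string): boolean =>
  path.startsWith('src/') && /\.(tsx?|jsx?|css|scss|html)$/.test(path);

// Classify a PR by whether baseline updates are accompanied by UI source changes.
function classifyBaselineChange(changedFiles: string[]): BaselineVerdict {
  if (!changedFiles.some(isBaseline)) return 'no-baseline-change';
  return changedFiles.some(isUiSource) ? 'baseline-with-ui-change' : 'baseline-only';
}

// Example: a PR that touches a baseline image but no UI code.
const verdict = classifyBaselineChange([
  'tests/visual/homepage.spec.ts-snapshots/homepage-chromium-linux.png',
  'README.md',
]);
console.log(verdict); // prints "baseline-only"
```

A 'baseline-only' verdict is a prompt for closer review rather than proof of a problem; a check like this supplements the design/QA approval step and does not replace looking at the diff.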
Dynamic content not masked produces false-positive failures on every run
Why: timestamps, user avatars, session IDs, and ad slots change between runs; without masking, every screenshot differs from the baseline.
Detect: visual tests fail on every PR regardless of code changes; the diff always shows the same dynamic region.
Fix: identify all dynamic regions with data-testid attributes and add them to the mask array in toHaveScreenshot; verify the masked regions are covered by functional tests elsewhere.
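Conceptually, masking paints a solid box over each dynamic region so that region is identical in every capture. The sketch below applies that idea to a raw RGBA buffer; the applyMask function, the Rect type, and the buffer layout are illustrative assumptions and not Playwright internals.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Return a copy of the image with each rect painted in a solid colour,
// so pixels inside the rects no longer depend on page content.
function applyMask(
  rgba: Uint8Array,
  imageWidth: number,
  rects: Rect[],
  colour: [number, number, number] = [255, 0, 255], // magenta, like maskColor '#ff00ff'
): Uint8Array {
  const out = rgba.slice();
  const imageHeight = rgba.length / 4 / imageWidth;
  for (const r of rects) {
    // Clamp the rect to the image bounds.
    const xEnd = Math.min(r.x + r.width, imageWidth);
    const yEnd = Math.min(r.y + r.height, imageHeight);
    for (let y = Math.max(r.y, 0); y < yEnd; y++) {
      for (let x = Math.max(r.x, 0); x < xEnd; x++) {
        out.set([colour[0], colour[1], colour[2], 255], (y * imageWidth + x) * 4);
      }
    }
  }
  return out;
}

// Example: two 4x2 captures that differ only at pixel (1, 0), e.g. a timestamp.
const width = 4;
const runA = new Uint8Array(width * 2 * 4).fill(255);
const runB = runA.slice();
runB.set([0, 0, 0, 255], 1 * 4);

const mask: Rect[] = [{ x: 1, y: 0, width: 1, height: 1 }];
const maskedA = applyMask(runA, width, mask);
const maskedB = applyMask(runB, width, mask);
const identical = maskedA.every((byte, i) => byte === maskedB[i]);
console.log(identical); // prints true: the dynamic pixel is hidden in both captures
```

The trade-off is visible in the sketch: anything inside a masked rectangle is invisible to the comparison, which is why masked regions need coverage from functional tests.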
Connections
tqa-hub · technical-qa/playwright-advanced · technical-qa/cypress · qa/cross-browser-testing · qa/regression-testing · qa/accessibility-testing
Open Questions
- What is the most common failure mode when implementing this at scale?
- How does this testing approach need to adapt for distributed or microservice architectures?
Related reading