Project 31: Visual Regression Testing - Screenshot Diff Engine

Build a visual regression testing system that captures screenshots through Chrome MCP, compares them to baselines, highlights differences, and uses Claude’s visual reasoning to analyze what changed and why it matters.

Quick Reference

Attribute        Value
Difficulty       Advanced
Time Estimate    1-2 weeks
Language         TypeScript (Alternatives: Python, Go)
Prerequisites    Projects 29-30, image processing concepts
Key Topics       Visual testing, image comparison, baseline management, CI/CD integration
Main Book        “Practical Test-Driven Development” by Viktor Farcic

1. Learning Objectives

By completing this project, you will:

  1. Capture consistent screenshots: Handle viewport sizing, timing, and dynamic content
  2. Implement image comparison: Understand pixel diff, perceptual hashing, and thresholds
  3. Manage baselines: Version control visual baselines and handle intentional changes
  4. Leverage Claude’s visual reasoning: Go beyond pixel counts to semantic change analysis
  5. Handle test flakiness: Mask dynamic content and stabilize captures
  6. Generate actionable reports: Create visual diff reports that developers can act on

2. Theoretical Foundation

2.1 Why Visual Testing Matters

Functional tests verify behavior. Visual tests verify appearance. Many bugs slip through functional tests:

┌──────────────────────────────────────────────────────────────────────────────┐
│                    What Functional Tests Miss                                │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Button works                    Button hidden behind overlay                │
│  ✓ click handler fires           ✗ user can't see it                         │
│                                                                              │
│  Form validates                  Form text is white on white                 │
│  ✓ error message set             ✗ error is invisible                        │
│                                                                              │
│  Modal opens                     Modal is 10,000px wide                      │
│  ✓ DOM element present           ✗ completely broken UI                      │
│                                                                              │
│  CSS property set                CSS has conflicting rules                   │
│  ✓ style is applied              ✗ wrong style wins cascade                  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

2.2 Image Comparison Algorithms

Three main approaches to comparing images:

┌──────────────────────────────────────────────────────────────────────────────┐
│                     Comparison Algorithm Spectrum                            │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Pixel-by-Pixel              Perceptual Hashing           Structural         │
│  ───────────────             ──────────────────           ──────────         │
│                                                                              │
│  Compare each pixel          Hash visual features         Compare layout     │
│  RGB value                   Compare hashes               and hierarchy      │
│                                                                              │
│  Pros:                       Pros:                        Pros:              │
│  - Exact matching            - Tolerant of minor          - Semantic         │
│  - Simple to implement         changes                      understanding    │
│  - Fast computation          - Fast comparison            - Resize tolerant  │
│                              - Scale tolerant                                │
│                                                                              │
│  Cons:                       Cons:                        Cons:              │
│  - Anti-aliasing noise       - May miss subtle            - Complex to       │
│  - Font rendering diffs        changes                      implement        │
│  - False positives           - Hash collisions            - Slow             │
│                                                                              │
│  Use when:                   Use when:                    Use when:          │
│  - Pixel-perfect required    - General similarity ok      - Layout testing   │
│  - Controlled environment    - Cross-browser testing      - Responsive       │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
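
To make the perceptual hashing column concrete, here is a minimal sketch of one common variant, the average hash (aHash). It assumes the screenshot has already been downscaled to an 8x8 grayscale thumbnail; the function names and the bit-string representation are illustrative choices, not part of any library.

```typescript
// Average hash (aHash): one bit per thumbnail pixel, set when the pixel
// is brighter than the mean of the whole thumbnail.
// Assumption: the input is an 8x8 grayscale thumbnail (64 values, 0-255).
function averageHash(gray8x8: number[]): string {
  if (gray8x8.length !== 64) {
    throw new Error(`expected 64 grayscale values, got ${gray8x8.length}`);
  }
  const mean = gray8x8.reduce((sum, value) => sum + value, 0) / 64;
  return gray8x8.map((value) => (value > mean ? "1" : "0")).join("");
}

// Hamming distance: the number of bit positions where two hashes differ.
// A small distance means the images look broadly alike.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) {
    throw new Error("hashes must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}
```

A uniform brightness shift leaves the hash unchanged, which shows both the tolerance to minor changes and the risk of missing subtle ones.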

2.3 The Flakiness Problem

Visual tests are notorious for flakiness. Common causes:

Source            Example                         Mitigation
Anti-aliasing     Font edges differ by 1 pixel    Threshold tolerance (e.g., 0.1%)
Font rendering    OS renders fonts differently    Use web fonts, consistent environment
Animations        Screenshot mid-animation        Wait for animations, disable them
Dynamic content   Timestamps, avatars             Mask known dynamic regions
Network timing    Images not loaded               Wait for network idle
Scroll position   Page scrolled differently       Reset scroll before capture
Viewport size     Browser chrome varies           Use consistent viewport size
Date/time         “Posted 2 minutes ago”          Freeze time or mask

2.4 Baseline Management

┌──────────────────────────────────────────────────────────────────────────────┐
│                         Baseline Workflow                                    │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   First Run (No Baseline)          Subsequent Runs                           │
│   ───────────────────────          ───────────────                           │
│                                                                              │
│   Capture screenshot               Capture new screenshot                    │
│           │                                 │                                │
│           ▼                                 ▼                                │
│   Save as baseline ──────────────▶ Compare to baseline                       │
│   (reviewed by human)                       │                                │
│                                   ┌─────────┴─────────┐                      │
│                                   │                   │                      │
│                                 Match              Differ                    │
│                                   │                   │                      │
│                                   ▼                   ▼                      │
│                                 PASS            Human Review                 │
│                                             ┌─────────┴─────────┐            │
│                                             │                   │            │
│                                        Intentional         Regression        │
│                                             │                   │            │
│                                             ▼                   ▼            │
│                                      Update baseline         Fix bug         │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
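
The workflow above can be read as a small decision function. The sketch below is one possible encoding; the type names and action labels are hypothetical and simply mirror the boxes in the diagram.

```typescript
// Hypothetical labels mirroring the workflow diagram.
type ReviewVerdict = "intentional" | "regression";
type NextAction =
  | "save-baseline"    // first run: no baseline exists yet
  | "pass"             // capture matches the baseline
  | "needs-review"     // capture differs, no human verdict yet
  | "update-baseline"  // reviewer says the change is intentional
  | "fix-bug";         // reviewer says the change is a regression

function nextAction(
  hasBaseline: boolean,
  matchesBaseline: boolean,
  verdict?: ReviewVerdict
): NextAction {
  if (!hasBaseline) return "save-baseline";
  if (matchesBaseline) return "pass";
  if (verdict === undefined) return "needs-review";
  return verdict === "intentional" ? "update-baseline" : "fix-bug";
}
```

Keeping the human verdict as an explicit input makes it harder for an automated run to overwrite a baseline without review.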

2.5 Claude’s Visual Reasoning Advantage

Traditional visual testing tools give you pixel counts. Claude can tell you what changed:

Traditional Tool:
  "87.3% similar. 12.7% pixels differ."

Claude's Analysis:
  "The pricing cards have been rearranged. The 'Pro' tier moved
   from position 2 to position 3. The 'Enterprise' card now shows
   'Contact Us' instead of a price. The overall color scheme and
   layout remain consistent. This appears to be an intentional
   product change, not a regression."

This semantic understanding is the unique value proposition of this project.


3. Project Specification

3.1 What You Will Build

A visual regression testing system that:

  1. Captures screenshots at consistent viewport sizes
  2. Compares new captures against baseline images
  3. Generates diff visualizations showing changes
  4. Uses Claude to analyze and explain differences
  5. Produces HTML reports with side-by-side comparisons
  6. Supports multiple viewports (desktop, tablet, mobile)

3.2 Functional Requirements

  1. Consistent Capture
    • Set viewport to specific dimensions
    • Wait for network idle and animations
    • Mask known dynamic content
    • Capture full page or specific regions
  2. Baseline Management
    • Store baselines with content-addressable naming
    • Support baseline creation on first run
    • Enable baseline updates via approval
    • Version control integration (Git LFS)
  3. Image Comparison
    • Pixel-by-pixel diff with configurable threshold
    • Generate highlighted diff images
    • Calculate similarity percentage
    • Support region-specific comparisons
  4. Semantic Analysis
    • Use Claude to analyze diff images
    • Explain what changed in human terms
    • Suggest whether change is intentional
    • Identify regression patterns
  5. Report Generation
    • HTML report with side-by-side images
    • Filterable by pass/fail status
    • Exportable to CI systems
    • Links to specific test results
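
One way to read the "content-addressable naming" requirement under Baseline Management is to derive the stored file name from a hash of the image bytes, so identical captures always map to the same file. The sketch below assumes a Node.js runtime and uses SHA-256; the function name is hypothetical, and a real project would still need a manifest that maps page and viewport to the hash.

```typescript
import { createHash } from "node:crypto";

// Derive a file name from the image content itself.
// Assumption: SHA-256 in hex is an acceptable identifier for a baseline.
function contentAddressedName(imageBytes: Uint8Array, extension = "png"): string {
  const digest = createHash("sha256").update(imageBytes).digest("hex");
  return `${digest}.${extension}`;
}
```

Because the name changes whenever a single byte changes, an updated baseline never silently overwrites the previous one.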

3.3 Example Output

You: Run visual regression tests on our staging site

Claude: I'll capture screenshots and compare against baselines.

[Setting viewport: 1920x1080 (Desktop)]
[Navigating to /home...]
[Waiting for animations...]
[Capturing screenshot...]
[Comparing against baseline...]

================================================================================
                VISUAL REGRESSION REPORT - 2024-12-22 14:30 UTC
================================================================================

SUMMARY
-------
  Pages Tested: 5
  Viewports: Desktop (1920x1080), Tablet (768x1024), Mobile (375x667)
  Total Comparisons: 15

RESULTS
-------

/home
├── Desktop: PASS (99.8% similar)
│   Minor anti-aliasing difference in header font
│
├── Tablet: WARN (98.2% similar)
│   Difference: Button alignment shifted 3px left
│   [Screenshot shows highlighted region]
│   Claude Analysis: "The navigation buttons appear to have
│   shifted slightly. This may be intentional responsive
│   adjustment or an unintended side effect of CSS changes."
│
└── Mobile: PASS (99.9% similar)

/pricing
├── Desktop: FAIL (87.3% similar)
│   [Side-by-side diff image generated]
│
│   Claude Analysis:
│   "Significant visual changes detected on the pricing page:
│
│   1. Card Reordering: The 'Pro' tier has moved from position 2
│      to position 3. The 'Enterprise' tier is now in position 2.
│
│   2. Price Change: The 'Pro' tier shows $19/mo instead of $15/mo
│
│   3. New Badge: 'Most Popular' badge added to 'Enterprise' tier
│
│   These appear to be intentional product changes rather than
│   regressions. Recommend reviewing with product team."
│
│   Action needed: Approve new baseline or revert changes
│
├── Tablet: FAIL (85.1% similar)
│   [Same issues as Desktop, plus responsive layout shift]
│
└── Mobile: WARN (96.4% similar)
    Font size appears smaller than baseline

/about
└── All viewports: PASS (>99.5% similar)

/contact
└── All viewports: PASS (>99.5% similar)

/login
└── All viewports: PASS (>99.5% similar)

================================================================================
                              ACTIONS REQUIRED
================================================================================

1. /pricing: Major visual changes detected
   - Review changes with product team
   - If intentional: Run with --update-baseline
   - If regression: Investigate CSS/component changes

2. /home (Tablet): Minor alignment shift
   - Low priority: Likely responsive adjustment

================================================================================

Report saved: ./visual-reports/2024-12-22-143000/index.html
Diff images: ./visual-reports/2024-12-22-143000/diffs/

4. Solution Architecture

4.1 High-Level Design

┌──────────────────────────────────────────────────────────────────────────────┐
│                    Visual Regression Testing System                          │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   │
│  │   Capture    │──▶│  Comparison  │──▶│   Analysis   │──▶│   Report     │   │
│  │   Engine     │   │    Engine    │   │   (Claude)   │   │  Generator   │   │
│  └──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘   │
│         │                  │                  │                  │           │
│         ▼                  ▼                  ▼                  ▼           │
│  Chrome MCP tools    Image processing    Visual reasoning   HTML/JSON        │
│  Viewport control    Diff algorithms     Semantic analysis  reports          │
│  Wait strategies     Threshold calc      Change explanation Artifacts        │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────────────┐│
│  │                         Baseline Store                                   ││
│  │  /baselines                                                              ││
│  │  ├── home_desktop_1920x1080.png                                          ││
│  │  ├── home_tablet_768x1024.png                                            ││
│  │  ├── home_mobile_375x667.png                                             ││
│  │  ├── pricing_desktop_1920x1080.png                                       ││
│  │  └── ...                                                                 ││
│  └──────────────────────────────────────────────────────────────────────────┘│
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

4.2 Screenshot Capture Flow

┌──────────────────────────────────────────────────────────────────────────────┐
│                         Screenshot Capture Flow                              │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Start                                                                       │
│    │                                                                         │
│    ▼                                                                         │
│  ┌─────────────────────────────────────┐                                     │
│  │ 1. Set viewport size                │                                     │
│  │    resize_window(1920, 1080)        │                                     │
│  └─────────────────────────────────────┘                                     │
│    │                                                                         │
│    ▼                                                                         │
│  ┌─────────────────────────────────────┐                                     │
│  │ 2. Navigate to URL                  │                                     │
│  │    navigate(url)                    │                                     │
│  └─────────────────────────────────────┘                                     │
│    │                                                                         │
│    ▼                                                                         │
│  ┌─────────────────────────────────────┐                                     │
│  │ 3. Wait for page stability          │                                     │
│  │    - Network idle (no pending reqs) │                                     │
│  │    - Animations complete            │                                     │
│  │    - Fixed delay (safety margin)    │                                     │
│  └─────────────────────────────────────┘                                     │
│    │                                                                         │
│    ▼                                                                         │
│  ┌─────────────────────────────────────┐                                     │
│  │ 4. Apply masking (if configured)    │                                     │
│  │    - Hide timestamps                │                                     │
│  │    - Hide avatars                   │                                     │
│  │    - Hide dynamic ads               │                                     │
│  └─────────────────────────────────────┘                                     │
│    │                                                                         │
│    ▼                                                                         │
│  ┌─────────────────────────────────────┐                                     │
│  │ 5. Reset scroll position            │                                     │
│  │    Ensure consistent starting point │                                     │
│  └─────────────────────────────────────┘                                     │
│    │                                                                         │
│    ▼                                                                         │
│  ┌─────────────────────────────────────┐                                     │
│  │ 6. Capture screenshot               │                                     │
│  │    computer(action: "screenshot")   │                                     │
│  └─────────────────────────────────────┘                                     │
│    │                                                                         │
│    ▼                                                                         │
│  Save with consistent naming                                                 │
│  {page}_{viewport}_{width}x{height}.png                                      │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
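
The naming pattern at the end of the flow, {page}_{viewport}_{width}x{height}.png, could be produced by a helper like the one below. This is a sketch: the `Viewport` shape, the function name, and the "index" fallback for the site root are assumptions made for illustration.

```typescript
interface Viewport {
  name: string;   // e.g. "desktop", "tablet", "mobile"
  width: number;
  height: number;
}

// Build a capture file name following {page}_{viewport}_{width}x{height}.png.
function captureFileName(pagePath: string, viewport: Viewport): string {
  const slug =
    pagePath
      .replace(/^\/+|\/+$/g, "")         // drop leading and trailing slashes
      .replace(/[^a-zA-Z0-9]+/g, "-") || // collapse other separators into "-"
    "index";                             // assumption: "/" is stored as "index"
  return `${slug}_${viewport.name}_${viewport.width}x${viewport.height}.png`;
}
```

For example, `captureFileName("/home", { name: "desktop", width: 1920, height: 1080 })` yields `home_desktop_1920x1080.png`, matching the baseline store layout in section 4.1.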

4.3 Comparison Algorithm

┌──────────────────────────────────────────────────────────────────────────────┐
│                         Image Comparison Algorithm                           │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Input: baseline_image, current_image                                        │
│                                                                              │
│  Step 1: Dimension Check                                                     │
│  ───────────────────────                                                     │
│  if (baseline.dimensions != current.dimensions):                             │
│      return FAIL("Dimensions changed: {old} -> {new}")                       │
│                                                                              │
│  Step 2: Pixel-by-Pixel Comparison                                           │
│  ─────────────────────────────────                                           │
│  different_pixels = 0                                                        │
│  diff_image = create_empty_image(dimensions)                                 │
│                                                                              │
│  for each pixel (x, y):                                                      │
│      baseline_color = baseline.get_pixel(x, y)                               │
│      current_color = current.get_pixel(x, y)                                 │
│                                                                              │
│      if (color_distance(baseline_color, current_color) > tolerance):         │
│          different_pixels++                                                  │
│          diff_image.set_pixel(x, y, HIGHLIGHT_COLOR)                         │
│      else:                                                                   │
│          diff_image.set_pixel(x, y, current_color.grayscale())               │
│                                                                              │
│  Step 3: Calculate Similarity                                                │
│  ────────────────────────────                                                │
│  total_pixels = width * height                                               │
│  similarity = (total_pixels - different_pixels) / total_pixels * 100         │
│                                                                              │
│  Step 4: Apply Thresholds                                                    │
│  ────────────────────────                                                    │
│  if (similarity >= 99.5%):  return PASS                                      │
│  if (similarity >= 95.0%):  return WARN                                      │
│  else:                      return FAIL                                      │
│                                                                              │
│  Output: (result, similarity_percentage, diff_image)                         │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
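
The pseudocode above translates fairly directly into TypeScript. The sketch below works on raw RGBA buffers so it stays independent of any image library; the `RgbaImage` shape, the Euclidean color distance, the default tolerance of 10, and the magenta highlight are assumptions rather than requirements of the project.

```typescript
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8Array; // RGBA, 4 bytes per pixel, row-major
}

type Status = "PASS" | "WARN" | "FAIL";

interface CompareResult {
  status: Status;
  similarity: number;      // percentage, 0-100
  diff: Uint8Array | null; // RGBA diff image, null when dimensions differ
  reason?: string;
}

const HIGHLIGHT_COLOR = [255, 0, 255]; // assumption: magenta marks changed pixels

function compareImages(baseline: RgbaImage, current: RgbaImage, tolerance = 10): CompareResult {
  // Step 1: dimension check
  if (baseline.width !== current.width || baseline.height !== current.height) {
    const reason = `Dimensions changed: ${baseline.width}x${baseline.height} -> ${current.width}x${current.height}`;
    return { status: "FAIL", similarity: 0, diff: null, reason };
  }

  // Step 2: pixel-by-pixel comparison
  const totalPixels = baseline.width * baseline.height;
  const diff = new Uint8Array(totalPixels * 4);
  let differentPixels = 0;
  for (let i = 0; i < totalPixels * 4; i += 4) {
    const dr = baseline.data[i] - current.data[i];
    const dg = baseline.data[i + 1] - current.data[i + 1];
    const db = baseline.data[i + 2] - current.data[i + 2];
    if (Math.sqrt(dr * dr + dg * dg + db * db) > tolerance) {
      differentPixels++;
      diff[i] = HIGHLIGHT_COLOR[0];
      diff[i + 1] = HIGHLIGHT_COLOR[1];
      diff[i + 2] = HIGHLIGHT_COLOR[2];
    } else {
      // Unchanged pixels are shown in grayscale so highlights stand out.
      const gray = Math.round(
        0.299 * current.data[i] + 0.587 * current.data[i + 1] + 0.114 * current.data[i + 2]
      );
      diff[i] = diff[i + 1] = diff[i + 2] = gray;
    }
    diff[i + 3] = 255; // fully opaque
  }

  // Steps 3 and 4: similarity and thresholds
  const similarity = ((totalPixels - differentPixels) / totalPixels) * 100;
  const status: Status = similarity >= 99.5 ? "PASS" : similarity >= 95 ? "WARN" : "FAIL";
  return { status, similarity, diff };
}

// Helper for experiments: a single-color image.
function solidImage(width: number, height: number, rgb: [number, number, number]): RgbaImage {
  const data = new Uint8Array(width * height * 4);
  for (let i = 0; i < data.length; i += 4) {
    data[i] = rgb[0];
    data[i + 1] = rgb[1];
    data[i + 2] = rgb[2];
    data[i + 3] = 255;
  }
  return { width, height, data };
}
```

A per-channel tolerance absorbs most anti-aliasing noise, while the percentage thresholds decide how many changed pixels are acceptable overall.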

4.4 Claude Analysis Integration

┌──────────────────────────────────────────────────────────────────────────────┐
│                    Claude Visual Analysis Workflow                           │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  When similarity < 99%:                                                      │
│                                                                              │
│  1. Prepare Context                                                          │
│     ───────────────                                                          │
│     - Page URL and viewport info                                             │
│     - Similarity percentage                                                  │
│     - Region of largest difference                                           │
│                                                                              │
│  2. Show Claude the Images                                                   │
│     ──────────────────────                                                   │
│     Claude can "see" the baseline, current, and diff images                  │
│     by referencing the screenshots captured                                  │
│                                                                              │
│  3. Request Analysis                                                         │
│     ────────────────                                                         │
│     Prompt Claude to:                                                        │
│     - Describe what visually changed                                         │
│     - Identify if changes look intentional                                   │
│     - Suggest whether to update baseline                                     │
│     - Note potential regression patterns                                     │
│                                                                              │
│  4. Extract Structured Response                                              │
│     ───────────────────────────                                              │
│     {                                                                        │
│       "changes": [                                                           │
│         { "type": "layout", "description": "Cards reordered" },              │
│         { "type": "content", "description": "Price updated" }                │
│       ],                                                                     │
│       "likely_intentional": true,                                            │
│       "regression_risk": "low",                                              │
│       "recommendation": "Review with product team before updating"           │
│     }                                                                        │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
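
Because the structured response in step 4 comes from a model, it is worth validating before it reaches the report generator. The sketch below checks the fields shown in the example JSON; the interface and function names are hypothetical, and the allowed values of "type" and "regression_risk" are left as plain strings because the example does not enumerate them.

```typescript
interface VisualChange {
  type: string;
  description: string;
}

interface VisualAnalysis {
  changes: VisualChange[];
  likely_intentional: boolean;
  regression_risk: string;
  recommendation: string;
}

// Parse and validate the JSON text returned by the analysis step.
// Throws with a descriptive message when a field is missing or mistyped.
function parseAnalysis(raw: string): VisualAnalysis {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null) {
    throw new Error("analysis must be a JSON object");
  }
  const obj = value as Record<string, unknown>;

  const rawChanges = obj["changes"];
  if (!Array.isArray(rawChanges)) {
    throw new Error("changes must be an array");
  }
  const changes = rawChanges.map((item: unknown, index: number): VisualChange => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`changes[${index}] must be an object`);
    }
    const change = item as Record<string, unknown>;
    const type = change["type"];
    const description = change["description"];
    if (typeof type !== "string" || typeof description !== "string") {
      throw new Error(`changes[${index}] needs string "type" and "description"`);
    }
    return { type, description };
  });

  const likelyIntentional = obj["likely_intentional"];
  const regressionRisk = obj["regression_risk"];
  const recommendation = obj["recommendation"];
  if (typeof likelyIntentional !== "boolean") {
    throw new Error("likely_intentional must be a boolean");
  }
  if (typeof regressionRisk !== "string" || typeof recommendation !== "string") {
    throw new Error("regression_risk and recommendation must be strings");
  }
  return {
    changes,
    likely_intentional: likelyIntentional,
    regression_risk: regressionRisk,
    recommendation,
  };
}
```

Failing loudly on a malformed response keeps a bad analysis from being reported as a confident verdict.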

5. Implementation Guide

5.1 Chrome MCP Tools for Visual Testing

resize_window - Set Viewport

// Set viewport to exact dimensions
mcp__claude-in-chrome__resize_window({
  width: 1920,
  height: 1080,
  tabId: 12345
})

// Common viewport configurations
const VIEWPORTS = {
  desktop: { width: 1920, height: 1080 },
  tablet: { width: 768, height: 1024 },
  mobile: { width: 375, height: 667 }
};

computer - Screenshot Capture

// Capture current viewport
mcp__claude-in-chrome__computer({
  action: "screenshot",
  tabId: 12345
})
// Returns: Image data that Claude can reference

// Wait for animations/network
mcp__claude-in-chrome__computer({
  action: "wait",
  duration: 2,  // seconds
  tabId: 12345
})

// Scroll to top before capture
mcp__claude-in-chrome__computer({
  action: "scroll",
  scroll_direction: "up",
  scroll_amount: 10,  // Maximum scroll
  coordinate: [960, 540],
  tabId: 12345
})

computer (zoom action) - Capture Specific Regions

// Capture specific region for detailed comparison
mcp__claude-in-chrome__computer({
  action: "zoom",
  region: [100, 200, 500, 400],  // [x0, y0, x1, y1]
  tabId: 12345
})
// Useful for comparing specific components

javascript_tool - Apply Masks

// Hide dynamic content before screenshot
mcp__claude-in-chrome__javascript_tool({
  action: "javascript_exec",
  text: `
    // Hide timestamps
    document.querySelectorAll('[data-testid="timestamp"]')
      .forEach(el => el.style.visibility = 'hidden');

    // Hide avatars
    document.querySelectorAll('.user-avatar')
      .forEach(el => el.style.visibility = 'hidden');

    // Hide ads (display: none removes them from layout, so surrounding content may shift)
    document.querySelectorAll('.ad-container')
      .forEach(el => el.style.display = 'none');
  `,
  tabId: 12345
})

5.2 Implementation Phases

Phase 1: Basic Capture (Days 1-3)

Goal: Capture consistent screenshots at multiple viewports.

Tasks:

  1. Implement viewport setting
  2. Navigate and wait for load
  3. Capture screenshot
  4. Save with consistent naming

Checkpoint: Capture same page 5 times, verify identical images.

Expected file structure:
/captures/
  home_desktop_1920x1080.png
  home_tablet_768x1024.png
  home_mobile_375x667.png

Phase 2: Baseline Management (Days 4-5)

Goal: Store and retrieve baseline images.

Tasks:

  1. Create baseline directory structure
  2. Implement first-run baseline creation
  3. Add baseline lookup logic
  4. Support baseline updates

Checkpoint: First run creates baseline, second run compares.
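
The first-run, lookup, and update tasks above can be sketched as one small decision function. This is an illustration under assumptions: BaselineStore, MemoryBaselineStore, and resolveBaseline are hypothetical names, and the in-memory store stands in for whatever file or cloud storage you choose.

```typescript
// Minimal storage interface; a real implementation might wrap the file system.
interface BaselineStore {
  read(key: string): Uint8Array | undefined;
  write(key: string, image: Uint8Array): void;
}

// In-memory store used only to demonstrate the flow.
class MemoryBaselineStore implements BaselineStore {
  private images = new Map<string, Uint8Array>();

  read(key: string): Uint8Array | undefined {
    return this.images.get(key);
  }

  write(key: string, image: Uint8Array): void {
    this.images.set(key, image);
  }
}

type BaselineOutcome =
  | { action: "created" }
  | { action: "updated" }
  | { action: "compare"; baseline: Uint8Array };

function resolveBaseline(
  store: BaselineStore,
  key: string,
  current: Uint8Array,
  updateRequested = false
): BaselineOutcome {
  const existing = store.read(key);
  if (existing === undefined) {
    // First run: the current capture becomes the baseline.
    store.write(key, current);
    return { action: "created" };
  }
  if (updateRequested) {
    // Intentional change: overwrite the stored baseline.
    store.write(key, current);
    return { action: "updated" };
  }
  // Normal run: hand the stored baseline to the comparison step.
  return { action: "compare", baseline: existing };
}
```

Keeping the storage behind an interface means the same logic works whether baselines live in Git, Git LFS, or cloud storage.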

Phase 3: Image Comparison (Days 6-8)

Goal: Compare images and calculate similarity.

Tasks:

  1. Implement pixel comparison (can use Claude’s vision)
  2. Calculate similarity percentage
  3. Generate diff visualization
  4. Apply pass/warn/fail thresholds

Checkpoint: Detect intentionally modified page as different.
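
For the pixel comparison and similarity tasks, here is a minimal sketch that assumes both screenshots are already decoded into RGBA byte arrays of the same size (decoding PNG files needs an image library, which is not shown). RgbaImage, DiffResult, and compareImages are hypothetical names, and the per-channel tolerance is one simple way to absorb anti-aliasing noise, not the only one.

```typescript
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8Array; // RGBA bytes, 4 per pixel, row-major
}

interface DiffResult {
  similarity: number; // percentage of matching pixels, 0 to 100
  changedPixels: number;
  diff: RgbaImage; // changed pixels in red, unchanged pixels dimmed
}

function compareImages(baseline: RgbaImage, current: RgbaImage, tolerance = 0): DiffResult {
  if (baseline.width !== current.width || baseline.height !== current.height) {
    throw new Error("Images must have identical dimensions");
  }
  const totalPixels = baseline.width * baseline.height;
  const diffData = new Uint8Array(totalPixels * 4);
  let changedPixels = 0;

  for (let i = 0; i < totalPixels * 4; i += 4) {
    // A pixel counts as changed if any channel differs by more than the tolerance.
    let changed = false;
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline.data[i + c] - current.data[i + c]) > tolerance) {
        changed = true;
        break;
      }
    }
    if (changed) {
      changedPixels++;
      diffData[i] = 255; // opaque red marks a changed pixel
      diffData[i + 1] = 0;
      diffData[i + 2] = 0;
      diffData[i + 3] = 255;
    } else {
      diffData[i] = baseline.data[i]; // baseline color, mostly transparent
      diffData[i + 1] = baseline.data[i + 1];
      diffData[i + 2] = baseline.data[i + 2];
      diffData[i + 3] = 64;
    }
  }

  const similarity =
    totalPixels === 0 ? 100 : ((totalPixels - changedPixels) / totalPixels) * 100;
  return {
    similarity,
    changedPixels,
    diff: { width: baseline.width, height: baseline.height, data: diffData },
  };
}
```

Counting changed pixels is deliberately crude: a one-pixel layout shift can flag a large area, which is where the Claude analysis in Phase 5 adds context.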

Phase 4: Dynamic Content Handling (Days 9-10)

Goal: Mask dynamic content to reduce flakiness.

Tasks:

  1. Define masking configuration
  2. Inject CSS/JS to hide dynamic elements
  3. Test with timestamp-heavy pages
  4. Document masking patterns

Checkpoint: Same page with different timestamps shows as match.
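
A masking configuration can be turned into the script text that javascript_tool receives. The sketch below is one possible shape: MaskRule, buildMaskCss, and buildMaskScript are hypothetical names, and the two modes mirror the visibility: hidden and display: none styles used in section 5.1.

```typescript
interface MaskRule {
  selector: string;
  // "hide" keeps the element's space in the layout; "remove" collapses it.
  mode: "hide" | "remove";
}

// Build one CSS rule per mask entry.
function buildMaskCss(rules: MaskRule[]): string {
  return rules
    .map((rule) => {
      const declaration =
        rule.mode === "hide"
          ? "visibility: hidden !important;"
          : "display: none !important;";
      return `${rule.selector} { ${declaration} }`;
    })
    .join("\n");
}

// Wrap the CSS in a script suitable for the `text` argument of javascript_tool.
function buildMaskScript(rules: MaskRule[]): string {
  const css = buildMaskCss(rules);
  // JSON.stringify yields a safely quoted JavaScript string literal.
  return [
    "const style = document.createElement('style');",
    `style.textContent = ${JSON.stringify(css)};`,
    "document.head.appendChild(style);",
  ].join("\n");
}

const maskScript = buildMaskScript([
  { selector: '[data-testid="timestamp"]', mode: "hide" },
  { selector: ".ad-container", mode: "remove" },
]);
```

Keeping the rules as data makes it straightforward to document masking patterns per page and to review them alongside baseline changes.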

Phase 5: Claude Analysis Integration (Days 11-12)

Goal: Get semantic analysis of differences.

Tasks:

  1. Prepare comparison context for Claude
  2. Show Claude baseline and current images
  3. Request structured analysis
  4. Integrate analysis into report

Checkpoint: Claude correctly identifies type of visual change.
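
Preparing the comparison context mostly means assembling a consistent request. The following sketch builds a prompt string from a few fields; ComparisonContext and buildAnalysisPrompt are hypothetical, and the wording of the prompt is only an example that asks for the JSON keys shown in the structured response earlier in this guide.

```typescript
interface ComparisonContext {
  page: string; // e.g. "/pricing"
  viewport: string; // e.g. "Desktop"
  similarity: number; // pixel similarity in percent
}

function buildAnalysisPrompt(ctx: ComparisonContext): string {
  return [
    `Compare the baseline and current screenshots of ${ctx.page} at the ${ctx.viewport} viewport.`,
    `Pixel similarity: ${ctx.similarity.toFixed(2)}%.`,
    "Describe what changed, whether the change looks intentional, and the regression risk.",
    "Respond with JSON only, using exactly these keys:",
    '{ "changes": [{ "type": string, "description": string }], "likely_intentional": boolean, "regression_risk": string, "recommendation": string }',
  ].join("\n");
}
```

Including the measured similarity gives the analysis a sense of scale, while the fixed key list keeps responses parseable.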

Phase 6: Report Generation (Days 13-14)

Goal: Create actionable HTML reports.

Tasks:

  1. Design report template
  2. Include side-by-side images
  3. Add Claude analysis sections
  4. Support filtering and navigation

Checkpoint: Complete HTML report generated for test suite.
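
A report template can start as a pure function from results to an HTML string. This is a minimal sketch with hypothetical names (ReportEntry, renderEntry, renderReport); it covers the side-by-side images and analysis sections but leaves filtering and navigation to you.

```typescript
interface ReportEntry {
  page: string;
  viewport: string;
  similarity: number;
  status: "PASS" | "WARN" | "FAIL";
  baselineSrc: string;
  currentSrc: string;
  analysis?: string;
}

// Escape text before placing it in HTML so analysis output cannot inject markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderEntry(e: ReportEntry): string {
  const analysis = e.analysis
    ? `<p class="analysis">${escapeHtml(e.analysis)}</p>`
    : "";
  return `<section class="result ${e.status.toLowerCase()}">
  <h2>${escapeHtml(e.page)} @ ${escapeHtml(e.viewport)}: ${e.status} (${e.similarity.toFixed(2)}%)</h2>
  <div class="side-by-side">
    <img src="${escapeHtml(e.baselineSrc)}" alt="Baseline">
    <img src="${escapeHtml(e.currentSrc)}" alt="Current">
  </div>
  ${analysis}
</section>`;
}

function renderReport(title: string, entries: ReportEntry[]): string {
  return `<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>
<body>
<h1>${escapeHtml(title)}</h1>
${entries.map(renderEntry).join("\n")}
</body>
</html>`;
}
```

The status is also emitted as a CSS class, which gives a later filtering feature something to hook into.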


6. Testing Strategy

6.1 Test Scenarios

| Scenario | Setup | Expected Result |
|---|---|---|
| Identical pages | Same page, two runs | 100% match, PASS |
| Color change | Change button color | <99% match, diff highlights button |
| Layout shift | Move element | <95% match, structural change noted |
| Content change | Update text | <99% match, text region highlighted |
| Dynamic content | Page with timestamps | Masked, shows match |
| Viewport change | Test all 3 sizes | Independent results per viewport |

6.2 Validation Approaches

  1. Known changes: Modify a test page, verify detection
  2. Anti-flakiness: Run same page 10 times, all should match
  3. Threshold testing: Create small changes, verify correct classification
  4. Cross-viewport: Ensure responsive issues are caught
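
Threshold testing is easier when classification is a small pure function. The sketch below assumes the thresholds used in the sample session in section 8 (99.5% for PASS, 95% for WARN); Verdict, Thresholds, and classify are hypothetical names.

```typescript
type Verdict = "PASS" | "WARN" | "FAIL";

interface Thresholds {
  pass: number; // minimum similarity (percent) for PASS
  warn: number; // minimum similarity (percent) for WARN
}

const DEFAULT_THRESHOLDS: Thresholds = { pass: 99.5, warn: 95 };

function classify(similarity: number, t: Thresholds = DEFAULT_THRESHOLDS): Verdict {
  if (similarity >= t.pass) return "PASS";
  if (similarity >= t.warn) return "WARN";
  return "FAIL";
}
```

Passing thresholds as a parameter leaves room for per-page overrides if a page turns out to be noisier than the rest.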

7. Common Pitfalls & Debugging

7.1 Frequent Issues

| Issue | Symptom | Solution |
|---|---|---|
| Flaky anti-aliasing | Random 0.1% diffs | Increase tolerance threshold |
| Animations captured | Different frame each time | Add wait, disable animations via CSS |
| Network timing | Images missing | Wait for network idle |
| Font differences | Text regions differ | Use web fonts consistently |
| Scroll position | Different content visible | Reset scroll before capture |

7.2 Debugging Commands

// Check network activity
mcp__claude-in-chrome__read_network_requests({
  tabId: 12345,
  urlPattern: ""  // All requests
})

// Check for JavaScript errors
mcp__claude-in-chrome__read_console_messages({
  tabId: 12345,
  pattern: "error"
})

// Verify page state
mcp__claude-in-chrome__read_page({
  tabId: 12345,
  filter: "all"
})

// Take debug screenshot
mcp__claude-in-chrome__computer({
  action: "screenshot",
  tabId: 12345
})

7.3 Stabilization Techniques

// Disable animations via CSS injection
mcp__claude-in-chrome__javascript_tool({
  action: "javascript_exec",
  text: `
    const style = document.createElement('style');
    style.textContent = '*, *::before, *::after { ' +
      'animation-duration: 0s !important; ' +
      'transition-duration: 0s !important; ' +
    '}';
    document.head.appendChild(style);
  `,
  tabId: 12345
})

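// Confirm a specific element has rendered before capturing.
// find reports matching elements rather than blocking until they appear,
// so repeat the wait-and-find cycle if nothing is returned.
mcp__claude-in-chrome__find({
  tabId: 12345,
  query: "main content area"
})
// Then proceed with screenshot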

8. Real World Outcome

When you complete this project, here’s what a full visual regression test session looks like:

You: Run visual regression tests on staging.example.com

Claude: I'll run visual regression tests across all configured pages and viewports.

================================================================================
            VISUAL REGRESSION TEST SESSION - 2024-12-22 14:30 UTC
================================================================================

Configuration:
  Base URL: https://staging.example.com
  Viewports: Desktop (1920x1080), Tablet (768x1024), Mobile (375x667)
  Pages: /home, /pricing, /about, /contact, /login
  Thresholds: >= 99.5% (PASS), 95% to < 99.5% (WARN), < 95% (FAIL)

Starting tests...

[1/15] /home @ Desktop
  Setting viewport: 1920x1080... OK
  Navigating... OK
  Waiting for stability... OK (2.1s)
  Disabling animations... OK
  Capturing screenshot... OK
  Comparing to baseline... 99.87% similar
  Result: PASS

[2/15] /home @ Tablet
  Setting viewport: 768x1024... OK
  Navigating... OK
  Waiting for stability... OK (1.8s)
  Capturing screenshot... OK
  Comparing to baseline... 98.23% similar
  Result: WARN - Minor differences detected

  Requesting Claude analysis...

  Analysis: "The navigation menu hamburger icon has shifted
  2-3 pixels to the right compared to baseline. The main
  content area appears identical. This is likely a minor
  responsive CSS adjustment. Low regression risk."

[3/15] /home @ Mobile
  ...
  Result: PASS (99.92% similar)

[4/15] /pricing @ Desktop
  Setting viewport: 1920x1080... OK
  Navigating... OK
  Waiting for stability... OK (2.4s)
  Capturing screenshot... OK
  Comparing to baseline... 72.14% similar
  Result: FAIL - Significant visual changes

  Requesting Claude analysis...

  Analysis: "Major visual changes detected on the pricing page:

  OBSERVED CHANGES:
  1. Card Layout: The three pricing cards have been completely
     redesigned. They now feature:
     - Rounded corners (was: square)
     - Drop shadows (was: flat)
     - Gradient backgrounds (was: solid colors)

  2. Pricing Structure:
     - 'Starter' tier: $9/mo (was: $12/mo) - price decrease
     - 'Pro' tier: $29/mo (was: $25/mo) - price increase
     - 'Enterprise': Now shows 'Contact Sales' (was: $99/mo)

  3. New Elements:
     - 'Most Popular' badge on Pro tier
     - Feature comparison table below cards
     - Annual billing toggle (new)

  ASSESSMENT:
  These changes appear to be a significant product redesign,
  not a regression. The visual styling is consistent and
  intentional. Recommend:

  1. Confirm with product team this is the expected new design
  2. If confirmed, update baseline with: --update-baseline pricing
  3. Consider adding visual regression tests for new elements"

[5/15] /pricing @ Tablet
  ...
  Result: FAIL (68.92% similar)
  Analysis: "Same redesign as desktop, plus responsive layout
  changes - cards now stack vertically. Consistent with
  intentional redesign."

... [remaining tests] ...

================================================================================
                            FINAL SUMMARY
================================================================================

Total Tests: 15
  PASS: 11 (73.3%)
  WARN: 2 (13.3%)
  FAIL: 2 (13.3%)

PASS Details (11):
  /home (Desktop, Mobile)
  /about (All viewports)
  /contact (All viewports)
  /login (All viewports)

WARN Details (2):
  /home @ Tablet - Minor hamburger icon shift
  /pricing @ Mobile - Same as Desktop + responsive

FAIL Details (2):
  /pricing @ Desktop - Major redesign
  /pricing @ Tablet - Major redesign

================================================================================
                          RECOMMENDED ACTIONS
================================================================================

PRIORITY 1 - Requires Immediate Decision:
  /pricing: Major visual redesign detected across all viewports

  Actions:
  a) If redesign is approved:
     Run: visual-test --update-baseline pricing

  b) If redesign is unexpected:
     Investigate recent commits to pricing page components
     Check staging deployment for unintended changes

PRIORITY 2 - Monitor:
  /home @ Tablet: Minor alignment shift

  Action: Low priority, but document in CSS changelog

================================================================================

Artifacts Generated:
  Report: ./visual-reports/2024-12-22-143000/index.html
  Diffs:  ./visual-reports/2024-12-22-143000/diffs/

Open report? [y/N]
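
The FINAL SUMMARY figures in the session above can be derived from the list of per-test results. Here is a minimal sketch; Status, Summary, and summarize are hypothetical names, and the exit code rule (non-zero only when something fails) is an assumption about how you might wire this into CI, not something the session output specifies.

```typescript
type Status = "PASS" | "WARN" | "FAIL";

interface Summary {
  total: number;
  counts: Record<Status, number>;
  percentages: Record<Status, string>; // one decimal place, as in the summary
  exitCode: number;
}

function summarize(statuses: Status[]): Summary {
  const counts: Record<Status, number> = { PASS: 0, WARN: 0, FAIL: 0 };
  for (const s of statuses) counts[s]++;

  const total = statuses.length;
  const pct = (n: number): string =>
    total === 0 ? "0.0" : ((n / total) * 100).toFixed(1);

  return {
    total,
    counts,
    percentages: {
      PASS: pct(counts.PASS),
      WARN: pct(counts.WARN),
      FAIL: pct(counts.FAIL),
    },
    // Assumption: any FAIL should fail the CI job; WARN alone does not.
    exitCode: counts.FAIL > 0 ? 1 : 0,
  };
}
```

With 11 PASS, 2 WARN, and 2 FAIL results this reproduces the 73.3%, 13.3%, and 13.3% shown in the summary.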

9. The Core Question You’re Answering

“How do you detect unintended visual changes while ignoring acceptable variations?”

Visual testing is deceptively hard. The core challenges:

  1. Noise vs Signal: Anti-aliasing and font rendering create false positives
  2. Intentional vs Regression: Not all changes are bugs
  3. Reproducibility: Screenshots must be consistent across runs
  4. Actionability: Pixel counts don’t tell you what to fix

Claude’s visual reasoning transforms visual testing from “87% different” to “The pricing cards were redesigned with new colors and a popular badge was added - this appears intentional.”


10. Concepts You Must Understand First

Before starting this project, ensure you understand:

| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Image comparison basics | Core of diff algorithm | ImageMagick documentation |
| Responsive design | Viewport testing | “Responsive Web Design” by Marcotte |
| CSS animations | Cause of flakiness | MDN - CSS Animations |
| Git LFS | Storing large baselines | Git LFS documentation |
| CI/CD pipelines | Integration context | “Continuous Delivery” by Humble |
| Perceptual hashing | Alternative comparison | pHash documentation |

11. Questions to Guide Your Design

Work through these questions BEFORE implementing:

  1. Baseline Storage: Where do baselines live? Git? Cloud storage? How do you handle large images?

  2. Threshold Selection: What percentage constitutes pass/warn/fail? Should this be configurable per page?

  3. Dynamic Content: How do you identify and mask timestamps, avatars, ads? Manual config or auto-detection?

  4. Multi-Browser: This project uses Chrome. How would you extend to Firefox/Safari?

  5. CI Integration: How would this run in a CI pipeline? What exit codes? What artifacts?

  6. Baseline Updates: Who can approve baseline updates? How is this tracked?


12. Thinking Exercise

Before implementing, consider this scenario:

You’re testing an e-commerce site. The pricing page shows:

  • Current prices (may change)
  • “Sale ends in 2:45:32” countdown
  • User avatar (logged in)
  • Recently viewed items (personalized)
  • Ad banner (rotates)

Questions to answer on paper:

  1. Which elements should be masked? Why?
  2. How would you mask each element?
  3. What if prices change daily? Is that a test or data issue?
  4. How would you test the page structure without testing content?
  5. What’s the difference between “price changed” and “price formatting broke”?

13. The Interview Questions They’ll Ask

After completing this project, you’ll be ready for:

  1. “How would you handle dynamic content in visual tests?”
    • Expected: Masking strategies, stable identifiers, content freezing
    • Bonus: Discuss boundary between visual and data testing
  2. “What’s your strategy for cross-browser visual testing?”
    • Expected: Separate baselines per browser, understand rendering differences
    • Bonus: Discuss acceptable tolerance per browser
  3. “How do you reduce flakiness in screenshot comparisons?”
    • Expected: Wait strategies, animation disabling, threshold tuning
    • Bonus: Discuss deterministic capture environments (Docker)
  4. “How would you implement this in a CI pipeline?”
    • Expected: Headless capture, baseline storage, failure handling
    • Bonus: Discuss parallel execution, artifact storage
  5. “What’s the tradeoff between pixel-perfect and perceptual testing?”
    • Expected: Sensitivity vs noise, use cases for each
    • Bonus: Discuss hybrid approaches

14. Hints in Layers

If you get stuck, reveal hints progressively:

Hint 1: Consistent Viewport

Always set viewport BEFORE navigating:

// Set size first
mcp__claude-in-chrome__resize_window({ width: 1920, height: 1080, tabId })
// Then navigate
mcp__claude-in-chrome__navigate({ url: targetUrl, tabId })
// Then wait
mcp__claude-in-chrome__computer({ action: "wait", duration: 2, tabId })
// Then screenshot
mcp__claude-in-chrome__computer({ action: "screenshot", tabId })

This order ensures consistent capture dimensions.
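
The same ordering can be captured as data so that every page and viewport goes through identical steps. This is a sketch, not an MCP client: ToolCall and buildCaptureSteps are hypothetical, and the function only describes the calls listed above without executing them.

```typescript
interface Viewport {
  width: number;
  height: number;
}

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Returns the capture sequence in the required order:
// resize, navigate, wait, screenshot.
function buildCaptureSteps(
  url: string,
  viewport: Viewport,
  tabId: number,
  waitSeconds = 2
): ToolCall[] {
  return [
    {
      tool: "mcp__claude-in-chrome__resize_window",
      args: { width: viewport.width, height: viewport.height, tabId },
    },
    { tool: "mcp__claude-in-chrome__navigate", args: { url, tabId } },
    {
      tool: "mcp__claude-in-chrome__computer",
      args: { action: "wait", duration: waitSeconds, tabId },
    },
    {
      tool: "mcp__claude-in-chrome__computer",
      args: { action: "screenshot", tabId },
    },
  ];
}
```

Generating the sequence from one function avoids the common mistake of resizing after navigation for some pages but not others.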

Hint 2: Baseline Naming

Use deterministic, descriptive naming built from the page path and viewport:

{page_path}_{viewport}_{width}x{height}.png

Examples:
home_desktop_1920x1080.png
pricing_tablet_768x1024.png
contact-form_mobile_375x667.png

For page paths with slashes:
/product/123 → product-123_desktop_1920x1080.png

This makes baseline lookup deterministic.
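
The naming scheme can be sketched as a small function. NamedViewport and baselineFileName are hypothetical names; mapping "/" to "root" and replacing other unsafe characters with hyphens are assumptions that the scheme above does not cover.

```typescript
interface NamedViewport {
  name: string; // e.g. "desktop"
  width: number;
  height: number;
}

function baselineFileName(pagePath: string, viewport: NamedViewport): string {
  const slug = pagePath
    .replace(/^\/+|\/+$/g, "") // drop leading and trailing slashes
    .replace(/\/+/g, "-") // remaining slashes become hyphens
    .replace(/[^A-Za-z0-9._-]/g, "-"); // assumption: replace other unsafe characters
  const page = slug === "" ? "root" : slug; // assumption: "/" maps to "root"
  return `${page}_${viewport.name}_${viewport.width}x${viewport.height}.png`;
}
```

One caveat of this scheme: /a/b and /a-b produce the same file name, so sites with such paths need a different separator.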

Hint 3: Comparison Without Image Library

Claude can compare images visually! Instead of pixel algorithms:

  1. Capture current screenshot
  2. Show Claude both baseline and current images
  3. Ask Claude to identify and describe differences
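  4. Use Claude’s response as the “diff”

This is simpler and gives semantic analysis for free, but it does not produce an exact pixel count, so treat any similarity percentage from this approach as an estimate.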

Hint 4: Masking Dynamic Content

Inject CSS to hide dynamic elements:

mcp__claude-in-chrome__javascript_tool({
  action: "javascript_exec",
  text: `
    // Create mask style
    const style = document.createElement('style');
    style.textContent = \`
      [data-testid="timestamp"],
      [data-testid="avatar"],
      .ad-container,
      .countdown-timer {
        visibility: hidden !important;
      }
    \`;
    document.head.appendChild(style);
  `,
  tabId
})

Because visibility: hidden preserves each element’s box, the masked elements still occupy their space in the layout but no longer affect the visual comparison.


15. Books That Will Help

| Topic | Book | Chapter/Section |
|---|---|---|
| Visual testing concepts | “Practical Test-Driven Development” by Viktor Farcic | Ch. 9 |
| Image processing basics | “Digital Image Processing” by Gonzalez & Woods | Ch. 2-3 |
| CI/CD integration | “Continuous Delivery” by Humble & Farley | Ch. 5 |
| Responsive design | “Responsive Web Design” by Ethan Marcotte | All |
| Testing strategies | “The Art of Software Testing” by Myers | Ch. 6 |
| Percy documentation | Percy.io docs | Getting Started |

16. Extensions & Challenges

16.1 Beginner Extensions

  • Multiple base URLs: Compare staging vs production
  • Scheduled runs: Automated nightly visual regression
  • Slack notifications: Alert on failures

16.2 Intermediate Extensions

  • Component-level testing: Test specific components, not pages
  • Animation testing: Capture GIFs instead of stills
  • A11y overlay: Show accessibility issues on screenshots

16.3 Advanced Extensions

  • AI-powered baseline updates: Auto-approve minor changes
  • Cross-browser: Support Firefox via different MCP
  • Performance correlation: Link visual changes to perf metrics

17. Learning Milestones

Track your progress through these checkpoints:

| Milestone | Description | Verification |
|---|---|---|
| 1. Consistent capture | Same page = same image | 5 captures identical |
| 2. Baseline storage | Save and retrieve baselines | Baseline persists across runs |
| 3. Diff detection | Identify changed pages | Modified page shows as different |
| 4. Dynamic masking | Timestamps don’t cause diffs | Dynamic page shows as match |
| 5. Claude analysis | Semantic change description | Analysis explains what changed |
| 6. Report generation | Complete HTML report | Report includes all sections |

This guide was expanded from CLAUDE_CODE_MASTERY_40_PROJECTS.md. For the complete learning path, see the project index.