Project 31: Visual Regression Testing - Screenshot Diff Engine
Build a visual regression testing system that captures screenshots through Chrome MCP, compares them to baselines, highlights differences, and uses Claude's visual reasoning to analyze what changed and why it matters.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1-2 weeks |
| Language | TypeScript (Alternatives: Python, Go) |
| Prerequisites | Projects 29-30, image processing concepts |
| Key Topics | Visual testing, image comparison, baseline management, CI/CD integration |
| Main Book | "Practical Test-Driven Development" by Viktor Farcic |
1. Learning Objectives
By completing this project, you will:
- Capture consistent screenshots: Handle viewport sizing, timing, and dynamic content
- Implement image comparison: Understand pixel diff, perceptual hashing, and thresholds
- Manage baselines: Version control visual baselines and handle intentional changes
- Leverage Claude's visual reasoning: Go beyond pixel counts to semantic change analysis
- Handle test flakiness: Mask dynamic content and stabilize captures
- Generate actionable reports: Create visual diff reports that developers can act on
2. Theoretical Foundation
2.1 Why Visual Testing Matters
Functional tests verify behavior. Visual tests verify appearance. Many bugs slip through functional tests:
What Functional Tests Miss

| Functional check | Why the test passes | What the user sees | Why it is still broken |
|---|---|---|---|
| Button works | Click handler fires | Button hidden behind overlay | User can't see it |
| Form validates | Error message set | Form text is white on white | Error is invisible |
| Modal opens | DOM element present | Modal is 10,000px wide | Completely broken UI |
| CSS property set | Style is applied | CSS has conflicting rules | Wrong style wins the cascade |
2.2 Image Comparison Algorithms
Three main approaches to comparing images:
Comparison Algorithm Spectrum

| | Pixel-by-Pixel | Perceptual Hashing | Structural |
|---|---|---|---|
| How it works | Compare each pixel's RGB value | Hash visual features, then compare hashes | Compare layout and hierarchy |
| Pros | Exact matching; simple to implement; fast computation | Tolerant of minor changes; fast comparison; rotation tolerant (some hash variants) | Semantic understanding; resize tolerant |
| Cons | Anti-aliasing noise; font rendering diffs; false positives | May miss subtle changes; hash collisions | Complex to implement; slow |
| Use when | Pixel-perfect output is required; the environment is controlled | General similarity is acceptable; cross-browser testing | Layout testing; responsive design |
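To make the perceptual-hashing column concrete, here is a minimal sketch of one simple variant, an average hash compared by Hamming distance. It assumes the screenshot has already been decoded to 8-bit grayscale; the `GrayImage` type and the function names are illustrative, not part of any library.

```typescript
// Assumption: the screenshot has already been decoded to 8-bit grayscale.
interface GrayImage {
  width: number;
  height: number;
  data: Uint8Array; // one byte per pixel, row-major
}

// Shrink the image to size x size cells by averaging the pixels in each cell,
// so that fine detail (anti-aliasing, small text changes) is smoothed away.
function downscale(img: GrayImage, size: number): number[] {
  const cells: number[] = [];
  for (let gy = 0; gy < size; gy++) {
    const y0 = Math.floor((gy * img.height) / size);
    const y1 = Math.max(y0 + 1, Math.floor(((gy + 1) * img.height) / size));
    for (let gx = 0; gx < size; gx++) {
      const x0 = Math.floor((gx * img.width) / size);
      const x1 = Math.max(x0 + 1, Math.floor(((gx + 1) * img.width) / size));
      let sum = 0;
      for (let y = y0; y < y1; y++) {
        for (let x = x0; x < x1; x++) sum += img.data[y * img.width + x];
      }
      cells.push(sum / ((y1 - y0) * (x1 - x0)));
    }
  }
  return cells;
}

// Average hash: one bit per cell, set when the cell is brighter than the mean.
function averageHash(img: GrayImage, size = 8): boolean[] {
  const cells = downscale(img, size);
  const mean = cells.reduce((a, b) => a + b, 0) / cells.length;
  return cells.map((v) => v > mean);
}

// Number of differing bits; 0 means the hashes are identical.
function hammingDistance(a: boolean[], b: boolean[]): number {
  if (a.length !== b.length) throw new Error("hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) distance++;
  return distance;
}
```

A small Hamming distance between the baseline hash and the current hash suggests the two captures look broadly alike. Note that this particular variant is not rotation tolerant, and it illustrates the "may miss subtle changes" trade-off: a small color shift will often leave every bit unchanged.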
2.3 The Flakiness Problem
Visual tests are notorious for flakiness. Common causes:
| Source | Example | Mitigation |
|---|---|---|
| Anti-aliasing | Font edges differ by 1 pixel | Threshold tolerance (e.g., 0.1%) |
| Font rendering | OS renders fonts differently | Use web fonts, consistent environment |
| Animations | Screenshot mid-animation | Wait for animations, disable them |
| Dynamic content | Timestamps, avatars | Mask known dynamic regions |
| Network timing | Images not loaded | Wait for network idle |
| Scroll position | Page scrolled differently | Reset scroll before capture |
| Viewport size | Browser chrome varies | Use consistent viewport size |
| Date/time | "Posted 2 minutes ago" | Freeze time or mask |
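The "mask known dynamic regions" mitigation can also be applied after capture, directly on the pixel data. The sketch below assumes both screenshots are already decoded to RGBA byte buffers and that the mask rectangles come from your own configuration; the `Rect` type and `applyMasks` name are hypothetical.

```typescript
// Hypothetical mask rectangle in image pixel coordinates.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Returns a copy of the RGBA buffer with every masked rectangle painted a
// fixed color. Applying the same masks to baseline and current images means
// those regions can never contribute to the diff.
function applyMasks(
  rgba: Uint8Array,
  imageWidth: number,
  imageHeight: number,
  masks: Rect[]
): Uint8Array {
  const out = new Uint8Array(rgba); // copy, so the input is not mutated
  for (const mask of masks) {
    // Clamp the rectangle to the image bounds.
    const x0 = Math.max(0, mask.x);
    const y0 = Math.max(0, mask.y);
    const x1 = Math.min(imageWidth, mask.x + mask.width);
    const y1 = Math.min(imageHeight, mask.y + mask.height);
    for (let y = y0; y < y1; y++) {
      for (let x = x0; x < x1; x++) {
        const i = (y * imageWidth + x) * 4;
        out[i] = 255;     // R
        out[i + 1] = 0;   // G
        out[i + 2] = 255; // B (magenta makes masked areas obvious in reports)
        out[i + 3] = 255; // A
      }
    }
  }
  return out;
}
```

Masking pixels after capture complements hiding elements in the page before capture: it works even when the dynamic region has no stable selector, at the cost of hard-coding coordinates per viewport.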
2.4 Baseline Management
──────────────────────────────────────────────────────────────────────
  Baseline Workflow
──────────────────────────────────────────────────────────────────────

  First Run (No Baseline)              Subsequent Runs
  ───────────────────────              ───────────────

  Capture screenshot                   Capture new screenshot
          │                                     │
          ▼                                     ▼
  Save as baseline ──────────────────▶ Compare to baseline
  (reviewed by human)                           │
                                      ┌─────────┴─────────┐
                                      │                   │
                                    Match              Differ
                                      │                   │
                                      ▼                   ▼
                                    PASS            Human Review
                                                ┌─────────┴─────────┐
                                                │                   │
                                           Intentional         Regression
                                                │                   │
                                                ▼                   ▼
                                         Update baseline         Fix bug

──────────────────────────────────────────────────────────────────────
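The workflow above is a small decision tree, which can be sketched as a pure function. The `RunState`, `Review`, and `Action` names are illustrative assumptions; the branches mirror the diagram.

```typescript
// Hypothetical types describing where a single comparison stands.
type Review = "pending" | "intentional" | "regression";
type Action =
  | "create-baseline"
  | "pass"
  | "await-review"
  | "update-baseline"
  | "fix-bug";

interface RunState {
  baselineExists: boolean;
  matchesBaseline: boolean; // ignored when no baseline exists yet
  review: Review;           // outcome of human review, if any
}

// Decide the next step for one page/viewport pair.
function nextAction(state: RunState): Action {
  if (!state.baselineExists) return "create-baseline"; // first run
  if (state.matchesBaseline) return "pass";
  if (state.review === "intentional") return "update-baseline";
  if (state.review === "regression") return "fix-bug";
  return "await-review"; // differs, but nobody has looked yet
}
```

Keeping this logic free of file and network access makes it easy to unit test separately from capture and comparison.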
2.5 Claude's Visual Reasoning Advantage
Traditional visual testing tools give you pixel counts. Claude can tell you what changed:
Traditional Tool:
"87.3% similar. 12.7% pixels differ."
Claude's Analysis:
"The pricing cards have been rearranged. The 'Pro' tier moved
from position 2 to position 3. The 'Enterprise' card now shows
'Contact Us' instead of a price. The overall color scheme and
layout remain consistent. This appears to be an intentional
product change, not a regression."
This semantic understanding is the unique value proposition of this project.
3. Project Specification
3.1 What You Will Build
A visual regression testing system that:
- Captures screenshots at consistent viewport sizes
- Compares new captures against baseline images
- Generates diff visualizations showing changes
- Uses Claude to analyze and explain differences
- Produces HTML reports with side-by-side comparisons
- Supports multiple viewports (desktop, tablet, mobile)
3.2 Functional Requirements
- Consistent Capture
  - Set viewport to specific dimensions
  - Wait for network idle and animations
  - Mask known dynamic content
  - Capture full page or specific regions
- Baseline Management
  - Store baselines with deterministic, convention-based names
  - Support baseline creation on first run
  - Enable baseline updates via approval
  - Version control integration (Git LFS)
- Image Comparison
  - Pixel-by-pixel diff with configurable threshold
  - Generate highlighted diff images
  - Calculate similarity percentage
  - Support region-specific comparisons
- Semantic Analysis
  - Use Claude to analyze diff images
  - Explain what changed in human terms
  - Suggest whether change is intentional
  - Identify regression patterns
- Report Generation
  - HTML report with side-by-side images
  - Filterable by pass/fail status
  - Exportable to CI systems
  - Links to specific test results
3.3 Example Output
You: Run visual regression tests on our staging site
Claude: I'll capture screenshots and compare against baselines.
[Setting viewport: 1920x1080 (Desktop)]
[Navigating to /home...]
[Waiting for animations...]
[Capturing screenshot...]
[Comparing against baseline...]
================================================================================
VISUAL REGRESSION REPORT - 2024-12-22 14:30 UTC
================================================================================
SUMMARY
-------
Pages Tested: 5
Viewports: Desktop (1920x1080), Tablet (768x1024), Mobile (375x667)
Total Comparisons: 15
RESULTS
-------
/home
├── Desktop: PASS (99.8% similar)
│   Minor anti-aliasing difference in header font
│
├── Tablet: WARN (98.2% similar)
│   Difference: Button alignment shifted 3px left
│   [Screenshot shows highlighted region]
│   Claude Analysis: "The navigation buttons appear to have
│   shifted slightly. This may be an intentional responsive
│   adjustment or an unintended side effect of CSS changes."
│
└── Mobile: PASS (99.9% similar)
/pricing
├── Desktop: FAIL (87.3% similar)
│   [Side-by-side diff image generated]
│
│   Claude Analysis:
│   "Significant visual changes detected on the pricing page:
│
│   1. Card Reordering: The 'Pro' tier has moved from position 2
│      to position 3. The 'Enterprise' tier is now in position 2.
│
│   2. Price Change: The 'Pro' tier shows $19/mo instead of $15/mo
│
│   3. New Badge: 'Most Popular' badge added to 'Enterprise' tier
│
│   These appear to be intentional product changes rather than
│   regressions. Recommend reviewing with product team."
│
│   Action needed: Approve new baseline or revert changes
│
├── Tablet: FAIL (85.1% similar)
│   [Same issues as Desktop, plus responsive layout shift]
│
└── Mobile: WARN (96.4% similar)
    Font size appears smaller than baseline
/about
└── All viewports: PASS (>99.5% similar)
/contact
└── All viewports: PASS (>99.5% similar)
/login
└── All viewports: PASS (>99.5% similar)
================================================================================
ACTIONS REQUIRED
================================================================================
1. /pricing: Major visual changes detected
- Review changes with product team
- If intentional: Run with --update-baseline
- If regression: Investigate CSS/component changes
2. /home (Tablet): Minor alignment shift
- Low priority: Likely responsive adjustment
================================================================================
Report saved: ./visual-reports/2024-12-22-143000/index.html
Diff images: ./visual-reports/2024-12-22-143000/diffs/
4. Solution Architecture
4.1 High-Level Design
──────────────────────────────────────────────────────────────────────────
  Visual Regression Testing System
──────────────────────────────────────────────────────────────────────────

┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Capture    │──▶│  Comparison  │──▶│   Analysis   │──▶│    Report    │
│   Engine     │   │    Engine    │   │   (Claude)   │   │  Generator   │
└──────────────┘   └──────────────┘   └──────────────┘   └──────────────┘
       │                  │                  │                  │
       ▼                  ▼                  ▼                  ▼
Chrome MCP tools   Image processing   Visual reasoning   HTML/JSON
Viewport control   Diff algorithms    Semantic analysis  reports
Wait strategies    Threshold calc     Change explanation Artifacts

Baseline Store
/baselines
├── home_desktop_1920x1080.png
├── home_tablet_768x1024.png
├── home_mobile_375x667.png
├── pricing_desktop_1920x1080.png
└── ...

──────────────────────────────────────────────────────────────────────────
4.2 Screenshot Capture Flow
──────────────────────────────────────────────────────────────────────
  Screenshot Capture Flow
──────────────────────────────────────────────────────────────────────

  Start
    │
    ▼
  1. Set viewport size
       resize_window(1920, 1080)
    │
    ▼
  2. Navigate to URL
       navigate(url)
    │
    ▼
  3. Wait for page stability
       - Network idle (no pending reqs)
       - Animations complete
       - Fixed delay (safety margin)
    │
    ▼
  4. Apply masking (if configured)
       - Hide timestamps
       - Hide avatars
       - Hide dynamic ads
    │
    ▼
  5. Reset scroll position
       Ensure consistent starting point
    │
    ▼
  6. Capture screenshot
       computer(action: "screenshot")
    │
    ▼
  Save with consistent naming
  {page}_{viewport}_{width}x{height}.png

──────────────────────────────────────────────────────────────────────
4.3 Comparison Algorithm
──────────────────────────────────────────────────────────────────────
  Image Comparison Algorithm
──────────────────────────────────────────────────────────────────────

  Input: baseline_image, current_image

  Step 1: Dimension Check
  ───────────────────────
  if (baseline.dimensions != current.dimensions):
      return FAIL("Dimensions changed: {old} -> {new}")

  Step 2: Pixel-by-Pixel Comparison
  ─────────────────────────────────
  different_pixels = 0
  diff_image = create_empty_image(dimensions)

  for each pixel (x, y):
      baseline_color = baseline.get_pixel(x, y)
      current_color = current.get_pixel(x, y)

      if (color_distance(baseline_color, current_color) > tolerance):
          different_pixels++
          diff_image.set_pixel(x, y, HIGHLIGHT_COLOR)
      else:
          diff_image.set_pixel(x, y, current_color.grayscale())

  Step 3: Calculate Similarity
  ────────────────────────────
  total_pixels = width * height
  similarity = (total_pixels - different_pixels) / total_pixels * 100

  Step 4: Apply Thresholds
  ────────────────────────
  if (similarity >= 99.5%): return PASS
  if (similarity >= 95.0%): return WARN
  else: return FAIL

  Output: (result, similarity_percentage, diff_image)

──────────────────────────────────────────────────────────────────────
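The four steps above translate fairly directly into TypeScript. This is a minimal sketch that assumes both images are already decoded into RGBA byte buffers (decoding PNGs is left to whatever image library you choose). The `RgbaImage` and `CompareResult` types, the Euclidean color distance, and the default tolerance of 16 are illustrative choices, not requirements of the project.

```typescript
interface RgbaImage {
  width: number;
  height: number;
  data: Uint8Array; // 4 bytes per pixel (R, G, B, A), row-major
}

type Verdict = "PASS" | "WARN" | "FAIL";

interface CompareResult {
  verdict: Verdict;
  similarity: number;     // percentage, 0 to 100
  diff: RgbaImage | null; // null when the dimensions differ
  reason?: string;
}

const HIGHLIGHT: readonly [number, number, number, number] = [255, 0, 0, 255];

function compareImages(
  baseline: RgbaImage,
  current: RgbaImage,
  tolerance = 16 // assumed default: max RGB distance treated as "same"
): CompareResult {
  // Step 1: dimension check.
  if (baseline.width !== current.width || baseline.height !== current.height) {
    const reason =
      `Dimensions changed: ${baseline.width}x${baseline.height}` +
      ` -> ${current.width}x${current.height}`;
    return { verdict: "FAIL", similarity: 0, diff: null, reason };
  }

  // Step 2: pixel-by-pixel comparison, building the diff image as we go.
  const total = baseline.width * baseline.height;
  const diff = new Uint8Array(total * 4);
  let different = 0;
  for (let p = 0; p < total; p++) {
    const i = p * 4;
    const dr = baseline.data[i] - current.data[i];
    const dg = baseline.data[i + 1] - current.data[i + 1];
    const db = baseline.data[i + 2] - current.data[i + 2];
    if (Math.sqrt(dr * dr + dg * dg + db * db) > tolerance) {
      different++;
      diff.set(HIGHLIGHT, i);
    } else {
      // Unchanged pixels are shown in grayscale so highlights stand out.
      const gray = Math.round(
        0.299 * current.data[i] +
          0.587 * current.data[i + 1] +
          0.114 * current.data[i + 2]
      );
      diff.set([gray, gray, gray, 255], i);
    }
  }

  // Step 3: similarity percentage (an empty image counts as identical).
  const similarity = total === 0 ? 100 : ((total - different) * 100) / total;

  // Step 4: thresholds from the algorithm above.
  const verdict: Verdict =
    similarity >= 99.5 ? "PASS" : similarity >= 95 ? "WARN" : "FAIL";

  return {
    verdict,
    similarity,
    diff: { width: current.width, height: current.height, data: diff },
  };
}
```

The per-pixel `tolerance` absorbs anti-aliasing noise, while the 99.5% and 95% thresholds decide how many changed pixels are acceptable overall; tuning the two separately is usually easier than relying on one of them alone.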
4.4 Claude Analysis Integration
──────────────────────────────────────────────────────────────────────
  Claude Visual Analysis Workflow
──────────────────────────────────────────────────────────────────────

  When the result is WARN or FAIL (similarity < 99.5%):

  1. Prepare Context
  ──────────────────
     - Page URL and viewport info
     - Similarity percentage
     - Region of largest difference

  2. Show Claude the Images
  ─────────────────────────
     Claude can "see" the baseline, current, and diff images
     by referencing the screenshots captured

  3. Request Analysis
  ───────────────────
     Prompt Claude to:
     - Describe what visually changed
     - Identify if changes look intentional
     - Suggest whether to update baseline
     - Note potential regression patterns

  4. Extract Structured Response
  ──────────────────────────────
     {
       "changes": [
         { "type": "layout", "description": "Cards reordered" },
         { "type": "content", "description": "Price updated" }
       ],
       "likely_intentional": true,
       "regression_risk": "low",
       "recommendation": "Review with product team before updating"
     }

──────────────────────────────────────────────────────────────────────
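Step 4 is easier to rely on if the structured response is validated before it reaches the report. The sketch below parses a reply and checks it against the shape shown above. The type names are illustrative, and the `"medium"` and `"high"` risk levels are assumptions (the example only shows `"low"`).

```typescript
type RegressionRisk = "low" | "medium" | "high"; // "medium"/"high" are assumed

interface VisualChange {
  type: string;
  description: string;
}

interface VisualAnalysis {
  changes: VisualChange[];
  likely_intentional: boolean;
  regression_risk: RegressionRisk;
  recommendation: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isRisk(value: unknown): value is RegressionRisk {
  return value === "low" || value === "medium" || value === "high";
}

// Extracts the outermost JSON object from a reply (the model may surround it
// with prose) and validates every field before returning it.
function parseAnalysis(reply: string): VisualAnalysis {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end < start) throw new Error("no JSON object in reply");

  const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
  if (!isRecord(parsed)) throw new Error("analysis must be a JSON object");

  const rawChanges = parsed["changes"];
  if (!Array.isArray(rawChanges)) throw new Error("changes must be an array");
  const changes = rawChanges.map((item: unknown): VisualChange => {
    if (!isRecord(item)) throw new Error("each change must be an object");
    const type = item["type"];
    const description = item["description"];
    if (typeof type !== "string" || typeof description !== "string") {
      throw new Error("change needs string type and description");
    }
    return { type, description };
  });

  const intentional = parsed["likely_intentional"];
  const risk = parsed["regression_risk"];
  const recommendation = parsed["recommendation"];
  if (typeof intentional !== "boolean") throw new Error("likely_intentional must be boolean");
  if (!isRisk(risk)) throw new Error("regression_risk must be low, medium, or high");
  if (typeof recommendation !== "string") throw new Error("recommendation must be a string");

  return {
    changes,
    likely_intentional: intentional,
    regression_risk: risk,
    recommendation,
  };
}
```

Failing loudly on a malformed reply is deliberate: a report that silently shows a half-parsed analysis is harder to trust than one that marks the analysis as unavailable.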
5. Implementation Guide
5.1 Chrome MCP Tools for Visual Testing
resize_window - Set Viewport
// Resize the browser window to the target dimensions
// (the page viewport may be slightly smaller because of browser UI)
mcp__claude-in-chrome__resize_window({
width: 1920,
height: 1080,
tabId: 12345
})
// Common viewport configurations
const VIEWPORTS = {
desktop: { width: 1920, height: 1080 },
tablet: { width: 768, height: 1024 },
mobile: { width: 375, height: 667 }
};
computer - Screenshot Capture
// Capture current viewport
mcp__claude-in-chrome__computer({
action: "screenshot",
tabId: 12345
})
// Returns: Image data that Claude can reference
// Wait for animations/network
mcp__claude-in-chrome__computer({
action: "wait",
duration: 2, // seconds
tabId: 12345
})
// Scroll to top before capture
mcp__claude-in-chrome__computer({
action: "scroll",
scroll_direction: "up",
scroll_amount: 10, // Large scroll; repeat if a long page is still not at the top
coordinate: [960, 540],
tabId: 12345
})
computer (zoom action) - Capture Specific Regions
// Capture specific region for detailed comparison
mcp__claude-in-chrome__computer({
action: "zoom",
region: [100, 200, 500, 400], // [x0, y0, x1, y1]
tabId: 12345
})
// Useful for comparing specific components
javascript_tool - Apply Masks
// Hide dynamic content before screenshot
mcp__claude-in-chrome__javascript_tool({
action: "javascript_exec",
text: `
// Hide timestamps
document.querySelectorAll('[data-testid="timestamp"]')
.forEach(el => el.style.visibility = 'hidden');
// Hide avatars
document.querySelectorAll('.user-avatar')
.forEach(el => el.style.visibility = 'hidden');
// Hide ads
document.querySelectorAll('.ad-container')
.forEach(el => el.style.display = 'none');
`,
tabId: 12345
})
5.2 Implementation Phases
Phase 1: Basic Capture (Days 1-3)
Goal: Capture consistent screenshots at multiple viewports.
Tasks:
- Implement viewport setting
- Navigate and wait for load
- Capture screenshot
- Save with consistent naming
Checkpoint: Capture same page 5 times, verify identical images.
Expected file structure:
/captures/
home_desktop_1920x1080.png
home_tablet_768x1024.png
home_mobile_375x667.png
Phase 2: Baseline Management (Days 4-5)
Goal: Store and retrieve baseline images.
Tasks:
- Create baseline directory structure
- Implement first-run baseline creation
- Add baseline lookup logic
- Support baseline updates
Checkpoint: First run creates baseline, second run compares.
Phase 3: Image Comparison (Days 6-8)
Goal: Compare images and calculate similarity.
Tasks:
- Implement pixel comparison (Claude's vision can supplement it with a qualitative check, but it does not produce pixel counts)
- Calculate similarity percentage
- Generate diff visualization
- Apply pass/warn/fail thresholds
Checkpoint: Detect intentionally modified page as different.
Phase 4: Dynamic Content Handling (Days 9-10)
Goal: Mask dynamic content to reduce flakiness.
Tasks:
- Define masking configuration
- Inject CSS/JS to hide dynamic elements
- Test with timestamp-heavy pages
- Document masking patterns
Checkpoint: Same page with different timestamps shows as match.
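One way to approach the "define masking configuration" and "inject CSS/JS" tasks is to generate the injected script from a small config object. In this sketch the `MaskConfig` shape and both function names are hypothetical; the generated string is what you would pass as the `text` argument of `javascript_tool`.

```typescript
// Hypothetical masking configuration: CSS selectors grouped by strategy.
interface MaskConfig {
  hide: string[];   // visibility: hidden (element keeps its space in the layout)
  remove: string[]; // display: none (element is removed from the layout)
}

// Build the stylesheet text for the configured selectors.
function buildMaskCss(config: MaskConfig): string {
  const rules: string[] = [];
  if (config.hide.length > 0) {
    rules.push(`${config.hide.join(", ")} { visibility: hidden !important; }`);
  }
  if (config.remove.length > 0) {
    rules.push(`${config.remove.join(", ")} { display: none !important; }`);
  }
  return rules.join("\n");
}

// Build a self-contained script that injects the stylesheet into the page.
// JSON.stringify produces a safely quoted JavaScript string literal, so
// selectors containing quotes do not break the generated code.
function buildMaskScript(config: MaskConfig): string {
  const cssLiteral = JSON.stringify(buildMaskCss(config));
  return [
    "(() => {",
    "  const style = document.createElement('style');",
    "  style.setAttribute('data-visual-test-mask', '');",
    `  style.textContent = ${cssLiteral};`,
    "  document.head.appendChild(style);",
    "})();",
  ].join("\n");
}
```

Prefer `hide` for elements whose size is stable (timestamps, avatars), since the surrounding layout stays put; `remove` changes the layout, so it is only safe when the element's size itself varies between runs, as with rotating ads.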
Phase 5: Claude Analysis Integration (Days 11-12)
Goal: Get semantic analysis of differences.
Tasks:
- Prepare comparison context for Claude
- Show Claude baseline and current images
- Request structured analysis
- Integrate analysis into report
Checkpoint: Claude correctly identifies type of visual change.
Phase 6: Report Generation (Days 13-14)
Goal: Create actionable HTML reports.
Tasks:
- Design report template
- Include side-by-side images
- Add Claude analysis sections
- Support filtering and navigation
Checkpoint: Complete HTML report generated for test suite.
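As a starting point for the report template, the following sketch renders a list of results into a single HTML page with a summary line, side-by-side images, and the analysis text. The `TestResult` shape, the image path fields, and the markup are all assumptions to adapt to your own report design.

```typescript
type Verdict = "PASS" | "WARN" | "FAIL";

// Hypothetical result record produced by the comparison and analysis stages.
interface TestResult {
  page: string;
  viewport: string;
  verdict: Verdict;
  similarity: number; // percentage
  analysis?: string;  // Claude's explanation, when requested
  images?: { baseline: string; current: string; diff: string }; // relative paths
}

// Escape text so page names and analysis cannot inject markup into the report.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderReport(results: TestResult[]): string {
  const count = (v: Verdict) => results.filter((r) => r.verdict === v).length;

  const rows = results.map((r) => {
    const images = r.images
      ? [r.images.baseline, r.images.current, r.images.diff]
          .map((src) => `<img src="${escapeHtml(src)}" width="240">`)
          .join("")
      : "";
    // The class and data-status attributes give CSS/JS a hook for filtering.
    return (
      `<tr class="${r.verdict.toLowerCase()}" data-status="${r.verdict}">` +
      `<td>${escapeHtml(r.page)}</td><td>${escapeHtml(r.viewport)}</td>` +
      `<td>${r.verdict} (${r.similarity.toFixed(2)}%)</td>` +
      `<td>${images}</td><td>${escapeHtml(r.analysis ?? "")}</td></tr>`
    );
  });

  return [
    "<!doctype html>",
    '<html><head><meta charset="utf-8"><title>Visual Regression Report</title></head><body>',
    `<p>PASS: ${count("PASS")} | WARN: ${count("WARN")} | FAIL: ${count("FAIL")}</p>`,
    "<table><tr><th>Page</th><th>Viewport</th><th>Result</th>" +
      "<th>Baseline / Current / Diff</th><th>Analysis</th></tr>",
    ...rows,
    "</table></body></html>",
  ].join("\n");
}
```

Writing the same `results` array out as JSON next to the HTML gives CI systems a machine-readable artifact without a second code path.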
6. Testing Strategy
6.1 Test Scenarios
| Scenario | Setup | Expected Result |
|---|---|---|
| Identical pages | Same page, two runs | 100% match, PASS |
| Color change | Change button color | <99% match, diff highlights button |
| Layout shift | Move element | <95% match, structural change noted |
| Content change | Update text | <99% match, text region highlighted |
| Dynamic content | Page with timestamps | Masked, shows match |
| Viewport change | Test all 3 sizes | Independent results per viewport |
6.2 Validation Approaches
- Known changes: Modify a test page, verify detection
- Anti-flakiness: Run same page 10 times, all should match
- Threshold testing: Create small changes, verify correct classification
- Cross-viewport: Ensure responsive issues are caught
7. Common Pitfalls & Debugging
7.1 Frequent Issues
| Issue | Symptom | Solution |
|---|---|---|
| Flaky anti-aliasing | Random 0.1% diffs | Increase tolerance threshold |
| Animations captured | Different frame each time | Add wait, disable animations via CSS |
| Network timing | Images missing | Wait for network idle |
| Font differences | Text regions differ | Use web fonts consistently |
| Scroll position | Different content visible | Reset scroll before capture |
7.2 Debugging Commands
// Check network activity
mcp__claude-in-chrome__read_network_requests({
tabId: 12345,
urlPattern: "" // All requests
})
// Check for JavaScript errors
mcp__claude-in-chrome__read_console_messages({
tabId: 12345,
pattern: "error"
})
// Verify page state
mcp__claude-in-chrome__read_page({
tabId: 12345,
filter: "all"
})
// Take debug screenshot
mcp__claude-in-chrome__computer({
action: "screenshot",
tabId: 12345
})
7.3 Stabilization Techniques
// Disable animations via CSS injection
mcp__claude-in-chrome__javascript_tool({
action: "javascript_exec",
text: `
const style = document.createElement('style');
style.textContent = '*, *::before, *::after { ' +
'animation-duration: 0s !important; ' +
'transition-duration: 0s !important; ' +
'}';
document.head.appendChild(style);
`,
tabId: 12345
})
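// Confirm the main content has rendered. find locates elements but does not
// block until they appear, so wait and retry if nothing is found.
mcp__claude-in-chrome__find({
tabId: 12345,
query: "main content area"
})
// Then proceed with screenshot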
8. Real World Outcome
When you complete this project, here's a full visual regression test session:
You: Run visual regression tests on staging.example.com
Claude: I'll run visual regression tests across all configured pages and viewports.
================================================================================
VISUAL REGRESSION TEST SESSION - 2024-12-22 14:30 UTC
================================================================================
Configuration:
Base URL: https://staging.example.com
Viewports: Desktop (1920x1080), Tablet (768x1024), Mobile (375x667)
Pages: /home, /pricing, /about, /contact, /login
Threshold: >=99.5% (PASS), >=95% (WARN), <95% (FAIL)
Starting tests...
[1/15] /home @ Desktop
Setting viewport: 1920x1080... OK
Navigating... OK
Waiting for stability... OK (2.1s)
Disabling animations... OK
Capturing screenshot... OK
Comparing to baseline... 99.87% similar
Result: PASS
[2/15] /home @ Tablet
Setting viewport: 768x1024... OK
Navigating... OK
Waiting for stability... OK (1.8s)
Capturing screenshot... OK
Comparing to baseline... 98.23% similar
Result: WARN - Minor differences detected
Requesting Claude analysis...
Analysis: "The navigation menu hamburger icon has shifted
2-3 pixels to the right compared to baseline. The main
content area appears identical. This is likely a minor
responsive CSS adjustment. Low regression risk."
[3/15] /home @ Mobile
...
Result: PASS (99.92% similar)
[4/15] /pricing @ Desktop
Setting viewport: 1920x1080... OK
Navigating... OK
Waiting for stability... OK (2.4s)
Capturing screenshot... OK
Comparing to baseline... 72.14% similar
Result: FAIL - Significant visual changes
Requesting Claude analysis...
Analysis: "Major visual changes detected on the pricing page:
OBSERVED CHANGES:
1. Card Layout: The three pricing cards have been completely
redesigned. They now feature:
- Rounded corners (was: square)
- Drop shadows (was: flat)
- Gradient backgrounds (was: solid colors)
2. Pricing Structure:
- 'Starter' tier: $9/mo (was: $12/mo) - price decrease
- 'Pro' tier: $29/mo (was: $25/mo) - price increase
- 'Enterprise': Now shows 'Contact Sales' (was: $99/mo)
3. New Elements:
- 'Most Popular' badge on Pro tier
- Feature comparison table below cards
- Annual billing toggle (new)
ASSESSMENT:
These changes appear to be a significant product redesign,
not a regression. The visual styling is consistent and
intentional. Recommend:
1. Confirm with product team this is the expected new design
2. If confirmed, update baseline with: --update-baseline pricing
3. Consider adding visual regression tests for new elements"
[5/15] /pricing @ Tablet
...
Result: FAIL (68.92% similar)
Analysis: "Same redesign as desktop, plus responsive layout
changes - cards now stack vertically. Consistent with
intentional redesign."
... [remaining tests] ...
================================================================================
FINAL SUMMARY
================================================================================
Total Tests: 15
PASS: 11 (73.3%)
WARN: 2 (13.3%)
FAIL: 2 (13.3%)
PASS Details (11):
/home (Desktop, Mobile)
/about (All viewports)
/contact (All viewports)
/login (All viewports)
WARN Details (2):
/home @ Tablet - Minor hamburger icon shift
/pricing @ Mobile - Same as Desktop + responsive
FAIL Details (2):
/pricing @ Desktop - Major redesign
/pricing @ Tablet - Major redesign
================================================================================
RECOMMENDED ACTIONS
================================================================================
PRIORITY 1 - Requires Immediate Decision:
/pricing: Major visual redesign detected across all viewports
Actions:
a) If redesign is approved:
Run: visual-test --update-baseline pricing
b) If redesign is unexpected:
Investigate recent commits to pricing page components
Check staging deployment for unintended changes
PRIORITY 2 - Monitor:
/home @ Tablet: Minor alignment shift
Action: Low priority, but document in CSS changelog
================================================================================
Artifacts Generated:
Report: ./visual-reports/2024-12-22-143000/index.html
Diffs: ./visual-reports/2024-12-22-143000/diffs/
Open report? [y/N]
9. The Core Question You're Answering
"How do you detect unintended visual changes while ignoring acceptable variations?"
Visual testing is deceptively hard. The core challenges:
- Noise vs Signal: Anti-aliasing and font rendering create false positives
- Intentional vs Regression: Not all changes are bugs
- Reproducibility: Screenshots must be consistent across runs
- Actionability: Pixel counts don't tell you what to fix
Claude's visual reasoning transforms visual testing from "87% similar" to "The pricing cards were redesigned with new colors and a popular badge was added - this appears intentional."
10. Concepts You Must Understand First
Before starting this project, ensure you understand:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Image comparison basics | Core of diff algorithm | ImageMagick documentation |
| Responsive design | Viewport testing | "Responsive Web Design" by Marcotte |
| CSS animations | Cause of flakiness | MDN - CSS Animations |
| Git LFS | Storing large baselines | Git LFS documentation |
| CI/CD pipelines | Integration context | "Continuous Delivery" by Humble |
| Perceptual hashing | Alternative comparison | pHash documentation |
11. Questions to Guide Your Design
Work through these questions BEFORE implementing:
- Baseline Storage: Where do baselines live? Git? Cloud storage? How do you handle large images?
- Threshold Selection: What percentage constitutes pass/warn/fail? Should this be configurable per page?
- Dynamic Content: How do you identify and mask timestamps, avatars, ads? Manual config or auto-detection?
- Multi-Browser: This project uses Chrome. How would you extend to Firefox/Safari?
- CI Integration: How would this run in a CI pipeline? What exit codes? What artifacts?
- Baseline Updates: Who can approve baseline updates? How is this tracked?
12. Thinking Exercise
Before implementing, consider this scenario:
You're testing an e-commerce site. The pricing page shows:
- Current prices (may change)
- "Sale ends in 2:45:32" countdown
- User avatar (logged in)
- Recently viewed items (personalized)
- Ad banner (rotates)
Questions to answer on paper:
- Which elements should be masked? Why?
- How would you mask each element?
- What if prices change daily? Is that a test or data issue?
- How would you test the page structure without testing content?
- What's the difference between "price changed" and "price formatting broke"?
13. The Interview Questions They'll Ask
After completing this project, you'll be ready for:
- "How would you handle dynamic content in visual tests?"
  - Expected: Masking strategies, stable identifiers, content freezing
  - Bonus: Discuss boundary between visual and data testing
- "What's your strategy for cross-browser visual testing?"
  - Expected: Separate baselines per browser, understand rendering differences
  - Bonus: Discuss acceptable tolerance per browser
- "How do you reduce flakiness in screenshot comparisons?"
  - Expected: Wait strategies, animation disabling, threshold tuning
  - Bonus: Discuss deterministic capture environments (Docker)
- "How would you implement this in a CI pipeline?"
  - Expected: Headless capture, baseline storage, failure handling
  - Bonus: Discuss parallel execution, artifact storage
- "What's the tradeoff between pixel-perfect and perceptual testing?"
  - Expected: Sensitivity vs noise, use cases for each
  - Bonus: Discuss hybrid approaches
14. Hints in Layers
If you get stuck, reveal hints progressively:
Hint 1: Consistent Viewport
Always set viewport BEFORE navigating:
// Set size first
mcp__claude-in-chrome__resize_window({ width: 1920, height: 1080, tabId })
// Then navigate
mcp__claude-in-chrome__navigate({ url: targetUrl, tabId })
// Then wait
mcp__claude-in-chrome__computer({ action: "wait", duration: 2, tabId })
// Then screenshot
mcp__claude-in-chrome__computer({ action: "screenshot", tabId })
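Resizing first means the page lays out, and evaluates its media queries, at the target size from the start, which keeps capture dimensions consistent.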
Hint 2: Baseline Naming
Use deterministic, convention-based naming (the name is derived from the page and viewport, not from the image content):
{page_path}_{viewport}_{width}x{height}.png
Examples:
home_desktop_1920x1080.png
pricing_tablet_768x1024.png
contact-form_mobile_375x667.png
For page paths with slashes:
/product/123 -> product-123_desktop_1920x1080.png
This makes baseline lookup deterministic.
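The naming convention above can be captured in one small helper. This is a sketch: the `Viewport` type and the `baselineFileName` function are illustrative, and using `root` for the site root `/` is an assumption the original convention does not cover.

```typescript
interface Viewport {
  name: string; // e.g. "desktop", "tablet", "mobile"
  width: number;
  height: number;
}

// Turn a page path and viewport into a deterministic baseline file name,
// e.g. /product/123 -> product-123_desktop_1920x1080.png
function baselineFileName(pagePath: string, viewport: Viewport): string {
  const slug =
    pagePath
      .replace(/^\/+|\/+$/g, "")         // trim leading and trailing slashes
      .replace(/\/+/g, "-")              // inner slashes become dashes
      .replace(/[^A-Za-z0-9._-]/g, "-")  // replace anything unsafe in a file name
    || "root";                           // assumed name for the site root "/"
  return `${slug}_${viewport.name}_${viewport.width}x${viewport.height}.png`;
}
```

One caveat with this scheme: `/contact/form` and `/contact-form` map to the same file name. If your site has such paths, pick a different separator for slashes or keep the directory structure inside `/baselines`.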
Hint 3: Comparison Without Image Library
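Claude can compare images visually. As a fallback when no image library is available:
- Capture current screenshot
- Show Claude both baseline and current images
- Ask Claude to identify and describe differences
- Use Claude's response as the "diff"
This is simpler and gives semantic analysis for free, but it yields a qualitative description rather than a similarity percentage and can miss small pixel-level shifts, so the pass/warn/fail thresholds still need a pixel comparison.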
Hint 4: Masking Dynamic Content
Inject CSS to hide dynamic elements:
mcp__claude-in-chrome__javascript_tool({
action: "javascript_exec",
text: `
// Create mask style
const style = document.createElement('style');
style.textContent = \`
[data-testid="timestamp"],
[data-testid="avatar"],
.ad-container,
.countdown-timer {
visibility: hidden !important;
}
\`;
document.head.appendChild(style);
`,
tabId
})
Elements remain in layout but don't affect visual comparison.
15. Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Visual testing concepts | "Practical Test-Driven Development" by Viktor Farcic | Ch. 9 |
| Image processing basics | "Digital Image Processing" by Gonzalez & Woods | Ch. 2-3 |
| CI/CD integration | "Continuous Delivery" by Humble & Farley | Ch. 5 |
| Responsive design | "Responsive Web Design" by Ethan Marcotte | All |
| Testing strategies | "The Art of Software Testing" by Myers | Ch. 6 |
| Percy documentation | Percy.io docs | Getting Started |
16. Extensions & Challenges
16.1 Beginner Extensions
- Multiple base URLs: Compare staging vs production
- Scheduled runs: Automated nightly visual regression
- Slack notifications: Alert on failures
16.2 Intermediate Extensions
- Component-level testing: Test specific components, not pages
- Animation testing: Capture GIFs instead of stills
- A11y overlay: Show accessibility issues on screenshots
16.3 Advanced Extensions
- AI-powered baseline updates: Auto-approve minor changes
- Cross-browser: Support Firefox via different MCP
- Performance correlation: Link visual changes to perf metrics
17. Learning Milestones
Track your progress through these checkpoints:
| Milestone | Description | Verification |
|---|---|---|
| 1. Consistent capture | Same page = same image | 5 captures identical |
| 2. Baseline storage | Save and retrieve baselines | Baseline persists across runs |
| 3. Diff detection | Identify changed pages | Modified page shows as different |
| 4. Dynamic masking | Timestamps don't cause diffs | Dynamic page shows as match |
| 5. Claude analysis | Semantic change description | Analysis explains what changed |
| 6. Report generation | Complete HTML report | Report includes all sections |
This guide was expanded from CLAUDE_CODE_MASTERY_40_PROJECTS.md. For the complete learning path, see the project index.