Project 15: “The Chrome Puppeteer” — Browser Automation

| Attribute | Value |
| --- | --- |
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | JavaScript / HTML |
| Software or Tool | chrome-devtools-mcp |
| Coolness Level | Level 5: Pure Magic |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Browser Automation |

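What you’ll build: A Kiro CLI setup that drives a real Chrome browser through an MCP server and asserts UI behavior.

Why it teaches E2E: Kiro navigates, interacts with, and checks a live page, which is the same loop an end-to-end test performs.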

Success criteria:

  • Kiro captures a screenshot and reports a UI state.

Real World Outcome

You’ll have Kiro CLI connected to Chrome via the Model Context Protocol, allowing natural language browser automation. You can ask “Go to example.com and click the login button” and watch it happen in real time.

1. MCP Server Configuration (in ~/.config/kiro/settings.json):

{
  "mcpServers": {
    "chrome": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {}
    }
  }
}
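
A quick shape check can catch typos in this file before Kiro is started. The sketch below assumes only the structure shown above (an mcpServers map whose entries have a command string and an args array); validateMcpServers is a hypothetical helper, not part of Kiro.

```javascript
// Hypothetical helper: checks that each entry under "mcpServers" has the
// fields used in the example config above. Not part of Kiro itself.
function validateMcpServers(config) {
  const servers = config && config.mcpServers;
  if (!servers || typeof servers !== 'object') {
    return ['missing "mcpServers" object'];
  }
  const problems = [];
  for (const [name, entry] of Object.entries(servers)) {
    if (!entry || typeof entry !== 'object') {
      problems.push(`${name}: entry must be an object`);
      continue;
    }
    if (typeof entry.command !== 'string' || entry.command === '') {
      problems.push(`${name}: "command" must be a non-empty string`);
    }
    if (!Array.isArray(entry.args) || !entry.args.every(a => typeof a === 'string')) {
      problems.push(`${name}: "args" must be an array of strings`);
    }
  }
  return problems;
}

const example = {
  mcpServers: {
    chrome: { command: 'npx', args: ['-y', '@modelcontextprotocol/server-puppeteer'], env: {} },
  },
};
console.log(validateMcpServers(example)); // prints []
```

An empty list means the entry has the expected shape; it does not prove that the server itself starts.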

2. Kiro CLI Session:

$ kiro
You: Go to https://news.ycombinator.com and tell me the top 3 story titles

[Tool Call] chrome.puppeteer_navigate(url="https://news.ycombinator.com")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_screenshot()
Tool Response: {"success": true, "image": "<base64>"}

[Tool Call] chrome.puppeteer_evaluate(expression=`
  Array.from(document.querySelectorAll('.titleline > a'))
    .slice(0, 3)
    .map(a => a.textContent)
`)
Tool Response: {
  "result": [
    "Show HN: I built a visual regex debugger",
    "Why SQLite uses B-trees instead of LSM trees",
    "The performance cost of virtual methods in C++"
  ]
}

Kiro: The top 3 stories on Hacker News right now are:
1. Show HN: I built a visual regex debugger
2. Why SQLite uses B-trees instead of LSM trees
3. The performance cost of virtual methods in C++

3. Visual Verification: A Chrome window opens automatically and you see:

  • Browser navigates to Hacker News
  • Page loads completely
  • Screenshot captured (visible in Kiro’s context)
  • JavaScript executed in page context
  • Results returned to Kiro

4. Advanced E2E Test Example:

You: Test the login flow on staging.myapp.com - use test@example.com / password123

[Tool Call] chrome.puppeteer_navigate(url="https://staging.myapp.com/login")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_fill(selector="#email", value="test@example.com")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_fill(selector="#password", value="password123")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_click(selector="button[type=submit]")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_wait_for_selector(selector=".dashboard-header")
Tool Response: {"success": true, "found": true}

[Tool Call] chrome.puppeteer_screenshot()
Tool Response: {"success": true, "image": "<base64>"}

Kiro: ✅ Login test passed:
- Email field filled successfully
- Password field filled successfully
- Submit button clicked
- Dashboard loaded (confirmed by presence of .dashboard-header)
- Screenshot captured showing logged-in state

What you’re seeing:

  • Natural language E2E testing - No Playwright/Selenium code to write
  • Visual feedback - Screenshots embedded in conversation
  • Selector debugging - Kiro can see the page and help fix selectors
  • Async handling - Automatic waiting for elements and navigation
  • Cross-site testing - Works on any public or localhost URL

Browserbase, Skyvern, and other agent-based browser automation systems build on the same idea: a model issues browser commands and reads back page state.
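
The login transcript above can be read as an ordered list of tool calls that stops at the first failure. Here is a minimal sketch of that control flow. The callTool function is an assumption: it stands in for whatever sends a call to the MCP server, and it is injected so the example runs without a browser. The tool names are copied from the transcript.

```javascript
// Runs tool calls in order and stops at the first failure.
// `callTool(name, args)` is a stand-in for the real MCP transport.
async function runFlow(steps, callTool) {
  const log = [];
  for (const step of steps) {
    let response;
    try {
      response = await callTool(step.tool, step.args);
    } catch (err) {
      response = { success: false, error: String(err.message || err) };
    }
    log.push({ tool: step.tool, success: response.success === true });
    if (response.success !== true) {
      return { passed: false, failedAt: step.tool, log };
    }
  }
  return { passed: true, failedAt: null, log };
}

const loginFlow = [
  { tool: 'puppeteer_navigate', args: { url: 'https://staging.myapp.com/login' } },
  { tool: 'puppeteer_fill', args: { selector: '#email', value: 'test@example.com' } },
  { tool: 'puppeteer_fill', args: { selector: '#password', value: 'password123' } },
  { tool: 'puppeteer_click', args: { selector: 'button[type=submit]' } },
  { tool: 'puppeteer_wait_for_selector', args: { selector: '.dashboard-header' } },
];

// Fake transport that always succeeds, for demonstration only.
const fakeCallTool = async () => ({ success: true });

runFlow(loginFlow, fakeCallTool).then(report => console.log(report.passed)); // prints true
```

Keeping the steps as data makes it easy to report which step failed, which is what the summary in the transcript does in prose.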


The Core Question You’re Answering

“How can I automate browser interactions using natural language instead of brittle test scripts?”

Traditional E2E tests look like this (Playwright Test shown):

import { test, expect } from '@playwright/test';

test('login flow logs in successfully', async ({ page }) => {
  await page.goto('https://staging.myapp.com/login');
  await page.fill('#email', 'test@example.com');
  await page.fill('#password', 'password123');
  await page.click('button[type=submit]');
  // Fails the test if the dashboard header never becomes visible.
  await expect(page.locator('.dashboard-header')).toBeVisible();
});

Whenever the UI changes (an ID becomes a class, button text changes, the form structure shifts), tests like this can break, and maintaining them competes with time spent on features.

This project teaches you:

  • LLM-driven selector discovery - Kiro can adapt to UI changes
  • Natural language assertions - “Verify the user is logged in” instead of brittle selectors
  • Screenshot-based debugging - AI can see what went wrong
  • Multimodal reasoning - Combine DOM inspection + visual analysis
  • Zero-code E2E testing - Write tests as conversation, not code

By the end, you’ll understand how to use Kiro as a QA engineer that can drive browsers, inspect state, and report issues—all through conversation.


Concepts You Must Understand First

Stop and research these before coding:

  1. Chrome DevTools Protocol (CDP)
    • What is the Chrome DevTools Protocol?
    • How do automation tools (Puppeteer, Selenium) communicate with Chrome?
    • What’s the difference between CDP and WebDriver?
    • How do you connect to a running Chrome instance programmatically?
    • Book Reference: “Web Performance in Action” by Jeremy Wagner - Ch. 8 (Browser Tools)
  2. DOM Querying Strategies
    • What’s the difference between CSS selectors and XPath?
    • Why are data-testid attributes better than class names for testing?
    • How do you write selectors that survive UI refactors?
    • What’s a “stable” vs “brittle” selector?
    • Book Reference: “CSS: The Definitive Guide” by Eric Meyer - Ch. 3 (Selectors)
  3. Async Browser Events
    • Why do you need to wait for elements to appear?
    • What’s the difference between waitForSelector and waitForNavigation?
    • How do you detect when a page has fully loaded (not just DOMContentLoaded)?
    • What are race conditions in browser automation?
    • Book Reference: “JavaScript: The Good Parts” by Douglas Crockford - Ch. 8 (Methods)
  4. Headless vs Headed Browsers
    • What’s the difference between headless and headed mode?
    • When should you use headless mode? (CI/CD, scraping)
    • Why might a test pass in headed mode but fail in headless?
    • How do you debug issues in headless mode?
    • Book Reference: “Web Scraping with Python” by Ryan Mitchell - Ch. 11 (JavaScript)
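
To make the first concept concrete: CDP traffic consists of JSON messages sent over a WebSocket. Commands carry an id, a method, and params; replies echo the id; events have a method but no id. The sketch below only builds and classifies such messages. The CdpSession class is illustrative and not Puppeteer's API, and no real connection is opened.

```javascript
// Builds CDP command messages and matches incoming replies to them by id.
// The { id, method, params } shape follows the CDP JSON format; the class
// name and its methods are made up for this example.
class CdpSession {
  constructor() {
    this.nextId = 1;
    this.pending = new Map(); // id -> method still awaiting a reply
  }

  command(method, params = {}) {
    const id = this.nextId++;
    this.pending.set(id, method);
    return JSON.stringify({ id, method, params });
  }

  // Classifies an incoming message as a reply to a command or as an event.
  receive(raw) {
    const msg = JSON.parse(raw);
    if (msg.id !== undefined && this.pending.has(msg.id)) {
      const method = this.pending.get(msg.id);
      this.pending.delete(msg.id);
      return { kind: 'reply', method, ok: msg.error === undefined };
    }
    return { kind: 'event', method: msg.method };
  }
}

const session = new CdpSession();
console.log(session.command('Page.navigate', { url: 'https://example.com' }));
// prints {"id":1,"method":"Page.navigate","params":{"url":"https://example.com"}}
```

Libraries such as Puppeteer wrap this request/reply bookkeeping behind calls like page.goto and page.evaluate.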

Questions to Guide Your Design

Before implementing, think through these:

  1. Selector Strategy
    • Should Kiro use CSS selectors, XPath, or text content matching?
    • How will you handle dynamic IDs (e.g., user-dropdown-a8f32d)?
    • What if multiple elements match a selector?
    • Should you use Kiro’s vision capabilities to verify the right element was clicked?
  2. Error Handling
    • What if a selector doesn’t exist? (Element not found)
    • What if a page doesn’t load? (Timeout)
    • What if JavaScript execution fails?
    • How will you communicate these failures to the user clearly?
  3. State Management
    • Should Kiro close the browser after each task, or keep it open?
    • How do you handle cookies and session state between tasks?
    • What if the user wants to test a multi-step flow (login → browse → checkout)?
    • Should each conversation start with a fresh browser session?
  4. Visual Debugging
    • When should Kiro automatically take screenshots?
    • Should screenshots be embedded in the conversation or saved to disk?
    • How do you handle sensitive information (passwords) in screenshots?
    • Can Kiro use computer vision to verify UI state instead of DOM inspection?
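
One possible answer to the selector-strategy questions is a fixed preference order. The sketch below assumes this order: data-testid, then a stable id, then the stable prefix of a generated id, then class names. Both function names and the "generated id" heuristic (a trailing run of five or more hex characters containing a digit) are assumptions for illustration.

```javascript
// Heuristic: an id ending in a separator plus 5+ hex characters that include
// a digit (e.g. "user-dropdown-a8f32d") is treated as generated.
function looksGenerated(id) {
  return /[-_](?=[0-9a-f]*\d)[0-9a-f]{5,}$/i.test(id);
}

// Picks the most stable selector available for an element description
// of the form { tag, testId?, id?, classes? }.
function pickSelector(el) {
  if (el.testId) return `[data-testid="${el.testId}"]`;
  if (el.id && !looksGenerated(el.id)) return `#${el.id}`;
  if (el.id) {
    // Keep only the stable prefix of a generated id.
    const prefix = el.id.replace(/[0-9a-f]{5,}$/i, '');
    return `[id^="${prefix}"]`;
  }
  if (el.classes && el.classes.length > 0) return `${el.tag}.${el.classes.join('.')}`;
  return el.tag;
}

console.log(pickSelector({ tag: 'div', id: 'user-dropdown-a8f32d' }));
// prints [id^="user-dropdown-"]
```

A prefix match can still hit several elements, so the "what if multiple elements match" question remains open for that branch.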

Thinking Exercise

Trace a Multi-Step Browser Flow

Before using Kiro, manually trace what should happen when you ask: “Go to Amazon, search for ‘mechanical keyboard’, and tell me the price of the first result”

Step 1: Navigate

chrome.puppeteer_navigate(url="https://amazon.com")
→ Browser loads Amazon homepage
→ Wait for page load complete

Step 2: Find Search Box

chrome.puppeteer_evaluate(`
  document.querySelector('#twotabsearchtextbox')?.placeholder
`)
→ Returns: "Search Amazon"
→ Confirms search box exists

Step 3: Fill Search Query

chrome.puppeteer_fill(selector="#twotabsearchtextbox", value="mechanical keyboard")
→ Text appears in search box

Step 4: Submit Search

chrome.puppeteer_click(selector="#nav-search-submit-button")
→ Page navigates to search results
→ Wait for navigation complete

Step 5: Wait for Results

chrome.puppeteer_wait_for_selector(selector=".s-main-slot .s-result-item")
→ Ensures at least one result exists

Step 6: Extract First Result Price

chrome.puppeteer_evaluate(`
  document.querySelector('.s-main-slot .s-result-item .a-price .a-offscreen')?.textContent
`)
→ Returns: "$89.99"

Questions while tracing:

  • What if Amazon’s HTML structure changes and selectors break?
  • How would Kiro adapt if the search box ID changes?
  • What if the page uses a loading spinner that delays results?
  • How do you handle CAPTCHA or “Are you a robot?” prompts?
  • Should Kiro use screenshots to verify the search actually happened?
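
The first two questions suggest a fallback chain: try the known selector, then progressively more generic ones. A minimal sketch follows. The query function is an assumption standing in for a DOM lookup run through puppeteer_evaluate, and every candidate after the first is a hypothetical fallback rather than a verified Amazon selector.

```javascript
// Tries candidate selectors in order and returns the first one that matches.
// `query(selector)` stands in for a DOM lookup; it is injected so the
// sketch runs in plain Node without a browser.
function firstMatching(candidates, query) {
  for (const selector of candidates) {
    const value = query(selector);
    if (value !== null && value !== undefined) {
      return { selector, value };
    }
  }
  return null;
}

const searchBoxCandidates = [
  '#twotabsearchtextbox',       // selector used in the trace above
  'input[type="search"]',       // hypothetical fallback
  'form[role="search"] input',  // hypothetical fallback
];

// Fake page in which the original id has disappeared.
const fakePage = new Map([['input[type="search"]', 'Search Amazon']]);
const hit = firstMatching(searchBoxCandidates, sel => fakePage.get(sel));
console.log(hit.selector); // prints input[type="search"]
```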

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the Chrome DevTools Protocol. How does Puppeteer use it to control Chrome?”

  2. “Your E2E test passes locally but fails in CI/CD. What are the most common causes, and how do you debug?”

  3. “How would you design a selector strategy that survives UI refactors? Compare data-testid attributes vs CSS classes vs XPath.”

  4. “If a button click doesn’t trigger the expected navigation, how do you debug it? Walk me through your process.”

  5. “How would you use Kiro to test a complex SPA (Single Page Application) where navigation doesn’t reload the page?”

  6. “Describe how you’d implement visual regression testing using Kiro and screenshot comparison.”

  7. “What’s the difference between waitForSelector, waitForNavigation, and waitForTimeout? When would you use each?”


Hints in Layers

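Hint 1: Starting Point

First, make the Puppeteer MCP server available and configure Kiro to use it. A global install is optional, because the npx -y entry in the config fetches the package on demand: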

npm install -g @modelcontextprotocol/server-puppeteer

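Then add it to ~/.config/kiro/settings.json under mcpServers. Start Kiro and verify the tools are available by asking “What browser tools do you have?” Tool names and parameters can differ between server versions, so compare the answer with the names used in this guide.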

Hint 2: Basic Navigation

Start with simple commands:

You: Open https://example.com and take a screenshot

Kiro will call:

  1. chrome.puppeteer_navigate(url="https://example.com")
  2. chrome.puppeteer_screenshot()

The screenshot will appear in the conversation as an image.

Hint 3: Element Interaction

For clicking buttons or filling forms, use selectors:

You: Fill the search box with "test query" and click submit

Kiro will inspect the page (using puppeteer_evaluate to query the DOM) and find appropriate selectors, then call:

  1. chrome.puppeteer_fill(selector="#search-input", value="test query")
  2. chrome.puppeteer_click(selector="button[type=submit]")

Hint 4: Extracting Data

To extract text or data from the page:

You: What's the title of the first article on Hacker News?

Kiro will execute JavaScript in the page context:

chrome.puppeteer_evaluate(`
  document.querySelector('.titleline > a')?.textContent
`)

Hint 5: Handling Waits

For dynamic content that loads asynchronously:

You: Wait for the results to load, then tell me how many items are shown

Kiro will use:

chrome.puppeteer_wait_for_selector(selector=".result-item", timeout=5000)

Then count the results:

chrome.puppeteer_evaluate(`
  document.querySelectorAll('.result-item').length
`)
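
Conceptually, waiting for a selector is a poll-until-true loop with a deadline. The sketch below shows that idea in plain JavaScript. It is not the MCP server's or Puppeteer's actual implementation, and waitFor and countResults are names invented for the example.

```javascript
// Polls `check` until it returns a truthy value or the timeout elapses.
function waitFor(check, { timeout = 5000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  return new Promise((resolve, reject) => {
    const poll = async () => {
      try {
        const value = await check();
        if (value) return resolve(value);
      } catch (err) {
        return reject(err);
      }
      if (Date.now() >= deadline) {
        return reject(new Error(`Timed out after ${timeout} ms`));
      }
      setTimeout(poll, interval);
    };
    poll();
  });
}

// Simulated page whose results "appear" on the third check.
let checks = 0;
const countResults = () => (++checks >= 3 ? 12 : 0);

waitFor(countResults, { timeout: 1000, interval: 10 })
  .then(count => console.log(`${count} results loaded`)); // prints 12 results loaded
```

The timeout bounds how long a missing element can stall the flow, which is why the tool call above passes one explicitly.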

Hint 6: Debugging Failed Selectors

If a selector doesn’t work, ask Kiro to inspect the page:

You: The login button selector isn't working. Can you look at the page and find the right selector?

Kiro will take a screenshot and execute:

chrome.puppeteer_evaluate(`
  Array.from(document.querySelectorAll('button')).map(b => ({
    text: b.textContent,
    id: b.id,
    classes: b.className
  }))
`)

This gives Kiro the full list of buttons to choose from.
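
Given that list, choosing a selector can be as simple as matching on button text. The sketch below assumes the { text, id, classes } objects produced by the query above. selectorForButton is a hypothetical helper and the sample buttons are made up.

```javascript
// Chooses a selector for the button whose text matches `wanted`,
// preferring an exact match over a substring match.
function selectorForButton(buttons, wanted) {
  const target = wanted.trim().toLowerCase();
  const normalized = buttons.map(b => ({ ...b, text: (b.text || '').trim().toLowerCase() }));
  const match =
    normalized.find(b => b.text === target) ||
    normalized.find(b => b.text.includes(target));
  if (!match) return null;
  if (match.id) return `#${match.id}`;
  const classes = (match.classes || '').split(/\s+/).filter(Boolean);
  return classes.length > 0 ? `button.${classes.join('.')}` : null;
}

const buttons = [
  { text: ' Sign up ', id: '', classes: 'btn btn-secondary' },
  { text: 'Log in', id: 'login-btn', classes: 'btn btn-primary' },
];
console.log(selectorForButton(buttons, 'log in')); // prints #login-btn
```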


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Chrome DevTools Protocol | “Web Performance in Action” by Jeremy Wagner | Ch. 8 (Browser Developer Tools) |
| DOM Selectors | “CSS: The Definitive Guide” by Eric Meyer | Ch. 3 (Selectors) |
| Browser Automation | “Web Scraping with Python” by Ryan Mitchell | Ch. 11 (JavaScript and AJAX) |
| Async JavaScript | “JavaScript: The Good Parts” by Douglas Crockford | Ch. 8 (Methods) |
| E2E Testing Patterns | “Testing JavaScript Applications” by Lucas da Costa | Ch. 9 (E2E Testing) |

Common Pitfalls & Debugging

Problem 1: “Element not found” errors

  • Why: The selector is wrong, or the element hasn’t loaded yet
  • Fix: Ask Kiro to inspect the page first:
    You: Take a screenshot and show me all buttons on the page

    Then refine your selector based on what Kiro finds.

  • Quick test: Open Chrome DevTools manually and test the selector in the console: document.querySelector('your-selector')

Problem 2: “Navigation timeout” errors

  • Why: The page is slow, blocked by CAPTCHA, or the URL is wrong
  • Fix: Increase timeout or check network tab for errors:
    You: Navigate to example.com and wait up to 30 seconds for the page to load
    
  • Quick test: Load the URL manually in Chrome and check the Network tab for failed requests
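
A timeout like the one in the fix above is commonly implemented by racing the operation against a timer. This is a generic sketch of that pattern, not the MCP server's own code. withTimeout and slowNavigate are names invented for the example.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Simulated navigation that takes 50 ms to finish.
const slowNavigate = () => new Promise(resolve => setTimeout(() => resolve('loaded'), 50));

withTimeout(slowNavigate(), 10, 'navigation')
  .catch(err => console.log(err.message)); // prints navigation timed out after 10 ms
```

The race only stops waiting. The underlying page load keeps running unless it is cancelled separately.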

Problem 3: “Click doesn’t do anything”

  • Why: The element is hidden, covered by another element, disabled, or wired up by JavaScript that has not finished loading
  • Fix: Verify the element is visible:
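    chrome.puppeteer_evaluate(`
      (() => {
        const el = document.querySelector('button.submit');
        // Report a missing element instead of throwing on null.
        if (!el) return { found: false };
        return {
          found: true,
          visible: el.offsetParent !== null,
          disabled: el.disabled,
          boundingBox: el.getBoundingClientRect().toJSON()
        };
      })()
    `)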
  • Quick test: Try clicking manually in headed mode to see if it works
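
The fields returned by that check map onto likely causes. The sketch below turns such a report into a one-line diagnosis. diagnoseClick is a hypothetical helper, and the optional found flag is an assumption for reports that mark a missing element.

```javascript
// Interprets a { visible, disabled, boundingBox } report for one element.
// A missing report, or one with found === false, means nothing matched.
function diagnoseClick(report) {
  if (!report || report.found === false) return 'selector matched no element';
  if (!report.visible) return 'element is hidden (no offsetParent)';
  if (report.disabled) return 'element is disabled';
  const box = report.boundingBox || {};
  if (!box.width || !box.height) return 'element has zero size';
  return 'no obvious problem; check for an overlay covering the element';
}

console.log(diagnoseClick({ visible: true, disabled: true, boundingBox: { width: 80, height: 32 } }));
// prints element is disabled
```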

Problem 4: “Screenshots are blank or show loading spinner”

  • Why: The screenshot was taken before content finished loading
  • Fix: Wait for a specific element that indicates the page is ready:
    chrome.puppeteer_wait_for_selector(selector=".content-loaded-indicator")
    chrome.puppeteer_screenshot()
    
  • Quick test: Add a delay and see if the screenshot improves: chrome.puppeteer_wait(ms=2000)

Problem 5: “Kiro can’t find the MCP server”

  • Why: npx is not on the PATH that Kiro sees, the server package cannot be found or downloaded, or the command in settings.json is wrong
  • Fix: Verify the server is installed:
    $ which npx
    $ npx @modelcontextprotocol/server-puppeteer --help
    

    Update settings.json to use the full path reported by which npx, for example:

    {
      "command": "/usr/local/bin/npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  • Quick test: Restart Kiro after changing settings: kiro (fresh session)

Definition of Done

  • Puppeteer MCP server is configured in Kiro settings
  • Kiro can navigate to a URL and take a screenshot
  • Kiro can fill form fields and click buttons
  • Kiro can extract text from elements using puppeteer_evaluate
  • Kiro can wait for elements to appear before interacting
  • Screenshots are clear and show the correct page state
  • Failed selectors are debugged by inspecting the page structure
  • Multi-step flows (e.g., login → navigate → submit) work correctly
  • Flows have been tested on both localhost and public URLs
  • Common selectors for your app are documented in a team wiki