Project 15: “The Chrome Puppeteer” — Browser Automation
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | JavaScript / HTML |
| Software or Tool | chrome-devtools-mcp |
| Coolness Level | Level 5: Pure Magic |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Browser Automation |
What you’ll build: Use Kiro to drive a real browser and assert UI behavior.
Why it teaches E2E: You get AI-driven QA on a live page.
Success criteria:
- Kiro captures a screenshot and reports a UI state.
Real World Outcome
You’ll have Kiro CLI connected to Chrome via the Model Context Protocol (MCP), allowing natural language browser automation. You can ask “Go to example.com and click the login button” and watch it happen in real time.
1. MCP Server Configuration (in ~/.config/kiro/settings.json):
{
  "mcpServers": {
    "chrome": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {}
    }
  }
}
2. Kiro CLI Session:
$ kiro
You: Go to https://news.ycombinator.com and tell me the top 3 story titles
[Tool Call] chrome.puppeteer_navigate(url="https://news.ycombinator.com")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_screenshot()
Tool Response: {"success": true, "image": "<base64>"}
[Tool Call] chrome.puppeteer_evaluate(expression=`
Array.from(document.querySelectorAll('.titleline > a'))
.slice(0, 3)
.map(a => a.textContent)
`)
Tool Response: {
"result": [
"Show HN: I built a visual regex debugger",
"Why SQLite uses B-trees instead of LSM trees",
"The performance cost of virtual methods in C++"
]
}
Kiro: The top 3 stories on Hacker News right now are:
1. Show HN: I built a visual regex debugger
2. Why SQLite uses B-trees instead of LSM trees
3. The performance cost of virtual methods in C++
3. Visual Verification: A Chrome window opens automatically and you see:
- Browser navigates to Hacker News
- Page loads completely
- Screenshot captured (visible in Kiro’s context)
- JavaScript executed in page context
- Results returned to Kiro
4. Advanced E2E Test Example:
You: Test the login flow on staging.myapp.com - use test@example.com / password123
[Tool Call] chrome.puppeteer_navigate(url="https://staging.myapp.com/login")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_fill(selector="#email", value="test@example.com")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_fill(selector="#password", value="password123")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_click(selector="button[type=submit]")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_wait_for_selector(selector=".dashboard-header")
Tool Response: {"success": true, "found": true}
[Tool Call] chrome.puppeteer_screenshot()
Tool Response: {"success": true, "image": "<base64>"}
Kiro: ✅ Login test passed:
- Email field filled successfully
- Password field filled successfully
- Submit button clicked
- Dashboard loaded (confirmed by presence of .dashboard-header)
- Screenshot captured showing logged-in state
What you’re seeing:
- Natural language E2E testing - No Playwright/Selenium code to write
- Visual feedback - Screenshots embedded in conversation
- Selector debugging - Kiro can see the page and help fix selectors
- Async handling - Automatic waiting for elements and navigation
- Cross-site testing - Works on any public or localhost URL
This is the same general approach used by tools like Browserbase, Skyvern, and multi-agent browser automation systems.
The Core Question You’re Answering
“How can I automate browser interactions using natural language instead of brittle test scripts?”
Traditional E2E tests look like this:
describe('Login flow', () => {
  it('should log in successfully', async () => {
    await page.goto('https://staging.myapp.com/login');
    await page.fill('#email', 'test@example.com');
    await page.fill('#password', 'password123');
    await page.click('button[type=submit]');
    await page.waitForSelector('.dashboard-header');
    expect(await page.isVisible('.dashboard-header')).toBe(true);
  });
});
Every time the UI changes (ID becomes a class, button text changes, form structure shifts), the test breaks. You spend more time maintaining tests than writing features.
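One way to soften that brittleness, whether a script or Kiro is choosing the selector, is to try several candidates from most to least stable. The sketch below is illustrative only: `firstMatchingSelector`, the candidate list, and the stub page are hypothetical names, not part of Kiro or the Puppeteer MCP server.

```javascript
// Try candidate selectors in priority order and return the first that matches.
// `root` is anything with a querySelector method: a real `document` in the
// browser, or a stub (as here) so the sketch runs under plain Node.
function firstMatchingSelector(root, candidates) {
  for (const selector of candidates) {
    if (root.querySelector(selector) !== null) {
      return selector;
    }
  }
  return null; // nothing matched: time to inspect the page
}

// Hypothetical candidates for the login form's submit button,
// ordered from most stable to most generic.
const submitCandidates = [
  '[data-testid="login-submit"]',
  '#login-submit',
  'button[type=submit]',
];

// Stub page in which only the generic selector still matches.
const stubRoot = {
  querySelector: (s) => (s === 'button[type=submit]' ? { tagName: 'BUTTON' } : null),
};

console.log(firstMatchingSelector(stubRoot, submitCandidates));
```

In a real page you would pass `document` as `root`. A `null` result is the signal to fall back to inspecting the DOM or a screenshot.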
This project teaches you:
- LLM-driven selector discovery - Kiro can adapt to UI changes
- Natural language assertions - “Verify the user is logged in” instead of brittle selectors
- Screenshot-based debugging - AI can see what went wrong
- Multimodal reasoning - Combine DOM inspection + visual analysis
- Zero-code E2E testing - Write tests as conversation, not code
By the end, you’ll understand how to use Kiro as a QA engineer that can drive browsers, inspect state, and report issues, all through conversation.
Concepts You Must Understand First
Stop and research these before coding:
- Chrome DevTools Protocol (CDP)
- What is the Chrome DevTools Protocol?
- How do automation tools (Puppeteer, Selenium) communicate with Chrome?
- What’s the difference between CDP and WebDriver?
- How do you connect to a running Chrome instance programmatically?
- Book Reference: “Web Performance in Action” by Jeremy Wagner - Ch. 8 (Browser Tools)
- DOM Querying Strategies
- What’s the difference between CSS selectors and XPath?
- Why are `data-testid` attributes better than class names for testing?
- How do you write selectors that survive UI refactors?
- What’s a “stable” vs “brittle” selector?
- Book Reference: “CSS: The Definitive Guide” by Eric Meyer - Ch. 3 (Selectors)
- Async Browser Events
- Why do you need to wait for elements to appear?
- What’s the difference between `waitForSelector` and `waitForNavigation`?
- How do you detect when a page has fully loaded (not just DOMContentLoaded)?
- What are race conditions in browser automation?
- Book Reference: “JavaScript: The Good Parts” by Douglas Crockford - Ch. 8 (Methods)
- Headless vs Headed Browsers
- What’s the difference between headless and headed mode?
- When should you use headless mode? (CI/CD, scraping)
- Why might a test pass in headed mode but fail in headless?
- How do you debug issues in headless mode?
- Book Reference: “Web Scraping with Python” by Ryan Mitchell - Ch. 11 (JavaScript)
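To make the CDP questions above more concrete: CDP commands are JSON messages with an `id`, a `method`, and `params`, sent over the WebSocket that Chrome exposes when started with `--remote-debugging-port`. The sketch below only builds the message text and does not open a connection; `cdpCommand` is a hypothetical helper name.

```javascript
// Build a CDP command as the JSON text a client would send over the
// browser's DevTools WebSocket. Each command carries a caller-chosen `id`
// so the matching response can be paired with it later.
function cdpCommand(id, method, params = {}) {
  return JSON.stringify({ id, method, params });
}

// Page.navigate and Runtime.evaluate are standard CDP methods.
const navigate = cdpCommand(1, 'Page.navigate', { url: 'https://example.com' });
const readTitle = cdpCommand(2, 'Runtime.evaluate', {
  expression: 'document.title',
  returnByValue: true,
});

console.log(navigate);
console.log(readTitle);
```

Puppeteer hides this layer behind calls like `page.goto`, but it helps to know these messages sit underneath when reading debug logs.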
Questions to Guide Your Design
Before implementing, think through these:
- Selector Strategy
- Should Kiro use CSS selectors, XPath, or text content matching?
- How will you handle dynamic IDs (e.g., `user-dropdown-a8f32d`)?
- What if multiple elements match a selector?
- Should you use Kiro’s vision capabilities to verify the right element was clicked?
- Error Handling
- What if a selector doesn’t exist? (Element not found)
- What if a page doesn’t load? (Timeout)
- What if JavaScript execution fails?
- How will you communicate these failures to the user clearly?
- State Management
- Should Kiro close the browser after each task, or keep it open?
- How do you handle cookies and session state between tasks?
- What if the user wants to test a multi-step flow (login → browse → checkout)?
- Should each conversation start with a fresh browser session?
- Visual Debugging
- When should Kiro automatically take screenshots?
- Should screenshots be embedded in the conversation or saved to disk?
- How do you handle sensitive information (passwords) in screenshots?
- Can Kiro use computer vision to verify UI state instead of DOM inspection?
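For the dynamic-ID question in the Selector Strategy list, one possible answer is to match on the stable prefix instead of the full ID. The following is a heuristic sketch, not a general solution: `stableIdSelector` is a hypothetical helper, and it assumes the generated part is a trailing hyphen followed by four or more hex characters.

```javascript
// Turn an id with a generated suffix into a prefix-match attribute selector.
// Assumption: the dynamic part looks like "-a8f32d" (hyphen + 4 or more hex chars).
// This can misfire on ids that end in a hex-looking word (e.g. "add-face").
function stableIdSelector(id) {
  const match = /^(.*-)[0-9a-f]{4,}$/i.exec(id);
  if (match) {
    return `[id^="${match[1]}"]`; // matches any id that starts with the stable prefix
  }
  // No generated suffix detected. In a browser, CSS.escape(id) would be safer
  // for ids that contain special characters.
  return `#${id}`;
}

console.log(stableIdSelector('user-dropdown-a8f32d')); // [id^="user-dropdown-"]
console.log(stableIdSelector('login-form'));           // #login-form
```

A prefix selector can still match several elements, so it pairs naturally with the "what if multiple elements match" question above.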
Thinking Exercise
Trace a Multi-Step Browser Flow
Before using Kiro, manually trace what should happen when you ask: “Go to Amazon, search for ‘mechanical keyboard’, and tell me the price of the first result”
Step 1: Navigate
chrome.puppeteer_navigate(url="https://amazon.com")
→ Browser loads Amazon homepage
→ Wait for page load complete
Step 2: Find Search Box
chrome.puppeteer_evaluate(`
document.querySelector('#twotabsearchtextbox')?.placeholder
`)
→ Returns: "Search Amazon"
→ Confirms search box exists
Step 3: Fill Search Query
chrome.puppeteer_fill(selector="#twotabsearchtextbox", value="mechanical keyboard")
→ Text appears in search box
Step 4: Submit Search
chrome.puppeteer_click(selector="#nav-search-submit-button")
→ Page navigates to search results
→ Wait for navigation complete
Step 5: Wait for Results
chrome.puppeteer_wait_for_selector(selector=".s-main-slot .s-result-item")
→ Ensures at least one result exists
Step 6: Extract First Result Price
chrome.puppeteer_evaluate(`
document.querySelector('.s-main-slot .s-result-item .a-price .a-offscreen')?.textContent
`)
→ Returns: "$89.99"
Questions while tracing:
- What if Amazon’s HTML structure changes and selectors break?
- How would Kiro adapt if the search box ID changes?
- What if the page uses a loading spinner that delays results?
- How do you handle CAPTCHA or “Are you a robot?” prompts?
- Should Kiro use screenshots to verify the search actually happened?
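Step 6 of the trace returns a string such as "$89.99", or `undefined` if the selector matched nothing, so any comparison or assertion needs a parsing step first. A minimal sketch follows, assuming US-style price formatting; `parsePrice` is a hypothetical helper.

```javascript
// Convert a scraped price string such as "$89.99" into a number.
// Assumption: US-style formatting (comma as thousands separator, dot as decimal).
function parsePrice(text) {
  if (typeof text !== 'string') return null; // e.g. undefined when nothing matched
  const match = text.replace(/,/g, '').match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

console.log(parsePrice('$89.99'));    // 89.99
console.log(parsePrice('$1,299.00')); // 1299
console.log(parsePrice(undefined));   // null
```

Returning `null` instead of throwing lets the caller decide whether a missing price is a test failure or a cue to re-inspect the page.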
The Interview Questions They’ll Ask
Prepare to answer these:
- “Explain the Chrome DevTools Protocol. How does Puppeteer use it to control Chrome?”
- “Your E2E test passes locally but fails in CI/CD. What are the most common causes, and how do you debug?”
- “How would you design a selector strategy that survives UI refactors? Compare `data-testid` attributes vs CSS classes vs XPath.”
- “If a button click doesn’t trigger the expected navigation, how do you debug it? Walk me through your process.”
- “How would you use Kiro to test a complex SPA (Single Page Application) where navigation doesn’t reload the page?”
- “Describe how you’d implement visual regression testing using Kiro and screenshot comparison.”
- “What’s the difference between `waitForSelector`, `waitForNavigation`, and `waitForTimeout`? When would you use each?”
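For the visual regression question, it helps to have a concrete comparison in mind. The sketch below assumes both screenshots have already been decoded into raw RGBA bytes of identical dimensions (decoding a PNG needs a library and is left out); `diffRatio` and the `tolerance` parameter are hypothetical names.

```javascript
// Fraction of pixels that differ between two same-sized RGBA byte arrays.
// A pixel counts as changed when any channel differs by more than `tolerance`.
function diffRatio(baseline, current, tolerance = 0) {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error('Inputs must be RGBA data of identical size');
  }
  const pixelCount = baseline.length / 4;
  let changed = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    const delta = Math.max(
      Math.abs(baseline[i] - current[i]),         // red
      Math.abs(baseline[i + 1] - current[i + 1]), // green
      Math.abs(baseline[i + 2] - current[i + 2]), // blue
      Math.abs(baseline[i + 3] - current[i + 3])  // alpha
    );
    if (delta > tolerance) changed += 1;
  }
  return pixelCount === 0 ? 0 : changed / pixelCount;
}

// Two-pixel "images": the second pixel's red channel differs by 10.
const baselinePixels = Uint8Array.from([255, 255, 255, 255, 0, 0, 0, 255]);
const currentPixels = Uint8Array.from([255, 255, 255, 255, 10, 0, 0, 255]);

console.log(diffRatio(baselinePixels, currentPixels));     // 0.5
console.log(diffRatio(baselinePixels, currentPixels, 16)); // 0
```

A small tolerance absorbs anti-aliasing noise. A threshold on the returned ratio (for example, failing above 1%) is a policy choice rather than something this sketch prescribes.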
Hints in Layers
Hint 1: Starting Point
First, install the Puppeteer MCP server and configure Kiro to use it:
npm install -g @modelcontextprotocol/server-puppeteer
Then add it to ~/.config/kiro/settings.json under mcpServers. Start Kiro and verify the tools are available by asking “What browser tools do you have?”
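If the settings file already lists other MCP servers, the new entry should be merged in rather than replacing the whole `mcpServers` object. Here is a small sketch of that merge on an in-memory object; `withMcpServer` and the `other` server entry are hypothetical, and reading or writing the file itself is left out.

```javascript
// Return a copy of `settings` with one MCP server entry added or replaced,
// leaving every other server (and every other setting) untouched.
function withMcpServer(settings, name, entry) {
  return {
    ...settings,
    mcpServers: { ...(settings.mcpServers ?? {}), [name]: entry },
  };
}

// Hypothetical existing settings with one unrelated server already present.
const existingSettings = {
  mcpServers: { other: { command: 'node', args: ['server.js'] } },
};

const updatedSettings = withMcpServer(existingSettings, 'chrome', {
  command: 'npx',
  args: ['-y', '@modelcontextprotocol/server-puppeteer'],
  env: {},
});

console.log(JSON.stringify(updatedSettings, null, 2));
```

Because the function copies instead of mutating, the original object can be kept as a backup if the new configuration fails to load.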
Hint 2: Basic Navigation
Start with simple commands:
You: Open https://example.com and take a screenshot
Kiro will call:
chrome.puppeteer_navigate(url="https://example.com")
chrome.puppeteer_screenshot()
The screenshot will appear in the conversation as an image.
Hint 3: Element Interaction
For clicking buttons or filling forms, use selectors:
You: Fill the search box with "test query" and click submit
Kiro will inspect the page (using puppeteer_evaluate to query the DOM) and find appropriate selectors, then call:
chrome.puppeteer_fill(selector="#search-input", value="test query")
chrome.puppeteer_click(selector="button[type=submit]")
Hint 4: Extracting Data
To extract text or data from the page:
You: What's the title of the first article on Hacker News?
Kiro will execute JavaScript in the page context:
chrome.puppeteer_evaluate(`
document.querySelector('.titleline > a')?.textContent
`)
Hint 5: Handling Waits
For dynamic content that loads asynchronously:
You: Wait for the results to load, then tell me how many items are shown
Kiro will use:
chrome.puppeteer_wait_for_selector(selector=".result-item", timeout=5000)
Then count the results:
chrome.puppeteer_evaluate(`
document.querySelectorAll('.result-item').length
`)
Hint 6: Debugging Failed Selectors
If a selector doesn’t work, ask Kiro to inspect the page:
You: The login button selector isn't working. Can you look at the page and find the right selector?
Kiro will take a screenshot and execute:
chrome.puppeteer_evaluate(`
Array.from(document.querySelectorAll('button')).map(b => ({
text: b.textContent,
id: b.id,
classes: b.className
}))
`)
This gives Kiro the full list of buttons to choose from.
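Once that list comes back, choosing a selector is a small decision procedure: find the button by its visible text, then prefer its `id` over its classes. The sketch below works on data shaped like the evaluate result above; `selectorForButton` and the sample buttons are hypothetical.

```javascript
// Pick a selector for the button whose visible text contains `wantedText`.
// Preference order: id (unique by convention), then the full class list.
// Returns null when no button matches or the match has neither id nor classes.
function selectorForButton(buttons, wantedText) {
  const needle = wantedText.trim().toLowerCase();
  const button = buttons.find((b) => b.text.trim().toLowerCase().includes(needle));
  if (!button) return null;
  if (button.id) return `#${button.id}`;
  const classes = button.classes.split(/\s+/).filter(Boolean);
  // A class-based selector may match several buttons, so verify before clicking.
  return classes.length > 0 ? `button.${classes.join('.')}` : null;
}

// Sample data in the same shape as the evaluate result above.
const sampleButtons = [
  { text: ' Cancel ', id: '', classes: 'btn btn-secondary' },
  { text: 'Log in', id: 'login-btn', classes: 'btn btn-primary' },
];

console.log(selectorForButton(sampleButtons, 'log in')); // #login-btn
console.log(selectorForButton(sampleButtons, 'cancel')); // button.btn.btn-secondary
```

When the result is `null`, a screenshot is usually the next step, since the control may not be a `button` element at all.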
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Chrome DevTools Protocol | “Web Performance in Action” by Jeremy Wagner | Ch. 8 (Browser Developer Tools) |
| DOM Selectors | “CSS: The Definitive Guide” by Eric Meyer | Ch. 3 (Selectors) |
| Browser Automation | “Web Scraping with Python” by Ryan Mitchell | Ch. 11 (JavaScript and AJAX) |
| Async JavaScript | “JavaScript: The Good Parts” by Douglas Crockford | Ch. 8 (Methods) |
| E2E Testing Patterns | “Testing JavaScript Applications” by Lucas da Costa | Ch. 9 (E2E Testing) |
Common Pitfalls & Debugging
Problem 1: “Element not found” errors
- Why: The selector is wrong, or the element hasn’t loaded yet
- Fix: Ask Kiro to inspect the page first:
You: Take a screenshot and show me all buttons on the page
Then refine your selector based on what Kiro finds.
- Quick test: Open Chrome DevTools manually and test the selector in the console:
document.querySelector('your-selector')
Problem 2: “Navigation timeout” errors
- Why: The page is slow, blocked by CAPTCHA, or the URL is wrong
- Fix: Increase the timeout or check the Network tab for errors:
You: Navigate to example.com and wait up to 30 seconds for the page to load
- Quick test: Load the URL manually in Chrome and check the Network tab for failed requests
Problem 3: “Click doesn’t do anything”
- Why: The element is hidden, covered by another element, or requires JavaScript to be enabled
- Fix: Verify the element is visible:
chrome.puppeteer_evaluate(`
(() => {
  const el = document.querySelector('button.submit');
  if (!el) return { found: false };
  return {
    found: true,
    visible: el.offsetParent !== null,
    disabled: el.disabled,
    boundingBox: el.getBoundingClientRect().toJSON()
  };
})()
`)
- Quick test: Try clicking manually in headed mode to see if it works
Problem 4: “Screenshots are blank or show loading spinner”
- Why: The screenshot was taken before content finished loading
- Fix: Wait for a specific element that indicates the page is ready:
chrome.puppeteer_wait_for_selector(selector=".content-loaded-indicator")
chrome.puppeteer_screenshot()
- Quick test: Add a delay and see if the screenshot improves:
chrome.puppeteer_wait(ms=2000)
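A fixed delay is fine as a quick test, but it is either too long or too short in practice. Waiting on a condition is usually more robust. The sketch below shows the polling idea behind selector waits in plain JavaScript; `waitFor` and its options are hypothetical names, not part of the MCP server.

```javascript
// Poll `check` until it returns a truthy value, or fail after `timeoutMs`.
// `check` may be synchronous or return a promise.
async function waitFor(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await check();
    if (value) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep briefly before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo: the condition becomes true on the third poll.
let polls = 0;
waitFor(() => (++polls >= 3 ? 'ready' : null), { timeoutMs: 1000, intervalMs: 10 })
  .then((value) => console.log(value, 'after', polls, 'polls'));
```

In a page context, `check` could be something like `() => document.querySelector('.content-loaded-indicator')`, which returns as soon as the element exists rather than after a guessed delay.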
Problem 5: “Kiro can’t find the MCP server”
- Why: The server isn’t installed globally, or the `command` in settings.json is wrong
- Fix: Verify the server is installed:
$ which npx
$ npx @modelcontextprotocol/server-puppeteer --help
Update `settings.json` to use the full path reported by `which npx`, for example:
{ "command": "/usr/local/bin/npx", "args": ["-y", "@modelcontextprotocol/server-puppeteer"] }
- Quick test: Restart Kiro after changing settings:
kiro (fresh session)
Definition of Done
- Puppeteer MCP server is configured in Kiro settings
- Kiro can navigate to a URL and take a screenshot
- Kiro can fill form fields and click buttons
- Kiro can extract text from elements using `puppeteer_evaluate`
- Kiro can wait for elements to appear before interacting
- Screenshots are clear and show the correct page state
- Failed selectors are debugged by inspecting the page structure
- Multi-step flows (e.g., login → navigate → submit) work correctly
- Test on both localhost and public URLs
- Document common selectors for your app in a team wiki