Project 40: "The Autonomous Developer (Capstone)" — Full Agentic Mastery

Attribute	Value
File	`KIRO_CLI_LEARNING_PROJECTS.md`
Main Programming Language	Polyglot
Coolness Level	Level 5: Pure Magic
Business Potential	5. Industry Disruptor (Agentic Workflow)
Difficulty	Level 5: Master
Knowledge Area	Full Agentic Mastery

What you’ll build: A fully autonomous CI/CD healing agent that monitors GitHub Actions, detects failures, diagnoses root causes, patches code, runs tests, and opens pull requests—all without human intervention.

Why it teaches Mastery: This capstone project combines every skill from Projects 1-39: headless operation, hooks, MCP servers, shell tools, reasoning, context management, and multi-agent orchestration. If you can build this, you’ve mastered Kiro.

Core challenges you’ll face:

Headless GitHub Actions monitoring → Maps to GitHub API polling, webhook handling
Log analysis and root cause diagnosis → Maps to error pattern matching, stack trace parsing
Autonomous code patching → Maps to multi-agent collaboration, test-driven fixes
Verification loop → Maps to running tests, validating fixes before PR

Real World Outcome

You’ll have a system that automatically fixes broken CI/CD pipelines:

# Setup: Deploy the autonomous agent to a server
$ kiro autonomous-dev setup --repo myorg/my-app --webhook-url https://my-server.com/webhook

[Kiro CLI Session]
🤖 Autonomous Developer Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Configuring GitHub webhook...
  ✓ Webhook URL: https://my-server.com/webhook
  ✓ Events: workflow_run, push
  ✓ Secret: ••••••••

Step 2: Starting headless Kiro agent...
  ✓ Listening for GitHub Actions failures
  ✓ Agent running in background (PID: 12345)
  ✓ Logs: /var/log/kiro-agent.log

✅ Autonomous agent deployed!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

[Time passes... a GitHub Action fails]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[2024-12-20 14:32:15] Webhook received: workflow_run (action: completed, conclusion: failure)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Workflow: CI
Run ID: 123456789
Commit: a3f2c1d
Branch: feature/add-user-auth
Triggered by: @developer123

Step 1: Fetching logs...
  ✓ Downloaded logs from GitHub Actions

Step 2: Analyzing failure...

Logs:
  Run npm test
    FAIL src/services/AuthService.test.ts
      AuthService
        ✕ should validate JWT token (234ms)

    ● AuthService › should validate JWT token

      TypeError: Cannot read property 'split' of undefined
        at AuthService.validateToken (src/services/AuthService.ts:45:23)
        at Object.<anonymous> (src/services/AuthService.test.ts:67:34)

  Test Suites: 1 failed, 15 passed, 16 total
  Tests:       1 failed, 187 passed, 188 total

🔍 Diagnosis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  • Error: TypeError: Cannot read property 'split' of undefined
  • File: src/services/AuthService.ts:45
  • Method: validateToken
  • Root cause: Missing null check before calling .split()
  • Impact: JWT validation crashes when token is undefined/null

Proposed fix:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Add null check before token.split() call

Step 3: Reading source code...
  ✓ Fetched src/services/AuthService.ts

Current code (line 45):
  const parts = token.split('.');

Suggested fix:
  if (!token) {
    throw new Error('Token is required');
  }
  const parts = token.split('.');

Step 4: Applying fix...
  ✓ Created branch: fix/auth-service-null-check
  ✓ Modified: src/services/AuthService.ts
  ✓ Committed: "Fix null check in AuthService.validateToken"

Step 5: Running tests locally (headless)...
  ✓ Checked out fix/auth-service-null-check
  ✓ npm install (cached, 2.3s)
  ✓ npm test

  PASS src/services/AuthService.test.ts
    AuthService
      ✓ should validate JWT token (89ms)
      ✓ should throw error for null token (23ms)  ← NEW TEST ADDED
      ✓ should throw error for invalid token (34ms)

  Test Suites: 16 passed, 16 total
  Tests:       189 passed, 189 total  ← ALL PASSING (188 existing + 1 new)

Step 6: Opening pull request...
  ✓ Pushed branch to GitHub
  ✓ PR #456 opened: "Fix: Add null check in AuthService.validateToken"

PR Description:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## Summary
Fixes CI failure caused by missing null check in `AuthService.validateToken`.

## Root Cause
The method called `.split()` on a token that can be `null` or `undefined`, which throws a TypeError. The failing test passed `undefined` as input.

## Changes
- Added null check before `token.split()`
- Throws descriptive error when token is missing
- All tests now pass ✅

## Testing
- ✓ Existing tests pass
- ✓ New test added for null token case
- ✓ CI pipeline successful

## Autonomous Fix
🤖 This PR was automatically generated by Kiro Autonomous Developer Agent.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Fix deployed!

PR URL: https://github.com/myorg/my-app/pull/456
Status: Awaiting review
CI Status: ✅ All checks passing

[Agent returns to monitoring mode...]

What just happened:

GitHub Actions workflow failed
Agent received webhook notification
Agent downloaded and analyzed failure logs
Agent diagnosed the root cause (null check missing)
Agent read the source code and generated a fix
Agent created a branch, committed the fix, and ran tests
Agent verified all tests pass
Agent opened a PR with full context and test results

This is full autonomy: from failed workflow to open pull request, zero human intervention was required.

The Core Question You’re Answering

“Can you build an AI agent that operates completely autonomously in a production environment, diagnosing and fixing real failures without human guidance?”

This is the culmination of everything:

Headless operation (no interactive prompts)
Event-driven architecture (webhooks trigger actions)
Multi-step reasoning (diagnosis → fix → verify)
Safe automation (tests must pass before PR)
Production-ready (handles edge cases, logs all actions)

Concepts You Must Understand First

Stop and research these before coding:

GitHub Actions and Webhooks
- How do GitHub webhooks work? (delivery, signatures, retries)
- How do you download workflow logs via GitHub API?
- What information is in a workflow_run event?
- Reference: GitHub Webhooks Documentation
Headless Automation
- How do you run Kiro without interactive mode?
- How do you pass instructions via environment variables or config files?
- How do you handle errors when there’s no human to ask?
- Book Reference: “Continuous Delivery” by Jez Humble and David Farley - Ch. 10
Root Cause Analysis
- How do you parse stack traces programmatically?
- What patterns indicate common failure types? (null checks, type errors, async issues)
- How do you distinguish flaky tests from real bugs?
- Book Reference: “Release It!” by Michael Nygard - Ch. 4
Test-Driven Fixes
- How do you verify a fix is correct without human review?
- Should the agent add new tests for the failure case?
- What if the fix causes other tests to fail?
- Book Reference: “Test-Driven Development: By Example” by Kent Beck - Ch. 1-2
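
One way to approach the flaky-versus-real-bug question above is to rerun the failing commit a few times and compare the outcomes. The sketch below assumes the rerun results have already been collected; `RunOutcome`, `Verdict`, and `classifyFailure` are hypothetical names for illustration, not part of Kiro or the GitHub API.

```typescript
// Hypothetical types for illustration; not part of any real API.
type RunOutcome = { passed: boolean; failingTests: string[] };
type Verdict = "deterministic" | "flaky" | "not-reproduced";

// Classify a failure from the outcomes of several reruns of the same commit:
// - every rerun fails with the same set of tests: deterministic (likely a real bug)
// - reruns disagree (some pass, or the failing tests differ): flaky
// - every rerun passes: the failure was not reproduced
function classifyFailure(reruns: RunOutcome[]): Verdict {
  if (reruns.length === 0) {
    throw new Error("need at least one rerun to classify a failure");
  }
  if (reruns.every((r) => r.passed)) return "not-reproduced";
  if (reruns.some((r) => r.passed)) return "flaky";

  // All reruns failed: compare the (order-independent) sets of failing tests.
  const signature = (r: RunOutcome): string => [...r.failingTests].sort().join("\n");
  const first = signature(reruns[0]);
  return reruns.every((r) => signature(r) === first) ? "deterministic" : "flaky";
}
```

Only a "deterministic" verdict would justify an automatic code fix; the other two verdicts are better routed to a human or a plain workflow retry.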

Questions to Guide Your Design

Before implementing, think through these:

Event Handling
- How do you ensure webhook deliveries aren’t lost? (queue, retry logic)
- What if multiple workflows fail simultaneously?
- Should the agent handle one failure at a time or in parallel?
- How do you prevent duplicate fixes for the same failure?
Diagnosis
- How do you extract the root cause from logs? (regex patterns, LLM analysis)
- What if the logs don’t have enough information?
- Should the agent ask Kiro to analyze logs or use static patterns?
- How do you handle flaky tests (failures that pass on retry)?
Fix Generation
- Should the agent always attempt a fix or only for certain error types?
- What if Kiro generates a fix that makes things worse?
- Should the agent rollback if tests fail after the fix?
- How do you prevent infinite loops (fix → test fail → new fix → …)?
Safety and Approval
- Should all PRs auto-merge or require human review?
- What if the agent opens 100 PRs in a day?
- Should there be a “dry-run” mode that shows what it would do?
- How do you audit all agent actions?
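
The safety questions above (a runaway agent opening 100 PRs, dry-run mode, auditing) can be explored with a small gate that every PR request passes through. This is a minimal sketch under stated assumptions: `PullRequestGate` and its fields are hypothetical, the budget is kept in memory, and days are counted in UTC. A real agent would persist both the counters and the audit trail.

```typescript
// Hypothetical guard for illustration: every PR the agent wants to open goes through it.
type AuditEntry = { at: string; action: string; allowed: boolean; reason: string };

class PullRequestGate {
  private readonly openedByDay = new Map<string, number>();
  private readonly maxPerDay: number;
  private readonly dryRun: boolean;
  readonly audit: AuditEntry[] = [];

  constructor(maxPerDay: number, dryRun: boolean) {
    this.maxPerDay = maxPerDay;
    this.dryRun = dryRun;
  }

  // Returns true only if the caller should really open the PR.
  // Every decision, allowed or not, is appended to the audit trail.
  request(action: string, now: Date = new Date()): boolean {
    const day = now.toISOString().slice(0, 10); // UTC date, e.g. "2024-12-20"
    const used = this.openedByDay.get(day) ?? 0;

    let allowed = true;
    let reason = "within budget";
    if (this.dryRun) {
      allowed = false;
      reason = "dry-run: would have opened PR";
    } else if (used >= this.maxPerDay) {
      allowed = false;
      reason = `daily limit of ${this.maxPerDay} reached`;
    }

    if (allowed) this.openedByDay.set(day, used + 1);
    this.audit.push({ at: now.toISOString(), action, allowed, reason });
    return allowed;
  }
}
```

Because denied requests are logged too, the audit trail doubles as the dry-run report: it lists what the agent would have done.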

Thinking Exercise

Autonomous Decision Tree

The agent encounters this failure:

Error: ECONNREFUSED
  at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16)

Tests failed: 12 / 188

Questions to reason through:

Is this a code bug or an infrastructure issue? (Database not running?)
Should the agent attempt a code fix or just notify a human?
If it’s a missing service, how does the agent start it?
What if the error is intermittent (connection refused sometimes)?
Should the agent retry the workflow or fix the code first?

Decision tree:

Is error deterministic? (same failure every time)
  Yes → Attempt code fix
  No → Mark as flaky, notify human

Is error in application code or infrastructure?
  Application → Generate code patch
  Infrastructure → Notify ops team

Did fix pass tests?
  Yes → Open PR
  No → Rollback, try alternative fix
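
The decision tree above can be written down as a pair of small functions. This is a sketch, not a definitive policy: the type and function names are hypothetical, and `guessOrigin` is a deliberately crude heuristic that treats a handful of Node.js network error codes (such as `ECONNREFUSED` from the exercise) as infrastructure problems.

```typescript
type Origin = "application" | "infrastructure";
type Failure = {
  deterministic: boolean; // same failure on every rerun?
  origin: Origin;
};
type NextStep = "attempt-code-fix" | "notify-human-flaky" | "notify-ops";

// Heuristic assumption: these Node.js network error codes usually point at
// a missing or unreachable service rather than at a bug in application code.
const INFRA_ERROR_CODES = ["ECONNREFUSED", "ETIMEDOUT", "ENOTFOUND", "ECONNRESET"];

function guessOrigin(errorMessage: string): Origin {
  return INFRA_ERROR_CODES.some((code) => errorMessage.includes(code))
    ? "infrastructure"
    : "application";
}

// First two questions of the tree: is it deterministic, and where does it come from?
function decideNextStep(failure: Failure): NextStep {
  if (!failure.deterministic) return "notify-human-flaky";
  if (failure.origin === "infrastructure") return "notify-ops";
  return "attempt-code-fix";
}

// Last question of the tree: did the fix pass the tests?
function afterFix(testsPassed: boolean): "open-pr" | "rollback-and-retry" {
  return testsPassed ? "open-pr" : "rollback-and-retry";
}
```

For the `ECONNREFUSED` failure in the exercise, a deterministic reproduction would route to the ops team rather than to a code patch.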

The Interview Questions They’ll Ask

Prepare to answer these:

“How would you prevent the autonomous agent from making things worse?”
“What if the agent generates an infinite loop of PRs?”
“How do you ensure the agent doesn’t leak secrets or sensitive data?”
“What happens if the agent’s fix causes a production outage?”
“How would you audit all actions taken by the autonomous agent?”
“Should the agent have access to merge PRs or only create them?”

Hints in Layers

Hint 1: Webhook Server Set up an Express server to receive GitHub webhooks:

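import express from 'express';
import crypto from 'crypto';

const app = express();
// Assumption: the webhook secret is provided via an environment variable
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET;
if (!WEBHOOK_SECRET) throw new Error('WEBHOOK_SECRET is not set');

// GitHub signs the raw request bytes, so keep them alongside the parsed JSON
const jsonWithRawBody = express.json({
  verify: (req, _res, buf) => { req.rawBody = buf; },
});

app.post('/webhook', jsonWithRawBody, (req, res) => {
  // Verify signature over the raw body with a constant-time comparison
  const signature = Buffer.from(String(req.headers['x-hub-signature-256'] ?? ''));
  const hmac = crypto.createHmac('sha256', WEBHOOK_SECRET);
  const digest = Buffer.from('sha256=' + hmac.update(req.rawBody ?? '').digest('hex'));

  if (signature.length !== digest.length || !crypto.timingSafeEqual(signature, digest)) {
    return res.status(401).send('Invalid signature');
  }

  // Acknowledge first: GitHub expects a response within about 10 seconds
  res.status(202).send('Accepted');

  // Handle event in the background (handleWorkflowFailure is defined elsewhere)
  const isFailedRun =
    req.headers['x-github-event'] === 'workflow_run' &&
    req.body.action === 'completed' &&
    req.body.workflow_run.conclusion === 'failure';
  if (isFailedRun) {
    handleWorkflowFailure(req.body.workflow_run).catch((err) => console.error(err));
  }
});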

Hint 2: Log Analysis Download logs and extract the failure:

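// Download the logs for a run and extract the first error with its location
async function findFirstError(octokit, owner, repo, run_id) {
  // The endpoint returns a zip archive of per-step log files, not a string
  const response = await octokit.actions.downloadWorkflowRunLogs({
    owner,
    repo,
    run_id,
  });
  // unzipToText is a hypothetical helper that extracts and concatenates the
  // archive's text files (for example with a zip library)
  const logs = await unzipToText(response.data);

  // Parse logs to find an error line followed by its first stack frame.
  // The frame is either "at fn (file:line:col)" or "at file:line:col".
  const errorPattern = /Error: (.+)\r?\n.*?\bat (?:.*?\()?([^\s()]+):(\d+):(\d+)/;
  const match = logs.match(errorPattern);

  if (match) {
    const [, message, file, line, column] = match;
    return { message, file, line: parseInt(line, 10), column: parseInt(column, 10) };
  }
  return null; // no recognizable stack trace in the logs
}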

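Hint 3: Headless Kiro Invocation Run Kiro in non-interactive mode. The command and flags below are illustrative; check the Kiro CLI documentation for the options your version actually supports: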

$ kiro --headless --prompt "Fix the null check error in AuthService.ts line 45" \
       --files src/services/AuthService.ts \
       --output-branch fix/auth-service-null-check \
       --auto-commit

Or via a programmatic API, if Kiro exposes one (the `kiro.executeTask` call below is hypothetical):

const result = await kiro.executeTask({
  instruction: "Add null check before token.split() on line 45",
  files: ['src/services/AuthService.ts'],
  branch: 'fix/auth-service-null-check',
  runTests: true,
});

Hint 4: Verification Loop After generating a fix:

Checkout the fix branch
Run npm install (or pip install, etc.)
Run npm test
Parse test output:
- If all pass → open PR
- If some fail → analyze failures and retry
- If all fail → abort and notify human
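
The branching at the end of the verification loop can be sketched as a parser for the Jest summary line plus a decision function. This assumes Jest-style output like the `Tests: 1 failed, 187 passed, 188 total` line shown earlier; `parseJestSummary` and `decideAfterTests` are hypothetical names, and other test runners would need their own parser.

```typescript
type TestSummary = { failed: number; passed: number; total: number };
type Decision = "open-pr" | "analyze-and-retry" | "abort-and-notify";

// Read one count such as "187 passed" out of the summary line (0 if absent).
function countOf(line: string, label: string): number {
  const m = line.match(new RegExp(`(\\d+) ${label}`));
  return m ? parseInt(m[1], 10) : 0;
}

// Find the "Tests:" summary line in Jest output and extract its counts.
// Returns null when no summary line is present (for example, the run crashed).
function parseJestSummary(output: string): TestSummary | null {
  const line = output
    .split(/\r?\n/)
    .find((l) => l.trimStart().startsWith("Tests:"));
  if (line === undefined) return null;
  return {
    failed: countOf(line, "failed"),
    passed: countOf(line, "passed"),
    total: countOf(line, "total"),
  };
}

// Map the summary onto the three outcomes listed in the hint.
function decideAfterTests(summary: TestSummary | null): Decision {
  if (summary === null || summary.total === 0) return "abort-and-notify"; // nothing ran
  if (summary.failed === 0) return "open-pr";
  if (summary.passed === 0) return "abort-and-notify"; // everything failed
  return "analyze-and-retry";
}
```

Treating a missing summary as "abort" is a conservative choice: if the agent cannot tell what happened, it should not open a PR.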

Books That Will Help

Topic	Book	Chapter
Webhooks	“Webhooks: Events for RESTful APIs” by Mike Amundsen	Ch. 2-3
CI/CD	“Continuous Delivery” by Jez Humble and David Farley	Ch. 10
Root cause analysis	“Release It!” by Michael Nygard	Ch. 4
Autonomous systems	“Building Event-Driven Microservices” by Adam Bellemare	Ch. 6

Common Pitfalls and Debugging

Problem 1: “Agent creates duplicate PRs for the same failure”

Why: Webhook is delivered multiple times or agent doesn’t track what it’s fixed
Fix: Store a hash of (run_id + failure_message) in a database to deduplicate
Quick test: Trigger same failure twice — does it create one PR or two?
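
A minimal sketch of the deduplication fix, assuming a SHA-256 hash over the run ID and failure message as the key. The in-memory `Set` stands in for the database the fix calls for, and `failureKey` and `FailureDeduplicator` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Stable key for a failure: SHA-256 over the run ID and the failure message.
function failureKey(runId: number, failureMessage: string): string {
  return createHash("sha256").update(`${runId}\n${failureMessage}`).digest("hex");
}

class FailureDeduplicator {
  // Assumption: an in-memory set for illustration; a real agent would use a
  // database so the record survives restarts.
  private readonly seen = new Set<string>();

  // Returns true the first time a failure is seen and false for repeats,
  // so the caller can skip work when a webhook is delivered twice.
  markIfNew(runId: number, failureMessage: string): boolean {
    const key = failureKey(runId, failureMessage);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

The handler would call `markIfNew` before starting a diagnosis and return early when it yields `false`.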

Problem 2: “Fix causes other tests to fail”

Why: Fix is too aggressive or changes behavior elsewhere
Fix: Run full test suite before opening PR; if new failures appear, rollback
Quick test: Generate a fix that breaks a different test — does agent catch it?
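
Detecting "new failures" comes down to comparing the set of failing tests before and after the fix. The sketch below assumes the test names have already been extracted from the runner's output; `newFailures` and `shouldRollback` are hypothetical helpers.

```typescript
// Tests that fail after the fix but did not fail before it.
function newFailures(failingBefore: string[], failingAfter: string[]): string[] {
  const known = new Set(failingBefore);
  return failingAfter.filter((test) => !known.has(test));
}

// Roll back whenever the fix introduces at least one failure that was not
// there before, even if it also repaired the original one.
function shouldRollback(failingBefore: string[], failingAfter: string[]): boolean {
  return newFailures(failingBefore, failingAfter).length > 0;
}
```

A fix that leaves the original test failing but breaks nothing new does not trigger a rollback here; that case belongs to the retry logic instead.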

Problem 3: “Agent leaks API keys or secrets in PRs”

Why: Logs or fix code include sensitive data
Fix: Use a secret sanitization hook before commits
Quick test: Simulate a failure with API key in logs — is it redacted in PR?
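
A sanitization hook could start from a pattern-based redactor like the sketch below. The patterns are assumptions chosen for illustration: GitHub token prefixes (`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`), AWS access key IDs (`AKIA` plus 16 characters), and generic `key=value` assignments. The list is not exhaustive, and the assignment pattern will also redact harmless lines such as `token = someVariable`, which is the safer failure mode for text that ends up in a public PR.

```typescript
// Known token shapes (assumed, non-exhaustive list).
const SECRET_PATTERNS: RegExp[] = [
  /\bgh[pousr]_[A-Za-z0-9]{36,}\b/g, // GitHub tokens
  /\bAKIA[0-9A-Z]{16}\b/g, // AWS access key IDs
];

// Generic "name = value" or "name: value" assignments with a sensitive name.
const ASSIGNMENT_PATTERN = /\b(api[_-]?key|token|secret|password)(\s*[:=]\s*)\S+/gi;

// Replace anything that looks like a secret with a fixed placeholder.
function redactSecrets(text: string): string {
  let out = text;
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  // Keep the name and separator so the log stays readable; drop only the value.
  return out.replace(ASSIGNMENT_PATTERN, "$1$2[REDACTED]");
}
```

Running the redactor over both the downloaded logs and the generated PR description covers the two places the pitfall mentions.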

Problem 4: “Infinite loop: fix fails → new fix → fails → …”

Why: No circuit breaker for repeated failures
Fix: Limit retries to 3; if all fail, notify human and stop
Quick test: Create an unfixable failure — does agent stop after 3 tries?
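
The retry limit can be sketched as a small circuit breaker keyed by failure. `FixCircuitBreaker` and `runFixLoop` are hypothetical names, the default of 3 attempts comes from the fix above, and the counters are kept in memory for illustration; `tryFix` stands in for "generate a fix and run the tests".

```typescript
class FixCircuitBreaker {
  private readonly failedAttempts = new Map<string, number>();
  private readonly maxAttempts: number;

  constructor(maxAttempts = 3) {
    this.maxAttempts = maxAttempts;
  }

  // True while this failure still has fix attempts left.
  canAttempt(failureId: string): boolean {
    return (this.failedAttempts.get(failureId) ?? 0) < this.maxAttempts;
  }

  // Record a fix whose tests failed.
  recordFailedFix(failureId: string): void {
    this.failedAttempts.set(failureId, (this.failedAttempts.get(failureId) ?? 0) + 1);
  }

  // A successful fix clears the counter for that failure.
  recordSuccess(failureId: string): void {
    this.failedAttempts.delete(failureId);
  }
}

// Try fixes until one passes or the breaker opens; returns the attempts used.
// notifyHuman is called exactly once when the agent gives up.
function runFixLoop(
  breaker: FixCircuitBreaker,
  failureId: string,
  tryFix: () => boolean,
  notifyHuman: (message: string) => void,
): number {
  let attempts = 0;
  while (breaker.canAttempt(failureId)) {
    attempts++;
    if (tryFix()) {
      breaker.recordSuccess(failureId);
      return attempts;
    }
    breaker.recordFailedFix(failureId);
  }
  notifyHuman(`Stopped after ${attempts} failed fix attempts for ${failureId}`);
  return attempts;
}
```

Because the breaker stays open for that failure ID, a redelivered webhook for the same failure does not restart the loop.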