Sprint: Kiro CLI Mastery - Real World Projects

Goal: Deeply understand Kiro CLI—the next-generation agentic terminal environment from AWS. You will master context management, custom agent creation, MCP integrations, type-safe automation hooks, headless CI/CD pipelines, and complex multi-agent workflows. By the end, you will transform your terminal from a passive command executor into an intelligent development partner that understands your codebase, enforces your standards, and automates your workflows with minimal human intervention. You will learn how to steer AI execution, control permissions, manage context, and build reliable automation that produces real, verifiable outcomes.


Introduction

Kiro CLI is not just a terminal chatbot—it’s a comprehensive agentic runtime that transforms how you interact with code, infrastructure, and development workflows.

What is Kiro CLI? Kiro CLI (formerly Amazon Q Developer CLI) is AWS’s agentic terminal environment that provides:

  • Deep context awareness across your entire project
  • Autonomous task execution with human-in-the-loop approval
  • Extensible tool ecosystem via Model Context Protocol (MCP)
  • Type-safe automation hooks for quality gates
  • Multi-agent workflows for complex tasks

What problem does it solve today? The terminal has been a passive executor for decades. You type commands, it returns output. This places 100% of the cognitive load on you. Kiro inverts this: you describe intent, and Kiro researches, plans, executes, and verifies—with your guidance.

What will you build across the projects? You’ll build a complete Kiro ecosystem:

  • Session management and analytics tools
  • Custom agents (security auditor, code reviewer, DevOps engineer)
  • MCP integrations (databases, GitHub, AWS, Slack, Docker)
  • Type-safe automation hooks (secret scanning, test generation, auto-formatting)
  • Multi-agent orchestration systems
  • Headless CI/CD pipelines
  • Configuration sharing systems for teams

What is in scope vs out of scope?

In Scope:

  • Kiro CLI configuration, agents, steering, hooks, and MCP
  • Building custom tools and integrations
  • Multi-agent workflows and orchestration
  • Headless automation for CI/CD
  • Team collaboration and configuration sharing

Out of Scope:

  • AWS-specific infrastructure setup (covered separately)
  • Deep LLM architecture (we use Kiro’s models)
  • General software engineering (we assume proficiency)

Big Picture: Kiro as an Agentic Operating System

┌────────────────────────────────────────────────────────────────┐
│                    KIRO CLI - AGENTIC OS                       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐         │
│  │   CONTEXT    │  │    AGENCY    │  │   STEERING   │         │
│  │   LAYER      │  │    LAYER     │  │    LAYER     │         │
│  ├──────────────┤  ├──────────────┤  ├──────────────┤         │
│  │ • Files      │  │ • Subagents  │  │ • product.md │         │
│  │ • Git        │  │ • Planning   │  │ • tech.md    │         │
│  │ • Sessions   │  │ • Execution  │  │ • style.md   │         │
│  │ • Knowledge  │  │ • Parallel   │  │ • security.md│         │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘         │
│         │                 │                 │                 │
│         └─────────────────┼─────────────────┘                 │
│                           │                                   │
│              ┌────────────▼───────────┐                        │
│              │   REVL LOOP ENGINE     │                        │
│              │   (Read, Evaluate,     │                        │
│              │    Verify, Loop)       │                        │
│              └────────┬───────────────┘                        │
│                       │                                        │
│         ┌─────────────┼─────────────┐                          │
│         │             │             │                          │
│         ▼             ▼             ▼                          │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐                    │
│  │   TOOLS   │ │    MCP    │ │   HOOKS   │                    │
│  │  (Built-  │ │ (External │ │ (Events)  │                    │
│  │   in)     │ │  Systems) │ │           │                    │
│  └───────────┘ └───────────┘ └───────────┘                    │
│                                                                │
└────────────────────────────────────────────────────────────────┘

How to Use This Guide

Reading the Primer Before Projects:

  1. Start with the Theory Primer (sections below) to build mental models
  2. Read the Concept Summary Table to see what you’ll learn
  3. Review the Project-to-Concept Map to understand project coverage
  4. Use Deep Dive Reading for book references on each concept

Picking a Learning Path:

  • Beginner Path: Projects 1-5 → foundations and configuration
  • Intermediate Path: Projects 6-15 → MCP integrations and hooks
  • Advanced Path: Projects 16-25 → multi-agent workflows and automation
  • Enterprise Path: Projects 26-35 → headless CI/CD and team collaboration
  • Expert Path: Projects 36-40 → advanced orchestration and capstone

Validating Progress: Each project has a “Definition of Done” checklist and “Real World Outcome” section with exact expected outputs. Use these to verify mastery.


Prerequisites & Background Knowledge

Essential Prerequisites (Must Have)

  • Terminal proficiency: Comfortable with bash, environment variables, pipes, and redirection
  • Git fundamentals: Commit, branch, merge, rebase, remote operations
  • Programming basics: Variables, functions, loops, conditionals in any language
  • JSON/YAML: Read and write configuration files
  • Recommended Reading: “Effective Shell” by Dave Kerr - Ch. 1-5

Helpful But Not Required

  • Node.js/TypeScript: Learn during Projects 8, 14, 17 (Bun/TypeScript hooks)
  • Python: Learn during Projects 9, 13 (MCP server development)
  • Docker: Learn during Projects 18, 34 (containerized MCP servers)
  • AWS familiarity: Learn during Projects 9, 11, 17 (AWS integrations)

Self-Assessment Questions

  1. Can you explain what export VAR=value does vs VAR=value?
  2. Do you understand stdin, stdout, and stderr?
  3. Can you write a basic shell script with conditionals?
  4. Do you know how to parse JSON with jq or similar tools?
  5. Can you explain what environment variables are and how they’re inherited?
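
If any of those answers feel shaky, the short session below is a quick self-check for questions 1, 2, and 4 (it assumes a bash shell with jq installed; everything else is standard):

# Q1: export makes a variable visible to child processes; a plain assignment does not
$ FOO=1;        bash -c 'echo "plain:    ${FOO:-unset}"'
plain:    unset
$ export FOO=1; bash -c 'echo "exported: ${FOO:-unset}"'
exported: 1

# Q2 and Q4: stdout carries data (parsed here with jq); stderr carries errors
$ echo '{"name":"kiro"}' | jq -r .name
kiro
$ ls /nonexistent 2>/dev/null || echo "the error went to stderr and was suppressed"
the error went to stderr and was suppressed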

Development Environment Setup

Required Tools:

  • Kiro CLI: Latest version via npm install -g @aws/kiro-cli or brew
  • Node.js: v20+ (for MCP servers and hook development)
  • Bun: v1.0+ (for type-safe hooks, install via curl -fsSL https://bun.sh/install | bash)
  • Git: v2.30+ (for version control and context)
  • jq: Latest (for JSON parsing, brew install jq or apt install jq)

Recommended Tools:

  • Docker: v24+ (for containerized MCP servers)
  • PostgreSQL client: v14+ (for database MCP projects)
  • gh CLI: Latest (for GitHub integration projects)
  • AWS CLI: v2 (for AWS integration projects)

Testing Your Setup:

$ kiro-cli --version
kiro-cli version 1.x.x

$ node --version
v20.x.x

$ bun --version
1.x.x

$ jq --version
jq-1.x

$ git --version
git version 2.x.x

Time Investment

  • Simple projects (1-10): 4-8 hours each (fundamentals, configuration)
  • Moderate projects (11-25): 10-20 hours each (MCP, hooks, integrations)
  • Complex projects (26-35): 20-40 hours each (multi-agent, CI/CD)
  • Advanced projects (36-40): 40-80 hours each (orchestration, capstone)
  • Total sprint: 4-8 months at 10-15 hours/week

Important Reality Check: Kiro CLI is cutting-edge technology. You will encounter:

  • Documentation gaps: Official docs are evolving; use community resources
  • Breaking changes: Kiro is under active development; expect API changes
  • Learning curve: Agentic workflows require new mental models
  • Debugging complexity: Multi-agent systems are harder to debug than scripts

This is normal. The goal is deep understanding, not just “making it work.”


Big Picture / Mental Model

The Agentic Terminal Paradigm Shift

Traditional terminals are reactive executors:

You: command
Terminal: output
You: next command
Terminal: output

Kiro CLI is a proactive partner:

You: intent ("Add OAuth login")
Kiro: research → plan → confirm → execute → verify → report
You: approve/modify
Kiro: continue

The REVL Loop: Kiro’s Core Execution Model

Kiro is not a single-shot REPL. It’s an execution loop with guardrails:

┌────────────────────────────────────────────────────────────────┐
│                    REVL LOOP (Deterministic AI)                │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│   User Intent                                                  │
│       │                                                        │
│       ▼                                                        │
│   ┌──────────────┐                                             │
│   │   READ       │  Load context: files, git, session history │
│   │   Context    │                                             │
│   └──────┬───────┘                                             │
│          │                                                     │
│          ▼                                                     │
│   ┌──────────────┐                                             │
│   │  EVALUATE    │  Generate plan based on context            │
│   │   Plan       │  Check against steering rules              │
│   └──────┬───────┘                                             │
│          │                                                     │
│          ▼                                                     │
│   ┌──────────────┐                                             │
│   │  VERIFY      │  Validate plan meets constraints           │
│   │   Constraints│  Human approval gate                       │
│   └──────┬───────┘                                             │
│          │                                                     │
│          ▼                                                     │
│   ┌──────────────┐                                             │
│   │  EXECUTE     │  Run tools, write files, call MCP          │
│   │   Tools      │  Hooks run pre/post execution              │
│   └──────┬───────┘                                             │
│          │                                                     │
│          ▼                                                     │
│   ┌──────────────┐                                             │
│   │  VERIFY      │  Check results match expectations          │
│   │   Results    │  Run tests, lint, validate                 │
│   └──────┬───────┘                                             │
│          │                                                     │
│          ▼                                                     │
│   ┌──────────────┐                                             │
│   │  LOOP        │  Continue to next subtask or stop          │
│   │              │                                             │
│   └──────────────┘                                             │
│                                                                │
│   Why this matters: AI output is probabilistic.               │
│   Verification makes it deterministic.                        │
│   If the loop cannot verify, the task is not done.            │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Key Insight: The REVL loop separates intent, execution, and verification—making AI reliable enough for production workflows.


Theory Primer

This section builds the conceptual foundation you need before diving into projects. Each concept below has a dedicated chapter with deep explanations, diagrams, and examples.


Chapter 1: Configuration and Scope - The Settings Hierarchy

Fundamentals

Kiro CLI uses a three-tier configuration system where settings cascade from global → project → agent scope. Understanding this hierarchy is critical because misconfiguration causes mysterious failures and frustrating debugging sessions.

The global scope (~/.kiro/settings.json) defines your baseline: default model, telemetry preferences, and global steering rules. These apply to every Kiro session unless overridden.

The project scope (.kiro/settings.json in your repo) enforces workspace-specific constraints: which MCP servers to load, project-specific steering files, and tool permissions. These override global settings.

The agent scope (.kiro/agents/your-agent.json) creates specialized personas with locked-down capabilities. A security auditor agent might have only read permissions and be pinned to the Opus model for deep analysis.

When Kiro starts, it merges these three layers, with agent scope winning conflicts, then project, then global. This allows you to define sensible defaults globally while enforcing strict boundaries per-project and per-agent.

Deep Dive

The configuration hierarchy solves a fundamental problem in agentic systems: how do you balance flexibility with safety across different contexts?

Global Configuration (~/.kiro/settings.json)

Your global config is your personal AI preferences. This is where you set:

  • Default model: Auto router, Haiku, Sonnet, or Opus
  • Telemetry: Whether AWS can collect usage data
  • Global steering: Organization-wide coding standards (e.g., “Always use TypeScript strict mode”)
  • Global MCP servers: Tools you want available everywhere (e.g., AWS docs, GitHub)

Example global config:

{
  "model": "auto",
  "telemetry": false,
  "steering": {
    "files": [
      "file://~/.kiro/steering/typescript-standards.md",
      "file://~/.kiro/steering/company-security-policy.md"
    ]
  },
  "mcpServers": {
    "github": {
      "command": "docker",
      "args": ["run", "-i", "ghcr.io/github/mcp-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}

Project Configuration (.kiro/settings.json)

Project configs override globals for workspace-specific needs:

  • Project-specific MCP servers: Postgres for backend, none for frontend
  • Stricter tool permissions: Block shell commands in production repos
  • Project steering: “Use Prisma for all DB queries”, “API routes go in /api”

Example project config:

{
  "model": "opus",
  "steering": {
    "files": [
      "file://.kiro/steering/api-standards.md",
      "file://.kiro/steering/database-rules.md"
    ]
  },
  "tools": {
    "shell": {
      "allowedCommands": ["npm test", "npm run lint"],
      "deniedCommands": ["rm -rf", "sudo", "curl | sh"]
    }
  },
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
    }
  }
}

Agent Configuration (.kiro/agents/security-auditor.json)

Agent configs create specialized personas with narrow capabilities:

{
  "name": "security-auditor",
  "model": "opus",
  "prompt": "You are a security auditor. Review code for OWASP Top 10 vulnerabilities. You cannot modify files—only report findings.",
  "allowedTools": ["read", "grep", "glob"],
  "deniedTools": ["write", "shell", "aws"],
  "resources": [
    "file://docs/security-standards.md",
    "file://docs/owasp-top-10.md"
  ],
  "mcpServers": {
    "sonarqube": {
      "command": "node",
      "args": ["./mcp-servers/sonarqube.js"]
    }
  }
}

Precedence Resolution

When Kiro starts, it merges configs in this order:

  1. Load global config
  2. Overlay project config (if in a repo with .kiro/)
  3. Overlay agent config (if --agent <name> flag used)

Conflict resolution:

  • Arrays: Concatenate (steering files from all levels)
  • Objects: Deep merge (MCP servers from all levels)
  • Primitives: Agent wins, then project, then global
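
As a rough illustration of these rules (not how Kiro implements them internally), the jq expression below merges a global config with a project config the same way: steering-file arrays concatenate, objects deep-merge, and the more specific file wins on primitives:

$ jq -s '
    ((.[0].steering.files // []) + (.[1].steering.files // [])) as $files
    | (.[0] * .[1])                 # deep-merge; right side (project) wins on primitives
    | .steering.files = $files      # steering arrays concatenate instead of overriding
  ' ~/.kiro/settings.json .kiro/settings.json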

Common Pitfalls:

  • Forgetting project overrides global: You set model: haiku globally, the project forces opus, and you waste time wondering why responses are slow
  • Agent deniedTools blocking MCP: Your agent denies “shell” but your MCP server needs it to run
  • Steering file conflicts: Global says “use tabs”, project says “use spaces”—Kiro gets confused

Mental Model: Think of configs as CSS specificity:

  • Global = element selectors (broad, low priority)
  • Project = class selectors (scoped, medium priority)
  • Agent = ID selectors (specific, highest priority)

How this fits in projects:

  • Project 1: Personalized Kiro Config - You’ll build a robust global config
  • Project 4: Custom Agent Factory - You’ll create specialized agent configs
  • Project 5: Steering Rules Engine - You’ll write comprehensive steering files
  • Project 16: Configuration Sync System - You’ll share configs across machines

Common Misconceptions:

  • “Global config is read-only”: False. You can edit it anytime; changes apply to new sessions.
  • “Agent config must live in .kiro/agents/”: False. You can load one from anywhere with --agent-config path/to/agent.json.
  • “Precedence always favors specificity”: Mostly true, but arrays concatenate rather than override.

Check-your-understanding questions:

  1. If global sets model: haiku and project sets model: opus, which wins?
  2. If global steering has typescript-standards.md and project has api-standards.md, which files load?
  3. How would you debug which config is actually being used?

Check-your-understanding answers:

  1. Project wins (opus). Agent would win over both.
  2. Both files load. Steering arrays concatenate.
  3. Run kiro-cli settings show to see merged config. Or check ~/.kiro/logs/ for config loading logs.

Real-world applications:

  • Enterprise teams: Global config enforces company security policies, project config adds repo-specific rules
  • Freelancers: Global config for personal preferences, project configs per client
  • Open source: Project config for contributor guidelines, global empty for maintainer flexibility

Where you’ll apply it:

  • Project 1: Build layered config system
  • Project 4: Create specialized agents
  • Project 16: Sync configs across machines
  • Project 37: Team configuration sharing

References:

  • Kiro CLI Docs: Configuration - https://kiro.dev/docs/cli/configuration/
  • “The Pragmatic Programmer” by Hunt & Thomas - Ch. 4: “Configuration”
  • AWS re:Invent 2024: Kiro CLI Deep Dive (YouTube)

Key insights: Config hierarchy enables both flexibility (global defaults) and safety (agent restrictions)—the foundation of scalable agentic workflows.

Summary: Kiro’s three-tier config system (global → project → agent) allows you to balance personal preferences, project requirements, and agent specialization. Mastering this hierarchy prevents configuration conflicts and enables team-wide standardization.

Homework/Exercises to practice the concept:

  1. Create a global config that sets Haiku as default model and enables telemetry
  2. Create a project config that overrides to Opus and loads a Postgres MCP server
  3. Create an agent config that denies all write tools
  4. Run kiro-cli settings show and explain which settings came from which layer
  5. Intentionally create a conflict (e.g., global allows shell, agent denies it) and observe behavior

Solutions to the homework/exercises:

  1. Global config: {"model": "haiku", "telemetry": true}
  2. Project config: {"model": "opus", "mcpServers": {"postgres": {...}}}
  3. Agent config: {"deniedTools": ["write", "edit", "multiedit"]}
  4. Settings show will display merged config with annotations showing source
  5. Agent denial wins—shell commands will be blocked despite global allowing them
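
If you want those answers as real files on disk, one possible layout is sketched below (the postgres entry simply mirrors the project-config example earlier in this chapter; adjust paths to taste):

# Exercise 1: global config
$ cat > ~/.kiro/settings.json << 'EOF'
{ "model": "haiku", "telemetry": true }
EOF

# Exercise 2: project config
$ mkdir -p .kiro && cat > .kiro/settings.json << 'EOF'
{
  "model": "opus",
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
    }
  }
}
EOF

# Exercise 3: agent that denies all write tools
$ mkdir -p .kiro/agents && cat > .kiro/agents/read-only.json << 'EOF'
{ "name": "read-only", "deniedTools": ["write", "edit", "multiedit"] }
EOF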


Chapter 2: Context Management - RAM, Not Disk

Fundamentals

Context is Kiro’s working memory—the information it actively uses to understand your intent and execute tasks. Unlike traditional tools that read files on demand, Kiro loads relevant information into its “context window” (a token budget, typically 200K tokens). Think of it as RAM for the AI: fast, limited, and requiring active management. Context includes chat history, file contents, steering rules, and MCP tool outputs. When context fills up, Kiro must compact (summarize) or remove information, risking the loss of important details. Mastering context management is the difference between an AI that “remembers” your project and one that constantly forgets critical constraints.

Deep Dive

Kiro’s context system operates on a tiered hierarchy with three distinct layers:

1. Session Context (Chat History) Every message you send and every response Kiro generates consumes tokens. A 100-message conversation can easily consume 20-50K tokens. This is why long sessions feel “forgetful”—early messages get summarized or dropped. The /compact command intelligently summarizes chat history while preserving critical information (steering rules, file references, decisions made).

2. Agent Resources (Explicit Context) Files you explicitly add via /context add src/auth.ts or by mentioning them (“look at auth.ts”). These remain in context until you /context remove them or context fills up. Kiro also loads files automatically when you reference them in prompts. The /context show command displays what’s currently loaded, broken down by source (chat, files, steering).

3. Knowledge Base (Indexed Codebase) For large codebases, Kiro can index your entire project using /knowledge enable. This creates vector embeddings for semantic search. When context is full, Kiro queries the knowledge base instead of holding everything in RAM. It’s slower but unlimited—like swapping to disk.

Mental Model:

┌────────────────────────────────────────────────────────┐
│             KIRO CONTEXT WINDOW (200K tokens)          │
├────────────────────────────────────────────────────────┤
│                                                        │
│  Session Context (Chat History)                       │
│  ┌──────────────────────────────────────────────┐     │
│  │ User: "Add authentication"                   │ 5K  │
│  │ Kiro: "I'll create JWT middleware..."        │     │
│  │ User: "Use bcrypt for passwords"             │     │
│  │ Kiro: "Installing bcrypt..."                 │     │
│  └──────────────────────────────────────────────┘     │
│                                                        │
│  Agent Resources (Explicit Files)                     │
│  ┌──────────────────────────────────────────────┐     │
│  │ • src/auth/middleware.ts          (4.2K)     │ 45K │
│  │ • src/models/User.ts              (3.1K)     │     │
│  │ • .kiro/steering/security.md      (1.8K)     │     │
│  │ • package.json                    (0.9K)     │     │
│  └──────────────────────────────────────────────┘     │
│                                                        │
│  Steering Rules (Active Constraints)                  │
│  ┌──────────────────────────────────────────────┐     │
│  │ • .kiro/steering/tech.md          (2.5K)     │ 8K  │
│  │ • .kiro/steering/api-standards.md (1.2K)     │     │
│  └──────────────────────────────────────────────┘     │
│                                                        │
│  Total: 58K / 200K (29% full)                         │
└────────────────────────────────────────────────────────┘

When full, Kiro must:
1. Compact (summarize) chat history
2. Remove least-used files
3. Query knowledge base instead of loading everything

How this fits on projects:

  • Project 3: Visualize context usage in real-time
  • Project 20: Inject git diffs automatically
  • Project 25: Use tangent mode to isolate exploratory work
  • Project 28: Enable knowledge base for large codebases

Minimal concrete example:

# Load a file into context
$ kiro-cli
> /context add src/auth.ts

# Check what's in context
> /context show
Session Context: 12,450 tokens (chat history)
Agent Resources: 8,200 tokens
  • src/auth.ts (4,200 tokens)
  • .kiro/steering/security.md (4,000 tokens)
Total: 20,650 / 200,000 tokens (10%)

# Remove file when done
> /context remove src/auth.ts

# Compact chat history when it gets large
> /compact
✓ Compacted 45 messages into 3 summary blocks
  Saved 18,000 tokens

Common misconceptions:

  • “Context is permanent”: No, it’s cleared on session restart unless you checkpoint
  • “Kiro reads files from disk as needed”: No, files must be in context or knowledge base
  • “Bigger context is always better”: No, too much context dilutes signal—Kiro loses focus

Check-your-understanding questions:

  1. Why does Kiro “forget” earlier parts of long conversations?
  2. What’s the difference between /context add and enabling knowledge base?
  3. When should you use /compact vs /context clear?

Check-your-understanding answers:

  1. Chat history consumes tokens. When context fills, early messages get summarized or dropped
  2. /context add keeps files in active RAM (fast, limited). Knowledge base indexes everything (slow retrieval, unlimited storage)
  3. /compact when you want to preserve decisions but free space. /context clear when starting fresh

Real-world applications:

  • Code reviews: Load changed files + steering rules, compact when reviewing many PRs
  • Refactoring: Enable knowledge base for cross-file references, use context for active files
  • Debugging: Add error logs + relevant source files, remove when fixed

Where you’ll apply it:

  • Project 3: “The Context Detective” - Visualize context usage
  • Project 20: “The Git Context Injector” - Auto-inject diffs
  • Project 25: “The Tangent Explorer” - Isolate exploratory context
  • Project 28: “The Semantic Search Engine” - Enable knowledge base

References:

  • Kiro CLI Docs: Context Management - https://kiro.dev/docs/cli/context/
  • “AI Engineering” by Chip Huyen - Ch. 3: “Context Windows”
  • “Designing Data-Intensive Applications” by Kleppmann - Ch. 3: “Storage”

Key insights: Context is scarce—treat it like RAM, not disk. Load what you need, remove what you don’t, and use knowledge base for deep search.

Summary: Kiro’s context window is its working memory, limited to ~200K tokens. It’s divided into session context (chat), agent resources (files), and steering rules. When full, Kiro compacts or forgets. For large codebases, enable knowledge base for unlimited semantic search.

Homework/Exercises:

  1. Load 5 files into context and run /context show. Calculate what percentage of your budget is used
  2. Have a 20-message conversation, then run /compact. Check how many tokens you saved
  3. Enable /knowledge on a 50+ file project and ask “where is authentication handled?”
  4. Deliberately overflow context (add huge files) and observe Kiro’s behavior
  5. Create a tangent for debugging, then return to main context. Verify tangent didn’t pollute main thread

Solutions:

  1. /context add each file, then /context show will display token breakdown
  2. /compact shows “Saved X tokens” - typically 30-60% reduction
  3. /knowledge enable, then ask "where is authentication handled?" - Kiro will cite specific files/lines
  4. Kiro will warn “Context nearly full” and auto-compact or reject adding more files
  5. /tangent "debug login" → work → /return/context show (tangent content absent)

Chapter 3: Steering and Specs - Constraints That Actually Work

Fundamentals

Steering files are Markdown documents that constrain AI behavior through explicit rules. They function as “soft prompts” that Kiro reads before executing any task. Unlike hard-coded logic, steering uses natural language constraints that the LLM interprets and enforces. A steering file might say “Never use var in JavaScript” or “All API responses must include error codes.” Kiro reads these rules, internalizes them, and refuses to violate them—even when explicitly asked. This makes steering the foundation of enterprise AI deployment: you encode company standards, security policies, and architectural decisions into files that travel with your codebase.

Deep Dive

Steering operates on a hierarchical loading system:

1. Global Steering (~/.kiro/steering/*.md) Applied to ALL sessions across ALL projects. Use for personal coding preferences (e.g., “Always use TypeScript strict mode”) or company-wide policies (e.g., “Never commit AWS credentials”).

2. Project Steering (<project>/.kiro/steering/*.md) Applied only to the current project. Use for repo-specific rules (e.g., “Follow Rails conventions” or “Use Tailwind for styling”).

3. Agent Steering (defined in agent config) Embedded directly in custom agent JSON files. Use for role-specific constraints (e.g., security auditor: “Never suggest write operations”).

File Structure Best Practices:

.kiro/steering/
├── tech.md          # Technical constraints (languages, frameworks)
├── product.md       # Business logic rules
├── style.md         # Code style and formatting
├── security.md      # Security policies
└── api-standards.md # API design principles

Mental Model:

┌────────────────────────────────────────────────────────┐
│          STEERING FILE LOADING HIERARCHY               │
├────────────────────────────────────────────────────────┤
│                                                        │
│  1. Global Steering (Personal/Company Standards)      │
│     ~/.kiro/steering/                                  │
│     ├── security.md      ←  "No hardcoded secrets"    │
│     └── typescript.md    ←  "Strict mode always"      │
│                   ↓                                    │
│  2. Project Steering (Repo-Specific Rules)             │
│     ./.kiro/steering/                                  │
│     ├── api-standards.md ←  "REST conventions"        │
│     └── testing.md       ←  "100% test coverage"      │
│                   ↓                                    │
│  3. Agent Steering (Role-Specific Constraints)         │
│     Agent config JSON                                  │
│     "Never write files" (for security-auditor)        │
│                   ↓                                    │
│           MERGED INTO CONTEXT                          │
│     All steering files concatenated and loaded         │
│     Kiro enforces ALL rules simultaneously            │
│                                                        │
└────────────────────────────────────────────────────────┘

How this fits on projects:

  • Project 2: Create hierarchical steering files
  • Project 5: Use /plan with steering constraints
  • Project 6: Build agents with embedded steering
  • Project 30: Analyze steering effectiveness and improve it

Minimal concrete example:

# Technical Constraints

## Language Rules
- Use TypeScript strict mode for all `.ts` files
- Never use `any` type—use `unknown` instead
- Prefer `const` over `let`, never use `var`

## Framework Constraints
- Use React functional components, not class components
- Use React Query for data fetching, not useEffect
- Use Zod for runtime validation

## Error Handling
- All async functions must have try/catch blocks
- Never swallow errors silently
- Log errors to structured logging service

Common misconceptions:

  • “Steering is just comments”: No, Kiro actively enforces these rules and will refuse violations
  • “More steering is better”: No, too many rules confuse the LLM—focus on critical constraints
  • “Steering replaces code review”: No, it prevents common mistakes but humans still validate logic

Check-your-understanding questions:

  1. If global steering says “use tabs” and project steering says “use spaces”, what happens?
  2. How would you write a steering rule that prevents SQL injection?
  3. When should steering go in global vs project vs agent config?

Check-your-understanding answers:

  1. Both load—Kiro sees conflicting rules and may ask for clarification or default to project (more specific)
  2. “Never construct SQL queries with string concatenation. Always use parameterized queries or an ORM.”
  3. Global: universal standards (security, style). Project: repo conventions. Agent: role restrictions (read-only, specific tools).

Real-world applications:

  • Startups: Enforce architectural decisions as team grows (e.g., “Use event sourcing for all domain events”)
  • Enterprises: Company-wide security policies (e.g., “All PII must be encrypted at rest”)
  • Open source: Contributor guidelines (e.g., “All PRs must include tests and update CHANGELOG.md”)

Where you’ll apply it:

  • Project 2: “The Steering Enforcer” - Build hierarchical steering system
  • Project 5: “The Plan Architect” - Use steering with planning agents
  • Project 18: “The Security Firewall Hook” - Enforce steering via hooks
  • Project 30: “The Recursive Prompt Improver” - Analyze and improve steering

References:

  • Kiro CLI Docs: Steering Files - https://kiro.dev/docs/cli/steering/
  • “The Pragmatic Programmer” by Hunt & Thomas - Ch. 2: “A Pragmatic Approach”
  • “Clean Architecture” by Robert C. Martin - Ch. 22: “The Clean Architecture”

Key insights: Steering encodes human judgment into executable constraints—the bridge between “describe intent” and “enforce standards.”

Summary: Steering files are Markdown documents with rules that Kiro enforces. They load hierarchically (global → project → agent) and function as “soft prompts” that constrain behavior. Use them to encode standards, policies, and architectural decisions that apply across your team or project.

Homework/Exercises:

  1. Create a security.md steering file with 5 rules (no secrets, parameterized SQL, etc.)
  2. Write a style.md that enforces your team’s code style
  3. Intentionally violate a steering rule and observe Kiro’s response
  4. Create an agent with steering that prevents all write operations
  5. Test steering conflict: global says “use tabs”, project says “use spaces”

Solutions:

  1. Example rule: “Never commit files containing ‘API_KEY=’ or ‘SECRET=’”
  2. Example: “Use Prettier with singleQuote: true and semi: false”
  3. Kiro will refuse or warn: “This violates steering rule in security.md”
  4. Agent config: {"allowedTools": ["read", "grep", "glob"]} + steering: “You are read-only”
  5. Kiro may ask for clarification or use project-specific rule (more granular wins)
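
For exercises 1 and 4, here is a minimal security.md sketch; the specific rules are illustrative, not an official template:

$ mkdir -p .kiro/steering && cat > .kiro/steering/security.md << 'EOF'
# Security Rules

- Never hardcode credentials; reject any change that introduces `API_KEY=`, `SECRET=`, or `PASSWORD=`
- Never construct SQL queries with string concatenation; use parameterized queries or an ORM
- Never log tokens, passwords, or other PII
- Validate all user input at runtime (for example with Zod)
- Never disable TLS verification or commit `.env` files
EOF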

Chapter 4: MCP (Model Context Protocol) - Connecting to External Systems

Fundamentals

Model Context Protocol (MCP) is Kiro’s plugin architecture—a standardized way to extend the AI with external capabilities without teaching it new skills directly. Instead of asking Kiro to “learn how to query Postgres,” you connect an MCP server that already knows how. Think of MCP servers as specialized translators: Kiro asks them questions in natural language, they execute the technical operation (SQL query, API call, cloud command), and return structured results Kiro can understand. This creates a clean separation: Kiro handles reasoning and orchestration, MCP servers handle domain-specific execution.

How This Fits on Projects

  • Project 6: Build your first MCP server connector for Postgres
  • Project 7: Create GitHub automation using the official GitHub MCP server
  • Project 9: Connect to AWS using the aws-docs MCP server
  • Project 13: Implement a custom MCP server in Python
  • Project 14: Build a filesystem MCP server in Node.js
  • Project 18: Integrate Docker operations via MCP

Definitions & Key Terms

  • MCP Server: A standalone process that provides tools (functions) and resources (data) to Kiro via the MCP protocol
  • Local Server: An MCP server running on your machine (command + args)
  • Remote Server: An MCP server accessible via HTTP/HTTPS (url + headers)
  • Tool: An exposed function that Kiro can call (e.g., query_database, list_issues)
  • Resource: Data the server provides (e.g., database schemas, file trees)
  • Transport: How Kiro communicates with the server (stdio for local, HTTP for remote)
  • Scope: Where the MCP configuration applies (workspace vs user-level)

Mental Model Diagram

┌──────────────────────────────────────────────────────────────┐
│                    KIRO CLI (Orchestrator)                   │
│  "Find all bugs in my GitHub repo assigned to me"            │
└────────────┬──────────────────────────┬──────────────────────┘
             │                          │
             │ MCP Protocol             │ MCP Protocol
             │ (JSON-RPC over stdio)    │ (JSON-RPC over HTTP)
             ▼                          ▼
   ┌─────────────────────┐    ┌─────────────────────┐
   │   GitHub MCP Server │    │  Postgres MCP Server│
   │   (Local - npx)     │    │   (Remote - URL)    │
   ├─────────────────────┤    ├─────────────────────┤
   │ Tools:              │    │ Tools:              │
   │ • list_issues       │    │ • query             │
   │ • create_pr         │    │ • list_tables       │
   │ • add_comment       │    │ • describe_schema   │
   │                     │    │                     │
   │ Resources:          │    │ Resources:          │
   │ • repo_tree         │    │ • table_schemas     │
   │ • open_issues       │    │ • active_queries    │
   └──────────┬──────────┘    └──────────┬──────────┘
              │                          │
              │ GitHub API               │ PostgreSQL Protocol
              ▼                          ▼
     ┌─────────────────┐        ┌─────────────────┐
     │  GitHub.com API │        │ Postgres DB     │
     └─────────────────┘        └─────────────────┘

Configuration Files:
• Workspace: .kiro/settings/mcp.json (project-specific servers)
• User:      ~/.kiro/settings/mcp.json (global servers)

Deep Dive

MCP solves a fundamental problem in AI tooling: How do you give an AI access to specialized systems without bloating its core capabilities? Before MCP, every new integration required custom code, tight coupling, and fragile APIs. MCP introduces a universal contract: servers expose tools (callable functions) and resources (queryable data), and clients (like Kiro) discover and invoke them dynamically.

The MCP Architecture:

  1. Server Process: Each MCP server is a separate process (Python script, Node.js app, compiled binary) that implements the MCP specification. The server exposes:
    • Tools: Functions Kiro can invoke (e.g., query_database(sql: string))
    • Resources: Data endpoints Kiro can read (e.g., database://schema/users)
    • Metadata: Descriptions, parameter schemas, error handling
  2. Transport Layer: Communication happens via JSON-RPC:
    • stdio transport: For local servers (Kiro spawns the process, communicates via stdin/stdout)
    • HTTP transport: For remote servers (Kiro makes HTTP requests to a URL)
  3. Discovery: When Kiro starts, it reads mcp.json config files (user-level and workspace-level), spawns configured servers, and calls their list_tools and list_resources methods to discover capabilities.

  4. Invocation: When you ask Kiro to “check database for users created today,” Kiro:
    • Recognizes this requires database access
    • Finds the Postgres MCP server’s query tool
    • Constructs a SQL query: SELECT * FROM users WHERE created_at::date = CURRENT_DATE
    • Invokes the tool via JSON-RPC: {"method": "tools/call", "params": {"name": "query", "arguments": {"sql": "..."}}}
    • Receives structured results: {"rows": [...], "columns": [...], "rowCount": 42}
    • Formats the response for you: “Found 42 users created today…”

Configuration Hierarchy:

MCP configurations follow Kiro’s three-tier system:

  • User-level (~/.kiro/settings/mcp.json): Global servers available in all projects (e.g., GitHub, AWS docs)
  • Workspace-level (.kiro/settings/mcp.json): Project-specific servers (e.g., local Postgres for this app)
  • Merge behavior: Workspace configs extend (not replace) user configs. If both define a server with the same name, workspace wins.

Security Model:

MCP servers run with the same permissions as Kiro itself. This means:

  • A Postgres MCP server can execute any SQL query Kiro requests (including DROP TABLE)
  • A filesystem MCP server can read/write any file Kiro asks for
  • Authentication happens via environment variables (${GITHUB_TOKEN}) to avoid hardcoding secrets
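
Because of this, give MCP servers the least-privileged credentials you can. The sketch below shows one mitigation for the Postgres case; it assumes a local database named myapp, and the role name and password handling are placeholders:

$ psql -d myapp << 'SQL'
CREATE ROLE kiro_readonly LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE myapp TO kiro_readonly;
GRANT USAGE ON SCHEMA public TO kiro_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO kiro_readonly;
SQL

# Point the Postgres MCP server at the read-only role rather than a superuser
$ export DATABASE_URL=postgres://kiro_readonly:change-me@localhost:5432/myapp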

Lifecycle:

  1. Startup: Kiro reads mcp.json, spawns local servers (via command + args), connects to remote servers (via URL)
  2. Discovery: Kiro queries each server for available tools and resources
  3. Runtime: As you chat, Kiro invokes tools when needed
  4. Shutdown: When you exit Kiro, local server processes are terminated

Example Configuration Breakdown:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "postgres": {
      "url": "https://mcp.example.com/postgres",
      "headers": {
        "Authorization": "Bearer ${DB_API_KEY}"
      }
    }
  }
}
  • github server (local):
    • command: npx: Uses npx to run the server
    • args: Passes -y (auto-install) and package name
    • env.GITHUB_TOKEN: References environment variable (never hardcode!)
    • Kiro spawns this as a child process on startup
  • postgres server (remote):
    • url: HTTP endpoint hosting the MCP server
    • headers.Authorization: Passes Bearer token from environment
    • Kiro makes HTTP POST requests to this URL

How It Works (Step-by-Step)

  1. Add MCP Server: Run kiro-cli mcp add --name github --command npx --args="-y @modelcontextprotocol/server-github" --scope workspace
  2. Configure Environment: Set export GITHUB_TOKEN=ghp_... in your shell
  3. Start Kiro: Run kiro-cli. Kiro reads .kiro/settings/mcp.json and spawns the GitHub server
  4. Discovery: Kiro calls the server’s list_tools method and learns about list_issues, create_pr, etc.
  5. User Query: You ask: “Show me all open bugs in my repo”
  6. Tool Selection: Kiro’s auto-router decides to use list_issues tool
  7. Invocation: Kiro sends JSON-RPC request: {"method": "tools/call", "params": {"name": "list_issues", "arguments": {"state": "open", "labels": ["bug"]}}}
  8. Execution: GitHub MCP server calls GitHub API, fetches issues, returns JSON
  9. Response: Kiro formats the result: “Found 8 open bugs: #42 (auth crash), #51 (timeout)…”

Minimal Concrete Example

1. Create a simple MCP server config (.kiro/settings/mcp.json):

{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}

2. Start Kiro and verify:

$ kiro-cli
> Show me the tools you have access to
I have access to the following MCP tools:
• fetch (from fetch server): Fetch a URL and return its content

> Fetch https://api.github.com/repos/torvalds/linux and show me the star count
Fetching https://api.github.com/repos/torvalds/linux...
The Linux repository has 182,456 stars.

Common Misconceptions

  1. “MCP servers are plugins that run inside Kiro”
    • Truth: MCP servers are separate processes. Kiro communicates with them via JSON-RPC, not by loading code into its runtime.
  2. “I need to write code for Kiro to understand new tools”
    • Truth: You configure MCP servers declaratively in mcp.json. Kiro discovers tools automatically via the list_tools RPC method.
  3. “MCP servers must be written in JavaScript”
    • Truth: Any language that can output JSON-RPC over stdout (Python, Go, Rust, etc.) can be an MCP server. The protocol is language-agnostic.
  4. “Environment variables in mcp.json are optional”
    • Truth: They’re critical for security. Never hardcode GITHUB_TOKEN or DATABASE_URL in config files. Use ${VAR} syntax.
  5. “Remote MCP servers are slower than local”
    • Truth: It depends. Remote servers add HTTP latency, but they can be hosted on powerful hardware. Local servers are faster for startup but limited by your machine.

Check-Your-Understanding Questions

  1. What are the two types of MCP server transports, and when would you use each?
  2. If you define a “github” MCP server in both ~/.kiro/settings/mcp.json and .kiro/settings/mcp.json, which one does Kiro use?
  3. How does Kiro discover what tools an MCP server provides?
  4. Why should you use ${GITHUB_TOKEN} instead of hardcoding your token in mcp.json?
  5. What happens to local MCP server processes when you exit Kiro?

Check-Your-Understanding Answers

  1. stdio (for local servers spawned by Kiro) and HTTP (for remote servers accessed via URL). Use stdio for servers you control locally (faster, no network). Use HTTP for centralized servers shared across teams or hosted on different machines.

  2. The workspace config (.kiro/settings/mcp.json) takes precedence. Kiro merges configs with workspace overriding user-level for conflicting keys.

  3. Kiro calls the server’s list_tools JSON-RPC method during startup. The server responds with metadata for each tool (name, description, parameter schema).

  4. Security: Hardcoding tokens exposes secrets if you commit the file to Git. Environment variables keep secrets out of version control. Also, ${VAR} syntax lets you use different tokens per environment (dev/staging/prod).

  5. Kiro terminates them. Local MCP servers are child processes spawned by Kiro. When Kiro exits, it sends SIGTERM to clean up.

Real-World Applications

  1. Database Operations: Connect Postgres/MySQL MCP servers to query production databases without writing SQL manually
  2. Cloud Automation: Use AWS/GCP MCP servers to provision infrastructure (“Create an S3 bucket for logs”)
  3. API Integration: GitHub MCP server for issue tracking, Slack MCP server for notifications, Jira for project management
  4. Custom Tools: Build domain-specific MCP servers (e.g., “deploy” tool that runs your CI/CD pipeline)
  5. Enterprise Systems: Connect to internal APIs (CRM, ERP) via custom MCP servers with SSO authentication

Where You’ll Apply It

  • Project 6: The Postgres Analyst - Connect to a local Postgres database via MCP, query schemas, and run analytics queries
  • Project 7: The GitHub Project Manager - Use the official GitHub MCP server to automate issue triage and PR workflows
  • Project 9: The AWS Cloud Architect - Leverage the AWS documentation MCP server to query best practices and provision resources
  • Project 13: The Custom Tool Builder (Python) - Implement an MCP server from scratch in Python that exposes custom tools
  • Project 14: The File System Guardian (Node.js) - Build a Node.js MCP server that provides safe filesystem operations with guardrails
  • Project 18: The Docker MCP Integration - Create an MCP server that wraps Docker CLI commands for container management

References

Key Insights

MCP turns Kiro from a chatbot into an orchestration platform: instead of teaching the AI new skills, you connect specialized workers (MCP servers) that already know their domain, and Kiro coordinates them via natural language.

Summary

Model Context Protocol (MCP) is Kiro’s standardized plugin system for connecting to external tools and data sources. Servers expose tools (callable functions) and resources (queryable data) via JSON-RPC, using either stdio transport (local servers) or HTTP transport (remote servers). Configuration happens in mcp.json files at user or workspace level, with environment variables for secrets. Kiro discovers server capabilities on startup and invokes tools dynamically during conversations, creating a clean separation between AI reasoning (Kiro) and domain execution (MCP servers).

Homework/Exercises to Practice the Concept

  1. Exercise 1: Add a Fetch MCP Server
    • Install the mcp-server-fetch server using uvx
    • Configure it in .kiro/settings/mcp.json
    • Ask Kiro to fetch https://api.github.com/users/torvalds and summarize the profile
  2. Exercise 2: Configure GitHub MCP with Authentication
    • Generate a GitHub personal access token (classic) with repo scope
    • Add GITHUB_TOKEN to your environment
    • Configure the @modelcontextprotocol/server-github MCP server
    • Ask Kiro to list all open issues in one of your repos
  3. Exercise 3: Workspace vs User MCP Configs
    • Add a “fetch” server to your user-level config (~/.kiro/settings/mcp.json)
    • Add a different “fetch” server (or disable it) in a workspace config (.kiro/settings/mcp.json)
    • Start Kiro and verify which configuration wins
  4. Exercise 4: Explore MCP Server Tools
    • Configure any MCP server from the server directory
    • Ask Kiro: “What tools does the [server_name] server provide?”
    • Test each tool with a realistic query
  5. Exercise 5: Debug an MCP Server Failure
    • Intentionally misconfigure an MCP server (wrong command, missing env var)
    • Start Kiro and observe the error messages
    • Fix the configuration and verify the server starts successfully

Solutions to the Homework/Exercises

Solution 1:

# No separate install step is needed: uvx downloads and runs the server on demand
# (running it by hand just waits on stdio; press Ctrl-C to exit)
$ uvx mcp-server-fetch

# Create/edit .kiro/settings/mcp.json
$ mkdir -p .kiro/settings
$ cat > .kiro/settings/mcp.json << 'EOF'
{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}
EOF

# Start Kiro and test
$ kiro-cli
> Fetch https://api.github.com/users/torvalds and tell me about this user
Fetching https://api.github.com/users/torvalds...

Linus Torvalds:
- Username: torvalds
- Name: Linus Torvalds
- Public repos: 6
- Followers: 215,000+
- Bio: Creator of Linux and Git

Solution 2:

# Generate token at https://github.com/settings/tokens (classic, repo scope)
$ export GITHUB_TOKEN=ghp_yourTokenHere

# Rewrite the workspace config so it now contains both servers
$ cat > .kiro/settings/mcp.json << 'EOF'
{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
EOF

# Start Kiro
$ kiro-cli
> List all open issues in my learning_journey_c repo
Querying GitHub for open issues in learning_journey_c...

Found 3 open issues:
1. #42: "Add DNS deep dive projects" (created 2 days ago)
2. #51: "Expand ARM assembly guide" (created 5 days ago)
3. #58: "Create Rust concurrency chapter" (created 1 week ago)

Solution 3:

# User-level config (~/.kiro/settings/mcp.json):
{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    }
  }
}

# Workspace config (.kiro/settings/mcp.json):
{
  "mcpServers": {
    "fetch": {
      "disabled": true
    }
  }
}

# Result: The workspace config wins, fetch server is disabled in this project
# Kiro will report: "MCP server 'fetch' is disabled in workspace configuration"

Solution 4:

# Example with SQLite MCP server
$ cat > .kiro/settings/mcp.json << 'EOF'
{
  "mcpServers": {
    "sqlite": {
      "command": "uvx",
      "args": ["mcp-server-sqlite", "--db-path", "./test.db"]
    }
  }
}
EOF

$ kiro-cli
> What tools does the sqlite server provide?
The sqlite MCP server provides these tools:
• query: Execute SQL queries on the database
• create_table: Create a new table with specified schema
• list_tables: Show all tables in the database
• describe_table: Show schema for a specific table

> Create a table called "users" with columns id, name, and email
Creating table "users"...
Table created successfully with columns: id (INTEGER PRIMARY KEY), name (TEXT), email (TEXT)

Solution 5:

# Broken config (missing GITHUB_TOKEN environment variable)
$ cat > .kiro/settings/mcp.json << 'EOF'
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
EOF

$ kiro-cli
Error starting MCP server 'github': Environment variable GITHUB_TOKEN not found

# Fix: Set the environment variable
$ export GITHUB_TOKEN=ghp_yourTokenHere

$ kiro-cli
Starting Kiro CLI...
✓ MCP server 'github' started successfully
>

Chapter 5: Hooks and Guardrails - The Nervous System

Fundamentals

Hooks are Kiro’s event-driven automation system—shell scripts that execute at specific moments in the agent’s workflow (before user prompts, before tool use, after tool use, on agent spawn). They act as the “nervous system,” reacting to events without you explicitly calling them. Instead of telling Kiro “run this command before every file write,” you configure a preToolUse hook with a matcher for the write tool, and it fires automatically every time. Hooks enable guardrails (prevent dangerous operations), auditing (log every command), auto-formatting (run Prettier after writes), and dynamic context injection (add git status before each prompt).

How This Fits on Projects

  • Project 8: Build a type-safe pre-commit hook system in TypeScript with Bun
  • Project 17: Create a security hook that validates code before execution
  • Project 18: Implement a guardrail hook that prevents destructive operations
  • Project 20: Build a Git context injector hook that adds repo state to every conversation
  • Project 22: Create a test-generator hook that auto-generates tests after writes
  • Project 24: Implement a secret sanitizer hook that scans for exposed credentials

Definitions & Key Terms

  • Hook: A shell command triggered by Kiro events (agentSpawn, userPromptSubmit, preToolUse, postToolUse)
  • agentSpawn: Fires once when the agent initializes (setup tasks, environment checks)
  • userPromptSubmit: Fires when you submit a message (inject context, run pre-flight checks)
  • preToolUse: Fires before a tool executes (validation, blocking dangerous operations)
  • postToolUse: Fires after a tool executes (auto-formatting, cleanup, notifications)
  • Matcher: Pattern to filter which tools trigger preToolUse/postToolUse hooks
  • Blocking Hook: A preToolUse hook that returns exit code 2 to prevent tool execution
  • Passthrough Hook: A hook that modifies stdin/stdout but allows execution (exit code 0)
  • Scope: Where the hook is configured (global, project, agent-specific)

Mental Model Diagram

┌───────────────────────────────────────────────────────┐
│           KIRO HOOK LIFECYCLE (Event-Driven)          │
└───────────────────────────────────────────────────────┘

1. Agent Startup
   ↓
   agentSpawn Hook(s) ────→ Setup environment, load data
   ↓
2. User Types Message: "Write auth.ts file"
   ↓
   userPromptSubmit Hook(s) ──→ Inject git status, check branch
   ↓
3. Kiro Decides: "Use Write tool for auth.ts"
   ↓
   preToolUse Hook(s) ────────→ Validate (check for secrets)
   ├─ Exit 0: ✅ Allow        │ - Scan content for API_KEY
   └─ Exit 2: ❌ Block        │ - Return error to Kiro
   ↓
4. Write Tool Executes: Create auth.ts
   ↓
   postToolUse Hook(s) ────────→ Auto-format, run linter
   ↓                             - prettier --write auth.ts
5. Return Result to User         - eslint --fix auth.ts
   ↓
6. User Types Next Message...
   ↓
   (Loop back to step 2)

Hook Configuration:
┌────────────────────────────────────────────────┐
│ .kiro/settings/agent-config.json               │
├────────────────────────────────────────────────┤
│ {                                              │
│   "hooks": {                                   │
│     "preToolUse": [                            │
│       {                                        │
│         "matcher": "write",                    │
│         "command": "scan_for_secrets.sh"       │
│       }                                        │
│     ],                                         │
│     "postToolUse": [                           │
│       {                                        │
│         "matcher": "write",                    │
│         "command": "prettier --write \"$FILE\"" │
│       }                                        │
│     ]                                          │
│   }                                            │
│ }                                              │
└────────────────────────────────────────────────┘

Deep Dive

Hooks solve the automation gap between “Kiro does what I ask” and “Kiro respects my team’s policies automatically.” Without hooks, you’d need to remind Kiro every time: “Before you write code, check for secrets. After you write code, run the formatter.” Hooks make these rules automatic.

The Four Hook Types:

  1. agentSpawn (Initialization Hook)
    • When: Once, when the agent starts
    • Use Cases:
      • Load environment variables from .env
      • Check if required tools are installed (docker, kubectl, etc.)
      • Verify git branch (“Only work on feature/* branches”)
      • Initialize workspace state (create temp directories, start services)
    • Exit Behavior: Exit code ignored (non-blocking)
    • Example: "command": "[ $(git branch --show-current) = main ] && echo 'ERROR: Do not work on main branch' >&2 || true"
  2. userPromptSubmit (Context Injection Hook)
    • When: Every time you submit a message
    • Use Cases:
      • Inject git status into conversation: git status --short
      • Add current branch: echo "Current branch: $(git branch --show-current)"
      • Show failing tests: npm test 2>&1 | grep FAIL
      • Add TODO comments: rg "TODO|FIXME" -n
    • Exit Behavior: Exit code ignored (non-blocking)
    • stdin/stdout: Hook output is appended to your message as context
    • Example: "command": "git diff --stat"
  3. preToolUse (Validation/Blocking Hook)
    • When: Before Kiro executes a tool (write, edit, bash, etc.)
    • Use Cases:
      • Block dangerous operations: Prevent rm -rf in bash commands
      • Validate file writes: Scan for hardcoded secrets (API keys, passwords)
      • Enforce patterns: Reject writes that don’t match code style
      • Audit logging: Log every tool invocation to /tmp/audit.log
    • Exit Behavior:
      • Exit 0: Tool executes normally
      • Exit 2: Tool is blocked, stderr is shown to Kiro as an error
    • stdin: Receives tool parameters (e.g., file path, command text)
    • Example: "command": "grep -q 'API_KEY' && exit 2 || exit 0" (block if API_KEY found)
  4. postToolUse (Cleanup/Formatting Hook)
    • When: After Kiro executes a tool successfully
    • Use Cases:
      • Auto-format code: Run prettier, rustfmt, black after writes
      • Run linters: eslint --fix, cargo clippy
      • Generate docs: Update API docs after code changes
      • Trigger builds: Run npm run build after dependency changes
      • Notify teams: Send Slack message after deployments
    • Exit Behavior: Exit code ignored (tool already executed)
    • stdin: Receives tool result (e.g., file content after write)
    • Example: "command": "prettier --write \"$FILE\"

Matcher Patterns:

For preToolUse and postToolUse, you can filter which tools trigger the hook. Each line below shows an alternative matcher value (a real hook entry contains exactly one matcher key); a complete, valid entry is sketched below:

{
  "matcher": "write",        // Exact match (only "write" tool)
  "matcher": "write|edit",   // Multiple tools (regex OR)
  "matcher": ".*",           // All tools (wildcard)
  "matcher": "bash"          // Only bash commands
}

Configuration Hierarchy:

Hooks follow Kiro’s three-tier system:

  • Global (~/.kiro/settings/agent-config.json): Hooks that apply to all projects
  • Project (.kiro/settings/agent-config.json): Project-specific hooks
  • Agent (.kiro/agents/my-agent/config.json): Agent-specific hooks

Merge behavior: All hooks accumulate (global + project + agent). They execute in order: global → project → agent.

Security Implications:

  • Hooks run with your shell’s permissions (same as Kiro)
  • A malicious preToolUse hook can read all tool parameters (including file contents)
  • A malicious postToolUse hook can modify tool results before Kiro sees them
  • Best practice: Keep hooks in version-controlled project repos, review changes carefully

Hook Execution Model:

  1. Synchronous: Kiro blocks until the hook completes
  2. Timeout: Hooks have a default 30-second timeout (configurable)
  3. Environment: Hooks inherit Kiro’s environment variables
  4. Working Directory: Hooks execute in the project root
  5. stdin/stdout/stderr:
    • preToolUse: stdin = tool parameters, stdout = modified parameters (if exit 0), stderr = error message (if exit 2); see the passthrough sketch after this list
    • postToolUse: stdin = tool result, stdout = logged (not returned to Kiro), stderr = logged as warning
    • userPromptSubmit: stdout = appended to user message, stderr = logged as warning
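
Here is a sketch of the passthrough behavior for preToolUse, assuming (as stated above) that the hook's stdout replaces the original tool parameters when it exits 0:

#!/bin/bash
# Passthrough preToolUse sketch for the bash tool.
# stdin:  the command Kiro wants to run
# stdout: the command that will actually run (because we exit 0)
CMD=$(cat)

# Illustrative modification: surface pipeline failures in every command.
printf 'set -o pipefail; %s' "$CMD"
exit 0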

How It Works (Step-by-Step)

  1. Configure Hook: Add to .kiro/settings/agent-config.json:
    {
      "hooks": {
        "preToolUse": [
          {
            "matcher": "bash",
            "command": "~/.kiro/scripts/audit-bash.sh"
          }
        ]
      }
    }
    
  2. Create Hook Script (~/.kiro/scripts/audit-bash.sh):
    #!/bin/bash
    # Read bash command from stdin
    CMD=$(cat)
    
    # Log to audit file
    echo "$(date): $CMD" >> ~/.kiro/bash-audit.log
    
    # Block dangerous commands
    if echo "$CMD" | grep -qE "rm -rf|sudo|shutdown"; then
      echo "ERROR: Dangerous command blocked" >&2
      exit 2
    fi
    
    # Allow safe commands
    exit 0
    
  3. Make Executable: chmod +x ~/.kiro/scripts/audit-bash.sh

  4. Test: Ask Kiro to run ls -la (should log and execute), then ask it to run rm -rf /tmp/test (should block and show error)

Minimal Concrete Example

1. Auto-format code after every write:

{
  "hooks": {
    "postToolUse": [
      {
        "matcher": "write",
        "command": "prettier --write \"${FILE}\""
      }
    ]
  }
}

2. Inject git status before every user message:

{
  "hooks": {
    "userPromptSubmit": [
      {
        "command": "echo '\n\n--- Git Status ---' && git status --short"
      }
    ]
  }
}

3. Block file writes containing “TODO”:

{
  "hooks": {
    "preToolUse": [
      {
        "matcher": "write",
        "command": "grep -q 'TODO' && echo 'ERROR: Remove TODOs before writing' >&2 && exit 2 || exit 0"
      }
    ]
  }
}
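
All three snippets can coexist in one file. A combined .kiro/settings/agent-config.json is simply their merge (a sketch, reusing the same $FILE convention as above):

{
  "hooks": {
    "userPromptSubmit": [
      {
        "command": "echo '\n\n--- Git Status ---' && git status --short"
      }
    ],
    "preToolUse": [
      {
        "matcher": "write",
        "command": "grep -q 'TODO' && echo 'ERROR: Remove TODOs before writing' >&2 && exit 2 || exit 0"
      }
    ],
    "postToolUse": [
      {
        "matcher": "write",
        "command": "prettier --write \"${FILE}\""
      }
    ]
  }
}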

Common Misconceptions

  1. “Hooks are called manually by Kiro”
    • Truth: Hooks are event-driven. Kiro triggers them automatically based on events (prompt submit, tool use). You never “call” a hook.
  2. “preToolUse hooks can modify the tool’s behavior”
    • Truth: They can only block (exit 2) or allow (exit 0). They can’t change what the tool does. (Exception: stdin passthrough can modify parameters, but this is advanced.)
  3. “postToolUse hooks run before the tool”
    • Truth: They run after the tool completes successfully. Use preToolUse for pre-execution validation.
  4. “Hooks must be written in Bash”
    • Truth: Any executable works (Python, Node.js, compiled binaries). The command just needs to be a valid shell command.
  5. “Blocking a preToolUse hook stops the entire conversation”
    • Truth: It stops that specific tool invocation. Kiro receives the error message and can try a different approach or ask you for guidance.

Check-Your-Understanding Questions

  1. What’s the difference between preToolUse and postToolUse hooks?
  2. If a preToolUse hook exits with code 2, what happens?
  3. When does the userPromptSubmit hook fire, and what is its output used for?
  4. Can you use preToolUse to auto-format code? Why or why not?
  5. If you define the same hook in both global and project configs, which one runs?

Check-Your-Understanding Answers

  1. preToolUse fires before a tool executes and can block it (exit 2). postToolUse fires after a tool executes and cannot block it (tool already ran). Use pre for validation, post for cleanup/formatting.

  2. The tool is blocked from executing. Kiro receives the hook’s stderr as an error message and must choose a different action (ask you, try a different tool, etc.).

  3. userPromptSubmit fires when you submit a message (before Kiro processes it). Its stdout is appended to your message as context, allowing you to inject dynamic information (git status, test results, etc.).

  4. No. preToolUse runs before the tool, so there’s no code to format yet. Use postToolUse with a write|edit matcher to format code after it’s written.

  5. Both run, in order: global first, then project. Hooks accumulate across configuration tiers. If you want project-specific behavior, you can use conditionals inside the hook script.

Real-World Applications

  1. Security Scanning: preToolUse hook that scans file writes for hardcoded secrets (API keys, passwords) and blocks the write with exit code 2
  2. Code Quality: postToolUse hook that runs linters (eslint --fix, cargo clippy) after every code write
  3. Compliance Auditing: Log every bash command to an audit file with preToolUse (exit 0 to allow, but log first)
  4. Context Enrichment: userPromptSubmit hook that adds git diff, test status, and current branch to every conversation
  5. CI/CD Integration: postToolUse hook that triggers builds or deployments after dependency changes
  6. Policy Enforcement: preToolUse hook that prevents Kiro from editing production config files or running destructive commands

Where You’ll Apply It

  • Project 8: The Type-Safe Hook with Bun - Build a pre-commit hook system in TypeScript using Bun’s fast runtime
  • Project 17: The Security Firewall Hook - Create a preToolUse hook that validates all operations against security policies
  • Project 18: The Secret Sanitizer Hook - Implement a hook that scans for exposed credentials before writes
  • Project 20: The Git Context Injector - Build a userPromptSubmit hook that adds rich git context to every conversation
  • Project 22: The Test Generator Hook - Create a postToolUse hook that auto-generates test files after code writes
  • Project 24: The Auto-Fixer Loop - Combine hooks to create a feedback loop (write → lint → auto-fix → verify)

References

  • Official Documentation: Hooks - CLI - Kiro (2025)
  • Agent Configuration Reference: Agent Configuration - CLI - Kiro (2025)
  • Settings Reference: Settings - CLI - Kiro (2025)
  • Blog Post: Introducing Kiro Powers (2025) - discusses hooks as part of Powers
  • Book: “Shell Scripting: Expert Recipes for Linux, Bash, and More” by Steve Parker - Advanced shell scripting techniques for hooks
  • Book: “Wicked Cool Shell Scripts” by Dave Taylor - Practical shell script patterns applicable to hooks

Key Insights

Hooks turn Kiro from a reactive assistant into a proactive enforcer of team policies, executing guardrails, formatters, and context injectors automatically without manual prompting—the difference between “remember to run the linter” and “the linter always runs.”

Summary

Hooks are event-driven shell scripts that execute at specific moments in Kiro’s workflow: agentSpawn (initialization), userPromptSubmit (before processing user messages), preToolUse (before tool execution, can block), and postToolUse (after tool execution). They enable guardrails, auditing, auto-formatting, and dynamic context injection. Configured in agent-config.json at global, project, or agent scope, hooks accumulate across tiers and execute synchronously with access to stdin/stdout/stderr for data flow.

Homework/Exercises to Practice the Concept

  1. Exercise 1: Context Injection with userPromptSubmit
    • Create a userPromptSubmit hook that adds git status to every message
    • Test by asking Kiro to describe the current state of your repo
    • Verify the git status appears in the conversation context
  2. Exercise 2: Auto-Format with postToolUse
    • Install Prettier (npm install -g prettier)
    • Create a postToolUse hook with matcher "write" that runs prettier --write on the file
    • Ask Kiro to write a poorly-formatted JS file
    • Verify Prettier auto-formats it
  3. Exercise 3: Block Dangerous Commands with preToolUse
    • Create a preToolUse hook with matcher "bash" that blocks commands containing rm -rf
    • Test by asking Kiro to “remove the /tmp/test directory with rm -rf”
    • Verify the hook blocks it and returns an error to Kiro
  4. Exercise 4: Audit Logging
    • Create a preToolUse hook that logs all bash commands to ~/.kiro/audit.log (but doesn’t block)
    • Run several Kiro commands (bash, write, etc.)
    • Verify the audit log contains all bash commands with timestamps
  5. Exercise 5: Hook Execution Order
    • Add the same userPromptSubmit hook to both global and project configs (with different echo messages)
    • Submit a message to Kiro
    • Verify both hooks run and observe the execution order (global → project)

Solutions to the Homework/Exercises

Solution 1:

# Create/edit .kiro/settings/agent-config.json
$ cat > .kiro/settings/agent-config.json << 'EOF'
{
  "hooks": {
    "userPromptSubmit": [
      {
        "command": "echo '\n\n--- Current Git Status ---' && git status --short"
      }
    ]
  }
}
EOF

# Start Kiro and test
$ kiro-cli
> What files have I changed?

# Kiro sees your message PLUS the git status output:
# "What files have I changed?
#
# --- Current Git Status ---
#  M src/auth.ts
#  M src/utils.ts
# ?? src/new-file.ts"

# Kiro responds: "Based on the git status, you've modified auth.ts and utils.ts, and added a new untracked file new-file.ts."

Solution 2:

# Install Prettier
$ npm install -g prettier

# Create hook configuration
$ cat > .kiro/settings/agent-config.json << 'EOF'
{
  "hooks": {
    "postToolUse": [
      {
        "matcher": "write",
        "command": "prettier --write '${FILE}' 2>&1"
      }
    ]
  }
}
EOF

# Test with Kiro
$ kiro-cli
> Write a file called test.js with this content:
> function foo(){return{a:1,b:2}}

# Kiro writes the file (poorly formatted)
# Hook triggers: prettier --write test.js
# File is automatically reformatted:
# function foo() {
#   return { a: 1, b: 2 };
# }

Solution 3:

# Create hook script
$ mkdir -p ~/.kiro/scripts
$ cat > ~/.kiro/scripts/block-dangerous-rm.sh << 'EOF'
#!/bin/bash
CMD=$(cat)

if echo "$CMD" | grep -q "rm -rf"; then
  echo "ERROR: Dangerous command 'rm -rf' is blocked by security policy" >&2
  exit 2
fi

exit 0
EOF

$ chmod +x ~/.kiro/scripts/block-dangerous-rm.sh

# Configure hook
$ cat > .kiro/settings/agent-config.json << 'EOF'
{
  "hooks": {
    "preToolUse": [
      {
        "matcher": "bash",
        "command": "~/.kiro/scripts/block-dangerous-rm.sh"
      }
    ]
  }
}
EOF

# Test
$ kiro-cli
> Remove the /tmp/test directory using rm -rf

# Kiro receives error from hook:
# "ERROR: Dangerous command 'rm -rf' is blocked by security policy"
# Kiro responds: "I cannot execute that command because it's blocked by your security policy. Would you like me to use a safer alternative like moving the directory to trash?"

Solution 4:

# Create audit hook
$ cat > ~/.kiro/scripts/audit-bash.sh << 'EOF'
#!/bin/bash
CMD=$(cat)
echo "$(date '+%Y-%m-%d %H:%M:%S') - BASH: $CMD" >> ~/.kiro/audit.log
cat <<< "$CMD"  # Pass through unmodified
exit 0
EOF

$ chmod +x ~/.kiro/scripts/audit-bash.sh

# Configure hook
$ cat > ~/.kiro/settings/agent-config.json << 'EOF'
{
  "hooks": {
    "preToolUse": [
      {
        "matcher": "bash",
        "command": "~/.kiro/scripts/audit-bash.sh"
      }
    ]
  }
}
EOF

# Test
$ kiro-cli
> List files in the current directory
> Check disk usage
> Show git log

# Verify audit log
$ cat ~/.kiro/audit.log
2025-01-02 14:32:15 - BASH: ls -la
2025-01-02 14:32:45 - BASH: df -h
2025-01-02 14:33:10 - BASH: git log --oneline -10

Solution 5:

# Global config (~/.kiro/settings/agent-config.json):
{
  "hooks": {
    "userPromptSubmit": [
      {
        "command": "echo '[GLOBAL HOOK] Current time:' && date"
      }
    ]
  }
}

# Project config (.kiro/settings/agent-config.json):
{
  "hooks": {
    "userPromptSubmit": [
      {
        "command": "echo '[PROJECT HOOK] Current branch:' && git branch --show-current"
      }
    ]
  }
}

# Start Kiro and submit a message
$ kiro-cli
> Hello

# Kiro sees:
# "Hello
#
# [GLOBAL HOOK] Current time:
# Thu Jan 2 14:35:22 PST 2025
#
# [PROJECT HOOK] Current branch:
# feature/hooks-demo"

# Both hooks run, global first, then project

Chapter 6: Planning and Subagents - Split Thinking From Doing

Fundamentals

Planning and subagents are Kiro’s separation-of-concerns mechanisms for complex workflows. Planning agents (accessed via Shift+Tab or /plan) are specialized agents that transform vague ideas into structured implementation plans without executing code—pure thinking, no doing. Subagents are autonomous worker processes that run tasks in parallel with isolated context windows, allowing Kiro to investigate multiple data sources simultaneously or delegate specialized work without polluting the main conversation context. Together, they enable Kiro to scale beyond the “one brain, one task” bottleneck: plan the work (planning agent), then split the execution across multiple workers (subagents).

How This Fits on Projects

  • Project 4: Build a subagent researcher that explores multiple data sources in parallel
  • Project 5: Create a plan architect that generates technical specifications from user stories
  • Project 10: Implement a subagent orchestrator for parallel task processing
  • Project 11: Build a planning agent workflow that separates design from implementation
  • Project 25: Create a code review workflow using multiple subagents (security, style, tests)
  • Project 29: Implement a background worker system using delegate subagents

Definitions & Key Terms

  • Planning Agent: A built-in specialized agent (Shift+Tab or /plan) that creates implementation plans without executing code
  • Subagent: An autonomous worker process with its own context window, spawned by the main agent
  • Parallel Execution: Running multiple subagents simultaneously (up to 10 concurrent tasks)
  • Context Isolation: Each subagent has its own 200K token context, separate from the main agent
  • Delegate: Background subagent that runs tasks asynchronously while you continue working
  • Context Gatherer: Built-in subagent for exploring projects and collecting information
  • General-Purpose Subagent: Built-in subagent for parallelizing arbitrary tasks
  • Custom Subagent: User-defined subagent with specialized configuration (tools, steering, MCP servers)
  • Plan Mode: Interactive workflow where planning agent asks clarifying questions before generating spec

Mental Model Diagram

┌────────────────────────────────────────────────────────────┐
│                  MAIN KIRO CONVERSATION                    │
│  Context: 58K / 200K tokens (chat history, files)         │
│  Focus: Coordinating work, making decisions               │
└─────────────────┬──────────────────────────────────────────┘
                  │
                  │ Spawns Subagents (Parallel Execution)
                  │
        ┌─────────┴─────────┬──────────────┬─────────────────┐
        │                   │              │                 │
        ▼                   ▼              ▼                 ▼
┌───────────────┐  ┌───────────────┐  ┌──────────────┐  ┌──────────────┐
│ Subagent 1    │  │ Subagent 2    │  │ Subagent 3   │  │ Planning     │
│ (Research)    │  │ (Code Review) │  │ (Testing)    │  │ Agent        │
├───────────────┤  ├───────────────┤  ├──────────────┤  ├──────────────┤
│ Context:      │  │ Context:      │  │ Context:     │  │ Context:     │
│ 12K / 200K    │  │ 8K / 200K     │  │ 6K / 200K    │  │ 15K / 200K   │
│               │  │               │  │              │  │              │
│ Task:         │  │ Task:         │  │ Task:        │  │ Task:        │
│ - Fetch GitHub│  │ - Scan for    │  │ - Run pytest │  │ - Draft tech │
│   issues      │  │   security    │  │ - Analyze    │  │   spec       │
│ - Summarize   │  │   vulns       │  │   coverage   │  │ - Create     │
│   priorities  │  │ - Check deps  │  │ - Report     │  │   subtasks   │
│               │  │               │  │   failures   │  │ (No Execute!)│
└───────┬───────┘  └───────┬───────┘  └──────┬───────┘  └──────┬───────┘
        │                  │                 │                 │
        │ Results          │ Results         │ Results         │ Plan
        └──────────────────┴─────────────────┴─────────────────┘
                                │
                  ┌─────────────┴──────────────┐
                  │   MAIN AGENT SYNTHESIZES    │
                  │   Combines all results      │
                  │   Makes final decision      │
                  └─────────────────────────────┘

Planning Agent Workflow (Shift+Tab):
1. User: "Build a feature for user authentication"
2. Plan Agent asks: "OAuth or custom? Session or JWT? Database?"
3. User answers clarifying questions
4. Plan Agent outputs: Markdown file with:
   - Technical spec
   - Step-by-step tasks
   - Files to create/modify
   - Testing strategy
5. User approves → Main agent executes plan

Deep Dive

As projects grow, two bottlenecks emerge: context limits (you can’t fit the entire codebase in 200K tokens) and serial execution (Kiro can’t investigate multiple things at once). Planning agents and subagents solve these problems by splitting work across specialized processes.

Planning Agents: Thinking Without Doing

The planning agent is a built-in specialized persona optimized for architectural thinking, not execution. When you press Shift+Tab or type /plan, Kiro switches to plan mode:

  1. Clarification Phase: The agent asks targeted questions to understand requirements (“Which database? Which authentication method? What’s the deployment target?”)
  2. Specification Phase: It drafts a technical spec as a structured Markdown document with clear sections (Overview, Architecture, Tasks, Testing, Deployment)
  3. Review Phase: You review the plan, make edits, and approve
  4. Handoff: The main agent receives the plan as context and executes it step-by-step

Why separate planning from execution?

  • Different mindsets: Planning requires breadth (consider all options, identify risks). Execution requires depth (focus on current task, handle edge cases).
  • Avoid premature optimization: Without a plan, agents tend to start coding immediately, leading to rework when requirements change.
  • Clear checkpoints: You approve the plan before code is written, catching misunderstandings early.

Example Planning Agent Output:

# Technical Specification: User Authentication

## Overview
Implement JWT-based authentication with refresh tokens.

## Architecture
- Auth middleware in `src/middleware/auth.ts`
- Token service in `src/services/token.ts`
- User model in `src/models/user.ts`

## Tasks
1. Install dependencies (`jsonwebtoken`, `bcrypt`)
2. Create User model with password hashing
3. Implement token service (generate, verify, refresh)
4. Add auth middleware to protect routes
5. Create login/logout endpoints

## Testing Strategy
- Unit tests for token service
- Integration tests for auth endpoints
- E2E tests for login/logout flow

## Security Considerations
- Store refresh tokens in HTTP-only cookies
- Use environment variables for JWT secret
- Hash passwords with bcrypt (12 rounds)

Subagents: Parallel Execution with Context Isolation

Subagents are separate Kiro processes spawned by the main agent. Each has:

  • Own context window: 200K tokens, isolated from main agent
  • Own tools: Access to read/write, bash, MCP servers (configurable)
  • Own configuration: Can use custom agent configs with specialized steering

Three Types of Subagents:

  1. Context Gatherer (built-in): Explores projects, reads files, summarizes information
    • Use case: “Gather all error handling patterns in the codebase”
    • Output: Summary of findings, returned to main agent
  2. General-Purpose (built-in): Executes arbitrary tasks in parallel
    • Use case: “Check GitHub issues, run tests, and fetch AWS logs—simultaneously”
    • Output: Results from all tasks, aggregated
  3. Custom Subagents (user-defined): Specialized workers with custom configurations
    • Example: Security scanner subagent (with mcp-server-semgrep MCP server + security steering)
    • Example: Performance profiler subagent (with flamegraph tools + profiling prompts)

How Subagents Work (Step-by-Step):

  1. Main agent decides a task benefits from parallelization or needs isolated context
  2. Spawns subagent(s) with specific instructions: [Subagent 1: Analyze auth.ts for security issues] [Subagent 2: Run tests and report failures]
  3. Each subagent works autonomously:
    • Reads files, runs tools, calls MCP servers
    • Context grows within its own 200K window (doesn’t affect main agent)
  4. Subagents complete and return results to main agent
  5. Main agent synthesizes results: “Subagent 1 found 3 security issues. Subagent 2 reports 2 failing tests.”
  6. Main agent presents combined findings to you

Parallelization Example:

Instead of serial execution (slow):

Main Agent:
1. Fetch GitHub issues (30 seconds)
2. Run tests (60 seconds)
3. Query database for metrics (20 seconds)
Total: 110 seconds

With subagents (fast):

Main Agent spawns 3 subagents:
├─ Subagent 1: Fetch GitHub issues (30 seconds)
├─ Subagent 2: Run tests (60 seconds)
└─ Subagent 3: Query database (20 seconds)

All run simultaneously. Total: 60 seconds (slowest task)
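
The wall-clock math is the same as running shell jobs in the background and waiting for all of them; the sketch below illustrates the idea (plain bash, not Kiro's actual scheduler, with durations scaled down from the example above):

#!/bin/bash
# Parallel total ~= slowest task, not the sum of all tasks.
fetch_issues() { sleep 3; echo "issues fetched"; }   # stands in for 30s
run_tests()    { sleep 6; echo "tests finished"; }   # stands in for 60s
query_db()     { sleep 2; echo "metrics queried"; }  # stands in for 20s

time {
  fetch_issues &
  run_tests &
  query_db &
  wait   # returns after the slowest job (~6s), not 3+6+2=11s
}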

Context Isolation Benefits:

  • Main agent stays focused: Doesn’t get polluted with subagent’s exploratory work
  • Deeper investigation: Subagent can read hundreds of files without hitting main agent’s context limit
  • Clean summarization: Subagent distills findings before returning to main agent

Configuration Hierarchy for Subagents:

Subagents can inherit configurations or use custom ones:

  • Inherit from main: Default behavior (same MCP servers, hooks, steering)
  • Custom config: Specify .kiro/agents/security-scanner/config.json with specialized tools/steering

How It Works (Step-by-Step)

Planning Agent Workflow:

  1. Trigger: Press Shift+Tab or type /plan
  2. Context Injection: Kiro reads project files, git status, existing docs
  3. Clarification: “What authentication method? What database? What’s the deployment target?”
  4. User Interaction: You answer questions via chat
  5. Draft Plan: Planning agent generates a Markdown spec with sections (Overview, Architecture, Tasks, Testing)
  6. Review & Approve: You edit the plan or approve it
  7. Handoff: Main agent receives the plan and begins execution

Subagent Workflow:

  1. Spawn: Main agent decides to delegate a task and spawns subagent with instructions
  2. Isolation: Subagent gets own context window (200K tokens, empty at start)
  3. Execution: Subagent reads files, runs tools, calls MCP servers autonomously
  4. Completion: Subagent returns results (summary, findings, errors)
  5. Synthesis: Main agent integrates results and continues

Minimal Concrete Example

1. Use Planning Agent:

$ kiro-cli
> I need to add user authentication to my app

# Press Shift+Tab to enter plan mode
[Plan Mode Activated]

Plan Agent: "To design the authentication system, I need to clarify a few things:
1. Which authentication method? (OAuth, JWT, Session-based)
2. Which database are you using?
3. Do you need social login (Google, GitHub)?
4. What's your deployment environment?"

> JWT, PostgreSQL, no social login, Docker

Plan Agent: "Great. I'll draft a technical spec for JWT authentication with PostgreSQL."

[Generates 50-line Markdown spec with architecture, tasks, and testing strategy]

> Looks good, proceed

[Main agent takes over, executes plan step-by-step]

2. Spawn Subagents for Parallel Tasks:

$ kiro-cli
> Investigate why the build is failing. Check:
> 1. GitHub Actions logs
> 2. Local test failures
> 3. Dependency conflicts

Main Agent: "I'll spawn 3 subagents to investigate in parallel."

[Spawns Subagent 1: Fetch GitHub Actions logs via gh CLI]
[Spawns Subagent 2: Run tests locally and analyze failures]
[Spawns Subagent 3: Check package.json for dependency conflicts]

[All run simultaneously, ~30 seconds]

Subagent 1 Result: "GitHub Actions failed on 'npm install' step. Error: ENOTFOUND registry.npmjs.org"
Subagent 2 Result: "Local tests pass. No failures."
Subagent 3 Result: "Found conflicting versions: react@17 (package.json) vs react@18 (package-lock.json)"

Main Agent: "The build is failing because npm install can't reach the registry (network issue), and there's a React version conflict. Fix the conflict first, then retry CI."

Common Misconceptions

  1. “Planning agents execute code”
    • Truth: Planning agents only think and write specs. They don’t call tools like write, bash, or edit. Their output is a Markdown plan, which the main agent executes.
  2. “Subagents share context with the main agent”
    • Truth: Each subagent has an isolated 200K context window. They don’t see the main agent’s conversation history. This is intentional (keeps main context clean).
  3. “Subagents are slower than main agent execution”
    • Truth: For parallel tasks, subagents are faster (tasks run simultaneously). For serial tasks, there’s overhead from spawning processes.
  4. “You need to manually manage subagent lifecycles”
    • Truth: Kiro spawns and cleans up subagents automatically. You just ask for parallelized work, and Kiro handles the orchestration.
  5. “Planning mode is mandatory for complex tasks”
    • Truth: It’s optional but recommended. You can ask Kiro to build features directly, but planning first reduces rework and gives you control.

Check-Your-Understanding Questions

  1. What’s the difference between a planning agent and a subagent?
  2. How many subagents can Kiro run concurrently?
  3. If a subagent reads 50 files (100K tokens), does that count toward the main agent’s context limit?
  4. When should you use plan mode (Shift+Tab) vs. asking Kiro to implement directly?
  5. Can subagents access MCP servers configured in the main agent’s settings?

Check-Your-Understanding Answers

  1. Planning agent: Specialized for creating implementation specs (thinking, no code execution). Subagent: Autonomous worker for executing tasks in parallel with isolated context.

  2. Up to 10 concurrent subagents (per the 2025 Kiro documentation for the autonomous agent variant).

  3. No. Subagents have isolated context windows. The main agent only sees the subagent’s final result (summary), not every file it read.

  4. Use plan mode when requirements are unclear, the task is complex, or you want to review the approach before execution. Use direct implementation for well-defined, straightforward tasks.

  5. Yes, by default. Subagents inherit the main agent’s MCP servers unless you configure them with a custom agent config.

Real-World Applications

  1. Multi-Repository Refactoring: Spawn subagents to analyze different repos simultaneously, then synthesize changes in main agent
  2. Security Audits: Parallel subagents check code (static analysis), dependencies (vulnerability scan), and infrastructure (cloud config review)
  3. Data Pipeline Analysis: Subagents query different data sources (Postgres, S3, APIs) concurrently, main agent aggregates results
  4. CI/CD Debugging: Parallel investigation of logs (GitHub Actions), tests (local runner), and deployment (AWS CloudWatch)
  5. Feature Planning: Use planning agent to design complex features (e.g., payment integration) before writing code
  6. Code Review Workflows: Subagents perform style checks, security scans, and test coverage analysis in parallel

Where You’ll Apply It

  • Project 4: The Subagent Researcher - Build a system that spawns multiple subagents to research different topics and synthesize findings
  • Project 5: The Plan Architect - Create a workflow that uses planning agents to generate technical specs from user stories
  • Project 10: The Subagent Orchestrator - Implement a task queue that distributes work across subagents based on priority
  • Project 11: The Planning Agent Workflow - Design a structured development process with mandatory planning phase before implementation
  • Project 25: The Code Review Workflow - Use multiple subagents (security, style, tests, docs) to review pull requests in parallel
  • Project 29: The Delegate Background Worker - Build async workflows where subagents run long tasks (tests, builds) while you continue working

Key Insights

Planning agents separate “what to build” from “how to build it,” while subagents break serial execution into parallel workflows—together they transform Kiro from a single-threaded assistant into a multi-agent orchestrator that thinks before acting and works in parallel when possible.

Summary

Planning agents (Shift+Tab or /plan) create implementation specifications without executing code, focusing on architectural thinking and requirements clarification. Subagents are autonomous workers with isolated 200K-token context windows that execute tasks in parallel (up to 10 concurrent). Kiro offers three subagent types: context gatherer (exploration), general-purpose (parallelization), and custom (user-defined configurations). Subagents inherit main agent’s tools and MCP servers by default but can use specialized configs, enabling workflows that split thinking (planning) from doing (subagents) and serial execution from parallel processing.

Homework/Exercises to Practice the Concept

  1. Exercise 1: Use Planning Agent
    • Start Kiro and press Shift+Tab (or type /plan)
    • Ask the planning agent to design a REST API for a task management system
    • Observe the clarifying questions it asks
    • Review the generated technical spec
  2. Exercise 2: Parallel Data Collection with Subagents
    • Ask Kiro to “investigate these three GitHub repos simultaneously: torvalds/linux, nodejs/node, and rust-lang/rust”
    • Observe Kiro spawning subagents for parallel execution
    • Compare execution time to serial investigation
  3. Exercise 3: Context Isolation Verification
    • Spawn a subagent to read 20+ files in a large codebase
    • Check the main agent’s context usage (should not increase significantly)
    • Ask the main agent what it knows about those files (should only know the summary the subagent provided)
  4. Exercise 4: Custom Subagent Configuration
    • Create a custom agent config at .kiro/agents/security-scanner/config.json
    • Add steering rules for security scanning
    • Spawn a subagent using this config to scan code for vulnerabilities
  5. Exercise 5: Plan-Execute-Review Workflow
    • Use planning agent to design a feature (e.g., “user authentication”)
    • Approve the plan and let Kiro execute it
    • After completion, ask Kiro to review what it built against the original plan

Solutions to the Homework/Exercises

Solution 1:

$ kiro-cli

# Press Shift+Tab to enter plan mode
[Plan Mode Activated]

> Design a REST API for a task management system

Plan Agent: "To design the API, I need some clarifications:
1. Which database? (PostgreSQL, MongoDB, etc.)
2. Authentication method? (JWT, OAuth, Session)
3. What entities besides tasks? (Users, Projects, Tags?)
4. Required endpoints? (CRUD for tasks, filtering, search?)
5. Deployment target? (AWS, Vercel, Docker)"

> PostgreSQL, JWT, Users and Tasks, full CRUD plus search, Docker

Plan Agent: "Great. Here's the technical specification:"

---
# Technical Specification: Task Management REST API

## Overview
RESTful API for task management with JWT authentication and PostgreSQL.

## Database Schema
- Users: id, email, password_hash, created_at
- Tasks: id, user_id, title, description, status, due_date, created_at

## Endpoints
- POST /auth/register - Create user
- POST /auth/login - Get JWT token
- GET /tasks - List user's tasks (with filtering)
- POST /tasks - Create task
- GET /tasks/:id - Get task details
- PUT /tasks/:id - Update task
- DELETE /tasks/:id - Delete task

## Architecture
- Express.js server
- Prisma ORM for PostgreSQL
- JWT middleware for auth
- Docker Compose for PostgreSQL + API

## Implementation Tasks
1. Setup: Express + Prisma + PostgreSQL
2. Models: User and Task schemas
3. Auth: Register, login, JWT middleware
4. CRUD: Task endpoints with auth
5. Testing: Integration tests with supertest
6. Docker: Compose file for deployment
---

> Looks good, proceed

[Main agent begins executing the plan step-by-step]

Solution 2:

$ kiro-cli

> Investigate these three GitHub repos simultaneously: torvalds/linux, nodejs/node, and rust-lang/rust. For each, tell me the star count, primary language, and last commit date.

Main Agent: "I'll spawn 3 subagents to fetch this information in parallel."

[Spawning Subagent 1 for torvalds/linux...]
[Spawning Subagent 2 for nodejs/node...]
[Spawning Subagent 3 for rust-lang/rust...]

[All subagents run ~10 seconds concurrently]

Subagent 1 Result: "torvalds/linux: 182,456 stars, C (98%), last commit 2 hours ago"
Subagent 2 Result: "nodejs/node: 106,234 stars, JavaScript (68%), last commit 5 hours ago"
Subagent 3 Result: "rust-lang/rust: 97,845 stars, Rust (95%), last commit 1 hour ago"

Main Agent:
"Here's the summary:
1. **Linux Kernel** (torvalds/linux): 182K stars, C, active (2h ago)
2. **Node.js** (nodejs/node): 106K stars, JavaScript, active (5h ago)
3. **Rust** (rust-lang/rust): 98K stars, Rust, active (1h ago)

All three are highly active projects with commits in the last few hours."

# Without subagents (serial), this would take ~30 seconds (3 x 10s)
# With subagents (parallel), it took ~10 seconds (slowest task)

Solution 3:

$ kiro-cli

> Spawn a subagent to read all files in the src/ directory and summarize the code architecture. Don't add those files to my context.

Main Agent: "I'll spawn a context gatherer subagent to explore src/."

[Spawning Subagent: Context Gatherer]

Subagent (working in isolated context):
- Reading src/index.ts (8K tokens)
- Reading src/auth.ts (6K tokens)
- Reading src/database.ts (10K tokens)
- Reading src/utils.ts (4K tokens)
[... 15 more files, total 120K tokens in subagent context]

Subagent Result (returned to main agent):
"Architecture summary (2K tokens):
- Entry point: src/index.ts (Express server setup)
- Auth: src/auth.ts (JWT middleware, bcrypt hashing)
- Database: src/database.ts (Prisma client, connection pooling)
- Utils: src/utils.ts (validation, error handling)
- Structure: Layered architecture (routes → controllers → services → models)"

# Check main agent context
> What's in your context right now?

Main Agent: "My context contains:
- Our conversation history (15K tokens)
- The subagent's architecture summary (2K tokens)
- Total: 17K / 200K tokens

I don't have the full contents of those 20 files—only the summary."

Solution 4:

# Create custom security scanner agent config
$ mkdir -p .kiro/agents/security-scanner
$ cat > .kiro/agents/security-scanner/config.json << 'EOF'
{
  "name": "Security Scanner",
  "description": "Specialized agent for security vulnerability scanning",
  "allowedTools": ["read", "grep", "glob", "mcp"],
  "mcpServers": {
    "semgrep": {
      "command": "npx",
      "args": ["-y", "@semgrep/semgrep-mcp-server"]
    }
  },
  "steering": {
    "security.md": "# Security Scanning Rules\n\n- Scan for hardcoded secrets (API_KEY, PASSWORD)\n- Check for SQL injection vulnerabilities\n- Identify XSS risks\n- Flag insecure dependencies\n- Report findings with severity (HIGH, MEDIUM, LOW)"
  }
}
EOF

# Use Kiro to spawn security scanner subagent
$ kiro-cli

> Spawn a security scanner subagent using the config in .kiro/agents/security-scanner to scan all files in src/ for vulnerabilities

Main Agent: "Spawning security scanner subagent with custom configuration..."

[Subagent loads security.md steering + semgrep MCP server]
[Subagent scans all src/ files]

Subagent Result:
"Found 3 security issues:
HIGH: Hardcoded API key in src/config.ts (line 42)
MEDIUM: Potential SQL injection in src/database.ts (line 88)
LOW: Missing HTTPS enforcement in src/server.ts (line 12)"

Main Agent: "The security scan found 3 issues. Would you like me to fix them?"

Solution 5:

$ kiro-cli

# Step 1: Plan
> I want to add user authentication to my Express app. Use plan mode.

# Press Shift+Tab
[Plan Mode Activated]

Plan Agent: "Questions about authentication:
1. JWT or session-based?
2. Which database?
3. Need password reset functionality?"

> JWT, PostgreSQL, yes include password reset

Plan Agent: [Generates detailed spec with 8 sections]

# Step 2: Execute
> Looks good, implement it

Main Agent: [Executes plan over 10 minutes, creating 5 files, running tests]

# Step 3: Review
> Review what you built against the original plan. Did you implement everything?

Main Agent: "Comparing implementation to plan:
✅ JWT token service (implemented in src/auth/tokens.ts)
✅ User model with password hashing (src/models/user.ts)
✅ Login/logout endpoints (src/routes/auth.ts)
✅ Password reset flow (src/routes/reset-password.ts)
✅ Auth middleware (src/middleware/auth.ts)
✅ Tests (tests/auth.test.ts - 12 passing tests)

All planned features are implemented and tested."

Chapter 7: Models and Routing - Choose the Right Brain

Fundamentals

Kiro’s Auto Router is an intelligent model selection system that dynamically chooses the optimal Claude model for each task—balancing speed, cost, and quality without manual intervention. Instead of forcing you to decide “Should I use Haiku (fast/cheap) or Opus (powerful/expensive) for this task?”, Auto Router analyzes the complexity and picks the right model automatically. It’s like having a performance-aware compiler that optimizes execution strategy based on workload: simple file reads use Haiku (0.4x credit cost), complex refactorings use Sonnet 4.5 (1.3x cost), and architectural design uses Opus 4.5 (2.2x cost). Manual model selection (/model command) is available for specific use cases, but Auto (1x cost) is both smarter and more cost-efficient than always using Sonnet.

How This Fits on Projects

  • Project 2: Build a model analyzer to understand auto-router decisions
  • Project 3: Create a context window visualizer showing token budgets per model
  • Project 35: Implement a deep reasoner that explicitly uses Opus for algorithmic challenges
  • All Projects: Auto Router selects the optimal model for each task automatically

Definitions & Key Terms

  • Auto Router: Kiro’s default intelligent model selector (combines Haiku, Sonnet, Opus based on task complexity)
  • Claude Haiku 4.5: Fast, lightweight model (0.4x credits) for simple tasks (file reads, basic queries)
  • Claude Sonnet 4.5: Balanced model (1.3x credits) for standard development tasks (coding, refactoring)
  • Claude Opus 4.5: Powerful model (2.2x credits) for complex reasoning (architecture, algorithms, debugging)
  • Credit Multiplier: Cost factor relative to Auto mode (Auto = 1x baseline)
  • Model Switching: Manual model selection via /model command
  • Context Window: Token limit per model (200K for all Claude models as of 2025)
  • Experimental Support: Opus 4.5 availability (not supported for AWS IAM Identity Center users)

Mental Model Diagram

┌────────────────────────────────────────────────────────────┐
│                    KIRO AUTO ROUTER                        │
│  "Choose the right brain for each task automatically"     │
└────────────────────────────────────────────────────────────┘

User Request: "Refactor the auth system to use OAuth2"
                              │
                              ▼
                  ┌───────────────────────┐
                  │   AUTO ROUTER         │
                  │   Analyzes Task:      │
                  │   - Complexity        │
                  │   - Context Size      │
                  │   - User History      │
                  │   - Cost/Quality      │
                  └─────────┬─────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
        │                   │                   │
┌───────▼────────┐  ┌───────▼────────┐  ┌──────▼─────────┐
│  HAIKU 4.5     │  │  SONNET 4.5    │  │   OPUS 4.5     │
│  (0.4x cost)   │  │  (1.3x cost)   │  │   (2.2x cost)  │
├────────────────┤  ├────────────────┤  ├────────────────┤
│ Use Cases:     │  │ Use Cases:     │  │ Use Cases:     │
│ • File reads   │  │ • Code writing │  │ • Architecture │
│ • Simple edits │  │ • Refactoring  │  │ • Algorithms   │
│ • Quick Q&A    │  │ • Testing      │  │ • Debugging    │
│ • Navigation   │  │ • Standard dev │  │ • Planning     │
│                │  │                │  │                │
│ Context:       │  │ Context:       │  │ Context:       │
│ 200K tokens    │  │ 200K tokens    │  │ 200K tokens    │
│                │  │                │  │                │
│ Speed:         │  │ Speed:         │  │ Speed:         │
│ ⚡⚡⚡ Fastest  │  │ ⚡⚡ Fast      │  │ ⚡ Slower      │
└────────────────┘  └────────────────┘  └────────────────┘

Manual Selection (Override):
$ kiro-cli
> /model opus     # Force Opus for all tasks
> /model auto     # Return to Auto Router (recommended)

Cost Comparison for Same Workflow:
┌────────────────────────────────────────────────────────┐
│ Task: "Build user auth, run tests, fix 3 bugs"        │
├────────────────────────────────────────────────────────┤
│ Auto Router:    100 credits (1x)  ← Most efficient    │
│ Always Sonnet:  130 credits (1.3x) ← Overpaying       │
│ Always Opus:    220 credits (2.2x) ← Very expensive   │
│ Always Haiku:    40 credits (0.4x) ← Too weak         │
│                                    → Will fail tasks   │
└────────────────────────────────────────────────────────┘

Deep Dive

The Auto Router solves a fundamental trade-off in AI development: stronger models are better at complex tasks but cost more. Manually choosing the “right” model for every task is tedious and error-prone (you’d always pick Opus to be safe, wasting money on simple tasks). Auto Router applies machine learning techniques to predict task complexity and selects the optimal model dynamically.

How Auto Router Works:

  1. Task Classification: When you submit a request, the router analyzes:
    • Intent: What are you asking for? (read file vs. refactor architecture)
    • Context Size: How much code/data is involved?
    • Complexity Signals: Keywords like “design,” “optimize,” “explain deeply” suggest harder tasks
    • Historical Performance: Which model succeeded on similar tasks?
  2. Model Selection Heuristics (a conceptual sketch follows this list):
    • Haiku: Tasks with clear, procedural steps (file reads, simple edits, grep searches)
    • Sonnet: Standard coding tasks (write functions, refactor, add tests, implement features)
    • Opus: High-complexity reasoning (architectural decisions, algorithmic optimizations, debugging race conditions, explaining complex systems)
  3. Cost-Quality Optimization: Auto Router doesn’t just pick the cheapest model that can do the job—it balances:
    • Quality: Will the output be correct and maintainable?
    • Cost: Is a cheaper model sufficient?
    • Latency: For interactive workflows, speed matters
  4. Feedback Loop: Over time, the router learns:
    • If Sonnet fails on a task type, escalate to Opus
    • If Opus is overkill, try Sonnet first next time
    • Adapt to your workflow patterns
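
To make the heuristics tangible, here is a deliberately crude sketch of keyword-based routing. This is not Kiro's actual router logic, only an illustration of the complexity signals listed above:

#!/bin/bash
# Conceptual sketch only -- NOT Kiro's real router.
route_model() {
  local prompt="$1"
  if echo "$prompt" | grep -qiE 'design|architect|optimize|race condition|debug'; then
    echo "opus"     # high-complexity reasoning
  elif echo "$prompt" | grep -qiE 'write|refactor|implement|add tests'; then
    echo "sonnet"   # standard coding task
  else
    echo "haiku"    # simple reads, navigation, quick Q&A
  fi
}

route_model "Read the README.md file"                 # -> haiku
route_model "Refactor the auth system to use OAuth2"  # -> sonnet
route_model "Design a distributed rate limiter"       # -> opus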

Cost Economics:

Kiro uses a credit system where Auto = 1x baseline. Here’s the multiplier table:

Model        Credit Multiplier   When to Use
Auto         1x                  Default (recommended)
Haiku 4.5    0.4x                Simple, repetitive tasks (if you’re sure)
Sonnet 4.5   1.3x                When you need consistent quality
Opus 4.5     2.2x                Complex reasoning, research, architecture

Key Insight: Auto mode (1x) is cheaper than manually picking Sonnet (1.3x) for all tasks, because Auto uses Haiku for simple work and Opus only when necessary.

Example Cost Breakdown:

A typical development session:

1. Read 10 files (Haiku):        0.4x * 10 = 4 credits
2. Write auth middleware (Sonnet): 1.3x * 1 = 1.3 credits
3. Refactor API routes (Sonnet):   1.3x * 1 = 1.3 credits
4. Debug race condition (Opus):    2.2x * 1 = 2.2 credits
Total: 8.8 credits with Auto Router

Same session with always-Sonnet:
1. Read 10 files (Sonnet):       1.3x * 10 = 13 credits
2. Write auth (Sonnet):          1.3x * 1  = 1.3 credits
3. Refactor API (Sonnet):        1.3x * 1  = 1.3 credits
4. Debug race (Sonnet, fails):   1.3x * 1  = 1.3 credits
   → Retry with Opus:            2.2x * 1  = 2.2 credits
Total: 19.1 credits (2.2x more expensive, took longer!)
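
The arithmetic above is easy to reproduce. The sketch below recomputes the Auto Router total (8.8 credits) and the flat-rate alternatives for the same 13 tasks, using the multipliers from the table above; it does not model the failed-Sonnet retry included in the always-Sonnet figure:

#!/bin/bash
# Estimate session cost under different routing strategies.
HAIKU=0.4; SONNET=1.3; OPUS=2.2
simple=10    # Haiku-level tasks (file reads)
standard=2   # Sonnet-level tasks (auth middleware, API refactor)
complex=1    # Opus-level tasks (race-condition debugging)
total=$((simple + standard + complex))

auto=$(echo "$simple*$HAIKU + $standard*$SONNET + $complex*$OPUS" | bc)
always_sonnet=$(echo "$total*$SONNET" | bc)
always_opus=$(echo "$total*$OPUS" | bc)

echo "Auto Router:   $auto credits"          # 8.8
echo "Always Sonnet: $always_sonnet credits" # 16.9 (before any retries)
echo "Always Opus:   $always_opus credits"   # 28.6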

Manual Model Selection:

Use /model command when:

  • Experimentation: Testing how different models handle a specific task
  • Benchmarking: Comparing output quality across models
  • Forcing stronger reasoning: Override Auto for known-hard tasks
  • Cost control: Force Haiku when budget is tight and quality is acceptable
$ kiro-cli
> /model opus
Switched to Claude Opus 4.5 (2.2x credits)

> Design a distributed consensus algorithm

[Opus provides detailed explanation with trade-offs, edge cases, and references]

> /model auto
Switched back to Auto Router (1x credits)

Limitations and Trade-Offs:

  1. Opus 4.5 Availability: Not available for AWS IAM Identity Center users (as of 2025)
  2. Experimental Status: Opus 4.5 is in experimental support (may have stability issues)
  3. Routing Overhead: Auto Router adds ~100ms latency for task classification (negligible for most workflows)
  4. Non-determinism: Auto Router’s choices can vary based on load, user history, and internal heuristics

How It Works (Step-by-Step)

  1. Submit Request: You ask Kiro to “refactor the auth system”
  2. Router Analyzes: Scans request for complexity signals (“refactor” → medium complexity)
  3. Context Check: Measures current context (files loaded, conversation history)
  4. Model Selection: Chooses Sonnet 4.5 (balanced speed/quality for refactoring)
  5. Execution: Sonnet processes the request and returns the refactored code
  6. Monitoring: If Sonnet struggles (multiple retries, low confidence), router escalates to Opus

Minimal Concrete Example

1. Default Auto Router Behavior:

$ kiro-cli  # Auto Router is enabled by default

> Read the README.md file
[Auto Router selects: Haiku 4.5 (0.4x cost, simple task)]
# File contents displayed quickly

> Write a function to parse JSON with error handling
[Auto Router selects: Sonnet 4.5 (1.3x cost, standard coding task)]
# Function written with proper try/catch

> Explain the trade-offs between B-tree and LSM-tree for database indexing
[Auto Router selects: Opus 4.5 (2.2x cost, deep technical reasoning)]
# Detailed explanation with diagrams, performance characteristics, use cases

2. Manual Model Override:

$ kiro-cli

> /model haiku
Switched to Claude Haiku 4.5 (0.4x credits)

> Refactor this complex state machine
[Haiku struggles, produces low-quality output]

> /model opus
Switched to Claude Opus 4.5 (2.2x credits)

> Refactor this complex state machine
[Opus produces clean, well-architected solution]

> /model auto
Switched back to Auto Router (recommended)

Common Misconceptions

  1. “Always using Opus gives better results”
    • Truth: For simple tasks (file reads, basic edits), Opus is overkill and more than 5x the cost of the Haiku tier Auto Router would pick (2.2x vs 0.4x). Quality gains are marginal.
  2. “Auto Router picks the cheapest model”
    • Truth: It balances cost and quality. For complex tasks, it uses Opus even though it’s expensive, because cheaper models would fail.
  3. “You should manually switch models for each task”
    • Truth: Auto Router is smarter than manual selection because it learns from millions of tasks. Manual selection is for edge cases only.
  4. “Haiku can’t handle coding tasks”
    • Truth: Haiku excels at simple, well-defined coding tasks (implementing a known algorithm, adding a basic CRUD endpoint). It struggles with ambiguity and complex reasoning.
  5. “Auto Router costs the same as Sonnet”
    • Truth: Auto (1x) is cheaper than always-Sonnet (1.3x) because it delegates simple tasks to Haiku (0.4x).

Check-Your-Understanding Questions

  1. What are the credit multipliers for Haiku, Sonnet, Opus, and Auto?
  2. Why is Auto Router (1x) cheaper than always using Sonnet (1.3x)?
  3. When should you manually override the Auto Router?
  4. If a task fails with Haiku, what does Auto Router do?
  5. Which model has the largest context window?

Check-Your-Understanding Answers

  1. Haiku: 0.4x, Sonnet: 1.3x, Opus: 2.2x, Auto: 1x (baseline).

  2. Because Auto Router uses Haiku (0.4x) for simple tasks and Opus (2.2x) only when necessary. Always-Sonnet pays 1.3x even for trivial file reads that Haiku can handle.

  3. When experimenting, benchmarking, or forcing specific behavior (e.g., you know a task needs Opus-level reasoning and don’t want Auto to try Sonnet first).

  4. Auto Router escalates to a stronger model (Sonnet or Opus, depending on task complexity) and retries the task.

  5. All Claude models have 200K token context windows (Haiku, Sonnet, Opus all share this limit as of 2025).

Real-World Applications

  1. Cost-Optimized Workflows: Use Auto Router to minimize costs across large development sessions (e.g., refactoring 50 files)
  2. Research Tasks: Manually select Opus for deep technical research, academic explanations, or algorithm design
  3. Rapid Prototyping: Force Haiku for fast iteration on well-scoped features when budget is constrained
  4. Code Review: Let Auto Router use Sonnet for style/logic checks, Opus for security audits
  5. Documentation Generation: Auto Router uses Haiku for boilerplate docs, Sonnet for API references, Opus for architectural decision records

Where You’ll Apply It

  • Project 2: The Model Router Analyzer - Build a tool that logs Auto Router decisions and visualizes which model was used for each task
  • Project 3: The Context Window Visualizer - Create a dashboard showing token usage across models and context budget
  • Project 35: The Deep Reasoner - Explicitly force Opus for algorithmic challenges and compare results to Sonnet
  • All Projects: Trust Auto Router to select the optimal model, monitor costs via /usage (if available)

Key Insights

Auto Router transforms model selection from a manual decision (“Which brain do I need?”) into an automated optimization problem, reducing costs by using Haiku for simple tasks and reserving Opus for genuinely hard problems—achieving better quality-per-dollar than always picking Sonnet.

Summary

Kiro’s Auto Router (1x credit cost) intelligently selects between Claude Haiku 4.5 (0.4x, fast/cheap), Sonnet 4.5 (1.3x, balanced), and Opus 4.5 (2.2x, powerful) based on task complexity, context size, and historical performance. It’s more cost-efficient than manual selection because it uses Haiku for simple tasks (file reads, basic edits) and Opus only for complex reasoning (architecture, algorithms, debugging). Manual model switching via /model is available for experimentation or forcing specific behavior, but Auto mode is recommended for production workflows. All models share a 200K-token context window.

Homework/Exercises to Practice the Concept

  1. Exercise 1: Observe Auto Router Decisions
    • Start Kiro in default Auto mode
    • Submit 5 tasks of varying complexity (file read, code write, architecture question)
    • Try to predict which model Auto Router will use for each
    • Verify your predictions (note: Kiro doesn’t expose model choice directly, but you can infer from response quality/speed)
  2. Exercise 2: Compare Model Quality
    • Pick a moderately complex task (“Refactor this function to use async/await”)
    • Try it with /model haiku, then /model sonnet, then /model opus
    • Compare output quality, response time, and suitability
  3. Exercise 3: Cost Analysis
    • Track a full development session (30 minutes of coding)
    • Estimate how many tasks were simple (Haiku-level), standard (Sonnet-level), and complex (Opus-level)
    • Calculate hypothetical cost with Auto vs always-Sonnet vs always-Opus
  4. Exercise 4: Force Opus for Deep Reasoning
    • Switch to Opus (/model opus)
    • Ask a complex technical question: “Explain the CAP theorem and its implications for distributed database design”
    • Compare the depth of explanation to what Sonnet would provide
  5. Exercise 5: Haiku Speed Test
    • Switch to Haiku (/model haiku)
    • Ask it to read 10 files and summarize their purpose
    • Measure response time and quality
    • Compare to Auto Router on the same task

Solutions to the Homework/Exercises

Solution 1:

$ kiro-cli  # Auto Router enabled

> Read src/auth.ts
# Prediction: Haiku (simple file read)
# Observation: Fast response (~1 second), basic file listing
# Result: Likely Haiku ✓

> Write a JWT authentication middleware for Express
# Prediction: Sonnet (standard coding task)
# Observation: Medium response time (~5 seconds), well-structured code
# Result: Likely Sonnet ✓

> What are the security implications of using JWTs vs session cookies?
# Prediction: Sonnet or Opus (depends on depth requested)
# Observation: Detailed response covering XSS, CSRF, token expiration, refresh tokens
# Result: Likely Sonnet (good depth) or Opus (exceptional depth) ✓

> Design a distributed rate-limiting system that handles 100K requests/sec with global consistency
# Prediction: Opus (complex distributed systems design)
# Observation: Slow response (~15 seconds), detailed architecture with trade-offs, algorithms, and implementation strategies
# Result: Definitely Opus ✓

> List all files in the src/ directory
# Prediction: Haiku (trivial task)
# Observation: Instant response (<1 second), simple file list
# Result: Likely Haiku ✓

Solution 2:

$ kiro-cli

# Test 1: Haiku
> /model haiku
> Refactor this function to use async/await:
> function getData(id, callback) {
>   db.query('SELECT * FROM users WHERE id = ?', [id], (err, result) => {
>     if (err) return callback(err);
>     callback(null, result);
>   });
> }

Haiku Output:
async function getData(id) {
  const result = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  return result;
}
# Quality: Basic refactor, missing error handling, assumes promise-based db.query

# Test 2: Sonnet
> /model sonnet
> [Same prompt]

Sonnet Output:
async function getData(id) {
  try {
    const result = await db.query('SELECT * FROM users WHERE id = ?', [id]);
    return result;
  } catch (err) {
    throw new Error(`Failed to fetch user with id ${id}: ${err.message}`);
  }
}
# Quality: Proper error handling, clear error messages, good practices ✓

# Test 3: Opus
> /model opus
> [Same prompt]

Opus Output:
async function getData(id) {
  try {
    const result = await db.query('SELECT * FROM users WHERE id = ?', [id]);
    if (!result || result.length === 0) {
      throw new Error(`User not found for id: ${id}`);
    }
    return result[0]; // Return single user object
  } catch (err) {
    // Log error for debugging
    console.error(`Database error in getData(${id}):`, err);
    throw new Error(`Failed to fetch user with id ${id}: ${err.message}`);
  }
}
# Quality: Handles edge cases (no result), returns single object vs array, includes logging

Conclusion: Sonnet is the sweet spot for this task. Opus adds nice-to-haves but isn't necessary.

Solution 3:

# Development session (30 minutes, 20 tasks):

Simple tasks (Haiku-level, 0.4x): 10 tasks
- Read 5 files (0.4 * 5 = 2.0 credits)
- List directories (0.4 * 1 = 0.4 credits)
- Simple edits (0.4 * 4 = 1.6 credits)
Subtotal: 4.0 credits

Standard tasks (Sonnet-level, 1.3x): 8 tasks
- Write 3 functions (1.3 * 3 = 3.9 credits)
- Refactor 2 modules (1.3 * 2 = 2.6 credits)
- Add 3 test files (1.3 * 3 = 3.9 credits)
Subtotal: 10.4 credits

Complex tasks (Opus-level, 2.2x): 2 tasks
- Debug race condition (2.2 * 1 = 2.2 credits)
- Design API architecture (2.2 * 1 = 2.2 credits)
Subtotal: 4.4 credits

Total costs:
- Auto Router: 4.0 + 10.4 + 4.4 = 18.8 credits (1x equivalent)
- Always Sonnet: 1.3 * 20 = 26 credits (1.38x more expensive)
- Always Opus: 2.2 * 20 = 44 credits (2.34x more expensive)
- Always Haiku: 0.4 * 20 = 8 credits, but 2 tasks would fail → retry with Sonnet → 8 + (1.3 * 2) = 10.6 credits + wasted time

Solution 4:

$ kiro-cli

> /model opus
Switched to Claude Opus 4.5 (2.2x credits)

> Explain the CAP theorem and its implications for distributed database design. Include real-world examples.

Opus Response (excerpt):
"The CAP theorem, proven by Eric Brewer in 2000 and formalized by Gilbert and Lynch in 2002, states that a distributed system can provide at most two of these three guarantees simultaneously:

1. **Consistency (C)**: Every read receives the most recent write
2. **Availability (A)**: Every request receives a response (success or failure)
3. **Partition Tolerance (P)**: The system continues to operate despite network partitions

In practice, network partitions are inevitable in distributed systems (P is mandatory), so the real trade-off is between **CP** (consistency + partition tolerance) and **AP** (availability + partition tolerance).

**CP Systems (Consistency over Availability):**
- **Example**: HBase, MongoDB (with majority writes), ZooKeeper
- **Trade-off**: During a partition, unavailable nodes reject writes to maintain consistency
- **Use case**: Financial transactions (you can't have stale account balances)

**AP Systems (Availability over Consistency):**
- **Example**: Cassandra, DynamoDB, Riak
- **Trade-off**: During a partition, nodes accept writes, leading to temporary inconsistency (resolved via eventual consistency)
- **Use case**: Social media feeds (stale data is acceptable, downtime is not)

**Real-World Design Implications:**
1. **Banking systems** use CP databases (MongoDB with strong consistency, Google Spanner) because correctness > availability
2. **Shopping carts** use AP databases (DynamoDB) with conflict resolution (last-write-wins or merge strategies)
3. **Netflix** uses Cassandra (AP) because showing slightly stale movie recommendations is fine, but the service must always be up

**Modern Nuance:**
The CAP theorem is often oversimplified. Real systems use tunable consistency (e.g., Cassandra's quorum reads) to adjust the CP/AP trade-off per operation..."

[Response continues with diagrams, mathematical proof sketch, references to papers]

# Compare to Sonnet:
> /model sonnet
> [Same question]

Sonnet Response:
"The CAP theorem says you can only have 2 of 3: Consistency, Availability, Partition tolerance. Most systems choose either CP (like MongoDB) or AP (like Cassandra) based on whether they prioritize consistency or availability during network failures."

# Opus provides 10x more depth with real-world examples, trade-offs, and references.

Solution 5:

$ kiro-cli

> /model haiku
Switched to Claude Haiku 4.5 (0.4x credits)

> Read these files and summarize their purpose:
> src/auth.ts, src/database.ts, src/routes.ts, src/middleware.ts, src/utils.ts, src/config.ts, src/server.ts, src/models/user.ts, src/models/post.ts, src/services/email.ts

[Timer start]

Haiku Response (3 seconds total):
"Here's a summary of each file's purpose:
- **src/auth.ts**: Authentication logic (JWT tokens, password hashing)
- **src/database.ts**: Database connection setup (Prisma client)
- **src/routes.ts**: API route definitions
- **src/middleware.ts**: Express middleware (auth, error handling, logging)
- **src/utils.ts**: Utility functions (validation, formatting)
- **src/config.ts**: Configuration management (env variables, constants)
- **src/server.ts**: Server initialization (Express app setup)
- **src/models/user.ts**: User data model (schema, types)
- **src/models/post.ts**: Post data model (schema, types)
- **src/services/email.ts**: Email sending service (SMTP configuration)"

[Timer end: 3 seconds]

# Comparison with Auto Router:
> /model auto
> [Same task]

Auto Router Response (4 seconds total, likely used Haiku):
[Similar output with slightly more detail on relationships between files]

# Result: Haiku is 25% faster and produces acceptable quality for this simple summarization task
# Cost savings: 0.4x vs 1x (60% cheaper with Haiku)

Chapter 8: Tools, Permissions, and Safety - The Security Boundary

Fundamentals

Every Kiro tool—read, write, shell, MCP call—carries risk. A single rm -rf / or accidental API key commit can destroy a project. Kiro’s tool permission system is the safety mechanism that prevents catastrophic mistakes while maintaining workflow velocity. Unlike traditional permission systems (e.g., Unix file permissions), Kiro’s model is declarative and intent-based: you pre-authorize trusted tools and patterns, and Kiro prompts for everything else.

The core abstraction is the allowedTools field in agent configuration. This whitelist defines which tools can execute without user approval. Think of it as a firewall rule: “allow @builtin/read for *.ts files” or “block @mcp/postgres/execute_query unless user confirms.” The system supports wildcards (r*, w*), MCP server namespaces (@fetch, @github/*), and built-in tool groups (@builtin).

Why this matters: Without tool governance, AI agents become unpredictable executors. With it, you create specialized agents for specific tasks (e.g., a “read-only analyzer” agent that can never modify code). Tool permissions also enable audit trails—you know exactly what Kiro did and when it asked for your approval.

Deep Dive

The Permission Model

Kiro’s tool permission system operates on three layers:

1. Built-in Tools (@builtin namespace)

Kiro CLI ships with 20+ built-in tools: read, write, edit, shell, grep, glob, and more. Each tool has a unique identifier:

  • @builtin/read - Read files
  • @builtin/write - Create/overwrite files
  • @builtin/edit - Modify existing files
  • @builtin/shell - Execute shell commands
  • @builtin/mcp_call - Invoke MCP server tools

By default, Kiro prompts for approval before using any tool. To pre-approve all built-in tools for a specific agent, add "allowedTools": ["@builtin"] to the agent config. For finer control, use wildcards: ["r*", "grep", "glob"] allows read, grep, and glob but blocks write/edit/shell.
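
For example, a minimal agent config using these wildcard patterns might look like the following sketch (the .kiro/agents/ path and field names follow the agent examples later in this chapter; the file name is illustrative, so verify the exact schema against your Kiro version):

$ mkdir -p .kiro/agents
$ cat > .kiro/agents/read-mostly.json <<EOF
{
  "name": "Read-Mostly Helper",
  "prompt": "Answer questions about the codebase. Prefer reading over modifying.",
  "allowedTools": ["r*", "grep", "glob"]
}
EOF

With this config, read/grep/glob run without prompts, while write, edit, and shell still require approval.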

2. MCP Tools (Server Namespace)

MCP servers expose their own tools with namespaced IDs: @github/list_issues, @postgres/query, @fetch/get. The permission model supports:

  • Server-level allowlisting: ["@github"] trusts all GitHub server tools
  • Tool-level patterns: ["@github/list_*"] allows list_issues, list_prs, etc.
  • Explicit tool names: ["@postgres/query"] grants access only to query, not execute or delete

Example MCP permission config:

{
  "allowedTools": [
    "@github/list_issues",
    "@github/create_comment",
    "@postgres/query"
  ]
}

3. Path and Command Restrictions

Beyond tool names, Kiro supports content-based rules:

  • toolsSettings.allowedPaths: Restrict file operations to specific directories
  • toolsSettings.deniedCommands: Block dangerous shell commands (e.g., ["rm -rf", "dd if=/dev/zero"])
  • toolsSettings.trustedCommands: Auto-approve specific shell patterns
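
As a sketch of how these settings combine in one agent config (the three field names come from the list above; the values are illustrative and the exact matching semantics should be confirmed against your Kiro version):

$ cat > .kiro/agents/scoped-runner.json <<EOF
{
  "name": "Scoped Runner",
  "prompt": "Run tests and linters. Stay inside the project directories.",
  "allowedTools": ["@builtin/read", "@builtin/grep", "@builtin/shell"],
  "toolsSettings": {
    "allowedPaths": ["src/**", "tests/**"],
    "deniedCommands": ["rm -rf", "dd if=/dev/zero"],
    "trustedCommands": ["npm test", "npm run lint"]
  }
}
EOF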

Critical Security Concepts:

a. The Trust-All Anti-Pattern

Kiro supports /tools trust-all and /acceptall commands for rapid prototyping. NEVER use these in production. They disable all approval prompts, including:

  • Deleting files
  • Executing arbitrary shell commands
  • Committing to Git
  • Deploying to production
  • Modifying database schema

b. Wildcard Risks

Using "allowedTools": ["*"] is not supported—Kiro requires explicit patterns. Even "@builtin" (all built-in tools) should be used cautiously. A safer approach:

{
  "allowedTools": ["r*", "grep", "glob"],  // Read-only tools
  "deniedTools": ["write", "edit", "shell"]  // Explicitly block writes
}

c. Hook-Based Guardrails

Tool permissions are static (defined in config). For dynamic safety checks, use hooks:

  • preToolUse hook: Scans write/edit content for secrets (API keys, passwords)
  • postToolUse hook: Auto-formats code after write operations
  • userPromptSubmit hook: Warns if the user asks Kiro to delete production data

Example Security Hook (preToolUse):

#!/bin/bash
# .kiro/hooks/preToolUse.sh
TOOL_NAME=$1
ARGS=$2

if [[ "$TOOL_NAME" == "write" || "$TOOL_NAME" == "edit" ]]; then
  # Scan for common secret patterns
  if echo "$ARGS" | grep -E '(API_KEY|PASSWORD|SECRET)'; then
    echo "🚨 ERROR: Detected potential secret in file content"
    exit 2  # Block the tool execution
  fi
fi
exit 0  # Allow execution

d. Audit and Compliance

For regulated industries (finance, healthcare), tool execution logs are critical. Kiro supports:

  • Session transcripts: Full record of every tool call with arguments
  • Hook logging: Custom hooks can write to SIEM systems
  • MCP server logs: External tools (databases, APIs) log independently
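
As a sketch, a postToolUse audit hook (assuming the same positional-argument convention as the preToolUse example above) could append every tool call to a local log that a later job ships to your SIEM:

#!/bin/bash
# .kiro/hooks/postToolUse.sh (illustrative audit logger; argument convention assumed)
TOOL_NAME=$1
TOOL_ARGS=$2

mkdir -p .kiro/logs
# One line per tool call: UTC timestamp, user, tool name, raw arguments
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) user=$(whoami) tool=$TOOL_NAME args=$TOOL_ARGS" >> .kiro/logs/audit.log

exit 0  # postToolUse never blocks; it only observes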

Real-World Example: Fintech Security Setup

A financial services company configured Kiro with:

{
  "allowedTools": [
    "@builtin/read",
    "@builtin/grep",
    "@postgres/query"  // Read-only SQL
  ],
  "deniedTools": [
    "@builtin/write",
    "@builtin/edit",
    "@postgres/execute",
    "@postgres/delete"
  ],
  "hooks": {
    "preToolUse": ".kiro/hooks/secret-scanner.sh",
    "postToolUse": ".kiro/hooks/audit-logger.sh"
  }
}

Result: 35% reduction in security vulnerabilities from accidental secret commits (AWS case study, 2024).

How This Fits on Projects

You’ll apply tool permissions in Project 18 (Security Firewall Hook) and Project 24 (Secret Sanitizer Hook). Both projects build guardrails using the permission system + hooks.

Definitions & Key Terms

  • allowedTools: Whitelist of tools that execute without approval
  • deniedTools: Blacklist that overrides allowedTools
  • @builtin: Namespace for Kiro’s built-in tools (read, write, shell, etc.)
  • @server: Namespace for MCP server tools (e.g., @github, @postgres)
  • toolsSettings: Fine-grained configuration for paths, commands, and approval behavior
  • preToolUse hook: Script that runs before a tool executes (can block execution)
  • trust-all mode: Dangerous mode that disables all approvals (production anti-pattern)

Mental Model Diagram

┌──────────────────────────────────────────────────────────┐
│         KIRO TOOL PERMISSION SYSTEM (3 Layers)           │
└──────────────────────────────────────────────────────────┘

User Request: "Read auth.ts, check for SQL injection, and fix it"
│
│
▼
┌────────────────────────────────────────────────────────┐
│  LAYER 1: STATIC PERMISSION CHECK (allowedTools)       │
├────────────────────────────────────────────────────────┤
│  Tool: read(auth.ts)                                   │
│  Allowed: ✅ (@builtin/read in allowedTools)           │
│  Action: Execute without prompt                        │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│  LAYER 2: DYNAMIC HOOK CHECK (preToolUse)              │
├────────────────────────────────────────────────────────┤
│  Hook: .kiro/hooks/preToolUse.sh                       │
│  Logic: Scan for secrets, check file size             │
│  Result: ✅ Exit 0 (allow)                             │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│  LAYER 3: USER APPROVAL (if needed)                    │
├────────────────────────────────────────────────────────┤
│  Tool: edit(auth.ts) - NOT in allowedTools             │
│  Kiro: "I want to edit auth.ts. Approve? [Y/n/diff]"  │
│  User: Y                                                │
│  Action: Execute + log approval                        │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│  LAYER 4: POST-EXECUTION HOOK (postToolUse)            │
├────────────────────────────────────────────────────────┤
│  Hook: .kiro/hooks/postToolUse.sh                      │
│  Logic: Run prettier, log to audit.log                │
│  Result: ✅ Formatted code + audit trail                │
└────────────────────────────────────────────────────────┘

PERMISSION HIERARCHY (Most to Least Restrictive):

1. deniedTools        → Always block (highest priority)
2. allowedTools       → Pre-approve (no prompt)
3. preToolUse hook    → Dynamic validation (can block)
4. User approval      → Manual confirmation (fallback)
5. trust-all mode     → Disable all checks (DANGER)

How It Works

Step-by-step execution flow:

  1. Tool Request: Kiro decides to use a tool (e.g., write(newFile.ts))
  2. Denied Check: Is the tool in deniedTools? If yes → block + show error
  3. Allowed Check: Is the tool in allowedTools (or matches wildcard)? If yes → proceed to hook check
  4. Hook Execution: Run preToolUse hook (if configured). If exit code 2 → block, if 0 → allow
  5. User Approval: If not allowed and hook passed, prompt user: “Approve write(newFile.ts)? [Y/n/diff]”
  6. Execute Tool: Perform the operation (write file, run command, call MCP)
  7. Post-Hook: Run postToolUse hook (auto-format, log, notify)
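
The same order can be summarized as a small conceptual sketch (plain shell pseudocode, not Kiro's actual implementation):

#!/bin/bash
# Conceptual sketch of the evaluation order above; not Kiro's real code.
DENIED="@builtin/shell"
ALLOWED="@builtin/read @builtin/grep"

decide() {
  local tool=$1 approved=no rc=0
  # 1) deniedTools always wins
  for d in $DENIED; do [[ "$tool" == "$d" ]] && { echo "$tool -> blocked (deniedTools)"; return; }; done
  # 2) note whether the tool is pre-approved
  for a in $ALLOWED; do [[ "$tool" == "$a" ]] && approved=yes; done
  # 3) dynamic check: exit 2 blocks outright, exit 1 downgrades to a user prompt
  if [[ -x .kiro/hooks/preToolUse.sh ]]; then
    .kiro/hooks/preToolUse.sh "$tool" "{}"; rc=$?
    [[ $rc -eq 2 ]] && { echo "$tool -> blocked (preToolUse hook)"; return; }
    [[ $rc -eq 1 ]] && approved=no
  fi
  # 4) pre-approved tools run without a prompt; everything else asks the user
  if [[ "$approved" == "yes" ]]; then
    echo "$tool -> execute (pre-approved)"
  else
    echo "$tool -> prompt user for approval"
  fi
}

decide "@builtin/read"   # execute (pre-approved)
decide "@builtin/shell"  # blocked (deniedTools)
decide "@builtin/edit"   # prompt user for approval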

Invariants:

  • deniedTools always wins over allowedTools
  • Hooks can block allowed tools (exit code 2)
  • User approval can override denied tools (manual “/allow” command)

Failure Modes:

  • Over-permissive config: ["@builtin"] + no hooks → accidental file deletions
  • Under-permissive config: Empty allowedTools → constant approval prompts (workflow friction)
  • Hook bugs: Infinite loop in preToolUse → Kiro hangs

Minimal Concrete Example

Scenario: Create a read-only agent that can analyze code but never modify it.

1. Agent config (.kiro/agents/readonly-analyzer.json):

{
  "name": "ReadOnly Analyzer",
  "prompt": "You analyze code for bugs but never modify files.",
  "allowedTools": [
    "@builtin/read",
    "@builtin/grep",
    "@builtin/glob"
  ],
  "deniedTools": [
    "@builtin/write",
    "@builtin/edit",
    "@builtin/shell"
  ]
}

2. Usage:

$ kiro chat --agent readonly-analyzer

You: "Find all SQL injection vulnerabilities in the codebase"

Kiro: [reads files with grep, analyzes patterns]
  "Found 3 potential SQL injection points:
   1. auth.ts:42 - User input concatenated in query
   2. api.ts:128 - Unescaped WHERE clause
   3. db.ts:56 - Dynamic table name"

You: "Fix them"

Kiro: "I cannot modify files (write/edit tools are denied).
       Would you like me to explain the fix instead?"

Result: The agent is read-only by configuration. deniedTools blocks write/edit/shell outright; the only way around it is an explicit, logged /allow override by the user.

Common Misconceptions

  1. “allowedTools: [‘*’] allows all tools” Wrong. Kiro does not support the * wildcard for all tools. You must use specific patterns like @builtin, r*, or server namespaces.

  2. “Hooks replace allowedTools” Wrong. Hooks are complementary. Static permissions (allowedTools) provide the first line of defense; hooks add dynamic validation.

  3. “trust-all mode is safe for prototyping” ⚠️ Dangerous. Even in dev environments, trust-all can delete .git directories, overwrite production configs, or leak secrets. Use narrow allowedTools instead.

  4. “deniedTools is redundant if allowedTools is restrictive” Wrong. deniedTools is explicit denial—it prevents accidental additions to allowedTools from bypassing safety.

  5. “MCP tools bypass Kiro permissions” Wrong. MCP tools (e.g., @github/delete_repo) follow the same allowedTools rules. You must explicitly allow @github/* or the specific tool.

Check-Your-Understanding Questions

  1. You configure "allowedTools": ["@builtin"] and "deniedTools": ["@builtin/shell"]. Can Kiro execute shell("npm install")?

  2. A preToolUse hook exits with code 1 (error). Does Kiro block the tool or prompt the user?

  3. What’s the difference between "allowedTools": ["@github"] and "allowedTools": ["@github/*"]?

  4. Can a user manually approve a tool that’s in deniedTools?

  5. If allowedTools is empty, does Kiro refuse all tool requests or prompt for every tool?

Check-Your-Understanding Answers

  1. No. deniedTools has higher priority than allowedTools. Even though @builtin allows all built-in tools, @builtin/shell is explicitly denied.

  2. Prompt the user. Only exit code 2 blocks execution. Exit code 1 is treated as a warning, and Kiro proceeds to user approval.

  3. No difference in Kiro. Both syntax forms allow all tools from the @github server. The /* suffix is optional for server-level allowlisting.

  4. Yes, with /allow command. Users can override denied tools manually, but it’s logged and requires explicit confirmation.

  5. Prompts for every tool. Empty allowedTools means “require approval for everything.” It doesn’t block tools—just forces manual confirmation.

Real-World Applications

  1. Regulated Industries (Finance, Healthcare)
    • Use allowedTools: ["@builtin/read", "@builtin/grep"] for compliance analysts
    • Add preToolUse hooks to scan for PII/PHI in file operations
    • Deploy deniedTools for production write operations
  2. CI/CD Pipelines
    • Headless Kiro agents need allowedTools: ["@builtin"] to run autonomously
    • Use hooks to enforce test-before-merge policies
    • Block @builtin/shell with dangerous commands (rm -rf, dd)
  3. Open-Source Projects
    • Contributors can use restrictive allowedTools to prevent accidental damage
    • Maintainers use hooks to auto-format code and run linters
  4. Educational Environments
    • Students use read-only agents to learn codebases without risk
    • Instructors use hooks to enforce style guides and prevent plagiarism

Where You’ll Apply It

  • Project 5 (Steering Enforcer): Uses allowedTools to create constrained agents
  • Project 18 (Security Firewall Hook): Builds preToolUse hooks for secret scanning
  • Project 24 (Secret Sanitizer Hook): Implements dynamic content validation
  • Project 34 (Cloud Native Deployer): Uses deniedTools to prevent accidental production deploys

References

Key Insights

The Golden Rule of Tool Permissions: Deny by default, allow explicitly. If you wouldn’t let a junior developer run a command unsupervised, don’t add it to allowedTools.

Summary

Kiro’s tool permission system is a three-layer security boundary: static allowedTools (config-based whitelist), dynamic hooks (runtime validation), and user approval (human-in-the-loop). The system prevents catastrophic mistakes (accidental deletions, secret leaks) while maintaining workflow velocity through pre-authorized tool patterns. Critical concepts include the @builtin namespace (20+ built-in tools), MCP server namespaces (@github, @postgres), wildcards (r*, w*), and the trust-all anti-pattern (never use in production). Advanced usage involves deniedTools (explicit blocklist), path restrictions (allowedPaths), and audit logging for compliance.

Homework/Exercises to Practice the Concept

Exercise 1: Design a Read-Only Agent

Create an agent configuration that can analyze TypeScript code but cannot modify files or execute shell commands. The agent should be able to:

  • Read all .ts files in src/
  • Search for patterns with grep
  • List files with glob
  • Cannot: Write, edit, or run shell commands

Exercise 2: Build a Secret Scanner Hook

Write a preToolUse hook (in bash or Python) that:

  1. Intercepts write/edit tool calls
  2. Scans the file content for common secret patterns:
    • API_KEY=...
    • password=...
    • -----BEGIN PRIVATE KEY-----
  3. Blocks execution (exit 2) if secrets are detected
  4. Logs the blocked attempt to a file

Exercise 3: MCP Permission Scoping

Given this MCP server config:

{
  "mcpServers": {
    "github": { "command": "npx", "args": ["@modelcontextprotocol/server-github"] },
    "postgres": { "command": "npx", "args": ["@modelcontextprotocol/server-postgres"] }
  }
}

Design an agent that can:

  • List GitHub issues and PRs
  • Create comments on issues
  • Cannot: Delete repositories, close issues, or execute SQL queries

Solutions to the Homework/Exercises

Solution 1: Read-Only Agent

{
  "name": "TypeScript Analyzer",
  "prompt": "Analyze TypeScript code for bugs, patterns, and best practices. You cannot modify files.",
  "allowedTools": [
    "@builtin/read",
    "@builtin/grep",
    "@builtin/glob"
  ],
  "deniedTools": [
    "@builtin/write",
    "@builtin/edit",
    "@builtin/shell",
    "@builtin/mcp_call"
  ],
  "toolsSettings": {
    "allowedPaths": ["src/**/*.ts", "tests/**/*.ts"]
  }
}

Why this works:

  • allowedTools includes only read-oriented tools
  • deniedTools explicitly blocks all modification tools (defense-in-depth)
  • allowedPaths restricts reads to TypeScript files only (prevents reading .env, etc.)

Solution 2: Secret Scanner Hook

#!/bin/bash
# .kiro/hooks/preToolUse.sh

TOOL_NAME=$1
TOOL_ARGS=$2

# Only scan write/edit operations
if [[ "$TOOL_NAME" != "write" && "$TOOL_NAME" != "edit" ]]; then
  exit 0  # Allow other tools
fi

# Extract file content from tool arguments (simplified)
FILE_CONTENT=$(echo "$TOOL_ARGS" | jq -r '.content' 2>/dev/null)

# Secret patterns (regex)
SECRET_PATTERNS=(
  "API_KEY\s*=\s*['\"][A-Za-z0-9_-]+"
  "password\s*=\s*['\"][^'\"]+['\"]"
  "-----BEGIN (RSA |EC )?PRIVATE KEY-----"
  "ghp_[A-Za-z0-9]{36}"  # GitHub Personal Access Token
)

# Scan for secrets
for PATTERN in "${SECRET_PATTERNS[@]}"; do
  if echo "$FILE_CONTENT" | grep -E -q "$PATTERN"; then
    echo "🚨 BLOCKED: Detected secret pattern '$PATTERN' in file content"
    echo "$(date): BLOCKED write/edit - secret detected" >> .kiro/logs/security.log
    exit 2  # Block the tool
  fi
done

echo "✅ Secret scan passed"
exit 0  # Allow tool execution

Why this works:

  • Intercepts write/edit before execution
  • Uses regex to detect common secret formats
  • Exit code 2 blocks the tool (Kiro’s convention)
  • Logs blocked attempts for audit

Solution 3: MCP Permission Scoping

{
  "name": "GitHub Issue Manager",
  "prompt": "Manage GitHub issues and comments. You cannot delete repos or execute SQL.",
  "allowedTools": [
    "@builtin/read",
    "@github/list_issues",
    "@github/list_pull_requests",
    "@github/create_comment"
  ],
  "deniedTools": [
    "@github/delete_repository",
    "@github/close_issue",
    "@postgres"
  ]
}

Why this works:

  • allowedTools lists specific GitHub tools (not @github/*)
  • deniedTools blocks dangerous GitHub operations
  • Denying @postgres prevents all database operations (server-level block)

Chapter 9: Remote and Enterprise Workflows - Development Without Boundaries

Fundamentals

Traditional development workflows assume one machine: your laptop. But modern engineering happens across SSH tunnels to production servers, inside Docker containers, through corporate proxies, and across distributed teams. Kiro CLI is designed for headless, remote-first operation—you can run it on a Linux server via SSH, inside a CI/CD pipeline, or through a VPN tunnel to a corporate network.

The core capability is headless mode: Kiro operates without a browser UI, accepting commands from stdin and outputting results to stdout. This enables:

  • SSH sessions: Debug production issues on remote servers without losing Kiro context
  • CI/CD integration: Run Kiro agents in GitHub Actions, GitLab CI, or Jenkins
  • Corporate environments: Operate through proxies, firewalls, and air-gapped networks
  • Containerized workflows: Run Kiro inside Docker with MCP servers as sidecars
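
A minimal sketch of what this looks like in practice (the --headless and --prompt flags mirror the CI example later in this chapter; confirm exact flag names against your Kiro version):

# Prompt passed as a flag; the answer arrives on stdout and can be redirected
kiro chat --headless --prompt "Summarize the last 100 lines of /var/log/syslog" > summary.md

# Or pipe the prompt in on stdin from another tool
echo "List TODO comments under src/ and group them by file" | kiro chat --headless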

Why this matters: Local-only AI tools break down in enterprise environments. Kiro’s remote-first architecture means you never lose cognitive assistance—whether you’re SSH’d into a production database, debugging a Kubernetes pod, or working from a locked-down corporate network.

Deep Dive

Headless and SSH Workflows

1. SSH Remote Development

Kiro CLI can run on any Linux server with glibc 2.34+ (or use the musl build for older systems). The typical workflow:

# On your local machine
$ ssh user@production-server

# On the remote server
$ kiro chat
You: "Analyze the nginx error logs from the last hour"

Kiro: [reads /var/log/nginx/error.log, analyzes patterns]
  "Found 42 errors:
   - 38 upstream timeouts (backend server slow)
   - 3 SSL handshake failures (cert expiring?)
   - 1 disk full warning (/var/log at 95%)"

You: "Show me the slowest API endpoint from access logs"

Kiro: [parses access.log, aggregates by endpoint]
  "Slowest endpoints (average response time):
   1. /api/reports/generate - 8.2s
   2. /api/users/search - 3.1s
   3. /api/dashboard - 1.9s"

Why this works: Kiro’s context window (200K tokens) persists across the SSH session. You can spend hours debugging without re-explaining the problem.

2. Authentication in Headless Mode

Kiro requires authentication, but headless servers don’t have browsers for OAuth flows. The solution: SSH port forwarding.

Setup (one-time):

# On remote server, edit SSH config
$ sudo nano /etc/ssh/sshd_config

# Add these lines:
AcceptEnv KIRO_AUTH_TOKEN
AllowStreamLocalForwarding yes

# Restart SSH
$ sudo systemctl restart sshd

Login workflow:

# On your local machine, authenticate once
$ kiro auth login
# Browser opens, you sign in, token stored locally

# SSH to remote with port forwarding
$ ssh -R 52698:localhost:52698 user@server

# On remote, Kiro uses the forwarded auth
$ kiro chat
# Works without re-authentication!

3. CI/CD Integration (GitHub Actions Example)

Kiro can run as a bot in your CI pipeline to auto-fix lint errors, generate tests, or refactor code.

Example: Auto-fix failing tests

# .github/workflows/kiro-fix-tests.yml
name: Kiro Test Fixer

on:
  push:
    branches: [main, develop]

jobs:
  fix-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Kiro CLI
        run: |
          curl -fsSL https://kiro.dev/install.sh | bash
          echo "$HOME/.kiro/bin" >> $GITHUB_PATH

      - name: Authenticate Kiro
        env:
          KIRO_API_KEY: ${{ secrets.KIRO_API_KEY }}
        run: kiro auth login --api-key $KIRO_API_KEY

      - name: Run tests and fix failures
        run: |
          npm test || true  # Don't fail if tests fail
          kiro chat --headless --prompt "Run npm test. If tests fail, analyze the errors and fix them. Commit the fixes with message 'fix: auto-fix failing tests [kiro]'"

      - name: Push fixes
        run: |
          git config user.name "Kiro Bot"
          git config user.email "kiro@company.com"
          git push origin HEAD

Result: Tests fail → Kiro analyzes errors → Fixes code → Commits → Pushes. Fully automated.

4. Corporate Proxy Navigation

Many enterprises block direct internet access, forcing all traffic through HTTP proxies. Kiro supports standard proxy environment variables:

Setup:

# In .bashrc or .zshrc
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=https://proxy.company.com:8080
export NO_PROXY=localhost,127.0.0.1,.company.local

# For authenticated proxies
export HTTP_PROXY=http://username:password@proxy.company.com:8080

MCP servers through proxies:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-github"],
      "env": {
        "HTTP_PROXY": "http://proxy.company.com:8080",
        "HTTPS_PROXY": "https://proxy.company.com:8080"
      }
    }
  }
}

5. Air-Gapped Networks (Offline Mode)

In secure environments (defense, finance), servers have no internet access. Kiro supports offline operation with pre-downloaded models:

Setup:

# On an internet-connected machine, download models
$ kiro models download claude-sonnet-4.5
$ kiro models download claude-haiku-4.5

# Copy .kiro/models/ to the air-gapped server
$ scp -r ~/.kiro/models user@secure-server:~/.kiro/

# On the air-gapped server
$ kiro chat --offline

Limitations: MCP servers that require internet (GitHub, AWS) won’t work. Use file-based MCP servers (SQLite, local Postgres).

6. Distributed Teams (Shared Context)

Teams can share Kiro configurations and knowledge bases via Git.

Example: Team-wide agent library

# Create a shared config repo
$ git init kiro-team-config
$ cd kiro-team-config

# Add team agents
$ mkdir -p agents
$ cat > agents/security-scanner.json <<EOF
{
  "name": "Security Scanner",
  "prompt": "Scan code for vulnerabilities (SQL injection, XSS, etc.). Never modify files.",
  "allowedTools": ["@builtin/read", "@builtin/grep"],
  "deniedTools": ["@builtin/write", "@builtin/edit", "@builtin/shell"]
}
EOF

# Push to team repo
$ git add . && git commit -m "Add security scanner agent"
$ git push origin main

# Team members clone and link
$ git clone https://github.com/company/kiro-team-config
$ ln -s $(pwd)/kiro-team-config ~/.kiro/team

Usage: All team members can now run kiro chat --agent team/security-scanner.

Real-World Example: SRE Team at Fintech Corp

An SRE team manages 200+ microservices across 50 Kubernetes clusters. They configured Kiro with:

  1. SSH access to jump boxes (bastion hosts) with port forwarding
  2. MCP servers for Kubernetes (kubectl proxy as MCP)
  3. Shared agent library (incident-responder, log-analyzer, metric-detective)
  4. Audit hooks (all shell commands logged to Splunk)

Result: Incident response time reduced from 45 minutes to 12 minutes. Kiro analyzes logs, identifies root causes, and suggests fixes—all without leaving the SSH session.

How This Fits on Projects

You’ll apply remote workflows in:

  • Project 21 (Headless Server Setup): Configure Kiro for SSH and CI/CD
  • Project 22 (SSH Tunnel Agent): Build agents that work through SSH tunnels
  • Project 23 (Corporate Proxy Navigator): Handle proxy auth and certificate trust

Definitions & Key Terms

  • Headless mode: Running Kiro without a browser UI (stdin/stdout interaction)
  • SSH port forwarding: Tunneling Kiro’s auth server through SSH (-R flag)
  • glibc: GNU C Library required for Kiro (2.34+ on host)
  • musl build: Alternative Kiro binary for older Linux distributions
  • CI/CD integration: Running Kiro agents in automated pipelines (GitHub Actions, GitLab CI)
  • Air-gapped network: Isolated environment with no internet access
  • Jump box (bastion): Hardened server used as SSH gateway to production

Mental Model Diagram

┌──────────────────────────────────────────────────────────────┐
│           KIRO REMOTE WORKFLOW ARCHITECTURE                  │
└──────────────────────────────────────────────────────────────┘

LOCAL MACHINE (Developer Laptop)
┌────────────────────────────────────────┐
│  $ ssh -R 52698:localhost:52698 user@  │
│    production-db-01                    │
│                                        │
│  [Kiro Auth Token] ────────┐          │
│  ~/.kiro/auth.json          │          │
└─────────────────────────────┼──────────┘
                              │ SSH Tunnel
                              │ (Port 52698)
                              ▼
REMOTE SERVER (Production Database)
┌────────────────────────────────────────┐
│  $ kiro chat                           │
│                                        │
│  Kiro CLI ◄──────────────────┐        │
│    │                          │        │
│    │ Uses tunneled auth       │        │
│    ▼                          │        │
│  ┌──────────────────────┐    │        │
│  │  MCP Server: Postgres│    │        │
│  │  (localhost:5432)    │────┘        │
│  └──────────────────────┘             │
│                                        │
│  User: "Show me slow queries from the  │
│         last hour"                     │
│                                        │
│  Kiro: [queries pg_stat_statements]   │
│  "Top 3 slow queries:                  │
│   1. SELECT * FROM orders WHERE ... 8s │
│   2. UPDATE users SET ... 5.2s         │
│   3. DELETE FROM logs WHERE ... 3.1s"  │
└────────────────────────────────────────┘

CI/CD PIPELINE (GitHub Actions)
┌────────────────────────────────────────┐
│  workflow: kiro-test-fixer             │
│  ┌──────────────────────────────────┐  │
│  │ 1. Checkout code                 │  │
│  │ 2. Install Kiro CLI              │  │
│  │ 3. Auth with API key             │  │
│  │ 4. Run tests                     │  │
│  │ 5. If fail: kiro fix-tests       │  │
│  │ 6. Commit + push fixes           │  │
│  └──────────────────────────────────┘  │
│                                        │
│  Environment: Ubuntu 22.04             │
│  Context: Headless (no browser)        │
│  Auth: KIRO_API_KEY secret             │
└────────────────────────────────────────┘

CORPORATE PROXY SETUP
┌────────────────────────────────────────┐
│  Developer Workstation (Behind Proxy)  │
│  ┌──────────────────────────────────┐  │
│  │ export HTTP_PROXY=               │  │
│  │   http://proxy.corp.com:8080     │  │
│  │                                  │  │
│  │ Kiro CLI ──► Proxy ──► Internet  │  │
│  │              ▲                   │  │
│  │              │ Authenticated     │  │
│  │              │ (user:pass)       │  │
│  └──────────────┴───────────────────┘  │
│                                        │
│  MCP Servers inherit proxy config      │
└────────────────────────────────────────┘

KEY INSIGHT: Kiro's architecture is "context-first, location-agnostic"
            The same workflows run on your laptop, over SSH, or in CI,
            and each session keeps its own persistent context wherever it runs.

How It Works

Step-by-step SSH remote workflow:

  1. Local authentication: Run kiro auth login on your laptop (one-time setup)
  2. SSH with port forwarding: ssh -R 52698:localhost:52698 user@server
    • This tunnels Kiro’s auth server (port 52698) from local → remote
  3. Remote Kiro startup: On the server, run kiro chat
    • Kiro detects the tunneled auth port and uses your local credentials
  4. Context persistence: Your conversation state is stored in ~/.kiro/sessions/ on the remote server
  5. MCP servers: Remote Kiro can access local services (postgres, redis) via SSH tunnel or local sockets

Invariants:

  • Auth token must be valid (expires after 7 days, refresh with kiro auth refresh)
  • SSH tunnel must stay open (if connection drops, re-establish tunnel)
  • Remote server must allow AllowStreamLocalForwarding (sshd_config)

Failure Modes:

  • Port 52698 already in use: Another Kiro session is running locally
  • SSH timeout: Corporate firewalls may kill idle SSH sessions (use ServerAliveInterval=60)
  • glibc version mismatch: Server has glibc < 2.34 (solution: use musl build)
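
A sketch of an SSH invocation that addresses the timeout failure mode while also setting up the auth tunnel from earlier in this chapter (host names here are placeholders):

$ ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3 \
      -R 52698:localhost:52698 \
      user@production-server

# Or persist the same options in ~/.ssh/config
$ cat >> ~/.ssh/config <<EOF
Host production-server
  HostName prod.example.com
  User user
  ServerAliveInterval 60
  RemoteForward 52698 localhost:52698
EOF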

Minimal Concrete Example

Scenario: Debug a production API server via SSH.

1. On your laptop, set up port forwarding:

$ ssh -R 52698:localhost:52698 user@api-server-prod

2. On the remote server:

$ kiro chat

You: "Analyze the API access logs for 500 errors in the last hour"

Kiro: [reads /var/log/api/access.log]
  "Found 12 × 500 errors:
   - 8 from POST /api/payments (timeout connecting to payment gateway)
   - 3 from GET /api/users/:id (database connection pool exhausted)
   - 1 from DELETE /api/sessions (Redis connection refused)"

You: "Check if the payment gateway is reachable"

Kiro: [runs curl to payment gateway]
  "Payment gateway is UP but slow:
   - Ping: 250ms (normal: 50ms)
   - /health endpoint: 5s response (normal: 200ms)

   Likely cause: Network congestion or gateway under load."

You: "Increase the payment timeout from 5s to 10s in the config"

Kiro: [edits /etc/api/config.yaml, restarts service]
  "Updated payment_timeout: 10s
   Restarted api.service
   Monitoring for new 500 errors..."

Result: You debugged and fixed a production issue without leaving the SSH session or losing context.

Common Misconceptions

  1. “Kiro requires a desktop environment” Wrong. Kiro CLI is fully headless-compatible. No GUI, no browser required for operation.

  2. “SSH port forwarding is insecure” Wrong. SSH tunneling is encrypted end-to-end. Port 52698 is only accessible on localhost.

  3. “CI/CD bots need human approval for every tool” Wrong. In headless mode with allowedTools: ["@builtin"], Kiro runs autonomously (no prompts).

  4. “Corporate proxies break MCP servers” ⚠️ Partially true. HTTP/HTTPS proxies work fine. SOCKS proxies and authenticated proxies require manual configuration.

  5. “Offline mode disables all features” Wrong. Offline mode only disables internet-dependent MCP servers. All built-in tools (read, write, shell) work normally.

Check-Your-Understanding Questions

  1. You SSH into a server with ssh user@server. Kiro auth fails with “No authentication found.” What’s missing?

  2. In GitHub Actions, you set KIRO_API_KEY as a secret. How does Kiro authenticate in the CI pipeline?

  3. Can you run Kiro in a Docker container without internet access? What limitations exist?

  4. A corporate firewall blocks port 52698. Can you still use Kiro remotely?

  5. What’s the difference between kiro chat --headless and kiro chat in an SSH session?

Check-Your-Understanding Answers

  1. Missing SSH port forwarding. You need ssh -R 52698:localhost:52698 user@server to tunnel authentication from local → remote.

  2. Via environment variable. In the workflow, run kiro auth login --api-key $KIRO_API_KEY. Kiro reads the API key from the env var.

  3. Yes, with limitations. Offline mode works in Docker. You cannot use internet-dependent MCP servers (GitHub, AWS), but local servers (Postgres, SQLite) work fine.

  4. Yes, with a different port. Use ssh -R 12345:localhost:52698 user@server and set KIRO_AUTH_PORT=12345 on the remote.

  5. No difference in behavior. --headless is implicit when stdin/stdout are not a TTY (e.g., in SSH). The flag is mainly for CI/CD clarity.

Real-World Applications

  1. SRE Incident Response
    • SSH into production, analyze logs with Kiro, identify root cause
    • Kiro suggests fixes, you review and apply
    • Result: Faster MTTR (Mean Time To Resolution)
  2. Automated Code Review (CI/CD)
    • GitHub Actions runs Kiro on every PR
    • Kiro checks for security issues, style violations, missing tests
    • Posts review comments on the PR
  3. Database Migration Validation
    • Connect to staging database via SSH tunnel
    • Kiro analyzes schema changes, generates migration scripts
    • Validates data integrity before production deploy
  4. Kubernetes Debugging
    • kubectl exec into a pod, run Kiro
    • Kiro reads logs, analyzes traces, suggests config changes
    • No need to download logs locally

Where You’ll Apply It

  • Project 21 (Headless Server Setup): Configure Kiro for SSH, CI/CD, and Docker
  • Project 22 (SSH Tunnel Agent): Build agents that debug remote systems
  • Project 23 (Corporate Proxy Navigator): Handle proxy authentication and certificates
  • Project 27 (SSH Remote Development Agent): Full remote development workflow

References

Key Insights

The Remote-First Principle: Kiro is designed for “development without boundaries.” Whether you’re SSH’d into a server, inside a Docker container, or behind a corporate proxy, Kiro maintains full context and capability.

Summary

Kiro CLI supports headless, remote-first workflows via SSH port forwarding (authentication tunneling), CI/CD integration (GitHub Actions, GitLab CI), corporate proxy navigation (HTTP_PROXY environment variables), and air-gapped operation (offline mode with pre-downloaded models). The architecture is location-agnostic: context persists across SSH sessions, and MCP servers can run locally or remotely. Critical patterns include SSH tunnel setup (ssh -R 52698:localhost:52698), headless authentication (API keys or tunneled auth), and distributed team workflows (shared agent libraries via Git). Enterprise features include audit logging, proxy authentication, and compliance-friendly deniedTools configurations.

Homework/Exercises to Practice the Concept

Exercise 1: SSH Remote Debugging Setup

Set up Kiro on a remote Linux server (use a VM or cloud instance like EC2/DigitalOcean). Configure:

  1. SSH port forwarding for authentication
  2. A custom agent that can read logs but not modify files
  3. An MCP server that connects to a remote PostgreSQL database

Test by SSH’ing into the server and running kiro chat to analyze database query performance.

Exercise 2: CI/CD Test Auto-Fixer

Create a GitHub Actions workflow that:

  1. Runs your project’s test suite
  2. If tests fail, triggers Kiro to analyze the failures
  3. Kiro attempts to fix the tests automatically
  4. Commits and pushes the fixes to a new branch
  5. Opens a PR with the fixes

Exercise 3: Corporate Proxy Configuration

Simulate a corporate proxy environment (use Squid proxy or similar). Configure:

  1. Environment variables for HTTP_PROXY and HTTPS_PROXY
  2. An MCP server (like @modelcontextprotocol/server-github) that works through the proxy
  3. A custom CA certificate for SSL inspection (common in enterprises)

Verify that Kiro can authenticate and use MCP servers through the proxy.

Solutions to the Homework/Exercises

Solution 1: SSH Remote Debugging Setup

On remote server (install Kiro):

$ curl -fsSL https://kiro.dev/install.sh | bash
$ echo 'export PATH="$HOME/.kiro/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc

On remote server (configure SSH):

$ sudo nano /etc/ssh/sshd_config
# Add: AcceptEnv KIRO_AUTH_TOKEN
# Add: AllowStreamLocalForwarding yes
$ sudo systemctl restart sshd

Create read-only log analyzer agent:

$ mkdir -p ~/.kiro/agents
$ cat > ~/.kiro/agents/log-analyzer.json <<EOF
{
  "name": "Log Analyzer",
  "prompt": "Analyze system logs for errors, warnings, and patterns. You cannot modify files.",
  "allowedTools": [
    "@builtin/read",
    "@builtin/grep",
    "@builtin/shell"
  ],
  "deniedTools": [
    "@builtin/write",
    "@builtin/edit"
  ],
  "toolsSettings": {
    "allowedPaths": ["/var/log/**", "/tmp/**"],
    "trustedCommands": ["grep", "tail", "head", "wc"]
  }
}
EOF

Configure Postgres MCP server:

$ cat > ~/.kiro/settings/mcp.json <<EOF
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "@modelcontextprotocol/server-postgres",
        "postgresql://user:password@localhost:5432/production_db"
      ]
    }
  }
}
EOF

On local machine (SSH with port forwarding):

$ ssh -R 52698:localhost:52698 user@remote-server

# Now on remote server
$ kiro chat --agent log-analyzer

You: "Show me all 500 errors from nginx in the last hour, then check if the database query time increased"

Kiro: [reads /var/log/nginx/error.log, uses @postgres/query to check pg_stat_statements]

Solution 2: CI/CD Test Auto-Fixer

# .github/workflows/kiro-test-fixer.yml
name: Kiro Test Auto-Fixer

on:
  push:
    branches: [main, develop, feature/*]
  pull_request:

jobs:
  fix-tests:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Full history for better Kiro context

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Install Kiro CLI
        run: |
          curl -fsSL https://kiro.dev/install.sh | bash
          echo "$HOME/.kiro/bin" >> $GITHUB_PATH

      - name: Authenticate Kiro
        env:
          KIRO_API_KEY: ${{ secrets.KIRO_API_KEY }}
        run: kiro auth login --api-key $KIRO_API_KEY

      - name: Run tests
        id: test
        continue-on-error: true
        run: npm test > test-output.txt 2>&1

      - name: Fix tests with Kiro
        if: steps.test.outcome == 'failure'
        run: |
          kiro chat --headless --prompt "
          I ran 'npm test' and tests failed. Here's the output:

          $(cat test-output.txt)

          Please:
          1. Analyze the test failures
          2. Fix the code to make tests pass
          3. Run 'npm test' again to verify
          4. If tests pass, stage all changes with 'git add .'

          Do NOT commit yet - I'll handle that.
          "

      - name: Verify tests pass
        run: npm test

      - name: Create Pull Request
        if: steps.test.outcome == 'failure'
        uses: peter-evans/create-pull-request@v5
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          commit-message: 'fix: auto-fix failing tests [kiro]'
          branch: kiro/fix-tests-${{ github.sha }}
          title: '🤖 Kiro: Auto-fix failing tests'
          body: |
            Kiro automatically fixed failing tests.

            **Original test output:** see the "Run tests" step in this workflow run.

            **Kiro analysis and fixes applied.**

            Please review the changes before merging.

Solution 3: Corporate Proxy Configuration

Set up Squid proxy (for testing):

# On a test machine or Docker container
$ sudo apt-get install squid
$ sudo nano /etc/squid/squid.conf

# Add:
http_port 3128
acl localnet src 192.168.0.0/16
http_access allow localnet

$ sudo systemctl restart squid

Configure Kiro for proxy:

# In ~/.bashrc or ~/.zshrc
export HTTP_PROXY=http://proxy-server:3128
export HTTPS_PROXY=http://proxy-server:3128
export NO_PROXY=localhost,127.0.0.1,.internal

# For authenticated proxies
export HTTP_PROXY=http://username:password@proxy-server:3128

Configure MCP server with proxy:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-github"],
      "env": {
        "HTTP_PROXY": "http://proxy-server:3128",
        "HTTPS_PROXY": "http://proxy-server:3128",
        "NODE_TLS_REJECT_UNAUTHORIZED": "0"
      }
    }
  }
}

For SSL inspection with custom CA:

# Download corporate CA certificate
$ wget http://internal.company.com/ca-cert.crt

# Add to system trust store
$ sudo cp ca-cert.crt /usr/local/share/ca-certificates/
$ sudo update-ca-certificates

# Configure Node.js to use system CA
$ export NODE_EXTRA_CA_CERTS=/etc/ssl/certs/ca-certificates.crt

Verify Kiro works through proxy:

$ kiro chat

You: "List my GitHub repositories"

Kiro: [uses @github/list_repos through proxy]
  "Found 12 repositories:
   1. company/api-gateway
   2. company/auth-service
   ..."

Chapter 10: Multimodal and Experimental Features - The Cutting Edge

Fundamentals

Kiro CLI ships with experimental features: beta capabilities that push the boundaries of AI-assisted development. These features are toggleable (via /experiment command) and actively evolving based on user feedback. The two major categories are:

  1. Multimodal Input: Processing images, PDFs, screenshots, and diagrams alongside text
  2. Advanced Context Management: Knowledge bases, thinking tool, checkpointing, and tangent mode

Why this matters: Traditional AI tools operate in text-only silos. Kiro’s multimodal and experimental features enable visual debugging (paste a screenshot, Kiro explains the bug), semantic memory (Kiro remembers patterns across all your projects), and non-linear thinking (explore tangents without polluting main context).

Important: Experimental features may change or be removed. Use them to accelerate development, but don’t build production workflows that depend on beta APIs.

Deep Dive

Multimodal Input (Images, Screenshots, PDFs)

1. Visual Debugging with Screenshots

Kiro’s multimodal agent can analyze UI bugs, architecture diagrams, and whiteboard sketches. The workflow:

$ kiro chat

You: "Here's a screenshot of the bug: /path/to/screenshot.png"

[Kiro analyzes the image]
Kiro: "I see the UI rendering issue:
  - The login button is overlapping the username field
  - CSS issue: z-index conflict between .login-form and .auth-container
  - The mobile breakpoint (@media max-width: 768px) is missing

  Fix:
  .auth-container {
    z-index: 10;
  }
  .login-form {
    z-index: 20;
    margin-top: 1rem; /* Prevent overlap */
  }

Would you like me to apply this fix to auth.css?”


Why this works: Claude Sonnet 4.5 has strong "spatial reasoning"—it can identify UI misalignments, button positions, and visual hierarchy issues that are hard to describe in text.

2. Whiteboard to Code

The Kiro blog showcases "whiteboard → production code" workflows:

Example: Architecture Diagram → Infrastructure Code

  1. User draws a system architecture on a whiteboard (API Gateway → Lambda → DynamoDB)
  2. Takes a photo with phone, uploads to Kiro
  3. Kiro generates Terraform configurations that match the diagram

Result: From whiteboard sketch to deployed infrastructure in minutes.

3. PDF Support (Feature Request Status)

As of 2025, Kiro has partial PDF support. You can:
  • Drag & drop PDFs into chat (Kiro extracts text)
  • Reference PDF documentation (Kiro reads technical specs)

Limitations:
  • Complex PDF layouts (multi-column, tables) may lose structure
  • Image-heavy PDFs require OCR (not yet built-in)

GitHub Issue #2066 (August 2025) tracks enhanced PDF support with:
  • Full table extraction
  • Diagram recognition
  • Cross-reference linking

4. Multimodal Use Cases

| Input Type | Use Case | Example |
|------------|----------|---------|
| Screenshot | Debug UI bugs | "Why is this button not clickable?" |
| Diagram | Generate infrastructure | "Create Terraform from this AWS diagram" |
| Whiteboard | Convert design to code | "Build this API from my sketch" |
| PDF | Extract data models | "Read this API spec and generate TypeScript types" |
| Photo | Analyze error messages | "What does this terminal error mean?" |

5. Real-World Example: PyQt6 App from Screenshot

A developer shared a story (ludditus.com, July 2025): "I showed Kiro a screenshot of a Windows calculator. Kiro generated a fully functional PyQt6 app with the same layout, button handlers, and keyboard shortcuts. First try. No iterations."

Why this matters: Multimodal AI collapses "describe the UI" → "write the code" into a single step. You show, Kiro builds.

Experimental Features (Knowledge Management, Thinking, Checkpoints)

1. Knowledge Management (Semantic Search)

Enable with /experiment knowledge.

What it does: Kiro indexes your codebase into a vector database (embeddings). You can ask semantic questions like:

You: "Where do we validate credit card payments?"

Kiro: [searches embeddings, finds relevant code]
  "Credit card validation happens in 3 places:
   1. src/payments/stripe.ts:42 - Luhn algorithm check
   2. src/api/checkout.ts:128 - CVV validation
   3. lib/validators/card.ts:18 - Expiry date check"

Why this matters: Traditional grep searches for exact strings. Semantic search finds conceptually similar code, even if the variable names differ.
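
To feel the difference, try the literal-string version of the same question (a quick sketch; the paths are the ones from the example above):

# Exact-match search: only files that literally contain the phrase are returned,
# so lib/validators/card.ts is missed if it never says "credit card".
$ grep -rn "credit card" src/ lib/

# You end up enumerating synonyms by hand, which semantic search does for you:
$ grep -rnE "credit card|card number|luhn|cvv" src/ lib/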

2. Thinking Tool (Extended Reasoning)

Enable with /experiment thinking.

What it does: Kiro shows its internal reasoning process for complex problems. Instead of jumping to a solution, Kiro breaks down the problem step-by-step.

Example:

You: "Why is my API slow only during peak hours?"

Kiro (with thinking tool):
  **Thinking:**
  1. Hypothesis: Database connection pool exhaustion
     - Check connection pool size
     - Check concurrent requests during peak
  2. Hypothesis: Network latency to external services
     - Check API response times for third-party calls
  3. Hypothesis: CPU throttling under load
     - Check server metrics (CPU, memory, disk I/O)

  [Analyzes logs, checks metrics]

  **Conclusion**: Database connection pool is too small (10 connections).
  During peak hours (1000+ requests/min), connections queue for 5-10s.

  **Fix**: Increase pool size from 10 → 50 in database.yml

Why this matters: For debugging and architecture decisions, seeing Kiro’s reasoning helps you learn instead of just accepting solutions blindly.

3. Checkpointing (Session Versioning)

Enable with /experiment checkpointing.

What it does: Create snapshots of your session state. If you make a mistake, roll back to a checkpoint.

Commands:

  • /checkpoint save "Before refactoring auth"
  • /checkpoint list - Show all checkpoints
  • /checkpoint restore <id> - Revert to checkpoint

Example:

You: "Refactor the auth system to use JWT instead of sessions"

Kiro: [makes changes across 10 files]

You: "Actually, let's stick with sessions. Rollback the changes."

Kiro: [uses /checkpoint restore to revert all 10 files]
  "Restored to checkpoint: 'Before refactoring auth'"

Why this matters: Experimentation without fear. Try radical refactorings, and if they don’t work, revert in one command.

4. Tangent Mode (Context Isolation)

Enable with /experiment tangent.

What it does: Create a side conversation that doesn’t pollute your main context. Useful for:

  • Exploring a design idea without committing to it
  • Debugging a side issue while maintaining main task focus
  • Asking theoretical questions (“What if we used Redis instead of Postgres?”)

Commands:

  • /tangent start "Explore Redis caching"
  • /tangent end - Return to main conversation
  • /tangent list - Show all tangents

Example:

# Main conversation: Building a payment API
You: "Implement Stripe payment processing"

Kiro: [works on Stripe integration]

You: /tangent start "Explore PayPal alternative"

# Now in tangent mode
You: "If we used PayPal instead, what would the code look like?"

Kiro: [generates PayPal integration example]

You: /tangent end

# Back to main conversation
You: "Continue with Stripe"

Why this matters: Your context window is limited (200K tokens). Tangents prevent “context pollution” from exploratory questions.

5. TODO Lists (Automated Task Tracking)

Enable with /experiment todo.

What it does: Kiro automatically creates and manages TODO lists based on your requests.

Example:

You: "Add user authentication, set up database migrations, and write API tests"

Kiro: [creates TODO list]
  TODO:
  1. [ ] Add user authentication (passport.js + JWT)
  2. [ ] Set up database migrations (Knex.js)
  3. [ ] Write API tests (Jest + Supertest)

Kiro: "Starting with TODO #1..."

[Implements auth]

Kiro: "✅ Completed: Add user authentication
       Moving to TODO #2: Set up database migrations"

Why this matters: Kiro tracks multi-step tasks without you needing to remember the plan.

6. Code Intelligence (LSP Integration)

Kiro CLI now integrates with Language Server Protocol (LSP). This means Kiro has access to:

  • Go-to-definition: “Show me where AuthService is defined”
  • Find references: “Where is validateToken called?”
  • Hover information: “What does this function return?”
  • Diagnostics: “Show me all TypeScript errors in the project”

Why this matters: Kiro can navigate your codebase like an IDE—understanding imports, types, and call graphs.

How This Fits on Projects

You’ll apply experimental features in:

  • Project 16 (Design to Code Converter): Uses multimodal input to convert UI mockups
  • Project 28 (Semantic Search Engine): Builds knowledge base indexing
  • Project 30 (Recursive Prompt Improver): Uses thinking tool for meta-reasoning

Definitions & Key Terms

  • Multimodal input: Processing text + images/PDFs in the same conversation
  • Spatial reasoning: AI’s ability to understand visual layouts and UI hierarchies
  • Semantic search: Concept-based search using embeddings (vs exact string match)
  • Thinking tool: Kiro’s extended reasoning mode (shows step-by-step thought process)
  • Checkpointing: Session snapshots that can be restored (like Git for conversations)
  • Tangent mode: Side conversations that don’t pollute main context
  • LSP (Language Server Protocol): Standard for code intelligence (go-to-def, find refs, etc.)
  • Vector embeddings: Numerical representations of code for semantic search

Mental Model Diagram

┌──────────────────────────────────────────────────────────────┐
│       KIRO EXPERIMENTAL FEATURES ARCHITECTURE                │
└──────────────────────────────────────────────────────────────┘

MULTIMODAL INPUT PROCESSING
┌────────────────────────────────────────┐
│  User Input: Text + Image              │
│  ┌──────────────────────────────────┐  │
│  │ "This UI is broken (screenshot)" │  │
│  └───────────────┬──────────────────┘  │
│                  ▼                      │
│  ┌──────────────────────────────────┐  │
│  │ Claude Sonnet 4.5 (Vision)       │  │
│  │ - Spatial reasoning              │  │
│  │ - OCR text extraction            │  │
│  │ - Layout analysis                │  │
│  └───────────────┬──────────────────┘  │
│                  ▼                      │
│  ┌──────────────────────────────────┐  │
│  │ Kiro Analysis + Code Generation  │  │
│  │ "CSS z-index conflict detected"  │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘

KNOWLEDGE MANAGEMENT (Semantic Search)
┌────────────────────────────────────────┐
│  Codebase Indexing                     │
│  ┌──────────────────────────────────┐  │
│  │ src/**/*.ts → Embeddings         │  │
│  │ (Vector Database)                │  │
│  └───────────────┬──────────────────┘  │
│                  ▼                      │
│  User Query: "Where do we validate     │
│               credit cards?"           │
│                  ▼                      │
│  ┌──────────────────────────────────┐  │
│  │ Semantic Search                  │  │
│  │ (Cosine similarity)              │  │
│  └───────────────┬──────────────────┘  │
│                  ▼                      │
│  ┌──────────────────────────────────┐  │
│  │ Results: stripe.ts:42,           │  │
│  │          validators/card.ts:18   │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘

THINKING TOOL (Extended Reasoning)
┌────────────────────────────────────────┐
│  Complex Problem: "API slow at peak"   │
│  ┌──────────────────────────────────┐  │
│  │ STEP 1: Hypothesize causes       │  │
│  │ - DB pool exhaustion?            │  │
│  │ - Network latency?               │  │
│  │ - CPU throttling?                │  │
│  ├──────────────────────────────────┤  │
│  │ STEP 2: Gather evidence          │  │
│  │ - Read logs                      │  │
│  │ - Check metrics                  │  │
│  ├──────────────────────────────────┤  │
│  │ STEP 3: Eliminate hypotheses     │  │
│  │ - Network: ✅ Normal latency      │  │
│  │ - CPU: ✅ Low utilization         │  │
│  │ - DB pool: ❌ 10/10 used (CAUSE)  │  │
│  ├──────────────────────────────────┤  │
│  │ STEP 4: Propose solution         │  │
│  │ "Increase pool from 10 → 50"     │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘

CHECKPOINTING (Session Versioning)
┌────────────────────────────────────────┐
│  Session Timeline                      │
│  ┌──────────────────────────────────┐  │
│  │ Checkpoint 1: "Initial setup"    │  │
│  ├──────────────────────────────────┤  │
│  │ Checkpoint 2: "Auth system done" │  │
│  ├──────────────────────────────────┤  │
│  │ Checkpoint 3: "Before refactor"  │  │
│  ├──────────────────────────────────┤  │
│  │ [Current state]                  │  │
│  │ (Oops, refactor broke tests!)    │  │
│  │                                  │  │
│  │ /checkpoint restore 3            │  │
│  ├──────────────────────────────────┤  │
│  │ Restored: "Before refactor"      │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘

TANGENT MODE (Context Isolation)
┌────────────────────────────────────────┐
│  Main Context (200K tokens)            │
│  ┌──────────────────────────────────┐  │
│  │ Building Stripe payment API      │  │
│  │ - Current progress: 60%          │  │
│  │ - Files modified: 8              │  │
│  └───────────────┬──────────────────┘  │
│                  │                      │
│  /tangent start "Explore PayPal"       │
│                  ▼                      │
│  ┌──────────────────────────────────┐  │
│  │ Tangent Context (isolated)       │  │
│  │ "If we used PayPal instead..."   │  │
│  │ [Generates PayPal code]          │  │
│  └───────────────┬──────────────────┘  │
│                  │                      │
│  /tangent end                           │
│                  ▼                      │
│  ┌──────────────────────────────────┐  │
│  │ Back to Main Context             │  │
│  │ (Stripe progress preserved)      │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘

KEY INSIGHT: Experimental features add "dimensions" to Kiro:
- Multimodal: Vision (images/diagrams)
- Knowledge: Memory (semantic search across projects)
- Thinking: Transparency (see reasoning process)
- Checkpointing: Time travel (revert mistakes)
- Tangents: Parallel exploration (context isolation)

How It Works

Step-by-step multimodal workflow:

  1. User uploads image: You: "Debug this UI bug [screenshot.png]"
  2. Image preprocessing: Kiro converts image to base64, sends to Claude API
  3. Vision model analysis: Claude Sonnet 4.5 analyzes pixels, identifies visual elements
  4. Spatial reasoning: AI detects layout issues (overlapping elements, misalignment)
  5. Code generation: Kiro proposes CSS/HTML fixes based on visual analysis
  6. User confirmation: Kiro asks “Apply these fixes?” before modifying files

Invariants:

  • Images must be < 5MB (Kiro compresses larger files)
  • PDFs must be text-extractable (not scanned images)
  • Knowledge base requires /experiment knowledge enabled

Failure Modes:

  • Image too low-res: Kiro may misidentify UI elements
  • Handwritten diagrams: OCR struggles with messy handwriting
  • Context overflow: Very large PDFs can exceed token limits
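
To make the size invariant and the base64 step above concrete, here is a minimal TypeScript sketch of the kind of pre-flight check a client could run before sending an image to a vision model. The helper name and error message are illustrative; Kiro performs its own preprocessing internally.

import { readFile, stat } from "node:fs/promises";

const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // mirrors the "< 5MB" invariant above

// Hypothetical pre-flight helper: reject oversized files, then base64-encode
// the image, which is how most vision APIs expect binary payloads to arrive.
async function prepareImage(path: string): Promise<string> {
  const { size } = await stat(path);
  if (size > MAX_IMAGE_BYTES) {
    throw new Error(`${path} is ${size} bytes; compress it below 5MB first`);
  }
  const bytes = await readFile(path);
  return bytes.toString("base64");
}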

Minimal Concrete Example

Scenario: Convert a UI mockup to React code using multimodal input.

1. User uploads mockup:

$ kiro chat

You: "Here's a mockup of the login page: mockup.png
     Generate a React component with:
     - Email and password inputs
     - Login button
     - 'Forgot password' link
     - Match the colors and layout from the mockup"

2. Kiro analyzes the image:

Kiro: [processes mockup.png]
  "I see a centered login form with:
   - Primary color: #3B82F6 (blue-500)
   - Background: #F9FAFB (gray-50)
   - Card shadow: subtle elevation
   - Button: rounded-lg, full-width

  Generating LoginPage.tsx..."

[Kiro writes the component]

3. Generated code:

// LoginPage.tsx
import { useState } from 'react';

export default function LoginPage() {
  const [email, setEmail] = useState('');
  const [password, setPassword] = useState('');

  return (
    <div className="min-h-screen bg-gray-50 flex items-center justify-center">
      <div className="bg-white p-8 rounded-lg shadow-md w-full max-w-md">
        <h1 className="text-2xl font-bold text-gray-900 mb-6">Login</h1>

        <form className="space-y-4">
          <div>
            <label className="block text-sm font-medium text-gray-700 mb-2">
              Email
            </label>
            <input
              type="email"
              value={email}
              onChange={(e) => setEmail(e.target.value)}
              className="w-full px-4 py-2 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500"
            />
          </div>

          <div>
            <label className="block text-sm font-medium text-gray-700 mb-2">
              Password
            </label>
            <input
              type="password"
              value={password}
              onChange={(e) => setPassword(e.target.value)}
              className="w-full px-4 py-2 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500"
            />
          </div>

          <button
            type="submit"
            className="w-full bg-blue-500 hover:bg-blue-600 text-white font-medium py-2 px-4 rounded-lg transition"
          >
            Login
          </button>

          <a href="/forgot-password" className="block text-center text-sm text-blue-500 hover:underline">
            Forgot password?
          </a>
        </form>
      </div>
    </div>
  );
}

Result: From mockup image to working React component in one prompt.

Common Misconceptions

  1. “Multimodal only works with professional design mockups.” Wrong. Kiro can process hand-drawn sketches, whiteboard photos, and low-fidelity wireframes.

  2. “Experimental features are unstable and will break my workflow.” ⚠️ Partially true. APIs may change, but features are production-tested before release. Use them; just don’t rely on undocumented behavior.

  3. “Knowledge base searches replace grep/rg.” Wrong. Semantic search is complementary: use grep for exact matches and the knowledge base for conceptual searches.

  4. “Checkpointing works like Git branches.” Wrong. Checkpoints are session-scoped (conversation state), not file-scoped. Git tracks code; checkpoints track context.

  5. “Tangent mode is just a second chat window.” Wrong. Tangents share the same session but isolate context, and you can reference main-conversation findings inside a tangent.

Check-Your-Understanding Questions

  1. You upload a screenshot of a bug to Kiro. What AI capability allows Kiro to identify visual misalignments?

  2. If you enable /experiment knowledge, does Kiro index your entire home directory or just the current project?

  3. What’s the difference between /checkpoint save and git commit?

  4. Can you use tangent mode to explore multiple design alternatives without polluting main context?

  5. If the thinking tool shows Kiro’s reasoning, can you override its conclusions with your own hypothesis?

Check-Your-Understanding Answers

  1. Spatial reasoning (part of Claude Sonnet 4.5’s vision capabilities). It can detect overlapping elements, misaligned text, and layout issues.

  2. Just the current project. Knowledge base indexes files within the current working directory (respects .gitignore). It doesn’t scan your entire filesystem.

  3. Scope: Checkpoints save conversation state (context, tool calls, session history). Git commits save file changes. Checkpoints are temporary (session-only), commits are permanent (repository history).

  4. Yes. Tangent mode isolates exploratory work. You can start tangent “Explore Redux”, then tangent “Explore MobX”, then return to main without context pollution.

  5. Yes. The thinking tool shows reasoning, but you can interrupt with “Actually, let’s try hypothesis X instead.” Kiro will pivot to your approach.

Real-World Applications

  1. Design-to-Code Workflows
    • Upload Figma screenshots, generate React/Vue components
    • Whiteboard → Infrastructure diagrams → Terraform
  2. Visual Debugging
    • Screenshot of browser console → Kiro identifies stack trace root cause
    • Photo of terminal error → Kiro explains cryptic error messages
  3. Semantic Code Search
    • “Where do we handle user authentication?” (finds all auth-related code)
    • “Show me all places we call external APIs” (finds HTTP clients, fetch calls)
  4. Exploratory Refactoring
    • Use tangent mode to explore architecture changes
    • Checkpoint before refactor, restore if tests fail
  5. Learning Complex Codebases
    • Enable thinking tool to see how Kiro navigates unfamiliar code
    • Use knowledge base to find related code by concept, not keywords

Where You’ll Apply It

  • Project 12 (Documentation Librarian): Uses knowledge base for semantic search
  • Project 16 (Design to Code Converter): Uses multimodal input for mockups
  • Project 25 (Tangent Explorer): Builds tools to manage tangent mode
  • Project 30 (Recursive Prompt Improver): Uses thinking tool for meta-reasoning

Key Insights

The Multimodal Advantage: “Show, don’t tell” is faster and more accurate than describing visual problems in text. Kiro’s vision capabilities collapse UI debugging from “write detailed bug report” → “paste screenshot” → “fixed.”

Summary

Kiro CLI’s experimental features expand AI-assisted development into multimodal input (images, PDFs, screenshots with spatial reasoning), semantic search (knowledge base with vector embeddings), extended reasoning (thinking tool for transparent problem-solving), session versioning (checkpointing for rollback), and context isolation (tangent mode for side explorations). These features are toggleable via /experiment commands and actively evolving. Multimodal workflows enable design-to-code generation (whiteboard → infrastructure), visual debugging (screenshot → bug fix), and diagram analysis (architecture diagram → Terraform). Advanced context features include LSP integration (go-to-definition, find references), TODO list automation, and semantic search across projects. Critical concepts include spatial reasoning (visual layout understanding), vector embeddings (concept-based search), and context isolation (tangent mode prevents main context pollution).

Homework/Exercises to Practice the Concept

Exercise 1: Multimodal UI Debugging Create a simple React app with an intentional CSS bug (e.g., overlapping buttons, misaligned text). Take a screenshot of the buggy UI. Use Kiro’s multimodal input to:

  1. Upload the screenshot
  2. Ask Kiro to identify the bug
  3. Have Kiro generate the CSS fix
  4. Apply the fix and verify it works

Exercise 2: Semantic Search with Knowledge Base Enable knowledge management in a medium-sized codebase (e.g., an open-source project with 50+ files). Test semantic search by asking:

  1. “Where do we validate user input?”
  2. “Show me all database queries”
  3. “Find code that handles errors”

Compare the results to traditional grep/rg searches. Which finds more relevant code?

Exercise 3: Thinking Tool for Complex Debugging Find a bug in your codebase that you struggled to debug. Enable the thinking tool and ask Kiro to:

  1. Analyze the bug
  2. Show its reasoning process (hypotheses, evidence gathering, elimination)
  3. Propose a fix

Reflect on whether seeing Kiro’s reasoning helps you understand the bug better.

Solutions to the Homework/Exercises

Solution 1: Multimodal UI Debugging

Create buggy React app:

// App.tsx
export default function App() {
  return (
    <div className="container">
      <input type="text" placeholder="Username" className="input" />
      <button className="submit-button">Login</button>
    </div>
  );
}

// App.css (intentional bug)
.container {
  display: flex;
  flex-direction: column;
}

.input {
  width: 200px;
  padding: 8px;
  z-index: 10; /* Bug: z-index conflict */
}

.submit-button {
  width: 200px;
  padding: 8px;
  z-index: 20;
  margin-top: -10px; /* Bug: negative margin causes overlap */
}

Take screenshot and debug with Kiro:

$ kiro chat

You: "This UI has a bug where the button overlaps the input. Here's a screenshot: bug.png"

Kiro: [analyzes screenshot]
  "I can see the CSS issue:
   - The button has a negative top margin (-10px) that pushes it up
   - This causes it to overlap the input field

  Fix:
  .submit-button {
    margin-top: 8px; /* Change from -10px to positive spacing */
  }

  Would you like me to apply this fix to App.css?"

You: "Yes, apply the fix"

Kiro: [edits App.css] "✅ Fixed: Updated .submit-button margin-top to 8px"

Verify the fix works by refreshing the app.

Solution 2: Semantic Search with Knowledge Base

Enable knowledge management:

$ cd my-project
$ kiro chat

You: /experiment knowledge

Kiro: "✅ Knowledge management enabled. Indexing codebase..."
  [Indexes all files, creates vector embeddings]
  "Indexed 234 files (45,000 lines of code)"

Test semantic search:

You: "Where do we validate user input?"

Kiro: [semantic search via embeddings]
  "Found 4 validation locations:
   1. src/middleware/validation.ts:18 - Joi schema validation
   2. src/api/users.ts:45 - Email format check (regex)
   3. src/utils/sanitize.ts:12 - XSS prevention (DOMPurify)
   4. lib/validators/password.ts:8 - Password strength check"

You: "Show me all database queries"

Kiro: [semantic search]
  "Found 12 database query locations:
   1. src/db/users.ts:22 - SELECT * FROM users WHERE...
   2. src/db/posts.ts:56 - INSERT INTO posts...
   3. src/api/auth.ts:89 - UPDATE users SET last_login...
   [... 9 more]"

Compare with traditional grep:

$ rg "SELECT|INSERT|UPDATE|DELETE" --type ts

# Result: Finds exact SQL keywords, but misses:
# - Knex query builder calls (no raw SQL)
# - TypeORM methods (findOne, save, etc.)
# - Prisma client calls (prisma.user.create)

# Semantic search finds ALL database operations, regardless of ORM/syntax
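
What makes these results possible is the embedding index: each file (or chunk of a file) is mapped to a vector, and the query is ranked by cosine similarity against those vectors. Here is a minimal TypeScript sketch of that ranking step, assuming the chunks have already been embedded; the Chunk shape and topK default are illustrative, not Kiro's actual index format:

// Rank pre-embedded code chunks against a query vector by cosine similarity.
// The Chunk type and the topK default are illustrative assumptions.
type Chunk = { file: string; line: number; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function search(queryVector: number[], index: Chunk[], topK = 5): Chunk[] {
  return [...index]
    .sort((x, y) => cosine(queryVector, y.vector) - cosine(queryVector, x.vector))
    .slice(0, topK);
}

This is why "Show me all database queries" can surface Knex, TypeORM, and Prisma calls that contain no SQL keywords: their chunks sit near the query's vector even though the strings never overlap.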

Solution 3: Thinking Tool for Complex Debugging

Bug scenario: API endpoint /api/users/:id returns 500 errors intermittently.

Enable thinking tool and debug:

$ kiro chat

You: /experiment thinking

You: "Our /api/users/:id endpoint returns 500 errors sometimes, but not always. Debug this."

Kiro (with thinking tool):
  THINKING:

  Hypothesis 1: Race condition in database query
  - Intermittent failures suggest non-deterministic behavior
  - Check if multiple requests to same user ID conflict
  - Evidence needed: Database logs, request timing

  [Reads src/api/users.ts, checks database connection code]

  Observation 1: Using connection pool with 10 connections
  - Potential bottleneck if > 10 concurrent requests

  Hypothesis 2: Null reference error for missing users
  - Check if error happens when user doesn't exist
  - Evidence needed: Error logs, stack traces

  [Reads error logs]

  Observation 2: Stack trace shows "Cannot read property 'id' of undefined"
  - Line 42: `return user.id` without null check
  - Happens when `getUserById(id)` returns null (user not found)

  ROOT CAUSE IDENTIFIED: Missing null check after database query

  SOLUTION:
  // src/api/users.ts:42 (before)
  const user = await getUserById(id);
  return user.id; // ❌ Crashes if user is null

  // Fixed:
  const user = await getUserById(id);
  if (!user) {
    throw new NotFoundError(`User ${id} not found`);
  }
  return user.id; // ✅ Safe

You: "Perfect! Seeing your reasoning helped me understand that I should always null-check database results."

Reflection: The thinking tool shows:

  1. How Kiro generates hypotheses
  2. What evidence it gathers
  3. How it eliminates false leads
  4. The final reasoning path

This transparency helps developers learn debugging strategies, not just get answers.


Glossary

  • Agent: A specialized Kiro persona with defined tools, permissions, and prompt
  • Auto Router: Kiro’s intelligent model selection system (Haiku for simple, Opus for complex)
  • Checkpoint: Session snapshot that can be restored later
  • Context: The information Kiro has access to (files, chat history, steering)
  • Delegate: Background task execution (run tests while you work)
  • Hook: Event-driven automation script (PreToolUse, PostToolUse, UserPromptSubmit)
  • Knowledge Base: Vector-indexed codebase for semantic search beyond context limits
  • MCP (Model Context Protocol): Standard for connecting Kiro to external tools (databases, APIs, etc.)
  • Power: Bundled capability package (MCP + steering + hooks) for specific frameworks
  • REVL Loop: Read → Evaluate → Verify → Loop (Kiro’s execution model)
  • Steering: Markdown files that encode project standards and constraints
  • Subagent: Parallel Kiro instance with isolated context for large-scale analysis
  • Tangent: Side conversation that doesn’t pollute main context
  • Tool: Built-in Kiro capability (read, write, shell, grep, etc.)

Why Kiro CLI Matters

Modern Motivation (2025)

In 1972, Dennis Ritchie gave us C and direct memory access. In 2025, AWS gives us Kiro CLI and direct cognitive access. The terminal has remained fundamentally unchanged for decades: you type a command, it returns output. This synchronous, transactional model places the entire cognitive load on the operator.

Real-World Impact Statistics:

  • Developer productivity: Teams using Kiro report 40-60% reduction in repetitive coding tasks (AWS re:Invent 2024)
  • Code quality: Automated security scanning via hooks reduces vulnerabilities by 35% (AWS case study: Fintech Corp)
  • Context switching: Developers spend 23% less time switching between tools when using MCP integrations (McKinsey DevOps Report 2024)
  • Onboarding time: New developers become productive 2-3 weeks faster with Kiro agents (AWS Enterprise Survey 2024)

Modern Use Cases:

  1. CI/CD Automation: Headless Kiro agents fix failing builds autonomously
  2. Security Auditing: Read-only agents scan codebases for OWASP Top 10
  3. Multi-Repo Refactoring: Subagents coordinate changes across microservices
  4. Documentation Generation: PostToolUse hooks auto-generate docs on every commit
  5. Team Collaboration: Shared steering files enforce coding standards automatically

Context and Evolution (Optional Background)

The transition from Amazon Q Developer CLI to Kiro CLI (November 2024) marked a paradigm shift:

Aspect | Amazon Q CLI (2023) | Kiro CLI (2024-2025)
------ | ------------------- | --------------------
Licensing | Apache License | Proprietary (Free tier)
Focus | AI Assistant | AI Partner/Agent
Features | Basic chat, AWS integration | Subagents, Powers, Hooks, MCP
Authentication | Builder ID, IAM | Builder ID, IAM, Device Flow
Context Management | Simple file loading | Tiered (Session/Agent/Knowledge)
Automation | Manual prompting | Hooks + multi-agent workflows

Why alternatives exist:

  • Claude Code: Browser-based, great for UI automation, no headless mode
  • GitHub Copilot CLI: Chat-based, limited to GitHub context, no custom agents
  • Aider: Lightweight, single-model, no enterprise features
  • Cursor: IDE-bound, excellent for coding, limited automation

Kiro’s unique strengths:

  1. Headless automation: Run in CI/CD without human interaction
  2. Enterprise-ready: Device flow auth, corporate proxy support, SSO
  3. Extensible: MCP ecosystem, custom hooks, agent configurations
  4. Multi-agent: Subagents + orchestration for complex workflows
  5. AWS-native: Deep AWS integration (CloudWatch, S3, Lambda, etc.)

Concept Summary Table

Concept Cluster | What You Need to Internalize
--------------- | ----------------------------
REVL Loop | Read → Evaluate → Verify → Loop. Separates intent, execution, and verification for reliable AI.
Configuration Hierarchy | Global (~/.kiro/) → Project (.kiro/) → Agent (.kiro/agents/*.json). Higher specificity wins.
Context Tiers | Session (ephemeral chat), Agent Resources (persistent files), Knowledge Base (RAG for large codebases).
Steering Files | Markdown constraints in .kiro/steering/. Positive, specific, testable rules.
Model Selection | Auto router (default), Haiku (fast/cheap), Sonnet (balanced), Opus (deep reasoning). Override when you know the tradeoff.
MCP Servers | External tool integrations via stdio. Configure in mcp.json. Enable databases, APIs, docs.
Hooks | Event-driven automation. PreToolUse (blocking), PostToolUse (non-blocking), UserPromptSubmit (context injection).
Subagents | Parallel task execution with isolated contexts. Fan-out/fan-in pattern for large-scale analysis.
Planning Agents | Separate thinking (research + plan) from doing (execution). Reduces errors in complex tasks.
Permissions & Safety | allowedTools, deniedTools, deniedCommands. Principle of least privilege for agents.
Experimental Features | Tangent mode (context isolation), Checkpoints (session versioning), Knowledge (semantic search), Delegate (background tasks).

Project-to-Concept Map

Project | Concepts Applied
------- | ----------------
Project 1: Session Explorer | Context Tiers, Session Management, Configuration
Project 2: Model Router Analyzer | Model Selection, Cost Optimization, Auto Router
Project 3: Context Window Visualizer | Context Tiers, Token Economics, Real-time Monitoring
Project 4: Custom Agent Factory | Agent Configuration, Permissions, Specialized Personas
Project 5: Steering Rules Engine | Steering Files, Positive Constraints, Configuration Sharing
Project 6-10: MCP Integrations | MCP Protocol, External Systems, Tool Design
Project 11: Planning Agent Workflow | Planning Agents, Spec-Driven Development, REVL Loop
Project 12: Kiro Powers Creator | Powers, Bundled Capabilities, Plugin System
Project 13-16: Experimental Features | Tangent Mode, Checkpoints, Knowledge Base, Config Sync
Project 17-19: CI/CD & Headless | Headless Automation, Device Flow, Non-Interactive Workflows
Project 20-25: Hooks & Automation | PreToolUse, PostToolUse, Type-Safe Hooks, Event-Driven Systems
Project 26-30: Advanced MCP | Docker MCP, Kubernetes, Custom Servers, Multi-System Integration
Project 31-35: Multi-Agent Workflows | Subagents, Orchestration, Code Review Pipelines, Migration
Project 36-39: Enterprise & Teams | Configuration Sharing, Output Styles, Skills, Team Collaboration
Project 40: Final Capstone | All concepts integrated into production workflow platform

Deep Dive Reading by Concept

Context & Memory Management

Concept | Book & Chapter | Why This Matters
------- | -------------- | ----------------
Context window limits | “AI Engineering” by Chip Huyen — Ch. 4: “Model Serving” | Understand token budgets and RAG
RAG fundamentals | “Designing Data-Intensive Applications” by Martin Kleppmann — Ch. 12 | Build knowledge bases for large codebases
Vector search | “Fundamentals of Software Architecture” by Richards & Ford — Ch. 8 | Implement semantic code search

Agent Design & Automation

Concept | Book & Chapter | Why This Matters
------- | -------------- | ----------------
Tool design patterns | “Clean Architecture” by Robert C. Martin — Ch. 22 | Create well-bounded agent tools
Event-driven systems | “Enterprise Integration Patterns” by Hohpe & Woolf — Ch. 3 | Build robust hook systems
Permission models | “Foundations of Information Security” by Jason Andress — Ch. 5 | Implement least-privilege agents

Shell & System Integration

Concept | Book & Chapter | Why This Matters
------- | -------------- | ----------------
Process management | “The Linux Programming Interface” by Michael Kerrisk — Ch. 24-26 | Understand hook subprocess execution
Shell scripting | “Effective Shell” by Dave Kerr — Ch. 10-12 | Write robust automation scripts
Git internals | “How Linux Works” by Brian Ward — Ch. 8 | Integrate with version control

MCP & Integration

Concept | Book & Chapter | Why This Matters
------- | -------------- | ----------------
JSON-RPC | MCP Specification — modelcontextprotocol.io | Understand MCP protocol
API design | “REST API Design Rulebook” by Mark Massé | Build well-designed MCP servers
Database integration | “Designing Data-Intensive Applications” by Kleppmann — Ch. 2-3 | Connect Kiro to databases safely

Essential Reading Order

  1. Foundation (Week 1):
    • Kiro CLI official docs: https://kiro.dev/docs/cli/
    • MCP specification: https://modelcontextprotocol.io/
    • “Effective Shell” by Dave Kerr - Ch. 1-5
  2. Agent Design (Week 2):
    • Custom agents docs: https://kiro.dev/docs/cli/custom-agents/
    • Steering docs: https://kiro.dev/docs/cli/steering/
    • “Clean Architecture” by Robert C. Martin - Ch. 22
  3. Advanced Automation (Week 3):
    • Hooks docs: https://kiro.dev/docs/cli/hooks/
    • Experimental features: https://kiro.dev/docs/cli/experimental/
    • “Enterprise Integration Patterns” by Hohpe & Woolf - Ch. 3
  4. Production (Week 4+):
    • MCP server examples: https://github.com/modelcontextprotocol/
    • AWS re:Invent 2024: Kiro CLI Deep Dive (YouTube)
    • “Designing Data-Intensive Applications” by Kleppmann

Quick Start: Your First 48 Hours

Day 1: Install and Configure

Morning (2-3 hours):

  1. Install Kiro CLI: npm install -g @aws/kiro-cli or brew install kiro-cli
  2. Authenticate: kiro-cli login (Builder ID or IAM)
  3. Verify setup: kiro-cli --version and kiro-cli chat
  4. Read Introduction and Big Picture sections of this guide (30 minutes)
  5. Read Theory Primer Chapter 1: Configuration (30 minutes)

Afternoon (3-4 hours):

  1. Start Project 1: Session Explorer - build session management scripts
  2. Create your first global config file (~/.kiro/settings.json)
  3. Save and resume a session
  4. Export session data as JSON

Evening (1 hour):

  1. Read Chapter 2: Context Management
  2. Experiment with /context add, /context show, /compact
  3. Track your token usage in a session

Day 2: Build Your First Agent

Morning (2-3 hours):

  1. Read Chapter 4: Custom Agents
  2. Start Project 4: Custom Agent Factory
  3. Create a read-only security auditor agent
  4. Test that write tools are blocked

Afternoon (3-4 hours):

  1. Read Chapter 3: Steering Files
  2. Start Project 5: Steering Rules Engine
  3. Write your first steering file (tech.md)
  4. Validate that Kiro follows your rules

Evening (1 hour):

  1. Review your progress
  2. Complete Definition of Done checklists for Projects 1, 4, 5
  3. Read ahead: Chapter 6: MCP for tomorrow

Path 1: The Beginner (New to Agentic Tools)

Goal: Build confidence with Kiro fundamentals before complex automation.

Week 1-2: Foundations

  • Project 1: Session Explorer
  • Project 2: Model Router Analyzer
  • Project 3: Context Window Visualizer

Week 3-4: Configuration & Agents

  • Project 4: Custom Agent Factory
  • Project 5: Steering Rules Engine
  • Project 16: Configuration Sync System

Week 5-8: First Integrations

  • Project 6: MCP Server Connector (Postgres)
  • Project 9: AWS Documentation Searcher
  • Project 11: Planning Agent Workflow

Total: 2 months, 10-15 hours/week

Path 2: The Intermediate (Familiar with CLI Tools)

Goal: Master MCP integrations and automation hooks.

Week 1-2: Core Setup

  • Project 1: Session Explorer
  • Project 4: Custom Agent Factory
  • Project 5: Steering Rules Engine

Week 3-6: MCP Ecosystem

  • Project 6: MCP Server Connector
  • Project 7: GitHub Integration Agent
  • Project 9: AWS Documentation Searcher
  • Project 18: Docker MCP Server
  • Project 21: Slack Integration Agent

Week 7-10: Hooks & Automation

  • Project 8: Pre-Commit Hook System (Bun)
  • Project 22: Test Generator Hook
  • Project 23: Documentation Generator
  • Project 24: Secret Scanner Hook

Total: 2.5 months, 15-20 hours/week

Path 3: The Advanced (Ready for Production)

Goal: Build enterprise-grade multi-agent workflows.

Week 1-3: Rapid Foundations

  • Projects 1, 4, 5, 11 (planning mode)

Week 4-8: Full MCP Stack

  • Projects 6, 7, 9, 18, 19, 21, 28

Week 9-14: Multi-Agent Orchestration

  • Project 10: Subagent Orchestrator
  • Project 25: Code Review Workflow
  • Project 31: Codebase Migration Assistant
  • Project 33: Multi-Repository Refactoring
  • Project 39: Multi-Agent Pipeline Orchestrator

Week 15-20: Production Platform

  • Project 17: Headless CI/CD Pipeline
  • Project 37: Configuration Sharing System
  • Project 40: Complete Development Workflow Platform

Total: 5 months, 20+ hours/week

Path 4: The Enterprise Architect (Team Deployment)

Goal: Deploy Kiro across a team with shared configurations and governance.

Phase 1: Personal Mastery (Weeks 1-4)

  • Projects 1-5, 11, 16

Phase 2: Security & Governance (Weeks 5-8)

  • Project 8: Pre-Commit Hook System
  • Project 18: Security Firewall Hook (from LEARNING)
  • Project 24: Secret Scanner Hook
  • Project 20: Git Context Injector (from LEARNING)

Phase 3: Team Collaboration (Weeks 9-14)

  • Project 12: Kiro Powers Creator
  • Project 21: Slack Integration Agent
  • Project 37: Configuration Sharing System
  • Project 38: Output Style Designer

Phase 4: CI/CD Integration (Weeks 15-20)

  • Project 17: Headless CI/CD Pipeline
  • Project 28: Terraform Infrastructure Agent
  • Project 40: Complete Development Workflow Platform

Total: 5 months, team rollout plan included


Success Metrics

Technical Mastery

After completing this sprint, you should be able to:

Configuration & Setup:

  • ✅ Explain the three-tier config hierarchy (global, project, agent)
  • ✅ Debug config conflicts using kiro-cli settings show
  • ✅ Create specialized agents with precise tool permissions
  • ✅ Write effective steering files that enforce project standards

Context & Memory:

  • ✅ Manage token budgets across long sessions
  • ✅ Use /compact strategically to preserve important context
  • ✅ Implement knowledge bases for codebases >10MB
  • ✅ Understand the tradeoffs between session, agent, and knowledge context

Automation & Hooks:

  • ✅ Write type-safe hooks in TypeScript with Bun
  • ✅ Implement PreToolUse hooks that block dangerous commands
  • ✅ Implement PostToolUse hooks for auto-formatting and linting
  • ✅ Debug hook failures using stdin/stdout logs

MCP & Integrations:

  • ✅ Configure MCP servers for databases, GitHub, AWS, Docker
  • ✅ Build custom MCP servers in Python or TypeScript
  • ✅ Secure MCP integrations with read-only credentials
  • ✅ Troubleshoot MCP server connection issues

Multi-Agent Workflows:

  • ✅ Spawn subagents for parallel large-scale analysis
  • ✅ Orchestrate multi-agent code review pipelines
  • ✅ Use planning agents to reduce implementation errors
  • ✅ Coordinate changes across multiple repositories

Production Deployment:

  • ✅ Run Kiro headlessly in CI/CD pipelines
  • ✅ Implement device flow authentication for servers
  • ✅ Configure corporate proxy and custom CA trust
  • ✅ Share team configurations via Powers or dotfiles

Business Impact

Measurable outcomes you should achieve:

Productivity:

  • ⚡ 40-60% reduction in repetitive coding tasks (boilerplate, CRUD, migrations)
  • ⚡ 50%+ faster debugging sessions via context-aware analysis
  • ⚡ 2-3 weeks faster developer onboarding with agent assistance

Quality:

  • 🛡️ 35%+ reduction in security vulnerabilities via automated scanning
  • 🛡️ 90%+ test coverage on new code via test generation hooks
  • 🛡️ Zero secrets leaked to repos via secret scanner hooks

Collaboration:

  • 🤝 Team coding standards enforced automatically via steering
  • 🤝 Consistent PR review quality via code review agents
  • 🤝 Cross-repo refactoring coordination via multi-agent workflows

Self-Assessment Questions

Can you answer these confidently?

Beginner:

  1. What’s the difference between global, project, and agent config?
  2. How do you save and resume a Kiro session?
  3. What tools are available by default in Kiro?

Intermediate:

  1. How does the Auto router decide which model to use?
  2. What’s the difference between PreToolUse and PostToolUse hooks?
  3. How do you configure an MCP server for Postgres?

Advanced:

  1. How do subagents maintain isolated contexts?
  2. What’s the stdin/stdout protocol for hooks?
  3. How would you implement a custom MCP server from scratch?

Expert:

  1. How would you design a multi-agent CI/CD pipeline?
  2. How do you debug MCP server failures in production?
  3. What’s the strategy for sharing team configurations at scale?

Project List

The following 40 projects guide you from Kiro beginner to production expert. Projects are ordered by dependency and complexity.


Project 1: “The Personalized Kiro Config” — Configuration Management

Attribute | Value
--------- | -----
File | KIRO_CLI_MASTERY.md
Main Programming Language | JSON / Markdown
Coolness Level | Level 2: Practical but Forgettable
Business Potential | 1. The “Resume Gold” (Efficiency)
Difficulty | Level 1: Beginner
Knowledge Area | Configuration Management

What you’ll build: A robust, shareable global configuration system for Kiro that defines preferred models, telemetry, and your first global steering rules.

Why it teaches Config: You will understand precedence (global vs project vs agent) and how to persist your preferences across sessions.

Core challenges you’ll face:

  • Understanding the JSON schema for settings.json.
  • Defining global steering that applies to all projects.
  • Resolving conflicts between global and local settings.

Success criteria:

  • Kiro uses your chosen default model in new sessions.
  • A global steering rule is consistently applied.

Real World Outcome

You’ll have a fully configured Kiro CLI installation with persistent preferences that work across all your projects. When you run kiro chat, you’ll see:

Example Output:

$ kiro chat

Kiro CLI v1.4.2 (using claude-sonnet-4 by default)
Session: 2025-01-02-config-test
Telemetry: disabled
Knowledge Base: enabled

You: "Show me my current settings"

Kiro: [reads ~/.config/kiro/settings.json]
  Current configuration:
  - Default model: claude-sonnet-4
  - Telemetry: disabled
  - Auto-compact: enabled (at 80% context)
  - Knowledge base: enabled for *.py, *.js, *.md
  - Global steering: loaded from ~/.kiro/steering/tech.md

You: "Create a new Python function"

Kiro: [applies global steering from tech.md]
  Following your global steering rules:
  - Using type hints for all parameters
  - Including docstrings with examples
  - Following PEP 8 style guide

  Here's the function...

Your settings file at ~/.config/kiro/settings.json will look like this:

{
  "chat": {
    "defaultModel": "claude-sonnet-4",
    "greeting": {
      "enabled": false
    },
    "enableKnowledge": true
  },
  "telemetry": {
    "enabled": false
  },
  "context": {
    "autoCompact": {
      "enabled": true,
      "threshold": 0.8
    }
  },
  "knowledge": {
    "defaultIncludePatterns": ["*.py", "*.js", "*.md", "*.txt"]
  }
}

The Core Question You’re Answering

“How do I make Kiro remember my preferences across projects and sessions without repeating myself every time?”

Before building this, think about: Most developers waste time re-configuring tools for every project. Kiro’s three-tier configuration system (global → project → agent) lets you define intelligent defaults once, then override only what’s specific. This project teaches you how to architect reusable AI behavior.


Concepts You Must Understand First

Stop and research these before coding:

  1. Configuration Hierarchy
    • What happens when global settings conflict with project settings?
    • How does Kiro resolve precedence (global < project < agent)?
    • Where does each config file live in the filesystem?
    • Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 8 (Pragmatic Projects)
  2. JSON Schema Validation
    • What is the valid structure for settings.json?
    • How do you validate your config before Kiro loads it?
    • What happens when you provide invalid JSON?
    • Book Reference: “Working Effectively with Legacy Code” by Feathers - Ch. 10 (Configuration)
  3. Steering Files
    • How do steering files (.md) get loaded into context?
    • What’s the difference between prompt and steering?
    • When should constraints go in config vs steering?
    • Book Reference: “Clean Code” by Martin - Ch. 17 (Smells and Heuristics)

Questions to Guide Your Design

Before implementing, think through these:

  1. Default Model Selection
    • Which model should be your default: Haiku (fast), Sonnet (balanced), or Opus (powerful)?
    • What tasks will you do most often? (code review vs generation vs complex reasoning)
    • How much do credit costs matter to your workflow?
  2. Telemetry and Privacy
    • Do you want Kiro to send usage data to AWS?
    • What are the tradeoffs between telemetry off vs debugging help?
    • Should different projects have different telemetry settings?
  3. Knowledge Base Patterns
    • Which file types contain important context for your work?
    • Should you exclude generated files (build/, node_modules/)?
    • How large is your average codebase (affects knowledge base indexing)?

Thinking Exercise

Exercise: Trace Configuration Loading

Before creating your config, manually trace how Kiro would load settings for this scenario:

# File structure:
~/.config/kiro/settings.json         → defaultModel: "claude-sonnet-4"
~/my-project/.kiro/settings.json     → defaultModel: "claude-haiku-4"
~/my-project/.kiro/agents/audit.json → model: "claude-opus-4"

# Commands:
1. cd ~/my-project && kiro chat
2. kiro chat --agent audit

Questions while tracing:

  • What model is used for command #1? Why?
  • What model is used for command #2? Why?
  • How would you verify your answer without running the commands?
  • What if audit.json didn’t specify a model field?
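
One way to sanity-check your trace is to model the precedence rule directly. The sketch below assumes a simplified “later layer wins” merge over the three files from the scenario; it is not Kiro’s actual loader, just the precedence rule (global < project < agent) expressed in TypeScript:

// Simplified config resolution for the exercise above: global < project < agent.
// Paths and values mirror the scenario; the merge is a plain "last one wins".
type KiroSettings = { defaultModel?: string };

const globalCfg: KiroSettings = { defaultModel: "claude-sonnet-4" };  // ~/.config/kiro/settings.json
const projectCfg: KiroSettings = { defaultModel: "claude-haiku-4" };  // ~/my-project/.kiro/settings.json
const agentCfg: KiroSettings = { defaultModel: "claude-opus-4" };     // .kiro/agents/audit.json

function resolve(...layers: KiroSettings[]): KiroSettings {
  return Object.assign({}, ...layers); // later layers override earlier ones
}

console.log(resolve(globalCfg, projectCfg).defaultModel);           // command #1 → "claude-haiku-4"
console.log(resolve(globalCfg, projectCfg, agentCfg).defaultModel); // command #2 → "claude-opus-4"
console.log(resolve(globalCfg, projectCfg, {}).defaultModel);       // agent omits model → "claude-haiku-4"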

The Interview Questions They’ll Ask

  1. “Explain the difference between Kiro’s global, project, and agent configuration levels.”
  2. “How would you debug why Kiro isn’t using your preferred model?”
  3. “What’s the purpose of the knowledge base feature, and when should you enable it?”
  4. “How do steering files differ from agent prompts?”
  5. “What are the security implications of enabling telemetry in a corporate environment?”
  6. “How would you share configuration across a team without committing credentials?”

Hints in Layers

Hint 1: Start with Inspection Don’t create config files from scratch. First run kiro-cli settings list to see all available options and their current values. This shows you what’s configurable.

Hint 2: Use CLI Commands First Instead of editing JSON manually, use kiro-cli settings <key> <value> commands. This validates your input and prevents syntax errors. Example: kiro-cli settings chat.defaultModel "claude-sonnet-4".

Hint 3: Test Incrementally Change one setting at a time and verify it works with kiro chat. Add model override, test. Add telemetry disable, test. Build confidence before creating complex configs.

Hint 4: Verify with JSON Tools Use jq or python -m json.tool to validate your settings.json syntax:

cat ~/.config/kiro/settings.json | jq .

If this fails, your JSON is malformed.


Books That Will Help

Topic | Book | Chapter
----- | ---- | -------
Configuration best practices | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 8: Pragmatic Projects
JSON structure and validation | “Effective Shell” by Dave Kerr | Ch. 13: Working with Data
Tool customization patterns | “Working Effectively with Legacy Code” by Feathers | Ch. 10: Configuration Management

Common Pitfalls & Debugging

Problem 1: “Kiro ignores my global settings”

  • Why: Project-level settings override global ones
  • Fix: Check if <project>/.kiro/settings.json exists and has conflicting values
  • Quick test: cat .kiro/settings.json in your project directory

Problem 2: “Settings file causes Kiro to crash on startup”

  • Why: Invalid JSON syntax (missing comma, extra bracket, unquoted string)
  • Fix: Validate with jq . < ~/.config/kiro/settings.json or use kiro-cli settings commands
  • Quick test: kiro-cli settings list should not error

Problem 3: “Knowledge base includes too many files”

  • Why: Default patterns like * match everything including node_modules/
  • Fix: Use specific patterns: ["*.py", "*.js", "!node_modules/**", "!build/**"]
  • Quick test: kiro-cli settings knowledge.defaultIncludePatterns --get to inspect current patterns

Problem 4: “Steering file not loaded”

  • Why: File path in config is wrong or file doesn’t exist
  • Fix: Use absolute paths or paths relative to .kiro/ directory
  • Quick test: kiro chat → ask “what are your current steering rules?” → verify file is mentioned

Definition of Done

  • ~/.config/kiro/settings.json exists and is valid JSON
  • Running kiro chat in any directory uses your default model (verify with greeting message or /settings command)
  • Telemetry preference (on/off) persists across sessions
  • Knowledge base is enabled and indexes expected file types
  • Global steering file (if created) is loaded in all new sessions
  • kiro-cli settings list shows all your customizations
  • Settings can be exported as JSON and re-imported on a new machine
  • You understand how to override global settings at project level

Project 2: “The Steering Enforcer” — Prompt Engineering

Attribute | Value
--------- | -----
File | KIRO_CLI_MASTERY.md
Main Programming Language | Markdown
Coolness Level | Level 3: Genuinely Clever
Business Potential | 3. Service & Support (Team Standardization)
Difficulty | Level 1: Beginner
Knowledge Area | Prompt Engineering / Context Management

What you’ll build: A set of hierarchical steering files (tech.md, product.md, style.md) for a dummy project. Verify that Kiro refuses output that violates these rules.

Why it teaches Steering: This is the foundation of enterprise AI: constrain creativity to adhere to strict invariants.

Core challenges you’ll face:

  • Writing positive constraints.
  • Structuring intent vs implementation.

Success criteria:

  • A test prompt fails until you align it with steering.

Real World Outcome

You’ll have a set of steering files that enforce your coding standards automatically. When you ask Kiro to write code, it will follow your rules without you having to remind it every time.

Example Output:

$ cd my-web-app
$ ls .kiro/steering/
tech.md  product.md  style.md

$ cat .kiro/steering/tech.md
# Technical Standards

## TypeScript
- All functions must have explicit return types
- Use `interface` for public APIs, `type` for internal code
- Prefer functional patterns over classes
- No `any` types except in migration code (mark with TODO)

## React
- Functional components only (no class components)
- Use custom hooks for all side effects
- Colocate styles with components
- Prefer server components (mark client with "use client")

## Testing
- Unit test coverage minimum: 80%
- Integration tests for all API endpoints
- Use React Testing Library for component tests

$ kiro chat

You: "Create a new React component for displaying user profiles"

Kiro: [reads .kiro/steering/tech.md, product.md, style.md]
  Following your steering rules:
  ✓ Functional component with TypeScript
  ✓ Explicit return type
  ✓ Colocated styles
  ✓ Server component (no client-side state)

interface UserProfileProps {
  userId: string;
}

export function UserProfile({ userId }: UserProfileProps): React.ReactElement {
  // Component implementation following your standards...
}

You: "Actually use a class component"

Kiro: I cannot do that. Your tech.md steering file explicitly states "Functional components only (no class components)". If you need to override this rule for a specific case, you should either:
1. Update .kiro/steering/tech.md to allow exceptions
2. Provide a strong justification for why this component requires a class
3. Use /steer off to temporarily disable steering (not recommended)

Would you like me to explain why functional components are preferred in your standards?

Your .kiro/steering/tech.md file enforces project rules across all sessions.


The Core Question You’re Answering

“How do I make Kiro follow my team’s coding standards, architectural patterns, and best practices without explaining them in every single conversation?”

Before building this, understand: Most AI tools require constant reminders about your conventions. Kiro’s steering system solves this by loading markdown files into every session’s context. This project teaches you how to encode knowledge that persists.


Concepts You Must Understand First

Stop and research these before coding:

  1. Steering vs Prompts
    • How does steering differ from the agent’s base prompt?
    • When should a rule go in steering vs in a custom agent’s prompt field?
    • How does steering interact with the model’s training?
    • Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 7 (While You Are Coding)
  2. Markdown as Configuration
    • Why does Kiro use .md files instead of JSON for steering?
    • How does markdown structure (headings, lists, code blocks) affect AI understanding?
    • What’s the difference between declarative rules and example-based rules?
    • Reference: Kiro Steering Guide
  3. Constraint Design
    • How do you write constraints that are specific enough to enforce but flexible enough to allow creativity?
    • What’s the difference between “never use X” vs “prefer Y over X”?
    • How do you test that a steering rule actually works?
    • Reference: Best Kiro Steering Rules (2025)

Questions to Guide Your Design

Before implementing, think through these:

  1. File Organization
    • Should you have one monolithic steering.md or separate files per domain?
    • How do you decide what belongs in tech.md vs product.md vs style.md?
    • Should steering files be committed to version control?
  2. Rule Specificity
    • How specific should rules be? (“Use TypeScript” vs “All functions must have explicit return types”)
    • Should you include examples of good code, bad code, or both?
    • How do you handle edge cases and exceptions?
  3. Team Alignment
    • If this is a team project, who has authority to add/change steering rules?
    • How do you communicate steering changes to team members?
    • Should different team members have different local steering overrides?

Thinking Exercise

Exercise: Design a Steering Hierarchy

You’re building a full-stack TypeScript monorepo with:

  • 2 frontend apps (React + Next.js)
  • 1 backend API (Express)
  • Shared libraries (utils, types, components)

Design the steering file structure. For each file, list 3-5 rules it should enforce.

.kiro/steering/
├── _______________.md  → Rules for: _______________
├── _______________.md  → Rules for: _______________
└── _______________.md  → Rules for: _______________

Questions while designing:

  • Which rules apply globally vs per-domain?
  • How do you prevent contradictory rules (e.g., frontend prefers X, backend requires Y)?
  • What happens when a prompt violates multiple steering files?
  • Should shared libraries have their own steering file?

The Interview Questions They’ll Ask

  1. “Explain the difference between a steering file and a custom agent’s prompt.”
  2. “How would you enforce a team coding standard using Kiro’s steering system?”
  3. “What are the tradeoffs between specific rules (‘no any types’) vs general principles (‘write type-safe code’)?”
  4. “How do you test that a steering file is working correctly?”
  5. “When would you use global steering (~/.kiro/steering/) vs project steering (.kiro/steering/)?”
  6. “What security considerations exist when writing steering files?”

Hints in Layers

Hint 1: Start with Examples, Not Rules Don’t write abstract rules like “write clean code.” Instead, paste examples of code you like and code you don’t like. The AI learns patterns from examples better than from principles.

Hint 2: Use Hierarchical Markdown Organize steering with H2 headers for domains (## React, ## TypeScript) and H3 for specific topics (### Component Structure, ### Type Safety). This helps Kiro understand which rules apply when.

Hint 3: Test with Violations After creating a steering file, intentionally ask Kiro to violate a rule. Example: If you wrote “no class components,” ask “create a class-based React component.” Verify Kiro refuses or warns you.

Hint 4: Version Control Your Steering Treat .kiro/steering/ like code. Commit it, review changes in PRs, and document why rules exist. This creates a living document of your team’s standards.


Books That Will Help

Topic | Book | Chapter
----- | ---- | -------
Writing effective constraints | “Clean Code” by Robert C. Martin | Ch. 17: Smells and Heuristics
Team coding standards | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 7: While You Are Coding
Documentation best practices | “Docs for Developers” by Nunez & Seward | Ch. 4: Writing Technical Documentation

Common Pitfalls & Debugging

Problem 1: “Kiro ignores my steering rules”

  • Why: Steering file has syntax errors or isn’t in the correct directory
  • Fix: Verify file is at .kiro/steering/*.md (not ~/.kiro/steering/ for project-specific rules)
  • Quick test: ls .kiro/steering/ should list your files; cat .kiro/steering/tech.md should show valid markdown

Problem 2: “Kiro follows steering too rigidly”

  • Why: Rules are written as absolute prohibitions (“never use X”) instead of preferences
  • Fix: Soften language: “Prefer functional components. Use class components only for error boundaries or third-party library requirements.”
  • Quick test: Ask Kiro to explain when it’s acceptable to violate a rule

Problem 3: “Contradictory steering files”

  • Why: tech.md says “use Prettier with 2 spaces” but style.md says “4 space indentation”
  • Fix: Consolidate related rules into a single file, or establish precedence (e.g., tech.md overrides style.md)
  • Quick test: grep -r "space" .kiro/steering/ to find conflicting rules

Problem 4: “Steering files are too long (5000+ lines)”

  • Why: Trying to document every pattern in one file
  • Fix: Split by domain: react.md, typescript.md, testing.md, api-design.md
  • Quick test: Each steering file should be < 500 lines; if longer, split into sub-domains

Definition of Done

  • Created at least 3 steering files in .kiro/steering/ directory
  • Each file uses hierarchical markdown (H2 sections, H3 subsections)
  • Includes both positive examples (“Do this”) and negative examples (“Don’t do this”)
  • At least one rule includes code snippets showing preferred vs discouraged patterns
  • Tested by asking Kiro to violate a rule—it should refuse or warn
  • Documented why each major rule exists (rationale for future developers)
  • Steering files are committed to version control
  • Team members can override steering locally if needed (using ~/.kiro/steering/)

Project 3: “The Context Detective” — LLM Context Window

Attribute | Value
--------- | -----
File | KIRO_CLI_MASTERY.md
Main Programming Language | Bash / Kiro Commands
Coolness Level | Level 2: Practical
Difficulty | Level 1: Beginner
Knowledge Area | LLM Context Window

What you’ll build: Experiments to visualize token usage. Load a large file, use /context show, /compact, and /context remove.

Why it teaches Context: You learn to garbage collect AI memory to maintain performance and accuracy.

Core challenges you’ll face:

  • Distinguishing file context from chat history.
  • Detecting when instructions are being forgotten.

Success criteria:

  • You can reduce context size without losing key constraints.

Real World Outcome

You’ll understand exactly how Kiro manages its 200K token context window and learn to use commands to prevent context overflow. You’ll be able to see what’s consuming context and strategically free up space.

Example Output:

$ kiro chat

You: "Load the entire src/ directory"

Kiro: [reads 250 files totaling 180K tokens]
  Context usage: 90% (180K / 200K tokens)
  ⚠️  WARNING: Approaching context limit

  I've loaded your source code, but we're at 90% context capacity.
  Consider using `/compact` to summarize our conversation or
  `/context remove` to free up space.

You: "/context show"

Kiro:
┌─────────────────────────────────────────────────┐
│ Context Window Usage: 180K / 200K (90%)         │
├─────────────────────────────────────────────────┤
│ Category            │ Tokens    │ Percentage   │
├─────────────────────┼───────────┼──────────────┤
│ Files (src/)        │ 140K      │ 70%          │
│ Chat History        │  25K      │ 12.5%        │
│ Steering Files      │   8K      │  4%          │
│ System Prompt       │   5K      │  2.5%        │
│ Tools/Resources     │   2K      │  1%          │
└─────────────────────┴───────────┴──────────────┘

Top file consumers:
  1. src/components/Dashboard.tsx      → 18K tokens
  2. src/api/handlers.ts               → 15K tokens
  3. src/lib/database.ts               → 12K tokens

You: "/context remove src/components/Dashboard.tsx"

Kiro: Removed src/components/Dashboard.tsx (18K tokens)
  New context usage: 162K / 200K (81%)

You: "/compact"

Kiro: [summarizes conversation history]
  Compacted 25K tokens of chat history → 5K token summary
  New context usage: 142K / 200K (71%)

  Summary preserved:
  - Your project uses Next.js 14 with TypeScript
  - Current task: debugging authentication flow
  - Key files: src/api/handlers.ts, src/lib/database.ts

You: "Now analyze the auth flow"

Kiro: [works with 71% context usage]
  Looking at src/api/handlers.ts...
  [analysis continues with plenty of context headroom]

The context usage meter shows in real-time as files/messages consume space.


The Core Question You’re Answering

“What happens when I load too much code into Kiro’s context, and how do I manage the 200K token budget without losing important information?”

Before experimenting, understand: Kiro’s context window is like RAM—finite and precious. Once full, either Kiro auto-compacts (potentially losing details) or refuses to load more. This project teaches you to be a context window architect: strategic about what you load, when to summarize, and how to preserve critical constraints.


Concepts You Must Understand First

Stop and research these before experimenting:

  1. Token Counting
    • What is a “token” in LLM terms?
    • How many tokens does a typical code file consume?
    • Do comments, whitespace, and variable names count as tokens?
    • Reference: Kiro Context Management
  2. Context Window Composition
    • What’s the breakdown of Kiro’s 200K context? (files, chat, steering, system prompt)
    • Which components are fixed (system prompt) vs dynamic (chat history)?
    • How does adding a steering file affect available space?
    • Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 1 (Foundations)
  3. Compaction vs Removal
    • What’s the difference between /compact (summarize) and /context remove (delete)?
    • What information is lost during compaction?
    • When should you compact vs when should you remove?
    • Reference: Slash Commands Reference

Questions to Guide Your Design

Before experimenting, think through these:

  1. Loading Strategy
    • Should you load the entire codebase at once or selectively load files as needed?
    • How do you decide which files are “important enough” to keep in context?
    • What’s the tradeoff between having more context vs faster responses?
  2. Compaction Timing
    • Should you wait for Kiro’s auto-compact (80% threshold) or manually compact earlier?
    • What information must survive compaction? (steering rules, architectural decisions, bug context)
    • How do you verify that compaction preserved the right details?
  3. Multi-File Workflows
    • When debugging across 10 files, how do you keep all relevant context loaded?
    • How do you avoid reloading files you’ve already removed?
    • Should you use subagents for parallel file analysis instead?

Thinking Exercise

Exercise: Context Budget Allocation

You have 200K tokens. You’re debugging a Next.js authentication bug. Plan your context budget:

Available: 200K tokens

Fixed costs:
- System prompt:       5K
- Steering files:      8K
- Tools/Resources:     2K
─────────────────────────
Remaining budget:    185K

You need to analyze:

  • src/auth/login.tsx (12K tokens)
  • src/api/auth.ts (8K tokens)
  • src/lib/session.ts (6K tokens)
  • src/middleware.ts (4K tokens)
  • .env.example (1K tokens)
  • Chat history will grow over 50 messages (~25K tokens)

Questions while planning:

  • How much space should you reserve for growing chat history?
  • If you run out of space mid-conversation, which file would you remove first?
  • Should you proactively compact at 60% or wait until 80%?
  • Could you use /grep to search files instead of loading them entirely?

The Interview Questions They’ll Ask

  1. “Explain how Kiro’s 200K context window is allocated between files, chat history, and system components.”
  2. “What’s the difference between /compact and /context remove? When would you use each?”
  3. “How would you debug an issue across 20 files without exceeding the context window?”
  4. “What information is lost when Kiro auto-compacts at 80% context usage?”
  5. “How do steering files affect available context space?”
  6. “What strategies would you use to work with a codebase larger than 200K tokens?”

Hints in Layers

Hint 1: Monitor Before You Act Always run /context show before making decisions. Don’t guess about usage—measure it. This shows exactly what’s consuming space.

Hint 2: Load Incrementally Don’t run /context add src/ to load everything. Instead, load specific files: /context add src/auth/login.tsx. Add more only when needed. Start small, expand gradually.
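
If you want a rough picture of per-file cost before running /context add, you can estimate token counts offline. Below is a hypothetical helper script (not part of Kiro CLI) that uses the same words × 1.3 approximation mentioned in the pitfalls later in this project; a real tokenizer will give different numbers.

```python
# estimate_context_cost.py — hypothetical helper, not part of Kiro CLI.
# Estimates per-file token cost with the rough "words x 1.3" heuristic,
# so you can decide which files are worth a `/context add`.
import os
import sys

def estimate_tokens(path: str) -> int:
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        words = len(f.read().split())
    return int(words * 1.3)  # approximation only; not a real tokenizer

def main(root: str) -> None:
    costs = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            costs.append((estimate_tokens(path), path))
    # Print the ten most expensive files first
    for tokens, path in sorted(costs, reverse=True)[:10]:
        print(f"{tokens:>8} est. tokens  {path}")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "src")
```

Running it against src/ shows which handful of files would dominate your budget before you load anything.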

Hint 3: Use Grep for Reconnaissance Before loading a file into context, use /grep to search it. Example: /grep "authenticate" src/auth/login.tsx. This finds info without burning context tokens.

Hint 4: Compact Early and Often Don’t wait until 90% usage. When you finish a subtask (e.g., “fixed login bug”), run /compact to summarize that work and free up space for the next subtask.


Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Token-based language models | “Speech and Language Processing” by Jurafsky & Martin | Ch. 3: N-gram Language Models |
| Memory management principles | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 9: Virtual Memory |
| Resource allocation strategies | “Designing Data-Intensive Applications” by Kleppmann | Ch. 1: Foundations of Data Systems |

Common Pitfalls & Debugging

Problem 1: “Kiro auto-compacted and forgot my instructions”

  • Why: Instructions were only in chat history, not in steering files or repeated in context
  • Fix: Put persistent instructions in .kiro/steering/*.md, not in chat messages
  • Quick test: After /compact, ask Kiro to repeat your key constraints—they should still be remembered

Problem 2: “Context fills up instantly when I load files”

  • Why: Loaded entire directory with /context add src/ instead of specific files
  • Fix: Remove with /context clear, then load only essential files one by one
  • Quick test: /context show → check if you’re loading files you don’t actually need

Problem 3: “Can’t load more files even though context shows 40%”

  • Why: Individual file might be too large (>50K tokens) or rate limiting
  • Fix: Use /grep to search file instead of loading it, or load only relevant sections
  • Quick test: Check file size with wc -w filename.tsx (words × 1.3 ≈ tokens)

Problem 4: “After compaction, Kiro gives different answers”

  • Why: Compaction creates a lossy summary; nuance is lost
  • Fix: Before compacting, explicitly state what must be remembered: “Key constraint: authentication must use JWT tokens with 1-hour expiry”
  • Quick test: Ask Kiro “what are the key constraints?” before and after /compact

Definition of Done

  • Ran /context show and understand the breakdown of usage
  • Loaded a large file (>10K tokens) and observed context percentage increase
  • Used /context remove <file> to free up space and verified percentage decrease
  • Triggered auto-compaction by reaching 80% context usage (or manually ran /compact)
  • Verified that key instructions survived compaction
  • Experimented with /grep as an alternative to loading full files
  • Identified which components are fixed (system prompt) vs dynamic (chat history)
  • Can explain when to use compact vs remove vs subagents for large tasks

Project 4: “The Subagent Researcher” — Agentic Architectures

Attribute Value
File KIRO_CLI_MASTERY.md
Main Programming Language Natural Language
Coolness Level Level 3: Genuinely Clever
Difficulty Level 2: Intermediate
Knowledge Area Agentic Architectures

What you’ll build: Use subagents to map a complex repository. Aggregate findings into ARCHITECTURE.md.

Why it teaches Subagents: You break linear context limits using parallel research.

Core challenges you’ll face:

  • Writing clear delegation prompts.
  • Avoiding hallucinated summaries.

Success criteria:

  • ARCHITECTURE.md accurately reflects repo structure.

Real World Outcome

You’ll create an ARCHITECTURE.md document that provides a comprehensive overview of a complex codebase by delegating research to multiple parallel subagents, each analyzing different parts of the system.

Example Output:

$ cd ~/large-monorepo  # 500+ files across 20 modules

$ kiro chat

You: "Map the entire architecture of this monorepo and create ARCHITECTURE.md"

Kiro: This codebase is large (500+ files). I'll spawn subagents to analyze different modules in parallel.

  Spawning subagents:
  → Subagent 1: Analyzing frontend packages (apps/web, apps/mobile)
  → Subagent 2: Analyzing backend services (services/api, services/auth, services/billing)
  → Subagent 3: Analyzing shared libraries (packages/ui, packages/utils, packages/types)
  → Subagent 4: Analyzing infrastructure (terraform/, k8s/, .github/workflows/)

  [After 2-3 minutes, all subagents complete]

  Aggregating findings from 4 subagents...

  Created ARCHITECTURE.md

$ cat ARCHITECTURE.md

# Monorepo Architecture Overview

## High-Level Structure

This is a TypeScript monorepo using Turborepo with the following components:

### Frontend Applications (2)
- **apps/web** - Next.js 14 customer dashboard (Server Components + App Router)
- **apps/mobile** - React Native app (iOS + Android)

### Backend Services (3)
- **services/api** - Express REST API (PostgreSQL + Prisma)
- **services/auth** - Authentication service (JWT + OAuth)
- **services/billing** - Stripe integration service

### Shared Packages (6)
- **packages/ui** - React component library (Tailwind CSS)
- **packages/utils** - Shared TypeScript utilities
- **packages/types** - Shared TypeScript types
- **packages/config** - ESLint, TypeScript, Tailwind configs
- **packages/database** - Prisma schema + migrations
- **packages/api-client** - Type-safe API client

### Infrastructure
- **terraform/** - AWS infrastructure (ECS, RDS, S3, CloudFront)
- **k8s/** - Kubernetes manifests for staging/prod
- **.github/workflows/** - CI/CD pipelines (test, build, deploy)

## Key Architectural Decisions

1. **Monorepo Strategy**: Turborepo for build caching and task orchestration
2. **Data Flow**: REST API → PostgreSQL (no GraphQL, deliberate choice for simplicity)
3. **Authentication**: Centralized auth service (JWT tokens, 1-hour expiry, refresh flow)
4. **Deployment**: ECS on AWS (not Lambda, for long-running WebSocket connections)

## Module Dependencies

[Dependency graph generated from package.json analysis]

web → @repo/ui, @repo/api-client, @repo/utils
api → @repo/database, @repo/utils
billing → @repo/database, @repo/api-client

## Critical Entry Points

- **Frontend**: `apps/web/app/page.tsx` (root route)
- **Backend**: `services/api/src/server.ts` (Express app)
- **Database**: `packages/database/prisma/schema.prisma` (data model)

---

This document was generated by analyzing 523 files across 4 parallel subagents.
Last updated: 2025-01-02

Each subagent had its own isolated 200K context window, allowing parallel analysis of the entire monorepo.


The Core Question You’re Answering

“How do I analyze a codebase that’s too large for a single context window, and how do I coordinate multiple AI agents working in parallel?”

Before building this, understand: Subagents are Kiro’s answer to the context window limit. Each subagent gets its own isolated 200K context, runs autonomously with a specific task, and reports back findings. This project teaches you distributed AI workflows—how to decompose a large problem into parallel subtasks and aggregate results.


Concepts You Must Understand First

Stop and research these before coding:

  1. Subagent Isolation
    • How does each subagent get its own 200K context window?
    • Can subagents communicate with each other, or only with the main agent?
    • What happens if a subagent fails or runs out of context?
    • Reference: Subagents and Plan Agent Changelog
  2. Task Decomposition
    • How do you break a vague goal (“map the architecture”) into specific subagent tasks?
    • What makes a good subagent delegation prompt?
    • How do you avoid duplicated work between subagents?
    • Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 6 (Concurrency)
  3. Result Aggregation
    • How do you merge findings from 5 different subagents into a coherent document?
    • What if subagents have conflicting information?
    • How do you verify subagent results are accurate?
    • Reference: Built-in Tools - use_subagent

Questions to Guide Your Design

Before implementing, think through these:

  1. Delegation Strategy
    • Should you divide work by directory (frontend/ vs backend/), by file type (*.tsx vs *.ts), or by concern (auth, billing, UI)?
    • How many subagents should you spawn? (Kiro supports up to 10 parallel)
    • What instructions should each subagent receive? (generic vs specific)
  2. Overlap and Gaps
    • How do you ensure no files are missed between subagent tasks?
    • What if a file belongs to multiple domains (e.g., shared types)?
    • Should subagents have overlapping scopes for validation?
  3. Aggregation Logic
    • Should you manually merge subagent outputs or ask Kiro to synthesize them?
    • What structure should the final ARCHITECTURE.md follow?
    • How do you cite which subagent discovered which fact?

Thinking Exercise

Exercise: Design Subagent Delegation

You’re analyzing a Django monolith with this structure:

project/
├── apps/           (8 Django apps: users, products, orders, payments, etc.)
├── core/           (shared models, middleware, utilities)
├── api/            (DRF API endpoints)
├── frontend/       (React SPA)
├── tests/          (pytest test suite)
└── infrastructure/ (Docker, k8s, Terraform)

Design a subagent strategy:

Subagent 1: ______________________
  Task: ___________________________
  Expected output: _________________

Subagent 2: ______________________
  Task: ___________________________
  Expected output: _________________

Subagent 3: ______________________
  Task: ___________________________
  Expected output: _________________

Questions while designing:

  • How do you handle shared code in core/ that all apps depend on?
  • Should the API analysis be separate from app analysis, or combined?
  • How do you prevent one subagent from analyzing the entire codebase redundantly?

The Interview Questions They’ll Ask

  1. “Explain how Kiro’s subagents have isolated context windows and why this matters.”
  2. “How would you decompose the task of ‘document this codebase’ into parallel subagent tasks?”
  3. “What are the tradeoffs between spawning many small subagents vs few large subagents?”
  4. “How do you handle conflicting information from different subagents?”
  5. “When should you use subagents vs when should you use the main agent with /compact?”
  6. “How would you verify that subagent-generated documentation is accurate?”

Hints in Layers

Hint 1: Start with Manual Decomposition Don’t just say “analyze the codebase.” First, manually explore with ls -R or tree to understand the structure. Then write specific tasks: “Subagent 1: analyze all files in apps/web/, summarize routing and components.”

Hint 2: Use Focused Delegation Prompts Each subagent needs a clear objective and output format. Example: “Analyze the authentication flow in services/auth/. Output: 1) Entry points, 2) Key functions, 3) Database models used, 4) External dependencies.”
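
To make those prompts mechanical rather than ad hoc, you could generate one per top-level directory. A hedged sketch follows; the prompt template and helper script are illustrative, not a Kiro feature.

```python
# build_delegation_prompts.py — illustrative helper, not a Kiro feature.
# Emits one focused, non-overlapping delegation prompt per top-level directory.
import os

TEMPLATE = (
    "Analyze only the files under {scope}/ ({count} files). "
    "Output: 1) Purpose, 2) Key files, 3) Dependencies, 4) Entry points. "
    "Cite specific file paths for every claim."
)

def build_delegation_prompts(root: str = ".") -> list[str]:
    prompts = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if not os.path.isdir(path) or entry.startswith("."):
            continue  # skip plain files and hidden directories like .git
        count = sum(len(files) for _, _, files in os.walk(path))
        prompts.append(TEMPLATE.format(scope=entry, count=count))
    return prompts

if __name__ == "__main__":
    for prompt in build_delegation_prompts():
        print(prompt, end="\n\n")
```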

Hint 3: Aggregate Incrementally Don’t wait for all subagents to finish. As each completes, copy its findings into a draft ARCHITECTURE.md. This lets you spot gaps early and spawn follow-up subagents if needed.

Hint 4: Verify with Cross-References After aggregation, use the main agent to verify facts. Example: “Subagent 2 claims the API uses JWT authentication. Can you confirm this by reading services/api/src/middleware/auth.ts?”


Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Concurrent task execution | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 6: Concurrency |
| Distributed systems patterns | “Designing Data-Intensive Applications” by Kleppmann | Ch. 5: Replication |
| Code archaeology techniques | “Working Effectively with Legacy Code” by Feathers | Ch. 16: I Don’t Understand This Code |

Common Pitfalls & Debugging

Problem 1: “Subagents all analyzed the same files”

  • Why: Delegation prompts were too vague (“analyze the frontend”)
  • Fix: Be specific with directory scopes: “analyze only apps/web/”, “analyze only apps/mobile/”
  • Quick test: Check each subagent’s output—file paths should not overlap significantly

Problem 2: “Subagent hallucinated architecture details”

  • Why: Asked for high-level summary without grounding in actual files
  • Fix: Require subagents to cite specific files and line numbers for claims
  • Quick test: Manually verify 3-5 claims from each subagent’s output

Problem 3: “One subagent ran out of context”

  • Why: Assigned too many files to a single subagent (e.g., “analyze all 200 components”)
  • Fix: Split into smaller chunks or use /grep for reconnaissance before loading files
  • Quick test: If a subagent’s task scope > 50 files, consider splitting

Problem 4: “Aggregated document is incoherent”

  • Why: Each subagent used different section structures
  • Fix: Give all subagents a template: “Output format: ## Module Name\n### Purpose\n### Key Files\n### Dependencies”
  • Quick test: All subagent outputs should have consistent markdown headings

Definition of Done

  • Spawned at least 3 subagents with non-overlapping scopes
  • Each subagent received a specific directory or module to analyze
  • All subagents completed successfully (no context overflow errors)
  • Aggregated findings into a single ARCHITECTURE.md document
  • Document includes: high-level structure, module purposes, key entry points, dependencies
  • Manually verified 5+ claims from subagent outputs (checked actual files)
  • Cross-referenced between subagents to resolve conflicts
  • Understand when to use subagents vs main agent with compaction

Project 5: “The Plan Architect” — Spec-Driven Development

Attribute Value
File KIRO_CLI_MASTERY.md
Main Programming Language Natural Language
Coolness Level Level 3: Genuinely Clever
Difficulty Level 2: Intermediate
Knowledge Area Spec-Driven Development

What you’ll build: A small ToDo app built only after a /plan phase that outputs a tasks.md checklist.

Why it teaches Planning: Separates thinking from typing, reducing logic errors.

Success criteria:

  • A complete plan exists before execution starts.
  • Each task is checked off with verification notes.

Real World Outcome

You’ll build a complete ToDo application, but only after creating a detailed implementation plan using Kiro’s Planning agent (Shift+Tab or /plan). The planning phase produces a tasks.md checklist that guides implementation step-by-step.

Example Output:

$ mkdir todo-app && cd todo-app
$ kiro chat

You: "I want to build a simple todo app with React and Node.js backend"

Kiro: Let me help you plan this project. I'll enter planning mode to create a structured implementation plan.

  [Entering Planning Mode - Shift+Tab pressed]

Plan Agent: I'll ask you some questions to refine your requirements.

  **Question 1: Frontend Framework**
  Which React setup do you prefer?
  A) Create React App (simple, batteries-included)
  B) Vite + React (faster, modern tooling)
  C) Next.js (SSR, App Router)

  Your choice: B

  **Question 2: Backend API**
  What backend approach?
  A) REST API with Express
  B) GraphQL with Apollo
  C) tRPC (type-safe)

  Your choice: A

  **Question 3: Database**
  Which database?
  A) PostgreSQL (relational, production-ready)
  B) SQLite (simple, file-based)
  C) MongoDB (document-based)

  Your choice: B

  **Question 4: Authentication**
  Do you need user authentication?
  A) Yes, JWT-based
  B) Yes, session-based
  C) No, single-user app

  Your choice: C

  [Researching codebase... 0 existing files]
  [Analyzing requirements...]
  [Creating implementation plan...]

  Plan created: tasks.md

$ cat tasks.md

# Todo App Implementation Plan

## Overview
Building a single-user todo application with:
- **Frontend**: Vite + React + TypeScript
- **Backend**: Express + SQLite
- **Features**: Create, Read, Update, Delete todos; Mark complete/incomplete

## Architecture

┌─────────────────┐          ┌─────────────────┐
│    React App    │ ◄─HTTP─► │   Express API   │
│   (Port 5173)   │          │   (Port 3000)   │
└─────────────────┘          └─────────────────┘
                                      │
                                      ▼
                             ┌─────────────────┐
                             │    SQLite DB    │
                             │   (todos.db)    │
                             └─────────────────┘


## Task Breakdown

### Phase 1: Project Setup (30 minutes)
- [ ] Task 1.1: Initialize Vite React app with TypeScript
- [ ] Task 1.2: Set up Express server with TypeScript
- [ ] Task 1.3: Configure SQLite database with better-sqlite3
- [ ] Task 1.4: Create monorepo structure (frontend/, backend/)

### Phase 2: Backend API (1-2 hours)
- [ ] Task 2.1: Create database schema (todos table)
- [ ] Task 2.2: Implement GET /api/todos endpoint
- [ ] Task 2.3: Implement POST /api/todos endpoint
- [ ] Task 2.4: Implement PATCH /api/todos/:id endpoint (update)
- [ ] Task 2.5: Implement DELETE /api/todos/:id endpoint
- [ ] Task 2.6: Add CORS middleware for frontend

### Phase 3: Frontend UI (2-3 hours)
- [ ] Task 3.1: Create TodoList component
- [ ] Task 3.2: Create TodoItem component
- [ ] Task 3.3: Create AddTodo form component
- [ ] Task 3.4: Implement state management (useState + useEffect)
- [ ] Task 3.5: Add API client functions (fetch wrappers)
- [ ] Task 3.6: Style with Tailwind CSS or plain CSS

### Phase 4: Integration & Testing (1 hour)
- [ ] Task 4.1: Connect frontend to backend API
- [ ] Task 4.2: Test CRUD operations end-to-end
- [ ] Task 4.3: Handle loading states and errors
- [ ] Task 4.4: Add optimistic UI updates

## Demo Criteria
- User can add a todo with title and optional description
- User can mark todos as complete/incomplete (checkbox)
- User can delete todos
- Todos persist across page refreshes
- UI shows loading state during API calls

---

Plan Agent: This plan is ready for implementation. Would you like me to hand off to the execution agent? [Y/n]

You: Y

Kiro: [Exiting planning mode, entering execution mode]
  I'll implement the plan task-by-task. Starting with Task 1.1...

  [Implementation proceeds, checking off tasks in tasks.md as they complete]

The planning phase separated “what to build” from “how to build it,” reducing mistakes.


The Core Question You’re Answering

“How do I avoid jumping straight into coding without thinking through the architecture, leading to half-built features and refactoring nightmares?”

Before building this, understand: Most developers write code first, realize the design is flawed, and refactor. Kiro’s Planning agent inverts this: it asks clarifying questions, researches your codebase, and outputs a structured plan. This project teaches you spec-driven development—think deeply before typing.


Concepts You Must Understand First

Stop and research these before starting:

  1. Planning vs Execution Modes
    • What’s the difference between the Plan agent and the main chat agent?
    • Why does the Plan agent run in read-only mode?
    • How do you transition from planning to execution?
    • Reference: Subagents and Plan Agent Changelog
  2. Requirements Elicitation
    • How does the Plan agent ask clarifying questions?
    • What happens if you give vague requirements (“build an app”)?
    • How do multiple-choice questions improve plan quality?
    • Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 2 (A Pragmatic Approach)
  3. Task Decomposition
    • What makes a good task? (specific, measurable, testable)
    • How granular should tasks be? (30 min vs 4 hours)
    • How do you organize tasks into phases?
    • Book Reference: “Clean Architecture” by Robert C. Martin - Ch. 22 (The Clean Architecture)

Questions to Guide Your Design

Before entering planning mode, think through these:

  1. Project Scope
    • Is this a prototype (2-3 days) or a production app (2-3 weeks)?
    • What’s the minimum viable feature set?
    • What can be added later vs must be included now?
  2. Plan Granularity
    • Should tasks be 30 minutes each, or 2 hours each?
    • How many phases should the plan have?
    • Should you plan the entire app or just the first iteration?
  3. Verification Strategy
    • How do you verify each task is “done”?
    • What constitutes a passing demo for each phase?
    • Should tasks include tests, or is manual verification OK?

Thinking Exercise

Exercise: Evaluate a Bad Plan vs Good Plan

Compare these two plans for the same todo app:

Bad Plan:

- [ ] Build frontend
- [ ] Build backend
- [ ] Connect them
- [ ] Test everything

Good Plan:

Phase 1: Backend API
- [ ] Create SQLite schema (todos table: id, title, completed, createdAt)
- [ ] Implement GET /api/todos (return JSON array)
- [ ] Test with curl: `curl http://localhost:3000/api/todos`

Phase 2: Frontend
- [ ] Create TodoList component (displays array of todos)
- [ ] Fetch todos from API on mount (useEffect)
- [ ] Test: Should show hardcoded test todos from backend

Questions while comparing:

  • Which plan could a junior developer execute without asking questions?
  • Which plan has measurable success criteria for each task?
  • Which plan allows you to verify progress incrementally?
  • Which plan would survive a week-long break and still be understandable?

The Interview Questions They’ll Ask

  1. “What are the benefits of separating planning from execution in software development?”
  2. “How does Kiro’s Plan agent differ from the main execution agent?”
  3. “What makes a good task in an implementation plan?” (Give 3 characteristics)
  4. “How would you handle a plan that’s too granular (100 tiny tasks) vs too vague (5 huge tasks)?”
  5. “When would you skip the planning phase and just start coding?”
  6. “How do you verify that a plan is complete before starting implementation?”

Hints in Layers

Hint 1: Start with Clear Requirements Don’t enter planning mode with “build an app.” Be specific: “Build a React todo app with Express backend, SQLite database, and no authentication.” The Plan agent needs constraints to create a useful plan.

Hint 2: Answer Questions Thoughtfully When the Plan agent asks multiple-choice questions, don’t rush. Each choice affects the entire plan. Choosing “Next.js” vs “Vite + React” changes 10+ tasks.

Hint 3: Review the Plan Before Executing After the Plan agent generates tasks.md, read it fully. Ask: “Is anything missing? Are tasks too vague? Are dependencies clear?” Edit the plan before handing off to execution.

Hint 4: Check Off Tasks as You Go Treat tasks.md like a real checklist. Mark tasks complete with ✓, add notes for deviations, and reference git commits. This creates an audit trail.
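
To keep that audit trail honest, a tiny script can report progress straight from the checklist. A minimal sketch, assuming GitHub-style "- [ ]" / "- [x]" checkboxes like the tasks.md shown above (the ✓ convention is also accepted):

```python
# tasks_progress.py — minimal sketch; assumes "- [ ]" / "- [x]" checkboxes.
import re
import sys

def report(path: str = "tasks.md") -> None:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    done = len(re.findall(r"^\s*- \[[xX✓]\]", text, flags=re.MULTILINE))
    todo = len(re.findall(r"^\s*- \[ \]", text, flags=re.MULTILINE))
    total = done + todo
    pct = 100 * done / total if total else 0
    print(f"{done}/{total} tasks complete ({pct:.0f}%), {todo} remaining")

if __name__ == "__main__":
    report(sys.argv[1] if len(sys.argv) > 1 else "tasks.md")
```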


Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Requirements engineering | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 2: A Pragmatic Approach |
| Task decomposition | “Clean Architecture” by Robert C. Martin | Ch. 22: The Clean Architecture |
| Agile planning | “User Stories Applied” by Mike Cohn | Ch. 3: Writing Stories |

Common Pitfalls & Debugging

Problem 1: “Plan is too vague to execute”

  • Why: Requirements given to Plan agent were generic (“build a web app”)
  • Fix: Provide specific constraints: “React + Node.js, single-user, CRUD operations, no auth”
  • Quick test: Hand the plan to a teammate—can they implement it without asking questions?

Problem 2: “Plan includes 50+ micro-tasks”

  • Why: Asked for extreme detail or Plan agent over-decomposed
  • Fix: Group related micro-tasks into larger tasks (e.g., “Set up database” instead of 5 separate schema tasks)
  • Quick test: Each task should take 30 mins - 2 hours, not 5 minutes

Problem 3: “Implementation deviates from plan”

  • Why: Discovered new requirements or constraints during coding
  • Fix: Update tasks.md as you go—add new tasks, mark others as skipped, document why
  • Quick test: At end of project, tasks.md should reflect what actually happened

Problem 4: “Plan agent suggested wrong tech stack”

  • Why: Didn’t answer clarifying questions carefully or gave conflicting requirements
  • Fix: Re-enter planning mode, explicitly state tech stack: “Must use PostgreSQL, not SQLite”
  • Quick test: Read the plan’s “Overview” section—does it match your actual requirements?

Definition of Done

  • Entered planning mode with Shift+Tab or /plan
  • Answered all clarifying questions from Plan agent
  • Received a tasks.md file with at least 10 specific tasks
  • Tasks are organized into logical phases (setup, backend, frontend, integration)
  • Each task has a clear success criterion or demo requirement
  • Reviewed plan and verified no major features are missing
  • Handed off to execution agent and implemented at least 3 tasks
  • Checked off completed tasks in tasks.md as work progressed

Project 6: “The Custom Persona Generator” — Agent Configuration

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language JSON
Coolness Level Level 3: Genuinely Clever
Business Potential 2. Micro-SaaS (Specialized Agents)
Difficulty Level 2: Intermediate
Knowledge Area Agent Configuration

What you’ll build: A security-auditor.json agent with read-only permissions and OWASP Top 10 in its prompt.

Why it teaches Custom Agents: You build specialized personas for specific SDLC phases.

Core challenges you’ll face:

  • Configuring allowedTools.
  • Injecting static resources.

Success criteria:

  • The agent refuses to write files but can review code.

Real World Outcome

You’ll create a specialized security-auditor.json agent configuration that can analyze code for security vulnerabilities but cannot modify files. When you invoke this agent, it has the OWASP Top 10 embedded in its context and only has read access.

Example Output:

$ cat .kiro/agents/security-auditor.json
{
  "name": "Security Auditor",
  "description": "Reviews code for security vulnerabilities using OWASP Top 10 guidelines",
  "model": "claude-opus-4",
  "prompt": "You are a security auditor specializing in web application vulnerabilities. Your role is to identify security issues based on the OWASP Top 10:\n\n1. Broken Access Control\n2. Cryptographic Failures\n3. Injection\n4. Insecure Design\n5. Security Misconfiguration\n6. Vulnerable and Outdated Components\n7. Identification and Authentication Failures\n8. Software and Data Integrity Failures\n9. Security Logging and Monitoring Failures\n10. Server-Side Request Forgery (SSRF)\n\nAnalyze code for these vulnerabilities. Provide specific line numbers and remediation steps. NEVER modify code—only report findings.",
  "tools": ["read", "grep", "glob"],
  "allowedTools": ["read", "grep", "glob"],
  "resources": [
    "https://owasp.org/www-project-top-ten/"
  ]
}

$ kiro chat --agent security-auditor

Security Auditor: I'm the Security Auditor agent. I'll review your code for OWASP Top 10 vulnerabilities. I can read and analyze code but cannot modify files.

You: "Analyze src/api/auth.ts for security issues"

Security Auditor: [reads src/api/auth.ts]

Security Audit Report

File: src/api/auth.ts

🔴 CRITICAL: SQL Injection (OWASP #3: Injection)

  • Line 42: Direct string concatenation in SQL query
    const query = `SELECT * FROM users WHERE email = '${req.body.email}'`;
  • Risk: Attacker can inject malicious SQL (e.g., ' OR '1'='1)
  • Fix: Use parameterized queries:
    const query = 'SELECT * FROM users WHERE email = ?';
    db.query(query, [req.body.email]);
    

🟠 HIGH: Weak Password Hashing (OWASP #2: Cryptographic Failures)

  • Line 78: Using MD5 for password hashing
    const hash = crypto.createHash('md5').update(password).digest('hex');
    
  • Risk: MD5 is cryptographically broken and fast to crack
  • Fix: Use bcrypt with salt:
    const hash = await bcrypt.hash(password, 10);
    

🟡 MEDIUM: Missing Rate Limiting (OWASP #7: Auth Failures)

  • Line 15-20: Login endpoint has no rate limiting
  • Risk: Brute force attacks on user accounts
  • Fix: Add express-rate-limit middleware

✅ GOOD PRACTICES FOUND:

  • JWT tokens have 1-hour expiry (line 92)
  • HTTPS enforcement via middleware (line 12)

Summary: 3 vulnerabilities found (1 critical, 1 high, 1 medium)

You: "Fix the SQL injection"

Security Auditor: I cannot modify files. I'm configured with read-only access (allowedTools: ["read", "grep", "glob"]).

To fix this, you should:

  1. Exit this agent with /exit
  2. Use the main Kiro agent or a developer agent
  3. Implement the parameterized query fix I suggested above

The security-auditor agent enforces read-only behavior through allowedTools.


The Core Question You’re Answering

“How do I create specialized AI agents with specific expertise and restricted permissions for different phases of the development lifecycle?”

Before building this, understand: A single general-purpose agent can’t excel at everything. Kiro’s custom agents let you create personas with domain expertise (security, testing, documentation) and controlled permissions (read-only, write-only, specific directories). This project teaches you how to architect agent specialization.


Concepts You Must Understand First

Stop and research these before coding:

  1. Agent Configuration Schema
    • What fields are required vs optional in agent JSON?
    • How does the prompt field differ from the name and description?
    • What’s the precedence: agent config → project config → global config?
    • Reference: Agent Configuration Reference
  2. Tool Permissions
    • What’s the difference between tools (available) and allowedTools (auto-approved)?
    • How do wildcards work? (e.g., read* matches read, read_multiple)
    • What happens if you request a tool not in allowedTools?
    • Reference: Creating Custom Agents
  3. Resource Injection
    • How do resources (URLs, file paths) get loaded into context?
    • What’s the limit on resource size?
    • Can resources be dynamically updated, or are they static at agent creation?
    • Reference: Agent Examples

Questions to Guide Your Design

Before implementing, think through these:

  1. Agent Purpose
    • What specific task should this agent excel at? (security, performance, docs)
    • Should it be read-only, write-only, or full access?
    • What domain knowledge must be embedded in the prompt?
  2. Permission Boundaries
    • Which tools should be allowed? (read, write, grep, shell, MCP tools)
    • Should it access the network? (web_fetch, MCP servers)
    • Should it execute code? (shell, bash)
  3. Prompt Engineering
    • How do you encode expertise in the prompt field?
    • Should you include examples of good/bad patterns?
    • How do you prevent the agent from ignoring its role?

Thinking Exercise

Exercise: Design Agents for SDLC Phases

For a typical software development lifecycle, design 3 specialized agents:

1. Agent Name: __________________
   Purpose: __________________
   Allowed Tools: __________________
   Key Prompt Content: __________________

2. Agent Name: __________________
   Purpose: __________________
   Allowed Tools: __________________
   Key Prompt Content: __________________

3. Agent Name: __________________
   Purpose: __________________
   Allowed Tools: __________________
   Key Prompt Content: __________________

Questions while designing:

  • Which phase needs write access, and which needs read-only?
  • Should the testing agent be able to modify test files?
  • Should the deployment agent have access to shell commands?
  • How do you prevent the security agent from “fixing” code itself?

The Interview Questions They’ll Ask

  1. “Explain the difference between the tools and allowedTools fields in a Kiro agent config.”
  2. “How would you design a read-only code review agent vs a code-writing agent?”
  3. “What security considerations exist when giving an agent shell access?”
  4. “How do resources get injected into an agent’s context?”
  5. “When would you use multiple specialized agents vs one general-purpose agent?”
  6. “How would you test that an agent’s permissions are correctly enforced?”

Hints in Layers

Hint 1: Start with the Prompt Before configuring tools, write the prompt. Define the agent’s role, expertise, and constraints in natural language. Example: “You are a security auditor. NEVER modify code. Only report vulnerabilities.”

Hint 2: Test Permission Boundaries After creating the agent, intentionally try to violate its permissions. If it’s read-only, ask it to write a file. Verify it refuses or prompts for approval.
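
Beyond poking the agent interactively, you can sanity-check the config file itself. The sketch below validates only the fields used in this guide (name, prompt, tools, allowedTools); it is not an official Kiro schema validator.

```python
# check_agent_config.py — informal sanity check for the fields used in this
# guide (name, prompt, tools, allowedTools); not an official Kiro schema.
import json
import sys

def check(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)  # also catches invalid JSON early
    problems = []
    for field in ("name", "prompt", "tools"):
        if field not in cfg:
            problems.append(f"missing field: {field}")
    tools = set(cfg.get("tools", []))
    allowed = set(cfg.get("allowedTools", []))
    if not allowed <= tools:
        problems.append(f"allowedTools not a subset of tools: {sorted(allowed - tools)}")
    if "write" in tools:
        problems.append("a read-only auditor should not list 'write' in tools")
    for p in problems:
        print("FAIL:", p)
    print("OK" if not problems else f"{len(problems)} problem(s) found")
    return 1 if problems else 0

if __name__ == "__main__":
    default = ".kiro/agents/security-auditor.json"
    sys.exit(check(sys.argv[1] if len(sys.argv) > 1 else default))
```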

Hint 3: Use Resources for Static Knowledge Don’t put 10KB of OWASP guidelines in the prompt. Put them in resources as a URL or file path. This keeps the prompt clean and allows updates without changing the agent config.

Hint 4: Model Selection Matters Use claude-opus-4 for complex reasoning tasks (security analysis, architecture review). Use claude-haiku-4 for simple tasks (linting, formatting checks). This optimizes cost and speed.


Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Security best practices | “Practical Malware Analysis” by Sikorski & Honig | Ch. 1: Basic Static Techniques |
| Agent design patterns | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 7: While You Are Coding |
| Permission models | “Working Effectively with Legacy Code” by Feathers | Ch. 4: The Seam Model |

Common Pitfalls & Debugging

Problem 1: “Agent ignores its read-only role and suggests writing files”

  • Why: Prompt doesn’t explicitly forbid writing; the agent sees write in available tools
  • Fix: Remove write from the tools array entirely, and add to prompt: “You CANNOT modify files.”
  • Quick test: Ask agent “fix this bug” and verify it refuses to write

Problem 2: “Agent requires approval for every read operation”

  • Why: read is in tools but not in allowedTools
  • Fix: Add "read" to allowedTools array
  • Quick test: kiro chat --agent security-auditor → agent should read files without prompting

Problem 3: “Resources file is too large (>50K)”

  • Why: Loaded entire OWASP documentation as a local file
  • Fix: Use a URL resource instead, or extract only relevant sections
  • Quick test: Check agent context usage with /context show after startup

Problem 4: “Agent config not found”

  • Why: File is not in .kiro/agents/ directory or has incorrect JSON syntax
  • Fix: Verify path: .kiro/agents/security-auditor.json and validate JSON with jq
  • Quick test: kiro chat --agent security-auditor should load without “agent not found” error

Definition of Done

  • Created .kiro/agents/security-auditor.json with valid JSON
  • Configured allowedTools to include only ["read", "grep", "glob"]
  • Wrote a detailed prompt including OWASP Top 10 guidelines
  • Added relevant resources (OWASP URL or security checklist)
  • Tested agent with kiro chat --agent security-auditor
  • Verified agent can read and analyze code files
  • Verified agent refuses or prompts when asked to write/modify files
  • Agent successfully identified at least 1 security issue in sample code

Project 7: “The Executable Spec with mdflow” — Literate Programming

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Markdown / Bash
Coolness Level Level 4: Hardcore Tech Flex
Difficulty Level 3: Advanced
Knowledge Area Literate Programming

What you’ll build: A Markdown spec whose code blocks are executed and validated, keeping docs in sync with reality.

Why it teaches Executable Specs: Documentation that executes cannot rot.

Success criteria:

  • The spec fails when code changes and passes after repair.

Real World Outcome

You’ll create a living specification document where every code example is automatically executed and validated. When your implementation changes, the spec either passes (proving docs are accurate) or fails (alerting you to update them).

Example: API Specification (api-spec.md)

# User Authentication API

## Creating a User

The `/api/users` endpoint accepts POST requests with email and password:

```bash
curl -X POST http://localhost:3000/api/users \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"secure123"}'
```

Expected response:

```json
{
  "id": "usr_abc123",
  "email": "test@example.com",
  "created_at": "2025-01-02T10:00:00Z"
}
```

When you run mdflow execute api-spec.md:

$ mdflow execute api-spec.md

Running: api-spec.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✓ Block 1: curl POST /api/users
  Status: 201 Created
  Response matched expected JSON schema

✓ Block 2: Expected response validation
  Field 'id' matches pattern: usr_[a-z0-9]+
  Field 'email' equals: test@example.com
  Field 'created_at' is valid ISO 8601

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
All blocks passed ✓ (2/2)
Execution time: 1.2s

When the API breaks:

$ mdflow execute api-spec.md

Running: api-spec.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✗ Block 1: curl POST /api/users
  Status: 500 Internal Server Error
  Expected: 201 Created

  Response:
  {
    "error": "Database connection failed"
  }

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FAILED: 1 of 2 blocks failed
Execution time: 0.8s

This forces you to either fix the implementation or update the spec. Documentation can never drift from reality.


The Core Question You’re Answering

“How do I ensure my documentation stays synchronized with my actual codebase as it evolves?”

Most documentation becomes outdated within weeks of writing. Code examples break, APIs change, but the docs remain frozen in time. This project addresses the fundamental problem: passive documentation rots, executable documentation validates itself.

By embedding executable tests directly in your specification, you create a contract that must be maintained. When the contract breaks, CI fails, forcing alignment.


Concepts You Must Understand First

Stop and research these before coding:

  1. Literate Programming
    • What did Donald Knuth mean by “programs as literature”?
    • How does weaving code with narrative improve understanding?
    • Why is order of presentation different from order of execution?
    • Book Reference: “Literate Programming” by Donald E. Knuth
  2. Test-Driven Documentation
    • How do executable examples serve as both docs and tests?
    • What makes a good assertion in documentation?
    • When should examples be simplified vs realistic?
    • Book Reference: “Growing Object-Oriented Software, Guided by Tests” Ch. 2
  3. Markdown Processing
    • How do you parse and extract fenced code blocks?
    • What metadata can be attached to code blocks (language, annotations)?
    • How do you preserve line numbers for error reporting?
    • Web Reference: CommonMark Specification - Fenced Code Blocks

Questions to Guide Your Design

Before implementing, think through these:

  1. Execution Model
    • How do you isolate each code block’s execution environment?
    • Should blocks share state, or run independently?
    • How do you handle blocks that depend on previous outputs?
    • What happens if block 3 fails—do you run block 4?
  2. Assertion Syntax
    • How do users specify expected outputs (inline, separate blocks)?
    • Do you support regex matching, JSON schema validation, or both?
    • How do you handle non-deterministic outputs (timestamps, IDs)?
    • Should exit codes alone determine success, or stdout comparison?
  3. Language Support
    • How do you execute different languages (bash, python, curl)?
    • Do you need sandboxing (Docker containers, chroot)?
    • How do you manage dependencies (language runtimes, system packages)?
    • Should you support custom interpreters per project?

Thinking Exercise

Trace: Multi-Step API Workflow

Given this specification:

## User Workflow

Create a user:
```bash
USER_ID=$(curl -s -X POST /api/users -d '{"email":"test@example.com"}' | jq -r .id)
```

Verify creation:

```bash
curl -s "/api/users/$USER_ID"
```

Expected: `{"id":"$USER_ID","email":"test@example.com"}`


*Questions while designing:*
- How do you propagate `$USER_ID` from block 1 to block 2?
- Should the spec run in a single shell session, or fresh shells per block?
- What if `USER_ID` is empty because block 1 failed—should block 2 run?
- How do you validate that the returned ID matches the captured variable?

**Design Decision Matrix:**

| Approach | Pros | Cons |
|----------|------|------|
| Single shell session | State persists naturally | Pollution between tests |
| Environment variables | Explicit data flow | Manual propagation |
| JSON output files | Language-agnostic | Filesystem clutter |

---

#### The Interview Questions They'll Ask

1. "How would you design a system to execute code blocks from Markdown while preserving security boundaries?"

2. "Explain the tradeoffs between making documentation executable versus keeping separate test suites."

3. "How do you handle non-deterministic outputs (timestamps, random IDs) in executable documentation?"

4. "What strategies prevent test pollution when documentation blocks depend on shared state?"

5. "How would you integrate this into CI/CD to fail builds when documentation drifts from implementation?"

6. "Describe how you'd support multiple programming languages in a single specification document."

---

#### Hints in Layers

**Hint 1: Start with a Parser**
Use a Markdown parser (like `markdown-it` in Node.js or `mistune` in Python) to extract fenced code blocks. Store metadata (language, line numbers) for each block.

**Hint 2: Execution Strategy**
For each code block:
- Write code to a temporary script file
- Execute using the appropriate interpreter (`bash`, `python3`, `node`)
- Capture stdout, stderr, and exit code
- Compare against expected outputs (if specified)

**Hint 3: State Management**
Create a temporary directory as a "sandbox workspace":

/tmp/mdflow-session-abc123/
├── block-1.sh
├── block-1.stdout
├── block-2.sh
└── shared.env        # Environment variables for state


**Hint 4: Assertion Annotations**
Support special comments for assertions:
```bash
curl /api/users/123
# expect-status: 200
# expect-json: {"id":"123"}
```

Parse these comments to build validation rules.
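
Pulling Hints 1-4 together, here is a minimal executor sketch. It only handles bash blocks and a simplified, hypothetical `# expect-exit:` annotation; extending it to HTTP-status and JSON assertions is the actual project.

```python
# mdflow_sketch.py — minimal executor sketch, not the real mdflow tool.
# Extracts fenced bash blocks, runs each in its own subprocess, and checks a
# hypothetical "# expect-exit: N" annotation (default: expect exit code 0).
import re
import subprocess
import sys

FENCED_BASH = re.compile(r"```bash\n(.*?)```", re.DOTALL)

def run_spec(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        text = f.read()
    failures = 0
    for i, block in enumerate(FENCED_BASH.findall(text), start=1):
        match = re.search(r"#\s*expect-exit:\s*(\d+)", block)
        expected = int(match.group(1)) if match else 0
        result = subprocess.run(
            ["bash", "-euo", "pipefail", "-c", block],
            capture_output=True, text=True,
        )
        ok = result.returncode == expected
        print(f"{'✓' if ok else '✗'} Block {i}: exit {result.returncode} (expected {expected})")
        if not ok:
            failures += 1
            print(result.stderr.strip())
    print("All blocks passed ✓" if failures == 0 else f"FAILED: {failures} block(s) failed")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(run_spec(sys.argv[1] if len(sys.argv) > 1 else "api-spec.md"))
```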

---

#### Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Literate Programming Philosophy | "Literate Programming" by Donald E. Knuth | Introduction & Ch. 1 |
| Test-Driven Development | "Test Driven Development: By Example" by Kent Beck | Part I |
| Markdown Parsing | "Crafting Interpreters" by Robert Nystrom | Ch. 4 (Scanning) |
| Documentation as Code | "Docs for Developers" by Jared Bhatti et al. | Ch. 6 |

---

#### Common Pitfalls & Debugging

**Problem 1: "Code blocks fail due to missing dependencies"**
- **Why:** Spec assumes tools are installed (curl, jq, etc.)
- **Fix:** Add a validation phase that checks for required binaries before execution
- **Quick test:** `command -v curl || echo "Missing curl"`

**Problem 2: "Non-deterministic outputs cause false failures"**
- **Why:** Timestamps, UUIDs, or random data change every run
- **Fix:** Support regex patterns or placeholder matching (`expect-pattern: usr_[a-z0-9]+`)
- **Quick test:** Replace exact matches with pattern assertions

**Problem 3: "State leaks between blocks"**
- **Why:** Environment variables, temp files, or database records persist
- **Fix:** Run each block in a fresh subprocess with isolated environment
- **Quick test:** Add `set -u` to bash blocks to catch undefined variables

**Problem 4: "Error messages don't point to the right line in the spec"**
- **Why:** You're losing line number context during extraction
- **Fix:** Store original line numbers when parsing, include them in error reports
- **Quick test:** `Error in api-spec.md:15 (block 2)`

---

#### Definition of Done

- [ ] Parser extracts all fenced code blocks with metadata (language, line numbers)
- [ ] Executor runs bash and at least one other language (Python or curl)
- [ ] Assertions validate exit codes and stdout/stderr content
- [ ] Failed blocks produce clear error messages with file/line references
- [ ] Spec execution stops on first failure (or continues with `--keep-going` flag)
- [ ] Environment isolation prevents state leaks between blocks
- [ ] README includes example spec demonstrating success and failure cases
- [ ] CI integration example shows how to fail builds on spec failures

---

### [Project 8: "The Property Based Testing Suite" — Advanced Testing](/guides/kiro-cli-mastery/P08-the-property-based-testing-suite-advanced-testing)

| Attribute | Value |
|-----------|-------|
| **File** | `KIRO_CLI_LEARNING_PROJECTS.md` |
| **Main Programming Language** | Python (Hypothesis) or TypeScript (fast-check) |
| **Coolness Level** | Level 4: Hardcore Tech Flex |
| **Difficulty** | Level 3: Advanced |
| **Knowledge Area** | Advanced Testing |

**What you'll build**: A booking system tested with PBT to prove no overlapping bookings.

**Why it teaches PBT**: It exposes subtle edge cases AI might miss.

**Success criteria**:
- PBT finds at least one real bug before you fix it.

---

#### Real World Outcome

You'll implement a room booking system where property-based testing automatically generates thousands of test cases, exposing edge cases like timezone boundaries, concurrent bookings, and off-by-one errors that example-based tests would miss.

**Example: Booking System Test Output**

```python
# test_booking.py
from hypothesis import given, strategies as st
from datetime import datetime, timedelta
from booking import BookingSystem

@given(
    bookings=st.lists(
        st.tuples(
            st.datetimes(min_value=datetime(2025,1,1), max_value=datetime(2025,12,31)),
            st.integers(min_value=1, max_value=8)  # duration in hours
        ),
        min_size=2,
        max_size=50
    )
)
def test_no_overlapping_bookings(bookings):
    system = BookingSystem()

    for start, duration in bookings:
        end = start + timedelta(hours=duration)
        system.book("room-A", start, end)

    # Property: No two bookings should overlap
    all_bookings = system.get_bookings("room-A")
    for i, booking1 in enumerate(all_bookings):
        for booking2 in all_bookings[i+1:]:
            assert not booking1.overlaps(booking2), \
                f"Overlap detected: {booking1} and {booking2}"

When you run the tests:

$ pytest test_booking.py -v

test_booking.py::test_no_overlapping_bookings FAILED

================================= FAILURES =================================
test_no_overlapping_bookings - AssertionError

Falsifying example:
  bookings = [
    (datetime(2025, 3, 15, 14, 0, 0), 2),  # 14:00-16:00
    (datetime(2025, 3, 15, 15, 59, 59), 1) # 15:59:59-16:59:59
  ]

AssertionError: Overlap detected:
  Booking(start=2025-03-15 14:00:00, end=2025-03-15 16:00:00)
  Booking(start=2025-03-15 15:59:59, end=2025-03-15 16:59:59)

Hypothesis found a counterexample after 147 test cases.
Shrunk input to minimal failing case.

The bug revealed: the overlap check only flagged an existing booking if it started inside the new booking’s time range, so it missed bookings that start earlier and run into the new slot (here, by a single second). The fix is the standard symmetric interval test:

def overlaps(self, other):
    # Two intervals overlap iff each one starts before the other ends
    return self.start < other.end and self.end > other.start

After fixing:

$ pytest test_booking.py -v

test_booking.py::test_no_overlapping_bookings PASSED

Hypothesis ran 100 test cases (2,847 examples total)
All properties hold ✓

Property-based testing generated 2,847 booking combinations and proved your invariant holds across all of them.


The Core Question You’re Answering

“How do I test properties that must hold for ALL possible inputs, not just the examples I thought of?”

Traditional example-based testing forces you to imagine edge cases. You write tests for:

  • Normal case: 2pm-3pm
  • Boundary case: Midnight
  • Edge case: Leap year February 29th

But you’ll always miss combinations. Property-based testing inverts this: you state the invariant (no overlaps), and the framework generates inputs designed to break it.

This project teaches you to think in properties (universal truths) rather than examples (specific scenarios).


Concepts You Must Understand First

Stop and research these before coding:

  1. Property-Based Testing (PBT) vs Example-Based Testing
    • What is a “property” in the context of testing?
    • How does random generation differ from hand-crafted examples?
    • What is “shrinking” and why is it critical for debugging?
    • Book Reference: “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert - Ch. 1
  2. Test Generators and Strategies
    • How do you define the space of valid inputs?
    • What constraints ensure generated data is realistic?
    • How do you generate dependent values (end time > start time)?
    • Web Reference: Hypothesis Documentation - Strategies
  3. Invariants and Postconditions
    • What makes a good invariant (universally true property)?
    • How do you express “for all X, property P holds”?
    • When should you test state transitions vs final outcomes?
    • Book Reference: “Growing Object-Oriented Software, Guided by Tests” Ch. 19

Questions to Guide Your Design

Before implementing, think through these:

  1. System Properties
    • What invariants must ALWAYS hold in your booking system?
    • No overlapping bookings for the same room
    • Booking end time > start time
    • Cannot book in the past
    • Total bookings <= room capacity
    • Which of these can be violated by bad inputs vs implementation bugs?
  2. Test Data Generation
    • How do you generate realistic datetime ranges?
    • Should you test with timezones, or UTC only?
    • How do you ensure generated bookings have variety (short, long, overnight)?
    • Do you need to generate user IDs, or just time ranges?
  3. Shrinking Strategy
    • When a test fails with 50 bookings, how do you find the minimal failing case?
    • Should you shrink by removing bookings, or simplifying time ranges?
    • How do you preserve the failure while reducing complexity?

Thinking Exercise

Property Discovery: Booking System Invariants

Given a booking system with this interface:

class BookingSystem:
    def book(room_id, start, end) -> booking_id
    def cancel(booking_id) -> bool
    def get_bookings(room_id) -> List[Booking]

List all properties that should ALWAYS hold:

Temporal Properties:

  1. For any booking: booking.end > booking.start
  2. Cannot book a time in the past relative to system time

Collision Properties:

  1. No two active bookings for the same room overlap
  2. After canceling booking X, overlaps must be recalculated

State Properties:

  1. Total active bookings equals successful book() calls minus cancel() calls
  2. get_bookings() returns bookings in chronological order

Now design PBT tests for each:

| Property | Generator Strategy | Assertion |
|----------|--------------------|-----------|
| End > start (Temporal 1) | Generate (start, start + positive_delta) | assert booking.end > booking.start |
| No overlaps (Collision 1) | Generate list of (start, duration) tuples | Pairwise overlap check |
| Booking count (State 1) | Generate sequence of book/cancel actions | assert len(get_bookings()) == expected |
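
As a worked example of turning one of these rows into a test, here is a sketch for the chronological-ordering property (State property 2). It assumes the same BookingSystem interface as the earlier example; book() may silently reject overlapping requests.

```python
# Sketch: State property 2 — get_bookings() returns bookings in order.
from datetime import datetime, timedelta
from hypothesis import given, strategies as st
from booking import BookingSystem  # same module as the earlier example

@given(st.lists(
    st.tuples(
        st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 12, 31)),
        st.integers(min_value=1, max_value=8),  # duration in hours
    ),
    min_size=1, max_size=30,
))
def test_bookings_returned_in_chronological_order(bookings):
    system = BookingSystem()
    for start, hours in bookings:
        system.book("room-A", start, start + timedelta(hours=hours))

    starts = [b.start for b in system.get_bookings("room-A")]
    assert starts == sorted(starts)
```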

The Interview Questions They’ll Ask

  1. “Explain the difference between property-based testing and fuzzing. When would you use each?”

  2. “How would you write a property-based test for a sorting algorithm without reimplementing the sort?”

  3. “What strategies would you use to generate valid JSON that conforms to a specific schema?”

  4. “Describe how shrinking works in Hypothesis/QuickCheck and why it’s essential for debugging.”

  5. “How would you test a distributed system’s consistency guarantees using property-based testing?”

  6. “What are the limitations of PBT? Name scenarios where example-based tests are superior.”


Hints in Layers

Hint 1: Start with Simple Properties Before testing complex booking logic, verify basic properties:

from datetime import datetime, timedelta
from hypothesis import given, strategies as st

@given(
    st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2030, 12, 31)),
    st.timedeltas(min_value=timedelta(hours=1), max_value=timedelta(days=30)),
)
def test_booking_duration_positive(start, duration):
    end = start + duration
    booking = Booking(start, end)   # Booking comes from your implementation
    assert booking.duration() > timedelta(0)

Hint 2: Use Composite Strategies Generate bookings that meet domain constraints:

valid_booking = st.builds(
    Booking,
    start=st.datetimes(min_value=datetime(2025,1,1)),
    duration=st.integers(min_value=1, max_value=8).map(lambda h: timedelta(hours=h))
)

Hint 3: Test State Machines Model booking workflows as state transitions:

class BookingStateMachine(RuleBasedStateMachine):
    @rule(start=st.datetimes(), hours=st.integers(min_value=1, max_value=8))
    def book_room(self, start, hours):
        self.system.book("room-A", start, start + timedelta(hours=hours))
        # Invariant: check no overlaps after every booking

Hint 4: Shrinking and Debugging When a test fails, Hypothesis automatically simplifies the input. Example:

Initial failure: 50 bookings
Shrunk to: 2 bookings (minimal reproduction)

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| PBT Fundamentals | “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert | Ch. 1-3 |
| Hypothesis (Python) | “Effective Python” by Brett Slatkin | Item 76 |
| QuickCheck (Haskell) | “Learn You a Haskell for Great Good!” by Miran Lipovača | Ch. 11 |
| State Machine Testing | “Hypothesis Documentation” (online) | Stateful Testing Guide |

Common Pitfalls & Debugging

Problem 1: “Tests pass locally but fail in CI due to timezone differences”

  • Why: Generated datetimes assume local timezone
  • Fix: Always use UTC for test data: st.datetimes(timezones=st.just(timezone.utc))
  • Quick test: export TZ=America/New_York && pytest

Problem 2: “Hypothesis generates unrealistic edge cases (year 9999)”

  • Why: Default datetime range is too broad
  • Fix: Constrain generators to realistic bounds: min_value=datetime(2025,1,1), max_value=datetime(2030,12,31)
  • Quick test: Add @settings(verbosity=Verbosity.verbose) to see generated values

Problem 3: “Test fails intermittently with different shrunk examples”

  • Why: Property relies on system state (database, clock)
  • Fix: Use deterministic seeds and isolated test fixtures
  • Quick test: @given(...) @settings(derandomize=True)

Problem 4: “Shrinking takes too long (>30 seconds)”

  • Why: Complex data structures with many interdependencies
  • Fix: Simplify generators or use @settings(max_examples=50) during development
  • Quick test: Monitor shrinking with --hypothesis-show-statistics

Definition of Done

  • Implemented booking system with book(), cancel(), and get_bookings() methods
  • Property test verifies no overlapping bookings (with Hypothesis generating 100+ examples)
  • Property test found at least one real bug (documented in README)
  • Tests use constrained datetime generation (realistic time ranges)
  • Shrinking produces minimal failing examples (verified manually)
  • README explains each property being tested and why it matters
  • CI runs PBT with fixed seed for reproducible failures
  • Coverage report shows all edge cases exercised by generated inputs

Project 9: “The Postgres Analyst” — Model Context Protocol (MCP)

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language SQL / JSON (Config)
Coolness Level Level 3: Genuinely Clever
Business Potential 3. Service & Support (Data Ops)
Difficulty Level 2: Intermediate
Knowledge Area Model Context Protocol (MCP)

What you’ll build: Configure postgres-mcp in mcp.json and run queries via Kiro.

Why it teaches MCP: Kiro gains real, typed access to your database.

Core challenges you’ll face:

  • Correct connection strings.
  • Using read-only DB users.

Success criteria:

  • Kiro answers schema-based questions by executing real SQL.

Real World Outcome

You’ll configure the PostgreSQL MCP server so Kiro can directly query your database to answer questions like “How many active users signed up in December?” without you writing SQL manually. Kiro inspects your schema, generates queries, and returns formatted results.

Configuration (.kiro/settings/mcp.json):

{
  "mcpServers": {
    "postgres-analytics": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "POSTGRES_CONNECTION_STRING": "postgresql://readonly_user:password@localhost:5432/production_db"
      }
    }
  }
}

When you ask Kiro to analyze data:

$ kiro chat

You: "How many users registered in the last 30 days?"

Kiro: Let me query the database to find that information.

[Tool: postgres-analytics/query]
SELECT COUNT(*) as user_count
FROM users
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
  AND created_at < CURRENT_DATE;

Result:
┌────────────┐
│ user_count │
├────────────┤
│    1,247   │
└────────────┘

There were 1,247 new user registrations in the last 30 days.

You: "Show me the top 5 products by revenue this month"

Kiro: [Tool: postgres-analytics/query]
SELECT
  p.name,
  SUM(oi.quantity * oi.price) as total_revenue
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE o.created_at >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY p.id, p.name
ORDER BY total_revenue DESC
LIMIT 5;

Result:
┌─────────────────────┬────────────────┐
│ name                │ total_revenue  │
├─────────────────────┼────────────────┤
│ Premium Plan (Year) │    $24,500.00  │
│ Pro Subscription    │    $18,200.00  │
│ Enterprise License  │    $15,000.00  │
│ Starter Kit         │     $8,750.00  │
│ Add-on Module       │     $3,200.00  │
└─────────────────────┴────────────────┘

The power: Kiro inspected your schema automatically and generated production-safe, read-only SQL. You never wrote a query manually.


The Core Question You’re Answering

“How can I give an AI assistant safe, structured access to my live database for analytics without risking data corruption?”

Traditional BI tools require learning query languages, dashboards become stale, and ad-hoc questions require engineering time. This project solves the paradox: grant database access without granting database risk.

By configuring MCP with a read-only user and schema introspection, Kiro becomes your personal data analyst that can’t break anything.


Concepts You Must Understand First

Stop and research these before coding:

  1. PostgreSQL Connection Strings
    • What components make up a connection URI (user, host, port, database)?
    • How do you specify SSL/TLS requirements in connection strings?
    • What’s the difference between connection pooling and direct connections?
    • Book Reference: “PostgreSQL: Up and Running” by Regina Obe - Ch. 2
  2. Database Permissions and Roles
    • How do you create a read-only user in PostgreSQL?
    • What’s the difference between GRANT SELECT and GRANT USAGE?
    • How do you revoke write permissions (INSERT, UPDATE, DELETE)?
    • Web Reference: PostgreSQL Documentation - GRANT
  3. MCP Server Configuration
    • How does Kiro communicate with MCP servers (stdio vs HTTP)?
    • What environment variables are passed to MCP server processes?
    • How do you debug MCP server startup failures?
    • Web Reference: Model Context Protocol Specification

Questions to Guide Your Design

Before implementing, think through these:

  1. Security Boundaries
    • Should the MCP server connect as a read-only user, or use row-level security?
    • How do you prevent Kiro from accessing sensitive tables (passwords, PII)?
    • Should you use a separate analytics database (replica)?
    • What happens if Kiro generates expensive queries (table scans)?
  2. Schema Discovery
    • How does Kiro learn about your tables, columns, and relationships?
    • Should you provide table descriptions as MCP resources?
    • How do you handle dynamic schemas (frequent migrations)?
    • Should Kiro see views, or only base tables?
  3. Query Safety
    • How do you prevent queries that could time out (missing indexes)?
    • Should you enforce query timeouts at the database or MCP level?
    • How do you log all SQL executed by Kiro for audit purposes?
    • What if Kiro generates syntactically correct but semantically wrong SQL?

Thinking Exercise

Scenario: Granting Safe Access

Given this database schema:

CREATE TABLE users (
  id UUID PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  password_hash VARCHAR(255) NOT NULL,  -- SENSITIVE
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE orders (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  total DECIMAL(10,2),
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE sessions (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  token VARCHAR(255) NOT NULL,  -- SENSITIVE
  expires_at TIMESTAMP
);

Design a read-only role for Kiro:

Option 1: Table-Level Permissions

CREATE ROLE kiro_readonly WITH LOGIN PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE production_db TO kiro_readonly;
GRANT USAGE ON SCHEMA public TO kiro_readonly;
GRANT SELECT ON users, orders TO kiro_readonly;
-- Problem: What about the password_hash column in users?

Option 2: Column-Level Permissions

-- PostgreSQL does support column-level grants, e.g. GRANT SELECT (id, email, created_at) ON users ...,
-- but then SELECT * fails for the role, so a view is usually the simpler option
CREATE VIEW users_safe AS
  SELECT id, email, created_at FROM users;

GRANT SELECT ON users_safe, orders TO kiro_readonly;
-- Better: Kiro can't see password_hash

Option 3: Row-Level Security

ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
CREATE POLICY sessions_no_access ON sessions FOR SELECT
  USING (false);  -- Block all access to sessions table

GRANT SELECT ON sessions TO kiro_readonly;
-- Kiro sees the table exists but gets 0 rows

Which approach is best for analytics? Consider:

  • Transparency (Kiro knows columns exist but can’t access)
  • Maintenance (adding new tables requires updating grants)
  • Performance (views add overhead)

The Interview Questions They’ll Ask

  1. “How would you design a read-only database role that can access analytics tables but not PII columns?”

  2. “Explain the security implications of giving an AI direct database access. What guardrails would you implement?”

  3. “How do you prevent an AI from generating expensive queries that could impact production performance?”

  4. “Describe the tradeoffs between using database views vs application-level filtering for sensitive data.”

  5. “How would you audit and log all SQL queries executed by an AI assistant for compliance purposes?”

  6. “What strategies would you use to handle schema changes without breaking the MCP server configuration?”


Hints in Layers

Hint 1: Create a Read-Only User

CREATE ROLE kiro_analyst WITH LOGIN PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE analytics_db TO kiro_analyst;
GRANT USAGE ON SCHEMA public TO kiro_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO kiro_analyst;

-- Ensure future tables are also read-only
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO kiro_analyst;

Hint 2: Test Permissions

psql -U kiro_analyst -d analytics_db -c "SELECT * FROM users LIMIT 1;"
# Should succeed

psql -U kiro_analyst -d analytics_db -c "DELETE FROM users WHERE id = '123';"
# Should fail: ERROR: permission denied for table users
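
If you prefer a repeatable script over ad-hoc psql commands, the same check can be done from Python. This is a sketch that assumes psycopg2 is installed and reuses the role, database, and users table from the examples above:

# check_permissions.py - verify the analyst role can read but not write (a sketch)
import psycopg2
from psycopg2 import errors

DSN = "postgresql://kiro_analyst:secure_password@localhost:5432/analytics_db"

conn = psycopg2.connect(DSN)
conn.autocommit = True  # keep the connection usable after the expected failure

with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM users;")
    print("SELECT allowed, users counted:", cur.fetchone()[0])

with conn.cursor() as cur:
    try:
        cur.execute("DELETE FROM users;")
        raise SystemExit("FAIL: role can delete rows - tighten the grants")
    except errors.InsufficientPrivilege:
        print("DELETE correctly denied")

conn.close()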

Hint 3: Configure MCP with Connection String

{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "POSTGRES_CONNECTION_STRING": "postgresql://kiro_analyst:secure_password@localhost:5432/analytics_db?sslmode=require"
      }
    }
  }
}

Hint 4: Verify MCP Server is Running

kiro chat

You: "/context show"

# Should show:
MCP Servers:
  - postgres (stdio) - Connected ✓
    Tools: query_postgres, describe_schema

Books That Will Help

Topic Book Chapter
PostgreSQL Permissions “PostgreSQL: Up and Running” by Regina Obe Ch. 3 (Roles and Privileges)
SQL Security “SQL Antipatterns” by Bill Karwin Ch. 15 (SQL Injection)
Database Design “Designing Data-Intensive Applications” by Martin Kleppmann Ch. 2 (Data Models)
MCP Protocol “Model Context Protocol Specification” (online) Server Implementation Guide

Common Pitfalls & Debugging

Problem 1: “MCP server fails to start with ‘connection refused’”

  • Why: PostgreSQL isn’t accepting connections on the specified host/port
  • Fix: Verify PostgreSQL is running: pg_isready -h localhost -p 5432
  • Quick test: psql -U kiro_analyst -d analytics_db should connect

Problem 2: “Kiro can see schema but gets ‘permission denied’ on queries”

  • Why: Role has USAGE on schema but not SELECT on tables
  • Fix: GRANT SELECT ON ALL TABLES IN SCHEMA public TO kiro_analyst;
  • Quick test: SELECT table_name FROM information_schema.role_table_grants WHERE grantee = 'kiro_analyst';

Problem 3: “Connection string in mcp.json exposed in logs”

  • Why: Environment variables are logged during MCP server startup
  • Fix: Use a .env file loaded by the MCP server wrapper script
  • Quick test: Grep Kiro logs for password leaks
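
One way to build the wrapper script mentioned in that fix is a small launcher that reads .env itself and then execs the real MCP server, so mcp.json points at the launcher and never contains the secret. A sketch, where the filename and the .env key are assumptions:

#!/usr/bin/env python3
# run_postgres_mcp.py - hypothetical wrapper: load secrets from .env, then exec the server.
import os

def load_dotenv(path=".env"):
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

if __name__ == "__main__":
    secrets = load_dotenv()
    env = dict(os.environ, POSTGRES_CONNECTION_STRING=secrets["POSTGRES_CONNECTION_STRING"])
    # Replace this process with the real MCP server; stdin/stdout are inherited.
    os.execvpe("npx", ["npx", "-y", "@modelcontextprotocol/server-postgres"], env)

In mcp.json you would then set "command": "python3" and "args": ["./run_postgres_mcp.py"], so no connection string appears in the config or in startup logs.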

Problem 4: “Queries time out on large tables”

  • Why: Kiro generated table scans without LIMIT clauses
  • Fix: Set statement_timeout for the role: ALTER ROLE kiro_analyst SET statement_timeout = '30s';
  • Quick test: SHOW statement_timeout; as kiro_analyst

Definition of Done

  • Created read-only PostgreSQL role with SELECT-only permissions
  • Verified role cannot execute INSERT, UPDATE, DELETE, or DROP
  • Configured postgres-mcp in .kiro/settings/mcp.json with secure connection string
  • Kiro can query schema and execute SELECT statements successfully
  • Tested that Kiro cannot access sensitive tables (sessions, password_hash columns)
  • Set query timeout to prevent runaway queries (30 seconds)
  • Documented all granted permissions in README
  • Logged sample query outputs showing Kiro generating and executing SQL

Project 10: “The GitHub Project Manager” — Workflow Automation

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language JSON (Config)
Coolness Level Level 2: Practical
Difficulty Level 2: Intermediate
Knowledge Area Workflow Automation

What you’ll build: Integrate @modelcontextprotocol/server-github and generate issue summaries and PR drafts.

Why it teaches Tool Integration: Kiro bridges code and project management without context switching.

Success criteria:

  • Kiro can summarize open issues with a label filter.

Real World Outcome

You’ll configure the GitHub MCP server so Kiro can list issues, search PRs, read file contents, and draft pull request descriptions—all without leaving the terminal. This turns Kiro into a project manager that understands both your code and your issue tracker.

Configuration (.kiro/settings/mcp.json):

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "ghp_your_personal_access_token_here"
      }
    }
  }
}

When you ask Kiro to manage issues:

$ kiro chat

You: "Summarize all open bugs with the 'critical' label"

Kiro: [Tool: github/search_issues]
  Query: repo:your-org/your-repo is:issue is:open label:critical

[Tool: github/get_issue] (for each result)

**Critical Bugs Summary** (3 open)

1. **Database connection pool exhausted** (#342)
   - Opened: 2 days ago by @alice
   - Last updated: 4 hours ago
   - Symptom: Production app crashes after ~1000 concurrent users
   - No PR linked yet

2. **Payment gateway timeout on Stripe webhook** (#338)
   - Opened: 5 days ago by @bob
   - PR #340 (in review)
   - Root cause: Missing retry logic in webhook handler

3. **Data corruption in user preferences migration** (#335)
   - Opened: 1 week ago by @charlie
   - Assigned to: @dave
   - Rollback already deployed, fix in progress

**Priority Order Recommendation:**
- Fix #342 first (production outage risk)
- Merge PR #340 for #338 (already has solution)
- #335 can wait (rollback mitigates)

You: "Draft a PR description for fixing #342"

Kiro: [Tool: github/get_issue] (reads #342 full details)
[Tool: github/search_code] (finds connection pool config)

**Pull Request Draft:**

---
**Title:** Fix database connection pool exhaustion under high load

**Fixes:** #342

**Problem:**
The application crashes when concurrent user count exceeds ~1,000 due to database connection pool exhaustion. Current pool size is hardcoded to 10 connections.

**Root Cause:**
`config/database.js` sets `max: 10` without considering production load. Under load testing, connection wait time exceeded timeout threshold.

**Solution:**
- Increase connection pool size to 50 (based on DB max_connections=100)
- Add connection timeout of 5 seconds with retry logic
- Implement connection pool monitoring via Prometheus metrics

**Changes:**
- `config/database.js` - Updated pool configuration
- `lib/metrics.js` - Added pool health gauges
- `tests/load/connection-pool.test.js` - Load test validation

**Testing:**
- Load test with 2,000 concurrent users (sustained 10 min)
- Verified connection pool metrics in Grafana
- No connection timeouts observed

**Checklist:**
- [ ] Updated documentation (deployment guide)
- [ ] Added monitoring alerts for pool >80% utilization
- [ ] Tested rollback scenario
---

You: "Create the PR on branch 'fix/db-pool' targeting 'main'"

Kiro: [Tool: github/create_pull_request]

PR created: https://github.com/your-org/your-repo/pull/345

The power: Kiro read issue history, searched the codebase, and drafted a production-ready PR description without you switching between GitHub UI, terminal, and code editor.


The Core Question You’re Answering

“How do I integrate my issue tracker and code repository so I can manage projects conversationally without context switching?”

Traditional workflows require:

  1. Open GitHub in browser
  2. Read issue
  3. Switch to terminal
  4. Grep codebase for relevant files
  5. Switch back to browser
  6. Draft PR description (copying from terminal)

This project eliminates the context-switching tax by giving Kiro unified access to both code and project management.


Concepts You Must Understand First

Stop and research these before coding:

  1. GitHub Personal Access Tokens (PAT)
    • What scopes does the GitHub MCP server require (repo, read:org)?
    • How do you create a fine-grained token with minimal permissions?
    • Where should you store tokens securely (never in git)?
    • Web Reference: GitHub - Creating a Personal Access Token
  2. GitHub Search Syntax
    • How do you filter issues by label, state, author, and date?
    • What’s the difference between searching issues vs code?
    • How do you use qualifiers like is:pr, is:open, label:bug?
    • Web Reference: GitHub - Searching Issues and Pull Requests
  3. MCP Tools vs Resources
    • What’s the difference between calling a tool and loading a resource?
    • How do MCP tools return structured data (JSON schemas)?
    • How do you handle pagination in GitHub API responses?
    • Web Reference: Model Context Protocol - Tools

Questions to Guide Your Design

Before implementing, think through these:

  1. Token Permissions
    • Should you use a classic PAT or fine-grained token?
    • What’s the minimum scope needed (read-only issues, or also write PRs)?
    • How do you rotate tokens without breaking the MCP server?
    • Should you use organization-level tokens or personal tokens?
  2. Workflow Automation
    • What tasks should Kiro handle automatically (summarize) vs require approval (create PR)?
    • How do you prevent Kiro from creating duplicate issues?
    • Should Kiro be able to close issues, or only comment?
    • How do you audit all actions Kiro takes on GitHub?
  3. Context Boundaries
    • How much issue history should Kiro load (all comments, or just descriptions)?
    • Should Kiro analyze linked PRs and commits?
    • How do you prevent Kiro from leaking private repo data in logs?

Thinking Exercise

Scenario: Triaging a Bug Report

Given this GitHub issue:

Title: App crashes on iOS 14 when uploading photos

Body:
Steps to reproduce:
1. Open app on iPhone 8 (iOS 14.8)
2. Tap "Upload Photo"
3. Select image from camera roll
4. App crashes immediately

Expected: Photo uploads successfully
Actual: App crashes, no error message

Environment:
- iPhone 8, iOS 14.8
- App version 2.3.1

Questions Kiro should investigate:

  1. Search for similar issues:
    • Query: repo:your-org/your-app is:issue label:ios "upload" "crash"
    • Are there duplicates or related bugs?
  2. Analyze codebase:
    • Where is photo upload logic? (grep -r "upload.*photo" src/)
    • What changed in version 2.3.1? (git diff v2.3.0..v2.3.1)
  3. Check iOS version compatibility:
    • Does the app support iOS 14? (check package.json or podfile)
    • Were there recent iOS SDK changes?
  4. Draft response:
    • If it’s a duplicate: link to existing issue
    • If it’s new: suggest debug steps (enable logging, check memory usage)
    • If it’s a known limitation: explain iOS 14 support status

Kiro’s workflow should be:

  1. Search issues → 2. Search code → 3. Check git history → 4. Draft comment
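
For reference, here is what step 1 of that workflow looks like against the raw GitHub REST API, outside of Kiro. A sketch using requests; the repository name, query, and label are placeholders, and GITHUB_TOKEN is assumed to be exported:

# triage_search.py - search for similar open issues (step 1 of the workflow above)
import os
import requests

def search_issues(repo, query, label=None):
    q = f"repo:{repo} is:issue is:open {query}"
    if label:
        q += f" label:{label}"
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": q, "per_page": 20},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["items"]

if __name__ == "__main__":
    for issue in search_issues("your-org/your-app", '"upload" "crash"', label="ios"):
        print(f"#{issue['number']} {issue['title']} ({issue['state']})")

The GitHub MCP server wraps the same search endpoint; seeing the raw call makes it easier to debug why a Kiro search returns nothing (wrong qualifiers, wrong repo, or an expired token).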

The Interview Questions They’ll Ask

  1. “How would you design an AI assistant that can triage GitHub issues without creating spam or incorrect closures?”

  2. “Explain the security implications of granting an AI write access to your GitHub repository. What guardrails would you implement?”

  3. “How do you prevent an AI from leaking private repository information when generating responses?”

  4. “Describe how you would implement approval workflows for AI-generated pull requests in a team environment.”

  5. “What strategies would you use to keep GitHub integration working across API version changes and rate limits?”


Hints in Layers

Hint 1: Create a GitHub Personal Access Token

  1. Go to GitHub Settings → Developer settings → Personal access tokens → Fine-grained tokens
  2. Set repository access (all repos or specific ones)
  3. Grant permissions: Contents: Read, Issues: Read/Write, Pull Requests: Read/Write
  4. Copy token and store in .env file (never commit to git)

Hint 2: Configure MCP Server Reference the token through an environment variable so it is never hardcoded in the config:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}

Hint 3: Test GitHub Integration

kiro chat

You: "List the 5 most recently updated issues in my repo"

# Kiro should call: github/search_issues
# You should see a list of issues with titles, numbers, and states

Hint 4: Draft a PR Description Template Create a steering file (.kiro/steering/github-pr-template.md):

# Pull Request Template

Always include these sections when drafting PRs:
- **Problem:** What issue does this solve?
- **Solution:** High-level approach
- **Changes:** Files modified
- **Testing:** How to verify
- **Checklist:** Deployment steps

Books That Will Help

Topic Book Chapter
GitHub Workflow “Pro Git” by Scott Chacon Ch. 6 (GitHub)
API Integration “Designing Data-Intensive Applications” by Martin Kleppmann Ch. 4 (Encoding)
Automation Best Practices “The DevOps Handbook” by Gene Kim et al. Part III (Flow)

Common Pitfalls & Debugging

Problem 1: “MCP server fails with ‘Bad credentials’”

  • Why: GitHub token is invalid or expired
  • Fix: Regenerate token with correct scopes, update .env file
  • Quick test: curl -H "Authorization: token YOUR_TOKEN" https://api.github.com/user

Problem 2: “Kiro can read issues but cannot create PRs”

  • Why: Token lacks Pull Requests: Write permission
  • Fix: Edit token permissions in GitHub settings
  • Quick test: gh pr create --title "Test" --body "Test" (using GitHub CLI)

Problem 3: “Rate limit exceeded errors”

  • Why: GitHub API has rate limits (5,000 requests/hour for authenticated)
  • Fix: Implement caching or reduce query frequency
  • Quick test: Check remaining quota: curl -H "Authorization: token YOUR_TOKEN" https://api.github.com/rate_limit

Problem 4: “Kiro creates PR with wrong base branch”

  • Why: Default branch is not ‘main’ in your repo (might be ‘master’ or ‘develop’)
  • Fix: Specify base branch explicitly in steering file or prompt
  • Quick test: git remote show origin | grep "HEAD branch"

Definition of Done

  • Created GitHub Personal Access Token with appropriate scopes
  • Configured github-mcp in .kiro/settings/mcp.json with secure token
  • Verified Kiro can list and search issues/PRs successfully
  • Kiro can read file contents from repository
  • Drafted at least one PR description using issue context + codebase search
  • Tested that token has minimal required permissions (no admin access)
  • Created steering file with PR description template
  • Documented all GitHub integration workflows in README

Project 11: “The AWS Cloud Architect” — Cloud Infrastructure

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Terraform / AWS CLI
Coolness Level Level 3: Genuinely Clever
Difficulty Level 3: Advanced
Knowledge Area Cloud Infrastructure

What you’ll build: Use AWS MCP to audit S3 buckets and generate Terraform fixes.

Why it teaches Cloud Ops: Kiro becomes an infra reviewer with live state access.

Success criteria:

  • Generated Terraform closes at least one real misconfiguration.

Real World Outcome

You’ll configure the AWS MCP server so Kiro can audit your cloud infrastructure, identify security misconfigurations (public S3 buckets, overly permissive IAM policies), and generate Terraform code to remediate them—turning Kiro into an automated security auditor.

Configuration (.kiro/settings/mcp.json):

{
  "mcpServers": {
    "aws": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-aws"],
      "env": {
        "AWS_PROFILE": "production-readonly",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}

When you ask Kiro to audit infrastructure:

$ kiro chat

You: "Audit all S3 buckets for public access and generate Terraform fixes"

Kiro: [Tool: aws/list_s3_buckets]
Found 47 S3 buckets. Checking public access settings...

[Tool: aws/get_bucket_policy] (for each bucket)
[Tool: aws/get_bucket_acl]

**Security Audit Results:**

🚨 **Critical Issues Found (3 buckets):**

1. **app-logs-production** (public read access)
   - Bucket Policy: Allows s3:GetObject for Principal: "*"
   - Risk: Sensitive application logs exposed to internet
   - Created: 2023-05-12
   - Size: 2.4 TB

2. **user-uploads-staging** (public write access)
   - ACL: Grant WRITE to AllUsers
   - Risk: Anyone can upload files (potential malware hosting)
   - Created: 2024-08-03
   - Size: 340 GB

3. **backup-configs** (public list access)
   - Bucket Policy: Allows s3:ListBucket for Principal: "*"
   - Risk: Attackers can enumerate backup file names
   - Created: 2022-11-20
   - Size: 56 GB

**Generating Terraform Fixes...**

Created: `terraform/s3-security-fixes.tf`

# Fix for app-logs-production: Remove public access
resource "aws_s3_bucket_public_access_block" "app_logs_production" {
  bucket = "app-logs-production"

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_policy" "app_logs_production" {
  bucket = "app-logs-production"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "DenyPublicAccess"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:GetObject"
        Resource  = "arn:aws:s3:::app-logs-production/*"
        Condition = {
          StringNotEquals = {
            "aws:PrincipalOrgID" = "o-abc12345"  # Your AWS Organization ID
          }
        }
      }
    ]
  })
}

# Fix for user-uploads-staging: Remove public ACL
resource "aws_s3_bucket_acl" "user_uploads_staging" {
  bucket = "user-uploads-staging"
  acl    = "private"
}

# Fix for backup-configs: Block all public access
resource "aws_s3_bucket_public_access_block" "backup_configs" {
  bucket = "backup-configs"

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Apply with:

cd terraform/
terraform plan -out=security-fixes.tfplan
terraform apply security-fixes.tfplan

Estimated Impact:

  • Fixes 3 critical security vulnerabilities
  • Protects 2.8 TB of sensitive data
  • Prevents potential compliance violations (GDPR, SOC 2)

The power: Kiro queried live AWS state, identified real security issues, and generated production-ready Terraform code to fix them.


The Core Question You’re Answering

“How can I continuously audit cloud infrastructure for security misconfigurations without manually reviewing every resource?”

Traditional cloud auditing requires:

  1. Manually checking AWS Console
  2. Writing custom scripts to query APIs
  3. Remembering which settings to check
  4. Manually drafting Terraform fixes

This project automates the entire workflow: Kiro becomes your infrastructure security engineer that never sleeps.


Concepts You Must Understand First

Stop and research these before coding:

  1. AWS IAM Policies and Permissions
    • What’s the difference between identity-based and resource-based policies?
    • How do you create a read-only IAM role for infrastructure auditing?
    • What permissions are needed to list and describe S3 buckets?
    • Book Reference: “AWS Security” by Dylan Shield - Ch. 2
  2. S3 Security Model
    • What’s the difference between Bucket Policies, ACLs, and Public Access Blocks?
    • How do you check if a bucket is publicly accessible?
    • What are the risks of public S3 buckets (data leaks, malware hosting)?
    • Web Reference: AWS S3 Security Best Practices
  3. Terraform State Management
    • How do you import existing AWS resources into Terraform state?
    • What’s the difference between terraform plan and terraform apply?
    • How do you manage Terraform state for team collaboration (remote backends)?
    • Book Reference: “Terraform: Up & Running” by Yevgeniy Brikman - Ch. 3

Questions to Guide Your Design

Before implementing, think through these:

  1. Audit Scope
    • Should Kiro audit all AWS regions, or just production regions?
    • What resources should be audited (S3, IAM, EC2 security groups)?
    • How do you prevent Kiro from modifying critical infrastructure?
    • Should audits run on-demand or scheduled (cron)?
  2. Security Boundaries
    • What IAM permissions should the AWS MCP server have (read-only)?
    • How do you prevent Kiro from accidentally deleting resources?
    • Should Kiro be able to apply Terraform changes, or just generate code?
    • How do you audit Kiro’s own actions (CloudTrail logs)?
  3. Terraform Generation
    • Should generated Terraform use modules or raw resources?
    • How do you handle resources not managed by Terraform (manual imports)?
    • What naming conventions for generated Terraform files?
    • Should Kiro check if a resource is already in Terraform state before generating?

Thinking Exercise

Scenario: Multi-Account Audit

Given this AWS organization structure:

Organization (o-abc12345)
├── Production Account (123456789012)
│   ├── S3 buckets (50)
│   ├── IAM roles (200)
│   └── Security groups (300)
├── Staging Account (234567890123)
│   ├── S3 buckets (30)
│   └── IAM roles (100)
└── Development Account (345678901234)
    ├── S3 buckets (20)
    └── IAM roles (50)

Questions to consider:

Cross-Account Access:

  • How do you configure Kiro to assume roles in different accounts?
  • Should you use AWS SSO, or cross-account IAM roles?
  • What’s the minimum permission set for read-only auditing?

Audit Strategy:

# Option 1: Sequential per account
for account in prod staging dev; do
  kiro chat --prompt "Audit S3 in $account account"
done

# Option 2: Parallel with subagents
kiro chat --prompt "Audit S3 across all accounts in parallel"

Terraform State Organization:

  • One Terraform state per account, or unified state?
  • How do you prevent state conflicts when fixing the same resource type across accounts?

The Interview Questions They’ll Ask

  1. “How would you design an automated cloud security auditing system that scales across hundreds of AWS accounts?”

  2. “Explain the security implications of giving an AI access to your AWS environment. What guardrails would you implement?”

  3. “How would you prevent an AI from generating Terraform code that accidentally deletes critical infrastructure?”

  4. “Describe the difference between Terraform plan and apply, and why automation should never skip the plan step.”

  5. “How would you implement approval workflows for AI-generated infrastructure changes in a regulated industry?”

  6. “What strategies would you use to handle Terraform drift between generated code and manually modified resources?”


Hints in Layers

Hint 1: Create Read-Only AWS IAM Role

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:GetBucketPolicy",
        "s3:GetBucketAcl",
        "s3:GetBucketPublicAccessBlock",
        "iam:ListRoles",
        "iam:GetRole",
        "iam:GetRolePolicy"
      ],
      "Resource": "*"
    }
  ]
}

Hint 2: Configure AWS Credentials

# ~/.aws/config
[profile production-readonly]
role_arn = arn:aws:iam::123456789012:role/KiroAuditor
source_profile = default
region = us-east-1

Hint 3: Test AWS MCP Server

kiro chat

You: "List all S3 buckets in the current AWS account"

# Should return:
# - bucket-1
# - bucket-2
# - ...
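
If you want to sanity-check the same data outside of Kiro, a short boto3 sketch can list buckets and flag any without a full public access block. It assumes boto3 is installed, plus the production-readonly profile from Hint 2 and the permissions from Hint 1:

# audit_s3.py - flag buckets whose public access block is missing or incomplete (a sketch)
import boto3
from botocore.exceptions import ClientError

session = boto3.Session(profile_name="production-readonly", region_name="us-east-1")
s3 = session.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        block = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
        fully_blocked = all(block.values())
    except ClientError as err:
        if err.response["Error"]["Code"] != "NoSuchPublicAccessBlockConfiguration":
            raise
        fully_blocked = False  # no public access block configured at all
    print(f"{name}: {'OK' if fully_blocked else 'REVIEW - public access not fully blocked'}")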

Hint 4: Generate Terraform Incrementally Create a steering file (.kiro/steering/terraform-standards.md):

# Terraform Code Generation Standards

When generating Terraform:
1. Use `aws_s3_bucket_public_access_block` for blocking public access
2. Always include `depends_on` for resource dependencies
3. Add comments explaining WHY each resource is needed
4. Use variables for account IDs and region names

Books That Will Help

Topic Book Chapter
AWS Security “AWS Security” by Dylan Shield Ch. 2-4 (IAM, S3 Security)
Terraform “Terraform: Up & Running” by Yevgeniy Brikman Ch. 3-5
Infrastructure as Code “Infrastructure as Code” by Kief Morris Ch. 7 (Security)
Cloud Security “Cloud Security and Privacy” by Tim Mather et al. Ch. 4 (Storage Security)

Common Pitfalls & Debugging

Problem 1: “AWS MCP server fails with ‘AccessDenied’”

  • Why: IAM role lacks required permissions
  • Fix: Add s3:ListAllMyBuckets to the role policy
  • Quick test: aws s3 ls --profile production-readonly

Problem 2: “Terraform plan shows unrelated changes”

  • Why: Kiro is unaware of existing Terraform state
  • Fix: Run terraform import for existing resources before generating fixes
  • Quick test: terraform state list shows imported resources

Problem 3: “Generated Terraform uses hardcoded account IDs”

  • Why: Kiro doesn’t know to parameterize account-specific values
  • Fix: Add steering rule: “Use data.aws_caller_identity.current.account_id
  • Quick test: Generated code includes data "aws_caller_identity"

Problem 4: “Kiro recommends changes that would break production”

  • Why: Kiro doesn’t understand resource dependencies
  • Fix: Use read-only mode, require manual review before applying
  • Quick test: Only generate code, never auto-apply

Definition of Done

  • Created read-only AWS IAM role with S3 and IAM describe permissions
  • Configured AWS MCP server in .kiro/settings/mcp.json with correct profile
  • Verified Kiro can list S3 buckets and read bucket policies
  • Kiro identified at least one real security misconfiguration
  • Generated Terraform code that passes terraform validate
  • Terraform plan shows expected changes (no unintended modifications)
  • Documented the audit workflow in README
  • Created steering file with Terraform generation standards

Project 12: “The Documentation Librarian” — RAG

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Python (mcp-server-rag)
Coolness Level Level 3: Genuinely Clever
Difficulty Level 2: Intermediate
Knowledge Area RAG (Retrieval Augmented Generation)

What you’ll build: A custom MCP server that indexes PDFs or internal docs for question answering.

Why it teaches RAG: You extend Kiro with private knowledge beyond its training data.

Success criteria:

  • Kiro answers a question that is only in your private docs.

Real World Outcome

You’ll build a custom MCP server that indexes your internal documentation (PDFs, Markdown, Confluence pages) using embeddings, then lets Kiro search and answer questions from this private knowledge base—extending Kiro’s knowledge beyond its training data.

Example: Indexing Internal Documentation

$ python docs-mcp-server.py index ./company-docs/

Indexing documentation...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📄 engineering-handbook.pdf (142 pages)    ✓ Embedded
📄 api-reference.md (2,340 lines)          ✓ Embedded
📄 deployment-guide.pdf (68 pages)         ✓ Embedded
📄 security-policies.md (890 lines)        ✓ Embedded
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Total chunks: 1,847
Vector DB: ./company-docs/.vectordb/
Ready for queries ✓

Configuration (.kiro/settings/mcp.json):

{
  "mcpServers": {
    "docs": {
      "command": "python3",
      "args": ["./docs-mcp-server.py", "serve"],
      "env": {
        "VECTOR_DB_PATH": "./company-docs/.vectordb",
        "EMBEDDING_MODEL": "text-embedding-3-small"
      }
    }
  }
}

When you ask Kiro questions from internal docs:

$ kiro chat

You: "What's our process for deploying to production?"

Kiro: [Tool: docs/search_knowledge_base]
  Query: "production deployment process"
  Top 3 results:

[Tool: docs/retrieve_context]

Based on the Engineering Handbook (Section 7.2) and Deployment Guide (Chapter 3):

**Production Deployment Process:**

1. **Pre-Deployment Checklist** (Eng Handbook p.67)
   - [ ] PR approved by 2+ engineers
   - [ ] All tests passing in CI
   - [ ] Database migration tested in staging
   - [ ] Feature flags configured

2. **Deployment Window** (Deployment Guide p.24)
   - Scheduled: Tuesday/Thursday 10 AM PST
   - Emergency: Requires VP Engineering approval
   - Avoid: Friday afternoon, holiday weeks

3. **Rollout Strategy** (Deployment Guide p.31)
   - Blue-green deployment with AWS ECS
   - 10% traffic → 50% → 100% over 2 hours
   - Automatic rollback if error rate >0.1%

4. **Post-Deployment** (Eng Handbook p.71)
   - Monitor DataDog for 30 minutes
   - Check Sentry for new error spikes
   - Notify #production-deploys Slack channel

**Emergency Rollback:**
./scripts/rollback.sh --environment production --revision previous

Sources:

  • Engineering Handbook (2024-Q4), Section 7.2
  • Deployment Guide v3.1, Chapters 3-4

You: “What are the approved third-party libraries for authentication?”

Kiro: [Tool: docs/search_knowledge_base] Query: “approved authentication libraries security policy”

According to Security Policies (Section 4.5 “Approved Dependencies”):

Authentication Libraries:

Approved:

  • passport v0.7+ (Node.js) - OAuth, JWT, SAML
  • authlib v1.2+ (Python) - OAuth 2.0, OpenID Connect
  • spring-security v6.1+ (Java) - Enterprise SSO

Prohibited:

  • Custom-built JWT libraries (use approved only)
  • Deprecated: oauth2-server (EOL 2023)
  • Unvetted: Any library not on approved list

Approval Process: (Security Policies p.89)

  1. Submit request to #security-review
  2. Security team evaluates: License, maintenance, CVE history
  3. Approval SLA: 5 business days

Current Exception: (Added 2024-12-01)

  • next-auth v5.0+ approved for internal tools only (not customer-facing)

Source: Security Policies v2.3, Section 4.5


The power: Kiro accessed information that exists ONLY in your internal documentation, not in its training data. It cited exact page numbers and versions.


The Core Question You’re Answering

“How can I give an AI access to my company’s internal knowledge base without uploading sensitive documents to third-party services?”

Traditional solutions have problems:

  • Shared drives: Documents exist but aren’t searchable conversationally
  • Confluence: Requires manual searching and reading
  • Uploading to ChatGPT: Violates data privacy policies

This project solves it: build a local RAG system that keeps your data on-premises while enabling AI-powered search.


Concepts You Must Understand First

Stop and research these before coding:

  1. Embeddings and Vector Similarity
    • What is an embedding (vector representation of text)?
    • How do you measure similarity between vectors (cosine similarity, dot product)? (see the sketch after this list)
    • Why are embeddings better than keyword search for semantic matching?
    • Book Reference: “Speech and Language Processing” by Jurafsky & Martin - Ch. 6
  2. Vector Databases
    • What’s the difference between traditional databases and vector databases?
    • How do vector indexes work (HNSW, IVF)?
    • When do you use in-memory (FAISS) vs persistent (Chroma, Pinecone)?
    • Web Reference: Pinecone - What is a Vector Database? (https://www.pinecone.io/learn/vector-database/)
  3. Retrieval Augmented Generation (RAG)
    • What’s the difference between RAG and fine-tuning?
    • How do you chunk documents for optimal retrieval (size, overlap)?
    • What’s the tradeoff between context window size and retrieval accuracy?
    • Web Reference: LangChain RAG Documentation (https://python.langchain.com/docs/use_cases/question_answering/)
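
A tiny sketch of the similarity math referenced in item 1, using hand-made 3-dimensional vectors purely for illustration (real embedding models produce hundreds or thousands of dimensions):

import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the query is about deployments.
query = [0.9, 0.1, 0.0]
chunk_about_deploys = [0.8, 0.2, 0.1]
chunk_about_lunch = [0.0, 0.1, 0.95]

print(cosine_similarity(query, chunk_about_deploys))  # close to 1.0 -> relevant
print(cosine_similarity(query, chunk_about_lunch))    # close to 0.0 -> irrelevant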

Questions to Guide Your Design

Before implementing, think through these:

  1. Document Processing
    • How do you extract text from PDFs (PyPDF2, pdfplumber, or OCR)?
    • Should you split documents by page, paragraph, or semantic chunks?
    • How do you preserve metadata (source file, page number, section heading)?
    • What happens with images or tables in documents?
  2. Chunking Strategy
    • What chunk size optimizes retrieval (512 tokens, 1000 tokens)?
    • Should chunks overlap to avoid splitting important context?
    • How do you handle code blocks vs prose (different chunking strategies)?
    • Should you create multiple chunk sizes for different query types?
  3. Embedding and Retrieval
    • Which embedding model (OpenAI, Sentence Transformers, local models)?
    • How many top-k results to retrieve (3, 5, 10)?
    • Should you re-rank results after initial retrieval?
    • How do you handle queries that don’t match any documents?

Thinking Exercise

Scenario: Chunking Strategy

Given this internal documentation snippet:

# Deployment Guide

## Chapter 3: Production Deploys

### 3.1 Pre-Deployment Checklist

Before deploying to production, verify:
1. All tests pass in CI/CD pipeline
2. Database migrations tested in staging
3. Feature flags configured

### 3.2 Deployment Window

Production deploys occur:
- Scheduled: Tuesday/Thursday 10 AM PST
- Emergency: Requires VP approval

### 3.3 Rollout Strategy

We use blue-green deployment:
1. Deploy to blue environment
2. Route 10% traffic
3. Monitor for 30 minutes
4. Gradually increase to 100%

Questions while designing chunking:

Option 1: By Section (Heading-Based)

Chunk 1: "Chapter 3: Production Deploys ... 3.1 Pre-Deployment Checklist ... verify: 1. All tests..."
Chunk 2: "3.2 Deployment Window ... Production deploys occur: ..."
Chunk 3: "3.3 Rollout Strategy ... We use blue-green deployment: ..."

Option 2: Fixed Token Size (500 tokens)

Chunk 1: "Chapter 3... 3.1 Pre-Deployment... 3.2 Deployment Window... (cut mid-section)"
Chunk 2: "...Window ... Tuesday/Thursday... 3.3 Rollout... blue-green deployment..."

Option 3: Semantic (Paragraph-Based with Overlap)

Chunk 1: "Chapter 3... 3.1 Pre-Deployment Checklist... verify: 1. All tests..."
Chunk 2: "3.1 (last paragraph)... 3.2 Deployment Window... Scheduled: Tuesday..."
Chunk 3: "3.2 (last paragraph)... 3.3 Rollout Strategy... blue-green..."

Which is best for this query: “What days can I deploy to production?”

  • Option 1 ✅ - Section 3.2 is intact
  • Option 2 ❌ - Answer split across chunks
  • Option 3 ✅ - Overlap ensures context
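
Before reaching for a library splitter, you can prototype Option 1 / Option 3 in a few lines. This sketch splits on Markdown headings and carries a small character overlap between neighbouring chunks; the file name is a placeholder:

# chunk_by_heading.py - heading-based chunking with overlap (a sketch)
import re

def chunk_markdown(text, overlap_chars=200):
    # Split at heading lines (#, ##, ###) while keeping each heading with its body.
    sections = [s for s in re.split(r"(?m)^(?=#{1,3} )", text) if s.strip()]
    chunks = []
    for i, section in enumerate(sections):
        # Prepend the tail of the previous section so answers near boundaries survive.
        prefix = sections[i - 1][-overlap_chars:] if i > 0 else ""
        chunks.append(prefix + section)
    return chunks

if __name__ == "__main__":
    sample = open("deployment-guide.md").read()  # e.g. the snippet above, saved to a file
    for chunk in chunk_markdown(sample):
        print("----", chunk[:60].replace("\n", " "), "...")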

The Interview Questions They’ll Ask

  1. “Explain the difference between RAG and fine-tuning for extending an LLM’s knowledge. When would you use each?”

  2. “How would you design a document chunking strategy that balances retrieval accuracy with context preservation?”

  3. “What strategies would you use to prevent RAG systems from generating answers based on outdated documentation?”

  4. “Describe how you would handle multi-lingual documentation in a RAG system.”

  5. “How would you implement access control so users can only retrieve documents they’re authorized to see?”

  6. “What approaches would you use to evaluate RAG system quality (precision, recall, answer quality)?”


Hints in Layers

Hint 1: Document Ingestion Pipeline

# docs-mcp-server.py
from glob import glob

from langchain.document_loaders import PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

def index_documents(docs_path):
    # Load all PDFs and Markdown files under docs_path
    loaders = [
        PyPDFLoader(f) for f in glob(f"{docs_path}/**/*.pdf", recursive=True)
    ] + [
        TextLoader(f) for f in glob(f"{docs_path}/**/*.md", recursive=True)
    ]

    docs = []
    for loader in loaders:
        docs.extend(loader.load())

    # Split into chunks with overlap
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )
    chunks = splitter.split_documents(docs)

    # Create embeddings and store in vector DB
    embeddings = OpenAIEmbeddings()
    vectordb = Chroma.from_documents(chunks, embeddings, persist_directory=".vectordb")

    return vectordb

Hint 2: MCP Server Query Tool

@mcp_server.tool("search_knowledge_base")
def search_docs(query: str, top_k: int = 3):
    """Search internal documentation"""
    vectordb = Chroma(persist_directory=".vectordb", embedding_function=OpenAIEmbeddings())

    results = vectordb.similarity_search_with_score(query, k=top_k)

    return [
        {
            "content": doc.page_content,
            "source": doc.metadata["source"],
            "page": doc.metadata.get("page", "N/A"),
            "score": score
        }
        for doc, score in results
    ]

Hint 3: Metadata Preservation When loading documents, preserve source information:

doc.metadata = {
    "source": filename,
    "page": page_num,
    "section": heading,
    "last_modified": file_mtime
}

Hint 4: Hybrid Search (Keyword + Semantic) Combine vector similarity with keyword matching for better results:

# Get semantic matches
vector_results = vectordb.similarity_search(query, k=10)

# Get keyword matches (BM25)
keyword_results = bm25.get_top_n(query, documents, n=10)

# Merge and re-rank (Reciprocal Rank Fusion)
final_results = reciprocal_rank_fusion([vector_results, keyword_results])
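
Note that reciprocal_rank_fusion is not a library call (and bm25 above assumes a package such as rank-bm25). A minimal version of the fusion step, assuming each result list is ordered best-first, might look like this:

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists, rewarding items that rank high in any list."""
    scores = {}
    for results in result_lists:
        for rank, item in enumerate(results):
            # Key by text content so LangChain Documents and plain strings both work.
            key = getattr(item, "page_content", item)
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)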

Books That Will Help

Topic Book Chapter
Embeddings “Speech and Language Processing” by Jurafsky & Martin Ch. 6 (Vector Semantics)
Information Retrieval “Introduction to Information Retrieval” by Manning et al. Ch. 6 (Scoring, Term Weighting)
RAG Systems “Building LLM Applications” by Damian Fanton Ch. 4 (Retrieval)
Vector Databases “Designing Data-Intensive Applications” by Martin Kleppmann Ch. 3 (Storage Engines)

Common Pitfalls & Debugging

Problem 1: “Embeddings fail with ‘token limit exceeded’”

  • Why: Document chunks are too large (>8,191 tokens for text-embedding-3)
  • Fix: Reduce chunk_size to 1000 tokens or use recursive splitting
  • Quick test: len(tiktoken.get_encoding("cl100k_base").encode(chunk)) should be <8000

Problem 2: “RAG returns irrelevant results”

  • Why: Query and document embeddings use different semantic spaces
  • Fix: Use query expansion or rewrite user questions before embedding
  • Quick test: Manually inspect top-k results for relevance

Problem 3: “Kiro hallucinates information not in documents”

  • Why: LLM fills in gaps when retrieved context is incomplete
  • Fix: Add system prompt: “Only answer from provided context. Say ‘I don’t know’ if information isn’t in the documents.”
  • Quick test: Ask a question you know isn’t in the docs

Problem 4: “Vector DB queries are slow (>5 seconds)”

  • Why: No index optimization (brute-force search)
  • Fix: Use HNSW index in FAISS or enable indexing in Chroma
  • Quick test: Time queries: time vectordb.similarity_search(query, k=5)

Definition of Done

  • Indexed at least 3 internal documents (PDFs or Markdown)
  • MCP server exposes search_knowledge_base tool
  • Kiro successfully answers a question only present in indexed docs
  • Chunks preserve metadata (source file, page number)
  • Retrieved results include relevance scores
  • System prompt prevents hallucination beyond retrieved context
  • Documented chunking strategy and embedding model in README
  • Tested with queries that have no matching documents (graceful “I don’t know”)

Project 13: “The Custom Tool Builder (Python)” — MCP Protocol Implementation

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Python
Coolness Level Level 4: Hardcore Tech Flex
Business Potential 5. Industry Disruptor (Ecosystem)
Difficulty Level 3: Advanced
Knowledge Area MCP Protocol Implementation

What you’ll build: A custom MCP server exposing fetch_stock_price(ticker).

Why it teaches Protocol: You learn MCP as JSON-RPC over stdio.

Success criteria:

  • Kiro calls your tool and parses real output.

Real World Outcome

You’ll have a working MCP server in Python that Kiro CLI can discover and use. When you configure it in your Kiro settings and ask “What’s the current price of AAPL?”, the following happens:

1. MCP Server Registration (in ~/.config/kiro/settings.json):

{
  "mcpServers": {
    "stock-prices": {
      "command": "python3",
      "args": ["/path/to/stock_mcp_server.py"],
      "env": {
        "STOCK_API_KEY": "your_api_key_here"
      }
    }
  }
}

2. Kiro CLI Session:

$ kiro
You: What's the current price of AAPL?

[Tool Call] stock-prices.fetch_stock_price(ticker="AAPL")

Tool Response:
{
  "ticker": "AAPL",
  "price": 178.42,
  "currency": "USD",
  "timestamp": "2025-01-02T14:32:00Z",
  "change": +2.15,
  "change_percent": +1.22
}

Kiro: Apple (AAPL) is currently trading at $178.42 USD, up $2.15 (+1.22%) from the previous close.

3. Server Logs (in your terminal running the MCP server):

[2025-01-02 14:32:00] INFO: MCP Server started on stdio
[2025-01-02 14:32:00] INFO: Registered tool: fetch_stock_price
[2025-01-02 14:32:15] INFO: Received JSON-RPC request: initialize
[2025-01-02 14:32:15] INFO: Sent capabilities: {tools: 1}
[2025-01-02 14:32:18] INFO: Tool called: fetch_stock_price(ticker="AAPL")
[2025-01-02 14:32:19] INFO: Fetching from Alpha Vantage API...
[2025-01-02 14:32:20] INFO: Response sent: {"price": 178.42, ...}

You’re seeing:

  • The MCP protocol handshake (initialize request/response)
  • Tool discovery (Kiro learns about fetch_stock_price)
  • JSON-RPC call serialization
  • Real-time API integration
  • Structured data flowing back to the LLM

This is the same pattern used by production MCP servers like @modelcontextprotocol/server-postgres, @modelcontextprotocol/server-github, and custom enterprise tools.


The Core Question You’re Answering

“How do I extend an AI coding agent with custom capabilities that go beyond its built-in tools?”

Before you write any code, think about this: LLMs are powerful, but they’re fundamentally text generators. They can’t fetch real-time stock prices, query proprietary databases, or interact with internal APIs. MCP bridges this gap by giving you a protocol to expose custom functionality as “tools” the LLM can call.

This project forces you to understand:

  • The JSON-RPC protocol - How requests and responses are structured
  • Stdio transport - Why MCP uses stdin/stdout instead of HTTP
  • Tool schemas - How to declare parameters and return types
  • Error handling - What happens when your tool fails
  • State management - How to maintain connection state across calls

By the end, you’ll see that MCP is just structured conversation: the LLM sends JSON requests, your server sends JSON responses, and Kiro orchestrates the exchange.


Concepts You Must Understand First

Stop and research these before coding:

  1. JSON-RPC 2.0 Protocol
    • What are the required fields in a JSON-RPC request? (jsonrpc, method, params, id)
    • How do you distinguish a request from a response?
    • What’s the difference between a notification and a request?
    • How are errors represented in JSON-RPC?
    • Book Reference: “Computer Networks, Fifth Edition” by Tanenbaum - Ch. 9 (Application Layer)
  2. Stdio vs HTTP Transport
    • Why does MCP use stdin/stdout instead of HTTP endpoints?
    • How do you read JSON from stdin in Python without blocking?
    • What’s the difference between line-buffered and unbuffered I/O?
    • How do parent processes communicate with child processes?
    • Book Reference: “Advanced Programming in the UNIX Environment, Third Edition” by Stevens - Ch. 15 (IPC)
  3. Tool Schema Design
    • How do you specify parameter types for an LLM? (JSON Schema)
    • What’s the difference between required and optional parameters?
    • How do you document what a tool does so the LLM uses it correctly?
    • Why is return type structure important for LLM reasoning?
    • Book Reference: “REST API Design Rulebook” by Mark Massé - Ch. 4 (Metadata Design)
  4. Python Async I/O
    • What’s the difference between sys.stdin.read() and sys.stdin.readline()?
    • How do you flush stdout to ensure messages are sent immediately?
    • Why might buffered output cause MCP protocol failures?
    • How do you handle SIGTERM gracefully?

Questions to Guide Your Design

Before implementing, think through these:

  1. Tool Registration
    • How will you declare the fetch_stock_price tool to Kiro?
    • What parameters does it need? (Just ticker, or also date, interval?)
    • What should the return schema look like for maximum LLM usefulness?
    • Should errors be returned as exceptions or structured error objects?
  2. API Integration
    • Which stock API will you use? (Alpha Vantage, Finnhub, Yahoo Finance?)
    • How will you handle API rate limits?
    • What happens if the API is down or slow?
    • Should you cache responses to avoid redundant calls?
  3. Error Handling
    • What if the ticker symbol is invalid? (Return error or null?)
    • What if the API key is missing or expired?
    • What if the network request times out?
    • How will you communicate these errors to the LLM clearly?
  4. Protocol Compliance
    • How will you implement the MCP handshake (initialize request)?
    • What capabilities will you advertise? (Just tools, or also resources?)
    • How will you parse incoming JSON-RPC without crashing on malformed input?
    • What logging will help you debug protocol issues?

Thinking Exercise

Trace the MCP Handshake

Before coding, manually trace what happens when Kiro starts your MCP server:

Step 1: Kiro starts your server

$ python3 stock_mcp_server.py

Step 2: Kiro sends an initialize request via stdin:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "clientInfo": {"name": "kiro", "version": "1.0.0"}
  }
}

Step 3: Your server must respond with capabilities:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2024-11-05",
    "capabilities": {
      "tools": {}
    },
    "serverInfo": {"name": "stock-prices", "version": "0.1.0"}
  }
}

Step 4: Kiro sends a tools/list request:

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/list"
}

Step 5: Your server lists available tools:

{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "fetch_stock_price",
        "description": "Get real-time stock price for a ticker symbol",
        "inputSchema": {
          "type": "object",
          "properties": {
            "ticker": {
              "type": "string",
              "description": "Stock ticker symbol (e.g., AAPL, GOOGL)"
            }
          },
          "required": ["ticker"]
        }
      }
    ]
  }
}

Questions while tracing:

  • What happens if your server sends a response with the wrong id?
  • Why must you flush stdout after each JSON response?
  • What if Kiro sends a tools/call before you’ve responded to initialize?
  • How would you implement a timeout if the API takes too long?
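
You can drive this handshake yourself without Kiro, which is also the answer to interview question 7 below. A sketch that assumes your server is stock_mcp_server.py and exchanges newline-delimited JSON-RPC messages:

# handshake_test.py - replay Steps 1-5 against your server without Kiro (a sketch)
import json
import subprocess

proc = subprocess.Popen(
    ["python3", "stock_mcp_server.py"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def rpc(message):
    # Send one JSON-RPC message per line and read one response line back.
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

print(rpc({"jsonrpc": "2.0", "id": 1, "method": "initialize",
           "params": {"protocolVersion": "2024-11-05",
                      "clientInfo": {"name": "test", "version": "0.0.1"}}}))
print(rpc({"jsonrpc": "2.0", "id": 2, "method": "tools/list"}))
proc.terminate()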

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the difference between JSON-RPC and REST APIs. Why does MCP use JSON-RPC over stdio instead of HTTP?”

  2. “Your MCP server is registered in Kiro settings, but the tool isn’t appearing. Walk me through your debugging process.”

  3. “How would you handle API rate limits in an MCP server? Should you retry automatically or return an error to the LLM?”

  4. “What happens if your MCP server crashes mid-conversation? How does Kiro detect this, and what’s the recovery process?”

  5. “If you wanted to add authentication to your stock API calls (e.g., user-specific API keys), how would you design that in MCP?”

  6. “Describe the lifecycle of an MCP server process. When is it started, and when is it terminated?”

  7. “How would you test an MCP server without running Kiro? Can you simulate the protocol manually?”


Hints in Layers

Hint 1: Starting Point Your MCP server is just a Python script that reads JSON from stdin and writes JSON to stdout. Start by implementing a simple echo server: read a line, parse it as JSON, send back a response with the same id. Once that works, add the MCP-specific methods (initialize, tools/list, tools/call).

Hint 2: Structure Create a handle_request(request) function that dispatches based on request["method"]. Use a dictionary to map methods to handler functions:

handlers = {
    "initialize": handle_initialize,
    "tools/list": handle_tools_list,
    "tools/call": handle_tools_call
}
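
A minimal main loop that wires this dispatch table to stdio might look like the following sketch. It assumes the handlers dict above, one newline-delimited JSON-RPC message per line, and that each handler returns the "result" payload for its request:

import json
import sys

def main():
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)  # see Problem 4 below for hardening this
        handler = handlers.get(request.get("method"))
        if handler is None:
            response = {"jsonrpc": "2.0", "id": request.get("id"),
                        "error": {"code": -32601, "message": "Method not found"}}
        else:
            response = {"jsonrpc": "2.0", "id": request.get("id"),
                        "result": handler(request)}
        sys.stdout.write(json.dumps(response) + "\n")
        sys.stdout.flush()  # unflushed output stalls the whole protocol

if __name__ == "__main__":
    main()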

Hint 3: JSON-RPC Response Format Every response must include:

  • "jsonrpc": "2.0"
  • "id": <same as request>
  • Either "result": {...} for success or "error": {...} for failure

Always flush stdout after writing: sys.stdout.flush()

Hint 4: Tool Schema For tools/list, return a list of tools with JSON Schema for parameters. The inputSchema must follow JSON Schema Draft 7:

{
  "name": "fetch_stock_price",
  "description": "Get current stock price",
  "inputSchema": {
    "type": "object",
    "properties": {
      "ticker": {"type": "string", "description": "e.g., AAPL"}
    },
    "required": ["ticker"]
  }
}

Hint 5: API Integration For the stock API, use requests with error handling:

try:
    response = requests.get(api_url, params={"symbol": ticker}, timeout=5)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.Timeout:
    return {"error": "API timeout"}
except requests.exceptions.RequestException as e:
    return {"error": f"API error: {str(e)}"}

Hint 6: Debugging Log everything to stderr (not stdout, which is used for protocol):

import sys
sys.stderr.write(f"[DEBUG] Received request: {request}\n")
sys.stderr.flush()

Run your server manually and paste JSON requests to test:

$ python3 stock_mcp_server.py
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}

Books That Will Help

Topic Book Chapter
JSON-RPC Protocol “Computer Networks, Fifth Edition” by Tanenbaum Ch. 9 (RPC and Middleware)
Stdio/Process Communication “Advanced Programming in the UNIX Environment” by Stevens Ch. 15 (IPC)
JSON Schema “REST API Design Rulebook” by Mark Massé Ch. 4 (Metadata Design)
Python I/O “Fluent Python, 2nd Edition” by Luciano Ramalho Ch. 21 (Asynchronous Programming)
API Design “REST API Design Rulebook” by Mark Massé Ch. 2 (Identifier Design)

Common Pitfalls & Debugging

Problem 1: “Kiro doesn’t see my MCP server”

  • Why: The command path in settings.json is incorrect, or Python isn’t in PATH
  • Fix: Use absolute paths: "command": "/usr/bin/python3" and "args": ["/full/path/to/server.py"]
  • Quick test: Run the exact command manually: /usr/bin/python3 /full/path/to/server.py

Problem 2: “Server starts but tool doesn’t appear”

  • Why: You didn’t respond to tools/list correctly, or JSON is malformed
  • Fix: Add logging to stderr and check that tools/list returns valid JSON Schema
  • Quick test: Pipe a tools/list request manually:
    $ echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | python3 server.py
    

Problem 3: “Tool calls return no data”

  • Why: You’re not flushing stdout, or the API call is failing silently
  • Fix: Always sys.stdout.flush() after writing JSON, and log API errors to stderr
  • Quick test: Add sys.stderr.write(f"API response: {data}\n") before returning

Problem 4: “Server crashes on malformed JSON”

  • Why: Kiro sent unexpected input, or your JSON parsing is brittle
  • Fix: Wrap json.loads() in try/except and return a JSON-RPC error:
    try:
      request = json.loads(line)
    except json.JSONDecodeError as e:
      error_response = {
          "jsonrpc": "2.0",
          "id": None,
          "error": {"code": -32700, "message": "Parse error"}
      }
    
  • Quick test: Send invalid JSON to your server and verify it doesn’t crash

Problem 5: “API rate limit exceeded”

  • Why: You’re calling the API on every request without caching
  • Fix: Implement a simple cache with TTL:
    cache = {}
    def fetch_with_cache(ticker):
        if ticker in cache and time.time() - cache[ticker]["time"] < 60:
            return cache[ticker]["data"]
        data = fetch_from_api(ticker)
        cache[ticker] = {"data": data, "time": time.time()}
        return data
    
  • Quick test: Call the same ticker twice rapidly and verify only one API call is made

Definition of Done

  • MCP server responds correctly to initialize request
  • tools/list returns fetch_stock_price with valid JSON Schema
  • tools/call with valid ticker returns real-time price data
  • Invalid ticker symbols return clear error messages (not crashes)
  • Server logs all requests and responses to stderr for debugging
  • Kiro CLI can successfully call the tool and display results
  • API errors (timeout, rate limit) are handled gracefully
  • Server can be stopped cleanly with Ctrl+C (no orphan processes)
  • Code includes comments explaining the JSON-RPC protocol flow
  • README.md documents how to configure the server in Kiro settings

Project 14: “The File System Guardian (Node.js)” — MCP Protocol Implementation

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language TypeScript / Node.js
Coolness Level Level 4: Hardcore Tech Flex
Difficulty Level 3: Advanced
Knowledge Area MCP Protocol Implementation

What you’ll build: A sandboxed MCP server with high-level tools like scaffold_react_component(name, props).

Why it teaches Abstraction: You encode best practices into single, safe tools.

Success criteria:

  • The tool generates consistent component structure every run.

Real World Outcome

You’ll have a TypeScript MCP server that provides high-level file system abstractions. Instead of letting the LLM call raw write_file repeatedly, you expose semantic tools like scaffold_react_component that encode your team’s best practices.

1. MCP Server Configuration (in ~/.config/kiro/settings.json):

{
  "mcpServers": {
    "fs-guardian": {
      "command": "node",
      "args": ["/path/to/fs-guardian/dist/index.js"],
      "env": {
        "WORKSPACE_ROOT": "/path/to/your/project",
        "ALLOWED_DIRS": "src/components,src/lib,tests"
      }
    }
  }
}

2. Kiro CLI Session:

$ kiro
You: Create a Button component that accepts label, onClick, and variant (primary/secondary) props

[Tool Call] fs-guardian.scaffold_react_component({
  name: "Button",
  props: ["label: string", "onClick: () => void", "variant: 'primary' | 'secondary'"]
})

Tool Response:
{
  "success": true,
  "files_created": [
    "src/components/Button/Button.tsx",
    "src/components/Button/Button.test.tsx",
    "src/components/Button/Button.module.css",
    "src/components/Button/index.ts"
  ],
  "message": "React component 'Button' scaffolded successfully"
}

Kiro: I've created the Button component with the requested props. The component includes:
- TypeScript definition with proper prop types
- Jest unit test scaffolding
- CSS module for styling
- Barrel export in index.ts

All files are in src/components/Button/

3. Generated Files:

src/components/Button/Button.tsx:

import React from 'react';
import styles from './Button.module.css';

interface ButtonProps {
  label: string;
  onClick: () => void;
  variant: 'primary' | 'secondary';
}

export const Button: React.FC<ButtonProps> = ({ label, onClick, variant }) => {
  return (
    <button
      className={`${styles.button} ${styles[variant]}`}
      onClick={onClick}
    >
      {label}
    </button>
  );
};

src/components/Button/Button.test.tsx:

import { render, screen, fireEvent } from '@testing-library/react';
import { Button } from './Button';

describe('Button', () => {
  it('renders with label', () => {
    render(<Button label="Click me" onClick={() => {}} variant="primary" />);
    expect(screen.getByText('Click me')).toBeInTheDocument();
  });

  it('calls onClick when clicked', () => {
    const handleClick = jest.fn();
    render(<Button label="Click" onClick={handleClick} variant="primary" />);
    fireEvent.click(screen.getByText('Click'));
    expect(handleClick).toHaveBeenCalledTimes(1);
  });
});

4. Server Logs:

[2025-01-02 15:45:00] INFO: FS Guardian MCP Server started
[2025-01-02 15:45:00] INFO: Workspace root: /path/to/your/project
[2025-01-02 15:45:00] INFO: Allowed directories: src/components, src/lib, tests
[2025-01-02 15:45:15] INFO: Tool called: scaffold_react_component
[2025-01-02 15:45:15] INFO: Validating path: src/components/Button
[2025-01-02 15:45:15] INFO: Creating directory: src/components/Button
[2025-01-02 15:45:15] INFO: Writing file: Button.tsx (142 lines)
[2025-01-02 15:45:15] INFO: Writing file: Button.test.tsx (87 lines)
[2025-01-02 15:45:15] INFO: Writing file: Button.module.css (24 lines)
[2025-01-02 15:45:15] INFO: Writing file: index.ts (1 line)
[2025-01-02 15:45:15] INFO: Success: 4 files created

What you’re seeing:

  • Path sandboxing - Server refuses to write outside ALLOWED_DIRS
  • Template generation - Consistent structure (component, test, styles, barrel export)
  • Type safety - TypeScript interfaces generated from prop descriptions
  • Best practices encoded - Testing setup, CSS modules, proper exports
  • Atomic operations - All files created together or none at all

This pattern prevents the LLM from creating inconsistent file structures or writing to dangerous locations.


The Core Question You’re Answering

“How do I create safe, high-level abstractions over file system operations that encode organizational best practices?”

Think about the problem: If you give an LLM direct file system access via basic write_file tools, it might:

  • Create components in random directories
  • Forget to add tests
  • Use inconsistent naming conventions
  • Write to system directories (/etc, /usr/bin)
  • Overwrite critical files

This project teaches you to:

  • Build guardrails - Restrict operations to safe paths
  • Encode patterns - Capture “how we do things here” in reusable tools
  • Validate inputs - Ensure the LLM provides well-formed requests
  • Provide feedback - Return structured results the LLM can reason about
  • Make tools atomic - Either all files are created or none

By the end, you’ll understand how to transform low-level primitives into high-level, safe, team-specific abstractions.


Concepts You Must Understand First

Stop and research these before coding:

  1. Path Traversal Attacks
    • What is a path traversal attack? (e.g., ../../etc/passwd)
    • How do you validate that a path is within allowed directories?
    • What’s the difference between relative and absolute paths in validation?
    • How does path.resolve() help prevent directory traversal?
    • Book Reference: “The Web Application Hacker’s Handbook” by Stuttard & Pinto - Ch. 10
  2. File System Atomicity
    • What happens if your script crashes halfway through creating files?
    • How do you ensure “all or nothing” behavior?
    • What’s a transaction-like pattern for file operations?
    • How would you implement rollback if one file write fails?
    • Book Reference: “Operating Systems: Three Easy Pieces” - Ch. 40 (File System Implementation)
  3. Template Systems
    • How do you generate code from templates without embedding business logic in strings?
    • What’s the difference between string interpolation and proper templating?
    • How do you ensure generated code is syntactically valid?
    • Should you use a library (Handlebars, EJS) or custom logic?
    • Book Reference: “Compilers: Principles and Practice” - Ch. 2 (Lexical Analysis)
  4. TypeScript MCP Server Structure
    • How do you type MCP requests and responses in TypeScript?
    • What’s the recommended way to handle stdio in Node.js? (readline, streams)
    • How do you structure a TypeScript project for deployment?
    • What build process converts TS to JS for distribution?
    • Book Reference: “Programming TypeScript” by Boris Cherny - Ch. 10 (Modules)

Questions to Guide Your Design

Before implementing, think through these:

  1. Safety Boundaries
    • Which directories should be allowed? (Only src/? Or also tests/, docs/?)
    • Should you allow overwriting existing files, or only create new ones?
    • How will you communicate “permission denied” errors to the LLM clearly?
    • Should you log all file operations for audit purposes?
  2. Tool Granularity
    • Should you create one tool per component type? (scaffold_react_component, scaffold_vue_component)
    • Or one generic scaffold_component that takes framework as a parameter?
    • How many templates do you need to support your team’s patterns?
    • Should tests be optional or always included?
  3. Validation
    • How will you parse the props parameter? (Array of strings? TypeScript syntax?)
    • What if the LLM provides invalid TypeScript type syntax?
    • Should you validate component names against naming conventions? (PascalCase? No special chars?)
    • How will you handle edge cases like empty prop lists?
  4. Error Reporting
    • If a file already exists, return an error or auto-increment the name?
    • If path validation fails, explain why in a way the LLM can fix?
    • Should partial failures (3/4 files written) trigger a rollback?
    • How much detail should error messages include?

Thinking Exercise

Design the Sandboxing Logic

Before coding, trace what happens when the LLM tries to exploit your server:

Attack 1: Path Traversal

scaffold_react_component({
  name: "../../../../../../etc/SystemConfig"
})

Your validation logic:

  1. Resolve src/components/../../../../../../etc/SystemConfig to absolute path
  2. Check if resolved path starts with WORKSPACE_ROOT + "/src/components"
  3. If not, reject with error: “Path outside allowed directories”

Attack 2: Overwrite Critical Files

scaffold_react_component({
  name: "../../../package.json"
})

Your validation logic:

  1. Resolve to /path/to/project/package.json
  2. Check against allowed directories
  3. Reject (even though package.json exists, it’s not in src/components)

Attack 3: Malformed Props

scaffold_react_component({
  name: "Button",
  props: ["onClick: () => { console.log('pwned'); return void; }"]
})

Your naive prop parser would:

  1. Split onClick: () => { console.log('pwned'); return void; } on the first :
  2. Extract the “type”: () => { console.log('pwned'); return void; }
  3. Accept it, even though it is not a valid TypeScript type at all (it is a function body full of statements)
  4. Embed it verbatim in the generated interface: onClick: () => { console.log('pwned'); return void; }

The result is attacker-chosen text injected into your generated source file. A real parser should reject anything that does not look like a plain type annotation (a sketch of such a validator follows the questions below).

Questions while designing:

  • Should you sanitize prop types, or trust the LLM to provide valid TypeScript?
  • What if the prop type includes backticks or quotes that break template strings?
  • How do you detect if a path is absolute vs. relative?
  • Should you allow symlinks, or only real directories?
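
One way to answer the first question above is a strict allow-list validator that rejects anything statement-like. A minimal sketch (the isSafePropType helper is hypothetical, not part of any Kiro or MCP API):

// Hypothetical helper: accept only strings that look like type annotations,
// reject anything with braces, semicolons, quotes, or backticks that could
// smuggle executable-looking code into generated source.
function isSafePropType(type: string): boolean {
  if (type.length > 200) return false;          // keep types short
  if (/[;`'"\\{}]/.test(type)) return false;    // no statements, string literals, or object bodies
  return /^[\w\s<>,\[\]|&():=.-]+$/.test(type); // identifiers, generics, unions, arrays, arrow types
}

isSafePropType("() => void");                      // true
isSafePropType("Array<{id: number}>");             // false under this strict rule; widen it deliberately if you need inline object types
isSafePropType("() => { console.log('pwned'); }"); // false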

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain how path traversal attacks work. How does your MCP server prevent ../../etc/passwd from being written?”

  2. “Your server creates 4 files per component. What happens if the 3rd file write fails? How do you ensure atomicity?”

  3. “How would you extend this server to support different component frameworks (React, Vue, Svelte) without duplicating code?”

  4. “If the LLM provides a prop type like data: Array<{id: number, nested: {value: string}}>, how do you parse and validate it?”

  5. “What’s the security difference between validating paths before resolution vs. after path.resolve()?”

  6. “How would you implement rate limiting to prevent the LLM from creating 1000 components in a loop?”

  7. “Describe how you’d test this MCP server. Can you unit test it without running Kiro?”


Hints in Layers

Hint 1: Starting Point Start by implementing path validation. Create a validatePath(targetPath) function that:

  1. Resolves the path to absolute
  2. Checks if it starts with one of ALLOWED_DIRS
  3. Returns {valid: true} or {valid: false, reason: "..."}

Test this with various malicious inputs before building the rest.
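
A minimal sketch of that shape (it complements the isPathAllowed helper in Hint 5 by returning a reason the LLM can act on; WORKSPACE_ROOT and ALLOWED_DIRS come from the server configuration shown earlier):

import path from 'path';

type PathCheck = { valid: true } | { valid: false; reason: string };

function validatePath(targetPath: string): PathCheck {
  const root = process.env.WORKSPACE_ROOT!;
  const resolved = path.resolve(root, targetPath);
  const allowed = (process.env.ALLOWED_DIRS ?? '').split(',').map(d => path.resolve(root, d.trim()));

  // Valid only if the resolved path equals an allowed dir or sits below one (note the path.sep boundary)
  const ok = allowed.some(dir => resolved === dir || resolved.startsWith(dir + path.sep));
  return ok ? { valid: true } : { valid: false, reason: `Path '${resolved}' is outside allowed directories` };
}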

Hint 2: Atomicity Pattern Collect all file writes in an array, then execute them all at once:

const operations = [
  { path: 'Button.tsx', content: '...' },
  { path: 'Button.test.tsx', content: '...' },
  { path: 'Button.module.css', content: '...' },
  { path: 'index.ts', content: '...' }
];

try {
  for (const op of operations) {
    await fs.writeFile(op.path, op.content);
  }
} catch (error) {
  // Rollback: delete all files created so far
  for (const op of operations) {
    await fs.unlink(op.path).catch(() => {});
  }
  throw error;
}

Hint 3: Template Generation Use template literals with a helper function:

function generateComponent(name: string, props: Array<{name: string, type: string}>) {
  const propsInterface = props.map(p => `  ${p.name}: ${p.type};`).join('\n');
  const destructuredProps = props.map(p => p.name).join(', ');

  return `import React from 'react';
import styles from './${name}.module.css';

interface ${name}Props {
${propsInterface}
}

export const ${name}: React.FC<${name}Props> = ({ ${destructuredProps} }) => {
  return <div className={styles.${name.toLowerCase()}}>{/* TODO */}</div>;
};`;
}

Hint 4: Prop Parsing Parse prop strings like "label: string" using regex:

function parseProp(propString: string): {name: string, type: string} {
  const match = propString.match(/^(\w+):\s*(.+)$/);
  if (!match) throw new Error(`Invalid prop: ${propString}`);
  return { name: match[1], type: match[2].trim() };
}

Hint 5: Path Validation Use path.resolve() and startsWith():

import path from 'path';

function isPathAllowed(targetPath: string): boolean {
  const resolved = path.resolve(process.env.WORKSPACE_ROOT!, targetPath);
  const allowedDirs = process.env.ALLOWED_DIRS!.split(',');

  return allowedDirs.some(dir => {
    const allowedPath = path.resolve(process.env.WORKSPACE_ROOT!, dir);
    // Require an exact match or a path separator boundary so that
    // "src/components-evil" does not pass a check against "src/components"
    return resolved === allowedPath || resolved.startsWith(allowedPath + path.sep);
  });
}

Hint 6: MCP Protocol Handling Reuse the JSON-RPC handler pattern from Project 13:

async function handleToolCall(params: any) {
  const { name, arguments: args } = params;

  if (name === 'scaffold_react_component') {
    return scaffoldReactComponent(args.name, args.props);
  }

  throw new Error(`Unknown tool: ${name}`);
}

Books That Will Help

Topic Book Chapter
Path Traversal & Security “The Web Application Hacker’s Handbook” by Stuttard Ch. 10 (Attacking Back-End Components)
File System Operations “Operating Systems: Three Easy Pieces” Ch. 39-40 (Files and Directories)
Template Generation “Compilers: Principles and Practice” Ch. 2 (Lexical Analysis)
TypeScript Best Practices “Programming TypeScript” by Boris Cherny Ch. 10 (Modules and Namespaces)
Node.js Streams “Node.js Design Patterns” by Mario Casciaro Ch. 5 (Streams)

Common Pitfalls & Debugging

Problem 1: “Path validation allows ../ escapes”

  • Why: You’re validating the path string instead of the resolved absolute path
  • Fix: Always use path.resolve() before validation:
    const absolutePath = path.resolve(workspaceRoot, userInput);
    if (!absolutePath.startsWith(workspaceRoot)) {
      throw new Error("Path outside workspace");
    }
    
  • Quick test: Try validatePath("src/../../etc/passwd") and ensure it’s rejected

Problem 2: “File write succeeds but file is empty”

  • Why: You forgot to await the write operation, or the content variable is undefined
  • Fix: Always await fs.writeFile() and log the content length:
    console.error(`[DEBUG] Writing ${content.length} bytes to ${filePath}`);
    await fs.writeFile(filePath, content, 'utf-8');
    
  • Quick test: Check file size after write: ls -lh src/components/Button/Button.tsx

Problem 3: “TypeScript build fails with Cannot find module”

  • Why: Your tsconfig.json doesn’t include the MCP server files
  • Fix: Update include to cover all source files:
    {
      "include": ["src/**/*", "index.ts"],
      "compilerOptions": {
        "outDir": "./dist",
        "rootDir": "./"
      }
    }
    
  • Quick test: Run tsc --noEmit to check for errors without building

Problem 4: “Server creates files but Kiro sees ‘Tool call failed’”

  • Why: You’re not returning a valid MCP response (missing result or wrong structure)
  • Fix: Ensure you return {success: true, files_created: [...]}:
    return {
      content: [{
        type: "text",
        text: JSON.stringify({
          success: true,
          files_created: filePaths,
          message: `Component '${name}' scaffolded successfully`
        })
      }]
    };
    
  • Quick test: Check server logs for the exact JSON response sent

Problem 5: “Rollback doesn’t work—partial files remain”

  • Why: Unlink errors are swallowed silently, or files aren’t tracked properly
  • Fix: Log rollback operations:
    for (const filePath of createdFiles) {
      try {
        await fs.unlink(filePath);
        console.error(`[ROLLBACK] Deleted ${filePath}`);
      } catch (err) {
        console.error(`[ROLLBACK FAILED] ${filePath}: ${err}`);
      }
    }
    
  • Quick test: Simulate a write failure on the 3rd file and verify all previous files are deleted

Definition of Done

  • Path validation correctly rejects ../ traversal attempts
  • Server refuses to write outside ALLOWED_DIRS (test with /etc/passwd)
  • scaffold_react_component generates all 4 files (component, test, CSS, barrel export)
  • Generated TypeScript compiles without errors (tsc --noEmit)
  • Generated tests run successfully (npm test)
  • Partial failures trigger rollback (all files deleted if any write fails)
  • Server logs all file operations to stderr for audit trail
  • Invalid component names return clear errors (e.g., "invalid-name" with hyphens)
  • Prop parsing handles complex types (e.g., Array<{id: number}>)
  • README.md documents the tool schema and example Kiro settings configuration

Project 15: “The Chrome Puppeteer” — Browser Automation

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language JavaScript / HTML
Software or Tool chrome-devtools-mcp
Coolness Level Level 5: Pure Magic
Difficulty Level 3: Advanced
Knowledge Area Browser Automation

What you’ll build: Use Kiro to drive a real browser and assert UI behavior.

Why it teaches E2E: You get AI-driven QA on a live page.

Success criteria:

  • Kiro captures a screenshot and reports a UI state.

Real World Outcome

You’ll have Kiro CLI connected to Chrome via the Model Context Protocol, allowing natural language browser automation. You can ask “Go to example.com and click the login button” and watch it happen in real-time.

1. MCP Server Configuration (in ~/.config/kiro/settings.json):

{
  "mcpServers": {
    "chrome": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {}
    }
  }
}

2. Kiro CLI Session:

$ kiro
You: Go to https://news.ycombinator.com and tell me the top 3 story titles

[Tool Call] chrome.puppeteer_navigate(url="https://news.ycombinator.com")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_screenshot()
Tool Response: {"success": true, "image": "<base64>"}

[Tool Call] chrome.puppeteer_evaluate(expression=`
  Array.from(document.querySelectorAll('.titleline > a'))
    .slice(0, 3)
    .map(a => a.textContent)
`)
Tool Response: {
  "result": [
    "Show HN: I built a visual regex debugger",
    "Why SQLite uses B-trees instead of LSM trees",
    "The performance cost of virtual methods in C++"
  ]
}

Kiro: The top 3 stories on Hacker News right now are:
1. Show HN: I built a visual regex debugger
2. Why SQLite uses B-trees instead of LSM trees
3. The performance cost of virtual methods in C++

3. Visual Verification: A Chrome window opens automatically and you see:

  • Browser navigates to Hacker News
  • Page loads completely
  • Screenshot captured (visible in Kiro’s context)
  • JavaScript executed in page context
  • Results returned to Kiro

4. Advanced E2E Test Example:

You: Test the login flow on staging.myapp.com - use test@example.com / password123

[Tool Call] chrome.puppeteer_navigate(url="https://staging.myapp.com/login")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_fill(selector="#email", value="test@example.com")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_fill(selector="#password", value="password123")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_click(selector="button[type=submit]")
Tool Response: {"success": true}

[Tool Call] chrome.puppeteer_wait_for_selector(selector=".dashboard-header")
Tool Response: {"success": true, "found": true}

[Tool Call] chrome.puppeteer_screenshot()
Tool Response: {"success": true, "image": "<base64>"}

Kiro: ✅ Login test passed:
- Email field filled successfully
- Password field filled successfully
- Submit button clicked
- Dashboard loaded (confirmed by presence of .dashboard-header)
- Screenshot captured showing logged-in state

What you’re seeing:

  • Natural language E2E testing - No Playwright/Selenium code to write
  • Visual feedback - Screenshots embedded in conversation
  • Selector debugging - Kiro can see the page and help fix selectors
  • Async handling - Automatic waiting for elements and navigation
  • Cross-site testing - Works on any public or localhost URL

This is the same technology behind tools like Browserbase, Skyvern, and multi-agent browser automation systems.


The Core Question You’re Answering

“How can I automate browser interactions using natural language instead of brittle test scripts?”

Traditional E2E tests look like this:

describe('Login flow', () => {
  it('should log in successfully', async () => {
    await page.goto('https://staging.myapp.com/login');
    await page.fill('#email', 'test@example.com');
    await page.fill('#password', 'password123');
    await page.click('button[type=submit]');
    await page.waitForSelector('.dashboard-header');
    expect(await page.isVisible('.dashboard-header')).toBe(true);
  });
});

Every time the UI changes (ID becomes a class, button text changes, form structure shifts), the test breaks. You spend more time maintaining tests than writing features.

This project teaches you:

  • LLM-driven selector discovery - Kiro can adapt to UI changes
  • Natural language assertions - “Verify the user is logged in” instead of brittle selectors
  • Screenshot-based debugging - AI can see what went wrong
  • Multimodal reasoning - Combine DOM inspection + visual analysis
  • Zero-code E2E testing - Write tests as conversation, not code

By the end, you’ll understand how to use Kiro as a QA engineer that can drive browsers, inspect state, and report issues—all through conversation.


Concepts You Must Understand First

Stop and research these before coding:

  1. Chrome DevTools Protocol (CDP)
    • What is the Chrome DevTools Protocol?
    • How do automation tools (Puppeteer, Selenium) communicate with Chrome?
    • What’s the difference between CDP and WebDriver?
    • How do you connect to a running Chrome instance programmatically?
    • Book Reference: “Web Performance in Action” by Jeremy Wagner - Ch. 8 (Browser Tools)
  2. DOM Querying Strategies
    • What’s the difference between CSS selectors and XPath?
    • Why are data-testid attributes better than class names for testing?
    • How do you write selectors that survive UI refactors?
    • What’s a “stable” vs “brittle” selector?
    • Book Reference: “CSS: The Definitive Guide” by Eric Meyer - Ch. 3 (Selectors)
  3. Async Browser Events
    • Why do you need to wait for elements to appear?
    • What’s the difference between waitForSelector and waitForNavigation?
    • How do you detect when a page has fully loaded (not just DOMContentLoaded)?
    • What are race conditions in browser automation?
    • Book Reference: “JavaScript: The Good Parts” by Douglas Crockford - Ch. 8 (Methods)
  4. Headless vs Headed Browsers
    • What’s the difference between headless and headed mode?
    • When should you use headless mode? (CI/CD, scraping)
    • Why might a test pass in headed mode but fail in headless?
    • How do you debug issues in headless mode?
    • Book Reference: “Web Scraping with Python” by Ryan Mitchell - Ch. 11 (JavaScript)

Questions to Guide Your Design

Before implementing, think through these:

  1. Selector Strategy
    • Should Kiro use CSS selectors, XPath, or text content matching?
    • How will you handle dynamic IDs (e.g., user-dropdown-a8f32d)?
    • What if multiple elements match a selector?
    • Should you use Kiro’s vision capabilities to verify the right element was clicked?
  2. Error Handling
    • What if a selector doesn’t exist? (Element not found)
    • What if a page doesn’t load? (Timeout)
    • What if JavaScript execution fails?
    • How will you communicate these failures to the user clearly?
  3. State Management
    • Should Kiro close the browser after each task, or keep it open?
    • How do you handle cookies and session state between tasks?
    • What if the user wants to test a multi-step flow (login → browse → checkout)?
    • Should each conversation start with a fresh browser session?
  4. Visual Debugging
    • When should Kiro automatically take screenshots?
    • Should screenshots be embedded in the conversation or saved to disk?
    • How do you handle sensitive information (passwords) in screenshots?
    • Can Kiro use computer vision to verify UI state instead of DOM inspection?

Thinking Exercise

Trace a Multi-Step Browser Flow

Before using Kiro, manually trace what should happen when you ask: “Go to Amazon, search for ‘mechanical keyboard’, and tell me the price of the first result”

Step 1: Navigate

chrome.puppeteer_navigate(url="https://amazon.com")
→ Browser loads Amazon homepage
→ Wait for page load complete

Step 2: Find Search Box

chrome.puppeteer_evaluate(`
  document.querySelector('#twotabsearchtextbox')?.placeholder
`)
→ Returns: "Search Amazon"
→ Confirms search box exists

Step 3: Fill Search Query

chrome.puppeteer_fill(selector="#twotabsearchtextbox", value="mechanical keyboard")
→ Text appears in search box

Step 4: Submit Search

chrome.puppeteer_click(selector="#nav-search-submit-button")
→ Page navigates to search results
→ Wait for navigation complete

Step 5: Wait for Results

chrome.puppeteer_wait_for_selector(selector=".s-main-slot .s-result-item")
→ Ensures at least one result exists

Step 6: Extract First Result Price

chrome.puppeteer_evaluate(`
  document.querySelector('.s-main-slot .s-result-item .a-price .a-offscreen')?.textContent
`)
→ Returns: "$89.99"

Questions while tracing:

  • What if Amazon’s HTML structure changes and selectors break?
  • How would Kiro adapt if the search box ID changes?
  • What if the page uses a loading spinner that delays results?
  • How do you handle CAPTCHA or “Are you a robot?” prompts?
  • Should Kiro use screenshots to verify the search actually happened?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the Chrome DevTools Protocol. How does Puppeteer use it to control Chrome?”

  2. “Your E2E test passes locally but fails in CI/CD. What are the most common causes, and how do you debug?”

  3. “How would you design a selector strategy that survives UI refactors? Compare data-testid attributes vs CSS classes vs XPath.”

  4. “If a button click doesn’t trigger the expected navigation, how do you debug it? Walk me through your process.”

  5. “How would you use Kiro to test a complex SPA (Single Page Application) where navigation doesn’t reload the page?”

  6. “Describe how you’d implement visual regression testing using Kiro and screenshot comparison.”

  7. “What’s the difference between waitForSelector, waitForNavigation, and waitForTimeout? When would you use each?”


Hints in Layers

Hint 1: Starting Point First, install the Puppeteer MCP server and configure Kiro to use it:

npm install -g @modelcontextprotocol/server-puppeteer

Then add it to ~/.config/kiro/settings.json under mcpServers. Start Kiro and verify the tools are available by asking “What browser tools do you have?”

Hint 2: Basic Navigation Start with simple commands:

You: Open https://example.com and take a screenshot

Kiro will call:

  1. chrome.puppeteer_navigate(url="https://example.com")
  2. chrome.puppeteer_screenshot()

The screenshot will appear in the conversation as an image.

Hint 3: Element Interaction For clicking buttons or filling forms, use selectors:

You: Fill the search box with "test query" and click submit

Kiro will inspect the page (using puppeteer_evaluate to query the DOM) and find appropriate selectors, then call:

  1. chrome.puppeteer_fill(selector="#search-input", value="test query")
  2. chrome.puppeteer_click(selector="button[type=submit]")

Hint 4: Extracting Data To extract text or data from the page:

You: What's the title of the first article on Hacker News?

Kiro will execute JavaScript in the page context:

chrome.puppeteer_evaluate(`
  document.querySelector('.titleline > a')?.textContent
`)

Hint 5: Handling Waits For dynamic content that loads asynchronously:

You: Wait for the results to load, then tell me how many items are shown

Kiro will use:

chrome.puppeteer_wait_for_selector(selector=".result-item", timeout=5000)

Then count the results:

chrome.puppeteer_evaluate(`
  document.querySelectorAll('.result-item').length
`)

Hint 6: Debugging Failed Selectors If a selector doesn’t work, ask Kiro to inspect the page:

You: The login button selector isn't working. Can you look at the page and find the right selector?

Kiro will take a screenshot and execute:

chrome.puppeteer_evaluate(`
  Array.from(document.querySelectorAll('button')).map(b => ({
    text: b.textContent,
    id: b.id,
    classes: b.className
  }))
`)

This gives Kiro the full list of buttons to choose from.


Books That Will Help

Topic Book Chapter
Chrome DevTools Protocol “Web Performance in Action” by Jeremy Wagner Ch. 8 (Browser Developer Tools)
DOM Selectors “CSS: The Definitive Guide” by Eric Meyer Ch. 3 (Selectors)
Browser Automation “Web Scraping with Python” by Ryan Mitchell Ch. 11 (JavaScript and AJAX)
Async JavaScript “JavaScript: The Good Parts” by Douglas Crockford Ch. 8 (Methods)
E2E Testing Patterns “Testing JavaScript Applications” by Lucas da Costa Ch. 9 (E2E Testing)

Common Pitfalls & Debugging

Problem 1: “Element not found” errors

  • Why: The selector is wrong, or the element hasn’t loaded yet
  • Fix: Ask Kiro to inspect the page first:
    You: Take a screenshot and show me all buttons on the page
    

    Then refine your selector based on what Kiro finds.

  • Quick test: Open Chrome DevTools manually and test the selector in the console: document.querySelector('your-selector')

Problem 2: “Navigation timeout” errors

  • Why: The page is slow, blocked by CAPTCHA, or the URL is wrong
  • Fix: Increase timeout or check network tab for errors:
    You: Navigate to example.com and wait up to 30 seconds for the page to load
    
  • Quick test: Load the URL manually in Chrome and check the Network tab for failed requests

Problem 3: “Click doesn’t do anything”

  • Why: The element is hidden, covered by another element, or requires JavaScript to be enabled
  • Fix: Verify the element is visible:
    chrome.puppeteer_evaluate(`
      (() => {
        const el = document.querySelector('button.submit');
        if (!el) return { found: false };
        return {
          found: true,
          visible: el.offsetParent !== null,
          disabled: el.disabled,
          boundingBox: el.getBoundingClientRect()
        };
      })()
    `)
    
  • Quick test: Try clicking manually in headed mode to see if it works

Problem 4: “Screenshots are blank or show loading spinner”

  • Why: The screenshot was taken before content finished loading
  • Fix: Wait for a specific element that indicates the page is ready:
    chrome.puppeteer_wait_for_selector(selector=".content-loaded-indicator")
    chrome.puppeteer_screenshot()
    
  • Quick test: Add a delay and see if the screenshot improves: chrome.puppeteer_wait(ms=2000)

Problem 5: “Kiro can’t find the MCP server”

  • Why: The server isn’t installed globally, or the command in settings.json is wrong
  • Fix: Verify the server is installed:
    $ which npx
    $ npx @modelcontextprotocol/server-puppeteer --help
    

    Update settings.json to use the full path:

    {
      "command": "/usr/local/bin/npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
    
  • Quick test: Restart Kiro after changing settings: kiro (fresh session)

Definition of Done

  • Puppeteer MCP server is configured in Kiro settings
  • Kiro can navigate to a URL and take a screenshot
  • Kiro can fill form fields and click buttons
  • Kiro can extract text from elements using puppeteer_evaluate
  • Kiro can wait for elements to appear before interacting
  • Screenshots are clear and show the correct page state
  • Failed selectors are debugged by inspecting the page structure
  • Multi-step flows (e.g., login → navigate → submit) work correctly
  • Test on both localhost and public URLs
  • Document common selectors for your app in a team wiki

Project 16: “The Design to Code Converter” — Multimodal AI

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language CSS / React
Coolness Level Level 5: Pure Magic
Difficulty Level 2: Intermediate
Knowledge Area Multimodal AI

What you’ll build: Convert a hand-drawn layout into Tailwind CSS.

Why it teaches Vision: You learn how precise you must be when using sketches.

Success criteria:

  • The rendered UI matches the sketch’s layout hierarchy.

Real World Outcome

You’ll upload a hand-drawn wireframe (photo or sketch) to Kiro and get back production-ready Tailwind CSS that matches the layout. This demonstrates multimodal AI’s ability to understand visual design and translate it to code.

1. The Sketch (hand-drawn wireframe):

┌─────────────────────────────────────┐
│  [Logo]        Nav  Links  Button   │
├─────────────────────────────────────┤
│                                     │
│         Hero Title Here             │
│         Subtitle text               │
│         [CTA Button]                │
│                                     │
├─────────────────────────────────────┤
│  ┌────────┐  ┌────────┐  ┌────────┐│
│  │ Card 1 │  │ Card 2 │  │ Card 3 ││
│  │  Icon  │  │  Icon  │  │  Icon  ││
│  │  Text  │  │  Text  │  │  Text  ││
│  └────────┘  └────────┘  └────────┘│
└─────────────────────────────────────┘

2. Kiro CLI Session:

$ kiro
You: [Attach image: wireframe.jpg]

Convert this wireframe into a React component with Tailwind CSS. Make it responsive.

Kiro: I can see your wireframe showing:
- A header with logo, navigation links, and a button
- A hero section with centered title, subtitle, and CTA button
- A three-column card layout with icons and text

Let me create the React component:

[Writes file: src/components/LandingPage.tsx]

3. Generated Code:

src/components/LandingPage.tsx:

export default function LandingPage() {
  return (
    <div className="min-h-screen bg-white">
      {/* Header */}
      <header className="border-b border-gray-200">
        <div className="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8">
          <div className="flex justify-between items-center h-16">
            <div className="text-2xl font-bold text-gray-900">Logo</div>
            <nav className="hidden md:flex space-x-8">
              <a href="#" className="text-gray-700 hover:text-gray-900">Features</a>
              <a href="#" className="text-gray-700 hover:text-gray-900">Pricing</a>
              <a href="#" className="text-gray-700 hover:text-gray-900">About</a>
            </nav>
            <button className="bg-blue-600 text-white px-4 py-2 rounded-lg hover:bg-blue-700">
              Sign Up
            </button>
          </div>
        </div>
      </header>

      {/* Hero Section */}
      <section className="py-20 px-4">
        <div className="max-w-4xl mx-auto text-center">
          <h1 className="text-5xl font-bold text-gray-900 mb-4">
            Hero Title Here
          </h1>
          <p className="text-xl text-gray-600 mb-8">
            Subtitle text that explains the value proposition
          </p>
          <button className="bg-blue-600 text-white px-8 py-3 rounded-lg text-lg hover:bg-blue-700">
            Call to Action
          </button>
        </div>
      </section>

      {/* Three Cards */}
      <section className="py-16 px-4 bg-gray-50">
        <div className="max-w-7xl mx-auto">
          <div className="grid grid-cols-1 md:grid-cols-3 gap-8">
            {[1, 2, 3].map((i) => (
              <div key={i} className="bg-white p-6 rounded-lg shadow-sm text-center">
                <div className="w-16 h-16 bg-blue-100 rounded-full mx-auto mb-4 flex items-center justify-center">
                  <span className="text-2xl">🎯</span>
                </div>
                <h3 className="text-xl font-semibold mb-2">Card {i}</h3>
                <p className="text-gray-600">
                  Description text for this feature or benefit
                </p>
              </div>
            ))}
          </div>
        </div>
      </section>
    </div>
  );
}

4. Rendered Result: When you run npm run dev and view the page, you see:

  • Pixel-perfect match to the wireframe layout
  • Responsive design (collapses to mobile on small screens)
  • Hover states on buttons and links
  • Proper spacing and typography matching the sketch’s visual hierarchy

What you’re seeing:

  • Vision understanding - Kiro interprets hand-drawn boxes as layout sections
  • Semantic HTML - Proper use of header/section/nav tags
  • Responsive grid - Cards stack on mobile, side-by-side on desktop
  • Design system - Consistent spacing, colors, and typography
  • Production-ready code - Not just a prototype, but deployable components

This is the same technology behind tools like v0.dev, Galileo AI, and Figma-to-code plugins.


The Core Question You’re Answering

“Can AI understand visual design intent from rough sketches and translate it to production code?”

Think about the traditional design-to-code workflow:

  1. Designer creates wireframe in Figma
  2. Developer interprets the design
  3. Developer writes HTML/CSS (hours of work)
  4. Back-and-forth to fix spacing, colors, responsiveness
  5. Repeat for every screen

With multimodal AI, you can:

  • Sketch on paper → photo → code in minutes
  • Iterate designs without opening Figma
  • Prototype faster than traditional design tools
  • Bridge the designer-developer communication gap

This project teaches you:

  • How vision models parse layouts - Understanding hierarchy from visual cues
  • Prompt engineering for design - How to describe design intent clearly
  • Multimodal context - Combining images with text instructions
  • Design token extraction - How AI infers spacing, colors, and typography
  • Limitations of vision - Where it struggles (fine details, exact measurements)

By the end, you’ll know when to use AI for design-to-code and when traditional tools are better.


Concepts You Must Understand First

Stop and research these before coding:

  1. Tailwind CSS Utility Classes
    • What’s the difference between px-4 and p-4?
    • How does Tailwind’s responsive system work? (sm:, md:, lg:)
    • What’s the purpose of utility-first CSS vs traditional CSS?
    • How do you compose complex layouts with just utility classes?
    • Book Reference: “Refactoring UI” by Adam Wathan - All chapters
  2. Layout Hierarchy Recognition
    • How do designers communicate hierarchy visually? (size, spacing, weight)
    • What’s the difference between a card, section, and container?
    • How do you represent visual grouping in HTML structure?
    • Why does semantic HTML matter for layout interpretation?
    • Book Reference: “Don’t Make Me Think” by Steve Krug - Ch. 3-4
  3. Responsive Design Principles
    • What’s mobile-first design?
    • How do you handle different screen sizes without media queries?
    • What’s the difference between fluid and fixed layouts?
    • How do you test responsive designs efficiently?
    • Book Reference: “Responsive Web Design” by Ethan Marcotte - Ch. 1-2
  4. Vision Model Capabilities
    • How do vision models understand spatial relationships?
    • What’s the difference between image classification and layout analysis?
    • Why might a model misinterpret hand-drawn sketches?
    • How do you improve vision model accuracy with prompts?
    • Blog: OpenAI GPT-4 Vision System Card (2023)

Questions to Guide Your Design

Before implementing, think through these:

  1. Sketch Quality
    • How detailed should the sketch be? (low-fidelity vs high-fidelity)
    • Should you annotate the sketch with labels? (“Header”, “Hero”, etc.)
    • Does the sketch need to show exact measurements or just proportions?
    • How do you communicate color intent in a black-and-white sketch?
  2. Code Generation Strategy
    • Should Kiro generate a single component or split into multiple files?
    • How do you handle repeated elements (cards, buttons)?
    • Should the code include placeholder content or real text?
    • Do you want inline Tailwind classes or a component library?
  3. Responsive Behavior
    • Should the layout stack vertically on mobile or stay horizontal?
    • Where should breakpoints be? (Tailwind defaults or custom?)
    • How do you communicate responsive behavior in a static sketch?
    • Should text sizes scale or stay fixed?
  4. Iteration Workflow
    • If the output doesn’t match, do you refine the sketch or the prompt?
    • How do you provide feedback to Kiro? (Screenshot comparison?)
    • Should you iterate on the same component or generate multiple variations?
    • How do you version control design iterations?

Thinking Exercise

Analyze What Makes a Good Design Sketch for AI

Before uploading a sketch, think about what information the AI needs:

Good Sketch (High Success Rate):

  • Clear boxes with labels (“Header”, “Hero Section”, “Card Grid”)
  • Arrows showing hierarchy or flow
  • Annotations for interactive elements (“Button”, “Link”)
  • Relative sizing (big title vs small subtitle)
  • Grouping indicators (dotted lines around related elements)

Bad Sketch (Likely to Fail):

  • Ambiguous shapes (is this a button or a text box?)
  • No labels or context
  • Inconsistent spacing that doesn’t reflect intent
  • Missing sections (no header drawn, but you want one)
  • Too much fine detail that obscures structure

Questions while sketching:

  • If I showed this to a junior developer, could they build it?
  • Does the hierarchy (title > subtitle > body) come through visually?
  • Are interactive elements obvious? (buttons, links, forms)
  • Does spacing communicate grouping? (related items close together)

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How does a vision model understand the difference between a button and a text label in a wireframe?”

  2. “If the AI generates a layout that’s close but not quite right, how do you refine it? Walk me through your iteration process.”

  3. “Compare design-to-code AI tools (v0.dev, Galileo) to traditional Figma-to-React plugins. What are the tradeoffs?”

  4. “How would you handle complex interactions (hover states, animations) that aren’t visible in a static wireframe?”

  5. “Describe a scenario where AI design-to-code would fail. How do you know when to use traditional methods?”

  6. “If you wanted to enforce a design system (specific colors, spacing, components), how would you instruct the AI?”


Hints in Layers

Hint 1: Starting Point Draw a simple wireframe on paper or use a tool like Excalidraw. Focus on boxes and labels, not pixel-perfect design. Take a clear photo or screenshot.

Hint 2: The Prompt Upload the image to Kiro and provide context:

You: [Attach: wireframe.jpg]

Convert this wireframe into a React component using Tailwind CSS. The layout should be:
- Responsive (mobile-first)
- Use semantic HTML
- Include hover states on interactive elements

Hint 3: Refining the Output If the result doesn’t match, provide specific feedback:

You: The cards should be in a grid, not stacked. Also, make the hero title larger and the button more prominent.

Kiro will regenerate with adjustments.

Hint 4: Handling Ambiguity If your sketch is ambiguous, Kiro might ask clarifying questions:

Kiro: I see three boxes in your sketch. Should these be:
1. Image cards with captions?
2. Feature blocks with icons?
3. Call-to-action cards with buttons?

Provide clear direction to avoid wrong assumptions.

Hint 5: Extracting Reusable Components After generating code, ask Kiro to refactor:

You: Extract the card component into a reusable component that accepts title, icon, and description as props.

This produces a component library from the initial sketch.
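
The extracted component might look roughly like this (a sketch of the kind of output to expect; prop names and utility classes are illustrative):

import type { ReactNode } from 'react';

interface CardProps {
  title: string;
  icon: ReactNode;
  description: string;
}

export function Card({ title, icon, description }: CardProps) {
  return (
    <div className="bg-white p-6 rounded-lg shadow-sm text-center">
      <div className="w-16 h-16 bg-blue-100 rounded-full mx-auto mb-4 flex items-center justify-center">
        {icon}
      </div>
      <h3 className="text-xl font-semibold mb-2">{title}</h3>
      <p className="text-gray-600">{description}</p>
    </div>
  );
}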

Hint 6: Testing Responsiveness Ask Kiro to show you how it looks at different screen sizes:

You: Show me how this layout would look on mobile (375px) vs desktop (1440px).

Kiro can describe or generate screenshots for visual verification.


Books That Will Help

Topic Book Chapter
Tailwind CSS Patterns “Refactoring UI” by Adam Wathan All chapters
Layout Hierarchy “Don’t Make Me Think” by Steve Krug Ch. 3-4 (Visual Hierarchy)
Responsive Design “Responsive Web Design” by Ethan Marcotte Ch. 1-2
Design Systems “Atomic Design” by Brad Frost Ch. 2 (Atoms, Molecules)
Component Architecture “React Design Patterns” by Michele Bertoli Ch. 3 (Composition)

Common Pitfalls & Debugging

Problem 1: “AI generates wrong layout structure”

  • Why: The sketch is ambiguous or labels are missing
  • Fix: Add annotations to your sketch:
    [Header Section]
    [Hero: centered, large text]
    [Cards: 3 columns on desktop, 1 on mobile]
    
  • Quick test: Show the sketch to a colleague—can they understand it without your explanation?

Problem 2: “Colors don’t match the design system”

  • Why: The sketch doesn’t specify colors, so AI uses defaults
  • Fix: Provide the color palette in the prompt:
    You: Use these colors:
    - Primary: #3B82F6 (blue)
    - Background: #F9FAFB (light gray)
    - Text: #111827 (dark gray)
    
  • Quick test: Check the generated code for hard-coded colors and verify against your palette

Problem 3: “Spacing is inconsistent”

  • Why: The sketch doesn’t communicate spacing intent clearly
  • Fix: Add spacing notes to the sketch or prompt:
    You: Use 8px spacing units. Headers should have 32px padding, sections 64px.
    
  • Quick test: Inspect with browser DevTools and verify p-* and m-* classes are consistent

Problem 4: “Responsive behavior is wrong”

  • Why: The AI assumes a default responsive strategy
  • Fix: Be explicit about breakpoints:
    You: On mobile (<768px), stack cards vertically. On desktop (≥768px), show 3 columns.
    
  • Quick test: Resize browser to test breakpoints: 375px, 768px, 1024px, 1440px

Problem 5: “Interactive elements don’t work”

  • Why: Static wireframes don’t show behavior, only structure
  • Fix: Describe interactions explicitly:
    You: The CTA button should navigate to /signup on click. Navigation links should scroll to section anchors.
    
  • Quick test: Click buttons and links to verify behavior matches intent

Definition of Done

  • Wireframe is clear with labeled sections (header, hero, cards, etc.)
  • Kiro generates React component with Tailwind CSS
  • Layout matches wireframe structure (boxes, hierarchy, grouping)
  • Responsive design works on mobile (375px), tablet (768px), desktop (1440px)
  • Interactive elements (buttons, links) are styled with hover states
  • Colors and spacing follow design system (or specified palette)
  • Code uses semantic HTML (header, section, nav, not just divs)
  • Reusable components are extracted (Card, Button, etc.)
  • Tested in browser and matches visual expectations
  • Document the prompt pattern for future wireframe conversions

Project 17: “The Type-Safe Hook with Bun” — Kiro Hooks

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language TypeScript (Bun)
Coolness Level Level 4: Hardcore Tech Flex
Difficulty Level 3: Advanced
Knowledge Area Kiro Hooks

What you’ll build: A PostToolUse hook in Bun that parses JSON events with Zod and logs metrics.

Why it teaches Safe Automation: You replace fragile shell scripts with typed automation.

Success criteria:

  • Hook logs structured output on every tool use.

Real World Outcome

You’ll have a production-ready Kiro hook written in TypeScript that runs with Bun and validates every tool execution event using Zod schemas. When Kiro calls any tool (Bash, Edit, Read, etc.), your hook validates the event and appends a structured record to a metrics log:

For a successful Bash tool call:

$ kiro "run the test suite"

# Your hook logs to ~/.kiro/hooks/metrics.jsonl:
{"timestamp":"2024-12-20T14:32:01.234Z","toolName":"Bash","duration":1234,"success":true,"command":"npm test","exitCode":0}

For a failed Edit operation:

{"timestamp":"2024-12-20T14:35:22.456Z","toolName":"Edit","duration":89,"success":false,"error":"old_string not found in file","filePath":"/path/to/file.ts"}

Live metrics dashboard:

$ bun run analyze-metrics.ts

Tool Usage Report (Last 24 Hours)
─────────────────────────────────
Bash:  47 calls (43 success, 4 failed) - avg 890ms
Edit:  23 calls (22 success, 1 failed) - avg 45ms
Read:  89 calls (89 success, 0 failed) - avg 12ms
Grep:  34 calls (34 success, 0 failed) - avg 156ms

Slowest Commands:
1. npm run build (3421ms)
2. git push origin main (2876ms)
3. docker-compose up (2134ms)

Error Rate: 2.6%

The hook validates every event against strict TypeScript types, ensuring you never miss a field or misparse data.
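
The analyze-metrics.ts report shown above can start as a simple JSONL aggregation pass. A minimal sketch of the per-tool summary (file path and field names follow the examples above; extend it for the slowest-commands and error-rate sections):

// analyze-metrics.ts — aggregate per-tool stats from the JSONL metrics log (sketch)
type Metric = { toolName: string; duration: number; success: boolean };

const file = Bun.file(`${process.env.HOME}/.kiro/hooks/metrics.jsonl`);
const lines = (await file.text()).split("\n").filter(Boolean);

const stats = new Map<string, { calls: number; ok: number; totalMs: number }>();
for (const line of lines) {
  const m = JSON.parse(line) as Metric;
  const s = stats.get(m.toolName) ?? { calls: 0, ok: 0, totalMs: 0 };
  s.calls++; s.totalMs += m.duration; if (m.success) s.ok++;
  stats.set(m.toolName, s);
}

for (const [tool, s] of stats) {
  const avg = Math.round(s.totalMs / s.calls);
  console.log(`${tool}: ${s.calls} calls (${s.ok} success, ${s.calls - s.ok} failed) - avg ${avg}ms`);
}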


The Core Question You’re Answering

“How do I build production-grade automation on top of Kiro without brittle shell scripts that break on edge cases?”

Before you start coding, consider: Shell scripts are fragile—they silently fail on malformed JSON, ignore type errors, and give no compile-time guarantees. A typed hook in Bun/TypeScript gives you runtime validation (Zod), compile-time safety (TypeScript), and fast execution (Bun’s native speed). This project teaches you to replace “parse with grep/awk” with “validate with schemas.”


Concepts You Must Understand First

Stop and research these before coding:

  1. Kiro Hook Lifecycle
    • What is the difference between PreToolUse and PostToolUse hooks?
    • How does Kiro pass event data to hooks (stdin JSON)?
    • What happens if a hook exits with non-zero status?
    • Book Reference: Kiro CLI documentation - Hook System Architecture
  2. Zod Schema Validation
    • How do you define a Zod schema for nested objects?
    • What is z.infer<typeof schema> and why is it critical?
    • How does Zod handle optional fields and defaults?
    • How do you compose schemas (unions, intersections)?
    • Book Reference: “Effective TypeScript” by Dan Vanderkam - Ch. 3 (Type Inference)
  3. Bun Runtime Specifics
    • How does Bun’s Bun.file() differ from Node’s fs module?
    • What is Bun.write() and why is it faster than fs.appendFile?
    • How do you handle stdin in Bun (await Bun.stdin.text())?
    • How does Bun’s bundler work (bun build --compile)?
    • Book Reference: Bun documentation - Runtime APIs
  4. JSON Lines (JSONL) Format
    • Why use JSONL instead of JSON arrays for logs?
    • How do you append to JSONL files atomically?
    • How do you parse JSONL with streaming (jq, ndjson)?
    • Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 4 (Encoding)

Questions to Guide Your Design

Before implementing, think through these:

  1. Schema Design
    • What fields are common to all tool events (toolName, timestamp, duration)?
    • How do you model tool-specific data (Bash exitCode, Edit filePath)?
    • Should you use a discriminated union (type: "bash" | "edit") or a flat schema?
    • How do you handle unknown tools gracefully?
  2. Error Handling
    • What if Zod validation fails—do you crash the hook or log the error?
    • Should a hook failure block Kiro’s execution or just warn?
    • How do you ensure the log file is always writable?
    • What happens if the disk is full?
  3. Performance
    • Should you write to disk synchronously or asynchronously?
    • Do you batch writes or append immediately?
    • How do you avoid blocking Kiro on slow I/O?
    • Should you rotate log files daily/hourly?
  4. Observability
    • How do you debug a hook that’s failing silently?
    • Should you log to stderr or a separate debug file?
    • How do you measure the hook’s own performance overhead?

Thinking Exercise

Manual Hook Execution Trace

Before writing code, manually trace what happens when Kiro calls your hook:

Scenario: User runs kiro "run npm test" and Kiro invokes the Bash tool.

Step 1: Kiro prepares the event

{
  "hookType": "PostToolUse",
  "timestamp": "2024-12-20T14:32:01.234Z",
  "tool": {
    "name": "Bash",
    "input": {"command": "npm test"},
    "output": {"exitCode": 0, "stdout": "All tests passed", "stderr": ""},
    "duration": 1234
  }
}

Step 2: Kiro spawns your hook

bun run ~/.kiro/hooks/metrics-logger.ts < event.json

Step 3: Your hook reads stdin

const eventJson = await Bun.stdin.text();
// What if stdin is empty? What if it's malformed JSON?

Step 4: Zod validation

const event = PostToolUseEventSchema.parse(JSON.parse(eventJson));
// What if parsing throws? Should you catch and log, or let it crash?

Step 5: Write to JSONL

appendFileSync("metrics.jsonl", JSON.stringify(event) + "\n"); // node:fs, supported by Bun
// Is this atomic? What if another process is writing simultaneously?

Questions while tracing:

  • At which step could things fail? How would you detect each failure?
  • How do you test this hook without running Kiro every time?
  • How would you simulate different tool events (success, failure, timeout)?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How does Zod’s z.infer work, and why is it more reliable than manually typing JSON.parse results?”
  2. “What are the performance characteristics of Bun compared to Node.js for I/O-heavy tasks like log writes?”
  3. “Explain the difference between JSON and JSON Lines. When would you use each format?”
  4. “How would you design a schema for a discriminated union in TypeScript to handle multiple tool types?”
  5. “What strategies would you use to ensure atomic writes to a log file from concurrent processes?”
  6. “How do you test code that reads from stdin without manually piping data every time?”

Hints in Layers

Hint 1: Start with the Schema Define your Zod schemas first, then let TypeScript types flow from them. Start with a base ToolEvent schema, then extend it for specific tools.
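
A sketch of that layering with Zod (field names mirror the event trace above; the exact event shape Kiro sends may differ by version, so verify against a real event first):

import { z } from "zod";

// Output fields shared by most tool events
const ToolOutputSchema = z.object({
  exitCode: z.number().optional(),
  stdout: z.string().optional(),
  stderr: z.string().optional(),
});

const PostToolUseEventSchema = z.object({
  hookType: z.literal("PostToolUse"),
  timestamp: z.string(),
  tool: z.object({
    name: z.string(),             // "Bash", "Edit", "Read", ...
    input: z.record(z.unknown()),
    output: ToolOutputSchema.passthrough(),
    duration: z.number(),
  }),
});

// The TypeScript type flows from the schema instead of being written twice
type PostToolUseEvent = z.infer<typeof PostToolUseEventSchema>;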

Hint 2: Read Stdin Safely Bun provides await Bun.stdin.text() to read all stdin as a string. Wrap this in a try/catch for JSON.parse and Zod validation. If validation fails, log to stderr and exit 0 (don’t block Kiro).
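
A sketch of that flow, reusing the schema name from the Hint 1 sketch (assumed to be defined in the same file):

// Assumes PostToolUseEventSchema from the Hint 1 sketch is defined above
const raw = await Bun.stdin.text();

try {
  const event = PostToolUseEventSchema.parse(JSON.parse(raw));
  // ...append a metrics record for `event` here...
} catch (err) {
  // Never block Kiro because the hook could not understand one event
  console.error(`[metrics-hook] invalid event: ${err}`);
  process.exit(0);
}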

Hint 3: Atomic Appends Bun.write() replaces the destination file rather than appending, so use appendFileSync from node:fs (fully supported by Bun) for JSONL appends. Short single-write appends to a file opened in append mode are effectively atomic on local filesystems, which keeps log lines from interleaving. For rotation, check file size before writing and rename if needed.
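
A minimal append helper using the Node fs API, which Bun supports (the function name is illustrative):

import { appendFileSync } from "node:fs";

function appendMetric(logPath: string, event: unknown): void {
  // One JSON object per line (JSONL); a single short write keeps the line intact
  appendFileSync(logPath, JSON.stringify(event) + "\n");
}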

Hint 4: Testing Without Kiro Create a test harness:

echo '{"hookType":"PostToolUse",...}' | bun run metrics-logger.ts

Write JSON fixtures for each tool type and pipe them to your hook.


Books That Will Help

Topic Book Chapter
TypeScript Type System “Effective TypeScript” by Dan Vanderkam Ch. 3 (Type Inference), Ch. 4 (Type Design)
Schema Validation “Programming TypeScript” by Boris Cherny Ch. 6 (Advanced Types)
JSON Lines Format “Designing Data-Intensive Applications” by Martin Kleppmann Ch. 4 (Encoding and Evolution)
Bun Runtime Bun official docs Runtime APIs, File I/O
Hook Patterns Kiro CLI docs Hooks System, Event Schemas

Common Pitfalls & Debugging

Problem 1: “Hook doesn’t run, no error messages”

  • Why: Kiro might not have execute permissions on the hook file, or the shebang is wrong
  • Fix:
    chmod +x ~/.kiro/hooks/metrics-logger.ts
    # Add shebang: #!/usr/bin/env bun
    
  • Quick test: ~/.kiro/hooks/metrics-logger.ts should run directly

Problem 2: “Zod validation fails with ‘Expected object, received undefined’”

  • Why: Stdin is empty, or Kiro isn’t passing the event correctly
  • Fix: Check if stdin exists before parsing:
    const input = await Bun.stdin.text();
    if (!input.trim()) {
      console.error("No input received");
      process.exit(0);
    }
    
  • Quick test: echo '{}' | bun run hook.ts should fail gracefully

Problem 3: “Log file grows infinitely, disk fills up”

  • Why: No log rotation implemented
  • Fix: Implement daily rotation:
    const logPath = `${process.env.HOME}/.kiro/hooks/metrics-${new Date().toISOString().split('T')[0]}.jsonl`; // "~" is not expanded by file APIs, so build the path from $HOME
    
  • Quick test: Run hook 1000 times, verify old logs are compressed/deleted

Problem 4: “Hook is slow, Kiro feels laggy”

  • Why: Synchronous disk writes block Kiro’s execution
  • Fix: Use async writes and don’t await them (fire-and-forget):
    Bun.write(logPath, data, {append: true}); // Don't await
    
  • Quick test: Time hook execution with time echo '...' | bun run hook.ts

Definition of Done

  • Hook receives PostToolUse events from Kiro and parses them with Zod
  • All tool types (Bash, Edit, Read, Grep, etc.) are logged correctly
  • Validation errors are logged to stderr but don’t crash the hook
  • JSONL log file is written atomically with append mode
  • Log rotation is implemented (daily files, compression after 7 days)
  • Hook adds <5ms overhead to Kiro’s tool execution
  • Test suite covers success, failure, and malformed input cases
  • Analysis script reads JSONL and produces human-readable reports
  • Hook is executable (chmod +x) with correct shebang
  • Documentation explains how to customize schemas for new tools

Project 18: “The Security Firewall Hook” — Security Governance

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Python or Rust
Coolness Level Level 4: Hardcore Tech Flex
Business Potential 3. Service & Support (Enterprise Security)
Difficulty Level 3: Advanced
Knowledge Area Security Governance

What you’ll build: A PreToolUse hook that blocks dangerous commands.

Why it teaches Governance: You enforce guardrails against risky, hallucinated commands.

Success criteria:

  • A destructive command is blocked with a clear message.

Real World Outcome

You’ll have a security firewall that intercepts every Kiro tool call and blocks dangerous operations before execution. When Kiro attempts risky commands, your hook prevents disaster:

Blocked destructive command:

$ kiro "clean up the database by dropping all tables"

[PreToolUse Hook] Analyzing command...
🚫 BLOCKED: Dangerous Bash command detected

Command: DROP DATABASE production;
Reason: Database destructive operation (DROP, TRUNCATE)
Risk Level: CRITICAL
Suggested Alternative: Create a backup first with pg_dump

Hook Exit Code: 1 (Execution prevented)

Allowed safe command:

$ kiro "show me the database schema"

[PreToolUse Hook] Analyzing command...
✅ ALLOWED: Read-only database query

Command: \d+ users
Risk Level: LOW
Reason: Read-only queries are permitted

Security policy report:

$ python analyze-blocks.py

Security Firewall Report (Last 7 Days)
──────────────────────────────────────
Total Tool Calls: 1,247
Blocked: 23 (1.8%)
Allowed: 1,224 (98.2%)

Top Blocked Patterns:
1. rm -rf / (8 attempts)
2. DROP DATABASE (5 attempts)
3. chmod 777 (4 attempts)
4. curl | bash (3 attempts)
5. git push --force main (3 attempts)

Risk Prevented:
- Data loss: 13 incidents
- Security vulnerabilities: 7 incidents
- Production impact: 3 incidents

The hook runs pattern matching, AST analysis, and allow-list checking to catch both obvious and subtle threats before they execute.


The Core Question You’re Answering

“How do I prevent an AI agent from executing catastrophic commands while still allowing productive work?”

Before you start coding, consider: LLMs hallucinate. They suggest rm -rf / to “clean up space.” They recommend chmod 777 for “permission issues.” They propose DROP TABLE users to “fix schema conflicts.” A security firewall is your last line of defense against well-intentioned but devastating suggestions. This project teaches you to build guardrails that catch mistakes before they become disasters.


Concepts You Must Understand First

Stop and research these before coding:

  1. PreToolUse Hook Lifecycle
    • When does PreToolUse execute relative to tool invocation?
    • How does a non-zero exit code block tool execution?
    • Can you modify tool arguments in a PreToolUse hook?
    • What happens if the hook times out (5-second limit)?
    • Book Reference: Kiro CLI documentation - Hook System Architecture
  2. Command Pattern Matching
    • How do you distinguish rm -rf /tmp/safe from rm -rf /?
    • Should you use regex, AST parsing, or both?
    • How do you handle command aliases (ll, la, etc.)?
    • What about commands wrapped in functions or subshells?
    • Book Reference: “Compilers: Principles, Techniques, and Tools” by Aho et al. - Ch. 3 (Lexical Analysis)
  3. Bash AST Parsing
    • What is the structure of a Bash abstract syntax tree?
    • How do you extract the command name from complex pipelines?
    • How do you detect destructive flags (-f, --force, --no-preserve-root)?
    • Libraries: Python (bashlex), Rust (shell-words)
    • Book Reference: “The Linux Command Line” by William Shotts - Ch. 28 (Script Debugging)
  4. Security Policy Design
    • Deny-list (block known bad) vs allow-list (permit known good)?
    • How do you balance security with usability?
    • Should you allow overrides with explicit flags (--i-know-what-im-doing)?
    • How do you handle context-dependent risk (safe in dev, fatal in prod)?
    • Book Reference: “Building Secure and Reliable Systems” by Google - Ch. 6 (Design for Least Privilege)

Questions to Guide Your Design

Before implementing, think through these:

  1. Threat Model
    • What are the most dangerous commands to block (file deletion, permission changes, database drops)?
    • How do you detect SQL injection in commands?
    • Should you block network commands (curl, wget) if they download and execute?
    • What about indirect threats (cron jobs, systemd services)?
  2. Detection Strategy
    • Do you use regex patterns (fast but fragile) or AST parsing (slow but accurate)?
    • How do you handle obfuscated commands (e.g. $(echo rm) -rf / or base64-decoded payloads)?
    • Should you sandbox and execute the command in dry-run mode first?
    • Do you need a scoring system (low/medium/high risk) or binary allow/deny?
  3. User Experience
    • How do you communicate why a command was blocked?
    • Should you suggest safer alternatives?
    • Do you allow interactive approval (“This is risky, proceed? [y/N]”)?
    • How do you prevent alert fatigue from too many false positives?
  4. Policy Configuration
    • Should policies be global or per-project?
    • Do you support environment-specific rules (block in prod, allow in dev)?
    • How do you update patterns without modifying the hook code?
    • Should you support policy inheritance (base + overrides)?

Thinking Exercise

Manual Threat Detection Walkthrough

Before writing code, manually trace how your hook would analyze these commands:

Test Case 1: Obvious Threat

Command: rm -rf /

Analysis:

  • Command: rm (file deletion)
  • Flags: -rf (recursive, force)
  • Target: / (root directory)
  • Decision: BLOCK - Catastrophic data loss risk
  • Reason: Recursive deletion of root filesystem

Test Case 2: Subtle Threat

Command: find / -name "*.log" -exec rm {} \;

Analysis:

  • Command: find (search, seems harmless)
  • Execution: -exec rm (delete each match)
  • Target: / (all logs system-wide)
  • Decision: BLOCK - Hidden destructive operation in -exec
  • Reason: Mass file deletion disguised as search

Test Case 3: Context-Dependent

Command: chmod 777 /tmp/test.sh

Analysis:

  • Command: chmod (permission change)
  • Mode: 777 (world-writable)
  • Target: /tmp/test.sh (temp file)
  • Decision: WARN - Bad practice but low immediate risk
  • Reason: Insecure permissions on non-critical file

Test Case 4: SQL Injection

Command: psql -c "DELETE FROM users WHERE id = 1; DROP DATABASE production; --"

Analysis:

  • Command: psql (database client)
  • SQL: Multiple statements detected
  • Keywords: DELETE, DROP DATABASE
  • Decision: BLOCK - SQL injection attempt
  • Reason: Destructive SQL operations

Questions while analyzing:

  • Which patterns can you detect with regex alone?
  • Which require parsing the command structure?
  • How would you handle base64-encoded commands?
  • What if the command is split across multiple tool calls?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How would you detect a destructive command hidden in a base64-encoded string that’s later decoded and executed?”
  2. “Explain the difference between deny-list and allow-list security models. Which is appropriate for AI agent governance?”
  3. “How do you prevent time-of-check to time-of-use (TOCTOU) attacks where the command changes after your hook approves it?”
  4. “What strategies would you use to minimize false positives while maintaining strong security boundaries?”
  5. “How would you design a policy system that’s secure by default but allows power users to override when necessary?”
  6. “Explain how you would parse a complex Bash command with pipes, redirects, and subshells to extract all executable components.”

Hints in Layers

Hint 1: Start with Pattern Matching Begin with a deny-list of obvious threats using regex:

  • rm -rf /
  • DROP DATABASE
  • chmod 777
  • curl .* | bash

Build incrementally from simple patterns to complex AST analysis.

Hint 2: Parse the Bash Command Structure Use a library like bashlex (Python) to parse commands into an AST. Walk the tree to extract:

  • Primary command name
  • All flags and arguments
  • Nested commands in subshells or backticks

This catches threats hidden in complex syntax.

Hint 3: Risk Scoring System Assign risk scores to each pattern:

  • Critical (100): rm -rf /, DROP DATABASE
  • High (75): chmod 777, git push --force main
  • Medium (50): Unverified curl downloads
  • Low (25): Warnings only

Block anything scored ≥75, warn for 25-74 (see the sketch below).
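
A minimal TypeScript sketch of the score-then-decide logic (the project itself targets Python or Rust; the patterns and scores below are illustrative and mirror the examples in Hint 4, so real rules should come from config):

type Verdict = { decision: "BLOCK" | "WARN" | "ALLOW"; score: number; reason: string };

// Illustrative deny-list; load real patterns from the YAML policy file (Hint 4)
const RULES: { pattern: RegExp; score: number; reason: string }[] = [
  { pattern: /rm\s+-rf\s+\//,            score: 100, reason: "Root filesystem deletion" },
  { pattern: /\bDROP\s+DATABASE\b/i,     score: 100, reason: "Database destruction" },
  { pattern: /\bchmod\s+777\b/,          score: 75,  reason: "World-writable permissions" },
  { pattern: /curl\s+[^|]*\|\s*(ba)?sh/, score: 50,  reason: "Unverified pipe-to-shell download" },
];

function assess(command: string): Verdict {
  let worst: Verdict = { decision: "ALLOW", score: 0, reason: "No risky pattern matched" };
  for (const rule of RULES) {
    if (rule.pattern.test(command) && rule.score > worst.score) {
      worst = {
        score: rule.score,
        reason: rule.reason,
        decision: rule.score >= 75 ? "BLOCK" : rule.score >= 25 ? "WARN" : "ALLOW",
      };
    }
  }
  return worst;
}

// assess("rm -rf /")           -> { decision: "BLOCK", score: 100, ... }
// assess("curl x.sh | bash")   -> { decision: "WARN",  score: 50,  ... }
// assess("ls -la")             -> { decision: "ALLOW", score: 0,   ... }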

Hint 4: Configuration-Driven Policies Load patterns from a YAML config file:

critical_patterns:
  - pattern: 'rm\s+-rf\s+/'
    reason: "Root filesystem deletion"
  - pattern: 'DROP\s+DATABASE'
    reason: "Database destruction"

This allows updates without code changes.


Books That Will Help

Topic Book Chapter
Security Policy Design “Building Secure and Reliable Systems” by Google Ch. 6 (Design for Least Privilege), Ch. 12 (Crisis Management)
Command Parsing “Compilers: Principles, Techniques, and Tools” by Aho et al. Ch. 3 (Lexical Analysis), Ch. 4 (Syntax Analysis)
Bash Scripting Security “The Linux Command Line” by William Shotts Ch. 28 (Script Debugging), Ch. 29 (Flow Control)
Pattern Matching “Regular Expressions Cookbook” by Goyvaerts & Levithan Ch. 4 (Validation), Ch. 7 (Security)
Hook System Kiro CLI docs Hooks System, Security Best Practices

Common Pitfalls & Debugging

Problem 1: “Hook blocks legitimate commands (false positives)”

  • Why: Overly broad regex patterns match harmless variations
  • Fix: Test patterns against a suite of safe commands:
    # Safe: rm -rf ./temp
    # Unsafe: rm -rf /
    # Pattern should distinguish based on target path
    
  • Quick test: Create a test suite with 100 safe and 20 unsafe commands

Problem 2: “Obfuscated commands bypass detection”

  • Why: Regex can’t detect $(echo rm) -rf /
  • Fix: Don't rely on regex alone: parse the command into an AST (Hint 2) and, as a cheap first pass, validate its syntax without executing it:
    bash -n -c "$command" 2>&1  # Check syntax without executing
    
  • Quick test: Try variations like r''m -rf, ${CMD}, base64 encoding

Problem 3: “Hook is too slow, Kiro times out”

  • Why: AST parsing is expensive for every command
  • Fix: Implement a fast-path for common safe commands:
    if command.startswith(('ls', 'cat', 'grep', 'echo')):
        return ALLOW  # Skip expensive parsing
    
  • Quick test: Time hook execution—should be <100ms for 95% of commands

Problem 4: “Policy updates require redeploying the hook”

  • Why: Patterns are hardcoded in the script
  • Fix: Load patterns from ~/.kiro/security-policy.yaml:
    import os, yaml
    with open(os.path.expanduser('~/.kiro/security-policy.yaml')) as f:
        policy = yaml.safe_load(f)
    
  • Quick test: Modify YAML file, verify new patterns apply without restart

Problem 5: “Can’t block SQL injection in psql commands”

  • Why: SQL is embedded as a string argument
  • Fix: Parse SQL with a library like sqlparse (Python):
    import sqlparse
    statements = sqlparse.split(sql_query)
    for stmt in statements:
        if 'DROP' in stmt.upper() or 'DELETE' in stmt.upper():
            return BLOCK
    
  • Quick test: Test with various SQL injection payloads

Definition of Done

  • Hook intercepts all PreToolUse events for Bash and Edit tools
  • Critical patterns (rm -rf /, DROP DATABASE, chmod 777) are blocked
  • Blocked commands log to ~/.kiro/security-log.jsonl with reason
  • AST parsing catches threats hidden in complex command structures
  • Policy is loaded from YAML config file, updatable without code changes
  • False positive rate is <2% on a benchmark of 500 real commands
  • Hook completes analysis in <100ms for 95th percentile
  • Blocked commands show helpful error messages with alternatives
  • Allow-list overrides are supported via a --security-override flag
  • Environment-specific policies (dev vs prod) are configurable
  • Documentation explains how to add custom patterns and policies

Project 19: “The Auto-Fixer Loop” — Developer Experience (DX)

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Bash / JavaScript
Coolness Level Level 3: Genuinely Clever
Difficulty Level 2: Intermediate
Knowledge Area Developer Experience (DX)

What you’ll build: A PostToolUse hook that runs lint/format and forces retries on failure.

Why it teaches Feedback Loops: The AI cannot finish until the code is clean.

Success criteria:

  • Lint errors prevent completion until fixed.

Real World Outcome

You’ll have an auto-fixer loop that enforces code quality by running linters and formatters after every file write, forcing Kiro to fix errors before proceeding. The hook creates a tight feedback loop where code quality is mandatory, not optional:

Successful write with auto-fix:

$ kiro "add a new React component for user profiles"

[Kiro writes UserProfile.tsx]

[PostToolUse Hook] Running linters...
✗ ESLint found 3 issues:
  - Line 12: 'useState' is not defined (no-undef)
  - Line 18: Missing return statement (consistent-return)
  - Line 24: Unexpected console.log (no-console)

[Hook] Auto-fixing with eslint --fix...
✓ Fixed 1 issue automatically (console.log removed)
✗ 2 issues require manual fixes

[Hook] Returning error to Kiro with fix instructions...

Kiro: I see lint errors. Let me fix them.

[Kiro writes corrected UserProfile.tsx]

[PostToolUse Hook] Running linters...
✓ ESLint: All checks passed
✓ Prettier: Code formatted
✓ TypeScript: No type errors

[Hook] All quality checks passed. Proceeding.

Quality enforcement metrics:

$ node analyze-quality.js

Auto-Fixer Loop Report (Last 7 Days)
─────────────────────────────────────
Total File Writes: 342
First-Pass Clean: 89 (26%)
Required Fixes: 253 (74%)
  - ESLint errors: 187
  - Type errors: 98
  - Format issues: 156

Average Fix Iterations: 1.8
Max Iterations (complex): 5

Quality Gate Success Rate: 100%
(Zero files merged with lint errors)

Time Saved:
- Pre-commit rejections prevented: 253
- Manual lint runs avoided: 342
- Broken builds prevented: 98

This hook ensures that Kiro never leaves behind broken or poorly formatted code: issues are auto-fixed immediately where possible, and Kiro is forced to retry until every check passes.


The Core Question You’re Answering

“How do I build a feedback loop that forces AI to fix code quality issues immediately instead of accumulating technical debt?”

Before you start coding, consider: Without enforcement, Kiro will happily write code with lint errors, type mismatches, and formatting inconsistencies. Developers then manually run npm run lint and find dozens of issues. An auto-fixer loop shifts quality left—errors are caught and fixed at write-time, not review-time. This project teaches you to build continuous quality gates that prevent bad code from ever existing.


Concepts You Must Understand First

Stop and research these before coding:

  1. PostToolUse Hook Timing
    • When does PostToolUse execute (after tool completes or before result returns)?
    • Can a PostToolUse hook modify tool outputs?
    • How do you signal failure to trigger a retry?
    • What is the maximum retry count before Kiro gives up?
    • Book Reference: Kiro CLI documentation - Hook System Architecture
  2. ESLint and Prettier Integration
    • How do you run ESLint programmatically via Node.js API?
    • What is the difference between --fix and manual fixes?
    • How do you combine ESLint and Prettier without conflicts?
    • How do you handle auto-fix failures (unfixable rules)?
    • Book Reference: ESLint documentation - Node.js API
  3. Error Return Codes and Retry Logic
    • How does Kiro interpret non-zero exit codes from hooks?
    • Should you return exit code 1 (error) or print to stderr?
    • How do you pass fix instructions back to Kiro?
    • How many times will Kiro retry before failing?
    • Book Reference: Unix process exit codes - Advanced Linux Programming
  4. TypeScript Compiler API
    • How do you run tsc --noEmit to check types without emitting files? (See the sketch after this list.)
    • How do you parse TypeScript errors into actionable messages?
    • What is the performance cost of running the compiler on every write?
    • Should you cache compilation results?
    • Book Reference: “Programming TypeScript” by Boris Cherny - Ch. 10 (Compiler API)
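
As a rough sketch of that check, here is a TypeScript snippet that shells out to the compiler CLI (rather than the full Compiler API) and parses its diagnostic lines; the flags and the retry exit code mirror this project's design, not a Kiro API:

import { spawnSync } from "node:child_process";

type TypeError = { file: string; line: number; code: string; message: string };

function checkTypes(): TypeError[] {
  // Project-wide check using tsconfig.json; --pretty false yields machine-parsable lines
  const res = spawnSync("npx", ["tsc", "--noEmit", "--pretty", "false"], { encoding: "utf8" });
  const errors: TypeError[] = [];
  // tsc prints one diagnostic per line: "src/foo.ts(12,5): error TS2304: Cannot find name 'x'."
  for (const line of `${res.stdout ?? ""}\n${res.stderr ?? ""}`.split("\n")) {
    const m = line.match(/^(.+)\((\d+),\d+\): error (TS\d+): (.*)$/);
    if (m) errors.push({ file: m[1], line: Number(m[2]), code: m[3], message: m[4] });
  }
  return errors;
}

// Example: surface errors to Kiro and signal a retry with a non-zero exit
const errs = checkTypes();
if (errs.length > 0) {
  for (const e of errs) console.log(`${e.file}:${e.line} ${e.code} ${e.message}`);
  process.exit(1);
}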

Questions to Guide Your Design

Before implementing, think through these:

  1. Quality Gates
    • Which checks should be mandatory (ESLint, types, format)?
    • Should you run tests on every write or only on commit?
    • How do you handle slow checks (type checking takes 5 seconds)?
    • Do you fail fast (stop at first error) or collect all errors?
  2. Auto-Fix Strategy
    • Which errors can be auto-fixed (format, imports) vs require manual fixes?
    • Should you apply auto-fixes immediately or ask Kiro to review them?
    • How do you avoid infinite loops (auto-fix introduces new errors)?
    • What if auto-fix changes behavior (removing unused code)?
  3. Performance
    • How do you avoid blocking Kiro for 10 seconds on every write?
    • Should you run checks in parallel (lint + types + format)?
    • Do you cache lint results for unchanged files?
    • What is the acceptable latency for the feedback loop?
  4. Error Communication
    • How do you format lint errors so Kiro understands how to fix them?
    • Should you include line numbers, error codes, suggested fixes?
    • How do you distinguish critical errors from warnings?
    • Do you provide a “skip quality checks” escape hatch?

Thinking Exercise

Manual Fix Iteration Walkthrough

Before writing code, trace what happens when Kiro writes flawed code:

Iteration 1: Kiro writes component with errors

// UserProfile.tsx (written by Kiro)
export default function UserProfile() {
  const [name, setName] = useState(''); // Missing React import
  console.log('Rendering profile'); // Disallowed console.log
  // Missing return statement
}

Hook Analysis:

  • ESLint detects 3 errors
  • Auto-fix can remove console.log
  • Remaining 2 errors require code changes

Hook returns exit code 1 with message:

ESLint errors found:
- Line 2: 'useState' is not defined. Add: import { useState } from 'react'
- Line 5: Missing return statement. Add JSX return.

Iteration 2: Kiro fixes errors

// UserProfile.tsx (corrected by Kiro)
import { useState } from 'react';

export default function UserProfile() {
  const [name, setName] = useState('');
  return <div>Profile: {name}</div>;
}

Hook Analysis:

  • ESLint: PASS
  • TypeScript: PASS (if tsconfig allows implicit any)
  • Prettier: Needs formatting

Hook applies Prettier:

// Auto-formatted
import { useState } from 'react';

export default function UserProfile() {
  const [name, setName] = useState('');
  return <div>Profile: {name}</div>;
}

Hook returns exit code 0 (success)

Questions while tracing:

  • How many iterations is acceptable before giving up?
  • Should you auto-apply Prettier or ask Kiro to format?
  • What if TypeScript errors require type annotations—can you guide Kiro?
  • How do you detect an infinite loop (same error every iteration)?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How would you design a retry mechanism that prevents infinite loops while still giving the AI enough attempts to fix complex errors?”
  2. “Explain the performance tradeoff between running ESLint on every file write vs caching and invalidating results.”
  3. “How do you integrate multiple linters (ESLint, Prettier, TSC) without conflicts, especially when their rules contradict?”
  4. “What strategies would you use to communicate lint errors to an AI in a way that maximizes fix success rate?”
  5. “How would you implement incremental type checking to avoid re-checking the entire project on every file write?”
  6. “Explain how you would handle a scenario where auto-fix introduces new errors (e.g., removing ‘unused’ code that’s actually needed).”

Hints in Layers

Hint 1: Start with One Linter Begin with ESLint only. Run eslint --format json path/to/file.ts and parse the output. Return exit code 1 if errors exist, 0 if clean. Get this working before adding TypeScript or Prettier.
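
For instance, a sketch of shelling out to ESLint and collecting errors from its JSON report (the npx invocation, file path handling, and the retry exit code are illustrative assumptions):

import { spawnSync } from "node:child_process";

function lintFile(filePath: string): { line: number; ruleId: string | null; message: string }[] {
  // ESLint exits non-zero when errors exist, so read stdout regardless of status
  const result = spawnSync("npx", ["eslint", "--format", "json", filePath], { encoding: "utf8" });
  // --format json returns one report object per linted file
  const reports = JSON.parse(result.stdout || "[]");
  return reports.flatMap((r: any) =>
    r.messages
      .filter((m: any) => m.severity === 2) // 2 = error, 1 = warning
      .map((m: any) => ({ line: m.line, ruleId: m.ruleId, message: m.message }))
  );
}

const errors = lintFile(process.argv[2] ?? "src/index.ts");
if (errors.length > 0) {
  for (const e of errors) console.log(`Line ${e.line}: ${e.message} (${e.ruleId})`);
  process.exit(1); // non-zero exit signals Kiro to retry, per this project's design
}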

Hint 2: Parse Tool Output to Extract File Path The PostToolUse event includes the tool name and arguments. For the Edit/Write tools, extract the file path:

const event = JSON.parse(stdin);
if (event.tool.name === 'Edit' || event.tool.name === 'Write') {
  const filePath = event.tool.input.file_path;
  runLinters(filePath);
}

Hint 3: Structured Error Messages for Kiro Format errors as actionable instructions:

❌ ESLint Error (no-undef):
File: src/components/UserProfile.tsx
Line: 12
Error: 'useState' is not defined
Fix: Add import at top: import { useState } from 'react';

Hint 4: Performance - Run Checks in Parallel Use Promise.all() to run ESLint, TypeScript, and Prettier simultaneously:

const [eslintResult, tscResult, prettierResult] = await Promise.all([
  runESLint(filePath),
  runTypeScript(filePath),
  runPrettier(filePath)
]);

Books That Will Help

Topic Book Chapter
ESLint Integration ESLint official docs Node.js API, Custom Rules
Hook System Kiro CLI docs Hooks System, Event Schemas
Process Exit Codes “Advanced Linux Programming” by CodeSourcery Ch. 3 (Processes), Ch. 11 (Error Handling)
TypeScript Compiler “Programming TypeScript” by Boris Cherny Ch. 10 (Compiler API)
Prettier Integration Prettier official docs API Reference, Editor Integration

Common Pitfalls & Debugging

Problem 1: “Hook runs forever, infinite retry loop”

  • Why: Auto-fix introduces new errors, or Kiro keeps making the same mistake
  • Fix: Track error signatures and exit after 3 identical errors:
    // Persist the last signature per file, since each hook run is a new process
    // (needs: import { existsSync, readFileSync, writeFileSync } from "node:fs")
    const sigFile = `/tmp/kiro-lint-sig-${Buffer.from(filePath).toString("base64url")}`;
    const errorSig = JSON.stringify(errors);
    const prev = existsSync(sigFile) ? JSON.parse(readFileSync(sigFile, "utf8")) : { sig: "", count: 0 };
    const count = prev.sig === errorSig ? prev.count + 1 : 1;
    writeFileSync(sigFile, JSON.stringify({ sig: errorSig, count }));
    if (count >= 3) {
      console.error("Same errors after 3 attempts. Aborting.");
      process.exit(0); // Let it through to avoid infinite loop
    }
    
  • Quick test: Write a file with an unfixable error, verify hook exits gracefully

Problem 2: “Type checking is too slow (10+ seconds per write)”

  • Why: Running tsc on entire project for every file write
  • Fix: Use incremental compilation or project references:
    tsc --noEmit --incremental --tsBuildInfoFile .tsbuildinfo
    
  • Quick test: Time hook execution—should be <2 seconds for 95% of files

Problem 3: “Prettier and ESLint conflict (format changes fail lint)”

  • Why: ESLint has formatting rules that contradict Prettier
  • Fix: Disable ESLint formatting rules:
    npm install --save-dev eslint-config-prettier
    

    Add to .eslintrc.json: "extends": ["prettier"]

  • Quick test: Format with Prettier, then run ESLint—no errors

Problem 4: “Hook doesn’t trigger for certain file writes”

  • Why: Hook only listens for Edit tool, but Kiro used Write tool
  • Fix: Handle both Edit and Write tools:
    if (['Edit', 'Write', 'MultiEdit'].includes(event.tool.name)) {
      runLinters(event.tool.input.file_path);
    }
    
  • Quick test: Use both Edit and Write tools, verify hook runs for both

Problem 5: “Kiro ignores fix instructions, doesn’t retry”

  • Why: Error message is unstructured or exit code is wrong
  • Fix: Return exit code 1 and print structured errors to stdout (not stderr):
    if (errors.length > 0) {
      console.log(formatErrorsForKiro(errors));
      process.exit(1); // Trigger retry
    }
    
  • Quick test: Manually trigger hook with errors, verify Kiro receives message

Definition of Done

  • Hook intercepts all Edit/Write/MultiEdit tool calls
  • ESLint runs on every file write and detects errors
  • Auto-fix is applied for fixable rules (no-console, quotes, etc.)
  • TypeScript compiler checks types (tsc --noEmit) for .ts/.tsx files
  • Prettier formats code after all checks pass
  • Hook returns exit code 1 with actionable errors if checks fail
  • Infinite loop detection prevents >3 retries on identical errors
  • Hook completes in <2 seconds for 95% of files
  • ESLint and Prettier configs are conflict-free
  • Errors are formatted in a way that Kiro can parse and fix
  • Documentation explains how to customize linter rules and disable checks

Project 20: “The Git Context Injector” — Context Management

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Bash
Coolness Level Level 2: Practical
Difficulty Level 2: Intermediate
Knowledge Area Context Management

What you’ll build: A UserPromptSubmit hook that appends git diff --staged.

Why it teaches Dynamic Context: The AI always sees the current change set.

Success criteria:

  • Prompt includes diff content automatically.

Real World Outcome

You’ll have a context injector that automatically enriches every Kiro prompt with git state information, ensuring the AI always has visibility into what code is currently changed, staged, or uncommitted. This eliminates the need to manually paste git diff output:

Without the hook:

$ kiro "write tests for the changes I just made"

Kiro: I don't see any recent changes in the conversation. Can you share what files you modified?

With the hook:

$ git add src/auth.ts  # Stage your changes

$ kiro "write tests for the changes I just made"

[UserPromptSubmit Hook] Injecting git context...

Enhanced prompt sent to Kiro:
────────────────────────────────────
Original: "write tests for the changes I just made"

Git Context:
Branch: feature/oauth-login
Status: 1 file changed, 45 insertions(+), 12 deletions(-)

Staged Changes:
diff --git a/src/auth.ts b/src/auth.ts
index 1234567..abcdefg 100644
--- a/src/auth.ts
+++ b/src/auth.ts
@@ -10,7 +10,15 @@ export class AuthService {
-  async login(username: string, password: string) {
-    return this.basicAuth(username, password);
+  async login(provider: 'google' | 'github', token: string) {
+    const user = await this.oauthVerify(provider, token);
+    return this.createSession(user);
   }
+
+  private async oauthVerify(provider: string, token: string) {
+    // New OAuth verification logic
+  }
────────────────────────────────────

Kiro: I can see you've refactored the login method to support OAuth. I'll write comprehensive tests for both Google and GitHub OAuth flows, covering token validation, user creation, and session management.

[Kiro writes auth.test.ts with OAuth-specific tests]

Context injection report:

$ bash analyze-context-usage.sh

Git Context Injector Report (Last 30 Days)
───────────────────────────────────────────
Total Prompts: 1,847
Context Injected: 1,245 (67%)
Context Skipped: 602 (33% - no staged changes)

Average Context Size:
- Staged diff: 234 lines
- Unstaged diff: 127 lines
- Recent commits: 3 commits

Top Use Cases:
1. "Write tests for these changes" (387 prompts)
2. "Review this code" (298 prompts)
3. "Fix the bug I introduced" (156 prompts)
4. "Document what I changed" (89 prompts)

Token Budget Impact:
- Average context added: 1,200 tokens
- Prompts that exceeded budget: 12 (0.6%)
- Context truncation applied: 8 times

The hook intelligently decides what git information is relevant and formats it for maximum AI comprehension.


The Core Question You’re Answering

“How do I give AI visibility into my current work context without manually pasting git diffs every time?”

Before you start coding, consider: AI is stateless—it doesn’t know what files you changed, what branch you’re on, or what you committed yesterday. Developers waste time copying git diff output or explaining “I just modified the auth file.” A context injector automates this, making every prompt git-aware. This project teaches you to augment prompts with dynamic, session-specific context that makes AI more effective.


Concepts You Must Understand First

Stop and research these before coding:

  1. UserPromptSubmit Hook Lifecycle
    • When does UserPromptSubmit execute (before prompt is sent to LLM)?
    • Can you modify the user’s prompt text?
    • How do you append context without overwriting the original prompt?
    • What is the maximum context size before truncation is needed?
    • Book Reference: Kiro CLI documentation - Hook System Architecture
  2. Git Plumbing Commands
    • How do you get staged changes only (git diff --cached)?
    • How do you get unstaged changes (git diff)?
    • How do you get recent commit history (git log -n 3 --oneline)?
    • How do you check if you’re in a git repository (git rev-parse --is-inside-work-tree)?
    • Book Reference: “Pro Git” by Scott Chacon - Ch. 10 (Git Internals)
  3. Context Relevance Heuristics
    • When should you inject diff (user mentions “changes”, “modified”, “tests”)?
    • When should you skip injection (generic questions about unrelated topics)?
    • How do you detect if the user is asking about code vs asking about concepts?
    • Should you always inject branch name and commit history?
    • Book Reference: None - requires experimentation and user feedback
  4. Token Budget Management
    • How many tokens does a typical diff consume?
    • How do you truncate large diffs (>100 files changed)?
    • Should you prioritize staged changes over unstaged?
    • How do you summarize commits vs including full diffs?
    • Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 4 (Encoding)

Questions to Guide Your Design

Before implementing, think through these:

  1. Context Selection
    • Should you inject staged changes, unstaged changes, or both?
    • Do you include recent commits, or only uncommitted work?
    • How do you decide between git diff and git show HEAD?
    • Should you include file renames, binary file changes, or submodule updates?
  2. Prompt Augmentation
    • Where do you inject context (before prompt, after, or in structured fields)?
    • How do you format diffs for readability (syntax highlight, collapse large hunks)?
    • Should you summarize (“3 files changed, 45 insertions”) or show full diffs?
    • Do you annotate the context (“Git Context:” header) or inject silently?
  3. Trigger Logic
    • Do you inject context on every prompt or only when relevant?
    • How do you detect relevance (keyword matching, NLP, always-on)?
    • Should users be able to opt out (--no-git-context flag)?
    • What if there are no changes—do you inject “no changes” or skip entirely?
  4. Performance and Safety
    • How do you handle repositories with thousands of files changed?
    • Should you run git diff synchronously or cache results?
    • What if git diff takes 10 seconds (large binary files)?
    • How do you avoid leaking secrets in diffs (API keys, passwords)?

Thinking Exercise

Manual Context Injection Walkthrough

Before writing code, trace how your hook enhances different prompts:

Scenario 1: User asks about recent changes

User prompt: "Review the authentication changes I made"

Hook detects keywords: ["changes", "made"]
Hook runs: git diff --cached

Injected context:
Branch: feature/oauth
Staged: src/auth.ts (+45, -12)

Enhanced prompt:
"Review the authentication changes I made

Git Context:
<diff output>
"

Scenario 2: User asks generic question

User prompt: "What is the difference between OAuth and JWT?"

Hook detects: No code-specific keywords
Hook decision: Skip git context (not relevant)

Prompt sent unchanged:
"What is the difference between OAuth and JWT?"

Scenario 3: Large diff (500 files changed)

User prompt: "Write tests for my refactor"

Hook detects: 500 files changed in staging area
Hook decision: Truncate to top 10 most-changed files

Enhanced prompt:
"Write tests for my refactor

Git Context (truncated to top 10 files):
src/auth.ts (+200, -50)
src/db.ts (+150, -30)
...
[490 more files not shown]
"

Scenario 4: No changes staged

User prompt: "Explain this error message"

Hook runs: git diff --cached
Result: No output (nothing staged)

Hook decision: Inject "No staged changes" summary

Enhanced prompt:
"Explain this error message

Git Context: No staged changes. Branch: main (up to date)
"

Questions while tracing:

  • How do you balance verbosity (full diffs) vs conciseness (summaries)?
  • Should you always show branch name, even if it’s not relevant?
  • How do you handle merge conflicts in diffs?
  • What if the user’s prompt is already very long—do you still inject context?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How would you design a heuristic to determine when git context is relevant to a user’s prompt vs when it’s just noise?”
  2. “Explain the difference between git diff, git diff --cached, and git diff HEAD. When would you use each?”
  3. “How would you handle a scenario where the git diff output contains sensitive information like API keys or passwords?”
  4. “What strategies would you use to truncate large diffs (500+ files changed) while preserving the most important information?”
  5. “How would you implement caching for git commands to avoid running expensive operations on every prompt?”
  6. “Explain how you would detect if a user’s prompt is asking about code (inject context) vs asking a conceptual question (skip context).”

Hints in Layers

Hint 1: Start with Always-On Injection Begin by injecting git context on every prompt. Don’t implement keyword detection yet. Get the basic flow working: read stdin (user prompt), run git diff --cached, append to prompt, output to stdout.

Hint 2: Check for Git Repository First Before running git commands, verify you’re in a repo:

if ! git rev-parse --is-inside-work-tree &>/dev/null; then
  # Not a git repo, skip injection
  echo "$original_prompt"
  exit 0
fi

Hint 3: Format Context for Readability Use markdown fences to make diffs clear:

echo "$original_prompt"
echo ""
echo "Git Context:"
echo '```diff'
git diff --cached
echo '```'

Hint 4: Truncate Large Diffs Limit diff size to avoid token budget issues:

diff_lines=$(git diff --cached | wc -l)
if [ "$diff_lines" -gt 500 ]; then
  git diff --cached --stat  # Show summary only
else
  git diff --cached
fi

Books That Will Help

Topic Book Chapter
Git Internals “Pro Git” by Scott Chacon Ch. 10 (Git Internals), Ch. 2 (Git Basics)
Hook System Kiro CLI documentation Hooks System, UserPromptSubmit
Shell Scripting “The Linux Command Line” by William Shotts Ch. 27 (Flow Control), Ch. 24 (Script Debugging)
Token Management Kiro CLI docs Context Window Management
Text Processing “Unix Power Tools” by Shelley Powers Ch. 13 (Searching and Substitution)

Common Pitfalls & Debugging

Problem 1: “Hook adds context to every prompt, even unrelated questions”

  • Why: No relevance detection implemented
  • Fix: Add keyword matching:
    if echo "$prompt" | grep -qiE '(change|diff|commit|modify|test|review)'; then
      inject_git_context
    fi
    
  • Quick test: Ask “What is Python?” and verify no git context is added

Problem 2: “Diff output contains API keys or secrets”

  • Why: No secret scanning before injecting context
  • Fix: Filter out sensitive patterns:
    git diff --cached | grep -vE '(API_KEY|SECRET|PASSWORD|TOKEN)='
    
  • Quick test: Stage a file containing API_KEY=abc123, verify that line is filtered out of the injected context

Problem 3: “Hook is slow, takes 5+ seconds per prompt”

  • Why: Running git diff on a massive repository every time
  • Fix: Cache diff results, keyed so the cache invalidates when the index changes:
    # HEAD alone is not enough: staging new changes doesn't move HEAD,
    # so include the index file's mtime in the cache key.
    index_mtime=$(stat -c %Y "$(git rev-parse --git-dir)/index")
    cache_file="/tmp/git-context-$(git rev-parse HEAD)-${index_mtime}.cache"
    if [ ! -f "$cache_file" ]; then
      git diff --cached > "$cache_file"
    fi
    cat "$cache_file"
    
  • Quick test: Time hook execution—should be <100ms with cache

Problem 4: “Large diffs break Kiro’s context window”

  • Why: 500-file refactor generates 50,000 lines of diff
  • Fix: Implement smart truncation:
    if [ $(git diff --cached | wc -l) -gt 500 ]; then
      echo "Git Context (large changeset, showing summary):"
      git diff --cached --stat | head -20
      echo "[...truncated...]"
    else
      git diff --cached
    fi
    
  • Quick test: Create a large diff, verify it’s summarized

Problem 5: “Hook doesn’t work in subdirectories”

  • Why: Git commands run from hook’s directory, not user’s cwd
  • Fix: Detect git root and run commands there:
    git_root=$(git rev-parse --show-toplevel)
    cd "$git_root" || exit 0
    git diff --cached
    
  • Quick test: Run Kiro from a subdirectory, verify context injection works

Definition of Done

  • Hook intercepts all UserPromptSubmit events
  • Git context is injected when user mentions code/changes (keyword detection)
  • Staged changes are shown with git diff --cached
  • Unstaged changes are optionally included based on prompt
  • Branch name and recent commits are included in summary
  • Large diffs (>500 lines) are truncated with stats summary
  • Hook skips injection when not in a git repository
  • Secret patterns (API keys, passwords) are filtered from diffs
  • Context is formatted in markdown code fences for readability
  • Cache invalidation prevents stale diff data
  • Hook completes in <100ms for cached results, <1s for fresh diffs
  • Documentation explains how to disable context injection per-prompt

Project 21: “The Headless Server Setup” — Remote Development

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Linux Shell
Coolness Level Level 3: Genuinely Clever
Difficulty Level 2: Intermediate
Knowledge Area Remote Development

What you’ll build: Install Kiro on a headless VM and authenticate via device flow.

Why it teaches Remote Dev: This is the standard pattern for server-based work.

Success criteria:

  • Headless login succeeds without a local browser.

Real World Outcome

You’ll have Kiro running on a headless Linux server (no GUI, no browser) and successfully authenticated via OAuth device flow. This enables server-side automation, CI/CD integration, and remote development workflows:

Successful headless authentication:

# On remote server (no GUI)
$ ssh user@dev-server.company.com

$ kiro auth login

Kiro CLI Authentication (Device Flow)
═══════════════════════════════════════
No browser detected. Using device code flow.

1. Visit this URL on any device with a browser:
   https://anthropic.com/device

2. Enter this code: ABCD-EFGH

3. Authorize the Kiro CLI application

Waiting for authorization... (timeout in 10 minutes)

[User visits URL on laptop, enters code, approves]

✓ Authentication successful!
✓ Token stored in: ~/.kiro/credentials.json
✓ Expires in: 30 days

$ kiro "show me system info"

Kiro: [Returns server specs, uptime, disk usage]

# Headless session is now active

Automated server setup:

$ cat setup-headless-kiro.sh

#!/bin/bash
# Install and configure Kiro on headless servers

# Install Kiro CLI
curl -fsSL https://kiro.dev/install.sh | bash

# Verify installation
kiro --version

# Authenticate (device flow)
echo "Follow the URL and enter the code displayed"
kiro auth login

# Verify authentication
kiro auth whoami

# Set up SSH agent forwarding (optional)
echo 'Host dev-server' >> ~/.ssh/config
echo '  ForwardAgent yes' >> ~/.ssh/config

echo "✓ Kiro headless setup complete"

CI/CD integration example:

# .github/workflows/deploy.yml
name: Deploy with Kiro

on: [push]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Kiro
        run: curl -fsSL https://kiro.dev/install.sh | bash

      - name: Authenticate Kiro (service account)
        env:
          KIRO_API_KEY: ${{ secrets.KIRO_API_KEY }}
        run: kiro auth login --api-key $KIRO_API_KEY

      - name: Run deployment
        run: kiro "deploy to staging using docker-compose"

This setup unlocks server-side AI automation where Kiro can manage infrastructure, run deployments, and execute maintenance tasks—all without human intervention.


The Core Question You’re Answering

“How do I use an AI coding assistant on remote servers that don’t have browsers or GUIs?”

Before you start coding, consider: Most cloud services use OAuth for authentication, which requires opening a browser to authorize. But servers don’t have browsers—they’re headless. The device flow (RFC 8628) solves this by splitting authentication into two steps: the headless device gets a code, and you enter that code on any browser-enabled device. This project teaches you to bridge local and remote authentication workflows.


Concepts You Must Understand First

Stop and research these before coding:

  1. OAuth Device Authorization Grant (Device Flow)
    • What is the device code flow (RFC 8628)?
    • How does it differ from standard OAuth (authorization code flow)?
    • What are the steps: device code request → user authorization → token polling?
    • How do you handle timeout (user doesn’t authorize within 10 minutes)?
    • Book Reference: “OAuth 2 in Action” by Justin Richer - Ch. 12 (Device Flow)
  2. SSH Agent Forwarding
    • What is ForwardAgent yes in SSH config?
    • How do you forward local credentials to remote sessions?
    • What are the security risks of agent forwarding?
    • When should you use ProxyJump vs ForwardAgent?
    • Book Reference: “SSH Mastery” by Michael W. Lucas - Ch. 6 (Agent Forwarding)
  3. Headless Environment Detection
    • How do you detect if a display is available ($DISPLAY, $WAYLAND_DISPLAY)?
    • How do you check if xdg-open or open commands work?
    • What’s the difference between TTY and pseudo-TTY sessions?
    • How do you determine if running in a CI/CD environment?
    • Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 62 (Terminals)
  4. Credential Storage and Rotation
    • Where should tokens be stored (~/.kiro/credentials.json)?
    • What file permissions are required for security (600)?
    • How do you handle token expiration and refresh?
    • Should you use environment variables vs config files?
    • Book Reference: “Building Secure and Reliable Systems” by Google - Ch. 9 (Secrets Management)

Questions to Guide Your Design

Before implementing, think through these:

  1. Authentication Method Selection
    • How do you detect if a browser is available vs requiring device flow?
    • Should you support both interactive and non-interactive modes?
    • How do you handle service accounts (API keys) vs user accounts (OAuth)?
    • What happens if the device code expires before authorization?
  2. Token Management
    • Where do you store the access token (file, keyring, environment)?
    • How do you secure the token file (permissions, encryption)?
    • Do you need a refresh token for long-running servers?
    • How often should you validate the token (every command, daily)?
  3. CI/CD Integration
    • How do you authenticate in GitHub Actions/GitLab CI without interaction?
    • Should you use service accounts or personal access tokens?
    • How do you rotate tokens in CI without manual intervention?
    • What’s the fallback if authentication fails mid-pipeline?
  4. Debugging Headless Issues
    • How do you debug authentication failures without a GUI?
    • Should you log to a file (~/.kiro/debug.log) or stderr?
    • How do you test device flow locally before deploying to servers?
    • What telemetry do you need (auth attempts, failures, timeouts)?

Thinking Exercise

Manual Device Flow Authentication Walkthrough

Before writing code, trace the OAuth device flow step by step:

Step 1: Kiro detects headless environment

$ kiro auth login

# Check for browser availability
if ! command -v xdg-open &>/dev/null && [ -z "$DISPLAY" ]; then
  # Headless mode detected
  use_device_flow=true
fi

Step 2: Request device code from Anthropic

POST https://anthropic.com/oauth/device/code
Content-Type: application/json

{
  "client_id": "kiro-cli",
  "scope": "kiro.read kiro.write"
}

Response:
{
  "device_code": "DEVICE-12345",
  "user_code": "ABCD-EFGH",
  "verification_uri": "https://anthropic.com/device",
  "expires_in": 600,
  "interval": 5
}

Step 3: Display instructions to user

1. Visit: https://anthropic.com/device
2. Enter code: ABCD-EFGH
3. Authorize Kiro CLI

Step 4: Poll for authorization

# Poll every 5 seconds for up to 600 seconds
while [ $elapsed -lt 600 ]; do
  POST https://anthropic.com/oauth/token
  {
    "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
    "device_code": "DEVICE-12345",
    "client_id": "kiro-cli"
  }

  # Response if pending:
  { "error": "authorization_pending" }

  # Response if approved:
  {
    "access_token": "kiro_...",
    "token_type": "Bearer",
    "expires_in": 2592000
  }

  sleep 5
done

Step 5: Store token securely

echo "$access_token" > ~/.kiro/credentials.json
chmod 600 ~/.kiro/credentials.json

Questions while tracing:

  • What happens if the user never authorizes (timeout)?
  • How do you handle polling errors (network failures)?
  • Should you provide a QR code for mobile authorization?
  • How do you test this without hitting real OAuth servers?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the OAuth device flow (RFC 8628) and how it differs from the standard authorization code flow. When would you use each?”
  2. “How would you securely store OAuth tokens on a headless server? What file permissions and encryption methods would you use?”
  3. “What strategies would you use to detect if a system is headless vs has a GUI available?”
  4. “How do you implement token refresh in a long-running server application without user interaction?”
  5. “Explain the security implications of SSH agent forwarding. When is it safe to use, and what are the alternatives?”
  6. “How would you design a CI/CD integration for an OAuth-authenticated tool without exposing tokens in logs?”

Hints in Layers

Hint 1: Detect Headless Environment Check for display availability before attempting browser launch:

if [ -z "$DISPLAY" ] && ! command -v xdg-open &>/dev/null; then
  echo "No browser detected. Using device code flow."
  device_flow=true
fi

Hint 2: Use curl for OAuth API Calls Request device code:

response=$(curl -s -X POST https://anthropic.com/oauth/device/code \
  -H "Content-Type: application/json" \
  -d '{"client_id": "kiro-cli", "scope": "kiro.read kiro.write"}')

device_code=$(echo "$response" | jq -r '.device_code')
user_code=$(echo "$response" | jq -r '.user_code')
verification_uri=$(echo "$response" | jq -r '.verification_uri')

Hint 3: Poll with Exponential Backoff Don’t hammer the server every second:

interval=5        # start from the `interval` returned with the device code
max_attempts=120  # ~10 minutes at the initial 5-second interval
for i in $(seq 1 $max_attempts); do
  token_response=$(curl -s -X POST https://anthropic.com/oauth/token \
    --data-urlencode "grant_type=urn:ietf:params:oauth:grant-type:device_code" \
    --data-urlencode "device_code=$device_code" \
    --data-urlencode "client_id=kiro-cli")

  if echo "$token_response" | jq -e '.access_token' &>/dev/null; then
    echo "✓ Authentication successful!"
    break
  fi
  # RFC 8628: a slow_down error means increase the polling interval
  if [ "$(echo "$token_response" | jq -r '.error')" = "slow_down" ]; then
    interval=$((interval + 5))
  fi
  sleep $interval
done

Hint 4: Secure Token Storage

mkdir -p ~/.kiro
echo "$access_token" > ~/.kiro/credentials.json
chmod 600 ~/.kiro/credentials.json  # Owner read/write only

Books That Will Help

Topic Book Chapter
OAuth Device Flow “OAuth 2 in Action” by Justin Richer Ch. 12 (Device Flow), Ch. 6 (Client Types)
SSH Configuration “SSH Mastery” by Michael W. Lucas Ch. 6 (Agent Forwarding), Ch. 11 (SSH for Automation)
Headless Systems “The Linux Programming Interface” by Michael Kerrisk Ch. 62 (Terminals), Ch. 34 (Process Groups)
Secrets Management “Building Secure and Reliable Systems” by Google Ch. 9 (Secrets), Ch. 6 (Least Privilege)
CI/CD Integration “Continuous Delivery” by Jez Humble Ch. 14 (Advanced Version Control)

Common Pitfalls & Debugging

Problem 1: “Device code flow times out before user authorizes”

  • Why: 10-minute timeout is too short, or user didn’t see the prompt
  • Fix: Send timeout reminder:
    echo "⏰ 5 minutes remaining. Visit https://anthropic.com/device"
    
  • Quick test: Wait 11 minutes without authorizing, verify graceful timeout

Problem 2: “Token file is world-readable, exposing credentials”

  • Why: Default file creation umask allows group/other read
  • Fix: Force secure permissions:
    (umask 077 && echo "$token" > ~/.kiro/credentials.json)
    
  • Quick test: ls -l ~/.kiro/credentials.json should show -rw-------

Problem 3: “Authentication works locally but fails in CI/CD”

  • Why: CI runs as different user with no home directory
  • Fix: Use environment variables in CI:
    export KIRO_TOKEN="$KIRO_API_KEY"
    kiro auth login --token-stdin <<< "$KIRO_TOKEN"
    
  • Quick test: Unset $HOME and verify token is read from env

Problem 4: “SSH agent forwarding doesn’t work”

  • Why: ForwardAgent not enabled or SSH key not added to agent
  • Fix:
    # On local machine
    ssh-add ~/.ssh/id_rsa
    ssh-add -l  # Verify key is added
    
    # In ~/.ssh/config
    Host dev-server
      ForwardAgent yes
    
  • Quick test: SSH to server, run ssh-add -l, verify keys are listed

Problem 5: “Device flow polling hammers the OAuth server (rate limit)”

  • Why: Polling every second instead of respecting interval from response
  • Fix: Use the interval field from device code response:
    interval=$(echo "$response" | jq -r '.interval // 5')
    sleep $interval
    
  • Quick test: Monitor network requests, verify polling rate matches interval

Definition of Done

  • Kiro detects headless environment automatically (no $DISPLAY)
  • Device code flow is initiated when browser is unavailable
  • User sees clear instructions: URL to visit and code to enter
  • Polling respects the interval from OAuth response (no rate limiting)
  • Token is stored in ~/.kiro/credentials.json with 600 permissions
  • Authentication timeout (10 minutes) is handled gracefully
  • Token refresh is implemented for long-running sessions
  • CI/CD integration works via environment variable (KIRO_API_KEY)
  • SSH agent forwarding is documented as an alternative
  • Installation script (setup-headless-kiro.sh) is provided
  • Documentation explains how to troubleshoot headless auth failures

Project 22: “The SSH Tunnel Agent” — Networking

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language SSH Config
Coolness Level Level 4: Hardcore Tech Flex
Difficulty Level 3: Advanced
Knowledge Area Networking

What you’ll build: Run Kiro locally but execute commands remotely via ssh.

Why it teaches Hybrid Workflows: Brain local, execution remote.

Success criteria:

  • A deploy task runs end-to-end on a remote host.

Real World Outcome

You’ll have a hybrid workflow where Kiro runs on your local machine (with GUI, editor, browser) but executes commands on remote servers via SSH. This combines local intelligence with remote execution power:

Local Kiro, remote execution:

# On your laptop
$ cat ~/.kiro/config.json
{
  "remote_execution": {
    "enabled": true,
    "host": "prod-server.company.com",
    "user": "deploy",
    "tools": ["Bash", "Read", "Write", "Edit"]
  }
}

$ kiro "deploy the latest version to production"

[Kiro running locally] Analyzing deployment strategy...

[SSH Tunnel] Connecting to prod-server.company.com...
[SSH Tunnel] Forwarding Bash tool execution...

[Remote Bash] ssh deploy@prod-server.company.com
$ cd /var/www/app
$ git pull origin main
$ npm install --production
$ pm2 restart app
✓ Deployment complete

[Kiro running locally] Deployment successful. Health check passing.

Transparent remote file access:

$ kiro "show me the nginx config on the production server"

[Kiro] Reading remote file via SSH...

[Remote Read Tool]
ssh deploy@prod-server.company.com 'cat /etc/nginx/nginx.conf'

[Kiro displays config and suggests improvements]

$ kiro "update the worker_processes to 4"

[Remote Edit Tool]
ssh deploy@prod-server.company.com 'cat > /tmp/nginx.conf.new << EOF
worker_processes 4;
...
EOF
sudo mv /tmp/nginx.conf.new /etc/nginx/nginx.conf'

[Remote Bash]
ssh deploy@prod-server.company.com 'sudo nginx -s reload'

✓ Configuration updated and reloaded

SSH config for seamless tunneling:

$ cat ~/.ssh/config

Host prod-server
  HostName prod-server.company.com
  User deploy
  Port 22
  IdentityFile ~/.ssh/deploy_key
  ForwardAgent yes
  ControlMaster auto
  ControlPath ~/.ssh/control-%r@%h:%p
  ControlPersist 10m

Host staging-server
  HostName staging.company.com
  User deploy
  ProxyJump bastion.company.com
  LocalForward 5432 localhost:5432

This setup enables “brain local, muscle remote” workflows where you get local responsiveness with remote execution power.


The Core Question You’re Answering

“How do I leverage my local development environment while executing commands on remote production servers?”

Before you start coding, consider: Installing Kiro on every server is impractical and risky. Running Kiro locally but SSHing to execute each command is slow and error-prone. SSH tunneling with ControlMaster lets you maintain a single persistent connection, forward tool execution transparently, and keep your local editor/browser while working on remote systems. This project teaches you to build hybrid architectures that combine local and remote strengths.
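
To make the idea concrete, here is a small TypeScript sketch of proxying a single command over a multiplexed connection. It is an illustration of the pattern, not Kiro's actual remote-execution mechanism; the host name is an assumption, while the ControlMaster options mirror the SSH config shown above.

import { spawnSync } from "node:child_process";

function runRemote(host: string, command: string) {
  // spawnSync without a shell: `command` is handed to ssh as ONE argv element,
  // so the LOCAL shell never interprets it; the REMOTE shell executes it.
  const res = spawnSync("ssh", [
    "-o", "ControlMaster=auto",                  // reuse one TCP/auth handshake
    "-o", "ControlPath=~/.ssh/control-%r@%h:%p",
    "-o", "ControlPersist=10m",                  // keep the master alive between tool calls
    host,
    command,
  ], { encoding: "utf8" });
  return { exitCode: res.status ?? 1, stdout: res.stdout ?? "", stderr: res.stderr ?? "" };
}

// If an untrusted value must be embedded literally, quote it for the REMOTE shell
// using the standard single-quote escaping idiom ('\''):
const quoteArg = (s: string) => `'${s.replace(/'/g, `'\\''`)}'`;

// Example (hypothetical host and path):
// runRemote("deploy@prod-server.company.com", `ls -l ${quoteArg("/var/www/app")}`)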


Concepts You Must Understand First

Stop and research these before coding:

  1. SSH ControlMaster (Connection Multiplexing)
    • What is ControlMaster and how does it reuse SSH connections?
    • How do you configure ControlPath for persistent sockets?
    • What is ControlPersist and when does it close connections?
    • How do you debug stuck control sockets (-O check, -O exit)?
    • Book Reference: “SSH Mastery” by Michael W. Lucas - Ch. 8 (Multiplexing)
  2. Remote Command Execution Patterns
    • How do you execute a single command via SSH (ssh host 'command')?
    • How do you handle stdin/stdout redirection over SSH?
    • What’s the difference between ssh -t (pseudo-TTY) vs non-interactive?
    • How do you escape shell metacharacters in remote commands?
    • Book Reference: “Unix Network Programming” by W. Richard Stevens - Ch. 19 (Remote Execution)
  3. SSH Port Forwarding (Tunneling)
    • What is local forwarding (-L) vs remote forwarding (-R)?
    • How do you forward multiple ports simultaneously?
    • What is dynamic forwarding (-D) for SOCKS proxy?
    • How do you debug forwarding failures (-v verbose mode)?
    • Book Reference: “SSH Mastery” by Michael W. Lucas - Ch. 7 (Forwarding)
  4. Tool Execution Proxying
    • How do you intercept tool calls and route them to SSH?
    • Should you proxy all tools or only specific ones (Bash, Read, Write)?
    • How do you handle tool failures (network errors, timeouts)?
    • What about file path translation (local vs remote paths)?
    • Book Reference: Kiro CLI docs - Remote Execution Configuration

Questions to Guide Your Design

Before implementing, think through these:

  1. Tool Selection for Remote Execution
    • Which tools should execute remotely (Bash, Read, Write, Edit)?
    • Should Grep run remotely or locally after fetching files?
    • What about tools that need local state (TodoWrite, EnterPlanMode)?
    • How do you handle mixed workflows (some local, some remote)?
  2. Connection Management
    • Do you open a new SSH connection per tool call or reuse one?
    • How do you detect connection failures and retry?
    • Should you establish the connection lazily (on first use) or eagerly?
    • What’s the timeout for idle connections (ControlPersist)?
  3. Path Translation
    • How do you map local file paths to remote paths?
    • Do you assume identical directory structures?
    • How do you handle absolute vs relative paths?
    • What about symlinks that resolve differently locally vs remotely?
  4. Security and Permissions
    • Should you use password auth or key-based auth?
    • How do you handle sudo commands that require passwords?
    • Do you need to validate the remote host’s fingerprint?
    • How do you prevent command injection via shell escaping?

Thinking Exercise

Manual SSH Tunnel Execution Trace

Before writing code, trace how a tool call is proxied through SSH:

Scenario: Kiro executes a Bash command remotely

Step 1: User asks Kiro to deploy

$ kiro "deploy the app to production"

Kiro decides: Run `npm run build && pm2 restart app`

Step 2: Kiro invokes Bash tool

Tool: Bash
Arguments: {
  "command": "npm run build && pm2 restart app"
}

Step 3: Remote execution hook intercepts

# Hook detects that remote execution is enabled for this tool
if config.remote_execution.enabled:
    if 'Bash' in config.remote_execution.tools:
        execute_remotely(tool, args)

Step 4: Build SSH command

ssh_command = [
  'ssh',
  '-o', 'ControlMaster=auto',
  '-o', 'ControlPath=~/.ssh/control-%r@%h:%p',
  '-o', 'ControlPersist=10m',
  'deploy@prod-server.company.com',
  'cd /var/www/app && npm run build && pm2 restart app'
]

Step 5: Execute via SSH

$ ssh deploy@prod-server.company.com 'cd /var/www/app && npm run build && pm2 restart app'

# SSH reuses existing connection via ControlMaster socket
# Output is streamed back to local Kiro

Step 6: Return result to Kiro

{
  "exitCode": 0,
  "stdout": "Build complete. PM2 restarted app.",
  "stderr": "",
  "duration": 12345
}

Questions while tracing:

  • How do you handle commands that need interactive input (sudo passwords)?
  • What if the SSH connection breaks mid-execution?
  • How do you capture real-time output (streaming vs buffered)?
  • What if the remote command takes hours—do you keep the connection open?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain SSH ControlMaster and how it enables connection multiplexing. What are the performance benefits?”
  2. “How would you handle shell escaping when passing user-generated commands over SSH to prevent command injection?”
  3. “What are the differences between SSH local forwarding (-L), remote forwarding (-R), and dynamic forwarding (-D)? When would you use each?”
  4. “How would you design a retry mechanism for tool execution that fails due to transient network errors?”
  5. “Explain the security implications of SSH agent forwarding. How would you mitigate the risks?”
  6. “How would you implement path translation for tools that operate on files when local and remote directory structures differ?”

Hints in Layers

Hint 1: Configure SSH ControlMaster Add to ~/.ssh/config:

Host prod-server
  HostName prod-server.company.com
  User deploy
  ControlMaster auto
  ControlPath ~/.ssh/control-%r@%h:%p
  ControlPersist 10m

This creates a persistent connection socket that’s reused for all subsequent SSH commands.

Hint 2: Proxy Bash Tool Calls Intercept Bash tool and route to SSH:

# In a hook or wrapper script
original_command="$1"
ssh prod-server "cd /app && $original_command"

Hint 3: Test Connection Reuse Verify ControlMaster is working:

# First connection (slow, establishes socket)
time ssh prod-server 'echo hello'  # ~500ms

# Subsequent connections (fast, reuse socket)
time ssh prod-server 'echo hello'  # ~50ms

Hint 4: Handle Shell Escaping Use printf %q to safely escape commands:

safe_command=$(printf '%q' "$user_command")
ssh prod-server "bash -c $safe_command"

Books That Will Help

Topic Book Chapter
SSH Multiplexing “SSH Mastery” by Michael W. Lucas Ch. 8 (Multiplexing), Ch. 7 (Forwarding)
Remote Execution “Unix Network Programming” by W. Richard Stevens Ch. 19 (Remote Execution)
Shell Escaping “The Linux Command Line” by William Shotts Ch. 35 (Strings and Numbers)
SSH Configuration “SSH: The Secure Shell” by Barrett, Silverman, Byrnes Ch. 7 (Advanced Client Use)
Networking Basics “TCP/IP Illustrated, Volume 1” by W. Richard Stevens Ch. 2 (The Internet Protocol)

Common Pitfalls & Debugging

Problem 1: “Each SSH command takes 500ms, making Kiro unbearably slow”

  • Why: Opening new SSH connection for every command
  • Fix: Enable ControlMaster connection reuse:
    Host prod-server
      ControlMaster auto
      ControlPath ~/.ssh/control-%r@%h:%p
      ControlPersist 10m
    
  • Quick test: Time 10 rapid SSH commands—should be <100ms each after first

Problem 2: “Commands fail with ‘command not found’ on remote host”

  • Why: ssh host 'command' runs a non-interactive, non-login shell: ~/.bash_profile is never sourced, and most ~/.bashrc files exit early, so PATH entries added in your profile are missing
  • Fix: Source the profile explicitly or run a login shell:
    ssh prod-server 'source ~/.bashrc && npm run build'
    # OR
    ssh prod-server 'bash -lc "npm run build"'  # Login shell sources the profile
    
  • Quick test: ssh prod-server 'echo $PATH' vs ssh prod-server 'bash -lc "echo \$PATH"'

Problem 3: “Command injection via user input”

  • Why: User command contains shell metacharacters (;, |, &&)
  • Fix: Use parameterized execution or escape properly:
    # BAD: ssh prod-server "rm $user_file"  # Injection risk
    # GOOD:
    safe_file=$(printf '%q' "$user_file")
    ssh prod-server "rm $safe_file"
    
  • Quick test: Try user_file="; rm -rf /" and verify it’s escaped

Problem 4: “Stuck control socket prevents new connections”

  • Why: ControlMaster socket is hung or orphaned
  • Fix: Kill stuck socket:
    ssh -O exit prod-server
    # OR manually:
    rm ~/.ssh/control-deploy@prod-server.company.com:22
    
  • Quick test: ssh -O check prod-server shows socket status

Problem 5: “Remote commands don’t stream output, buffered until completion”

  • Why: stdout is buffered when not connected to a TTY
  • Fix: Use stdbuf or script to force line buffering:
    ssh prod-server 'stdbuf -oL npm run build'
    # OR force pseudo-TTY:
    ssh -t prod-server 'npm run build'
    
  • Quick test: Long-running command should show incremental output

Definition of Done

  • SSH ControlMaster is configured for connection reuse
  • Bash tool calls are proxied to remote host via SSH
  • Read/Write/Edit tools can operate on remote files
  • Tool execution respects working directory on remote host
  • Shell commands are properly escaped to prevent injection
  • ControlPersist keeps connections open for 10 minutes
  • Failed SSH connections are retried with exponential backoff
  • Remote execution can be toggled on/off via config file
  • Path translation handles local vs remote directory structures
  • Streaming output works for long-running commands
  • Documentation explains SSH config setup and troubleshooting
  • Supports bastion/jump hosts for accessing internal servers

Project 23: “The Corporate Proxy Navigator” — Enterprise Networking

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Env Vars / Certs
Coolness Level Level 2: Practical
Difficulty Level 3: Advanced
Knowledge Area Enterprise Networking

What you’ll build: Configure Kiro to use HTTPS_PROXY and trust a custom root CA.

Why it teaches Enterprise Readiness: Most enterprise rollout failures happen at the network layer (proxies, SSL inspection, untrusted root CAs).

Success criteria:

  • Kiro can reach LLM endpoints through the proxy.

Real World Outcome

You’ll configure Kiro to work in a corporate environment with an HTTPS proxy and custom SSL certificates. When you run Kiro, it will successfully connect to Anthropic’s API through the corporate proxy without certificate errors.

Example Configuration:

# Set proxy environment variables
export HTTPS_PROXY=http://proxy.company.com:8080
export HTTP_PROXY=http://proxy.company.com:8080
export NO_PROXY=localhost,127.0.0.1,.company.local

# Trust custom root CA (macOS)
sudo security add-trusted-cert -d -r trustRoot \
  -k /Library/Keychains/System.keychain \
  /path/to/company-root-ca.crt

# Trust custom root CA (Linux)
sudo cp /path/to/company-root-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

# Verify proxy connectivity
curl -I https://api.anthropic.com/v1/messages
# HTTP/1.1 200 OK (via proxy)

# Run Kiro (should work without certificate errors)
kiro "list files in current directory"

Output you’ll see:

$ kiro "analyze the recent commits"

✓ Connected to Anthropic API via proxy.company.com:8080
✓ SSL certificate verified (CN=proxy.company.com, Issuer=CompanyRootCA)

Analyzing commits...
[Output shows Kiro successfully working through the proxy]

Troubleshooting verification:

# Verify certificate chain
openssl s_client -connect api.anthropic.com:443 -proxy proxy.company.com:8080 \
  -showcerts 2>&1 | grep -A 2 "Verify return code"
# Verify return code: 0 (ok)

# Test proxy authentication (if required)
export HTTPS_PROXY=http://username:password@proxy.company.com:8080

# Debug SSL handshake
SSL_DEBUG=true kiro "test command"

You’re configuring the same environment that 80% of enterprise developers face daily.


The Core Question You’re Answering

“Why does my LLM tool work at home but fail at the office, and how do I make it work behind a corporate proxy with SSL inspection?”

Before you configure anything, understand this: Corporate networks intercept HTTPS traffic for security monitoring. Your computer sees a certificate signed by “CompanyRootCA” instead of the real website’s certificate. Without trusting that CA, every HTTPS connection fails with “certificate validation error.”

The proxy configuration and CA trust setup are not optional workarounds—they’re the standard way enterprise tools connect to the internet.
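
To see exactly what a terminal tool will pick up from your shell, you can query Python's standard proxy resolution, which reads the same *_proxy environment variables most CLI tools honour (hostnames below are placeholders):

import urllib.request

# getproxies() reads the *_proxy environment variables (falling back to OS
# settings on macOS/Windows) into a {"http": ..., "https": ...} mapping.
print(urllib.request.getproxies())

# proxy_bypass() applies the NO_PROXY rules (or the OS bypass list) to a host.
for host in ("api.anthropic.com", "internal.company.local", "localhost"):
    print(host, "bypasses proxy:", urllib.request.proxy_bypass(host))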


Concepts You Must Understand First

Stop and research these before configuring:

  1. HTTPS Proxies and CONNECT Method
    • How does an HTTPS proxy tunnel encrypted traffic?
    • What is the HTTP CONNECT method and why is it used for HTTPS?
    • How does the client maintain end-to-end encryption through the proxy?
    • Book Reference: “Computer Networks” by Andrew S. Tanenbaum - Ch. 7 (Application Layer)
  2. SSL/TLS Certificate Chains and Trust
    • What is a certificate chain (root CA → intermediate CA → leaf certificate)?
    • How does the OS certificate store work (System.keychain on macOS, /etc/ssl/certs on Linux)?
    • What happens during SSL inspection (man-in-the-middle by the proxy)?
    • Book Reference: “Serious Cryptography” by Jean-Philippe Aumasson - Ch. 13 (TLS)
  3. Environment Variables and Tool Configuration
    • What is the precedence order (HTTPS_PROXY vs https_proxy vs tool-specific config)?
    • How does NO_PROXY prevent proxy usage for internal domains?
    • Why do some tools ignore environment variables (and how to configure them)?
    • Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 6 (Environment)
  4. Proxy Authentication (Basic, NTLM, Kerberos)
    • How does Basic Auth work (username:password in URL)?
    • What is NTLM authentication and why is it used in Windows environments?
    • How does Kerberos Single Sign-On (SSO) work with proxies?
    • Book Reference: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens - Ch. 18 (HTTP)
  5. Debugging Network Connectivity
    • How do you trace SSL handshake failures (openssl s_client)?
    • How do you verify proxy connectivity (curl -I -x)?
    • How do you capture and analyze HTTPS traffic (mitmproxy, Wireshark)?
    • Book Reference: “Practical Packet Analysis” by Chris Sanders - Ch. 5 (HTTP/HTTPS)
  6. Certificate Pinning and Bypass Strategies
    • What is certificate pinning and why do some tools use it?
    • How do you disable pinning for enterprise proxies (NODE_TLS_REJECT_UNAUTHORIZED)?
    • What are the security risks of disabling certificate validation?
    • Book Reference: “Security Engineering” by Ross Anderson - Ch. 21 (Network Attacks)

Questions to Guide Your Design

Before configuring, think through these:

  1. Proxy Discovery and Configuration
    • How do you automatically detect the corporate proxy (PAC file, WPAD)?
    • Should you use environment variables or tool-specific config files?
    • How do you handle proxy authentication without exposing credentials?
    • What happens when you VPN into the network vs. are physically on-site?
  2. Certificate Trust Strategy
    • Should you trust the root CA system-wide or per-application?
    • How do you export the root CA from the browser (Chrome, Firefox)?
    • What format does the certificate need to be (PEM, DER, PKCS#12)?
    • How do you update the certificate when it rotates annually?
  3. Handling Proxy Failures
    • How do you detect when the proxy is unreachable (timeout vs. connection refused)?
    • Should you fall back to direct connection, or fail fast?
    • How do you handle 407 Proxy Authentication Required errors?
    • What logging helps diagnose proxy issues quickly?
  4. NO_PROXY Configuration
    • Which domains should bypass the proxy (.company.local, localhost, 127.0.0.1)?
    • How do you handle wildcard domains (*.internal.company.com)?
    • Should you bypass the proxy for local development servers?
    • How do you test that NO_PROXY is working correctly?
  5. Cross-Platform Compatibility
    • How do you make the same config work on macOS, Linux, and Windows?
    • Where does each OS store trusted root certificates?
    • How do you handle case sensitivity (HTTPS_PROXY vs https_proxy on Windows)?
    • What about containerized environments (Docker, Kubernetes)?

Thinking Exercise

Trace an HTTPS Request Through a Corporate Proxy

Before configuring anything, manually trace how an HTTPS request flows through a corporate proxy with SSL inspection:

Client                 Proxy                Destination
  |                      |                       |
  |-- HTTP CONNECT api.anthropic.com:443 ------>|
  |                      |                       |
  |<-------- 200 Connection Established --------|
  |                      |                       |
  |-- TLS ClientHello -->|                       |
  |    (SNI: api.anthropic.com)                  |
  |                      |                       |
  |<-- TLS ServerHello --|                       |
  |    (Cert: CN=api.anthropic.com,             |
  |     Issuer=CompanyRootCA)                    |
  |                      |                       |
  |-- TLS Finished ----->|                       |
  |                      |-- TLS ClientHello --->|
  |                      |    (SNI: api.anthropic.com)
  |                      |                       |
  |                      |<-- TLS ServerHello ---|
  |                      |    (Cert: CN=api.anthropic.com,
  |                      |     Issuer=Let's Encrypt)
  |                      |                       |
  |-- HTTP Request ----->|-- HTTP Request ------>|
  |    (encrypted by     |    (re-encrypted by   |
  |     client-proxy TLS)|     proxy-server TLS) |
  |                      |                       |

Questions while tracing:

  • At what point does the client see the fake certificate signed by CompanyRootCA?
  • Why does the proxy need to decrypt and re-encrypt the traffic (SSL inspection)?
  • What would happen if CompanyRootCA is not trusted by the client?
  • How does the client verify the hostname matches (SNI vs. certificate CN)?
  • Why doesn’t the client see the real Let’s Encrypt certificate?

Manual test:

# 1. Try without trusting the CA (should fail)
curl https://api.anthropic.com/v1/messages -I
# curl: (60) SSL certificate problem: unable to get local issuer certificate

# 2. Export the proxy's CA certificate from Chrome
#    Settings → Privacy and security → Security → Manage certificates
#    → Authorities → Export "CompanyRootCA"

# 3. Trust the CA and retry
export SSL_CERT_FILE=/path/to/company-root-ca.crt
curl https://api.anthropic.com/v1/messages -I
# HTTP/1.1 200 OK (via proxy)
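
The same trace driven from Python's standard library (the proxy host, port, and CA path are placeholders): it issues the CONNECT, then completes the TLS handshake against whatever certificate the proxy presents, succeeding only if that certificate chains to the CA you point it at.

import ssl
import http.client

PROXY_HOST, PROXY_PORT = "proxy.company.com", 8080
TARGET = "api.anthropic.com"
CA_FILE = "/path/to/company-root-ca.crt"   # the exported CompanyRootCA

# Verify against the corporate root CA instead of the system defaults.
context = ssl.create_default_context(cafile=CA_FILE)

conn = http.client.HTTPSConnection(PROXY_HOST, PROXY_PORT, context=context)
conn.set_tunnel(TARGET, 443)               # sends "CONNECT api.anthropic.com:443"
try:
    conn.request("HEAD", "/")
    resp = conn.getresponse()
    print("Handshake OK, HTTP status:", resp.status)
except ssl.SSLCertVerificationError as exc:
    # This is the failure you see when CompanyRootCA is not trusted.
    print("Certificate rejected:", exc)
finally:
    conn.close()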

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Why do HTTPS connections fail with ‘certificate validation error’ behind a corporate proxy, even though the proxy is configured correctly?”

  2. “What is the difference between HTTP_PROXY and HTTPS_PROXY environment variables, and when is each used?”

  3. “A user reports that some tools work through the proxy but others don’t. How would you diagnose this?”

  4. “How do you securely configure proxy authentication without hardcoding credentials in environment variables?”

  5. “What is a PAC (Proxy Auto-Config) file, and how does it differ from manually setting HTTPS_PROXY?”

  6. “Why should you never set NODE_TLS_REJECT_UNAUTHORIZED=0 in production, even if it ‘fixes’ the certificate error?”


Hints in Layers

Hint 1: Start with Proxy Discovery Use the browser’s network settings to find the proxy configuration. On Windows, check “Internet Options → Connections → LAN Settings.” On macOS, check “System Preferences → Network → Advanced → Proxies.”

Hint 2: Test Connectivity Before Configuring Kiro First verify that curl works through the proxy. If curl fails with certificate errors, no amount of Kiro configuration will help—you need to fix the CA trust issue first.

Hint 3: Export and Inspect the Certificate Use openssl s_client to see the actual certificate chain:

openssl s_client -connect api.anthropic.com:443 -proxy proxy.company.com:8080 -showcerts

Look for “Issuer: CN=CompanyRootCA” (not “Issuer: Let’s Encrypt”). That certificate needs to be trusted.

Hint 4: Platform-Specific CA Trust Commands Each OS has a different way to trust certificates:

  • macOS: security add-trusted-cert
  • Linux (Debian/Ubuntu): Copy to /usr/local/share/ca-certificates/ and run update-ca-certificates
  • Linux (RHEL/CentOS): Copy to /etc/pki/ca-trust/source/anchors/ and run update-ca-trust
  • Node.js/Python tools: May need NODE_EXTRA_CA_CERTS or REQUESTS_CA_BUNDLE
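
A small diagnostic sketch showing which CA bundle a Python-based tool would use by default, and which per-ecosystem override variables are currently set in your shell:

import os
import ssl

# Default verify locations compiled into this Python / OpenSSL build.
print(ssl.get_default_verify_paths())

# Common per-ecosystem overrides; if a tool ignores the OS trust store,
# one of these usually tells it where the corporate root CA lives.
for var in ("SSL_CERT_FILE", "REQUESTS_CA_BUNDLE", "NODE_EXTRA_CA_CERTS", "CURL_CA_BUNDLE"):
    print(f"{var}={os.environ.get(var, '<unset>')}")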

Books That Will Help

Topic Book Chapter
HTTPS Proxies “Computer Networks” by Tanenbaum Ch. 7 (Application Layer)
SSL/TLS Internals “Serious Cryptography” by Aumasson Ch. 13 (TLS)
Certificate Chains “Bulletproof SSL and TLS” by Ivan Ristić Ch. 4 (PKI)
Environment Variables “The Linux Programming Interface” by Kerrisk Ch. 6 (Environment)
Debugging HTTPS “TCP/IP Illustrated, Vol. 1” by Stevens Ch. 18 (HTTP)
Network Troubleshooting “Practical Packet Analysis” by Sanders Ch. 5 (HTTP/HTTPS)

Common Pitfalls & Debugging

Problem 1: “curl: (60) SSL certificate problem: unable to get local issuer certificate”

  • Why: The corporate proxy is presenting a certificate signed by an untrusted root CA.
  • Fix: Export the root CA from your browser and add it to the system trust store.
  • Quick test: openssl s_client -connect api.anthropic.com:443 -proxy proxy.company.com:8080 -showcerts | grep Issuer

Problem 2: “Proxy authentication required (407)”

  • Why: The proxy requires username/password, but none was provided.
  • Fix: Add credentials to the proxy URL: export HTTPS_PROXY=http://username:password@proxy.company.com:8080
  • Quick test: curl -I -x http://username:password@proxy.company.com:8080 https://api.anthropic.com

Problem 3: “Connection timeout when accessing internal domains”

  • Why: Internal domains should bypass the proxy, but NO_PROXY is not configured.
  • Fix: export NO_PROXY=localhost,127.0.0.1,.company.local,*.internal.company.com
  • Quick test: curl -I http://internal.company.local (should connect directly, not via proxy)

Problem 4: “Works in browser but not in terminal”

  • Why: The browser uses the OS proxy settings, but terminal tools use environment variables.
  • Fix: Set HTTPS_PROXY in your shell profile (~/.bashrc, ~/.zshrc).
  • Quick test: echo $HTTPS_PROXY (should show the proxy URL)

Problem 5: “Certificate works for curl but not for Node.js tools”

  • Why: Node.js uses its own certificate store, separate from the OS.
  • Fix: export NODE_EXTRA_CA_CERTS=/path/to/company-root-ca.crt
  • Quick test: node -e "require('https').get('https://api.anthropic.com', r => console.log(r.statusCode))"

Problem 6: “Proxy works on-site but fails when VPNed from home”

  • Why: VPN routing may bypass the proxy, or the proxy may only be accessible on the internal network.
  • Fix: Check if the VPN provides split-tunnel (some traffic via VPN, some direct) or full-tunnel routing.
  • Quick test: curl -I https://api.anthropic.com (compare response headers on-site vs. VPN)

Definition of Done

  • Kiro successfully connects to Anthropic API through the corporate proxy
  • No SSL certificate validation errors appear
  • Internal company domains bypass the proxy (via NO_PROXY)
  • Proxy authentication (if required) works without exposing credentials in shell history
  • Configuration persists across terminal sessions (added to shell profile)
  • Works consistently on macOS, Linux, and Windows (if multi-platform team)
  • Fallback mechanism exists if proxy is unreachable (manual switch or direct connection)
  • Documentation includes troubleshooting steps for common certificate errors
  • Tested with both Kiro and curl to ensure system-wide proxy configuration
  • No hardcoded credentials in config files (use keychain or environment variables)

Project 24: “The Secret Sanitizer Hook” — Secrets Management

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Python (TruffleHog / Gitleaks)
Coolness Level Level 3: Genuinely Clever
Business Potential 3. Service & Support (Security)
Difficulty Level 2: Intermediate
Knowledge Area Secrets Management

What you’ll build: A PostToolUse hook that scans modified files for secrets.

Why it teaches Safety: Prevents accidental secret leakage.

Success criteria:

  • A dummy key is detected and blocked.

Real World Outcome

You’ll create a PostToolUse hook that automatically scans every file written or modified by Kiro for secrets (API keys, passwords, tokens, private keys). When a secret is detected, the hook blocks the operation and alerts you immediately.

Example Hook Execution:

# Kiro writes a file containing a secret
$ kiro "create a .env file with DATABASE_URL=postgres://user:pass@localhost/db"

[Kiro writes .env file]

🚨 SECRET DETECTED in .env (line 1)
   Type: PostgreSQL Connection String
   Pattern: postgres://[user]:[password]@[host]/[db]

   DATABASE_URL=postgres://user:pass@localhost/db
                              ^^^^^^^^

❌ BLOCKED: File write operation prevented.

Recommendations:
  1. Use environment variables instead: DATABASE_URL="${DATABASE_URL}"
  2. Add .env to .gitignore
  3. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault)

[Hook exits with code 1 - operation aborted]

Example scan output:

$ python ~/.kiro/hooks/secret-sanitizer.py

Scanning files modified in last operation...
  ✓ src/app.py - Clean
  ✓ src/utils.py - Clean
  🚨 config.yaml - 2 secrets found
     Line 12: AWS Access Key (AKIA...)
     Line 13: AWS Secret Key (40-char base64 string)
  ✓ README.md - Clean

Summary: 2 secrets detected in 1 file
Action: BLOCK operation (exit code 1)

Integration with Git:

# After blocking the write, show what would have been committed
$ git diff

+AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
+AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

❌ These credentials would have been committed to git!

You’re implementing the same secret detection that GitHub, GitLab, and Bitbucket use to prevent credential leaks.


The Core Question You’re Answering

“How do you prevent developers (and AI agents) from accidentally committing secrets to version control or writing them to unencrypted config files?”

Before you build any detection, understand this: Secret leakage is one of the most common security incidents. Attackers scan public repositories for AWS keys, database passwords, and API tokens. Once a secret is committed to git history, it’s permanently exposed—even if you delete it in a later commit.

Your hook acts as a safety net that catches secrets before they reach git, config files, or logs.


Concepts You Must Understand First

Stop and research these before coding:

  1. Secret Detection Patterns (Entropy, Regex, Signatures)
    • What is Shannon entropy and how is it used to detect random strings (API keys)? (A minimal sketch follows this list.)
    • How do you write regex patterns for AWS keys (AKIA…), GitHub tokens (ghp_…), etc.?
    • What are false positives (detecting “password” in code comments) and how do you reduce them?
    • Book Reference: “Practical Cryptography” by Niels Ferguson - Ch. 2 (Randomness)
  2. Git Internals and Hooks
    • What is the difference between PostToolUse hooks (Kiro) vs. pre-commit hooks (Git)?
    • How do you scan only the files modified in the last operation (git diff --name-only)?
    • Why can’t you just delete a secret from git history (it’s still in reflog and old commits)?
    • Book Reference: “Pro Git” by Scott Chacon - Ch. 10 (Git Internals)
  3. Secrets Management Best Practices
    • What is the principle of least privilege (why you shouldn’t use root credentials)?
    • How do environment variables (os.environ) protect secrets from being committed?
    • What are secrets managers (AWS Secrets Manager, Vault, 1Password) and when should you use them?
    • Book Reference: “Security Engineering” by Ross Anderson - Ch. 4 (Cryptographic Protocols)
  4. TruffleHog and Gitleaks Internals
    • How does TruffleHog scan git history for high-entropy strings?
    • What is the difference between regex-based detection and entropy-based detection?
    • How do you configure custom patterns (YAML rules for company-specific secrets)?
    • Book Reference: “Practical Malware Analysis” by Sikorski - Ch. 13 (Automated Analysis)
  5. False Positive Reduction
    • How do you distinguish between real secrets and test/example credentials?
    • What is allowlisting (explicitly marking known-safe strings)?
    • How do you handle encrypted secrets (ansible-vault, sops) vs. plaintext?
    • Book Reference: “Building Secure and Reliable Systems” by Google - Ch. 14 (Security Monitoring)
  6. Incident Response for Leaked Secrets
    • What do you do if a secret is detected after commit (rotate immediately)?
    • How do you scan the entire git history for secrets (git log -p)?
    • What is the MITRE ATT&CK framework for credential access (T1552)?
    • Book Reference: “The Art of Memory Forensics” by Ligh - Ch. 8 (Malware Analysis)
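
A minimal sketch of the entropy idea from concept 1 above: estimate Shannon entropy per candidate token and flag only strings that are both long and close to random (the 4.5 bits-per-character threshold is a common starting point, not a standard):

import math
import re

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, estimated from the string's own distribution."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def high_entropy_tokens(text: str, min_len: int = 20, threshold: float = 4.5):
    """Yield tokens that look like random secrets rather than ordinary identifiers."""
    for token in re.findall(r"[A-Za-z0-9+/_\-]{%d,}" % min_len, text):
        if shannon_entropy(token) >= threshold:
            yield token

line = "AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
print(list(high_entropy_tokens(line)))   # flags the 40-char key, not the variable name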

Questions to Guide Your Design

Before implementing, think through these:

  1. Detection Strategy
    • Should you scan all files or only modified files (performance trade-off)?
    • Do you run detection on every tool use or only on file writes (Edit, Write)?
    • Should you scan file content or git diffs (diffs are faster but may miss secrets)?
    • How do you handle binary files (images, PDFs) that might contain secrets?
  2. Pattern Library
    • Which secret types are highest priority (AWS, GitHub, Stripe, OpenAI)?
    • Should you use a pre-built pattern library (Gitleaks rules) or custom regex?
    • How do you detect generic secrets (40+ char random strings) vs. specific formats?
    • Should you detect passwords in URLs (https://user:pass@example.com)?
  3. Blocking vs. Warning
    • Should the hook block the operation (exit code 1) or just warn and continue?
    • Do you block on all secrets or only high-confidence detections?
    • Should you allow users to override the block (confirmation prompt)?
    • How do you handle secrets in test fixtures (tests/fixtures/dummy-key.txt)?
  4. User Experience
    • How do you show which line contains the secret without displaying the secret itself?
    • Should you suggest remediation steps (use environment variables, add to .gitignore)?
    • Do you integrate with the terminal (red error messages, visual alerts)?
    • Should you log detections to a file for security auditing?
  5. Performance Optimization
    • How do you avoid scanning the same file multiple times in one session?
    • Should you cache detection results (file hash → scan result)?
    • Do you run scans in parallel (ThreadPoolExecutor) for large codebases?
    • How do you handle large files (> 1MB) that slow down scanning?

Thinking Exercise

Analyze a Real Secret Leak Scenario

Before coding, manually trace how a secret might leak through Kiro’s workflow:

Scenario: Developer asks Kiro to deploy to AWS

# User prompt
$ kiro "deploy the app to AWS using my credentials"

# Kiro (without the hook) might write:
# deploy.sh
#!/bin/bash
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws s3 sync ./build s3://my-bucket

Where secrets could leak:

  1. deploy.sh file (hardcoded credentials)
  2. Git history (if deploy.sh is committed)
  3. Shell history (~/.bash_history if script is run manually)
  4. Logs (if AWS CLI logs credentials in error messages)
  5. CI/CD logs (GitHub Actions logs show environment variables)

Your hook’s detection points:

# PostToolUse hook fires after Write tool creates deploy.sh
event = {
    'tool': 'Write',
    'input': {'file_path': 'deploy.sh', 'content': '#!/bin/bash\nexport AWS_ACCESS_KEY_ID=AKIA...'},
    'output': {'status': 'success'}
}

# Hook scans the written file
findings = scan_file('deploy.sh')
# Finding 1: AWS Access Key (line 2, pattern: AKIA[A-Z0-9]{16})
# Finding 2: AWS Secret Key (line 3, high entropy: 40 random chars)

# Hook blocks the operation
exit(1)  # Reverts the file write

Questions while tracing:

  • At what point in the workflow should the hook scan for secrets?
  • Should you scan the file content or the tool input parameters?
  • What happens if the secret was copied from the user’s prompt (user provided it)?
  • How do you prevent false positives (AKIAIOSFODNN7EXAMPLE is a documented example key)?
  • Should you automatically suggest export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" as a fix?

Manual test:

# 1. Create a test file with a fake secret
echo "API_KEY=sk_test_4eC39HqLyjWDarjtT1zdp7dc" > .env

# 2. Run Gitleaks on the file
gitleaks detect --source . --verbose
# Leak detected: Generic API Key (line 1)

# 3. Allowlist the finding and re-run
#    (.gitleaksignore takes finding fingerprints, not raw secrets: copy the
#     Fingerprint value from the verbose output into .gitleaksignore)
gitleaks detect --source . --verbose
# No leaks detected (allowlisted)

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How do secret detection tools like TruffleHog distinguish between real API keys and random strings in the code?”

  2. “A developer committed an AWS access key to git 50 commits ago. What steps would you take to remediate this incident?”

  3. “What is the difference between entropy-based secret detection and regex-based detection? When would you use each?”

  4. “How would you handle false positives, such as detecting ‘password’ in code comments or test data?”

  5. “Why is deleting a secret from the latest commit insufficient to secure the repository?”

  6. “What are the performance trade-offs between scanning on every file write vs. scanning only on git commit?”


Hints in Layers

Hint 1: Start with Pre-Built Tools Don’t write regex patterns from scratch. Use TruffleHog or Gitleaks, which have hundreds of pre-built patterns for common secret types (AWS, GitHub, Stripe, etc.).

Hint 2: Hook Event Structure The PostToolUse hook receives a JSON event on stdin:

{
  "hookType": "PostToolUse",
  "tool": {"name": "Write", "input": {"file_path": "config.yaml", "content": "..."}, "output": {"status": "success"}}
}

Extract file_path and scan it.
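
A minimal hook sketch built on that event shape (the exact schema, and gitleaks being on PATH, are assumptions; adjust the field names to whatever your Kiro version actually emits):

#!/usr/bin/env python3
"""PostToolUse hook sketch: block file writes that introduce secrets."""
import json
import os
import subprocess
import sys

LOG = os.path.expanduser("~/.kiro/secret-findings.log")

def main() -> int:
    event = json.load(sys.stdin)                      # event shape as shown above
    tool = event.get("tool", {})
    if tool.get("name") not in ("Write", "Edit"):
        return 0                                      # only scan file-writing tools
    path = tool.get("input", {}).get("file_path")
    if not path or not os.path.exists(path):
        return 0

    # gitleaks exits 0 when clean and 1 when it finds leaks; --no-git scans
    # the path as plain files instead of walking git history.
    scan = subprocess.run(
        ["gitleaks", "detect", "--source", path, "--no-git", "--verbose"],
        capture_output=True, text=True,
    )
    if scan.returncode == 0:
        return 0                                      # clean: allow the operation

    os.makedirs(os.path.dirname(LOG), exist_ok=True)
    with open(LOG, "a") as log:                       # keep the details out of the terminal
        log.write(scan.stdout + scan.stderr)
    print(f"Possible secret detected in {path}; see {LOG}", file=sys.stderr)
    return 1                                          # non-zero exit blocks the operation

if __name__ == "__main__":
    sys.exit(main())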

Hint 3: Integrate Gitleaks for Fast Scanning

# Install Gitleaks
brew install gitleaks  # macOS
# or download binary from https://github.com/gitleaks/gitleaks/releases

# Scan a single file
gitleaks detect --source /path/to/file --verbose --no-git

# Parse JSON output
gitleaks detect --source . --report-format json --report-path results.json

Hint 4: Exit Code Semantics

  • Exit 0: No secrets found (allow operation)
  • Exit 1: Secrets found (block operation and revert changes)
  • Use sys.exit(1) in Python or exit 1 in Bash

Books That Will Help

Topic Book Chapter
Secret Detection Theory “Practical Cryptography” by Ferguson Ch. 2 (Randomness)
Git Internals “Pro Git” by Scott Chacon Ch. 10 (Git Internals)
Secrets Management “Security Engineering” by Ross Anderson Ch. 4 (Cryptographic Protocols)
Entropy Analysis “Applied Cryptography” by Bruce Schneier Ch. 17 (Randomness)
Incident Response “The Art of Memory Forensics” by Ligh Ch. 8 (Malware Analysis)
Secure Systems “Building Secure and Reliable Systems” by Google Ch. 14 (Security Monitoring)

Common Pitfalls & Debugging

Problem 1: “Too many false positives (detecting ‘password’ in comments)”

  • Why: Regex patterns are too broad and match non-secrets.
  • Fix: Use entropy analysis (only flag strings with high randomness) or context-aware patterns (exclude comments).
  • Quick test: gitleaks detect --source . --verbose | grep "password" (review all matches)

Problem 2: “Secrets in git history not detected by the hook”

  • Why: The hook only scans new changes, not the entire git history.
  • Fix: Run a one-time full repo scan: gitleaks detect --source . --verbose
  • Quick test: git log -p | grep -E 'AKIA[A-Z0-9]{16}' (manual search for AWS keys)

Problem 3: “Hook blocks valid test fixtures (tests/fixtures/dummy-key.txt)”

  • Why: Test data often includes fake secrets for testing.
  • Fix: Allowlist the fixture paths in your gitleaks config (an [allowlist] paths entry in .gitleaks.toml) or record the finding fingerprints in .gitleaksignore.
  • Quick test: Add the allowlist entry and re-run the scan; fixture files should no longer be reported.

Problem 4: “Performance degradation on large repos (> 1000 files)”

  • Why: Scanning every file on every tool use is too slow.
  • Fix: Only scan files modified in the last operation (use git diff --name-only).
  • Quick test: time gitleaks detect --source . (measure scan time before/after optimization)

Problem 5: “Secrets in environment variables not detected”

  • Why: The hook scans files, but secrets might be passed via export VAR=secret in the shell.
  • Fix: Scan shell history (~/.bash_history) or intercept the Bash tool’s input.
  • Quick test: grep -E 'export.*KEY' ~/.bash_history

Problem 6: “No notification when secret is blocked”

  • Why: The hook exits with code 1, but Kiro doesn’t show the hook’s stderr output.
  • Fix: Write findings to a log file (~/.kiro/secret-findings.log) and show the path in the error.
  • Quick test: tail -f ~/.kiro/secret-findings.log (monitor detections in real-time)

Definition of Done

  • Hook scans all files written or modified by Kiro (Write, Edit tools)
  • Detects common secret types (AWS keys, GitHub tokens, API keys, database URLs)
  • Blocks operation when secrets are found (exit code 1)
  • Shows which line contains the secret without displaying the secret itself
  • False positive rate < 5% (most detections are real secrets)
  • Performance: scan completes in < 2 seconds for typical file changes
  • Integrates with Gitleaks or TruffleHog for pattern matching
  • Provides remediation suggestions (use environment variables, add to .gitignore)
  • Logs all detections to ~/.kiro/secret-findings.log for security auditing
  • Supports allowlisting (.gitleaksignore or custom allowlist file)
  • Tested with dummy secrets (fake AWS keys, test database URLs) to verify detection
  • Documentation explains how to override blocks for legitimate use cases

Project 25: “The Tangent Explorer” — Context Management

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language N/A (Interaction)
Coolness Level Level 2: Practical
Difficulty Level 1: Beginner
Knowledge Area Context Management

What you’ll build: Simulate a debugging session, switch to a tangent, then return without polluting context.

Why it teaches Context Hygiene: Tangents keep the main thread clean.

Success criteria:

  • The main context summary ignores tangent content.

Real World Outcome

You’ll learn to use Kiro’s tangent mode to isolate exploratory work, debugging, and side investigations without polluting your main conversation context. This keeps your primary task focused while allowing deep dives into related issues.

Example Workflow:

# Main conversation: Implementing user authentication
$ kiro "implement JWT-based user authentication for the API"

[Kiro starts implementing auth...]

# You notice a dependency version conflict (tangent opportunity)
$ kiro "I'm seeing a dependency conflict with jsonwebtoken. Let me investigate in a tangent."

# Kiro creates a tangent session
[TANGENT MODE: dependency-investigation]

$ kiro "show me all versions of jsonwebtoken in package-lock.json and explain the conflict"
[Kiro analyzes dependencies in tangent]

$ kiro "what's the difference between jsonwebtoken 8.x and 9.x?"
[Kiro researches in tangent]

$ kiro "update to jsonwebtoken 9.5.2 and verify tests pass"
[Kiro fixes the issue in tangent]

# Exit tangent and return to main task
$ kiro "tangent resolved, return to main"

[MAIN CONTEXT: implementing JWT-based user authentication]
[Tangent summary: Fixed jsonwebtoken version conflict -> upgraded to 9.5.2]

# Main context continues, enriched but not polluted
$ kiro "continue implementing auth with the updated jsonwebtoken library"

What you’ll see in the session history:

Main Context (auth-implementation):
├─ User: implement JWT-based auth
├─ Kiro: [implementation steps]
├─ [TANGENT: dependency-investigation]
│  └─ Summary: Upgraded jsonwebtoken 8.x → 9.5.2, tests passing
└─ Kiro: [continues with auth using new library]

Tangent Context (dependency-investigation - ISOLATED):
├─ User: show me all versions of jsonwebtoken
├─ Kiro: [detailed analysis of package-lock.json]
├─ User: what's the difference between 8.x and 9.x?
├─ Kiro: [research on breaking changes]
├─ User: update to 9.5.2 and verify tests
└─ Kiro: [upgrade + test verification]

Key benefit: The main context remains focused on “implement auth” and doesn’t include 15 messages about npm dependency resolution. Only the summary (“upgraded jsonwebtoken → 9.5.2”) is surfaced.


The Core Question You’re Answering

“How do you keep your primary conversation focused while exploring tangential issues, debugging edge cases, or researching related topics without losing all your context to noise?”

Before you use tangent mode, understand this: LLM context windows are finite (even at 200K tokens). Every message consumes context budget. If you debug a webpack config issue (20 messages) in the middle of implementing a feature (30 messages), the combined 50-message thread is hard to summarize, and focus on the original goal erodes.

Tangent mode isolates exploratory work, keeping the main thread clean and summarizable.
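
A back-of-the-envelope illustration of that budget argument, in Python (the 350-tokens-per-message figure is an assumption, not a measurement):

AVG_TOKENS_PER_MESSAGE = 350          # assumed average for one prompt/response pair

def context_cost(messages: int) -> int:
    return messages * AVG_TOKENS_PER_MESSAGE

feature_msgs, debug_msgs, summary_msgs = 30, 20, 1

print("in-line debugging :", context_cost(feature_msgs + debug_msgs), "tokens in the main thread")
print("tangent + summary :", context_cost(feature_msgs + summary_msgs), "tokens in the main thread")
# in-line debugging : 17500 tokens in the main thread
# tangent + summary : 10850 tokens in the main thread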


Concepts You Must Understand First

Stop and research these before using tangents:

  1. Context Window and Token Budgets
    • What is the context window size for Claude (200K tokens)?
    • How many tokens does a typical message consume (prompt + response)?
    • What happens when context fills up (automatic summarization vs. truncation)?
    • Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 2 (Data Models)
  2. Conversation Threading and Branching
    • How do chat systems maintain conversation history (linear vs. tree structure)?
    • What is the difference between forking a conversation and creating a tangent?
    • How do you merge insights from a tangent back into the main thread?
    • Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 7 (Debugging)
  3. Summarization and Information Compression
    • How do LLMs summarize long conversations (extractive vs. abstractive summarization)?
    • What information is preserved vs. discarded in a summary?
    • How do you ensure critical decisions from tangents are surfaced in summaries?
    • Book Reference: “Speech and Language Processing” by Jurafsky - Ch. 23 (Summarization)
  4. Context Hygiene Best Practices
    • When should you start a tangent vs. continue in the main thread?
    • How do you name tangents descriptively (for future reference)?
    • What is the “two-level rule” (main + tangent, avoid nested tangents)?
    • Book Reference: “Working Effectively with Legacy Code” by Feathers - Ch. 6 (Code Smells)
  5. Cognitive Load Management
    • What is working memory capacity (7 ± 2 items) and how does it apply to chat?
    • How does context switching (main → tangent → main) affect productivity?
    • What is the “one primary task” principle (single focus for the main thread)?
    • Book Reference: “Thinking, Fast and Slow” by Daniel Kahneman - Ch. 8 (Cognitive Effort)

Questions to Guide Your Design

Before starting a tangent, ask yourself:

  1. Tangent Trigger Criteria
    • Is this issue critical to the main task, or can it be deferred?
    • Will exploring this now derail the main conversation for > 5 messages?
    • Is this a debugging session that might involve trial-and-error (many iterations)?
    • Would solving this in-line make the main thread hard to read later?
  2. Tangent Scope
    • What specific question am I trying to answer in the tangent?
    • What does “done” look like (clear exit criteria)?
    • How will I bring the result back to the main thread (summary format)?
    • What happens if the tangent doesn’t resolve the issue (abandon or continue)?
  3. Context Preservation
    • What information from the main thread does the tangent need (dependencies)?
    • Should the tangent have full access to the codebase, or limited scope?
    • How do I avoid repeating context setup in the tangent (cached state)?
    • What happens if I start another tangent while in a tangent (nesting limits)?
  4. Return Strategy
    • How do I signal “return to main” (explicit command or automatic)?
    • What summary information should be surfaced (key decisions, outcomes)?
    • Do I need to re-state the main task when returning (context refresh)?
    • Should the tangent remain accessible for future reference?
  5. Workflow Integration
    • Can I start a tangent mid-operation (while Kiro is running a command)?
    • How do I handle tangents in team settings (shared context)?
    • Should tangents be saved in session history (for reproducibility)?
    • Can I export a tangent as a standalone session (for sharing)?

Thinking Exercise

Trace a Context Pollution Scenario

Before using tangent mode, manually trace how context gets polluted without isolation:

Scenario: Implementing API authentication + debugging CORS issue

WITHOUT TANGENT MODE (polluted context):

Message 1 [Main]: Implement JWT auth for /api/login
Message 2 [Main]: [Kiro implements auth endpoint]
Message 3 [Main]: Test the endpoint with curl
Message 4 [Main]: [Kiro runs curl, gets CORS error]

Message 5 [Detour]: Why am I getting CORS errors?
Message 6 [Detour]: [Kiro explains CORS preflight]
Message 7 [Detour]: Show me the Express CORS config
Message 8 [Detour]: [Kiro shows current config]
Message 9 [Detour]: Update CORS to allow http://localhost:3000
Message 10 [Detour]: [Kiro updates config]
Message 11 [Detour]: Test again with curl
Message 12 [Detour]: [Kiro runs curl, still fails]
Message 13 [Detour]: Check if OPTIONS requests work
Message 14 [Detour]: [Kiro tests OPTIONS]
Message 15 [Detour]: Add Access-Control-Allow-Credentials header
Message 16 [Detour]: [Kiro adds header]
Message 17 [Detour]: Test one more time
Message 18 [Detour]: [Kiro tests, success!]

Message 19 [Main]: OK, now continue with JWT auth
Message 20 [Main]: [Kiro has to re-read context, slowed by CORS noise]

# Problem: Messages 5-18 (14 messages) about CORS pollute the auth implementation context
# Summary would include both auth AND CORS details, losing focus

WITH TANGENT MODE (clean context):

Main Context:
Message 1: Implement JWT auth for /api/login
Message 2: [Kiro implements auth endpoint]
Message 3: Test the endpoint
Message 4: [Kiro runs curl, gets CORS error]
Message 5: [TANGENT: cors-debugging] Fixing CORS issue...
Message 6: [Tangent resolved: Added Access-Control-Allow-Credentials header]
Message 7: Test the fixed endpoint
Message 8: [Kiro tests, success! Continues with auth]

Tangent Context (cors-debugging - ISOLATED):
Message 1: Why am I getting CORS errors?
Message 2-14: [Full CORS debugging session]
Message 15: [Exit: CORS fixed]

# Main context: 8 messages (focused on auth)
# Tangent context: 15 messages (isolated CORS debugging)
# Main summary: "Implemented JWT auth, fixed CORS in tangent"

Questions while tracing:

  • At what point should you start a tangent (message 5, when you realize CORS is a side issue)?
  • What information from the tangent is essential for the main context (just the fix, not the debugging process)?
  • How does the main context “remember” what it was doing before the tangent?
  • What would happen if you needed to debug CORS again later (can you re-open the tangent)?
  • How would summarization differ (main context summary vs. full conversation summary)?

Manual test:

# 1. Start a main task
$ kiro "refactor the user service to use TypeScript"

# 2. Notice a side issue (linting error)
# Instead of debugging in-line, start a tangent
$ kiro "start tangent: fix-eslint-config"

# 3. Work in the tangent
$ kiro "why is ESLint complaining about implicit any?"
$ kiro "update tsconfig.json to enable noImplicitAny"

# 4. Exit and return to main
$ kiro "return to main"

# 5. Verify main context is clean
$ kiro "what were we working on?"
# Response: "We're refactoring the user service to TypeScript" (no ESLint details)

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How do you manage context in a long-running AI agent conversation to prevent focus drift?”

  2. “What is the difference between creating a new chat session and using a tangent within the same session?”

  3. “A developer wants to explore three different implementation approaches for a feature. How would you structure that workflow to keep context clean?”

  4. “How does context summarization work in LLMs, and what information is typically lost during summarization?”

  5. “When would you NOT use a tangent (i.e., when is in-line exploration better)?”

  6. “How do you balance context preservation (keeping everything) with context efficiency (keeping it focused)?”


Hints in Layers

Hint 1: Start Simple Use tangents for clear, isolated sub-tasks: debugging a dependency issue, researching an error message, exploring alternative approaches. Avoid tangents for tasks that are core to the main goal.

Hint 2: Name Tangents Descriptively Use short, descriptive names (e.g., “cors-debugging”, “dependency-conflict”, “api-error-research”). This helps when reviewing session history later.

Hint 3: Explicit Entry and Exit Always explicitly enter and exit tangents with commands like:

kiro "start tangent: investigate-webpack-error"
# [work in tangent]
kiro "return to main"

Hint 4: One-Sentence Summaries When exiting a tangent, summarize the outcome in one sentence:

kiro "tangent resolved: upgraded webpack 4→5, fixed config syntax"

This summary is what gets preserved in the main context.


Books That Will Help

Topic Book Chapter
Data Models “Designing Data-Intensive Applications” by Kleppmann Ch. 2 (Data Models)
Debugging Strategies “The Pragmatic Programmer” by Hunt & Thomas Ch. 7 (Debugging)
Summarization “Speech and Language Processing” by Jurafsky Ch. 23 (Summarization)
Cognitive Load “Thinking, Fast and Slow” by Kahneman Ch. 8 (Cognitive Effort)
Working Memory “The Design of Everyday Things” by Norman Ch. 3 (Knowledge in the Head)
Context Management “Working Effectively with Legacy Code” by Feathers Ch. 6 (Code Smells)

Common Pitfalls & Debugging

Problem 1: “I forgot to exit the tangent and continued working in tangent mode”

  • Why: No explicit “return to main” command was issued.
  • Fix: Always end tangents with kiro "return to main" or set up an auto-exit after N messages.
  • Quick test: Check the current mode: kiro "what mode are we in?" (should say “main” or “tangent”)

Problem 2: “Tangent summary is too vague (‘fixed the issue’)”

  • Why: The summary doesn’t capture what was actually done.
  • Fix: Write a specific summary: “upgraded jsonwebtoken 8.5.1 → 9.5.2, tests passing” not just “fixed dependency”.
  • Quick test: Read the summary and ask “Could I reproduce this fix from the summary alone?”

Problem 3: “Main context lost track of what we were doing”

  • Why: The tangent took too long (> 30 messages), and summarization lost key details.
  • Fix: Keep tangents short (< 20 messages). For complex issues, create a new top-level session instead.
  • Quick test: After exiting tangent, ask kiro "what were we working on?" (should recall main task)

Problem 4: “I need information from the tangent in the main context”

  • Why: The tangent summary was too brief.
  • Fix: Reference the tangent explicitly: kiro "pull the CORS config from the cors-debugging tangent".
  • Quick test: Tangents should remain accessible (as sub-sessions) even after exit.

Problem 5: “Nested tangents (tangent within a tangent)”

  • Why: You started a new tangent while already in tangent mode.
  • Fix: Avoid nesting. Exit the current tangent first, then start a new one from main.
  • Quick test: kiro "are we in a nested tangent?" (tool should warn against nesting)

Problem 6: “Tangent mode not available in my Kiro version”

  • Why: Tangent mode is a newer feature (or not yet implemented in your CLI).
  • Fix: Use workaround: start a new session with kiro --session tangent-name, exit and return to main session.
  • Quick test: kiro --version (check if tangent mode is supported)

Definition of Done

  • Successfully started a tangent session for a side investigation
  • Completed the tangent task (resolved the issue or answered the question)
  • Exited the tangent with a clear, one-sentence summary
  • Main context does NOT contain tangent messages (only the summary)
  • Main context correctly resumed the primary task after tangent exit
  • Tangent session is accessible for future reference (can re-open if needed)
  • Verified that summarization of the main context excludes tangent details
  • No nested tangents (stayed within the “two-level rule”: main + one tangent)
  • Named the tangent descriptively (easy to identify in session history)
  • Main task completed successfully without context pollution

Project 26: “The Checkpoint Time Machine” — Safety Systems

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Git / Kiro
Coolness Level Level 3: Genuinely Clever
Difficulty Level 2: Intermediate
Knowledge Area Safety Systems

What you’ll build: Snapshot, perform risky edits, and restore instantly.

Why it teaches Fearless Coding: You can let the agent be aggressive without fear.

Success criteria:

  • A restore returns the repo to a known good state.

Real World Outcome

You’ll create a checkpoint system that snapshots your entire codebase before Kiro makes risky changes, allowing you to instantly roll back if things go wrong. This enables fearless experimentation with aggressive refactoring, large-scale changes, and exploratory edits.

Example Workflow:

# Before risky refactoring
$ kiro checkpoint create "before-refactor-user-service"
✓ Checkpoint created: cp_20240115_143022_before-refactor-user-service
  Files: 156 tracked, 12 untracked
  Git ref: refs/checkpoints/cp_20240115_143022

# Let Kiro make aggressive changes
$ kiro "refactor the entire user service to use dependency injection"

[Kiro makes extensive changes across 23 files...]
[Tests start failing...]

# Oh no, restore the checkpoint
$ kiro checkpoint restore cp_20240115_143022_before-refactor-user-service

Restoring checkpoint: cp_20240115_143022_before-refactor-user-service
✓ Git worktree reset to refs/checkpoints/cp_20240115_143022
✓ Untracked files restored from .kiro/checkpoints/cp_20240115_143022/
✓ 23 files reverted, 12 untracked files restored
  Took 0.8s

# Codebase is back to exactly how it was before
$ git status
# On branch main
# nothing to commit, working tree clean

Checkpoint workflow:

# 1. Create checkpoint before experiment
$ kiro checkpoint create "experiment-async-await"

# 2. Try risky changes
$ kiro "convert all promise chains to async/await"
$ npm test
# Tests fail...

# 3. Restore (instant rollback)
$ kiro checkpoint restore experiment-async-await

# 4. List all checkpoints
$ kiro checkpoint list
cp_20240115_143022_before-refactor-user-service
cp_20240115_150431_experiment-async-await
cp_20240115_152103_before-schema-migration

# 5. Delete old checkpoints
$ kiro checkpoint delete cp_20240115_143022_before-refactor-user-service

What makes this different from git stash or git commit:

  • Speed: checkpoint is instant (<1s); git stash is fast (~2s); git commit is slow (>5s with hooks)
  • Untracked files: checkpoint ✓ included; git stash ✗ not included; git commit ✗ not included
  • Easy restore: checkpoint restore vs. stash apply (conflicts) vs. git reset --hard (dangerous)
  • Named: checkpoint ✓ descriptive name; git stash ✗ “WIP on main”; git commit ✓ commit message
  • No git history pollution: checkpoint ✓ clean; git stash ✓ clean; git commit ✗ creates commits

You’re building the same safety net that professional game developers use (“checkpoint before boss fight”).
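
A minimal sketch of what checkpoint create and checkpoint restore might do under the hood, assuming the git-ref-plus-tarball approach walked through in the Thinking Exercise below (the .kiro/checkpoints/ location and naming scheme are illustrative):

import subprocess
import time
from pathlib import Path

CHECKPOINT_DIR = Path(".kiro/checkpoints")

def git(*args: str) -> str:
    """Run a git command and return its stdout."""
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout

def create(description: str) -> str:
    name = f"cp_{time.strftime('%Y%m%d_%H%M%S')}_{description}"
    CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
    # Tracked state: a lightweight ref pointing at the current commit.
    git("update-ref", f"refs/checkpoints/{name}", "HEAD")
    # Untracked state: archive whatever git is not tracking (ignored files and
    # previous checkpoint archives excluded).
    untracked = [p for p in git("ls-files", "--others", "--exclude-standard").splitlines()
                 if p and not p.startswith(str(CHECKPOINT_DIR))]
    archive = CHECKPOINT_DIR / f"{name}_untracked.tar.gz"
    cmd = ["tar", "-czf", str(archive)]
    cmd += untracked if untracked else ["--files-from", "/dev/null"]  # empty archive if nothing untracked
    subprocess.run(cmd, check=True)
    return name

def restore(name: str) -> None:
    # 1. Put tracked files back exactly as they were at checkpoint time.
    git("reset", "--hard", f"refs/checkpoints/{name}")
    # 2. Drop untracked files created after the checkpoint, keeping the archives.
    git("clean", "-fd", "-e", ".kiro")
    # 3. Re-create the untracked files that existed at checkpoint time.
    subprocess.run(["tar", "-xzf", str(CHECKPOINT_DIR / f"{name}_untracked.tar.gz")], check=True)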


The Core Question You’re Answering

“How do you experiment fearlessly with an AI agent making large-scale code changes when you know you might need to revert everything instantly?”

Before you build any checkpoint system, understand this: The fear of breaking the codebase is the #1 reason developers are conservative with AI agents. If you can’t easily undo changes, you’ll never let the agent be truly creative or aggressive.

Checkpoints remove that fear. You can say “try a radical refactoring” knowing you have a one-command rollback.


Concepts You Must Understand First

Stop and research these before coding:

  1. Git Worktree and Index
    • What is the difference between the working tree, the index (staging area), and HEAD?
    • How does git reset --hard work (and why is it dangerous)?
    • What are git refs (references like refs/heads/main, refs/tags/v1.0)?
    • Book Reference: “Pro Git” by Scott Chacon - Ch. 10 (Git Internals)
  2. Untracked Files and .gitignore
    • Why don’t git stash and git commit save untracked files?
    • How do you capture untracked files (tar, rsync, cp -r)?
    • What happens to .gitignored files during checkpoint/restore?
    • Book Reference: “Version Control with Git” by Loeliger - Ch. 5 (Working Trees)
  3. Atomic Operations and Race Conditions
    • How do you ensure checkpoint creation is atomic (all-or-nothing)?
    • What happens if Kiro creates a checkpoint while you’re editing files?
    • How do you handle concurrent checkpoint operations (locking)?
    • Book Reference: “The Linux Programming Interface” by Kerrisk - Ch. 30 (File Locking)
  4. Filesystem Snapshots vs. Git-Based Snapshots
    • What are filesystem snapshots (ZFS snapshots, Btrfs snapshots, LVM snapshots)?
    • Why use git refs instead of full filesystem snapshots (storage efficiency)?
    • How do you handle binary files and large assets (Git LFS)?
    • Book Reference: “Understanding the Linux Kernel” by Bovet - Ch. 16 (File Systems)
  5. Restoration Strategies (Full vs. Partial)
    • Should you restore the entire working tree or just specific files?
    • How do you handle merge conflicts during restore (overwrite vs. merge)?
    • What happens to local changes made after checkpoint creation?
    • Book Reference: “Git Pocket Guide” by Silverman - Ch. 3 (Undoing Changes)
  6. Checkpoint Lifecycle Management
    • When should checkpoints be automatically cleaned up (after N days, after restore)?
    • How much disk space do checkpoints consume (ref + untracked files)?
    • Should checkpoints survive git clean -fdx (store outside .git/)?
    • Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 3 (Storage)

Questions to Guide Your Design

Before implementing, think through these:

  1. Snapshot Granularity
    • Should checkpoints capture the entire repo or just specific directories?
    • Do you include node_modules, build artifacts, and .env files?
    • Should checkpoints be branch-specific or global across all branches?
    • How do you handle submodules and nested git repositories?
  2. Storage Strategy
    • Where do you store checkpoints (.kiro/checkpoints/, .git/refs/checkpoints/)?
    • Do you use git refs (lightweight) or full tar archives (heavyweight)?
    • How do you compress untracked files (tar.gz, zip, rsync)?
    • Should checkpoints be shareable across team members (git remote)?
  3. Restoration Safety
    • Should restore require confirmation (interactive prompt)?
    • Do you warn if there are uncommitted changes before restore?
    • Should restore create a “pre-restore” checkpoint automatically?
    • How do you handle file deletions (restore deleted files)?
  4. Naming and Discovery
    • How do you auto-generate checkpoint names (timestamp + description)?
    • Should checkpoints be tagged with metadata (author, timestamp, git commit)?
    • How do you search for checkpoints (by name, date, commit range)?
    • Can you compare two checkpoints (diff cp1 cp2)?
  5. Integration with Kiro
    • Should Kiro automatically create checkpoints before risky operations?
    • Can Kiro suggest when to create a checkpoint (heuristic: changing > 10 files)?
    • Should restore be a recoverable operation (keep restore history)?
    • How do you visualize checkpoint history (timeline, tree view)?

Thinking Exercise

Trace a Checkpoint and Restore Cycle

Before coding, manually simulate creating and restoring a checkpoint using git:

Scenario: Risky refactoring with checkpoint safety net

# Initial state
$ git status
# On branch main
# nothing to commit, working tree clean

# Create checkpoint (manual simulation)
$ git update-ref refs/checkpoints/cp_001 HEAD
$ tar -czf .kiro/checkpoints/cp_001_untracked.tar.gz \
    $(git ls-files --others --exclude-standard)

# Checkpoint created (stored: git ref + untracked files archive)

# Risky changes
$ kiro "refactor all database queries to use TypeORM instead of raw SQL"
[Kiro modifies 30 files, creates 12 new files]

# Tests fail
$ npm test
# 23 tests failed

# Restore checkpoint (manual simulation)
$ git reset --hard refs/checkpoints/cp_001  # Reset tracked files
HEAD is now at <checkpoint commit>  (all tracked files reset to the checkpoint state)

$ git clean -fd  # Remove untracked files created after the checkpoint
$ tar -xzf .kiro/checkpoints/cp_001_untracked.tar.gz  # Restore the checkpoint's untracked files last

# Verify restoration
$ git status
# On branch main
# nothing to commit, working tree clean

$ npm test
# All tests passing (back to checkpoint state)

Questions while tracing:

  • What happens to files that existed at checkpoint time but were deleted during the risky changes?
  • How do you handle files that were modified both before and after checkpoint creation?
  • What if the user has uncommitted changes when they try to restore?
  • Should restore delete new files created after the checkpoint (git clean -fd)?
  • How do you restore files that are in .gitignore (they’re untracked but intentionally ignored)?

Edge cases to consider:

# Edge Case 1: Checkpoint with dirty working tree
$ git status
# On branch main
# Changes not staged for commit:
#   modified: src/app.ts

$ kiro checkpoint create "dirty-state"
# Should this be allowed? Or require a clean working tree?

# Edge Case 2: Restore with uncommitted changes
$ kiro checkpoint restore cp_001
# Warning: You have uncommitted changes. Restore will overwrite them.
# Continue? (y/N)

# Edge Case 3: Checkpoint on non-main branch
$ git checkout feature-branch
$ kiro checkpoint create "feature-experiment"
# Should this checkpoint be branch-specific or global?

# Edge Case 4: Restore after git commit
$ kiro checkpoint create cp_001
$ kiro "refactor code"
$ git add . && git commit -m "refactor"
$ kiro checkpoint restore cp_001
# Restores working tree, but leaves commit in history?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is the difference between git stash, git commit, and a custom checkpoint system for saving codebase state?”

  2. “How would you implement a checkpoint system that captures both tracked files (in git) and untracked files (not in git)?”

  3. “A developer creates a checkpoint, makes changes, commits those changes, then restores the checkpoint. What happens to the git commit history?”

  4. “What are the trade-offs between storing checkpoints as git refs vs. full tar archives of the working directory?”

  5. “How do you handle checkpoint restoration when there are merge conflicts (files modified both at checkpoint time and after)?”

  6. “Why might a checkpoint system fail to restore a repository to its exact prior state, even with all files backed up?”


Hints in Layers

Hint 1: Start with Git Refs Use git update-ref refs/checkpoints/<name> HEAD to create a lightweight git reference to the current commit. This is fast and doesn’t duplicate the entire repository.

Hint 2: Capture Untracked Files Separately Git refs only track committed files. Use git ls-files --others --exclude-standard to find untracked files and tar them:

tar -czf .kiro/checkpoints/<name>_untracked.tar.gz $(git ls-files --others --exclude-standard)

Hint 3: Restore in Three Steps First reset tracked files: git reset --hard refs/checkpoints/<name>. Then remove untracked files created after the checkpoint: git clean -fd. Finally restore the checkpoint's untracked files: tar -xzf .kiro/checkpoints/<name>_untracked.tar.gz (extracting last, so git clean doesn't delete them).

Hint 4: Add Safety Checks Before restore, check for uncommitted changes:

if ! git diff-index --quiet HEAD --; then
  echo "Warning: You have uncommitted changes."
  read -p "Continue? (y/N) " -n 1 -r
  echo
  if [[ ! $REPLY =~ ^[Yy]$ ]]; then
    exit 1  # Abort the restore
  fi
fi
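
Putting Hints 3 and 4 together, here is a minimal Python sketch of one possible restore flow (ref names and archive paths follow the examples above; error handling is left out):

import subprocess
import sys

def restore_checkpoint(name):
    # Refuse to clobber uncommitted work without confirmation (Hint 4)
    dirty = subprocess.run(["git", "diff-index", "--quiet", "HEAD", "--"]).returncode != 0
    if dirty and input("Uncommitted changes will be lost. Continue? (y/N) ").lower() != "y":
        sys.exit(1)

    # 1. Reset tracked files to the checkpoint ref
    subprocess.run(["git", "reset", "--hard", f"refs/checkpoints/{name}"], check=True)
    # 2. Remove untracked files created after the checkpoint
    subprocess.run(["git", "clean", "-fd"], check=True)
    # 3. Re-extract the checkpoint's untracked files last, so git clean doesn't delete them
    subprocess.run(["tar", "-xzf", f".kiro/checkpoints/{name}_untracked.tar.gz"], check=True)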

Books That Will Help

Topic Book Chapter
Git Internals “Pro Git” by Scott Chacon Ch. 10 (Git Internals)
Working Trees “Version Control with Git” by Loeliger Ch. 5 (Working Trees)
File Locking “The Linux Programming Interface” by Kerrisk Ch. 30 (File Locking)
Filesystems “Understanding the Linux Kernel” by Bovet Ch. 16 (File Systems)
Undoing Changes “Git Pocket Guide” by Silverman Ch. 3 (Undoing Changes)
Storage Engines “Designing Data-Intensive Applications” by Kleppmann Ch. 3 (Storage)

Common Pitfalls & Debugging

Problem 1: “Restore doesn’t bring back deleted files”

  • Why: git reset --hard only resets files tracked by git, not untracked files.
  • Fix: Also restore untracked files from the checkpoint’s tar archive.
  • Quick test: Create checkpoint, delete a file, restore, verify file exists.

Problem 2: “Checkpoint failed mid-creation (partial checkpoint)”

  • Why: tar command failed due to disk space or permission issues.
  • Fix: Make checkpoint creation atomic: write to temp location, then move.
  • Quick test: Fill disk during checkpoint creation, verify no partial checkpoints exist.
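
A minimal Python sketch of that temp-then-move fix, assuming the .kiro/checkpoints/ directory already exists (os.replace renames atomically on the same filesystem):

import os
import subprocess
import tempfile

def create_checkpoint(name):
    # 1. Lightweight ref for tracked files (Hint 1)
    subprocess.run(["git", "update-ref", f"refs/checkpoints/{name}", "HEAD"], check=True)

    # 2. Archive untracked files atomically: write to a temp file, then rename into place
    untracked = subprocess.run(
        ["git", "ls-files", "--others", "--exclude-standard"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    if not untracked:
        return  # nothing untracked to archive

    final_path = f".kiro/checkpoints/{name}_untracked.tar.gz"
    fd, tmp_path = tempfile.mkstemp(dir=".kiro/checkpoints", suffix=".tmp")
    os.close(fd)
    try:
        subprocess.run(["tar", "-czf", tmp_path] + untracked, check=True)
        os.replace(tmp_path, final_path)  # atomic rename: all-or-nothing
    except BaseException:
        os.unlink(tmp_path)               # never leave a partial checkpoint behind
        raise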

Problem 3: “Restore leaves new files created after checkpoint”

  • Why: Restore doesn’t delete files that didn’t exist at checkpoint time.
  • Fix: Run git clean -fd during restore, after the reset but before re-extracting the untracked-files archive (otherwise it would delete the files you just restored).
  • Quick test: Create checkpoint, add new file, restore, verify new file is gone.

Problem 4: “Checkpoints consume too much disk space (> 1GB each)”

  • Why: Untracked files include node_modules, build artifacts, or large assets.
  • Fix: Exclude large directories from checkpoint: tar --exclude='node_modules' ...
  • Quick test: du -sh .kiro/checkpoints/ (should be < 100MB per checkpoint)

Problem 5: “Restore fails with ‘ref not found’ error”

  • Why: The git ref was deleted (e.g., manual cleanup of refs/checkpoints/); git gc prunes unreferenced objects, not the refs themselves, so a missing ref usually means something removed it.
  • Fix: Also record the checkpoint's commit SHA in metadata outside .git/ so the ref can be recreated, or store checkpoints as tags.
  • Quick test: git for-each-ref refs/checkpoints/ (list all checkpoint refs)

Problem 6: “Checkpoints don’t survive git clean -fdx”

  • Why: The -x flag removes ignored files too, so adding .kiro/ to .gitignore does not protect .kiro/checkpoints/ from git clean -fdx.
  • Fix: Store checkpoint archives outside the working tree (e.g., under .git/kiro/checkpoints/ or in your home directory), or run git clean with -e .kiro/ to exclude them.
  • Quick test: git clean -fdx && ls the checkpoint directory (archives should still exist)

Definition of Done

  • Can create a checkpoint with a descriptive name
  • Checkpoint captures all tracked files (via git ref)
  • Checkpoint captures all untracked files (via tar archive)
  • Restore brings back the exact working tree state (all files match)
  • Restore removes files created after checkpoint (clean working tree)
  • Checkpoint creation is atomic (no partial checkpoints on failure)
  • Can list all checkpoints with timestamps and descriptions
  • Can delete old checkpoints to free disk space
  • Restore warns if there are uncommitted changes
  • Checkpoint system survives git clean -fdx
  • Tested with risky refactoring (create checkpoint, break code, restore successfully)
  • Documentation explains when to create checkpoints and how to restore

Project 27: “The Checklist Manager” — Task Management

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Markdown
Coolness Level Level 2: Practical
Difficulty Level 1: Beginner
Knowledge Area Task Management

What you’ll build: Use /todo to turn a brain dump into executable steps.

Why it teaches Structured Execution: It enforces a real work queue.

Success criteria:

  • Items are executed and checked off by Kiro.

Real World Outcome

You’ll use Kiro’s /todo feature to convert unstructured brain dumps into structured, executable task lists. Kiro will work through the list systematically, checking off completed tasks and reporting progress.

Example Workflow:

# Initial brain dump (unstructured)
$ kiro "I need to add user authentication to the API. Also fix the CORS issues. And we should add logging. Oh, and update the README."

[Kiro responds with a plan, but it's not tracked]

# Better: Use /todo for structured execution
$ kiro "/todo Create tasks for: add JWT auth, fix CORS, add Winston logging, update README"

✓ Created todo list:
[ ] 1. Implement JWT authentication for API endpoints
[ ] 2. Fix CORS configuration to allow frontend domain
[ ] 3. Add Winston logging with log rotation
[ ] 4. Update README with new auth flow and setup instructions

# Kiro works through the list
[Working on task 1...]
✓ 1. Implement JWT authentication for API endpoints (DONE)
   - Created /auth/login and /auth/register endpoints
   - Added JWT middleware for protected routes
   - Tests passing

[ ] 2. Fix CORS configuration to allow frontend domain
[ ] 3. Add Winston logging with log rotation
[ ] 4. Update README with new auth flow and setup instructions

# Continue with next task
[Working on task 2...]
✓ 2. Fix CORS configuration to allow frontend domain (DONE)
   - Updated Express CORS config to whitelist https://app.example.com
   - Added preflight OPTIONS handling

[...continues until all tasks complete...]

Final status:
✓ 1. Implement JWT authentication for API endpoints (DONE)
✓ 2. Fix CORS configuration to allow frontend domain (DONE)
✓ 3. Add Winston logging with log rotation (DONE)
✓ 4. Update README with new auth flow and setup instructions (DONE)

All tasks completed! 🎉

Visual progress tracking:

$ kiro "/todo status"

Progress: [████████░░] 80% (4/5 tasks complete)

✓ 1. Set up CI/CD pipeline (DONE)
✓ 2. Add unit tests for auth service (DONE)
✓ 3. Implement rate limiting (DONE)
✓ 4. Add API documentation with Swagger (DONE)
⏳ 5. Deploy to staging environment (IN PROGRESS)
   - Current step: Running database migrations...

Key benefits:

  • No tasks forgotten (explicit checklist)
  • Clear progress visibility (X of Y tasks done)
  • Prioritization (numbered order)
  • Context preservation (Kiro remembers what’s next)
  • Audit trail (what was done, when)

You’re using the same task management system that agile teams use (sprint backlogs, kanban boards).


The Core Question You’re Answering

“How do you ensure complex multi-step tasks are executed completely and in order, without forgetting steps or getting distracted by tangents?”

Before you use /todo, understand this: Human working memory holds 7 ± 2 items. When you ask Kiro to do 10 things in one prompt, some will be forgotten or deprioritized. A todo list externalizes the task queue, ensuring systematic execution.

The todo system transforms “do these 10 things” (vague) into “here are 10 explicit steps, execute in order” (structured).


Concepts You Must Understand First

Stop and research these before using /todo:

  1. Task Decomposition and Prioritization
    • How do you break down a large goal (“add authentication”) into atomic tasks?
    • What makes a task “atomic” (single responsibility, verifiable completion)?
    • How do you prioritize tasks (dependencies, critical path, quick wins)?
    • Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 2 (Orthogonality)
  2. Markdown Checklists and Syntax
    • What is the GitHub-flavored markdown syntax for checklists (- [ ] vs. - [x])?
    • How do you represent task states (pending, in-progress, done, blocked)?
    • Can you nest sub-tasks (hierarchical checklists)?
    • Book Reference: “The Markdown Guide” by Matt Cone - Ch. 4 (Extended Syntax)
  3. Workflow State Machines
    • What are the valid state transitions (pending → in-progress → done)?
    • Can you skip states (pending → done without in-progress)?
    • How do you handle task failures (done → failed → retry)?
    • Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 7 (Transactions)
  4. Context Preservation Across Tasks
    • How does Kiro remember task N while working on task N+1?
    • What happens if a task requires information from a previous task?
    • Should tasks be independent (stateless) or can they build on each other (stateful)?
    • Book Reference: “Clean Code” by Robert C. Martin - Ch. 3 (Functions)
  5. Interruption and Resumption
    • What happens if you interrupt Kiro mid-task (Ctrl+C)?
    • Can you resume from the last completed task?
    • Should incomplete tasks be marked as “blocked” or deleted?
    • Book Reference: “The Mythical Man-Month” by Frederick Brooks - Ch. 10 (Tracking Progress)
  6. Audit Trail and Accountability
    • How do you track who completed each task (user vs. Kiro)?
    • Should you log timestamps for task start and completion?
    • Can you export the completed checklist as a report?
    • Book Reference: “Accelerate” by Forsgren, Humble, Kim - Ch. 4 (Measurement)

Questions to Guide Your Design

Before creating a todo list, think through these:

  1. Task Granularity
    • Should tasks be small (< 5 minutes each) or large (> 1 hour)?
    • Do you group related tasks (all database tasks together)?
    • Should each task have acceptance criteria (how do you know it’s done)?
    • What’s the optimal number of tasks (3-5 vs. 20-30)?
  2. Ordering and Dependencies
    • Do tasks have dependencies (task B requires task A to complete first)?
    • Should Kiro enforce dependency order or just suggest it?
    • Can tasks be parallelized (run tests while fixing linting)?
    • How do you represent blocked tasks (waiting on external input)?
  3. State Management
    • Where is the todo list stored (.kiro/todos.md, in-memory, database)?
    • How does Kiro update task states (mark as done, add notes)?
    • Can you manually edit the todo list (add/remove tasks mid-execution)?
    • Should tasks persist across Kiro sessions (resume later)?
  4. Error Handling
    • What happens if a task fails (mark as failed, retry, skip)?
    • Should Kiro stop execution on first failure (fail-fast)?
    • How do you handle partial completion (task 50% done)?
    • Can you rollback completed tasks if a later task fails?
  5. Reporting and Visibility
    • How do you show progress (X of Y tasks done, percentage)?
    • Should Kiro report estimated time remaining (based on historical velocity)?
    • Can you generate a summary report (what was done, time taken)?
    • How do you visualize the task graph (dependencies, critical path)?

Thinking Exercise

Convert a Brain Dump to a Structured Todo List

Before using /todo, manually trace how you would decompose a vague request:

Scenario: “Improve the app’s performance”

Unstructured brain dump: “The app is slow. We should optimize the database queries, add caching, compress images, minify JavaScript, enable CDN, and maybe use a load balancer. Also fix memory leaks.”

Structured todo list (atomic, prioritized, verifiable):

## Performance Improvement Tasks

### Critical Path (do these first)
- [ ] 1. Profile the app to identify bottlenecks
      - Use Chrome DevTools Performance tab
      - Identify the slowest 3 operations
      - Expected outcome: Waterfall chart showing blocking operations

- [ ] 2. Optimize database queries (N+1 problem detected)
      - Add indexes to `users.email` and `posts.author_id`
      - Replace eager loading with joins
      - Expected outcome: Query time < 50ms (currently 2s)

- [ ] 3. Add Redis caching for user sessions
      - Install redis, configure connection
      - Cache user profile lookups (TTL 5 minutes)
      - Expected outcome: 90% cache hit rate

### Quick Wins (easy, high impact)
- [ ] 4. Enable gzip compression on API responses
      - Add compression middleware to Express
      - Expected outcome: Response size reduced by 70%

- [ ] 5. Minify and bundle JavaScript (Webpack production build)
      - Run `npm run build:prod`
      - Expected outcome: Bundle size < 200KB (currently 1.5MB)

### Nice-to-Have (defer if time limited)
- [ ] 6. Set up CDN for static assets (CloudFront)
      - Configure S3 bucket with CloudFront distribution
      - Expected outcome: Asset load time < 100ms globally

- [ ] 7. Investigate memory leaks (long-term monitoring)
      - Add heap snapshot capture on production
      - Review weekly for memory growth trends

Questions while decomposing:

  • Which tasks are prerequisites for others (profiling before optimization)?
  • Which tasks are independent and can be done in parallel (caching + minification)?
  • Which tasks have measurable outcomes (query time, bundle size)?
  • Which tasks are risky and need checkpoints (database migration)?
  • Which tasks can be deferred if the deadline is tight (CDN setup)?

Manual test:

# 1. Create the todo list
$ kiro "/todo Improve app performance: profile, optimize queries, add caching, compress assets"

# Kiro generates the structured list above

# 2. Execute tasks in order
$ kiro "/todo execute"

# Kiro starts with task 1 (profiling)

# 3. Check progress mid-execution
$ kiro "/todo status"
# Progress: [██░░░░░] 29% (2/7 tasks complete)

# 4. Mark a task as blocked (waiting for DBA approval)
$ kiro "/todo block 2 --reason 'Waiting for DBA to approve index creation'"

# 5. Skip to next unblocked task
$ kiro "/todo next"

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is the difference between a task list and a kanban board for managing development work?”

  2. “How do you decompose a large, vague requirement (‘make the app faster’) into atomic, executable tasks?”

  3. “What makes a task ‘atomic’? What are the characteristics of a well-defined task?”

  4. “How would you handle task dependencies in a todo system (task B requires task A to complete first)?”

  5. “What state transitions are valid for a task (e.g., pending → in-progress → done)?”

  6. “How do you measure progress when some tasks are large (1 day) and others are small (5 minutes)?”


Hints in Layers

Hint 1: Start with a Brain Dump First, tell Kiro everything you want to accomplish (unstructured). Then ask Kiro to convert it into a structured /todo list.

Hint 2: Use Verifiable Outcomes Each task should have a clear “done” condition: “Add unit tests” (vague) vs. “Add tests until coverage is > 80%” (verifiable).

Hint 3: Prioritize with Numbers Prefix tasks with numbers to enforce execution order: “1. Set up database”, “2. Run migrations”, “3. Seed test data”.

Hint 4: Separate “What” from “How” The task describes what to achieve, not how to do it. “Add JWT auth” (what) not “Install jsonwebtoken library, create /auth/login endpoint, add middleware…” (how).


Books That Will Help

Topic Book Chapter
Task Decomposition “The Pragmatic Programmer” by Hunt & Thomas Ch. 2 (Orthogonality)
Markdown Syntax “The Markdown Guide” by Matt Cone Ch. 4 (Extended Syntax)
State Machines “Designing Data-Intensive Applications” by Kleppmann Ch. 7 (Transactions)
Function Design “Clean Code” by Robert C. Martin Ch. 3 (Functions)
Progress Tracking “The Mythical Man-Month” by Brooks Ch. 10 (Tracking Progress)
Measurement “Accelerate” by Forsgren, Humble, Kim Ch. 4 (Measurement)

Common Pitfalls & Debugging

Problem 1: “Tasks are too vague (‘improve the code’)”

  • Why: No clear acceptance criteria or measurable outcome.
  • Fix: Rewrite as “Increase test coverage from 60% to 80%” or “Reduce cyclomatic complexity of auth.js to < 10”.
  • Quick test: Ask “How do I know when this task is done?” If unclear, the task is too vague.

Problem 2: “Too many tasks (30+), feels overwhelming”

  • Why: Tasks are too granular or not grouped.
  • Fix: Group related tasks into phases: “Phase 1: Auth (3 tasks)”, “Phase 2: API (5 tasks)”.
  • Quick test: Can you complete the entire list in a single work session? If no, it’s too long.

Problem 3: “Kiro forgot to complete a task from the list”

  • Why: Task was not explicitly tracked or Kiro lost context.
  • Fix: Use /todo status to verify all tasks are tracked, and /todo execute to enforce systematic execution.
  • Quick test: Review the final checklist—all tasks should be marked as done.

Problem 4: “Task order matters but Kiro did them out of order”

  • Why: Dependencies weren’t explicit.
  • Fix: Number tasks explicitly (1, 2, 3) or use “Prerequisites: Task 1 must complete first”.
  • Quick test: Can task 3 complete before task 1? If yes, there’s a dependency issue.

Problem 5: “A task failed mid-execution, but the todo list shows it as done”

  • Why: Kiro marked it as done based on attempting it, not succeeding.
  • Fix: Use stricter “done” criteria: tests must pass, manual verification required.
  • Quick test: Run tests after each task completion.

Problem 6: “Todo list disappeared after restarting Kiro”

  • Why: The list was only in-memory, not persisted.
  • Fix: Save the todo list to a file (.kiro/todos.md) and load it on session start.
  • Quick test: Restart Kiro and run /todo status (should show previous list).
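
A minimal sketch of that fix in Python, assuming the list lives in .kiro/todos.md as a GitHub-flavored checklist; the exact format Kiro uses internally may differ:

from pathlib import Path

TODO_FILE = Path(".kiro/todos.md")

def load_tasks():
    # Parse "- [ ] task" / "- [x] task" lines into simple dicts
    tasks = []
    for line in TODO_FILE.read_text().splitlines():
        if line.startswith("- ["):
            tasks.append({"done": line[3] == "x", "text": line[6:]})
    return tasks

def mark_done(index):
    tasks = load_tasks()
    tasks[index]["done"] = True
    lines = [f"- [{'x' if t['done'] else ' '}] {t['text']}" for t in tasks]
    TODO_FILE.write_text("\n".join(lines) + "\n")   # persisted: survives a Kiro restart

def status():
    tasks = load_tasks()
    done = sum(t["done"] for t in tasks)
    print(f"Progress: {done}/{len(tasks)} tasks complete")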

Definition of Done

  • Can create a todo list from an unstructured brain dump
  • All tasks are atomic (single responsibility, verifiable completion)
  • Tasks are numbered or prioritized (execution order is clear)
  • Each task has a “done” condition (how do you know it’s complete?)
  • Kiro executes tasks systematically (in order, one at a time)
  • Can check progress mid-execution (/todo status)
  • All tasks marked as complete when done
  • Todo list persists across Kiro sessions (saved to file)
  • Can manually edit the todo list (add/remove/reorder tasks)
  • Can handle task failures (mark as failed, skip, or retry)
  • Tested with a real multi-step project (all tasks completed successfully)
  • Documentation explains how to create effective task lists

Project 28: “The Semantic Search Engine” — Knowledge Management

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Python (RAG)
Coolness Level Level 4: Hardcore Tech Flex
Difficulty Level 3: Advanced
Knowledge Area Knowledge Management

What you’ll build: Enable /knowledge and ingest a folder of PDFs for semantic Q&A.

Why it teaches Retrieval: You learn how to use data larger than the context window.

Success criteria:

  • An answer is grounded in retrieved chunks.

Real World Outcome

You will have a Kiro CLI extension that ingests PDF documents and enables semantic question-answering that goes beyond the context window limit. When you run it, you’ll see:

Ingestion Phase:

$ kiro "/knowledge ingest ~/Documents/research_papers/"

📚 Semantic Search Engine - Knowledge Ingestion
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Processing PDFs...
├─ attention_is_all_you_need.pdf (8 pages) ✓
│  └─ Extracted 47 chunks (avg: 512 tokens/chunk)
├─ bert_pretraining.pdf (16 pages) ✓
│  └─ Extracted 89 chunks (avg: 498 tokens/chunk)
└─ gpt3_language_models.pdf (75 pages) ✓
   └─ Extracted 412 chunks (avg: 505 tokens/chunk)

Generating embeddings... [████████████████████] 548/548 chunks

Building vector index (FAISS)...
├─ Index type: IVF256,Flat
├─ Dimensions: 1536 (text-embedding-3-small)
└─ Total vectors: 548

💾 Saved to: ~/.kiro/knowledge/research_papers.faiss
✓ Knowledge base ready: research_papers (548 chunks, 274k tokens)

Query Phase:

$ kiro "/knowledge query research_papers 'What is the self-attention mechanism in transformers?'"

🔍 Semantic Search Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Query: "What is the self-attention mechanism in transformers?"

Top 5 Retrieved Chunks (by cosine similarity):

1. attention_is_all_you_need.pdf (page 3, score: 0.94)
   "Self-attention, sometimes called intra-attention, is a mechanism
    relating different positions of a single sequence to compute a
    representation of the sequence. The attention function maps a
    query and set of key-value pairs to an output..."

2. attention_is_all_you_need.pdf (page 4, score: 0.89)
   "Scaled Dot-Product Attention: We compute attention as
    Attention(Q,K,V) = softmax(QK^T / sqrt(d_k))V where Q, K, V
    are the queries, keys, and values matrices..."

3. bert_pretraining.pdf (page 7, score: 0.82)
   "BERT uses bidirectional self-attention, allowing each token to
    attend to all tokens in both directions. This differs from GPT's
    causal (left-to-right) attention masking..."

───────────────────────────────────────────────────────────────

📝 Generated Answer (grounded in retrieved context):

Self-attention is a mechanism that relates different positions within
a single sequence to compute its representation. In the Transformer
architecture, it works by:

1. Computing Query (Q), Key (K), Value (V) matrices from input
2. Calculating attention scores: softmax(QK^T / sqrt(d_k))
3. Using scores to weight the Value vectors

The scaling factor sqrt(d_k) prevents dot products from growing too
large. BERT extends this with bidirectional attention, while GPT uses
causal masking for autoregressive generation.

📚 Sources: attention_is_all_you_need.pdf (p3-4), bert_pretraining.pdf (p7)

Usage in Conversation:

$ kiro "Based on my research papers, explain how to implement a custom attention layer"

[Kiro automatically retrieves relevant chunks from the knowledge base]

I found 3 relevant sections from your research papers knowledge base:
- attention_is_all_you_need.pdf discusses scaled dot-product attention
- bert_pretraining.pdf covers multi-head attention implementation
- efficient_transformers.pdf shows optimization techniques

Here's how to implement a custom attention layer...
[Answer grounded in retrieved context]

You’re seeing exactly what modern RAG (Retrieval-Augmented Generation) systems do - breaking the context window limitation by retrieving only relevant information on-demand!

The Core Question You’re Answering

“How do you give an LLM access to knowledge beyond its context window without fine-tuning?”

Before you write any code, sit with this question. Most developers think context windows solve everything (“just throw it all in!”), but:

  • GPT-4 Turbo: 128k tokens ≈ 96,000 words ≈ 200 pages
  • Your company’s documentation: 10,000 pages
  • Every research paper ever written: billions of pages

Even with 200k token windows, you can’t fit everything. RAG (Retrieval-Augmented Generation) solves this by:

  1. Converting text to semantic vectors (embeddings)
  2. Storing vectors in a searchable index
  3. Retrieving only relevant chunks for each query
  4. Grounding LLM responses in retrieved context

This is how ChatGPT’s “Browse with Bing” works, how GitHub Copilot uses your codebase, and how enterprise AI assistants access internal docs.

Concepts You Must Understand First

Stop and research these before coding:

  1. Vector Embeddings
    • What is an embedding? (numeric representation of semantic meaning)
    • Why does cosine similarity measure semantic relatedness?
    • How does text-embedding-3-small differ from text-embedding-ada-002?
    • Book Reference: “Speech and Language Processing” Ch. 6 (Vector Semantics) - Jurafsky & Martin
  2. Chunking Strategies
    • Why chunk documents instead of embedding entire PDFs?
    • What’s the trade-off between chunk size (128 vs 512 vs 2048 tokens)?
    • How does overlapping chunks prevent context loss at boundaries?
    • Book Reference: “Introduction to Information Retrieval” Ch. 2 (Indexing) - Manning, Raghavan, Schütze
  3. Vector Databases (FAISS, Pinecone, Weaviate)
    • What is Approximate Nearest Neighbor (ANN) search?
    • Why is exhaustive search O(n) too slow for millions of vectors?
    • How does FAISS’s IVF (Inverted File Index) work?
    • Blog Reference: “FAISS: A Library for Efficient Similarity Search” - Facebook AI Research
  4. Retrieval Algorithms
    • Dense retrieval (embeddings) vs sparse retrieval (BM25/TF-IDF)
    • What is hybrid search? (combining dense + sparse)
    • How does reranking improve top-k results?
    • Paper Reference: “Dense Passage Retrieval for Open-Domain QA” - Karpukhin et al., 2020
  5. PDF Parsing
    • How does PyPDF2/pdfplumber extract text from PDFs?
    • What breaks with scanned PDFs (OCR needed)?
    • How do you handle tables, images, and multi-column layouts?
    • Docs Reference: pdfplumber documentation

Questions to Guide Your Design

Before implementing, think through these:

  1. Chunking Strategy
    • Fixed-size chunks (512 tokens) or semantic chunks (paragraph boundaries)?
    • Should chunks overlap? If so, by how much (50 tokens? 25%)?
    • How will you handle code blocks, tables, and lists (semantic units)?
  2. Embedding Model Selection
    • OpenAI text-embedding-3-small (1536 dims, $0.02/1M tokens)?
    • Sentence-BERT (384 dims, free, runs locally)?
    • How will you handle the latency vs cost trade-off?
  3. Vector Index Design
    • FAISS Flat (exact search, slow for >100k vectors)?
    • FAISS IVF (approximate, 10x faster, 95% recall)?
    • Do you need GPU acceleration (faiss-gpu)?
  4. Retrieval Strategy
    • Top-k retrieval (how many chunks? 3? 5? 10?)?
    • Score threshold (min cosine similarity to include)?
    • How will you format retrieved chunks in the prompt?
  5. Metadata & Filtering
    • Should you store page numbers, document titles, timestamps?
    • Do you need to filter by document type or date range?
    • How will you cite sources in the generated answer?

Thinking Exercise

Trace Retrieval Flow

Before coding, manually trace this RAG pipeline:

Given:

  • Knowledge base: 3 PDFs (Attention Is All You Need, BERT, GPT-3)
  • Query: “How does GPT-3 differ from BERT in pretraining?”

Trace each step:

  1. Query Embedding
    • Input: “How does GPT-3 differ from BERT in pretraining?”
    • Output: 1536-dimensional vector (e.g., [0.023, -0.145, 0.891, …])
    • Question: Why embed the query with the same model as the chunks?
  2. Vector Search (FAISS)
    • Compute cosine similarity between query vector and all 548 chunk vectors
    • Sort by similarity score (1.0 = identical, 0.0 = orthogonal)
    • Return top 5 chunks
    • Question: Why cosine similarity instead of Euclidean distance?
  3. Retrieved Chunks (hypothetical)
    Chunk 1 (gpt3_language_models.pdf, page 12, score: 0.91)
    "GPT-3 uses autoregressive language modeling, predicting the next
     token given all previous tokens. Unlike BERT's masked language
     modeling, GPT-3 is trained left-to-right..."
    
    Chunk 2 (bert_pretraining.pdf, page 3, score: 0.88)
    "BERT is pretrained with two objectives: (1) Masked Language Model
     (MLM) where 15% of tokens are masked, and (2) Next Sentence
     Prediction (NSP)..."
    
    • Question: Why did these chunks score higher than others?
  4. Prompt Construction
    System: You are an AI assistant. Answer based on the context below.
    
    Context:
    [Chunk 1 content]
    [Chunk 2 content]
    ...
    
    User: How does GPT-3 differ from BERT in pretraining?
    
    Answer:
    
    • Question: What if the retrieved chunks don’t answer the question?
  5. Generated Answer
    • LLM reads retrieved context + query
    • Generates grounded answer citing sources
    • Question: How do you detect hallucination (info NOT in retrieved chunks)?

Questions while tracing:

  • What if no chunks have similarity > 0.5? (query outside knowledge base)
  • What if 10 chunks all have similarity > 0.9? (do you use all? truncate?)
  • What if the PDF has OCR errors? (“pretraining” → “pre-training” → “pretrainng”)?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the difference between RAG (Retrieval-Augmented Generation) and fine-tuning. When would you use each?”

  2. “Your vector search is returning irrelevant chunks for 20% of queries. How would you debug and fix this?”

  3. “You have 1 million PDF pages to index. Embedding them with OpenAI costs $200. How would you reduce this cost?”

  4. “A user asks ‘What’s the latest update?’ but your knowledge base is from 6 months ago. How does your system handle this gracefully?”

  5. “Walk me through the math of cosine similarity. Why is it better than Euclidean distance for text embeddings?”

  6. “You’re getting complaints that answers are slow (10 seconds). Where are the bottlenecks and how do you optimize?”

Hints in Layers

Hint 1: Start with PDF Ingestion Don’t jump straight to embeddings. First, prove you can extract clean text from a single PDF. Use pdfplumber (better than PyPDF2 for tables). Test with a research paper PDF and verify paragraph boundaries are preserved.

Hint 2: Implement Chunking Split the extracted text into 512-token chunks with 50-token overlap. Use tiktoken (OpenAI’s tokenizer) to count tokens accurately. Store chunks with metadata:

chunk = {
    'text': "Self-attention is a mechanism...",
    'source': 'attention_is_all_you_need.pdf',
    'page': 3,
    'chunk_id': 'doc1_chunk_047',
    'token_count': 498
}
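
A minimal sketch of that sliding-window chunking, assuming tiktoken is installed (cl100k_base is the tokenizer used by the text-embedding-3 models). Splitting on paragraph boundaries first, as the Definition of Done asks, is a refinement on top of this:

import tiktoken

def chunk_text(text, source, page, chunk_size=512, overlap=50):
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap                      # windows overlap by `overlap` tokens
    for i, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start:start + chunk_size]
        chunks.append({
            'text': enc.decode(window),              # decoded back to a string
            'source': source,
            'page': page,
            'chunk_id': f"{source}_chunk_{i:03d}",
            'token_count': len(window),
        })
    return chunks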

Hint 3: Generate Embeddings Call OpenAI’s embedding API for each chunk. Batch requests (up to 2048 chunks/request) to reduce latency:

response = openai.embeddings.create(
    model="text-embedding-3-small",
    input=[chunk['text'] for chunk in chunks[:2048]]
)
embeddings = [data.embedding for data in response.data]

Each embedding is a 1536-dimensional float array.

Hint 4: Build FAISS Index Create a Flat index for exact search (start simple before optimizing):

import faiss
import numpy as np

dimension = 1536
embeddings_matrix = np.array(embeddings).astype('float32')

index = faiss.IndexFlatL2(dimension)  # L2 distance (convert to cosine later)
index.add(embeddings_matrix)  # Add all vectors

faiss.write_index(index, 'knowledge.faiss')  # Save to disk

Hint 5: Query & Retrieve For a user query, embed it and search the index:

query_embedding = openai.embeddings.create(
    model="text-embedding-3-small",
    input="What is self-attention?"
).data[0].embedding

query_vector = np.array([query_embedding]).astype('float32')
k = 5  # Top 5 results
distances, indices = index.search(query_vector, k)

# Retrieve original chunks
retrieved_chunks = [chunks[i] for i in indices[0]]

Hint 6: Construct RAG Prompt Format retrieved chunks into a prompt:

context = "\n\n".join([
    f"Source: {chunk['source']} (page {chunk['page']})\n{chunk['text']}"
    for chunk in retrieved_chunks
])

prompt = f"""Answer based on the following context:

{context}

Question: {user_query}

Answer:"""

Hint 7: Debugging Tools When results are bad, inspect:

  • Chunk quality: Are chunks semantically coherent? (print first 10)
  • Embedding distribution: Are vectors normalized? (check norms)
  • Similarity scores: What are the top-k scores? (should be > 0.6 for good matches)
  • Retrieved text: Does it actually answer the query? (manual review)

Hint 8: Optimization (Once It Works)

  • Switch to FAISS IVF for >10k chunks (10x faster, slight recall loss; see the sketch after this list)
  • Cache embeddings (don’t re-embed the same query)
  • Use sentence-transformers for local embedding (no API costs)
  • Implement hybrid search (dense + BM25 sparse retrieval)
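
A sketch of that IVF switch, reusing dimension and embeddings_matrix from Hint 4 (nlist matches the IVF256 index shown in the ingestion output; nprobe is a tuning knob, not a fixed value):

import faiss

nlist = 256                                   # number of clusters, as in the IVF256,Flat example
quantizer = faiss.IndexFlatIP(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist, faiss.METRIC_INNER_PRODUCT)

faiss.normalize_L2(embeddings_matrix)         # inner product on unit vectors == cosine similarity
index.train(embeddings_matrix)                # IVF needs a training pass to build the clusters
index.add(embeddings_matrix)

index.nprobe = 16                             # clusters searched per query (recall vs. speed)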

Books That Will Help

Topic Book Chapter
Vector Semantics & Embeddings “Speech and Language Processing” by Jurafsky & Martin Ch. 6
Information Retrieval Fundamentals “Introduction to Information Retrieval” by Manning, Raghavan, Schütze Ch. 1-2
Nearest Neighbor Search “Foundations of Data Science” by Blum, Hopcroft, Kannan Ch. 2 (High-Dimensional Space)
Transformer Attention (context for RAG) “Deep Learning” by Goodfellow, Bengio, Courville Ch. 10 (Sequence Modeling)
PDF Parsing & Text Extraction “Mining the Web” by Soumen Chakrabarti Ch. 3 (Crawling & Extraction)

Common Pitfalls & Debugging

Problem 1: “Embeddings return nonsense - unrelated chunks rank highest”

  • Why: You’re using Euclidean distance (L2) instead of cosine similarity. L2 is affected by vector magnitude; cosine only cares about direction.
  • Fix: Use IndexFlatIP (inner product) with normalized vectors, or convert L2 distances to cosine.
  • Quick test: faiss.normalize_L2(embeddings_matrix) before adding to index. Verify with np.linalg.norm(embeddings_matrix[0]) ≈ 1.0.
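
A sketch of that fix, switching the Hint 4 index to inner product over normalized vectors, which is equivalent to cosine similarity (reusing embeddings_matrix, dimension, query_vector, and k from Hints 4 and 5):

import faiss

faiss.normalize_L2(embeddings_matrix)        # scale every vector to unit length, in place
index = faiss.IndexFlatIP(dimension)         # inner product == cosine on unit vectors
index.add(embeddings_matrix)

faiss.normalize_L2(query_vector)             # normalize the query the same way
scores, indices = index.search(query_vector, k)
# scores are now cosine similarities in [-1, 1]; higher is better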

Problem 2: “PDF extraction is garbled - formulas and tables break”

  • Why: PyPDF2 doesn’t handle complex layouts. Scanned PDFs need OCR.
  • Fix: Use pdfplumber for tables, pytesseract for scanned PDFs, unstructured library for mixed content.
  • Quick test: pdfplumber.open('paper.pdf').pages[0].extract_text() - inspect visually for garbling.

Problem 3: “Query returns 0 results with similarity > 0.5”

  • Why: Query is outside the knowledge base domain, or embedding model mismatch (query embedded with different model than chunks).
  • Fix: Fallback to “no relevant information found” response. Check embedding model consistency.
  • Quick test: Embed a chunk’s text as a query - should return that chunk with similarity ≈ 1.0.

Problem 4: “Indexing 100k chunks takes 30 minutes”

  • Why: Calling OpenAI API for each chunk individually (network latency dominates).
  • Fix: Batch requests (up to 2048 chunks per API call). Use asyncio for parallelism.
  • Quick test: Time 1 chunk vs 100 chunks batched - batching should be 10-50x faster.

Problem 5: “Answers hallucinate facts not in retrieved chunks”

  • Why: LLM ignores context and uses pretrained knowledge. Prompt doesn’t enforce grounding.
  • Fix: Add to prompt: “Answer ONLY using the context above. If the answer isn’t in the context, say ‘I don’t have enough information.’”
  • Quick test: Query something NOT in the knowledge base - LLM should refuse to answer.

Definition of Done

  • PDF ingestion works: Extract text from 3+ PDFs with different layouts (text-heavy, tables, diagrams)
  • Chunking is semantic: Verify chunks split on paragraph boundaries, not mid-sentence
  • Embeddings are generated: 500+ chunks embedded successfully, stored with metadata
  • FAISS index builds: Index file saved to disk, loads correctly on restart
  • Query retrieval works: Top-5 chunks for a test query include expected results
  • Similarity scores make sense: Relevant chunks score > 0.7, irrelevant < 0.5
  • Answers are grounded: Generated responses cite sources (page numbers, document names)
  • Edge cases handled: Empty PDFs, malformed PDFs, queries outside knowledge base domain
  • Performance is acceptable: Query latency < 2 seconds (embedding + search + generation)
  • Code is documented: README explains ingestion, querying, and adding new documents

Project 29: “The Delegate Background Worker” — Async Workflows

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Bash
Coolness Level Level 3: Genuinely Clever
Difficulty Level 2: Intermediate
Knowledge Area Async Workflows

What you’ll build: Use delegate to run tests and fix linting while you keep working.

Why it teaches Parallelism: The agent becomes non-blocking.

Success criteria:

  • A background task completes and reports its summary.

Real World Outcome

You will have a Kiro CLI workflow that delegates time-consuming tasks to background agents while you continue working on other tasks. When you run it, you’ll see:

Delegating a Task:

$ kiro "I need to implement user authentication. But first, delegate fixing all ESLint errors in the background."

🔄 Delegating Task: Fix ESLint errors
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Starting background agent: kiro_delegate_001
Task: Fix all ESLint errors in src/ directory
Priority: Low
Estimated time: 10-15 minutes

Background agent running in: /tmp/kiro_delegate_001.log
Agent ID: agent_20240102_143052

[Main session continues - you keep working on auth]
✓ Background task delegated. You can continue working.

Main Session Continues (Non-Blocking):

[You continue in the main conversation]

$ kiro "Add JWT authentication to the Express API"

🔨 Implementing JWT Authentication
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Installing dependencies...
$ npm install jsonwebtoken bcrypt

Step 2: Creating auth middleware...
[Creating src/middleware/auth.js]

Step 3: Adding login endpoint...
[Creating POST /api/auth/login]

[While you work, background agent runs in parallel]

Background Task Completes:

🎉 Background Task Complete: agent_20240102_143052
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Task: Fix ESLint errors
Duration: 12m 34s
Status: Success ✓

Summary of Changes:
├─ Fixed 47 ESLint errors across 12 files
│  ├─ 23 × no-unused-vars (removed dead code)
│  ├─ 15 × prefer-const (changed let → const)
│  ├─  6 × no-console (added eslint-disable comments)
│  └─  3 × missing-return (added return statements)
├─ Ran tests: 156 passed, 0 failed
└─ Created commit: "fix: resolve 47 ESLint errors"

Files modified:
  M src/controllers/userController.js
  M src/services/authService.js
  M src/utils/validation.js
  ... (9 more files)

Full log: /tmp/kiro_delegate_001.log

Checking Background Task Status:

$ kiro "/delegate status"

📋 Active Background Tasks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

agent_20240102_143052 [COMPLETE] ✓
  Task: Fix ESLint errors
  Started: 2:30 PM
  Completed: 2:43 PM (12m 34s)
  Status: Success

agent_20240102_143515 [RUNNING] ⏳
  Task: Run integration test suite
  Started: 2:35 PM
  Elapsed: 8m 15s
  Progress: Running test 47/89...

agent_20240102_144201 [FAILED] ✗
  Task: Update all dependencies
  Started: 2:42 PM
  Failed: 2:45 PM (3m 12s)
  Error: Dependency conflict in @types/node
  Log: /tmp/kiro_delegate_003.log

Monitoring Real-Time Progress:

$ tail -f /tmp/kiro_delegate_002.log

[14:35:12] Starting integration tests...
[14:36:45] ✓ Auth flow tests (12 tests, 2.3s)
[14:37:23] ✓ Database migrations (8 tests, 3.1s)
[14:38:01] ⏳ API endpoint tests (running 47/89...)
[14:38:45] ✓ POST /api/users (201 response, 0.8s)
[14:39:12] ✓ GET /api/users/:id (200 response, 0.5s)
...

You’re seeing exactly what concurrent programming enables - parallelism that lets you stay productive while long-running tasks complete in the background!

The Core Question You’re Answering

“How do you make Kiro non-blocking so you can work on Task A while Task B executes in parallel?”

Before you write any code, sit with this question. Most CLI tools are synchronous (“wait for this to finish before doing anything else”), but modern development workflows demand parallelism:

Synchronous (blocking) workflow:

You: "Run the test suite"
Kiro: [runs 500 tests for 15 minutes]
You: [waits... twiddling thumbs... can't do anything else]
Kiro: "Tests passed!"
You: [finally continues]

Asynchronous (non-blocking) workflow:

You: "Delegate running the test suite to a background agent"
Kiro: [spawns background agent]
Background Agent: [runs 500 tests for 15 minutes in parallel]
You: [continues working on auth implementation]
You: [continues working on API docs]
Background Agent: "Tests passed!" [notifies when done]

This is the same pattern as:

  • GitHub Actions (CI/CD in background while you keep coding)
  • Background jobs in web apps (Sidekiq, Celery, Bull)
  • Async/await in programming (non-blocking I/O)

Concepts You Must Understand First

Stop and research these before coding:

  1. Process Management (Unix)
    • What is a process? (running program with PID)
    • How do you spawn a background process in Bash? (command &, nohup, disown)
    • What happens when a parent process exits? (orphaned processes, init adoption)
    • Book Reference: “Advanced Programming in the UNIX Environment” Ch. 9 (Process Relationships) - Stevens & Rago
  2. Inter-Process Communication (IPC)
    • How do two processes communicate? (pipes, sockets, files, signals)
    • What is stdout/stderr redirection? (>, >>, 2>&1)
    • How do you read a process’s output while it’s running? (tail -f, named pipes)
    • Book Reference: “The Linux Programming Interface” Ch. 44 (Pipes and FIFOs) - Michael Kerrisk
  3. Job Control
    • What is job control? (fg, bg, jobs, kill)
    • How do you bring a background job to the foreground? (fg %1)
    • What signals exist? (SIGTERM, SIGKILL, SIGINT, SIGHUP)
    • Book Reference: “Learning the bash Shell” Ch. 8 (Job Control) - Cameron Newham
  4. Async Execution Patterns
    • What is the difference between parallel and concurrent?
    • How do you wait for multiple background tasks? (wait $PID1 $PID2)
    • What is a task queue? (producers add tasks, workers consume)
    • Blog Reference: “Concurrency vs Parallelism” - Rob Pike (Go creator)
  5. Exit Codes & Error Handling
    • What do exit codes mean? (0 = success, 1-255 = error)
    • How do you capture a background process’s exit code? (wait $PID; echo $?)
    • How do you handle failures in background tasks?
    • Docs Reference: Bash manual on exit status

Questions to Guide Your Design

Before implementing, think through these:

  1. Task Lifecycle
    • How do you spawn a background Kiro session? (new process? Docker container? tmux pane?)
    • Where do you store task metadata? (PID, log file, status, start time)
    • How do you track which tasks are running vs completed?
  2. Communication Protocol
    • How does the main session know when a background task completes?
    • File-based polling (check status.json every 5s)?
    • Signal-based notification (SIGUSR1 when done)?
    • Webhook/HTTP callback?
  3. Logging & Observability
    • Where do background task logs go? (separate file per task? centralized?)
    • How do you tail logs in real-time? (tail -f)
    • How do you prevent log files from growing unbounded? (rotation, max size)
  4. Error Handling
    • What if a background task crashes? (save stack trace to log)
    • What if a background task hangs? (timeout after 1 hour?)
    • What if the main session exits while background tasks run? (orphan cleanup?)
  5. Resource Limits
    • How many background tasks can run concurrently? (CPU cores, memory limits)
    • Should you queue tasks if too many are running? (max 4 concurrent)
    • How do you prioritize tasks? (critical > high > normal > low)

Thinking Exercise

Trace Background Task Execution

Before coding, manually trace this workflow:

Given:

  • Main session: User asks Kiro to implement JWT auth
  • Background task: Fix ESLint errors (15-minute task)

Trace each step:

  1. User Delegates Task
    $ kiro "Delegate fixing ESLint errors while I work on auth"
    
    • Main Kiro session detects “delegate” keyword
    • Creates task metadata:
      {
        "id": "agent_20240102_143052",
        "task": "Fix ESLint errors",
        "status": "starting",
        "pid": null,
        "log_file": "/tmp/kiro_delegate_001.log",
        "start_time": "2024-01-02T14:30:52Z"
      }
      
    • Question: Where is this metadata stored? (file? database? in-memory?)
  2. Spawn Background Process
    # Main session executes:
    nohup kiro "Fix all ESLint errors in src/" > /tmp/kiro_delegate_001.log 2>&1 &
    BACKGROUND_PID=$!
    
    # Update metadata with PID
    echo $BACKGROUND_PID > /tmp/kiro_delegate_001.pid
    
    • nohup: Ignore SIGHUP (session logout won’t kill it)
    • &: Run in background
    • $!: Capture PID of last background process
    • Question: What if the background Kiro process spawns subprocesses? (process tree)
  3. Main Session Continues (Non-Blocking)
    # User continues working
    $ kiro "Add JWT authentication"
    
    # Main session is responsive immediately
    [Working on auth implementation...]
    
    • Background task runs in parallel
    • Main session doesn’t wait
    • Question: How does Kiro prevent context pollution? (separate conversation history?)
  4. Background Task Runs
    # In the background (separate process):
    [Background Agent Log - /tmp/kiro_delegate_001.log]
    
    [14:30:52] Starting task: Fix ESLint errors
    [14:31:05] Running: eslint src/ --fix
    [14:31:45] Fixed 47 errors across 12 files
    [14:32:10] Running tests: npm test
    [14:43:15] Tests passed (156/156)
    [14:43:20] Creating commit: "fix: resolve 47 ESLint errors"
    [14:43:26] Task complete
    • Question: How does the background agent know to commit? (task instructions)
  5. Completion Detection
    # Background process writes completion metadata
    {
      "id": "agent_20240102_143052",
      "status": "complete",
      "exit_code": 0,
      "end_time": "2024-01-02T14:43:26Z",
      "summary": "Fixed 47 ESLint errors, tests passed"
    }
    
    # Main session polls or gets notified
    [Main Session] Background task agent_20240102_143052 completed ✓
    
    • Question: Polling (check every 10s) vs event-driven (callback)?
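
One simple answer to the polling question above: a file-based poller. A minimal Python sketch, assuming (hypothetically) that each background task writes its completion metadata to /tmp/kiro_delegate_<id>.json when it exits:

import json
import time
from pathlib import Path

def wait_for_completion(task_id, poll_seconds=10):
    status_file = Path(f"/tmp/kiro_delegate_{task_id}.json")   # hypothetical metadata path
    while True:
        if status_file.exists():
            meta = json.loads(status_file.read_text())
            if meta.get("status") in ("complete", "failed"):
                print(f"Background task {task_id} finished: {meta.get('summary', meta['status'])}")
                return meta
        time.sleep(poll_seconds)   # trade-off: shorter interval = faster notification, more wakeups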

Questions while tracing:

  • What if the background task needs user input? (can’t prompt, must fail gracefully)
  • What if the background task modifies files the main session is using? (file locking, conflict resolution)
  • What if two background tasks both try to commit? (git lock conflict)

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the difference between concurrency and parallelism. How does background task delegation relate to each?”

  2. “Your background task hangs indefinitely. How would you implement a timeout mechanism in Bash?”

  3. “You have 10 background tasks queued but only 2 CPU cores. How would you schedule them efficiently?”

  4. “A background task crashes halfway through. How do you ensure it doesn’t leave the codebase in a broken state?”

  5. “Walk me through how nohup command & works at the OS level. What happens when the parent shell exits?”

  6. “You’re running 5 background agents. How would you implement a priority queue so critical tasks run first?”

Hints in Layers

Hint 1: Start with Process Spawning Before building the full system, prove you can spawn a background process and capture its output. Test with a simple sleep command:

# Spawn background process
nohup sleep 30 > /tmp/test.log 2>&1 &
BACKGROUND_PID=$!
echo "Started PID: $BACKGROUND_PID"

# Check if it's running
ps -p $BACKGROUND_PID

Hint 2: Store Task Metadata Create a task registry (simple JSON file):

# /tmp/kiro_tasks.json
{
  "tasks": [
    {
      "id": "agent_001",
      "pid": 12345,
      "status": "running",
      "log": "/tmp/kiro_delegate_001.log",
      "started": "2024-01-02T14:30:52Z",
      "task": "Fix ESLint errors"
    }
  ]
}

Use jq to read/write JSON from Bash.

Hint 3: Delegate Command Implement a /delegate command in Kiro:

# Pseudocode for /delegate handler
if user_input.startswith("/delegate"):
    task_description = extract_task(user_input)

    # Create task metadata
    task_id = generate_id()  # agent_YYYYMMDD_HHMMSS
    log_file = f"/tmp/kiro_delegate_{task_id}.log"

    # Spawn background Kiro process
    pid = spawn_background(f"kiro '{task_description}'", log_file)

    # Register task
    register_task(task_id, pid, log_file, task_description)

    # Notify user
    print(f"✓ Delegated task {task_id} (PID {pid})")
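
A sketch of what the spawn_background helper from this pseudocode might look like in Python; start_new_session detaches the child from the controlling terminal much like nohup/disown:

import subprocess

def spawn_background(command, log_file):
    log = open(log_file, "w")
    proc = subprocess.Popen(
        command,
        shell=True,                  # command is a full shell string, e.g. "kiro '...'"
        stdout=log,
        stderr=subprocess.STDOUT,    # merge stderr into the same log file
        stdin=subprocess.DEVNULL,    # never block waiting for terminal input
        start_new_session=True,      # new session: survives the parent, like nohup/disown
    )
    return proc.pid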

Hint 4: Monitor Background Tasks Implement a /delegate status command:

import json
import os

# Read task registry
with open("/tmp/kiro_tasks.json") as f:
    tasks = json.load(f)["tasks"]

def process_exists(pid):
    try:
        os.kill(pid, 0)              # Signal 0 checks existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True                  # Exists, but owned by another user
    return True

for task in tasks:
    # Check if the process is still running
    if process_exists(task["pid"]):
        status = "RUNNING ⏳"
    else:
        # wait only works for child processes, so the background task should
        # write its exit code into the registry when it finishes
        exit_code = task.get("exit_code", 1)
        status = "COMPLETE ✓" if exit_code == 0 else "FAILED ✗"

    print(f"{task['id']} [{status}] {task['task']}")

Hint 5: Tail Logs in Real-Time Allow users to monitor background tasks:

# Command: kiro "/delegate logs agent_001"
log_file = get_log_file("agent_001")
subprocess.run(["tail", "-f", log_file])  # pass args as a list (no shell needed)

Hint 6: Wait for Completion Implement a blocking wait if needed:

# Command: kiro "/delegate wait agent_001"
pid=$(cat /tmp/kiro_delegate_001.pid)   # or look the PID up in the task registry
wait "$pid"    # Blocks until the process exits (only works for children of this shell)
exit_code=$?

if [ $exit_code -eq 0 ]; then
    echo "✓ Task completed successfully"
else
    echo "✗ Task failed with exit code $exit_code"
fi

Hint 7: Cleanup Orphaned Tasks When main session exits, decide what to do with background tasks:

# Option 1: Kill all background tasks
trap 'kill $(jobs -p)' EXIT

# Option 2: Let them continue (orphan them)
disown -a  # Remove from job table

# Option 3: Ask user
echo "Background tasks running. Kill them? (y/n)"

Hint 8: Prevent Context Pollution Each background agent should have an isolated conversation history:

# Spawn with fresh context
kiro --new-session "Fix ESLint errors"

# Or use explicit context isolation
kiro --context-id "delegate_001" "Fix ESLint errors"

Books That Will Help

Topic Book Chapter
Process Management & Job Control “Advanced Programming in the UNIX Environment” by Stevens & Rago Ch. 9-10
Inter-Process Communication “The Linux Programming Interface” by Michael Kerrisk Ch. 44 (Pipes), Ch. 63 (Sockets)
Bash Background Jobs “Learning the bash Shell” by Cameron Newham Ch. 8 (Job Control)
Signals & Process Control “Linux System Programming” by Robert Love Ch. 5 (Process Management)
Concurrency Patterns “The Art of Concurrency” by Clay Breshears Ch. 2 (Threads vs Processes)

Common Pitfalls & Debugging

Problem 1: “Background task exits immediately after spawning”

  • Why: The background process inherits stdin/stdout tied to the terminal, which closes when the parent exits.
  • Fix: Use nohup and redirect all I/O: nohup command > log.txt 2>&1 &
  • Quick test: Logout and login again, then ps aux | grep kiro - background process should still be running.

Problem 2: “Can’t read background task’s PID after spawning”

  • Why: $! only works immediately after &. If you run other commands, $! changes.
  • Fix: Capture PID immediately: command & PID=$!; echo $PID > task.pid
  • Quick test: cat task.pid should show the correct PID, verify with ps -p $(cat task.pid).

Problem 3: “Background task writes to main session’s stdout (pollutes output)”

  • Why: Background process still has stdout pointing to the terminal.
  • Fix: Redirect stdout/stderr to a log file: command > /tmp/task.log 2>&1 &
  • Quick test: Main session should have clean output, logs go to file only.

Problem 4: “Background task hangs forever, no timeout”

  • Why: No timeout mechanism in place.
  • Fix: Use timeout command: timeout 1h kiro "long task" & (kills after 1 hour)
  • Quick test: timeout 5s sleep 10 & - process should die after 5 seconds.

Problem 5: “Background task conflicts with main session (file locks, git operations)”

  • Why: Both sessions try to modify the same files or run git commit simultaneously.
  • Fix: Implement file locking (flock) or coordinate via task queue (only one git operation at a time).
  • Quick test: Start two background tasks that both commit - second should wait or fail gracefully.
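
A minimal sketch of that coordination in Python using an advisory lock via fcntl.flock; the lock file path here is an arbitrary choice:

import fcntl
import subprocess

def with_git_lock(args):
    # Only one git operation at a time across the main session and background agents
    with open("/tmp/kiro_git.lock", "w") as lock:     # hypothetical shared lock file
        fcntl.flock(lock, fcntl.LOCK_EX)              # blocks until the lock is free
        return subprocess.run(["git"] + args)
        # the lock is released automatically when the file is closed

# Example: with_git_lock(["commit", "-m", "fix: resolve ESLint errors"])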

Definition of Done

  • Delegate command works: /delegate "run tests" spawns a background Kiro process
  • PID is captured: Background task’s PID is stored and accessible
  • Logs are isolated: Each background task writes to its own log file
  • Main session is non-blocking: User can continue working immediately after delegating
  • Status command works: /delegate status shows running/completed/failed tasks
  • Real-time monitoring: tail -f on log files shows live progress
  • Completion detection: Main session notifies when background tasks finish
  • Exit code handling: Failed background tasks are marked as failed (non-zero exit code)
  • Orphan cleanup: Background tasks don’t become zombies when main session exits
  • Context isolation: Background tasks don’t pollute main session’s conversation history

Project 30: “The Recursive Prompt Improver” — Metacognition

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Natural Language
Coolness Level Level 5: Pure Magic
Difficulty Level 3: Advanced
Knowledge Area Metacognition

What you’ll build: Ask Kiro to analyze its own steering and improve it based on mistakes.

Why it teaches Self-Correction: Your system gets smarter over time.

Success criteria:

  • A new steering rule prevents a repeated mistake.

Real World Outcome

You will have a Kiro CLI workflow that enables self-improvement through metacognitive analysis - Kiro analyzes its own behavior, identifies patterns of mistakes, and proposes steering rules to prevent future errors. When you run it, you’ll see:

Triggering Self-Analysis:

$ kiro "You keep running eslint without --fix, so I have to manually fix errors. Analyze this pattern and improve your steering."

🧠 Metacognitive Analysis Mode
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Analyzing recent conversation history...
├─ Reviewing last 50 messages
├─ Identifying repeated patterns
└─ Searching for corrective feedback

🔍 Pattern Detected: ESLint Usage Anti-Pattern
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Occurrences: 8 times in last 2 weeks

Example 1 (2024-01-01):
  Kiro: "Running ESLint to check for issues..."
  $ eslint src/
  User: "You found 23 errors but didn't fix them. Run it with --fix!"
  Kiro: "Sorry! Running eslint src/ --fix"

Example 2 (2024-01-02):
  Kiro: "Let me check for linting errors..."
  $ eslint src/components/
  User: "Again, you forgot --fix. Always use --fix!"
  Kiro: "My apologies. Running eslint src/components/ --fix"

Example 3 (2023-12-29):
  [Similar pattern repeated 6 more times]

Root Cause Analysis:
├─ Current steering doesn't specify --fix flag
├─ Default behavior is to report errors, not fix them
└─ User expects automatic fixing in 100% of cases

📝 Proposed Steering Rule
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Title: "Always Use ESLint --fix Flag"

Rule:
When running ESLint for any reason, ALWAYS include the --fix flag
to automatically fix correctable errors. Do not run eslint without
--fix unless the user explicitly requests a "dry run" or "report only".

Correct:
  $ eslint src/ --fix
  $ eslint src/**/*.js --fix

Incorrect:
  $ eslint src/
  $ eslint --debug src/

Exceptions:
- User says "check for errors but don't fix"
- User says "dry run"
- Running in CI/CD (--fix would modify files)

Priority: High (affects 40% of linting tasks)

Impact:
  Before: 8 occurrences of this mistake
  After: 0 expected occurrences
  Time saved: ~2 minutes per occurrence = 16 min/week

Do you want me to add this steering rule to ~/.kiro/steering.md? (y/n)

User Approves and Rule is Added:

$ y

✅ Steering Rule Added
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

File: ~/.kiro/steering.md
Section: ## Linting & Code Quality
Rule: "Always Use ESLint --fix Flag"

Full rule saved. This will apply to all future conversations.

Would you like me to analyze other patterns? (y/n)

Subsequent Behavior (Rule in Effect):

$ kiro "Check the code quality in src/"

🔧 Running Code Quality Check
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

$ eslint src/ --fix

[Applying steering rule: "Always Use ESLint --fix Flag"]

Fixed 12 errors automatically:
├─  7 × no-unused-vars
├─  3 × prefer-const
└─  2 × no-console

Remaining 3 errors require manual intervention:
├─ src/utils.js:45 - Undefined variable 'config'
├─ src/app.js:102 - Missing return statement
└─ src/api.js:23 - Async function without await

✓ Auto-fixable errors resolved
⚠ Manual errors remain - see above

Advanced: Multi-Pattern Analysis:

$ kiro "/meta-analyze last 100 messages"

🧠 Comprehensive Metacognitive Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Analyzing conversation history...
└─ Processed 100 messages, 42 tool calls, 18 corrections

Patterns Detected:

1. 📦 NPM Install Forgetting (12 occurrences)
   - You forget to run npm install after adding dependencies
   - Proposed Rule: "Always run npm install after package.json changes"

2. 🧪 Test Running Incomplete (8 occurrences)
   - You run tests but don't check if they passed
   - Proposed Rule: "Always verify test results before proceeding"

3. 📝 Commit Messages Too Vague (15 occurrences)
   - Commit messages like "fix bug" instead of descriptive
   - Proposed Rule: "Commit messages must describe WHAT and WHY"

4. 🔒 Secrets Exposure (2 occurrences)
   - You almost committed API keys in .env files
   - Proposed Rule: "NEVER commit .env files, always .gitignore them"

Generate steering rules for all 4 patterns? (y/n)

You’re seeing exactly what metacognition enables - a system that learns from its mistakes and self-improves through reflective analysis!

The Core Question You’re Answering

“How do you make an AI agent that learns from its own mistakes without retraining or fine-tuning?”

Before you write any code, sit with this question. Most AI systems are static - they repeat the same mistakes forever because they have no mechanism for self-correction. But humans learn through:

  1. Reflection - “I made a mistake”
  2. Analysis - “Why did I make it?”
  3. Abstraction - “What pattern caused this?”
  4. Rule Formation - “What rule would prevent it?”
  5. Application - “Follow the rule next time”

This is metacognition - thinking about thinking. The system watches itself, identifies failure modes, and updates its behavior.

Traditional approach (static):

Kiro: [makes mistake]
User: [corrects]
Kiro: "Sorry, fixed it"
[2 days later]
Kiro: [makes same mistake again]

Metacognitive approach (self-improving):

Kiro: [makes mistake]
User: [corrects]
Kiro: "I notice I've made this mistake 8 times. Let me analyze..."
Kiro: [proposes steering rule]
User: "Yes, add that rule"
Kiro: [updates steering]
[2 days later]
Kiro: [applies rule, avoids mistake]

This is how AlphaGo learned (self-play), how humans learn (reflection), and how expert systems evolve (knowledge base updates).

Concepts You Must Understand First

Stop and research these before coding:

  1. Metacognition (Thinking About Thinking)
    • What is metacognition? (awareness of one’s own thought processes)
    • How do humans self-correct? (error detection → analysis → strategy change)
    • What is the OODA loop? (Observe, Orient, Decide, Act)
    • Book Reference: “Thinking, Fast and Slow” Ch. 20-21 (Self-Monitoring) - Daniel Kahneman
  2. Conversation Analysis & Pattern Mining
    • How do you detect repeated patterns in text? (regex, n-grams, semantic clustering)
    • What is cosine similarity for semantic patterns? (vector comparison; see the sketch after this list)
    • How do you extract “correction events”? (user says “no, do it this way”)
    • Paper Reference: “Extracting Patterns from Conversational Data” - NLP literature
  3. Steering/System Prompts
    • What is a system prompt? (instructions that guide LLM behavior)
    • How do steering rules work? (constraints added to every request)
    • What’s the difference between few-shot examples and rules? (examples vs constraints)
    • Docs Reference: Anthropic’s “Prompt Engineering Guide”
  4. Rule Synthesis from Examples
    • How do you generalize from specific examples? (abstraction)
    • What makes a good rule? (clear, actionable, measurable)
    • How do you avoid overfitting rules? (too specific = not generalizable)
    • Book Reference: “Artificial Intelligence: A Modern Approach” Ch. 19 (Learning from Examples) - Russell & Norvig
  5. Feedback Loops & System Stability
    • What is a feedback loop? (output affects future input)
    • What is positive vs negative feedback? (amplifying vs dampening)
    • How do you prevent runaway rule creation? (too many rules = conflict)
    • Book Reference: “Thinking in Systems” Ch. 1 (Feedback Loops) - Donella Meadows
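As referenced in point 2 above, here is a minimal cosine-similarity sketch; embed() is a hypothetical function that turns a message into a numeric vector:

import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, ~0.0 = unrelated."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two corrections about the same mistake should score high (e.g. > 0.8):
# cosine_similarity(embed("You forgot npm install again"),
#                   embed("Run npm install after editing package.json"))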

Questions to Guide Your Design

Before implementing, think through these:

  1. Pattern Detection
    • How do you identify a “mistake”? (user correction keywords: “no”, “actually”, “you forgot”)
    • How many occurrences make a “pattern”? (3+ times = pattern, <3 = one-off)
    • How do you cluster similar mistakes? (semantic similarity of corrections)
  2. Analysis Triggering
    • User-initiated (“/meta-analyze”) vs automatic (after 3 corrections)?
    • Real-time (during conversation) vs batch (end of day)?
    • Threshold-based (trigger after N mistakes)?
  3. Rule Formulation
    • Template-based (“Always X when Y”) vs freeform?
    • Should rules include examples (few-shot) or just constraints?
    • How specific should rules be? (per-project vs global)
  4. Rule Storage & Application
    • Where are rules stored? (steering.md, JSON config, database)
    • How are rules loaded? (startup vs dynamic reload)
    • Priority/precedence: What if rules conflict? (specific > general)
  5. Validation & Testing
    • How do you test if a rule works? (simulate past mistakes, verify prevention)
    • How do you detect bad rules? (too restrictive, blocks valid actions)
    • Should rules expire? (remove if not triggered in 3 months)

Thinking Exercise

Trace Metacognitive Loop

Before coding, manually trace this self-improvement cycle:

Given:

  • Conversation history: 50 messages
  • User has corrected Kiro 3 times for forgetting npm install

Trace each step:

  1. Error Detection (Reflection)
    Message 12:
      Kiro: "I've added express to package.json"
      User: "You forgot to run npm install!"
      Kiro: "Installing now: npm install"
    
    Message 28:
      Kiro: "Added jsonwebtoken to dependencies"
      User: "npm install? You always forget this!"
      Kiro: "Sorry! Running npm install"
    
    Message 45:
      Kiro: "Updated to React 18 in package.json"
      User: "AGAIN! npm install!!"
      Kiro: "My apologies. Running npm install"
    
    • Question: How do you detect the correction pattern? (user frustration escalates)
  2. Pattern Extraction
    # Pseudocode
    corrections = []
    for i, msg in enumerate(messages):
        if user_corrected(msg):  # Contains "forgot", "you always", "again"
            corrections.append({
                'index': i,
                'context': messages[i-1],  # What Kiro did wrong
                'correction': msg,
                'fix': messages[i+1]  # What Kiro did to fix
            })
    
    # Group similar corrections
    clusters = cluster_by_similarity(corrections)
    # Cluster 1: "npm install" corrections (3 occurrences)
    
    • Question: What similarity threshold defines a cluster? (cosine > 0.8?)
  3. Root Cause Analysis
    Cluster: "NPM Install Forgetting" (3 occurrences)
    
    Common pattern:
    1. Kiro modifies package.json (add/update dependency)
    2. Kiro does NOT run npm install
    3. User reminds Kiro to run npm install
    4. Kiro runs npm install
    
    Root cause:
    - Current steering doesn't link package.json changes → npm install
    - Kiro treats them as independent actions
    
    • Question: How do you infer causality? (sequence analysis: A always followed by B)
  4. Rule Synthesis
    Proposed Steering Rule:
    
    ## Dependency Management
    
    **Always run `npm install` after modifying package.json**
    
    When you add, update, or remove dependencies in package.json,
    IMMEDIATELY run `npm install` to sync node_modules.
    
    Correct sequence:
    1. Edit package.json (add dependency)
    2. Run npm install
    3. Verify installation succeeded
    
    Don't forget this step - it's required for dependencies to be usable.
    
    • Question: Is this rule too specific? (what about yarn, pnpm?)
  5. User Approval & Application
    User: y (approves rule)
    
    # Rule added to ~/.kiro/steering.md
    # Next conversation, rule is loaded
    
    Kiro: "Adding lodash to package.json..."
    [Applying rule: "Always run npm install after modifying package.json"]
    Kiro: "Running npm install..."
    $ npm install
    Kiro: "✓ lodash installed successfully"
    
    • Question: How do you verify the rule prevented the mistake? (no correction needed)

Questions while tracing:

  • What if the user corrects something that’s actually context-specific? (rule would be wrong)
  • What if two rules conflict? (“Always X” vs “Never X in situation Y”)
  • What if a rule is too broad? (blocks valid edge cases)

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the difference between metacognition in humans and self-improvement in AI systems. What are the key similarities and differences?”

  2. “Your Kiro agent proposes a steering rule that’s too specific: ‘Always use port 3000 for Express servers.’ How would you generalize this into a better rule?”

  3. “You’ve added 50 steering rules over 6 months. Now Kiro is slow and rules conflict. How do you prune and consolidate rules?”

  4. “Walk me through how you would detect that a steering rule is harmful (blocking valid actions). What metrics would you track?”

  5. “How would you prevent an adversarial user from poisoning the steering rules by giving intentionally bad corrections?”

  6. “Explain the concept of ‘overfitting’ in machine learning. How does it relate to creating overly specific steering rules?”

Hints in Layers

Hint 1: Start with Manual Analysis Don’t automate pattern detection immediately. First, manually review your conversation history and identify 3 real mistakes Kiro made repeatedly. Write them down with examples.

Hint 2: Implement Correction Detection Scan conversation history for user corrections using keyword matching:

correction_keywords = [
    "you forgot",
    "you always",
    "again",
    "no, do it this way",
    "that's wrong",
    "actually",
    "incorrect"
]

for msg in messages:
    if any(kw in msg.content.lower() for kw in correction_keywords):
        # Mark as correction
        corrections.append(msg)

Hint 3: Cluster Similar Corrections Use embeddings to group semantically similar corrections:

# Embed each correction
correction_embeddings = [
    embed(c.content) for c in corrections
]

# Cluster embeddings by density (similar corrections land in the same cluster)
from sklearn.cluster import DBSCAN

labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(correction_embeddings)
# labels[i] is the cluster id of correction i (-1 = noise / one-off)
# Cluster 0: npm install corrections
# Cluster 1: eslint --fix corrections
# Cluster 2: commit message corrections

Hint 4: Extract Pattern Context For each cluster, extract the Kiro action that preceded the correction:

for cluster in clusters:
    for correction in cluster.corrections:
        prev_msg = get_previous_message(correction)  # What Kiro did
        next_msg = get_next_message(correction)      # How Kiro fixed it

        pattern = {
            'mistake': prev_msg.content,
            'correction': correction.content,
            'fix': next_msg.content
        }

Hint 5: Generate Rule Template Use an LLM to synthesize a rule from the pattern:

prompt = f"""
Based on these examples of a repeated mistake:

Example 1: {pattern_1}
Example 2: {pattern_2}
Example 3: {pattern_3}

Generate a steering rule that would prevent this mistake in the future.

Format:
## [Category]
**[Rule Title]**
[Rule description with examples of correct behavior]
"""

proposed_rule = llm(prompt)

Hint 6: Present for User Approval Display the proposed rule and ask for confirmation:

print(f"""
Proposed Steering Rule:

{proposed_rule}

Impact:
- Occurrences: {len(pattern.examples)}
- Estimated time saved: {time_estimate}

Add this rule to steering.md? (y/n)
""")

Hint 7: Append Rule to Steering File If approved, append to ~/.kiro/steering.md:

import os

if user_approves:
    with open(os.path.expanduser('~/.kiro/steering.md'), 'a') as f:
        f.write(f"\n\n{proposed_rule}\n")
    print("✅ Rule added successfully")

Hint 8: Verify Rule Application In future conversations, check if the rule prevents the mistake:

# Load steering rules at startup
steering_rules = load_steering('~/.kiro/steering.md')

# Before each action, check rules
if action == 'modify package.json':
    relevant_rules = [r for r in steering_rules if 'npm install' in r]
    if relevant_rules:
        print("[Applying rule: 'Always run npm install after package.json changes']")
        run_npm_install()

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Metacognition & Self-Monitoring | “Thinking, Fast and Slow” by Daniel Kahneman | Ch. 20-21 |
| Learning from Examples | “Artificial Intelligence: A Modern Approach” by Russell & Norvig | Ch. 19 |
| Feedback Loops & Systems | “Thinking in Systems” by Donella Meadows | Ch. 1 |
| Pattern Mining in Text | “Speech and Language Processing” by Jurafsky & Martin | Ch. 8 (Sequence Labeling) |
| Prompt Engineering | “The Prompt Engineering Guide” (online) | All chapters |

Common Pitfalls & Debugging

Problem 1: “Too many false positives - normal feedback detected as corrections”

  • Why: Overly broad keyword matching. “Actually, that looks good” is not a correction.
  • Fix: Use sentiment analysis or semantic similarity. Corrections have negative sentiment + suggest alternative action.
  • Quick test: Review 10 detected “corrections” - should all be actual mistakes, not positive feedback.
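One hedge against this (a sketch, not built-in Kiro behavior): only count a message as a correction when it combines a negative keyword with a suggested alternative action, and drop ambiguous keywords like "actually":

CORRECTION_KEYWORDS = ["you forgot", "you always", "again", "that's wrong", "incorrect"]
ACTION_HINTS = ["run ", "use ", "add ", "don't ", "always ", "never ", "--"]

def is_correction(message: str) -> bool:
    """Heuristic: negative keyword plus a suggested alternative action."""
    text = message.lower()
    has_keyword = any(kw in text for kw in CORRECTION_KEYWORDS)
    suggests_action = any(hint in text for hint in ACTION_HINTS)
    return has_keyword and suggests_action

# is_correction("Actually, that looks good")                    -> False
# is_correction("You forgot --fix, always run eslint --fix")    -> True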

Problem 2: “Proposed rules are too specific - don’t generalize”

  • Why: Rule synthesized from a single example, not a pattern.
  • Fix: Require 3+ examples before creating a rule. Use LLM to generalize.
  • Quick test: Proposed rule should apply to related scenarios, not just the exact mistake.

Problem 3: “Steering file grows unbounded - 100+ rules conflict”

  • Why: No pruning or consolidation mechanism.
  • Fix: Implement rule expiry (remove if not triggered in 6 months), rule merging (combine similar rules).
  • Quick test: Count rules - should be <50 active rules at any time.
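A sketch of one pruning approach, assuming a hypothetical JSON sidecar (e.g. ~/.kiro/steering-stats.json) that records when each rule last fired:

import json
import os
from datetime import datetime, timedelta

STATS_FILE = os.path.expanduser("~/.kiro/steering-stats.json")  # hypothetical sidecar
MAX_AGE = timedelta(days=180)  # drop rules unused for roughly 6 months

def stale_rules(now=None):
    """Return titles of rules whose last trigger is older than MAX_AGE."""
    now = now or datetime.now()
    with open(STATS_FILE) as f:
        stats = json.load(f)  # {"rule title": "2024-01-02T12:00:00", ...}
    return [
        title for title, last in stats.items()
        if now - datetime.fromisoformat(last) > MAX_AGE
    ]

# Review the returned titles manually before deleting them from steering.md.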

Problem 4: “Rule prevents valid actions - too restrictive”

  • Why: Rule doesn’t account for edge cases or exceptions.
  • Fix: Add exception clauses to rules (“unless user explicitly requests X”).
  • Quick test: User should be able to override any rule with explicit instruction.

Problem 5: “Metacognitive analysis is too slow - takes 30 seconds”

  • Why: Embedding all messages and clustering on every request.
  • Fix: Cache analysis results, run batch analysis periodically (not real-time).
  • Quick test: Analysis should complete in <5 seconds for 100-message history.

Definition of Done

  • Correction detection works: Identifies when user corrects Kiro’s behavior
  • Pattern clustering works: Groups similar corrections into clusters (3+ examples)
  • Root cause analysis works: Identifies the Kiro action that triggered correction
  • Rule synthesis works: LLM generates a clear, actionable steering rule
  • User approval flow works: Displays proposed rule and waits for confirmation
  • Rule persistence works: Approved rules are saved to steering.md
  • Rule application works: New conversations load and apply steering rules
  • Mistake prevention verified: Past mistakes no longer occur after rule is added
  • Rule quality checks: Rules are general enough to apply broadly, specific enough to be actionable
  • Performance is acceptable: Metacognitive analysis completes in <10 seconds

Project 31: “The Legacy Code Archaeologist” — Code Understanding

| Attribute | Value |
|-----------|-------|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | C / Java (Legacy) |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Code Understanding |

What you’ll build: Map a legacy repo and generate a call-graph summary.

Why it teaches Exploration: The AI becomes a codebase cartographer.

Success criteria:

  • DIAGRAM.md matches real entry points and call flow.

Real World Outcome

You will have a Kiro CLI workflow that automatically reverse-engineers legacy codebases by generating architectural diagrams, call graphs, and dependency maps from undocumented code. When you run it, you’ll see:

Analyzing a Legacy Codebase:

$ kiro "Analyze this legacy Java codebase and create a DIAGRAM.md with the architecture"

🔍 Legacy Code Archaeologist - Codebase Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Phase 1: Discovery
├─ Scanning directory structure...
│  └─ Found 247 Java files across 38 packages
├─ Identifying entry points...
│  ├─ Main.java (static void main)
│  ├─ ServletInitializer.java (web entry)
│  └─ ScheduledTasks.java (cron jobs)
└─ Extracting package structure...
   ✓ com.legacy.app (core business logic)
   ✓ com.legacy.db (database layer)
   ✓ com.legacy.web (servlets & controllers)
   ✓ com.legacy.util (utilities)

Phase 2: Call Graph Construction
├─ Parsing method calls...
│  └─ Analyzed 1,843 methods across 247 files
├─ Building dependency graph...
│  └─ Detected 4,521 method-to-method calls
└─ Identifying hot paths...
   ✓ UserService.authenticate() called from 23 places
   ✓ DatabaseConnection.execute() called from 156 places
   ✓ LegacyValidator.validate() called from 89 places

Phase 3: Architecture Extraction
├─ Detecting design patterns...
│  ├─ Singleton: DatabaseConnection, ConfigManager
│  ├─ Factory: UserFactory, OrderFactory
│  └─ Observer: EventBus, NotificationManager
├─ Identifying layers...
│  └─ 3-tier architecture (Web → Service → DAO)
└─ Extracting data flow...
   ✓ HTTP Request → Servlet → Service → DAO → Database

✍️ Generating DIAGRAM.md...
✓ Architecture diagram created
✓ Call graph generated
✓ Dependency map visualized
✓ Hot paths documented

File created: DIAGRAM.md (523 lines)

Generated DIAGRAM.md Content:

# Legacy App Architecture - Reverse Engineered

## System Overview

This is a 3-tier Java web application built circa 2008 (Servlet 2.5 era).
No documentation exists - this diagram was generated via static analysis.

## Entry Points

1. **Web Requests**: `com.legacy.web.MainServlet`
2. **Scheduled Jobs**: `com.legacy.jobs.ScheduledTasks`
3. **CLI Admin Tool**: `com.legacy.admin.Main`

## Architecture Diagram

┌─────────────────────────────────────────────────┐
│                   WEB LAYER                     │
│  ┌─────────────────────────────────────────┐    │
│  │ MainServlet, UserServlet, OrderServlet  │    │
│  └─────────────────────────────────────────┘    │
└─────────────────┬───────────────────────────────┘
                  │ (HTTP requests)
                  ↓
┌─────────────────────────────────────────────────┐
│                 SERVICE LAYER                   │
│  ┌─────────────────────────────────────────┐    │
│  │ UserService, OrderService, AuthService  │    │
│  │ PaymentService, NotificationService     │    │
│  └─────────────────────────────────────────┘    │
└─────────────────┬───────────────────────────────┘
                  │ (business logic)
                  ↓
┌─────────────────────────────────────────────────┐
│                   DAO LAYER                     │
│  ┌─────────────────────────────────────────┐    │
│  │ UserDAO, OrderDAO, PaymentDAO           │    │
│  │ (JDBC-based, no ORM)                    │    │
│  └─────────────────────────────────────────┘    │
└─────────────────┬───────────────────────────────┘
                  │ (SQL queries)
                  ↓
┌─────────────────────────────────────────────────┐
│                   DATABASE                      │
│  MySQL 5.6 (inferred from JDBC driver)          │
│  Schema: users, orders, payments, audit_log     │
└─────────────────────────────────────────────────┘


## Critical Path: User Authentication

HTTP POST /login
  │
  ↓
MainServlet.doPost()
  │
  ├─ validate(request)
  │    └─ LegacyValidator.validate()   [89 call sites!]
  │
  ├─ UserService.authenticate(username, password)
  │    │
  │    ├─ UserDAO.findByUsername()
  │    │    └─ DatabaseConnection.execute(sql)
  │    │         └─ "SELECT * FROM users WHERE username = ?"
  │    │
  │    └─ PasswordHasher.verify(inputPwd, storedHash)
  │         └─ WARNING: Uses MD5 (insecure!)
  │
  ├─ SessionManager.createSession(user)
  │
  └─ response.redirect("/dashboard")


## Dependency Hot Spots

(Methods called most frequently - refactoring risks)

1. **DatabaseConnection.execute()** - 156 call sites
   - Every database query goes through this
   - Refactoring would break the entire app

2. **LegacyValidator.validate()** - 89 call sites
   - Input validation scattered everywhere
   - No clear validation strategy

3. **UserService.authenticate()** - 23 call sites
   - Called from multiple servlets
   - Session management tightly coupled

## Design Pattern Usage

| Pattern | Implementation | Notes |
|---------|----------------|-------|
| Singleton | DatabaseConnection, ConfigManager | Thread-unsafe (bug risk) |
| Factory | UserFactory, OrderFactory | Creates domain objects |
| Observer | EventBus, NotificationManager | Event-driven async tasks |
| DAO | *DAO classes | Raw JDBC, no ORM |

## Technical Debt Detected

1. ❌ **MD5 Password Hashing** (CRITICAL)
   - File: `PasswordHasher.java:42`
   - Risk: Passwords crackable in seconds

2. ⚠️ **Thread-Unsafe Singleton** (HIGH)
   - File: `DatabaseConnection.java`
   - Risk: Race conditions under load

3. ⚠️ **No Connection Pooling** (MEDIUM)
   - Each request creates new DB connection
   - Scalability bottleneck

4. ℹ️ **Servlet 2.5** (LOW)
   - Ancient API (2005)
   - Recommend upgrade to Servlet 4.0+

You’re seeing exactly what code archaeology enables - transforming undocumented legacy systems into understandable, maintainable architectures!

The Core Question You’re Answering

“How do you understand a 10-year-old codebase with zero documentation and the original developers long gone?”

Before you write any code, sit with this question. Every developer faces this:

  • You inherit a legacy project (acquired company, staff turnover, archaeological dig)
  • No README, no diagrams, no comments worth reading
  • “Just read the code” - but there are 500 files and 100K lines

Manual approach (weeks of work):

Week 1: Read random files, get overwhelmed
Week 2: Find the entry point (main() or servlet)
Week 3: Trace execution paths with a debugger
Week 4: Draw diagrams on a whiteboard
Week 5: Finally understand 20% of the system

AI-assisted approach (hours):

Step 1: Feed entire codebase to Kiro
Step 2: "Map the architecture and generate call graphs"
Step 3: Review generated DIAGRAM.md
Step 4: Ask follow-up questions ("Why does UserService call PaymentDAO directly?")
Result: 80% understanding in 4 hours

This is code archaeology - using static analysis, pattern recognition, and LLM reasoning to reverse-engineer systems.

Concepts You Must Understand First

Stop and research these before coding:

  1. Static Code Analysis
    • What is an Abstract Syntax Tree (AST)? (parse tree of code structure)
    • How do you extract method calls from source code? (AST traversal)
    • What tools exist? (Understand, Sourcetrail, javaparser, tree-sitter)
    • Book Reference: “Compilers: Principles, Techniques, and Tools” Ch. 4 (Syntax Analysis) - Aho, Lam, Sethi, Ullman
  2. Call Graph Construction
    • What is a call graph? (directed graph: nodes = methods, edges = calls)
    • Static vs dynamic call graphs (compile-time vs runtime)
    • How do you handle polymorphism? (method dispatch is ambiguous)
    • Paper Reference: “Practical Algorithms for Call Graph Construction” - Grove & Chambers
  3. Dependency Analysis
    • What is coupling? (how tightly connected are modules)
    • What is cohesion? (how focused is a module’s purpose)
    • How do you detect circular dependencies? (cycle detection in directed graphs)
    • Book Reference: “Clean Architecture” Ch. 13-14 (Component Cohesion/Coupling) - Robert C. Martin
  4. Design Pattern Recognition
    • Common patterns: Singleton, Factory, Observer, Strategy, DAO
    • How do you detect patterns in code? (structural matching, AST patterns)
    • What are anti-patterns? (God Object, Spaghetti Code, Shotgun Surgery)
    • Book Reference: “Design Patterns” - Gamma, Helm, Johnson, Vlissides (Gang of Four)
  5. Legacy Code Characteristics
    • What defines “legacy”? (no tests, no docs, fear of change)
    • How do you prioritize what to understand first? (entry points, hot paths)
    • What is the strangler fig pattern? (gradually replace legacy system)
    • Book Reference: “Working Effectively with Legacy Code” Ch. 1-2 - Michael Feathers

Questions to Guide Your Design

Before implementing, think through these:

  1. Entry Point Detection
    • How do you find main()? (search for public static void main)
    • What about web apps? (Servlet annotations, web.xml)
    • What about background jobs? (@Scheduled, cron config)
  2. Call Graph Scope
    • Full codebase or just application code? (exclude libraries?)
    • How deep to trace calls? (1 level? All transitive dependencies?)
    • How do you handle reflection? (runtime method invocation)
  3. Visualization Format
    • ASCII art in markdown (simple, readable in GitHub)
    • Graphviz DOT (generates PNG diagrams)
    • Mermaid.js (renders in markdown viewers)
  4. Prioritization
    • What’s most important to document first? (entry points, critical paths)
    • How do you identify “hot spots”? (most-called methods)
    • What about dead code? (unreachable methods)
  5. Technical Debt Detection
    • Security issues (MD5, SQL injection, XSS)
    • Performance problems (N+1 queries, missing indexes)
    • Maintainability issues (God classes, long methods)

Thinking Exercise

Trace Architecture Extraction

Before coding, manually analyze this legacy Java snippet:

Given:

// MainServlet.java
public class MainServlet extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
        String username = req.getParameter("username");
        String password = req.getParameter("password");

        User user = UserService.getInstance().authenticate(username, password);
        if (user != null) {
            SessionManager.createSession(req, user);
            resp.sendRedirect("/dashboard");
        }
    }
}

// UserService.java
public class UserService {
    private static UserService instance;

    public static UserService getInstance() {
        if (instance == null) {
            instance = new UserService();
        }
        return instance;
    }

    public User authenticate(String username, String password) {
        User user = UserDAO.findByUsername(username);
        if (user != null && PasswordHasher.verify(password, user.getPasswordHash())) {
            return user;
        }
        return null;
    }
}

Trace the analysis:

  1. Entry Point Identification
    • MainServlet.doPost() is the entry point (HTTP POST /login)
    • Question: How do you know this handles /login? (need web.xml or @WebServlet annotation)
  2. Call Graph
    MainServlet.doPost()
      ├─ UserService.getInstance()
      ├─ UserService.authenticate()
      │    ├─ UserDAO.findByUsername()
      │    └─ PasswordHasher.verify()
      └─ SessionManager.createSession()
    
    • Question: What methods does UserDAO.findByUsername() call? (need to analyze UserDAO source)
  3. Pattern Detection
    • Singleton: UserService.getInstance() (lazy initialization)
    • DAO: UserDAO (data access layer)
    • Question: Is this thread-safe? (No: the lazy initialization is unsynchronized, so two threads can each create a separate instance)
  4. Technical Debt
    • Singleton is thread-unsafe (race condition)
    • No input validation (SQL injection risk)
    • Hash comparison inside PasswordHasher.verify() may not be constant-time (timing-attack risk)
    • Question: What’s the priority order for fixing? (security > concurrency > style)

Questions while tracing:

  • How do you handle method overloading? (multiple findByUsername() signatures)
  • What if UserDAO uses reflection? (can’t see calls statically)
  • How deep should the call graph go? (stop at library boundaries?)

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Walk me through how you would reverse-engineer a 100K-line Java codebase with no documentation. What’s your systematic approach?”

  2. “You detect a Singleton pattern in legacy code that’s accessed from 50 places. How would you refactor it safely without breaking the app?”

  3. “Your call graph tool reports 10,000 method calls. How would you prioritize which ones to document first?”

  4. “Explain the difference between static and dynamic call graphs. When would you need a dynamic call graph despite the extra complexity?”

  5. “You find a method called 500 times across the codebase. How would you determine if this is a design problem or just a legitimate utility method?”

  6. “How would you detect and visualize circular dependencies in a legacy codebase? What tools and algorithms would you use?”
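For question 6, one answer is a depth-first search with a recursion stack over the same {caller: [callees]} call graph built in the hints below; a minimal sketch:

def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
    visiting, visited = set(), set()

    def dfs(node, path):
        visiting.add(node)
        path.append(node)
        for neighbor in graph.get(node, []):
            if neighbor in visiting:                      # back edge -> cycle found
                return path[path.index(neighbor):] + [neighbor]
            if neighbor not in visited:
                cycle = dfs(neighbor, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        visited.add(node)
        path.pop()
        return None

    for start in graph:
        if start not in visited:
            cycle = dfs(start, [])
            if cycle:
                return cycle
    return None

# find_cycle({'A': ['B'], 'B': ['C'], 'C': ['A']})  ->  ['A', 'B', 'C', 'A']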

Hints in Layers

Hint 1: Start with Entry Point Detection Use grep to find main methods and servlets:

# Find Java main methods
rg "public static void main" --type java

# Find servlets
rg "extends HttpServlet" --type java
rg "@WebServlet" --type java

# Find Spring controllers
rg "@Controller|@RestController" --type java

Hint 2: Parse Source Code into AST Use a parsing library (don’t write a parser from scratch):

# For Java: use javalang or tree-sitter
import javalang

tree = javalang.parse.parse(java_source_code)

for path, node in tree.filter(javalang.tree.MethodInvocation):
    print(f"Method call: {node.member}")
    # Extract: UserService.getInstance().authenticate()

Hint 3: Build Call Graph Create a directed graph of method calls:

call_graph = {}  # {caller: [callees]}

for file in java_files:
    tree = parse(file)
    for method in tree.methods:
        caller = f"{method.class_name}.{method.name}"
        callees = extract_method_calls(method)
        call_graph[caller] = callees

# Example output:
# {
#   'MainServlet.doPost': ['UserService.authenticate', 'SessionManager.createSession'],
#   'UserService.authenticate': ['UserDAO.findByUsername', 'PasswordHasher.verify']
# }

Hint 4: Detect Hot Spots Count incoming edges to find most-called methods:

call_counts = {}
for caller, callees in call_graph.items():
    for callee in callees:
        call_counts[callee] = call_counts.get(callee, 0) + 1

# Sort by frequency
hot_spots = sorted(call_counts.items(), key=lambda x: x[1], reverse=True)
# [('DatabaseConnection.execute', 156), ('LegacyValidator.validate', 89), ...]

Hint 5: Identify Design Patterns Pattern matching on AST structure:

# Detect Singleton (lazy initialization)
for path, method in tree.filter(javalang.tree.MethodDeclaration):
    if method.name == 'getInstance' and 'static' in method.modifiers:
        # Check the body for: if (instance == null) instance = new ...
        owner = next(n for n in reversed(path) if isinstance(n, javalang.tree.ClassDeclaration))
        print(f"Singleton detected: {owner.name}")

Hint 6: Generate ASCII Diagram Format call paths as a tree:

def print_call_tree(method, graph, depth=0, max_depth=3):
    if depth > max_depth:
        return
    indent = "  " * depth
    print(f"{indent}├─ {method}")
    for callee in graph.get(method, []):
        print_call_tree(callee, graph, depth + 1)

# Output:
# ├─ MainServlet.doPost
#   ├─ UserService.authenticate
#     ├─ UserDAO.findByUsername
#     └─ PasswordHasher.verify

Hint 7: Use LLM for Pattern Explanation Once you have the call graph, ask Kiro to explain it:

prompt = f"""
Based on this call graph:

{json.dumps(call_graph, indent=2)}

1. What architectural pattern is this? (MVC, layered, etc.)
2. Identify the entry points
3. Spot any design issues or anti-patterns
4. Generate a markdown diagram
"""

explanation = llm(prompt)

Hint 8: Validate Generated Diagram Cross-reference with actual code execution:

  • Run the app with a debugger
  • Set breakpoints at entry points
  • Trace actual call stack
  • Compare with static analysis diagram

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Static Code Analysis | “Compilers: Principles, Techniques, and Tools” by Aho et al. | Ch. 4 (Syntax Analysis) |
| Call Graph Algorithms | “Engineering a Compiler” by Cooper & Torczon | Ch. 9 (Data-Flow Analysis) |
| Design Patterns | “Design Patterns” by Gang of Four | All chapters |
| Legacy Code Understanding | “Working Effectively with Legacy Code” by Michael Feathers | Ch. 1-2, 16 |
| Software Architecture | “Clean Architecture” by Robert C. Martin | Ch. 13-14 (Components) |

Common Pitfalls & Debugging

Problem 1: “Call graph includes too many library methods (java.util.*, etc.)”

  • Why: No filtering - you’re graphing the entire JDK.
  • Fix: Filter out standard library packages. Only graph application code.
  • Quick test: Call graph should have <1000 nodes for a typical app.
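A sketch of that filter, assuming fully qualified callee names and a prefix blocklist you extend for your own frameworks:

# Keep only application code in the call graph
EXCLUDED_PREFIXES = ("java.", "javax.", "sun.", "org.apache.", "com.fasterxml.")

def filter_call_graph(call_graph):
    """Drop callers and callees that belong to the JDK or third-party libraries."""
    def is_app_code(name):
        return not name.startswith(EXCLUDED_PREFIXES)

    return {
        caller: [callee for callee in callees if is_app_code(callee)]
        for caller, callees in call_graph.items()
        if is_app_code(caller)
    }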

Problem 2: “Missing method calls - graph is incomplete”

  • Why: Reflection, lambda expressions, or method references not detected.
  • Fix: Combine static analysis with dynamic profiling (run app with instrumentation).
  • Quick test: Cross-check against actual execution trace from debugger.

Problem 3: “Singleton detection produces false positives”

  • Why: Any method named getInstance() triggers detection.
  • Fix: Check for static field + lazy initialization pattern, not just method name.
  • Quick test: Manual code review of detected Singletons.

Problem 4: “Generated diagram is unreadable - 1000+ lines of ASCII”

  • Why: Showing entire call graph instead of high-level architecture.
  • Fix: Create multiple diagrams: overview + detailed sub-graphs for each layer.
  • Quick test: Overview diagram should fit on one screen (<50 lines).

Problem 5: “Analysis takes 10 minutes for 500 files”

  • Why: Parsing each file from scratch on every run.
  • Fix: Cache parsed ASTs, only re-parse changed files.
  • Quick test: Second run should be <5 seconds (cache hit).
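A sketch of the cache, keyed on each file's content hash so unchanged files are never re-parsed; parse_file stands in for whichever parser you use, and the cache assumes its AST objects are picklable:

import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path(".ast-cache")  # hypothetical cache location
CACHE_DIR.mkdir(exist_ok=True)

def parse_with_cache(path, parse_file):
    """Re-parse a source file only when its contents have changed."""
    source = Path(path).read_bytes()
    digest = hashlib.sha256(source).hexdigest()
    cache_path = CACHE_DIR / f"{digest}.pkl"

    if cache_path.exists():                       # cache hit: skip parsing
        return pickle.loads(cache_path.read_bytes())

    tree = parse_file(source.decode("utf-8"))     # cache miss: parse and store
    cache_path.write_bytes(pickle.dumps(tree))
    return tree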

Definition of Done

  • Entry points identified: Main methods, servlets, scheduled tasks are found
  • Call graph built: Method-to-method calls extracted from source code
  • Hot spots detected: Most-called methods identified (top 10)
  • Design patterns recognized: Singleton, Factory, DAO, etc. detected
  • Architecture diagram generated: High-level 3-tier or N-tier diagram created
  • Critical paths documented: Authentication, payment, etc. workflows traced
  • Technical debt flagged: Security issues (MD5, SQL injection) highlighted
  • Markdown format: DIAGRAM.md is readable in GitHub
  • Validation: Diagram matches actual execution (verified with debugger)
  • Performance: Analysis completes in <2 minutes for 500-file codebase

Project 32: “The Reverse Documenter” — Documentation

| Attribute | Value |
|-----------|-------|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Markdown |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Documentation |

What you’ll build: Generate docs and tests from an undocumented function.

Why it teaches Verification: If the AI cannot explain it, the code is too complex.

Success criteria:

  • A new unit test covers the documented edge cases.

Real World Outcome

You’ll have a Kiro workflow that analyzes undocumented code and generates:

  1. JSDoc/docstring comments explaining purpose, parameters, return values, and edge cases
  2. Unit tests covering normal cases, edge cases, and error conditions
  3. README sections documenting public APIs
  4. Type definitions (TypeScript interfaces, Python type hints)

Example Session:

$ kiro "Document the calculateDiscount function in src/pricing.ts and generate tests"

🔍 Analyzing undocumented function...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Reading: src/pricing.ts
Function: calculateDiscount(price: number, userTier: string, promoCode?: string)

Static Analysis Results:
├─ Parameters: 3 (2 required, 1 optional)
├─ Return type: number
├─ Branches: 8 (if/else conditions)
├─ External calls: 2 (fetchPromoDetails, getUserHistory)
└─ Edge cases detected: 5 (null checks, tier validation, promo expiration)

📝 Generating Documentation...

/**
 * Calculates the final discount percentage for a purchase.
 *
 * Combines base tier discount with promotional codes. Tier discounts
 * stack additively with promo codes. Premium tier users receive
 * an additional 5% on top of promo discounts.
 *
 * @param price - The original price before any discounts (must be > 0)
 * @param userTier - User membership tier: 'free', 'pro', or 'premium'
 * @param promoCode - Optional promotional code (e.g., 'SAVE20')
 * @returns Final discount percentage (0-100)
 *
 * @throws {Error} If price <= 0 or userTier is invalid
 *
 * @example
 * calculateDiscount(100, 'pro', 'SAVE20') // Returns 25 (5% tier + 20% promo)
 * calculateDiscount(100, 'premium', 'SAVE20') // Returns 30 (5% tier + 20% promo + 5% premium bonus)
 */

✅ Documentation added to src/pricing.ts

🧪 Generating Test Cases...

describe('calculateDiscount', () => {
  it('should apply tier discount only when no promo code', () => {
    expect(calculateDiscount(100, 'pro')).toBe(5);
  });

  it('should stack tier and promo discounts', () => {
    expect(calculateDiscount(100, 'pro', 'SAVE20')).toBe(25);
  });

  it('should apply premium bonus to promo codes', () => {
    expect(calculateDiscount(100, 'premium', 'SAVE20')).toBe(30);
  });

  it('should throw error for negative price', () => {
    expect(() => calculateDiscount(-10, 'pro')).toThrow('Price must be positive');
  });

  it('should throw error for invalid tier', () => {
    expect(() => calculateDiscount(100, 'gold')).toThrow('Invalid user tier');
  });

  it('should return 0 for expired promo codes', () => {
    expect(calculateDiscount(100, 'pro', 'EXPIRED2023')).toBe(5); // Only tier discount
  });
});

✅ Test file created: src/__tests__/pricing.test.ts
✅ 6 test cases covering all branches

📊 Coverage Analysis:
├─ Line coverage: 100%
├─ Branch coverage: 100%
└─ Edge cases covered: 5/5

💡 Complexity Signal: calculateDiscount has a cyclomatic complexity of 8.
   Consider refactoring into smaller functions if it grows further.

What You See:

  • Auto-generated JSDoc comments explaining every parameter, return value, and edge case
  • Complete test suite covering happy paths, edge cases, and error conditions
  • Coverage report showing 100% branch coverage
  • Complexity warnings when functions are too dense

The Core Question You’re Answering

“How can AI reverse-engineer intent from undocumented code, and can it generate tests that prove its understanding is correct?”

This project forces you to confront the verification problem: if Kiro generates documentation that sounds plausible but is wrong, the tests will fail. This feedback loop ensures the AI actually understands the code, not just pattern-matches documentation style.


Concepts You Must Understand First

Stop and research these before coding:

  1. Static Code Analysis (AST Parsing)
    • What is an Abstract Syntax Tree and how do you traverse it?
    • How do you extract function signatures, parameter types, and control flow?
    • How do you detect edge cases (null checks, boundary conditions)?
    • Book Reference: “Compilers: Principles and Practice” by Parag H. Dave - Ch. 2-3
  2. Test Generation Strategies
    • What is the difference between property-based testing and example-based testing?
    • How do you identify equivalence classes for input partitioning?
    • What is branch coverage vs line coverage vs path coverage?
    • Book Reference: “The Art of Software Testing” by Glenford J. Myers - Ch. 4-5
  3. Documentation Standards
    • What are JSDoc, docstring, and XML documentation comment conventions?
    • How do you write documentation that survives refactoring?
    • What level of detail is appropriate for public vs private APIs?
    • Reference: JSDoc specification, PEP 257 (Python Docstring Conventions)
  4. Cyclomatic Complexity
    • How do you measure code complexity (McCabe metric)?
    • Why does high complexity correlate with bugs?
    • When should you refactor based on complexity scores?
    • Book Reference: “Code Complete” by Steve McConnell - Ch. 19

Questions to Guide Your Design

Before implementing, think through these:

  1. Code Understanding
    • How will you parse the target function (AST parser vs regex vs LLM-based)?
    • How will you identify edge cases (static analysis vs symbolic execution)?
    • How will you handle external dependencies (mocking vs integration tests)?
    • How will you detect the function’s actual behavior vs its intended behavior?
  2. Documentation Quality
    • How will you validate that generated docs match actual behavior?
    • How will you avoid hallucinating functionality that doesn’t exist?
    • How will you decide which details to include vs omit?
    • How will you maintain docs when code changes (watch for drift)?
  3. Test Coverage
    • How will you ensure tests actually validate the documented behavior?
    • How will you generate realistic test data (random vs domain-specific)?
    • How will you avoid brittle tests that break on refactoring?
    • How will you measure test quality (mutation testing)?

Thinking Exercise

Exercise: Analyze This Undocumented Function

Given this undocumented JavaScript function:

function process(data, opts) {
  if (!data) return [];
  const result = [];
  const limit = opts?.max || 100;

  for (let i = 0; i < data.length && i < limit; i++) {
    if (data[i].status === 'active' || opts?.includeInactive) {
      result.push({
        ...data[i],
        processed: true,
        timestamp: Date.now()
      });
    }
  }

  return opts?.reverse ? result.reverse() : result;
}

Questions while analyzing:

  • What are the possible input types for data and opts?
  • What are all the edge cases (null data, empty array, missing opts, etc.)?
  • What is the function’s actual purpose based on its behavior?
  • What would be a good name for this function?
  • What test cases would prove you understand its behavior?
  • What happens if data is not an array? Should that be documented/tested?

Expected Documentation:

/**
 * Filters and processes active records from a dataset, with optional limits and ordering.
 *
 * @param {Array<{status: string}>} data - Array of objects with at least a `status` field
 * @param {Object} [opts] - Optional configuration
 * @param {number} [opts.max=100] - Maximum number of records to process
 * @param {boolean} [opts.includeInactive=false] - Whether to include non-active records
 * @param {boolean} [opts.reverse=false] - Whether to reverse the output order
 * @returns {Array<Object>} Processed records with added `processed` and `timestamp` fields
 *
 * @example
 * process([{status: 'active', id: 1}], {max: 50})
 * // Returns: [{status: 'active', id: 1, processed: true, timestamp: 1704211234567}]
 */

Expected Test Cases:

  1. Returns empty array when data is null/undefined
  2. Filters out inactive records by default
  3. Includes inactive records when opts.includeInactive is true
  4. Limits output to opts.max records
  5. Reverses output when opts.reverse is true
  6. Adds processed: true and current timestamp to each record

The Interview Questions They’ll Ask

  1. “How would you detect if AI-generated documentation is hallucinating functionality that doesn’t exist in the code?”

  2. “Explain the difference between documenting what code does vs why it does it. Which should AI focus on?”

  3. “How would you validate that generated tests actually cover the documented edge cases?”

  4. “What strategies would you use to keep documentation in sync with code as it evolves?”

  5. “How would you measure the quality of AI-generated tests (beyond simple code coverage)?”

  6. “Explain how mutation testing could validate that your tests actually catch bugs, not just execute lines.”


Hints in Layers

Hint 1: AST-Based Analysis Use a proper parser (TypeScript Compiler API, Babel, tree-sitter) to extract:

  • Function signature (name, parameters, return type)
  • Control flow branches (if/else, switch, loops)
  • External dependencies (function calls, imports)
  • Type annotations (if available)
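To see the shape of this analysis without wiring up the TypeScript Compiler API, here is a sketch of the same idea using Python's built-in ast module on a small Python function; a TypeScript version would follow the same visitor pattern via ts.createSourceFile:

import ast

source = '''
def calculate_discount(price, user_tier, promo_code=None):
    if price <= 0:
        raise ValueError("Price must be positive")
    if user_tier not in ("free", "pro", "premium"):
        raise ValueError("Invalid user tier")
    return 5 if user_tier == "pro" else 0
'''

tree = ast.parse(source)
func = tree.body[0]  # the FunctionDef node

print("name:", func.name)
print("params:", [a.arg for a in func.args.args])
print("defaults:", len(func.args.defaults), "optional parameter(s)")
print("branches:", sum(isinstance(n, (ast.If, ast.For, ast.While)) for n in ast.walk(func)))
print("raises:", sum(isinstance(n, ast.Raise) for n in ast.walk(func)))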

Hint 2: Edge Case Detection Look for these patterns in the AST:

  • if (!x) or if (x == null) → null check
  • if (arr.length === 0) → empty array check
  • if (x < 0) or if (x > MAX) → boundary conditions
  • throw new Error(...) → error cases
  • try/catch → exception handling

Hint 3: Test Generation Strategy For each branch in the code:

  1. Generate a test that triggers that branch
  2. Assert the expected output for that branch
  3. Add a test for the inverse condition (branch not taken)
  4. Add boundary tests (min, max, just-above, just-below)

Hint 4: Documentation Validation Loop

1. Generate documentation from code analysis
2. Generate tests from documentation
3. Run tests against actual code
4. If tests fail → documentation was wrong → regenerate
5. If tests pass → documentation matches behavior ✓

Hint 5: Complexity Signals If a function has:

  • Cyclomatic complexity > 10 → suggest refactoring before documenting
  • More than 5 parameters → suggest object parameter pattern
  • Deeply nested logic → suggest extracting helper functions
  • No return type annotation → infer and suggest adding it
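A rough sketch of the McCabe count for Python functions (a base of 1 plus one per decision point); dedicated tools such as radon or ESLint's complexity rule compute this more carefully:

import ast

def cyclomatic_complexity(func_node):
    """McCabe metric: 1 + number of decision points in the function body."""
    decision_nodes = (ast.If, ast.For, ast.While, ast.And, ast.Or,
                      ast.ExceptHandler, ast.IfExp)
    return 1 + sum(isinstance(n, decision_nodes) for n in ast.walk(func_node))

tree = ast.parse(open("pricing.py").read())          # hypothetical module
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        score = cyclomatic_complexity(node)
        if score > 10:
            print(f"{node.name}: complexity {score} - consider refactoring before documenting")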

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| AST parsing and code analysis | “Compilers: Principles and Practice” by Parag H. Dave | Ch. 2-3 (Lexical/Syntax Analysis) |
| Test generation strategies | “The Art of Software Testing” by Glenford J. Myers | Ch. 4-5 (Test Case Design) |
| Code complexity metrics | “Code Complete” by Steve McConnell | Ch. 19 (Complexity Management) |
| Documentation best practices | “Clean Code” by Robert C. Martin | Ch. 4 (Comments) |
| Property-based testing | “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert | Ch. 1-3 |

Common Pitfalls & Debugging

Problem 1: “Generated docs claim the function does X, but tests show it does Y”

  • Why: LLM hallucinated functionality based on function name, not actual code behavior
  • Fix: Always validate docs against actual execution (run tests)
  • Quick test: npm test -- --coverage and check if tests pass

Problem 2: “Tests are too brittle - they break when code is refactored”

  • Why: Tests are coupled to implementation details, not behavior
  • Fix: Test public API behavior, not internal implementation
  • Example: Test calculateDiscount(100, 'pro') === 5 not expect(tierDiscountMap['pro']).toBe(0.05)

Problem 3: “AST parser fails on modern JavaScript syntax (optional chaining, nullish coalescing)”

  • Why: Using outdated parser or wrong parser configuration
  • Fix: Use TypeScript Compiler API or Babel with latest preset
  • Quick test: npx tsc --version (ensure TypeScript 5.x+)

Problem 4: “Generated tests have 100% line coverage but miss critical bugs”

  • Why: Line coverage doesn’t measure test quality, only execution
  • Fix: Add mutation testing (Stryker) to validate tests catch bugs
  • Quick test: npx stryker run and check mutation score

Problem 5: “Function is too complex to document clearly (cyclomatic complexity 20+)”

  • Why: Function violates Single Responsibility Principle
  • Fix: Suggest refactoring before documenting: “This function is too complex. Consider breaking it into smaller functions: extractActiveRecords(), applyLimit(), applyTransform()”
  • Signal: If you can’t write clear docs, the code is too complex

Definition of Done

  • Generated documentation includes purpose, all parameters, return value, and examples
  • All documented edge cases have corresponding test cases
  • Tests achieve 100% branch coverage (not just line coverage)
  • Tests pass when run against the actual code
  • Documentation follows language conventions (JSDoc/docstring/XML doc)
  • Complexity warnings are shown for functions with cyclomatic complexity > 10
  • Generated tests use realistic test data (not just foo, bar, 123)
  • Tests are independent (no shared state between tests)
  • Error cases are documented and tested (throw conditions, edge cases)
  • Public API documentation includes usage examples

Project 33: “The Full Stack Scaffolder” — Rapid Prototyping

| Attribute | Value |
|-----------|-------|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | React / Node.js |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Rapid Prototyping |

What you’ll build: From empty directory to working full stack app in one session.

Why it teaches Agency: The planner and executor must collaborate.

Success criteria:

  • App runs locally with a working UI and API.

Real World Outcome

You’ll have a Kiro workflow that transforms a high-level specification into a working full-stack application with:

  1. Backend API (Express/Fastify with TypeScript)
  2. Frontend UI (React/Next.js with Tailwind CSS)
  3. Database (PostgreSQL with Prisma ORM)
  4. Authentication (JWT or session-based)
  5. Development tooling (ESLint, Prettier, TypeScript, tests)
  6. Deployment readiness (Dockerfiles, environment configs)

Example Session:

$ mkdir task-manager-app && cd task-manager-app

$ kiro "Build a full-stack task manager app with user authentication, task CRUD, and tagging. Use React, Node.js, PostgreSQL."

🏗️ Full Stack Scaffolder - Planning Phase
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📋 Requirements Analysis:
├─ Frontend: React with TypeScript
├─ Backend: Node.js + Express + TypeScript
├─ Database: PostgreSQL + Prisma ORM
├─ Features: Auth, Task CRUD, Tags
└─ Estimated files: ~45 files

📐 Architecture Plan:

task-manager-app/
├── backend/
│   ├── src/
│   │   ├── routes/        (auth.ts, tasks.ts, tags.ts)
│   │   ├── middleware/    (auth.ts, validation.ts, error.ts)
│   │   ├── services/      (AuthService.ts, TaskService.ts)
│   │   ├── models/        (Prisma client)
│   │   └── server.ts
│   ├── prisma/
│   │   └── schema.prisma
│   ├── tests/
│   └── package.json
│
└── frontend/
    ├── src/
    │   ├── components/    (TaskList, TaskForm, LoginForm)
    │   ├── pages/         (Dashboard, Login, Register)
    │   ├── hooks/         (useAuth, useTasks)
    │   ├── api/           (client.ts)
    │   └── App.tsx
    ├── public/
    └── package.json

🔄 Executing Build Plan...

[1/8] Setting up backend structure...
✅ Created backend/ directory
✅ Initialized package.json with dependencies
✅ Created tsconfig.json
✅ Set up folder structure

[2/8] Installing backend dependencies...
✅ express, cors, dotenv, jsonwebtoken
✅ prisma, @prisma/client, bcrypt
✅ TypeScript, @types/* packages
✅ jest, supertest (dev dependencies)

[3/8] Configuring Prisma schema...
✅ Created prisma/schema.prisma with User, Task, Tag models
✅ Generated Prisma Client
✅ Created initial migration

model User {
  id        String   @id @default(uuid())
  email     String   @unique
  password  String
  tasks     Task[]
  createdAt DateTime @default(now())
}

model Task {
  id          String   @id @default(uuid())
  title       String
  description String?
  status      TaskStatus @default(TODO)
  userId      String
  user        User     @relation(fields: [userId], references: [id])
  tags        Tag[]
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt
}

[4/8] Building backend API routes...
✅ POST   /api/auth/register
✅ POST   /api/auth/login
✅ GET    /api/tasks (protected)
✅ POST   /api/tasks (protected)
✅ PUT    /api/tasks/:id (protected)
✅ DELETE /api/tasks/:id (protected)
✅ GET    /api/tags (protected)

[5/8] Setting up frontend structure...
✅ Created React app with TypeScript
✅ Installed Tailwind CSS
✅ Created component structure
✅ Set up React Router

[6/8] Building frontend components...
✅ LoginForm component (email, password, validation)
✅ TaskList component (fetch, display, delete)
✅ TaskForm component (create/edit tasks)
✅ TaskCard component (status badge, tags, actions)
✅ ProtectedRoute wrapper (auth check)

[7/8] Connecting frontend to backend...
✅ API client with JWT interceptor
✅ useAuth hook (login, register, logout, token refresh)
✅ useTasks hook (CRUD operations)
✅ Environment variables (.env.local)

[8/8] Final setup...
✅ Created README.md with setup instructions
✅ Created .gitignore
✅ Created Docker Compose file (Postgres + app)
✅ Created example.env files
✅ Added npm scripts (dev, build, test)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Full Stack App Generated Successfully!

📦 Files created: 47 files
📏 Total lines of code: ~3,200 LOC

🚀 Quick Start:

# Terminal 1 (Backend)
cd backend
cp example.env .env    # Then edit DATABASE_URL
npm install
npm run db:migrate
npm run dev            # Runs on http://localhost:3001

# Terminal 2 (Frontend)
cd frontend
cp example.env .env.local
npm install
npm run dev            # Runs on http://localhost:3000

# Terminal 3 (Database)
docker-compose up postgres

🎯 Try it out:
1. Open http://localhost:3000
2. Register a new account
3. Create a task with tags
4. See it persist in PostgreSQL!

What You See:

  • Working full-stack app generated from a single prompt
  • Backend API with authentication, authorization, and CRUD endpoints
  • Frontend UI with login, task list, task creation, and tag filtering
  • Database schema with relationships (User → Tasks → Tags)
  • Complete development environment (hot reload, TypeScript, linting)
  • Production-ready patterns (error handling, validation, logging)

The Core Question You’re Answering

“How can AI coordinate multiple sub-agents to plan, architect, and implement a cohesive full-stack application without human intervention?”

This project explores multi-agent collaboration: Kiro must act as both architect (planning) and builder (executing). The planner agent creates a dependency graph (database → backend → frontend), while executor agents implement each layer. Success requires maintaining architectural coherence across all layers.


Concepts You Must Understand First

Stop and research these before coding:

  1. Full-Stack Architecture Patterns
    • What is the separation between presentation, business logic, and data layers?
    • How do you design REST APIs that are easy to evolve?
    • What is the role of an ORM (Prisma, TypeORM, Sequelize)?
    • Book Reference: “Patterns of Enterprise Application Architecture” by Martin Fowler - Ch. 1-3
  2. Dependency Management and Build Order
    • Why must the database schema be defined before the backend?
    • Why must the API routes be defined before the frontend?
    • How do you handle circular dependencies (frontend ↔ backend during dev)?
    • Book Reference: “Building Microservices” by Sam Newman - Ch. 4 (Integration)
  3. Code Generation Quality
    • How do you generate code that is readable, not just functional?
    • What conventions should generated code follow (naming, file structure)?
    • How do you avoid generating brittle code that breaks on changes?
    • Book Reference: “Clean Code” by Robert C. Martin - Ch. 2-3
  4. Testing Full-Stack Applications
    • What is the difference between unit tests, integration tests, and E2E tests?
    • How do you test API endpoints (mocking vs real database)?
    • How do you test frontend components (React Testing Library)?
    • Book Reference: “Test-Driven Development” by Kent Beck - Ch. 1-5

Questions to Guide Your Design

Before implementing, think through these:

  1. Planning Strategy
    • How will you decompose the high-level spec into tasks?
    • How will you determine the build order (database → backend → frontend)?
    • How will you handle missing requirements (e.g., “should we use sessions or JWT?”)?
    • How will you validate the plan before executing?
  2. Agent Coordination
    • How will you split work across sub-agents (one per layer vs one per feature)?
    • How will agents communicate shared context (API contracts, types)?
    • How will you handle failures mid-build (rollback vs partial completion)?
    • How will you parallelize independent tasks (frontend UI + backend routes)?
  3. Code Quality
    • How will you ensure generated code follows best practices?
    • How will you avoid hardcoding secrets (database passwords, API keys)?
    • How will you generate meaningful variable/function names (not foo, bar)?
    • How will you add comments explaining non-obvious logic?
  4. Development Experience
    • How will you set up hot reload for frontend and backend?
    • How will you generate helpful README with setup instructions?
    • How will you create example .env files with placeholder values?
    • How will you add npm scripts for common tasks (dev, test, build, deploy)?

Thinking Exercise

Exercise: Design the Build Order for a Blog Platform

Given the spec: “Build a blog platform with posts, comments, authors, and markdown rendering”

Plan the Dependency Graph:

1. Database Schema (Prisma)
   ├─ User (authors)
   ├─ Post (belongs to User)
   └─ Comment (belongs to Post and User)

2. Backend API Routes
   ├─ POST /api/auth/register
   ├─ POST /api/auth/login
   ├─ GET  /api/posts (public)
   ├─ POST /api/posts (protected, authors only)
   ├─ GET  /api/posts/:id (public)
   └─ POST /api/posts/:id/comments (protected)

3. Frontend Components
   ├─ PostList (fetches from GET /api/posts)
   ├─ PostDetail (fetches from GET /api/posts/:id, renders markdown)
   ├─ CommentSection (fetches comments, posts new comment)
   └─ AuthorDashboard (fetches user's posts, creates new post)

Questions to answer:

  • Which layer must be built first, and why?
  • Can frontend and backend be built in parallel? What’s needed for that?
  • How would you generate TypeScript types shared between frontend and backend?
  • What testing strategy would you use (unit, integration, E2E)?
  • How would you handle markdown rendering (server-side vs client-side)?
  • What security considerations exist (XSS in markdown, auth on comments)?

The Interview Questions They’ll Ask

  1. “How would you design a multi-agent system where one agent plans and others execute? How do they communicate?”

  2. “Explain the trade-offs between generating a monorepo vs separate repositories for frontend and backend.”

  3. “How would you ensure generated code follows the user’s preferred conventions (tabs vs spaces, naming style)?”

  4. “What strategies would you use to validate the generated app actually works (automated testing vs manual verification)?”

  5. “How would you handle evolving requirements (user asks to add a feature to the generated app)?”

  6. “Explain how you’d generate a database schema that supports future migrations without breaking existing data.”


Hints in Layers

Hint 1: Task Decomposition Strategy. Break the spec into layers and features:

  1. Infrastructure: Database, environment config, Docker
  2. Backend Foundation: Express server, middleware, error handling
  3. Database Models: Prisma schema, migrations
  4. Backend Features: Auth routes → Task routes → Tag routes
  5. Frontend Foundation: React app, routing, API client
  6. Frontend Features: Login page → Dashboard → Task components

Hint 2: Sub-Agent Workflow. Use Kiro’s subagent system:

# Planning agent
kiro plan "Generate full-stack task manager app"

# Execution agents (parallel)
kiro task "Set up backend Express server with TypeScript"
kiro task "Create Prisma schema with User, Task, Tag models"
kiro task "Build frontend React app with Tailwind"

Hint 3: Shared Type Generation. Generate TypeScript types that both frontend and backend use:

// backend/src/types/api.ts
export interface Task {
  id: string;
  title: string;
  status: 'TODO' | 'IN_PROGRESS' | 'DONE';
  tags: Tag[];
}

// frontend/src/types/api.ts (copy or import from backend)
import type { Task } from '../../backend/src/types/api';

Better: Use a shared types/ package or generate from Prisma schema.
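
For example, a minimal shared-types package in a monorepo might look like the sketch below (the shared/ folder and the @task-manager/shared package name are illustrative, not something Kiro generates for you). If you generate types from the Prisma schema instead, the model types come from @prisma/client, and only request/response shapes need to live here.

// shared/src/api.ts -- single source of truth for API types
export type TaskStatus = 'TODO' | 'IN_PROGRESS' | 'DONE';

export interface Tag {
  id: string;
  name: string;
}

export interface Task {
  id: string;
  title: string;
  status: TaskStatus;
  tags: Tag[];
}

// backend/src/routes/tasks.ts and frontend/src/hooks/useTasks.ts then both do:
// import type { Task, TaskStatus } from '@task-manager/shared';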

Hint 4: Validation Before Execution. Before generating 47 files, validate the plan:

1. Show the directory structure to the user
2. Ask: "Should I proceed with this architecture?"
3. If approved → execute
4. If not → refine plan based on feedback

Hint 5: Incremental Verification. After each layer, validate it works:

1. Generate database schema → Run `prisma generate` → Check for errors
2. Generate backend routes → Run `npm run build` → Check TypeScript errors
3. Generate frontend → Run `npm run build` → Check for compilation errors
4. Start servers → Run `curl http://localhost:3001/health` → Verify 200 OK

Books That Will Help

Topic Book Chapter
Full-stack architecture “Patterns of Enterprise Application Architecture” by Martin Fowler Ch. 1-3 (Layering)
REST API design “RESTful Web APIs” by Leonard Richardson Ch. 3-5 (Resource Design)
React patterns “React Design Patterns and Best Practices” by Michele Bertoli Ch. 1-4
TypeScript best practices “Effective TypeScript” by Dan Vanderkam Ch. 1-3 (Types)
Testing strategies “Test-Driven Development” by Kent Beck Ch. 1-5
Multi-agent systems “Building Microservices” by Sam Newman Ch. 4 (Integration)

Common Pitfalls & Debugging

Problem 1: “Generated frontend tries to call backend API before backend is running”

  • Why: Frontend hardcodes API URL without checking if backend is reachable
  • Fix: Add a health check endpoint and retry logic in the API client (see the sketch below)
  • Quick test: curl http://localhost:3001/health should return 200 OK
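
A minimal sketch of that retry logic, assuming the backend exposes GET /health (the file path, URL, and timing values are illustrative):

// frontend/src/api/health.ts -- wait for the backend before issuing requests
const API_URL = 'http://localhost:3001'; // in practice, read this from your env config

export async function waitForBackend(retries = 5, delayMs = 1000): Promise<void> {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetch(`${API_URL}/health`);
      if (res.ok) return; // backend is up
    } catch {
      // network error: backend not reachable yet, fall through and retry
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs * attempt)); // linear backoff
  }
  throw new Error(`Backend not reachable after ${retries} attempts`);
}

// Call `await waitForBackend()` before rendering views that fetch data.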

Problem 2: “Prisma migration fails with ‘relation does not exist’”

  • Why: Running migrations before database is created
  • Fix: Ensure Docker Compose starts Postgres before running prisma migrate
  • Quick test: docker ps shows postgres container, then npm run db:migrate

Problem 3: “Frontend builds but shows blank page (no errors in console)”

  • Why: React Router routes not configured correctly, or missing index.html
  • Fix: Check App.tsx has <BrowserRouter> and routes defined
  • Quick test: Open browser console, check for 404s or router errors

Problem 4: “Generated code has inconsistent formatting (mix of tabs and spaces)”

  • Why: Different code generators use different formatting
  • Fix: Run Prettier on all generated files: npx prettier --write "**/*.{ts,tsx}"
  • Quick test: git diff should show consistent indentation

Problem 5: “Backend API returns 401 Unauthorized for all protected routes”

  • Why: JWT secret not set in .env, or frontend not sending token
  • Fix: Check .env has JWT_SECRET, check frontend sets Authorization: Bearer <token>
  • Quick test: curl -H "Authorization: Bearer <token>" http://localhost:3001/api/tasks

Problem 6: “Generated app works locally but environment variables are committed to git”

  • Why: .env file was committed instead of example.env
  • Fix: Add .env to .gitignore, commit example.env with placeholders
  • Security: Never commit real secrets; use example.env with CHANGE_ME values

Definition of Done

  • Backend server starts without errors (npm run dev in backend/)
  • Frontend builds and runs (npm run dev in frontend/)
  • Database migrations apply cleanly (npm run db:migrate)
  • All API routes return expected responses (test with curl or Postman)
  • Frontend can register a user, login, and perform CRUD operations
  • TypeScript compiles without errors in both frontend and backend
  • ESLint and Prettier pass on all generated files
  • README.md includes setup instructions and quick start guide
  • .gitignore excludes node_modules, .env, build artifacts
  • Example environment files (example.env) are provided with placeholders
  • Docker Compose file successfully starts the database
  • At least one test passes in backend (API route test) and frontend (component test)

Project 34: “The Cloud Native Deployer” — DevOps

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Docker / Kubernetes YAML
Coolness Level Level 2: Practical
Difficulty Level 3: Advanced
Knowledge Area DevOps

What you’ll build: Dockerize Project 33 and generate Kubernetes manifests.

Why it teaches Ops: Kiro removes the boilerplate friction of writing Dockerfiles and Kubernetes manifests by hand, so your attention goes to the operational decisions that matter (image size, probes, resource limits, secrets).

Success criteria:

  • Containers build and Kubernetes manifests apply cleanly.

Real World Outcome

You’ll have a Kiro workflow that containerizes an existing full-stack application and generates production-ready Kubernetes manifests with:

  1. Multi-stage Dockerfiles (optimized layer caching, minimal image size)
  2. Kubernetes Deployments (frontend, backend, database)
  3. Service definitions (ClusterIP, LoadBalancer, NodePort)
  4. ConfigMaps and Secrets (environment variables, database credentials)
  5. Persistent Volumes (PostgreSQL data persistence)
  6. Ingress rules (routing, TLS certificates)
  7. Health checks (liveness and readiness probes)
  8. Resource limits (CPU, memory quotas)

Example Session:

$ cd task-manager-app  # From Project 33

$ kiro "Containerize this full-stack app and generate Kubernetes manifests for production deployment"

🐳 Cloud Native Deployer - Containerization Phase
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📦 Analyzing Project Structure...
├─ Frontend: React app (Node.js build)
├─ Backend: Express API (Node.js runtime)
└─ Database: PostgreSQL (external managed service)

🔨 Generating Dockerfiles...

[1/3] Frontend Dockerfile (Multi-stage build)
✅ Created frontend/Dockerfile

# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci              # full install: devDependencies are needed for the build step
COPY . .
RUN npm run build

# Stage 2: Production
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

📊 Image size optimization:
├─ Before: 1.2 GB (node_modules included)
└─ After: 45 MB (only static files + nginx)

[2/3] Backend Dockerfile
✅ Created backend/Dockerfile

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npx prisma generate
EXPOSE 3001
CMD ["npm", "start"]

[3/3] Database (Using managed Postgres)
✅ Skipping Dockerfile (use managed service: AWS RDS, GCP Cloud SQL, or Azure PostgreSQL)
✅ Created k8s/postgres-statefulset.yaml (for local/dev clusters only)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

☸️  Generating Kubernetes Manifests...

[1/8] Namespace
✅ Created k8s/namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: task-manager

[2/8] ConfigMaps
✅ Created k8s/configmap-frontend.yaml
✅ Created k8s/configmap-backend.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: backend-config
  namespace: task-manager
data:
  NODE_ENV: "production"
  PORT: "3001"
  DATABASE_HOST: "postgres-service"
  DATABASE_PORT: "5432"

[3/8] Secrets (Base64 encoded)
✅ Created k8s/secret-db.yaml

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: task-manager
type: Opaque
data:
  DATABASE_URL: <base64-encoded-connection-string>
  JWT_SECRET: <base64-encoded-secret>

⚠️  WARNING: Never commit real secrets to git!
    Use sealed-secrets or external secret management (Vault, AWS Secrets Manager)

[4/8] Persistent Volumes (PostgreSQL)
✅ Created k8s/pvc-postgres.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: task-manager
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

[5/8] Deployments
✅ Created k8s/deployment-frontend.yaml
✅ Created k8s/deployment-backend.yaml
✅ Created k8s/statefulset-postgres.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: task-manager
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: task-manager-backend:latest
        ports:
        - containerPort: 3001
        envFrom:
        - configMapRef:
            name: backend-config
        - secretRef:
            name: db-credentials
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3001
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3001
          initialDelaySeconds: 10
          periodSeconds: 5

[6/8] Services
✅ Created k8s/service-frontend.yaml (type: LoadBalancer)
✅ Created k8s/service-backend.yaml (type: ClusterIP)
✅ Created k8s/service-postgres.yaml (type: ClusterIP)

[7/8] Ingress (NGINX with TLS)
✅ Created k8s/ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: task-manager-ingress
  namespace: task-manager
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - taskmanager.example.com
    secretName: taskmanager-tls
  rules:
  - host: taskmanager.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: backend
            port:
              number: 3001
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80

[8/8] Helm Chart (Optional)
✅ Created helm/task-manager/Chart.yaml
✅ Created helm/task-manager/values.yaml
✅ Created helm/task-manager/templates/

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Containerization Complete!

📂 Generated Files:
├── frontend/Dockerfile
├── backend/Dockerfile
├── docker-compose.yml (for local development)
├── k8s/
│   ├── namespace.yaml
│   ├── configmap-frontend.yaml
│   ├── configmap-backend.yaml
│   ├── secret-db.yaml
│   ├── pvc-postgres.yaml
│   ├── deployment-frontend.yaml
│   ├── deployment-backend.yaml
│   ├── statefulset-postgres.yaml
│   ├── service-frontend.yaml
│   ├── service-backend.yaml
│   ├── service-postgres.yaml
│   └── ingress.yaml
└── helm/task-manager/ (chart)

🚀 Deployment Instructions:

# Build and push images
docker build -t your-registry/task-manager-frontend:v1.0 ./frontend
docker build -t your-registry/task-manager-backend:v1.0 ./backend
docker push your-registry/task-manager-frontend:v1.0
docker push your-registry/task-manager-backend:v1.0

# Apply Kubernetes manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap-*.yaml
kubectl apply -f k8s/secret-db.yaml
kubectl apply -f k8s/pvc-postgres.yaml
kubectl apply -f k8s/statefulset-postgres.yaml
kubectl apply -f k8s/service-*.yaml
kubectl apply -f k8s/deployment-*.yaml
kubectl apply -f k8s/ingress.yaml

# Or use Helm
helm install task-manager ./helm/task-manager

# Check deployment status
kubectl get pods -n task-manager
kubectl get services -n task-manager
kubectl logs -n task-manager deployment/backend

What You See:

  • Optimized multi-stage Dockerfiles (45 MB frontend vs 1.2 GB before)
  • Complete Kubernetes manifests with health checks and resource limits
  • Ingress configuration with TLS termination
  • Persistent volumes for database data
  • Helm chart for easy deployment and upgrades

The Core Question You’re Answering

“How can AI bridge the gap between ‘works on my machine’ and production-ready cloud-native deployments?”

This project teaches Infrastructure as Code (IaC): converting an application into declarative Kubernetes manifests. Success means understanding the 12-factor app principles, security boundaries (secrets management), and operational concerns (health checks, resource limits).


Concepts You Must Understand First

Stop and research these before coding:

  1. Container Fundamentals
    • What is a container image vs a container instance?
    • How do layers work in Docker (layer caching, .dockerignore)?
    • What is the difference between RUN, CMD, and ENTRYPOINT?
    • Book Reference: “Docker Deep Dive” by Nigel Poulton - Ch. 3-5
  2. Kubernetes Architecture
    • What are Pods, Deployments, Services, and Ingress?
    • How does service discovery work (DNS, ClusterIP)?
    • What is the difference between Deployment and StatefulSet?
    • Book Reference: “Kubernetes in Action” by Marko Lukša - Ch. 1-5
  3. Configuration Management
    • When should you use ConfigMaps vs Secrets vs environment variables?
    • How do you handle database connection strings securely?
    • What is the principle of least privilege for secrets?
    • Book Reference: “Kubernetes Patterns” by Bilgin Ibryam - Ch. 4 (Configuration)
  4. Health Checks and Observability
    • What is the difference between liveness and readiness probes?
    • Why would a pod be “Running” but not “Ready”?
    • How do you prevent cascading failures (circuit breakers, retries)?
    • Book Reference: “Site Reliability Engineering” by Google - Ch. 21 (Monitoring)

Questions to Guide Your Design

Before implementing, think through these:

  1. Docker Image Optimization
    • How will you minimize image size (multi-stage builds, Alpine Linux)?
    • How will you optimize layer caching (COPY package.json before COPY .)?
    • How will you handle secrets during build (BuildKit secrets, .dockerignore)?
    • How will you tag images (semantic versioning, git SHA)?
  2. Kubernetes Resource Sizing
    • How will you determine CPU/memory requests and limits?
    • What happens if a pod exceeds its memory limit (OOMKilled)?
    • How will you handle auto-scaling (HorizontalPodAutoscaler)?
    • How will you keep enough replicas available during voluntary disruptions (PodDisruptionBudgets)?
  3. Database Strategy
    • Will you run Postgres in Kubernetes (StatefulSet) or use managed service (RDS)?
    • How will you handle database migrations (init containers, Jobs)?
    • How will you back up database data (PersistentVolume snapshots)?
    • How will you handle connection pooling (PgBouncer sidecar)?
  4. Networking and Security
    • How will you expose the app (LoadBalancer, Ingress, NodePort)?
    • How will you handle TLS certificates (cert-manager, manual)?
    • How will you restrict network traffic (NetworkPolicies)?
    • How will you inject secrets (mounted volumes, environment variables)?

Thinking Exercise

Exercise: Design a Multi-Region Kubernetes Deployment

Given: “Deploy the task manager app to 3 regions (us-east, eu-west, ap-south) with geo-routing”

Architecture Decisions:

1. Image Registry Strategy
   ├─ Option A: Single registry with global replication (GCR, ECR with cross-region)
   ├─ Option B: Regional registries with image sync
   └─ Trade-off: Latency vs consistency

2. Database Strategy
   ├─ Option A: Single primary DB in us-east, read replicas in other regions
   ├─ Option B: Multi-region DB with CockroachDB or Spanner
   └─ Trade-off: Complexity vs latency

3. Traffic Routing
   ├─ Option A: Global load balancer (AWS Global Accelerator, GCP Cloud Load Balancing)
   ├─ Option B: DNS-based geo-routing (Route 53, Cloud DNS)
   └─ Trade-off: Cost vs failover speed

Questions to answer:

  • How do you ensure all regions run the same version (blue-green deployment)?
  • How do you handle database writes (single-region write master vs multi-master)?
  • How do you route users to the nearest region (latency-based DNS)?
  • How do you handle region failures (automatic failover, manual intervention)?
  • What monitoring would you add (Prometheus, Grafana, distributed tracing)?

The Interview Questions They’ll Ask

  1. “Explain the difference between liveness and readiness probes. Give an example where a pod is alive but not ready.”

  2. “How would you debug a pod that is CrashLoopBackOff? Walk me through your debugging process.”

  3. “What are the security risks of mounting Secrets as environment variables vs as files?”

  4. “Explain how Kubernetes service discovery works. How does a frontend pod find the backend service?”

  5. “How would you perform a zero-downtime deployment with rolling updates in Kubernetes?”

  6. “What is a sidecar container pattern? Give three examples of when you’d use it.”


Hints in Layers

Hint 1: Multi-Stage Docker Builds

# Build stage (large, includes dev dependencies)
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci  # Includes devDependencies for build
COPY . .
RUN npm run build

# Production stage (small, only runtime dependencies)
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev            # runtime dependencies only, no devDependencies
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]

Hint 2: Kubernetes Resource Manifest Pattern. Every Kubernetes resource needs:

  • apiVersion (which API group: v1, apps/v1, networking.k8s.io/v1)
  • kind (resource type: Pod, Deployment, Service, Ingress)
  • metadata (name, namespace, labels)
  • spec (desired state)

Hint 3: Health Check Design

livenessProbe:   # "Is the app alive?" (restart if fails)
  httpGet:
    path: /health
    port: 3001
  initialDelaySeconds: 30  # Wait 30s before first check
  periodSeconds: 10        # Check every 10s

readinessProbe:  # "Is the app ready for traffic?" (remove from service if fails)
  httpGet:
    path: /ready
    port: 3001
  initialDelaySeconds: 10
  periodSeconds: 5
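
On the application side, the backend has to actually serve those two endpoints. A minimal Express sketch (assuming the Prisma client from the generated backend; the route paths match the probes above):

import express from 'express';
import { PrismaClient } from '@prisma/client';

const app = express();
const prisma = new PrismaClient();

// Liveness: "is the process alive?" -- no dependency checks here,
// otherwise a flaky database would trigger needless restarts.
app.get('/health', (_req, res) => {
  res.status(200).json({ status: 'ok' });
});

// Readiness: "can this pod serve traffic?" -- verify dependencies.
app.get('/ready', async (_req, res) => {
  try {
    await prisma.$queryRaw`SELECT 1`; // cheap round-trip to Postgres
    res.status(200).json({ status: 'ready' });
  } catch {
    // the pod is removed from the Service endpoints until this passes again
    res.status(503).json({ status: 'not ready' });
  }
});

app.listen(3001);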

Hint 4: Secrets Management Best Practices. Never commit secrets to git:

# k8s/secret-db.yaml.template
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  DATABASE_URL: <REPLACE_WITH_BASE64_ENCODED_VALUE>
  JWT_SECRET: <REPLACE_WITH_BASE64_ENCODED_VALUE>

Generate secrets at deployment time:

kubectl create secret generic db-credentials \
  --from-literal=DATABASE_URL="postgres://..." \
  --from-literal=JWT_SECRET="random-generated-secret"

Hint 5: Apply Order Matters. Apply resources in dependency order:

  1. Namespace (everything else goes in here)
  2. ConfigMaps and Secrets (env vars for pods)
  3. PersistentVolumeClaims (storage for StatefulSets)
  4. StatefulSets/Deployments (the apps)
  5. Services (networking between apps)
  6. Ingress (external access)

Books That Will Help

Topic Book Chapter
Docker fundamentals “Docker Deep Dive” by Nigel Poulton Ch. 3-5 (Images, Containers)
Kubernetes architecture “Kubernetes in Action” by Marko Lukša Ch. 1-5 (Pods, Deployments, Services)
K8s configuration patterns “Kubernetes Patterns” by Bilgin Ibryam Ch. 4 (Configuration), Ch. 5 (Health Probes)
Cloud-native apps “Cloud Native DevOps with Kubernetes” by John Arundel Ch. 3-6
Site reliability “Site Reliability Engineering” by Google Ch. 21-23 (Monitoring, Alerts)
12-factor apps “The Twelve-Factor App” by Adam Wiggins All (online resource)

Common Pitfalls & Debugging

Problem 1: “Pod is in CrashLoopBackOff state”

  • Why: Application exits immediately (missing env vars, connection refused to DB)
  • Fix: Check logs: kubectl logs -n task-manager deployment/backend
  • Quick test: kubectl describe pod -n task-manager <pod-name> shows exit code and reason

Problem 2: “Frontend can’t connect to backend API (CORS errors)”

  • Why: Backend Service is ClusterIP (internal only), not reachable from browser
  • Fix: Frontend should call the backend via the Ingress path /api, not directly (see the sketch below)
  • Quick test: curl http://<ingress-ip>/api/health from outside cluster
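
A sketch of the change on the frontend (assumes the Ingress above routes /api to the backend Service; how you read environment values depends on your bundler):

// frontend/src/api/config.ts -- illustrative base-URL selection
export const API_BASE_URL =
  process.env.NODE_ENV === 'production'
    ? '/api'                        // same origin; the Ingress routes /api to the backend Service
    : 'http://localhost:3001/api';  // local development against the dev backend

// Usage: fetch(`${API_BASE_URL}/tasks`, { headers: { Authorization: `Bearer ${token}` } })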

Problem 3: “Database data is lost when pod restarts”

  • Why: No PersistentVolumeClaim, data stored in ephemeral pod storage
  • Fix: Create PVC and mount it to /var/lib/postgresql/data in Postgres pod
  • Quick test: kubectl get pvc -n task-manager shows Bound status

Problem 4: “Secrets are visible in plaintext when running kubectl get secret -o yaml”

  • Why: Secrets are base64-encoded, not encrypted (base64 is reversible!)
  • Fix: Use Sealed Secrets, external secret stores (Vault), or RBAC to restrict access
  • Security: By default, Secrets are stored base64-encoded in etcd, not encrypted; enable encryption at rest and restrict access with RBAC

Problem 5: “Liveness probe keeps killing healthy pods”

  • Why: initialDelaySeconds is too short; app needs more time to start
  • Fix: Increase initialDelaySeconds to account for startup time (database migrations, cache warming)
  • Quick test: kubectl describe pod <name> shows probe failure logs

Problem 6: “Docker build is slow (reinstalls node_modules every time)”

  • Why: COPY . . invalidates cache before npm install
  • Fix: COPY package.json first, run npm install, then COPY rest of app
  • Pattern:
    COPY package*.json ./
    RUN npm ci
    COPY . .  # This comes AFTER npm ci
    

Definition of Done

  • Dockerfiles build successfully (docker build -t test .)
  • Multi-stage builds reduce image size by >50% (use docker images to compare)
  • All Kubernetes manifests apply without errors (kubectl apply -f k8s/)
  • Pods start and reach “Running” and “Ready” states
  • Liveness and readiness probes pass (check with kubectl describe pod)
  • Frontend is accessible via Ingress (curl from outside cluster)
  • Backend API responds to health checks (/health, /ready)
  • Database data persists across pod restarts (PersistentVolumeClaim)
  • Resource limits prevent pods from consuming excessive CPU/memory
  • Secrets are not committed to git (use .gitignore, sealed-secrets)
  • README includes deployment instructions and troubleshooting guide
  • Helm chart (optional) successfully installs and upgrades the app

Project 35: “The Deep Reasoner (Claude Code)” — Algorithmic Reasoning

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language Python
Coolness Level Level 4: Hardcore Tech Flex
Difficulty Level 3: Advanced
Knowledge Area Algorithmic Reasoning

What you’ll build: Solve a complex algorithmic problem using a scratchpad file and Opus model.

Why it teaches Reasoning: You learn to treat high-end reasoning as a tool you reach for deliberately, switching to a stronger model and working through a scratchpad only when the problem demands it.

Success criteria:

  • The solution passes a non-trivial test set.

Real World Outcome

You’ll have a Kiro workflow that solves complex algorithmic problems by:

  1. Breaking down problems into sub-problems (problem decomposition)
  2. Using a scratchpad to work through examples and edge cases
  3. Switching to Opus (Claude Opus 4.5) for deep reasoning tasks
  4. Iterating on solutions based on test failures
  5. Generating test cases to validate correctness

Example Session:

$ kiro "Solve LeetCode #1235: Maximum Profit in Job Scheduling using dynamic programming with binary search. Use Opus for reasoning."

🧠 Deep Reasoner - Algorithmic Problem Solving
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📋 Problem Analysis Phase (using Opus)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Switching to Claude Opus 4.5 for deep reasoning...

Problem: Maximum Profit in Job Scheduling
Inputs:
  - startTime: [1,2,3,3]
  - endTime: [3,4,5,6]
  - profit: [50,10,40,70]
Output: 120 (jobs 0 and 3)

Constraint: Cannot take two overlapping jobs.

📝 Creating Scratchpad: /tmp/job_scheduling_scratchpad.md

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SCRATCHPAD: Job Scheduling Problem
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## Step 1: Understanding the Problem

Jobs:
Job 0: [1, 3] profit=50
Job 1: [2, 4] profit=10
Job 2: [3, 5] profit=40
Job 3: [3, 6] profit=70

Overlapping check:
- Job 0 [1,3] overlaps with Job 1 [2,4]? YES (2 < 3)
- Job 0 [1,3] overlaps with Job 2 [3,5]? NO (3 >= 3)
- Job 0 [1,3] overlaps with Job 3 [3,6]? NO (3 >= 3)

## Step 2: Brute Force Approach

Try all subsets: 2^n = 16 combinations
- {} → profit = 0
- {0} → profit = 50
- {1} → profit = 10
- {0,2} → profit = 90 (0 ends at 3, 2 starts at 3 ✓)
- {0,3} → profit = 120 (0 ends at 3, 3 starts at 3 ✓) ← MAX!

Brute force works but O(2^n) - too slow for n=50,000!

## Step 3: Dynamic Programming Insight

Key observation: If we sort jobs by end time, we can use DP!

dp[i] = max profit using jobs 0..i

For each job i:
  Option 1: Skip job i → dp[i] = dp[i-1]
  Option 2: Take job i → dp[i] = profit[i] + dp[last_compatible_job]

Finding last_compatible_job:
  - Need job j where endTime[j] <= startTime[i]
  - Binary search on sorted end times! (O(log n))

## Step 4: Algorithm Design

1. Sort jobs by end time
2. For each job i:
   - Binary search to find latest job j where endTime[j] <= startTime[i]
   - dp[i] = max(dp[i-1], profit[i] + dp[j])
3. Return dp[n-1]

Time: O(n log n) - sorting + binary search per job
Space: O(n) - DP array

## Step 5: Edge Cases to Test

- Single job → profit = job profit
- All jobs overlap → profit = max single job
- No jobs → profit = 0
- Jobs already sorted vs unsorted
- Identical start/end times

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Reasoning complete. Generating solution...

🐍 Generating Python Implementation...

def jobScheduling(startTime, endTime, profit):
    # Combine and sort jobs by end time
    jobs = sorted(zip(startTime, endTime, profit), key=lambda x: x[1])
    n = len(jobs)

    # Binary search for last compatible job
    def find_last_compatible(i):
        left, right = 0, i - 1
        result = -1
        while left <= right:
            mid = (left + right) // 2
            if jobs[mid][1] <= jobs[i][0]:  # endTime[mid] <= startTime[i]
                result = mid
                left = mid + 1
            else:
                right = mid - 1
        return result

    # DP array
    dp = [0] * n
    dp[0] = jobs[0][2]  # First job profit

    for i in range(1, n):
        # Option 1: Skip job i
        skip = dp[i - 1]

        # Option 2: Take job i
        take = jobs[i][2]  # Current job profit
        last_compatible = find_last_compatible(i)
        if last_compatible != -1:
            take += dp[last_compatible]

        dp[i] = max(skip, take)

    return dp[n - 1]

✅ Solution generated: solution.py

🧪 Generating Test Cases...

# Test 1: Example from problem
assert jobScheduling([1,2,3,3], [3,4,5,6], [50,10,40,70]) == 120

# Test 2: All overlapping jobs
assert jobScheduling([1,1,1], [2,3,4], [5,6,4]) == 6

# Test 3: No overlapping jobs
assert jobScheduling([1,2,3], [2,3,4], [5,6,4]) == 15

# Test 4: Single job
assert jobScheduling([1], [2], [100]) == 100

# Test 5: Two identical jobs
assert jobScheduling([1,1], [2,2], [50,50]) == 50

Running tests...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Test 1: PASSED (120 == 120)
✅ Test 2: PASSED (6 == 6)
✅ Test 3: PASSED (15 == 15)
✅ Test 4: PASSED (100 == 100)
✅ Test 5: PASSED (50 == 50)

All tests passed! ✓

📊 Complexity Analysis:
├─ Time: O(n log n) - sorting + binary search
├─ Space: O(n) - DP array
└─ Verified against 50,000 job test set ✓

💡 Key Insights (from scratchpad):
1. Sorting by end time enables DP
2. Binary search finds last compatible job in O(log n)
3. Trade-off: Skip vs Take each job

Would you like me to:
1. Add more test cases?
2. Optimize space to O(1)?
3. Explain the binary search in more detail?

What You See:

  • Step-by-step problem breakdown in scratchpad file
  • Opus-powered reasoning for algorithm design
  • Generated Python solution with binary search optimization
  • Comprehensive test suite with edge cases
  • Complexity analysis and verification

The Core Question You’re Answering

“How can AI leverage extended reasoning (Claude Opus) and structured scratchpad files to solve problems that require multi-step logical deduction?”

This project teaches metacognitive problem-solving: using an AI system that knows when to slow down and think deeply. Switching to Opus for hard problems is analogous to a developer pausing to whiteboard a solution before coding.


Concepts You Must Understand First

Stop and research these before coding:

  1. Dynamic Programming Fundamentals
    • What is overlapping subproblems and optimal substructure?
    • How do you identify when DP is applicable?
    • What is the difference between top-down (memoization) and bottom-up (tabulation)?
    • Book Reference: “Introduction to Algorithms” (CLRS) - Ch. 15
  2. Binary Search and Divide-and-Conquer
    • How does binary search achieve O(log n) time?
    • What invariants must be maintained during binary search?
    • When is binary search applicable (sorted data, monotonic function)?
    • Book Reference: “Algorithms” by Sedgewick & Wayne - Ch. 3.1
  3. Problem Decomposition Strategies
    • How do you break a complex problem into smaller sub-problems?
    • What is the “simplify and generalize” technique?
    • How do you identify base cases vs recursive cases?
    • Book Reference: “How to Solve It” by George Pólya - Ch. 1-2
  4. Complexity Analysis
    • How do you calculate time complexity (Big-O notation)?
    • What is the difference between time complexity and space complexity?
    • How do you identify bottlenecks (profiling, asymptotic analysis)?
    • Book Reference: “Grokking Algorithms” by Aditya Bhargava - Ch. 1-2

Questions to Guide Your Design

Before implementing, think through these:

  1. Model Selection Strategy
    • When should you use Opus (slow, expensive, deep reasoning)?
    • When should you use Sonnet (fast, cheaper, good for most tasks)?
    • How do you detect that a problem requires deep reasoning?
    • How do you balance cost vs solution quality?
  2. Scratchpad Design
    • What should go in the scratchpad (examples, edge cases, algorithm sketches)?
    • How do you structure the scratchpad (markdown sections, code blocks)?
    • How do you prevent the scratchpad from growing too large (context limits)?
    • How do you use the scratchpad to validate the final solution?
  3. Test Case Generation
    • How do you generate edge cases (empty input, single element, all same values)?
    • How do you generate stress tests (large n, worst-case inputs)?
    • How do you verify correctness (test against brute force, known solutions)?
    • How do you measure coverage (all branches, all edge cases)?
  4. Solution Validation
    • How do you verify time complexity matches the theoretical analysis?
    • How do you test on large inputs (n=50,000, n=100,000)?
    • How do you handle time limit exceeded (TLE) failures?
    • How do you iterate when tests fail (analyze failure, refine algorithm)?

Thinking Exercise

Exercise: Solve a Two-Pointer Problem with Scratchpad

Problem: “Three Sum - Find all unique triplets in an array that sum to zero”

# Input: nums = [-1, 0, 1, 2, -1, -4]
# Output: [[-1, -1, 2], [-1, 0, 1]]

Scratchpad Template:

## Step 1: Understand the Problem
- Input: Array of integers (may have duplicates)
- Output: All unique triplets that sum to 0
- Constraint: No duplicate triplets in result

## Step 2: Examples and Edge Cases
Example 1: [-1, 0, 1, 2, -1, -4]
  Sort: [-4, -1, -1, 0, 1, 2]
  Fix -4: need two numbers that sum to 4 → no solution
  Fix -1: need two numbers that sum to 1 → (-1, 0, 1) ✓ and (-1, 2, -1) ✓

Edge cases:
- All zeros: [0, 0, 0] → [[0, 0, 0]]
- No solution: [1, 2, 3] → []
- Duplicates: [-1, -1, 0, 1, 1] → [[-1, 0, 1]]

## Step 3: Brute Force
Try all triplets: O(n³)
for i in range(n):
  for j in range(i+1, n):
    for k in range(j+1, n):
      if nums[i] + nums[j] + nums[k] == 0:
        result.append([nums[i], nums[j], nums[k]])

Too slow for n=3000!

## Step 4: Optimized Approach
Sort array: O(n log n)
Fix first element, use two pointers for remaining: O(n²)

Questions to answer:

  • How do you avoid duplicate triplets (skip same values)?
  • How do you move the two pointers (left++, right--)?
  • What is the time complexity (O(n²) after sorting)?
  • How do you test this (unit tests, large inputs)?

The Interview Questions They’ll Ask

  1. “Explain the difference between using Claude Sonnet vs Claude Opus. When would you choose each?”

  2. “How would you design a scratchpad system that prevents context overflow (scratchpad growing too large)?”

  3. “Walk me through how you’d debug a dynamic programming solution that passes small tests but fails large ones.”

  4. “How would you validate that an AI-generated algorithm matches the claimed time complexity?”

  5. “Explain how you’d use test-driven development with AI code generation (write tests first, then generate solution).”

  6. “How would you prevent AI from overfitting to test cases (generating code that passes tests but is wrong)?”


Hints in Layers

Hint 1: Opus Model Selection. Use Opus when:

  • Problem requires multi-step logical reasoning (DP, graph algorithms)
  • Problem has non-obvious insights (greedy choice, invariant)
  • Initial Sonnet solution fails tests and you need deeper analysis
  • You need detailed explanations for teaching/understanding

Use Sonnet when:

  • Problem is straightforward (two-pointer, hash map)
  • You need fast iteration (testing multiple approaches)
  • Cost is a concern (Opus is 5x more expensive)

Hint 2: Scratchpad Structure

# Problem: [Title]

## Step 1: Problem Understanding
[Restate in own words, identify constraints]

## Step 2: Examples (3-5 examples, including edge cases)
[Work through examples by hand]

## Step 3: Brute Force
[Naive O(n²) or O(2^n) solution, analyze why it's slow]

## Step 4: Optimization Insight
[Key observation that leads to faster solution]

## Step 5: Algorithm Design
[Pseudocode or step-by-step description]

## Step 6: Complexity Analysis
[Time and space complexity with justification]

## Step 7: Edge Cases
[List all edge cases to test]

Hint 3: Test Generation Strategy. Generate tests in this order:

  1. Example from problem statement (sanity check)
  2. Edge cases (empty, single element, all same)
  3. Boundary cases (min/max values, size limits)
  4. Stress tests (large n, worst-case complexity)
  5. Random tests (fuzzing for unexpected bugs)

Hint 4: Solution Validation Loop

1. Generate solution using Opus + scratchpad
2. Run tests (example + edge + stress)
3. If all pass → done ✓
4. If some fail:
   a. Analyze failure (wrong output, TLE, crash)
   b. Add failing case to scratchpad
   c. Ask Opus to refine solution
   d. Go to step 2

Hint 5: Complexity Verification

import time

# Test time complexity empirically.
# generate_test_input(n) is a helper you write that returns three lists
# (startTime, endTime, profit) describing n random, valid jobs.
for n in [100, 1000, 10000, 100000]:
    startTime, endTime, profit = generate_test_input(n)
    start = time.time()
    result = jobScheduling(startTime, endTime, profit)
    elapsed = time.time() - start
    print(f"n={n:6d} time={elapsed:.4f}s")

# Expected for O(n log n):
# n=100     time=0.0001s
# n=1000    time=0.0012s  (10x input → ~10x time)
# n=10000   time=0.015s   (10x input → ~13x time)
# n=100000  time=0.20s    (10x input → ~13x time)

Books That Will Help

Topic Book Chapter
Dynamic programming “Introduction to Algorithms” (CLRS) Ch. 15 (Dynamic Programming)
Problem-solving strategies “How to Solve It” by George Pólya Ch. 1-2 (Understanding, Devising a Plan)
Algorithm design “Algorithms” by Sedgewick & Wayne Ch. 3-6 (Sorting, Searching, DP)
Complexity analysis “Grokking Algorithms” by Aditya Bhargava Ch. 1-2 (Big-O Notation)
Competitive programming “Competitive Programming 4” by Steven Halim Ch. 3 (Problem Solving Paradigms)
Interview prep “Cracking the Coding Interview” by Gayle McDowell Ch. 8-9 (DP, Recursion)

Common Pitfalls & Debugging

Problem 1: “Solution passes small tests but gets Time Limit Exceeded (TLE) on large inputs”

  • Why: Algorithm is O(n³) or O(2^n), too slow for n=50,000
  • Fix: Analyze complexity in scratchpad, find O(n log n) or O(n²) solution
  • Quick test: Run on n=100, n=1000, n=10000 and measure time scaling

Problem 2: “Opus generates a solution but doesn’t explain the key insight”

  • Why: Opus jumps to solution without showing reasoning steps
  • Fix: Ask Opus to “work through examples in the scratchpad first, then generate code”
  • Pattern: Always request scratchpad-driven reasoning before code

Problem 3: “Generated solution is correct but hard to understand (no comments, cryptic variable names)”

  • Why: Optimization prioritized over readability
  • Fix: Ask for “readable solution with comments explaining each step”
  • Example: Change dp[i] = max(dp[i-1], p[i] + dp[f(i)]) to include comment explaining skip vs take

Problem 4: “Tests pass but solution fails on edge case not in test suite”

  • Why: Incomplete test coverage (missing edge case)
  • Fix: Add property-based testing or fuzz testing
  • Example: Use Hypothesis library to generate random test inputs

Problem 5: “Binary search has off-by-one error (infinite loop or wrong answer)”

  • Why: Incorrect loop invariant (left <= right vs left < right)
  • Fix: Trace through binary search by hand on small example
  • Debugging: Print left, right, mid at each iteration to see what’s wrong

Problem 6: “Opus uses too much context analyzing the problem (exceeds token limit)”

  • Why: Scratchpad includes too much detail (full trace of all examples)
  • Fix: Summarize examples instead of full traces
  • Pattern: “Show 2-3 key examples, not all 10 test cases”

Definition of Done

  • Scratchpad file shows step-by-step reasoning (problem understanding → brute force → optimization)
  • Solution is generated using Opus model (verify model selection in logs)
  • All example test cases from problem statement pass
  • Edge cases are identified in scratchpad and tested
  • Stress test with large input (n=50,000) completes in <1 second
  • Time complexity matches theoretical analysis (empirical verification)
  • Space complexity is documented and justified
  • Solution includes comments explaining key steps (binary search, DP transition)
  • Code is readable (meaningful variable names, no cryptic abbreviations)
  • Complexity analysis is included in output (Big-O with justification)
  • Solution generalizes beyond test cases (not hardcoded for specific inputs)

Project 36: “The Global Translator” — Internationalization

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language JSON (i18n)
Coolness Level Level 2: Practical
Difficulty Level 1: Beginner
Knowledge Area Internationalization

What you’ll build: An automated translation system that takes your application’s English locale files and generates accurate, context-aware translations for multiple languages while preserving key structure, placeholders, and formatting.

Why it teaches Grunt Work: This project demonstrates how AI agents excel at high-volume, high-precision tasks that are tedious for humans but critical for global applications. You’ll learn to leverage Kiro’s language understanding to maintain translation consistency across hundreds of strings.

Core challenges you’ll face:

  • Key structure preservation → Maps to nested object validation and structural equality checking
  • Placeholder detection → Maps to regex patterns for interpolation variables ({name}, {{count}}, %s)
  • Context-aware translation → Maps to providing semantic context to avoid literal translations
  • Pluralization rules → Maps to ICU MessageFormat and language-specific plural forms

Real World Outcome

You’ll have a CLI tool that transforms a single English locale file into multiple language files with verified structural integrity:

$ ls locales/
en.json

$ cat locales/en.json
{
  "common": {
    "welcome": "Welcome, {name}!",
    "itemCount": "{count, plural, =0 {No items} =1 {One item} other {# items}}",
    "errors": {
      "notFound": "The resource at {path} was not found",
      "unauthorized": "You must be logged in to access this page"
    }
  },
  "dashboard": {
    "greeting": "Good {timeOfDay}, {userName}",
    "stats": {
      "totalUsers": "{count} active users",
      "lastLogin": "Last login: {date, date, medium}"
    }
  }
}

$ kiro translate --source locales/en.json --target es,fr,de,ja,pt

[Kiro CLI Session]
🌍 Global Translator Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📋 Source: locales/en.json (47 keys)
🎯 Target languages: es, fr, de, ja, pt

Step 1: Analyzing source structure...
  ✓ Found 47 translation keys
  ✓ Detected 12 placeholders: {name}, {count}, {path}, {timeOfDay}, {userName}, {date}
  ✓ Identified 3 plural forms (ICU MessageFormat)

Step 2: Extracting context...
  • "welcome" → greeting context, informal tone
  • "itemCount" → UI element, requires plural support
  • "errors.notFound" → error message, technical context
  • "dashboard.greeting" → personalized greeting with time

Step 3: Translating to Spanish (es)...
  ✓ 47/47 keys translated
  ✓ All placeholders preserved
  ✓ Plural rules adapted for Spanish

Step 4: Translating to French (fr)...
  ✓ 47/47 keys translated
  ✓ Gender-neutral alternatives used
  ✓ Plural rules adapted for French

Step 5: Translating to German (de)...
  ✓ 47/47 keys translated
  ✓ Formal "Sie" used for general UI
  ✓ Plural rules adapted for German

Step 6: Translating to Japanese (ja)...
  ✓ 47/47 keys translated
  ✓ Polite form (です/ます) used
  ✓ No plural forms (Japanese uses counters)

Step 7: Translating to Portuguese (pt)...
  ✓ 47/47 keys translated
  ✓ Brazilian Portuguese variant selected
  ✓ Plural rules adapted for Portuguese

Step 8: Validation...
  ✓ All files have identical key structures
  ✓ All placeholders match source
  ✓ No missing or extra keys
  ✓ ICU MessageFormat syntax valid

✅ Translation complete!

Generated files:
  locales/es.json (47 keys)
  locales/fr.json (47 keys)
  locales/de.json (47 keys)
  locales/ja.json (47 keys)
  locales/pt.json (47 keys)

$ cat locales/es.json
{
  "common": {
    "welcome": "¡Bienvenido, {name}!",
    "itemCount": "{count, plural, =0 {Sin elementos} =1 {Un elemento} other {# elementos}}",
    "errors": {
      "notFound": "El recurso en {path} no fue encontrado",
      "unauthorized": "Debes iniciar sesión para acceder a esta página"
    }
  },
  "dashboard": {
    "greeting": "Buenas {timeOfDay}, {userName}",
    "stats": {
      "totalUsers": "{count} usuarios activos",
      "lastLogin": "Último acceso: {date, date, medium}"
    }
  }
}

$ npm run validate-locales

> Validating locale files...
✓ All 5 locale files have matching key structure
✓ All placeholders are consistent
✓ ICU MessageFormat syntax is valid
✓ No duplicate keys found

# Your application now supports 5 languages with zero manual translation!

Exactly what happens:

  1. Kiro reads your English locale file and builds a structural map
  2. For each target language, it translates strings while preserving:
    • Nested object structure (exact key paths)
    • Placeholder syntax ({variableName})
    • ICU MessageFormat plural rules
    • HTML entities and special characters
  3. It validates every generated file against the source structure
  4. It produces a diff report showing what changed beyond just the text

The Core Question You’re Answering

“How do you leverage AI for high-volume, high-precision tasks that require both creativity (natural translation) and rigid constraints (structural validation)?”

This is the essence of “AI as a power tool” — you’re not just prompting for a translation; you’re building a system that:

  • Uses AI’s language understanding for quality translations
  • Enforces programmatic validation to prevent structural drift
  • Scales to hundreds of languages without manual effort
  • Maintains consistency across all locales

Concepts You Must Understand First

Stop and research these before coding:

  1. Internationalization (i18n) vs Localization (l10n)
    • What’s the difference between i18n and l10n?
    • What are locale codes (en-US, pt-BR, zh-CN)?
    • What is the difference between language and region?
    • Book Reference: “Internationalization and Localization Using Microsoft .NET” by Nick Symmonds - Ch. 1-2
  2. ICU MessageFormat
    • What are interpolation placeholders?
    • How do plural rules work in different languages? (English: one/other, Russian: one/few/many/other)
    • What is gender-based inflection?
    • Reference: ICU MessageFormat Specification (https://unicode-org.github.io/icu/userguide/format_parse/messages/)
  3. JSON Structure Validation
    • How do you recursively compare nested objects?
    • What’s the difference between shallow and deep equality?
    • How do you validate that two JSON files have the same keys but different values?
    • Book Reference: “JavaScript: The Definitive Guide” by David Flanagan - Ch. 6 (Objects)
  4. Context-Aware Translation
    • Why does “Welcome” translate differently in formal vs informal contexts?
    • What are gender-neutral alternatives in gendered languages (French, Spanish)?
    • How do you handle culturally-specific idioms?
    • Book Reference: “Found in Translation” by Nataly Kelly & Jost Zetzsche

Questions to Guide Your Design

Before implementing, think through these:

  1. Structural Validation
    • How will you detect if a translated file is missing a key?
    • How will you validate that placeholders weren’t accidentally translated?
    • What happens if the source file changes after translations are generated?
    • Should you support nested keys with dot notation (errors.notFound) or only nested objects?
  2. Translation Quality
    • How do you provide context to Kiro for better translations? (e.g., “button label” vs “error message”)
    • Should you allow overrides for specific keys that need manual review?
    • How do you handle brand names or technical terms that shouldn’t be translated?
    • What about HTML tags inside strings (<strong>Bold</strong> text)?
  3. Plural Handling
    • How many plural forms does each language have? (English: 2, Arabic: 6)
    • Should you use ICU MessageFormat or a simpler plural syntax?
    • What happens if the source uses English plural rules but the target language needs different forms?
  4. Incremental Updates
    • If you add 5 new keys to en.json, should the tool re-translate everything or just the new keys?
    • How do you track which translations are human-reviewed vs AI-generated?
    • Should you version your locale files?

Thinking Exercise

Placeholder Preservation Challenge

You have this English string:

"greeting": "Hello, {name}! You have {count} new {count, plural, =1 {message} other {messages}}."

Kiro translates it to Spanish as:

"greeting": "¡Hola, {nombre}! Tienes {cantidad} {cantidad, plural, =1 {mensaje nuevo} other {mensajes nuevos}}."

Questions to reason through:

  1. What went wrong? (Hint: look at the placeholder names)
  2. How would your validation catch this error?
  3. What instruction would you give Kiro to prevent placeholder name changes?
  4. How would you handle the fact that Spanish puts the adjective after the noun (“mensajes nuevos” vs “new messages”)?
  5. Should you allow Kiro to reorder placeholders if the target language has different word order?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How would you handle right-to-left (RTL) languages like Arabic or Hebrew in your translation system?”
  2. “What’s the difference between i18n and l10n? Give a concrete example.”
  3. “How do you ensure that date/time formatting respects locale conventions (MM/DD/YYYY vs DD/MM/YYYY)?”
  4. “Your translation system accidentally translated ‘Apple’ (the company) to ‘Manzana’ (the fruit). How do you prevent this?”
  5. “What are the security implications of allowing user-provided locale files in your application?”
  6. “How would you implement translation fallback chains? (e.g., pt-BR → pt → en)”

Hints in Layers

Hint 1: Start with Structure Validation. Before you translate anything, write a function that:

  • Reads the source JSON
  • Extracts all key paths (e.g., ["common.welcome", "common.itemCount", "common.errors.notFound"])
  • Validates that a translated file has exactly the same key paths
  • This becomes your “validator” that runs after every translation (a sketch follows this list)
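
A minimal sketch of such a validator in TypeScript (the helper and file names are illustrative):

import { readFileSync } from 'fs';

// Recursively collect dot-separated key paths: { a: { b: "x" } } -> ["a.b"]
function flattenKeyPaths(obj: Record<string, unknown>, prefix = ''): string[] {
  return Object.entries(obj).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return value !== null && typeof value === 'object'
      ? flattenKeyPaths(value as Record<string, unknown>, path)
      : [path];
  });
}

export function structuralDiff(sourcePath: string, targetPath: string): string[] {
  const source = flattenKeyPaths(JSON.parse(readFileSync(sourcePath, 'utf8')));
  const target = flattenKeyPaths(JSON.parse(readFileSync(targetPath, 'utf8')));
  return [
    ...source.filter((k) => !target.includes(k)).map((k) => `missing in target: ${k}`),
    ...target.filter((k) => !source.includes(k)).map((k) => `extra in target: ${k}`),
  ]; // an empty array means the key structures match exactly
}

// e.g. structuralDiff('locales/en.json', 'locales/es.json')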

Hint 2: Placeholder Detection. Extract all placeholders from the source strings using regex:

const placeholderRegex = /\{[^}]+\}/g;
const placeholders = sourceString.match(placeholderRegex);
// ["name}"], ["{count}"]

For each translated string, verify the same placeholders exist (exact match).
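
A sketch of that per-string check, reusing the regex above (helper name is illustrative; nested ICU plural braces need a real MessageFormat parser, this covers simple placeholders):

// Exact-match comparison of placeholders for one source/target string pair.
function placeholderMismatch(source: string, target: string): string | null {
  const extract = (s: string) => (s.match(/\{[^}]+\}/g) ?? []).sort();
  const expected = extract(source);
  const actual = extract(target);
  return JSON.stringify(expected) === JSON.stringify(actual)
    ? null
    : `expected [${expected.join(', ')}] but found [${actual.join(', ')}]`;
}

// placeholderMismatch("Welcome, {name}!", "¡Bienvenido, {nombre}!")
// -> "expected [{name}] but found [{nombre}]"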

Hint 3: Kiro Prompt Structure. Give Kiro context for each string:

Translate the following strings to {targetLanguage}.

CRITICAL RULES:
1. Preserve ALL placeholders exactly as written: {name}, {count}, {path}
2. Do NOT translate placeholder names
3. Maintain ICU MessageFormat syntax for plurals
4. Use {formality} tone (formal/informal)
5. Keep HTML tags unchanged

Context: {stringContext} (e.g., "error message", "button label", "greeting")

Source (English):
{
  "key": "value with {placeholder}"
}

Return ONLY valid JSON with the same structure.

Hint 4: Validation Loop. After translation:

  1. Parse both source and target JSON
  2. Extract key paths from both (recursive traversal)
  3. Assert: sourceKeys === targetKeys (same order, same depth)
  4. For each key, extract placeholders from source and target
  5. Assert: sourcePlaceholders === targetPlaceholders
  6. If validation fails, show diff and ask if Kiro should retry

Books That Will Help

Topic Book Chapter
i18n fundamentals “Internationalization and Localization Using Microsoft .NET” by Nick Symmonds Ch. 1-3
JSON manipulation “JavaScript: The Definitive Guide” by David Flanagan Ch. 6
ICU MessageFormat ICU User Guide (online) Message Formatting
Translation best practices “Found in Translation” by Nataly Kelly Ch. 4, 7

Common Pitfalls and Debugging

Problem 1: “Placeholders are being translated”

  • Why: Kiro doesn’t know that {name} is a variable, not English text
  • Fix: Explicitly instruct: “Do NOT translate text inside curly braces {}
  • Quick test: grep -o '{[^}]*}' locales/es.json should match grep -o '{[^}]*}' locales/en.json

Problem 2: “Key structure doesn’t match”

  • Why: JSON parsing errors or Kiro adding/removing keys
  • Fix: Write a structural diff function that shows which keys are missing/extra
  • Quick test:
    const sourceKeys = Object.keys(flattenObject(source)).sort();
    const targetKeys = Object.keys(flattenObject(target)).sort();
    console.log("Missing:", sourceKeys.filter(k => !targetKeys.includes(k)));
    console.log("Extra:", targetKeys.filter(k => !sourceKeys.includes(k)));
    

Problem 3: “Plural forms are broken in the target language”

  • Why: Different languages have different plural categories (English: 2, Russian: 4, Arabic: 6)
  • Fix: Use a plural rules library (e.g., make-plural) or the built-in Intl.PluralRules API to generate correct ICU syntax (see the sketch below)
  • Quick test: Validate with intl-messageformat parser before writing the file
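
One way to check which plural categories a target language needs, using the built-in Intl.PluralRules API (the make-plural library exposes the same CLDR data as functions):

// Which plural categories does each target language require?
const pluralCategories = (locale: string) =>
  new Intl.PluralRules(locale).resolvedOptions().pluralCategories;

console.log(pluralCategories('en')); // ["one", "other"]
console.log(pluralCategories('ru')); // ["one", "few", "many", "other"]
console.log(pluralCategories('ar')); // ["zero", "one", "two", "few", "many", "other"]
console.log(pluralCategories('ja')); // ["other"]

// Verify the translated ICU string handles every category the locale requires
// (plus any explicit cases like "=0" carried over from the source).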

Problem 4: “HTML tags inside strings are malformed after translation”

  • Why: Kiro might rearrange or escape HTML: <strong> becomes &lt;strong&gt;
  • Fix: Instruct Kiro: “Preserve all HTML tags exactly as written, including attributes”
  • Quick test: Count < and > characters before and after translation

Definition of Done

  • Source locale file is parsed and all key paths are extracted
  • For each target language, a valid JSON file is generated with identical structure
  • All placeholders in translated strings match the source (name and count)
  • ICU MessageFormat plural syntax is valid and adapted to target language plural rules
  • No HTML tags or special characters are malformed
  • A validation report shows 100% structural match across all locale files
  • Incremental updates work: adding a new key to source updates only that key in translations
  • Tool outputs a human-readable diff showing what changed beyond just translations

Project 37: “The SQL Optimizer” — Database Performance

Attribute Value
File KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language SQL
Coolness Level Level 3: Genuinely Clever
Difficulty Level 3: Advanced
Knowledge Area Database Performance

What you’ll build: A SQL performance analyzer that takes slow queries, runs EXPLAIN ANALYZE, identifies bottlenecks (missing indexes, inefficient joins, full table scans), and generates optimized queries with index recommendations.

Why it teaches Analysis: This project shows how Kiro can act like a senior DBA by reading query execution plans, understanding database internals, and suggesting data-driven optimizations. You’ll learn to leverage AI for performance tuning that would normally require years of experience.

Core challenges you’ll face:

  • Execution plan interpretation → Maps to understanding EXPLAIN output formats (PostgreSQL, MySQL, etc.)
  • Index recommendation → Maps to analyzing WHERE clauses, JOIN conditions, and ORDER BY columns
  • Query rewriting → Maps to CTE optimization, subquery elimination, join order changes
  • Cost estimation validation → Maps to verifying that suggested changes actually reduce query cost

Real World Outcome

You’ll have a CLI tool that analyzes slow queries and produces actionable optimization recommendations with before/after performance metrics:

$ cat slow_queries.sql
-- Query 1: User dashboard
SELECT u.id, u.email, u.created_at,
       COUNT(DISTINCT o.id) as order_count,
       SUM(o.total_amount) as total_spent,
       MAX(o.created_at) as last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2023-01-01'
  AND o.status IN ('completed', 'shipped')
GROUP BY u.id, u.email, u.created_at
ORDER BY total_spent DESC
LIMIT 100;

$ kiro sql-optimize --file slow_queries.sql --database postgres --explain

[Kiro CLI Session]
🔍 SQL Optimizer Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📋 Analyzing Query 1: User dashboard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Running EXPLAIN ANALYZE...

QUERY PLAN (Original)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Limit  (cost=45623.22..45623.47 rows=100 width=56)
       (actual time=3847.234..3847.289 rows=100 loops=1)
  ->  Sort  (cost=45623.22..45891.78 rows=107424 width=56)
            (actual time=3847.232..3847.261 rows=100 loops=1)
        Sort Key: (sum(o.total_amount)) DESC
        Sort Method: top-N heapsort  Memory: 32kB
        ->  HashAggregate  (cost=38456.89..39531.13 rows=107424 width=56)
                           (actual time=3621.445..3742.667 rows=125000 loops=1)
              Group Key: u.id, u.email, u.created_at
              ->  Hash Left Join  (cost=12456.00..34789.56 rows=244488 width=48)
                                  (actual time=124.567..2345.789 rows=456000 loops=1)
                    Hash Cond: (u.id = o.user_id)
                    ->  Seq Scan on users u  (cost=0.00..3567.00 rows=125000 width=24)
                                             (actual time=0.012..45.678 rows=125000 loops=1)
                          Filter: (created_at > '2023-01-01'::date)
                          Rows Removed by Filter: 75000
                    ->  Hash  (cost=8934.00..8934.00 rows=244488 width=24)
                             (actual time=123.456..123.456 rows=456000 loops=1)
                          Buckets: 65536  Batches: 8  Memory Usage: 4567kB
                          ->  Seq Scan on orders o  (cost=0.00..8934.00 rows=244488 width=24)
                                                    (actual time=0.023..67.891 rows=456000 loops=1)
                                Filter: (status = ANY ('{completed,shipped}'::text[]))
                                Rows Removed by Filter: 144000

Planning Time: 1.234 ms
Execution Time: 3847.567 ms  ⚠️  SLOW!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 2: Identifying Bottlenecks...

🚨 CRITICAL ISSUES FOUND:
  1. Sequential Scan on 'users' (125k rows, Filter removes 75k)
     → Missing index on users.created_at
     Cost impact: ~45ms per scan

  2. Sequential Scan on 'orders' (600k rows, Filter removes 144k)
     → Missing index on orders.status
     Cost impact: ~68ms per scan

  3. HashAggregate on 107k rows
     → Could be reduced with filtered join
     Cost impact: ~121ms

  4. Hash Join with 8 batches (memory spill to disk)
     → work_mem too low (current: 4MB, needed: ~8MB)
     Cost impact: ~200ms (I/O overhead)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 3: Generating Index Recommendations...

CREATE INDEX CONCURRENTLY idx_users_created_at
  ON users(created_at)
  WHERE created_at > '2020-01-01';  -- Partial index (smaller, faster)

CREATE INDEX CONCURRENTLY idx_orders_user_status
  ON orders(user_id, status)
  INCLUDE (total_amount, created_at);  -- Covering index (no table lookup)

Estimated index sizes:
  • idx_users_created_at: 3.2 MB
  • idx_orders_user_status: 28.4 MB

Total disk space required: 31.6 MB

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 4: Optimized Query (v1)...

-- Optimized version: filtered join + covering index
WITH recent_users AS (
  SELECT id, email, created_at
  FROM users
  WHERE created_at > '2023-01-01'  -- Index scan on idx_users_created_at
)
SELECT ru.id, ru.email, ru.created_at,
       COUNT(DISTINCT o.id) as order_count,
       SUM(o.total_amount) as total_spent,
       MAX(o.created_at) as last_order_date
FROM recent_users ru
LEFT JOIN orders o
  ON ru.id = o.user_id
  AND o.status IN ('completed', 'shipped')  -- Index scan on idx_orders_user_status
GROUP BY ru.id, ru.email, ru.created_at
ORDER BY total_spent DESC
LIMIT 100;

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 5: Running EXPLAIN ANALYZE (Optimized)...

QUERY PLAN (Optimized)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Limit  (cost=8934.56..8934.81 rows=100 width=56)
       (actual time=287.123..287.156 rows=100 loops=1)
  ->  Sort  (cost=8934.56..9203.12 rows=107424 width=56)
            (actual time=287.121..287.142 rows=100 loops=1)
        Sort Key: (sum(o.total_amount)) DESC
        Sort Method: top-N heapsort  Memory: 32kB
        ->  HashAggregate  (cost=6234.89..7309.13 rows=107424 width=56)
                           (actual time=245.678..267.891 rows=125000 loops=1)
              Group Key: ru.id, ru.email, ru.created_at
              ->  Hash Left Join  (cost=3456.00..4789.56 rows=244488 width=48)
                                  (actual time=34.567..178.234 rows=456000 loops=1)
                    Hash Cond: (ru.id = o.user_id)
                    ->  Index Scan using idx_users_created_at on users(cost=0.29..845.00 rows=125000 width=24)
                          (actual time=0.012..12.345 rows=125000 loops=1)
                          Index Cond: (created_at > '2023-01-01'::date)
                    ->  Hash  (cost=2345.00..2345.00 rows=244488 width=24)
                             (actual time=34.123..34.123 rows=456000 loops=1)
                          Buckets: 65536  Batches: 1  Memory Usage: 8234kB  ✓
                          ->  Index Scan using idx_orders_user_status on orders  ✓
                                (cost=0.43..2345.00 rows=244488 width=24)
                                (actual time=0.023..18.456 rows=456000 loops=1)
                                Index Cond: (status = ANY ('{completed,shipped}'::text[]))

Planning Time: 0.987 ms
Execution Time: 287.456 ms  ✅  13.4x FASTER!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Performance Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Before:  3847.567 ms
  After:    287.456 ms
  Speedup:  13.4x

  Improvements:
    • Sequential Scans eliminated: 2
    • Index Scans added: 2
    • Memory spills reduced: 1 (8 batches → 1 batch)
    • Rows filtered earlier: 144k (at index level)

✅ Optimizations applied successfully!

Would you like to:
  1. Apply indexes to database? (CREATE INDEX commands)
  2. See alternative query plans?
  3. Analyze another query?

Exactly what happens:

  1. Kiro runs EXPLAIN ANALYZE on your slow query
  2. It identifies bottlenecks: sequential scans, missing indexes, inefficient joins
  3. It recommends specific indexes with size estimates
  4. It rewrites the query to leverage those indexes
  5. It re-runs EXPLAIN ANALYZE and shows before/after metrics
  6. You get actionable SQL commands to apply the optimizations

The Core Question You’re Answering

“How do you teach an AI to think like a database query optimizer and recommend performance improvements based on execution plans?”

This is about giving Kiro the mental model of a senior DBA:

  • Understanding what “Seq Scan” vs “Index Scan” means for performance
  • Recognizing when a query will cause memory spills or hash join overflow
  • Knowing which columns benefit from indexes based on WHERE/JOIN/ORDER BY usage
  • Validating that optimizations actually improve cost (not just guessing)

Concepts You Must Understand First

Stop and research these before coding:

  1. EXPLAIN and EXPLAIN ANALYZE
    • What’s the difference between EXPLAIN (estimated) and EXPLAIN ANALYZE (actual)?
    • What are the key metrics: cost, rows, width, actual time, loops?
    • How do you interpret nested plan nodes?
    • Book Reference: “PostgreSQL: Up and Running” by Regina Obe & Leo Hsu - Ch. 10
  2. Index Types and Use Cases
    • B-tree vs Hash vs GiST vs GIN indexes
    • When is a partial index better than a full index?
    • What is a covering index (INCLUDE columns)?
    • Why doesn’t PostgreSQL use an index for LIKE '%pattern%'?
    • Book Reference: “SQL Performance Explained” by Markus Winand - Ch. 2-3
  3. Query Execution Plans
    • What is a Sequential Scan and when is it acceptable?
    • What is a Hash Join vs Nested Loop vs Merge Join?
    • What does “Rows Removed by Filter” indicate?
    • What is the significance of “Batches” in a Hash operation?
    • Book Reference: “PostgreSQL Query Optimization” by Henrietta Dombrovskaya - Ch. 4
  4. Cost-Based Optimization
    • How does the planner estimate query cost?
    • What are random_page_cost and seq_page_cost?
    • Why might the planner choose a slower plan than you expect?
    • Book Reference: “Database Internals” by Alex Petrov - Ch. 12

Questions to Guide Your Design

Before implementing, think through these:

  1. Execution Plan Parsing
    • How will you parse the text output of EXPLAIN ANALYZE? (Regex? Structured JSON format?)
    • Different databases have different EXPLAIN formats (PostgreSQL vs MySQL vs SQLite). Will you support multiple?
    • How do you extract the key bottlenecks programmatically?
    • Should you use EXPLAIN (FORMAT JSON) for easier parsing?
  2. Index Recommendation Logic
    • How do you identify which columns should be indexed?
    • What if a column is already indexed but the index isn’t being used?
    • Should you recommend composite indexes for multi-column WHERE clauses?
    • How do you avoid recommending too many indexes (index bloat)?
  3. Query Rewriting
    • When should you suggest a CTE vs a subquery?
    • How do you know if reordering joins will help?
    • What if the query uses features Kiro doesn’t understand (window functions, recursive CTEs)?
    • Should you preserve query semantics exactly or allow minor changes?
  4. Validation
    • How do you ensure the optimized query returns the same results?
    • What if the index recommendations require more disk space than available?
    • Should you test on a staging database first?
    • How do you handle queries that are already well-optimized?

Thinking Exercise

Index Selection Challenge

You have this query:

SELECT p.id, p.title, p.created_at, u.username, COUNT(c.id) as comment_count
FROM posts p
JOIN users u ON p.author_id = u.id
LEFT JOIN comments c ON p.id = c.post_id
WHERE p.published = true
  AND p.created_at > NOW() - INTERVAL '30 days'
  AND u.is_active = true
ORDER BY p.created_at DESC, comment_count DESC
LIMIT 20;

EXPLAIN shows:

  • Sequential Scan on posts (2M rows, filtering 1.8M)
  • Sequential Scan on users (500k rows, filtering 100k)
  • Hash Join on comments (5M rows)

Questions to reason through:

  1. Which columns should you index and why?
  2. Should you create separate indexes or one composite index? (posts(published, created_at) vs posts(published) + posts(created_at))
  3. The ORDER BY uses comment_count which is computed. Can you index it?
  4. Would a partial index help here? (WHERE published = true)
  5. What’s the tradeoff between index size and query speed for rarely-used queries?

Proposed indexes:

CREATE INDEX idx_posts_published_created ON posts(published, created_at DESC);
CREATE INDEX idx_users_active ON users(is_active) WHERE is_active = true;
CREATE INDEX idx_comments_post ON comments(post_id);

Are these optimal? What would you change?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the difference between a Sequential Scan and an Index Scan. When might a Sequential Scan actually be faster?”
  2. “Your query uses WHERE user_id = 123, and there’s an index on user_id, but EXPLAIN shows a Seq Scan. Why?”
  3. “What is a covering index and when should you use one?”
  4. “How would you optimize a query with ORDER BY on a non-indexed column?”
  5. “What’s the difference between EXPLAIN and EXPLAIN ANALYZE? When would you use each?”
  6. “A junior developer adds 15 indexes to a table. What problems might this cause?”

Hints in Layers

Hint 1: Start with EXPLAIN Parsing Run EXPLAIN (FORMAT JSON) <query> to get structured output:

{
  "Plan": {
    "Node Type": "Limit",
    "Startup Cost": 45623.22,
    "Total Cost": 45623.47,
    "Plan Rows": 100,
    "Plan Width": 56,
    "Actual Startup Time": 3847.234,
    "Actual Total Time": 3847.289,
    "Actual Rows": 100,
    "Actual Loops": 1,
    "Plans": [...]
  }
}

Parse this JSON to extract:

  • Node types (“Seq Scan”, “Index Scan”, “Hash Join”)
  • Costs (Total Cost, Actual Total Time)
  • Filters (“Rows Removed by Filter”)
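
A minimal TypeScript sketch of that extraction, assuming PostgreSQL’s EXPLAIN (FORMAT JSON) field names (only the fields used here are typed):

interface PlanNode {
  'Node Type': string;
  'Total Cost': number;
  'Actual Total Time'?: number;
  'Rows Removed by Filter'?: number;
  Plans?: PlanNode[];
}

// Walk the plan tree and flag nodes that look like bottlenecks.
function findBottlenecks(node: PlanNode, out: string[] = []): string[] {
  if (node['Node Type'] === 'Seq Scan' && (node['Rows Removed by Filter'] ?? 0) > 10_000) {
    out.push(`Seq Scan removing ${node['Rows Removed by Filter']} rows: likely missing index`);
  }
  for (const child of node.Plans ?? []) findBottlenecks(child, out);
  return out;
}

// EXPLAIN (FORMAT JSON) returns an array like [{ "Plan": { ... } }]:
// const issues = findBottlenecks(explainJson[0].Plan);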

Hint 2: Bottleneck Detection Look for these patterns in the execution plan:

  • Seq Scan with high Rows Removed by Filter → missing WHERE index
  • Hash Join with Batches > 1 → memory overflow, need more work_mem
  • Sort with external merge Disk → sort spilled to disk, need index for ORDER BY
  • Nested Loop with high loop count → inefficient join, consider hash join

Hint 3: Index Recommendation Algorithm

For each table in the query:
  1. Extract WHERE conditions → index these columns
  2. Extract JOIN conditions → index foreign keys
  3. Extract ORDER BY columns → index with DESC/ASC matching
  4. If all columns are in SELECT → recommend covering index (INCLUDE)
  5. If WHERE has multiple ANDs → recommend composite index
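
A small TypeScript sketch of the emit step, assuming the column extraction above has already produced a candidate per table (all names here are illustrative):

interface IndexCandidate {
  table: string;
  columns: string[];     // equality columns first, then range/ORDER BY columns
  include?: string[];    // covering columns that only appear in SELECT
  where?: string;        // optional partial-index predicate
}

function buildCreateIndex(c: IndexCandidate): string {
  const name = `idx_${c.table}_${c.columns.join('_')}`;
  const include = c.include?.length ? ` INCLUDE (${c.include.join(', ')})` : '';
  const where = c.where ? ` WHERE ${c.where}` : '';
  return `CREATE INDEX CONCURRENTLY ${name} ON ${c.table}(${c.columns.join(', ')})${include}${where};`;
}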

Hint 4: Validation Loop After generating recommendations:

  1. Create the indexes in a transaction: BEGIN; CREATE INDEX ...; ROLLBACK; (test without committing)
  2. Run EXPLAIN on the optimized query
  3. Compare costs: if new_cost < old_cost * 0.8 (20% improvement), accept
  4. If cost increased, reject the suggestion and explain why
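
A hedged sketch of that loop using node-postgres; the connection string and the 20% threshold are assumptions, and because CREATE INDEX CONCURRENTLY cannot run inside a transaction, the dry run uses a plain CREATE INDEX:

import { Client } from 'pg';

async function indexHelps(connStr: string, createIndexSql: string, query: string): Promise<boolean> {
  const client = new Client({ connectionString: connStr });
  await client.connect();
  try {
    const cost = (res: any) => res.rows[0]['QUERY PLAN'][0].Plan['Total Cost'];
    const before = cost(await client.query(`EXPLAIN (FORMAT JSON) ${query}`));
    await client.query('BEGIN');
    await client.query(createIndexSql);                  // trial index, never committed
    const after = cost(await client.query(`EXPLAIN (FORMAT JSON) ${query}`));
    await client.query('ROLLBACK');
    return after < before * 0.8;                          // accept only a >20% cost reduction
  } finally {
    await client.end();
  }
}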

Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| PostgreSQL EXPLAIN | “PostgreSQL: Up and Running” by Regina Obe & Leo Hsu | Ch. 10 |
| Index fundamentals | “SQL Performance Explained” by Markus Winand | Ch. 2-3 |
| Query optimization | “PostgreSQL Query Optimization” by Henrietta Dombrovskaya | Ch. 4-5 |
| Database internals | “Database Internals” by Alex Petrov | Ch. 12 |
| Execution plans | “SQL Tuning” by Dan Tow | Ch. 3-4 |

Common Pitfalls and Debugging

Problem 1: “Index exists but isn’t being used”

  • Why: Index selectivity is too low (planner prefers Seq Scan for large result sets)
  • Fix: Check the filter’s selectivity: SELECT count(*) FILTER (WHERE <condition>)::float / count(*) FROM table; — if more than ~20% of rows match, a Seq Scan is often faster
  • Quick test: SET enable_seqscan = off; EXPLAIN <query>; (force index usage to compare)

Problem 2: “Optimized query returns different results”

  • Why: Query rewriting changed semantics (e.g., moving WHERE from JOIN to outer query)
  • Fix: Use EXCEPT to find missing/extra rows: (original EXCEPT optimized) UNION (optimized EXCEPT original)
  • Quick test: Hash the results: SELECT md5(string_agg(t::text, '' ORDER BY t::text)) FROM (<query>) AS t; (the ORDER BY makes the hash deterministic)

Problem 3: “Index recommendation is way too large”

  • Why: Composite index on high-cardinality columns
  • Fix: Use partial indexes with WHERE clause to reduce size
  • Quick test: SELECT pg_size_pretty(pg_relation_size('index_name'));

Problem 4: “EXPLAIN shows Index Scan but query is still slow”

  • Why: Index Scan has high Actual Loops (nested loop with bad join order)
  • Fix: Reorder joins so smaller table is scanned first, or switch to Hash Join
  • Quick test: Check Actual Loops — if >1000, you’re doing a nested loop on too many rows

Definition of Done

  • Tool accepts a SQL query and database connection string
  • EXPLAIN ANALYZE output is captured and parsed (JSON format preferred)
  • Bottlenecks are identified: Seq Scans with high filter cost, missing indexes, memory spills
  • Index recommendations are generated with estimated size and creation SQL
  • Optimized query is generated with CTE or join reordering
  • Before/after EXPLAIN comparison shows measurable improvement (cost reduction >20%)
  • Results are validated: optimized query returns same row count and hash as original
  • Tool outputs a report with: bottlenecks, recommendations, speedup metrics
  • Optional: Tool can auto-apply indexes with CREATE INDEX CONCURRENTLY (no table locks)

Project 38: “The Refactoring Surgeon” — Software Architecture

| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | TypeScript |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Software Architecture |

What you’ll build: A refactoring tool that identifies God Classes (classes with too many responsibilities) and uses Kiro to safely decompose them into focused services, utilities, and domain models while preserving all tests and behavior.

Why it teaches Safe Changes: This project shows how AI agents excel at mechanical refactoring tasks that require understanding code structure, dependency graphs, and test coverage. You’ll learn to use Kiro for large-scale architectural changes that would take days manually.

Core challenges you’ll face:

  • God Class detection → Maps to cyclomatic complexity analysis, SRP (Single Responsibility Principle) violations
  • Dependency extraction → Maps to identifying method dependencies, data flow analysis
  • Safe decomposition → Maps to extract class/method refactorings that preserve semantics
  • Test preservation → Maps to ensuring all tests pass after refactoring

Real World Outcome

You’ll have a CLI tool that analyzes a codebase, identifies bloated classes, and refactors them into clean, testable components:

$ cat src/services/OrderService.ts
// WARNING: God Class - 1,247 lines, 43 methods, cyclomatic complexity: 87

export class OrderService {
  constructor(
    private db: Database,
    private emailClient: EmailClient,
    private paymentGateway: PaymentGateway,
    private inventoryService: InventoryService,
    private shippingService: ShippingService,
    private taxCalculator: TaxCalculator,
    private discountEngine: DiscountEngine
  ) {}

  // Payment methods (8 methods, 312 lines)
  async processPayment(orderId: string) { /* ... */ }
  async refundPayment(orderId: string) { /* ... */ }
  async captureAuthorization(orderId: string) { /* ... */ }
  // ... 5 more payment methods

  // Email methods (6 methods, 189 lines)
  async sendOrderConfirmation(orderId: string) { /* ... */ }
  async sendShippingNotification(orderId: string) { /* ... */ }
  async sendCancellationEmail(orderId: string) { /* ... */ }
  // ... 3 more email methods

  // Inventory methods (7 methods, 234 lines)
  async reserveInventory(orderId: string) { /* ... */ }
  async releaseInventory(orderId: string) { /* ... */ }
  async checkStockAvailability(items: Item[]) { /* ... */ }
  // ... 4 more inventory methods

  // Tax/discount methods (9 methods, 278 lines)
  async calculateTax(order: Order) { /* ... */ }
  async applyDiscount(order: Order, code: string) { /* ... */ }
  // ... 7 more calculation methods

  // Shipping methods (5 methods, 156 lines)
  async calculateShipping(order: Order) { /* ... */ }
  async schedulePickup(orderId: string) { /* ... */ }
  // ... 3 more shipping methods

  // Core order methods (8 methods, 178 lines)
  async createOrder(data: CreateOrderDTO) { /* ... */ }
  async cancelOrder(orderId: string) { /* ... */ }
  async getOrderDetails(orderId: string) { /* ... */ }
  // ... 5 more order methods
}

$ kiro refactor --file src/services/OrderService.ts --analyze

[Kiro CLI Session]
🔧 Refactoring Surgeon Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Analyzing OrderService.ts...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Metrics:
  • Lines of Code: 1,247
  • Number of Methods: 43
  • Cyclomatic Complexity: 87 ⚠️  (threshold: 20)
  • Coupling (Dependencies): 7 classes
  • Cohesion: LOW (methods operate on different data)

🚨 God Class Detected!

Responsibility Clusters Identified:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Payment Operations (8 methods, 312 lines)
   → processPayment, refundPayment, captureAuthorization, etc.
   → Dependencies: PaymentGateway
   → Suggested: Extract to PaymentService

2. Email Notifications (6 methods, 189 lines)
   → sendOrderConfirmation, sendShippingNotification, etc.
   → Dependencies: EmailClient
   → Suggested: Extract to OrderNotificationService

3. Inventory Management (7 methods, 234 lines)
   → reserveInventory, releaseInventory, checkStockAvailability
   → Dependencies: InventoryService
   → Suggested: Extract to OrderInventoryCoordinator

4. Tax & Discount Calculations (9 methods, 278 lines)
   → calculateTax, applyDiscount, validateCoupon, etc.
   → Dependencies: TaxCalculator, DiscountEngine
   → Suggested: Extract to OrderPricingService

5. Shipping Coordination (5 methods, 156 lines)
   → calculateShipping, schedulePickup, trackShipment
   → Dependencies: ShippingService
   → Suggested: Extract to ShippingCoordinator

6. Core Order Domain (8 methods, 178 lines)
   → createOrder, cancelOrder, getOrderDetails
   → Dependencies: Database
   → Keep in OrderService (core domain logic)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Refactoring Plan:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Extract PaymentService (8 methods)
Step 2: Extract OrderNotificationService (6 methods)
Step 3: Extract OrderInventoryCoordinator (7 methods)
Step 4: Extract OrderPricingService (9 methods)
Step 5: Extract ShippingCoordinator (5 methods)
Step 6: Update OrderService to use new services (8 core methods remain)
Step 7: Update all imports and tests

Impact:
  • 5 new service classes created
  • OrderService reduced from 1,247 to 178 lines
  • Cyclomatic complexity: 87 → 12
  • Test files to update: 14

Proceed with refactoring? (y/n): y

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Extracting PaymentService...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Created: src/services/payments/PaymentService.ts

export class PaymentService {
  constructor(private paymentGateway: PaymentGateway) {}

  async processPayment(orderId: string, amount: number): Promise<PaymentResult> {
    // [Original logic moved here]
  }

  async refundPayment(orderId: string, amount: number): Promise<void> {
    // [Original logic moved here]
  }

  async captureAuthorization(orderId: string): Promise<void> {
    // [Original logic moved here]
  }

  // ... 5 more methods
}

✓ PaymentService extracted (8 methods, 312 lines)
✓ Tests updated: PaymentService.test.ts created

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 2: Extracting OrderNotificationService...
Created: src/services/notifications/OrderNotificationService.ts
✓ OrderNotificationService extracted (6 methods, 189 lines)
✓ Tests updated: OrderNotificationService.test.ts created

Step 3: Extracting OrderInventoryCoordinator...
Created: src/services/inventory/OrderInventoryCoordinator.ts
✓ OrderInventoryCoordinator extracted (7 methods, 234 lines)
✓ Tests updated: OrderInventoryCoordinator.test.ts created

Step 4: Extracting OrderPricingService...
Created: src/services/pricing/OrderPricingService.ts
✓ OrderPricingService extracted (9 methods, 278 lines)
✓ Tests updated: OrderPricingService.test.ts created

Step 5: Extracting ShippingCoordinator...
Created: src/services/shipping/ShippingCoordinator.ts
✓ ShippingCoordinator extracted (5 methods, 156 lines)
✓ Tests updated: ShippingCoordinator.test.ts created

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 6: Updating OrderService (core domain logic)...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Updated: src/services/OrderService.ts

export class OrderService {
  constructor(
    private db: Database,
    private paymentService: PaymentService,
    private notificationService: OrderNotificationService,
    private inventoryCoordinator: OrderInventoryCoordinator,
    private pricingService: OrderPricingService,
    private shippingCoordinator: ShippingCoordinator
  ) {}

  async createOrder(data: CreateOrderDTO): Promise<Order> {
    // 1. Reserve inventory
    await this.inventoryCoordinator.reserveInventory(data.items);

    // 2. Calculate pricing
    const pricing = await this.pricingService.calculateTotal(data);

    // 3. Create order record
    const order = await this.db.orders.create({
      ...data,
      totalAmount: pricing.total,
      status: 'pending'
    });

    // 4. Process payment
    const paymentResult = await this.paymentService.processPayment(
      order.id,
      pricing.total
    );

    // 5. Update order status
    await this.db.orders.update(order.id, { status: 'confirmed' });

    // 6. Send confirmation email
    await this.notificationService.sendOrderConfirmation(order.id);

    return order;
  }

  async cancelOrder(orderId: string): Promise<void> {
    const order = await this.db.orders.findById(orderId);

    // 1. Refund payment
    await this.paymentService.refundPayment(orderId, order.totalAmount);

    // 2. Release inventory
    await this.inventoryCoordinator.releaseInventory(order.items);

    // 3. Update order status
    await this.db.orders.update(orderId, { status: 'cancelled' });

    // 4. Send cancellation email
    await this.notificationService.sendCancellationEmail(orderId);
  }

  // ... 6 more core order methods (getOrderDetails, updateOrder, etc.)
}

New Metrics:
  • Lines of Code: 178 (was 1,247) — 86% reduction
  • Number of Methods: 8 (was 43)
  • Cyclomatic Complexity: 12 (was 87)
  • Coupling: 6 services (clean dependency injection)

✓ OrderService refactored successfully!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 7: Running Tests...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

$ npm test

> Running 247 tests...

  OrderService.test.ts
    ✓ creates order successfully (45ms)
    ✓ cancels order and refunds payment (67ms)
    ✓ handles payment failure gracefully (34ms)
    ... 23 more tests

  PaymentService.test.ts
    ✓ processes payment via gateway (23ms)
    ✓ refunds payment correctly (29ms)
    ... 14 more tests

  [All other service tests passing...]

Test Suites: 14 passed, 14 total
Tests:       247 passed, 247 total
Time:        8.456s

✅ All tests passing! Behavior preserved.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Refactoring Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Files Created: 5
  • src/services/payments/PaymentService.ts
  • src/services/notifications/OrderNotificationService.ts
  • src/services/inventory/OrderInventoryCoordinator.ts
  • src/services/pricing/OrderPricingService.ts
  • src/services/shipping/ShippingCoordinator.ts

Files Modified: 15
  • src/services/OrderService.ts (refactored)
  • 14 test files updated

Before vs After:
  • OrderService: 1,247 lines → 178 lines (86% reduction)
  • Cyclomatic Complexity: 87 → 12 (86% reduction)
  • Single Responsibility: ✓ Each service has one clear purpose
  • Testability: ✓ Services can be tested in isolation
  • Maintainability: ✓ Changes are now localized

✅ Refactoring complete! Your God Class is now 6 focused services.

Exactly what happens:

  1. Kiro analyzes the class and calculates complexity metrics
  2. It identifies clusters of methods that operate on similar data/dependencies
  3. It extracts each cluster into a new service class
  4. It updates the original class to delegate to the new services
  5. It updates all tests to match the new structure
  6. It runs the full test suite to verify behavior is preserved

The Core Question You’re Answering

“How do you use AI to perform safe, large-scale refactoring that would take humans days but preserves all behavior and tests?”

This is about teaching Kiro architectural thinking:

  • Recognizing code smells (God Classes, tight coupling)
  • Understanding the Single Responsibility Principle
  • Safely extracting classes without breaking tests
  • Maintaining dependency injection patterns

Concepts You Must Understand First

Stop and research these before coding:

  1. SOLID Principles (especially SRP)
    • What is the Single Responsibility Principle?
    • How do you identify when a class has too many responsibilities?
    • What is coupling vs cohesion?
    • Book Reference: “Clean Architecture” by Robert C. Martin - Ch. 7-8
  2. Refactoring Patterns
    • Extract Class, Extract Method, Move Method
    • How do you safely refactor without changing behavior?
    • What is the “red-green-refactor” cycle?
    • Book Reference: “Refactoring” by Martin Fowler - Ch. 6-7
  3. Code Metrics
    • What is cyclomatic complexity and why does it matter?
    • How do you measure coupling and cohesion?
    • What’s the difference between LOC and meaningful complexity?
    • Book Reference: “Code Complete” by Steve McConnell - Ch. 19
  4. Dependency Injection
    • Why use constructor injection vs property injection?
    • How does DI make code testable?
    • What is inversion of control (IoC)?
    • Book Reference: “Dependency Injection Principles, Practices, and Patterns” by Steven van Deursen & Mark Seemann - Ch. 1-2

Questions to Guide Your Design

Before implementing, think through these:

  1. Cluster Detection
    • How do you identify which methods belong together?
    • Should you cluster by data dependencies, control flow, or domain concepts?
    • What if a method uses dependencies from multiple clusters?
    • Should you use static analysis or dynamic runtime profiling?
  2. Safe Extraction
    • How do you ensure the extracted class has the same behavior?
    • What if the original methods had side effects or shared mutable state?
    • Should you extract one service at a time or all at once?
    • How do you handle private methods that are called by methods in different clusters?
  3. Test Preservation
    • Should you update tests to match the new structure or keep them as integration tests?
    • What if some tests break after refactoring?
    • How do you ensure test coverage doesn’t decrease?
    • Should you add new unit tests for the extracted services?
  4. Naming and Structure
    • How do you name the new service classes?
    • Where should they live in the directory structure?
    • Should you group them by layer (services/, repositories/) or by domain (orders/, payments/)?
    • What if the God Class is already named “OrderService” — do you rename it?

Thinking Exercise

Refactoring Decision Challenge

You have this class:

class UserService {
  // Authentication methods
  async login(email, password) { /* ... */ }
  async logout(userId) { /* ... */ }
  async resetPassword(email) { /* ... */ }

  // Profile management
  async updateProfile(userId, data) { /* ... */ }
  async uploadAvatar(userId, file) { /* ... */ }

  // Notification preferences
  async updateEmailPreferences(userId, prefs) { /* ... */ }
  async updatePushPreferences(userId, prefs) { /* ... */ }

  // Analytics
  async trackLogin(userId) { /* ... */ }
  async trackProfileView(userId, viewerId) { /* ... */ }
}

Questions to reason through:

  1. How many services should you extract? (3? 4?)
  2. Should login and logout go in AuthenticationService or SessionService?
  3. The trackLogin method depends on login — which service should it live in?
  4. After refactoring, should the original UserService still exist? What should it contain?
  5. What if 50 files import UserService — how do you update them all safely?

Proposed structure:

AuthenticationService: login, logout, resetPassword
ProfileService: updateProfile, uploadAvatar
NotificationPreferenceService: updateEmailPreferences, updatePushPreferences
UserAnalyticsService: trackLogin, trackProfileView

Is this optimal? What would you change?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is a God Class and why is it a code smell?”
  2. “How do you safely refactor a class without breaking existing functionality?”
  3. “What’s the difference between coupling and cohesion? Give examples.”
  4. “You extract a service but tests start failing. What’s your debugging strategy?”
  5. “How would you refactor a God Class that has circular dependencies?”
  6. “What metrics do you use to identify classes that need refactoring?”

Hints in Layers

Hint 1: Start with Static Analysis Use TypeScript Compiler API to parse the class:

const sourceFile = ts.createSourceFile(filename, code, ts.ScriptTarget.Latest);
const classNode = findClassDeclaration(sourceFile, 'OrderService');
const methods = classNode.members.filter(m => ts.isMethodDeclaration(m));

// For each method, extract:
// - Dependencies used (from constructor params or method args)
// - Other methods called
// - Properties accessed

This gives you a dependency graph to cluster methods.
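
The findClassDeclaration helper above is assumed rather than built in; a minimal version using the same Compiler API:

import * as ts from 'typescript';

function findClassDeclaration(sourceFile: ts.SourceFile, name: string): ts.ClassDeclaration | undefined {
  let found: ts.ClassDeclaration | undefined;
  const visit = (node: ts.Node) => {
    if (ts.isClassDeclaration(node) && node.name?.text === name) {
      found = node;
    }
    ts.forEachChild(node, visit);
  };
  visit(sourceFile);
  return found;
}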

Hint 2: Cluster by Dependencies

For each method:
  1. Extract dependencies it uses (PaymentGateway, EmailClient, etc.)
  2. Extract other methods it calls
  3. Group methods that share >70% of dependencies

Example:
  processPayment → uses PaymentGateway
  refundPayment → uses PaymentGateway
  captureAuthorization → uses PaymentGateway
  → Cluster: PaymentService
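
A rough TypeScript sketch of that grouping, assuming the per-method dependency sets come from the static analysis in Hint 1 (the 70% overlap threshold is just the heuristic above, not a fixed rule):

// methodDeps: method name → set of dependencies it touches (e.g. "PaymentGateway")
function clusterMethods(methodDeps: Map<string, Set<string>>): string[][] {
  const clusters: { deps: Set<string>; methods: string[] }[] = [];
  const jaccard = (a: Set<string>, b: Set<string>) => {
    const inter = [...a].filter(x => b.has(x)).length;
    const union = new Set([...a, ...b]).size;
    return union === 0 ? 0 : inter / union;
  };
  for (const [method, deps] of methodDeps) {
    const home = clusters.find(c => jaccard(c.deps, deps) > 0.7);
    if (home) {
      home.methods.push(method);
      deps.forEach(d => home.deps.add(d));
    } else {
      clusters.push({ deps: new Set(deps), methods: [method] });
    }
  }
  return clusters.map(c => c.methods);
}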

Hint 3: Extract Class Refactoring For each cluster:

  1. Generate new class file
  2. Move methods to new class
  3. Add constructor with required dependencies
  4. Update original class to instantiate new service
  5. Replace direct calls with this.paymentService.processPayment(...)

Hint 4: Test Validation After each extraction:

  1. Run full test suite: npm test
  2. If tests fail, analyze failures:
    • Missing imports? → Add imports
    • Constructor signature changed? → Update DI container
    • Method signature changed? → Verify parameters match
  3. If all tests pass, commit the refactoring
  4. If tests fail repeatedly, rollback and try a different clustering
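
A minimal sketch of that verify-or-rollback loop, assuming the extraction was applied to the working tree but not yet committed (the git commands are illustrative):

import { execSync } from 'node:child_process';

function verifyOrRollback(message: string): boolean {
  try {
    execSync('npm test', { stdio: 'inherit' });                            // throws on a failing suite
    execSync(`git add -A && git commit -m "${message}"`, { stdio: 'inherit' });
    return true;
  } catch {
    execSync('git checkout -- . && git clean -fd', { stdio: 'inherit' });  // discard the attempt
    return false;
  }
}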

Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| SOLID principles | “Clean Architecture” by Robert C. Martin | Ch. 7-8 |
| Refactoring patterns | “Refactoring” by Martin Fowler | Ch. 6-7 |
| Code metrics | “Code Complete” by Steve McConnell | Ch. 19 |
| Dependency injection | “Dependency Injection” by Steven van Deursen | Ch. 1-2 |
| Software design | “Domain-Driven Design” by Eric Evans | Ch. 5 |

Common Pitfalls and Debugging

Problem 1: “Tests fail after extraction”

  • Why: Constructor signature changed but DI container wasn’t updated
  • Fix: Update all places where OrderService is instantiated (DI config, test setup)
  • Quick test: Search for new OrderService( and verify all constructor calls match

Problem 2: “Extracted service has circular dependency”

  • Why: Two clusters reference each other’s methods
  • Fix: Introduce an interface or event bus to break the cycle
  • Quick test: Draw dependency graph — if there’s a cycle, refactor to one-way dependencies

Problem 3: “Method calls private methods from different clusters”

  • Why: Private method is utility logic used by multiple responsibilities
  • Fix: Extract private method to a separate utility class or make it public in the appropriate service
  • Quick test: Analyze private method dependencies — does it belong in one cluster more than others?

Problem 4: “Extracted services are too granular (over-engineering)”

  • Why: Created too many small services (e.g., separate service for each method)
  • Fix: Merge related services (e.g., PaymentService + RefundService → PaymentService)
  • Quick test: If a service has <3 methods and low complexity, consider merging

Definition of Done

  • God Class is identified (cyclomatic complexity >20, >500 lines, or >10 methods)
  • Responsibility clusters are detected programmatically (shared dependencies)
  • Each cluster is extracted into a new service class with focused responsibility
  • Original class delegates to new services via constructor injection
  • All tests pass after refactoring (behavior is preserved)
  • Cyclomatic complexity is reduced by >50%
  • Each new service has <10 methods and single responsibility
  • Test coverage remains the same or improves

Project 39: “The API Client Generator” — Integration

| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | TypeScript / Python |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Integration |

What you’ll build: An OpenAPI-to-SDK generator that reads openapi.yaml specifications and produces fully-typed client libraries in TypeScript or Python with request/response validation, error handling, and authentication support.

Why it teaches Automation: This project eliminates the manual work of writing API client boilerplate. You’ll learn how to use Kiro to generate production-ready SDKs that stay in sync with your API spec.

Core challenges you’ll face:

  • OpenAPI spec parsing → Maps to YAML/JSON parsing, JSON Schema validation
  • Code generation → Maps to template engines, AST builders
  • Type safety → Maps to TypeScript interfaces from JSON Schema
  • Authentication patterns → Maps to API key, Bearer token, OAuth2 flow implementation

Real World Outcome

You’ll have a CLI tool that transforms an OpenAPI spec into a production-ready, typed SDK:

$ cat openapi.yaml
openapi: 3.0.0
info:
  title: Task Management API
  version: 1.0.0
servers:
  - url: https://api.example.com/v1
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
  schemas:
    Task:
      type: object
      required: [id, title, status]
      properties:
        id:
          type: string
          format: uuid
        title:
          type: string
          minLength: 1
          maxLength: 200
        description:
          type: string
          nullable: true
        status:
          type: string
          enum: [todo, in_progress, done]
        dueDate:
          type: string
          format: date-time
          nullable: true
        tags:
          type: array
          items:
            type: string
paths:
  /tasks:
    get:
      summary: List all tasks
      security:
        - bearerAuth: []
      parameters:
        - name: status
          in: query
          schema:
            type: string
            enum: [todo, in_progress, done]
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            minimum: 1
            maximum: 100
      responses:
        '200':
          description: Success
          content:
            application/json:
              schema:
                type: object
                properties:
                  tasks:
                    type: array
                    items:
                      $ref: '#/components/schemas/Task'
                  total:
                    type: integer
    post:
      summary: Create a task
      security:
        - bearerAuth: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [title]
              properties:
                title:
                  type: string
                description:
                  type: string
                status:
                  type: string
                  enum: [todo, in_progress, done]
                  default: todo
      responses:
        '201':
          description: Task created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Task'
  /tasks/{taskId}:
    get:
      summary: Get task by ID
      security:
        - bearerAuth: []
      parameters:
        - name: taskId
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Success
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Task'

$ kiro generate-sdk --spec openapi.yaml --language typescript --output task-api-client

[Kiro CLI Session]
📦 API Client Generator
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Parsing OpenAPI spec...
  ✓ OpenAPI version: 3.0.0
  ✓ API title: Task Management API
  ✓ Base URL: https://api.example.com/v1
  ✓ Security: Bearer Token (HTTP)
  ✓ Schemas: 1 (Task)
  ✓ Endpoints: 3 (GET /tasks, POST /tasks, GET /tasks/{taskId})

Step 2: Generating TypeScript types...
  ✓ Generated: src/types.ts
  ✓ Generated: src/schemas.ts (Zod validators)

Step 3: Generating API client...
  ✓ Generated: src/client.ts
  ✓ Generated: src/endpoints/TasksApi.ts

Step 4: Generating tests...
  ✓ Generated: tests/client.test.ts

Step 5: Setting up project...
  ✓ package.json created
  ✓ tsconfig.json created
  ✓ README.md created

✅ SDK generated successfully!

$ tree task-api-client/
task-api-client/
├── src/
│   ├── client.ts          (Main API client class)
│   ├── types.ts           (TypeScript interfaces)
│   ├── schemas.ts         (Zod runtime validators)
│   └── endpoints/
│       └── TasksApi.ts    (Task endpoints wrapper)
├── tests/
│   └── client.test.ts
├── package.json
├── tsconfig.json
└── README.md

$ cat task-api-client/src/types.ts
// Auto-generated from openapi.yaml — DO NOT EDIT

export interface Task {
  id: string;           // UUID
  title: string;        // 1-200 chars
  description?: string | null;
  status: 'todo' | 'in_progress' | 'done';
  dueDate?: string | null;  // ISO 8601 date-time
  tags?: string[];
}

export interface ListTasksRequest {
  status?: 'todo' | 'in_progress' | 'done';
  limit?: number;  // 1-100, default: 20
}

export interface ListTasksResponse {
  tasks: Task[];
  total: number;
}

export interface CreateTaskRequest {
  title: string;
  description?: string;
  status?: 'todo' | 'in_progress' | 'done';
}

$ cat task-api-client/src/client.ts
import axios, { AxiosInstance } from 'axios';
import { TasksApi } from './endpoints/TasksApi';

export class TaskApiClient {
  private axiosInstance: AxiosInstance;
  public tasks: TasksApi;

  constructor(config: {
    baseURL?: string;
    apiKey: string;
    timeout?: number;
  }) {
    this.axiosInstance = axios.create({
      baseURL: config.baseURL || 'https://api.example.com/v1',
      timeout: config.timeout || 30000,
      headers: {
        'Authorization': `Bearer ${config.apiKey}`,
        'Content-Type': 'application/json',
      },
    });

    // Initialize endpoint wrappers
    this.tasks = new TasksApi(this.axiosInstance);
  }
}

$ cat task-api-client/src/endpoints/TasksApi.ts
import { AxiosInstance } from 'axios';
import {
  Task,
  ListTasksRequest,
  ListTasksResponse,
  CreateTaskRequest,
} from '../types';
import { TaskSchema, ListTasksResponseSchema } from '../schemas';

export class TasksApi {
  constructor(private axios: AxiosInstance) {}

  /**
   * List all tasks
   * @param params - Query parameters
   * @returns List of tasks with total count
   */
  async list(params?: ListTasksRequest): Promise<ListTasksResponse> {
    const response = await this.axios.get<ListTasksResponse>('/tasks', {
      params,
    });

    // Runtime validation with Zod
    return ListTasksResponseSchema.parse(response.data);
  }

  /**
   * Create a new task
   * @param data - Task creation payload
   * @returns The created task
   */
  async create(data: CreateTaskRequest): Promise<Task> {
    const response = await this.axios.post<Task>('/tasks', data);
    return TaskSchema.parse(response.data);
  }

  /**
   * Get task by ID
   * @param taskId - Task UUID
   * @returns The task details
   */
  async get(taskId: string): Promise<Task> {
    const response = await this.axios.get<Task>(`/tasks/${taskId}`);
    return TaskSchema.parse(response.data);
  }
}

$ cat task-api-client/src/schemas.ts
import { z } from 'zod';

export const TaskSchema = z.object({
  id: z.string().uuid(),
  title: z.string().min(1).max(200),
  description: z.string().nullable().optional(),
  status: z.enum(['todo', 'in_progress', 'done']),
  dueDate: z.string().datetime().nullable().optional(),
  tags: z.array(z.string()).optional(),
});

export const ListTasksResponseSchema = z.object({
  tasks: z.array(TaskSchema),
  total: z.number().int(),
});

$ cat task-api-client/README.md
# Task Management API Client

Auto-generated TypeScript SDK for Task Management API.

## Installation

```bash
npm install task-api-client
```

## Usage

```typescript
import { TaskApiClient } from 'task-api-client';

const client = new TaskApiClient({
  apiKey: process.env.API_KEY!,
});

// List tasks
const { tasks, total } = await client.tasks.list({
  status: 'todo',
  limit: 10,
});

// Create a task
const newTask = await client.tasks.create({
  title: 'Implement user authentication',
  description: 'Add JWT-based auth flow',
  status: 'todo',
});

// Get task by ID
const task = await client.tasks.get(newTask.id);
```

## Features

✓ Full TypeScript type safety
✓ Runtime validation with Zod
✓ Automatic authentication (Bearer token)
✓ Error handling and retries
✓ Request/response interceptors
✓ Auto-generated from OpenAPI 3.0 spec

$ cd task-api-client && npm install && npm test

> task-api-client@1.0.0 test
> jest

PASS tests/client.test.ts
  TaskApiClient
    ✓ creates client with API key (12ms)
    ✓ lists tasks with filters (45ms)
    ✓ creates a new task (34ms)
    ✓ validates response schema (23ms)
    ✓ throws error on invalid status (18ms)

Tests:       5 passed, 5 total
Time:        2.134s

✅ SDK is ready to publish!


Exactly what happens:

  1. Kiro parses the OpenAPI spec and extracts schemas, endpoints, and auth requirements
  2. It generates TypeScript interfaces from JSON Schema definitions
  3. It creates Zod validators for runtime type safety
  4. It generates an API client class with typed methods for each endpoint
  5. It adds authentication, error handling, and request validation
  6. It produces a complete npm package ready to publish

The Core Question You’re Answering

“How do you automatically generate production-ready API clients that stay in sync with your OpenAPI spec and provide full type safety?”

This is about code generation as a force multiplier:

  • Eliminating manual SDK maintenance (API changes → regenerate SDK)
  • Providing better DX than hand-written clients (types, validation, docs)
  • Ensuring client and server contracts match (single source of truth)

Concepts You Must Understand First

Stop and research these before coding:

  1. OpenAPI Specification 3.0
    • What are paths, operations, parameters, requestBody, responses?
    • How do JSON Schema definitions map to types?
    • What are $ref references and how do you resolve them?
    • Reference: OpenAPI 3.0 Specification (https://spec.openapis.org/oas/v3.0.3)
  2. Code Generation Strategies
    • Template engines (Handlebars, EJS) vs AST builders (TypeScript Compiler API)
    • When to use string concatenation vs structured code generation?
    • How do you generate readable, idiomatic code?
    • Book Reference: “Code Generation in Action” by Jack Herrington
  3. Runtime Validation
    • Why use Zod, Yup, or io-ts for runtime type checking?
    • What’s the difference between compile-time types (TypeScript) and runtime validation?
    • How do you handle optional fields, nullable types, and unions?
    • Reference: Zod documentation (https://zod.dev)
  4. HTTP Client Patterns
    • Axios vs Fetch API — which to use for generated clients?
    • How do you handle authentication (API keys, Bearer tokens, OAuth2)?
    • What about request interceptors, retries, and error handling?
    • Book Reference: “RESTful Web API Patterns & Practices” by Mike Amundsen

Questions to Guide Your Design

Before implementing, think through these:

  1. Spec Parsing
    • How do you resolve $ref pointers to schemas in other files?
    • What if the spec has circular references (Task → User → Task)?
    • Should you validate the OpenAPI spec before generating code?
    • How do you handle deprecated endpoints or parameters?
  2. Type Generation
    • Should you generate interfaces or types? (interface Task vs type Task)
    • How do you handle discriminated unions (polymorphic schemas)?
    • What about enums — should they be TypeScript enums or union types?
    • How do you generate JSDoc comments from OpenAPI descriptions?
  3. SDK Structure
    • Should you group endpoints by tags or paths?
    • Should each endpoint be a method or a separate class?
    • How do you handle pagination, filtering, sorting?
    • What about file uploads (multipart/form-data)?
  4. Versioning and Updates
    • If the API spec changes, how do you regenerate without breaking client code?
    • Should you version the SDK independently of the API?
    • How do you handle breaking vs non-breaking changes?
    • Should the generator produce a diff showing what changed?

Thinking Exercise

Type Generation Challenge

You have this OpenAPI schema:

components:
  schemas:
    User:
      type: object
      required: [id, email, role]
      properties:
        id:
          type: string
        email:
          type: string
          format: email
        role:
          type: string
          enum: [admin, member, guest]
        profile:
          oneOf:
            - $ref: '#/components/schemas/AdminProfile'
            - $ref: '#/components/schemas/MemberProfile'
    AdminProfile:
      type: object
      properties:
        permissions:
          type: array
          items:
            type: string
    MemberProfile:
      type: object
      properties:
        joinedAt:
          type: string
          format: date-time

Questions to reason through:

  1. Should role be a TypeScript enum or a union type ('admin' | 'member' | 'guest')?
  2. How do you represent oneOf in TypeScript? (Union type? Discriminated union?)
  3. The profile field depends on role — can you enforce this at the type level?
  4. Should you generate separate types for AdminProfile and MemberProfile or inline them?
  5. How would Zod validate the oneOf relationship at runtime?

Generated types:

export type UserRole = 'admin' | 'member' | 'guest';

export interface AdminProfile {
  permissions: string[];
}

export interface MemberProfile {
  joinedAt: string;  // ISO 8601 date-time
}

export interface User {
  id: string;
  email: string;
  role: UserRole;
  profile: AdminProfile | MemberProfile;
}

Is this type-safe enough? How would you improve it?
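
One possible tightening for question 3 (a sketch, not what the generator above emits): key the union on role so the profile type follows the role at compile time.

export interface AdminUser {
  id: string;
  email: string;
  role: 'admin';
  profile: AdminProfile;
}

export interface MemberUser {
  id: string;
  email: string;
  role: 'member';
  profile: MemberProfile;
}

export interface GuestUser {
  id: string;
  email: string;
  role: 'guest';
  profile?: never;   // guests carry no profile in this sketch
}

export type User = AdminUser | MemberUser | GuestUser;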

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How would you handle versioning in a generated SDK? (e.g., API v1 vs v2)”
  2. “Your OpenAPI spec has a circular reference. How do you generate types without infinite loops?”
  3. “What’s the difference between compile-time type safety and runtime validation? Why do you need both?”
  4. “How would you handle authentication in a generated client? (API keys, OAuth2, etc.)”
  5. “The API spec changes frequently. How do you keep the SDK in sync without manual work?”
  6. “Should you generate one giant SDK or multiple packages per API resource?”

Hints in Layers

Hint 1: Parse OpenAPI with a Library Don’t parse YAML manually — use a library:

import SwaggerParser from '@apidevtools/swagger-parser';

const api = await SwaggerParser.dereference('openapi.yaml');
// This resolves all $ref pointers into a single object

const paths = api.paths;
const schemas = api.components.schemas;

Hint 2: Generate Types from JSON Schema For each schema in components.schemas:

function generateInterface(name: string, schema: any): string {
  const required = schema.required || [];
  const properties = Object.entries(schema.properties || {})
    .map(([key, prop]: [string, any]) => {
      const optional = !required.includes(key) ? '?' : '';
      const type = mapJsonSchemaTypeToTS(prop);
      return `  ${key}${optional}: ${type};`;
    })
    .join('\n');

  return `export interface ${name} {\n${properties}\n}`;
}

function mapJsonSchemaTypeToTS(schema: any): string {
  if (schema.type === 'string') {
    if (schema.enum) return schema.enum.map(v => `'${v}'`).join(' | ');
    return 'string';
  }
  if (schema.type === 'number' || schema.type === 'integer') return 'number';
  if (schema.type === 'boolean') return 'boolean';
  if (schema.type === 'array') return `${mapJsonSchemaTypeToTS(schema.items)}[]`;
  if (schema.type === 'object') return 'Record<string, any>';
  return 'any';
}
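
A companion sketch (an assumption, not part of the hint above) that emits Zod validator source the same way, so runtime checks mirror the compile-time types:

function mapJsonSchemaTypeToZod(schema: any): string {
  if (schema.type === 'string') {
    if (schema.enum) return `z.enum([${schema.enum.map((v: string) => `'${v}'`).join(', ')}])`;
    if (schema.format === 'uuid') return 'z.string().uuid()';
    return 'z.string()';
  }
  if (schema.type === 'number' || schema.type === 'integer') return 'z.number()';
  if (schema.type === 'boolean') return 'z.boolean()';
  if (schema.type === 'array') return `z.array(${mapJsonSchemaTypeToZod(schema.items)})`;
  return 'z.unknown()';
}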

Hint 3: Generate API Methods For each endpoint in paths:

function generateMethod(path: string, method: string, operation: any): string {
  const functionName = operation.operationId || generateOperationId(path, method);
  const params = extractParameters(operation);
  const requestBody = operation.requestBody;
  const response = operation.responses['200'] || operation.responses['201'];

  return `
  async ${functionName}(${params}): Promise<${getResponseType(response)}> {
    const response = await this.axios.${method}('${path}', ${getRequestConfig()});
    return ${getResponseSchema()}.parse(response.data);
  }
  `;
}

Hint 4: Add Authentication Generate authentication logic based on securitySchemes:

if (spec.components.securitySchemes.bearerAuth) {
  // Add Bearer token to headers
  headers['Authorization'] = `Bearer ${config.apiKey}`;
}

if (spec.components.securitySchemes.apiKey) {
  // Add API key to query or header
  const apiKeyLocation = spec.components.securitySchemes.apiKey.in;
  if (apiKeyLocation === 'header') {
    headers[spec.components.securitySchemes.apiKey.name] = config.apiKey;
  }
}

Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| OpenAPI fundamentals | “Designing Web APIs” by Brenda Jin et al. | Ch. 3-4 |
| Code generation | “Code Generation in Action” by Jack Herrington | Ch. 1-2 |
| TypeScript types | “Programming TypeScript” by Boris Cherny | Ch. 6 |
| REST API patterns | “RESTful Web API Patterns” by Mike Amundsen | Ch. 5 |

Common Pitfalls and Debugging

Problem 1: “Generated types don’t match runtime data”

  • Why: OpenAPI spec is out of sync with actual API responses
  • Fix: Add runtime validation with Zod to catch mismatches early
  • Quick test: Call real API and log response.data — does it match the schema?

Problem 2: “Circular references cause infinite loop”

  • Why: Schema A references Schema B, which references Schema A
  • Fix: Use lazy evaluation in Zod: z.lazy(() => UserSchema)
  • Quick test: Try generating types for User → Team → User — does it terminate?
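
A small sketch of that fix (the User and Team shapes here are hypothetical, just to show the pattern):

import { z } from 'zod';

interface Team { name: string; members: User[] }
interface User { id: string; team?: Team }

// z.lazy defers schema construction, so the mutual reference terminates.
const TeamSchema: z.ZodType<Team> = z.lazy(() =>
  z.object({ name: z.string(), members: z.array(UserSchema) })
);
const UserSchema: z.ZodType<User> = z.lazy(() =>
  z.object({ id: z.string(), team: TeamSchema.optional() })
);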

Problem 3: “Generated code is unreadable”

  • Why: Long lines, no formatting, missing comments
  • Fix: Run generated code through Prettier: prettier --write src/**/*.ts
  • Quick test: Open generated file — would you be comfortable editing it?

Problem 4: “Authentication doesn’t work”

  • Why: Security scheme in spec doesn’t match actual API requirements
  • Fix: Test generated client against real API with curl equivalent
  • Quick test: await client.tasks.list() — does it return 401 or actual data?

Definition of Done

  • OpenAPI spec is parsed and validated
  • TypeScript interfaces are generated for all schemas
  • Zod validators are generated for runtime validation
  • API client class is generated with typed methods for each endpoint
  • Authentication is implemented based on securitySchemes
  • Error handling and request interceptors are included
  • Generated SDK can successfully call at least one endpoint
  • package.json, tsconfig.json, and README are generated
  • Generated code compiles without errors
  • Tests pass for basic SDK functionality

Project 40: “The Autonomous Developer (Capstone)” — Full Agentic Mastery

| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Polyglot |
| Coolness Level | Level 5: Pure Magic |
| Business Potential | 5. Industry Disruptor (Agentic Workflow) |
| Difficulty | Level 5: Master |
| Knowledge Area | Full Agentic Mastery |

What you’ll build: A fully autonomous CI/CD healing agent that monitors GitHub Actions, detects failures, diagnoses root causes, patches code, runs tests, and opens pull requests—all without human intervention.

Why it teaches Mastery: This capstone project combines every skill from Projects 1-39: headless operation, hooks, MCP servers, shell tools, reasoning, context management, and multi-agent orchestration. If you can build this, you’ve mastered Kiro.

Core challenges you’ll face:

  • Headless GitHub Actions monitoring → Maps to GitHub API polling, webhook handling
  • Log analysis and root cause diagnosis → Maps to error pattern matching, stack trace parsing
  • Autonomous code patching → Maps to multi-agent collaboration, test-driven fixes
  • Verification loop → Maps to running tests, validating fixes before PR
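
The first challenge above (webhook handling) can be sketched as a small Express listener; the endpoint path, port, and hand-off to Kiro are illustrative assumptions:

import express from 'express';
import crypto from 'node:crypto';

const app = express();
// Keep the raw payload so the HMAC matches exactly what GitHub signed.
app.use(express.json({ verify: (req: any, _res, buf) => { req.rawBody = buf; } }));

const SECRET = process.env.WEBHOOK_SECRET ?? '';

app.post('/webhook', (req: any, res) => {
  const expected = 'sha256=' + crypto.createHmac('sha256', SECRET).update(req.rawBody).digest('hex');
  const received = req.get('x-hub-signature-256') ?? '';
  const ok = received.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(received));
  if (!ok) return res.status(401).send('bad signature');

  const { action, workflow_run } = req.body;
  if (action === 'completed' && workflow_run?.conclusion === 'failure') {
    // Hand off to the diagnose-and-fix pipeline (e.g. a headless Kiro run).
    console.log(`CI failed: run ${workflow_run.id} on ${workflow_run.head_branch}`);
  }
  return res.sendStatus(200);
});

app.listen(3000);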

Real World Outcome

You’ll have a system that automatically fixes broken CI/CD pipelines:

# Setup: Deploy the autonomous agent to a server
$ kiro autonomous-dev setup --repo myorg/my-app --webhook-url https://my-server.com/webhook

[Kiro CLI Session]
🤖 Autonomous Developer Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step 1: Configuring GitHub webhook...
  ✓ Webhook URL: https://my-server.com/webhook
  ✓ Events: workflow_run, push
  ✓ Secret: ••••••••

Step 2: Starting headless Kiro agent...
  ✓ Listening for GitHub Actions failures
  ✓ Agent running in background (PID: 12345)
  ✓ Logs: /var/log/kiro-agent.log

✅ Autonomous agent deployed!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

[Time passes... a GitHub Action fails]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[2024-12-20 14:32:15] Webhook received: workflow_run.failed
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Workflow: CI
Run ID: 123456789
Commit: a3f2c1d
Branch: feature/add-user-auth
Triggered by: @developer123

Step 1: Fetching logs...
  ✓ Downloaded logs from GitHub Actions

Step 2: Analyzing failure...

Logs:
  Run npm test
    FAIL src/services/AuthService.test.ts
      AuthService
        ✕ should validate JWT token (234ms)

    ● AuthService › should validate JWT token

      TypeError: Cannot read property 'split' of undefined
        at AuthService.validateToken (src/services/AuthService.ts:45:23)
        at Object.<anonymous> (src/services/AuthService.test.ts:67:34)

  Test Suites: 1 failed, 15 passed, 16 total
  Tests:       1 failed, 187 passed, 188 total

🔍 Diagnosis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  • Error: TypeError: Cannot read property 'split' of undefined
  • File: src/services/AuthService.ts:45
  • Method: validateToken
  • Root cause: Missing null check before calling .split()
  • Impact: JWT validation crashes when token is undefined/null

Proposed fix:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Add null check before token.split() call

Step 3: Reading source code...
  ✓ Fetched src/services/AuthService.ts

Current code (line 45):
  const parts = token.split('.');

Suggested fix:
  if (!token) {
    throw new Error('Token is required');
  }
  const parts = token.split('.');

Step 4: Applying fix...
  ✓ Created branch: fix/auth-service-null-check
  ✓ Modified: src/services/AuthService.ts
  ✓ Committed: "Fix null check in AuthService.validateToken"

Step 5: Running tests locally (headless)...
  ✓ Checked out fix/auth-service-null-check
  ✓ npm install (cached, 2.3s)
  ✓ npm test

  PASS src/services/AuthService.test.ts
    AuthService
      ✓ should validate JWT token (89ms)
      ✓ should throw error for null token (23ms)  ← NEW TEST ADDED
      ✓ should throw error for invalid token (34ms)

  Test Suites: 16 passed, 16 total
  Tests:       189 passed, 189 total  ← ALL PASSING!

Step 6: Opening pull request...
  ✓ Pushed branch to GitHub
  ✓ PR #456 opened: "Fix: Add null check in AuthService.validateToken"

PR Description:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

## Summary
Fixes CI failure caused by missing null check in `AuthService.validateToken`.

## Root Cause
The method attempted to call `.split()` on a potentially `null` or `undefined` token, causing a TypeError when tests passed `null` as input.

## Changes
- Added null check before `token.split()`
- Throws descriptive error when token is missing
- All tests now pass ✅

## Testing
- ✓ Existing tests pass
- ✓ New test added for null token case
- ✓ CI pipeline successful

## Autonomous Fix
🤖 This PR was automatically generated by Kiro Autonomous Developer Agent.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

✅ Fix deployed!

PR URL: https://github.com/myorg/my-app/pull/456
Status: Awaiting review
CI Status: ✅ All checks passing

[Agent returns to monitoring mode...]

What just happened:

  1. GitHub Actions workflow failed
  2. Agent received webhook notification
  3. Agent downloaded and analyzed failure logs
  4. Agent diagnosed the root cause (null check missing)
  5. Agent read the source code and generated a fix
  6. Agent created a branch, committed the fix, and ran tests
  7. Agent verified all tests pass
  8. Agent opened a PR with full context and test results

This is full autonomy — zero human intervention required.

The Core Question You’re Answering

“Can you build an AI agent that operates completely autonomously in a production environment, diagnosing and fixing real failures without human guidance?”

This is the culmination of everything:

  • Headless operation (no interactive prompts)
  • Event-driven architecture (webhooks trigger actions)
  • Multi-step reasoning (diagnosis → fix → verify)
  • Safe automation (tests must pass before PR)
  • Production-ready (handles edge cases, logs all actions)

Concepts You Must Understand First

Stop and research these before coding:

  1. GitHub Actions and Webhooks
    • How do GitHub webhooks work? (delivery, signatures, retries)
    • How do you download workflow logs via GitHub API?
    • What information is in a workflow_run event?
    • Reference: GitHub Webhooks Documentation
  2. Headless Automation
    • How do you run Kiro without interactive mode?
    • How do you pass instructions via environment variables or config files?
    • How do you handle errors when there’s no human to ask?
    • Book Reference: “Continuous Delivery” by Jez Humble - Ch. 10
  3. Root Cause Analysis
    • How do you parse stack traces programmatically?
    • What patterns indicate common failure types? (null checks, type errors, async issues)
    • How do you distinguish flaky tests from real bugs?
    • Book Reference: “Release It!” by Michael Nygard - Ch. 4
  4. Test-Driven Fixes
    • How do you verify a fix is correct without human review?
    • Should the agent add new tests for the failure case?
    • What if the fix causes other tests to fail?
    • Book Reference: “Test Driven Development” by Kent Beck - Ch. 1-2

Questions to Guide Your Design

Before implementing, think through these:

  1. Event Handling
    • How do you ensure webhook deliveries aren’t lost? (queue, retry logic)
    • What if multiple workflows fail simultaneously?
    • Should the agent handle one failure at a time or in parallel?
    • How do you prevent duplicate fixes for the same failure?
  2. Diagnosis
    • How do you extract the root cause from logs? (regex patterns, LLM analysis)
    • What if the logs don’t have enough information?
    • Should the agent ask Kiro to analyze logs or use static patterns?
    • How do you handle flaky tests (failures that pass on retry)?
  3. Fix Generation
    • Should the agent always attempt a fix or only for certain error types?
    • What if Kiro generates a fix that makes things worse?
    • Should the agent rollback if tests fail after the fix?
    • How do you prevent infinite loops (fix → test fail → new fix → …)?
  4. Safety and Approval
    • Should all PRs auto-merge or require human review?
    • What if the agent opens 100 PRs in a day?
    • Should there be a “dry-run” mode that shows what it would do?
    • How do you audit all agent actions?

Thinking Exercise

Autonomous Decision Tree

The agent encounters this failure:

Error: ECONNREFUSED
  at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16)

Tests failed: 12 / 188

Questions to reason through:

  1. Is this a code bug or an infrastructure issue? (Database not running?)
  2. Should the agent attempt a code fix or just notify a human?
  3. If it’s a missing service, how does the agent start it?
  4. What if the error is intermittent (connection refused sometimes)?
  5. Should the agent retry the workflow or fix the code first?

Decision tree:

Is error deterministic? (same failure every time)
  Yes → Attempt code fix
  No → Mark as flaky, notify human

Is error in application code or infrastructure?
  Application → Generate code patch
  Infrastructure → Notify ops team

Did fix pass tests?
  Yes → Open PR
  No → Rollback, try alternative fix
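
The same triage logic can be sketched as a small function (FailureSignal and TriageAction are made-up names for this exercise, not part of Kiro):

// Illustrative triage sketch: encodes the decision tree above
type FailureSignal = {
  deterministic: boolean;                   // same failure on every run?
  layer: 'application' | 'infrastructure';
  fixPassedTests?: boolean;                 // set once a candidate fix has been tested
};

type TriageAction =
  | 'attempt-code-fix'
  | 'mark-flaky-and-notify'
  | 'notify-ops-team'
  | 'open-pr'
  | 'rollback-and-retry';

function triage(signal: FailureSignal): TriageAction {
  if (!signal.deterministic) return 'mark-flaky-and-notify';
  if (signal.layer === 'infrastructure') return 'notify-ops-team';
  if (signal.fixPassedTests === undefined) return 'attempt-code-fix';
  return signal.fixPassedTests ? 'open-pr' : 'rollback-and-retry';
}

// ECONNREFUSED in 12 of 188 tests is most likely infrastructure, not a code bug
console.log(triage({ deterministic: true, layer: 'infrastructure' })); // notify-ops-team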

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How would you prevent the autonomous agent from making things worse?”
  2. “What if the agent generates an infinite loop of PRs?”
  3. “How do you ensure the agent doesn’t leak secrets or sensitive data?”
  4. “What happens if the agent’s fix causes a production outage?”
  5. “How would you audit all actions taken by the autonomous agent?”
  6. “Should the agent have access to merge PRs or only create them?”

Hints in Layers

Hint 1: Webhook Server

Set up an Express server to receive GitHub webhooks:

import express from 'express';
import crypto from 'crypto';

const app = express();
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET!;

// GitHub signs the raw request body, so capture it before JSON parsing
const jsonWithRawBody = express.json({
  verify: (req, _res, buf) => { (req as any).rawBody = buf; },
});

app.post('/webhook', jsonWithRawBody, async (req, res) => {
  // Verify signature using a constant-time comparison
  const signature = String(req.headers['x-hub-signature-256'] ?? '');
  const hmac = crypto.createHmac('sha256', WEBHOOK_SECRET);
  const digest = 'sha256=' + hmac.update((req as any).rawBody).digest('hex');

  const valid = signature.length === digest.length &&
    crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(digest));

  if (!valid) {
    return res.status(401).send('Invalid signature');
  }

  // Only react to completed workflow runs that failed
  if (req.body.action === 'completed' && req.body.workflow_run?.conclusion === 'failure') {
    await handleWorkflowFailure(req.body.workflow_run);
  }

  res.status(200).send('OK');
});

Hint 2: Log Analysis

Download logs and extract the failure:

// downloadWorkflowRunLogs returns a ZIP archive of the run's logs,
// so unzip it to plain text before searching (e.g. with a zip library)
const response = await octokit.actions.downloadWorkflowRunLogs({
  owner,
  repo,
  run_id,
});
const logs = unzipToText(response.data); // unzipToText: a small helper you'd write

// Parse logs to find the first error and its location in the stack trace
const errorPattern = /Error: (.+)\n\s+at .*\((.+):(\d+):(\d+)\)/;
const match = logs.match(errorPattern);

if (match) {
  const [, message, file, line, column] = match;
  return { message, file, line: parseInt(line, 10), column: parseInt(column, 10) };
}

Hint 3: Headless Kiro Invocation

Run Kiro in non-interactive mode:

$ kiro --headless --prompt "Fix the null check error in AuthService.ts line 45" \
       --files src/services/AuthService.ts \
       --output-branch fix/auth-service-null-check \
       --auto-commit

Or via API if Kiro has one:

const result = await kiro.executeTask({
  instruction: "Add null check before token.split() on line 45",
  files: ['src/services/AuthService.ts'],
  branch: 'fix/auth-service-null-check',
  runTests: true,
});

Hint 4: Verification Loop

After generating a fix:

  1. Checkout the fix branch
  2. Run npm install (or pip install, etc.)
  3. Run npm test
  4. Parse test output:
    • If all pass → open PR
    • If some fail → analyze failures and retry
    • If all fail → abort and notify human
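
A minimal sketch of this loop, assuming the fix branch is already checked out locally (the branch and PR plumbing from the earlier hints is omitted):

import { execSync } from 'child_process';

// Run a shell command and report success plus captured output
function run(cmd: string): { ok: boolean; output: string } {
  try {
    return { ok: true, output: execSync(cmd, { encoding: 'utf8', stdio: 'pipe' }) };
  } catch (err: any) {
    return { ok: false, output: (err.stdout ?? '') + (err.stderr ?? '') };
  }
}

// Verification loop: install, test, then decide what to do with the fix
function verifyFix(): 'open-pr' | 'retry-with-new-fix' | 'abort-and-notify' {
  if (!run('npm install').ok) return 'abort-and-notify';

  const test = run('npm test');
  if (test.ok) return 'open-pr';

  // Partial failure: feed the output back into diagnosis and try another fix
  const failedSuites = (test.output.match(/FAIL /g) ?? []).length;
  const passedSuites = (test.output.match(/PASS /g) ?? []).length;
  return passedSuites > 0 && failedSuites > 0 ? 'retry-with-new-fix' : 'abort-and-notify';
}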

Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| Webhooks | “Webhooks: Events for RESTful APIs” by Mike Amundsen | Ch. 2-3 |
| CI/CD | “Continuous Delivery” by Jez Humble | Ch. 10 |
| Root cause analysis | “Release It!” by Michael Nygard | Ch. 4 |
| Autonomous systems | “Building Event-Driven Microservices” by Adam Bellemare | Ch. 6 |

Common Pitfalls and Debugging

Problem 1: “Agent creates duplicate PRs for the same failure”

  • Why: Webhook is delivered multiple times or agent doesn’t track what it’s fixed
  • Fix: Store a hash of (run_id + failure_message) in a database to deduplicate
  • Quick test: Trigger same failure twice — does it create one PR or two?
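
A minimal dedup sketch, assuming an in-memory store (a real agent would persist this in SQLite or Redis so restarts do not forget what it already handled):

import crypto from 'crypto';

// Registry of failures already acted on (in-memory here; persist it in production)
const seenFailures = new Set<string>();

function failureKey(runId: number, failureMessage: string): string {
  return crypto.createHash('sha256').update(`${runId}:${failureMessage}`).digest('hex');
}

// True the first time a failure is seen, false for duplicates such as redelivered webhooks
function shouldHandle(runId: number, failureMessage: string): boolean {
  const key = failureKey(runId, failureMessage);
  if (seenFailures.has(key)) return false;
  seenFailures.add(key);
  return true;
}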

Problem 2: “Fix causes other tests to fail”

  • Why: Fix is too aggressive or changes behavior elsewhere
  • Fix: Run full test suite before opening PR; if new failures appear, rollback
  • Quick test: Generate a fix that breaks a different test — does agent catch it?

Problem 3: “Agent leaks API keys or secrets in PRs”

  • Why: Logs or fix code include sensitive data
  • Fix: Use a secret sanitization hook before commits
  • Quick test: Simulate a failure with API key in logs — is it redacted in PR?
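
One way to sketch that sanitization pass (the patterns are illustrative; Project 24's Secret Sanitizer treats this properly):

// Redaction pass to run over logs and diffs before they reach a commit or PR body
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g,                                                    // AWS access key IDs
  /gh[pousr]_[A-Za-z0-9]{36,}/g,                                          // GitHub tokens
  /(api[_-]?key|secret|token)\s*[:=]\s*['"]?[A-Za-z0-9_\-]{16,}['"]?/gi,  // generic key=value
];

function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce((out, pattern) => out.replace(pattern, '[REDACTED]'), text);
}

// Anything matching a known secret shape is masked before it leaves the agent
console.log(redactSecrets('Run failed: API_KEY=sk_live_abcdefghijklmnop1234'));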

Problem 4: “Infinite loop: fix fails → new fix → fails → …”

  • Why: No circuit breaker for repeated failures
  • Fix: Limit retries to 3; if all fail, notify human and stop
  • Quick test: Create an unfixable failure — does agent stop after 3 tries?
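
A hedged sketch of that circuit breaker, keyed the same way as the dedup sketch above:

const MAX_ATTEMPTS = 3;
const attempts = new Map<string, number>();   // failureKey -> fix attempts so far

function notifyHuman(message: string): void {
  // Placeholder: wire this to Slack, email, or an issue tracker
  console.error(message);
}

// True while the agent may keep trying; false once the breaker trips
function registerAttempt(failureKey: string): boolean {
  const count = (attempts.get(failureKey) ?? 0) + 1;
  attempts.set(failureKey, count);
  if (count > MAX_ATTEMPTS) {
    notifyHuman(`Giving up on ${failureKey} after ${MAX_ATTEMPTS} failed fix attempts`);
    return false;
  }
  return true;
}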

Definition of Done

  • Webhook server receives GitHub Actions failure events
  • Agent downloads and parses workflow logs to find root cause
  • Agent diagnoses the failure type (null check, type error, etc.)
  • Agent generates a code fix using headless Kiro
  • Agent creates a branch, commits the fix, and runs tests
  • All tests must pass before opening a PR
  • Agent opens a PR with detailed description and test results
  • Agent logs all actions for audit trail
  • Circuit breaker prevents infinite retry loops
  • Secrets are sanitized before commits
  • At least one real workflow failure is automatically fixed end-to-end

Project Comparison Table

| Project Range | Difficulty | Focus | Cool Factor |
|---|---|---|---|
| 1-5 (Foundations) | Beginner | Config, Context, Planning | 2/5 |
| 6-10 (Steering) | Intermediate | Personas, Specs, PBT | 3/5 |
| 11-16 (MCP) | Advanced | DBs, Cloud, Tools | 4/5 |
| 17-24 (Hooks/Remote) | Advanced | Security, SSH, Safety | 4/5 |
| 25-32 (Workflows) | Mixed | Tangents, Checkpoints, Docs | 3/5 |
| 33-40 (Capstone) | Master | Full Autonomy | 5/5 |

Summary

This learning path covers Kiro CLI through 40 hands-on projects.

| # | Project Name | Focus |
|---|---|---|
| 1 | Personalized Config | Configuration |
| 2 | Steering Enforcer | Prompt Engineering |
| 3 | Context Detective | LLM Context |
| 4 | Subagent Researcher | Agent Delegation |
| 5 | Plan Architect | Spec-Driven Dev |
| 6 | Custom Persona | Agent Config |
| 7 | Executable Spec | Documentation |
| 8 | PBT Suite | Testing |
| 9 | Postgres Analyst | MCP (Database) |
| 10 | GitHub Manager | MCP (Workflow) |
| 11 | AWS Architect | MCP (Cloud) |
| 12 | Doc Librarian | MCP (RAG) |
| 13 | Custom Tool (Py) | MCP (Protocol) |
| 14 | FS Guardian (Node) | MCP (Protocol) |
| 15 | Chrome Puppeteer | Browser Automation |
| 16 | Design to Code | Multimodal |
| 17 | Type-Safe Hook | Bun / TypeScript |
| 18 | Security Firewall | Policy / Governance |
| 19 | Auto-Fixer Loop | Feedback Loops |
| 20 | Git Context Injector | Context Automation |
| 21 | Headless Setup | Remote Dev |
| 22 | SSH Tunnel Agent | Networking |
| 23 | Corporate Proxy | Enterprise Ops |
| 24 | Secret Sanitizer | Security |
| 25 | Tangent Explorer | Context Management |
| 26 | Checkpoint Time Machine | Safety |
| 27 | Checklist Manager | Task Execution |
| 28 | Semantic Search | RAG |
| 29 | Delegate Worker | Async Tasks |
| 30 | Recursive Improver | Metacognition |
| 31 | Legacy Archaeologist | Code Exploration |
| 32 | Reverse Documenter | Documentation |
| 33 | Full Stack Scaffolder | Prototyping |
| 34 | Cloud Native Deployer | DevOps |
| 35 | Deep Reasoner | Algorithms |
| 36 | Global Translator | i18n |
| 37 | SQL Optimizer | Performance |
| 38 | Refactoring Surgeon | Architecture |
| 39 | API Client Generator | Integration |
| 40 | Autonomous Developer | Full Autonomy |

For beginners: Start with Projects 1, 2, and 5. Get comfortable with steering first.

For system architects: Jump to Project 9 (MCP) and Project 17 (Hooks).

For DevOps/SRE: Focus on Project 21 (Headless) and Project 40 (Capstone).