Sprint: Kiro CLI Mastery - Real World Projects
Goal: Deeply understand Kiro CLI—the next-generation agentic terminal environment from AWS. You will master context management, custom agent creation, MCP integrations, type-safe automation hooks, headless CI/CD pipelines, and complex multi-agent workflows. By the end, you will transform your terminal from a passive command executor into an intelligent development partner that understands your codebase, enforces your standards, and automates your workflows with minimal human intervention. You will learn how to steer AI execution, control permissions, manage context, and build reliable automation that produces real, verifiable outcomes.
Introduction
Kiro CLI is not just a terminal chatbot—it’s a comprehensive agentic runtime that transforms how you interact with code, infrastructure, and development workflows.
What is Kiro CLI? Kiro CLI (formerly Amazon Q Developer CLI) is AWS’s agentic terminal environment that provides:
- Deep context awareness across your entire project
- Autonomous task execution with human-in-the-loop approval
- Extensible tool ecosystem via Model Context Protocol (MCP)
- Type-safe automation hooks for quality gates
- Multi-agent workflows for complex tasks
What problem does it solve today? The terminal has been a passive executor for decades. You type commands, it returns output. This places 100% of the cognitive load on you. Kiro inverts this: you describe intent, and Kiro researches, plans, executes, and verifies—with your guidance.
What will you build across the projects? You’ll build a complete Kiro ecosystem:
- Session management and analytics tools
- Custom agents (security auditor, code reviewer, DevOps engineer)
- MCP integrations (databases, GitHub, AWS, Slack, Docker)
- Type-safe automation hooks (secret scanning, test generation, auto-formatting)
- Multi-agent orchestration systems
- Headless CI/CD pipelines
- Configuration sharing systems for teams
What is in scope vs out of scope?
In Scope:
- Kiro CLI configuration, agents, steering, hooks, and MCP
- Building custom tools and integrations
- Multi-agent workflows and orchestration
- Headless automation for CI/CD
- Team collaboration and configuration sharing
Out of Scope:
- AWS-specific infrastructure setup (covered separately)
- Deep LLM architecture (we use Kiro’s models)
- General software engineering (we assume proficiency)
Big Picture: Kiro as an Agentic Operating System
┌────────────────────────────────────────────────────────────────┐
│ KIRO CLI - AGENTIC OS │
├────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ CONTEXT │ │ AGENCY │ │ STEERING │ │
│ │ LAYER │ │ LAYER │ │ LAYER │ │
│ ├──────────────┤ ├──────────────┤ ├──────────────┤ │
│ │ • Files │ │ • Subagents │ │ • product.md │ │
│ │ • Git │ │ • Planning │ │ • tech.md │ │
│ │ • Sessions │ │ • Execution │ │ • style.md │ │
│ │ • Knowledge │ │ • Parallel │ │ • security.md│ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └─────────────────┼─────────────────┘ │
│ │ │
│ ┌────────────▼───────────┐ │
│ │ REVL LOOP ENGINE │ │
│ │ (Read, Evaluate, │ │
│ │ Verify, Loop) │ │
│ └────────┬───────────────┘ │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ TOOLS │ │ MCP │ │ HOOKS │ │
│ │ (Built- │ │ (External │ │ (Events) │ │
│ │ in) │ │ Systems) │ │ │ │
│ └───────────┘ └───────────┘ └───────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
How to Use This Guide
Reading the Primer Before Projects:
- Start with the Theory Primer (sections below) to build mental models
- Read the Concept Summary Table to see what you’ll learn
- Review the Project-to-Concept Map to understand project coverage
- Use Deep Dive Reading for book references on each concept
Picking a Learning Path:
- Beginner Path: Projects 1-5 → foundations and configuration
- Intermediate Path: Projects 6-15 → MCP integrations and hooks
- Advanced Path: Projects 16-25 → multi-agent workflows and automation
- Enterprise Path: Projects 26-35 → headless CI/CD and team collaboration
- Expert Path: Projects 36-40 → advanced orchestration and capstone
Validating Progress: Each project has a “Definition of Done” checklist and “Real World Outcome” section with exact expected outputs. Use these to verify mastery.
Prerequisites & Background Knowledge
Essential Prerequisites (Must Have)
- Terminal proficiency: Comfortable with bash, environment variables, pipes, and redirection
- Git fundamentals: Commit, branch, merge, rebase, remote operations
- Programming basics: Variables, functions, loops, conditionals in any language
- JSON/YAML: Read and write configuration files
- Recommended Reading: “Effective Shell” by Dave Kerr - Ch. 1-5
Helpful But Not Required
- Node.js/TypeScript: Learn during Projects 8, 14, 17 (Bun/TypeScript hooks)
- Python: Learn during Projects 9, 13 (MCP server development)
- Docker: Learn during Projects 18, 34 (containerized MCP servers)
- AWS familiarity: Learn during Projects 9, 11, 17 (AWS integrations)
Self-Assessment Questions
- Can you explain what export VAR=value does vs VAR=value? (see the refresher below)
- Do you understand stdin, stdout, and stderr?
- Can you write a basic shell script with conditionals?
- Do you know how to parse JSON with jq or similar tools?
- Can you explain what environment variables are and how they're inherited?
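If any of these feel shaky, here is a small, self-contained refresher you can paste into a bash shell; it uses only standard tools (bash, ls, jq) and writes nothing outside /tmp.
# VAR=value sets a shell variable visible only to the current shell;
# export VAR=value also passes it to child processes.
VAR=local-only
export EXPORTED_VAR=inherited
bash -c 'echo "VAR=$VAR EXPORTED_VAR=$EXPORTED_VAR"'   # only EXPORTED_VAR is set in the child
# stdout and stderr are separate streams and can be redirected independently.
ls /tmp /does-not-exist > /tmp/out.log 2> /tmp/err.log
# A basic conditional in a script.
if [ -s /tmp/err.log ]; then
  echo "there were errors"
fi
# Parse JSON with jq.
echo '{"model": "auto", "telemetry": false}' | jq -r '.model'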
Development Environment Setup
Required Tools:
- Kiro CLI: Latest version via npm install -g @aws/kiro-cli or brew
- Node.js: v20+ (for MCP servers and hook development)
- Bun: v1.0+ (for type-safe hooks, install via curl -fsSL https://bun.sh/install | bash)
- Git: v2.30+ (for version control and context)
- jq: Latest (for JSON parsing, brew install jq or apt install jq)
Recommended Tools:
- Docker: v24+ (for containerized MCP servers)
- PostgreSQL client: v14+ (for database MCP projects)
- gh CLI: Latest (for GitHub integration projects)
- AWS CLI: v2 (for AWS integration projects)
Testing Your Setup:
$ kiro-cli --version
kiro-cli version 1.x.x
$ node --version
v20.x.x
$ bun --version
1.x.x
$ jq --version
jq-1.x
$ git --version
git version 2.x.x
Time Investment
- Simple projects (1-10): 4-8 hours each (fundamentals, configuration)
- Moderate projects (11-25): 10-20 hours each (MCP, hooks, integrations)
- Complex projects (26-35): 20-40 hours each (multi-agent, CI/CD)
- Advanced projects (36-40): 40-80 hours each (orchestration, capstone)
- Total sprint: 4-8 months at 10-15 hours/week
Important Reality Check: Kiro CLI is cutting-edge technology. You will encounter:
- Documentation gaps: Official docs are evolving; use community resources
- Breaking changes: Kiro is under active development; expect API changes
- Learning curve: Agentic workflows require new mental models
- Debugging complexity: Multi-agent systems are harder to debug than scripts
This is normal. The goal is deep understanding, not just “making it work.”
Big Picture / Mental Model
The Agentic Terminal Paradigm Shift
Traditional terminals are reactive executors:
You: command
Terminal: output
You: next command
Terminal: output
Kiro CLI is a proactive partner:
You: intent ("Add OAuth login")
Kiro: research → plan → confirm → execute → verify → report
You: approve/modify
Kiro: continue
The REVL Loop: Kiro’s Core Execution Model
Kiro is not a single-shot REPL. It’s an execution loop with guardrails:
┌────────────────────────────────────────────────────────────────┐
│ REVL LOOP (Deterministic AI) │
├────────────────────────────────────────────────────────────────┤
│ │
│ User Intent │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ READ │ Load context: files, git, session history │
│ │ Context │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ EVALUATE │ Generate plan based on context │
│ │ Plan │ Check against steering rules │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ VERIFY │ Validate plan meets constraints │
│ │ Constraints│ Human approval gate │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ EXECUTE │ Run tools, write files, call MCP │
│ │ Tools │ Hooks run pre/post execution │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ VERIFY │ Check results match expectations │
│ │ Results │ Run tests, lint, validate │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ LOOP │ Continue to next subtask or stop │
│ │ │ │
│ └──────────────┘ │
│ │
│ Why this matters: AI output is probabilistic. │
│ Verification makes it deterministic. │
│ If the loop cannot verify, the task is not done. │
│ │
└────────────────────────────────────────────────────────────────┘
Key Insight: The REVL loop separates intent, execution, and verification—making AI reliable enough for production workflows.
Theory Primer
This section builds the conceptual foundation you need before diving into projects. Each concept below has a dedicated chapter with deep explanations, diagrams, and examples.
Chapter 1: Configuration and Scope - The Settings Hierarchy
Fundamentals
Kiro CLI uses a three-tier configuration system where settings cascade from global → project → agent scope. Understanding this hierarchy is critical because misconfiguration causes mysterious failures and frustrating debugging sessions.
The global scope (~/.kiro/settings.json) defines your baseline: default model, telemetry preferences, and global steering rules. These apply to every Kiro session unless overridden.
The project scope (.kiro/settings.json in your repo) enforces workspace-specific constraints: which MCP servers to load, project-specific steering files, and tool permissions. These override global settings.
The agent scope (.kiro/agents/your-agent.json) creates specialized personas with locked-down capabilities. A security auditor agent might only have read permissions and forced Opus model for deep analysis.
When Kiro starts, it merges these three layers, with agent scope winning conflicts, then project, then global. This allows you to define sensible defaults globally while enforcing strict boundaries per-project and per-agent.
Deep Dive
The configuration hierarchy solves a fundamental problem in agentic systems: how do you balance flexibility with safety across different contexts?
Global Configuration (~/.kiro/settings.json)
Your global config is your personal AI preferences. This is where you set:
- Default model: Auto router, Haiku, Sonnet, or Opus
- Telemetry: Whether AWS can collect usage data
- Global steering: Organization-wide coding standards (e.g., “Always use TypeScript strict mode”)
- Global MCP servers: Tools you want available everywhere (e.g., AWS docs, GitHub)
Example global config:
{
"model": "auto",
"telemetry": false,
"steering": {
"files": [
"file://~/.kiro/steering/typescript-standards.md",
"file://~/.kiro/steering/company-security-policy.md"
]
},
"mcpServers": {
"github": {
"command": "docker",
"args": ["run", "-i", "ghcr.io/github/mcp-github"],
"env": {
"GITHUB_TOKEN": "${GITHUB_TOKEN}"
}
}
}
}
Project Configuration (.kiro/settings.json)
Project configs override globals for workspace-specific needs:
- Project-specific MCP servers: Postgres for backend, none for frontend
- Stricter tool permissions: Block shell commands in production repos
- Project steering: “Use Prisma for all DB queries”, “API routes go in /api”
Example project config:
{
"model": "opus",
"steering": {
"files": [
"file://.kiro/steering/api-standards.md",
"file://.kiro/steering/database-rules.md"
]
},
"tools": {
"shell": {
"allowedCommands": ["npm test", "npm run lint"],
"deniedCommands": ["rm -rf", "sudo", "curl | sh"]
}
},
"mcpServers": {
"postgres": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
}
}
}
Agent Configuration (.kiro/agents/security-auditor.json)
Agent configs create specialized personas with narrow capabilities:
{
"name": "security-auditor",
"model": "opus",
"prompt": "You are a security auditor. Review code for OWASP Top 10 vulnerabilities. You cannot modify files—only report findings.",
"allowedTools": ["read", "grep", "glob"],
"deniedTools": ["write", "shell", "aws"],
"resources": [
"file://docs/security-standards.md",
"file://docs/owasp-top-10.md"
],
"mcpServers": {
"sonarqube": {
"command": "node",
"args": ["./mcp-servers/sonarqube.js"]
}
}
}
Precedence Resolution
When Kiro starts, it merges configs in this order:
- Load global config
- Overlay project config (if in a repo with .kiro/)
- Overlay agent config (if the --agent <name> flag is used)
Conflict resolution (a worked example follows this list):
- Arrays: Concatenate (steering files from all levels)
- Objects: Deep merge (MCP servers from all levels)
- Primitives: Agent wins, then project, then global
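A minimal worked example of the merge, using illustrative paths and values rather than a real install:
# Global (~/.kiro/settings.json)
{ "model": "haiku", "steering": { "files": ["file://~/.kiro/steering/security.md"] } }

# Project (.kiro/settings.json)
{ "model": "opus", "steering": { "files": ["file://.kiro/steering/api-standards.md"] } }

# Agent (.kiro/agents/security-auditor.json)
{ "model": "opus", "deniedTools": ["write", "shell"] }

# Effective merged settings: primitives resolve agent > project > global,
# arrays concatenate, objects deep-merge.
{
  "model": "opus",
  "deniedTools": ["write", "shell"],
  "steering": {
    "files": [
      "file://~/.kiro/steering/security.md",
      "file://.kiro/steering/api-standards.md"
    ]
  }
}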
Common Pitfalls:
- Forgetting project overrides global: You set model: haiku globally but the project forces opus—then you spend time debugging why responses are slow
- Agent deniedTools blocking MCP: Your agent denies "shell" but your MCP server needs it to run
- Steering file conflicts: Global says “use tabs”, project says “use spaces”—Kiro gets confused
Mental Model: Think of configs as CSS specificity:
- Global = element selectors (broad, low priority)
- Project = class selectors (scoped, medium priority)
- Agent = ID selectors (specific, highest priority)
How this fits in projects:
- Project 1: Personalized Kiro Config - You’ll build a robust global config
- Project 4: Custom Agent Factory - You’ll create specialized agent configs
- Project 5: Steering Rules Engine - You’ll write comprehensive steering files
- Project 16: Configuration Sync System - You’ll share configs across machines
Common Misconceptions:
- “Global config is read-only”: False. You can edit it anytime, changes apply to new sessions.
- "Agent config must live in .kiro/agents/": False. You can load from anywhere with --agent-config path/to/agent.json.
- "Precedence always favors specificity": Mostly true, but arrays concatenate rather than override.
Check-your-understanding questions:
- If global sets model: haiku and project sets model: opus, which wins?
- If global steering has typescript-standards.md and project has api-standards.md, which files load?
- How would you debug which config is actually being used?
Check-your-understanding answers:
- Project wins (opus). Agent would win over both.
- Both files load. Steering arrays concatenate.
- Run kiro-cli settings show to see the merged config, or check ~/.kiro/logs/ for config loading logs.
Real-world applications:
- Enterprise teams: Global config enforces company security policies, project config adds repo-specific rules
- Freelancers: Global config for personal preferences, project configs per client
- Open source: Project config for contributor guidelines, global empty for maintainer flexibility
Where you’ll apply it:
- Project 1: Build layered config system
- Project 4: Create specialized agents
- Project 16: Sync configs across machines
- Project 37: Team configuration sharing
References:
- Kiro CLI Docs: Configuration - https://kiro.dev/docs/cli/configuration/
- “The Pragmatic Programmer” by Hunt & Thomas - Ch. 4: “Configuration”
- AWS re:Invent 2024: Kiro CLI Deep Dive (YouTube)
Key insights: Config hierarchy enables both flexibility (global defaults) and safety (agent restrictions)—the foundation of scalable agentic workflows.
Summary: Kiro’s three-tier config system (global → project → agent) allows you to balance personal preferences, project requirements, and agent specialization. Mastering this hierarchy prevents configuration conflicts and enables team-wide standardization.
Homework/Exercises to practice the concept:
- Create a global config that sets Haiku as default model and enables telemetry
- Create a project config that overrides to Opus and loads a Postgres MCP server
- Create an agent config that denies all write tools
- Run kiro-cli settings show and explain which settings came from which layer
- Intentionally create a conflict (e.g., global allows shell, agent denies it) and observe behavior
Solutions to the homework/exercises:
- Global config: {"model": "haiku", "telemetry": true}
- Project config: {"model": "opus", "mcpServers": {"postgres": {...}}}
- Agent config: {"deniedTools": ["write", "edit", "multiedit"]}
- Settings show will display the merged config with annotations showing each setting's source
- Agent denial wins—shell commands will be blocked despite global allowing them
Chapter 2: Context Management - RAM, Not Disk
Fundamentals
Context is Kiro’s working memory—the information it actively uses to understand your intent and execute tasks. Unlike traditional tools that read files on demand, Kiro loads relevant information into its “context window” (a token budget, typically 200K tokens). Think of it as RAM for the AI: fast, limited, and requiring active management. Context includes chat history, file contents, steering rules, and MCP tool outputs. When context fills up, Kiro must compact (summarize) or remove information, risking the loss of important details. Mastering context management is the difference between an AI that “remembers” your project and one that constantly forgets critical constraints.
Deep Dive
Kiro’s context system operates on a tiered hierarchy with three distinct layers:
1. Session Context (Chat History)
Every message you send and every response Kiro generates consumes tokens. A 100-line conversation can consume 20-50K tokens. This is why long sessions feel “forgetful”—early messages get summarized or dropped. The /compact command intelligently summarizes chat history while preserving critical information (steering rules, file references, decisions made).
2. Agent Resources (Explicit Context)
Files you explicitly add via /context add src/auth.ts or by mentioning them (“look at auth.ts”). These remain in context until you /context remove them or context fills up. Kiro also loads files automatically when you reference them in prompts. The /context show command displays what’s currently loaded, broken down by source (chat, files, steering).
3. Knowledge Base (Indexed Codebase)
For large codebases, Kiro can index your entire project using /knowledge enable. This creates vector embeddings for semantic search. When context is full, Kiro queries the knowledge base instead of holding everything in RAM. It’s slower but unlimited—like swapping to disk.
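A short illustrative session with the knowledge base. The commands follow the /knowledge usage described above, but the output shown (file counts, file names, line references) is hypothetical and will differ in practice:
$ kiro-cli
> /knowledge enable
Indexing project... 412 files embedded into the knowledge base.
> Where is password hashing handled?
Searching knowledge base...
Password hashing is handled in src/auth/password.ts: bcrypt.hash() is called
from registerUser() before the user record is persisted.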
Mental Model:
┌────────────────────────────────────────────────────────┐
│ KIRO CONTEXT WINDOW (200K tokens) │
├────────────────────────────────────────────────────────┤
│ │
│ Session Context (Chat History) │
│ ┌──────────────────────────────────────────────┐ │
│ │ User: "Add authentication" │ 5K │
│ │ Kiro: "I'll create JWT middleware..." │ │
│ │ User: "Use bcrypt for passwords" │ │
│ │ Kiro: "Installing bcrypt..." │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ Agent Resources (Explicit Files) │
│ ┌──────────────────────────────────────────────┐ │
│ │ • src/auth/middleware.ts (4.2K) │ 45K │
│ │ • src/models/User.ts (3.1K) │ │
│ │ • .kiro/steering/security.md (1.8K) │ │
│ │ • package.json (0.9K) │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ Steering Rules (Active Constraints) │
│ ┌──────────────────────────────────────────────┐ │
│ │ • .kiro/steering/tech.md (2.5K) │ 8K │
│ │ • .kiro/steering/api-standards.md (1.2K) │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ Total: 58K / 200K (29% full) │
└────────────────────────────────────────────────────────┘
When full, Kiro must:
1. Compact (summarize) chat history
2. Remove least-used files
3. Query knowledge base instead of loading everything
How this fits on projects:
- Project 3: Visualize context usage in real-time
- Project 20: Inject git diffs automatically
- Project 25: Use tangent mode to isolate exploratory work
- Project 28: Enable knowledge base for large codebases
Minimal concrete example:
# Load a file into context
$ kiro-cli
> /context add src/auth.ts
# Check what's in context
> /context show
Session Context: 12,450 tokens (chat history)
Agent Resources: 8,200 tokens
• src/auth.ts (4,200 tokens)
• .kiro/steering/security.md (4,000 tokens)
Total: 20,650 / 200,000 tokens (10%)
# Remove file when done
> /context remove src/auth.ts
# Compact chat history when it gets large
> /compact
✓ Compacted 45 messages into 3 summary blocks
Saved 18,000 tokens
Common misconceptions:
- “Context is permanent”: No, it’s cleared on session restart unless you checkpoint
- “Kiro reads files from disk as needed”: No, files must be in context or knowledge base
- “Bigger context is always better”: No, too much context dilutes signal—Kiro loses focus
Check-your-understanding questions:
- Why does Kiro “forget” earlier parts of long conversations?
- What's the difference between /context add and enabling the knowledge base?
- When should you use /compact vs /context clear?
Check-your-understanding answers:
- Chat history consumes tokens. When context fills, early messages get summarized or dropped
- /context add keeps files in active RAM (fast, limited). The knowledge base indexes everything (slower retrieval, unlimited storage)
- /compact when you want to preserve decisions but free space; /context clear when starting fresh
Real-world applications:
- Code reviews: Load changed files + steering rules, compact when reviewing many PRs
- Refactoring: Enable knowledge base for cross-file references, use context for active files
- Debugging: Add error logs + relevant source files, remove when fixed
Where you’ll apply it:
- Project 3: “The Context Detective” - Visualize context usage
- Project 20: “The Git Context Injector” - Auto-inject diffs
- Project 25: “The Tangent Explorer” - Isolate exploratory context
- Project 28: “The Semantic Search Engine” - Enable knowledge base
References:
- Kiro CLI Docs: Context Management - https://kiro.dev/docs/cli/context/
- “AI Engineering” by Chip Huyen - Ch. 3: “Context Windows”
- “Designing Data-Intensive Applications” by Kleppmann - Ch. 3: “Storage”
Key insights: Context is scarce—treat it like RAM, not disk. Load what you need, remove what you don’t, and use knowledge base for deep search.
Summary: Kiro’s context window is its working memory, limited to ~200K tokens. It’s divided into session context (chat), agent resources (files), and steering rules. When full, Kiro compacts or forgets. For large codebases, enable knowledge base for unlimited semantic search.
Homework/Exercises:
- Load 5 files into context and run /context show. Calculate what percentage of your budget is used
- Have a 20-message conversation, then run /compact. Check how many tokens you saved
- Enable /knowledge on a 50+ file project and ask "where is authentication handled?"
- Deliberately overflow context (add huge files) and observe Kiro's behavior
- Create a tangent for debugging, then return to main context. Verify tangent didn’t pollute main thread
Solutions:
- /context add each file, then /context show will display the token breakdown
- /compact shows "Saved X tokens" - typically a 30-60% reduction
- /knowledge enable && kiro ask "where is auth" - Kiro will cite specific files/lines
- Kiro will warn "Context nearly full" and auto-compact or reject adding more files
- /tangent "debug login" → work → /return → /context show (tangent content absent)
Chapter 3: Steering and Specs - Constraints That Actually Work
Fundamentals
Steering files are Markdown documents that constrain AI behavior through explicit rules. They function as “soft prompts” that Kiro reads before executing any task. Unlike hard-coded logic, steering uses natural language constraints that the LLM interprets and enforces. A steering file might say “Never use var in JavaScript” or “All API responses must include error codes.” Kiro reads these rules, internalizes them, and refuses to violate them—even when explicitly asked. This makes steering the foundation of enterprise AI deployment: you encode company standards, security policies, and architectural decisions into files that travel with your codebase.
Deep Dive
Steering operates on a hierarchical loading system:
1. Global Steering (~/.kiro/steering/*.md)
Applied to ALL sessions across ALL projects. Use for personal coding preferences (e.g., “Always use TypeScript strict mode”) or company-wide policies (e.g., “Never commit AWS credentials”).
2. Project Steering (<project>/.kiro/steering/*.md)
Applied only to the current project. Use for repo-specific rules (e.g., “Follow Rails conventions” or “Use Tailwind for styling”).
3. Agent Steering (defined in agent config) Embedded directly in custom agent JSON files. Use for role-specific constraints (e.g., security auditor: “Never suggest write operations”).
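As a sketch of that third tier, an agent config can carry its steering inline via the prompt and resources fields shown in Chapter 1; the file name and rule text below are illustrative:
{
  "name": "security-auditor",
  "model": "opus",
  "prompt": "You are a read-only security auditor. Never suggest write operations. Report findings only.",
  "allowedTools": ["read", "grep", "glob"],
  "resources": ["file://docs/security-standards.md"]
}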
File Structure Best Practices:
.kiro/steering/
├── tech.md # Technical constraints (languages, frameworks)
├── product.md # Business logic rules
├── style.md # Code style and formatting
├── security.md # Security policies
└── api-standards.md # API design principles
Mental Model:
┌────────────────────────────────────────────────────────┐
│ STEERING FILE LOADING HIERARCHY │
├────────────────────────────────────────────────────────┤
│ │
│ 1. Global Steering (Personal/Company Standards) │
│ ~/.kiro/steering/ │
│ ├── security.md ← "No hardcoded secrets" │
│ └── typescript.md ← "Strict mode always" │
│ ↓ │
│ 2. Project Steering (Repo-Specific Rules) │
│ ./.kiro/steering/ │
│ ├── api-standards.md ← "REST conventions" │
│ └── testing.md ← "100% test coverage" │
│ ↓ │
│ 3. Agent Steering (Role-Specific Constraints) │
│ Agent config JSON │
│ "Never write files" (for security-auditor) │
│ ↓ │
│ MERGED INTO CONTEXT │
│ All steering files concatenated and loaded │
│ Kiro enforces ALL rules simultaneously │
│ │
└────────────────────────────────────────────────────────┘
How this fits on projects:
- Project 2: Create hierarchical steering files
- Project 5: Use /plan with steering constraints
- Project 6: Build agents with embedded steering
- Project 30: Analyze steering effectiveness and improve it
Minimal concrete example (a .kiro/steering/tech.md file):
# Technical Constraints
## Language Rules
- Use TypeScript strict mode for all `.ts` files
- Never use `any` type—use `unknown` instead
- Prefer `const` over `let`, never use `var`
## Framework Constraints
- Use React functional components, not class components
- Use React Query for data fetching, not useEffect
- Use Zod for runtime validation
## Error Handling
- All async functions must have try/catch blocks
- Never swallow errors silently
- Log errors to structured logging service
Common misconceptions:
- “Steering is just comments”: No, Kiro actively enforces these rules and will refuse violations
- “More steering is better”: No, too many rules confuse the LLM—focus on critical constraints
- “Steering replaces code review”: No, it prevents common mistakes but humans still validate logic
Check-your-understanding questions:
- If global steering says “use tabs” and project steering says “use spaces”, what happens?
- How would you write a steering rule that prevents SQL injection?
- When should steering go in global vs project vs agent config?
Check-your-understanding answers:
- Both load—Kiro sees conflicting rules and may ask for clarification or default to project (more specific)
- “Never construct SQL queries with string concatenation. Always use parameterized queries or an ORM.”
- Global: universal standards (security, style). Project: repo conventions. Agent: role restrictions (read-only, specific tools).
Real-world applications:
- Startups: Enforce architectural decisions as team grows (e.g., “Use event sourcing for all domain events”)
- Enterprises: Company-wide security policies (e.g., “All PII must be encrypted at rest”)
- Open source: Contributor guidelines (e.g., “All PRs must include tests and update CHANGELOG.md”)
Where you’ll apply it:
- Project 2: “The Steering Enforcer” - Build hierarchical steering system
- Project 5: “The Plan Architect” - Use steering with planning agents
- Project 18: “The Security Firewall Hook” - Enforce steering via hooks
- Project 30: “The Recursive Prompt Improver” - Analyze and improve steering
References:
- Kiro CLI Docs: Steering Files - https://kiro.dev/docs/cli/steering/
- “The Pragmatic Programmer” by Hunt & Thomas - Ch. 2: “A Pragmatic Approach”
- “Clean Architecture” by Robert C. Martin - Ch. 22: “The Clean Architecture”
Key insights: Steering encodes human judgment into executable constraints—the bridge between “describe intent” and “enforce standards.”
Summary: Steering files are Markdown documents with rules that Kiro enforces. They load hierarchically (global → project → agent) and function as “soft prompts” that constrain behavior. Use them to encode standards, policies, and architectural decisions that apply across your team or project.
Homework/Exercises:
- Create a security.md steering file with 5 rules (no secrets, parameterized SQL, etc.)
- Write a style.md that enforces your team's code style
- Intentionally violate a steering rule and observe Kiro's response
- Create an agent with steering that prevents all write operations
- Test steering conflict: global says “use tabs”, project says “use spaces”
Solutions:
- Example rule: “Never commit files containing ‘API_KEY=’ or ‘SECRET=’”
- Example: "Use Prettier with singleQuote: true and semi: false"
- Kiro will refuse or warn: "This violates steering rule in security.md"
- Agent config: {"allowedTools": ["read", "grep", "glob"]} + steering: "You are read-only"
- Kiro may ask for clarification or use the project-specific rule (more granular wins)
Chapter 4: MCP (Model Context Protocol) - Connecting to External Systems
Fundamentals
Model Context Protocol (MCP) is Kiro’s plugin architecture—a standardized way to extend the AI with external capabilities without teaching it new skills directly. Instead of asking Kiro to “learn how to query Postgres,” you connect an MCP server that already knows how. Think of MCP servers as specialized translators: Kiro asks them questions in natural language, they execute the technical operation (SQL query, API call, cloud command), and return structured results Kiro can understand. This creates a clean separation: Kiro handles reasoning and orchestration, MCP servers handle domain-specific execution.
How This Fits on Projects
- Project 6: Build your first MCP server connector for Postgres
- Project 7: Create GitHub automation using the official GitHub MCP server
- Project 9: Connect to AWS using the aws-docs MCP server
- Project 13: Implement a custom MCP server in Python
- Project 14: Build a filesystem MCP server in Node.js
- Project 18: Integrate Docker operations via MCP
Definitions & Key Terms
- MCP Server: A standalone process that provides tools (functions) and resources (data) to Kiro via the MCP protocol
- Local Server: An MCP server running on your machine (command + args)
- Remote Server: An MCP server accessible via HTTP/HTTPS (url + headers)
- Tool: An exposed function that Kiro can call (e.g., query_database, list_issues)
- Resource: Data the server provides (e.g., database schemas, file trees)
- Transport: How Kiro communicates with the server (stdio for local, HTTP for remote)
- Scope: Where the MCP configuration applies (workspace vs user-level)
Mental Model Diagram
┌──────────────────────────────────────────────────────────────┐
│ KIRO CLI (Orchestrator) │
│ "Find all bugs in my GitHub repo assigned to me" │
└────────────┬──────────────────────────┬──────────────────────┘
│ │
│ MCP Protocol │ MCP Protocol
│ (JSON-RPC over stdio) │ (JSON-RPC over HTTP)
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ GitHub MCP Server │ │ Postgres MCP Server│
│ (Local - npx) │ │ (Remote - URL) │
├─────────────────────┤ ├─────────────────────┤
│ Tools: │ │ Tools: │
│ • list_issues │ │ • query │
│ • create_pr │ │ • list_tables │
│ • add_comment │ │ • describe_schema │
│ │ │ │
│ Resources: │ │ Resources: │
│ • repo_tree │ │ • table_schemas │
│ • open_issues │ │ • active_queries │
└──────────┬──────────┘ └──────────┬──────────┘
│ │
│ GitHub API │ PostgreSQL Protocol
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ GitHub.com API │ │ Postgres DB │
└─────────────────┘ └─────────────────┘
Configuration Files:
• Workspace: .kiro/settings/mcp.json (project-specific servers)
• User: ~/.kiro/settings/mcp.json (global servers)
Deep Dive
MCP solves a fundamental problem in AI tooling: How do you give an AI access to specialized systems without bloating its core capabilities? Before MCP, every new integration required custom code, tight coupling, and fragile APIs. MCP introduces a universal contract: servers expose tools (callable functions) and resources (queryable data), and clients (like Kiro) discover and invoke them dynamically.
The MCP Architecture:
- Server Process: Each MCP server is a separate process (Python script, Node.js app, compiled binary) that implements the MCP specification. The server exposes:
  - Tools: Functions Kiro can invoke (e.g., query_database(sql: string))
  - Resources: Data endpoints Kiro can read (e.g., database://schema/users)
  - Metadata: Descriptions, parameter schemas, error handling
- Transport Layer: Communication happens via JSON-RPC:
- stdio transport: For local servers (Kiro spawns the process, communicates via stdin/stdout)
- HTTP transport: For remote servers (Kiro makes HTTP requests to a URL)
- Discovery: When Kiro starts, it reads mcp.json config files (user-level and workspace-level), spawns configured servers, and calls their list_tools and list_resources methods to discover capabilities (a sample discovery response is sketched after this list).
- Invocation: When you ask Kiro to "check database for users created today," Kiro:
  - Recognizes this requires database access
  - Finds the Postgres MCP server's query tool
  - Constructs a SQL query: SELECT * FROM users WHERE created_at::date = CURRENT_DATE
  - Invokes the tool via JSON-RPC: {"method": "tools/call", "params": {"name": "query", "arguments": {"sql": "..."}}}
  - Receives structured results: {"rows": [...], "columns": [...], "rowCount": 42}
  - Formats the response for you: "Found 42 users created today…"
Configuration Hierarchy:
MCP configurations follow Kiro’s three-tier system:
- User-level (~/.kiro/settings/mcp.json): Global servers available in all projects (e.g., GitHub, AWS docs)
- Workspace-level (.kiro/settings/mcp.json): Project-specific servers (e.g., local Postgres for this app)
- Merge behavior: Workspace configs extend (not replace) user configs. If both define a server with the same name, workspace wins.
Security Model:
MCP servers run with the same permissions as Kiro itself. This means:
- A Postgres MCP server can execute any SQL query Kiro requests (including DROP TABLE)
- A filesystem MCP server can read/write any file Kiro asks for
- Authentication happens via environment variables (${GITHUB_TOKEN}) to avoid hardcoding secrets
Lifecycle:
- Startup: Kiro reads mcp.json, spawns local servers (via command + args), connects to remote servers (via URL)
- Discovery: Kiro queries each server for available tools and resources
- Runtime: As you chat, Kiro invokes tools when needed
- Shutdown: When you exit Kiro, local server processes are terminated
Example Configuration Breakdown:
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_TOKEN": "${GITHUB_TOKEN}"
}
},
"postgres": {
"url": "https://mcp.example.com/postgres",
"headers": {
"Authorization": "Bearer ${DB_API_KEY}"
}
}
}
}
- github server (local):
  - command: npx: Uses npx to run the server
  - args: Passes -y (auto-install) and the package name
  - env.GITHUB_TOKEN: References an environment variable (never hardcode!)
  - Kiro spawns this as a child process on startup
- postgres server (remote):
  - url: HTTP endpoint hosting the MCP server
  - headers.Authorization: Passes a Bearer token from the environment
  - Kiro makes HTTP POST requests to this URL
How It Works (Step-by-Step)
- Add MCP Server: Run kiro-cli mcp add --name github --command npx --args="-y @modelcontextprotocol/server-github" --scope workspace
- Configure Environment: Set export GITHUB_TOKEN=ghp_... in your shell
- Start Kiro: Run kiro-cli. Kiro reads .kiro/settings/mcp.json and spawns the GitHub server
- Discovery: Kiro calls the server's list_tools method and learns about list_issues, create_pr, etc.
- User Query: You ask: "Show me all open bugs in my repo"
- Tool Selection: Kiro's auto-router decides to use the list_issues tool
- Invocation: Kiro sends a JSON-RPC request: {"method": "tools/call", "params": {"name": "list_issues", "arguments": {"state": "open", "labels": ["bug"]}}}
- Execution: The GitHub MCP server calls the GitHub API, fetches issues, and returns JSON
- Response: Kiro formats the result: "Found 8 open bugs: #42 (auth crash), #51 (timeout)…"
Minimal Concrete Example
1. Create a simple MCP server config (.kiro/settings/mcp.json):
{
"mcpServers": {
"fetch": {
"command": "uvx",
"args": ["mcp-server-fetch"]
}
}
}
2. Start Kiro and verify:
$ kiro-cli
> Show me the tools you have access to
I have access to the following MCP tools:
• fetch (from fetch server): Fetch a URL and return its content
> Fetch https://api.github.com/repos/torvalds/linux and show me the star count
Fetching https://api.github.com/repos/torvalds/linux...
The Linux repository has 182,456 stars.
Common Misconceptions
- “MCP servers are plugins that run inside Kiro”
- Truth: MCP servers are separate processes. Kiro communicates with them via JSON-RPC, not by loading code into its runtime.
- “I need to write code for Kiro to understand new tools”
- Truth: You configure MCP servers declaratively in mcp.json. Kiro discovers tools automatically via the list_tools RPC method.
- “MCP servers must be written in JavaScript”
- Truth: Any language that can output JSON-RPC over stdout (Python, Go, Rust, etc.) can be an MCP server. The protocol is language-agnostic.
- “Environment variables in mcp.json are optional”
- Truth: They're critical for security. Never hardcode GITHUB_TOKEN or DATABASE_URL in config files. Use ${VAR} syntax.
- “Remote MCP servers are slower than local”
- Truth: It depends. Remote servers add HTTP latency, but they can be hosted on powerful hardware. Local servers are faster for startup but limited by your machine.
Check-Your-Understanding Questions
- What are the two types of MCP server transports, and when would you use each?
- If you define a "github" MCP server in both ~/.kiro/settings/mcp.json and .kiro/settings/mcp.json, which one does Kiro use?
- How does Kiro discover what tools an MCP server provides?
- Why should you use ${GITHUB_TOKEN} instead of hardcoding your token in mcp.json?
- What happens to local MCP server processes when you exit Kiro?
Check-Your-Understanding Answers
- stdio (for local servers spawned by Kiro) and HTTP (for remote servers accessed via URL). Use stdio for servers you control locally (faster, no network). Use HTTP for centralized servers shared across teams or hosted on different machines.
- The workspace config (.kiro/settings/mcp.json) takes precedence. Kiro merges configs with workspace overriding user-level for conflicting keys.
- Kiro calls the server's list_tools JSON-RPC method during startup. The server responds with metadata for each tool (name, description, parameter schema).
- Security: Hardcoding tokens exposes secrets if you commit the file to Git. Environment variables keep secrets out of version control. Also, ${VAR} syntax lets you use different tokens per environment (dev/staging/prod).
- Kiro terminates them. Local MCP servers are child processes spawned by Kiro. When Kiro exits, it sends SIGTERM to clean up.
Real-World Applications
- Database Operations: Connect Postgres/MySQL MCP servers to query production databases without writing SQL manually
- Cloud Automation: Use AWS/GCP MCP servers to provision infrastructure (“Create an S3 bucket for logs”)
- API Integration: GitHub MCP server for issue tracking, Slack MCP server for notifications, Jira for project management
- Custom Tools: Build domain-specific MCP servers (e.g., “deploy” tool that runs your CI/CD pipeline)
- Enterprise Systems: Connect to internal APIs (CRM, ERP) via custom MCP servers with SSO authentication
Where You’ll Apply It
- Project 6: The Postgres Analyst - Connect to a local Postgres database via MCP, query schemas, and run analytics queries
- Project 7: The GitHub Project Manager - Use the official GitHub MCP server to automate issue triage and PR workflows
- Project 9: The AWS Cloud Architect - Leverage the AWS documentation MCP server to query best practices and provision resources
- Project 13: The Custom Tool Builder (Python) - Implement an MCP server from scratch in Python that exposes custom tools
- Project 14: The File System Guardian (Node.js) - Build a Node.js MCP server that provides safe filesystem operations with guardrails
- Project 18: The Docker MCP Integration - Create an MCP server that wraps Docker CLI commands for container management
References
- Official Documentation: Model Context Protocol (MCP) - CLI - Kiro (2025)
- Configuration Guide: Configuration - CLI - MCP - Kiro (2025)
- Examples: MCP Examples - CLI - Kiro (2025)
- Server Directory: MCP Server Directory - Kiro (2025)
- Blog Post: Unlock your development productivity with Kiro and MCP (2025)
- Book: “Building AI Agents” by multiple authors - Chapter on Tool Integration Patterns
- MCP Specification: Model Context Protocol Specification (official spec, 2024-2025)
Key Insights
MCP turns Kiro from a chatbot into an orchestration platform: instead of teaching the AI new skills, you connect specialized workers (MCP servers) that already know their domain, and Kiro coordinates them via natural language.
Summary
Model Context Protocol (MCP) is Kiro’s standardized plugin system for connecting to external tools and data sources. Servers expose tools (callable functions) and resources (queryable data) via JSON-RPC, using either stdio transport (local servers) or HTTP transport (remote servers). Configuration happens in mcp.json files at user or workspace level, with environment variables for secrets. Kiro discovers server capabilities on startup and invokes tools dynamically during conversations, creating a clean separation between AI reasoning (Kiro) and domain execution (MCP servers).
Homework/Exercises to Practice the Concept
- Exercise 1: Add a Fetch MCP Server
  - Install the mcp-server-fetch server using uvx
  - Configure it in .kiro/settings/mcp.json
  - Ask Kiro to fetch https://api.github.com/users/torvalds and summarize the profile
- Exercise 2: Configure GitHub MCP with Authentication
  - Generate a GitHub personal access token (classic) with repo scope
  - Add GITHUB_TOKEN to your environment
  - Configure the @modelcontextprotocol/server-github MCP server
  - Ask Kiro to list all open issues in one of your repos
- Exercise 3: Workspace vs User MCP Configs
  - Add a "fetch" server to your user-level config (~/.kiro/settings/mcp.json)
  - Add a different "fetch" server (or disable it) in a workspace config (.kiro/settings/mcp.json)
  - Start Kiro and verify which configuration wins
- Exercise 4: Explore MCP Server Tools
- Configure any MCP server from the server directory
- Ask Kiro: “What tools does the [server_name] server provide?”
- Test each tool with a realistic query
- Exercise 5: Debug an MCP Server Failure
- Intentionally misconfigure an MCP server (wrong command, missing env var)
- Start Kiro and observe the error messages
- Fix the configuration and verify the server starts successfully
Solutions to the Homework/Exercises
Solution 1:
# Install the server globally (if needed)
$ uvx mcp-server-fetch
# Create/edit .kiro/settings/mcp.json
$ mkdir -p .kiro/settings
$ cat > .kiro/settings/mcp.json << 'EOF'
{
"mcpServers": {
"fetch": {
"command": "uvx",
"args": ["mcp-server-fetch"]
}
}
}
EOF
# Start Kiro and test
$ kiro-cli
> Fetch https://api.github.com/users/torvalds and tell me about this user
Fetching https://api.github.com/users/torvalds...
Linus Torvalds:
- Username: torvalds
- Name: Linus Torvalds
- Public repos: 6
- Followers: 215,000+
- Bio: Creator of Linux and Git
Solution 2:
# Generate token at https://github.com/settings/tokens (classic, repo scope)
$ export GITHUB_TOKEN=ghp_yourTokenHere
# Add to workspace config
$ cat >> .kiro/settings/mcp.json << 'EOF'
{
"mcpServers": {
"fetch": {
"command": "uvx",
"args": ["mcp-server-fetch"]
},
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_TOKEN": "${GITHUB_TOKEN}"
}
}
}
}
EOF
# Start Kiro
$ kiro-cli
> List all open issues in my learning_journey_c repo
Querying GitHub for open issues in learning_journey_c...
Found 3 open issues:
1. #42: "Add DNS deep dive projects" (created 2 days ago)
2. #51: "Expand ARM assembly guide" (created 5 days ago)
3. #58: "Create Rust concurrency chapter" (created 1 week ago)
Solution 3:
# User-level config (~/.kiro/settings/mcp.json):
{
"mcpServers": {
"fetch": {
"command": "uvx",
"args": ["mcp-server-fetch"]
}
}
}
# Workspace config (.kiro/settings/mcp.json):
{
"mcpServers": {
"fetch": {
"disabled": true
}
}
}
# Result: The workspace config wins, fetch server is disabled in this project
# Kiro will report: "MCP server 'fetch' is disabled in workspace configuration"
Solution 4:
# Example with SQLite MCP server
$ cat > .kiro/settings/mcp.json << 'EOF'
{
"mcpServers": {
"sqlite": {
"command": "uvx",
"args": ["mcp-server-sqlite", "--db-path", "./test.db"]
}
}
}
EOF
$ kiro-cli
> What tools does the sqlite server provide?
The sqlite MCP server provides these tools:
• query: Execute SQL queries on the database
• create_table: Create a new table with specified schema
• list_tables: Show all tables in the database
• describe_table: Show schema for a specific table
> Create a table called "users" with columns id, name, and email
Creating table "users"...
Table created successfully with columns: id (INTEGER PRIMARY KEY), name (TEXT), email (TEXT)
Solution 5:
# Broken config (missing GITHUB_TOKEN environment variable)
$ cat > .kiro/settings/mcp.json << 'EOF'
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_TOKEN": "${GITHUB_TOKEN}"
}
}
}
}
EOF
$ kiro-cli
Error starting MCP server 'github': Environment variable GITHUB_TOKEN not found
# Fix: Set the environment variable
$ export GITHUB_TOKEN=ghp_yourTokenHere
$ kiro-cli
Starting Kiro CLI...
✓ MCP server 'github' started successfully
>
Chapter 5: Hooks and Guardrails - The Nervous System
Fundamentals
Hooks are Kiro’s event-driven automation system—shell scripts that execute at specific moments in the agent’s workflow (before user prompts, before tool use, after tool use, on agent spawn). They act as the “nervous system,” reacting to events without you explicitly calling them. Instead of telling Kiro “run this command before every file write,” you configure a preToolUse hook with a matcher for the write tool, and it fires automatically every time. Hooks enable guardrails (prevent dangerous operations), auditing (log every command), auto-formatting (run Prettier after writes), and dynamic context injection (add git status before each prompt).
How This Fits on Projects
- Project 8: Build a type-safe pre-commit hook system in TypeScript with Bun
- Project 17: Create a security hook that validates code before execution
- Project 18: Implement a guardrail hook that prevents destructive operations
- Project 20: Build a Git context injector hook that adds repo state to every conversation
- Project 22: Create a test-generator hook that auto-generates tests after writes
- Project 24: Implement a secret sanitizer hook that scans for exposed credentials
Definitions & Key Terms
- Hook: A shell command triggered by Kiro events (agentSpawn, userPromptSubmit, preToolUse, postToolUse)
- agentSpawn: Fires once when the agent initializes (setup tasks, environment checks)
- userPromptSubmit: Fires when you submit a message (inject context, run pre-flight checks)
- preToolUse: Fires before a tool executes (validation, blocking dangerous operations)
- postToolUse: Fires after a tool executes (auto-formatting, cleanup, notifications)
- Matcher: Pattern to filter which tools trigger preToolUse/postToolUse hooks
- Blocking Hook: A preToolUse hook that returns exit code 2 to prevent tool execution
- Passthrough Hook: A hook that modifies stdin/stdout but allows execution (exit code 0)
- Scope: Where the hook is configured (global, project, agent-specific)
Mental Model Diagram
┌───────────────────────────────────────────────────────┐
│ KIRO HOOK LIFECYCLE (Event-Driven) │
└───────────────────────────────────────────────────────┘
1. Agent Startup
↓
agentSpawn Hook(s) ────→ Setup environment, load data
↓
2. User Types Message: "Write auth.ts file"
↓
userPromptSubmit Hook(s) ──→ Inject git status, check branch
↓
3. Kiro Decides: "Use Write tool for auth.ts"
↓
preToolUse Hook(s) ────────→ Validate (check for secrets)
├─ Exit 0: ✅ Allow │ - Scan content for API_KEY
└─ Exit 2: ❌ Block │ - Return error to Kiro
↓
4. Write Tool Executes: Create auth.ts
↓
postToolUse Hook(s) ────────→ Auto-format, run linter
↓ - prettier --write auth.ts
5. Return Result to User - eslint --fix auth.ts
↓
6. User Types Next Message...
↓
(Loop back to step 2)
Hook Configuration:
┌────────────────────────────────────────────────┐
│ .kiro/settings/agent-config.json │
├────────────────────────────────────────────────┤
│ { │
│ "hooks": { │
│ "preToolUse": [ │
│ { │
│ "matcher": "write", │
│ "command": "scan_for_secrets.sh" │
│ } │
│ ], │
│ "postToolUse": [ │
│ { │
│ "matcher": "write", │
│ "command": "prettier --write \"$FILE\"" │
│ } │
│ ] │
│ } │
│ } │
└────────────────────────────────────────────────┘
Deep Dive
Hooks solve the automation gap between “Kiro does what I ask” and “Kiro respects my team’s policies automatically.” Without hooks, you’d need to remind Kiro every time: “Before you write code, check for secrets. After you write code, run the formatter.” Hooks make these rules automatic.
The Four Hook Types:
- agentSpawn (Initialization Hook)
- When: Once, when the agent starts
- Use Cases:
  - Load environment variables from .env
  - Check if required tools are installed (docker, kubectl, etc.)
  - Verify git branch ("Only work on feature/* branches")
  - Initialize workspace state (create temp directories, start services)
- Exit Behavior: Exit code ignored (non-blocking)
- Example:
"command": "[ $(git branch --show-current) = main ] && echo 'ERROR: Do not work on main branch' >&2 || true"
- userPromptSubmit (Context Injection Hook)
- When: Every time you submit a message
- Use Cases:
  - Inject git status into conversation: git status --short
  - Add current branch: echo "Current branch: $(git branch --show-current)"
  - Show failing tests: npm test 2>&1 | grep FAIL
  - Add TODO comments: rg "TODO|FIXME" -n
- Exit Behavior: Exit code ignored (non-blocking)
- stdin/stdout: Hook output is appended to your message as context
- Example:
"command": "git diff --stat"
- preToolUse (Validation/Blocking Hook)
- When: Before Kiro executes a tool (write, edit, bash, etc.)
- Use Cases:
  - Block dangerous operations: Prevent rm -rf in bash commands
  - Validate file writes: Scan for hardcoded secrets (API keys, passwords)
  - Enforce patterns: Reject writes that don't match code style
  - Audit logging: Log every tool invocation to /tmp/audit.log
- Exit Behavior:
- Exit 0: Tool executes normally
- Exit 2: Tool is blocked, stderr is shown to Kiro as an error
- stdin: Receives tool parameters (e.g., file path, command text)
- Example:
"command": "grep -q 'API_KEY' && exit 2 || exit 0"(block if API_KEY found)
- postToolUse (Cleanup/Formatting Hook)
- When: After Kiro executes a tool successfully
- Use Cases:
  - Auto-format code: Run prettier, rustfmt, or black after writes
  - Run linters: eslint --fix, cargo clippy
  - Generate docs: Update API docs after code changes
  - Trigger builds: Run npm run build after dependency changes
  - Notify teams: Send Slack message after deployments
- Exit Behavior: Exit code ignored (tool already executed)
- stdin: Receives tool result (e.g., file content after write)
- Example:
"command": "prettier --write \"$FILE\""
Matcher Patterns:
For preToolUse and postToolUse, you can filter which tools trigger the hook:
"matcher": "write"        (exact match: only the write tool)
"matcher": "write|edit"   (multiple tools, regex OR)
"matcher": ".*"           (all tools, wildcard)
"matcher": "bash"         (only bash commands)
Configuration Hierarchy:
Hooks follow Kiro’s three-tier system:
- Global (~/.kiro/settings/agent-config.json): Hooks that apply to all projects
- Project (.kiro/settings/agent-config.json): Project-specific hooks
- Agent (.kiro/agents/my-agent/config.json): Agent-specific hooks
Merge behavior: All hooks accumulate (global + project + agent). They execute in order: global → project → agent.
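A minimal sketch of that accumulation, with illustrative commands: if the global config logs every write and the project config formats it, both hooks fire on each write, global first.
# ~/.kiro/settings/agent-config.json (global)
{ "hooks": { "postToolUse": [ { "matcher": "write", "command": "echo \"$(date): write\" >> ~/.kiro/audit.log" } ] } }

# .kiro/settings/agent-config.json (project)
{ "hooks": { "postToolUse": [ { "matcher": "write", "command": "prettier --write \"$FILE\"" } ] } }

# On a write, Kiro runs the global audit logger first, then the project's Prettier hook.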
Security Implications:
- Hooks run with your shell’s permissions (same as Kiro)
- A malicious preToolUse hook can read all tool parameters (including file contents)
- A malicious postToolUse hook can modify tool results before Kiro sees them
- Best practice: Keep hooks in version-controlled project repos, review changes carefully
Hook Execution Model:
- Synchronous: Kiro blocks until the hook completes
- Timeout: Hooks have a default 30-second timeout (configurable)
- Environment: Hooks inherit Kiro’s environment variables
- Working Directory: Hooks execute in the project root
- stdin/stdout/stderr (a passthrough sketch follows this list):
  - preToolUse: stdin = tool parameters, stdout = modified parameters (if exit 0), stderr = error message (if exit 2)
  - postToolUse: stdin = tool result, stdout = logged (not returned to Kiro), stderr = logged as warning
  - userPromptSubmit: stdout = appended to user message, stderr = logged as warning
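As an illustration of that stdin/stdout contract, here is a sketch of a preToolUse passthrough hook that audits the parameters it receives and lets the tool run. Treat the exact shape of the stdin payload as tool-dependent rather than guaranteed:
#!/bin/bash
# preToolUse passthrough hook: log the tool parameters, then allow execution.
PARAMS=$(cat)                                    # tool parameters arrive on stdin
echo "$(date -u +%FT%TZ) $PARAMS" >> ~/.kiro/tool-audit.log
echo "$PARAMS"                                   # exit 0 + stdout = pass parameters through unchanged
exit 0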
How It Works (Step-by-Step)
- Configure Hook: Add to .kiro/settings/agent-config.json:
{
  "hooks": {
    "preToolUse": [
      {
        "matcher": "bash",
        "command": "~/.kiro/scripts/audit-bash.sh"
      }
    ]
  }
}
- Create Hook Script (~/.kiro/scripts/audit-bash.sh):
#!/bin/bash
# Read bash command from stdin
CMD=$(cat)
# Log to audit file
echo "$(date): $CMD" >> ~/.kiro/bash-audit.log
# Block dangerous commands
if echo "$CMD" | grep -qE "rm -rf|sudo|shutdown"; then
  echo "ERROR: Dangerous command blocked" >&2
  exit 2
fi
# Allow safe commands
exit 0
- Make Executable: chmod +x ~/.kiro/scripts/audit-bash.sh
- Test: Ask Kiro to run ls -la (should log and execute), then ask it to run rm -rf /tmp/test (should block and show an error)
Minimal Concrete Example
1. Auto-format code after every write:
{
"hooks": {
"postToolUse": [
{
"matcher": "write",
"command": "prettier --write \"${FILE}\""
}
]
}
}
2. Inject git status before every user message:
{
"hooks": {
"userPromptSubmit": [
{
"command": "echo '\n\n--- Git Status ---' && git status --short"
}
]
}
}
3. Block file writes containing “TODO”:
{
"hooks": {
"preToolUse": [
{
"matcher": "write",
"command": "grep -q 'TODO' && echo 'ERROR: Remove TODOs before writing' >&2 && exit 2 || exit 0"
}
]
}
}
Common Misconceptions
- “Hooks are called manually by Kiro”
- Truth: Hooks are event-driven. Kiro triggers them automatically based on events (prompt submit, tool use). You never “call” a hook.
- “preToolUse hooks can modify the tool’s behavior”
- Truth: They can only block (exit 2) or allow (exit 0). They can’t change what the tool does. (Exception: stdin passthrough can modify parameters, but this is advanced.)
- “postToolUse hooks run before the tool”
- Truth: They run after the tool completes successfully. Use
preToolUsefor pre-execution validation.
- Truth: They run after the tool completes successfully. Use
- “Hooks must be written in Bash”
- Truth: Any executable works (Python, Node.js, compiled binaries). The
commandjust needs to be a valid shell command.
- Truth: Any executable works (Python, Node.js, compiled binaries). The
- “Blocking a preToolUse hook stops the entire conversation”
- Truth: It stops that specific tool invocation. Kiro receives the error message and can try a different approach or ask you for guidance.
Check-Your-Understanding Questions
- What’s the difference between
preToolUseandpostToolUsehooks? - If a
preToolUsehook exits with code 2, what happens? - When does the
userPromptSubmithook fire, and what is its output used for? - Can you use
preToolUseto auto-format code? Why or why not? - If you define the same hook in both global and project configs, which one runs?
Check-Your-Understanding Answers
- preToolUse fires before a tool executes and can block it (exit 2). postToolUse fires after a tool executes and cannot block it (the tool already ran). Use pre for validation, post for cleanup/formatting.
- The tool is blocked from executing. Kiro receives the hook’s stderr as an error message and must choose a different action (ask you, try a different tool, etc.).
- userPromptSubmit fires when you submit a message (before Kiro processes it). Its stdout is appended to your message as context, allowing you to inject dynamic information (git status, test results, etc.).
- No. preToolUse runs before the tool, so there’s no code to format yet. Use postToolUse with a write|edit matcher to format code after it’s written.
- Both run, in order: global first, then project. Hooks accumulate across configuration tiers. If you want project-specific behavior, you can use conditionals inside the hook script.
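For that last point, a minimal sketch of a project-specific conditional inside a global hook script (the repo name and the extra context it injects are illustrative):
#!/bin/bash
# Global userPromptSubmit hook: only inject extra context inside one repo
REPO=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)")
if [ "$REPO" = "payments-service" ]; then
  echo "--- Open TODOs (payments-service only) ---"
  grep -rn "TODO" src/ 2>/dev/null | head -5
fi
exit 0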
Real-World Applications
- Security Scanning: preToolUse hook that scans file writes for hardcoded secrets (API keys, passwords) and blocks the write with exit code 2
- Code Quality: postToolUse hook that runs linters (eslint --fix, cargo clippy) after every code write
- Compliance Auditing: Log every bash command to an audit file with preToolUse (exit 0 to allow, but log first)
- Context Enrichment: userPromptSubmit hook that adds git diff, test status, and current branch to every conversation
- CI/CD Integration: postToolUse hook that triggers builds or deployments after dependency changes
- Policy Enforcement: preToolUse hook that prevents Kiro from editing production config files or running destructive commands
Where You’ll Apply It
- Project 8: The Type-Safe Hook with Bun - Build a pre-commit hook system in TypeScript using Bun’s fast runtime
- Project 17: The Security Firewall Hook - Create a preToolUse hook that validates all operations against security policies
- Project 18: The Secret Sanitizer Hook - Implement a hook that scans for exposed credentials before writes
- Project 20: The Git Context Injector - Build a userPromptSubmit hook that adds rich git context to every conversation
- Project 22: The Test Generator Hook - Create a postToolUse hook that auto-generates test files after code writes
- Project 24: The Auto-Fixer Loop - Combine hooks to create a feedback loop (write → lint → auto-fix → verify)
References
- Official Documentation: Hooks - CLI - Kiro (2025)
- Agent Configuration Reference: Agent Configuration - CLI - Kiro (2025)
- Settings Reference: Settings - CLI - Kiro (2025)
- Blog Post: Introducing Kiro Powers (2025) - discusses hooks as part of Powers
- Book: “Shell Scripting: Expert Recipes for Linux, Bash, and More” by Steve Parker - Advanced shell scripting techniques for hooks
- Book: “Wicked Cool Shell Scripts” by Dave Taylor - Practical shell script patterns applicable to hooks
Key Insights
Hooks turn Kiro from a reactive assistant into a proactive enforcer of team policies, executing guardrails, formatters, and context injectors automatically without manual prompting—the difference between “remember to run the linter” and “the linter always runs.”
Summary
Hooks are event-driven shell scripts that execute at specific moments in Kiro’s workflow: agentSpawn (initialization), userPromptSubmit (before processing user messages), preToolUse (before tool execution, can block), and postToolUse (after tool execution). They enable guardrails, auditing, auto-formatting, and dynamic context injection. Configured in agent-config.json at global, project, or agent scope, hooks accumulate across tiers and execute synchronously with access to stdin/stdout/stderr for data flow.
Homework/Exercises to Practice the Concept
- Exercise 1: Context Injection with userPromptSubmit
- Create a userPromptSubmit hook that adds git status to every message
- Test by asking Kiro to describe the current state of your repo
- Verify the git status appears in the conversation context
- Exercise 2: Auto-Format with postToolUse
- Install Prettier (npm install -g prettier)
- Create a postToolUse hook with matcher "write" that runs prettier --write on the file
- Ask Kiro to write a poorly-formatted JS file
- Verify Prettier auto-formats it
- Exercise 3: Block Dangerous Commands with preToolUse
- Create a preToolUse hook with matcher "bash" that blocks commands containing rm -rf
- Test by asking Kiro to “remove the /tmp/test directory with rm -rf”
- Verify the hook blocks it and returns an error to Kiro
- Exercise 4: Audit Logging
- Create a preToolUse hook that logs all bash commands to ~/.kiro/audit.log (but doesn’t block)
- Run several Kiro commands (bash, write, etc.)
- Verify the audit log contains all bash commands with timestamps
- Exercise 5: Hook Execution Order
- Add the same userPromptSubmit hook to both global and project configs (with different echo messages)
- Submit a message to Kiro
- Verify both hooks run and observe the execution order (global → project)
Solutions to the Homework/Exercises
Solution 1:
# Create/edit .kiro/settings/agent-config.json
$ cat > .kiro/settings/agent-config.json << 'EOF'
{
"hooks": {
"userPromptSubmit": [
{
"command": "echo '\n\n--- Current Git Status ---' && git status --short"
}
]
}
}
EOF
# Start Kiro and test
$ kiro-cli
> What files have I changed?
# Kiro sees your message PLUS the git status output:
# "What files have I changed?
#
# --- Current Git Status ---
# M src/auth.ts
# M src/utils.ts
# ?? src/new-file.ts"
# Kiro responds: "Based on the git status, you've modified auth.ts and utils.ts, and added a new untracked file new-file.ts."
Solution 2:
# Install Prettier
$ npm install -g prettier
# Create hook configuration
$ cat > .kiro/settings/agent-config.json << 'EOF'
{
"hooks": {
"postToolUse": [
{
"matcher": "write",
"command": "prettier --write '${FILE}' 2>&1"
}
]
}
}
EOF
# Test with Kiro
$ kiro-cli
> Write a file called test.js with this content:
> function foo(){return{a:1,b:2}}
# Kiro writes the file (poorly formatted)
# Hook triggers: prettier --write test.js
# File is automatically reformatted:
# function foo() {
# return { a: 1, b: 2 };
# }
Solution 3:
# Create hook script
$ mkdir -p ~/.kiro/scripts
$ cat > ~/.kiro/scripts/block-dangerous-rm.sh << 'EOF'
#!/bin/bash
CMD=$(cat)
if echo "$CMD" | grep -q "rm -rf"; then
echo "ERROR: Dangerous command 'rm -rf' is blocked by security policy" >&2
exit 2
fi
exit 0
EOF
$ chmod +x ~/.kiro/scripts/block-dangerous-rm.sh
# Configure hook
$ cat > .kiro/settings/agent-config.json << 'EOF'
{
"hooks": {
"preToolUse": [
{
"matcher": "bash",
"command": "~/.kiro/scripts/block-dangerous-rm.sh"
}
]
}
}
EOF
# Test
$ kiro-cli
> Remove the /tmp/test directory using rm -rf
# Kiro receives error from hook:
# "ERROR: Dangerous command 'rm -rf' is blocked by security policy"
# Kiro responds: "I cannot execute that command because it's blocked by your security policy. Would you like me to use a safer alternative like moving the directory to trash?"
Solution 4:
# Create audit hook
$ cat > ~/.kiro/scripts/audit-bash.sh << 'EOF'
#!/bin/bash
CMD=$(cat)
echo "$(date '+%Y-%m-%d %H:%M:%S') - BASH: $CMD" >> ~/.kiro/audit.log
cat <<< "$CMD" # Pass through unmodified
exit 0
EOF
$ chmod +x ~/.kiro/scripts/audit-bash.sh
# Configure hook
$ cat > ~/.kiro/settings/agent-config.json << 'EOF'
{
"hooks": {
"preToolUse": [
{
"matcher": "bash",
"command": "~/.kiro/scripts/audit-bash.sh"
}
]
}
}
EOF
# Test
$ kiro-cli
> List files in the current directory
> Check disk usage
> Show git log
# Verify audit log
$ cat ~/.kiro/audit.log
2025-01-02 14:32:15 - BASH: ls -la
2025-01-02 14:32:45 - BASH: df -h
2025-01-02 14:33:10 - BASH: git log --oneline -10
Solution 5:
# Global config (~/.kiro/settings/agent-config.json):
{
"hooks": {
"userPromptSubmit": [
{
"command": "echo '[GLOBAL HOOK] Current time:' && date"
}
]
}
}
# Project config (.kiro/settings/agent-config.json):
{
"hooks": {
"userPromptSubmit": [
{
"command": "echo '[PROJECT HOOK] Current branch:' && git branch --show-current"
}
]
}
}
# Start Kiro and submit a message
$ kiro-cli
> Hello
# Kiro sees:
# "Hello
#
# [GLOBAL HOOK] Current time:
# Thu Jan 2 14:35:22 PST 2025
#
# [PROJECT HOOK] Current branch:
# feature/hooks-demo"
# Both hooks run, global first, then project
Chapter 6: Planning and Subagents - Split Thinking From Doing
Fundamentals
Planning and subagents are Kiro’s separation-of-concerns mechanisms for complex workflows. Planning agents (accessed via Shift+Tab or /plan) are specialized agents that transform vague ideas into structured implementation plans without executing code—pure thinking, no doing. Subagents are autonomous worker processes that run tasks in parallel with isolated context windows, allowing Kiro to investigate multiple data sources simultaneously or delegate specialized work without polluting the main conversation context. Together, they enable Kiro to scale beyond the “one brain, one task” bottleneck: plan the work (planning agent), then split the execution across multiple workers (subagents).
How This Fits on Projects
- Project 4: Build a subagent researcher that explores multiple data sources in parallel
- Project 5: Create a plan architect that generates technical specifications from user stories
- Project 10: Implement a subagent orchestrator for parallel task processing
- Project 11: Build a planning agent workflow that separates design from implementation
- Project 25: Create a code review workflow using multiple subagents (security, style, tests)
- Project 29: Implement a background worker system using delegate subagents
Definitions & Key Terms
- Planning Agent: A built-in specialized agent (Shift+Tab or /plan) that creates implementation plans without executing code
- Subagent: An autonomous worker process with its own context window, spawned by the main agent
- Parallel Execution: Running multiple subagents simultaneously (up to 10 concurrent tasks)
- Context Isolation: Each subagent has its own 200K token context, separate from the main agent
- Delegate: Background subagent that runs tasks asynchronously while you continue working
- Context Gatherer: Built-in subagent for exploring projects and collecting information
- General-Purpose Subagent: Built-in subagent for parallelizing arbitrary tasks
- Custom Subagent: User-defined subagent with specialized configuration (tools, steering, MCP servers)
- Plan Mode: Interactive workflow where planning agent asks clarifying questions before generating spec
Mental Model Diagram
┌────────────────────────────────────────────────────────────┐
│ MAIN KIRO CONVERSATION │
│ Context: 58K / 200K tokens (chat history, files) │
│ Focus: Coordinating work, making decisions │
└─────────────────┬──────────────────────────────────────────┘
│
│ Spawns Subagents (Parallel Execution)
│
┌─────────┴─────────┬──────────────┬─────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌──────────────┐ ┌──────────────┐
│ Subagent 1 │ │ Subagent 2 │ │ Subagent 3 │ │ Planning │
│ (Research) │ │ (Code Review) │ │ (Testing) │ │ Agent │
├───────────────┤ ├───────────────┤ ├──────────────┤ ├──────────────┤
│ Context: │ │ Context: │ │ Context: │ │ Context: │
│ 12K / 200K │ │ 8K / 200K │ │ 6K / 200K │ │ 15K / 200K │
│ │ │ │ │ │ │ │
│ Task: │ │ Task: │ │ Task: │ │ Task: │
│ - Fetch GitHub│ │ - Scan for │ │ - Run pytest │ │ - Draft tech │
│ issues │ │ security │ │ - Analyze │ │ spec │
│ - Summarize │ │ vulns │ │ coverage │ │ - Create │
│ priorities │ │ - Check deps │ │ - Report │ │ subtasks │
│ │ │ │ │ failures │ │ (No Execute!)│
└───────┬───────┘ └───────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │ │
│ Results │ Results │ Results │ Plan
└──────────────────┴─────────────────┴─────────────────┘
│
┌─────────────┴──────────────┐
│ MAIN AGENT SYNTHESIZES │
│ Combines all results │
│ Makes final decision │
└─────────────────────────────┘
Planning Agent Workflow (Shift+Tab):
1. User: "Build a feature for user authentication"
2. Plan Agent asks: "OAuth or custom? Session or JWT? Database?"
3. User answers clarifying questions
4. Plan Agent outputs: Markdown file with:
- Technical spec
- Step-by-step tasks
- Files to create/modify
- Testing strategy
5. User approves → Main agent executes plan
Deep Dive
As projects grow, two bottlenecks emerge: context limits (you can’t fit the entire codebase in 200K tokens) and serial execution (Kiro can’t investigate multiple things at once). Planning agents and subagents solve these problems by splitting work across specialized processes.
Planning Agents: Thinking Without Doing
The planning agent is a built-in specialized persona optimized for architectural thinking, not execution. When you press Shift+Tab or type /plan, Kiro switches to plan mode:
- Clarification Phase: The agent asks targeted questions to understand requirements (“Which database? Which authentication method? What’s the deployment target?”)
- Specification Phase: It drafts a technical spec as a structured Markdown document with clear sections (Overview, Architecture, Tasks, Testing, Deployment)
- Review Phase: You review the plan, make edits, and approve
- Handoff: The main agent receives the plan as context and executes it step-by-step
Why separate planning from execution?
- Different mindsets: Planning requires breadth (consider all options, identify risks). Execution requires depth (focus on current task, handle edge cases).
- Avoid premature optimization: Without a plan, agents tend to start coding immediately, leading to rework when requirements change.
- Clear checkpoints: You approve the plan before code is written, catching misunderstandings early.
Example Planning Agent Output:
# Technical Specification: User Authentication
## Overview
Implement JWT-based authentication with refresh tokens.
## Architecture
- Auth middleware in `src/middleware/auth.ts`
- Token service in `src/services/token.ts`
- User model in `src/models/user.ts`
## Tasks
1. Install dependencies (`jsonwebtoken`, `bcrypt`)
2. Create User model with password hashing
3. Implement token service (generate, verify, refresh)
4. Add auth middleware to protect routes
5. Create login/logout endpoints
## Testing Strategy
- Unit tests for token service
- Integration tests for auth endpoints
- E2E tests for login/logout flow
## Security Considerations
- Store refresh tokens in HTTP-only cookies
- Use environment variables for JWT secret
- Hash passwords with bcrypt (12 rounds)
Subagents: Parallel Execution with Context Isolation
Subagents are separate Kiro processes spawned by the main agent. Each has:
- Own context window: 200K tokens, isolated from main agent
- Own tools: Access to read/write, bash, MCP servers (configurable)
- Own configuration: Can use custom agent configs with specialized steering
Three Types of Subagents:
- Context Gatherer (built-in): Explores projects, reads files, summarizes information
- Use case: “Gather all error handling patterns in the codebase”
- Output: Summary of findings, returned to main agent
- General-Purpose (built-in): Executes arbitrary tasks in parallel
- Use case: “Check GitHub issues, run tests, and fetch AWS logs—simultaneously”
- Output: Results from all tasks, aggregated
- Custom Subagents (user-defined): Specialized workers with custom configurations
- Example: Security scanner subagent (with the mcp-server-semgrep MCP server + security steering)
- Example: Performance profiler subagent (with flamegraph tools + profiling prompts)
How Subagents Work (Step-by-Step):
- Main agent decides a task benefits from parallelization or needs isolated context
- Spawns subagent(s) with specific instructions:
[Subagent 1: Analyze auth.ts for security issues]
[Subagent 2: Run tests and report failures]
- Each subagent works autonomously:
- Reads files, runs tools, calls MCP servers
- Context grows within its own 200K window (doesn’t affect main agent)
- Subagents complete and return results to main agent
- Main agent synthesizes results: “Subagent 1 found 3 security issues. Subagent 2 reports 2 failing tests.”
- Main agent presents combined findings to you
Parallelization Example:
Instead of serial execution (slow):
Main Agent:
1. Fetch GitHub issues (30 seconds)
2. Run tests (60 seconds)
3. Query database for metrics (20 seconds)
Total: 110 seconds
With subagents (fast):
Main Agent spawns 3 subagents:
├─ Subagent 1: Fetch GitHub issues (30 seconds)
├─ Subagent 2: Run tests (60 seconds)
└─ Subagent 3: Query database (20 seconds)
All run simultaneously. Total: 60 seconds (slowest task)
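For intuition, the same arithmetic can be reproduced with ordinary shell job control (scaled down to seconds; this is an analogy, not Kiro's subagent mechanism):
#!/bin/bash
# Serial: tasks run back-to-back, wall time ≈ 3 + 6 + 2 = 11 seconds
time ( sleep 3; sleep 6; sleep 2 )
# Parallel: tasks run as background jobs, wall time ≈ slowest task ≈ 6 seconds
time ( sleep 3 & sleep 6 & sleep 2 & wait )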
Context Isolation Benefits:
- Main agent stays focused: Doesn’t get polluted with subagent’s exploratory work
- Deeper investigation: Subagent can read hundreds of files without hitting main agent’s context limit
- Clean summarization: Subagent distills findings before returning to main agent
Configuration Hierarchy for Subagents:
Subagents can inherit configurations or use custom ones:
- Inherit from main: Default behavior (same MCP servers, hooks, steering)
- Custom config: Specify .kiro/agents/security-scanner/config.json with specialized tools/steering
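A minimal sketch of such a custom config, saved at .kiro/agents/security-scanner/config.json (the same fields appear in Solution 4 later in this chapter; treat the exact schema as an assumption):
{
  "name": "Security Scanner",
  "description": "Read-only subagent for vulnerability scanning",
  "allowedTools": ["read", "grep", "glob"],
  "steering": {
    "security.md": "# Scan rules\n- Flag hardcoded secrets\n- Report findings with severity (HIGH, MEDIUM, LOW)"
  }
}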
How It Works (Step-by-Step)
Planning Agent Workflow:
- Trigger: Press Shift+Tab or type /plan
- Context Injection: Kiro reads project files, git status, existing docs
- Clarification: “What authentication method? What database? What’s the deployment target?”
- User Interaction: You answer questions via chat
- Draft Plan: Planning agent generates a Markdown spec with sections (Overview, Architecture, Tasks, Testing)
- Review & Approve: You edit the plan or approve it
- Handoff: Main agent receives the plan and begins execution
Subagent Workflow:
- Spawn: Main agent decides to delegate a task and spawns subagent with instructions
- Isolation: Subagent gets own context window (200K tokens, empty at start)
- Execution: Subagent reads files, runs tools, calls MCP servers autonomously
- Completion: Subagent returns results (summary, findings, errors)
- Synthesis: Main agent integrates results and continues
Minimal Concrete Example
1. Use Planning Agent:
$ kiro-cli
> I need to add user authentication to my app
# Press Shift+Tab to enter plan mode
[Plan Mode Activated]
Plan Agent: "To design the authentication system, I need to clarify a few things:
1. Which authentication method? (OAuth, JWT, Session-based)
2. Which database are you using?
3. Do you need social login (Google, GitHub)?
4. What's your deployment environment?"
> JWT, PostgreSQL, no social login, Docker
Plan Agent: "Great. I'll draft a technical spec for JWT authentication with PostgreSQL."
[Generates 50-line Markdown spec with architecture, tasks, and testing strategy]
> Looks good, proceed
[Main agent takes over, executes plan step-by-step]
2. Spawn Subagents for Parallel Tasks:
$ kiro-cli
> Investigate why the build is failing. Check:
> 1. GitHub Actions logs
> 2. Local test failures
> 3. Dependency conflicts
Main Agent: "I'll spawn 3 subagents to investigate in parallel."
[Spawns Subagent 1: Fetch GitHub Actions logs via gh CLI]
[Spawns Subagent 2: Run tests locally and analyze failures]
[Spawns Subagent 3: Check package.json for dependency conflicts]
[All run simultaneously, ~30 seconds]
Subagent 1 Result: "GitHub Actions failed on 'npm install' step. Error: ENOTFOUND registry.npmjs.org"
Subagent 2 Result: "Local tests pass. No failures."
Subagent 3 Result: "Found conflicting versions: react@17 (package.json) vs react@18 (package-lock.json)"
Main Agent: "The build is failing because npm install can't reach the registry (network issue), and there's a React version conflict. Fix the conflict first, then retry CI."
Common Misconceptions
- “Planning agents execute code”
- Truth: Planning agents only think and write specs. They don’t call tools like write, bash, or edit. Their output is a Markdown plan, which the main agent executes.
- “Subagents share context with the main agent”
- Truth: Each subagent has an isolated 200K context window. They don’t see the main agent’s conversation history. This is intentional (keeps main context clean).
- “Subagents are slower than main agent execution”
- Truth: For parallel tasks, subagents are faster (tasks run simultaneously). For serial tasks, there’s overhead from spawning processes.
- “You need to manually manage subagent lifecycles”
- Truth: Kiro spawns and cleans up subagents automatically. You just ask for parallelized work, and Kiro handles the orchestration.
- “Planning mode is mandatory for complex tasks”
- Truth: It’s optional but recommended. You can ask Kiro to build features directly, but planning first reduces rework and gives you control.
Check-Your-Understanding Questions
- What’s the difference between a planning agent and a subagent?
- How many subagents can Kiro run concurrently?
- If a subagent reads 50 files (100K tokens), does that count toward the main agent’s context limit?
- When should you use plan mode (Shift+Tab) vs. asking Kiro to implement directly?
- Can subagents access MCP servers configured in the main agent’s settings?
Check-Your-Understanding Answers
- Planning agent: Specialized for creating implementation specs (thinking, no code execution). Subagent: Autonomous worker for executing tasks in parallel with isolated context.
- 10 concurrent subagents (as of the 2025 Kiro documentation for the autonomous agent variant).
- No. Subagents have isolated context windows. The main agent only sees the subagent’s final result (summary), not every file it read.
- Use plan mode when requirements are unclear, the task is complex, or you want to review the approach before execution. Use direct implementation for well-defined, straightforward tasks.
- Yes, by default. Subagents inherit the main agent’s MCP servers unless you configure them with a custom agent config.
Real-World Applications
- Multi-Repository Refactoring: Spawn subagents to analyze different repos simultaneously, then synthesize changes in main agent
- Security Audits: Parallel subagents check code (static analysis), dependencies (vulnerability scan), and infrastructure (cloud config review)
- Data Pipeline Analysis: Subagents query different data sources (Postgres, S3, APIs) concurrently, main agent aggregates results
- CI/CD Debugging: Parallel investigation of logs (GitHub Actions), tests (local runner), and deployment (AWS CloudWatch)
- Feature Planning: Use planning agent to design complex features (e.g., payment integration) before writing code
- Code Review Workflows: Subagents perform style checks, security scans, and test coverage analysis in parallel
Where You’ll Apply It
- Project 4: The Subagent Researcher - Build a system that spawns multiple subagents to research different topics and synthesize findings
- Project 5: The Plan Architect - Create a workflow that uses planning agents to generate technical specs from user stories
- Project 10: The Subagent Orchestrator - Implement a task queue that distributes work across subagents based on priority
- Project 11: The Planning Agent Workflow - Design a structured development process with mandatory planning phase before implementation
- Project 25: The Code Review Workflow - Use multiple subagents (security, style, tests, docs) to review pull requests in parallel
- Project 29: The Delegate Background Worker - Build async workflows where subagents run long tasks (tests, builds) while you continue working
References
- Official Documentation: Introducing Kiro Autonomous Agent (2025)
- Changelog: Subagents, Plan Agent, Grep/Glob Tools, and MCP Registry (2025)
- Changelog: Web Tools, Subagents, Contextual Hooks (2025)
- GitHub Issue: Enable Parallel Task Execution with Sub-Agent Sessions (2024-2025)
- Autonomous Agent Docs: Using the Agent - Kiro (2025)
- Book: “Designing Data-Intensive Applications” by Martin Kleppmann - Chapter on parallel processing and coordination patterns
- Paper: “Communicating Sequential Processes” by C.A.R. Hoare - Theoretical foundation for concurrent agent systems
Key Insights
Planning agents separate “what to build” from “how to build it,” while subagents break serial execution into parallel workflows—together they transform Kiro from a single-threaded assistant into a multi-agent orchestrator that thinks before acting and works in parallel when possible.
Summary
Planning agents (Shift+Tab or /plan) create implementation specifications without executing code, focusing on architectural thinking and requirements clarification. Subagents are autonomous workers with isolated 200K-token context windows that execute tasks in parallel (up to 10 concurrent). Kiro offers three subagent types: context gatherer (exploration), general-purpose (parallelization), and custom (user-defined configurations). Subagents inherit main agent’s tools and MCP servers by default but can use specialized configs, enabling workflows that split thinking (planning) from doing (subagents) and serial execution from parallel processing.
Homework/Exercises to Practice the Concept
- Exercise 1: Use Planning Agent
- Start Kiro and press Shift+Tab (or type /plan)
- Ask the planning agent to design a REST API for a task management system
- Observe the clarifying questions it asks
- Review the generated technical spec
- Exercise 2: Parallel Data Collection with Subagents
- Ask Kiro to “investigate these three GitHub repos simultaneously: torvalds/linux, nodejs/node, and rust-lang/rust”
- Observe Kiro spawning subagents for parallel execution
- Compare execution time to serial investigation
- Exercise 3: Context Isolation Verification
- Spawn a subagent to read 20+ files in a large codebase
- Check the main agent’s context usage (should not increase significantly)
- Ask the main agent what it knows about those files (should only know the summary the subagent provided)
- Exercise 4: Custom Subagent Configuration
- Create a custom agent config at .kiro/agents/security-scanner/config.json
- Add steering rules for security scanning
- Spawn a subagent using this config to scan code for vulnerabilities
- Exercise 5: Plan-Execute-Review Workflow
- Use planning agent to design a feature (e.g., “user authentication”)
- Approve the plan and let Kiro execute it
- After completion, ask Kiro to review what it built against the original plan
Solutions to the Homework/Exercises
Solution 1:
$ kiro-cli
# Press Shift+Tab to enter plan mode
[Plan Mode Activated]
> Design a REST API for a task management system
Plan Agent: "To design the API, I need some clarifications:
1. Which database? (PostgreSQL, MongoDB, etc.)
2. Authentication method? (JWT, OAuth, Session)
3. What entities besides tasks? (Users, Projects, Tags?)
4. Required endpoints? (CRUD for tasks, filtering, search?)
5. Deployment target? (AWS, Vercel, Docker)"
> PostgreSQL, JWT, Users and Tasks, full CRUD plus search, Docker
Plan Agent: "Great. Here's the technical specification:"
---
# Technical Specification: Task Management REST API
## Overview
RESTful API for task management with JWT authentication and PostgreSQL.
## Database Schema
- Users: id, email, password_hash, created_at
- Tasks: id, user_id, title, description, status, due_date, created_at
## Endpoints
- POST /auth/register - Create user
- POST /auth/login - Get JWT token
- GET /tasks - List user's tasks (with filtering)
- POST /tasks - Create task
- GET /tasks/:id - Get task details
- PUT /tasks/:id - Update task
- DELETE /tasks/:id - Delete task
## Architecture
- Express.js server
- Prisma ORM for PostgreSQL
- JWT middleware for auth
- Docker Compose for PostgreSQL + API
## Implementation Tasks
1. Setup: Express + Prisma + PostgreSQL
2. Models: User and Task schemas
3. Auth: Register, login, JWT middleware
4. CRUD: Task endpoints with auth
5. Testing: Integration tests with supertest
6. Docker: Compose file for deployment
---
> Looks good, proceed
[Main agent begins executing the plan step-by-step]
Solution 2:
$ kiro-cli
> Investigate these three GitHub repos simultaneously: torvalds/linux, nodejs/node, and rust-lang/rust. For each, tell me the star count, primary language, and last commit date.
Main Agent: "I'll spawn 3 subagents to fetch this information in parallel."
[Spawning Subagent 1 for torvalds/linux...]
[Spawning Subagent 2 for nodejs/node...]
[Spawning Subagent 3 for rust-lang/rust...]
[All subagents run ~10 seconds concurrently]
Subagent 1 Result: "torvalds/linux: 182,456 stars, C (98%), last commit 2 hours ago"
Subagent 2 Result: "nodejs/node: 106,234 stars, JavaScript (68%), last commit 5 hours ago"
Subagent 3 Result: "rust-lang/rust: 97,845 stars, Rust (95%), last commit 1 hour ago"
Main Agent:
"Here's the summary:
1. **Linux Kernel** (torvalds/linux): 182K stars, C, active (2h ago)
2. **Node.js** (nodejs/node): 106K stars, JavaScript, active (5h ago)
3. **Rust** (rust-lang/rust): 98K stars, Rust, active (1h ago)
All three are highly active projects with commits in the last few hours."
# Without subagents (serial), this would take ~30 seconds (3 x 10s)
# With subagents (parallel), it took ~10 seconds (slowest task)
Solution 3:
$ kiro-cli
> Spawn a subagent to read all files in the src/ directory and summarize the code architecture. Don't add those files to my context.
Main Agent: "I'll spawn a context gatherer subagent to explore src/."
[Spawning Subagent: Context Gatherer]
Subagent (working in isolated context):
- Reading src/index.ts (8K tokens)
- Reading src/auth.ts (6K tokens)
- Reading src/database.ts (10K tokens)
- Reading src/utils.ts (4K tokens)
[... 15 more files, total 120K tokens in subagent context]
Subagent Result (returned to main agent):
"Architecture summary (2K tokens):
- Entry point: src/index.ts (Express server setup)
- Auth: src/auth.ts (JWT middleware, bcrypt hashing)
- Database: src/database.ts (Prisma client, connection pooling)
- Utils: src/utils.ts (validation, error handling)
- Structure: Layered architecture (routes → controllers → services → models)"
# Check main agent context
> What's in your context right now?
Main Agent: "My context contains:
- Our conversation history (15K tokens)
- The subagent's architecture summary (2K tokens)
- Total: 17K / 200K tokens
I don't have the full contents of those 20 files—only the summary."
Solution 4:
# Create custom security scanner agent config
$ mkdir -p .kiro/agents/security-scanner
$ cat > .kiro/agents/security-scanner/config.json << 'EOF'
{
"name": "Security Scanner",
"description": "Specialized agent for security vulnerability scanning",
"allowedTools": ["read", "grep", "glob", "mcp"],
"mcpServers": {
"semgrep": {
"command": "npx",
"args": ["-y", "@semgrep/semgrep-mcp-server"]
}
},
"steering": {
"security.md": "# Security Scanning Rules\n\n- Scan for hardcoded secrets (API_KEY, PASSWORD)\n- Check for SQL injection vulnerabilities\n- Identify XSS risks\n- Flag insecure dependencies\n- Report findings with severity (HIGH, MEDIUM, LOW)"
}
}
EOF
# Use Kiro to spawn security scanner subagent
$ kiro-cli
> Spawn a security scanner subagent using the config in .kiro/agents/security-scanner to scan all files in src/ for vulnerabilities
Main Agent: "Spawning security scanner subagent with custom configuration..."
[Subagent loads security.md steering + semgrep MCP server]
[Subagent scans all src/ files]
Subagent Result:
"Found 3 security issues:
HIGH: Hardcoded API key in src/config.ts (line 42)
MEDIUM: Potential SQL injection in src/database.ts (line 88)
LOW: Missing HTTPS enforcement in src/server.ts (line 12)"
Main Agent: "The security scan found 3 issues. Would you like me to fix them?"
Solution 5:
$ kiro-cli
# Step 1: Plan
> I want to add user authentication to my Express app. Use plan mode.
# Press Shift+Tab
[Plan Mode Activated]
Plan Agent: "Questions about authentication:
1. JWT or session-based?
2. Which database?
3. Need password reset functionality?"
> JWT, PostgreSQL, yes include password reset
Plan Agent: [Generates detailed spec with 8 sections]
# Step 2: Execute
> Looks good, implement it
Main Agent: [Executes plan over 10 minutes, creating 5 files, running tests]
# Step 3: Review
> Review what you built against the original plan. Did you implement everything?
Main Agent: "Comparing implementation to plan:
✅ JWT token service (implemented in src/auth/tokens.ts)
✅ User model with password hashing (src/models/user.ts)
✅ Login/logout endpoints (src/routes/auth.ts)
✅ Password reset flow (src/routes/reset-password.ts)
✅ Auth middleware (src/middleware/auth.ts)
✅ Tests (tests/auth.test.ts - 12 passing tests)
All planned features are implemented and tested."
Chapter 7: Models and Routing - Choose the Right Brain
Fundamentals
Kiro’s Auto Router is an intelligent model selection system that dynamically chooses the optimal Claude model for each task—balancing speed, cost, and quality without manual intervention. Instead of forcing you to decide “Should I use Haiku (fast/cheap) or Opus (powerful/expensive) for this task?”, Auto Router analyzes the complexity and picks the right model automatically. It’s like having a performance-aware compiler that optimizes execution strategy based on workload: simple file reads use Haiku (0.4x credit cost), complex refactorings use Sonnet 4.5 (1.3x cost), and architectural design uses Opus 4.5 (2.2x cost). Manual model selection (/model command) is available for specific use cases, but Auto (1x cost) is both smarter and more cost-efficient than always using Sonnet.
How This Fits on Projects
- Project 2: Build a model analyzer to understand auto-router decisions
- Project 3: Create a context window visualizer showing token budgets per model
- Project 35: Implement a deep reasoner that explicitly uses Opus for algorithmic challenges
- All Projects: Auto Router selects the optimal model for each task automatically
Definitions & Key Terms
- Auto Router: Kiro’s default intelligent model selector (combines Haiku, Sonnet, Opus based on task complexity)
- Claude Haiku 4.5: Fast, lightweight model (0.4x credits) for simple tasks (file reads, basic queries)
- Claude Sonnet 4.5: Balanced model (1.3x credits) for standard development tasks (coding, refactoring)
- Claude Opus 4.5: Powerful model (2.2x credits) for complex reasoning (architecture, algorithms, debugging)
- Credit Multiplier: Cost factor relative to Auto mode (Auto = 1x baseline)
- Model Switching: Manual model selection via the /model command
- Context Window: Token limit per model (200K for all Claude models as of 2025)
- Experimental Support: Opus 4.5 availability (not supported for AWS IAM Identity Center users)
Mental Model Diagram
┌────────────────────────────────────────────────────────────┐
│ KIRO AUTO ROUTER │
│ "Choose the right brain for each task automatically" │
└────────────────────────────────────────────────────────────┘
User Request: "Refactor the auth system to use OAuth2"
│
▼
┌───────────────────────┐
│ AUTO ROUTER │
│ Analyzes Task: │
│ - Complexity │
│ - Context Size │
│ - User History │
│ - Cost/Quality │
└─────────┬─────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
│ │ │
┌───────▼────────┐ ┌───────▼────────┐ ┌──────▼─────────┐
│ HAIKU 4.5 │ │ SONNET 4.5 │ │ OPUS 4.5 │
│ (0.4x cost) │ │ (1.3x cost) │ │ (2.2x cost) │
├────────────────┤ ├────────────────┤ ├────────────────┤
│ Use Cases: │ │ Use Cases: │ │ Use Cases: │
│ • File reads │ │ • Code writing │ │ • Architecture │
│ • Simple edits │ │ • Refactoring │ │ • Algorithms │
│ • Quick Q&A │ │ • Testing │ │ • Debugging │
│ • Navigation │ │ • Standard dev │ │ • Planning │
│ │ │ │ │ │
│ Context: │ │ Context: │ │ Context: │
│ 200K tokens │ │ 200K tokens │ │ 200K tokens │
│ │ │ │ │ │
│ Speed: │ │ Speed: │ │ Speed: │
│ ⚡⚡⚡ Fastest │ │ ⚡⚡ Fast │ │ ⚡ Slower │
└────────────────┘ └────────────────┘ └────────────────┘
Manual Selection (Override):
$ kiro-cli
> /model opus # Force Opus for all tasks
> /model auto # Return to Auto Router (recommended)
Cost Comparison for Same Workflow:
┌────────────────────────────────────────────────────────┐
│ Task: "Build user auth, run tests, fix 3 bugs" │
├────────────────────────────────────────────────────────┤
│ Auto Router: 100 credits (1x) ← Most efficient │
│ Always Sonnet: 130 credits (1.3x) ← Overpaying │
│ Always Opus: 220 credits (2.2x) ← Very expensive │
│ Always Haiku: 40 credits (0.4x) ← Too weak │
│ → Will fail tasks │
└────────────────────────────────────────────────────────┘
Deep Dive
The Auto Router solves a fundamental trade-off in AI development: stronger models are better at complex tasks but cost more. Manually choosing the “right” model for every task is tedious and error-prone (you’d always pick Opus to be safe, wasting money on simple tasks). Auto Router applies machine learning techniques to predict task complexity and selects the optimal model dynamically.
How Auto Router Works:
- Task Classification: When you submit a request, the router analyzes:
- Intent: What are you asking for? (read file vs. refactor architecture)
- Context Size: How much code/data is involved?
- Complexity Signals: Keywords like “design,” “optimize,” “explain deeply” suggest harder tasks
- Historical Performance: Which model succeeded on similar tasks?
- Model Selection Heuristics:
- Haiku: Tasks with clear, procedural steps (file reads, simple edits, grep searches)
- Sonnet: Standard coding tasks (write functions, refactor, add tests, implement features)
- Opus: High-complexity reasoning (architectural decisions, algorithmic optimizations, debugging race conditions, explaining complex systems)
- Cost-Quality Optimization: Auto Router doesn’t just pick the cheapest model that can do the job—it balances:
- Quality: Will the output be correct and maintainable?
- Cost: Is a cheaper model sufficient?
- Latency: For interactive workflows, speed matters
- Feedback Loop: Over time, the router learns:
- If Sonnet fails on a task type, escalate to Opus
- If Opus is overkill, try Sonnet first next time
- Adapt to your workflow patterns
Cost Economics:
Kiro uses a credit system where Auto = 1x baseline. Here’s the multiplier table:
| Model | Credit Multiplier | When to Use |
|---|---|---|
| Auto | 1x | Default (recommended) |
| Haiku 4.5 | 0.4x | Simple, repetitive tasks (if you’re sure) |
| Sonnet 4.5 | 1.3x | When you need consistent quality |
| Opus 4.5 | 2.2x | Complex reasoning, research, architecture |
Key Insight: Auto mode (1x) is cheaper than manually picking Sonnet (1.3x) for all tasks, because Auto uses Haiku for simple work and Opus only when necessary.
Example Cost Breakdown:
A typical development session:
1. Read 10 files (Haiku): 0.4x * 10 = 4 credits
2. Write auth middleware (Sonnet): 1.3x * 1 = 1.3 credits
3. Refactor API routes (Sonnet): 1.3x * 1 = 1.3 credits
4. Debug race condition (Opus): 2.2x * 1 = 2.2 credits
Total: 8.8 credits with Auto Router
Same session with always-Sonnet:
1. Read 10 files (Sonnet): 1.3x * 10 = 13 credits
2. Write auth (Sonnet): 1.3x * 1 = 1.3 credits
3. Refactor API (Sonnet): 1.3x * 1 = 1.3 credits
4. Debug race (Sonnet, fails): 1.3x * 1 = 1.3 credits
→ Retry with Opus: 2.2x * 1 = 2.2 credits
Total: 19.1 credits (2.2x more expensive, took longer!)
Manual Model Selection:
Use /model command when:
- Experimentation: Testing how different models handle a specific task
- Benchmarking: Comparing output quality across models
- Forcing stronger reasoning: Override Auto for known-hard tasks
- Cost control: Force Haiku when budget is tight and quality is acceptable
$ kiro-cli
> /model opus
Switched to Claude Opus 4.5 (2.2x credits)
> Design a distributed consensus algorithm
[Opus provides detailed explanation with trade-offs, edge cases, and references]
> /model auto
Switched back to Auto Router (1x credits)
Limitations and Trade-Offs:
- Opus 4.5 Availability: Not available for AWS IAM Identity Center users (as of 2025)
- Experimental Status: Opus 4.5 is in experimental support (may have stability issues)
- Routing Overhead: Auto Router adds ~100ms latency for task classification (negligible for most workflows)
- Non-determinism: Auto Router’s choices can vary based on load, user history, and internal heuristics
How It Works (Step-by-Step)
- Submit Request: You ask Kiro to “refactor the auth system”
- Router Analyzes: Scans request for complexity signals (“refactor” → medium complexity)
- Context Check: Measures current context (files loaded, conversation history)
- Model Selection: Chooses Sonnet 4.5 (balanced speed/quality for refactoring)
- Execution: Sonnet processes the request and returns the refactored code
- Monitoring: If Sonnet struggles (multiple retries, low confidence), router escalates to Opus
Minimal Concrete Example
1. Default Auto Router Behavior:
$ kiro-cli # Auto Router is enabled by default
> Read the README.md file
[Auto Router selects: Haiku 4.5 (0.4x cost, simple task)]
# File contents displayed quickly
> Write a function to parse JSON with error handling
[Auto Router selects: Sonnet 4.5 (1.3x cost, standard coding task)]
# Function written with proper try/catch
> Explain the trade-offs between B-tree and LSM-tree for database indexing
[Auto Router selects: Opus 4.5 (2.2x cost, deep technical reasoning)]
# Detailed explanation with diagrams, performance characteristics, use cases
2. Manual Model Override:
$ kiro-cli
> /model haiku
Switched to Claude Haiku 4.5 (0.4x credits)
> Refactor this complex state machine
[Haiku struggles, produces low-quality output]
> /model opus
Switched to Claude Opus 4.5 (2.2x credits)
> Refactor this complex state machine
[Opus produces clean, well-architected solution]
> /model auto
Switched back to Auto Router (recommended)
Common Misconceptions
- “Always using Opus gives better results”
- Truth: For simple tasks (file reads, basic edits), Opus is overkill and more than 5x the cost of the Haiku that Auto Router would pick (2.2x vs 0.4x credits). Quality gains are marginal.
- “Auto Router picks the cheapest model”
- Truth: It balances cost and quality. For complex tasks, it uses Opus even though it’s expensive, because cheaper models would fail.
- “You should manually switch models for each task”
- Truth: Auto Router is smarter than manual selection because it learns from millions of tasks. Manual selection is for edge cases only.
- “Haiku can’t handle coding tasks”
- Truth: Haiku excels at simple, well-defined coding tasks (implementing a known algorithm, adding a basic CRUD endpoint). It struggles with ambiguity and complex reasoning.
- “Auto Router costs the same as Sonnet”
- Truth: Auto (1x) is cheaper than always-Sonnet (1.3x) because it delegates simple tasks to Haiku (0.4x).
Check-Your-Understanding Questions
- What are the credit multipliers for Haiku, Sonnet, Opus, and Auto?
- Why is Auto Router (1x) cheaper than always using Sonnet (1.3x)?
- When should you manually override the Auto Router?
- If a task fails with Haiku, what does Auto Router do?
- Which model has the largest context window?
Check-Your-Understanding Answers
- Haiku: 0.4x, Sonnet: 1.3x, Opus: 2.2x, Auto: 1x (baseline).
- Because Auto Router uses Haiku (0.4x) for simple tasks and Opus (2.2x) only when necessary. Always-Sonnet pays 1.3x even for trivial file reads that Haiku can handle.
- When experimenting, benchmarking, or forcing specific behavior (e.g., you know a task needs Opus-level reasoning and don’t want Auto to try Sonnet first).
- Auto Router escalates to a stronger model (Sonnet or Opus, depending on task complexity) and retries the task.
- All Claude models have 200K-token context windows (Haiku, Sonnet, and Opus all share this limit as of 2025).
Real-World Applications
- Cost-Optimized Workflows: Use Auto Router to minimize costs across large development sessions (e.g., refactoring 50 files)
- Research Tasks: Manually select Opus for deep technical research, academic explanations, or algorithm design
- Rapid Prototyping: Force Haiku for fast iteration on well-scoped features when budget is constrained
- Code Review: Let Auto Router use Sonnet for style/logic checks, Opus for security audits
- Documentation Generation: Auto Router uses Haiku for boilerplate docs, Sonnet for API references, Opus for architectural decision records
Where You’ll Apply It
- Project 2: The Model Router Analyzer - Build a tool that logs Auto Router decisions and visualizes which model was used for each task
- Project 3: The Context Window Visualizer - Create a dashboard showing token usage across models and context budget
- Project 35: The Deep Reasoner - Explicitly force Opus for algorithmic challenges and compare results to Sonnet
- All Projects: Trust Auto Router to select the optimal model; monitor costs via /usage (if available)
References
- Official Documentation: Model Selection - CLI - Kiro (2025)
- Official Documentation: Model Selection - IDE - Kiro (2025)
- Blog Post: Introducing Opus 4.5 in Kiro (2025)
- Analysis: Kiro Supports Claude Opus 4.5 - DevelopersIO (2025)
- Comparison: Testing Code Generated by Claude Opus vs Sonnet vs Haiku (2025)
- Blog Post: When Claude Goes Silent: API Timeouts and Model Selection (2025)
Key Insights
Auto Router transforms model selection from a manual decision (“Which brain do I need?”) into an automated optimization problem, reducing costs by using Haiku for simple tasks and reserving Opus for genuinely hard problems—achieving better quality-per-dollar than always picking Sonnet.
Summary
Kiro’s Auto Router (1x credit cost) intelligently selects between Claude Haiku 4.5 (0.4x, fast/cheap), Sonnet 4.5 (1.3x, balanced), and Opus 4.5 (2.2x, powerful) based on task complexity, context size, and historical performance. It’s more cost-efficient than manual selection because it uses Haiku for simple tasks (file reads, basic edits) and Opus only for complex reasoning (architecture, algorithms, debugging). Manual model switching via /model is available for experimentation or forcing specific behavior, but Auto mode is recommended for production workflows. All models share a 200K-token context window.
Homework/Exercises to Practice the Concept
- Exercise 1: Observe Auto Router Decisions
- Start Kiro in default Auto mode
- Submit 5 tasks of varying complexity (file read, code write, architecture question)
- Try to predict which model Auto Router will use for each
- Verify your predictions (note: Kiro doesn’t expose model choice directly, but you can infer from response quality/speed)
- Exercise 2: Compare Model Quality
- Pick a moderately complex task (“Refactor this function to use async/await”)
- Try it with /model haiku, then /model sonnet, then /model opus
- Compare output quality, response time, and suitability
- Exercise 3: Cost Analysis
- Track a full development session (30 minutes of coding)
- Estimate how many tasks were simple (Haiku-level), standard (Sonnet-level), and complex (Opus-level)
- Calculate hypothetical cost with Auto vs always-Sonnet vs always-Opus
- Exercise 4: Force Opus for Deep Reasoning
- Switch to Opus (/model opus)
- Ask a complex technical question: “Explain the CAP theorem and its implications for distributed database design”
- Compare the depth of explanation to what Sonnet would provide
- Exercise 5: Haiku Speed Test
- Switch to Haiku (/model haiku)
- Ask it to read 10 files and summarize their purpose
- Measure response time and quality
- Compare to Auto Router on the same task
Solutions to the Homework/Exercises
Solution 1:
$ kiro-cli # Auto Router enabled
> Read src/auth.ts
# Prediction: Haiku (simple file read)
# Observation: Fast response (~1 second), basic file listing
# Result: Likely Haiku ✓
> Write a JWT authentication middleware for Express
# Prediction: Sonnet (standard coding task)
# Observation: Medium response time (~5 seconds), well-structured code
# Result: Likely Sonnet ✓
> What are the security implications of using JWTs vs session cookies?
# Prediction: Sonnet or Opus (depends on depth requested)
# Observation: Detailed response covering XSS, CSRF, token expiration, refresh tokens
# Result: Likely Sonnet (good depth) or Opus (exceptional depth) ✓
> Design a distributed rate-limiting system that handles 100K requests/sec with global consistency
# Prediction: Opus (complex distributed systems design)
# Observation: Slow response (~15 seconds), detailed architecture with trade-offs, algorithms, and implementation strategies
# Result: Definitely Opus ✓
> List all files in the src/ directory
# Prediction: Haiku (trivial task)
# Observation: Instant response (<1 second), simple file list
# Result: Likely Haiku ✓
Solution 2:
$ kiro-cli
# Test 1: Haiku
> /model haiku
> Refactor this function to use async/await:
> function getData(id, callback) {
> db.query('SELECT * FROM users WHERE id = ?', [id], (err, result) => {
> if (err) return callback(err);
> callback(null, result);
> });
> }
Haiku Output:
async function getData(id) {
const result = await db.query('SELECT * FROM users WHERE id = ?', [id]);
return result;
}
# Quality: Basic refactor, missing error handling, assumes promise-based db.query
# Test 2: Sonnet
> /model sonnet
> [Same prompt]
Sonnet Output:
async function getData(id) {
try {
const result = await db.query('SELECT * FROM users WHERE id = ?', [id]);
return result;
} catch (err) {
throw new Error(`Failed to fetch user with id ${id}: ${err.message}`);
}
}
# Quality: Proper error handling, clear error messages, good practices ✓
# Test 3: Opus
> /model opus
> [Same prompt]
Opus Output:
async function getData(id) {
try {
const result = await db.query('SELECT * FROM users WHERE id = ?', [id]);
if (!result || result.length === 0) {
throw new Error(`User not found for id: ${id}`);
}
return result[0]; // Return single user object
} catch (err) {
// Log error for debugging
console.error(`Database error in getData(${id}):`, err);
throw new Error(`Failed to fetch user with id ${id}: ${err.message}`);
}
}
# Quality: Handles edge cases (no result), returns single object vs array, includes logging
Conclusion: Sonnet is the sweet spot for this task. Opus adds nice-to-haves but isn't necessary.
Solution 3:
# Development session (30 minutes, 20 tasks):
Simple tasks (Haiku-level, 0.4x): 10 tasks
- Read 5 files (0.4 * 5 = 2.0 credits)
- List directories (0.4 * 1 = 0.4 credits)
- Simple edits (0.4 * 4 = 1.6 credits)
Subtotal: 4.0 credits
Standard tasks (Sonnet-level, 1.3x): 8 tasks
- Write 3 functions (1.3 * 3 = 3.9 credits)
- Refactor 2 modules (1.3 * 2 = 2.6 credits)
- Add 3 test files (1.3 * 3 = 3.9 credits)
Subtotal: 10.4 credits
Complex tasks (Opus-level, 2.2x): 2 tasks
- Debug race condition (2.2 * 1 = 2.2 credits)
- Design API architecture (2.2 * 1 = 2.2 credits)
Subtotal: 4.4 credits
Total costs:
- Auto Router: 4.0 + 10.4 + 4.4 = 18.8 credits (1x equivalent)
- Always Sonnet: 1.3 * 20 = 26 credits (1.38x more expensive)
- Always Opus: 2.2 * 20 = 44 credits (2.34x more expensive)
- Always Haiku: 0.4 * 20 = 8 credits, but 2 tasks would fail → retry with Sonnet → 8 + (1.3 * 2) = 10.6 credits + wasted time
Solution 4:
$ kiro-cli
> /model opus
Switched to Claude Opus 4.5 (2.2x credits)
> Explain the CAP theorem and its implications for distributed database design. Include real-world examples.
Opus Response (excerpt):
"The CAP theorem, proven by Eric Brewer in 2000 and formalized by Gilbert and Lynch in 2002, states that a distributed system can provide at most two of these three guarantees simultaneously:
1. **Consistency (C)**: Every read receives the most recent write
2. **Availability (A)**: Every request receives a response (success or failure)
3. **Partition Tolerance (P)**: The system continues to operate despite network partitions
In practice, network partitions are inevitable in distributed systems (P is mandatory), so the real trade-off is between **CP** (consistency + partition tolerance) and **AP** (availability + partition tolerance).
**CP Systems (Consistency over Availability):**
- **Example**: HBase, MongoDB (with majority writes), ZooKeeper
- **Trade-off**: During a partition, unavailable nodes reject writes to maintain consistency
- **Use case**: Financial transactions (you can't have stale account balances)
**AP Systems (Availability over Consistency):**
- **Example**: Cassandra, DynamoDB, Riak
- **Trade-off**: During a partition, nodes accept writes, leading to temporary inconsistency (resolved via eventual consistency)
- **Use case**: Social media feeds (stale data is acceptable, downtime is not)
**Real-World Design Implications:**
1. **Banking systems** use CP databases (MongoDB with strong consistency, Google Spanner) because correctness > availability
2. **Shopping carts** use AP databases (DynamoDB) with conflict resolution (last-write-wins or merge strategies)
3. **Netflix** uses Cassandra (AP) because showing slightly stale movie recommendations is fine, but the service must always be up
**Modern Nuance:**
The CAP theorem is often oversimplified. Real systems use tunable consistency (e.g., Cassandra's quorum reads) to adjust the CP/AP trade-off per operation..."
[Response continues with diagrams, mathematical proof sketch, references to papers]
# Compare to Sonnet:
> /model sonnet
> [Same question]
Sonnet Response:
"The CAP theorem says you can only have 2 of 3: Consistency, Availability, Partition tolerance. Most systems choose either CP (like MongoDB) or AP (like Cassandra) based on whether they prioritize consistency or availability during network failures."
# Opus provides 10x more depth with real-world examples, trade-offs, and references.
Solution 5:
$ kiro-cli
> /model haiku
Switched to Claude Haiku 4.5 (0.4x credits)
> Read these files and summarize their purpose:
> src/auth.ts, src/database.ts, src/routes.ts, src/middleware.ts, src/utils.ts, src/config.ts, src/server.ts, src/models/user.ts, src/models/post.ts, src/services/email.ts
[Timer start]
Haiku Response (3 seconds total):
"Here's a summary of each file's purpose:
- **src/auth.ts**: Authentication logic (JWT tokens, password hashing)
- **src/database.ts**: Database connection setup (Prisma client)
- **src/routes.ts**: API route definitions
- **src/middleware.ts**: Express middleware (auth, error handling, logging)
- **src/utils.ts**: Utility functions (validation, formatting)
- **src/config.ts**: Configuration management (env variables, constants)
- **src/server.ts**: Server initialization (Express app setup)
- **src/models/user.ts**: User data model (schema, types)
- **src/models/post.ts**: Post data model (schema, types)
- **src/services/email.ts**: Email sending service (SMTP configuration)"
[Timer end: 3 seconds]
# Comparison with Auto Router:
> /model auto
> [Same task]
Auto Router Response (4 seconds total, likely used Haiku):
[Similar output with slightly more detail on relationships between files]
# Result: Haiku is 25% faster and produces acceptable quality for this simple summarization task
# Cost savings: 0.4x vs 1x (60% cheaper with Haiku)
Chapter 8: Tools, Permissions, and Safety - The Security Boundary
Fundamentals
Every Kiro tool—read, write, shell, MCP call—carries risk. A single rm -rf / or accidental API key commit can destroy a project. Kiro’s tool permission system is the safety mechanism that prevents catastrophic mistakes while maintaining workflow velocity. Unlike traditional permission systems (e.g., Unix file permissions), Kiro’s model is declarative and intent-based: you pre-authorize trusted tools and patterns, and Kiro prompts for everything else.
The core abstraction is the allowedTools field in agent configuration. This whitelist defines which tools can execute without user approval. Think of it as a firewall rule: “allow @builtin/read for *.ts files” or “block @mcp/postgres/execute_query unless user confirms.” The system supports wildcards (r*, w*), MCP server namespaces (@fetch, @github/*), and built-in tool groups (@builtin).
Why this matters: Without tool governance, AI agents become unpredictable executors. With it, you create specialized agents for specific tasks (e.g., a “read-only analyzer” agent that can never modify code). Tool permissions also enable audit trails—you know exactly what Kiro did and when it asked for your approval.
Deep Dive
The Permission Model (500+ Words)
Kiro’s tool permission system operates on three layers:
1. Built-in Tools (@builtin namespace)
Kiro CLI ships with 20+ built-in tools: read, write, edit, shell, grep, glob, and more. Each tool has a unique identifier:
- @builtin/read - Read files
- @builtin/write - Create/overwrite files
- @builtin/edit - Modify existing files
- @builtin/shell - Execute shell commands
- @builtin/mcp_call - Invoke MCP server tools
By default, Kiro prompts for approval before using any tool. To pre-approve all built-in tools for a specific agent, add "allowedTools": ["@builtin"] to the agent config. For finer control, use wildcards: ["r*", "grep", "glob"] allows read, grep, and glob but blocks write/edit/shell.
2. MCP Tools (Server Namespace)
MCP servers expose their own tools with namespaced IDs: @github/list_issues, @postgres/query, @fetch/get. The permission model supports:
- Server-level allowlisting: ["@github"] trusts all GitHub server tools
- Tool-level patterns: ["@github/list_*"] allows list_issues, list_prs, etc.
- Explicit tool names: ["@postgres/query"] grants access only to query, not execute or delete
Example MCP permission config:
{
"allowedTools": [
"@github/list_issues",
"@github/create_comment",
"@postgres/query"
]
}
3. Path and Command Restrictions
Beyond tool names, Kiro supports content-based rules:
- toolsSettings.allowedPaths: Restrict file operations to specific directories
- toolsSettings.deniedCommands: Block dangerous shell commands (e.g., ["rm -rf", "dd if=/dev/zero"])
- toolsSettings.trustedCommands: Auto-approve specific shell patterns
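A hedged sketch of how these settings can sit together in an agent config (paths and commands are illustrative; the field names are the ones listed above):
# Illustrative toolsSettings block combining path and command restrictions.
cat > .kiro/agents/restricted-ops.json <<'EOF'
{
  "name": "Restricted Ops",
  "prompt": "Work only inside the project tree; never run destructive commands.",
  "allowedTools": ["@builtin/read", "@builtin/grep", "@builtin/shell"],
  "toolsSettings": {
    "allowedPaths": ["src/**", "tests/**"],
    "deniedCommands": ["rm -rf", "dd if=/dev/zero"],
    "trustedCommands": ["npm test", "npm run lint"]
  }
}
EOF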
Critical Security Concepts:
a. The Trust-All Anti-Pattern
Kiro supports /tools trust-all and /acceptall commands for rapid prototyping. NEVER use these in production. They disable all approval prompts, including:
- Deleting files
- Executing arbitrary shell commands
- Committing to Git
- Deploying to production
- Modifying database schema
b. Wildcard Risks
Using "allowedTools": ["*"] is not supported—Kiro requires explicit patterns. Even "@builtin" (all built-in tools) should be used cautiously. A safer approach:
{
"allowedTools": ["r*", "grep", "glob"], // Read-only tools
"deniedTools": ["write", "edit", "shell"] // Explicitly block writes
}
c. Hook-Based Guardrails
Tool permissions are static (defined in config). For dynamic safety checks, use hooks:
- preToolUse hook: Scans write/edit content for secrets (API keys, passwords)
- postToolUse hook: Auto-formats code after write operations
- userPromptSubmit hook: Warns if the user asks Kiro to delete production data
Example Security Hook (preToolUse):
#!/bin/bash
# .kiro/hooks/preToolUse.sh
TOOL_NAME=$1
ARGS=$2
if [[ "$TOOL_NAME" == "write" || "$TOOL_NAME" == "edit" ]]; then
# Scan for common secret patterns
if echo "$ARGS" | grep -E '(API_KEY|PASSWORD|SECRET)'; then
echo "🚨 ERROR: Detected potential secret in file content"
exit 2 # Block the tool execution
fi
fi
exit 0 # Allow execution
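The userPromptSubmit hook mentioned above follows the same spirit; a minimal sketch, assuming the submitted prompt text arrives as the first argument (check the hook documentation for the exact interface):
#!/bin/bash
# .kiro/hooks/userPromptSubmit.sh - sketch only.
# Assumption: the submitted prompt text arrives as the first argument,
# mirroring the $1/$2 convention of the preToolUse example above.
PROMPT_TEXT=$1

# Warn (without blocking) when a prompt looks like it targets production data.
if echo "$PROMPT_TEXT" | grep -Eiq '(drop table|truncate|delete .*production)'; then
  echo "⚠️  WARNING: this prompt appears to touch production data - review the planned tool calls carefully"
fi

exit 0  # Warnings only; hard blocking stays in the preToolUse hook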
d. Audit and Compliance
For regulated industries (finance, healthcare), tool execution logs are critical. Kiro supports:
- Session transcripts: Full record of every tool call with arguments
- Hook logging: Custom hooks can write to SIEM systems
- MCP server logs: External tools (databases, APIs) log independently
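For the hook-logging path, a minimal postToolUse audit sketch (the log location and line format are assumptions; swap the printf for your SIEM forwarder):
#!/bin/bash
# .kiro/hooks/postToolUse.sh - append every tool call to a local audit log.
# Sketch only: assumes the same "tool name, tool args" positional arguments
# as the preToolUse example above.
TOOL_NAME=$1
TOOL_ARGS=$2
AUDIT_LOG=".kiro/logs/audit.log"

mkdir -p "$(dirname "$AUDIT_LOG")"
# ISO-8601 timestamp, user, tool name, and a truncated copy of the arguments
printf '%s user=%s tool=%s args=%.200s\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$USER" "$TOOL_NAME" "$TOOL_ARGS" >> "$AUDIT_LOG"

exit 0  # Never block execution; this hook only records what happened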
Real-World Example: Fintech Security Setup
A financial services company configured Kiro with:
{
"allowedTools": [
"@builtin/read",
"@builtin/grep",
"@postgres/query" // Read-only SQL
],
"deniedTools": [
"@builtin/write",
"@builtin/edit",
"@postgres/execute",
"@postgres/delete"
],
"hooks": {
"preToolUse": ".kiro/hooks/secret-scanner.sh",
"postToolUse": ".kiro/hooks/audit-logger.sh"
}
}
Result: 35% reduction in security vulnerabilities from accidental secret commits (AWS case study, 2024).
How This Fits on Projects
You’ll apply tool permissions in Project 18 (Security Firewall Hook) and Project 24 (Secret Sanitizer Hook). Both projects build guardrails using the permission system + hooks.
Definitions & Key Terms
- allowedTools: Whitelist of tools that execute without approval
- deniedTools: Blacklist that overrides allowedTools
- @builtin: Namespace for Kiro’s built-in tools (read, write, shell, etc.)
- @server: Namespace for MCP server tools (e.g., @github, @postgres)
- toolsSettings: Fine-grained configuration for paths, commands, and approval behavior
- preToolUse hook: Script that runs before a tool executes (can block execution)
- trust-all mode: Dangerous mode that disables all approvals (production anti-pattern)
Mental Model Diagram
┌──────────────────────────────────────────────────────────┐
│ KIRO TOOL PERMISSION SYSTEM (3 Layers) │
└──────────────────────────────────────────────────────────┘
User Request: "Read auth.ts, check for SQL injection, and fix it"
│
│
▼
┌────────────────────────────────────────────────────────┐
│ LAYER 1: STATIC PERMISSION CHECK (allowedTools) │
├────────────────────────────────────────────────────────┤
│ Tool: read(auth.ts) │
│ Allowed: ✅ (@builtin/read in allowedTools) │
│ Action: Execute without prompt │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ LAYER 2: DYNAMIC HOOK CHECK (preToolUse) │
├────────────────────────────────────────────────────────┤
│ Hook: .kiro/hooks/preToolUse.sh │
│ Logic: Scan for secrets, check file size │
│ Result: ✅ Exit 0 (allow) │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ LAYER 3: USER APPROVAL (if needed) │
├────────────────────────────────────────────────────────┤
│ Tool: edit(auth.ts) - NOT in allowedTools │
│ Kiro: "I want to edit auth.ts. Approve? [Y/n/diff]" │
│ User: Y │
│ Action: Execute + log approval │
└────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ LAYER 4: POST-EXECUTION HOOK (postToolUse) │
├────────────────────────────────────────────────────────┤
│ Hook: .kiro/hooks/postToolUse.sh │
│ Logic: Run prettier, log to audit.log │
│ Result: ✅ Formatted code + audit trail │
└────────────────────────────────────────────────────────┘
PERMISSION HIERARCHY (Most to Least Restrictive):
1. deniedTools → Always block (highest priority)
2. allowedTools → Pre-approve (no prompt)
3. preToolUse hook → Dynamic validation (can block)
4. User approval → Manual confirmation (fallback)
5. trust-all mode → Disable all checks (DANGER)
How It Works
Step-by-step execution flow (a pseudocode sketch of this decision order follows the list):
1. Tool Request: Kiro decides to use a tool (e.g., write(newFile.ts))
2. Denied Check: Is the tool in deniedTools? If yes → block + show error
3. Allowed Check: Is the tool in allowedTools (or does it match a wildcard)? If yes → proceed to the hook check
4. Hook Execution: Run the preToolUse hook (if configured). Exit code 2 → block; exit code 0 → allow
5. User Approval: If the tool was not pre-approved and the hook passed, prompt the user: “Approve write(newFile.ts)? [Y/n/diff]”
6. Execute Tool: Perform the operation (write file, run command, call MCP)
7. Post-Hook: Run the postToolUse hook (auto-format, log, notify)
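The sketch below restates that order as runnable shell pseudocode. It is an illustration of the documented flow only, not Kiro's implementation; the hook paths are the ones used earlier in this chapter, and the allow/deny arrays stand in for the agent config.
#!/bin/bash
# Illustration of the documented decision order - NOT Kiro's source code.
# Order: deniedTools -> allowedTools -> preToolUse hook -> user approval -> execute -> postToolUse.
DENIED_TOOLS=("shell")                # stand-ins for the agent config
ALLOWED_TOOLS=("read" "grep" "glob")

run_tool() {
  local tool="$1" args="$2" pre_approved=false t reply

  # 1-2. deniedTools always wins; allowedTools pre-approves (no prompt)
  for t in "${DENIED_TOOLS[@]}"; do
    [[ "$t" == "$tool" ]] && { echo "blocked: $tool is in deniedTools"; return 1; }
  done
  for t in "${ALLOWED_TOOLS[@]}"; do
    [[ "$t" == "$tool" ]] && pre_approved=true
  done

  # 3. preToolUse hook: exit 2 blocks, exit 0 allows, exit 1 is only a warning
  if [[ -x .kiro/hooks/preToolUse.sh ]]; then
    .kiro/hooks/preToolUse.sh "$tool" "$args"
    [[ $? -eq 2 ]] && { echo "blocked by preToolUse hook"; return 1; }
  fi

  # 4. Manual approval is the fallback for tools that were not pre-approved
  if [[ "$pre_approved" == false ]]; then
    read -r -p "Approve ${tool}(${args})? [Y/n] " reply
    [[ "$reply" == [Nn]* ]] && { echo "blocked: user declined"; return 1; }
  fi

  # 5-6. Execute, then run the post-execution hook (format, log, notify)
  echo ">> executing ${tool}(${args})"
  [[ -x .kiro/hooks/postToolUse.sh ]] && .kiro/hooks/postToolUse.sh "$tool" "$args"
}

run_tool read "auth.ts"       # pre-approved: runs without a prompt
run_tool write "newFile.ts"   # not pre-approved: falls back to manual approval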
Invariants:
- deniedTools always wins over allowedTools
- Hooks can block allowed tools (exit code 2)
- User approval can override denied tools (manual “/allow” command)
Failure Modes:
- Over-permissive config: ["@builtin"] + no hooks → accidental file deletions
- Under-permissive config: Empty allowedTools → constant approval prompts (workflow friction)
- Hook bugs: Infinite loop in preToolUse → Kiro hangs
Minimal Concrete Example
Scenario: Create a read-only agent that can analyze code but never modify it.
1. Agent config (.kiro/agents/readonly-analyzer.json):
{
"name": "ReadOnly Analyzer",
"prompt": "You analyze code for bugs but never modify files.",
"allowedTools": [
"@builtin/read",
"@builtin/grep",
"@builtin/glob"
],
"deniedTools": [
"@builtin/write",
"@builtin/edit",
"@builtin/shell"
]
}
2. Usage:
$ kiro chat --agent readonly-analyzer
You: "Find all SQL injection vulnerabilities in the codebase"
Kiro: [reads files with grep, analyzes patterns]
"Found 3 potential SQL injection points:
1. auth.ts:42 - User input concatenated in query
2. api.ts:128 - Unescaped WHERE clause
3. db.ts:56 - Dynamic table name"
You: "Fix them"
Kiro: "I cannot modify files (write/edit tools are denied).
Would you like me to explain the fix instead?"
Result: The agent is read-only by configuration: Kiro never offers to write or edit, and deniedTools can only be bypassed by an explicit, logged /allow override from the user.
Common Misconceptions
- “allowedTools: [‘*’] allows all tools” ❌ Wrong. Kiro does not support the * wildcard for all tools. You must use specific patterns like @builtin, r*, or server namespaces.
- “Hooks replace allowedTools” ❌ Wrong. Hooks are complementary: static permissions (allowedTools) are the first line of defense, and hooks add dynamic validation.
- “trust-all mode is safe for prototyping” ⚠️ Dangerous. Even in dev environments, trust-all can delete .git directories, overwrite production configs, or leak secrets. Use a narrow allowedTools instead.
- “deniedTools is redundant if allowedTools is restrictive” ❌ Wrong. deniedTools is an explicit denial: it prevents accidental additions to allowedTools from bypassing safety.
- “MCP tools bypass Kiro permissions” ❌ Wrong. MCP tools (e.g., @github/delete_repo) follow the same allowedTools rules. You must explicitly allow @github/* or the specific tool.
Check-Your-Understanding Questions
1. You configure "allowedTools": ["@builtin"] and "deniedTools": ["@builtin/shell"]. Can Kiro execute shell("npm install")?
2. A preToolUse hook exits with code 1 (error). Does Kiro block the tool or prompt the user?
3. What’s the difference between "allowedTools": ["@github"] and "allowedTools": ["@github/*"]?
4. Can a user manually approve a tool that’s in deniedTools?
5. If allowedTools is empty, does Kiro refuse all tool requests or prompt for every tool?
Check-Your-Understanding Answers
1. No. deniedTools has higher priority than allowedTools. Even though @builtin allows all built-in tools, @builtin/shell is explicitly denied.
2. Prompt the user. Only exit code 2 blocks execution. Exit code 1 is treated as a warning, and Kiro proceeds to user approval.
3. No difference in Kiro. Both forms allow all tools from the @github server; the /* suffix is optional for server-level allowlisting.
4. Yes, with the /allow command. Users can override denied tools manually, but the override is logged and requires explicit confirmation.
5. Prompts for every tool. An empty allowedTools means “require approval for everything.” It doesn’t block tools; it just forces manual confirmation.
Real-World Applications
- Regulated Industries (Finance, Healthcare)
  - Use allowedTools: ["@builtin/read", "@builtin/grep"] for compliance analysts
  - Add preToolUse hooks to scan for PII/PHI in file operations
  - Deploy deniedTools for production write operations
- CI/CD Pipelines
  - Headless Kiro agents need allowedTools: ["@builtin"] to run autonomously
  - Use hooks to enforce test-before-merge policies
  - Block dangerous @builtin/shell commands (rm -rf, dd)
- Open-Source Projects
  - Contributors can use restrictive allowedTools to prevent accidental damage
  - Maintainers use hooks to auto-format code and run linters
- Educational Environments
  - Students use read-only agents to learn codebases without risk
  - Instructors use hooks to enforce style guides and prevent plagiarism
Where You’ll Apply It
- Project 5 (Steering Enforcer): Uses allowedTools to create constrained agents
- Project 18 (Security Firewall Hook): Builds preToolUse hooks for secret scanning
- Project 24 (Secret Sanitizer Hook): Implements dynamic content validation
- Project 34 (Cloud Native Deployer): Uses deniedTools to prevent accidental production deploys
References
- Managing Tool Permissions - Kiro Docs
- Security Considerations - Kiro Docs
- Agent Configuration Reference - Kiro Docs
- Built-in Tools - Kiro Docs
Key Insights
The Golden Rule of Tool Permissions: Deny by default, allow explicitly. If you wouldn’t let a junior developer run a command unsupervised, don’t add it to allowedTools.
Summary
Kiro’s tool permission system is a three-layer security boundary: static allowedTools (config-based whitelist), dynamic hooks (runtime validation), and user approval (human-in-the-loop). The system prevents catastrophic mistakes (accidental deletions, secret leaks) while maintaining workflow velocity through pre-authorized tool patterns. Critical concepts include the @builtin namespace (20+ built-in tools), MCP server namespaces (@github, @postgres), wildcards (r*, w*), and the trust-all anti-pattern (never use it in production). Advanced usage involves deniedTools (explicit blocklist), path restrictions (allowedPaths), and audit logging for compliance.
Homework/Exercises to Practice the Concept
Exercise 1: Design a Read-Only Agent Create an agent configuration that can analyze TypeScript code but cannot modify files or execute shell commands. The agent should be able to:
- Read all .ts files in src/
- Search for patterns with grep
- List files with glob
- Cannot: Write, edit, or run shell commands
Exercise 2: Build a Secret Scanner Hook Write a preToolUse hook (in bash or Python) that:
- Intercepts write/edit tool calls
- Scans the file content for common secret patterns: API_KEY=..., password=..., -----BEGIN PRIVATE KEY-----
- Blocks execution (exit 2) if secrets are detected
- Logs the blocked attempt to a file
Exercise 3: MCP Permission Scoping Given this MCP server config:
{
"mcpServers": {
"github": { "command": "npx", "args": ["@modelcontextprotocol/server-github"] },
"postgres": { "command": "npx", "args": ["@modelcontextprotocol/server-postgres"] }
}
}
Design an agent that can:
- List GitHub issues and PRs
- Create comments on issues
- Cannot: Delete repositories, close issues, or execute SQL queries
Solutions to the Homework/Exercises
Solution 1: Read-Only Agent
{
"name": "TypeScript Analyzer",
"prompt": "Analyze TypeScript code for bugs, patterns, and best practices. You cannot modify files.",
"allowedTools": [
"@builtin/read",
"@builtin/grep",
"@builtin/glob"
],
"deniedTools": [
"@builtin/write",
"@builtin/edit",
"@builtin/shell",
"@builtin/mcp_call"
],
"toolsSettings": {
"allowedPaths": ["src/**/*.ts", "tests/**/*.ts"]
}
}
Why this works:
- allowedTools includes only read-oriented tools
- deniedTools explicitly blocks all modification tools (defense-in-depth)
- allowedPaths restricts reads to TypeScript files only (prevents reading .env, etc.)
Solution 2: Secret Scanner Hook
#!/bin/bash
# .kiro/hooks/preToolUse.sh
TOOL_NAME=$1
TOOL_ARGS=$2
# Only scan write/edit operations
if [[ "$TOOL_NAME" != "write" && "$TOOL_NAME" != "edit" ]]; then
exit 0 # Allow other tools
fi
# Extract file content from tool arguments (simplified)
FILE_CONTENT=$(echo "$TOOL_ARGS" | jq -r '.content' 2>/dev/null)
# Secret patterns (regex)
SECRET_PATTERNS=(
"API_KEY\s*=\s*['\"][A-Za-z0-9_-]+"
"password\s*=\s*['\"][^'\"]+['\"]"
"-----BEGIN (RSA |EC )?PRIVATE KEY-----"
"ghp_[A-Za-z0-9]{36}" # GitHub Personal Access Token
)
# Scan for secrets
for PATTERN in "${SECRET_PATTERNS[@]}"; do
if echo "$FILE_CONTENT" | grep -E -q "$PATTERN"; then
echo "🚨 BLOCKED: Detected secret pattern '$PATTERN' in file content"
echo "$(date): BLOCKED write/edit - secret detected" >> .kiro/logs/security.log
exit 2 # Block the tool
fi
done
echo "✅ Secret scan passed"
exit 0 # Allow tool execution
Why this works:
- Intercepts write/edit before execution
- Uses regex to detect common secret formats
- Exit code 2 blocks the tool (Kiro’s convention)
- Logs blocked attempts for audit
Solution 3: MCP Permission Scoping
{
"name": "GitHub Issue Manager",
"prompt": "Manage GitHub issues and comments. You cannot delete repos or execute SQL.",
"allowedTools": [
"@builtin/read",
"@github/list_issues",
"@github/list_pull_requests",
"@github/create_comment"
],
"deniedTools": [
"@github/delete_repository",
"@github/close_issue",
"@postgres"
]
}
Why this works:
- allowedTools lists specific GitHub tools (not @github/*)
- deniedTools blocks dangerous GitHub operations
- Denying @postgres prevents all database operations (server-level block)
Chapter 9: Remote and Enterprise Workflows - Development Without Boundaries
Fundamentals
Traditional development workflows assume one machine: your laptop. But modern engineering happens across SSH tunnels to production servers, inside Docker containers, through corporate proxies, and across distributed teams. Kiro CLI is designed for headless, remote-first operation—you can run it on a Linux server via SSH, inside a CI/CD pipeline, or through a VPN tunnel to a corporate network.
The core capability is headless mode: Kiro operates without a browser UI, accepting commands from stdin and outputting results to stdout. This enables:
- SSH sessions: Debug production issues on remote servers without losing Kiro context
- CI/CD integration: Run Kiro agents in GitHub Actions, GitLab CI, or Jenkins
- Corporate environments: Operate through proxies, firewalls, and air-gapped networks
- Containerized workflows: Run Kiro inside Docker with MCP servers as sidecars
Why this matters: Local-only AI tools break down in enterprise environments. Kiro’s remote-first architecture means you never lose cognitive assistance—whether you’re SSH’d into a production database, debugging a Kubernetes pod, or working from a locked-down corporate network.
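As a first taste, here is a minimal headless invocation; this sketch reuses the --headless and --prompt flags shown in the CI example later in this chapter (the prompt text and output file are illustrative):
# Minimal headless run: no TTY, no browser - just a prompt in and text out.
# The --headless and --prompt flags match the CI example later in this chapter.
kiro chat --headless \
  --prompt "Summarize the errors in /var/log/nginx/error.log from the last hour" \
  > kiro-report.txt 2>&1

cat kiro-report.txt   # findings are plain text, ready to pipe into other tooling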
Deep Dive
Headless and SSH Workflows
1. SSH Remote Development
Kiro CLI can run on any Linux server with glibc 2.34+ (or use the musl build for older systems). The typical workflow:
# On your local machine
$ ssh user@production-server
# On the remote server
$ kiro chat
You: "Analyze the nginx error logs from the last hour"
Kiro: [reads /var/log/nginx/error.log, analyzes patterns]
"Found 42 errors:
- 38 upstream timeouts (backend server slow)
- 3 SSL handshake failures (cert expiring?)
- 1 disk full warning (/var/log at 95%)"
You: "Show me the slowest API endpoint from access logs"
Kiro: [parses access.log, aggregates by endpoint]
"Slowest endpoints (average response time):
1. /api/reports/generate - 8.2s
2. /api/users/search - 3.1s
3. /api/dashboard - 1.9s"
Why this works: Kiro’s context window (200K tokens) persists across the SSH session. You can spend hours debugging without re-explaining the problem.
2. Authentication in Headless Mode
Kiro requires authentication, but headless servers don’t have browsers for OAuth flows. The solution: SSH port forwarding.
Setup (one-time):
# On remote server, edit SSH config
$ sudo nano /etc/ssh/sshd_config
# Add these lines:
AcceptEnv KIRO_AUTH_TOKEN
AllowStreamLocalForwarding yes
# Restart SSH
$ sudo systemctl restart sshd
Login workflow:
# On your local machine, authenticate once
$ kiro auth login
# Browser opens, you sign in, token stored locally
# SSH to remote with port forwarding
$ ssh -R 52698:localhost:52698 user@server
# On remote, Kiro uses the forwarded auth
$ kiro chat
# Works without re-authentication!
3. CI/CD Integration (GitHub Actions Example)
Kiro can run as a bot in your CI pipeline to auto-fix lint errors, generate tests, or refactor code.
Example: Auto-fix failing tests
# .github/workflows/kiro-fix-tests.yml
name: Kiro Test Fixer
on:
push:
branches: [main, develop]
jobs:
fix-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install Kiro CLI
run: |
curl -fsSL https://kiro.dev/install.sh | bash
echo "$HOME/.kiro/bin" >> $GITHUB_PATH
- name: Authenticate Kiro
env:
KIRO_API_KEY: ${{ secrets.KIRO_API_KEY }}
run: kiro auth login --api-key $KIRO_API_KEY
- name: Run tests and fix failures
run: |
npm test || true # Don't fail if tests fail
kiro chat --headless --prompt "Run npm test. If tests fail, analyze the errors and fix them. Commit the fixes with message 'fix: auto-fix failing tests [kiro]'"
- name: Push fixes
run: |
git config user.name "Kiro Bot"
git config user.email "kiro@company.com"
git push origin HEAD
Result: Tests fail → Kiro analyzes errors → Fixes code → Commits → Pushes. Fully automated.
4. Corporate Proxy Navigation
Many enterprises block direct internet access, forcing all traffic through HTTP proxies. Kiro supports standard proxy environment variables:
Setup:
# In .bashrc or .zshrc
export HTTP_PROXY=http://proxy.company.com:8080
export HTTPS_PROXY=https://proxy.company.com:8080
export NO_PROXY=localhost,127.0.0.1,.company.local
# For authenticated proxies
export HTTP_PROXY=http://username:password@proxy.company.com:8080
MCP servers through proxies:
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["@modelcontextprotocol/server-github"],
"env": {
"HTTP_PROXY": "http://proxy.company.com:8080",
"HTTPS_PROXY": "https://proxy.company.com:8080"
}
}
}
}
5. Air-Gapped Networks (Offline Mode)
In secure environments (defense, finance), servers have no internet access. Kiro supports offline operation with pre-downloaded models:
Setup:
# On an internet-connected machine, download models
$ kiro models download claude-sonnet-4.5
$ kiro models download claude-haiku-4.5
# Copy .kiro/models/ to the air-gapped server
$ scp -r ~/.kiro/models user@secure-server:~/.kiro/
# On the air-gapped server
$ kiro chat --offline
Limitations: MCP servers that require internet (GitHub, AWS) won’t work. Use file-based MCP servers (SQLite, local Postgres).
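For example, an air-gapped host can still run a local-only MCP configuration; a hedged sketch that reuses the Postgres server package from earlier in this chapter (connection string illustrative, and the npm package must be pre-installed since npx cannot download it offline):
# Local-only MCP config for an air-gapped host: every endpoint resolves to
# localhost, so no outbound internet access is needed.
mkdir -p ~/.kiro/settings
cat > ~/.kiro/settings/mcp.json <<'EOF'
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "@modelcontextprotocol/server-postgres",
        "postgresql://readonly:readonly@localhost:5432/analytics"
      ]
    }
  }
}
EOF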
6. Distributed Teams (Shared Context)
Teams can share Kiro configurations and knowledge bases via Git.
Example: Team-wide agent library
# Create a shared config repo
$ git init kiro-team-config
$ cd kiro-team-config
# Add team agents
$ mkdir -p agents
$ cat > agents/security-scanner.json <<EOF
{
"name": "Security Scanner",
"prompt": "Scan code for vulnerabilities (SQL injection, XSS, etc.). Never modify files.",
"allowedTools": ["@builtin/read", "@builtin/grep"],
"deniedTools": ["@builtin/write", "@builtin/edit", "@builtin/shell"]
}
EOF
# Push to team repo
$ git add . && git commit -m "Add security scanner agent"
$ git push origin main
# Team members clone and link
$ git clone https://github.com/company/kiro-team-config
$ ln -s $(pwd)/kiro-team-config ~/.kiro/team
Usage: All team members can now run kiro chat --agent team/security-scanner.
Real-World Example: SRE Team at Fintech Corp
An SRE team manages 200+ microservices across 50 Kubernetes clusters. They configured Kiro with:
- SSH access to jump boxes (bastion hosts) with port forwarding
- MCP servers for Kubernetes (kubectl proxy as MCP)
- Shared agent library (incident-responder, log-analyzer, metric-detective)
- Audit hooks (all shell commands logged to Splunk)
Result: Incident response time reduced from 45 minutes to 12 minutes. Kiro analyzes logs, identifies root causes, and suggests fixes—all without leaving the SSH session.
How This Fits on Projects
You’ll apply remote workflows in:
- Project 21 (Headless Server Setup): Configure Kiro for SSH and CI/CD
- Project 22 (SSH Tunnel Agent): Build agents that work through SSH tunnels
- Project 23 (Corporate Proxy Navigator): Handle proxy auth and certificate trust
Definitions & Key Terms
- Headless mode: Running Kiro without a browser UI (stdin/stdout interaction)
- SSH port forwarding: Tunneling Kiro’s auth server through SSH (-R flag)
- glibc: GNU C Library required for Kiro (2.34+ on host)
- musl build: Alternative Kiro binary for older Linux distributions
- CI/CD integration: Running Kiro agents in automated pipelines (GitHub Actions, GitLab CI)
- Air-gapped network: Isolated environment with no internet access
- Jump box (bastion): Hardened server used as SSH gateway to production
Mental Model Diagram
┌──────────────────────────────────────────────────────────────┐
│ KIRO REMOTE WORKFLOW ARCHITECTURE │
└──────────────────────────────────────────────────────────────┘
LOCAL MACHINE (Developer Laptop)
┌────────────────────────────────────────┐
│ $ ssh -R 52698:localhost:52698 user@ │
│ production-db-01 │
│ │
│ [Kiro Auth Token] ────────┐ │
│ ~/.kiro/auth.json │ │
└─────────────────────────────┼──────────┘
│ SSH Tunnel
│ (Port 52698)
▼
REMOTE SERVER (Production Database)
┌────────────────────────────────────────┐
│ $ kiro chat │
│ │
│ Kiro CLI ◄──────────────────┐ │
│ │ │ │
│ │ Uses tunneled auth │ │
│ ▼ │ │
│ ┌──────────────────────┐ │ │
│ │ MCP Server: Postgres│ │ │
│ │ (localhost:5432) │────┘ │
│ └──────────────────────┘ │
│ │
│ User: "Show me slow queries from the │
│ last hour" │
│ │
│ Kiro: [queries pg_stat_statements] │
│ "Top 3 slow queries: │
│ 1. SELECT * FROM orders WHERE ... 8s │
│ 2. UPDATE users SET ... 5.2s │
│ 3. DELETE FROM logs WHERE ... 3.1s" │
└────────────────────────────────────────┘
CI/CD PIPELINE (GitHub Actions)
┌────────────────────────────────────────┐
│ workflow: kiro-test-fixer │
│ ┌──────────────────────────────────┐ │
│ │ 1. Checkout code │ │
│ │ 2. Install Kiro CLI │ │
│ │ 3. Auth with API key │ │
│ │ 4. Run tests │ │
│ │ 5. If fail: kiro fix-tests │ │
│ │ 6. Commit + push fixes │ │
│ └──────────────────────────────────┘ │
│ │
│ Environment: Ubuntu 22.04 │
│ Context: Headless (no browser) │
│ Auth: KIRO_API_KEY secret │
└────────────────────────────────────────┘
CORPORATE PROXY SETUP
┌────────────────────────────────────────┐
│ Developer Workstation (Behind Proxy) │
│ ┌──────────────────────────────────┐ │
│ │ export HTTP_PROXY= │ │
│ │ http://proxy.corp.com:8080 │ │
│ │ │ │
│ │ Kiro CLI ──► Proxy ──► Internet │ │
│ │ ▲ │ │
│ │ │ Authenticated │ │
│ │ │ (user:pass) │ │
│ └──────────────┴───────────────────┘ │
│ │
│ MCP Servers inherit proxy config │
└────────────────────────────────────────┘
KEY INSIGHT: Kiro's architecture is "context-first, location-agnostic"
You can SSH into a server and continue a conversation from
your laptop—context follows you, not the machine.
How It Works
Step-by-step SSH remote workflow:
1. Local authentication: Run kiro auth login on your laptop (one-time setup)
2. SSH with port forwarding: ssh -R 52698:localhost:52698 user@server tunnels Kiro’s auth server (port 52698) from local → remote
3. Remote Kiro startup: On the server, run kiro chat; Kiro detects the tunneled auth port and uses your local credentials
4. Context persistence: Your conversation state is stored in ~/.kiro/sessions/ on the remote server
5. MCP servers: Remote Kiro can access local services (Postgres, Redis) via SSH tunnel or local sockets
Invariants:
- Auth token must be valid (expires after 7 days; refresh with kiro auth refresh)
- SSH tunnel must stay open (if the connection drops, re-establish the tunnel)
- Remote server must allow AllowStreamLocalForwarding (sshd_config)
Failure Modes:
- Port 52698 already in use: Another Kiro session is running locally
- SSH timeout: Corporate firewalls may kill idle SSH sessions (use ServerAliveInterval=60)
- glibc version mismatch: Server has glibc < 2.34 (solution: use musl build)
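A hedged ~/.ssh/config sketch that bakes the tunnel and keep-alive settings into one host entry, mitigating the timeout and tunnel failure modes above (host name and user are placeholders):
# ~/.ssh/config - placeholder host names; adjust the forwarded port if 52698 is taken locally.
cat >> ~/.ssh/config <<'EOF'
Host api-server-prod
    HostName api-server-prod.example.com
    User deploy
    # Tunnel Kiro's local auth port to the remote host (same as ssh -R 52698:localhost:52698)
    RemoteForward 52698 localhost:52698
    # Keep idle sessions alive through aggressive corporate firewalls
    ServerAliveInterval 60
    ServerAliveCountMax 3
EOF

ssh api-server-prod   # the -R flag is no longer needed on every invocation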
Minimal Concrete Example
Scenario: Debug a production API server via SSH.
1. On your laptop, set up port forwarding:
$ ssh -R 52698:localhost:52698 user@api-server-prod
2. On the remote server:
$ kiro chat
You: "Analyze the API access logs for 500 errors in the last hour"
Kiro: [reads /var/log/api/access.log]
"Found 12 × 500 errors:
- 8 from POST /api/payments (timeout connecting to payment gateway)
- 3 from GET /api/users/:id (database connection pool exhausted)
- 1 from DELETE /api/sessions (Redis connection refused)"
You: "Check if the payment gateway is reachable"
Kiro: [runs curl to payment gateway]
"Payment gateway is UP but slow:
- Ping: 250ms (normal: 50ms)
- /health endpoint: 5s response (normal: 200ms)
Likely cause: Network congestion or gateway under load."
You: "Increase the payment timeout from 5s to 10s in the config"
Kiro: [edits /etc/api/config.yaml, restarts service]
"Updated payment_timeout: 10s
Restarted api.service
Monitoring for new 500 errors..."
Result: You debugged and fixed a production issue without leaving the SSH session or losing context.
Common Misconceptions
- “Kiro requires a desktop environment” ❌ Wrong. Kiro CLI is fully headless-compatible. No GUI, no browser required for operation.
- “SSH port forwarding is insecure” ❌ Wrong. SSH tunneling is encrypted end-to-end. Port 52698 is only accessible on localhost.
- “CI/CD bots need human approval for every tool” ❌ Wrong. In headless mode with allowedTools: ["@builtin"], Kiro runs autonomously (no prompts).
- “Corporate proxies break MCP servers” ⚠️ Partially true. HTTP/HTTPS proxies work fine. SOCKS proxies and authenticated proxies require manual configuration.
- “Offline mode disables all features” ❌ Wrong. Offline mode only disables internet-dependent MCP servers. All built-in tools (read, write, shell) work normally.
Check-Your-Understanding Questions
1. You SSH into a server with ssh user@server. Kiro auth fails with “No authentication found.” What’s missing?
2. In GitHub Actions, you set KIRO_API_KEY as a secret. How does Kiro authenticate in the CI pipeline?
3. Can you run Kiro in a Docker container without internet access? What limitations exist?
4. A corporate firewall blocks port 52698. Can you still use Kiro remotely?
5. What’s the difference between kiro chat --headless and kiro chat in an SSH session?
Check-Your-Understanding Answers
1. Missing SSH port forwarding. You need ssh -R 52698:localhost:52698 user@server to tunnel authentication from local → remote.
2. Via environment variable. In the workflow, run kiro auth login --api-key $KIRO_API_KEY. Kiro reads the API key from the env var.
3. Yes, with limitations. Offline mode works in Docker. You cannot use internet-dependent MCP servers (GitHub, AWS), but local servers (Postgres, SQLite) work fine.
4. Yes, with a different port. Use ssh -R 12345:localhost:52698 user@server and set KIRO_AUTH_PORT=12345 on the remote.
5. No difference in behavior. --headless is implicit when stdin/stdout are not a TTY (e.g., in SSH). The flag is mainly for CI/CD clarity.
Real-World Applications
- SRE Incident Response
- SSH into production, analyze logs with Kiro, identify root cause
- Kiro suggests fixes, you review and apply
- Result: Faster MTTR (Mean Time To Resolution)
- Automated Code Review (CI/CD)
- GitHub Actions runs Kiro on every PR
- Kiro checks for security issues, style violations, missing tests
- Posts review comments on the PR
- Database Migration Validation
- Connect to staging database via SSH tunnel
- Kiro analyzes schema changes, generates migration scripts
- Validates data integrity before production deploy
- Kubernetes Debugging
- kubectl exec into a pod, run Kiro
- Kiro reads logs, analyzes traces, suggests config changes
- No need to download logs locally
Where You’ll Apply It
- Project 21 (Headless Server Setup): Configure Kiro for SSH, CI/CD, and Docker
- Project 22 (SSH Tunnel Agent): Build agents that debug remote systems
- Project 23 (Corporate Proxy Navigator): Handle proxy authentication and certificates
- Project 27 (SSH Remote Development Agent): Full remote development workflow
References
- SSH Remote Development - DEV Community Guide
- Terminal Integration - Kiro Docs
- Kiro CLI Get Started Guide
Key Insights
The Remote-First Principle: Kiro is designed for “development without boundaries.” Whether you’re SSH’d into a server, inside a Docker container, or behind a corporate proxy, Kiro maintains full context and capability.
Summary
Kiro CLI supports headless, remote-first workflows via SSH port forwarding (authentication tunneling), CI/CD integration (GitHub Actions, GitLab CI), corporate proxy navigation (HTTP_PROXY environment variables), and air-gapped operation (offline mode with pre-downloaded models). The architecture is location-agnostic: context persists across SSH sessions, and MCP servers can run locally or remotely. Critical patterns include SSH tunnel setup (ssh -R 52698:localhost:52698), headless authentication (API keys or tunneled auth), and distributed team workflows (shared agent libraries via Git). Enterprise features include audit logging, proxy authentication, and compliance-friendly deniedTools configurations.
Homework/Exercises to Practice the Concept
Exercise 1: SSH Remote Debugging Setup Set up Kiro on a remote Linux server (use a VM or cloud instance like EC2/DigitalOcean). Configure:
- SSH port forwarding for authentication
- A custom agent that can read logs but not modify files
- An MCP server that connects to a remote PostgreSQL database
Test by SSH’ing into the server and running kiro chat to analyze database query performance.
Exercise 2: CI/CD Test Auto-Fixer Create a GitHub Actions workflow that:
- Runs your project’s test suite
- If tests fail, triggers Kiro to analyze the failures
- Kiro attempts to fix the tests automatically
- Commits and pushes the fixes to a new branch
- Opens a PR with the fixes
Exercise 3: Corporate Proxy Configuration Simulate a corporate proxy environment (use Squid proxy or similar). Configure:
- Environment variables for HTTP_PROXY and HTTPS_PROXY
- An MCP server (like @modelcontextprotocol/server-github) that works through the proxy
- A custom CA certificate for SSL inspection (common in enterprises)
Verify that Kiro can authenticate and use MCP servers through the proxy.
Solutions to the Homework/Exercises
Solution 1: SSH Remote Debugging Setup
On remote server (install Kiro):
$ curl -fsSL https://kiro.dev/install.sh | bash
$ echo 'export PATH="$HOME/.kiro/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc
On remote server (configure SSH):
$ sudo nano /etc/ssh/sshd_config
# Add: AcceptEnv KIRO_AUTH_TOKEN
# Add: AllowStreamLocalForwarding yes
$ sudo systemctl restart sshd
Create read-only log analyzer agent:
$ mkdir -p ~/.kiro/agents
$ cat > ~/.kiro/agents/log-analyzer.json <<EOF
{
"name": "Log Analyzer",
"prompt": "Analyze system logs for errors, warnings, and patterns. You cannot modify files.",
"allowedTools": [
"@builtin/read",
"@builtin/grep",
"@builtin/shell"
],
"deniedTools": [
"@builtin/write",
"@builtin/edit"
],
"toolsSettings": {
"allowedPaths": ["/var/log/**", "/tmp/**"],
"trustedCommands": ["grep", "tail", "head", "wc"]
}
}
EOF
Configure Postgres MCP server:
$ cat > ~/.kiro/settings/mcp.json <<EOF
{
"mcpServers": {
"postgres": {
"command": "npx",
"args": [
"@modelcontextprotocol/server-postgres",
"postgresql://user:password@localhost:5432/production_db"
]
}
}
}
EOF
On local machine (SSH with port forwarding):
$ ssh -R 52698:localhost:52698 user@remote-server
# Now on remote server
$ kiro chat --agent log-analyzer
You: "Show me all 500 errors from nginx in the last hour, then check if the database query time increased"
Kiro: [reads /var/log/nginx/error.log, uses @postgres/query to check pg_stat_statements]
Solution 2: CI/CD Test Auto-Fixer
# .github/workflows/kiro-test-fixer.yml
name: Kiro Test Auto-Fixer
on:
push:
branches: [main, develop, feature/*]
pull_request:
jobs:
fix-tests:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0 # Full history for better Kiro context
- name: Set up Node.js
uses: actions/setup-node@v3
with:
node-version: '18'
- name: Install dependencies
run: npm ci
- name: Install Kiro CLI
run: |
curl -fsSL https://kiro.dev/install.sh | bash
echo "$HOME/.kiro/bin" >> $GITHUB_PATH
- name: Authenticate Kiro
env:
KIRO_API_KEY: ${{ secrets.KIRO_API_KEY }}
run: kiro auth login --api-key $KIRO_API_KEY
- name: Run tests
id: test
continue-on-error: true
run: npm test > test-output.txt 2>&1
- name: Fix tests with Kiro
if: steps.test.outcome == 'failure'
run: |
kiro chat --headless --prompt "
I ran 'npm test' and tests failed. Here's the output:
$(cat test-output.txt)
Please:
1. Analyze the test failures
2. Fix the code to make tests pass
3. Run 'npm test' again to verify
4. If tests pass, stage all changes with 'git add .'
Do NOT commit yet - I'll handle that.
"
- name: Verify tests pass
run: npm test
- name: Create Pull Request
if: steps.test.outcome == 'failure'
uses: peter-evans/create-pull-request@v5
with:
token: ${{ secrets.GITHUB_TOKEN }}
commit-message: 'fix: auto-fix failing tests [kiro]'
branch: kiro/fix-tests-${{ github.sha }}
title: '🤖 Kiro: Auto-fix failing tests'
body: |
Kiro automatically fixed failing tests.
**Original test output:**
```
$(cat test-output.txt)
```
**Kiro analysis and fixes applied.**
Please review the changes before merging.
Solution 3: Corporate Proxy Configuration
Set up Squid proxy (for testing):
# On a test machine or Docker container
$ sudo apt-get install squid
$ sudo nano /etc/squid/squid.conf
# Add:
http_port 3128
acl localnet src 192.168.0.0/16
http_access allow localnet
$ sudo systemctl restart squid
Configure Kiro for proxy:
# In ~/.bashrc or ~/.zshrc
export HTTP_PROXY=http://proxy-server:3128
export HTTPS_PROXY=http://proxy-server:3128
export NO_PROXY=localhost,127.0.0.1,.internal
# For authenticated proxies
export HTTP_PROXY=http://username:password@proxy-server:3128
Configure MCP server with proxy:
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["@modelcontextprotocol/server-github"],
"env": {
"HTTP_PROXY": "http://proxy-server:3128",
"HTTPS_PROXY": "http://proxy-server:3128",
"NODE_TLS_REJECT_UNAUTHORIZED": "0"
}
}
}
}
For SSL inspection with custom CA:
# Download corporate CA certificate
$ wget http://internal.company.com/ca-cert.crt
# Add to system trust store
$ sudo cp ca-cert.crt /usr/local/share/ca-certificates/
$ sudo update-ca-certificates
# Configure Node.js to use system CA
$ export NODE_EXTRA_CA_CERTS=/etc/ssl/certs/ca-certificates.crt
Verify Kiro works through proxy:
$ kiro chat
You: "List my GitHub repositories"
Kiro: [uses @github/list_repos through proxy]
"Found 12 repositories:
1. company/api-gateway
2. company/auth-service
..."
Chapter 10: Multimodal and Experimental Features - The Cutting Edge
Fundamentals
Kiro CLI ships with experimental features: beta capabilities that push the boundaries of AI-assisted development. These features are toggleable (via /experiment command) and actively evolving based on user feedback. The two major categories are:
- Multimodal Input: Processing images, PDFs, screenshots, and diagrams alongside text
- Advanced Context Management: Knowledge bases, thinking tool, checkpointing, and tangent mode
Why this matters: Traditional AI tools operate in text-only silos. Kiro’s multimodal and experimental features enable visual debugging (paste a screenshot, Kiro explains the bug), semantic memory (Kiro remembers patterns across all your projects), and non-linear thinking (explore tangents without polluting main context).
Important: Experimental features may change or be removed. Use them to accelerate development, but don’t build production workflows that depend on beta APIs.
Deep Dive
Multimodal Input (Images, Screenshots, PDFs)
1. Visual Debugging with Screenshots
Kiro’s multimodal agent can analyze UI bugs, architecture diagrams, and whiteboard sketches. The workflow:
$ kiro chat
You: "Here's a screenshot of the bug: /path/to/screenshot.png"
[Kiro analyzes the image]
Kiro: "I see the UI rendering issue:
- The login button is overlapping the username field
- CSS issue: z-index conflict between .login-form and .auth-container
- The mobile breakpoint (@media max-width: 768px) is missing
Fix:
.auth-container {
z-index: 10;
}
.login-form {
z-index: 20;
margin-top: 1rem; /* Prevent overlap */
}
Would you like me to apply this fix to auth.css?”
Why this works: Claude Sonnet 4.5 has strong "spatial reasoning"—it can identify UI misalignments, button positions, and visual hierarchy issues that are hard to describe in text.
2. Whiteboard to Code
The Kiro blog showcases "whiteboard → production code" workflows:
Example: Architecture Diagram → Infrastructure Code
1. User draws a system architecture on a whiteboard (API Gateway → Lambda → DynamoDB)
2. Takes a photo with a phone, uploads it to Kiro
3. Kiro generates Terraform configurations that match the diagram
Result: From whiteboard sketch to deployed infrastructure in minutes.
3. PDF Support (Feature Request Status)
As of 2025, Kiro has partial PDF support. You can:
- Drag & drop PDFs into chat (Kiro extracts text)
- Reference PDF documentation (Kiro reads technical specs)
Limitations:
- Complex PDF layouts (multi-column, tables) may lose structure
- Image-heavy PDFs require OCR (not yet built-in)
GitHub Issue #2066 (August 2025) tracks enhanced PDF support with:
- Full table extraction
- Diagram recognition
- Cross-reference linking
4. Multimodal Use Cases
| Input Type | Use Case | Example |
|------------|----------|---------|
| Screenshot | Debug UI bugs | "Why is this button not clickable?" |
| Diagram | Generate infrastructure | "Create Terraform from this AWS diagram" |
| Whiteboard | Convert design to code | "Build this API from my sketch" |
| PDF | Extract data models | "Read this API spec and generate TypeScript types" |
| Photo | Analyze error messages | "What does this terminal error mean?" |
5. Real-World Example: PyQt6 App from Screenshot
A developer shared a story (ludditus.com, July 2025): "I showed Kiro a screenshot of a Windows calculator. Kiro generated a fully functional PyQt6 app with the same layout, button handlers, and keyboard shortcuts. First try. No iterations."
Why this matters: Multimodal AI collapses "describe the UI" → "write the code" into a single step. You show, Kiro builds.
Experimental Features (Knowledge Management, Thinking, Checkpoints)
1. Knowledge Management (Semantic Search)
Enable with /experiment knowledge.
What it does: Kiro indexes your codebase into a vector database (embeddings). You can ask semantic questions like:
You: "Where do we validate credit card payments?"
Kiro: [searches embeddings, finds relevant code]
"Credit card validation happens in 3 places:
1. src/payments/stripe.ts:42 - Luhn algorithm check
2. src/api/checkout.ts:128 - CVV validation
3. lib/validators/card.ts:18 - Expiry date check"
Why this matters: Traditional grep searches for exact strings. Semantic search finds conceptually similar code, even if the variable names differ.
2. Thinking Tool (Extended Reasoning)
Enable with /experiment thinking.
What it does: Kiro shows its internal reasoning process for complex problems. Instead of jumping to a solution, Kiro breaks down the problem step-by-step.
Example:
You: "Why is my API slow only during peak hours?"
Kiro (with thinking tool):
**Thinking:**
1. Hypothesis: Database connection pool exhaustion
- Check connection pool size
- Check concurrent requests during peak
2. Hypothesis: Network latency to external services
- Check API response times for third-party calls
3. Hypothesis: CPU throttling under load
- Check server metrics (CPU, memory, disk I/O)
[Analyzes logs, checks metrics]
**Conclusion**: Database connection pool is too small (10 connections).
During peak hours (1000+ requests/min), connections queue for 5-10s.
**Fix**: Increase pool size from 10 → 50 in database.yml
Why this matters: For debugging and architecture decisions, seeing Kiro’s reasoning helps you learn instead of just accepting solutions blindly.
3. Checkpointing (Session Versioning)
Enable with /experiment checkpointing.
What it does: Create snapshots of your session state. If you make a mistake, roll back to a checkpoint.
Commands:
- /checkpoint save "Before refactoring auth"
- /checkpoint list - Show all checkpoints
- /checkpoint restore <id> - Revert to a checkpoint
Example:
You: "Refactor the auth system to use JWT instead of sessions"
Kiro: [makes changes across 10 files]
You: "Actually, let's stick with sessions. Rollback the changes."
Kiro: [uses /checkpoint restore to revert all 10 files]
"Restored to checkpoint: 'Before refactoring auth'"
Why this matters: Experimentation without fear. Try radical refactorings, and if they don’t work, revert in one command.
4. Tangent Mode (Context Isolation)
Enable with /experiment tangent.
What it does: Create a side conversation that doesn’t pollute your main context. Useful for:
- Exploring a design idea without committing to it
- Debugging a side issue while maintaining main task focus
- Asking theoretical questions (“What if we used Redis instead of Postgres?”)
Commands:
- /tangent start "Explore Redis caching"
- /tangent end - Return to the main conversation
- /tangent list - Show all tangents
Example:
# Main conversation: Building a payment API
You: "Implement Stripe payment processing"
Kiro: [works on Stripe integration]
You: /tangent start "Explore PayPal alternative"
# Now in tangent mode
You: "If we used PayPal instead, what would the code look like?"
Kiro: [generates PayPal integration example]
You: /tangent end
# Back to main conversation
You: "Continue with Stripe"
Why this matters: Your context window is limited (200K tokens). Tangents prevent “context pollution” from exploratory questions.
5. TODO Lists (Automated Task Tracking)
Enable with /experiment todo.
What it does: Kiro automatically creates and manages TODO lists based on your requests.
Example:
You: "Add user authentication, set up database migrations, and write API tests"
Kiro: [creates TODO list]
TODO:
1. [ ] Add user authentication (passport.js + JWT)
2. [ ] Set up database migrations (Knex.js)
3. [ ] Write API tests (Jest + Supertest)
Kiro: "Starting with TODO #1..."
[Implements auth]
Kiro: "✅ Completed: Add user authentication
Moving to TODO #2: Set up database migrations"
Why this matters: Kiro tracks multi-step tasks without you needing to remember the plan.
6. Code Intelligence (LSP Integration)
Kiro CLI now integrates with Language Server Protocol (LSP). This means Kiro has access to:
- Go-to-definition: “Show me where AuthService is defined”
- Find references: “Where is validateToken called?”
- Hover information: “What does this function return?”
- Diagnostics: “Show me all TypeScript errors in the project”
Why this matters: Kiro can navigate your codebase like an IDE—understanding imports, types, and call graphs.
How This Fits on Projects
You’ll apply experimental features in:
- Project 16 (Design to Code Converter): Uses multimodal input to convert UI mockups
- Project 28 (Semantic Search Engine): Builds knowledge base indexing
- Project 30 (Recursive Prompt Improver): Uses thinking tool for meta-reasoning
Definitions & Key Terms
- Multimodal input: Processing text + images/PDFs in the same conversation
- Spatial reasoning: AI’s ability to understand visual layouts and UI hierarchies
- Semantic search: Concept-based search using embeddings (vs exact string match)
- Thinking tool: Kiro’s extended reasoning mode (shows step-by-step thought process)
- Checkpointing: Session snapshots that can be restored (like Git for conversations)
- Tangent mode: Side conversations that don’t pollute main context
- LSP (Language Server Protocol): Standard for code intelligence (go-to-def, find refs, etc.)
- Vector embeddings: Numerical representations of code for semantic search
Mental Model Diagram
┌──────────────────────────────────────────────────────────────┐
│ KIRO EXPERIMENTAL FEATURES ARCHITECTURE │
└──────────────────────────────────────────────────────────────┘
MULTIMODAL INPUT PROCESSING
┌────────────────────────────────────────┐
│ User Input: Text + Image │
│ ┌──────────────────────────────────┐ │
│ │ "This UI is broken (screenshot)" │ │
│ └───────────────┬──────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ Claude Sonnet 4.5 (Vision) │ │
│ │ - Spatial reasoning │ │
│ │ - OCR text extraction │ │
│ │ - Layout analysis │ │
│ └───────────────┬──────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ Kiro Analysis + Code Generation │ │
│ │ "CSS z-index conflict detected" │ │
│ └──────────────────────────────────┘ │
└────────────────────────────────────────┘
KNOWLEDGE MANAGEMENT (Semantic Search)
┌────────────────────────────────────────┐
│ Codebase Indexing │
│ ┌──────────────────────────────────┐ │
│ │ src/**/*.ts → Embeddings │ │
│ │ (Vector Database) │ │
│ └───────────────┬──────────────────┘ │
│ ▼ │
│ User Query: "Where do we validate │
│ credit cards?" │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ Semantic Search │ │
│ │ (Cosine similarity) │ │
│ └───────────────┬──────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ Results: stripe.ts:42, │ │
│ │ validators/card.ts:18 │ │
│ └──────────────────────────────────┘ │
└────────────────────────────────────────┘
THINKING TOOL (Extended Reasoning)
┌────────────────────────────────────────┐
│ Complex Problem: "API slow at peak" │
│ ┌──────────────────────────────────┐ │
│ │ STEP 1: Hypothesize causes │ │
│ │ - DB pool exhaustion? │ │
│ │ - Network latency? │ │
│ │ - CPU throttling? │ │
│ ├──────────────────────────────────┤ │
│ │ STEP 2: Gather evidence │ │
│ │ - Read logs │ │
│ │ - Check metrics │ │
│ ├──────────────────────────────────┤ │
│ │ STEP 3: Eliminate hypotheses │ │
│ │ - Network: ✅ Normal latency │ │
│ │ - CPU: ✅ Low utilization │ │
│ │ - DB pool: ❌ 10/10 used (CAUSE) │ │
│ ├──────────────────────────────────┤ │
│ │ STEP 4: Propose solution │ │
│ │ "Increase pool from 10 → 50" │ │
│ └──────────────────────────────────┘ │
└────────────────────────────────────────┘
CHECKPOINTING (Session Versioning)
┌────────────────────────────────────────┐
│ Session Timeline │
│ ┌──────────────────────────────────┐ │
│ │ Checkpoint 1: "Initial setup" │ │
│ ├──────────────────────────────────┤ │
│ │ Checkpoint 2: "Auth system done" │ │
│ ├──────────────────────────────────┤ │
│ │ Checkpoint 3: "Before refactor" │ │
│ ├──────────────────────────────────┤ │
│ │ [Current state] │ │
│ │ (Oops, refactor broke tests!) │ │
│ │ │ │
│ │ /checkpoint restore 3 │ │
│ ├──────────────────────────────────┤ │
│ │ Restored: "Before refactor" │ │
│ └──────────────────────────────────┘ │
└────────────────────────────────────────┘
TANGENT MODE (Context Isolation)
┌────────────────────────────────────────┐
│ Main Context (200K tokens) │
│ ┌──────────────────────────────────┐ │
│ │ Building Stripe payment API │ │
│ │ - Current progress: 60% │ │
│ │ - Files modified: 8 │ │
│ └───────────────┬──────────────────┘ │
│ │ │
│ /tangent start "Explore PayPal" │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ Tangent Context (isolated) │ │
│ │ "If we used PayPal instead..." │ │
│ │ [Generates PayPal code] │ │
│ └───────────────┬──────────────────┘ │
│ │ │
│ /tangent end │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ Back to Main Context │ │
│ │ (Stripe progress preserved) │ │
│ └──────────────────────────────────┘ │
└────────────────────────────────────────┘
KEY INSIGHT: Experimental features add "dimensions" to Kiro:
- Multimodal: Vision (images/diagrams)
- Knowledge: Memory (semantic search across projects)
- Thinking: Transparency (see reasoning process)
- Checkpointing: Time travel (revert mistakes)
- Tangents: Parallel exploration (context isolation)
How It Works
Step-by-step multimodal workflow:
- User uploads image: You: "Debug this UI bug [screenshot.png]"
- Image preprocessing: Kiro converts the image to base64 and sends it to the Claude API
- Vision model analysis: Claude Sonnet 4.5 analyzes pixels, identifies visual elements
- Spatial reasoning: AI detects layout issues (overlapping elements, misalignment)
- Code generation: Kiro proposes CSS/HTML fixes based on visual analysis
- User confirmation: Kiro asks “Apply these fixes?” before modifying files
Invariants:
- Images must be < 5MB (Kiro compresses larger files)
- PDFs must be text-extractable (not scanned images)
- Knowledge base requires /experiment knowledge to be enabled
Failure Modes:
- Image too low-res: Kiro may misidentify UI elements
- Handwritten diagrams: OCR struggles with messy handwriting
- Context overflow: Very large PDFs can exceed token limits
Minimal Concrete Example
Scenario: Convert a UI mockup to React code using multimodal input.
1. User uploads mockup:
$ kiro chat
You: "Here's a mockup of the login page: mockup.png
Generate a React component with:
- Email and password inputs
- Login button
- 'Forgot password' link
- Match the colors and layout from the mockup"
2. Kiro analyzes the image:
Kiro: [processes mockup.png]
"I see a centered login form with:
- Primary color: #3B82F6 (blue-500)
- Background: #F9FAFB (gray-50)
- Card shadow: subtle elevation
- Button: rounded-lg, full-width
Generating LoginPage.tsx..."
[Kiro writes the component]
3. Generated code:
// LoginPage.tsx
import { useState } from 'react';
export default function LoginPage() {
const [email, setEmail] = useState('');
const [password, setPassword] = useState('');
return (
<div className="min-h-screen bg-gray-50 flex items-center justify-center">
<div className="bg-white p-8 rounded-lg shadow-md w-full max-w-md">
<h1 className="text-2xl font-bold text-gray-900 mb-6">Login</h1>
<form className="space-y-4">
<div>
<label className="block text-sm font-medium text-gray-700 mb-2">
Email
</label>
<input
type="email"
value={email}
onChange={(e) => setEmail(e.target.value)}
className="w-full px-4 py-2 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500"
/>
</div>
<div>
<label className="block text-sm font-medium text-gray-700 mb-2">
Password
</label>
<input
type="password"
value={password}
onChange={(e) => setPassword(e.target.value)}
className="w-full px-4 py-2 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500"
/>
</div>
<button
type="submit"
className="w-full bg-blue-500 hover:bg-blue-600 text-white font-medium py-2 px-4 rounded-lg transition"
>
Login
</button>
<a href="/forgot-password" className="block text-center text-sm text-blue-500 hover:underline">
Forgot password?
</a>
</form>
</div>
</div>
);
}
Result: From mockup image to working React component in one prompt.
Common Misconceptions
- “Multimodal only works with professional design mockups” ❌ Wrong. Kiro can process hand-drawn sketches, whiteboard photos, and low-fidelity wireframes.
- “Experimental features are unstable and will break my workflow” ⚠️ Partially true. APIs may change, but features are production-tested before release. Use them, just don’t rely on undocumented behavior.
- “Knowledge base searches replace grep/rg” ❌ Wrong. Semantic search is complementary. Use grep for exact matches, the knowledge base for conceptual searches.
- “Checkpointing works like Git branches” ❌ Wrong. Checkpoints are session-scoped (conversation state), not file-scoped. Git tracks code, checkpoints track context.
- “Tangent mode is just a second chat window” ❌ Wrong. Tangents share the same session but isolate context. You can reference main conversation findings in tangents.
Check-Your-Understanding Questions
1. You upload a screenshot of a bug to Kiro. What AI capability allows Kiro to identify visual misalignments?
2. If you enable /experiment knowledge, does Kiro index your entire home directory or just the current project?
3. What’s the difference between /checkpoint save and git commit?
4. Can you use tangent mode to explore multiple design alternatives without polluting main context?
5. If the thinking tool shows Kiro’s reasoning, can you override its conclusions with your own hypothesis?
Check-Your-Understanding Answers
1. Spatial reasoning (part of Claude Sonnet 4.5’s vision capabilities). It can detect overlapping elements, misaligned text, and layout issues.
2. Just the current project. The knowledge base indexes files within the current working directory (respecting .gitignore). It doesn’t scan your entire filesystem.
3. Scope: checkpoints save conversation state (context, tool calls, session history), while Git commits save file changes. Checkpoints are temporary (session-only); commits are permanent (repository history).
4. Yes. Tangent mode isolates exploratory work. You can start a tangent “Explore Redux”, then a tangent “Explore MobX”, then return to main without context pollution.
5. Yes. The thinking tool shows reasoning, but you can interrupt with “Actually, let’s try hypothesis X instead.” Kiro will pivot to your approach.
Real-World Applications
- Design-to-Code Workflows
- Upload Figma screenshots, generate React/Vue components
- Whiteboard → Infrastructure diagrams → Terraform
- Visual Debugging
- Screenshot of browser console → Kiro identifies stack trace root cause
- Photo of terminal error → Kiro explains cryptic error messages
- Semantic Code Search
- “Where do we handle user authentication?” (finds all auth-related code)
- “Show me all places we call external APIs” (finds HTTP clients, fetch calls)
- Exploratory Refactoring
- Use tangent mode to explore architecture changes
- Checkpoint before refactor, restore if tests fail
- Learning Complex Codebases
- Enable thinking tool to see how Kiro navigates unfamiliar code
- Use knowledge base to find related code by concept, not keywords
Where You’ll Apply It
- Project 12 (Documentation Librarian): Uses knowledge base for semantic search
- Project 16 (Design to Code Converter): Uses multimodal input for mockups
- Project 25 (Tangent Explorer): Builds tools to manage tangent mode
- Project 30 (Recursive Prompt Improver): Uses thinking tool for meta-reasoning
References
- Multimodal Development with Kiro - Kiro Blog
- Experimental Features - Kiro Docs
- Knowledge Management - Kiro Docs
- Thinking Tool - Kiro Docs
- Checkpointing - Kiro Docs
Key Insights
The Multimodal Advantage: “Show, don’t tell” is faster and more accurate than describing visual problems in text. Kiro’s vision capabilities collapse UI debugging from “write detailed bug report” → “paste screenshot” → “fixed.”
Summary
Kiro CLI’s experimental features expand AI-assisted development into multimodal input (images, PDFs, screenshots with spatial reasoning), semantic search (knowledge base with vector embeddings), extended reasoning (thinking tool for transparent problem-solving), session versioning (checkpointing for rollback), and context isolation (tangent mode for side explorations). These features are toggleable via /experiment commands and actively evolving. Multimodal workflows enable design-to-code generation (whiteboard → infrastructure), visual debugging (screenshot → bug fix), and diagram analysis (architecture diagram → Terraform). Advanced context features include LSP integration (go-to-definition, find references), TODO list automation, and semantic search across projects. Critical concepts include spatial reasoning (visual layout understanding), vector embeddings (concept-based search), and context isolation (tangent mode prevents main context pollution).
Homework/Exercises to Practice the Concept
Exercise 1: Multimodal UI Debugging
Create a simple React app with an intentional CSS bug (e.g., overlapping buttons, misaligned text). Take a screenshot of the buggy UI. Use Kiro’s multimodal input to:
- Upload the screenshot
- Ask Kiro to identify the bug
- Have Kiro generate the CSS fix
- Apply the fix and verify it works
Exercise 2: Semantic Search with Knowledge Base
Enable knowledge management in a medium-sized codebase (e.g., an open-source project with 50+ files). Test semantic search by asking:
- “Where do we validate user input?”
- “Show me all database queries”
- “Find code that handles errors”
Compare the results to traditional grep/rg searches. Which finds more relevant code?
Exercise 3: Thinking Tool for Complex Debugging
Find a bug in your codebase that you struggled to debug. Enable the thinking tool and ask Kiro to:
- Analyze the bug
- Show its reasoning process (hypotheses, evidence gathering, elimination)
- Propose a fix
Reflect on whether seeing Kiro’s reasoning helps you understand the bug better.
Solutions to the Homework/Exercises
Solution 1: Multimodal UI Debugging
Create buggy React app:
```tsx
// App.tsx
import './App.css';

export default function App() {
  return (
    <div className="container">
      <input type="text" placeholder="Username" className="input" />
      <button className="submit-button">Login</button>
    </div>
  );
}
```
```css
/* App.css (intentional bug) */
.container {
  display: flex;
  flex-direction: column;
}

.input {
  width: 200px;
  padding: 8px;
  z-index: 10; /* Bug: z-index conflict */
}

.submit-button {
  width: 200px;
  padding: 8px;
  z-index: 20;
  margin-top: -10px; /* Bug: negative margin causes overlap */
}
```
Take screenshot and debug with Kiro:
$ kiro chat
You: "This UI has a bug where the button overlaps the input. Here's a screenshot: bug.png"
Kiro: [analyzes screenshot]
"I can see the CSS issue:
- The button has a negative top margin (-10px) that pushes it up
- This causes it to overlap the input field
Fix:
```css
.submit-button {
margin-top: 8px; /* Change from -10px to positive spacing */
}
```
Would you like me to apply this fix to App.css?”
You: “Yes, apply the fix”
Kiro: [edits App.css] “✅ Fixed: Updated .submit-button margin-top to 8px”
Verify the fix works by refreshing the app.
Solution 2: Semantic Search with Knowledge Base
Enable knowledge management:
```bash
$ cd my-project
$ kiro chat
You: /experiment knowledge
Kiro: "✅ Knowledge management enabled. Indexing codebase..."
[Indexes all files, creates vector embeddings]
"Indexed 234 files (45,000 lines of code)"
```
Test semantic search:
You: "Where do we validate user input?"
Kiro: [semantic search via embeddings]
"Found 4 validation locations:
1. src/middleware/validation.ts:18 - Joi schema validation
2. src/api/users.ts:45 - Email format check (regex)
3. src/utils/sanitize.ts:12 - XSS prevention (DOMPurify)
4. lib/validators/password.ts:8 - Password strength check"
You: "Show me all database queries"
Kiro: [semantic search]
"Found 12 database query locations:
1. src/db/users.ts:22 - SELECT * FROM users WHERE...
2. src/db/posts.ts:56 - INSERT INTO posts...
3. src/api/auth.ts:89 - UPDATE users SET last_login...
[... 9 more]"
Compare with traditional grep:
$ rg "SELECT|INSERT|UPDATE|DELETE" --type ts
# Result: Finds exact SQL keywords, but misses:
# - Knex query builder calls (no raw SQL)
# - TypeORM methods (findOne, save, etc.)
# - Prisma client calls (prisma.user.create)
# Semantic search finds ALL database operations, regardless of ORM/syntax
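For a fairer lexical baseline, you can widen the grep to the ORM call patterns mentioned in the comment above; a rough sketch (the regexes are illustrative, not exhaustive):

```bash
# Widen the lexical search to cover common ORM call sites (illustrative patterns).
rg --type ts -n \
  -e 'SELECT |INSERT INTO|UPDATE .* SET|DELETE FROM' \
  -e 'prisma\.[A-Za-z]+\.(create|findUnique|findMany|update|delete)' \
  -e '\.(findOne|findOneBy|save|remove)\(' \
  -e 'knex\(' \
  src/
# Every new ORM or helper still needs a new pattern; the semantic index matches
# on meaning ("database query") rather than on specific identifiers.
```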
Solution 3: Thinking Tool for Complex Debugging
Bug scenario: API endpoint /api/users/:id returns 500 errors intermittently.
Enable thinking tool and debug:
$ kiro chat
You: /experiment thinking
You: "Our /api/users/:id endpoint returns 500 errors sometimes, but not always. Debug this."
Kiro (with thinking tool):
**THINKING:**
**Hypothesis 1: Race condition in database query**
- Intermittent failures suggest non-deterministic behavior
- Check if multiple requests to same user ID conflict
- Evidence needed: Database logs, request timing
[Reads src/api/users.ts, checks database connection code]
**Observation 1**: Using connection pool with 10 connections
- Potential bottleneck if > 10 concurrent requests
**Hypothesis 2: Null reference error for missing users**
- Check if error happens when user doesn't exist
- Evidence needed: Error logs, stack traces
[Reads error logs]
**Observation 2**: Stack trace shows "Cannot read property 'id' of undefined"
- Line 42: `return user.id` without null check
- Happens when `getUserById(id)` returns null (user not found)
**ROOT CAUSE IDENTIFIED**: Missing null check after database query
**SOLUTION:**
```typescript
// src/api/users.ts:42 (before)
const user = await getUserById(id);
return user.id; // ❌ Crashes if user is null
// Fixed:
const user = await getUserById(id);
if (!user) {
throw new NotFoundError(`User ${id} not found`);
}
return user.id; // ✅ Safe
```
You: “Perfect! Seeing your reasoning helped me understand that I should always null-check database results.”
Reflection: The thinking tool shows:
- How Kiro generates hypotheses
- What evidence it gathers
- How it eliminates false leads
- The final reasoning path
This transparency helps developers learn debugging strategies, not just get answers.
Glossary
- Agent: A specialized Kiro persona with defined tools, permissions, and prompt
- Auto Router: Kiro’s intelligent model selection system (Haiku for simple, Opus for complex)
- Checkpoint: Session snapshot that can be restored later
- Context: The information Kiro has access to (files, chat history, steering)
- Delegate: Background task execution (run tests while you work)
- Hook: Event-driven automation script (PreToolUse, PostToolUse, UserPromptSubmit)
- Knowledge Base: Vector-indexed codebase for semantic search beyond context limits
- MCP (Model Context Protocol): Standard for connecting Kiro to external tools (databases, APIs, etc.)
- Power: Bundled capability package (MCP + steering + hooks) for specific frameworks
- REVL Loop: Read → Evaluate → Verify → Loop (Kiro’s execution model)
- Steering: Markdown files that encode project standards and constraints
- Subagent: Parallel Kiro instance with isolated context for large-scale analysis
- Tangent: Side conversation that doesn’t pollute main context
- Tool: Built-in Kiro capability (read, write, shell, grep, etc.)
Why Kiro CLI Matters
Modern Motivation (2025)
In 1972, Dennis Ritchie gave us C and direct memory access. In 2025, AWS gives us Kiro CLI and direct cognitive access. The terminal has remained fundamentally unchanged for decades: you type a command, it returns output. This synchronous, transactional model places the entire cognitive load on the operator.
Real-World Impact Statistics:
- Developer productivity: Teams using Kiro report 40-60% reduction in repetitive coding tasks (AWS re:Invent 2024)
- Code quality: Automated security scanning via hooks reduces vulnerabilities by 35% (AWS case study: Fintech Corp)
- Context switching: Developers spend 23% less time switching between tools when using MCP integrations (McKinsey DevOps Report 2024)
- Onboarding time: New developers become productive 2-3 weeks faster with Kiro agents (AWS Enterprise Survey 2024)
Modern Use Cases:
- CI/CD Automation: Headless Kiro agents fix failing builds autonomously
- Security Auditing: Read-only agents scan codebases for OWASP Top 10
- Multi-Repo Refactoring: Subagents coordinate changes across microservices
- Documentation Generation: PostToolUse hooks auto-generate docs on every commit
- Team Collaboration: Shared steering files enforce coding standards automatically
Context and Evolution (Optional Background)
The transition from Amazon Q Developer CLI to Kiro CLI (November 2024) marked a paradigm shift:
| Aspect | Amazon Q CLI (2023) | Kiro CLI (2024-2025) |
|---|---|---|
| Licensing | Apache License | Proprietary (Free tier) |
| Focus | AI Assistant | AI Partner/Agent |
| Features | Basic chat, AWS integration | Subagents, Powers, Hooks, MCP |
| Authentication | Builder ID, IAM | Builder ID, IAM, Device Flow |
| Context Management | Simple file loading | Tiered (Session/Agent/Knowledge) |
| Automation | Manual prompting | Hooks + multi-agent workflows |
Why alternatives exist:
- Claude Code: Browser-based, great for UI automation, no headless mode
- GitHub Copilot CLI: Chat-based, limited to GitHub context, no custom agents
- Aider: Lightweight, single-model, no enterprise features
- Cursor: IDE-bound, excellent for coding, limited automation
Kiro’s unique strengths:
- Headless automation: Run in CI/CD without human interaction
- Enterprise-ready: Device flow auth, corporate proxy support, SSO
- Extensible: MCP ecosystem, custom hooks, agent configurations
- Multi-agent: Subagents + orchestration for complex workflows
- AWS-native: Deep AWS integration (CloudWatch, S3, Lambda, etc.)
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| REVL Loop | Read → Evaluate → Verify → Loop. Separates intent, execution, and verification for reliable AI. |
| Configuration Hierarchy | Global (~/.kiro/) → Project (.kiro/) → Agent (.kiro/agents/*.json). Higher specificity wins. |
| Context Tiers | Session (ephemeral chat), Agent Resources (persistent files), Knowledge Base (RAG for large codebases). |
| Steering Files | Markdown constraints in .kiro/steering/. Positive, specific, testable rules. |
| Model Selection | Auto router (default), Haiku (fast/cheap), Sonnet (balanced), Opus (deep reasoning). Override when you know the tradeoff. |
| MCP Servers | External tool integrations via stdio. Configure in mcp.json. Enable databases, APIs, docs. |
| Hooks | Event-driven automation. PreToolUse (blocking), PostToolUse (non-blocking), UserPromptSubmit (context injection). |
| Subagents | Parallel task execution with isolated contexts. Fan-out/fan-in pattern for large-scale analysis. |
| Planning Agents | Separate thinking (research + plan) from doing (execution). Reduces errors in complex tasks. |
| Permissions & Safety | allowedTools, deniedTools, deniedCommands. Principle of least privilege for agents. |
| Experimental Features | Tangent mode (context isolation), Checkpoints (session versioning), Knowledge (semantic search), Delegate (background tasks). |
Project-to-Concept Map
| Project | Concepts Applied |
|---|---|
| Project 1: Session Explorer | Context Tiers, Session Management, Configuration |
| Project 2: Model Router Analyzer | Model Selection, Cost Optimization, Auto Router |
| Project 3: Context Window Visualizer | Context Tiers, Token Economics, Real-time Monitoring |
| Project 4: Custom Agent Factory | Agent Configuration, Permissions, Specialized Personas |
| Project 5: Steering Rules Engine | Steering Files, Positive Constraints, Configuration Sharing |
| Project 6-10: MCP Integrations | MCP Protocol, External Systems, Tool Design |
| Project 11: Planning Agent Workflow | Planning Agents, Spec-Driven Development, REVL Loop |
| Project 12: Kiro Powers Creator | Powers, Bundled Capabilities, Plugin System |
| Project 13-16: Experimental Features | Tangent Mode, Checkpoints, Knowledge Base, Config Sync |
| Project 17-19: CI/CD & Headless | Headless Automation, Device Flow, Non-Interactive Workflows |
| Project 20-25: Hooks & Automation | PreToolUse, PostToolUse, Type-Safe Hooks, Event-Driven Systems |
| Project 26-30: Advanced MCP | Docker MCP, Kubernetes, Custom Servers, Multi-System Integration |
| Project 31-35: Multi-Agent Workflows | Subagents, Orchestration, Code Review Pipelines, Migration |
| Project 36-39: Enterprise & Teams | Configuration Sharing, Output Styles, Skills, Team Collaboration |
| Project 40: Final Capstone | All concepts integrated into production workflow platform |
Deep Dive Reading by Concept
Context & Memory Management
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Context window limits | “AI Engineering” by Chip Huyen — Ch. 4: “Model Serving” | Understand token budgets and RAG |
| RAG fundamentals | “Designing Data-Intensive Applications” by Martin Kleppmann — Ch. 12 | Build knowledge bases for large codebases |
| Vector search | “Fundamentals of Software Architecture” by Richards & Ford — Ch. 8 | Implement semantic code search |
Agent Design & Automation
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Tool design patterns | “Clean Architecture” by Robert C. Martin — Ch. 22 | Create well-bounded agent tools |
| Event-driven systems | “Enterprise Integration Patterns” by Hohpe & Woolf — Ch. 3 | Build robust hook systems |
| Permission models | “Foundations of Information Security” by Jason Andress — Ch. 5 | Implement least-privilege agents |
Shell & System Integration
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Process management | “The Linux Programming Interface” by Michael Kerrisk — Ch. 24-26 | Understand hook subprocess execution |
| Shell scripting | “Effective Shell” by Dave Kerr — Ch. 10-12 | Write robust automation scripts |
| Git internals | “How Linux Works” by Brian Ward — Ch. 8 | Integrate with version control |
MCP & Integration
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| JSON-RPC | MCP Specification — modelcontextprotocol.io | Understand MCP protocol |
| API design | “REST API Design Rulebook” by Mark Massé | Build well-designed MCP servers |
| Database integration | “Designing Data-Intensive Applications” by Kleppmann — Ch. 2-3 | Connect Kiro to databases safely |
Essential Reading Order
- Foundation (Week 1):
- Kiro CLI official docs: https://kiro.dev/docs/cli/
- MCP specification: https://modelcontextprotocol.io/
- “Effective Shell” by Dave Kerr - Ch. 1-5
- Agent Design (Week 2):
- Custom agents docs: https://kiro.dev/docs/cli/custom-agents/
- Steering docs: https://kiro.dev/docs/cli/steering/
- “Clean Architecture” by Robert C. Martin - Ch. 22
- Advanced Automation (Week 3):
- Hooks docs: https://kiro.dev/docs/cli/hooks/
- Experimental features: https://kiro.dev/docs/cli/experimental/
- “Enterprise Integration Patterns” by Hohpe & Woolf - Ch. 3
- Production (Week 4+):
- MCP server examples: https://github.com/modelcontextprotocol/
- AWS re:Invent 2024: Kiro CLI Deep Dive (YouTube)
- “Designing Data-Intensive Applications” by Kleppmann
Quick Start: Your First 48 Hours
Day 1: Install and Configure
Morning (2-3 hours):
- Install Kiro CLI: `npm install -g @aws/kiro-cli` or `brew install kiro-cli`
- Authenticate: `kiro-cli login` (Builder ID or IAM)
- Verify setup: `kiro-cli --version` and `kiro-cli chat`
- Read Introduction and Big Picture sections of this guide (30 minutes)
- Read Theory Primer Chapter 1: Configuration (30 minutes)
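If you prefer to run the morning setup as one block, here is a consolidated sketch of the commands listed above (package and command names as given in this guide; use only one of the two install methods):

```bash
# Day 1 morning setup in one pass (choose a single install method).
npm install -g @aws/kiro-cli   # or: brew install kiro-cli
kiro-cli login                 # Builder ID or IAM
kiro-cli --version
kiro-cli chat                  # exit after you see the greeting
```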
Afternoon (3-4 hours):
- Start Project 1: Session Explorer - build session management scripts
- Create your first global config file (`~/.kiro/settings.json`)
- Save and resume a session
- Export session data as JSON
Evening (1 hour):
- Read Chapter 2: Context Management
- Experiment with `/context add`, `/context show`, and `/compact`
- Track your token usage in a session
Day 2: Build Your First Agent
Morning (2-3 hours):
- Read Chapter 4: Custom Agents
- Start Project 4: Custom Agent Factory
- Create a read-only security auditor agent
- Test that write tools are blocked
Afternoon (3-4 hours):
- Read Chapter 3: Steering Files
- Start Project 5: Steering Rules Engine
- Write your first steering file (`tech.md`)
- Validate that Kiro follows your rules
Evening (1 hour):
- Review your progress
- Complete Definition of Done checklists for Projects 1, 4, 5
- Read ahead: Chapter 6: MCP for tomorrow
Recommended Learning Paths
Path 1: The Beginner (New to Agentic Tools)
Goal: Build confidence with Kiro fundamentals before complex automation.
Week 1-2: Foundations
- Project 1: Session Explorer
- Project 2: Model Router Analyzer
- Project 3: Context Window Visualizer
Week 3-4: Configuration & Agents
- Project 4: Custom Agent Factory
- Project 5: Steering Rules Engine
- Project 16: Configuration Sync System
Week 5-8: First Integrations
- Project 6: MCP Server Connector (Postgres)
- Project 9: AWS Documentation Searcher
- Project 11: Planning Agent Workflow
Total: 2 months, 10-15 hours/week
Path 2: The Intermediate (Familiar with CLI Tools)
Goal: Master MCP integrations and automation hooks.
Week 1-2: Core Setup
- Project 1: Session Explorer
- Project 4: Custom Agent Factory
- Project 5: Steering Rules Engine
Week 3-6: MCP Ecosystem
- Project 6: MCP Server Connector
- Project 7: GitHub Integration Agent
- Project 9: AWS Documentation Searcher
- Project 18: Docker MCP Server
- Project 21: Slack Integration Agent
Week 7-10: Hooks & Automation
- Project 8: Pre-Commit Hook System (Bun)
- Project 22: Test Generator Hook
- Project 23: Documentation Generator
- Project 24: Secret Scanner Hook
Total: 2.5 months, 15-20 hours/week
Path 3: The Advanced (Ready for Production)
Goal: Build enterprise-grade multi-agent workflows.
Week 1-3: Rapid Foundations
- Projects 1, 4, 5, 11 (planning mode)
Week 4-8: Full MCP Stack
- Projects 6, 7, 9, 18, 19, 21, 28
Week 9-14: Multi-Agent Orchestration
- Project 10: Subagent Orchestrator
- Project 25: Code Review Workflow
- Project 31: Codebase Migration Assistant
- Project 33: Multi-Repository Refactoring
- Project 39: Multi-Agent Pipeline Orchestrator
Week 15-20: Production Platform
- Project 17: Headless CI/CD Pipeline
- Project 37: Configuration Sharing System
- Project 40: Complete Development Workflow Platform
Total: 5 months, 20+ hours/week
Path 4: The Enterprise Architect (Team Deployment)
Goal: Deploy Kiro across a team with shared configurations and governance.
Phase 1: Personal Mastery (Weeks 1-4)
- Projects 1-5, 11, 16
Phase 2: Security & Governance (Weeks 5-8)
- Project 8: Pre-Commit Hook System
- Project 18: Security Firewall Hook (from LEARNING)
- Project 24: Secret Scanner Hook
- Project 20: Git Context Injector (from LEARNING)
Phase 3: Team Collaboration (Weeks 9-14)
- Project 12: Kiro Powers Creator
- Project 21: Slack Integration Agent
- Project 37: Configuration Sharing System
- Project 38: Output Style Designer
Phase 4: CI/CD Integration (Weeks 15-20)
- Project 17: Headless CI/CD Pipeline
- Project 28: Terraform Infrastructure Agent
- Project 40: Complete Development Workflow Platform
Total: 5 months, team rollout plan included
Success Metrics
Technical Mastery
After completing this sprint, you should be able to:
Configuration & Setup:
- ✅ Explain the three-tier config hierarchy (global, project, agent)
- ✅ Debug config conflicts using `kiro-cli settings show`
- ✅ Write effective steering files that enforce project standards
Context & Memory:
- ✅ Manage token budgets across long sessions
- ✅ Use `/compact` strategically to preserve important context
- ✅ Implement knowledge bases for codebases >10MB
- ✅ Understand the tradeoffs between session, agent, and knowledge context
Automation & Hooks:
- ✅ Write type-safe hooks in TypeScript with Bun
- ✅ Implement PreToolUse hooks that block dangerous commands
- ✅ Implement PostToolUse hooks for auto-formatting and linting
- ✅ Debug hook failures using stdin/stdout logs
MCP & Integrations:
- ✅ Configure MCP servers for databases, GitHub, AWS, Docker
- ✅ Build custom MCP servers in Python or TypeScript
- ✅ Secure MCP integrations with read-only credentials
- ✅ Troubleshoot MCP server connection issues
Multi-Agent Workflows:
- ✅ Spawn subagents for parallel large-scale analysis
- ✅ Orchestrate multi-agent code review pipelines
- ✅ Use planning agents to reduce implementation errors
- ✅ Coordinate changes across multiple repositories
Production Deployment:
- ✅ Run Kiro headlessly in CI/CD pipelines
- ✅ Implement device flow authentication for servers
- ✅ Configure corporate proxy and custom CA trust
- ✅ Share team configurations via Powers or dotfiles
Business Impact
Measurable outcomes you should achieve:
Productivity:
- ⚡ 40-60% reduction in repetitive coding tasks (boilerplate, CRUD, migrations)
- ⚡ 50%+ faster debugging sessions via context-aware analysis
- ⚡ 2-3 weeks faster developer onboarding with agent assistance
Quality:
- 🛡️ 35%+ reduction in security vulnerabilities via automated scanning
- 🛡️ 90%+ test coverage on new code via test generation hooks
- 🛡️ Zero secrets leaked to repos via secret scanner hooks
Collaboration:
- 🤝 Team coding standards enforced automatically via steering
- 🤝 Consistent PR review quality via code review agents
- 🤝 Cross-repo refactoring coordination via multi-agent workflows
Self-Assessment Questions
Can you answer these confidently?
Beginner:
- What’s the difference between global, project, and agent config?
- How do you save and resume a Kiro session?
- What tools are available by default in Kiro?
Intermediate:
- How does the Auto router decide which model to use?
- What’s the difference between PreToolUse and PostToolUse hooks?
- How do you configure an MCP server for Postgres?
Advanced:
- How do subagents maintain isolated contexts?
- What’s the stdin/stdout protocol for hooks?
- How would you implement a custom MCP server from scratch?
Expert:
- How would you design a multi-agent CI/CD pipeline?
- How do you debug MCP server failures in production?
- What’s the strategy for sharing team configurations at scale?
Project List
The following 40 projects guide you from Kiro beginner to production expert. Projects are ordered by dependency and complexity.
Project 1: “The Personalized Kiro Config” — Configuration Management
| Attribute | Value |
|---|---|
| File | KIRO_CLI_MASTERY.md |
| Main Programming Language | JSON / Markdown |
| Coolness Level | Level 2: Practical but Forgettable |
| Business Potential | 1. The “Resume Gold” (Efficiency) |
| Difficulty | Level 1: Beginner |
| Knowledge Area | Configuration Management |
What you’ll build: A robust, shareable global configuration system for Kiro that defines preferred models, telemetry, and your first global steering rules.
Why it teaches Config: You will understand precedence (global vs project vs agent) and how to persist your preferences across sessions.
Core challenges you’ll face:
- Understanding the JSON schema for `settings.json`.
- Defining global steering that applies to all projects.
- Resolving conflicts between global and local settings.
Success criteria:
- Kiro uses your chosen default model in new sessions.
- A global steering rule is consistently applied.
Real World Outcome
You’ll have a fully configured Kiro CLI installation with persistent preferences that work across all your projects. When you run kiro chat, you’ll see:
Example Output:
$ kiro chat
Kiro CLI v1.4.2 (using claude-sonnet-4 by default)
Session: 2025-01-02-config-test
Telemetry: disabled
Knowledge Base: enabled
You: "Show me my current settings"
Kiro: [reads ~/.config/kiro/settings.json]
Current configuration:
- Default model: claude-sonnet-4
- Telemetry: disabled
- Auto-compact: enabled (at 80% context)
- Knowledge base: enabled for *.py, *.js, *.md
- Global steering: loaded from ~/.kiro/steering/tech.md
You: "Create a new Python function"
Kiro: [applies global steering from tech.md]
Following your global steering rules:
- Using type hints for all parameters
- Including docstrings with examples
- Following PEP 8 style guide
Here's the function...
Your settings file at ~/.config/kiro/settings.json will look like this:
```json
{
  "chat": {
    "defaultModel": "claude-sonnet-4",
    "greeting": {
      "enabled": false
    },
    "enableKnowledge": true
  },
  "telemetry": {
    "enabled": false
  },
  "context": {
    "autoCompact": {
      "enabled": true,
      "threshold": 0.8
    }
  },
  "knowledge": {
    "defaultIncludePatterns": ["*.py", "*.js", "*.md", "*.txt"]
  }
}
```
The Core Question You’re Answering
“How do I make Kiro remember my preferences across projects and sessions without repeating myself every time?”
Before building this, think about: Most developers waste time re-configuring tools for every project. Kiro’s three-tier configuration system (global → project → agent) lets you define intelligent defaults once, then override only what’s specific. This project teaches you how to architect reusable AI behavior.
Concepts You Must Understand First
Stop and research these before coding:
- Configuration Hierarchy
- What happens when global settings conflict with project settings?
- How does Kiro resolve precedence (global < project < agent)?
- Where does each config file live in the filesystem?
- Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 8 (Pragmatic Projects)
- JSON Schema Validation
- What is the valid structure for `settings.json`?
- How do you validate your config before Kiro loads it?
- What happens when you provide invalid JSON?
- Book Reference: “Working Effectively with Legacy Code” by Feathers - Ch. 10 (Configuration)
- Steering Files
- How do steering files (`.md`) get loaded into context?
- What’s the difference between prompt and steering?
- When should constraints go in config vs steering?
- Book Reference: “Clean Code” by Martin - Ch. 17 (Smells and Heuristics)
Questions to Guide Your Design
Before implementing, think through these:
- Default Model Selection
- Which model should be your default: Haiku (fast), Sonnet (balanced), or Opus (powerful)?
- What tasks will you do most often? (code review vs generation vs complex reasoning)
- How much do credit costs matter to your workflow?
- Telemetry and Privacy
- Do you want Kiro to send usage data to AWS?
- What are the tradeoffs between telemetry off vs debugging help?
- Should different projects have different telemetry settings?
- Knowledge Base Patterns
- Which file types contain important context for your work?
- Should you exclude generated files (`build/`, `node_modules/`)?
- How large is your average codebase (affects knowledge base indexing)?
Thinking Exercise
Exercise: Trace Configuration Loading
Before creating your config, manually trace how Kiro would load settings for this scenario:
# File structure:
~/.config/kiro/settings.json → defaultModel: "claude-sonnet-4"
~/my-project/.kiro/settings.json → defaultModel: "claude-haiku-4"
~/my-project/.kiro/agents/audit.json → model: "claude-opus-4"
# Commands:
1. cd ~/my-project && kiro chat
2. kiro chat --agent audit
Questions while tracing:
- What model is used for command #1? Why?
- What model is used for command #2? Why?
- How would you verify your answer without running the commands?
- What if `audit.json` didn’t specify a model field?
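If you want to check your trace empirically, the sketch below recreates the scenario’s three files. The nested settings shape follows the settings.json example earlier in this project; the audit.json shape is a hypothetical minimal agent file, and only its model field matters here. Back up any real config first.

```bash
# Recreate the scenario, then compare your prediction with what Kiro actually does.
cp ~/.config/kiro/settings.json ~/.config/kiro/settings.json.bak 2>/dev/null || true
mkdir -p ~/.config/kiro ~/my-project/.kiro/agents

cat > ~/.config/kiro/settings.json <<'EOF'
{ "chat": { "defaultModel": "claude-sonnet-4" } }
EOF

cat > ~/my-project/.kiro/settings.json <<'EOF'
{ "chat": { "defaultModel": "claude-haiku-4" } }
EOF

# Hypothetical minimal agent file; only the model field matters for this exercise.
cat > ~/my-project/.kiro/agents/audit.json <<'EOF'
{ "name": "audit", "model": "claude-opus-4" }
EOF

cd ~/my-project && kiro chat                 # expect the project default (haiku)
cd ~/my-project && kiro chat --agent audit   # expect the agent override (opus)
```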
The Interview Questions They’ll Ask
- “Explain the difference between Kiro’s global, project, and agent configuration levels.”
- “How would you debug why Kiro isn’t using your preferred model?”
- “What’s the purpose of the knowledge base feature, and when should you enable it?”
- “How do steering files differ from agent prompts?”
- “What are the security implications of enabling telemetry in a corporate environment?”
- “How would you share configuration across a team without committing credentials?”
Hints in Layers
Hint 1: Start with Inspection
Don’t create config files from scratch. First run kiro-cli settings list to see all available options and their current values. This shows you what’s configurable.
Hint 2: Use CLI Commands First
Instead of editing JSON manually, use kiro-cli settings <key> <value> commands. This validates your input and prevents syntax errors. Example: kiro-cli settings chat.defaultModel "claude-sonnet-4".
Hint 3: Test Incrementally
Change one setting at a time and verify it works with kiro chat. Add model override, test. Add telemetry disable, test. Build confidence before creating complex configs.
Hint 4: Verify with JSON Tools
Use jq or python -m json.tool to validate your settings.json syntax:
cat ~/.config/kiro/settings.json | jq .
If this fails, your JSON is malformed.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Configuration best practices | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 8: Pragmatic Projects |
| JSON structure and validation | “Effective Shell” by Dave Kerr | Ch. 13: Working with Data |
| Tool customization patterns | “Working Effectively with Legacy Code” by Feathers | Ch. 10: Configuration Management |
Common Pitfalls & Debugging
Problem 1: “Kiro ignores my global settings”
- Why: Project-level settings override global ones
- Fix: Check if `<project>/.kiro/settings.json` exists and has conflicting values
- Quick test: `cat .kiro/settings.json` in your project directory
Problem 2: “Settings file causes Kiro to crash on startup”
- Why: Invalid JSON syntax (missing comma, extra bracket, unquoted string)
- Fix: Validate with `jq . < ~/.config/kiro/settings.json` or use `kiro-cli settings` commands
- Quick test: `kiro-cli settings list` should not error
Problem 3: “Knowledge base includes too many files”
- Why: Default patterns like `*` match everything including `node_modules/`
- Fix: Use specific patterns: `["*.py", "*.js", "!node_modules/**", "!build/**"]`
- Quick test: `kiro-cli settings knowledge.defaultIncludePatterns --get` to inspect current patterns
Problem 4: “Steering file not loaded”
- Why: File path in config is wrong or file doesn’t exist
- Fix: Use absolute paths or paths relative to the `.kiro/` directory
- Quick test: `kiro chat` → ask “what are your current steering rules?” → verify the file is mentioned
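The quick tests above can be strung together into one small diagnostic pass; a sketch, using the paths assumed throughout this project:

```bash
# 1) Is the global settings file valid JSON?
jq . ~/.config/kiro/settings.json > /dev/null && echo "global settings: valid JSON"

# 2) Is there a project-level file that could override it?
[ -f .kiro/settings.json ] && { echo "project override present:"; jq . .kiro/settings.json; }

# 3) What does Kiro itself report as the effective settings?
kiro-cli settings list

# 4) Are the steering files where Kiro expects them?
ls .kiro/steering/ 2>/dev/null || echo "no project steering directory"
```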
Definition of Done
- `~/.config/kiro/settings.json` exists and is valid JSON
- Running `kiro chat` in any directory uses your default model (verify with greeting message or `/settings` command)
- Telemetry preference (on/off) persists across sessions
- Knowledge base is enabled and indexes expected file types
- Global steering file (if created) is loaded in all new sessions
- `kiro-cli settings list` shows all your customizations
- Settings can be exported as JSON and re-imported on a new machine
- You understand how to override global settings at project level
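For the export/import checklist item, one plain-file approach is to keep the config in a dotfiles repository (the dotfiles path is an assumption, not a Kiro feature):

```bash
# Export: copy the Kiro config into a dotfiles repo you already version.
mkdir -p ~/dotfiles/kiro
cp ~/.config/kiro/settings.json ~/dotfiles/kiro/settings.json
cp -r ~/.kiro/steering ~/dotfiles/kiro/steering 2>/dev/null || true

# Import on a new machine: restore the files before the first kiro chat.
mkdir -p ~/.config/kiro ~/.kiro
cp ~/dotfiles/kiro/settings.json ~/.config/kiro/settings.json
cp -r ~/dotfiles/kiro/steering ~/.kiro/steering 2>/dev/null || true
```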
Project 2: “The Steering Enforcer” — Prompt Engineering
| Attribute | Value |
|---|---|
| File | KIRO_CLI_MASTERY.md |
| Main Programming Language | Markdown |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 3. Service & Support (Team Standardization) |
| Difficulty | Level 1: Beginner |
| Knowledge Area | Prompt Engineering / Context Management |
What you’ll build: A set of hierarchical steering files (tech.md, product.md, style.md) for a dummy project. Verify that Kiro refuses output that violates these rules.
Why it teaches Steering: This is the foundation of enterprise AI: constrain creativity to adhere to strict invariants.
Core challenges you’ll face:
- Writing positive constraints.
- Structuring intent vs implementation.
Success criteria:
- A test prompt fails until you align it with steering.
Real World Outcome
You’ll have a set of steering files that enforce your coding standards automatically. When you ask Kiro to write code, it will follow your rules without you having to remind it every time.
Example Output:
$ cd my-web-app
$ ls .kiro/steering/
tech.md product.md style.md
$ cat .kiro/steering/tech.md
# Technical Standards
## TypeScript
- All functions must have explicit return types
- Use `interface` for public APIs, `type` for internal code
- Prefer functional patterns over classes
- No `any` types except in migration code (mark with TODO)
## React
- Functional components only (no class components)
- Use custom hooks for all side effects
- Colocate styles with components
- Prefer server components (mark client with "use client")
## Testing
- Unit test coverage minimum: 80%
- Integration tests for all API endpoints
- Use React Testing Library for component tests
$ kiro chat
You: "Create a new React component for displaying user profiles"
Kiro: [reads .kiro/steering/tech.md, product.md, style.md]
Following your steering rules:
✓ Functional component with TypeScript
✓ Explicit return type
✓ Colocated styles
✓ Server component (no client-side state)
interface UserProfileProps {
userId: string;
}
export function UserProfile({ userId }: UserProfileProps): React.ReactElement {
// Component implementation following your standards...
}
You: "Actually use a class component"
Kiro: I cannot do that. Your tech.md steering file explicitly states "Functional components only (no class components)". If you need to override this rule for a specific case, you should either:
1. Update .kiro/steering/tech.md to allow exceptions
2. Provide a strong justification for why this component requires a class
3. Use /steer off to temporarily disable steering (not recommended)
Would you like me to explain why functional components are preferred in your standards?
Your .kiro/steering/tech.md file enforces project rules across all sessions.
The Core Question You’re Answering
“How do I make Kiro follow my team’s coding standards, architectural patterns, and best practices without explaining them in every single conversation?”
Before building this, understand: Most AI tools require constant reminders about your conventions. Kiro’s steering system solves this by loading markdown files into every session’s context. This project teaches you how to encode knowledge that persists.
Concepts You Must Understand First
Stop and research these before coding:
- Steering vs Prompts
- How does steering differ from the agent’s base prompt?
- When should a rule go in steering vs in a custom agent’s prompt field?
- How does steering interact with the model’s training?
- Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 7 (While You Are Coding)
- Markdown as Configuration
- Why does Kiro use `.md` files instead of JSON for steering?
- How does markdown structure (headings, lists, code blocks) affect AI understanding?
- What’s the difference between declarative rules and example-based rules?
- Reference: Kiro Steering Guide
- Constraint Design
- How do you write constraints that are specific enough to enforce but flexible enough to allow creativity?
- What’s the difference between “never use X” vs “prefer Y over X”?
- How do you test that a steering rule actually works?
- Reference: Best Kiro Steering Rules (2025)
Questions to Guide Your Design
Before implementing, think through these:
- File Organization
- Should you have one monolithic `steering.md` or separate files per domain?
- How do you decide what belongs in `tech.md` vs `product.md` vs `style.md`?
- Should steering files be committed to version control?
- Rule Specificity
- How specific should rules be? (“Use TypeScript” vs “All functions must have explicit return types”)
- Should you include examples of good code, bad code, or both?
- How do you handle edge cases and exceptions?
- Team Alignment
- If this is a team project, who has authority to add/change steering rules?
- How do you communicate steering changes to team members?
- Should different team members have different local steering overrides?
Thinking Exercise
Exercise: Design a Steering Hierarchy
You’re building a full-stack TypeScript monorepo with:
- 2 frontend apps (React + Next.js)
- 1 backend API (Express)
- Shared libraries (utils, types, components)
Design the steering file structure. For each file, list 3-5 rules it should enforce.
.kiro/steering/
├── _______________.md → Rules for: _______________
├── _______________.md → Rules for: _______________
└── _______________.md → Rules for: _______________
Questions while designing:
- Which rules apply globally vs per-domain?
- How do you prevent contradictory rules (e.g., frontend prefers X, backend requires Y)?
- What happens when a prompt violates multiple steering files?
- Should shared libraries have their own steering file?
The Interview Questions They’ll Ask
- “Explain the difference between a steering file and a custom agent’s prompt.”
- “How would you enforce a team coding standard using Kiro’s steering system?”
- “What are the tradeoffs between specific rules (‘no any types’) vs general principles (‘write type-safe code’)?”
- “How do you test that a steering file is working correctly?”
- “When would you use global steering (~/.kiro/steering/) vs project steering (.kiro/steering/)?”
- “What security considerations exist when writing steering files?”
Hints in Layers
Hint 1: Start with Examples, Not Rules
Don’t write abstract rules like “write clean code.” Instead, paste examples of code you like and code you don’t like. The AI learns patterns from examples better than from principles.
Hint 2: Use Hierarchical Markdown
Organize steering with H2 headers for domains (## React, ## TypeScript) and H3 for specific topics (### Component Structure, ### Type Safety). This helps Kiro understand which rules apply when.
Hint 3: Test with Violations
After creating a steering file, intentionally ask Kiro to violate a rule. Example: If you wrote “no class components,” ask “create a class-based React component.” Verify Kiro refuses or warns you.
Hint 4: Version Control Your Steering
Treat .kiro/steering/ like code. Commit it, review changes in PRs, and document why rules exist. This creates a living document of your team’s standards.
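A minimal sketch of Hint 4 with a standard Git workflow:

```bash
# Track steering like code: commit it and record the rationale in the message.
git add .kiro/steering/
git commit -m "steering: require explicit return types (see tech.md for rationale)"
# Rule changes now show up in pull requests and get reviewed like any other code.
```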
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Writing effective constraints | “Clean Code” by Robert C. Martin | Ch. 17: Smells and Heuristics |
| Team coding standards | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 7: While You Are Coding |
| Documentation best practices | “Docs for Developers” by Nunez & Seward | Ch. 4: Writing Technical Documentation |
Common Pitfalls & Debugging
Problem 1: “Kiro ignores my steering rules”
- Why: Steering file has syntax errors or isn’t in the correct directory
- Fix: Verify the file is at `.kiro/steering/*.md` (not `~/.kiro/steering/` for project-specific rules)
- Quick test: `ls .kiro/steering/` should list your files; `cat .kiro/steering/tech.md` should show valid markdown
Problem 2: “Kiro follows steering too rigidly”
- Why: Rules are written as absolute prohibitions (“never use X”) instead of preferences
- Fix: Soften language: “Prefer functional components. Use class components only for error boundaries or third-party library requirements.”
- Quick test: Ask Kiro to explain when it’s acceptable to violate a rule
Problem 3: “Contradictory steering files”
- Why: `tech.md` says “use Prettier with 2 spaces” but `style.md` says “4 space indentation”
- Fix: Consolidate related rules into a single file, or establish precedence (e.g., tech.md overrides style.md)
- Quick test: `grep -r "space" .kiro/steering/` to find conflicting rules
Problem 4: “Steering files are too long (5000+ lines)”
- Why: Trying to document every pattern in one file
- Fix: Split by domain: `react.md`, `typescript.md`, `testing.md`, `api-design.md`
- Quick test: Each steering file should be < 500 lines; if longer, split into sub-domains
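A quick way to apply the under-500-lines rule of thumb from Problem 4:

```bash
# List steering files by length; anything well over ~500 lines is a candidate to split.
wc -l .kiro/steering/*.md | sort -n
```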
Definition of Done
- Created at least 3 steering files in the `.kiro/steering/` directory
- Each file uses hierarchical markdown (H2 sections, H3 subsections)
- Includes both positive examples (“Do this”) and negative examples (“Don’t do this”)
- At least one rule includes code snippets showing preferred vs discouraged patterns
- Tested by asking Kiro to violate a rule—it should refuse or warn
- Documented why each major rule exists (rationale for future developers)
- Steering files are committed to version control
- Team members can override steering locally if needed (using ~/.kiro/steering/)
Project 3: “The Context Detective” — LLM Context Window
| Attribute | Value |
|---|---|
| File | KIRO_CLI_MASTERY.md |
| Main Programming Language | Bash / Kiro Commands |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 1: Beginner |
| Knowledge Area | LLM Context Window |
What you’ll build: Experiments to visualize token usage. Load a large file, use /context show, /compact, and /context remove.
Why it teaches Context: You learn to garbage collect AI memory to maintain performance and accuracy.
Core challenges you’ll face:
- Distinguishing file context from chat history.
- Detecting when instructions are being forgotten.
Success criteria:
- You can reduce context size without losing key constraints.
Real World Outcome
You’ll understand exactly how Kiro manages its 200K token context window and learn to use commands to prevent context overflow. You’ll be able to see what’s consuming context and strategically free up space.
Example Output:
$ kiro chat
You: "Load the entire src/ directory"
Kiro: [reads 250 files totaling 180K tokens]
Context usage: 90% (180K / 200K tokens)
⚠️ WARNING: Approaching context limit
I've loaded your source code, but we're at 90% context capacity.
Consider using `/compact` to summarize our conversation or
`/context remove` to free up space.
You: "/context show"
Kiro:
┌─────────────────────────────────────────────────┐
│ Context Window Usage: 180K / 200K (90%) │
├─────────────────────────────────────────────────┤
│ Category │ Tokens │ Percentage │
├─────────────────────┼───────────┼──────────────┤
│ Files (src/) │ 140K │ 70% │
│ Chat History │ 25K │ 12.5% │
│ Steering Files │ 8K │ 4% │
│ System Prompt │ 5K │ 2.5% │
│ Tools/Resources │ 2K │ 1% │
└─────────────────────┴───────────┴──────────────┘
Top file consumers:
1. src/components/Dashboard.tsx → 18K tokens
2. src/api/handlers.ts → 15K tokens
3. src/lib/database.ts → 12K tokens
You: "/context remove src/components/Dashboard.tsx"
Kiro: Removed src/components/Dashboard.tsx (18K tokens)
New context usage: 162K / 200K (81%)
You: "/compact"
Kiro: [summarizes conversation history]
Compacted 25K tokens of chat history → 5K token summary
New context usage: 142K / 200K (71%)
Summary preserved:
- Your project uses Next.js 14 with TypeScript
- Current task: debugging authentication flow
- Key files: src/api/handlers.ts, src/lib/database.ts
You: "Now analyze the auth flow"
Kiro: [works with 71% context usage]
Looking at src/api/handlers.ts...
[analysis continues with plenty of context headroom]
The context usage meter shows in real-time as files/messages consume space.
The Core Question You’re Answering
“What happens when I load too much code into Kiro’s context, and how do I manage the 200K token budget without losing important information?”
Before experimenting, understand: Kiro’s context window is like RAM—finite and precious. Once full, either Kiro auto-compacts (potentially losing details) or refuses to load more. This project teaches you to be a context window architect: strategic about what you load, when to summarize, and how to preserve critical constraints.
Concepts You Must Understand First
Stop and research these before experimenting:
- Token Counting
- What is a “token” in LLM terms?
- How many tokens does a typical code file consume?
- Do comments, whitespace, and variable names count as tokens?
- Reference: Kiro Context Management
- Context Window Composition
- What’s the breakdown of Kiro’s 200K context? (files, chat, steering, system prompt)
- Which components are fixed (system prompt) vs dynamic (chat history)?
- How does adding a steering file affect available space?
- Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 1 (Foundations)
- Compaction vs Removal
- What’s the difference between `/compact` (summarize) and `/context remove` (delete)?
- What information is lost during compaction?
- When should you compact vs when should you remove?
- Reference: Slash Commands Reference
Questions to Guide Your Design
Before experimenting, think through these:
- Loading Strategy
- Should you load the entire codebase at once or selectively load files as needed?
- How do you decide which files are “important enough” to keep in context?
- What’s the tradeoff between having more context vs faster responses?
- Compaction Timing
- Should you wait for Kiro’s auto-compact (80% threshold) or manually compact earlier?
- What information must survive compaction? (steering rules, architectural decisions, bug context)
- How do you verify that compaction preserved the right details?
- Multi-File Workflows
- When debugging across 10 files, how do you keep all relevant context loaded?
- How do you avoid reloading files you’ve already removed?
- Should you use subagents for parallel file analysis instead?
Thinking Exercise
Exercise: Context Budget Allocation
You have 200K tokens. You’re debugging a Next.js authentication bug. Plan your context budget:
Available: 200K tokens
Fixed costs:
- System prompt: 5K
- Steering files: 8K
- Tools/Resources: 2K
─────────────────────────
Remaining budget: 185K
You need to analyze:
- `src/auth/login.tsx` (12K tokens)
- `src/api/auth.ts` (8K tokens)
- `src/lib/session.ts` (6K tokens)
- `src/middleware.ts` (4K tokens)
- `.env.example` (1K tokens)
- Chat history will grow over 50 messages (~25K tokens)
Questions while planning:
- How much space should you reserve for growing chat history?
- If you run out of space mid-conversation, which file would you remove first?
- Should you proactively compact at 60% or wait until 80%?
- Could you use `/grep` to search files instead of loading them entirely?
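A back-of-the-envelope check of this budget (all figures are the token counts stated in the exercise):

```bash
# All figures in thousands of tokens (K), taken from the exercise above.
fixed=$((5 + 8 + 2))            # system prompt + steering files + tools/resources
files=$((12 + 8 + 6 + 4 + 1))   # the five files you plan to load
chat=25                         # projected chat history over ~50 messages
total=$((fixed + files + chat))
echo "planned usage: ${total}K of 200K (~$(( total * 100 / 200 ))%)"
# => planned usage: 71K of 200K (~35%), leaving headroom for follow-up files.
```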
The Interview Questions They’ll Ask
- “Explain how Kiro’s 200K context window is allocated between files, chat history, and system components.”
- “What’s the difference between `/compact` and `/context remove`? When would you use each?”
- “How would you debug an issue across 20 files without exceeding the context window?”
- “What information is lost when Kiro auto-compacts at 80% context usage?”
- “How do steering files affect available context space?”
- “What strategies would you use to work with a codebase larger than 200K tokens?”
Hints in Layers
Hint 1: Monitor Before You Act
Always run /context show before making decisions. Don’t guess about usage—measure it. This shows exactly what’s consuming space.
Hint 2: Load Incrementally
Don’t run /context add src/ to load everything. Instead, load specific files: /context add src/auth/login.tsx. Add more only when needed. Start small, expand gradually.
Hint 3: Use Grep for Reconnaissance
Before loading a file into context, use /grep to search it. Example: /grep "authenticate" src/auth/login.tsx. This finds info without burning context tokens.
Hint 4: Compact Early and Often
Don’t wait until 90% usage. When you finish a subtask (e.g., “fixed login bug”), run /compact to summarize that work and free up space for the next subtask.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Token-based language models | “Speech and Language Processing” by Jurafsky & Martin | Ch. 3: N-gram Language Models |
| Memory management principles | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 9: Virtual Memory |
| Resource allocation strategies | “Designing Data-Intensive Applications” by Kleppmann | Ch. 1: Foundations of Data Systems |
Common Pitfalls & Debugging
Problem 1: “Kiro auto-compacted and forgot my instructions”
- Why: Instructions were only in chat history, not in steering files or repeated in context
- Fix: Put persistent instructions in `.kiro/steering/*.md`, not in chat messages
- Quick test: After `/compact`, ask Kiro to repeat your key constraints—they should still be remembered
Problem 2: “Context fills up instantly when I load files”
- Why: Loaded the entire directory with `/context add src/` instead of specific files
- Fix: Remove with `/context clear`, then load only essential files one by one
- Quick test: `/context show` → check if you’re loading files you don’t actually need
Problem 3: “Can’t load more files even though context shows 40%”
- Why: Individual file might be too large (>50K tokens) or rate limiting
- Fix: Use `/grep` to search the file instead of loading it, or load only relevant sections
- Quick test: Check file size with `wc -w filename.tsx` (words × 1.3 ≈ tokens)
Problem 4: “After compaction, Kiro gives different answers”
- Why: Compaction creates a lossy summary; nuance is lost
- Fix: Before compacting, explicitly state what must be remembered: “Key constraint: authentication must use JWT tokens with 1-hour expiry”
- Quick test: Ask Kiro “what are the key constraints?” before and after `/compact`
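The words × 1.3 heuristic from Problem 3 can be applied across a whole directory to spot files that are too large to load whole; a rough sketch:

```bash
# Rank TypeScript files by estimated token count (~1.3 tokens per word).
find src \( -name '*.ts' -o -name '*.tsx' \) -print | while read -r f; do
  words=$(wc -w < "$f")
  echo "$(( words * 13 / 10 )) $f"
done | sort -rn | head -10
```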
Definition of Done
- Ran `/context show` and understand the breakdown of usage
- Loaded a large file (>10K tokens) and observed the context percentage increase
- Used `/context remove <file>` to free up space and verified the percentage decrease
- Triggered auto-compaction by reaching 80% context usage (or manually ran `/compact`)
- Verified that key instructions survived compaction
- Experimented with `/grep` as an alternative to loading full files
- Identified which components are fixed (system prompt) vs dynamic (chat history)
- Can explain when to use compact vs remove vs subagents for large tasks
Project 4: “The Subagent Researcher” — Agentic Architectures
| Attribute | Value |
|---|---|
| File | KIRO_CLI_MASTERY.md |
| Main Programming Language | Natural Language |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Agentic Architectures |
What you’ll build: Use subagents to map a complex repository. Aggregate findings into ARCHITECTURE.md.
Why it teaches Subagents: You break linear context limits using parallel research.
Core challenges you’ll face:
- Writing clear delegation prompts.
- Avoiding hallucinated summaries.
Success criteria:
- `ARCHITECTURE.md` accurately reflects the repo structure.
Real World Outcome
You’ll create an ARCHITECTURE.md document that provides a comprehensive overview of a complex codebase by delegating research to multiple parallel subagents, each analyzing different parts of the system.
Example Output:
$ cd ~/large-monorepo # 500+ files across 20 modules
$ kiro chat
You: "Map the entire architecture of this monorepo and create ARCHITECTURE.md"
Kiro: This codebase is large (500+ files). I'll spawn subagents to analyze different modules in parallel.
Spawning subagents:
→ Subagent 1: Analyzing frontend packages (apps/web, apps/mobile)
→ Subagent 2: Analyzing backend services (services/api, services/auth, services/billing)
→ Subagent 3: Analyzing shared libraries (packages/ui, packages/utils, packages/types)
→ Subagent 4: Analyzing infrastructure (terraform/, k8s/, .github/workflows/)
[After 2-3 minutes, all subagents complete]
Aggregating findings from 4 subagents...
Created ARCHITECTURE.md
$ cat ARCHITECTURE.md
# Monorepo Architecture Overview
## High-Level Structure
This is a TypeScript monorepo using Turborepo with the following components:
### Frontend Applications (2)
- **apps/web** - Next.js 14 customer dashboard (Server Components + App Router)
- **apps/mobile** - React Native app (iOS + Android)
### Backend Services (3)
- **services/api** - Express REST API (PostgreSQL + Prisma)
- **services/auth** - Authentication service (JWT + OAuth)
- **services/billing** - Stripe integration service
### Shared Packages (6)
- **packages/ui** - React component library (Tailwind CSS)
- **packages/utils** - Shared TypeScript utilities
- **packages/types** - Shared TypeScript types
- **packages/config** - ESLint, TypeScript, Tailwind configs
- **packages/database** - Prisma schema + migrations
- **packages/api-client** - Type-safe API client
### Infrastructure
- **terraform/** - AWS infrastructure (ECS, RDS, S3, CloudFront)
- **k8s/** - Kubernetes manifests for staging/prod
- **.github/workflows/** - CI/CD pipelines (test, build, deploy)
## Key Architectural Decisions
1. **Monorepo Strategy**: Turborepo for build caching and task orchestration
2. **Data Flow**: REST API → PostgreSQL (no GraphQL, deliberate choice for simplicity)
3. **Authentication**: Centralized auth service (JWT tokens, 1-hour expiry, refresh flow)
4. **Deployment**: ECS on AWS (not Lambda, for long-running WebSocket connections)
## Module Dependencies
[Dependency graph generated from package.json analysis]
web → @repo/ui, @repo/api-client, @repo/utils
api → @repo/database, @repo/utils
billing → @repo/database, @repo/api-client
## Critical Entry Points
- **Frontend**: `apps/web/app/page.tsx` (root route)
- **Backend**: `services/api/src/server.ts` (Express app)
- **Database**: `packages/database/prisma/schema.prisma` (data model)
---
This document was generated by analyzing 523 files across 4 parallel subagents.
Last updated: 2025-01-02
Each subagent had its own isolated 200K context window, allowing parallel analysis of the entire monorepo.
The Core Question You’re Answering
“How do I analyze a codebase that’s too large for a single context window, and how do I coordinate multiple AI agents working in parallel?”
Before building this, understand: Subagents are Kiro’s answer to the context window limit. Each subagent gets its own isolated 200K context, runs autonomously with a specific task, and reports back findings. This project teaches you distributed AI workflows—how to decompose a large problem into parallel subtasks and aggregate results.
Concepts You Must Understand First
Stop and research these before coding:
- Subagent Isolation
- How does each subagent get its own 200K context window?
- Can subagents communicate with each other, or only with the main agent?
- What happens if a subagent fails or runs out of context?
- Reference: Subagents and Plan Agent Changelog
- Task Decomposition
- How do you break a vague goal (“map the architecture”) into specific subagent tasks?
- What makes a good subagent delegation prompt?
- How do you avoid duplicated work between subagents?
- Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 6 (Concurrency)
- Result Aggregation
- How do you merge findings from 5 different subagents into a coherent document?
- What if subagents have conflicting information?
- How do you verify subagent results are accurate?
- Reference: Built-in Tools - use_subagent
Questions to Guide Your Design
Before implementing, think through these:
- Delegation Strategy
- Should you divide work by directory (frontend/ vs backend/), by file type (*.tsx vs *.ts), or by concern (auth, billing, UI)?
- How many subagents should you spawn? (Kiro supports up to 10 parallel)
- What instructions should each subagent receive? (generic vs specific)
- Overlap and Gaps
- How do you ensure no files are missed between subagent tasks?
- What if a file belongs to multiple domains (e.g., shared types)?
- Should subagents have overlapping scopes for validation?
- Aggregation Logic
- Should you manually merge subagent outputs or ask Kiro to synthesize them?
- What structure should the final `ARCHITECTURE.md` follow?
- How do you cite which subagent discovered which fact?
Thinking Exercise
Exercise: Design Subagent Delegation
You’re analyzing a Django monolith with this structure:
project/
├── apps/ (8 Django apps: users, products, orders, payments, etc.)
├── core/ (shared models, middleware, utilities)
├── api/ (DRF API endpoints)
├── frontend/ (React SPA)
├── tests/ (pytest test suite)
└── infrastructure/ (Docker, k8s, Terraform)
Design a subagent strategy:
Subagent 1: ______________________
Task: ___________________________
Expected output: _________________
Subagent 2: ______________________
Task: ___________________________
Expected output: _________________
Subagent 3: ______________________
Task: ___________________________
Expected output: _________________
Questions while designing:
- How do you handle shared code in `core/` that all apps depend on?
- Should the API analysis be separate from app analysis, or combined?
- How do you prevent one subagent from analyzing the entire codebase redundantly?
The Interview Questions They’ll Ask
- “Explain how Kiro’s subagents have isolated context windows and why this matters.”
- “How would you decompose the task of ‘document this codebase’ into parallel subagent tasks?”
- “What are the tradeoffs between spawning many small subagents vs few large subagents?”
- “How do you handle conflicting information from different subagents?”
- “When should you use subagents vs when should you use the main agent with `/compact`?”
- “How would you verify that subagent-generated documentation is accurate?”
Hints in Layers
Hint 1: Start with Manual Decomposition
Don’t just say “analyze the codebase.” First, manually explore with ls -R or tree to understand the structure. Then write specific tasks: “Subagent 1: analyze all files in apps/web/, summarize routing and components.”
Hint 2: Use Focused Delegation Prompts
Each subagent needs a clear objective and output format. Example: “Analyze the authentication flow in services/auth/. Output: 1) Entry points, 2) Key functions, 3) Database models used, 4) External dependencies.”
Hint 3: Aggregate Incrementally
Don’t wait for all subagents to finish. As each completes, copy its findings into a draft ARCHITECTURE.md. This lets you spot gaps early and spawn follow-up subagents if needed.
Hint 4: Verify with Cross-References
After aggregation, use the main agent to verify facts. Example: “Subagent 2 claims the API uses JWT authentication. Can you confirm this by reading services/api/src/middleware/auth.ts?”
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Concurrent task execution | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 6: Concurrency |
| Distributed systems patterns | “Designing Data-Intensive Applications” by Kleppmann | Ch. 5: Replication |
| Code archaeology techniques | “Working Effectively with Legacy Code” by Feathers | Ch. 16: I Don’t Understand This Code |
Common Pitfalls & Debugging
Problem 1: “Subagents all analyzed the same files”
- Why: Delegation prompts were too vague (“analyze the frontend”)
- Fix: Be specific with directory scopes: “analyze only apps/web/”, “analyze only apps/mobile/”
- Quick test: Check each subagent’s output—file paths should not overlap significantly
Problem 2: “Subagent hallucinated architecture details”
- Why: Asked for high-level summary without grounding in actual files
- Fix: Require subagents to cite specific files and line numbers for claims
- Quick test: Manually verify 3-5 claims from each subagent’s output
Problem 3: “One subagent ran out of context”
- Why: Assigned too many files to a single subagent (e.g., “analyze all 200 components”)
- Fix: Split into smaller chunks or use /grep for reconnaissance before loading files
- Quick test: If a subagent’s task scope exceeds 50 files, consider splitting
Problem 4: “Aggregated document is incoherent”
- Why: Each subagent used different section structures
- Fix: Give all subagents a template: “Output format: ## Module Name\n### Purpose\n### Key Files\n### Dependencies”
- Quick test: All subagent outputs should have consistent markdown headings
Definition of Done
- Spawned at least 3 subagents with non-overlapping scopes
- Each subagent received a specific directory or module to analyze
- All subagents completed successfully (no context overflow errors)
- Aggregated findings into a single ARCHITECTURE.md document
- Document includes: high-level structure, module purposes, key entry points, dependencies
- Manually verified 5+ claims from subagent outputs (checked actual files)
- Cross-referenced between subagents to resolve conflicts
- Understand when to use subagents vs main agent with compaction
Project 5: “The Plan Architect” — Spec-Driven Development
| Attribute | Value |
|---|---|
| File | KIRO_CLI_MASTERY.md |
| Main Programming Language | Natural Language |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Spec-Driven Development |
What you’ll build: A small ToDo app built only after a /plan phase that outputs a tasks.md checklist.
Why it teaches Planning: Separates thinking from typing, reducing logic errors.
Success criteria:
- A complete plan exists before execution starts.
- Each task is checked off with verification notes.
Real World Outcome
You’ll build a complete ToDo application, but only after creating a detailed implementation plan using Kiro’s Planning agent (Shift+Tab or /plan). The planning phase produces a tasks.md checklist that guides implementation step-by-step.
Example Output:
$ mkdir todo-app && cd todo-app
$ kiro chat
You: "I want to build a simple todo app with React and Node.js backend"
Kiro: Let me help you plan this project. I'll enter planning mode to create a structured implementation plan.
[Entering Planning Mode - Shift+Tab pressed]
Plan Agent: I'll ask you some questions to refine your requirements.
**Question 1: Frontend Framework**
Which React setup do you prefer?
A) Create React App (simple, batteries-included)
B) Vite + React (faster, modern tooling)
C) Next.js (SSR, App Router)
Your choice: B
**Question 2: Backend API**
What backend approach?
A) REST API with Express
B) GraphQL with Apollo
C) tRPC (type-safe)
Your choice: A
**Question 3: Database**
Which database?
A) PostgreSQL (relational, production-ready)
B) SQLite (simple, file-based)
C) MongoDB (document-based)
Your choice: B
**Question 4: Authentication**
Do you need user authentication?
A) Yes, JWT-based
B) Yes, session-based
C) No, single-user app
Your choice: C
[Researching codebase... 0 existing files]
[Analyzing requirements...]
[Creating implementation plan...]
Plan created: tasks.md
$ cat tasks.md
# Todo App Implementation Plan
## Overview
Building a single-user todo application with:
- **Frontend**: Vite + React + TypeScript
- **Backend**: Express + SQLite
- **Features**: Create, Read, Update, Delete todos; Mark complete/incomplete
## Architecture
┌─────────────────┐         ┌─────────────────┐
│   React App     │ ◄─HTTP─►│   Express API   │
│   (Port 5173)   │         │   (Port 3000)   │
└─────────────────┘         └────────┬────────┘
                                     │
                                     ▼
                            ┌─────────────────┐
                            │   SQLite DB     │
                            │   (todos.db)    │
                            └─────────────────┘
## Task Breakdown
### Phase 1: Project Setup (30 minutes)
- [ ] Task 1.1: Initialize Vite React app with TypeScript
- [ ] Task 1.2: Set up Express server with TypeScript
- [ ] Task 1.3: Configure SQLite database with better-sqlite3
- [ ] Task 1.4: Create monorepo structure (frontend/, backend/)
### Phase 2: Backend API (1-2 hours)
- [ ] Task 2.1: Create database schema (todos table)
- [ ] Task 2.2: Implement GET /api/todos endpoint
- [ ] Task 2.3: Implement POST /api/todos endpoint
- [ ] Task 2.4: Implement PATCH /api/todos/:id endpoint (update)
- [ ] Task 2.5: Implement DELETE /api/todos/:id endpoint
- [ ] Task 2.6: Add CORS middleware for frontend
### Phase 3: Frontend UI (2-3 hours)
- [ ] Task 3.1: Create TodoList component
- [ ] Task 3.2: Create TodoItem component
- [ ] Task 3.3: Create AddTodo form component
- [ ] Task 3.4: Implement state management (useState + useEffect)
- [ ] Task 3.5: Add API client functions (fetch wrappers)
- [ ] Task 3.6: Style with Tailwind CSS or plain CSS
### Phase 4: Integration & Testing (1 hour)
- [ ] Task 4.1: Connect frontend to backend API
- [ ] Task 4.2: Test CRUD operations end-to-end
- [ ] Task 4.3: Handle loading states and errors
- [ ] Task 4.4: Add optimistic UI updates
## Demo Criteria
- User can add a todo with title and optional description
- User can mark todos as complete/incomplete (checkbox)
- User can delete todos
- Todos persist across page refreshes
- UI shows loading state during API calls
---
Plan Agent: This plan is ready for implementation. Would you like me to hand off to the execution agent? [Y/n]
You: Y
Kiro: [Exiting planning mode, entering execution mode]
I'll implement the plan task-by-task. Starting with Task 1.1...
[Implementation proceeds, checking off tasks in tasks.md as they complete]
The planning phase separated “what to build” from “how to build it,” reducing mistakes.
The Core Question You’re Answering
“How do I avoid jumping straight into coding without thinking through the architecture, leading to half-built features and refactoring nightmares?”
Before building this, understand: Most developers write code first, realize the design is flawed, and refactor. Kiro’s Planning agent inverts this: it asks clarifying questions, researches your codebase, and outputs a structured plan. This project teaches you spec-driven development—think deeply before typing.
Concepts You Must Understand First
Stop and research these before starting:
- Planning vs Execution Modes
- What’s the difference between the Plan agent and the main chat agent?
- Why does the Plan agent run in read-only mode?
- How do you transition from planning to execution?
- Reference: Subagents and Plan Agent Changelog
- Requirements Elicitation
- How does the Plan agent ask clarifying questions?
- What happens if you give vague requirements (“build an app”)?
- How do multiple-choice questions improve plan quality?
- Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 2 (A Pragmatic Approach)
- Task Decomposition
- What makes a good task? (specific, measurable, testable)
- How granular should tasks be? (30 min vs 4 hours)
- How do you organize tasks into phases?
- Book Reference: “Clean Architecture” by Robert C. Martin - Ch. 22 (The Clean Architecture)
Questions to Guide Your Design
Before entering planning mode, think through these:
- Project Scope
- Is this a prototype (2-3 days) or a production app (2-3 weeks)?
- What’s the minimum viable feature set?
- What can be added later vs must be included now?
- Plan Granularity
- Should tasks be 30 minutes each, or 2 hours each?
- How many phases should the plan have?
- Should you plan the entire app or just the first iteration?
- Verification Strategy
- How do you verify each task is “done”?
- What constitutes a passing demo for each phase?
- Should tasks include tests, or is manual verification OK?
Thinking Exercise
Exercise: Evaluate a Bad Plan vs Good Plan
Compare these two plans for the same todo app:
Bad Plan:
- [ ] Build frontend
- [ ] Build backend
- [ ] Connect them
- [ ] Test everything
Good Plan:
Phase 1: Backend API
- [ ] Create SQLite schema (todos table: id, title, completed, createdAt)
- [ ] Implement GET /api/todos (return JSON array)
- [ ] Test with curl: `curl http://localhost:3000/api/todos`
Phase 2: Frontend
- [ ] Create TodoList component (displays array of todos)
- [ ] Fetch todos from API on mount (useEffect)
- [ ] Test: Should show hardcoded test todos from backend
Questions while comparing:
- Which plan could a junior developer execute without asking questions?
- Which plan has measurable success criteria for each task?
- Which plan allows you to verify progress incrementally?
- Which plan would survive a week-long break and still be understandable?
The Interview Questions They’ll Ask
- “What are the benefits of separating planning from execution in software development?”
- “How does Kiro’s Plan agent differ from the main execution agent?”
- “What makes a good task in an implementation plan?” (Give 3 characteristics)
- “How would you handle a plan that’s too granular (100 tiny tasks) vs too vague (5 huge tasks)?”
- “When would you skip the planning phase and just start coding?”
- “How do you verify that a plan is complete before starting implementation?”
Hints in Layers
Hint 1: Start with Clear Requirements Don’t enter planning mode with “build an app.” Be specific: “Build a React todo app with Express backend, SQLite database, and no authentication.” The Plan agent needs constraints to create a useful plan.
Hint 2: Answer Questions Thoughtfully When the Plan agent asks multiple-choice questions, don’t rush. Each choice affects the entire plan. Choosing “Next.js” vs “Vite + React” changes 10+ tasks.
Hint 3: Review the Plan Before Executing
After the Plan agent generates tasks.md, read it fully. Ask: “Is anything missing? Are tasks too vague? Are dependencies clear?” Edit the plan before handing off to execution.
Hint 4: Check Off Tasks as You Go
Treat tasks.md like a real checklist. Mark tasks complete with ✓, add notes for deviations, and reference git commits. This creates an audit trail.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Requirements engineering | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 2: A Pragmatic Approach |
| Task decomposition | “Clean Architecture” by Robert C. Martin | Ch. 22: The Clean Architecture |
| Agile planning | “User Stories Applied” by Mike Cohn | Ch. 3: Writing Stories |
Common Pitfalls & Debugging
Problem 1: “Plan is too vague to execute”
- Why: Requirements given to Plan agent were generic (“build a web app”)
- Fix: Provide specific constraints: “React + Node.js, single-user, CRUD operations, no auth”
- Quick test: Hand the plan to a teammate—can they implement it without asking questions?
Problem 2: “Plan includes 50+ micro-tasks”
- Why: Asked for extreme detail or Plan agent over-decomposed
- Fix: Group related micro-tasks into larger tasks (e.g., “Set up database” instead of 5 separate schema tasks)
- Quick test: Each task should take 30 mins - 2 hours, not 5 minutes
Problem 3: “Implementation deviates from plan”
- Why: Discovered new requirements or constraints during coding
- Fix: Update tasks.md as you go: add new tasks, mark others as skipped, and document why
- Quick test: At the end of the project, tasks.md should reflect what actually happened
Problem 4: “Plan agent suggested wrong tech stack”
- Why: Didn’t answer clarifying questions carefully or gave conflicting requirements
- Fix: Re-enter planning mode, explicitly state tech stack: “Must use PostgreSQL, not SQLite”
- Quick test: Read the plan’s “Overview” section—does it match your actual requirements?
Definition of Done
- Entered planning mode with Shift+Tab or /plan
- Answered all clarifying questions from the Plan agent
- Received a tasks.md file with at least 10 specific tasks
- Tasks are organized into logical phases (setup, backend, frontend, integration)
- Each task has a clear success criterion or demo requirement
- Reviewed plan and verified no major features are missing
- Handed off to execution agent and implemented at least 3 tasks
- Checked off completed tasks in tasks.md as work progressed
Project 6: “The Custom Persona Generator” — Agent Configuration
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | JSON |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 2. Micro-SaaS (Specialized Agents) |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Agent Configuration |
What you’ll build: A security-auditor.json agent with read-only permissions and OWASP Top 10 in its prompt.
Why it teaches Custom Agents: You build specialized personas for specific SDLC phases.
Core challenges you’ll face:
- Configuring allowedTools.
- Injecting static resources.
Success criteria:
- The agent refuses to write files but can review code.
Real World Outcome
You’ll create a specialized security-auditor.json agent configuration that can analyze code for security vulnerabilities but cannot modify files. When you invoke this agent, it has the OWASP Top 10 embedded in its context and only has read access.
Example Output:
$ cat .kiro/agents/security-auditor.json
{
"name": "Security Auditor",
"description": "Reviews code for security vulnerabilities using OWASP Top 10 guidelines",
"model": "claude-opus-4",
"prompt": "You are a security auditor specializing in web application vulnerabilities. Your role is to identify security issues based on the OWASP Top 10:\n\n1. Broken Access Control\n2. Cryptographic Failures\n3. Injection\n4. Insecure Design\n5. Security Misconfiguration\n6. Vulnerable and Outdated Components\n7. Identification and Authentication Failures\n8. Software and Data Integrity Failures\n9. Security Logging and Monitoring Failures\n10. Server-Side Request Forgery (SSRF)\n\nAnalyze code for these vulnerabilities. Provide specific line numbers and remediation steps. NEVER modify code—only report findings.",
"tools": ["read", "grep", "glob"],
"allowedTools": ["read", "grep", "glob"],
"resources": [
"https://owasp.org/www-project-top-ten/"
]
}
$ kiro chat --agent security-auditor
Security Auditor: I'm the Security Auditor agent. I'll review your code for OWASP Top 10 vulnerabilities. I can read and analyze code but cannot modify files.
You: "Analyze src/api/auth.ts for security issues"
Security Auditor: [reads src/api/auth.ts]
**Security Audit Report**
File: src/api/auth.ts
**🔴 CRITICAL: SQL Injection (OWASP #3: Injection)**
- **Line 42:** Direct string concatenation in SQL query
  const query = `SELECT * FROM users WHERE email = '${req.body.email}'`;
- Risk: Attacker can inject malicious SQL (e.g., ' OR '1'='1)
- Fix: Use parameterized queries:
  const query = 'SELECT * FROM users WHERE email = ?';
  db.query(query, [req.body.email]);
🟠 HIGH: Weak Password Hashing (OWASP #2: Cryptographic Failures)
- Line 78: Using MD5 for password hashing
  const hash = crypto.createHash('md5').update(password).digest('hex');
- Risk: MD5 is cryptographically broken and fast to crack
- Fix: Use bcrypt with salt:
const hash = await bcrypt.hash(password, 10);
🟡 MEDIUM: Missing Rate Limiting (OWASP #7: Auth Failures)
- Line 15-20: Login endpoint has no rate limiting
- Risk: Brute force attacks on user accounts
- Fix: Add express-rate-limit middleware
✅ GOOD PRACTICES FOUND:
- JWT tokens have 1-hour expiry (line 92)
- HTTPS enforcement via middleware (line 12)
Summary: 3 vulnerabilities found (1 critical, 1 high, 1 medium)
You: “Fix the SQL injection”
Security Auditor: I cannot modify files. I’m configured with read-only access (allowedTools: [“read”, “grep”, “glob”]).
To fix this, you should:
- Exit this agent with /exit
- Use the main Kiro agent or a developer agent
- Implement the parameterized query fix I suggested above
The security-auditor agent enforces read-only behavior through allowedTools.
The Core Question You’re Answering
“How do I create specialized AI agents with specific expertise and restricted permissions for different phases of the development lifecycle?”
Before building this, understand: A single general-purpose agent can’t excel at everything. Kiro’s custom agents let you create personas with domain expertise (security, testing, documentation) and controlled permissions (read-only, write-only, specific directories). This project teaches you how to architect agent specialization.
Concepts You Must Understand First
Stop and research these before coding:
- Agent Configuration Schema
- What fields are required vs optional in agent JSON?
- How does the prompt field differ from the name and description fields?
- What’s the precedence: agent config → project config → global config?
- Reference: Agent Configuration Reference
- Tool Permissions
- What’s the difference between tools (available) and allowedTools (auto-approved)?
- How do wildcards work? (e.g., read* matches read, read_multiple)
- What happens if you request a tool not in allowedTools?
- Reference: Creating Custom Agents
- Resource Injection
- How do resources (URLs, file paths) get loaded into context?
- What’s the limit on resource size?
- Can resources be dynamically updated, or are they static at agent creation?
- Reference: Agent Examples
Questions to Guide Your Design
Before implementing, think through these:
- Agent Purpose
- What specific task should this agent excel at? (security, performance, docs)
- Should it be read-only, write-only, or full access?
- What domain knowledge must be embedded in the prompt?
- Permission Boundaries
- Which tools should be allowed? (read, write, grep, shell, MCP tools)
- Should it access the network? (web_fetch, MCP servers)
- Should it execute code? (shell, bash)
- Prompt Engineering
- How do you encode expertise in the prompt field?
- Should you include examples of good/bad patterns?
- How do you prevent the agent from ignoring its role?
Thinking Exercise
Exercise: Design Agents for SDLC Phases
For a typical software development lifecycle, design 3 specialized agents:
1. Agent Name: __________________
Purpose: __________________
Allowed Tools: __________________
Key Prompt Content: __________________
2. Agent Name: __________________
Purpose: __________________
Allowed Tools: __________________
Key Prompt Content: __________________
3. Agent Name: __________________
Purpose: __________________
Allowed Tools: __________________
Key Prompt Content: __________________
Questions while designing:
- Which phase needs write access, and which needs read-only?
- Should the testing agent be able to modify test files?
- Should the deployment agent have access to shell commands?
- How do you prevent the security agent from “fixing” code itself?
The Interview Questions They’ll Ask
- “Explain the difference between the tools and allowedTools fields in a Kiro agent config.”
- “How would you design a read-only code review agent vs a code-writing agent?”
- “What security considerations exist when giving an agent shell access?”
- “How do resources get injected into an agent’s context?”
- “When would you use multiple specialized agents vs one general-purpose agent?”
- “How would you test that an agent’s permissions are correctly enforced?”
Hints in Layers
Hint 1: Start with the Prompt Before configuring tools, write the prompt. Define the agent’s role, expertise, and constraints in natural language. Example: “You are a security auditor. NEVER modify code. Only report vulnerabilities.”
Hint 2: Test Permission Boundaries After creating the agent, intentionally try to violate its permissions. If it’s read-only, ask it to write a file. Verify it refuses or prompts for approval.
Hint 3: Use Resources for Static Knowledge
Don’t put 10KB of OWASP guidelines in the prompt. Put them in resources as a URL or file path. This keeps the prompt clean and allows updates without changing the agent config.
Hint 4: Model Selection Matters
Use claude-opus-4 for complex reasoning tasks (security analysis, architecture review). Use claude-haiku-4 for simple tasks (linting, formatting checks). This optimizes cost and speed.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Security best practices | “Practical Malware Analysis” by Sikorski & Honig | Ch. 1: Basic Static Techniques |
| Agent design patterns | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 7: While You Are Coding |
| Permission models | “Working Effectively with Legacy Code” by Feathers | Ch. 4: The Seam Model |
Common Pitfalls & Debugging
Problem 1: “Agent ignores its read-only role and suggests writing files”
- Why: Prompt doesn’t explicitly forbid writing; the agent sees write in its available tools
- Fix: Remove write from the tools array entirely, and add to the prompt: “You CANNOT modify files.”
- Quick test: Ask the agent to “fix this bug” and verify it refuses to write
Problem 2: “Agent requires approval for every read operation”
- Why: read is in tools but not in allowedTools
- Fix: Add "read" to the allowedTools array
- Quick test: kiro chat --agent security-auditor → agent should read files without prompting
Problem 3: “Resources file is too large (>50K)”
- Why: Loaded entire OWASP documentation as a local file
- Fix: Use a URL resource instead, or extract only relevant sections
- Quick test: Check agent context usage with /context show after startup
Problem 4: “Agent config not found”
- Why: File is not in the .kiro/agents/ directory or has incorrect JSON syntax
- Fix: Verify the path: .kiro/agents/security-auditor.json and validate the JSON with jq
- Quick test: kiro chat --agent security-auditor should load without an “agent not found” error
Definition of Done
- Created .kiro/agents/security-auditor.json with valid JSON
- Configured allowedTools to include only ["read", "grep", "glob"]
- Wrote a detailed prompt including OWASP Top 10 guidelines
- Added relevant resources (OWASP URL or security checklist)
- Tested the agent with kiro chat --agent security-auditor
- Verified the agent can read and analyze code files
- Verified agent refuses or prompts when asked to write/modify files
- Agent successfully identified at least 1 security issue in sample code
Project 7: “The Executable Spec with mdflow” — Literate Programming
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Markdown / Bash |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Literate Programming |
What you’ll build: A Markdown spec whose code blocks are executed and validated, keeping docs in sync with reality.
Why it teaches Executable Specs: Documentation that executes cannot rot.
Success criteria:
- The spec fails when code changes and passes after repair.
Real World Outcome
You’ll create a living specification document where every code example is automatically executed and validated. When your implementation changes, the spec either passes (proving docs are accurate) or fails (alerting you to update them).
Example: API Specification (api-spec.md)
# User Authentication API
## Creating a User
The `/api/users` endpoint accepts POST requests with email and password:
```bash
curl -X POST http://localhost:3000/api/users \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"secure123"}'
```
Expected response:
{
"id": "usr_abc123",
"email": "test@example.com",
"created_at": "2025-01-02T10:00:00Z"
}
When you run mdflow execute api-spec.md:
$ mdflow execute api-spec.md
Running: api-spec.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Block 1: curl POST /api/users
Status: 201 Created
Response matched expected JSON schema
✓ Block 2: Expected response validation
Field 'id' matches pattern: usr_[a-z0-9]+
Field 'email' equals: test@example.com
Field 'created_at' is valid ISO 8601
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
All blocks passed ✓ (2/2)
Execution time: 1.2s
When the API breaks:
$ mdflow execute api-spec.md
Running: api-spec.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✗ Block 1: curl POST /api/users
Status: 500 Internal Server Error
Expected: 201 Created
Response:
{
"error": "Database connection failed"
}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
FAILED: 1 of 2 blocks failed
Execution time: 0.8s
This forces you to either fix the implementation or update the spec. Documentation can never drift from reality.
The Core Question You’re Answering
“How do I ensure my documentation stays synchronized with my actual codebase as it evolves?”
Most documentation becomes outdated within weeks of writing. Code examples break, APIs change, but the docs remain frozen in time. This project addresses the fundamental problem: passive documentation rots, executable documentation validates itself.
By embedding executable tests directly in your specification, you create a contract that must be maintained. When the contract breaks, CI fails, forcing alignment.
Concepts You Must Understand First
Stop and research these before coding:
- Literate Programming
- What did Donald Knuth mean by “programs as literature”?
- How does weaving code with narrative improve understanding?
- Why is order of presentation different from order of execution?
- Book Reference: “Literate Programming” by Donald E. Knuth
- Test-Driven Documentation
- How do executable examples serve as both docs and tests?
- What makes a good assertion in documentation?
- When should examples be simplified vs realistic?
- Book Reference: “Growing Object-Oriented Software, Guided by Tests” Ch. 2
- Markdown Processing
- How do you parse and extract fenced code blocks?
- What metadata can be attached to code blocks (language, annotations)?
- How do you preserve line numbers for error reporting?
- Web Reference: CommonMark Specification - Fenced Code Blocks
Questions to Guide Your Design
Before implementing, think through these:
- Execution Model
- How do you isolate each code block’s execution environment?
- Should blocks share state, or run independently?
- How do you handle blocks that depend on previous outputs?
- What happens if block 3 fails—do you run block 4?
- Assertion Syntax
- How do users specify expected outputs (inline, separate blocks)?
- Do you support regex matching, JSON schema validation, or both?
- How do you handle non-deterministic outputs (timestamps, IDs)?
- Should exit codes alone determine success, or stdout comparison?
- Language Support
- How do you execute different languages (bash, python, curl)?
- Do you need sandboxing (Docker containers, chroot)?
- How do you manage dependencies (language runtimes, system packages)?
- Should you support custom interpreters per project?
Thinking Exercise
Trace: Multi-Step API Workflow
Given this specification:
## User Workflow
Create a user:
```bash
USER_ID=$(curl -s -X POST /api/users -d '{"email":"test@example.com"}' | jq -r .id)
```
Verify creation:
```bash
curl -X GET "/api/users/$USER_ID"
```
Expected: {"id":"$USER_ID","email":"test@example.com"}
*Questions while designing:*
- How do you propagate `$USER_ID` from block 1 to block 2?
- Should the spec run in a single shell session, or fresh shells per block?
- What if `USER_ID` is empty because block 1 failed—should block 2 run?
- How do you validate that the returned ID matches the captured variable?
**Design Decision Matrix:**
| Approach | Pros | Cons |
|----------|------|------|
| Single shell session | State persists naturally | Pollution between tests |
| Environment variables | Explicit data flow | Manual propagation |
| JSON output files | Language-agnostic | Filesystem clutter |
---
#### The Interview Questions They'll Ask
1. "How would you design a system to execute code blocks from Markdown while preserving security boundaries?"
2. "Explain the tradeoffs between making documentation executable versus keeping separate test suites."
3. "How do you handle non-deterministic outputs (timestamps, random IDs) in executable documentation?"
4. "What strategies prevent test pollution when documentation blocks depend on shared state?"
5. "How would you integrate this into CI/CD to fail builds when documentation drifts from implementation?"
6. "Describe how you'd support multiple programming languages in a single specification document."
---
#### Hints in Layers
**Hint 1: Start with a Parser**
Use a Markdown parser (like `markdown-it` in Node.js or `mistune` in Python) to extract fenced code blocks. Store metadata (language, line numbers) for each block.
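As a rough sketch, extraction doesn't even need a parser library: scanning for fence markers and remembering where each block starts is enough to get language tags, source, and line numbers for error reporting. Everything below (names, the `text` fallback, the `api-spec.md` filename) is illustrative, not a fixed API.

```python
import re
from dataclasses import dataclass

@dataclass
class CodeBlock:
    language: str
    source: str
    start_line: int  # 1-based line of the opening fence, for error reporting

def extract_blocks(markdown_text: str) -> list[CodeBlock]:
    """Collect fenced code blocks with their language tag and line number."""
    blocks, lines = [], markdown_text.splitlines()
    i = 0
    while i < len(lines):
        match = re.match(r"^```(\w+)?", lines[i])
        if match:
            lang, start, body = match.group(1) or "text", i + 1, []
            i += 1
            # Consume lines until the closing fence (or end of file)
            while i < len(lines) and not lines[i].startswith("```"):
                body.append(lines[i])
                i += 1
            blocks.append(CodeBlock(lang, "\n".join(body), start))
        i += 1
    return blocks

if __name__ == "__main__":
    spec = open("api-spec.md").read()
    for block in extract_blocks(spec):
        print(f"line {block.start_line}: {block.language} ({len(block.source)} chars)")
```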
**Hint 2: Execution Strategy**
For each code block:
- Write code to a temporary script file
- Execute using the appropriate interpreter (`bash`, `python3`, `node`)
- Capture stdout, stderr, and exit code
- Compare against expected outputs (if specified)
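A minimal sketch of that loop, assuming the interpreters in `INTERPRETERS` are on PATH and a 30-second timeout is acceptable; names and the result dictionary shape are placeholders you would adapt:

```python
import os
import subprocess
import tempfile

# Interpreters assumed to be installed; adjust per project.
INTERPRETERS = {"bash": ["bash"], "sh": ["sh"], "python": ["python3"]}

def run_block(language: str, source: str, timeout: int = 30) -> dict:
    """Write the block to a temp file, run it, and capture the result."""
    cmd = INTERPRETERS.get(language)
    if cmd is None:
        return {"skipped": True, "reason": f"no interpreter for {language!r}"}
    with tempfile.NamedTemporaryFile("w", suffix=f".{language}", delete=False) as tmp:
        tmp.write(source)
        path = tmp.name
    try:
        proc = subprocess.run(cmd + [path], capture_output=True, text=True, timeout=timeout)
        return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": f"timed out after {timeout}s"}
    finally:
        os.unlink(path)
```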
**Hint 3: State Management**
Create a temporary directory as a "sandbox workspace":
    /tmp/mdflow-session-abc123/
    ├── block-1.sh
    ├── block-1.stdout
    ├── block-2.sh
    └── shared.env   # Environment variables for state
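One way to wire this up, sketched under the assumption that state is carried only through exported shell variables; the session directory and helper names are illustrative. Each bash block is wrapped so it first sources the shared environment and then dumps its exports for the next block:

```python
import subprocess
from pathlib import Path

SESSION = Path("/tmp/mdflow-session")   # hypothetical sandbox workspace
SESSION.mkdir(exist_ok=True)
STATE_FILE = SESSION / "shared.env"

def run_bash_block(index: int, source: str) -> subprocess.CompletedProcess:
    """Run a bash block so exported variables survive into the next block."""
    script = SESSION / f"block-{index}.sh"
    script.write_text(
        "set -euo pipefail\n"
        f"if [ -f '{STATE_FILE}' ]; then source '{STATE_FILE}'; fi\n"
        f"{source}\n"
        # Persist exported variables for the next block.
        f"export -p > '{STATE_FILE}'\n"
    )
    return subprocess.run(["bash", str(script)], capture_output=True, text=True)
```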
**Hint 4: Assertion Annotations**
Support special comments for assertions:
```bash
curl /api/users/123
# expect-status: 200
# expect-json: {"id":"123"}
```
Parse these comments to build validation rules.
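A small, hedged example of such a parser and checker follows. The annotation names mirror the ones above; how `status` is captured is a runner design choice (HTTP status scraped from curl output for HTTP blocks, the exit code otherwise), so treat the signatures as placeholders:

```python
import json
import re

def parse_assertions(source: str) -> dict:
    """Pull `# expect-*:` annotations out of a code block's source."""
    assertions = {}
    for line in source.splitlines():
        match = re.match(r"#\s*expect-(\w+):\s*(.+)$", line.strip())
        if match:
            assertions[match.group(1)] = match.group(2).strip()
    return assertions

def check(assertions: dict, status: str, stdout: str) -> list[str]:
    """Return human-readable failures; an empty list means the block passed."""
    failures = []
    if "status" in assertions and str(status) != assertions["status"]:
        failures.append(f"expected status {assertions['status']}, got {status}")
    if "json" in assertions:
        expected, actual = json.loads(assertions["json"]), json.loads(stdout)
        if not all(actual.get(k) == v for k, v in expected.items()):
            failures.append(f"expected JSON fields {expected}, got {actual}")
    if "pattern" in assertions and not re.search(assertions["pattern"], stdout):
        failures.append(f"output did not match pattern {assertions['pattern']!r}")
    return failures
```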
---
#### Books That Will Help
| Topic | Book | Chapter |
|-------|------|---------|
| Literate Programming Philosophy | "Literate Programming" by Donald E. Knuth | Introduction & Ch. 1 |
| Test-Driven Development | "Test Driven Development: By Example" by Kent Beck | Part I |
| Markdown Parsing | "Crafting Interpreters" by Robert Nystrom | Ch. 4 (Scanning) |
| Documentation as Code | "Docs for Developers" by Jared Bhatti et al. | Ch. 6 |
---
#### Common Pitfalls & Debugging
**Problem 1: "Code blocks fail due to missing dependencies"**
- **Why:** Spec assumes tools are installed (curl, jq, etc.)
- **Fix:** Add a validation phase that checks for required binaries before execution
- **Quick test:** `command -v curl || echo "Missing curl"`
**Problem 2: "Non-deterministic outputs cause false failures"**
- **Why:** Timestamps, UUIDs, or random data change every run
- **Fix:** Support regex patterns or placeholder matching (`expect-pattern: usr_[a-z0-9]+`)
- **Quick test:** Replace exact matches with pattern assertions
**Problem 3: "State leaks between blocks"**
- **Why:** Environment variables, temp files, or database records persist
- **Fix:** Run each block in a fresh subprocess with isolated environment
- **Quick test:** Add `set -u` to bash blocks to catch undefined variables
**Problem 4: "Error messages don't point to the right line in the spec"**
- **Why:** You're losing line number context during extraction
- **Fix:** Store original line numbers when parsing, include them in error reports
- **Quick test:** `Error in api-spec.md:15 (block 2)`
---
#### Definition of Done
- [ ] Parser extracts all fenced code blocks with metadata (language, line numbers)
- [ ] Executor runs bash and at least one other language (Python or curl)
- [ ] Assertions validate exit codes and stdout/stderr content
- [ ] Failed blocks produce clear error messages with file/line references
- [ ] Spec execution stops on first failure (or continues with `--keep-going` flag)
- [ ] Environment isolation prevents state leaks between blocks
- [ ] README includes example spec demonstrating success and failure cases
- [ ] CI integration example shows how to fail builds on spec failures
---
### [Project 8: "The Property Based Testing Suite" — Advanced Testing](/guides/kiro-cli-mastery/P08-the-property-based-testing-suite-advanced-testing)
| Attribute | Value |
|-----------|-------|
| **File** | `KIRO_CLI_LEARNING_PROJECTS.md` |
| **Main Programming Language** | Python (Hypothesis) or TypeScript (fast-check) |
| **Coolness Level** | Level 4: Hardcore Tech Flex |
| **Difficulty** | Level 3: Advanced |
| **Knowledge Area** | Advanced Testing |
**What you'll build**: A booking system tested with PBT to prove no overlapping bookings.
**Why it teaches PBT**: It exposes subtle edge cases AI might miss.
**Success criteria**:
- PBT finds at least one real bug before you fix it.
---
#### Real World Outcome
You'll implement a room booking system where property-based testing automatically generates thousands of test cases, exposing edge cases like timezone boundaries, concurrent bookings, and off-by-one errors that example-based tests would miss.
**Example: Booking System Test Output**
```python
# test_booking.py
from hypothesis import given, strategies as st
from datetime import datetime, timedelta
from booking import BookingSystem
@given(
bookings=st.lists(
st.tuples(
st.datetimes(min_value=datetime(2025,1,1), max_value=datetime(2025,12,31)),
st.integers(min_value=1, max_value=8) # duration in hours
),
min_size=2,
max_size=50
)
)
def test_no_overlapping_bookings(bookings):
system = BookingSystem()
for start, duration in bookings:
end = start + timedelta(hours=duration)
system.book("room-A", start, end)
# Property: No two bookings should overlap
all_bookings = system.get_bookings("room-A")
for i, booking1 in enumerate(all_bookings):
for booking2 in all_bookings[i+1:]:
assert not booking1.overlaps(booking2), \
f"Overlap detected: {booking1} and {booking2}"
```
When you run the tests:
$ pytest test_booking.py -v
test_booking.py::test_no_overlapping_bookings FAILED
================================= FAILURES =================================
test_no_overlapping_bookings - AssertionError
Falsifying example:
bookings = [
(datetime(2025, 3, 15, 14, 0, 0), 2), # 14:00-16:00
(datetime(2025, 3, 15, 15, 59, 59), 1) # 15:59:59-16:59:59
]
AssertionError: Overlap detected:
Booking(start=2025-03-15 14:00:00, end=2025-03-15 16:00:00)
Booking(start=2025-03-15 15:59:59, end=2025-03-15 16:59:59)
Hypothesis found a counterexample after 147 test cases.
Shrunk input to minimal failing case.
The bug revealed: the booking system’s conflict check mishandled bookings that overlap by less than a full second, so both bookings were accepted. The corrected overlaps() uses the standard interval test:
def overlaps(self, other):
    # Two bookings overlap iff each one starts before the other ends
    return self.start < other.end and self.end > other.start
After fixing:
$ pytest test_booking.py -v
test_booking.py::test_no_overlapping_bookings PASSED
Hypothesis ran 100 test cases (2,847 examples total)
All properties hold ✓
Property-based testing generated 2,847 booking combinations and proved your invariant holds across all of them.
The Core Question You’re Answering
“How do I test properties that must hold for ALL possible inputs, not just the examples I thought of?”
Traditional example-based testing forces you to imagine edge cases. You write tests for:
- Normal case: 2pm-3pm
- Boundary case: Midnight
- Edge case: Leap year February 29th
But you’ll always miss combinations. Property-based testing inverts this: you state the invariant (no overlaps), and the framework generates inputs designed to break it.
This project teaches you to think in properties (universal truths) rather than examples (specific scenarios).
Concepts You Must Understand First
Stop and research these before coding:
- Property-Based Testing (PBT) vs Example-Based Testing
- What is a “property” in the context of testing?
- How does random generation differ from hand-crafted examples?
- What is “shrinking” and why is it critical for debugging?
- Book Reference: “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert - Ch. 1
- Test Generators and Strategies
- How do you define the space of valid inputs?
- What constraints ensure generated data is realistic?
- How do you generate dependent values (end time > start time)?
- Web Reference: Hypothesis Documentation - Strategies
- Invariants and Postconditions
- What makes a good invariant (universally true property)?
- How do you express “for all X, property P holds”?
- When should you test state transitions vs final outcomes?
- Book Reference: “Growing Object-Oriented Software, Guided by Tests” Ch. 19
Questions to Guide Your Design
Before implementing, think through these:
- System Properties
- What invariants must ALWAYS hold in your booking system?
- No overlapping bookings for the same room
- Booking end time > start time
- Cannot book in the past
- Total bookings <= room capacity
- Which of these can be violated by bad inputs vs implementation bugs?
- Test Data Generation
- How do you generate realistic datetime ranges?
- Should you test with timezones, or UTC only?
- How do you ensure generated bookings have variety (short, long, overnight)?
- Do you need to generate user IDs, or just time ranges?
- Shrinking Strategy
- When a test fails with 50 bookings, how do you find the minimal failing case?
- Should you shrink by removing bookings, or simplifying time ranges?
- How do you preserve the failure while reducing complexity?
Thinking Exercise
Property Discovery: Booking System Invariants
Given a booking system with this interface:
class BookingSystem:
def book(room_id, start, end) -> booking_id
def cancel(booking_id) -> bool
def get_bookings(room_id) -> List[Booking]
List all properties that should ALWAYS hold:
Temporal Properties:
- For any booking: booking.end > booking.start
- Cannot book a time in the past relative to system time
Collision Properties:
- No two active bookings for the same room overlap
- After canceling booking X, overlaps must be recalculated
State Properties:
- Total active bookings equals successful book() calls minus cancel() calls
- get_bookings() returns bookings in chronological order
Now design PBT tests for each:
| Property | Generator Strategy | Assertion |
|---|---|---|
| 1. End > Start | Generate (start, start + positive_delta) | assert booking.end > booking.start |
| 3. No overlaps | Generate list of (start, duration) tuples | Pairwise overlap check |
| 5. Booking count | Generate sequence of book/cancel actions | assert len(get_bookings) == expected |
The Interview Questions They’ll Ask
-
“Explain the difference between property-based testing and fuzzing. When would you use each?”
-
“How would you write a property-based test for a sorting algorithm without reimplementing the sort?”
-
“What strategies would you use to generate valid JSON that conforms to a specific schema?”
-
“Describe how shrinking works in Hypothesis/QuickCheck and why it’s essential for debugging.”
-
“How would you test a distributed system’s consistency guarantees using property-based testing?”
-
“What are the limitations of PBT? Name scenarios where example-based tests are superior.”
Hints in Layers
Hint 1: Start with Simple Properties Before testing complex booking logic, verify basic properties:
@given(st.datetimes(), st.timedeltas(min_value=timedelta(hours=1)))
def test_booking_duration_positive(start, duration):
end = start + duration
booking = Booking(start, end)
assert booking.duration() > timedelta(0)
Hint 2: Use Composite Strategies Generate bookings that meet domain constraints:
valid_booking = st.builds(
Booking,
start=st.datetimes(min_value=datetime(2025,1,1)),
duration=st.integers(min_value=1, max_value=8).map(lambda h: timedelta(hours=h))
)
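If you prefer the dependency between values to be explicit, a @st.composite strategy works too. This sketch assumes the Booking(start, end) constructor from the earlier test file; the bounds are illustrative:

```python
from datetime import datetime, timedelta
from hypothesis import strategies as st

from booking import Booking  # assumed class from the earlier test file

@st.composite
def valid_bookings(draw):
    # Draw a start, then derive the end so that end > start always holds.
    start = draw(st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 12, 31)))
    hours = draw(st.integers(min_value=1, max_value=8))
    return Booking(start, start + timedelta(hours=hours))

# Usage: @given(booking=valid_bookings())
```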
Hint 3: Test State Machines Model booking workflows as state transitions:
class BookingStateMachine(RuleBasedStateMachine):
@rule(start=datetimes(), duration=hours())
def book_room(self, start, duration):
self.system.book("room-A", start, start+duration)
# Invariant: check no overlaps after every booking
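Expanded into something pytest can collect, the same idea looks roughly like this; it assumes the BookingSystem and Booking interface from the earlier example, and the generator bounds are placeholders:

```python
from datetime import datetime, timedelta

from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule

from booking import BookingSystem  # assumed module from the earlier example


class BookingMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.system = BookingSystem()

    @rule(
        start=st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 12, 31)),
        hours=st.integers(min_value=1, max_value=8),
    )
    def book_room(self, start, hours):
        self.system.book("room-A", start, start + timedelta(hours=hours))

    @invariant()
    def no_overlaps(self):
        # Checked after every rule invocation
        bookings = self.system.get_bookings("room-A")
        for i, first in enumerate(bookings):
            for second in bookings[i + 1:]:
                assert not first.overlaps(second), f"{first} overlaps {second}"


TestBookings = BookingMachine.TestCase  # picked up by pytest
```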
Hint 4: Shrinking and Debugging When a test fails, Hypothesis automatically simplifies the input. Example:
Initial failure: 50 bookings
Shrunk to: 2 bookings (minimal reproduction)
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| PBT Fundamentals | “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert | Ch. 1-3 |
| Hypothesis (Python) | “Effective Python” by Brett Slatkin | Item 76 |
| QuickCheck (Haskell) | “Learn You a Haskell for Great Good!” by Miran Lipovača | Ch. 11 |
| State Machine Testing | “Hypothesis Documentation” (online) | Stateful Testing Guide |
Common Pitfalls & Debugging
Problem 1: “Tests pass locally but fail in CI due to timezone differences”
- Why: Generated datetimes assume local timezone
- Fix: Always use UTC for test data: st.datetimes(timezones=st.just(timezone.utc))
- Quick test: export TZ=America/New_York && pytest
Problem 2: “Hypothesis generates unrealistic edge cases (year 9999)”
- Why: Default datetime range is too broad
- Fix: Constrain generators to realistic bounds: min_value=datetime(2025,1,1), max_value=datetime(2030,12,31)
- Quick test: Add @settings(verbosity=Verbosity.verbose) to see generated values
Problem 3: “Test fails intermittently with different shrunk examples”
- Why: Property relies on system state (database, clock)
- Fix: Use deterministic seeds and isolated test fixtures
- Quick test: @given(...) @settings(derandomize=True)
Problem 4: “Shrinking takes too long (>30 seconds)”
- Why: Complex data structures with many interdependencies
- Fix: Simplify generators or use @settings(max_examples=50) during development
- Quick test: Monitor shrinking with --hypothesis-show-statistics
Definition of Done
- Implemented booking system with book(), cancel(), and get_bookings() methods
- Property test verifies no overlapping bookings (with Hypothesis generating 100+ examples)
- Property test found at least one real bug (documented in README)
- Tests use constrained datetime generation (realistic time ranges)
- Shrinking produces minimal failing examples (verified manually)
- README explains each property being tested and why it matters
- CI runs PBT with fixed seed for reproducible failures
- Coverage report shows all edge cases exercised by generated inputs
Project 9: “The Postgres Analyst” — Model Context Protocol (MCP)
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | SQL / JSON (Config) |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 3. Service & Support (Data Ops) |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Model Context Protocol (MCP) |
What you’ll build: Configure postgres-mcp in mcp.json and run queries via Kiro.
Why it teaches MCP: Kiro gains real, typed access to your database.
Core challenges you’ll face:
- Correct connection strings.
- Using read-only DB users.
Success criteria:
- Kiro answers schema-based questions by executing real SQL.
Real World Outcome
You’ll configure the PostgreSQL MCP server so Kiro can directly query your database to answer questions like “How many active users signed up in December?” without you writing SQL manually. Kiro inspects your schema, generates queries, and returns formatted results.
Configuration (.kiro/settings/mcp.json):
{
"mcpServers": {
"postgres-analytics": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres"],
"env": {
"POSTGRES_CONNECTION_STRING": "postgresql://readonly_user:password@localhost:5432/production_db"
}
}
}
}
When you ask Kiro to analyze data:
$ kiro chat
You: "How many users registered in the last 30 days?"
Kiro: Let me query the database to find that information.
[Tool: postgres-analytics/query]
SELECT COUNT(*) as user_count
FROM users
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
AND created_at < CURRENT_DATE;
Result:
┌────────────┐
│ user_count │
├────────────┤
│ 1,247 │
└────────────┘
There were 1,247 new user registrations in the last 30 days.
You: "Show me the top 5 products by revenue this month"
Kiro: [Tool: postgres-analytics/query]
SELECT
p.name,
SUM(oi.quantity * oi.price) as total_revenue
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE o.created_at >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY p.id, p.name
ORDER BY total_revenue DESC
LIMIT 5;
Result:
┌─────────────────────┬────────────────┐
│ name │ total_revenue │
├─────────────────────┼────────────────┤
│ Premium Plan (Year) │ $24,500.00 │
│ Pro Subscription │ $18,200.00 │
│ Enterprise License │ $15,000.00 │
│ Starter Kit │ $8,750.00 │
│ Add-on Module │ $3,200.00 │
└─────────────────────┴────────────────┘
The power: Kiro inspected your schema automatically and generated production-safe, read-only SQL. You never wrote a query manually.
The Core Question You’re Answering
“How can I give an AI assistant safe, structured access to my live database for analytics without risking data corruption?”
Traditional BI tools require learning query languages, dashboards become stale, and ad-hoc questions require engineering time. This project solves the paradox: grant database access without granting database risk.
By configuring MCP with a read-only user and schema introspection, Kiro becomes your personal data analyst that can’t break anything.
Concepts You Must Understand First
Stop and research these before coding:
- PostgreSQL Connection Strings
- What components make up a connection URI (user, host, port, database)?
- How do you specify SSL/TLS requirements in connection strings?
- What’s the difference between connection pooling and direct connections?
- Book Reference: “PostgreSQL: Up and Running” by Regina Obe - Ch. 2
- Database Permissions and Roles
- How do you create a read-only user in PostgreSQL?
- What’s the difference between GRANT SELECT and GRANT USAGE?
- How do you revoke write permissions (INSERT, UPDATE, DELETE)?
- Web Reference: PostgreSQL Documentation - GRANT
- MCP Server Configuration
- How does Kiro communicate with MCP servers (stdio vs HTTP)?
- What environment variables are passed to MCP server processes?
- How do you debug MCP server startup failures?
- Web Reference: Model Context Protocol Specification
Questions to Guide Your Design
Before implementing, think through these:
- Security Boundaries
- Should the MCP server connect as a read-only user, or use row-level security?
- How do you prevent Kiro from accessing sensitive tables (passwords, PII)?
- Should you use a separate analytics database (replica)?
- What happens if Kiro generates expensive queries (table scans)?
- Schema Discovery
- How does Kiro learn about your tables, columns, and relationships?
- Should you provide table descriptions as MCP resources?
- How do you handle dynamic schemas (frequent migrations)?
- Should Kiro see views, or only base tables?
- Query Safety
- How do you prevent queries that could time out (missing indexes)?
- Should you enforce query timeouts at the database or MCP level?
- How do you log all SQL executed by Kiro for audit purposes?
- What if Kiro generates syntactically correct but semantically wrong SQL?
Thinking Exercise
Scenario: Granting Safe Access
Given this database schema:
CREATE TABLE users (
id UUID PRIMARY KEY,
email VARCHAR(255) NOT NULL,
password_hash VARCHAR(255) NOT NULL, -- SENSITIVE
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
total DECIMAL(10,2),
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE sessions (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
token VARCHAR(255) NOT NULL, -- SENSITIVE
expires_at TIMESTAMP
);
Design a read-only role for Kiro:
Option 1: Table-Level Permissions
CREATE ROLE kiro_readonly WITH LOGIN PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE production_db TO kiro_readonly;
GRANT USAGE ON SCHEMA public TO kiro_readonly;
GRANT SELECT ON users, orders TO kiro_readonly;
-- Problem: What about the password_hash column in users?
Option 2: Column-Level Permissions
-- PostgreSQL doesn't support column-level GRANT SELECT
-- Must use views instead
CREATE VIEW users_safe AS
SELECT id, email, created_at FROM users;
GRANT SELECT ON users_safe, orders TO kiro_readonly;
-- Better: Kiro can't see password_hash
Option 3: Row-Level Security
ALTER TABLE sessions ENABLE ROW LEVEL SECURITY;
CREATE POLICY sessions_no_access ON sessions FOR SELECT
USING (false); -- Block all access to sessions table
GRANT SELECT ON sessions TO kiro_readonly;
-- Kiro sees the table exists but gets 0 rows
Which approach is best for analytics? Consider:
- Transparency (Kiro knows columns exist but can’t access)
- Maintenance (adding new tables requires updating grants)
- Performance (views add overhead)
The Interview Questions They’ll Ask
-
“How would you design a read-only database role that can access analytics tables but not PII columns?”
-
“Explain the security implications of giving an AI direct database access. What guardrails would you implement?”
-
“How do you prevent an AI from generating expensive queries that could impact production performance?”
-
“Describe the tradeoffs between using database views vs application-level filtering for sensitive data.”
-
“How would you audit and log all SQL queries executed by an AI assistant for compliance purposes?”
-
“What strategies would you use to handle schema changes without breaking the MCP server configuration?”
Hints in Layers
Hint 1: Create a Read-Only User
CREATE ROLE kiro_analyst WITH LOGIN PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE analytics_db TO kiro_analyst;
GRANT USAGE ON SCHEMA public TO kiro_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO kiro_analyst;
-- Ensure future tables are also read-only
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO kiro_analyst;
Hint 2: Test Permissions
psql -U kiro_analyst -d analytics_db -c "SELECT * FROM users LIMIT 1;"
# Should succeed
psql -U kiro_analyst -d analytics_db -c "DELETE FROM users WHERE id = '123';"
# Should fail: ERROR: permission denied for table users
Hint 3: Configure MCP with Connection String
{
"mcpServers": {
"postgres": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres"],
"env": {
"POSTGRES_CONNECTION_STRING": "postgresql://kiro_analyst:secure_password@localhost:5432/analytics_db?sslmode=require"
}
}
}
}
Hint 4: Verify MCP Server is Running
kiro chat
You: "/context show"
# Should show:
MCP Servers:
- postgres (stdio) - Connected ✓
Tools: query_postgres, describe_schema
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| PostgreSQL Permissions | “PostgreSQL: Up and Running” by Regina Obe | Ch. 3 (Roles and Privileges) |
| SQL Security | “SQL Antipatterns” by Bill Karwin | Ch. 15 (SQL Injection) |
| Database Design | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 2 (Data Models) |
| MCP Protocol | “Model Context Protocol Specification” (online) | Server Implementation Guide |
Common Pitfalls & Debugging
Problem 1: “MCP server fails to start with ‘connection refused’”
- Why: PostgreSQL isn’t accepting connections on the specified host/port
- Fix: Verify PostgreSQL is running: pg_isready -h localhost -p 5432
- Quick test: psql -U kiro_analyst -d analytics_db should connect
Problem 2: “Kiro can see schema but gets ‘permission denied’ on queries”
- Why: Role has USAGE on the schema but not SELECT on the tables
- Fix: GRANT SELECT ON ALL TABLES IN SCHEMA public TO kiro_analyst;
- Quick test: SELECT table_name FROM information_schema.role_table_grants WHERE grantee = 'kiro_analyst';
Problem 3: “Connection string in mcp.json exposed in logs”
- Why: Environment variables are logged during MCP server startup
- Fix: Use a .env file loaded by the MCP server wrapper script
- Quick test: Grep Kiro logs for password leaks
Problem 4: “Queries time out on large tables”
- Why: Kiro generated table scans without LIMIT clauses
- Fix: Set statement_timeout for the role: ALTER ROLE kiro_analyst SET statement_timeout = '30s';
- Quick test: SHOW statement_timeout; as kiro_analyst
Definition of Done
- Created read-only PostgreSQL role with SELECT-only permissions
- Verified the role cannot execute INSERT, UPDATE, DELETE, or DROP
- Configured postgres-mcp in .kiro/settings/mcp.json with a secure connection string
- Kiro can query the schema and execute SELECT statements successfully
- Tested that Kiro cannot access sensitive tables (sessions, password_hash columns)
- Set query timeout to prevent runaway queries (30 seconds)
- Documented all granted permissions in README
- Logged sample query outputs showing Kiro generating and executing SQL
Project 10: “The GitHub Project Manager” — Workflow Automation
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | JSON (Config) |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Workflow Automation |
What you’ll build: Integrate @modelcontextprotocol/server-github and generate issue summaries and PR drafts.
Why it teaches Tool Integration: Kiro bridges code and project management without context switching.
Success criteria:
- Kiro can summarize open issues with a label filter.
Real World Outcome
You’ll configure the GitHub MCP server so Kiro can list issues, search PRs, read file contents, and draft pull request descriptions—all without leaving the terminal. This turns Kiro into a project manager that understands both your code and your issue tracker.
Configuration (.kiro/settings/mcp.json):
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_TOKEN": "ghp_your_personal_access_token_here"
}
}
}
}
When you ask Kiro to manage issues:
$ kiro chat
You: "Summarize all open bugs with the 'critical' label"
Kiro: [Tool: github/search_issues]
Query: repo:your-org/your-repo is:issue is:open label:critical
[Tool: github/get_issue] (for each result)
**Critical Bugs Summary** (3 open)
1. **Database connection pool exhausted** (#342)
- Opened: 2 days ago by @alice
- Last updated: 4 hours ago
- Symptom: Production app crashes after ~1000 concurrent users
- No PR linked yet
2. **Payment gateway timeout on Stripe webhook** (#338)
- Opened: 5 days ago by @bob
- PR #340 (in review)
- Root cause: Missing retry logic in webhook handler
3. **Data corruption in user preferences migration** (#335)
- Opened: 1 week ago by @charlie
- Assigned to: @dave
- Rollback already deployed, fix in progress
**Priority Order Recommendation:**
- Fix #342 first (production outage risk)
- Merge PR #340 for #338 (already has solution)
- #335 can wait (rollback mitigates)
You: "Draft a PR description for fixing #342"
Kiro: [Tool: github/get_issue] (reads #342 full details)
[Tool: github/search_code] (finds connection pool config)
**Pull Request Draft:**
---
**Title:** Fix database connection pool exhaustion under high load
**Fixes:** #342
**Problem:**
The application crashes when concurrent user count exceeds ~1,000 due to database connection pool exhaustion. Current pool size is hardcoded to 10 connections.
**Root Cause:**
`config/database.js` sets `max: 10` without considering production load. Under load testing, connection wait time exceeded timeout threshold.
**Solution:**
- Increase connection pool size to 50 (based on DB max_connections=100)
- Add connection timeout of 5 seconds with retry logic
- Implement connection pool monitoring via Prometheus metrics
**Changes:**
- `config/database.js` - Updated pool configuration
- `lib/metrics.js` - Added pool health gauges
- `tests/load/connection-pool.test.js` - Load test validation
**Testing:**
- Load test with 2,000 concurrent users (sustained 10 min)
- Verified connection pool metrics in Grafana
- No connection timeouts observed
**Checklist:**
- [ ] Updated documentation (deployment guide)
- [ ] Added monitoring alerts for pool >80% utilization
- [ ] Tested rollback scenario
---
You: "Create the PR on branch 'fix/db-pool' targeting 'main'"
Kiro: [Tool: github/create_pull_request]
PR created: https://github.com/your-org/your-repo/pull/345
The power: Kiro read issue history, searched the codebase, and drafted a production-ready PR description without you switching between GitHub UI, terminal, and code editor.
The Core Question You’re Answering
“How do I integrate my issue tracker and code repository so I can manage projects conversationally without context switching?”
Traditional workflows require:
- Open GitHub in browser
- Read issue
- Switch to terminal
- Grep codebase for relevant files
- Switch back to browser
- Draft PR description (copying from terminal)
This project eliminates the context-switching tax by giving Kiro unified access to both code and project management.
Concepts You Must Understand First
Stop and research these before coding:
- GitHub Personal Access Tokens (PAT)
- What scopes does the GitHub MCP server require (repo, read:org)?
- How do you create a fine-grained token with minimal permissions?
- Where should you store tokens securely (never in git)?
- Web Reference: GitHub - Creating a Personal Access Token
- GitHub Search Syntax
- How do you filter issues by label, state, author, and date?
- What’s the difference between searching issues vs code?
- How do you use qualifiers like is:pr, is:open, label:bug?
- Web Reference: GitHub - Searching Issues and Pull Requests
- MCP Tools vs Resources
- What’s the difference between calling a tool and loading a resource?
- How do MCP tools return structured data (JSON schemas)?
- How do you handle pagination in GitHub API responses? (see the sketch after this list)
- Web Reference: Model Context Protocol - Tools
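The pagination point above is easy to get wrong, so here is a minimal sketch of it outside MCP: it calls GitHub's issue-search REST endpoint directly with `requests`, and assumes a `GITHUB_TOKEN` environment variable plus your own `owner/repo` value.

```python
import os
import requests

def search_open_critical_issues(repo: str):
    """Yield open 'critical' issues, following GitHub's page-based pagination."""
    headers = {
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    }
    query = f"repo:{repo} is:issue is:open label:critical"
    page = 1
    while page <= 10:  # the search API only exposes the first 1,000 results
        resp = requests.get(
            "https://api.github.com/search/issues",
            headers=headers,
            params={"q": query, "per_page": 100, "page": page},
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:
            break
        yield from items
        page += 1

for issue in search_open_critical_issues("your-org/your-repo"):
    print(issue["number"], issue["title"], issue["state"])
```

The same qualifiers (`is:issue`, `is:open`, `label:critical`) are what Kiro passes to the GitHub MCP server's search tool; the loop just makes the paging explicit.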
Questions to Guide Your Design
Before implementing, think through these:
- Token Permissions
- Should you use a classic PAT or fine-grained token?
- What’s the minimum scope needed (read-only issues, or also write PRs)?
- How do you rotate tokens without breaking the MCP server?
- Should you use organization-level tokens or personal tokens?
- Workflow Automation
- What tasks should Kiro handle automatically (summarize) vs require approval (create PR)?
- How do you prevent Kiro from creating duplicate issues?
- Should Kiro be able to close issues, or only comment?
- How do you audit all actions Kiro takes on GitHub?
- Context Boundaries
- How much issue history should Kiro load (all comments, or just descriptions)?
- Should Kiro analyze linked PRs and commits?
- How do you prevent Kiro from leaking private repo data in logs?
Thinking Exercise
Scenario: Triaging a Bug Report
Given this GitHub issue:
Title: App crashes on iOS 14 when uploading photos
Body:
Steps to reproduce:
1. Open app on iPhone 8 (iOS 14.8)
2. Tap "Upload Photo"
3. Select image from camera roll
4. App crashes immediately
Expected: Photo uploads successfully
Actual: App crashes, no error message
Environment:
- iPhone 8, iOS 14.8
- App version 2.3.1
Questions Kiro should investigate:
- Search for similar issues:
- Query: `repo:your-org/your-app is:issue label:ios "upload" "crash"`
- Are there duplicates or related bugs?
- Analyze codebase:
- Where is photo upload logic? (`grep -r "upload.*photo" src/`)
- What changed in version 2.3.1? (`git diff v2.3.0..v2.3.1`)
- Check iOS version compatibility:
- Does the app support iOS 14? (check package.json or podfile)
- Were there recent iOS SDK changes?
- Draft response:
- If it’s a duplicate: link to existing issue
- If it’s new: suggest debug steps (enable logging, check memory usage)
- If it’s a known limitation: explain iOS 14 support status
Kiro’s workflow should be:
1. Search issues → 2. Search code → 3. Check git history → 4. Draft comment
The Interview Questions They’ll Ask
-
“How would you design an AI assistant that can triage GitHub issues without creating spam or incorrect closures?”
-
“Explain the security implications of granting an AI write access to your GitHub repository. What guardrails would you implement?”
-
“How do you prevent an AI from leaking private repository information when generating responses?”
-
“Describe how you would implement approval workflows for AI-generated pull requests in a team environment.”
-
“What strategies would you use to keep GitHub integration working across API version changes and rate limits?”
Hints in Layers
Hint 1: Create a GitHub Personal Access Token
- Go to GitHub Settings → Developer settings → Personal access tokens → Fine-grained tokens
- Set repository access (all repos or specific ones)
- Grant permissions: `Contents: Read`, `Issues: Read/Write`, `Pull Requests: Read/Write`
- Copy the token and store it in a `.env` file (never commit it to git)
Hint 2: Configure MCP Server
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_TOKEN": "${GITHUB_TOKEN}" // Loaded from environment
}
}
}
}
Hint 3: Test GitHub Integration
kiro chat
You: "List the 5 most recently updated issues in my repo"
# Kiro should call: github/search_issues
# You should see a list of issues with titles, numbers, and states
Hint 4: Draft a PR Description Template
Create a steering file (.kiro/steering/github-pr-template.md):
# Pull Request Template
Always include these sections when drafting PRs:
- **Problem:** What issue does this solve?
- **Solution:** High-level approach
- **Changes:** Files modified
- **Testing:** How to verify
- **Checklist:** Deployment steps
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| GitHub Workflow | “Pro Git” by Scott Chacon | Ch. 6 (GitHub) |
| API Integration | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 4 (Encoding) |
| Automation Best Practices | “The DevOps Handbook” by Gene Kim et al. | Part III (Flow) |
Common Pitfalls & Debugging
Problem 1: “MCP server fails with ‘Bad credentials’“
- Why: GitHub token is invalid or expired
- Fix: Regenerate the token with correct scopes and update the `.env` file
- Quick test: `curl -H "Authorization: token YOUR_TOKEN" https://api.github.com/user`
Problem 2: “Kiro can read issues but cannot create PRs”
- Why: Token lacks the `Pull Requests: Write` permission
- Fix: Edit token permissions in GitHub settings
- Quick test: `gh pr create --title "Test" --body "Test"` (using the GitHub CLI)
Problem 3: “Rate limit exceeded errors”
- Why: GitHub API has rate limits (5,000 requests/hour for authenticated)
- Fix: Implement caching or reduce query frequency
- Quick test: Check remaining quota: `curl -H "Authorization: token YOUR_TOKEN" https://api.github.com/rate_limit`
Problem 4: “Kiro creates PR with wrong base branch”
- Why: Default branch is not ‘main’ in your repo (might be ‘master’ or ‘develop’)
- Fix: Specify base branch explicitly in steering file or prompt
- Quick test: `git remote show origin | grep "HEAD branch"`
Definition of Done
- Created GitHub Personal Access Token with appropriate scopes
- Configured `github-mcp` in `.kiro/settings/mcp.json` with a secure token
- Verified Kiro can list and search issues/PRs successfully
- Kiro can read file contents from repository
- Drafted at least one PR description using issue context + codebase search
- Tested that token has minimal required permissions (no admin access)
- Created steering file with PR description template
- Documented all GitHub integration workflows in README
Project 11: “The AWS Cloud Architect” — Cloud Infrastructure
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Terraform / AWS CLI |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Cloud Infrastructure |
What you’ll build: Use AWS MCP to audit S3 buckets and generate Terraform fixes.
Why it teaches Cloud Ops: Kiro becomes an infra reviewer with live state access.
Success criteria:
- Generated Terraform closes at least one real misconfiguration.
Real World Outcome
You’ll configure the AWS MCP server so Kiro can audit your cloud infrastructure, identify security misconfigurations (public S3 buckets, overly permissive IAM policies), and generate Terraform code to remediate them—turning Kiro into an automated security auditor.
Configuration (.kiro/settings/mcp.json):
{
"mcpServers": {
"aws": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-aws"],
"env": {
"AWS_PROFILE": "production-readonly",
"AWS_REGION": "us-east-1"
}
}
}
}
When you ask Kiro to audit infrastructure:
$ kiro chat
You: "Audit all S3 buckets for public access and generate Terraform fixes"
Kiro: [Tool: aws/list_s3_buckets]
Found 47 S3 buckets. Checking public access settings...
[Tool: aws/get_bucket_policy] (for each bucket)
[Tool: aws/get_bucket_acl]
**Security Audit Results:**
🚨 **Critical Issues Found (3 buckets):**
1. **app-logs-production** (public read access)
- Bucket Policy: Allows s3:GetObject for Principal: "*"
- Risk: Sensitive application logs exposed to internet
- Created: 2023-05-12
- Size: 2.4 TB
2. **user-uploads-staging** (public write access)
- ACL: Grant WRITE to AllUsers
- Risk: Anyone can upload files (potential malware hosting)
- Created: 2024-08-03
- Size: 340 GB
3. **backup-configs** (public list access)
- Bucket Policy: Allows s3:ListBucket for Principal: "*"
- Risk: Attackers can enumerate backup file names
- Created: 2022-11-20
- Size: 56 GB
**Generating Terraform Fixes...**
Created: `terraform/s3-security-fixes.tf`
```hcl
# Fix for app-logs-production: Remove public access
resource "aws_s3_bucket_public_access_block" "app_logs_production" {
bucket = "app-logs-production"
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_policy" "app_logs_production" {
bucket = "app-logs-production"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "DenyPublicAccess"
Effect = "Deny"
Principal = "*"
Action = "s3:GetObject"
Resource = "arn:aws:s3:::app-logs-production/*"
Condition = {
StringNotEquals = {
"aws:PrincipalOrgID" = "o-abc12345" # Your AWS Organization ID
}
}
}
]
})
}
# Fix for user-uploads-staging: Remove public ACL
resource "aws_s3_bucket_acl" "user_uploads_staging" {
bucket = "user-uploads-staging"
acl = "private"
}
# Fix for backup-configs: Block all public access
resource "aws_s3_bucket_public_access_block" "backup_configs" {
bucket = "backup-configs"
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
```

Apply with:

```bash
cd terraform/
terraform plan -out=security-fixes.tfplan
terraform apply security-fixes.tfplan
```

Estimated Impact:
- Fixes 3 critical security vulnerabilities
- Protects 2.8 TB of sensitive data
- Prevents potential compliance violations (GDPR, SOC 2)
The power: Kiro queried live AWS state, identified real security issues, and generated production-ready Terraform code to fix them.
The Core Question You’re Answering
“How can I continuously audit cloud infrastructure for security misconfigurations without manually reviewing every resource?”
Traditional cloud auditing requires:
- Manually checking AWS Console
- Writing custom scripts to query APIs
- Remembering which settings to check
- Manually drafting Terraform fixes
This project automates the entire workflow: Kiro becomes your infrastructure security engineer that never sleeps.
Concepts You Must Understand First
Stop and research these before coding:
- AWS IAM Policies and Permissions
- What’s the difference between identity-based and resource-based policies?
- How do you create a read-only IAM role for infrastructure auditing?
- What permissions are needed to list and describe S3 buckets?
- Book Reference: “AWS Security” by Dylan Shield - Ch. 2
- S3 Security Model
- What’s the difference between Bucket Policies, ACLs, and Public Access Blocks?
- How do you check if a bucket is publicly accessible?
- What are the risks of public S3 buckets (data leaks, malware hosting)?
- Web Reference: AWS S3 Security Best Practices
- Terraform State Management
- How do you import existing AWS resources into Terraform state?
- What’s the difference between `terraform plan` and `terraform apply`?
- How do you manage Terraform state for team collaboration (remote backends)?
- Book Reference: “Terraform: Up & Running” by Yevgeniy Brikman - Ch. 3
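To make the S3 questions above concrete, here is a minimal boto3 sketch, assuming the read-only `production-readonly` profile used throughout this project. It flags buckets whose public access block is missing or incomplete, which is the same class of check the AWS MCP server runs for you.

```python
import boto3
from botocore.exceptions import ClientError

session = boto3.Session(profile_name="production-readonly")
s3 = session.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        cfg = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
        fully_blocked = all(cfg.values())  # all four flags must be True
    except ClientError as err:
        # A bucket with no configuration at all is effectively unblocked
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            fully_blocked = False
        else:
            raise
    if not fully_blocked:
        print(f"WARNING {name}: public access block is missing or incomplete")
```

Bucket policies and ACLs still need separate checks; the public access block is only the first gate.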
Questions to Guide Your Design
Before implementing, think through these:
- Audit Scope
- Should Kiro audit all AWS regions, or just production regions?
- What resources should be audited (S3, IAM, EC2 security groups)?
- How do you prevent Kiro from modifying critical infrastructure?
- Should audits run on-demand or scheduled (cron)?
- Security Boundaries
- What IAM permissions should the AWS MCP server have (read-only)?
- How do you prevent Kiro from accidentally deleting resources?
- Should Kiro be able to apply Terraform changes, or just generate code?
- How do you audit Kiro’s own actions (CloudTrail logs)?
- Terraform Generation
- Should generated Terraform use modules or raw resources?
- How do you handle resources not managed by Terraform (manual imports)?
- What naming conventions for generated Terraform files?
- Should Kiro check if a resource is already in Terraform state before generating? (see the sketch below)
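For that last question, one minimal approach is to ask Terraform itself. This sketch shells out to `terraform state list` and assumes Terraform is installed and run from the directory that owns the state; the resource address is the one generated earlier in this project.

```python
import subprocess

def managed_resources(terraform_dir: str) -> set[str]:
    """Return the set of resource addresses already tracked in Terraform state."""
    result = subprocess.run(
        ["terraform", "state", "list"],
        cwd=terraform_dir, capture_output=True, text=True, check=True,
    )
    return set(result.stdout.split())

state = managed_resources("terraform/")
address = "aws_s3_bucket_public_access_block.app_logs_production"
if address in state:
    print(f"{address} is already managed; skip generating a duplicate resource")
else:
    print(f"{address} not in state; generate the fix and `terraform import` the bucket first")
```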
Thinking Exercise
Scenario: Multi-Account Audit
Given this AWS organization structure:
Organization (o-abc12345)
├── Production Account (123456789012)
│ ├── S3 buckets (50)
│ ├── IAM roles (200)
│ └── Security groups (300)
├── Staging Account (234567890123)
│ ├── S3 buckets (30)
│ └── IAM roles (100)
└── Development Account (345678901234)
├── S3 buckets (20)
└── IAM roles (50)
Questions to consider:
Cross-Account Access:
- How do you configure Kiro to assume roles in different accounts?
- Should you use AWS SSO, or cross-account IAM roles?
- What’s the minimum permission set for read-only auditing?
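One way to answer the cross-account questions above is an STS assume-role helper. The `KiroAuditor` role name and the account IDs below are the placeholders from this exercise, not real values, and the caller's default credentials must be allowed to assume the role.

```python
import boto3

def readonly_session(account_id: str) -> boto3.Session:
    """Assume the read-only auditor role in a member account and return a scoped session."""
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=f"arn:aws:iam::{account_id}:role/KiroAuditor",
        RoleSessionName="kiro-audit",
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

for account in ("123456789012", "234567890123", "345678901234"):
    s3 = readonly_session(account).client("s3")
    print(account, [b["Name"] for b in s3.list_buckets()["Buckets"]])
```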
Audit Strategy:
# Option 1: Sequential per account
for account in prod staging dev; do
kiro chat --prompt "Audit S3 in $account account"
done
# Option 2: Parallel with subagents
kiro chat --prompt "Audit S3 across all accounts in parallel"
Terraform State Organization:
- One Terraform state per account, or unified state?
- How do you prevent state conflicts when fixing the same resource type across accounts?
The Interview Questions They’ll Ask
-
“How would you design an automated cloud security auditing system that scales across hundreds of AWS accounts?”
-
“Explain the security implications of giving an AI access to your AWS environment. What guardrails would you implement?”
-
“How would you prevent an AI from generating Terraform code that accidentally deletes critical infrastructure?”
-
“Describe the difference between Terraform plan and apply, and why automation should never skip the plan step.”
-
“How would you implement approval workflows for AI-generated infrastructure changes in a regulated industry?”
-
“What strategies would you use to handle Terraform drift between generated code and manually modified resources?”
Hints in Layers
Hint 1: Create Read-Only AWS IAM Role
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets",
"s3:GetBucketPolicy",
"s3:GetBucketAcl",
"s3:GetBucketPublicAccessBlock",
"iam:ListRoles",
"iam:GetRole",
"iam:GetRolePolicy"
],
"Resource": "*"
}
]
}
Hint 2: Configure AWS Credentials
# ~/.aws/config
[profile production-readonly]
role_arn = arn:aws:iam::123456789012:role/KiroAuditor
source_profile = default
region = us-east-1
Hint 3: Test AWS MCP Server
kiro chat
You: "List all S3 buckets in the current AWS account"
# Should return:
# - bucket-1
# - bucket-2
# - ...
Hint 4: Generate Terraform Incrementally
Create a steering file (.kiro/steering/terraform-standards.md):
# Terraform Code Generation Standards
When generating Terraform:
1. Use `aws_s3_bucket_public_access_block` for blocking public access
2. Always include `depends_on` for resource dependencies
3. Add comments explaining WHY each resource is needed
4. Use variables for account IDs and region names
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| AWS Security | “AWS Security” by Dylan Shield | Ch. 2-4 (IAM, S3 Security) |
| Terraform | “Terraform: Up & Running” by Yevgeniy Brikman | Ch. 3-5 |
| Infrastructure as Code | “Infrastructure as Code” by Kief Morris | Ch. 7 (Security) |
| Cloud Security | “Cloud Security and Privacy” by Tim Mather et al. | Ch. 4 (Storage Security) |
Common Pitfalls & Debugging
Problem 1: “AWS MCP server fails with ‘AccessDenied’“
- Why: IAM role lacks required permissions
- Fix: Add `s3:ListAllMyBuckets` to the role policy
- Quick test: `aws s3 ls --profile production-readonly`
Problem 2: “Terraform plan shows unrelated changes”
- Why: Kiro is unaware of existing Terraform state
- Fix: Run `terraform import` for existing resources before generating fixes
- Quick test: `terraform state list` shows the imported resources
Problem 3: “Generated Terraform uses hardcoded account IDs”
- Why: Kiro doesn’t know to parameterize account-specific values
- Fix: Add a steering rule: “Use `data.aws_caller_identity.current.account_id`”
- Quick test: Generated code includes `data "aws_caller_identity"`
Problem 4: “Kiro recommends changes that would break production”
- Why: Kiro doesn’t understand resource dependencies
- Fix: Use read-only mode, require manual review before applying
- Quick test: Only generate code, never auto-apply
Definition of Done
- Created read-only AWS IAM role with S3 and IAM describe permissions
- Configured the AWS MCP server in `.kiro/settings/mcp.json` with the correct profile
- Verified Kiro can list S3 buckets and read bucket policies
- Kiro identified at least one real security misconfiguration
- Generated Terraform code that passes `terraform validate`
- Terraform plan shows only the expected changes (no unintended modifications)
- Documented the audit workflow in README
- Created steering file with Terraform generation standards
Project 12: “The Documentation Librarian” — RAG
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Python (mcp-server-rag) |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | RAG (Retrieval Augmented Generation) |
What you’ll build: A custom MCP server that indexes PDFs or internal docs for question answering.
Why it teaches RAG: You extend Kiro with private knowledge beyond its training data.
Success criteria:
- Kiro answers a question that is only in your private docs.
Real World Outcome
You’ll build a custom MCP server that indexes your internal documentation (PDFs, Markdown, Confluence pages) using embeddings, then lets Kiro search and answer questions from this private knowledge base—extending Kiro’s knowledge beyond its training data.
Example: Indexing Internal Documentation
$ python docs-mcp-server.py index ./company-docs/
Indexing documentation...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📄 engineering-handbook.pdf (142 pages) ✓ Embedded
📄 api-reference.md (2,340 lines) ✓ Embedded
📄 deployment-guide.pdf (68 pages) ✓ Embedded
📄 security-policies.md (890 lines) ✓ Embedded
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total chunks: 1,847
Vector DB: ./company-docs/.vectordb/
Ready for queries ✓
Configuration (.kiro/settings/mcp.json):
{
"mcpServers": {
"docs": {
"command": "python3",
"args": ["./docs-mcp-server.py", "serve"],
"env": {
"VECTOR_DB_PATH": "./company-docs/.vectordb",
"EMBEDDING_MODEL": "text-embedding-3-small"
}
}
}
}
When you ask Kiro questions from internal docs:
$ kiro chat
You: "What's our process for deploying to production?"
Kiro: [Tool: docs/search_knowledge_base]
Query: "production deployment process"
Top 3 results:
[Tool: docs/retrieve_context]
Based on the Engineering Handbook (Section 7.2) and Deployment Guide (Chapter 3):
**Production Deployment Process:**
1. **Pre-Deployment Checklist** (Eng Handbook p.67)
- [ ] PR approved by 2+ engineers
- [ ] All tests passing in CI
- [ ] Database migration tested in staging
- [ ] Feature flags configured
2. **Deployment Window** (Deployment Guide p.24)
- Scheduled: Tuesday/Thursday 10 AM PST
- Emergency: Requires VP Engineering approval
- Avoid: Friday afternoon, holiday weeks
3. **Rollout Strategy** (Deployment Guide p.31)
- Blue-green deployment with AWS ECS
- 10% traffic → 50% → 100% over 2 hours
- Automatic rollback if error rate >0.1%
4. **Post-Deployment** (Eng Handbook p.71)
- Monitor DataDog for 30 minutes
- Check Sentry for new error spikes
- Notify #production-deploys Slack channel
**Emergency Rollback:**
```bash
./scripts/rollback.sh --environment production --revision previous
```
Sources:
- Engineering Handbook (2024-Q4), Section 7.2
- Deployment Guide v3.1, Chapters 3-4
You: “What are the approved third-party libraries for authentication?”
Kiro: [Tool: docs/search_knowledge_base] Query: “approved authentication libraries security policy”
According to Security Policies (Section 4.5 “Approved Dependencies”):
Authentication Libraries:
✅ Approved:
- `passport` v0.7+ (Node.js) - OAuth, JWT, SAML
- `authlib` v1.2+ (Python) - OAuth 2.0, OpenID Connect
- `spring-security` v6.1+ (Java) - Enterprise SSO
❌ Prohibited:
- Custom-built JWT libraries (use approved only)
- Deprecated: `oauth2-server` (EOL 2023)
- Unvetted: Any library not on the approved list
Approval Process: (Security Policies p.89)
- Submit request to #security-review
- Security team evaluates: License, maintenance, CVE history
- Approval SLA: 5 business days
Current Exception: (Added 2024-12-01)
- `next-auth` v5.0+ approved for internal tools only (not customer-facing)
Source: Security Policies v2.3, Section 4.5
**The power:** Kiro accessed information that exists ONLY in your internal documentation, not in its training data. It cited exact page numbers and versions.
---
#### The Core Question You're Answering
> "How can I give an AI access to my company's internal knowledge base without uploading sensitive documents to third-party services?"
Traditional solutions have problems:
- **Shared drives:** Documents exist but aren't searchable conversationally
- **Confluence:** Requires manual searching and reading
- **Uploading to ChatGPT:** Violates data privacy policies
This project solves it: **Build a local RAG system that keeps your data on-premises while enabling AI-powered search**.
---
#### Concepts You Must Understand First
**Stop and research these before coding:**
1. **Embeddings and Vector Similarity**
- What is an embedding (vector representation of text)?
- How do you measure similarity between vectors (cosine similarity, dot product)?
- Why are embeddings better than keyword search for semantic matching?
- *Book Reference:* "Speech and Language Processing" by Jurafsky & Martin - Ch. 6
2. **Vector Databases**
- What's the difference between traditional databases and vector databases?
- How do vector indexes work (HNSW, IVF)?
- When do you use in-memory (FAISS) vs persistent (Chroma, Pinecone)?
- *Web Reference:* [Pinecone - What is a Vector Database?](https://www.pinecone.io/learn/vector-database/)
3. **Retrieval Augmented Generation (RAG)**
- What's the difference between RAG and fine-tuning?
- How do you chunk documents for optimal retrieval (size, overlap)?
- What's the tradeoff between context window size and retrieval accuracy?
- *Web Reference:* [LangChain RAG Documentation](https://python.langchain.com/docs/use_cases/question_answering/)
---
#### Questions to Guide Your Design
**Before implementing, think through these:**
1. **Document Processing**
- How do you extract text from PDFs (PyPDF2, pdfplumber, or OCR)?
- Should you split documents by page, paragraph, or semantic chunks?
- How do you preserve metadata (source file, page number, section heading)?
- What happens with images or tables in documents?
2. **Chunking Strategy**
- What chunk size optimizes retrieval (512 tokens, 1000 tokens)?
- Should chunks overlap to avoid splitting important context?
- How do you handle code blocks vs prose (different chunking strategies)?
- Should you create multiple chunk sizes for different query types?
3. **Embedding and Retrieval**
- Which embedding model (OpenAI, Sentence Transformers, local models)?
- How many top-k results to retrieve (3, 5, 10)?
- Should you re-rank results after initial retrieval?
- How do you handle queries that don't match any documents?
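For the last question above, a common guard is a similarity-score cutoff. The sketch below is minimal and assumes the Chroma store used in the hints later in this project; note that Chroma's default score is a distance, so lower means more similar, and the cutoff value is an assumption you must tune on your own corpus.

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

NO_MATCH_CUTOFF = 0.45  # assumed value: tune against your own documents

def retrieve_or_refuse(query: str, top_k: int = 3):
    vectordb = Chroma(persist_directory=".vectordb",
                      embedding_function=OpenAIEmbeddings())
    results = vectordb.similarity_search_with_score(query, k=top_k)
    # Keep only chunks that are close enough; lower distance = more similar
    good = [(doc, score) for doc, score in results if score <= NO_MATCH_CUTOFF]
    if not good:
        return {"answer": None, "reason": "No sufficiently similar documents found"}
    return {"chunks": [doc.page_content for doc, _ in good]}
```

Returning an explicit "no match" result lets the system prompt instruct Kiro to say "I don't know" instead of improvising.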
---
#### Thinking Exercise
### Scenario: Chunking Strategy
Given this internal documentation snippet:
```markdown
# Deployment Guide
## Chapter 3: Production Deploys
### 3.1 Pre-Deployment Checklist
Before deploying to production, verify:
1. All tests pass in CI/CD pipeline
2. Database migrations tested in staging
3. Feature flags configured
### 3.2 Deployment Window
Production deploys occur:
- Scheduled: Tuesday/Thursday 10 AM PST
- Emergency: Requires VP approval
### 3.3 Rollout Strategy
We use blue-green deployment:
1. Deploy to blue environment
2. Route 10% traffic
3. Monitor for 30 minutes
4. Gradually increase to 100%
```
Questions while designing chunking:
Option 1: By Section (Heading-Based)
Chunk 1: "Chapter 3: Production Deploys ... 3.1 Pre-Deployment Checklist ... verify: 1. All tests..."
Chunk 2: "3.2 Deployment Window ... Production deploys occur: ..."
Chunk 3: "3.3 Rollout Strategy ... We use blue-green deployment: ..."
Option 2: Fixed Token Size (500 tokens)
Chunk 1: "Chapter 3... 3.1 Pre-Deployment... 3.2 Deployment Window... (cut mid-section)"
Chunk 2: "...Window ... Tuesday/Thursday... 3.3 Rollout... blue-green deployment..."
Option 3: Semantic (Paragraph-Based with Overlap)
Chunk 1: "Chapter 3... 3.1 Pre-Deployment Checklist... verify: 1. All tests..."
Chunk 2: "3.1 (last paragraph)... 3.2 Deployment Window... Scheduled: Tuesday..."
Chunk 3: "3.2 (last paragraph)... 3.3 Rollout Strategy... blue-green..."
Which is best for this query: “What days can I deploy to production?”
- Option 1 ✅ - Section 3.2 is intact
- Option 2 ❌ - Answer split across chunks
- Option 3 ✅ - Overlap ensures context
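If Option 1 appeals to you, you do not have to write a heading parser yourself: LangChain ships a Markdown header splitter. A minimal sketch follows; the import path varies between LangChain versions, so treat it as an outline rather than the exact API for your install.

```python
from langchain.text_splitter import MarkdownHeaderTextSplitter

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "title"), ("##", "chapter"), ("###", "section")]
)

with open("deployment-guide.md") as f:
    chunks = splitter.split_text(f.read())

for chunk in chunks:
    # Each chunk carries its heading trail as metadata,
    # e.g. {"chapter": "Chapter 3: Production Deploys", "section": "3.2 Deployment Window"}
    print(chunk.metadata, chunk.page_content[:60])
```

Section-level metadata is what lets answers cite "Deployment Guide, Section 3.2" instead of an anonymous chunk.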
The Interview Questions They’ll Ask
-
“Explain the difference between RAG and fine-tuning for extending an LLM’s knowledge. When would you use each?”
-
“How would you design a document chunking strategy that balances retrieval accuracy with context preservation?”
-
“What strategies would you use to prevent RAG systems from generating answers based on outdated documentation?”
-
“Describe how you would handle multi-lingual documentation in a RAG system.”
-
“How would you implement access control so users can only retrieve documents they’re authorized to see?”
-
“What approaches would you use to evaluate RAG system quality (precision, recall, answer quality)?”
Hints in Layers
Hint 1: Document Ingestion Pipeline
# docs-mcp-server.py
import os
from glob import glob

from langchain.document_loaders import PyPDFLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

def index_documents(docs_path):
    # Load all PDFs and Markdown files under docs_path (recursive)
    loaders = [
        PyPDFLoader(f) for f in glob(os.path.join(docs_path, "**/*.pdf"), recursive=True)
    ] + [
        TextLoader(f) for f in glob(os.path.join(docs_path, "**/*.md"), recursive=True)
    ]
    docs = []
    for loader in loaders:
        docs.extend(loader.load())

    # Split into chunks with overlap so context isn't cut mid-thought
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )
    chunks = splitter.split_documents(docs)

    # Create embeddings and store them in the vector DB
    embeddings = OpenAIEmbeddings()
    vectordb = Chroma.from_documents(chunks, embeddings, persist_directory=".vectordb")
    return vectordb
Hint 2: MCP Server Query Tool
@mcp_server.tool("search_knowledge_base")
def search_docs(query: str, top_k: int = 3):
    """Search internal documentation"""
    vectordb = Chroma(persist_directory=".vectordb", embedding_function=OpenAIEmbeddings())
    results = vectordb.similarity_search_with_score(query, k=top_k)
    return [
        {
            "content": doc.page_content,
            "source": doc.metadata["source"],
            "page": doc.metadata.get("page", "N/A"),
            "score": score
        }
        for doc, score in results
    ]
Hint 3: Metadata Preservation When loading documents, preserve source information:
doc.metadata = {
"source": filename,
"page": page_num,
"section": heading,
"last_modified": file_mtime
}
Hint 4: Hybrid Search (Keyword + Semantic) Combine vector similarity with keyword matching for better results:
# Get semantic matches
vector_results = vectordb.similarity_search(query, k=10)
# Get keyword matches (BM25)
keyword_results = bm25.get_top_n(query, documents, n=10)
# Merge and re-rank (Reciprocal Rank Fusion)
final_results = reciprocal_rank_fusion([vector_results, keyword_results])
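`reciprocal_rank_fusion` above is not a library function; here is one minimal way to implement it, assuming both result lists hold LangChain `Document` objects and that page content is an acceptable deduplication key.

```python
def reciprocal_rank_fusion(result_lists, k: int = 60, top_n: int = 5):
    """Merge ranked result lists: score(doc) = sum over lists of 1 / (k + rank)."""
    scores, docs_by_key = {}, {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            key = doc.page_content
            docs_by_key[key] = doc
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs_by_key[key] for key in ranked[:top_n]]
```

The constant `k = 60` is the conventional default from the RRF literature; documents that appear high in either list rise to the top of the merged ranking.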
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Embeddings | “Speech and Language Processing” by Jurafsky & Martin | Ch. 6 (Vector Semantics) |
| Information Retrieval | “Introduction to Information Retrieval” by Manning et al. | Ch. 6 (Scoring, Term Weighting) |
| RAG Systems | “Building LLM Applications” by Damian Fanton | Ch. 4 (Retrieval) |
| Vector Databases | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 3 (Storage Engines) |
Common Pitfalls & Debugging
Problem 1: “Embeddings fail with ‘token limit exceeded’“
- Why: Document chunks are too large (>8,191 tokens for text-embedding-3)
- Fix: Reduce chunk_size to 1000 tokens or use recursive splitting
- Quick test: `len(tiktoken.get_encoding("cl100k_base").encode(chunk))` should be below 8,000
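To run that check across every chunk before embedding, a short sweep like this works; it assumes the `chunks` list produced by the splitter in Hint 1 and the `cl100k_base` encoding used by OpenAI's embedding models.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# chunks comes from the splitter in Hint 1; flag anything near the embedding limit
oversized = [
    (c.metadata.get("source"), len(enc.encode(c.page_content)))
    for c in chunks
    if len(enc.encode(c.page_content)) > 8000
]
print(f"{len(oversized)} chunks exceed the safe token budget: {oversized[:5]}")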
Problem 2: “RAG returns irrelevant results”
- Why: Query and document embeddings use different semantic spaces
- Fix: Use query expansion or rewrite user questions before embedding
- Quick test: Manually inspect top-k results for relevance
Problem 3: “Kiro hallucinates information not in documents”
- Why: LLM fills in gaps when retrieved context is incomplete
- Fix: Add system prompt: “Only answer from provided context. Say ‘I don’t know’ if information isn’t in the documents.”
- Quick test: Ask a question you know isn’t in the docs
Problem 4: “Vector DB queries are slow (>5 seconds)”
- Why: No index optimization (brute-force search)
- Fix: Use HNSW index in FAISS or enable indexing in Chroma
- Quick test: Time a query, e.g. wrap `vectordb.similarity_search(query, k=5)` with `time.perf_counter()` and confirm it returns in well under a second
Definition of Done
- Indexed at least 3 internal documents (PDFs or Markdown)
- MCP server exposes a `search_knowledge_base` tool
- Kiro successfully answers a question only present in the indexed docs
- Chunks preserve metadata (source file, page number)
- Retrieved results include relevance scores
- System prompt prevents hallucination beyond retrieved context
- Documented chunking strategy and embedding model in README
- Tested with queries that have no matching documents (graceful “I don’t know”)
Project 13: “The Custom Tool Builder (Python)” — MCP Protocol Implementation
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Python |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 5. Industry Disruptor (Ecosystem) |
| Difficulty | Level 3: Advanced |
| Knowledge Area | MCP Protocol Implementation |
What you’ll build: A custom MCP server exposing fetch_stock_price(ticker).
Why it teaches Protocol: You learn MCP as JSON-RPC over stdio.
Success criteria:
- Kiro calls your tool and parses real output.
Real World Outcome
You’ll have a working MCP server in Python that Kiro CLI can discover and use. When you configure it in your Kiro settings and ask “What’s the current price of AAPL?”, the following happens:
1. MCP Server Registration (in ~/.config/kiro/settings.json):
{
"mcpServers": {
"stock-prices": {
"command": "python3",
"args": ["/path/to/stock_mcp_server.py"],
"env": {
"STOCK_API_KEY": "your_api_key_here"
}
}
}
}
2. Kiro CLI Session:
$ kiro
You: What's the current price of AAPL?
[Tool Call] stock-prices.fetch_stock_price(ticker="AAPL")
Tool Response:
{
"ticker": "AAPL",
"price": 178.42,
"currency": "USD",
"timestamp": "2025-01-02T14:32:00Z",
"change": +2.15,
"change_percent": +1.22
}
Kiro: Apple (AAPL) is currently trading at $178.42 USD, up $2.15 (+1.22%) from the previous close.
3. Server Logs (in your terminal running the MCP server):
[2025-01-02 14:32:00] INFO: MCP Server started on stdio
[2025-01-02 14:32:00] INFO: Registered tool: fetch_stock_price
[2025-01-02 14:32:15] INFO: Received JSON-RPC request: initialize
[2025-01-02 14:32:15] INFO: Sent capabilities: {tools: 1}
[2025-01-02 14:32:18] INFO: Tool called: fetch_stock_price(ticker="AAPL")
[2025-01-02 14:32:19] INFO: Fetching from Alpha Vantage API...
[2025-01-02 14:32:20] INFO: Response sent: {"price": 178.42, ...}
You’re seeing:
- The MCP protocol handshake (initialize request/response)
- Tool discovery (Kiro learns about
fetch_stock_price) - JSON-RPC call serialization
- Real-time API integration
- Structured data flowing back to the LLM
This is the same pattern used by production MCP servers like @modelcontextprotocol/server-postgres, @modelcontextprotocol/server-github, and custom enterprise tools.
The Core Question You’re Answering
“How do I extend an AI coding agent with custom capabilities that go beyond its built-in tools?”
Before you write any code, think about this: LLMs are powerful, but they’re fundamentally text generators. They can’t fetch real-time stock prices, query proprietary databases, or interact with internal APIs. MCP bridges this gap by giving you a protocol to expose custom functionality as “tools” the LLM can call.
This project forces you to understand:
- The JSON-RPC protocol - How requests and responses are structured
- Stdio transport - Why MCP uses stdin/stdout instead of HTTP
- Tool schemas - How to declare parameters and return types
- Error handling - What happens when your tool fails
- State management - How to maintain connection state across calls
By the end, you’ll see that MCP is just structured conversation: the LLM sends JSON requests, your server sends JSON responses, and Kiro orchestrates the exchange.
Concepts You Must Understand First
Stop and research these before coding:
- JSON-RPC 2.0 Protocol
- What are the required fields in a JSON-RPC request? (`jsonrpc`, `method`, `params`, `id`)
- How do you distinguish a request from a response?
- What’s the difference between a notification and a request?
- How are errors represented in JSON-RPC?
- Book Reference: “Computer Networks, Fifth Edition” by Tanenbaum - Ch. 9 (Application Layer)
- Stdio vs HTTP Transport
- Why does MCP use stdin/stdout instead of HTTP endpoints?
- How do you read JSON from stdin in Python without blocking?
- What’s the difference between line-buffered and unbuffered I/O?
- How do parent processes communicate with child processes?
- Book Reference: “Advanced Programming in the UNIX Environment, Third Edition” by Stevens - Ch. 15 (IPC)
- Tool Schema Design
- How do you specify parameter types for an LLM? (JSON Schema)
- What’s the difference between required and optional parameters?
- How do you document what a tool does so the LLM uses it correctly?
- Why is return type structure important for LLM reasoning?
- Book Reference: “REST API Design Rulebook” by Mark Massé - Ch. 4 (Metadata Design)
- Python Async I/O
- What’s the difference between `sys.stdin.read()` and `sys.stdin.readline()`?
- How do you flush stdout to ensure messages are sent immediately?
- Why might buffered output cause MCP protocol failures?
- How do you handle SIGTERM gracefully?
Questions to Guide Your Design
Before implementing, think through these:
- Tool Registration
- How will you declare the `fetch_stock_price` tool to Kiro?
- What parameters does it need? (Just `ticker`, or also `date`, `interval`?)
- What should the return schema look like for maximum LLM usefulness?
- Should errors be returned as exceptions or structured error objects?
- API Integration
- Which stock API will you use? (Alpha Vantage, Finnhub, Yahoo Finance?)
- How will you handle API rate limits?
- What happens if the API is down or slow?
- Should you cache responses to avoid redundant calls?
- Error Handling
- What if the ticker symbol is invalid? (Return error or null?)
- What if the API key is missing or expired?
- What if the network request times out?
- How will you communicate these errors to the LLM clearly?
- Protocol Compliance
- How will you implement the MCP handshake (the `initialize` request)?
- What capabilities will you advertise? (Just `tools`, or also `resources`?)
- How will you parse incoming JSON-RPC without crashing on malformed input?
- What logging will help you debug protocol issues?
Thinking Exercise
Trace the MCP Handshake
Before coding, manually trace what happens when Kiro starts your MCP server:
Step 1: Kiro starts your server
$ python3 stock_mcp_server.py
Step 2: Kiro sends an initialize request via stdin:
{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"clientInfo": {"name": "kiro", "version": "1.0.0"}
}
}
Step 3: Your server must respond with capabilities:
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"protocolVersion": "2024-11-05",
"capabilities": {
"tools": {}
},
"serverInfo": {"name": "stock-prices", "version": "0.1.0"}
}
}
Step 4: Kiro sends a tools/list request:
{
"jsonrpc": "2.0",
"id": 2,
"method": "tools/list"
}
Step 5: Your server lists available tools:
{
"jsonrpc": "2.0",
"id": 2,
"result": {
"tools": [
{
"name": "fetch_stock_price",
"description": "Get real-time stock price for a ticker symbol",
"inputSchema": {
"type": "object",
"properties": {
"ticker": {
"type": "string",
"description": "Stock ticker symbol (e.g., AAPL, GOOGL)"
}
},
"required": ["ticker"]
}
}
]
}
}
Questions while tracing:
- What happens if your server sends a response with the wrong `id`?
- Why must you flush stdout after each JSON response?
- What if Kiro sends a `tools/call` before you’ve responded to `initialize`?
- How would you implement a timeout if the API takes too long?
The Interview Questions They’ll Ask
Prepare to answer these:
-
“Explain the difference between JSON-RPC and REST APIs. Why does MCP use JSON-RPC over stdio instead of HTTP?”
-
“Your MCP server is registered in Kiro settings, but the tool isn’t appearing. Walk me through your debugging process.”
-
“How would you handle API rate limits in an MCP server? Should you retry automatically or return an error to the LLM?”
-
“What happens if your MCP server crashes mid-conversation? How does Kiro detect this, and what’s the recovery process?”
-
“If you wanted to add authentication to your stock API calls (e.g., user-specific API keys), how would you design that in MCP?”
-
“Describe the lifecycle of an MCP server process. When is it started, and when is it terminated?”
-
“How would you test an MCP server without running Kiro? Can you simulate the protocol manually?”
Hints in Layers
Hint 1: Starting Point
Your MCP server is just a Python script that reads JSON from stdin and writes JSON to stdout. Start by implementing a simple echo server: read a line, parse it as JSON, send back a response with the same id. Once that works, add the MCP-specific methods (initialize, tools/list, tools/call).
Hint 2: Structure
Create a handle_request(request) function that dispatches based on request["method"]. Use a dictionary to map methods to handler functions:
handlers = {
"initialize": handle_initialize,
"tools/list": handle_tools_list,
"tools/call": handle_tools_call
}
Hint 3: JSON-RPC Response Format
Every response must include:
- `"jsonrpc": "2.0"`
- `"id"`: the same value as the request
- Either `"result": {...}` for success or `"error": {...}` for failure
Always flush stdout after writing: `sys.stdout.flush()`
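Putting Hints 2 and 3 together, the server's main loop can be as small as the sketch below. The three handler bodies are placeholders you replace with the real logic from Hints 4 and 5, and the loop assumes newline-delimited JSON-RPC messages on stdio.

```python
import json
import sys

def handle_initialize(params):   # placeholder: advertise real capabilities here
    return {"protocolVersion": "2024-11-05", "capabilities": {"tools": {}},
            "serverInfo": {"name": "stock-prices", "version": "0.1.0"}}

def handle_tools_list(params):   # placeholder: return your tool schemas (Hint 4)
    return {"tools": []}

def handle_tools_call(params):   # placeholder: dispatch to fetch_stock_price (Hint 5)
    return {"content": [{"type": "text", "text": "not implemented yet"}]}

handlers = {
    "initialize": handle_initialize,
    "tools/list": handle_tools_list,
    "tools/call": handle_tools_call,
}

def main() -> None:
    for line in sys.stdin:                      # one JSON-RPC message per line
        request_id = None
        try:
            request = json.loads(line)
            request_id = request.get("id")
            result = handlers[request["method"]](request.get("params", {}))
            response = {"jsonrpc": "2.0", "id": request_id, "result": result}
        except Exception as exc:                # malformed JSON, unknown method, tool failure
            response = {"jsonrpc": "2.0", "id": request_id,
                        "error": {"code": -32603, "message": str(exc)}}
        sys.stdout.write(json.dumps(response) + "\n")
        sys.stdout.flush()                      # flush, or Kiro never sees the reply

if __name__ == "__main__":
    main()
```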
Hint 4: Tool Schema
For tools/list, return a list of tools with JSON Schema for parameters. The inputSchema must follow JSON Schema Draft 7:
{
"name": "fetch_stock_price",
"description": "Get current stock price",
"inputSchema": {
"type": "object",
"properties": {
"ticker": {"type": "string", "description": "e.g., AAPL"}
},
"required": ["ticker"]
}
}
Hint 5: API Integration
For the stock API, use requests with error handling:
import requests

def fetch_from_api(ticker: str):
    try:
        response = requests.get(api_url, params={"symbol": ticker}, timeout=5)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout:
        return {"error": "API timeout"}
    except requests.exceptions.RequestException as e:
        return {"error": f"API error: {str(e)}"}
Hint 6: Debugging Log everything to stderr (not stdout, which is used for protocol):
import sys
sys.stderr.write(f"[DEBUG] Received request: {request}\n")
sys.stderr.flush()
Run your server manually and paste JSON requests to test:
$ python3 stock_mcp_server.py
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| JSON-RPC Protocol | “Computer Networks, Fifth Edition” by Tanenbaum | Ch. 9 (RPC and Middleware) |
| Stdio/Process Communication | “Advanced Programming in the UNIX Environment” by Stevens | Ch. 15 (IPC) |
| JSON Schema | “REST API Design Rulebook” by Mark Massé | Ch. 4 (Metadata Design) |
| Python I/O | “Fluent Python, 2nd Edition” by Luciano Ramalho | Ch. 21 (Asynchronous Programming) |
| API Design | “REST API Design Rulebook” by Mark Massé | Ch. 2 (Identifier Design) |
Common Pitfalls & Debugging
Problem 1: “Kiro doesn’t see my MCP server”
- Why: The `command` path in `settings.json` is incorrect, or Python isn’t in PATH
- Fix: Use absolute paths: `"command": "/usr/bin/python3"` and `"args": ["/full/path/to/server.py"]`
- Quick test: Run the exact command manually: `/usr/bin/python3 /full/path/to/server.py`
Problem 2: “Server starts but tool doesn’t appear”
- Why: You didn’t respond to `tools/list` correctly, or the JSON is malformed
- Fix: Add logging to stderr and check that `tools/list` returns valid JSON Schema
- Quick test: Pipe a `tools/list` request manually: `echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | python3 server.py`
Problem 3: “Tool calls return no data”
- Why: You’re not flushing stdout, or the API call is failing silently
- Fix: Always call `sys.stdout.flush()` after writing JSON, and log API errors to stderr
- Quick test: Add `sys.stderr.write(f"API response: {data}\n")` before returning
Problem 4: “Server crashes on malformed JSON”
- Why: Kiro sent unexpected input, or your JSON parsing is brittle
- Fix: Wrap `json.loads()` in try/except and return a JSON-RPC error:

```python
try:
    request = json.loads(line)
except json.JSONDecodeError:
    error_response = {
        "jsonrpc": "2.0",
        "id": None,
        "error": {"code": -32700, "message": "Parse error"}
    }
```

- Quick test: Send invalid JSON to your server and verify it doesn’t crash
Problem 5: “API rate limit exceeded”
- Why: You’re calling the API on every request without caching
- Fix: Implement a simple cache with TTL:

```python
cache = {}

def fetch_with_cache(ticker):
    if ticker in cache and time.time() - cache[ticker]["time"] < 60:
        return cache[ticker]["data"]
    data = fetch_from_api(ticker)
    cache[ticker] = {"data": data, "time": time.time()}
    return data
```

- Quick test: Call the same ticker twice rapidly and verify only one API call is made
Definition of Done
- MCP server responds correctly to the `initialize` request
- `tools/list` returns `fetch_stock_price` with a valid JSON Schema
- `tools/call` with a valid ticker returns real-time price data
- Invalid ticker symbols return clear error messages (not crashes)
- Server logs all requests and responses to stderr for debugging
- Kiro CLI can successfully call the tool and display results
- API errors (timeout, rate limit) are handled gracefully
- Server can be stopped cleanly with Ctrl+C (no orphan processes)
- Code includes comments explaining the JSON-RPC protocol flow
- README.md documents how to configure the server in Kiro settings
Project 14: “The File System Guardian (Node.js)” — MCP Protocol Implementation
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | TypeScript / Node.js |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Difficulty | Level 3: Advanced |
| Knowledge Area | MCP Protocol Implementation |
What you’ll build: A sandboxed MCP server with high-level tools like scaffold_react_component(name, props).
Why it teaches Abstraction: You encode best practices into single, safe tools.
Success criteria:
- The tool generates consistent component structure every run.
Real World Outcome
You’ll have a TypeScript MCP server that provides high-level file system abstractions. Instead of letting the LLM call raw write_file repeatedly, you expose semantic tools like scaffold_react_component that encode your team’s best practices.
1. MCP Server Configuration (in ~/.config/kiro/settings.json):
{
"mcpServers": {
"fs-guardian": {
"command": "node",
"args": ["/path/to/fs-guardian/dist/index.js"],
"env": {
"WORKSPACE_ROOT": "/path/to/your/project",
"ALLOWED_DIRS": "src/components,src/lib,tests"
}
}
}
}
2. Kiro CLI Session:
$ kiro
You: Create a Button component that accepts label, onClick, and variant (primary/secondary) props
[Tool Call] fs-guardian.scaffold_react_component({
name: "Button",
props: ["label: string", "onClick: () => void", "variant: 'primary' | 'secondary'"]
})
Tool Response:
{
"success": true,
"files_created": [
"src/components/Button/Button.tsx",
"src/components/Button/Button.test.tsx",
"src/components/Button/Button.module.css",
"src/components/Button/index.ts"
],
"message": "React component 'Button' scaffolded successfully"
}
Kiro: I've created the Button component with the requested props. The component includes:
- TypeScript definition with proper prop types
- Jest unit test scaffolding
- CSS module for styling
- Barrel export in index.ts
All files are in src/components/Button/
3. Generated Files:
src/components/Button/Button.tsx:
import React from 'react';
import styles from './Button.module.css';
interface ButtonProps {
label: string;
onClick: () => void;
variant: 'primary' | 'secondary';
}
export const Button: React.FC<ButtonProps> = ({ label, onClick, variant }) => {
return (
<button
className={`${styles.button} ${styles[variant]}`}
onClick={onClick}
>
{label}
</button>
);
};
src/components/Button/Button.test.tsx:
import { render, screen, fireEvent } from '@testing-library/react';
import { Button } from './Button';
describe('Button', () => {
it('renders with label', () => {
render(<Button label="Click me" onClick={() => {}} variant="primary" />);
expect(screen.getByText('Click me')).toBeInTheDocument();
});
it('calls onClick when clicked', () => {
const handleClick = jest.fn();
render(<Button label="Click" onClick={handleClick} variant="primary" />);
fireEvent.click(screen.getByText('Click'));
expect(handleClick).toHaveBeenCalledTimes(1);
});
});
4. Server Logs:
[2025-01-02 15:45:00] INFO: FS Guardian MCP Server started
[2025-01-02 15:45:00] INFO: Workspace root: /path/to/your/project
[2025-01-02 15:45:00] INFO: Allowed directories: src/components, src/lib, tests
[2025-01-02 15:45:15] INFO: Tool called: scaffold_react_component
[2025-01-02 15:45:15] INFO: Validating path: src/components/Button
[2025-01-02 15:45:15] INFO: Creating directory: src/components/Button
[2025-01-02 15:45:15] INFO: Writing file: Button.tsx (142 lines)
[2025-01-02 15:45:15] INFO: Writing file: Button.test.tsx (87 lines)
[2025-01-02 15:45:15] INFO: Writing file: Button.module.css (24 lines)
[2025-01-02 15:45:15] INFO: Writing file: index.ts (1 line)
[2025-01-02 15:45:15] INFO: Success: 4 files created
What you’re seeing:
- Path sandboxing - Server refuses to write outside `ALLOWED_DIRS`
- Template generation - Consistent structure (component, test, styles, barrel export)
- Type safety - TypeScript interfaces generated from prop descriptions
- Best practices encoded - Testing setup, CSS modules, proper exports
- Atomic operations - All files created together or none at all
This pattern prevents the LLM from creating inconsistent file structures or writing to dangerous locations.
The Core Question You’re Answering
“How do I create safe, high-level abstractions over file system operations that encode organizational best practices?”
Think about the problem: If you give an LLM direct file system access via basic write_file tools, it might:
- Create components in random directories
- Forget to add tests
- Use inconsistent naming conventions
- Write to system directories (`/etc`, `/usr/bin`)
- Overwrite critical files
This project teaches you to:
- Build guardrails - Restrict operations to safe paths
- Encode patterns - Capture “how we do things here” in reusable tools
- Validate inputs - Ensure the LLM provides well-formed requests
- Provide feedback - Return structured results the LLM can reason about
- Make tools atomic - Either all files are created or none
By the end, you’ll understand how to transform low-level primitives into high-level, safe, team-specific abstractions.
Concepts You Must Understand First
Stop and research these before coding:
- Path Traversal Attacks
- What is a path traversal attack? (e.g., `../../etc/passwd`)
- How do you validate that a path is within allowed directories?
- What’s the difference between relative and absolute paths in validation?
- How does `path.resolve()` help prevent directory traversal?
- Book Reference: “The Web Application Hacker’s Handbook” by Stuttard & Pinto - Ch. 10
- File System Atomicity
- What happens if your script crashes halfway through creating files?
- How do you ensure “all or nothing” behavior?
- What’s a transaction-like pattern for file operations?
- How would you implement rollback if one file write fails?
- Book Reference: “Operating Systems: Three Easy Pieces” - Ch. 40 (File System Implementation)
- Template Systems
- How do you generate code from templates without embedding business logic in strings?
- What’s the difference between string interpolation and proper templating?
- How do you ensure generated code is syntactically valid?
- Should you use a library (Handlebars, EJS) or custom logic?
- Book Reference: “Compilers: Principles and Practice” - Ch. 2 (Lexical Analysis)
- TypeScript MCP Server Structure
- How do you type MCP requests and responses in TypeScript?
- What’s the recommended way to handle stdio in Node.js? (`readline`, streams)
- How do you structure a TypeScript project for deployment?
- What build process converts TS to JS for distribution?
- Book Reference: “Programming TypeScript” by Boris Cherny - Ch. 10 (Modules)
Questions to Guide Your Design
Before implementing, think through these:
- Safety Boundaries
- Which directories should be allowed? (Only `src/`? Or also `tests/`, `docs/`?)
- Should you allow overwriting existing files, or only create new ones?
- How will you communicate “permission denied” errors to the LLM clearly?
- Should you log all file operations for audit purposes?
- Tool Granularity
- Should you create one tool per component type? (`scaffold_react_component`, `scaffold_vue_component`)
- Or one generic `scaffold_component` that takes `framework` as a parameter?
- How many templates do you need to support your team’s patterns?
- Should tests be optional or always included?
- Validation
- How will you parse the `props` parameter? (Array of strings? TypeScript syntax?)
- What if the LLM provides invalid TypeScript type syntax?
- Should you validate component names against naming conventions? (PascalCase? No special chars?)
- How will you handle edge cases like empty prop lists?
- Error Reporting
- If a file already exists, return an error or auto-increment the name?
- If path validation fails, explain why in a way the LLM can fix?
- Should partial failures (3/4 files written) trigger a rollback?
- How much detail should error messages include?
Thinking Exercise
Design the Sandboxing Logic
Before coding, trace what happens when the LLM tries to exploit your server:
Attack 1: Path Traversal
scaffold_react_component({
name: "../../../../../../etc/SystemConfig"
})
Your validation logic:
- Resolve `src/components/../../../../../../etc/SystemConfig` to an absolute path
- Check if the resolved path starts with `WORKSPACE_ROOT + "/src/components"`
- If not, reject with an error: “Path outside allowed directories”
Attack 2: Overwrite Critical Files
scaffold_react_component({
name: "../../../package.json"
})
Your validation logic:
- Resolve to `/path/to/project/package.json`
- Check against allowed directories
- Reject (even though `package.json` exists, it’s not in `src/components`)
Attack 3: Malformed Props
scaffold_react_component({
name: "Button",
props: ["onClick: () => { console.log('pwned'); return void; }"]
})
Your prop parser:
- Split `onClick: () => { console.log('pwned'); return void; }` on the first `:`
- Extract the type: `() => { console.log('pwned'); return void; }`
- Validate it’s a valid TypeScript type (complex function types are allowed)
- Generate the interface line: `onClick: () => { console.log('pwned'); return void; }`
Questions while designing:
- Should you sanitize prop types, or trust the LLM to provide valid TypeScript?
- What if the prop type includes backticks or quotes that break template strings?
- How do you detect if a path is absolute vs. relative?
- Should you allow symlinks, or only real directories?
The Interview Questions They’ll Ask
Prepare to answer these:
-
“Explain how path traversal attacks work. How does your MCP server prevent `../../etc/passwd` from being written?”
-
“Your server creates 4 files per component. What happens if the 3rd file write fails? How do you ensure atomicity?”
-
“How would you extend this server to support different component frameworks (React, Vue, Svelte) without duplicating code?”
-
“If the LLM provides a prop type like `data: Array<{id: number, nested: {value: string}}>`, how do you parse and validate it?”
-
“What’s the security difference between validating paths before resolution vs. after `path.resolve()`?”
-
“How would you implement rate limiting to prevent the LLM from creating 1000 components in a loop?”
-
“Describe how you’d test this MCP server. Can you unit test it without running Kiro?”
Hints in Layers
Hint 1: Starting Point
Start by implementing path validation. Create a validatePath(targetPath) function that:
- Resolves the path to absolute
- Checks if it starts with one of `ALLOWED_DIRS`
- Returns `{valid: true}` or `{valid: false, reason: "..."}`
Test this with various malicious inputs before building the rest.
Hint 2: Atomicity Pattern Collect all file writes in an array, then execute them all at once:
const operations = [
{ path: 'Button.tsx', content: '...' },
{ path: 'Button.test.tsx', content: '...' },
{ path: 'Button.module.css', content: '...' },
{ path: 'index.ts', content: '...' }
];
try {
for (const op of operations) {
await fs.writeFile(op.path, op.content);
}
} catch (error) {
// Rollback: delete all files created so far
for (const op of operations) {
await fs.unlink(op.path).catch(() => {});
}
throw error;
}
Hint 3: Template Generation Use template literals with a helper function:
function generateComponent(name: string, props: Array<{name: string, type: string}>) {
const propsInterface = props.map(p => ` ${p.name}: ${p.type};`).join('\n');
const destructuredProps = props.map(p => p.name).join(', ');
return `import React from 'react';
import styles from './${name}.module.css';
interface ${name}Props {
${propsInterface}
}
export const ${name}: React.FC<${name}Props> = ({ ${destructuredProps} }) => {
return <div className={styles.${name.toLowerCase()}}>{/* TODO */}</div>;
};`;
}
Hint 4: Prop Parsing
Parse prop strings like "label: string" using regex:
function parseProp(propString: string): {name: string, type: string} {
const match = propString.match(/^(\w+):\s*(.+)$/);
if (!match) throw new Error(`Invalid prop: ${propString}`);
return { name: match[1], type: match[2].trim() };
}
Hint 5: Path Validation
Use path.resolve() and startsWith():
import path from 'path';
function isPathAllowed(targetPath: string): boolean {
const resolved = path.resolve(process.env.WORKSPACE_ROOT!, targetPath);
const allowedDirs = process.env.ALLOWED_DIRS!.split(',');
return allowedDirs.some(dir => {
const allowedPath = path.resolve(process.env.WORKSPACE_ROOT!, dir);
return resolved.startsWith(allowedPath);
});
}
Hint 6: MCP Protocol Handling Reuse the JSON-RPC handler pattern from Project 13:
async function handleToolCall(params: any) {
const { name, arguments: args } = params;
if (name === 'scaffold_react_component') {
return scaffoldReactComponent(args.name, args.props);
}
throw new Error(`Unknown tool: ${name}`);
}
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Path Traversal & Security | “The Web Application Hacker’s Handbook” by Stuttard | Ch. 10 (Attacking Back-End Components) |
| File System Operations | “Operating Systems: Three Easy Pieces” | Ch. 39-40 (Files and Directories) |
| Template Generation | “Compilers: Principles and Practice” | Ch. 2 (Lexical Analysis) |
| TypeScript Best Practices | “Programming TypeScript” by Boris Cherny | Ch. 10 (Modules and Namespaces) |
| Node.js Streams | “Node.js Design Patterns” by Mario Casciaro | Ch. 5 (Streams) |
Common Pitfalls & Debugging
Problem 1: “Path validation allows ../ escapes”
- Why: You’re validating the path string instead of the resolved absolute path
- Fix: Always use `path.resolve()` before validation:

```typescript
const absolutePath = path.resolve(workspaceRoot, userInput);
if (!absolutePath.startsWith(workspaceRoot)) {
  throw new Error("Path outside workspace");
}
```

- Quick test: Try `validatePath("src/../../etc/passwd")` and ensure it’s rejected
Problem 2: “File write succeeds but file is empty”
- Why: You forgot to `await` the write operation, or the content variable is undefined
- Fix: Always `await fs.writeFile()` and log the content length:

```typescript
console.error(`[DEBUG] Writing ${content.length} bytes to ${filePath}`);
await fs.writeFile(filePath, content, 'utf-8');
```

- Quick test: Check file size after write: `ls -lh src/components/Button/Button.tsx`
Problem 3: “TypeScript build fails with Cannot find module“
- Why: Your `tsconfig.json` doesn’t include the MCP server files
- Fix: Update `include` to cover all source files:

```json
{
  "include": ["src/**/*", "index.ts"],
  "compilerOptions": {
    "outDir": "./dist",
    "rootDir": "./"
  }
}
```

- Quick test: Run `tsc --noEmit` to check for errors without building
Problem 4: “Server creates files but Kiro sees ‘Tool call failed’“
- Why: You’re not returning a valid MCP response (missing `result` or wrong structure)
- Fix: Ensure you return `{success: true, files_created: [...]}` inside the MCP content envelope:

```typescript
return {
  content: [{
    type: "text",
    text: JSON.stringify({
      success: true,
      files_created: filePaths,
      message: `Component '${name}' scaffolded successfully`
    })
  }]
};
```

- Quick test: Check server logs for the exact JSON response sent
Problem 5: “Rollback doesn’t work—partial files remain”
- Why: Unlink errors are swallowed silently, or files aren’t tracked properly
- Fix: Log rollback operations:
for (const filePath of createdFiles) { try { await fs.unlink(filePath); console.error(`[ROLLBACK] Deleted ${filePath}`); } catch (err) { console.error(`[ROLLBACK FAILED] ${filePath}: ${err}`); } } - Quick test: Simulate a write failure on the 3rd file and verify all previous files are deleted
Definition of Done
- Path validation correctly rejects ../ traversal attempts
- Server refuses to write outside ALLOWED_DIRS (test with /etc/passwd)
- scaffold_react_component generates all 4 files (component, test, CSS, barrel export)
- Generated TypeScript compiles without errors (tsc --noEmit)
- Generated tests run successfully (npm test)
- Partial failures trigger rollback (all files deleted if any write fails)
- Server logs all file operations to stderr for audit trail
- Invalid component names return clear errors (e.g., "invalid-name" with hyphens)
- Prop parsing handles complex types (e.g., Array<{id: number}>)
- README.md documents the tool schema and example Kiro settings configuration
Project 15: “The Chrome Puppeteer” — Browser Automation
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | JavaScript / HTML |
| Software or Tool | chrome-devtools-mcp |
| Coolness Level | Level 5: Pure Magic |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Browser Automation |
What you’ll build: Use Kiro to drive a real browser and assert UI behavior.
Why it teaches E2E: You get AI-driven QA on a live page.
Success criteria:
- Kiro captures a screenshot and reports a UI state.
Real World Outcome
You’ll have Kiro CLI connected to Chrome via the Model Context Protocol, allowing natural language browser automation. You can ask “Go to example.com and click the login button” and watch it happen in real-time.
1. MCP Server Configuration (in ~/.config/kiro/settings.json):
{
"mcpServers": {
"chrome": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-puppeteer"],
"env": {}
}
}
}
2. Kiro CLI Session:
$ kiro
You: Go to https://news.ycombinator.com and tell me the top 3 story titles
[Tool Call] chrome.puppeteer_navigate(url="https://news.ycombinator.com")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_screenshot()
Tool Response: {"success": true, "image": "<base64>"}
[Tool Call] chrome.puppeteer_evaluate(expression=`
Array.from(document.querySelectorAll('.titleline > a'))
.slice(0, 3)
.map(a => a.textContent)
`)
Tool Response: {
"result": [
"Show HN: I built a visual regex debugger",
"Why SQLite uses B-trees instead of LSM trees",
"The performance cost of virtual methods in C++"
]
}
Kiro: The top 3 stories on Hacker News right now are:
1. Show HN: I built a visual regex debugger
2. Why SQLite uses B-trees instead of LSM trees
3. The performance cost of virtual methods in C++
3. Visual Verification: A Chrome window opens automatically and you see:
- Browser navigates to Hacker News
- Page loads completely
- Screenshot captured (visible in Kiro’s context)
- JavaScript executed in page context
- Results returned to Kiro
4. Advanced E2E Test Example:
You: Test the login flow on staging.myapp.com - use test@example.com / password123
[Tool Call] chrome.puppeteer_navigate(url="https://staging.myapp.com/login")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_fill(selector="#email", value="test@example.com")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_fill(selector="#password", value="password123")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_click(selector="button[type=submit]")
Tool Response: {"success": true}
[Tool Call] chrome.puppeteer_wait_for_selector(selector=".dashboard-header")
Tool Response: {"success": true, "found": true}
[Tool Call] chrome.puppeteer_screenshot()
Tool Response: {"success": true, "image": "<base64>"}
Kiro: ✅ Login test passed:
- Email field filled successfully
- Password field filled successfully
- Submit button clicked
- Dashboard loaded (confirmed by presence of .dashboard-header)
- Screenshot captured showing logged-in state
What you’re seeing:
- Natural language E2E testing - No Playwright/Selenium code to write
- Visual feedback - Screenshots embedded in conversation
- Selector debugging - Kiro can see the page and help fix selectors
- Async handling - Automatic waiting for elements and navigation
- Cross-site testing - Works on any public or localhost URL
This is the same technology behind tools like Browserbase, Skyvern, and multi-agent browser automation systems.
The Core Question You’re Answering
“How can I automate browser interactions using natural language instead of brittle test scripts?”
Traditional E2E tests look like this:
describe('Login flow', () => {
it('should log in successfully', async () => {
await page.goto('https://staging.myapp.com/login');
await page.fill('#email', 'test@example.com');
await page.fill('#password', 'password123');
await page.click('button[type=submit]');
await page.waitForSelector('.dashboard-header');
expect(await page.isVisible('.dashboard-header')).toBe(true);
});
});
Every time the UI changes (ID becomes a class, button text changes, form structure shifts), the test breaks. You spend more time maintaining tests than writing features.
This project teaches you:
- LLM-driven selector discovery - Kiro can adapt to UI changes
- Natural language assertions - “Verify the user is logged in” instead of brittle selectors
- Screenshot-based debugging - AI can see what went wrong
- Multimodal reasoning - Combine DOM inspection + visual analysis
- Zero-code E2E testing - Write tests as conversation, not code
By the end, you’ll understand how to use Kiro as a QA engineer that can drive browsers, inspect state, and report issues—all through conversation.
Concepts You Must Understand First
Stop and research these before coding:
- Chrome DevTools Protocol (CDP)
- What is the Chrome DevTools Protocol?
- How do automation tools (Puppeteer, Selenium) communicate with Chrome?
- What’s the difference between CDP and WebDriver?
- How do you connect to a running Chrome instance programmatically? (see the sketch after this list)
- Book Reference: “Web Performance in Action” by Jeremy Wagner - Ch. 8 (Browser Tools)
- DOM Querying Strategies
- What’s the difference between CSS selectors and XPath?
- Why are data-testid attributes better than class names for testing?
- How do you write selectors that survive UI refactors?
- What’s a “stable” vs “brittle” selector?
- Book Reference: “CSS: The Definitive Guide” by Eric Meyer - Ch. 3 (Selectors)
- Async Browser Events
- Why do you need to wait for elements to appear?
- What’s the difference between waitForSelector and waitForNavigation?
- How do you detect when a page has fully loaded (not just DOMContentLoaded)?
- What are race conditions in browser automation?
- Book Reference: “JavaScript: The Good Parts” by Douglas Crockford - Ch. 8 (Methods)
- Headless vs Headed Browsers
- What’s the difference between headless and headed mode?
- When should you use headless mode? (CI/CD, scraping)
- Why might a test pass in headed mode but fail in headless?
- How do you debug issues in headless mode?
- Book Reference: “Web Scraping with Python” by Ryan Mitchell - Ch. 11 (JavaScript)
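To make the CDP concepts above concrete, here is a minimal Puppeteer sketch of roughly what the MCP server does on your behalf for a navigate, evaluate, screenshot flow. It is an illustration only (it assumes puppeteer is installed locally), not the server’s actual implementation:
import puppeteer from 'puppeteer';

async function main() {
  const browser = await puppeteer.launch({ headless: false }); // headed so you can watch
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com', { waitUntil: 'networkidle2' });

  // Runs in the page context, like puppeteer_evaluate
  const titles = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.titleline > a'))
      .slice(0, 3)
      .map(a => a.textContent)
  );
  console.log(titles);

  await page.screenshot({ path: 'hn.png' }); // equivalent of puppeteer_screenshot
  await browser.close();
}

main().catch(err => { console.error(err); process.exit(1); });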
Questions to Guide Your Design
Before implementing, think through these:
- Selector Strategy
- Should Kiro use CSS selectors, XPath, or text content matching?
- How will you handle dynamic IDs (e.g., user-dropdown-a8f32d)?
- What if multiple elements match a selector?
- Should you use Kiro’s vision capabilities to verify the right element was clicked?
- Error Handling
- What if a selector doesn’t exist? (Element not found)
- What if a page doesn’t load? (Timeout)
- What if JavaScript execution fails?
- How will you communicate these failures to the user clearly?
- State Management
- Should Kiro close the browser after each task, or keep it open?
- How do you handle cookies and session state between tasks?
- What if the user wants to test a multi-step flow (login → browse → checkout)?
- Should each conversation start with a fresh browser session?
- Visual Debugging
- When should Kiro automatically take screenshots?
- Should screenshots be embedded in the conversation or saved to disk?
- How do you handle sensitive information (passwords) in screenshots?
- Can Kiro use computer vision to verify UI state instead of DOM inspection?
Thinking Exercise
Trace a Multi-Step Browser Flow
Before using Kiro, manually trace what should happen when you ask: “Go to Amazon, search for ‘mechanical keyboard’, and tell me the price of the first result”
Step 1: Navigate
chrome.puppeteer_navigate(url="https://amazon.com")
→ Browser loads Amazon homepage
→ Wait for page load complete
Step 2: Find Search Box
chrome.puppeteer_evaluate(`
document.querySelector('#twotabsearchtextbox')?.placeholder
`)
→ Returns: "Search Amazon"
→ Confirms search box exists
Step 3: Fill Search Query
chrome.puppeteer_fill(selector="#twotabsearchtextbox", value="mechanical keyboard")
→ Text appears in search box
Step 4: Submit Search
chrome.puppeteer_click(selector="#nav-search-submit-button")
→ Page navigates to search results
→ Wait for navigation complete
Step 5: Wait for Results
chrome.puppeteer_wait_for_selector(selector=".s-main-slot .s-result-item")
→ Ensures at least one result exists
Step 6: Extract First Result Price
chrome.puppeteer_evaluate(`
document.querySelector('.s-main-slot .s-result-item .a-price .a-offscreen')?.textContent
`)
→ Returns: "$89.99"
Questions while tracing:
- What if Amazon’s HTML structure changes and selectors break?
- How would Kiro adapt if the search box ID changes?
- What if the page uses a loading spinner that delays results?
- How do you handle CAPTCHA or “Are you a robot?” prompts?
- Should Kiro use screenshots to verify the search actually happened?
The Interview Questions They’ll Ask
Prepare to answer these:
- “Explain the Chrome DevTools Protocol. How does Puppeteer use it to control Chrome?”
- “Your E2E test passes locally but fails in CI/CD. What are the most common causes, and how do you debug?”
- “How would you design a selector strategy that survives UI refactors? Compare data-testid attributes vs CSS classes vs XPath.”
- “If a button click doesn’t trigger the expected navigation, how do you debug it? Walk me through your process.”
- “How would you use Kiro to test a complex SPA (Single Page Application) where navigation doesn’t reload the page?”
- “Describe how you’d implement visual regression testing using Kiro and screenshot comparison.”
- “What’s the difference between waitForSelector, waitForNavigation, and waitForTimeout? When would you use each?”
Hints in Layers
Hint 1: Starting Point First, install the Puppeteer MCP server and configure Kiro to use it:
npm install -g @modelcontextprotocol/server-puppeteer
Then add it to ~/.config/kiro/settings.json under mcpServers. Start Kiro and verify the tools are available by asking “What browser tools do you have?”
Hint 2: Basic Navigation Start with simple commands:
You: Open https://example.com and take a screenshot
Kiro will call:
chrome.puppeteer_navigate(url="https://example.com")
chrome.puppeteer_screenshot()
The screenshot will appear in the conversation as an image.
Hint 3: Element Interaction For clicking buttons or filling forms, use selectors:
You: Fill the search box with "test query" and click submit
Kiro will inspect the page (using puppeteer_evaluate to query the DOM) and find appropriate selectors, then call:
chrome.puppeteer_fill(selector="#search-input", value="test query")
chrome.puppeteer_click(selector="button[type=submit]")
Hint 4: Extracting Data To extract text or data from the page:
You: What's the title of the first article on Hacker News?
Kiro will execute JavaScript in the page context:
chrome.puppeteer_evaluate(`
document.querySelector('.titleline > a')?.textContent
`)
Hint 5: Handling Waits For dynamic content that loads asynchronously:
You: Wait for the results to load, then tell me how many items are shown
Kiro will use:
chrome.puppeteer_wait_for_selector(selector=".result-item", timeout=5000)
Then count the results:
chrome.puppeteer_evaluate(`
document.querySelectorAll('.result-item').length
`)
Hint 6: Debugging Failed Selectors If a selector doesn’t work, ask Kiro to inspect the page:
You: The login button selector isn't working. Can you look at the page and find the right selector?
Kiro will take a screenshot and execute:
chrome.puppeteer_evaluate(`
Array.from(document.querySelectorAll('button')).map(b => ({
text: b.textContent,
id: b.id,
classes: b.className
}))
`)
This gives Kiro the full list of buttons to choose from.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Chrome DevTools Protocol | “Web Performance in Action” by Jeremy Wagner | Ch. 8 (Browser Developer Tools) |
| DOM Selectors | “CSS: The Definitive Guide” by Eric Meyer | Ch. 3 (Selectors) |
| Browser Automation | “Web Scraping with Python” by Ryan Mitchell | Ch. 11 (JavaScript and AJAX) |
| Async JavaScript | “JavaScript: The Good Parts” by Douglas Crockford | Ch. 8 (Methods) |
| E2E Testing Patterns | “Testing JavaScript Applications” by Lucas da Costa | Ch. 9 (E2E Testing) |
Common Pitfalls & Debugging
Problem 1: “Element not found” errors
- Why: The selector is wrong, or the element hasn’t loaded yet
- Fix: Ask Kiro to inspect the page first:
You: Take a screenshot and show me all buttons on the page
Then refine your selector based on what Kiro finds.
- Quick test: Open Chrome DevTools manually and test the selector in the console:
document.querySelector('your-selector')
Problem 2: “Navigation timeout” errors
- Why: The page is slow, blocked by CAPTCHA, or the URL is wrong
- Fix: Increase timeout or check network tab for errors:
You: Navigate to example.com and wait up to 30 seconds for the page to load
- Quick test: Load the URL manually in Chrome and check the Network tab for failed requests
Problem 3: “Click doesn’t do anything”
- Why: The element is hidden, covered by another element, or requires JavaScript to be enabled
- Fix: Verify the element is visible:
chrome.puppeteer_evaluate(`
  const el = document.querySelector('button.submit');
  return {
    visible: el.offsetParent !== null,
    disabled: el.disabled,
    boundingBox: el.getBoundingClientRect()
  }
`)
- Quick test: Try clicking manually in headed mode to see if it works
Problem 4: “Screenshots are blank or show loading spinner”
- Why: The screenshot was taken before content finished loading
- Fix: Wait for a specific element that indicates the page is ready:
chrome.puppeteer_wait_for_selector(selector=".content-loaded-indicator")
chrome.puppeteer_screenshot()
- Quick test: Add a delay and see if the screenshot improves:
chrome.puppeteer_wait(ms=2000)
Problem 5: “Kiro can’t find the MCP server”
- Why: The server isn’t installed globally, or the command in settings.json is wrong
- Fix: Verify the server is installed:
  $ which npx
  $ npx @modelcontextprotocol/server-puppeteer --help
  Update settings.json to use the full path:
  { "command": "/usr/local/bin/npx", "args": ["-y", "@modelcontextprotocol/server-puppeteer"] }
- Quick test: Restart Kiro after changing settings: kiro (fresh session)
Definition of Done
- Puppeteer MCP server is configured in Kiro settings
- Kiro can navigate to a URL and take a screenshot
- Kiro can fill form fields and click buttons
- Kiro can extract text from elements using puppeteer_evaluate
- Kiro can wait for elements to appear before interacting
- Screenshots are clear and show the correct page state
- Failed selectors are debugged by inspecting the page structure
- Multi-step flows (e.g., login → navigate → submit) work correctly
- Test on both localhost and public URLs
- Document common selectors for your app in a team wiki
Project 16: “The Design to Code Converter” — Multimodal AI
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | CSS / React |
| Coolness Level | Level 5: Pure Magic |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Multimodal AI |
What you’ll build: Convert a hand-drawn layout into Tailwind CSS.
Why it teaches Vision: You learn how precise you must be when using sketches.
Success criteria:
- The rendered UI matches the sketch’s layout hierarchy.
Real World Outcome
You’ll upload a hand-drawn wireframe (photo or sketch) to Kiro and get back production-ready Tailwind CSS that matches the layout. This demonstrates multimodal AI’s ability to understand visual design and translate it to code.
1. The Sketch (hand-drawn wireframe):
┌─────────────────────────────────────┐
│ [Logo] Nav Links Button │
├─────────────────────────────────────┤
│ │
│ Hero Title Here │
│ Subtitle text │
│ [CTA Button] │
│ │
├─────────────────────────────────────┤
│ ┌────────┐ ┌────────┐ ┌────────┐│
│ │ Card 1 │ │ Card 2 │ │ Card 3 ││
│ │ Icon │ │ Icon │ │ Icon ││
│ │ Text │ │ Text │ │ Text ││
│ └────────┘ └────────┘ └────────┘│
└─────────────────────────────────────┘
2. Kiro CLI Session:
$ kiro
You: [Attach image: wireframe.jpg]
Convert this wireframe into a React component with Tailwind CSS. Make it responsive.
Kiro: I can see your wireframe showing:
- A header with logo, navigation links, and a button
- A hero section with centered title, subtitle, and CTA button
- A three-column card layout with icons and text
Let me create the React component:
[Writes file: src/components/LandingPage.tsx]
3. Generated Code:
src/components/LandingPage.tsx:
export default function LandingPage() {
return (
<div className="min-h-screen bg-white">
{/* Header */}
<header className="border-b border-gray-200">
<div className="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8">
<div className="flex justify-between items-center h-16">
<div className="text-2xl font-bold text-gray-900">Logo</div>
<nav className="hidden md:flex space-x-8">
<a href="#" className="text-gray-700 hover:text-gray-900">Features</a>
<a href="#" className="text-gray-700 hover:text-gray-900">Pricing</a>
<a href="#" className="text-gray-700 hover:text-gray-900">About</a>
</nav>
<button className="bg-blue-600 text-white px-4 py-2 rounded-lg hover:bg-blue-700">
Sign Up
</button>
</div>
</div>
</header>
{/* Hero Section */}
<section className="py-20 px-4">
<div className="max-w-4xl mx-auto text-center">
<h1 className="text-5xl font-bold text-gray-900 mb-4">
Hero Title Here
</h1>
<p className="text-xl text-gray-600 mb-8">
Subtitle text that explains the value proposition
</p>
<button className="bg-blue-600 text-white px-8 py-3 rounded-lg text-lg hover:bg-blue-700">
Call to Action
</button>
</div>
</section>
{/* Three Cards */}
<section className="py-16 px-4 bg-gray-50">
<div className="max-w-7xl mx-auto">
<div className="grid grid-cols-1 md:grid-cols-3 gap-8">
{[1, 2, 3].map((i) => (
<div key={i} className="bg-white p-6 rounded-lg shadow-sm text-center">
<div className="w-16 h-16 bg-blue-100 rounded-full mx-auto mb-4 flex items-center justify-center">
<span className="text-2xl">🎯</span>
</div>
<h3 className="text-xl font-semibold mb-2">Card {i}</h3>
<p className="text-gray-600">
Description text for this feature or benefit
</p>
</div>
))}
</div>
</div>
</section>
</div>
);
}
4. Rendered Result:
When you run npm run dev and view the page, you see:
- Pixel-perfect match to the wireframe layout
- Responsive design (collapses to mobile on small screens)
- Hover states on buttons and links
- Proper spacing and typography matching the sketch’s visual hierarchy
What you’re seeing:
- Vision understanding - Kiro interprets hand-drawn boxes as layout sections
- Semantic HTML - Proper use of header/section/nav tags
- Responsive grid - Cards stack on mobile, side-by-side on desktop
- Design system - Consistent spacing, colors, and typography
- Production-ready code - Not just a prototype, but deployable components
This is the same technology behind tools like v0.dev, Galileo AI, and Figma-to-code plugins.
The Core Question You’re Answering
“Can AI understand visual design intent from rough sketches and translate it to production code?”
Think about the traditional design-to-code workflow:
- Designer creates wireframe in Figma
- Developer interprets the design
- Developer writes HTML/CSS (hours of work)
- Back-and-forth to fix spacing, colors, responsiveness
- Repeat for every screen
With multimodal AI, you can:
- Sketch on paper → photo → code in minutes
- Iterate designs without opening Figma
- Prototype faster than traditional design tools
- Bridge the designer-developer communication gap
This project teaches you:
- How vision models parse layouts - Understanding hierarchy from visual cues
- Prompt engineering for design - How to describe design intent clearly
- Multimodal context - Combining images with text instructions
- Design token extraction - How AI infers spacing, colors, and typography
- Limitations of vision - Where it struggles (fine details, exact measurements)
By the end, you’ll know when to use AI for design-to-code and when traditional tools are better.
Concepts You Must Understand First
Stop and research these before coding:
- Tailwind CSS Utility Classes
- What’s the difference between px-4 and p-4?
- How does Tailwind’s responsive system work? (sm:, md:, lg:)
- What’s the purpose of utility-first CSS vs traditional CSS?
- How do you compose complex layouts with just utility classes?
- Book Reference: “Refactoring UI” by Adam Wathan - All chapters
- Layout Hierarchy Recognition
- How do designers communicate hierarchy visually? (size, spacing, weight)
- What’s the difference between a card, section, and container?
- How do you represent visual grouping in HTML structure?
- Why does semantic HTML matter for layout interpretation?
- Book Reference: “Don’t Make Me Think” by Steve Krug - Ch. 3-4
- Responsive Design Principles
- What’s mobile-first design?
- How do you handle different screen sizes without media queries?
- What’s the difference between fluid and fixed layouts?
- How do you test responsive designs efficiently?
- Book Reference: “Responsive Web Design” by Ethan Marcotte - Ch. 1-2
- Vision Model Capabilities
- How do vision models understand spatial relationships?
- What’s the difference between image classification and layout analysis?
- Why might a model misinterpret hand-drawn sketches?
- How do you improve vision model accuracy with prompts?
- Blog: OpenAI GPT-4 Vision System Card (2023)
Questions to Guide Your Design
Before implementing, think through these:
- Sketch Quality
- How detailed should the sketch be? (low-fidelity vs high-fidelity)
- Should you annotate the sketch with labels? (“Header”, “Hero”, etc.)
- Does the sketch need to show exact measurements or just proportions?
- How do you communicate color intent in a black-and-white sketch?
- Code Generation Strategy
- Should Kiro generate a single component or split into multiple files?
- How do you handle repeated elements (cards, buttons)?
- Should the code include placeholder content or real text?
- Do you want inline Tailwind classes or a component library?
- Responsive Behavior
- Should the layout stack vertically on mobile or stay horizontal?
- Where should breakpoints be? (Tailwind defaults or custom?)
- How do you communicate responsive behavior in a static sketch?
- Should text sizes scale or stay fixed?
- Iteration Workflow
- If the output doesn’t match, do you refine the sketch or the prompt?
- How do you provide feedback to Kiro? (Screenshot comparison?)
- Should you iterate on the same component or generate multiple variations?
- How do you version control design iterations?
Thinking Exercise
Analyze What Makes a Good Design Sketch for AI
Before uploading a sketch, think about what information the AI needs:
Good Sketch (High Success Rate):
- Clear boxes with labels (“Header”, “Hero Section”, “Card Grid”)
- Arrows showing hierarchy or flow
- Annotations for interactive elements (“Button”, “Link”)
- Relative sizing (big title vs small subtitle)
- Grouping indicators (dotted lines around related elements)
Bad Sketch (Likely to Fail):
- Ambiguous shapes (is this a button or a text box?)
- No labels or context
- Inconsistent spacing that doesn’t reflect intent
- Missing sections (no header drawn, but you want one)
- Too much fine detail that obscures structure
Questions while sketching:
- If I showed this to a junior developer, could they build it?
- Does the hierarchy (title > subtitle > body) come through visually?
- Are interactive elements obvious? (buttons, links, forms)
- Does spacing communicate grouping? (related items close together)
The Interview Questions They’ll Ask
Prepare to answer these:
- “How does a vision model understand the difference between a button and a text label in a wireframe?”
- “If the AI generates a layout that’s close but not quite right, how do you refine it? Walk me through your iteration process.”
- “Compare design-to-code AI tools (v0.dev, Galileo) to traditional Figma-to-React plugins. What are the tradeoffs?”
- “How would you handle complex interactions (hover states, animations) that aren’t visible in a static wireframe?”
- “Describe a scenario where AI design-to-code would fail. How do you know when to use traditional methods?”
- “If you wanted to enforce a design system (specific colors, spacing, components), how would you instruct the AI?”
Hints in Layers
Hint 1: Starting Point Draw a simple wireframe on paper or use a tool like Excalidraw. Focus on boxes and labels, not pixel-perfect design. Take a clear photo or screenshot.
Hint 2: The Prompt Upload the image to Kiro and provide context:
You: [Attach: wireframe.jpg]
Convert this wireframe into a React component using Tailwind CSS. The layout should be:
- Responsive (mobile-first)
- Use semantic HTML
- Include hover states on interactive elements
Hint 3: Refining the Output If the result doesn’t match, provide specific feedback:
You: The cards should be in a grid, not stacked. Also, make the hero title larger and the button more prominent.
Kiro will regenerate with adjustments.
Hint 4: Handling Ambiguity If your sketch is ambiguous, Kiro might ask clarifying questions:
Kiro: I see three boxes in your sketch. Should these be:
1. Image cards with captions?
2. Feature blocks with icons?
3. Call-to-action cards with buttons?
Provide clear direction to avoid wrong assumptions.
Hint 5: Extracting Reusable Components After generating code, ask Kiro to refactor:
You: Extract the card component into a reusable component that accepts title, icon, and description as props.
This produces a component library from the initial sketch.
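A sketch of the kind of extraction you can expect; the component name, prop names, and copy below are illustrative, based on the cards in the generated LandingPage shown earlier:
// Card.tsx - a reusable card extracted from the repeated markup
interface CardProps {
  title: string;
  icon: string;        // emoji or icon character rendered in the circle
  description: string;
}

export function Card({ title, icon, description }: CardProps) {
  return (
    <div className="bg-white p-6 rounded-lg shadow-sm text-center">
      <div className="w-16 h-16 bg-blue-100 rounded-full mx-auto mb-4 flex items-center justify-center">
        <span className="text-2xl">{icon}</span>
      </div>
      <h3 className="text-xl font-semibold mb-2">{title}</h3>
      <p className="text-gray-600">{description}</p>
    </div>
  );
}

// Usage inside LandingPage:
// <Card title="Card 1" icon="🎯" description="Description text for this feature or benefit" />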
Hint 6: Testing Responsiveness Ask Kiro to show you how it looks at different screen sizes:
You: Show me how this layout would look on mobile (375px) vs desktop (1440px).
Kiro can describe or generate screenshots for visual verification.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Tailwind CSS Patterns | “Refactoring UI” by Adam Wathan | All chapters |
| Layout Hierarchy | “Don’t Make Me Think” by Steve Krug | Ch. 3-4 (Visual Hierarchy) |
| Responsive Design | “Responsive Web Design” by Ethan Marcotte | Ch. 1-2 |
| Design Systems | “Atomic Design” by Brad Frost | Ch. 2 (Atoms, Molecules) |
| Component Architecture | “React Design Patterns” by Michele Bertoli | Ch. 3 (Composition) |
Common Pitfalls & Debugging
Problem 1: “AI generates wrong layout structure”
- Why: The sketch is ambiguous or labels are missing
- Fix: Add annotations to your sketch:
  [Header Section]
  [Hero: centered, large text]
  [Cards: 3 columns on desktop, 1 on mobile]
- Quick test: Show the sketch to a colleague—can they understand it without your explanation?
Problem 2: “Colors don’t match the design system”
- Why: The sketch doesn’t specify colors, so AI uses defaults
- Fix: Provide color palette in the prompt:
  You: Use these colors:
  - Primary: #3B82F6 (blue)
  - Background: #F9FAFB (light gray)
  - Text: #111827 (dark gray)
- Quick test: Check the generated code for hard-coded colors and verify against your palette
Problem 3: “Spacing is inconsistent”
- Why: The sketch doesn’t communicate spacing intent clearly
- Fix: Add spacing notes to the sketch or prompt:
  You: Use 8px spacing units. Headers should have 32px padding, sections 64px.
- Quick test: Inspect with browser DevTools and verify p-* and m-* classes are consistent
Problem 4: “Responsive behavior is wrong”
- Why: The AI assumes a default responsive strategy
- Fix: Be explicit about breakpoints:
  You: On mobile (<768px), stack cards vertically. On desktop (≥768px), show 3 columns.
- Quick test: Resize browser to test breakpoints: 375px, 768px, 1024px, 1440px
Problem 5: “Interactive elements don’t work”
- Why: Static wireframes don’t show behavior, only structure
- Fix: Describe interactions explicitly:
  You: The CTA button should navigate to /signup on click. Navigation links should scroll to section anchors.
- Quick test: Click buttons and links to verify behavior matches intent
Definition of Done
- Wireframe is clear with labeled sections (header, hero, cards, etc.)
- Kiro generates React component with Tailwind CSS
- Layout matches wireframe structure (boxes, hierarchy, grouping)
- Responsive design works on mobile (375px), tablet (768px), desktop (1440px)
- Interactive elements (buttons, links) are styled with hover states
- Colors and spacing follow design system (or specified palette)
- Code uses semantic HTML (header, section, nav, not just divs)
- Reusable components are extracted (Card, Button, etc.)
- Tested in browser and matches visual expectations
- Document the prompt pattern for future wireframe conversions
Project 17: “The Type-Safe Hook with Bun” — Kiro Hooks
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | TypeScript (Bun) |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Kiro Hooks |
What you’ll build: A PostToolUse hook in Bun that parses JSON events with Zod and logs metrics.
Why it teaches Safe Automation: You replace fragile shell scripts with typed automation.
Success criteria:
- Hook logs structured output on every tool use.
Real World Outcome
You’ll have a production-ready Kiro hook written in TypeScript that runs with Bun and validates every tool execution event using Zod schemas. When Kiro calls any tool (Bash, Edit, Read, etc.), your hook will:
For a successful Bash tool call:
$ kiro "run the test suite"
# Your hook logs to ~/.kiro/hooks/metrics.jsonl:
{"timestamp":"2024-12-20T14:32:01.234Z","toolName":"Bash","duration":1234,"success":true,"command":"npm test","exitCode":0}
For a failed Edit operation:
{"timestamp":"2024-12-20T14:35:22.456Z","toolName":"Edit","duration":89,"success":false,"error":"old_string not found in file","filePath":"/path/to/file.ts"}
Live metrics dashboard:
$ bun run analyze-metrics.ts
Tool Usage Report (Last 24 Hours)
─────────────────────────────────
Bash: 47 calls (43 success, 4 failed) - avg 890ms
Edit: 23 calls (22 success, 1 failed) - avg 45ms
Read: 89 calls (89 success, 0 failed) - avg 12ms
Grep: 34 calls (34 success, 0 failed) - avg 156ms
Slowest Commands:
1. npm run build (3421ms)
2. git push origin main (2876ms)
3. docker-compose up (2134ms)
Error Rate: 2.6%
The hook validates every event against strict TypeScript types, ensuring you never miss a field or misparse data.
The Core Question You’re Answering
“How do I build production-grade automation on top of Kiro without brittle shell scripts that break on edge cases?”
Before you start coding, consider: Shell scripts are fragile—they silently fail on malformed JSON, ignore type errors, and give no compile-time guarantees. A typed hook in Bun/TypeScript gives you runtime validation (Zod), compile-time safety (TypeScript), and fast execution (Bun’s native speed). This project teaches you to replace “parse with grep/awk” with “validate with schemas.”
Concepts You Must Understand First
Stop and research these before coding:
- Kiro Hook Lifecycle
- What is the difference between PreToolUse and PostToolUse hooks?
- How does Kiro pass event data to hooks (stdin JSON)?
- What happens if a hook exits with non-zero status?
- Book Reference: Kiro CLI documentation - Hook System Architecture
- Zod Schema Validation
- How do you define a Zod schema for nested objects?
- What is z.infer<typeof schema> and why is it critical?
- How does Zod handle optional fields and defaults?
- How do you compose schemas (unions, intersections)?
- Book Reference: “Effective TypeScript” by Dan Vanderkam - Ch. 3 (Type Inference)
- Bun Runtime Specifics
- How does Bun’s Bun.file() differ from Node’s fs module?
- What is Bun.write() and why is it faster than fs.appendFile?
- How do you handle stdin in Bun (await Bun.stdin.text())?
- How does Bun’s bundler work (bun build --compile)?
- Book Reference: Bun documentation - Runtime APIs
- JSON Lines (JSONL) Format
- Why use JSONL instead of JSON arrays for logs?
- How do you append to JSONL files atomically?
- How do you parse JSONL with streaming (jq, ndjson)?
- Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 4 (Encoding)
Questions to Guide Your Design
Before implementing, think through these:
- Schema Design
- What fields are common to all tool events (toolName, timestamp, duration)?
- How do you model tool-specific data (Bash exitCode, Edit filePath)?
- Should you use a discriminated union (type: "bash" | "edit") or a flat schema?
- How do you handle unknown tools gracefully?
- Error Handling
- What if Zod validation fails—do you crash the hook or log the error?
- Should a hook failure block Kiro’s execution or just warn?
- How do you ensure the log file is always writable?
- What happens if the disk is full?
- Performance
- Should you write to disk synchronously or asynchronously?
- Do you batch writes or append immediately?
- How do you avoid blocking Kiro on slow I/O?
- Should you rotate log files daily/hourly?
- Observability
- How do you debug a hook that’s failing silently?
- Should you log to stderr or a separate debug file?
- How do you measure the hook’s own performance overhead?
Thinking Exercise
Manual Hook Execution Trace
Before writing code, manually trace what happens when Kiro calls your hook:
Scenario: User runs kiro "run npm test" and Kiro invokes the Bash tool.
Step 1: Kiro prepares the event
{
"hookType": "PostToolUse",
"timestamp": "2024-12-20T14:32:01.234Z",
"tool": {
"name": "Bash",
"input": {"command": "npm test"},
"output": {"exitCode": 0, "stdout": "All tests passed", "stderr": ""},
"duration": 1234
}
}
Step 2: Kiro spawns your hook
bun run ~/.kiro/hooks/metrics-logger.ts < event.json
Step 3: Your hook reads stdin
const eventJson = await Bun.stdin.text();
// What if stdin is empty? What if it's malformed JSON?
Step 4: Zod validation
const event = PostToolUseEventSchema.parse(JSON.parse(eventJson));
// What if parsing throws? Should you catch and log, or let it crash?
Step 5: Write to JSONL
await Bun.write("metrics.jsonl", JSON.stringify(event) + "\n", {append: true});
// Is this atomic? What if another process is writing simultaneously?
Questions while tracing:
- At which step could things fail? How would you detect each failure?
- How do you test this hook without running Kiro every time?
- How would you simulate different tool events (success, failure, timeout)?
The Interview Questions They’ll Ask
Prepare to answer these:
- “How does Zod’s z.infer work, and why is it more reliable than manually typing JSON.parse results?”
- “What are the performance characteristics of Bun compared to Node.js for I/O-heavy tasks like log writes?”
- “Explain the difference between JSON and JSON Lines. When would you use each format?”
- “How would you design a schema for a discriminated union in TypeScript to handle multiple tool types?”
- “What strategies would you use to ensure atomic writes to a log file from concurrent processes?”
- “How do you test code that reads from stdin without manually piping data every time?”
Hints in Layers
Hint 1: Start with the Schema
Define your Zod schemas first, then let TypeScript types flow from them. Start with a base ToolEvent schema, then extend it for specific tools.
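A minimal sketch of that schema-first layout. The field names mirror the event shape traced in the Thinking Exercise above (hookType, timestamp, tool.name, and so on); adjust them to whatever your Kiro version actually emits:
import { z } from "zod";

// Fields shared by every tool event
const BaseToolSchema = z.object({
  name: z.string(),
  duration: z.number(),
});

// Tool-specific extensions, discriminated by `name`
const BashToolSchema = BaseToolSchema.extend({
  name: z.literal("Bash"),
  input: z.object({ command: z.string() }),
  output: z.object({ exitCode: z.number(), stdout: z.string(), stderr: z.string() }),
});

const EditToolSchema = BaseToolSchema.extend({
  name: z.literal("Edit"),
  input: z.object({ file_path: z.string() }),
});

// Fall back to the base shape (unknown fields preserved) for tools you haven't modeled yet
const ToolSchema = z.union([BashToolSchema, EditToolSchema, BaseToolSchema.passthrough()]);

export const PostToolUseEventSchema = z.object({
  hookType: z.literal("PostToolUse"),
  timestamp: z.string(),
  tool: ToolSchema,
});

// The TypeScript type flows from the schema, so it can never drift from the validator
export type PostToolUseEvent = z.infer<typeof PostToolUseEventSchema>;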
Hint 2: Read Stdin Safely
Bun provides await Bun.stdin.text() to read all stdin as a string. Wrap this in a try/catch for JSON.parse and Zod validation. If validation fails, log to stderr and exit 0 (don’t block Kiro).
Hint 3: Atomic Appends
Use Bun.write(path, data, {append: true}) for atomic appends. Bun handles file locking internally. For rotation, check file size before writing and rename if needed.
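If you want the size check and the append in one helper, here is a small sketch. It uses appendFile from node:fs/promises, which Bun also supports, so the same code runs under Node; the size threshold and file naming are only examples:
import { appendFile, stat, rename } from "node:fs/promises";

const MAX_BYTES = 10 * 1024 * 1024; // rotate once the log passes ~10 MB

export async function appendMetric(logPath: string, record: unknown): Promise<void> {
  try {
    const info = await stat(logPath);
    if (info.size > MAX_BYTES) {
      // Move the full log aside; the next append recreates the file
      await rename(logPath, `${logPath}.${Date.now()}`);
    }
  } catch {
    // Log file doesn't exist yet; the first append below creates it
  }
  await appendFile(logPath, JSON.stringify(record) + "\n", "utf-8");
}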
Hint 4: Testing Without Kiro Create a test harness:
echo '{"hookType":"PostToolUse",...}' | bun run metrics-logger.ts
Write JSON fixtures for each tool type and pipe them to your hook.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| TypeScript Type System | “Effective TypeScript” by Dan Vanderkam | Ch. 3 (Type Inference), Ch. 4 (Type Design) |
| Schema Validation | “Programming TypeScript” by Boris Cherny | Ch. 6 (Advanced Types) |
| JSON Lines Format | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 4 (Encoding and Evolution) |
| Bun Runtime | Bun official docs | Runtime APIs, File I/O |
| Hook Patterns | Kiro CLI docs | Hooks System, Event Schemas |
Common Pitfalls & Debugging
Problem 1: “Hook doesn’t run, no error messages”
- Why: Kiro might not have execute permissions on the hook file, or the shebang is wrong
- Fix:
  chmod +x ~/.kiro/hooks/metrics-logger.ts
  # Add shebang: #!/usr/bin/env bun
- Quick test: ~/.kiro/hooks/metrics-logger.ts should run directly
Problem 2: “Zod validation fails with ‘Expected object, received undefined’”
- Why: Stdin is empty, or Kiro isn’t passing the event correctly
- Fix: Check if stdin exists before parsing:
  const input = await Bun.stdin.text();
  if (!input.trim()) { console.error("No input received"); process.exit(0); }
- Quick test: echo '{}' | bun run hook.ts should fail gracefully
Problem 3: “Log file grows infinitely, disk fills up”
- Why: No log rotation implemented
- Fix: Implement daily rotation:
  const logPath = `~/.kiro/hooks/metrics-${new Date().toISOString().split('T')[0]}.jsonl`;
- Quick test: Run hook 1000 times, verify old logs are compressed/deleted
Problem 4: “Hook is slow, Kiro feels laggy”
- Why: Synchronous disk writes block Kiro’s execution
- Fix: Use async writes and don’t await them (fire-and-forget):
  Bun.write(logPath, data, {append: true}); // Don't await
- Quick test: Time hook execution with time echo '...' | bun run hook.ts
Definition of Done
- Hook receives PostToolUse events from Kiro and parses them with Zod
- All tool types (Bash, Edit, Read, Grep, etc.) are logged correctly
- Validation errors are logged to stderr but don’t crash the hook
- JSONL log file is written atomically with append mode
- Log rotation is implemented (daily files, compression after 7 days)
- Hook adds <5ms overhead to Kiro’s tool execution
- Test suite covers success, failure, and malformed input cases
- Analysis script reads JSONL and produces human-readable reports
- Hook is executable (chmod +x) with correct shebang
- Documentation explains how to customize schemas for new tools
Project 18: “The Security Firewall Hook” — Security Governance
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Python or Rust |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 3. Service & Support (Enterprise Security) |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Security Governance |
What you’ll build: A PreToolUse hook that blocks dangerous commands.
Why it teaches Governance: You enforce guardrails against hallucinated risk.
Success criteria:
- A destructive command is blocked with a clear message.
Real World Outcome
You’ll have a security firewall that intercepts every Kiro tool call and blocks dangerous operations before execution. When Kiro attempts risky commands, your hook prevents disaster:
Blocked destructive command:
$ kiro "clean up the database by dropping all tables"
[PreToolUse Hook] Analyzing command...
🚫 BLOCKED: Dangerous Bash command detected
Command: DROP DATABASE production;
Reason: Database destructive operation (DROP, TRUNCATE)
Risk Level: CRITICAL
Suggested Alternative: Create a backup first with pg_dump
Hook Exit Code: 1 (Execution prevented)
Allowed safe command:
$ kiro "show me the database schema"
[PreToolUse Hook] Analyzing command...
✅ ALLOWED: Read-only database query
Command: \d+ users
Risk Level: LOW
Reason: SELECT queries are permitted
Security policy report:
$ python analyze-blocks.py
Security Firewall Report (Last 7 Days)
──────────────────────────────────────
Total Tool Calls: 1,247
Blocked: 23 (1.8%)
Allowed: 1,224 (98.2%)
Top Blocked Patterns:
1. rm -rf / (8 attempts)
2. DROP DATABASE (5 attempts)
3. chmod 777 (4 attempts)
4. curl | bash (3 attempts)
5. git push --force main (3 attempts)
Risk Prevented:
- Data loss: 13 incidents
- Security vulnerabilities: 7 incidents
- Production impact: 3 incidents
The hook runs pattern matching, AST analysis, and allow-list checking to catch both obvious and subtle threats before they execute.
The Core Question You’re Answering
“How do I prevent an AI agent from executing catastrophic commands while still allowing productive work?”
Before you start coding, consider: LLMs hallucinate. They suggest rm -rf / to “clean up space.” They recommend chmod 777 for “permission issues.” They propose DROP TABLE users to “fix schema conflicts.” A security firewall is your last line of defense against well-intentioned but devastating suggestions. This project teaches you to build guardrails that catch mistakes before they become disasters.
Concepts You Must Understand First
Stop and research these before coding:
- PreToolUse Hook Lifecycle
- When does PreToolUse execute relative to tool invocation?
- How does a non-zero exit code block tool execution?
- Can you modify tool arguments in a PreToolUse hook?
- What happens if the hook times out (5-second limit)?
- Book Reference: Kiro CLI documentation - Hook System Architecture
- Command Pattern Matching
- How do you distinguish rm -rf /tmp/safe from rm -rf /?
- Should you use regex, AST parsing, or both?
- How do you handle command aliases (ll, la, etc.)?
- What about commands wrapped in functions or subshells?
- Book Reference: “Compilers: Principles, Techniques, and Tools” by Aho et al. - Ch. 3 (Lexical Analysis)
- Bash AST Parsing
- What is the structure of a Bash abstract syntax tree?
- How do you extract the command name from complex pipelines?
- How do you detect destructive flags (-f, --force, --no-preserve-root)?
- Libraries: Python (bashlex), Rust (shell-words)
- Book Reference: “The Linux Command Line” by William Shotts - Ch. 28 (Script Debugging)
- Security Policy Design
- Deny-list (block known bad) vs allow-list (permit known good)?
- How do you balance security with usability?
- Should you allow overrides with explicit flags (--i-know-what-im-doing)?
- How do you handle context-dependent risk (safe in dev, fatal in prod)?
- Book Reference: “Building Secure and Reliable Systems” by Google - Ch. 6 (Design for Least Privilege)
Questions to Guide Your Design
Before implementing, think through these:
- Threat Model
- What are the most dangerous commands to block (file deletion, permission changes, database drops)?
- How do you detect SQL injection in commands?
- Should you block network commands (curl, wget) if they download and execute?
- What about indirect threats (cron jobs, systemd services)?
- Detection Strategy
- Do you use regex patterns (fast but fragile) or AST parsing (slow but accurate)?
- How do you handle obfuscated commands ($(echo cm)$(echo 64) → echo cm64)?
- Should you sandbox and execute the command in dry-run mode first?
- Do you need a scoring system (low/medium/high risk) or binary allow/deny?
- User Experience
- How do you communicate why a command was blocked?
- Should you suggest safer alternatives?
- Do you allow interactive approval (“This is risky, proceed? [y/N]”)?
- How do you prevent alert fatigue from too many false positives?
- Policy Configuration
- Should policies be global or per-project?
- Do you support environment-specific rules (block in prod, allow in dev)?
- How do you update patterns without modifying the hook code?
- Should you support policy inheritance (base + overrides)?
Thinking Exercise
Manual Threat Detection Walkthrough
Before writing code, manually trace how your hook would analyze these commands:
Test Case 1: Obvious Threat
Command: rm -rf /
Analysis:
- Command: rm (file deletion)
- Flags: -rf (recursive, force)
- Target: / (root directory)
- Decision: BLOCK - Catastrophic data loss risk
- Reason: Recursive deletion of root filesystem
Test Case 2: Subtle Threat
Command: find / -name "*.log" -exec rm {} \;
Analysis:
- Command: find (search, seems harmless)
- Execution: -exec rm (delete each match)
- Target: / (all logs system-wide)
- Decision: BLOCK - Hidden destructive operation in -exec
- Reason: Mass file deletion disguised as search
Test Case 3: Context-Dependent
Command: chmod 777 /tmp/test.sh
Analysis:
- Command: chmod (permission change)
- Mode: 777 (world-writable)
- Target: /tmp/test.sh (temp file)
- Decision: WARN - Bad practice but low immediate risk
- Reason: Insecure permissions on non-critical file
Test Case 4: SQL Injection
Command: psql -c "DROP TABLE users WHERE id = 1; DROP DATABASE production; --"
Analysis:
- Command: psql (database client)
- SQL: Multiple statements detected
- Keywords: DROP TABLE, DROP DATABASE
- Decision: BLOCK - SQL injection attempt
- Reason: Destructive SQL operations
Questions while analyzing:
- Which patterns can you detect with regex alone?
- Which require parsing the command structure?
- How would you handle base64-encoded commands?
- What if the command is split across multiple tool calls?
The Interview Questions They’ll Ask
Prepare to answer these:
- “How would you detect a destructive command hidden in a base64-encoded string that’s later decoded and executed?”
- “Explain the difference between deny-list and allow-list security models. Which is appropriate for AI agent governance?”
- “How do you prevent time-of-check to time-of-use (TOCTOU) attacks where the command changes after your hook approves it?”
- “What strategies would you use to minimize false positives while maintaining strong security boundaries?”
- “How would you design a policy system that’s secure by default but allows power users to override when necessary?”
- “Explain how you would parse a complex Bash command with pipes, redirects, and subshells to extract all executable components.”
Hints in Layers
Hint 1: Start with Pattern Matching Begin with a deny-list of obvious threats using regex:
  rm -rf /
  DROP DATABASE
  chmod 777
  curl .* | bash
Build incrementally from simple patterns to complex AST analysis.
Hint 2: Parse the Bash Command Structure
Use a library like bashlex (Python) to parse commands into an AST. Walk the tree to extract:
- Primary command name
- All flags and arguments
- Nested commands in subshells or backticks This catches threats hidden in complex syntax.
Hint 3: Risk Scoring System Assign risk scores to each pattern:
- Critical (100): rm -rf /, DROP DATABASE
- High (75): chmod 777, git push --force main
- Medium (50): Unverified curl downloads
- Low (25): Warnings only
Block anything ≥75, warn for 25-74 (see the sketch below).
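The project suggests Python or Rust, but the scoring logic itself is language-agnostic; here is a compact sketch in TypeScript using the patterns and thresholds from this hint (extend the list, or load it from YAML as in Hint 4, for real use):
type Verdict = "ALLOW" | "WARN" | "BLOCK";

const RISK_PATTERNS: Array<{ pattern: RegExp; score: number; reason: string }> = [
  { pattern: /rm\s+-rf\s+\/(\s|$)/,            score: 100, reason: "Root filesystem deletion" },
  { pattern: /DROP\s+DATABASE/i,               score: 100, reason: "Database destruction" },
  { pattern: /chmod\s+777/,                    score: 75,  reason: "World-writable permissions" },
  { pattern: /git\s+push\s+--force\s+\S*main/, score: 75,  reason: "Force push to main" },
  { pattern: /curl\s+.*\|\s*(ba)?sh/,          score: 50,  reason: "Unverified pipe-to-shell download" },
];

export function assessCommand(command: string): { verdict: Verdict; score: number; reasons: string[] } {
  const hits = RISK_PATTERNS.filter(p => p.pattern.test(command));
  const score = hits.reduce((max, p) => Math.max(max, p.score), 0);
  // Block at >=75, warn at 25-74, allow below 25
  const verdict: Verdict = score >= 75 ? "BLOCK" : score >= 25 ? "WARN" : "ALLOW";
  return { verdict, score, reasons: hits.map(h => h.reason) };
}

// assessCommand("rm -rf /")      -> { verdict: "BLOCK", score: 100, ... }
// assessCommand("rm -rf ./temp") -> { verdict: "ALLOW", score: 0, ... }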
Hint 4: Configuration-Driven Policies Load patterns from a YAML config file:
critical_patterns:
- pattern: 'rm\s+-rf\s+/'
reason: "Root filesystem deletion"
- pattern: 'DROP\s+DATABASE'
reason: "Database destruction"
This allows updates without code changes.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Security Policy Design | “Building Secure and Reliable Systems” by Google | Ch. 6 (Design for Least Privilege), Ch. 12 (Crisis Management) |
| Command Parsing | “Compilers: Principles, Techniques, and Tools” by Aho et al. | Ch. 3 (Lexical Analysis), Ch. 4 (Syntax Analysis) |
| Bash Scripting Security | “The Linux Command Line” by William Shotts | Ch. 28 (Script Debugging), Ch. 29 (Flow Control) |
| Pattern Matching | “Regular Expressions Cookbook” by Goyvaerts & Levithan | Ch. 4 (Validation), Ch. 7 (Security) |
| Hook System | Kiro CLI docs | Hooks System, Security Best Practices |
Common Pitfalls & Debugging
Problem 1: “Hook blocks legitimate commands (false positives)”
- Why: Overly broad regex patterns match harmless variations
- Fix: Test patterns against a suite of safe commands:
  # Safe: rm -rf ./temp
  # Unsafe: rm -rf /
  # Pattern should distinguish based on target path
- Quick test: Create a test suite with 100 safe and 20 unsafe commands
Problem 2: “Obfuscated commands bypass detection”
- Why: Regex can’t detect
$(echo rm) -rf / - Fix: Execute the command in a sandbox with dry-run mode:
bash -n -c "$command" 2>&1 # Check syntax without executing - Quick test: Try variations like
r''m -rf,${CMD}, base64 encoding
Problem 3: “Hook is too slow, Kiro times out”
- Why: AST parsing is expensive for every command
- Fix: Implement a fast-path for common safe commands:
  if command.startswith(('ls', 'cat', 'grep', 'echo')):
      return ALLOW  # Skip expensive parsing
- Quick test: Time hook execution—should be <100ms for 95% of commands
Problem 4: “Policy updates require redeploying the hook”
- Why: Patterns are hardcoded in the script
- Fix: Load patterns from
  ~/.kiro/security-policy.yaml:
  with open(os.path.expanduser('~/.kiro/security-policy.yaml')) as f:
      policy = yaml.safe_load(f)
- Quick test: Modify YAML file, verify new patterns apply without restart
Problem 5: “Can’t block SQL injection in psql commands”
- Why: SQL is embedded as a string argument
- Fix: Parse SQL with a library like
  sqlparse (Python):
  import sqlparse
  statements = sqlparse.split(sql_query)
  for stmt in statements:
      if 'DROP' in stmt.upper() or 'DELETE' in stmt.upper():
          return BLOCK
- Quick test: Test with various SQL injection payloads
Definition of Done
- Hook intercepts all PreToolUse events for Bash and Edit tools
- Critical patterns (rm -rf /, DROP DATABASE, chmod 777) are blocked
- Blocked commands log to ~/.kiro/security-log.jsonl with reason
- AST parsing catches threats hidden in complex command structures
- Policy is loaded from YAML config file, updatable without code changes
- False positive rate is <2% on a benchmark of 500 real commands
- Hook completes analysis in <100ms for 95th percentile
- Blocked commands show helpful error messages with alternatives
- Allow-list overrides are supported via --security-override flag
- Environment-specific policies (dev vs prod) are configurable
- Documentation explains how to add custom patterns and policies
Project 19: “The Auto-Fixer Loop” — Developer Experience (DX)
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Bash / JavaScript |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Developer Experience (DX) |
What you’ll build: A PostToolUse hook that runs lint/format and forces retries on failure.
Why it teaches Feedback Loops: The AI cannot finish until the code is clean.
Success criteria:
- Lint errors prevent completion until fixed.
Real World Outcome
You’ll have an auto-fixer loop that enforces code quality by running linters and formatters after every file write, forcing Kiro to fix errors before proceeding. The hook creates a tight feedback loop where code quality is mandatory, not optional:
Successful write with auto-fix:
$ kiro "add a new React component for user profiles"
[Kiro writes UserProfile.tsx]
[PostToolUse Hook] Running linters...
✗ ESLint found 3 issues:
- Line 12: 'useState' is not defined (no-undef)
- Line 18: Missing return statement (consistent-return)
- Line 24: Unexpected console.log (no-console)
[Hook] Auto-fixing with eslint --fix...
✓ Fixed 1 issue automatically (console.log removed)
✗ 2 issues require manual fixes
[Hook] Returning error to Kiro with fix instructions...
Kiro: I see lint errors. Let me fix them.
[Kiro writes corrected UserProfile.tsx]
[PostToolUse Hook] Running linters...
✓ ESLint: All checks passed
✓ Prettier: Code formatted
✓ TypeScript: No type errors
[Hook] All quality checks passed. Proceeding.
Quality enforcement metrics:
$ node analyze-quality.js
Auto-Fixer Loop Report (Last 7 Days)
─────────────────────────────────────
Total File Writes: 342
First-Pass Clean: 89 (26%)
Required Fixes: 253 (74%)
- ESLint errors: 187
- Type errors: 98
- Format issues: 156
Average Fix Iterations: 1.8
Max Iterations (complex): 5
Quality Gate Success Rate: 100%
(Zero files merged with lint errors)
Time Saved:
- Pre-commit rejections prevented: 253
- Manual lint runs avoided: 342
- Broken builds prevented: 98
This hook ensures that Kiro never leaves behind broken or poorly formatted code—it fixes issues immediately or tries again until clean.
The Core Question You’re Answering
“How do I build a feedback loop that forces AI to fix code quality issues immediately instead of accumulating technical debt?”
Before you start coding, consider: Without enforcement, Kiro will happily write code with lint errors, type mismatches, and formatting inconsistencies. Developers then manually run npm run lint and find dozens of issues. An auto-fixer loop shifts quality left—errors are caught and fixed at write-time, not review-time. This project teaches you to build continuous quality gates that prevent bad code from ever existing.
Concepts You Must Understand First
Stop and research these before coding:
- PostToolUse Hook Timing
- When does PostToolUse execute (after tool completes or before result returns)?
- Can a PostToolUse hook modify tool outputs?
- How do you signal failure to trigger a retry?
- What is the maximum retry count before Kiro gives up?
- Book Reference: Kiro CLI documentation - Hook System Architecture
- ESLint and Prettier Integration
- How do you run ESLint programmatically via Node.js API?
- What is the difference between --fix and manual fixes?
- How do you combine ESLint and Prettier without conflicts?
- How do you handle auto-fix failures (unfixable rules)?
- Book Reference: ESLint documentation - Node.js API
- Error Return Codes and Retry Logic
- How does Kiro interpret non-zero exit codes from hooks?
- Should you return exit code 1 (error) or print to stderr?
- How do you pass fix instructions back to Kiro?
- How many times will Kiro retry before failing?
- Book Reference: Unix process exit codes - Advanced Linux Programming
- TypeScript Compiler API
- How do you run tsc --noEmit to check types without emitting files?
- How do you parse TypeScript errors into actionable messages?
- What is the performance cost of running the compiler on every write?
- Should you cache compilation results?
- Book Reference: “Programming TypeScript” by Boris Cherny - Ch. 10 (Compiler API)
Questions to Guide Your Design
Before implementing, think through these:
- Quality Gates
- Which checks should be mandatory (ESLint, types, format)?
- Should you run tests on every write or only on commit?
- How do you handle slow checks (type checking takes 5 seconds)?
- Do you fail fast (stop at first error) or collect all errors?
- Auto-Fix Strategy
- Which errors can be auto-fixed (format, imports) vs require manual fixes?
- Should you apply auto-fixes immediately or ask Kiro to review them?
- How do you avoid infinite loops (auto-fix introduces new errors)?
- What if auto-fix changes behavior (removing unused code)?
- Performance
- How do you avoid blocking Kiro for 10 seconds on every write?
- Should you run checks in parallel (lint + types + format)?
- Do you cache lint results for unchanged files?
- What is the acceptable latency for the feedback loop?
- Error Communication
- How do you format lint errors so Kiro understands how to fix them?
- Should you include line numbers, error codes, suggested fixes?
- How do you distinguish critical errors from warnings?
- Do you provide a “skip quality checks” escape hatch?
Thinking Exercise
Manual Fix Iteration Walkthrough
Before writing code, trace what happens when Kiro writes flawed code:
Iteration 1: Kiro writes component with errors
// UserProfile.tsx (written by Kiro)
export default function UserProfile() {
const [name, setName] = useState(''); // Missing React import
console.log('Rendering profile'); // Disallowed console.log
// Missing return statement
}
Hook Analysis:
- ESLint detects 3 errors
- Auto-fix can remove
console.log - Remaining 2 errors require code changes
Hook returns exit code 1 with message:
ESLint errors found:
- Line 2: 'useState' is not defined. Add: import { useState } from 'react'
- Line 5: Missing return statement. Add JSX return.
Iteration 2: Kiro fixes errors
// UserProfile.tsx (corrected by Kiro)
import { useState } from 'react';
export default function UserProfile() {
const [name, setName] = useState('');
return <div>Profile: {name}</div>;
}
Hook Analysis:
- ESLint: PASS
- TypeScript: PASS (if tsconfig allows implicit any)
- Prettier: Needs formatting
Hook applies Prettier:
// Auto-formatted
import { useState } from 'react';
export default function UserProfile() {
const [name, setName] = useState('');
return <div>Profile: {name}</div>;
}
Hook returns exit code 0 (success)
Questions while tracing:
- How many iterations is acceptable before giving up?
- Should you auto-apply Prettier or ask Kiro to format?
- What if TypeScript errors require type annotations—can you guide Kiro?
- How do you detect an infinite loop (same error every iteration)?
The Interview Questions They’ll Ask
Prepare to answer these:
- “How would you design a retry mechanism that prevents infinite loops while still giving the AI enough attempts to fix complex errors?”
- “Explain the performance tradeoff between running ESLint on every file write vs caching and invalidating results.”
- “How do you integrate multiple linters (ESLint, Prettier, TSC) without conflicts, especially when their rules contradict?”
- “What strategies would you use to communicate lint errors to an AI in a way that maximizes fix success rate?”
- “How would you implement incremental type checking to avoid re-checking the entire project on every file write?”
- “Explain how you would handle a scenario where auto-fix introduces new errors (e.g., removing ‘unused’ code that’s actually needed).”
Hints in Layers
Hint 1: Start with One Linter
Begin with ESLint only. Run eslint --format json path/to/file.ts and parse the output. Return exit code 1 if errors exist, 0 if clean. Get this working before adding TypeScript or Prettier.
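A minimal sketch of that first step, assuming the file path arrives as the first argument and that `jq` is available to sum the error counts in ESLint's JSON report:

```bash
#!/usr/bin/env bash
# lint-gate.sh — sketch: run ESLint on one file, translate the result into a hook exit code.
# Assumptions: eslint (via npx) and jq are installed; $1 is the file Kiro just wrote.
file="$1"

# --format json prints an array of result objects, each with an errorCount field.
report=$(npx eslint --format json "$file" 2>/dev/null)

errors=$(echo "$report" | jq '[.[].errorCount] | add // 0')

if [ "$errors" -gt 0 ]; then
  # Show the individual messages so Kiro knows what to change, then signal failure.
  echo "$report" | jq -r '.[] | .messages[] | "Line \(.line): \(.ruleId // "error") - \(.message)"'
  exit 1
fi
exit 0
```

Once this passes reliably on a few files, the TypeScript and Prettier checks can be bolted on in the same pattern.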
Hint 2: Parse Tool Output to Extract File Path
The PostToolUse event includes the tool name and arguments. For the Edit/Write tools, extract the file path:
const fs = require('fs');

// The hook receives the tool event as JSON on stdin.
const event = JSON.parse(fs.readFileSync(0, 'utf8'));
if (event.tool.name === 'Edit' || event.tool.name === 'Write') {
  const filePath = event.tool.input.file_path;
  runLinters(filePath); // defined elsewhere in your hook script
}
Hint 3: Structured Error Messages for Kiro
Format errors as actionable instructions:
❌ ESLint Error (no-undef):
File: src/components/UserProfile.tsx
Line: 12
Error: 'useState' is not defined
Fix: Add import at top: import { useState } from 'react';
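One way to produce that shape mechanically is to reformat ESLint's JSON report; a sketch, assuming the target path is in `$file` (the `Fix:` line is omitted here because suggested fixes are rule-specific):

```bash
# Sketch: turn `eslint --format json` output into the actionable format above.
npx eslint --format json "$file" | jq -r '
  .[] | .filePath as $f | .messages[] |
  "❌ ESLint Error (\(.ruleId // "unknown")):\n  File: \($f)\n  Line: \(.line)\n  Error: \(.message)\n"'
```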
Hint 4: Performance - Run Checks in Parallel
Use Promise.all() to run ESLint, TypeScript, and Prettier simultaneously:
const [eslintResult, tscResult, prettierResult] = await Promise.all([
runESLint(filePath),
runTypeScript(filePath),
runPrettier(filePath)
]);
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| ESLint Integration | ESLint official docs | Node.js API, Custom Rules |
| Hook System | Kiro CLI docs | Hooks System, Event Schemas |
| Process Exit Codes | “Advanced Linux Programming” by CodeSourcery | Ch. 3 (Processes), Ch. 11 (Error Handling) |
| TypeScript Compiler | “Programming TypeScript” by Boris Cherny | Ch. 10 (Compiler API) |
| Prettier Integration | Prettier official docs | API Reference, Editor Integration |
Common Pitfalls & Debugging
Problem 1: “Hook runs forever, infinite retry loop”
- Why: Auto-fix introduces new errors, or Kiro keeps making the same mistake
- Fix: Track error signatures and exit after 3 identical errors:
  // Note: persist this set across hook invocations (e.g., in a temp file);
  // otherwise each retry starts with an empty set and the check never fires.
  const seen = new Set();
  const errorSig = JSON.stringify(errors);
  if (seen.has(errorSig)) {
    console.error("Same errors after fix. Aborting.");
    process.exit(0); // Let it through to avoid infinite loop
  }
  seen.add(errorSig);
- Quick test: Write a file with an unfixable error, verify hook exits gracefully
Problem 2: “Type checking is too slow (10+ seconds per write)”
- Why: Running `tsc` on the entire project for every file write
- Fix: Use incremental compilation or project references: `tsc --noEmit --incremental --tsBuildInfoFile .tsbuildinfo`
- Quick test: Time hook execution—should be <2 seconds for 95% of files
Problem 3: “Prettier and ESLint conflict (format changes fail lint)”
- Why: ESLint has formatting rules that contradict Prettier
- Fix: Disable ESLint formatting rules: `npm install --save-dev eslint-config-prettier`, then add to `.eslintrc.json`: `"extends": ["prettier"]`
- Quick test: Format with Prettier, then run ESLint—no errors
Problem 4: “Hook doesn’t trigger for certain file writes”
- Why: Hook only listens for Edit tool, but Kiro used Write tool
- Fix: Handle both Edit and Write tools:
  if (['Edit', 'Write', 'MultiEdit'].includes(event.tool.name)) {
    runLinters(event.tool.input.file_path);
  }
- Quick test: Use both Edit and Write tools, verify hook runs for both
Problem 5: “Kiro ignores fix instructions, doesn’t retry”
- Why: Error message is unstructured or exit code is wrong
- Fix: Return exit code 1 and print structured errors to stdout (not stderr):
  if (errors.length > 0) {
    console.log(formatErrorsForKiro(errors));
    process.exit(1); // Trigger retry
  }
- Quick test: Manually trigger hook with errors, verify Kiro receives message
Definition of Done
- Hook intercepts all Edit/Write/MultiEdit tool calls
- ESLint runs on every file write and detects errors
- Auto-fix is applied for fixable rules (no-console, quotes, etc.)
- TypeScript compiler checks types (`tsc --noEmit`) for .ts/.tsx files
- Prettier formats code after all checks pass
- Hook returns exit code 1 with actionable errors if checks fail
- Infinite loop detection prevents >3 retries on identical errors
- Hook completes in <2 seconds for 95% of files
- ESLint and Prettier configs are conflict-free
- Errors are formatted in a way that Kiro can parse and fix
- Documentation explains how to customize linter rules and disable checks
Project 20: “The Git Context Injector” — Context Management
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Bash |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Context Management |
What you’ll build: A UserPromptSubmit hook that appends git diff --staged.
Why it teaches Dynamic Context: The AI always sees the current change set.
Success criteria:
- Prompt includes diff content automatically.
Real World Outcome
You’ll have a context injector that automatically enriches every Kiro prompt with git state information, ensuring the AI always has visibility into what code is currently changed, staged, or uncommitted. This eliminates the need to manually paste git diff output:
Without the hook:
$ kiro "write tests for the changes I just made"
Kiro: I don't see any recent changes in the conversation. Can you share what files you modified?
With the hook:
$ git add src/auth.ts # Stage your changes
$ kiro "write tests for the changes I just made"
[UserPromptSubmit Hook] Injecting git context...
Enhanced prompt sent to Kiro:
────────────────────────────────────
Original: "write tests for the changes I just made"
Git Context:
Branch: feature/oauth-login
Status: 1 file changed, 45 insertions(+), 12 deletions(-)
Staged Changes:
diff --git a/src/auth.ts b/src/auth.ts
index 1234567..abcdefg 100644
--- a/src/auth.ts
+++ b/src/auth.ts
@@ -10,7 +10,15 @@ export class AuthService {
- async login(username: string, password: string) {
- return this.basicAuth(username, password);
+ async login(provider: 'google' | 'github', token: string) {
+ const user = await this.oauthVerify(provider, token);
+ return this.createSession(user);
}
+
+ private async oauthVerify(provider: string, token: string) {
+ // New OAuth verification logic
+ }
────────────────────────────────────
Kiro: I can see you've refactored the login method to support OAuth. I'll write comprehensive tests for both Google and GitHub OAuth flows, covering token validation, user creation, and session management.
[Kiro writes auth.test.ts with OAuth-specific tests]
Context injection report:
$ bash analyze-context-usage.sh
Git Context Injector Report (Last 30 Days)
───────────────────────────────────────────
Total Prompts: 1,847
Context Injected: 1,245 (67%)
Context Skipped: 602 (33% - no staged changes)
Average Context Size:
- Staged diff: 234 lines
- Unstaged diff: 127 lines
- Recent commits: 3 commits
Top Use Cases:
1. "Write tests for these changes" (387 prompts)
2. "Review this code" (298 prompts)
3. "Fix the bug I introduced" (156 prompts)
4. "Document what I changed" (89 prompts)
Token Budget Impact:
- Average context added: 1,200 tokens
- Prompts that exceeded budget: 12 (0.6%)
- Context truncation applied: 8 times
The hook intelligently decides what git information is relevant and formats it for maximum AI comprehension.
The Core Question You’re Answering
“How do I give AI visibility into my current work context without manually pasting git diffs every time?”
Before you start coding, consider: AI is stateless—it doesn’t know what files you changed, what branch you’re on, or what you committed yesterday. Developers waste time copying git diff output or explaining “I just modified the auth file.” A context injector automates this, making every prompt git-aware. This project teaches you to augment prompts with dynamic, session-specific context that makes AI more effective.
Concepts You Must Understand First
Stop and research these before coding:
- UserPromptSubmit Hook Lifecycle
- When does UserPromptSubmit execute (before prompt is sent to LLM)?
- Can you modify the user’s prompt text?
- How do you append context without overwriting the original prompt?
- What is the maximum context size before truncation is needed?
- Book Reference: Kiro CLI documentation - Hook System Architecture
- Git Plumbing Commands
- How do you get staged changes only (`git diff --cached`)?
- How do you get unstaged changes (`git diff`)?
- How do you get recent commit history (`git log -n 3 --oneline`)?
- How do you check if you’re in a git repository (`git rev-parse --is-inside-work-tree`)?
- Book Reference: “Pro Git” by Scott Chacon - Ch. 10 (Git Internals)
- Context Relevance Heuristics
- When should you inject diff (user mentions “changes”, “modified”, “tests”)?
- When should you skip injection (generic questions about unrelated topics)?
- How do you detect if the user is asking about code vs asking about concepts?
- Should you always inject branch name and commit history?
- Book Reference: None - requires experimentation and user feedback
- Token Budget Management
- How many tokens does a typical diff consume?
- How do you truncate large diffs (>100 files changed)?
- Should you prioritize staged changes over unstaged?
- How do you summarize commits vs including full diffs?
- Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 4 (Encoding)
Questions to Guide Your Design
Before implementing, think through these:
- Context Selection
- Should you inject staged changes, unstaged changes, or both?
- Do you include recent commits, or only uncommitted work?
- How do you decide between `git diff` and `git show HEAD`?
- Should you include file renames, binary file changes, or submodule updates?
- Prompt Augmentation
- Where do you inject context (before prompt, after, or in structured fields)?
- How do you format diffs for readability (syntax highlight, collapse large hunks)?
- Should you summarize (“3 files changed, 45 insertions”) or show full diffs?
- Do you annotate the context (“Git Context:” header) or inject silently?
- Trigger Logic
- Do you inject context on every prompt or only when relevant?
- How do you detect relevance (keyword matching, NLP, always-on)?
- Should users be able to opt out (`--no-git-context` flag)?
- What if there are no changes—do you inject “no changes” or skip entirely?
- Performance and Safety
- How do you handle repositories with thousands of files changed?
- Should you run `git diff` synchronously or cache results?
- What if `git diff` takes 10 seconds (large binary files)?
- How do you avoid leaking secrets in diffs (API keys, passwords)?
Thinking Exercise
Manual Context Injection Walkthrough
Before writing code, trace how your hook enhances different prompts:
Scenario 1: User asks about recent changes
User prompt: "Review the authentication changes I made"
Hook detects keywords: ["changes", "made"]
Hook runs: git diff --cached
Injected context:
Branch: feature/oauth
Staged: src/auth.ts (+45, -12)
Enhanced prompt:
"Review the authentication changes I made
Git Context:
<diff output>
"
Scenario 2: User asks generic question
User prompt: "What is the difference between OAuth and JWT?"
Hook detects: No code-specific keywords
Hook decision: Skip git context (not relevant)
Prompt sent unchanged:
"What is the difference between OAuth and JWT?"
Scenario 3: Large diff (500 files changed)
User prompt: "Write tests for my refactor"
Hook detects: 500 files changed in staging area
Hook decision: Truncate to top 10 most-changed files
Enhanced prompt:
"Write tests for my refactor
Git Context (truncated to top 10 files):
src/auth.ts (+200, -50)
src/db.ts (+150, -30)
...
[490 more files not shown]
"
Scenario 4: No changes staged
User prompt: "Explain this error message"
Hook runs: git diff --cached
Result: No output (nothing staged)
Hook decision: Inject "No staged changes" summary
Enhanced prompt:
"Explain this error message
Git Context: No staged changes. Branch: main (up to date)
"
Questions while tracing:
- How do you balance verbosity (full diffs) vs conciseness (summaries)?
- Should you always show branch name, even if it’s not relevant?
- How do you handle merge conflicts in diffs?
- What if the user’s prompt is already very long—do you still inject context?
The Interview Questions They’ll Ask
Prepare to answer these:
- “How would you design a heuristic to determine when git context is relevant to a user’s prompt vs when it’s just noise?”
- “Explain the difference between `git diff`, `git diff --cached`, and `git diff HEAD`. When would you use each?”
- “How would you handle a scenario where the git diff output contains sensitive information like API keys or passwords?”
- “What strategies would you use to truncate large diffs (500+ files changed) while preserving the most important information?”
- “How would you implement caching for git commands to avoid running expensive operations on every prompt?”
- “Explain how you would detect if a user’s prompt is asking about code (inject context) vs asking a conceptual question (skip context).”
Hints in Layers
Hint 1: Start with Always-On Injection
Begin by injecting git context on every prompt. Don’t implement keyword detection yet. Get the basic flow working: read stdin (user prompt), run git diff --cached, append to prompt, output to stdout.
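A minimal always-on version of that flow, assuming the hook reads the prompt from stdin and whatever it writes to stdout becomes the enhanced prompt:

```bash
#!/usr/bin/env bash
# git-context-injector.sh — sketch: always append staged changes to the prompt.
# Assumption: UserPromptSubmit passes the prompt on stdin and reads the result from stdout.
original_prompt=$(cat)

# Pass the prompt through untouched when we're not inside a git repository.
if ! git rev-parse --is-inside-work-tree &>/dev/null; then
  echo "$original_prompt"
  exit 0
fi

echo "$original_prompt"
echo ""
echo "Git Context:"
echo "Branch: $(git rev-parse --abbrev-ref HEAD)"
git diff --cached
```

Hints 2 through 4 refine this with the repository check, markdown fencing, and truncation for large diffs.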
Hint 2: Check for Git Repository First
Before running git commands, verify you’re in a repo:
if ! git rev-parse --is-inside-work-tree &>/dev/null; then
# Not a git repo, skip injection
echo "$original_prompt"
exit 0
fi
Hint 3: Format Context for Readability
Use markdown fences to make diffs clear:
echo "$original_prompt"
echo ""
echo "Git Context:"
echo '```diff'
git diff --cached
echo '```'
Hint 4: Truncate Large Diffs
Limit diff size to avoid token budget issues:
diff_lines=$(git diff --cached | wc -l)
if [ "$diff_lines" -gt 500 ]; then
git diff --cached --stat # Show summary only
else
git diff --cached
fi
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Git Internals | “Pro Git” by Scott Chacon | Ch. 10 (Git Internals), Ch. 2 (Git Basics) |
| Hook System | Kiro CLI documentation | Hooks System, UserPromptSubmit |
| Shell Scripting | “The Linux Command Line” by William Shotts | Ch. 27 (Flow Control), Ch. 24 (Script Debugging) |
| Token Management | Kiro CLI docs | Context Window Management |
| Text Processing | “Unix Power Tools” by Shelley Powers | Ch. 13 (Searching and Substitution) |
Common Pitfalls & Debugging
Problem 1: “Hook adds context to every prompt, even unrelated questions”
- Why: No relevance detection implemented
- Fix: Add keyword matching:
if echo "$prompt" | grep -qiE '(change|diff|commit|modify|test|review)'; then inject_git_context fi - Quick test: Ask “What is Python?” and verify no git context is added
Problem 2: “Diff output contains API keys or secrets”
- Why: No secret scanning before injecting context
- Fix: Filter out sensitive patterns:
  git diff --cached | grep -vE '(API_KEY|SECRET|PASSWORD|TOKEN)='
- Quick test: Stage a file with `API_KEY=abc123`, verify it’s redacted
Problem 3: “Hook is slow, takes 5+ seconds per prompt”
- Why: Running `git diff` on a massive repository every time
- Fix: Cache diff results and invalidate on file changes:
  cache_file="/tmp/git-context-$(git rev-parse HEAD).cache"
  if [ ! -f "$cache_file" ]; then
    git diff --cached > "$cache_file"
  fi
  cat "$cache_file"
- Quick test: Time hook execution—should be <100ms with cache
Problem 4: “Large diffs break Kiro’s context window”
- Why: 500-file refactor generates 50,000 lines of diff
- Fix: Implement smart truncation:
  if [ $(git diff --cached | wc -l) -gt 500 ]; then
    echo "Git Context (large changeset, showing summary):"
    git diff --cached --stat | head -20
    echo "[...truncated...]"
  else
    git diff --cached
  fi
- Quick test: Create a large diff, verify it’s summarized
Problem 5: “Hook doesn’t work in subdirectories”
- Why: Git commands run from hook’s directory, not user’s cwd
- Fix: Detect git root and run commands there:
  git_root=$(git rev-parse --show-toplevel)
  cd "$git_root" || exit 0
  git diff --cached
- Quick test: Run Kiro from a subdirectory, verify context injection works
Definition of Done
- Hook intercepts all UserPromptSubmit events
- Git context is injected when user mentions code/changes (keyword detection)
- Staged changes are shown with `git diff --cached`
- Unstaged changes are optionally included based on prompt
- Branch name and recent commits are included in summary
- Large diffs (>500 lines) are truncated with stats summary
- Hook skips injection when not in a git repository
- Secret patterns (API keys, passwords) are filtered from diffs
- Context is formatted in markdown code fences for readability
- Cache invalidation prevents stale diff data
- Hook completes in <100ms for cached results, <1s for fresh diffs
- Documentation explains how to disable context injection per-prompt
Project 21: “The Headless Server Setup” — Remote Development
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Linux Shell |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Remote Development |
What you’ll build: Install Kiro on a headless VM and authenticate via device flow.
Why it teaches Remote Dev: This is the standard pattern for server-based work.
Success criteria:
- Headless login succeeds without a local browser.
Real World Outcome
You’ll have Kiro running on a headless Linux server (no GUI, no browser) and successfully authenticated via OAuth device flow. This enables server-side automation, CI/CD integration, and remote development workflows:
Successful headless authentication:
# On remote server (no GUI)
$ ssh user@dev-server.company.com
$ kiro auth login
Kiro CLI Authentication (Device Flow)
═══════════════════════════════════════
No browser detected. Using device code flow.
1. Visit this URL on any device with a browser:
https://anthropic.com/device
2. Enter this code: ABCD-EFGH
3. Authorize the Kiro CLI application
Waiting for authorization... (timeout in 10 minutes)
[User visits URL on laptop, enters code, approves]
✓ Authentication successful!
✓ Token stored in: ~/.kiro/credentials.json
✓ Expires in: 30 days
$ kiro "show me system info"
Kiro: [Returns server specs, uptime, disk usage]
# Headless session is now active
Automated server setup:
$ cat setup-headless-kiro.sh
#!/bin/bash
# Install and configure Kiro on headless servers
# Install Kiro CLI
curl -fsSL https://kiro.dev/install.sh | bash
# Verify installation
kiro --version
# Authenticate (device flow)
echo "Follow the URL and enter the code displayed"
kiro auth login
# Verify authentication
kiro auth whoami
# Set up SSH agent forwarding (optional)
echo 'Host dev-server' >> ~/.ssh/config
echo ' ForwardAgent yes' >> ~/.ssh/config
echo "✓ Kiro headless setup complete"
CI/CD integration example:
# .github/workflows/deploy.yml
name: Deploy with Kiro
on: [push]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install Kiro
run: curl -fsSL https://kiro.dev/install.sh | bash
- name: Authenticate Kiro (service account)
env:
KIRO_API_KEY: ${{ secrets.KIRO_API_KEY }}
run: kiro auth login --api-key $KIRO_API_KEY
- name: Run deployment
run: kiro "deploy to staging using docker-compose"
This setup unlocks server-side AI automation where Kiro can manage infrastructure, run deployments, and execute maintenance tasks—all without human intervention.
The Core Question You’re Answering
“How do I use an AI coding assistant on remote servers that don’t have browsers or GUIs?”
Before you start coding, consider: Most cloud services use OAuth for authentication, which requires opening a browser to authorize. But servers don’t have browsers—they’re headless. The device flow (RFC 8628) solves this by splitting authentication into two steps: the headless device gets a code, and you enter that code on any browser-enabled device. This project teaches you to bridge local and remote authentication workflows.
Concepts You Must Understand First
Stop and research these before coding:
- OAuth Device Authorization Grant (Device Flow)
- What is the device code flow (RFC 8628)?
- How does it differ from standard OAuth (authorization code flow)?
- What are the steps: device code request → user authorization → token polling?
- How do you handle timeout (user doesn’t authorize within 10 minutes)?
- Book Reference: “OAuth 2 in Action” by Justin Richer - Ch. 12 (Device Flow)
- SSH Agent Forwarding
- What is `ForwardAgent yes` in SSH config?
- How do you forward local credentials to remote sessions?
- What are the security risks of agent forwarding?
- When should you use ProxyJump vs ForwardAgent?
- Book Reference: “SSH Mastery” by Michael W. Lucas - Ch. 6 (Agent Forwarding)
- Headless Environment Detection
- How do you detect if a display is available (`$DISPLAY`, `$WAYLAND_DISPLAY`)?
- How do you check if `xdg-open` or `open` commands work?
- What’s the difference between TTY and pseudo-TTY sessions?
- How do you determine if running in a CI/CD environment?
- Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 62 (Terminals)
- Credential Storage and Rotation
- Where should tokens be stored (`~/.kiro/credentials.json`)?
- What file permissions are required for security (600)?
- How do you handle token expiration and refresh?
- Should you use environment variables vs config files?
- Book Reference: “Building Secure and Reliable Systems” by Google - Ch. 9 (Secrets Management)
Questions to Guide Your Design
Before implementing, think through these:
- Authentication Method Selection
- How do you detect if a browser is available vs requiring device flow?
- Should you support both interactive and non-interactive modes?
- How do you handle service accounts (API keys) vs user accounts (OAuth)?
- What happens if the device code expires before authorization?
- Token Management
- Where do you store the access token (file, keyring, environment)?
- How do you secure the token file (permissions, encryption)?
- Do you need a refresh token for long-running servers?
- How often should you validate the token (every command, daily)?
- CI/CD Integration
- How do you authenticate in GitHub Actions/GitLab CI without interaction?
- Should you use service accounts or personal access tokens?
- How do you rotate tokens in CI without manual intervention?
- What’s the fallback if authentication fails mid-pipeline?
- Debugging Headless Issues
- How do you debug authentication failures without a GUI?
- Should you log to a file (`~/.kiro/debug.log`) or stderr?
- How do you test device flow locally before deploying to servers?
- What telemetry do you need (auth attempts, failures, timeouts)?
Thinking Exercise
Manual Device Flow Authentication Walkthrough
Before writing code, trace the OAuth device flow step by step:
Step 1: Kiro detects headless environment
$ kiro auth login
# Check for browser availability
if ! command -v xdg-open &>/dev/null && [ -z "$DISPLAY" ]; then
# Headless mode detected
use_device_flow=true
fi
Step 2: Request device code from Anthropic
POST https://anthropic.com/oauth/device/code
Content-Type: application/json
{
"client_id": "kiro-cli",
"scope": "kiro.read kiro.write"
}
Response:
{
"device_code": "DEVICE-12345",
"user_code": "ABCD-EFGH",
"verification_uri": "https://anthropic.com/device",
"expires_in": 600,
"interval": 5
}
Step 3: Display instructions to user
1. Visit: https://anthropic.com/device
2. Enter code: ABCD-EFGH
3. Authorize Kiro CLI
Step 4: Poll for authorization
# Poll every 5 seconds for up to 600 seconds
while [ $elapsed -lt 600 ]; do
POST https://anthropic.com/oauth/token
{
"grant_type": "urn:ietf:params:oauth:grant-type:device_code",
"device_code": "DEVICE-12345",
"client_id": "kiro-cli"
}
# Response if pending:
{ "error": "authorization_pending" }
# Response if approved:
{
"access_token": "kiro_...",
"token_type": "Bearer",
"expires_in": 2592000
}
sleep 5
done
Step 5: Store token securely
echo "$access_token" > ~/.kiro/credentials.json
chmod 600 ~/.kiro/credentials.json
Questions while tracing:
- What happens if the user never authorizes (timeout)?
- How do you handle polling errors (network failures)?
- Should you provide a QR code for mobile authorization?
- How do you test this without hitting real OAuth servers?
The Interview Questions They’ll Ask
Prepare to answer these:
- “Explain the OAuth device flow (RFC 8628) and how it differs from the standard authorization code flow. When would you use each?”
- “How would you securely store OAuth tokens on a headless server? What file permissions and encryption methods would you use?”
- “What strategies would you use to detect if a system is headless vs has a GUI available?”
- “How do you implement token refresh in a long-running server application without user interaction?”
- “Explain the security implications of SSH agent forwarding. When is it safe to use, and what are the alternatives?”
- “How would you design a CI/CD integration for an OAuth-authenticated tool without exposing tokens in logs?”
Hints in Layers
Hint 1: Detect Headless Environment
Check for display availability before attempting browser launch:
if [ -z "$DISPLAY" ] && ! command -v xdg-open &>/dev/null; then
echo "No browser detected. Using device code flow."
device_flow=true
fi
Hint 2: Use curl for OAuth API Calls
Request device code:
response=$(curl -s -X POST https://anthropic.com/oauth/device/code \
-H "Content-Type: application/json" \
-d '{"client_id": "kiro-cli", "scope": "kiro.read kiro.write"}')
device_code=$(echo "$response" | jq -r '.device_code')
user_code=$(echo "$response" | jq -r '.user_code')
verification_uri=$(echo "$response" | jq -r '.verification_uri')
Hint 3: Poll with Exponential Backoff
Don’t hammer the server every second:
interval=5
max_attempts=120 # 10 minutes / 5 seconds
for i in $(seq 1 $max_attempts); do
token_response=$(curl -s -X POST https://anthropic.com/oauth/token \
-d "grant_type=device_code&device_code=$device_code&client_id=kiro-cli")
if echo "$token_response" | jq -e '.access_token' &>/dev/null; then
echo "✓ Authentication successful!"
break
fi
sleep $interval
done
Hint 4: Secure Token Storage
mkdir -p ~/.kiro
echo "$access_token" > ~/.kiro/credentials.json
chmod 600 ~/.kiro/credentials.json # Owner read/write only
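Putting Hints 2 through 4 together, an end-to-end sketch might look like this. The `anthropic.com/oauth/...` endpoints are the illustrative ones from the walkthrough above, not confirmed production URLs:

```bash
#!/usr/bin/env bash
# device-flow.sh — illustrative end-to-end device flow client (endpoints are placeholders).
set -euo pipefail

# 1. Request a device code.
response=$(curl -s -X POST https://anthropic.com/oauth/device/code \
  -H "Content-Type: application/json" \
  -d '{"client_id": "kiro-cli", "scope": "kiro.read kiro.write"}')

device_code=$(echo "$response" | jq -r '.device_code')
user_code=$(echo "$response" | jq -r '.user_code')
verification_uri=$(echo "$response" | jq -r '.verification_uri')
interval=$(echo "$response" | jq -r '.interval // 5')
expires_in=$(echo "$response" | jq -r '.expires_in // 600')

echo "1. Visit: $verification_uri"
echo "2. Enter code: $user_code"

# 2. Poll until the user authorizes or the code expires.
elapsed=0
while [ "$elapsed" -lt "$expires_in" ]; do
  token_response=$(curl -s -X POST https://anthropic.com/oauth/token \
    -d "grant_type=urn:ietf:params:oauth:grant-type:device_code" \
    -d "device_code=$device_code" \
    -d "client_id=kiro-cli")

  if echo "$token_response" | jq -e '.access_token' &>/dev/null; then
    # 3. Store the token response with owner-only permissions.
    mkdir -p ~/.kiro
    (umask 077 && echo "$token_response" > ~/.kiro/credentials.json)
    echo "✓ Authentication successful!"
    exit 0
  fi

  # RFC 8628: back off when the server responds with "slow_down".
  if [ "$(echo "$token_response" | jq -r '.error // empty')" = "slow_down" ]; then
    interval=$((interval + 5))
  fi

  sleep "$interval"
  elapsed=$((elapsed + interval))
done

echo "✗ Timed out waiting for authorization." >&2
exit 1
```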
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| OAuth Device Flow | “OAuth 2 in Action” by Justin Richer | Ch. 12 (Device Flow), Ch. 6 (Client Types) |
| SSH Configuration | “SSH Mastery” by Michael W. Lucas | Ch. 6 (Agent Forwarding), Ch. 11 (SSH for Automation) |
| Headless Systems | “The Linux Programming Interface” by Michael Kerrisk | Ch. 62 (Terminals), Ch. 34 (Process Groups) |
| Secrets Management | “Building Secure and Reliable Systems” by Google | Ch. 9 (Secrets), Ch. 6 (Least Privilege) |
| CI/CD Integration | “Continuous Delivery” by Jez Humble | Ch. 14 (Advanced Version Control) |
Common Pitfalls & Debugging
Problem 1: “Device code flow times out before user authorizes”
- Why: 10-minute timeout is too short, or user didn’t see the prompt
- Fix: Send timeout reminder:
echo "⏰ 5 minutes remaining. Visit https://anthropic.com/device" - Quick test: Wait 11 minutes without authorizing, verify graceful timeout
Problem 2: “Token file is world-readable, exposing credentials”
- Why: Default file creation umask allows group/other read
- Fix: Force secure permissions:
(umask 077 && echo "$token" > ~/.kiro/credentials.json) - Quick test:
ls -l ~/.kiro/credentials.jsonshould show-rw-------
Problem 3: “Authentication works locally but fails in CI/CD”
- Why: CI runs as different user with no home directory
- Fix: Use environment variables in CI:
  export KIRO_TOKEN="$KIRO_API_KEY"
  kiro auth login --token-stdin <<< "$KIRO_TOKEN"
- Quick test: Unset `$HOME` and verify token is read from env
Problem 4: “SSH agent forwarding doesn’t work”
- Why: `ForwardAgent` not enabled or SSH key not added to agent
- Fix:
  # On local machine
  ssh-add ~/.ssh/id_rsa
  ssh-add -l  # Verify key is added
  # In ~/.ssh/config
  Host dev-server
    ForwardAgent yes
- Quick test: SSH to server, run `ssh-add -l`, verify keys are listed
Problem 5: “Device flow polling hammers the OAuth server (rate limit)”
- Why: Polling every second instead of respecting `interval` from response
- Fix: Use the `interval` field from the device code response:
  interval=$(echo "$response" | jq -r '.interval // 5')
  sleep $interval
- Quick test: Monitor network requests, verify polling rate matches interval
Definition of Done
- Kiro detects headless environment automatically (no `$DISPLAY`)
- Device code flow is initiated when browser is unavailable
- User sees clear instructions: URL to visit and code to enter
- Polling respects the `interval` from OAuth response (no rate limiting)
- Token is stored in `~/.kiro/credentials.json` with `600` permissions
- Authentication timeout (10 minutes) is handled gracefully
- Token refresh is implemented for long-running sessions
- CI/CD integration works via environment variable (`KIRO_API_KEY`)
- SSH agent forwarding is documented as an alternative
- Installation script (`setup-headless-kiro.sh`) is provided
- Documentation explains how to troubleshoot headless auth failures
Project 22: “The SSH Tunnel Agent” — Networking
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | SSH Config |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Networking |
What you’ll build: Run Kiro locally but execute commands remotely via ssh.
Why it teaches Hybrid Workflows: Brain local, execution remote.
Success criteria:
- A deploy task runs end-to-end on a remote host.
Real World Outcome
You’ll have a hybrid workflow where Kiro runs on your local machine (with GUI, editor, browser) but executes commands on remote servers via SSH. This combines local intelligence with remote execution power:
Local Kiro, remote execution:
# On your laptop
$ cat ~/.kiro/config.json
{
"remote_execution": {
"enabled": true,
"host": "prod-server.company.com",
"user": "deploy",
"tools": ["Bash", "Read", "Write", "Edit"]
}
}
$ kiro "deploy the latest version to production"
[Kiro running locally] Analyzing deployment strategy...
[SSH Tunnel] Connecting to prod-server.company.com...
[SSH Tunnel] Forwarding Bash tool execution...
[Remote Bash] ssh deploy@prod-server.company.com
$ cd /var/www/app
$ git pull origin main
$ npm install --production
$ pm2 restart app
✓ Deployment complete
[Kiro running locally] Deployment successful. Health check passing.
Transparent remote file access:
$ kiro "show me the nginx config on the production server"
[Kiro] Reading remote file via SSH...
[Remote Read Tool]
ssh deploy@prod-server.company.com 'cat /etc/nginx/nginx.conf'
[Kiro displays config and suggests improvements]
$ kiro "update the worker_processes to 4"
[Remote Edit Tool]
ssh deploy@prod-server.company.com 'cat > /tmp/nginx.conf.new << EOF
worker_processes 4;
...
EOF && sudo mv /tmp/nginx.conf.new /etc/nginx/nginx.conf'
[Remote Bash]
ssh deploy@prod-server.company.com 'sudo nginx -s reload'
✓ Configuration updated and reloaded
SSH config for seamless tunneling:
$ cat ~/.ssh/config
Host prod-server
HostName prod-server.company.com
User deploy
Port 22
IdentityFile ~/.ssh/deploy_key
ForwardAgent yes
ControlMaster auto
ControlPath ~/.ssh/control-%r@%h:%p
ControlPersist 10m
Host staging-server
HostName staging.company.com
User deploy
ProxyJump bastion.company.com
LocalForward 5432 localhost:5432
This setup enables “brain local, muscle remote” workflows where you get local responsiveness with remote execution power.
The Core Question You’re Answering
“How do I leverage my local development environment while executing commands on remote production servers?”
Before you start coding, consider: Installing Kiro on every server is impractical and risky. Running Kiro locally but SSHing to execute each command is slow and error-prone. SSH tunneling with ControlMaster lets you maintain a single persistent connection, forward tool execution transparently, and keep your local editor/browser while working on remote systems. This project teaches you to build hybrid architectures that combine local and remote strengths.
Concepts You Must Understand First
Stop and research these before coding:
- SSH ControlMaster (Connection Multiplexing)
- What is ControlMaster and how does it reuse SSH connections?
- How do you configure ControlPath for persistent sockets?
- What is ControlPersist and when does it close connections?
- How do you debug stuck control sockets (`-O check`, `-O exit`)?
- Book Reference: “SSH Mastery” by Michael W. Lucas - Ch. 8 (Multiplexing)
- Remote Command Execution Patterns
- How do you execute a single command via SSH (`ssh host 'command'`)?
- How do you handle stdin/stdout redirection over SSH?
- What’s the difference between `ssh -t` (pseudo-TTY) vs non-interactive?
- How do you escape shell metacharacters in remote commands?
- Book Reference: “Unix Network Programming” by W. Richard Stevens - Ch. 19 (Remote Execution)
- SSH Port Forwarding (Tunneling)
- What is local forwarding (`-L`) vs remote forwarding (`-R`)?
- How do you forward multiple ports simultaneously?
- What is dynamic forwarding (`-D`) for SOCKS proxy?
- How do you debug forwarding failures (`-v` verbose mode)?
- Book Reference: “SSH Mastery” by Michael W. Lucas - Ch. 7 (Forwarding)
- Tool Execution Proxying
- How do you intercept tool calls and route them to SSH?
- Should you proxy all tools or only specific ones (Bash, Read, Write)?
- How do you handle tool failures (network errors, timeouts)?
- What about file path translation (local vs remote paths)?
- Book Reference: Kiro CLI docs - Remote Execution Configuration
Questions to Guide Your Design
Before implementing, think through these:
- Tool Selection for Remote Execution
- Which tools should execute remotely (Bash, Read, Write, Edit)?
- Should Grep run remotely or locally after fetching files?
- What about tools that need local state (TodoWrite, EnterPlanMode)?
- How do you handle mixed workflows (some local, some remote)?
- Connection Management
- Do you open a new SSH connection per tool call or reuse one?
- How do you detect connection failures and retry?
- Should you establish the connection lazily (on first use) or eagerly?
- What’s the timeout for idle connections (ControlPersist)?
- Path Translation
- How do you map local file paths to remote paths?
- Do you assume identical directory structures?
- How do you handle absolute vs relative paths?
- What about symlinks that resolve differently locally vs remotely?
- Security and Permissions
- Should you use password auth or key-based auth?
- How do you handle sudo commands that require passwords?
- Do you need to validate the remote host’s fingerprint?
- How do you prevent command injection via shell escaping?
Thinking Exercise
Manual SSH Tunnel Execution Trace
Before writing code, trace how a tool call is proxied through SSH:
Scenario: Kiro executes a Bash command remotely
Step 1: User asks Kiro to deploy
$ kiro "deploy the app to production"
Kiro decides: Run `npm run build && pm2 restart app`
Step 2: Kiro invokes Bash tool
Tool: Bash
Arguments: {
"command": "npm run build && pm2 restart app"
}
Step 3: Remote execution hook intercepts
// Hook detects remote execution is enabled
if (config.remote_execution.enabled) {
if (config.remote_execution.tools.includes('Bash')) {
execute_remotely(tool, args);
}
}
Step 4: Build SSH command
ssh_command = [
'ssh',
'-o', 'ControlMaster=auto',
'-o', 'ControlPath=~/.ssh/control-%r@%h:%p',
'-o', 'ControlPersist=10m',
'deploy@prod-server.company.com',
'cd /var/www/app && npm run build && pm2 restart app'
]
Step 5: Execute via SSH
$ ssh deploy@prod-server.company.com 'cd /var/www/app && npm run build && pm2 restart app'
# SSH reuses existing connection via ControlMaster socket
# Output is streamed back to local Kiro
Step 6: Return result to Kiro
{
"exitCode": 0,
"stdout": "Build complete. PM2 restarted app.",
"stderr": "",
"duration": 12345
}
Questions while tracing:
- How do you handle commands that need interactive input (sudo passwords)?
- What if the SSH connection breaks mid-execution?
- How do you capture real-time output (streaming vs buffered)?
- What if the remote command takes hours—do you keep the connection open?
The Interview Questions They’ll Ask
Prepare to answer these:
- “Explain SSH ControlMaster and how it enables connection multiplexing. What are the performance benefits?”
- “How would you handle shell escaping when passing user-generated commands over SSH to prevent command injection?”
- “What are the differences between SSH local forwarding (-L), remote forwarding (-R), and dynamic forwarding (-D)? When would you use each?”
- “How would you design a retry mechanism for tool execution that fails due to transient network errors?”
- “Explain the security implications of SSH agent forwarding. How would you mitigate the risks?”
- “How would you implement path translation for tools that operate on files when local and remote directory structures differ?”
Hints in Layers
Hint 1: Configure SSH ControlMaster
Add to ~/.ssh/config:
Host prod-server
HostName prod-server.company.com
User deploy
ControlMaster auto
ControlPath ~/.ssh/control-%r@%h:%p
ControlPersist 10m
This creates a persistent connection socket that’s reused for all subsequent SSH commands.
Hint 2: Proxy Bash Tool Calls
Intercept the Bash tool and route it to SSH:
# In a hook or wrapper script
original_command="$1"
ssh prod-server "cd /app && $original_command"
Hint 3: Test Connection Reuse
Verify ControlMaster is working:
# First connection (slow, establishes socket)
time ssh prod-server 'echo hello' # ~500ms
# Subsequent connections (fast, reuse socket)
time ssh prod-server 'echo hello' # ~50ms
Hint 4: Handle Shell Escaping
Use printf %q to safely escape commands:
safe_command=$(printf '%q' "$user_command")
ssh prod-server "bash -c $safe_command"
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| SSH Multiplexing | “SSH Mastery” by Michael W. Lucas | Ch. 8 (Multiplexing), Ch. 7 (Forwarding) |
| Remote Execution | “Unix Network Programming” by W. Richard Stevens | Ch. 19 (Remote Execution) |
| Shell Escaping | “The Linux Command Line” by William Shotts | Ch. 35 (Strings and Numbers) |
| SSH Configuration | “SSH: The Secure Shell” by Barrett, Silverman, Byrnes | Ch. 7 (Advanced Client Use) |
| Networking Basics | “TCP/IP Illustrated, Volume 1” by W. Richard Stevens | Ch. 2 (The Internet Protocol) |
Common Pitfalls & Debugging
Problem 1: “Each SSH command takes 500ms, making Kiro unbearably slow”
- Why: Opening new SSH connection for every command
- Fix: Enable ControlMaster connection reuse:
  Host prod-server
    ControlMaster auto
    ControlPath ~/.ssh/control-%r@%h:%p
    ControlPersist 10m
- Quick test: Time 10 rapid SSH commands—should be <100ms each after first
Problem 2: “Commands fail with ‘command not found’ on remote host”
- Why: SSH non-interactive sessions don’t source `.bashrc`
- Fix: Source profile or use login shell:
  ssh prod-server 'source ~/.bashrc && npm run build'
  # OR
  ssh -t prod-server 'npm run build'  # Force pseudo-TTY
- Quick test: `ssh prod-server 'echo $PATH'` vs `ssh -t prod-server 'echo $PATH'`
Problem 3: “Command injection via user input”
- Why: User command contains shell metacharacters (`;`, `|`, `&&`)
- Fix: Use parameterized execution or escape properly:
  # BAD: injection risk
  ssh prod-server "rm $user_file"
  # GOOD:
  safe_file=$(printf '%q' "$user_file")
  ssh prod-server "rm $safe_file"
- Quick test: Try `user_file="; rm -rf /"` and verify it’s escaped
Problem 4: “Stuck control socket prevents new connections”
- Why: ControlMaster socket is hung or orphaned
- Fix: Kill stuck socket:
  ssh -O exit prod-server
  # OR manually:
  rm ~/.ssh/control-deploy@prod-server.company.com:22
- Quick test: `ssh -O check prod-server` shows socket status
Problem 5: “Remote commands don’t stream output, buffered until completion”
- Why: stdout is buffered when not connected to a TTY
- Fix: Use `stdbuf` or `script` to force line buffering:
  ssh prod-server 'stdbuf -oL npm run build'
  # OR force pseudo-TTY:
  ssh -t prod-server 'npm run build'
- Quick test: Long-running command should show incremental output
Definition of Done
- SSH ControlMaster is configured for connection reuse
- Bash tool calls are proxied to remote host via SSH
- Read/Write/Edit tools can operate on remote files
- Tool execution respects working directory on remote host
- Shell commands are properly escaped to prevent injection
- ControlPersist keeps connections open for 10 minutes
- Failed SSH connections are retried with exponential backoff
- Remote execution can be toggled on/off via config file
- Path translation handles local vs remote directory structures
- Streaming output works for long-running commands
- Documentation explains SSH config setup and troubleshooting
- Supports bastion/jump hosts for accessing internal servers
Project 23: “The Corporate Proxy Navigator” — Enterprise Networking
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Env Vars / Certs |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Enterprise Networking |
What you’ll build: Configure Kiro to use HTTPS_PROXY and trust a custom root CA.
Why it teaches Enterprise Readiness: Most enterprise failures happen here.
Success criteria:
- Kiro can reach LLM endpoints through the proxy.
Real World Outcome
You’ll configure Kiro to work in a corporate environment with an HTTPS proxy and custom SSL certificates. When you run Kiro, it will successfully connect to Anthropic’s API through the corporate proxy without certificate errors.
Example Configuration:
# Set proxy environment variables
export HTTPS_PROXY=http://proxy.company.com:8080
export HTTP_PROXY=http://proxy.company.com:8080
export NO_PROXY=localhost,127.0.0.1,.company.local
# Trust custom root CA (macOS)
sudo security add-trusted-cert -d -r trustRoot \
-k /Library/Keychains/System.keychain \
/path/to/company-root-ca.crt
# Trust custom root CA (Linux)
sudo cp /path/to/company-root-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates
# Verify proxy connectivity
curl -I https://api.anthropic.com/v1/messages
# HTTP/1.1 200 OK (via proxy)
# Run Kiro (should work without certificate errors)
kiro "list files in current directory"
Output you’ll see:
$ kiro "analyze the recent commits"
✓ Connected to Anthropic API via proxy.company.com:8080
✓ SSL certificate verified (CN=proxy.company.com, Issuer=CompanyRootCA)
Analyzing commits...
[Output shows Kiro successfully working through the proxy]
Troubleshooting verification:
# Verify certificate chain
openssl s_client -connect api.anthropic.com:443 -proxy proxy.company.com:8080 \
-showcerts 2>&1 | grep -A 2 "Verify return code"
# Verify return code: 0 (ok)
# Test proxy authentication (if required)
export HTTPS_PROXY=http://username:password@proxy.company.com:8080
# Debug SSL handshake
SSL_DEBUG=true kiro "test command"
You’re configuring the same environment that 80% of enterprise developers face daily.
The Core Question You’re Answering
“Why does my LLM tool work at home but fail at the office, and how do I make it work behind a corporate proxy with SSL inspection?”
Before you configure anything, understand this: Corporate networks intercept HTTPS traffic for security monitoring. Your computer sees a certificate signed by “CompanyRootCA” instead of the real website’s certificate. Without trusting that CA, every HTTPS connection fails with “certificate validation error.”
The proxy configuration and CA trust setup are not optional workarounds—they’re the standard way enterprise tools connect to the internet.
Concepts You Must Understand First
Stop and research these before configuring:
- HTTPS Proxies and CONNECT Method
- How does an HTTPS proxy tunnel encrypted traffic?
- What is the HTTP CONNECT method and why is it used for HTTPS?
- How does the client maintain end-to-end encryption through the proxy?
- Book Reference: “Computer Networks” by Andrew S. Tanenbaum - Ch. 7 (Application Layer)
- SSL/TLS Certificate Chains and Trust
- What is a certificate chain (root CA → intermediate CA → leaf certificate)?
- How does the OS certificate store work (System.keychain on macOS, /etc/ssl/certs on Linux)?
- What happens during SSL inspection (man-in-the-middle by the proxy)?
- Book Reference: “Serious Cryptography” by Jean-Philippe Aumasson - Ch. 13 (TLS)
- Environment Variables and Tool Configuration
- What is the precedence order (HTTPS_PROXY vs https_proxy vs tool-specific config)?
- How does NO_PROXY prevent proxy usage for internal domains?
- Why do some tools ignore environment variables (and how to configure them)?
- Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 6 (Environment)
- Proxy Authentication (Basic, NTLM, Kerberos)
- How does Basic Auth work (username:password in URL)?
- What is NTLM authentication and why is it used in Windows environments?
- How does Kerberos Single Sign-On (SSO) work with proxies?
- Book Reference: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens - Ch. 18 (HTTP)
- Debugging Network Connectivity
- How do you trace SSL handshake failures (openssl s_client)?
- How do you verify proxy connectivity (curl -I -x)?
- How do you capture and analyze HTTPS traffic (mitmproxy, Wireshark)?
- Book Reference: “Practical Packet Analysis” by Chris Sanders - Ch. 5 (HTTP/HTTPS)
- Certificate Pinning and Bypass Strategies
- What is certificate pinning and why do some tools use it?
- How do you disable pinning for enterprise proxies (NODE_TLS_REJECT_UNAUTHORIZED)?
- What are the security risks of disabling certificate validation?
- Book Reference: “Security Engineering” by Ross Anderson - Ch. 21 (Network Attacks)
Questions to Guide Your Design
Before configuring, think through these:
- Proxy Discovery and Configuration
- How do you automatically detect the corporate proxy (PAC file, WPAD)?
- Should you use environment variables or tool-specific config files?
- How do you handle proxy authentication without exposing credentials?
- What happens when you VPN into the network vs. are physically on-site?
- Certificate Trust Strategy
- Should you trust the root CA system-wide or per-application?
- How do you export the root CA from the browser (Chrome, Firefox)?
- What format does the certificate need to be (PEM, DER, PKCS#12)?
- How do you update the certificate when it rotates annually?
- Handling Proxy Failures
- How do you detect when the proxy is unreachable (timeout vs. connection refused)?
- Should you fall back to direct connection, or fail fast?
- How do you handle 407 Proxy Authentication Required errors?
- What logging helps diagnose proxy issues quickly?
- NO_PROXY Configuration
- Which domains should bypass the proxy (.company.local, localhost, 127.0.0.1)?
- How do you handle wildcard domains (*.internal.company.com)?
- Should you bypass the proxy for local development servers?
- How do you test that NO_PROXY is working correctly?
- Cross-Platform Compatibility
- How do you make the same config work on macOS, Linux, and Windows?
- Where does each OS store trusted root certificates?
- How do you handle case sensitivity (HTTPS_PROXY vs https_proxy on Windows)?
- What about containerized environments (Docker, Kubernetes)?
Thinking Exercise
Trace an HTTPS Request Through a Corporate Proxy
Before configuring anything, manually trace how an HTTPS request flows through a corporate proxy with SSL inspection:
Client Proxy Destination
| | |
|-- HTTP CONNECT api.anthropic.com:443 ------>|
| | |
|<-------- 200 Connection Established --------|
| | |
|-- TLS ClientHello -->| |
| (SNI: api.anthropic.com) |
| | |
|<-- TLS ServerHello --| |
| (Cert: CN=api.anthropic.com, |
| Issuer=CompanyRootCA) |
| | |
|-- TLS Finished ----->| |
| |-- TLS ClientHello --->|
| | (SNI: api.anthropic.com)
| | |
| |<-- TLS ServerHello ---|
| | (Cert: CN=api.anthropic.com,
| | Issuer=Let's Encrypt)
| | |
|-- HTTP Request ----->|-- HTTP Request ------>|
| (encrypted by | (re-encrypted by |
| client-proxy TLS)| proxy-server TLS) |
| | |
Questions while tracing:
- At what point does the client see the fake certificate signed by CompanyRootCA?
- Why does the proxy need to decrypt and re-encrypt the traffic (SSL inspection)?
- What would happen if CompanyRootCA is not trusted by the client?
- How does the client verify the hostname matches (SNI vs. certificate CN)?
- Why doesn’t the client see the real Let’s Encrypt certificate?
Manual test:
# 1. Try without trusting the CA (should fail)
curl https://api.anthropic.com/v1/messages -I
# curl: (60) SSL certificate problem: unable to get local issuer certificate
# 2. Export the proxy's CA certificate from Chrome
# Settings → Privacy and security → Security → Manage certificates
# → Authorities → Export "CompanyRootCA"
# 3. Trust the CA and retry
export SSL_CERT_FILE=/path/to/company-root-ca.crt
curl https://api.anthropic.com/v1/messages -I
# HTTP/1.1 200 OK (via proxy)
The Interview Questions They’ll Ask
Prepare to answer these:
- “Why do HTTPS connections fail with ‘certificate validation error’ behind a corporate proxy, even though the proxy is configured correctly?”
- “What is the difference between `HTTP_PROXY` and `HTTPS_PROXY` environment variables, and when is each used?”
- “A user reports that some tools work through the proxy but others don’t. How would you diagnose this?”
- “How do you securely configure proxy authentication without hardcoding credentials in environment variables?”
- “What is a PAC (Proxy Auto-Config) file, and how does it differ from manually setting HTTPS_PROXY?”
- “Why should you never set `NODE_TLS_REJECT_UNAUTHORIZED=0` in production, even if it ‘fixes’ the certificate error?”
Hints in Layers
Hint 1: Start with Proxy Discovery
Use the browser’s network settings to find the proxy configuration. On Windows, check “Internet Options → Connections → LAN Settings.” On macOS, check “System Preferences → Network → Advanced → Proxies.”
Hint 2: Test Connectivity Before Configuring Kiro
First verify that curl works through the proxy. If curl fails with certificate errors, no amount of Kiro configuration will help—you need to fix the CA trust issue first.
Hint 3: Export and Inspect the Certificate
Use openssl s_client to see the actual certificate chain:
openssl s_client -connect api.anthropic.com:443 -proxy proxy.company.com:8080 -showcerts
Look for “Issuer: CN=CompanyRootCA” (not “Issuer: Let’s Encrypt”). That certificate needs to be trusted.
Hint 4: Platform-Specific CA Trust Commands
Each OS has a different way to trust certificates:
- macOS: `security add-trusted-cert`
- Linux (Debian/Ubuntu): Copy to `/usr/local/share/ca-certificates/` and run `update-ca-certificates`
- Linux (RHEL/CentOS): Copy to `/etc/pki/ca-trust/source/anchors/` and run `update-ca-trust`
- Node.js/Python tools: May need `NODE_EXTRA_CA_CERTS` or `REQUESTS_CA_BUNDLE`
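Pulling the proxy variables and CA paths into one place, a shell-profile snippet might look like this (proxy host, port, and certificate path are placeholders for your environment):

```bash
# ~/.bashrc or ~/.zshrc — corporate proxy setup (placeholders: adjust to your network)
export HTTPS_PROXY=http://proxy.company.com:8080
export HTTP_PROXY=http://proxy.company.com:8080
export NO_PROXY=localhost,127.0.0.1,.company.local

# Extra CA bundles for tools that ignore the OS trust store
export NODE_EXTRA_CA_CERTS=/path/to/company-root-ca.crt   # Node.js-based tools
export REQUESTS_CA_BUNDLE=/path/to/company-root-ca.crt    # Python requests

# Quick sanity check: should print an HTTP status line, not a certificate error
curl -sS -I https://api.anthropic.com | head -1
```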
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| HTTPS Proxies | “Computer Networks” by Tanenbaum | Ch. 7 (Application Layer) |
| SSL/TLS Internals | “Serious Cryptography” by Aumasson | Ch. 13 (TLS) |
| Certificate Chains | “Bulletproof SSL and TLS” by Ivan Ristić | Ch. 4 (PKI) |
| Environment Variables | “The Linux Programming Interface” by Kerrisk | Ch. 6 (Environment) |
| Debugging HTTPS | “TCP/IP Illustrated, Vol. 1” by Stevens | Ch. 18 (HTTP) |
| Network Troubleshooting | “Practical Packet Analysis” by Sanders | Ch. 5 (HTTP/HTTPS) |
Common Pitfalls & Debugging
Problem 1: “curl: (60) SSL certificate problem: unable to get local issuer certificate”
- Why: The corporate proxy is presenting a certificate signed by an untrusted root CA.
- Fix: Export the root CA from your browser and add it to the system trust store.
- Quick test:
openssl s_client -connect api.anthropic.com:443 -proxy proxy.company.com:8080 -showcerts | grep Issuer
Problem 2: “Proxy authentication required (407)”
- Why: The proxy requires username/password, but none was provided.
- Fix: Add credentials to the proxy URL:
  export HTTPS_PROXY=http://username:password@proxy.company.com:8080
- Quick test: `curl -I -x http://username:password@proxy.company.com:8080 https://api.anthropic.com`
Problem 3: “Connection timeout when accessing internal domains”
- Why: Internal domains should bypass the proxy, but NO_PROXY is not configured.
- Fix: `export NO_PROXY=localhost,127.0.0.1,.company.local,*.internal.company.com`
- Quick test: `curl -I http://internal.company.local` (should connect directly, not via proxy)
Problem 4: “Works in browser but not in terminal”
- Why: The browser uses the OS proxy settings, but terminal tools use environment variables.
- Fix: Set `HTTPS_PROXY` in your shell profile (`~/.bashrc`, `~/.zshrc`).
- Quick test: `echo $HTTPS_PROXY` (should show the proxy URL)
Problem 5: “Certificate works for curl but not for Node.js tools”
- Why: Node.js uses its own certificate store, separate from the OS.
- Fix: `export NODE_EXTRA_CA_CERTS=/path/to/company-root-ca.crt`
- Quick test: `node -e "require('https').get('https://api.anthropic.com', r => console.log(r.statusCode))"`
Problem 6: “Proxy works on-site but fails when VPNed from home”
- Why: VPN routing may bypass the proxy, or the proxy may only be accessible on the internal network.
- Fix: Check if the VPN provides split-tunnel (some traffic via VPN, some direct) or full-tunnel routing.
- Quick test: `curl -I https://api.anthropic.com` (compare response headers on-site vs. VPN)
Definition of Done
- Kiro successfully connects to Anthropic API through the corporate proxy
- No SSL certificate validation errors appear
- Internal company domains bypass the proxy (via NO_PROXY)
- Proxy authentication (if required) works without exposing credentials in shell history
- Configuration persists across terminal sessions (added to shell profile)
- Works consistently on macOS, Linux, and Windows (if multi-platform team)
- Fallback mechanism exists if proxy is unreachable (manual switch or direct connection)
- Documentation includes troubleshooting steps for common certificate errors
- Tested with both Kiro and curl to ensure system-wide proxy configuration
- No hardcoded credentials in config files (use keychain or environment variables)
Project 24: “The Secret Sanitizer Hook” — Secrets Management
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Python (TruffleHog / Gitleaks) |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 3. Service & Support (Security) |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Secrets Management |
What you’ll build: A PostToolUse hook that scans modified files for secrets.
Why it teaches Safety: Prevents accidental secret leakage.
Success criteria:
- A dummy key is detected and blocked.
Real World Outcome
You’ll create a PostToolUse hook that automatically scans every file written or modified by Kiro for secrets (API keys, passwords, tokens, private keys). When a secret is detected, the hook blocks the operation and alerts you immediately.
Example Hook Execution:
# Kiro writes a file containing a secret
$ kiro "create a .env file with DATABASE_URL=postgres://user:pass@localhost/db"
[Kiro writes .env file]
🚨 SECRET DETECTED in .env (line 1)
Type: PostgreSQL Connection String
Pattern: postgres://[user]:[password]@[host]/[db]
DATABASE_URL=postgres://user:pass@localhost/db
^^^^^^^^
❌ BLOCKED: File write operation prevented.
Recommendations:
1. Use environment variables instead: DATABASE_URL="${DATABASE_URL}"
2. Add .env to .gitignore
3. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault)
[Hook exits with code 1 - operation aborted]
Example scan output:
$ python ~/.kiro/hooks/secret-sanitizer.py
Scanning files modified in last operation...
✓ src/app.py - Clean
✓ src/utils.py - Clean
🚨 config.yaml - 2 secrets found
Line 12: AWS Access Key (AKIA...)
Line 13: AWS Secret Key (40-char base64 string)
✓ README.md - Clean
Summary: 2 secrets detected in 1 file
Action: BLOCK operation (exit code 1)
Integration with Git:
# After blocking the write, show what would have been committed
$ git diff
+AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
+AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
❌ These credentials would have been committed to git!
You’re implementing the same secret detection that GitHub, GitLab, and Bitbucket use to prevent credential leaks.
The Core Question You’re Answering
“How do you prevent developers (and AI agents) from accidentally committing secrets to version control or writing them to unencrypted config files?”
Before you build any detection, understand this: Secret leakage is one of the most common security incidents. Attackers scan public repositories for AWS keys, database passwords, and API tokens. Once a secret is committed to git history, it’s permanently exposed—even if you delete it in a later commit.
Your hook acts as a safety net that catches secrets before they reach git, config files, or logs.
Concepts You Must Understand First
Stop and research these before coding:
- Secret Detection Patterns (Entropy, Regex, Signatures)
- What is Shannon entropy and how is it used to detect random strings (API keys)? (see the sketch after this list)
- How do you write regex patterns for AWS keys (AKIA…), GitHub tokens (ghp_…), etc.?
- What are false positives (detecting “password” in code comments) and how do you reduce them?
- Book Reference: “Practical Cryptography” by Niels Ferguson - Ch. 2 (Randomness)
- Git Internals and Hooks
- What is the difference between PostToolUse hooks (Kiro) vs. pre-commit hooks (Git)?
- How do you scan only the files modified in the last operation (git diff --name-only)?
- Why can’t you just delete a secret from git history (it’s still in reflog and old commits)?
- Book Reference: “Pro Git” by Scott Chacon - Ch. 10 (Git Internals)
- Secrets Management Best Practices
- What is the principle of least privilege (why you shouldn’t use root credentials)?
- How do environment variables (os.environ) protect secrets from being committed?
- What are secrets managers (AWS Secrets Manager, Vault, 1Password) and when should you use them?
- Book Reference: “Security Engineering” by Ross Anderson - Ch. 4 (Cryptographic Protocols)
- TruffleHog and Gitleaks Internals
- How does TruffleHog scan git history for high-entropy strings?
- What is the difference between regex-based detection and entropy-based detection?
- How do you configure custom patterns (YAML rules for company-specific secrets)?
- Book Reference: “Practical Malware Analysis” by Sikorski - Ch. 13 (Automated Analysis)
- False Positive Reduction
- How do you distinguish between real secrets and test/example credentials?
- What is allowlisting (explicitly marking known-safe strings)?
- How do you handle encrypted secrets (ansible-vault, sops) vs. plaintext?
- Book Reference: “Building Secure and Reliable Systems” by Google - Ch. 14 (Security Monitoring)
- Incident Response for Leaked Secrets
- What do you do if a secret is detected after commit (rotate immediately)?
- How do you scan the entire git history for secrets (git log -p)?
- What is the MITRE ATT&CK framework for credential access (T1552)?
- Book Reference: “The Art of Memory Forensics” by Ligh - Ch. 8 (Malware Analysis)
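To make the entropy idea from the first concept concrete, here is a minimal sketch of Shannon entropy over candidate strings. The length and bits-per-character thresholds are illustrative assumptions for this exercise, not the exact values TruffleHog or Gitleaks use.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character: -sum(p * log2(p)) over character frequencies."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_like_secret(token: str, min_len: int = 20, threshold: float = 4.0) -> bool:
    """Heuristic: long tokens with high entropy (random-looking) are candidate secrets.
    The 20-char / 4.0 bits-per-char thresholds are illustrative; real scanners tune them per charset."""
    return len(token) >= min_len and shannon_entropy(token) >= threshold

print(shannon_entropy("password123"))                               # low (~3.3 bits/char)
print(shannon_entropy("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))  # high (~5 bits/char)
print(looks_like_secret("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
```

This is also where false positives come from: prose and identifiers have low entropy, while base64 blobs, UUIDs, and real keys all score high, so entropy alone needs to be combined with regex signatures and allowlists.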
Questions to Guide Your Design
Before implementing, think through these:
- Detection Strategy
- Should you scan all files or only modified files (performance trade-off)?
- Do you run detection on every tool use or only on file writes (Edit, Write)?
- Should you scan file content or git diffs (diffs are faster but may miss secrets)?
- How do you handle binary files (images, PDFs) that might contain secrets?
- Pattern Library
- Which secret types are highest priority (AWS, GitHub, Stripe, OpenAI)?
- Should you use a pre-built pattern library (Gitleaks rules) or custom regex?
- How do you detect generic secrets (40+ char random strings) vs. specific formats?
- Should you detect passwords in URLs (https://user:pass@example.com)?
- Blocking vs. Warning
- Should the hook block the operation (exit code 1) or just warn and continue?
- Do you block on all secrets or only high-confidence detections?
- Should you allow users to override the block (confirmation prompt)?
- How do you handle secrets in test fixtures (tests/fixtures/dummy-key.txt)?
- User Experience
- How do you show which line contains the secret without displaying the secret itself?
- Should you suggest remediation steps (use environment variables, add to .gitignore)?
- Do you integrate with the terminal (red error messages, visual alerts)?
- Should you log detections to a file for security auditing?
- Performance Optimization
- How do you avoid scanning the same file multiple times in one session?
- Should you cache detection results (file hash → scan result)? (see the sketch after this list)
- Do you run scans in parallel (ThreadPoolExecutor) for large codebases?
- How do you handle large files (> 1MB) that slow down scanning?
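For the caching question above, a minimal sketch of a hash-keyed result cache might look like this; the cache path and the scan_file callable are assumptions for illustration, not part of Kiro.

```python
import hashlib
import json
from pathlib import Path

CACHE_PATH = Path.home() / ".kiro" / "secret-scan-cache.json"  # assumed location

def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def cached_scan(path: Path, scan_file) -> list:
    """Re-use previous results for unchanged content.
    `scan_file(path) -> list` is whatever scanner you wrap (e.g. a gitleaks invocation)."""
    cache = json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}
    digest = file_sha256(path)
    if digest in cache:
        return cache[digest]          # unchanged content: skip the expensive scan
    findings = scan_file(path)        # new or changed content: scan and remember the result
    cache[digest] = findings
    CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    CACHE_PATH.write_text(json.dumps(cache))
    return findings
```

Keying on the content hash (rather than the path) means a file that is rewritten with identical content is still a cache hit, while any edit invalidates it automatically.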
Thinking Exercise
Analyze a Real Secret Leak Scenario
Before coding, manually trace how a secret might leak through Kiro’s workflow:
Scenario: Developer asks Kiro to deploy to AWS
# User prompt
$ kiro "deploy the app to AWS using my credentials"
# Kiro (without the hook) might write:
# deploy.sh
#!/bin/bash
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws s3 sync ./build s3://my-bucket
Where secrets could leak:
- deploy.sh file (hardcoded credentials)
- Git history (if deploy.sh is committed)
- Shell history (~/.bash_history if script is run manually)
- Logs (if AWS CLI logs credentials in error messages)
- CI/CD logs (GitHub Actions logs show environment variables)
Your hook’s detection points:
# PostToolUse hook fires after Write tool creates deploy.sh
event = {
'tool': 'Write',
'input': {'file_path': 'deploy.sh', 'content': '#!/bin/bash\nexport AWS_ACCESS_KEY_ID=AKIA...'},
'output': {'status': 'success'}
}
# Hook scans the written file
findings = scan_file('deploy.sh')
# Finding 1: AWS Access Key (line 2, pattern: AKIA[A-Z0-9]{16})
# Finding 2: AWS Secret Key (line 3, high entropy: 40 random chars)
# Hook blocks the operation
exit(1) # Reverts the file write
Questions while tracing:
- At what point in the workflow should the hook scan for secrets?
- Should you scan the file content or the tool input parameters?
- What happens if the secret was copied from the user’s prompt (user provided it)?
- How do you prevent false positives (AKIAIOSFODNN7EXAMPLE is a documented example key)?
- Should you automatically suggest export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" as a fix?
Manual test:
# 1. Create a test file with a fake secret
echo "API_KEY=sk_test_4eC39HqLyjWDarjtT1zdp7dc" > .env
# 2. Run Gitleaks on the file
gitleaks detect --source . --verbose
# Leak detected: Generic API Key (line 1)
# 3. Allowlist the finding and re-run
#    (.gitleaksignore expects the finding's Fingerprint from the gitleaks output, not the raw secret line)
echo "<fingerprint reported by gitleaks>" >> .gitleaksignore
gitleaks detect --source . --verbose
# No leaks detected (allowlisted)
The Interview Questions They’ll Ask
Prepare to answer these:
-
“How do secret detection tools like TruffleHog distinguish between real API keys and random strings in the code?”
-
“A developer committed an AWS access key to git 50 commits ago. What steps would you take to remediate this incident?”
-
“What is the difference between entropy-based secret detection and regex-based detection? When would you use each?”
-
“How would you handle false positives, such as detecting ‘password’ in code comments or test data?”
-
“Why is deleting a secret from the latest commit insufficient to secure the repository?”
-
“What are the performance trade-offs between scanning on every file write vs. scanning only on git commit?”
Hints in Layers
Hint 1: Start with Pre-Built Tools Don’t write regex patterns from scratch. Use TruffleHog or Gitleaks, which have hundreds of pre-built patterns for common secret types (AWS, GitHub, Stripe, etc.).
Hint 2: Hook Event Structure The PostToolUse hook receives a JSON event on stdin:
{
"hookType": "PostToolUse",
"tool": {"name": "Write", "input": {"file_path": "config.yaml", "content": "..."}, "output": {"status": "success"}}
}
Extract file_path and scan it.
Hint 3: Integrate Gitleaks for Fast Scanning
# Install Gitleaks
brew install gitleaks # macOS
# or download binary from https://github.com/gitleaks/gitleaks/releases
# Scan a single file
gitleaks detect --source /path/to/file --verbose --no-git
# Parse JSON output
gitleaks detect --source . --report-format json --report-path results.json
Hint 4: Exit Code Semantics
- Exit 0: No secrets found (allow operation)
- Exit 1: Secrets found (block operation and revert changes)
- Use sys.exit(1) in Python or exit 1 in Bash
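Putting the hints together, a minimal version of the hook could look like the sketch below. It assumes the event shape from Hint 2 and that the gitleaks binary from Hint 3 is on PATH; your Kiro version may send slightly different field names, so treat them as assumptions to verify.

```python
#!/usr/bin/env python3
"""PostToolUse hook sketch: scan the file Kiro just wrote and block if secrets are found."""
import json
import subprocess
import sys
import tempfile
from pathlib import Path

def main() -> None:
    event = json.load(sys.stdin)                      # hook event arrives on stdin (see Hint 2)
    tool = event.get("tool", {})
    if tool.get("name") not in ("Write", "Edit"):     # only file-writing tools need scanning
        sys.exit(0)
    file_path = tool.get("input", {}).get("file_path")
    if not file_path or not Path(file_path).exists():
        sys.exit(0)

    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "report.json"
        # --no-git scans the path as plain files; gitleaks writes findings to the JSON report
        subprocess.run(
            ["gitleaks", "detect", "--no-git", "--source", file_path,
             "--report-format", "json", "--report-path", str(report)],
            capture_output=True, text=True,
        )
        findings = json.loads(report.read_text() or "[]") if report.exists() else []

    if findings:
        for f in findings:
            # Report the rule and line number, never the matched secret itself
            print(f"SECRET DETECTED: {f.get('RuleID')} at {file_path}:{f.get('StartLine')}",
                  file=sys.stderr)
        sys.exit(1)                                   # exit 1 = block the operation (Hint 4)
    sys.exit(0)

if __name__ == "__main__":
    main()
```

Start with this shape, then layer on the allowlist, logging, and caching ideas from the design questions once the basic block/allow behavior works.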
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Secret Detection Theory | “Practical Cryptography” by Ferguson | Ch. 2 (Randomness) |
| Git Internals | “Pro Git” by Scott Chacon | Ch. 10 (Git Internals) |
| Secrets Management | “Security Engineering” by Ross Anderson | Ch. 4 (Cryptographic Protocols) |
| Entropy Analysis | “Applied Cryptography” by Bruce Schneier | Ch. 17 (Randomness) |
| Incident Response | “The Art of Memory Forensics” by Ligh | Ch. 8 (Malware Analysis) |
| Secure Systems | “Building Secure and Reliable Systems” by Google | Ch. 14 (Security Monitoring) |
Common Pitfalls & Debugging
Problem 1: “Too many false positives (detecting ‘password’ in comments)”
- Why: Regex patterns are too broad and match non-secrets.
- Fix: Use entropy analysis (only flag strings with high randomness) or context-aware patterns (exclude comments).
- Quick test:
gitleaks detect --source . --verbose | grep "password" (review all matches)
Problem 2: “Secrets in git history not detected by the hook”
- Why: The hook only scans new changes, not the entire git history.
- Fix: Run a one-time full repo scan:
gitleaks detect --source . --verbose
- Quick test:
git log -p | grep -E 'AKIA[A-Z0-9]{16}' (manual search for AWS keys)
Problem 3: “Hook blocks valid test fixtures (tests/fixtures/dummy-key.txt)”
- Why: Test data often includes fake secrets for testing.
- Fix: Add test directories to .gitleaksignore or use a custom allowlist.
- Quick test: Create .gitleaksignore with tests/, re-run scan.
Problem 4: “Performance degradation on large repos (> 1000 files)”
- Why: Scanning every file on every tool use is too slow.
- Fix: Only scan files modified in the last operation (use git diff --name-only).
- Quick test:
time gitleaks detect --source . (measure scan time before/after optimization)
Problem 5: “Secrets in environment variables not detected”
- Why: The hook scans files, but secrets might be passed via export VAR=secret in the shell.
- Fix: Scan shell history (~/.bash_history) or intercept the Bash tool’s input.
- Quick test:
grep -E 'export.*KEY' ~/.bash_history
Problem 6: “No notification when secret is blocked”
- Why: The hook exits with code 1, but Kiro doesn’t show the hook’s stderr output.
- Fix: Write findings to a log file (~/.kiro/secret-findings.log) and show the path in the error.
- Quick test:
tail -f ~/.kiro/secret-findings.log (monitor detections in real-time)
Definition of Done
- Hook scans all files written or modified by Kiro (Write, Edit tools)
- Detects common secret types (AWS keys, GitHub tokens, API keys, database URLs)
- Blocks operation when secrets are found (exit code 1)
- Shows which line contains the secret without displaying the secret itself
- False positive rate < 5% (most detections are real secrets)
- Performance: scan completes in < 2 seconds for typical file changes
- Integrates with Gitleaks or TruffleHog for pattern matching
- Provides remediation suggestions (use environment variables, add to .gitignore)
- Logs all detections to ~/.kiro/secret-findings.log for security auditing
- Supports allowlisting (.gitleaksignore or custom allowlist file)
- Tested with dummy secrets (fake AWS keys, test database URLs) to verify detection
- Documentation explains how to override blocks for legitimate use cases
Project 25: “The Tangent Explorer” — Context Management
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | N/A (Interaction) |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 1: Beginner |
| Knowledge Area | Context Management |
What you’ll build: Simulate a debugging session, switch to a tangent, then return without polluting context.
Why it teaches Context Hygiene: Tangents keep the main thread clean.
Success criteria:
- The main context summary ignores tangent content.
Real World Outcome
You’ll learn to use Kiro’s tangent mode to isolate exploratory work, debugging, and side investigations without polluting your main conversation context. This keeps your primary task focused while allowing deep dives into related issues.
Example Workflow:
# Main conversation: Implementing user authentication
$ kiro "implement JWT-based user authentication for the API"
[Kiro starts implementing auth...]
# You notice a dependency version conflict (tangent opportunity)
$ kiro "I'm seeing a dependency conflict with jsonwebtoken. Let me investigate in a tangent."
# Kiro creates a tangent session
[TANGENT MODE: dependency-investigation]
$ kiro "show me all versions of jsonwebtoken in package-lock.json and explain the conflict"
[Kiro analyzes dependencies in tangent]
$ kiro "what's the difference between jsonwebtoken 8.x and 9.x?"
[Kiro researches in tangent]
$ kiro "update to jsonwebtoken 9.5.2 and verify tests pass"
[Kiro fixes the issue in tangent]
# Exit tangent and return to main task
$ kiro "tangent resolved, return to main"
[MAIN CONTEXT: implementing JWT-based user authentication]
[Tangent summary: Fixed jsonwebtoken version conflict -> upgraded to 9.5.2]
# Main context continues, enriched but not polluted
$ kiro "continue implementing auth with the updated jsonwebtoken library"
What you’ll see in the session history:
Main Context (auth-implementation):
├─ User: implement JWT-based auth
├─ Kiro: [implementation steps]
├─ [TANGENT: dependency-investigation]
│ └─ Summary: Upgraded jsonwebtoken 8.x → 9.5.2, tests passing
└─ Kiro: [continues with auth using new library]
Tangent Context (dependency-investigation - ISOLATED):
├─ User: show me all versions of jsonwebtoken
├─ Kiro: [detailed analysis of package-lock.json]
├─ User: what's the difference between 8.x and 9.x?
├─ Kiro: [research on breaking changes]
├─ User: update to 9.5.2 and verify tests
└─ Kiro: [upgrade + test verification]
Key benefit: The main context remains focused on “implement auth” and doesn’t include 15 messages about npm dependency resolution. Only the summary (“upgraded jsonwebtoken → 9.5.2”) is surfaced.
The Core Question You’re Answering
“How do you keep your primary conversation focused while exploring tangential issues, debugging edge cases, or researching related topics without losing all your context to noise?”
Before you use tangent mode, understand this: LLM context windows are finite (even at 200K tokens). Every message consumes context budget. If you debug a webpack config issue (20 messages) in the middle of implementing a feature (30 messages), the combined 50-message thread becomes hard to summarize and maintain focus on the original goal.
Tangent mode isolates exploratory work, keeping the main thread clean and summarizable.
Concepts You Must Understand First
Stop and research these before using tangents:
- Context Window and Token Budgets
- What is the context window size for Claude (200K tokens)?
- How many tokens does a typical message consume (prompt + response)? (see the estimate sketch after this list)
- What happens when context fills up (automatic summarization vs. truncation)?
- Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 1 (Data Models)
- Conversation Threading and Branching
- How do chat systems maintain conversation history (linear vs. tree structure)?
- What is the difference between forking a conversation and creating a tangent?
- How do you merge insights from a tangent back into the main thread?
- Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 7 (Debugging)
- Summarization and Information Compression
- How do LLMs summarize long conversations (extractive vs. abstractive summarization)?
- What information is preserved vs. discarded in a summary?
- How do you ensure critical decisions from tangents are surfaced in summaries?
- Book Reference: “Speech and Language Processing” by Jurafsky - Ch. 23 (Summarization)
- Context Hygiene Best Practices
- When should you start a tangent vs. continue in the main thread?
- How do you name tangents descriptively (for future reference)?
- What is the “two-level rule” (main + tangent, avoid nested tangents)?
- Book Reference: “Working Effectively with Legacy Code” by Feathers - Ch. 6 (Code Smells)
- Cognitive Load Management
- What is working memory capacity (7 ± 2 items) and how does it apply to chat?
- How does context switching (main → tangent → main) affect productivity?
- What is the “one primary task” principle (single focus for the main thread)?
- Book Reference: “Thinking, Fast and Slow” by Daniel Kahneman - Ch. 8 (Cognitive Effort)
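To get a rough feel for the budget numbers above, the sketch below estimates token usage with the common ~4 characters-per-token heuristic (an approximation, not Kiro's actual tokenizer) and shows how an in-line detour eats into a 200K-token window.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English prose and code."""
    return max(1, len(text) // 4)

CONTEXT_WINDOW = 200_000          # tokens, per the figure quoted above
avg_message_chars = 2_000         # assumed average prompt/response size for this estimate

main_thread = 8                   # focused auth messages (the tangent-mode example)
detour = 14                       # CORS debugging messages kept in-line instead of a tangent

per_message = estimate_tokens("x" * avg_message_chars)
used = (main_thread + detour) * per_message
print(f"~{per_message} tokens/message, {used} tokens used, "
      f"{used / CONTEXT_WINDOW:.1%} of the window on one feature plus one detour")
```

The absolute numbers are small for one detour; the point is that every detour also has to be re-read and re-summarized later, which is the cost tangent mode avoids.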
Questions to Guide Your Design
Before starting a tangent, ask yourself:
- Tangent Trigger Criteria
- Is this issue critical to the main task, or can it be deferred?
- Will exploring this now derail the main conversation for > 5 messages?
- Is this a debugging session that might involve trial-and-error (many iterations)?
- Would solving this in-line make the main thread hard to read later?
- Tangent Scope
- What specific question am I trying to answer in the tangent?
- What does “done” look like (clear exit criteria)?
- How will I bring the result back to the main thread (summary format)?
- What happens if the tangent doesn’t resolve the issue (abandon or continue)?
- Context Preservation
- What information from the main thread does the tangent need (dependencies)?
- Should the tangent have full access to the codebase, or limited scope?
- How do I avoid repeating context setup in the tangent (cached state)?
- What happens if I start another tangent while in a tangent (nesting limits)?
- Return Strategy
- How do I signal “return to main” (explicit command or automatic)?
- What summary information should be surfaced (key decisions, outcomes)?
- Do I need to re-state the main task when returning (context refresh)?
- Should the tangent remain accessible for future reference?
- Workflow Integration
- Can I start a tangent mid-operation (while Kiro is running a command)?
- How do I handle tangents in team settings (shared context)?
- Should tangents be saved in session history (for reproducibility)?
- Can I export a tangent as a standalone session (for sharing)?
Thinking Exercise
Trace a Context Pollution Scenario
Before using tangent mode, manually trace how context gets polluted without isolation:
Scenario: Implementing API authentication + debugging CORS issue
WITHOUT TANGENT MODE (polluted context):
Message 1 [Main]: Implement JWT auth for /api/login
Message 2 [Main]: [Kiro implements auth endpoint]
Message 3 [Main]: Test the endpoint with curl
Message 4 [Main]: [Kiro runs curl, gets CORS error]
Message 5 [Detour]: Why am I getting CORS errors?
Message 6 [Detour]: [Kiro explains CORS preflight]
Message 7 [Detour]: Show me the Express CORS config
Message 8 [Detour]: [Kiro shows current config]
Message 9 [Detour]: Update CORS to allow http://localhost:3000
Message 10 [Detour]: [Kiro updates config]
Message 11 [Detour]: Test again with curl
Message 12 [Detour]: [Kiro runs curl, still fails]
Message 13 [Detour]: Check if OPTIONS requests work
Message 14 [Detour]: [Kiro tests OPTIONS]
Message 15 [Detour]: Add Access-Control-Allow-Credentials header
Message 16 [Detour]: [Kiro adds header]
Message 17 [Detour]: Test one more time
Message 18 [Detour]: [Kiro tests, success!]
Message 19 [Main]: OK, now continue with JWT auth
Message 20 [Main]: [Kiro has to re-read context, slowed by CORS noise]
# Problem: Messages 5-18 (14 messages) about CORS pollute the auth implementation context
# Summary would include both auth AND CORS details, losing focus
WITH TANGENT MODE (clean context):
Main Context:
Message 1: Implement JWT auth for /api/login
Message 2: [Kiro implements auth endpoint]
Message 3: Test the endpoint
Message 4: [Kiro runs curl, gets CORS error]
Message 5: [TANGENT: cors-debugging] Fixing CORS issue...
Message 6: [Tangent resolved: Added Access-Control-Allow-Credentials header]
Message 7: Test the fixed endpoint
Message 8: [Kiro tests, success! Continues with auth]
Tangent Context (cors-debugging - ISOLATED):
Message 1: Why am I getting CORS errors?
Message 2-14: [Full CORS debugging session]
Message 15: [Exit: CORS fixed]
# Main context: 8 messages (focused on auth)
# Tangent context: 15 messages (isolated CORS debugging)
# Main summary: "Implemented JWT auth, fixed CORS in tangent"
Questions while tracing:
- At what point should you start a tangent (message 5, when you realize CORS is a side issue)?
- What information from the tangent is essential for the main context (just the fix, not the debugging process)?
- How does the main context “remember” what it was doing before the tangent?
- What would happen if you needed to debug CORS again later (can you re-open the tangent)?
- How would summarization differ (main context summary vs. full conversation summary)?
Manual test:
# 1. Start a main task
$ kiro "refactor the user service to use TypeScript"
# 2. Notice a side issue (linting error)
# Instead of debugging in-line, start a tangent
$ kiro "start tangent: fix-eslint-config"
# 3. Work in the tangent
$ kiro "why is ESLint complaining about implicit any?"
$ kiro "update tsconfig.json to enable noImplicitAny"
# 4. Exit and return to main
$ kiro "return to main"
# 5. Verify main context is clean
$ kiro "what were we working on?"
# Response: "We're refactoring the user service to TypeScript" (no ESLint details)
The Interview Questions They’ll Ask
Prepare to answer these:
-
“How do you manage context in a long-running AI agent conversation to prevent focus drift?”
-
“What is the difference between creating a new chat session and using a tangent within the same session?”
-
“A developer wants to explore three different implementation approaches for a feature. How would you structure that workflow to keep context clean?”
-
“How does context summarization work in LLMs, and what information is typically lost during summarization?”
-
“When would you NOT use a tangent (i.e., when is in-line exploration better)?”
-
“How do you balance context preservation (keeping everything) with context efficiency (keeping it focused)?”
Hints in Layers
Hint 1: Start Simple Use tangents for clear, isolated sub-tasks: debugging a dependency issue, researching an error message, exploring alternative approaches. Avoid tangents for tasks that are core to the main goal.
Hint 2: Name Tangents Descriptively Use short, descriptive names (e.g., “cors-debugging”, “dependency-conflict”, “api-error-research”). This helps when reviewing session history later.
Hint 3: Explicit Entry and Exit Always explicitly enter and exit tangents with commands like:
kiro "start tangent: investigate-webpack-error"
# [work in tangent]
kiro "return to main"
Hint 4: One-Sentence Summaries When exiting a tangent, summarize the outcome in one sentence:
kiro "tangent resolved: upgraded webpack 4→5, fixed config syntax"
This summary is what gets preserved in the main context.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Data Models | “Designing Data-Intensive Applications” by Kleppmann | Ch. 1 (Data Models) |
| Debugging Strategies | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 7 (Debugging) |
| Summarization | “Speech and Language Processing” by Jurafsky | Ch. 23 (Summarization) |
| Cognitive Load | “Thinking, Fast and Slow” by Kahneman | Ch. 8 (Cognitive Effort) |
| Working Memory | “The Design of Everyday Things” by Norman | Ch. 3 (Knowledge in the Head) |
| Context Management | “Working Effectively with Legacy Code” by Feathers | Ch. 6 (Code Smells) |
Common Pitfalls & Debugging
Problem 1: “I forgot to exit the tangent and continued working in tangent mode”
- Why: No explicit “return to main” command was issued.
- Fix: Always end tangents with kiro "return to main" or set up an auto-exit after N messages.
- Quick test: Check the current mode: kiro "what mode are we in?" (should say “main” or “tangent”)
Problem 2: “Tangent summary is too vague (‘fixed the issue’)”
- Why: The summary doesn’t capture what was actually done.
- Fix: Write a specific summary: “upgraded jsonwebtoken 8.5.1 → 9.5.2, tests passing” not just “fixed dependency”.
- Quick test: Read the summary and ask “Could I reproduce this fix from the summary alone?”
Problem 3: “Main context lost track of what we were doing”
- Why: The tangent took too long (> 30 messages), and summarization lost key details.
- Fix: Keep tangents short (< 20 messages). For complex issues, create a new top-level session instead.
- Quick test: After exiting tangent, ask
kiro "what were we working on?"(should recall main task)
Problem 4: “I need information from the tangent in the main context”
- Why: The tangent summary was too brief.
- Fix: Reference the tangent explicitly: kiro "pull the CORS config from the cors-debugging tangent".
- Quick test: Tangents should remain accessible (as sub-sessions) even after exit.
Problem 5: “Nested tangents (tangent within a tangent)”
- Why: You started a new tangent while already in tangent mode.
- Fix: Avoid nesting. Exit the current tangent first, then start a new one from main.
- Quick test:
kiro "are we in a nested tangent?"(tool should warn against nesting)
Problem 6: “Tangent mode not available in my Kiro version”
- Why: Tangent mode is a newer feature (or not yet implemented in your CLI).
- Fix: As a workaround, start a new session with kiro --session tangent-name, then exit and return to the main session.
- Quick test:
kiro --version (check if tangent mode is supported)
Definition of Done
- Successfully started a tangent session for a side investigation
- Completed the tangent task (resolved the issue or answered the question)
- Exited the tangent with a clear, one-sentence summary
- Main context does NOT contain tangent messages (only the summary)
- Main context correctly resumed the primary task after tangent exit
- Tangent session is accessible for future reference (can re-open if needed)
- Verified that summarization of the main context excludes tangent details
- No nested tangents (stayed within the “two-level rule”: main + one tangent)
- Named the tangent descriptively (easy to identify in session history)
- Main task completed successfully without context pollution
Project 26: “The Checkpoint Time Machine” — Safety Systems
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Git / Kiro |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Safety Systems |
What you’ll build: Snapshot, perform risky edits, and restore instantly.
Why it teaches Fearless Coding: You can let the agent be aggressive without fear.
Success criteria:
- A restore returns the repo to a known good state.
Real World Outcome
You’ll create a checkpoint system that snapshots your entire codebase before Kiro makes risky changes, allowing you to instantly roll back if things go wrong. This enables fearless experimentation with aggressive refactoring, large-scale changes, and exploratory edits.
Example Workflow:
# Before risky refactoring
$ kiro checkpoint create "before-refactor-user-service"
✓ Checkpoint created: cp_20240115_143022_before-refactor-user-service
Files: 156 tracked, 12 untracked
Git ref: refs/checkpoints/cp_20240115_143022
# Let Kiro make aggressive changes
$ kiro "refactor the entire user service to use dependency injection"
[Kiro makes extensive changes across 23 files...]
[Tests start failing...]
# Oh no, restore the checkpoint
$ kiro checkpoint restore cp_20240115_143022_before-refactor-user-service
Restoring checkpoint: cp_20240115_143022_before-refactor-user-service
✓ Git worktree reset to refs/checkpoints/cp_20240115_143022
✓ Untracked files restored from .kiro/checkpoints/cp_20240115_143022/
✓ 23 files reverted, 12 untracked files restored
Took 0.8s
# Codebase is back to exactly how it was before
$ git status
# On branch main
# nothing to commit, working tree clean
Checkpoint workflow:
# 1. Create checkpoint before experiment
$ kiro checkpoint create "experiment-async-await"
# 2. Try risky changes
$ kiro "convert all promise chains to async/await"
$ npm test
# Tests fail...
# 3. Restore (instant rollback)
$ kiro checkpoint restore experiment-async-await
# 4. List all checkpoints
$ kiro checkpoint list
cp_20240115_143022_before-refactor-user-service
cp_20240115_150431_experiment-async-await
cp_20240115_152103_before-schema-migration
# 5. Delete old checkpoints
$ kiro checkpoint delete cp_20240115_143022_before-refactor-user-service
What makes this different from git stash or git commit:
| Feature | Checkpoint | Git Stash | Git Commit |
|---|---|---|---|
| Speed | Instant (<1s) | Fast (~2s) | Slow (>5s with hooks) |
| Untracked files | ✓ Included | ✗ Not included | ✗ Not included |
| Easy restore | checkpoint restore | stash apply (conflicts) | git reset --hard (dangerous) |
| Named | ✓ Descriptive | ✗ “WIP on main” | ✓ Commit message |
| No git history pollution | ✓ Clean | ✓ Clean | ✗ Creates commits |
You’re building the same safety net that professional game developers use (“checkpoint before boss fight”).
The Core Question You’re Answering
“How do you experiment fearlessly with an AI agent making large-scale code changes when you know you might need to revert everything instantly?”
Before you build any checkpoint system, understand this: The fear of breaking the codebase is the #1 reason developers are conservative with AI agents. If you can’t easily undo changes, you’ll never let the agent be truly creative or aggressive.
Checkpoints remove that fear. You can say “try a radical refactoring” knowing you have a one-command rollback.
Concepts You Must Understand First
Stop and research these before coding:
- Git Worktree and Index
- What is the difference between the working tree, the index (staging area), and HEAD?
- How does git reset --hard work (and why is it dangerous)?
- What are git refs (references like refs/heads/main, refs/tags/v1.0)?
- Book Reference: “Pro Git” by Scott Chacon - Ch. 10 (Git Internals)
- Untracked Files and .gitignore
- Why don’t git stash and git commit save untracked files?
- How do you capture untracked files (tar, rsync, cp -r)?
- What happens to .gitignored files during checkpoint/restore?
- Book Reference: “Version Control with Git” by Loeliger - Ch. 5 (Working Trees)
- Atomic Operations and Race Conditions
- How do you ensure checkpoint creation is atomic (all-or-nothing)?
- What happens if Kiro creates a checkpoint while you’re editing files?
- How do you handle concurrent checkpoint operations (locking)?
- Book Reference: “The Linux Programming Interface” by Kerrisk - Ch. 30 (File Locking)
- Filesystem Snapshots vs. Git-Based Snapshots
- What are filesystem snapshots (ZFS snapshots, Btrfs snapshots, LVM snapshots)?
- Why use git refs instead of full filesystem snapshots (storage efficiency)?
- How do you handle binary files and large assets (Git LFS)?
- Book Reference: “Understanding the Linux Kernel” by Bovet - Ch. 16 (File Systems)
- Restoration Strategies (Full vs. Partial)
- Should you restore the entire working tree or just specific files?
- How do you handle merge conflicts during restore (overwrite vs. merge)?
- What happens to local changes made after checkpoint creation?
- Book Reference: “Git Pocket Guide” by Silverman - Ch. 3 (Undoing Changes)
- Checkpoint Lifecycle Management
- When should checkpoints be automatically cleaned up (after N days, after restore)?
- How much disk space do checkpoints consume (ref + untracked files)?
- Should checkpoints survive git clean -fdx (store outside .git/)?
- Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 3 (Storage)
Questions to Guide Your Design
Before implementing, think through these:
- Snapshot Granularity
- Should checkpoints capture the entire repo or just specific directories?
- Do you include node_modules, build artifacts, and .env files?
- Should checkpoints be branch-specific or global across all branches?
- How do you handle submodules and nested git repositories?
- Storage Strategy
- Where do you store checkpoints (.kiro/checkpoints/, .git/refs/checkpoints/)?
- Do you use git refs (lightweight) or full tar archives (heavyweight)?
- How do you compress untracked files (tar.gz, zip, rsync)?
- Should checkpoints be shareable across team members (git remote)?
- Restoration Safety
- Should restore require confirmation (interactive prompt)?
- Do you warn if there are uncommitted changes before restore?
- Should restore create a “pre-restore” checkpoint automatically?
- How do you handle file deletions (restore deleted files)?
- Naming and Discovery
- How do you auto-generate checkpoint names (timestamp + description)?
- Should checkpoints be tagged with metadata (author, timestamp, git commit)?
- How do you search for checkpoints (by name, date, commit range)?
- Can you compare two checkpoints (diff cp1 cp2)?
- Integration with Kiro
- Should Kiro automatically create checkpoints before risky operations?
- Can Kiro suggest when to create a checkpoint (heuristic: changing > 10 files)?
- Should restore be a recoverable operation (keep restore history)?
- How do you visualize checkpoint history (timeline, tree view)?
Thinking Exercise
Trace a Checkpoint and Restore Cycle
Before coding, manually simulate creating and restoring a checkpoint using git:
Scenario: Risky refactoring with checkpoint safety net
# Initial state
$ git status
# On branch main
# nothing to commit, working tree clean
# Create checkpoint (manual simulation)
$ git update-ref refs/checkpoints/cp_001 HEAD
$ tar -czf .kiro/checkpoints/cp_001_untracked.tar.gz \
$(git ls-files --others --exclude-standard)
# Checkpoint created (stored: git ref + untracked files archive)
# Risky changes
$ kiro "refactor all database queries to use TypeORM instead of raw SQL"
[Kiro modifies 30 files, creates 12 new files]
# Tests fail
$ npm test
# 23 tests failed
# Restore checkpoint (manual simulation)
$ git reset --hard refs/checkpoints/cp_001 # Reset tracked files
Unstaged changes after reset:
M src/db/queries.ts (modified but not committed after checkpoint)
$ tar -xzf .kiro/checkpoints/cp_001_untracked.tar.gz # Restore untracked
$ git clean -fd # Remove new untracked files created after checkpoint
# Verify restoration
$ git status
# On branch main
# nothing to commit, working tree clean
$ npm test
# All tests passing (back to checkpoint state)
Questions while tracing:
- What happens to files that existed at checkpoint time but were deleted during the risky changes?
- How do you handle files that were modified both before and after checkpoint creation?
- What if the user has uncommitted changes when they try to restore?
- Should restore delete new files created after the checkpoint (git clean -fd)?
- How do you restore files that are in .gitignore (they’re untracked but intentionally ignored)?
Edge cases to consider:
# Edge Case 1: Checkpoint with dirty working tree
$ git status
# On branch main
# Changes not staged for commit:
# modified: src/app.ts
$ kiro checkpoint create "dirty-state"
# Should this be allowed? Or require a clean working tree?
# Edge Case 2: Restore with uncommitted changes
$ kiro checkpoint restore cp_001
# Warning: You have uncommitted changes. Restore will overwrite them.
# Continue? (y/N)
# Edge Case 3: Checkpoint on non-main branch
$ git checkout feature-branch
$ kiro checkpoint create "feature-experiment"
# Should this checkpoint be branch-specific or global?
# Edge Case 4: Restore after git commit
$ kiro checkpoint create cp_001
$ kiro "refactor code"
$ git add . && git commit -m "refactor"
$ kiro checkpoint restore cp_001
# Restores working tree, but leaves commit in history?
The Interview Questions They’ll Ask
Prepare to answer these:
-
“What is the difference between
git stash,git commit, and a custom checkpoint system for saving codebase state?” -
“How would you implement a checkpoint system that captures both tracked files (in git) and untracked files (not in git)?”
-
“A developer creates a checkpoint, makes changes, commits those changes, then restores the checkpoint. What happens to the git commit history?”
-
“What are the trade-offs between storing checkpoints as git refs vs. full tar archives of the working directory?”
-
“How do you handle checkpoint restoration when there are merge conflicts (files modified both at checkpoint time and after)?”
-
“Why might a checkpoint system fail to restore a repository to its exact prior state, even with all files backed up?”
Hints in Layers
Hint 1: Start with Git Refs
Use git update-ref refs/checkpoints/<name> HEAD to create a lightweight git reference to the current commit. This is fast and doesn’t duplicate the entire repository.
Hint 2: Capture Untracked Files Separately
Git refs only track committed files. Use git ls-files --others --exclude-standard to find untracked files and tar them:
tar -czf .kiro/checkpoints/<name>_untracked.tar.gz $(git ls-files --others --exclude-standard)
Hint 3: Restore in Two Steps
First reset tracked files: git reset --hard refs/checkpoints/<name>
Then restore untracked files: tar -xzf .kiro/checkpoints/<name>_untracked.tar.gz
Hint 4: Add Safety Checks Before restore, check for uncommitted changes:
if ! git diff-index --quiet HEAD --; then
echo "Warning: You have uncommitted changes."
read -p "Continue? (y/N) " -n 1 -r
fi
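If you would rather drive the hints from one script, here is a minimal Python sketch that wraps the same git and tar commands with subprocess. The .kiro/checkpoints/ location and refs/checkpoints/ namespace follow the hints; everything else (argument handling, the assumption of space-free paths) is a simplification, not a finished tool.

```python
#!/usr/bin/env python3
"""Checkpoint sketch: a git ref for tracked files plus a tar archive for untracked files."""
import subprocess
import sys
from pathlib import Path

CHECKPOINT_DIR = Path(".kiro/checkpoints")   # assumed storage location (add .kiro/ to .gitignore)

def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout

def create(name: str) -> None:
    CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
    git("update-ref", f"refs/checkpoints/{name}", "HEAD")              # Hint 1: tracked files
    # Note: split() assumes paths without spaces; a real tool should use NUL-separated output
    untracked = git("ls-files", "--others", "--exclude-standard").split()
    if untracked:                                                      # Hint 2: untracked files
        archive = CHECKPOINT_DIR / f"{name}_untracked.tar.gz"
        subprocess.run(["tar", "-czf", str(archive), *untracked], check=True)
    print(f"checkpoint {name}: ref created, {len(untracked)} untracked files archived")

def restore(name: str) -> None:
    dirty = subprocess.run(["git", "diff-index", "--quiet", "HEAD", "--"]).returncode != 0
    if dirty:                                                          # Hint 4: safety check
        if input("Uncommitted changes will be overwritten. Continue? (y/N) ").lower() != "y":
            sys.exit(1)
    git("reset", "--hard", f"refs/checkpoints/{name}")                 # Hint 3, step 1
    git("clean", "-fd")                                # drop files created after the checkpoint
    archive = CHECKPOINT_DIR / f"{name}_untracked.tar.gz"
    if archive.exists():
        subprocess.run(["tar", "-xzf", str(archive)], check=True)      # Hint 3, step 2

if __name__ == "__main__":
    {"create": create, "restore": restore}[sys.argv[1]](sys.argv[2])
```

Usage would be something like python checkpoint.py create before-refactor, then python checkpoint.py restore before-refactor once the experiment goes wrong.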
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Git Internals | “Pro Git” by Scott Chacon | Ch. 10 (Git Internals) |
| Working Trees | “Version Control with Git” by Loeliger | Ch. 5 (Working Trees) |
| File Locking | “The Linux Programming Interface” by Kerrisk | Ch. 30 (File Locking) |
| Filesystems | “Understanding the Linux Kernel” by Bovet | Ch. 16 (File Systems) |
| Undoing Changes | “Git Pocket Guide” by Silverman | Ch. 3 (Undoing Changes) |
| Storage Engines | “Designing Data-Intensive Applications” by Kleppmann | Ch. 3 (Storage) |
Common Pitfalls & Debugging
Problem 1: “Restore doesn’t bring back deleted files”
- Why: git reset --hard only resets files tracked by git, not untracked files.
- Fix: Also restore untracked files from the checkpoint’s tar archive.
- Quick test: Create checkpoint, delete a file, restore, verify file exists.
Problem 2: “Checkpoint failed mid-creation (partial checkpoint)”
- Why: tar command failed due to disk space or permission issues.
- Fix: Make checkpoint creation atomic: write to temp location, then move.
- Quick test: Fill disk during checkpoint creation, verify no partial checkpoints exist.
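One way to implement the “write to a temp location, then move” fix above is os.replace, which renames atomically on the same filesystem; the paths and the tar call here are illustrative.

```python
import os
import subprocess
import tempfile

def atomic_archive(sources: list[str], final_path: str) -> None:
    """Create the tar in a temp file on the same filesystem, then rename it into place.
    Readers never see a half-written archive; on failure the temp file is discarded."""
    directory = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tar.gz.partial")
    os.close(fd)
    try:
        subprocess.run(["tar", "-czf", tmp_path, *sources], check=True)
        os.replace(tmp_path, final_path)   # atomic rename on POSIX (same filesystem)
    except Exception:
        os.unlink(tmp_path)                # no partial checkpoint left behind
        raise
```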
Problem 3: “Restore leaves new files created after checkpoint”
- Why: Restore doesn’t delete files that didn’t exist at checkpoint time.
- Fix: Run git clean -fd after restore to remove untracked files.
- Quick test: Create checkpoint, add new file, restore, verify new file is gone.
Problem 4: “Checkpoints consume too much disk space (> 1GB each)”
- Why: Untracked files include node_modules, build artifacts, or large assets.
- Fix: Exclude large directories from the checkpoint: tar --exclude='node_modules' ...
- Quick test:
du -sh .kiro/checkpoints/ (should be < 100MB per checkpoint)
Problem 5: “Restore fails with ‘ref not found’ error”
- Why: The git ref was deleted (manual cleanup or git gc).
- Fix: Store checkpoints as actual commits (or tags) instead of refs.
- Quick test:
git for-each-ref refs/checkpoints/ (list all checkpoint refs)
Problem 6: “Checkpoints don’t survive git clean -fdx”
- Why: .kiro/checkpoints/ was deleted by git clean. Note that the -x flag removes ignored files too, so a .gitignore entry alone only protects against git clean -fd.
- Fix: Add .kiro/ to .gitignore, and keep checkpoint archives somewhere git clean cannot touch (e.g. under .git/ or in your home directory) so they survive -fdx.
- Quick test:
git clean -fdx && kiro checkpoint list (checkpoints should still be listed)
Definition of Done
- Can create a checkpoint with a descriptive name
- Checkpoint captures all tracked files (via git ref)
- Checkpoint captures all untracked files (via tar archive)
- Restore brings back the exact working tree state (all files match)
- Restore removes files created after checkpoint (clean working tree)
- Checkpoint creation is atomic (no partial checkpoints on failure)
- Can list all checkpoints with timestamps and descriptions
- Can delete old checkpoints to free disk space
- Restore warns if there are uncommitted changes
- Checkpoint system survives git clean -fdx
- Tested with risky refactoring (create checkpoint, break code, restore successfully)
- Documentation explains when to create checkpoints and how to restore
Project 27: “The Checklist Manager” — Task Management
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Markdown |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 1: Beginner |
| Knowledge Area | Task Management |
What you’ll build: Use /todo to turn a brain dump into executable steps.
Why it teaches Structured Execution: It enforces a real work queue.
Success criteria:
- Items are executed and checked off by Kiro.
Real World Outcome
You’ll use Kiro’s /todo feature to convert unstructured brain dumps into structured, executable task lists. Kiro will work through the list systematically, checking off completed tasks and reporting progress.
Example Workflow:
# Initial brain dump (unstructured)
$ kiro "I need to add user authentication to the API. Also fix the CORS issues. And we should add logging. Oh, and update the README."
[Kiro responds with a plan, but it's not tracked]
# Better: Use /todo for structured execution
$ kiro "/todo Create tasks for: add JWT auth, fix CORS, add Winston logging, update README"
✓ Created todo list:
[ ] 1. Implement JWT authentication for API endpoints
[ ] 2. Fix CORS configuration to allow frontend domain
[ ] 3. Add Winston logging with log rotation
[ ] 4. Update README with new auth flow and setup instructions
# Kiro works through the list
[Working on task 1...]
✓ 1. Implement JWT authentication for API endpoints (DONE)
- Created /auth/login and /auth/register endpoints
- Added JWT middleware for protected routes
- Tests passing
[ ] 2. Fix CORS configuration to allow frontend domain
[ ] 3. Add Winston logging with log rotation
[ ] 4. Update README with new auth flow and setup instructions
# Continue with next task
[Working on task 2...]
✓ 2. Fix CORS configuration to allow frontend domain (DONE)
- Updated Express CORS config to whitelist https://app.example.com
- Added preflight OPTIONS handling
[...continues until all tasks complete...]
Final status:
✓ 1. Implement JWT authentication for API endpoints (DONE)
✓ 2. Fix CORS configuration to allow frontend domain (DONE)
✓ 3. Add Winston logging with log rotation (DONE)
✓ 4. Update README with new auth flow and setup instructions (DONE)
All tasks completed! 🎉
Visual progress tracking:
$ kiro "/todo status"
Progress: [████████░░] 80% (4/5 tasks complete)
✓ 1. Set up CI/CD pipeline (DONE)
✓ 2. Add unit tests for auth service (DONE)
✓ 3. Implement rate limiting (DONE)
✓ 4. Add API documentation with Swagger (DONE)
⏳ 5. Deploy to staging environment (IN PROGRESS)
- Current step: Running database migrations...
Key benefits:
- No tasks forgotten (explicit checklist)
- Clear progress visibility (X of Y tasks done)
- Prioritization (numbered order)
- Context preservation (Kiro remembers what’s next)
- Audit trail (what was done, when)
You’re using the same task management system that agile teams use (sprint backlogs, kanban boards).
The Core Question You’re Answering
“How do you ensure complex multi-step tasks are executed completely and in order, without forgetting steps or getting distracted by tangents?”
Before you use /todo, understand this: Human working memory holds 7 ± 2 items. When you ask Kiro to do 10 things in one prompt, some will be forgotten or deprioritized. A todo list externalizes the task queue, ensuring systematic execution.
The todo system transforms “do these 10 things” (vague) into “here are 10 explicit steps, execute in order” (structured).
Concepts You Must Understand First
Stop and research these before using /todo:
- Task Decomposition and Prioritization
- How do you break down a large goal (“add authentication”) into atomic tasks?
- What makes a task “atomic” (single responsibility, verifiable completion)?
- How do you prioritize tasks (dependencies, critical path, quick wins)?
- Book Reference: “The Pragmatic Programmer” by Hunt & Thomas - Ch. 2 (Orthogonality)
- Markdown Checklists and Syntax
- What is the GitHub-flavored markdown syntax for checklists (`- [ ]` vs. `- [x]`)?
- Can you nest sub-tasks (hierarchical checklists)?
- Book Reference: “The Markdown Guide” by Matt Cone - Ch. 4 (Extended Syntax)
- What is the GitHub-flavored markdown syntax for checklists (
- Workflow State Machines
- What are the valid state transitions (pending → in-progress → done)? (see the sketch after this list)
- Can you skip states (pending → done without in-progress)?
- How do you handle task failures (done → failed → retry)?
- Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 7 (Transactions)
- Context Preservation Across Tasks
- How does Kiro remember task N while working on task N+1?
- What happens if a task requires information from a previous task?
- Should tasks be independent (stateless) or can they build on each other (stateful)?
- Book Reference: “Clean Code” by Robert C. Martin - Ch. 3 (Functions)
- Interruption and Resumption
- What happens if you interrupt Kiro mid-task (Ctrl+C)?
- Can you resume from the last completed task?
- Should incomplete tasks be marked as “blocked” or deleted?
- Book Reference: “The Mythical Man-Month” by Frederick Brooks - Ch. 10 (Tracking Progress)
- Audit Trail and Accountability
- How do you track who completed each task (user vs. Kiro)?
- Should you log timestamps for task start and completion?
- Can you export the completed checklist as a report?
- Book Reference: “Accelerate” by Forsgren, Humble, Kim - Ch. 4 (Measurement)
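To make the Workflow State Machines questions above concrete, here is a tiny sketch of task states and allowed transitions; the specific states and the retry policy are one reasonable choice, not a set defined by Kiro.

```python
from enum import Enum

class TaskState(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"
    BLOCKED = "blocked"

# Allowed transitions: one reasonable policy (e.g. failed tasks may be retried)
TRANSITIONS = {
    TaskState.PENDING:     {TaskState.IN_PROGRESS, TaskState.BLOCKED},
    TaskState.IN_PROGRESS: {TaskState.DONE, TaskState.FAILED, TaskState.BLOCKED},
    TaskState.BLOCKED:     {TaskState.PENDING},
    TaskState.FAILED:      {TaskState.PENDING},   # retry
    TaskState.DONE:        set(),                 # terminal
}

def transition(current: TaskState, target: TaskState) -> TaskState:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"invalid transition {current.value} -> {target.value}")
    return target

state = TaskState.PENDING
state = transition(state, TaskState.IN_PROGRESS)
state = transition(state, TaskState.DONE)
print(state)  # TaskState.DONE; jumping pending -> done directly would raise
```

Making the transitions explicit is what prevents the “marked done after merely attempting it” failure described in the pitfalls below.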
Questions to Guide Your Design
Before creating a todo list, think through these:
- Task Granularity
- Should tasks be small (< 5 minutes each) or large (> 1 hour)?
- Do you group related tasks (all database tasks together)?
- Should each task have acceptance criteria (how do you know it’s done)?
- What’s the optimal number of tasks (3-5 vs. 20-30)?
- Ordering and Dependencies
- Do tasks have dependencies (task B requires task A to complete first)?
- Should Kiro enforce dependency order or just suggest it?
- Can tasks be parallelized (run tests while fixing linting)?
- How do you represent blocked tasks (waiting on external input)?
- State Management
- Where is the todo list stored (.kiro/todos.md, in-memory, database)?
- How does Kiro update task states (mark as done, add notes)?
- Can you manually edit the todo list (add/remove tasks mid-execution)?
- Should tasks persist across Kiro sessions (resume later)?
- Error Handling
- What happens if a task fails (mark as failed, retry, skip)?
- Should Kiro stop execution on first failure (fail-fast)?
- How do you handle partial completion (task 50% done)?
- Can you rollback completed tasks if a later task fails?
- Reporting and Visibility
- How do you show progress (X of Y tasks done, percentage)?
- Should Kiro report estimated time remaining (based on historical velocity)?
- Can you generate a summary report (what was done, time taken)?
- How do you visualize the task graph (dependencies, critical path)?
Thinking Exercise
Convert a Brain Dump to a Structured Todo List
Before using /todo, manually trace how you would decompose a vague request:
Scenario: “Improve the app’s performance”
Unstructured brain dump: “The app is slow. We should optimize the database queries, add caching, compress images, minify JavaScript, enable CDN, and maybe use a load balancer. Also fix memory leaks.”
Structured todo list (atomic, prioritized, verifiable):
## Performance Improvement Tasks
### Critical Path (do these first)
- [ ] 1. Profile the app to identify bottlenecks
- Use Chrome DevTools Performance tab
- Identify the slowest 3 operations
- Expected outcome: Waterfall chart showing blocking operations
- [ ] 2. Optimize database queries (N+1 problem detected)
- Add indexes to `users.email` and `posts.author_id`
- Replace eager loading with joins
- Expected outcome: Query time < 50ms (currently 2s)
- [ ] 3. Add Redis caching for user sessions
- Install redis, configure connection
- Cache user profile lookups (TTL 5 minutes)
- Expected outcome: 90% cache hit rate
### Quick Wins (easy, high impact)
- [ ] 4. Enable gzip compression on API responses
- Add compression middleware to Express
- Expected outcome: Response size reduced by 70%
- [ ] 5. Minify and bundle JavaScript (Webpack production build)
- Run `npm run build:prod`
- Expected outcome: Bundle size < 200KB (currently 1.5MB)
### Nice-to-Have (defer if time limited)
- [ ] 6. Set up CDN for static assets (CloudFront)
- Configure S3 bucket with CloudFront distribution
- Expected outcome: Asset load time < 100ms globally
- [ ] 7. Investigate memory leaks (long-term monitoring)
- Add heap snapshot capture on production
- Review weekly for memory growth trends
Questions while decomposing:
- Which tasks are prerequisites for others (profiling before optimization)?
- Which tasks are independent and can be done in parallel (caching + minification)?
- Which tasks have measurable outcomes (query time, bundle size)?
- Which tasks are risky and need checkpoints (database migration)?
- Which tasks can be deferred if the deadline is tight (CDN setup)?
Manual test:
# 1. Create the todo list
$ kiro "/todo Improve app performance: profile, optimize queries, add caching, compress assets"
# Kiro generates the structured list above
# 2. Execute tasks in order
$ kiro "/todo execute"
# Kiro starts with task 1 (profiling)
# 3. Check progress mid-execution
$ kiro "/todo status"
# Progress: [██░░░░░] 29% (2/7 tasks complete)
# 4. Mark a task as blocked (waiting for DBA approval)
$ kiro "/todo block 2 --reason 'Waiting for DBA to approve index creation'"
# 5. Skip to next unblocked task
$ kiro "/todo next"
The Interview Questions They’ll Ask
Prepare to answer these:
-
“What is the difference between a task list and a kanban board for managing development work?”
-
“How do you decompose a large, vague requirement (‘make the app faster’) into atomic, executable tasks?”
-
“What makes a task ‘atomic’? What are the characteristics of a well-defined task?”
-
“How would you handle task dependencies in a todo system (task B requires task A to complete first)?”
-
“What state transitions are valid for a task (e.g., pending → in-progress → done)?”
-
“How do you measure progress when some tasks are large (1 day) and others are small (5 minutes)?”
Hints in Layers
Hint 1: Start with a Brain Dump
First, tell Kiro everything you want to accomplish (unstructured). Then ask Kiro to convert it into a structured /todo list.
Hint 2: Use Verifiable Outcomes Each task should have a clear “done” condition: “Add unit tests” (vague) vs. “Add tests until coverage is > 80%” (verifiable).
Hint 3: Prioritize with Numbers Prefix tasks with numbers to enforce execution order: “1. Set up database”, “2. Run migrations”, “3. Seed test data”.
Hint 4: Separate “What” from “How” The task describes what to achieve, not how to do it. “Add JWT auth” (what) not “Install jsonwebtoken library, create /auth/login endpoint, add middleware…” (how).
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Task Decomposition | “The Pragmatic Programmer” by Hunt & Thomas | Ch. 2 (Orthogonality) |
| Markdown Syntax | “The Markdown Guide” by Matt Cone | Ch. 4 (Extended Syntax) |
| State Machines | “Designing Data-Intensive Applications” by Kleppmann | Ch. 7 (Transactions) |
| Function Design | “Clean Code” by Robert C. Martin | Ch. 3 (Functions) |
| Progress Tracking | “The Mythical Man-Month” by Brooks | Ch. 10 (Tracking Progress) |
| Measurement | “Accelerate” by Forsgren, Humble, Kim | Ch. 4 (Measurement) |
Common Pitfalls & Debugging
Problem 1: “Tasks are too vague (‘improve the code’)”
- Why: No clear acceptance criteria or measurable outcome.
- Fix: Rewrite as “Increase test coverage from 60% to 80%” or “Reduce cyclomatic complexity of auth.js to < 10”.
- Quick test: Ask “How do I know when this task is done?” If unclear, the task is too vague.
Problem 2: “Too many tasks (30+), feels overwhelming”
- Why: Tasks are too granular or not grouped.
- Fix: Group related tasks into phases: “Phase 1: Auth (3 tasks)”, “Phase 2: API (5 tasks)”.
- Quick test: Can you complete the entire list in a single work session? If no, it’s too long.
Problem 3: “Kiro forgot to complete a task from the list”
- Why: Task was not explicitly tracked or Kiro lost context.
- Fix: Use /todo status to verify all tasks are tracked, and /todo execute to enforce systematic execution.
- Quick test: Review the final checklist—all tasks should be marked as done.
Problem 4: “Task order matters but Kiro did them out of order”
- Why: Dependencies weren’t explicit.
- Fix: Number tasks explicitly (1, 2, 3) or use “Prerequisites: Task 1 must complete first”.
- Quick test: Can task 3 complete before task 1? If yes, there’s a dependency issue.
Problem 5: “A task failed mid-execution, but the todo list shows it as done”
- Why: Kiro marked it as done based on attempting it, not succeeding.
- Fix: Use stricter “done” criteria: tests must pass, manual verification required.
- Quick test: Run tests after each task completion.
Problem 6: “Todo list disappeared after restarting Kiro”
- Why: The list was only in-memory, not persisted.
- Fix: Save the todo list to a file (.kiro/todos.md) and load it on session start.
- Quick test: Restart Kiro and run /todo status (should show the previous list).
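A minimal persistence sketch for the fix above, using the .kiro/todos.md location this project already suggests and GitHub-flavored checklist syntax; the one-task-per-line parsing is a simplifying assumption.

```python
from pathlib import Path

TODO_FILE = Path(".kiro/todos.md")   # location suggested in this project

def load_tasks() -> list[dict]:
    """Parse '- [ ] task' / '- [x] task' lines into {done, text} dicts."""
    tasks = []
    if TODO_FILE.exists():
        for line in TODO_FILE.read_text().splitlines():
            if line.startswith("- ["):
                tasks.append({"done": line[3].lower() == "x", "text": line[6:].strip()})
    return tasks

def save_tasks(tasks: list[dict]) -> None:
    TODO_FILE.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"- [{'x' if t['done'] else ' '}] {t['text']}" for t in tasks]
    TODO_FILE.write_text("\n".join(lines) + "\n")

def mark_done(tasks: list[dict], index: int) -> None:
    tasks[index]["done"] = True
    save_tasks(tasks)

tasks = load_tasks() or [{"done": False, "text": "1. Implement JWT authentication"}]
mark_done(tasks, 0)
print(load_tasks())   # survives a restart because the list lives on disk
```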
Definition of Done
- Can create a todo list from an unstructured brain dump
- All tasks are atomic (single responsibility, verifiable completion)
- Tasks are numbered or prioritized (execution order is clear)
- Each task has a “done” condition (how do you know it’s complete?)
- Kiro executes tasks systematically (in order, one at a time)
- Can check progress mid-execution (/todo status)
- All tasks marked as complete when done
- Todo list persists across Kiro sessions (saved to file)
- Can manually edit the todo list (add/remove/reorder tasks)
- Can handle task failures (mark as failed, skip, or retry)
- Tested with a real multi-step project (all tasks completed successfully)
- Documentation explains how to create effective task lists
Project 28: “The Semantic Search Engine” — Knowledge Management
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Python (RAG) |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Knowledge Management |
What you’ll build: Enable /knowledge and ingest a folder of PDFs for semantic Q&A.
Why it teaches Retrieval: You learn how to use data larger than the context window.
Success criteria:
- An answer is grounded in retrieved chunks.
Real World Outcome
You will have a Kiro CLI extension that ingests PDF documents and enables semantic question-answering that goes beyond the context window limit. When you run it, you’ll see:
Ingestion Phase:
$ kiro "/knowledge ingest ~/Documents/research_papers/"
📚 Semantic Search Engine - Knowledge Ingestion
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Processing PDFs...
├─ attention_is_all_you_need.pdf (8 pages) ✓
│ └─ Extracted 47 chunks (avg: 512 tokens/chunk)
├─ bert_pretraining.pdf (16 pages) ✓
│ └─ Extracted 89 chunks (avg: 498 tokens/chunk)
└─ gpt3_language_models.pdf (75 pages) ✓
└─ Extracted 412 chunks (avg: 505 tokens/chunk)
Generating embeddings... [████████████████████] 548/548 chunks
Building vector index (FAISS)...
├─ Index type: IVF256,Flat
├─ Dimensions: 1536 (text-embedding-3-small)
└─ Total vectors: 548
💾 Saved to: ~/.kiro/knowledge/research_papers.faiss
✓ Knowledge base ready: research_papers (548 chunks, 274k tokens)
Query Phase:
$ kiro "/knowledge query research_papers 'What is the self-attention mechanism in transformers?'"
🔍 Semantic Search Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Query: "What is the self-attention mechanism in transformers?"
Top 5 Retrieved Chunks (by cosine similarity):
1. attention_is_all_you_need.pdf (page 3, score: 0.94)
"Self-attention, sometimes called intra-attention, is a mechanism
relating different positions of a single sequence to compute a
representation of the sequence. The attention function maps a
query and set of key-value pairs to an output..."
2. attention_is_all_you_need.pdf (page 4, score: 0.89)
"Scaled Dot-Product Attention: We compute attention as
Attention(Q,K,V) = softmax(QK^T / sqrt(d_k))V where Q, K, V
are the queries, keys, and values matrices..."
3. bert_pretraining.pdf (page 7, score: 0.82)
"BERT uses bidirectional self-attention, allowing each token to
attend to all tokens in both directions. This differs from GPT's
causal (left-to-right) attention masking..."
───────────────────────────────────────────────────────────────
📝 Generated Answer (grounded in retrieved context):
Self-attention is a mechanism that relates different positions within
a single sequence to compute its representation. In the Transformer
architecture, it works by:
1. Computing Query (Q), Key (K), Value (V) matrices from input
2. Calculating attention scores: softmax(QK^T / sqrt(d_k))
3. Using scores to weight the Value vectors
The scaling factor sqrt(d_k) prevents dot products from growing too
large. BERT extends this with bidirectional attention, while GPT uses
causal masking for autoregressive generation.
📚 Sources: attention_is_all_you_need.pdf (p3-4), bert_pretraining.pdf (p7)
Usage in Conversation:
$ kiro "Based on my research papers, explain how to implement a custom attention layer"
[Kiro automatically retrieves relevant chunks from the knowledge base]
I found 3 relevant sections from your research papers knowledge base:
- attention_is_all_you_need.pdf discusses scaled dot-product attention
- bert_pretraining.pdf covers multi-head attention implementation
- efficient_transformers.pdf shows optimization techniques
Here's how to implement a custom attention layer...
[Answer grounded in retrieved context]
You’re seeing exactly what modern RAG (Retrieval-Augmented Generation) systems do - breaking the context window limitation by retrieving only relevant information on-demand!
The Core Question You’re Answering
“How do you give an LLM access to knowledge beyond its context window without fine-tuning?”
Before you write any code, sit with this question. Most developers think context windows solve everything (“just throw it all in!”), but:
- GPT-4 Turbo: 128k tokens ≈ 96,000 words ≈ 200 pages
- Your company’s documentation: 10,000 pages
- Every research paper ever written: billions of pages
Even with 200k token windows, you can’t fit everything. RAG (Retrieval-Augmented Generation) solves this by:
- Converting text to semantic vectors (embeddings)
- Storing vectors in a searchable index
- Retrieving only relevant chunks for each query
- Grounding LLM responses in retrieved context
This is how ChatGPT’s “Browse with Bing” works, how GitHub Copilot uses your codebase, and how enterprise AI assistants access internal docs.
Concepts You Must Understand First
Stop and research these before coding:
- Vector Embeddings
- What is an embedding? (numeric representation of semantic meaning)
- Why does cosine similarity measure semantic relatedness?
- How does text-embedding-3-small differ from text-embedding-ada-002?
- Book Reference: “Speech and Language Processing” Ch. 6 (Vector Semantics) - Jurafsky & Martin
- Chunking Strategies
- Why chunk documents instead of embedding entire PDFs?
- What’s the trade-off between chunk size (128 vs 512 vs 2048 tokens)?
- How does overlapping chunks prevent context loss at boundaries?
- Book Reference: “Information Retrieval” Ch. 2 (Indexing) - Manning, Raghavan, Schütze
- Vector Databases (FAISS, Pinecone, Weaviate)
- What is Approximate Nearest Neighbor (ANN) search?
- Why is exhaustive search O(n) too slow for millions of vectors?
- How does FAISS’s IVF (Inverted File Index) work?
- Blog Reference: “FAISS: A Library for Efficient Similarity Search” - Facebook AI Research
- Retrieval Algorithms
- Dense retrieval (embeddings) vs sparse retrieval (BM25/TF-IDF)
- What is hybrid search? (combining dense + sparse)
- How does reranking improve top-k results?
- Paper Reference: “Dense Passage Retrieval for Open-Domain QA” - Karpukhin et al., 2020
- PDF Parsing
- How does PyPDF2/pdfplumber extract text from PDFs?
- What breaks with scanned PDFs (OCR needed)?
- How do you handle tables, images, and multi-column layouts?
- Docs Reference: pdfplumber documentation
Questions to Guide Your Design
Before implementing, think through these:
- Chunking Strategy
- Fixed-size chunks (512 tokens) or semantic chunks (paragraph boundaries)?
- Should chunks overlap? If so, by how much (50 tokens? 25%)?
- How will you handle code blocks, tables, and lists (semantic units)?
- Embedding Model Selection
- OpenAI text-embedding-3-small (1536 dims, $0.02/1M tokens)?
- Sentence-BERT (384 dims, free, runs locally)?
- How will you handle the latency vs cost trade-off?
- Vector Index Design
- FAISS Flat (exact search, slow for >100k vectors)?
- FAISS IVF (approximate, 10x faster, 95% recall)?
- Do you need GPU acceleration (faiss-gpu)?
- Retrieval Strategy
- Top-k retrieval (how many chunks? 3? 5? 10?)?
- Score threshold (min cosine similarity to include)?
- How will you format retrieved chunks in the prompt?
- Metadata & Filtering
- Should you store page numbers, document titles, timestamps?
- Do you need to filter by document type or date range?
- How will you cite sources in the generated answer?
Thinking Exercise
Trace Retrieval Flow
Before coding, manually trace this RAG pipeline:
Given:
- Knowledge base: 3 PDFs (Attention Is All You Need, BERT, GPT-3)
- Query: “How does GPT-3 differ from BERT in pretraining?”
Trace each step:
- Query Embedding
- Input: “How does GPT-3 differ from BERT in pretraining?”
- Output: 1536-dimensional vector (e.g., [0.023, -0.145, 0.891, …])
- Question: Why embed the query with the same model as the chunks?
- Vector Search (FAISS)
- Compute cosine similarity between query vector and all 548 chunk vectors
- Sort by similarity score (1.0 = identical, 0.0 = orthogonal)
- Return top 5 chunks
- Question: Why cosine similarity instead of Euclidean distance?
- Retrieved Chunks (hypothetical)
- Chunk 1 (gpt3_language_models.pdf, page 12, score: 0.91): "GPT-3 uses autoregressive language modeling, predicting the next token given all previous tokens. Unlike BERT's masked language modeling, GPT-3 is trained left-to-right..."
- Chunk 2 (bert_pretraining.pdf, page 3, score: 0.88): "BERT is pretrained with two objectives: (1) Masked Language Model (MLM) where 15% of tokens are masked, and (2) Next Sentence Prediction (NSP)..."
- Question: Why did these chunks score higher than others?
- Prompt Construction
  System: You are an AI assistant. Answer based on the context below.
  Context: [Chunk 1 content] [Chunk 2 content] ...
  User: How does GPT-3 differ from BERT in pretraining?
  Answer:
- Question: What if the retrieved chunks don’t answer the question?
- Generated Answer
- LLM reads retrieved context + query
- Generates grounded answer citing sources
- Question: How do you detect hallucination (info NOT in retrieved chunks)?
Questions while tracing:
- What if no chunks have similarity > 0.5? (query outside knowledge base)
- What if 10 chunks all have similarity > 0.9? (do you use all? truncate?)
- What if the PDF has OCR errors? (“pretraining” → “pre-training” → “pretrainng”)?
The Interview Questions They’ll Ask
Prepare to answer these:
-
“Explain the difference between RAG (Retrieval-Augmented Generation) and fine-tuning. When would you use each?”
-
“Your vector search is returning irrelevant chunks for 20% of queries. How would you debug and fix this?”
-
“You have 1 million PDF pages to index. Embedding them with OpenAI costs $200. How would you reduce this cost?”
-
“A user asks ‘What’s the latest update?’ but your knowledge base is from 6 months ago. How does your system handle this gracefully?”
-
“Walk me through the math of cosine similarity. Why is it better than Euclidean distance for text embeddings?”
-
“You’re getting complaints that answers are slow (10 seconds). Where are the bottlenecks and how do you optimize?”
Hints in Layers
Hint 1: Start with PDF Ingestion
Don’t jump straight to embeddings. First, prove you can extract clean text from a single PDF. Use pdfplumber (better than PyPDF2 for tables). Test with a research paper PDF and verify paragraph boundaries are preserved.
Hint 2: Implement Chunking
Split the extracted text into 512-token chunks with 50-token overlap. Use tiktoken (OpenAI’s tokenizer) to count tokens accurately. Store chunks with metadata:
chunk = {
    'text': "Self-attention is a mechanism...",
    'source': 'attention_is_all_you_need.pdf',
    'page': 3,
    'chunk_id': 'doc1_chunk_047',
    'token_count': 498
}
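A minimal sketch of that chunking step, assuming `tiktoken` is installed; the function name and overlap handling are illustrative, but the chunk fields mirror the dictionary above:

```python
import tiktoken

def chunk_text(text, source, page, chunk_size=512, overlap=50):
    """Split text into ~512-token chunks with 50-token overlap (sketch)."""
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by OpenAI embedding models
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + chunk_size]
        chunks.append({
            'text': enc.decode(window),
            'source': source,
            'page': page,
            'chunk_id': f"{source}_chunk_{len(chunks):03d}",
            'token_count': len(window),
        })
        start += chunk_size - overlap  # slide forward, keeping `overlap` tokens of shared context
    return chunks
```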
Hint 3: Generate Embeddings
Call OpenAI’s embedding API for each chunk. Batch requests (up to 2048 chunks/request) to reduce latency:
response = openai.embeddings.create(
    model="text-embedding-3-small",
    input=[chunk['text'] for chunk in chunks[:2048]]
)
embeddings = [data.embedding for data in response.data]
Each embedding is a 1536-dimensional float array.
Hint 4: Build FAISS Index
Create a Flat index for exact search (start simple before optimizing):
import faiss
import numpy as np
dimension = 1536
embeddings_matrix = np.array(embeddings).astype('float32')
index = faiss.IndexFlatL2(dimension) # L2 distance (convert to cosine later)
index.add(embeddings_matrix) # Add all vectors
faiss.write_index(index, 'knowledge.faiss') # Save to disk
Hint 5: Query & Retrieve
For a user query, embed it and search the index:
query_embedding = openai.embeddings.create(
    model="text-embedding-3-small",
    input="What is self-attention?"
).data[0].embedding
query_vector = np.array([query_embedding]).astype('float32')
k = 5 # Top 5 results
distances, indices = index.search(query_vector, k)
# Retrieve original chunks
retrieved_chunks = [chunks[i] for i in indices[0]]
Hint 6: Construct RAG Prompt
Format retrieved chunks into a prompt:
context = "\n\n".join([
    f"Source: {chunk['source']} (page {chunk['page']})\n{chunk['text']}"
    for chunk in retrieved_chunks
])
prompt = f"""Answer based on the following context:
{context}
Question: {user_query}
Answer:"""
Hint 7: Debugging Tools
When results are bad, inspect:
- Chunk quality: Are chunks semantically coherent? (print first 10)
- Embedding distribution: Are vectors normalized? (check norms)
- Similarity scores: What are the top-k scores? (should be > 0.6 for good matches)
- Retrieved text: Does it actually answer the query? (manual review)
Hint 8: Optimization (Once It Works)
- Switch to FAISS IVF for >10k chunks (10x faster, slight recall loss) - see the sketch after this list
- Cache embeddings (don’t re-embed the same query)
- Use sentence-transformers for local embedding (no API costs)
- Implement hybrid search (dense + BM25 sparse retrieval)
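A minimal sketch of the IVF switch mentioned in the first bullet above, reusing the `embeddings` list from Hint 3; the `nlist` and `nprobe` values are illustrative starting points, not tuned numbers:

```python
import faiss
import numpy as np

dimension = 1536
nlist = 256                                  # number of IVF clusters (tune for your corpus size)
quantizer = faiss.IndexFlatIP(dimension)     # coarse quantizer holding the cluster centroids
index = faiss.IndexIVFFlat(quantizer, dimension, nlist, faiss.METRIC_INNER_PRODUCT)

embeddings_matrix = np.array(embeddings).astype('float32')
faiss.normalize_L2(embeddings_matrix)        # unit vectors so inner product == cosine
index.train(embeddings_matrix)               # IVF indexes must be trained before add()
index.add(embeddings_matrix)
index.nprobe = 8                             # clusters probed per query: the recall-vs-speed knob
```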
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Vector Semantics & Embeddings | “Speech and Language Processing” by Jurafsky & Martin | Ch. 6 |
| Information Retrieval Fundamentals | “Information Retrieval” by Manning, Raghavan, Schütze | Ch. 1-2 |
| Nearest Neighbor Search | “Foundations of Data Science” by Blum, Hopcroft, Kannan | Ch. 2 (High-Dimensional Space) |
| Transformer Attention (context for RAG) | “Deep Learning” by Goodfellow, Bengio, Courville | Ch. 10 (Sequence Modeling) |
| PDF Parsing & Text Extraction | “Mining the Web” by Soumen Chakrabarti | Ch. 3 (Crawling & Extraction) |
Common Pitfalls & Debugging
Problem 1: “Embeddings return nonsense - unrelated chunks rank highest”
- Why: You’re using Euclidean distance (L2) instead of cosine similarity. L2 is affected by vector magnitude; cosine only cares about direction.
- Fix: Use `IndexFlatIP` (inner product) with normalized vectors, or convert L2 distances to cosine.
- Quick test: Call `faiss.normalize_L2(embeddings_matrix)` before adding to the index. Verify with `np.linalg.norm(embeddings_matrix[0])` ≈ 1.0.
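A short sketch of that fix, adapted from the Hint 4 code; it assumes the same `embeddings` list and the `query_embedding` from Hint 5:

```python
import faiss
import numpy as np

embeddings_matrix = np.array(embeddings).astype('float32')
faiss.normalize_L2(embeddings_matrix)                  # in-place: each row now has norm ~1.0
index = faiss.IndexFlatIP(embeddings_matrix.shape[1])  # inner product == cosine on unit vectors
index.add(embeddings_matrix)

query_vector = np.array([query_embedding]).astype('float32')
faiss.normalize_L2(query_vector)                       # normalize the query the same way
scores, indices = index.search(query_vector, 5)        # scores are cosine similarities in [-1, 1]
```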
Problem 2: “PDF extraction is garbled - formulas and tables break”
- Why: PyPDF2 doesn’t handle complex layouts. Scanned PDFs need OCR.
- Fix: Use `pdfplumber` for tables, `pytesseract` for scanned PDFs, and the `unstructured` library for mixed content.
- Quick test: Run `pdfplumber.open('paper.pdf').pages[0].extract_text()` and inspect the output visually for garbling.
Problem 3: “Query returns 0 results with similarity > 0.5”
- Why: Query is outside the knowledge base domain, or embedding model mismatch (query embedded with different model than chunks).
- Fix: Fallback to “no relevant information found” response. Check embedding model consistency.
- Quick test: Embed a chunk’s text as a query - should return that chunk with similarity ≈ 1.0.
Problem 4: “Indexing 100k chunks takes 30 minutes”
- Why: Calling OpenAI API for each chunk individually (network latency dominates).
- Fix: Batch requests (up to 2048 chunks per API call). Use `asyncio` for parallelism.
- Quick test: Time 1 chunk vs 100 chunks batched - batching should be 10-50x faster.
Problem 5: “Answers hallucinate facts not in retrieved chunks”
- Why: LLM ignores context and uses pretrained knowledge. Prompt doesn’t enforce grounding.
- Fix: Add to prompt: “Answer ONLY using the context above. If the answer isn’t in the context, say ‘I don’t have enough information.’”
- Quick test: Query something NOT in the knowledge base - LLM should refuse to answer.
Definition of Done
- PDF ingestion works: Extract text from 3+ PDFs with different layouts (text-heavy, tables, diagrams)
- Chunking is semantic: Verify chunks split on paragraph boundaries, not mid-sentence
- Embeddings are generated: 500+ chunks embedded successfully, stored with metadata
- FAISS index builds: Index file saved to disk, loads correctly on restart
- Query retrieval works: Top-5 chunks for a test query include expected results
- Similarity scores make sense: Relevant chunks score > 0.7, irrelevant < 0.5
- Answers are grounded: Generated responses cite sources (page numbers, document names)
- Edge cases handled: Empty PDFs, malformed PDFs, queries outside knowledge base domain
- Performance is acceptable: Query latency < 2 seconds (embedding + search + generation)
- Code is documented: README explains ingestion, querying, and adding new documents
Project 29: “The Delegate Background Worker” — Async Workflows
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Bash |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Async Workflows |
What you’ll build: Use delegate to run tests and fix linting while you keep working.
Why it teaches Parallelism: The agent becomes non-blocking.
Success criteria:
- A background task completes and reports its summary.
Real World Outcome
You will have a Kiro CLI workflow that delegates time-consuming tasks to background agents while you continue working on other tasks. When you run it, you’ll see:
Delegating a Task:
$ kiro "I need to implement user authentication. But first, delegate fixing all ESLint errors in the background."
🔄 Delegating Task: Fix ESLint errors
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Starting background agent: kiro_delegate_001
Task: Fix all ESLint errors in src/ directory
Priority: Low
Estimated time: 10-15 minutes
Background agent running in: /tmp/kiro_delegate_001.log
Agent ID: agent_20240102_143052
[Main session continues - you keep working on auth]
✓ Background task delegated. You can continue working.
Main Session Continues (Non-Blocking):
[You continue in the main conversation]
$ kiro "Add JWT authentication to the Express API"
🔨 Implementing JWT Authentication
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Installing dependencies...
$ npm install jsonwebtoken bcrypt
Step 2: Creating auth middleware...
[Creating src/middleware/auth.js]
Step 3: Adding login endpoint...
[Creating POST /api/auth/login]
[While you work, background agent runs in parallel]
Background Task Completes:
🎉 Background Task Complete: agent_20240102_143052
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Task: Fix ESLint errors
Duration: 12m 34s
Status: Success ✓
Summary of Changes:
├─ Fixed 47 ESLint errors across 12 files
│ ├─ 23 × no-unused-vars (removed dead code)
│ ├─ 15 × prefer-const (changed let → const)
│ ├─ 6 × no-console (added eslint-disable comments)
│ └─ 3 × missing-return (added return statements)
├─ Ran tests: 156 passed, 0 failed
└─ Created commit: "fix: resolve 47 ESLint errors"
Files modified:
M src/controllers/userController.js
M src/services/authService.js
M src/utils/validation.js
... (9 more files)
Full log: /tmp/kiro_delegate_001.log
Checking Background Task Status:
$ kiro "/delegate status"
📋 Active Background Tasks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
agent_20240102_143052 [COMPLETE] ✓
Task: Fix ESLint errors
Started: 2:30 PM
Completed: 2:43 PM (12m 34s)
Status: Success
agent_20240102_143515 [RUNNING] ⏳
Task: Run integration test suite
Started: 2:35 PM
Elapsed: 8m 15s
Progress: Running test 47/89...
agent_20240102_144201 [FAILED] ✗
Task: Update all dependencies
Started: 2:42 PM
Failed: 2:45 PM (3m 12s)
Error: Dependency conflict in @types/node
Log: /tmp/kiro_delegate_003.log
Monitoring Real-Time Progress:
$ tail -f /tmp/kiro_delegate_002.log
[14:35:12] Starting integration tests...
[14:36:45] ✓ Auth flow tests (12 tests, 2.3s)
[14:37:23] ✓ Database migrations (8 tests, 3.1s)
[14:38:01] ⏳ API endpoint tests (running 47/89...)
[14:38:45] ✓ POST /api/users (201 response, 0.8s)
[14:39:12] ✓ GET /api/users/:id (200 response, 0.5s)
...
You’re seeing exactly what concurrent programming enables - parallelism that lets you stay productive while long-running tasks complete in the background!
The Core Question You’re Answering
“How do you make Kiro non-blocking so you can work on Task A while Task B executes in parallel?”
Before you write any code, sit with this question. Most CLI tools are synchronous (“wait for this to finish before doing anything else”), but modern development workflows demand parallelism:
Synchronous (blocking) workflow:
You: "Run the test suite"
Kiro: [runs 500 tests for 15 minutes]
You: [waits... twiddling thumbs... can't do anything else]
Kiro: "Tests passed!"
You: [finally continues]
Asynchronous (non-blocking) workflow:
You: "Delegate running the test suite to a background agent"
Kiro: [spawns background agent]
Background Agent: [runs 500 tests for 15 minutes in parallel]
You: [continues working on auth implementation]
You: [continues working on API docs]
Background Agent: "Tests passed!" [notifies when done]
This is the same pattern as:
- GitHub Actions (CI/CD in background while you keep coding)
- Background jobs in web apps (Sidekiq, Celery, Bull)
- Async/await in programming (non-blocking I/O)
Concepts You Must Understand First
Stop and research these before coding:
- Process Management (Unix)
- What is a process? (running program with PID)
- How do you spawn a background process in Bash? (`command &`, `nohup`, `disown`)
- What happens when a parent process exits? (orphaned processes, init adoption)
- Book Reference: “Advanced Programming in the UNIX Environment” Ch. 9 (Process Relationships) - Stevens & Rago
- Inter-Process Communication (IPC)
- How do two processes communicate? (pipes, sockets, files, signals)
- What is stdout/stderr redirection? (`>`, `>>`, `2>&1`)
- How do you read a process’s output while it’s running? (`tail -f`, named pipes)
- Book Reference: “The Linux Programming Interface” Ch. 44 (Pipes and FIFOs) - Michael Kerrisk
- Job Control
- What is job control? (fg, bg, jobs, kill)
- How do you bring a background job to the foreground? (`fg %1`)
- What signals exist? (SIGTERM, SIGKILL, SIGINT, SIGHUP)
- Book Reference: “Learning the bash Shell” Ch. 8 (Job Control) - Cameron Newham
- Async Execution Patterns
- What is the difference between parallel and concurrent?
- How do you wait for multiple background tasks? (`wait $PID1 $PID2`)
- What is a task queue? (producers add tasks, workers consume)
- Blog Reference: “Concurrency vs Parallelism” - Rob Pike (Go creator)
- Exit Codes & Error Handling
- What do exit codes mean? (0 = success, 1-255 = error)
- How do you capture a background process’s exit code? (`wait $PID; echo $?`)
- How do you handle failures in background tasks?
- Docs Reference: Bash manual on exit status
Questions to Guide Your Design
Before implementing, think through these:
- Task Lifecycle
- How do you spawn a background Kiro session? (new process? Docker container? tmux pane?)
- Where do you store task metadata? (PID, log file, status, start time)
- How do you track which tasks are running vs completed?
- Communication Protocol
- How does the main session know when a background task completes?
- File-based polling (check status.json every 5s)?
- Signal-based notification (SIGUSR1 when done)?
- Webhook/HTTP callback?
- Logging & Observability
- Where do background task logs go? (separate file per task? centralized?)
- How do you tail logs in real-time? (`tail -f`)
- How do you prevent log files from growing unbounded? (rotation, max size)
- Error Handling
- What if a background task crashes? (save stack trace to log)
- What if a background task hangs? (timeout after 1 hour?)
- What if the main session exits while background tasks run? (orphan cleanup?)
- Resource Limits
- How many background tasks can run concurrently? (CPU cores, memory limits)
- Should you queue tasks if too many are running? (max 4 concurrent)
- How do you prioritize tasks? (critical > high > normal > low)
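To make the lifecycle, metadata, and status questions above concrete, here is a minimal Python sketch of one possible task registry; it mirrors the Bash hints later in this project, and the paths, field names, and helper names are illustrative assumptions rather than Kiro’s actual mechanism:

```python
import json, os, subprocess, time
from pathlib import Path

REGISTRY = Path("/tmp/kiro_tasks.json")   # illustrative registry location

def delegate(task_id, command):
    """Spawn a background process, redirect output to a log, record metadata (sketch)."""
    log_path = f"/tmp/kiro_delegate_{task_id}.log"
    with open(log_path, "w") as log:
        proc = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT,
                                start_new_session=True)  # detach so it survives our exit
    entry = {
        "id": task_id,
        "pid": proc.pid,
        "status": "running",
        "log": log_path,
        "started": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "task": " ".join(command),
    }
    registry = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {"tasks": []}
    registry["tasks"].append(entry)
    REGISTRY.write_text(json.dumps(registry, indent=2))
    return entry

def poll_status(pid):
    """Completion detection by polling: is the PID still alive?"""
    try:
        os.kill(pid, 0)          # signal 0 checks existence without sending anything
        return "running"
    except ProcessLookupError:
        return "finished"
```

Calling `delegate("agent_001", ["kiro", "Fix all ESLint errors in src/"])` would be the Python analogue of the `nohup ... &` pattern used in the hints below.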
Thinking Exercise
Trace Background Task Execution
Before coding, manually trace this workflow:
Given:
- Main session: User asks Kiro to implement JWT auth
- Background task: Fix ESLint errors (15-minute task)
Trace each step:
- User Delegates Task
$ kiro "Delegate fixing ESLint errors while I work on auth"
- Main Kiro session detects “delegate” keyword
- Creates task metadata:
  { "id": "agent_20240102_143052", "task": "Fix ESLint errors", "status": "starting",
    "pid": null, "log_file": "/tmp/kiro_delegate_001.log", "start_time": "2024-01-02T14:30:52Z" }
- Question: Where is this metadata stored? (file? database? in-memory?)
- Spawn Background Process
# Main session executes:
nohup kiro "Fix all ESLint errors in src/" > /tmp/kiro_delegate_001.log 2>&1 &
BACKGROUND_PID=$!
# Update metadata with PID
echo $BACKGROUND_PID > /tmp/kiro_delegate_001.pid
- `nohup`: Ignore SIGHUP (session logout won’t kill it)
- `&`: Run in background
- `$!`: Capture PID of last background process
- Question: What if the background Kiro process spawns subprocesses? (process tree)
- Main Session Continues (Non-Blocking)
# User continues working
$ kiro "Add JWT authentication"
# Main session is responsive immediately
[Working on auth implementation...]
- Background task runs in parallel
- Main session doesn’t wait
- Question: How does Kiro prevent context pollution? (separate conversation history?)
- Background Task Runs
# In the background (separate process):
[Background Agent Log - /tmp/kiro_delegate_001.log]
[14:30:52] Starting task: Fix ESLint errors
[14:31:05] Running: eslint src/ --fix
[14:31:45] Fixed 47 errors across 12 files
[14:32:10] Running tests: npm test
[14:43:15] Tests passed (156/156)
[14:43:20] Creating commit: "fix: resolve 47 ESLint errors"
[14:43:26] Task complete ✓
- Question: How does the background agent know to commit? (task instructions)
- Completion Detection
# Background process writes completion metadata
{ "id": "agent_20240102_143052", "status": "complete", "exit_code": 0,
  "end_time": "2024-01-02T14:43:26Z", "summary": "Fixed 47 ESLint errors, tests passed" }
# Main session polls or gets notified
[Main Session] Background task agent_20240102_143052 completed ✓
- Question: Polling (check every 10s) vs event-driven (callback)?
Questions while tracing:
- What if the background task needs user input? (can’t prompt, must fail gracefully)
- What if the background task modifies files the main session is using? (file locking, conflict resolution)
- What if two background tasks both try to commit? (git lock conflict)
The Interview Questions They’ll Ask
Prepare to answer these:
-
“Explain the difference between concurrency and parallelism. How does background task delegation relate to each?”
-
“Your background task hangs indefinitely. How would you implement a timeout mechanism in Bash?”
-
“You have 10 background tasks queued but only 2 CPU cores. How would you schedule them efficiently?”
-
“A background task crashes halfway through. How do you ensure it doesn’t leave the codebase in a broken state?”
-
“Walk me through how `nohup command &` works at the OS level. What happens when the parent shell exits?”
-
“You’re running 5 background agents. How would you implement a priority queue so critical tasks run first?”
Hints in Layers
Hint 1: Start with Process Spawning
Before building the full system, prove you can spawn a background process and capture its output. Test with a simple sleep command:
# Spawn background process
nohup sleep 30 > /tmp/test.log 2>&1 &
BACKGROUND_PID=$!
echo "Started PID: $BACKGROUND_PID"
# Check if it's running
ps -p $BACKGROUND_PID
Hint 2: Store Task Metadata
Create a task registry (simple JSON file):
# /tmp/kiro_tasks.json
{
  "tasks": [
    {
      "id": "agent_001",
      "pid": 12345,
      "status": "running",
      "log": "/tmp/kiro_delegate_001.log",
      "started": "2024-01-02T14:30:52Z",
      "task": "Fix ESLint errors"
    }
  ]
}
Use jq to read/write JSON from Bash.
Hint 3: Delegate Command
Implement a /delegate command in Kiro:
# Pseudocode for /delegate handler
if user_input.startswith("/delegate"):
    task_description = extract_task(user_input)
    # Create task metadata
    task_id = generate_id()  # agent_YYYYMMDD_HHMMSS
    log_file = f"/tmp/kiro_delegate_{task_id}.log"
    # Spawn background Kiro process
    pid = spawn_background(f"kiro '{task_description}'", log_file)
    # Register task
    register_task(task_id, pid, log_file, task_description)
    # Notify user
    print(f"✓ Delegated task {task_id} (PID {pid})")
Hint 4: Monitor Background Tasks
Implement a /delegate status command:
# Read task registry
tasks = read_json("/tmp/kiro_tasks.json")
for task in tasks:
    pid = task['pid']
    # Check if process is still running
    if process_exists(pid):
        status = "RUNNING ⏳"
    else:
        exit_code = get_exit_code(pid)  # from wait $pid
        status = "COMPLETE ✓" if exit_code == 0 else "FAILED ✗"
    print(f"{task['id']} [{status}] {task['task']}")
Hint 5: Tail Logs in Real-Time
Allow users to monitor background tasks:
# Command: kiro "/delegate logs agent_001"
log_file = get_log_file("agent_001")
subprocess.run(["tail", "-f", log_file])
Hint 6: Wait for Completion
Implement a blocking wait if needed:
# Command: kiro "/delegate wait agent_001"
pid=$(get_pid agent_001)   # look up the PID from the task registry
wait $pid                  # Blocks until process exits (only works for children of this shell)
exit_code=$?
if [ $exit_code -eq 0 ]; then
  echo "✓ Task completed successfully"
else
  echo "✗ Task failed with exit code $exit_code"
fi
Hint 7: Cleanup Orphaned Tasks
When main session exits, decide what to do with background tasks:
# Option 1: Kill all background tasks
trap 'kill $(jobs -p)' EXIT
# Option 2: Let them continue (orphan them)
disown -a # Remove from job table
# Option 3: Ask user
echo "Background tasks running. Kill them? (y/n)"
Hint 8: Prevent Context Pollution
Each background agent should have an isolated conversation history:
# Spawn with fresh context
kiro --new-session "Fix ESLint errors"
# Or use explicit context isolation
kiro --context-id "delegate_001" "Fix ESLint errors"
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Process Management & Job Control | “Advanced Programming in the UNIX Environment” by Stevens & Rago | Ch. 9-10 |
| Inter-Process Communication | “The Linux Programming Interface” by Michael Kerrisk | Ch. 44 (Pipes), Ch. 63 (Sockets) |
| Bash Background Jobs | “Learning the bash Shell” by Cameron Newham | Ch. 8 (Job Control) |
| Signals & Process Control | “Linux System Programming” by Robert Love | Ch. 5 (Process Management) |
| Concurrency Patterns | “The Art of Concurrency” by Clay Breshears | Ch. 2 (Threads vs Processes) |
Common Pitfalls & Debugging
Problem 1: “Background task exits immediately after spawning”
- Why: The background process inherits stdin/stdout tied to the terminal, which closes when the parent exits.
- Fix: Use `nohup` and redirect all I/O: `nohup command > log.txt 2>&1 &`
- Quick test: Logout and login again, then `ps aux | grep kiro` - background process should still be running.
Problem 2: “Can’t read background task’s PID after spawning”
- Why: `$!` only works immediately after `&`. If you run other commands, `$!` changes.
- Fix: Capture PID immediately: `command & PID=$!; echo $PID > task.pid`
- Quick test: `cat task.pid` should show the correct PID, verify with `ps -p $(cat task.pid)`.
Problem 3: “Background task writes to main session’s stdout (pollutes output)”
- Why: Background process still has stdout pointing to the terminal.
- Fix: Redirect stdout/stderr to a log file: `command > /tmp/task.log 2>&1 &`
- Quick test: Main session should have clean output, logs go to file only.
Problem 4: “Background task hangs forever, no timeout”
- Why: No timeout mechanism in place.
- Fix: Use the `timeout` command: `timeout 1h kiro "long task" &` (kills after 1 hour)
- Quick test: `timeout 5s sleep 10 &` - process should die after 5 seconds.
Problem 5: “Background task conflicts with main session (file locks, git operations)”
- Why: Both sessions try to modify the same files or run `git commit` simultaneously.
- Fix: Implement file locking (`flock`) or coordinate via task queue (only one git operation at a time).
- Quick test: Start two background tasks that both commit - second should wait or fail gracefully.
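A minimal sketch of the file-locking idea in Python (`fcntl` is Unix-only, and the lock path is an illustrative assumption); any shared lock file serializes the critical section across agents:

```python
import fcntl
import subprocess

def commit_with_lock(message):
    """Serialize git commits across concurrent agents with an advisory lock file."""
    with open("/tmp/kiro_git.lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)       # blocks until no other agent holds the lock
        try:
            subprocess.run(["git", "add", "-A"], check=True)
            subprocess.run(["git", "commit", "-m", message], check=True)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)   # also released when the file handle closes
```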
Definition of Done
- Delegate command works: `/delegate "run tests"` spawns a background Kiro process
- PID is captured: Background task’s PID is stored and accessible
- Logs are isolated: Each background task writes to its own log file
- Main session is non-blocking: User can continue working immediately after delegating
- Status command works: `/delegate status` shows running/completed/failed tasks
- Real-time monitoring: `tail -f` on log files shows live progress
- Completion detection: Main session notifies when background tasks finish
- Exit code handling: Failed background tasks are marked as failed (non-zero exit code)
- Orphan cleanup: Background tasks don’t become zombies when main session exits
- Context isolation: Background tasks don’t pollute main session’s conversation history
Project 30: “The Recursive Prompt Improver” — Metacognition
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Natural Language |
| Coolness Level | Level 5: Pure Magic |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Metacognition |
What you’ll build: Ask Kiro to analyze its own steering and improve it based on mistakes.
Why it teaches Self-Correction: Your system gets smarter over time.
Success criteria:
- A new steering rule prevents a repeated mistake.
Real World Outcome
You will have a Kiro CLI workflow that enables self-improvement through metacognitive analysis - Kiro analyzes its own behavior, identifies patterns of mistakes, and proposes steering rules to prevent future errors. When you run it, you’ll see:
Triggering Self-Analysis:
$ kiro "You keep running eslint without --fix, so I have to manually fix errors. Analyze this pattern and improve your steering."
🧠 Metacognitive Analysis Mode
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Analyzing recent conversation history...
├─ Reviewing last 50 messages
├─ Identifying repeated patterns
└─ Searching for corrective feedback
🔍 Pattern Detected: ESLint Usage Anti-Pattern
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Occurrences: 8 times in last 2 weeks
Example 1 (2024-01-01):
Kiro: "Running ESLint to check for issues..."
$ eslint src/
User: "You found 23 errors but didn't fix them. Run it with --fix!"
Kiro: "Sorry! Running eslint src/ --fix"
Example 2 (2024-01-02):
Kiro: "Let me check for linting errors..."
$ eslint src/components/
User: "Again, you forgot --fix. Always use --fix!"
Kiro: "My apologies. Running eslint src/components/ --fix"
Example 3 (2023-12-29):
[Similar pattern repeated 6 more times]
Root Cause Analysis:
├─ Current steering doesn't specify --fix flag
├─ Default behavior is to report errors, not fix them
└─ User expects automatic fixing in 100% of cases
📝 Proposed Steering Rule
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Title: "Always Use ESLint --fix Flag"
Rule:
When running ESLint for any reason, ALWAYS include the --fix flag
to automatically fix correctable errors. Do not run eslint without
--fix unless the user explicitly requests a "dry run" or "report only".
Correct:
$ eslint src/ --fix
$ eslint src/**/*.js --fix
Incorrect:
$ eslint src/
$ eslint --debug src/
Exceptions:
- User says "check for errors but don't fix"
- User says "dry run"
- Running in CI/CD (--fix would modify files)
Priority: High (affects 40% of linting tasks)
Impact:
Before: 8 occurrences of this mistake
After: 0 expected occurrences
Time saved: ~2 minutes per occurrence = 16 min/week
Do you want me to add this steering rule to ~/.kiro/steering.md? (y/n)
User Approves and Rule is Added:
$ y
✅ Steering Rule Added
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
File: ~/.kiro/steering.md
Section: ## Linting & Code Quality
Rule: "Always Use ESLint --fix Flag"
Full rule saved. This will apply to all future conversations.
Would you like me to analyze other patterns? (y/n)
Subsequent Behavior (Rule in Effect):
$ kiro "Check the code quality in src/"
🔧 Running Code Quality Check
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
$ eslint src/ --fix
[Applying steering rule: "Always Use ESLint --fix Flag"]
Fixed 12 errors automatically:
├─ 7 × no-unused-vars
├─ 3 × prefer-const
└─ 2 × no-console
Remaining 3 errors require manual intervention:
├─ src/utils.js:45 - Undefined variable 'config'
├─ src/app.js:102 - Missing return statement
└─ src/api.js:23 - Async function without await
✓ Auto-fixable errors resolved
⚠ Manual errors remain - see above
Advanced: Multi-Pattern Analysis:
$ kiro "/meta-analyze last 100 messages"
🧠 Comprehensive Metacognitive Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Analyzing conversation history...
└─ Processed 100 messages, 42 tool calls, 18 corrections
Patterns Detected:
1. 📦 NPM Install Forgetting (12 occurrences)
- You forget to run npm install after adding dependencies
- Proposed Rule: "Always run npm install after package.json changes"
2. 🧪 Test Running Incomplete (8 occurrences)
- You run tests but don't check if they passed
- Proposed Rule: "Always verify test results before proceeding"
3. 📝 Commit Messages Too Vague (15 occurrences)
- Commit messages like "fix bug" instead of descriptive
- Proposed Rule: "Commit messages must describe WHAT and WHY"
4. 🔒 Secrets Exposure (2 occurrences)
- You almost committed API keys in .env files
- Proposed Rule: "NEVER commit .env files, always .gitignore them"
Generate steering rules for all 4 patterns? (y/n)
You’re seeing exactly what metacognition enables - a system that learns from its mistakes and self-improves through reflective analysis!
The Core Question You’re Answering
“How do you make an AI agent that learns from its own mistakes without retraining or fine-tuning?”
Before you write any code, sit with this question. Most AI systems are static - they repeat the same mistakes forever because they have no mechanism for self-correction. But humans learn through:
- Reflection - “I made a mistake”
- Analysis - “Why did I make it?”
- Abstraction - “What pattern caused this?”
- Rule Formation - “What rule would prevent it?”
- Application - “Follow the rule next time”
This is metacognition - thinking about thinking. The system watches itself, identifies failure modes, and updates its behavior.
Traditional approach (static):
Kiro: [makes mistake]
User: [corrects]
Kiro: "Sorry, fixed it"
[2 days later]
Kiro: [makes same mistake again]
Metacognitive approach (self-improving):
Kiro: [makes mistake]
User: [corrects]
Kiro: "I notice I've made this mistake 8 times. Let me analyze..."
Kiro: [proposes steering rule]
User: "Yes, add that rule"
Kiro: [updates steering]
[2 days later]
Kiro: [applies rule, avoids mistake]
This is how AlphaGo learned (self-play), how humans learn (reflection), and how expert systems evolve (knowledge base updates).
Concepts You Must Understand First
Stop and research these before coding:
- Metacognition (Thinking About Thinking)
- What is metacognition? (awareness of one’s own thought processes)
- How do humans self-correct? (error detection → analysis → strategy change)
- What is the OODA loop? (Observe, Orient, Decide, Act)
- Book Reference: “Thinking, Fast and Slow” Ch. 20-21 (Self-Monitoring) - Daniel Kahneman
- Conversation Analysis & Pattern Mining
- How do you detect repeated patterns in text? (regex, n-grams, semantic clustering)
- What is cosine similarity for semantic patterns? (vector comparison)
- How do you extract “correction events”? (user says “no, do it this way”)
- Paper Reference: “Extracting Patterns from Conversational Data” - NLP literature
- Steering/System Prompts
- What is a system prompt? (instructions that guide LLM behavior)
- How do steering rules work? (constraints added to every request)
- What’s the difference between few-shot examples and rules? (examples vs constraints)
- Docs Reference: Anthropic’s “Prompt Engineering Guide”
- Rule Synthesis from Examples
- How do you generalize from specific examples? (abstraction)
- What makes a good rule? (clear, actionable, measurable)
- How do you avoid overfitting rules? (too specific = not generalizable)
- Book Reference: “AI: A Modern Approach” Ch. 19 (Learning from Examples) - Russell & Norvig
- Feedback Loops & System Stability
- What is a feedback loop? (output affects future input)
- What is positive vs negative feedback? (amplifying vs dampening)
- How do you prevent runaway rule creation? (too many rules = conflict)
- Book Reference: “Thinking in Systems” Ch. 1 (Feedback Loops) - Donella Meadows
Questions to Guide Your Design
Before implementing, think through these:
- Pattern Detection
- How do you identify a “mistake”? (user correction keywords: “no”, “actually”, “you forgot”)
- How many occurrences make a “pattern”? (3+ times = pattern, <3 = one-off)
- How do you cluster similar mistakes? (semantic similarity of corrections)
- Analysis Triggering
- User-initiated (“/meta-analyze”) vs automatic (after 3 corrections)?
- Real-time (during conversation) vs batch (end of day)?
- Threshold-based (trigger after N mistakes)?
- Rule Formulation
- Template-based (“Always X when Y”) vs freeform?
- Should rules include examples (few-shot) or just constraints?
- How specific should rules be? (per-project vs global)
- Rule Storage & Application
- Where are rules stored? (steering.md, JSON config, database)
- How are rules loaded? (startup vs dynamic reload)
- Priority/precedence: What if rules conflict? (specific > general)
- Validation & Testing
- How do you test if a rule works? (simulate past mistakes, verify prevention)
- How do you detect bad rules? (too restrictive, blocks valid actions)
- Should rules expire? (remove if not triggered in 3 months)
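One way to sketch answers to the storage, precedence, and expiry questions above is to keep structured metadata next to each rule; the field names and thresholds below are illustrative assumptions, not a Kiro-defined format:

```python
from datetime import datetime, timedelta

# Illustrative rule record with the metadata needed for precedence and pruning decisions.
rule = {
    "title": "Always run npm install after modifying package.json",
    "scope": "project",                  # "project" rules take precedence over "global" ones
    "priority": "high",
    "examples": 3,                       # number of corrections that justified the rule
    "created": datetime(2024, 1, 2),
    "last_triggered": datetime(2024, 1, 5),
}

def is_stale(rule, max_idle_days=90):
    """Flag rules that have not fired recently as candidates for pruning."""
    return datetime.now() - rule["last_triggered"] > timedelta(days=max_idle_days)
```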
Thinking Exercise
Trace Metacognitive Loop
Before coding, manually trace this self-improvement cycle:
Given:
- Conversation history: 50 messages
- User has corrected Kiro 3 times for forgetting `npm install`
Trace each step:
- Error Detection (Reflection)
Message 12:
  Kiro: "I’ve added express to package.json"
  User: "You forgot to run npm install!"
  Kiro: "Installing now: npm install"
Message 28:
  Kiro: "Added jsonwebtoken to dependencies"
  User: "npm install? You always forget this!"
  Kiro: "Sorry! Running npm install"
Message 45:
  Kiro: "Updated to React 18 in package.json"
  User: "AGAIN! npm install!!"
  Kiro: "My apologies. Running npm install"
- Question: How do you detect the correction pattern? (user frustration escalates)
- Pattern Extraction
# Pseudocode
corrections = []
for i, msg in enumerate(messages):
    if user_corrected(msg):  # Contains "forgot", "you always", "again"
        corrections.append({
            'index': i,
            'context': messages[i-1],   # What Kiro did wrong
            'correction': msg,
            'fix': messages[i+1]        # What Kiro did to fix
        })
# Group similar corrections
clusters = cluster_by_similarity(corrections)
# Cluster 1: "npm install" corrections (3 occurrences)
- Question: What similarity threshold defines a cluster? (cosine > 0.8?)
- Root Cause Analysis
Cluster: "NPM Install Forgetting" (3 occurrences)
Common pattern:
  1. Kiro modifies package.json (add/update dependency)
  2. Kiro does NOT run npm install
  3. User reminds Kiro to run npm install
  4. Kiro runs npm install
Root cause:
  - Current steering doesn't link package.json changes → npm install
  - Kiro treats them as independent actions
- Question: How do you infer causality? (sequence analysis: A always followed by B)
- Rule Synthesis
Proposed Steering Rule:
  ## Dependency Management
  **Always run `npm install` after modifying package.json**
  When you add, update, or remove dependencies in package.json, IMMEDIATELY run `npm install` to sync node_modules.
  Correct sequence:
    1. Edit package.json (add dependency)
    2. Run npm install
    3. Verify installation succeeded
  Don't forget this step - it's required for dependencies to be usable.
- Question: Is this rule too specific? (what about yarn, pnpm?)
- User Approval & Application
User: y (approves rule)
# Rule added to ~/.kiro/steering.md
# Next conversation, rule is loaded
Kiro: "Adding lodash to package.json..."
[Applying rule: "Always run npm install after modifying package.json"]
Kiro: "Running npm install..."
$ npm install
Kiro: "✓ lodash installed successfully"
- Question: How do you verify the rule prevented the mistake? (no correction needed)
Questions while tracing:
- What if the user corrects something that’s actually context-specific? (rule would be wrong)
- What if two rules conflict? (“Always X” vs “Never X in situation Y”)
- What if a rule is too broad? (blocks valid edge cases)
The Interview Questions They’ll Ask
Prepare to answer these:
-
“Explain the difference between metacognition in humans and self-improvement in AI systems. What are the key similarities and differences?”
-
“Your Kiro agent proposes a steering rule that’s too specific: ‘Always use port 3000 for Express servers.’ How would you generalize this into a better rule?”
-
“You’ve added 50 steering rules over 6 months. Now Kiro is slow and rules conflict. How do you prune and consolidate rules?”
-
“Walk me through how you would detect that a steering rule is harmful (blocking valid actions). What metrics would you track?”
-
“How would you prevent an adversarial user from poisoning the steering rules by giving intentionally bad corrections?”
-
“Explain the concept of ‘overfitting’ in machine learning. How does it relate to creating overly specific steering rules?”
Hints in Layers
Hint 1: Start with Manual Analysis
Don’t automate pattern detection immediately. First, manually review your conversation history and identify 3 real mistakes Kiro made repeatedly. Write them down with examples.
Hint 2: Implement Correction Detection
Scan conversation history for user corrections using keyword matching:
correction_keywords = [
    "you forgot",
    "you always",
    "again",
    "no, do it this way",
    "that's wrong",
    "actually",
    "incorrect"
]
for msg in messages:
    if any(kw in msg.content.lower() for kw in correction_keywords):
        # Mark as correction
        corrections.append(msg)
Hint 3: Cluster Similar Corrections
Use embeddings to group semantically similar corrections:
# Embed each correction
correction_embeddings = [
    embed(c.content) for c in corrections
]
# Compute pairwise similarity
from sklearn.cluster import DBSCAN
clusters = DBSCAN(eps=0.3, min_samples=2).fit(correction_embeddings)
# Cluster 0: npm install corrections
# Cluster 1: eslint --fix corrections
# Cluster 2: commit message corrections
Hint 4: Extract Pattern Context
For each cluster, extract the Kiro action that preceded the correction:
for cluster in clusters:
    for correction in cluster.corrections:
        prev_msg = get_previous_message(correction)  # What Kiro did
        next_msg = get_next_message(correction)      # How Kiro fixed it
        pattern = {
            'mistake': prev_msg.content,
            'correction': correction.content,
            'fix': next_msg.content
        }
Hint 5: Generate Rule Template
Use an LLM to synthesize a rule from the pattern:
prompt = f"""
Based on these examples of a repeated mistake:
Example 1: {pattern_1}
Example 2: {pattern_2}
Example 3: {pattern_3}
Generate a steering rule that would prevent this mistake in the future.
Format:
## [Category]
**[Rule Title]**
[Rule description with examples of correct behavior]
"""
proposed_rule = llm(prompt)
Hint 6: Present for User Approval
Display the proposed rule and ask for confirmation:
print(f"""
Proposed Steering Rule:
{proposed_rule}
Impact:
- Occurrences: {len(pattern.examples)}
- Estimated time saved: {time_estimate}
Add this rule to steering.md? (y/n)
""")
Hint 7: Append Rule to Steering File
If approved, append to ~/.kiro/steering.md:
import os

if user_approves:
    with open(os.path.expanduser('~/.kiro/steering.md'), 'a') as f:
        f.write(f"\n\n{proposed_rule}\n")
    print("✅ Rule added successfully")
Hint 8: Verify Rule Application
In future conversations, check if the rule prevents the mistake:
# Load steering rules at startup
steering_rules = load_steering('~/.kiro/steering.md')
# Before each action, check rules
if action == 'modify package.json':
    relevant_rules = [r for r in steering_rules if 'npm install' in r]
    if relevant_rules:
        print("[Applying rule: 'Always run npm install after package.json changes']")
        run_npm_install()
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Metacognition & Self-Monitoring | “Thinking, Fast and Slow” by Daniel Kahneman | Ch. 20-21 |
| Learning from Examples | “Artificial Intelligence: A Modern Approach” by Russell & Norvig | Ch. 19 |
| Feedback Loops & Systems | “Thinking in Systems” by Donella Meadows | Ch. 1 |
| Pattern Mining in Text | “Speech and Language Processing” by Jurafsky & Martin | Ch. 8 (Sequence Labeling) |
| Prompt Engineering | “The Prompt Engineering Guide” (online) | All chapters |
Common Pitfalls & Debugging
Problem 1: “Too many false positives - normal feedback detected as corrections”
- Why: Overly broad keyword matching. “Actually, that looks good” is not a correction.
- Fix: Use sentiment analysis or semantic similarity. Corrections have negative sentiment + suggest alternative action.
- Quick test: Review 10 detected “corrections” - should all be actual mistakes, not positive feedback.
Problem 2: “Proposed rules are too specific - don’t generalize”
- Why: Rule synthesized from a single example, not a pattern.
- Fix: Require 3+ examples before creating a rule. Use LLM to generalize.
- Quick test: Proposed rule should apply to related scenarios, not just the exact mistake.
Problem 3: “Steering file grows unbounded - 100+ rules conflict”
- Why: No pruning or consolidation mechanism.
- Fix: Implement rule expiry (remove if not triggered in 6 months), rule merging (combine similar rules).
- Quick test: Count rules - should be <50 active rules at any time.
Problem 4: “Rule prevents valid actions - too restrictive”
- Why: Rule doesn’t account for edge cases or exceptions.
- Fix: Add exception clauses to rules (“unless user explicitly requests X”).
- Quick test: User should be able to override any rule with explicit instruction.
Problem 5: “Metacognitive analysis is too slow - takes 30 seconds”
- Why: Embedding all messages and clustering on every request.
- Fix: Cache analysis results, run batch analysis periodically (not real-time).
- Quick test: Analysis should complete in <5 seconds for 100-message history.
Definition of Done
- Correction detection works: Identifies when user corrects Kiro’s behavior
- Pattern clustering works: Groups similar corrections into clusters (3+ examples)
- Root cause analysis works: Identifies the Kiro action that triggered correction
- Rule synthesis works: LLM generates a clear, actionable steering rule
- User approval flow works: Displays proposed rule and waits for confirmation
- Rule persistence works: Approved rules are saved to steering.md
- Rule application works: New conversations load and apply steering rules
- Mistake prevention verified: Past mistakes no longer occur after rule is added
- Rule quality checks: Rules are general enough to apply broadly, specific enough to be actionable
- Performance is acceptable: Metacognitive analysis completes in <10 seconds
Project 31: “The Legacy Code Archaeologist” — Code Understanding
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | C / Java (Legacy) |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Code Understanding |
What you’ll build: Map a legacy repo and generate a call-graph summary.
Why it teaches Exploration: The AI becomes a codebase cartographer.
Success criteria:
- `DIAGRAM.md` matches real entry points and call flow.
Real World Outcome
You will have a Kiro CLI workflow that automatically reverse-engineers legacy codebases by generating architectural diagrams, call graphs, and dependency maps from undocumented code. When you run it, you’ll see:
Analyzing a Legacy Codebase:
$ kiro "Analyze this legacy Java codebase and create a DIAGRAM.md with the architecture"
🔍 Legacy Code Archaeologist - Codebase Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phase 1: Discovery
├─ Scanning directory structure...
│ └─ Found 247 Java files across 38 packages
├─ Identifying entry points...
│ ├─ Main.java (static void main)
│ ├─ ServletInitializer.java (web entry)
│ └─ ScheduledTasks.java (cron jobs)
└─ Extracting package structure...
✓ com.legacy.app (core business logic)
✓ com.legacy.db (database layer)
✓ com.legacy.web (servlets & controllers)
✓ com.legacy.util (utilities)
Phase 2: Call Graph Construction
├─ Parsing method calls...
│ └─ Analyzed 1,843 methods across 247 files
├─ Building dependency graph...
│ └─ Detected 4,521 method-to-method calls
└─ Identifying hot paths...
✓ UserService.authenticate() called from 23 places
✓ DatabaseConnection.execute() called from 156 places
✓ LegacyValidator.validate() called from 89 places
Phase 3: Architecture Extraction
├─ Detecting design patterns...
│ ├─ Singleton: DatabaseConnection, ConfigManager
│ ├─ Factory: UserFactory, OrderFactory
│ └─ Observer: EventBus, NotificationManager
├─ Identifying layers...
│ └─ 3-tier architecture (Web → Service → DAO)
└─ Extracting data flow...
✓ HTTP Request → Servlet → Service → DAO → Database
✍️ Generating DIAGRAM.md...
✓ Architecture diagram created
✓ Call graph generated
✓ Dependency map visualized
✓ Hot paths documented
File created: DIAGRAM.md (523 lines)
Generated DIAGRAM.md Content:
# Legacy App Architecture - Reverse Engineered
## System Overview
This is a 3-tier Java web application built circa 2008 (Servlet 2.5 era).
No documentation exists - this diagram was generated via static analysis.
## Entry Points
1. **Web Requests** → `com.legacy.web.MainServlet`
2. **Scheduled Jobs** → `com.legacy.jobs.ScheduledTasks`
3. **CLI Admin Tool** → `com.legacy.admin.Main`
## Architecture Diagram
┌─────────────────────────────────────────────────┐
│                    WEB LAYER                    │
│  ┌───────────────────────────────────────────┐  │
│  │ MainServlet, UserServlet, OrderServlet    │  │
│  └───────────────────────────────────────────┘  │
└─────────────────┬───────────────────────────────┘
                  │ (HTTP requests)
                  ↓
┌─────────────────────────────────────────────────┐
│                  SERVICE LAYER                  │
│  ┌───────────────────────────────────────────┐  │
│  │ UserService, OrderService, AuthService    │  │
│  │ PaymentService, NotificationService       │  │
│  └───────────────────────────────────────────┘  │
└─────────────────┬───────────────────────────────┘
                  │ (business logic)
                  ↓
┌─────────────────────────────────────────────────┐
│                    DAO LAYER                    │
│  ┌───────────────────────────────────────────┐  │
│  │ UserDAO, OrderDAO, PaymentDAO             │  │
│  │ (JDBC-based, no ORM)                      │  │
│  └───────────────────────────────────────────┘  │
└─────────────────┬───────────────────────────────┘
                  │ (SQL queries)
                  ↓
┌─────────────────────────────────────────────────┐
│                    DATABASE                     │
│  MySQL 5.6 (inferred from JDBC driver)          │
│  Schema: users, orders, payments, audit_log     │
└─────────────────────────────────────────────────┘
## Critical Path: User Authentication
HTTP POST /login
  │
  ↓
MainServlet.doPost()
  │
  ├─ validate(request)
  │    └─ LegacyValidator.validate()  [89 call sites!]
  │
  ├─ UserService.authenticate(username, password)
  │    │
  │    ├─ UserDAO.findByUsername()
  │    │    └─ DatabaseConnection.execute(sql)
  │    │         └─ “SELECT * FROM users WHERE username = ?”
  │    │
  │    └─ PasswordHasher.verify(inputPwd, storedHash)
  │         └─ WARNING: Uses MD5 (insecure!)
  │
  ├─ SessionManager.createSession(user)
  │
  └─ response.redirect(“/dashboard”)
## Dependency Hot Spots
(Methods called most frequently - refactoring risks)
1. **DatabaseConnection.execute()** - 156 call sites
- Every database query goes through this
- Refactoring would break the entire app
2. **LegacyValidator.validate()** - 89 call sites
- Input validation scattered everywhere
- No clear validation strategy
3. **UserService.authenticate()** - 23 call sites
- Called from multiple servlets
- Session management tightly coupled
## Design Pattern Usage
| Pattern | Implementation | Notes |
|---------|----------------|-------|
| Singleton | DatabaseConnection, ConfigManager | Thread-unsafe (bug risk) |
| Factory | UserFactory, OrderFactory | Creates domain objects |
| Observer | EventBus, NotificationManager | Event-driven async tasks |
| DAO | *DAO classes | Raw JDBC, no ORM |
## Technical Debt Detected
1. ❌ **MD5 Password Hashing** (CRITICAL)
- File: `PasswordHasher.java:42`
- Risk: Passwords crackable in seconds
2. ⚠️ **Thread-Unsafe Singleton** (HIGH)
- File: `DatabaseConnection.java`
- Risk: Race conditions under load
3. ⚠️ **No Connection Pooling** (MEDIUM)
- Each request creates new DB connection
- Scalability bottleneck
4. ℹ️ **Servlet 2.5** (LOW)
- Ancient API (2005)
- Recommend upgrade to Servlet 4.0+
You’re seeing exactly what code archaeology enables - transforming undocumented legacy systems into understandable, maintainable architectures!
The Core Question You’re Answering
“How do you understand a 10-year-old codebase with zero documentation and the original developers long gone?”
Before you write any code, sit with this question. Every developer faces this:
- You inherit a legacy project (acquired company, staff turnover, archaeological dig)
- No README, no diagrams, no comments worth reading
- “Just read the code” - but there are 500 files and 100K lines
Manual approach (weeks of work):
Week 1: Read random files, get overwhelmed
Week 2: Find the entry point (main() or servlet)
Week 3: Trace execution paths with a debugger
Week 4: Draw diagrams on a whiteboard
Week 5: Finally understand 20% of the system
AI-assisted approach (hours):
Step 1: Feed entire codebase to Kiro
Step 2: "Map the architecture and generate call graphs"
Step 3: Review generated DIAGRAM.md
Step 4: Ask follow-up questions ("Why does UserService call PaymentDAO directly?")
Result: 80% understanding in 4 hours
This is code archaeology - using static analysis, pattern recognition, and LLM reasoning to reverse-engineer systems.
Concepts You Must Understand First
Stop and research these before coding:
- Static Code Analysis
- What is an Abstract Syntax Tree (AST)? (parse tree of code structure)
- How do you extract method calls from source code? (AST traversal)
- What tools exist? (Understand, Sourcetrail, javaparser, tree-sitter)
- Book Reference: “Compilers: Principles, Techniques, and Tools” Ch. 4 (Syntax Analysis) - Aho, Lam, Sethi, Ullman
- Call Graph Construction
- What is a call graph? (directed graph: nodes = methods, edges = calls)
- Static vs dynamic call graphs (compile-time vs runtime)
- How do you handle polymorphism? (method dispatch is ambiguous)
- Paper Reference: “Practical Algorithms for Call Graph Construction” - Grove & Chambers
- Dependency Analysis
- What is coupling? (how tightly connected are modules)
- What is cohesion? (how focused is a module’s purpose)
- How do you detect circular dependencies? (cycle detection in directed graphs)
- Book Reference: “Clean Architecture” Ch. 13-14 (Component Cohesion/Coupling) - Robert C. Martin
- Design Pattern Recognition
- Common patterns: Singleton, Factory, Observer, Strategy, DAO
- How do you detect patterns in code? (structural matching, AST patterns)
- What are anti-patterns? (God Object, Spaghetti Code, Shotgun Surgery)
- Book Reference: “Design Patterns” - Gamma, Helm, Johnson, Vlissides (Gang of Four)
- Legacy Code Characteristics
- What defines “legacy”? (no tests, no docs, fear of change)
- How do you prioritize what to understand first? (entry points, hot paths)
- What is the strangler fig pattern? (gradually replace legacy system)
- Book Reference: “Working Effectively with Legacy Code” Ch. 1-2 - Michael Feathers
Questions to Guide Your Design
Before implementing, think through these:
- Entry Point Detection
- How do you find main()? (search for `public static void main`)
- What about web apps? (Servlet annotations, web.xml)
- What about background jobs? (@Scheduled, cron config)
- Call Graph Scope
- Full codebase or just application code? (exclude libraries?)
- How deep to trace calls? (1 level? All transitive dependencies?)
- How do you handle reflection? (runtime method invocation)
- Visualization Format
- ASCII art in markdown (simple, readable in GitHub)
- Graphviz DOT (generates PNG diagrams)
- Mermaid.js (renders in markdown viewers)
- Prioritization
- What’s most important to document first? (entry points, critical paths)
- How do you identify “hot spots”? (most-called methods)
- What about dead code? (unreachable methods)
- Technical Debt Detection
- Security issues (MD5, SQL injection, XSS)
- Performance problems (N+1 queries, missing indexes)
- Maintainability issues (God classes, long methods)
Thinking Exercise
Trace Architecture Extraction
Before coding, manually analyze this legacy Java snippet:
Given:
// MainServlet.java
public class MainServlet extends HttpServlet {
protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
String username = req.getParameter("username");
String password = req.getParameter("password");
User user = UserService.getInstance().authenticate(username, password);
if (user != null) {
SessionManager.createSession(req, user);
resp.sendRedirect("/dashboard");
}
}
}
// UserService.java
public class UserService {
private static UserService instance;
public static UserService getInstance() {
if (instance == null) {
instance = new UserService();
}
return instance;
}
public User authenticate(String username, String password) {
User user = UserDAO.findByUsername(username);
if (user != null && PasswordHasher.verify(password, user.getPasswordHash())) {
return user;
}
return null;
}
}
Trace the analysis:
- Entry Point Identification
- `MainServlet.doPost()` is the entry point (HTTP POST /login)
- Question: How do you know this handles /login? (need web.xml or @WebServlet annotation)
- Call Graph
MainServlet.doPost()
├─ UserService.getInstance()
├─ UserService.authenticate()
│  ├─ UserDAO.findByUsername()
│  └─ PasswordHasher.verify()
└─ SessionManager.createSession()
- Question: What methods does UserDAO.findByUsername() call? (need to analyze UserDAO source)
- Pattern Detection
- Singleton: `UserService.getInstance()` (lazy initialization)
- DAO: `UserDAO` (data access layer)
- Question: Is this thread-safe? (NO! The unsynchronized lazy initialization is a race condition)
- Technical Debt
- Singleton is thread-unsafe (race condition)
- No input validation (SQL injection risk)
- Direct string comparison (timing attack on password)
- Question: What’s the priority order for fixing? (security > concurrency > style)
Questions while tracing:
- How do you handle method overloading? (multiple findByUsername() signatures)
- What if UserDAO uses reflection? (can’t see calls statically)
- How deep should the call graph go? (stop at library boundaries?)
The Interview Questions They’ll Ask
Prepare to answer these:
1. “Walk me through how you would reverse-engineer a 100K-line Java codebase with no documentation. What’s your systematic approach?”
2. “You detect a Singleton pattern in legacy code that’s accessed from 50 places. How would you refactor it safely without breaking the app?”
3. “Your call graph tool reports 10,000 method calls. How would you prioritize which ones to document first?”
4. “Explain the difference between static and dynamic call graphs. When would you need a dynamic call graph despite the extra complexity?”
5. “You find a method called 500 times across the codebase. How would you determine if this is a design problem or just a legitimate utility method?”
6. “How would you detect and visualize circular dependencies in a legacy codebase? What tools and algorithms would you use?” (a sketch follows below)
Hints in Layers
Hint 1: Start with Entry Point Detection Use grep to find main methods and servlets:
# Find Java main methods
rg "public static void main" --type java
# Find servlets
rg "extends HttpServlet" --type java
rg "@WebServlet" --type java
# Find Spring controllers
rg "@Controller|@RestController" --type java
Hint 2: Parse Source Code into AST Use a parsing library (don’t write a parser from scratch):
# For Java: use javalang or tree-sitter
import javalang
tree = javalang.parse.parse(java_source_code)
for path, node in tree.filter(javalang.tree.MethodInvocation):
print(f"Method call: {node.member}")
# Extract: UserService.getInstance().authenticate()
Hint 3: Build Call Graph Create a directed graph of method calls:
call_graph = {} # {caller: [callees]}
for file in java_files:
    tree = parse(file)                              # placeholder: e.g. javalang.parse.parse(source)
    for method in tree.methods:                     # pseudocode: iterate declared methods
        caller = f"{method.class_name}.{method.name}"
        callees = extract_method_calls(method)      # pseudocode helper: collect MethodInvocation targets
        call_graph[caller] = callees
# Example output:
# {
# 'MainServlet.doPost': ['UserService.authenticate', 'SessionManager.createSession'],
# 'UserService.authenticate': ['UserDAO.findByUsername', 'PasswordHasher.verify']
# }
Hint 4: Detect Hot Spots Count incoming edges to find most-called methods:
call_counts = {}
for caller, callees in call_graph.items():
    for callee in callees:
        call_counts[callee] = call_counts.get(callee, 0) + 1
# Sort by frequency
hot_spots = sorted(call_counts.items(), key=lambda x: x[1], reverse=True)
# [('DatabaseConnection.execute', 156), ('LegacyValidator.validate', 89), ...]
Hint 5: Identify Design Patterns Pattern matching on AST structure:
# Detect Singleton (lazy initialization)
for path, method in tree.filter(javalang.tree.MethodDeclaration):   # filter() yields (path, node) pairs
    if method.name == 'getInstance' and 'static' in method.modifiers:
        # Check for: if (instance == null) instance = new ...
        owner = next(node.name for node in reversed(path)
                     if isinstance(node, javalang.tree.ClassDeclaration))   # enclosing class
        print(f"Singleton candidate: {owner}")
Hint 6: Generate ASCII Diagram Format call paths as a tree:
def print_call_tree(method, graph, depth=0, max_depth=3):
    if depth > max_depth:
        return
    indent = " " * depth
    print(f"{indent}├─ {method}")
    for callee in graph.get(method, []):
        print_call_tree(callee, graph, depth + 1, max_depth)  # propagate the depth limit
# Output:
# ├─ MainServlet.doPost
# ├─ UserService.authenticate
# ├─ UserDAO.findByUsername
# └─ PasswordHasher.verify
Hint 7: Use LLM for Pattern Explanation Once you have the call graph, ask Kiro to explain it:
prompt = f"""
Based on this call graph:
{json.dumps(call_graph, indent=2)}
1. What architectural pattern is this? (MVC, layered, etc.)
2. Identify the entry points
3. Spot any design issues or anti-patterns
4. Generate a markdown diagram
"""
explanation = llm(prompt)  # `llm` stands in for your model call (e.g. a Kiro prompt)
Hint 8: Validate Generated Diagram Cross-reference with actual code execution:
- Run the app with a debugger
- Set breakpoints at entry points
- Trace actual call stack
- Compare with static analysis diagram
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Static Code Analysis | “Compilers: Principles, Techniques, and Tools” by Aho et al. | Ch. 4 (Syntax Analysis) |
| Call Graph Algorithms | “Engineering a Compiler” by Cooper & Torczon | Ch. 9 (Data-Flow Analysis) |
| Design Patterns | “Design Patterns” by Gang of Four | All chapters |
| Legacy Code Understanding | “Working Effectively with Legacy Code” by Michael Feathers | Ch. 1-2, 16 |
| Software Architecture | “Clean Architecture” by Robert C. Martin | Ch. 13-14 (Components) |
Common Pitfalls & Debugging
Problem 1: “Call graph includes too many library methods (java.util.*, etc.)”
- Why: No filtering - you’re graphing the entire JDK.
- Fix: Filter out standard library packages. Only graph application code.
- Quick test: Call graph should have <1000 nodes for a typical app.
Problem 2: “Missing method calls - graph is incomplete”
- Why: Reflection, lambda expressions, or method references not detected.
- Fix: Combine static analysis with dynamic profiling (run app with instrumentation).
- Quick test: Cross-check against actual execution trace from debugger.
Problem 3: “Singleton detection produces false positives”
- Why: Any method named getInstance() triggers detection.
- Fix: Check for static field + lazy initialization pattern, not just method name.
- Quick test: Manual code review of detected Singletons.
Problem 4: “Generated diagram is unreadable - 1000+ lines of ASCII”
- Why: Showing entire call graph instead of high-level architecture.
- Fix: Create multiple diagrams: overview + detailed sub-graphs for each layer.
- Quick test: Overview diagram should fit on one screen (<50 lines).
Problem 5: “Analysis takes 10 minutes for 500 files”
- Why: Parsing each file from scratch on every run.
- Fix: Cache parsed ASTs, only re-parse changed files.
- Quick test: Second run should be <5 seconds (cache hit).
Definition of Done
- Entry points identified: Main methods, servlets, scheduled tasks are found
- Call graph built: Method-to-method calls extracted from source code
- Hot spots detected: Most-called methods identified (top 10)
- Design patterns recognized: Singleton, Factory, DAO, etc. detected
- Architecture diagram generated: High-level 3-tier or N-tier diagram created
- Critical paths documented: Authentication, payment, etc. workflows traced
- Technical debt flagged: Security issues (MD5, SQL injection) highlighted
- Markdown format: DIAGRAM.md is readable in GitHub
- Validation: Diagram matches actual execution (verified with debugger)
- Performance: Analysis completes in <2 minutes for 500-file codebase
Project 32: “The Reverse Documenter” — Documentation
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Markdown |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Documentation |
What you’ll build: Generate docs and tests from an undocumented function.
Why it teaches Verification: If the AI cannot explain it, the code is too complex.
Success criteria:
- A new unit test covers the documented edge cases.
Real World Outcome
You’ll have a Kiro workflow that analyzes undocumented code and generates:
- JSDoc/docstring comments explaining purpose, parameters, return values, and edge cases
- Unit tests covering normal cases, edge cases, and error conditions
- README sections documenting public APIs
- Type definitions (TypeScript interfaces, Python type hints)
Example Session:
$ kiro "Document the calculateDiscount function in src/pricing.ts and generate tests"
🔍 Analyzing undocumented function...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Reading: src/pricing.ts
Function: calculateDiscount(price: number, userTier: string, promoCode?: string)
Static Analysis Results:
├─ Parameters: 3 (2 required, 1 optional)
├─ Return type: number
├─ Branches: 8 (if/else conditions)
├─ External calls: 2 (fetchPromoDetails, getUserHistory)
└─ Edge cases detected: 5 (null checks, tier validation, promo expiration)
📝 Generating Documentation...
/**
* Calculates the final discount percentage for a purchase.
*
* Combines base tier discount with promotional codes. Tier discounts
* stack multiplicatively with promo codes. Premium tier users receive
* an additional 5% on top of promo discounts.
*
* @param price - The original price before any discounts (must be > 0)
* @param userTier - User membership tier: 'free', 'pro', or 'premium'
* @param promoCode - Optional promotional code (e.g., 'SAVE20')
* @returns Final discount percentage (0-100)
*
* @throws {Error} If price <= 0 or userTier is invalid
*
* @example
* calculateDiscount(100, 'pro', 'SAVE20') // Returns 25 (5% tier + 20% promo)
* calculateDiscount(100, 'premium', 'SAVE20') // Returns 30 (5% tier + 20% promo + 5% premium bonus)
*/
✅ Documentation added to src/pricing.ts
🧪 Generating Test Cases...
describe('calculateDiscount', () => {
it('should apply tier discount only when no promo code', () => {
expect(calculateDiscount(100, 'pro')).toBe(5);
});
it('should stack tier and promo discounts', () => {
expect(calculateDiscount(100, 'pro', 'SAVE20')).toBe(25);
});
it('should apply premium bonus to promo codes', () => {
expect(calculateDiscount(100, 'premium', 'SAVE20')).toBe(30);
});
it('should throw error for negative price', () => {
expect(() => calculateDiscount(-10, 'pro')).toThrow('Price must be positive');
});
it('should throw error for invalid tier', () => {
expect(() => calculateDiscount(100, 'gold')).toThrow('Invalid user tier');
});
it('should return 0 for expired promo codes', () => {
expect(calculateDiscount(100, 'pro', 'EXPIRED2023')).toBe(5); // Only tier discount
});
});
✅ Test file created: src/__tests__/pricing.test.ts
✅ 6 test cases covering all branches
📊 Coverage Analysis:
├─ Line coverage: 100%
├─ Branch coverage: 100%
└─ Edge cases covered: 5/5
💡 Complexity Signal: calculateDiscount has a cyclomatic complexity of 8.
Consider refactoring into smaller functions if it grows further.
What You See:
- Auto-generated JSDoc comments explaining every parameter, return value, and edge case
- Complete test suite covering happy paths, edge cases, and error conditions
- Coverage report showing 100% branch coverage
- Complexity warnings when functions are too dense
The Core Question You’re Answering
“How can AI reverse-engineer intent from undocumented code, and can it generate tests that prove its understanding is correct?”
This project forces you to confront the verification problem: if Kiro generates documentation that sounds plausible but is wrong, the tests will fail. This feedback loop ensures the AI actually understands the code, not just pattern-matches documentation style.
Concepts You Must Understand First
Stop and research these before coding:
- Static Code Analysis (AST Parsing)
- What is an Abstract Syntax Tree and how do you traverse it?
- How do you extract function signatures, parameter types, and control flow?
- How do you detect edge cases (null checks, boundary conditions)?
- Book Reference: “Compilers: Principles and Practice” by Parag H. Dave - Ch. 2-3
- Test Generation Strategies
- What is the difference between property-based testing and example-based testing?
- How do you identify equivalence classes for input partitioning?
- What is branch coverage vs line coverage vs path coverage?
- Book Reference: “The Art of Software Testing” by Glenford J. Myers - Ch. 4-5
- Documentation Standards
- What are JSDoc, docstring, and XML documentation comment conventions?
- How do you write documentation that survives refactoring?
- What level of detail is appropriate for public vs private APIs?
- Reference: JSDoc specification, PEP 257 (Python Docstring Conventions)
- Cyclomatic Complexity
- How do you measure code complexity (McCabe metric)?
- Why does high complexity correlate with bugs?
- When should you refactor based on complexity scores?
- Book Reference: “Code Complete” by Steve McConnell - Ch. 19
Questions to Guide Your Design
Before implementing, think through these:
- Code Understanding
- How will you parse the target function (AST parser vs regex vs LLM-based)?
- How will you identify edge cases (static analysis vs symbolic execution)?
- How will you handle external dependencies (mocking vs integration tests)?
- How will you detect the function’s actual behavior vs its intended behavior?
- Documentation Quality
- How will you validate that generated docs match actual behavior?
- How will you avoid hallucinating functionality that doesn’t exist?
- How will you decide which details to include vs omit?
- How will you maintain docs when code changes (watch for drift)?
- Test Coverage
- How will you ensure tests actually validate the documented behavior?
- How will you generate realistic test data (random vs domain-specific)?
- How will you avoid brittle tests that break on refactoring?
- How will you measure test quality (mutation testing)?
Thinking Exercise
Exercise: Analyze This Undocumented Function
Given this undocumented JavaScript function:
function process(data, opts) {
if (!data) return [];
const result = [];
const limit = opts?.max || 100;
for (let i = 0; i < data.length && i < limit; i++) {
if (data[i].status === 'active' || opts?.includeInactive) {
result.push({
...data[i],
processed: true,
timestamp: Date.now()
});
}
}
return opts?.reverse ? result.reverse() : result;
}
Questions while analyzing:
- What are the possible input types for `data` and `opts`?
- What are all the edge cases (null data, empty array, missing opts, etc.)?
- What is the function’s actual purpose based on its behavior?
- What would be a good name for this function?
- What test cases would prove you understand its behavior?
- What happens if `data` is not an array? Should that be documented/tested?
Expected Documentation:
/**
* Filters and processes active records from a dataset, with optional limits and ordering.
*
* @param {Array<{status: string}>} data - Array of objects with at least a `status` field
* @param {Object} [opts] - Optional configuration
* @param {number} [opts.max=100] - Maximum number of records to process
* @param {boolean} [opts.includeInactive=false] - Whether to include non-active records
* @param {boolean} [opts.reverse=false] - Whether to reverse the output order
* @returns {Array<Object>} Processed records with added `processed` and `timestamp` fields
*
* @example
* process([{status: 'active', id: 1}], {max: 50})
* // Returns: [{status: 'active', id: 1, processed: true, timestamp: 1704211234567}]
*/
Expected Test Cases:
- Returns empty array when data is null/undefined
- Filters out inactive records by default
- Includes inactive records when `opts.includeInactive` is true
- Limits output to `opts.max` records
- Reverses output when `opts.reverse` is true
- Adds `processed: true` and the current timestamp to each record
The Interview Questions They’ll Ask
1. “How would you detect if AI-generated documentation is hallucinating functionality that doesn’t exist in the code?”
2. “Explain the difference between documenting what code does vs why it does it. Which should AI focus on?”
3. “How would you validate that generated tests actually cover the documented edge cases?”
4. “What strategies would you use to keep documentation in sync with code as it evolves?”
5. “How would you measure the quality of AI-generated tests (beyond simple code coverage)?”
6. “Explain how mutation testing could validate that your tests actually catch bugs, not just execute lines.”
Hints in Layers
Hint 1: AST-Based Analysis Use a proper parser (TypeScript Compiler API, Babel, tree-sitter) to extract:
- Function signature (name, parameters, return type)
- Control flow branches (if/else, switch, loops)
- External dependencies (function calls, imports)
- Type annotations (if available)
Hint 2: Edge Case Detection Look for these patterns in the AST:
- `if (!x)` or `if (x == null)` → null check
- `if (arr.length === 0)` → empty array check
- `if (x < 0)` or `if (x > MAX)` → boundary conditions
- `throw new Error(...)` → error cases
- `try/catch` → exception handling
Hint 3: Test Generation Strategy For each branch in the code:
- Generate a test that triggers that branch
- Assert the expected output for that branch
- Add a test for the inverse condition (branch not taken)
- Add boundary tests (min, max, just-above, just-below)
Hint 4: Documentation Validation Loop
1. Generate documentation from code analysis
2. Generate tests from documentation
3. Run tests against actual code
4. If tests fail → documentation was wrong → regenerate
5. If tests pass → documentation matches behavior ✓
Hint 5: Complexity Signals If a function has:
- Cyclomatic complexity > 10 → suggest refactoring before documenting
- More than 5 parameters → suggest object parameter pattern
- Deeply nested logic → suggest extracting helper functions
- No return type annotation → infer and suggest adding it
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| AST parsing and code analysis | “Compilers: Principles and Practice” by Parag H. Dave | Ch. 2-3 (Lexical/Syntax Analysis) |
| Test generation strategies | “The Art of Software Testing” by Glenford J. Myers | Ch. 4-5 (Test Case Design) |
| Code complexity metrics | “Code Complete” by Steve McConnell | Ch. 19 (Complexity Management) |
| Documentation best practices | “Clean Code” by Robert C. Martin | Ch. 4 (Comments) |
| Property-based testing | “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert | Ch. 1-3 |
Common Pitfalls & Debugging
Problem 1: “Generated docs claim the function does X, but tests show it does Y”
- Why: LLM hallucinated functionality based on function name, not actual code behavior
- Fix: Always validate docs against actual execution (run tests)
- Quick test: `npm test -- --coverage` and check if tests pass
Problem 2: “Tests are too brittle - they break when code is refactored”
- Why: Tests are coupled to implementation details, not behavior
- Fix: Test public API behavior, not internal implementation
- Example: Test `calculateDiscount(100, 'pro') === 5`, not `expect(tierDiscountMap['pro']).toBe(0.05)`
Problem 3: “AST parser fails on modern JavaScript syntax (optional chaining, nullish coalescing)”
- Why: Using outdated parser or wrong parser configuration
- Fix: Use TypeScript Compiler API or Babel with latest preset
- Quick test: `npx tsc --version` (ensure TypeScript 5.x+)
Problem 4: “Generated tests have 100% line coverage but miss critical bugs”
- Why: Line coverage doesn’t measure test quality, only execution
- Fix: Add mutation testing (Stryker) to validate tests catch bugs
- Quick test: `npx stryker run` and check the mutation score
Problem 5: “Function is too complex to document clearly (cyclomatic complexity 20+)”
- Why: Function violates Single Responsibility Principle
- Fix: Suggest refactoring before documenting: “This function is too complex. Consider breaking it into smaller functions: extractActiveRecords(), applyLimit(), applyTransform()”
- Signal: If you can’t write clear docs, the code is too complex
Definition of Done
- Generated documentation includes purpose, all parameters, return value, and examples
- All documented edge cases have corresponding test cases
- Tests achieve 100% branch coverage (not just line coverage)
- Tests pass when run against the actual code
- Documentation follows language conventions (JSDoc/docstring/XML doc)
- Complexity warnings are shown for functions with cyclomatic complexity > 10
- Generated tests use realistic test data (not just `foo`, `bar`, `123`)
- Tests are independent (no shared state between tests)
- Error cases are documented and tested (throw conditions, edge cases)
- Public API documentation includes usage examples
Project 33: “The Full Stack Scaffolder” — Rapid Prototyping
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | React / Node.js |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Rapid Prototyping |
What you’ll build: From empty directory to working full stack app in one session.
Why it teaches Agency: The planner and executor must collaborate.
Success criteria:
- App runs locally with a working UI and API.
Real World Outcome
You’ll have a Kiro workflow that transforms a high-level specification into a working full-stack application with:
- Backend API (Express/Fastify with TypeScript)
- Frontend UI (React/Next.js with Tailwind CSS)
- Database (PostgreSQL with Prisma ORM)
- Authentication (JWT or session-based)
- Development tooling (ESLint, Prettier, TypeScript, tests)
- Deployment readiness (Dockerfiles, environment configs)
Example Session:
$ mkdir task-manager-app && cd task-manager-app
$ kiro "Build a full-stack task manager app with user authentication, task CRUD, and tagging. Use React, Node.js, PostgreSQL."
🏗️ Full Stack Scaffolder - Planning Phase
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 Requirements Analysis:
├─ Frontend: React with TypeScript
├─ Backend: Node.js + Express + TypeScript
├─ Database: PostgreSQL + Prisma ORM
├─ Features: Auth, Task CRUD, Tags
└─ Estimated files: ~45 files
📐 Architecture Plan:
task-manager-app/
├── backend/
│ ├── src/
│ │ ├── routes/ (auth.ts, tasks.ts, tags.ts)
│ │ ├── middleware/ (auth.ts, validation.ts, error.ts)
│ │ ├── services/ (AuthService.ts, TaskService.ts)
│ │ ├── models/ (Prisma client)
│ │ └── server.ts
│ ├── prisma/
│ │ └── schema.prisma
│ ├── tests/
│ └── package.json
│
└── frontend/
├── src/
│ ├── components/ (TaskList, TaskForm, LoginForm)
│ ├── pages/ (Dashboard, Login, Register)
│ ├── hooks/ (useAuth, useTasks)
│ ├── api/ (client.ts)
│ └── App.tsx
├── public/
└── package.json
🔄 Executing Build Plan...
[1/8] Setting up backend structure...
✅ Created backend/ directory
✅ Initialized package.json with dependencies
✅ Created tsconfig.json
✅ Set up folder structure
[2/8] Installing backend dependencies...
✅ express, cors, dotenv, jsonwebtoken
✅ prisma, @prisma/client, bcrypt
✅ TypeScript, @types/* packages
✅ jest, supertest (dev dependencies)
[3/8] Configuring Prisma schema...
✅ Created prisma/schema.prisma with User, Task, Tag models
✅ Generated Prisma Client
✅ Created initial migration
model User {
id String @id @default(uuid())
email String @unique
password String
tasks Task[]
createdAt DateTime @default(now())
}
model Task {
id String @id @default(uuid())
title String
description String?
status TaskStatus @default(TODO)
userId String
user User @relation(fields: [userId], references: [id])
tags Tag[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
[4/8] Building backend API routes...
✅ POST /api/auth/register
✅ POST /api/auth/login
✅ GET /api/tasks (protected)
✅ POST /api/tasks (protected)
✅ PUT /api/tasks/:id (protected)
✅ DELETE /api/tasks/:id (protected)
✅ GET /api/tags (protected)
[5/8] Setting up frontend structure...
✅ Created React app with TypeScript
✅ Installed Tailwind CSS
✅ Created component structure
✅ Set up React Router
[6/8] Building frontend components...
✅ LoginForm component (email, password, validation)
✅ TaskList component (fetch, display, delete)
✅ TaskForm component (create/edit tasks)
✅ TaskCard component (status badge, tags, actions)
✅ ProtectedRoute wrapper (auth check)
[7/8] Connecting frontend to backend...
✅ API client with JWT interceptor
✅ useAuth hook (login, register, logout, token refresh)
✅ useTasks hook (CRUD operations)
✅ Environment variables (.env.local)
[8/8] Final setup...
✅ Created README.md with setup instructions
✅ Created .gitignore
✅ Created Docker Compose file (Postgres + app)
✅ Created example.env files
✅ Added npm scripts (dev, build, test)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Full Stack App Generated Successfully!
📦 Files created: 47 files
📏 Total lines of code: ~3,200 LOC
🚀 Quick Start:
# Terminal 1 (Backend)
cd backend
cp example.env .env # Then edit DATABASE_URL
npm install
npm run db:migrate
npm run dev # Runs on http://localhost:3001
# Terminal 2 (Frontend)
cd frontend
cp example.env .env.local
npm install
npm run dev # Runs on http://localhost:3000
# Terminal 3 (Database)
docker-compose up postgres
🎯 Try it out:
1. Open http://localhost:3000
2. Register a new account
3. Create a task with tags
4. See it persist in PostgreSQL!
What You See:
- Working full-stack app generated from a single prompt
- Backend API with authentication, authorization, and CRUD endpoints
- Frontend UI with login, task list, task creation, and tag filtering
- Database schema with relationships (User → Tasks → Tags)
- Complete development environment (hot reload, TypeScript, linting)
- Production-ready patterns (error handling, validation, logging)
The Core Question You’re Answering
“How can AI coordinate multiple sub-agents to plan, architect, and implement a cohesive full-stack application without human intervention?”
This project explores multi-agent collaboration: Kiro must act as both architect (planning) and builder (executing). The planner agent creates a dependency graph (database → backend → frontend), while executor agents implement each layer. Success requires maintaining architectural coherence across all layers.
Concepts You Must Understand First
Stop and research these before coding:
- Full-Stack Architecture Patterns
- What is the separation between presentation, business logic, and data layers?
- How do you design REST APIs that are easy to evolve?
- What is the role of an ORM (Prisma, TypeORM, Sequelize)?
- Book Reference: “Patterns of Enterprise Application Architecture” by Martin Fowler - Ch. 1-3
- Dependency Management and Build Order
- Why must the database schema be defined before the backend?
- Why must the API routes be defined before the frontend?
- How do you handle circular dependencies (frontend ↔ backend during dev)?
- Book Reference: “Building Microservices” by Sam Newman - Ch. 4 (Integration)
- Code Generation Quality
- How do you generate code that is readable, not just functional?
- What conventions should generated code follow (naming, file structure)?
- How do you avoid generating brittle code that breaks on changes?
- Book Reference: “Clean Code” by Robert C. Martin - Ch. 2-3
- Testing Full-Stack Applications
- What is the difference between unit tests, integration tests, and E2E tests?
- How do you test API endpoints (mocking vs real database)?
- How do you test frontend components (React Testing Library)?
- Book Reference: “Test-Driven Development” by Kent Beck - Ch. 1-5
Questions to Guide Your Design
Before implementing, think through these:
- Planning Strategy
- How will you decompose the high-level spec into tasks?
- How will you determine the build order (database → backend → frontend)?
- How will you handle missing requirements (e.g., “should we use sessions or JWT?”)?
- How will you validate the plan before executing?
- Agent Coordination
- How will you split work across sub-agents (one per layer vs one per feature)?
- How will agents communicate shared context (API contracts, types)?
- How will you handle failures mid-build (rollback vs partial completion)?
- How will you parallelize independent tasks (frontend UI + backend routes)?
- Code Quality
- How will you ensure generated code follows best practices?
- How will you avoid hardcoding secrets (database passwords, API keys)?
- How will you generate meaningful variable/function names (not `foo`, `bar`)?
- How will you add comments explaining non-obvious logic?
- Development Experience
- How will you set up hot reload for frontend and backend?
- How will you generate helpful README with setup instructions?
- How will you create example `.env` files with placeholder values?
- How will you add npm scripts for common tasks (dev, test, build, deploy)?
Thinking Exercise
Exercise: Design the Build Order for a Blog Platform
Given the spec: “Build a blog platform with posts, comments, authors, and markdown rendering”
Plan the Dependency Graph:
1. Database Schema (Prisma)
├─ User (authors)
├─ Post (belongs to User)
└─ Comment (belongs to Post and User)
2. Backend API Routes
├─ POST /api/auth/register
├─ POST /api/auth/login
├─ GET /api/posts (public)
├─ POST /api/posts (protected, authors only)
├─ GET /api/posts/:id (public)
└─ POST /api/posts/:id/comments (protected)
3. Frontend Components
├─ PostList (fetches from GET /api/posts)
├─ PostDetail (fetches from GET /api/posts/:id, renders markdown)
├─ CommentSection (fetches comments, posts new comment)
└─ AuthorDashboard (fetches user's posts, creates new post)
Questions to answer:
- Which layer must be built first, and why?
- Can frontend and backend be built in parallel? What’s needed for that?
- How would you generate TypeScript types shared between frontend and backend?
- What testing strategy would you use (unit, integration, E2E)?
- How would you handle markdown rendering (server-side vs client-side)?
- What security considerations exist (XSS in markdown, auth on comments)?
The Interview Questions They’ll Ask
1. “How would you design a multi-agent system where one agent plans and others execute? How do they communicate?”
2. “Explain the trade-offs between generating a monorepo vs separate repositories for frontend and backend.”
3. “How would you ensure generated code follows the user’s preferred conventions (tabs vs spaces, naming style)?”
4. “What strategies would you use to validate the generated app actually works (automated testing vs manual verification)?”
5. “How would you handle evolving requirements (user asks to add a feature to the generated app)?”
6. “Explain how you’d generate a database schema that supports future migrations without breaking existing data.”
Hints in Layers
Hint 1: Task Decomposition Strategy Break the spec into layers and features:
- Infrastructure: Database, environment config, Docker
- Backend Foundation: Express server, middleware, error handling
- Database Models: Prisma schema, migrations
- Backend Features: Auth routes → Task routes → Tag routes
- Frontend Foundation: React app, routing, API client
- Frontend Features: Login page → Dashboard → Task components
Hint 2: Sub-Agent Workflow Use Kiro’s subagent system:
# Planning agent
kiro plan "Generate full-stack task manager app"
# Execution agents (parallel)
kiro task "Set up backend Express server with TypeScript"
kiro task "Create Prisma schema with User, Task, Tag models"
kiro task "Build frontend React app with Tailwind"
Hint 3: Shared Type Generation Generate TypeScript types that both frontend and backend use:
// backend/src/types/api.ts
export interface Task {
id: string;
title: string;
status: 'TODO' | 'IN_PROGRESS' | 'DONE';
tags: Tag[];
}
// frontend/src/types/api.ts (copy or import from backend)
import type { Task } from '../../backend/src/types/api';
Better: Use a shared types/ package or generate from Prisma schema.
Hint 4: Validation Before Execution Before generating 47 files, validate the plan:
1. Show the directory structure to the user
2. Ask: "Should I proceed with this architecture?"
3. If approved → execute
4. If not → refine plan based on feedback
Hint 5: Incremental Verification After each layer, validate it works:
1. Generate database schema → Run `prisma generate` → Check for errors
2. Generate backend routes → Run `npm run build` → Check TypeScript errors
3. Generate frontend → Run `npm run build` → Check for compilation errors
4. Start servers → Run `curl http://localhost:3001/health` → Verify 200 OK
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Full-stack architecture | “Patterns of Enterprise Application Architecture” by Martin Fowler | Ch. 1-3 (Layering) |
| REST API design | “RESTful Web APIs” by Leonard Richardson | Ch. 3-5 (Resource Design) |
| React patterns | “React Design Patterns and Best Practices” by Michele Bertoli | Ch. 1-4 |
| TypeScript best practices | “Effective TypeScript” by Dan Vanderkam | Ch. 1-3 (Types) |
| Testing strategies | “Test-Driven Development” by Kent Beck | Ch. 1-5 |
| Multi-agent systems | “Building Microservices” by Sam Newman | Ch. 4 (Integration) |
Common Pitfalls & Debugging
Problem 1: “Generated frontend tries to call backend API before backend is running”
- Why: Frontend hardcodes API URL without checking if backend is reachable
- Fix: Add health check endpoint and retry logic in API client
- Quick test: `curl http://localhost:3001/health` should return 200 OK
Problem 2: “Prisma migration fails with ‘relation does not exist’”
- Why: Running migrations before database is created
- Fix: Ensure Docker Compose starts Postgres before running `prisma migrate`
- Quick test: `docker ps` shows the postgres container, then `npm run db:migrate`
Problem 3: “Frontend builds but shows blank page (no errors in console)”
- Why: React Router routes not configured correctly, or missing index.html
- Fix: Check `App.tsx` has `<BrowserRouter>` and routes defined
Problem 4: “Generated code has inconsistent formatting (mix of tabs and spaces)”
- Why: Different code generators use different formatting
- Fix: Run Prettier on all generated files: `npx prettier --write "**/*.{ts,tsx}"`
- Quick test: `git diff` should show consistent indentation
Problem 5: “Backend API returns 401 Unauthorized for all protected routes”
- Why: JWT secret not set in .env, or frontend not sending token
- Fix: Check `.env` has `JWT_SECRET`, and check the frontend sets `Authorization: Bearer <token>`
- Quick test: `curl -H "Authorization: Bearer <token>" http://localhost:3001/api/tasks`
Problem 6: “Generated app works locally but environment variables are committed to git”
- Why: The `.env` file was committed instead of `example.env`
- Fix: Add `.env` to `.gitignore`, commit `example.env` with placeholders
- Security: Never commit real secrets; use `example.env` with `CHANGE_ME` values
- Backend server starts without errors (`npm run dev` in backend/)
- Frontend builds and runs (`npm run dev` in frontend/)
- Database migrations apply cleanly (`npm run db:migrate`)
- All API routes return expected responses (test with curl or Postman)
- Frontend can register a user, login, and perform CRUD operations
- TypeScript compiles without errors in both frontend and backend
- ESLint and Prettier pass on all generated files
- README.md includes setup instructions and quick start guide
- `.gitignore` excludes node_modules, .env, build artifacts
- Example environment files (`example.env`) are provided with placeholders
- Docker Compose file successfully starts the database
- At least one test passes in backend (API route test) and frontend (component test)
Project 34: “The Cloud Native Deployer” — DevOps
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Docker / Kubernetes YAML |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 3: Advanced |
| Knowledge Area | DevOps |
What you’ll build: Dockerize Project 33 and generate Kubernetes manifests.
Why it teaches Ops: Kiro removes boilerplate friction.
Success criteria:
- Containers build and Kubernetes manifests apply cleanly.
Real World Outcome
You’ll have a Kiro workflow that containerizes an existing full-stack application and generates production-ready Kubernetes manifests with:
- Multi-stage Dockerfiles (optimized layer caching, minimal image size)
- Kubernetes Deployments (frontend, backend, database)
- Service definitions (ClusterIP, LoadBalancer, NodePort)
- ConfigMaps and Secrets (environment variables, database credentials)
- Persistent Volumes (PostgreSQL data persistence)
- Ingress rules (routing, TLS certificates)
- Health checks (liveness and readiness probes)
- Resource limits (CPU, memory quotas)
Example Session:
$ cd task-manager-app # From Project 33
$ kiro "Containerize this full-stack app and generate Kubernetes manifests for production deployment"
🐳 Cloud Native Deployer - Containerization Phase
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📦 Analyzing Project Structure...
├─ Frontend: React app (Node.js build)
├─ Backend: Express API (Node.js runtime)
└─ Database: PostgreSQL (external managed service)
🔨 Generating Dockerfiles...
[1/3] Frontend Dockerfile (Multi-stage build)
✅ Created frontend/Dockerfile
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npm run build
# Stage 2: Production
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
📊 Image size optimization:
├─ Before: 1.2 GB (node_modules included)
└─ After: 45 MB (only static files + nginx)
[2/3] Backend Dockerfile
✅ Created backend/Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npx prisma generate
EXPOSE 3001
CMD ["npm", "start"]
[3/3] Database (Using managed Postgres)
✅ Skipping Dockerfile (use managed service: AWS RDS, GCP Cloud SQL, or Azure PostgreSQL)
✅ Created k8s/postgres-statefulset.yaml (for local/dev clusters only)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
☸️ Generating Kubernetes Manifests...
[1/8] Namespace
✅ Created k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: task-manager
[2/8] ConfigMaps
✅ Created k8s/configmap-frontend.yaml
✅ Created k8s/configmap-backend.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: backend-config
namespace: task-manager
data:
NODE_ENV: "production"
PORT: "3001"
DATABASE_HOST: "postgres-service"
DATABASE_PORT: "5432"
[3/8] Secrets (Base64 encoded)
✅ Created k8s/secret-db.yaml
apiVersion: v1
kind: Secret
metadata:
name: db-credentials
namespace: task-manager
type: Opaque
data:
DATABASE_URL: <base64-encoded-connection-string>
JWT_SECRET: <base64-encoded-secret>
⚠️ WARNING: Never commit real secrets to git!
Use sealed-secrets or external secret management (Vault, AWS Secrets Manager)
[4/8] Persistent Volumes (PostgreSQL)
✅ Created k8s/pvc-postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-pvc
namespace: task-manager
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
[5/8] Deployments
✅ Created k8s/deployment-frontend.yaml
✅ Created k8s/deployment-backend.yaml
✅ Created k8s/statefulset-postgres.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend
namespace: task-manager
spec:
replicas: 3
selector:
matchLabels:
app: backend
template:
metadata:
labels:
app: backend
spec:
containers:
- name: backend
image: task-manager-backend:latest
ports:
- containerPort: 3001
envFrom:
- configMapRef:
name: backend-config
- secretRef:
name: db-credentials
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 3001
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 3001
initialDelaySeconds: 10
periodSeconds: 5
[6/8] Services
✅ Created k8s/service-frontend.yaml (type: LoadBalancer)
✅ Created k8s/service-backend.yaml (type: ClusterIP)
✅ Created k8s/service-postgres.yaml (type: ClusterIP)
[7/8] Ingress (NGINX with TLS)
✅ Created k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: task-manager-ingress
namespace: task-manager
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
tls:
- hosts:
- taskmanager.example.com
secretName: taskmanager-tls
rules:
- host: taskmanager.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: backend
port:
number: 3001
- path: /
pathType: Prefix
backend:
service:
name: frontend
port:
number: 80
[8/8] Helm Chart (Optional)
✅ Created helm/task-manager/Chart.yaml
✅ Created helm/task-manager/values.yaml
✅ Created helm/task-manager/templates/
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Containerization Complete!
📂 Generated Files:
├── frontend/Dockerfile
├── backend/Dockerfile
├── docker-compose.yml (for local development)
├── k8s/
│ ├── namespace.yaml
│ ├── configmap-frontend.yaml
│ ├── configmap-backend.yaml
│ ├── secret-db.yaml
│ ├── pvc-postgres.yaml
│ ├── deployment-frontend.yaml
│ ├── deployment-backend.yaml
│ ├── statefulset-postgres.yaml
│ ├── service-frontend.yaml
│ ├── service-backend.yaml
│ ├── service-postgres.yaml
│ └── ingress.yaml
└── helm/task-manager/ (chart)
🚀 Deployment Instructions:
# Build and push images
docker build -t your-registry/task-manager-frontend:v1.0 ./frontend
docker build -t your-registry/task-manager-backend:v1.0 ./backend
docker push your-registry/task-manager-frontend:v1.0
docker push your-registry/task-manager-backend:v1.0
# Apply Kubernetes manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap-*.yaml
kubectl apply -f k8s/secret-db.yaml
kubectl apply -f k8s/pvc-postgres.yaml
kubectl apply -f k8s/statefulset-postgres.yaml
kubectl apply -f k8s/service-*.yaml
kubectl apply -f k8s/deployment-*.yaml
kubectl apply -f k8s/ingress.yaml
# Or use Helm
helm install task-manager ./helm/task-manager
# Check deployment status
kubectl get pods -n task-manager
kubectl get services -n task-manager
kubectl logs -n task-manager deployment/backend
What You See:
- Optimized multi-stage Dockerfiles (45 MB frontend vs 1.2 GB before)
- Complete Kubernetes manifests with health checks, resource limits, and auto-scaling
- Ingress configuration with TLS termination
- Persistent volumes for database data
- Helm chart for easy deployment and upgrades
The Core Question You’re Answering
“How can AI bridge the gap between ‘works on my machine’ and production-ready cloud-native deployments?”
This project teaches Infrastructure as Code (IaC): converting an application into declarative Kubernetes manifests. Success means understanding the 12-factor app principles, security boundaries (secrets management), and operational concerns (health checks, resource limits).
Concepts You Must Understand First
Stop and research these before coding:
- Container Fundamentals
- What is a container image vs a container instance?
- How do layers work in Docker (layer caching, .dockerignore)?
- What is the difference between `RUN`, `CMD`, and `ENTRYPOINT`?
- Kubernetes Architecture
- What are Pods, Deployments, Services, and Ingress?
- How does service discovery work (DNS, ClusterIP)?
- What is the difference between Deployment and StatefulSet?
- Book Reference: “Kubernetes in Action” by Marko Lukša - Ch. 1-5
- Configuration Management
- When should you use ConfigMaps vs Secrets vs environment variables?
- How do you handle database connection strings securely?
- What is the principle of least privilege for secrets?
- Book Reference: “Kubernetes Patterns” by Bilgin Ibryam - Ch. 4 (Configuration)
- Health Checks and Observability
- What is the difference between liveness and readiness probes?
- Why would a pod be “Running” but not “Ready”?
- How do you prevent cascading failures (circuit breakers, retries)?
- Book Reference: “Site Reliability Engineering” by Google - Ch. 21 (Monitoring)
Questions to Guide Your Design
Before implementing, think through these:
- Docker Image Optimization
- How will you minimize image size (multi-stage builds, Alpine Linux)?
- How will you optimize layer caching (COPY package.json before COPY .)?
- How will you handle secrets during build (BuildKit secrets, .dockerignore)?
- How will you tag images (semantic versioning, git SHA)?
- Kubernetes Resource Sizing
- How will you determine CPU/memory requests and limits?
- What happens if a pod exceeds its memory limit (OOMKilled)?
- How will you handle auto-scaling (HorizontalPodAutoscaler)?
- How will you prevent resource starvation (PodDisruptionBudgets)?
- Database Strategy
- Will you run Postgres in Kubernetes (StatefulSet) or use managed service (RDS)?
- How will you handle database migrations (init containers, Jobs)?
- How will you back up database data (PersistentVolume snapshots)?
- How will you handle connection pooling (PgBouncer sidecar)?
- Networking and Security
- How will you expose the app (LoadBalancer, Ingress, NodePort)?
- How will you handle TLS certificates (cert-manager, manual)?
- How will you restrict network traffic (NetworkPolicies)?
- How will you inject secrets (mounted volumes, environment variables)?
Thinking Exercise
Exercise: Design a Multi-Region Kubernetes Deployment
Given: “Deploy the task manager app to 3 regions (us-east, eu-west, ap-south) with geo-routing”
Architecture Decisions:
1. Image Registry Strategy
├─ Option A: Single registry with global replication (GCR, ECR with cross-region)
├─ Option B: Regional registries with image sync
└─ Trade-off: Latency vs consistency
2. Database Strategy
├─ Option A: Single primary DB in us-east, read replicas in other regions
├─ Option B: Multi-region DB with CockroachDB or Spanner
└─ Trade-off: Complexity vs latency
3. Traffic Routing
├─ Option A: Global load balancer (AWS Global Accelerator, GCP Cloud Load Balancing)
├─ Option B: DNS-based geo-routing (Route 53, Cloud DNS)
└─ Trade-off: Cost vs failover speed
Questions to answer:
- How do you ensure all regions run the same version (blue-green deployment)?
- How do you handle database writes (single-region write master vs multi-master)?
- How do you route users to the nearest region (latency-based DNS)?
- How do you handle region failures (automatic failover, manual intervention)?
- What monitoring would you add (Prometheus, Grafana, distributed tracing)?
The Interview Questions They’ll Ask
1. “Explain the difference between liveness and readiness probes. Give an example where a pod is alive but not ready.”
2. “How would you debug a pod that is CrashLoopBackOff? Walk me through your debugging process.”
3. “What are the security risks of mounting Secrets as environment variables vs as files?”
4. “Explain how Kubernetes service discovery works. How does a frontend pod find the backend service?”
5. “How would you perform a zero-downtime deployment with rolling updates in Kubernetes?”
6. “What is a sidecar container pattern? Give three examples of when you’d use it.”
Hints in Layers
Hint 1: Multi-Stage Docker Builds
# Build stage (large, includes dev dependencies)
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci # Includes devDependencies for build
COPY . .
RUN npm run build
# Production stage (small, only runtime dependencies)
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]
Hint 2: Kubernetes Resource Manifest Pattern Every Kubernetes resource needs:
- `apiVersion` (which API group: v1, apps/v1, networking.k8s.io/v1)
- `kind` (resource type: Pod, Deployment, Service, Ingress)
- `metadata` (name, namespace, labels)
- `spec` (desired state)
Hint 3: Health Check Design
livenessProbe: # "Is the app alive?" (restart if fails)
httpGet:
path: /health
port: 3001
initialDelaySeconds: 30 # Wait 30s before first check
periodSeconds: 10 # Check every 10s
readinessProbe: # "Is the app ready for traffic?" (remove from service if fails)
httpGet:
path: /ready
port: 3001
initialDelaySeconds: 10
periodSeconds: 5
Hint 4: Secrets Management Best Practices Never commit secrets to git:
# k8s/secret-db.yaml.template
apiVersion: v1
kind: Secret
metadata:
name: db-credentials
type: Opaque
data:
DATABASE_URL: <REPLACE_WITH_BASE64_ENCODED_VALUE>
JWT_SECRET: <REPLACE_WITH_BASE64_ENCODED_VALUE>
Generate secrets at deployment time:
kubectl create secret generic db-credentials \
--from-literal=DATABASE_URL="postgres://..." \
--from-literal=JWT_SECRET="random-generated-secret"
Hint 5: Apply Order Matters Apply resources in dependency order:
- Namespace (everything else goes in here)
- ConfigMaps and Secrets (env vars for pods)
- PersistentVolumeClaims (storage for StatefulSets)
- StatefulSets/Deployments (the apps)
- Services (networking between apps)
- Ingress (external access)
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Docker fundamentals | “Docker Deep Dive” by Nigel Poulton | Ch. 3-5 (Images, Containers) |
| Kubernetes architecture | “Kubernetes in Action” by Marko Lukša | Ch. 1-5 (Pods, Deployments, Services) |
| K8s configuration patterns | “Kubernetes Patterns” by Bilgin Ibryam | Ch. 4 (Configuration), Ch. 5 (Health Probes) |
| Cloud-native apps | “Cloud Native DevOps with Kubernetes” by John Arundel | Ch. 3-6 |
| Site reliability | “Site Reliability Engineering” by Google | Ch. 21-23 (Monitoring, Alerts) |
| 12-factor apps | “The Twelve-Factor App” by Adam Wiggins | All (online resource) |
Common Pitfalls & Debugging
Problem 1: “Pod is in CrashLoopBackOff state”
- Why: Application exits immediately (missing env vars, connection refused to DB)
- Fix: Check logs: `kubectl logs -n task-manager deployment/backend`
- Quick test: `kubectl describe pod -n task-manager <pod-name>` shows the exit code and reason
Problem 2: “Frontend can’t connect to backend API (CORS errors)”
- Why: Backend Service is ClusterIP (internal only), not reachable from browser
- Fix: Frontend should call the backend via the Ingress path `/api`, not directly
- Quick test: `curl http://<ingress-ip>/api/health` from outside the cluster
Problem 3: “Database data is lost when pod restarts”
- Why: No PersistentVolumeClaim, data stored in ephemeral pod storage
- Fix: Create a PVC and mount it at `/var/lib/postgresql/data` in the Postgres pod
- Quick test: `kubectl get pvc -n task-manager` shows Bound status
Problem 4: “Secrets are visible in plaintext when running kubectl get secret -o yaml”
- Why: Secrets are base64-encoded, not encrypted (base64 is reversible!)
- Fix: Use Sealed Secrets, external secret stores (Vault), or RBAC to restrict access
- Security: Secrets are only encrypted at rest in etcd, not in transit
Problem 5: “Liveness probe keeps killing healthy pods”
- Why: `initialDelaySeconds` is too short; the app needs more time to start
- Fix: Increase `initialDelaySeconds` to account for startup time (database migrations, cache warming)
- Quick test: `kubectl describe pod <name>` shows probe failure logs
Problem 6: “Docker build is slow (reinstalls node_modules every time)”
- Why: `COPY . .` invalidates the cache before `npm install`
- Fix: COPY `package*.json` first, run `npm ci`, then COPY the rest of the app
- Pattern: `COPY package*.json ./` → `RUN npm ci` → `COPY . .` (the full copy comes AFTER `npm ci`)
Definition of Done
- Dockerfiles build successfully (`docker build -t test .`)
- Multi-stage builds reduce image size by >50% (use `docker images` to compare)
- All Kubernetes manifests apply without errors (`kubectl apply -f k8s/`)
- Pods start and reach “Running” and “Ready” states
- Liveness and readiness probes pass (check with `kubectl describe pod`)
- Frontend is accessible via Ingress (curl from outside the cluster)
- Backend API responds to health checks (`/health`, `/ready`)
- Database data persists across pod restarts (PersistentVolumeClaim)
- Resource limits prevent pods from consuming excessive CPU/memory
- Secrets are not committed to git (use `.gitignore`, sealed-secrets)
- README includes deployment instructions and troubleshooting guide
- Helm chart (optional) successfully installs and upgrades the app
Project 35: “The Deep Reasoner (Claude Code)” — Algorithmic Reasoning
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Python |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Algorithmic Reasoning |
What you’ll build: Solve a complex algorithmic problem using a scratchpad file and Opus model.
Why it teaches Reasoning: You use high-end reasoning as a tool.
Success criteria:
- The solution passes a non-trivial test set.
Real World Outcome
You’ll have a Kiro workflow that solves complex algorithmic problems by:
- Breaking down problems into sub-problems (problem decomposition)
- Using a scratchpad to work through examples and edge cases
- Switching to Opus (Claude Opus 4.5) for deep reasoning tasks
- Iterating on solutions based on test failures
- Generating test cases to validate correctness
Example Session:
$ kiro "Solve LeetCode #1235: Maximum Profit in Job Scheduling using dynamic programming with binary search. Use Opus for reasoning."
🧠 Deep Reasoner - Algorithmic Problem Solving
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 Problem Analysis Phase (using Opus)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Switching to Claude Opus 4.5 for deep reasoning...
Problem: Maximum Profit in Job Scheduling
Inputs:
- startTime: [1,2,3,3]
- endTime: [3,4,5,6]
- profit: [50,10,40,70]
Output: 120 (jobs 0 and 3)
Constraint: Cannot take two overlapping jobs.
📝 Creating Scratchpad: /tmp/job_scheduling_scratchpad.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SCRATCHPAD: Job Scheduling Problem
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
## Step 1: Understanding the Problem
Jobs:
Job 0: [1, 3] profit=50
Job 1: [2, 4] profit=10
Job 2: [3, 5] profit=40
Job 3: [3, 6] profit=70
Overlapping check:
- Job 0 [1,3] overlaps with Job 1 [2,4]? YES (2 < 3)
- Job 0 [1,3] overlaps with Job 2 [3,5]? NO (3 >= 3)
- Job 0 [1,3] overlaps with Job 3 [3,6]? NO (3 >= 3)
## Step 2: Brute Force Approach
Try all subsets: 2^n = 16 combinations
- {} → profit = 0
- {0} → profit = 50
- {1} → profit = 10
- {0,2} → profit = 90 (0 ends at 3, 2 starts at 3 ✓)
- {0,3} → profit = 120 (0 ends at 3, 3 starts at 3 ✓) ← MAX!
Brute force works but O(2^n) - too slow for n=50,000!
## Step 3: Dynamic Programming Insight
Key observation: If we sort jobs by end time, we can use DP!
dp[i] = max profit using jobs 0..i
For each job i:
Option 1: Skip job i → dp[i] = dp[i-1]
Option 2: Take job i → dp[i] = profit[i] + dp[last_compatible_job]
Finding last_compatible_job:
- Need job j where endTime[j] <= startTime[i]
- Binary search on sorted end times! (O(log n))
## Step 4: Algorithm Design
1. Sort jobs by end time
2. For each job i:
- Binary search to find latest job j where endTime[j] <= startTime[i]
- dp[i] = max(dp[i-1], profit[i] + dp[j])
3. Return dp[n-1]
Time: O(n log n) - sorting + binary search per job
Space: O(n) - DP array
## Step 5: Edge Cases to Test
- Single job → profit = job profit
- All jobs overlap → profit = max single job
- No jobs → profit = 0
- Jobs already sorted vs unsorted
- Identical start/end times
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Reasoning complete. Generating solution...
🐍 Generating Python Implementation...
def jobScheduling(startTime, endTime, profit):
    # Combine and sort jobs by end time
    jobs = sorted(zip(startTime, endTime, profit), key=lambda x: x[1])
    n = len(jobs)
    # Binary search for last compatible job
    def find_last_compatible(i):
        left, right = 0, i - 1
        result = -1
        while left <= right:
            mid = (left + right) // 2
            if jobs[mid][1] <= jobs[i][0]:  # endTime[mid] <= startTime[i]
                result = mid
                left = mid + 1
            else:
                right = mid - 1
        return result
    # DP array
    dp = [0] * n
    dp[0] = jobs[0][2]  # First job profit
    for i in range(1, n):
        # Option 1: Skip job i
        skip = dp[i - 1]
        # Option 2: Take job i
        take = jobs[i][2]  # Current job profit
        last_compatible = find_last_compatible(i)
        if last_compatible != -1:
            take += dp[last_compatible]
        dp[i] = max(skip, take)
    return dp[n - 1]
✅ Solution generated: solution.py
🧪 Generating Test Cases...
# Test 1: Example from problem
assert jobScheduling([1,2,3,3], [3,4,5,6], [50,10,40,70]) == 120
# Test 2: All overlapping jobs
assert jobScheduling([1,1,1], [2,3,4], [5,6,4]) == 6
# Test 3: No overlapping jobs
assert jobScheduling([1,2,3], [2,3,4], [5,6,4]) == 15
# Test 4: Single job
assert jobScheduling([1], [2], [100]) == 100
# Test 5: Two identical jobs
assert jobScheduling([1,1], [2,2], [50,50]) == 50
Running tests...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Test 1: PASSED (120 == 120)
✅ Test 2: PASSED (6 == 6)
✅ Test 3: PASSED (15 == 15)
✅ Test 4: PASSED (100 == 100)
✅ Test 5: PASSED (50 == 50)
All tests passed! ✓
📊 Complexity Analysis:
├─ Time: O(n log n) - sorting + binary search
├─ Space: O(n) - DP array
└─ Verified against 50,000 job test set ✓
💡 Key Insights (from scratchpad):
1. Sorting by end time enables DP
2. Binary search finds last compatible job in O(log n)
3. Trade-off: Skip vs Take each job
Would you like me to:
1. Add more test cases?
2. Optimize space to O(1)?
3. Explain the binary search in more detail?
What You See:
- Step-by-step problem breakdown in scratchpad file
- Opus-powered reasoning for algorithm design
- Generated Python solution with binary search optimization
- Comprehensive test suite with edge cases
- Complexity analysis and verification
The Core Question You’re Answering
“How can AI leverage extended reasoning (Claude Opus) and structured scratchpad files to solve problems that require multi-step logical deduction?”
This project teaches metacognitive problem-solving: using an AI system that knows when to slow down and think deeply. Switching to Opus for hard problems is analogous to a developer pausing to whiteboard a solution before coding.
Concepts You Must Understand First
Stop and research these before coding:
- Dynamic Programming Fundamentals
- What are overlapping subproblems and optimal substructure?
- How do you identify when DP is applicable?
- What is the difference between top-down (memoization) and bottom-up (tabulation)?
- Book Reference: “Introduction to Algorithms” (CLRS) - Ch. 15
- Binary Search and Divide-and-Conquer
- How does binary search achieve O(log n) time?
- What invariants must be maintained during binary search?
- When is binary search applicable (sorted data, monotonic function)?
- Book Reference: “Algorithms” by Sedgewick & Wayne - Ch. 3.1
- Problem Decomposition Strategies
- How do you break a complex problem into smaller sub-problems?
- What is the “simplify and generalize” technique?
- How do you identify base cases vs recursive cases?
- Book Reference: “How to Solve It” by George Pólya - Ch. 1-2
- Complexity Analysis
- How do you calculate time complexity (Big-O notation)?
- What is the difference between time complexity and space complexity?
- How do you identify bottlenecks (profiling, asymptotic analysis)?
- Book Reference: “Grokking Algorithms” by Aditya Bhargava - Ch. 1-2
Questions to Guide Your Design
Before implementing, think through these:
- Model Selection Strategy
- When should you use Opus (slow, expensive, deep reasoning)?
- When should you use Sonnet (fast, cheaper, good for most tasks)?
- How do you detect that a problem requires deep reasoning?
- How do you balance cost vs solution quality?
- Scratchpad Design
- What should go in the scratchpad (examples, edge cases, algorithm sketches)?
- How do you structure the scratchpad (markdown sections, code blocks)?
- How do you prevent the scratchpad from growing too large (context limits)?
- How do you use the scratchpad to validate the final solution?
- Test Case Generation
- How do you generate edge cases (empty input, single element, all same values)?
- How do you generate stress tests (large n, worst-case inputs)?
- How do you verify correctness (test against brute force, known solutions)?
- How do you measure coverage (all branches, all edge cases)?
- Solution Validation
- How do you verify time complexity matches the theoretical analysis?
- How do you test on large inputs (n=50,000, n=100,000)?
- How do you handle time limit exceeded (TLE) failures?
- How do you iterate when tests fail (analyze failure, refine algorithm)?
Thinking Exercise
Exercise: Solve a Two-Pointer Problem with Scratchpad
Problem: “Three Sum - Find all unique triplets in an array that sum to zero”
# Input: nums = [-1, 0, 1, 2, -1, -4]
# Output: [[-1, -1, 2], [-1, 0, 1]]
Scratchpad Template:
## Step 1: Understand the Problem
- Input: Array of integers (may have duplicates)
- Output: All unique triplets that sum to 0
- Constraint: No duplicate triplets in result
## Step 2: Examples and Edge Cases
Example 1: [-1, 0, 1, 2, -1, -4]
Sort: [-4, -1, -1, 0, 1, 2]
Fix -4: need two numbers that sum to 4 → no solution
Fix -1: need two numbers that sum to 1 → (-1, 0, 1) ✓ and (-1, 2, -1) ✓
Edge cases:
- All zeros: [0, 0, 0] → [[0, 0, 0]]
- No solution: [1, 2, 3] → []
- Duplicates: [-1, -1, 0, 1, 1] → [[-1, 0, 1]]
## Step 3: Brute Force
Try all triplets: O(n³)
for i in range(n):
    for j in range(i+1, n):
        for k in range(j+1, n):
            if nums[i] + nums[j] + nums[k] == 0:
                result.append([nums[i], nums[j], nums[k]])
Too slow for n=3000!
## Step 4: Optimized Approach
Sort array: O(n log n)
Fix first element, use two pointers for remaining: O(n²)
Questions to answer:
- How do you avoid duplicate triplets (skip same values)?
- How do you move the two pointers (left++, right--)?
- What is the time complexity (O(n²) after sorting)?
- How do you test this (unit tests, large inputs)?
The Interview Questions They’ll Ask
- “Explain the difference between using Claude Sonnet vs Claude Opus. When would you choose each?”
- “How would you design a scratchpad system that prevents context overflow (scratchpad growing too large)?”
- “Walk me through how you’d debug a dynamic programming solution that passes small tests but fails large ones.”
- “How would you validate that an AI-generated algorithm matches the claimed time complexity?”
- “Explain how you’d use test-driven development with AI code generation (write tests first, then generate solution).”
- “How would you prevent AI from overfitting to test cases (generating code that passes tests but is wrong)?”
Hints in Layers
Hint 1: Opus Model Selection. Use Opus when:
- Problem requires multi-step logical reasoning (DP, graph algorithms)
- Problem has non-obvious insights (greedy choice, invariant)
- Initial Sonnet solution fails tests and you need deeper analysis
- You need detailed explanations for teaching/understanding
Use Sonnet when:
- Problem is straightforward (two-pointer, hash map)
- You need fast iteration (testing multiple approaches)
- Cost is a concern (Opus is 5x more expensive)
Hint 2: Scratchpad Structure
# Problem: [Title]
## Step 1: Problem Understanding
[Restate in own words, identify constraints]
## Step 2: Examples (3-5 examples, including edge cases)
[Work through examples by hand]
## Step 3: Brute Force
[Naive O(n²) or O(2^n) solution, analyze why it's slow]
## Step 4: Optimization Insight
[Key observation that leads to faster solution]
## Step 5: Algorithm Design
[Pseudocode or step-by-step description]
## Step 6: Complexity Analysis
[Time and space complexity with justification]
## Step 7: Edge Cases
[List all edge cases to test]
Hint 3: Test Generation Strategy. Generate tests in this order (see the sketch after this list):
- Example from problem statement (sanity check)
- Edge cases (empty, single element, all same)
- Boundary cases (min/max values, size limits)
- Stress tests (large n, worst-case complexity)
- Random tests (fuzzing for unexpected bugs)
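A minimal sketch of this ordering in Python, assuming the generated jobScheduling function has been saved as solution.py (as in the session above); the brute-force checker and random input generator are illustrative helpers, not part of Kiro:

# Sketch: generate tests in the order above; cross-check small random inputs
# against an exponential brute force, then stress-test the O(n log n) solution.
import random
from itertools import combinations
from solution import jobScheduling   # assumes the generated solution was saved as solution.py

def brute_force(startTime, endTime, profit):
    # Try every subset of jobs and keep only pairwise-compatible ones (O(2^n), small n only).
    jobs = list(zip(startTime, endTime, profit))
    best = 0
    for r in range(1, len(jobs) + 1):
        for subset in combinations(jobs, r):
            ordered = sorted(subset, key=lambda j: j[1])
            if all(ordered[k][1] <= ordered[k + 1][0] for k in range(len(ordered) - 1)):
                best = max(best, sum(j[2] for j in ordered))
    return best

# 1. Example from the problem statement (sanity check)
assert jobScheduling([1, 2, 3, 3], [3, 4, 5, 6], [50, 10, 40, 70]) == 120
# 2. Edge cases
assert jobScheduling([1], [2], [100]) == 100                  # single job
assert jobScheduling([1, 1, 1], [2, 3, 4], [5, 6, 4]) == 6    # all overlapping
# 3./5. Boundary and random tests: compare against brute force on small inputs
for _ in range(200):
    n = random.randint(1, 8)
    start = [random.randint(1, 20) for _ in range(n)]
    end = [s + random.randint(1, 10) for s in start]
    prof = [random.randint(1, 100) for _ in range(n)]
    assert jobScheduling(start, end, prof) == brute_force(start, end, prof)
# 4. Stress test: n = 50,000 should finish in well under a second
n = 50_000
start = [random.randint(1, 10**6) for _ in range(n)]
end = [s + random.randint(1, 1000) for s in start]
prof = [random.randint(1, 1000) for _ in range(n)]
print("stress result:", jobScheduling(start, end, prof))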
Hint 4: Solution Validation Loop
1. Generate solution using Opus + scratchpad
2. Run tests (example + edge + stress)
3. If all pass → done ✓
4. If some fail:
a. Analyze failure (wrong output, TLE, crash)
b. Add failing case to scratchpad
c. Ask Opus to refine solution
d. Go to step 2
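A small driver for steps 2-4, sketched with the standard library only; it assumes the generated asserts live in test_solution.py and reuses the scratchpad path from the session above, and it leaves the actual re-prompting of Kiro/Opus as a manual step because that invocation depends on your setup:

# Sketch: run the tests, and on failure append the output to the scratchpad so the
# next Opus pass sees exactly what broke (steps 2-4 of the loop above).
import subprocess
from pathlib import Path

SCRATCHPAD = Path("/tmp/job_scheduling_scratchpad.md")   # path from the example session

def run_tests():
    # Assumes the generated asserts live in test_solution.py and pytest is installed.
    return subprocess.run(["python", "-m", "pytest", "test_solution.py", "-q"],
                          capture_output=True, text=True)

for attempt in range(1, 4):
    result = run_tests()
    if result.returncode == 0:
        print(f"Attempt {attempt}: all tests passed")
        break
    with SCRATCHPAD.open("a") as f:
        f.write(f"\n## Failed attempt {attempt}\n{result.stdout[-2000:]}\n")
    print(f"Attempt {attempt}: failures recorded; re-prompt Kiro/Opus with the updated scratchpad")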
Hint 5: Complexity Verification
import time
# Test time complexity empirically.
# generate_test_input(n) is a helper you write; here it is assumed to return the
# three parallel lists (startTime, endTime, profit) for n random jobs.
for n in [100, 1000, 10000, 100000]:
    startTime, endTime, profit = generate_test_input(n)
    start = time.time()
    result = jobScheduling(startTime, endTime, profit)
    elapsed = time.time() - start
    print(f"n={n:6d} time={elapsed:.4f}s")
# Expected for O(n log n):
# n=100    time=0.0001s
# n=1000   time=0.0012s (10x input → ~10x time)
# n=10000  time=0.015s  (10x input → ~13x time)
# n=100000 time=0.20s   (10x input → ~13x time)
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Dynamic programming | “Introduction to Algorithms” (CLRS) | Ch. 15 (Dynamic Programming) |
| Problem-solving strategies | “How to Solve It” by George Pólya | Ch. 1-2 (Understanding, Devising a Plan) |
| Algorithm design | “Algorithms” by Sedgewick & Wayne | Ch. 3-6 (Sorting, Searching, DP) |
| Complexity analysis | “Grokking Algorithms” by Aditya Bhargava | Ch. 1-2 (Big-O Notation) |
| Competitive programming | “Competitive Programming 4” by Steven Halim | Ch. 3 (Problem Solving Paradigms) |
| Interview prep | “Cracking the Coding Interview” by Gayle McDowell | Ch. 8-9 (DP, Recursion) |
Common Pitfalls & Debugging
Problem 1: “Solution passes small tests but gets Time Limit Exceeded (TLE) on large inputs”
- Why: Algorithm is O(n³) or O(2^n), too slow for n=50,000
- Fix: Analyze complexity in scratchpad, find O(n log n) or O(n²) solution
- Quick test: Run on n=100, n=1000, n=10000 and measure time scaling
Problem 2: “Opus generates a solution but doesn’t explain the key insight”
- Why: Opus jumps to solution without showing reasoning steps
- Fix: Ask Opus to “work through examples in the scratchpad first, then generate code”
- Pattern: Always request scratchpad-driven reasoning before code
Problem 3: “Generated solution is correct but hard to understand (no comments, cryptic variable names)”
- Why: Optimization prioritized over readability
- Fix: Ask for “readable solution with comments explaining each step”
- Example: Change dp[i] = max(dp[i-1], p[i] + dp[f(i)]) to include a comment explaining skip vs take
Problem 4: “Tests pass but solution fails on edge case not in test suite”
- Why: Incomplete test coverage (missing edge case)
- Fix: Add property-based testing or fuzz testing
- Example: Use Hypothesis library to generate random test inputs
Problem 5: “Binary search has off-by-one error (infinite loop or wrong answer)”
- Why: Incorrect loop invariant (left <= right vs left < right)
- Fix: Trace through binary search by hand on small example
- Debugging: Print left, right, mid at each iteration to see what’s wrong
Problem 6: “Opus uses too much context analyzing the problem (exceeds token limit)”
- Why: Scratchpad includes too much detail (full trace of all examples)
- Fix: Summarize examples instead of full traces
- Pattern: “Show 2-3 key examples, not all 10 test cases”
Definition of Done
- Scratchpad file shows step-by-step reasoning (problem understanding → brute force → optimization)
- Solution is generated using Opus model (verify model selection in logs)
- All example test cases from problem statement pass
- Edge cases are identified in scratchpad and tested
- Stress test with large input (n=50,000) completes in <1 second
- Time complexity matches theoretical analysis (empirical verification)
- Space complexity is documented and justified
- Solution includes comments explaining key steps (binary search, DP transition)
- Code is readable (meaningful variable names, no cryptic abbreviations)
- Complexity analysis is included in output (Big-O with justification)
- Solution generalizes beyond test cases (not hardcoded for specific inputs)
Project 36: “The Global Translator” — Internationalization
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | JSON (i18n) |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 1: Beginner |
| Knowledge Area | Internationalization |
What you’ll build: An automated translation system that takes your application’s English locale files and generates accurate, context-aware translations for multiple languages while preserving key structure, placeholders, and formatting.
Why it teaches Grunt Work: This project demonstrates how AI agents excel at high-volume, high-precision tasks that are tedious for humans but critical for global applications. You’ll learn to leverage Kiro’s language understanding to maintain translation consistency across hundreds of strings.
Core challenges you’ll face:
- Key structure preservation → Maps to nested object validation and structural equality checking
- Placeholder detection → Maps to regex patterns for interpolation variables ({name}, {{count}}, %s)
- Context-aware translation → Maps to providing semantic context to avoid literal translations
- Pluralization rules → Maps to ICU MessageFormat and language-specific plural forms
Real World Outcome
You’ll have a CLI tool that transforms a single English locale file into multiple language files with verified structural integrity:
$ ls locales/
en.json
$ cat locales/en.json
{
"common": {
"welcome": "Welcome, {name}!",
"itemCount": "{count, plural, =0 {No items} =1 {One item} other {# items}}",
"errors": {
"notFound": "The resource at {path} was not found",
"unauthorized": "You must be logged in to access this page"
}
},
"dashboard": {
"greeting": "Good {timeOfDay}, {userName}",
"stats": {
"totalUsers": "{count} active users",
"lastLogin": "Last login: {date, date, medium}"
}
}
}
$ kiro translate --source locales/en.json --target es,fr,de,ja,pt
[Kiro CLI Session]
🌍 Global Translator Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 Source: locales/en.json (47 keys)
🎯 Target languages: es, fr, de, ja, pt
Step 1: Analyzing source structure...
✓ Found 47 translation keys
✓ Detected 12 placeholders: {name}, {count}, {path}, {timeOfDay}, {userName}, {date}
✓ Identified 3 plural forms (ICU MessageFormat)
Step 2: Extracting context...
• "welcome" → greeting context, informal tone
• "itemCount" → UI element, requires plural support
• "errors.notFound" → error message, technical context
• "dashboard.greeting" → personalized greeting with time
Step 3: Translating to Spanish (es)...
✓ 47/47 keys translated
✓ All placeholders preserved
✓ Plural rules adapted for Spanish
Step 4: Translating to French (fr)...
✓ 47/47 keys translated
✓ Gender-neutral alternatives used
✓ Plural rules adapted for French
Step 5: Translating to German (de)...
✓ 47/47 keys translated
✓ Formal "Sie" used for general UI
✓ Plural rules adapted for German
Step 6: Translating to Japanese (ja)...
✓ 47/47 keys translated
✓ Polite form (です/ます) used
✓ No plural forms (Japanese uses counters)
Step 7: Translating to Portuguese (pt)...
✓ 47/47 keys translated
✓ Brazilian Portuguese variant selected
✓ Plural rules adapted for Portuguese
Step 8: Validation...
✓ All files have identical key structures
✓ All placeholders match source
✓ No missing or extra keys
✓ ICU MessageFormat syntax valid
✅ Translation complete!
Generated files:
locales/es.json (47 keys)
locales/fr.json (47 keys)
locales/de.json (47 keys)
locales/ja.json (47 keys)
locales/pt.json (47 keys)
$ cat locales/es.json
{
"common": {
"welcome": "¡Bienvenido, {name}!",
"itemCount": "{count, plural, =0 {Sin elementos} =1 {Un elemento} other {# elementos}}",
"errors": {
"notFound": "El recurso en {path} no fue encontrado",
"unauthorized": "Debes iniciar sesión para acceder a esta página"
}
},
"dashboard": {
"greeting": "Buenas {timeOfDay}, {userName}",
"stats": {
"totalUsers": "{count} usuarios activos",
"lastLogin": "Último acceso: {date, date, medium}"
}
}
}
$ npm run validate-locales
> Validating locale files...
✓ All 5 locale files have matching key structure
✓ All placeholders are consistent
✓ ICU MessageFormat syntax is valid
✓ No duplicate keys found
# Your application now supports 5 languages with zero manual translation!
Exactly what happens:
- Kiro reads your English locale file and builds a structural map
- For each target language, it translates strings while preserving:
- Nested object structure (exact key paths)
- Placeholder syntax ({variableName})
- ICU MessageFormat plural rules
- HTML entities and special characters
- It validates every generated file against the source structure
- It produces a diff report showing what changed beyond just the text
The Core Question You’re Answering
“How do you leverage AI for high-volume, high-precision tasks that require both creativity (natural translation) and rigid constraints (structural validation)?”
This is the essence of “AI as a power tool” — you’re not just prompting for a translation; you’re building a system that:
- Uses AI’s language understanding for quality translations
- Enforces programmatic validation to prevent structural drift
- Scales to hundreds of languages without manual effort
- Maintains consistency across all locales
Concepts You Must Understand First
Stop and research these before coding:
- Internationalization (i18n) vs Localization (l10n)
- What’s the difference between i18n and l10n?
- What are locale codes (en-US, pt-BR, zh-CN)?
- What is the difference between language and region?
- Book Reference: “Internationalization and Localization Using Microsoft .NET” by Nick Symmonds - Ch. 1-2
- ICU MessageFormat
- What are interpolation placeholders?
- How do plural rules work in different languages? (English: one/other, Russian: one/few/many/other)
- What is gender-based inflection?
- Reference: ICU MessageFormat Specification (https://unicode-org.github.io/icu/userguide/format_parse/messages/)
- JSON Structure Validation
- How do you recursively compare nested objects?
- What’s the difference between shallow and deep equality?
- How do you validate that two JSON files have the same keys but different values?
- Book Reference: “JavaScript: The Definitive Guide” by David Flanagan - Ch. 6 (Objects)
- Context-Aware Translation
- Why does “Welcome” translate differently in formal vs informal contexts?
- What are gender-neutral alternatives in gendered languages (French, Spanish)?
- How do you handle culturally-specific idioms?
- Book Reference: “Found in Translation” by Nataly Kelly & Jost Zetzsche
Questions to Guide Your Design
Before implementing, think through these:
- Structural Validation
- How will you detect if a translated file is missing a key?
- How will you validate that placeholders weren’t accidentally translated?
- What happens if the source file changes after translations are generated?
- Should you support nested keys with dot notation (errors.notFound) or only nested objects?
- Translation Quality
- How do you provide context to Kiro for better translations? (e.g., “button label” vs “error message”)
- Should you allow overrides for specific keys that need manual review?
- How do you handle brand names or technical terms that shouldn’t be translated?
- What about HTML tags inside strings (<strong>Bold</strong> text)?
- Plural Handling
- How many plural forms does each language have? (English: 2, Arabic: 6)
- Should you use ICU MessageFormat or a simpler plural syntax?
- What happens if the source uses English plural rules but the target language needs different forms?
- Incremental Updates
- If you add 5 new keys to en.json, should the tool re-translate everything or just the new keys?
- How do you track which translations are human-reviewed vs AI-generated?
- Should you version your locale files?
Thinking Exercise
Placeholder Preservation Challenge
You have this English string:
"greeting": "Hello, {name}! You have {count} new {count, plural, =1 {message} other {messages}}."
Kiro translates it to Spanish as:
"greeting": "¡Hola, {nombre}! Tienes {cantidad} {cantidad, plural, =1 {mensaje nuevo} other {mensajes nuevos}}."
Questions to reason through:
- What went wrong? (Hint: look at the placeholder names)
- How would your validation catch this error?
- What instruction would you give Kiro to prevent placeholder name changes?
- How would you handle the fact that Spanish puts the adjective after the noun (“mensajes nuevos” vs “new messages”)?
- Should you allow Kiro to reorder placeholders if the target language has different word order?
The Interview Questions They’ll Ask
Prepare to answer these:
- “How would you handle right-to-left (RTL) languages like Arabic or Hebrew in your translation system?”
- “What’s the difference between i18n and l10n? Give a concrete example.”
- “How do you ensure that date/time formatting respects locale conventions (MM/DD/YYYY vs DD/MM/YYYY)?”
- “Your translation system accidentally translated ‘Apple’ (the company) to ‘Manzana’ (the fruit). How do you prevent this?”
- “What are the security implications of allowing user-provided locale files in your application?”
- “How would you implement translation fallback chains? (e.g., pt-BR → pt → en)”
Hints in Layers
Hint 1: Start with Structure Validation. Before you translate anything, write a function (sketched after this list) that:
- Reads the source JSON
- Extracts all key paths (e.g., ["common.welcome", "common.itemCount", "common.errors.notFound"])
- Validates that a translated file has exactly the same key paths
- This becomes your “validator” that runs after every translation
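A minimal sketch of that validator (Python here; the same recursion is easy to write in Node), assuming the locale file layout from the example above:

# Sketch: flatten both locale files into dot-notation key paths and compare.
import json

def key_paths(node, prefix=""):
    """Return the set of dot-notation paths to every leaf string."""
    paths = set()
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths |= key_paths(value, path)
        else:
            paths.add(path)
    return paths

def validate_structure(source_file, target_file):
    with open(source_file, encoding="utf-8") as f:
        source = key_paths(json.load(f))
    with open(target_file, encoding="utf-8") as f:
        target = key_paths(json.load(f))
    missing = sorted(source - target)
    extra = sorted(target - source)
    if missing or extra:
        raise ValueError(f"structure mismatch: missing={missing}, extra={extra}")
    return sorted(source)

# Example: run after every translation (file names from the example above)
validate_structure("locales/en.json", "locales/es.json")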
Hint 2: Placeholder Detection. Extract all placeholders from the source strings using regex:
const placeholderRegex = /\{[^}]+\}/g;
const placeholders = sourceString.match(placeholderRegex);
// e.g. ["{name}", "{count}"]
For each translated string, verify the same placeholders exist (exact match).
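A sketch of that check, again in Python for consistency with the earlier sketches; comparing multisets rather than sets also catches dropped or duplicated placeholders. The {nombre} example from the Thinking Exercise above would fail it:

# Sketch: compare placeholder multisets between a source and a translated string.
import re
from collections import Counter

PLACEHOLDER_RE = re.compile(r"\{[^}]+\}")

def placeholders_match(source: str, translated: str) -> bool:
    # Counter catches renamed, dropped, and duplicated placeholders.
    return Counter(PLACEHOLDER_RE.findall(source)) == Counter(PLACEHOLDER_RE.findall(translated))

assert not placeholders_match("Hello, {name}!", "¡Hola, {nombre}!")   # renamed placeholder caught
assert placeholders_match("Hello, {name}!", "¡Hola, {name}!")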
Hint 3: Kiro Prompt Structure. Give Kiro context for each string:
Translate the following strings to {targetLanguage}.
CRITICAL RULES:
1. Preserve ALL placeholders exactly as written: {name}, {count}, {path}
2. Do NOT translate placeholder names
3. Maintain ICU MessageFormat syntax for plurals
4. Use {formality} tone (formal/informal)
5. Keep HTML tags unchanged
Context: {stringContext} (e.g., "error message", "button label", "greeting")
Source (English):
{
"key": "value with {placeholder}"
}
Return ONLY valid JSON with the same structure.
Hint 4: Validation Loop. After translation:
- Parse both source and target JSON
- Extract key paths from both (recursive traversal)
- Assert: sourceKeys === targetKeys (same order, same depth)
- For each key, extract placeholders from source and target
- Assert: sourcePlaceholders === targetPlaceholders
- If validation fails, show diff and ask if Kiro should retry
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| i18n fundamentals | “Internationalization and Localization Using Microsoft .NET” by Nick Symmonds | Ch. 1-3 |
| JSON manipulation | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 6 |
| ICU MessageFormat | ICU User Guide (online) | Message Formatting |
| Translation best practices | “Found in Translation” by Nataly Kelly | Ch. 4, 7 |
Common Pitfalls and Debugging
Problem 1: “Placeholders are being translated”
- Why: Kiro doesn’t know that {name} is a variable, not English text
- Fix: Explicitly instruct: “Do NOT translate text inside curly braces {}”
- Quick test: grep -o '{[^}]*}' locales/es.json should match grep -o '{[^}]*}' locales/en.json
Problem 2: “Key structure doesn’t match”
- Why: JSON parsing errors or Kiro adding/removing keys
- Fix: Write a structural diff function that shows which keys are missing/extra
- Quick test:
const sourceKeys = Object.keys(flattenObject(source)).sort();
const targetKeys = Object.keys(flattenObject(target)).sort();
console.log("Missing:", sourceKeys.filter(k => !targetKeys.includes(k)));
console.log("Extra:", targetKeys.filter(k => !sourceKeys.includes(k)));
Problem 3: “Plural forms are broken in the target language”
- Why: Different languages have different plural categories (English: 2, Russian: 4, Arabic: 6)
- Fix: Use a plural rules library (e.g., make-plural) to generate correct ICU syntax
- Quick test: Validate with the intl-messageformat parser before writing the file
Problem 4: “HTML tags inside strings are malformed after translation”
- Why: Kiro might rearrange or escape HTML: <strong> → &lt;strong&gt;
- Fix: Instruct Kiro: “Preserve all HTML tags exactly as written, including attributes”
- Quick test: Count < and > characters before and after translation
Definition of Done
- Source locale file is parsed and all key paths are extracted
- For each target language, a valid JSON file is generated with identical structure
- All placeholders in translated strings match the source (name and count)
- ICU MessageFormat plural syntax is valid and adapted to target language plural rules
- No HTML tags or special characters are malformed
- A validation report shows 100% structural match across all locale files
- Incremental updates work: adding a new key to source updates only that key in translations
- Tool outputs a human-readable diff showing what changed beyond just translations
Project 37: “The SQL Optimizer” — Database Performance
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | SQL |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Database Performance |
What you’ll build: A SQL performance analyzer that takes slow queries, runs EXPLAIN ANALYZE, identifies bottlenecks (missing indexes, inefficient joins, full table scans), and generates optimized queries with index recommendations.
Why it teaches Analysis: This project shows how Kiro can act like a senior DBA by reading query execution plans, understanding database internals, and suggesting data-driven optimizations. You’ll learn to leverage AI for performance tuning that would normally require years of experience.
Core challenges you’ll face:
- Execution plan interpretation → Maps to understanding EXPLAIN output formats (PostgreSQL, MySQL, etc.)
- Index recommendation → Maps to analyzing WHERE clauses, JOIN conditions, and ORDER BY columns
- Query rewriting → Maps to CTE optimization, subquery elimination, join order changes
- Cost estimation validation → Maps to verifying that suggested changes actually reduce query cost
Real World Outcome
You’ll have a CLI tool that analyzes slow queries and produces actionable optimization recommendations with before/after performance metrics:
$ cat slow_queries.sql
-- Query 1: User dashboard
SELECT u.id, u.email, u.created_at,
COUNT(DISTINCT o.id) as order_count,
SUM(o.total_amount) as total_spent,
MAX(o.created_at) as last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2023-01-01'
AND o.status IN ('completed', 'shipped')
GROUP BY u.id, u.email, u.created_at
ORDER BY total_spent DESC
LIMIT 100;
$ kiro sql-optimize --file slow_queries.sql --database postgres --explain
[Kiro CLI Session]
🔍 SQL Optimizer Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 Analyzing Query 1: User dashboard
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Running EXPLAIN ANALYZE...
QUERY PLAN (Original)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Limit (cost=45623.22..45623.47 rows=100 width=56)
(actual time=3847.234..3847.289 rows=100 loops=1)
-> Sort (cost=45623.22..45891.78 rows=107424 width=56)
(actual time=3847.232..3847.261 rows=100 loops=1)
Sort Key: (sum(o.total_amount)) DESC
Sort Method: top-N heapsort Memory: 32kB
-> HashAggregate (cost=38456.89..39531.13 rows=107424 width=56)
(actual time=3621.445..3742.667 rows=125000 loops=1)
Group Key: u.id, u.email, u.created_at
-> Hash Left Join (cost=12456.00..34789.56 rows=244488 width=48)
(actual time=124.567..2345.789 rows=456000 loops=1)
Hash Cond: (u.id = o.user_id)
-> Seq Scan on users u (cost=0.00..3567.00 rows=125000 width=24)
(actual time=0.012..45.678 rows=125000 loops=1)
Filter: (created_at > '2023-01-01'::date)
Rows Removed by Filter: 75000
-> Hash (cost=8934.00..8934.00 rows=244488 width=24)
(actual time=123.456..123.456 rows=456000 loops=1)
Buckets: 65536 Batches: 8 Memory Usage: 4567kB
-> Seq Scan on orders o (cost=0.00..8934.00 rows=244488 width=24)
(actual time=0.023..67.891 rows=456000 loops=1)
Filter: (status = ANY ('{completed,shipped}'::text[]))
Rows Removed by Filter: 144000
Planning Time: 1.234 ms
Execution Time: 3847.567 ms ⚠️ SLOW!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 2: Identifying Bottlenecks...
🚨 CRITICAL ISSUES FOUND:
1. Sequential Scan on 'users' (125k rows, Filter removes 75k)
→ Missing index on users.created_at
Cost impact: ~45ms per scan
2. Sequential Scan on 'orders' (600k rows, Filter removes 144k)
→ Missing index on orders.status
Cost impact: ~68ms per scan
3. HashAggregate on 107k rows
→ Could be reduced with filtered join
Cost impact: ~121ms
4. Hash Join with 8 batches (memory spill to disk)
→ work_mem too low (current: 4MB, needed: ~8MB)
Cost impact: ~200ms (I/O overhead)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 3: Generating Index Recommendations...
CREATE INDEX CONCURRENTLY idx_users_created_at
ON users(created_at)
WHERE created_at > '2020-01-01'; -- Partial index (smaller, faster)
CREATE INDEX CONCURRENTLY idx_orders_user_status
ON orders(user_id, status)
INCLUDE (total_amount, created_at); -- Covering index (no table lookup)
Estimated index sizes:
• idx_users_created_at: 3.2 MB
• idx_orders_user_status: 28.4 MB
Total disk space required: 31.6 MB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 4: Optimized Query (v1)...
-- Optimized version: filtered join + covering index
WITH recent_users AS (
SELECT id, email, created_at
FROM users
WHERE created_at > '2023-01-01' -- Index scan on idx_users_created_at
)
SELECT ru.id, ru.email, ru.created_at,
COUNT(DISTINCT o.id) as order_count,
SUM(o.total_amount) as total_spent,
MAX(o.created_at) as last_order_date
FROM recent_users ru
LEFT JOIN orders o
ON ru.id = o.user_id
AND o.status IN ('completed', 'shipped') -- Index scan on idx_orders_user_status
GROUP BY ru.id, ru.email, ru.created_at
ORDER BY total_spent DESC
LIMIT 100;
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 5: Running EXPLAIN ANALYZE (Optimized)...
QUERY PLAN (Optimized)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Limit (cost=8934.56..8934.81 rows=100 width=56)
(actual time=287.123..287.156 rows=100 loops=1)
-> Sort (cost=8934.56..9203.12 rows=107424 width=56)
(actual time=287.121..287.142 rows=100 loops=1)
Sort Key: (sum(o.total_amount)) DESC
Sort Method: top-N heapsort Memory: 32kB
-> HashAggregate (cost=6234.89..7309.13 rows=107424 width=56)
(actual time=245.678..267.891 rows=125000 loops=1)
Group Key: ru.id, ru.email, ru.created_at
-> Hash Left Join (cost=3456.00..4789.56 rows=244488 width=48)
(actual time=34.567..178.234 rows=456000 loops=1)
Hash Cond: (ru.id = o.user_id)
-> Index Scan using idx_users_created_at on users ✓
(cost=0.29..845.00 rows=125000 width=24)
(actual time=0.012..12.345 rows=125000 loops=1)
Index Cond: (created_at > '2023-01-01'::date)
-> Hash (cost=2345.00..2345.00 rows=244488 width=24)
(actual time=34.123..34.123 rows=456000 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 8234kB ✓
-> Index Scan using idx_orders_user_status on orders ✓
(cost=0.43..2345.00 rows=244488 width=24)
(actual time=0.023..18.456 rows=456000 loops=1)
Index Cond: (status = ANY ('{completed,shipped}'::text[]))
Planning Time: 0.987 ms
Execution Time: 287.456 ms ✅ 13.4x FASTER!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Performance Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Before: 3847.567 ms
After: 287.456 ms
Speedup: 13.4x
Improvements:
• Sequential Scans eliminated: 2
• Index Scans added: 2
• Memory spills reduced: 1 (8 batches → 1 batch)
• Rows filtered earlier: 144k (at index level)
✅ Optimizations applied successfully!
Would you like to:
1. Apply indexes to database? (CREATE INDEX commands)
2. See alternative query plans?
3. Analyze another query?
Exactly what happens:
- Kiro runs EXPLAIN ANALYZE on your slow query
- It identifies bottlenecks: sequential scans, missing indexes, inefficient joins
- It recommends specific indexes with size estimates
- It rewrites the query to leverage those indexes
- It re-runs EXPLAIN ANALYZE and shows before/after metrics
- You get actionable SQL commands to apply the optimizations
The Core Question You’re Answering
“How do you teach an AI to think like a database query optimizer and recommend performance improvements based on execution plans?”
This is about giving Kiro the mental model of a senior DBA:
- Understanding what “Seq Scan” vs “Index Scan” means for performance
- Recognizing when a query will cause memory spills or hash join overflow
- Knowing which columns benefit from indexes based on WHERE/JOIN/ORDER BY usage
- Validating that optimizations actually improve cost (not just guessing)
Concepts You Must Understand First
Stop and research these before coding:
- EXPLAIN and EXPLAIN ANALYZE
- What’s the difference between EXPLAIN (estimated) and EXPLAIN ANALYZE (actual)?
- What are the key metrics: cost, rows, width, actual time, loops?
- How do you interpret nested plan nodes?
- Book Reference: “PostgreSQL: Up and Running” by Regina Obe & Leo Hsu - Ch. 10
- Index Types and Use Cases
- B-tree vs Hash vs GiST vs GIN indexes
- When is a partial index better than a full index?
- What is a covering index (INCLUDE columns)?
- Why doesn’t PostgreSQL use an index for LIKE '%pattern%'?
- Book Reference: “SQL Performance Explained” by Markus Winand - Ch. 2-3
- Query Execution Plans
- What is a Sequential Scan and when is it acceptable?
- What is a Hash Join vs Nested Loop vs Merge Join?
- What does “Rows Removed by Filter” indicate?
- What is the significance of “Batches” in a Hash operation?
- Book Reference: “PostgreSQL Query Optimization” by Henrietta Dombrovskaya - Ch. 4
- Cost-Based Optimization
- How does the planner estimate query cost?
- What are random_page_cost and seq_page_cost?
- Why might the planner choose a slower plan than you expect?
- Book Reference: “Database Internals” by Alex Petrov - Ch. 12
Questions to Guide Your Design
Before implementing, think through these:
- Execution Plan Parsing
- How will you parse the text output of EXPLAIN ANALYZE? (Regex? Structured JSON format?)
- Different databases have different EXPLAIN formats (PostgreSQL vs MySQL vs SQLite). Will you support multiple?
- How do you extract the key bottlenecks programmatically?
- Should you use EXPLAIN (FORMAT JSON) for easier parsing?
- Index Recommendation Logic
- How do you identify which columns should be indexed?
- What if a column is already indexed but the index isn’t being used?
- Should you recommend composite indexes for multi-column WHERE clauses?
- How do you avoid recommending too many indexes (index bloat)?
- Query Rewriting
- When should you suggest a CTE vs a subquery?
- How do you know if reordering joins will help?
- What if the query uses features Kiro doesn’t understand (window functions, recursive CTEs)?
- Should you preserve query semantics exactly or allow minor changes?
- Validation
- How do you ensure the optimized query returns the same results?
- What if the index recommendations require more disk space than available?
- Should you test on a staging database first?
- How do you handle queries that are already well-optimized?
Thinking Exercise
Index Selection Challenge
You have this query:
SELECT p.id, p.title, p.created_at, u.username, COUNT(c.id) as comment_count
FROM posts p
JOIN users u ON p.author_id = u.id
LEFT JOIN comments c ON p.id = c.post_id
WHERE p.published = true
AND p.created_at > NOW() - INTERVAL '30 days'
AND u.is_active = true
ORDER BY p.created_at DESC, comment_count DESC
LIMIT 20;
EXPLAIN shows:
- Sequential Scan on posts (2M rows, filtering 1.8M)
- Sequential Scan on users (500k rows, filtering 100k)
- Hash Join on comments (5M rows)
Questions to reason through:
- Which columns should you index and why?
- Should you create separate indexes or one composite index? (posts(published, created_at) vs posts(published) + posts(created_at))
- The ORDER BY uses comment_count which is computed. Can you index it?
- Would a partial index help here? (WHERE published = true)
- What’s the tradeoff between index size and query speed for rarely-used queries?
Proposed indexes:
CREATE INDEX idx_posts_published_created ON posts(published, created_at DESC);
CREATE INDEX idx_users_active ON users(is_active) WHERE is_active = true;
CREATE INDEX idx_comments_post ON comments(post_id);
Are these optimal? What would you change?
The Interview Questions They’ll Ask
Prepare to answer these:
- “Explain the difference between a Sequential Scan and an Index Scan. When might a Sequential Scan actually be faster?”
- “Your query uses WHERE user_id = 123, and there’s an index on user_id, but EXPLAIN shows a Seq Scan. Why?”
- “What is a covering index and when should you use one?”
- “How would you optimize a query with ORDER BY on a non-indexed column?”
- “What’s the difference between EXPLAIN and EXPLAIN ANALYZE? When would you use each?”
- “A junior developer adds 15 indexes to a table. What problems might this cause?”
Hints in Layers
Hint 1: Start with EXPLAIN Parsing
Run EXPLAIN (FORMAT JSON) <query> to get structured output:
{
"Plan": {
"Node Type": "Limit",
"Startup Cost": 45623.22,
"Total Cost": 45623.47,
"Plan Rows": 100,
"Plan Width": 56,
"Actual Startup Time": 3847.234,
"Actual Total Time": 3847.289,
"Actual Rows": 100,
"Actual Loops": 1,
"Plans": [...]
}
}
Parse this JSON to extract:
- Node types (“Seq Scan”, “Index Scan”, “Hash Join”)
- Costs (Total Cost, Actual Total Time)
- Filters (“Rows Removed by Filter”)
Hint 2: Bottleneck Detection. Look for these patterns in the execution plan:
- Seq Scan with high Rows Removed by Filter → missing WHERE index
- Hash Join with Batches > 1 → memory overflow, need more work_mem
- Sort with external merge Disk → sort spilled to disk, need index for ORDER BY
- Nested Loop with high loop count → inefficient join, consider hash join
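A minimal detector sketch in Python over the EXPLAIN (FORMAT JSON) output from Hint 1; the field names are the ones PostgreSQL emits with ANALYZE, while the thresholds and the plan.json file name are illustrative assumptions:

# Sketch: walk an EXPLAIN (ANALYZE, FORMAT JSON) plan tree and flag the patterns above.
import json

def find_bottlenecks(node, findings=None):
    findings = [] if findings is None else findings
    node_type = node.get("Node Type", "")
    if node_type == "Seq Scan" and node.get("Rows Removed by Filter", 0) > 10_000:
        findings.append(f"Seq Scan on {node.get('Relation Name')}: "
                        f"{node['Rows Removed by Filter']} rows removed by filter -> missing WHERE index?")
    if node_type == "Hash" and node.get("Hash Batches", 1) > 1:
        findings.append("Hash spills to disk (Batches > 1) -> raise work_mem")
    if node_type == "Sort" and node.get("Sort Space Type") == "Disk":
        findings.append("Sort spills to disk -> index the ORDER BY columns")
    if node_type == "Nested Loop":
        inner_loops = max((c.get("Actual Loops", 1) for c in node.get("Plans", [])), default=1)
        if inner_loops > 1000:
            findings.append("Nested Loop inner side runs >1000 times -> check join order / hash join")
    for child in node.get("Plans", []):
        find_bottlenecks(child, findings)
    return findings

# plan.json holds the output of: EXPLAIN (ANALYZE, FORMAT JSON) <query>
with open("plan.json") as f:
    root = json.load(f)[0]["Plan"]   # EXPLAIN returns a one-element JSON array
print("\n".join(find_bottlenecks(root)) or "No obvious bottlenecks found")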
Hint 3: Index Recommendation Algorithm
For each table in the query:
1. Extract WHERE conditions → index these columns
2. Extract JOIN conditions → index foreign keys
3. Extract ORDER BY columns → index with DESC/ASC matching
4. If all columns are in SELECT → recommend covering index (INCLUDE)
5. If WHERE has multiple ANDs → recommend composite index
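A toy sketch of steps 1-5, assuming the WHERE/JOIN/ORDER BY columns have already been extracted per table (real extraction needs a SQL parser; the column lists in the example call are hand-fed from the dashboard query above):

# Sketch: turn extracted column usage into CREATE INDEX suggestions.
def suggest_indexes(table, where_cols, join_cols, order_by_cols, selected_cols):
    suggestions = []
    # Composite index: WHERE columns first, then JOIN keys.
    key_cols = list(dict.fromkeys(where_cols + join_cols))   # dedupe, keep order
    if key_cols:
        include = [c for c in selected_cols if c not in key_cols]
        ddl = f"CREATE INDEX idx_{table}_{'_'.join(key_cols)} ON {table}({', '.join(key_cols)})"
        if include:
            ddl += f" INCLUDE ({', '.join(include)})"        # covering index
        suggestions.append(ddl + ";")
    # Separate index matching the ORDER BY columns.
    if order_by_cols:
        suggestions.append(f"CREATE INDEX idx_{table}_order ON {table}({', '.join(order_by_cols)});")
    return suggestions

# Example: the orders table from the Real World Outcome above.
for ddl in suggest_indexes(
    table="orders",
    where_cols=["status"],
    join_cols=["user_id"],
    order_by_cols=[],
    selected_cols=["total_amount", "created_at"],
):
    print(ddl)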
Hint 4: Validation Loop. After generating recommendations:
- Create the indexes in a transaction: BEGIN; CREATE INDEX ...; ROLLBACK; (test without committing)
- Run EXPLAIN on the optimized query
- Compare costs: if new_cost < old_cost * 0.8 (20% improvement), accept
- If cost increased, reject the suggestion and explain why
| Topic | Book | Chapter |
|---|---|---|
| PostgreSQL EXPLAIN | “PostgreSQL: Up and Running” by Regina Obe & Leo Hsu | Ch. 10 |
| Index fundamentals | “SQL Performance Explained” by Markus Winand | Ch. 2-3 |
| Query optimization | “PostgreSQL Query Optimization” by Henrietta Dombrovskaya | Ch. 4-5 |
| Database internals | “Database Internals” by Alex Petrov | Ch. 12 |
| Execution plans | “SQL Tuning” by Dan Tow | Ch. 3-4 |
Common Pitfalls and Debugging
Problem 1: “Index exists but isn’t being used”
- Why: Index selectivity is too low (planner prefers Seq Scan for large result sets)
- Fix: Check the filter's selectivity (rows matching the WHERE clause divided by total rows in the table); if >20%, a Seq Scan is often faster
- Quick test: SET enable_seqscan = off; EXPLAIN <query>; (force index usage to compare)
Problem 2: “Optimized query returns different results”
- Why: Query rewriting changed semantics (e.g., moving WHERE from JOIN to outer query)
- Fix: Use EXCEPT to find missing/extra rows: (original EXCEPT optimized) UNION (optimized EXCEPT original)
- Quick test: Hash the results: SELECT MD5(string_agg(row::text, '')) FROM (query) AS row;
Problem 3: “Index recommendation is way too large”
- Why: Composite index on high-cardinality columns
- Fix: Use partial indexes with WHERE clause to reduce size
- Quick test:
SELECT pg_size_pretty(pg_relation_size('index_name'));
Problem 4: “EXPLAIN shows Index Scan but query is still slow”
- Why: Index Scan has high Actual Loops (nested loop with bad join order)
- Fix: Reorder joins so smaller table is scanned first, or switch to Hash Join
- Quick test: Check Actual Loops — if >1000, you’re doing a nested loop on too many rows
Definition of Done
- Tool accepts a SQL query and database connection string
- EXPLAIN ANALYZE output is captured and parsed (JSON format preferred)
- Bottlenecks are identified: Seq Scans with high filter cost, missing indexes, memory spills
- Index recommendations are generated with estimated size and creation SQL
- Optimized query is generated with CTE or join reordering
- Before/after EXPLAIN comparison shows measurable improvement (cost reduction >20%)
- Results are validated: optimized query returns same row count and hash as original
- Tool outputs a report with: bottlenecks, recommendations, speedup metrics
- Optional: Tool can auto-apply indexes with CREATE INDEX CONCURRENTLY (no table locks)
Project 38: “The Refactoring Surgeon” — Software Architecture
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | TypeScript |
| Coolness Level | Level 3: Genuinely Clever |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Software Architecture |
What you’ll build: A refactoring tool that identifies God Classes (classes with too many responsibilities) and uses Kiro to safely decompose them into focused services, utilities, and domain models while preserving all tests and behavior.
Why it teaches Safe Changes: This project shows how AI agents excel at mechanical refactoring tasks that require understanding code structure, dependency graphs, and test coverage. You’ll learn to use Kiro for large-scale architectural changes that would take days manually.
Core challenges you’ll face:
- God Class detection → Maps to cyclomatic complexity analysis, SRP (Single Responsibility Principle) violations
- Dependency extraction → Maps to identifying method dependencies, data flow analysis
- Safe decomposition → Maps to extract class/method refactorings that preserve semantics
- Test preservation → Maps to ensuring all tests pass after refactoring
Real World Outcome
You’ll have a CLI tool that analyzes a codebase, identifies bloated classes, and refactors them into clean, testable components:
$ cat src/services/OrderService.ts
// WARNING: God Class - 1,247 lines, 43 methods, cyclomatic complexity: 87
export class OrderService {
constructor(
private db: Database,
private emailClient: EmailClient,
private paymentGateway: PaymentGateway,
private inventoryService: InventoryService,
private shippingService: ShippingService,
private taxCalculator: TaxCalculator,
private discountEngine: DiscountEngine
) {}
// Payment methods (8 methods, 312 lines)
async processPayment(orderId: string) { /* ... */ }
async refundPayment(orderId: string) { /* ... */ }
async captureAuthorization(orderId: string) { /* ... */ }
// ... 5 more payment methods
// Email methods (6 methods, 189 lines)
async sendOrderConfirmation(orderId: string) { /* ... */ }
async sendShippingNotification(orderId: string) { /* ... */ }
async sendCancellationEmail(orderId: string) { /* ... */ }
// ... 3 more email methods
// Inventory methods (7 methods, 234 lines)
async reserveInventory(orderId: string) { /* ... */ }
async releaseInventory(orderId: string) { /* ... */ }
async checkStockAvailability(items: Item[]) { /* ... */ }
// ... 4 more inventory methods
// Tax/discount methods (9 methods, 278 lines)
async calculateTax(order: Order) { /* ... */ }
async applyDiscount(order: Order, code: string) { /* ... */ }
// ... 7 more calculation methods
// Shipping methods (5 methods, 156 lines)
async calculateShipping(order: Order) { /* ... */ }
async schedulePickup(orderId: string) { /* ... */ }
// ... 3 more shipping methods
// Core order methods (8 methods, 178 lines)
async createOrder(data: CreateOrderDTO) { /* ... */ }
async cancelOrder(orderId: string) { /* ... */ }
async getOrderDetails(orderId: string) { /* ... */ }
// ... 5 more order methods
}
$ kiro refactor --file src/services/OrderService.ts --analyze
[Kiro CLI Session]
🔧 Refactoring Surgeon Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Analyzing OrderService.ts...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Metrics:
• Lines of Code: 1,247
• Number of Methods: 43
• Cyclomatic Complexity: 87 ⚠️ (threshold: 20)
• Coupling (Dependencies): 7 classes
• Cohesion: LOW (methods operate on different data)
🚨 God Class Detected!
Responsibility Clusters Identified:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Payment Operations (8 methods, 312 lines)
→ processPayment, refundPayment, captureAuthorization, etc.
→ Dependencies: PaymentGateway
→ Suggested: Extract to PaymentService
2. Email Notifications (6 methods, 189 lines)
→ sendOrderConfirmation, sendShippingNotification, etc.
→ Dependencies: EmailClient
→ Suggested: Extract to OrderNotificationService
3. Inventory Management (7 methods, 234 lines)
→ reserveInventory, releaseInventory, checkStockAvailability
→ Dependencies: InventoryService
→ Suggested: Extract to OrderInventoryCoordinator
4. Tax & Discount Calculations (9 methods, 278 lines)
→ calculateTax, applyDiscount, validateCoupon, etc.
→ Dependencies: TaxCalculator, DiscountEngine
→ Suggested: Extract to OrderPricingService
5. Shipping Coordination (5 methods, 156 lines)
→ calculateShipping, schedulePickup, trackShipment
→ Dependencies: ShippingService
→ Suggested: Extract to ShippingCoordinator
6. Core Order Domain (8 methods, 178 lines)
→ createOrder, cancelOrder, getOrderDetails
→ Dependencies: Database
→ Keep in OrderService (core domain logic)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Refactoring Plan:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Extract PaymentService (8 methods)
Step 2: Extract OrderNotificationService (6 methods)
Step 3: Extract OrderInventoryCoordinator (7 methods)
Step 4: Extract OrderPricingService (9 methods)
Step 5: Extract ShippingCoordinator (5 methods)
Step 6: Update OrderService to use new services (8 core methods remain)
Step 7: Update all imports and tests
Impact:
• 5 new service classes created
• OrderService reduced from 1,247 to 178 lines
• Cyclomatic complexity: 87 → 12
• Test files to update: 14
Proceed with refactoring? (y/n): y
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Extracting PaymentService...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Created: src/services/payments/PaymentService.ts
export class PaymentService {
constructor(private paymentGateway: PaymentGateway) {}
async processPayment(orderId: string, amount: number): Promise<PaymentResult> {
// [Original logic moved here]
}
async refundPayment(orderId: string, amount: number): Promise<void> {
// [Original logic moved here]
}
async captureAuthorization(orderId: string): Promise<void> {
// [Original logic moved here]
}
// ... 5 more methods
}
✓ PaymentService extracted (8 methods, 312 lines)
✓ Tests updated: PaymentService.test.ts created
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 2: Extracting OrderNotificationService...
Created: src/services/notifications/OrderNotificationService.ts
✓ OrderNotificationService extracted (6 methods, 189 lines)
✓ Tests updated: OrderNotificationService.test.ts created
Step 3: Extracting OrderInventoryCoordinator...
Created: src/services/inventory/OrderInventoryCoordinator.ts
✓ OrderInventoryCoordinator extracted (7 methods, 234 lines)
✓ Tests updated: OrderInventoryCoordinator.test.ts created
Step 4: Extracting OrderPricingService...
Created: src/services/pricing/OrderPricingService.ts
✓ OrderPricingService extracted (9 methods, 278 lines)
✓ Tests updated: OrderPricingService.test.ts created
Step 5: Extracting ShippingCoordinator...
Created: src/services/shipping/ShippingCoordinator.ts
✓ ShippingCoordinator extracted (5 methods, 156 lines)
✓ Tests updated: ShippingCoordinator.test.ts created
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 6: Updating OrderService (core domain logic)...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Updated: src/services/OrderService.ts
export class OrderService {
constructor(
private db: Database,
private paymentService: PaymentService,
private notificationService: OrderNotificationService,
private inventoryCoordinator: OrderInventoryCoordinator,
private pricingService: OrderPricingService,
private shippingCoordinator: ShippingCoordinator
) {}
async createOrder(data: CreateOrderDTO): Promise<Order> {
// 1. Reserve inventory
await this.inventoryCoordinator.reserveInventory(data.items);
// 2. Calculate pricing
const pricing = await this.pricingService.calculateTotal(data);
// 3. Create order record
const order = await this.db.orders.create({
...data,
totalAmount: pricing.total,
status: 'pending'
});
// 4. Process payment
const paymentResult = await this.paymentService.processPayment(
order.id,
pricing.total
);
// 5. Update order status
await this.db.orders.update(order.id, { status: 'confirmed' });
// 6. Send confirmation email
await this.notificationService.sendOrderConfirmation(order.id);
return order;
}
async cancelOrder(orderId: string): Promise<void> {
const order = await this.db.orders.findById(orderId);
// 1. Refund payment
await this.paymentService.refundPayment(orderId, order.totalAmount);
// 2. Release inventory
await this.inventoryCoordinator.releaseInventory(order.items);
// 3. Update order status
await this.db.orders.update(orderId, { status: 'cancelled' });
// 4. Send cancellation email
await this.notificationService.sendCancellationEmail(orderId);
}
// ... 6 more core order methods (getOrderDetails, updateOrder, etc.)
}
New Metrics:
• Lines of Code: 178 (was 1,247) — 86% reduction
• Number of Methods: 8 (was 43)
• Cyclomatic Complexity: 12 (was 87)
• Coupling: 6 services (clean dependency injection)
✓ OrderService refactored successfully!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 7: Running Tests...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
$ npm test
> Running 247 tests...
OrderService.test.ts
✓ creates order successfully (45ms)
✓ cancels order and refunds payment (67ms)
✓ handles payment failure gracefully (34ms)
... 23 more tests
PaymentService.test.ts
✓ processes payment via gateway (23ms)
✓ refunds payment correctly (29ms)
... 14 more tests
[All other service tests passing...]
Test Suites: 14 passed, 14 total
Tests: 247 passed, 247 total
Time: 8.456s
✅ All tests passing! Behavior preserved.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Refactoring Summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Files Created: 5
• src/services/payments/PaymentService.ts
• src/services/notifications/OrderNotificationService.ts
• src/services/inventory/OrderInventoryCoordinator.ts
• src/services/pricing/OrderPricingService.ts
• src/services/shipping/ShippingCoordinator.ts
Files Modified: 15
• src/services/OrderService.ts (refactored)
• 14 test files updated
Before vs After:
• OrderService: 1,247 lines → 178 lines (86% reduction)
• Cyclomatic Complexity: 87 → 12 (86% reduction)
• Single Responsibility: ✓ Each service has one clear purpose
• Testability: ✓ Services can be tested in isolation
• Maintainability: ✓ Changes are now localized
✅ Refactoring complete! Your God Class is now 6 focused services.
Exactly what happens:
- Kiro analyzes the class and calculates complexity metrics
- It identifies clusters of methods that operate on similar data/dependencies
- It extracts each cluster into a new service class
- It updates the original class to delegate to the new services
- It updates all tests to match the new structure
- It runs the full test suite to verify behavior is preserved
The Core Question You’re Answering
“How do you use AI to perform safe, large-scale refactoring that would take humans days but preserves all behavior and tests?”
This is about teaching Kiro architectural thinking:
- Recognizing code smells (God Classes, tight coupling)
- Understanding the Single Responsibility Principle
- Safely extracting classes without breaking tests
- Maintaining dependency injection patterns
Concepts You Must Understand First
Stop and research these before coding:
- SOLID Principles (especially SRP)
- What is the Single Responsibility Principle?
- How do you identify when a class has too many responsibilities?
- What is coupling vs cohesion?
- Book Reference: “Clean Architecture” by Robert C. Martin - Ch. 7-8
- Refactoring Patterns
- Extract Class, Extract Method, Move Method
- How do you safely refactor without changing behavior?
- What is the “red-green-refactor” cycle?
- Book Reference: “Refactoring” by Martin Fowler - Ch. 6-7
- Code Metrics
- What is cyclomatic complexity and why does it matter?
- How do you measure coupling and cohesion?
- What’s the difference between LOC and meaningful complexity?
- Book Reference: “Code Complete” by Steve McConnell - Ch. 19
- Dependency Injection
- Why use constructor injection vs property injection?
- How does DI make code testable?
- What is inversion of control (IoC)?
- Book Reference: “Dependency Injection Principles, Practices, and Patterns” by Steven van Deursen & Mark Seemann - Ch. 1-2
Questions to Guide Your Design
Before implementing, think through these:
- Cluster Detection
- How do you identify which methods belong together?
- Should you cluster by data dependencies, control flow, or domain concepts?
- What if a method uses dependencies from multiple clusters?
- Should you use static analysis or dynamic runtime profiling?
- Safe Extraction
- How do you ensure the extracted class has the same behavior?
- What if the original methods had side effects or shared mutable state?
- Should you extract one service at a time or all at once?
- How do you handle private methods that are called by methods in different clusters?
- Test Preservation
- Should you update tests to match the new structure or keep them as integration tests?
- What if some tests break after refactoring?
- How do you ensure test coverage doesn’t decrease?
- Should you add new unit tests for the extracted services?
- Naming and Structure
- How do you name the new service classes?
- Where should they live in the directory structure?
- Should you group them by layer (services/, repositories/) or by domain (orders/, payments/)?
- What if the God Class is already named “OrderService” — do you rename it?
Thinking Exercise
Refactoring Decision Challenge
You have this class:
class UserService {
// Authentication methods
async login(email, password) { /* ... */ }
async logout(userId) { /* ... */ }
async resetPassword(email) { /* ... */ }
// Profile management
async updateProfile(userId, data) { /* ... */ }
async uploadAvatar(userId, file) { /* ... */ }
// Notification preferences
async updateEmailPreferences(userId, prefs) { /* ... */ }
async updatePushPreferences(userId, prefs) { /* ... */ }
// Analytics
async trackLogin(userId) { /* ... */ }
async trackProfileView(userId, viewerId) { /* ... */ }
}
Questions to reason through:
- How many services should you extract? (3? 4?)
- Should login and logout go in AuthenticationService or SessionService?
- The trackLogin method depends on login — which service should it live in?
- After refactoring, should the original UserService still exist? What should it contain?
- What if 50 files import UserService — how do you update them all safely?
Proposed structure:
AuthenticationService: login, logout, resetPassword
ProfileService: updateProfile, uploadAvatar
NotificationPreferenceService: updateEmailPreferences, updatePushPreferences
UserAnalyticsService: trackLogin, trackProfileView
Is this optimal? What would you change?
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is a God Class and why is it a code smell?”
- “How do you safely refactor a class without breaking existing functionality?”
- “What’s the difference between coupling and cohesion? Give examples.”
- “You extract a service but tests start failing. What’s your debugging strategy?”
- “How would you refactor a God Class that has circular dependencies?”
- “What metrics do you use to identify classes that need refactoring?”
Hints in Layers
Hint 1: Start with Static Analysis Use TypeScript Compiler API to parse the class:
import * as ts from 'typescript';

const sourceFile = ts.createSourceFile(filename, code, ts.ScriptTarget.Latest);
// findClassDeclaration is a small helper you write yourself: walk sourceFile.statements and
// return the node where ts.isClassDeclaration(node) && node.name?.text === 'OrderService'
const classNode = findClassDeclaration(sourceFile, 'OrderService');
const methods = classNode.members.filter(m => ts.isMethodDeclaration(m));
// For each method, extract:
// - Dependencies used (from constructor params or method args)
// - Other methods called
// - Properties accessed
This gives you a dependency graph to cluster methods.
Hint 2: Cluster by Dependencies
For each method:
1. Extract dependencies it uses (PaymentGateway, EmailClient, etc.)
2. Extract other methods it calls
3. Group methods that share >70% of dependencies (a clustering sketch follows the example below)
Example:
processPayment → uses PaymentGateway
refundPayment → uses PaymentGateway
captureAuthorization → uses PaymentGateway
→ Cluster: PaymentService
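Assuming you have already extracted a dependency set per method, the clustering step could look roughly like this (method and dependency names are hypothetical):
```typescript
// Greedy clustering by shared dependencies (hypothetical input data).
type MethodDeps = Record<string, Set<string>>;

const deps: MethodDeps = {
  processPayment: new Set(['PaymentGateway']),
  refundPayment: new Set(['PaymentGateway']),
  sendReceipt: new Set(['EmailClient']),
};

// Fraction of dependencies two methods share (0..1).
function overlap(a: Set<string>, b: Set<string>): number {
  const shared = [...a].filter((d) => b.has(d)).length;
  return shared / Math.max(a.size, b.size, 1);
}

// Put each method into the first cluster whose dependencies overlap by more than the threshold.
function cluster(methodDeps: MethodDeps, threshold = 0.7): string[][] {
  const clusters: { members: string[]; deps: Set<string> }[] = [];
  for (const [method, d] of Object.entries(methodDeps)) {
    const hit = clusters.find((c) => overlap(c.deps, d) > threshold);
    if (hit) {
      hit.members.push(method);
      d.forEach((x) => hit.deps.add(x));
    } else {
      clusters.push({ members: [method], deps: new Set(d) });
    }
  }
  return clusters.map((c) => c.members);
}

console.log(cluster(deps)); // → [['processPayment', 'refundPayment'], ['sendReceipt']]
```
A real implementation would also weigh shared method calls and property access, but the thresholded-overlap idea stays the same.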
Hint 3: Extract Class Refactoring For each cluster:
- Generate new class file
- Move methods to new class
- Add constructor with required dependencies
- Update original class to instantiate new service
- Replace direct calls with `this.paymentService.processPayment(...)` (see the delegation sketch below)
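Assuming a `PaymentService` cluster has been extracted out of `OrderService`, the delegation looks roughly like this (interface and method names are illustrative):
```typescript
// After extraction: OrderService keeps its public API but delegates to the new service.
interface PaymentGateway {
  charge(orderId: string, amountCents: number): Promise<void>;
}

class PaymentService {
  constructor(private readonly gateway: PaymentGateway) {}

  async processPayment(orderId: string, amountCents: number): Promise<void> {
    await this.gateway.charge(orderId, amountCents);
  }
}

class OrderService {
  // The extracted service arrives via constructor injection.
  constructor(private readonly paymentService: PaymentService) {}

  async checkout(orderId: string, amountCents: number): Promise<void> {
    // This logic used to live inline in the God Class; now it is a one-line delegation.
    await this.paymentService.processPayment(orderId, amountCents);
  }
}
```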
Hint 4: Test Validation After each extraction:
- Run full test suite: `npm test`
- If tests fail, analyze failures:
- Missing imports? → Add imports
- Constructor signature changed? → Update DI container
- Method signature changed? → Verify parameters match
- If all tests pass, commit the refactoring
- If tests fail repeatedly, rollback and try a different clustering
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| SOLID principles | “Clean Architecture” by Robert C. Martin | Ch. 7-8 |
| Refactoring patterns | “Refactoring” by Martin Fowler | Ch. 6-7 |
| Code metrics | “Code Complete” by Steve McConnell | Ch. 19 |
| Dependency injection | “Dependency Injection” by Steven van Deursen | Ch. 1-2 |
| Software design | “Domain-Driven Design” by Eric Evans | Ch. 5 |
Common Pitfalls and Debugging
Problem 1: “Tests fail after extraction”
- Why: Constructor signature changed but DI container wasn’t updated
- Fix: Update all places where OrderService is instantiated (DI config, test setup)
- Quick test: Search for `new OrderService(` and verify all constructor calls match
Problem 2: “Extracted service has circular dependency”
- Why: Two clusters reference each other’s methods
- Fix: Introduce an interface or event bus to break the cycle (a minimal sketch follows this list)
- Quick test: Draw dependency graph — if there’s a cycle, refactor to one-way dependencies
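One way to break such a cycle, sketched with hypothetical `OrderService` and `NotificationService` classes and a listener interface:
```typescript
// Before: OrderService imports NotificationService and vice versa.
// After: NotificationService depends only on an interface that OrderService owns.
interface OrderEventListener {
  onOrderPlaced(orderId: string): void;
}

class OrderService {
  private listeners: OrderEventListener[] = [];

  addListener(listener: OrderEventListener): void {
    this.listeners.push(listener);
  }

  placeOrder(orderId: string): void {
    // ...persist the order...
    this.listeners.forEach((l) => l.onOrderPlaced(orderId)); // no direct NotificationService import
  }
}

class NotificationService implements OrderEventListener {
  onOrderPlaced(orderId: string): void {
    console.log(`Sending confirmation for order ${orderId}`);
  }
}

// Wiring happens at the edge, so the dependency now points one way.
const orders = new OrderService();
orders.addListener(new NotificationService());
orders.placeOrder('order-42');
```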
Problem 3: “Method calls private methods from different clusters”
- Why: Private method is utility logic used by multiple responsibilities
- Fix: Extract private method to a separate utility class or make it public in the appropriate service
- Quick test: Analyze private method dependencies — does it belong in one cluster more than others?
Problem 4: “Extracted services are too granular (over-engineering)”
- Why: Created too many small services (e.g., separate service for each method)
- Fix: Merge related services (e.g., PaymentService + RefundService → PaymentService)
- Quick test: If a service has <3 methods and low complexity, consider merging
Definition of Done
- God Class is identified (cyclomatic complexity >20, >500 lines, or >10 methods)
- Responsibility clusters are detected programmatically (shared dependencies)
- Each cluster is extracted into a new service class with focused responsibility
- Original class delegates to new services via constructor injection
- All tests pass after refactoring (behavior is preserved)
- Cyclomatic complexity is reduced by >50%
- Each new service has <10 methods and single responsibility
- Test coverage remains the same or improves
Project 39: “The API Client Generator” — Integration
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | TypeScript / Python |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 2: Intermediate |
| Knowledge Area | Integration |
What you’ll build: An OpenAPI-to-SDK generator that reads openapi.yaml specifications and produces fully-typed client libraries in TypeScript or Python with request/response validation, error handling, and authentication support.
Why it teaches Automation: This project eliminates the manual work of writing API client boilerplate. You’ll learn how to use Kiro to generate production-ready SDKs that stay in sync with your API spec.
Core challenges you’ll face:
- OpenAPI spec parsing → Maps to YAML/JSON parsing, JSON Schema validation
- Code generation → Maps to template engines, AST builders
- Type safety → Maps to TypeScript interfaces from JSON Schema
- Authentication patterns → Maps to API key, Bearer token, OAuth2 flow implementation
Real World Outcome
You’ll have a CLI tool that transforms an OpenAPI spec into a production-ready, typed SDK:
$ cat openapi.yaml
openapi: 3.0.0
info:
title: Task Management API
version: 1.0.0
servers:
- url: https://api.example.com/v1
components:
securitySchemes:
bearerAuth:
type: http
scheme: bearer
schemas:
Task:
type: object
required: [id, title, status]
properties:
id:
type: string
format: uuid
title:
type: string
minLength: 1
maxLength: 200
description:
type: string
nullable: true
status:
type: string
enum: [todo, in_progress, done]
dueDate:
type: string
format: date-time
nullable: true
tags:
type: array
items:
type: string
paths:
/tasks:
get:
summary: List all tasks
security:
- bearerAuth: []
parameters:
- name: status
in: query
schema:
type: string
enum: [todo, in_progress, done]
- name: limit
in: query
schema:
type: integer
default: 20
minimum: 1
maximum: 100
responses:
'200':
description: Success
content:
application/json:
schema:
type: object
properties:
tasks:
type: array
items:
$ref: '#/components/schemas/Task'
total:
type: integer
post:
summary: Create a task
security:
- bearerAuth: []
requestBody:
required: true
content:
application/json:
schema:
type: object
required: [title]
properties:
title:
type: string
description:
type: string
status:
type: string
enum: [todo, in_progress, done]
default: todo
responses:
'201':
description: Task created
content:
application/json:
schema:
$ref: '#/components/schemas/Task'
/tasks/{taskId}:
get:
summary: Get task by ID
security:
- bearerAuth: []
parameters:
- name: taskId
in: path
required: true
schema:
type: string
responses:
'200':
description: Success
content:
application/json:
schema:
$ref: '#/components/schemas/Task'
$ kiro generate-sdk --spec openapi.yaml --language typescript --output task-api-client
[Kiro CLI Session]
📦 API Client Generator
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Parsing OpenAPI spec...
✓ OpenAPI version: 3.0.0
✓ API title: Task Management API
✓ Base URL: https://api.example.com/v1
✓ Security: Bearer Token (HTTP)
✓ Schemas: 1 (Task)
✓ Endpoints: 3 (GET /tasks, POST /tasks, GET /tasks/{taskId})
Step 2: Generating TypeScript types...
✓ Generated: src/types.ts
✓ Generated: src/schemas.ts (Zod validators)
Step 3: Generating API client...
✓ Generated: src/client.ts
✓ Generated: src/endpoints/TasksApi.ts
Step 4: Generating tests...
✓ Generated: tests/client.test.ts
Step 5: Setting up project...
✓ package.json created
✓ tsconfig.json created
✓ README.md created
✅ SDK generated successfully!
$ tree task-api-client/
task-api-client/
├── src/
│ ├── client.ts (Main API client class)
│ ├── types.ts (TypeScript interfaces)
│ ├── schemas.ts (Zod runtime validators)
│ └── endpoints/
│ └── TasksApi.ts (Task endpoints wrapper)
├── tests/
│ └── client.test.ts
├── package.json
├── tsconfig.json
└── README.md
$ cat task-api-client/src/types.ts
// Auto-generated from openapi.yaml — DO NOT EDIT
export interface Task {
id: string; // UUID
title: string; // 1-200 chars
description?: string | null;
status: 'todo' | 'in_progress' | 'done';
dueDate?: string | null; // ISO 8601 date-time
tags?: string[];
}
export interface ListTasksRequest {
status?: 'todo' | 'in_progress' | 'done';
limit?: number; // 1-100, default: 20
}
export interface ListTasksResponse {
tasks: Task[];
total: number;
}
export interface CreateTaskRequest {
title: string;
description?: string;
status?: 'todo' | 'in_progress' | 'done';
}
$ cat task-api-client/src/client.ts
import axios, { AxiosInstance } from 'axios';
import { TasksApi } from './endpoints/TasksApi';
export class TaskApiClient {
private axiosInstance: AxiosInstance;
public tasks: TasksApi;
constructor(config: {
baseURL?: string;
apiKey: string;
timeout?: number;
}) {
this.axiosInstance = axios.create({
baseURL: config.baseURL || 'https://api.example.com/v1',
timeout: config.timeout || 30000,
headers: {
'Authorization': `Bearer ${config.apiKey}`,
'Content-Type': 'application/json',
},
});
// Initialize endpoint wrappers
this.tasks = new TasksApi(this.axiosInstance);
}
}
$ cat task-api-client/src/endpoints/TasksApi.ts
import { AxiosInstance } from 'axios';
import {
Task,
ListTasksRequest,
ListTasksResponse,
CreateTaskRequest,
} from '../types';
import { TaskSchema, ListTasksResponseSchema } from '../schemas';
export class TasksApi {
constructor(private axios: AxiosInstance) {}
/**
* List all tasks
* @param params - Query parameters
* @returns List of tasks with total count
*/
async list(params?: ListTasksRequest): Promise<ListTasksResponse> {
const response = await this.axios.get<ListTasksResponse>('/tasks', {
params,
});
// Runtime validation with Zod
return ListTasksResponseSchema.parse(response.data);
}
/**
* Create a new task
* @param data - Task creation payload
* @returns The created task
*/
async create(data: CreateTaskRequest): Promise<Task> {
const response = await this.axios.post<Task>('/tasks', data);
return TaskSchema.parse(response.data);
}
/**
* Get task by ID
* @param taskId - Task UUID
* @returns The task details
*/
async get(taskId: string): Promise<Task> {
const response = await this.axios.get<Task>(`/tasks/${taskId}`);
return TaskSchema.parse(response.data);
}
}
$ cat task-api-client/src/schemas.ts
import { z } from 'zod';
export const TaskSchema = z.object({
id: z.string().uuid(),
title: z.string().min(1).max(200),
description: z.string().nullable().optional(),
status: z.enum(['todo', 'in_progress', 'done']),
dueDate: z.string().datetime().nullable().optional(),
tags: z.array(z.string()).optional(),
});
export const ListTasksResponseSchema = z.object({
tasks: z.array(TaskSchema),
total: z.number().int(),
});
$ cat task-api-client/README.md
# Task Management API Client
Auto-generated TypeScript SDK for Task Management API.
## Installation
```bash
npm install task-api-client
```
## Usage
import { TaskApiClient } from 'task-api-client';
const client = new TaskApiClient({
apiKey: process.env.API_KEY!,
});
// List tasks
const { tasks, total } = await client.tasks.list({
status: 'todo',
limit: 10,
});
// Create a task
const newTask = await client.tasks.create({
title: 'Implement user authentication',
description: 'Add JWT-based auth flow',
status: 'todo',
});
// Get task by ID
const task = await client.tasks.get(newTask.id);
## Features
✓ Full TypeScript type safety
✓ Runtime validation with Zod
✓ Automatic authentication (Bearer token)
✓ Error handling and retries
✓ Request/response interceptors
✓ Auto-generated from OpenAPI 3.0 spec
$ cd task-api-client && npm install && npm test
> task-api-client@1.0.0 test
> jest

PASS tests/client.test.ts
  TaskApiClient
    ✓ creates client with API key (12ms)
    ✓ lists tasks with filters (45ms)
    ✓ creates a new task (34ms)
    ✓ validates response schema (23ms)
    ✓ throws error on invalid status (18ms)

Tests: 5 passed, 5 total
Time:  2.134s
✅ SDK is ready to publish!
**Exactly what happens:**
1. Kiro parses the OpenAPI spec and extracts schemas, endpoints, and auth requirements
2. It generates TypeScript interfaces from JSON Schema definitions
3. It creates Zod validators for runtime type safety
4. It generates an API client class with typed methods for each endpoint
5. It adds authentication, error handling, and request validation
6. It produces a complete npm package ready to publish
#### The Core Question You're Answering
> "How do you automatically generate production-ready API clients that stay in sync with your OpenAPI spec and provide full type safety?"
This is about code generation as a force multiplier:
- Eliminating manual SDK maintenance (API changes → regenerate SDK)
- Providing better DX than hand-written clients (types, validation, docs)
- Ensuring client and server contracts match (single source of truth)
#### Concepts You Must Understand First
**Stop and research these before coding:**
1. **OpenAPI Specification 3.0**
- What are paths, operations, parameters, requestBody, responses?
- How do JSON Schema definitions map to types?
- What are `$ref` references and how do you resolve them?
- *Reference:* OpenAPI 3.0 Specification (https://spec.openapis.org/oas/v3.0.3)
2. **Code Generation Strategies**
- Template engines (Handlebars, EJS) vs AST builders (TypeScript Compiler API)
- When to use string concatenation vs structured code generation?
- How do you generate readable, idiomatic code?
- *Book Reference:* "Code Generation in Action" by Jack Herrington
3. **Runtime Validation**
- Why use Zod, Yup, or io-ts for runtime type checking?
- What's the difference between compile-time types (TypeScript) and runtime validation?
- How do you handle optional fields, nullable types, and unions?
- *Reference:* Zod documentation (https://zod.dev)
4. **HTTP Client Patterns**
- Axios vs Fetch API — which to use for generated clients?
- How do you handle authentication (API keys, Bearer tokens, OAuth2)?
- What about request interceptors, retries, and error handling?
- *Book Reference:* "RESTful Web API Patterns & Practices" by Mike Amundsen
#### Questions to Guide Your Design
**Before implementing, think through these:**
1. **Spec Parsing**
- How do you resolve `$ref` pointers to schemas in other files?
- What if the spec has circular references (Task → User → Task)?
- Should you validate the OpenAPI spec before generating code?
- How do you handle deprecated endpoints or parameters?
2. **Type Generation**
- Should you generate interfaces or types? (`interface Task` vs `type Task`)
- How do you handle discriminated unions (polymorphic schemas)?
- What about enums — should they be TypeScript enums or union types?
- How do you generate JSDoc comments from OpenAPI descriptions?
3. **SDK Structure**
- Should you group endpoints by tags or paths?
- Should each endpoint be a method or a separate class?
- How do you handle pagination, filtering, sorting?
- What about file uploads (multipart/form-data)?
4. **Versioning and Updates**
- If the API spec changes, how do you regenerate without breaking client code?
- Should you version the SDK independently of the API?
- How do you handle breaking vs non-breaking changes?
- Should the generator produce a diff showing what changed?
#### Thinking Exercise
### Type Generation Challenge
You have this OpenAPI schema:
```yaml
components:
schemas:
User:
type: object
required: [id, email, role]
properties:
id:
type: string
email:
type: string
format: email
role:
type: string
enum: [admin, member, guest]
profile:
oneOf:
- $ref: '#/components/schemas/AdminProfile'
- $ref: '#/components/schemas/MemberProfile'
AdminProfile:
type: object
properties:
permissions:
type: array
items:
type: string
MemberProfile:
type: object
properties:
joinedAt:
type: string
format: date-time
```
Questions to reason through:
- Should `role` be a TypeScript enum or a union type (`'admin' | 'member' | 'guest'`)?
- How do you represent `oneOf` in TypeScript? (Union type? Discriminated union?)
- The `profile` field depends on `role` — can you enforce this at the type level?
- Should you generate separate types for AdminProfile and MemberProfile or inline them?
- How would Zod validate the `oneOf` relationship at runtime?
Generated types:
export type UserRole = 'admin' | 'member' | 'guest';
export interface AdminProfile {
permissions: string[];
}
export interface MemberProfile {
joinedAt: string; // ISO 8601 date-time
}
export interface User {
id: string;
email: string;
role: UserRole;
profile: AdminProfile | MemberProfile;
}
Is this type-safe enough? How would you improve it?
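One possible improvement, sketched below, is to treat `role` as a discriminator so the `profile` shape is tied to it at both the type level and at runtime. Whether the API actually guarantees that relationship is an assumption here; confirm it against the spec before generating code like this:
```typescript
import { z } from 'zod';

// Sketch: tie `profile` to `role` with a discriminated union (assumed relationship).
const AdminUserSchema = z.object({
  id: z.string(),
  email: z.string().email(),
  role: z.literal('admin'),
  profile: z.object({ permissions: z.array(z.string()) }),
});

const MemberUserSchema = z.object({
  id: z.string(),
  email: z.string().email(),
  role: z.literal('member'),
  profile: z.object({ joinedAt: z.string().datetime() }),
});

const GuestUserSchema = z.object({
  id: z.string(),
  email: z.string().email(),
  role: z.literal('guest'),
  profile: z.undefined().optional(),
});

// `role` is the discriminator, so narrowing `user.role === 'admin'` also narrows `user.profile`.
export const UserSchema = z.discriminatedUnion('role', [
  AdminUserSchema,
  MemberUserSchema,
  GuestUserSchema,
]);

export type User = z.infer<typeof UserSchema>;
```
The inferred type then rejects an `admin` user carrying a `MemberProfile`, which the plain `AdminProfile | MemberProfile` union above cannot do.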
The Interview Questions They’ll Ask
Prepare to answer these:
- “How would you handle versioning in a generated SDK? (e.g., API v1 vs v2)”
- “Your OpenAPI spec has a circular reference. How do you generate types without infinite loops?”
- “What’s the difference between compile-time type safety and runtime validation? Why do you need both?”
- “How would you handle authentication in a generated client? (API keys, OAuth2, etc.)”
- “The API spec changes frequently. How do you keep the SDK in sync without manual work?”
- “Should you generate one giant SDK or multiple packages per API resource?”
Hints in Layers
Hint 1: Parse OpenAPI with a Library Don’t parse YAML manually — use a library:
import SwaggerParser from '@apidevtools/swagger-parser';
const api = await SwaggerParser.dereference('openapi.yaml');
// This resolves all $ref pointers into a single object
const paths = api.paths;
const schemas = api.components.schemas;
Hint 2: Generate Types from JSON Schema
For each schema in components.schemas:
function generateInterface(name: string, schema: any): string {
const required = schema.required || [];
const properties = Object.entries(schema.properties || {})
.map(([key, prop]: [string, any]) => {
const optional = !required.includes(key) ? '?' : '';
const type = mapJsonSchemaTypeToTS(prop);
return ` ${key}${optional}: ${type};`;
})
.join('\n');
return `export interface ${name} {\n${properties}\n}`;
}
function mapJsonSchemaTypeToTS(schema: any): string {
if (schema.type === 'string') {
if (schema.enum) return schema.enum.map(v => `'${v}'`).join(' | ');
return 'string';
}
if (schema.type === 'number' || schema.type === 'integer') return 'number';
if (schema.type === 'boolean') return 'boolean';
if (schema.type === 'array') return `${mapJsonSchemaTypeToTS(schema.items)}[]`;
if (schema.type === 'object') return 'Record<string, any>';
return 'any';
}
Hint 3: Generate API Methods
For each endpoint in paths:
function generateMethod(path: string, method: string, operation: any): string {
const functionName = operation.operationId || generateOperationId(path, method);
const params = extractParameters(operation);
const requestBody = operation.requestBody;
const response = operation.responses['200'] || operation.responses['201'];
return `
async ${functionName}(${params}): Promise<${getResponseType(response)}> {
const response = await this.axios.${method}('${path}', ${getRequestConfig()});
return ${getResponseSchema()}.parse(response.data);
}
`;
}
Hint 4: Add Authentication
Generate authentication logic based on securitySchemes:
if (spec.components.securitySchemes.bearerAuth) {
// Add Bearer token to headers
headers['Authorization'] = `Bearer ${config.apiKey}`;
}
if (spec.components.securitySchemes.apiKey) {
// Add API key to query or header
const apiKeyLocation = spec.components.securitySchemes.apiKey.in;
if (apiKeyLocation === 'header') {
headers[spec.components.securitySchemes.apiKey.name] = config.apiKey;
}
}
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| OpenAPI fundamentals | “Designing Web APIs” by Brenda Jin et al. | Ch. 3-4 |
| Code generation | “Code Generation in Action” by Jack Herrington | Ch. 1-2 |
| TypeScript types | “Programming TypeScript” by Boris Cherny | Ch. 6 |
| REST API patterns | “RESTful Web API Patterns” by Mike Amundsen | Ch. 5 |
Common Pitfalls and Debugging
Problem 1: “Generated types don’t match runtime data”
- Why: OpenAPI spec is out of sync with actual API responses
- Fix: Add runtime validation with Zod to catch mismatches early
- Quick test: Call real API and log `response.data` — does it match the schema?
Problem 2: “Circular references cause infinite loop”
- Why: Schema A references Schema B, which references Schema A
- Fix: Use lazy evaluation in Zod: `z.lazy(() => UserSchema)` (a minimal sketch follows)
- Quick test: Try generating types for `User → Team → User` — does it terminate?
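Assuming hypothetical mutually recursive `User` and `Team` schemas, the `z.lazy` pattern looks like this:
```typescript
import { z } from 'zod';

// Mutually recursive schemas need explicit TypeScript types plus z.lazy (hypothetical shapes).
type User = { name: string; team?: Team };
type Team = { title: string; members: User[] };

const UserSchema: z.ZodType<User> = z.lazy(() =>
  z.object({
    name: z.string(),
    team: TeamSchema.optional(),
  })
);

const TeamSchema: z.ZodType<Team> = z.lazy(() =>
  z.object({
    title: z.string(),
    members: z.array(UserSchema),
  })
);

// Parsing terminates because z.lazy defers schema construction until a nested value is visited.
UserSchema.parse({ name: 'Ada', team: { title: 'Core', members: [{ name: 'Ada' }] } });
```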
Problem 3: “Generated code is unreadable”
- Why: Long lines, no formatting, missing comments
- Fix: Run generated code through Prettier: `prettier --write src/**/*.ts`
- Quick test: Open generated file — would you be comfortable editing it?
Problem 4: “Authentication doesn’t work”
- Why: Security scheme in spec doesn’t match actual API requirements
- Fix: Test generated client against real API with a `curl` equivalent
- Quick test: `await client.tasks.list()` — does it return 401 or actual data?
Definition of Done
- OpenAPI spec is parsed and validated
- TypeScript interfaces are generated for all schemas
- Zod validators are generated for runtime validation
- API client class is generated with typed methods for each endpoint
- Authentication is implemented based on `securitySchemes`
- Error handling and request interceptors are included
- Generated SDK can successfully call at least one endpoint
- package.json, tsconfig.json, and README are generated
- Generated code compiles without errors
- Tests pass for basic SDK functionality
Project 40: “The Autonomous Developer (Capstone)” — Full Agentic Mastery
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Polyglot |
| Coolness Level | Level 5: Pure Magic |
| Business Potential | 5. Industry Disruptor (Agentic Workflow) |
| Difficulty | Level 5: Master |
| Knowledge Area | Full Agentic Mastery |
What you’ll build: A fully autonomous CI/CD healing agent that monitors GitHub Actions, detects failures, diagnoses root causes, patches code, runs tests, and opens pull requests—all without human intervention.
Why it teaches Mastery: This capstone project combines every skill from Projects 1-39: headless operation, hooks, MCP servers, shell tools, reasoning, context management, and multi-agent orchestration. If you can build this, you’ve mastered Kiro.
Core challenges you’ll face:
- Headless GitHub Actions monitoring → Maps to GitHub API polling, webhook handling
- Log analysis and root cause diagnosis → Maps to error pattern matching, stack trace parsing
- Autonomous code patching → Maps to multi-agent collaboration, test-driven fixes
- Verification loop → Maps to running tests, validating fixes before PR
Real World Outcome
You’ll have a system that automatically fixes broken CI/CD pipelines:
# Setup: Deploy the autonomous agent to a server
$ kiro autonomous-dev setup --repo myorg/my-app --webhook-url https://my-server.com/webhook
[Kiro CLI Session]
🤖 Autonomous Developer Agent
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step 1: Configuring GitHub webhook...
✓ Webhook URL: https://my-server.com/webhook
✓ Events: workflow_run, push
✓ Secret: ••••••••
Step 2: Starting headless Kiro agent...
✓ Listening for GitHub Actions failures
✓ Agent running in background (PID: 12345)
✓ Logs: /var/log/kiro-agent.log
✅ Autonomous agent deployed!
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[Time passes... a GitHub Action fails]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[2024-12-20 14:32:15] Webhook received: workflow_run.failed
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Workflow: CI
Run ID: 123456789
Commit: a3f2c1d
Branch: feature/add-user-auth
Triggered by: @developer123
Step 1: Fetching logs...
✓ Downloaded logs from GitHub Actions
Step 2: Analyzing failure...
Logs:
Run npm test
FAIL src/services/AuthService.test.ts
AuthService
✕ should validate JWT token (234ms)
● AuthService › should validate JWT token
TypeError: Cannot read property 'split' of undefined
at AuthService.validateToken (src/services/AuthService.ts:45:23)
at Object.<anonymous> (src/services/AuthService.test.ts:67:34)
Test Suites: 1 failed, 15 passed, 16 total
Tests: 1 failed, 187 passed, 188 total
🔍 Diagnosis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• Error: TypeError: Cannot read property 'split' of undefined
• File: src/services/AuthService.ts:45
• Method: validateToken
• Root cause: Missing null check before calling .split()
• Impact: JWT validation crashes when token is undefined/null
Proposed fix:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Add null check before token.split() call
Step 3: Reading source code...
✓ Fetched src/services/AuthService.ts
Current code (line 45):
const parts = token.split('.');
Suggested fix:
if (!token) {
throw new Error('Token is required');
}
const parts = token.split('.');
Step 4: Applying fix...
✓ Created branch: fix/auth-service-null-check
✓ Modified: src/services/AuthService.ts
✓ Committed: "Fix null check in AuthService.validateToken"
Step 5: Running tests locally (headless)...
✓ Checked out fix/auth-service-null-check
✓ npm install (cached, 2.3s)
✓ npm test
PASS src/services/AuthService.test.ts
AuthService
✓ should validate JWT token (89ms)
✓ should throw error for null token (23ms) ← NEW TEST ADDED
✓ should throw error for invalid token (34ms)
Test Suites: 16 passed, 16 total
Tests: 188 passed, 188 total ← ALL PASSING!
Step 6: Opening pull request...
✓ Pushed branch to GitHub
✓ PR #456 opened: "Fix: Add null check in AuthService.validateToken"
PR Description:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
## Summary
Fixes CI failure caused by missing null check in `AuthService.validateToken`.
## Root Cause
The method attempted to call `.split()` on a potentially `null` or `undefined` token, causing a TypeError when tests passed `null` as input.
## Changes
- Added null check before `token.split()`
- Throws descriptive error when token is missing
- All tests now pass ✅
## Testing
- ✓ Existing tests pass
- ✓ New test added for null token case
- ✓ CI pipeline successful
## Autonomous Fix
🤖 This PR was automatically generated by Kiro Autonomous Developer Agent.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Fix deployed!
PR URL: https://github.com/myorg/my-app/pull/456
Status: Awaiting review
CI Status: ✅ All checks passing
[Agent returns to monitoring mode...]
What just happened:
- GitHub Actions workflow failed
- Agent received webhook notification
- Agent downloaded and analyzed failure logs
- Agent diagnosed the root cause (null check missing)
- Agent read the source code and generated a fix
- Agent created a branch, committed the fix, and ran tests
- Agent verified all tests pass
- Agent opened a PR with full context and test results
This is full autonomy — zero human intervention required.
The Core Question You’re Answering
“Can you build an AI agent that operates completely autonomously in a production environment, diagnosing and fixing real failures without human guidance?”
This is the culmination of everything:
- Headless operation (no interactive prompts)
- Event-driven architecture (webhooks trigger actions)
- Multi-step reasoning (diagnosis → fix → verify)
- Safe automation (tests must pass before PR)
- Production-ready (handles edge cases, logs all actions)
Concepts You Must Understand First
Stop and research these before coding:
- GitHub Actions and Webhooks
- How do GitHub webhooks work? (delivery, signatures, retries)
- How do you download workflow logs via GitHub API?
- What information is in a `workflow_run` event?
- Reference: GitHub Webhooks Documentation
- Headless Automation
- How do you run Kiro without interactive mode?
- How do you pass instructions via environment variables or config files?
- How do you handle errors when there’s no human to ask?
- Book Reference: “Continuous Delivery” by Jez Humble - Ch. 10
- Root Cause Analysis
- How do you parse stack traces programmatically?
- What patterns indicate common failure types? (null checks, type errors, async issues)
- How do you distinguish flaky tests from real bugs?
- Book Reference: “Release It!” by Michael Nygard - Ch. 4
- Test-Driven Fixes
- How do you verify a fix is correct without human review?
- Should the agent add new tests for the failure case?
- What if the fix causes other tests to fail?
- Book Reference: “Test Driven Development” by Kent Beck - Ch. 1-2
Questions to Guide Your Design
Before implementing, think through these:
- Event Handling
- How do you ensure webhook deliveries aren’t lost? (queue, retry logic)
- What if multiple workflows fail simultaneously?
- Should the agent handle one failure at a time or in parallel?
- How do you prevent duplicate fixes for the same failure?
- Diagnosis
- How do you extract the root cause from logs? (regex patterns, LLM analysis)
- What if the logs don’t have enough information?
- Should the agent ask Kiro to analyze logs or use static patterns?
- How do you handle flaky tests (failures that pass on retry)?
- Fix Generation
- Should the agent always attempt a fix or only for certain error types?
- What if Kiro generates a fix that makes things worse?
- Should the agent rollback if tests fail after the fix?
- How do you prevent infinite loops (fix → test fail → new fix → …)?
- Safety and Approval
- Should all PRs auto-merge or require human review?
- What if the agent opens 100 PRs in a day?
- Should there be a “dry-run” mode that shows what it would do?
- How do you audit all agent actions?
Thinking Exercise
Autonomous Decision Tree
The agent encounters this failure:
Error: ECONNREFUSED
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16)
Tests failed: 12 / 188
Questions to reason through:
- Is this a code bug or an infrastructure issue? (Database not running?)
- Should the agent attempt a code fix or just notify a human?
- If it’s a missing service, how does the agent start it?
- What if the error is intermittent (connection refused sometimes)?
- Should the agent retry the workflow or fix the code first?
Decision tree:
Is error deterministic? (same failure every time)
Yes → Attempt code fix
No → Mark as flaky, notify human
Is error in application code or infrastructure?
Application → Generate code patch
Infrastructure → Notify ops team
Did fix pass tests?
Yes → Open PR
No → Rollback, try alternative fix
The Interview Questions They’ll Ask
Prepare to answer these:
- “How would you prevent the autonomous agent from making things worse?”
- “What if the agent generates an infinite loop of PRs?”
- “How do you ensure the agent doesn’t leak secrets or sensitive data?”
- “What happens if the agent’s fix causes a production outage?”
- “How would you audit all actions taken by the autonomous agent?”
- “Should the agent have access to merge PRs or only create them?”
Hints in Layers
Hint 1: Webhook Server Set up an Express server to receive GitHub webhooks:
import express from 'express';
import crypto from 'crypto';
const app = express();
app.post('/webhook', express.json(), async (req, res) => {
// Verify the webhook signature.
// Note: GitHub signs the raw request bytes. In production, compute the HMAC over the raw body
// (e.g. via express.raw or a body-parser verify callback) rather than JSON.stringify(req.body),
// and compare with crypto.timingSafeEqual to avoid timing attacks.
const signature = req.headers['x-hub-signature-256'];
const hmac = crypto.createHmac('sha256', WEBHOOK_SECRET); // WEBHOOK_SECRET: your configured webhook secret
const digest = 'sha256=' + hmac.update(JSON.stringify(req.body)).digest('hex');
if (signature !== digest) {
return res.status(401).send('Invalid signature');
}
// Handle event
if (req.body.action === 'completed' && req.body.workflow_run.conclusion === 'failure') {
await handleWorkflowFailure(req.body.workflow_run);
}
res.status(200).send('OK');
});
Hint 2: Log Analysis Download logs and extract the failure:
const logs = await octokit.actions.downloadWorkflowRunLogs({
owner,
repo,
run_id,
});
// Note: this endpoint returns a ZIP archive of log files; unzip and concatenate the text
// files first (treated as a plain string below for brevity).
// Parse logs to find error
const errorPattern = /Error: (.+)\n\s+at (.+):(\d+):(\d+)/;
const match = logs.match(errorPattern);
if (match) {
const [, message, file, line, column] = match;
return { message, file, line: parseInt(line), column: parseInt(column) };
}
Hint 3: Headless Kiro Invocation Run Kiro in non-interactive mode:
$ kiro --headless --prompt "Fix the null check error in AuthService.ts line 45" \
--files src/services/AuthService.ts \
--output-branch fix/auth-service-null-check \
--auto-commit
Or via API if Kiro has one:
const result = await kiro.executeTask({
instruction: "Add null check before token.split() on line 45",
files: ['src/services/AuthService.ts'],
branch: 'fix/auth-service-null-check',
runTests: true,
});
Hint 4: Verification Loop After generating a fix:
- Check out the fix branch
- Run `npm install` (or `pip install`, etc.)
- Run `npm test`
- Parse test output:
  - If all pass → open PR
  - If some fail → analyze failures and retry
  - If all fail → abort and notify a human
A minimal sketch of this loop follows, using Node's `child_process`.
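The branch name and commands in the sketch are placeholders; a real agent would stream output, enforce timeouts, and parse the test report more carefully:
```typescript
import { execSync } from 'node:child_process';

// Run a shell command and capture success plus combined output.
function run(cmd: string): { ok: boolean; output: string } {
  try {
    return { ok: true, output: execSync(cmd, { encoding: 'utf8', stdio: 'pipe' }) };
  } catch (err: any) {
    return { ok: false, output: `${err.stdout ?? ''}${err.stderr ?? ''}` };
  }
}

// Decide what to do after applying a fix on the given branch.
function verifyFix(branch: string): 'open-pr' | 'retry' | 'abort' {
  run(`git checkout ${branch}`);
  run('npm install');
  const test = run('npm test');

  if (test.ok) return 'open-pr';                      // all tests pass → open the PR
  if (/\d+ passed/.test(test.output)) return 'retry'; // partial failure → analyze and retry
  return 'abort';                                     // nothing passes → notify a human
}

console.log(verifyFix('fix/auth-service-null-check'));
```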
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Webhooks | “Webhooks: Events for RESTful APIs” by Mike Amundsen | Ch. 2-3 |
| CI/CD | “Continuous Delivery” by Jez Humble | Ch. 10 |
| Root cause analysis | “Release It!” by Michael Nygard | Ch. 4 |
| Autonomous systems | “Building Event-Driven Microservices” by Adam Bellemare | Ch. 6 |
Common Pitfalls and Debugging
Problem 1: “Agent creates duplicate PRs for the same failure”
- Why: Webhook is delivered multiple times or agent doesn’t track what it’s fixed
- Fix: Store a hash of (run_id + failure_message) in a database to deduplicate (a minimal sketch follows this list)
- Quick test: Trigger same failure twice — does it create one PR or two?
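An in-memory version of that idea; a production agent would persist the hashes (for example in Redis or a database) rather than in a process-local `Set`:
```typescript
import { createHash } from 'node:crypto';

// Deduplicate webhook deliveries by hashing run id + failure message.
const seenFailures = new Set<string>();

function shouldHandle(runId: number, failureMessage: string): boolean {
  const key = createHash('sha256').update(`${runId}:${failureMessage}`).digest('hex');
  if (seenFailures.has(key)) return false; // already working on (or already fixed) this failure
  seenFailures.add(key);
  return true;
}

console.log(shouldHandle(123456789, "TypeError: Cannot read property 'split' of undefined")); // true
console.log(shouldHandle(123456789, "TypeError: Cannot read property 'split' of undefined")); // false
```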
Problem 2: “Fix causes other tests to fail”
- Why: Fix is too aggressive or changes behavior elsewhere
- Fix: Run full test suite before opening PR; if new failures appear, rollback
- Quick test: Generate a fix that breaks a different test — does agent catch it?
Problem 3: “Agent leaks API keys or secrets in PRs”
- Why: Logs or fix code include sensitive data
- Fix: Use a secret sanitization hook before commits
- Quick test: Simulate a failure with API key in logs — is it redacted in PR?
Problem 4: “Infinite loop: fix fails → new fix → fails → …”
- Why: No circuit breaker for repeated failures
- Fix: Limit retries to 3; if all fail, notify a human and stop (sketched below)
- Quick test: Create an unfixable failure — does agent stop after 3 tries?
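The threshold of 3 and the in-memory counter here are assumptions; adapt both to your environment:
```typescript
// Cap fix attempts per failure so the agent cannot loop forever.
const attempts = new Map<string, number>();
const MAX_ATTEMPTS = 3;

function recordAttempt(failureKey: string): 'continue' | 'stop-and-notify' {
  const count = (attempts.get(failureKey) ?? 0) + 1;
  attempts.set(failureKey, count);
  return count <= MAX_ATTEMPTS ? 'continue' : 'stop-and-notify';
}

// The fourth attempt on the same failure trips the breaker.
for (let i = 0; i < 4; i++) {
  console.log(recordAttempt('run-123:null-check')); // continue ×3, then stop-and-notify
}
```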
Definition of Done
- Webhook server receives GitHub Actions failure events
- Agent downloads and parses workflow logs to find root cause
- Agent diagnoses the failure type (null check, type error, etc.)
- Agent generates a code fix using headless Kiro
- Agent creates a branch, commits the fix, and runs tests
- All tests must pass before opening a PR
- Agent opens a PR with detailed description and test results
- Agent logs all actions for audit trail
- Circuit breaker prevents infinite retry loops
- Secrets are sanitized before commits
- At least one real workflow failure is automatically fixed end-to-end
Project Comparison Table
| Project Range | Difficulty | Focus | Cool Factor |
|---|---|---|---|
| 1-5 (Foundations) | Beginner | Config, Context, Planning | 2/5 |
| 6-10 (Steering) | Intermediate | Personas, Specs, PBT | 3/5 |
| 11-16 (MCP) | Advanced | DBs, Cloud, Tools | 4/5 |
| 17-24 (Hooks/Remote) | Advanced | Security, SSH, Safety | 4/5 |
| 25-32 (Workflows) | Mixed | Tangents, Checkpoints, Docs | 3/5 |
| 33-40 (Capstone) | Master | Full Autonomy | 5/5 |
Summary
This learning path covers Kiro CLI through 40 hands-on projects.
| # | Project Name | Focus |
|---|---|---|
| 1 | Personalized Config | Configuration |
| 2 | Steering Enforcer | Prompt Engineering |
| 3 | Context Detective | LLM Context |
| 4 | Subagent Researcher | Agent Delegation |
| 5 | Plan Architect | Spec-Driven Dev |
| 6 | Custom Persona | Agent Config |
| 7 | Executable Spec | Documentation |
| 8 | PBT Suite | Testing |
| 9 | Postgres Analyst | MCP (Database) |
| 10 | GitHub Manager | MCP (Workflow) |
| 11 | AWS Architect | MCP (Cloud) |
| 12 | Doc Librarian | MCP (RAG) |
| 13 | Custom Tool (Py) | MCP (Protocol) |
| 14 | FS Guardian (Node) | MCP (Protocol) |
| 15 | Chrome Puppeteer | Browser Automation |
| 16 | Design to Code | Multimodal |
| 17 | Type-Safe Hook | Bun / TypeScript |
| 18 | Security Firewall | Policy / Governance |
| 19 | Auto-Fixer Loop | Feedback Loops |
| 20 | Git Context Injector | Context Automation |
| 21 | Headless Setup | Remote Dev |
| 22 | SSH Tunnel Agent | Networking |
| 23 | Corporate Proxy | Enterprise Ops |
| 24 | Secret Sanitizer | Security |
| 25 | Tangent Explorer | Context Management |
| 26 | Checkpoint Time Machine | Safety |
| 27 | Checklist Manager | Task Execution |
| 28 | Semantic Search | RAG |
| 29 | Delegate Worker | Async Tasks |
| 30 | Recursive Improver | Metacognition |
| 31 | Legacy Archaeologist | Code Exploration |
| 32 | Reverse Documenter | Documentation |
| 33 | Full Stack Scaffolder | Prototyping |
| 34 | Cloud Native Deployer | DevOps |
| 35 | Deep Reasoner | Algorithms |
| 36 | Global Translator | i18n |
| 37 | SQL Optimizer | Performance |
| 38 | Refactoring Surgeon | Architecture |
| 39 | API Client Generator | Integration |
| 40 | Autonomous Developer | Full Autonomy |
Recommended Learning Path
For beginners: Start with Project 1, 2, 5. Get comfortable with steering first.
For system architects: Jump to Project 9 (MCP) and Project 17 (Hooks).
For DevOps/SRE: Focus on Project 21 (Headless) and Project 40 (Capstone).