Project 3: “The Context Detective” — LLM Context Window
| Attribute | Value |
|---|---|
| File | KIRO_CLI_MASTERY.md |
| Main Programming Language | Bash / Kiro Commands |
| Coolness Level | Level 2: Practical |
| Difficulty | Level 1: Beginner |
| Knowledge Area | LLM Context Window |
What you’ll build: Experiments that visualize token usage. Load a large file, then use `/context show`, `/compact`, and `/context remove` to watch usage rise and fall.
Why it teaches Context: You learn to garbage-collect the AI’s memory to maintain performance and accuracy.
Core challenges you’ll face:
- Distinguishing file context from chat history.
- Detecting when instructions are being forgotten.
Success criteria:
- You can reduce context size without losing key constraints.
Real World Outcome
You’ll understand exactly how Kiro manages its 200K token context window and learn to use commands to prevent context overflow. You’ll be able to see what’s consuming context and strategically free up space.
Example Output:
$ kiro chat
You: "Load the entire src/ directory"
Kiro: [reads 250 files totaling 140K tokens]
Context usage: 90% (180K / 200K tokens)
⚠️ WARNING: Approaching context limit
I've loaded your source code, but we're at 90% context capacity.
Consider using `/compact` to summarize our conversation or
`/context remove` to free up space.
You: "/context show"
Kiro:
┌─────────────────────────────────────────────────┐
│ Context Window Usage: 180K / 200K (90%)         │
├─────────────────────┬───────────┬───────────────┤
│ Category            │ Tokens    │ Percentage    │
├─────────────────────┼───────────┼───────────────┤
│ Files (src/)        │ 140K      │ 70%           │
│ Chat History        │ 25K       │ 12.5%         │
│ Steering Files      │ 8K        │ 4%            │
│ System Prompt       │ 5K        │ 2.5%          │
│ Tools/Resources     │ 2K        │ 1%            │
└─────────────────────┴───────────┴───────────────┘
Top file consumers:
1. src/components/Dashboard.tsx → 18K tokens
2. src/api/handlers.ts → 15K tokens
3. src/lib/database.ts → 12K tokens
You: "/context remove src/components/Dashboard.tsx"
Kiro: Removed src/components/Dashboard.tsx (18K tokens)
New context usage: 162K / 200K (81%)
You: "/compact"
Kiro: [summarizes conversation history]
Compacted 25K tokens of chat history → 5K token summary
New context usage: 142K / 200K (71%)
Summary preserved:
- Your project uses Next.js 14 with TypeScript
- Current task: debugging authentication flow
- Key files: src/api/handlers.ts, src/lib/database.ts
You: "Now analyze the auth flow"
Kiro: [works with 71% context usage]
Looking at src/api/handlers.ts...
[analysis continues with plenty of context headroom]
The context usage meter updates in real time as files and messages consume space.
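You can approximate this “top file consumers” view from the shell before ever opening a chat. A minimal sketch, assuming the words × 1.3 ≈ tokens heuristic used later in this project; Kiro’s actual tokenizer will count somewhat differently, and the `src` path is just an example:

```bash
#!/usr/bin/env bash
# Rough per-file token report for a source tree, mimicking the
# "Top file consumers" view above. Assumes words * 1.3 ~ tokens;
# Kiro's real tokenizer will differ. Paths with spaces will confuse
# the final awk column split -- acceptable for a quick estimate.
dir="${1:-src}"

find "$dir" -type f \( -name '*.ts' -o -name '*.tsx' \) -print0 |
  while IFS= read -r -d '' f; do
    words=$(wc -w < "$f")
    echo "$(( words * 13 / 10 )) $f"   # integer approximation of words * 1.3
  done |
  sort -rn | head -5 |
  awk '{ printf "%2d. %s → ~%.0fK tokens\n", NR, $2, $1/1000 }'
```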
The Core Question You’re Answering
“What happens when I load too much code into Kiro’s context, and how do I manage the 200K token budget without losing important information?”
Before experimenting, understand: Kiro’s context window is like RAM—finite and precious. Once full, either Kiro auto-compacts (potentially losing details) or refuses to load more. This project teaches you to be a context window architect: strategic about what you load, when to summarize, and how to preserve critical constraints.
Concepts You Must Understand First
Stop and research these before experimenting:
- Token Counting
- What is a “token” in LLM terms?
- How many tokens does a typical code file consume?
- Do comments, whitespace, and variable names count as tokens?
- Reference: Kiro Context Management
- Context Window Composition
- What’s the breakdown of Kiro’s 200K context? (files, chat, steering, system prompt; a percentage sketch follows this list)
- Which components are fixed (system prompt) vs dynamic (chat history)?
- How does adding a steering file affect available space?
- Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 1 (Foundations)
- Compaction vs Removal
- What’s the difference between `/compact` (summarize) and `/context remove` (delete)?
- What information is lost during compaction?
- When should you compact vs when should you remove?
- Reference: Slash Commands Reference
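As referenced above, a quick way to internalize the composition breakdown is to recompute the percentage column of the `/context show` table yourself. A throwaway sketch using the illustrative figures from the sample output, not live measurements:

```bash
#!/usr/bin/env bash
# Recompute the percentage column of the sample /context show table.
# Category sizes are the illustrative numbers from the example output.
window=200000
while read -r name tokens; do
  awk -v n="$name" -v t="$tokens" -v w="$window" \
    'BEGIN { printf "%-16s %6.1fK  %5.1f%%\n", n, t/1000, 100*t/w }'
done <<'EOF'
Files(src/) 140000
ChatHistory 25000
SteeringFiles 8000
SystemPrompt 5000
Tools/Resources 2000
EOF
```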
Questions to Guide Your Design
Before experimenting, think through these:
- Loading Strategy
- Should you load the entire codebase at once or selectively load files as needed?
- How do you decide which files are “important enough” to keep in context?
- What’s the tradeoff between having more context vs faster responses?
- Compaction Timing
- Should you wait for Kiro’s auto-compact (80% threshold) or manually compact earlier?
- What information must survive compaction? (steering rules, architectural decisions, bug context)
- How do you verify that compaction preserved the right details?
- Multi-File Workflows
- When debugging across 10 files, how do you keep all relevant context loaded?
- How do you avoid reloading files you’ve already removed?
- Should you use subagents for parallel file analysis instead?
Thinking Exercise
Exercise: Context Budget Allocation
You have 200K tokens. You’re debugging a Next.js authentication bug. Plan your context budget:
Available: 200K tokens
Fixed costs:
- System prompt: 5K
- Steering files: 8K
- Tools/Resources: 2K
─────────────────────────
Remaining budget: 185K
You need to analyze:
- `src/auth/login.tsx` (12K tokens)
- `src/api/auth.ts` (8K tokens)
- `src/lib/session.ts` (6K tokens)
- `src/middleware.ts` (4K tokens)
- `.env.example` (1K tokens)
- Chat history will grow over 50 messages (~25K tokens)
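It can help to tally this mechanically before answering the planning questions. A quick sketch using only the estimates above:

```bash
#!/usr/bin/env bash
# Tally the context budget for this exercise.
# All token counts are the exercise's estimates, not measured values.
window=200000
fixed=$(( 5000 + 8000 + 2000 ))                  # system prompt + steering + tools
files=$(( 12000 + 8000 + 6000 + 4000 + 1000 ))   # the five files listed above
chat=25000                                       # projected chat-history growth

used=$(( fixed + files + chat ))
printf 'Fixed costs:      %3dK\n' $(( fixed / 1000 ))
printf 'Files to load:    %3dK\n' $(( files / 1000 ))
printf 'Chat (projected): %3dK\n' $(( chat / 1000 ))
printf 'Headroom:         %3dK of %dK\n' $(( (window - used) / 1000 )) $(( window / 1000 ))
```

On these estimates everything fits with roughly 129K of headroom, so the interesting question is not whether it fits but how much buffer you want to keep below the 80% auto-compact threshold.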
Questions while planning:
- How much space should you reserve for growing chat history?
- If you run out of space mid-conversation, which file would you remove first?
- Should you proactively compact at 60% or wait until 80%?
- Could you use `/grep` to search files instead of loading them entirely?
The Interview Questions They’ll Ask
- “Explain how Kiro’s 200K context window is allocated between files, chat history, and system components.”
- “What’s the difference between `/compact` and `/context remove`? When would you use each?”
- “How would you debug an issue across 20 files without exceeding the context window?”
- “What information is lost when Kiro auto-compacts at 80% context usage?”
- “How do steering files affect available context space?”
- “What strategies would you use to work with a codebase larger than 200K tokens?”
Hints in Layers
Hint 1: Monitor Before You Act
Always run `/context show` before making decisions. Don’t guess about usage; measure it. This shows exactly what’s consuming space.
Hint 2: Load Incrementally
Don’t run `/context add src/` to load everything. Instead, load specific files: `/context add src/auth/login.tsx`. Add more only when needed. Start small, expand gradually.
Hint 3: Use Grep for Reconnaissance
Before loading a file into context, use `/grep` to search it. Example: `/grep "authenticate" src/auth/login.tsx`. This finds info without burning context tokens.
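The same reconnaissance works at the shell level with ordinary Unix `grep` (this is plain `grep`, not Kiro’s `/grep`, but it plays the same scouting role):

```bash
# Scout files before spending context on them, using plain Unix grep.
grep -n "authenticate" src/auth/login.tsx   # matching lines, with line numbers
grep -c "authenticate" src/auth/login.tsx   # just the match count
grep -rl "authenticate" src/                # which files mention it at all
```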
Hint 4: Compact Early and Often
Don’t wait until 90% usage. When you finish a subtask (e.g., “fixed login bug”), run `/compact` to summarize that work and free up space for the next subtask.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Token-based language models | “Speech and Language Processing” by Jurafsky & Martin | Ch. 3: N-gram Language Models |
| Memory management principles | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 9: Virtual Memory |
| Resource allocation strategies | “Designing Data-Intensive Applications” by Kleppmann | Ch. 1: Foundations of Data Systems |
Common Pitfalls & Debugging
Problem 1: “Kiro auto-compacted and forgot my instructions”
- Why: Instructions were only in chat history, not in steering files or repeated in context
- Fix: Put persistent instructions in `.kiro/steering/*.md`, not in chat messages
- Quick test: After `/compact`, ask Kiro to repeat your key constraints; they should still be remembered
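If you want something concrete to test with, you can create a steering file from the shell. A hypothetical example: the filename and constraint wording are illustrative, echoing constraints used elsewhere in this project:

```bash
# Hypothetical steering file: constraints that must survive /compact.
# The path pattern comes from the Fix above; the contents are examples.
mkdir -p .kiro/steering
cat > .kiro/steering/constraints.md <<'EOF'
# Persistent Constraints
- Authentication must use JWT tokens with a 1-hour expiry
- Project uses Next.js 14 with TypeScript
EOF
```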
Problem 2: “Context fills up instantly when I load files”
- Why: Loaded the entire directory with `/context add src/` instead of specific files
- Fix: Remove everything with `/context clear`, then load only essential files one by one
- Quick test: `/context show` → check if you’re loading files you don’t actually need
Problem 3: “Can’t load more files even though context shows 40%”
- Why: An individual file might be too large (>50K tokens), or you may be hitting rate limits
- Fix: Use `/grep` to search the file instead of loading it, or load only the relevant sections
- Quick test: Check file size with `wc -w filename.tsx` (words × 1.3 ≈ tokens)
Problem 4: “After compaction, Kiro gives different answers”
- Why: Compaction creates a lossy summary; nuance is lost
- Fix: Before compacting, explicitly state what must be remembered: “Key constraint: authentication must use JWT tokens with 1-hour expiry”
- Quick test: Ask Kiro “what are the key constraints?” before and after `/compact`
Definition of Done
- Ran `/context show` and understand the breakdown of usage
- Loaded a large file (>10K tokens) and observed the context percentage increase
- Used `/context remove <file>` to free up space and verified the percentage decrease
- Triggered auto-compaction by reaching 80% context usage (or manually ran `/compact`)
- Verified that key instructions survived compaction
- Experimented with `/grep` as an alternative to loading full files
- Identified which components are fixed (system prompt) vs dynamic (chat history)
- Can explain when to use compact vs remove vs subagents for large tasks