Project 20: "The Git Context Injector" — Context Management

Attribute	Value
File	`KIRO_CLI_LEARNING_PROJECTS.md`
Main Programming Language	Bash
Coolness Level	Level 2: Practical
Difficulty	Level 2: Intermediate
Knowledge Area	Context Management

What you’ll build: A UserPromptSubmit hook that appends the output of git diff --staged to each prompt.

Why it teaches Dynamic Context: The AI always sees the current change set.

Success criteria:

Prompt includes diff content automatically.

Real World Outcome

You’ll have a context injector that automatically enriches every Kiro prompt with git state information, ensuring the AI always has visibility into what code is currently changed, staged, or uncommitted. This eliminates the need to manually paste git diff output:

Without the hook:

$ kiro "write tests for the changes I just made"

Kiro: I don't see any recent changes in the conversation. Can you share what files you modified?

With the hook:

$ git add src/auth.ts  # Stage your changes

$ kiro "write tests for the changes I just made"

[UserPromptSubmit Hook] Injecting git context...

Enhanced prompt sent to Kiro:
────────────────────────────────────
Original: "write tests for the changes I just made"

Git Context:
Branch: feature/oauth-login
Status: 1 file changed, 45 insertions(+), 12 deletions(-)

Staged Changes:
diff --git a/src/auth.ts b/src/auth.ts
index 1234567..abcdefg 100644
--- a/src/auth.ts
+++ b/src/auth.ts
@@ -10,7 +10,15 @@ export class AuthService {
-  async login(username: string, password: string) {
-    return this.basicAuth(username, password);
+  async login(provider: 'google' | 'github', token: string) {
+    const user = await this.oauthVerify(provider, token);
+    return this.createSession(user);
   }
+
+  private async oauthVerify(provider: string, token: string) {
+    // New OAuth verification logic
+  }
────────────────────────────────────

Kiro: I can see you've refactored the login method to support OAuth. I'll write comprehensive tests for both Google and GitHub OAuth flows, covering token validation, user creation, and session management.

[Kiro writes auth.test.ts with OAuth-specific tests]

Context injection report:

$ bash analyze-context-usage.sh

Git Context Injector Report (Last 30 Days)
───────────────────────────────────────────
Total Prompts: 1,847
Context Injected: 1,245 (67%)
Context Skipped: 602 (33% - no staged changes)

Average Context Size:
- Staged diff: 234 lines
- Unstaged diff: 127 lines
- Recent commits: 3 commits

Top Use Cases:
1. "Write tests for these changes" (387 prompts)
2. "Review this code" (298 prompts)
3. "Fix the bug I introduced" (156 prompts)
4. "Document what I changed" (89 prompts)

Token Budget Impact:
- Average context added: 1,200 tokens
- Prompts that exceeded budget: 12 (0.6%)
- Context truncation applied: 8 times

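The hook decides which git information is relevant to each prompt (for example, it skips injection when nothing is staged) and places it under a clear “Git Context” label so the model can tell the diff apart from your request.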

The Core Question You’re Answering

“How do I give AI visibility into my current work context without manually pasting git diffs every time?”

Before you start coding, consider: AI is stateless—it doesn’t know what files you changed, what branch you’re on, or what you committed yesterday. Developers waste time copying git diff output or explaining “I just modified the auth file.” A context injector automates this, making every prompt git-aware. This project teaches you to augment prompts with dynamic, session-specific context that makes AI more effective.

Concepts You Must Understand First

Stop and research these before coding:

UserPromptSubmit Hook Lifecycle
- When does UserPromptSubmit execute (before prompt is sent to LLM)?
- Can you modify the user’s prompt text?
- How do you append context without overwriting the original prompt?
- What is the maximum context size before truncation is needed?
- Book Reference: Kiro CLI documentation - Hook System Architecture
Git Plumbing Commands
- How do you get staged changes only (git diff --cached)?
- How do you get unstaged changes (git diff)?
- How do you get recent commit history (git log -n 3 --oneline)?
- How do you check if you’re in a git repository (git rev-parse --is-inside-work-tree)?
- Book Reference: “Pro Git” by Scott Chacon and Ben Straub - Ch. 10 (Git Internals)
Context Relevance Heuristics
- When should you inject diff (user mentions “changes”, “modified”, “tests”)?
- When should you skip injection (generic questions about unrelated topics)?
- How do you detect if the user is asking about code vs asking about concepts?
- Should you always inject branch name and commit history?
- Book Reference: None - requires experimentation and user feedback
Token Budget Management
- How many tokens does a typical diff consume?
- How do you truncate large diffs (>100 files changed)?
- Should you prioritize staged changes over unstaged?
- How do you summarize commits vs including full diffs?
- Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 4 (Encoding)
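
The git commands named in the list above can be tried together before any hook wiring exists. The following is a minimal sketch, not a definitive implementation: `collect_git_state` is a hypothetical helper name, and the output labels are illustrative.

```bash
# Sketch: gather the git facts the concept list asks about.
# collect_git_state is a hypothetical helper, not part of Kiro or git.

collect_git_state() {
  # Bail out quietly when the current directory is not inside a work tree.
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || return 1

  # Current branch; symbolic-ref fails on a detached HEAD, so use a label.
  branch=$(git symbolic-ref --quiet --short HEAD 2>/dev/null) || branch="(detached HEAD)"
  echo "Branch: $branch"

  # Staged changes: the index compared with HEAD.
  staged=$(git diff --cached --shortstat | sed 's/^ *//')
  echo "Staged: ${staged:-none}"

  # Unstaged changes: the work tree compared with the index.
  unstaged=$(git diff --shortstat | sed 's/^ *//')
  echo "Unstaged: ${unstaged:-none}"

  # Last three commits, one line each (prints nothing before the first commit).
  echo "Recent commits:"
  git log -n 3 --oneline 2>/dev/null
  return 0
}
```

Running `collect_git_state` in a repository with both staged and unstaged edits makes the difference between `git diff --cached` and plain `git diff` visible in one place.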

Questions to Guide Your Design

Before implementing, think through these:

Context Selection
- Should you inject staged changes, unstaged changes, or both?
- Do you include recent commits, or only uncommitted work?
- How do you decide between git diff and git show HEAD?
- Should you include file renames, binary file changes, or submodule updates?
Prompt Augmentation
- Where do you inject context (before prompt, after, or in structured fields)?
- How do you format diffs for readability (syntax highlight, collapse large hunks)?
- Should you summarize (“3 files changed, 45 insertions”) or show full diffs?
- Do you annotate the context (“Git Context:” header) or inject silently?
Trigger Logic
- Do you inject context on every prompt or only when relevant?
- How do you detect relevance (keyword matching, NLP, always-on)?
- Should users be able to opt out (--no-git-context flag)?
- What if there are no changes—do you inject “no changes” or skip entirely?
Performance and Safety
- How do you handle repositories with thousands of files changed?
- Should you run git diff synchronously or cache results?
- What if git diff takes 10 seconds (large binary files)?
- How do you avoid leaking secrets in diffs (API keys, passwords)?
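
Two of the questions above, opting out and slow diffs, can be explored with a small wrapper. This is a sketch under stated assumptions: `KIRO_NO_GIT_CONTEXT` and `GIT_CONTEXT_TIMEOUT` are hypothetical environment variable names invented for this example (they are not Kiro settings), and the time limit relies on the `timeout` utility from GNU coreutils, which may be missing on some systems.

```bash
# Sketch: diff collection with an opt-out switch and a time limit.
# KIRO_NO_GIT_CONTEXT and GIT_CONTEXT_TIMEOUT are hypothetical names chosen
# for this example; they are not Kiro settings.

staged_diff_bounded() {
  # Opt-out: any non-empty value disables injection for this call.
  if [ -n "${KIRO_NO_GIT_CONTEXT:-}" ]; then
    return 0
  fi

  limit="${GIT_CONTEXT_TIMEOUT:-2}"   # seconds

  if command -v timeout >/dev/null 2>&1; then
    # Capture first so a killed diff never leaks partial output.
    out=$(timeout "$limit" git diff --cached 2>/dev/null)
    rc=$?
    if [ "$rc" -eq 124 ]; then
      # timeout(1) exits with 124 when the limit is reached.
      echo "[git diff exceeded ${limit}s; context skipped]"
      return 0
    fi
  else
    # No timeout(1) on this system: fall back to an unbounded call.
    out=$(git diff --cached 2>/dev/null)
  fi

  if [ -n "$out" ]; then
    printf '%s\n' "$out"
  fi
  return 0
}
```

An environment variable is used here instead of a command-line flag because a hook script may not see the arguments passed to the CLI; whether a flag is available to hooks is something to confirm in the Kiro documentation.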

Thinking Exercise

Manual Context Injection Walkthrough

Before writing code, trace how your hook enhances different prompts:

Scenario 1: User asks about recent changes

User prompt: "Review the authentication changes I made"

Hook detects keywords: ["changes", "made"]
Hook runs: git diff --cached

Injected context:
Branch: feature/oauth
Staged: src/auth.ts (+45, -12)

Enhanced prompt:
"Review the authentication changes I made

Git Context:
<diff output>
"

Scenario 2: User asks generic question

User prompt: "What is the difference between OAuth and JWT?"

Hook detects: No code-specific keywords
Hook decision: Skip git context (not relevant)

Prompt sent unchanged:
"What is the difference between OAuth and JWT?"

Scenario 3: Large diff (500 files changed)

User prompt: "Write tests for my refactor"

Hook detects: 500 files changed in staging area
Hook decision: Truncate to top 10 most-changed files

Enhanced prompt:
"Write tests for my refactor

Git Context (truncated to top 10 files):
src/auth.ts (+200, -50)
src/db.ts (+150, -30)
...
[490 more files not shown]
"

Scenario 4: No changes staged

User prompt: "Explain this error message"

Hook runs: git diff --cached
Result: No output (nothing staged)

Hook decision: Inject "No staged changes" summary

Enhanced prompt:
"Explain this error message

Git Context: No staged changes. Branch: main (up to date)
"

Questions while tracing:

How do you balance verbosity (full diffs) vs conciseness (summaries)?
Should you always show branch name, even if it’s not relevant?
How do you handle merge conflicts in diffs?
What if the user’s prompt is already very long—do you still inject context?
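
Scenario 3's "top 10 most-changed files" rule can be approximated with `git diff --cached --numstat`, which prints added and deleted line counts per file. The sketch below ranks files by total changed lines; `top_changed_files` is a hypothetical helper name, and the output format simply mirrors the scenario text.

```bash
# Sketch: rank staged files by changed lines and keep the top N.
# top_changed_files is a hypothetical helper name.

top_changed_files() {
  limit="${1:-10}"

  # --numstat prints "<added><TAB><deleted><TAB><path>"; binary files show "-".
  stats=$(git diff --cached --numstat) || return 1
  [ -n "$stats" ] || { echo "Git Context: No staged changes."; return 0; }

  total=$(printf '%s\n' "$stats" | wc -l | tr -d ' ')

  if [ "$total" -gt "$limit" ]; then
    echo "Git Context (truncated to top $limit files):"
  else
    echo "Git Context:"
  fi

  # Prefix each file with added + deleted, sort on that number (largest
  # first), keep N rows, then drop the sort key. Binary counts become 0.
  printf '%s\n' "$stats" |
    awk -F '\t' '{ a = ($1 == "-") ? 0 : $1; d = ($2 == "-") ? 0 : $2
                   printf "%d\t%s (+%d, -%d)\n", a + d, $3, a, d }' |
    sort -t "$(printf '\t')" -k1,1nr |
    head -n "$limit" |
    cut -f2-

  if [ "$total" -gt "$limit" ]; then
    echo "[$((total - limit)) more files not shown]"
  fi
}
```

Ranking by line count is only one possible notion of "most important"; a large generated file can crowd out a small but critical change, so this is worth comparing against alternatives such as prioritizing files named in the prompt.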

The Interview Questions They’ll Ask

Prepare to answer these:

“How would you design a heuristic to determine when git context is relevant to a user’s prompt vs when it’s just noise?”
“Explain the difference between git diff, git diff --cached, and git diff HEAD. When would you use each?”
“How would you handle a scenario where the git diff output contains sensitive information like API keys or passwords?”
“What strategies would you use to truncate large diffs (500+ files changed) while preserving the most important information?”
“How would you implement caching for git commands to avoid running expensive operations on every prompt?”
“Explain how you would detect if a user’s prompt is asking about code (inject context) vs asking a conceptual question (skip context).”

Hints in Layers

Hint 1: Start with Always-On Injection

Begin by injecting git context on every prompt. Don’t implement keyword detection yet. Get the basic flow working: read stdin (user prompt), run git diff --cached, append to prompt, output to stdout.

Hint 2: Check for Git Repository First

Before running git commands, verify you’re in a repo:

if ! git rev-parse --is-inside-work-tree &>/dev/null; then
  # Not a git repo, skip injection
  echo "$original_prompt"
  exit 0
fi

Hint 3: Format Context for Readability

Use markdown fences to make diffs clear:

echo "$original_prompt"
echo ""
echo "Git Context:"
echo '```diff'
git diff --cached
echo '```'

Hint 4: Truncate Large Diffs

Limit diff size to avoid token budget issues:

diff_lines=$(git diff --cached | wc -l)
if [ "$diff_lines" -gt 500 ]; then
  git diff --cached --stat  # Show summary only
else
  git diff --cached
fi
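
Taken together, the four hints could be combined into one function along the following lines. This is a sketch, not the reference implementation. It assumes the contract described in Hint 1 (the raw prompt arrives on stdin and whatever is printed to stdout is what the model sees), which should be checked against the Kiro CLI hook documentation since the real payload format may differ. `inject_git_context` and `MAX_DIFF_LINES` are hypothetical names.

```bash
# Sketch combining Hints 1-4. Assumes the prompt arrives as plain text on
# stdin and that stdout becomes the enhanced prompt (see Hint 1).
# inject_git_context and MAX_DIFF_LINES are hypothetical names.

MAX_DIFF_LINES="${MAX_DIFF_LINES:-500}"

inject_git_context() {
  # Hint 1: read the whole prompt and echo it back first.
  original_prompt=$(cat)
  printf '%s\n' "$original_prompt"

  # Hint 2: outside a git work tree, pass the prompt through unchanged.
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || return 0

  # Run the diff once and reuse the result.
  staged=$(git diff --cached)
  [ -n "$staged" ] || return 0   # nothing staged: skip injection

  diff_lines=$(printf '%s\n' "$staged" | wc -l | tr -d ' ')
  fence=$(printf '\140\140\140')   # three backticks, written as octal escapes

  # Hint 3: label the context and fence it.
  echo ""
  echo "Git Context:"
  if [ "$diff_lines" -gt "$MAX_DIFF_LINES" ]; then
    # Hint 4: large change set, show the per-file summary instead.
    echo "$fence"
    git diff --cached --stat
    echo "$fence"
  else
    echo "${fence}diff"
    printf '%s\n' "$staged"
    echo "$fence"
  fi
}
```

A hook script built on this would end with a single call to `inject_git_context`. To try it by hand, pipe a prompt into it from inside a repository with staged changes: `printf 'write tests' | inject_git_context`.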

Books That Will Help

Topic	Book	Chapter
Git Internals	“Pro Git” by Scott Chacon and Ben Straub	Ch. 10 (Git Internals), Ch. 2 (Git Basics)
Hook System	Kiro CLI documentation	Hooks System, UserPromptSubmit
Shell Scripting	“The Linux Command Line” by William Shotts	Ch. 27 (Flow Control), Ch. 24 (Script Debugging)
Token Management	Kiro CLI docs	Context Window Management
Text Processing	“Unix Power Tools” by Shelley Powers	Ch. 13 (Searching and Substitution)

Common Pitfalls & Debugging

Problem 1: “Hook adds context to every prompt, even unrelated questions”

Why: No relevance detection implemented

Fix: Add keyword matching:

if echo "$prompt" | grep -qiE '(change|diff|commit|modify|test|review)'; then
  inject_git_context
fi

Quick test: Ask “What is Python?” and verify no git context is added
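
Substring matching has a catch: `diff` also matches "difference" and `test` matches "latest", so a conceptual question such as "What is the difference between OAuth and JWT?" would still trigger injection. One possible refinement is whole-word matching, sketched below. `is_code_prompt` is a hypothetical helper name and the keyword list is illustrative, not exhaustive.

```bash
# Sketch: whole-word keyword matching for the relevance check.
# is_code_prompt is a hypothetical helper; the keyword list is illustrative.

is_code_prompt() {
  # -w only accepts matches bounded by non-word characters, so "test" inside
  # "latest" no longer counts. Inflected forms are listed explicitly because
  # -w would otherwise reject "changes" for the keyword "change".
  printf '%s\n' "$1" |
    grep -qiwE 'chang(e|es|ed|ing)|diffs?|commit(s|ted)?|modif(y|ied|ies)|tests?|review(s|ed)?|refactor(ed|ing)?'
}
```

Usage would look like `if is_code_prompt "$prompt"; then inject_git_context; fi`, where `inject_git_context` stands for whatever injection routine the hook already has.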

Problem 2: “Diff output contains API keys or secrets”

Why: No secret scanning before injecting context

Fix: Filter out sensitive patterns:

# Replace the value, not the whole line, so the diff stays readable
git diff --cached | sed -E 's/(API_KEY|SECRET|PASSWORD|TOKEN)=.*/\1=[REDACTED]/'

Quick test: Stage a file with API_KEY=abc123, verify it’s redacted

Problem 3: “Hook is slow, takes 5+ seconds per prompt”

Why: Running git diff on a massive repository every time

Fix: Cache diff results and invalidate on file changes:

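# Key the cache on HEAD plus a checksum of the index, so staging or
# unstaging a file produces a new key instead of serving a stale diff
index_file=$(git rev-parse --git-path index)
cache_key="$(git rev-parse HEAD)-$(cksum < "$index_file" | cut -d' ' -f1)"
cache_file="/tmp/git-context-${cache_key}.cache"
if [ ! -f "$cache_file" ]; then
  git diff --cached > "$cache_file"
fi
cat "$cache_file"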

Quick test: Time hook execution—should be <100ms with cache

Problem 4: “Large diffs break Kiro’s context window”

Why: 500-file refactor generates 50,000 lines of diff

Fix: Implement smart truncation:

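# Run the diff once and reuse the output
diff_output=$(git diff --cached)
diff_lines=$(printf '%s\n' "$diff_output" | wc -l)
if [ "$diff_lines" -gt 500 ]; then
  echo "Git Context (large changeset, showing summary):"
  git diff --cached --stat | head -20
  echo "[...truncated...]"
else
  printf '%s\n' "$diff_output"
fi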

Quick test: Create a large diff, verify it’s summarized
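
A line count is only a proxy for token usage. Another option is to cap the context by an estimated token budget, as in the sketch below. It assumes roughly 4 characters per token, a common rule of thumb that real tokenizers will not match exactly, so the result is an estimate. `truncate_to_token_budget` is a hypothetical helper name and the default of 1,200 tokens is an arbitrary choice.

```bash
# Sketch: cap injected context by an estimated token budget.
# Assumes about 4 characters per token; real tokenizers vary.
# truncate_to_token_budget is a hypothetical helper name.

truncate_to_token_budget() {
  budget_tokens="${1:-1200}"
  max_chars=$((budget_tokens * 4))

  # Emit whole lines until the next one would exceed the character budget.
  awk -v max="$max_chars" '
    { used += length($0) + 1 }          # +1 for the newline
    used > max { truncated = 1; exit }  # exit still runs the END block
    { print }
    END { if (truncated) print "[...truncated to fit token budget...]" }
  '
}
```

It reads from stdin, so it can sit at the end of a pipeline: `git diff --cached | truncate_to_token_budget 1200`. Cutting on line boundaries keeps each emitted diff line intact, although the final hunk may still be incomplete.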

Problem 5: “Hook doesn’t work in subdirectories”

Why: Git commands run from hook’s directory, not user’s cwd

Fix: Detect git root and run commands there:

# Resolve the repository root; exit quietly if the cwd is not inside a repo
git_root=$(git rev-parse --show-toplevel 2>/dev/null) || exit 0
cd "$git_root" || exit 0
git diff --cached

Quick test: Run Kiro from a subdirectory, verify context injection works