Project 7: The Codebase Concierge (Git & PR Agent)

Build a CLI assistant that can understand a local repository, propose safe patches, run tests, and prepare a pull request with an audit trail.

Quick Reference

Attribute	Value
Difficulty	Level 4: Expert
Time Estimate	30–45 hours
Language	Python (Alternatives: Rust, Go)
Prerequisites	Git fluency, testing basics, AST concepts, prompt-injection awareness
Key Topics	code understanding, context pruning, patch planning, tool safety, repo automation, evals

1. Learning Objectives

By completing this project, you will:

Index a repository into searchable units (files, symbols, functions) with metadata.
Retrieve the minimum viable context to solve an issue (context pruning discipline).
Generate patches as structured diffs and validate them with tests/linters.
Build safety rails (read-only vs write, sandboxed commands, approvals).
Produce reproducible “agent traces” (inputs → evidence → patch → test results).

2. Theoretical Foundation

2.1 Core Concepts

Program representation:
- Plain text search finds strings, not structure.
- AST parsing (Tree-sitter, language parsers) finds symbols, definitions, call sites.
Context pruning: LLMs are powerful but context-limited; you must retrieve the smallest set of relevant snippets.
Change safety:
- Always separate “analyze” from “apply changes”.
- Prefer generating patches that humans can review.
- Run tests automatically to catch unintended breakage.
Prompt injection in code: Comments and docs can contain malicious instructions (“ignore your rules”). Treat repository text as untrusted input.
Evals for code agents: You measure success by compilation, test pass rate, and diffs that match intent—not by eloquence.

2.2 Why This Matters

This is the most “real” personal assistant project: it touches your actual work. The patterns you learn (retrieval, safety, structured patching, validation) are directly transferable to production agent systems.

2.3 Common Misconceptions

“Just send the whole repo.” It won’t fit and it increases error rate.
“If the agent can edit, it should auto-commit.” Unsafe by default; commits/PRs should be explicitly requested.
“Passing tests means correct.” Tests help; you also need scope limits and reviewable diffs.

3. Project Specification

3.1 What You Will Build

A terminal assistant that can:

Answer questions about the repo (“Where is auth implemented?”)
Create a plan for a requested change (“Add unit tests for login”)
Generate a patch (diff) and apply it in a controlled way
Run tests and summarize results
Optionally prepare a PR (branch + commit + PR body)

3.2 Functional Requirements

Repo scan: detect language/build system and inventory files.
Symbol index: extract functions/classes/modules (at least for one language).
Retrieval: text search + optional embeddings over code chunks.
Patch generation: produce unified diff with file paths and changes.
Command runner: run safe commands (tests, formatters) with logging.
Guardrails: require explicit user confirmation before file writes or network actions.
PR prep (optional): create branch/commit and draft PR description.

3.3 Non-Functional Requirements

Safety: strict allowlist for commands; default read-only mode.
Reproducibility: save traces (retrieval inputs, selected context, patch, test output).
Maintainability: keep core logic independent of any single repo.
Performance: initial indexing under a few minutes for mid-size repos.

3.4 Example Usage / Output

$ python concierge.py "Add unit tests for the login function in auth.py"

Plan:
  1) Locate login() implementation and dependencies
  2) Identify existing test framework and patterns
  3) Add tests covering success + failure + edge cases
  4) Run tests and report

Proposed patch: 2 files changed
Run tests? (y/n)

4. Solution Architecture

4.1 High-Level Design

┌───────────────┐   request   ┌──────────────────┐   retrieve   ┌───────────────┐
│ CLI/UI         │────────────▶│ Planner/Agent     │────────────▶│ Context Engine │
└───────────────┘             └───────┬───────────┘            └───────┬───────┘
                                       │                                 │
                                       ▼                                 ▼
                               ┌──────────────────┐              ┌───────────────┐
                               │ Patch Generator   │─────────────▶│ Patch Applier  │
                               └───────┬──────────┘              └───────┬───────┘
                                       │                                 │
                                       ▼                                 ▼
                               ┌──────────────────┐              ┌───────────────┐
                               │ Command Runner    │─────────────▶│ Test Results   │
                               └──────────────────┘              └───────────────┘

4.2 Key Components

Component	Responsibility	Key Decisions
Repo analyzer	detect layout, build, tests	start with heuristics + config overrides
Symbol indexer	parse AST and build symbol map	Tree-sitter for multi-language support
Retriever	find minimal context	hybrid: ripgrep + embeddings
Patch pipeline	generate/review/apply diffs	unified diff + explicit confirmation
Command runner	run tests/linters	allowlist + sandbox + timeouts

4.3 Data Structures

from dataclasses import dataclass

@dataclass(frozen=True)
class SymbolRef:
    file_path: str
    kind: str  # "function" | "class" | "module"
    name: str
    start_line: int
    end_line: int

@dataclass(frozen=True)
class ProposedPatch:
    diff_text: str
    files_touched: list[str]
    rationale: str

4.4 Algorithm Overview

Key Algorithm: safe code change loop

Interpret the request and produce an explicit plan.
Retrieve relevant code context (symbols + surrounding lines).
Generate a patch as a diff, not direct edits.
Present patch summary; require confirmation to apply.
Apply patch and run tests; summarize results.
If tests fail, retrieve failure context and propose a bounded fix.

Complexity Analysis:

Indexing: O(files) parsing time
Each request: O(retrieval + LLM calls + test runtime)

5. Implementation Guide

5.1 Development Environment Setup

python -m venv .venv
source .venv/bin/activate
pip install pydantic rich tree_sitter

5.2 Project Structure

codebase-concierge/
├── src/
│   ├── cli.py
│   ├── analyze.py
│   ├── index/
│   ├── retrieve.py
│   ├── plan.py
│   ├── patch.py
│   ├── run.py
│   └── safety.py
└── data/
    └── traces/

5.3 Implementation Phases

Phase 1: Read-only Q&A with retrieval (8–12h)

Goals:

Answer “where is X?” questions with citations to file/lines.

Tasks:

Build repo analyzer + rg-based retrieval.
Add symbol index for one language (start with Python).
Return answers with referenced file paths and line ranges.

Checkpoint: “Where is auth implemented?” returns correct files quickly.

Phase 2: Patch generation + apply with confirmation (10–15h)

Goals:

Generate diffs and apply them safely.

Tasks:

Implement diff generation and patch validation (paths, no binary edits).
Add explicit confirmation before writes.
Save full trace (context → patch → apply result).

Checkpoint: You can add a small feature (e.g., rename function) safely.

Phase 3: Test runner + fix loop (12–18h)

Goals:

Run tests and iteratively repair failures.

Tasks:

Detect test framework and run commands.
Parse failures into structured summaries.
Add bounded “fix once” loop with new patch.

Checkpoint: For a requested test addition, tests pass with traceable steps.

5.4 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Retrieval	text-only vs AST-assisted	hybrid	structure reduces false positives
Write mode	always-on vs gated	gated	safety and trust
Patching	direct file edits vs diff	diff	reviewable and reproducible

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit	safety/patch	reject unsafe commands, validate diffs
Integration	retrieval/index	ensure symbol index matches files
Scenario	end-to-end	“add unit test”, “refactor name”

6.2 Critical Test Cases

Prompt injection: repo contains malicious comment → assistant ignores it.
Unsafe command: request to run rm -rf → assistant refuses.
Patch validity: malformed diff is rejected and not applied.

7. Common Pitfalls & Debugging

Pitfall	Symptom	Solution
Too much context	model edits wrong area	tighten retrieval and show selected context
Hidden side effects	change breaks unrelated modules	require tests; minimize patch scope
Build variance	tests behave differently	capture environment info and command logs
Over-automation	agent acts without consent	strict write-mode gating

Debugging strategies:

Store traces for replay; make the assistant explain its retrieval set.
Start with read-only mode until retrieval feels trustworthy.

8. Extensions & Challenges

8.1 Beginner Extensions

Add “explain file” and “summarize module” commands.
Add local embedding search for code chunks.

8.2 Intermediate Extensions

Add multi-language symbol indexing (Tree-sitter grammars).
Add PR template generation (title, checklist, risk assessment).

8.3 Advanced Extensions

Add automated refactoring with semantic constraints (AST transforms).
Add evaluation harness with seeded repos and tasks.

9. Real-World Connections

9.1 Industry Applications

Developer agents that draft PRs, write tests, and assist refactors.
Internal platform tools that automate repetitive repo work.

9.3 Interview Relevance

Retrieval + context pruning, structured patching, tool safety, and eval discipline.

10. Resources

10.1 Essential Reading

AI Engineering (Chip Huyen) — agentic workflows + safety (Ch. 6, 8)
The LLM Engineering Handbook (Paul Iusztin) — evaluations and RAG patterns (Ch. 8)

10.3 Tools & Documentation

Tree-sitter docs (grammars, parsing)
GitHub API docs (pull requests) if implementing PR creation

Previous: Project 6 (tool routing) — generalized tool registry patterns
Next: Project 10 (monitoring) — observability for agents that run commands

11. Self-Assessment Checklist

I can explain how I pruned context and why it’s sufficient.
I can show a trace from request → evidence → patch → tests.
I can prevent unsafe commands and unreviewed writes.
I can add a new repo analyzer heuristic without breaking others.

12. Submission / Completion Criteria

Minimum Viable Completion:

Read-only Q&A with codebase retrieval and citations
Patch generation as unified diff + explicit apply confirmation
Test command runner with logged outputs

Full Completion:

Symbol-aware retrieval (AST) and bounded fix loop on failures
Optional PR drafting (branch + commit message + PR body)

Excellence (Going Above & Beyond):

Multi-language support and automated eval suite for code-agent tasks

This guide was generated from project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY.md. For the complete sprint overview, see project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY/README.md.

Project 7: The Codebase Concierge (Git & PR Agent)

Quick Reference

1. Learning Objectives

2. Theoretical Foundation

2.1 Core Concepts

2.2 Why This Matters

2.3 Common Misconceptions

3. Project Specification

3.1 What You Will Build

3.2 Functional Requirements

3.3 Non-Functional Requirements

3.4 Example Usage / Output

4. Solution Architecture

4.1 High-Level Design

4.2 Key Components

4.3 Data Structures

4.4 Algorithm Overview

5. Implementation Guide

5.1 Development Environment Setup

5.2 Project Structure

5.3 Implementation Phases

Phase 1: Read-only Q&A with retrieval (8–12h)

Phase 2: Patch generation + apply with confirmation (10–15h)

Phase 3: Test runner + fix loop (12–18h)

5.4 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

7. Common Pitfalls & Debugging

8. Extensions & Challenges

8.1 Beginner Extensions

8.2 Intermediate Extensions

8.3 Advanced Extensions

9. Real-World Connections

9.1 Industry Applications

9.3 Interview Relevance

10. Resources

10.1 Essential Reading

10.3 Tools & Documentation

10.4 Related Projects in This Series

11. Self-Assessment Checklist

12. Submission / Completion Criteria