Project 7: The Codebase Concierge (Git & PR Agent)
Build a CLI assistant that can understand a local repository, propose safe patches, run tests, and prepare a pull request with an audit trail.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 30–45 hours |
| Language | Python (Alternatives: Rust, Go) |
| Prerequisites | Git fluency, testing basics, AST concepts, prompt-injection awareness |
| Key Topics | code understanding, context pruning, patch planning, tool safety, repo automation, evals |
1. Learning Objectives
By completing this project, you will:
- Index a repository into searchable units (files, symbols, functions) with metadata.
- Retrieve the minimum viable context to solve an issue (context pruning discipline).
- Generate patches as structured diffs and validate them with tests/linters.
- Build safety rails (read-only vs write, sandboxed commands, approvals).
- Produce reproducible "agent traces" (inputs → evidence → patch → test results).
2. Theoretical Foundation
2.1 Core Concepts
- Program representation:
- Plain text search finds strings, not structure.
- AST parsing (Tree-sitter, language parsers) finds symbols, definitions, call sites.
- Context pruning: LLMs are powerful but context-limited; you must retrieve the smallest set of relevant snippets.
- Change safety:
- Always separate "analyze" from "apply changes".
- Prefer generating patches that humans can review.
- Run tests automatically to catch unintended breakage.
- Prompt injection in code: Comments and docs can contain malicious instructions ("ignore your rules"). Treat repository text as untrusted input.
- Evals for code agents: You measure success by compilation, test pass rate, and diffs that match intent, not by eloquence.
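As a concrete taste of AST-based indexing, here is a minimal sketch using Python's built-in `ast` module (a single-language stand-in for Tree-sitter; the `index_symbols` helper and its dict shape are illustrative, not a prescribed API):

```python
import ast

def index_symbols(source: str, file_path: str) -> list[dict]:
    """Extract functions and classes with their line ranges from Python source."""
    tree = ast.parse(source)
    symbols = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            symbols.append({
                "file_path": file_path,
                "kind": kind,
                "name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,  # populated since Python 3.8
            })
    return symbols
```

Unlike plain text search, this finds the actual definition of a symbol with exact line ranges, which is what the retriever needs to pull minimal context.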
2.2 Why This Matters
This is the most "real" personal assistant project: it touches your actual work. The patterns you learn (retrieval, safety, structured patching, validation) are directly transferable to production agent systems.
2.3 Common Misconceptions
- "Just send the whole repo." It won't fit and it increases error rate.
- "If the agent can edit, it should auto-commit." Unsafe by default; commits/PRs should be explicitly requested.
- "Passing tests means correct." Tests help; you also need scope limits and reviewable diffs.
3. Project Specification
3.1 What You Will Build
A terminal assistant that can:
- Answer questions about the repo ("Where is auth implemented?")
- Create a plan for a requested change ("Add unit tests for login")
- Generate a patch (diff) and apply it in a controlled way
- Run tests and summarize results
- Optionally prepare a PR (branch + commit + PR body)
3.2 Functional Requirements
- Repo scan: detect language/build system and inventory files.
- Symbol index: extract functions/classes/modules (at least for one language).
- Retrieval: text search + optional embeddings over code chunks.
- Patch generation: produce unified diff with file paths and changes.
- Command runner: run safe commands (tests, formatters) with logging.
- Guardrails: require explicit user confirmation before file writes or network actions.
- PR prep (optional): create branch/commit and draft PR description.
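The command-runner and guardrail requirements can be combined into one small gate. The sketch below is illustrative: the `ALLOWED` set and `run_safe` helper are hypothetical names, and a real tool would resolve absolute paths and load its allowlist from config rather than hardcode it.

```python
import shlex
import subprocess

# Hypothetical allowlist; a real tool would load this from config
# and resolve executables to absolute paths before comparing.
ALLOWED = {"pytest", "python", "ruff", "git", "echo"}

def run_safe(command: str, timeout: int = 120) -> subprocess.CompletedProcess:
    """Run a command only if its executable is on the allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not on allowlist: {command!r}")
    # Capture output so the trace can log it verbatim.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
```

Passing `argv` as a list (never `shell=True`) avoids shell interpolation, and the timeout bounds runaway test suites.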
3.3 Non-Functional Requirements
- Safety: strict allowlist for commands; default read-only mode.
- Reproducibility: save traces (retrieval inputs, selected context, patch, test output).
- Maintainability: keep core logic independent of any single repo.
- Performance: initial indexing under a few minutes for mid-size repos.
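The reproducibility requirement amounts to writing one self-contained record per request. A minimal sketch (the `save_trace` helper and the record fields are assumptions, not a fixed schema):

```python
import json
import time
from pathlib import Path

def save_trace(trace_dir: str, request: str, context: list[str],
               patch: str, test_output: str) -> Path:
    """Persist one request's audit trail as a timestamped JSON file."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "request": request,
        "selected_context": context,  # the exact snippets shown to the model
        "patch": patch,
        "test_output": test_output,
    }
    path = Path(trace_dir)
    path.mkdir(parents=True, exist_ok=True)
    out = path / f"trace_{int(time.time() * 1000)}.json"
    out.write_text(json.dumps(record, indent=2))
    return out
```

Because each trace captures the retrieval inputs, selected context, patch, and test output, any run can be replayed and audited later.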
3.4 Example Usage / Output
```text
$ python concierge.py "Add unit tests for the login function in auth.py"
Plan:
  1) Locate login() implementation and dependencies
  2) Identify existing test framework and patterns
  3) Add tests covering success + failure + edge cases
  4) Run tests and report
Proposed patch: 2 files changed
Run tests? (y/n)
```
4. Solution Architecture
4.1 High-Level Design
```text
+-----------+  request   +----------------+  retrieve  +----------------+
|  CLI/UI   |----------->| Planner/Agent  |----------->| Context Engine |
+-----------+            +-------+--------+            +-------+--------+
                                 |                             |
                                 v                             v
                       +-----------------+           +----------------+
                       | Patch Generator |---------->| Patch Applier  |
                       +--------+--------+           +-------+--------+
                                |                            |
                                v                            v
                       +-----------------+           +----------------+
                       | Command Runner  |---------->| Test Results   |
                       +-----------------+           +----------------+
```
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Repo analyzer | detect layout, build, tests | start with heuristics + config overrides |
| Symbol indexer | parse AST and build symbol map | Tree-sitter for multi-language support |
| Retriever | find minimal context | hybrid: ripgrep + embeddings |
| Patch pipeline | generate/review/apply diffs | unified diff + explicit confirmation |
| Command runner | run tests/linters | allowlist + sandbox + timeouts |
4.3 Data Structures
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SymbolRef:
    file_path: str
    kind: str  # "function" | "class" | "module"
    name: str
    start_line: int
    end_line: int

@dataclass(frozen=True)
class ProposedPatch:
    diff_text: str
    files_touched: list[str]
    rationale: str
```
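Before a `ProposedPatch` ever reaches the applier, its `diff_text` should be sanity-checked. A minimal sketch of such validation (the `validate_patch` and `extract_files_touched` helpers are hypothetical; they assume git-style `a/`/`b/` unified-diff headers):

```python
import re

def extract_files_touched(diff_text: str) -> list[str]:
    """Parse target paths out of a unified diff's '+++ b/...' headers."""
    files = []
    for line in diff_text.splitlines():
        m = re.match(r"^\+\+\+ b/(.+)$", line)
        if m:
            files.append(m.group(1))
    return files

def validate_patch(diff_text: str) -> bool:
    """Reject empty, binary, or header-less diffs before applying anything."""
    if "GIT binary patch" in diff_text or "Binary files" in diff_text:
        return False
    files = extract_files_touched(diff_text)
    # Every '+++' header should be paired with a '---' header.
    minus = sum(1 for ln in diff_text.splitlines() if ln.startswith("--- "))
    return bool(files) and minus == len(files)
```

Deriving `files_touched` from the diff itself (rather than trusting the model's claim) keeps the two fields consistent by construction.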
4.4 Algorithm Overview
Key Algorithm: safe code change loop
- Interpret the request and produce an explicit plan.
- Retrieve relevant code context (symbols + surrounding lines).
- Generate a patch as a diff, not direct edits.
- Present patch summary; require confirmation to apply.
- Apply patch and run tests; summarize results.
- If tests fail, retrieve failure context and propose a bounded fix.
Complexity Analysis:
- Indexing: O(files) parsing time
- Each request: O(retrieval + LLM calls + test runtime)
5. Implementation Guide
5.1 Development Environment Setup
```shell
python -m venv .venv
source .venv/bin/activate
pip install pydantic rich tree-sitter
```
5.2 Project Structure
```text
codebase-concierge/
├── src/
│   ├── cli.py
│   ├── analyze.py
│   ├── index/
│   ├── retrieve.py
│   ├── plan.py
│   ├── patch.py
│   ├── run.py
│   └── safety.py
├── data/
└── traces/
```
5.3 Implementation Phases
Phase 1: Read-only Q&A with retrieval (8–12h)
Goals:
- Answer "where is X?" questions with citations to file/lines.
Tasks:
- Build repo analyzer + rg-based retrieval.
- Add symbol index for one language (start with Python).
- Return answers with referenced file paths and line ranges.
Checkpoint: "Where is auth implemented?" returns correct files quickly.
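For Phase 1, a pure-Python retriever is enough to get started (shown here as a stand-in for shelling out to ripgrep; `grep_repo` and its tuple shape are illustrative names):

```python
import re
from pathlib import Path

def grep_repo(root: str, pattern: str, glob: str = "*.py",
              max_hits: int = 20) -> list[tuple[str, int, str]]:
    """Minimal ripgrep stand-in: (path, line_no, line) for each regex match."""
    rx = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        for no, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if rx.search(line):
                hits.append((str(path), no, line.strip()))
                if len(hits) >= max_hits:
                    return hits  # cap results to keep context small
    return hits
```

Returning file paths and line numbers directly gives you the citations the checkpoint asks for; swapping in `rg --json` later changes only this function.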
Phase 2: Patch generation + apply with confirmation (10–15h)
Goals:
- Generate diffs and apply them safely.
Tasks:
- Implement diff generation and patch validation (paths, no binary edits).
- Add explicit confirmation before writes.
- Save full trace (context → patch → apply result).
Checkpoint: You can add a small feature (e.g., rename function) safely.
Phase 3: Test runner + fix loop (12–18h)
Goals:
- Run tests and iteratively repair failures.
Tasks:
- Detect test framework and run commands.
- Parse failures into structured summaries.
- Add bounded "fix once" loop with new patch.
Checkpoint: For a requested test addition, tests pass with traceable steps.
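Parsing failures into structured summaries can start as simple as regexing pytest's output. A sketch (assuming pytest's default short summary format; `parse_pytest_output` is an illustrative name):

```python
import re

def parse_pytest_output(output: str) -> dict:
    """Turn raw pytest output into a structured summary for the fix loop."""
    failing = re.findall(r"^FAILED (\S+)", output, re.MULTILINE)
    m_pass = re.search(r"(\d+) passed", output)
    m_fail = re.search(r"(\d+) failed", output)
    return {
        "passed": int(m_pass.group(1)) if m_pass else 0,
        "failed": int(m_fail.group(1)) if m_fail else 0,
        "failing_tests": failing,  # node IDs to feed back into retrieval
        "ok": m_fail is None and m_pass is not None,
    }
```

The `failing_tests` node IDs are exactly what the bounded fix loop should retrieve context for, instead of re-reading the whole suite's output.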
5.4 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Retrieval | text-only vs AST-assisted | hybrid | structure reduces false positives |
| Write mode | always-on vs gated | gated | safety and trust |
| Patching | direct file edits vs diff | diff | reviewable and reproducible |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | safety/patch | reject unsafe commands, validate diffs |
| Integration | retrieval/index | ensure symbol index matches files |
| Scenario | end-to-end | "add unit test", "refactor name" |
6.2 Critical Test Cases
- Prompt injection: repo contains malicious comment → assistant ignores it.
- Unsafe command: request to run `rm -rf` → assistant refuses.
- Patch validity: malformed diff is rejected and not applied.
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Too much context | model edits wrong area | tighten retrieval and show selected context |
| Hidden side effects | change breaks unrelated modules | require tests; minimize patch scope |
| Build variance | tests behave differently | capture environment info and command logs |
| Over-automation | agent acts without consent | strict write-mode gating |
Debugging strategies:
- Store traces for replay; make the assistant explain its retrieval set.
- Start with read-only mode until retrieval feels trustworthy.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add "explain file" and "summarize module" commands.
- Add local embedding search for code chunks.
8.2 Intermediate Extensions
- Add multi-language symbol indexing (Tree-sitter grammars).
- Add PR template generation (title, checklist, risk assessment).
8.3 Advanced Extensions
- Add automated refactoring with semantic constraints (AST transforms).
- Add evaluation harness with seeded repos and tasks.
9. Real-World Connections
9.1 Industry Applications
- Developer agents that draft PRs, write tests, and assist refactors.
- Internal platform tools that automate repetitive repo work.
9.2 Interview Relevance
- Retrieval + context pruning, structured patching, tool safety, and eval discipline.
10. Resources
10.1 Essential Reading
- AI Engineering (Chip Huyen) – agentic workflows + safety (Ch. 6, 8)
- The LLM Engineering Handbook (Paul Iusztin) – evaluations and RAG patterns (Ch. 8)
10.2 Tools & Documentation
- Tree-sitter docs (grammars, parsing)
- GitHub API docs (pull requests) if implementing PR creation
10.3 Related Projects in This Series
- Previous: Project 6 (tool routing) – generalized tool registry patterns
- Next: Project 10 (monitoring) – observability for agents that run commands
11. Self-Assessment Checklist
- I can explain how I pruned context and why it's sufficient.
- I can show a trace from request → evidence → patch → tests.
- I can prevent unsafe commands and unreviewed writes.
- I can add a new repo analyzer heuristic without breaking others.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Read-only Q&A with codebase retrieval and citations
- Patch generation as unified diff + explicit apply confirmation
- Test command runner with logged outputs
Full Completion:
- Symbol-aware retrieval (AST) and bounded fix loop on failures
- Optional PR drafting (branch + commit message + PR body)
Excellence (Going Above & Beyond):
- Multi-language support and automated eval suite for code-agent tasks
This guide was generated from project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY.md. For the complete sprint overview, see project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY/README.md.