Project 7: The Codebase Concierge (Git & PR Agent)

Project 7: The Codebase Concierge (Git & PR Agent)

Build a CLI assistant that can understand a local repository, propose safe patches, run tests, and prepare a pull request with an audit trail.

Quick Reference

Attribute Value
Difficulty Level 4: Expert
Time Estimate 30โ€“45 hours
Language Python (Alternatives: Rust, Go)
Prerequisites Git fluency, testing basics, AST concepts, prompt-injection awareness
Key Topics code understanding, context pruning, patch planning, tool safety, repo automation, evals

1. Learning Objectives

By completing this project, you will:

  1. Index a repository into searchable units (files, symbols, functions) with metadata.
  2. Retrieve the minimum viable context to solve an issue (context pruning discipline).
  3. Generate patches as structured diffs and validate them with tests/linters.
  4. Build safety rails (read-only vs write, sandboxed commands, approvals).
  5. Produce reproducible โ€œagent tracesโ€ (inputs โ†’ evidence โ†’ patch โ†’ test results).

2. Theoretical Foundation

2.1 Core Concepts

  • Program representation:
    • Plain text search finds strings, not structure.
    • AST parsing (Tree-sitter, language parsers) finds symbols, definitions, call sites.
  • Context pruning: LLMs are powerful but context-limited; you must retrieve the smallest set of relevant snippets.
  • Change safety:
    • Always separate โ€œanalyzeโ€ from โ€œapply changesโ€.
    • Prefer generating patches that humans can review.
    • Run tests automatically to catch unintended breakage.
  • Prompt injection in code: Comments and docs can contain malicious instructions (โ€œignore your rulesโ€). Treat repository text as untrusted input.
  • Evals for code agents: You measure success by compilation, test pass rate, and diffs that match intentโ€”not by eloquence.

2.2 Why This Matters

This is the most โ€œrealโ€ personal assistant project: it touches your actual work. The patterns you learn (retrieval, safety, structured patching, validation) are directly transferable to production agent systems.

2.3 Common Misconceptions

  • โ€œJust send the whole repo.โ€ It wonโ€™t fit and it increases error rate.
  • โ€œIf the agent can edit, it should auto-commit.โ€ Unsafe by default; commits/PRs should be explicitly requested.
  • โ€œPassing tests means correct.โ€ Tests help; you also need scope limits and reviewable diffs.

3. Project Specification

3.1 What You Will Build

A terminal assistant that can:

  • Answer questions about the repo (โ€œWhere is auth implemented?โ€)
  • Create a plan for a requested change (โ€œAdd unit tests for loginโ€)
  • Generate a patch (diff) and apply it in a controlled way
  • Run tests and summarize results
  • Optionally prepare a PR (branch + commit + PR body)

3.2 Functional Requirements

  1. Repo scan: detect language/build system and inventory files.
  2. Symbol index: extract functions/classes/modules (at least for one language).
  3. Retrieval: text search + optional embeddings over code chunks.
  4. Patch generation: produce unified diff with file paths and changes.
  5. Command runner: run safe commands (tests, formatters) with logging.
  6. Guardrails: require explicit user confirmation before file writes or network actions.
  7. PR prep (optional): create branch/commit and draft PR description.

3.3 Non-Functional Requirements

  • Safety: strict allowlist for commands; default read-only mode.
  • Reproducibility: save traces (retrieval inputs, selected context, patch, test output).
  • Maintainability: keep core logic independent of any single repo.
  • Performance: initial indexing under a few minutes for mid-size repos.

3.4 Example Usage / Output

$ python concierge.py "Add unit tests for the login function in auth.py"

Plan:
  1) Locate login() implementation and dependencies
  2) Identify existing test framework and patterns
  3) Add tests covering success + failure + edge cases
  4) Run tests and report

Proposed patch: 2 files changed
Run tests? (y/n)

4. Solution Architecture

4.1 High-Level Design

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   request   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   retrieve   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ CLI/UI         โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚ Planner/Agent     โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚ Context Engine โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜             โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜            โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                       โ”‚                                 โ”‚
                                       โ–ผ                                 โ–ผ
                               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                               โ”‚ Patch Generator   โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚ Patch Applier  โ”‚
                               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                       โ”‚                                 โ”‚
                                       โ–ผ                                 โ–ผ
                               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                               โ”‚ Command Runner    โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚ Test Results   โ”‚
                               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

4.2 Key Components

Component Responsibility Key Decisions
Repo analyzer detect layout, build, tests start with heuristics + config overrides
Symbol indexer parse AST and build symbol map Tree-sitter for multi-language support
Retriever find minimal context hybrid: ripgrep + embeddings
Patch pipeline generate/review/apply diffs unified diff + explicit confirmation
Command runner run tests/linters allowlist + sandbox + timeouts

4.3 Data Structures

from dataclasses import dataclass

@dataclass(frozen=True)
class SymbolRef:
    file_path: str
    kind: str  # "function" | "class" | "module"
    name: str
    start_line: int
    end_line: int

@dataclass(frozen=True)
class ProposedPatch:
    diff_text: str
    files_touched: list[str]
    rationale: str

4.4 Algorithm Overview

Key Algorithm: safe code change loop

  1. Interpret the request and produce an explicit plan.
  2. Retrieve relevant code context (symbols + surrounding lines).
  3. Generate a patch as a diff, not direct edits.
  4. Present patch summary; require confirmation to apply.
  5. Apply patch and run tests; summarize results.
  6. If tests fail, retrieve failure context and propose a bounded fix.

Complexity Analysis:

  • Indexing: O(files) parsing time
  • Each request: O(retrieval + LLM calls + test runtime)

5. Implementation Guide

5.1 Development Environment Setup

python -m venv .venv
source .venv/bin/activate
pip install pydantic rich tree_sitter

5.2 Project Structure

codebase-concierge/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ cli.py
โ”‚   โ”œโ”€โ”€ analyze.py
โ”‚   โ”œโ”€โ”€ index/
โ”‚   โ”œโ”€โ”€ retrieve.py
โ”‚   โ”œโ”€โ”€ plan.py
โ”‚   โ”œโ”€โ”€ patch.py
โ”‚   โ”œโ”€โ”€ run.py
โ”‚   โ””โ”€โ”€ safety.py
โ””โ”€โ”€ data/
    โ””โ”€โ”€ traces/

5.3 Implementation Phases

Phase 1: Read-only Q&A with retrieval (8โ€“12h)

Goals:

  • Answer โ€œwhere is X?โ€ questions with citations to file/lines.

Tasks:

  1. Build repo analyzer + rg-based retrieval.
  2. Add symbol index for one language (start with Python).
  3. Return answers with referenced file paths and line ranges.

Checkpoint: โ€œWhere is auth implemented?โ€ returns correct files quickly.

Phase 2: Patch generation + apply with confirmation (10โ€“15h)

Goals:

  • Generate diffs and apply them safely.

Tasks:

  1. Implement diff generation and patch validation (paths, no binary edits).
  2. Add explicit confirmation before writes.
  3. Save full trace (context โ†’ patch โ†’ apply result).

Checkpoint: You can add a small feature (e.g., rename function) safely.

Phase 3: Test runner + fix loop (12โ€“18h)

Goals:

  • Run tests and iteratively repair failures.

Tasks:

  1. Detect test framework and run commands.
  2. Parse failures into structured summaries.
  3. Add bounded โ€œfix onceโ€ loop with new patch.

Checkpoint: For a requested test addition, tests pass with traceable steps.

5.4 Key Implementation Decisions

Decision Options Recommendation Rationale
Retrieval text-only vs AST-assisted hybrid structure reduces false positives
Write mode always-on vs gated gated safety and trust
Patching direct file edits vs diff diff reviewable and reproducible

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit safety/patch reject unsafe commands, validate diffs
Integration retrieval/index ensure symbol index matches files
Scenario end-to-end โ€œadd unit testโ€, โ€œrefactor nameโ€

6.2 Critical Test Cases

  1. Prompt injection: repo contains malicious comment โ†’ assistant ignores it.
  2. Unsafe command: request to run rm -rf โ†’ assistant refuses.
  3. Patch validity: malformed diff is rejected and not applied.

7. Common Pitfalls & Debugging

Pitfall Symptom Solution
Too much context model edits wrong area tighten retrieval and show selected context
Hidden side effects change breaks unrelated modules require tests; minimize patch scope
Build variance tests behave differently capture environment info and command logs
Over-automation agent acts without consent strict write-mode gating

Debugging strategies:

  • Store traces for replay; make the assistant explain its retrieval set.
  • Start with read-only mode until retrieval feels trustworthy.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add โ€œexplain fileโ€ and โ€œsummarize moduleโ€ commands.
  • Add local embedding search for code chunks.

8.2 Intermediate Extensions

  • Add multi-language symbol indexing (Tree-sitter grammars).
  • Add PR template generation (title, checklist, risk assessment).

8.3 Advanced Extensions

  • Add automated refactoring with semantic constraints (AST transforms).
  • Add evaluation harness with seeded repos and tasks.

9. Real-World Connections

9.1 Industry Applications

  • Developer agents that draft PRs, write tests, and assist refactors.
  • Internal platform tools that automate repetitive repo work.

9.3 Interview Relevance

  • Retrieval + context pruning, structured patching, tool safety, and eval discipline.

10. Resources

10.1 Essential Reading

  • AI Engineering (Chip Huyen) โ€” agentic workflows + safety (Ch. 6, 8)
  • The LLM Engineering Handbook (Paul Iusztin) โ€” evaluations and RAG patterns (Ch. 8)

10.3 Tools & Documentation

  • Tree-sitter docs (grammars, parsing)
  • GitHub API docs (pull requests) if implementing PR creation
  • Previous: Project 6 (tool routing) โ€” generalized tool registry patterns
  • Next: Project 10 (monitoring) โ€” observability for agents that run commands

11. Self-Assessment Checklist

  • I can explain how I pruned context and why itโ€™s sufficient.
  • I can show a trace from request โ†’ evidence โ†’ patch โ†’ tests.
  • I can prevent unsafe commands and unreviewed writes.
  • I can add a new repo analyzer heuristic without breaking others.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Read-only Q&A with codebase retrieval and citations
  • Patch generation as unified diff + explicit apply confirmation
  • Test command runner with logged outputs

Full Completion:

  • Symbol-aware retrieval (AST) and bounded fix loop on failures
  • Optional PR drafting (branch + commit message + PR body)

Excellence (Going Above & Beyond):

  • Multi-language support and automated eval suite for code-agent tasks

This guide was generated from project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY.md. For the complete sprint overview, see project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY/README.md.