Project 7: git-insight

Build a CLI wrapper that turns raw git data into higher-level insights.

Quick Reference

Attribute       Value
Difficulty      Level 2 (Intermediate)
Time Estimate   Weekend
Language        Go (Alternatives: Rust, Python)
Prerequisites   CLI basics, subprocess execution
Key Topics      Process execution, parsing output, composition

1. Learning Objectives

By completing this project, you will:

  1. Execute subprocesses reliably and capture stdout/stderr.
  2. Parse command output into structured data safely.
  3. Build derived insights by combining multiple git commands.
  4. Provide human-readable and machine-readable output.
  5. Handle repo errors and edge cases gracefully.

2. Theoretical Foundation

2.1 Core Concepts

  • Process Execution: Wrappers must capture stdout/stderr, check exit codes, and avoid hanging on long output.
  • Parsing Text Output: Many CLI tools are text-first. Reliable parsing requires stable flags (--porcelain, --format).
  • Composition: The Unix way is to build new behavior by orchestrating existing tools.
  • Read-Only Safety: Default to read-only operations; avoid destructive commands unless explicitly allowed.

2.2 Why This Matters

Real-world tools are often wrappers around existing CLIs. A clean wrapper is easier to maintain than reimplementing git functionality.

2.3 Historical Context / Background

git is a toolkit of commands. The gh CLI and hub are wrappers that add workflows on top. You will build a narrower, safer wrapper focused on insight.

2.4 Common Misconceptions

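  • “Parsing any git output is fine”: Human-oriented output changes with locale, user config, and git version. Prefer --porcelain and explicit --format strings, which are intended for scripts.
  • “Subprocess errors are obvious”: They are not. A failed git command may signal failure only through a non-zero exit code and a message on stderr, so check the exit code and include the command and stderr in the error you report.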
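
One way to reduce the localization risk mentioned above is to pin the environment git runs in. The sketch below assumes a hypothetical helper named gitEnv; the variables themselves (LC_ALL, GIT_TERMINAL_PROMPT, GIT_OPTIONAL_LOCKS) are real, but which ones you need is a judgment call.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// gitEnv returns a copy of base plus overrides that make git's behavior
// more predictable for a wrapper (hypothetical helper).
func gitEnv(base []string) []string {
	env := append([]string{}, base...)
	// Later entries win when a key is duplicated, so these override base.
	return append(env,
		"LC_ALL=C",              // untranslated messages
		"GIT_TERMINAL_PROMPT=0", // never block waiting for credentials
		"GIT_OPTIONAL_LOCKS=0",  // avoid optional index refreshes that take locks
	)
}

func main() {
	cmd := exec.Command("git", "status", "--porcelain")
	cmd.Env = gitEnv(os.Environ())
	out, err := cmd.Output()
	if err != nil {
		fmt.Println("git status failed:", err)
		return
	}
	fmt.Print(string(out))
}
```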

3. Project Specification

3.1 What You Will Build

A CLI named git-insight that provides:

  • git-insight status — repo summary and working tree cleanliness
  • git-insight churn --commits 50 — top files changed in the last N commits
  • git-insight authors --path src/ — top contributors by file or folder
  • git-insight stale --days 30 — branches not updated recently

3.2 Functional Requirements

  1. Command runner: Execute git commands with timeouts and error handling.
  2. Stable parsing: Use git status --porcelain and git log with an explicit --format string.
  3. Output formats: Table by default; --output json available.
  4. Repo validation: Detect and error if not in a repo.
  5. Filtering: Support path filters for churn/authors.
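
Repo validation (requirement 4) can be sketched with git rev-parse --is-inside-work-tree, which prints "true" inside a working tree and exits non-zero outside a repository. The function name insideWorkTree is hypothetical, and treating every non-zero exit as "not a repo" is a simplifying assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// insideWorkTree reports whether dir is inside a git working tree.
// A non-nil error means git itself could not be run.
func insideWorkTree(dir string) (bool, error) {
	cmd := exec.Command("git", "rev-parse", "--is-inside-work-tree")
	cmd.Dir = dir
	out, err := cmd.Output()
	if err != nil {
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			// git ran and answered with a failure: treat as "not a repo".
			return false, nil
		}
		return false, err // git missing, directory missing, and so on
	}
	return strings.TrimSpace(string(out)) == "true", nil
}

func main() {
	ok, err := insideWorkTree(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, "git-insight: cannot run git:", err)
		os.Exit(1)
	}
	if !ok {
		fmt.Fprintln(os.Stderr, "git-insight: not inside a git working tree")
		os.Exit(1)
	}
	fmt.Println("ok: inside a git working tree")
}
```

Running this check once at startup lets every subcommand assume a valid repo.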

3.3 Non-Functional Requirements

  • Reliability: No hangs on large repos.
  • Portability: Works anywhere git runs.
  • Safety: Must not modify repo state.

3.4 Example Usage / Output

$ git-insight churn --commits 50
File                      Commits
src/api/server.go          14
src/db/schema.sql          11

3.5 Real World Outcome

You run the tool inside a repo and receive actionable summaries:

$ git-insight status
Branch: main (up to date)
Working tree: clean
Untracked files: 0

$ git-insight authors --path src/
Author               Commits
alex@example.com     62
sam@example.com      41

The same reports can be scripted with JSON output:

$ git-insight churn --commits 50 --output json
[{"file":"src/api/server.go","commits":14},{"file":"src/db/schema.sql","commits":11}]
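
A minimal sketch of how the JSON form could be produced with encoding/json follows. The struct tags are an assumption chosen to match the field names in the sample output above, and toJSON is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChurnEntry mirrors one row of the churn report. The json tags are an
// assumption chosen to match the sample output.
type ChurnEntry struct {
	File    string `json:"file"`
	Commits int    `json:"commits"`
}

// toJSON renders rows as a compact JSON array.
func toJSON(rows []ChurnEntry) (string, error) {
	if rows == nil {
		rows = []ChurnEntry{} // emit [] instead of null for an empty report
	}
	b, err := json.Marshal(rows)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	rows := []ChurnEntry{
		{File: "src/api/server.go", Commits: 14},
		{File: "src/db/schema.sql", Commits: 11},
	}
	out, err := toJSON(rows)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Emitting an empty array rather than null for an empty report tends to be friendlier to scripts that iterate over the result.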

4. Solution Architecture

4.1 High-Level Design

+-------------+     +-----------------+     +------------------+
| Runner      | --> | Parser/Mapper   | --> | Report Renderer  |
+-------------+     +-----------------+     +------------------+
         |                   |                     |
         +-------------------+---------------------+
                       Shared config

4.2 Key Components

Component   Responsibility         Key Decisions
Runner      Execute git commands   Timeouts, cwd
Parser      Convert text to data   Porcelain formats
Reporter    Table/JSON output      Stable schema

4.3 Data Structures

type ChurnEntry struct {
    File string
    Commits int
}

type AuthorEntry struct {
    Author string
    Commits int
}

4.4 Algorithm Overview

Key Algorithm: Churn computation

  1. Run git log --name-only --format="" -n N.
  2. Count file occurrences.
  3. Sort by count desc.

Complexity Analysis:

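  • Time: O(N * F) to count, where F is the average number of files touched per commit, plus O(U log U) to sort the U distinct files
  • Space: O(U) for the per-file counts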
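
The counting and sorting steps above can be sketched as follows. The function countChurn is hypothetical, and it assumes the input is the text printed by the git log command in step 1 (one path per line, with blank lines between commits). Breaking ties by file name is an added assumption so the output order is deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type ChurnEntry struct {
	File    string
	Commits int
}

// countChurn counts how often each path appears and sorts by count, descending.
func countChurn(logOutput string) []ChurnEntry {
	counts := map[string]int{}
	for _, line := range strings.Split(logOutput, "\n") {
		line = strings.TrimRight(line, "\r")
		if line == "" {
			continue // blank separator between commits
		}
		counts[line]++
	}

	entries := make([]ChurnEntry, 0, len(counts))
	for file, n := range counts {
		entries = append(entries, ChurnEntry{File: file, Commits: n})
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].Commits != entries[j].Commits {
			return entries[i].Commits > entries[j].Commits
		}
		return entries[i].File < entries[j].File // stable order for ties
	})
	return entries
}

func main() {
	// Sample input standing in for real git log output.
	sample := "src/api/server.go\nsrc/db/schema.sql\n\nsrc/api/server.go\n"
	for _, e := range countChurn(sample) {
		fmt.Printf("%s %d\n", e.File, e.Commits)
	}
}
```

Note that git may quote paths containing unusual characters (controlled by core.quotePath); a more robust variant could use the -z flag and split on NUL bytes instead.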

5. Implementation Guide

5.1 Development Environment Setup

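# macOS with Homebrew; on other systems, install Go from https://go.dev/dl/ or your package manager
brew install go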
mkdir git-insight && cd git-insight
go mod init git-insight

5.2 Project Structure

git-insight/
├── cmd/
│   ├── root.go
│   ├── status.go
│   ├── churn.go
│   ├── authors.go
│   └── stale.go
├── internal/
│   ├── runner/
│   ├── parse/
│   └── output/
└── README.md

5.3 The Core Question You Are Answering

“How do I compose existing CLI tools without fragile parsing or unsafe side effects?”

5.4 Concepts You Must Understand First

  1. git status --porcelain formats
  2. Process execution and exit codes
  3. Text parsing with stable delimiters

5.5 Questions to Guide Your Design

  1. Which git commands are stable enough for parsing?
  2. How do you handle large output safely?
  3. Should the tool operate outside a repo? (Answer: no, detect early.)

5.6 Thinking Exercise

Run git status --porcelain and map each status code to a human-readable description.
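
As a starting point for this exercise, the sketch below parses porcelain v1 lines of the form "XY path", where X is the index status and Y is the working tree status. The code-to-description table follows the git-status documentation; the names StatusEntry, describe, and parseStatusLine are hypothetical, and rename lines ("old -> new") are left unsplit for simplicity.

```go
package main

import (
	"fmt"
	"strings"
)

// codeNames maps porcelain v1 status letters to descriptions.
var codeNames = map[byte]string{
	' ': "unmodified",
	'M': "modified",
	'T': "type changed",
	'A': "added",
	'D': "deleted",
	'R': "renamed",
	'C': "copied",
	'U': "unmerged",
	'?': "untracked",
	'!': "ignored",
}

// StatusEntry is one parsed line of `git status --porcelain`.
type StatusEntry struct {
	Index    string // description of the X column
	WorkTree string // description of the Y column
	Path     string
}

func describe(code byte) string {
	if name, ok := codeNames[code]; ok {
		return name
	}
	return "unknown"
}

// parseStatusLine parses a single "XY path" line.
func parseStatusLine(line string) (StatusEntry, error) {
	if len(line) < 4 || line[2] != ' ' {
		return StatusEntry{}, fmt.Errorf("not a porcelain v1 line: %q", line)
	}
	return StatusEntry{
		Index:    describe(line[0]),
		WorkTree: describe(line[1]),
		Path:     line[3:],
	}, nil
}

func main() {
	// Sample input; do not trim leading spaces, they are part of the XY code.
	out := " M src/api/server.go\nA  docs/new.md\n?? notes.txt\n"
	for _, line := range strings.Split(strings.TrimRight(out, "\n"), "\n") {
		entry, err := parseStatusLine(line)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%-12s %-12s %s\n", entry.Index, entry.WorkTree, entry.Path)
	}
}
```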

5.7 The Interview Questions They Will Ask

  1. Why use --porcelain instead of default output?
  2. How do you prevent command injection in wrappers?
  3. How do you handle subprocess failures cleanly?
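
For question 2, one common answer is to never build a shell string: pass arguments as a slice and put user-supplied paths after a "--" separator so git cannot read them as options. The sketch below assumes a hypothetical helper authorsArgs for the authors report and uses the real %ae (author email) placeholder of git log.

```go
package main

import (
	"fmt"
	"os/exec"
)

// authorsArgs builds the argument vector for listing author emails,
// optionally limited to a path (hypothetical helper).
func authorsArgs(path string) []string {
	args := []string{"log", "--format=%ae"}
	if path != "" {
		// Everything after "--" is treated as a path, never as an option,
		// so a value like "--output=/tmp/x" cannot change git's behavior.
		args = append(args, "--", path)
	}
	return args
}

func main() {
	// exec.Command passes each element as one argument; no shell is
	// involved, so spaces, quotes, or semicolons in the path are inert.
	cmd := exec.Command("git", authorsArgs("src/")...)
	fmt.Println(cmd.Args) // shown without running the command
}
```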

5.8 Hints in Layers

Hint 1: Start with status only and parse porcelain output.

Hint 2: Implement churn by counting filenames from git log --name-only.

Hint 3: Add JSON output once table output is correct.

5.9 Books That Will Help

Topic              Book                                             Chapter
CLI reliability    “The Linux Command Line”                         Ch. 26-27
Process handling   “Advanced Programming in the UNIX Environment”   Ch. 8

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Goals:

  • Command runner
  • status summary

Checkpoint: git-insight status works in any repo.

Phase 2: Core Insights (1-2 days)

Goals:

  • Churn and authors reports

Checkpoint: Churn matches manual git log inspection.

Phase 3: Polish (1 day)

Goals:

  • JSON output
  • Error messaging

Checkpoint: --output json works.


6. Testing Strategy

6.1 Test Categories

Category            Purpose      Examples
Unit Tests          Parsing      Porcelain parsing
Integration Tests   CLI output   Churn report
Edge Cases          Empty repo   Clean outputs

6.2 Critical Test Cases

  1. Run in non-repo directory -> error.
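  2. Churn in repo with no commits -> empty output. (git log exits non-zero when the current branch has no commits, so the tool must handle that case explicitly.)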
  3. JSON output matches schema.

7. Common Pitfalls and Debugging

Pitfall                Symptom           Solution
Parsing human output   Wrong results     Use porcelain/format flags
Missing errors         Silent failures   Propagate exit codes
Slow on large repos    Laggy output      Limit commits, add flags

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add git-insight top --n 20
  • Add --since filters

8.2 Intermediate Extensions

  • Add heatmap output
  • Add CSV output

8.3 Advanced Extensions

  • Add blame-based ownership heatmaps
  • Add repo comparison

9. Real-World Connections

  • Repo analytics in engineering orgs
  • Code review prioritization

10. Resources

  • git man pages for porcelain formats
  • gh CLI source for patterns

11. Self-Assessment Checklist

  • I can explain how porcelain output differs from git's default human-readable output
  • I can handle subprocess errors correctly

12. Submission / Completion Criteria

Minimum Viable Completion:

  • status and churn commands work

Full Completion:

  • JSON output + authors + stale

Excellence (Going Above and Beyond):

  • Heatmaps or blame-based insights