Project 3: Interactive Rebase Simulator — Understand History Rewriting

A tool that simulates git rebase -i without actually modifying the repository, showing exactly what commits would be created, dropped, or modified—and explaining why.

Quick Reference

Difficulty: Advanced
Time Estimate: 1-2 weeks
Main Programming Language: Python
Alternative Programming Languages: Go, Rust, TypeScript
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Prerequisites: Projects 1 and 2 completed; understanding of rebase vs merge
Key Topics: Commit Identity, Rebase Operations, The Three-Way Merge

1. Learning Objectives

By completing this project, you will:

  1. Implement a working simulator for git rebase -i that shows, without modifying the repository, exactly which commits would be created, dropped, or modified, and why.
  2. Explain the core Git workflow tradeoff this project is designed to surface.
  3. Design deterministic checks so results can be verified and reproduced.
  4. Document operational failure modes and safe recovery actions.

2. All Theory Needed (Per-Concept Breakdown)

Commit Identity

Fundamentals This concept matters in this project because your implementation will fail or become non-deterministic without a precise model of Commit Identity. You should define what the concept controls, what invariants must hold, and which actions are safe versus destructive. Treat this concept as a production concern, not a tutorial checkbox.

Deep Dive into the concept When applying Commit Identity in this project, reason in three passes: data shape, state transitions, and enforcement. First, identify which artifacts are authoritative (commit objects, refs, metadata, policy config, CI status, or scan findings). Second, map how those artifacts change when your tool runs. Third, define failure behavior explicitly. In Git tooling, silent partial success is dangerous: you need either complete success with evidence or an explicit failure state with remediation guidance. Also account for scale behavior. A workflow that works on a toy repo may fail on large history depth, concurrent updates, or mixed branch policies. Include trace logs for every irreversible action, and separate simulation mode from write mode. For interview readiness, be able to explain how this concept protects delivery speed while reducing operational risk.

How this fits into the project In this project, Commit Identity directly informs design decisions, implementation constraints, and verification criteria.

Definitions & key terms

  • Commit Identity invariant: A condition that must remain true before and after every operation.
  • Safety boundary: The point where actions become destructive unless guarded.
  • Verification signal: Evidence proving the action behaved as expected.

Mental model diagram

Input state -> Validate invariant -> Apply change -> Verify output -> Record evidence

How it works

  1. Capture current state and constraints.
  2. Evaluate whether Commit Identity preconditions are satisfied.
  3. Execute the minimal safe transition.
  4. Verify postconditions and publish an auditable result.
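
The four steps above can be sketched as a single guarded function; every name and shape here is illustrative, not part of this project's spec:

```python
# Minimal sketch of the capture -> check -> transition -> verify loop.
# State and change shapes are hypothetical, chosen only for the demo.

def simulate_step(state: dict, change: dict) -> dict:
    """Run one guarded transition and return an auditable result."""
    # 1. Capture current state and constraints.
    snapshot = dict(state)

    # 2. Evaluate preconditions: every commit we touch must be known.
    missing = [sha for sha in change["commits"] if sha not in snapshot["commits"]]
    if missing:
        return {"status": "refused", "evidence": {"missing": missing}}

    # 3. Execute the minimal safe transition (pure function, no writes).
    new_commits = {**snapshot["commits"], **change.get("new_commits", {})}

    # 4. Verify postconditions and publish an auditable result.
    assert set(snapshot["commits"]) <= set(new_commits), "history must not shrink silently"
    return {"status": "ok", "evidence": {"before": snapshot["commits"], "after": new_commits}}
```

Note that the refused path returns evidence rather than raising: an explicit failure state with remediation data, never silent partial success.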

Failure modes: stale state, partial writes, race conditions, ambiguous output contracts.

Minimal concrete example

Plan -> dry-run -> execute -> verify -> rollback/forward-fix decision

Common misconceptions

  • Assuming local success implies team-safe behavior.
  • Treating policy violations as warnings instead of merge blockers.
  • Skipping deterministic verification because the output appears correct.

Check-your-understanding questions

  1. Which invariant is most likely to break first under concurrency?
  2. What output proves your tool handled an edge case correctly?
  3. Where should enforcement happen: local hook, CI, or protected branch gate?

Check-your-understanding answers

  1. The invariant tied to mutable refs or policy-dependent merge eligibility.
  2. A deterministic transcript showing both success and controlled failure behavior.
  3. Layered enforcement: fast local checks plus non-bypassable server-side gates.

Real-world applications

  • Change-management tooling for fast-moving teams.
  • Incident-safe release workflows with traceable rollback paths.
  • Compliance-ready source-control automation.

Where you’ll apply it This project and the immediately adjacent projects in this sprint.

References

  • https://git-scm.com/docs
  • https://dora.dev/capabilities/trunk-based-development/

Key insights Commit Identity is only valuable when its invariants are encoded into tooling and checks.

Summary Mastering Commit Identity here gives you transferable patterns for larger workflow systems.

Homework/Exercises to practice the concept

  1. Write one failing scenario and expected detection output.
  2. Define one invariant and one explicit violation test.

Solutions to the homework/exercises

  1. Use a stale branch or invalid metadata case and assert deterministic error reporting.
  2. Invariant: protected branch must not accept unchecked changes; violation test: bypass attempt should fail fast.

Rebase Operations

Fundamentals This concept matters in this project because your implementation will fail or become non-deterministic without a precise model of Rebase Operations. You should define what the concept controls, what invariants must hold, and which actions are safe versus destructive. Treat this concept as a production concern, not a tutorial checkbox.

Deep Dive into the concept When applying Rebase Operations in this project, reason in three passes: data shape, state transitions, and enforcement. First, identify which artifacts are authoritative (commit objects, refs, metadata, policy config, CI status, or scan findings). Second, map how those artifacts change when your tool runs. Third, define failure behavior explicitly. In Git tooling, silent partial success is dangerous: you need either complete success with evidence or an explicit failure state with remediation guidance. Also account for scale behavior. A workflow that works on a toy repo may fail on large history depth, concurrent updates, or mixed branch policies. Include trace logs for every irreversible action, and separate simulation mode from write mode. For interview readiness, be able to explain how this concept protects delivery speed while reducing operational risk.

How this fits into the project In this project, Rebase Operations directly inform design decisions, implementation constraints, and verification criteria.

Definitions & key terms

  • Rebase Operations invariant: A condition that must remain true before and after every operation.
  • Safety boundary: The point where actions become destructive unless guarded.
  • Verification signal: Evidence proving the action behaved as expected.

Mental model diagram

Input state -> Validate invariant -> Apply change -> Verify output -> Record evidence

How it works

  1. Capture current state and constraints.
  2. Evaluate whether Rebase Operations preconditions are satisfied.
  3. Execute the minimal safe transition.
  4. Verify postconditions and publish an auditable result.
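
As an illustration of how a todo plan transforms a commit list in memory, the steps above can be sketched as follows; commit shapes and command handling are a sketch, not Git's actual implementation:

```python
# Apply a rebase todo plan to an in-memory commit list (no repository writes).

def apply_plan(commits: list[dict], plan: list[tuple[str, str]]) -> list[dict]:
    """Return the predicted post-rebase commit list."""
    by_sha = {c["sha"]: c for c in commits}
    result: list[dict] = []
    for command, sha in plan:
        commit = by_sha[sha]
        if command == "drop":
            continue  # commit disappears from the new history
        if command in ("squash", "fixup"):
            if not result:
                raise ValueError(f"{command} requires a previous commit")
            prev = result[-1]
            # squash keeps both messages; fixup discards the second one
            msg = prev["message"] if command == "fixup" else prev["message"] + "\n\n" + commit["message"]
            result[-1] = {**prev, "message": msg}
            continue
        result.append(dict(commit))  # pick / reword / edit keep the commit
    return result
```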

Failure modes: stale state, partial writes, race conditions, ambiguous output contracts.

Minimal concrete example

Plan -> dry-run -> execute -> verify -> rollback/forward-fix decision

Common misconceptions

  • Assuming local success implies team-safe behavior.
  • Treating policy violations as warnings instead of merge blockers.
  • Skipping deterministic verification because the output appears correct.

Check-your-understanding questions

  1. Which invariant is most likely to break first under concurrency?
  2. What output proves your tool handled an edge case correctly?
  3. Where should enforcement happen: local hook, CI, or protected branch gate?

Check-your-understanding answers

  1. The invariant tied to mutable refs or policy-dependent merge eligibility.
  2. A deterministic transcript showing both success and controlled failure behavior.
  3. Layered enforcement: fast local checks plus non-bypassable server-side gates.

Real-world applications

  • Change-management tooling for fast-moving teams.
  • Incident-safe release workflows with traceable rollback paths.
  • Compliance-ready source-control automation.

Where you’ll apply it This project and the immediately adjacent projects in this sprint.

References

  • https://git-scm.com/docs
  • https://dora.dev/capabilities/trunk-based-development/

Key insights Rebase Operations is only valuable when its invariants are encoded into tooling and checks.

Summary Mastering Rebase Operations here gives you transferable patterns for larger workflow systems.

Homework/Exercises to practice the concept

  1. Write one failing scenario and expected detection output.
  2. Define one invariant and one explicit violation test.

Solutions to the homework/exercises

  1. Use a stale branch or invalid metadata case and assert deterministic error reporting.
  2. Invariant: protected branch must not accept unchecked changes; violation test: bypass attempt should fail fast.

The Three-Way Merge

Fundamentals This concept matters in this project because your implementation will fail or become non-deterministic without a precise model of The Three-Way Merge. You should define what the concept controls, what invariants must hold, and which actions are safe versus destructive. Treat this concept as a production concern, not a tutorial checkbox.

Deep Dive into the concept When applying The Three-Way Merge in this project, reason in three passes: data shape, state transitions, and enforcement. First, identify which artifacts are authoritative (commit objects, refs, metadata, policy config, CI status, or scan findings). Second, map how those artifacts change when your tool runs. Third, define failure behavior explicitly. In Git tooling, silent partial success is dangerous: you need either complete success with evidence or an explicit failure state with remediation guidance. Also account for scale behavior. A workflow that works on a toy repo may fail on large history depth, concurrent updates, or mixed branch policies. Include trace logs for every irreversible action, and separate simulation mode from write mode. For interview readiness, be able to explain how this concept protects delivery speed while reducing operational risk.

How this fits into the project In this project, The Three-Way Merge directly informs design decisions, implementation constraints, and verification criteria.

Definitions & key terms

  • The Three-Way Merge invariant: A condition that must remain true before and after every operation.
  • Safety boundary: The point where actions become destructive unless guarded.
  • Verification signal: Evidence proving the action behaved as expected.

Mental model diagram

Input state -> Validate invariant -> Apply change -> Verify output -> Record evidence

How it works

  1. Capture current state and constraints.
  2. Evaluate whether The Three-Way Merge preconditions are satisfied.
  3. Execute the minimal safe transition.
  4. Verify postconditions and publish an auditable result.
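
One concrete piece of this concept is finding the merge base. On a toy parent map (not real Git plumbing), it can be sketched as:

```python
# Merge-base discovery over a hypothetical {sha: [parent_shas]} map.

def ancestors(parents: dict[str, list[str]], start: str) -> set[str]:
    """All commits reachable from start, including start itself."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(parents.get(sha, []))
    return seen

def merge_base(parents: dict[str, list[str]], a: str, b: str) -> set[str]:
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c in ancestors(parents, other) - {other} for other in common)}
```

In a rebase context, the merge base of the branch tip and the new base is what anchors each replayed three-way merge.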

Failure modes: stale state, partial writes, race conditions, ambiguous output contracts.

Minimal concrete example

Plan -> dry-run -> execute -> verify -> rollback/forward-fix decision

Common misconceptions

  • Assuming local success implies team-safe behavior.
  • Treating policy violations as warnings instead of merge blockers.
  • Skipping deterministic verification because the output appears correct.

Check-your-understanding questions

  1. Which invariant is most likely to break first under concurrency?
  2. What output proves your tool handled an edge case correctly?
  3. Where should enforcement happen: local hook, CI, or protected branch gate?

Check-your-understanding answers

  1. The invariant tied to mutable refs or policy-dependent merge eligibility.
  2. A deterministic transcript showing both success and controlled failure behavior.
  3. Layered enforcement: fast local checks plus non-bypassable server-side gates.

Real-world applications

  • Change-management tooling for fast-moving teams.
  • Incident-safe release workflows with traceable rollback paths.
  • Compliance-ready source-control automation.

Where you’ll apply it This project and the immediately adjacent projects in this sprint.

References

  • https://git-scm.com/docs
  • https://dora.dev/capabilities/trunk-based-development/

Key insights The Three-Way Merge is only valuable when its invariants are encoded into tooling and checks.

Summary Mastering The Three-Way Merge here gives you transferable patterns for larger workflow systems.

Homework/Exercises to practice the concept

  1. Write one failing scenario and expected detection output.
  2. Define one invariant and one explicit violation test.

Solutions to the homework/exercises

  1. Use a stale branch or invalid metadata case and assert deterministic error reporting.
  2. Invariant: protected branch must not accept unchecked changes; violation test: bypass attempt should fail fast.

3. Project Specification

3.1 What You Will Build

A tool that simulates git rebase -i without actually modifying the repository, showing exactly what commits would be created, dropped, or modified—and explaining why.

3.2 Functional Requirements

  1. Scope control: Deliver a deterministic and testable implementation.
  2. Correctness: Preserve Git invariants and policy constraints.

3.3 Non-Functional Requirements

  • Performance: Deterministic execution with documented runtime behavior on representative history sizes.
  • Reliability: Repeated runs on the same input produce identical outputs.
  • Usability: Clear CLI or report output for both success and failure cases.

3.4 Example Usage / Output

You’ll have a tool that shows exactly what an interactive rebase would do, step by step:

Example Output:

$ ./rebase-sim --repo /path/to/repo --onto main --branch feature

=== Interactive Rebase Simulator ===
Simulating: git rebase -i main feature

Current branch 'feature' has 5 commits not on 'main':
  abc1234 Add user model
  def5678 Fix typo in user model
  ghi9012 Add authentication
  jkl3456 WIP debugging
  mno7890 Add password hashing

Enter todo commands (or press Enter for default 'pick all'):
pick abc1234
squash def5678
pick ghi9012
drop jkl3456
pick mno7890

=== SIMULATION RESULTS ===

Step 1: pick abc1234 "Add user model"
  Old SHA: abc1234...
  New SHA: xyz7777... (different because parent changed!)
  Old parent: 111222... (old main)
  New parent: 999888... (current main tip)

Step 2: squash def5678 "Fix typo in user model"
  Combining with previous commit...
  New message will be:
    "Add user model

    Fix typo in user model"
  Combined SHA: uvw4444...

Step 3: pick ghi9012 "Add authentication"
  Old SHA: ghi9012...
  New SHA: rst5555...

Step 4: drop jkl3456 "WIP debugging"
  ⚠️  This commit will be DELETED
  Changes in this commit:
    - src/debug.py (will be lost!)

Step 5: pick mno7890 "Add password hashing"
  Old SHA: mno7890...
  New SHA: pqr6666...

=== FINAL STATE ===
main:    999888... (unchanged)
feature: pqr6666... (was: mno7890...)

Commits on feature after rebase: 4 (was 5)
Commit objects: 4 new created, 5 old orphaned (recoverable via reflog until GC)

WARNING: 1 commit dropped. Changes may be lost!

3.5 Data Formats / Schemas / Protocols

Describe input repository assumptions, output report shape, and any policy/config schema consumed by the tool.
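
For instance, a machine-readable report might look like the following; every field name here is an assumption for illustration, not a fixed schema from this guide:

```python
import json

# Hypothetical report shape for one simulated rebase.
report = {
    "simulation": {"onto": "main", "branch": "feature"},
    "steps": [
        {"command": "pick", "old_sha": "abc1234", "new_sha": "xyz7777",
         "reason": "parent changed"},
        {"command": "drop", "old_sha": "jkl3456", "warnings": ["changes lost"]},
    ],
    "final": {"commits_before": 5, "commits_after": 4},
}

# sort_keys keeps byte-identical output across runs, which supports the
# determinism requirement in 3.3.
print(json.dumps(report, indent=2, sort_keys=True))
```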

3.6 Edge Cases

  • Empty repository or shallow clone state.
  • Detached HEAD or rewritten history during execution.
  • Invalid metadata/policy configuration.

3.7 Real World Outcome

The outcome matches the walkthrough in Section 3.4: a step-by-step simulation transcript showing old and new SHAs, squash and drop effects, the final ref positions, and explicit warnings for any dropped changes.


4. Solution Architecture

4.1 High-Level Design

Inputs -> Validation -> Core Engine -> Output Formatter -> Verification Report

4.2 Key Components

Component | Responsibility | Key Decisions
Input loader | Discover commits/refs/config inputs | Deterministic ordering and clear failure messages
Core engine | Compute project-specific logic | Separate read-only simulation from mutating actions
Reporter | Produce user-facing output and evidence | Include machine-readable and human-readable forms

4.3 Data Structures (No Full Code)

ProjectState { refs, commits, policy, findings, metrics }
Result { status, evidence, warnings, next_actions }
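
These shapes translate naturally into dataclasses; the field names mirror the sketch above, while the types are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class ProjectState:
    refs: dict[str, str] = field(default_factory=dict)      # ref name -> commit sha
    commits: dict[str, dict] = field(default_factory=dict)  # sha -> commit metadata
    policy: dict = field(default_factory=dict)
    findings: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)

@dataclass
class Result:
    status: str                                  # e.g. "ok" | "refused" | "failed"
    evidence: dict = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)
```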

4.4 Algorithm Overview

  1. Collect state from repository and configuration.
  2. Evaluate invariants and policy preconditions.
  3. Execute core transformation or analysis logic.
  4. Verify postconditions and emit deterministic report.

Complexity Analysis:

  • Time: O(history + affected scope)
  • Space: O(active graph window + report size)

5. Implementation Guide

5.1 Development Environment Setup

Use the environment defined in the main guide. Pin tool versions and fixture data to keep outputs reproducible.

5.2 Project Structure

project-root/
├── fixtures/
├── src/
├── tests/
├── docs/
└── README.md

5.3 The Core Question You’re Answering

“Why does rebase ‘rewrite history,’ and what does that actually mean at the commit level?”

Before you write any code, sit with this question. When you rebase, Git doesn’t move commits—it replays the changes onto a new base, creating entirely new commits. The old commits still exist (until garbage collection), but your branch pointer moves to the new chain.


5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Commit Identity
    • What determines a commit’s SHA?
    • If you change just the parent, what happens to the SHA?
    • If you change the commit message, what happens to the SHA?
    • Book Reference: “Pro Git” Ch. 10.2 — Chacon
  2. Rebase Operations
    • What does each rebase command (pick, squash, fixup, reword, edit, drop) do?
    • How does squash differ from fixup?
    • What happens during a rebase conflict?
    • Book Reference: “Pro Git” Ch. 7.6 — Chacon
  3. The Three-Way Merge
    • How does Git apply changes from one commit onto another?
    • What’s the “merge base” in a rebase context?
    • Why can rebase produce different conflicts than merge?
    • Book Reference: “Pro Git” Ch. 3.2 — Chacon

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Simulation Fidelity
    • How will you calculate what the new SHA would be without creating objects?
    • Can you predict if there would be conflicts?
    • How do you represent the state after each step?
  2. Commit Combination
    • When squashing, how do you combine commit messages?
    • When squashing, how do you combine tree states?
    • What if squashed commits touched the same file?
  3. User Interface
    • How do you present the todo list for editing?
    • How do you show the diff between old and new state?
    • How do you warn about potentially lost changes?

5.6 Thinking Exercise

Trace a Rebase Manually

Create and rebase a test branch:

git init -b main test && cd test   # -b main pins the branch name (Git 2.28+)
echo "a" > file && git add . && git commit -m "A"
echo "b" > file && git commit -am "B"
git checkout -b feature
echo "c" > file && git commit -am "C"
echo "d" > file && git commit -am "D"
git checkout main
echo "e" > file && git commit -am "E"
git checkout feature
git log --oneline --all --graph  # Note the SHAs
git rebase main
git log --oneline --all --graph  # Compare the SHAs

Questions while tracing:

  • What are the SHAs of C and D before rebase?
  • What are the SHAs of C’ and D’ after rebase?
  • Can you find the old commits with git reflog?
  • What happened to the parent pointers?

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain what happens step-by-step when you run git rebase main from a feature branch.”
  2. “Why should you never rebase commits that have been pushed to a shared branch?”
  3. “What’s the difference between git rebase -i with squash versus fixup?”
  4. “How would you recover commits that were ‘lost’ during a rebase?”
  5. “When would you use rebase vs. merge, and what are the tradeoffs?”

5.8 Hints in Layers

Hint 1: Starting Point Parse the todo file format: <command> <sha> <message>. The commands are: pick, reword, edit, squash, fixup, drop.
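
A minimal parser for that format might look like this; the error handling and naming are illustrative:

```python
# Parse '<command> <sha> <message>' lines into (command, sha, message) tuples.

VALID_COMMANDS = {"pick", "reword", "edit", "squash", "fixup", "drop"}

def parse_todo(text: str) -> list[tuple[str, str, str]]:
    plan = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        command, sha, *rest = line.split(maxsplit=2)
        if command not in VALID_COMMANDS:
            raise ValueError(f"unknown todo command: {command!r}")
        plan.append((command, sha, rest[0] if rest else ""))
    return plan
```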

Hint 2: SHA Calculation A commit’s SHA is sha1(f"commit {size}\0{content}"). The content includes tree, parent(s), author, committer, and message.
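
That formula can be checked directly with hashlib; the commit content below is hand-written for the demo, not read from a real repository:

```python
import hashlib

def commit_sha(content: bytes) -> str:
    """Git object id: sha1 over 'commit <size>\\0<content>'."""
    header = b"commit " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

tree = b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
tail = (b"author A U Thor <a@example.com> 1700000000 +0000\n"
        b"committer A U Thor <a@example.com> 1700000000 +0000\n"
        b"\nAdd user model\n")

# Same tree, author, committer, and message; only the parent line differs.
sha_old = commit_sha(tree + b"parent 1112223334445556667778889990001112223334\n" + tail)
sha_new = commit_sha(tree + b"parent 9998887776665554443332221110009998887776\n" + tail)
print(sha_old != sha_new)  # changing only the parent changes the SHA
```

This is exactly why a rebase produces all-new SHAs even when the diffs are identical: the first replayed commit gets a new parent, and every descendant inherits the change transitively.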

Hint 3: Squash Logic When squashing, the tree comes from applying both commits’ changes, and the message combines both (unless fixup, which discards the second message).
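
The message half of that rule is small enough to sketch directly (the function name is illustrative):

```python
# Combine commit messages for the two combining rebase commands.

def combine_messages(first: str, second: str, command: str) -> str:
    if command == "fixup":
        return first                    # fixup discards the second message
    if command == "squash":
        return f"{first}\n\n{second}"   # squash stacks both, blank-line separated
    raise ValueError(f"not a combining command: {command!r}")
```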

Hint 4: Conflict Detection To predict conflicts, you’d need to simulate the three-way merge. For this project, you can note “potential conflict” when the same file is modified.
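
That cheap approximation, overlap of touched files, can be sketched as follows; the input shapes are assumptions for the demo:

```python
# Flag replayed commits whose touched files overlap with files changed
# on the new base. A real check would run the three-way merge itself;
# file overlap just marks candidates as "potential conflict".

def potential_conflicts(replayed: dict[str, set[str]],
                        base_files: set[str]) -> dict[str, set[str]]:
    """Map commit sha -> overlapping files that may conflict."""
    return {sha: files & base_files
            for sha, files in replayed.items()
            if files & base_files}
```

Expect false positives (same file, disjoint hunks) but no false negatives at the file level, which is the safe direction for a warning-oriented simulator.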


5.9 Books That Will Help

Topic | Book | Chapter
Rebase in depth | “Pro Git” by Chacon | Ch. 3.6, 7.6
Three-way merge | “Version Control with Git” by Loeliger | Ch. 9
Reflog and recovery | “Pro Git” by Chacon | Ch. 7.3

5.10 Implementation Phases

Phase 1: Foundation (1-2 sessions)

  • Define fixtures, expected outputs, and invariant checks.
  • Build read-only analysis path.

Phase 2: Core Functionality (2-4 sessions)

  • Implement project-specific core logic and deterministic reporting.
  • Add policy and edge-case handling.

Phase 3: Polish and Edge Cases (1-2 sessions)

  • Add failure demos, performance notes, and usability improvements.
  • Finalize docs and validation transcripts.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Execution mode | direct write vs dry-run+write | dry-run+write | Safer and easier to debug
Output contract | free text vs structured+text | structured+text | Better automation and readability
Enforcement location | local only vs local+CI | local+CI | Prevents bypass in shared branches

6. Testing Strategy

6.1 Test Categories

  • Unit tests for parsing and policy logic.
  • Integration tests on fixture repositories.
  • Edge-case tests for stale refs, malformed metadata, and large histories.

6.2 Critical Test Cases

  1. Deterministic golden-path scenario.
  2. Policy violation hard-fail scenario.
  3. Recovery path after partial or conflicting state.

6.3 Test Data

Use fixed repository fixtures with known commit graphs and expected outputs stored under version control.


7. Common Pitfalls & Debugging

Problem 1: “Output looks correct but history or metadata is inconsistent”

  • Why: Validation happens after mutation, not before.
  • Fix: Add a preflight invariant check and a post-write verification step.
  • Quick test: Run the same command twice on the same fixture and verify identical results.

Problem 2: “Tool works on small repo but times out on larger history”

  • Why: Full traversal is performed where selective traversal is possible.
  • Fix: Cache intermediate graph lookups and scope analysis to affected commits/paths.
  • Quick test: Compare runtime on small and large fixtures with a clear budget target.

Problem 3: “Policy check can be bypassed by local-only behavior”

  • Why: Enforcement is advisory, not server-authoritative.
  • Fix: Mirror critical checks in CI and protected branch rules.
  • Quick test: Attempt merge with failing policy in CI and confirm hard block.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add richer error messages with remediation hints.
  • Add fixture generation helpers for repeatable demos.

8.2 Intermediate Extensions

  • Add performance instrumentation and budget assertions.
  • Add policy configuration profiles by repository type.

8.3 Advanced Extensions

  • Add distributed execution support for large repositories.
  • Add signed evidence exports for compliance workflows.

9. Real-World Connections

9.1 Industry Applications

  • Internal developer portals.
  • Enterprise repository governance systems.
  • Release safety and incident diagnostics tooling.

9.2 Tools and Ecosystem

  • Git core: https://git-scm.com/
  • GitHub CLI: https://github.com/cli/cli
  • pre-commit framework: https://pre-commit.com/

9.3 Interview Relevance

This project prepares you for architecture and debugging interviews that focus on merge policy, CI gates, and workflow reliability tradeoffs.


10. Resources

10.1 Essential Reading

  • Pro Git (Internals and Workflows chapters)
  • Software Engineering at Google (Version control and build chapters)
  • Accelerate (delivery performance practices)

10.2 Video Resources

  • Git internals talks from Git Merge conference archives.
  • DORA and delivery metrics conference sessions.

10.3 Tools and Documentation

  • https://git-scm.com/docs
  • https://docs.github.com/
  • https://dora.dev/

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the primary invariant this project enforces.
  • I can explain one failure mode and one safe recovery path.

11.2 Implementation

  • Functional requirements are met on deterministic fixtures.
  • Critical edge cases are tested and documented.

11.3 Growth

  • I can describe tradeoffs in an interview setting.
  • I documented what I would change in a production version.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Deterministic golden-path output exists.
  • One failure scenario is handled with clear output.
  • Core workflow objective is demonstrably met.

Full Completion:

  • Minimum criteria plus policy validation, structured reporting, and edge-case coverage.

Excellence:

  • Full completion plus measurable performance budget and production-hardening notes.