Project 10: Binary Diff and Patch Tool
- File: P10-binary-diff-patch.md
- Main Programming Language: C or Python
- Alternative Programming Languages: Rust, Go
- Coolness Level: Level 3 (See REFERENCE.md)
- Business Potential: Level 2 (See REFERENCE.md)
- Difficulty: Level 3 (See REFERENCE.md)
- Knowledge Area: Binary Editing
- Software or Tool: CLI
- Main Book: “The C Programming Language”
What you will build: A tool that compares two binary files, emits a minimal patch script (offset plus old and new byte values), and applies that script to reproduce the second file.
Why it teaches binary/hex: Patching requires exact byte-level reasoning and offset discipline.
Core challenges you will face:
- Offset tracking -> Endianness
- Byte-by-byte comparison -> Bits/Bytes/Nibbles
- Patch application -> Bitwise Operations
Real World Outcome
$ bindiff a.bin b.bin
offset 0x00000010: 3A -> 7F
offset 0x0000001F: 00 -> 01
$ binpatch a.bin patch.txt out.bin
patched: 2 bytes changed
The Core Question You Are Answering
“How do I describe changes in a binary file with absolute precision?”
Concepts You Must Understand First
- Byte offsets
- Why are offsets shown in hex?
- Book Reference: “Computer Systems: A Programmer’s Perspective” - Ch. 2
- Hex rendering
- How do you display byte changes?
- Book Reference: “The C Programming Language” - Ch. 7
Questions to Guide Your Design
- Patch format
- How will you encode offset and byte values?
- Safety
- Will you support dry-run mode?
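One way to approach the safety question is to validate every record before writing anything, and to let a dry run stop right after validation. The sketch below is only an illustration under assumptions: the name apply_patch, the dry_run flag, and the (offset, old_byte, new_byte) tuple layout are hypothetical and not part of any existing tool.

```python
from typing import Iterable, Tuple

# Hypothetical record layout: (offset, old_byte, new_byte)
Record = Tuple[int, int, int]


def apply_patch(data: bytes, records: Iterable[Record], dry_run: bool = False) -> bytes:
    """Return a patched copy of data, or the unchanged input when dry_run is set.

    Every record is validated before any byte is written, so a bad patch
    never produces a half-modified result.
    """
    records = list(records)
    for offset, old, _new in records:
        if not 0 <= offset < len(data):
            raise ValueError(f"offset 0x{offset:08X} is outside the file")
        if data[offset] != old:
            raise ValueError(
                f"offset 0x{offset:08X}: expected {old:02X}, found {data[offset]:02X}"
            )
    if dry_run:
        return data  # validation passed; nothing is modified
    patched = bytearray(data)
    for offset, _old, new in records:
        patched[offset] = new
    return bytes(patched)
```

Checking the old byte before writing is a design choice: it lets the tool refuse a patch that was generated against a different input file instead of silently corrupting it.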
Thinking Exercise
Patch Reasoning
Given two hex strings, mark which offsets differ and decide how you would encode each difference.
Questions to answer:
- How do you avoid ambiguity about byte order?
- How do you ensure patches are deterministic?
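Byte order only becomes a question if you store offsets as raw multi-byte integers. A text format that writes offsets as hex digits sidesteps it. If you do choose a binary format, the sketch below shows why the order has to be fixed in the format definition. The 6-byte record layout here is a hypothetical example, not a standard.

```python
import struct

offset = 0x00000010

little = struct.pack("<I", offset)  # least significant byte first
big = struct.pack(">I", offset)     # most significant byte first

print(little.hex())  # 10000000
print(big.hex())     # 00000010

# Hypothetical binary record: 4-byte little-endian offset, old byte, new byte.
record = struct.pack("<IBB", offset, 0x3A, 0x7F)
assert len(record) == 6
assert struct.unpack("<IBB", record) == (offset, 0x3A, 0x7F)
```

The same offset produces two different byte sequences, so a reader that guesses the wrong order would patch offset 0x10000000 instead of 0x00000010. For determinism, emitting records in ascending offset order means the same pair of inputs always yields the same patch.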
The Interview Questions They Will Ask
- “How would you design a binary patch format?”
- “Why is offset accuracy critical?”
- “How can you verify a patch applied correctly?”
- “What is the difference between text diff and binary diff?”
- “How would you optimize diffing large files?”
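For the large-file question, one common answer is to read both files in fixed-size blocks and only scan byte-by-byte inside blocks that differ. The following is a minimal sketch of that idea. The function name diff_streams and the 64 KiB block size are assumptions chosen for illustration.

```python
from typing import BinaryIO, Iterator, Tuple

CHUNK_SIZE = 64 * 1024  # assumed block size; tune for your workload


def diff_streams(
    a: BinaryIO, b: BinaryIO, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, old_byte, new_byte) for the region both streams share.

    Assumes buffered regular files (or io.BytesIO), where read(n) returns
    fewer than n bytes only at end of file.
    """
    base = 0
    while True:
        block_a = a.read(chunk_size)
        block_b = b.read(chunk_size)
        common = min(len(block_a), len(block_b))
        if common == 0:
            break
        # Whole-block comparison is fast, so identical blocks are skipped cheaply.
        if block_a[:common] != block_b[:common]:
            for i in range(common):
                if block_a[i] != block_b[i]:
                    yield base + i, block_a[i], block_b[i]
        base += common
        if len(block_a) != len(block_b):
            break  # one stream ended; the leftover tail is a length difference
```

Memory use stays bounded by the block size regardless of file size. Bytes that exist in only one file are not reported here and need their own handling.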
Hints in Layers
Hint 1: Starting Point
Compare files byte-by-byte and record offsets that differ.
Hint 2: Next Level
Write patch lines as: offset, old byte, new byte.
Hint 3: Technical Details
Pseudocode:
for offset in 0..min(lenA, lenB):
    if A[offset] != B[offset]:
        emit offset, A[offset], B[offset]
# bytes past min(lenA, lenB) exist in only one file; handle them separately
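The pseudocode above translates almost directly into Python. This is a minimal in-memory sketch, not a full tool: the name diff_bytes is hypothetical, and the line format is copied from the sample output under Real World Outcome.

```python
from typing import List


def diff_bytes(a: bytes, b: bytes) -> List[str]:
    """Return one patch line per differing byte in the region both inputs share."""
    lines = []
    for offset in range(min(len(a), len(b))):
        if a[offset] != b[offset]:
            lines.append(f"offset 0x{offset:08X}: {a[offset]:02X} -> {b[offset]:02X}")
    return lines


old = bytes.fromhex("00 3A 00 00")
new = bytes.fromhex("00 7F 00 01")
for line in diff_bytes(old, new):
    print(line)
# offset 0x00000001: 3A -> 7F
# offset 0x00000003: 00 -> 01
```

Because the loop visits offsets in ascending order and the format uses fixed-width uppercase hex, the same two inputs always produce identical output.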
Hint 4: Tools/Debugging
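Use cmp out.bin b.bin to confirm the patched file matches the target, and xxd to inspect the bytes around a suspicious offset. xxd -r converts a hex dump back to binary, which is handy for building small test files.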
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| File I/O | “The C Programming Language” | Ch. 7 |
Common Pitfalls and Debugging
Problem 1: “Patch applies but file is corrupted”
- Why: You wrote offsets in decimal but read them as hex.
- Fix: Enforce a single notation and document it.
- Quick test: Apply patch to a known small file and compare byte-by-byte.
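One way to enforce a single notation is to make the parser strict: require the 0x prefix and fixed-width hex fields, and reject everything else. The sketch below assumes the line format shown under Real World Outcome. The names LINE_RE and parse_patch_line are hypothetical.

```python
import re
from typing import Tuple

# Matches lines such as "offset 0x00000010: 3A -> 7F"; the 0x prefix is mandatory.
LINE_RE = re.compile(
    r"offset 0x([0-9A-Fa-f]{8}): ([0-9A-Fa-f]{2}) -> ([0-9A-Fa-f]{2})"
)


def parse_patch_line(line: str) -> Tuple[int, int, int]:
    """Parse one patch line into (offset, old_byte, new_byte), all read as hex."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"malformed patch line: {line!r}")
    offset, old, new = (int(field, 16) for field in match.groups())
    return offset, old, new


# The pitfall in one line: the same digits mean different offsets per base.
assert int("10", 16) == 16 and int("10", 10) == 10
```

A line such as "offset 16: 3A -> 7F" raises ValueError here instead of being silently read in the wrong base.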
Definition of Done
- Emits deterministic patch files
- Applies patches safely and reproducibly
- Includes verification by byte comparison
- Handles different file lengths
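The verification and file-length items can be checked with a small helper that reports where two byte strings first diverge. This is a sketch under assumptions: the name first_mismatch is hypothetical, and treating a length difference as a mismatch at the end of the shorter input is one possible convention.

```python
from typing import Optional


def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the first offset where a and b differ, or None if they are identical.

    A length difference counts as a mismatch at the end of the shorter input.
    """
    for offset in range(min(len(a), len(b))):
        if a[offset] != b[offset]:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

For a simple yes/no check on files already written to disk, the standard library's filecmp.cmp(path_a, path_b, shallow=False) compares contents rather than just file metadata.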