Project 10: Binary Diff and Patch Tool

  • File: P10-binary-diff-patch.md
  • Main Programming Language: C or Python
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 3 (See REFERENCE.md)
  • Business Potential: Level 2 (See REFERENCE.md)
  • Difficulty: Level 3 (See REFERENCE.md)
  • Knowledge Area: Binary Editing
  • Software or Tool: CLI
  • Main Book: “The C Programming Language”

What you will build: A tool that compares two binary files, emits a minimal patch script (one record per changed byte: offset plus byte values), and can apply that script to reproduce the second file from the first.

Why it teaches binary/hex: Patching requires exact byte-level reasoning and offset discipline.

Core challenges you will face:

  • Offset tracking -> Endianness
  • Byte-by-byte comparison -> Bits/Bytes/Nibbles
  • Patch application -> Bitwise Operations

Real World Outcome

$ bindiff a.bin b.bin
offset 0x00000010: 3A -> 7F
offset 0x0000001F: 00 -> 01

$ binpatch a.bin patch.txt out.bin
patched: 2 bytes changed

The Core Question You Are Answering

“How do I describe changes in a binary file with absolute precision?”

Concepts You Must Understand First

  1. Byte offsets
    • Why are offsets shown in hex?
    • Book Reference: “Computer Systems: A Programmer’s Perspective” - Ch. 2
  2. Hex rendering
    • How do you display byte changes?
    • Book Reference: “The C Programming Language” - Ch. 7
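As a rough illustration of the two concepts above, the sketch below renders one changed byte in the same style as the sample output in "Real World Outcome". The function name `format_change` is hypothetical, and the fixed widths (8 hex digits for the offset, 2 per byte) are an assumption taken from that sample.

```python
def format_change(offset: int, old: int, new: int) -> str:
    """Render one changed byte in the style of the sample bindiff output."""
    # 08X pads the offset to 8 uppercase hex digits; 02X pads each byte to 2.
    return f"offset 0x{offset:08X}: {old:02X} -> {new:02X}"


print(format_change(16, 0x3A, 0x7F))  # offset 0x00000010: 3A -> 7F
print(format_change(31, 0x00, 0x01))  # offset 0x0000001F: 00 -> 01
```

Fixed-width hex keeps every line the same length and makes offsets line up with what a hex viewer shows, which is one practical answer to "why hex?".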

Questions to Guide Your Design

  1. Patch format
    • How will you encode offset and byte values?
  2. Safety
    • Will you support dry-run mode?
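One possible answer to both design questions is sketched here: a strict parser for a text patch line, plus a dry-run pass that reports what would happen without touching any file. The line format is assumed to mirror the sample output (`offset 0x00000010: 3A -> 7F`); `LINE_RE`, `parse_patch_line`, and `dry_run` are hypothetical names, not part of any existing tool.

```python
import re

# Assumed line format, mirroring the sample output: "offset 0x00000010: 3A -> 7F"
LINE_RE = re.compile(
    r"^offset 0x([0-9A-Fa-f]{8}): ([0-9A-Fa-f]{2}) -> ([0-9A-Fa-f]{2})$"
)


def parse_patch_line(line: str) -> tuple[int, int, int]:
    """Return (offset, old_byte, new_byte); raise ValueError if malformed."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed patch line: {line!r}")
    offset, old, new = (int(group, 16) for group in match.groups())
    return offset, old, new


def dry_run(data: bytes, patch_lines: list[str]) -> list[str]:
    """Describe what applying the patch would do, without modifying anything."""
    report = []
    for line in patch_lines:
        offset, old, new = parse_patch_line(line)
        if offset >= len(data):
            report.append(f"0x{offset:08X}: offset is past end of file")
        elif data[offset] != old:
            report.append(
                f"0x{offset:08X}: expected {old:02X}, found {data[offset]:02X}"
            )
        else:
            report.append(f"0x{offset:08X}: would change {old:02X} to {new:02X}")
    return report
```

Recording the old byte alongside the new one costs a few characters per line, but it lets the dry run detect a patch aimed at the wrong file before anything is written.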

Thinking Exercise

Patch Reasoning

Given two hex strings, mark which offsets differ and how you would encode them.

Questions to answer:

  • How do you avoid ambiguity about byte order?
  • How do you ensure patches are deterministic?
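To check your answer to the exercise, here is a small sketch that diffs two hex strings. It assumes equal-length inputs, and the function name `diff_hex_strings` and the sample strings are made up for illustration. It treats each input as a flat sequence of single bytes, so byte order never comes into play, and it walks offsets in ascending order, so the same inputs always produce the same records.

```python
def diff_hex_strings(hex_a: str, hex_b: str) -> list[tuple[int, int, int]]:
    """Return (offset, old, new) for every position where the bytes differ."""
    a = bytes.fromhex(hex_a)  # whitespace between byte pairs is allowed
    b = bytes.fromhex(hex_b)
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length inputs")
    # One record per byte, in ascending offset order: no multi-byte values,
    # so there is no endianness to get wrong.
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


changes = diff_hex_strings("DE AD BE EF 00 10", "DE AD C0 DE 00 11")
for offset, old, new in changes:
    print(f"0x{offset:02X}: {old:02X} -> {new:02X}")
# 0x02: BE -> C0
# 0x03: EF -> DE
# 0x05: 10 -> 11
```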

The Interview Questions They Will Ask

  1. “How would you design a binary patch format?”
  2. “Why is offset accuracy critical?”
  3. “How can you verify a patch applied correctly?”
  4. “What is the difference between text diff and binary diff?”
  5. “How would you optimize diffing large files?”

Hints in Layers

Hint 1 (Starting Point): Compare files byte-by-byte and record offsets that differ.

Hint 2 (Next Level): Write patch lines as: offset, old byte, new byte.

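Hint 3 (Technical Details): Pseudocode:

for offset in 0..min(lenA, lenB):      # upper bound exclusive
  if A[offset] != B[offset]:
    emit offset, A[offset], B[offset]
# If lenA != lenB, the bytes past min(lenA, lenB) exist in only one file;
# record them too (for example as a truncate or append step).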

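Hint 4 (Tools/Debugging): Use cmp out.bin b.bin to confirm the patched file matches the target, and xxd to inspect the bytes around a suspicious offset. xxd -r converts a hex dump back into binary, which is handy for building small test files.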
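The hints above can be turned into a short runnable sketch. This version also covers inputs of different lengths by using `None` for "no byte on this side" and printing it as `--`. That representation is an assumption made for this example, not a format the project prescribes. `diff_bytes` and `render` are hypothetical names, and both inputs are held fully in memory, which is fine for small files but not for very large ones.

```python
from itertools import zip_longest
from typing import Iterator, Optional

# (offset, old byte or None, new byte or None)
Change = tuple[int, Optional[int], Optional[int]]


def diff_bytes(a: bytes, b: bytes) -> Iterator[Change]:
    """Yield one record per differing offset; None marks a missing byte."""
    # zip_longest pads the shorter input with None, so the tail of the
    # longer file is reported instead of being silently ignored.
    for offset, (old, new) in enumerate(zip_longest(a, b)):
        if old != new:
            yield offset, old, new


def render(change: Change) -> str:
    """Format a record like the sample output, with '--' for a missing byte."""
    offset, old, new = change
    old_text = "--" if old is None else f"{old:02X}"
    new_text = "--" if new is None else f"{new:02X}"
    return f"offset 0x{offset:08X}: {old_text} -> {new_text}"


for change in diff_bytes(b"\x3a\x00\x01", b"\x7f\x00"):
    print(render(change))
# offset 0x00000000: 3A -> 7F
# offset 0x00000002: 01 -> --
```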

Books That Will Help

Topic      Book                            Chapter
File I/O   “The C Programming Language”    Ch. 7

Common Pitfalls and Debugging

Problem 1: “Patch applies but file is corrupted”

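  • Why: You wrote offsets in decimal but read them as hex, so each write landed at the wrong position.
  • Fix: Enforce a single notation (for example, always hex with a 0x prefix, as in the sample output) and document it.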
  • Quick test: Apply patch to a known small file and compare byte-by-byte.
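The quick test can be sketched as follows, assuming patch records have already been parsed into `(offset, old, new)` tuples. The name `apply_patch` is hypothetical. Checking the old byte before writing is one way to turn a decimal/hex mix-up into an error rather than a corrupted file.

```python
def apply_patch(data: bytes, changes: list[tuple[int, int, int]]) -> bytes:
    """Return a patched copy; raise ValueError if any old byte does not match."""
    out = bytearray(data)  # work on a copy so a failure leaves the input intact
    for offset, old, new in changes:
        if offset >= len(out):
            raise ValueError(f"offset 0x{offset:X} is past end of file")
        if out[offset] != old:
            raise ValueError(
                f"offset 0x{offset:X}: expected {old:02X}, found {out[offset]:02X}"
            )
        out[offset] = new
    return bytes(out)


# Known small input: 32 zero bytes with 0x3A at offset 0x10.
original = bytearray(32)
original[0x10] = 0x3A
expected = bytearray(original)
expected[0x10] = 0x7F
expected[0x1F] = 0x01

patched = apply_patch(bytes(original), [(0x10, 0x3A, 0x7F), (0x1F, 0x00, 0x01)])
assert patched == bytes(expected)  # byte-by-byte comparison

# Misreading the hex offset "10" as decimal 10 points at the wrong byte;
# the old-byte check rejects it instead of silently corrupting the data.
try:
    apply_patch(bytes(original), [(int("10", 10), 0x3A, 0x7F)])
except ValueError as err:
    print("rejected:", err)
```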

Definition of Done

  • Emits deterministic patch files
  • Applies patches safely and reproducibly
  • Includes verification by byte comparison
  • Handles different file lengths