Project 8: File Signature (Magic Number) Identifier

What you will build: A tool that reads bytes at offsets and identifies files based on magic patterns.

Why it teaches binary/hex: File identification relies on magic-number rules evaluated against byte sequences.

Core challenges you will face:

$ magicid samples/
file.bin: unknown
image.bin: matched rule "PNG"
doc.bin: matched rule "PDF"
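Below is a minimal Python 3 sketch of a command-line loop that prints results in the format above. It is not a prescribed design. It assumes a hard-coded two-rule table that checks only the well-known PNG and PDF signatures at offset 0. The `RULES` list and the `identify`/`report` helpers are illustrative names.

```python
import os
import sys

# Illustrative inline rules: (offset, expected bytes, label).
# A real tool would load these from a rules file instead.
RULES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG"),  # 89 50 4E 47 0D 0A 1A 0A
    (0, b"%PDF-", "PDF"),              # 25 50 44 46 2D
]

# Longest prefix any rule needs to look at.
HEADER_SIZE = max(offset + len(pattern) for offset, pattern, _ in RULES)


def identify(path):
    """Return the label of the first matching rule, or None."""
    with open(path, "rb") as f:
        data = f.read(HEADER_SIZE)
    for offset, pattern, label in RULES:
        if data[offset:offset + len(pattern)] == pattern:
            return label
    return None


def report(directory):
    """Build one output line per regular file, sorted by name."""
    lines = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if not entry.is_file():
            continue  # skip subdirectories and other non-regular entries
        try:
            label = identify(entry.path)
        except OSError as exc:
            lines.append(f"{entry.name}: error ({exc.strerror})")
            continue
        if label is None:
            lines.append(f"{entry.name}: unknown")
        else:
            lines.append(f'{entry.name}: matched rule "{label}"')
    return lines


if __name__ == "__main__":
    for line in report(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(line)
```

Because this sketch sorts by filename, its line order can differ from the sample run above.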

“How can you identify a file by its bytes instead of its name?”

Magic patterns
- What do magic files test?
- Book Reference: “Practical Binary Analysis” - Ch. 3
Offsets and types
- Why do rules specify offsets and data types?
- Book Reference: “The C Programming Language” - Ch. 7
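An offset tells you where to look. A type tells you how to interpret the bytes you find there. The Python sketch below decodes the same four bytes (the ELF magic 7F 45 4C 46) as a big-endian and as a little-endian 32-bit integer, and the two results differ. The type names follow the naming style of magic(5) (`byte`, `beshort`, `belong`, ...). The `TYPES` table and the `read_typed` helper are illustrative assumptions, not an existing API.

```python
import struct

# Illustrative type table: magic(5)-style names mapped to struct formats
# (">" = big-endian, "<" = little-endian; B/H/I = 1/2/4-byte unsigned).
TYPES = {
    "byte": ">B",
    "beshort": ">H",
    "leshort": "<H",
    "belong": ">I",
    "lelong": "<I",
}


def read_typed(data: bytes, offset: int, type_name: str):
    """Decode a value of the given type at offset, or None if out of range."""
    fmt = TYPES[type_name]
    size = struct.calcsize(fmt)
    if offset < 0 or offset + size > len(data):
        return None
    (value,) = struct.unpack_from(fmt, data, offset)
    return value


# Start of an ELF header: magic 7F 'E' 'L' 'F', then EI_CLASS (2 = 64-bit).
header = b"\x7fELF\x02\x01\x01"
print(hex(read_typed(header, 0, "belong")))  # 0x7f454c46
print(hex(read_typed(header, 0, "lelong")))  # 0x464c457f
print(read_typed(header, 4, "byte"))         # 2
print(read_typed(header, 6, "belong"))       # None: only 1 byte left
```

A rule therefore has to state the byte order along with the value. Otherwise the same test can match on one architecture's files and fail on another's.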

Rule format
- Will you define your own rule syntax or parse an existing one?
Matching
- How will you handle overlapping patterns?
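Here is one possible answer to the overlap question, sketched in Python: collect every matching rule, then report the most specific one, taken here to mean the rule with the longest pattern. The GIF signatures are real (`GIF87a` and `GIF89a` both begin with `GIF8`). The "longest match wins" policy and the rule labels are assumptions. Reporting every match is an equally valid design.

```python
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    offset: int
    pattern: bytes
    label: str


# Overlapping rules: every GIF89a file also matches the shorter "GIF8" rule.
RULES = [
    Rule(0, b"GIF8", "GIF (generic)"),
    Rule(0, b"GIF89a", "GIF 89a"),
    Rule(0, b"\x7fELF", "ELF"),
]


def matching_rules(data: bytes, rules: List[Rule]) -> List[Rule]:
    """All rules whose pattern appears at its offset (short data never matches)."""
    return [r for r in rules if data[r.offset:r.offset + len(r.pattern)] == r.pattern]


def best_match(data: bytes, rules: List[Rule]) -> Optional[str]:
    """Label of the most specific match: longest pattern; ties go to the earlier rule."""
    matches = matching_rules(data, rules)
    if not matches:
        return None
    return max(matches, key=lambda r: len(r.pattern)).label


print(best_match(b"GIF89a\x01\x00\x01\x00", RULES))  # GIF 89a
print(best_match(b"GIF87a\x01\x00\x01\x00", RULES))  # GIF (generic)
print(best_match(b"plain text", RULES))              # None
```

Pattern length is only a proxy for specificity. A rule that tests two separate fields, such as the ELF magic plus the class byte at offset 4, might deserve priority over a single longer string.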

Offset Reasoning

Imagine a signature rule that checks bytes 0-3. Why might another rule check bytes at offset 512?
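Some formats place their identifying bytes well past the start of the file. Two well-known cases:

- A POSIX tar (ustar) header stores the string `ustar` at offset 257.
- A GPT-partitioned disk image with 512-byte sectors has the header signature `EFI PART` at offset 512. Sector 0 holds a protective MBR, which ends with the boot signature 55 AA at offsets 510-511.

The Python sketch below checks all three on synthetic data. The rule list and the `scan` helper are illustrative.

```python
# Signatures that live away from offset 0: (offset, expected bytes, label).
DEEP_RULES = [
    (257, b"ustar", "tar (ustar header)"),
    (510, b"\x55\xaa", "MBR boot signature"),
    (512, b"EFI PART", "GPT header"),  # assumes 512-byte sectors
]


def scan(data: bytes) -> list:
    """Return labels of every rule whose bytes appear at its offset."""
    return [label for offset, pattern, label in DEEP_RULES
            if data[offset:offset + len(pattern)] == pattern]


# Synthetic disk image: sector 0 ends in 55 AA, sector 1 starts with "EFI PART".
sector0 = bytearray(512)
sector0[510:512] = b"\x55\xaa"
disk = bytes(sector0) + b"EFI PART" + bytes(504)

# Synthetic tar header block: "ustar" plus NUL at offset 257.
tar_block = bytearray(512)
tar_block[257:263] = b"ustar\x00"

print(scan(disk))              # ['MBR boot signature', 'GPT header']
print(scan(bytes(tar_block)))  # ['tar (ustar header)']
print(scan(b"short"))          # []
```

A rule set that only examined bytes 0-3 would report both inputs as unknown. A tar archive begins with a member's file name, and an MBR sector begins with boot code, so neither offers a fixed prefix.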

Questions to answer:

Hint 1: Starting Point
Start with a small JSON or text file of rules, each giving an offset, the expected bytes, and a label.
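The Python sketch below shows one way to load and validate such a rules file. It assumes a JSON list whose entries have `offset`, `hex`, and `label` keys. That schema is hypothetical, chosen so that patterns can be written as readable hex strings. The PNG, PDF, and ZIP signatures are the standard ones.

```python
import json
from typing import List, NamedTuple


class Rule(NamedTuple):
    offset: int
    pattern: bytes
    label: str


# Hypothetical rules-file contents: one JSON object per rule.
RULES_JSON = """
[
  {"offset": 0, "hex": "89 50 4E 47 0D 0A 1A 0A", "label": "PNG"},
  {"offset": 0, "hex": "25 50 44 46 2D",          "label": "PDF"},
  {"offset": 0, "hex": "50 4B 03 04",             "label": "ZIP"}
]
"""


def load_rules(text: str) -> List[Rule]:
    """Parse and validate rules.

    Raises ValueError for bad JSON, bad hex, a negative offset, or an empty
    pattern, and KeyError if an entry lacks a required field.
    """
    rules = []
    for i, entry in enumerate(json.loads(text)):
        offset = int(entry["offset"])
        if offset < 0:
            raise ValueError(f"rule {i}: negative offset {offset}")
        pattern = bytes.fromhex(entry["hex"])  # spaces between bytes are allowed
        if not pattern:
            raise ValueError(f"rule {i}: empty pattern")
        rules.append(Rule(offset, pattern, entry["label"]))
    return rules


for rule in load_rules(RULES_JSON):
    print(rule.offset, rule.pattern, rule.label)
```

Validating at load time means a typo in the rules file fails immediately with the rule's index. Otherwise it would surface later as a silent non-match.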

Hint 2: Next Level
Instead of reading the whole file, read only as many bytes as your rules require: the largest value of offset + pattern length across all rules.
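A minimal Python sketch of that bounded read follows. The inline rules and the helper names are illustrative. The file is opened in binary mode so that no decoding or newline translation alters the bytes.

```python
import os
import tempfile

# Illustrative rules: (offset, expected bytes, label).
RULES = [
    (0, b"%PDF-", "PDF"),
    (257, b"ustar", "tar"),
]


def header_size(rules) -> int:
    """Bytes needed to evaluate every rule: max(offset + pattern length)."""
    return max((offset + len(pattern) for offset, pattern, _ in rules), default=0)


def read_header(path: str, rules) -> bytes:
    """Read just the prefix the rules need; short files yield fewer bytes."""
    with open(path, "rb") as f:  # binary mode: raw bytes, no text decoding
        return f.read(header_size(rules))


# Demo: a file of about 1 MiB, of which only header_size(RULES) bytes are read.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n" + bytes(1024 * 1024))
try:
    head = read_header(tmp.name, RULES)
    print(header_size(RULES), len(head))  # 262 262
finally:
    os.remove(tmp.name)
```

One large prefix read can become wasteful when a rule sits far into the file. ISO 9660 images, for example, carry `CD001` at offset 32769. For rules like that, seeking to each offset and reading only the pattern length may be the better trade-off.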

Hint 3: Technical Details
Pseudocode:

for each rule:
  read len(pattern) bytes at offset
  if fewer bytes came back (file too short): skip the rule
  else if the bytes equal the pattern: report label
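The loop above translates almost line for line into Python. This sketch uses the seek-per-rule strategy and works on any binary file object: an open file or an in-memory `io.BytesIO`. The `(offset, pattern, label)` tuple shape and the `match_rules` name are assumptions.

```python
import io
from typing import BinaryIO, Iterable, List, Tuple

Rule = Tuple[int, bytes, str]  # (offset, pattern, label), an illustrative shape


def match_rules(f: BinaryIO, rules: Iterable[Rule]) -> List[str]:
    """Return the labels of every rule whose pattern appears at its offset."""
    labels = []
    for offset, pattern, label in rules:
        f.seek(offset)  # seeking past EOF is allowed; the read then returns b""
        chunk = f.read(len(pattern))
        if len(chunk) < len(pattern):
            continue  # file too short: the rule cannot match
        if chunk == pattern:
            labels.append(label)
    return labels


rules = [(0, b"\x89PNG\r\n\x1a\n", "PNG"), (0, b"%PDF-", "PDF")]
print(match_rules(io.BytesIO(b"%PDF-1.7\n"), rules))  # ['PDF']
print(match_rules(io.BytesIO(b"%PD"), rules))         # []
```

A short read could never equal the pattern anyway. The explicit length check is there to make the "file too short" case visible, which matters if you later add partial or masked comparisons.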

Hint 4: Tools/Debugging
Compare your results with the output of the file command for the same inputs.
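The comparison can be partly automated. The Python sketch below runs `file -b` (brief mode, which omits the `path:` prefix) on each input and pairs its description with your tool's label. It also flags rows where your label does not appear in file's text, using a rough substring heuristic. For example, file describes PNGs starting with "PNG image data". The helper names are hypothetical, and `identify` stands for whatever function your tool exposes.

```python
import shutil
import subprocess


def file_says(path: str) -> str:
    """Return file(1)'s description; -b (brief) omits the 'path:' prefix."""
    result = subprocess.run(
        ["file", "-b", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def compare(paths, identify):
    """Pair our result with file(1)'s for each path (requires file on PATH)."""
    if shutil.which("file") is None:
        raise RuntimeError("the file command is not installed")
    return [(p, identify(p), file_says(p)) for p in paths]


def disagreements(rows):
    """Rows (path, our_label, file_text) where our label is absent from file's text.

    A rough, case-insensitive substring heuristic: it flags candidates to
    inspect by hand, not definite bugs.
    """
    return [
        (path, ours, theirs)
        for path, ours, theirs in rows
        if ours != "unknown" and ours.lower() not in theirs.lower()
    ]
```

Treat every flagged row as a question, not a verdict. file applies a far larger rule set than yours, so its answer may be more specific, or it may recognize a container format your rules only partly cover.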

Topic	Book	Chapter
File signatures	“Practical Binary Analysis”	Ch. 3
Binary I/O	“The C Programming Language”	Ch. 7

Problem 1: “Matches are inconsistent”