Project 2: The MP3 Frame Scanner

Build a command-line tool that scans MP3 files, finds every frame, parses headers, and reports statistics (bitrate, sample rate, duration, VBR detection, ID3 tags).

Quick Reference

Attribute	Value
File	P02-mp3-frame-scanner-parser.md
Main Programming Language	C
Alternative Programming Languages	Rust, Python, Go
Coolness Level	Level 3: Genuinely Clever
Business Potential	1. The “Resume Gold”
Difficulty	Level 3: Advanced (The Engineer)
Knowledge Area	Binary Parsing, Audio Codecs, Bit Manipulation
Software or Tool	xxd, hexdump, custom parser
Main Book	“Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron

What You Will Build

A command-line tool that scans MP3 files, finds every frame, parses headers, and reports statistics (bitrate, sample rate, duration, VBR detection, ID3 tags).

Why It Teaches MP3 Fundamentals

Before decoding audio, you must navigate the bitstream. This project forces you to understand how an MP3 file is laid out: frame sync patterns, header bit fields, VBR vs. CBR, and the infamous bit reservoir. You’ll learn the structure without the complexity of audio DSP.

Core Challenges You Will Face

Finding frame sync patterns → Maps to binary pattern matching and false positive handling
Parsing bit-level header fields → Maps to bit manipulation and bitwise operators
Handling ID3v2 tags → Maps to syncsafe integers and metadata skipping
Detecting VBR files → Maps to Xing/VBRI header parsing
Calculating accurate duration → Maps to sample counting and frame indexing

Real World Outcome

You will have a forensic MP3 analysis tool that reveals the internal structure of any MP3 file.

Example Session:

$ ./mp3scan song.mp3

MP3 Frame Scanner v1.0
══════════════════════════════════════════════════════════════════

File: song.mp3
Size: 4,523,847 bytes

ID3v2 Tag Detected
──────────────────
  Version: ID3v2.3.0
  Size: 8,742 bytes (10-byte header + 8,732-byte syncsafe size field)
  Title: "Bohemian Rhapsody"
  Artist: "Queen"
  Album: "A Night at the Opera"
  Year: 1975

Audio Analysis
──────────────
  First audio frame at offset: 0x2226 (8742)
  MPEG Version: MPEG-1
  Layer: III
  Sample Rate: 44100 Hz
  Channel Mode: Joint Stereo (M/S + Intensity)

Frame Statistics
────────────────
  Total frames: 8,847
  VBR: Yes (Xing header detected)
  Bitrate range: 128-320 kbps
  Average bitrate: 215 kbps

Duration Calculation
────────────────────
  Samples per frame: 1152
  Total samples: 10,191,744
  Duration: 231.11 seconds (3:51)

Frame Distribution by Bitrate
─────────────────────────────
  128 kbps: ████░░░░░░░░░░░░░░░░ 1,023 frames (11.6%)
  160 kbps: ██████░░░░░░░░░░░░░░ 1,841 frames (20.8%)
  192 kbps: ████████░░░░░░░░░░░░ 2,456 frames (27.8%)
  256 kbps: ██████░░░░░░░░░░░░░░ 1,892 frames (21.4%)
  320 kbps: ███░░░░░░░░░░░░░░░░░ 1,635 frames (18.5%)

Scan complete. No errors detected.
$

What you see when it works:

ID3 tag extraction: Title, artist, album parsed from metadata
Frame-by-frame analysis: Every frame’s header is validated
VBR detection: Xing/VBRI headers identified
Bitrate distribution: Histogram showing encoding quality
Accurate duration: Calculated from actual frame count, not file size

The Core Question You Are Answering

“What is an MP3 file, really? How do I find where the audio starts and where each frame lives?”

Before writing any code, sit with this question. An MP3 file is not a simple linear stream. It may start with ID3 tags, contain VBR headers, have frames of varying sizes, and include garbage bytes that look like sync patterns. Your job is to navigate this mess reliably.

The answer forces you to understand:

Sync word detection: Why 0xFF 0xFB appears (and why false positives happen)
Header bit fields: How 32 bits encode version, layer, bitrate, sample rate, padding, mode
Frame size calculation: The formula that determines exactly where the next frame starts
VBR vs. CBR: Why you can’t calculate duration from file size for variable bitrate files

Concepts You Must Understand First

Stop and research these before coding:

Binary File I/O and Bit Manipulation

How do you read a 32-bit big-endian value from a byte array?
What’s the difference between logical and arithmetic right shift?
How do you extract bits 12-15 from a 32-bit integer?
Book Reference: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Ch. 2
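The first and third questions can be sketched in a few lines of C. This is a minimal illustration, not the only way to do it; the helper names read_be32 and extract_bits are made up for this example. Using unsigned types throughout also sidesteps the second question, since right-shifting an unsigned value in C is always a logical shift.

```c
#include <stdint.h>

/* Assemble a 32-bit big-endian value byte by byte, so the result does not
   depend on the host's endianness. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return `count` bits of `value` starting at bit `lo` (bit 0 is the least
   significant). Valid for count in 1..31. Shifting an unsigned value is a
   logical shift: zeros come in from the left. */
static uint32_t extract_bits(uint32_t value, unsigned lo, unsigned count)
{
    return (value >> lo) & ((1u << count) - 1u);
}
```

For the header bytes ff fb 90 04, read_be32 yields 0xFFFB9004 and extract_bits(header, 12, 4) yields 9, the bitrate index.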

MP3 Frame Header Structure

What are the 32 bits of an MP3 header and what do they mean?
Why are the first 11 bits always 1?
Which combinations of version/layer/bitrate are valid?
Book Reference: ISO/IEC 11172-3 (MPEG-1 Audio) or online tutorials

ID3v2 Tag Format

What is a syncsafe integer and why does ID3v2 use them?
How do you detect ID3v2 at the start of a file?
What if an ID3v2 tag is appended at the end of the file instead of the start (ID3v2.4 allows this and marks such a tag with a footer)?
Book Reference: id3.org/id3v2.3.0 specification

VBR Header Formats

Where does the Xing header appear in a VBR file?
What fields does Xing/VBRI provide (frame count, byte count, TOC)?
How does the TOC enable accurate seeking in VBR files?
Book Reference: Xing VBR header specification (Gabriel Bouvigne’s documentation)
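To make the TOC question concrete, here is a hedged sketch of the interpolation commonly described for Xing seeking. It assumes the usual TOC layout: 100 one-byte entries, where entry i is the stream position at i percent of the playing time, expressed in 1/256ths of the stream length. The function name xing_seek_offset is hypothetical, and the returned offset is relative to the start of the audio stream, so you would still add the offset of the first frame.

```c
#include <stdint.h>

/* Map a time position (0..100 percent) to a byte offset using a Xing TOC.
   Assumes toc[i] is the position at i percent, scaled to 0..255. */
static uint64_t xing_seek_offset(const uint8_t toc[100], uint64_t stream_bytes,
                                 double percent)
{
    if (percent < 0.0)   percent = 0.0;
    if (percent > 100.0) percent = 100.0;

    int i = (int)percent;
    if (i > 99) i = 99;

    /* Interpolate linearly between this entry and the next one; past the
       last entry the implied end point is 256 (the end of the stream). */
    double fa = toc[i];
    double fb = (i < 99) ? toc[i + 1] : 256.0;
    double fx = fa + (fb - fa) * (percent - i);

    return (uint64_t)(fx / 256.0 * (double)stream_bytes);
}
```

The result is only an approximate byte position, so after seeking you still need to resynchronize on the next valid frame header.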

Questions to Guide Your Design

Before implementing, think through these:

Sync Pattern Detection

How will you distinguish real frame syncs from coincidental 0xFF bytes in audio data?
What’s your strategy when a sync word leads to an invalid header?
How many consecutive valid frames confirm you found real audio?
Will you scan byte-by-byte or use optimized search?

Error Recovery

What happens if a frame is corrupted or truncated?
How do you handle files that have garbage appended at the end?
What if the file claims one bitrate but has frames of another?
How do you report errors without failing the entire scan?

Memory and Performance

Will you memory-map the file or read in chunks?
How large can MP3 files be? (Multi-hour podcasts can be 100MB+)
Do you need to store all frame offsets or just count them?
What’s the minimum data needed to calculate duration?

Output Format

What information is most useful for debugging MP3 issues?
Should you support machine-readable output (JSON, CSV)?
How will you visualize bitrate distribution?
What warnings should you emit for unusual files?

Thinking Exercise

Parse a Real Header

Get an MP3 file and examine it with xxd:

$ xxd song.mp3 | head -20

Find the first ff fb or ff fa pattern after any ID3 tag. That’s your frame header. For example, if you see ff fb 90 04:

Convert to binary: 1111 1111 1111 1011 1001 0000 0000 0100
Extract fields:
- Bits 21-31 (sync): Should be 111 1111 1111 = all 1s ✓
- Bits 19-20 (version): 11 = MPEG-1
- Bits 17-18 (layer): 01 = Layer III
- Bit 16 (protection): 1 = No CRC
- Bits 12-15 (bitrate): 1001 = 128 kbps (from table)
- Bits 10-11 (sample rate): 00 = 44100 Hz (for MPEG-1)
- Bit 9 (padding): 0 = No padding
- Bit 8 (private): 0
- Bits 6-7 (channel mode): 00 = Stereo
- And so on…

Questions while parsing:

What bitrate does 1001 map to for MPEG-1 Layer III?
What’s the frame size formula? (144 × bitrate / sample_rate, rounded down, + padding; bitrate in bits per second)
Where should the next frame start?

The Interview Questions They Will Ask

Prepare to answer these:

“How do you reliably find the start of audio data in an MP3 file that has ID3 tags?”
“What is a syncsafe integer? Why does ID3v2 use it instead of regular integers?” (Hint: avoid false sync patterns)
“Given an MP3 with variable bitrate, how do you calculate its exact duration without decoding?” (Hint: count frames or use Xing header)
“How would you implement seeking to 50% of an MP3 file? How does VBR complicate this?” (Hint: Xing TOC)
“What happens if two bytes in the audio data happen to look like a frame sync? How do you avoid false positives?”
“Why does MP3 use a bit reservoir? What does this mean for frame independence?”

Hints in Layers

Hint 1: Starting Point

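Begin by finding and skipping ID3v2 tags. The 10-byte tag header is “ID3” (3 bytes), then version (2 bytes), flags (1 byte), and size (4 bytes, syncsafe); the size does not include these 10 header bytes. After the tag, scan for 0xFF followed by a byte of 0xE0 or higher: together they supply the 11 set sync bits. The version, layer, and bitrate bits still have to be validated separately.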
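A minimal sketch of the tag-skipping step, assuming the start of the file is already in memory. The names syncsafe_decode and id3v2_skip are invented for this example. The footer check relies on the ID3v2.4 flag bit 0x10; ID3v2.3 has no footer.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 4-byte syncsafe integer: 7 payload bits per byte, MSB first. */
static uint32_t syncsafe_decode(const uint8_t b[4])
{
    return ((uint32_t)(b[0] & 0x7F) << 21) | ((uint32_t)(b[1] & 0x7F) << 14) |
           ((uint32_t)(b[2] & 0x7F) << 7)  |  (uint32_t)(b[3] & 0x7F);
}

/* Return the offset of the first byte after a leading ID3v2 tag,
   or 0 if the buffer does not start with one. The result may exceed
   `len`; the caller decides how to handle a truncated file. */
static size_t id3v2_skip(const uint8_t *buf, size_t len)
{
    if (len < 10 || buf[0] != 'I' || buf[1] != 'D' || buf[2] != '3')
        return 0;

    /* A valid syncsafe size never has the top bit set in any byte. */
    if ((buf[6] | buf[7] | buf[8] | buf[9]) & 0x80)
        return 0;

    size_t total = 10 + (size_t)syncsafe_decode(buf + 6);

    /* ID3v2.4 only: flag bit 0x10 means a 10-byte footer follows the tag. */
    if (buf[3] == 4 && (buf[5] & 0x10))
        total += 10;

    return total;
}
```

Start the sync scan at the returned offset rather than assuming a frame begins exactly there, because some files carry padding or junk between the tag and the first frame.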

Hint 2: Header Parsing Mask

Extract header fields with bit masks:

sync_word    = (header >> 21) & 0x7FF    // bits 21-31 (should be 0x7FF)
version      = (header >> 19) & 0x03     // bits 19-20
layer        = (header >> 17) & 0x03     // bits 17-18
protection   = (header >> 16) & 0x01     // bit 16
bitrate_idx  = (header >> 12) & 0x0F     // bits 12-15
sample_idx   = (header >> 10) & 0x03     // bits 10-11
padding      = (header >> 9)  & 0x01     // bit 9
channel_mode = (header >> 6)  & 0x03     // bits 6-7

Use lookup tables to convert indices to actual values (e.g., bitrate_idx 9 → 128 kbps).
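Putting the masks and lookup tables together might look like the sketch below. It is deliberately limited to MPEG-1 Layer III and rejects free-format streams (bitrate index 0) as a simplification; the struct and function names are hypothetical. A complete scanner needs additional tables for the other versions and layers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this example. */
typedef struct {
    int bitrate_kbps;
    int sample_rate_hz;
    int padding;        /* 0 or 1 */
    int channel_mode;   /* 0 stereo, 1 joint stereo, 2 dual channel, 3 mono */
} Mp3Header;

/* MPEG-1 Layer III bitrates; index 0 is free-format, index 15 is invalid. */
static const int kBitrateKbps[16] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
};
/* MPEG-1 sample rates; index 3 is reserved. */
static const int kSampleRateHz[4] = { 44100, 48000, 32000, 0 };

/* Validate and decode a 32-bit header. Returns false for anything that
   is not a usable MPEG-1 Layer III header. */
static bool parse_mpeg1_layer3(uint32_t h, Mp3Header *out)
{
    if (((h >> 21) & 0x7FF) != 0x7FF) return false;  /* 11 sync bits */
    if (((h >> 19) & 0x03) != 0x03)   return false;  /* 11 = MPEG-1 */
    if (((h >> 17) & 0x03) != 0x01)   return false;  /* 01 = Layer III */

    unsigned bitrate_idx = (h >> 12) & 0x0F;
    unsigned sample_idx  = (h >> 10) & 0x03;
    if (bitrate_idx == 0 || bitrate_idx == 15 || sample_idx == 3)
        return false;

    out->bitrate_kbps   = kBitrateKbps[bitrate_idx];
    out->sample_rate_hz = kSampleRateHz[sample_idx];
    out->padding        = (int)((h >> 9) & 0x01);
    out->channel_mode   = (int)((h >> 6) & 0x03);
    return true;
}
```

Feeding it 0xFFFB9004 (the ff fb 90 04 example) gives 128 kbps, 44100 Hz, no padding, stereo.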

Hint 3: Frame Size Formula

For MPEG-1 Layer III:

frame_size = 144 * bitrate / sample_rate + padding    // integer division, bitrate in bits/s
           = 144 * 128000 / 44100 + 0
           = 417 bytes                                // 417.96 truncated

Read 4 bytes at offset +417 and verify it’s another valid sync word.
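The formula is small enough to wrap in one function. A sketch for MPEG-1 Layer III only; the function name is made up for this example, and the bitrate is expected in bits per second.

```c
#include <stddef.h>

/* Frame length in bytes for MPEG-1 Layer III. bitrate_bps is in bits per
   second (128000, not 128). The division truncates on purpose. */
static size_t mpeg1_l3_frame_size(long bitrate_bps, long sample_rate_hz,
                                  int padding)
{
    return (size_t)(144L * bitrate_bps / sample_rate_hz) + (size_t)padding;
}
```

The truncation is why the padding bit exists: at 128 kbps and 44100 Hz the exact value is about 417.96 bytes, so an encoder emits a mix of 417-byte and 418-byte frames to hold the average bitrate.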

Hint 4: Xing Header Detection

In VBR files, the first frame (after ID3) often contains a Xing header instead of audio:

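Offset from the start of the frame (MPEG-1), counting the 4-byte header:
  - Stereo/Joint Stereo/Dual Channel: 36 bytes (4 header + 32 side info)
  - Mono: 21 bytes (4 header + 17 side info)

Look for: "Xing" or "Info" (4 bytes; LAME writes "Info" for CBR files)
Next 4 bytes: flags indicating which fields follow (all integers are big-endian)
If flag & 1: next 4 bytes = frame count
If flag & 2: next 4 bytes = byte count
If flag & 4: next 100 bytes = seek TOC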
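The lookup above can be sketched as follows. This assumes an MPEG-1 Layer III first frame without CRC, where the tag sits 36 bytes (21 for mono) from the start of the frame, counting the 4-byte header. XingInfo and parse_xing are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    bool is_vbr;          /* true for "Xing", false for "Info" */
    bool has_frames, has_bytes, has_toc;
    uint32_t frames;
    uint32_t bytes;
    const uint8_t *toc;   /* points into the caller's buffer, 100 entries */
} XingInfo;

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* `frame` points at the 4-byte header of the first frame. */
static bool parse_xing(const uint8_t *frame, size_t len, XingInfo *out)
{
    if (len < 4) return false;

    /* Channel mode 11 = mono, which has a shorter side-info block. */
    bool mono = ((frame[3] >> 6) & 0x03) == 0x03;
    size_t off = mono ? 21 : 36;
    if (len < off + 8) return false;

    bool xing = memcmp(frame + off, "Xing", 4) == 0;
    bool info = memcmp(frame + off, "Info", 4) == 0;
    if (!xing && !info) return false;

    uint32_t flags = be32(frame + off + 4);
    const uint8_t *p = frame + off + 8;
    size_t left = len - (off + 8);

    memset(out, 0, sizeof *out);
    out->is_vbr = xing;
    if (flags & 1) {                       /* frame count present */
        if (left < 4) return false;
        out->frames = be32(p); out->has_frames = true;
        p += 4; left -= 4;
    }
    if (flags & 2) {                       /* byte count present */
        if (left < 4) return false;
        out->bytes = be32(p); out->has_bytes = true;
        p += 4; left -= 4;
    }
    if (flags & 4) {                       /* 100-byte seek TOC present */
        if (left < 100) return false;
        out->toc = p; out->has_toc = true;
    }
    return true;
}
```

Treat the reported frame count as a claim to verify against your own scan rather than as ground truth, since truncated or edited files often carry a stale tag.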

Books That Will Help

Topic	Book	Chapter
Bit Manipulation	“Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron	Ch. 2
Binary File I/O	“C Programming: A Modern Approach” by K. N. King	Ch. 22
Data Representation	“Code: The Hidden Language of Computer Hardware and Software” by Charles Petzold	Ch. 15-16
Low-Level Parsing	“The Linux Programming Interface” by Michael Kerrisk	Ch. 5-6
MPEG Standards	ISO/IEC 11172-3 (MPEG-1 Audio)	Full document

Common Pitfalls and Debugging

Problem 1: “I found a sync word but the header is invalid”

Why: You found 0xFF 0xFB in the audio data itself, not a real frame header.
Fix: After finding a potential sync, verify the full 32-bit header (valid version, layer, bitrate index). Then check if the next frame also has a valid header at the calculated offset.
Quick test: Require 3 consecutive valid frames before accepting the first as real.
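One way to implement the consecutive-frame check, sketched for MPEG-1 Layer III only and assuming the whole file (or a large chunk) is in memory. All three function names are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Frame length in bytes if p[0..3] is a valid MPEG-1 Layer III header,
   otherwise 0. Free-format (bitrate index 0) is rejected for simplicity. */
static size_t frame_len(const uint8_t *p)
{
    static const int kbps[16] = {0, 32, 40, 48, 56, 64, 80, 96,
                                 112, 128, 160, 192, 224, 256, 320, 0};
    static const int hz[4] = {44100, 48000, 32000, 0};
    uint32_t h = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    /* Top 15 bits: 11 sync bits, version 11, layer 01. */
    if ((h & 0xFFFE0000u) != 0xFFFA0000u) return 0;

    unsigned br = (h >> 12) & 0x0F, sr = (h >> 10) & 0x03;
    unsigned pad = (h >> 9) & 0x01;
    if (br == 0 || br == 15 || sr == 3) return 0;

    return (size_t)(144L * kbps[br] * 1000L / hz[sr]) + pad;
}

/* True if `need` back-to-back valid headers start at buf[pos]. */
static bool confirm_sync(const uint8_t *buf, size_t len, size_t pos, int need)
{
    for (int i = 0; i < need; i++) {
        if (pos > len || len - pos < 4) return false;
        size_t n = frame_len(buf + pos);
        if (n == 0) return false;
        pos += n;   /* jump to where the next header should be */
    }
    return true;
}

/* First offset at or after `start` that begins a chain of 3 valid
   frames, or `len` if there is none. */
static size_t find_first_frame(const uint8_t *buf, size_t len, size_t start)
{
    for (size_t pos = start; pos + 4 <= len; pos++) {
        if (buf[pos] == 0xFF && (buf[pos + 1] & 0xE0) == 0xE0 &&
            confirm_sync(buf, len, pos, 3))
            return pos;
    }
    return len;
}
```

A known limitation of this sketch: the last two frames of a file can never pass a three-frame check, so use it to locate the start of the audio and fall back to single-header validation once you are walking frame to frame.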

Problem 2: “Frame count doesn’t match Xing header”

Why: You’re counting the Xing/Info frame itself as an audio frame.
Fix: The Xing frame contains no audio data. Start counting after it.
Quick test: Compare your count to what ffprobe or mp3info reports.

Problem 3: “ID3 tag size is way too big”

Why: You didn’t decode the syncsafe integer correctly.
Fix: Syncsafe means each byte only uses 7 bits: size = (b0 << 21) | (b1 << 14) | (b2 << 7) | b3
Quick test: Print raw bytes and decoded size, compare with id3v2 -l file.mp3.

Problem 4: “Duration calculation is wrong for VBR files”

Why: You’re using file_size * 8 / bitrate which assumes constant bitrate.
Fix: For VBR, count actual frames and multiply by samples per frame (1152 for MPEG-1 Layer III; 576 for MPEG-2 and MPEG-2.5 Layer III), then divide by the sample rate.
Quick test: Compare duration with ffprobe -show_entries format=duration.
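The frame-count approach reduces to one division. A small sketch with hypothetical function names; it covers Layer III only.

```c
#include <stdint.h>

/* Samples per Layer III frame: 1152 for MPEG-1, 576 for MPEG-2/2.5. */
static int layer3_samples_per_frame(int is_mpeg1)
{
    return is_mpeg1 ? 1152 : 576;
}

/* Duration from an exact frame count; independent of bitrate, so it
   works the same for CBR and VBR. */
static double duration_seconds(uint64_t frames, int samples_per_frame,
                               int sample_rate_hz)
{
    return (double)(frames * (uint64_t)samples_per_frame) /
           (double)sample_rate_hz;
}
```

For 8,847 MPEG-1 frames at 44100 Hz this gives 10,191,744 samples, or roughly 231.1 seconds.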

Problem 5: “I can’t find the first frame in some files”

Why: ID3v2 tags can have padding, or the file has both ID3v2 at the start and ID3v1 at the end.
Fix: After ID3v2, scan forward for valid sync. Remember ID3v1 is 128 bytes at EOF with “TAG” signature.
Quick test: xxd -s +8742 file.mp3 | head to skip past ID3v2 and see what follows.
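Trimming a trailing ID3v1 tag is a cheap check that keeps the scanner from treating the tag as corrupt audio. A minimal sketch, assuming the file is in memory; audio_end is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the offset where audio data ends: 128 bytes before EOF if the
   file ends with an ID3v1 tag ("TAG" signature), otherwise `len`. */
static size_t audio_end(const uint8_t *buf, size_t len)
{
    if (len >= 128 && memcmp(buf + len - 128, "TAG", 3) == 0)
        return len - 128;
    return len;
}
```

Stop the frame walk at this offset instead of at the end of the file.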

Definition of Done

References

Main guide: LEARN_C_MP3_PLAYER_FROM_SCRATCH.md
ID3v2.3.0 Specification
MP3 Frame Header Format
Xing VBR Header
ISO/IEC 11172-3 (MPEG-1 Audio Layer III)