Project 3: The MP3 Decoder Core

Build a complete MP3 decoder that transforms compressed MP3 frames into raw PCM audio samples, implementing Huffman decoding, dequantization, stereo processing, IMDCT, and synthesis filterbank.

Quick Reference

Attribute	Value
File	P03-the-huffman-decoder-imdct.md
Main Programming Language	C
Alternative Programming Languages	Rust, C++
Coolness Level	Level 5: Pure Magic (Super Cool)
Business Potential	1. The “Resume Gold”
Difficulty	Level 5: Master (The First-Principles Wizard)
Knowledge Area	Signal Processing, Compression, Algorithm Implementation
Software or Tool	Reference decoder, Audacity (for verification)
Main Book	“MPEG Audio Compression Basics” (online resources) and CSAPP

What You Will Build

A complete MP3 decoder that transforms compressed MP3 frames into raw PCM audio samples, implementing Huffman decoding, dequantization, stereo processing, IMDCT, and synthesis filterbank.

Why It Teaches Deep Audio Knowledge

This is the heart of MP3—where compressed bits become music. You’ll implement the exact algorithms specified in ISO 11172-3, understanding every mathematical transformation. Completing this project puts you in elite company; most developers never touch codec internals.

Core Challenges You Will Face

Parsing side information → Maps to complex bit field extraction (17 bytes for stereo)
Huffman decoding with multiple tables → Maps to lookup tables and region boundaries
Dequantization with 4/3 power law → Maps to fixed-point math and scale factors
Stereo processing (M/S and intensity) → Maps to channel recombination algorithms
IMDCT 36-point and 12-point → Maps to DCT mathematics and overlap-add
Synthesis filterbank (32 subbands) → Maps to polyphase filters and matrix operations

Real World Outcome

You will have a library that decodes MP3 frames to PCM samples, verified against reference audio.

Example Session:

$ ./mp3decode test_vectors/layer3_stereo_44100.mp3 output.wav

MP3 Decoder v1.0
═══════════════════════════════════════════════════════

Input: layer3_stereo_44100.mp3
  Format: MPEG-1 Layer III, 128 kbps, 44100 Hz, Stereo

Decoding Progress
─────────────────
Frame 1: Side info parsed (17 bytes), 2 granules
         Scalefactors decoded (both channels)
         Huffman: 1152 samples decoded per channel
         Dequantized: peak sample = 0.9234
         Stereo mode: Joint Stereo (M/S enabled)
         IMDCT: 18 long blocks × 2 granules
         Synthesis complete: 1152 stereo samples written

Frame 2: ...
...
Frame 1000 of 1000: Complete

Output: output.wav
  PCM: 16-bit, 44100 Hz, Stereo
  Samples: 1,152,000
  Duration: 26.12 seconds

Verification
────────────
  Comparing with reference decode...
  Max deviation: 0.00031 (31 LSBs at 16-bit)
  RMS error: 0.000019
  Status: PASS ✓

$ aplay output.wav
# Perfect audio playback!

What you see when it works:

Frame-by-frame decode info: Each frame’s structure exposed
Algorithm stages visible: Side info → Huffman → Dequant → Stereo → IMDCT → Synthesis
Verification against reference: Proves your implementation is correct
Playable WAV output: Load in Audacity, visually compare waveforms

How to verify correctness:

Compare your output against ffmpeg -i input.mp3 -f wav reference.wav:

Waveforms should be virtually identical
Listen to both—any clicks or distortion means a bug
Use spectrograms to spot frequency-domain errors

The Core Question You Are Answering

“How do 128 kbps of compressed data recreate audio that sounds almost as good as a 1411 kbps CD?”

Before writing any code, sit with this question. MP3 achieves ~10:1 compression by exploiting psychoacoustic masking—removing sounds your brain can’t hear. Your decoder reverses this process, reconstructing the approximation from frequency coefficients.

The answer forces you to understand:

Frequency domain: Audio stored as DCT coefficients, not waveform samples
Quantization: Precision thrown away based on what’s audible
Huffman coding: Variable-length codes for efficient entropy encoding
Overlap-add: How IMDCT reconstructs continuous waveforms from blocks

Concepts You Must Understand First

Stop and research these before coding:

MP3 Frame Side Information

What is main_data_begin and why does it point backward?
What do scfsi (scalefactor select information) bits control?
How do block_type and mixed_block_flag affect decoding?
Book Reference: ISO 11172-3 Section 2.4.1.7 (side_info)

Huffman Coding in MP3

Why does MP3 use 32 different Huffman tables?
What are big_values, count1, and how do they partition the spectrum?
How do linbits extend Huffman values for large magnitudes?
Book Reference: “Data Compression” by Khalid Sayood - Ch. 4

Dequantization and Scale Factors

What is the 4/3 power law and why does MP3 use it?
How do scalefac_l and scalefac_s modify subband gains?
What’s the difference between global_gain and scalefactor gain?
Book Reference: ISO 11172-3 Section 2.4.3.4

Stereo Processing

How does M/S stereo encode sum/difference instead of L/R?
What is intensity stereo and when is it used?
How do you know which mode applies to which bands?
Book Reference: ISO 11172-3 Section 2.4.3.2

IMDCT and Windowing

What is the IMDCT and how does it differ from standard DCT?
Why are there three window types (normal, start, stop)?
How does overlap-add eliminate block boundary artifacts?
Book Reference: “MPEG Video Compression Standard” by Mitchell - Ch. 3

Synthesis Filterbank

How does the 32-subband polyphase filter work?
What is the matrixing step (cosine table multiplication)?
How do you produce 32 PCM samples from 32 subband values?
Book Reference: ISO 11172-3 Section 2.4.4

Questions to Guide Your Design

Before implementing, think through these:

Bit Reservoir Handling

How will you buffer main_data across frame boundaries?
What happens when main_data_begin is larger than the previous frame?
How do you handle the first frame where there’s no prior data?

Precision and Overflow

Will you use floating-point or fixed-point arithmetic?
What’s the dynamic range of IMDCT outputs?
How do you clip/saturate without introducing audible artifacts?
What precision does the 4/3 power calculation need?

Memory Layout

How will you organize the 576 frequency coefficients per granule?
Where do you store overlap samples between blocks?
How much state persists across frames (hint: synthesis state)?

Algorithm Implementation Order

Which module should you implement first for easiest debugging?
How can you test Huffman decoding before implementing IMDCT?
What reference outputs can you compare against mid-implementation?

Thinking Exercise

Trace One Subband

Pick one of the 32 subbands (say, subband 15 which covers ~6.9-7.5 kHz). Trace its path through the decoder:

Huffman: Where in big_values does subband 15’s coefficient come from?
Dequant: Which scalefactor (long or short) modifies it?
IMDCT: It’s one of 18 frequency lines—how does IMDCT produce 18 time samples?
Synthesis: How does subband 15 combine with all 31 others to produce 32 PCM samples?

Draw a diagram showing the data flow. At each step, what are the typical value ranges?

Questions while tracing:

If subband 15 has quantized value 0, what happens in dequantization?
If M/S stereo is enabled for this band, what extra processing occurs?
How does a short block (window_type 2) change the IMDCT for this subband?

The Interview Questions They Will Ask

Prepare to answer these:

“Explain the MP3 decoding pipeline from Huffman to PCM. Where does most of the complexity lie?”
“What is the bit reservoir? Why does MP3 use it, and what challenge does it create for streaming?” (Hint: frame interdependence)
“Describe the difference between long blocks and short blocks in MP3. When would an encoder choose each?”
“What is the synthesis filterbank doing conceptually? Why not just output the 576 frequency coefficients directly?”
“How would you verify that your MP3 decoder is correct without listening to every file?” (Hint: binary comparison with reference)
“What causes ‘digital artifacts’ in low-bitrate MP3? How does the psychoacoustic model contribute?”

Hints in Layers

Hint 1: Start with the Structure

Implement parsing first, decoding later. Parse side_info, extract big_values/count1/scalefac, verify against a reference. Print everything. Once parsing is perfect, add algorithms one by one.

Hint 2: Huffman Table Implementation

The 32 Huffman tables are defined in ISO 11172-3 Annex B. Create them as 2D arrays indexed by the Huffman codeword. Use tree traversal or table lookup:

// Pseudocode for table lookup
while (not_done):
    read_next_bit()
    if table[current_node].is_leaf:
        emit(table[current_node].value)
        reset to root
    else:
        current_node = table[current_node].child[bit]

For tables with linbits (large values), after decoding the base value, read additional bits: final_value = base_value + read_bits(linbits).

Hint 3: Dequantization Formula

The core formula from ISO 11172-3:

xr = sign(is[i]) × |is[i]|^(4/3) × 2^(0.25 × (global_gain - 210 - scalefac × scalefac_multiplier))

Where:

is[i] = Huffman-decoded integer value
global_gain = from side info
scalefac = appropriate scale factor for this band
scalefac_multiplier = 0.5 or 1.0 depending on scalefac_scale bit

Hint 4: IMDCT Implementation

For the 36-point IMDCT (long blocks):

// S[k] = input (18 frequency coefficients)
// s[n] = output (36 time samples, but only 18 are new)

for n = 0 to 35:
    s[n] = 0
    for k = 0 to 17:
        s[n] += S[k] * cos(π/72 * (2*n + 1 + 18) * (2*k + 1))

Then window and overlap-add with previous block’s second half.

Books That Will Help

Topic	Book	Chapter
MP3 Format Details	ISO/IEC 11172-3 Standard	Sections 2.4.1-2.4.4
Huffman Coding	“Data Compression” by Khalid Sayood	Ch. 3-4
DCT/IMDCT Math	“Discrete Cosine Transform” by Rao & Yip	Ch. 4-5
Signal Processing	“Understanding DSP” by Richard Lyons	Ch. 1-4
Fixed-Point Arithmetic	“Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron	Ch. 2
Audio Compression	“The MPEG Handbook” by Watkinson	Ch. 5-7

Common Pitfalls and Debugging

Problem 1: “Huffman decoding returns garbage values”

Why: Reading bits in wrong order, or table indices off by one.
Fix: Print the raw bits before decoding. Compare with known test vectors from ISO compliance tests.
Quick test: Decode a CBR file where all frames should parse identically.

Problem 2: “Dequantized values are way too large or small”

Why: Global gain or scalefactor extraction error, or 4/3 power calculation wrong.
Fix: Use floating-point initially for debugging. Print intermediate values and compare with reference decoder.
Quick test: Check global_gain is in range 0-255; typical values are 140-200.

Problem 3: “Audio sounds like robot/underwater”

Why: IMDCT window type wrong, or overlap-add not working correctly.
Fix: Check window_switching_flag and block_type. Ensure you’re accumulating overlap correctly across frames.
Quick test: Force long blocks only (find a file with no short blocks) and verify.

Problem 4: “Stereo channels are swapped or mono”

Why: M/S stereo decoding wrong—you must transform Mid/Side back to Left/Right.
Fix: After dequantization: Left = (Mid + Side) / sqrt(2), Right = (Mid - Side) / sqrt(2).
Quick test: Find a file with extreme stereo (sound in one ear only).

Problem 5: “Output is correct but VERY slow”

Why: Unoptimized IMDCT or synthesis filterbank. Naive O(n²) implementations.
Fix: Use fast DCT algorithms or precomputed cosine tables. The synthesis filterbank can use SIMD.
Quick test: Profile to find hotspots. IMDCT and synthesis should dominate.

Problem 6: “First few frames decode wrong but rest is OK”

Why: Bit reservoir initialization. First frame may reference main_data from previous frames that don’t exist.
Fix: Handle main_data_begin = 0 case specially, or skip first frame.
Quick test: Start decoding from frame 10 and see if audio is cleaner.

Definition of Done

References

Main guide: LEARN_C_MP3_PLAYER_FROM_SCRATCH.md
ISO/IEC 11172-3 (MPEG-1 Audio Layer III)
Underbit’s MAD Decoder — Reference implementation
MP3 Technical Reference — Detailed format documentation
minimp3 — Minimal single-header MP3 decoder for reference