Project 1: The WAV Player

Build the audio output foundation: a command-line WAV file player that streams uncompressed audio to your speakers with pause, resume, and seek functionality.

Quick Reference

Attribute	Value
File	P01-the-wav-player.md
Main Programming Language	C
Alternative Programming Languages	Rust, C++, Zig
Coolness Level	Level 3: Genuinely Clever
Business Potential	1. The “Resume Gold”
Difficulty	Level 3: Advanced (The Engineer)
Knowledge Area	Audio Systems, Systems Programming
Software or Tool	ALSA (Linux), CoreAudio (macOS), WASAPI (Windows)
Main Book	“The Linux Programming Interface” by Michael Kerrisk

What You Will Build

A command-line WAV file player that streams uncompressed audio to your speakers with pause, resume, and seek functionality.

Why It Teaches Audio Fundamentals

Before decoding MP3, you must master audio output. WAV files are uncompressed PCM—the exact format audio hardware expects. Building a WAV player teaches you sample formats, audio APIs, and real-time streaming without codec complexity.

Core Challenges You Will Face

Parsing the RIFF/WAV container → Maps to binary file parsing and chunk navigation
Configuring audio hardware → Maps to platform audio APIs and device parameters
Real-time streaming without underruns → Maps to buffer management and timing
Handling different sample formats → Maps to PCM data representation (8/16/24/32-bit, float)
User input without blocking audio → Maps to concurrent I/O design

Real World Outcome

You will have a fully functional command-line audio player that plays WAV files with responsive controls.

Example Session:

$ ./wavplay music.wav

WAV Player v1.0
──────────────────────────────────────────────────────
File: music.wav
Format: PCM, 44100 Hz, 16-bit, Stereo
Duration: 3:42 (9,790,200 frames)

Controls: [SPACE] Pause/Resume  [←/→] Seek 5s  [q] Quit
──────────────────────────────────────────────────────

Playing... ▶ 01:23 / 03:42  [█████████░░░░░░░░░░░░░░░░] 37%

^C
Playback stopped at 01:23.
$

What you see when it works correctly:

File information display: Shows sample rate, bit depth, channels, and duration
Progress bar: Updates in real-time (every 100ms or so)
Responsive controls: Space pauses within 50ms, seek moves playback position
Clean shutdown: Ctrl+C or ‘q’ stops gracefully without audio pops
Error handling: Clear messages for invalid files, unsupported formats, or device errors

What you hear:

Smooth, uninterrupted playback with no clicks, pops, or dropouts
Pause/resume without audio artifacts
Seeks jump to the correct position without glitches

The Core Question You Are Answering

“How do computers actually produce sound from numbers?”

Before writing any code, sit with this question. Most programmers treat audio as a black box—call a library, pass some data, sound comes out. But you’re going to understand the entire chain: how discrete samples become continuous voltage, how buffers prevent stuttering, and why wrong byte order creates white noise instead of music.

The answer forces you to understand:

Time-domain representation: Sound is pressure waves; we sample voltage at fixed intervals
Sample rate: 44100 Hz means 44100 amplitude values per second per channel
Bit depth: Each sample’s precision (16-bit = 65536 amplitude levels)
Double buffering: While hardware plays buffer A, software fills buffer B

Concepts You Must Understand First

Stop and research these before coding:

PCM Audio Representation

What is the Nyquist frequency and why does a 44.1 kHz sample rate capture frequencies up to 22.05 kHz?
How are samples interleaved for stereo? (L R L R L R…)
What does “signed 16-bit little-endian” mean for a sample value?
Book Reference: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Ch. 2
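
The questions above can be made concrete with a minimal sketch, assuming 16-bit stereo data held in a byte buffer. The function names decode_s16le and deinterleave_s16le are illustrative, not part of any library. Assembling each sample from individual bytes, rather than casting the buffer to int16_t *, gives the same result on little-endian and big-endian hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one signed 16-bit little-endian sample from two raw bytes.
 * (Hypothetical helper name, used only for illustration.) */
static int16_t decode_s16le(const uint8_t *p)
{
    uint16_t u = (uint16_t)(p[0] | (p[1] << 8));   /* low byte comes first */
    /* Map 0x8000..0xFFFF onto -32768..-1 without relying on
     * implementation-defined narrowing conversions. */
    return (int16_t)((int)(u ^ 0x8000u) - 0x8000);
}

/* Split interleaved stereo frames (L R L R ...) into two channel arrays. */
static void deinterleave_s16le(const uint8_t *raw, size_t frames,
                               int16_t *left, int16_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i]  = decode_s16le(raw + 4 * i);      /* bytes 0-1 of the frame */
        right[i] = decode_s16le(raw + 4 * i + 2);  /* bytes 2-3 of the frame */
    }
}
```

For example, the byte pair FF FF decodes to -1 and 00 80 decodes to -32768, the most negative 16-bit sample.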

The RIFF/WAV File Format

What are RIFF chunks and how do you navigate them?
What fields are in the “fmt ” sub-chunk?
Where does the actual audio data start?
Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 4 (File I/O: The Universal I/O Model)

Audio Hardware Interfaces

What is a sound card’s sample buffer and how do you write to it?
What causes audio underruns and how do you prevent them?
What are period size and buffer size in ALSA terminology?
Book Reference: ALSA Project Documentation (alsa-project.org)

Real-Time Constraints

How much data must you deliver per second for 44.1 kHz stereo 16-bit? (176,400 bytes/sec)
What’s the maximum latency before audio stutters?
How do you balance latency vs. CPU efficiency?
Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 23 (Timers)
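
The data-rate and latency arithmetic behind these questions is short enough to write down. The following is a sketch with hypothetical helper names (frame_bytes, bytes_per_second, buffer_ms), assuming uncompressed PCM with a whole number of bytes per sample.

```c
#include <stdint.h>

/* Bytes occupied by one frame (one sample for every channel). */
static uint32_t frame_bytes(uint32_t channels, uint32_t bits_per_sample)
{
    return channels * (bits_per_sample / 8);
}

/* Bytes the device consumes per second of playback. */
static uint32_t bytes_per_second(uint32_t rate, uint32_t channels,
                                 uint32_t bits_per_sample)
{
    return rate * frame_bytes(channels, bits_per_sample);
}

/* How long, in milliseconds, a buffer of `frames` frames lasts. */
static double buffer_ms(uint32_t frames, uint32_t rate)
{
    return 1000.0 * frames / rate;
}
```

With these, bytes_per_second(44100, 2, 16) gives 176,400, and a 4096-frame buffer at 44.1 kHz lasts roughly 92.9 ms. That figure is both the worst-case added latency and the longest your program can stall before the device runs dry.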

Questions to Guide Your Design

Before implementing, think through these:

File Parsing Strategy

Will you load the entire file into memory or stream from disk?
How will you handle WAV files with extra chunks (metadata, cue points)?
What if the “data” chunk doesn’t immediately follow “fmt ”?
How will you validate the file is actually a WAV and not corrupted?

Audio Output Architecture

What sample format will you request from the audio device?
How large should your audio buffer be? (Latency vs. underrun risk)
How will you handle the audio device being busy or unavailable?
Will you convert sample formats or require specific input formats?
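
If you decide to convert everything to one device format, the per-sample conversions are small. The sketch below assumes a 16-bit signed target. The helper names are hypothetical, and the float scaling (multiply by 32767 after clamping) is one common convention rather than the only valid one.

```c
#include <stdint.h>

/* 8-bit WAV PCM is unsigned with silence at 128: recenter, then scale up. */
static int16_t u8_to_s16(uint8_t s)
{
    return (int16_t)((s - 128) * 256);
}

/* 24-bit PCM is signed little-endian; keep only the top 16 bits. */
static int16_t s24le_to_s16(const uint8_t *p)
{
    /* The two most significant bytes already form the 16-bit value. */
    uint16_t u = (uint16_t)(p[1] | (p[2] << 8));
    return (int16_t)((int)(u ^ 0x8000u) - 0x8000);
}

/* 32-bit float samples nominally span [-1.0, 1.0]; clamp, then scale. */
static int16_t f32_to_s16(float s)
{
    if (s > 1.0f)  s = 1.0f;
    if (s < -1.0f) s = -1.0f;
    return (int16_t)(s * 32767.0f);
}
```

Truncating 24-bit samples discards the low byte without dithering. That is acceptable for a first player, but it is a design choice worth revisiting if you care about quiet passages.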

Playback Control

How will you read keyboard input without blocking audio output?
How will you implement seek? (File position + buffer flush)
What happens to partially-filled buffers on pause?
How will you calculate and display the current playback position?
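
One way to approach the seek and position questions is shown below, as a sketch with hypothetical names (seek_offset, format_position). It assumes you track the position in frames and know the size of the data chunk. The key detail is that a seek target must land on a frame boundary. An offset that falls mid-frame swaps channels or splits samples, which sounds like noise.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert a target time to a byte offset inside the data chunk,
 * rounded down to a whole frame and clamped to the end of the chunk. */
static uint64_t seek_offset(double seconds, uint32_t rate,
                            uint32_t frame_size, uint64_t data_bytes)
{
    if (seconds < 0.0)
        seconds = 0.0;
    uint64_t frame  = (uint64_t)(seconds * rate);
    uint64_t offset = frame * frame_size;
    uint64_t last   = data_bytes - data_bytes % frame_size; /* whole frames only */
    return offset > last ? last : offset;
}

/* Format a frame count as mm:ss for the progress display. */
static void format_position(uint64_t frames, uint32_t rate, char *out, size_t n)
{
    uint64_t secs = frames / rate;
    snprintf(out, n, "%02u:%02u", (unsigned)(secs / 60), (unsigned)(secs % 60));
}
```

Add the returned offset to the file position where the data chunk begins before calling fseek() or lseek(). The displayed position is only approximate if it counts frames written rather than frames played, because frames still queued in the device buffer have not reached the speaker yet.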

Concurrency Model

Will you use threads, async I/O, or a single-threaded event loop?
Who writes to the audio buffer: main thread or dedicated audio thread?
How will you synchronize UI updates with playback position?

Thinking Exercise

Trace the Sample Path

Before coding, draw the complete path of a single audio sample from WAV file to speaker. Include:

File offset where the sample lives
Read buffer in your program’s memory
Audio buffer (e.g., ALSA ring buffer)
DMA transfer to the audio codec chip
DAC conversion to analog voltage
Amplifier and speaker

Questions while tracing:

If the WAV file is 16-bit little-endian but your machine is big-endian, what happens?
If you seek to position 1000000 bytes in the data chunk, what sample number is that for stereo 16-bit audio?
If ALSA reports 4 periods of 1024 frames each, how much latency in milliseconds at 44.1 kHz?

The Interview Questions They Will Ask

Prepare to answer these:

“Explain the difference between sample rate and bit depth. What happens if you play a 48 kHz file at 44.1 kHz?”
“How would you debug an audio player that plays static instead of music?” (Hint: check byte order, sample format, channel count)
“What is an audio buffer underrun? How do you prevent them without adding too much latency?”
“Design an audio mixer that plays two WAV files simultaneously. What challenges arise?”
“Why do audio applications need real-time scheduling? What’s the consequence of missing a deadline?”
“How would you implement gapless playback between two audio files?”

Hints in Layers

Hint 1: Starting Point

Begin with the simplest possible case: hardcode 44.1 kHz, 16-bit, stereo. Don’t worry about other formats initially. Read the file in chunks (e.g., 16KB) and write to the audio device in a loop. Get any sound playing first.

Hint 2: WAV Parsing Structure

The WAV file structure:

Bytes 0-3:   "RIFF"
Bytes 4-7:   File size - 8
Bytes 8-11:  "WAVE"
Bytes 12+:   Chunks...

Each chunk: 4-byte ID, 4-byte size (little-endian), then data, followed by one pad byte when the size is odd. Find “fmt ” for format info, “data” for audio samples, and skip any chunk you don’t recognize.
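
The chunk layout described above can be walked with a short loop. This is a minimal sketch that assumes the file (or at least its header region) has already been read into memory. find_chunk and read_u32le are illustrative names, and a streaming player would do the same walk with fread() and fseek() instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian integer from raw bytes. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Find chunk `id` (4 characters) in a RIFF/WAVE image held in memory.
 * Returns the offset of the chunk payload and stores its size in *size_out,
 * or returns 0 if the header is invalid or the chunk is absent. */
static size_t find_chunk(const uint8_t *buf, size_t len,
                         const char *id, uint32_t *size_out)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;

    size_t pos = 12;                        /* first chunk follows "WAVE" */
    while (pos + 8 <= len) {
        uint32_t size = read_u32le(buf + pos + 4);
        if (size > len - pos - 8)           /* truncated or corrupt chunk */
            return 0;
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return pos + 8;
        }
        pos += 8 + (size_t)size + (size & 1);  /* payloads pad to even length */
    }
    return 0;
}
```

Searching by ID rather than assuming fixed offsets is what lets the parser cope with files that place metadata chunks between “fmt ” and “data”. This sketch rejects a chunk whose declared size runs past the end of the buffer. A more forgiving player might instead clamp the data chunk to the bytes actually present.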

Hint 3: ALSA Configuration Pattern

Pseudocode for ALSA setup:

open_pcm_device("default", PLAYBACK)
set_hw_params:
    access = INTERLEAVED
    format = S16_LE
    channels = 2
    rate = 44100
    period_size = 1024 frames
    buffer_size = 4096 frames
prepare_device()

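while (frames_remaining):
    frames_read = read_from_file(buffer, period_size * frame_size) / frame_size
    result = write_to_device(buffer, frames_read)   // last read may be short
    if (result == UNDERRUN):    // snd_pcm_writei() returns -EPIPE
        prepare_device()        // recover, then keep writing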

close_device()

Hint 4: Non-Blocking Input

Use select() or poll() to check stdin for keystrokes while audio plays:

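// Requires <poll.h> and <unistd.h>.
struct pollfd poll_fds[1] = { { .fd = STDIN_FILENO, .events = POLLIN } };
if (poll(poll_fds, 1, 0) > 0 && (poll_fds[0].revents & POLLIN)) {  // timeout 0: never blocks
    read_key_and_handle();  // your handler: read one byte and act on it
}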

Put the terminal in non-canonical mode with tcsetattr() (clear ICANON and ECHO) to receive single keystrokes, and restore the saved settings before the program exits.
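
Putting the terminal setup and the poll() check together might look like the sketch below. The function names (enable_key_mode, restore_mode, poll_key) are hypothetical. The sketch deliberately leaves ISIG enabled so that Ctrl+C still raises SIGINT, which means you need a signal handler or atexit() hook that calls restore_mode().

```c
#include <poll.h>
#include <termios.h>
#include <unistd.h>

/* Switch a terminal to non-canonical, no-echo mode so each keypress is
 * delivered immediately. The previous settings are saved in *saved.
 * Returns 0 on success, -1 if fd is not a terminal or the call fails. */
static int enable_key_mode(int fd, struct termios *saved)
{
    if (tcgetattr(fd, saved) == -1)
        return -1;
    struct termios raw = *saved;
    raw.c_lflag &= ~(tcflag_t)(ICANON | ECHO); /* no line buffering, no echo */
    raw.c_cc[VMIN]  = 0;                       /* read() may return 0 bytes */
    raw.c_cc[VTIME] = 0;                       /* and never waits */
    return tcsetattr(fd, TCSANOW, &raw);
}

/* Restore the settings saved by enable_key_mode(). */
static int restore_mode(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSANOW, saved);
}

/* Return 1 and store the key in *key if one is waiting,
 * 0 if no input is pending, -1 on error or hang-up. */
static int poll_key(int fd, char *key)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, 0);              /* timeout 0: return immediately */
    if (ready <= 0)
        return ready;
    if (!(pfd.revents & POLLIN))
        return -1;
    return read(fd, key, 1) == 1 ? 1 : 0;
}
```

Call poll_key(STDIN_FILENO, &key) once per loop iteration, between audio writes. Arrow keys arrive as multi-byte escape sequences (ESC, '[', then a letter), so handling them requires reading more than one byte.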

Books That Will Help

Topic	Book	Chapter
Non-Blocking Input (select/poll)	“The Linux Programming Interface” by Michael Kerrisk	Ch. 63 (Alternative I/O Models)
Binary File Parsing	“C Programming: A Modern Approach” by K. N. King	Ch. 22 (Input/Output)
Low-Level I/O	“Advanced Programming in the UNIX Environment” by Stevens	Ch. 3, 14
Real-Time Considerations	“The Linux Programming Interface” by Michael Kerrisk	Ch. 22, 23
PCM Audio Concepts	“Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron	Ch. 2 (Data Representations)

Common Pitfalls and Debugging

Problem 1: “I hear static/noise instead of music”

Why: Wrong sample format or byte order. Most common: treating unsigned as signed, or big-endian as little-endian.
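Fix: Verify that the “fmt ” chunk reports format tag 1 (PCM) with 16 bits per sample. 16-bit WAV PCM is always signed little-endian, so the matching ALSA format is S16_LE. Note that 8-bit WAV PCM is unsigned (U8).
Quick test: xxd music.wav | head -20. In quiet passages of a 16-bit file, samples should hover near zero: byte pairs such as 01 00 for small positive values or fd ff for small negative ones, not values scattered across the whole range.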

Problem 2: “Audio stutters or has periodic clicks”

Why: Buffer underrun. You’re not writing samples fast enough.
Fix: Increase buffer size (add latency) or reduce period size (more frequent, smaller writes). Check for slow file I/O or CPU spikes.
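Quick test: Log every snd_pcm_writei() call that returns -EPIPE, which is how ALSA reports an underrun. Running LIBASOUND_DEBUG=1 ./wavplay music.wav may print additional alsa-lib diagnostics.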

Problem 3: “No sound at all, but no errors”

Why: Wrong audio device, or samples are silent (all zeros), or system mixer is muted.
Fix: Try aplay -D default music.wav first. Check alsamixer for muted channels. Print the first 20 sample values to verify they’re non-zero.
Quick test: aplay -l lists available sound cards.

Problem 4: “Playback is too fast/slow (chipmunk or slow-mo effect)”

Why: Sample rate mismatch. You’re telling ALSA 44100 but the file is 48000, or vice versa.
Fix: Read the sample rate from the WAV header and configure ALSA to match.
Quick test: Print the sample rate parsed from the WAV header.

Problem 5: “Program hangs when I press a key”

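Why: The terminal is in canonical (line-buffered) mode, so read() does not return until Enter is pressed. Or you’re reading stdin with a blocking call inside the audio loop.
Fix: Disable canonical mode with tcsetattr(). Use poll() or select() to check for input before reading.
Quick test: Run stty -icanon && cat and confirm that cat echoes each keypress immediately, without waiting for Enter. Press Ctrl+C to exit, then run stty sane to restore the terminal.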

Definition of Done

References

Main guide: LEARN_C_MP3_PLAYER_FROM_SCRATCH.md
ALSA Project Documentation
Introduction to Sound Programming with ALSA — Linux Journal
Audio File Format Specifications — Library of Congress