Project 1: The WAV Player
Build the audio output foundation: a command-line WAV file player that streams uncompressed audio to your speakers with pause, resume, and seek functionality.
Quick Reference
| Attribute | Value |
|---|---|
| File | P01-the-wav-player.md |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, C++, Zig |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 1. The “Resume Gold” |
| Difficulty | Level 3: Advanced (The Engineer) |
| Knowledge Area | Audio Systems, Systems Programming |
| Software or Tool | ALSA (Linux), CoreAudio (macOS), WASAPI (Windows) |
| Main Book | “The Linux Programming Interface” by Michael Kerrisk |
What You Will Build
A command-line WAV file player that streams uncompressed audio to your speakers with pause, resume, and seek functionality.
Why It Teaches Audio Fundamentals
Before decoding MP3, you must master audio output. WAV files almost always hold uncompressed PCM, the format audio hardware consumes directly. Building a WAV player teaches you sample formats, audio APIs, and real-time streaming without codec complexity.
Core Challenges You Will Face
- Parsing the RIFF/WAV container → Maps to binary file parsing and chunk navigation
- Configuring audio hardware → Maps to platform audio APIs and device parameters
- Real-time streaming without underruns → Maps to buffer management and timing
- Handling different sample formats → Maps to PCM data representation (8/16/24/32-bit, float)
- User input without blocking audio → Maps to concurrent I/O design
Real World Outcome
You will have a fully functional command-line audio player that plays WAV files with responsive controls.
Example Session:
$ ./wavplay music.wav
WAV Player v1.0
──────────────────────────────────────────────────────
File: music.wav
Format: PCM, 44100 Hz, 16-bit, Stereo
Duration: 3:42 (9,790,200 frames)
Controls: [SPACE] Pause/Resume [←/→] Seek 5s [q] Quit
──────────────────────────────────────────────────────
Playing... ▶ 01:23 / 03:42 [█████████░░░░░░░░░░░░░░░░] 37%
^C
Playback stopped at 01:23.
$
What you see when it works correctly:
- File information display: Shows sample rate, bit depth, channels, and duration
- Progress bar: Updates in real-time (every 100ms or so)
- Responsive controls: Space pauses within 100 ms, seek moves playback position
- Clean shutdown: Ctrl+C or ‘q’ stops gracefully without audio pops
- Error handling: Clear messages for invalid files, unsupported formats, or device errors
What you hear:
- Smooth, uninterrupted playback with no clicks, pops, or dropouts
- Pause/resume without audio artifacts
- Seeks jump to the correct position without glitches
The Core Question You Are Answering
“How do computers actually produce sound from numbers?”
Before writing any code, sit with this question. Most programmers treat audio as a black box—call a library, pass some data, sound comes out. But you’re going to understand the entire chain: how discrete samples become continuous voltage, how buffers prevent stuttering, and why wrong byte order creates white noise instead of music.
The answer forces you to understand:
- Time-domain representation: Sound is pressure waves; we sample voltage at fixed intervals
- Sample rate: 44100 Hz means 44100 amplitude values per second per channel
- Bit depth: Each sample’s precision (16-bit = 65536 amplitude levels)
- Double buffering: While hardware plays buffer A, software fills buffer B
Concepts You Must Understand First
Stop and research these before coding:
PCM Audio Representation
- What is the Nyquist frequency and why does 44.1 kHz capture up to 22 kHz?
- How are samples interleaved for stereo? (L R L R L R…)
- What does “signed 16-bit little-endian” mean for a sample value?
- Book Reference: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Ch. 2
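To make the interleaving and byte-order questions concrete, here is a minimal sketch of decoding signed 16-bit little-endian samples and splitting an interleaved stereo stream. The function names `decode_s16le` and `deinterleave_s16le_stereo` are illustrative and not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one signed 16-bit little-endian sample from two raw bytes.
 * Building the value from individual bytes gives the same result on
 * little-endian and big-endian hosts. */
int16_t decode_s16le(const uint8_t *p)
{
    int32_t v = p[0] | (p[1] << 8);   /* low byte is stored first */
    if (v >= 0x8000)
        v -= 0x10000;                 /* map 0x8000..0xFFFF to negatives */
    return (int16_t)v;
}

/* Split interleaved stereo (L R L R ...) into separate channel arrays.
 * `raw` must hold frames * 4 bytes; `left` and `right` must each hold
 * `frames` samples. */
void deinterleave_s16le_stereo(const uint8_t *raw, size_t frames,
                               int16_t *left, int16_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i]  = decode_s16le(raw + i * 4);
        right[i] = decode_s16le(raw + i * 4 + 2);
    }
}
```

A player that hands S16_LE data straight to the device never needs to deinterleave. The split is only useful for things like per-channel level meters.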
The RIFF/WAV File Format
- What are RIFF chunks and how do you navigate them?
- What fields are in the “fmt ” sub-chunk?
- Where does the actual audio data start?
- Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 4 (File I/O: The Universal I/O Model)
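As a reference point for the format-chunk question, the sketch below parses the 16-byte body of a plain PCM format chunk. The struct and function names are assumptions made for illustration. It does not handle the longer WAVE_FORMAT_EXTENSIBLE layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 16-byte PCM "fmt " chunk body, in file order. */
typedef struct {
    uint16_t format_tag;       /* 1 = integer PCM, 3 = IEEE float */
    uint16_t channels;
    uint32_t sample_rate;      /* frames per second */
    uint32_t byte_rate;        /* sample_rate * block_align */
    uint16_t block_align;      /* bytes per frame */
    uint16_t bits_per_sample;
} wav_fmt;

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a "fmt " chunk body. Returns 0 on success, -1 if the body is too
 * short or the fields contradict each other. */
int parse_fmt(const uint8_t *body, size_t len, wav_fmt *out)
{
    if (len < 16)
        return -1;
    out->format_tag      = rd16(body);
    out->channels        = rd16(body + 2);
    out->sample_rate     = rd32(body + 4);
    out->byte_rate       = rd32(body + 8);
    out->block_align     = rd16(body + 12);
    out->bits_per_sample = rd16(body + 14);

    if (out->channels == 0 || out->sample_rate == 0 || out->bits_per_sample == 0)
        return -1;
    /* Each sample occupies a whole number of bytes in the file. */
    uint32_t bytes_per_sample = (out->bits_per_sample + 7u) / 8u;
    if (out->block_align != out->channels * bytes_per_sample)
        return -1;
    return 0;
}
```

Reading fields byte by byte, rather than casting the buffer to a struct, avoids both struct padding and host byte-order problems.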
Audio Hardware Interfaces
- What is a sound card’s sample buffer and how do you write to it?
- What causes audio underruns and how do you prevent them?
- What are period size and buffer size in ALSA terminology?
- Book Reference: ALSA Project Documentation (alsa-project.org)
Real-Time Constraints
- How much data must you deliver per second for 44.1 kHz stereo 16-bit? (176,400 bytes/sec)
- What’s the maximum latency before audio stutters?
- How do you balance latency vs. CPU efficiency?
- Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 23 (Timers)
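The arithmetic behind these questions fits in two small helpers. This is a sketch with illustrative names. It assumes samples occupy whole bytes.

```c
#include <stdint.h>

/* Bytes the player must deliver every second to keep the device fed. */
uint32_t bytes_per_second(uint32_t rate_hz, uint32_t channels, uint32_t bits)
{
    return rate_hz * channels * (bits / 8);
}

/* How long `frames` frames of audio last at `rate_hz`, in microseconds.
 * Applied to the device buffer size, this is both the worst-case output
 * latency and the longest the player can stall before an underrun. */
uint64_t frames_to_usec(uint64_t frames, uint32_t rate_hz)
{
    return frames * 1000000u / rate_hz;
}
```

Calling `frames_to_usec()` with your chosen buffer size shows the trade-off directly: a larger buffer tolerates longer stalls but makes pause and seek feel slower.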
Questions to Guide Your Design
Before implementing, think through these:
File Parsing Strategy
- Will you load the entire file into memory or stream from disk?
- How will you handle WAV files with extra chunks (metadata, cue points)?
- What if the “data” chunk doesn’t immediately follow “fmt ”?
- How will you validate the file is actually a WAV and not corrupted?
Audio Output Architecture
- What sample format will you request from the audio device?
- How large should your audio buffer be? (Latency vs. underrun risk)
- How will you handle the audio device being busy or unavailable?
- Will you convert sample formats or require specific input formats?
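If you decide to convert in software, the two cases below are the ones that most often cause trouble. In WAV files, 8-bit samples are unsigned and 24-bit samples are packed into three bytes. This is a sketch with illustrative function names. It converts everything to signed 16-bit and discards the extra precision of 24-bit input.

```c
#include <stdint.h>

/* 8-bit WAV samples are unsigned with silence at 128. Recentre them on
 * zero and scale into the signed 16-bit range. */
int16_t u8_to_s16(uint8_t s)
{
    return (int16_t)((s - 128) * 256);
}

/* 24-bit WAV samples are three little-endian bytes. Sign-extend to 32
 * bits, then scale down by 256 (truncating toward zero). */
int16_t s24le_to_s16(const uint8_t *p)
{
    int32_t v = (int32_t)p[0] | ((int32_t)p[1] << 8) | ((int32_t)p[2] << 16);
    if (v & 0x800000)
        v -= 0x1000000;   /* top bit set: the value is negative */
    return (int16_t)(v / 256);
}
```

The alternative is to ask the device for the file's native format and skip conversion. That is simpler when the hardware or the ALSA plugin layer supports it.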
Playback Control
- How will you read keyboard input without blocking audio output?
- How will you implement seek? (File position + buffer flush)
- What happens to partially-filled buffers on pause?
- How will you calculate and display the current playback position?
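One way to approach seek and position display is to track everything in frames and convert to bytes or seconds only at the edges. The helpers below are a sketch with illustrative names. `data_start` stands for whatever file offset your parser found for the first audio byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Target frame for a relative seek of `delta_sec` seconds, clamped to
 * the range [0, total_frames]. */
uint64_t seek_relative(uint64_t cur_frame, int delta_sec,
                       uint32_t sample_rate, uint64_t total_frames)
{
    int64_t target = (int64_t)cur_frame +
                     (int64_t)delta_sec * (int64_t)sample_rate;
    if (target < 0)
        target = 0;
    if ((uint64_t)target > total_frames)
        target = (int64_t)total_frames;
    return (uint64_t)target;
}

/* File offset of a frame. Multiplying by block_align keeps the offset on
 * a frame boundary, so channels cannot swap after a seek. */
uint64_t frame_to_file_offset(uint64_t data_start, uint64_t frame,
                              uint32_t block_align)
{
    return data_start + frame * block_align;
}

/* Write a frame count as mm:ss into buf. */
void format_position(uint64_t frames, uint32_t sample_rate,
                     char *buf, size_t n)
{
    uint64_t secs = frames / sample_rate;
    snprintf(buf, n, "%02u:%02u",
             (unsigned)(secs / 60), (unsigned)(secs % 60));
}
```

Frames written to the device run ahead of frames actually heard by up to one device buffer. On ALSA, `snd_pcm_delay()` reports that gap if the display needs to be more exact.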
Concurrency Model
- Will you use threads, async I/O, or a single-threaded event loop?
- Who writes to the audio buffer: main thread or dedicated audio thread?
- How will you synchronize UI updates with playback position?
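If you choose a dedicated audio thread, the state it shares with the UI can be very small. The sketch below uses C11 atomics so that neither thread takes a lock. The `player_state` type and its functions are hypothetical, and it assumes only the UI thread ever changes the pause flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* State shared between the audio thread and the UI thread. */
typedef struct {
    atomic_bool paused;
    atomic_uint_fast64_t frames_played;
} player_state;

void player_init(player_state *s)
{
    atomic_init(&s->paused, false);
    atomic_init(&s->frames_played, 0);
}

/* Audio thread: call after each successful write to the device. */
void player_advance(player_state *s, uint64_t frames)
{
    atomic_fetch_add(&s->frames_played, frames);
}

/* UI thread: read the position for the progress display. */
uint64_t player_position(player_state *s)
{
    return atomic_load(&s->frames_played);
}

/* UI thread: flip the pause flag and return the new value. The audio
 * thread only reads the flag, so a plain load and store is enough. */
bool player_toggle_pause(player_state *s)
{
    bool now_paused = !atomic_load(&s->paused);
    atomic_store(&s->paused, now_paused);
    return now_paused;
}
```

A single-threaded loop built on `poll()` avoids shared state entirely. This pattern only matters once the audio writes move to their own thread.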
Thinking Exercise
Trace the Sample Path
Before coding, draw the complete path of a single audio sample from WAV file to speaker. Include:
- File offset where the sample lives
- Read buffer in your program’s memory
- Audio buffer (e.g., ALSA ring buffer)
- DMA transfer to the audio codec chip
- DAC conversion to analog voltage
- Amplifier and speaker
Questions while tracing:
- If the WAV file is 16-bit little-endian but your machine is big-endian, what happens?
- If you seek to position 1000000 bytes in the data chunk, what sample number is that for stereo 16-bit audio?
- If ALSA reports 4 periods of 1024 frames each, how much latency in milliseconds at 44.1 kHz?
The Interview Questions They Will Ask
Prepare to answer these:
- “Explain the difference between sample rate and bit depth. What happens if you play a 48 kHz file at 44.1 kHz?”
- “How would you debug an audio player that plays static instead of music?” (Hint: check byte order, sample format, channel count)
- “What is an audio buffer underrun? How do you prevent them without adding too much latency?”
- “Design an audio mixer that plays two WAV files simultaneously. What challenges arise?”
- “Why do audio applications need real-time scheduling? What’s the consequence of missing a deadline?”
- “How would you implement gapless playback between two audio files?”
Hints in Layers
Hint 1: Starting Point
Begin with the simplest possible case: hardcode 44.1 kHz, 16-bit, stereo. Don’t worry about other formats initially. Read the file in chunks (e.g., 16KB) and write to the audio device in a loop. Get any sound playing first.
Hint 2: WAV Parsing Structure
The WAV file structure:
Bytes 0-3: "RIFF"
Bytes 4-7: File size - 8
Bytes 8-11: "WAVE"
Bytes 12+: Chunks...
Each chunk: 4-byte ID, 4-byte size (little-endian), then data, padded with one extra byte when the size is odd. Find “fmt ” for format info, “data” for audio samples.
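The chunk walk described above can be sketched as follows. For brevity it searches a file image already held in memory, and the name `find_chunk` is illustrative. A streaming player would do the same walk with `fread()` and `fseek()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Search a RIFF/WAVE image for the chunk with the given 4-character ID.
 * On success, returns a pointer to the chunk body and stores its length
 * in *size. Returns NULL if the header is not RIFF/WAVE, or if the chunk
 * is missing or truncated. */
const uint8_t *find_chunk(const uint8_t *buf, size_t len,
                          const char id[4], uint32_t *size)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return NULL;

    size_t pos = 12;                      /* first chunk follows "WAVE" */
    while (pos + 8 <= len) {
        uint32_t sz = rd32le(buf + pos + 4);
        if (sz > len - pos - 8)
            return NULL;                  /* body runs past end of image */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size = sz;
            return buf + pos + 8;
        }
        /* Skip header and body, plus the pad byte after an odd size. */
        pos += 8 + (size_t)sz + (sz & 1);
    }
    return NULL;
}
```

Because unknown chunks are skipped by size rather than by name, files with metadata or cue chunks before the audio data are handled without special cases.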
Hint 3: ALSA Configuration Pattern
Pseudocode for ALSA setup:
open_pcm_device("default", PLAYBACK)
set_hw_params:
    access = INTERLEAVED
    format = S16_LE
    channels = 2
    rate = 44100
    period_size = 1024 frames
    buffer_size = 4096 frames
prepare_device()
while (samples_remaining):
    frames = read_from_file(buffer, period_size * frame_size) / frame_size
    write_to_device(buffer, frames)    # the last read may be shorter than a period
drain_device()                         # let queued audio finish before closing
close_device()
Hint 4: Non-Blocking Input
Use select() or poll() to check stdin for keystrokes while audio plays:
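struct pollfd poll_fds[1] = {
    { .fd = STDIN_FILENO, .events = POLLIN }   // stdin; needs <poll.h> and <unistd.h>
};
// A timeout of 0 makes poll() return immediately instead of waiting.
if (poll(poll_fds, 1, 0) > 0 && (poll_fds[0].revents & POLLIN)) {
    read_key_and_handle();   // your handler: read() one byte and act on it
}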
Set terminal to raw mode with tcsetattr() to get single keystrokes.
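A minimal sketch of the terminal setup follows, with illustrative function names. It disables only line buffering and echo rather than switching to full raw mode, so Ctrl+C still delivers SIGINT as in the example session.

```c
#include <termios.h>
#include <unistd.h>

/* Switch terminal `fd` to unbuffered, no-echo input so each keypress can
 * be read immediately. The previous settings are stored in *saved.
 * Returns 0 on success, -1 if fd is not a terminal or a call fails. */
int enter_key_mode(int fd, struct termios *saved)
{
    struct termios t;
    if (tcgetattr(fd, saved) == -1)
        return -1;
    t = *saved;
    t.c_lflag &= ~(tcflag_t)(ICANON | ECHO);  /* no line buffering, no echo */
    t.c_cc[VMIN] = 1;     /* read() returns once one byte is available */
    t.c_cc[VTIME] = 0;    /* no inter-byte timeout */
    return tcsetattr(fd, TCSAFLUSH, &t);
}

/* Restore the settings saved by enter_key_mode(). Call this on every
 * exit path, otherwise the shell is left without echo. */
int leave_key_mode(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSAFLUSH, saved);
}
```

Arrow keys arrive as multi-byte escape sequences (ESC, `[`, then `C` or `D` for right and left), so the key handler needs to read more than one byte for them.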
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Non-Blocking Input (poll/select) | “The Linux Programming Interface” by Michael Kerrisk | Ch. 63 (Alternative I/O Models) |
| Binary File Parsing | “C Programming: A Modern Approach” by K. N. King | Ch. 22 (Input/Output) |
| Low-Level I/O | “Advanced Programming in the UNIX Environment” by Stevens | Ch. 3, 14 |
| Real-Time Considerations | “The Linux Programming Interface” by Michael Kerrisk | Ch. 22, 23 |
| PCM Audio Concepts | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 2 (Data Representations) |
Common Pitfalls and Debugging
Problem 1: “I hear static/noise instead of music”
- Why: Wrong sample format or byte order. Most common: treating unsigned as signed, or big-endian as little-endian.
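- Fix: Verify the WAV header reports integer PCM (format tag 1) with 16 bits per sample, which corresponds to ALSA’s S16_LE (signed 16-bit little-endian). Check your ALSA format matches exactly.
- Quick test: Run xxd music.wav | head -20. In a quiet passage of a 16-bit file, the high byte of each sample should be 0x00 or 0xFF (small positive or negative values), not random-looking bytes.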
Problem 2: “Audio stutters or has periodic clicks”
- Why: Buffer underrun. You’re not writing samples fast enough.
- Fix: Increase buffer size (add latency) or reduce period size (more frequent, smaller writes). Check for slow file I/O or CPU spikes.
- Quick test: Run LIBASOUND_DEBUG=1 ./wavplay to surface ALSA diagnostics. Also log any write that returns -EPIPE, which is how ALSA signals an underrun.
Problem 3: “No sound at all, but no errors”
- Why: Wrong audio device, or samples are silent (all zeros), or system mixer is muted.
- Fix: Try aplay -D default music.wav first. Check alsamixer for muted channels. Print the first 20 sample values to verify they’re non-zero.
- Quick test: aplay -l lists available sound cards.
Problem 4: “Playback is too fast/slow (chipmunk or slow-mo effect)”
- Why: Sample rate mismatch. You’re telling ALSA 44100 but the file is 48000, or vice versa.
- Fix: Read the sample rate from the WAV header and configure ALSA to match.
- Quick test: Print the sample rate parsed from the WAV header.
Problem 5: “Program hangs when I press a key”
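- Why: The terminal is in canonical (line-buffered) mode, so input is held until Enter. Or you’re reading stdin in blocking mode.
- Fix: Set terminal to raw mode with tcsetattr(). Use poll() or select() for non-blocking input.
- Quick test: Check if single keypresses work in raw mode: stty raw && cat. Raw mode also disables Ctrl+C, so you may need to kill cat from another terminal. Run stty sane afterward to restore the terminal.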
Definition of Done
- Plays 16-bit 44.1 kHz stereo WAV files without audible artifacts
- Correctly parses WAV headers and extracts format information
- Displays file info, playback position, and duration
- Space bar pauses and resumes playback within 100ms
- Left/Right arrows seek backward/forward by 5 seconds
- Quit key stops playback cleanly without audio pop
- Handles WAV files with extra metadata chunks (skips them)
- Reports clear errors for invalid/unsupported files
- Works on files from a few seconds to several hours in length
- No memory leaks (verified with Valgrind)
References
- Main guide: LEARN_C_MP3_PLAYER_FROM_SCRATCH.md
- ALSA Project Documentation
- Introduction to Sound Programming with ALSA — Linux Journal
- Audio File Format Specifications — Library of Congress