Project 1: The WAV Player

Build the audio output foundation: a command-line WAV file player that streams uncompressed audio to your speakers with pause, resume, and seek functionality.


Quick Reference

Attribute                           Value
File                                P01-the-wav-player.md
Main Programming Language           C
Alternative Programming Languages   Rust, C++, Zig
Coolness Level                      Level 3: Genuinely Clever
Business Potential                  1. The “Resume Gold”
Difficulty                          Level 3: Advanced (The Engineer)
Knowledge Area                      Audio Systems, Systems Programming
Software or Tool                    ALSA (Linux), CoreAudio (macOS), WASAPI (Windows)
Main Book                           “The Linux Programming Interface” by Michael Kerrisk

What You Will Build

A command-line WAV file player that streams uncompressed audio to your speakers with pause, resume, and seek functionality.


Why It Teaches Audio Fundamentals

Before decoding MP3, you must master audio output. WAV files typically hold uncompressed PCM, the same representation audio hardware consumes. Building a WAV player teaches you sample formats, audio APIs, and real-time streaming without codec complexity.


Core Challenges You Will Face

  • Parsing the RIFF/WAV container → Maps to binary file parsing and chunk navigation
  • Configuring audio hardware → Maps to platform audio APIs and device parameters
  • Real-time streaming without underruns → Maps to buffer management and timing
  • Handling different sample formats → Maps to PCM data representation (8/16/24/32-bit, float)
  • User input without blocking audio → Maps to concurrent I/O design

Real World Outcome

You will have a fully functional command-line audio player that plays WAV files with responsive controls.

Example Session:

$ ./wavplay music.wav

WAV Player v1.0
──────────────────────────────────────────────────────
File: music.wav
Format: PCM, 44100 Hz, 16-bit, Stereo
Duration: 3:42 (9,790,200 sample frames)

Controls: [SPACE] Pause/Resume  [←/→] Seek 5s  [q] Quit
──────────────────────────────────────────────────────

Playing... ▶ 01:23 / 03:42  [█████████░░░░░░░░░░░░░░░░] 37%

^C
Playback stopped at 01:23.
$

What you see when it works correctly:

  1. File information display: Shows sample rate, bit depth, channels, and duration
  2. Progress bar: Updates in real-time (every 100ms or so)
  3. Responsive controls: Space pauses within 100 ms, seek moves playback position
  4. Clean shutdown: Ctrl+C or ‘q’ stops gracefully without audio pops
  5. Error handling: Clear messages for invalid files, unsupported formats, or device errors

What you hear:

  • Smooth, uninterrupted playback with no clicks, pops, or dropouts
  • Pause/resume without audio artifacts
  • Seeks jump to the correct position without glitches

The Core Question You Are Answering

“How do computers actually produce sound from numbers?”

Before writing any code, sit with this question. Most programmers treat audio as a black box—call a library, pass some data, sound comes out. But you’re going to understand the entire chain: how discrete samples become continuous voltage, how buffers prevent stuttering, and why wrong byte order creates white noise instead of music.

The answer forces you to understand:

  • Time-domain representation: Sound is pressure waves; we sample voltage at fixed intervals
  • Sample rate: 44100 Hz means 44100 amplitude values per second per channel
  • Bit depth: Each sample’s precision (16-bit = 65536 amplitude levels)
  • Double buffering: While hardware plays buffer A, software fills buffer B

Concepts You Must Understand First

Stop and research these before coding:

PCM Audio Representation

  • What is the Nyquist frequency and why does 44.1 kHz capture frequencies up to about 22.05 kHz?
  • How are samples interleaved for stereo? (L R L R L R…)
  • What does “signed 16-bit little-endian” mean for a sample value?
  • Book Reference: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Ch. 2
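
To make “signed 16-bit little-endian” and interleaving concrete, here is a minimal sketch that assembles one sample from two raw bytes and picks a channel out of an interleaved stream. The function names decode_s16le and sample_at are illustrative, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble one signed 16-bit sample from two little-endian bytes.
 * The bytes are combined explicitly, so this works on any host byte order. */
int16_t decode_s16le(const uint8_t *p)
{
    uint16_t u = (uint16_t)(p[0] | ((uint16_t)p[1] << 8)); /* low byte first */
    /* Map 0x8000..0xFFFF onto -32768..-1 without relying on cast behavior. */
    return (int16_t)((int)(u ^ 0x8000u) - 0x8000);
}

/* Read the sample for `channel` of frame `frame` from interleaved
 * 16-bit data (L R L R ... for stereo). */
int16_t sample_at(const uint8_t *pcm, size_t frame,
                  unsigned channel, unsigned channels)
{
    size_t index = frame * channels + channel; /* position in sample units */
    return decode_s16le(pcm + index * 2);      /* 2 bytes per sample */
}
```

For stereo data, sample_at(pcm, 10, 1, 2) would return the right-channel sample of frame 10. Reading the same two bytes in the opposite order is what turns music into noise.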

The RIFF/WAV File Format

  • What are RIFF chunks and how do you navigate them?
  • What fields are in the “fmt ” sub-chunk?
  • Where does the actual audio data start?
  • Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 4 (File I/O: The Universal I/O Model)
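
As one possible answer to the “fmt ” question, the sketch below decodes the 16-byte payload that a plain PCM file carries. The struct and function names are hypothetical. It assumes the payload has already been located and read into memory, and it does not handle the extended layouts (such as WAVE_FORMAT_EXTENSIBLE) that append more fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 16-byte PCM "fmt " payload, in file order. */
struct wav_fmt {
    uint16_t audio_format;    /* 1 = integer PCM, 3 = IEEE float */
    uint16_t channels;
    uint32_t sample_rate;     /* frames per second */
    uint32_t byte_rate;       /* sample_rate * block_align */
    uint16_t block_align;     /* bytes per frame (all channels) */
    uint16_t bits_per_sample;
};

uint16_t rd_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

uint32_t rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the payload is too short or inconsistent. */
int parse_fmt(const uint8_t *payload, size_t size, struct wav_fmt *out)
{
    if (size < 16)
        return -1;
    out->audio_format    = rd_u16le(payload + 0);
    out->channels        = rd_u16le(payload + 2);
    out->sample_rate     = rd_u32le(payload + 4);
    out->byte_rate       = rd_u32le(payload + 8);
    out->block_align     = rd_u16le(payload + 12);
    out->bits_per_sample = rd_u16le(payload + 14);
    if (out->channels == 0 || out->sample_rate == 0)
        return -1;
    /* For uncompressed data, one frame is channels * bytes-per-sample. */
    if (out->audio_format == 1 || out->audio_format == 3) {
        uint32_t expect = (uint32_t)out->channels *
                          ((out->bits_per_sample + 7u) / 8u);
        if (out->block_align != expect)
            return -1;
    }
    return 0;
}
```

Reading each field byte by byte avoids depending on struct padding or host endianness, which is why the sketch does not fread() directly into the struct.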

Audio Hardware Interfaces

  • What is a sound card’s sample buffer and how do you write to it?
  • What causes audio underruns and how do you prevent them?
  • What are period size and buffer size in ALSA terminology?
  • Book Reference: ALSA Project Documentation (alsa-project.org)

Real-Time Constraints

  • How much data must you deliver per second for 44.1 kHz stereo 16-bit? (176,400 bytes/sec)
  • What’s the maximum latency before audio stutters?
  • How do you balance latency vs. CPU efficiency?
  • Book Reference: “The Linux Programming Interface” by Michael Kerrisk - Ch. 23 (Timers)
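
The data-rate arithmetic above can be sketched as two small helpers. Both names are made up for illustration. The second one turns a latency target into a byte count, rounded up to whole frames.

```c
#include <stdint.h>

/* Bytes the device consumes per second of playback. */
uint32_t pcm_byte_rate(uint32_t rate, uint32_t channels, uint32_t bits)
{
    return rate * channels * (bits / 8);
}

/* Bytes needed to cover `ms` milliseconds of playback,
 * rounded up to a whole number of frames. */
uint32_t bytes_for_ms(uint32_t ms, uint32_t rate,
                      uint32_t channels, uint32_t bits)
{
    uint32_t frame_size = channels * (bits / 8);
    uint64_t frames = ((uint64_t)rate * ms + 999) / 1000; /* round up */
    return (uint32_t)(frames * frame_size);
}
```

For 44.1 kHz stereo 16-bit audio, pcm_byte_rate gives 176,400 bytes per second, and a 100 ms cushion works out to 17,640 bytes.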

Questions to Guide Your Design

Before implementing, think through these:

File Parsing Strategy

  • Will you load the entire file into memory or stream from disk?
  • How will you handle WAV files with extra chunks (metadata, cue points)?
  • What if the “data” chunk doesn’t immediately follow “fmt ”?
  • How will you validate the file is actually a WAV and not corrupted?

Audio Output Architecture

  • What sample format will you request from the audio device?
  • How large should your audio buffer be? (Latency vs. underrun risk)
  • How will you handle the audio device being busy or unavailable?
  • Will you convert sample formats or require specific input formats?
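
If you decide to convert everything to 16-bit before writing to the device, the conversions might look like the sketch below. The function names are hypothetical. The 24-bit path simply drops the low byte (no dithering), which is an assumption made for brevity rather than a recommendation.

```c
#include <stdint.h>

/* 8-bit WAV samples are unsigned, with 128 as the zero level.
 * Shift to signed and scale up to the 16-bit range. */
int16_t u8_to_s16(uint8_t s)
{
    return (int16_t)(((int)s - 128) * 256);
}

/* 24-bit samples are signed little-endian in 3 bytes.
 * Keep the two most significant bytes and discard the lowest one. */
int16_t s24le_to_s16(const uint8_t *p)
{
    uint16_t u = (uint16_t)(p[1] | ((uint16_t)p[2] << 8));
    /* Map 0x8000..0xFFFF onto -32768..-1 without relying on cast behavior. */
    return (int16_t)((int)(u ^ 0x8000u) - 0x8000);
}
```

The alternative design is to pass the file's native format straight to the device and reject formats it does not accept, which avoids conversion code at the cost of supporting fewer files.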

Playback Control

  • How will you read keyboard input without blocking audio output?
  • How will you implement seek? (File position + buffer flush)
  • What happens to partially-filled buffers on pause?
  • How will you calculate and display the current playback position?
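
One way to approach the seek and position questions is to track playback in frames and convert only at the edges. The sketch below uses hypothetical helper names and assumes you already know the data chunk's starting offset and the frame size (block align) from the header.

```c
#include <stdint.h>

/* Frame index after seeking `delta_s` seconds from `cur`
 * (negative = backward), clamped to the range [0, total]. */
uint64_t seek_target_frame(uint64_t cur, int delta_s,
                           uint32_t rate, uint64_t total)
{
    int64_t t = (int64_t)cur + (int64_t)delta_s * (int64_t)rate;
    if (t < 0)
        t = 0;
    if ((uint64_t)t > total)
        t = (int64_t)total;
    return (uint64_t)t;
}

/* Absolute file offset of a frame. Working in whole frames keeps
 * the read position aligned, so channels never swap after a seek. */
uint64_t frame_to_file_offset(uint64_t data_start, uint64_t frame,
                              uint32_t block_align)
{
    return data_start + frame * block_align;
}

/* Whole seconds elapsed at a given frame, for the position display. */
uint32_t frame_to_seconds(uint64_t frame, uint32_t rate)
{
    return (uint32_t)(frame / rate);
}
```

Note that the frame you have most recently read from the file is ahead of what the listener hears by whatever is still queued in the device buffer. On ALSA, snd_pcm_delay() reports that queued amount, so subtracting it gives a more accurate display.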

Concurrency Model

  • Will you use threads, async I/O, or a single-threaded event loop?
  • Who writes to the audio buffer: main thread or dedicated audio thread?
  • How will you synchronize UI updates with playback position?

Thinking Exercise

Trace the Sample Path

Before coding, draw the complete path of a single audio sample from WAV file to speaker. Include:

  1. File offset where the sample lives
  2. Read buffer in your program’s memory
  3. Audio buffer (e.g., ALSA ring buffer)
  4. DMA transfer to the audio codec chip
  5. DAC conversion to analog voltage
  6. Amplifier and speaker

Questions while tracing:

  • If the WAV file is 16-bit little-endian but your machine is big-endian, what happens?
  • If you seek to position 1000000 bytes in the data chunk, what sample number is that for stereo 16-bit audio?
  • If ALSA reports 4 periods of 1024 frames each, how much latency in milliseconds at 44.1 kHz?
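
Once you have worked the last two questions by hand, a sketch like the following can check your arithmetic. The helper names are illustrative only.

```c
#include <stdint.h>

/* Frame index that a byte offset inside the data chunk falls in. */
uint64_t data_offset_to_frame(uint64_t offset, uint32_t channels,
                              uint32_t bits)
{
    uint32_t frame_size = channels * (bits / 8); /* bytes per frame */
    return offset / frame_size;
}

/* Worst-case latency of a full ring buffer, in milliseconds. */
double ring_latency_ms(uint32_t periods, uint32_t period_frames,
                       uint32_t rate)
{
    return 1000.0 * periods * period_frames / rate;
}
```

Call data_offset_to_frame(1000000, 2, 16) and ring_latency_ms(4, 1024, 44100) and compare the results with your own numbers.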

The Interview Questions They Will Ask

Prepare to answer these:

  1. “Explain the difference between sample rate and bit depth. What happens if you play a 48 kHz file at 44.1 kHz?”

  2. “How would you debug an audio player that plays static instead of music?” (Hint: check byte order, sample format, channel count)

  3. “What is an audio buffer underrun? How do you prevent them without adding too much latency?”

  4. “Design an audio mixer that plays two WAV files simultaneously. What challenges arise?”

  5. “Why do audio applications need real-time scheduling? What’s the consequence of missing a deadline?”

  6. “How would you implement gapless playback between two audio files?”


Hints in Layers

Hint 1: Starting Point

Begin with the simplest possible case: hardcode 44.1 kHz, 16-bit, stereo. Don’t worry about other formats initially. Read the file in chunks (e.g., 16KB) and write to the audio device in a loop. Get any sound playing first.

Hint 2: WAV Parsing Structure

The WAV file structure:

Bytes 0-3:   "RIFF"
Bytes 4-7:   File size - 8
Bytes 8-11:  "WAVE"
Bytes 12+:   Chunks...

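Each chunk: 4-byte ID, 4-byte size (little-endian), then data. A chunk with an odd size is followed by one padding byte, so the next chunk always starts on an even offset. Find “fmt ” for format info, “data” for audio samples.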
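
The chunk walk described above can be sketched as follows. This version assumes the whole file is already in memory, which is a simplification. A streaming player would do the same walk with fread() and fseek(). The name find_chunk is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uint32_t rd_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Find chunk `id` (4 characters) in a RIFF/WAVE image held in memory.
 * Returns the offset of the chunk payload and stores its size, or
 * returns -1 if the header is invalid or the chunk is missing. */
long find_chunk(const uint8_t *buf, size_t len, const char *id,
                uint32_t *size_out)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    size_t pos = 12;                       /* first chunk header */
    while (pos + 8 <= len) {
        uint32_t size = rd_u32le(buf + pos + 4);
        size_t payload = pos + 8;
        if (size > len - payload)
            return -1;                     /* chunk runs past end of file */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return (long)payload;
        }
        pos = payload + size + (size & 1u); /* skip payload and pad byte */
    }
    return -1;
}
```

Calling find_chunk(buf, len, "fmt ", &size) and then find_chunk(buf, len, "data", &size) works regardless of chunk order or extra metadata chunks. This sketch rejects a chunk whose declared size exceeds the file. A more forgiving player might clamp the size instead.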

Hint 3: ALSA Configuration Pattern

Pseudocode for ALSA setup:

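open_pcm_device("default", PLAYBACK)
set_hw_params:
    access = INTERLEAVED
    format = S16_LE
    channels = 2
    rate = 44100
    period_size = 1024 frames
    buffer_size = 4096 frames
prepare_device()

while (frames_remaining > 0):
    bytes = read_from_file(buffer, period_size * frame_size)
    if bytes == 0: break                     // end of the data chunk
    frames = bytes / frame_size              // the last read may be short
    result = write_to_device(buffer, frames)
    if result == UNDERRUN:                   // ALSA reports this as -EPIPE
        prepare_device()                     // recover, then keep writing
    frames_remaining -= frames

drain_device()                               // let queued audio finish playing
close_device()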

Hint 4: Non-Blocking Input

Use select() or poll() to check stdin for keystrokes while audio plays:

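// Requires <poll.h> and <unistd.h>
struct pollfd poll_fds[1] = { { .fd = STDIN_FILENO, .events = POLLIN } };
if (poll(poll_fds, 1, 0) > 0 &&            // 0 ms timeout = return immediately
    (poll_fds[0].revents & POLLIN)) {
    read_key_and_handle();                 // placeholder for your key handler
}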

Set the terminal to raw (non-canonical) mode with tcsetattr() to get single keystrokes, and restore the saved settings before the program exits.
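
A minimal sketch of the terminal setup, assuming a POSIX system with termios. The function names are hypothetical. Rather than a full raw mode, it only disables line buffering and echo, so Ctrl+C still raises SIGINT. That is a design choice of this sketch, not a requirement.

```c
#include <termios.h>
#include <unistd.h>

/* Derive single-keystroke settings from `base` without touching the terminal. */
struct termios keypress_mode(struct termios base)
{
    base.c_lflag &= ~(tcflag_t)(ICANON | ECHO); /* no line buffering, no echo */
    base.c_cc[VMIN] = 1;   /* read() returns once one byte is available */
    base.c_cc[VTIME] = 0;  /* no inter-byte timer */
    return base;
}

/* Save the current settings of `fd` and switch to keypress mode.
 * Returns 0 on success, -1 on failure (for example, fd is not a terminal). */
int enter_keypress_mode(int fd, struct termios *saved)
{
    if (tcgetattr(fd, saved) != 0)
        return -1;
    struct termios t = keypress_mode(*saved);
    return tcsetattr(fd, TCSANOW, &t);
}

/* Restore the settings captured by enter_keypress_mode(). */
int leave_keypress_mode(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSANOW, saved);
}
```

Call enter_keypress_mode(STDIN_FILENO, &saved) at startup and leave_keypress_mode(STDIN_FILENO, &saved) on every exit path, including signal-driven ones, or the shell is left without echo. Arrow keys arrive as multi-byte escape sequences (ESC [ C and ESC [ D on most terminals), so the key handler has to read more than one byte for them.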


Books That Will Help

Topic                      Book                                                                        Chapter
ALSA Programming           “The Linux Programming Interface” by Michael Kerrisk                        Ch. 63 (Alternative I/O Models)
Binary File Parsing        “C Programming: A Modern Approach” by K. N. King                            Ch. 22 (Input/Output)
Low-Level I/O              “Advanced Programming in the UNIX Environment” by Stevens                   Ch. 3, 14
Real-Time Considerations   “The Linux Programming Interface” by Michael Kerrisk                        Ch. 22, 23
PCM Audio Concepts         “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron       Ch. 2 (Data Representations)

Common Pitfalls and Debugging

Problem 1: “I hear static/noise instead of music”

  • Why: Wrong sample format or byte order. Most common: treating unsigned as signed, or big-endian as little-endian.
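  • Fix: Verify that the “fmt ” chunk reports audio format 1 (PCM) and 16 bits per sample. 16-bit PCM in a WAV file is signed little-endian, which is S16_LE in ALSA terms (8-bit WAV samples are unsigned). Check that your ALSA format matches exactly.
  • Quick test: xxd music.wav | head -20 dumps the header and the first samples. In 16-bit audio, near-silence shows up as byte pairs close to 00 00 or ff ff (small positive or negative values), not as random-looking bytes.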

Problem 2: “Audio stutters or has periodic clicks”

  • Why: Buffer underrun. You’re not writing samples fast enough.
  • Fix: Increase buffer size (add latency) or reduce period size (more frequent, smaller writes). Check for slow file I/O or CPU spikes.
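  • Quick test: Log every write that returns -EPIPE, which is how snd_pcm_writei() reports an underrun. Running LIBASOUND_DEBUG=1 ./wavplay may also surface additional alsa-lib diagnostics.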

Problem 3: “No sound at all, but no errors”

  • Why: Wrong audio device, or samples are silent (all zeros), or system mixer is muted.
  • Fix: Try aplay -D default music.wav first. Check alsamixer for muted channels. Print the first 20 sample values to verify they’re non-zero.
  • Quick test: aplay -l lists available sound cards.

Problem 4: “Playback is too fast/slow (chipmunk or slow-mo effect)”

  • Why: Sample rate mismatch. You’re telling ALSA 44100 but the file is 48000, or vice versa.
  • Fix: Read the sample rate from the WAV header and configure ALSA to match.
  • Quick test: Print the sample rate parsed from the WAV header.

Problem 5: “Program hangs when I press a key”

  • Why: The terminal is in canonical (line-buffered) mode, so input is held until Enter. Or you’re reading stdin with a blocking read().
  • Fix: Set terminal to raw mode with tcsetattr(). Use poll() or select() for non-blocking input.
  • Quick test: Run stty -icanon -echo && cat and press keys. Each one should appear immediately. Press Ctrl+C to exit, then run stty sane to restore the terminal.

Definition of Done

  • Plays 16-bit 44.1 kHz stereo WAV files without audible artifacts
  • Correctly parses WAV headers and extracts format information
  • Displays file info, playback position, and duration
  • Space bar pauses and resumes playback within 100ms
  • Left/Right arrows seek backward/forward by 5 seconds
  • Quit key stops playback cleanly without audio pop
  • Handles WAV files with extra metadata chunks (skips them)
  • Reports clear errors for invalid/unsupported files
  • Works on files from a few seconds to several hours in length
  • No memory leaks (verified with Valgrind)

References