Project 6: Audio Capture and Playback Pipeline

Record audio via USB sound card, store WAV files safely, and play them back without dropouts.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	1–2 weekends
Main Programming Language	Python
Alternative Programming Languages	C, Go, Rust
Coolness Level	Medium
Business Potential	Medium
Prerequisites	Linux CLI, file I/O basics, USB device knowledge
Key Topics	ALSA PCM, buffering, WAV format, storage durability

1. Learning Objectives

By completing this project, you will:

Capture audio from a USB sound card using ALSA.
Explain how buffer size affects latency and dropouts.
Save deterministic WAV files and verify their integrity.
Detect and handle buffer underruns.

2. All Theory Needed (Per-Concept Breakdown)

Concept 1: ALSA PCM Streams, Buffering, and WAV Storage

Fundamentals

Audio capture on Linux uses ALSA (Advanced Linux Sound Architecture), which exposes PCM streams for recording and playback. A PCM stream is a sequence of frames, each containing samples for one or more channels. To record reliably, you must choose a sample rate, bit depth, and buffer size. Too small a buffer leads to underruns or overruns; too large adds latency. WAV files store raw PCM data plus a header describing the format. Understanding ALSA’s buffering model and the WAV header structure is essential for building a stable capture pipeline.

Deep Dive into the concept

ALSA presents audio devices as hardware and software abstractions. The “hardware” device (hw:1,0) exposes the real device with minimal software processing. The “plug” device (plughw:1,0) can perform automatic format conversion. For deterministic recording, you should configure a fixed sample rate (e.g., 44100 Hz), channel count (mono or stereo), and sample width (16-bit). ALSA buffers audio in frames; each frame contains one sample per channel. The buffer is divided into periods, which are chunks of audio processed by the driver. If your application does not read captured data fast enough, the buffer fills up (an overrun) and you lose audio samples. If you do not write playback data fast enough, the buffer runs empty (an underrun) and you hear audible clicks.

Buffer size and period size are key tuning parameters. A typical configuration might be a buffer of 4096 frames with periods of 1024 frames. Smaller buffers reduce latency but increase CPU overhead and risk dropouts. Larger buffers are safer but add latency. On a Pi Zero 2 W, CPU resources are limited, so you need to balance these factors. A robust pipeline measures underruns and adjusts buffer size until errors disappear under normal load. This is especially important if you are also running other services on the Pi.
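As a rough sketch (the helper names are hypothetical), the amount of audio a period or the whole buffer holds is just frames divided by sample rate. The full buffer bounds how much audio can be queued, so it sets the worst-case latency and also the time your application has to service the stream before an xrun.

```python
def frames_to_ms(frames: int, rate_hz: int) -> float:
    """Duration, in milliseconds, of `frames` frames at `rate_hz`."""
    return frames / rate_hz * 1000.0


def bytes_per_frame(channels: int, sample_width_bytes: int) -> int:
    """A frame holds one sample per channel."""
    return channels * sample_width_bytes


if __name__ == "__main__":
    RATE = 44100
    BUFFER_FRAMES = 4096  # example buffer size from the text
    PERIOD_FRAMES = 1024  # example period size from the text
    print(f"period: {frames_to_ms(PERIOD_FRAMES, RATE):.1f} ms")
    print(f"buffer: {frames_to_ms(BUFFER_FRAMES, RATE):.1f} ms")
    print(f"periods per buffer: {BUFFER_FRAMES // PERIOD_FRAMES}")
    print(f"bytes per period, 16-bit stereo: {PERIOD_FRAMES * bytes_per_frame(2, 2)}")
```

With the example values, one period holds about 23.2 ms of audio and the full 4096-frame buffer about 92.9 ms; the capture loop must keep up well within that window.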

When recording, you should write data in chunks rather than sample-by-sample to reduce overhead. A typical loop reads a period of frames, appends it to a file buffer, and flushes to disk periodically. Since SD cards are slower and more error-prone than SSDs, you should avoid excessive fsync calls, but still make sure the file is closed cleanly. WAV files use a RIFF header that includes the file length. Because the length is unknown at the start of recording, most implementations write a placeholder header, record data, then seek back to fill in the correct sizes. If your program crashes mid-recording, the header may still hold the placeholder sizes. A simple recovery strategy is to recompute the sizes from the actual file length and rewrite the header, or to use a tool that repairs WAV headers.
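A minimal repair sketch, assuming the file uses the canonical 44-byte PCM header (RIFF header, a 16-byte `fmt ` chunk, then the `data` chunk header at offset 36) with no extra chunks. `repair_wav_sizes` is a hypothetical helper, not a standard tool. It trims any trailing partial frame and rewrites the two size fields.

```python
import os
import struct

HEADER_SIZE = 44  # assumption: canonical PCM header (RIFF + 16-byte fmt chunk + data header)


def repair_wav_sizes(path: str) -> int:
    """Recompute RIFF and data sizes from the real file length; return the data size.

    Only handles the canonical 44-byte layout with no extra chunks.
    """
    with open(path, "r+b") as f:
        header = f.read(HEADER_SIZE)
        if (len(header) < HEADER_SIZE or header[0:4] != b"RIFF"
                or header[8:12] != b"WAVE" or header[36:40] != b"data"):
            raise ValueError("not a canonical 44-byte WAV header")
        (block_align,) = struct.unpack_from("<H", header, 32)  # bytes per frame
        if block_align == 0:
            raise ValueError("corrupt fmt chunk: block_align is 0")
        f.seek(0, os.SEEK_END)
        data_size = f.tell() - HEADER_SIZE
        data_size -= data_size % block_align  # drop a trailing partial frame
        f.truncate(HEADER_SIZE + data_size)
        f.seek(4)
        f.write(struct.pack("<I", 36 + data_size))  # RIFF chunk size = file size - 8
        f.seek(40)
        f.write(struct.pack("<I", data_size))  # data chunk size
    return data_size
```

Run it only on files produced by your own writer; files carrying LIST or other chunks before `data` need a parser that walks the chunk list instead.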

Detecting xruns (ALSA's collective term for underruns and overruns) is part of correctness. ALSA exposes stream status, and many libraries surface these conditions as errors or callbacks. You can also detect dropouts by comparing the number of frames expected from elapsed time with the number actually read. Logging xruns is essential because a recording that “sounds fine” may still contain gaps. A correct pipeline should report them and optionally retry or increase the buffer size.
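The frame-accounting check can be sketched as a small class. `DropoutMonitor` and its parameters are hypothetical; the clock is injectable so the logic can be tested without hardware. Two caveats: data can legitimately sit in the ALSA buffer, so `tolerance_frames` should be at least one buffer; and the sound card's clock drifts slightly against the system clock, so this is a heuristic, not an exact xrun counter.

```python
import time
from typing import Callable


class DropoutMonitor:
    """Heuristic dropout detector: frames read vs. frames implied by elapsed time.

    Call record(n) after every successful read of n frames. A deficit larger
    than `tolerance_frames` means audio was probably lost (e.g. an overrun).
    """

    def __init__(self, rate_hz: int, tolerance_frames: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_hz = rate_hz
        self.tolerance_frames = tolerance_frames
        self.clock = clock
        self.start = clock()  # create the monitor when the stream starts
        self.frames_read = 0
        self.events = 0
        self.lost_frames = 0  # estimate only

    def record(self, frames: int) -> int:
        """Account for `frames` just read; return the deficit seen before resyncing."""
        self.frames_read += frames
        expected = round((self.clock() - self.start) * self.rate_hz)
        deficit = expected - self.frames_read
        if deficit > self.tolerance_frames:
            self.events += 1
            self.lost_frames += deficit
            self.frames_read += deficit  # resync so one gap is counted once
        return deficit
```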

USB sound cards add another layer: they must be enumerated properly, and the Pi Zero 2 W’s USB power budget is limited. If the device disconnects, ALSA may return errors and the device file may disappear. Your program should detect these errors and fail with a clear message, not hang.
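A minimal pre-flight check, assuming the usual `/proc/asound/cards` layout: an index line such as ` 1 [Device         ]: USB-Audio - USB Audio Device` followed by an indented long name. The card number in `hw:1,0` is the first field. The helper names are hypothetical, and this does not replace handling errors from reads once the stream is open, since the device can vanish mid-recording.

```python
import re
from pathlib import Path

# " 1 [Device         ]: USB-Audio - USB Audio Device"
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[([^\]]*?)\s*\]:\s*(\S+)\s+-\s+(.+)$")


def list_cards(text: str) -> dict[int, str]:
    """Map ALSA card index -> "driver - name" from /proc/asound/cards contents."""
    cards = {}
    for line in text.splitlines():
        m = CARD_LINE.match(line)
        if m:  # indented long-name lines do not start with a digit and are skipped
            cards[int(m.group(1))] = f"{m.group(3)} - {m.group(4).strip()}"
    return cards


def card_present(card_index: int, proc_path: str = "/proc/asound/cards") -> bool:
    """True if the card exists; False if it is absent or ALSA is not loaded."""
    try:
        text = Path(proc_path).read_text()
    except OSError:
        return False
    return card_index in list_cards(text)
```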

How this fits into the project

This concept is used in §3 (requirements), §4 (architecture), and §5.10 (implementation). It also relates to Project 12 (log durability) when writing audio files safely.

Definitions & key terms

ALSA: Linux audio system.
PCM: Pulse-code modulation, raw audio samples.
Buffer/Period: ALSA buffering units.
Underrun/Overrun: Audio buffer starvation or overflow.
WAV: File format storing PCM data with RIFF headers.

Mental model diagram (ASCII)

Mic -> ALSA buffer (periods) -> App -> WAV file -> Playback

How it works (step-by-step, with invariants and failure modes)

Enumerate audio device and open PCM stream.
Set sample rate, channels, format.
Read periods in a loop.
Write frames to WAV, update header on close.

Failure modes:

Buffer overrun -> dropped samples.
USB disconnect -> ALSA errors.
Incorrect WAV header -> file unreadable.
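A minimal sketch of the read-periods/write-frames loop using Python's standard `wave` module, which writes the header with placeholder sizes on the first write and patches them when the file is closed. `record_to_wav` and `read_period` are hypothetical names; `read_period` stands in for the real capture call, such as the PyAudio `stream.read(1024)` shown in the minimal example.

```python
import wave
from typing import Callable


def record_to_wav(path: str, read_period: Callable[[], bytes], periods: int,
                  rate_hz: int = 44100, channels: int = 1, sample_width: int = 2) -> int:
    """Stream `periods` chunks from `read_period` into a WAV file; return frames written."""
    frames_written = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        for _ in range(periods):
            chunk = read_period()  # one period of interleaved PCM bytes
            # writeframesraw() appends data without re-patching the header;
            # leaving the `with` block calls close(), which fixes the sizes.
            wav.writeframesraw(chunk)
            frames_written += len(chunk) // (channels * sample_width)
    return frames_written
```

The choice of `writeframesraw()` is a tradeoff: in CPython, `writeframes()` re-patches the header after every call, which keeps the file playable after a crash at the cost of extra seeks on the SD card.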

Minimal concrete example

import pyaudio
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=44100, input=True, frames_per_buffer=1024)
frames = stream.read(1024)  # 1024 frames x 2 bytes (16-bit mono) = 2048 bytes

Common misconceptions

“Bigger buffers are always better.” They increase latency.
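“WAV is compressed.” The WAV files in this project hold raw PCM with a header; the container can wrap compressed codecs, but uncompressed PCM is the common case.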
“USB devices are always stable.” Power budget and cables matter.

Check-your-understanding questions

What causes an ALSA buffer underrun?
Why must WAV headers be updated after recording?
How does buffer size affect latency?

Check-your-understanding answers

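The app fails to write playback data fast enough, so the buffer runs empty (the capture-side equivalent, failing to read fast enough, is an overrun).
The data size is unknown when recording starts, so the header's size fields must be filled in afterward.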
Larger buffers increase latency but reduce dropouts.

Real-world applications

Voice recorders, smart speakers, and audio monitoring devices.

Where you’ll apply it

This project: §3.2, §5.10.
Other projects: Project 12.

References

ALSA documentation
“The Linux Programming Interface” — file I/O

Key insights

Audio reliability is a buffering problem, not just a coding problem.

Summary

A stable audio pipeline requires correct ALSA configuration, buffer tuning, and safe WAV writing.

Homework/Exercises to practice the concept

Record 5 seconds with different buffer sizes and compare dropouts.
Inspect a WAV header and identify its fields.
Force a USB disconnect and observe error handling.

Solutions to the homework/exercises

Smaller buffers increase dropout risk; find the smallest stable value.
The header contains “RIFF”, format, sample rate, and data size.
ALSA returns an error; your program should exit gracefully.

3. Project Specification

3.1 What You Will Build

A recording pipeline that captures audio from a USB sound card, writes WAV files, and plays them back without dropouts.

3.2 Functional Requirements

Record audio for a configurable duration.
Save WAV files with correct headers.
Play back the recorded file.
Log buffer underruns and device info.

3.3 Non-Functional Requirements

Performance: No dropouts during 30s recording at 44.1 kHz.
Reliability: WAV file always playable.
Usability: Clear CLI options.

3.4 Example Usage / Output

$ ./audio_pipeline
Input device: USB Audio CODEC
Recording: 00:30
Buffers captured: 1500
File saved: recordings/room_2026-01-01.wav
Playback complete

3.5 Data Formats / Schemas / Protocols

WAV header fields (simplified):

RIFF, chunk_size, WAVE, fmt, sample_rate, bits_per_sample, data_size
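The full canonical layout adds the fields the simplified list omits (audio format, channel count, byte rate, block align). A strict parser sketch, assuming the 44-byte PCM layout; `parse_wav_header` is a hypothetical name, and files with extra chunks before `data` are rejected rather than walked.

```python
import struct

# Canonical 44-byte PCM header, little-endian:
# "RIFF", chunk_size, "WAVE", "fmt ", fmt_size, audio_format, channels,
# sample_rate, byte_rate, block_align, bits_per_sample, "data", data_size
WAV_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")


def parse_wav_header(data: bytes) -> dict:
    (riff, chunk_size, wave_id, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = WAV_HEADER.unpack_from(data)
    if ((riff, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data")
            or fmt_size != 16):
        raise ValueError("not a canonical 44-byte PCM WAV header")
    return {
        "chunk_size": chunk_size,          # file size - 8
        "audio_format": audio_format,      # 1 = integer PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,            # sample_rate * block_align
        "block_align": block_align,        # channels * bits_per_sample / 8
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,            # bytes of PCM samples
    }
```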

3.6 Edge Cases

USB audio device not present.
Buffer underruns under load.
Disk full during recording.

3.7 Real World Outcome

You can capture and replay audio with no clicks or missing segments.

3.7.1 How to Run (Copy/Paste)

python3 audio_pipeline.py --device "hw:1,0" --duration 30 --out recordings/test.wav

3.7.2 Golden Path Demo (Deterministic)

export FIXED_TIME="2026-01-01T10:40:00Z"
python3 audio_pipeline.py --simulate --frames 1323000

Expected output:

[2026-01-01T10:40:00Z] Recorded 30.0s, underruns=0

3.7.3 Failure Demo (Deterministic)

python3 audio_pipeline.py --device "hw:9,9"

Expected output:

[ERROR] Audio device not found

Exit code: 61

3.7.4 CLI Exit Codes

0: Success
60: WAV write failure
61: Audio device not found
62: Buffer underrun threshold exceeded
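One way to keep these codes consistent is to raise specific exceptions inside the pipeline and translate them in a single place. The exception classes and `run` are hypothetical; a disk-full condition would surface as an `OSError` with `errno.ENOSPC`, which the writer is assumed to wrap in `WavWriteError`.

```python
import sys
from typing import Callable

EXIT_OK = 0
EXIT_WAV_WRITE = 60
EXIT_DEVICE_NOT_FOUND = 61
EXIT_UNDERRUNS = 62


class DeviceNotFound(Exception):
    """Hypothetical: the requested ALSA card is absent."""


class WavWriteError(Exception):
    """Hypothetical: wraps an OSError from the WAV writer (e.g. ENOSPC, disk full)."""


class UnderrunThresholdExceeded(Exception):
    """Hypothetical: more dropouts than the configured limit."""


def run(pipeline: Callable[[], None]) -> int:
    """Run the pipeline and translate known failures into the documented exit codes."""
    try:
        pipeline()
    except DeviceNotFound as exc:
        print(f"[ERROR] {exc}", file=sys.stderr)
        return EXIT_DEVICE_NOT_FOUND
    except WavWriteError as exc:
        print(f"[ERROR] {exc}", file=sys.stderr)
        return EXIT_WAV_WRITE
    except UnderrunThresholdExceeded as exc:
        print(f"[ERROR] {exc}", file=sys.stderr)
        return EXIT_UNDERRUNS
    return EXIT_OK
```

The entry point would then end with `sys.exit(run(main))`; unexpected exceptions still propagate with a traceback instead of being masked by a misleading code.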

4. Solution Architecture

4.1 High-Level Design

ALSA Capture -> Buffer -> WAV Writer -> Playback -> Logger

4.2 Key Components

4.3 Data Structures (No Full Code)

frames = []  # list of byte chunks

4.4 Algorithm Overview

Key Algorithm: Record-Write-Verify

Open PCM stream.
Read frames into buffer.
Write WAV and update header.
Playback and verify.

Complexity Analysis:

Time: O(n) frames
Space: O(n) if fully buffered; O(1) streaming
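The "verify" step can be sketched with the standard `wave` module: reopen the file, compare its format against what was requested, and confirm that the data on disk matches the frame count the header claims. `verify_wav` is a hypothetical helper; a bad header makes `wave.open` raise, which the caller should treat as a failure.

```python
import wave


def verify_wav(path: str, expected_frames: int, rate_hz: int = 44100,
               channels: int = 1, sample_width: int = 2) -> list[str]:
    """Return a list of problems; an empty list means the file looks correct."""
    problems = []
    with wave.open(path, "rb") as wav:  # raises wave.Error / EOFError on a bad header
        if wav.getframerate() != rate_hz:
            problems.append(f"rate {wav.getframerate()} != {rate_hz}")
        if wav.getnchannels() != channels:
            problems.append(f"channels {wav.getnchannels()} != {channels}")
        if wav.getsampwidth() != sample_width:
            problems.append(f"sample width {wav.getsampwidth()} != {sample_width}")
        nframes = wav.getnframes()
        if nframes != expected_frames:
            problems.append(f"header says {nframes} frames, expected {expected_frames}")
        # The header can claim more data than the file actually holds.
        frame_bytes = wav.getnchannels() * wav.getsampwidth()
        actual = len(wav.readframes(nframes)) // frame_bytes
        if actual != nframes:
            problems.append(f"only {actual} of {nframes} frames present on disk")
    return problems
```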

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install -y alsa-utils

5.2 Project Structure

project-root/
├── audio_pipeline.py
├── wav_writer.py
└── README.md

5.3 The Core Question You’re Answering

“How does Linux turn a real-time audio stream into reliable data on slow storage?”

5.4 Concepts You Must Understand First

ALSA PCM stream configuration.
Buffer vs latency tradeoffs.
WAV file structure.

5.5 Questions to Guide Your Design

What buffer size avoids dropouts under load?
How will you validate that a WAV file is correct?

5.6 Thinking Exercise

Calculate WAV file size for 30s, 44.1kHz, 16-bit mono.
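The arithmetic, as a small helper (hypothetical name) assuming the canonical 44-byte header: 30 s × 44,100 frames/s = 1,323,000 frames; at 2 bytes per 16-bit mono frame that is 2,646,000 bytes of audio data, plus 44 bytes of header.

```python
def wav_size_bytes(seconds: float, rate_hz: int, bits_per_sample: int,
                   channels: int, header_bytes: int = 44) -> int:
    """Expected size of an uncompressed PCM WAV file with a canonical header."""
    frames = round(seconds * rate_hz)
    return header_bytes + frames * channels * (bits_per_sample // 8)


if __name__ == "__main__":
    print(wav_size_bytes(30, 44100, 16, 1))  # 2646044 bytes, about 2.5 MiB
```

This is also the figure to compare against when checking recorded file sizes during debugging.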

5.7 The Interview Questions They’ll Ask

What causes buffer underruns?
Why is WAV suitable for raw audio capture?
How do you list audio devices on Linux?

5.8 Hints in Layers

Hint 1: Use arecord -l to list devices.

Hint 2: Start with a 5-second recording.

Hint 3: Increase buffer size until underruns disappear.

5.9 Books That Will Help

Topic	Book	Chapter
Linux I/O	The Linux Programming Interface	Ch. 13
Streams	Advanced Programming in the UNIX Environment	Ch. 2

5.10 Implementation Phases

Phase 1: Device Selection (2 hours)

Enumerate devices and open stream.

Phase 2: Recording (4 hours)

Record and write WAV files.

Phase 3: Robustness (3 hours)

Add underrun detection and error handling.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

WAV file plays without errors.
Buffer underruns logged as warnings.
Device missing -> exit code 61.

6.3 Test Data

Sample rate: 44100
Channels: 1
Duration: 30s
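For `--simulate` runs and repeatable tests, a deterministic input signal helps: identical parameters always produce byte-identical audio, so outputs can be compared exactly. A sketch; `sine_pcm16` is a hypothetical name, and the 440 Hz tone and 0.5 amplitude are arbitrary choices.

```python
import math
import sys
from array import array


def sine_pcm16(seconds: float, rate_hz: int = 44100, freq_hz: float = 440.0,
               amplitude: float = 0.5) -> bytes:
    """Deterministic 16-bit little-endian mono sine wave for simulated input."""
    n = round(seconds * rate_hz)
    peak = int(amplitude * 32767)
    samples = array("h", (round(peak * math.sin(2 * math.pi * freq_hz * i / rate_hz))
                          for i in range(n)))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    return samples.tobytes()
```

`sine_pcm16(30.0)` yields 1,323,000 frames, the same count as the `--frames 1323000` golden-path demo.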

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Compare WAV file sizes to expected size.
Use aplay to test playback.

7.3 Performance Traps

Logging every frame adds overhead; log per period.

8. Extensions & Challenges

8.1 Beginner Extensions

Add automatic file naming with timestamps.
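A possible sketch, assuming the `FIXED_TIME` variable from the golden-path demo should also pin file names. `recording_path` is a hypothetical helper. It uses hyphens instead of colons in the time because colons are not allowed in file names on FAT/exFAT volumes such as typical USB sticks, and it includes the time so several recordings per day do not collide.

```python
import os
from datetime import datetime, timezone


def recording_path(prefix: str = "room", directory: str = "recordings") -> str:
    """Timestamped output path; honours FIXED_TIME for deterministic runs."""
    fixed = os.environ.get("FIXED_TIME")
    if fixed:
        # fromisoformat() before Python 3.11 does not accept a trailing "Z".
        now = datetime.fromisoformat(fixed.replace("Z", "+00:00"))
    else:
        now = datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return os.path.join(directory, f"{prefix}_{stamp}.wav")
```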

8.2 Intermediate Extensions

Add audio level meter (RMS calculation).
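A sketch of a per-period level meter for 16-bit little-endian PCM, reported in dBFS. `rms_dbfs` is a hypothetical name, and 32768 is used as the full-scale reference (conventions differ slightly). Computing it once per period, not per sample, keeps the overhead low.

```python
import math
import sys
from array import array


def rms_dbfs(chunk: bytes) -> float:
    """RMS level of a 16-bit little-endian PCM chunk in dBFS (-inf for silence)."""
    samples = array("h")
    samples.frombytes(chunk)  # chunk length must be a multiple of 2
    if sys.byteorder == "big":
        samples.byteswap()
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
```

A constant signal at half of full scale (16384) reads about -6.02 dBFS.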

8.3 Advanced Extensions

Stream audio over network (UDP or RTP).
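For plain UDP, the receiver needs a way to notice lost and late packets. A sketch using a hypothetical 4-byte sequence-number header (this is not RTP; RTP, RFC 3550, defines its own header with 16-bit sequence numbers and timestamps). `make_packet` and `GapDetector` are illustrative names.

```python
import struct
from typing import Optional

SEQ = struct.Struct("!I")  # hypothetical header: 32-bit sequence number, network byte order
MASK = 0xFFFFFFFF


def make_packet(seq: int, pcm: bytes) -> bytes:
    return SEQ.pack(seq & MASK) + pcm


class GapDetector:
    """Receiver-side accounting: counts missing and late packets."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None
        self.lost = 0
        self.late = 0

    def receive(self, packet: bytes) -> Optional[bytes]:
        """Return the PCM payload, or None for a late or duplicate packet."""
        (seq,) = SEQ.unpack_from(packet)
        if self.expected is not None:
            gap = (seq - self.expected) & MASK  # modular distance handles wraparound
            if gap >= 1 << 31:  # seq is behind the expected one: late or duplicate
                self.late += 1
                return None
            self.lost += gap
        self.expected = (seq + 1) & MASK
        return packet[SEQ.size:]
```

The sender would call `sock.sendto(make_packet(seq, chunk), (host, port))` on a `socket.SOCK_DGRAM` socket. Keep each payload under the path MTU (1500 bytes on typical Ethernet): a 1024-frame 16-bit mono period is 2048 bytes and would be fragmented, while 256 frames (512 bytes) fits comfortably.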

9. Real-World Connections

9.1 Industry Applications

Smart speakers, audio recorders, environmental monitoring.

9.2 Related Tools

arecord / aplay ALSA utilities.

9.3 Interview Relevance

Buffering and latency tradeoffs are common audio questions.

10. Resources

10.1 Essential Reading

ALSA PCM documentation.

10.2 Video Resources

ALSA basics tutorials.

10.3 Tools & Documentation

arecord and aplay man pages.

Previous: Project 5
Next: Project 7

11. Self-Assessment Checklist

11.1 Understanding

I can explain ALSA buffer/period concepts.
I can describe WAV file headers.

11.2 Implementation

Recordings are clean and playable.
Underruns are detected and handled.

11.3 Growth

I can discuss buffer/latency tradeoffs in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

Record and play back a 5-second WAV file.

Full Completion:

30-second recording with zero underruns.

Excellence (Going Above & Beyond):

Live streaming pipeline with dropout monitoring.