Project 6: Audio Capture and Playback Pipeline
Record audio via USB sound card, store WAV files safely, and play them back without dropouts.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1–2 weekends |
| Main Programming Language | Python |
| Alternative Programming Languages | C, Go, Rust |
| Coolness Level | Medium |
| Business Potential | Medium |
| Prerequisites | Linux CLI, file I/O basics, USB device knowledge |
| Key Topics | ALSA PCM, buffering, WAV format, storage durability |
1. Learning Objectives
By completing this project, you will:
- Capture audio from a USB sound card using ALSA.
- Explain how buffer size affects latency and dropouts.
- Save deterministic WAV files and verify their integrity.
- Detect and handle buffer underruns.
2. All Theory Needed (Per-Concept Breakdown)
Concept 1: ALSA PCM Streams, Buffering, and WAV Storage
Fundamentals
Audio capture on Linux uses ALSA (Advanced Linux Sound Architecture), which exposes PCM streams for recording and playback. A PCM stream is a sequence of frames, each containing samples for one or more channels. To record reliably, you must choose a sample rate, bit depth, and buffer size. Too small a buffer leads to underruns or overruns; too large adds latency. WAV files store raw PCM data plus a header describing the format. Understanding ALSA’s buffering model and the WAV header structure is essential for building a stable capture pipeline.
Deep Dive into the concept
ALSA presents audio devices as hardware and software abstractions. The “hardware” device (hw:1,0) exposes the real device with minimal software processing. The “plug” device (plughw:1,0) can perform automatic format conversion. For deterministic recording, you should configure a fixed sample rate (e.g., 44100 Hz), channels (mono or stereo), and sample width (16-bit). ALSA buffers audio in frames; each frame contains one sample per channel. The buffer is divided into periods, which are chunks of audio processed by the driver. If your application does not read data fast enough, the buffer overflows and you lose audio samples. If you do not write to playback fast enough, you get underruns and audible clicks.
Buffer size and period size are key tuning parameters. A typical configuration might be a buffer of 4096 frames with periods of 1024 frames. Smaller buffers reduce latency but increase CPU overhead and risk dropouts. Larger buffers are safer but add latency. On a Pi Zero 2 W, CPU resources are limited, so you need to balance these factors. A robust pipeline measures underruns and adjusts buffer size until errors disappear under normal load. This is especially important if you are also running other services on the Pi.
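To make the latency trade-off concrete, the arithmetic for the example configuration (4096-frame buffer, 1024-frame periods, 44100 Hz) can be sketched as below. The helper name `frames_to_ms` is illustrative, not part of any ALSA binding.

```python
def frames_to_ms(frames: int, sample_rate: int) -> float:
    """Duration of `frames` audio frames in milliseconds."""
    return 1000.0 * frames / sample_rate


RATE = 44100          # Hz
BUFFER_FRAMES = 4096  # total ALSA buffer
PERIOD_FRAMES = 1024  # one period (chunk handed to the app)

print(f"buffer latency: {frames_to_ms(BUFFER_FRAMES, RATE):.1f} ms")  # 92.9 ms
print(f"period length:  {frames_to_ms(PERIOD_FRAMES, RATE):.1f} ms")  # 23.2 ms
print(f"periods per buffer: {BUFFER_FRAMES // PERIOD_FRAMES}")        # 4
```

With these numbers the application has roughly 93 ms of slack before the buffer fills, and it is woken about every 23 ms.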
When recording, you should write data in chunks rather than sample-by-sample to reduce overhead. A typical loop reads a period of frames, appends to a file buffer, and flushes to disk periodically. Since SD cards are slower and more error-prone than SSDs, you should avoid excessive fsync calls, but still ensure the file is closed cleanly. WAV files use a RIFF header that includes the file length. Because the length is unknown at the start of recording, most implementations write a placeholder header, record data, then seek back to fill in the correct sizes. If your program crashes mid-recording, the header still holds the placeholder sizes and many players will reject or truncate the file. A simple recovery strategy is to recompute the sizes from the actual file length and rewrite the header, or to use a tool that repairs WAV headers.
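The placeholder-then-patch approach can be sketched with the standard library alone. This is a minimal sketch that assumes the canonical 44-byte PCM header layout with no extra chunks; `build_header`, `write_wav`, and `repair_header` are illustrative names, and real files may carry additional chunks that this sketch does not handle.

```python
import io
import struct
import wave

HEADER_SIZE = 44  # canonical PCM layout: RIFF (12) + fmt chunk (24) + data chunk header (8)


def build_header(data_size, rate=44100, channels=1, bits=16):
    """Pack a 44-byte PCM WAV header describing `data_size` bytes of audio."""
    block_align = channels * bits // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16, 1, channels, rate, rate * block_align, block_align, bits,
        b"data", data_size,
    )


def write_wav(f, chunks, **fmt):
    """Write a placeholder header, stream the chunks, then patch the sizes."""
    f.write(build_header(0, **fmt))          # sizes are unknown at this point
    data_size = 0
    for chunk in chunks:
        f.write(chunk)
        data_size += len(chunk)
    f.seek(0)
    f.write(build_header(data_size, **fmt))  # fill in the real sizes
    return data_size


def repair_header(f, **fmt):
    """Recompute sizes from the file length, e.g. after a crash mid-recording."""
    data_size = f.seek(0, io.SEEK_END) - HEADER_SIZE
    f.seek(0)
    f.write(build_header(data_size, **fmt))
    return data_size


buf = io.BytesIO()                          # stands in for a file opened with "wb"
write_wav(buf, [b"\x00\x00" * 1024] * 3)    # three periods of 16-bit mono silence
buf.seek(0)
with wave.open(buf, "rb") as w:             # the stdlib reader accepts the patched header
    print(w.getnframes())                   # 3072
```

`repair_header` only works when the format parameters are known in advance, which is one reason to keep the recording format fixed.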
Detecting xruns is part of correctness: on capture the failure is an overrun (the app read too slowly), and on playback it is an underrun (the app wrote too slowly). ALSA exposes status info, and many libraries provide error callbacks. You can also detect dropouts by comparing expected frames with actual frames read. Logging underruns and overruns is essential because a recording that “sounds fine” may still contain gaps. A correct pipeline should report them and optionally retry or increase buffer size.
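One hardware-independent way to implement the expected-versus-actual comparison is a small counter that the capture loop feeds with each chunk. This is a sketch under stated assumptions: `XrunMonitor` and its methods are hypothetical names, and treating a short read as a dropout is a simplification (a real ALSA binding usually signals xruns through an error code or exception instead).

```python
import logging

log = logging.getLogger("audio_pipeline")


class XrunMonitor:
    """Counts dropouts and tracks how many frames were actually captured."""

    def __init__(self, frames_per_period, bytes_per_frame=2, threshold=0):
        self.frames_per_period = frames_per_period
        self.bytes_per_frame = bytes_per_frame  # 2 bytes = 16-bit mono
        self.threshold = threshold              # tolerated number of xruns
        self.frames_read = 0
        self.xruns = 0

    def on_chunk(self, chunk):
        """Call once per period read from the device."""
        frames = len(chunk) // self.bytes_per_frame
        self.frames_read += frames
        if frames < self.frames_per_period:
            self.on_xrun(f"short read: {frames}/{self.frames_per_period} frames")

    def on_xrun(self, reason):
        """Call from an error handler when the audio library reports an xrun."""
        self.xruns += 1
        log.warning("xrun #%d: %s", self.xruns, reason)

    def missing_frames(self, expected_frames):
        """Frames that should have been captured but were not."""
        return max(0, expected_frames - self.frames_read)

    def threshold_exceeded(self):
        return self.xruns > self.threshold
```

At the end of a recording, `missing_frames(duration * rate)` gives a gap estimate even when no explicit error was raised.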
USB sound cards add another layer: they must be enumerated properly, and the Pi Zero 2 W’s USB power budget is limited. If the device disconnects, ALSA may return errors and the device file may disappear. Your program should detect these errors and fail with a clear message, not hang.
How this fits into the project
This concept is used in §3 (requirements), §4 (architecture), and §5.10 (implementation). It also relates to Project 12 (log durability) when writing audio files safely.
Definitions & key terms
- ALSA: Linux audio system.
- PCM: Pulse-code modulation, raw audio samples.
- Buffer/Period: ALSA buffering units.
- Underrun/Overrun: Audio buffer starvation or overflow.
- WAV: File format storing PCM data with RIFF headers.
Mental model diagram (ASCII)
Mic -> ALSA buffer (periods) -> App -> WAV file -> Playback
How it works (step-by-step, with invariants and failure modes)
- Enumerate audio device and open PCM stream.
- Set sample rate, channels, format.
- Read periods in a loop.
- Write frames to WAV, update header on close.
Failure modes:
- Buffer overrun -> dropped samples.
- USB disconnect -> ALSA errors.
- Incorrect WAV header -> file unreadable.
Minimal concrete example
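import pyaudio
pa = pyaudio.PyAudio()  # PyAudio (PortAudio bindings) talks to ALSA on Linux
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=44100, input=True, frames_per_buffer=1024)
frames = stream.read(1024)  # blocks until 1024 frames are available; returns 2048 bytes (16-bit mono)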
Common misconceptions
- “Bigger buffers are always better.” They increase latency.
- “WAV is compressed.” It is raw PCM with a header.
- “USB devices are always stable.” Power budget and cables matter.
Check-your-understanding questions
- What causes an ALSA buffer underrun?
- Why must WAV headers be updated after recording?
- How does buffer size affect latency?
Check-your-understanding answers
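- On playback, the app fails to write audio fast enough, so the device runs out of data. (On capture, reading too slowly causes the opposite problem, an overrun.)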
- The file size is unknown at the start; header must be updated.
- Larger buffers increase latency but reduce dropouts.
Real-world applications
- Voice recorders, smart speakers, and audio monitoring devices.
Where you’ll apply it
- This project: §3.2, §5.10.
- Other projects: Project 12.
References
- ALSA documentation
- “The Linux Programming Interface” — file I/O
Key insights
Audio reliability is a buffering problem, not just a coding problem.
Summary
A stable audio pipeline requires correct ALSA configuration, buffer tuning, and safe WAV writing.
Homework/Exercises to practice the concept
- Record 5 seconds with different buffer sizes and compare dropouts.
- Inspect a WAV header and identify its fields.
- Force a USB disconnect and observe error handling.
Solutions to the homework/exercises
- Smaller buffers increase dropout risk; find the smallest stable value.
- The header contains “RIFF”, format, sample rate, and data size.
- ALSA returns an error; your program should exit gracefully.
3. Project Specification
3.1 What You Will Build
A recording pipeline that captures audio from a USB sound card, writes WAV files, and plays them back without dropouts.
3.2 Functional Requirements
- Record audio for a configurable duration.
- Save WAV files with correct headers.
- Play back the recorded file.
- Log buffer underruns and device info.
3.3 Non-Functional Requirements
- Performance: No dropouts during 30s recording at 44.1 kHz.
- Reliability: WAV file always playable.
- Usability: Clear CLI options.
3.4 Example Usage / Output
$ ./audio_pipeline
Input device: USB Audio CODEC
Recording: 00:30
Buffers captured: 1500
File saved: recordings/room_2026-01-01.wav
Playback complete
3.5 Data Formats / Schemas / Protocols
WAV header fields (simplified):
RIFF, chunk_size, WAVE, fmt, sample_rate, bits_per_sample, data_size
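To see how these fields sit in the file, the sketch below unpacks the first 44 bytes of a WAV file. It assumes the canonical PCM layout (a 16-byte `fmt ` chunk immediately followed by `data`); files with extra chunks need a real chunk walker. `parse_wav_header` is an illustrative name, and the sample file is generated in memory with the standard `wave` module.

```python
import io
import struct
import wave


def parse_wav_header(raw: bytes) -> dict:
    """Decode a canonical 44-byte PCM WAV header into named fields."""
    if len(raw) < 44:
        raise ValueError("need at least 44 bytes")
    (riff, chunk_size, wave_id, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", raw[:44])
    if (riff, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte PCM WAV header")
    return {
        "chunk_size": chunk_size,            # file size minus 8
        "audio_format": audio_format,        # 1 = uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,              # sample_rate * block_align
        "block_align": block_align,          # bytes per frame
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,              # bytes of audio after the header
    }


# Build a 100-frame, 16-bit mono sample in memory and inspect it.
sample = io.BytesIO()
with wave.open(sample, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 100)

header = parse_wav_header(sample.getvalue())
print(header["sample_rate"], header["bits_per_sample"], header["data_size"])  # 44100 16 200
```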
3.6 Edge Cases
- USB audio device not present.
- Buffer underruns under load.
- Disk full during recording.
3.7 Real World Outcome
You can capture and replay audio with no clicks or missing segments.
3.7.1 How to Run (Copy/Paste)
python3 audio_pipeline.py --device "hw:1,0" --duration 30 --out recordings/test.wav
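A command line like this could be parsed with `argparse`. The flag names come from the commands shown in this section; the defaults and help strings below are assumptions chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI sketch for audio_pipeline.py; defaults are illustrative assumptions."""
    p = argparse.ArgumentParser(
        prog="audio_pipeline.py",
        description="Record audio to a WAV file and play it back.",
    )
    p.add_argument("--device", default="hw:1,0",
                   help="ALSA device name, e.g. hw:1,0 or plughw:1,0")
    p.add_argument("--duration", type=float, default=5.0,
                   help="recording length in seconds")
    p.add_argument("--out", default="recordings/test.wav",
                   help="output WAV path")
    p.add_argument("--simulate", action="store_true",
                   help="generate frames instead of opening a device")
    p.add_argument("--frames", type=int, default=None,
                   help="number of frames to produce in --simulate mode")
    return p


args = build_parser().parse_args(
    ["--device", "hw:1,0", "--duration", "30", "--out", "recordings/test.wav"]
)
print(args.device, args.duration, args.out)
```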
3.7.2 Golden Path Demo (Deterministic)
export FIXED_TIME="2026-01-01T10:40:00Z"
python3 audio_pipeline.py --simulate --frames 1323000
Expected output:
[2026-01-01T10:40:00Z] Recorded 30.0s, underruns=0
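The deterministic output depends on two things: the timestamp comes from `FIXED_TIME` rather than the clock, and the duration is derived from the frame count (1323000 frames at 44100 Hz is exactly 30 s). A minimal sketch of that report line follows; `simulate_report` is a hypothetical helper, and the fallback to the current UTC time when `FIXED_TIME` is unset is an assumption.

```python
import os
from datetime import datetime, timezone


def simulate_report(frames, rate=44100, underruns=0, env=os.environ):
    """Format the summary line, using FIXED_TIME when it is set."""
    stamp = env.get("FIXED_TIME")
    if stamp is None:
        # Assumed fallback: current UTC time in the same format.
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"[{stamp}] Recorded {frames / rate:.1f}s, underruns={underruns}"


print(simulate_report(1323000, env={"FIXED_TIME": "2026-01-01T10:40:00Z"}))
```

Passing the environment as a parameter keeps the function testable without touching the real process environment.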
3.7.3 Failure Demo (Deterministic)
python3 audio_pipeline.py --device "hw:9,9"
Expected output:
[ERROR] Audio device not found
Exit code: 61
3.7.4 CLI Exit Codes
- 0: Success
- 60: WAV write failure
- 61: Audio device not found
- 62: Buffer underrun threshold exceeded
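Keeping the exit codes in one place makes them harder to mistype. The sketch below uses an `IntEnum` with the values listed above; the exception classes and the rule that any other `OSError` (for example, disk full) maps to code 60 are assumptions for illustration.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    WAV_WRITE_FAILURE = 60
    DEVICE_NOT_FOUND = 61
    UNDERRUN_THRESHOLD_EXCEEDED = 62


class DeviceNotFoundError(Exception):
    """Hypothetical: raised when the requested ALSA device cannot be opened."""


class UnderrunThresholdError(Exception):
    """Hypothetical: raised when too many xruns were counted."""


def exit_code_for(exc: BaseException) -> ExitCode:
    """Map an exception to the documented process exit code."""
    if isinstance(exc, DeviceNotFoundError):
        return ExitCode.DEVICE_NOT_FOUND
    if isinstance(exc, UnderrunThresholdError):
        return ExitCode.UNDERRUN_THRESHOLD_EXCEEDED
    if isinstance(exc, OSError):  # assumption: I/O errors while writing the file
        return ExitCode.WAV_WRITE_FAILURE
    raise exc  # unknown errors should not be silently mapped


print(int(exit_code_for(DeviceNotFoundError("hw:9,9"))))  # 61
```

A top-level `main()` can then wrap the pipeline in `try/except` and call `sys.exit(exit_code_for(exc))`.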
4. Solution Architecture
4.1 High-Level Design
ALSA Capture -> Buffer -> WAV Writer -> Playback -> Logger
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Device Selector | Pick USB audio device | hw vs plughw |
| Recorder | Read PCM frames | Buffer size |
| WAV Writer | Header + data | Header update strategy |
| Player | Playback recorded file | Blocking vs async |
4.3 Data Structures (No Full Code)
frames = [] # list of byte chunks
4.4 Algorithm Overview
Key Algorithm: Record-Write-Verify
- Open PCM stream.
- Read frames into buffer.
- Write WAV and update header.
- Playback and verify.
Complexity Analysis:
- Time: O(n) frames
- Space: O(n) if fully buffered; O(1) streaming
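The O(1) streaming variant can be sketched with the standard `wave` module, which writes each chunk as it arrives and fixes the header sizes on close when the destination is seekable. `stream_to_wav` is an illustrative name, and the generator below stands in for a real capture loop.

```python
import io
import wave


def stream_to_wav(dest, chunks, rate=44100, channels=1, sample_width=2):
    """Write chunks as they arrive; only one chunk is held in memory at a time."""
    frames_written = 0
    with wave.open(dest, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate)
        for chunk in chunks:
            w.writeframesraw(chunk)  # no header patch per chunk
            frames_written += len(chunk) // (channels * sample_width)
    # Leaving the `with` block closes the writer, which patches the header sizes.
    return frames_written


def fake_capture(periods, frames_per_period=1024):
    """Stand-in for a device read loop: yields periods of 16-bit mono silence."""
    for _ in range(periods):
        yield b"\x00\x00" * frames_per_period


out = io.BytesIO()  # a path such as "recordings/test.wav" also works
print(stream_to_wav(out, fake_capture(5)))  # 5120
```

Compared with collecting every chunk in a list and writing at the end, memory use stays constant regardless of recording length.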
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install -y alsa-utils
5.2 Project Structure
project-root/
├── audio_pipeline.py
├── wav_writer.py
└── README.md
5.3 The Core Question You’re Answering
“How does Linux turn a real-time audio stream into reliable data on slow storage?”
5.4 Concepts You Must Understand First
- ALSA PCM stream configuration.
- Buffer vs latency tradeoffs.
- WAV file structure.
5.5 Questions to Guide Your Design
- What buffer size avoids dropouts under load?
- How will you validate that a WAV file is correct?
5.6 Thinking Exercise
Calculate WAV file size for 30s, 44.1kHz, 16-bit mono.
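As a check on your answer, the calculation can be written out as below. It assumes a canonical 44-byte header with no extra chunks; `wav_size_bytes` is an illustrative helper.

```python
def wav_size_bytes(seconds, rate=44100, bits=16, channels=1, header=44):
    """Expected file size: header plus frames * bytes-per-frame."""
    frames = int(seconds * rate)
    bytes_per_frame = channels * bits // 8
    return header + frames * bytes_per_frame


# 30 s * 44100 frames/s * 2 bytes/frame = 2,646,000 bytes of audio, plus 44 bytes of header
print(wav_size_bytes(30))  # 2646044
```

That is about 2.5 MiB for 30 seconds, or roughly 86 KiB per second of mono audio.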
5.7 The Interview Questions They’ll Ask
- What causes buffer underruns?
- Why is WAV suitable for raw audio capture?
- How do you list audio devices on Linux?
5.8 Hints in Layers
Hint 1: Use arecord -l to list devices.
Hint 2: Start with a 5-second recording.
Hint 3: Increase buffer size until underruns disappear.
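Hint 3 can be automated as a simple search over candidate buffer sizes. In this sketch, `count_xruns` is a hypothetical callable that you supply: it should run a short test recording with the given buffer size and return how many xruns occurred.

```python
def smallest_stable_buffer(candidates, count_xruns):
    """Return the smallest buffer size (in frames) that records with zero xruns.

    `count_xruns(frames)` is user-supplied and hypothetical here: it runs a
    test recording with that buffer size and returns the number of xruns.
    Returns None if no candidate is stable.
    """
    for frames in sorted(candidates):
        if count_xruns(frames) == 0:
            return frames
    return None


# Stand-in measurement: pretend anything below 1024 frames drops audio.
def fake_measure(frames):
    return 0 if frames >= 1024 else 3


print(smallest_stable_buffer([4096, 512, 1024, 2048], fake_measure))  # 1024
```

Run the real measurement under the load you expect in practice, since a size that is stable on an idle system may not be stable when other services are running.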
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Linux I/O | The Linux Programming Interface | Ch. 13 |
| Streams | Advanced Programming in the UNIX Environment | Ch. 2 |
5.10 Implementation Phases
Phase 1: Device Selection (2 hours)
- Enumerate devices and open stream.
Phase 2: Recording (4 hours)
- Record and write WAV files.
Phase 3: Robustness (3 hours)
- Add underrun detection and error handling.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Buffer size | 512/1024/4096 | 1024–2048 | Balance latency and reliability |
| WAV writing | Buffer all / Stream | Stream | Reduce memory usage |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | WAV header correctness | File size validation |
| Integration Tests | Full record/playback | 30s capture |
| Edge Case Tests | Device missing | hw:9,9 |
6.2 Critical Test Cases
- WAV file plays without errors.
- Buffer underruns logged as warnings.
- Device missing -> exit code 61.
6.3 Test Data
Sample rate: 44100
Channels: 1
Duration: 30s
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Buffer too small | Clicks | Increase buffer |
| Wrong device | No audio | Use arecord -l |
| WAV header wrong | File unreadable | Update header sizes |
7.2 Debugging Strategies
- Compare WAV file sizes to expected size.
- Use aplay to test playback.
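The size comparison can also be scripted so it runs after every recording. The sketch below uses the standard `wave` module to compare the format and frame count against what was requested; `check_wav` is an illustrative name, and requiring an exact frame count is an assumption (a tolerance may suit real hardware better).

```python
import io
import wave


def check_wav(source, expected_frames, rate=44100, channels=1, sample_width=2):
    """Return a list of problems found; an empty list means the file looks right."""
    problems = []
    try:
        with wave.open(source, "rb") as w:
            if w.getframerate() != rate:
                problems.append(f"rate: got {w.getframerate()}, expected {rate}")
            if w.getnchannels() != channels:
                problems.append(f"channels: got {w.getnchannels()}, expected {channels}")
            if w.getsampwidth() != sample_width:
                problems.append(f"sample width: got {w.getsampwidth()}, expected {sample_width}")
            if w.getnframes() != expected_frames:
                problems.append(f"frames: got {w.getnframes()}, expected {expected_frames}")
    except (wave.Error, EOFError) as exc:
        # Raised for truncated files or headers that are not valid RIFF/WAVE.
        problems.append(f"unreadable: {exc}")
    return problems


# `source` may be a path or a binary file object.
print(check_wav(io.BytesIO(b"junk"), expected_frames=1))
```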
7.3 Performance Traps
- Logging every frame adds overhead; log per period.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add automatic file naming with timestamps.
8.2 Intermediate Extensions
- Add audio level meter (RMS calculation).
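For the level meter, RMS over one chunk of 16-bit little-endian PCM can be computed with the standard library. This is a sketch assuming mono `paInt16`-style data; `rms_int16` is an illustrative name.

```python
import array
import math
import sys


def rms_int16(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit little-endian samples."""
    samples = array.array("h")   # signed 16-bit integers
    samples.frombytes(chunk)     # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()       # WAV/PCM data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Four samples alternating +1000 / -1000 have an RMS of exactly 1000.
print(rms_int16(b"\xe8\x03\x18\xfc" * 2))  # 1000.0
```

Dividing the result by 32768 gives a level relative to full scale, which is convenient for a text-based meter.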
8.3 Advanced Extensions
- Stream audio over network (UDP or RTP).
9. Real-World Connections
9.1 Industry Applications
- Smart speakers, audio recorders, environmental monitoring.
9.2 Related Open Source Projects
- arecord/aplay: ALSA utilities.
9.3 Interview Relevance
- Buffering and latency tradeoffs are common audio questions.
10. Resources
10.1 Essential Reading
- ALSA PCM documentation.
10.2 Video Resources
- ALSA basics tutorials.
10.3 Tools & Documentation
- arecord and aplay man pages.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain ALSA buffer/period concepts.
- I can describe WAV file headers.
11.2 Implementation
- Recordings are clean and playable.
- Underruns are detected and handled.
11.3 Growth
- I can discuss buffer/latency tradeoffs in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Record and play back a 5-second WAV file.
Full Completion:
- 30-second recording with zero underruns.
Excellence (Going Above & Beyond):
- Live streaming pipeline with dropout monitoring.