Project 13: Audio DSP with SIMD
Build real-time audio effects (EQ, reverb, compression) using SIMD to process multiple samples simultaneously, meeting the strict latency requirements of audio processing.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Language | C++ |
| Prerequisites | Project 12 (SIMD Math Library), basic audio concepts (samples, sample rate), familiarity with filters helpful |
| Key Topics | SIMD (std::experimental::simd), real-time audio, IIR/FIR filters, buffer management, biquad filters, dynamics processing |
| Coolness Level | Level 5: Pure Magic (Super Cool) |
| Business Potential | Micro-SaaS / Pro Tool |
Learning Objectives
After completing this project, you will:
- Master real-time audio constraints: Understand why audio processing has hard deadlines and how to meet them consistently
- Implement SIMD audio processing: Apply vectorization techniques to audio buffers, achieving 5-8x speedups
- Build classic audio effects: Create EQ, compression, and limiting effects using efficient algorithms
- Understand filter design: Implement biquad filters and understand IIR vs FIR tradeoffs
- Handle buffer formats: Convert between interleaved and planar audio formats, understanding when each is optimal
- Apply real-time programming rules: Avoid allocations, locks, and I/O in audio callbacks
- Profile and optimize audio code: Measure latency, ensure glitch-free playback, and verify SIMD effectiveness
Theoretical Foundation
Core Concepts
Sample Rates and Buffers
Digital audio represents sound as a sequence of samples taken at regular intervals. The sample rate determines how many samples per second are captured:
Sample Rates and Their Uses:
44.1 kHz ─────┐ CD quality, most music
│
48 kHz ──────┤ Professional video, DAWs
│
96 kHz ──────┤ High-resolution audio
│
192 kHz ─────┘ Archival, studio masters
Time per sample at 44.1 kHz:
1 / 44100 = 22.7 microseconds
Buffer of 512 samples:
512 / 44100 = 11.6 milliseconds of audio
Audio is processed in buffers (also called blocks or frames). Common buffer sizes:
| Buffer Size | Latency @ 44.1kHz | Use Case |
|---|---|---|
| 64 samples | 1.5 ms | Live performance, very low latency |
| 128 samples | 2.9 ms | Professional DAWs |
| 256 samples | 5.8 ms | Standard production |
| 512 samples | 11.6 ms | General purpose |
| 1024 samples | 23.2 ms | High-latency, CPU-limited systems |
The buffer size creates a fundamental tradeoff: smaller buffers mean lower latency but require more CPU efficiency (the callback runs more frequently).
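The latency figures in the table follow directly from buffer size divided by sample rate. A minimal sketch of that arithmetic (the helper names `buffer_latency_ms` and `callbacks_per_second` are hypothetical, not part of the project API):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Duration of one buffer in milliseconds: frames / sample_rate * 1000.
constexpr double buffer_latency_ms(std::size_t frames, double sample_rate_hz) {
    return static_cast<double>(frames) / sample_rate_hz * 1000.0;
}

// How many times per second the audio callback fires for a given buffer size.
constexpr double callbacks_per_second(std::size_t frames, double sample_rate_hz) {
    return sample_rate_hz / static_cast<double>(frames);
}
```

For example, `buffer_latency_ms(512, 44100.0)` is about 11.6 ms, and the callback then runs roughly 86 times per second; halving the buffer halves the latency and doubles the callback rate.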
Interleaved vs Planar Audio Formats
Audio data can be organized two ways:
INTERLEAVED FORMAT (Common in APIs):
┌─────────────────────────────────────────────────┐
│ L0 R0 L1 R1 L2 R2 L3 R3 L4 R4 L5 R5 L6 R6 L7 R7 │
└─────────────────────────────────────────────────┘
↑
Samples alternate between channels
PLANAR FORMAT (Better for SIMD):
┌─────────────────────────────────────────────────┐
│ L0 L1 L2 L3 L4 L5 L6 L7 │ R0 R1 R2 R3 R4 R5 R6 R7 │
└─────────────────────────────────────────────────┘
↑ ↑
All left samples All right samples
contiguous contiguous
Why planar is better for SIMD:
- SIMD processes N samples simultaneously (e.g., 8 with AVX)
- Planar format: Load 8 consecutive left samples, process, store
- Interleaved format: Samples are scattered, requiring shuffles
```cpp
#include <cstddef>
#include <experimental/simd>
namespace stdx = std::experimental;

// SIMD with PLANAR format - simple and fast
void apply_gain_planar(float* channel, size_t n, float gain) {
    using simd_t = stdx::native_simd<float>;
    const simd_t gain_vec = gain;  // Broadcast scalar to all lanes
    size_t i = 0;
    for (; i + simd_t::size() <= n; i += simd_t::size()) {
        simd_t s(&channel[i], stdx::element_aligned);
        s *= gain_vec;
        s.copy_to(&channel[i], stdx::element_aligned);
    }
    // Scalar tail: handle the last n % simd_t::size() samples
    for (; i < n; ++i) channel[i] *= gain;
}

// SIMD with INTERLEAVED format - requires shuffling
// Much more complex and often not worth it for stereo
```
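Because most audio APIs deliver interleaved data, a conversion step usually sits at the boundary of the SIMD code. A minimal scalar sketch for the stereo case follows; the function names `deinterleave_stereo` and `interleave_stereo` are illustrative assumptions, not names from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split interleaved stereo (L0 R0 L1 R1 ...) into two planar channel buffers.
// `frames` is the number of sample frames (one left + one right sample each).
void deinterleave_stereo(const float* interleaved, std::size_t frames,
                         float* left, float* right) {
    for (std::size_t i = 0; i < frames; ++i) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}

// Merge two planar channel buffers back into interleaved stereo.
void interleave_stereo(const float* left, const float* right,
                       std::size_t frames, float* interleaved) {
    for (std::size_t i = 0; i < frames; ++i) {
        interleaved[2 * i]     = left[i];
        interleaved[2 * i + 1] = right[i];
    }
}
```

Converting once on input and once on output keeps every effect in between on contiguous per-channel data.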
Digital Filters: IIR and FIR
Filters are the foundation of audio effects. Two fundamental types:
FIR (Finite Impulse Response):
y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] + ... + bM*x[n-M]
Output depends only on input samples (no feedback)
- Always stable
- Linear phase possible
- Requires more coefficients for sharp cutoffs
- Easily parallelized with SIMD
IIR (Infinite Impulse Response):
y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
Output depends on previous outputs (feedback)
- Can be unstable if poorly designed
- Non-linear phase
- Very efficient (few coefficients)
- The biquad is the most common form
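To make the FIR/IIR contrast concrete, here is a small scalar sketch of both: a direct-form FIR and a one-pole IIR smoother. The function names are hypothetical and the one-pole filter is chosen only as the simplest feedback example, not as something the project requires.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct-form FIR: y[n] = sum_k b[k] * x[n-k], with x[n] = 0 for n < 0.
// Each output depends only on inputs, so outputs can be computed independently.
std::vector<float> fir_filter(const std::vector<float>& b,
                              const std::vector<float>& x) {
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < b.size() && k <= n; ++k) {
            acc += b[k] * x[n - k];
        }
        y[n] = acc;
    }
    return y;
}

// One-pole IIR: y[n] = (1 - a) * x[n] + a * y[n-1].
// Each output needs the previous output, so the loop is inherently sequential.
std::vector<float> one_pole_iir(float a, const std::vector<float>& x) {
    std::vector<float> y(x.size(), 0.0f);
    float prev = 0.0f;
    for (std::size_t n = 0; n < x.size(); ++n) {
        prev = (1.0f - a) * x[n] + a * prev;
        y[n] = prev;
    }
    return y;
}
```

Feeding an impulse into `one_pole_iir` produces a response that decays geometrically and never reaches exactly zero in exact arithmetic, which is where the "infinite" in IIR comes from.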
The Biquad Filter:
The second-order IIR filter (biquad) is the building block of audio EQ:
BIQUAD DIFFERENCE EQUATION:
y[n] = (b0/a0)*x[n] + (b1/a0)*x[n-1] + (b2/a0)*x[n-2]
- (a1/a0)*y[n-1] - (a2/a0)*y[n-2]
Typically normalized so a0 = 1:
y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
BIQUAD SIGNAL FLOW:
x[n] ───┬───► [b0] ───┬───────────────────────────► y[n]
│ │ │
▼ │ │
[z⁻¹] │ │
│ │ │
x[n-1] ─┴───► [b1] ──►(+)◄── [-a1] ◄───── [z⁻¹] ◄───┘
│ │ │
▼ │ │
[z⁻¹] │ │
│ │ │
x[n-2] ─┴───► [b2] ──►(+)◄── [-a2] ◄───── [z⁻¹]
Different coefficient calculations create different filter types:
- Low-pass: Attenuates high frequencies
- High-pass: Attenuates low frequencies
- Band-pass: Passes a range of frequencies
- Notch: Removes a specific frequency
- Peak/EQ: Boosts or cuts around a center frequency
- Low-shelf / High-shelf: Boosts or cuts below/above a frequency
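As an illustration of how a filter type becomes five numbers, the sketch below computes low-pass coefficients using the formulas from the widely used Audio EQ Cookbook, normalized so that a0 = 1. The struct and function names (`LowPassCoeffs`, `make_low_pass`) are assumptions for this example.

```cpp
#include <cassert>
#include <cmath>

struct LowPassCoeffs {
    float b0, b1, b2;  // Feedforward
    float a1, a2;      // Feedback (a0 normalized to 1)
};

// Cookbook low-pass: alpha = sin(w0) / (2Q), then divide everything by a0.
LowPassCoeffs make_low_pass(float fc, float q, float sample_rate) {
    const double pi = 3.14159265358979323846;
    const double w0 = 2.0 * pi * fc / sample_rate;
    const double cos_w0 = std::cos(w0);
    const double alpha = std::sin(w0) / (2.0 * q);
    const double a0 = 1.0 + alpha;

    LowPassCoeffs c;
    c.b0 = static_cast<float>((1.0 - cos_w0) / 2.0 / a0);
    c.b1 = static_cast<float>((1.0 - cos_w0) / a0);
    c.b2 = c.b0;
    c.a1 = static_cast<float>(-2.0 * cos_w0 / a0);
    c.a2 = static_cast<float>((1.0 - alpha) / a0);
    return c;
}
```

A quick sanity check is the gain at DC, `(b0 + b1 + b2) / (1 + a1 + a2)`, which should come out as 1 for a low-pass, and the gain at Nyquist, `(b0 - b1 + b2) / (1 - a1 + a2)`, which should be 0.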
SIMD Challenge with IIR Filters
IIR filters have a fundamental problem for SIMD: temporal dependencies.
y[0] = b0*x[0] + b1*x[-1] + b2*x[-2] - a1*y[-1] - a2*y[-2]
y[1] = b0*x[1] + b1*x[0] + b2*x[-1] - a1*y[0] - a2*y[-1]
↑
Depends on y[0]!
Can't compute in parallel
Solutions:
- Process channels in parallel: 8 mono channels = 8 SIMD lanes
```cpp
// Process 8 different audio channels simultaneously.
// Each channel has its own filter state.
// (gather_sample, biquad_step, and scatter_sample are helpers not shown here.)
struct MultiChannelFilter {
    std::array<BiquadState, 8> states;  // One per SIMD lane

    void process(float* channels[8], size_t n) {
        for (size_t i = 0; i < n; ++i) {
            simd_t x = gather_sample(channels, i);
            simd_t y = biquad_step(x);  // All 8 channels at once
            scatter_sample(channels, i, y);
        }
    }
};
```
- Direct Form II (and its transposed variant): halves the per-channel state (advanced)
```
w[n] = x[n] - a1*w[n-1] - a2*w[n-2]
y[n] = b0*w[n] + b1*w[n-1] + b2*w[n-2]
```
  This separates the feedback recursion (w) from the feedforward sum (y). The sample-to-sample dependency remains, so the gain is less state and memory traffic rather than parallelism across time.
- Block processing for FIR portions: Batch the feedforward part
```cpp
// Compute all x-term contributions in parallel
// Then sequentially add y-term contributions
```
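For reference, Transposed Direct Form II keeps only two state values per channel. The sketch below implements it next to Direct Form I so the two can be compared on the same input; the struct and function names are hypothetical, and the coefficients are assumed to be normalized (a0 = 1).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Coeffs { float b0, b1, b2, a1, a2; };

// Direct Form I: keeps two past inputs and two past outputs.
std::vector<float> biquad_df1(const Coeffs& c, const std::vector<float>& x) {
    std::vector<float> y(x.size());
    float x1 = 0, x2 = 0, y1 = 0, y2 = 0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        const float out = c.b0 * x[n] + c.b1 * x1 + c.b2 * x2 - c.a1 * y1 - c.a2 * y2;
        x2 = x1; x1 = x[n];
        y2 = y1; y1 = out;
        y[n] = out;
    }
    return y;
}

// Transposed Direct Form II: two state variables s1, s2.
//   y  = b0*x + s1
//   s1 = b1*x - a1*y + s2
//   s2 = b2*x - a2*y
std::vector<float> biquad_tdf2(const Coeffs& c, const std::vector<float>& x) {
    std::vector<float> y(x.size());
    float s1 = 0, s2 = 0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        const float out = c.b0 * x[n] + s1;
        s1 = c.b1 * x[n] - c.a1 * out + s2;
        s2 = c.b2 * x[n] - c.a2 * out;
        y[n] = out;
    }
    return y;
}
```

Both forms compute the same transfer function, so their outputs should agree up to floating-point rounding. In a multi-channel SIMD filter, the transposed form means two state registers per filter instead of four.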
Dynamics Processing: Compression
A compressor reduces the dynamic range of audio by attenuating signals above a threshold:
COMPRESSION TRANSFER CURVE:
Output dB
▲
│ ╱ Threshold
│ ╱
│ ╱
│ ╱ ← Ratio 4:1
│ ╱ (4dB input → 1dB output above threshold)
│╱
└───────────────────────────► Input dB
Threshold
Key Parameters:
- Threshold: Level where compression begins
- Ratio: How much to reduce (4:1 means 4dB in → 1dB out above threshold)
- Attack: How quickly compression engages
- Release: How quickly compression releases
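The threshold and ratio define a static curve in decibels. The following sketch evaluates that curve and converts it to a linear gain multiplier; the helper names are hypothetical, and attack/release smoothing is deliberately left out.

```cpp
#include <cassert>
#include <cmath>

inline float db_to_linear(float db)  { return std::pow(10.0f, db / 20.0f); }
inline float linear_to_db(float lin) { return 20.0f * std::log10(lin); }

// Static compressor curve in dB: unchanged below threshold,
// slope 1/ratio above it.
inline float compressed_level_db(float in_db, float threshold_db, float ratio) {
    if (in_db <= threshold_db) return in_db;
    return threshold_db + (in_db - threshold_db) / ratio;
}

// Gain multiplier that moves a linear level onto the curve.
inline float compressor_gain(float level_linear, float threshold_db, float ratio) {
    if (level_linear <= 0.0f) return 1.0f;  // Silence: nothing to reduce
    const float in_db = linear_to_db(level_linear);
    const float out_db = compressed_level_db(in_db, threshold_db, ratio);
    return db_to_linear(out_db - in_db);
}
```

With a threshold of -20 dB and a 4:1 ratio, a -10 dB input is 10 dB over the threshold and comes out 2.5 dB over it, at -17.5 dB.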
SIMD Compression Implementation:
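```cpp
#include <cmath>  // std::fabs, std::copysign for the scalar tail

void compress(float* samples, size_t n, float threshold, float ratio) {
    using simd_t = stdx::native_simd<float>;
    const simd_t thresh_vec = threshold;
    const simd_t ratio_vec = ratio;
    size_t i = 0;
    for (; i + simd_t::size() <= n; i += simd_t::size()) {
        simd_t s(&samples[i], stdx::element_aligned);
        const simd_t abs_s = stdx::abs(s);
        // Lanes where abs > threshold get compressed
        const auto mask = abs_s > thresh_vec;
        // Ratio applied to linear amplitude: a simplification of the dB curve
        const simd_t compressed = thresh_vec + (abs_s - thresh_vec) / ratio_vec;
        // Masked assignment: preserve original sign, apply compressed magnitude
        stdx::where(mask, s) = stdx::copysign(compressed, s);
        s.copy_to(&samples[i], stdx::element_aligned);
    }
    // Scalar tail for the last n % simd_t::size() samples
    for (; i < n; ++i) {
        const float a = std::fabs(samples[i]);
        if (a > threshold)
            samples[i] = std::copysign(threshold + (a - threshold) / ratio, samples[i]);
    }
}
```
The key insight: the masked assignment `stdx::where(mask, s) = ...` enables conditional processing without branches, which would destroy SIMD efficiency. Note that `stdx::where()` takes a mask and a target and is then assigned to; it is not a three-argument select.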
Why This Matters: Real-Time Constraints
Audio processing has hard real-time requirements. Miss a deadline, and users hear glitches:
AUDIO BUFFER TIMELINE:
│◄──────── Buffer Period (11.6ms @ 512/44.1k) ────────────────────►│
│ │
│ Audio │ Buffer N │ Buffer N+1 │ Buffer N+2 │
│ Output: │ Playing │ Playing │ Playing │
│ │ │ │ │
│ Your │ │◄─ Must ─►│ │ │
│ Code: │ │ finish │ │ │
│ │ │ here │ │ │
│ │ │ │ │
│ Buffer N+1 Buffer N+2 Buffer N+3 │
│ Processing Processing Processing │
│ │
If you don't finish processing Buffer N+1 before Buffer N finishes
playing, you get an underrun (glitch/crackle/silence).
The Real-Time Rules:
- No memory allocations in the audio callback
  - `new`, `malloc`, and `std::vector::push_back` can block
  - Pre-allocate everything before audio starts
- No locks (mutexes) in the audio callback
  - Locks can block waiting for other threads
  - Use lock-free queues for communication
- No I/O or system calls
  - File I/O, network, logging can block indefinitely
  - Buffer log messages and write from another thread
- Pre-allocate everything
  - Filter coefficients computed before processing
  - Delay lines sized at initialization
  - Scratch buffers allocated once
```cpp
// BAD: Allocation in audio callback
void processBlock(float* buffer, size_t n) {
    std::vector<float> temp(n);  // ALLOCATION! Can block!
    // ...
}

// GOOD: Pre-allocated
class AudioProcessor {
    std::vector<float> temp_buffer;  // Allocated in constructor

public:  // Without this, the constructor and processBlock would be private
    explicit AudioProcessor(size_t max_buffer_size)
        : temp_buffer(max_buffer_size) {}

    void processBlock(float* buffer, size_t n) {
        // Use temp_buffer (n must not exceed max_buffer_size), no allocation
    }
};
```
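The "lock-free queue" rule can be illustrated with a single-producer/single-consumer ring buffer, for example to pass parameter changes from a UI thread to the audio callback. This is a minimal sketch under stated assumptions: exactly one thread pushes and exactly one thread pops, the capacity is a power of two, and `std::atomic<std::size_t>` is lock-free on the target platform. The class name `SpscQueue` is hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // Next index to read (consumer side)
    std::atomic<std::size_t> tail_{0};  // Next index to write (producer side)

public:
    // Producer thread only. Returns false instead of blocking when full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;  // Full
        slots_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false instead of blocking when empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return false;  // Empty
        out = slots_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);
        return true;
    }
};
```

The audio callback would drain the queue with `try_pop` at the start of each block; because neither call allocates, locks, or waits, both are safe to use under the rules above.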
Historical Context
Digital Audio Timeline:
- 1957: Max Mathews at Bell Labs creates MUSIC I, first computer music program
- 1965: Cooley-Tukey FFT algorithm enables efficient spectral processing
- 1979: Sony/Philips develop CD format (44.1 kHz, 16-bit)
- 1983: MIDI standard enables synthesizer control
- 1996: MP3 popularizes digital music distribution
- 1999: Pro Tools 5.0 with DSP accelerator cards
- 2000s: CPU speed enables real-time plugin processing
- 2010s: SIMD instructions (AVX, AVX-512) enable massive parallelism
- Today: Software plugins rival hardware, real-time ML audio processing
Why SIMD transformed audio:
- Before SIMD: Each sample processed individually
- With AVX: 8 samples processed in one instruction
- Processing time (this project's illustrative figures for the full chain): about 0.15 ms scalar vs 0.02 ms SIMD per 512-sample buffer
- Enables: More plugins, lower latency, higher sample rates
Common Misconceptions
Misconception 1: “Higher sample rates always sound better” Reality: Human hearing tops out at about 20 kHz. A 44.1 kHz sample rate has a Nyquist frequency of 22.05 kHz, which covers that range. Higher rates matter for processing headroom and easier filter design, not for “hearing” more.
Misconception 2: “SIMD automatically makes audio code faster” Reality: SIMD only helps when:
- You have enough parallel data (small buffers may not benefit)
- Data is properly aligned and contiguous
- Algorithm maps to SIMD operations (filters with feedback are hard)
Misconception 3: “Floating-point is always better than integer for audio” Reality: 32-bit float is standard in plugins, but:
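- 24-bit fixed-point offers precision comparable to the 24-bit significand of a 32-bit float (about 144 dB); what float adds is headroom above and below full scale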
- Integer can be faster (especially on embedded)
- Modern converters are 24-bit; 32-bit float is computational convenience
Misconception 4: “Real-time means fast” Reality: Real-time means predictable. A function that takes exactly 10ms every time is “real-time safe.” A function that takes 1ms 99.9% of the time but 100ms occasionally is NOT.
Project Specification
What You Will Build
A command-line audio effects processor that:
- Reads WAV files and processes them through a chain of effects
- Implements classic audio effects: EQ, compressor, limiter
- Uses SIMD for all sample processing
- Optionally supports real-time audio passthrough (stretch goal)
- Reports performance metrics comparing scalar vs SIMD implementations
Functional Requirements
| ID | Requirement | Priority |
|---|---|---|
| F1 | Load and save WAV files (16-bit and 24-bit, mono and stereo) | Must |
| F2 | Apply gain with SIMD (volume control) | Must |
| F3 | Implement parametric EQ with biquad filters | Must |
| F4 | Implement dynamics compressor with threshold and ratio | Must |
| F5 | Implement limiter (brick-wall) | Must |
| F6 | Support effect chaining (multiple effects in series) | Must |
| F7 | Report processing time per buffer | Must |
| F8 | Compare SIMD vs scalar performance | Must |
| F9 | Display VU meters (peak level per channel) | Should |
| F10 | Real-time audio passthrough mode | Could |
Non-Functional Requirements
| ID | Requirement | Target |
|---|---|---|
| N1 | Processing time per 512-sample buffer | < 0.1ms |
| N2 | SIMD speedup vs scalar | >= 5x |
| N3 | Real-time ratio | >= 500x (can process 500x faster than real-time) |
| N4 | Latency in real-time mode | < 15ms |
| N5 | No allocations during audio processing | 0 allocations |
| N6 | Support files up to 1 hour | ~159 million samples per channel at 44.1 kHz |
Performance Metrics
Your processor must meet these performance targets:
PERFORMANCE TARGETS:
Buffer Size: 512 samples
Sample Rate: 44.1 kHz
Real-time budget: 11.6 ms per buffer
Your target: < 0.1 ms per buffer
= 116x safety margin
= Can run 116 effect chains simultaneously
SIMD Speedup Requirements:
┌─────────────────────┬─────────────┬──────────────┐
│ Effect │ Scalar Time │ SIMD Speedup │
├─────────────────────┼─────────────┼──────────────┤
│ Gain │ 0.02 ms │ >= 7x │
│ Biquad EQ (stereo) │ 0.08 ms │ >= 2x │
│ Compressor │ 0.05 ms │ >= 5x │
│ Full chain │ 0.15 ms │ >= 5x │
└─────────────────────┴─────────────┴──────────────┘
Real World Outcome
When complete, your audio processor should produce output like this:
$ ./audio_dsp --input music.wav --output processed.wav
Input: music.wav
Duration: 5:00 (13,230,000 samples)
Sample rate: 44100 Hz
Channels: 2 (stereo)
Processing chain:
1. Low-shelf EQ (+3dB @ 100Hz)
2. Parametric EQ (-2dB @ 3kHz, Q=1.5)
3. Compressor (threshold=-20dB, ratio=4:1)
4. Limiter (ceiling=-0.3dB)
Performance:
Samples per SIMD op: 8 (AVX)
Time per buffer (512 samples): 0.02ms
Real-time ratio: 580x (could process 580 streams in real-time!)
Comparison:
Scalar processing: 0.15ms per buffer
SIMD processing: 0.02ms per buffer
Speedup: 7.5x
Output saved to processed.wav
For optional real-time mode:
$ ./audio_dsp --realtime --device "Built-in Audio"
Real-time mode: Processing live audio
Latency: 11.6ms (512 samples @ 44.1kHz)
Running... Press Ctrl+C to stop
[VU Meter] L: ████████░░ -6dB R: ███████░░░ -8dB
Solution Architecture
High-Level Design
AUDIO DSP PROCESSING ARCHITECTURE:
┌─────────────────────────────────────────────────────────────────────────────┐
│ audio_dsp CLI │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────────────────────────┐ │
│ │ WAV I/O │────►│ Buffer │────►│ Processing Chain │ │
│ │ (libsnd) │ │ Manager │ │ │ │
│ └────────────┘ └────────────┘ │ ┌──────┐ ┌─────┐ ┌───────┐ │ │
│ │ │ │ │ EQ │─►│Comp │─►│Limiter│ │ │
│ │ │ │ └──────┘ └─────┘ └───────┘ │ │
│ ▼ ▼ └────────────────────────────────┘ │
│ ┌────────────┐ ┌────────────┐ │ │
│ │ Format │ │ Planar ◄──►│ ▼ │
│ │ Conversion │ │ Interleave │ ┌────────────────────────────────┐ │
│ └────────────┘ └────────────┘ │ Performance Monitor │ │
│ │ (timing, VU meters, stats) │ │
│ └────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
PROCESSING CHAIN DETAIL:
Input Buffer Effect 1 Effect 2 Output Buffer
(Planar Float) (EQ) (Compressor) (Planar Float)
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ L0 L1 L2... │───►│ Biquad │───►│ Envelope │───►│ L0 L1 L2... │
│ R0 R1 R2... │ │ Filter │ │ Follower │ │ R0 R1 R2... │
└─────────────┘ │ (SIMD) │ │ + Gain │ └─────────────┘
└─────────────┘ │ (SIMD) │
└─────────────┘
Key Components
| Component | Responsibility | SIMD Usage |
|---|---|---|
| AudioBuffer | Hold planar audio data, manage format conversion | Load/store aligned |
| BiquadFilter | Apply EQ (low-shelf, high-shelf, parametric) | Parallel channel processing |
| Compressor | Dynamics processing with envelope follower | Conditional with where() |
| Limiter | Hard clip/limit to prevent distortion | Clamping with min/max |
| ProcessingChain | Ordered list of effects | N/A (orchestration) |
| PerformanceMonitor | Measure and report timing | N/A (measurement) |
| WavIO | Read/write WAV files | N/A (file I/O) |
Data Structures
```cpp
// Planar audio buffer (better for SIMD)
struct AudioBuffer {
    std::vector<float> left;
    std::vector<float> right;
    size_t num_samples = 0;
    size_t sample_rate = 44100;

    // Target alignment for SIMD (one cache line). std::vector's default
    // allocator does not guarantee this; it takes a custom aligned allocator,
    // or element_aligned loads/stores as used throughout this guide.
    static constexpr size_t alignment = 64;

    void resize(size_t n) {
        left.resize(n);
        right.resize(n);
        num_samples = n;
    }
};

// Biquad filter coefficients
struct BiquadCoeffs {
    float b0, b1, b2;  // Feedforward
    float a1, a2;      // Feedback (normalized, a0 = 1)

    // Factory methods for different filter types
    static BiquadCoeffs lowShelf(float fc, float gain_db, float sample_rate);
    static BiquadCoeffs highShelf(float fc, float gain_db, float sample_rate);
    static BiquadCoeffs parametric(float fc, float gain_db, float Q, float sample_rate);
    static BiquadCoeffs lowPass(float fc, float Q, float sample_rate);
};

// Biquad filter state (per channel)
struct BiquadState {
    float x1 = 0, x2 = 0;  // Previous inputs
    float y1 = 0, y2 = 0;  // Previous outputs
};

// Stereo biquad filter
struct StereoBiquad {
    BiquadCoeffs coeffs;
    BiquadState left_state, right_state;

    void process(AudioBuffer& buffer);
    void reset() { left_state = {}; right_state = {}; }
};

// Compressor parameters and state
struct Compressor {
    float threshold_linear;  // Linear, not dB
    float ratio;
    float attack_coeff;      // Envelope follower attack (per sample)
    float release_coeff;     // Envelope follower release
    float envelope = 0;      // Current envelope level

    void process(AudioBuffer& buffer);
    void setParams(float threshold_db, float ratio,
                   float attack_ms, float release_ms, float sample_rate);
};

// Limiter (simple brick-wall)
struct Limiter {
    float ceiling_linear;

    void process(AudioBuffer& buffer);
    void setCeiling(float ceiling_db);
};

// Common effect interface used by the chain. Assumed here: the guide does not
// define Effect, so the effects above would derive from it or be wrapped.
struct Effect {
    virtual ~Effect() = default;
    virtual void process(AudioBuffer& buffer) = 0;
    virtual void reset() = 0;
};

// Processing chain
class ProcessingChain {
    std::vector<std::unique_ptr<Effect>> effects;

public:
    void addEffect(std::unique_ptr<Effect> effect);
    void process(AudioBuffer& buffer);
    void reset();  // Clear all filter states
};
```
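If genuinely aligned sample storage is wanted for the planar buffers, one option is a small allocator that requests over-aligned memory, since `std::vector` with the default allocator makes no 64-byte promise. This is a sketch assuming C++17 aligned `operator new`; `AlignedAllocator` and `AlignedFloatVector` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

template <typename T, std::size_t Alignment = 64>
struct AlignedAllocator {
    using value_type = T;

    AlignedAllocator() = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Needed because the non-type Alignment parameter defeats the default rebind.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t(Alignment)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(Alignment));
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// Drop-in replacement for std::vector<float> with 64-byte aligned storage.
using AlignedFloatVector = std::vector<float, AlignedAllocator<float>>;
```

With storage like this, loads could use `stdx::vector_aligned` instead of `stdx::element_aligned`, provided the offsets are also multiples of the vector width. Whether that is measurably faster on a given CPU is something to benchmark rather than assume.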
Algorithm Overview
Biquad Filter Algorithm:
```
FOR each sample n:
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    // Update state
    x[n-2] = x[n-1]
    x[n-1] = x[n]
    y[n-2] = y[n-1]
    y[n-1] = y[n]
```
Time complexity: O(n) per channel
Space complexity: O(1) state per channel
Compressor Algorithm:
```
FOR each sample:
    1. Compute sample amplitude (abs value)
    2. Update envelope follower:
        IF amplitude > envelope:
            envelope += attack_coeff * (amplitude - envelope)
        ELSE:
            envelope += release_coeff * (amplitude - envelope)
    3. Compute gain reduction:
        IF envelope > threshold:
            gain = threshold + (envelope - threshold) / ratio
            gain = gain / envelope  // Convert to multiplier
        ELSE:
            gain = 1.0
    4. Apply gain to sample
```
SIMD Gain (Simple Example):
```
SIMD lanes: 8 (AVX)
FOR i = 0 to n STEP 8:
    // Load 8 samples into SIMD register
    samples = SIMD_LOAD(buffer[i:i+8])
    // Multiply all 8 by gain
    samples = samples * gain_vector
    // Store 8 samples back
    SIMD_STORE(buffer[i:i+8], samples)
```
Theoretical speedup: ~7-8x vs non-vectorized scalar code
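The envelope follower in the compressor algorithm needs per-sample attack and release coefficients. One common convention, assumed here rather than taken from the project, derives them from a time constant so that the envelope covers about 63% of a step within that time. `time_to_coeff` and `EnvelopeFollower` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// One-pole smoothing coefficient for a time constant given in milliseconds:
// coeff = 1 - exp(-1 / (time_s * sample_rate)).
inline float time_to_coeff(float time_ms, float sample_rate) {
    if (time_ms <= 0.0f) return 1.0f;  // Instant response
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for small x
    return -std::expm1(-1.0f / (time_ms * 0.001f * sample_rate));
}

struct EnvelopeFollower {
    float attack_coeff;
    float release_coeff;
    float envelope = 0.0f;

    // Same update rule as step 2 of the compressor algorithm.
    float step(float sample) {
        const float amplitude = std::fabs(sample);
        const float coeff = amplitude > envelope ? attack_coeff : release_coeff;
        envelope += coeff * (amplitude - envelope);
        return envelope;
    }
};
```

With a 10 ms attack at 44.1 kHz, feeding a constant 1.0 for 441 samples brings the envelope to roughly 0.63. A time of 0 ms yields a coefficient of 1, so the envelope tracks the input instantly.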
Implementation Guide
Development Environment Setup
# Required packages (Ubuntu/Debian)
sudo apt-get install libsndfile1-dev # WAV file I/O
sudo apt-get install libasound2-dev # ALSA for real-time (optional)
sudo apt-get install portaudio19-dev # PortAudio for cross-platform (optional)
# For macOS
brew install libsndfile portaudio
# Verify compiler SIMD support
g++ -march=native -Q --help=target | grep -E "avx|sse"
# Create project structure
mkdir -p audio_dsp/{src,include,tests,assets}
cd audio_dsp
Project Structure
audio_dsp/
├── CMakeLists.txt
├── include/
│ ├── audio_buffer.hpp # AudioBuffer class
│ ├── simd_ops.hpp # SIMD operations wrapper
│ ├── biquad.hpp # Biquad filter
│ ├── compressor.hpp # Dynamics compressor
│ ├── limiter.hpp # Brick-wall limiter
│ ├── chain.hpp # Processing chain
│ ├── wav_io.hpp # WAV file reading/writing
│ └── performance.hpp # Timing and metrics
├── src/
│ ├── main.cpp # CLI entry point
│ ├── audio_buffer.cpp
│ ├── biquad.cpp
│ ├── compressor.cpp
│ ├── limiter.cpp
│ ├── chain.cpp
│ └── wav_io.cpp
├── tests/
│ ├── test_biquad.cpp
│ ├── test_compressor.cpp
│ ├── test_simd.cpp
│ └── reference_signals/ # Known-good test signals
└── assets/
└── test_audio/ # Sample WAV files for testing
Implementation Phases
Phase 1: Foundation (Days 1-3)
Goals:
- Set up build system with SIMD support
- Implement AudioBuffer with format conversion
- Implement WAV file I/O
- Create basic SIMD gain operation
Tasks:
1. Create CMakeLists.txt with SIMD flags:
```cmake
cmake_minimum_required(VERSION 3.16)
project(audio_dsp)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native -O3")

find_package(PkgConfig REQUIRED)
pkg_check_modules(SNDFILE REQUIRED sndfile)

add_executable(audio_dsp
    src/main.cpp
    src/audio_buffer.cpp
    src/biquad.cpp
    src/compressor.cpp
    src/limiter.cpp
    src/chain.cpp
    src/wav_io.cpp
)

target_include_directories(audio_dsp PRIVATE include ${SNDFILE_INCLUDE_DIRS})
target_link_libraries(audio_dsp ${SNDFILE_LIBRARIES})
```
2. Implement AudioBuffer:
```cpp
// audio_buffer.hpp
#pragma once
#include <cstddef>
#include <vector>
#include <experimental/simd>
namespace stdx = std::experimental;

struct AudioBuffer {
    // alignas on a std::vector member would align only the vector object,
    // not its heap storage, so it is omitted. The loads in this guide use
    // stdx::element_aligned and do not require aligned data.
    std::vector<float> left;
    std::vector<float> right;
    size_t num_samples = 0;
    size_t sample_rate = 44100;
    int num_channels = 2;

    void resize(size_t n);
    void fromInterleaved(const float* interleaved, size_t n);
    void toInterleaved(float* interleaved) const;
    void clear();
};
```
3. Implement WAV I/O using libsndfile:
```cpp
// wav_io.hpp
#pragma once
#include "audio_buffer.hpp"
#include <string>

bool loadWav(const std::string& path, AudioBuffer& buffer);
bool saveWav(const std::string& path, const AudioBuffer& buffer);
```
4. Implement basic SIMD gain:
```cpp
// simd_ops.hpp
#pragma once
#include <cstddef>
#include <experimental/simd>
namespace stdx = std::experimental;
using simd_f = stdx::native_simd<float>;

inline void apply_gain(float* samples, size_t n, float gain) {
    const simd_f gain_vec = gain;
    const size_t step = simd_f::size();
    size_t i = 0;
    for (; i + step <= n; i += step) {
        simd_f s(&samples[i], stdx::element_aligned);
        s *= gain_vec;
        s.copy_to(&samples[i], stdx::element_aligned);
    }
    // Scalar tail
    for (; i < n; ++i) {
        samples[i] *= gain;
    }
}
```
Checkpoint: Load a WAV file, apply gain, save result. Verify by listening.
Phase 2: Biquad Filter (Days 4-7)
Goals:
- Implement biquad coefficient calculations
- Implement filter processing (scalar first, then optimize)
- Create EQ effect types (low-shelf, high-shelf, parametric)
- Test with sine sweeps
Tasks:
1. Implement coefficient calculations (from Audio EQ Cookbook):
```cpp
// biquad.cpp
#include "biquad.hpp"
#include <cmath>

BiquadCoeffs BiquadCoeffs::lowShelf(float fc, float gain_db, float sr) {
    constexpr float kPi = 3.14159265358979f;  // M_PI is not standard C++
    const float A = std::pow(10.0f, gain_db / 40.0f);
    const float w0 = 2.0f * kPi * fc / sr;
    const float cos_w0 = std::cos(w0);
    const float sin_w0 = std::sin(w0);
    // Cookbook shelf formula: alpha = sin(w0)/2 * sqrt((A + 1/A)*(1/S - 1) + 2).
    // With shelf slope S = 1 this reduces to sin(w0)/2 * sqrt(2).
    const float alpha = sin_w0 / 2.0f * std::sqrt(2.0f);

    const float a0 = (A + 1) + (A - 1) * cos_w0 + 2 * std::sqrt(A) * alpha;
    BiquadCoeffs c;
    c.b0 = A * ((A + 1) - (A - 1) * cos_w0 + 2 * std::sqrt(A) * alpha) / a0;
    c.b1 = 2 * A * ((A - 1) - (A + 1) * cos_w0) / a0;
    c.b2 = A * ((A + 1) - (A - 1) * cos_w0 - 2 * std::sqrt(A) * alpha) / a0;
    c.a1 = -2 * ((A - 1) + (A + 1) * cos_w0) / a0;
    c.a2 = ((A + 1) + (A - 1) * cos_w0 - 2 * std::sqrt(A) * alpha) / a0;
    return c;
}
// Similar implementations for highShelf, parametric, lowPass, highPass
```
2. Implement filter processing:
```cpp
// Run one channel through the biquad (Direct Form I), updating its state
static void processChannel(const BiquadCoeffs& c, BiquadState& s,
                           float* samples, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        const float x = samples[i];
        const float y = c.b0 * x
                      + c.b1 * s.x1
                      + c.b2 * s.x2
                      - c.a1 * s.y1
                      - c.a2 * s.y2;
        s.x2 = s.x1;
        s.x1 = x;
        s.y2 = s.y1;
        s.y1 = y;
        samples[i] = y;
    }
}

void StereoBiquad::process(AudioBuffer& buffer) {
    processChannel(coeffs, left_state, buffer.left.data(), buffer.num_samples);
    processChannel(coeffs, right_state, buffer.right.data(), buffer.num_samples);
}
```
3. Create test with known signals:
```cpp
// Generate a linear sine sweep for testing
void generateSineSweep(AudioBuffer& buffer, float f_start, float f_end) {
    constexpr float kPi = 3.14159265358979f;
    const float fs = static_cast<float>(buffer.sample_rate);
    // Float division: integer division would truncate the duration
    const float duration = static_cast<float>(buffer.num_samples) / fs;
    for (size_t i = 0; i < buffer.num_samples; ++i) {
        const float t = static_cast<float>(i) / fs;
        // Phase is the integral of the instantaneous frequency
        // f(t) = f_start + (f_end - f_start) * t / duration
        const float phase = 2.0f * kPi *
            (f_start * t + 0.5f * (f_end - f_start) * t * t / duration);
        buffer.left[i] = buffer.right[i] = std::sin(phase);
    }
}
```
Checkpoint: Apply low-pass filter to noise, verify high frequencies are attenuated.
Phase 3: SIMD Optimization (Days 8-10)
Goals:
- Optimize filter for SIMD (parallel channels)
- Implement SIMD compressor with stdx::where()
- Benchmark scalar vs SIMD
- Verify correctness matches scalar implementation
Tasks:
- Multi-channel SIMD filter (process 8 channels at once):
```cpp
// For true SIMD benefit with IIR, process multiple channels.
// Assumes N_CHANNELS == simd_f::size() (one channel per SIMD lane).
template <size_t N_CHANNELS>
class MultiChannelBiquad {
    std::array<BiquadCoeffs, N_CHANNELS> coeffs;
    // State as SoA for SIMD: lane k holds channel k's history
    simd_f x1 = 0.0f, x2 = 0.0f, y1 = 0.0f, y2 = 0.0f;
    simd_f b0, b1, b2, a1, a2;

public:
    void process(std::array<float*, N_CHANNELS> channels, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            simd_f x = gather(channels, i);  // Helper: lane k = channels[k][i]
            simd_f y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1; x1 = x;
            y2 = y1; y1 = y;
            scatter(channels, i, y);         // Helper: channels[k][i] = lane k
        }
    }
};
```
- SIMD compressor:
```cpp
// Stateless per-sample version: attack/release smoothing is omitted here
void Compressor::process(AudioBuffer& buffer) {
    const simd_f thresh = threshold_linear;
    const simd_f ratio_inv = 1.0f / ratio;
    const size_t n = buffer.num_samples;
    size_t i = 0;
    // Process in SIMD chunks
    for (; i + simd_f::size() <= n; i += simd_f::size()) {
        simd_f left(&buffer.left[i], stdx::element_aligned);
        simd_f right(&buffer.right[i], stdx::element_aligned);
        // Peak of stereo
        const simd_f peak = stdx::max(stdx::abs(left), stdx::abs(right));
        // Compute gain reduction; lanes at or below threshold keep gain 1
        const auto over_thresh = peak > thresh;
        const simd_f target = thresh + (peak - thresh) * ratio_inv;
        simd_f gain = 1.0f;
        stdx::where(over_thresh, gain) = target / peak;
        // Apply gain
        left *= gain;
        right *= gain;
        left.copy_to(&buffer.left[i], stdx::element_aligned);
        right.copy_to(&buffer.right[i], stdx::element_aligned);
    }
    // Scalar tail for the remaining samples
    for (; i < n; ++i) {
        const float peak = std::max(std::fabs(buffer.left[i]), std::fabs(buffer.right[i]));
        if (peak > threshold_linear) {
            const float gain = (threshold_linear + (peak - threshold_linear) / ratio) / peak;
            buffer.left[i] *= gain;
            buffer.right[i] *= gain;
        }
    }
}
```
- Benchmark framework:
```cpp
struct BenchmarkResult {
    double scalar_time_ms;
    double simd_time_ms;
    double speedup;
};

BenchmarkResult benchmark(std::function<void()> scalar_fn,
                          std::function<void()> simd_fn,
                          int iterations = 1000) {
    using clock = std::chrono::steady_clock;
    // Explicit millisecond conversion: the clock's native tick is unspecified
    using ms = std::chrono::duration<double, std::milli>;

    // Warmup
    for (int i = 0; i < 10; ++i) { scalar_fn(); simd_fn(); }

    auto start = clock::now();
    for (int i = 0; i < iterations; ++i) scalar_fn();
    const double scalar_ms = ms(clock::now() - start).count();

    start = clock::now();
    for (int i = 0; i < iterations; ++i) simd_fn();
    const double simd_ms = ms(clock::now() - start).count();

    return {scalar_ms / iterations, simd_ms / iterations, scalar_ms / simd_ms};
}
```
Checkpoint: SIMD version matches scalar output within floating-point tolerance, achieves 5x+ speedup.
Phase 4: Processing Chain & CLI (Days 11-13)
Goals:
- Implement effect chaining
- Build CLI interface
- Add limiter
- Add performance reporting
Tasks:
- Processing chain:
```cpp
class ProcessingChain {
    std::vector<std::unique_ptr<Effect>> effects;

public:
    template <typename T, typename... Args>
    void add(Args&&... args) {
        effects.push_back(std::make_unique<T>(std::forward<Args>(args)...));
    }

    void process(AudioBuffer& buffer) {
        for (auto& effect : effects) {
            effect->process(buffer);
        }
    }

    void reset() {
        for (auto& effect : effects) {
            effect->reset();
        }
    }
};
```
- Limiter:
```cpp
void Limiter::process(AudioBuffer& buffer) {
    const simd_f ceiling_pos = ceiling_linear;
    const simd_f ceiling_neg = -ceiling_linear;
    const size_t n = buffer.num_samples;
    size_t i = 0;
    for (; i + simd_f::size() <= n; i += simd_f::size()) {
        simd_f left(&buffer.left[i], stdx::element_aligned);
        simd_f right(&buffer.right[i], stdx::element_aligned);
        // Clamp to ceiling
        left = stdx::clamp(left, ceiling_neg, ceiling_pos);
        right = stdx::clamp(right, ceiling_neg, ceiling_pos);
        left.copy_to(&buffer.left[i], stdx::element_aligned);
        right.copy_to(&buffer.right[i], stdx::element_aligned);
    }
    // Scalar tail for the remaining samples
    for (; i < n; ++i) {
        buffer.left[i] = std::clamp(buffer.left[i], -ceiling_linear, ceiling_linear);
        buffer.right[i] = std::clamp(buffer.right[i], -ceiling_linear, ceiling_linear);
    }
}
```
- CLI main:
```cpp
int main(int argc, char* argv[]) {
    // Parse arguments
    std::string input_file, output_file;
    bool realtime_mode = false;
    // ... argparse ...

    // Load audio
    AudioBuffer buffer;
    if (!loadWav(input_file, buffer)) {
        std::cerr << "Failed to load " << input_file << "\n";
        return 1;
    }

    // Build processing chain
    ProcessingChain chain;
    chain.add<LowShelfEQ>(100.0f, 3.0f, buffer.sample_rate);
    chain.add<ParametricEQ>(3000.0f, -2.0f, 1.5f, buffer.sample_rate);
    chain.add<Compressor>(-20.0f, 4.0f, 10.0f, 100.0f, buffer.sample_rate);
    chain.add<Limiter>(-0.3f);

    // Process with timing
    auto start = std::chrono::steady_clock::now();
    chain.process(buffer);
    auto elapsed = std::chrono::steady_clock::now() - start;

    // Report performance (floating-point math: integer division would truncate)
    const double ms = std::chrono::duration<double, std::milli>(elapsed).count();
    const double audio_ms = 1000.0 * buffer.num_samples / buffer.sample_rate;
    const double realtime_ratio = audio_ms / ms;
    std::cout << "Processed in " << ms << " ms\n";
    std::cout << "Real-time ratio: " << realtime_ratio << "x\n";

    // Save output
    saveWav(output_file, buffer);
    return 0;
}
```
Checkpoint: Full pipeline works end-to-end, processes audio correctly.
Phase 5: Testing & Polish (Day 14)
Goals:
- Comprehensive testing
- VU meter display
- Documentation
- Handle edge cases
Tasks:
- Test suite:
- Unit tests for each effect
- Integration tests for full chain
- Compare SIMD vs scalar output
- Test with various WAV formats
- VU meter:
```cpp
struct VUMeter {
    float peak_left = 0, peak_right = 0;
    float decay = 0.99f;

    void update(const AudioBuffer& buffer) {
        peak_left *= decay;
        peak_right *= decay;
        for (size_t i = 0; i < buffer.num_samples; ++i) {
            peak_left = std::max(peak_left, std::abs(buffer.left[i]));
            peak_right = std::max(peak_right, std::abs(buffer.right[i]));
        }
    }

    void display() const {
        auto bar = [](float level) {
            // Clamp so peaks above full scale cannot make the padding negative
            const int blocks = std::clamp(static_cast<int>(level * 10), 0, 10);
            return std::string(blocks, '#') + std::string(10 - blocks, ' ');
        };
        std::cout << "[L: " << bar(peak_left) << "] "
                  << "[R: " << bar(peak_right) << "]\r" << std::flush;
    }
};
```
Testing Strategy
Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Verify individual components | Biquad coefficients correct, compressor gain reduction correct |
| Integration Tests | Verify full processing chain | Load WAV, process, save, verify output |
| Performance Tests | Verify speed requirements | SIMD >= 5x scalar, < 0.1ms per buffer |
| Audio Quality Tests | Verify output sounds correct | Visual inspection of waveforms, listening tests |
Critical Test Cases
- Biquad Coefficient Test:
```cpp
#include <cassert>
#include <cmath>

void testBiquadCoefficients() {
    // Low-pass at fs/4 (half of Nyquist) with Q = 0.707 (Butterworth)
    auto c = BiquadCoeffs::lowPass(11025.0f, 0.707f, 44100.0f);

    // Verify at DC (z = 1): gain should be 0 dB
    float dc_gain = (c.b0 + c.b1 + c.b2) / (1 + c.a1 + c.a2);
    assert(std::abs(dc_gain - 1.0f) < 0.001f);

    // Verify at Nyquist (z = -1): gain should be very low (< -60 dB, i.e. < 0.001)
    float nyquist_gain = (c.b0 - c.b1 + c.b2) / (1 - c.a1 + c.a2);
    assert(std::abs(nyquist_gain) < 0.001f);
}
```
- Impulse Response Test:
```cpp
void testBiquadImpulse() {
    StereoBiquad filter;
    filter.setCoeffs(BiquadCoeffs::lowPass(1000.0f, 0.707f, 44100.0f));

    AudioBuffer impulse(1024, 44100);
    impulse.left[0] = 1.0f;  // Unit impulse; all other samples stay at 0

    filter.process(impulse);

    // Verify impulse response decays to near-zero
    assert(std::abs(impulse.left[1023]) < 0.0001f);
}
```
- Compressor Threshold Test:
```cpp
void testCompressorThreshold() {
    Compressor comp(-20.0f, 4.0f, 0.0f, 0.0f, 44100.0f);  // Instant attack/release

    AudioBuffer buffer(1024, 44100);
    // Fill with signal at -10 dB (above the -20 dB threshold)
    float level = std::pow(10.0f, -10.0f / 20.0f);  // ~0.316
    std::fill(buffer.left.begin(), buffer.left.end(), level);

    comp.process(buffer);

    // Output should be at -20 + (-10 - -20) / 4 = -17.5 dB
    float expected = std::pow(10.0f, -17.5f / 20.0f);
    assert(std::abs(buffer.left[512] - expected) < 0.01f);
}
```
- SIMD Correctness Test:
```cpp
#include <random>

void testSIMDMatchesScalar() {
    AudioBuffer buffer1(4096, 44100), buffer2(4096, 44100);

    // Fill both buffers with the same random data
    std::mt19937 gen(12345);  // Fixed seed keeps failures reproducible
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    for (size_t i = 0; i < 4096; ++i) {
        float sample = dist(gen);
        buffer1.left[i] = buffer2.left[i] = sample;
    }

    // Process with scalar
    CompressorScalar scalar_comp(-20, 4, 10, 100, 44100);
    scalar_comp.process(buffer1);

    // Process with SIMD
    CompressorSIMD simd_comp(-20, 4, 10, 100, 44100);
    simd_comp.process(buffer2);

    // Compare sample by sample, within floating-point tolerance
    for (size_t i = 0; i < 4096; ++i) {
        assert(std::abs(buffer1.left[i] - buffer2.left[i]) < 1e-5f);
    }
}
```
- Real-Time Safety Test:
```cpp
void testNoAllocations() {
    // Set up allocator tracking
    AllocationCounter counter;

    ProcessingChain chain;
    chain.add<LowShelfEQ>(100, 3, 44100);
    chain.add<Compressor>(-20, 4, 10, 100, 44100);

    AudioBuffer buffer(512, 44100);

    // Count only allocations made while processing, not during setup
    counter.reset();
    for (int i = 0; i < 1000; ++i) {
        chain.process(buffer);
    }
    assert(counter.allocations() == 0);
}
```
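The Real-Time Safety Test relies on an AllocationCounter helper that is not defined in this section. One possible way to build it, sketched here as an assumption rather than the project's actual implementation, is to replace the global operator new and count calls. The namespace alloc_tracking and the counter variable are hypothetical names. The sketch only covers the plain (non-aligned) operator new, so over-aligned allocations would need the std::align_val_t overloads as well.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical bookkeeping: total number of calls to the global operator new.
namespace alloc_tracking {
static std::atomic<std::size_t> g_allocations{0};
}

// Replacement for the global allocation function: count, then defer to malloc.
void* operator new(std::size_t size) {
    alloc_tracking::g_allocations.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size == 0 ? 1 : size)) {
        return p;
    }
    throw std::bad_alloc{};
}

// Matching deallocation functions (plain and sized forms).
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Reports how many allocations happened since construction or the last reset().
class AllocationCounter {
public:
    AllocationCounter() { reset(); }

    void reset() { baseline_ = alloc_tracking::g_allocations.load(); }

    std::size_t allocations() const {
        return alloc_tracking::g_allocations.load() - baseline_;
    }

private:
    std::size_t baseline_ = 0;
};
```

Because operator new[] forwards to operator new by default, array allocations are counted too. Calls to malloc made directly by C libraries are not seen by this counter.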
Test Data
```bash
# Generate test signals using sox or similar

# 1. White noise (tests all frequencies equally)
sox -n -r 44100 -c 2 tests/white_noise.wav synth 5 whitenoise

# 2. Sine sweep (visualize frequency response)
sox -n -r 44100 -c 2 tests/sine_sweep.wav synth 5 sine 20-20000

# 3. Impulse (measure impulse response): a single non-zero sample followed by
#    1 s of silence; check the result in a waveform viewer before relying on it
sox -n -r 44100 -c 1 tests/impulse.wav synth 1s square pad 0 1

# 4. Music sample (subjective listening test)
# Use any royalty-free music file
```
Common Pitfalls & Debugging
Frequent Mistakes
| Pitfall | Symptom | Root Cause | Fix |
|---|---|---|---|
| Allocation in audio callback | Glitches under load | std::vector::push_back or new | Pre-allocate all buffers |
| Denormalized floats | Performance cliff | Very small values slow down FPU | Use FTZ (Flush To Zero) mode |
| Uninitialized filter state | Click at start | x1, x2, y1, y2 are garbage | Initialize all state to 0 |
| Wrong sample rate | Filter sounds wrong | Coefficients calculated for wrong fs | Pass sample rate to coefficient functions |
| Buffer alignment | SIMD crash | Aligned loads/stores on unaligned memory | Use alignas(64) on buffers, or use unaligned load/store instructions |
| Interleaved assumed planar | Stereo swap / noise | Format mismatch | Convert format explicitly |
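For the last row of the table, "convert format explicitly" can be as small as a pair of copy loops. The following is a minimal sketch for stereo only; the function names deinterleaveStereo and interleaveStereo are hypothetical, and the caller is assumed to pre-allocate every buffer so that nothing allocates inside the audio callback.

```cpp
#include <cassert>
#include <cstddef>

// Split interleaved stereo (L R L R ...) into two planar channel arrays.
// `interleaved` holds 2 * frames samples; `left` and `right` hold `frames` each.
void deinterleaveStereo(const float* interleaved, float* left, float* right,
                        std::size_t frames) {
    for (std::size_t i = 0; i < frames; ++i) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}

// Merge two planar channel arrays back into interleaved stereo.
void interleaveStereo(const float* left, const float* right, float* interleaved,
                      std::size_t frames) {
    for (std::size_t i = 0; i < frames; ++i) {
        interleaved[2 * i]     = left[i];
        interleaved[2 * i + 1] = right[i];
    }
}
```

A typical use is to deinterleave once at the API boundary, run every effect on the planar channels, and interleave once on the way out, so the conversion cost is paid per buffer rather than per effect.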
Debugging Strategies
- Visualize waveforms: Use Audacity or Python matplotlib to view audio.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

rate, data = wavfile.read('output.wav')
plt.plot(data[:4410])  # First 100 ms at 44.1 kHz
plt.show()
```
- Test with simple signals:
  - DC offset: Should pass through unchanged by low-pass, peaking, and high-shelf filters (a low shelf scales it by the shelf gain)
  - Sine at filter frequency: Should be affected predictably
  - Impulse: Reveals filter response directly
- Compare with reference implementation:
  - JUCE's built-in filters
  - Python's scipy.signal
  - Online filter calculators (for coefficients)
- Enable denormal flushing:
```cpp
#include <immintrin.h>

// The FTZ/DAZ flags live in the per-thread MXCSR register,
// so call this on the audio thread itself.
void setupFPU() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}
```
- Profile with proper tools:
```bash
# Linux perf
perf stat -e cycles,instructions,cache-misses ./audio_dsp --input test.wav --output out.wav

# VTune (Intel)
vtune -collect hotspots ./audio_dsp …

# macOS Instruments
instruments -t "Time Profiler" ./audio_dsp …
```
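Alongside these strategies, it can help to time each buffer against its deadline. The sketch below assumes the deadline is simply buffer_size / sample_rate and defines the real-time ratio as that budget divided by the measured processing time; BufferTiming, bufferBudgetMs, and timeBuffer are hypothetical names, not part of the project's API.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <limits>

// Timing result for one processed buffer.
struct BufferTiming {
    double budget_ms;       // Audio duration of the buffer (the deadline)
    double elapsed_ms;      // Measured processing time
    double realtime_ratio;  // budget_ms / elapsed_ms (higher is better)
};

// Time available to process `frames` samples at `sample_rate` Hz.
double bufferBudgetMs(std::size_t frames, double sample_rate) {
    return 1000.0 * static_cast<double>(frames) / sample_rate;
}

// Runs `process()` once and compares its duration with the buffer's deadline.
template <typename Fn>
BufferTiming timeBuffer(std::size_t frames, double sample_rate, Fn&& process) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    process();
    const auto stop = clock::now();

    const double elapsed_ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    const double budget_ms = bufferBudgetMs(frames, sample_rate);
    // If the work finished below the clock's resolution, report an infinite ratio.
    const double ratio = elapsed_ms > 0.0
                             ? budget_ms / elapsed_ms
                             : std::numeric_limits<double>::infinity();
    return {budget_ms, elapsed_ms, ratio};
}
```

In a live callback, the results should be stored in a pre-allocated array and printed later from another thread, since logging inside the callback would itself break the real-time rules.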
Performance Traps
- Not using -march=native: Without it (or an explicit flag such as -mavx2), the compiler targets the baseline instruction set and won’t emit AVX/AVX2
- Debug builds: Can be up to 100x slower than release; always profile release builds
- Unrolling too aggressively: Can hurt instruction-cache performance
- Function call overhead: Inline hot paths; use inline or LTO
- Mixing scalar and SIMD: Transition costs; process entire buffers in SIMD mode
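As an illustration of keeping a hot loop in SIMD and confining scalar work to a short tail, here is a minimal gain loop. It uses SSE intrinsics (4 floats per register) rather than std::simd or AVX, purely so that the sketch compiles without extra flags on x86-64; the function name applyGain and the GAIN_USE_SSE macro are assumptions for this example, and the code falls back to a plain scalar loop on other targets.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define GAIN_USE_SSE 1
#endif

// Multiplies `count` samples in place by `gain`.
void applyGain(float* samples, std::size_t count, float gain) {
    std::size_t i = 0;
#ifdef GAIN_USE_SSE
    const __m128 g = _mm_set1_ps(gain);  // Broadcast gain to all 4 lanes
    for (; i + 4 <= count; i += 4) {
        // Unaligned load/store: works for any buffer address
        const __m128 v = _mm_loadu_ps(samples + i);
        _mm_storeu_ps(samples + i, _mm_mul_ps(v, g));
    }
#endif
    // Scalar tail: at most 3 samples when the SSE path is active
    for (; i < count; ++i) {
        samples[i] *= gain;
    }
}
```

With buffer sizes that are multiples of the vector width (512 is), the tail loop never runs, which is one reason power-of-two buffer sizes are convenient.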
Extensions & Challenges
Beginner Extensions
- Add more filter types: High-pass, band-pass, notch, all-pass
- Fade in/out: Apply smooth volume envelope at start/end
- Mono to stereo: Duplicate mono channel to both outputs
- Normalize: Scale audio so peak reaches 0dB
- DC offset removal: High-pass at very low frequency (5Hz)
Intermediate Extensions
- Multi-band compressor: Split into low/mid/high, compress separately
- Look-ahead limiter: Delay signal to anticipate peaks
- Convolution reverb: Apply room impulse response (FIR with FFT)
- Oversampling: Process at 2x or 4x sample rate for better filters
- Parameter smoothing: Avoid clicks when changing parameters
- Sidechain compression: Compress based on different input signal
Advanced Extensions
- FFT spectral processing: Implement spectral EQ, noise reduction
- Real-time audio I/O: Use PortAudio or JUCE for live processing
- VST/AU plugin: Wrap as a plugin for DAWs
- GPU acceleration: Use CUDA/OpenCL for convolution
- SIMD convolution: Implement overlap-add with AVX
- Latency compensation: Report and compensate for processing delay
- State variable filter: Implement with better numerical behavior
Research Extensions
- Model-based compression: Emulate analog compressor behavior
- Neural audio effects: Use ML for effect modeling
- Spatial audio: Implement HRTF and binaural processing
- Adaptive filtering: Echo cancellation with LMS algorithm
Resources
Essential Reading
- “DAFX: Digital Audio Effects” (Zölzer) - Comprehensive DSP reference
- “The Audio EQ Cookbook” (Robert Bristow-Johnson) - Biquad coefficient formulas
- “Designing Audio Effect Plugins in C++” (Will Pirkle) - Practical plugin development
- “Real-Time Audio Signal Processing” (Boulanger) - Fundamentals of real-time audio
Online Resources
- The Audio Programmer (YouTube) - JUCE tutorials and audio DSP
- Bela Blog - Real-time audio programming articles
- musicdsp.org - Algorithm library and code snippets
- Audio EQ Cookbook - Filter coefficient formulas
- KVR Audio DSP Forum - Developer discussions
Documentation
- JUCE Documentation - Industry-standard audio framework
- PortAudio Documentation - Cross-platform audio I/O
- libsndfile Documentation - Audio file reading/writing
- std::experimental::simd - C++ SIMD types
Related Projects
- JUCE - Full-featured audio framework
- libsndfile - Audio file I/O library
- RTAudio - Real-time audio I/O
- fftw - Fast FFT library
- KissFFT - Simple FFT library
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Digital filter design | DAFX: Digital Audio Effects | Ch. 2 |
| Biquad implementation | Designing Audio Effect Plugins in C++ | Ch. 6-8 |
| Dynamics processing | DAFX: Digital Audio Effects | Ch. 4 |
| SIMD programming | Intel Intrinsics Guide | (online) |
| Real-time audio | Real-Time Audio Signal Processing | Ch. 1-3 |
| C++ performance | Efficient C++ (Bulka & Mayhew) | Ch. 8-10 |
Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain why audio has hard real-time constraints and calculate latency from buffer size
- I understand the difference between IIR and FIR filters and when to use each
- I can derive biquad coefficients for basic filter types using the EQ Cookbook
- I understand why IIR filters are difficult to parallelize with SIMD
- I can explain the tradeoffs of interleaved vs planar audio formats
- I know the rules for real-time safe code (no allocations, no locks, no I/O)
- I understand how compression works (threshold, ratio, attack, release)
- I can calculate SIMD speedup and explain why it may be less than lane count
Implementation
- WAV file loading and saving works correctly
- Gain control with SIMD achieves 7x+ speedup over scalar
- Biquad filter produces correct frequency response
- Compressor applies gain reduction correctly above threshold
- Limiter prevents clipping (all samples within ceiling)
- Processing chain runs all effects in sequence
- SIMD version produces identical output to scalar (within tolerance)
- No memory allocations occur during processing
- Processing time is under 0.1ms per 512-sample buffer
Performance
- SIMD implementation achieves >= 5x speedup on full chain
- Real-time ratio is >= 500x
- No glitches during extended processing
- Profile shows time spent in SIMD loops, not scalar tails
Quality
- Processed audio sounds correct (subjective listening test)
- Filter frequency response matches design
- No clicks or pops at buffer boundaries
- Handles mono and stereo files correctly
Submission / Completion Criteria
Minimum Viable Completion
- Load and save WAV files (at least 16-bit stereo @ 44.1kHz)
- Implement SIMD gain control with measurable speedup
- Implement at least one biquad filter type (e.g., low-pass)
- Implement basic compressor (threshold + ratio)
- Report processing time per buffer
- Runs without crashes on valid input
Full Completion
- All functional requirements F1-F9 implemented
- Performance meets all non-functional requirements (N1-N5)
- Multiple filter types: low-shelf, high-shelf, parametric
- Compressor with attack and release
- Limiter
- Processing chain with multiple effects
- Benchmark comparison showing SIMD vs scalar speedup
- VU meter display
- Clean error handling for invalid files
Excellence (Going Above & Beyond)
- Real-time audio I/O (F10)
- Multi-band compressor
- Look-ahead limiter
- Convolution reverb
- FFT-based spectral processing
- VST/AU plugin wrapper
- GPU acceleration for convolution
- Comprehensive test suite with CI integration
Deliverables:
- Source code with clear organization and comments
- CMakeLists.txt for building
- README with:
  - Build instructions
  - Usage examples
  - Performance results
  - Architecture overview
- Test files demonstrating each effect
- Performance benchmark results (scalar vs SIMD)
- Brief writeup explaining SIMD optimization strategy
The Interview Questions They’ll Ask
After completing this project, you’ll be ready for these questions:
- “Why is audio processing considered ‘hard real-time’?”
- Answer: Audio has strict deadlines (buffer period). Missing a deadline causes audible glitches. Unlike “soft” real-time (video), even occasional misses are unacceptable.
- “What makes SIMD difficult for IIR filters?”
- Answer: IIR filters have temporal dependencies (output depends on previous output). You can’t compute y[n+1] until you know y[n]. Solutions: process multiple channels in parallel, use FIR where possible, or restructure algorithms.
- “How would you prevent allocations in an audio callback?”
- Answer: Pre-allocate all buffers at initialization. Use fixed-size arrays or pre-sized vectors. Avoid std::string, std::vector::push_back, std::map::operator[], and exceptions that allocate. Consider a custom allocator that panics on allocation for testing.
- “Explain the difference between interleaved and planar audio.”
- Answer: Interleaved alternates channels (LRLRLR), common in APIs. Planar groups channels (LLLLRRRR), better for SIMD. SIMD processes consecutive memory efficiently; planar gives consecutive samples per channel.
- “How does a biquad filter work?”
- Answer: It’s a second-order IIR filter with 5 coefficients. The output is a weighted sum of current/past inputs and past outputs: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]. Different coefficient calculations create different filter types.
- “What’s the latency of your audio processor?”
- Answer: Latency = buffer_size / sample_rate. For 512 samples at 44.1kHz, that’s 11.6ms. Plus any processing delay (look-ahead in limiter, group delay in filters). I can reduce latency by using smaller buffers at the cost of higher CPU usage per sample.
- “How would you debug a glitch in real-time audio?”
- Answer: Check for allocations (custom allocator that asserts), check for locks, profile for CPU spikes, log timing of each buffer, visualize audio for anomalies, reduce buffer size to make problem more frequent, add diagnostics that don’t affect real-time performance.
- “What SIMD speedup did you achieve and why wasn’t it 8x (for AVX)?”
- Answer: Achieved 5-7x speedup. Less than 8x due to: memory bandwidth limits, scalar tail handling, data dependencies in filters, function call overhead, cache effects. IIR filters particularly limited by sequential nature.
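When answering the biquad question, it can help to have the difference equation in code form. The sketch below is a scalar Direct Form I implementation of the equation quoted above, with low-pass coefficients taken from the Audio EQ Cookbook formulas and normalized by a0. The names BiquadSketch and makeLowPass are hypothetical and deliberately distinct from the project's own BiquadCoeffs and StereoBiquad types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// One channel of a biquad in Direct Form I.
struct BiquadSketch {
    // Coefficients, already divided by a0 (defaults give a pass-through filter)
    float b0 = 1.0f, b1 = 0.0f, b2 = 0.0f, a1 = 0.0f, a2 = 0.0f;
    // State: previous two inputs and outputs, zero-initialized to avoid clicks
    float x1 = 0.0f, x2 = 0.0f, y1 = 0.0f, y2 = 0.0f;

    // y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    float processSample(float x) {
        const float y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1;
        x1 = x;
        y2 = y1;
        y1 = y;
        return y;
    }
};

// Low-pass coefficients following the Audio EQ Cookbook.
BiquadSketch makeLowPass(float cutoff_hz, float q, float sample_rate) {
    const float pi = 3.14159265358979f;
    const float w0 = 2.0f * pi * cutoff_hz / sample_rate;
    const float cos_w0 = std::cos(w0);
    const float alpha = std::sin(w0) / (2.0f * q);
    const float a0 = 1.0f + alpha;

    BiquadSketch f;
    f.b0 = (1.0f - cos_w0) * 0.5f / a0;
    f.b1 = (1.0f - cos_w0) / a0;
    f.b2 = f.b0;
    f.a1 = -2.0f * cos_w0 / a0;
    f.a2 = (1.0f - alpha) / a0;
    return f;
}
```

The dependency of each output on y1 and y2 is visible directly in processSample, which is why the usual SIMD strategy is to run several independent channels or filters in the lanes rather than several consecutive samples.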
This project demonstrates mastery of both SIMD programming and real-time systems constraints. Audio processing is one of the most demanding SIMD applications because of the hard latency requirements. The skills you develop here apply directly to game audio, music production software, telecommunications, and any performance-critical signal processing application.
Related Projects:
- Previous: P12: SIMD Math Library - Foundation for this project
- Next: P14: Real-Time Game Physics - Combines SIMD with parallel algorithms
For the complete learning path, see the project index.