Project 13: Audio DSP with SIMD

Build real-time audio effects (EQ, reverb, compression) using SIMD to process multiple samples simultaneously, meeting the strict latency requirements of audio processing.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	2-3 weeks
Language	C++
Prerequisites	Project 12 (SIMD Math Library), basic audio concepts (samples, sample rate), familiarity with filters helpful
Key Topics	SIMD (std::simd), real-time audio, IIR/FIR filters, buffer management, biquad filters, dynamics processing
Coolness Level	Level 5: Pure Magic (Super Cool)
Business Potential	Micro-SaaS / Pro Tool

Learning Objectives

After completing this project, you will:

Master real-time audio constraints: Understand why audio processing has hard deadlines and how to meet them consistently
Implement SIMD audio processing: Apply vectorization techniques to audio buffers, achieving 5-8x speedups
Build classic audio effects: Create EQ, compression, and limiting effects using efficient algorithms
Understand filter design: Implement biquad filters and understand IIR vs FIR tradeoffs
Handle buffer formats: Convert between interleaved and planar audio formats, understanding when each is optimal
Apply real-time programming rules: Avoid allocations, locks, and I/O in audio callbacks
Profile and optimize audio code: Measure latency, ensure glitch-free playback, and verify SIMD effectiveness

Theoretical Foundation

Core Concepts

Sample Rates and Buffers

Digital audio represents sound as a sequence of samples taken at regular intervals. The sample rate determines how many samples per second are captured:

Sample Rates and Their Uses:

 44.1 kHz ─────┐ CD quality, most music
              │
 48 kHz ──────┤ Professional video, DAWs
              │
 96 kHz ──────┤ High-resolution audio
              │
 192 kHz ─────┘ Archival, studio masters

Time per sample at 44.1 kHz:
  1 / 44100 = 22.7 microseconds

Buffer of 512 samples:
  512 / 44100 = 11.6 milliseconds of audio

Audio is processed in buffers (also called blocks or frames). Common buffer sizes:

Buffer Size	Latency @ 44.1kHz	Use Case
64 samples	1.5 ms	Live performance, very low latency
128 samples	2.9 ms	Professional DAWs
256 samples	5.8 ms	Standard production
512 samples	11.6 ms	General purpose
1024 samples	23.2 ms	High-latency, CPU-limited systems

The buffer size creates a fundamental tradeoff: smaller buffers mean lower latency but require more CPU efficiency (the callback runs more frequently).

Interleaved vs Planar Audio Formats

Audio data can be organized two ways:

INTERLEAVED FORMAT (Common in APIs):
┌─────────────────────────────────────────────────┐
│ L0 R0 L1 R1 L2 R2 L3 R3 L4 R4 L5 R5 L6 R6 L7 R7 │
└─────────────────────────────────────────────────┘
         ↑
  Samples alternate between channels

PLANAR FORMAT (Better for SIMD):
┌─────────────────────────────────────────────────┐
│ L0 L1 L2 L3 L4 L5 L6 L7 │ R0 R1 R2 R3 R4 R5 R6 R7 │
└─────────────────────────────────────────────────┘
         ↑                          ↑
  All left samples       All right samples
  contiguous             contiguous

Why planar is better for SIMD:

SIMD processes N samples simultaneously (e.g., 8 with AVX)
Planar format: Load 8 consecutive left samples, process, store
Interleaved format: Samples are scattered, requiring shuffles

#include <cstddef>
#include <experimental/simd>
namespace stdx = std::experimental;

// SIMD with PLANAR format - simple and fast
void apply_gain_planar(float* channel, size_t n, float gain) {
    using simd_t = stdx::native_simd<float>;
    simd_t gain_vec = gain;  // Broadcast scalar to all lanes

    size_t i = 0;
    for (; i + simd_t::size() <= n; i += simd_t::size()) {
        simd_t s(&channel[i], stdx::element_aligned);
        s *= gain_vec;
        s.copy_to(&channel[i], stdx::element_aligned);
    }
    // Scalar tail: the last n % simd_t::size() samples
    for (; i < n; ++i) {
        channel[i] *= gain;
    }
}

// SIMD with INTERLEAVED format - requires shuffling
// Much more complex and often not worth it for stereo
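Because most audio APIs deliver interleaved data, a common approach is to convert to planar once at the boundary, process, and convert back. The following is a minimal scalar sketch of that conversion for stereo; the function names are assumptions for illustration, and a production version might use SIMD shuffles instead.

```cpp
#include <cstddef>

// Split interleaved stereo (L0 R0 L1 R1 ...) into two planar channels.
// `frames` counts stereo frames, so `interleaved` holds 2 * frames floats.
void deinterleave_stereo(const float* interleaved, std::size_t frames,
                         float* left, float* right) {
    for (std::size_t i = 0; i < frames; ++i) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}

// Merge two planar channels back into interleaved stereo.
void interleave_stereo(const float* left, const float* right,
                       std::size_t frames, float* interleaved) {
    for (std::size_t i = 0; i < frames; ++i) {
        interleaved[2 * i]     = left[i];
        interleaved[2 * i + 1] = right[i];
    }
}
```

Both loops touch each sample once, so the conversion cost is small compared with a chain of effects running on the planar data.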

Digital Filters: IIR and FIR

Filters are the foundation of audio effects. Two fundamental types:

FIR (Finite Impulse Response):

y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] + ... + bM*x[n-M]

Output depends only on input samples (no feedback)
- Always stable
- Linear phase possible
- Requires more coefficients for sharp cutoffs
- Easily parallelized with SIMD

IIR (Infinite Impulse Response):

y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]

Output depends on previous outputs (feedback)
- Can be unstable if poorly designed
- Non-linear phase
- Very efficient (few coefficients)
- The biquad is the most common form

The Biquad Filter:

The second-order IIR filter (biquad) is the building block of audio EQ:

BIQUAD DIFFERENCE EQUATION:

y[n] = (b0/a0)*x[n] + (b1/a0)*x[n-1] + (b2/a0)*x[n-2]
                    - (a1/a0)*y[n-1] - (a2/a0)*y[n-2]

Typically normalized so a0 = 1:

y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]


BIQUAD SIGNAL FLOW:

    x[n] ───┬───► [b0] ───┬───────────────────────────► y[n]
            │             │                              │
            ▼             │                              │
         [z⁻¹]            │                              │
            │             │                              │
    x[n-1] ─┴───► [b1] ──►(+)◄── [-a1] ◄───── [z⁻¹] ◄───┘
            │             │                      │
            ▼             │                      │
         [z⁻¹]            │                      │
            │             │                      │
    x[n-2] ─┴───► [b2] ──►(+)◄── [-a2] ◄───── [z⁻¹]

Different coefficient calculations create different filter types:

Low-pass: Attenuates high frequencies
High-pass: Attenuates low frequencies
Band-pass: Passes a range of frequencies
Notch: Removes a specific frequency
Peak/EQ: Boosts or cuts around a center frequency
Low-shelf / High-shelf: Boosts or cuts below/above a frequency

SIMD Challenge with IIR Filters

IIR filters have a fundamental problem for SIMD: temporal dependencies.

y[0] = b0*x[0] + b1*x[-1] + b2*x[-2] - a1*y[-1] - a2*y[-2]
y[1] = b0*x[1] + b1*x[0]  + b2*x[-1] - a1*y[0]  - a2*y[-1]
                                           ↑
                                 Depends on y[0]!
                                 Can't compute in parallel

Solutions:

Process channels in parallel: 8 mono channels = 8 SIMD lanes

// Process 8 different audio channels simultaneously
// Each channel has its own filter state
struct MultiChannelFilter {
    std::array<BiquadState, 8> states;  // One per SIMD lane

    void process(float* channels[8], size_t n) {
        for (size_t i = 0; i < n; ++i) {
            simd_t x = gather_sample(channels, i);
            simd_t y = biquad_step(x);  // All 8 channels at once
            scatter_sample(channels, i, y);
        }
    }
};

Transposed Direct Form II: Reduces state to two values per channel (advanced)

y[n]  = b0*x[n] + s1[n-1]
s1[n] = b1*x[n] - a1*y[n] + s2[n-1]
s2[n] = b2*x[n] - a2*y[n]

Needs only two state variables (s1, s2) instead of four; the feedback through y[n] is still sequential

Block processing for FIR portions: Batch the feedforward part

// Compute all x-term contributions in parallel
// Then sequentially add y-term contributions

Dynamics Processing: Compression

A compressor reduces the dynamic range of audio by attenuating signals above a threshold:

COMPRESSION TRANSFER CURVE:

Output dB
    ▲
    │                    ╱ Threshold
    │                ╱
    │            ╱
    │        ╱        ← Ratio 4:1
    │    ╱                (4dB input → 1dB output above threshold)
    │╱
    └───────────────────────────► Input dB
         Threshold

Key Parameters:
- Threshold: Level where compression begins
- Ratio: How much to reduce (4:1 means 4dB in → 1dB out above threshold)
- Attack: How quickly compression engages
- Release: How quickly compression releases
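The threshold and ratio parameters define a static curve in decibels. A minimal sketch of that curve follows, with hypothetical helper names; attack and release are left out because they govern how fast the gain moves, not where it ends up.

```cpp
#include <cmath>

// Convert between decibels and linear amplitude (0 dB == 1.0).
inline float db_to_linear(float db)  { return std::pow(10.0f, db / 20.0f); }
inline float linear_to_db(float lin) { return 20.0f * std::log10(lin); }

// Static compression curve: levels at or below the threshold pass unchanged;
// above it, every `ratio` dB of input yields 1 dB of output.
inline float compressed_level_db(float input_db, float threshold_db, float ratio) {
    if (input_db <= threshold_db) return input_db;
    return threshold_db + (input_db - threshold_db) / ratio;
}

// Gain in dB (never positive) that moves the input level onto the curve.
inline float gain_reduction_db(float input_db, float threshold_db, float ratio) {
    return compressed_level_db(input_db, threshold_db, ratio) - input_db;
}
```

For example, with a threshold of -20 dB and a 4:1 ratio, an input at -12 dB is 8 dB over the threshold, so it comes out 2 dB over, at -18 dB.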

SIMD Compression Implementation:

#include <cmath>

void compress(float* samples, size_t n, float threshold, float ratio) {
    using simd_t = stdx::native_simd<float>;
    simd_t thresh_vec = threshold;
    simd_t ratio_vec = ratio;

    size_t i = 0;
    for (; i + simd_t::size() <= n; i += simd_t::size()) {
        simd_t s(&samples[i], stdx::element_aligned);
        simd_t abs_s = stdx::abs(s);

        // Where abs > threshold, apply compression
        auto mask = abs_s > thresh_vec;
        simd_t compressed = thresh_vec + (abs_s - thresh_vec) / ratio_vec;

        // Masked assignment: only lanes over the threshold are replaced,
        // keeping the original sign with the compressed magnitude
        stdx::where(mask, s) = stdx::copysign(compressed, s);

        s.copy_to(&samples[i], stdx::element_aligned);
    }
    // Scalar tail for the last n % simd_t::size() samples
    for (; i < n; ++i) {
        float a = std::abs(samples[i]);
        if (a > threshold)
            samples[i] = std::copysign(threshold + (a - threshold) / ratio, samples[i]);
    }
}

The key insight: stdx::where() selects lanes with a mask, so conditional processing needs no per-sample branches. Branching on each sample would defeat SIMD.

Why This Matters: Real-Time Constraints

Audio processing has hard real-time requirements. Miss a deadline, and users hear glitches:

AUDIO BUFFER TIMELINE:

│◄──────── Buffer Period (11.6ms @ 512/44.1k) ────────────────────►│
│                                                                   │
│  Audio     │ Buffer N          │ Buffer N+1        │ Buffer N+2  │
│  Output:   │ Playing           │ Playing           │ Playing     │
│            │                   │                   │             │
│  Your      │      │◄─ Must ─►│                   │             │
│  Code:     │      │  finish  │                   │             │
│            │      │  here    │                   │             │
│            │                   │                   │             │
│            Buffer N+1          Buffer N+2          Buffer N+3    │
│            Processing          Processing          Processing    │
│                                                                   │

If you don't finish processing Buffer N+1 before Buffer N finishes
playing, you get an underrun (glitch/crackle/silence).

The Real-Time Rules:

No memory allocations in the audio callback
- new, malloc, std::vector::push_back can block
- Pre-allocate everything before audio starts
No locks (mutexes) in the audio callback
- Locks can block waiting for other threads
- Use lock-free queues for communication
No I/O or system calls
- File I/O, network, logging can block indefinitely
- Buffer log messages and write from another thread
Pre-allocate everything
- Filter coefficients computed before processing
- Delay lines sized at initialization
- Scratch buffers allocated once

// BAD: Allocation in audio callback
void processBlock(float* buffer, size_t n) {
    std::vector<float> temp(n);  // ALLOCATION! Can block!
    // ...
}

// GOOD: Pre-allocated
class AudioProcessor {
    std::vector<float> temp_buffer;  // Allocated in constructor

public:
    explicit AudioProcessor(size_t max_buffer_size)
        : temp_buffer(max_buffer_size) {}

    void processBlock(float* buffer, size_t n) {
        // Use temp_buffer (n <= max_buffer_size), no allocation
    }
};
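For a single parameter, the "no locks" rule can be met with an atomic instead of a full lock-free queue. The class below is a hypothetical sketch: a control thread publishes a gain value and the audio callback reads it once per block. Larger messages, such as a full set of filter coefficients, would still need a queue or a similar handoff.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical gain stage whose parameter is written by a control thread
// and read by the audio callback without taking a lock.
class AtomicGain {
public:
    // Control thread: publish a new gain value.
    void set_gain(float g) noexcept {
        gain_.store(g, std::memory_order_relaxed);
    }

    // Audio thread: read the parameter once per block, then process.
    void process_block(float* buffer, std::size_t n) noexcept {
        const float g = gain_.load(std::memory_order_relaxed);
        for (std::size_t i = 0; i < n; ++i) buffer[i] *= g;
    }

    // True when std::atomic<float> needs no internal lock on this platform.
    static constexpr bool is_lock_free() noexcept {
        return std::atomic<float>::is_always_lock_free;
    }

private:
    std::atomic<float> gain_{1.0f};
};
```

Reading the value once per block keeps the gain constant within a buffer; checking `is_lock_free()` at startup guards against platforms where the atomic would fall back to a lock.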

Historical Context

Digital Audio Timeline:

1957: Max Mathews at Bell Labs creates MUSIC I, first computer music program
1965: Cooley-Tukey FFT algorithm enables efficient spectral processing
1979: Sony/Philips develop CD format (44.1 kHz, 16-bit)
1983: MIDI standard enables synthesizer control
1996: MP3 popularizes digital music distribution
1999: Pro Tools 5.0 with DSP accelerator cards
2000s: CPU speed enables real-time plugin processing
2010s: SIMD instructions (AVX, AVX-512) enable massive parallelism
Today: Software plugins rival hardware, real-time ML audio processing

Why SIMD transformed audio:

Before SIMD: Each sample processed individually
With AVX: 8 samples processed in one instruction
Processing time: 0.15ms (scalar) vs 0.02ms (SIMD) per buffer
Enables: More plugins, lower latency, higher sample rates

Common Misconceptions

Misconception 1: “Higher sample rates always sound better” Reality: Human hearing tops out at ~20kHz. Sampling at 44.1kHz can represent frequencies up to 22.05kHz (the Nyquist limit), which covers that range. Higher rates matter for processing headroom and easier filter design, not “hearing” more.

Misconception 2: “SIMD automatically makes audio code faster” Reality: SIMD only helps when:

You have enough parallel data (small buffers may not benefit)
Data is properly aligned and contiguous
Algorithm maps to SIMD operations (filters with feedback are hard)

Misconception 3: “Floating-point is always better than integer for audio” Reality: 32-bit float is standard in plugins, but:

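24-bit fixed-point has comparable precision: a 32-bit float carries a 24-bit significand (about 144dB), and its exponent adds headroom rather than resolution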
Integer can be faster (especially on embedded)
Modern converters are 24-bit; 32-bit float is computational convenience

Misconception 4: “Real-time means fast” Reality: Real-time means predictable. A function that takes exactly 10ms every time is “real-time safe.” A function that takes 1ms 99.9% of the time but 100ms occasionally is NOT.

Project Specification

What You Will Build

A command-line audio effects processor that:

Reads WAV files and processes them through a chain of effects
Implements classic audio effects: EQ, compressor, limiter
Uses SIMD for all sample processing
Optionally supports real-time audio passthrough (stretch goal)
Reports performance metrics comparing scalar vs SIMD implementations

Functional Requirements

ID	Requirement	Priority
F1	Load and save WAV files (16-bit and 24-bit, mono and stereo)	Must
F2	Apply gain with SIMD (volume control)	Must
F3	Implement parametric EQ with biquad filters	Must
F4	Implement dynamics compressor with threshold and ratio	Must
F5	Implement limiter (brick-wall)	Must
F6	Support effect chaining (multiple effects in series)	Must
F7	Report processing time per buffer	Must
F8	Compare SIMD vs scalar performance	Must
F9	Display VU meters (peak level per channel)	Should
F10	Real-time audio passthrough mode	Could

Non-Functional Requirements

ID	Requirement	Target
N1	Processing time per 512-sample buffer	< 0.1ms
N2	SIMD speedup vs scalar	>= 5x
N3	Real-time ratio	>= 500x (can process 500x faster than real-time)
N4	Latency in real-time mode	< 15ms
N5	No allocations during audio processing	0 allocations
N6	Support files up to 1 hour	~159 million samples per channel at 44.1 kHz

Performance Metrics

Your processor must meet these performance targets:

PERFORMANCE TARGETS:

Buffer Size: 512 samples
Sample Rate: 44.1 kHz
Real-time budget: 11.6 ms per buffer

Your target: < 0.1 ms per buffer
            = 116x safety margin
            = Can run 116 effect chains simultaneously

SIMD Speedup Requirements:
┌─────────────────────┬─────────────┬──────────────┐
│ Effect              │ Scalar Time │ SIMD Speedup │
├─────────────────────┼─────────────┼──────────────┤
│ Gain                │ 0.02 ms     │ >= 7x        │
│ Biquad EQ (stereo)  │ 0.08 ms     │ >= 2x        │
│ Compressor          │ 0.05 ms     │ >= 5x        │
│ Full chain          │ 0.15 ms     │ >= 5x        │
└─────────────────────┴─────────────┴──────────────┘

Real World Outcome

When complete, your audio processor should produce output like this:

$ ./audio_dsp --input music.wav --output processed.wav

Input: music.wav
  Duration: 3:45 (9,922,500 samples per channel)
  Sample rate: 44100 Hz
  Channels: 2 (stereo)

Processing chain:
  1. Low-shelf EQ (+3dB @ 100Hz)
  2. Parametric EQ (-2dB @ 3kHz, Q=1.5)
  3. Compressor (threshold=-20dB, ratio=4:1)
  4. Limiter (ceiling=-0.3dB)

Performance:
  Samples per SIMD op: 8 (AVX)
  Time per buffer (512 samples): 0.02ms
  Real-time ratio: 580x (could process 580 streams in real-time!)

Comparison:
  Scalar processing: 0.15ms per buffer
  SIMD processing: 0.02ms per buffer
  Speedup: 7.5x

Output saved to processed.wav

For optional real-time mode:

$ ./audio_dsp --realtime --device "Built-in Audio"
Real-time mode: Processing live audio
Latency: 11.6ms (512 samples @ 44.1kHz)
Running... Press Ctrl+C to stop
[VU Meter] L: ████████░░ -6dB  R: ███████░░░ -8dB

Solution Architecture

High-Level Design

AUDIO DSP PROCESSING ARCHITECTURE:

┌─────────────────────────────────────────────────────────────────────────────┐
│                              audio_dsp CLI                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌────────────┐     ┌────────────┐     ┌────────────────────────────────┐  │
│   │  WAV I/O   │────►│  Buffer    │────►│     Processing Chain           │  │
│   │(libsndfile)│     │  Manager   │     │                                │  │
│   └────────────┘     └────────────┘     │  ┌──────┐  ┌─────┐  ┌───────┐ │  │
│         │                  │            │  │  EQ  │─►│Comp │─►│Limiter│ │  │
│         │                  │            │  └──────┘  └─────┘  └───────┘ │  │
│         ▼                  ▼            └────────────────────────────────┘  │
│   ┌────────────┐     ┌────────────┐                    │                    │
│   │ Format     │     │ Planar ◄──►│                    ▼                    │
│   │ Conversion │     │ Interleave │     ┌────────────────────────────────┐  │
│   └────────────┘     └────────────┘     │     Performance Monitor        │  │
│                                          │  (timing, VU meters, stats)    │  │
│                                          └────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────────┘

PROCESSING CHAIN DETAIL:

Input Buffer          Effect 1            Effect 2            Output Buffer
(Planar Float)         (EQ)              (Compressor)        (Planar Float)

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ L0 L1 L2... │───►│   Biquad    │───►│  Envelope   │───►│ L0 L1 L2... │
│ R0 R1 R2... │    │   Filter    │    │  Follower   │    │ R0 R1 R2... │
└─────────────┘    │   (SIMD)    │    │   + Gain    │    └─────────────┘
                   └─────────────┘    │   (SIMD)    │
                                      └─────────────┘

Key Components

Component	Responsibility	SIMD Usage
`AudioBuffer`	Hold planar audio data, manage format conversion	Load/store aligned
`BiquadFilter`	Apply EQ (low-shelf, high-shelf, parametric)	Parallel channel processing
`Compressor`	Dynamics processing with envelope follower	Conditional with `where()`
`Limiter`	Hard clip/limit to prevent distortion	Clamping with min/max
`ProcessingChain`	Ordered list of effects	N/A (orchestration)
`PerformanceMonitor`	Measure and report timing	N/A (measurement)
`WavIO`	Read/write WAV files	N/A (file I/O)

Data Structures

// Planar audio buffer (better for SIMD)
struct AudioBuffer {
    std::vector<float> left;
    std::vector<float> right;
    size_t num_samples;
    size_t sample_rate;

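    // Preferred alignment for SIMD loads (one cache line)
    static constexpr size_t alignment = 64;

    void resize(size_t n) {
        // Note: std::vector's default allocator does not guarantee this
        // alignment; use an aligned allocator or element_aligned loads
        left.resize(n);
        right.resize(n);
        num_samples = n;
    }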
};

// Biquad filter coefficients
struct BiquadCoeffs {
    float b0, b1, b2;  // Feedforward
    float a1, a2;       // Feedback (normalized, a0 = 1)

    // Factory methods for different filter types
    static BiquadCoeffs lowShelf(float fc, float gain_db, float sample_rate);
    static BiquadCoeffs highShelf(float fc, float gain_db, float sample_rate);
    static BiquadCoeffs parametric(float fc, float gain_db, float Q, float sample_rate);
    static BiquadCoeffs lowPass(float fc, float Q, float sample_rate);
};

// Biquad filter state (per channel)
struct BiquadState {
    float x1 = 0, x2 = 0;  // Previous inputs
    float y1 = 0, y2 = 0;  // Previous outputs
};

// Stereo biquad filter
struct StereoBiquad {
    BiquadCoeffs coeffs;
    BiquadState left_state, right_state;

    void process(AudioBuffer& buffer);
    void reset() { left_state = {}; right_state = {}; }
};

// Compressor parameters and state
struct Compressor {
    float threshold_linear;  // Linear, not dB
    float ratio;
    float attack_coeff;      // Envelope follower attack (per sample)
    float release_coeff;     // Envelope follower release
    float envelope = 0;      // Current envelope level

    void process(AudioBuffer& buffer);
    void setParams(float threshold_db, float ratio,
                   float attack_ms, float release_ms, float sample_rate);
};

// Limiter (simple brick-wall)
struct Limiter {
    float ceiling_linear;

    void process(AudioBuffer& buffer);
    void setCeiling(float ceiling_db);
};

// Processing chain
class ProcessingChain {
    std::vector<std::unique_ptr<Effect>> effects;

public:
    void addEffect(std::unique_ptr<Effect> effect);
    void process(AudioBuffer& buffer);
    void reset();  // Clear all filter states
};
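ProcessingChain stores `std::unique_ptr<Effect>`, but the listing never declares `Effect`. One plausible shape is sketched below. It is an assumption, not the project's definition: for the chain to hold them, StereoBiquad, Compressor, and Limiter would each need to derive from it. `GainEffect` and the trimmed `AudioBuffer` exist only to keep the example self-contained.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Trimmed stand-in for the AudioBuffer declared in the listing.
struct AudioBuffer {
    std::vector<float> left, right;
    std::size_t num_samples = 0;
};

// Assumed common interface for everything a ProcessingChain can hold.
struct Effect {
    virtual ~Effect() = default;
    virtual void process(AudioBuffer& buffer) = 0;  // in place, no allocation
    virtual void reset() {}                          // clear internal state
};

// Hypothetical example effect: a fixed gain applied to both channels.
struct GainEffect : Effect {
    explicit GainEffect(float g) : gain(g) {}

    void process(AudioBuffer& buffer) override {
        for (std::size_t i = 0; i < buffer.num_samples; ++i) {
            buffer.left[i]  *= gain;
            buffer.right[i] *= gain;
        }
    }

    float gain;
};
```

The virtual call happens once per effect per buffer, not once per sample, so its cost is negligible next to the sample loops.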

Algorithm Overview

Biquad Filter Algorithm:

FOR each sample n:
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]

    // Update state
    x[n-2] = x[n-1]
    x[n-1] = x[n]
    y[n-2] = y[n-1]
    y[n-1] = y[n]

Time complexity: O(n) per channel
Space complexity: O(1) state per channel

Compressor Algorithm:

FOR each sample:
    1. Compute sample amplitude (abs value)
    2. Update envelope follower:
       IF amplitude > envelope:
           envelope += attack_coeff * (amplitude - envelope)
       ELSE:
           envelope += release_coeff * (amplitude - envelope)
    3. Compute gain reduction:
       IF envelope > threshold:
           gain = threshold + (envelope - threshold) / ratio
           gain = gain / envelope  // Convert to multiplier
       ELSE:
           gain = 1.0
    4. Apply gain to sample
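A scalar sketch of steps 1 to 3 follows. The conversion from a time in milliseconds to a per-sample coefficient is one common choice (a one-pole smoother that covers about 63% of a step in the given time). The helper names are assumptions, not the project's API.

```cpp
#include <cmath>

// One-pole smoothing coefficient for a time constant in milliseconds.
// With envelope += coeff * (target - envelope), the envelope covers
// about 63% of a step change after `time_ms`.
inline float time_to_coeff(float time_ms, float sample_rate) {
    return 1.0f - std::exp(-1.0f / (time_ms * 0.001f * sample_rate));
}

struct EnvelopeFollower {
    float attack_coeff = 0.0f;
    float release_coeff = 0.0f;
    float envelope = 0.0f;

    // Steps 1 and 2: rectify the sample and move the envelope toward it,
    // using the attack rate when rising and the release rate when falling.
    float step(float sample) {
        const float amplitude = std::fabs(sample);
        const float coeff = amplitude > envelope ? attack_coeff : release_coeff;
        envelope += coeff * (amplitude - envelope);
        return envelope;
    }
};

// Step 3: gain multiplier for the current envelope (linear amplitudes).
inline float compressor_gain(float envelope, float threshold, float ratio) {
    if (envelope <= threshold) return 1.0f;
    return (threshold + (envelope - threshold) / ratio) / envelope;
}
```

Because each envelope value depends on the previous one, this part is sequential per channel; the multiply in step 4 is the part that vectorizes cleanly.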

SIMD Gain (Simple Example):

SIMD lanes: 8 (AVX)

FOR i = 0 to n STEP 8:
    // Load 8 samples into SIMD register
    samples = SIMD_LOAD(buffer[i:i+8])

    // Multiply all 8 by gain
    samples = samples * gain_vector

    // Store 8 samples back
    SIMD_STORE(buffer[i:i+8], samples)

Speedup: up to ~7-8x vs a scalar loop that the compiler has not auto-vectorized

Implementation Guide

Development Environment Setup

# Required packages (Ubuntu/Debian)
sudo apt-get install libsndfile1-dev  # WAV file I/O
sudo apt-get install libasound2-dev   # ALSA for real-time (optional)
sudo apt-get install portaudio19-dev  # PortAudio for cross-platform (optional)

# For macOS
brew install libsndfile portaudio

# Verify compiler SIMD support
g++ -march=native -Q --help=target | grep -E "avx|sse"

# Create project structure
mkdir -p audio_dsp/{src,include,tests,assets}
cd audio_dsp

Project Structure

audio_dsp/
├── CMakeLists.txt
├── include/
│   ├── audio_buffer.hpp     # AudioBuffer class
│   ├── simd_ops.hpp         # SIMD operations wrapper
│   ├── biquad.hpp           # Biquad filter
│   ├── compressor.hpp       # Dynamics compressor
│   ├── limiter.hpp          # Brick-wall limiter
│   ├── chain.hpp            # Processing chain
│   ├── wav_io.hpp           # WAV file reading/writing
│   └── performance.hpp      # Timing and metrics
├── src/
│   ├── main.cpp             # CLI entry point
│   ├── audio_buffer.cpp
│   ├── biquad.cpp
│   ├── compressor.cpp
│   ├── limiter.cpp
│   ├── chain.cpp
│   └── wav_io.cpp
├── tests/
│   ├── test_biquad.cpp
│   ├── test_compressor.cpp
│   ├── test_simd.cpp
│   └── reference_signals/   # Known-good test signals
└── assets/
    └── test_audio/          # Sample WAV files for testing

Implementation Phases

Phase 1: Foundation (Days 1-3)

Goals:

Set up build system with SIMD support
Implement AudioBuffer with format conversion
Implement WAV file I/O
Create basic SIMD gain operation

Tasks:

Create CMakeLists.txt with SIMD flags:

cmake_minimum_required(VERSION 3.16)
project(audio_dsp)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native -O3")

find_package(PkgConfig REQUIRED)
pkg_check_modules(SNDFILE REQUIRED sndfile)

add_executable(audio_dsp
    src/main.cpp
    src/audio_buffer.cpp
    src/biquad.cpp
    src/compressor.cpp
    src/limiter.cpp
    src/chain.cpp
    src/wav_io.cpp
)

target_include_directories(audio_dsp PRIVATE
    include
    ${SNDFILE_INCLUDE_DIRS}
)
target_link_libraries(audio_dsp ${SNDFILE_LIBRARIES})

Implement AudioBuffer:

// audio_buffer.hpp
#pragma once
#include <vector>
#include <experimental/simd>

namespace stdx = std::experimental;

struct AudioBuffer {
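    // Plain members: alignas on a std::vector only aligns the vector object,
    // not its heap storage, so it would not help SIMD loads
    std::vector<float> left;
    std::vector<float> right;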
    size_t num_samples = 0;
    size_t sample_rate = 44100;
    int num_channels = 2;

    void resize(size_t n);
    void fromInterleaved(const float* interleaved, size_t n);
    void toInterleaved(float* interleaved) const;
    void clear();
};
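std::vector's default allocator gives no 64-byte guarantee, and `alignas` on a vector member affects only the vector object itself. If aligned storage is wanted, one option is a custom allocator. The sketch below assumes C++17 aligned `operator new`; `AlignedAllocator` and `AlignedFloats` are hypothetical names, and `Align` is assumed to be a power of two.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Minimal allocator that hands out storage aligned to `Align` bytes.
template <class T, std::size_t Align>
struct AlignedAllocator {
    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    // Explicit rebind is needed because Align is a non-type parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    T* allocate(std::size_t n) {
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t(Align)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(Align));
    }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// Drop-in replacement for std::vector<float> with cache-line alignment.
using AlignedFloats = std::vector<float, AlignedAllocator<float, 64>>;
```

With storage like this, loads could use `stdx::vector_aligned` instead of `stdx::element_aligned`, as long as each load offset is a multiple of the SIMD width. Whether that yields a measurable gain on a given CPU is worth benchmarking rather than assuming.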

Implement WAV I/O using libsndfile:

// wav_io.hpp
#pragma once
#include "audio_buffer.hpp"
#include <string>

bool loadWav(const std::string& path, AudioBuffer& buffer);
bool saveWav(const std::string& path, const AudioBuffer& buffer);

Implement basic SIMD gain:

// simd_ops.hpp
#pragma once
#include <experimental/simd>

namespace stdx = std::experimental;
using simd_f = stdx::native_simd<float>;

inline void apply_gain(float* samples, size_t n, float gain) {
    simd_f gain_vec = gain;
    const size_t step = simd_f::size();

    size_t i = 0;
    for (; i + step <= n; i += step) {
        simd_f s(&samples[i], stdx::element_aligned);
        s *= gain_vec;
        s.copy_to(&samples[i], stdx::element_aligned);
    }
    // Scalar tail
    for (; i < n; ++i) {
        samples[i] *= gain;
    }
}

Checkpoint: Load a WAV file, apply gain, save result. Verify by listening.

Phase 2: Biquad Filter (Days 4-7)

Goals:

Implement biquad coefficient calculations
Implement filter processing (scalar first, then optimize)
Create EQ effect types (low-shelf, high-shelf, parametric)
Test with sine sweeps

Tasks:

Implement coefficient calculations (from Audio EQ Cookbook):

// biquad.cpp
#include "biquad.hpp"
#include <cmath>

BiquadCoeffs BiquadCoeffs::lowShelf(float fc, float gain_db, float sr) {
    float A = std::pow(10.0f, gain_db / 40.0f);
    float w0 = 2.0f * M_PI * fc / sr;
    float cos_w0 = std::cos(w0);
    float sin_w0 = std::sin(w0);
    // Cookbook shelf slope S = 1: sqrt((A + 1/A) * (1/S - 1) + 2) = sqrt(2)
    float alpha = sin_w0 / 2.0f * std::sqrt(2.0f);

    float a0 = (A + 1) + (A - 1) * cos_w0 + 2 * std::sqrt(A) * alpha;

    BiquadCoeffs c;
    c.b0 = A * ((A + 1) - (A - 1) * cos_w0 + 2 * std::sqrt(A) * alpha) / a0;
    c.b1 = 2 * A * ((A - 1) - (A + 1) * cos_w0) / a0;
    c.b2 = A * ((A + 1) - (A - 1) * cos_w0 - 2 * std::sqrt(A) * alpha) / a0;
    c.a1 = -2 * ((A - 1) + (A + 1) * cos_w0) / a0;
    c.a2 = ((A + 1) + (A - 1) * cos_w0 - 2 * std::sqrt(A) * alpha) / a0;

    return c;
}

// Similar implementations for highShelf, parametric, lowPass, highPass
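As a sketch of two of the remaining filter types, the functions below follow the peaking-EQ and low-pass formulas from the Audio EQ Cookbook, normalized by a0. They are written as free functions with hypothetical names (`make_peaking`, `make_low_pass`) so the example stands alone; in the project they would presumably become the `parametric` and `lowPass` factory methods.

```cpp
#include <cmath>

struct BiquadCoeffs { float b0, b1, b2, a1, a2; };  // normalized, a0 = 1

constexpr float kPi = 3.14159265358979323846f;

// Peaking (parametric) EQ: boost or cut of gain_db around fc, bandwidth set by Q.
BiquadCoeffs make_peaking(float fc, float gain_db, float Q, float sr) {
    const float A = std::pow(10.0f, gain_db / 40.0f);
    const float w0 = 2.0f * kPi * fc / sr;
    const float cos_w0 = std::cos(w0);
    const float alpha = std::sin(w0) / (2.0f * Q);
    const float a0 = 1.0f + alpha / A;
    return { (1.0f + alpha * A) / a0,
             -2.0f * cos_w0 / a0,
             (1.0f - alpha * A) / a0,
             -2.0f * cos_w0 / a0,
             (1.0f - alpha / A) / a0 };
}

// Second-order low-pass with cutoff fc; Q = 0.707 gives a Butterworth response.
BiquadCoeffs make_low_pass(float fc, float Q, float sr) {
    const float w0 = 2.0f * kPi * fc / sr;
    const float cos_w0 = std::cos(w0);
    const float alpha = std::sin(w0) / (2.0f * Q);
    const float a0 = 1.0f + alpha;
    return { 0.5f * (1.0f - cos_w0) / a0,
             (1.0f - cos_w0) / a0,
             0.5f * (1.0f - cos_w0) / a0,
             -2.0f * cos_w0 / a0,
             (1.0f - alpha) / a0 };
}
```

A quick sanity check for any of these is the DC gain, (b0 + b1 + b2) / (1 + a1 + a2). It should be 1 for both filters here, since neither alters the level at 0 Hz.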

Implement filter processing:

void StereoBiquad::process(AudioBuffer& buffer) {
    // Process left channel
    for (size_t i = 0; i < buffer.num_samples; ++i) {
        float x = buffer.left[i];
        float y = coeffs.b0 * x
                + coeffs.b1 * left_state.x1
                + coeffs.b2 * left_state.x2
                - coeffs.a1 * left_state.y1
                - coeffs.a2 * left_state.y2;

        left_state.x2 = left_state.x1;
        left_state.x1 = x;
        left_state.y2 = left_state.y1;
        left_state.y1 = y;

        buffer.left[i] = y;
    }
    // Same for right channel
}

Create test with known signals:

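// Generate a linear sine sweep for testing
void generateSineSweep(AudioBuffer& buffer, float f_start, float f_end) {
    const float sr = static_cast<float>(buffer.sample_rate);
    const float duration = buffer.num_samples / sr;  // seconds (float division)
    for (size_t i = 0; i < buffer.num_samples; ++i) {
        float t = i / sr;
        // Phase is the integral of the instantaneous frequency
        // f(t) = f_start + (f_end - f_start) * t / duration
        float phase = 2.0f * M_PI *
                      (f_start * t + 0.5f * (f_end - f_start) * t * t / duration);
        float sample = std::sin(phase);
        buffer.left[i] = buffer.right[i] = sample;
    }
}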

Checkpoint: Apply low-pass filter to noise, verify high frequencies are attenuated.
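Besides listening, the attenuation can be checked numerically by evaluating the biquad's transfer function on the unit circle. The sketch below assumes the normalized coefficient layout used in this guide; `magnitude_db` is a hypothetical helper name.

```cpp
#include <cmath>
#include <complex>

struct BiquadCoeffs { float b0, b1, b2, a1, a2; };  // normalized, a0 = 1

// Magnitude response in dB at freq_hz, from
// H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2), z = e^{jw}.
double magnitude_db(const BiquadCoeffs& c, double freq_hz, double sample_rate) {
    const double w = 2.0 * 3.14159265358979323846 * freq_hz / sample_rate;
    const std::complex<double> z1 = std::polar(1.0, -w);  // z^-1
    const std::complex<double> z2 = z1 * z1;              // z^-2
    const std::complex<double> num =
        double(c.b0) + double(c.b1) * z1 + double(c.b2) * z2;
    const std::complex<double> den =
        1.0 + double(c.a1) * z1 + double(c.a2) * z2;
    return 20.0 * std::log10(std::abs(num / den));
}
```

For a Butterworth low-pass the result should be close to 0 dB at DC and about -3 dB at the cutoff frequency. At exactly Nyquist a low-pass numerator is zero, so the function returns negative infinity there; probing slightly below Nyquist avoids that.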

Phase 3: SIMD Optimization (Days 8-10)

Goals:

Optimize filter for SIMD (parallel channels)
Implement SIMD compressor with stdx::where()
Benchmark scalar vs SIMD
Verify correctness matches scalar implementation

Tasks:

Multi-channel SIMD filter (process 8 channels at once):

// For true SIMD benefit with IIR, process multiple channels
template <size_t N_CHANNELS>
class MultiChannelBiquad {
    std::array<BiquadCoeffs, N_CHANNELS> coeffs;

    // State as SoA for SIMD: one lane per channel
    simd_f x1 = 0.0f, x2 = 0.0f, y1 = 0.0f, y2 = 0.0f;
    simd_f b0, b1, b2, a1, a2;

public:
    void process(std::array<float*, N_CHANNELS> channels, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            simd_f x = gather(channels, i);

            simd_f y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;

            x2 = x1; x1 = x;
            y2 = y1; y1 = y;

            scatter(channels, i, y);
        }
    }
};

SIMD compressor:

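void Compressor::process(AudioBuffer& buffer) {
    simd_f thresh = threshold_linear;
    simd_f ratio_inv = 1.0f / ratio;

    // Process in SIMD chunks (static curve; the envelope is not applied here)
    size_t i = 0;
    for (; i + simd_f::size() <= buffer.num_samples; i += simd_f::size()) {
        simd_f left(&buffer.left[i], stdx::element_aligned);
        simd_f right(&buffer.right[i], stdx::element_aligned);

        // Peak of stereo
        simd_f peak = stdx::max(stdx::abs(left), stdx::abs(right));

        // Compute gain reduction; lanes at or below the threshold keep gain 1
        auto over_thresh = peak > thresh;
        simd_f gain_reduction = thresh + (peak - thresh) * ratio_inv;
        simd_f gain = 1.0f;
        stdx::where(over_thresh, gain) = gain_reduction / peak;

        // Apply gain
        left *= gain;
        right *= gain;

        left.copy_to(&buffer.left[i], stdx::element_aligned);
        right.copy_to(&buffer.right[i], stdx::element_aligned);
    }
    // Scalar tail for the remaining samples
    for (; i < buffer.num_samples; ++i) {
        float peak = std::max(std::abs(buffer.left[i]), std::abs(buffer.right[i]));
        float gain = peak > threshold_linear
            ? (threshold_linear + (peak - threshold_linear) / ratio) / peak
            : 1.0f;
        buffer.left[i] *= gain;
        buffer.right[i] *= gain;
    }
}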

Benchmark framework:

struct BenchmarkResult {
    double scalar_time_ms;
    double simd_time_ms;
    double speedup;
};

BenchmarkResult benchmark(std::function<void()> scalar_fn,
                          std::function<void()> simd_fn,
                          int iterations = 1000) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    // Warmup
    for (int i = 0; i < 10; ++i) { scalar_fn(); simd_fn(); }

    auto start = clock::now();
    for (int i = 0; i < iterations; ++i) scalar_fn();
    double scalar_ms = ms(clock::now() - start).count();

    start = clock::now();
    for (int i = 0; i < iterations; ++i) simd_fn();
    double simd_ms = ms(clock::now() - start).count();

    // Explicit millisecond durations avoid assuming the clock ticks in ns
    return {
        scalar_ms / iterations,
        simd_ms / iterations,
        scalar_ms / simd_ms
    };
}

Checkpoint: SIMD version matches scalar output within floating-point tolerance, achieves 5x+ speedup.
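One way to make "within floating-point tolerance" concrete is to compare the two outputs sample by sample. The helpers below are a minimal sketch with assumed names and an assumed default tolerance of 1e-6, which suits signals normalized to the range [-1, 1]. NaN is treated as a mismatch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>

// Largest absolute difference between two buffers of n samples.
// A NaN in either buffer is reported as an infinite difference.
inline float max_abs_diff(const float* a, const float* b, std::size_t n) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = std::fabs(a[i] - b[i]);
        if (std::isnan(d)) return std::numeric_limits<float>::infinity();
        worst = std::max(worst, d);
    }
    return worst;
}

// True when every pair of samples differs by at most `tolerance`.
inline bool buffers_match(const float* a, const float* b, std::size_t n,
                          float tolerance = 1e-6f) {
    return max_abs_diff(a, b, n) <= tolerance;
}
```

Small differences are expected because a SIMD implementation may evaluate sums in a different order, or use fused multiply-add, compared with the scalar loop.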

Phase 4: Processing Chain & CLI (Days 11-13)

Goals:

Implement effect chaining
Build CLI interface
Add limiter
Add performance reporting

Tasks:

Processing chain:

class ProcessingChain {
    std::vector<std::unique_ptr<Effect>> effects;

public:
    template<typename T, typename... Args>
    void add(Args&&... args) {
        effects.push_back(std::make_unique<T>(std::forward<Args>(args)...));
    }

    void process(AudioBuffer& buffer) {
        for (auto& effect : effects) {
            effect->process(buffer);
        }
    }

    void reset() {
        for (auto& effect : effects) {
            effect->reset();
        }
    }
};

Limiter:

void Limiter::process(AudioBuffer& buffer) {
    simd_f ceiling_pos = ceiling_linear;
    simd_f ceiling_neg = -ceiling_linear;

    size_t i = 0;
    for (; i + simd_f::size() <= buffer.num_samples; i += simd_f::size()) {
        simd_f left(&buffer.left[i], stdx::element_aligned);
        simd_f right(&buffer.right[i], stdx::element_aligned);

        // Clamp to ceiling
        left = stdx::clamp(left, ceiling_neg, ceiling_pos);
        right = stdx::clamp(right, ceiling_neg, ceiling_pos);

        left.copy_to(&buffer.left[i], stdx::element_aligned);
        right.copy_to(&buffer.right[i], stdx::element_aligned);
    }
    // Scalar tail for the remaining samples (std::clamp from <algorithm>)
    for (; i < buffer.num_samples; ++i) {
        buffer.left[i] = std::clamp(buffer.left[i], -ceiling_linear, ceiling_linear);
        buffer.right[i] = std::clamp(buffer.right[i], -ceiling_linear, ceiling_linear);
    }
}

CLI main:

int main(int argc, char* argv[]) {
 // Parse arguments
 std::string input_file, output_file;
 bool realtime_mode = false;
 // ... argparse ...

 // Load audio
 AudioBuffer buffer;
 if (!loadWav(input_file, buffer)) {
     std::cerr << "Failed to load " << input_file << "\n";
     return 1;
 }

 // Build processing chain
 ProcessingChain chain;
 chain.add<LowShelfEQ>(100.0f, 3.0f, buffer.sample_rate);
 chain.add<ParametricEQ>(3000.0f, -2.0f, 1.5f, buffer.sample_rate);
 chain.add<Compressor>(-20.0f, 4.0f, 10.0f, 100.0f, buffer.sample_rate);
 chain.add<Limiter>(-0.3f);

 // Process with timing
 auto start = std::chrono::high_resolution_clock::now();
 chain.process(buffer);
 auto elapsed = std::chrono::high_resolution_clock::now() - start;

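 // Report performance (explicit units; floating-point division throughout)
 double ms = std::chrono::duration<double, std::milli>(elapsed).count();
 double audio_ms = 1000.0 * buffer.num_samples / buffer.sample_rate;
 double realtime_ratio = audio_ms / ms;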

 std::cout << "Processed in " << ms << " ms\n";
 std::cout << "Real-time ratio: " << realtime_ratio << "x\n";

 // Save output
 saveWav(output_file, buffer);

 return 0;
}

Checkpoint: Full pipeline works end-to-end, processes audio correctly.

Phase 5: Testing & Polish (Day 14)

Goals:

Comprehensive testing
VU meter display
Documentation
Handle edge cases

Tasks:

Test suite:
- Unit tests for each effect
- Integration tests for full chain
- Compare SIMD vs scalar output
- Test with various WAV formats

VU meter:

struct VUMeter {
 float peak_left = 0, peak_right = 0;
 float decay = 0.99f;

 void update(const AudioBuffer& buffer) {
     peak_left *= decay;
     peak_right *= decay;

     for (size_t i = 0; i < buffer.num_samples; ++i) {
         peak_left = std::max(peak_left, std::abs(buffer.left[i]));
         peak_right = std::max(peak_right, std::abs(buffer.right[i]));
     }
 }

 void display() {
     auto bar = [](float level) {
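         // Clamp first: a peak above 1.0 would otherwise make the padding width negative
         int blocks = static_cast<int>(std::clamp(level, 0.0f, 1.0f) * 10);
         return std::string(blocks, '#') + std::string(10 - blocks, ' ');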
     };

     std::cout << "[L: " << bar(peak_left) << "] "
               << "[R: " << bar(peak_right) << "]\r" << std::flush;
 }
};

Testing Strategy

Test Categories

Category	Purpose	Examples
Unit Tests	Verify individual components	Biquad coefficients correct, compressor gain reduction correct
Integration Tests	Verify full processing chain	Load WAV, process, save, verify output
Performance Tests	Verify speed requirements	SIMD >= 5x scalar, < 0.1ms per buffer
Audio Quality Tests	Verify output sounds correct	Visual inspection of waveforms, listening tests

Critical Test Cases

Biquad Coefficient Test:

void testBiquadCoefficients() {
 // Test low-pass at fs/4 (half of Nyquist) with Q = 0.707 (Butterworth)
 auto c = BiquadCoeffs::lowPass(11025.0f, 0.707f, 44100.0f);

 // Verify at DC: gain should be 0dB
 float dc_gain = (c.b0 + c.b1 + c.b2) / (1 + c.a1 + c.a2);
 assert(std::abs(dc_gain - 1.0f) < 0.001f);

 // Verify at Nyquist: gain should be very low (< -60dB)
 // (Using z-transform evaluation)
}
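
The Nyquist check left as a comment above can be done by evaluating the transfer function on the unit circle. The sketch below is one way to do it; `Coeffs`, `rbjLowPass`, and `magnitudeAt` are hypothetical stand-ins for the project's `BiquadCoeffs` API, and the low-pass formulas are the ones from the Audio EQ Cookbook with `a0` normalized to 1:

```cpp
#include <cmath>
#include <complex>

// Hypothetical stand-in for the project's BiquadCoeffs (a0 normalized to 1).
struct Coeffs {
    double b0, b1, b2, a1, a2;
};

// Low-pass coefficients from the Audio EQ Cookbook formulas.
inline Coeffs rbjLowPass(double f0, double q, double fs) {
    const double pi = 3.14159265358979323846;
    const double w0 = 2.0 * pi * f0 / fs;
    const double cw = std::cos(w0);
    const double alpha = std::sin(w0) / (2.0 * q);
    const double a0 = 1.0 + alpha;
    return { (1.0 - cw) / 2.0 / a0,
             (1.0 - cw) / a0,
             (1.0 - cw) / 2.0 / a0,
             -2.0 * cw / a0,
             (1.0 - alpha) / a0 };
}

// |H(e^{jw})| for H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).
inline double magnitudeAt(const Coeffs& c, double f, double fs) {
    const double pi = 3.14159265358979323846;
    const std::complex<double> z1 = std::polar(1.0, -2.0 * pi * f / fs);  // z^-1
    const std::complex<double> z2 = z1 * z1;                              // z^-2
    return std::abs((c.b0 + c.b1 * z1 + c.b2 * z2) /
                    (1.0 + c.a1 * z1 + c.a2 * z2));
}
```

For this low-pass design the gain is 1 at DC, equals Q at the cutoff frequency (about -3 dB for Q = 0.707), and falls to zero at Nyquist because the numerator has a double zero at z = -1. Converting the result with 20·log10 gives a value to compare against a dB threshold.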

Impulse Response Test:

void testBiquadImpulse() {
 StereoBiquad filter;
 filter.setCoeffs(BiquadCoeffs::lowPass(1000.0f, 0.707f, 44100.0f));

 AudioBuffer impulse(1024, 44100);
 impulse.left[0] = 1.0f;  // Impulse

 filter.process(impulse);

 // Verify impulse response decays to near-zero
 assert(std::abs(impulse.left[1023]) < 0.0001f);
}

Compressor Threshold Test:

void testCompressorThreshold() {
 Compressor comp(-20.0f, 4.0f, 0.0f, 0.0f, 44100.0f);  // Instant attack/release

 AudioBuffer buffer(1024, 44100);
 // Fill with signal at -10dB (above threshold)
 float level = std::pow(10.0f, -10.0f / 20.0f);  // ~0.316
 std::fill(buffer.left.begin(), buffer.left.end(), level);

 comp.process(buffer);

 // Output should be at -20 + (-10 - -20) / 4 = -17.5 dB
 float expected = std::pow(10.0f, -17.5f / 20.0f);
 assert(std::abs(buffer.left[512] - expected) < 0.01f);
}

SIMD Correctness Test:

void testSIMDMatchesScalar() {
 AudioBuffer buffer1(4096, 44100), buffer2(4096, 44100);
 // Fill with random data
 std::mt19937 gen(12345);  // Fixed seed so a failing run can be reproduced
 std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
 for (size_t i = 0; i < 4096; ++i) {
     float sample = dist(gen);
     buffer1.left[i] = buffer2.left[i] = sample;
 }

 // Process with scalar
 CompressorScalar scalar_comp(-20, 4, 10, 100, 44100);
 scalar_comp.process(buffer1);

 // Process with SIMD
 CompressorSIMD simd_comp(-20, 4, 10, 100, 44100);
 simd_comp.process(buffer2);

 // Compare
 for (size_t i = 0; i < 4096; ++i) {
     assert(std::abs(buffer1.left[i] - buffer2.left[i]) < 1e-5f);
 }
}

Real-Time Safety Test:

void testNoAllocations() {
 // Set up allocator tracking
 AllocationCounter counter;

 ProcessingChain chain;
 chain.add<LowShelfEQ>(100, 3, 44100);
 chain.add<Compressor>(-20, 4, 10, 100, 44100);

 AudioBuffer buffer(512, 44100);

 counter.reset();
 for (int i = 0; i < 1000; ++i) {
     chain.process(buffer);
 }

 assert(counter.allocations() == 0);
}
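
`AllocationCounter` is not defined in the test above. One possible sketch replaces the global `operator new` in the test binary and counts calls. The counter variable and class layout are assumptions, and this version does not intercept the over-aligned `operator new` overloads or direct `malloc` calls:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Global call count; constant-initialized, so it is valid before main() runs.
std::atomic<std::size_t> g_allocation_count{0};

// Replacement global allocation function (test builds only).
void* operator new(std::size_t size) {
    g_allocation_count.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size ? size : 1)) {
        return p;
    }
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical helper matching the interface used in testNoAllocations().
class AllocationCounter {
    std::size_t baseline_ = 0;

public:
    AllocationCounter() { reset(); }

    void reset() { baseline_ = g_allocation_count.load(std::memory_order_relaxed); }

    std::size_t allocations() const {
        return g_allocation_count.load(std::memory_order_relaxed) - baseline_;
    }
};
```

The default `operator new[]` forwards to `operator new`, so array allocations are counted as well. Keep the replacement out of release builds; it exists only to make hidden allocations fail a test.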

Test Data

# Generate test signals using sox or similar
# 1. White noise (tests all frequencies equally)
sox -n -r 44100 -c 2 tests/white_noise.wav synth 5 whitenoise

# 2. Sine sweep (visualize frequency response)
sox -n -r 44100 -c 2 tests/sine_sweep.wav synth 5 sine 20-20000

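# 3. Impulse (measure impulse response)
# Inspect the result before relying on it: a 0 Hz sine may render as silence
# rather than a single non-zero sample. If it does, write the impulse from code.
sox -n -r 44100 -c 1 tests/impulse.wav synth 1 sine 0 pad 0 1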

# 4. Music sample (subjective listening test)
# Use any royalty-free music file

Common Pitfalls & Debugging

Frequent Mistakes

Pitfall	Symptom	Root Cause	Fix
Allocation in audio callback	Glitches under load	`std::vector::push_back` or `new`	Pre-allocate all buffers
Denormalized floats	Performance cliff	Very small values slow down FPU	Use `FTZ` (Flush To Zero) mode
Uninitialized filter state	Click at start	`x1`, `x2`, `y1`, `y2` are garbage	Initialize all state to 0
Wrong sample rate	Filter sounds wrong	Coefficients calculated for wrong fs	Pass sample rate to coefficient functions
Buffer alignment	SIMD crash	Unaligned loads/stores	Use `alignas(64)` on buffers
Interleaved assumed planar	Stereo swap / noise	Format mismatch	Convert format explicitly
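
The `alignas(64)` fix in the table can be checked at run time. A minimal sketch, assuming fixed-size blocks; `AlignedBlock` and `isAligned` are hypothetical names, and 64 bytes is chosen to cover AVX-512 vectors and a typical cache line:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size sample block with 64-byte alignment.
struct alignas(64) AlignedBlock {
    std::array<float, 512> samples{};  // Value-initialized to silence
};

// True if ptr is a multiple of alignment (alignment must be non-zero).
inline bool isAligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

Loads tagged `stdx::element_aligned`, as in the limiter, do not require this alignment; it matters once the code switches to `stdx::vector_aligned` or to aligned intrinsics. A debug-only `assert(isAligned(ptr, 64))` at buffer creation catches the mismatch early.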

Debugging Strategies

1. **Visualize waveforms** (Audacity, or Python with matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

rate, data = wavfile.read('output.wav')
plt.plot(data[:4410])  # First 100 ms at 44.1 kHz
plt.show()
```

2. **Test with simple signals:**
   - DC offset: Should pass through a peaking or high-shelf EQ unchanged (a low shelf scales it by the shelf gain)
   - Sine at filter frequency: Should be affected predictably
   - Impulse: Reveals filter response directly

3. **Compare with reference implementation:**
   - JUCE's built-in filters
   - Python's scipy.signal
   - Online filter calculators (for coefficients)

4. **Enable denormal flushing:**

```cpp
#include <immintrin.h>

// x86 only. These flags live in the per-thread MXCSR register,
// so call this on the audio thread itself.
void setupFPU() {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}
```

5. **Profile with proper tools:**

```bash
# Linux perf
perf stat -e cycles,instructions,cache-misses ./audio_dsp --input test.wav --output out.wav

# VTune (Intel)
vtune -collect hotspots ./audio_dsp …

# macOS Instruments
instruments -t "Time Profiler" ./audio_dsp …
```

Performance Traps

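Not using -march=native: Without it (or an explicit flag such as -mavx2), the compiler targets baseline x86-64 and emits SSE2 at most, never AVX/AVX2
Debug builds: Often orders of magnitude slower than release, because SIMD wrapper types rely on inlining; always profile release builds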
Unrolling too aggressively: Can hurt cache performance
Function call overhead: Inline hot paths; use inline or LTO
Mixing scalar and SIMD: Transition costs; process entire buffers in SIMD mode

Extensions & Challenges

Beginner Extensions

Add more filter types: High-pass, band-pass, notch, all-pass
Fade in/out: Apply smooth volume envelope at start/end
Mono to stereo: Duplicate mono channel to both outputs
Normalize: Scale audio so peak reaches 0dB
DC offset removal: High-pass at very low frequency (5Hz)

Intermediate Extensions

Multi-band compressor: Split into low/mid/high, compress separately
Look-ahead limiter: Delay signal to anticipate peaks
Convolution reverb: Apply room impulse response (FIR with FFT)
Oversampling: Process at 2x or 4x sample rate for better filters
Parameter smoothing: Avoid clicks when changing parameters
Sidechain compression: Compress based on different input signal

Advanced Extensions

FFT spectral processing: Implement spectral EQ, noise reduction
Real-time audio I/O: Use PortAudio or JUCE for live processing
VST/AU plugin: Wrap as a plugin for DAWs
GPU acceleration: Use CUDA/OpenCL for convolution
SIMD convolution: Implement overlap-add with AVX
Latency compensation: Report and compensate for processing delay
State variable filter: Implement with better numerical behavior

Research Extensions

Model-based compression: Emulate analog compressor behavior
Neural audio effects: Use ML for effect modeling
Spatial audio: Implement HRTF and binaural processing
Adaptive filtering: Echo cancellation with LMS algorithm

Resources

Essential Reading

“DAFX: Digital Audio Effects” (Zölzer) - Comprehensive DSP reference
“The Audio EQ Cookbook” (Robert Bristow-Johnson) - Biquad coefficient formulas
“Designing Audio Effect Plugins in C++” (Will Pirkle) - Practical plugin development
“Real-Time Audio Signal Processing” (Boulanger) - Fundamentals of real-time audio

Online Resources

The Audio Programmer (YouTube) - JUCE tutorials and audio DSP
Bela Blog - Real-time audio programming articles
musicdsp.org - Algorithm library and code snippets
Audio EQ Cookbook - Filter coefficient formulas
KVR Audio DSP Forum - Developer discussions

Documentation

JUCE Documentation - Industry-standard audio framework
PortAudio Documentation - Cross-platform audio I/O
libsndfile Documentation - Audio file reading/writing
std::experimental::simd - C++ SIMD types

Related Projects

JUCE - Full-featured audio framework
libsndfile - Audio file I/O library
RTAudio - Real-time audio I/O
fftw - Fast FFT library
KissFFT - Simple FFT library

Books That Will Help

Topic	Book	Chapter
Digital filter design	DAFX: Digital Audio Effects	Ch. 2
Biquad implementation	Designing Audio Effect Plugins in C++	Ch. 6-8
Dynamics processing	DAFX: Digital Audio Effects	Ch. 4
SIMD programming	Intel Intrinsics Guide	(online)
Real-time audio	Real-Time Audio Signal Processing	Ch. 1-3
C++ performance	Efficient C++ (Bulka & Mayhew)	Ch. 8-10

Self-Assessment Checklist

Before considering this project complete, verify:

Understanding

I can explain why audio has hard real-time constraints and calculate latency from buffer size
I understand the difference between IIR and FIR filters and when to use each
I can derive biquad coefficients for basic filter types using the EQ Cookbook
I understand why IIR filters are difficult to parallelize with SIMD
I can explain the tradeoffs of interleaved vs planar audio formats
I know the rules for real-time safe code (no allocations, no locks, no I/O)
I understand how compression works (threshold, ratio, attack, release)
I can calculate SIMD speedup and explain why it may be less than lane count

Implementation

WAV file loading and saving works correctly
Gain control with SIMD achieves 7x+ speedup over scalar
Biquad filter produces correct frequency response
Compressor applies gain reduction correctly above threshold
Limiter prevents clipping (all samples within ceiling)
Processing chain runs all effects in sequence
SIMD version produces identical output to scalar (within tolerance)
No memory allocations occur during processing
Processing time is under 0.1ms per 512-sample buffer

Performance

SIMD implementation achieves >= 5x speedup on full chain
Real-time ratio is >= 500x
No glitches during extended processing
Profile shows time spent in SIMD loops, not scalar tails

Quality

Processed audio sounds correct (subjective listening test)
Filter frequency response matches design
No clicks or pops at buffer boundaries
Handles mono and stereo files correctly

Submission / Completion Criteria

Minimum Viable Completion

Load and save WAV files (at least 16-bit stereo @ 44.1kHz)
Implement SIMD gain control with measurable speedup
Implement at least one biquad filter type (e.g., low-pass)
Implement basic compressor (threshold + ratio)
Report processing time per buffer
Runs without crashes on valid input

Full Completion

All functional requirements F1-F9 implemented
Performance meets all non-functional requirements (N1-N5)
Multiple filter types: low-shelf, high-shelf, parametric
Compressor with attack and release
Limiter
Processing chain with multiple effects
Benchmark comparison showing SIMD vs scalar speedup
VU meter display
Clean error handling for invalid files

Excellence (Going Above & Beyond)

Deliverables:

Source code with clear organization and comments
CMakeLists.txt for building
README with:
- Build instructions
- Usage examples
- Performance results
- Architecture overview
Test files demonstrating each effect
Performance benchmark results (scalar vs SIMD)
Brief writeup explaining SIMD optimization strategy

The Interview Questions They’ll Ask

After completing this project, you’ll be ready for these questions:

“Why is audio processing considered ‘hard real-time’?”
- Answer: Audio has strict deadlines (buffer period). Missing a deadline causes audible glitches. Unlike “soft” real-time (video), even occasional misses are unacceptable.
“What makes SIMD difficult for IIR filters?”
- Answer: IIR filters have temporal dependencies (output depends on previous output). You can’t compute y[n+1] until you know y[n]. Solutions: process multiple channels in parallel, use FIR where possible, or restructure algorithms.
“How would you prevent allocations in an audio callback?”
- Answer: Pre-allocate all buffers at initialization. Use fixed-size arrays or pre-sized vectors. Avoid std::string, std::vector::push_back, std::map::operator[], exceptions with allocation. Consider a custom allocator that panics on allocation for testing.
“Explain the difference between interleaved and planar audio.”
- Answer: Interleaved alternates channels (LRLRLR), common in APIs. Planar groups channels (LLLLRRRR), better for SIMD. SIMD processes consecutive memory efficiently; planar gives consecutive samples per channel.
“How does a biquad filter work?”
- Answer: It’s a second-order IIR filter with 5 coefficients. Output is weighted sum of current/past inputs and past outputs: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]. Different coefficient calculations create different filter types.
“What’s the latency of your audio processor?”
- Answer: Latency = buffer_size / sample_rate. For 512 samples at 44.1kHz, that’s 11.6ms. Plus any processing delay (look-ahead in limiter, group delay in filters). I can reduce latency by using smaller buffers at the cost of higher CPU usage per sample.
“How would you debug a glitch in real-time audio?”
- Answer: Check for allocations (custom allocator that asserts), check for locks, profile for CPU spikes, log timing of each buffer, visualize audio for anomalies, reduce buffer size to make problem more frequent, add diagnostics that don’t affect real-time performance.
“What SIMD speedup did you achieve and why wasn’t it 8x (for AVX)?”
- Answer: Achieved 5-7x speedup. Less than 8x due to: memory bandwidth limits, scalar tail handling, data dependencies in filters, function call overhead, cache effects. IIR filters particularly limited by sequential nature.
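
The difference equation quoted in the biquad answer can be sketched as a scalar Direct Form I filter. This is an illustration only: `BiquadDF1` is a hypothetical name, not the project's `StereoBiquad`, and it processes one channel at a time:

```cpp
#include <vector>

// Hypothetical single-channel Direct Form I biquad (a0 normalized to 1).
struct BiquadDF1 {
    float b0 = 1.0f, b1 = 0.0f, b2 = 0.0f, a1 = 0.0f, a2 = 0.0f;
    float x1 = 0.0f, x2 = 0.0f, y1 = 0.0f, y2 = 0.0f;  // State starts at zero

    // y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    float tick(float x) {
        const float y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1;
        x1 = x;
        y2 = y1;
        y1 = y;
        return y;
    }

    // In-place processing; each output depends on the previous two outputs.
    void process(std::vector<float>& samples) {
        for (float& s : samples) {
            s = tick(s);
        }
    }
};
```

The loop makes the dependency from the IIR answer visible: `tick` cannot start on sample n+1 until `y1` holds the result for sample n, which is why the SIMD lanes are usually spent on channels or on independent filters rather than on consecutive samples.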

This project demonstrates mastery of both SIMD programming and real-time systems constraints. Audio processing is one of the most demanding SIMD applications because of the hard latency requirements. The skills you develop here apply directly to game audio, music production software, telecommunications, and any performance-critical signal processing application.

Related Projects:

Previous: P12: SIMD Math Library - Foundation for this project
Next: P14: Real-Time Game Physics - Combines SIMD with parallel algorithms

For the complete learning path, see the project index.
