Project 4: Real-Time Audio Spectrum Analyzer
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Main Language | C |
| Alternatives | Rust, Arduino C++, MicroPython |
| Primary Book | The Scientist and Engineer's Guide to DSP by Steven W. Smith |
| Knowledge Areas | DSP, Audio Processing, I2S, Dual-Core FreeRTOS, DMA |
What You'll Build
A device that captures audio through a microphone, performs FFT analysis, and displays a real-time frequency spectrum on an LED matrix or OLED display.
Physical Setup:
- ESP32 connected to INMP441 (I2S digital microphone) or MAX4466 (analog)
- 8x32 or 16x16 WS2812B LED matrix, or SSD1306 OLED display
- Music or voice near the microphone creates dancing visualizations
What You'll See:
LED Matrix (8 frequency bands):
[Figure: eight vertical bars of varying height, one per frequency band, above the band labels.]
20 50 125 315 800 2k 5k 12k Hz
OLED Display (128x64):
[Figure: framed screen titled "Real-Time Audio Spectrum" with the spectrum bars in the center and two status lines below.]
Peak: 2.4kHz dB: -18
FPS: 45 Clipping: No
Learning Objectives
By completing this project, you will be able to:
- Sample audio using I2S with DMA for zero-copy continuous capture
- Implement the Fast Fourier Transform on resource-constrained hardware
- Leverage ESP32's dual-core architecture for parallel processing
- Apply digital signal processing concepts: windowing, magnitude calculation, dB scaling
- Drive WS2812B LED matrices using the RMT peripheral
- Profile and optimize real-time systems to achieve target frame rates
- Understand time-frequency domain trade-offs in spectrum analysis
Deep Theoretical Foundation
Digital Audio Fundamentals
Sound is a continuous pressure wave in air. To process it digitally, we must sample it at discrete points in time.
Sampling and Nyquist Theorem
[Figure: a continuous sound wave (amplitude vs. time) drawn as a smooth oscillation around 0, followed by the same wave sampled at discrete, evenly spaced points in time.]
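Nyquist Theorem: To accurately represent a frequency f, you must sample at more than 2f samples per second. Equivalently, a sample rate fs can only represent frequencies below fs/2 (the Nyquist frequency).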
| Sample Rate | Maximum Frequency | Common Use |
|---|---|---|
| 8,000 Hz | 4,000 Hz | Telephone |
| 22,050 Hz | 11,025 Hz | AM radio quality |
| 44,100 Hz | 22,050 Hz | CD quality |
| 48,000 Hz | 24,000 Hz | Professional audio |
Why 44.1kHz for this project: Human hearing extends to ~20kHz. Sampling at 44.1kHz captures the full audible spectrum with headroom.
Aliasing: What Happens When Nyquist is Violated
If you sample a 15kHz tone at 22kHz (less than 2 × 15kHz), the samples no longer track the original waveform:
[Figure: the original 15kHz sine wave, the sparse sample points taken at 22kHz, and the slower 7kHz wave that those samples reconstruct.]
The 15kHz signal "aliases" to 7kHz (22 - 15 = 7).
Anti-aliasing filter: A low-pass filter applied before sampling (or before any sample-rate reduction) that removes frequencies above Nyquist. Many I2S microphones include this internally.
The Fast Fourier Transform (FFT)
The FFT transforms time-domain samples into frequency-domain components. It is the mathematical heart of a spectrum analyzer.
What FFT Computes
Given N real time-domain samples, the FFT produces N complex values, of which the first N/2 are the useful frequency "bins" (for real input, the upper half mirrors the lower half):
[Figure: on the left, 1024 time-domain samples (amplitude vs. time); on the right, the FFT result as 512 magnitude bins running from 0Hz (bin 0) to about 22kHz (bin 511).]
Each bin is centered on a frequency (bin k sits at k × bin width) and is one bin width wide:
- Bin width = Sample Rate / N = 44100 / 1024 = 43.07 Hz per bin
- Bin 0 = 0 Hz (DC component)
- Bin 1 = centered on 43 Hz
- Bin 100 = centered on 4307 Hz
FFT Output: Complex Numbers
FFT output is complex: each bin has real and imaginary components.
FFT output for bin k: X[k] = a + bi
Where:
- a = real component (cosine amplitude)
- b = imaginary component (sine amplitude)
Magnitude: |X[k]| = √(a² + b²)
Phase: ∠X[k] = atan2(b, a) (not needed for visualization)
For spectrum display, we only care about magnitude: how strong each frequency is.
Why FFT is Fast
DFT (Discrete Fourier Transform):
- Direct calculation: O(N²)
- 1024 samples: ~1 million operations
- Too slow for real-time
FFT (Fast Fourier Transform):
- Divide and conquer: O(N log N)
- 1024 samples: ~10,000 operations
- 100x faster!
The FFT exploits symmetry in the DFT equations using the Cooley-Tukey algorithm (1965).
Windowing: Reducing Spectral Leakage
FFT assumes the input repeats infinitely. But our 1024-sample buffer doesn't perfectly align with signal periods.
Without windowing, there is a discontinuity at the edges:
[Figure: a 1024-sample buffer holding a wave that is cut off mid-cycle at the right edge, marked "Discontinuity!"]
This discontinuity creates artificial frequencies (spectral leakage).
Window functions taper the signal to zero at the edges:
Hann Window:
[Figure: a smooth bell-shaped curve across the 1024 samples, rising from zero at the left edge to its maximum at the center and falling back to zero at the right edge.]
Applied to signal:
- Multiplied sample-by-sample
- Edges become zero (no discontinuity)
- Center preserved (minimal signal distortion)
| Window | Sidelobe Level | Frequency Resolution | Use Case |
|---|---|---|---|
| Rectangular | -13 dB | Excellent | Testing only |
| Hann | -31 dB | Good | General purpose |
| Hamming | -43 dB | Good | Speech analysis |
| Blackman | -58 dB | Poor | High dynamic range |
For spectrum visualizers: the Hann window is a good default because it balances frequency resolution against sidelobe suppression.
I2S: Digital Audio Interface
I2S (Inter-IC Sound) is a standard for transmitting digital audio between chips.
I2S Signal Lines
[Figure: I2S timing diagram. BCLK (Bit Clock) toggles once per data bit. WS (Word Select) holds one level for the left channel word and the other level for the right channel word. DOUT (Data) carries D15, D14, ..., D1, D0 for the left channel, then D15, D14, ..., D1, D0 for the right channel.]
- BCLK: One clock per bit (e.g., 44100 × 16 × 2 ≈ 1.41 MHz for 16-bit stereo)
- WS/LRCLK: High = Right channel, Low = Left channel
- DOUT: Serial audio data, MSB first
I2S on ESP32
ESP32 has two I2S peripherals. For audio input:
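#include "driver/i2s.h" // Legacy ESP-IDF I2S driver API
i2s_config_t i2s_config = {
    .mode = I2S_MODE_MASTER | I2S_MODE_RX, // ESP32 generates the clocks; receive only
    .sample_rate = 44100,
    // The INMP441 sends 24-bit samples in 32-bit slots; if 16-bit reads look wrong,
    // try I2S_BITS_PER_SAMPLE_32BIT and shift the samples down
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .dma_buf_count = 8, // Number of DMA buffers
    .dma_buf_len = 1024, // Samples per buffer
    .use_apll = true, // Use APLL for accurate sample rate
};
i2s_pin_config_t pin_config = {
    .bck_io_num = 26, // Bit Clock
    .ws_io_num = 25, // Word Select (Left/Right)
    .data_out_num = I2S_PIN_NO_CHANGE, // Not transmitting
    .data_in_num = 33, // Data input
};
// Call once during startup (inside a function such as app_main):
ESP_ERROR_CHECK(i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL));
ESP_ERROR_CHECK(i2s_set_pin(I2S_NUM_0, &pin_config));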
DMA: Zero-Copy Audio Capture
DMA (Direct Memory Access) moves data between peripherals and memory without CPU involvement.
Without DMA: I2S → CPU → RAM. The CPU must copy each sample, which blocks processing.
With DMA: I2S → DMA → RAM. The CPU is free to run the FFT while DMA fills the next buffer.
Double Buffering (Ping-Pong)
| Time slot | DMA | CPU |
|---|---|---|
| 1 | Fill Buffer A | (waiting for the first buffer) |
| 2 | Fill Buffer B | Process A |
| 3 | Fill Buffer A | Process B |
| 4 | Fill Buffer B | Process A |

While one buffer is being filled with new samples, the other holds a complete block that is being processed.
This allows continuous audio capture with no gaps.
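The ping-pong idea can be modeled on a host machine with two arrays and an index that flips. This is a conceptual sketch only: on the ESP32 the I2S driver manages its own ring of DMA buffers, and the type and function names here are hypothetical.

```c
#include <stdint.h>

#define PP_BUF_LEN 4   /* tiny length, for illustration only */

typedef struct {
    int16_t buf[2][PP_BUF_LEN];
    int fill_idx;      /* index of the buffer the producer is writing */
} pingpong_t;

/* Buffer the producer (the DMA engine in the real system) should write next. */
int16_t *pingpong_fill_target(pingpong_t *pp)
{
    return pp->buf[pp->fill_idx];
}

/* Called when the fill buffer is complete: swap roles and hand the
 * finished buffer to the consumer. */
const int16_t *pingpong_swap(pingpong_t *pp)
{
    const int16_t *ready = pp->buf[pp->fill_idx];
    pp->fill_idx ^= 1;
    return ready;
}
```

The consumer must finish with the returned buffer before the next swap, which is the same deadline the FFT task faces: one buffer period.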
ESP32 Dual-Core Architecture
ESP32 has two Xtensa LX6 cores. We can dedicate each to specific tasks:
| Core | Task | Responsibilities |
|---|---|---|
| Core 0 | Audio Capture Task | I2S DMA management; buffer ready signal; continuous sampling |
| Core 1 | FFT Processing Task | Apply window function; compute FFT; calculate magnitudes; map to display |
| Core 1 | Display Update Task | Render LED/OLED; gamma correction |
Task Pinning
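#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
// Handles let other code suspend, resume, or notify the tasks later
static TaskHandle_t audio_task_handle;
static TaskHandle_t fft_task_handle;
// Pin audio capture to Core 0
xTaskCreatePinnedToCore(
    audio_capture_task, // Function
    "audio", // Name
    4096, // Stack size in bytes (ESP-IDF counts bytes, not words)
    NULL, // Parameters
    10, // Priority (high)
    &audio_task_handle, // Task handle
    0 // Core 0
);
// Pin FFT processing to Core 1
xTaskCreatePinnedToCore(
    fft_process_task,
    "fft",
    8192, // Larger stack; keep large FFT buffers static or on the heap, not on this stack
    NULL,
    5, // Medium priority
    &fft_task_handle,
    1 // Core 1
);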
WS2812B LED Matrix Driving
WS2812B LEDs use a single-wire timing-critical protocol. Each bit is encoded by pulse duration:
| Bit | High time | Low time |
|---|---|---|
| 0 | 0.4 µs | 0.85 µs |
| 1 | 0.8 µs | 0.45 µs |

Total bit time: 1.25 µs (800kHz data rate)
For an 8×32 matrix (256 LEDs × 24 bits = 6,144 bits):
- Transmission time: 6,144 × 1.25 µs = 7.68ms
- Maximum refresh rate: ~130 Hz
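The transmission-time arithmetic generalizes to any LED count. The sketch below uses hypothetical helper names and deliberately ignores the reset (latch) gap the strip needs between frames, so the real ceiling is slightly lower.

```c
#include <stdint.h>

#define WS2812B_BITS_PER_LED 24u
#define WS2812B_BIT_TIME_NS  1250u   /* 1.25 us per bit at 800 kHz */

/* Time to clock out one full frame, in microseconds (reset gap excluded). */
uint32_t ws2812b_frame_time_us(uint32_t led_count)
{
    uint64_t ns = (uint64_t)led_count * WS2812B_BITS_PER_LED * WS2812B_BIT_TIME_NS;
    return (uint32_t)(ns / 1000u);
}

/* Upper bound on refresh rate in whole frames per second. */
uint32_t ws2812b_max_refresh_hz(uint32_t led_count)
{
    uint32_t frame_us = ws2812b_frame_time_us(led_count);
    return frame_us ? 1000000u / frame_us : 0u;
}
```

For the 256-LED matrix this gives 7680 µs per frame and a ceiling of 130 frames per second, comfortably above the 30 FPS target.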
ESP32 RMT Peripheral
The RMT (Remote Control Transceiver) peripheral generates precise timing for WS2812B:
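#include "driver/rmt.h" // Legacy ESP-IDF RMT driver API
#define LED_GPIO 13 // Matches the wiring diagram (GPIO13 to DIN)
// Configure RMT for WS2812B
rmt_config_t config = {
    .rmt_mode = RMT_MODE_TX,
    .channel = RMT_CHANNEL_0,
    .gpio_num = LED_GPIO,
    .clk_div = 2, // 80 MHz APB clock / 2 = 40 MHz (25 ns per tick)
    .mem_block_num = 1,
};
// Call once during startup (inside an init function):
ESP_ERROR_CHECK(rmt_config(&config));
ESP_ERROR_CHECK(rmt_driver_install(config.channel, 0, 0));
// Timing for WS2812B, in 25 ns ticks
#define T0H 16 // 0.4 µs at 40 MHz
#define T1H 32 // 0.8 µs
#define T0L 34 // 0.85 µs
#define T1L 18 // 0.45 µs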
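The tick counts above follow from the 40 MHz tick rate (25 ns per tick). A small host-side sketch can confirm them; the helper name and the `RMT_TICK_HZ` constant are hypothetical and assume the 80 MHz APB clock with `clk_div = 2`.

```c
#include <stdint.h>

#define RMT_TICK_HZ 40000000u   /* assumed: 80 MHz APB clock / clk_div of 2 */

/* Convert a duration in nanoseconds to RMT ticks, rounded to nearest. */
uint32_t ns_to_rmt_ticks(uint32_t ns)
{
    return (uint32_t)(((uint64_t)ns * RMT_TICK_HZ + 500000000u) / 1000000000u);
}
```

The four WS2812B durations (400, 800, 850, and 450 ns) come out as 16, 32, 34, and 18 ticks, matching the defines.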
Project Specification
Hardware Requirements
| Component | Quantity | Purpose |
|---|---|---|
| ESP32 DevKit | 1 | Main MCU |
| INMP441 I2S Mic | 1 | Digital audio input |
| 8x32 WS2812B Matrix | 1 | Spectrum display |
| 5V 3A Power Supply | 1 | LED power |
| Level Shifter (3.3V→5V) | 1 | Data line for LEDs |
| Capacitor (1000 µF) | 1 | Power smoothing |
Wiring Diagram
| ESP32 DevKit | INMP441 Microphone |
|---|---|
| GPIO26 | SCK (BCLK) |
| GPIO25 | WS (LRCLK) |
| GPIO33 | SD (DOUT) |
| 3.3V | VDD |
| GND | GND |
| (not connected to ESP32) | L/R tied to GND (selects left channel) |

| ESP32 DevKit | WS2812B LED Matrix |
|---|---|
| GPIO13 (through level shifter) | DIN |
| GND | GND |

| 5V 3A Supply | WS2812B LED Matrix |
|---|---|
| 5V | LED VCC |
| GND | LED GND |

Note: Add 1000 µF capacitor across LED power rails
Add 330 Ω resistor in series with DIN line
Functional Requirements
- Audio Capture
- 44.1kHz sample rate, 16-bit mono
- 1024-sample FFT window (23ms latency)
- Continuous capture via DMA
- FFT Processing
- Apply Hann window to reduce spectral leakage
- Compute 1024-point FFT
- Calculate magnitude in dB scale
- Frequency Mapping
- Map 512 bins to 8 display bands (logarithmic)
- Apply smoothing for pleasant visuals
- Implement peak hold (optional)
- Display Output
- 30+ FPS refresh rate
- Rainbow color gradient
- Gamma correction for LEDs
- Performance
- Total latency < 50ms (audio to display)
- No audio dropouts
- CPU usage < 80% per core
Solution Architecture
System Pipeline
| Step | Stage | Time |
|---|---|---|
| 1 | I2S/DMA Capture | 23ms |
| 2 | Window Function | 1ms |
| 3 | FFT (1024pt) | 8ms |
| 4 | Magnitude Calc | 2ms |
| 5 | Bin Mapping | 1ms |
| 6 | Smoothing Filter | 1ms |
| 7 | Color Mapping | 1ms |
| 8 | LED Output | 8ms |

Total pipeline latency: ~45ms (within the 50ms target)
Task Structure
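#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/queue.h"
#include "driver/i2s.h"
#include "esp_dsp.h"
#define FFT_SIZE 1024
// Created at startup, e.g. xQueueCreate(2, FFT_SIZE * sizeof(int16_t));
// each queue item is one full buffer of samples
extern QueueHandle_t audio_queue;
extern float hann_window[FFT_SIZE];
// Core 0: Audio capture
void audio_capture_task(void* param) {
    static int16_t samples[FFT_SIZE]; // static: keeps 2 KB off the task stack
    size_t bytes_read;
    while (1) {
        // Blocks until the DMA buffers hold a full block of samples
        i2s_read(I2S_NUM_0, samples, sizeof(samples), &bytes_read, portMAX_DELAY);
        // Copy into the queue for the FFT task; with a timeout of 0 the
        // new buffer is dropped if the queue is full
        xQueueSend(audio_queue, samples, 0);
    }
}
// Core 1: FFT processing and display
void fft_process_task(void* param) {
    static int16_t samples[FFT_SIZE];
    // dsps_fft2r_fc32 works in place on interleaved complex data: re, im, re, im, ...
    static float fft_data[2 * FFT_SIZE];
    float magnitudes[8]; // 8 frequency bands
    // Build the FFT coefficient tables once before the first transform
    ESP_ERROR_CHECK(dsps_fft2r_init_fc32(NULL, CONFIG_DSP_MAX_FFT_SIZE));
    while (1) {
        // Wait for audio data
        xQueueReceive(audio_queue, samples, portMAX_DELAY);
        // Convert to float, apply window, and zero the imaginary parts
        for (int i = 0; i < FFT_SIZE; i++) {
            fft_data[2 * i] = samples[i] * hann_window[i];
            fft_data[2 * i + 1] = 0.0f;
        }
        // Compute FFT, then undo the bit-reversed output order
        dsps_fft2r_fc32(fft_data, FFT_SIZE);
        dsps_bit_rev_fc32(fft_data, FFT_SIZE);
        // Calculate magnitudes for 8 bands
        calculate_band_magnitudes(fft_data, magnitudes);
        // Update LED display
        update_leds(magnitudes);
    }
}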
Frequency Band Mapping
Human hearing is logarithmic. We use logarithmic spacing for natural-looking response:
FFT bin → Frequency → Display Bar
| Bin Range | Frequency Range | Bar | Description |
|---|---|---|---|
| 1-2 | 43-86 Hz | 0 | Sub-bass |
| 3-5 | 86-215 Hz | 1 | Bass (kick drum) |
| 6-12 | 215-516 Hz | 2 | Low-mid (bass guitar) |
| 13-25 | 516-1075 Hz | 3 | Mid (vocals fundamental) |
| 26-50 | 1075-2150 Hz | 4 | Upper-mid (presence) |
| 51-100 | 2150-4300 Hz | 5 | High-mid (consonants) |
| 101-200 | 4300-8600 Hz | 6 | High (sibilance) |
| 201-400 | 8600-17200 Hz | 7 | Air (sparkle) |
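One way to collapse the bins into the eight bars is to average the bin magnitudes inside each range from the table. The sketch below assumes an array of per-bin magnitudes has already been computed; `band_range_t` and `average_bands` are hypothetical names, and averaging is only one option (taking the maximum per band is another).

```c
#include <stdint.h>

#define NUM_BANDS 8

typedef struct {
    uint16_t bin_start;   /* first bin in the band (inclusive) */
    uint16_t bin_end;     /* last bin in the band (inclusive) */
} band_range_t;

/* Bin ranges copied from the mapping table. */
static const band_range_t BAND_RANGES[NUM_BANDS] = {
    {1, 2}, {3, 5}, {6, 12}, {13, 25},
    {26, 50}, {51, 100}, {101, 200}, {201, 400},
};

/* bin_mag must hold at least 401 per-bin magnitudes. */
void average_bands(const float *bin_mag, float *band_out)
{
    for (int b = 0; b < NUM_BANDS; b++) {
        float sum = 0.0f;
        for (uint16_t k = BAND_RANGES[b].bin_start; k <= BAND_RANGES[b].bin_end; k++) {
            sum += bin_mag[k];
        }
        band_out[b] = sum / (float)(BAND_RANGES[b].bin_end - BAND_RANGES[b].bin_start + 1);
    }
}
```

Because the upper bands span many more bins than the lower ones, averaging keeps a single strong tone from dominating a wide band.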
Key Data Structures
// Pre-computed Hann window coefficients
float hann_window[1024]; // Computed once at startup
// FFT band configuration
typedef struct {
uint16_t bin_start;
uint16_t bin_end;
float smoothing; // 0.0 = instant, 0.9 = very smooth
float peak; // Peak hold value
} fft_band_t;
fft_band_t bands[8] = {
{1, 2, 0.7, 0},
{3, 5, 0.7, 0},
// ... etc
};
// LED frame buffer
typedef struct {
uint8_t r, g, b;
} rgb_t;
rgb_t led_buffer[256]; // 8x32 matrix
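The `smoothing` and `peak` fields suggest an exponential moving average plus a decaying peak marker. A minimal sketch of that update step follows; `band_state_t`, `band_update`, and the linear `peak_decay` parameter are assumptions for illustration rather than the project's actual code.

```c
typedef struct {
    float level;   /* smoothed bar level */
    float peak;    /* peak-hold marker */
} band_state_t;

/* smoothing: 0.0 = follow the input instantly, 0.9 = very smooth.
 * peak_decay: amount the peak marker falls per frame. */
void band_update(band_state_t *s, float input, float smoothing, float peak_decay)
{
    /* Exponential moving average of the band level */
    s->level = smoothing * s->level + (1.0f - smoothing) * input;

    /* Let the peak fall, but never below the current level */
    s->peak -= peak_decay;
    if (s->peak < s->level) {
        s->peak = s->level;
    }
}
```

Calling this once per frame for each band gives bars that rise and fall smoothly while the peak marker lingers briefly above them.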
Phased Implementation Guide
Phase 1: Audio Capture (Day 1-3)
Goal: See audio waveform in serial plotter
- Configure I2S
- Set up INMP441 microphone
- 44.1kHz, 16-bit, mono
- Verify with serial output
- Visualize Raw Samples
- Print samples to serial
- Use Arduino Serial Plotter
- Speak/clap → see waveform
- Verify DMA Operation
- Check no buffer overruns
- Measure timing consistency
- Confirm continuous capture
Checkpoint: Serial plotter shows clean audio waveform when you speak
Phase 2: FFT Implementation (Day 4-7)
Goal: See frequency spectrum in serial
- Compute Basic FFT
- Use ESP-DSP library
- 1024-point FFT
- Print raw bin values
- Add Windowing
- Pre-compute Hann coefficients
- Apply before FFT
- Compare with/without (reduced leakage)
- Calculate Magnitudes
- sqrt(real² + imag²)
- Convert to dB: 20*log10(mag)
- Print 8-band summary
- Test with Tone Generator
- Play 440Hz tone from phone
- Should peak in bin ~10 (440/43)
- Verify frequency accuracy
Checkpoint: 1kHz tone shows its peak near bin 23 (1000/43)
Phase 3: Dual-Core Pipeline (Day 8-10)
Goal: Parallel audio capture and FFT processing
- Create Task Structure
- Audio capture on Core 0
- FFT processing on Core 1
- Queue for sample transfer
- Measure Performance
- Time each stage
- FFT should complete before next buffer
- Check for queue overflows
- Handle Edge Cases
- Queue full → drop oldest buffer
- FFT too slow → reduce size or optimize
- Monitor heap usage
Checkpoint: Continuous processing at ~43 buffers per second (1024 samples / 44100 Hz ≈ 23ms per buffer)
Phase 4: LED Display (Day 11-14)
Goal: Visualization on LED matrix
- Configure RMT for WS2812B
- Set timing parameters
- Test with solid color
- Verify all 256 LEDs work
- Implement Bar Graph
- Map magnitude to bar height
- Apply logarithmic scaling
- Add color gradient (rainbow or single-color)
- Add Visual Polish
- Smoothing filter (exponential moving average)
- Peak hold with decay
- Gamma correction for LEDs
Checkpoint: Dancing spectrum display responding to music
Phase 5: Optimization (Day 15-21)
Goal: Smooth, responsive, efficient
- Profile Performance
- Measure total pipeline latency
- Identify bottlenecks
- Target <50ms total delay
- Optimize FFT
- Use ESP-DSP optimized functions
- Consider fixed-point math
- Benchmark alternatives
- Reduce Memory Usage
- Static allocation where possible
- Share buffers carefully
- Monitor for leaks
- Add Features
- Multiple visualization modes
- Sensitivity adjustment
- Beat detection (bonus)
Testing Strategy
Unit Tests
| Component | Test | Expected Result |
|---|---|---|
| I2S | Read 1024 samples | Non-zero values |
| FFT | 440Hz input | Peak at bin 10 ± 1 |
| FFT | White noise | Flat spectrum |
| Window | Apply Hann | Edge samples = 0 |
| LED | Set color | Correct color displayed |
Performance Tests
| Metric | Target | How to Measure |
|---|---|---|
| FFT time | < 15ms | esp_timer_get_time() |
| Display update | < 10ms | Timer around LED write |
| Total latency | < 50ms | Clap test (visual delay) |
| Frame rate | > 30 FPS | Count frames per second |
| CPU usage | < 80% | vTaskGetRunTimeStats() |
Audio Quality Tests
- Frequency Accuracy
- Play known frequencies (100Hz, 1kHz, 10kHz)
- Verify correct bars light up
- Dynamic Range
- Whisper → quiet bars
- Loud music → full bars
- No clipping at max volume
- Response Time
- Sharp transients (clap)
- Should appear within 2 frames (~66ms)
Common Pitfalls and Debugging
Audio Issues
Problem: No audio input
- Check I2S pin connections
- Verify microphone power (3.3V)
- L/R pin determines channel (try toggling)
Problem: Audio is distorted
- Check for clipping (samples at ±32767)
- Reduce gain or add attenuation
- Verify sample rate matches microphone
Problem: High-frequency noise
- Add decoupling capacitor near mic (0.1 µF)
- Use shielded wires
- Check for WiFi interference (disable WiFi if not needed)
FFT Issues
Problem: Spectrum looks wrong
- Verify windowing is applied
- Check FFT size matches sample count
- Ensure correct bin-to-frequency mapping
Problem: All frequencies show same level
- Check for DC offset (subtract average)
- Verify magnitude calculation (sqrt(real² + imag²))
- Window might not be applied
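Subtracting the average, as suggested for DC offset, is a two-pass loop over the buffer. This sketch uses a hypothetical function name and converts to float on the way out, since the samples are converted to float before the FFT anyway.

```c
#include <stddef.h>
#include <stdint.h>

/* Remove the DC offset from a block of samples by subtracting its mean. */
void remove_dc_offset(const int16_t *in, float *out, size_t n)
{
    if (n == 0) {
        return;
    }
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += in[i];
    }
    float mean = (float)sum / (float)n;
    for (size_t i = 0; i < n; i++) {
        out[i] = (float)in[i] - mean;
    }
}
```

Without this step a constant offset shows up as a large value in bin 0 and, after windowing, leaks into the lowest bins.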
Display Issues
Problem: LEDs flicker
- Check power supply (5V 3A is a minimum for 256 LEDs and assumes limited brightness; at full white each LED can draw up to about 60 mA)
- Add capacitor across power rails
- Reduce brightness if power limited
Problem: Colors are wrong
- WS2812B is GRB, not RGB
- Check color order in library
- Verify gamma correction
Problem: Only some LEDs work
- Check data line connections
- Level shifter may be needed (3.3V → 5V)
- Test with fewer LEDs first
Extensions and Challenges
Beginner Extensions
- Multiple Display Modes
- Bar graph, waterfall, oscilloscope
- Button to cycle through modes
- Color Themes
- Rainbow, fire, ocean, custom
- Store preference in NVS
Intermediate Challenges
- Beat Detection
- Detect kick drum hits
- Flash LEDs on beat
- Calculate BPM
- OLED Display
- Alternative to LED matrix
- Higher resolution spectrum
- Show peak frequency and dB level
Advanced Challenges
- Stereo Analysis
- Two microphones
- Compare left/right channels
- Visualize stereo field
- Wireless Audio
- Bluetooth A2DP sink
- Analyze streamed audio
- No microphone needed
- Machine Learning
- Train classifier on ESP32
- Detect music vs speech
- Identify specific songs
Real-World Connections
Commercial Products
| Product | Your Project Skill |
|---|---|
| Equalizer apps | FFT analysis, visualization |
| Guitar tuners | Frequency detection |
| Smart speakers | Audio processing, DSP |
| Music visualizers | Real-time graphics |
Industry Applications
- Audio Engineering: Spectrum analyzers, room correction
- Voice Assistants: Preprocessing before speech recognition
- Musical Instruments: Electronic effects, synthesizers
- Environmental Monitoring: Sound level meters, noise detection
Resources
Official Documentation
| Resource | URL |
|---|---|
| ESP-IDF I2S | docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/i2s.html |
| ESP-DSP Library | github.com/espressif/esp-dsp |
| WS2812B Datasheet | cdn-shop.adafruit.com/datasheets/WS2812B.pdf |
Books
| Book | Author | Relevant Chapters |
|---|---|---|
| DSP Guide | Steven W. Smith | Ch. 8-12: FFT (free online) |
| Making Embedded Systems | Elecia White | Ch. 6, 8: DMA, Multitasking |
| Mastering FreeRTOS | FreeRTOS.org | Tasks, Queues (free PDF) |
Online Resources
| Resource | Description |
|---|---|
| DSPGuide.com | Free complete DSP textbook |
| INMP441 Hookup Guide | SparkFun tutorial |
| FastLED Library | Arduino LED library |
Self-Assessment Checklist
Fundamentals
- I can explain the Nyquist theorem
- I understand what FFT computes and why it's fast
- I can describe why windowing reduces spectral leakage
- I know how DMA enables zero-copy audio capture
Implementation
- Audio waveform is clean in serial plotter
- Known frequencies map to correct FFT bins
- Display achieves 30+ FPS
- Latency is imperceptible (<100ms)
Code Quality
- No audio dropouts during operation
- Memory usage is stable over time
- Both cores have headroom (<80% usage)
- Display looks smooth and responsive
Interview Preparation
Be ready to answer these questions:
- "Explain how FFT converts time-domain audio to frequency-domain spectrum."
- Decomposes signal into constituent frequencies, O(n log n), complex output, magnitude = energy
- "Why must sample rate be at least 2× the highest frequency?"
- Nyquist theorem, aliasing if violated, frequencies fold back
- "How does I2S DMA work and why is it necessary?"
- DMA moves data without CPU, prevents sample drops, ping-pong buffering
- "How do you divide work between ESP32's two cores?"
- Pin tasks with xTaskCreatePinnedToCore, queues for communication
- "What is spectral leakage and how does windowing fix it?"
- Discontinuity at buffer edges causes energy spread, window tapers edges to zero
- "How do you map 512 FFT bins to 8 display bars?"
- Logarithmic frequency spacing (humans hear logarithmically), bin averaging
Next Project: P05-ota-smart-home-hub.md - OTA-Updatable Smart Home Hub