LEARN VBAN PROTOCOL
Learn VBAN Protocol: From Packets to Production Audio Streaming
Goal: Deeply understand the VBAN (VB-Audio Network) protocol—from raw UDP packets to building complete audio streaming systems, embedded implementations, and network audio applications.
Why Learn VBAN?
VBAN is a lightweight, open protocol for streaming audio (and MIDI/text/serial data) over standard IP networks. Unlike professional protocols like Dante or AES67, VBAN is:
- Simple: 28-byte header + PCM data over UDP
- Open: Free specification, no licensing fees
- Practical: Works on any network, no special hardware
- Extensible: Sub-protocols for audio, MIDI, text, and services
Understanding VBAN teaches you:
- Network audio fundamentals - How digital audio travels over IP
- UDP programming - Real-time, low-latency network protocols
- Binary protocols - Parsing and constructing packet formats
- Audio processing - PCM formats, sample rates, channel configurations
- Embedded systems - Implementing protocols on microcontrollers
- Real-time systems - Handling jitter, latency, and synchronization
After completing these projects, you will:
- Understand every byte in a VBAN packet
- Build your own audio streaming tools
- Create embedded audio devices (ESP32, Arduino)
- Implement network audio monitoring and routing
- Appreciate the trade-offs between VBAN and professional protocols
Core Concept Analysis
What is VBAN?
┌─────────────────────────────────────────────────────────────────────────────┐
│ VBAN OVERVIEW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ VBAN = VB-Audio Network Protocol │
│ │
│ Created by: Vincent Burel (VB-Audio Software) │
│ First released: June 2015 (with Voicemeeter) │
│ License: Open/Free for any use │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VBAN CHARACTERISTICS │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ • Transport: UDP (User Datagram Protocol) │ │
│ │ • Latency: Low (no TCP handshaking) │ │
│ │ • Reliability: None (no retransmission, no ACKs) │ │
│ │ • Model: Broadcast (sender=master, receiver=slave) │ │
│ │ • Sync: None (receiver adapts to sender's clock) │ │
│ │ • Format: Native PCM (uncompressed audio) │ │
│ │ │ │
│ │ Max Channels: 256 (commonly 1-8) │ │
│ │ Max Sample Rate: 705,600 Hz │ │
│ │ Bit Depths: 8, 10, 12, 16, 24, 32-bit int, 32/64-bit float │ │
│ │ Default Port: 6980 │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Sub-protocols: │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ AUDIO │ │ SERIAL │ │ TEXT │ │ SERVICE │ │
│ │ (0x00) │ │ (0x20) │ │ (0x40) │ │ (0x60) │ │
│ │ │ │ (MIDI) │ │ (Remote) │ │ (Ping) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
VBAN vs Professional Audio Networking
| Aspect | VBAN | Dante | AES67 | AVB |
|---|---|---|---|---|
| Cost | Free | Licensing fees | Free standard | Free standard |
| Hardware | Any network | Any network | Any network | AVB switches required |
| Sync | None (slave adapts) | PTP (precise) | PTPv2 | gPTP |
| Reliability | Best effort (UDP) | Redundant networks | Redundant | QoS guaranteed |
| Latency | ~5-20ms typical | <1ms possible | <1ms possible | <2ms guaranteed |
| Discovery | Manual/Ping | Automatic | Manual | Automatic |
| Channels | 256 | 1024+ | Unlimited | 8 per stream |
| Use case | Home/Semi-pro | Professional | Professional | Professional |
VBAN Packet Structure
┌─────────────────────────────────────────────────────────────────────────────┐
│ VBAN PACKET STRUCTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ UDP Packet (max 1464 bytes for VBAN) │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VBAN HEADER (28 bytes) │ AUDIO DATA (1-1436 bytes) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VBAN HEADER LAYOUT (28 bytes) │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Offset Size Field Description │ │
│ │ ────────────────────────────────────────────────────────────── │ │
│ │ 0 4 vban Magic: 'V','B','A','N' (0x4E414256) │ │
│ │ 4 1 format_SR Sample Rate Index + Protocol │ │
│ │ 5 1 format_nbs Samples per frame - 1 (0-255) │ │
│ │ 6 1 format_nbc Channels - 1 (0-255) │ │
│ │ 7 1 format_bit Bit format + Codec │ │
│ │ 8 16 streamname Stream name (ASCII, null-padded) │ │
│ │ 24 4 nuFrame Frame counter (32-bit, little-end) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ format_SR byte layout: │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Bits 7-5: Sub-Protocol Bits 4-0: Sample Rate Index │ │
│ │ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │ │
│ │ │ P │ P │ P │ SR│ SR│ SR│ SR│ SR│ │ │
│ │ └───┴───┴───┴───┴───┴───┴───┴───┘ │ │
│ │ Protocol: 000=Audio, 001=Serial, 010=Text, 011=Service │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ format_bit byte layout: │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Bits 7-4: Codec Type Bit 3: Reserved (0) Bits 2-0: Bit Resolution │ │
│ │ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │ │
│ │ │ C │ C │ C │ C │ R │ BR│ BR│ BR│ │ │
│ │ └───┴───┴───┴───┴───┴───┴───┴───┘ │ │
│ │ Codec: 0=PCM, 1=VBCA (compressed), 2=VBCV (voice)... │ │
│ │ BitRes: 0=U8, 1=S16, 2=S24, 3=S32, 4=F32, 5=F64, 6=12bit, 7=10bit │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
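To make the layout above concrete, here is a minimal sketch that packs one audio header by hand and prints the resulting bytes. The function name `build_audio_header` and the example values (stream name "Stream1", frame 123456) are illustrative choices, not part of the specification.

```python
import struct

def build_audio_header(sr_index, samples, channels, bit_res, name, frame):
    """Pack a 28-byte VBAN header for the audio sub-protocol with the PCM codec."""
    format_sr = (0b000 << 5) | (sr_index & 0x1F)  # protocol in bits 7-5, rate index in bits 4-0
    format_bit = (0x0 << 4) | (bit_res & 0x07)    # codec in bits 7-4, bit resolution in the low bits
    stream = name.encode("ascii")[:16].ljust(16, b"\x00")
    return struct.pack("<4sBBBB16sI", b"VBAN", format_sr,
                       samples - 1, channels - 1, format_bit, stream, frame)

header = build_audio_header(sr_index=3, samples=256, channels=2,
                            bit_res=1, name="Stream1", frame=123456)
print(len(header))           # 28
print(header[:8].hex(" "))   # 56 42 41 4e 03 ff 01 01
print(header[24:].hex(" "))  # 40 e2 01 00 (123456 in little-endian order)
```

Reading the second line of output byte by byte: the magic "VBAN", sample rate index 3 (48000 Hz), 256 - 1 samples, 2 - 1 channels, and bit resolution 1 (S16).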
Sample Rate Index Table
┌─────────────────────────────────────────────────────────────────────────────┐
│ VBAN SAMPLE RATE INDEX TABLE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Index Sample Rate Common Use │
│ ───────────────────────────────────────────────────────────────────────── │
│ 0 6,000 Hz Low quality voice │
│ 1 12,000 Hz Voice │
│ 2 24,000 Hz Voice/AM radio │
│ 3 48,000 Hz ★ Professional audio, video │
│ 4 96,000 Hz High-resolution audio │
│ 5 192,000 Hz Studio master │
│ 6 384,000 Hz Ultra-high resolution │
│ 7 8,000 Hz Telephone │
│ 8 16,000 Hz Wideband voice │
│ 9 32,000 Hz FM broadcast │
│ 10 64,000 Hz - │
│ 11 128,000 Hz - │
│ 12 256,000 Hz - │
│ 13 512,000 Hz - │
│ 14 11,025 Hz Low-quality audio │
│ 15 22,050 Hz Half CD quality │
│ 16 44,100 Hz ★ CD quality │
│ 17 88,200 Hz 2x CD │
│ 18 176,400 Hz 4x CD │
│ 19 352,800 Hz 8x CD (DSD conversion) │
│ 20 705,600 Hz 16x CD │
│ │
│ ★ = Most commonly used │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
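The index order looks arbitrary until you notice that the table is three families of seven rates, each family doubling from a base of 6000, 8000, or 11025 Hz. The sketch below regenerates the table from that observation; `BASE_RATES` and `SR_INDEX` are names chosen here for illustration.

```python
BASE_RATES = (6000, 8000, 11025)

# Seven doublings per base rate, in the same order as the table above
VBAN_SR_LIST = [base << octave for base in BASE_RATES for octave in range(7)]

# Reverse lookup: sample rate -> index
SR_INDEX = {rate: index for index, rate in enumerate(VBAN_SR_LIST)}

for index, rate in enumerate(VBAN_SR_LIST):
    print(f"{index:2d}  {rate:>7,} Hz")
```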
Data Flow: Sender to Receiver
┌─────────────────────────────────────────────────────────────────────────────┐
│ VBAN DATA FLOW │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ SENDER (Master) RECEIVER (Slave) │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ Audio Source │ │ Audio Output │ │
│ │ (Microphone, DAW, │ │ (Speakers, DAW, │ │
│ │ System audio) │ │ Recording) │ │
│ └──────────┬──────────┘ └──────────▲──────────┘ │
│ │ │ │
│ ▼ │ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ PCM Buffer │ │ Jitter Buffer │ │
│ │ (256 samples max │ │ (Compensates for │ │
│ │ per VBAN packet) │ │ network variance) │ │
│ └──────────┬──────────┘ └──────────▲──────────┘ │
│ │ │ │
│ ▼ │ │
│ ┌─────────────────────┐ ┌─────────────────────┐ │
│ │ VBAN Packet │ │ VBAN Parser │ │
│ │ Assembly │ │ • Validate header │ │
│ │ • Add 28-byte hdr │ │ • Check stream name│ │
│ │ • Increment nuFrame│ │ • Extract config │ │
│ └──────────┬──────────┘ └──────────▲──────────┘ │
│ │ │ │
│ ▼ │ │
│ ┌─────────────────────┐ UDP/IP ┌─────────────────────┐ │
│ │ UDP Socket │ ═══════════════► │ UDP Socket │ │
│ │ sendto(IP:6980) │ │ bind(0.0.0.0:6980) │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TIMING CONSIDERATIONS │ │
│ ├─────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ 48kHz, 256 samples/packet = 5.33ms per packet │ │
│ │ 48kHz, 128 samples/packet = 2.67ms per packet │ │
│ │ 44.1kHz, 256 samples/packet = 5.80ms per packet │ │
│ │ │ │
│ │ Network latency (LAN): typically < 1ms │ │
│ │ Total end-to-end: 10-30ms typical (including buffers) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
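The timing figures in the box follow directly from the sample rate and the number of samples per packet. A short sketch of that arithmetic, with function names chosen here for illustration:

```python
def packet_duration_ms(sample_rate, samples_per_packet):
    """Audio time carried by one packet, in milliseconds."""
    return 1000.0 * samples_per_packet / sample_rate

def packets_per_second(sample_rate, samples_per_packet):
    """How many packets the sender must emit per second to keep up."""
    return sample_rate / samples_per_packet

for rate, samples in [(48000, 256), (48000, 128), (44100, 256)]:
    print(f"{rate} Hz, {samples} samples/packet: "
          f"{packet_duration_ms(rate, samples):.2f} ms per packet, "
          f"{packets_per_second(rate, samples):.1f} packets/s")
```

Halving the samples per packet halves the per-packet latency but doubles the packet rate, which is the main trade-off when choosing a packet size.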
Protocol Constants
// VBAN Header Size
#define VBAN_HEADER_SIZE 28 // 4 + 1 + 1 + 1 + 1 + 16 + 4
// Maximum sizes
#define VBAN_DATA_MAX_SIZE 1436 // VBAN_PACKET_MAX_SIZE - VBAN_HEADER_SIZE
#define VBAN_PACKET_MAX_SIZE 1464 // Header + data; fits a 1500-byte MTU after IP (20) and UDP (8) headers
#define VBAN_SAMPLES_MAX 256 // Per packet
// Protocol types (bits 7-5 of format_SR)
#define VBAN_PROTOCOL_AUDIO 0x00 // PCM audio
#define VBAN_PROTOCOL_SERIAL 0x20 // Serial/MIDI
#define VBAN_PROTOCOL_TXT 0x40 // Text commands
#define VBAN_PROTOCOL_SERVICE 0x60 // Ping/discovery
#define VBAN_PROTOCOL_MASK 0xE0
// Sample rate mask (bits 4-0 of format_SR)
#define VBAN_SR_MASK 0x1F
#define VBAN_SR_MAXNUMBER 21
// Data types (bits 2-0 of format_bit; bit 3 is reserved)
#define VBAN_DATATYPE_U8 0x00 // Unsigned 8-bit
#define VBAN_DATATYPE_S16 0x01 // Signed 16-bit
#define VBAN_DATATYPE_S24 0x02 // Signed 24-bit
#define VBAN_DATATYPE_S32 0x03 // Signed 32-bit
#define VBAN_DATATYPE_F32 0x04 // Float 32-bit
#define VBAN_DATATYPE_F64 0x05 // Float 64-bit
#define VBAN_DATATYPE_12BIT 0x06 // 12-bit packed
#define VBAN_DATATYPE_10BIT 0x07 // 10-bit packed
#define VBAN_DATATYPE_MASK 0x07
// Codec types (bits 7-4 of format_bit)
#define VBAN_CODEC_PCM 0x00 // Native PCM
#define VBAN_CODEC_VBCA 0x10 // VB-Audio compressed
#define VBAN_CODEC_VBCV 0x20 // VB-Audio voice
#define VBAN_CODEC_MASK 0xF0
// Default port
#define VBAN_DEFAULT_PORT 6980
Project List
Projects are ordered from basic protocol understanding to complete audio systems and embedded implementations.
Project 1: VBAN Packet Analyzer (See the Wire Format)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Python
- Alternative Programming Languages: C, Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Network Protocols / Binary Parsing
- Software or Tool: Wireshark, Python
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A packet analyzer that captures VBAN packets from the network (or reads pcap files), decodes the 28-byte header, and displays stream information: sample rate, channels, bit depth, stream name, and frame counter.
Why it teaches VBAN: Before you can send or receive audio, you need to see the protocol. This project forces you to understand every byte in the header—the foundation of everything else.
Core challenges you’ll face:
- Capturing UDP packets → maps to socket programming with raw sockets or pcap
- Parsing binary headers → maps to struct unpacking, endianness
- Decoding indexed values → maps to sample rate lookup tables
- Filtering by stream name → maps to null-terminated ASCII strings
Resources for key challenges:
- VBAN Protocol Specification (PDF) - Official protocol documentation
- quiniouben/vban - Reference C implementation
- pyVBAN - Python implementation
Key Concepts:
- UDP Sockets: “The Linux Programming Interface” Chapter 57 - Michael Kerrisk
- Binary Parsing: Python struct module documentation
- Network Capture: Wireshark User Guide
Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic Python, understanding of networking basics
Real world outcome:
$ sudo python vban_analyzer.py --interface eth0
Listening for VBAN packets on eth0:6980...
[VBAN Audio] 192.168.1.100 → 192.168.1.255
Stream: "Stream1"
Protocol: AUDIO (PCM)
Sample Rate: 48000 Hz (index 3)
Channels: 2 (stereo)
Bit Depth: 16-bit signed
Samples/Frame: 256
Frame #: 123456
Payload: 1024 bytes (256 samples × 2 ch × 2 bytes)
[VBAN Audio] 192.168.1.100 → 192.168.1.255
Stream: "Stream1"
Frame #: 123457 (+1)
...
[VBAN Text] 192.168.1.50 → 192.168.1.100
Stream: "Command1"
Protocol: TEXT
Content: "Strip[0].Mute = 1"
Implementation Hints:
VBAN header structure in Python:
import struct

# VBAN Header: 28 bytes total
# 'VBAN' (4) + format_SR (1) + format_nbs (1) + format_nbc (1) +
# format_bit (1) + streamname (16) + nuFrame (4)
VBAN_HEADER_FORMAT = '<4sBBBB16sI'  # Little-endian
VBAN_HEADER_SIZE = 28

# Sample rate lookup table (21 entries)
VBAN_SR_LIST = [
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600
]

def parse_vban_header(data):
    if len(data) < VBAN_HEADER_SIZE:
        return None
    vban, format_sr, format_nbs, format_nbc, format_bit, \
        streamname, nu_frame = struct.unpack(VBAN_HEADER_FORMAT,
                                             data[:VBAN_HEADER_SIZE])
    # Validate magic bytes
    if vban != b'VBAN':
        return None
    # Extract fields
    protocol = (format_sr & 0xE0) >> 5
    sr_index = format_sr & 0x1F
    sample_rate = VBAN_SR_LIST[sr_index] if sr_index < len(VBAN_SR_LIST) else 0
    samples_per_frame = format_nbs + 1
    channels = format_nbc + 1
    bit_format = format_bit & 0x07
    codec = (format_bit & 0xF0) >> 4
    stream_name = streamname.rstrip(b'\x00').decode('ascii', errors='replace')
    return {
        'protocol': protocol,
        'sample_rate': sample_rate,
        'samples': samples_per_frame,
        'channels': channels,
        'bit_format': bit_format,
        'codec': codec,
        'stream_name': stream_name,
        'frame': nu_frame
    }
UDP listener:
import socket

def listen_for_vban(port=6980):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('0.0.0.0', port))
    print(f"Listening on port {port}...")
    while True:
        data, addr = sock.recvfrom(1500)
        header = parse_vban_header(data)
        if header:
            print(f"From {addr}: {header['stream_name']} - "
                  f"{header['sample_rate']}Hz, {header['channels']}ch, "
                  f"Frame #{header['frame']}")
Questions to guide your implementation:
- What happens if packets arrive out of order? (Check frame counter)
- How do you distinguish audio from text packets? (Protocol bits)
- What’s the relationship between sample rate, channels, and packet size?
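For the last question, the relationship can be worked out numerically. The sketch below is one way to do it, assuming the 1436-byte data limit and 256-sample limit from the constants section; the helper names and the `BYTES_PER_SAMPLE` table are illustrative.

```python
VBAN_DATA_MAX_SIZE = 1436
VBAN_SAMPLES_MAX = 256

# Bytes per sample for the unpacked bit resolutions: U8, S16, S24, S32, F32, F64
BYTES_PER_SAMPLE = {0: 1, 1: 2, 2: 3, 3: 4, 4: 4, 5: 8}

def payload_bytes(samples, channels, bit_res):
    """Size of the audio payload for one packet."""
    return samples * channels * BYTES_PER_SAMPLE[bit_res]

def max_samples_per_packet(channels, bit_res):
    """Largest sample count that respects both the sample and byte limits."""
    frame_bytes = channels * BYTES_PER_SAMPLE[bit_res]
    return min(VBAN_SAMPLES_MAX, VBAN_DATA_MAX_SIZE // frame_bytes)

def payload_bitrate_bps(sample_rate, channels, bit_res):
    """Audio payload bit rate, excluding VBAN, UDP, and IP headers."""
    return sample_rate * channels * BYTES_PER_SAMPLE[bit_res] * 8

print(payload_bytes(256, 2, 1))            # 1024
print(max_samples_per_packet(8, 1))        # 89
print(payload_bitrate_bps(48000, 2, 1))    # 1536000
```

Note that the sample rate never changes the size of a packet. It only changes how often packets of that size must be sent.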
Learning milestones:
- You capture raw packets → You understand UDP socket binding
- You parse the header correctly → You understand binary formats
- You decode all field types → You understand the protocol structure
- You track frame numbers → You understand packet ordering
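Tracking frame numbers for the last milestone can be sketched as a small state machine. `FrameTracker` is a hypothetical helper, not part of any VBAN library; it treats the counter as a wrapping 32-bit value and classifies each packet as in order, arriving after a gap, or late.

```python
class FrameTracker:
    """Classify packets by comparing each frame counter with the previous one."""

    def __init__(self):
        self.last = None
        self.lost = 0
        self.late = 0

    def update(self, frame):
        if self.last is None:
            self.last = frame
            return "first"
        # Signed distance from the previous frame, modulo 2^32
        delta = (frame - self.last) & 0xFFFFFFFF
        if delta >= 0x80000000:
            delta -= 0x100000000
        if delta == 1:
            self.last = frame
            return "ok"
        if delta > 1:
            self.lost += delta - 1   # Frames skipped between last and this one
            self.last = frame
            return "gap"
        self.late += 1               # Duplicate or out-of-order packet
        return "late"

tracker = FrameTracker()
print([tracker.update(f) for f in (10, 11, 14, 12, 15)])
# ['first', 'ok', 'gap', 'late', 'ok']
print(tracker.lost, tracker.late)  # 2 1
```

In this simple version a packet that arrives late stays counted as lost, so `lost` is an upper bound when reordering occurs.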
Project 2: Simple VBAN Receiver (Play Network Audio)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Python
- Alternative Programming Languages: C, Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Audio I/O / Network Programming
- Software or Tool: Python, PyAudio or sounddevice
- Main Book: “The Audio Programming Book” by Richard Boulanger
What you’ll build: A command-line VBAN receiver that listens for a specific stream, buffers incoming audio, and plays it through your speakers in real-time.
Why it teaches VBAN: Receiving audio forces you to handle real-time constraints—buffering, jitter compensation, and sample rate matching. This is where VBAN’s “slave” model becomes real.
Core challenges you’ll face:
- Jitter buffering → maps to compensating for network timing variance
- Sample rate matching → maps to configuring audio output to match stream
- Stream filtering → maps to selecting specific stream by name/IP
- Dropout handling → maps to detecting lost packets, silence insertion
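The jitter-buffering and dropout-handling challenges above can be sketched without any audio hardware. `JitterBuffer` below is a hypothetical packet-level FIFO: it holds back playback until a target amount of audio is queued, and after an underrun it refills before releasing audio again. The parameter names and the default capacity of four times the target are assumptions made for this example.

```python
from collections import deque

class JitterBuffer:
    """Packet FIFO that waits for a prefill before releasing audio."""

    def __init__(self, target_ms, packet_ms, max_ms=None):
        self.prefill = max(1, round(target_ms / packet_ms))
        capacity = round((max_ms or 4 * target_ms) / packet_ms)
        self.queue = deque(maxlen=max(self.prefill, capacity))
        self.primed = False
        self.underruns = 0

    def push(self, packet):
        self.queue.append(packet)  # A full deque silently drops its oldest entry
        if len(self.queue) >= self.prefill:
            self.primed = True

    def pop(self):
        """Return the next packet, or None when the caller should play silence."""
        if not self.primed:
            return None
        if not self.queue:
            self.underruns += 1
            self.primed = False    # Refill to the target before playing again
            return None
        return self.queue.popleft()

jb = JitterBuffer(target_ms=16, packet_ms=5.33)
jb.push(b"p0")
print(jb.pop())   # None (still prefilling)
jb.push(b"p1")
jb.push(b"p2")
print(jb.pop())   # b'p0'
```

A larger `target_ms` absorbs more network jitter at the cost of added latency, which is the central tuning decision in a receiver.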
Key Concepts:
- Audio I/O: PyAudio or sounddevice documentation
- Ring Buffers: For audio buffering
- Jitter: “Computer Networks” Chapter 6 - Tanenbaum
Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 1, understanding of audio basics
Real world outcome:
$ python vban_receiver.py --stream "Stream1" --ip 192.168.1.100
VBAN Receiver v1.0
Listening for stream "Stream1" from 192.168.1.100 on port 6980
[Connected] Receiving: 48000 Hz, 2 channels, 16-bit
Buffer: [████████████░░░░░░░░] 60% | Latency: 15ms
Press Ctrl+C to stop...
^C
Statistics:
Packets received: 12,345
Packets lost: 3 (0.02%)
Underruns: 0
Total time: 64.2 seconds
Implementation Hints:
Audio output with sounddevice:
import sounddevice as sd
import numpy as np
from collections import deque

# parse_vban_header() and VBAN_HEADER_SIZE come from Project 1.

class VBANReceiver:
    def __init__(self, stream_name, buffer_ms=50):
        self.stream_name = stream_name
        self.buffer = deque(maxlen=100)  # Packet FIFO; oldest packet is dropped when full
        self.audio_stream = None
        self.sample_rate = None
        self.channels = None
        self.running = False
        self.buffer_target_ms = buffer_ms
        self.last_frame = -1
        self.packets_lost = 0

    def audio_callback(self, outdata, frames, time, status):
        """Called by sounddevice when it needs audio"""
        if status:
            print(f"Audio status: {status}")
        if len(self.buffer) > 0:
            # Get audio from buffer
            audio = self.buffer.popleft()
            # Ensure correct shape
            if len(audio) < frames:
                # Pad with zeros if the packet is shorter than the block
                audio = np.pad(audio, ((0, frames - len(audio)), (0, 0)))
            outdata[:] = audio[:frames]
        else:
            # Buffer underrun - output silence
            outdata.fill(0)

    def process_packet(self, data, addr):
        header = parse_vban_header(data)
        if not header:
            return
        # Only handle audio packets (sub-protocol 0) with our stream name
        if header['protocol'] != 0 or header['stream_name'] != self.stream_name:
            return
        # Check for lost packets using the 32-bit frame counter
        if self.last_frame >= 0:
            expected = (self.last_frame + 1) & 0xFFFFFFFF
            gap = (header['frame'] - expected) & 0xFFFFFFFF
            # A gap in the upper half of the range means a late packet or a
            # restarted sender, not billions of lost packets
            if gap < 0x80000000:
                self.packets_lost += gap
        self.last_frame = header['frame']
        # Convert PCM data to numpy array
        samples = self.convert_audio(data[VBAN_HEADER_SIZE:], header)
        if samples is None:
            return  # Unsupported sample format
        # Initialize audio stream on first packet
        if self.audio_stream is None:
            self.sample_rate = header['sample_rate']
            self.channels = header['channels']
            self.start_audio()
        self.buffer.append(samples)

    def convert_audio(self, data, header):
        """Convert little-endian PCM bytes to float32, shape (samples, channels)"""
        channels = header['channels']
        fmt = header['bit_format']
        dtype_map = {0: '<u1', 1: '<i2', 3: '<i4', 4: '<f4', 5: '<f8'}
        if fmt == 2:
            # S24 has no numpy dtype: join 3 bytes per sample, then sign-extend
            raw = np.frombuffer(data[:len(data) // 3 * 3], dtype=np.uint8)
            raw = raw.reshape(-1, 3).astype(np.int32)
            ints = raw[:, 0] | (raw[:, 1] << 8) | (raw[:, 2] << 16)
            audio = ((ints ^ 0x800000) - 0x800000) / 8388608.0
        elif fmt in dtype_map:
            dtype = np.dtype(dtype_map[fmt])
            usable = len(data) // dtype.itemsize * dtype.itemsize
            audio = np.frombuffer(data[:usable], dtype=dtype)
            if dtype.kind == 'u':
                # U8 is assumed to be offset binary centered on 128
                audio = (audio.astype(np.float32) - 128.0) / 128.0
            elif dtype.kind == 'i':
                # Scale signed integers by 2^(bits - 1)
                audio = audio.astype(np.float32) / float(1 << (8 * dtype.itemsize - 1))
        else:
            return None  # 10-bit and 12-bit packed formats are not handled here
        # Drop any partial frame, then reshape to (samples, channels)
        usable = len(audio) // channels * channels
        return audio[:usable].reshape(-1, channels).astype(np.float32)

    def start_audio(self):
        self.audio_stream = sd.OutputStream(
            samplerate=self.sample_rate,
            channels=self.channels,
            callback=self.audio_callback,
            blocksize=256
        )
        self.audio_stream.start()
Main receive loop:
import socket

def main():
    receiver = VBANReceiver(stream_name="Stream1")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', 6980))
    print("Listening for VBAN stream...")
    try:
        while True:
            data, addr = sock.recvfrom(1500)
            receiver.process_packet(data, addr)
    except KeyboardInterrupt:
        print("\nStopping...")
    finally:
        sock.close()
Learning milestones:
- Audio plays (with glitches) → You understand the basic pipeline
- Buffering reduces glitches → You understand jitter compensation
- Lost packets detected → You understand frame counter usage
- Clean playback achieved → You’ve built a working receiver
Project 3: VBAN Emitter (Send Audio to the Network)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Python
- Alternative Programming Languages: C, Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Audio Capture / Network Programming
- Software or Tool: Python, PyAudio or sounddevice
- Main Book: “The Audio Programming Book” by Richard Boulanger
What you’ll build: A command-line VBAN emitter that captures audio from your microphone or system audio and streams it over the network.
Why it teaches VBAN: Being the sender (master) teaches you about timing—you control the clock. You’ll understand packet pacing and why the sender drives everything.
Core challenges you’ll face:
- Audio capture → maps to reading from input devices
- Packet timing → maps to sending at consistent intervals
- Header construction → maps to building valid VBAN packets
- Broadcast vs unicast → maps to choosing destination addressing
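Packet timing matters most when the audio does not come from a sound card, for example when streaming a file, because nothing else paces the sender. The sketch below computes every deadline from the start time rather than adding a fixed sleep after each send, so scheduling errors do not accumulate. `packet_deadlines` and `paced_send` are hypothetical helpers; the `clock` and `sleep` parameters exist so the logic can be exercised without real delays.

```python
import time

def packet_deadlines(start, sample_rate, samples_per_packet):
    """Yield the absolute send time, in seconds, of packet 0, 1, 2, ..."""
    n = 0
    while True:
        yield start + n * samples_per_packet / sample_rate
        n += 1

def paced_send(send, packets, sample_rate, samples_per_packet,
               clock=time.monotonic, sleep=time.sleep):
    """Call send(packet) for each packet at its scheduled time."""
    deadlines = packet_deadlines(clock(), sample_rate, samples_per_packet)
    for packet, deadline in zip(packets, deadlines):
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)
        send(packet)  # If we are already late, send immediately

# Example: three dummy packets at 48 kHz, 256 samples each (about 5.33 ms apart)
paced_send(lambda p: print(f"{time.monotonic():.4f} sent {len(p)} bytes"),
           [b"\x00" * 1052] * 3, 48000, 256)
```

When capturing live audio with a callback, the sound card's clock already delivers blocks at the right rate, so no extra pacing is needed.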
Key Concepts:
- Audio Capture: Platform audio APIs
- Packet Pacing: Timing audio transmission
- UDP Broadcasting: Sending to multiple receivers
Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 2, audio capture basics
Real world outcome:
$ python vban_emitter.py --stream "MyStream" --dest 192.168.1.255 \
--device "Microphone" --rate 48000 --channels 2
VBAN Emitter v1.0
Source: Microphone (Built-in)
Destination: 192.168.1.255:6980 (broadcast)
Stream: "MyStream"
Format: 48000 Hz, 2 channels, 16-bit PCM
Streaming... [████████████████████] 100% CPU: 2%
Packets sent: 15,234 | Bytes: 23.4 MB | Time: 81.2s
Press Ctrl+C to stop...
Implementation Hints:
VBAN packet construction:
import socket
import struct
import numpy as np

# VBAN_SR_LIST comes from Project 1.

class VBANEmitter:
    def __init__(self, stream_name, dest_ip, dest_port=6980,
                 sample_rate=48000, channels=2, bit_depth=16):
        if bit_depth != 16:
            raise ValueError("This sketch only implements 16-bit PCM")
        self.stream_name = stream_name
        self.dest = (dest_ip, dest_port)
        self.sample_rate = sample_rate
        self.channels = channels
        self.bit_depth = bit_depth
        self.frame_counter = 0
        # Find sample rate index (raises ValueError for rates outside the table)
        self.sr_index = VBAN_SR_LIST.index(sample_rate)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Allow sending to a broadcast address such as 192.168.1.255
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

    def build_header(self, samples_per_frame):
        """Construct 28-byte VBAN header"""
        # format_SR: protocol (3 bits) + sample rate index (5 bits)
        format_sr = (0x00 << 5) | (self.sr_index & 0x1F)  # Audio protocol
        # format_nbs: samples per frame - 1
        format_nbs = samples_per_frame - 1
        # format_nbc: channels - 1
        format_nbc = self.channels - 1
        # format_bit: codec (bits 7-4) + bit resolution (bits 2-0)
        bit_format = 1  # S16
        format_bit = (0x00 << 4) | (bit_format & 0x07)  # PCM codec
        # Stream name (16 bytes, null-padded)
        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')
        header = struct.pack('<4sBBBB16sI',
                             b'VBAN',
                             format_sr,
                             format_nbs,
                             format_nbc,
                             format_bit,
                             stream_bytes,
                             self.frame_counter)
        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF
        return header

    def send_audio(self, audio_data):
        """Send audio samples as VBAN packet"""
        # audio_data should be a float numpy array of shape (samples, channels)
        samples_per_frame = len(audio_data)
        if not 1 <= samples_per_frame <= 256:
            raise ValueError("A VBAN packet carries 1 to 256 samples per channel")
        # Clip to [-1.0, 1.0] so loud input cannot wrap around, then convert
        # to little-endian signed 16-bit
        pcm_data = (np.clip(audio_data, -1.0, 1.0) * 32767).astype('<i2').tobytes()
        if len(pcm_data) > 1436:
            raise ValueError("Payload exceeds the 1436-byte VBAN data limit")
        # Build packet
        header = self.build_header(samples_per_frame)
        packet = header + pcm_data
        # Send
        self.sock.sendto(packet, self.dest)
Audio capture and streaming:
import sounddevice as sd

def audio_input_callback(indata, frames, time, status):
    """Called when audio input is available"""
    if status:
        print(f"Input status: {status}")
    # indata is numpy array of shape (frames, channels)
    emitter.send_audio(indata.copy())

def start_streaming():
    global emitter
    emitter = VBANEmitter(
        stream_name="MyStream",
        dest_ip="192.168.1.255",
        sample_rate=48000,
        channels=2
    )
    # Start audio input stream
    with sd.InputStream(
        samplerate=48000,
        channels=2,
        callback=audio_input_callback,
        blocksize=256  # 256 samples = one VBAN packet
    ):
        print("Streaming... Press Ctrl+C to stop")
        while True:
            sd.sleep(1000)
Learning milestones:
- Packets are sent → You understand header construction
- Voicemeeter receives them → Your packets are valid
- Audio plays correctly → Timing and format are correct
- Works across network → Broadcast/unicast works
Project 4: VBAN Text Protocol (Remote Control)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Python
- Alternative Programming Languages: C, Go, JavaScript
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Remote Control / Command Protocols
- Software or Tool: Voicemeeter, Python
- Main Book: “Design Patterns” by Gang of Four (Command pattern)
What you’ll build: A tool that sends VBAN text commands to control Voicemeeter remotely—muting channels, adjusting volumes, changing settings.
Why it teaches VBAN sub-protocols: VBAN isn’t just audio. The TEXT sub-protocol shows how the same packet format extends to different data types. You’ll learn protocol design flexibility.
Core challenges you’ll face:
- Text encoding → maps to UTF-8 in VBAN packets
- Command syntax → maps to Voicemeeter’s control language
- Bidirectional communication → maps to sending commands, receiving state
- Reliable delivery → maps to UDP unreliability, retries
Key Concepts:
- VBAN Text Protocol: VB-Audio documentation
- Command Pattern: Remote control design
- UTF-8 Encoding: Text in binary protocols
Difficulty: Intermediate
Time estimate: Weekend
Prerequisites: Project 1
Real world outcome:
# Mute input strip 0
$ vban_sendtext --ip 192.168.1.100 "Strip[0].Mute = 1"
Sent: Strip[0].Mute = 1
# Set volume of bus 0
$ vban_sendtext --ip 192.168.1.100 "Bus[0].Gain = -6.0"
Sent: Bus[0].Gain = -6.0
# Interactive mode
$ vban_sendtext --ip 192.168.1.100 --interactive
VBAN Text Console (connected to 192.168.1.100)
> Strip[0].Mute = 0
Sent.
> Strip[0].Gain = -3.0
Sent.
> quit
Implementation Hints:
VBAN text packet construction:
import socket
import struct

class VBANText:
    def __init__(self, dest_ip, dest_port=6980, stream_name="Command1"):
        self.dest = (dest_ip, dest_port)
        self.stream_name = stream_name
        self.frame_counter = 0
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_text(self, text):
        """Send text command via VBAN"""
        # format_SR: TEXT sub-protocol (0x40) in bits 7-5. The low 5 bits hold
        # a bit-rate (bps) index, left at 0 here. The text encoding is NOT
        # carried in this byte; it goes in format_bit below.
        format_sr = 0x40 | 0x00
        # format_nbs, format_nbc not used for text (set to 0)
        format_nbs = 0
        format_nbc = 0
        # format_bit: text encoding in the high nibble
        # 0x00=ASCII, 0x10=UTF8, 0x20=WCHAR
        format_bit = 0x10  # UTF8
        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')
        header = struct.pack('<4sBBBB16sI',
                             b'VBAN',
                             format_sr,
                             format_nbs,
                             format_nbc,
                             format_bit,
                             stream_bytes,
                             self.frame_counter)
        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF
        # Text payload (UTF-8 encoded)
        payload = text.encode('utf-8')
        packet = header + payload
        self.sock.sendto(packet, self.dest)
        return True
Voicemeeter command examples:
# Common Voicemeeter commands
COMMANDS = {
    # Strips (inputs)
    'mute_strip': 'Strip[{n}].Mute = {v}',
    'strip_gain': 'Strip[{n}].Gain = {v}',
    'strip_solo': 'Strip[{n}].Solo = {v}',
    # Buses (outputs)
    'mute_bus': 'Bus[{n}].Mute = {v}',
    'bus_gain': 'Bus[{n}].Gain = {v}',
    # Routing: pass the full bus label as {bus}, e.g. "A1"-"A5" or "B1"-"B3"
    'strip_to_bus': 'Strip[{strip}].{bus} = {v}',
    # Recorder
    'record': 'Recorder.Record = {v}',
    'stop': 'Recorder.Stop = {v}',
}

def mute_strip(vban_text, strip_number, muted=True):
    cmd = COMMANDS['mute_strip'].format(n=strip_number, v=1 if muted else 0)
    vban_text.send_text(cmd)
Learning milestones:
- Text packets construct → You understand the TEXT sub-protocol
- Voicemeeter responds → Your commands are valid
- Complex commands work → You understand the control syntax
- You build an interface → You’ve created a remote control
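The receiving side of the TEXT sub-protocol is useful for testing your sender without Voicemeeter, and for the "Content:" line in the Project 1 analyzer. The sketch below decodes a text packet using the encoding values from the hints above (0x00 ASCII, 0x10 UTF8, 0x20 WCHAR in the high nibble of format_bit). `decode_text_packet` is a hypothetical helper, and treating WCHAR as UTF-16LE is an assumption based on Windows conventions.

```python
import struct

# High nibble of format_bit -> Python codec name (WCHAR as UTF-16LE is assumed)
TEXT_CODECS = {0x00: "ascii", 0x10: "utf-8", 0x20: "utf-16-le"}

def decode_text_packet(packet):
    """Return (stream_name, text) for a VBAN TEXT packet, else None."""
    if len(packet) < 28 or packet[:4] != b"VBAN":
        return None
    format_sr, _nbs, _nbc, format_bit = struct.unpack_from("<BBBB", packet, 4)
    if format_sr & 0xE0 != 0x40:       # Not the TEXT sub-protocol
        return None
    codec = TEXT_CODECS.get(format_bit & 0xF0)
    if codec is None:
        return None
    name = packet[8:24].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    text = packet[28:].decode(codec, errors="replace").rstrip("\x00")
    return name, text

# Build a packet the same way the sender does, then decode it
header = struct.pack("<4sBBBB16sI", b"VBAN", 0x40, 0, 0, 0x10,
                     b"Command1".ljust(16, b"\x00"), 0)
print(decode_text_packet(header + "Strip[0].Mute = 1".encode("utf-8")))
# ('Command1', 'Strip[0].Mute = 1')
```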
Project 5: VBAN Serial/MIDI (Musical Control)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Python
- Alternative Programming Languages: C, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: MIDI Protocol / Serial Communication
- Software or Tool: Python, mido (MIDI library)
- Main Book: “MIDI Manual” by David Miles Huber
What you’ll build: A MIDI-over-VBAN bridge that sends and receives MIDI messages, enabling control of DAWs and synthesizers over the network.
Why it teaches VBAN extensions: The SERIAL sub-protocol shows VBAN’s versatility. You’ll learn how binary protocols can carry different payloads and how MIDI integrates with network transport.
Core challenges you’ll face:
- MIDI message format → maps to note on/off, CC, pitch bend
- Serial header configuration → maps to baud rate in SR field
- Timing accuracy → maps to MIDI timing requirements
- Multiple messages per packet → maps to efficient MIDI bundling
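Bundling several MIDI messages into one packet means the receiver has to split the payload again. The sketch below does this for channel voice messages only, using the status byte to determine each message's length. It is a simplified illustration: `split_midi` is a hypothetical helper, and it skips system messages (status 0xF0 and above) and does not handle running status.

```python
# Number of data bytes that follow each channel voice status, keyed by high nibble
DATA_BYTES = {0x80: 2, 0x90: 2, 0xA0: 2, 0xB0: 2, 0xC0: 1, 0xD0: 1, 0xE0: 2}

def split_midi(payload):
    """Split a concatenated MIDI payload into individual channel voice messages."""
    messages = []
    i = 0
    while i < len(payload):
        count = DATA_BYTES.get(payload[i] & 0xF0)
        if count is None:
            i += 1              # System message or stray data byte: skip it
            continue
        end = i + 1 + count
        if end > len(payload):
            break               # Truncated final message
        messages.append(bytes(payload[i:end]))
        i = end
    return messages

payload = bytes([0x90, 60, 100,   # Note On, channel 1, middle C
                 0x80, 60, 0,     # Note Off
                 0xB0, 7, 80,     # Control Change 7 (volume)
                 0xC0, 5])        # Program Change (one data byte)
for message in split_midi(payload):
    print(message.hex(" "))
```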
Key Concepts:
- MIDI Protocol: MIDI 1.0 specification
- VBAN Serial: VB-Audio documentation
- Real-time Constraints: MIDI timing requirements
Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Project 1, MIDI basics
Real world outcome:
$ vban_midi_bridge --ip 192.168.1.100 --local-port 6980 \
--midi-in "USB MIDI Controller" --midi-out "Virtual MIDI Port"
VBAN MIDI Bridge v1.0
Local MIDI In: USB MIDI Controller
Local MIDI Out: Virtual MIDI Port
Remote: 192.168.1.100:6980
[TX] Note On: Ch1 C4 vel:100
[TX] Note Off: Ch1 C4 vel:0
[RX] CC: Ch1 CC7 val:80
[RX] Note On: Ch1 E4 vel:90
...
Implementation Hints:
VBAN Serial/MIDI header:
import socket
import struct

class VBANMidi:
    def __init__(self, dest_ip, dest_port=6980, stream_name="Midi1"):
        self.dest = (dest_ip, dest_port)
        self.stream_name = stream_name
        self.frame_counter = 0
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_midi(self, midi_messages):
        """
        Send MIDI messages via VBAN Serial protocol
        midi_messages: list of bytes objects (raw MIDI)
        """
        # format_SR: SERIAL sub-protocol (0x20) in bits 7-5. The low 5 bits are
        # an index into the spec's bps table, not a scaled baud rate. Index 0
        # is used here; look up the entry for 31250 bps (the standard MIDI
        # rate) in the specification if your receiver requires it.
        format_sr = 0x20 | 0x00
        # format_nbs: not used (0)
        format_nbs = 0
        # format_nbc: not used (0)
        format_nbc = 0
        # format_bit: MIDI type indicator
        format_bit = 0x10  # MIDI mode
        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')
        header = struct.pack('<4sBBBB16sI',
                             b'VBAN',
                             format_sr,
                             format_nbs,
                             format_nbc,
                             format_bit,
                             stream_bytes,
                             self.frame_counter)
        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF
        # MIDI payload: concatenate all messages
        payload = b''.join(midi_messages)
        packet = header + payload
        self.sock.sendto(packet, self.dest)
MIDI message construction:
def note_on(channel, note, velocity):
    """Create MIDI Note On message (channel is 0-15)"""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, velocity=0):
    """Create MIDI Note Off message"""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def control_change(channel, controller, value):
    """Create MIDI Control Change message"""
    return bytes([0xB0 | (channel & 0x0F), controller & 0x7F, value & 0x7F])

def pitch_bend(channel, value):
    """Create MIDI Pitch Bend message (value: -8192 to 8191)"""
    # Clamp first so out-of-range input cannot wrap around
    value = max(-8192, min(8191, value)) + 8192  # Convert to 0-16383
    lsb = value & 0x7F
    msb = (value >> 7) & 0x7F
    return bytes([0xE0 | (channel & 0x0F), lsb, msb])
Learning milestones:
- MIDI messages send → You understand MIDI format
- DAW receives notes → Your VBAN-MIDI bridge works
- Bidirectional works → You handle both directions
- Timing is accurate → MIDI performance is usable
Project 6: ESP32 VBAN Audio Node (Embedded Implementation)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: C++ (Arduino)
- Alternative Programming Languages: C, MicroPython
- Coolness Level: Level 5: Pure Magic
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Embedded Systems / I2S Audio
- Software or Tool: ESP32, Arduino IDE, I2S microphone/DAC
- Main Book: “Making Embedded Systems” by Elecia White
What you’ll build: An ESP32-based network audio device that can send microphone audio to the network (emitter) and/or receive audio and play it through speakers (receiver).
Why it teaches embedded VBAN: Moving from PC to microcontroller forces you to understand memory constraints, DMA, and real-time requirements. This is where VBAN becomes a hardware protocol.
Core challenges you’ll face:
- I2S audio interface → maps to hardware audio on ESP32
- WiFi UDP → maps to wireless networking on microcontrollers
- DMA buffers → maps to efficient audio transfer
- Real-time constraints → maps to timing on embedded systems
Resources for key challenges:
- ESP32-VBAN-Audio-Source - Reference implementation
- ESP32-VBAN-Network-Audio-Player - Receiver reference
- Arduino Audio Tools - Audio library with VBAN support
Key Concepts:
- I2S Protocol: ESP32 I2S documentation
- DMA: Direct Memory Access for audio
- WiFi UDP: ESP32 networking
Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Projects 1-3, Arduino basics, soldering
Real world outcome:
Hardware Setup:
┌─────────────────────────────────────────────────────────────────┐
│                      ESP32 VBAN Audio Node                      │
│                                                                 │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐                │
│   │ INMP441  │────►│  ESP32   │────►│ MAX98357 │                │
│   │ I2S Mic  │     │          │     │ I2S DAC  │                │
│   └──────────┘     │   WiFi   │     └────┬─────┘                │
│                    │    ▲     │          │                      │
│                    └────┼─────┘          ▼                      │
│                         │           ┌─────────┐                 │
│                     VBAN UDP        │ Speaker │                 │
│                         │           └─────────┘                 │
│                         ▼                                       │
│                  ┌──────────────┐                               │
│                  │ Voicemeeter  │                               │
│                  │     (PC)     │                               │
│                  └──────────────┘                               │
└─────────────────────────────────────────────────────────────────┘
Serial Output:
ESP32 VBAN Audio Node
Connecting to WiFi: MyNetwork... Connected!
IP Address: 192.168.1.42
VBAN Emitter: streaming to 192.168.1.100:6980 as "ESP32-Mic"
VBAN Receiver: listening for "Stream1" on port 6980
Audio: 44100 Hz, 16-bit mono
Running...
Implementation Hints:
ESP32 VBAN header structure:
// VBAN Header (28 bytes)
typedef struct __attribute__((packed)) {
    char     vban[4];        // "VBAN"
    uint8_t  format_SR;      // Sample rate index + protocol
    uint8_t  format_nbs;     // Samples per frame - 1
    uint8_t  format_nbc;     // Channels - 1
    uint8_t  format_bit;     // Bit format + codec
    char     streamname[16]; // Stream name
    uint32_t nuFrame;        // Frame counter
} VBANHeader;

// Sample rate lookup
const uint32_t VBAN_SRList[] = {
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600
};

// Find sample rate index
uint8_t getSRIndex(uint32_t sampleRate) {
    for (int i = 0; i < 21; i++) {
        if (VBAN_SRList[i] == sampleRate) return i;
    }
    return 3; // Default to 48000
}
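When debugging the firmware it can help to build the same 28-byte header on a PC and compare it byte for byte with what the ESP32 sends. The sketch below mirrors the C struct with Python's `struct` module. The function name `build_header` is hypothetical, and the little-endian frame counter is an assumption that matches what the ESP32 writes natively.

```python
import struct

VBAN_SR_LIST = [
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600,
]

# "<" = little-endian without padding: 4 + 1 + 1 + 1 + 1 + 16 + 4 = 28 bytes
HEADER_FORMAT = "<4sBBBB16sI"

def build_header(sample_rate, num_samples, channels, name, frame):
    """Pack a VBAN audio header for 16-bit PCM (same fields as the C struct)."""
    sr_index = VBAN_SR_LIST.index(sample_rate)  # ValueError if rate unsupported
    return struct.pack(
        HEADER_FORMAT,
        b"VBAN",
        sr_index,                    # upper 3 bits zero = audio sub-protocol
        num_samples - 1,             # stored as count - 1
        channels - 1,                # stored as count - 1
        0x01,                        # 16-bit signed PCM, as in the C sketch
        name.encode("ascii")[:16],   # struct pads short names with NUL bytes
        frame & 0xFFFFFFFF,
    )
```

For example, `build_header(44100, 256, 1, "ESP32-Mic", 0)` yields a header whose fifth byte is 16, the index of 44100 in the lookup table.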
I2S microphone input:
#include <driver/i2s.h>

#define I2S_WS  25 // Word Select (LRCK)
#define I2S_SD  33 // Serial Data
#define I2S_SCK 32 // Serial Clock

void setupI2SMic() {
    i2s_config_t i2s_config = {
        .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
        .sample_rate = 44100,
        .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
        .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
        .communication_format = I2S_COMM_FORMAT_STAND_I2S,
        .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
        .dma_buf_count = 8,
        .dma_buf_len = 256,
        .use_apll = false
    };
    i2s_pin_config_t pin_config = {
        .bck_io_num = I2S_SCK,
        .ws_io_num = I2S_WS,
        .data_out_num = I2S_PIN_NO_CHANGE,
        .data_in_num = I2S_SD
    };
    i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
    i2s_set_pin(I2S_NUM_0, &pin_config);
}
VBAN packet sending:
#include <WiFi.h>
#include <WiFiUdp.h>

WiFiUDP udp;
IPAddress destIP(192, 168, 1, 100); // Receiver address (example value; set to your Voicemeeter PC)
uint32_t frameCounter = 0;
uint8_t packet[1464]; // Max VBAN packet size (28-byte header + 1436 bytes of data)

void sendVBANPacket(int16_t* samples, int numSamples) {
    VBANHeader* header = (VBANHeader*)packet;

    // Fill header
    memcpy(header->vban, "VBAN", 4);
    header->format_SR = 0x00 | getSRIndex(44100); // Audio + 44100Hz
    header->format_nbs = numSamples - 1;
    header->format_nbc = 0;    // 1 channel
    header->format_bit = 0x01; // 16-bit signed
    strncpy(header->streamname, "ESP32-Mic", 16);
    header->nuFrame = frameCounter++;

    // Copy audio data
    memcpy(packet + 28, samples, numSamples * 2);

    // Send
    udp.beginPacket(destIP, 6980);
    udp.write(packet, 28 + numSamples * 2);
    udp.endPacket();
}

void loop() {
    int16_t samples[256];
    size_t bytesRead;

    // Block until one full packet of audio (256 samples x 2 bytes) is available
    i2s_read(I2S_NUM_0, samples, 512, &bytesRead, portMAX_DELAY);
    if (bytesRead == 512) {
        sendVBANPacket(samples, 256);
    }
}
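The numbers chosen in the sketch (44100 Hz, 256 mono 16-bit samples per packet) fix the packet size, packet rate, and bandwidth the ESP32 has to sustain over WiFi. The arithmetic can be checked with a short calculation. The function name `packet_stats` is hypothetical, and the bandwidth figure counts only the VBAN header and payload, not UDP/IP overhead.

```python
VBAN_HEADER_SIZE = 28
VBAN_MAX_PACKET = 1464   # header + at most 1436 bytes of data
VBAN_MAX_SAMPLES = 256   # format_nbs is one byte holding (count - 1)

def packet_stats(sample_rate, samples_per_packet, channels=1, bytes_per_sample=2):
    """Return size and timing figures for one VBAN audio stream."""
    payload = samples_per_packet * channels * bytes_per_sample
    size = VBAN_HEADER_SIZE + payload
    if samples_per_packet > VBAN_MAX_SAMPLES or size > VBAN_MAX_PACKET:
        raise ValueError("packet exceeds VBAN limits")
    rate = sample_rate / samples_per_packet
    return {
        "packet_bytes": size,
        "packets_per_second": rate,
        "interval_ms": 1000.0 / rate,
        "kbit_per_second": size * 8 * rate / 1000.0,
    }

stats = packet_stats(44100, 256)
print(stats)
```

With these settings each packet is 540 bytes and one is due roughly every 5.8 ms, so any WiFi stall longer than that has to be absorbed by the I2S DMA buffers.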
Learning milestones:
- WiFi connects → You understand ESP32 networking
- I2S reads audio → You understand hardware audio
- Voicemeeter receives → Your packets are valid
- Bidirectional works → You’ve built a complete audio node
Project 7: VBAN Network Scanner (Service Discovery)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust, C
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Network Discovery / Service Protocol
- Software or Tool: Python
- Main Book: “Computer Networks” by Tanenbaum
What you’ll build: A network scanner that discovers VBAN devices using the SERVICE sub-protocol, lists active streams, and monitors network audio traffic.
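Why it teaches VBAN discovery: VBAN has no automatic device discovery comparable to Dante's. The SERVICE sub-protocol provides ping-style identification rather than an announcement mechanism, so a scanner has to combine it with passive listening. Building one teaches you about service discovery patterns and how to work around protocol limitations.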
Core challenges you’ll face:
- Passive discovery → maps to listening for any VBAN packets
- Active ping → maps to SERVICE sub-protocol
- Stream enumeration → maps to tracking unique streams
- Network topology → maps to understanding broadcast domains
Key Concepts:
- Service Discovery: mDNS, broadcast patterns
- Network Scanning: Ethical considerations
- VBAN Service Protocol: Ping packets
Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 1
Real world outcome:
$ vban_scanner --timeout 10
VBAN Network Scanner v1.0
Scanning for 10 seconds...
Discovered VBAN Devices:
┌─────────────────┬─────────────┬───────┬─────────────┬─────────┐
│ IP Address      │ Stream Name │ Type  │ Format      │ Packets │
├─────────────────┼─────────────┼───────┼─────────────┼─────────┤
│ 192.168.1.100   │ Stream1     │ Audio │ 48kHz/2ch   │   1,234 │
│ 192.168.1.100   │ Stream2     │ Audio │ 44.1kHz/1ch │     567 │
│ 192.168.1.42    │ ESP32-Mic   │ Audio │ 44.1kHz/1ch │     890 │
│ 192.168.1.50    │ Command1    │ Text  │ UTF-8       │      23 │
└─────────────────┴─────────────┴───────┴─────────────┴─────────┘
Stream Details:
Stream1 @ 192.168.1.100
Duration: 10.0s
Packets: 1,234
Rate: 123.4 packets/sec
Bytes: 1.27 MB
Lost: 0 (0.00%)
Implementation Hints:
Stream tracker:
from collections import defaultdict
import time

class VBANScanner:
    def __init__(self):
        self.streams = defaultdict(lambda: {
            'first_seen': None,
            'last_seen': None,
            'packets': 0,
            'bytes': 0,
            'last_frame': -1,
            'lost': 0,
            'sample_rate': 0,
            'channels': 0,
            'protocol': 0
        })

    def process_packet(self, data, addr):
        header = parse_vban_header(data)
        if not header:
            return
        key = (addr[0], header['stream_name'])
        stream = self.streams[key]
        now = time.time()
        if stream['first_seen'] is None:
            stream['first_seen'] = now
        stream['last_seen'] = now
        stream['packets'] += 1
        stream['bytes'] += len(data)
        stream['sample_rate'] = header['sample_rate']
        stream['channels'] = header['channels']
        stream['protocol'] = header['protocol']
        # Track lost packets (frame counter wraps at 2^32)
        if stream['last_frame'] >= 0:
            expected = (stream['last_frame'] + 1) & 0xFFFFFFFF
            if header['frame'] != expected:
                lost = (header['frame'] - expected) & 0xFFFFFFFF
                if lost < 1000:  # Sanity check: ignore restarts and reordering
                    stream['lost'] += lost
        stream['last_frame'] = header['frame']

    def get_report(self):
        # Assumes header['protocol'] is a 0-3 index, not the raw header bits
        protocol_names = ['Audio', 'Serial', 'Text', 'Service']
        report = []
        for (ip, name), stream in self.streams.items():
            duration = stream['last_seen'] - stream['first_seen']
            rate = stream['packets'] / duration if duration > 0 else 0
            proto = stream['protocol']
            report.append({
                'ip': ip,
                'name': name,
                'protocol': protocol_names[proto] if 0 <= proto < 4 else f'Unknown ({proto})',
                'sample_rate': stream['sample_rate'],
                'channels': stream['channels'],
                'packets': stream['packets'],
                'bytes': stream['bytes'],
                'rate': rate,
                'lost': stream['lost'],
                'duration': duration
            })
        return report
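The tracker relies on a `parse_vban_header` helper and on something feeding it packets. If the parser from Project 1 is not at hand, a stand-in along the following lines may serve. Treat it as a sketch under assumptions: the dictionary keys are chosen to match what `VBANScanner` reads, `protocol` is returned as a 0-3 index to suit the lookup in `get_report`, and the `listen` function with its parameters is hypothetical. For non-audio sub-protocols the low five bits of the first format byte are not a sample rate, so `sample_rate` is only meaningful for audio streams.

```python
import socket
import struct
import time

VBAN_SR_LIST = [
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600,
]

def parse_vban_header(data):
    """Return a dict of header fields, or None if data is not a VBAN packet."""
    if len(data) < 28 or data[:4] != b"VBAN":
        return None
    sr_byte, nbs, nbc, bit = struct.unpack_from("<BBBB", data, 4)
    (frame,) = struct.unpack_from("<I", data, 24)
    sr_index = sr_byte & 0x1F
    return {
        "protocol": (sr_byte & 0xE0) >> 5,  # 0=Audio, 1=Serial, 2=Text, 3=Service
        "sample_rate": VBAN_SR_LIST[sr_index] if sr_index < len(VBAN_SR_LIST) else 0,
        "samples": nbs + 1,
        "channels": nbc + 1,
        "bit_format": bit,
        "stream_name": data[8:24].split(b"\x00", 1)[0].decode("ascii", "replace"),
        "frame": frame,
    }

def listen(scanner, port=6980, timeout=10.0):
    """Passively feed every UDP packet on `port` to scanner.process_packet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(0.5)  # wake up regularly so the deadline is honoured
    deadline = time.monotonic() + timeout
    try:
        while time.monotonic() < deadline:
            try:
                data, addr = sock.recvfrom(2048)
            except socket.timeout:
                continue
            scanner.process_packet(data, addr)
    finally:
        sock.close()
```

Passive listening only sees streams addressed to this host or to a broadcast address it receives, which is one reason the project pairs it with active pings.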
Learning milestones:
- You see all streams → You understand passive discovery
- Statistics are accurate → You track packets correctly
- Lost packets detected → Frame counting works
- Clean reporting → You’ve built a useful tool
Project 8: VBAN Audio Router (Multi-Stream Hub)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Python
- Alternative Programming Languages: C, Rust, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Audio Routing / Stream Processing
- Software or Tool: Python, numpy
- Main Book: “Designing Audio Effect Plugins in C++” by Will Pirkle
What you’ll build: A software audio router that receives multiple VBAN streams, mixes/routes them to different destinations, and optionally applies processing (gain, mixing).
Why it teaches advanced VBAN: Real audio systems need routing. Building a router teaches you about multiple streams, mixing, and the complexity of multi-source audio.
Core challenges you’ll face:
- Multiple stream handling → maps to concurrent reception
- Sample rate conversion → maps to matching different streams
- Audio mixing → maps to combining streams without clipping
- Routing matrix → maps to flexible input→output mapping
Key Concepts:
- Audio Mixing: Summing, gain staging
- Sample Rate Conversion: Resampling algorithms
- Thread Safety: Concurrent audio processing
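To make the sample rate conversion challenge concrete, here is a deliberately simple linear-interpolation resampler. It is a dependency-free sketch, not a production algorithm: it applies no anti-alias filtering and treats each block in isolation, so a real router would want to carry state across blocks or use a dedicated resampling library. The function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono block of floats by linear interpolation."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for n in range(out_len):
        pos = n * step
        i = min(int(pos), last)
        frac = pos - i
        a = samples[i]
        b = samples[i + 1] if i < last else samples[i]  # hold the final sample
        out.append(a + (b - a) * frac)
    return out

# A 10 ms block at 44.1 kHz becomes a 10 ms block at 48 kHz.
block = [0.0] * 441
print(len(resample_linear(block, 44100, 48000)))
```

Converting every input to the router's output rate before summing keeps the mixing stage itself rate-agnostic.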
Difficulty: Expert
Time estimate: 3-4 weeks
Prerequisites: Projects 1-3, audio processing basics
Real world outcome:
$ vban_router --config router.yaml
VBAN Audio Router v1.0
Inputs:
  [1] 192.168.1.100:Stream1   → 48kHz/2ch    [████████░░]  -6dB
  [2] 192.168.1.42:ESP32-Mic  → 44.1kHz/1ch  [██████████]   0dB
  [3] 192.168.1.50:Music      → 48kHz/2ch    [████░░░░░░] -12dB
Outputs:
  [A] 192.168.1.200:6980 "RouterMix" ← [1,2,3]
  [B] 192.168.1.201:6980 "VoiceOnly" ← [1,2]
Routing Matrix:
          Out A   Out B
  In 1     [X]     [X]    (gain: 0dB)
  In 2     [X]     [X]    (gain: +3dB)
  In 3     [X]     [ ]    (gain: -6dB)
Stats: CPU 12% | Latency 15ms | Packets/s: 375
Implementation Hints:
Multi-stream receiver:
import threading
import queue
import numpy as np

class VBANRouter:
    def __init__(self, config):
        self.inputs = {}   # stream_name → input config
        self.outputs = {}  # output_name → output config
        self.routing = {}  # (input, output) → gain
        self.buffers = {}  # stream_name → audio queue
        self.running = False

    def receive_thread(self, sock):
        """Receive packets and distribute to input buffers"""
        while self.running:
            data, addr = sock.recvfrom(1500)
            header = parse_vban_header(data)
            if not header:
                continue
            key = f"{addr[0]}:{header['stream_name']}"
            if key in self.inputs:
                audio = self.extract_audio(data, header)
                self.buffers[key].put((header, audio))

    def mixer_thread(self):
        """Mix inputs according to routing matrix"""
        while self.running:
            # Collect audio from all inputs
            input_audio = {}
            for name, buf in self.buffers.items():
                try:
                    header, audio = buf.get(timeout=0.01)
                    input_audio[name] = (header, audio)
                except queue.Empty:
                    pass
            # Mix for each output
            for output_name, output_config in self.outputs.items():
                mixed = self.mix_for_output(output_name, input_audio)
                if mixed is not None:
                    self.send_output(output_name, mixed)

    def mix_for_output(self, output_name, input_audio):
        """Mix all routed inputs for one output"""
        mixed = None
        for input_name, (header, audio) in input_audio.items():
            key = (input_name, output_name)
            if key in self.routing:
                gain = self.routing[key]
                # Apply gain (dB to linear)
                scaled = audio * (10 ** (gain / 20))
                # Mix
                if mixed is None:
                    mixed = scaled.copy()
                else:
                    # Handle different sample rates...
                    mixed = mixed + scaled
        # Clip to prevent overflow (assumes float audio normalized to -1.0..1.0)
        if mixed is not None:
            mixed = np.clip(mixed, -1.0, 1.0)
        return mixed
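The mixer clips to the range -1.0 to 1.0, which implies that `extract_audio` has already turned 16-bit PCM into normalized floats and that `send_output` converts back. One way those conversions and the gain arithmetic might look is sketched below in plain Python, without numpy so that it runs anywhere. All function names are hypothetical, and scaling by 32768 on decode and 32767 on encode is one common convention rather than anything the VBAN specification mandates.

```python
import struct

def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

def pcm16_to_float(payload):
    """Decode little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(payload) // 2
    ints = struct.unpack("<%dh" % count, payload[:count * 2])
    return [s / 32768.0 for s in ints]

def float_to_pcm16(samples):
    """Clip floats to [-1.0, 1.0] and encode as little-endian 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)

def mix(blocks):
    """Sum (samples, gain_db) pairs; shorter blocks are treated as zero-padded."""
    length = max((len(samples) for samples, _ in blocks), default=0)
    mixed = [0.0] * length
    for samples, gain_db in blocks:
        g = db_to_linear(gain_db)
        for i, s in enumerate(samples):
            mixed[i] += s * g
    return mixed
```

Two full-scale inputs at 0 dB sum to twice full scale, so hard clipping on output is a last resort. Lowering per-input gain (for example -6 dB each) is the usual way to leave headroom.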
Learning milestones:
- Multiple streams receive → You handle concurrent input
- Mixing works → You understand audio summing
- Routing is configurable → You built a flexible system
- No glitches → Buffering and timing work correctly
Project 9: VBAN Quality Monitor (Network Analysis)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, C
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Network Analysis / QoS Monitoring
- Software or Tool: Python, matplotlib
- Main Book: “High Performance Browser Networking” by Ilya Grigorik
What you’ll build: A monitoring tool that analyzes VBAN stream quality—measuring jitter, packet loss, latency estimation, and displaying real-time graphs.
Why it teaches network audio quality: VBAN over UDP means packets can be lost or delayed. Understanding quality metrics is essential for diagnosing audio problems.
Core challenges you’ll face:
- Jitter measurement → maps to variance in packet arrival times
- Loss detection → maps to frame counter gaps
- Latency estimation → maps to one-way delay measurement
- Real-time visualization → maps to live updating graphs
Key Concepts:
- Jitter: Network timing variance
- QoS Metrics: MOS, latency, loss
- Network Buffers: How to size them
Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Projects 1, 7
Real world outcome:
$ vban_monitor --stream "Stream1" --ip 192.168.1.100
VBAN Quality Monitor v1.0
Monitoring: Stream1 @ 192.168.1.100
Real-time Metrics (updated every 1s):
┌─────────────────────────────────────────────────────────────────┐
│ Packet Rate: 187.5 pkt/s Expected: 187.5 pkt/s OK │
│ Packet Loss: 0.02% (3 of 14,062 packets) │
│ Jitter: 1.2ms avg (0.5ms - 3.8ms range) │
│ Buffer Need: ~4ms (based on jitter) │
├─────────────────────────────────────────────────────────────────┤
│ Inter-Packet Timing (ms): │
│ 5.33 ████████████████████████████████████████████ (expected) │
│ 5.00 ██████████████████████████████████░░░░░░░░░░ │
│ 5.50 ██████████████████████████████████████░░░░░░ │
│ 6.00 ██████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │
├─────────────────────────────────────────────────────────────────┤
│ Jitter History (last 60s): │
│ 3ms ┤ * │
│ 2ms ┤ * * * * │
│ 1ms ┤ * * * * * * * * * * * * * * * │
│ 0ms ┼──────────────────────────────────────────► time │
└─────────────────────────────────────────────────────────────────┘
Implementation Hints:
Jitter calculation:
import statistics
from collections import deque

class QualityMonitor:
    def __init__(self, window_size=1000):
        self.arrival_times = deque(maxlen=window_size)
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        self.last_frame = -1
        self.packets_received = 0
        self.packets_lost = 0

    def process_packet(self, header, arrival_time):
        self.packets_received += 1
        # Track inter-packet intervals
        if self.last_arrival is not None:
            interval = (arrival_time - self.last_arrival) * 1000  # ms
            self.intervals.append(interval)
        self.last_arrival = arrival_time
        self.arrival_times.append(arrival_time)
        # Track lost packets (frame counter wraps at 2^32)
        if self.last_frame >= 0:
            expected = (self.last_frame + 1) & 0xFFFFFFFF
            if header['frame'] != expected:
                lost = (header['frame'] - expected) & 0xFFFFFFFF
                if lost < 1000:  # Sanity check: ignore restarts and reordering
                    self.packets_lost += lost
        self.last_frame = header['frame']

    def get_jitter_stats(self):
        if len(self.intervals) < 2:
            return None
        # The expected interval follows from sample rate and samples/frame,
        # e.g., 48000 Hz, 256 samples = 5.33ms
        mean = statistics.mean(self.intervals)
        stdev = statistics.stdev(self.intervals)
        min_val = min(self.intervals)
        max_val = max(self.intervals)
        # Simplified jitter: standard deviation of the inter-arrival intervals.
        # Note: this is not the RFC 3550 estimator, which uses a smoothed
        # running average of transit-time differences.
        jitter = stdev
        return {
            'mean_interval': mean,
            'jitter': jitter,
            'min': min_val,
            'max': max_val,
            'range': max_val - min_val
        }

    def get_loss_rate(self):
        total = self.packets_received + self.packets_lost
        if total == 0:
            return 0
        return self.packets_lost / total * 100

    def recommended_buffer_size(self):
        stats = self.get_jitter_stats()
        if stats is None:
            return 10  # Default 10ms
        # Buffer should cover worst-case jitter
        # Rule of thumb: 2-3x the jitter
        return stats['jitter'] * 3
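For comparison with the standard-deviation approach, the interarrival jitter estimator from RFC 3550 (section 6.4.1) is a running average updated as J += (|D| - J) / 16, where D is the change in transit time between consecutive packets. VBAN packets carry no timestamp, so the sketch below infers the sender's clock from the frame counter. That inference is an assumption of this example, and it only holds while the stream keeps a constant sample rate and packet size. The class name `Rfc3550Jitter` is hypothetical.

```python
class Rfc3550Jitter:
    """Smoothed interarrival jitter in milliseconds (RFC 3550 style)."""

    def __init__(self, sample_rate, samples_per_packet):
        # Nominal spacing between packets, e.g. 256 / 48000 s = 5.33 ms
        self.packet_ms = 1000.0 * samples_per_packet / sample_rate
        self.prev_transit = None
        self.jitter_ms = 0.0

    def update(self, frame, arrival_time):
        """Feed one packet: frame counter and arrival time in seconds."""
        send_ms = frame * self.packet_ms            # inferred sender timestamp
        transit = arrival_time * 1000.0 - send_ms   # includes an unknown constant offset
        if self.prev_transit is not None:
            d = abs(transit - self.prev_transit)    # the offset cancels here
            self.jitter_ms += (d - self.jitter_ms) / 16.0
        self.prev_transit = transit
        return self.jitter_ms
```

Because lost packets simply advance the frame counter, this estimator is not thrown off by gaps the way raw inter-arrival intervals are. A frame counter wrap or a sender restart would still produce one large spike and should be filtered in real use.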
Learning milestones:
- Jitter measured → You understand timing variance
- Loss tracked → Frame counter analysis works
- Graphs display → Real-time visualization works
- Recommendations accurate → You understand buffer sizing
Project 10: Complete VBAN Application (Full Implementation)
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Rust or C
- Alternative Programming Languages: Go, C++
- Coolness Level: Level 5: Pure Magic
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Systems Programming / Audio Engineering
- Software or Tool: Rust or C, ALSA/CoreAudio/WASAPI
- Main Book: “Programming Rust” by Blandy & Orendorff
What you’ll build: A complete VBAN implementation from scratch in a systems language—full protocol support, platform audio integration, low-latency performance, and production-quality code.
Why this is the ultimate project: You’ll implement VBAN at the same level as the official tools, understanding every optimization and design decision.
Core challenges you’ll face:
- Zero-copy packet handling → maps to memory efficiency
- Platform audio abstraction → maps to ALSA/CoreAudio/WASAPI
- Lock-free buffers → maps to real-time audio requirements
- All sub-protocols → maps to complete specification coverage
Difficulty: Expert
Time estimate: 2-3 months
Prerequisites: All previous projects, systems programming experience
Real world outcome:
$ vban --mode emitter --device "hw:0" --stream "MyStream" \
--dest 192.168.1.100 --rate 48000 --channels 2 --format s16
$ vban --mode receptor --stream "Stream1" --device "hw:0" \
--quality 3 --buffer 20
$ vban --mode text --dest 192.168.1.100 "Strip[0].Mute = 1"
$ vban --mode scan --timeout 10
Implementation Hints:
Rust VBAN types:
use std::net::UdpSocket;

const VBAN_HEADER_SIZE: usize = 28;
const VBAN_MAX_DATA_SIZE: usize = 1436;

#[repr(C, packed)]
#[derive(Clone, Copy)]
struct VBANHeader {
    vban: [u8; 4],
    format_sr: u8,
    format_nbs: u8,
    format_nbc: u8,
    format_bit: u8,
    streamname: [u8; 16],
    nu_frame: u32,
}

impl VBANHeader {
    fn new(sample_rate_index: u8, samples: u8, channels: u8,
           bit_format: u8, name: &str, frame: u32) -> Self {
        let mut streamname = [0u8; 16];
        let name_bytes = name.as_bytes();
        let len = name_bytes.len().min(16);
        streamname[..len].copy_from_slice(&name_bytes[..len]);
        Self {
            vban: *b"VBAN",
            format_sr: sample_rate_index,
            format_nbs: samples - 1,
            format_nbc: channels - 1,
            format_bit: bit_format,
            streamname,
            nu_frame: frame,
        }
    }

    fn to_bytes(&self) -> [u8; 28] {
        unsafe { std::mem::transmute(*self) }
    }

    fn from_bytes(data: &[u8]) -> Option<Self> {
        if data.len() < 28 {
            return None;
        }
        if &data[0..4] != b"VBAN" {
            return None;
        }
        let header: Self = unsafe {
            std::ptr::read(data.as_ptr() as *const Self)
        };
        Some(header)
    }
}
Lock-free ring buffer for audio:
use std::sync::atomic::{AtomicUsize, Ordering};

// Skeleton only: writing into `buffer` through `&self` requires interior
// mutability (for example wrapping the storage in `UnsafeCell`), omitted here.
struct RingBuffer<T: Copy + Default, const N: usize> {
    buffer: [T; N],
    write_pos: AtomicUsize,
    read_pos: AtomicUsize,
}

impl<T: Copy + Default, const N: usize> RingBuffer<T, N> {
    fn new() -> Self {
        Self {
            buffer: [T::default(); N],
            write_pos: AtomicUsize::new(0),
            read_pos: AtomicUsize::new(0),
        }
    }

    fn push(&self, items: &[T]) -> usize {
        let write = self.write_pos.load(Ordering::Relaxed);
        let read = self.read_pos.load(Ordering::Acquire);
        // One slot always stays empty so that full and empty are distinguishable
        let available = if write >= read {
            N - (write - read) - 1
        } else {
            read - write - 1
        };
        let to_write = items.len().min(available);
        // ... write logic with wrap-around
        self.write_pos.store((write + to_write) % N, Ordering::Release);
        to_write
    }

    fn pop(&self, out: &mut [T]) -> usize {
        // ... similar read logic
        todo!()
    }
}
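The wrap-around logic elided in the skeleton is mostly index arithmetic, which is easier to check in a single-threaded model before adding atomics. The Python sketch below models only that arithmetic. It is not lock-free and says nothing about memory ordering, and the class name `RingBufferModel` is hypothetical.

```python
class RingBufferModel:
    """Single-threaded model of a ring buffer that keeps one slot empty."""

    def __init__(self, capacity):
        self.buf = [0] * capacity
        self.n = capacity
        self.write_pos = 0
        self.read_pos = 0

    def free_space(self):
        # Same result as the two-branch computation in the Rust skeleton
        return (self.read_pos - self.write_pos - 1) % self.n

    def push(self, items):
        to_write = min(len(items), self.free_space())
        first = min(to_write, self.n - self.write_pos)        # run up to end of storage
        self.buf[self.write_pos:self.write_pos + first] = items[:first]
        self.buf[0:to_write - first] = items[first:to_write]  # wrapped remainder
        self.write_pos = (self.write_pos + to_write) % self.n
        return to_write

    def pop(self, count):
        available = (self.write_pos - self.read_pos) % self.n
        to_read = min(count, available)
        first = min(to_read, self.n - self.read_pos)
        out = self.buf[self.read_pos:self.read_pos + first] + self.buf[0:to_read - first]
        self.read_pos = (self.read_pos + to_read) % self.n
        return out
```

Each transfer splits into at most two contiguous runs, which is the same shape the Rust version would need (two `copy_from_slice` calls in place of the two slice assignments).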
Learning milestones:
- Packets send/receive → Core protocol works
- Platform audio works → System integration complete
- All sub-protocols → Full specification coverage
- Production quality → Error handling, logging, config
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Packet Analyzer | ⭐ | Weekend | ⚡⚡⚡ | 🎮🎮🎮 |
| 2. VBAN Receiver | ⭐⭐ | 1 week | ⚡⚡⚡⚡ | 🎮🎮🎮🎮 |
| 3. VBAN Emitter | ⭐⭐ | 1 week | ⚡⚡⚡⚡ | 🎮🎮🎮🎮 |
| 4. Text Protocol | ⭐⭐ | Weekend | ⚡⚡⚡ | 🎮🎮🎮 |
| 5. MIDI Bridge | ⭐⭐⭐ | 2 weeks | ⚡⚡⚡⚡ | 🎮🎮🎮🎮🎮 |
| 6. ESP32 Audio Node | ⭐⭐⭐ | 2-3 weeks | ⚡⚡⚡⚡⚡ | 🎮🎮🎮🎮🎮 |
| 7. Network Scanner | ⭐⭐ | 1 week | ⚡⚡⚡ | 🎮🎮🎮 |
| 8. Audio Router | ⭐⭐⭐⭐ | 3-4 weeks | ⚡⚡⚡⚡⚡ | 🎮🎮🎮🎮 |
| 9. Quality Monitor | ⭐⭐⭐ | 2 weeks | ⚡⚡⚡⚡ | 🎮🎮🎮🎮 |
| 10. Complete Implementation | ⭐⭐⭐⭐ | 2-3 months | ⚡⚡⚡⚡⚡ | 🎮🎮🎮🎮🎮 |
Recommended Learning Path
Your Starting Point
If you’re learning network audio from scratch: Projects 1 → 2 → 3 → 7 → 9 (Core protocol understanding)
If you’re interested in embedded/IoT audio: Projects 1 → 2 → 3 → 6 (ESP32 focus)
If you want to build tools for Voicemeeter: Projects 1 → 4 → 5 → 8 (Control and routing)
Recommended Sequence
Phase 1: Protocol Fundamentals (1-2 weeks)
├── Project 1: Packet Analyzer → See the wire format
└── Project 2: VBAN Receiver → Understand slave behavior
Phase 2: Bidirectional Communication (2-3 weeks)
├── Project 3: VBAN Emitter → Understand master behavior
└── Project 4: Text Protocol → Learn sub-protocols
Phase 3: Specialized Applications (3-5 weeks)
├── Project 5: MIDI Bridge → Musical control
├── Project 6: ESP32 Node → Embedded audio
└── Project 7: Network Scanner → Discovery
Phase 4: Advanced Systems (3-5 months, dominated by Project 10)
├── Project 8: Audio Router → Multi-stream handling
├── Project 9: Quality Monitor → Network analysis
└── Project 10: Complete Implementation → Production quality
Final Project: Networked Audio Production System
- File: LEARN_VBAN_PROTOCOL.md
- Main Programming Language: Mixed (Python, C++, Arduino)
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Complete Audio System Design
- Software or Tool: Everything from previous projects
What you’ll build: A complete networked audio production system using VBAN—multiple ESP32 microphones, a mixing/routing server, monitoring dashboards, and remote control.
System Architecture:
┌─────────────────────────────────────────────────────────────────┐
│                NETWORKED AUDIO PRODUCTION SYSTEM                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│     ┌─────────┐  ┌─────────┐  ┌─────────┐                       │
│     │  ESP32  │  │  ESP32  │  │  ESP32  │   VBAN Audio          │
│     │  Mic 1  │  │  Mic 2  │  │  Mic 3  │   ═══════►            │
│     └────┬────┘  └────┬────┘  └────┬────┘                       │
│          │            │            │                            │
│          └────────────┼────────────┘                            │
│                       │                                         │
│                       ▼                                         │
│    ┌─────────────────────────────────────┐                      │
│    │          VBAN ROUTER/MIXER          │                      │
│    │                                     │                      │
│    │  Inputs:  [Mic1] [Mic2] [Mic3]      │                      │
│    │  Outputs: [Main] [Monitor] [Record] │                      │
│    │  Effects: [EQ] [Comp] [Gate]        │                      │
│    └──────────────────┬──────────────────┘                      │
│                       │                                         │
│            ┌──────────┼──────────┐                              │
│            │          │          │                              │
│            ▼          ▼          ▼                              │
│        ┌───────┐  ┌───────┐  ┌────────┐                         │
│        │ Main  │  │Monitor│  │Recorder│                         │
│        │Output │  │  Mix  │  │        │                         │
│        └───────┘  └───────┘  └────────┘                         │
│                                                                 │
│    ┌─────────────────────────────────────┐                      │
│    │        MONITORING DASHBOARD         │                      │
│    │ [Levels] [Jitter] [Latency] [Loss]  │                      │
│    └─────────────────────────────────────┘                      │
│                                                                 │
│    ┌─────────────────────────────────────┐                      │
│    │       REMOTE CONTROL (Web UI)       │                      │
│    │ [Mute] [Gain] [Routing] [Presets]   │                      │
│    └─────────────────────────────────────┘                      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Difficulty: Master
Time estimate: 3-4 months
Prerequisites: All 10 projects completed
Essential Resources
Official Documentation
| Resource | URL | Description |
|---|---|---|
| VBAN Specification | vb-audio.com/Voicemeeter/VBANProtocol_Specifications.pdf | Official protocol spec |
| VB-Audio VBAN Page | vb-audio.com/Voicemeeter/vban.htm | Overview and downloads |
| VB-Audio Forums | forum.vb-audio.com | Developer discussions |
Open Source Implementations
| Project | Language | URL |
|---|---|---|
| vban (quiniouben) | C | github.com/quiniouben/vban |
| pyVBAN | Python | github.com/TheStaticTurtle/pyVBAN |
| ESP32-VBAN-Audio-Source | C++ | github.com/rkinnett/ESP32-VBAN-Audio-Source |
| ESP32-VBAN-Network-Audio-Player | C++ | github.com/rkinnett/ESP32-VBAN-Network-Audio-Player |
| Arduino Audio Tools | C++ | github.com/pschatzmann/arduino-audio-tools |
| vban (npm) | JavaScript | npmjs.com/package/vban |
Books
| Book | Author | Best For |
|---|---|---|
| TCP/IP Illustrated, Volume 1 | W. Richard Stevens | UDP networking fundamentals |
| The Audio Programming Book | Richard Boulanger | Audio DSP concepts |
| Making Embedded Systems | Elecia White | ESP32 development |
| Computer Networks | Andrew Tanenbaum | Network protocols |
| Programming Rust | Blandy & Orendorff | Systems implementation |
Tools
| Tool | Purpose |
|---|---|
| Voicemeeter | VBAN-compatible virtual mixer |
| Wireshark | Packet capture and analysis |
| VB-Cable | Virtual audio cables |
| VBAN Receptor/Emitter | Official VBAN apps |
Summary
| # | Project | Main Language | Knowledge Area |
|---|---|---|---|
| 1 | Packet Analyzer | Python | Network Protocols / Binary Parsing |
| 2 | VBAN Receiver | Python | Audio I/O / Network Programming |
| 3 | VBAN Emitter | Python | Audio Capture / Packet Timing |
| 4 | Text Protocol | Python | Remote Control / Commands |
| 5 | MIDI Bridge | Python | MIDI Protocol / Serial |
| 6 | ESP32 Audio Node | C++ (Arduino) | Embedded Systems / I2S Audio |
| 7 | Network Scanner | Python | Service Discovery / Network Analysis |
| 8 | Audio Router | Python | Audio Routing / Stream Processing |
| 9 | Quality Monitor | Python | QoS / Jitter Analysis |
| 10 | Complete Implementation | Rust/C | Systems Programming |
| Final | Production System | Mixed | Complete Audio System |
Getting Started Checklist
Before starting Project 1:
- Python 3.8+ installed
- Install Voicemeeter (for testing): vb-audio.com/Voicemeeter
- Read the VBAN Protocol Specification
- Install Wireshark for packet analysis
- Understand basic UDP networking
- Have a local network for testing (or use loopback)
Welcome to the world of network audio protocols! 🎧