LEARN VBAN PROTOCOL

Learn VBAN Protocol: From Packets to Production Audio Streaming

Goal: Deeply understand the VBAN (VB-Audio Network) protocol—from raw UDP packets to building complete audio streaming systems, embedded implementations, and network audio applications.

Why Learn VBAN?

VBAN is a lightweight, open protocol for streaming audio (and MIDI/text/serial data) over standard IP networks. Unlike professional protocols like Dante or AES67, VBAN is:

Simple: 28-byte header + PCM data over UDP
Open: Free specification, no licensing fees
Practical: Works on any network, no special hardware
Extensible: Sub-protocols for audio, MIDI, text, and services

Understanding VBAN teaches you:

Network audio fundamentals - How digital audio travels over IP
UDP programming - Real-time, low-latency network protocols
Binary protocols - Parsing and constructing packet formats
Audio processing - PCM formats, sample rates, channel configurations
Embedded systems - Implementing protocols on microcontrollers
Real-time systems - Handling jitter, latency, and synchronization

After completing these projects, you will:

Understand every byte in a VBAN packet
Build your own audio streaming tools
Create embedded audio devices (ESP32, Arduino)
Implement network audio monitoring and routing
Appreciate the trade-offs between VBAN and professional protocols

Core Concept Analysis

What is VBAN?

┌─────────────────────────────────────────────────────────────────────────────┐
│                              VBAN OVERVIEW                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  VBAN = VB-Audio Network Protocol                                           │
│                                                                              │
│  Created by: Vincent Burel (VB-Audio Software)                              │
│  First released: June 2015 (with Voicemeeter)                               │
│  License: Open/Free for any use                                             │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    VBAN CHARACTERISTICS                             │    │
│  ├─────────────────────────────────────────────────────────────────────┤    │
│  │                                                                     │    │
│  │  • Transport: UDP (User Datagram Protocol)                         │    │
│  │  • Latency: Low (no TCP handshaking)                               │    │
│  │  • Reliability: None (no retransmission, no ACKs)                  │    │
│  │  • Model: Broadcast (sender=master, receiver=slave)                │    │
│  │  • Sync: None (receiver adapts to sender's clock)                  │    │
│  │  • Format: Native PCM (uncompressed audio)                         │    │
│  │                                                                     │    │
│  │  Max Channels: 256 (commonly 1-8)                                  │    │
│  │  Max Sample Rate: 705,600 Hz                                        │    │
│  │  Bit Depths: 8, 10, 12, 16, 24, 32-bit int, 32/64-bit float        │    │
│  │  Default Port: 6980                                                 │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  Sub-protocols:                                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐                    │
│  │  AUDIO   │  │  SERIAL  │  │   TEXT   │  │ SERVICE  │                    │
│  │  (0x00)  │  │  (0x20)  │  │  (0x40)  │  │  (0x60)  │                    │
│  │          │  │  (MIDI)  │  │ (Remote) │  │  (Ping)  │                    │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

VBAN vs Professional Audio Networking

Aspect	VBAN	Dante	AES67	AVB
Cost	Free	Licensing fees	Free standard	Free standard
Hardware	Any network	Any network	Any network	AVB switches required
Sync	None (slave adapts)	PTP (precise)	PTPv2	gPTP
Reliability	Best effort (UDP)	Redundant networks	Redundant	QoS guaranteed
Latency	~5-20ms typical	<1ms possible	<1ms possible	<2ms guaranteed
Discovery	Manual/Ping	Automatic	Manual	Automatic
Channels	256	1024+	Unlimited	8 per stream
Use case	Home/Semi-pro	Professional	Professional	Professional

VBAN Packet Structure

┌─────────────────────────────────────────────────────────────────────────────┐
│                          VBAN PACKET STRUCTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  UDP Packet (max 1464 bytes for VBAN)                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  VBAN HEADER (28 bytes)  │  AUDIO DATA (1-1436 bytes)              │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                     VBAN HEADER LAYOUT (28 bytes)                   │    │
│  ├─────────────────────────────────────────────────────────────────────┤    │
│  │                                                                     │    │
│  │  Offset  Size   Field          Description                         │    │
│  │  ──────────────────────────────────────────────────────────────    │    │
│  │  0       4      vban           Magic: 'V','B','A','N' (0x4E414256) │    │
│  │  4       1      format_SR      Sample Rate Index + Protocol        │    │
│  │  5       1      format_nbs     Samples per frame - 1 (0-255)       │    │
│  │  6       1      format_nbc     Channels - 1 (0-255)                │    │
│  │  7       1      format_bit     Bit format + Codec                  │    │
│  │  8       16     streamname     Stream name (ASCII, null-padded)    │    │
│  │  24      4      nuFrame        Frame counter (32-bit, little-end)  │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  format_SR byte layout:                                                      │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Bits 7-5: Sub-Protocol    Bits 4-0: Sample Rate Index             │    │
│  │  ┌───┬───┬───┬───┬───┬───┬───┬───┐                                 │    │
│  │  │ P │ P │ P │ SR│ SR│ SR│ SR│ SR│                                 │    │
│  │  └───┴───┴───┴───┴───┴───┴───┴───┘                                 │    │
│  │  Protocol: 000=Audio, 001=Serial, 010=Text, 011=Service            │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  format_bit byte layout:                                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Bits 7-4: Codec Type      Bits 2-0: Bit Resolution (bit 3 rsvd)   │    │
│  │  ┌───┬───┬───┬───┬───┬───┬───┬───┐                                 │    │
│  │  │ C │ C │ C │ C │ - │ BR│ BR│ BR│                                 │    │
│  │  └───┴───┴───┴───┴───┴───┴───┴───┘                                 │    │
│  │  Codec: 0=PCM, 1=VBCA (compressed), 2=VBCV (voice)...              │    │
│  │  BitRes: 0=U8, 1=S16, 2=S24, 3=S32, 4=F32, 5=F64, 6=12bit, 7=10bit │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
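To make the layout above concrete, the following sketch packs one audio header with Python's struct module and prints its first bytes as hex. The stream name "Stream1" and the field values (48 kHz, stereo, S16, 256 samples) are illustrative assumptions, not values the protocol requires.

```python
import struct

# Illustrative values: 48 kHz (index 3), 256 samples, 2 channels, S16 PCM
format_sr = 0x00 | 3        # audio sub-protocol (bits 7-5) + sample rate index
format_nbs = 256 - 1        # samples per frame, stored minus one
format_nbc = 2 - 1          # channel count, stored minus one
format_bit = 0x00 | 0x01    # PCM codec (bits 7-4) + S16 data type
name = b"Stream1".ljust(16, b"\x00")   # null-padded to 16 bytes
frame = 0                   # first packet of the stream

header = struct.pack("<4sBBBB16sI", b"VBAN", format_sr, format_nbs,
                     format_nbc, format_bit, name, frame)

print(len(header))          # 28
print(header[:8].hex(" "))  # 56 42 41 4e 03 ff 01 01
```

Reading the first four bytes back as a little-endian 32-bit integer gives 0x4E414256, which matches the magic value listed in the header table.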

Sample Rate Index Table

┌─────────────────────────────────────────────────────────────────────────────┐
│                      VBAN SAMPLE RATE INDEX TABLE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Index    Sample Rate      Common Use                                        │
│  ─────────────────────────────────────────────────────────────────────────  │
│  0        6,000 Hz         Low quality voice                                │
│  1        12,000 Hz        Voice                                            │
│  2        24,000 Hz        Voice/AM radio                                   │
│  3        48,000 Hz        ★ Professional audio, video                      │
│  4        96,000 Hz        High-resolution audio                            │
│  5        192,000 Hz       Studio master                                    │
│  6        384,000 Hz       Ultra-high resolution                            │
│  7        8,000 Hz         Telephone                                        │
│  8        16,000 Hz        Wideband voice                                   │
│  9        32,000 Hz        FM broadcast                                     │
│  10       64,000 Hz        -                                                │
│  11       128,000 Hz       -                                                │
│  12       256,000 Hz       -                                                │
│  13       512,000 Hz       -                                                │
│  14       11,025 Hz        Low-quality audio                                │
│  15       22,050 Hz        Half CD quality                                  │
│  16       44,100 Hz        ★ CD quality                                     │
│  17       88,200 Hz        2x CD                                            │
│  18       176,400 Hz       4x CD                                            │
│  19       352,800 Hz       8x CD (DSD conversion)                           │
│  20       705,600 Hz       16x CD                                           │
│                                                                              │
│  ★ = Most commonly used                                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow: Sender to Receiver

┌─────────────────────────────────────────────────────────────────────────────┐
│                        VBAN DATA FLOW                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SENDER (Master)                          RECEIVER (Slave)                   │
│  ┌─────────────────────┐                  ┌─────────────────────┐           │
│  │  Audio Source       │                  │  Audio Output       │           │
│  │  (Microphone, DAW,  │                  │  (Speakers, DAW,    │           │
│  │   System audio)     │                  │   Recording)        │           │
│  └──────────┬──────────┘                  └──────────▲──────────┘           │
│             │                                        │                       │
│             ▼                                        │                       │
│  ┌─────────────────────┐                  ┌─────────────────────┐           │
│  │  PCM Buffer         │                  │  Jitter Buffer      │           │
│  │  (256 samples max   │                  │  (Compensates for   │           │
│  │   per VBAN packet)  │                  │   network variance) │           │
│  └──────────┬──────────┘                  └──────────▲──────────┘           │
│             │                                        │                       │
│             ▼                                        │                       │
│  ┌─────────────────────┐                  ┌─────────────────────┐           │
│  │  VBAN Packet        │                  │  VBAN Parser        │           │
│  │  Assembly           │                  │  • Validate header  │           │
│  │  • Add 28-byte hdr  │                  │  • Check stream name│           │
│  │  • Increment nuFrame│                  │  • Extract config   │           │
│  └──────────┬──────────┘                  └──────────▲──────────┘           │
│             │                                        │                       │
│             ▼                                        │                       │
│  ┌─────────────────────┐    UDP/IP        ┌─────────────────────┐           │
│  │  UDP Socket         │ ═══════════════► │  UDP Socket         │           │
│  │  sendto(IP:6980)    │                  │  bind(0.0.0.0:6980) │           │
│  └─────────────────────┘                  └─────────────────────┘           │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    TIMING CONSIDERATIONS                            │    │
│  ├─────────────────────────────────────────────────────────────────────┤    │
│  │                                                                     │    │
│  │  48kHz, 256 samples/packet = 5.33ms per packet                     │    │
│  │  48kHz, 128 samples/packet = 2.67ms per packet                     │    │
│  │  44.1kHz, 256 samples/packet = 5.80ms per packet                   │    │
│  │                                                                     │    │
│  │  Network latency (LAN): typically < 1ms                            │    │
│  │  Total end-to-end: 10-30ms typical (including buffers)             │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
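The timing figures above follow directly from the sample rate and the packet size. The sketch below reproduces that arithmetic and adds the payload-size check that limits how many samples fit in one packet. The function names are hypothetical helpers introduced for illustration.

```python
VBAN_DATA_MAX_SIZE = 1436  # maximum payload bytes per packet
VBAN_SAMPLES_MAX = 256     # maximum samples per packet

def packet_duration_ms(sample_rate, samples_per_packet):
    """Audio time carried by one packet, in milliseconds."""
    return 1000.0 * samples_per_packet / sample_rate

def payload_bytes(samples_per_packet, channels, bytes_per_sample):
    """Size of the PCM payload for one packet."""
    return samples_per_packet * channels * bytes_per_sample

def max_samples_per_packet(channels, bytes_per_sample):
    """Largest sample count that respects both the sample and byte limits."""
    by_size = VBAN_DATA_MAX_SIZE // (channels * bytes_per_sample)
    return min(VBAN_SAMPLES_MAX, by_size)

print(round(packet_duration_ms(48000, 256), 2))  # 5.33
print(payload_bytes(256, 2, 2))                  # 1024
print(max_samples_per_packet(8, 2))              # 89
```

The last line shows why multichannel streams use smaller packets: eight 16-bit channels need 16 bytes per sample frame, so only 89 frames fit in 1436 bytes.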

Protocol Constants

// VBAN Header Size
#define VBAN_HEADER_SIZE        28    // 4 + 1 + 1 + 1 + 1 + 16 + 4

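// Maximum sizes
// A 1500-byte MTU leaves 1500 - 20 (IP) - 8 (UDP) - 28 (VBAN header) = 1444 bytes; VBAN caps data at 1436
#define VBAN_DATA_MAX_SIZE      1436  // Payload bytes per packet
#define VBAN_PACKET_MAX_SIZE    1464  // Header (28) + data (1436)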
#define VBAN_SAMPLES_MAX        256   // Per packet

// Protocol types (bits 7-5 of format_SR)
#define VBAN_PROTOCOL_AUDIO     0x00  // PCM audio
#define VBAN_PROTOCOL_SERIAL    0x20  // Serial/MIDI
#define VBAN_PROTOCOL_TXT       0x40  // Text commands
#define VBAN_PROTOCOL_SERVICE   0x60  // Ping/discovery
#define VBAN_PROTOCOL_MASK      0xE0

// Sample rate mask (bits 4-0 of format_SR)
#define VBAN_SR_MASK            0x1F
#define VBAN_SR_MAXNUMBER       21

// Data types (bits 2-0 of format_bit; bit 3 is reserved)
#define VBAN_DATATYPE_U8        0x00  // Unsigned 8-bit
#define VBAN_DATATYPE_S16       0x01  // Signed 16-bit
#define VBAN_DATATYPE_S24       0x02  // Signed 24-bit
#define VBAN_DATATYPE_S32       0x03  // Signed 32-bit
#define VBAN_DATATYPE_F32       0x04  // Float 32-bit
#define VBAN_DATATYPE_F64       0x05  // Float 64-bit
#define VBAN_DATATYPE_12BIT     0x06  // 12-bit packed
#define VBAN_DATATYPE_10BIT     0x07  // 10-bit packed
#define VBAN_DATATYPE_MASK      0x07

// Codec types (bits 7-4 of format_bit)
#define VBAN_CODEC_PCM          0x00  // Native PCM
#define VBAN_CODEC_VBCA         0x10  // VB-Audio compressed
#define VBAN_CODEC_VBCV         0x20  // VB-Audio voice
#define VBAN_CODEC_MASK         0xF0

// Default port
#define VBAN_DEFAULT_PORT       6980

Project List

Projects are ordered from basic protocol understanding to complete audio systems and embedded implementations.

Project 1: VBAN Packet Analyzer (See the Wire Format)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Python
Alternative Programming Languages: C, Rust, Go
Coolness Level: Level 3: Genuinely Clever
Business Potential: 1. The “Resume Gold”
Difficulty: Level 1: Beginner
Knowledge Area: Network Protocols / Binary Parsing
Software or Tool: Wireshark, Python
Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens

What you’ll build: A packet analyzer that captures VBAN packets from the network (or reads pcap files), decodes the 28-byte header, and displays stream information: sample rate, channels, bit depth, stream name, and frame counter.

Why it teaches VBAN: Before you can send or receive audio, you need to see the protocol. This project forces you to understand every byte in the header—the foundation of everything else.

Core challenges you’ll face:

Capturing UDP packets → maps to socket programming with raw sockets or pcap
Parsing binary headers → maps to struct unpacking, endianness
Decoding indexed values → maps to sample rate lookup tables
Filtering by stream name → maps to null-terminated ASCII strings

Resources for key challenges:

VBAN Protocol Specification (PDF) - Official protocol documentation
quiniouben/vban - Reference C implementation
pyVBAN - Python implementation

Key Concepts:

UDP Sockets: “The Linux Programming Interface” Chapter 57 - Michael Kerrisk
Binary Parsing: Python struct module documentation
Network Capture: Wireshark User Guide

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic Python, understanding of networking basics

Real world outcome:

$ sudo python vban_analyzer.py --interface eth0
Listening for VBAN packets on eth0:6980...

[VBAN Audio] 192.168.1.100 → 192.168.1.255
  Stream: "Stream1"
  Protocol: AUDIO (PCM)
  Sample Rate: 48000 Hz (index 3)
  Channels: 2 (stereo)
  Bit Depth: 16-bit signed
  Samples/Frame: 256
  Frame #: 123456
  Payload: 1024 bytes (256 samples × 2 ch × 2 bytes)

[VBAN Audio] 192.168.1.100 → 192.168.1.255
  Stream: "Stream1"
  Frame #: 123457 (+1)
  ...

[VBAN Text] 192.168.1.50 → 192.168.1.100
  Stream: "Command1"
  Protocol: TEXT
  Content: "Strip[0].Mute = 1"

Implementation Hints:

VBAN header structure in Python:

import struct

# VBAN Header: 28 bytes total
# 'VBAN' (4) + format_SR (1) + format_nbs (1) + format_nbc (1) +
# format_bit (1) + streamname (16) + nuFrame (4)

VBAN_HEADER_FORMAT = '<4sBBBB16sI'  # Little-endian
VBAN_HEADER_SIZE = 28

# Sample rate lookup table (21 entries)
VBAN_SR_LIST = [
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600
]

def parse_vban_header(data):
    if len(data) < VBAN_HEADER_SIZE:
        return None

    vban, format_sr, format_nbs, format_nbc, format_bit, \
        streamname, nu_frame = struct.unpack(VBAN_HEADER_FORMAT, data[:28])

    # Validate magic bytes
    if vban != b'VBAN':
        return None

    # Extract fields
    protocol = (format_sr & 0xE0) >> 5
    sr_index = format_sr & 0x1F
    sample_rate = VBAN_SR_LIST[sr_index] if sr_index < 21 else 0
    samples_per_frame = format_nbs + 1
    channels = format_nbc + 1
    bit_format = format_bit & 0x07
    codec = (format_bit & 0xF0) >> 4
    # The name ends at the first null byte; ignore anything after it
    stream_name = streamname.split(b'\x00', 1)[0].decode('ascii', errors='replace')

    return {
        'protocol': protocol,
        'sample_rate': sample_rate,
        'samples': samples_per_frame,
        'channels': channels,
        'bit_format': bit_format,
        'codec': codec,
        'stream_name': stream_name,
        'frame': nu_frame
    }

UDP listener:

import socket

def listen_for_vban(port=6980):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('0.0.0.0', port))

    print(f"Listening on port {port}...")

    while True:
        data, addr = sock.recvfrom(1500)
        header = parse_vban_header(data)
        if header:
            print(f"From {addr}: {header['stream_name']} - "
                  f"{header['sample_rate']}Hz, {header['channels']}ch, "
                  f"Frame #{header['frame']}")

Questions to guide your implementation:

What happens if packets arrive out of order? (Check frame counter)
How do you distinguish audio from text packets? (Protocol bits)
What’s the relationship between sample rate, channels, and packet size?
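The first question can be explored with a small helper that compares each frame counter with the previous one. This is a sketch under one assumption that the protocol itself does not state: a backward jump of less than half the 32-bit range is treated as a late or duplicate packet instead of a wraparound. The name classify_frame is hypothetical.

```python
def classify_frame(last_frame, frame):
    """Compare a new 32-bit frame counter with the previous one.

    Returns ('ok', 0) for the expected next packet, ('lost', n) for a
    forward gap of n missing packets, or ('late', 0) for a duplicate or
    reordered packet.
    """
    delta = (frame - last_frame) & 0xFFFFFFFF  # modular difference
    if delta == 1:
        return ('ok', 0)
    if delta == 0 or delta >= 0x80000000:
        return ('late', 0)
    return ('lost', delta - 1)

print(classify_frame(10, 11))          # ('ok', 0)
print(classify_frame(10, 14))          # ('lost', 3)
print(classify_frame(10, 9))           # ('late', 0)
print(classify_frame(0xFFFFFFFF, 0))   # ('ok', 0)
```

Masking the difference to 32 bits means the counter rolling over from 0xFFFFFFFF to 0 is still seen as a normal step.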

Learning milestones:

You capture raw packets → You understand UDP socket binding
You parse the header correctly → You understand binary formats
You decode all field types → You understand the protocol structure
You track frame numbers → You understand packet ordering

Project 2: Simple VBAN Receiver (Play Network Audio)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Python
Alternative Programming Languages: C, Rust, Go
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 2: Intermediate
Knowledge Area: Audio I/O / Network Programming
Software or Tool: Python, PyAudio or sounddevice
Main Book: “The Audio Programming Book” by Richard Boulanger

What you’ll build: A command-line VBAN receiver that listens for a specific stream, buffers incoming audio, and plays it through your speakers in real-time.

Why it teaches VBAN: Receiving audio forces you to handle real-time constraints—buffering, jitter compensation, and sample rate matching. This is where VBAN’s “slave” model becomes real.

Core challenges you’ll face:

Jitter buffering → maps to compensating for network timing variance
Sample rate matching → maps to configuring audio output to match stream
Stream filtering → maps to selecting specific stream by name/IP
Dropout handling → maps to detecting lost packets, silence insertion
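One possible shape for the jitter buffer is a sample FIFO that holds back playback until a prefill level is reached, then serves whatever block size the audio callback asks for. The sketch below is a minimal illustration, not the approach of any particular VBAN implementation. The class name, the prefill policy, and the refill-after-underrun behavior are all assumptions.

```python
import numpy as np

class JitterBuffer:
    """Sample FIFO that waits for a prefill level before releasing audio."""

    def __init__(self, channels, prefill_frames):
        self.channels = channels
        self.prefill_frames = prefill_frames
        self.chunks = []        # queued arrays of shape (n, channels)
        self.available = 0      # total frames currently queued
        self.primed = False     # True once the prefill level was reached

    def push(self, audio):
        """Queue one decoded packet of shape (n, channels)."""
        self.chunks.append(audio)
        self.available += len(audio)
        if self.available >= self.prefill_frames:
            self.primed = True

    def pull(self, frames):
        """Return exactly `frames` frames, padded with silence on underrun."""
        out = np.zeros((frames, self.channels), dtype=np.float32)
        if not self.primed:
            return out
        filled = 0
        while filled < frames and self.chunks:
            chunk = self.chunks[0]
            take = min(frames - filled, len(chunk))
            out[filled:filled + take] = chunk[:take]
            if take == len(chunk):
                self.chunks.pop(0)
            else:
                self.chunks[0] = chunk[take:]  # keep the unread remainder
            filled += take
        self.available -= filled
        if filled < frames:
            self.primed = False  # underrun: refill before playing again
        return out
```

A larger prefill_frames value tolerates more network jitter at the cost of added latency. This sketch is not thread-safe, so a lock around push and pull would be needed when the network loop and the audio callback run on different threads.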

Key Concepts:

Audio I/O: PyAudio or sounddevice documentation
Ring Buffers: For audio buffering
Jitter: “Computer Networks” Chapter 6 - Tanenbaum

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 1, understanding of audio basics

Real world outcome:

$ python vban_receiver.py --stream "Stream1" --ip 192.168.1.100
VBAN Receiver v1.0
Listening for stream "Stream1" from 192.168.1.100 on port 6980

[Connected] Receiving: 48000 Hz, 2 channels, 16-bit
Buffer: [████████████░░░░░░░░] 60% | Latency: 15ms

Press Ctrl+C to stop...

^C
Statistics:
  Packets received: 12,345
  Packets lost: 3 (0.02%)
  Underruns: 0
  Total time: 64.2 seconds

Implementation Hints:

Audio output with sounddevice:

import socket
import numpy as np
import sounddevice as sd
from collections import deque

# parse_vban_header and VBAN_HEADER_SIZE come from Project 1

class VBANReceiver:
    def __init__(self, stream_name, buffer_ms=50):
        self.stream_name = stream_name
        self.buffer = deque(maxlen=100)  # Ring buffer
        self.audio_stream = None
        self.sample_rate = None
        self.channels = None
        self.running = False
        self.buffer_target_ms = buffer_ms
        self.last_frame = -1
        self.packets_lost = 0

    def audio_callback(self, outdata, frames, time, status):
        """Called by sounddevice when it needs audio"""
        if status:
            print(f"Audio status: {status}")

        if len(self.buffer) > 0:
            # Get audio from buffer
            audio = self.buffer.popleft()
            # Ensure correct shape
            if len(audio) < frames:
                # Pad with zeros if underrun
                audio = np.pad(audio, ((0, frames - len(audio)), (0, 0)))
            outdata[:] = audio[:frames]
        else:
            # Buffer underrun - output silence
            outdata.fill(0)

    def process_packet(self, data, addr):
        header = parse_vban_header(data)
        if not header:
            return

        # Filter by stream name
        if header['stream_name'] != self.stream_name:
            return

        # Check for lost packets (the frame counter wraps at 2^32)
        if self.last_frame >= 0:
            gap = (header['frame'] - self.last_frame) & 0xFFFFFFFF
            if gap == 0 or gap >= 0x80000000:
                return  # Duplicate or late (reordered) packet: drop it
            self.packets_lost += gap - 1
        self.last_frame = header['frame']

        # Initialize audio stream on first packet
        if self.audio_stream is None:
            self.sample_rate = header['sample_rate']
            self.channels = header['channels']
            self.start_audio()

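        # Convert PCM data to a float32 numpy array and queue it
        audio_data = data[VBAN_HEADER_SIZE:]
        samples = self.convert_audio(audio_data, header)
        if samples is not None:
            self.buffer.append(samples)

    def convert_audio(self, data, header):
        """Convert raw little-endian PCM bytes to float32, shape (samples, channels)"""
        fmt = header['bit_format']
        channels = header['channels']

        if fmt == 2:
            # S24 has no native numpy dtype: assemble three bytes per sample
            raw = np.frombuffer(data, dtype=np.uint8)
            raw = raw[:len(raw) - len(raw) % (3 * channels)]
            raw = raw.reshape(-1, 3).astype(np.int32)
            ints = raw[:, 0] | (raw[:, 1] << 8) | (raw[:, 2] << 16)
            ints -= (ints & 0x800000) << 1  # Sign-extend from 24 bits
            audio = ints.astype(np.float32) / 8388608.0
            return audio.reshape(-1, channels)

        # Little-endian dtype and full-scale divisor per data type
        dtype_map = {
            0: ('u1', 128.0),          # U8
            1: ('<i2', 32768.0),       # S16
            3: ('<i4', 2147483648.0),  # S32
            4: ('<f4', 1.0),           # F32
            5: ('<f8', 1.0),           # F64
        }
        if fmt not in dtype_map:
            return None  # 12-bit and 10-bit packed formats are not handled here

        dtype, scale = dtype_map[fmt]
        frame_bytes = np.dtype(dtype).itemsize * channels
        usable = len(data) - len(data) % frame_bytes  # Drop any partial frame
        audio = np.frombuffer(data[:usable], dtype=dtype).astype(np.float32)
        if fmt == 0:
            audio -= 128.0  # Center unsigned samples on zero
        return (audio / scale).reshape(-1, channels)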

    def start_audio(self):
        self.audio_stream = sd.OutputStream(
            samplerate=self.sample_rate,
            channels=self.channels,
            callback=self.audio_callback,
            blocksize=256
        )
        self.audio_stream.start()

Main receive loop:

def main():
    receiver = VBANReceiver(stream_name="Stream1")

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', 6980))

    print(f"Listening for VBAN stream...")

    try:
        while True:
            data, addr = sock.recvfrom(1500)
            receiver.process_packet(data, addr)
    except KeyboardInterrupt:
        print("\nStopping...")

Learning milestones:

Audio plays (with glitches) → You understand the basic pipeline
Buffering reduces glitches → You understand jitter compensation
Lost packets detected → You understand frame counter usage
Clean playback achieved → You’ve built a working receiver

Project 3: VBAN Emitter (Send Audio to the Network)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Python
Alternative Programming Languages: C, Rust, Go
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 2: Intermediate
Knowledge Area: Audio Capture / Network Programming
Software or Tool: Python, PyAudio or sounddevice
Main Book: “The Audio Programming Book” by Richard Boulanger

What you’ll build: A command-line VBAN emitter that captures audio from your microphone or system audio and streams it over the network.

Why it teaches VBAN: Being the sender (master) teaches you about timing—you control the clock. You’ll understand packet pacing and why the sender drives everything.

Core challenges you’ll face:

Audio capture → maps to reading from input devices
Packet timing → maps to sending at consistent intervals
Header construction → maps to building valid VBAN packets
Broadcast vs unicast → maps to choosing destination addressing

Key Concepts:

Audio Capture: Platform audio APIs
Packet Pacing: Timing audio transmission
UDP Broadcasting: Sending to multiple receivers
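When the source is a live input device, the audio hardware already delivers blocks at the right rate. When it is a file or a generated signal, the sender has to pace packets itself. The sketch below shows one way to do that by computing each send time from the packet index instead of sleeping a fixed interval, which avoids accumulated drift. The function names and the send callable are hypothetical.

```python
import time

def packet_deadlines(sample_rate, samples_per_packet, count, start=0.0):
    """Send times in seconds, each derived from the packet index."""
    return [start + n * samples_per_packet / sample_rate for n in range(count)]

def paced_send(packets, sample_rate, samples_per_packet, send):
    """Call send(packet) for each packet at its scheduled time."""
    start = time.monotonic()
    for n, packet in enumerate(packets):
        deadline = start + n * samples_per_packet / sample_rate
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # wait until this packet is due
        send(packet)

# Four packets of 256 samples at 48 kHz are due about 5.33 ms apart
print([round(t * 1000, 2) for t in packet_deadlines(48000, 256, 4)])
```

In a real emitter, send would wrap sock.sendto(packet, dest). The precision of time.sleep varies by operating system, so the actual spacing between packets will jitter around the computed deadlines.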

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 2, audio capture basics

Real world outcome:

$ python vban_emitter.py --stream "MyStream" --dest 192.168.1.255 \
    --device "Microphone" --rate 48000 --channels 2

VBAN Emitter v1.0
Source: Microphone (Built-in)
Destination: 192.168.1.255:6980 (broadcast)
Stream: "MyStream"
Format: 48000 Hz, 2 channels, 16-bit PCM

Streaming... [████████████████████] 100% CPU: 2%
Packets sent: 15,234 | Bytes: 16.0 MB | Time: 81.2s

Press Ctrl+C to stop...

Implementation Hints:

VBAN packet construction:

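import socket
import struct
import numpy as np
import sounddevice as sd

# VBAN_SR_LIST comes from Project 1

class VBANEmitter: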
    def __init__(self, stream_name, dest_ip, dest_port=6980,
                 sample_rate=48000, channels=2, bit_depth=16):
        self.stream_name = stream_name
        self.dest = (dest_ip, dest_port)
        self.sample_rate = sample_rate
        self.channels = channels
        self.bit_depth = bit_depth
        self.frame_counter = 0

        # Find sample rate index
        self.sr_index = VBAN_SR_LIST.index(sample_rate)

        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Enable broadcast if using broadcast address
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

    def build_header(self, samples_per_frame):
        """Construct 28-byte VBAN header"""

        # format_SR: protocol (3 bits) + sample rate index (5 bits)
        format_sr = (0x00 << 5) | (self.sr_index & 0x1F)  # Audio protocol

        # format_nbs: samples per frame - 1
        format_nbs = samples_per_frame - 1

        # format_nbc: channels - 1
        format_nbc = self.channels - 1

        # format_bit: codec (bits 7-4) + data type (bits 2-0)
        if self.bit_depth != 16:
            raise ValueError("This sketch only supports 16-bit PCM")
        bit_format = 0x01  # S16
        format_bit = (0x00 << 4) | (bit_format & 0x07)  # PCM codec

        # Stream name (16 bytes, null-padded)
        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')

        header = struct.pack('<4sBBBB16sI',
            b'VBAN',
            format_sr,
            format_nbs,
            format_nbc,
            format_bit,
            stream_bytes,
            self.frame_counter
        )

        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF
        return header

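    def send_audio(self, audio_data):
        """Send audio samples as VBAN packet"""
        # audio_data should be a float numpy array of shape (samples, channels)
        samples_per_frame = len(audio_data)
        if not 1 <= samples_per_frame <= 256:
            raise ValueError("A VBAN packet carries 1 to 256 samples")

        # Convert float samples in [-1.0, 1.0] to little-endian S16 bytes
        if self.bit_depth != 16:
            raise ValueError("This sketch only supports 16-bit PCM")
        pcm_data = (np.clip(audio_data, -1.0, 1.0) * 32767).astype('<i2').tobytes()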

        # Build packet
        header = self.build_header(samples_per_frame)
        packet = header + pcm_data

        # Send
        self.sock.sendto(packet, self.dest)

Audio capture and streaming:

def audio_input_callback(indata, frames, time, status):
    """Called when audio input is available"""
    if status:
        print(f"Input status: {status}")

    # indata is numpy array of shape (frames, channels)
    emitter.send_audio(indata.copy())

def start_streaming():
    global emitter
    emitter = VBANEmitter(
        stream_name="MyStream",
        dest_ip="192.168.1.255",
        sample_rate=48000,
        channels=2
    )

    # Start audio input stream
    with sd.InputStream(
        samplerate=48000,
        channels=2,
        callback=audio_input_callback,
        blocksize=256  # 256 samples = one VBAN packet
    ):
        print("Streaming... Press Ctrl+C to stop")
        while True:
            sd.sleep(1000)

Learning milestones:

Packets are sent → You understand header construction
Voicemeeter receives them → Your packets are valid
Audio plays correctly → Timing and format are correct
Works across network → Broadcast/unicast works

Project 4: VBAN Text Protocol (Remote Control)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Python
Alternative Programming Languages: C, Go, JavaScript
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 2: Intermediate
Knowledge Area: Remote Control / Command Protocols
Software or Tool: Voicemeeter, Python
Main Book: “Design Patterns” by Gang of Four (Command pattern)

What you’ll build: A tool that sends VBAN text commands to control Voicemeeter remotely—muting channels, adjusting volumes, changing settings.

Why it teaches VBAN sub-protocols: VBAN isn’t just audio. The TEXT sub-protocol shows how the same packet format extends to different data types. You’ll learn protocol design flexibility.

Core challenges you’ll face:

Text encoding → maps to UTF-8 in VBAN packets
Command syntax → maps to Voicemeeter’s control language
Bidirectional communication → maps to sending commands, receiving state
Reliable delivery → maps to UDP unreliability, retries

Key Concepts:

VBAN Text Protocol: VB-Audio documentation
Command Pattern: Remote control design
UTF-8 Encoding: Text in binary protocols

Difficulty: Intermediate
Time estimate: Weekend
Prerequisites: Project 1

Real world outcome:

# Mute input strip 0
$ vban_sendtext --ip 192.168.1.100 "Strip[0].Mute = 1"
Sent: Strip[0].Mute = 1

# Set volume of bus 0
$ vban_sendtext --ip 192.168.1.100 "Bus[0].Gain = -6.0"
Sent: Bus[0].Gain = -6.0

# Interactive mode
$ vban_sendtext --ip 192.168.1.100 --interactive
VBAN Text Console (connected to 192.168.1.100)
> Strip[0].Mute = 0
Sent.
> Strip[0].Gain = -3.0
Sent.
> quit

Implementation Hints:

VBAN text packet construction:

class VBANText:
    def __init__(self, dest_ip, dest_port=6980, stream_name="Command1"):
        self.dest = (dest_ip, dest_port)
        self.stream_name = stream_name
        self.frame_counter = 0
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_text(self, text):
        """Send text command via VBAN"""

        # format_SR: TEXT sub-protocol (0x40) in the top 3 bits,
        # bps index in the low 5 bits (0 = no specific bit rate)
        format_sr = 0x40 | 0x00

        # format_nbs, format_nbc not used for text (set to 0)
        format_nbs = 0
        format_nbc = 0

        # format_bit: text encoding in the upper nibble
        # 0x00 = ASCII, 0x10 = UTF8, 0x20 = WCHAR
        format_bit = 0x10  # UTF8

        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')

        header = struct.pack('<4sBBBB16sI',
            b'VBAN',
            format_sr,
            format_nbs,
            format_nbc,
            format_bit,
            stream_bytes,
            self.frame_counter
        )

        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF

        # Text payload (UTF-8 encoded)
        payload = text.encode('utf-8')

        packet = header + payload
        self.sock.sendto(packet, self.dest)

        return True
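
To check that a TEXT packet is laid out correctly without a running Voicemeeter, it helps to parse it back. The sketch below is one way to do that with the same 28-byte struct layout; `parse_text_packet` and the constant names are hypothetical helpers, and it assumes the sub-protocol lives in the top 3 bits of format_SR and that 0x10 in the upper nibble of format_bit marks UTF-8:

```python
import struct

VBAN_HEADER = struct.Struct('<4sBBBB16sI')  # 28 bytes, little-endian
PROTOCOL_MASK = 0xE0   # top 3 bits of format_SR select the sub-protocol
PROTOCOL_TEXT = 0x40


def parse_text_packet(packet: bytes):
    """Return (stream_name, text) for a VBAN TEXT packet, else None."""
    if len(packet) < VBAN_HEADER.size:
        return None
    magic, fmt_sr, _nbs, _nbc, fmt_bit, name, _frame = VBAN_HEADER.unpack_from(packet)
    if magic != b'VBAN' or (fmt_sr & PROTOCOL_MASK) != PROTOCOL_TEXT:
        return None
    # Assumption: 0x10 marks UTF-8; anything else is read as ASCII here
    encoding = 'utf-8' if (fmt_bit & 0xF0) == 0x10 else 'ascii'
    stream_name = name.split(b'\x00', 1)[0].decode('ascii', errors='replace')
    text = packet[VBAN_HEADER.size:].decode(encoding, errors='replace')
    return stream_name, text


# Round trip with a hand-built packet
header = VBAN_HEADER.pack(b'VBAN', 0x40, 0, 0, 0x10,
                          b'Command1'.ljust(16, b'\x00'), 0)
result = parse_text_packet(header + 'Strip[0].Mute = 1'.encode('utf-8'))
print(result)
```

The same function can sit behind a UDP socket bound to port 6980 to log whatever commands other tools send.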

Voicemeeter command examples:

# Common Voicemeeter commands
COMMANDS = {
    # Strips (inputs)
    'mute_strip': 'Strip[{n}].Mute = {v}',
    'strip_gain': 'Strip[{n}].Gain = {v}',
    'strip_solo': 'Strip[{n}].Solo = {v}',

    # Buses (outputs)
    'mute_bus': 'Bus[{n}].Mute = {v}',
    'bus_gain': 'Bus[{n}].Gain = {v}',

    # Routing
    'strip_to_bus': 'Strip[{strip}].{bus} = {v}',  # bus is a name: A1-A5 or B1-B3

    # Recorder
    'record': 'Recorder.Record = {v}',
    'stop': 'Recorder.Stop = {v}',
}

def mute_strip(vban_text, strip_number, muted=True):
    cmd = f"Strip[{strip_number}].Mute = {1 if muted else 0}"
    vban_text.send_text(cmd)
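
A template table like COMMANDS only pays off if something fills it in. A small sketch of that step follows; `build_command` and `build_script` are hypothetical helpers, the table is a shortened copy so the block runs on its own, and joining several requests with ';' is an assumption about Voicemeeter's script syntax that should be checked against the VB-Audio documentation:

```python
COMMAND_TEMPLATES = {
    'mute_strip': 'Strip[{n}].Mute = {v}',
    'strip_gain': 'Strip[{n}].Gain = {v}',
    'bus_gain': 'Bus[{n}].Gain = {v}',
}


def build_command(name: str, **params) -> str:
    """Fill a named template; raises KeyError for unknown names or parameters."""
    return COMMAND_TEMPLATES[name].format(**params)


def build_script(commands) -> str:
    """Join several commands into one text payload (assumed separator: ';')."""
    return '; '.join(commands)


script = build_script([
    build_command('mute_strip', n=0, v=1),
    build_command('bus_gain', n=0, v=-6.0),
])
print(script)
```

The resulting string can be passed to send_text() as a single payload, which keeps related changes in one UDP packet.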

Learning milestones:

Text packets construct → You understand the TEXT sub-protocol
Voicemeeter responds → Your commands are valid
Complex commands work → You understand the control syntax
You build an interface → You’ve created a remote control

Project 5: VBAN Serial/MIDI (Musical Control)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Python
Alternative Programming Languages: C, C++
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 3: Advanced
Knowledge Area: MIDI Protocol / Serial Communication
Software or Tool: Python, mido (MIDI library)
Main Book: “MIDI Manual” by David Miles Huber

What you’ll build: A MIDI-over-VBAN bridge that sends and receives MIDI messages, enabling control of DAWs and synthesizers over the network.

Why it teaches VBAN extensions: The SERIAL sub-protocol shows VBAN’s versatility. You’ll learn how binary protocols can carry different payloads and how MIDI integrates with network transport.

Core challenges you’ll face:

MIDI message format → maps to note on/off, CC, pitch bend
Serial header configuration → maps to baud rate in SR field
Timing accuracy → maps to MIDI timing requirements
Multiple messages per packet → maps to efficient MIDI bundling

Key Concepts:

MIDI Protocol: MIDI 1.0 specification
VBAN Serial: VB-Audio documentation
Real-time Constraints: MIDI timing requirements

Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Project 1, MIDI basics

Real world outcome:

$ vban_midi_bridge --ip 192.168.1.100 --local-port 6980 \
    --midi-in "USB MIDI Controller" --midi-out "Virtual MIDI Port"

VBAN MIDI Bridge v1.0
Local MIDI In: USB MIDI Controller
Local MIDI Out: Virtual MIDI Port
Remote: 192.168.1.100:6980

[TX] Note On:  Ch1 C4 vel:100
[TX] Note Off: Ch1 C4 vel:0
[RX] CC: Ch1 CC7 val:80
[RX] Note On: Ch1 E4 vel:90
...

Implementation Hints:

VBAN Serial/MIDI header:

class VBANMidi:
    def __init__(self, dest_ip, dest_port=6980, stream_name="Midi1"):
        self.dest = (dest_ip, dest_port)
        self.stream_name = stream_name
        self.frame_counter = 0
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_midi(self, midi_messages):
        """
        Send MIDI messages via VBAN Serial protocol
        midi_messages: list of bytes objects (raw MIDI)
        """

        # format_SR: SERIAL sub-protocol (0x20) in the top 3 bits,
        # bps index in the low 5 bits. The index refers to the bit-rate
        # table in the VBAN specification (MIDI runs at 31250 baud);
        # index 0 means no specific rate is declared.
        format_sr = 0x20 | 0x00  # SERIAL + bps index 0

        # format_nbs: not used (0)
        format_nbs = 0

        # format_nbc: not used (0)
        format_nbc = 0

        # format_bit: MIDI type indicator
        format_bit = 0x10  # MIDI mode

        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')

        header = struct.pack('<4sBBBB16sI',
            b'VBAN',
            format_sr,
            format_nbs,
            format_nbc,
            format_bit,
            stream_bytes,
            self.frame_counter
        )

        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF

        # MIDI payload: concatenate all messages
        payload = b''.join(midi_messages)

        packet = header + payload
        self.sock.sendto(packet, self.dest)

MIDI message construction:

def note_on(channel, note, velocity):
    """Create MIDI Note On message"""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, velocity=0):
    """Create MIDI Note Off message"""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def control_change(channel, controller, value):
    """Create MIDI Control Change message"""
    return bytes([0xB0 | (channel & 0x0F), controller & 0x7F, value & 0x7F])

def pitch_bend(channel, value):
    """Create MIDI Pitch Bend message (value: -8192 to 8191)"""
    value = value + 8192  # Convert to 0-16383
    lsb = value & 0x7F
    msb = (value >> 7) & 0x7F
    return bytes([0xE0 | (channel & 0x0F), lsb, msb])
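
On the receiving side the bridge gets one payload that may hold several concatenated messages, so it has to split them again before handing them to a MIDI port. A minimal sketch under simplifying assumptions: only channel voice messages are handled, with no running status and no SysEx; `split_midi` and `decode_pitch_bend` are illustrative names:

```python
# Data-byte counts for channel voice messages, keyed by the status high nibble
DATA_BYTES = {0x80: 2, 0x90: 2, 0xA0: 2, 0xB0: 2, 0xC0: 1, 0xD0: 1, 0xE0: 2}


def split_midi(payload: bytes) -> list:
    """Split concatenated channel voice messages into individual messages.

    Parsing stops at the first byte that is not a known status byte, or
    when the final message is truncated.
    """
    messages = []
    i = 0
    while i < len(payload):
        count = DATA_BYTES.get(payload[i] & 0xF0)
        if count is None or i + 1 + count > len(payload):
            break
        messages.append(payload[i:i + 1 + count])
        i += 1 + count
    return messages


def decode_pitch_bend(message: bytes) -> int:
    """Inverse of pitch_bend(): return a value from -8192 to 8191."""
    return ((message[2] << 7) | message[1]) - 8192


payload = bytes([0x90, 60, 100,      # Note On, channel 1, middle C
                 0xB0, 7, 80,        # Control Change, CC7
                 0xC0, 5,            # Program Change (two bytes only)
                 0xE0, 0x00, 0x40])  # Pitch Bend, centred
parts = split_midi(payload)
print([p.hex() for p in parts])
```

A production bridge would also need running status and SysEx handling, or could delegate parsing to a library such as mido.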

Learning milestones:

MIDI messages send → You understand MIDI format
DAW receives notes → Your VBAN-MIDI bridge works
Bidirectional works → You handle both directions
Timing is accurate → MIDI performance is usable

Project 6: ESP32 VBAN Audio Node (Embedded Implementation)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: C++ (Arduino)
Alternative Programming Languages: C, MicroPython
Coolness Level: Level 5: Pure Magic
Business Potential: 3. The “Service & Support” Model
Difficulty: Level 3: Advanced
Knowledge Area: Embedded Systems / I2S Audio
Software or Tool: ESP32, Arduino IDE, I2S microphone/DAC
Main Book: “Making Embedded Systems” by Elecia White

What you’ll build: An ESP32-based network audio device that can send microphone audio to the network (emitter) and/or receive audio and play it through speakers (receiver).

Why it teaches embedded VBAN: Moving from PC to microcontroller forces you to understand memory constraints, DMA, and real-time requirements. This is where VBAN becomes a hardware protocol.

Core challenges you’ll face:

I2S audio interface → maps to hardware audio on ESP32
WiFi UDP → maps to wireless networking on microcontrollers
DMA buffers → maps to efficient audio transfer
Real-time constraints → maps to timing on embedded systems

Resources for key challenges:

ESP32-VBAN-Audio-Source - Reference implementation
ESP32-VBAN-Network-Audio-Player - Receiver reference
Arduino Audio Tools - Audio library with VBAN support

Key Concepts:

I2S Protocol: ESP32 I2S documentation
DMA: Direct Memory Access for audio
WiFi UDP: ESP32 networking

Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Projects 1-3, Arduino basics, soldering

Real world outcome:

Hardware Setup:
┌─────────────────────────────────────────────────────────────────┐
│  ESP32 VBAN Audio Node                                          │
│                                                                  │
│  ┌──────────┐     ┌──────────┐     ┌──────────┐                │
│  │ INMP441  │────►│  ESP32   │────►│ MAX98357 │                │
│  │ I2S Mic  │     │          │     │ I2S DAC  │                │
│  └──────────┘     │   WiFi   │     └────┬─────┘                │
│                   │    ▲     │          │                       │
│                   └────┼─────┘          ▼                       │
│                        │           ┌─────────┐                  │
│                   VBAN UDP         │ Speaker │                  │
│                        │           └─────────┘                  │
│                        ▼                                        │
│                ┌──────────────┐                                 │
│                │ Voicemeeter  │                                 │
│                │    (PC)      │                                 │
│                └──────────────┘                                 │
└─────────────────────────────────────────────────────────────────┘

Serial Output:
ESP32 VBAN Audio Node
Connecting to WiFi: MyNetwork... Connected!
IP Address: 192.168.1.42
VBAN Emitter: streaming to 192.168.1.100:6980 as "ESP32-Mic"
VBAN Receiver: listening for "Stream1" on port 6980
Audio: 44100 Hz, 16-bit mono
Running...

Implementation Hints:

ESP32 VBAN header structure:

// VBAN Header (28 bytes)
typedef struct __attribute__((packed)) {
    char vban[4];           // "VBAN"
    uint8_t format_SR;      // Sample rate index + protocol
    uint8_t format_nbs;     // Samples per frame - 1
    uint8_t format_nbc;     // Channels - 1
    uint8_t format_bit;     // Bit format + codec
    char streamname[16];    // Stream name
    uint32_t nuFrame;       // Frame counter
} VBANHeader;

// Sample rate lookup
const uint32_t VBAN_SRList[] = {
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600
};

// Find sample rate index
uint8_t getSRIndex(uint32_t sampleRate) {
    for (int i = 0; i < 21; i++) {
        if (VBAN_SRList[i] == sampleRate) return i;
    }
    return 3;  // Default to 48000
}

I2S microphone input:

#include <driver/i2s.h>

#define I2S_WS  25   // Word Select (LRCK)
#define I2S_SD  33   // Serial Data
#define I2S_SCK 32   // Serial Clock

void setupI2SMic() {
    i2s_config_t i2s_config = {
        .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
        .sample_rate = 44100,
        .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
        .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
        .communication_format = I2S_COMM_FORMAT_STAND_I2S,
        .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
        .dma_buf_count = 8,
        .dma_buf_len = 256,
        .use_apll = false
    };

    i2s_pin_config_t pin_config = {
        .bck_io_num = I2S_SCK,
        .ws_io_num = I2S_WS,
        .data_out_num = I2S_PIN_NO_CHANGE,
        .data_in_num = I2S_SD
    };

    i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
    i2s_set_pin(I2S_NUM_0, &pin_config);
}

VBAN packet sending:

#include <WiFi.h>
#include <WiFiUdp.h>

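WiFiUDP udp;
IPAddress destIP(192, 168, 1, 100);  // Receiver address (example: the Voicemeeter PC)
uint32_t frameCounter = 0;
uint8_t packet[1464];  // Max VBAN packet size (28-byte header + 1436 bytes of data)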

void sendVBANPacket(int16_t* samples, int numSamples) {
    VBANHeader* header = (VBANHeader*)packet;

    // Fill header
    memcpy(header->vban, "VBAN", 4);
    header->format_SR = 0x00 | getSRIndex(44100);  // Audio + 44100Hz
    header->format_nbs = numSamples - 1;
    header->format_nbc = 0;  // 1 channel
    header->format_bit = 0x01;  // 16-bit signed
    strncpy(header->streamname, "ESP32-Mic", 16);
    header->nuFrame = frameCounter++;

    // Copy audio data
    memcpy(packet + 28, samples, numSamples * 2);

    // Send
    udp.beginPacket(destIP, 6980);
    udp.write(packet, 28 + numSamples * 2);
    udp.endPacket();
}

void loop() {
    int16_t samples[256];
    size_t bytesRead;

    i2s_read(I2S_NUM_0, samples, 512, &bytesRead, portMAX_DELAY);

    if (bytesRead == 512) {
        sendVBANPacket(samples, 256);
    }
}
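
Before flashing, it is worth checking what these numbers imply for timing and bandwidth. The short calculation below uses only the values from the code above (44100 Hz, 256 samples per packet, 16-bit mono, 8 DMA buffers of 256 samples); it runs on a PC, and the variable names are illustrative:

```python
SAMPLE_RATE = 44100        # Hz, as in the I2S config
SAMPLES_PER_PACKET = 256   # samples read per loop() iteration
BYTES_PER_SAMPLE = 2       # 16-bit mono
DMA_BUF_COUNT = 8
DMA_BUF_LEN = 256          # samples per DMA buffer
VBAN_HEADER_SIZE = 28

packet_interval_ms = SAMPLES_PER_PACKET / SAMPLE_RATE * 1000
packets_per_second = SAMPLE_RATE / SAMPLES_PER_PACKET
packet_bytes = VBAN_HEADER_SIZE + SAMPLES_PER_PACKET * BYTES_PER_SAMPLE
# VBAN-level bit rate only; UDP/IP/WiFi framing overhead is not included
bitrate_kbps = packet_bytes * 8 * packets_per_second / 1000
# Audio the I2S DMA ring can hold before i2s_read() must drain it
dma_capacity_ms = DMA_BUF_COUNT * DMA_BUF_LEN / SAMPLE_RATE * 1000

print(f"interval {packet_interval_ms:.2f} ms, {packets_per_second:.1f} pkt/s")
print(f"{packet_bytes} bytes/packet, {bitrate_kbps:.0f} kbit/s")
print(f"DMA ring holds {dma_capacity_ms:.1f} ms of audio")
```

The DMA capacity is the budget for WiFi stalls: if sending blocks for longer than the ring can hold, microphone samples are dropped.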

Learning milestones:

WiFi connects → You understand ESP32 networking
I2S reads audio → You understand hardware audio
Voicemeeter receives → Your packets are valid
Bidirectional works → You’ve built a complete audio node

Project 7: VBAN Network Scanner

What you’ll build: A network scanner that discovers VBAN devices using the SERVICE sub-protocol, lists active streams, and monitors network audio traffic.

Why it teaches VBAN discovery: VBAN has no automatic discovery or directory service (unlike Dante); the SERVICE sub-protocol only offers a ping/identification exchange. Building a scanner teaches you about service discovery patterns and how to work around protocol limitations.

Core challenges you’ll face:

Passive discovery → maps to listening for any VBAN packets
Active ping → maps to SERVICE sub-protocol
Stream enumeration → maps to tracking unique streams
Network topology → maps to understanding broadcast domains

Key Concepts:

Service Discovery: mDNS, broadcast patterns
Network Scanning: Ethical considerations
VBAN Service Protocol: Ping packets

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 1

Real world outcome:

$ vban_scanner --timeout 10
VBAN Network Scanner v1.0
Scanning for 10 seconds...

Discovered VBAN Devices:
┌───────────────────┬──────────────────┬─────────────┬──────────┬──────────┐
│ IP Address        │ Stream Name      │ Type        │ Format   │ Packets  │
├───────────────────┼──────────────────┼─────────────┼──────────┼──────────┤
│ 192.168.1.100     │ Stream1          │ Audio       │ 48kHz/2ch│ 1,234    │
│ 192.168.1.100     │ Stream2          │ Audio       │ 44.1kHz/1│ 567      │
│ 192.168.1.42      │ ESP32-Mic        │ Audio       │ 44.1kHz/1│ 890      │
│ 192.168.1.50      │ Command1         │ Text        │ UTF-8    │ 23       │
└───────────────────┴──────────────────┴─────────────┴──────────┴──────────┘

Stream Details:
  Stream1 @ 192.168.1.100
    Duration: 10.0s
    Packets: 1,234
    Rate: 123.4 packets/sec
    Bytes: 1.27 MB
    Lost: 0 (0.00%)

Implementation Hints:

Stream tracker:

from collections import defaultdict
import time

class VBANScanner:
    def __init__(self):
        self.streams = defaultdict(lambda: {
            'first_seen': None,
            'last_seen': None,
            'packets': 0,
            'bytes': 0,
            'last_frame': -1,
            'lost': 0,
            'sample_rate': 0,
            'channels': 0,
            'protocol': 0
        })

    def process_packet(self, data, addr):
        header = parse_vban_header(data)
        if not header:
            return

        key = (addr[0], header['stream_name'])
        stream = self.streams[key]

        now = time.time()
        if stream['first_seen'] is None:
            stream['first_seen'] = now
        stream['last_seen'] = now
        stream['packets'] += 1
        stream['bytes'] += len(data)
        stream['sample_rate'] = header['sample_rate']
        stream['channels'] = header['channels']
        stream['protocol'] = header['protocol']

        # Track lost packets
        if stream['last_frame'] >= 0:
            expected = (stream['last_frame'] + 1) & 0xFFFFFFFF
            if header['frame'] != expected:
                lost = (header['frame'] - expected) & 0xFFFFFFFF
                if lost < 1000:  # Sanity check
                    stream['lost'] += lost
        stream['last_frame'] = header['frame']

    def get_report(self):
        report = []
        for (ip, name), stream in self.streams.items():
            duration = stream['last_seen'] - stream['first_seen']
            rate = stream['packets'] / duration if duration > 0 else 0

            report.append({
                'ip': ip,
                'name': name,
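                # Assumes 'protocol' is the sub-protocol index (0-3); other values map to 'Unknown'
                'protocol': {0: 'Audio', 1: 'Serial', 2: 'Text', 3: 'Service'}.get(
                    stream['protocol'], 'Unknown'),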
                'sample_rate': stream['sample_rate'],
                'channels': stream['channels'],
                'packets': stream['packets'],
                'bytes': stream['bytes'],
                'rate': rate,
                'lost': stream['lost'],
                'duration': duration
            })
        return report
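
The loss tracking in process_packet relies on modular arithmetic over the 32-bit frame counter. Isolating it in a small function makes the edge cases (counter wrap-around, reordered packets) easier to see and test. This is a sketch of the same logic, with `count_lost` as an illustrative name:

```python
FRAME_MASK = 0xFFFFFFFF   # the frame counter is an unsigned 32-bit value
MAX_PLAUSIBLE_GAP = 1000  # same sanity limit as the scanner above


def count_lost(last_frame: int, frame: int) -> int:
    """Packets missing between two consecutively received frame numbers."""
    expected = (last_frame + 1) & FRAME_MASK
    gap = (frame - expected) & FRAME_MASK
    # A late (reordered) packet or a restarted stream shows up as a huge gap
    return gap if gap < MAX_PLAUSIBLE_GAP else 0


print(count_lost(10, 11))          # consecutive frames
print(count_lost(10, 13))          # frames 11 and 12 never arrived
print(count_lost(0xFFFFFFFF, 0))   # counter wrapped around
print(count_lost(13, 12))          # packet arrived out of order
```

Note that a reordered packet is simply ignored here, so a stream with heavy reordering will under-report loss slightly.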

Learning milestones:

You see all streams → You understand passive discovery
Statistics are accurate → You track packets correctly
Lost packets detected → Frame counting works
Clean reporting → You’ve built a useful tool

Project 8: VBAN Audio Router (Multi-Stream Hub)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Python
Alternative Programming Languages: C, Rust, Go
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 3. The “Service & Support” Model
Difficulty: Level 4: Expert
Knowledge Area: Audio Routing / Stream Processing
Software or Tool: Python, numpy
Main Book: “Designing Audio Effect Plugins in C++” by Will Pirkle

What you’ll build: A software audio router that receives multiple VBAN streams, mixes/routes them to different destinations, and optionally applies processing (gain, mixing).

Why it teaches advanced VBAN: Real audio systems need routing. Building a router teaches you about multiple streams, mixing, and the complexity of multi-source audio.

Core challenges you’ll face:

Multiple stream handling → maps to concurrent reception
Sample rate conversion → maps to matching different streams
Audio mixing → maps to combining streams without clipping
Routing matrix → maps to flexible input→output mapping

Key Concepts:

Audio Mixing: Summing, gain staging
Sample Rate Conversion: Resampling algorithms
Thread Safety: Concurrent audio processing

Difficulty: Expert
Time estimate: 3-4 weeks
Prerequisites: Projects 1-3, audio processing basics

Real world outcome:

$ vban_router --config router.yaml
VBAN Audio Router v1.0

Inputs:
  [1] 192.168.1.100:Stream1 → 48kHz/2ch [████████░░] -6dB
  [2] 192.168.1.42:ESP32-Mic → 44.1kHz/1ch [██████████] 0dB
  [3] 192.168.1.50:Music → 48kHz/2ch [████░░░░░░] -12dB

Outputs:
  [A] 192.168.1.200:6980 "RouterMix" ← [1,2,3]
  [B] 192.168.1.201:6980 "VoiceOnly" ← [1,2]

Routing Matrix:
       Out A   Out B
In 1   [X]     [X]    (gain: 0dB)
In 2   [X]     [X]    (gain: +3dB)
In 3   [X]     [ ]    (gain: -6dB)

Stats: CPU 12% | Latency 15ms | Packets/s: 375

Implementation Hints:

Multi-stream receiver:

import threading
import queue
import numpy as np

class VBANRouter:
    def __init__(self, config):
        self.config = config
        self.inputs = {}   # "ip:stream_name" → input config
        self.outputs = {}  # output_name → output config
        self.routing = {}  # (input key, output_name) → gain in dB
        self.buffers = {}  # "ip:stream_name" → queue.Queue of (header, audio)
        self.running = False

    def receive_thread(self, sock):
        """Receive packets and distribute to input buffers"""
        while self.running:
            data, addr = sock.recvfrom(1500)
            header = parse_vban_header(data)
            if not header:
                continue

            key = f"{addr[0]}:{header['stream_name']}"
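            if key in self.inputs:
                audio = self.extract_audio(data, header)
                # Create the per-input queue on first use
                self.buffers.setdefault(key, queue.Queue()).put((header, audio))

    def mixer_thread(self):
        """Mix inputs according to routing matrix"""
        while self.running:
            # Collect audio from all inputs (iterate over a snapshot:
            # the receive thread may add new queues at any time)
            input_audio = {}
            for name, buf in list(self.buffers.items()):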
                try:
                    header, audio = buf.get(timeout=0.01)
                    input_audio[name] = (header, audio)
                except queue.Empty:
                    pass

            # Mix for each output
            for output_name, output_config in self.outputs.items():
                mixed = self.mix_for_output(output_name, input_audio)
                if mixed is not None:
                    self.send_output(output_name, mixed)

    def mix_for_output(self, output_name, input_audio):
        """Mix all routed inputs for one output"""
        mixed = None

        for input_name, (header, audio) in input_audio.items():
            key = (input_name, output_name)
            if key in self.routing:
                gain = self.routing[key]

                # Apply gain
                scaled = audio * (10 ** (gain / 20))

                # Mix
                if mixed is None:
                    mixed = scaled.copy()
                else:
                    # Handle different sample rates...
                    mixed = mixed + scaled

        # Clip to prevent overflow
        if mixed is not None:
            mixed = np.clip(mixed, -1.0, 1.0)

        return mixed
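
The gain and clipping steps are easy to verify in isolation. The sketch below reproduces them with plain Python lists so it runs without numpy; `db_to_linear` and `mix_blocks` are illustrative names, and padding shorter blocks with silence is an assumption made here, not something the router above does:

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def mix_blocks(blocks):
    """Sum (samples, gain_db) pairs sample by sample and hard-clip to [-1, 1].

    Shorter blocks are treated as if padded with silence.
    """
    length = max((len(samples) for samples, _ in blocks), default=0)
    mixed = [0.0] * length
    for samples, gain_db in blocks:
        factor = db_to_linear(gain_db)
        for i, value in enumerate(samples):
            mixed[i] += value * factor
    return [max(-1.0, min(1.0, value)) for value in mixed]


# One input at unity gain, a second and shorter input attenuated by 6 dB
out = mix_blocks([([0.5, 0.5, 0.5], 0.0), ([0.8, -0.8], -6.0)])
print(out)
```

Hard clipping is the simplest protection against overflow; lowering input gains so the sum stays in range (gain staging) avoids the distortion it causes.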

Learning milestones:

Multiple streams receive → You handle concurrent input
Mixing works → You understand audio summing
Routing is configurable → You built a flexible system
No glitches → Buffering and timing work correctly

Project 9: VBAN Quality Monitor (Network Analysis)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Python
Alternative Programming Languages: Go, C
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 3: Advanced
Knowledge Area: Network Analysis / QoS Monitoring
Software or Tool: Python, matplotlib
Main Book: “High Performance Browser Networking” by Ilya Grigorik

What you’ll build: A monitoring tool that analyzes VBAN stream quality—measuring jitter, packet loss, latency estimation, and displaying real-time graphs.

Why it teaches network audio quality: VBAN over UDP means packets can be lost or delayed. Understanding quality metrics is essential for diagnosing audio problems.

Core challenges you’ll face:

Jitter measurement → maps to variance in packet arrival times
Loss detection → maps to frame counter gaps
Latency estimation → maps to one-way delay measurement
Real-time visualization → maps to live updating graphs

Key Concepts:

Jitter: Network timing variance
QoS Metrics: MOS, latency, loss
Network Buffers: How to size them

Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Projects 1, 7

Real world outcome:

$ vban_monitor --stream "Stream1" --ip 192.168.1.100
VBAN Quality Monitor v1.0
Monitoring: Stream1 @ 192.168.1.100

Real-time Metrics (updated every 1s):
┌─────────────────────────────────────────────────────────────────┐
│  Packet Rate:  187.5 pkt/s    Expected: 187.5 pkt/s   OK       │
│  Packet Loss:  0.02%          (3 of 14,062 packets)            │
│  Jitter:       1.2ms avg      (0.5ms - 3.8ms range)            │
│  Buffer Need:  ~4ms           (based on jitter)                │
├─────────────────────────────────────────────────────────────────┤
│  Inter-Packet Timing (ms):                                      │
│  5.33 ████████████████████████████████████████████ (expected)  │
│  5.00 ██████████████████████████████████░░░░░░░░░░             │
│  5.50 ██████████████████████████████████████░░░░░░             │
│  6.00 ██████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             │
├─────────────────────────────────────────────────────────────────┤
│  Jitter History (last 60s):                                     │
│  3ms ┤      *                                                   │
│  2ms ┤   *     * *    *                                        │
│  1ms ┤ *   * *     * *  * * * * * * * * * *                    │
│  0ms ┼──────────────────────────────────────────► time         │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Jitter calculation:

import statistics
from collections import deque

class QualityMonitor:
    def __init__(self, window_size=1000):
        self.arrival_times = deque(maxlen=window_size)
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        self.last_frame = -1
        self.packets_received = 0
        self.packets_lost = 0

    def process_packet(self, header, arrival_time):
        self.packets_received += 1

        # Track inter-packet intervals
        if self.last_arrival is not None:
            interval = (arrival_time - self.last_arrival) * 1000  # ms
            self.intervals.append(interval)

        self.last_arrival = arrival_time
        self.arrival_times.append(arrival_time)

        # Track lost packets
        if self.last_frame >= 0:
            expected = (self.last_frame + 1) & 0xFFFFFFFF
            if header['frame'] != expected:
                lost = (header['frame'] - expected) & 0xFFFFFFFF
                if lost < 1000:
                    self.packets_lost += lost
        self.last_frame = header['frame']

    def get_jitter_stats(self):
        if len(self.intervals) < 2:
            return None

        # Expected interval based on sample rate and samples/frame
        # e.g., 48000 Hz, 256 samples = 5.33ms

        mean = statistics.mean(self.intervals)
        stdev = statistics.stdev(self.intervals)
        min_val = min(self.intervals)
        max_val = max(self.intervals)

        # Jitter here is the standard deviation of the inter-packet
        # intervals. This is simpler than the smoothed interarrival
        # jitter estimator defined in RFC 3550.
        jitter = stdev

        return {
            'mean_interval': mean,
            'jitter': jitter,
            'min': min_val,
            'max': max_val,
            'range': max_val - min_val
        }

    def get_loss_rate(self):
        total = self.packets_received + self.packets_lost
        if total == 0:
            return 0
        return self.packets_lost / total * 100

    def recommended_buffer_size(self):
        stats = self.get_jitter_stats()
        if stats is None:
            return 10  # Default 10ms

        # Buffer should cover worst-case jitter
        # Rule of thumb: 2-3x the jitter
        return stats['jitter'] * 3
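
For comparison with the standard-deviation approach, here is a sketch of the smoothed interarrival jitter estimator from RFC 3550. VBAN packets carry no timestamp, so this version assumes the sender emits packets at the nominal rate and derives the send time from the frame counter; `InterarrivalJitter` is an illustrative class name, and out-of-order packets are not handled:

```python
class InterarrivalJitter:
    """RFC 3550-style smoothed jitter estimate for one VBAN stream."""

    def __init__(self, sample_rate: int, samples_per_frame: int):
        # Nominal time between packets, e.g. 256 / 48000 s = 5.33 ms
        self.frame_period = samples_per_frame / sample_rate
        self.last_frame = None
        self.last_arrival = None
        self.jitter = 0.0  # seconds

    def update(self, frame: int, arrival_time: float) -> float:
        """Feed one packet (frame counter, arrival time in seconds)."""
        if self.last_frame is not None:
            # Assumption: the frame counter difference stands in for the
            # sender timestamp difference
            frames = (frame - self.last_frame) & 0xFFFFFFFF
            sent_delta = frames * self.frame_period
            recv_delta = arrival_time - self.last_arrival
            d = abs(recv_delta - sent_delta)
            # RFC 3550, section 6.4.1: J = J + (|D| - J) / 16
            self.jitter += (d - self.jitter) / 16
        self.last_frame = frame
        self.last_arrival = arrival_time
        return self.jitter


est = InterarrivalJitter(sample_rate=48000, samples_per_frame=256)
period = 256 / 48000
est.update(0, 0.0)
on_time = est.update(1, period)              # arrives exactly on schedule
late = est.update(2, 2 * period + 0.016)     # arrives 16 ms late
print(on_time, late)
```

The 1/16 smoothing factor makes the estimate react slowly to single outliers, which is why a buffer sized from it should still be checked against the observed maximum interval.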

Learning milestones:

Jitter measured → You understand timing variance
Loss tracked → Frame counter analysis works
Graphs display → Real-time visualization works
Recommendations accurate → You understand buffer sizing

Project 10: Complete VBAN Application (Full Implementation)

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Rust or C
Alternative Programming Languages: Go, C++
Coolness Level: Level 5: Pure Magic
Business Potential: 3. The “Service & Support” Model
Difficulty: Level 4: Expert
Knowledge Area: Systems Programming / Audio Engineering
Software or Tool: Rust or C, ALSA/CoreAudio/WASAPI
Main Book: “Programming Rust” by Blandy & Orendorff

What you’ll build: A complete VBAN implementation from scratch in a systems language—full protocol support, platform audio integration, low-latency performance, and production-quality code.

Why this is the ultimate project: You’ll implement VBAN at the same level as the official tools, understanding every optimization and design decision.

Core challenges you’ll face:

Zero-copy packet handling → maps to memory efficiency
Platform audio abstraction → maps to ALSA/CoreAudio/WASAPI
Lock-free buffers → maps to real-time audio requirements
All sub-protocols → maps to complete specification coverage

Difficulty: Expert
Time estimate: 2-3 months
Prerequisites: All previous projects, systems programming experience

Real world outcome:

$ vban --mode emitter --device "hw:0" --stream "MyStream" \
    --dest 192.168.1.100 --rate 48000 --channels 2 --format s16

$ vban --mode receptor --stream "Stream1" --device "hw:0" \
    --quality 3 --buffer 20

$ vban --mode text --dest 192.168.1.100 "Strip[0].Mute = 1"

$ vban --mode scan --timeout 10

Implementation Hints:

Rust VBAN types:

use std::net::UdpSocket;

const VBAN_HEADER_SIZE: usize = 28;
const VBAN_MAX_DATA_SIZE: usize = 1436;

#[repr(C, packed)]
#[derive(Clone, Copy)]
struct VBANHeader {
    vban: [u8; 4],
    format_sr: u8,
    format_nbs: u8,
    format_nbc: u8,
    format_bit: u8,
    streamname: [u8; 16],
    nu_frame: u32,
}

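impl VBANHeader {
    // samples and channels are actual counts (1..=256); the header stores count - 1
    fn new(sample_rate_index: u8, samples: u16, channels: u16,
           bit_format: u8, name: &str, frame: u32) -> Self {
        assert!((1..=256).contains(&samples), "samples must be 1..=256");
        assert!((1..=256).contains(&channels), "channels must be 1..=256");

        let mut streamname = [0u8; 16];
        let name_bytes = name.as_bytes();
        let len = name_bytes.len().min(16);
        streamname[..len].copy_from_slice(&name_bytes[..len]);

        Self {
            vban: *b"VBAN",
            format_sr: sample_rate_index,
            format_nbs: (samples - 1) as u8,
            format_nbc: (channels - 1) as u8,
            format_bit: bit_format,
            streamname,
            nu_frame: frame,
        }
    }

    fn to_bytes(&self) -> [u8; 28] {
        // Assumes a little-endian host: nu_frame is little-endian on the wire
        unsafe { std::mem::transmute(*self) }
    }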

    fn from_bytes(data: &[u8]) -> Option<Self> {
        if data.len() < 28 {
            return None;
        }
        if &data[0..4] != b"VBAN" {
            return None;
        }

        let header: Self = unsafe {
            std::ptr::read(data.as_ptr() as *const Self)
        };
        Some(header)
    }
}

Lock-free ring buffer for audio:

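use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Single-producer/single-consumer ring buffer. One slot always stays
// empty so that write_pos == read_pos unambiguously means "empty".
struct RingBuffer<T: Copy + Default, const N: usize> {
    buffer: UnsafeCell<[T; N]>,
    write_pos: AtomicUsize,
    read_pos: AtomicUsize,
}

// Sound only under the SPSC contract: exactly one thread calls push()
// and exactly one thread calls pop().
unsafe impl<T: Copy + Default + Send, const N: usize> Sync for RingBuffer<T, N> {}

impl<T: Copy + Default, const N: usize> RingBuffer<T, N> {
    fn new() -> Self {
        Self {
            buffer: UnsafeCell::new([T::default(); N]),
            write_pos: AtomicUsize::new(0),
            read_pos: AtomicUsize::new(0),
        }
    }

    fn push(&self, items: &[T]) -> usize {
        let write = self.write_pos.load(Ordering::Relaxed);
        let read = self.read_pos.load(Ordering::Acquire);

        let available = if write >= read {
            N - (write - read) - 1
        } else {
            read - write - 1
        };

        let to_write = items.len().min(available);
        let buf = self.buffer.get() as *mut T;
        for (i, item) in items[..to_write].iter().enumerate() {
            // Index modulo N handles the wrap-around at the end of storage
            unsafe { buf.add((write + i) % N).write(*item) };
        }

        self.write_pos.store((write + to_write) % N, Ordering::Release);
        to_write
    }

    fn pop(&self, out: &mut [T]) -> usize {
        let read = self.read_pos.load(Ordering::Relaxed);
        let write = self.write_pos.load(Ordering::Acquire);

        let filled = if write >= read { write - read } else { N - (read - write) };

        let to_read = out.len().min(filled);
        let buf = self.buffer.get() as *const T;
        for (i, slot) in out[..to_read].iter_mut().enumerate() {
            *slot = unsafe { buf.add((read + i) % N).read() };
        }

        self.read_pos.store((read + to_read) % N, Ordering::Release);
        to_read
    }
}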

Learning milestones:

Packets send/receive → Core protocol works
Platform audio works → System integration complete
All sub-protocols → Full specification coverage
Production quality → Error handling, logging, config

Project Comparison Table

Project	Difficulty	Time	Depth of Understanding	Fun Factor
1. Packet Analyzer	⭐	Weekend	⚡⚡⚡	🎮🎮🎮
2. VBAN Receiver	⭐⭐	1 week	⚡⚡⚡⚡	🎮🎮🎮🎮
3. VBAN Emitter	⭐⭐	1 week	⚡⚡⚡⚡	🎮🎮🎮🎮
4. Text Protocol	⭐⭐	Weekend	⚡⚡⚡	🎮🎮🎮
5. MIDI Bridge	⭐⭐⭐	2 weeks	⚡⚡⚡⚡	🎮🎮🎮🎮🎮
6. ESP32 Audio Node	⭐⭐⭐	2-3 weeks	⚡⚡⚡⚡⚡	🎮🎮🎮🎮🎮
7. Network Scanner	⭐⭐	1 week	⚡⚡⚡	🎮🎮🎮
8. Audio Router	⭐⭐⭐⭐	3-4 weeks	⚡⚡⚡⚡⚡	🎮🎮🎮🎮
9. Quality Monitor	⭐⭐⭐	2 weeks	⚡⚡⚡⚡	🎮🎮🎮🎮
10. Complete Implementation	⭐⭐⭐⭐	2-3 months	⚡⚡⚡⚡⚡	🎮🎮🎮🎮🎮

Recommended Learning Path

Your Starting Point

If you’re learning network audio from scratch: Projects 1 → 2 → 3 → 7 → 9 (Core protocol understanding)

If you’re interested in embedded/IoT audio: Projects 1 → 2 → 3 → 6 (ESP32 focus)

If you want to build tools for Voicemeeter: Projects 1 → 4 → 5 → 8 (Control and routing)

Recommended Sequence

Phase 1: Protocol Fundamentals (1-2 weeks)
├── Project 1: Packet Analyzer → See the wire format
└── Project 2: VBAN Receiver → Understand slave behavior

Phase 2: Bidirectional Communication (2-3 weeks)
├── Project 3: VBAN Emitter → Understand master behavior
└── Project 4: Text Protocol → Learn sub-protocols

Phase 3: Specialized Applications (3-5 weeks)
├── Project 5: MIDI Bridge → Musical control
├── Project 6: ESP32 Node → Embedded audio
└── Project 7: Network Scanner → Discovery

Phase 4: Advanced Systems (3-4 months)
├── Project 8: Audio Router → Multi-stream handling
├── Project 9: Quality Monitor → Network analysis
└── Project 10: Complete Implementation → Production quality

Final Project: Networked Audio Production System

File: LEARN_VBAN_PROTOCOL.md
Main Programming Language: Mixed (Python, C++, Arduino)
Coolness Level: Level 5: Pure Magic
Business Potential: 4. The “Open Core” Infrastructure
Difficulty: Level 5: Master
Knowledge Area: Complete Audio System Design
Software or Tool: Everything from previous projects

What you’ll build: A complete networked audio production system using VBAN—multiple ESP32 microphones, a mixing/routing server, monitoring dashboards, and remote control.

System Architecture:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    NETWORKED AUDIO PRODUCTION SYSTEM                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐                                     │
│   │ ESP32   │  │ ESP32   │  │ ESP32   │    VBAN Audio                       │
│   │ Mic 1   │  │ Mic 2   │  │ Mic 3   │    ═══════►                         │
│   └────┬────┘  └────┬────┘  └────┬────┘                                     │
│        │            │            │                                           │
│        └────────────┼────────────┘                                           │
│                     │                                                        │
│                     ▼                                                        │
│   ┌─────────────────────────────────────┐                                   │
│   │         VBAN ROUTER/MIXER           │                                   │
│   │                                     │                                   │
│   │  Inputs:  [Mic1] [Mic2] [Mic3]     │                                   │
│   │  Outputs: [Main] [Monitor] [Record]│                                   │
│   │  Effects: [EQ] [Comp] [Gate]       │                                   │
│   └──────────────┬──────────────────────┘                                   │
│                  │                                                           │
│       ┌──────────┼──────────┐                                               │
│       │          │          │                                               │
│       ▼          ▼          ▼                                               │
│   ┌───────┐  ┌───────┐  ┌────────┐                                         │
│   │ Main  │  │Monitor│  │Recorder│                                         │
│   │Output │  │ Mix   │  │        │                                         │
│   └───────┘  └───────┘  └────────┘                                         │
│                                                                              │
│   ┌─────────────────────────────────────┐                                   │
│   │         MONITORING DASHBOARD         │                                   │
│   │  [Levels] [Jitter] [Latency] [Loss] │                                   │
│   └─────────────────────────────────────┘                                   │
│                                                                              │
│   ┌─────────────────────────────────────┐                                   │
│   │         REMOTE CONTROL (Web UI)      │                                   │
│   │  [Mute] [Gain] [Routing] [Presets]  │                                   │
│   └─────────────────────────────────────┘                                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Difficulty: Master
Time estimate: 3-4 months
Prerequisites: All 10 projects completed
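
The core of the router/mixer box in the diagram is summing several input streams into one output. The following is a minimal sketch of that step, assuming every input delivers a buffer of the same length in little-endian 16-bit PCM. The function name `mix_int16` and the per-input `gains` parameter are illustrative choices, not part of VBAN, and a real mixer would also have to align streams in time before summing.

```python
import struct


def mix_int16(frames, gains):
    """Mix equal-length little-endian int16 PCM buffers into one buffer.

    frames: list of bytes objects, one per input (e.g. one per microphone).
    gains:  list of linear gain factors, one per input.
    """
    if len(frames) != len(gains):
        raise ValueError("one gain per input is required")
    if not frames:
        return b""

    n_samples = len(frames[0]) // 2
    if any(len(f) != n_samples * 2 for f in frames):
        raise ValueError("all inputs must have the same even byte length")

    fmt = "<%dh" % n_samples
    decoded = [struct.unpack(fmt, f) for f in frames]

    mixed = []
    for samples in zip(*decoded):
        total = sum(s * g for s, g in zip(samples, gains))
        # Hard-clip to the int16 range so a loud sum cannot wrap around.
        mixed.append(max(-32768, min(32767, int(round(total)))))
    return struct.pack(fmt, *mixed)
```

Hard clipping is the simplest way to keep the sum in range. The compressor stage listed under Effects in the diagram would be a gentler alternative.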

Essential Resources

Official Documentation

Resource	URL	Description
VBAN Specification	vb-audio.com/Voicemeeter/VBANProtocol_Specifications.pdf	Official protocol spec
VB-Audio VBAN Page	vb-audio.com/Voicemeeter/vban.htm	Overview and downloads
VB-Audio Forums	forum.vb-audio.com	Developer discussions

Open Source Implementations

Project	Language	URL
vban (quiniouben)	C	github.com/quiniouben/vban
pyVBAN	Python	github.com/TheStaticTurtle/pyVBAN
ESP32-VBAN-Audio-Source	C++	github.com/rkinnett/ESP32-VBAN-Audio-Source
ESP32-VBAN-Network-Audio-Player	C++	github.com/rkinnett/ESP32-VBAN-Network-Audio-Player
Arduino Audio Tools	C++	github.com/pschatzmann/arduino-audio-tools
vban (npm)	JavaScript	npmjs.com/package/vban

Books

Book	Author	Best For
TCP/IP Illustrated, Volume 1	W. Richard Stevens	UDP networking fundamentals
The Audio Programming Book	Richard Boulanger	Audio DSP concepts
Making Embedded Systems	Elecia White	Embedded firmware design (applies to ESP32 work)
Computer Networks	Andrew Tanenbaum	Network protocols
Programming Rust	Blandy & Orendorff	Systems implementation

Tools

Tool	Purpose
Voicemeeter	VBAN-compatible virtual mixer
Wireshark	Packet capture and analysis
VB-Cable	Virtual audio cables
VBAN Receptor/Emitter	Official VBAN apps

Summary

#	Project	Main Language	Knowledge Area
1	Packet Analyzer	Python	Network Protocols / Binary Parsing
2	VBAN Receiver	Python	Audio I/O / Network Programming
3	VBAN Emitter	Python	Audio Capture / Packet Timing
4	Text Protocol	Python	Remote Control / Commands
5	MIDI Bridge	Python	MIDI Protocol / Serial
6	ESP32 Audio Node	C++ (Arduino)	Embedded Systems / I2S Audio
7	Network Scanner	Python	Service Discovery / Network Analysis
8	Audio Router	Python	Audio Routing / Stream Processing
9	Quality Monitor	Python	QoS / Jitter Analysis
10	Complete Implementation	Rust/C	Systems Programming
Final	Production System	Mixed	Complete Audio System

Getting Started Checklist

Before starting Project 1:

Install Python 3.8+
Install Voicemeeter (for testing): vb-audio.com/Voicemeeter
Read the VBAN Protocol Specification
Install Wireshark for packet analysis
Understand basic UDP networking
Have a local network for testing (or use loopback)
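
To confirm that the Python and loopback items are working, a short self-test can build one VBAN audio packet, send it over UDP on 127.0.0.1, and parse the header on the receiving side. This is a sketch under stated assumptions: the helper names (`build_packet`, `parse_header`, `loopback_roundtrip`) are hypothetical, the stream is fixed at 48 kHz 16-bit PCM, and an ephemeral port is used instead of the usual VBAN port 6980 so the test does not collide with a running Voicemeeter.

```python
import socket
import struct

VBAN_HEADER_SIZE = 28
HEADER_FORMAT = "<4sBBBB16sI"  # magic, SR, samples-1, channels-1, format, name, counter
SR_INDEX_48000 = 3             # index of 48 kHz in the VBAN sample rate table
FORMAT_INT16 = 0x01            # 16-bit PCM data type, PCM codec


def build_packet(stream_name, frame_counter, pcm, channels=2):
    """Prepend a 28-byte VBAN audio header to interleaved int16 PCM."""
    samples = len(pcm) // (2 * channels)
    if not 1 <= samples <= 256:
        raise ValueError("a packet carries 1 to 256 samples per channel")
    header = struct.pack(
        HEADER_FORMAT,
        b"VBAN",
        SR_INDEX_48000,               # upper 3 bits left at 0 = audio sub-protocol
        samples - 1,
        channels - 1,
        FORMAT_INT16,
        stream_name.encode("ascii"),  # struct pads the name with NULs to 16 bytes
        frame_counter,
    )
    return header + pcm


def parse_header(packet):
    """Decode the header fields written by build_packet."""
    if len(packet) < VBAN_HEADER_SIZE or packet[:4] != b"VBAN":
        raise ValueError("not a VBAN packet")
    _, sr, nbs, nbc, fmt, name, counter = struct.unpack(
        HEADER_FORMAT, packet[:VBAN_HEADER_SIZE]
    )
    return {
        "sr_index": sr & 0x1F,
        "sub_protocol": sr & 0xE0,
        "samples": nbs + 1,
        "channels": nbc + 1,
        "format": fmt & 0x07,
        "name": name.split(b"\x00", 1)[0].decode("ascii"),
        "frame": counter,
    }


def loopback_roundtrip(packet, timeout=2.0):
    """Send one datagram to ourselves on 127.0.0.1 and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        rx.settimeout(timeout)
        tx.sendto(packet, rx.getsockname())
        data, _ = rx.recvfrom(2048)
        return data


if __name__ == "__main__":
    pkt = build_packet("Stream1", 0, bytes(256))  # 64 stereo samples of silence
    print(parse_header(loopback_roundtrip(pkt)))
```

If the round trip times out, a local firewall rule is the most likely cause and is worth resolving before starting Project 1.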

Welcome to the world of network audio protocols! 🎧

Generated for deep understanding of the VBAN protocol and network audio streaming
