Learn VBAN Protocol: From Packets to Production Audio Streaming

Goal: Deeply understand the VBAN (VB-Audio Network) protocol—from raw UDP packets to building complete audio streaming systems, embedded implementations, and network audio applications.


Why Learn VBAN?

VBAN is a lightweight, open protocol for streaming audio (and MIDI/text/serial data) over standard IP networks. Unlike professional protocols like Dante or AES67, VBAN is:

  • Simple: 28-byte header + PCM data over UDP
  • Open: Free specification, no licensing fees
  • Practical: Works on any network, no special hardware
  • Extensible: Sub-protocols for audio, MIDI, text, and services

Understanding VBAN teaches you:

  • Network audio fundamentals - How digital audio travels over IP
  • UDP programming - Real-time, low-latency network protocols
  • Binary protocols - Parsing and constructing packet formats
  • Audio processing - PCM formats, sample rates, channel configurations
  • Embedded systems - Implementing protocols on microcontrollers
  • Real-time systems - Handling jitter, latency, and synchronization

After completing these projects, you will:

  • Understand every byte in a VBAN packet
  • Build your own audio streaming tools
  • Create embedded audio devices (ESP32, Arduino)
  • Implement network audio monitoring and routing
  • Appreciate the trade-offs between VBAN and professional protocols

Core Concept Analysis

What is VBAN?

┌─────────────────────────────────────────────────────────────────────────────┐
│                              VBAN OVERVIEW                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  VBAN = VB-Audio Network Protocol                                           │
│                                                                              │
│  Created by: Vincent Burel (VB-Audio Software)                              │
│  First released: June 2015 (with Voicemeeter)                               │
│  License: Open/Free for any use                                             │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    VBAN CHARACTERISTICS                             │    │
│  ├─────────────────────────────────────────────────────────────────────┤    │
│  │                                                                     │    │
│  │  • Transport: UDP (User Datagram Protocol)                         │    │
│  │  • Latency: Low (no TCP handshaking)                               │    │
│  │  • Reliability: None (no retransmission, no ACKs)                  │    │
│  │  • Model: Broadcast (sender=master, receiver=slave)                │    │
│  │  • Sync: None (receiver adapts to sender's clock)                  │    │
│  │  • Format: Native PCM (uncompressed audio)                         │    │
│  │                                                                     │    │
│  │  Max Channels: 256 (commonly 1-8)                                  │    │
│  │  Max Sample Rate: 705,600 Hz                                        │    │
│  │  Bit Depths: 8, 10, 12, 16, 24, 32-bit int, 32/64-bit float        │    │
│  │  Default Port: 6980                                                 │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  Sub-protocols:                                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐                    │
│  │  AUDIO   │  │  SERIAL  │  │   TEXT   │  │ SERVICE  │                    │
│  │  (0x00)  │  │  (0x20)  │  │  (0x40)  │  │  (0x60)  │                    │
│  │          │  │  (MIDI)  │  │ (Remote) │  │  (Ping)  │                    │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

VBAN vs Professional Audio Networking

Aspect       VBAN                 Dante               AES67          AVB
──────────────────────────────────────────────────────────────────────────────────────────
Cost         Free                 Licensing fees      Free standard  Free standard
Hardware     Any network          Any network         Any network    AVB switches required
Sync         None (slave adapts)  PTP (precise)       PTPv2          gPTP
Reliability  Best effort (UDP)    Redundant networks  Redundant      QoS guaranteed
Latency      ~5-20ms typical      <1ms possible       <1ms possible  <2ms guaranteed
Discovery    Manual/Ping          Automatic           Manual         Automatic
Channels     256                  1024+               Unlimited      8 per stream
Use case     Home/Semi-pro        Professional        Professional   Professional

VBAN Packet Structure

┌─────────────────────────────────────────────────────────────────────────────┐
│                          VBAN PACKET STRUCTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  UDP Packet (max 1464 bytes for VBAN)                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  VBAN HEADER (28 bytes)  │  AUDIO DATA (1-1436 bytes)              │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                     VBAN HEADER LAYOUT (28 bytes)                   │    │
│  ├─────────────────────────────────────────────────────────────────────┤    │
│  │                                                                     │    │
│  │  Offset  Size   Field          Description                         │    │
│  │  ──────────────────────────────────────────────────────────────    │    │
│  │  0       4      vban           Magic: 'V','B','A','N' (0x4E414256) │    │
│  │  4       1      format_SR      Sample Rate Index + Protocol        │    │
│  │  5       1      format_nbs     Samples per frame - 1 (0-255)       │    │
│  │  6       1      format_nbc     Channels - 1 (0-255)                │    │
│  │  7       1      format_bit     Bit format + Codec                  │    │
│  │  8       16     streamname     Stream name (ASCII, null-padded)    │    │
│  │  24      4      nuFrame        Frame counter (32-bit, little-end)  │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  format_SR byte layout:                                                      │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Bits 7-5: Sub-Protocol    Bits 4-0: Sample Rate Index             │    │
│  │  ┌───┬───┬───┬───┬───┬───┬───┬───┐                                 │    │
│  │  │ P │ P │ P │ SR│ SR│ SR│ SR│ SR│                                 │    │
│  │  └───┴───┴───┴───┴───┴───┴───┴───┘                                 │    │
│  │  Protocol: 000=Audio, 001=Serial, 010=Text, 011=Service            │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  format_bit byte layout:                                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Bits 7-4: Codec   Bit 3: Reserved   Bits 2-0: Bit Resolution       │    │
│  │  ┌───┬───┬───┬───┬───┬───┬───┬───┐                                 │    │
│  │  │ C │ C │ C │ C │ R │ BR│ BR│ BR│                                 │    │
│  │  └───┴───┴───┴───┴───┴───┴───┴───┘                                 │    │
│  │  Codec: 0=PCM, 1=VBCA (compressed), 2=VBCV (voice)...              │    │
│  │  BitRes: 0=U8, 1=S16, 2=S24, 3=S32, 4=F32, 5=F64, 6=12bit, 7=10bit │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
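
As a quick check of the two bit-packed bytes, the following sketch splits format_SR and format_bit with the masks from the layout above. The helper names `split_format_sr` and `split_format_bit` are illustrative and not part of any VBAN library.

```python
import struct

def split_format_sr(format_sr: int) -> tuple[int, int]:
    """Return (sub_protocol, sample_rate_index) from the format_SR byte."""
    return (format_sr & 0xE0) >> 5, format_sr & 0x1F

def split_format_bit(format_bit: int) -> tuple[int, int]:
    """Return (codec, bit_resolution) from the format_bit byte."""
    return (format_bit & 0xF0) >> 4, format_bit & 0x07

# The magic reads as 0x4E414256 when interpreted as a little-endian uint32.
magic_as_int = struct.unpack('<I', b'VBAN')[0]
print(hex(magic_as_int))        # 0x4e414256
print(split_format_sr(0x03))    # (0, 3): audio, sample rate index 3
print(split_format_sr(0x40))    # (2, 0): text sub-protocol
print(split_format_bit(0x01))   # (0, 1): PCM codec, signed 16-bit
```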

Sample Rate Index Table

┌─────────────────────────────────────────────────────────────────────────────┐
│                      VBAN SAMPLE RATE INDEX TABLE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Index    Sample Rate      Common Use                                        │
│  ─────────────────────────────────────────────────────────────────────────  │
│  0        6,000 Hz         Low quality voice                                │
│  1        12,000 Hz        Voice                                            │
│  2        24,000 Hz        Voice/AM radio                                   │
│  3        48,000 Hz        ★ Professional audio, video                      │
│  4        96,000 Hz        High-resolution audio                            │
│  5        192,000 Hz       Studio master                                    │
│  6        384,000 Hz       Ultra-high resolution                            │
│  7        8,000 Hz         Telephone                                        │
│  8        16,000 Hz        Wideband voice                                   │
│  9        32,000 Hz        FM broadcast                                     │
│  10       64,000 Hz        -                                                │
│  11       128,000 Hz       -                                                │
│  12       256,000 Hz       -                                                │
│  13       512,000 Hz       -                                                │
│  14       11,025 Hz        Low-quality audio                                │
│  15       22,050 Hz        Half CD quality                                  │
│  16       44,100 Hz        ★ CD quality                                     │
│  17       88,200 Hz        2x CD                                            │
│  18       176,400 Hz       4x CD                                            │
│  19       352,800 Hz       8x CD (DSD conversion)                           │
│  20       705,600 Hz       16x CD                                           │
│                                                                              │
│  ★ = Most commonly used                                                      │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow: Sender to Receiver

┌─────────────────────────────────────────────────────────────────────────────┐
│                        VBAN DATA FLOW                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SENDER (Master)                          RECEIVER (Slave)                   │
│  ┌─────────────────────┐                  ┌─────────────────────┐           │
│  │  Audio Source       │                  │  Audio Output       │           │
│  │  (Microphone, DAW,  │                  │  (Speakers, DAW,    │           │
│  │   System audio)     │                  │   Recording)        │           │
│  └──────────┬──────────┘                  └──────────▲──────────┘           │
│             │                                        │                       │
│             ▼                                        │                       │
│  ┌─────────────────────┐                  ┌─────────────────────┐           │
│  │  PCM Buffer         │                  │  Jitter Buffer      │           │
│  │  (256 samples max   │                  │  (Compensates for   │           │
│  │   per VBAN packet)  │                  │   network variance) │           │
│  └──────────┬──────────┘                  └──────────▲──────────┘           │
│             │                                        │                       │
│             ▼                                        │                       │
│  ┌─────────────────────┐                  ┌─────────────────────┐           │
│  │  VBAN Packet        │                  │  VBAN Parser        │           │
│  │  Assembly           │                  │  • Validate header  │           │
│  │  • Add 28-byte hdr  │                  │  • Check stream name│           │
│  │  • Increment nuFrame│                  │  • Extract config   │           │
│  └──────────┬──────────┘                  └──────────▲──────────┘           │
│             │                                        │                       │
│             ▼                                        │                       │
│  ┌─────────────────────┐    UDP/IP        ┌─────────────────────┐           │
│  │  UDP Socket         │ ═══════════════► │  UDP Socket         │           │
│  │  sendto(IP:6980)    │                  │  bind(0.0.0.0:6980) │           │
│  └─────────────────────┘                  └─────────────────────┘           │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    TIMING CONSIDERATIONS                            │    │
│  ├─────────────────────────────────────────────────────────────────────┤    │
│  │                                                                     │    │
│  │  48kHz, 256 samples/packet = 5.33ms per packet                     │    │
│  │  48kHz, 128 samples/packet = 2.67ms per packet                     │    │
│  │  44.1kHz, 256 samples/packet = 5.80ms per packet                   │    │
│  │                                                                     │    │
│  │  Network latency (LAN): typically < 1ms                            │    │
│  │  Total end-to-end: 10-30ms typical (including buffers)             │    │
│  │                                                                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
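
The timing figures above follow from dividing samples per packet by the sample rate. A small sketch (function names are illustrative) that reproduces them and also estimates the VBAN-level bit rate, counting the 28-byte header but not IP or UDP overhead:

```python
VBAN_HEADER_SIZE = 28

def packet_duration_ms(sample_rate: int, samples_per_packet: int) -> float:
    """Audio time carried by one packet, in milliseconds."""
    return 1000.0 * samples_per_packet / sample_rate

def stream_bitrate_kbps(sample_rate, samples_per_packet, channels, bytes_per_sample):
    """Approximate bit rate of header plus payload, in kbit/s."""
    payload = samples_per_packet * channels * bytes_per_sample
    packets_per_second = sample_rate / samples_per_packet
    return (VBAN_HEADER_SIZE + payload) * 8 * packets_per_second / 1000.0

print(f"{packet_duration_ms(48000, 256):.2f} ms")             # 5.33 ms
print(f"{packet_duration_ms(48000, 128):.2f} ms")             # 2.67 ms
print(f"{packet_duration_ms(44100, 256):.2f} ms")             # 5.80 ms
print(f"{stream_bitrate_kbps(48000, 256, 2, 2):.0f} kbit/s")  # 1578 kbit/s
```

Smaller packets lower the per-packet latency but raise the packet rate, so the fixed 28-byte header takes a larger share of the bandwidth.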

Protocol Constants

// VBAN Header Size
#define VBAN_HEADER_SIZE        28    // 4 + 1 + 1 + 1 + 1 + 16 + 4

// Maximum sizes
#define VBAN_DATA_MAX_SIZE      1436  // VBAN_PACKET_MAX_SIZE - VBAN_HEADER_SIZE
#define VBAN_PACKET_MAX_SIZE    1464  // 1500 MTU - 20 IP - 8 UDP - 8 safety margin
#define VBAN_SAMPLES_MAX        256   // Per packet

// Protocol types (bits 7-5 of format_SR)
#define VBAN_PROTOCOL_AUDIO     0x00  // PCM audio
#define VBAN_PROTOCOL_SERIAL    0x20  // Serial/MIDI
#define VBAN_PROTOCOL_TXT       0x40  // Text commands
#define VBAN_PROTOCOL_SERVICE   0x60  // Ping/discovery
#define VBAN_PROTOCOL_MASK      0xE0

// Sample rate mask (bits 4-0 of format_SR)
#define VBAN_SR_MASK            0x1F
#define VBAN_SR_MAXNUMBER       21

// Data types (bits 2-0 of format_bit; bit 3 is reserved)
#define VBAN_DATATYPE_U8        0x00  // Unsigned 8-bit
#define VBAN_DATATYPE_S16       0x01  // Signed 16-bit
#define VBAN_DATATYPE_S24       0x02  // Signed 24-bit
#define VBAN_DATATYPE_S32       0x03  // Signed 32-bit
#define VBAN_DATATYPE_F32       0x04  // Float 32-bit
#define VBAN_DATATYPE_F64       0x05  // Float 64-bit
#define VBAN_DATATYPE_12BIT     0x06  // 12-bit packed
#define VBAN_DATATYPE_10BIT     0x07  // 10-bit packed
#define VBAN_DATATYPE_MASK      0x07

// Codec types (bits 7-4 of format_bit)
#define VBAN_CODEC_PCM          0x00  // Native PCM
#define VBAN_CODEC_VBCA         0x10  // VB-Audio compressed
#define VBAN_CODEC_VBCV         0x20  // VB-Audio voice
#define VBAN_CODEC_MASK         0xF0

// Default port
#define VBAN_DEFAULT_PORT       6980
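
Two of the limits above bound how many samples fit in one packet: the 256-sample cap and the 1436-byte payload cap. A sketch of that calculation in Python, the language used by the projects below. The helper name and the bytes-per-sample table are assumptions made for illustration, and the packed 10-bit and 12-bit formats are left out.

```python
VBAN_DATA_MAX_SIZE = 1436
VBAN_SAMPLES_MAX = 256

# Bytes per sample for the byte-aligned data types (key = VBAN_DATATYPE_* value).
BYTES_PER_SAMPLE = {0: 1, 1: 2, 2: 3, 3: 4, 4: 4, 5: 8}

def max_samples_per_packet(channels: int, datatype: int) -> int:
    """Largest samples-per-packet value that respects both limits."""
    frame_bytes = channels * BYTES_PER_SAMPLE[datatype]
    return min(VBAN_SAMPLES_MAX, VBAN_DATA_MAX_SIZE // frame_bytes)

print(max_samples_per_packet(2, 1))   # stereo S16: 256 samples (1024 bytes)
print(max_samples_per_packet(8, 1))   # 8-channel S16: 89 samples (1424 bytes)
print(max_samples_per_packet(8, 2))   # 8-channel S24: 59 samples (1416 bytes)
```

With many channels the byte limit, not the 256-sample limit, decides the packet size.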

Project List

Projects are ordered from basic protocol understanding to complete audio systems and embedded implementations.


Project 1: VBAN Packet Analyzer (See the Wire Format)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C, Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Network Protocols / Binary Parsing
  • Software or Tool: Wireshark, Python
  • Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens

What you’ll build: A packet analyzer that captures VBAN packets from the network (or reads pcap files), decodes the 28-byte header, and displays stream information: sample rate, channels, bit depth, stream name, and frame counter.

Why it teaches VBAN: Before you can send or receive audio, you need to see the protocol. This project forces you to understand every byte in the header—the foundation of everything else.

Core challenges you’ll face:

  • Capturing UDP packets → maps to socket programming with raw sockets or pcap
  • Parsing binary headers → maps to struct unpacking, endianness
  • Decoding indexed values → maps to sample rate lookup tables
  • Filtering by stream name → maps to null-terminated ASCII strings
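
For the stream-name challenge, note that the field is a fixed 16 bytes: a full-length name has no terminator, while a shorter name is followed by padding. A minimal sketch, with hypothetical helper names, that cuts at the first null byte instead of only stripping trailing nulls:

```python
def decode_stream_name(field: bytes) -> str:
    """Decode the 16-byte stream name field, stopping at the first NUL."""
    name = field.split(b'\x00', 1)[0]
    return name.decode('ascii', errors='replace')

def encode_stream_name(name: str) -> bytes:
    """Encode a name into exactly 16 bytes, truncating or NUL-padding."""
    return name.encode('ascii', errors='replace')[:16].ljust(16, b'\x00')

print(decode_stream_name(b'Stream1' + b'\x00' * 9))              # Stream1
print(decode_stream_name(b'Stream1\x00junk\x00\x00\x00\x00'))    # Stream1
print(len(encode_stream_name('A' * 20)))                         # 16
```

Cutting at the first null keeps stray bytes after the terminator out of the name used for filtering.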

Resources for key challenges:

Key Concepts:

  • UDP Sockets: “The Linux Programming Interface” Chapter 57 - Michael Kerrisk
  • Binary Parsing: Python struct module documentation
  • Network Capture: Wireshark User Guide

  • Difficulty: Beginner
  • Time estimate: Weekend
  • Prerequisites: Basic Python, understanding of networking basics

Real world outcome:

$ sudo python vban_analyzer.py --interface eth0
Listening for VBAN packets on eth0:6980...

[VBAN Audio] 192.168.1.100 → 192.168.1.255
  Stream: "Stream1"
  Protocol: AUDIO (PCM)
  Sample Rate: 48000 Hz (index 3)
  Channels: 2 (stereo)
  Bit Depth: 16-bit signed
  Samples/Frame: 256
  Frame #: 123456
  Payload: 1024 bytes (256 samples × 2 ch × 2 bytes)

[VBAN Audio] 192.168.1.100 → 192.168.1.255
  Stream: "Stream1"
  Frame #: 123457 (+1)
  ...

[VBAN Text] 192.168.1.50 → 192.168.1.100
  Stream: "Command1"
  Protocol: TEXT
  Content: "Strip[0].Mute = 1"

Implementation Hints:

VBAN header structure in Python:

import struct

# VBAN Header: 28 bytes total
# 'VBAN' (4) + format_SR (1) + format_nbs (1) + format_nbc (1) +
# format_bit (1) + streamname (16) + nuFrame (4)

VBAN_HEADER_FORMAT = '<4sBBBB16sI'  # Little-endian
VBAN_HEADER_SIZE = 28

# Sample rate lookup table (21 entries)
VBAN_SR_LIST = [
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600
]

def parse_vban_header(data):
    if len(data) < VBAN_HEADER_SIZE:
        return None

    vban, format_sr, format_nbs, format_nbc, format_bit, \
        streamname, nu_frame = struct.unpack(VBAN_HEADER_FORMAT, data[:28])

    # Validate magic bytes
    if vban != b'VBAN':
        return None

    # Extract fields
    protocol = (format_sr & 0xE0) >> 5
    sr_index = format_sr & 0x1F
    sample_rate = VBAN_SR_LIST[sr_index] if sr_index < 21 else 0
    samples_per_frame = format_nbs + 1
    channels = format_nbc + 1
    bit_format = format_bit & 0x07
    codec = (format_bit & 0xF0) >> 4
    stream_name = streamname.rstrip(b'\x00').decode('ascii', errors='replace')

    return {
        'protocol': protocol,
        'sample_rate': sample_rate,
        'samples': samples_per_frame,
        'channels': channels,
        'bit_format': bit_format,
        'codec': codec,
        'stream_name': stream_name,
        'frame': nu_frame
    }

UDP listener:

import socket

def listen_for_vban(port=6980):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(('0.0.0.0', port))

    print(f"Listening on port {port}...")

    while True:
        data, addr = sock.recvfrom(1500)
        header = parse_vban_header(data)
        if header:
            print(f"From {addr}: {header['stream_name']} - "
                  f"{header['sample_rate']}Hz, {header['channels']}ch, "
                  f"Frame #{header['frame']}")

Questions to guide your implementation:

  • What happens if packets arrive out of order? (Check frame counter)
  • How do you distinguish audio from text packets? (Protocol bits)
  • What’s the relationship between sample rate, channels, and packet size?
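
One way to approach the out-of-order question is to compare each frame counter with the expected value using 32-bit modular arithmetic. The sketch below is an assumption about how an analyzer might classify packets, not behaviour defined by VBAN. The class name and the half-range threshold are illustrative choices.

```python
class FrameTracker:
    """Classify packets of one stream by their 32-bit frame counter."""

    def __init__(self):
        self.expected = None   # next frame number we hope to see
        self.lost = 0          # frames skipped by forward jumps
        self.late = 0          # packets older than the expected frame

    def update(self, frame: int) -> str:
        if self.expected is None:
            self.expected = (frame + 1) & 0xFFFFFFFF
            return 'first'
        # Forward distance from the expected frame, modulo 2**32.
        delta = (frame - self.expected) & 0xFFFFFFFF
        if delta >= 0x80000000:
            self.late += 1     # reordered or duplicate packet
            return 'late'
        status = 'in-order' if delta == 0 else 'gap'
        self.lost += delta
        self.expected = (frame + 1) & 0xFFFFFFFF
        return status

tracker = FrameTracker()
for n in (10, 11, 14, 12, 15):
    print(n, tracker.update(n))
```

In this sketch a frame that arrives late has already been counted as lost by the earlier gap, so `lost` is an upper bound when reordering occurs.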

Learning milestones:

  1. You capture raw packets → You understand UDP socket binding
  2. You parse the header correctly → You understand binary formats
  3. You decode all field types → You understand the protocol structure
  4. You track frame numbers → You understand packet ordering

Project 2: Simple VBAN Receiver (Play Network Audio)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C, Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Audio I/O / Network Programming
  • Software or Tool: Python, PyAudio or sounddevice
  • Main Book: “The Audio Programming Book” by Richard Boulanger

What you’ll build: A command-line VBAN receiver that listens for a specific stream, buffers incoming audio, and plays it through your speakers in real-time.

Why it teaches VBAN: Receiving audio forces you to handle real-time constraints—buffering, jitter compensation, and sample rate matching. This is where VBAN’s “slave” model becomes real.

Core challenges you’ll face:

  • Jitter buffering → maps to compensating for network timing variance
  • Sample rate matching → maps to configuring audio output to match stream
  • Stream filtering → maps to selecting specific stream by name/IP
  • Dropout handling → maps to detecting lost packets, silence insertion

Key Concepts:

  • Audio I/O: PyAudio or sounddevice documentation
  • Ring Buffers: For audio buffering
  • Jitter: “Computer Networks” Chapter 6 - Tanenbaum

  • Difficulty: Intermediate
  • Time estimate: 1 week
  • Prerequisites: Project 1, understanding of audio basics

Real world outcome:

$ python vban_receiver.py --stream "Stream1" --ip 192.168.1.100
VBAN Receiver v1.0
Listening for stream "Stream1" from 192.168.1.100 on port 6980

[Connected] Receiving: 48000 Hz, 2 channels, 16-bit
Buffer: [████████████░░░░░░░░] 60% | Latency: 15ms

Press Ctrl+C to stop...

^C
Statistics:
  Packets received: 12,345
  Packets lost: 3 (0.02%)
  Underruns: 0
  Total time: 64.2 seconds

Implementation Hints:

Audio output with sounddevice:

import socket
from collections import deque

import numpy as np
import sounddevice as sd

# parse_vban_header() and VBAN_HEADER_SIZE come from Project 1.

class VBANReceiver:
    def __init__(self, stream_name, buffer_ms=50):
        self.stream_name = stream_name
        self.buffer = deque(maxlen=100)  # Ring buffer
        self.audio_stream = None
        self.sample_rate = None
        self.channels = None
        self.running = False
        self.buffer_target_ms = buffer_ms
        self.last_frame = -1
        self.packets_lost = 0

    def audio_callback(self, outdata, frames, time, status):
        """Called by sounddevice when it needs audio"""
        if status:
            print(f"Audio status: {status}")

        if len(self.buffer) > 0:
            # Get audio from buffer
            audio = self.buffer.popleft()
            # Ensure correct shape
            if len(audio) < frames:
                # Pad with zeros if underrun
                audio = np.pad(audio, ((0, frames - len(audio)), (0, 0)))
            outdata[:] = audio[:frames]
        else:
            # Buffer underrun - output silence
            outdata.fill(0)

    def process_packet(self, data, addr):
        header = parse_vban_header(data)
        if not header:
            return

        # Filter by stream name
        if header['stream_name'] != self.stream_name:
            return

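        # Check for lost packets (the counter is 32-bit, so compare modulo 2**32)
        if self.last_frame >= 0:
            expected = (self.last_frame + 1) & 0xFFFFFFFF
            gap = (header['frame'] - expected) & 0xFFFFFFFF
            # A gap in the upper half of the range means a late packet or a
            # restarted sender, not billions of lost packets
            if 0 < gap < 0x80000000:
                self.packets_lost += gap
        self.last_frame = header['frame']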

        # Initialize audio stream on first packet
        if self.audio_stream is None:
            self.sample_rate = header['sample_rate']
            self.channels = header['channels']
            self.start_audio()

        # Convert PCM data to numpy array
        audio_data = data[VBAN_HEADER_SIZE:]
        samples = self.convert_audio(audio_data, header)
        self.buffer.append(samples)

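    def convert_audio(self, data, header):
        """Convert little-endian PCM bytes to float32, shape (samples, channels)"""
        fmt = header['bit_format']
        if fmt == 0:    # U8, assuming the usual 128 midpoint for unsigned audio
            audio = (np.frombuffer(data, dtype=np.uint8).astype(np.float32) - 128.0) / 128.0
        elif fmt == 1:  # S16
            audio = np.frombuffer(data, dtype='<i2').astype(np.float32) / 32768.0
        elif fmt == 2:  # S24: three bytes per sample, no matching numpy dtype
            raw = np.frombuffer(data, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
            value = raw[:, 0] | (raw[:, 1] << 8) | (raw[:, 2] << 16)
            value = np.where(value >= 0x800000, value - 0x1000000, value)  # sign-extend
            audio = value.astype(np.float32) / 8388608.0
        elif fmt == 3:  # S32
            audio = np.frombuffer(data, dtype='<i4').astype(np.float32) / 2147483648.0
        elif fmt == 4:  # F32
            audio = np.frombuffer(data, dtype='<f4')
        elif fmt == 5:  # F64
            audio = np.frombuffer(data, dtype='<f8').astype(np.float32)
        else:           # Packed 10-bit and 12-bit formats are not handled here
            raise ValueError(f"Unsupported bit format: {fmt}")

        # Reshape to (samples, channels)
        return audio.reshape(-1, header['channels'])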

    def start_audio(self):
        self.audio_stream = sd.OutputStream(
            samplerate=self.sample_rate,
            channels=self.channels,
            callback=self.audio_callback,
            blocksize=256
        )
        self.audio_stream.start()

Main receive loop:

def main():
    receiver = VBANReceiver(stream_name="Stream1")

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', 6980))

    print(f"Listening for VBAN stream...")

    try:
        while True:
            data, addr = sock.recvfrom(1500)
            receiver.process_packet(data, addr)
    except KeyboardInterrupt:
        print("\nStopping...")

Learning milestones:

  1. Audio plays (with glitches) → You understand the basic pipeline
  2. Buffering reduces glitches → You understand jitter compensation
  3. Lost packets detected → You understand frame counter usage
  4. Clean playback achieved → You’ve built a working receiver
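
The buffering milestone can be sketched as a small jitter buffer that waits until a target depth is reached before playback starts, and refills after an underrun. This is one possible design under stated assumptions: fixed-size packets, a depth derived from a `buffer_ms` setting, and illustrative names (`JitterBuffer`, `push`, `pop`) that do not come from any VBAN library.

```python
import math
from collections import deque

class JitterBuffer:
    """Prefill-then-play queue of fixed-size audio blocks."""

    def __init__(self, sample_rate, samples_per_packet=256, buffer_ms=50, silence=None):
        packet_ms = 1000.0 * samples_per_packet / sample_rate
        self.target = max(1, math.ceil(buffer_ms / packet_ms))  # packets to prefill
        self.queue = deque(maxlen=self.target * 4)  # oldest block dropped on overflow
        self.silence = silence    # block to play while (re)filling, e.g. zeros
        self.filling = True
        self.underruns = 0

    def push(self, block):
        """Called from the network loop with one decoded packet."""
        self.queue.append(block)
        if self.filling and len(self.queue) >= self.target:
            self.filling = False

    def pop(self):
        """Called from the audio callback; returns a block or the silence block."""
        if self.filling:
            return self.silence
        if not self.queue:
            self.underruns += 1
            self.filling = True   # rebuild the cushion before resuming playback
            return self.silence
        return self.queue.popleft()

jb = JitterBuffer(48000, buffer_ms=20)
print(jb.target)   # 4 packets of 5.33 ms are needed to cover 20 ms
```

A deeper buffer absorbs more network jitter at the cost of added latency, which is the trade-off the `Latency` figure in the sample output reflects.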

Project 3: VBAN Emitter (Send Audio to the Network)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C, Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Audio Capture / Network Programming
  • Software or Tool: Python, PyAudio or sounddevice
  • Main Book: “The Audio Programming Book” by Richard Boulanger

What you’ll build: A command-line VBAN emitter that captures audio from your microphone or system audio and streams it over the network.

Why it teaches VBAN: Being the sender (master) teaches you about timing—you control the clock. You’ll understand packet pacing and why the sender drives everything.

Core challenges you’ll face:

  • Audio capture → maps to reading from input devices
  • Packet timing → maps to sending at consistent intervals
  • Header construction → maps to building valid VBAN packets
  • Broadcast vs unicast → maps to choosing destination addressing

Key Concepts:

  • Audio Capture: Platform audio APIs
  • Packet Pacing: Timing audio transmission
  • UDP Broadcasting: Sending to multiple receivers
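
For the broadcast case, the destination is the directed broadcast address of the local subnet. A sketch using the standard `ipaddress` module. The interface addresses are example values and the helper names are hypothetical.

```python
import ipaddress
import socket

def broadcast_address(interface_cidr: str) -> str:
    """Directed broadcast address for an interface given as 'ip/prefix'."""
    return str(ipaddress.ip_interface(interface_cidr).network.broadcast_address)

def open_broadcast_socket() -> socket.socket:
    """UDP socket that is allowed to send to broadcast addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock

print(broadcast_address('192.168.1.100/24'))   # 192.168.1.255
print(broadcast_address('10.0.5.20/22'))       # 10.0.7.255
```

Unicast to a single receiver needs no special socket option; broadcast reaches every host on the subnet, including those that do not want the stream.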

  • Difficulty: Intermediate
  • Time estimate: 1 week
  • Prerequisites: Project 2, audio capture basics

Real world outcome:

$ python vban_emitter.py --stream "MyStream" --dest 192.168.1.255 \
    --device "Microphone" --rate 48000 --channels 2

VBAN Emitter v1.0
Source: Microphone (Built-in)
Destination: 192.168.1.255:6980 (broadcast)
Stream: "MyStream"
Format: 48000 Hz, 2 channels, 16-bit PCM

Streaming... [████████████████████] 100% CPU: 2%
Packets sent: 15,234 | Bytes: 16.0 MB | Time: 81.2s

Press Ctrl+C to stop...

Implementation Hints:

VBAN packet construction:

import socket
import struct

import numpy as np

# VBAN_SR_LIST comes from Project 1.

class VBANEmitter:
    def __init__(self, stream_name, dest_ip, dest_port=6980,
                 sample_rate=48000, channels=2, bit_depth=16):
        if bit_depth != 16:
            raise ValueError("This sketch only implements 16-bit PCM")
        self.stream_name = stream_name
        self.dest = (dest_ip, dest_port)
        self.sample_rate = sample_rate
        self.channels = channels
        self.bit_depth = bit_depth
        self.frame_counter = 0

        # Find sample rate index (ValueError if VBAN does not define the rate)
        self.sr_index = VBAN_SR_LIST.index(sample_rate)

        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Allow broadcast destinations; harmless for unicast
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

    def build_header(self, samples_per_frame):
        """Construct 28-byte VBAN header"""
        if not 1 <= samples_per_frame <= 256:
            raise ValueError("VBAN carries 1-256 samples per packet")

        # format_SR: protocol (3 bits) + sample rate index (5 bits)
        format_sr = (0x00 << 5) | (self.sr_index & 0x1F)  # Audio protocol

        # format_nbs: samples per frame - 1
        format_nbs = samples_per_frame - 1

        # format_nbc: channels - 1
        format_nbc = self.channels - 1

        # format_bit: codec (bits 7-4) + data type (bits 2-0)
        bit_format = 0x01  # S16
        format_bit = (0x00 << 4) | (bit_format & 0x07)  # PCM codec

        # Stream name (16 bytes, null-padded)
        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')

        header = struct.pack('<4sBBBB16sI',
            b'VBAN',
            format_sr,
            format_nbs,
            format_nbc,
            format_bit,
            stream_bytes,
            self.frame_counter
        )

        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF
        return header

    def send_audio(self, audio_data):
        """Send audio samples as VBAN packet"""
        # audio_data: float numpy array in [-1.0, 1.0], shape (samples, channels)
        samples_per_frame = len(audio_data)

        # Clip before scaling so out-of-range input cannot wrap around,
        # then convert to little-endian signed 16-bit
        pcm_data = (np.clip(audio_data, -1.0, 1.0) * 32767).astype('<i2').tobytes()
        if len(pcm_data) > 1436:
            raise ValueError("Payload exceeds the 1436-byte VBAN data limit")

        # Build packet
        header = self.build_header(samples_per_frame)
        packet = header + pcm_data

        # Send
        self.sock.sendto(packet, self.dest)

Audio capture and streaming:

def audio_input_callback(indata, frames, time, status):
    """Called when audio input is available"""
    if status:
        print(f"Input status: {status}")

    # indata is numpy array of shape (frames, channels)
    emitter.send_audio(indata.copy())

def start_streaming():
    global emitter
    emitter = VBANEmitter(
        stream_name="MyStream",
        dest_ip="192.168.1.255",
        sample_rate=48000,
        channels=2
    )

    # Start audio input stream
    with sd.InputStream(
        samplerate=48000,
        channels=2,
        callback=audio_input_callback,
        blocksize=256  # 256 samples = one VBAN packet
    ):
        print("Streaming... Press Ctrl+C to stop")
        while True:
            sd.sleep(1000)

Learning milestones:

  1. Packets are sent → You understand header construction
  2. Voicemeeter receives them → Your packets are valid
  3. Audio plays correctly → Timing and format are correct
  4. Works across network → Broadcast/unicast works
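
Before wiring up a microphone, it can help to stream a generated tone, which also makes the pacing problem explicit. The sketch below is only an illustration: `sine_block` and `paced_indices` are hypothetical helpers, and scheduling against absolute deadlines is one common pacing approach, not something VBAN mandates.

```python
import math
import struct
import time

def sine_block(freq, sample_rate, n_samples, start, channels=2, amplitude=0.5):
    """Interleaved little-endian S16 payload for one packet of a sine tone."""
    values = []
    for i in range(start, start + n_samples):
        s = int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / sample_rate))
        values.extend([s] * channels)    # same sample on every channel
    return struct.pack(f'<{len(values)}h', *values)

def paced_indices(sample_rate, samples_per_packet, count):
    """Yield packet indices, sleeping so packet k leaves at t0 + k * period."""
    period = samples_per_packet / sample_rate
    t0 = time.monotonic()
    for k in range(count):
        delay = t0 + k * period - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        yield k

for k in paced_indices(48000, 256, 3):
    payload = sine_block(440.0, 48000, 256, start=k * 256)
    print(k, len(payload))    # 1024 bytes per packet for stereo S16
```

Computing each deadline from the start time, instead of sleeping a fixed period after every send, keeps small scheduling errors from accumulating into drift. When capturing from a sound card, the audio callback provides this pacing for you.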

Project 4: VBAN Text Protocol (Remote Control)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C, Go, JavaScript
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Remote Control / Command Protocols
  • Software or Tool: Voicemeeter, Python
  • Main Book: “Design Patterns” by Gang of Four (Command pattern)

What you’ll build: A tool that sends VBAN text commands to control Voicemeeter remotely—muting channels, adjusting volumes, changing settings.

Why it teaches VBAN sub-protocols: VBAN isn’t just audio. The TEXT sub-protocol shows how the same packet format extends to different data types. You’ll learn protocol design flexibility.

Core challenges you’ll face:

  • Text encoding → maps to UTF-8 in VBAN packets
  • Command syntax → maps to Voicemeeter’s control language
  • Bidirectional communication → maps to sending commands, receiving state
  • Reliable delivery → maps to UDP unreliability, retries

Key Concepts:

  • VBAN Text Protocol: VB-Audio documentation
  • Command Pattern: Remote control design
  • UTF-8 Encoding: Text in binary protocols

Difficulty: Intermediate Time estimate: Weekend Prerequisites: Project 1

Real world outcome:

# Mute input strip 0
$ vban_sendtext --ip 192.168.1.100 "Strip[0].Mute = 1"
Sent: Strip[0].Mute = 1

# Set volume of bus 0
$ vban_sendtext --ip 192.168.1.100 "Bus[0].Gain = -6.0"
Sent: Bus[0].Gain = -6.0

# Interactive mode
$ vban_sendtext --ip 192.168.1.100 --interactive
VBAN Text Console (connected to 192.168.1.100)
> Strip[0].Mute = 0
Sent.
> Strip[0].Gain = -3.0
Sent.
> quit

Implementation Hints:

VBAN text packet construction:

import socket
import struct

class VBANText:
    def __init__(self, dest_ip, dest_port=6980, stream_name="Command1"):
        self.dest = (dest_ip, dest_port)
        self.stream_name = stream_name
        self.frame_counter = 0
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_text(self, text):
        """Send text command via VBAN"""

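        # format_SR: TEXT sub-protocol (0x40) in the top 3 bits.
        # The low 5 bits are a bps index, not the encoding; 0 means no defined rate.
        format_sr = 0x40

        # format_nbs is unused and format_nbc is the channel ident (both 0 here)
        format_nbs = 0
        format_nbc = 0

        # format_bit: low bits = data format (0 = 8-bit), high nibble = text encoding
        # Encoding: 0x00=ASCII, 0x10=UTF8, 0x20=WCHAR
        format_bit = 0x10  # UTF8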

        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')

        header = struct.pack('<4sBBBB16sI',
            b'VBAN',
            format_sr,
            format_nbs,
            format_nbc,
            format_bit,
            stream_bytes,
            self.frame_counter
        )

        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF

        # Text payload (UTF-8 encoded)
        payload = text.encode('utf-8')

        packet = header + payload
        self.sock.sendto(packet, self.dest)

        return True
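
One way to sanity-check a text packet without Voicemeeter is to parse it back. The sketch below is an illustration under assumptions: `build_text_packet` and `parse_text_packet` are hypothetical helpers, the header layout is copied from the `struct.pack` call above, and only the top three protocol bits of format_SR are inspected.

```python
import struct

VBAN_PROTOCOL_MASK = 0xE0
VBAN_PROTOCOL_TEXT = 0x40
HEADER_FORMAT = '<4sBBBB16sI'                 # same layout as the pack call above
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 28 bytes

def build_text_packet(text, stream_name="Command1", frame=0):
    """Build a UTF-8 VBAN text packet (header + payload)."""
    name = stream_name.encode('ascii')[:16].ljust(16, b'\x00')
    header = struct.pack(HEADER_FORMAT, b'VBAN', VBAN_PROTOCOL_TEXT,
                         0, 0, 0x10, name, frame)
    return header + text.encode('utf-8')

def parse_text_packet(packet):
    """Return a dict for a valid text packet, or None otherwise."""
    if len(packet) < HEADER_SIZE:
        return None
    magic, fmt_sr, _nbs, _nbc, _fmt_bit, name, frame = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE])
    if magic != b'VBAN':
        return None
    if (fmt_sr & VBAN_PROTOCOL_MASK) != VBAN_PROTOCOL_TEXT:
        return None
    return {
        'stream_name': name.split(b'\x00', 1)[0].decode('ascii'),
        'frame': frame,
        'text': packet[HEADER_SIZE:].decode('utf-8'),
    }
```

A round trip such as `parse_text_packet(build_text_packet("Strip[0].Mute = 1"))` should return the original text, which makes a convenient unit test before any packet touches the network.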

Voicemeeter command examples:

# Common Voicemeeter commands
COMMANDS = {
    # Strips (inputs)
    'mute_strip': 'Strip[{n}].Mute = {v}',
    'strip_gain': 'Strip[{n}].Gain = {v}',
    'strip_solo': 'Strip[{n}].Solo = {v}',

    # Buses (outputs)
    'mute_bus': 'Bus[{n}].Mute = {v}',
    'bus_gain': 'Bus[{n}].Gain = {v}',

    # Routing
    'strip_to_bus': 'Strip[{strip}].{bus} = {v}',  # bus is A1-A5 or B1-B3

    # Recorder
    'record': 'Recorder.Record = {v}',
    'stop': 'Recorder.Stop = {v}',
}

def mute_strip(vban_text, strip_number, muted=True):
    cmd = f"Strip[{strip_number}].Mute = {1 if muted else 0}"
    vban_text.send_text(cmd)
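
The template dictionary above can be turned into command strings with `str.format`. This is a minimal sketch: `COMMAND_TEMPLATES` repeats a few of the entries so the block runs on its own, and `build_command` is a hypothetical helper name.

```python
COMMAND_TEMPLATES = {
    'mute_strip': 'Strip[{n}].Mute = {v}',
    'strip_gain': 'Strip[{n}].Gain = {v}',
    'bus_gain': 'Bus[{n}].Gain = {v}',
}

def build_command(name, **params):
    """Fill in a named template; raise ValueError for unknown names."""
    try:
        template = COMMAND_TEMPLATES[name]
    except KeyError:
        raise ValueError(f"Unknown command: {name}") from None
    return template.format(**params)
```

For example, `build_command('mute_strip', n=0, v=1)` produces the same string as the first command in the real world outcome above, ready to pass to `send_text`.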

Learning milestones:

  1. Text packets construct → You understand the TEXT sub-protocol
  2. Voicemeeter responds → Your commands are valid
  3. Complex commands work → You understand the control syntax
  4. You build an interface → You’ve created a remote control

Project 5: VBAN Serial/MIDI (Musical Control)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: MIDI Protocol / Serial Communication
  • Software or Tool: Python, mido (MIDI library)
  • Main Book: “MIDI Manual” by David Miles Huber

What you’ll build: A MIDI-over-VBAN bridge that sends and receives MIDI messages, enabling control of DAWs and synthesizers over the network.

Why it teaches VBAN extensions: The SERIAL sub-protocol shows VBAN’s versatility. You’ll learn how binary protocols can carry different payloads and how MIDI integrates with network transport.

Core challenges you’ll face:

  • MIDI message format → maps to note on/off, CC, pitch bend
  • Serial header configuration → maps to baud rate in SR field
  • Timing accuracy → maps to MIDI timing requirements
  • Multiple messages per packet → maps to efficient MIDI bundling

Key Concepts:

  • MIDI Protocol: MIDI 1.0 specification
  • VBAN Serial: VB-Audio documentation
  • Real-time Constraints: MIDI timing requirements

Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Project 1, MIDI basics

Real world outcome:

$ vban_midi_bridge --ip 192.168.1.100 --local-port 6980 \
    --midi-in "USB MIDI Controller" --midi-out "Virtual MIDI Port"

VBAN MIDI Bridge v1.0
Local MIDI In: USB MIDI Controller
Local MIDI Out: Virtual MIDI Port
Remote: 192.168.1.100:6980

[TX] Note On:  Ch1 C4 vel:100
[TX] Note Off: Ch1 C4 vel:0
[RX] CC: Ch1 CC7 val:80
[RX] Note On: Ch1 E4 vel:90
...

Implementation Hints:

VBAN Serial/MIDI header:

import socket
import struct

class VBANMidi:
    def __init__(self, dest_ip, dest_port=6980, stream_name="Midi1"):
        self.dest = (dest_ip, dest_port)
        self.stream_name = stream_name
        self.frame_counter = 0
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_midi(self, midi_messages):
        """
        Send MIDI messages via VBAN Serial protocol
        midi_messages: list of bytes objects (raw MIDI)
        """

        # format_SR: SERIAL sub-protocol (0x20) in the top 3 bits.
        # The low 5 bits are an index into the spec's bps table (not bps/100);
        # 0 means no defined rate. MIDI's nominal rate is 31250 baud.
        format_sr = 0x20 | 0x00  # SERIAL + undefined bps

        # format_nbs: serial port configuration bits (0 here)
        format_nbs = 0

        # format_nbc: channel ident (0)
        format_nbc = 0

        # format_bit: high nibble is the serial stream type
        format_bit = 0x10  # MIDI

        stream_bytes = self.stream_name.encode('ascii')[:16].ljust(16, b'\x00')

        header = struct.pack('<4sBBBB16sI',
            b'VBAN',
            format_sr,
            format_nbs,
            format_nbc,
            format_bit,
            stream_bytes,
            self.frame_counter
        )

        self.frame_counter = (self.frame_counter + 1) & 0xFFFFFFFF

        # MIDI payload: concatenate all messages
        payload = b''.join(midi_messages)

        packet = header + payload
        self.sock.sendto(packet, self.dest)

MIDI message construction:

def note_on(channel, note, velocity):
    """Create MIDI Note On message"""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, velocity=0):
    """Create MIDI Note Off message"""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def control_change(channel, controller, value):
    """Create MIDI Control Change message"""
    return bytes([0xB0 | (channel & 0x0F), controller & 0x7F, value & 0x7F])

def pitch_bend(channel, value):
    """Create MIDI Pitch Bend message (value: -8192 to 8191)"""
    value = max(-8192, min(8191, value))  # Clamp to the 14-bit signed range
    value = value + 8192  # Convert to 0-16383
    lsb = value & 0x7F
    msb = (value >> 7) & 0x7F
    return bytes([0xE0 | (channel & 0x0F), lsb, msb])
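
On the receiving side, a concatenated payload has to be split back into individual messages. The sketch below assumes every message starts with its own status byte (no running status) and that only channel voice messages are present. `split_midi` is a hypothetical helper, not part of mido or VBAN.

```python
def split_midi(payload):
    """Split concatenated channel voice messages into a list of bytes objects."""
    messages = []
    i = 0
    while i < len(payload):
        status = payload[i]
        if status < 0x80:
            raise ValueError("data byte where a status byte was expected")
        kind = status & 0xF0
        if kind == 0xF0:
            raise ValueError("system messages are not handled in this sketch")
        # Program Change (0xC0) and Channel Pressure (0xD0) carry one data byte;
        # the other channel voice messages carry two.
        length = 2 if kind in (0xC0, 0xD0) else 3
        message = bytes(payload[i:i + length])
        if len(message) < length:
            raise ValueError("truncated MIDI message")
        messages.append(message)
        i += length
    return messages
```

A production bridge would also need to handle running status, System Exclusive, and real-time messages, which this sketch rejects rather than guesses at.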

Learning milestones:

  1. MIDI messages send → You understand MIDI format
  2. DAW receives notes → Your VBAN-MIDI bridge works
  3. Bidirectional works → You handle both directions
  4. Timing is accurate → MIDI performance is usable

Project 6: ESP32 VBAN Audio Node (Embedded Implementation)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: C++ (Arduino)
  • Alternative Programming Languages: C, MicroPython
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Embedded Systems / I2S Audio
  • Software or Tool: ESP32, Arduino IDE, I2S microphone/DAC
  • Main Book: “Making Embedded Systems” by Elecia White

What you’ll build: An ESP32-based network audio device that can send microphone audio to the network (emitter) and/or receive audio and play it through speakers (receiver).

Why it teaches embedded VBAN: Moving from PC to microcontroller forces you to understand memory constraints, DMA, and real-time requirements. This is where VBAN becomes a hardware protocol.

Core challenges you’ll face:

  • I2S audio interface → maps to hardware audio on ESP32
  • WiFi UDP → maps to wireless networking on microcontrollers
  • DMA buffers → maps to efficient audio transfer
  • Real-time constraints → maps to timing on embedded systems

Key Concepts:

  • I2S Protocol: ESP32 I2S documentation
  • DMA: Direct Memory Access for audio
  • WiFi UDP: ESP32 networking

Difficulty: Advanced Time estimate: 2-3 weeks Prerequisites: Projects 1-3, Arduino basics, soldering

Real world outcome:

Hardware Setup:
┌─────────────────────────────────────────────────────────────────┐
│  ESP32 VBAN Audio Node                                          │
│                                                                  │
│  ┌──────────┐     ┌──────────┐     ┌──────────┐                │
│  │ INMP441  │────►│  ESP32   │────►│ MAX98357 │                │
│  │ I2S Mic  │     │          │     │ I2S DAC  │                │
│  └──────────┘     │   WiFi   │     └────┬─────┘                │
│                   │    ▲     │          │                       │
│                   └────┼─────┘          ▼                       │
│                        │           ┌─────────┐                  │
│                   VBAN UDP         │ Speaker │                  │
│                        │           └─────────┘                  │
│                        ▼                                        │
│                ┌──────────────┐                                 │
│                │ Voicemeeter  │                                 │
│                │    (PC)      │                                 │
│                └──────────────┘                                 │
└─────────────────────────────────────────────────────────────────┘

Serial Output:
ESP32 VBAN Audio Node
Connecting to WiFi: MyNetwork... Connected!
IP Address: 192.168.1.42
VBAN Emitter: streaming to 192.168.1.100:6980 as "ESP32-Mic"
VBAN Receiver: listening for "Stream1" on port 6980
Audio: 44100 Hz, 16-bit mono
Running...

Implementation Hints:

ESP32 VBAN header structure:

// VBAN Header (28 bytes)
typedef struct __attribute__((packed)) {
    char vban[4];           // "VBAN"
    uint8_t format_SR;      // Sample rate index + protocol
    uint8_t format_nbs;     // Samples per frame - 1
    uint8_t format_nbc;     // Channels - 1
    uint8_t format_bit;     // Bit format + codec
    char streamname[16];    // Stream name
    uint32_t nuFrame;       // Frame counter
} VBANHeader;

// Sample rate lookup
const uint32_t VBAN_SRList[] = {
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600
};

// Find sample rate index
uint8_t getSRIndex(uint32_t sampleRate) {
    for (int i = 0; i < 21; i++) {
        if (VBAN_SRList[i] == sampleRate) return i;
    }
    return 3;  // Default to 48000
}
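
When debugging the ESP32 from a PC, it helps to decode the format_SR byte the same way on the receiving side. This Python sketch mirrors the lookup table above. It assumes the top three bits select the sub-protocol and the low five bits hold the index, and `decode_format_sr` is a hypothetical helper name.

```python
VBAN_SR_LIST = [
    6000, 12000, 24000, 48000, 96000, 192000, 384000,
    8000, 16000, 32000, 64000, 128000, 256000, 512000,
    11025, 22050, 44100, 88200, 176400, 352800, 705600,
]

PROTOCOL_NAMES = {0x00: 'Audio', 0x20: 'Serial', 0x40: 'Text', 0x60: 'Service'}

def decode_format_sr(value):
    """Return (protocol name, sample rate or None) for a format_SR byte."""
    protocol = value & 0xE0   # top 3 bits: sub-protocol
    index = value & 0x1F      # low 5 bits: table index
    rate = None
    # The low bits are a sample rate index only for the audio sub-protocol
    if protocol == 0x00 and index < len(VBAN_SR_LIST):
        rate = VBAN_SR_LIST[index]
    return PROTOCOL_NAMES.get(protocol, 'Unknown'), rate
```

With this table, 44100 Hz is index 16 and 48000 Hz is index 3, so a packet from the node configured above should carry `format_SR == 16`.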

I2S microphone input:

#include <driver/i2s.h>

#define I2S_WS  25   // Word Select (LRCK)
#define I2S_SD  33   // Serial Data
#define I2S_SCK 32   // Serial Clock

void setupI2SMic() {
    i2s_config_t i2s_config = {
        .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
        .sample_rate = 44100,
        .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
        .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
        .communication_format = I2S_COMM_FORMAT_STAND_I2S,
        .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
        .dma_buf_count = 8,
        .dma_buf_len = 256,
        .use_apll = false
    };

    i2s_pin_config_t pin_config = {
        .bck_io_num = I2S_SCK,
        .ws_io_num = I2S_WS,
        .data_out_num = I2S_PIN_NO_CHANGE,
        .data_in_num = I2S_SD
    };

    i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
    i2s_set_pin(I2S_NUM_0, &pin_config);
}

VBAN packet sending:

#include <WiFi.h>
#include <WiFiUdp.h>

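WiFiUDP udp;
IPAddress destIP(192, 168, 1, 100);  // Receiver address (example value)
uint32_t frameCounter = 0;
uint8_t packet[1464];  // Max VBAN packet size (28-byte header + 1436 bytes of data)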

void sendVBANPacket(int16_t* samples, int numSamples) {
    VBANHeader* header = (VBANHeader*)packet;

    // Fill header
    memcpy(header->vban, "VBAN", 4);
    header->format_SR = 0x00 | getSRIndex(44100);  // Audio + 44100Hz
    header->format_nbs = numSamples - 1;
    header->format_nbc = 0;  // 1 channel
    header->format_bit = 0x01;  // 16-bit signed
    strncpy(header->streamname, "ESP32-Mic", 16);
    header->nuFrame = frameCounter++;

    // Copy audio data
    memcpy(packet + 28, samples, numSamples * 2);

    // Send
    udp.beginPacket(destIP, 6980);
    udp.write(packet, 28 + numSamples * 2);
    udp.endPacket();
}

void loop() {
    int16_t samples[256];
    size_t bytesRead;

    i2s_read(I2S_NUM_0, samples, 512, &bytesRead, portMAX_DELAY);

    if (bytesRead == 512) {
        sendVBANPacket(samples, 256);
    }
}
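
The DMA settings above also set an upper bound on capture-side buffering. A rough Python calculation, assuming `dma_buf_len` counts frames and the stream is mono as configured (`dma_capacity_ms` and `packets_buffered` are hypothetical helper names):

```python
def dma_capacity_ms(sample_rate, dma_buf_count, dma_buf_len):
    """Audio held by completely full DMA buffers, in milliseconds."""
    return dma_buf_count * dma_buf_len / sample_rate * 1000.0

def packets_buffered(dma_buf_count, dma_buf_len, samples_per_packet):
    """How many whole VBAN packets fit in the DMA buffers."""
    return (dma_buf_count * dma_buf_len) // samples_per_packet
```

With 8 buffers of 256 frames at 44100 Hz, this comes to roughly 46 ms, or eight 256-sample packets. Fewer or shorter buffers lower the worst-case latency but leave less slack when WiFi sends stall.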

Learning milestones:

  1. WiFi connects → You understand ESP32 networking
  2. I2S reads audio → You understand hardware audio
  3. Voicemeeter receives → Your packets are valid
  4. Bidirectional works → You’ve built a complete audio node

Project 7: VBAN Network Scanner (Service Discovery)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Go, Rust, C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Network Discovery / Service Protocol
  • Software or Tool: Python
  • Main Book: “Computer Networks” by Tanenbaum

What you’ll build: A network scanner that discovers VBAN devices using the SERVICE sub-protocol, lists active streams, and monitors network audio traffic.

Why it teaches VBAN discovery: VBAN has no automatic device discovery comparable to Dante's; the SERVICE sub-protocol offers only a ping exchange. Building a scanner teaches you about service discovery patterns and how to work around protocol limitations.

Core challenges you’ll face:

  • Passive discovery → maps to listening for any VBAN packets
  • Active ping → maps to SERVICE sub-protocol
  • Stream enumeration → maps to tracking unique streams
  • Network topology → maps to understanding broadcast domains

Key Concepts:

  • Service Discovery: mDNS, broadcast patterns
  • Network Scanning: Ethical considerations
  • VBAN Service Protocol: Ping packets

Difficulty: Intermediate Time estimate: 1 week Prerequisites: Project 1

Real world outcome:

$ vban_scanner --timeout 10
VBAN Network Scanner v1.0
Scanning for 10 seconds...

Discovered VBAN Devices:
┌───────────────────┬──────────────────┬─────────────┬──────────┬──────────┐
│ IP Address        │ Stream Name      │ Type        │ Format   │ Packets  │
├───────────────────┼──────────────────┼─────────────┼──────────┼──────────┤
│ 192.168.1.100     │ Stream1          │ Audio       │ 48kHz/2ch│ 1,234    │
│ 192.168.1.100     │ Stream2          │ Audio       │ 44.1kHz/1│ 567      │
│ 192.168.1.42      │ ESP32-Mic        │ Audio       │ 44.1kHz/1│ 890      │
│ 192.168.1.50      │ Command1         │ Text        │ UTF-8    │ 23       │
└───────────────────┴──────────────────┴─────────────┴──────────┴──────────┘

Stream Details:
  Stream1 @ 192.168.1.100
    Duration: 10.0s
    Packets: 1,234
    Rate: 123.4 packets/sec
    Bytes: 1.27 MB
    Lost: 0 (0.00%)

Implementation Hints:

Stream tracker:

from collections import defaultdict
import time

class VBANScanner:
    def __init__(self):
        self.streams = defaultdict(lambda: {
            'first_seen': None,
            'last_seen': None,
            'packets': 0,
            'bytes': 0,
            'last_frame': -1,
            'lost': 0,
            'sample_rate': 0,
            'channels': 0,
            'protocol': 0
        })

    def process_packet(self, data, addr):
        header = parse_vban_header(data)
        if not header:
            return

        key = (addr[0], header['stream_name'])
        stream = self.streams[key]

        now = time.time()
        if stream['first_seen'] is None:
            stream['first_seen'] = now
        stream['last_seen'] = now
        stream['packets'] += 1
        stream['bytes'] += len(data)
        stream['sample_rate'] = header['sample_rate']
        stream['channels'] = header['channels']
        stream['protocol'] = header['protocol']

        # Track lost packets
        if stream['last_frame'] >= 0:
            expected = (stream['last_frame'] + 1) & 0xFFFFFFFF
            if header['frame'] != expected:
                lost = (header['frame'] - expected) & 0xFFFFFFFF
                if lost < 1000:  # Sanity check
                    stream['lost'] += lost
        stream['last_frame'] = header['frame']

    def get_report(self):
        report = []
        for (ip, name), stream in self.streams.items():
            duration = stream['last_seen'] - stream['first_seen']
            rate = stream['packets'] / duration if duration > 0 else 0

            report.append({
                'ip': ip,
                'name': name,
                'protocol': ['Audio', 'Serial', 'Text', 'Service'][stream['protocol']],
                'sample_rate': stream['sample_rate'],
                'channels': stream['channels'],
                'packets': stream['packets'],
                'bytes': stream['bytes'],
                'rate': rate,
                'lost': stream['lost'],
                'duration': duration
            })
        return report
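
The lost-packet logic in `process_packet` relies on 32-bit modular arithmetic, which is easiest to trust once it is isolated and tested. This sketch restates the same calculation as a standalone function. `count_lost` is a hypothetical name, and the `max_gap` threshold mirrors the sanity check above.

```python
def count_lost(last_frame, frame, max_gap=1000):
    """Frames missing between last_frame and frame, with 32-bit wraparound.

    Gaps of max_gap or more are treated as a restart or reordering
    and counted as zero, like the sanity check in the scanner.
    """
    expected = (last_frame + 1) & 0xFFFFFFFF
    gap = (frame - expected) & 0xFFFFFFFF
    return gap if gap < max_gap else 0
```

A late (out-of-order) packet produces a gap close to 2^32 after masking, which is why the threshold is needed: without it, one reordered packet would add billions to the loss counter.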

Learning milestones:

  1. You see all streams → You understand passive discovery
  2. Statistics are accurate → You track packets correctly
  3. Lost packets detected → Frame counting works
  4. Clean reporting → You’ve built a useful tool

Project 8: VBAN Audio Router (Multi-Stream Hub)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C, Rust, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Audio Routing / Stream Processing
  • Software or Tool: Python, numpy
  • Main Book: “Designing Audio Effect Plugins in C++” by Will Pirkle

What you’ll build: A software audio router that receives multiple VBAN streams, mixes/routes them to different destinations, and optionally applies processing (gain, mixing).

Why it teaches advanced VBAN: Real audio systems need routing. Building a router teaches you about multiple streams, mixing, and the complexity of multi-source audio.

Core challenges you’ll face:

  • Multiple stream handling → maps to concurrent reception
  • Sample rate conversion → maps to matching different streams
  • Audio mixing → maps to combining streams without clipping
  • Routing matrix → maps to flexible input→output mapping
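
The mixing challenge comes down to two operations: converting a gain in dB to a linear factor, and summing samples with a limit on the result. A minimal pure-Python sketch, assuming float samples in the range -1.0 to 1.0 and streams already at the same sample rate (`db_to_linear` and `mix` are hypothetical helper names; real code would operate on numpy arrays):

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to an amplitude multiplier."""
    return 10 ** (gain_db / 20)

def mix(streams):
    """Sum streams sample by sample and hard-clip to -1.0..1.0.

    streams: list of (samples, gain_db) pairs, samples being lists of floats.
    The output is as long as the shortest input.
    """
    if not streams:
        return []
    length = min(len(samples) for samples, _ in streams)
    mixed = []
    for i in range(length):
        total = sum(samples[i] * db_to_linear(gain_db)
                    for samples, gain_db in streams)
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

A gain of -6 dB is a factor of about 0.5, so two full-scale inputs mixed at -6 dB each stay just within range. Hard clipping is only a safety net; it distorts audibly when it triggers often.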

Key Concepts:

  • Audio Mixing: Summing, gain staging
  • Sample Rate Conversion: Resampling algorithms
  • Thread Safety: Concurrent audio processing

Difficulty: Expert Time estimate: 3-4 weeks Prerequisites: Projects 1-3, audio processing basics

Real world outcome:

$ vban_router --config router.yaml
VBAN Audio Router v1.0

Inputs:
  [1] 192.168.1.100:Stream1 → 48kHz/2ch [████████░░] -6dB
  [2] 192.168.1.42:ESP32-Mic → 44.1kHz/1ch [██████████] 0dB
  [3] 192.168.1.50:Music → 48kHz/2ch [████░░░░░░] -12dB

Outputs:
  [A] 192.168.1.200:6980 "RouterMix"[1,2,3]
  [B] 192.168.1.201:6980 "VoiceOnly"[1,2]

Routing Matrix:
       Out A   Out B
In 1   [X]     [X]    (gain: 0dB)
In 2   [X]     [X]    (gain: +3dB)
In 3   [X]     [ ]    (gain: -6dB)

Stats: CPU 12% | Latency 15ms | Packets/s: 375

Implementation Hints:

Multi-stream receiver:

import threading
import queue
import numpy as np

class VBANRouter:
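    def __init__(self, config):
        # Input keys are "ip:stream_name" strings, as built in receive_thread
        self.inputs = {}   # input key → input config
        self.outputs = {}  # output_name → output config
        self.routing = {}  # (input key, output_name) → gain in dB
        self.buffers = {}  # input key → queue.Queue (create one per input before starting threads)
        self.running = False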

    def receive_thread(self, sock):
        """Receive packets and distribute to input buffers"""
        while self.running:
            data, addr = sock.recvfrom(1500)
            header = parse_vban_header(data)
            if not header:
                continue

            key = f"{addr[0]}:{header['stream_name']}"
            if key in self.inputs:
                audio = self.extract_audio(data, header)
                self.buffers[key].put((header, audio))

    def mixer_thread(self):
        """Mix inputs according to routing matrix"""
        while self.running:
            # Collect audio from all inputs
            input_audio = {}
            for name, buf in self.buffers.items():
                try:
                    header, audio = buf.get(timeout=0.01)
                    input_audio[name] = (header, audio)
                except queue.Empty:
                    pass

            # Mix for each output
            for output_name, output_config in self.outputs.items():
                mixed = self.mix_for_output(output_name, input_audio)
                if mixed is not None:
                    self.send_output(output_name, mixed)

    def mix_for_output(self, output_name, input_audio):
        """Mix all routed inputs for one output"""
        mixed = None

        for input_name, (header, audio) in input_audio.items():
            key = (input_name, output_name)
            if key in self.routing:
                gain = self.routing[key]

                # Apply gain
                scaled = audio * (10 ** (gain / 20))

                # Mix
                if mixed is None:
                    mixed = scaled.copy()
                else:
                    # Handle different sample rates...
                    mixed = mixed + scaled

        # Clip to prevent overflow
        if mixed is not None:
            mixed = np.clip(mixed, -1.0, 1.0)

        return mixed
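
The "Handle different sample rates" comment above hides the hardest part of the router. As a starting point, a mono stream can be resampled by linear interpolation. This is a minimal pure-Python sketch with a hypothetical `resample_linear` helper. It applies no anti-aliasing filter, so it is a learning aid rather than a production resampler.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of mono float samples by linear interpolation."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        j = int(pos)                       # sample at or before pos
        frac = pos - j                     # distance past that sample
        nxt = samples[min(j + 1, len(samples) - 1)]  # clamp at the last sample
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```

With NumPy, `np.interp` performs the same interpolation on arrays. Converting the 44.1 kHz ESP32 stream to 48 kHz this way turns each 256-sample packet into 278 samples, so the router also needs a buffer that decouples input packet size from output packet size.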

Learning milestones:

  1. Multiple streams receive → You handle concurrent input
  2. Mixing works → You understand audio summing
  3. Routing is configurable → You built a flexible system
  4. No glitches → Buffering and timing work correctly

Project 9: VBAN Quality Monitor (Network Analysis)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Go, C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Network Analysis / QoS Monitoring
  • Software or Tool: Python, matplotlib
  • Main Book: “High Performance Browser Networking” by Ilya Grigorik

What you’ll build: A monitoring tool that analyzes VBAN stream quality—measuring jitter, packet loss, latency estimation, and displaying real-time graphs.

Why it teaches network audio quality: VBAN over UDP means packets can be lost or delayed. Understanding quality metrics is essential for diagnosing audio problems.

Core challenges you’ll face:

  • Jitter measurement → maps to variance in packet arrival times
  • Loss detection → maps to frame counter gaps
  • Latency estimation → maps to one-way delay measurement
  • Real-time visualization → maps to live updating graphs

Key Concepts:

  • Jitter: Network timing variance
  • QoS Metrics: MOS, latency, loss
  • Network Buffers: How to size them

Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Projects 1, 7

Real world outcome:

$ vban_monitor --stream "Stream1" --ip 192.168.1.100
VBAN Quality Monitor v1.0
Monitoring: Stream1 @ 192.168.1.100

Real-time Metrics (updated every 1s):
┌─────────────────────────────────────────────────────────────────┐
│  Packet Rate:  187.5 pkt/s    Expected: 187.5 pkt/s   OK       │
│  Packet Loss:  0.02%          (3 of 14,062 packets)            │
│  Jitter:       1.2ms avg      (0.5ms - 3.8ms range)            │
│  Buffer Need:  ~4ms           (based on jitter)                │
├─────────────────────────────────────────────────────────────────┤
│  Inter-Packet Timing (ms):                                      │
│  5.33 ████████████████████████████████████████████ (expected)  │
│  5.00 ██████████████████████████████████░░░░░░░░░░             │
│  5.50 ██████████████████████████████████████░░░░░░             │
│  6.00 ██████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             │
├─────────────────────────────────────────────────────────────────┤
│  Jitter History (last 60s):                                     │
│  3ms ┤      *                                                   │
│  2ms ┤   *     * *    *                                        │
│  1ms ┤ *   * *     * *  * * * * * * * * * *                    │
│  0ms ┼──────────────────────────────────────────► time         │
└─────────────────────────────────────────────────────────────────┘
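
The "Expected" values in the display above follow directly from the stream format. A short sketch of the arithmetic, assuming a 48000 Hz stream with 256 samples per packet (the helper names are hypothetical):

```python
def expected_packet_rate(sample_rate, samples_per_packet):
    """Packets per second for a steady stream."""
    return sample_rate / samples_per_packet

def expected_interval_ms(sample_rate, samples_per_packet):
    """Nominal time between packets, in milliseconds."""
    return 1000.0 * samples_per_packet / sample_rate

def loss_percent(lost, total):
    """Lost packets as a percentage of the total."""
    return 100.0 * lost / total if total else 0.0
```

For 48000 Hz and 256 samples this gives 187.5 packets per second and an interval of about 5.33 ms, and 3 lost out of 14,062 rounds to 0.02%.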

Implementation Hints:

Jitter calculation:

import statistics
from collections import deque

class QualityMonitor:
    def __init__(self, window_size=1000):
        self.arrival_times = deque(maxlen=window_size)
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        self.last_frame = -1
        self.packets_received = 0
        self.packets_lost = 0

    def process_packet(self, header, arrival_time):
        self.packets_received += 1

        # Track inter-packet intervals
        if self.last_arrival is not None:
            interval = (arrival_time - self.last_arrival) * 1000  # ms
            self.intervals.append(interval)

        self.last_arrival = arrival_time
        self.arrival_times.append(arrival_time)

        # Track lost packets
        if self.last_frame >= 0:
            expected = (self.last_frame + 1) & 0xFFFFFFFF
            if header['frame'] != expected:
                lost = (header['frame'] - expected) & 0xFFFFFFFF
                if lost < 1000:
                    self.packets_lost += lost
        self.last_frame = header['frame']

    def get_jitter_stats(self):
        if len(self.intervals) < 2:
            return None

        # Expected interval based on sample rate and samples/frame
        # e.g., 48000 Hz, 256 samples = 5.33ms

        mean = statistics.mean(self.intervals)
        stdev = statistics.stdev(self.intervals)
        min_val = min(self.intervals)
        max_val = max(self.intervals)

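        # Jitter estimate: standard deviation of the inter-packet intervals.
        # This is a simple approximation, not the RFC 3550 running estimator,
        # which smooths transit-time differences with a gain of 1/16.
        jitter = stdev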

        return {
            'mean_interval': mean,
            'jitter': jitter,
            'min': min_val,
            'max': max_val,
            'range': max_val - min_val
        }

    def get_loss_rate(self):
        total = self.packets_received + self.packets_lost
        if total == 0:
            return 0
        return self.packets_lost / total * 100

    def recommended_buffer_size(self):
        stats = self.get_jitter_stats()
        if stats is None:
            return 10  # Default 10ms

        # Buffer should cover worst-case jitter
        # Rule of thumb: 2-3x the jitter
        return stats['jitter'] * 3
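
For comparison, the RFC 3550 interarrival jitter estimator can be sketched as follows. RTP packets carry a sender timestamp and VBAN packets do not, so this sketch assumes the emitter sends at a constant nominal interval and derives the send time from the frame counter. The class name `Rfc3550Jitter` is hypothetical, and frame counter wraparound is ignored.

```python
class Rfc3550Jitter:
    """Running jitter estimate: J += (|D| - J) / 16, as in RFC 3550."""

    def __init__(self, sample_rate, samples_per_frame):
        # Nominal spacing between packets, in seconds (an assumption:
        # the emitter is taken to send at a perfectly regular pace)
        self.packet_interval = samples_per_frame / sample_rate
        self.prev_transit = None
        self.jitter = 0.0

    def update(self, frame, arrival_time):
        """Feed one packet; returns the current jitter in seconds."""
        send_time = frame * self.packet_interval
        transit = arrival_time - send_time
        if self.prev_transit is not None:
            d = abs(transit - self.prev_transit)
            self.jitter += (d - self.jitter) / 16
        self.prev_transit = transit
        return self.jitter
```

Because the estimate works on transit-time differences, a constant clock offset between sender and receiver cancels out. Unlike the interval-based standard deviation, it is also not thrown off by lost packets, since the frame counter accounts for the gap.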

Learning milestones:

  1. Jitter measured → You understand timing variance
  2. Loss tracked → Frame counter analysis works
  3. Graphs display → Real-time visualization works
  4. Recommendations accurate → You understand buffer sizing

Project 10: Complete VBAN Application (Full Implementation)

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Rust or C
  • Alternative Programming Languages: Go, C++
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Systems Programming / Audio Engineering
  • Software or Tool: Rust or C, ALSA/CoreAudio/WASAPI
  • Main Book: “Programming Rust” by Blandy & Orendorff

What you’ll build: A complete VBAN implementation from scratch in a systems language—full protocol support, platform audio integration, low-latency performance, and production-quality code.

Why this is the ultimate project: You’ll implement VBAN at the same level as the official tools, understanding every optimization and design decision.

Core challenges you’ll face:

  • Zero-copy packet handling → maps to memory efficiency
  • Platform audio abstraction → maps to ALSA/CoreAudio/WASAPI
  • Lock-free buffers → maps to real-time audio requirements
  • All sub-protocols → maps to complete specification coverage

Difficulty: Expert Time estimate: 2-3 months Prerequisites: All previous projects, systems programming experience

Real world outcome:

$ vban --mode emitter --device "hw:0" --stream "MyStream" \
    --dest 192.168.1.100 --rate 48000 --channels 2 --format s16

$ vban --mode receptor --stream "Stream1" --device "hw:0" \
    --quality 3 --buffer 20

$ vban --mode text --dest 192.168.1.100 "Strip[0].Mute = 1"

$ vban --mode scan --timeout 10

Implementation Hints:

Rust VBAN types:

use std::net::UdpSocket;

const VBAN_HEADER_SIZE: usize = 28;
const VBAN_MAX_DATA_SIZE: usize = 1436;

#[repr(C, packed)]
#[derive(Clone, Copy)]
struct VBANHeader {
    vban: [u8; 4],
    format_sr: u8,
    format_nbs: u8,
    format_nbc: u8,
    format_bit: u8,
    streamname: [u8; 16],
    nu_frame: u32,
}

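impl VBANHeader {
    /// `samples` and `channels` are counts in 1..=256; the header stores each minus one.
    fn new(sample_rate_index: u8, samples: u16, channels: u16,
           bit_format: u8, name: &str, frame: u32) -> Self {
        debug_assert!((1..=256).contains(&samples));
        debug_assert!((1..=256).contains(&channels));

        let mut streamname = [0u8; 16];
        let name_bytes = name.as_bytes();
        let len = name_bytes.len().min(16);
        streamname[..len].copy_from_slice(&name_bytes[..len]);

        Self {
            vban: *b"VBAN",
            format_sr: sample_rate_index,
            format_nbs: (samples - 1) as u8,
            format_nbc: (channels - 1) as u8,
            format_bit: bit_format,
            streamname,
            nu_frame: frame,
        }
    }

    fn to_bytes(&self) -> [u8; 28] {
        // Byte-for-byte copy of the packed struct. This assumes a little-endian
        // host, because nu_frame is little-endian on the wire.
        unsafe { std::mem::transmute(*self) }
    }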

    fn from_bytes(data: &[u8]) -> Option<Self> {
        if data.len() < 28 {
            return None;
        }
        if &data[0..4] != b"VBAN" {
            return None;
        }

        let header: Self = unsafe {
            std::ptr::read(data.as_ptr() as *const Self)
        };
        Some(header)
    }
}

Lock-free ring buffer for audio:

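use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Single-producer, single-consumer ring buffer. One slot always stays empty
// so that a full buffer can be told apart from an empty one.
struct RingBuffer<T: Copy + Default, const N: usize> {
    buffer: UnsafeCell<[T; N]>,  // Interior mutability: push writes through &self
    write_pos: AtomicUsize,
    read_pos: AtomicUsize,
}

// Sound only while exactly one thread pushes and exactly one thread pops.
unsafe impl<T: Copy + Default + Send, const N: usize> Sync for RingBuffer<T, N> {}

impl<T: Copy + Default, const N: usize> RingBuffer<T, N> {
    fn new() -> Self {
        Self {
            buffer: UnsafeCell::new([T::default(); N]),
            write_pos: AtomicUsize::new(0),
            read_pos: AtomicUsize::new(0),
        }
    }

    fn push(&self, items: &[T]) -> usize {
        let write = self.write_pos.load(Ordering::Relaxed);
        let read = self.read_pos.load(Ordering::Acquire);

        let available = if write >= read {
            N - (write - read) - 1
        } else {
            read - write - 1
        };

        let to_write = items.len().min(available);
        let buf = self.buffer.get() as *mut T;
        for (i, item) in items[..to_write].iter().enumerate() {
            // Only the producer touches the free slots, with wrap-around
            unsafe { buf.add((write + i) % N).write(*item) };
        }

        self.write_pos.store((write + to_write) % N, Ordering::Release);
        to_write
    }

    fn pop(&self, out: &mut [T]) -> usize {
        let read = self.read_pos.load(Ordering::Relaxed);
        let write = self.write_pos.load(Ordering::Acquire);

        let filled = if write >= read {
            write - read
        } else {
            N - (read - write)
        };

        let to_read = out.len().min(filled);
        let buf = self.buffer.get() as *const T;
        for (i, slot) in out[..to_read].iter_mut().enumerate() {
            // Only the consumer touches the filled slots, with wrap-around
            unsafe { *slot = buf.add((read + i) % N).read() };
        }

        self.read_pos.store((read + to_read) % N, Ordering::Release);
        to_read
    }
}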

Learning milestones:

  1. Packets send/receive → Core protocol works
  2. Platform audio works → System integration complete
  3. All sub-protocols → Full specification coverage
  4. Production quality → Error handling, logging, config

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Packet Analyzer | | Weekend | ⚡⚡⚡ | 🎮🎮🎮 |
| 2. VBAN Receiver | ⭐⭐ | 1 week | ⚡⚡⚡⚡ | 🎮🎮🎮🎮 |
| 3. VBAN Emitter | ⭐⭐ | 1 week | ⚡⚡⚡⚡ | 🎮🎮🎮🎮 |
| 4. Text Protocol | ⭐⭐ | Weekend | ⚡⚡⚡ | 🎮🎮🎮 |
| 5. MIDI Bridge | ⭐⭐⭐ | 2 weeks | ⚡⚡⚡⚡ | 🎮🎮🎮🎮🎮 |
| 6. ESP32 Audio Node | ⭐⭐⭐ | 2-3 weeks | ⚡⚡⚡⚡⚡ | 🎮🎮🎮🎮🎮 |
| 7. Network Scanner | ⭐⭐ | 1 week | ⚡⚡⚡ | 🎮🎮🎮 |
| 8. Audio Router | ⭐⭐⭐⭐ | 3-4 weeks | ⚡⚡⚡⚡⚡ | 🎮🎮🎮🎮 |
| 9. Quality Monitor | ⭐⭐⭐ | 2 weeks | ⚡⚡⚡⚡ | 🎮🎮🎮🎮 |
| 10. Complete Implementation | ⭐⭐⭐⭐ | 2-3 months | ⚡⚡⚡⚡⚡ | 🎮🎮🎮🎮🎮 |

Your Starting Point

If you’re learning network audio from scratch: Projects 1 → 2 → 3 → 7 → 9 (Core protocol understanding)

If you’re interested in embedded/IoT audio: Projects 1 → 2 → 3 → 6 (ESP32 focus)

If you want to build tools for Voicemeeter: Projects 1 → 4 → 5 → 8 (Control and routing)

Phase 1: Protocol Fundamentals (1-2 weeks)
├── Project 1: Packet Analyzer → See the wire format
└── Project 2: VBAN Receiver → Understand slave behavior

Phase 2: Bidirectional Communication (2-3 weeks)
├── Project 3: VBAN Emitter → Understand master behavior
└── Project 4: Text Protocol → Learn sub-protocols

Phase 3: Specialized Applications (3-5 weeks)
├── Project 5: MIDI Bridge → Musical control
├── Project 6: ESP32 Node → Embedded audio
└── Project 7: Network Scanner → Discovery

Phase 4: Advanced Systems (5-8 weeks)
├── Project 8: Audio Router → Multi-stream handling
├── Project 9: Quality Monitor → Network analysis
└── Project 10: Complete Implementation → Production quality

Final Project: Networked Audio Production System

  • File: LEARN_VBAN_PROTOCOL.md
  • Main Programming Language: Mixed (Python, C++, Arduino)
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Complete Audio System Design
  • Software or Tool: Everything from previous projects

What you’ll build: A complete networked audio production system built on VBAN: multiple ESP32 microphones, a mixing/routing server, monitoring dashboards, and remote control.

System Architecture:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    NETWORKED AUDIO PRODUCTION SYSTEM                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐                                     │
│   │ ESP32   │  │ ESP32   │  │ ESP32   │    VBAN Audio                       │
│   │ Mic 1   │  │ Mic 2   │  │ Mic 3   │    ═══════►                         │
│   └────┬────┘  └────┬────┘  └────┬────┘                                     │
│        │            │            │                                           │
│        └────────────┼────────────┘                                           │
│                     │                                                        │
│                     ▼                                                        │
│   ┌─────────────────────────────────────┐                                   │
│   │         VBAN ROUTER/MIXER           │                                   │
│   │                                     │                                   │
│   │  Inputs:  [Mic1] [Mic2] [Mic3]     │                                   │
│   │  Outputs: [Main] [Monitor] [Record]│                                   │
│   │  Effects: [EQ] [Comp] [Gate]       │                                   │
│   └──────────────┬──────────────────────┘                                   │
│                  │                                                           │
│       ┌──────────┼──────────┐                                               │
│       │          │          │                                               │
│       ▼          ▼          ▼                                               │
│   ┌───────┐  ┌───────┐  ┌───────┐                                          │
│   │ Main  │  │Monitor│  │Recorder│                                         │
│   │Output │  │ Mix   │  │        │                                         │
│   └───────┘  └───────┘  └───────┘                                          │
│                                                                              │
│   ┌─────────────────────────────────────┐                                   │
│   │         MONITORING DASHBOARD         │                                   │
│   │  [Levels] [Jitter] [Latency] [Loss] │                                   │
│   └─────────────────────────────────────┘                                   │
│                                                                              │
│   ┌─────────────────────────────────────┐                                   │
│   │         REMOTE CONTROL (Web UI)      │                                   │
│   │  [Mute] [Gain] [Routing] [Presets]  │                                   │
│   └─────────────────────────────────────┘                                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Difficulty: Master
Time estimate: 3-4 months
Prerequisites: All 10 projects completed
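
The router/mixer at the center of the architecture has to combine several microphone streams into one output. A minimal sketch of that step is shown here, assuming every input has already been decoded to signed 16-bit PCM at the same sample rate and aligned to the same block. The function name `mix_i16` and the per-input linear gains are illustrative choices, not part of VBAN.

```rust
/// Mixes blocks of signed 16-bit PCM into one block.
/// `gains` holds one linear gain per input (1.0 = unity).
/// The output is as long as the shortest input.
pub fn mix_i16(inputs: &[&[i16]], gains: &[f32]) -> Vec<i16> {
    assert_eq!(inputs.len(), gains.len(), "one gain per input");

    // Only mix the samples that every input actually provides.
    let frames = inputs.iter().map(|s| s.len()).min().unwrap_or(0);
    let mut out = Vec::with_capacity(frames);

    for i in 0..frames {
        // Accumulate in f32 so intermediate sums cannot overflow i16.
        let mut acc = 0.0f32;
        for (samples, gain) in inputs.iter().zip(gains) {
            acc += samples[i] as f32 * gain;
        }
        // Clamp rather than wrap: hard clipping is audible, but integer
        // wrap-around would turn a loud peak into a full-scale click.
        let clamped = acc.clamp(i16::MIN as f32, i16::MAX as f32);
        out.push(clamped as i16);
    }
    out
}
```

For example, mixing `[1000, -1000, 30000]` at gain 1.0 with `[500, -500, 30000]` at gain 0.5 yields `[1250, -1250, 32767]`; the last sample is clipped at the 16-bit maximum. A real mixer would also need to resample or re-clock inputs that drift, which this sketch leaves out.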


Essential Resources

Official Documentation

| Resource | URL | Description |
|---|---|---|
| VBAN Specification | vb-audio.com/Voicemeeter/VBANProtocol_Specifications.pdf | Official protocol spec |
| VB-Audio VBAN Page | vb-audio.com/Voicemeeter/vban.htm | Overview and downloads |
| VB-Audio Forums | forum.vb-audio.com | Developer discussions |

Open Source Implementations

| Project | Language | URL |
|---|---|---|
| vban (quiniouben) | C | github.com/quiniouben/vban |
| pyVBAN | Python | github.com/TheStaticTurtle/pyVBAN |
| ESP32-VBAN-Audio-Source | C++ | github.com/rkinnett/ESP32-VBAN-Audio-Source |
| ESP32-VBAN-Network-Audio-Player | C++ | github.com/rkinnett/ESP32-VBAN-Network-Audio-Player |
| Arduino Audio Tools | C++ | github.com/pschatzmann/arduino-audio-tools |
| vban (npm) | JavaScript | npmjs.com/package/vban |

Books

| Book | Author | Best For |
|---|---|---|
| TCP/IP Illustrated, Volume 1 | W. Richard Stevens | UDP networking fundamentals |
| The Audio Programming Book | Richard Boulanger | Audio DSP concepts |
| Making Embedded Systems | Elecia White | ESP32 development |
| Computer Networks | Andrew Tanenbaum | Network protocols |
| Programming Rust | Blandy & Orendorff | Systems implementation |

Tools

| Tool | Purpose |
|---|---|
| Voicemeeter | VBAN-compatible virtual mixer |
| Wireshark | Packet capture and analysis |
| VB-Cable | Virtual audio cables |
| VBAN Receptor/Emitter | Official VBAN apps |

Summary

| # | Project | Main Language | Knowledge Area |
|---|---|---|---|
| 1 | Packet Analyzer | Python | Network Protocols / Binary Parsing |
| 2 | VBAN Receiver | Python | Audio I/O / Network Programming |
| 3 | VBAN Emitter | Python | Audio Capture / Packet Timing |
| 4 | Text Protocol | Python | Remote Control / Commands |
| 5 | MIDI Bridge | Python | MIDI Protocol / Serial |
| 6 | ESP32 Audio Node | C++ (Arduino) | Embedded Systems / I2S Audio |
| 7 | Network Scanner | Python | Service Discovery / Network Analysis |
| 8 | Audio Router | Python | Audio Routing / Stream Processing |
| 9 | Quality Monitor | Python | QoS / Jitter Analysis |
| 10 | Complete Implementation | Rust/C | Systems Programming |
| Final | Production System | Mixed | Complete Audio System |

Getting Started Checklist

Before starting Project 1:

Welcome to the world of network audio protocols! 🎧


Generated for deep understanding of the VBAN protocol and network audio streaming