LEARN WEBRTC DEEP DIVE

Learn WebRTC: From Zero to Real-Time Communication Master

Goal: Deeply understand WebRTC—from capturing media to establishing peer connections, traversing NATs, building video conferencing systems, and implementing your own media servers.


Why WebRTC Matters

WebRTC enables real-time, peer-to-peer communication directly in browsers without plugins. It powers:

  • Video calls (Google Meet, Discord, Zoom Web)
  • Screen sharing
  • File transfers
  • Live streaming
  • Online gaming
  • IoT device communication

Yet most developers use WebRTC libraries as black boxes. After completing these projects, you will:

  • Understand every step of a WebRTC connection (from SDP to ICE to DTLS-SRTP)
  • Know how NAT traversal actually works
  • Build video conferencing from scratch
  • Implement signaling servers
  • Understand why connections fail and how to debug them
  • Build production-ready real-time applications

Core Concept Analysis

The WebRTC Connection Flow

┌─────────────┐                              ┌─────────────┐
│   Peer A    │                              │   Peer B    │
├─────────────┤                              ├─────────────┤
│             │ 1. Create Offer (SDP)        │             │
│             │ ─────────────────────────►   │             │
│             │                              │             │
│             │ 2. Create Answer (SDP)       │             │
│             │ ◄─────────────────────────   │             │
│             │                              │             │
│             │ 3. Exchange ICE Candidates   │             │
│             │ ◄────────────────────────►   │             │
│             │                              │             │
│             │ 4. DTLS Handshake            │             │
│             │ ◄═══════════════════════►    │             │
│             │                              │             │
│             │ 5. SRTP Media Flow           │             │
│             │ ◄══════════════════════►     │             │
└─────────────┘                              └─────────────┘
        │                                            │
        │         ┌─────────────────┐                │
        └────────►│ Signaling Server│◄───────────────┘
                  │   (WebSocket)   │
                  └─────────────────┘

Fundamental Concepts

1. Media Capture (getUserMedia/getDisplayMedia)

┌──────────────────────────────────────────┐
│              Browser APIs                 │
├──────────────────────────────────────────┤
│  navigator.mediaDevices.getUserMedia()   │ → Camera/Mic
│  navigator.mediaDevices.getDisplayMedia()│ → Screen
│                    ↓                      │
│            MediaStream                    │
│         ┌─────────┬─────────┐            │
│         │ Video   │ Audio   │            │
│         │ Track   │ Track   │            │
│         └─────────┴─────────┘            │
└──────────────────────────────────────────┘

2. Session Description Protocol (SDP)

SDP is the “contract” between peers describing:

  • Media types (audio/video)
  • Codecs supported (VP8, H.264, Opus)
  • Network information
  • Security parameters

Example SDP (abridged):

v=0
o=- 123456789 2 IN IP4 127.0.0.1
s=-
t=0 0
m=video 9 UDP/TLS/RTP/SAVPF 96
c=IN IP4 0.0.0.0
a=rtcp-mux
a=rtpmap:96 VP8/90000
a=fingerprint:sha-256 AB:CD:EF:...
a=ice-ufrag:abcd
a=ice-pwd:efghijklmnop

3. ICE (Interactive Connectivity Establishment)

ICE finds the best path between peers through:

  • Host candidates: Local IP addresses
  • Server Reflexive (srflx): Public IP via STUN
  • Relay: Traffic through TURN server
┌─────────┐          ┌──────────┐          ┌─────────┐
│ Peer A  │          │   NAT    │          │ Peer B  │
│ (Local) │◄────────►│ (Router) │◄────────►│ (Local) │
└─────────┘          └──────────┘          └─────────┘
     │                    │                     │
     │    ┌───────────────┴───────────────┐    │
     │    │                               │    │
     ▼    ▼                               ▼    ▼
┌───────────────┐                   ┌───────────────┐
│  STUN Server  │                   │  TURN Server  │
│(Get Public IP)│                   │(Relay Traffic)│
└───────────────┘                   └───────────────┘

4. DTLS and SRTP

  • DTLS: TLS for UDP—establishes encryption keys
  • SRTP: Secure RTP—encrypted media transport

5. Data Channels (RTCDataChannel)

  • Arbitrary data transfer (text, files, game state)
  • Reliable or unreliable modes
  • Ordered or unordered delivery
  • Built on SCTP over DTLS

6. Topologies for Multi-Party

MESH (P2P)           SFU                    MCU
    ┌───┐           ┌───┐                 ┌───┐
   A│   │B         A│   │B               A│   │B
    └─┬─┘           └─┬─┘                 └─┬─┘
      │               │                     │
      │    ┌───┐      │    ┌─────┐         │    ┌─────┐
      └────┤   ├──────┘    │ SFU │         └────┤ MCU │
           │ C │◄─────────►│     │◄────────────►│     │
           └───┘           └─────┘              └─────┘
                              │                    │
Connections: N(N-1)/2 total   N (one per client)   N (one per client)
Bandwidth:   High per client  Medium               Low (server sends one mixed stream)

Project List

Projects are ordered from fundamental understanding to advanced implementations.


Project 1: Media Stream Playground

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: TypeScript, Dart (Flutter)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Media Capture / Browser APIs
  • Software or Tool: Browser MediaDevices API
  • Main Book: “High Performance Browser Networking” by Ilya Grigorik

What you’ll build: A web application that captures camera/microphone, displays real-time video, shows audio levels via visualizer, lists all available devices, and lets users switch between cameras/microphones dynamically.

Why it teaches WebRTC: Before you can send media, you must capture it. This project makes you intimately familiar with MediaStream, MediaStreamTrack, and the constraints system that controls resolution, frame rate, and device selection.

Core challenges you’ll face:

  • Getting user permission and handling denials → maps to understanding browser security model
  • Parsing MediaStream into tracks → maps to video/audio track separation
  • Building an audio visualizer → maps to Web Audio API integration
  • Switching devices without stopping the stream → maps to track replacement patterns

Key Concepts:

  • getUserMedia constraints: MDN Web Docs - “MediaDevices.getUserMedia()”
  • MediaStream API: “High Performance Browser Networking” Chapter 18 - Ilya Grigorik
  • Web Audio API for visualization: “Web Audio API” Chapter 3 - Boris Smus

Difficulty: Beginner. Time estimate: Weekend. Prerequisites: Basic JavaScript, HTML/CSS, understanding of Promises/async-await.

Real world outcome:

┌─────────────────────────────────────────────┐
│  🎥 Media Stream Playground                 │
├─────────────────────────────────────────────┤
│  ┌───────────────────────────────────────┐  │
│  │                                       │  │
│  │         [Your Live Video]             │  │
│  │                                       │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  Audio Level: ████████████░░░░░░ 67%        │
│                                             │
│  Camera:  [▼ Logitech C920          ]       │
│  Mic:     [▼ Blue Yeti              ]       │
│                                             │
│  Resolution: 1280x720 @ 30fps               │
│  [Mirror] [Mute Audio] [Mute Video]         │
└─────────────────────────────────────────────┘

Implementation Hints:

The MediaDevices API is your entry point:

navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  → Returns Promise<MediaStream>

Constraints control what you get:

{
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 },
    deviceId: { exact: "specific-camera-id" }
  },
  audio: {
    echoCancellation: true,
    noiseSuppression: true
  }
}

To enumerate devices: navigator.mediaDevices.enumerateDevices() returns all cameras, mics, and speakers (device labels are only populated once the user has granted permission).

For the audio visualizer, connect the MediaStream to an AnalyserNode:

  1. Create AudioContext
  2. Create MediaStreamSource from your stream
  3. Connect to AnalyserNode
  4. Use getByteFrequencyData() in requestAnimationFrame loop
  5. Draw bars to canvas
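
A minimal sketch of that loop, assuming `stream` from getUserMedia and a <canvas id="meter"> element in the page:

const audioContext = new AudioContext();
const analyser = audioContext.createAnalyser();
analyser.fftSize = 256;
audioContext.createMediaStreamSource(stream).connect(analyser);

const canvas = document.getElementById('meter');
const canvasCtx = canvas.getContext('2d');
const data = new Uint8Array(analyser.frequencyBinCount);

function draw() {
  analyser.getByteFrequencyData(data);
  canvasCtx.clearRect(0, 0, canvas.width, canvas.height);
  const barWidth = canvas.width / data.length;
  data.forEach((v, i) => {
    const barHeight = (v / 255) * canvas.height;
    canvasCtx.fillRect(i * barWidth, canvas.height - barHeight, barWidth - 1, barHeight);
  });
  requestAnimationFrame(draw);
}
draw();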

Device switching pattern:

  1. Get new stream with new device constraint
  2. Get the track from new stream
  3. Replace the track in any existing peer connections (for future projects)
  4. Stop the old track
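
A sketch of that pattern, assuming `localStream` is the stream you are already displaying (the commented sender-replacement line becomes relevant once you have peer connections):

async function switchCamera(deviceId) {
  const newStream = await navigator.mediaDevices.getUserMedia({
    video: { deviceId: { exact: deviceId } }
  });
  const newTrack = newStream.getVideoTracks()[0];
  const oldTrack = localStream.getVideoTracks()[0];

  // Project 4 onward: swap the track on each RTCPeerConnection sender
  // pc.getSenders().find(s => s.track === oldTrack)?.replaceTrack(newTrack);

  localStream.removeTrack(oldTrack);
  localStream.addTrack(newTrack);
  oldTrack.stop();
}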

Learning milestones:

  1. Video appears in your page → You understand getUserMedia and video element srcObject
  2. Audio visualizer animates → You understand MediaStream ↔ Web Audio integration
  3. Device dropdown works → You understand device enumeration and constraints
  4. Switching cameras works smoothly → You understand track lifecycle

Project 2: Local Screen Recorder

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: TypeScript, Electron (for desktop app)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Screen Capture / MediaRecorder API
  • Software or Tool: Browser getDisplayMedia + MediaRecorder
  • Main Book: “High Performance Browser Networking” by Ilya Grigorik

What you’ll build: A screen recording application that captures your screen (or specific window/tab), optionally overlays webcam video, records to WebM/MP4, shows recording duration, and allows downloading the final video.

Why it teaches WebRTC: Screen sharing is a critical WebRTC feature. This project teaches getDisplayMedia, the differences from getUserMedia, and how to combine multiple streams—skills essential for building screen-sharing in video calls.

Core challenges you’ll face:

  • Using getDisplayMedia with system picker → maps to screen capture API specifics
  • Combining screen + webcam into one canvas → maps to stream composition
  • Recording with MediaRecorder → maps to encoding and container formats
  • Handling user stopping the share → maps to track ended events

Key Concepts:

  • getDisplayMedia API: MDN Web Docs - “Screen Capture API”
  • MediaRecorder API: MDN Web Docs - “MediaRecorder”
  • Canvas compositing: “HTML5 Canvas” Chapter 4 - Steve Fulton
  • Video codecs in browsers: “High Performance Browser Networking” Chapter 18

Difficulty: Beginner. Time estimate: Weekend. Prerequisites: Project 1 (Media Stream Playground), basic Canvas knowledge.

Real world outcome:

┌─────────────────────────────────────────────┐
│  🎬 Screen Recorder Pro                      │
├─────────────────────────────────────────────┤
│  ┌───────────────────────────────────────┐  │
│  │                                       │  │
│  │      [Screen Preview Here]            │  │
│  │                                   ┌──┐│  │
│  │                                   │🎥││  │
│  │                                   └──┘│  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ⏺ Recording: 00:02:34    [Pause] [Stop]   │
│                                             │
│  Include:                                   │
│  [✓] System Audio  [✓] Microphone  [✓] Webcam│
│                                             │
│  Webcam Position: [Top-Right ▼]             │
│                                             │
│  [🔴 Start Recording]  [📥 Download Last]   │
└─────────────────────────────────────────────┘

After recording, user can download recording-2024-01-15.webm.

Implementation Hints:

getDisplayMedia is similar to getUserMedia but captures screen:

navigator.mediaDevices.getDisplayMedia({
  video: { cursor: "always" },
  audio: true  // System audio (browser support varies)
})

The user sees a system picker to choose screen/window/tab.

To combine screen + webcam, you need a Canvas:

  1. Create canvas matching screen dimensions
  2. In requestAnimationFrame loop:
    • Draw screen video to full canvas
    • Draw webcam video to corner (scaled down)
  3. Capture canvas as stream: canvas.captureStream(30)
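
A sketch of that loop, assuming two playing <video> elements (`screenVideo` from getDisplayMedia, `camVideo` from getUserMedia):

const canvas = document.createElement('canvas');
canvas.width = 1920;
canvas.height = 1080;
const ctx = canvas.getContext('2d');

function drawFrame() {
  ctx.drawImage(screenVideo, 0, 0, canvas.width, canvas.height);
  const w = canvas.width / 4;   // Webcam overlay at quarter size,
  const h = canvas.height / 4;  // bottom-right with a 20px margin
  ctx.drawImage(camVideo, canvas.width - w - 20, canvas.height - h - 20, w, h);
  requestAnimationFrame(drawFrame);
}
drawFrame();

const combinedStream = canvas.captureStream(30); // 30 fps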

MediaRecorder records the combined stream:

const chunks = [];
const recorder = new MediaRecorder(combinedStream, {
  mimeType: 'video/webm; codecs=vp9'
});
recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = () => {
  const blob = new Blob(chunks, { type: 'video/webm' });
  const a = document.createElement('a');
  a.href = URL.createObjectURL(blob);
  a.download = `recording-${new Date().toISOString().slice(0, 10)}.webm`;
  a.click();
};
recorder.start(1000); // Collect a data chunk every second

Handle the screen share ending (user clicks “Stop Sharing”):

screenTrack.onended = () => {
  recorder.stop();
  // Clean up
};

Learning milestones:

  1. Screen picker appears and preview works → You understand getDisplayMedia
  2. Webcam overlay appears in corner → You understand canvas compositing
  3. Recording downloads successfully → You understand MediaRecorder
  4. System audio is captured → You understand audio source options

Project 3: Real-Time Video Filters

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: TypeScript, Rust (via WebAssembly for performance)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Video Processing / Canvas / WebGL
  • Software or Tool: Canvas 2D / WebGL Shaders
  • Main Book: “WebGL Programming Guide” by Kouichi Matsuda

What you’ll build: A video processing pipeline that applies real-time filters to your webcam feed—blur backgrounds, apply color effects, add virtual backgrounds, face detection overlays—all running at 30fps in the browser.

Why it teaches WebRTC: In real video calls, you often need to process video before sending (virtual backgrounds, beautification). This project teaches you how MediaStreams can be transformed through canvas/WebGL before being fed into peer connections.

Core challenges you’ll face:

  • Maintaining 30fps with pixel manipulation → maps to performance optimization
  • Implementing background blur/replacement → maps to segmentation algorithms
  • Using WebGL shaders for effects → maps to GPU-accelerated processing
  • Creating a processed MediaStream output → maps to canvas.captureStream()

Key Concepts:

  • Canvas pixel manipulation: “HTML5 Canvas” Chapter 8 - Steve Fulton
  • WebGL shaders: “WebGL Programming Guide” Chapter 5 - Kouichi Matsuda
  • TensorFlow.js body segmentation: TensorFlow.js documentation - “Body Segmentation”
  • requestAnimationFrame optimization: “High Performance Browser Networking” Chapter 10

Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Projects 1-2; a basic understanding of graphics/shaders is helpful.

Real world outcome:

┌─────────────────────────────────────────────┐
│  🎨 Video Filter Studio                      │
├─────────────────────────────────────────────┤
│  ┌────────────────┐  ┌────────────────┐     │
│  │                │  │                │     │
│  │   [Original]   │  │   [Filtered]   │     │
│  │                │  │                │     │
│  └────────────────┘  └────────────────┘     │
│                                             │
│  Filters:                                   │
│  [Blur BG] [Virtual BG] [Grayscale] [Sepia] │
│  [Pixelate] [Edge Detect] [Night Vision]    │
│                                             │
│  Virtual Background:                        │
│  [🏖 Beach] [🏢 Office] [🌌 Space] [Upload] │
│                                             │
│  Performance: 32fps | CPU: 15% | GPU: 45%   │
│                                             │
│  [Export Processed Stream for WebRTC]       │
└─────────────────────────────────────────────┘

Implementation Hints:

Basic pipeline architecture:

Camera → Canvas (processing) → captureStream() → Output MediaStream

For simple filters (grayscale, sepia), use Canvas 2D:

  1. drawImage(video, 0, 0) each frame
  2. getImageData() to get pixel array
  3. Manipulate RGBA values
  4. putImageData() back
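
A sketch of that loop for grayscale, assuming a playing <video> element and an equally sized canvas:

const ctx = canvas.getContext('2d', { willReadFrequently: true });

function processFrame() {
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
  const px = frame.data; // Flat RGBA byte array
  for (let i = 0; i < px.length; i += 4) {
    const gray = 0.299 * px[i] + 0.587 * px[i + 1] + 0.114 * px[i + 2];
    px[i] = px[i + 1] = px[i + 2] = gray;
  }
  ctx.putImageData(frame, 0, 0);
  requestAnimationFrame(processFrame);
}
processFrame();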

For performance-critical filters, use WebGL:

  1. Video texture uploaded to GPU each frame
  2. Fragment shader applies effect
  3. Much faster than CPU pixel manipulation

For background blur/replacement, use TensorFlow.js BodyPix or MediaPipe:

  1. Run segmentation model on video frame
  2. Get mask indicating person vs background
  3. Apply blur only to background pixels
  4. Or composite person onto virtual background image

To output the processed video as a MediaStream for WebRTC:

const outputStream = canvas.captureStream(30);
// This stream can be added to RTCPeerConnection

WebGL shader example for grayscale:

precision mediump float;
varying vec2 v_texCoord;
uniform sampler2D u_image;

void main() {
  vec4 color = texture2D(u_image, v_texCoord);
  float gray = dot(color.rgb, vec3(0.299, 0.587, 0.114));
  gl_FragColor = vec4(gray, gray, gray, color.a);
}

Learning milestones:

  1. Basic filters work at 30fps → You understand the video-canvas-stream pipeline
  2. WebGL shaders run smoothly → You understand GPU-accelerated processing
  3. Background blur works → You understand ML-based segmentation
  4. Output stream is usable → You understand how to integrate with WebRTC

Project 4: Peer-to-Peer Video Call (The Core)

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: TypeScript, Go (for signaling server)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: WebRTC Core / Signaling / SDP / ICE
  • Software or Tool: RTCPeerConnection, WebSocket
  • Main Book: “Real-Time Communication with WebRTC” by Salvatore Loreto

What you’ll build: A complete 1-to-1 video calling application with a signaling server that handles room creation, SDP exchange, ICE candidate exchange, and connection state management. Users can create/join rooms and have real video calls.

Why it teaches WebRTC: This is THE foundational WebRTC project. You’ll implement the complete offer/answer flow, understand every field in an SDP, see ICE candidates being gathered and exchanged, and watch a peer connection come to life.

Core challenges you’ll face:

  • Implementing offer/answer SDP exchange → maps to session negotiation
  • Handling ICE candidate gathering and exchange → maps to connectivity establishment
  • Building the signaling server → maps to out-of-band communication
  • Managing connection states → maps to RTCPeerConnection lifecycle

Key Concepts:

  • RTCPeerConnection API: “Real-Time Communication with WebRTC” Chapter 3 - Loreto & Romano
  • SDP format and fields: RFC 4566 - “SDP: Session Description Protocol”
  • ICE candidate types: RFC 8445 - “ICE: A Protocol for NAT Traversal”
  • WebSocket signaling: “High Performance Browser Networking” Chapter 17

Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Projects 1-3, basic Node.js/WebSocket knowledge.

Real world outcome:

┌─────────────────────────────────────────────┐
│  📞 P2P Video Call                          │
├─────────────────────────────────────────────┤
│  Room: meeting-abc123    [Copy Link]        │
│                                             │
│  ┌──────────────────┐ ┌──────────────────┐  │
│  │                  │ │                  │  │
│  │   [Remote User]  │ │   [You - Local]  │  │
│  │                  │ │                  │  │
│  │    Connected!    │ │                  │  │
│  │                  │ │                  │  │
│  └──────────────────┘ └──────────────────┘  │
│                                             │
│  Connection State: connected                │
│  ICE State: completed                       │
│  Signaling State: stable                    │
│                                             │
│  [🎤 Mute] [📷 Camera Off] [📞 End Call]    │
│                                             │
│  Debug Panel:                               │
│  ├─ ICE Candidates: 4 local, 3 remote       │
│  ├─ Selected: 192.168.1.5:54321 (host)      │
│  └─ Codec: VP8, Opus                        │
└─────────────────────────────────────────────┘

Implementation Hints:

The WebRTC connection dance:

Caller side:

  1. Create RTCPeerConnection with ICE server config
  2. Add local media tracks to connection
  3. Create offer: pc.createOffer()
  4. Set local description: pc.setLocalDescription(offer)
  5. Send offer to remote peer via signaling server
  6. Receive answer from remote peer
  7. Set remote description: pc.setRemoteDescription(answer)
  8. Exchange ICE candidates as they’re gathered

Callee side:

  1. Create RTCPeerConnection
  2. Receive offer from signaling
  3. Set remote description: pc.setRemoteDescription(offer)
  4. Add local media tracks
  5. Create answer: pc.createAnswer()
  6. Set local description: pc.setLocalDescription(answer)
  7. Send answer back via signaling
  8. Exchange ICE candidates
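
Condensed into code, the caller side looks roughly like this (`localStream` comes from Project 1's getUserMedia, and `signaling` is an assumed thin wrapper around your WebSocket):

async function startCall() {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
  });

  localStream.getTracks().forEach(track => pc.addTrack(track, localStream));

  pc.ontrack = (event) => {
    document.getElementById('remoteVideo').srcObject = event.streams[0];
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ type: 'offer', sdp: pc.localDescription });
  // ICE candidates are exchanged separately (see below)
  return pc;
}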

Signaling server (Node.js + WebSocket):

Messages to handle:
- "join-room" → Track user in room
- "offer" → Forward to other user in room
- "answer" → Forward to caller
- "ice-candidate" → Forward to peer
- "leave" → Notify peer

ICE candidate handling:

pc.onicecandidate = (event) => {
  if (event.candidate) {
    sendToSignaling({ type: 'ice-candidate', candidate: event.candidate });
  }
};

// When receiving from signaling:
pc.addIceCandidate(new RTCIceCandidate(candidate));

Connection state monitoring:

  • pc.connectionState: ‘new’ → ‘connecting’ → ‘connected’
  • pc.iceConnectionState: ‘checking’ → ‘connected’ or ‘completed’
  • pc.signalingState: ‘stable’ ↔ ‘have-local-offer’ ↔ ‘have-remote-offer’
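
Each of these fires a corresponding change event, which is worth logging while you debug:

['connectionstatechange', 'iceconnectionstatechange', 'signalingstatechange']
  .forEach(ev => pc.addEventListener(ev, () => {
    console.log(pc.connectionState, pc.iceConnectionState, pc.signalingState);
  }));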

Learning milestones:

  1. Signaling messages flow correctly → You understand the coordination layer
  2. SDP exchange completes → You understand session negotiation
  3. ICE candidates are exchanged → You understand connectivity establishment
  4. Video appears from remote peer → You’ve built a working WebRTC call!

Project 5: WebRTC Data Channel File Transfer

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: TypeScript, Rust (WebAssembly for chunking)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Data Channels / Binary Transfer / SCTP
  • Software or Tool: RTCDataChannel
  • Main Book: “Real-Time Communication with WebRTC” by Salvatore Loreto

What you’ll build: A peer-to-peer file transfer application where users can send files of any size directly to each other—no server upload required. Features progress indication, pause/resume, and transfer speed display.

Why it teaches WebRTC: RTCDataChannel is the unsung hero of WebRTC. This project teaches you how WebRTC handles arbitrary data (not just media), the SCTP protocol underneath, reliability options, and how to efficiently transfer binary data.

Core challenges you’ll face:

  • Chunking large files for transmission → maps to buffer management
  • Handling backpressure and bufferedAmount → maps to flow control
  • Reassembling chunks on receiver side → maps to ordered delivery
  • Implementing pause/resume → maps to channel state management

Key Concepts:

  • RTCDataChannel API: “Real-Time Communication with WebRTC” Chapter 5 - Loreto & Romano
  • SCTP protocol: RFC 4960 - “Stream Control Transmission Protocol”
  • ArrayBuffer and Blob handling: MDN Web Docs - “Using files from web applications”
  • Backpressure in streams: WHATWG Streams Standard - “Backpressure”

Difficulty: Intermediate. Time estimate: 1 week. Prerequisites: Project 4 (P2P Video Call).

Real world outcome:

┌─────────────────────────────────────────────┐
│  📁 P2P File Drop                            │
├─────────────────────────────────────────────┤
│  Connected to: peer-xyz789                  │
│                                             │
│  ┌─────────────────────────────────────────┐│
│  │                                         ││
│  │     Drag and drop files here            ││
│  │         or click to browse              ││
│  │                                         ││
│  └─────────────────────────────────────────┘│
│                                             │
│  Transfers:                                 │
│  ┌─────────────────────────────────────────┐│
│  │ 📄 project.zip (245 MB)                 ││
│  │ ████████████████░░░░░░ 78% - 12.4 MB/s  ││
│  │ [Pause] [Cancel]                        ││
│  ├─────────────────────────────────────────┤│
│  │ 🎵 song.mp3 (8.2 MB)          ✓ Done    ││
│  ├─────────────────────────────────────────┤│
│  │ 🖼 photo.jpg (2.1 MB)          ✓ Done    ││
│  └─────────────────────────────────────────┘│
│                                             │
│  Stats: 267 MB transferred | Avg: 10.2 MB/s │
└─────────────────────────────────────────────┘

Implementation Hints:

Creating a data channel:

// On initiating peer:
const dataChannel = pc.createDataChannel("files", {
  ordered: true  // Important for file integrity
});

// On receiving peer:
pc.ondatachannel = (event) => {
  const dataChannel = event.channel;
  dataChannel.onmessage = handleMessage;
};

File chunking strategy:

1. Read the file in slices (e.g., 16KB each) via file.slice()
2. Send metadata first as JSON: { type: 'file-start', name, size, chunks }
3. Send each chunk as a raw ArrayBuffer (ordered delivery preserves sequence; JSON can't carry binary efficiently)
4. Send completion as JSON: { type: 'file-end' }
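
A sender sketch under those assumptions (CHUNK_SIZE is a tuning choice, and sendChunk is the backpressure helper shown below):

const CHUNK_SIZE = 16 * 1024;

async function sendFile(file) {
  const totalChunks = Math.ceil(file.size / CHUNK_SIZE);
  dataChannel.send(JSON.stringify({
    type: 'file-start', name: file.name, size: file.size, chunks: totalChunks
  }));

  for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
    const buffer = await file.slice(offset, offset + CHUNK_SIZE).arrayBuffer();
    await sendChunk(buffer); // Backpressure-aware helper (see below)
  }

  dataChannel.send(JSON.stringify({ type: 'file-end' }));
}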

Handling backpressure (critical for large files):

const BUFFER_THRESHOLD = 65535; // 64KB

async function sendChunk(chunk) {
  while (dataChannel.bufferedAmount > BUFFER_THRESHOLD) {
    await new Promise(resolve => setTimeout(resolve, 10));
  }
  dataChannel.send(chunk);
}

Receiver reassembly:

const chunks = [];
let fileInfo = null;

function handleMessage(event) {
  // Control messages arrive as JSON strings; chunks arrive as raw binary
  if (typeof event.data === 'string') {
    const msg = JSON.parse(event.data);
    if (msg.type === 'file-start') {
      fileInfo = msg;           // { name, size, chunks } — show in UI
      chunks.length = 0;        // Reset for the new file
    } else if (msg.type === 'file-end') {
      const blob = new Blob(chunks);
      // Trigger download using fileInfo.name
    }
  } else {
    chunks.push(event.data);    // Ordered channel: chunks arrive in sequence
    // Update progress: chunks.length / fileInfo.chunks
  }
}

Learning milestones:

  1. Small text files transfer → You understand basic data channel usage
  2. Large binary files work → You understand chunking and ArrayBuffers
  3. Progress bar is accurate → You understand metadata messaging
  4. Pause/resume works → You understand channel state management

Project 6: Multi-Party Mesh Video Conference

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: TypeScript, Go (signaling)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Multi-Party Topology / Connection Management
  • Software or Tool: Multiple RTCPeerConnections
  • Main Book: “Real-Time Communication with WebRTC” by Salvatore Loreto

What you’ll build: A video conference for 3-6 participants using mesh topology where each participant connects directly to every other participant. Includes participant grid, speaker detection, and bandwidth adaptation.

Why it teaches WebRTC: Mesh topology exposes the N*(N-1)/2 connection problem. You’ll understand why mesh doesn’t scale, how to manage multiple peer connections, and the bandwidth/CPU implications—setting the stage for understanding why SFUs exist.

Core challenges you’ll face:

  • Managing N peer connections simultaneously → maps to connection lifecycle per peer
  • Updating UI as participants join/leave → maps to state synchronization
  • Handling bandwidth constraints → maps to why mesh fails beyond 4-5 users
  • Implementing speaker detection → maps to audio level analysis

Key Concepts:

  • Mesh topology limitations: “WebRTC Blueprints” Chapter 4 - Andrii Sergiienko
  • RTCPeerConnection per peer: “Real-Time Communication with WebRTC” Chapter 6
  • Audio level detection: Web Audio API AnalyserNode documentation
  • Simulcast basics: “WebRTC: APIs and RTCWEB Protocols” Chapter 8 - Alan B. Johnston

Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Project 4 (P2P Video Call).

Real world outcome:

┌─────────────────────────────────────────────────────┐
│  🎥 Mesh Conference - Room: standup-daily          │
├─────────────────────────────────────────────────────┤
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐    │
│  │             │ │             │ │             │    │
│  │    Alice    │ │     Bob     │ │   Charlie   │    │
│  │  (speaking) │ │             │ │             │    │
│  │   🎤 🟢     │ │   🎤 🔇     │ │   🎤 🔇     │    │
│  └─────────────┘ └─────────────┘ └─────────────┘    │
│  ┌─────────────┐ ┌─────────────┐                    │
│  │             │ │             │                    │
│  │    Diana    │ │     You     │  Participants: 5   │
│  │             │ │   (local)   │  Connections: 4    │
│  │   🎤 🔇     │ │   🎤 🟢     │  Bandwidth: 8.2Mbps│
│  └─────────────┘ └─────────────┘                    │
│                                                     │
│  Network Stats:                                     │
│  ├─ → Alice: 1.8 Mbps, RTT: 45ms                   │
│  ├─ → Bob: 2.1 Mbps, RTT: 32ms                     │
│  ├─ → Charlie: 1.5 Mbps, RTT: 78ms                 │
│  └─ → Diana: 2.0 Mbps, RTT: 51ms                   │
│                                                     │
│  [🎤 Mute] [📷 Off] [🖥 Share] [📞 Leave]           │
└─────────────────────────────────────────────────────┘

Implementation Hints:

Architecture: One RTCPeerConnection per remote peer:

const peers = new Map(); // peerId → { pc: RTCPeerConnection, stream: MediaStream }

function connectToPeer(peerId) {
  const pc = new RTCPeerConnection(config);

  // Add local tracks to this connection
  localStream.getTracks().forEach(track => {
    pc.addTrack(track, localStream);
  });

  // Handle remote tracks from this peer
  pc.ontrack = (event) => {
    peers.get(peerId).stream = event.streams[0];
    updateUI();
  };

  peers.set(peerId, { pc, stream: null });
  // Start offer/answer...
}

When someone joins:

  1. Signaling server notifies all existing participants
  2. Each existing participant creates a connection to new peer
  3. New participant receives connections from everyone

When someone leaves:

  1. Close their peer connection
  2. Remove from UI
  3. Clean up resources
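
A minimal cleanup sketch, reusing the `peers` map from above (`updateUI` is the same assumed helper):

function removePeer(peerId) {
  const peer = peers.get(peerId);
  if (!peer) return;
  peer.pc.close();       // Releases ICE, DTLS, and encoder resources
  peers.delete(peerId);
  updateUI();
}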

Speaker detection:

function detectSpeaker(stream) {
  const audioContext = new AudioContext();
  const analyser = audioContext.createAnalyser();
  const source = audioContext.createMediaStreamSource(stream);
  source.connect(analyser);

  const data = new Uint8Array(analyser.frequencyBinCount);

  setInterval(() => {
    analyser.getByteFrequencyData(data);
    const volume = data.reduce((a, b) => a + b) / data.length;
    // Highlight speaker if volume > threshold
  }, 100);
}

Bandwidth awareness:

  • With 5 participants, you’re sending your video 4 times
  • You’re receiving 4 video streams
  • Total bandwidth = 4 * uploadBitrate + 4 * downloadBitrate
  • This is why mesh fails beyond ~6 participants

Learning milestones:

  1. 3 people can join and see each other → You understand multi-peer management
  2. 4th/5th person degrades quality → You understand mesh limitations
  3. Speaker highlight works → You understand audio analysis
  4. Stats show bandwidth per peer → You understand why SFUs are needed

Project 7: STUN/TURN Server from Scratch

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, C, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: NAT Traversal / Network Protocols / UDP
  • Software or Tool: Raw UDP sockets
  • Main Book: “WebRTC: APIs and RTCWEB Protocols” by Alan B. Johnston

What you’ll build: A STUN server that responds to binding requests (telling clients their public IP) and a TURN server that relays media when direct connectivity fails. Supports the full STUN message format with authentication.

Why it teaches WebRTC: NAT traversal is the hardest part of WebRTC to understand. By building STUN/TURN, you’ll see exactly how NAT hole-punching works, why TURN is the fallback, and understand every ICE candidate type.

Core challenges you’ll face:

  • Implementing STUN message parsing → maps to RFC 5389 binary format
  • Handling NAT types → maps to symmetric vs cone NAT behavior
  • Implementing TURN allocations → maps to relay resource management
  • HMAC authentication → maps to long-term credentials

Key Concepts:

  • STUN Protocol: RFC 5389 - “Session Traversal Utilities for NAT”
  • TURN Protocol: RFC 5766 - “TURN: Relay Extensions to STUN”
  • NAT Types: “WebRTC: APIs and RTCWEB Protocols” Chapter 6 - Johnston
  • ICE Candidate Gathering: RFC 8445 - “ICE”

Difficulty: Expert. Time estimate: 1 month. Prerequisites: Projects 4-6, understanding of UDP sockets, network programming.

Real world outcome:

$ ./ministun -listen 0.0.0.0:3478 -verbose

STUN/TURN Server started on 0.0.0.0:3478
Public IP detected: 203.0.113.50

[STUN] Binding Request from 192.168.1.100:54321
  Transaction ID: 0x1234567890abcdef12345678
  Response: XOR-MAPPED-ADDRESS 86.12.34.56:54321 (client's public IP)

[TURN] Allocate Request from 10.0.0.50:12345
  Username: user1
  Realm: example.com
  Auth: ✓ Valid
  Allocated relay: 203.0.113.50:49152
  Lifetime: 600s

[TURN] Send Indication 10.0.0.50:12345 → 203.0.113.50:49152
  Relaying 1024 bytes to peer 74.125.200.100:19302

Stats:
  STUN Bindings: 1,247
  TURN Allocations: 23 active
  Relayed Data: 1.2 GB

Implementation Hints:

STUN message format (20-byte header + attributes):

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0|     STUN Message Type     |         Message Length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Magic Cookie                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                     Transaction ID (96 bits)                  |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Message types:

  • 0x0001: Binding Request
  • 0x0101: Binding Response
  • 0x0111: Binding Error Response

STUN Binding Request handling (pseudo-code):

1. Receive UDP packet
2. Verify Magic Cookie (0x2112A442)
3. Parse Transaction ID
4. Get source IP:port from UDP header
5. XOR the port with the top 16 bits of the Magic Cookie, and the IPv4
   address with the full cookie
6. Send a Binding Response carrying the XOR-MAPPED-ADDRESS attribute
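
The XOR step is the part most people get wrong. A sketch of it in Node.js for continuity with earlier projects (the project itself targets Go; the byte layout is identical per RFC 5389):

// XOR-MAPPED-ADDRESS value for an IPv4 mapping (attribute type 0x0020, length 8)
const MAGIC_COOKIE = 0x2112a442;

function xorMappedAddress(ip, port) {
  const value = Buffer.alloc(8);
  value.writeUInt8(0x01, 1);                            // Family: IPv4 (byte 0 reserved)
  value.writeUInt16BE(port ^ (MAGIC_COOKIE >>> 16), 2); // X-Port: port ^ 0x2112
  const cookieBytes = [0x21, 0x12, 0xa4, 0x42];
  ip.split('.').forEach((octet, i) => {
    value.writeUInt8(Number(octet) ^ cookieBytes[i], 4 + i); // X-Address
  });
  return value; // Prepend attribute type and length in the full message
}

// Example: xorMappedAddress('86.12.34.56', 54321)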

TURN is more complex—you manage “allocations”:

Allocation = {
  client: "192.168.1.100:54321",
  relay: "203.0.113.50:49152",  // Port you allocate
  permissions: ["74.125.200.100"],  // Allowed peers
  lifetime: 600,  // Seconds
  channels: {}  // For ChannelData optimization
}

TURN data relay:

1. Client sends SendIndication with peer address + data
2. Server looks up allocation by client address
3. Check permissions for peer
4. Send data from relay port to peer
5. When peer sends back, reverse the process

Learning milestones:

  1. STUN binding works and a STUN client (e.g., coturn’s turnutils_stunclient) reports your public IP → You understand STUN basics
  2. Multiple NAT types are handled differently → You understand NAT behavior
  3. TURN allocation works → You understand relay setup
  4. Two peers communicate via your TURN → You’ve built NAT traversal infrastructure

Project 8: Signaling Server with Room Management

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Node.js, Rust, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: WebSocket / State Management / Pub-Sub
  • Software or Tool: WebSocket, Redis (for scaling)
  • Main Book: “Building Realtime Apps with Node.js” by Ethan Brown

What you’ll build: A production-quality signaling server supporting rooms, authentication, presence, reconnection handling, and horizontal scaling with Redis pub-sub. Includes an admin dashboard showing live rooms and connections.

Why it teaches WebRTC: Signaling is the “glue” that lets WebRTC work. A real signaling server must handle edge cases: reconnections, room limits, authentication, and scaling. This project teaches you the infrastructure layer of any WebRTC application.

Core challenges you’ll face:

  • Managing room state and membership → maps to in-memory data structures
  • Handling client disconnection/reconnection → maps to state persistence
  • Scaling across multiple server instances → maps to Redis pub-sub
  • Implementing authentication → maps to JWT/session tokens

Key Concepts:

  • WebSocket protocol: RFC 6455 - “The WebSocket Protocol”
  • Pub-Sub for scaling: “Redis in Action” Chapter 5 - Josiah L. Carlson
  • Graceful disconnection handling: “Building Realtime Apps” Chapter 7
  • JWT authentication: RFC 7519 - “JSON Web Token”

Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Project 4 (P2P Video Call), basic backend development.

Real world outcome:

$ ./signaling-server --port 8080 --redis redis://localhost:6379

Signaling Server v1.0
├─ HTTP: http://localhost:8080
├─ WebSocket: ws://localhost:8080/ws
├─ Redis: connected
└─ Admin: http://localhost:8080/admin

[10:01:32] Client connected: user_abc (session: sess_123)
[10:01:33] user_abc joined room "team-standup" (2/10 participants)
[10:01:34] Forwarding offer: user_abc → user_xyz
[10:01:35] Forwarding answer: user_xyz → user_abc
[10:01:35] ICE candidates exchanging...
[10:01:36] Call established in room "team-standup"

Admin Dashboard (http://localhost:8080/admin):
┌──────────────────────────────────────────────────┐
│  Active Rooms: 12    Total Connections: 47       │
├──────────────────────────────────────────────────┤
│  Room              Participants    Created       │
│  team-standup      4/10           10 min ago     │
│  interview-123     2/2            5 min ago      │
│  support-call      3/5            2 min ago      │
│  ...                                             │
├──────────────────────────────────────────────────┤
│  Server Instances: 3                             │
│  └─ server-1: 18 connections                     │
│  └─ server-2: 15 connections                     │
│  └─ server-3: 14 connections                     │
└──────────────────────────────────────────────────┘

Implementation Hints:

Message protocol:

{ "type": "join", "room": "team-standup", "token": "jwt..." }
{ "type": "offer", "target": "user_xyz", "sdp": {...} }
{ "type": "answer", "target": "user_abc", "sdp": {...} }
{ "type": "ice-candidate", "target": "user_abc", "candidate": {...} }
{ "type": "leave" }
{ "type": "room-state", "participants": ["user_abc", "user_xyz"] }

Room data structure:

type Room struct {
    ID           string
    Participants map[string]*Client
    MaxSize      int
    CreatedAt    time.Time
    Locked       bool
}

type Client struct {
    ID        string
    Conn      *websocket.Conn
    Room      *Room
    UserData  map[string]interface{}
}

Handling disconnection with grace period:

1. WebSocket closes
2. Don't immediately remove from room
3. Start 30-second timer
4. If client reconnects within 30s with same session:
   - Restore to same room
   - Notify peers of "reconnected" status
5. If timer expires:
   - Remove from room
   - Notify peers of departure
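
A sketch of that grace-period logic, written in Node.js for continuity with the client-side projects (the server here is Go, and removeFromRoom/notifyPeers are assumed helpers):

const pendingLeaves = new Map(); // sessionId → timeout handle

function onSocketClose(client) {
  const timer = setTimeout(() => {
    pendingLeaves.delete(client.sessionId);
    removeFromRoom(client);                        // Assumed helper
    notifyPeers(client.room, 'peer-left', client.id);
  }, 30_000);
  pendingLeaves.set(client.sessionId, timer);
}

function onReconnect(sessionId, newSocket) {
  const timer = pendingLeaves.get(sessionId);
  if (!timer) return false;                        // Grace period already expired
  clearTimeout(timer);
  pendingLeaves.delete(sessionId);
  // Re-attach newSocket to the preserved room state here
  return true;
}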

Redis pub-sub for horizontal scaling:

1. Each server instance subscribes to channels:
   - "room:{roomId}" for room messages
   - "server:{serverId}" for server-specific messages

2. When client sends message:
   - Check if target is on this server
   - If yes: send directly via WebSocket
   - If no: publish to Redis channel

3. When receiving from Redis:
   - Check if target is on this server
   - If yes: forward to WebSocket

Learning milestones:

  1. Basic room join/leave works → You understand room state management
  2. SDP/ICE forwarding enables calls → You understand signaling role
  3. Reconnection preserves session → You understand state persistence
  4. Multi-server deployment works → You understand horizontal scaling

Project 9: WebRTC Stats Dashboard

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript/TypeScript
  • Alternative Programming Languages: React, Vue, Svelte
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: RTC Statistics / Data Visualization / Debugging
  • Software or Tool: RTCPeerConnection.getStats(), Chart.js/D3.js
  • Main Book: “Real-Time Communication with WebRTC” by Salvatore Loreto

What you’ll build: A real-time dashboard that visualizes all WebRTC statistics: bitrate graphs, packet loss, jitter, round-trip time, codec information, ICE candidate pairs, and connection quality scores. Essential for debugging WebRTC issues.

Why it teaches WebRTC: getStats() exposes the internals of WebRTC connections. Building this dashboard forces you to understand every metric: what causes high jitter, why packet loss spikes, what different ICE states mean, and how to diagnose call quality issues.

Core challenges you’ll face:

  • Parsing the complex stats report structure → maps to understanding RTCStatsReport
  • Calculating derived metrics (bitrate over time) → maps to stats deltas
  • Visualizing real-time data efficiently → maps to streaming data visualization
  • Correlating stats to diagnose issues → maps to troubleshooting skills

Key Concepts:

  • RTCStatsReport types: W3C WebRTC Statistics specification
  • Calculating bitrate: “Real-Time Communication with WebRTC” Chapter 8
  • Quality metrics interpretation: WebRTC.org “Debugging Guide”
  • Time-series visualization: “D3.js in Action” Chapter 7 - Elijah Meeks

Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Project 4 (P2P Video Call), basic data visualization.

Real world outcome:

┌─────────────────────────────────────────────────────────────────┐
│  📊 WebRTC Stats Dashboard - Call Quality Monitor               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Quality Score: ████████░░ 78/100 (Good)                       │
│                                                                 │
│  ┌─────────────── Bitrate (Video) ────────────────┐            │
│  │    2.5Mbps ─╮    ╭────╮                        │            │
│  │    2.0Mbps  ╰────╯    ╰──╮  ╭──                │            │
│  │    1.5Mbps              ╰──╯                   │            │
│  │    1.0Mbps                                     │            │
│  │           -60s  -45s  -30s  -15s  now          │            │
│  └────────────────────────────────────────────────┘            │
│                                                                 │
│  ┌─ Network Metrics ─────┐  ┌─ Media Stats ──────┐             │
│  │ RTT:        45ms      │  │ Video Codec: VP8   │             │
│  │ Jitter:     12ms      │  │ Resolution: 1280x720│            │
│  │ Packet Loss: 0.2%     │  │ Framerate: 28fps   │             │
│  │ Bandwidth Est: 3.2Mbps│  │ Audio Codec: Opus  │             │
│  └───────────────────────┘  │ Audio Level: ██░ 65%│            │
│                             └────────────────────┘             │
│  ┌─ ICE Candidate Pair ──────────────────────────┐             │
│  │ Local:  192.168.1.100:54321 (host/udp)        │             │
│  │ Remote: 86.12.34.56:12345 (srflx/udp)         │             │
│  │ State: succeeded | Priority: 7960929456789    │             │
│  │ Bytes Sent: 45.2 MB | Received: 52.1 MB       │             │
│  └───────────────────────────────────────────────┘             │
│                                                                 │
│  [Export Stats] [Start Recording] [Simulate Packet Loss]       │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Fetching stats periodically:

setInterval(async () => {
  const stats = await peerConnection.getStats();

  stats.forEach(report => {
    switch(report.type) {
      case 'inbound-rtp':
        handleInboundRTP(report);
        break;
      case 'outbound-rtp':
        handleOutboundRTP(report);
        break;
      case 'candidate-pair':
        handleCandidatePair(report);
        break;
      case 'transport':
        handleTransport(report);
        break;
      // ... many more types
    }
  });
}, 1000);

Key stat types to process:

  • inbound-rtp: Received media (bytesReceived, packetsReceived, packetsLost, jitter)
  • outbound-rtp: Sent media (bytesSent, packetsSent)
  • candidate-pair: ICE connection info (currentRoundTripTime, state)
  • codec: Active codecs (mimeType, clockRate)
  • track: Media track info (frameWidth, frameHeight, framesPerSecond)
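
One useful derived view is the currently selected ICE candidate pair; per the W3C stats spec, the transport report links to it via selectedCandidatePairId (field availability varies by browser):

async function getSelectedPair(pc) {
  const stats = await pc.getStats();
  let pair = null;
  stats.forEach(report => {
    if (report.type === 'transport' && report.selectedCandidatePairId) {
      pair = stats.get(report.selectedCandidatePairId);
    }
  });
  return pair; // A candidate-pair report: currentRoundTripTime, bytesSent, ...
}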

Calculating bitrate (requires delta between samples):

let prevStats = null;

function calculateBitrate(currentStats) {
  if (!prevStats) {
    prevStats = currentStats;
    return 0;
  }

  const bytesDelta = currentStats.bytesReceived - prevStats.bytesReceived;
  const timeDelta = currentStats.timestamp - prevStats.timestamp;
  const bitrate = (bytesDelta * 8) / (timeDelta / 1000); // bits per second

  prevStats = currentStats;
  return bitrate;
}

Quality score calculation (simplified):

function calculateQuality(stats) {
  let score = 100;

  // Penalize high RTT
  if (stats.rtt > 100) score -= 10;
  if (stats.rtt > 200) score -= 20;

  // Penalize packet loss
  score -= stats.packetLoss * 50; // 2% loss = -100

  // Penalize high jitter
  if (stats.jitter > 30) score -= 15;

  return Math.max(0, score);
}

Learning milestones:

  1. Stats appear and update in real-time → You understand getStats() API
  2. Bitrate graph shows smooth line → You understand delta calculations
  3. You can diagnose a “bad call” → You understand what metrics indicate problems
  4. ICE candidate selection is visible → You understand connection establishment

Project 10: Audio-Only Walkie-Talkie App

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: TypeScript, React Native, Flutter
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Audio Processing / Push-to-Talk / Voice Activity Detection
  • Software or Tool: WebRTC + Web Audio API
  • Main Book: “Web Audio API” by Boris Smus

What you’ll build: A group walkie-talkie application with push-to-talk, voice activity detection, spatial audio (hear people from different directions), and noise suppression. Works like Discord voice channels or Clubhouse.

Why it teaches WebRTC: Audio-focused WebRTC teaches you about Opus codec, audio processing pipelines, voice activity detection, and how to optimize for low-latency voice communication. Many WebRTC apps are audio-first.

Core challenges you’ll face:

  • Implementing push-to-talk correctly → maps to muting/unmuting tracks
  • Building voice activity detection → maps to audio level analysis
  • Adding spatial audio positioning → maps to Web Audio spatialization
  • Minimizing audio latency → maps to codec and network optimization

Key Concepts:

  • Opus codec for voice: RFC 6716 - “Opus Audio Codec”
  • Voice Activity Detection: “Web Audio API” Chapter 4 - Boris Smus
  • Spatial Audio: Web Audio API PannerNode documentation
  • Audio constraints: MDN - “MediaTrackConstraints for audio”

Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Project 4 (P2P Video Call).

Real world outcome:

┌─────────────────────────────────────────────────────┐
│  🎙️ Walkie-Talkie - Channel: "gaming-squad"        │
├─────────────────────────────────────────────────────┤
│                                                     │
│        ┌─────┐                                      │
│        │  👤 │ ← Alice (speaking)                   │
│        │ 🔊  │                                      │
│        └─────┘                                      │
│   ┌─────┐         ┌─────┐         ┌─────┐          │
│   │  👤 │         │  👤 │         │  👤 │          │
│   │ 🔇  │         │ 🔇  │         │ 🔊  │ ← Bob    │
│   └─────┘         └─────┘         └─────┘          │
│   Charlie          Diana           (you)           │
│                                                     │
│  Spatial Audio: ON  [Arrange Positions]            │
│                                                     │
│  ┌─────────────────────────────────────────────────┐│
│  │                                                 ││
│  │      [PRESS AND HOLD SPACE TO TALK]            ││
│  │                                                 ││
│  └─────────────────────────────────────────────────┘│
│                                                     │
│  Mode: [Push-to-Talk] [Voice Activity] [Always On] │
│                                                     │
│  Voice Settings:                                   │
│  ├─ Input: Blue Yeti                               │
│  ├─ Noise Suppression: ███████░░░ 70%              │
│  └─ VAD Sensitivity: ████░░░░░░ 40%                │
│                                                     │
│  [🔇 Deafen] [⚙️ Settings] [🚪 Leave Channel]      │
└─────────────────────────────────────────────────────┘

Implementation Hints:

Audio-only peer connection setup:

const localStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true
  },
  video: false
});

// Start muted for push-to-talk
localStream.getAudioTracks()[0].enabled = false;

Push-to-talk implementation:

document.addEventListener('keydown', (e) => {
  if (e.code === 'Space' && !e.repeat) {
    localStream.getAudioTracks()[0].enabled = true;
    showTalkingIndicator();
  }
});

document.addEventListener('keyup', (e) => {
  if (e.code === 'Space') {
    localStream.getAudioTracks()[0].enabled = false;
    hideTalkingIndicator();
  }
});

Voice Activity Detection (VAD):

const audioContext = new AudioContext();
const analyser = audioContext.createAnalyser();

// Analyse a clone of the mic track: a disabled track renders silence,
// so running VAD on the track you mute would never detect voice again.
const vadTrack = localStream.getAudioTracks()[0].clone();
audioContext.createMediaStreamSource(new MediaStream([vadTrack])).connect(analyser);

const dataArray = new Uint8Array(analyser.frequencyBinCount);

function checkVoiceActivity() {
  analyser.getByteFrequencyData(dataArray);
  const average = dataArray.reduce((a, b) => a + b) / dataArray.length;

  // Voice above threshold → unmute the track you actually send; silence → mute
  localStream.getAudioTracks()[0].enabled = average > VAD_THRESHOLD;

  requestAnimationFrame(checkVoiceActivity);
}
checkVoiceActivity();

Spatial audio with PannerNode:

const audioContext = new AudioContext();
const pannerNodes = new Map(); // peerId → PannerNode

// Route each remote peer's audio through its own panner
function addRemoteAudio(stream, peerId) {
  const source = audioContext.createMediaStreamSource(stream);
  const panner = audioContext.createPanner();
  panner.panningModel = 'HRTF';
  source.connect(panner);
  panner.connect(audioContext.destination);
  pannerNodes.set(peerId, panner);
}

function positionPeer(peerId, x, y) {
  const panner = pannerNodes.get(peerId);
  panner.positionX.value = x;
  panner.positionY.value = 0;
  panner.positionZ.value = y; // Map the 2-D room layout onto the horizontal plane
}

Learning milestones:

  1. Push-to-talk works → You understand audio track enabling/disabling
  2. Voice activity detection works → You understand audio level analysis
  3. Spatial audio gives direction → You understand Web Audio spatialization
  4. Low latency achieved → You understand audio optimization

Project 11: Remote Desktop Viewer

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript + Electron
  • Alternative Programming Languages: TypeScript, Rust (native), Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Screen Sharing / Remote Control / Input Capture
  • Software or Tool: WebRTC + Electron for native access
  • Main Book: “Electron in Action” by Steve Kinney

What you’ll build: A complete remote desktop application where one user shares their screen and the other can see it and control it (mouse movement, clicks, keyboard input). Like TeamViewer or AnyDesk.

Why it teaches WebRTC: This project combines screen capture, low-latency video streaming, and bidirectional data channels for input events. It’s a complex real-world application that pushes WebRTC to its limits.

Core challenges you’ll face:

  • Capturing screen with system-level permissions → maps to Electron/native APIs
  • Sending mouse/keyboard events over data channel → maps to input serialization
  • Injecting input events on host machine → maps to OS-level input APIs
  • Optimizing for low latency → maps to encoder settings, network priority

Key Concepts:

  • Electron desktopCapturer: Electron documentation - “desktopCapturer”
  • Input event injection: Electron - “Keyboard and Mouse Simulation”
  • Low-latency encoding: “High Performance Browser Networking” Chapter 18
  • Data channel for control: “Real-Time Communication with WebRTC” Chapter 5

Difficulty: Advanced. Time estimate: 3-4 weeks. Prerequisites: Projects 4-5, Electron basics.

Real world outcome:

┌─────────────────────────────────────────────────────────────────┐
│  🖥️ Remote Desktop - Connected to: Alice's MacBook Pro         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                                                             ││
│  │                                                             ││
│  │               [Remote Desktop View]                         ││
│  │                                                             ││
│  │     You see Alice's screen and can control it               ││
│  │     Mouse movements and clicks are sent in real-time        ││
│  │                                                             ││
│  │                                                             ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
│  ┌─ Connection ────────┐  ┌─ Performance ──────────────────────┐│
│  │ Latency: 32ms       │  │ Resolution: 1920x1080              ││
│  │ Bandwidth: 4.2 Mbps │  │ FPS: 30                            ││
│  │ Packet Loss: 0.0%   │  │ Codec: VP9                         ││
│  └─────────────────────┘  │ Quality: ████████░░ Excellent      ││
│                           └─────────────────────────────────────┘│
│                                                                 │
│  [🖱️ Request Control] [⌨️ Send Ctrl+Alt+Del] [📋 Clipboard Sync]│
│  [📁 File Transfer] [🔒 Lock Session] [❌ Disconnect]           │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Host side (sharing screen + receiving input):

// Electron main process - capture screen
const { desktopCapturer } = require('electron');

async function getScreenStream() {
  const sources = await desktopCapturer.getSources({ types: ['screen'] });
  const stream = await navigator.mediaDevices.getUserMedia({
    video: {
      mandatory: {
        chromeMediaSource: 'desktop',
        chromeMediaSourceId: sources[0].id,
        maxWidth: 1920,
        maxHeight: 1080,
        maxFrameRate: 30
      }
    },
    audio: false
  });
  return stream;
}

Input event protocol (sent over data channel):

// Viewer sends these messages
{ type: 'mouse-move', x: 0.5, y: 0.3 }  // Normalized coordinates
{ type: 'mouse-down', button: 0 }       // 0=left, 1=middle, 2=right
{ type: 'mouse-up', button: 0 }
{ type: 'key-down', key: 'a', modifiers: ['ctrl'] }
{ type: 'key-up', key: 'a' }
{ type: 'scroll', deltaX: 0, deltaY: -120 }

Host receiving and injecting input (using robotjs or similar):

const robot = require('robotjs');

const BUTTONS = ['left', 'middle', 'right'];

dataChannel.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  switch (msg.type) {
    case 'mouse-move': {
      const screenSize = robot.getScreenSize();
      robot.moveMouse(msg.x * screenSize.width, msg.y * screenSize.height);
      break;
    }
    case 'mouse-down':
      robot.mouseToggle('down', BUTTONS[msg.button]); // mouseClick would press AND release
      break;
    case 'mouse-up':
      robot.mouseToggle('up', BUTTONS[msg.button]);
      break;
    case 'key-down':
      robot.keyToggle(msg.key, 'down', msg.modifiers);
      break;
    case 'key-up':
      robot.keyToggle(msg.key, 'up', msg.modifiers);
      break;
  }
};

Viewer capturing mouse on video element:

videoElement.addEventListener('mousemove', (e) => {
  const rect = videoElement.getBoundingClientRect();
  const x = (e.clientX - rect.left) / rect.width;
  const y = (e.clientY - rect.top) / rect.height;

  dataChannel.send(JSON.stringify({ type: 'mouse-move', x, y }));
});
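
Viewer capturing keyboard (mirrors the key-down/key-up messages above; preventDefault stops most page-level shortcuts from firing locally):

document.addEventListener('keydown', (e) => {
  e.preventDefault();
  const modifiers = [];
  if (e.ctrlKey) modifiers.push('ctrl');
  if (e.altKey) modifiers.push('alt');
  if (e.shiftKey) modifiers.push('shift');
  dataChannel.send(JSON.stringify({ type: 'key-down', key: e.key, modifiers }));
});

document.addEventListener('keyup', (e) => {
  e.preventDefault();
  dataChannel.send(JSON.stringify({ type: 'key-up', key: e.key }));
});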

Low-latency optimizations:

  • Use VP9 or H.264 with hardware encoding
  • Set maxBitrate high (5-10 Mbps for crisp text)
  • Prioritize data channel messages
  • Use unreliable data channel for mouse-move (ordered but no retransmit)
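
A browser-side sketch of these settings, assuming an async setup function with an established RTCPeerConnection pc:

// Cap the encoder bitrate for crisp text
const sender = pc.getSenders().find((s) => s.track && s.track.kind === 'video');
const params = sender.getParameters();
if (!params.encodings || !params.encodings.length) params.encodings = [{}];
params.encodings[0].maxBitrate = 8_000_000; // 8 Mbps
await sender.setParameters(params);

// Hint the encoder that this is screen content (favor sharpness over motion)
sender.track.contentHint = 'detail';

// Ordered but unreliable channel for high-rate mouse-move traffic
const inputChannel = pc.createDataChannel('input', {
  ordered: true,
  maxRetransmits: 0,
});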

Learning milestones:

  1. Screen share works in Electron → You understand desktopCapturer
  2. Mouse movements are sent and received → You understand data channel input
  3. Clicks and keyboard work → You understand input injection
  4. Low enough latency to be usable → You understand optimization

Project 12: SFU (Selective Forwarding Unit)

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, C++, Node.js (Mediasoup)
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Media Servers / RTP Forwarding / Simulcast
  • Software or Tool: Pion WebRTC (Go) or libwebrtc
  • Main Book: “WebRTC for the Curious” by Sean DuBois (Pion creator)

What you’ll build: A Selective Forwarding Unit that sits between conference participants—receiving one video stream from each user and forwarding it to all others. Supports simulcast (multiple quality layers) and dynamic layer switching.

Why it teaches WebRTC: The SFU is the architecture behind Zoom, Meet, and Teams. Building one teaches you RTP packet handling, RTCP feedback, simulcast, bandwidth estimation, and how to build scalable real-time infrastructure.

Core challenges you’ll face:

  • Terminating WebRTC connections server-side → maps to using Pion/libwebrtc
  • Forwarding RTP packets efficiently → maps to avoiding transcoding
  • Implementing simulcast layer selection → maps to bandwidth adaptation
  • Handling RTCP feedback → maps to PLI, NACK, REMB

Key Concepts:

  • RTP/RTCP protocols: RFC 3550 - “RTP: A Transport Protocol for Real-Time Applications”
  • Simulcast: “WebRTC for the Curious” Chapter 8 - Sean DuBois
  • Pion WebRTC library: Pion documentation and examples
  • REMB (bandwidth estimation): draft-alvestrand-rmcat-remb

Difficulty: Master Time estimate: 1-2 months Prerequisites: All previous projects, Go programming, deep networking knowledge

Real world outcome:

$ ./mini-sfu --port 8443 --stun stun.l.google.com:19302

Mini-SFU v1.0 started
├─ HTTPS/WSS: https://localhost:8443
├─ STUN: stun.l.google.com:19302
└─ TURN: configured

[Room: daily-standup]
├─ Participants: 4
├─ Ingress Streams: 4
├─ Egress Streams: 12 (each user receives 3)
└─ Total Bandwidth: 18.4 Mbps

Stream Routing:
┌────────────────────────────────────────────────────────────────┐
│  Alice (sender)                                                │
│  └─ Simulcast: 1280x720 @ 2.5Mbps                             │
│               640x360 @ 0.8Mbps                                │
│               320x180 @ 0.2Mbps                                │
│  Forwarding to:                                                │
│    → Bob (720p - good bandwidth)                               │
│    → Charlie (360p - medium bandwidth)                         │
│    → Diana (180p - poor bandwidth)                             │
└────────────────────────────────────────────────────────────────┘

RTCP Feedback:
├─ PLI requests: 12 (picture loss)
├─ NACK requests: 45 (packet retransmit)
└─ REMB estimates: Alice=3.2Mbps, Bob=1.8Mbps

API: POST /api/rooms/:id/subscribe
     POST /api/rooms/:id/set-layer
     GET  /api/stats

Implementation Hints:

SFU architecture:

     ┌─────────┐
     │  Alice  │
     └────┬────┘
          │ (1 upload)
          ▼
    ┌─────────────┐
    │     SFU     │
    │  ┌───────┐  │
    │  │Router │  │
    │  └───┬───┘  │
    └──────┼──────┘
    ┌──────┼──────┐
    │      │      │
    ▼      ▼      ▼
 ┌─────┐┌─────┐┌─────┐
 │ Bob ││Carol││Diana│
 └─────┘└─────┘└─────┘
   (3 downloads)

Pion WebRTC setup (Go):

// Create a new WebRTC API
api := webrtc.NewAPI(webrtc.WithMediaEngine(mediaEngine))

// For each participant:
peerConnection, _ := api.NewPeerConnection(config)

// When receiving track from participant:
peerConnection.OnTrack(func(track *webrtc.TrackRemote, receiver *webrtc.RTPReceiver) {
    // Forward to all other participants
    for _, otherPeer := range room.Peers {
        if otherPeer != thisPeer {
            // Create local track to send
            localTrack, _ := webrtc.NewTrackLocalStaticRTP(
                track.Codec().RTPCodecCapability,
                track.ID(),
                track.StreamID(),
            )
            otherPeer.AddTrack(localTrack)

            // Forward RTP packets
            go forwardRTP(track, localTrack)
        }
    }
})

func forwardRTP(remote *webrtc.TrackRemote, local *webrtc.TrackLocalStaticRTP) {
    for {
        rtp, _, err := remote.ReadRTP()
        if err != nil {
            return // track closed or peer left
        }
        if err := local.WriteRTP(rtp); err != nil {
            return
        }
    }
}

Simulcast handling:

// Participant sends 3 layers with different RIDs
// SFU receives all three

type SimulcastLayers struct {
    High   *webrtc.TrackRemote  // 720p
    Medium *webrtc.TrackRemote  // 360p
    Low    *webrtc.TrackRemote  // 180p
}

// Select layer based on receiver bandwidth
func selectLayer(receiver *Peer, layers *SimulcastLayers) *webrtc.TrackRemote {
    bandwidth := receiver.EstimatedBandwidth
    if bandwidth > 2_000_000 {
        return layers.High
    } else if bandwidth > 500_000 {
        return layers.Medium
    }
    return layers.Low
}
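
For reference, those RID-tagged layers exist because the publishing browser opted into simulcast. A typical client-side sketch using the standard WebRTC API (pc and cameraTrack are assumed; rid values must match what the SFU expects):

// Publisher (browser): send three simulcast layers tagged h/m/l
const transceiver = pc.addTransceiver(cameraTrack, {
  direction: 'sendonly',
  sendEncodings: [
    { rid: 'h', maxBitrate: 2_500_000 },                          // ~720p
    { rid: 'm', maxBitrate: 800_000, scaleResolutionDownBy: 2 },  // ~360p
    { rid: 'l', maxBitrate: 200_000, scaleResolutionDownBy: 4 },  // ~180p
  ],
});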

RTCP feedback handling:

// Handle Picture Loss Indication (PLI). Viewers report loss on the SFU's
// *sending* legs, so read RTCP from the RTPSender returned by AddTrack and
// ask the original publisher for a keyframe on its PeerConnection
rtpSender, _ := otherPeer.AddTrack(localTrack)

go func() {
    for {
        rtcpPackets, _, err := rtpSender.ReadRTCP()
        if err != nil {
            return
        }
        for _, pkt := range rtcpPackets {
            if _, ok := pkt.(*rtcp.PictureLossIndication); ok {
                // track is the publisher's incoming TrackRemote
                peerConnection.WriteRTCP([]rtcp.Packet{
                    &rtcp.PictureLossIndication{MediaSSRC: uint32(track.SSRC())},
                })
            }
        }
    }
}()

Learning milestones:

  1. SFU receives and forwards 2 streams → You understand basic forwarding
  2. 3+ participants work → You understand N-way routing
  3. Simulcast layer selection works → You understand bandwidth adaptation
  4. RTCP feedback flows correctly → You understand the full protocol

Project 13: WebRTC-SIP Gateway

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, C++, Node.js
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Protocol Translation / SIP / VoIP
  • Software or Tool: Pion WebRTC + SIP library
  • Main Book: “SIP: Understanding the Session Initiation Protocol” by Alan B. Johnston

What you’ll build: A gateway that connects WebRTC clients to traditional phone systems (SIP). A browser user can call a phone number, or receive calls from phones. Handles codec transcoding between Opus and G.711.

Why it teaches WebRTC: Real-world telephony integration requires understanding both WebRTC and traditional VoIP. This project teaches protocol translation, codec negotiation across different systems, and how WebRTC fits into the larger telecommunications ecosystem.

Core challenges you’ll face:

  • Implementing SIP signaling → maps to INVITE/ACK/BYE flow
  • Translating SDP between WebRTC and SIP → maps to codec negotiation
  • Transcoding Opus ↔ G.711 → maps to media processing
  • Handling DTMF (phone dial tones) → maps to RFC 4733

Key Concepts:

  • SIP Protocol: “SIP: Understanding the Session Initiation Protocol” - Johnston
  • SDP for SIP: RFC 3264 - “An Offer/Answer Model with SDP”
  • G.711 codec: ITU-T G.711
  • DTMF over RTP: RFC 4733

Difficulty: Expert Time estimate: 1-2 months Prerequisites: Projects 4, 7, 8, understanding of VoIP

Real world outcome:

┌─────────────────────────────────────────────────────────────────┐
│  📞 WebRTC-SIP Gateway                                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Browser User                 Gateway                Phone      │
│  (WebRTC/Opus)                                    (SIP/G.711)  │
│       │                         │                      │        │
│       │──── WebRTC Offer ─────►│                      │        │
│       │                         │──── SIP INVITE ────►│        │
│       │                         │◄─── SIP 180 Ring ───│        │
│       │◄─── ringback tone ─────│                      │        │
│       │                         │◄─── SIP 200 OK ─────│        │
│       │◄─── WebRTC Answer ─────│                      │        │
│       │                         │                      │        │
│       │◄═══ Opus Audio ════════│═══ G.711 Audio ════►│        │
│       │     (transcoded)        │                      │        │
│                                                                 │
│  Active Calls: 23                                               │
│  ├─ +1-555-0123 ↔ user@browser.com (02:34)                     │
│  ├─ +1-555-0456 ↔ support@browser.com (15:22)                  │
│  └─ +1-555-0789 ← incoming (ringing)                           │
│                                                                 │
│  SIP Trunk: sip.provider.com (registered)                      │
│  Codecs: Opus (WebRTC) ↔ PCMU/PCMA (SIP)                       │
│  Transcoding Load: 12% CPU                                      │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Architecture:

WebRTC Client ◄──► [Gateway] ◄──► SIP Trunk/PBX
                       │
              ┌────────┴────────┐
              │   Transcoder    │
              │  Opus ↔ G.711   │
              └─────────────────┘

SIP signaling flow (simplified):

WebRTC                Gateway                    SIP
  │                      │                        │
  │── Offer (SDP) ──────►│                        │
  │                      │── INVITE (SDP) ──────►│
  │                      │◄── 100 Trying ────────│
  │                      │◄── 180 Ringing ───────│
  │◄── ICE candidates ───│                        │
  │                      │◄── 200 OK (SDP) ──────│
  │◄── Answer (SDP) ─────│                        │
  │                      │── ACK ───────────────►│
  │                      │                        │
  │◄════ RTP (Opus) ═════│═════ RTP (G.711) ════►│
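
For orientation, a minimal INVITE the gateway might emit (illustrative addresses; header set per RFC 3261):

INVITE sip:+15550123@sip.provider.com SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bKabc123
Max-Forwards: 70
From: <sip:gateway@example.com>;tag=1928301774
To: <sip:+15550123@sip.provider.com>
Call-ID: a84b4c76e66710@192.168.1.100
CSeq: 1 INVITE
Contact: <sip:gateway@192.168.1.100:5060>
Content-Type: application/sdp
Content-Length: ...

(the translated SDP shown below goes in the message body)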

SDP translation (WebRTC to SIP):

WebRTC SDP:
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
a=ice-ufrag:...
a=fingerprint:sha-256 ...

Translated SIP SDP:
m=audio 10000 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
c=IN IP4 192.168.1.100

Opus to G.711 transcoding (using libopus and g711 codec):

// Receive an Opus packet from the WebRTC leg
opusPacket := readFromWebRTC()

// Decode Opus to 16-bit linear PCM (resample 48 kHz -> 8 kHz for G.711)
pcmSamples := opusDecoder.Decode(opusPacket)

// Encode PCM to G.711 mu-law: one output byte per input sample
g711Samples := make([]byte, len(pcmSamples))
for i, sample := range pcmSamples {
    g711Samples[i] = linearToMulaw(sample)
}

// Send to the SIP endpoint as payload type 0 (PCMU)
sendToSIP(g711Samples)

// linearToMulaw compresses one 16-bit PCM sample to 8-bit mu-law (ITU-T G.711)
func linearToMulaw(sample int16) byte {
    s, sign := int(sample), byte(0)
    if s < 0 {
        s, sign = -s, 0x80
    }
    if s > 32635 {
        s = 32635 // clip to avoid overflow after biasing
    }
    s += 0x84 // bias
    exp := 7
    for mask := 0x4000; s&mask == 0 && exp > 0; mask >>= 1 {
        exp--
    }
    return ^(sign | byte(exp)<<4 | byte((s>>(exp+3))&0x0F))
}

DTMF handling:

// Receive DTMF event from WebRTC data channel
// or detect in-band DTMF tones

// Send as an RFC 4733 telephone-event RTP payload to the SIP side
dtmfPayload := []byte{
    digit,      // event code: 0-9 = digits, 10 = '*', 11 = '#'
    0x80 | 10,  // end-of-event bit + volume (-10 dBm0)
    0x00, 0xa0, // duration: 160 timestamp units = 20 ms at 8 kHz
}

Learning milestones:

  1. SIP INVITE/200/ACK works → You understand SIP signaling
  2. Audio transcoding works → You understand codec conversion
  3. Calls to real phones work → You’ve built a complete gateway
  4. DTMF is transmitted → You understand telephony details

Project 14: Live Streaming with WebRTC

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: JavaScript + Go
  • Alternative Programming Languages: TypeScript, Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Broadcasting / One-to-Many / Media Servers
  • Software or Tool: WebRTC + HLS/DASH fallback
  • Main Book: “Streaming Media with Peer-to-Peer Networks” by Eli Kara

What you’ll build: A live streaming platform where a broadcaster sends video via WebRTC (low latency) and thousands of viewers receive it—either via WebRTC (for <500ms latency) or HLS/DASH fallback (for scale). Like Twitch but with WebRTC.

Why it teaches WebRTC: One-to-many streaming pushes WebRTC’s architecture to its limits. You’ll understand why you need server-side infrastructure (SFU/CDN), how to handle massive fanout, and when to fall back to traditional streaming protocols.

Core challenges you’ll face:

  • Ingesting WebRTC and broadcasting to many → maps to SFU fanout
  • Converting WebRTC to HLS/DASH for CDN → maps to transcoding/packaging
  • Handling viewer scale → maps to infrastructure design
  • Achieving sub-second latency → maps to WebRTC advantages

Key Concepts:

  • WebRTC ingest: “WebRTC for the Curious” - Sean DuBois
  • HLS protocol: Apple HLS specification
  • DASH protocol: MPEG-DASH standard
  • CDN integration: “High Performance Browser Networking” Chapter 14

Difficulty: Advanced Time estimate: 3-4 weeks Prerequisites: Project 12 (SFU), understanding of video streaming

Real world outcome:

┌─────────────────────────────────────────────────────────────────┐
│  📺 Live Stream - "Gaming with Alice"                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                                                             ││
│  │                    [LIVE VIDEO PLAYER]                      ││
│  │                                                             ││
│  │                   🔴 LIVE  👁 12,453 viewers                ││
│  │                                                             ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
│  Latency Mode: [Ultra-Low (WebRTC)] [Low (LL-HLS)] [Normal]    │
│  Current Latency: 380ms (WebRTC)                               │
│                                                                 │
│  Stream Stats:                                                 │
│  ├─ Broadcaster: Alice (WebRTC ingest)                         │
│  │   └─ 1920x1080 @ 60fps, 8 Mbps                              │
│  ├─ Viewers (WebRTC): 847 (< 500ms latency)                    │
│  ├─ Viewers (LL-HLS): 4,206 (2-4s latency)                     │
│  └─ Viewers (HLS): 7,400 (8-10s latency)                       │
│                                                                 │
│  Chat: [WebRTC viewers see chat sync'd with video]             │
│                                                                 │
│  Broadcaster Dashboard:                                        │
│  └─ Stream Key: rtmp://ingest.example.com/live/abc123          │
│     OR WebRTC: https://studio.example.com/broadcast            │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Architecture:

              ┌──────────────┐
              │ Broadcaster  │
              │  (WebRTC)    │
              └──────┬───────┘
                     │
              ┌──────▼───────┐
              │   Ingest     │
              │   Server     │
              └──────┬───────┘
          ┌───────────┼───────────┐
          │           │           │
    ┌─────▼─────┐ ┌───▼──────┐ ┌──▼──────┐
    │ SFU Relay │ │Transcoder│ │   CDN   │
    │ (WebRTC)  │ │ (FFmpeg) │ │ Origin  │
    └─────┬─────┘ └────┬─────┘ └────┬────┘
          │            │            │
     ┌────▼────┐  ┌────▼────┐  ┌────▼────┐
     │ WebRTC  │  │ LL-HLS  │  │   HLS   │
     │ Viewers │  │ Viewers │  │ Viewers │
     │ (<0.5s) │  │ (~3s)   │  │ (~10s)  │
     └─────────┘  └─────────┘  └─────────┘

WebRTC ingest (server-side):

// Receive broadcaster's WebRTC stream
pc.OnTrack(func(track *webrtc.TrackRemote, receiver *webrtc.RTPReceiver) {
    // 1. Forward to SFU for WebRTC viewers
    sfu.AddTrack(track)

    // 2. Feed to the transcoder for HLS
    go func() {
        for {
            rtp, _, err := track.ReadRTP()
            if err != nil {
                return // stream ended
            }
            transcoder.WriteRTP(rtp)
        }
    }()
})

Transcoding to HLS (using FFmpeg):

# Receive RTP from the Go process and package HLS. (ffmpeg cannot parse raw
# RTP from a bare udp:// input; in practice you describe the stream in an
# .sdp file and use: ffmpeg -protocol_whitelist file,udp,rtp -i stream.sdp)
ffmpeg -i udp://127.0.0.1:5000 \
  -c:v libx264 -preset veryfast \
  -c:a aac \
  -f hls \
  -hls_time 2 \
  -hls_list_size 5 \
  -hls_flags delete_segments \
  /var/www/live/stream.m3u8

Low-Latency HLS (LL-HLS):

  • Smaller segments (0.5-1s instead of 6s)
  • Partial segments
  • Blocking playlist reload
  • Achieves 2-4s latency

Viewer connection logic:

async function connectViewer() {
  // Try WebRTC first (best latency)
  try {
    await connectWebRTC();
    return;
  } catch (e) {
    console.log('WebRTC failed, falling back to HLS');
  }

  // Fall back to HLS for reliability
  connectHLS();
}
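
A sketch of connectHLS() using the hls.js library (assumes an Hls global from the hls.js script; the URL and element ID are illustrative):

function connectHLS() {
  const video = document.getElementById('player');
  const url = 'https://cdn.example.com/live/stream.m3u8';

  if (video.canPlayType('application/vnd.apple.mpegurl')) {
    video.src = url; // Safari plays HLS natively
  } else {
    const hls = new Hls({ lowLatencyMode: true });
    hls.loadSource(url);
    hls.attachMedia(video);
  }
  video.play();
}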

Scalability considerations:

  • WebRTC: Limited by SFU capacity (~500-2000 viewers per server)
  • LL-HLS: CDN can handle millions
  • Hybrid: Offer choice, use WebRTC for VIP/interactive viewers

Learning milestones:

  1. WebRTC ingest works → You understand server-side WebRTC
  2. Transcoding to HLS works → You understand protocol conversion
  3. WebRTC fanout to 10+ viewers → You understand SFU scaling
  4. Hybrid delivery works → You understand real-world streaming architecture

Project 15: P2P Multiplayer Game Engine

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Rust (WebAssembly), C++
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Game Networking / State Sync / Prediction
  • Software or Tool: WebRTC Data Channels + Canvas/WebGL
  • Main Book: “Multiplayer Game Programming” by Joshua Glazer

What you’ll build: A real-time multiplayer game using WebRTC data channels for P2P communication. Implement client-side prediction, server reconciliation (or peer authority), and lag compensation. Build a simple game (like Asteroids or a top-down shooter) to demonstrate.

Why it teaches WebRTC: Games demand the lowest latency and most efficient data transfer. This project teaches you unreliable/unordered data channels, binary protocols, and how to build real-time synchronized experiences.

Core challenges you’ll face:

  • Designing efficient binary game state protocol → maps to ArrayBuffer messaging
  • Implementing client-side prediction → maps to local simulation
  • Handling network jitter → maps to interpolation/extrapolation
  • Managing P2P topology for games → maps to host migration

Key Concepts:

  • Game networking patterns: “Multiplayer Game Programming” Chapter 6 - Glazer & Madhav
  • Client-side prediction: Valve’s “Source Multiplayer Networking” article
  • Binary serialization: “Real-Time Collision Detection” Appendix - Ericson
  • Unreliable data channels: WebRTC Data Channel specification

Difficulty: Advanced Time estimate: 3-4 weeks Prerequisites: Projects 4-5, game development basics

Real world outcome:

┌─────────────────────────────────────────────────────────────────┐
│  🎮 P2P Space Shooter - Lobby: "friends-match"                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │              * .    .        .   .    *                     ││
│  │    .    *         .    △ (you)    .        *               ││
│  │         .    ▷ (alice)    .           .                    ││
│  │   .  .       .       *       .   ◁ (bob)   .    *         ││
│  │        .   *    ═══════ (laser)    .       .              ││
│  │    *        .        .    .    *        .         .       ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                 │
│  Scoreboard:                                                   │
│  1. Alice    │ 12 kills │ Ping: 45ms                           │
│  2. You      │  8 kills │ Local                                │
│  3. Bob      │  5 kills │ Ping: 67ms                           │
│                                                                 │
│  Network Stats:                                                │
│  ├─ Topology: Mesh (3 players)                                 │
│  ├─ Updates/sec: 60 (unreliable channel)                       │
│  ├─ State size: 128 bytes/update                               │
│  ├─ Prediction error: 2.3 pixels avg                           │
│  └─ Rollback rate: 0.8%                                        │
│                                                                 │
│  [WASD to move] [Space to shoot] [Esc for menu]                │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Data channel configuration for games:

const gameChannel = peerConnection.createDataChannel("game", {
  ordered: false,        // Don't wait for out-of-order packets
  maxRetransmits: 0      // Unreliable - no retransmission
});

// For critical events (player joined, game over), use reliable channel
const eventChannel = peerConnection.createDataChannel("events", {
  ordered: true          // Reliable and ordered
});

Binary game state protocol (using ArrayBuffer):

// Position update: [type(1) | playerId(1) | x(4) | y(4) | angle(4) | velX(4) | velY(4)]
function encodePosition(player) {
  const buffer = new ArrayBuffer(22);
  const view = new DataView(buffer);

  view.setUint8(0, MESSAGE_TYPE.POSITION);
  view.setUint8(1, player.id);
  view.setFloat32(2, player.x, true);
  view.setFloat32(6, player.y, true);
  view.setFloat32(10, player.angle, true);
  view.setFloat32(14, player.velX, true);
  view.setFloat32(18, player.velY, true);

  return buffer;
}

function decodePosition(buffer) {
  const view = new DataView(buffer);
  return {
    type: view.getUint8(0),
    id: view.getUint8(1),
    x: view.getFloat32(2, true),
    y: view.getFloat32(6, true),
    angle: view.getFloat32(10, true),
    velX: view.getFloat32(14, true),
    velY: view.getFloat32(18, true)
  };
}
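
Wiring the codec to the unreliable channel (MESSAGE_TYPE, localPlayer, and updateRemotePlayer come from your game code):

// RTCDataChannel delivers binary as a Blob by default; ask for ArrayBuffers
gameChannel.binaryType = 'arraybuffer';

gameChannel.onmessage = (event) => {
  const msg = decodePosition(event.data);
  if (msg.type === MESSAGE_TYPE.POSITION) updateRemotePlayer(msg);
};

// Broadcast local state 60 times per second
setInterval(() => {
  if (gameChannel.readyState === 'open') {
    gameChannel.send(encodePosition(localPlayer));
  }
}, 1000 / 60);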

Client-side prediction:

class GameState {
  constructor() {
    this.localInputs = [];  // Unacknowledged inputs
    this.tick = 0;
  }

  processLocalInput(input) {
    // 1. Apply input locally immediately
    this.applyInput(this.localPlayer, input);

    // 2. Save input with tick number
    this.localInputs.push({ tick: this.tick, input });

    // 3. Send to peers
    sendInput(input, this.tick);

    this.tick++;
  }

  receiveServerState(state) {
    // 1. Set authoritative state
    this.localPlayer = state.player;

    // 2. Remove acknowledged inputs
    this.localInputs = this.localInputs.filter(i => i.tick > state.lastTick);

    // 3. Re-apply unacknowledged inputs
    for (const saved of this.localInputs) {
      this.applyInput(this.localPlayer, saved.input);
    }
  }
}

Interpolation for remote players:

class RemotePlayer {
  constructor() {
    this.positionBuffer = [];  // Timestamped positions
  }

  addPosition(pos, timestamp) {
    this.positionBuffer.push({ pos, timestamp });
    // Keep only last 1 second
    const cutoff = Date.now() - 1000;
    this.positionBuffer = this.positionBuffer.filter(p => p.timestamp > cutoff);
  }

  getInterpolatedPosition() {
    const renderTime = Date.now() - 100; // Render 100ms in the past

    // Find two positions to interpolate between
    for (let i = 0; i < this.positionBuffer.length - 1; i++) {
      const a = this.positionBuffer[i];
      const b = this.positionBuffer[i + 1];

      if (a.timestamp <= renderTime && renderTime <= b.timestamp) {
        const t = (renderTime - a.timestamp) / (b.timestamp - a.timestamp);
        return lerp(a.pos, b.pos, t);
      }
    }

    // Extrapolate if no future data
    return this.extrapolate();
  }
}
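
The lerp used above is plain linear interpolation between two position vectors (an assumed helper, not a library call):

// t = 0 returns a, t = 1 returns b
function lerp(a, b, t) {
  return { x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t };
}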

Learning milestones:

  1. Two players see each other move → You understand basic state sync
  2. Movement feels smooth locally → You understand client-side prediction
  3. Remote players move smoothly → You understand interpolation
  4. Shots register correctly → You understand lag compensation

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---------|------------|------|------------------------|------------|
| 1. Media Stream Playground | Beginner | Weekend | Medium | ⭐⭐⭐ |
| 2. Screen Recorder | Beginner | Weekend | Medium | ⭐⭐⭐⭐ |
| 3. Real-Time Video Filters | Intermediate | 1-2 weeks | High | ⭐⭐⭐⭐⭐ |
| 4. P2P Video Call | Intermediate | 1-2 weeks | Very High | ⭐⭐⭐⭐ |
| 5. File Transfer | Intermediate | 1 week | High | ⭐⭐⭐ |
| 6. Mesh Conference | Advanced | 2-3 weeks | Very High | ⭐⭐⭐⭐ |
| 7. STUN/TURN Server | Expert | 1 month | Extreme | ⭐⭐⭐⭐⭐ |
| 8. Signaling Server | Intermediate | 1-2 weeks | High | ⭐⭐⭐ |
| 9. Stats Dashboard | Intermediate | 1-2 weeks | Very High | ⭐⭐⭐ |
| 10. Walkie-Talkie | Intermediate | 1-2 weeks | High | ⭐⭐⭐⭐ |
| 11. Remote Desktop | Advanced | 3-4 weeks | Very High | ⭐⭐⭐⭐⭐ |
| 12. SFU | Master | 1-2 months | Extreme | ⭐⭐⭐⭐⭐ |
| 13. WebRTC-SIP Gateway | Expert | 1-2 months | Extreme | ⭐⭐⭐⭐ |
| 14. Live Streaming | Advanced | 3-4 weeks | Very High | ⭐⭐⭐⭐⭐ |
| 15. P2P Game Engine | Advanced | 3-4 weeks | Very High | ⭐⭐⭐⭐⭐ |

For Beginners (0-3 months):

Start here to build foundation:

  1. Project 1: Media Stream Playground - Understand browser media APIs
  2. Project 2: Screen Recorder - Learn capture and recording
  3. Project 4: P2P Video Call - THE core WebRTC project
  4. Project 5: File Transfer - Understand data channels

For Intermediate Developers (3-6 months):

Expand your skills:

  1. Project 3: Video Filters - Media processing pipeline
  2. Project 8: Signaling Server - Production infrastructure
  3. Project 9: Stats Dashboard - Debugging expertise
  4. Project 6: Mesh Conference - Multi-party complexity

For Advanced Developers (6-12 months):

Master the technology:

  1. Project 7: STUN/TURN Server - NAT traversal from scratch
  2. Project 11: Remote Desktop - Complex real-world application
  3. Project 12: SFU - Server-side WebRTC

For Experts (12+ months):

Build infrastructure:

  1. Project 13: SIP Gateway - Protocol bridging
  2. Project 14: Live Streaming - Broadcast at scale
  3. Project 15: P2P Game - Real-time game networking

Final Capstone Project: Production Video Conferencing Platform

  • File: LEARN_WEBRTC_DEEP_DIVE.md
  • Main Programming Language: TypeScript (Frontend) + Go (Backend)
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: Full-Stack WebRTC / Distributed Systems / Media Servers
  • Software or Tool: Everything learned above
  • Main Book: All previous books combined

What you’ll build: A complete video conferencing platform like Zoom/Meet with:

  • Multi-party video calls (SFU-based)
  • Screen sharing with annotation
  • Virtual backgrounds
  • Recording to cloud
  • Breakout rooms
  • Waiting room and host controls
  • Mobile apps (React Native/Flutter)
  • Dial-in via SIP gateway
  • Live streaming to YouTube/Twitch
  • Real-time captions
  • Analytics dashboard

Why this is the capstone: This project combines every concept from all previous projects. You’ll integrate signaling, SFU, STUN/TURN, stats monitoring, screen sharing, video processing, and more into a cohesive, production-ready platform.

Core challenges you’ll face:

  • Orchestrating all WebRTC components → maps to system architecture
  • Handling scale (1000s of concurrent meetings) → maps to distributed systems
  • Mobile + Web consistency → maps to cross-platform development
  • Reliability and fallbacks → maps to production engineering

Key Concepts: All concepts from previous projects, plus:

  • Kubernetes for scaling media servers: “Kubernetes in Action” - Marko Lukša
  • Distributed systems: “Designing Data-Intensive Applications” - Martin Kleppmann
  • Mobile WebRTC: Platform-specific documentation

Difficulty: Master Time estimate: 6-12 months Prerequisites: All 15 previous projects

Real world outcome:

┌─────────────────────────────────────────────────────────────────┐
│  🎥 MeetUp Pro - Video Conferencing Platform                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Meeting: "Q4 Planning" | Host: Alice | 47 participants        │
│                                                                 │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐   │
│  │ Alice🎤 │ │   Bob   │ │ Charlie │ │  Diana  │ │   Eve   │   │
│  │(speaking)│ │         │ │         │ │         │ │         │   │
│  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘   │
│  [+ 42 more participants in grid view]                         │
│                                                                 │
│  ┌─ Screen Share ────────────────────────────────────────────┐  │
│  │                                                           │  │
│  │   [Alice's Screen - Quarterly Revenue Spreadsheet]        │  │
│  │   📝 Annotations enabled                                  │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Features:                                                     │
│  [🎤 Mute All] [📹 Spotlight] [🖐 Raise Hand Queue: 3]         │
│  [🚪 Breakout: 4 rooms] [⏺ Recording] [📺 Stream to YouTube]   │
│  [☎️ Dial-in: +1-555-0123] [💬 Live Captions: ON]              │
│                                                                 │
│  Platform Stats (Admin):                                       │
│  ├─ Concurrent meetings: 2,847                                 │
│  ├─ Total participants: 18,432                                 │
│  ├─ SFU servers: 12 (auto-scaling)                             │
│  ├─ Average latency: 89ms                                      │
│  └─ Recording storage: 2.4 TB today                            │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

This capstone integrates all previous knowledge:

  1. Frontend: React/Vue with custom video grid, controls, and chat
  2. Signaling: Project 8’s signaling server with room management
  3. Media Servers: Project 12’s SFU, deployed with Kubernetes
  4. TURN Servers: Project 7’s TURN for connectivity
  5. Video Processing: Project 3’s virtual backgrounds
  6. Screen Sharing: Project 2 + Project 11’s control channel
  7. Stats: Project 9’s dashboard for monitoring
  8. Telephony: Project 13’s SIP gateway for dial-in
  9. Streaming: Project 14’s WebRTC-to-HLS conversion
  10. Games/Interactive: Project 15’s low-latency patterns

Architecture:

                    ┌──────────────────┐
                    │   Load Balancer  │
                    └────────┬─────────┘
                             │
    ┌────────────────────────┼────────────────────────┐
    │                        │                        │
┌───▼───┐              ┌─────▼─────┐             ┌────▼────┐
│Web App│              │  API/Auth │             │  Admin  │
│(React)│              │   Server  │             │Dashboard│
└───┬───┘              └─────┬─────┘             └─────────┘
    │                        │
    │         ┌──────────────┼──────────────┐
    │         │              │              │
    │    ┌────▼────┐   ┌─────▼─────┐  ┌─────▼─────┐
    │    │Signaling│   │  Redis    │  │PostgreSQL │
    │    │ Server  │   │ (PubSub)  │  │ (Rooms)   │
    │    └────┬────┘   └───────────┘  └───────────┘
    │         │
    │    ┌────▼─────────────────────────────┐
    │    │        SFU Cluster (K8s)         │
    │    │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐│
    └────┼─►│SFU-1│ │SFU-2│ │SFU-3│ │SFU-N││
         │  └─────┘ └─────┘ └─────┘ └─────┘│
         └─────────────────────────────────┘
                       │
         ┌─────────────┼─────────────┐
         │             │             │
    ┌────▼────┐  ┌─────▼────┐  ┌─────▼────┐
    │Recording│  │Transcoder│  │   TURN   │
    │ Service │  │ (HLS out)│  │ Servers  │
    └─────────┘  └──────────┘  └──────────┘

Learning milestones:

  1. 2-party calls work → Basic integration complete
  2. 10-party meetings work → SFU properly configured
  3. Features work (recording, screen share) → Subsystems integrated
  4. 1000 concurrent meetings → Scale achieved
  5. Mobile apps work → Cross-platform complete

Summary: All Projects and Languages

| # | Project | Main Language |
|---|---------|---------------|
| 1 | Media Stream Playground | JavaScript |
| 2 | Local Screen Recorder | JavaScript |
| 3 | Real-Time Video Filters | JavaScript |
| 4 | P2P Video Call | JavaScript |
| 5 | Data Channel File Transfer | JavaScript |
| 6 | Multi-Party Mesh Conference | JavaScript |
| 7 | STUN/TURN Server | Go |
| 8 | Signaling Server with Rooms | Go |
| 9 | WebRTC Stats Dashboard | JavaScript/TypeScript |
| 10 | Audio-Only Walkie-Talkie | JavaScript |
| 11 | Remote Desktop Viewer | JavaScript + Electron |
| 12 | SFU (Selective Forwarding Unit) | Go |
| 13 | WebRTC-SIP Gateway | Go |
| 14 | Live Streaming Platform | JavaScript + Go |
| 15 | P2P Multiplayer Game Engine | TypeScript |
| Capstone | Production Video Conferencing | TypeScript + Go |

Essential Resources

Books

  • “Real-Time Communication with WebRTC” by Salvatore Loreto & Simon Pietro Romano - The foundational WebRTC book
  • “WebRTC for the Curious” by Sean DuBois - Free online book by Pion creator
  • “High Performance Browser Networking” by Ilya Grigorik - Network fundamentals
  • “WebRTC: APIs and RTCWEB Protocols” by Alan B. Johnston - Deep protocol coverage

Specifications (for deep understanding)

  • RFC 8825 - WebRTC Overview
  • RFC 5389 - STUN Protocol
  • RFC 5766 - TURN Protocol
  • RFC 8445 - ICE Protocol
  • RFC 3550 - RTP Protocol

Libraries

  • Pion (Go) - Production WebRTC implementation
  • mediasoup (Node.js) - Popular SFU framework
  • Janus (C) - General-purpose WebRTC server
  • Jitsi (Java) - Open-source video conferencing

Tools

  • chrome://webrtc-internals - Built-in WebRTC debugger
  • Wireshark - Packet inspection (RTP/RTCP)
  • testRTC - WebRTC testing platform

After completing this learning journey, you will have built everything from basic media capture to production video conferencing infrastructure. You’ll understand WebRTC at every level—from the browser APIs to the network protocols to the server architectures that power modern real-time communication.