Learning VoIP Systems: From Protocols to Production

Goal: After completing these projects, you will understand from first principles how voice calls are transmitted over IP networks. You’ll know how SIP signaling orchestrates call setup and teardown, how RTP carries real-time audio with the sequencing and timing information receivers need (it provides no delivery guarantees of its own), how codecs balance bandwidth against quality, and how modern telephony platforms scale to millions of concurrent calls while interconnecting with the traditional phone network. You’ll be able to build production-grade VoIP systems that handle real phone calls and debug complex telephony issues by reading packet captures.


Why VoIP Matters

The Telecommunications Revolution

Voice over IP has fundamentally transformed how the world communicates. In 1995, a voice call meant circuit-switched copper wires and expensive long-distance charges. By 2025, over 3.1 billion video calls occur daily using WebRTC-based services, and the global VoIP market is valued at $161.79 billion, projected to reach $413.36 billion by 2032 with a CAGR of 12.7% (Coherent Market Insights, Nextiva VoIP Stats).

Historical Evolution

Traditional PSTN (1876-1990s)              VoIP Era (2000s-Present)
┌─────────────────────────┐                ┌─────────────────────────┐
│  Circuit-Switched       │                │   Packet-Switched       │
│  ┌────┐    ┌────┐      │                │  ┌────┐    ┌────┐      │
│  │ A  │────│ B  │      │                │  │ A  │····│ B  │      │
│  └────┘    └────┘      │                │  └────┘    └────┘      │
│  Dedicated copper wire  │                │  Shared IP network     │
│  $0.25/min long-dist.   │                │  $0.01/min or free     │
│  No data integration    │                │  Unified with apps     │
└─────────────────────────┘                └─────────────────────────┘

Why this transformation happened:

  1. Economics: IP networks were already deployed for data—running voice over them eliminated redundant infrastructure
  2. Integration: VoIP enables unified communications (voice + video + messaging + screen sharing)
  3. Scalability: Software-based systems scale horizontally vs expensive hardware PBXs
  4. Flexibility: Remote work, mobile clients, and browser-based calling become far easier to support

Real-World Applications

  • Contact Centers: Companies like Genesys, NICE, Amazon Connect use SIP to handle millions of customer calls
  • UCaaS Platforms: Zoom, Microsoft Teams, Webex power enterprise collaboration
  • CPaaS Services: Twilio, Vonage, Plivo provide programmable voice APIs for developers
  • Traditional Carriers: AT&T, Verizon have migrated to IP-based core networks
  • Emergency Services: Next-generation 911 systems use SIP for location and multimedia

Prerequisites & Background Knowledge

Essential Prerequisites (Must Have)

Before starting these projects, you should have:

  1. Networking Fundamentals
    • TCP/UDP socket programming
    • Understanding of IP addressing, routing, NAT
    • Familiarity with packet captures (Wireshark)
  2. Programming Skills
    • Proficiency in C or Go for low-level projects
    • Comfort with JavaScript/Python for higher-level projects
    • Experience with multithreading and asynchronous I/O
  3. Linux System Administration
    • Command-line proficiency
    • Process management, networking tools
    • Basic firewall configuration

Helpful But Not Required

You’ll learn these during the projects:

  • Audio processing and DSP
  • Real-time systems design
  • Codec theory and compression
  • Telecommunications regulations
  • High-availability architecture

Self-Assessment Questions

Can you answer these? If yes, you’re ready:

  • What’s the difference between TCP and UDP, and why would you choose one over the other?
  • How do you capture network packets on a Linux system?
  • What is NAT and why does it complicate peer-to-peer connections?
  • Can you write a basic TCP server that accepts connections and echoes back data?
  • Do you understand what a state machine is and how to implement one?

Don’t worry if you can’t answer these yet:

  • What is jitter and how do you compensate for it?
  • How does SIP differ from H.323?
  • What codecs are commonly used in VoIP and why?

Development Environment Setup

Required Tools:

  • Linux system (Ubuntu 22.04+ recommended) or macOS
  • Wireshark for packet analysis
  • Text editor with syntax highlighting
  • SIP testing tools: SIPp, SIPVicious
  • Audio playback/recording tools: SoX, FFmpeg

Recommended:

  • VirtualBox/Docker for isolated test environments
  • SIP softphones: Zoiper, Linphone, MicroSIP
  • SIP server for testing: Asterisk, FreeSWITCH, Kamailio
  • SIP trunk provider account: Voip.ms ($0.85/month DID + usage), Telnyx

Development Setup:

# Ubuntu/Debian
sudo apt install -y build-essential wireshark tshark sox ffmpeg

# Install Asterisk for testing
sudo apt install -y asterisk

# Install SIPp for load testing
sudo apt install -y sipp

Time Investment

  • Project 1 (Raw SIP Client): 3-4 weeks, 20-30 hours/week
  • Project 2 (Asterisk PBX): 2-3 weeks, 10-15 hours/week
  • Project 3 (Twilio App): 1-2 weeks, 10-15 hours/week
  • Project 4 (SIP Proxy): 4-5 weeks, 20-30 hours/week
  • Project 5 (WebRTC Gateway): 4-6 weeks, 25-35 hours/week
  • Total learning path: 4-6 months of consistent work

Important Reality Check

VoIP is a deep technical domain. You’re dealing with:

  • Multiple complex protocols (SIP, SDP, RTP, RTCP)
  • Real-time constraints (audio must arrive within ~150ms)
  • Network unpredictability (packet loss, jitter, NAT)
  • Legacy telephony integration (SS7, ISDN)

This is normal: Spending a week debugging why audio only works one-way is part of the learning process. Every VoIP engineer has been there.


Core Concept Analysis

VoIP sits at the intersection of networking, real-time systems, audio processing, and telephony infrastructure. To truly understand it, you need to grasp several layers.

1. SIP Signaling - The Call Control Plane

SIP (Session Initiation Protocol) is the “brain” that sets up, modifies, and tears down calls. It’s a text-based protocol similar to HTTP.

Basic SIP Call Flow:

Alice (UAC)                    SIP Proxy                    Bob (UAS)
    │                              │                              │
    │─────INVITE──────────────────>│                              │
    │  (I want to call Bob)        │                              │
    │                              │─────INVITE──────────────────>│
    │                              │                              │
    │<────100 Trying───────────────│<────100 Trying───────────────│
    │  (Looking for Bob...)        │  (Optional, hop-by-hop)      │
    │                              │                              │
    │                              │<────180 Ringing──────────────│
    │<────180 Ringing──────────────│  (Bob's phone is ringing)    │
    │                              │                              │
    │                              │<────200 OK───────────────────│
    │<────200 OK───────────────────│  (Bob answered!)             │
    │  (Call established)          │                              │
    │                              │                              │
    │─────ACK─────────────────────>│─────ACK─────────────────────>│
    │  (Got it, thanks)            │                              │
    │                              │                              │
    │<═══════════ RTP Audio (direct) ════════════════════════════>│
    │                              │                              │
    │─────BYE─────────────────────>│─────BYE─────────────────────>│
    │  (Hanging up)                │                              │
    │<────200 OK───────────────────│<────200 OK───────────────────│

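Key Insight: SIP only handles signaling. It tells endpoints where to send audio, and in this basic flow the audio travels directly between them via RTP rather than through the proxy. (Many production deployments instead anchor media on a B2BUA, SBC, or media relay.)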
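The status codes in the flow above fall into classes by their first digit, and a client has to treat them differently: 1xx responses are provisional, everything from 2xx up ends the transaction. A minimal sketch of that classification follows; the type and function names are illustrative, not taken from any SIP library.

```c
#include <stdbool.h>

/* Response classes by first digit of the status code. */
typedef enum {
    SIP_INVALID = 0,
    SIP_PROVISIONAL,    /* 1xx: 100 Trying, 180 Ringing */
    SIP_SUCCESS,        /* 2xx: 200 OK */
    SIP_REDIRECT,       /* 3xx */
    SIP_CLIENT_ERROR,   /* 4xx: 404 Not Found, 486 Busy Here */
    SIP_SERVER_ERROR,   /* 5xx */
    SIP_GLOBAL_FAILURE  /* 6xx */
} sip_class_t;

/* Map a status code to its class; codes outside 100-699 are invalid. */
sip_class_t sip_response_class(int status_code) {
    if (status_code < 100 || status_code > 699) return SIP_INVALID;
    return (sip_class_t)(status_code / 100);
}

/* A final response (2xx-6xx) completes the transaction; a 1xx does not. */
bool sip_is_final(int status_code) {
    return sip_response_class(status_code) >= SIP_SUCCESS;
}
```

In the call flow, `sip_is_final(180)` is false, so the caller keeps waiting, while `sip_is_final(200)` is true and triggers the ACK.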

2. RTP/RTCP - The Media Transport Plane

Once SIP establishes the call, Real-time Transport Protocol (RTP) carries the actual audio packets.

RTP Packet Structure:

┌─────────────────────────────────────────────────────────┐
│ RTP Header (12 bytes)                                   │
├──────────────┬────────┬────────────┬────────────────────┤
│ V=2,P,X,CC   │ M, PT  │ Seq Number │ Timestamp          │
│ (2+1+1+4bit) │(1+7bit)│ (16bit)    │ (32bit)            │
├──────────────┴────────┴────────────┴────────────────────┤
│ SSRC (Synchronization Source - 32bit)                   │
├─────────────────────────────────────────────────────────┤
│ Payload (Audio Data - Variable length)                  │
│ • G.711: 160 bytes (20ms of audio)                      │
│ • Opus: 40-120 bytes (adaptive)                         │
└─────────────────────────────────────────────────────────┘

Real-Time Constraints:

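  • Packets are typically sent every 20ms (50 packets/second)
  • One-way (mouth-to-ear) delay should stay under ~150ms for good quality
  • Jitter (variance in arrival time) must be absorbed by a receive-side buffer
  • Lost packets are not retransmitted (unlike TCP); the receiver conceals the gap instead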
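To make "jitter" concrete, here is a sketch of the running interarrival jitter estimate described in RFC 3550 (section 6.4.1), using a floating-point accumulator for readability. The struct and function names are hypothetical, and the caller is assumed to have already converted the local arrival time into RTP clock units (8000 Hz for G.711).

```c
#include <stdint.h>

typedef struct {
    int     have_prev;     /* 0 until the first packet has been seen */
    int32_t prev_transit;  /* arrival minus RTP timestamp of the previous packet */
    double  jitter;        /* running estimate, in RTP timestamp units */
} jitter_est_t;

/* Feed one received packet into the estimator.
   arrival_ts: local arrival time in RTP clock units.
   rtp_ts:     timestamp field from the packet header. */
void jitter_update(jitter_est_t *j, uint32_t arrival_ts, uint32_t rtp_ts) {
    int32_t transit = (int32_t)(arrival_ts - rtp_ts);
    if (j->have_prev) {
        int32_t d = transit - j->prev_transit;
        if (d < 0) d = -d;
        /* Exponential smoothing with gain 1/16, as in RFC 3550 */
        j->jitter += ((double)d - j->jitter) / 16.0;
    }
    j->prev_transit = transit;
    j->have_prev = 1;
}

/* Convert the estimate to milliseconds for a given RTP clock rate. */
double jitter_ms(const jitter_est_t *j, unsigned clock_hz) {
    return j->jitter * 1000.0 / clock_hz;
}
```

Perfectly paced packets leave the estimate at zero; a single packet arriving 16 timestamp units (2ms at 8 kHz) late nudges it up by one unit, which is why short bursts of delay do not immediately inflate the reported jitter.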

RTCP (Companion Protocol):

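  • Monitors call quality (packet loss %, jitter, round-trip time)
  • Sent periodically, roughly every 5 seconds for a two-party call (the interval is randomized and scaled so RTCP stays near 5% of session bandwidth)
  • Can be used by endpoints to adapt, for example by lowering the codec bitrate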
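The packet-loss figure in an RTCP receiver report is an 8-bit fixed-point fraction rather than a percentage. The sketch below follows the calculation in RFC 3550 Appendix A.3, with one deliberate deviation: the result is clamped to 255 so that total loss does not wrap to zero. Function names are illustrative.

```c
#include <stdint.h>

/* Fraction lost for one reporting interval: lost/expected scaled to 0..255.
   expected: packets the sequence numbers say should have arrived.
   received: packets that actually arrived (duplicates can exceed expected). */
uint8_t rtcp_fraction_lost(uint32_t expected, uint32_t received) {
    if (expected == 0 || received >= expected) return 0;
    uint32_t lost = expected - received;
    uint32_t fraction = (lost << 8) / expected;  /* interval counts are small, no overflow */
    return fraction > 255 ? 255 : (uint8_t)fraction;
}

/* Convert the 8-bit fraction back to a percentage for display. */
double fraction_lost_percent(uint8_t fraction) {
    return fraction * 100.0 / 256.0;
}
```

For example, 75 packets received out of 100 expected gives a fraction of 64, which converts back to 25%.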

3. Codecs - Balancing Quality vs Bandwidth

Codecs compress audio to fit over networks. The choice depends on bandwidth availability and quality requirements.

Common Codec Comparison:

┌───────────┬───────────┬──────────────┬──────────────┬────────────┐
│ Codec     │ Bitrate   │ Bandwidth    │ Quality      │ Complexity │
├───────────┼───────────┼──────────────┼──────────────┼────────────┤
│ G.711     │ 64 kbps   │ ~80 kbps     │ Toll quality │ Very low   │
│ (μ-law)   │           │ (with RTP)   │ (reference)  │ (none)     │
├───────────┼───────────┼──────────────┼──────────────┼────────────┤
│ G.729     │ 8 kbps    │ ~24 kbps     │ Good         │ High       │
│           │           │              │ (compressed) │ (free now) │
├───────────┼───────────┼──────────────┼──────────────┼────────────┤
│ Opus      │ 6-510kbps │ Adaptive     │ Excellent    │ Medium     │
│ (WebRTC)  │ (dynamic) │              │ (best)       │ (free)     │
├───────────┼───────────┼──────────────┼──────────────┼────────────┤
│ G.722     │ 64 kbps   │ ~80 kbps     │ Wideband HD  │ Low        │
│           │           │              │ (50-7000 Hz) │            │
└───────────┴───────────┴──────────────┴──────────────┴────────────┘
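The "Bandwidth" column is the codec bitrate plus per-packet header overhead. A small sketch of that arithmetic, assuming IPv4 with no header extensions and ignoring link-layer framing (Ethernet adds more); the function name is hypothetical.

```c
#include <stdint.h>

enum { RTP_HDR = 12, UDP_HDR = 8, IPV4_HDR = 20 };  /* header sizes in bytes */

/* IP-layer bandwidth in bits per second for one direction of a call.
   payload_bytes: codec bytes carried in each packet.
   ptime_ms:      milliseconds of audio per packet. */
uint32_t voip_ip_bandwidth_bps(uint32_t payload_bytes, uint32_t ptime_ms) {
    if (ptime_ms == 0) return 0;
    uint32_t packet_bytes = payload_bytes + RTP_HDR + UDP_HDR + IPV4_HDR;
    return packet_bytes * 8u * 1000u / ptime_ms;
}
```

With 20ms packets, G.711 carries 160 payload bytes and comes out at 80,000 bps; G.729 carries 20 payload bytes and comes out at 24,000 bps, so the 40 bytes of headers are twice the size of the audio itself.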

Codec Negotiation via SDP:

v=0
o=alice 2890844526 2890844526 IN IP4 192.168.1.100
s=Call
c=IN IP4 192.168.1.100
t=0 0
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000      ← G.711 μ-law (priority 1)
a=rtpmap:8 PCMA/8000      ← G.711 A-law (priority 2)
a=rtpmap:97 opus/48000/2  ← Opus (priority 3)
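The order of payload types on the `m=` line is the offerer's preference order. The following sketch parses an audio `m=` line and picks the first offered payload type that the local side also supports. Both function names are hypothetical, and a real answerer would also check the `a=rtpmap` lines for dynamic payload types such as 97.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse "m=audio <port> RTP/AVP <pt> <pt> ..." into a port and payload types.
   Returns the number of payload types stored, or -1 if the line does not match. */
int sdp_parse_audio_mline(const char *line, int *port, int *pts, int max_pts) {
    int consumed = 0;
    /* %n records how far the match got; it stays 0 if "RTP/AVP" is missing */
    if (sscanf(line, "m=audio %d RTP/AVP%n", port, &consumed) != 1 || consumed == 0)
        return -1;
    const char *p = line + consumed;
    int n = 0;
    while (n < max_pts) {
        char *end;
        long pt = strtol(p, &end, 10);
        if (end == p) break;                /* no more numbers on the line */
        if (pt < 0 || pt > 127) return -1;  /* RTP payload type is 7 bits */
        pts[n++] = (int)pt;
        p = end;
    }
    return n;
}

/* Return the first offered payload type that we also support, or -1. */
int sdp_pick_payload(const int *offered, size_t n_offered,
                     const int *supported, size_t n_supported) {
    for (size_t i = 0; i < n_offered; i++)
        for (size_t k = 0; k < n_supported; k++)
            if (offered[i] == supported[k]) return offered[i];
    return -1;
}
```

Given the offer above and a local side that supports payload types 8 and 0, the result is 0 (PCMU), because the offerer listed it first.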

4. NAT Traversal - The Real-World Challenge

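Most VoIP endpoints are behind NATs/firewalls. The SDP advertises the endpoint’s private IP address and a dynamically chosen UDP port for RTP, neither of which is reachable from outside the NAT, so media cannot flow directly without help.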

NAT Problem Visualization:

Home Network (NAT)                          Internet
┌───────────────────────┐
│ Alice: 192.168.1.100  │
│ RTP Port: 50000       │
└──────────┬────────────┘
           │
      ┌────▼─────┐
      │ NAT/FW   │
      │ Public:  │
      │ 8.8.8.8  │
      └────┬─────┘
           │
           │ Outside: Can I send to
           │ 192.168.1.100:50000? NO!
           │
           └──────────────────────────────────────> Bob
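A client can at least detect that it is about to advertise an unreachable address. The sketch below checks whether a dotted-quad IPv4 address falls in one of the RFC 1918 private ranges; it assumes a POSIX system (`inet_pton`), and the function name is illustrative.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* True if addr is a valid IPv4 address inside an RFC 1918 private range.
   Such an address in an SDP c= line will not be reachable across the Internet. */
bool is_rfc1918(const char *addr) {
    struct in_addr in;
    if (inet_pton(AF_INET, addr, &in) != 1) return false;
    uint32_t ip = ntohl(in.s_addr);
    return (ip >> 24) == 10u                     /* 10.0.0.0/8     */
        || (ip >> 20) == ((172u << 4) | 1u)      /* 172.16.0.0/12  */
        || (ip >> 16) == ((192u << 8) | 168u);   /* 192.168.0.0/16 */
}
```

For Alice's address in the diagram, `is_rfc1918("192.168.1.100")` is true, which is the cue to fall back on one of the techniques listed next.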

Solutions:

  1. STUN: Server tells you your public IP/port
  2. TURN: Relay server forwards packets when direct connection fails
  3. ICE: Tries multiple methods (direct, STUN, TURN) and picks best

ICE Negotiation Process:

Alice                    STUN Server              Bob
  │                           │                    │
  │───Get reflexive address───>│                   │
  │<──You are 8.8.8.8:50001────│                   │
  │                           │                    │
  │─────Offer (candidates)────────────────────────>│
  │  • Host: 192.168.1.100:50000                   │
  │  • Srflx: 8.8.8.8:50001 (via STUN)            │
  │  • Relay: turn.server.com:60000 (via TURN)    │
  │                           │                    │
  │<────Answer (candidates)────────────────────────│
  │  • Host: 10.0.0.50:40000                       │
  │  • Srflx: 9.9.9.9:40001                        │
  │                           │                    │
  │<══Connectivity checks (find best path)═══════>│
  │  ✓ Direct connection works!                    │
  │<════════════ RTP Audio ════════════════════════>│
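ICE ranks candidates with a numeric priority, which is why a working host or server-reflexive path is tried before a relay. The sketch below uses the formula and recommended type preferences from RFC 8445 (section 5.1.2); the enum and function names are hypothetical, and the local preference of 65535 in the example assumes a single network interface.

```c
#include <stdint.h>

/* Recommended type preferences from RFC 8445 */
typedef enum {
    CAND_HOST  = 126,  /* local interface address */
    CAND_PRFLX = 110,  /* peer reflexive, learned during checks */
    CAND_SRFLX = 100,  /* server reflexive, learned via STUN */
    CAND_RELAY = 0     /* relayed via TURN */
} cand_type_t;

/* priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
   component_id is 1 for RTP and 2 for RTCP. */
uint32_t ice_priority(cand_type_t type, uint16_t local_pref, uint8_t component_id) {
    return ((uint32_t)type << 24)
         + ((uint32_t)local_pref << 8)
         + (256u - component_id);
}
```

For an RTP component this gives 2130706431 for a host candidate, 1694498815 for a server-reflexive one, and 16777215 for a relay, values that show up verbatim in many `a=candidate` lines.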

5. PBX Architecture - Enterprise Call Routing

A PBX (Private Branch Exchange) routes calls within an organization and to/from the PSTN.

Traditional vs IP-PBX:

Traditional PBX (1960s-1990s)        IP-PBX (Modern)
┌─────────────────────────┐          ┌─────────────────────────┐
│  Proprietary Hardware   │          │  Software (Asterisk)    │
│  ┌──────────────────┐   │          │  ┌──────────────────┐   │
│  │ Analog line cards│   │          │  │ SIP trunks       │   │
│  │ Copper phones    │   │          │  │ IP phones        │   │
│  │ Manual wiring    │   │          │  │ Softphones       │   │
│  └──────────────────┘   │          │  │ WebRTC clients   │   │
│  Cost: $50,000+         │          │  └──────────────────┘   │
│  Vendor lock-in         │          │  Cost: Free (software)  │
└─────────────────────────┘          │  Open standards         │
                                     └─────────────────────────┘

Dialplan Example (Asterisk):

exten => 1001,1,Dial(PJSIP/alice,30)    ; Extension 1001: ring Alice for 30sec
    same => n,VoiceMail(1001@main)      ; If no answer, voicemail
    same => n,Hangup()

exten => _9NXXNXXXXXX,1,Dial(PJSIP/1${EXTEN:1}@twilio) ; 9 + 10 digits: strip the 9, route via SIP trunk
    same => n,Hangup()

6. PSTN Interconnection - Bridging Old and New

Connecting VoIP to traditional phone numbers requires SIP trunking.

SIP Trunk Architecture:

Your VoIP System                  SIP Trunk Provider              PSTN
┌────────────────┐                ┌────────────────┐          ┌──────────┐
│  Asterisk PBX  │                │  Twilio/Telnyx │          │ AT&T     │
│  192.168.1.10  │                │  ┌──────────┐  │          │ ┌──────┐ │
│  ┌──────────┐  │   SIP/RTP      │  │ SBC      │  │  SS7/SIP │ │PSTN  │ │
│  │Extensions│  │───────────────>│  │ Gateway  │──┼─────────>│ │Switch│ │
│  │1001-1099 │  │                │  └──────────┘  │          │ └──────┘ │
│  └──────────┘  │                │  DID Pool:     │          │ Real     │
│                │                │  +1-555-0100   │          │ Phone    │
└────────────────┘                └────────────────┘          └──────────┘

What you pay for:

  • DID (Direct Inward Dialing): $0.85-3/month per number
  • Inbound calls: $0.0050-0.01/minute
  • Outbound calls: $0.01-0.02/minute (US)
  • SMS: $0.0075/message (optional)

Concept Summary Table

| Concept | What You Must Internalize | Which Projects Teach This |
|---------|---------------------------|---------------------------|
| SIP Protocol | Request/response model, transaction state machines, dialog management | Project 1, 4 |
| SDP Negotiation | Offer/answer model, codec selection, media capabilities | Project 1, 5 |
| RTP Streaming | Packet structure, sequencing, timing, payload formats | Project 1, 5 |
| NAT Traversal | STUN/TURN/ICE, symmetric vs full-cone NAT, port prediction | Project 5 |
| Codec Theory | Sampling rates, compression algorithms, quality vs bandwidth | Project 1, 2, 5 |
| PBX Logic | Extension routing, IVR state machines, call queues, voicemail | Project 2 |
| SIP Trunking | DIDs, E.164 numbering, carrier interconnection, billing | Project 2, 6 |
| Quality of Service | Jitter buffers, packet loss concealment, RTCP feedback, echo cancellation | Project 1, 2 |
| Signaling Servers | Proxy vs B2BUA, stateful vs stateless, registrar, location service | Project 4 |
| WebRTC | DTLS-SRTP, ICE, data channels, browser APIs | Project 5 |

Deep Dive Reading by Concept

For SIP Protocol Understanding

| Book | Chapters | Why These Chapters |
|------|----------|--------------------|
| “SIP: Understanding the Session Initiation Protocol” (Alan B. Johnston) | Ch 1-4, 7-9 | Covers SIP architecture, message formats, transactions, and dialogs in depth |
| “SIP Demystified” (Gonzalo Camarillo) | Ch 3-5 | Excellent for understanding proxy behavior and routing |

For RTP and Media Transport

| Book | Chapters | Why These Chapters |
|------|----------|--------------------|
| “VoIP Voice and Fax Over IP” (Yaacov Levin) | Ch 4-6 | Deep dive into RTP, RTCP, and codec implementation |
| “High Performance Browser Networking” (Ilya Grigorik) | Ch 18 | WebRTC and real-time media transport |

For PBX and Asterisk

| Book | Chapters | Why These Chapters |
|------|----------|--------------------|
| “Asterisk: The Definitive Guide” (5th Ed) | Ch 6-7, 16, 22 | Dialplan programming, SIP trunking, IVR, clustering |
| “Operating FreeSWITCH” (Anthony Minessale) | Ch 3-5 | Alternative view of PBX architecture |

For WebRTC

| Book | Chapters | Why These Chapters |
|------|----------|--------------------|
| “WebRTC: APIs and RTCWEB Protocols” (Alan B. Johnston) | Ch 1-5, 8 | Complete WebRTC stack, ICE, DTLS-SRTP |
| “Real-Time Communication with WebRTC” (Salvatore Loreto) | Ch 2-4 | Practical WebRTC implementation |

For Production Deployment

| Book | Chapters | Why These Chapters |
|------|----------|--------------------|
| “Hacking Exposed VoIP” (David Endler) | Ch 5-7 | Security, fraud prevention, toll fraud |
| “VoIP and Unified Communications” (William A. Flanagan) | Ch 8-12 | Enterprise deployment, billing, QoS |


Quick Start: First 48 Hours

Overwhelmed? Start here:

Hour 0-4: Set Up Your Lab

# Install Asterisk on Ubuntu
sudo apt update && sudo apt install -y asterisk

# Start Asterisk
sudo systemctl start asterisk
sudo asterisk -rvvv  # Connect to console

Hour 4-8: Configure Two SIP Extensions

Edit /etc/asterisk/pjsip.conf:

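; A transport is required, otherwise PJSIP does not listen for SIP traffic
[transport-udp]
type=transport
protocol=udp
bind=0.0.0.0

[1001]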
type=endpoint
context=internal
disallow=all
allow=ulaw
auth=1001
aors=1001

[1001]
type=auth
auth_type=userpass
username=1001
password=secret1001

[1001]
type=aor
max_contacts=1

Repeat for extension 1002.

Hour 8-12: Write Your First Dialplan

Edit /etc/asterisk/extensions.conf:

[internal]
exten => 1001,1,Dial(PJSIP/1001,30)
exten => 1002,1,Dial(PJSIP/1002,30)

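Reload the dialplan: sudo asterisk -rx "dialplan reload". After editing pjsip.conf, restart Asterisk (sudo systemctl restart asterisk) so the new endpoints and any transport changes are loaded.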

Hour 12-24: Make Your First Call

  1. Download Zoiper softphone
  2. Register as extension 1001 (server: your IP, username: 1001, password: secret1001)
  3. Register second instance as 1002
  4. Call 1001 from 1002—you’ll hear audio!

Hour 24-48: Capture Packets and Understand

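sudo tcpdump -i any -n port 5060 -A  # Watch SIP signaling
sudo tcpdump -i any -n udp portrange 10000-20000  # Watch RTP audio
sudo tcpdump -i any -n -w voip.pcap 'port 5060 or udp portrange 10000-20000'  # Save both to a file

Open the saved capture (voip.pcap) in Wireshark: Telephony → VoIP Calls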

Milestone: When you see the INVITE→180→200→ACK sequence in Wireshark and understand what each message does, you’re ready for the projects.


Path 1: “I Want to Build VoIP Products” (Practical First)

  1. Week 1-2: Project 2 (Asterisk PBX) - Get hands-on quickly
  2. Week 3-4: Project 3 (Twilio App) - See cloud telephony
  3. Week 5-8: Project 1 (Raw SIP Client) - Understand protocols deeply
  4. Week 9-14: Project 5 (WebRTC Gateway) - Modern architecture

Why: You’ll build useful systems immediately while progressively deepening your understanding.

Path 2: “I Want Deep Protocol Understanding” (Theory First)

  1. Week 1-4: Read RFC 3261 (SIP) alongside Project 1
  2. Week 5-8: Project 4 (SIP Proxy) - Server-side perspective
  3. Week 9-11: Project 2 (Asterisk) - See how production systems work
  4. Week 12-18: Project 5 (WebRTC Gateway) - Modern protocols

Why: You’ll have complete protocol mastery before tackling higher-level systems.

Path 3: “I’m a Web Developer” (Browser First)

  1. Week 1-2: Project 3 (Twilio) - Familiar APIs
  2. Week 3-6: Project 5 (WebRTC Gateway) - Browser-native calling
  3. Week 7-10: Project 1 (Raw SIP Client) - Understand underlying protocols
  4. Week 11-13: Project 2 (Asterisk) - Backend infrastructure

Why: Leverages your existing web skills while gradually introducing VoIP concepts.

Path 4: “I Want to Launch a VoIP Business” (Full Stack)

  1. Week 1-3: Project 2 (Asterisk) + Get real DID from Voip.ms
  2. Week 4-6: Project 3 (Twilio) - Learn CPaaS business model
  3. Week 7-10: Project 1 (SIP Client) - Differentiate with custom features
  4. Week 11-16: Project 5 (WebRTC) - Browser-based offering
  5. Week 17-30: Final Project (Production Service) - Build your platform

Why: Covers the full commercial stack from infrastructure to customer-facing features.


Project 1: Raw SIP Client from Scratch

  • File: VOIP_TELEPHONY_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Telephony / VoIP
  • Software or Tool: SIP Protocol
  • Main Book: “SIP: Understanding the Session Initiation Protocol” by Alan B. Johnston

What you’ll build: A minimal SIP softphone that can register with a SIP server, make calls, and send/receive audio, using only the BSD sockets API (plain UDP sockets, not SOCK_RAW) and no VoIP libraries.

Why it teaches VoIP: You cannot fake understanding SIP. By parsing SIP messages yourself and implementing the state machine for call setup (INVITE → 100 Trying → 180 Ringing → 200 OK → ACK), you’ll internalize how VoIP signaling actually works.

Core challenges you’ll face:

  • Parsing SIP messages and SDP (maps to understanding signaling)
  • Implementing SIP transaction state machines (maps to call flow understanding)
  • Negotiating codecs via SDP offer/answer (maps to media negotiation)
  • Opening RTP sockets and streaming audio (maps to media transport)
  • Handling NAT traversal issues (maps to real-world deployment challenges)

Resources for key challenges:

  • “SIP: Understanding the Session Initiation Protocol” by Alan B. Johnston - The definitive SIP reference
  • RFC 3261 (SIP) and RFC 3550 (RTP) - Read alongside your implementation

Key Concepts:

  • SIP Message Parsing: RFC 3261 Section 7 - Understanding request/response structure
  • SDP Offer/Answer Model: RFC 3264 - How media capabilities are negotiated
  • RTP Packet Format: RFC 3550 Section 5 - Real-time transport protocol structure
  • NAT Traversal: “VoIP: Voice and Fax Over IP” by Yaacov Levin, Ch. 8 - STUN/TURN/ICE concepts

  • Difficulty: Advanced
  • Time estimate: 1 month+
  • Prerequisites: Socket programming, basic networking (TCP/UDP), some audio basics

Real world outcome:

  • Your softphone will be able to dial another SIP phone (like Zoiper or Linphone) and have an actual voice conversation
  • You’ll see your SIP messages in Wireshark and understand every field
  • You can call your own phone number through a SIP trunk provider

Learning milestones:

  1. Successfully register with a SIP server (Asterisk) and see “200 OK” - you understand SIP authentication
  2. Complete a call setup handshake and hear audio - you understand the full signaling flow
  3. Handle call transfers, hold, and DTMF - you understand advanced SIP features

Real World Outcome

When completed, you’ll have a working softphone that can make and receive calls like any commercial VoIP app. Here’s exactly what you’ll see:

Terminal output when registering:

$ ./sip_client --register sip:alice@192.168.1.10 --password secret123

[2025-01-15 10:23:45] SIP Client v1.0 starting...
[2025-01-15 10:23:45] Local IP: 192.168.1.100
[2025-01-15 10:23:45] Binding RTP socket to port 50000
[2025-01-15 10:23:45] Binding SIP socket to port 5060

[TX] Sending REGISTER to 192.168.1.10:5060:
REGISTER sip:192.168.1.10 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK776asdhds
Max-Forwards: 70
From: <sip:alice@192.168.1.10>;tag=1928301774
To: <sip:alice@192.168.1.10>
Call-ID: a84b4c76e66710
CSeq: 1 REGISTER
Contact: <sip:alice@192.168.1.100:5060>
Expires: 3600
Content-Length: 0

[RX] Received from 192.168.1.10:5060:
SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK776asdhds
From: <sip:alice@192.168.1.10>;tag=1928301774
To: <sip:alice@192.168.1.10>;tag=as7d0a3e4f
Call-ID: a84b4c76e66710
CSeq: 1 REGISTER
WWW-Authenticate: Digest realm="asterisk", nonce="1a2b3c4d5e6f7g8h9i0j"
Content-Length: 0

[INFO] Authentication required. Computing digest response...
[TX] Sending authenticated REGISTER (new branch, CSeq: 2)...

[RX] Received from 192.168.1.10:5060:
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bK887bteiet
From: <sip:alice@192.168.1.10>;tag=1928301774
To: <sip:alice@192.168.1.10>;tag=as7d0a3e4f
Call-ID: a84b4c76e66710
CSeq: 2 REGISTER
Contact: <sip:alice@192.168.1.100:5060>;expires=3600
Expires: 3600
Content-Length: 0

✓ Registration successful! Expires in 3600 seconds.
✓ Ready to make/receive calls.
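The REGISTER in the log above is just formatted text with CRLF line endings and an empty line after the headers. A minimal sketch of building it with `snprintf` follows; the function name and parameter list are hypothetical, every value is supplied by the caller, and `Max-Forwards` is included because RFC 3261 requires it in requests.

```c
#include <stdio.h>

/* Format a REGISTER request into buf.
   Returns the number of bytes written, or -1 if buf is too small. */
int build_register(char *buf, size_t buflen,
                   const char *user, const char *server,
                   const char *local_ip, int local_port,
                   const char *branch, const char *tag,
                   const char *call_id, unsigned cseq, unsigned expires) {
    int n = snprintf(buf, buflen,
        "REGISTER sip:%s SIP/2.0\r\n"
        "Via: SIP/2.0/UDP %s:%d;branch=%s\r\n"
        "Max-Forwards: 70\r\n"
        "From: <sip:%s@%s>;tag=%s\r\n"
        "To: <sip:%s@%s>\r\n"
        "Call-ID: %s\r\n"
        "CSeq: %u REGISTER\r\n"
        "Contact: <sip:%s@%s:%d>\r\n"
        "Expires: %u\r\n"
        "Content-Length: 0\r\n"
        "\r\n",                      /* empty line ends the headers */
        server,
        local_ip, local_port, branch,
        user, server, tag,
        user, server,
        call_id,
        cseq,
        user, local_ip, local_port,
        expires);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

When the server answers 401, the retry reuses the same Call-ID and From tag but needs a fresh branch and the next CSeq, so those are parameters rather than values generated inside the function.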

Terminal output when making a call:

$ ./sip_client --call sip:bob@192.168.1.10

[TX] Sending INVITE to bob@192.168.1.10:
INVITE sip:bob@192.168.1.10 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:5060;branch=z9hG4bKnashds8
Max-Forwards: 70
From: "Alice" <sip:alice@192.168.1.10>;tag=9fxced76sl
To: <sip:bob@192.168.1.10>
Call-ID: 3848276298220188511@192.168.1.100
CSeq: 314159 INVITE
Contact: <sip:alice@192.168.1.100:5060>
Content-Type: application/sdp
Content-Length: 167

v=0
o=alice 2890844526 2890844526 IN IP4 192.168.1.100
s=Call
c=IN IP4 192.168.1.100
t=0 0
m=audio 50000 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000

[RX] SIP/2.0 100 Trying
[RX] SIP/2.0 180 Ringing
♪ Ringback tone playing...

[RX] SIP/2.0 200 OK
Content-Type: application/sdp

v=0
o=bob 2890844527 2890844527 IN IP4 192.168.1.101
s=Call
c=IN IP4 192.168.1.101
t=0 0
m=audio 40000 RTP/AVP 0
a=rtpmap:0 PCMU/8000

[INFO] Call answered! Codec: G.711 μ-law
[INFO] RTP: Sending to 192.168.1.101:40000
[INFO] RTP: Receiving from 192.168.1.101:40000
🎤 Audio streaming... (Ctrl+C to hang up)

[STATS] Packets sent: 1,247 | Received: 1,243 | Loss: 0.32%
[STATS] Jitter: 2.3ms | Round-trip: 45ms

Wireshark capture showing your packets:

  • You’ll see your SIP INVITE going out
  • You’ll see RTP packets flowing every 20ms
  • You can play back the actual audio from the capture
  • You can inspect every field in your SIP headers

Audio quality indicators:

  • Crystal clear audio (if local network)
  • < 50ms latency
  • < 1% packet loss in normal conditions

The Core Question You’re Answering

“How does a voice call work at the protocol level—from dialing a number to hearing the other person’s voice?”

By the end, you’ll be able to explain:

  • What happens in the 500ms between clicking “Call” and hearing ringback
  • Why audio sometimes works one direction but not the other (NAT/firewall issues)
  • How phones negotiate which codec to use
  • Why SIP signaling can run over UDP, TCP, or TLS while the audio itself almost always travels over UDP

Concepts You Must Understand First

Before starting, ensure you grasp these concepts:

| Concept | What You Need to Know | Where to Learn It |
|---------|-----------------------|-------------------|
| UDP Socket Programming | Creating UDP sockets, sendto/recvfrom, non-blocking I/O | “Unix Network Programming” (Stevens) Ch. 8 |
| State Machines | Designing states and transitions, handling events | “Design Patterns” (Gang of Four) - State pattern |
| SIP Basics | Request/response model, key headers (Via, From, To, CSeq) | “SIP: Understanding the Session Initiation Protocol” (Johnston) Ch. 1-2 |
| SDP Format | Media descriptions, codec negotiation | RFC 4566 + “SIP: Understanding the Session Initiation Protocol” (Johnston) Ch. 4 |
| MD5 Digest Auth | How HTTP Digest authentication works | RFC 2617 |
| Audio Basics | PCM encoding, sampling rate, bit depth | “Digital Signal Processing” (Oppenheim) Ch. 4 |

Questions to Guide Your Design

Answer these before writing code:

  1. Registration State Machine: What states can your client be in? (Unregistered → Registering → Registered → Refreshing → Unregistering)

  2. Call State Machine: How do you track call progress? (Idle → Calling → Ringing → Incall → Hangup)

  3. SIP Parser: Will you use a library or parse manually? What happens with malformed messages?

  4. Threading Model: Separate threads for SIP handling, RTP sending, RTP receiving, and UI?

  5. NAT Handling: How will you discover your public IP? What if RTP doesn’t work?

  6. Audio Chain: Microphone → Codec → RTP vs RTP → Codec → Speaker—what’s the flow?

  7. Error Handling: What if you get “486 Busy Here” or “404 Not Found”?
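As a starting point for questions 2 and 7, the call states can be written as a transition function that takes the current state and an event and returns the next state. This is a caller-side sketch only; the state and event names are hypothetical, and a full client would add cancellation, timeouts, and incoming calls.

```c
typedef enum { CALL_IDLE, CALL_CALLING, CALL_RINGING, CALL_IN_CALL, CALL_HANGUP } call_state_t;

typedef enum {
    EV_DIAL,      /* user placed a call, INVITE sent */
    EV_RINGING,   /* 180 Ringing received */
    EV_ANSWERED,  /* 200 OK to the INVITE received */
    EV_FAILURE,   /* 4xx/5xx/6xx received, e.g. 486 Busy Here or 404 Not Found */
    EV_BYE        /* BYE sent or received */
} call_event_t;

/* Return the next state; events that are not valid in a state are ignored. */
call_state_t call_next(call_state_t s, call_event_t ev) {
    switch (s) {
    case CALL_IDLE:
        if (ev == EV_DIAL) return CALL_CALLING;
        break;
    case CALL_CALLING:
    case CALL_RINGING:
        if (ev == EV_RINGING)  return CALL_RINGING;
        if (ev == EV_ANSWERED) return CALL_IN_CALL;  /* caller must now send ACK */
        if (ev == EV_FAILURE)  return CALL_HANGUP;
        break;
    case CALL_IN_CALL:
        if (ev == EV_BYE) return CALL_HANGUP;
        break;
    case CALL_HANGUP:
        break;
    }
    return s;
}
```

Keeping the transitions in one pure function makes it easy to unit-test the call logic without any sockets, and the side effects (sending ACK, starting RTP) can hang off the state changes.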

Thinking Exercise (Do This Before Coding!)

Exercise: Trace a complete call flow on paper.

Draw boxes for: Alice’s phone, SIP server, Bob’s phone. Draw arrows for every SIP message from INVITE to BYE. Include:

  • Via headers (how responses route back)
  • Tags (how dialogs are identified)
  • Branch IDs (transaction matching)
  • SDP offer/answer (codec negotiation)
  • RTP flow (direct between phones)

Expected outcome: You should have ~12-15 arrows. If you have fewer, you’re missing messages.

The Interview Questions They’ll Ask

Senior VoIP engineers will test if you truly understand:

  1. “Why does SIP use both Call-ID and tags?”
    • Answer: Call-ID identifies the call globally. Tags distinguish endpoints in a dialog. Together they form a unique dialog identifier.
  2. “What’s the difference between a SIP proxy and a B2BUA?”
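    • Answer: A proxy forwards messages without terminating the dialog; it stays in the signaling path for later requests only if it adds a Record-Route header. A B2BUA (Back-to-Back User Agent) terminates the dialog on each side and bridges them, so it can modify SDP, enforce policy, etc.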
  3. “You can hear the other person but they can’t hear you. What’s wrong?”
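    • Answer: One-way audio, usually a NAT or firewall issue. Their RTP is reaching you, but your RTP is not reaching them: typically they advertised a private or unreachable address in their SDP, or a NAT/firewall on their side is blocking inbound UDP to the negotiated port.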
  4. “How does a SIP phone know which REGISTER to refresh?”
    • Answer: Tracks the Expires value from 200 OK response. Sets timer for ~90% of expiry time to re-REGISTER.
  5. “What happens if you lose 10 consecutive RTP packets?”
    • Answer: With 20ms packets, a 200ms gap in audio. The jitter buffer empties, the decoder plays comfort noise or repeats the last frame, and RTCP reports severe loss.
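  6. “Why is RTP normally sent over UDP rather than TCP?”
    • Answer: TCP retransmits lost packets and delivers data in order, so one lost packet stalls everything behind it and adds latency. Voice needs consistent timing, so it is better to skip a lost packet than to wait for its retransmission.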
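The arithmetic behind question 5 can be checked from sequence numbers alone. The sketch below counts the packets missing between two consecutively received RTP packets using 16-bit modular arithmetic, so the wrap from 65535 to 0 is not mistaken for loss. Function names are illustrative, and the sketch assumes packets arrive in order; reordered or duplicate packets need extra handling.

```c
#include <stdint.h>

/* Packets missing between two consecutively received sequence numbers.
   The cast keeps the subtraction modulo 2^16. */
uint16_t rtp_seq_gap(uint16_t prev_seq, uint16_t seq) {
    return (uint16_t)(seq - prev_seq - 1);
}

/* Length of the audio gap, given the packetization time in milliseconds. */
uint32_t gap_duration_ms(uint16_t missing, uint32_t ptime_ms) {
    return (uint32_t)missing * ptime_ms;
}
```

Receiving sequence number 111 right after 100 means 10 packets are missing, which at 20ms per packet is the 200ms gap from the answer above.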

Hints in Layers

Stuck on SIP parsing?

Hint 1: Structure

SIP messages are text-based like HTTP:

```c
// Simplified parser structure
typedef struct {
    char method[16];        // INVITE, REGISTER, etc.
    char request_uri[256];
    int status_code;        // 200, 404, etc. (responses only)
    char reason[32];        // OK, Not Found, etc.

    // Headers (simplified - use hash map in real code)
    char from[256];
    char to[256];
    char call_id[128];
    int cseq;
    char via[512];
    char contact[256];

    char *body;
    int content_length;
} sip_message_t;
```
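Hint 2: Parsing approach

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   // strncasecmp

// Parse one SIP message. Returns NULL on allocation failure.
// The caller frees msg->body (if set) and msg.
sip_message_t *parse_sip_message(const char *raw) {
    sip_message_t *msg = calloc(1, sizeof(sip_message_t));
    char *copy = strdup(raw);   // work on a copy: raw is const
    if (!msg || !copy) { free(msg); free(copy); return NULL; }

    // Split on the literal "\r\n" sequence. strtok() is not suitable here:
    // it treats '\r' and '\n' as separate delimiters and skips the empty
    // line that marks the end of the headers.
    char *line = copy;
    int first = 1;
    while (line && *line) {
        char *eol = strstr(line, "\r\n");
        if (eol) *eol = '\0';

        if (first) {
            first = 0;
            if (strncmp(line, "SIP/2.0 ", 8) == 0) {
                // Response: "SIP/2.0 200 OK" (reason phrase may contain spaces)
                sscanf(line, "SIP/2.0 %d %31[^\r\n]", &msg->status_code, msg->reason);
            } else {
                // Request: "INVITE sip:bob@example.com SIP/2.0"
                sscanf(line, "%15s %255s", msg->method, msg->request_uri);
            }
        } else if (*line == '\0') {
            // Empty line = end of headers; everything after it is the body
            if (eol) msg->body = strdup(eol + 2);
            break;
        } else if (strncasecmp(line, "From:", 5) == 0) {
            // Skip "From:" and any whitespace after the colon
            const char *value = line + 5 + strspn(line + 5, " \t");
            snprintf(msg->from, sizeof msg->from, "%s", value);
        }
        // ... parse other headers the same way

        line = eol ? eol + 2 : NULL;
    }
    free(copy);
    return msg;
}
```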
Hint 3: Digest authentication

When you get 401 Unauthorized:

```c
// Extract nonce and realm from the WWW-Authenticate header, e.g.
// WWW-Authenticate: Digest realm="asterisk", nonce="abc123"
char realm[64], nonce[128];
// Parse these from header...

// md5_string(input, out) is a placeholder for your MD5 library: it must
// write the 32-character lowercase hex digest plus a NUL into out.

// Compute HA1 = MD5(username:realm:password)
char ha1_input[256];
snprintf(ha1_input, sizeof ha1_input, "%s:%s:%s", username, realm, password);
char ha1[33];
md5_string(ha1_input, ha1);

// Compute HA2 = MD5(method:uri)
// The uri here must match the uri="" value sent in the Authorization header
char ha2_input[256];
snprintf(ha2_input, sizeof ha2_input, "%s:%s", "REGISTER", "sip:192.168.1.10");
char ha2[33];
md5_string(ha2_input, ha2);

// Compute response = MD5(HA1:nonce:HA2)
// This is the RFC 2617 form without qop. If the challenge includes
// qop="auth", the response is MD5(HA1:nonce:nc:cnonce:qop:HA2) instead.
char response_input[256];
snprintf(response_input, sizeof response_input, "%s:%s:%s", ha1, nonce, ha2);
char response[33];
md5_string(response_input, response);

// Add Authorization header to new REGISTER:
// Authorization: Digest username="alice", realm="asterisk",
//                nonce="abc123", uri="sip:192.168.1.10",
//                response="..."
```

Stuck on RTP?

Hint 4: RTP packet structure

```c
#include <stddef.h>      // offsetof
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   // htons, htonl
#include <sys/socket.h>

typedef struct {
    uint8_t  vpxcc;          // V(2), P(1), X(1), CC(4)
    uint8_t  mpt;            // M(1), PT(7)
    uint16_t seq_num;        // Sequence number
    uint32_t timestamp;      // RTP timestamp
    uint32_t ssrc;           // Sync source ID
    uint8_t  payload[160];   // G.711: 160 bytes for 20ms
} rtp_packet_t;

void send_rtp_packet(int sock, struct sockaddr_in *dest,
                     uint16_t *seq, uint32_t *ts,
                     unsigned char *audio, int audio_len) {
    rtp_packet_t pkt = {0};
    if (audio_len < 0 || (size_t)audio_len > sizeof(pkt.payload)) {
        return;              // Payload would overflow the buffer
    }

    pkt.vpxcc = 0x80;        // Version 2, no padding/ext/csrc
    pkt.mpt = 0;             // Payload type 0 = G.711 μ-law
    pkt.seq_num = htons((*seq)++);
    pkt.timestamp = htonl(*ts);
    *ts += 160;              // G.711: 160 samples per packet (20ms at 8kHz)
    pkt.ssrc = htonl(0x12345678);  // Fixed here; choose randomly per stream in real code

    memcpy(pkt.payload, audio, (size_t)audio_len);

    // Send the 12-byte header plus only the bytes actually filled in
    sendto(sock, &pkt, offsetof(rtp_packet_t, payload) + (size_t)audio_len, 0,
           (struct sockaddr *)dest, sizeof(*dest));
}
```

Books That Will Help

| Book | Relevant Chapters | What You’ll Learn |
|------|-------------------|-------------------|
| “SIP: Understanding the Session Initiation Protocol” (Johnston) | Ch 1-4 (Basics), Ch 7 (SDP), Ch 9 (Registration) | Complete SIP protocol fundamentals |
| “Unix Network Programming Vol 1” (Stevens) | Ch 8 (UDP), Ch 16 (Non-blocking I/O) | Socket programming foundation |
| “VoIP Voice and Fax Over IP” (Levin) | Ch 4-5 (RTP/RTCP), Ch 6 (Codecs) | Media transport and audio processing |
| RFC 3261 | Sections 8, 10, 13, 24 | General user agent behavior, registration, session initiation (INVITE), examples |
| RFC 3550 | Sections 3, 5, 6 | Definitions, RTP packet format, RTCP |

Common Pitfalls & Debugging

Problem 1: “Registration gets 401, then times out”

  • Why: Digest auth MD5 calculation is wrong, or you’re not including all required fields
  • Fix: Print your HA1, HA2, and response. Compare to online MD5 calculators
  • Quick test: echo -n "user:realm:pass" | md5sum should match your HA1
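To make the "print your HA1, HA2, and response" advice concrete, here is a minimal Node.js sketch of the same no-qop digest formula from Hint 3. The credentials, nonce, and URI are placeholders, and the function names (`md5Hex`, `digestResponse`) are made up for this illustration; use it only as a reference to compare against your C output.

```javascript
const crypto = require('node:crypto');

// Hex-encoded MD5 of a UTF-8 string.
function md5Hex(text) {
  return crypto.createHash('md5').update(text, 'utf8').digest('hex');
}

// Digest response without qop: MD5(HA1:nonce:HA2).
function digestResponse({ username, realm, password, nonce, method, uri }) {
  const ha1 = md5Hex(`${username}:${realm}:${password}`);
  const ha2 = md5Hex(`${method}:${uri}`);
  const response = md5Hex(`${ha1}:${nonce}:${ha2}`);
  return { ha1, ha2, response };
}

// Placeholder values; substitute what your 401 challenge actually contained.
const demo = digestResponse({
  username: 'alice',
  realm: 'asterisk',
  password: 'secret',
  nonce: 'abc123',
  method: 'REGISTER',
  uri: 'sip:192.168.1.10',
});
console.log(demo);
```

If all three intermediate values match your C program but the server still rejects you, the mismatch is probably in a header field (for example, the `uri` you hashed differs from the `uri` you sent).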

Problem 2: “INVITE sent, no response”

  • Why: Via header branch must start with “z9hG4bK” (RFC magic cookie)
  • Fix: Generate branch like z9hG4bK + random_string()
  • Quick test: tcpdump -i any -n port 5060 -A to see if server responds with error
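As a rough illustration of the branch fix, the sketch below builds a Via header value with a fresh branch per transaction. The host, port, and the `rport` parameter are assumptions for the example, and `newBranch` / `viaHeader` are hypothetical helper names.

```javascript
const crypto = require('node:crypto');

const MAGIC_COOKIE = 'z9hG4bK'; // Required branch prefix from RFC 3261

// A new branch for every transaction: magic cookie plus random hex.
function newBranch() {
  return MAGIC_COOKIE + crypto.randomBytes(8).toString('hex');
}

// Build a Via header value for a UDP transport.
// host and port are your own listening address (placeholders below).
function viaHeader(host, port) {
  return `SIP/2.0/UDP ${host}:${port};branch=${newBranch()};rport`;
}

console.log('Via: ' + viaHeader('192.168.1.50', 5060));
```

The random suffix matters as much as the prefix: reusing a branch across transactions can make a server treat a new request as a retransmission.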

Problem 3: “Call connects but no audio”

  • Why: RTP ports blocked by firewall, or sending to wrong IP/port from SDP
  • Fix: Parse c= line and m=audio port from 200 OK SDP, send RTP there
  • Quick test: sudo tcpdump -i any -n udp port 50000 - do you see RTP packets?
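The "parse c= and m=audio" step can be sketched as follows. This is a simplified reader that assumes IPv4 and only looks at the first audio section; `rtpTargetFromSdp` is a hypothetical name, and the sample SDP is invented for the demo.

```javascript
// Extract the RTP destination (address, port, payload types) from an SDP body.
// A media-level c= line overrides the session-level one.
function rtpTargetFromSdp(sdp) {
  let section = 'session';
  let sessionAddr = null;
  let audio = null;

  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      if (audio) break; // The first audio section is complete
      const parts = line.slice(2).trim().split(/\s+/);
      if (parts[0] === 'audio') {
        // m=audio 14502 RTP/AVP 0 8
        section = 'audio';
        audio = {
          port: parseInt(parts[1], 10),
          payloadTypes: parts.slice(3).map(Number),
          addr: null,
        };
      } else {
        section = 'other';
      }
    } else if (line.startsWith('c=')) {
      // c=IN IP4 203.0.113.5
      const addr = line.slice(2).trim().split(/\s+/)[2];
      if (section === 'session') sessionAddr = addr;
      else if (section === 'audio') audio.addr = addr;
    }
  }

  if (!audio) return null;
  const addr = audio.addr || sessionAddr;
  return addr ? { addr, port: audio.port, payloadTypes: audio.payloadTypes } : null;
}

const sampleSdp = [
  'v=0',
  'o=- 1 1 IN IP4 203.0.113.5',
  's=-',
  'c=IN IP4 203.0.113.5',
  't=0 0',
  'm=audio 14502 RTP/AVP 0 8',
  'a=rtpmap:0 PCMU/8000',
].join('\r\n');

console.log(rtpTargetFromSdp(sampleSdp));
```

Whatever address and port this returns is where your RTP must go, and it is also the port to put in the tcpdump filter.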

Problem 4: “Audio works but sounds robotic/choppy”

  • Why: Wrong codec (sending G.711 A-law when negotiated μ-law), or wrong timestamp increment
  • Fix: Verify payload type matches RTP map. Timestamp += 160 per packet for G.711@8kHz
  • Quick test: Wireshark → Telephony → RTP → Stream Analysis - check for jitter/loss
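To check the payload type and timestamp arithmetic independently of your C code, here is a small Node.js sketch that builds the 12-byte RTP header and advances the per-stream counters with wraparound. The function names are hypothetical and the SSRC is an arbitrary example value.

```javascript
const PT_PCMU = 0; // G.711 mu-law (static payload type)
const PT_PCMA = 8; // G.711 A-law (static payload type)
const SAMPLES_PER_PACKET = 160; // 20 ms at 8 kHz

// Build one RTP packet: fixed 12-byte header followed by the payload.
function buildRtpPacket({ payloadType, seq, timestamp, ssrc, payload }) {
  const header = Buffer.alloc(12);
  header[0] = 0x80;                         // Version 2, no padding/extension/CSRC
  header[1] = payloadType & 0x7f;           // Marker bit clear
  header.writeUInt16BE(seq & 0xffff, 2);
  header.writeUInt32BE(timestamp >>> 0, 4);
  header.writeUInt32BE(ssrc >>> 0, 8);
  return Buffer.concat([header, payload]);
}

// Advance sequence number and timestamp after each packet, wrapping
// at 16 and 32 bits respectively.
function nextState({ seq, timestamp }) {
  return {
    seq: (seq + 1) & 0xffff,
    timestamp: (timestamp + SAMPLES_PER_PACKET) >>> 0,
  };
}

const packet = buildRtpPacket({
  payloadType: PT_PCMU,
  seq: 1,
  timestamp: 160,
  ssrc: 0x12345678,
  payload: Buffer.alloc(160, 0xff), // 20 ms of mu-law silence
});
console.log(packet.length, packet.subarray(0, 12).toString('hex'));
```

If the far end negotiated payload type 0 and you send 8 (or the timestamp advances by anything other than 160 for 20 ms G.711 packets), you would expect exactly the garbled or choppy audio described above.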

Problem 5: “They can hear me, I can’t hear them”

  • Why: Your SDP (and Contact header) advertises a private IP, so the far end sends its RTP to an address it cannot reach and your NAT never sees the packets
  • Fix: Use STUN to discover public IP, put that in Contact and SDP
  • Quick test: curl -s http://ifconfig.me - if different from your local IP, you’re behind NAT
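A quick local sanity check, before reaching for STUN, is to see whether the address you are about to advertise falls in an RFC 1918 private range. The sketch below covers only those three IPv4 ranges (not carrier-grade NAT or IPv6), and `isPrivateIPv4` is a hypothetical helper name.

```javascript
// True if the IPv4 address is in 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
function isPrivateIPv4(ip) {
  const parts = ip.split('.');
  const valid =
    parts.length === 4 &&
    parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
  if (!valid) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  const [a, b] = parts.map(Number);
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Example: the address you would otherwise place in Contact and the SDP c= line.
console.log(isPrivateIPv4('192.168.1.100')); // private, so not reachable from outside
console.log(isPrivateIPv4('203.0.113.5'));
```

If this returns true for the address in your SDP, the remote side has no route to it and media in at least one direction will be lost unless the server compensates.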

Problem 6: “Registration works for 30min, then calls fail”

  • Why: Registration expired, you’re not refreshing
  • Fix: Set timer for 90% of Expires value, send new REGISTER
  • Quick test: Asterisk CLI: pjsip show contacts - is your contact still there?
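The 90% refresh rule is simple arithmetic, sketched here in Node.js. One assumption worth stating: the timer should be based on the Expires value the registrar returned in its 200 OK, which may be shorter than what you requested. `refreshDelayMs` and `scheduleRefresh` are hypothetical names.

```javascript
// Milliseconds to wait before re-registering, given the granted Expires (seconds).
function refreshDelayMs(expiresSeconds, fraction = 0.9) {
  if (!Number.isFinite(expiresSeconds) || expiresSeconds <= 0) {
    throw new RangeError('Expires must be a positive number of seconds');
  }
  return Math.round(expiresSeconds * fraction * 1000);
}

// Arm a one-shot timer; sendRegister is your own function that sends REGISTER
// and, on success, calls scheduleRefresh again with the newly granted Expires.
function scheduleRefresh(expiresSeconds, sendRegister) {
  return setTimeout(sendRegister, refreshDelayMs(expiresSeconds));
}

console.log(refreshDelayMs(3600)); // delay for a one-hour registration
```

Re-arming the timer from the response of each refresh (rather than using a fixed interval) keeps the client correct if the server changes the granted lifetime.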

Project 2: Build Your Own Asterisk-Based Phone System

  • File: VOIP_TELEPHONY_LEARNING_PROJECTS.md
  • Programming Language: Asterisk Dialplan / Linux Shell
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Telephony / System Administration
  • Software or Tool: Asterisk
  • Main Book: “Asterisk: The Definitive Guide” by Jim Van Meggelen

What you’ll build: A complete small-business phone system with Asterisk: auto-attendant IVR (“Press 1 for sales…”), voicemail, call queues, conference rooms, and connection to real phone numbers via SIP trunking.

Why it teaches VoIP: Asterisk is the Linux of telephony - it exposes everything. By configuring dialplans, you’ll understand how real PBX systems route calls, handle media, and integrate with the PSTN.

Core challenges you’ll face:

  • Writing Asterisk dialplans in extensions.conf (maps to call routing logic)
  • Configuring SIP trunks to a provider like Telnyx/Voip.ms (maps to PSTN connectivity)
  • Setting up voicemail, IVR prompts, and queues (maps to PBX features)
  • Handling audio quality issues and codec selection (maps to media engineering)
  • Securing against VoIP fraud and toll fraud (maps to production security)

Resources for key challenges:

  • “Asterisk: The Definitive Guide, 5th Edition” by Jim Van Meggelen et al. - Comprehensive, practical
  • Telnyx/Voip.ms documentation - Real SIP trunk setup guides

Key Concepts:

  • Dialplan Logic: “Asterisk: The Definitive Guide” Ch. 6-7 - The heart of call routing
  • SIP Trunking: “Asterisk: The Definitive Guide” Ch. 7 - Connecting to carriers
  • IVR Design: “Asterisk: The Definitive Guide” Ch. 16 - Building voice menus
  • VoIP Security: “Hacking Exposed VoIP” by David Endler - Fraud prevention

  • Difficulty: Intermediate
  • Time estimate: 2-3 weeks
  • Prerequisites: Linux administration basics, basic networking

Real world outcome:

  • Call your Asterisk system from your cell phone, hear your IVR, leave a voicemail
  • Have a real phone number (DID) that rings your system
  • Set up a conference bridge and have multiple people join
  • Forward calls to your mobile when you’re away

Learning milestones:

  1. Internal SIP phones can call each other - you understand basic PBX routing
  2. Inbound/outbound PSTN calls work - you understand SIP trunking and DIDs
  3. IVR, voicemail, and queues functional - you understand production telephony features

Real World Outcome

When completed, you’ll have a fully functional small-business phone system accessible from anywhere. Here’s what you’ll experience:

Calling from your cell phone to your system:

# You dial your DID (e.g., +1-555-867-5309) from your mobile phone

[You hear:]
"Thank you for calling Acme Corporation. For sales, press 1.
 For support, press 2. For the company directory, press 3."

[You press 2]

"Please hold while we connect you to support."
[Hold music plays]

# Your SIP phone at extension 2001 starts ringing
# OR if no one answers after 30 seconds:

"The support team is currently unavailable. Please leave a message after the beep."
[BEEP]
[You leave voicemail]

"Your message has been recorded. Goodbye."

Asterisk console output while call is active:

*CLI> core show channels
 Channel              Location             State       Application(Data)
 =====================================================================================
 PJSIP/voipms-000001  s@from-trunk:1       Up          Queue(support)
 PJSIP/2001-000002    2001@internal:1      Ringing     AppDial((Outgoing Line))

*CLI> queue show support
support has 1 calls (max unlimited) in 'ringall' strategy (5s holdtime, 0s talktime), W:0, C:12, A:8, SL:95.0%, SL2:100.0% within 60s
   Members:
      PJSIP/2001 (ringinuse disabled) (dynamic) (In use) has taken 8 calls (last was 142 secs ago)
   Callers:
      1. Caller#0001 - +15555551234 (wait: 0:05, prio: 0)

Voicemail notification email you’ll receive:

From: Asterisk PBX <asterisk@yourdomain.com>
To: support@yourdomain.com
Subject: [Asterisk] New voicemail from +1-555-555-1234 in mailbox 2001

You have a new voicemail in mailbox 2001:

  From:     "+1-555-555-1234" <+15555551234>
  Length:   0:42 seconds
  Date:     January 15, 2025 3:47 PM

Dial *98 from your extension to listen to your messages.

[Attached: msg0001.wav]

Conference bridge in action:

# Participant 1 dials 6000
*CLI> confbridge list 6000
Conference Bridge: 6000
Users: 1
Marked Users: 0

User #1 (PJSIP/2001-000003): Alice <sip:alice@192.168.1.100>
  Admin: No  Muted: No  Talking: Yes

# Participant 2 joins
*CLI> confbridge list 6000
Conference Bridge: 6000
Users: 2
Marked Users: 0

User #1 (PJSIP/2001-000003): Alice
User #2 (PJSIP/2002-000004): Bob

# Both participants can talk and hear each other in real-time

SIP trunk registration status:

*CLI> pjsip show registrations

 <Registration/ServerURI..............................>  <Auth..........>  <Status.......>
 ========================================================================================
 voipms/sip.voip.ms:5060                                  voipms            Registered

 Objects found: 1

You can verify everything works by:

  • Calling your DID from any phone → hear your IVR
  • Navigating the menu → reach extensions/voicemail
  • Checking /var/spool/asterisk/voicemail/ → see recorded messages
  • Looking at CDR logs in /var/log/asterisk/cdr-csv/Master.csv → billing records
  • Using Asterisk CLI: core show channels → active calls

The Core Question You’re Answering

“How does a production phone system route calls, handle features like voicemail and IVR, and connect to the real telephone network?”

Before building this, most developers think of phone systems as black boxes. After completing this project, you’ll understand:

  • How PBXs translate business logic (ring sales team, then overflow to voicemail) into dialplan code
  • Why companies pay for SIP trunks instead of using free services
  • How DIDs (phone numbers) map to internal extensions
  • The economics of telephony (per-minute billing, DID rental costs)

Concepts You Must Understand First

| Concept | What You Need to Know | Where to Learn It |
|---------|-----------------------|-------------------|
| Dialplan Logic | Sequential execution, pattern matching, application syntax | “Asterisk: The Definitive Guide” Ch. 6 |
| SIP Endpoints | Difference between trunks and extensions, authentication | “Asterisk: The Definitive Guide” Ch. 7 |
| Audio Codecs | G.711, G.722, Opus: when to use each | “VoIP Voice and Fax Over IP” Ch. 6 |
| PSTN Numbering | E.164 format, DIDs, caller ID | Wikipedia: E.164 + carrier docs |
| IVR State Machines | Read() vs WaitExten(), timeout handling | “Asterisk: The Definitive Guide” Ch. 16 |
| NAT Traversal | Why SIP needs special NAT handling | RFC 5389 (STUN) + Asterisk NAT docs |
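For the E.164 concept, a short normalization sketch may help: carriers deliver and expect numbers in different shapes (10 digits, 11 digits, or with a leading +), and a dialplan usually settles on one. The function below assumes North American (+1) defaults for numbers without a country code; `toE164` is a hypothetical name and this is not a full validator.

```javascript
// Normalize a phone number to E.164 ("+" followed by up to 15 digits).
// Assumption: numbers without "+" belong to defaultCountryCode (NANP "1").
function toE164(raw, defaultCountryCode = '1') {
  const trimmed = raw.trim();
  const digits = trimmed.replace(/\D/g, '');
  let candidate;

  if (trimmed.startsWith('+')) {
    candidate = '+' + digits;                         // Already international
  } else if (digits.length === 10) {
    candidate = '+' + defaultCountryCode + digits;    // National 10-digit number
  } else if (digits.length === 11 && digits.startsWith(defaultCountryCode)) {
    candidate = '+' + digits;                         // 1 + 10 digits
  } else {
    return null;                                      // Extension or unknown format
  }

  // E.164: country code cannot start with 0, at most 15 digits total.
  return /^\+[1-9]\d{1,14}$/.test(candidate) ? candidate : null;
}

console.log(toE164('555-867-5309'));
console.log(toE164('2001'));
```

Returning `null` for short numbers is deliberate: internal extensions such as 2001 are not PSTN numbers and should be routed by the internal context instead.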

Questions to Guide Your Design

Before configuring Asterisk, answer these:

  1. Extension Numbering Plan: Will you use 3-digit (100-199) or 4-digit (1000-1999)? How will you organize departments?

  2. Call Routing Strategy: If someone calls the main number, should it ring all phones (ring group), go to an IVR, or go directly to reception?

  3. Voicemail Behavior: Should callers hear voicemail after 4 rings? 30 seconds? Different timeout for different extensions?

  4. IVR Menu Structure: How many levels deep? Should pressing 0 always reach an operator?

  5. SIP Trunk Provider: Which provider? (Voip.ms, Telnyx, Twilio?) Consider: pricing, reliability, support, features (SMS, E911)

  6. Codec Selection: Allow all codecs or restrict? G.711 for quality or G.729 for bandwidth savings?

  7. Security: How will you prevent toll fraud? (Fail2ban, IP whitelisting, strong SIP passwords)

Thinking Exercise (Do This Before Configuring!)

Exercise: Design your dialplan on paper.

Draw a flowchart for an incoming call to your main number:

  • What happens when someone dials your DID?
  • Where does the call go first? (IVR, extension, ring group?)
  • What happens if no one answers?
  • How do internal users dial out?
  • How do you handle after-hours calls?

Expected outcome: Your flowchart should have decision points (time-based routing), loops (retry logic), and dead ends (voicemail, hang up).

The Interview Questions They’ll Ask

Real interview questions for Asterisk administrators:

  1. “What’s the difference between PJSIP and chan_sip?”
    • Answer: PJSIP (chan_pjsip) is the modern, maintained SIP channel driver with better NAT handling, IPv6 support, and multithreading. chan_sip is deprecated and was removed in Asterisk 21.
  2. “A user complains they can’t make outbound calls but can receive calls. What do you check?”
    • Answer: Check dialplan—likely the outbound context isn’t included, or the trunk isn’t configured for outbound, or the user’s COS (Class of Service) doesn’t allow external dialing.
  3. “How do you prevent toll fraud on an Asterisk PBX?”
    • Answer: Strong SIP passwords, Fail2ban for brute force, permit/deny IP lists, disable guest calls, use context isolation, enable alwaysauthreject, monitor CDRs for unusual patterns.
  4. “Explain the difference between a trunk and an extension.”
    • Answer: Extensions are internal users/endpoints that register to the PBX. Trunks are connections to external providers (SIP carriers) or other PBXs—they carry multiple simultaneous calls.
  5. “How does Asterisk match incoming DID numbers to extensions?”
    • Answer: Incoming calls hit a context specified in the trunk config (e.g., context=from-trunk). The dialplan then pattern-matches the DID in that context (e.g., exten => 5558675309,1,Dial(PJSIP/reception)).
  6. “What’s the purpose of the ‘n’ priority in dialplan?”
    • Answer: n means “next” priority—it automatically increments, making dialplans easier to maintain than hardcoded priority numbers (1,2,3…).

Hints in Layers

Stuck on SIP trunk configuration?

Hint 1: Required components

Every SIP trunk needs three PJSIP objects:

  • endpoint: Defines the remote server
  • aor: Address of Record (where to send/receive)
  • identify: Matches inbound traffic to the endpoint (by IP)

Plus optionally:

  • auth: Credentials, if your provider requires authentication
  • registration: To register your PBX with the provider
Hint 2: Example structure

```ini
[voipms]
type = endpoint
context = from-trunk          ; Incoming calls go here
dtmf_mode = rfc4733
disallow = all
allow = ulaw,alaw
from_user = 555123            ; Your DID
outbound_auth = voipms-auth
aors = voipms                 ; Links the endpoint to the AOR below for outbound calls

[voipms-auth]
type = auth
auth_type = userpass
username = 555123_yourusername
password = your_password

[voipms]
type = aor
contact = sip:toronto.voip.ms:5060

[voipms]
type = identify
endpoint = voipms
match = 64.71.144.0/24        ; Provider's IP range

[voipms-reg]
type = registration
outbound_auth = voipms-auth
server_uri = sip:toronto.voip.ms
client_uri = sip:555123_yourusername@toronto.voip.ms
```
Hint 3: Testing registration

Check if your trunk registered:

```bash
asterisk -rx "pjsip show registrations"
# Should show "Registered"

# If not:
asterisk -rx "pjsip set logger on"
# Watch for authentication failures or network issues
```

Stuck on dialplan for IVR?

Hint 4: IVR pattern

```ini
[from-trunk]
exten => 5558675309,1,Answer()
 same => n,Background(welcome)       ; Plays prompt while listening for digits
 same => n,WaitExten(5)              ; Wait 5 sec for input

exten => 1,1,Goto(sales,s,1)         ; Press 1 → sales
exten => 2,1,Goto(support,s,1)       ; Press 2 → support
exten => 3,1,Directory(default)      ; Press 3 → directory

exten => i,1,Playback(invalid)       ; Invalid input
 same => n,Goto(from-trunk,5558675309,2)

exten => t,1,Playback(timeout)       ; Timeout
 same => n,Goto(from-trunk,5558675309,2)

[sales]
exten => s,1,Dial(PJSIP/1001&PJSIP/1002,30)
 same => n,VoiceMail(1001@default)
 same => n,Hangup()
```

Books That Will Help

| Book | Relevant Chapters | What You’ll Learn |
|------|-------------------|-------------------|
| “Asterisk: The Definitive Guide” 5th Ed | Ch 5-7 (Installation, Dialplan, SIP) | Core Asterisk administration |
| “Asterisk: The Definitive Guide” 5th Ed | Ch 16 (IVR Menus) | Advanced IVR design |
| “Asterisk: The Definitive Guide” 5th Ed | Ch 22 (Clustering) | High availability for production |
| “Hacking Exposed VoIP” (Endler) | Ch 5-7 | Security and fraud prevention |
| Voip.ms Knowledge Base | SIP Configuration Articles | Provider-specific setup |

Common Pitfalls & Debugging

Problem 1: “Registration fails with 401 Unauthorized”

  • Why: Username/password wrong, or auth section not linked to endpoint
  • Fix: Verify credentials with provider, ensure outbound_auth=voipms-auth in endpoint
  • Quick test: asterisk -rx "pjsip show endpoint voipms" - look for auth

Problem 2: “Inbound calls don’t ring any phones”

  • Why: DID not matched in dialplan, or wrong context in trunk config
  • Fix: Check context=from-trunk in trunk endpoint, add exten => _X.,1,NoOp(${EXTEN}) to see what number is coming in
  • Quick test: asterisk -rvvvv and watch console when call comes in

Problem 3: “Outbound calls fail immediately”

  • Why: Dialplan doesn’t have outbound pattern, or trunk doesn’t allow outbound
  • Fix: Add pattern like exten => _NXXNXXXXXX,1,Dial(PJSIP/${EXTEN}@voipms)
  • Quick test: asterisk -rx "dialplan show internal" - verify outbound pattern exists
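To build intuition for why a dialed number does or does not hit a pattern like `_NXXNXXXXXX`, the sketch below translates a subset of Asterisk pattern syntax into a regular expression. It handles only `X`, `Z`, `N`, `.`, and `!` (not bracketed sets like `[1-5]`), and it says nothing about Asterisk's rules for choosing between several matching patterns. `patternToRegExp` and `matchesPattern` are hypothetical names.

```javascript
// Escape characters that are special inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Convert an Asterisk extension pattern to a RegExp (subset of the syntax).
function patternToRegExp(pattern) {
  if (!pattern.startsWith('_')) {
    return new RegExp('^' + escapeRegExp(pattern) + '$'); // Literal extension
  }
  let body = '';
  for (const ch of pattern.slice(1)) {
    switch (ch) {
      case 'X': body += '[0-9]'; break; // Any digit
      case 'Z': body += '[1-9]'; break; // Any digit except 0
      case 'N': body += '[2-9]'; break; // Any digit except 0 and 1
      case '.': body += '.+'; break;    // One or more of anything
      case '!': body += '.*'; break;    // Zero or more of anything
      default: body += escapeRegExp(ch);
    }
  }
  return new RegExp('^' + body + '$');
}

function matchesPattern(pattern, dialed) {
  return patternToRegExp(pattern).test(dialed);
}

console.log(matchesPattern('_NXXNXXXXXX', '5558675309'));
console.log(matchesPattern('_NXXNXXXXXX', '15558675309'));
```

A common cause of "outbound calls fail immediately" shows up directly here: users dial 11 digits (with a leading 1) while the dialplan only has a 10-digit pattern.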

Problem 4: “Caller ID shows wrong number”

  • Why: Not setting From header correctly for outbound calls
  • Fix: Use Set(CALLERID(num)=5558675309) before Dial(), or set from_user in trunk endpoint
  • Quick test: Check provider’s call logs—what caller ID do they see?

Problem 5: “One-way audio through Asterisk”

  • Why: NAT misconfiguration, RTP ports blocked, or codec mismatch
  • Fix: Set local_net and external_media_address in pjsip.conf, open UDP 10000-20000 on firewall
  • Quick test: asterisk -rx "pjsip show endpoint 1001" - check rtp_symmetric, force_rport, and direct_media

Problem 6: “Voicemail password always wrong”

  • Why: Case-sensitive, or using wrong mailbox number
  • Fix: Check /etc/asterisk/voicemail.conf - mailbox numbers must match extensions exactly
  • Quick test: asterisk -rx "voicemail show users" - see configured mailboxes

Problem 7: “IVR prompt doesn’t play”

  • Why: Audio file missing, wrong format (needs 8kHz mono WAV), or wrong path
  • Fix: Place files in /var/lib/asterisk/sounds/en/ as .wav or .ulaw
  • Quick test: ls -l /var/lib/asterisk/sounds/en/ | grep welcome - file exists?

Problem 8: “High call volume causes crashes”

  • Why: Too many simultaneous channels, or memory leak
  • Fix: Set maxfiles in /etc/asterisk/asterisk.conf, monitor with core show channels count
  • Quick test: ulimit -n should be >8192, increase if needed

Additional Resources for This Project

Based on the latest Asterisk debugging guides (2025), here are essential resources:


Project 3: Twilio/Programmable Voice Application

  • File: VOIP_TELEPHONY_LEARNING_PROJECTS.md
  • Programming Language: JavaScript (Node.js) or Python
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Cloud Telephony / APIs
  • Software or Tool: Twilio API
  • Main Book: “Twilio Cookbook” by Roger Stringer

What you’ll build: A customer support hotline with: phone number provisioning, call routing based on caller input, speech-to-text transcription, call recording, and real-time agent dashboard showing active calls.

Why it teaches VoIP: Twilio abstracts the hard infrastructure but exposes the telephony concepts. You’ll learn how modern cloud telephony works - webhooks, TwiML, media streams - while seeing how a production-grade API handles the complexity you built manually in Project 1.

Core challenges you’ll face:

  • Handling Twilio webhooks for call events (maps to telephony event model)
  • Writing TwiML for call flow control (maps to IVR/dialplan concepts)
  • Implementing real-time media streams (maps to WebSocket audio handling)
  • Building call analytics and CDR processing (maps to telecom billing/metrics)
  • Handling concurrent calls and scaling (maps to production telephony)

Resources for key challenges:

  • Twilio’s official documentation - Excellent, practical guides
  • “VoIP and Unified Communications” by William A. Flanagan - Business telephony context

Key Concepts:

  • TwiML Structure: Twilio Docs “TwiML Voice” - Declarative call control
  • Webhook Event Model: Twilio Docs “Voice Webhooks” - How cloud telephony signals
  • Media Streams API: Twilio Docs “Media Streams” - Real-time audio access
  • Programmable Voice Pricing: Twilio Pricing Page - Understanding telecom economics

  • Difficulty: Intermediate
  • Time estimate: 1-2 weeks
  • Prerequisites: Web development (any backend language), REST APIs

Real world outcome:

  • A real 1-800 number that customers can call
  • Callers hear your IVR, get routed to the right department
  • You can see live calls on a dashboard, listen to recordings
  • Call transcripts are searchable in your database

Learning milestones:

  1. Inbound calls trigger your webhook and play audio - you understand cloud telephony
  2. Call routing based on input works - you understand programmatic call control
  3. Real-time transcription and dashboards work - you understand media streams

Real World Outcome

You’ll build a production-grade customer support hotline that customers can actually call. Here’s what you’ll experience:

Customer calling your Twilio number:

# Customer dials +1-800-555-5435 from their phone

[Customer hears:]
"Welcome to TechSupport Pro. To help us route your call:
 Press 1 for billing questions
 Press 2 for technical support
 Press 3 to speak with a representative
 Or, say what you need help with after the beep."

[Customer presses 2]

"You've selected technical support. Please hold while we connect you."
[Hold music plays for 5 seconds]

# Your agent's phone rings
[Agent answers]
"Hello, this is Sarah from technical support. How can I help you today?"

Your webhook server receiving call events (Node.js console):

$ npm start
Server listening on port 3000

[2025-01-15 14:32:18] POST /voice/inbound
  From: +15555551234
  To: +18005555435
  CallSid: CA1234567890abcdef1234567890abcdef
  → Sending TwiML: Gather with menu options

[2025-01-15 14:32:22] POST /voice/menu
  From: +15555551234
  Digits: 2
  CallSid: CA1234567890abcdef1234567890abcdef
  → Routing to technical support queue

[2025-01-15 14:32:27] POST /voice/agent-answer
  CallSid: CA1234567890abcdef1234567890abcdef
  CallStatus: in-progress
  → Agent answered, starting recording

[2025-01-15 14:35:43] POST /voice/recording-complete
  RecordingUrl: https://api.twilio.com/2010-04-01/.../Recordings/RE...
  RecordingDuration: 196
  CallSid: CA1234567890abcdef1234567890abcdef
  → Saved recording, triggering transcription

[2025-01-15 14:36:02] POST /voice/transcription-complete
  TranscriptionText: "Hi I'm having trouble connecting to the VPN..."
  CallSid: CA1234567890abcdef1234567890abcdef
  → Indexed transcript in database

Your real-time dashboard (React frontend):

// Live call monitoring view
┌─────────────────────────────────────────────────────────────┐
│ Active Calls (3)                         [Auto-refresh: ON] │
├─────────────────────────────────────────────────────────────┤
│ Call ID      From             Status     Duration  Agent    │
│ CA12345...   +1-555-555-1234  In Call    3:16      Sarah    │
│ CA67890...   +1-555-555-9876  Ringing    0:08      -        │
│ CA11111...   +1-555-555-2468  Voicemail  1:42      -        │
└─────────────────────────────────────────────────────────────┘

// Call detail when you click on a call
┌─────────────────────────────────────────────────────────────┐
│ Call Details: CA12345...                                    │
├─────────────────────────────────────────────────────────────┤
│ Customer: +1-555-555-1234 (New York, NY)                    │
│ Started: 14:32:18 | Duration: 3:16 | Status: In Progress    │
│ Route: Main Menu → Tech Support → Agent (Sarah)             │
│                                                             │
│ Timeline:                                                   │
│  14:32:18  Call received                                    │
│  14:32:22  Selected option 2 (Tech Support)                 │
│  14:32:27  Agent answered                                   │
│  14:32:27  Recording started                                │
│                                                             │
│ [Listen Live] [📥 Download Recording] [📝 View Transcript]  │
└─────────────────────────────────────────────────────────────┘

Twilio console showing your usage:

Voice Usage (Today)
  Inbound calls:     47 calls
  Total duration:    2h 34m
  Cost:              $6.16

Numbers
  +1-800-555-5435   Active   $2.00/month + $0.04/min

Recent Calls
  14:32:18  +1-555-555-1234  Completed  3:16   $0.13
  14:28:03  +1-555-555-9876  Completed  8:42   $0.35
  14:15:22  +1-555-555-2468  No Answer  0:30   $0.02

Email notification sent after call:

From: TechSupport Pro <noreply@yourdomain.com>
To: support-team@yourdomain.com
Subject: New Support Call - Transcript Available

Call Summary:
  Customer: +1-555-555-1234
  Duration: 3 minutes, 16 seconds
  Agent: Sarah Johnson
  Outcome: Issue resolved

Transcript:
  [00:00] Agent: "Hello, this is Sarah from technical support."
  [00:03] Customer: "Hi I'm having trouble connecting to the VPN..."
  [00:08] Agent: "I can help with that. What error message are you seeing?"
  ...

Recording: [Listen] [Download]
Tags: vpn, connectivity, resolved

Database records you’ll have:

SELECT * FROM calls WHERE created_at > NOW() - INTERVAL '1 day';

 id  | call_sid         | from_number    | to_number      | status    | duration | agent_id | cost
-----+------------------+----------------+----------------+-----------+----------+----------+-------
 142 | CA12345...       | +15555551234   | +18005555435   | completed | 196      | 7        | 0.13
 143 | CA67890...       | +15555559876   | +18005555435   | completed | 522      | 3        | 0.35
 144 | CA11111...       | +15555552468   | +18005555435   | no-answer | 30       | NULL     | 0.02

SELECT * FROM transcriptions WHERE call_id = 142;

 call_id | text                                           | confidence
---------+------------------------------------------------+------------
 142     | Hi I'm having trouble connecting to the VPN... | 0.94

The Core Question You’re Answering

“How do modern cloud telephony platforms work, and how can developers build voice applications without managing infrastructure?”

After building this, you’ll understand:

  • The webhook-based event model that powers programmable telephony
  • How TwiML (Twilio Markup Language) declaratively controls call flows
  • The economics of CPaaS (Communications Platform as a Service)
  • How to build scalable voice applications without running SIP servers
  • The difference between building telephony infrastructure (Project 1, 2) vs consuming it as an API

Concepts You Must Understand First

| Concept | What You Need to Know | Where to Learn It |
|---------|-----------------------|-------------------|
| HTTP Webhooks | POST requests, request/response model, handling retries | MDN Web Docs: HTTP |
| TwiML Structure | XML-based call control, nesting verbs | Twilio Docs: TwiML Voice |
| Event-Driven Architecture | Asynchronous events, callback URLs | “Building Event-Driven Microservices” (O’Reilly) |
| WebSocket Streams | Bi-directional real-time communication | MDN Web Docs: WebSockets |
| REST APIs | Authentication, rate limiting, pagination | Twilio API Reference |
| Call State Management | Tracking call lifecycle, handling race conditions | “Designing Data-Intensive Applications” Ch. 7 |

Questions to Guide Your Design

Before coding, answer these:

  1. Webhook Security: How will you verify webhooks are actually from Twilio? (Signature validation required!)

  2. Failure Handling: What happens if your webhook server is down when a call comes in? (Fallback URLs!)

  3. Database Schema: How will you model calls, recordings, transcriptions, and their relationships?

  4. Agent Availability: How do you know which agents are online and available to take calls?

  5. Call Queueing: If all agents are busy, how long should customers wait? What music/message do they hear?

  6. Transcription Strategy: Real-time streaming transcription (expensive, fast) or batch transcription (cheap, slow)?

  7. Cost Control: How will you prevent runaway costs? (Set spending limits, monitor usage alerts)

Thinking Exercise (Do This Before Coding!)

Exercise: Map out your TwiML call flow.

Draw a flowchart showing:

  • Initial call → webhook fires
  • TwiML response → menu options
  • User input → routing decision
  • Agent availability check → connect or queue
  • Call completion → recording + transcription

Questions while designing:

  • What if the user doesn’t press anything? (timeout handling)
  • What if they press an invalid option? (input validation)
  • What if your database is down? (graceful degradation)
  • How do you prevent double-charging if webhook fires twice?
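One way to approach the last question is to make each webhook handler idempotent: remember which (CallSid, event) pairs have been processed and ignore repeats. The sketch below uses an in-memory Set purely for illustration; a real deployment would more likely rely on a unique constraint in the database. `WebhookDeduper`, `handleRecordingComplete`, and the `ledger` array are hypothetical names.

```javascript
// Tracks which webhook deliveries have already been processed.
class WebhookDeduper {
  constructor() {
    this.seen = new Set();
  }

  // Returns true the first time a (callSid, eventName) pair is seen.
  firstDelivery(callSid, eventName) {
    const key = `${callSid}:${eventName}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// Record billable seconds exactly once per call, even if the webhook repeats.
// params mirrors the form fields shown in the console log earlier
// (CallSid, RecordingDuration).
function handleRecordingComplete(deduper, ledger, params) {
  if (!deduper.firstDelivery(params.CallSid, 'recording-complete')) {
    return 'duplicate-ignored';
  }
  ledger.push({ callSid: params.CallSid, seconds: Number(params.RecordingDuration) });
  return 'recorded';
}

const deduper = new WebhookDeduper();
const ledger = [];
const event = { CallSid: 'CA1234567890abcdef1234567890abcdef', RecordingDuration: '196' };
console.log(handleRecordingComplete(deduper, ledger, event));
console.log(handleRecordingComplete(deduper, ledger, event));
console.log(ledger.length);
```

The handler should still return a successful response for the duplicate, so the sender does not keep retrying.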

The Interview Questions They’ll Ask

Real questions for cloud telephony developers:

  1. “How do you secure Twilio webhooks?”
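    • Answer: Validate the X-Twilio-Signature header. Twilio computes an HMAC-SHA1 over the full webhook URL followed by the POST parameters (sorted by name, each name and value appended), keyed with your auth token, and Base64-encodes the result. Compute the same value on your side and compare.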
  2. “What’s the difference between <Say> and <Play> in TwiML?”
    • Answer: <Say> uses text-to-speech to generate audio. <Play> plays a pre-recorded audio file from a URL.
  3. “A customer complains they called but your webhook never fired. How do you debug?”
    • Answer: Check Twilio’s debugger console for the call—it shows every HTTP request/response, status codes, and errors. Look for webhook timeouts (>15s), network errors, or invalid TwiML responses.
  4. “How does Twilio’s Media Streams work for real-time audio?”
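    • Answer: Twilio opens a WebSocket to your server and sends JSON messages carrying base64-encoded μ-law audio chunks (20ms each). A stream started with <Start><Stream> is one-way (Twilio to you); <Connect><Stream> is bi-directional, so you can also send audio back. You can process live audio for transcription, sentiment analysis, or voice AI.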
  5. “What’s the cost structure of Twilio Voice?”
    • Answer: Phone number rental (~$1-2/month), inbound calls (~$0.0085/min), outbound calls (~$0.013/min), recording storage, and transcription costs. Prices vary by country.
  6. “How do you implement failover for your webhook endpoints?”
    • Answer: Configure fallback URLs in Twilio console. If the primary webhook times out or returns an error such as 500, Twilio requests the fallback URL instead. Use monitoring and load balancers for high availability.

Hints in Layers

Stuck on webhook setup?

Hint 1: Basic webhook endpoint

Your webhook needs to:

  1. Accept POST requests
  2. Return valid TwiML XML
  3. Respond within 15 seconds (or Twilio times out)

Basic structure:

```javascript
app.post('/voice/inbound', (req, res) => {
  const twiml = new twilio.twiml.VoiceResponse();

  // Build TwiML response here

  res.type('text/xml');
  res.send(twiml.toString());
});
```
Hint 2: TwiML for IVR menu

```javascript
// Inside the /voice/inbound handler from Hint 1:
const twiml = new twilio.twiml.VoiceResponse();

const gather = twiml.gather({
  action: '/voice/menu',   // Where to send user's input
  numDigits: 1,
  method: 'POST'
});
gather.say('Press 1 for sales. Press 2 for support.');

// If they don't press anything:
twiml.say('We didn\'t receive your input.');
twiml.redirect('/voice/inbound');  // Loop back

res.type('text/xml');
res.send(twiml.toString());
```
Hint 3: Validating webhook signature

```javascript
const twilio = require('twilio');

// req.body must already be parsed, e.g. app.use(express.urlencoded({ extended: false }))
function validateTwilioRequest(req, res, next) {
  const signature = req.headers['x-twilio-signature'];
  // Must be the exact public URL Twilio requested (scheme, host, path, query)
  const url = `https://${req.headers.host}${req.originalUrl}`;
  const params = req.body;

  const isValid = twilio.validateRequest(
    process.env.TWILIO_AUTH_TOKEN,
    signature,
    url,
    params
  );

  if (!isValid) {
    return res.status(403).send('Forbidden');
  }
  next();
}

app.post('/voice/inbound', validateTwilioRequest, (req, res) => {
  // Handler code
});
```

Stuck on Media Streams?

Hint 4: WebSocket setup for real-time audio

```javascript
// In your TwiML response (<Start><Stream> forks the audio to your server):
const twiml = new twilio.twiml.VoiceResponse();
const start = twiml.start();
start.stream({
  url: 'wss://yourdomain.com/media-stream'
});
twiml.say('Please speak after the beep.');

// WebSocket server:
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
  console.log('Media stream connected');

  ws.on('message', (data) => {
    // Messages arrive as JSON text
    const msg = JSON.parse(data.toString());

    if (msg.event === 'media') {
      // msg.media.payload contains base64-encoded μ-law audio
      const audioChunk = Buffer.from(msg.media.payload, 'base64');
      // Process audio: send to speech-to-text, etc.
    }
  });
});
```
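Many speech-to-text engines expect linear PCM rather than μ-law, so a common next step is to expand each payload to 16-bit samples. Below is a minimal sketch of the standard G.711 μ-law expansion; the function names are hypothetical, and whether your transcription service needs this conversion (or accepts μ-law directly) is something to check in its documentation.

```javascript
// Expand one 8-bit mu-law byte to a signed 16-bit linear PCM sample.
function decodeMulawSample(byte) {
  const inverted = ~byte & 0xff;           // mu-law bytes are stored bit-inverted
  const sign = inverted & 0x80;
  const exponent = (inverted >> 4) & 0x07;
  const mantissa = inverted & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a base64 mu-law payload (as carried in a media message) to PCM16.
function decodeMulawPayload(base64Payload) {
  const bytes = Buffer.from(base64Payload, 'base64');
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = decodeMulawSample(bytes[i]);
  }
  return pcm;
}

// 0xff and 0x7f are the two encodings of silence; 0x00 and 0x80 are the extremes.
const samplePayload = Buffer.from([0xff, 0x7f, 0x00, 0x80]).toString('base64');
console.log(Array.from(decodeMulawPayload(samplePayload)));
```

At 8 kHz, a 20 ms chunk decodes to 160 samples, which is a convenient check that you are receiving what you expect.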

Books That Will Help

| Book | Relevant Chapters | What You’ll Learn |
|------|-------------------|-------------------|
| “Twilio Cookbook” (Roger Stringer) | Ch 1-3 | Practical Twilio Voice examples |
| “VoIP and Unified Communications” (Flanagan) | Ch 8-9 | Business telephony context |
| “Building Event-Driven Microservices” (Bellemare) | Ch 4-5 | Webhook architecture patterns |
| Twilio Voice API Docs | All sections | Comprehensive API reference |
| Twilio Best Practices Guide | Voice SDK, Security | Production-grade implementation |

Common Pitfalls & Debugging

Problem 1: “Webhook never fires when calls come in”

  • Why: Webhook URL misconfigured, not publicly accessible, or HTTPS required
  • Fix: Use ngrok for local testing: ngrok http 3000, then set webhook URL to ngrok HTTPS URL
  • Quick test: curl -X POST https://your-webhook-url/voice/inbound - does it respond?

Problem 2: “TwiML error: Invalid XML”

  • Why: Malformed XML, missing closing tags, or invalid verb nesting
  • Fix: Use Twilio’s helper libraries—they generate valid TwiML. Never concatenate strings manually
  • Quick test: Copy TwiML output, paste into Twilio TwiML Bin tester

Problem 3: “Calls connect but recordings are missing”

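  • Why: The record attribute is missing from <Dial>, or the recording status callback is not configured
  • Fix: Add record="record-from-answer" to the <Dial> verb and set recordingStatusCallback so you are notified when the recording is ready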
  • Quick test: Check Twilio console → Calls → select call → Recordings tab

Problem 4: “Transcription never arrives”

  • Why: Transcription disabled, callback URL wrong, or audio quality too poor
  • Fix: Enable transcription in <Record> with transcribe="true" and transcribeCallback URL
  • Quick test: Download recording, play it—if you can’t understand it, neither can transcription

Problem 5: “High costs due to accidental call loops”

  • Why: Webhook redirects infinitely, causing calls to never hang up
  • Fix: Add call duration tracking, force hangup after N redirects, set timeout attributes
  • Quick test: Set spending alerts in Twilio console—get notified when usage spikes

Problem 6: “Webhook signature validation always fails”

  • Why: Using wrong auth token, URL mismatch (http vs https), or ngrok hostname changes
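  • Fix: Validate with the Auth Token from the console (the Account SID is not part of the signature), and make sure the URL you validate against exactly matches the one Twilio requested, including scheme, host, path, and query string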
  • Quick test: Log the URL Twilio constructs vs your validation URL—must be identical
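
To see why a URL mismatch breaks validation, it can help to compute the signature by hand. The sketch below follows the signing scheme Twilio documents for form-encoded webhooks (HMAC-SHA1 over the URL plus sorted parameters, keyed with the Auth Token). The token, hostname, and parameters are made-up values for illustration; in production you would keep using `twilio.validateRequest`.

```javascript
const crypto = require('crypto');

// HMAC-SHA1 over the full URL plus each POST parameter (sorted by name)
// appended as name+value, keyed with the Auth Token, then Base64-encoded.
function expectedTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto
    .createHmac('sha1', authToken)
    .update(Buffer.from(data, 'utf-8'))
    .digest('base64');
}

// Hypothetical values for illustration only
const token = 'example-auth-token';
const params = { CallSid: 'CA123', From: '+15550001111', To: '+15552223333' };

const viaHttps = expectedTwilioSignature(token, 'https://example.ngrok.app/voice/inbound', params);
const viaHttp = expectedTwilioSignature(token, 'http://example.ngrok.app/voice/inbound', params);

console.log(viaHttps === viaHttp); // false: the scheme is part of the signed data
```

The same applies to a changed ngrok hostname or a missing query string: any difference in the URL produces a different signature.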

Problem 7: “Media streams disconnect randomly”

  • Why: WebSocket timeout, your server not sending keepalive, or network issues
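  • Fix: Keep the socket alive with ping/pong, handle the stream’s stop event and the socket close so you can clean up, and log close codes
  • Quick test: Log ping/pong frames and close codes on your WebSocket server (the socket runs between Twilio and your server, so browser dev tools won’t show it)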

Problem 8: “Agent phones ring but they can’t hear customer”

  • Why: Missing <Dial> action callback, or early media issues
  • Fix: Ensure both legs of call are connected before starting recording/transcription
  • Quick test: Use Twilio Voice Insights to see audio quality metrics per call leg

Additional Resources for This Project

Based on the latest Twilio best practices (2025):


Project 4: SIP Proxy/Registrar Server

  • File: VOIP_TELEPHONY_LEARNING_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Go, Rust, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 4: The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced (The Engineer)
  • Knowledge Area: VoIP, Networking, Telephony
  • Software or Tool: Kamailio, SIP
  • Main Book: SIP: Understanding the Session Initiation Protocol by Alan B. Johnston

What you’ll build: Your own SIP server (like a minimal Kamailio) that can: register SIP clients, proxy calls between them, handle authentication, and perform basic load balancing.

Why it teaches VoIP: This forces you to understand SIP from the server perspective - how registration works, how proxies route requests, how forking works, and why stateful vs stateless proxies matter.

Core challenges you’ll face:

  • Implementing SIP registrar (REGISTER handling) (maps to SIP location service)
  • Building a stateful SIP proxy (maps to call state management)
  • Handling Via headers and Record-Route (maps to SIP routing concepts)
  • Implementing authentication challenges (maps to SIP security)
  • Scaling with multiple proxies (maps to high-availability telephony)

Resources for key challenges:

  • “SIP: Understanding the Session Initiation Protocol” by Alan B. Johnston - Chapters on proxies
  • Kamailio documentation - See how production SIP proxies work
  • RFC 3261 Sections 16-17 - Proxy behavior specification

Key Concepts:

  • SIP Proxy vs B2BUA: “SIP Demystified” by Gonzalo Camarillo - Understanding server types
  • SIP Registration: RFC 3261 Section 10 - How location services work
  • Proxy Routing: RFC 3261 Section 16 - Request forwarding mechanics
  • SIP Authentication: RFC 3261 Section 22 - Digest authentication

  • Difficulty: Advanced
  • Time estimate: 1 month+
  • Prerequisites: Project 1 completed, strong networking knowledge

Real world outcome:

  • Multiple SIP phones can register to your server
  • Calls between registered users work through your proxy
  • You can see all SIP traffic flowing through your server in logs
  • Add a second proxy server and see failover work

Learning milestones:

  1. Phones can register and you track their location - you understand SIP registration
  2. Calls route through your proxy correctly - you understand SIP proxy behavior
  3. Authentication and basic security work - you understand SIP security

Real World Outcome

You’ll build a working SIP server that acts like a mini-Kamailio. Multiple SIP phones will be able to register, and calls between them will route through your proxy.

SIP phone registering to your server:

# Zoiper SIP client configured to register at 192.168.1.10:5060
# Username: alice, Password: secret123

[Your server console output:]
$ ./sip_proxy --port 5060 --db users.db

SIP Proxy Server v1.0 starting...
[2025-01-15 10:15:22] Listening on 0.0.0.0:5060 (UDP)
[2025-01-15 10:15:22] Loading user database...
[2025-01-15 10:15:22] Loaded 50 registered endpoints

[2025-01-15 10:15:45] <<< Received from 192.168.1.100:54321:
REGISTER sip:192.168.1.10 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:54321;branch=z9hG4bKnashds
From: <sip:alice@192.168.1.10>;tag=456248
To: <sip:alice@192.168.1.10>
Call-ID: 843817637684230@192.168.1.100
CSeq: 1 REGISTER
Contact: <sip:alice@192.168.1.100:54321>
Expires: 3600

[2025-01-15 10:15:45] Processing REGISTER for alice@192.168.1.10
[2025-01-15 10:15:45] No credentials provided, challenging with 401

[2025-01-15 10:15:45] >>> Sending to 192.168.1.100:54321:
SIP/2.0 401 Unauthorized
Via: SIP/2.0/UDP 192.168.1.100:54321;branch=z9hG4bKnashds
From: <sip:alice@192.168.1.10>;tag=456248
To: <sip:alice@192.168.1.10>;tag=proxy-7f8a9b
Call-ID: 843817637684230@192.168.1.100
CSeq: 1 REGISTER
WWW-Authenticate: Digest realm="myproxy", nonce="abc123def456"

[2025-01-15 10:15:46] <<< Received authenticated REGISTER
[2025-01-15 10:15:46] Digest authentication successful for alice
[2025-01-15 10:15:46] Stored location: alice → sip:alice@192.168.1.100:54321
[2025-01-15 10:15:46] Expires in 3600 seconds

[2025-01-15 10:15:46] >>> Sending to 192.168.1.100:54321:
SIP/2.0 200 OK
Contact: <sip:alice@192.168.1.100:54321>;expires=3600

✓ alice@192.168.1.10 registered from 192.168.1.100:54321

Call routing through your proxy:

# Alice calls Bob (both registered to your proxy)

[2025-01-15 10:20:15] <<< INVITE from alice@192.168.1.100:54321
INVITE sip:bob@192.168.1.10 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.100:54321;branch=z9hG4bK74bf9
From: "Alice" <sip:alice@192.168.1.10>;tag=9fxced76sl
To: <sip:bob@192.168.1.10>
Call-ID: 3848276298220188511@192.168.1.100
CSeq: 314159 INVITE

[2025-01-15 10:20:15] Looking up location for bob@192.168.1.10
[2025-01-15 10:20:15] Found: bob → sip:bob@192.168.1.101:45678
[2025-01-15 10:20:15] Proxying INVITE to 192.168.1.101:45678
[2025-01-15 10:20:15] Adding Record-Route: <sip:192.168.1.10;lr>
[2025-01-15 10:20:15] Adding Via: SIP/2.0/UDP 192.168.1.10:5060;branch=z9hG4bKproxy123

[2025-01-15 10:20:15] >>> Forwarding to bob@192.168.1.101:45678:
INVITE sip:bob@192.168.1.101:45678 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.10:5060;branch=z9hG4bKproxy123
Via: SIP/2.0/UDP 192.168.1.100:54321;branch=z9hG4bK74bf9
Record-Route: <sip:192.168.1.10;lr>
From: "Alice" <sip:alice@192.168.1.10>;tag=9fxced76sl
To: <sip:bob@192.168.1.10>

[2025-01-15 10:20:16] <<< 100 Trying from bob
[2025-01-15 10:20:16] 100 Trying is hop-by-hop: absorbed, not forwarded to alice

[2025-01-15 10:20:17] <<< 180 Ringing from bob
[2025-01-15 10:20:17] >>> Forwarding 180 to alice

[2025-01-15 10:20:22] <<< 200 OK from bob
[2025-01-15 10:20:22] >>> Forwarding 200 to alice
[2025-01-15 10:20:22] Call established: alice ↔ bob

[2025-01-15 10:20:22] RTP flows directly: 192.168.1.100 ↔ 192.168.1.101
[2025-01-15 10:20:22] Proxy stays in signaling path only

# ... call continues ...

[2025-01-15 10:23:47] <<< BYE from alice
[2025-01-15 10:23:47] >>> Forwarding BYE to bob
[2025-01-15 10:23:47] <<< 200 OK from bob
[2025-01-15 10:23:47] >>> Forwarding 200 to alice
[2025-01-15 10:23:47] Call terminated. Duration: 3min 25sec

Status monitoring:

$ ./sip_proxy_cli status

SIP Proxy Server Status
=======================
Uptime: 2 days, 5 hours, 32 minutes
Registered endpoints: 42
Active calls: 7
Total calls processed: 1,247

Recent Registrations:
  alice@192.168.1.10   192.168.1.100:54321   Expires: 3541s
  bob@192.168.1.10     192.168.1.101:45678   Expires: 3598s
  carol@192.168.1.10   192.168.1.102:39872   Expires: 1820s

Active Calls:
  alice → bob         In progress   3:25
  dave → frank        Ringing       0:08
  grace → helen       In progress   12:03

The Core Question You’re Answering

“How do SIP servers route calls between endpoints, and what’s the difference between a stateless proxy, stateful proxy, and B2BUA?”

By building this, you’ll understand:

  • Why SIP needs a registrar (location service)
  • How proxies add Via and Record-Route headers to maintain routing context
  • The difference between transaction state (stateful) and dialog state (B2BUA)
  • Why proxies stay in the signaling path but not the media path
  • How to scale SIP infrastructure horizontally

Concepts You Must Understand First

| Concept | What You Need to Know | Where to Learn It |
|---|---|---|
| SIP Transactions | INVITE, non-INVITE, transaction matching via branch ID | RFC 3261 Section 17 |
| SIP Dialogs | Call-ID + tags = dialog, dialog state | RFC 3261 Section 12 |
| Via Header Processing | How responses route back, why order matters | RFC 3261 Section 16.7 |
| Record-Route | How to stay in signaling path for subsequent requests | RFC 3261 Section 16.6 |
| Location Service | Mapping AOR to Contact address | RFC 3261 Section 10 |
| SIP Authentication | Digest auth challenge/response | RFC 2617 + RFC 3261 Section 22 |

Questions to Guide Your Design

Before implementing, decide:

  1. Stateful or Stateless?: Will you track transaction state? Stateful = more memory, better reliability.

  2. Location Storage: In-memory hash table, SQLite database, or Redis? Consider persistence and clustering.

  3. Threading Model: One thread per transaction? Thread pool? Event-driven with epoll/kqueue?

  4. Routing Algorithm: Strict routing (Route header), loose routing (lr parameter), or record-routing?

  5. Forking: If user has multiple devices registered, do you fork INVITE to all? (Parallel or sequential?)

  6. NAT Handling: Do you fix Contact headers for NATed clients? Use rport? Run STUN?

  7. Security: Rate limiting? IP-based ACLs? Require TLS? How to prevent DoS?
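
Questions 2 and 5 interact: once a user can register several devices, the location store has to hold more than one Contact per AOR. A minimal in-memory sketch follows, assuming lazy expiry on lookup and an injectable clock for testing. The class and method names are illustrative, not taken from any SIP library.

```javascript
// Location store sketch: one AOR can hold several Contact bindings (one per device).
class LocationStore {
  constructor(now = () => Date.now()) {
    this.now = now;             // injectable clock, handy for tests
    this.bindings = new Map();  // aor -> Map(contact -> expiry time in ms)
  }

  register(aor, contact, expiresSec) {
    if (!this.bindings.has(aor)) this.bindings.set(aor, new Map());
    const contacts = this.bindings.get(aor);
    if (expiresSec === 0) {
      contacts.delete(contact);  // Expires: 0 removes the binding
    } else {
      contacts.set(contact, this.now() + expiresSec * 1000);
    }
  }

  // Returns all live contacts; a forking proxy would send the INVITE to each.
  lookup(aor) {
    const contacts = this.bindings.get(aor);
    if (!contacts) return [];
    for (const [contact, expiry] of contacts) {
      if (expiry <= this.now()) contacts.delete(contact);  // lazy expiry
    }
    return [...contacts.keys()];
  }
}

let clock = 0;
const store = new LocationStore(() => clock);
store.register('sip:alice@192.168.1.10', 'sip:alice@192.168.1.100:54321', 3600);
store.register('sip:alice@192.168.1.10', 'sip:alice@192.168.1.105:5060', 60);
console.log(store.lookup('sip:alice@192.168.1.10').length); // 2

clock = 61 * 1000;  // 61 seconds later the short-lived binding is gone
console.log(store.lookup('sip:alice@192.168.1.10').length); // 1
```

Swapping the inner `Map` for SQLite or Redis changes persistence and clustering behavior, but the AOR-to-many-contacts shape stays the same.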

Thinking Exercise (Do This Before Coding!)

Exercise: Trace Via and Record-Route headers through a call.

Draw three boxes: Alice, Proxy, Bob. For each SIP message (INVITE → 200 OK → ACK → BYE → 200 OK), show:

  • Via headers added/removed
  • Record-Route headers
  • Route headers used in subsequent requests

Expected outcome: You should see Via headers stack up on requests and get popped on responses. Record-Route should appear in 200 OK and get copied to Route in ACK/BYE.
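
If you want to check your drawing, the exercise can be simulated in a few lines. This is a toy model under simplifying assumptions: messages are plain objects, `via[0]` is the topmost Via, and the function names are made up for this sketch.

```javascript
const crypto = require('crypto');

// RFC 3261 branch IDs start with the magic cookie "z9hG4bK".
function generateBranch() {
  return 'z9hG4bK' + crypto.randomBytes(8).toString('hex');
}

// The proxy pushes its own Via on top and records itself in Record-Route.
function proxyForwardRequest(request, proxyAddr) {
  return {
    ...request,
    via: [{ sentBy: proxyAddr, branch: generateBranch() }, ...request.via],
    recordRoute: [`<sip:${proxyAddr};lr>`, ...(request.recordRoute || [])],
  };
}

// The UAS copies Via and Record-Route from the request into its response.
function uasRespond(request, status) {
  return { status, via: [...request.via], recordRoute: [...(request.recordRoute || [])] };
}

// The proxy pops its own topmost Via, then sends to the address in the next one.
function proxyForwardResponse(response, proxyAddr) {
  const [top, ...rest] = response.via;
  if (!top || top.sentBy !== proxyAddr) throw new Error('top Via is not ours');
  if (rest.length === 0) throw new Error('no Via left to route the response');
  return { next: rest[0].sentBy, response: { ...response, via: rest } };
}

const invite = {
  method: 'INVITE',
  via: [{ sentBy: '192.168.1.100:54321', branch: generateBranch() }],
};
const atBob = proxyForwardRequest(invite, '192.168.1.10:5060');
const ok = uasRespond(atBob, 200);
const { next, response } = proxyForwardResponse(ok, '192.168.1.10:5060');

console.log(atBob.via.length, response.via.length, next);
// 2 1 192.168.1.100:54321
```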

The Interview Questions They’ll Ask

Real interview questions for SIP server developers:

  1. “What’s the difference between a SIP proxy and a B2BUA?”
    • Answer: Proxy forwards messages, maintains transaction state but not dialog state. B2BUA terminates both call legs as separate dialogs, can modify SDP, enforce policies, and appears as both UAC and UAS.
  2. “Why do SIP responses follow Via headers backward instead of using From/To?”
    • Answer: From/To identify endpoints in a dialog, not network paths. Via headers record the actual path through proxies, ensuring responses route back correctly even through multiple proxies/NATs.
  3. “A client sends INVITE to user@domain.com. How does your proxy know where to send it?”
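    • Answer: If your proxy is responsible for domain.com, query the location service for the Contact bindings of that AOR and forward to them; return 404 Not Found for an unknown user and 480 Temporarily Unavailable for a known user with no active registration. If domain.com belongs to someone else, resolve the next hop through DNS (NAPTR/SRV records such as _sip._udp.domain.com, RFC 3263).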
  4. “What’s the ‘lr’ parameter in Record-Route and why is it important?”
    • Answer: “lr” means loose routing. Without it, proxies expect strict routing (rewrite Request-URI). With lr, proxies route based on Route header and preserve Request-URI.
  5. “How do you prevent your proxy from creating SIP message loops?”
    • Answer: Check the Max-Forwards header (decrement it on every hop, reject with 483 Too Many Hops when it arrives as zero). Check Via headers for loops (if your address already appears). Use branch IDs to detect transaction loops.
  6. “What happens if you forget to add Record-Route?”
    • Answer: Subsequent in-dialog requests (ACK, BYE, re-INVITE) go directly between endpoints, bypassing your proxy. This breaks call control, NAT traversal, and policy enforcement.
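
The Max-Forwards check from question 5 is small enough to sketch directly. This assumes headers have already been parsed into an object with lowercase names; the return shape is an illustration, not a library API.

```javascript
// Decide what to do with a request based on Max-Forwards (RFC 3261 Sections 16.3 and 16.6).
function checkMaxForwards(headers) {
  const raw = headers['max-forwards'];
  // Header absent: the proxy adds one, and RFC 3261 suggests the value 70.
  if (raw === undefined) return { action: 'forward', maxForwards: 70 };

  const current = Number.parseInt(raw, 10);
  if (Number.isNaN(current) || current < 0) return { action: 'reject', status: 400 };
  if (current === 0) return { action: 'reject', status: 483 };  // Too Many Hops
  return { action: 'forward', maxForwards: current - 1 };
}

console.log(checkMaxForwards({ 'max-forwards': '70' })); // { action: 'forward', maxForwards: 69 }
console.log(checkMaxForwards({ 'max-forwards': '0' }));  // { action: 'reject', status: 483 }
```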

Hints in Layers

Stuck on transaction matching?

Hint 1: Transaction identification

SIP transactions are identified by:

- **Client transactions**: Branch ID + Method
- **Server transactions**: Branch ID + Method + top Via sent-by

The magic cookie "z9hG4bK" in branch ID indicates RFC 3261 compliance.

```c
typedef struct {
    char branch_id[256];  // From Via header
    char method[16];      // INVITE, REGISTER, etc.
    char sent_by[128];    // IP:port from top Via
} transaction_key_t;
```
Hint 2: Proxying INVITE

```c
void proxy_invite(sip_message_t *invite, contact_t *target) {
    // 1. Check and decrement Max-Forwards (reject instead of forwarding at 0)
    int max_fwd = get_max_forwards(invite);
    if (max_fwd <= 0) {
        send_response(invite, 483, "Too Many Hops");  // hypothetical helper
        return;
    }
    set_max_forwards(invite, max_fwd - 1);

    // 2. Update Request-URI to target's Contact
    set_request_uri(invite, target->contact);

    // 3. Add Record-Route (to stay in signaling path)
    add_record_route(invite, local_ip, local_port, "lr");

    // 4. Add your Via header on top
    add_via_header(invite, local_ip, local_port, generate_branch());

    // 5. Create transaction state before sending, so a fast response finds it
    create_transaction(invite, TRANSACTION_INVITE_CLIENT);

    // 6. Send to target
    send_sip_message(invite, target->ip, target->port);
}
```
Hint 3: Handling responses

```c
void handle_response(sip_message_t *response) {
    // 1. Find matching transaction (via branch ID of the top Via, which is yours)
    transaction_t *txn = find_transaction_by_branch(response->via_branch);
    if (!txn) {
        // Orphaned response, drop it
        return;
    }

    // 2. 100 Trying is hop-by-hop: note it, but never forward it upstream
    if (response->status_code == 100) {
        return;
    }

    // 3. Remove top Via (that's your Via)
    remove_top_via(response);

    // 4. Forward response upstream (use next Via)
    via_header_t *next_via = get_top_via(response);
    if (!next_via) {
        // No Via left: the response was addressed to this proxy itself
        return;
    }
    send_sip_message(response, next_via->ip, next_via->port);

    // 5. If final response (>= 200), destroy transaction
    if (response->status_code >= 200) {
        destroy_transaction(txn);
    }
}
```
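
To experiment with the matching rule from Hint 1 before writing the C version, a short sketch like the following may help. It assumes a simplified Via grammar (IPv4 sent-by, no quoted parameters), and the function names are invented for this example.

```javascript
// Parse a Via header value such as
// "SIP/2.0/UDP 192.168.1.100:54321;branch=z9hG4bK74bf9;rport"
function parseVia(value) {
  const [protocolAndHost, ...paramParts] = value.trim().split(';');
  const [protocol, sentBy] = protocolAndHost.trim().split(/\s+/);
  const params = {};
  for (const part of paramParts) {
    const [name, val = ''] = part.trim().split('=');
    params[name.toLowerCase()] = val;
  }
  return { protocol, sentBy, params };
}

// Server-transaction key: branch + sent-by + method. ACK is mapped to INVITE so
// that an ACK for a non-2xx response finds the INVITE transaction it belongs to.
function serverTransactionKey(topVia, method) {
  const via = parseVia(topVia);
  const branch = via.params.branch || '';
  if (!branch.startsWith('z9hG4bK')) return null;  // pre-RFC 3261 client: legacy matching needed
  const keyMethod = method === 'ACK' ? 'INVITE' : method;
  return `${branch}|${via.sentBy.toLowerCase()}|${keyMethod}`;
}

const topVia = 'SIP/2.0/UDP 192.168.1.100:54321;branch=z9hG4bK74bf9';
console.log(serverTransactionKey(topVia, 'INVITE'));
// z9hG4bK74bf9|192.168.1.100:54321|INVITE
```

Using the resulting string as a hash-table key gives constant-time transaction lookup for each incoming message.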

Stuck on location service?

Hint 4: Simple in-memory location storage

```c
#include <glib.h>
#include <stdio.h>
#include <time.h>

typedef struct {
    char aor[256];      // Address of Record: sip:alice@domain.com
    char contact[256];  // Contact URI: sip:alice@192.168.1.100:5060
    time_t expires;     // Expiration timestamp
    char ip[64];        // IP address for NAT fixup
    int port;           // Port for NAT fixup
} binding_t;

// Hash table: AOR → binding. Create it once at startup with destroy functions
// so replaced or removed entries are freed:
//   location_db = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, g_free);
GHashTable *location_db;

void save_binding(const char *aor, const char *contact, int expires_delta) {
    binding_t *binding = g_new0(binding_t, 1);
    // g_strlcpy truncates instead of overflowing the fixed-size buffers
    g_strlcpy(binding->aor, aor, sizeof(binding->aor));
    g_strlcpy(binding->contact, contact, sizeof(binding->contact));
    binding->expires = time(NULL) + expires_delta;

    g_hash_table_insert(location_db, g_strdup(aor), binding);
    printf("✓ Saved: %s → %s (expires in %ds)\n", aor, contact, expires_delta);
}

binding_t *lookup_binding(const char *aor) {
    binding_t *binding = g_hash_table_lookup(location_db, aor);

    if (binding && binding->expires < time(NULL)) {
        // Expired, remove it
        g_hash_table_remove(location_db, aor);
        return NULL;
    }

    return binding;
}
```

Books That Will Help

| Book | Relevant Chapters | What You’ll Learn |
|---|---|---|
| “SIP: Understanding the Session Initiation Protocol” (Johnston) | Ch 8-9 (Proxies, Registrars) | Proxy behavior and routing |
| “SIP Demystified” (Camarillo) | Ch 3-5 (Proxy types, routing) | Deep dive into proxy logic |
| RFC 3261 | Sections 10, 16, 17 | Registration, proxy behavior, transactions |
| “Unix Network Programming” (Stevens) | Vol 1, Ch 6, 8 | UDP socket programming |
| Kamailio Documentation | Core Cookbook | Production SIP proxy patterns |

Common Pitfalls & Debugging

Problem 1: “Responses never reach the client”

  • Why: Not removing your Via header from responses, or Via processing order wrong
  • Fix: Pop exactly one Via header from responses before forwarding
  • Quick test: tcpdump -i any port 5060 -A | grep Via - count Via headers

Problem 2: “BYE goes directly between endpoints, bypassing proxy”

  • Why: Forgot to add Record-Route when forwarding the INVITE
  • Fix: Add Record-Route with the ;lr parameter to the INVITE before forwarding; the callee copies it into the 200 OK, so forward that response with the header intact
  • Quick test: Look at ACK message—should have Route header pointing to your proxy

Problem 3: “Registration succeeds but calls to user fail with 404”

  • Why: Not saving Contact from REGISTER, or lookup failing
  • Fix: Extract Contact header from REGISTER, save AOR → Contact mapping
  • Quick test: Print location database after REGISTER—is the binding there?

Problem 4: “Authentication challenge loops infinitely”

  • Why: Nonce not being validated, or same nonce used twice
  • Fix: Generate unique nonce per challenge, validate it in authenticated request
  • Quick test: Log nonces—should see different nonce for each 401 challenge
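
For reference while debugging Problem 4, here is a sketch of the server-side Digest check using MD5 as in RFC 2617. Treating each nonce as single-use is a deliberately strict simplification chosen for this example; real deployments usually track nonce lifetime and the nc counter instead. All function names are illustrative.

```javascript
const crypto = require('crypto');

const md5 = (text) => crypto.createHash('md5').update(text).digest('hex');

// Fresh, unpredictable nonce for every 401 challenge.
function newNonce() {
  return crypto.randomBytes(16).toString('hex');
}

// Expected "response" value of a Digest Authorization header (RFC 2617, MD5).
function digestResponse({ username, realm, password, method, uri, nonce, qop, nc, cnonce }) {
  const ha1 = md5(`${username}:${realm}:${password}`);
  const ha2 = md5(`${method}:${uri}`);
  return qop === 'auth'
    ? md5(`${ha1}:${nonce}:${nc}:${cnonce}:${qop}:${ha2}`)
    : md5(`${ha1}:${nonce}:${ha2}`);
}

// The nonce must be one we issued, and in this sketch it is accepted only once.
const issuedNonces = new Set();
function verifyDigest(creds, password) {
  if (!issuedNonces.delete(creds.nonce)) return false;  // unknown or replayed nonce
  const expected = Buffer.from(digestResponse({ ...creds, password }));
  const received = Buffer.from(String(creds.response));
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}

const nonce = newNonce();
issuedNonces.add(nonce);
const creds = { username: 'alice', realm: 'myproxy', method: 'REGISTER', uri: 'sip:192.168.1.10', nonce };
creds.response = digestResponse({ ...creds, password: 'secret123' });  // what the phone would send

console.log(verifyDigest(creds, 'secret123')); // true
console.log(verifyDigest(creds, 'secret123')); // false: nonce already used
```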

Problem 5: “Proxy crashes with SIGSEGV under load”

  • Why: Race condition in transaction hash table, buffer overflow, or NULL pointer
  • Fix: Use thread-safe data structures (mutexes), bounds checking, sanitizers (AddressSanitizer)
  • Quick test: Run with valgrind --leak-check=full ./sip_proxy

Problem 6: “Calls work locally but fail through NAT”

  • Why: Not fixing Contact headers for NATed clients (use received/rport)
  • Fix: Check if source IP differs from Via sent-by, update Contact in REGISTER
  • Quick test: Compare Via sent-by to actual source IP in REGISTER
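
The comparison in Problem 6 can be sketched as follows, assuming the top Via has already been parsed into `{ sentBy, params }` and that addresses are IPv4. The function name and return shape are made up for this example. The `received` and `rport` handling follows RFC 3261 Section 18.2.1 and RFC 3581.

```javascript
// Compare what the client claims in Via with where the packet really came from.
function natFixup(via, srcIp, srcPort) {
  const [host, port = '5060'] = via.sentBy.split(':');
  const params = { ...via.params };
  const wantsRport = 'rport' in params;  // client asked for symmetric response routing

  if (host !== srcIp || wantsRport) params.received = srcIp;
  if (wantsRport) params.rport = String(srcPort);

  return {
    params,
    behindNat: host !== srcIp || Number(port) !== srcPort,
    // With rport, reply to the exact source address; without it, use the sent-by port.
    replyTo: wantsRport ? `${srcIp}:${srcPort}` : `${srcIp}:${port}`,
  };
}

const fixed = natFixup(
  { sentBy: '10.0.0.5:5060', params: { branch: 'z9hG4bKnashds', rport: '' } },
  '203.0.113.50',
  40211
);
console.log(fixed.behindNat, fixed.replyTo); // true 203.0.113.50:40211
```

When `behindNat` is true for a REGISTER, the same source address is what you would store alongside the Contact binding so later INVITEs can reach the phone.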

Problem 7: “Transaction matching fails randomly”

  • Why: Not generating RFC 3261-compliant branch IDs (must start with “z9hG4bK”)
  • Fix: Generate branch as “z9hG4bK” + random_string()
  • Quick test: Inspect Via headers—all branch IDs should start with magic cookie

Problem 8: “Memory usage grows unbounded”

  • Why: Transactions never getting destroyed, or expired bindings not purged
  • Fix: Set timers to clean up transactions after 32 seconds (64×T1, with the default T1 of 500 ms), run periodic binding cleanup
  • Quick test: Monitor memory with top, should stabilize after initial growth

Project 5: WebRTC-to-SIP Gateway

  • File: VOIP_TELEPHONY_LEARNING_PROJECTS.md
  • Programming Language: C / JavaScript
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 4: The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Real-Time Communications / Gateway
  • Software or Tool: WebRTC / SIP
  • Main Book: “WebRTC: APIs and RTCWEB Protocols” by Alan B. Johnston

What you’ll build: A gateway that lets browser-based WebRTC clients (no plugins) make and receive calls to/from traditional SIP phones and PSTN numbers.

Why it teaches VoIP: WebRTC and SIP speak different dialects. Building a gateway forces you to understand both protocols deeply, plus handle the impedance mismatch: ICE vs SIP NAT traversal, different codecs, DTLS-SRTP vs SDES-SRTP.

Core challenges you’ll face:

  • Implementing WebRTC signaling server (maps to modern real-time signaling)
  • Translating WebRTC SDP to SIP-compatible SDP (maps to protocol bridging)
  • Handling ICE negotiation for WebRTC side (maps to NAT traversal)
  • Transcoding between WebRTC codecs (Opus) and PSTN codecs (G.711) (maps to media processing)
  • Managing certificate/DTLS for secure media (maps to VoIP security)

Resources for key challenges:

  • “WebRTC: APIs and RTCWEB Protocols of the HTML5 Real-Time Web” by Alan B. Johnston
  • “High Performance Browser Networking” by Ilya Grigorik - WebRTC chapter
  • FreeSWITCH documentation - See how production gateways work

Key Concepts:

  • WebRTC Architecture: “WebRTC: APIs and RTCWEB Protocols” Ch. 1-3 - Core WebRTC concepts
  • ICE/STUN/TURN: RFC 8445 + “High Performance Browser Networking” Ch. 18 - NAT traversal
  • DTLS-SRTP vs SDES: RFC 5764 (DTLS-SRTP) + RFC 4568 (SDES) - Understanding secure media differences
  • Codec Transcoding: FFmpeg/libav documentation - Audio format conversion

  • Difficulty: Advanced
  • Time estimate: 1 month+
  • Prerequisites: WebRTC basics, SIP knowledge from earlier projects

Real world outcome:

  • Open a browser, click a button, and call a real phone number
  • Receive incoming calls in your browser from any phone
  • Quality is comparable to commercial solutions
  • Works behind corporate NATs and firewalls

Learning milestones:

  1. Browser can establish WebRTC connection to your server - you understand WebRTC signaling
  2. Browser-to-SIP calls connect (even if audio is rough) - you understand protocol bridging
  3. Bidirectional audio with good quality - you understand media handling

Real World Outcome

You’ll build a working click-to-call system where users can make phone calls directly from their browser—no plugins, no downloads.

User experience in browser:

<!-- Your web app interface -->
<!DOCTYPE html>
<html>
<body>
  <h1>Browser Phone</h1>

  <input id="phoneNumber" placeholder="+1-555-867-5309" />
  <button id="callBtn">📞 Call</button>
  <button id="hangupBtn" disabled>❌ Hang Up</button>

  <div id="status">Ready to call...</div>
  <div id="callDuration"></div>

  <audio id="remoteAudio" autoplay></audio>
</body>
</html>

What the user sees when calling:

[User clicks "Call"]

Status: Requesting microphone access...
Status: Microphone granted ✓
Status: Connecting to server...
Status: Connected to WebRTC gateway ✓
Status: Calling +1-555-867-5309...
Status: Ringing... 🔔
Status: Call answered ✓
Call Duration: 00:03... 00:04... 00:05...

[User clicks "Hang Up"]
Status: Call ended (Duration: 3:47)

Your gateway server console:

$ ./webrtc_sip_gateway --ws-port 8443 --sip-port 5060

WebRTC-SIP Gateway v1.0
[2025-01-15 14:30:00] WebSocket server listening on wss://0.0.0.0:8443
[2025-01-15 14:30:00] SIP endpoint listening on 0.0.0.0:5060
[2025-01-15 14:30:00] TURN server enabled at turn:0.0.0.0:3478
[2025-01-15 14:30:00] Gateway ready

# Browser connects
[2025-01-15 14:32:15] WebSocket connection from 203.0.113.50:54123
[2025-01-15 14:32:15] Client ID: webrtc-client-7a9f4b
[2025-01-15 14:32:15] Generating ICE credentials

# WebRTC offer received
[2025-01-15 14:32:16] <<< WebRTC Offer received
SDP Session:
  v=0
  o=mozilla...THIS_IS_SDPARTA 58018 IN IP4 0.0.0.0
  m=audio 54312 UDP/TLS/RTP/SAVPF 111 9 0 8
  a=rtpmap:111 opus/48000/2
  a=rtpmap:9 G722/8000
  a=rtpmap:0 PCMU/8000
  a=fingerprint:sha-256 AB:CD:EF:...  (DTLS-SRTP)
  a=ice-ufrag:a3f8, ice-pwd:78f9a...
  a=candidate:1 1 UDP 2130706431 192.168.1.50 54312 typ host
  a=candidate:2 1 UDP 1694498815 203.0.113.50 54312 typ srflx raddr 192.168.1.50

[2025-01-15 14:32:16] Parsing WebRTC SDP...
[2025-01-15 14:32:16] Codec priority: Opus (preferred), G.722, PCMU
[2025-01-15 14:32:16] ICE candidates: 2 (host + srflx)
[2025-01-15 14:32:16] DTLS fingerprint: AB:CD:EF...

# Translating to SIP
[2025-01-15 14:32:17] Call request: +15558675309 (E.164 normalized)
[2025-01-15 14:32:17] Converting WebRTC SDP → SIP SDP
[2025-01-15 14:32:17] ⚠ Opus not supported by PSTN, transcoding to PCMU
[2025-01-15 14:32:17] ⚠ DTLS-SRTP → SRTP-SDES conversion

[2025-01-15 14:32:17] >>> SIP INVITE to trunk provider
INVITE sip:+15558675309@sip.provider.com SIP/2.0
From: <sip:anonymous@gateway.local>;tag=webrtc-7a9f4b
To: <sip:+15558675309@sip.provider.com>
SDP:
  v=0
  c=IN IP4 198.51.100.10  (gateway's public IP)
  m=audio 20000 RTP/SAVP 0
  a=rtpmap:0 PCMU/8000
  a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:...  (SDES)

[2025-01-15 14:32:18] <<< SIP 100 Trying
[2025-01-15 14:32:19] <<< SIP 180 Ringing
[2025-01-15 14:32:19] >>> Sending "ringing" to browser via WebSocket

[2025-01-15 14:32:24] <<< SIP 200 OK
SDP:
  m=audio 16384 RTP/SAVP 0
  a=rtpmap:0 PCMU/8000
  a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:...  (SDES)

[2025-01-15 14:32:24] Call answered!
[2025-01-15 14:32:24] Media relay active:
  • Browser ↔ Gateway: Opus @ 48kHz (DTLS-SRTP, ICE via TURN)
  • Gateway ↔ PSTN: PCMU @ 8kHz (SDES-SRTP)
  • Transcoding: Opus 48kHz → PCMU 8kHz

[2025-01-15 14:32:24] >>> Sending "answered" + SDP answer to browser

# Call in progress
[2025-01-15 14:32:25] RTP stats (every 5s):
  Browser side: 247 pkts sent, 245 pkts recv, 0.81% loss, 23ms jitter
  PSTN side:    123 pkts sent, 122 pkts recv, 0.00% loss, 3ms jitter
  Transcoding:  ~12% CPU, 0 packet drops

[2025-01-15 14:36:12] <<< SIP BYE from PSTN
[2025-01-15 14:36:12] Call ended by remote party
[2025-01-15 14:36:12] Duration: 3min 47sec
[2025-01-15 14:36:12] >>> Sending "hangup" to browser
[2025-01-15 14:36:12] Releasing transcoding resources
[2025-01-15 14:36:12] WebSocket connection closed

✓ Call completed successfully
  Total duration: 3:47
  Audio quality: MOS 4.2 (Good)
  Packets transcoded: 11,310

Network traffic visualization:

Browser (WebRTC)          Gateway (Bridge)        PSTN (SIP)
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ JavaScript      │      │ WebRTC endpoint │      │ SIP trunk       │
│ getUserMedia()  │      │ DTLS-SRTP       │      │ provider        │
│ RTCPeerConn.    │◄────►│ ICE/TURN server │      │                 │
│                 │ WSS  │                 │      │                 │
│ Opus 48kHz      │─────►│ Opus↔PCMU xcode │◄────►│ PCMU 8kHz       │
│ DTLS-SRTP       │ TURN │ SDES-SRTP       │ RTP  │ SRTP/RTP        │
└─────────────────┘      └─────────────────┘      └─────────────────┘
     (NAT/FW)                                          (Internet)

The Core Question You’re Answering

“How do you bridge the gap between modern browser-based WebRTC and traditional SIP/PSTN networks?”

After completing this, you’ll understand:

  • Why WebRTC and SIP are incompatible without translation (different security, NAT handling, codecs)
  • How ICE/STUN/TURN work to traverse NAT for WebRTC
  • The complexity of transcoding between Opus (wideband, WebRTC) and G.711 (narrowband, PSTN)
  • How DTLS-SRTP differs from SDES-SRTP (encryption key exchange mechanisms)
  • Why gateways are CPU-intensive (transcoding, media relay, encryption/decryption)
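
To make the SDES side of that comparison concrete: with SDES the gateway simply generates key material and writes it into the SDP. A minimal sketch, assuming the AES_CM_128_HMAC_SHA1_80 suite, whose inline value is the Base64 of a 16-byte master key followed by a 14-byte master salt (RFC 4568). The function name is illustrative.

```javascript
const crypto = require('crypto');

// 16-byte master key + 14-byte master salt = 30 random bytes, Base64-encoded.
function sdesCryptoLine(tag = 1) {
  const keyAndSalt = crypto.randomBytes(30);
  return `a=crypto:${tag} AES_CM_128_HMAC_SHA1_80 inline:${keyAndSalt.toString('base64')}`;
}

console.log(sdesCryptoLine());
```

Anyone who can read this SDP line can decrypt the media, which is why SDES only makes sense when the signaling itself travels over TLS. DTLS-SRTP avoids this by never putting keys in the SDP at all.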

Concepts You Must Understand First

| Concept | What You Need to Know | Where to Learn It |
|---|---|---|
| WebRTC Architecture | PeerConnection, SDP offer/answer, ICE candidates | “WebRTC: APIs and RTCWEB Protocols” Ch. 1-3 |
| ICE/STUN/TURN | NAT traversal, candidate gathering, connectivity checks | RFC 8445 + “High Performance Browser Networking” Ch. 18 |
| DTLS-SRTP | Certificate-based SRTP key exchange | RFC 5764 |
| SDP Munging | Modifying SDP for interop (codec mapping, IP rewriting) | RFC 3264 + RFC 4566 |
| Opus Codec | Wideband audio, adaptive bitrate | RFC 6716 + Opus documentation |
| Audio Transcoding | Resampling (48kHz→8kHz), format conversion | libsamplerate, FFmpeg docs |

Questions to Guide Your Design

Before building, decide:

  1. Media Relay Strategy: Will you relay all RTP (secure but CPU-heavy) or use TURN only when direct connection fails?

  2. Transcoding Policy: Always transcode to PCMU for PSTN, or negotiate Opus if SIP endpoint supports it?

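  3. ICE Implementation: Full ICE (gathers candidates and initiates connectivity checks; needed if the gateway itself sits behind NAT) or ICE-Lite (only answers checks; simpler, but requires the gateway to have a public IP)?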

  4. TURN Server: Run your own (STUN/TURN server like coturn) or use a cloud service (Twilio TURN)?

  5. WebSocket Signaling: How will browsers discover your gateway? WebSocket URL in JavaScript, or SIP over WebSocket?

  6. Scaling: One gateway process per call (simple), or shared process with multiple call contexts (complex)?

  7. Security: How do you authenticate browsers? API keys, JWT tokens, SIP digest auth?
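
For question 2, one option worth considering is to avoid transcoding whenever the browser's offer already contains a codec the trunk accepts. The sketch below assumes the PSTN side takes only G.711 (static payload types 0 and 8) and reads the payload list from the first m=audio line. The function and constant names are invented for this example.

```javascript
// Static payload types the PSTN side is assumed to accept (G.711 μ-law and A-law).
const PSTN_PAYLOAD_TYPES = new Map([[0, 'PCMU'], [8, 'PCMA']]);

// Payload types from the first m=audio line, in the offerer's preference order.
function offeredAudioPayloadTypes(sdp) {
  const mLine = sdp.split(/\r?\n/).find((line) => line.startsWith('m=audio '));
  if (!mLine) return [];
  return mLine.split(' ').slice(3).map(Number);
}

// Pick the first offered codec the trunk accepts, or null if transcoding is unavoidable.
function pickPstnCodec(sdp) {
  const pt = offeredAudioPayloadTypes(sdp).find((p) => PSTN_PAYLOAD_TYPES.has(p));
  return pt === undefined ? null : { payloadType: pt, name: PSTN_PAYLOAD_TYPES.get(pt) };
}

const offer = [
  'v=0',
  'm=audio 54312 UDP/TLS/RTP/SAVPF 111 9 0 8',
  'a=rtpmap:111 opus/48000/2',
  'a=rtpmap:9 G722/8000',
  'a=rtpmap:0 PCMU/8000',
].join('\r\n');

console.log(pickPstnCodec(offer)); // { payloadType: 0, name: 'PCMU' }
```

Answering the browser with PCMU trades wideband quality on the browser leg for lower CPU use on the gateway. Answering with Opus keeps the quality but forces the transcoding pipeline.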

Thinking Exercise (Do This Before Coding!)

Exercise: Map SDP field translations.

Create a table showing how each WebRTC SDP attribute maps to SIP SDP:

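| WebRTC SDP | SIP SDP | Transformation Required |
|---|---|---|
| a=ice-ufrag | (none) | Remove (most SIP trunks don’t use ICE) |
| a=fingerprint | a=crypto | Replace the DTLS fingerprint with an SDES crypto line (or drop it if the trunk is unencrypted) |
| a=rtpmap:111 opus/48000 | a=rtpmap:0 PCMU/8000 | Codec negotiation + transcode |
| m=audio 9 UDP/TLS/RTP/SAVPF | m=audio 20000 RTP/SAVP | Port + transport protocol (RTP/AVP if the trunk is unencrypted) |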

Expected outcome: You should identify ~10-15 incompatibilities that need translation.

The Interview Questions They’ll Ask

Real questions for WebRTC/SIP gateway engineers:

  1. “Why can’t browsers just do SIP natively?”
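    • Answer: Browser JavaScript cannot open raw UDP or TCP sockets, so it cannot speak SIP over its native transports. WebRTC standardizes the media stack (ICE, DTLS-SRTP, SCTP data channels) and deliberately leaves signaling to the application. A browser can carry SIP over WebSocket (RFC 7118), but something still has to bridge ICE, DTLS-SRTP, and codecs to the SIP side.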
  2. “What’s the difference between DTLS-SRTP and SDES-SRTP?”
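    • Answer: DTLS-SRTP runs a DTLS handshake on the media path to derive the SRTP keys and uses the a=fingerprint in SDP to authenticate the certificate (mandatory in WebRTC). SDES-SRTP puts the keys directly in an a=crypto line in SDP (RFC 4568, common on SIP trunks), so every signaling hop can read them unless signaling is protected with TLS.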
  3. “How does ICE candidate gathering work in WebRTC?”
    • Answer: Browser gathers candidates (host, srflx via STUN, relay via TURN), sends all to peer in SDP. Both sides perform connectivity checks to find best path. Preference: host > srflx > relay.
  4. “Why is transcoding expensive?”
    • Answer: Opus (48kHz) → PCMU (8kHz) requires decoding Opus to PCM, resampling from 48kHz to 8kHz, then encoding to PCMU. Each call leg needs real-time DSP processing (10-20% CPU per call).
  5. “What causes one-way audio in WebRTC-SIP calls?”
    • Answer: Asymmetric media path—often ICE fails to browser (firewall blocking UDP), but TURN relay works outbound. Or, SDP c= line has private IP unreachable from gateway.
  6. “How do you handle DTMF in WebRTC?”
    • Answer: WebRTC sends DTMF as RTP events (RFC 4733). SIP expects same format, but gateway must map payload types correctly or SIP side won’t recognize tones.
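
The "host > srflx > relay" ordering from question 3 comes from a formula in RFC 8445 (Section 5.1.2). A quick sketch using the type preferences the RFC recommends reproduces the priorities 2130706431 and 1694498815 shown in the sample offer earlier. The default local preference of 65535 is an assumption that happens to match that sample.

```javascript
// Type preferences recommended by RFC 8445
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function icePriority(type, localPreference = 65535, componentId = 1) {
  return 2 ** 24 * TYPE_PREFERENCE[type] + 2 ** 8 * localPreference + (256 - componentId);
}

console.log(icePriority('host'));  // 2130706431
console.log(icePriority('srflx')); // 1694498815
console.log(icePriority('relay')); // 16777215
```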

Hints in Layers

Stuck on WebRTC signaling?

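Hint 1: WebSocket server for signaling

```javascript
const WebSocket = require('ws');

// Note: this listens on plain ws://. Browsers on HTTPS pages need wss://,
// so in practice attach this to an HTTPS server or terminate TLS in front of it.
const wss = new WebSocket.Server({ port: 8443 });

wss.on('connection', (ws) => {
  console.log('Browser connected');

  ws.on('message', (data) => {
    let msg;
    try {
      msg = JSON.parse(data);
    } catch (err) {
      return;  // Ignore malformed messages instead of crashing
    }

    // handle_webrtc_offer() and add_ice_candidate() are yours to implement
    if (msg.type === 'offer') {
      // Got WebRTC SDP offer from browser
      handle_webrtc_offer(msg.sdp, ws);
    } else if (msg.type === 'candidate') {
      // Got ICE candidate from browser
      add_ice_candidate(msg.candidate);
    }
  });
});

function send_answer(ws, sdp) {
  ws.send(JSON.stringify({
    type: 'answer',
    sdp: sdp
  }));
}
```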
Hint 2: SDP translation example

```javascript
// GATEWAY_PUBLIC_IP, GATEWAY_RTP_PORT and generate_srtp_key() are assumed
// to be defined elsewhere in your gateway.
function webrtc_to_sip_sdp(webrtc_sdp) {
  let sip_sdp = webrtc_sdp;

  // Remove WebRTC-specific attributes
  sip_sdp = sip_sdp.replace(/a=ice-ufrag:.*\r\n/g, '');
  sip_sdp = sip_sdp.replace(/a=ice-pwd:.*\r\n/g, '');
  sip_sdp = sip_sdp.replace(/a=fingerprint:.*\r\n/g, '');
  sip_sdp = sip_sdp.replace(/a=setup:.*\r\n/g, '');
  sip_sdp = sip_sdp.replace(/a=candidate:.*\r\n/g, '');

  // Change transport protocol, use the gateway's RTP port, and offer only PCMU
  // (static payload type 0). SDES keys require the RTP/SAVP profile.
  sip_sdp = sip_sdp.replace(
    /^m=audio \d+ UDP\/TLS\/RTP\/SAVPF .*$/m,
    `m=audio ${GATEWAY_RTP_PORT} RTP/SAVP 0`
  );

  // Drop codec attributes for every payload type except 0
  sip_sdp = sip_sdp.replace(/a=(rtpmap|fmtp|rtcp-fb):(?!0 ).*\r\n/g, '');

  // Add SDES crypto (SRTP keys); appended at the end, so it applies to the
  // last (here: only) media section
  sip_sdp += `a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:${generate_srtp_key()}\r\n`;

  // Update connection address to gateway's public IP
  sip_sdp = sip_sdp.replace(/c=IN IP4 .*/g, `c=IN IP4 ${GATEWAY_PUBLIC_IP}`);

  return sip_sdp;
}
```
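Hint 3: Audio transcoding pipeline

```c
#include <stdint.h>
#include <opus/opus.h>
#include <samplerate.h>

// Assumed to be set up elsewhere: a mono 48 kHz decoder from
// opus_decoder_create(48000, 1, &err) and a G.711 μ-law encoder helper.
extern OpusDecoder *opus_decoder;
uint8_t linear_to_ulaw(int16_t sample);

// Simplified transcoding flow; returns 0 on success, -1 on error
int transcode_audio(const uint8_t *opus_packet, int opus_len,
                    uint8_t *pcmu_out, int *pcmu_len) {
    // 1. Decode Opus to PCM float samples (48kHz)
    float pcm_48k[960];  // 20ms at 48kHz
    int frames = opus_decode_float(opus_decoder, opus_packet, opus_len,
                                   pcm_48k, 960, 0);
    if (frames < 0) return -1;

    // 2. Resample from 48kHz to 8kHz (libsamplerate)
    // src_simple() treats each call as an independent block; a streaming
    // converter (src_new/src_process) avoids artifacts at frame boundaries.
    float pcm_8k[160];  // 20ms at 8kHz
    SRC_DATA src_data = {0};
    src_data.data_in = pcm_48k;
    src_data.input_frames = frames;
    src_data.data_out = pcm_8k;
    src_data.output_frames = 160;
    src_data.src_ratio = 8000.0 / 48000.0;
    if (src_simple(&src_data, SRC_SINC_FASTEST, 1) != 0) return -1;
    int out_frames = (int)src_data.output_frames_gen;

    // 3. Convert float to int16 (clamped), then 4. encode to G.711 μ-law
    for (int i = 0; i < out_frames; i++) {
        float s = pcm_8k[i];
        if (s > 1.0f) s = 1.0f;
        if (s < -1.0f) s = -1.0f;
        pcmu_out[i] = linear_to_ulaw((int16_t)(s * 32767.0f));
    }

    *pcmu_len = out_frames;
    return 0;
}
```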
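
The `linear_to_ulaw` step in the hint above is left undefined. If you want to prototype it before writing the C version, the commonly used G.711 μ-law encoding (bias 0x84, clip at 32635) can be sketched in JavaScript like this. The function names are chosen for this example only.

```javascript
const BIAS = 0x84;   // added before finding the segment
const CLIP = 32635;  // largest magnitude that still fits after biasing

// Encode one signed 16-bit PCM sample as an 8-bit G.711 μ-law byte.
function linearToUlaw(sample) {
  let sign = 0;
  if (sample < 0) {
    sign = 0x80;
    sample = -sample;
  }
  if (sample > CLIP) sample = CLIP;
  sample += BIAS;

  // Find the segment (exponent): position of the highest set bit, from bit 14 down to bit 7
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;

  // μ-law bytes are transmitted bit-inverted
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Encode a whole 20 ms frame (160 samples at 8 kHz) into an RTP payload.
function encodeFrame(pcm16) {
  return Uint8Array.from(pcm16, (s) => linearToUlaw(s));
}

console.log(linearToUlaw(0).toString(16));     // ff
console.log(linearToUlaw(32767).toString(16)); // 80
```

Production encoders often use a 64K-entry lookup table instead of the loop, since the mapping never changes.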

Stuck on ICE/TURN?

Hint 4: Setting up TURN server integration

```javascript
// Browser-side: Configure TURN server
const peerConnection = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },  // Public STUN
    {
      urls: 'turn:yourgateway.com:3478',       // Your TURN
      username: 'webrtc-user',
      credential: 'secret-password'
    }
  ]
});

// Gateway-side: Run coturn TURN server
// /etc/turnserver.conf:
/*
listening-port=3478
realm=yourgateway.com
lt-cred-mech
user=webrtc-user:secret-password
external-ip=203.0.113.10
*/
// lt-cred-mech enables the long-term credential mechanism that the
// static user= entry relies on.
```

Books That Will Help

| Book | Relevant Chapters | What You’ll Learn |
|------|-------------------|-------------------|
| “WebRTC: APIs and RTCWEB Protocols” (Johnston) | Ch 1-5, 8-9 | Complete WebRTC stack |
| “High Performance Browser Networking” (Grigorik) | Ch 18 | WebRTC deep-dive, ICE, TURN |
| “Real-Time Communication with WebRTC” (Loreto) | Ch 2-4, 6 | Practical WebRTC implementation |
| “SIP: Understanding the Session Initiation Protocol” (Johnston) | Ch 7 (SDP) | SIP SDP format for interop |
| FFmpeg/libav documentation | Audio transcoding guides | Opus/PCMU codec conversion |

Common Pitfalls & Debugging

Problem 1: “Browser connects but call never starts”

  • Why: SDP offer/answer negotiation failed, likely codec mismatch or malformed SDP
  • Fix: Log full SDP exchange, validate with SDP parsing tools
  • Quick test: Copy SDPs to WebRTC SDP validator
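
To check for a codec mismatch quickly, you can compare the `a=rtpmap` lines of the offer and the answer. This is a rough sketch rather than a full SDP parser: `listCodecs` and `commonCodecs` are hypothetical helper names, and static payload types that appear only in the `m=` line (such as 0 for PCMU without an `a=rtpmap` line) are not detected.

```javascript
// Collect payload type -> codec description (e.g. "opus/48000/2") from a=rtpmap lines.
function listCodecs(sdp) {
  const codecs = new Map();
  for (const line of sdp.split(/\r?\n/)) {
    const m = /^a=rtpmap:(\d+) (\S+)/.exec(line);
    if (m) codecs.set(Number(m[1]), m[2]);
  }
  return codecs;
}

// Codecs from the offer that also appear in the answer (case-insensitive).
function commonCodecs(offerSdp, answerSdp) {
  const answered = new Set(
    [...listCodecs(answerSdp).values()].map((c) => c.toLowerCase())
  );
  return [...listCodecs(offerSdp).values()].filter((c) =>
    answered.has(c.toLowerCase())
  );
}
```

An empty result from `commonCodecs(offer, answer)` points at a codec mismatch; a non-empty result suggests looking at transport or ICE attributes instead.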

Problem 2: “One-way audio: browser hears nothing”

  • Why: ICE connectivity failed from gateway to browser (firewall blocking UDP)
  • Fix: Ensure TURN server is running, browser includes TURN relay candidates
  • Quick test: Check ICE candidate types in browser DevTools—should see “relay” candidate
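
If you log the candidates arriving over the signaling WebSocket, a few lines of code can tell you whether any relay candidates were gathered at all. The sketch below assumes each entry is a standard candidate string containing a `typ` field; `candidateType` and `summarizeCandidates` are illustrative names.

```javascript
// Return the ICE candidate type ("host", "srflx", "prflx", "relay") or null.
function candidateType(candidateLine) {
  const m = / typ (host|srflx|prflx|relay)\b/.exec(candidateLine);
  return m ? m[1] : null;
}

// Count candidates by type so a missing "relay" entry is obvious in logs.
function summarizeCandidates(lines) {
  const counts = { host: 0, srflx: 0, prflx: 0, relay: 0 };
  for (const line of lines) {
    const type = candidateType(line);
    if (type) counts[type]++;
  }
  return counts;
}
```

A summary with `relay: 0` usually means the browser never reached the TURN server (wrong credentials, blocked port, or missing `iceServers` entry).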

Problem 3: “Audio is robotic/choppy in browser”

  • Why: Transcoding too slow, or packet pacing wrong (sending too fast/slow)
  • Fix: Optimize transcoder, send RTP every 20ms precisely (use timers)
  • Quick test: Monitor CPU—if >80%, transcoding can’t keep up
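
Pacing errors often come from scheduling each packet relative to the previous send, which lets timer jitter accumulate. One way around this, sketched below with hypothetical helper names, is to derive every packet's due time, sequence number, and RTP timestamp from the packet index and the stream start time. The constants assume PCMU at 8000 Hz with 20 ms frames.

```javascript
const FRAME_MS = 20;            // packetization interval
const SAMPLES_PER_FRAME = 160;  // 8000 Hz * 0.020 s for PCMU

// Header fields and due time for the n-th packet of a stream (n starts at 0).
function packetSchedule(startMs, firstSeq, firstTimestamp, n) {
  return {
    dueMs: startMs + n * FRAME_MS,
    sequence: (firstSeq + n) & 0xffff,                           // 16-bit wrap
    timestamp: (firstTimestamp + n * SAMPLES_PER_FRAME) >>> 0,   // 32-bit wrap
  };
}

// Milliseconds to wait before sending packet n; 0 if we are already late.
function delayUntilNext(startMs, n, nowMs) {
  return Math.max(0, startMs + n * FRAME_MS - nowMs);
}
```

In a send loop you would call `setTimeout(sendNext, delayUntilNext(startMs, n, Date.now()))` after each packet, so a late timer shortens the next wait instead of shifting the whole stream.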

Problem 4: “DTLS handshake fails”

  • Why: Certificate mismatch, fingerprint in SDP doesn’t match actual certificate
  • Fix: Generate self-signed cert, compute SHA-256 fingerprint correctly for SDP
  • Quick test: Browser console shows DTLS error—check certificate fingerprint
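
The fingerprint in the SDP is a hash over the DER-encoded certificate, written as uppercase hex pairs separated by colons. Below is a small Node.js sketch using the built-in `crypto` module; `pemToDer` and `sdpFingerprint` are illustrative names, and the sketch assumes a single certificate in PEM form.

```javascript
const crypto = require('crypto');

// Strip the PEM armor and decode the base64 body to DER bytes.
function pemToDer(pem) {
  const b64 = pem
    .replace(/-----(BEGIN|END) CERTIFICATE-----/g, '')
    .replace(/\s+/g, '');
  return Buffer.from(b64, 'base64');
}

// Build the a=fingerprint line from DER bytes: SHA-256, uppercase, colon-separated.
function sdpFingerprint(derBytes) {
  const hex = crypto.createHash('sha256').update(derBytes).digest('hex').toUpperCase();
  return 'a=fingerprint:sha-256 ' + hex.match(/.{2}/g).join(':');
}
```

Hashing the PEM text instead of the DER bytes is a common cause of a fingerprint that looks plausible but never matches the certificate presented in the handshake.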

Problem 5: “SIP side connects but no audio”

  • Why: SRTP keys wrong (SDES crypto attribute malformed), or no SRTP support
  • Fix: Verify SDES key format: a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:BASE64
  • Quick test: Wireshark shows encrypted RTP but no audio—key mismatch
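
For `AES_CM_128_HMAC_SHA1_80`, the `inline:` value is the base64 encoding of a 16-byte master key followed by a 14-byte master salt, 30 bytes in total. The sketch below generates and sanity-checks such a line; the function names are illustrative, and the check accepts only an optional `|lifetime` or `|MKI` suffix, not trailing session parameters.

```javascript
const crypto = require('crypto');

const KEY_SALT_BYTES = 30; // 16-byte master key + 14-byte master salt

// Build an a=crypto line with fresh random key material.
function generateCryptoLine(tag = 1) {
  const keySalt = crypto.randomBytes(KEY_SALT_BYTES).toString('base64');
  return `a=crypto:${tag} AES_CM_128_HMAC_SHA1_80 inline:${keySalt}`;
}

// Return 'ok' or a short description of what looks wrong.
function checkCryptoLine(line) {
  const m = /^a=crypto:(\d+) (\S+) inline:([A-Za-z0-9+/]+={0,2})(?:\|\S*)?$/.exec(line.trim());
  if (!m) return 'malformed a=crypto line';
  if (m[2] !== 'AES_CM_128_HMAC_SHA1_80') return `unexpected suite ${m[2]}`;
  const bytes = Buffer.from(m[3], 'base64').length;
  return bytes === KEY_SALT_BYTES
    ? 'ok'
    : `key||salt is ${bytes} bytes, expected ${KEY_SALT_BYTES}`;
}
```

Running `checkCryptoLine` on both the line you send and the line the trunk returns helps separate malformed keys from a trunk that simply ignores SRTP.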

Problem 6: “High CPU usage, server can’t handle >5 calls”

  • Why: Transcoding is CPU-bound, not optimized
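  • Fix: Avoid transcoding where you can by negotiating PCMU end to end (browsers are required to support G.711), lower the Opus encoder complexity, or spread calls across worker processes. Video-oriented hardware encoders such as Intel Quick Sync do not accelerate Opus or G.711.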
  • Quick test: Profile with perf top—libopus/resampling should dominate

Problem 7: “Calls work locally but fail over internet”

  • Why: TURN server not reachable, or symmetric NAT blocking direct connection
  • Fix: Open UDP 3478 for TURN along with the relay port range (coturn defaults to 49152-65535), and verify the TURN server is publicly reachable
  • Quick test: Use TURN checker tool or browser ICE test page

Problem 8: “DTMF tones don’t work”

  • Why: Payload type mismatch for telephone-event (RFC 4733)
  • Fix: Map WebRTC DTMF PT to SIP trunk’s expected PT in SDP
  • Quick test: Send DTMF, capture RTP—should see PT 101 or 96 with event code
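
An RFC 4733 telephone-event payload is four bytes: the event code, a byte holding the end bit, a reserved bit, and a 6-bit volume, then a 16-bit duration in timestamp units. The sketch below decodes that payload and rewrites the payload type in an RTP header; `parseTelephoneEvent` and `remapPayloadType` are hypothetical helper names.

```javascript
const EVENT_DIGITS = '0123456789*#ABCD'; // event codes 0-15

// Decode the 4-byte telephone-event payload (the bytes after the RTP header).
function parseTelephoneEvent(payload) {
  if (payload.length < 4) {
    throw new RangeError('telephone-event payload needs 4 bytes');
  }
  const event = payload[0];
  return {
    event,
    digit: event < EVENT_DIGITS.length ? EVENT_DIGITS[event] : null,
    end: (payload[1] & 0x80) !== 0,        // E bit: last packet of the event
    volume: payload[1] & 0x3f,             // power level in -dBm0
    duration: (payload[2] << 8) | payload[3],
  };
}

// Copy an RTP packet, changing its payload type if it matches fromPt.
// Byte 1 of the RTP header holds the marker bit (0x80) and the 7-bit PT.
function remapPayloadType(rtpPacket, fromPt, toPt) {
  const out = Buffer.from(rtpPacket);
  if ((out[1] & 0x7f) === fromPt) {
    out[1] = (out[1] & 0x80) | toPt;
  }
  return out;
}
```

If the browser negotiated one payload type for telephone-event and the trunk expects another, `remapPayloadType` on each forwarded packet is one way to bridge them without touching the event payload.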

Additional Resources for This Project

Based on the latest WebRTC-SIP integration research (2025):


Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---------|------------|------|------------------------|------------|
| Raw SIP Client | Advanced | 1 month+ | ⭐⭐⭐⭐⭐ (deepest) | ⭐⭐⭐ |
| Asterisk Phone System | Intermediate | 2-3 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Twilio Application | Intermediate | 1-2 weeks | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| SIP Proxy Server | Advanced | 1 month+ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| WebRTC-SIP Gateway | Advanced | 1 month+ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |

Recommendation

Based on your goal of understanding VoIP “for public use connected to the normal phone network,” I recommend this progression:

Start with: Project 2 (Asterisk Phone System)

This gives you the fastest path to a working system that connects to real phone numbers. You’ll learn the business side of VoIP (SIP trunks, DIDs, pricing) while building something immediately useful.

Then do: Project 1 (Raw SIP Client)

Once you’ve seen VoIP working at the Asterisk level, go deeper. Building a SIP client from scratch will demystify everything you configured in Asterisk. You’ll understand why those config options exist.

Finally: Project 5 (WebRTC-SIP Gateway)

This is the modern architecture for public-facing VoIP. Most new telephony products are browser-first. This combines everything you’ve learned.


Final Overall Project: Production VoIP Service

What you’ll build: A complete, production-ready VoIP platform like a mini-Twilio:

  • Web dashboard for provisioning phone numbers
  • REST API for programmatic call control
  • WebRTC client for browser-based calling
  • Asterisk/FreeSWITCH backend for media handling
  • SIP trunk connections to multiple carriers for redundancy
  • Call detail records, billing, usage analytics
  • Fraud detection and rate limiting

Why it teaches VoIP end-to-end: This is what companies like Twilio, Vonage, and Plivo actually built. You’ll face every real challenge: carrier negotiations, number portability, E911 compliance, billing by the minute, handling thousands of concurrent calls.

Core challenges you’ll face:

  • Multi-tenant architecture (maps to SaaS telephony platforms)
  • Carrier failover and least-cost routing (maps to production reliability)
  • Real-time billing and fraud detection (maps to telecom economics)
  • E911 and regulatory compliance (maps to legal requirements)
  • Scaling media servers horizontally (maps to high-availability design)
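
Carrier failover and least-cost routing can be prototyped as a pure function before you wire it into a proxy. The sketch below is one possible shape, not how any particular platform does it: the carrier objects (`name`, `healthy`, `rates` keyed by dial prefix) and the function name `routeOrder` are assumptions made for illustration.

```javascript
// Return healthy carriers that can reach the number, cheapest first.
// Each carrier's rate comes from its longest matching dial prefix.
function routeOrder(carriers, dialedNumber) {
  return carriers
    .filter((c) => c.healthy)
    .map((c) => {
      const prefix = Object.keys(c.rates)
        .filter((p) => dialedNumber.startsWith(p))
        .sort((a, b) => b.length - a.length)[0];
      return prefix === undefined
        ? null
        : { name: c.name, rate: c.rates[prefix] };
    })
    .filter(Boolean)
    .sort((a, b) => a.rate - b.rate);
}
```

The caller tries the first entry and moves down the list on failure, so failover falls out of the same ordering that gives you the cheapest route.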

Resources for key challenges:

  • “Operating FreeSWITCH” by Anthony Minessale - Production deployment
  • “Asterisk: The Definitive Guide” Ch. 22 - Clustering and HA
  • Twilio/Vonage engineering blogs - How they solved scale problems
  • FCC/CRTC regulations - Compliance requirements

Key Concepts:

  • Least Cost Routing: Kamailio LCR module docs - Carrier selection algorithms
  • E911 Requirements: FCC VoIP E911 rules - Legal compliance
  • CDR Processing: “VoIP and Unified Communications” Ch. 12 - Billing systems
  • Horizontal Scaling: FreeSWITCH clustering docs - Media server scaling
  • Fraud Prevention: “Hacking Exposed VoIP” - Toll fraud patterns
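
For CDR processing, the core of usage billing is rounding call duration up to the carrier's billing increment and multiplying by a rate. The sketch below works in integer micro-units of currency to avoid floating-point drift. The field names (`billsec`, `minimumSec`, `incrementSec`) and the default of full-minute billing are assumptions, so substitute whatever your CDR schema and carrier contracts specify.

```javascript
// Round a call's answered duration up to the billing minimum and increment.
function billableSeconds(durationSec, minimumSec = 60, incrementSec = 60) {
  if (durationSec <= 0) return 0;                 // unanswered calls are not billed
  if (durationSec <= minimumSec) return minimumSec;
  const extra = Math.ceil((durationSec - minimumSec) / incrementSec) * incrementSec;
  return minimumSec + extra;
}

// Price one CDR. ratePerMinuteMicros is the per-minute rate in millionths
// of a currency unit (for example 10000 = $0.01 per minute).
function rateCall(cdr, ratePerMinuteMicros) {
  const seconds = billableSeconds(cdr.billsec, cdr.minimumSec, cdr.incrementSec);
  return Math.ceil((seconds * ratePerMinuteMicros) / 60);
}
```

With the defaults, a 61-second call bills as two full minutes; with a 6-second minimum and 6-second increment, a 7-second call bills as 12 seconds.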

Difficulty: Expert

Time estimate: 3-6 months

Prerequisites: All previous projects, production system design experience

Real world outcome:

  • Customers can sign up, buy phone numbers, and make calls
  • Your platform handles hundreds of concurrent calls
  • You have a billing system that charges by usage
  • Monitoring dashboards show real-time platform health
  • You could theoretically launch this as a business

Learning milestones:

  1. Single-tenant version works end-to-end - you understand the full stack
  2. Multi-tenant with isolation works - you understand SaaS telephony
  3. Carrier failover and scaling work - you understand production telephony
  4. Billing and compliance in place - you understand the business of VoIP

Quick Start Recommendation

This weekend: Set up a DigitalOcean/Vultr VPS, install Asterisk, get a $5 DID from Voip.ms, and make your first PSTN call through your own PBX. This gives you immediate hands-on experience with real telephony.