VIDEO AUDIO MUXING DEMUXING PROJECTS

Learning Video/Audio Muxing, Demuxing & Formats Through Low-Level Programming

This domain sits at the intersection of binary formats, compression theory, and systems programming. Understanding how FFmpeg works internally will teach you about container formats, codec architectures, and how multimedia data flows through a processing pipeline.

Core Concept Analysis

To truly understand video/audio muxing and demuxing, you need to grasp these fundamental building blocks:

| Concept | What It Means |
|---|---|
| Container Format | The “box” that holds streams (MP4, MKV, AVI, TS): stores metadata and timing, and interleaves data |
| Codec | Algorithm that compresses/decompresses the actual video/audio data (H.264, AAC, VP9) |
| Muxing | Combining multiple streams (video, audio, subtitles) into a single container file |
| Demuxing | Extracting individual streams from a container file |
| Packets | Chunks of compressed data belonging to a stream |
| Frames | Decoded raw data (pixels for video, samples for audio) |
| PTS/DTS | Presentation/Decode Timestamps: control playback timing and sync |
| Bitstream | The raw encoded data format within a codec (NAL units for H.264) |

Project 1: WAV Audio File Parser & Writer

  • File: VIDEO_AUDIO_MUXING_DEMUXING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Multimedia / File Formats
  • Software or Tool: WAV Format
  • Main Book: “The Audio Programming Book” by Richard Boulanger

What you’ll build: A C program that reads WAV files, displays header information, manipulates raw PCM audio data, and writes modified WAV files.

Why it teaches multimedia fundamentals: WAV is the “hello world” of container formats—it has a simple, well-documented structure with a header followed by raw audio samples. You’ll learn binary file parsing, endianness handling, and the fundamental concept of separating metadata (container) from payload (audio data).

Core challenges you’ll face:

  • Parsing binary structures (maps to understanding container headers)
  • Handling different sample formats (8-bit, 16-bit, 32-bit float)
  • Understanding sample rate, channels, and bit depth relationships
  • Writing properly formatted binary files
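The header-parsing and endianness challenges above can be sketched roughly as follows. This assumes the canonical 44-byte PCM layout (a 16-byte `fmt ` chunk immediately followed by the `data` chunk); real files may carry extra chunks such as `LIST`, which this sketch rejects rather than skips. The names `WavInfo`, `wav_parse_header` and `wav_duration_seconds` are illustrative, not from any library.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Fields taken from the canonical 44-byte WAV header (illustrative struct). */
typedef struct {
    uint16_t audio_format;    /* 1 = integer PCM, 3 = IEEE float */
    uint16_t channels;
    uint32_t sample_rate;     /* frames per second */
    uint16_t bits_per_sample;
    uint32_t data_size;       /* payload size in bytes */
} WavInfo;

/* WAV stores integers little-endian; assemble them byte by byte so the
 * code behaves the same on any host byte order. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is not a canonical WAV header. */
int wav_parse_header(const uint8_t *buf, size_t len, WavInfo *out)
{
    if (len < 44) return -1;
    if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) return -1;
    if (memcmp(buf + 12, "fmt ", 4) != 0 || read_le32(buf + 16) != 16) return -1;
    if (memcmp(buf + 36, "data", 4) != 0) return -1;

    out->audio_format    = read_le16(buf + 20);
    out->channels        = read_le16(buf + 22);
    out->sample_rate     = read_le32(buf + 24);
    out->bits_per_sample = read_le16(buf + 34);
    out->data_size       = read_le32(buf + 40);
    return 0;
}

/* Duration = payload bytes / (bytes per frame * frames per second). */
double wav_duration_seconds(const WavInfo *w)
{
    uint32_t bytes_per_frame = w->channels * (w->bits_per_sample / 8u);
    if (bytes_per_frame == 0 || w->sample_rate == 0) return 0.0;
    return (double)w->data_size / ((double)bytes_per_frame * w->sample_rate);
}
```

Reading the first 44 bytes of a file with `fread` and passing them to `wav_parse_header` is enough to print the sample rate, channel count, bit depth and duration. A more complete parser would walk the chunk list instead of assuming fixed offsets.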

Key Concepts:

  • Binary file I/O in C: “C Programming: A Modern Approach” by K.N. King - Chapter 22 (Input/Output)
  • Endianness and byte ordering: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Chapter 2
  • Audio fundamentals (sample rate, bit depth): “Digital Audio Fundamentals” article on Wikipedia

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic C programming, understanding of binary/hex

Real world outcome:

  • Run your program on any WAV file and see: sample rate, channels, duration, bit depth printed to console
  • Apply effects like volume change, reverse audio, or fade in/out
  • Play your modified WAV file in any audio player to verify it works
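Two of the effects listed above, volume change and reversal, might look like this for 16-bit PCM. This is a sketch that assumes the samples have already been loaded into host byte order as `int16_t` and that multi-channel audio is interleaved. The function names are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Scale every sample by `gain`, saturating at the int16_t limits so loud
 * passages clip instead of wrapping around. */
void pcm16_apply_gain(int16_t *samples, size_t count, double gain)
{
    for (size_t i = 0; i < count; i++) {
        double v = samples[i] * gain;
        if (v > 32767.0)  v = 32767.0;
        if (v < -32768.0) v = -32768.0;
        samples[i] = (int16_t)v;
    }
}

/* Reverse interleaved audio one frame at a time, so that left stays left
 * and right stays right. `frames` is the sample count per channel. */
void pcm16_reverse(int16_t *samples, size_t frames, unsigned channels)
{
    if (frames < 2) return;
    for (size_t a = 0, b = frames - 1; a < b; a++, b--) {
        for (unsigned c = 0; c < channels; c++) {
            int16_t tmp = samples[a * channels + c];
            samples[a * channels + c] = samples[b * channels + c];
            samples[b * channels + c] = tmp;
        }
    }
}
```

Reversing sample by sample instead of frame by frame would swap the channels of a stereo file, which is an easy mistake to hear but a hard one to spot in a hex dump.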

Learning milestones:

  1. Successfully parse WAV header → understand container structure concept
  2. Read and modify PCM samples → grasp raw vs. encoded data distinction
  3. Write valid WAV file → understand muxing at its simplest level

Project 2: BMP Image Sequence to Raw Video Converter

  • File: VIDEO_AUDIO_MUXING_DEMUXING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Video / Data Processing
  • Software or Tool: YUV / FFmpeg
  • Main Book: “Computer Graphics from Scratch” by Gabriel Gambetta

What you’ll build: A tool that takes a folder of BMP images and creates an uncompressed raw video file (YUV4MPEG2 format), then uses FFmpeg CLI to encode it.

Why it teaches video fundamentals: Before understanding compressed video, you must understand raw video—frames as arrays of pixels, color spaces (RGB vs YUV), frame timing. YUV4MPEG2 (Y4M) is a simple header + raw frames format that FFmpeg can read.
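To make the "simple header + raw frames" description concrete, here is a minimal sketch of a Y4M writer. It assumes progressive 4:2:0 planar frames with square pixels; the helper names are hypothetical, and a fuller tool would also validate dimensions and support other chroma layouts.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Write the one-line stream header. "Ip" means progressive, "A1:1" means
 * square pixels, "C420jpeg" selects 4:2:0 planar chroma.
 * Returns 0 on success, -1 on a write error. */
int y4m_write_header(FILE *f, int width, int height, int fps_num, int fps_den)
{
    int n = fprintf(f, "YUV4MPEG2 W%d H%d F%d:%d Ip A1:1 C420jpeg\n",
                    width, height, fps_num, fps_den);
    return n < 0 ? -1 : 0;
}

/* Write one frame: the "FRAME" marker line, then the full Y plane followed
 * by the quarter-size U and V planes. */
int y4m_write_frame(FILE *f, const uint8_t *y, const uint8_t *u,
                    const uint8_t *v, int width, int height)
{
    size_t luma   = (size_t)width * (size_t)height;
    size_t chroma = (size_t)((width + 1) / 2) * (size_t)((height + 1) / 2);

    if (fputs("FRAME\n", f) == EOF) return -1;
    if (fwrite(y, 1, luma, f) != luma) return -1;
    if (fwrite(u, 1, chroma, f) != chroma) return -1;
    if (fwrite(v, 1, chroma, f) != chroma) return -1;
    return 0;
}
```

The `F` ratio in the header is the only timing information in the file. Frames carry no timestamps of their own, so changing the ratio changes playback speed without touching the pixel data.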

Core challenges you’ll face:

  • Understanding RGB to YUV color space conversion (this is how video codecs think!)
  • Handling frame dimensions, aspect ratios, and pixel formats
  • Writing sequential frame data with proper timing metadata
  • Understanding planar vs. packed pixel formats
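The color conversion and the packed-to-planar step can be sketched together. This example assumes BT.601 limited-range coefficients in a common integer approximation, top-down packed RGB24 input with even width and height, and chroma produced by averaging each 2x2 block. BMP rows are normally stored bottom-up in BGR order, so a real converter would reorder them first. All function names are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* BT.601 limited-range integer approximation. The +32768 bias (128 << 8)
 * keeps the chroma sums non-negative, so a negative value is never
 * right-shifted. */
static uint8_t rgb_to_y(int r, int g, int b)
{
    return (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}

static uint8_t rgb_to_u(int r, int g, int b)
{
    return (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8);
}

static uint8_t rgb_to_v(int r, int g, int b)
{
    return (uint8_t)((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8);
}

/* Convert packed RGB24 (R,G,B,R,G,B,...) into three separate planes.
 * Y is full resolution; U and V hold one sample per 2x2 pixel block. */
void rgb24_to_yuv420p(const uint8_t *rgb, int width, int height,
                      uint8_t *y_plane, uint8_t *u_plane, uint8_t *v_plane)
{
    int chroma_width = width / 2;

    for (int by = 0; by < height; by += 2) {
        for (int bx = 0; bx < width; bx += 2) {
            int sr = 0, sg = 0, sb = 0;

            for (int dy = 0; dy < 2; dy++) {
                for (int dx = 0; dx < 2; dx++) {
                    size_t idx = (size_t)(by + dy) * (size_t)width + (size_t)(bx + dx);
                    const uint8_t *p = rgb + 3 * idx;
                    y_plane[idx] = rgb_to_y(p[0], p[1], p[2]);
                    sr += p[0];
                    sg += p[1];
                    sb += p[2];
                }
            }

            /* Average the block's color (rounded) before computing chroma. */
            int r = (sr + 2) / 4, g = (sg + 2) / 4, b = (sb + 2) / 4;
            size_t cidx = (size_t)(by / 2) * (size_t)chroma_width + (size_t)(bx / 2);
            u_plane[cidx] = rgb_to_u(r, g, b);
            v_plane[cidx] = rgb_to_v(r, g, b);
        }
    }
}
```

With these coefficients, white maps to Y=235 and black to Y=16, which is the limited range most video pipelines expect. Pure red comes out as (Y, U, V) = (82, 90, 240).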

Key Concepts:

  • Color spaces (RGB, YUV): “Computer Graphics from Scratch” by Gabriel Gambetta - Chapter on color
  • Image file formats: BMP specification is publicly available and simple
  • Video fundamentals: FFmpeg libav tutorial - Introduction section

Difficulty: Beginner-Intermediate
Time estimate: Weekend - 1 week
Prerequisites: C programming, basic understanding of images as pixel arrays

Real world outcome:

  • Feed your Y4M file to FFmpeg: ffmpeg -i output.y4m -c:v libx264 video.mp4
  • Play the resulting MP4 in VLC—you created a video from scratch!
  • Vary frame rate and see how it affects playback speed

Learning milestones:

  1. Successfully convert BMP to raw pixel data → understand frames as data arrays
  2. Write valid Y4M file → grasp raw video container concept
  3. Successfully encode with FFmpeg → see the compression ratio difference

Project 3: MPEG-TS Demuxer from Scratch

  • File: VIDEO_AUDIO_MUXING_DEMUXING_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 1: The “Resume Gold”
  • Difficulty: Level 2: Intermediate (The Developer)
  • Knowledge Area: Video/Audio, Binary Parsing
  • Software or Tool: FFmpeg, MPEG-TS
  • Main Book: Digital Video and HD: Algorithms and Interfaces by Charles Poynton

What you’ll build: A C program that parses MPEG Transport Stream (.ts) files, extracts the packet structure, identifies streams (video/audio PIDs), and dumps elementary streams to separate files.

Why it teaches demuxing deeply: MPEG-TS is the format used for broadcast TV and HLS streaming, and Blu-ray discs use a close variant (M2TS, which prefixes each 188-byte packet with a 4-byte timestamp). It has a relatively simple packet structure (188-byte fixed packets) but teaches you about Program Association Tables (PAT), Program Map Tables (PMT), PIDs, and how multiple streams are interleaved. This is real demuxing.

Core challenges you’ll face:

  • Parsing fixed-size packet headers and sync bytes
  • Understanding PID (Packet Identifier) routing
  • Parsing PAT/PMT tables to discover stream types
  • Reconstructing elementary streams from fragmented packets
  • Handling adaptation fields and stuffing bytes
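As a starting point for the first and last challenges above, the 4-byte packet header and the optional adaptation field can be decoded like this. The field layout follows the MPEG-2 Systems packet header; the `TsPacket` struct and function name are illustrative, and the sketch ignores scrambling and the contents of the adaptation field.

```c
#include <stdint.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE   0x47

/* Decoded view of one transport packet (illustrative struct). */
typedef struct {
    uint16_t pid;                /* 13-bit packet identifier */
    int payload_unit_start;      /* 1 if a PES packet or section begins here */
    int continuity_counter;      /* 4-bit per-PID counter */
    int payload_offset;          /* byte offset of payload, 0 if none */
    int payload_size;            /* payload length in bytes, 0 if none */
} TsPacket;

/* Parse one 188-byte packet. Returns 0 on success, -1 if the sync byte is
 * missing or the transport_error_indicator is set. */
int ts_parse_packet(const uint8_t *pkt, TsPacket *out)
{
    if (pkt[0] != TS_SYNC_BYTE) return -1;
    if (pkt[1] & 0x80) return -1;

    out->pid = (uint16_t)(((pkt[1] & 0x1F) << 8) | pkt[2]);
    out->payload_unit_start = (pkt[1] >> 6) & 1;
    out->continuity_counter = pkt[3] & 0x0F;

    /* adaptation_field_control: bit 1 = adaptation field present,
     * bit 0 = payload present. */
    int afc = (pkt[3] >> 4) & 0x03;
    int offset = 4;
    if (afc & 0x02)
        offset += 1 + pkt[4];    /* length byte plus the field itself */

    if ((afc & 0x01) && offset < TS_PACKET_SIZE) {
        out->payload_offset = offset;
        out->payload_size = TS_PACKET_SIZE - offset;
    } else {
        out->payload_offset = 0;
        out->payload_size = 0;
    }
    return 0;
}
```

A demuxer loop would read the file in 188-byte steps, call this on each packet, and route the payload by `pid`. A gap in `continuity_counter` for a given PID is the usual sign of a dropped packet.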

Resources for key challenges:

  • FFmpeg libavformat/mpegts.c source - Reference implementation to study (mov.c is the MP4/MOV demuxer, not the MPEG-TS one)
  • ISO/IEC 13818-1 specification (MPEG-2 Systems) - The actual standard
  • “Digital Video and HD: Algorithms and Interfaces” by Charles Poynton - Comprehensive reference

Key Concepts:

  • Binary protocol parsing: “Computer Systems: A Programmer’s Perspective” - Chapter 7 (Linking), for its walkthrough of a structured binary format (ELF object files)
  • Transport streams: MPEG-TS specification overview on Wikipedia
  • Packet-based multiplexing: FFmpeg libav tutorial - Demuxing section

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Comfortable with C, binary file parsing, bitwise operations

Real world outcome:

  • Run on any .ts file (recorded from TV or downloaded as an HLS segment)
  • Print stream table: “PID 256: H.264 Video, PID 257: AAC Audio”
  • Extract raw H.264 stream to file, verify with ffprobe extracted.h264
  • Feed extracted stream back through FFmpeg to remux into MP4

Learning milestones:

  1. Parse packet headers, find sync bytes → understand transport layer
  2. Parse PAT/PMT, identify streams → understand stream discovery
  3. Extract complete elementary stream → understand demuxing fully
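For the second milestone, stream discovery starts with the PAT, which always travels on PID 0 and maps program numbers to PMT PIDs. The sketch below assumes the whole PAT section fits in one packet payload that begins with a `pointer_field` (that is, the packet had `payload_unit_start` set), and it skips CRC verification. `PatEntry` and `pat_parse` are illustrative names.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint16_t program_number;
    uint16_t pmt_pid;            /* PID carrying this program's PMT */
} PatEntry;

/* Parse a PAT section out of a PID-0 payload.
 * Returns the number of programs written to `out`, or -1 on malformed
 * input. The trailing CRC32 is skipped, not checked. */
int pat_parse(const uint8_t *payload, size_t len, PatEntry *out, int max_entries)
{
    if (len < 1) return -1;

    size_t pos = 1 + (size_t)payload[0];     /* skip pointer_field */
    if (pos + 8 > len) return -1;

    const uint8_t *s = payload + pos;
    if (s[0] != 0x00) return -1;             /* table_id 0 = PAT */

    /* section_length counts every byte after itself, CRC included. */
    size_t section_length = (size_t)(((s[1] & 0x0F) << 8) | s[2]);
    if (section_length < 9 || pos + 3 + section_length > len) return -1;

    size_t end = 3 + section_length - 4;     /* stop before the CRC32 */
    int n = 0;

    /* Program entries start after the 8-byte section header. */
    for (size_t i = 8; i + 4 <= end && n < max_entries; i += 4) {
        uint16_t program = (uint16_t)((s[i] << 8) | s[i + 1]);
        uint16_t pid     = (uint16_t)(((s[i + 2] & 0x1F) << 8) | s[i + 3]);
        if (program == 0) continue;          /* entry 0 is the network PID */
        out[n].program_number = program;
        out[n].pmt_pid = pid;
        n++;
    }
    return n;
}
```

Once the PMT PID is known, the PMT section on that PID is parsed in much the same way to obtain each elementary stream's `stream_type` and PID, which is what the “PID 256: H.264 Video” style table is built from.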

Project 4: Video Player Using libav (FFmpeg Libraries)

  • File: VIDEO_AUDIO_MUXING_DEMUXING_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: C++, Rust, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate (The Developer)
  • Knowledge Area: Video/Audio, Multimedia
  • Software or Tool: FFmpeg, SDL2
  • Main Book: Video Demystified by Keith Jack

What you’ll build: A minimal video player in C that uses FFmpeg’s libavformat (demuxing), libavcodec (decoding), and SDL2 (display) to play video files with audio sync.

Why it teaches FFmpeg architecture: This is how real video players work. You’ll understand AVFormatContext, AVCodecContext, AVPacket, and AVFrame, the core abstractions behind FFmpeg-based players such as VLC and mpv. You’ll experience firsthand the demux → decode → render pipeline.

Core challenges you’ll face:

  • Opening containers and finding streams with libavformat
  • Setting up decoders with libavcodec
  • Converting pixel formats with libswscale
  • Audio/video synchronization using PTS values
  • Real-time playback timing
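The synchronization challenge comes down to two small calculations: turning a PTS into seconds using the stream's time base, and comparing the result against a master clock. The sketch below assumes the audio clock is the master, which is a common choice but not the only one. It uses a local `Rational` struct that merely mirrors the shape of FFmpeg's `AVRational`; everything here is illustrative and does not call into libav.

```c
#include <stdint.h>

/* Stand-in for a stream time base such as 1/90000 (illustrative type). */
typedef struct {
    int num;
    int den;
} Rational;

typedef enum {
    SYNC_SHOW,   /* close enough to the clock: display immediately */
    SYNC_WAIT,   /* frame is early: sleep for *wait_sec first */
    SYNC_DROP    /* frame is late: skip it to catch up */
} SyncAction;

/* A PTS is a tick count; the time base says how long one tick lasts. */
double pts_to_seconds(int64_t pts, Rational time_base)
{
    return (double)pts * time_base.num / time_base.den;
}

/* Decide what to do with a decoded video frame, given the current audio
 * clock. `threshold_sec` is the tolerated drift in either direction. */
SyncAction video_sync_decide(double video_pts_sec, double audio_clock_sec,
                             double threshold_sec, double *wait_sec)
{
    double diff = video_pts_sec - audio_clock_sec;

    *wait_sec = 0.0;
    if (diff > threshold_sec) {
        *wait_sec = diff;
        return SYNC_WAIT;
    }
    if (diff < -threshold_sec)
        return SYNC_DROP;
    return SYNC_SHOW;
}
```

In a real player the audio clock would be derived from how many samples the audio device has consumed, and the wait would usually be capped so that a bad timestamp cannot stall playback.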

Key Concepts:

  • FFmpeg data structures: FFmpeg libav tutorial - “Learn FFmpeg libav the Hard Way”
  • A/V sync: “Video Demystified” by Keith Jack - Chapter on timing
  • SDL2 basics: SDL2 documentation and tutorials

Difficulty: Intermediate
Time estimate: 2-3 weeks
Prerequisites: Solid C, understanding of pointers and memory management, basic threading concepts

Real world outcome:

  • Play any video file (MP4, MKV, AVI) in your own player window
  • See frames render on screen with synchronized audio
  • Add features: seek, pause, volume control
  • Understand exactly what VLC does under the hood

Learning milestones:

  1. Open file, enumerate streams → understand libavformat
  2. Decode video frame, display it → understand libavcodec
  3. Sync audio and video playback → understand PTS/DTS timing
  4. Handle multiple container formats → appreciate format abstraction

Project 5: H.264 NAL Unit Parser

  • File: VIDEO_AUDIO_MUXING_DEMUXING_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 1: The “Resume Gold”
  • Difficulty: Level 3: Advanced (The Engineer)
  • Knowledge Area: Video Codecs, Binary Parsing
  • Software or Tool: H.264, x264
  • Main Book: H.264 and MPEG-4 Video Compression by Iain Richardson

What you’ll build: A tool that parses H.264/AVC bitstreams, identifies NAL (Network Abstraction Layer) units, extracts SPS/PPS parameters, and reports frame types (I/P/B frames).

Why it teaches codec internals: Demuxing gets you packets, but those packets contain encoded bitstreams. H.264 organizes data into NAL units—understanding this layer bridges the gap between container and raw pixels. You’ll see I-frames (keyframes), understand why seeking jumps to keyframes, and grasp the concept of reference frames.

Core challenges you’ll face:

  • Finding NAL unit start codes (0x000001 or 0x00000001)
  • Parsing NAL unit headers (type, reference IDC)
  • Understanding SPS (Sequence Parameter Set) and PPS (Picture Parameter Set)
  • Exponential-Golomb coding for parsing syntax elements
  • Distinguishing slice types (I/P/B)
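Exponential-Golomb decoding is the piece most parsers get stuck on, so here is a minimal sketch of it. An unsigned code `ue(v)` is N zero bits, a one bit, then N more bits; the value is 2^N - 1 plus those trailing bits. The bit reader below is a hypothetical helper that reads MSB-first. It assumes emulation-prevention bytes (the 0x03 in 0x000003 sequences) have already been stripped from the NAL payload.

```c
#include <stdint.h>
#include <stddef.h>

/* MSB-first bit reader over a byte buffer (illustrative helper). */
typedef struct {
    const uint8_t *data;
    size_t size_bits;    /* total readable bits */
    size_t pos;          /* next bit to read */
} BitReader;

/* Returns the next bit (0 or 1), or -1 at end of data. */
static int br_read_bit(BitReader *br)
{
    if (br->pos >= br->size_bits) return -1;
    int bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Decode one unsigned Exp-Golomb value, ue(v).
 * Returns 0 on success, -1 on truncated or oversized input. */
int read_ue(BitReader *br, uint32_t *out)
{
    int zeros = 0;
    int bit;

    /* Count leading zero bits up to the terminating one bit. */
    while ((bit = br_read_bit(br)) == 0) {
        if (++zeros > 31) return -1;
    }
    if (bit < 0) return -1;

    uint32_t suffix = 0;
    for (int i = 0; i < zeros; i++) {
        bit = br_read_bit(br);
        if (bit < 0) return -1;
        suffix = (suffix << 1) | (uint32_t)bit;
    }
    *out = (((uint32_t)1) << zeros) - 1 + suffix;
    return 0;
}

/* Decode one signed value, se(v): codes 1, 2, 3, 4 map to +1, -1, +2, -2. */
int read_se(BitReader *br, int32_t *out)
{
    uint32_t k;
    if (read_ue(br, &k) != 0) return -1;
    if (k & 1)
        *out = (int32_t)((k + 1) / 2);
    else
        *out = -(int32_t)(k / 2);
    return 0;
}
```

For example, the bit strings `1`, `010`, `011` and `00100` decode to 0, 1, 2 and 3. Fields such as `seq_parameter_set_id` in the SPS and `slice_type` in the slice header are coded this way.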

Resources for key challenges:

  • Vcodex H.264 Overview - Excellent technical introduction
  • “H.264 and MPEG-4 Video Compression” by Iain Richardson - The definitive book
  • x264 source code - Study a real implementation

Key Concepts:

  • Bitstream parsing and variable-length codes: H.264 specification ITU-T H.264
  • Video compression fundamentals: “H.264 and MPEG-4 Video Compression” by Iain Richardson - Chapters 1-5
  • NAL unit structure: Vcodex H.264 Overview

Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Strong C, comfortable with bit manipulation, understanding of video frames

Real world outcome:

  • Run on any H.264 file: ./h264parse video.h264
  • Output like:
    NAL Unit 0: SPS (width=1920, height=1080, profile=High)
    NAL Unit 1: PPS
    NAL Unit 2: IDR Slice (I-frame, keyframe)
    NAL Unit 3: Non-IDR Slice (P-frame, refs=1)
    ...
    
  • Understand why ffmpeg -i input.mp4 -c:v copy -f h264 output.h264 produces what it does
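A first version of such a parser only needs to find start codes and decode the one-byte NAL header. The sketch below assumes an Annex B byte stream (start-code delimited, as in a raw .h264 file) already loaded into memory. Because a 4-byte start code ends with the 3-byte one, scanning for 0x000001 covers both. The struct and function names are illustrative, and the name table covers only the most common unit types.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    size_t offset;       /* index of the NAL header byte */
    int nal_unit_type;   /* low 5 bits of the header */
    int nal_ref_idc;     /* 2 bits: 0 means no other picture references it */
} NalInfo;

/* Human-readable names for the most common NAL unit types. */
const char *nal_type_name(int type)
{
    switch (type) {
    case 1:  return "Non-IDR Slice";
    case 5:  return "IDR Slice";
    case 6:  return "SEI";
    case 7:  return "SPS";
    case 8:  return "PPS";
    case 9:  return "Access Unit Delimiter";
    default: return "Other";
    }
}

/* Scan an Annex B buffer and record up to `max` NAL units.
 * Returns the number of units found. */
int h264_scan_nals(const uint8_t *buf, size_t len, NalInfo *out, int max)
{
    int n = 0;
    size_t i = 0;

    while (n < max && i + 3 < len) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            uint8_t header = buf[i + 3];   /* first byte after the start code */
            out[n].offset = i + 3;
            out[n].nal_unit_type = header & 0x1F;
            out[n].nal_ref_idc = (header >> 5) & 0x03;
            n++;
            i += 4;                        /* continue after the header byte */
        } else {
            i++;
        }
    }
    return n;
}
```

Printing `nal_type_name(out[k].nal_unit_type)` for each entry yields a listing in the spirit of the sample output above. Resolution and frame type need the next layer down: the SPS fields and the `slice_type` in each slice header.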

Learning milestones:

  1. Find and count NAL units → understand bitstream structure
  2. Parse SPS, extract resolution → understand parameter sets
  3. Identify frame types → understand I/P/B frame dependencies

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| WAV Parser/Writer | Beginner | Weekend | ⭐⭐ (container basics) | ⭐⭐⭐ (immediate audio feedback) |
| BMP to Raw Video | Beginner-Int | Weekend-1wk | ⭐⭐⭐ (video fundamentals) | ⭐⭐⭐⭐ (create videos!) |
| MPEG-TS Demuxer | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ (real demuxing) | ⭐⭐⭐ (satisfying parsing) |
| libav Video Player | Intermediate | 2-3 weeks | ⭐⭐⭐⭐⭐ (full pipeline) | ⭐⭐⭐⭐⭐ (build a player!) |
| H.264 NAL Parser | Advanced | 2-3 weeks | ⭐⭐⭐⭐⭐ (codec internals) | ⭐⭐⭐ (deep but abstract) |

Recommendation

Based on the goal of understanding how FFmpeg works and learning low-level programming:

Start with: Project 1 (WAV Parser) → Takes a weekend, builds binary parsing confidence

Then: Project 2 (BMP to Raw Video) → Understand video fundamentals before compression

Main learning: Project 4 (libav Video Player) → This is where everything clicks. The ffmpeg-libav-tutorial by Leandro Moreira is exceptional and will guide you through building this.

Deep dive: Project 3 (MPEG-TS Demuxer) if you want to understand container internals, or Project 5 (H.264 Parser) if you want to understand codec internals.


Final Overall Project: Build a Media Transcoder CLI

What you’ll build: A complete command-line transcoder (like a mini-FFmpeg) that can:

  • Read any video file (MP4, MKV, AVI, TS)
  • Decode video and audio streams
  • Apply filters (resize, crop, audio gain)
  • Re-encode to different codecs (using libavcodec)
  • Mux into a different container format

Why this is the capstone: This project ties together everything:

  • Demuxing (libavformat)
  • Decoding (libavcodec)
  • Frame processing (libavutil, libswscale, libswresample)
  • Encoding (libavcodec)
  • Muxing (libavformat)

You’ll implement the same pipeline stages that the ffmpeg command-line tool runs: input → demux → decode → filter → encode → mux → output

Core challenges you’ll face:

  • Managing multiple codecs simultaneously
  • Handling different timebase conversions
  • Memory management for frame buffers
  • Supporting various input/output format combinations
  • Implementing proper flush/drain on stream end
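Timebase conversion is where many transcoders go subtly wrong, so a small worked sketch may help. A timestamp in ticks of `src` is converted to ticks of `dst` by multiplying by `src/dst` and rounding. In FFmpeg itself this job is done by `av_rescale_q` and `av_packet_rescale_ts`, which also guard against 64-bit overflow; the standalone version below does not, and uses a local `Rational` type that only mimics `AVRational`. It assumes positive denominators.

```c
#include <stdint.h>

/* Stand-in for a time base such as 1/90000 (illustrative type). */
typedef struct {
    int num;
    int den;
} Rational;

/* Convert `ts` from units of `src` to units of `dst`, rounding to nearest
 * (halves away from zero).
 *
 *   ts_dst = ts * (src.num / src.den) / (dst.num / dst.den)
 *          = ts * (src.num * dst.den) / (src.den * dst.num)
 *
 * No overflow protection: fine for a sketch, not for very large inputs. */
int64_t rescale_ts(int64_t ts, Rational src, Rational dst)
{
    int64_t num = (int64_t)src.num * dst.den;
    int64_t den = (int64_t)src.den * dst.num;
    int64_t prod = ts * num;

    if (prod >= 0)
        return (prod + den / 2) / den;
    return -((-prod + den / 2) / den);
}
```

For instance, one frame at 25 fps (1 tick of 1/25) becomes 3600 ticks in the 90 kHz clock that MPEG-TS uses, and 90000 ticks of 1/90000 become 1000 ticks of 1/1000. Packets read from the demuxer carry timestamps in the input stream's time base and must be rescaled to the output stream's time base before muxing.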

Difficulty: Advanced
Time estimate: 1 month+
Prerequisites: Completed Projects 1-4, solid understanding of FFmpeg libraries

Real world outcome:

  • Run: ./mytranscoder input.mkv -vcodec h264 -acodec aac -s 1280x720 output.mp4
  • Produce valid, playable output files
  • Understand exactly what ffmpeg -i input.mkv -c:v libx264 -c:a aac -s 1280x720 output.mp4 does internally
  • Be able to read FFmpeg source code and understand it

Learning milestones:

  1. Transmux (change container, copy codecs) → understand format independence
  2. Transcode video only → understand decode/encode cycle
  3. Add audio transcoding → understand multi-stream handling
  4. Add filters → understand frame processing pipeline
  5. Handle edge cases → production-quality understanding

Essential Resources

The single best resource for this entire learning journey:

📚 FFmpeg libav tutorial by Leandro Moreira - This GitHub repo walks you through everything from “hello world” to transcoding, with excellent explanations and working C code.
