Filesystem Internals Learning Projects

A comprehensive project-based learning series for mastering filesystem internals through hands-on C programming. From parsing raw disk structures to building production-grade copy-on-write filesystems.

Series Overview

Filesystems are one of the most elegant abstractions in computing: they transform raw disk blocks into the familiar hierarchy of files and directories we use daily. This series takes you from reading raw bytes to building a complete, crash-consistent filesystem with snapshots.

Core Concepts Covered

Concept | What It Covers
On-disk layout | Superblock, inode tables, data blocks, block groups
Metadata management | Inodes, directory entries, permissions, timestamps
Block allocation | Bitmap vs extent-based, fragmentation, free space tracking
Journaling | Write-ahead logging, crash recovery, consistency guarantees
VFS layer | How Linux abstracts different filesystems behind one interface
Caching | Page cache, buffer cache, write-back vs write-through
Directory implementation | Linear lists vs hash tables vs B-trees
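
To make the bitmap side of "Block allocation" concrete, here is a minimal sketch of a first-fit bitmap allocator. It is an illustration only: the names (`bitmap_alloc`, `bitmap_free`, `bitmap_is_used`) and the 64-block volume size are assumptions made for this example, not code from the projects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBLOCKS 64  /* hypothetical tiny volume: 64 blocks */

/* One bit per block: 1 = in use, 0 = free. Zero-initialized, so all free. */
static uint8_t block_bitmap[NBLOCKS / 8];

/* Returns 1 if block `blk` is marked in use, 0 otherwise. */
int bitmap_is_used(int blk) {
    assert(blk >= 0 && blk < NBLOCKS);
    return (block_bitmap[blk / 8] >> (blk % 8)) & 1;
}

/* First-fit: marks the lowest-numbered free block as used and returns it,
 * or returns -1 when every block is taken. */
int bitmap_alloc(void) {
    for (int i = 0; i < NBLOCKS; i++) {
        if (!bitmap_is_used(i)) {
            block_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            return i;
        }
    }
    return -1;
}

/* Clears the bit for `blk`, making the block available again. */
void bitmap_free(int blk) {
    assert(blk >= 0 && blk < NBLOCKS);
    block_bitmap[blk / 8] &= (uint8_t)~(1u << (blk % 8));
}
```

The linear scan keeps the sketch short. An extent-based allocator would instead record runs of contiguous blocks as (start, length) pairs, which is the tradeoff the table row refers to.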

Projects in This Series

# | Project | Difficulty | Time | What You'll Build
1 | Hex Dump Disk Explorer | Intermediate | 1-2 weeks | CLI tool to visualize ext2/ext4 filesystem structures
2 | In-Memory Filesystem with FUSE | Advanced | 2-3 weeks | Complete mountable RAM filesystem
3 | Persistent Block-Based Filesystem | Expert | 3-4 weeks | Mini ext2 with bitmap allocation and indirect blocks
4 | Journaling Layer | Master | 2-3 weeks | Write-ahead logging for crash consistency
5 | Filesystem Comparison Tool | Advanced | 2 weeks | Benchmark and analyze ext4, FAT32, and custom FS
C | Copy-on-Write Filesystem (Capstone) | Master | 1-2 months | ZFS/Btrfs-style COW with snapshots and checksums
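
Project 3 mentions indirect blocks, so a short worked calculation may help show why they exist. The sketch below assumes the classic ext2 inode layout (12 direct pointers plus one single-, one double-, and one triple-indirect pointer, each pointer 4 bytes). The function name `max_file_blocks` is made up for this example, and the mini ext2 in the project may use fewer levels.

```c
#include <assert.h>
#include <stdint.h>

#define DIRECT_PTRS 12  /* classic ext2: 12 direct block pointers per inode */
#define PTR_SIZE    4   /* each block pointer is a 32-bit block number */

/* Number of data blocks reachable through the inode's pointer tree.
 * This ignores other limits (such as the inode's block counter), so it is
 * an upper bound on addressing, not the real maximum file size. */
uint64_t max_file_blocks(uint32_t block_size) {
    assert(block_size >= PTR_SIZE && block_size % PTR_SIZE == 0);
    uint64_t per_block = block_size / PTR_SIZE;   /* pointers per indirect block */
    return DIRECT_PTRS
         + per_block                              /* single indirect */
         + per_block * per_block                  /* double indirect */
         + per_block * per_block * per_block;     /* triple indirect */
}
```

With 1 KiB blocks, each indirect block holds 256 pointers, so the tree addresses 12 + 256 + 65,536 + 16,777,216 = 16,843,020 blocks, or roughly 16 GiB of data.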

Learning Path

┌──────────────────────────────────────────────────────────────────┐
│                    RECOMMENDED PROGRESSION                       │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   P1: Hex Dump Explorer                                          │
│   └── Understand on-disk structures                              │
│        │                                                         │
│        ▼                                                         │
│   P2: In-Memory FUSE                                             │
│   └── Learn VFS operations and syscall interface                 │
│        │                                                         │
│        ├─────────────────────┬──────────────────┐                │
│        ▼                     ▼                  ▼                │
│   P3: Persistent FS     P5: Comparison     (Breadth)             │
│   └── Block allocation       Tool                                │
│        │                                                         │
│        ▼                                                         │
│   P4: Journaling                                                 │
│   └── Crash consistency                                          │
│        │                                                         │
│        ▼                                                         │
│   CAPSTONE: Copy-on-Write FS                                     │
│   └── Modern filesystem design                                   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Prerequisites

Before starting this series, you should have:

  • C Programming: Comfortable with pointers, structs, memory management
  • Binary Data: Understanding of hex, endianness, bit manipulation
  • Linux Basics: Command line, file operations, basic system calls
  • Data Structures: Arrays, linked lists, trees (helpful for later projects)

Key Resources

Books

Topic | Book | Why It Helps
Filesystem fundamentals | "Operating Systems: Three Easy Pieces" by Arpaci-Dusseau | Ch. 39-43 cover everything from inodes to journaling
ext2/ext4 implementation | "Understanding the Linux Kernel" by Bovet & Cesati | Ch. 12 & 18 for VFS and ext2/3 internals
System programming | "The Linux Programming Interface" by Kerrisk | File I/O, stat, directories
Performance analysis | "Systems Performance" by Brendan Gregg | Ch. 8 for filesystem benchmarking
Binary parsing | "C Programming: A Modern Approach" by K.N. King | Ch. 22 for binary file I/O

Project Comparison

Feature | P1 | P2 | P3 | P4 | P5 | Capstone
Read filesystem structures | ✓ | - | ✓ | - | ✓ | ✓
Write filesystem structures | - | ✓ | ✓ | ✓ | - | ✓
FUSE integration | - | ✓ | ✓ | ✓ | - | ✓
Persistence | - | - | ✓ | ✓ | - | ✓
Crash recovery | - | - | - | ✓ | - | ✓
Multiple FS formats | ✓ | - | - | - | ✓ | -
Performance benchmarking | - | - | - | - | ✓ | -
Snapshots | - | - | - | - | - | ✓
Data checksums | - | - | - | - | - | ✓

What You'll Understand After This Series

  1. Why filesystems exist: The abstraction layer between raw disk blocks and user files
  2. How data is organized: Superblocks, inodes, block groups, directory entries
  3. Allocation strategies: Bitmaps vs FAT chains vs extents
  4. Crash consistency: Why journaling matters and how write-ahead logging works
  5. Performance tradeoffs: Why ext4 typically outperforms FAT32, and how SSDs change the tradeoffs
  6. Modern designs: Copy-on-write, snapshots, self-healing filesystems
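
The write-ahead logging idea in item 4 can be sketched with a toy in-memory model: new block contents go to a journal first, a commit record marks the transaction as complete, and recovery replays only committed entries. Everything here (the `disk` struct, `journal_write`, `journal_commit`, `journal_recover`, and the sizes) is hypothetical and simplified. A real journal lives on disk, orders its writes with flushes, and checksums its records.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS  8   /* toy volume size */
#define BLOCK_SZ 16  /* toy block size in bytes */
#define LOG_CAP  4   /* journal capacity in entries */

typedef struct {
    uint32_t target;          /* block number the entry will overwrite */
    uint8_t  data[BLOCK_SZ];  /* new contents for that block */
} log_entry;

typedef struct {
    uint8_t   blocks[NBLOCKS][BLOCK_SZ];  /* "home" locations of the data */
    log_entry log[LOG_CAP];               /* journal area */
    uint32_t  log_len;                    /* entries written so far */
    int       committed;                  /* 1 once the commit record exists */
} disk;

/* Step 1: record the intended update in the journal only. */
void journal_write(disk *d, uint32_t target, const uint8_t *data) {
    assert(target < NBLOCKS && d->log_len < LOG_CAP);
    d->log[d->log_len].target = target;
    memcpy(d->log[d->log_len].data, data, BLOCK_SZ);
    d->log_len++;
}

/* Step 2: the commit record is what makes the transaction count. */
void journal_commit(disk *d) {
    d->committed = 1;
}

/* Step 3 (also run after a crash): replay a committed transaction into the
 * home locations; drop an uncommitted one. Either way the journal is reset. */
void journal_recover(disk *d) {
    if (d->committed) {
        for (uint32_t i = 0; i < d->log_len; i++)
            memcpy(d->blocks[d->log[i].target], d->log[i].data, BLOCK_SZ);
    }
    d->log_len = 0;
    d->committed = 0;
}
```

Calling `journal_recover` after `journal_write` but before `journal_commit` leaves the home blocks untouched, which models a power failure mid-transaction: the update is lost, but the filesystem is never left half-written.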

Interview Preparation

Completing these projects prepares you for systems-level interview questions:

  • "What is an inode, and what information does it store?"
  • "How does ext4 resolve a file path to disk blocks?"
  • "What happens if power fails during a file write?"
  • "Why does FAT32 have a 4 GB file size limit?"
  • "How would you implement snapshots efficiently?"

Each project includes specific interview questions and expected answers.

Getting Started

  1. Environment Setup: Linux (native or VM) with development tools
    sudo apt install build-essential libfuse3-dev
    
  2. Create Test Images: Each project shows how to create test disk images
    dd if=/dev/zero of=test.img bs=1M count=10
    mkfs.ext2 test.img
    
  3. Start with Project 1: Build the foundational understanding of on-disk structures
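
As a first taste of Project 1, the sketch below checks whether an image such as `test.img` looks like an ext2 filesystem. It relies on the standard ext2 layout: the superblock starts 1024 bytes into the volume, `s_log_block_size` sits at byte 24 of the superblock, and the 16-bit little-endian magic `0xEF53` sits at byte 56. The helper names (`sb_read`, `sb_is_ext2`, `sb_block_size`) are invented for this example and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define EXT2_SB_OFFSET          1024    /* superblock starts at byte 1024 */
#define EXT2_SB_SIZE            1024
#define EXT2_MAGIC              0xEF53
#define EXT2_OFF_LOG_BLOCK_SIZE 24      /* s_log_block_size, 32-bit LE */
#define EXT2_OFF_MAGIC          56      /* s_magic, 16-bit LE */

/* Decode little-endian fields byte by byte so host endianness is irrelevant. */
static uint16_t le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the superblock buffer carries the ext2/3/4 magic number. */
int sb_is_ext2(const uint8_t *sb) {
    assert(sb != NULL);
    return le16(sb + EXT2_OFF_MAGIC) == EXT2_MAGIC;
}

/* Block size in bytes (1024 << s_log_block_size), or 0 if implausible. */
uint32_t sb_block_size(const uint8_t *sb) {
    assert(sb != NULL);
    uint32_t shift = le32(sb + EXT2_OFF_LOG_BLOCK_SIZE);
    return shift > 16 ? 0 : 1024u << shift;
}

/* Reads the superblock from an image file; returns 0 on success, -1 on error. */
int sb_read(const char *path, uint8_t sb[EXT2_SB_SIZE]) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int ok = fseek(f, EXT2_SB_OFFSET, SEEK_SET) == 0 &&
             fread(sb, 1, EXT2_SB_SIZE, f) == EXT2_SB_SIZE;
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller would fill a 1024-byte buffer with `sb_read("test.img", sb)` and then pass it to `sb_is_ext2` and `sb_block_size`. Comparing the result against `dumpe2fs -h test.img` is a reasonable sanity check.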

License

Educational materials for personal learning. See individual project files for specific licensing.