Filesystem Internals - Expanded Project Guides

Generated from: FILESYSTEM_INTERNALS_LEARNING_PROJECTS.md

Overview

Filesystems are one of the most elegant abstractions in computing: they transform raw disk blocks into the familiar hierarchy of files and directories we use daily. This expanded project series takes you from parsing raw on-disk structures to building production-quality filesystems with journaling and copy-on-write semantics.

Core Concepts Covered:

  • On-disk layout (superblocks, inode tables, data blocks, block groups)
  • Metadata management (inodes, directory entries, permissions, timestamps)
  • Block allocation (bitmap vs extent-based, fragmentation, free space tracking)
  • Journaling (write-ahead logging, crash recovery, consistency guarantees)
  • VFS layer (how Linux abstracts different filesystems behind one interface)
  • Caching (page cache, buffer cache, write-back vs write-through)
  • Directory implementation (linear lists vs hash tables vs B-trees)
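
To make the on-disk layout item concrete, here is a minimal sketch of how an ext2-style filesystem maps an inode number to a block group and a slot in that group's inode table. The struct and function names are hypothetical, and the inodes-per-group and inode-size values would normally come from the superblock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: where an inode lives in an ext2-style layout. */
struct inode_location {
    uint32_t group;        /* block group that holds the inode */
    uint32_t index;        /* slot within that group's inode table */
    uint64_t byte_offset;  /* offset of the inode inside that inode table */
};

/* Inode numbers start at 1, so subtract 1 before dividing. */
struct inode_location locate_inode(uint32_t ino,
                                   uint32_t inodes_per_group,
                                   uint32_t inode_size)
{
    assert(ino >= 1 && inodes_per_group > 0);
    struct inode_location loc;
    loc.group = (ino - 1) / inodes_per_group;
    loc.index = (ino - 1) % inodes_per_group;
    loc.byte_offset = (uint64_t)loc.index * inode_size;
    return loc;
}
```

For example, with 8192 inodes per group and 128-byte inodes, inode 2 (the ext2 root directory) lands in group 0 at slot 1.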

Project Index

#  Project                                    Difficulty             Time        Key Focus
1  Hex Dump Disk Explorer                     Beginner-Intermediate  1-2 weeks   Binary parsing, ext2 structures, on-disk layout
2  In-Memory Filesystem with FUSE             Intermediate           2-3 weeks   VFS operations, FUSE API, inode management
3  Persistent Block-Based Filesystem          Intermediate-Advanced  3-4 weeks   Block allocation, bitmaps, indirect pointers
4  Journaling Layer                           Advanced               2-3 weeks   Write-ahead logging, crash recovery, transactions
5  Filesystem Comparison Tool                 Intermediate           2 weeks     Performance analysis, FAT32 parsing, benchmarking
6  Production-Grade Copy-on-Write Filesystem  Advanced               1-2 months  COW semantics, snapshots, checksums, garbage collection

Learning Paths

Path 1: Fundamentals (For Beginners)

Goal: Understand how filesystems work from first principles

  1. P01: Hex Dump Disk Explorer (1-2 weeks)
    • Parse raw ext2 structures
    • Build mental model of on-disk layout
    • Understand inodes, directories, block pointers
  2. P02: In-Memory FUSE Filesystem (2-3 weeks)
    • Implement the FUSE callbacks that mirror VFS operations (getattr, read, write, mkdir)
    • See which requests the kernel issues for each file operation
    • Enjoy the “aha moment” when ls and cat just work on your own filesystem
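
As a taste of the parsing work in P01, the sketch below checks an in-memory disk image for an ext2 superblock and pulls out a few fields. The field offsets (s_inodes_count at 0, s_blocks_count at 4, s_log_block_size at 24, s_magic at 56) follow the ext2 on-disk format; the struct and function names are hypothetical, and a real explorer would read many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT2_SUPERBLOCK_OFFSET 1024u   /* superblock starts 1024 bytes into the volume */
#define EXT2_SUPERBLOCK_SIZE   1024u
#define EXT2_MAGIC             0xEF53u /* s_magic for ext2/ext3/ext4 */

/* Hypothetical summary struct holding only a few superblock fields. */
struct sb_summary {
    uint32_t inodes_count;
    uint32_t blocks_count;
    uint32_t block_size;
};

/* On-disk integers are little-endian; decode byte by byte so any host works. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the image is too short or does not look like ext2. */
int parse_ext2_superblock(const uint8_t *image, size_t len, struct sb_summary *out)
{
    assert(image != NULL && out != NULL);
    if (len < EXT2_SUPERBLOCK_OFFSET + EXT2_SUPERBLOCK_SIZE)
        return -1;
    const uint8_t *sb = image + EXT2_SUPERBLOCK_OFFSET;
    if (read_le16(sb + 56) != EXT2_MAGIC)          /* s_magic */
        return -1;
    uint32_t log_block_size = read_le32(sb + 24);  /* s_log_block_size */
    if (log_block_size > 6)                        /* sanity bound chosen for this sketch */
        return -1;
    out->inodes_count = read_le32(sb + 0);         /* s_inodes_count */
    out->blocks_count = read_le32(sb + 4);         /* s_blocks_count */
    out->block_size   = 1024u << log_block_size;   /* 1024, 2048, 4096, ... */
    return 0;
}
```

In practice the image buffer would come from reading or mmap-ing a file created with mkfs.ext2; comparing the parsed values against dumpe2fs output is a quick way to check the parser.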

Path 2: Production Systems (For Intermediate Developers)

Goal: Build a complete, persistent, crash-consistent filesystem

  1. Complete Path 1 first
  2. P03: Persistent Block-Based Filesystem (3-4 weeks)
    • Add persistence to disk images
    • Implement bitmap-based block allocation
    • Handle indirect blocks for large files
  3. P04: Journaling Layer (2-3 weeks)
    • Add write-ahead logging
    • Implement crash recovery
    • Understand why ext4 can mount quickly after a crash (it replays the journal instead of running a full fsck)
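
The bitmap-based allocation mentioned for P03 can be sketched in a few lines: one bit per block, with a first-fit scan for a clear bit. This is a minimal in-memory illustration with hypothetical function names; a real filesystem would also persist the bitmap and usually search near a goal block to reduce fragmentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First-fit scan: marks and returns the first free block, or -1 if none is free. */
long bitmap_alloc(uint8_t *bitmap, size_t nblocks)
{
    assert(bitmap != NULL);
    for (size_t i = 0; i < nblocks; i++) {
        uint8_t mask = (uint8_t)(1u << (i % 8));
        if ((bitmap[i / 8] & mask) == 0) {
            bitmap[i / 8] |= mask;   /* bit set = block in use */
            return (long)i;
        }
    }
    return -1;
}

/* Clears the bit for `block`, making it available again. */
void bitmap_free(uint8_t *bitmap, size_t nblocks, size_t block)
{
    assert(bitmap != NULL && block < nblocks);
    bitmap[block / 8] &= (uint8_t)~(1u << (block % 8));
}
```

Scanning bit by bit keeps the sketch obvious; skipping bytes equal to 0xFF is an easy first optimization.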

Path 3: Comparative Analysis (For Breadth)

Goal: Understand why different filesystems exist

  1. P01: Hex Dump Disk Explorer (foundation)
  2. P05: Filesystem Comparison Tool (2 weeks)
    • Parse FAT32 and ext4 structures
    • Measure performance, fragmentation, space efficiency
    • Understand tradeoffs between filesystem designs
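
For the FAT32 side of P05, the core structure is the cluster chain: each FAT entry names the next cluster of the file. The sketch below walks a chain in a FAT that is assumed to be already loaded into memory as host-endian 32-bit entries; the function name is hypothetical. Only the low 28 bits of an entry are significant, and values of 0x0FFFFFF8 and above mark the end of a chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT32_ENTRY_MASK 0x0FFFFFFFu  /* top 4 bits of each entry are reserved */
#define FAT32_BAD        0x0FFFFFF7u  /* bad-cluster marker */
#define FAT32_EOC_MIN    0x0FFFFFF8u  /* values >= this mean end of chain */

/* Walks the chain starting at `first`, storing up to `max` cluster numbers in `out`.
 * Returns the chain length, or -1 if the chain is malformed or loops. */
long fat32_walk_chain(const uint32_t *fat, size_t fat_entries,
                      uint32_t first, uint32_t *out, size_t max)
{
    assert(fat != NULL);
    size_t n = 0;
    uint32_t cluster = first & FAT32_ENTRY_MASK;
    while (cluster < FAT32_EOC_MIN) {
        /* Clusters 0 and 1 are reserved, so a valid data cluster is >= 2. */
        if (cluster < 2 || cluster == FAT32_BAD || cluster >= fat_entries)
            return -1;
        if (n >= fat_entries)      /* more links than entries: must be a cycle */
            return -1;
        if (n < max)
            out[n] = cluster;
        n++;
        cluster = fat[cluster] & FAT32_ENTRY_MASK;
    }
    return (long)n;
}
```

The length of the chain and how far apart consecutive clusters sit are exactly the inputs a fragmentation measurement needs.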

Path 4: Modern Filesystem Architecture (For Advanced Developers)

Goal: Build a modern, production-grade filesystem

  1. Complete Paths 1 and 2
  2. P06: Copy-on-Write Filesystem (1-2 months)
    • Implement copy-on-write block allocation
    • Add snapshot support
    • Implement data checksums and scrubbing
    • Build garbage collection
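
The checksum-and-scrub item can be illustrated with a small sketch: store a checksum next to each block, then periodically recompute and compare. CRC-32 (the IEEE polynomial) is used here only because it is compact and easy to verify; it is a stand-in, not a claim about which checksum P06 should use. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 512

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Slow but table-free. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical layout: each block carries the checksum written alongside it. */
struct checked_block {
    uint8_t  data[BLOCK_SIZE];
    uint32_t checksum;
};

/* Scrub pass: recompute every checksum and count the blocks that no longer match. */
size_t scrub(const struct checked_block *blocks, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (crc32_ieee(blocks[i].data, BLOCK_SIZE) != blocks[i].checksum)
            bad++;
    return bad;
}
```

Detection is only half of scrubbing; repair needs a second good copy (a mirror or parity), which this sketch leaves out.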

Prerequisites

Essential Knowledge

  • C Programming: Pointers, structs, dynamic memory allocation
  • Binary Data: Hex/binary familiarity, endianness, bit manipulation
  • Operating Systems: Basic understanding of files, directories, permissions

Helpful But Not Required

  • Linux command line proficiency
  • Experience with system calls (open, read, write, lseek)
  • Understanding of disk drives and storage

Development Environment

# Required packages (Ubuntu/Debian)
sudo apt install build-essential libfuse3-dev pkg-config

# For testing (hexdump ships in bsdextrautils on recent releases; older ones use bsdmainutils)
sudo apt install bsdextrautils fdisk e2fsprogs dosfstools

Key References

Topic                    Book                                                       Chapters
Filesystem fundamentals  “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau   Ch. 39-40, 42-43
ext2/ext4 structures     “Understanding the Linux Kernel” by Bovet & Cesati         Ch. 12, 18
File operations          “The Linux Programming Interface” by Michael Kerrisk       Ch. 4-5, 14-15, 18
Binary I/O in C          “C Programming: A Modern Approach” by K.N. King            Ch. 22
Performance analysis     “Systems Performance” by Brendan Gregg                     Ch. 8

Filesystem Feature Comparison

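Feature                  ext2     ext4      FAT32      NTFS                  ZFS/Btrfs
Journaling               No       Yes       No         Yes                   CoW instead
Max file size            2 TiB    16 TiB    4 GiB      16 EiB                16 EiB
Block allocation         Bitmap   Extents   FAT chain  Bitmap + extent runs  CoW trees
Snapshots                No       No        No         Via VSS               Native
Checksums                No       Metadata  No         No                    Data + metadata
Permissions              Unix     Unix      None       ACLs                  Unix+ACLs
Fragmentation (typical)  Medium   Low       High       Very Low              Very Low

Max file sizes for ext2 and ext4 assume 4 KiB blocks; the NTFS and ZFS/Btrfs figures are format limits rather than what every implementation supports. Fragmentation ratings are rough: copy-on-write filesystems in particular can fragment heavily under random overwrites.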
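
The ext2 figure in the max file size row can be reproduced with a little arithmetic. An ext2 inode has 12 direct pointers plus one single-, one double-, and one triple-indirect pointer, and each indirect block holds block_size / 4 pointers. The sketch below computes that pointer-based limit and then applies the cap that comes from the inode's 32-bit count of 512-byte sectors, treated here as exactly 2^32 sectors for simplicity. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of data blocks reachable through an ext2 inode's pointer tree. */
uint64_t ext2_max_blocks_by_pointers(uint32_t block_size)
{
    assert(block_size >= 1024 && block_size % 4 == 0);
    uint64_t p = block_size / 4;  /* 32-bit block pointers per indirect block */
    return 12 + p + p * p + p * p * p;
}

/* Approximate ext2 max file size in bytes: the smaller of the pointer-tree
 * limit and the 32-bit sector-count limit (simplified to 2^32 * 512 bytes). */
uint64_t ext2_max_file_size(uint32_t block_size)
{
    uint64_t by_pointers = ext2_max_blocks_by_pointers(block_size) * block_size;
    uint64_t by_sectors  = (UINT64_C(1) << 32) * 512;
    return by_pointers < by_sectors ? by_pointers : by_sectors;
}
```

With 1 KiB blocks the pointer tree is the bottleneck (about 16 GiB); with 4 KiB blocks the tree could address roughly 4 TiB, so the sector count is what yields the 2 TiB figure.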

Project Progression

P01: Hex Dump Explorer
    │
    │   "I can SEE what's on disk"
    │
    ▼
P02: In-Memory FUSE ────────────► P05: Comparison Tool
    │                                    │
    │   "I can MOUNT my own FS"         "I understand TRADEOFFS"
    │
    ▼
P03: Persistent Filesystem
    │
    │   "Data SURVIVES reboots"
    │
    ▼
P04: Journaling Layer
    │
    │   "Data survives CRASHES"
    │
    ▼
P06: Copy-on-Write Filesystem
    │
    "Modern production-grade design"

Expected Outcomes

After completing these projects, you will be able to:

  1. Read raw disk images and identify filesystem structures by sight
  2. Implement VFS operations (getattr, read, write, mkdir, unlink)
  3. Design on-disk layouts balancing metadata overhead and performance
  4. Implement block allocation with bitmaps and indirect pointers
  5. Add journaling for crash consistency
  6. Analyze filesystem performance and explain tradeoffs
  7. Build modern COW filesystems with snapshots and checksums
  8. Answer technical interview questions about filesystem internals
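
To give a feel for outcome 5, here is a minimal in-memory sketch of write-ahead logging: block writes are appended to a journal first, a commit record seals the transaction, and recovery replays only the writes whose transaction committed. All names, the tiny block size, and the fixed-size journal are assumptions made for illustration; a real journal lives on disk and also needs write ordering, checksums, and checkpointing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  16  /* tiny blocks keep the sketch readable */
#define MAX_RECORDS 8

enum rec_type { REC_WRITE, REC_COMMIT };

struct journal_record {
    enum rec_type type;
    uint32_t txn_id;
    uint32_t block_no;          /* REC_WRITE only */
    uint8_t  data[BLOCK_SIZE];  /* REC_WRITE only: new block contents */
};

struct journal {
    struct journal_record recs[MAX_RECORDS];
    size_t count;
};

/* Appends a record; returns 0 on success, -1 if the journal is full. */
int journal_append(struct journal *j, enum rec_type type, uint32_t txn_id,
                   uint32_t block_no, const uint8_t *data)
{
    assert(j != NULL);
    if (j->count >= MAX_RECORDS)
        return -1;
    struct journal_record *r = &j->recs[j->count++];
    r->type = type;
    r->txn_id = txn_id;
    r->block_no = block_no;
    memset(r->data, 0, BLOCK_SIZE);
    if (data != NULL)
        memcpy(r->data, data, BLOCK_SIZE);
    return 0;
}

/* A transaction counts only if its commit record reached the journal. */
int txn_committed(const struct journal *j, uint32_t txn_id)
{
    for (size_t i = 0; i < j->count; i++)
        if (j->recs[i].type == REC_COMMIT && j->recs[i].txn_id == txn_id)
            return 1;
    return 0;
}

/* Recovery: replay writes of committed transactions in log order.
 * Returns the number of block writes applied to `disk`. */
size_t journal_replay(const struct journal *j, uint8_t disk[][BLOCK_SIZE],
                      size_t nblocks)
{
    size_t applied = 0;
    for (size_t i = 0; i < j->count; i++) {
        const struct journal_record *r = &j->recs[i];
        if (r->type != REC_WRITE || r->block_no >= nblocks)
            continue;
        if (!txn_committed(j, r->txn_id))
            continue;  /* crash hit before commit: discard this write */
        memcpy(disk[r->block_no], r->data, BLOCK_SIZE);
        applied++;
    }
    return applied;
}
```

If a crash interrupts a transaction before its commit record is written, replay skips it entirely, so the disk never shows a half-applied update.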

Sources