Filesystem Internals - Expanded Project Guides
Generated from: FILESYSTEM_INTERNALS_LEARNING_PROJECTS.md
Overview
Filesystems are one of the most elegant abstractions in computing: they transform raw disk blocks into the familiar hierarchy of files and directories we use daily. This expanded project series takes you from parsing raw filesystem structures to building your own filesystems with journaling and copy-on-write semantics.
Core Concepts Covered:
- On-disk layout (superblocks, inode tables, data blocks, block groups)
- Metadata management (inodes, directory entries, permissions, timestamps)
- Block allocation (bitmap vs extent-based, fragmentation, free space tracking)
- Journaling (write-ahead logging, crash recovery, consistency guarantees)
- VFS layer (how Linux abstracts different filesystems behind one interface)
- Caching (page cache, buffer cache, write-back vs write-through)
- Directory implementation (linear lists vs hash tables vs B-trees)
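To make the journaling concept in the list above concrete, here is a minimal in-memory sketch of write-ahead logging. Everything in it (the `disk` and `journal` structs, the tiny block size, the function names) is a hypothetical simplification for illustration, not how ext4's journal is actually laid out. The idea it shows: updates go to the journal first, a commit mark makes them durable, and recovery replays only committed transactions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE    16  /* toy block size, assumed for the sketch */
#define NUM_BLOCKS  8
#define JOURNAL_MAX 4   /* max block updates per transaction */

/* Hypothetical journal holding one transaction at a time. */
struct journal {
    uint32_t count;                       /* logged block updates */
    uint32_t target[JOURNAL_MAX];         /* home location of each update */
    uint8_t  data[JOURNAL_MAX][BLK_SIZE]; /* new block contents */
    int      committed;                   /* commit record written? */
};

struct disk {
    uint8_t blocks[NUM_BLOCKS][BLK_SIZE]; /* home locations */
    struct journal j;
};

/* Step 1: record the new contents in the journal, not in place. */
int journal_log(struct disk *d, uint32_t blk, const uint8_t *data)
{
    assert(blk < NUM_BLOCKS);
    if (d->j.committed || d->j.count == JOURNAL_MAX)
        return -1;
    d->j.target[d->j.count] = blk;
    memcpy(d->j.data[d->j.count], data, BLK_SIZE);
    d->j.count++;
    return 0;
}

/* Step 2: the commit mark is the point of no return. */
void journal_commit(struct disk *d) { d->j.committed = 1; }

/* Step 3: copy committed updates to their home locations, then reset. */
void journal_checkpoint(struct disk *d)
{
    if (!d->j.committed)
        return;
    for (uint32_t i = 0; i < d->j.count; i++)
        memcpy(d->blocks[d->j.target[i]], d->j.data[i], BLK_SIZE);
    d->j.count = 0;
    d->j.committed = 0;
}

/* After a crash: replay a committed transaction, discard a partial one. */
void journal_recover(struct disk *d)
{
    if (d->j.committed)
        journal_checkpoint(d);
    else
        d->j.count = 0;
}
```

A crash before `journal_commit` leaves the home blocks untouched, and a crash after it is repaired by `journal_recover`. A real journal would also need write ordering (barriers or flushes) and checksums on the log itself, which this sketch omits.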
Project Index
| # | Project | Difficulty | Time | Key Focus |
|---|---|---|---|---|
| 1 | Hex Dump Disk Explorer | Beginner-Intermediate | 1-2 weeks | Binary parsing, ext2 structures, on-disk layout |
| 2 | In-Memory Filesystem with FUSE | Intermediate | 2-3 weeks | VFS operations, FUSE API, inode management |
| 3 | Persistent Block-Based Filesystem | Intermediate-Advanced | 3-4 weeks | Block allocation, bitmaps, indirect pointers |
| 4 | Journaling Layer | Advanced | 2-3 weeks | Write-ahead logging, crash recovery, transactions |
| 5 | Filesystem Comparison Tool | Intermediate | 2 weeks | Performance analysis, FAT32 parsing, benchmarking |
| 6 | Production-Grade Copy-on-Write Filesystem | Advanced | 1-2 months | COW semantics, snapshots, checksums, garbage collection |
Learning Paths
Path 1: Foundation Builder (Recommended for Beginners)
Goal: Understand how filesystems work from first principles
- P01: Hex Dump Disk Explorer (1-2 weeks)
  - Parse raw ext2 structures
  - Build mental model of on-disk layout
  - Understand inodes, directories, block pointers
- P02: In-Memory FUSE Filesystem (2-3 weeks)
  - Implement VFS operations (getattr, read, write, mkdir)
  - See what the kernel does for every file operation
  - Most satisfying “aha moment”: when `ls` and `cat` just work
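As a taste of the "parse raw ext2 structures" step in P01, the sketch below reads a few superblock fields from a disk image already loaded into memory. The field offsets, the 1024-byte superblock offset, the little-endian encoding, and the magic value `0xEF53` come from the ext2 on-disk format; the `sb_info` struct and the function names are hypothetical and cover only a handful of fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXT2_SB_OFFSET 1024   /* superblock starts 1024 bytes into the volume */
#define EXT2_SB_SIZE   1024
#define EXT2_MAGIC     0xEF53

/* Hypothetical summary of the fields this sketch extracts. */
struct sb_info {
    uint32_t inodes_count;
    uint32_t blocks_count;
    uint32_t block_size;       /* derived: 1024 << s_log_block_size */
    uint32_t blocks_per_group;
    uint32_t inodes_per_group;
};

/* ext2 stores integers little-endian; decode byte by byte so the
   code behaves the same on any host. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the image is too small, the magic
   number does not match, or the block size field looks implausible. */
int parse_ext2_superblock(const uint8_t *img, size_t len, struct sb_info *out)
{
    assert(img != NULL && out != NULL);
    if (len < EXT2_SB_OFFSET + EXT2_SB_SIZE)
        return -1;

    const uint8_t *sb = img + EXT2_SB_OFFSET;
    if (le16(sb + 56) != EXT2_MAGIC)      /* s_magic */
        return -1;

    uint32_t log_bs = le32(sb + 24);      /* s_log_block_size */
    if (log_bs > 6)                       /* sanity bound chosen for this sketch */
        return -1;

    out->inodes_count     = le32(sb + 0);  /* s_inodes_count */
    out->blocks_count     = le32(sb + 4);  /* s_blocks_count */
    out->block_size       = 1024u << log_bs;
    out->blocks_per_group = le32(sb + 32); /* s_blocks_per_group */
    out->inodes_per_group = le32(sb + 40); /* s_inodes_per_group */
    return 0;
}
```

In practice you would create a test image with `mke2fs` from e2fsprogs, read it into a buffer, and compare the parsed values with what `dumpe2fs` reports.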
Path 2: Production Systems (For Intermediate Developers)
Goal: Build a complete, persistent, crash-consistent filesystem
- Complete Path 1 first
- P03: Persistent Block-Based Filesystem (3-4 weeks)
  - Add persistence to disk images
  - Implement bitmap-based block allocation
  - Handle indirect blocks for large files
- P04: Journaling Layer (2-3 weeks)
  - Add write-ahead logging
  - Implement crash recovery
  - Understand why ext4 can mount quickly after a crash (it replays the journal instead of running a full fsck)
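The bitmap-based allocation mentioned for P03 can be sketched in a few lines. This is a minimal first-fit allocator over an in-memory bitmap, assuming one bit per block with bit 0 of byte 0 standing for block 0; the function names and the `NO_BLOCK` sentinel are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NO_BLOCK UINT32_MAX   /* hypothetical "allocation failed" sentinel */

/* One bit per block: 1 = in use, 0 = free. */
int bitmap_test(const uint8_t *bm, uint32_t i)
{
    return (bm[i / 8] >> (i % 8)) & 1;
}

void bitmap_set(uint8_t *bm, uint32_t i)
{
    bm[i / 8] |= (uint8_t)(1u << (i % 8));
}

void bitmap_clear(uint8_t *bm, uint32_t i)
{
    bm[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* First-fit: scan for the lowest free block, mark it used, return it.
   Returns NO_BLOCK when every block is taken. */
uint32_t alloc_block(uint8_t *bm, uint32_t nblocks)
{
    for (uint32_t i = 0; i < nblocks; i++) {
        if (!bitmap_test(bm, i)) {
            bitmap_set(bm, i);
            return i;
        }
    }
    return NO_BLOCK;
}

/* Freeing a block that is not allocated indicates a bug in the caller. */
void free_block(uint8_t *bm, uint32_t nblocks, uint32_t i)
{
    assert(i < nblocks);
    assert(bitmap_test(bm, i));
    bitmap_clear(bm, i);
}
```

First-fit is the simplest policy, not necessarily the best one. A persistent filesystem would also write the modified bitmap block back to disk, and could start the scan near the file's previous block to reduce fragmentation.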
Path 3: Comparative Analysis (For Breadth)
Goal: Understand why different filesystems exist
- P01: Hex Dump Disk Explorer (foundation)
- P05: Filesystem Comparison Tool (2 weeks)
  - Parse FAT32 and ext4 structures
  - Measure performance, fragmentation, space efficiency
  - Understand tradeoffs between filesystem designs
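For the FAT32 parsing and fragmentation measurement in P05, the core operation is following a file's cluster chain through the file allocation table. The sketch below assumes the FAT has already been read into a host-endian `uint32_t` array; the function name and the "extents" metric are illustrative choices. The 28-bit mask, the end-of-chain threshold `0x0FFFFFF8`, and the rule that data clusters start at 2 come from the FAT32 format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT32_MASK 0x0FFFFFFFu  /* top 4 bits of each entry are reserved */
#define FAT32_EOC  0x0FFFFFF8u  /* values >= this mark end of chain */

/* Walks the chain starting at cluster `first`.
   Returns the number of clusters in the chain, or -1 if the chain
   leaves the table, hits a free or reserved entry, or loops.
   On success *extents holds the number of contiguous runs, a rough
   fragmentation measure (1 means the file is fully contiguous). */
long fat32_walk_chain(const uint32_t *fat, uint32_t nentries,
                      uint32_t first, uint32_t *extents)
{
    assert(fat != NULL && extents != NULL);
    long count = 0;
    uint32_t runs = 1;
    uint32_t c = first;

    for (;;) {
        if (c < 2 || c >= nentries)
            return -1;                  /* invalid cluster number */
        count++;
        if ((uint64_t)count > nentries)
            return -1;                  /* longer than the table: a cycle */

        uint32_t next = fat[c] & FAT32_MASK;
        if (next >= FAT32_EOC) {
            *extents = runs;
            return count;
        }
        if (next != c + 1)
            runs++;                     /* jump to a non-adjacent cluster */
        c = next;
    }
}
```

Comparing the cluster count with the extent count per file gives a simple fragmentation figure that can be set against the extent counts ext4 reports for the same workload.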
Path 4: Modern Filesystem Architecture (For Advanced Developers)
Goal: Build a modern, production-grade filesystem
- Complete Paths 1 and 2
- P06: Copy-on-Write Filesystem (1-2 months)
  - Implement copy-on-write block allocation
  - Add snapshot support
  - Implement data checksums and scrubbing
  - Build garbage collection
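The first, second, and fourth bullets of P06 fit together more tightly than the list suggests, and a toy model shows why. In the sketch below, every write goes to a freshly allocated block, a snapshot is just a copy of the block-pointer table, and reference counts tell the garbage collector which blocks nobody points to anymore. All names and sizes are hypothetical, the "tree" is a flat pointer array rather than a real B-tree, and checksums are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE    8
#define NUM_PHYS    8   /* physical blocks in the store */
#define NUM_LOGICAL 4   /* logical blocks per tree */

struct store {
    uint8_t data[NUM_PHYS][BLK_SIZE];
    uint8_t refcnt[NUM_PHYS];   /* 0 = free (reclaimable) */
};

/* Stand-in for a filesystem tree: logical block -> physical block. */
struct tree {
    int ptr[NUM_LOGICAL];       /* -1 = hole */
};

void store_init(struct store *s) { memset(s, 0, sizeof *s); }

void tree_init(struct tree *t)
{
    for (int i = 0; i < NUM_LOGICAL; i++)
        t->ptr[i] = -1;
}

/* Lowest free physical block, or -1 if the store is full. */
static int alloc_phys(struct store *s)
{
    for (int i = 0; i < NUM_PHYS; i++) {
        if (s->refcnt[i] == 0) {
            s->refcnt[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Copy-on-write: never overwrite the old block. Write the new data
   elsewhere, swing the pointer, then drop one reference on the old block. */
int cow_write(struct store *s, struct tree *t, int lblk, const uint8_t *buf)
{
    assert(lblk >= 0 && lblk < NUM_LOGICAL);
    int fresh = alloc_phys(s);
    if (fresh < 0)
        return -1;
    memcpy(s->data[fresh], buf, BLK_SIZE);

    int old = t->ptr[lblk];
    t->ptr[lblk] = fresh;
    if (old >= 0)
        s->refcnt[old]--;       /* reaching 0 makes it garbage */
    return 0;
}

/* A snapshot copies the pointers and takes a reference on each block;
   no data is copied. */
void snapshot(struct store *s, const struct tree *live, struct tree *snap)
{
    *snap = *live;
    for (int i = 0; i < NUM_LOGICAL; i++)
        if (snap->ptr[i] >= 0)
            s->refcnt[snap->ptr[i]]++;
}
```

Because old blocks are never modified in place, the snapshot keeps seeing the data it captured while the live tree moves on. In this toy version "garbage collection" is immediate (a block is reusable as soon as its count hits zero); a real design would have to decide when and how to reclaim space on disk.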
Prerequisites
Essential Knowledge
- C Programming: Pointers, structs, dynamic memory allocation
- Binary Data: Hex/binary familiarity, endianness, bit manipulation
- Operating Systems: Basic understanding of files, directories, permissions
Helpful But Not Required
- Linux command line proficiency
- Experience with system calls (open, read, write, lseek)
- Understanding of disk drives and storage
Development Environment
# Required packages (Ubuntu/Debian)
sudo apt install build-essential libfuse3-dev pkg-config
# For testing (hexdump ships in bsdextrautils on recent releases; older ones use bsdmainutils)
sudo apt install bsdextrautils fdisk e2fsprogs dosfstools
Key References
| Topic | Book | Chapters |
|---|---|---|
| Filesystem fundamentals | “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau | Ch. 39-40, 42-43 |
| ext2/ext4 structures | “Understanding the Linux Kernel” by Bovet & Cesati | Ch. 12, 18 |
| File operations | “The Linux Programming Interface” by Michael Kerrisk | Ch. 4-5, 14-15, 18 |
| Binary I/O in C | “C Programming: A Modern Approach” by K.N. King | Ch. 22 |
| Performance analysis | “Systems Performance” by Brendan Gregg | Ch. 8 |
Filesystem Feature Comparison
| Feature | ext2 | ext4 | FAT32 | NTFS | ZFS/Btrfs |
|---|---|---|---|---|---|
| Journaling | No | Yes | No | Yes | CoW instead |
| Max file size | 2TB | 16TB | 4GB | 16EB | 16EB |
| Block allocation | Bitmap | Extents | FAT chain | B-tree | CoW B-tree |
| Snapshots | No | No | No | VSS | Native |
| Checksums | No | Metadata | No | No | All data |
| Permissions | Unix | Unix | None | ACLs | Unix+ACLs |
| Fragmentation | Medium | Low | High | Medium | Workload-dependent (CoW rewrites scatter blocks) |
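Several entries in the "Max file size" row fall out of simple arithmetic on field widths, which the sketch below works through. The function names are invented for this example, and the limits are the commonly cited ones for 4 KiB blocks: ext2 is capped by a 32-bit count of 512-byte sectors in the inode (about 2 TiB) even though its block map could address more, ext4 extents use 32-bit logical block numbers, and FAT32 stores the file size in a 32-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable through the ext2 block map: 12 direct pointers
   plus single, double, and triple indirect blocks, where each
   indirect block holds block_size / 4 four-byte pointers. */
uint64_t ext2_blockmap_limit(uint32_t block_size)
{
    assert(block_size >= 1024);
    uint64_t p = block_size / 4;
    uint64_t blocks = 12 + p + p * p + p * p * p;
    return blocks * block_size;
}

/* The inode's 32-bit i_blocks field counts 512-byte sectors,
   which caps a file at roughly 2^32 * 512 bytes = 2 TiB. */
uint64_t ext2_sector_count_limit(void)
{
    return ((uint64_t)1 << 32) * 512;
}

/* The effective ext2 limit is whichever bound is hit first. */
uint64_t ext2_max_file_size(uint32_t block_size)
{
    uint64_t a = ext2_blockmap_limit(block_size);
    uint64_t b = ext2_sector_count_limit();
    return a < b ? a : b;
}

/* ext4 extents address files with 32-bit logical block numbers. */
uint64_t ext4_extent_limit(uint32_t block_size)
{
    return ((uint64_t)1 << 32) * block_size;
}

/* FAT32 directory entries store the file size in 32 bits. */
uint64_t fat32_max_file_size(void)
{
    return UINT32_MAX;   /* 4 GiB minus one byte */
}
```

With 4 KiB blocks the ext2 block map alone would reach about 4 TiB, so the sector counter is the binding limit; with 1 KiB blocks the block map is the tighter bound at roughly 16 GiB. The table's figures therefore assume 4 KiB blocks.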
Project Progression
P01: Hex Dump Explorer
│
│ "I can SEE what's on disk"
│
▼
P02: In-Memory FUSE ────────────► P05: Comparison Tool
│                                 │
│ "I can MOUNT my own FS"         "I understand TRADEOFFS"
│
▼
P03: Persistent Filesystem
│
│ "Data SURVIVES reboots"
│
▼
P04: Journaling Layer
│
│ "Data survives CRASHES"
│
▼
P06: Copy-on-Write Filesystem
│
"Modern production-grade design"
Expected Outcomes
After completing these projects, you will be able to:
- Read raw disk images and identify filesystem structures by sight
- Implement VFS operations (getattr, read, write, mkdir, unlink)
- Design on-disk layouts balancing metadata overhead and performance
- Implement block allocation with bitmaps and indirect pointers
- Add journaling for crash consistency
- Analyze filesystem performance and explain tradeoffs
- Build modern COW filesystems with snapshots and checksums
- Answer technical interview questions about filesystem internals