Filesystem Internals Learning Projects
A project-based learning series for mastering filesystem internals through hands-on C programming, from parsing raw disk structures to building a copy-on-write filesystem with snapshots and checksums.
Series Overview
Filesystems are one of the most elegant abstractions in computing: they transform raw disk blocks into the familiar hierarchy of files and directories we use daily. This series takes you from reading raw bytes to building a complete, crash-consistent filesystem with snapshots.
Core Concepts Covered
| Concept | What It Covers |
|---|---|
| On-disk layout | Superblock, inode tables, data blocks, block groups |
| Metadata management | Inodes, directory entries, permissions, timestamps |
| Block allocation | Bitmap vs extent-based, fragmentation, free space tracking |
| Journaling | Write-ahead logging, crash recovery, consistency guarantees |
| VFS layer | How Linux abstracts different filesystems behind one interface |
| Caching | Page cache, buffer cache, write-back vs write-through |
| Directory implementation | Linear lists vs hash tables vs B-trees |
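To make the block allocation row concrete, here is a minimal sketch of bitmap-based free space tracking with first-fit allocation. The names (`block_bitmap`, `bitmap_alloc`, `NUM_BLOCKS`) and the 64-block volume size are assumptions chosen for illustration, not taken from any of the projects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 64u  /* hypothetical tiny volume, chosen for illustration */

/* One bit per block: 1 = in use, 0 = free. */
struct block_bitmap {
    uint8_t bits[NUM_BLOCKS / 8];
};

/* Mark every block as free. */
void bitmap_init(struct block_bitmap *bm)
{
    memset(bm->bits, 0, sizeof bm->bits);
}

/* Return 1 if block `blk` is in use, 0 if it is free. */
int bitmap_is_used(const struct block_bitmap *bm, uint32_t blk)
{
    assert(blk < NUM_BLOCKS);
    return (bm->bits[blk / 8] >> (blk % 8)) & 1;
}

/* First-fit allocation: claim the lowest free block, or return -1 if full. */
int32_t bitmap_alloc(struct block_bitmap *bm)
{
    for (uint32_t blk = 0; blk < NUM_BLOCKS; blk++) {
        if (!bitmap_is_used(bm, blk)) {
            bm->bits[blk / 8] |= (uint8_t)(1u << (blk % 8));
            return (int32_t)blk;
        }
    }
    return -1;
}

/* Release a block so a later allocation can reuse it. */
void bitmap_free(struct block_bitmap *bm, uint32_t blk)
{
    assert(blk < NUM_BLOCKS);
    bm->bits[blk / 8] &= (uint8_t)~(1u << (blk % 8));
}
```

A linear scan like this is simple, but each allocation costs time proportional to the number of blocks. That cost is one reason extent-based designs track runs of free blocks instead of individual bits.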
Projects in This Series
| # | Project | Difficulty | Time | What You'll Build |
|---|---|---|---|---|
| 1 | Hex Dump Disk Explorer | Intermediate | 1-2 weeks | CLI tool to visualize ext2/ext4 filesystem structures |
| 2 | In-Memory Filesystem with FUSE | Advanced | 2-3 weeks | Complete mountable RAM filesystem |
| 3 | Persistent Block-Based Filesystem | Expert | 3-4 weeks | Mini ext2 with bitmap allocation and indirect blocks |
| 4 | Journaling Layer | Master | 2-3 weeks | Write-ahead logging for crash consistency |
| 5 | Filesystem Comparison Tool | Advanced | 2 weeks | Benchmark and analyze ext4, FAT32, and custom FS |
| C | Copy-on-Write Filesystem (Capstone) | Master | 1-2 months | ZFS/Btrfs-style COW with snapshots and checksums |
Learning Path
┌───────────────────────────────────────────────────────────────┐
│                    RECOMMENDED PROGRESSION                    │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  P1: Hex Dump Explorer                                        │
│  └── Understand on-disk structures                            │
│           │                                                   │
│           ▼                                                   │
│  P2: In-Memory FUSE                                           │
│  └── Learn VFS operations and syscall interface               │
│           │                                                   │
│           ├─────────────────────┬──────────────────┐          │
│           ▼                     ▼                  ▼          │
│  P3: Persistent FS       P5: Comparison        (Breadth)      │
│  └── Block allocation         Tool                            │
│           │                                                   │
│           ▼                                                   │
│  P4: Journaling                                               │
│  └── Crash consistency                                        │
│           │                                                   │
│           ▼                                                   │
│  CAPSTONE: Copy-on-Write FS                                   │
│  └── Modern filesystem design                                 │
│                                                               │
└───────────────────────────────────────────────────────────────┘
Prerequisites
Before starting this series, you should have:
- C Programming: Comfortable with pointers, structs, memory management
- Binary Data: Understanding of hex, endianness, bit manipulation
- Linux Basics: Command line, file operations, basic system calls
- Data Structures: Arrays, linked lists, trees (helpful for later projects)
Key Resources
Books
| Topic | Book | Why It Helps |
|---|---|---|
| Filesystem fundamentals | "Operating Systems: Three Easy Pieces" by Arpaci-Dusseau | Ch. 39-43 cover everything from inodes to journaling |
| ext2/ext4 implementation | "Understanding the Linux Kernel" by Bovet & Cesati | Ch. 12 & 18 for VFS and ext2/3 internals |
| System programming | "The Linux Programming Interface" by Kerrisk | File I/O, stat, directories |
| Performance analysis | "Systems Performance" by Brendan Gregg | Ch. 8 for filesystem benchmarking |
| Binary parsing | "C Programming: A Modern Approach" by K.N. King | Ch. 22 for binary file I/O |
Online References
- The Second Extended File System - Official ext2 documentation
- OSDev Wiki - Ext2 - Community filesystem documentation
- FUSE Tutorial - Comprehensive FUSE guide
- Linux Kernel Filesystem Docs
Project Comparison
| Feature | P1 | P2 | P3 | P4 | P5 | Capstone |
|---|---|---|---|---|---|---|
| Read filesystem structures | ✓ |  | ✓ |  | ✓ | ✓ |
| Write filesystem structures |  | ✓ | ✓ | ✓ |  | ✓ |
| FUSE integration |  | ✓ | ✓ | ✓ |  | ✓ |
| Persistence |  |  | ✓ | ✓ |  | ✓ |
| Crash recovery |  |  |  | ✓ |  | ✓ |
| Multiple FS formats | ✓ |  |  |  | ✓ |  |
| Performance benchmarking |  |  |  |  | ✓ |  |
| Snapshots |  |  |  |  |  | ✓ |
| Data checksums |  |  |  |  |  | ✓ |
What You'll Understand After This Series
- Why filesystems exist: The abstraction layer between raw disk blocks and user files
- How data is organized: Superblocks, inodes, block groups, directory entries
- Allocation strategies: Bitmaps vs FAT chains vs extents
- Crash consistency: Why journaling matters and how write-ahead logging works
- Performance tradeoffs: Why ext4 generally outperforms FAT32 on large files and large directories, and how SSDs shift the cost model away from seek time
- Modern designs: Copy-on-write, snapshots, self-healing filesystems
Interview Preparation
Completing these projects prepares you for systems-level interview questions:
- "What is an inode, and what information does it store?"
- "How does ext4 resolve a file path to disk blocks?"
- "What happens if power fails during a file write?"
- "Why does FAT32 have a 4GB file size limit?"
- "How would you implement snapshots efficiently?"
Each project includes specific interview questions and expected answers.
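As one illustration, the FAT32 question comes down to field width: each 32-byte directory entry stores the file size in a 32-bit little-endian field at byte offset 28, so the largest representable size is 2^32 - 1 bytes. The sketch below decodes that field from a raw entry. The function and macro names are hypothetical helpers, not part of any project code.

```c
#include <assert.h>
#include <stdint.h>

#define FAT_DIRENT_SIZE      32  /* bytes per short directory entry */
#define FAT_DIRENT_SIZE_OFF  28  /* offset of the 32-bit file size field */

/* Decode the little-endian file size from a raw 32-byte directory entry. */
uint32_t fat_dirent_file_size(const uint8_t entry[FAT_DIRENT_SIZE])
{
    assert(entry);
    const uint8_t *p = entry + FAT_DIRENT_SIZE_OFF;
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Largest size the field can hold: 2^32 - 1 bytes, one byte short of 4 GiB. */
uint64_t fat32_max_file_size(void)
{
    return UINT32_MAX;
}
```

Decoding byte by byte rather than casting the buffer to a `uint32_t *` keeps the helper independent of host endianness and alignment, a habit that carries over to every on-disk parser in the series.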
Getting Started
- Environment Setup: Linux (native or VM) with development tools

      sudo apt install build-essential libfuse3-dev

- Create Test Images: Each project shows how to create test disk images

      dd if=/dev/zero of=test.img bs=1M count=10
      mkfs.ext2 test.img

- Start with Project 1: Build the foundational understanding of on-disk structures
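Once `test.img` exists, a quick first check before writing a full parser is to look for the ext2 magic number. The ext2 superblock starts 1024 bytes into the volume, and its `s_magic` field (value `0xEF53`, little-endian) sits at byte offset 56 within it. The helper below is a minimal sketch; `has_ext2_magic` and the macro names are hypothetical and not part of Project 1's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define EXT2_SUPERBLOCK_OFFSET 1024    /* superblock starts 1024 bytes in */
#define EXT2_MAGIC_OFFSET      56      /* s_magic within the superblock */
#define EXT2_SUPER_MAGIC       0xEF53  /* shared by ext2, ext3, and ext4 */

/* Returns 1 if the image carries the ext2 magic, 0 if not, -1 on I/O error. */
int has_ext2_magic(const char *path)
{
    assert(path);

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    uint8_t raw[2];
    int ok = fseek(f, EXT2_SUPERBLOCK_OFFSET + EXT2_MAGIC_OFFSET, SEEK_SET) == 0
          && fread(raw, 1, sizeof raw, f) == sizeof raw;
    fclose(f);
    if (!ok)
        return -1;

    /* On-disk fields are little-endian; assemble the value byte by byte. */
    uint16_t magic = (uint16_t)(raw[0] | (raw[1] << 8));
    return magic == EXT2_SUPER_MAGIC;
}
```

Calling `has_ext2_magic("test.img")` on an image produced by `mkfs.ext2` would be expected to return 1. An all-zero image that has not been formatted returns 0, and a file too short to contain the field returns -1.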
License
Educational materials for personal learning. See individual project files for specific licensing.