
FILESYSTEM INTERNALS LEARNING PROJECTS

Deeply Understanding Filesystems Through Building

Filesystems are one of the most elegant abstractions in computing—they transform raw disk blocks into the familiar hierarchy of files and directories we use daily. To truly understand them, you need to grapple with their core challenges: block management, metadata organization, journaling for crash consistency, and the VFS abstraction layer.

Core Concept Analysis

Understanding filesystems breaks down into these fundamental building blocks:

Concept                   What It Covers
------------------------  -----------------------------------------------------------
On-disk layout            Superblock, inode tables, data blocks, block groups
Metadata management       Inodes, directory entries, permissions, timestamps
Block allocation          Bitmap vs extent-based, fragmentation, free space tracking
Journaling                Write-ahead logging, crash recovery, consistency guarantees
VFS layer                 How Linux abstracts different filesystems behind one interface
Caching                   Page cache, buffer cache, write-back vs write-through
Directory implementation  Linear lists vs hash tables vs B-trees

Project 1: Hex Dump Disk Explorer

  • File: FILESYSTEM_INTERNALS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Filesystems / Binary Parsing
  • Software or Tool: Hex Editors
  • Main Book: “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau

What you’ll build: A command-line tool that reads raw disk images or block devices and visualizes filesystem structures—superblocks, inode tables, directory entries, and data blocks—in human-readable format.

Why it teaches filesystems: Before you can build a filesystem, you need to see one. This project forces you to understand exactly how bytes on disk translate to the files and directories you see. You’ll parse real ext2/ext4 structures and decode them byte-by-byte.

Core challenges you’ll face:

  • Reading and interpreting the superblock at byte 1024 (understanding magic numbers, block sizes, inode counts)
  • Navigating block groups and locating inode tables
  • Decoding inode structures (permissions, timestamps, block pointers, indirect blocks)
  • Parsing directory entries and following file paths through the tree
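
The first challenge above can be sketched as follows. This is a minimal illustration, not a full ext2 parser: it assumes the caller has already read the 1024-byte superblock from byte offset 1024 of the image, and it decodes only a handful of fields at their documented ext2 offsets. The `sb_summary` struct and `parse_superblock` name are hypothetical, and the cap on `s_log_block_size` is an arbitrary sanity bound chosen for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SB_OFFSET 1024    /* superblock starts 1024 bytes into the volume */
#define EXT2_MAGIC     0xEF53  /* s_magic value for ext2/ext3/ext4 */

/* Hypothetical summary struct: the field selection is ours, not ext2's. */
struct sb_summary {
    uint32_t inodes_count;
    uint32_t blocks_count;
    uint32_t free_blocks;
    uint32_t block_size;
    uint32_t blocks_per_group;
    uint32_t inodes_per_group;
};

/* ext2 stores fields little-endian; decoding byte by byte keeps the
 * code correct on big-endian hosts too. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* sb points at the 1024 bytes read from EXT2_SB_OFFSET.
 * Returns 0 on success, -1 if the data does not look like ext2. */
int parse_superblock(const uint8_t *sb, struct sb_summary *out)
{
    if (le16(sb + 56) != EXT2_MAGIC)        /* s_magic */
        return -1;

    uint32_t log_block_size = le32(sb + 24); /* s_log_block_size */
    if (log_block_size > 6)                  /* sanity cap (assumption) */
        return -1;

    out->inodes_count     = le32(sb + 0);    /* s_inodes_count */
    out->blocks_count     = le32(sb + 4);    /* s_blocks_count */
    out->free_blocks      = le32(sb + 12);   /* s_free_blocks_count */
    out->block_size       = 1024u << log_block_size;
    out->blocks_per_group = le32(sb + 32);   /* s_blocks_per_group */
    out->inodes_per_group = le32(sb + 40);   /* s_inodes_per_group */
    return 0;
}
```

A caller would typically `fseek` to `EXT2_SB_OFFSET`, `fread` 1024 bytes into a buffer, and pass that buffer in. The number of block groups then follows from `blocks_count` and `blocks_per_group`, rounded up.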

Key Concepts:

  • Superblock structure: The Second Extended File System - nongnu.org - Official ext2 documentation
  • Block groups and inodes: “Operating Systems: Three Easy Pieces” Ch. 40 (File System Implementation)
  • Binary file I/O in C: “C Programming: A Modern Approach” Ch. 22 by K.N. King
  • Disk layout visualization: OSDev Wiki - Ext2

  • Difficulty: Beginner-Intermediate
  • Time estimate: 1-2 weeks
  • Prerequisites: C basics, understanding of structs and pointers, hex/binary familiarity

Real world outcome:

  • Run ./fsexplore disk.img and see output like:
    SUPERBLOCK at offset 1024:
      Magic: 0xEF53 ✓ (ext2 signature)
      Block size: 4096 bytes
      Inodes: 128016
      Free blocks: 45231
    
    INODE #2 (root directory):
      Type: Directory
      Size: 4096 bytes
      Permissions: drwxr-xr-x
      Links: 23
      Direct blocks: [258, 0, 0, ...]
    
    DIRECTORY at block 258:
      [2]  .
      [2]  ..
      [11] lost+found
      [12] home
      [13] etc
    

Learning milestones:

  1. After parsing the superblock - you understand that filesystems are just structured data at known offsets
  2. After reading inodes - you see how Unix “everything is a file” actually works
  3. After traversing directories - you understand path resolution and why .. exists
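
Path resolution boils down to repeating one step: search a directory block for a name and get back an inode number. The sketch below shows that step for a single ext2 directory block, using the classic record layout (4-byte inode, 2-byte record length, 1-byte name length, 1-byte file type, then the name, which is not NUL-terminated). The `dir_lookup` name is hypothetical, and a real tool would loop over every data block of the directory.

```c
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Searches one directory block for `name`.
 * Returns the inode number, or 0 if the name is absent or the block
 * looks corrupt (inode 0 is never a valid ext2 inode). */
uint32_t dir_lookup(const uint8_t *block, size_t block_size, const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    while (off + 8 <= block_size) {
        uint32_t inode    = rd32(block + off);
        uint16_t rec_len  = rd16(block + off + 4);
        uint8_t  name_len = block[off + 6];

        /* rec_len is what lets us skip to the next record; a bad value
         * would send us out of the block, so stop instead. */
        if (rec_len < 8 || off + rec_len > block_size)
            return 0;
        if (8u + name_len > rec_len)
            return 0;

        /* inode == 0 marks an unused (deleted) record. */
        if (inode != 0 && name_len == want &&
            memcmp(block + off + 8, name, want) == 0)
            return inode;

        off += rec_len;
    }
    return 0;
}
```

Resolving `/home/user` is then: look up `home` in the root directory (inode 2), read that inode, look up `user` in its data blocks, and so on. The `..` entry is just another record, which is why walking upward needs no special machinery.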

Project 2: In-Memory Filesystem with FUSE

  • File: FILESYSTEM_INTERNALS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Filesystems / Kernel Interface
  • Software or Tool: FUSE
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A complete filesystem that stores files in RAM, mounted as a real directory on your Linux system. Create files, directories, read, write, delete—all backed by your C code.

Why it teaches filesystems: FUSE lets you implement a real, mountable filesystem without writing kernel code. You’ll implement the callbacks behind the core file syscalls (open(), read(), write(), mkdir(), unlink()) and see exactly what happens when a program accesses a file.

Core challenges you’ll face:

  • Implementing the VFS operations (getattr, readdir, open, read, write, create, unlink, mkdir, rmdir)
  • Managing your own inode table and directory structure in memory
  • Handling file descriptors and ensuring consistency
  • Supporting file permissions and timestamps (stat structure)
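
As a rough sketch of the inode-table and stat challenges, the code below keeps a flat table of nodes keyed by full path and fills a `struct stat` from it. It deliberately avoids libfuse headers so it compiles on its own; a real `getattr` callback would wrap this logic (libfuse 3 adds a `struct fuse_file_info *` argument). All `memfs_*` names and the flat-table design are assumptions for illustration. Returning a negated errno value mirrors the convention of the FUSE high-level API.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MEMFS_MAX_NODES 64
#define MEMFS_MAX_PATH  128

/* One entry per file or directory; a flat table keyed by full path is
 * the simplest possible "inode table" (a real design would use a tree). */
struct memfs_node {
    int     in_use;
    char    path[MEMFS_MAX_PATH];
    mode_t  mode;
    nlink_t nlink;
    off_t   size;
};

/* Slot 0 is the root directory. */
static struct memfs_node nodes[MEMFS_MAX_NODES] = {
    { 1, "/", S_IFDIR | 0755, 2, 0 },
};

static struct memfs_node *memfs_find(const char *path)
{
    for (int i = 0; i < MEMFS_MAX_NODES; i++)
        if (nodes[i].in_use && strcmp(nodes[i].path, path) == 0)
            return &nodes[i];
    return NULL;
}

/* Creates a node; returns 0 or a negated errno value. */
int memfs_add(const char *path, mode_t mode)
{
    if (strlen(path) >= MEMFS_MAX_PATH)
        return -ENAMETOOLONG;
    if (memfs_find(path))
        return -EEXIST;
    for (int i = 0; i < MEMFS_MAX_NODES; i++) {
        if (!nodes[i].in_use) {
            nodes[i].in_use = 1;
            strcpy(nodes[i].path, path);
            nodes[i].mode  = mode;
            nodes[i].nlink = S_ISDIR(mode) ? 2 : 1;
            nodes[i].size  = 0;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Core of a getattr callback: look the path up and fill in struct stat. */
int memfs_getattr(const char *path, struct stat *st)
{
    struct memfs_node *n = memfs_find(path);
    if (!n)
        return -ENOENT;
    memset(st, 0, sizeof *st);
    st->st_mode  = n->mode;
    st->st_nlink = n->nlink;
    st->st_size  = n->size;
    return 0;
}
```

Once `getattr` answers correctly for `/`, tools like `ls` and `stat` start working against the mount point, which makes it a natural first callback to implement.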


Key Concepts:

  • VFS abstraction: “Understanding the Linux Kernel” Ch. 12 by Bovet & Cesati
  • File operations syscalls: “The Linux Programming Interface” Ch. 4-5 by Michael Kerrisk
  • FUSE high-level API: libfuse GitHub
  • Directory entry management: Writing a Simple Filesystem Using FUSE

  • Difficulty: Intermediate
  • Time estimate: 2-3 weeks
  • Prerequisites: Project 1 or equivalent understanding, comfort with C pointers and structs

Real world outcome:

$ mkdir /tmp/myfs
$ ./memfs /tmp/myfs
$ cd /tmp/myfs
$ echo "Hello filesystem!" > test.txt
$ cat test.txt
Hello filesystem!
$ ls -la
total 0
drwxr-xr-x 2 user user    0 Dec 18 10:00 .
drwxr-xr-x 3 user user 4096 Dec 18 10:00 ..
-rw-r--r-- 1 user user   18 Dec 18 10:01 test.txt
$ mkdir subdir
$ # Your filesystem is real and mounted!

Learning milestones:

  1. After implementing getattr/readdir - you understand how ls actually works
  2. After implementing read/write - you see buffering and offset management
  3. After implementing create/unlink - you understand inode lifecycle and reference counting
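
The offset management mentioned in the second milestone can be sketched with a growable buffer. The `memfile` struct and function names here are hypothetical; the behavior is what read and write callbacks are generally expected to provide: writes past the end extend the file (zero-filling any hole), and reads are clamped to the current size, returning 0 at end of file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* File contents for one in-memory file. */
struct memfile {
    char  *data;
    size_t size;
};

/* Writes len bytes at offset off, growing the file if needed.
 * Returns the byte count written or a negated errno value. */
ssize_t memfile_write(struct memfile *f, const char *buf, size_t len, off_t off)
{
    if (off < 0)
        return -EINVAL;
    if (len == 0)
        return 0;

    size_t end = (size_t)off + len;
    if (end > f->size) {
        char *p = realloc(f->data, end);
        if (!p)
            return -ENOMEM;
        /* Writing beyond EOF leaves a hole; it must read back as zeros. */
        if ((size_t)off > f->size)
            memset(p + f->size, 0, (size_t)off - f->size);
        f->data = p;
        f->size = end;
    }
    memcpy(f->data + off, buf, len);
    return (ssize_t)len;
}

/* Reads up to len bytes at offset off; returns 0 at or past EOF. */
ssize_t memfile_read(const struct memfile *f, char *buf, size_t len, off_t off)
{
    if (off < 0)
        return -EINVAL;
    if ((size_t)off >= f->size)
        return 0;

    size_t avail = f->size - (size_t)off;
    if (len > avail)
        len = avail;        /* short read near the end of the file */
    memcpy(buf, f->data + off, len);
    return (ssize_t)len;
}
```

Programs such as `cat` keep calling read with an advancing offset until they get 0 back, so returning a short count near the end and 0 at the end is what terminates their loop.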

Project 3: Persistent Block-Based Filesystem

  • File: FILESYSTEM_INTERNALS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Filesystems / Storage
  • Software or Tool: Block Devices
  • Main Book: “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau

What you’ll build: Extend your FUSE filesystem to persist data to a file (simulating a disk). Implement proper block allocation with bitmaps, inode structures, and data blocks—a mini ext2.

Why it teaches filesystems: This is where theory meets practice. You’ll implement the exact same structures used in ext2/ext3—superblock, block group descriptors, inode tables, data block bitmaps—and understand why these structures exist.

Core challenges you’ll face:

  • Designing and writing a superblock with filesystem metadata
  • Implementing bitmap-based block allocation (finding free blocks, marking used/free)
  • Managing indirect blocks for large files (single, double, triple indirection)
  • Handling crash consistency (what happens if power fails mid-write?)
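
Bitmap allocation, the second challenge above, fits in a few lines. This is a first-fit sketch with hypothetical function names: one bit per block, 1 meaning used. Real filesystems add locality heuristics (for example, preferring blocks near the file's existing data), which this version omits.

```c
#include <stdint.h>

/* Returns 1 if the block is marked used, 0 if free. */
int bitmap_test(const uint8_t *bitmap, uint32_t block)
{
    return (bitmap[block / 8] >> (block % 8)) & 1;
}

/* First-fit: claims the lowest-numbered free block.
 * Returns its index, or -1 if every block is in use. */
long bitmap_alloc(uint8_t *bitmap, uint32_t nblocks)
{
    for (uint32_t i = 0; i < nblocks; i++) {
        if (!bitmap_test(bitmap, i)) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            return (long)i;
        }
    }
    return -1;
}

/* Marks a block free again. */
void bitmap_free(uint8_t *bitmap, uint32_t block)
{
    bitmap[block / 8] &= (uint8_t)~(1u << (block % 8));
}

/* Counts free blocks, as a filesystem checker would when validating
 * the free-block count stored in the superblock. */
uint32_t bitmap_count_free(const uint8_t *bitmap, uint32_t nblocks)
{
    uint32_t n = 0;
    for (uint32_t i = 0; i < nblocks; i++)
        if (!bitmap_test(bitmap, i))
            n++;
    return n;
}
```

Because first-fit reuses the lowest free bit, a file that grows after its neighbors were deleted ends up with blocks scattered across old gaps, which is one way fragmentation arises.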

Key Concepts:

  • Difficulty: Intermediate-Advanced
  • Time estimate: 3-4 weeks
  • Prerequisites: Project 2 completed

Real world outcome:

$ ./mkfs.myfs -s 100M disk.img    # Create 100MB filesystem
$ ./myfs disk.img /mnt/myfs       # Mount it
$ cp -r ~/Documents/* /mnt/myfs/  # Copy real files
$ umount /mnt/myfs                # Unmount (non-root FUSE mounts typically use fusermount -u)
$ ./myfs disk.img /mnt/myfs       # Remount - files persist!
$ ls /mnt/myfs
Documents report.pdf photos/
$ ./fsck.myfs disk.img            # Your own filesystem checker
Checking superblock... OK
Checking inode bitmap... OK
Checking block bitmap... OK
Free blocks: 24531/25600

Learning milestones:

  1. After implementing block bitmaps - you understand why fragmentation happens
  2. After implementing indirect blocks - you see why there’s a maximum file size
  3. After adding persistence - you understand the “sync” problem and why databases use WAL
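
The maximum file size in the second milestone falls out of simple arithmetic. The sketch below assumes the ext2-style inode layout: 12 direct pointers plus one single, one double, and one triple indirect pointer, with 4-byte block numbers. Function names are hypothetical.

```c
#include <stdint.h>

/* Number of data blocks addressable through an ext2-style inode. */
uint64_t max_file_blocks(uint32_t block_size)
{
    uint64_t p = block_size / 4;   /* block pointers per indirect block */
    return 12          /* direct */
         + p           /* single indirect */
         + p * p       /* double indirect */
         + p * p * p;  /* triple indirect */
}

/* Upper bound on file size implied by the pointer tree alone. */
uint64_t max_file_bytes(uint32_t block_size)
{
    return max_file_blocks(block_size) * block_size;
}
```

With 1 KiB blocks this gives 16,843,020 blocks (about 16 GiB); with 4 KiB blocks it gives roughly 4 TiB. Real ext2 tops out lower, at 2 TiB, because the inode's block count field is a 32-bit count of 512-byte units, so the pointer tree is only one of the limits your own design will have to consider.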

Project 4: Journaling Layer

  • File: FILESYSTEM_INTERNALS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 5: Master
  • Knowledge Area: Crash Consistency / Databases
  • Software or Tool: WAL
  • Main Book: “Database Internals” by Alex Petrov

What you’ll build: Add write-ahead logging (journaling) to your block-based filesystem. Before any metadata change, log the operation so crashes can be recovered.

Why it teaches filesystems: Journaling is what separates “toy” filesystems from production ones. ext3 added journaling to ext2. You’ll understand why fsck used to take hours and now takes seconds, and why databases and filesystems share the same fundamental insight.

Core challenges you’ll face:

  • Designing a journal format (transaction records, commit blocks)
  • Implementing write-ahead: log first, then apply
  • Recovery: replaying committed transactions after crash
  • Choosing what to journal (metadata-only vs full data journaling)
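
The recovery rule (replay only what was committed) can be sketched against an in-memory journal. Everything here is a simplified assumption: the record format, the 16-byte block size, and the `journal_recover` name are hypothetical, and a real journal would live on disk with checksummed commit blocks. The core idea survives the simplification: a write is replayed only if its transaction has a commit record.

```c
#include <stdint.h>
#include <string.h>

#define BLK 16   /* tiny block size to keep the sketch readable */

enum rec_type { REC_BEGIN, REC_WRITE, REC_COMMIT };

/* One journal record. REC_WRITE carries the full new block contents. */
struct journal_rec {
    enum rec_type type;
    uint32_t      txid;
    uint32_t      block_no;   /* meaningful for REC_WRITE only */
    uint8_t       data[BLK];
};

/* A transaction counts only if its commit record reached the log. */
static int is_committed(const struct journal_rec *log, size_t n, uint32_t txid)
{
    for (size_t i = 0; i < n; i++)
        if (log[i].type == REC_COMMIT && log[i].txid == txid)
            return 1;
    return 0;
}

/* Replays committed writes, in log order, onto the "disk".
 * Returns the number of block writes replayed. Writes belonging to
 * transactions without a commit record are skipped. */
size_t journal_recover(const struct journal_rec *log, size_t n,
                       uint8_t (*disk)[BLK], size_t nblocks)
{
    size_t replayed = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].type != REC_WRITE)
            continue;
        if (log[i].block_no >= nblocks)
            continue;                       /* ignore out-of-range records */
        if (!is_committed(log, n, log[i].txid))
            continue;                       /* incomplete: discard */
        memcpy(disk[log[i].block_no], log[i].data, BLK);
        replayed++;
    }
    return replayed;
}
```

Because each record holds whole-block contents, replay is idempotent: crashing during recovery and running it again produces the same disk state. The quadratic commit lookup is fine for a sketch; a real implementation would scan once for commit records first.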

Key Concepts:

  • Crash consistency problem: “Operating Systems: Three Easy Pieces” Ch. 42 (Crash Consistency: FSCK and Journaling)
  • Write-ahead logging: “Designing Data-Intensive Applications” Ch. 7 by Martin Kleppmann
  • ext3 journaling: Linux Kernel Documentation - ext4
  • Transaction semantics: “Database Internals” Ch. 5 (Transaction Processing and Recovery) by Alex Petrov

  • Difficulty: Advanced
  • Time estimate: 2-3 weeks
  • Prerequisites: Project 3 completed

Real world outcome:

$ ./myfs disk.img /mnt/myfs
$ dd if=/dev/urandom of=/mnt/myfs/bigfile bs=1M count=50 &
# Kill power mid-write (or kill -9 the process)
$ ./myfs disk.img /mnt/myfs
Journal recovery: found 3 transactions...
Transaction 1: inode 15 allocation - COMMITTED, replaying
Transaction 2: block 1024-1087 allocation - COMMITTED, replaying
Transaction 3: inode 15 size update - INCOMPLETE, discarding
Recovery complete. Filesystem consistent.
$ ls /mnt/myfs  # No corruption!

Learning milestones:

  1. After implementing basic journaling - you understand why “safe eject” matters for USB drives
  2. After implementing recovery - you see why ext4 mounts instantly after crashes
  3. After understanding the tradeoffs - you know why databases still use their own journaling

Project 5: Filesystem Comparison Tool

  • File: FILESYSTEM_INTERNALS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 1: Pure Corporate Snoozefest
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Performance Analysis / Filesystems
  • Software or Tool: Benchmarking
  • Main Book: “Systems Performance” by Brendan Gregg

What you’ll build: A tool that mounts and analyzes multiple filesystem types (ext2, FAT32, your custom FS) and produces detailed comparison reports—performance metrics, space efficiency, feature support.

Why it teaches filesystems: Different filesystems make different tradeoffs. By measuring and comparing them, you’ll understand why ext4 is the default on many Linux distributions, why FAT32 persists for USB drives, and why ZFS exists for data integrity.

Core challenges you’ll face:

  • Parsing FAT32 structures (FAT table, directory entries, cluster chains)
  • Measuring fragmentation levels across filesystem types
  • Benchmarking read/write performance with different file sizes
  • Analyzing space overhead (metadata vs data ratio)
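
Following a FAT32 cluster chain and measuring its fragmentation can be done in one pass. The sketch below assumes the FAT has already been loaded into memory as host-endian 32-bit entries; `fat_chain_runs` is a hypothetical name. Two FAT32 facts it relies on: only the low 28 bits of each entry are the cluster number, and values from 0x0FFFFFF8 upward mark end of chain.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT32_MASK 0x0FFFFFFFu  /* upper 4 bits of each entry are reserved */
#define FAT32_EOC  0x0FFFFFF8u  /* values >= this mark end of chain */

/* Walks the chain starting at cluster `first`.
 * Stores the number of clusters visited in *clusters and returns the
 * number of contiguous runs (1 means the file is not fragmented).
 * Returns 0 if the chain is broken: it leaves the table, reaches a
 * free or reserved entry, or loops. */
uint32_t fat_chain_runs(const uint32_t *fat, uint32_t fat_entries,
                        uint32_t first, uint32_t *clusters)
{
    uint32_t runs = 0, count = 0, prev = 0, cur = first;

    /* Valid data clusters start at 2; count < fat_entries bounds loops. */
    while (cur >= 2 && cur < fat_entries && count < fat_entries) {
        if (count == 0 || cur != prev + 1)
            runs++;                 /* a jump starts a new run */
        count++;
        prev = cur;
        cur = fat[cur] & FAT32_MASK;
        if (cur >= FAT32_EOC) {
            *clusters = count;
            return runs;
        }
    }
    *clusters = count;
    return 0;
}
```

One possible fragmentation metric is `runs` divided by `clusters`, averaged over all files. How to define the equivalent number for ext2 or your own filesystem is a design decision for the comparison tool.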

Key Concepts:

  • FAT filesystem structure: “Operating Systems: Three Easy Pieces” Ch. 39 (Interlude: Files and Directories)
  • Filesystem performance characteristics: “Systems Performance” by Brendan Gregg
  • Comparative analysis: Linux Kernel Filesystems Documentation

  • Difficulty: Intermediate
  • Time estimate: 2 weeks
  • Prerequisites: Project 1 or strong understanding of filesystem structures

Real world outcome:

$ ./fscompare ext4.img fat32.img myfs.img

FILESYSTEM COMPARISON REPORT
============================

                    ext4        FAT32       MyFS
-------------------------------------------------
Max filename:       255         255         255
Max file size:      16TB        4GB         2GB
Journaling:         Yes         No          Yes
Permissions:        Unix        None        Unix
Case sensitive:     Yes         No          Yes

SPACE EFFICIENCY (1GB disk, 10000 small files):
Metadata overhead:  2.1%        0.8%        3.2%
Avg fragmentation:  1.2%        15.3%       4.1%
Space wasted:       12MB        156MB       31MB

PERFORMANCE (sequential 100MB write):
Write speed:        245 MB/s    198 MB/s    156 MB/s
Read speed:         312 MB/s    287 MB/s    201 MB/s

Learning milestones:

  1. After parsing FAT32 - you understand why it has a 4GB file limit (each directory entry stores the file size in a 32-bit field)
  2. After measuring fragmentation - you see why ext4 uses extents instead of block pointers
  3. After comparing features - you understand filesystem design tradeoffs

Project Comparison Table

Project                     Difficulty             Time       Depth of Understanding  Fun Factor
--------------------------  ---------------------  ---------  ----------------------  ----------
Hex Dump Disk Explorer      Beginner-Intermediate  1-2 weeks  ⭐⭐⭐                     ⭐⭐⭐
In-Memory FUSE FS           Intermediate           2-3 weeks  ⭐⭐⭐⭐                    ⭐⭐⭐⭐⭐
Persistent Block-Based FS   Intermediate-Advanced  3-4 weeks  ⭐⭐⭐⭐⭐                   ⭐⭐⭐⭐
Journaling Layer            Advanced               2-3 weeks  ⭐⭐⭐⭐⭐                   ⭐⭐⭐
Filesystem Comparison Tool  Intermediate           2 weeks    ⭐⭐⭐⭐                    ⭐⭐⭐⭐

Recommendation

If your goal is to understand filesystems deeply while working in C:

Start with Project 1 (Hex Dump Explorer) - 1-2 weeks. This builds the mental model you need. When you can look at raw bytes and see “that’s the inode table, those are indirect block pointers,” everything else clicks.

Then do Project 2 (In-Memory FUSE) - This is the most satisfying project. When you mount your filesystem and use normal ls, cat, cp commands and they just work, you’ll have an “aha moment” about what the kernel does for every file operation.

Then choose your path:

  • If you want production-level understanding → Project 3 + 4 (persistence and journaling)
  • If you want breadth across filesystem types → Project 5 (comparison tool)

Final Capstone Project: Production-Grade Copy-on-Write Filesystem

What you’ll build: A complete, persistent, FUSE-based filesystem with copy-on-write semantics (like ZFS/Btrfs), snapshots, and data integrity verification (checksums). This is a “real” filesystem you could actually use.

Why it teaches everything: Copy-on-write is the modern approach to filesystem design. It addresses crash consistency, snapshots, and data integrity in one elegant model. Building this requires synthesizing everything from the previous projects.

Core challenges you’ll face:

  • Implementing copy-on-write block allocation (never overwrite, always write new)
  • Building a block-pointer tree (like Btrfs B-trees) for efficient lookups
  • Adding snapshot support (just save the root pointer at a point in time)
  • Implementing data checksums and scrubbing for integrity verification
  • Handling garbage collection of orphaned blocks
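
The first three challenges share one mechanism, sketched below with a deliberately tiny two-level pointer tree held in memory. An update never modifies an existing node: it copies the affected leaf and the root, and returns the new root. Keeping the old root pointer is the snapshot. The `cow_set` and `cow_get` names, the fanout of 4, and the use of 0 for "unmapped" are all assumptions of this sketch; a real design would use on-disk B-tree nodes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FANOUT 4   /* tiny fanout so the whole tree is easy to picture */

struct leaf { uint32_t block[FANOUT]; };       /* logical -> physical block */
struct root { struct leaf *child[FANOUT]; };

/* Returns a NEW root in which logical block idx maps to phys.
 * The old root and every node reachable from it stay untouched, so any
 * saved root pointer keeps describing the old state (a snapshot).
 * Returns NULL if idx is out of range or allocation fails. */
struct root *cow_set(const struct root *old, unsigned idx, uint32_t phys)
{
    if (idx >= FANOUT * FANOUT)
        return NULL;
    unsigned ci = idx / FANOUT, li = idx % FANOUT;

    struct root *nr = malloc(sizeof *nr);
    struct leaf *nl = malloc(sizeof *nl);
    if (!nr || !nl) {
        free(nr);
        free(nl);
        return NULL;
    }

    /* Copy the path from root to leaf; untouched children are shared. */
    if (old)
        *nr = *old;
    else
        memset(nr, 0, sizeof *nr);

    if (nr->child[ci])
        *nl = *nr->child[ci];
    else
        memset(nl, 0, sizeof *nl);

    nl->block[li] = phys;
    nr->child[ci] = nl;
    return nr;
}

/* Looks up a logical block under a given root; 0 means unmapped. */
uint32_t cow_get(const struct root *r, unsigned idx)
{
    if (!r || idx >= FANOUT * FANOUT || !r->child[idx / FANOUT])
        return 0;
    return r->child[idx / FANOUT]->block[idx % FANOUT];
}
```

Nothing is ever freed here, on purpose: deciding when an old leaf is no longer reachable from any live root or snapshot is exactly the garbage-collection challenge in the list above.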

Key Concepts:

  • Copy-on-write semantics: “Operating Systems: Three Easy Pieces” Ch. 43 (Log-structured File Systems)
  • B-tree structures: “Algorithms” by Sedgewick & Wayne (Ch. 3.3 for balanced search trees; B-trees are covered in Ch. 6)
  • ZFS/Btrfs design: Btrfs Wiki Design Documentation
  • Data integrity and checksums: “Designing Data-Intensive Applications” Ch. 7-9 by Kleppmann
  • Garbage collection: “Operating Systems: Three Easy Pieces” Ch. 43

  • Difficulty: Advanced
  • Time estimate: 1-2 months
  • Prerequisites: Projects 1-4 completed

Real world outcome:

$ ./mkfs.cowfs -s 1G mydisk.img
$ ./cowfs mydisk.img /mnt/cow

# Normal operations work
$ cp -r ~/projects /mnt/cow/
$ ls /mnt/cow/projects
myapp/ webapp/ scripts/

# Create a snapshot before dangerous operation
$ ./cowctl /mnt/cow snapshot create before-refactor
Snapshot 'before-refactor' created (root block: 4521)

# Make changes
$ rm -rf /mnt/cow/projects/myapp/src/*
$ echo "oops" > /mnt/cow/projects/myapp/src/main.c

# Restore from snapshot instantly
$ ./cowctl /mnt/cow snapshot restore before-refactor
Restored to snapshot 'before-refactor'
$ ls /mnt/cow/projects/myapp/src/
main.c utils.c config.h  # Everything back!

# Verify data integrity
$ ./cowctl /mnt/cow scrub
Scrubbing filesystem...
Checked 15231 blocks
Verified 15231 checksums
0 errors found

Learning milestones:

  1. After implementing COW - you understand why ZFS and Btrfs rarely need a traditional fsck pass: the on-disk tree is always in a consistent state
  2. After adding snapshots - you see why they’re “free” in COW filesystems
  3. After implementing checksums - you understand end-to-end data integrity
  4. After building garbage collection - you understand the space/performance tradeoffs
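
A scrub pass, as in the third milestone, is a loop that recomputes each block's checksum and compares it with the stored one. The sketch below uses a plain bitwise CRC-32 (the common IEEE polynomial) as a stand-in; production filesystems pick their own algorithms, so treat the choice and the `scrub` signature as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Slow but short;
 * a table-driven or hardware-assisted version would be used in practice. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Recomputes the checksum of every block and compares it with the
 * stored value. Returns the number of mismatching blocks. */
size_t scrub(const uint8_t *blocks, size_t block_size,
             const uint32_t *sums, size_t nblocks)
{
    size_t errors = 0;
    for (size_t i = 0; i < nblocks; i++)
        if (crc32_ieee(blocks + i * block_size, block_size) != sums[i])
            errors++;
    return errors;
}
```

Storing each checksum in the parent pointer rather than next to the data it covers is what makes the check end-to-end: a block that was written to the wrong place, or never written at all, still fails verification.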

Key Differences Between Filesystems (Reference)

Feature           ext2    ext4      FAT32      NTFS    ZFS/Btrfs
----------------  ------  --------  ---------  ------  -----------
Journaling        No      Yes       No         Yes     CoW instead
Max file size     2TB     16TB      4GB        16EB    16EB
Block allocation  Bitmap  Extents   FAT chain  B-tree  CoW B-tree
Snapshots         No      No        No         VSS     Native
Checksums         No      Metadata  No         No      All data
Permissions       Unix    Unix      None       ACLs    Unix+ACLs
Fragmentation     Medium  Low       High       Low     Very Low

Building these projects will teach you why each column is different—not just memorizing facts, but understanding the engineering tradeoffs.

