Project 1: Hex Dump Disk Explorer

Project Overview

Attribute	Details
Difficulty	Intermediate
Time Estimate	1-2 weeks
Language	C
Knowledge Area	Filesystems / Binary Parsing
Main Book	“Operating Systems: Three Easy Pieces” by Arpaci-Dusseau
Prerequisites	C basics, structs and pointers, hex/binary familiarity

What You’ll Build

A command-line tool that reads raw disk images or block devices and visualizes filesystem structures—superblocks, inode tables, directory entries, and data blocks—in human-readable format.

Real World Output:

$ ./fsexplore disk.img

SUPERBLOCK at offset 1024:
  Magic: 0xEF53 ✓ (ext2 signature)
  Block size: 4096 bytes
  Inodes: 128016
  Free blocks: 45231

INODE #2 (root directory):
  Type: Directory
  Size: 4096 bytes
  Permissions: drwxr-xr-x
  Links: 23
  Direct blocks: [258, 0, 0, ...]

DIRECTORY at block 258:
  [2]  .
  [2]  ..
  [11] lost+found
  [12] home
  [13] etc

Learning Objectives

By completing this project, you will:

Parse binary data structures from raw disk images using C structs
Understand ext2/ext4 on-disk layout including superblock, block groups, and inodes
Navigate inode tables and decode file metadata (permissions, timestamps, block pointers)
Traverse directory entries and understand path resolution
Work with indirect block pointers for large file support
Debug binary parsing issues using hex editors and careful offset calculations

The Core Question You’re Answering

“How does a filesystem actually store my files on disk? What does ‘ext2’ really mean?”

When you create a file named “hello.txt” and write “Hello World” to it, where do those bytes actually go? How does the OS know which disk sectors contain your file versus someone else’s?

Most developers treat filesystems as black boxes. This project rips away the abstraction. You’ll read raw disk bytes and decode the exact structures that ext2 uses. When you finish, you’ll see filesystems not as magic, but as elegant data structures mapped onto persistent storage.

Deep Theoretical Foundation

1. Why Filesystems Organize Data Into Blocks

Hard drives and SSDs don’t operate on individual bytes—they work with fixed-size units called sectors (historically 512 bytes, now often 4096 bytes on modern drives). Filesystems organize data into blocks, which are typically 1024, 2048, or 4096 bytes.

Why blocks matter:

Efficiency: Reading one 4KB block is faster than 4096 individual byte reads
Addressing: With 32-bit block numbers and 4KB blocks, you can address 16TB of storage
Metadata overhead: Block-level tracking (one bit per block) uses far less space than byte-level tracking

The fundamental tradeoff:

Larger blocks → less metadata, faster sequential I/O, more internal fragmentation
Smaller blocks → more metadata, less wasted space for small files

2. The Superblock: Filesystem’s Master Record

Every ext2/ext4 filesystem begins with a superblock at byte offset 1024. The first 1024 bytes are reserved for boot code (MBR/GPT).

The superblock contains critical metadata:

struct ext2_super_block {
    uint32_t s_inodes_count;        // Total number of inodes
    uint32_t s_blocks_count;        // Total number of blocks
    uint32_t s_free_blocks_count;   // Free blocks available
    uint32_t s_free_inodes_count;   // Free inodes available
    uint32_t s_first_data_block;    // First data block (0 or 1)
    uint32_t s_log_block_size;      // Block size = 1024 << this value
    uint32_t s_blocks_per_group;    // Blocks per block group
    uint32_t s_inodes_per_group;    // Inodes per block group
    uint16_t s_magic;               // Magic signature (0xEF53)
    // ... many more fields
};

Key insight: The superblock is duplicated in every block group. If the primary superblock corrupts, fsck can restore it from backups.

3. Block Groups: Locality for Performance

ext2/ext4 divides the disk into block groups. Each group contains:

Superblock copy (in some groups)
Block Group Descriptor Table copy
Block bitmap (tracks used/free blocks in this group)
Inode bitmap (tracks used/free inodes)
Inode table (actual inode structures)
Data blocks

Why block groups exist:

Locality: Files stored near their inodes → faster access
Parallelism: Different groups can be accessed concurrently
Resilience: Damage to one group doesn’t destroy the entire filesystem

Block Group 0          Block Group 1          Block Group 2
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Superblock      │    │ SB copy (sparse)│    │ SB copy (sparse)│
│ BGDT            │    │ BGDT copy       │    │ BGDT copy       │
│ Block bitmap    │    │ Block bitmap    │    │ Block bitmap    │
│ Inode bitmap    │    │ Inode bitmap    │    │ Inode bitmap    │
│ Inode table     │    │ Inode table     │    │ Inode table     │
│ Data blocks...  │    │ Data blocks...  │    │ Data blocks...  │
└─────────────────┘    └─────────────────┘    └─────────────────┘

4. Inodes: The Heart of Unix Filesystems

An inode (index node) is a fixed-size structure (typically 128 or 256 bytes) that stores everything about a file except its name:

struct ext2_inode {
    uint16_t i_mode;        // File type + permissions
    uint16_t i_uid;         // Owner user ID
    uint32_t i_size;        // File size in bytes
    uint32_t i_atime;       // Last access time
    uint32_t i_ctime;       // Creation time
    uint32_t i_mtime;       // Modification time
    uint32_t i_dtime;       // Deletion time
    uint16_t i_gid;         // Group ID
    uint16_t i_links_count; // Hard link count
    uint32_t i_blocks;      // Number of 512-byte blocks
    uint32_t i_block[15];   // Block pointers
    // ... more fields
};

The block pointer array:

i_block[0-11]: Direct block pointers (12 blocks × 4KB = 48KB directly addressable)
i_block[12]: Single indirect pointer (points to block of pointers)
i_block[13]: Double indirect pointer
i_block[14]: Triple indirect pointer

Maximum file size calculation (4KB blocks, 4-byte pointers):

Direct: 12 × 4KB = 48KB
Single indirect: 1024 × 4KB = 4MB
Double indirect: 1024 × 1024 × 4KB = 4GB
Triple indirect: 1024³ × 4KB = 4TB
Total: ~4TB maximum file size

5. Directory Entries: Mapping Names to Inodes

Directories are special files whose data blocks contain directory entries:

struct ext2_dir_entry_2 {
    uint32_t inode;      // Inode number
    uint16_t rec_len;    // Entry length (for traversal)
    uint8_t  name_len;   // Filename length
    uint8_t  file_type;  // File type (1=file, 2=dir, 7=symlink)
    char     name[];     // Filename (variable length, not null-terminated)
};

Key insight: Filenames live in directory entries, not in inodes. This enables:

Hard links: Multiple directory entries pointing to same inode
Efficient rename: Just update directory entry, not file data
Case sensitivity: Filesystem controls name comparison

Path resolution algorithm:

Start at root inode (always inode 2)
Read root’s directory data blocks
Search for first path component
Get that entry’s inode number
Repeat for each path component

6. Binary Data Layout and Endianness

ext2 stores all multi-byte integers in little-endian format (least significant byte first). On x86/x64 systems, you can read structs directly. On big-endian systems, you need conversion:

// Convert little-endian to host byte order
#include <endian.h>
uint32_t blocks = le32toh(sb.s_blocks_count);

Struct padding warning:

// This struct might have padding between fields!
struct bad_example {
    uint8_t  a;      // 1 byte
    // 3 bytes padding here on many systems
    uint32_t b;      // 4 bytes
};

// Use packed attribute to match on-disk layout
struct ext2_superblock {
    // fields...
} __attribute__((packed));

Project Specification

Functional Requirements

Open and validate disk images
- Accept file path as command-line argument
- Open in binary read mode
- Validate ext2 magic number (0xEF53)
- Handle errors gracefully (file not found, not ext2, etc.)
Parse and display superblock
- Read superblock at offset 1024
- Calculate derived values (block size, group count)
- Display in human-readable format
Navigate block groups
- Parse Block Group Descriptor Table
- Display information for each block group
- Show bitmap locations and inode table positions
Read and decode inodes
- Accept inode number as input
- Calculate inode location (which group, which offset)
- Display all inode fields with interpretation
Traverse directory contents
- Read directory inode
- Parse directory entries from data blocks
- Display entries with inode numbers and names
Follow file paths
- Accept path string (e.g., “/home/user/file.txt”)
- Resolve path to inode number
- Display final inode information

Non-Functional Requirements

Handle disk images from 1MB to 100GB
Parse in reasonable time (< 1 second for path resolution)
Provide meaningful error messages
Support both ext2 and ext4 basic structures

Solution Architecture

Component Design

┌─────────────────────────────────────────────────────────────┐
│                     CLI Interface                            │
│                  (argument parsing, output formatting)       │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    Filesystem Parser                         │
│              (superblock, BGDT, inodes, directories)         │
├─────────────────┬───────────────────┬───────────────────────┤
│  Superblock     │   Block Group     │     Inode             │
│  Parser         │   Navigator       │     Reader            │
└────────┬────────┴─────────┬─────────┴──────────┬────────────┘
         │                  │                    │
         ▼                  ▼                    ▼
┌─────────────────────────────────────────────────────────────┐
│                    Block I/O Layer                           │
│               (read blocks, handle offsets)                  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    Disk Image File                           │
│                  (opened with fopen/read)                    │
└─────────────────────────────────────────────────────────────┘

Data Structures

// Filesystem context (global state)
struct fs_context {
    FILE *disk;                      // Open file handle
    struct ext2_super_block sb;      // Cached superblock
    uint32_t block_size;             // Calculated block size
    uint32_t groups_count;           // Number of block groups
    struct ext2_group_desc *bgdt;    // Block group descriptors
};

// Parsed inode with derived info
struct parsed_inode {
    struct ext2_inode raw;           // Raw inode data
    uint32_t ino;                    // Inode number
    char type_str[16];               // "file", "dir", "symlink"
    char perm_str[11];               // "drwxr-xr-x"
    char mtime_str[32];              // Human-readable timestamp
};

// Directory entry list
struct dir_entry_list {
    uint32_t inode;
    char name[256];
    uint8_t file_type;
    struct dir_entry_list *next;
};

Key Functions

// Initialization
int fs_open(struct fs_context *ctx, const char *path);
void fs_close(struct fs_context *ctx);

// Low-level I/O
int read_block(struct fs_context *ctx, uint32_t block_num, void *buf);
int read_bytes(struct fs_context *ctx, off_t offset, size_t len, void *buf);

// Superblock operations
int parse_superblock(struct fs_context *ctx);
void print_superblock(struct fs_context *ctx);

// Block group operations
int load_bgdt(struct fs_context *ctx);
void print_block_group(struct fs_context *ctx, int group_num);

// Inode operations
int read_inode(struct fs_context *ctx, uint32_t ino, struct ext2_inode *inode);
void print_inode(struct parsed_inode *pi);
int inode_get_block(struct fs_context *ctx, struct ext2_inode *inode,
                    uint32_t logical_block, uint32_t *physical_block);

// Directory operations
struct dir_entry_list* read_directory(struct fs_context *ctx,
                                       struct ext2_inode *dir_inode);
void free_dir_list(struct dir_entry_list *list);

// Path resolution
int resolve_path(struct fs_context *ctx, const char *path,
                 struct ext2_inode *result);

Implementation Guide

Phase 1: Binary I/O Foundation (Days 1-2)

Goals:

Create disk image I/O layer
Read raw bytes at specific offsets
Verify you can read the superblock magic number

Steps:

Create a minimal program that opens a file in binary mode
Seek to offset 1024 and read 2 bytes at offset 56 (magic number location)
Verify you read 0x53EF (little-endian for 0xEF53)

Testing:

# Create test image
dd if=/dev/zero of=test.img bs=1M count=10
mkfs.ext2 test.img

# Verify magic number manually
xxd -s 1024 -l 64 test.img | head -5

Key learning: Binary file I/O with fopen("rb"), fseek(), fread()

Phase 2: Superblock Parser (Days 2-3)

Goals:

Define complete superblock struct
Parse all relevant fields
Display human-readable output

Steps:

Define struct ext2_super_block with all fields (use ext2 documentation)
Read the struct from offset 1024
Calculate derived values (block size = 1024 « s_log_block_size)
Print formatted output

Testing:

# Compare with dumpe2fs output
dumpe2fs test.img 2>/dev/null | head -30
./fsexplore test.img superblock

Key learning: Struct packing, byte order, derived calculations

Goals:

Parse Block Group Descriptor Table
Locate inode tables and bitmaps
Navigate to specific block groups

Steps:

BGDT starts at block 1 (or 2 for 1KB blocks)
Each descriptor is 32 bytes
Parse inode table location, bitmap locations
Calculate number of groups from superblock

Testing:

# Verify with dumpe2fs
dumpe2fs test.img 2>/dev/null | grep -A 10 "Group 0"

Key learning: Multi-level data structures, offset calculations

Phase 4: Inode Reader (Days 4-6)

Goals:

Calculate inode location from inode number
Read and parse inode structures
Decode permissions and file types

Steps:

Given inode N, find its block group: (N - 1) / inodes_per_group
Find offset within group: (N - 1) % inodes_per_group
Get inode table start from BGDT
Read inode at calculated offset

Testing:

# Create files and check their inodes
sudo mount -o loop test.img /mnt
sudo touch /mnt/testfile
ls -i /mnt/testfile  # Shows inode number
sudo umount /mnt
./fsexplore test.img inode <number>

Key learning: Inode addressing, metadata interpretation

Phase 5: Directory Traversal (Days 6-8)

Goals:

Read directory data blocks
Parse variable-length directory entries
List directory contents

Steps:

Read directory inode (inode 2 for root)
Follow block pointers to read directory data
Parse entries: inode, rec_len, name_len, name
Handle variable-length entries correctly

Testing:

# Add files to test image
sudo mount -o loop test.img /mnt
sudo mkdir /mnt/subdir
sudo touch /mnt/file1.txt /mnt/file2.txt
sudo umount /mnt
./fsexplore test.img dir /

Key learning: Variable-length structures, linked traversal

Phase 6: Path Resolution (Days 8-10)

Goals:

Parse path strings
Traverse directory tree
Find final inode for any path

Steps:

Split path by ‘/’ separator
Start at inode 2 (root)
For each component, read directory, find matching entry
Use entry’s inode as new current inode
Repeat until path exhausted

Testing:

sudo mount -o loop test.img /mnt
sudo mkdir -p /mnt/a/b/c
sudo touch /mnt/a/b/c/deep.txt
sudo umount /mnt
./fsexplore test.img path /a/b/c/deep.txt

Key learning: Recursive traversal, path parsing

Phase 7: Indirect Blocks (Days 10-14)

Goals:

Handle files larger than 48KB
Follow indirect, double-indirect, triple-indirect pointers
Read any byte from any file

Steps:

For logical block 0-11, use direct pointers
For block 12-1035, read indirect block, use Nth pointer
For blocks 1036+, read double-indirect, then indirect
Implement get_block_num(inode, logical_block) function

Testing:

sudo mount -o loop test.img /mnt
dd if=/dev/urandom of=/mnt/largefile bs=1M count=5
sudo umount /mnt
./fsexplore test.img file-blocks /largefile

Key learning: Multi-level indirection, block addressing

Testing Strategy

Unit Tests

// Test superblock validation
void test_validate_superblock() {
    struct ext2_super_block sb;
    sb.s_magic = 0xEF53;
    assert(is_valid_ext2(&sb) == 1);

    sb.s_magic = 0x0000;
    assert(is_valid_ext2(&sb) == 0);
}

// Test block size calculation
void test_block_size() {
    struct ext2_super_block sb;
    sb.s_log_block_size = 0;  // 1024 << 0 = 1024
    assert(get_block_size(&sb) == 1024);

    sb.s_log_block_size = 2;  // 1024 << 2 = 4096
    assert(get_block_size(&sb) == 4096);
}

// Test inode location calculation
void test_inode_location() {
    // With 8192 inodes per group, 128 byte inodes:
    // Inode 1 is at group 0, index 0
    // Inode 8192 is at group 0, index 8191
    // Inode 8193 is at group 1, index 0
    assert(get_inode_group(1, 8192) == 0);
    assert(get_inode_index(1, 8192) == 0);
    assert(get_inode_group(8193, 8192) == 1);
    assert(get_inode_index(8193, 8192) == 0);
}

Integration Tests

#!/bin/bash
# test_fsexplore.sh

# Create test image with known contents
dd if=/dev/zero of=test.img bs=1M count=10 2>/dev/null
mkfs.ext2 -F test.img 2>/dev/null
sudo mount -o loop test.img /mnt
sudo mkdir /mnt/testdir
echo "Hello World" | sudo tee /mnt/testdir/hello.txt > /dev/null
sudo umount /mnt

# Test superblock
./fsexplore test.img superblock | grep -q "Magic: 0xEF53"
[ $? -eq 0 ] && echo "PASS: Superblock magic" || echo "FAIL: Superblock magic"

# Test root directory
./fsexplore test.img dir / | grep -q "testdir"
[ $? -eq 0 ] && echo "PASS: Root directory" || echo "FAIL: Root directory"

# Test path resolution
./fsexplore test.img path /testdir/hello.txt | grep -q "Type: file"
[ $? -eq 0 ] && echo "PASS: Path resolution" || echo "FAIL: Path resolution"

# Cleanup
rm test.img

Comparison Testing

# Compare your output with standard tools
dumpe2fs test.img 2>/dev/null > expected.txt
./fsexplore test.img superblock > actual.txt
diff -u expected.txt actual.txt

Common Pitfalls and Debugging

Pitfall 1: Struct Padding

Problem: Your struct size doesn’t match on-disk size.

struct bad_inode {
    uint16_t i_mode;    // 2 bytes
    uint16_t i_uid;     // 2 bytes + 2 bytes padding
    uint32_t i_size;    // 4 bytes
};  // Size: 10 bytes, but ext2 expects 128!

Solution: Use packed attribute and verify size.

struct ext2_inode {
    // fields...
} __attribute__((packed));

_Static_assert(sizeof(struct ext2_inode) == 128, "Inode size mismatch");

Pitfall 2: Off-by-One in Inode Numbering

Problem: Inodes are numbered from 1, not 0.

Solution:

// WRONG
uint32_t index = inode_num / inodes_per_group;

// CORRECT
uint32_t index = (inode_num - 1) / inodes_per_group;

Pitfall 3: Block 0 Special Case

Problem: Block pointers of 0 don’t mean block 0—they mean “hole” (sparse file).

Solution:

if (block_ptr == 0) {
    // Sparse region - return zeros, don't try to read block 0
    memset(buffer, 0, block_size);
}

Pitfall 4: Directory Entry Alignment

Problem: Directory entries have padding to 4-byte boundaries.

Solution:

// rec_len includes padding to next 4-byte boundary
// Use rec_len for iteration, not (8 + name_len)
entry = (struct ext2_dir_entry_2 *)((char *)entry + entry->rec_len);

Debugging Techniques

1. Hex dump verification:

# Read superblock bytes manually
xxd -s 1024 -l 256 disk.img

# Compare with your parsed values
./fsexplore disk.img superblock --raw

2. Known-good comparison:

# Use debugfs to explore
debugfs disk.img
debugfs: stat /path/to/file
debugfs: ls -l /

3. Minimal test cases:

# Create minimal filesystem
dd if=/dev/zero of=tiny.img bs=1K count=1024
mkfs.ext2 -F -b 1024 -N 16 tiny.img

Extensions and Challenges

Extension 1: ext4 Support

Parse extent tree instead of indirect blocks
Handle 48-bit block addresses
Support extents feature flag

Extension 2: Deleted File Recovery

Scan inode table for deleted inodes (dtime set)
Follow block pointers to recover data
Reconstruct filenames from orphaned directory entries

Extension 3: Visual Block Map

$ ./fsexplore disk.img blockmap
Block usage map (# = used, . = free):

     0         1         2         3
     0123456789012345678901234567890123456789
 0:  ########################################
 1:  ########################################
 2:  ###########.............................
 3:  ........................................

Extension 4: JSON Output

{
  "superblock": {
    "magic": "0xEF53",
    "block_size": 4096,
    "inodes_count": 128016
  },
  "groups": [
    {"inode_bitmap": 3, "block_bitmap": 4, "inode_table": 5}
  ]
}

Challenge: Cross-Filesystem Support

Add support for reading FAT32 filesystems:

Parse FAT boot sector
Follow cluster chains
Handle VFAT long filenames

Real-World Connections

Forensic Analysis

Forensic investigators use similar tools to:

Recover deleted files (reading freed inodes)
Analyze file timestamps (atime, mtime, ctime)
Track file modifications through journaling

Data Recovery

Understanding filesystem internals enables:

Recovering data from corrupted filesystems
Rebuilding superblocks from backup copies
Extracting files without mounting (damaged filesystems)

Operating System Development

OS developers need to:

Implement filesystem drivers
Optimize block allocation strategies
Handle crash recovery

Interview Questions

Conceptual

“What is an inode, and what information does it store?”
- Expected: Metadata (permissions, timestamps, size, owner), block pointers, but NOT the filename
“How does ext2 resolve a file path like ‘/home/user/file.txt’?”
- Expected: Start at root inode (2), read directory entries, find “home” entry’s inode, repeat
“Why does ext2 use direct, indirect, double-indirect, and triple-indirect pointers?”
- Expected: Balance between small file efficiency and large file support
“What is the maximum file size in ext2 with 4096-byte blocks?”
- Expected: ~4TB (show calculation)

Implementation

“Why is the superblock at offset 1024 instead of 0?”
- Expected: First 1024 bytes reserved for boot sector/MBR
“What makes ‘.’ and ‘..’ directory entries special?”
- Expected: . points to current directory’s inode, .. points to parent
“How would you implement ‘fsck’ (filesystem consistency check)?”
- Expected: Verify bitmaps match actual allocation, check directory integrity

Self-Assessment Checklist

Before considering this project complete, verify you can:

Explain what a superblock contains and why it exists
Calculate an inode’s disk location given its number
Describe how directory entries link names to inodes
Trace a path resolution from root to any file
Explain why indirect blocks exist and how they work
Use a hex editor to manually verify your parsing is correct
Create test cases for edge cases (empty directories, sparse files)
Handle errors gracefully (bad magic, corrupted data)

Resources

Documentation

The Second Extended File System - Official ext2 specification
OSDev Wiki - Ext2 - Implementation guide

Books

“Operating Systems: Three Easy Pieces” Ch. 40 - File System Implementation
“The Linux Programming Interface” Ch. 14 - File Systems
“C Programming: A Modern Approach” Ch. 22 - Input/Output

Tools

dumpe2fs - Display ext2/ext3/ext4 filesystem information
debugfs - Interactive ext2 filesystem debugger
xxd - Make hexdump of any file

Project 1: Hex Dump Disk Explorer

Project Overview

What You’ll Build

Learning Objectives

The Core Question You’re Answering

Deep Theoretical Foundation

1. Why Filesystems Organize Data Into Blocks

2. The Superblock: Filesystem’s Master Record

3. Block Groups: Locality for Performance

4. Inodes: The Heart of Unix Filesystems

5. Directory Entries: Mapping Names to Inodes

6. Binary Data Layout and Endianness

Project Specification

Functional Requirements

Non-Functional Requirements

Solution Architecture

Component Design

Data Structures

Key Functions

Implementation Guide

Phase 1: Binary I/O Foundation (Days 1-2)

Phase 2: Superblock Parser (Days 2-3)

Phase 3: Block Group Navigation (Days 3-4)

Phase 4: Inode Reader (Days 4-6)

Phase 5: Directory Traversal (Days 6-8)

Phase 6: Path Resolution (Days 8-10)

Phase 7: Indirect Blocks (Days 10-14)

Testing Strategy

Unit Tests

Integration Tests

Comparison Testing

Common Pitfalls and Debugging

Pitfall 1: Struct Padding

Pitfall 2: Off-by-One in Inode Numbering

Pitfall 3: Block 0 Special Case

Pitfall 4: Directory Entry Alignment

Debugging Techniques

Extensions and Challenges

Extension 1: ext4 Support

Extension 2: Deleted File Recovery

Extension 3: Visual Block Map

Extension 4: JSON Output

Challenge: Cross-Filesystem Support

Real-World Connections

Forensic Analysis

Data Recovery

Operating System Development

Interview Questions

Conceptual

Implementation

Self-Assessment Checklist

Resources

Documentation

Books

Tools