Project 1: Hex Dump Disk Explorer

Project Overview

Attribute Details
Difficulty Beginner-Intermediate (Level 2)
Time Estimate 1-2 weeks
Language C
Knowledge Area Filesystems / Binary Parsing
Main Book “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau
Prerequisites C basics, structs and pointers, hex/binary familiarity

1. Learning Objectives

By completing this project, you will:

  1. Parse binary data structures from raw disk images using C structs and understand byte-level data layout
  2. Understand ext2/ext4 on-disk layout including superblock, block groups, block group descriptor table, and inode tables
  3. Navigate inode tables and decode file metadata (permissions, timestamps, block pointers, file types)
  4. Traverse directory entries and understand how filenames map to inode numbers
  5. Work with indirect block pointers for handling large file support beyond 48KB
  6. Debug binary parsing issues using hex editors, offset calculations, and comparison with standard tools
  7. Handle endianness and struct padding in cross-platform binary parsing
  8. Build production-quality CLI tools with proper error handling and user-friendly output

2. Theoretical Foundation

2.1 Core Concepts

Why Filesystems Organize Data Into Blocks

Hard drives and SSDs do not operate on individual bytes–they work with fixed-size units called sectors (historically 512 bytes, now often 4096 bytes on modern drives). Filesystems organize data into blocks, which are typically 1024, 2048, or 4096 bytes.

Why blocks matter:

  • Efficiency: Reading one 4KB block is faster than 4096 individual byte reads
  • Addressing: With 32-bit block numbers and 4KB blocks, you can address 16TB of storage
  • Metadata overhead: Block-level tracking (one bit per block) uses far less space than byte-level tracking

The fundamental tradeoff:

  • Larger blocks = less metadata, faster sequential I/O, more internal fragmentation
  • Smaller blocks = more metadata, less wasted space for small files

The Superblock: Filesystem’s Master Record

Every ext2/ext4 filesystem begins with a superblock at byte offset 1024. The first 1024 bytes are reserved for boot code (MBR/GPT).

Disk Layout:
┌────────────────────────────────────────────────────────────────┐
│ Byte 0-1023      │ Boot sector (MBR/GPT, partition table)      │
├────────────────────────────────────────────────────────────────┤
│ Byte 1024-2047   │ SUPERBLOCK (filesystem metadata)            │
├────────────────────────────────────────────────────────────────┤
│ Byte 2048+       │ Block Group Descriptor Table                │
├────────────────────────────────────────────────────────────────┤
│ ...              │ Block bitmaps, inode bitmaps, inode tables  │
├────────────────────────────────────────────────────────────────┤
│ ...              │ Data blocks                                 │
└────────────────────────────────────────────────────────────────┘

The superblock contains critical metadata:

struct ext2_super_block {
    uint32_t s_inodes_count;        // Total number of inodes
    uint32_t s_blocks_count;        // Total number of blocks
    uint32_t s_free_blocks_count;   // Free blocks available
    uint32_t s_free_inodes_count;   // Free inodes available
    uint32_t s_first_data_block;    // First data block (0 or 1)
    uint32_t s_log_block_size;      // Block size = 1024 << this value
    uint32_t s_blocks_per_group;    // Blocks per block group
    uint32_t s_inodes_per_group;    // Inodes per block group
    uint16_t s_magic;               // Magic signature (0xEF53)
    // ... many more fields (total 1024 bytes)
};

Key insight: The superblock is duplicated in every block group. If the primary superblock corrupts, fsck can restore it from backups.

Block Groups: Locality for Performance

ext2/ext4 divides the disk into block groups. Each group contains:

  1. Superblock copy (in some groups–sparse superblock feature)
  2. Block Group Descriptor Table copy
  3. Block bitmap (tracks used/free blocks in this group)
  4. Inode bitmap (tracks used/free inodes)
  5. Inode table (actual inode structures)
  6. Data blocks
Block Group Structure:
┌─────────────────────────────────────────────────────────────────┐
│ Block Group 0          │ Block Group 1          │ Block Group 2 │
├─────────────────────────┼─────────────────────────┼───────────────┤
│ Superblock              │ SB copy (sparse)        │ SB copy       │
│ BGDT                    │ BGDT copy               │ BGDT copy     │
│ Block bitmap            │ Block bitmap            │ Block bitmap  │
│ Inode bitmap            │ Inode bitmap            │ Inode bitmap  │
│ Inode table             │ Inode table             │ Inode table   │
│ Data blocks...          │ Data blocks...          │ Data blocks...│
└─────────────────────────┴─────────────────────────┴───────────────┘

Why block groups exist:

  • Locality: Files stored near their inodes lead to faster access
  • Parallelism: Different groups can be accessed concurrently
  • Resilience: Damage to one group does not destroy the entire filesystem

Inodes: The Heart of Unix Filesystems

An inode (index node) is a fixed-size structure (typically 128 or 256 bytes) that stores everything about a file except its name:

struct ext2_inode {
    uint16_t i_mode;        // File type + permissions
    uint16_t i_uid;         // Owner user ID
    uint32_t i_size;        // File size in bytes (lower 32 bits)
    uint32_t i_atime;       // Last access time
    uint32_t i_ctime;       // Creation/change time
    uint32_t i_mtime;       // Modification time
    uint32_t i_dtime;       // Deletion time
    uint16_t i_gid;         // Group ID
    uint16_t i_links_count; // Hard link count
    uint32_t i_blocks;      // Number of 512-byte blocks allocated
    uint32_t i_flags;       // File flags
    uint32_t i_osd1;        // OS-dependent value
    uint32_t i_block[15];   // Block pointers (12 direct + 3 indirect)
    uint32_t i_generation;  // File version (for NFS)
    uint32_t i_file_acl;    // File ACL
    uint32_t i_size_high;   // Upper 32 bits of file size (ext4)
    // ... more fields to reach 128 bytes
};

The block pointer array (i_block[15]):

  • i_block[0-11]: 12 Direct block pointers (48KB directly addressable with 4KB blocks)
  • i_block[12]: Single indirect pointer (points to block full of pointers)
  • i_block[13]: Double indirect pointer (pointer to pointers to pointers)
  • i_block[14]: Triple indirect pointer

Maximum file size calculation (4KB blocks, 4-byte pointers):

  • Direct: 12 x 4KB = 48KB
  • Single indirect: 1024 x 4KB = 4MB
  • Double indirect: 1024 x 1024 x 4KB = 4GB
  • Triple indirect: 1024^3 x 4KB = 4TB
  • Total: approximately 4TB maximum file size
Block Pointer Indirection:
                          ┌─────────────┐
     i_block[0-11] ──────►│ Data Block  │ (Direct)
                          └─────────────┘

                          ┌─────────────┐     ┌─────────────┐
     i_block[12] ────────►│ Ptr Block   │────►│ Data Block  │ (Indirect)
                          │ (1024 ptrs) │     └─────────────┘
                          └─────────────┘

                          ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
     i_block[13] ────────►│ Ptr Block   │────►│ Ptr Block   │────►│ Data Block  │
                          │ (1024 ptrs) │     │ (1024 ptrs) │     └─────────────┘
                          └─────────────┘     └─────────────┘
                                              (Double Indirect)

Directory Entries: Mapping Names to Inodes

Directories are special files whose data blocks contain directory entries:

struct ext2_dir_entry_2 {
    uint32_t inode;      // Inode number (0 = deleted entry)
    uint16_t rec_len;    // Total entry length (for traversal)
    uint8_t  name_len;   // Actual filename length
    uint8_t  file_type;  // File type (1=file, 2=dir, 7=symlink)
    char     name[];     // Filename (variable length, NOT null-terminated)
};

Key insight: Filenames live in directory entries, not in inodes. This enables:

  • Hard links: Multiple directory entries pointing to same inode
  • Efficient rename: Just update directory entry, not file data
  • Case sensitivity: Filesystem controls name comparison

Path resolution algorithm:

  1. Start at root inode (always inode 2 in ext2/ext4)
  2. Read root’s directory data blocks
  3. Search for first path component in directory entries
  4. Get that entry’s inode number
  5. Repeat for each path component until target reached

Binary Data Layout and Endianness

ext2 stores all multi-byte integers in little-endian format (least significant byte first). On x86/x64 systems, you can read structs directly. On big-endian systems, you need conversion:

// Convert little-endian to host byte order
#include <endian.h>
uint32_t blocks = le32toh(sb.s_blocks_count);
uint16_t magic = le16toh(sb.s_magic);

Struct padding warning:

// This struct might have unexpected padding!
struct bad_example {
    uint8_t  a;      // 1 byte
    // 3 bytes padding here on many systems!
    uint32_t b;      // 4 bytes
};  // Size: 8 bytes, not 5!

// Use packed attribute to match on-disk layout exactly
struct ext2_superblock {
    // fields...
} __attribute__((packed));

2.2 Why This Matters

Real-World Applications:

  1. Data Recovery: When files are accidentally deleted or filesystems corrupted, forensic tools parse raw disk structures to recover data. Understanding these structures lets you build recovery tools.

  2. Forensic Analysis: Digital forensics investigators examine filesystem metadata to track file modifications, deletions, and user activity. This project teaches the fundamentals.

  3. Operating System Development: OS developers implementing filesystem drivers need deep understanding of on-disk layouts.

  4. Cloud Storage: Understanding block-level storage helps when working with block storage services (AWS EBS, Azure Disk).

  5. Database Internals: Databases often implement their own block management similar to filesystems.

2.3 Historical Context

The ext2 filesystem was developed by Remy Card in 1993 as an improvement over the original ext filesystem. Key historical points:

  • ext (1992): First Linux filesystem, limited to 2GB partitions
  • ext2 (1993): Added block groups, improved performance, became Linux standard
  • ext3 (2001): Added journaling to ext2 for crash recovery
  • ext4 (2008): Added extents, improved timestamps, larger filesystem support

ext2 remains relevant today because:

  • It is the simplest modern Unix filesystem to understand
  • ext3/ext4 are backward-compatible with ext2 structures
  • Many embedded systems still use ext2 (no journal overhead)
  • It is the foundation for understanding all ext-family filesystems

2.4 Common Misconceptions

Misconception 1: “Files are stored as contiguous bytes on disk”

  • Reality: Files are split into blocks scattered across the disk. The inode’s block pointers track where each piece lives.

Misconception 2: “Filenames are stored in inodes”

  • Reality: Inodes only contain metadata and block pointers. Filenames are stored in parent directory’s data blocks as directory entries.

Misconception 3: “Deleting a file erases its data”

  • Reality: Deletion only marks blocks as free in the bitmap and clears the directory entry. Data remains until overwritten (this is why data recovery works).

Misconception 4: “Block size equals sector size”

  • Reality: Filesystem blocks are typically larger than disk sectors. A 4KB block spans eight 512-byte sectors.

Misconception 5: “The superblock is at byte 0”

  • Reality: The superblock starts at byte 1024. Bytes 0-1023 are reserved for boot sector/MBR.

3. Project Specification

3.1 What You Will Build

A command-line tool that reads raw disk images or block devices and visualizes filesystem structures–superblocks, inode tables, directory entries, and data blocks–in human-readable format. Think of it as a combination of dumpe2fs, debugfs, and xxd tailored for learning.

3.2 Functional Requirements

  1. Open and validate disk images
    • Accept file path as command-line argument
    • Open in binary read mode
    • Validate ext2 magic number (0xEF53) at offset 1024+56
    • Handle errors gracefully (file not found, not ext2, permissions)
  2. Parse and display superblock
    • Read superblock at offset 1024
    • Calculate derived values (block size, group count, inode size)
    • Display in human-readable format with field explanations
  3. Navigate block groups
    • Parse Block Group Descriptor Table
    • Display information for each block group
    • Show bitmap locations and inode table positions
  4. Read and decode inodes
    • Accept inode number as input
    • Calculate inode location (which group, which offset)
    • Display all inode fields with interpretation (permissions string, timestamps)
  5. Traverse directory contents
    • Read directory inode’s data blocks
    • Parse variable-length directory entries
    • Display entries with inode numbers, names, and file types
  6. Follow file paths
    • Accept path string (e.g., “/home/user/file.txt”)
    • Resolve path to inode number through directory traversal
    • Display final inode information
  7. Hex dump mode
    • Display raw bytes of any block or offset range
    • Annotate with structure interpretations

3.3 Non-Functional Requirements

  • Handle disk images from 1MB to 100GB
  • Parse filesystem structures in reasonable time (less than 1 second for path resolution)
  • Provide meaningful error messages for invalid operations
  • Support both ext2 and ext4 basic structures
  • Compile without warnings on GCC with -Wall -Wextra
  • Work on both Linux and macOS (with appropriate endianness handling)

3.4 Example Usage and Output

$ ./fsexplore disk.img

SUPERBLOCK at offset 1024:
  Magic: 0xEF53 (valid ext2/3/4 signature)
  Block size: 4096 bytes (log_block_size=2)
  Total inodes: 128016
  Total blocks: 512000
  Free inodes: 127993
  Free blocks: 451231
  Blocks per group: 32768
  Inodes per group: 8016
  First data block: 0
  Inode size: 256 bytes
  Number of block groups: 16

$ ./fsexplore disk.img inode 2

INODE #2 (root directory):
  Type: Directory
  Permissions: drwxr-xr-x (0755)
  Owner: uid=0, gid=0
  Size: 4096 bytes
  Links: 23
  Blocks: 8 (512-byte units)
  Created: 2024-01-15 10:30:45
  Modified: 2024-01-20 14:22:33
  Accessed: 2024-01-20 15:00:00
  Direct blocks: [258, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  Indirect: 0, Double-indirect: 0, Triple-indirect: 0

$ ./fsexplore disk.img dir /

DIRECTORY contents at inode 2 (block 258):
  Inode   Type      Name
  ─────   ────      ────
  2       DIR       .
  2       DIR       ..
  11      DIR       lost+found
  12      DIR       home
  13      DIR       etc
  14      DIR       var
  15      FILE      README.txt

$ ./fsexplore disk.img path /home/user/file.txt

Resolving path: /home/user/file.txt
  / -> inode 2 (root)
  /home -> inode 12
  /home/user -> inode 1045
  /home/user/file.txt -> inode 2891

INODE #2891:
  Type: Regular file
  Size: 1234 bytes
  Permissions: -rw-r--r-- (0644)
  ...

$ ./fsexplore disk.img hexdump 258 --size 256

Block 258 (offset 1056768):
00000000: 02 00 00 00 0c 00 01 02 2e 00 00 00 02 00 00 00  |................|
00000010: 0c 00 02 02 2e 2e 00 00 0b 00 00 00 14 00 0a 02  |................|
00000020: 6c 6f 73 74 2b 66 6f 75 6e 64 00 00 0c 00 00 00  |lost+found......|
...

3.5 Real World Outcome

When you complete this project, you will have a tool that:

  1. Opens any ext2/ext3/ext4 disk image and validates it
  2. Displays superblock information matching dumpe2fs output
  3. Navigates to any inode and shows complete metadata
  4. Lists directory contents similar to ls -la
  5. Resolves file paths by traversing the directory tree
  6. Dumps raw hex for any disk region with structure annotations

Verification method: Compare your output with dumpe2fs, debugfs, and manual xxd inspection to verify correctness.


4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────────┐
│                         CLI Interface                                │
│              (argument parsing, command dispatch, output formatting) │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Filesystem Parser                             │
│               (orchestrates parsing of all structures)               │
├──────────────┬──────────────┬──────────────────┬────────────────────┤
│  Superblock  │  Block Group │      Inode       │    Directory       │
│   Parser     │   Navigator  │     Reader       │    Parser          │
└──────┬───────┴──────┬───────┴────────┬─────────┴─────────┬──────────┘
       │              │                │                   │
       └──────────────┼────────────────┼───────────────────┘
                      │                │
                      ▼                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Block I/O Layer                               │
│                  (read blocks, handle byte offsets)                  │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Disk Image File                               │
│                   (opened with fopen in binary mode)                 │
└─────────────────────────────────────────────────────────────────────┘

4.2 Key Components

1. CLI Interface (main.c)

  • Parse command-line arguments
  • Dispatch to appropriate command handler
  • Format output for human readability

2. Filesystem Context (fs_context.h)

  • Hold open file handle and cached superblock
  • Manage block group descriptor table
  • Provide convenience methods for block/inode access

3. Superblock Parser (superblock.c)

  • Read and validate superblock at offset 1024
  • Calculate derived values (block size, group count)
  • Pretty-print all superblock fields

4. Block Group Navigator (block_group.c)

  • Parse BGDT entries
  • Locate inode tables, bitmaps
  • Calculate block group boundaries

5. Inode Reader (inode.c)

  • Calculate inode location from inode number
  • Read and parse inode structure
  • Decode permissions, timestamps, file types

6. Directory Parser (directory.c)

  • Read directory data blocks
  • Parse variable-length directory entries
  • Handle entry iteration and name extraction

7. Path Resolver (path.c)

  • Tokenize path strings
  • Traverse directory tree
  • Return final inode for any valid path

4.3 Data Structures

// ext2_defs.h - On-disk structure definitions

#include <stdint.h>

#define EXT2_SUPER_MAGIC    0xEF53
#define EXT2_ROOT_INO       2
#define EXT2_SUPERBLOCK_OFFSET 1024
#define EXT2_SUPERBLOCK_SIZE   1024

// File types in directory entries
#define EXT2_FT_UNKNOWN     0
#define EXT2_FT_REG_FILE    1
#define EXT2_FT_DIR         2
#define EXT2_FT_CHRDEV      3
#define EXT2_FT_BLKDEV      4
#define EXT2_FT_FIFO        5
#define EXT2_FT_SOCK        6
#define EXT2_FT_SYMLINK     7

// Superblock structure (partial - key fields)
struct ext2_super_block {
    uint32_t s_inodes_count;
    uint32_t s_blocks_count;
    uint32_t s_r_blocks_count;
    uint32_t s_free_blocks_count;
    uint32_t s_free_inodes_count;
    uint32_t s_first_data_block;
    uint32_t s_log_block_size;
    int32_t  s_log_frag_size;
    uint32_t s_blocks_per_group;
    uint32_t s_frags_per_group;
    uint32_t s_inodes_per_group;
    uint32_t s_mtime;
    uint32_t s_wtime;
    uint16_t s_mnt_count;
    int16_t  s_max_mnt_count;
    uint16_t s_magic;
    uint16_t s_state;
    uint16_t s_errors;
    uint16_t s_minor_rev_level;
    uint32_t s_lastcheck;
    uint32_t s_checkinterval;
    uint32_t s_creator_os;
    uint32_t s_rev_level;
    uint16_t s_def_resuid;
    uint16_t s_def_resgid;
    // Extended fields for revision 1
    uint32_t s_first_ino;
    uint16_t s_inode_size;
    uint16_t s_block_group_nr;
    uint32_t s_feature_compat;
    uint32_t s_feature_incompat;
    uint32_t s_feature_ro_compat;
    uint8_t  s_uuid[16];
    char     s_volume_name[16];
    char     s_last_mounted[64];
    uint32_t s_algo_bitmap;
    // Padding to reach 1024 bytes
    uint8_t  padding[820];
} __attribute__((packed));

// Block Group Descriptor
struct ext2_group_desc {
    uint32_t bg_block_bitmap;
    uint32_t bg_inode_bitmap;
    uint32_t bg_inode_table;
    uint16_t bg_free_blocks_count;
    uint16_t bg_free_inodes_count;
    uint16_t bg_used_dirs_count;
    uint16_t bg_pad;
    uint32_t bg_reserved[3];
} __attribute__((packed));

// Inode structure
struct ext2_inode {
    uint16_t i_mode;
    uint16_t i_uid;
    uint32_t i_size;
    uint32_t i_atime;
    uint32_t i_ctime;
    uint32_t i_mtime;
    uint32_t i_dtime;
    uint16_t i_gid;
    uint16_t i_links_count;
    uint32_t i_blocks;
    uint32_t i_flags;
    uint32_t i_osd1;
    uint32_t i_block[15];
    uint32_t i_generation;
    uint32_t i_file_acl;
    uint32_t i_size_high;
    uint32_t i_faddr;
    uint8_t  i_osd2[12];
} __attribute__((packed));

// Directory entry
struct ext2_dir_entry_2 {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
    char     name[];
} __attribute__((packed));

// Verify struct sizes at compile time
_Static_assert(sizeof(struct ext2_super_block) == 1024, "Superblock must be 1024 bytes");
_Static_assert(sizeof(struct ext2_group_desc) == 32, "Group descriptor must be 32 bytes");
_Static_assert(sizeof(struct ext2_inode) == 128, "Inode must be 128 bytes");
// fs_context.h - Runtime filesystem context

struct fs_context {
    FILE *disk;                         // Open file handle
    struct ext2_super_block sb;         // Cached superblock
    uint32_t block_size;                // Calculated block size
    uint32_t groups_count;              // Number of block groups
    uint32_t inode_size;                // Size of each inode
    struct ext2_group_desc *bgdt;       // Block group descriptors
    char *image_path;                   // Path to disk image
};

// Parsed inode with human-readable info
struct parsed_inode {
    struct ext2_inode raw;              // Raw inode data
    uint32_t ino;                       // Inode number
    char type_char;                     // 'd', '-', 'l', etc.
    char perm_str[11];                  // "drwxr-xr-x"
    char mtime_str[32];                 // Human-readable timestamp
    char atime_str[32];
    char ctime_str[32];
};

// Directory entry for iteration
struct dir_entry_info {
    uint32_t inode;
    uint8_t  file_type;
    char     name[256];
};

4.4 Algorithm Overview

Superblock Parsing:

1. Seek to offset 1024
2. Read 1024 bytes into ext2_super_block struct
3. Validate magic number == 0xEF53
4. Calculate: block_size = 1024 << s_log_block_size
5. Calculate: groups_count = ceil(s_blocks_count / s_blocks_per_group)
6. Determine inode_size (128 for rev 0, s_inode_size for rev 1+)

Inode Location:

Given inode number N:
1. block_group = (N - 1) / s_inodes_per_group
2. local_index = (N - 1) % s_inodes_per_group
3. inode_table_block = bgdt[block_group].bg_inode_table
4. byte_offset = inode_table_block * block_size + local_index * inode_size
5. Seek to byte_offset, read inode_size bytes

Directory Traversal:

Given directory inode:
1. Read data blocks pointed to by inode.i_block[0..11]
2. For each block:
   a. offset = 0
   b. While offset < block_size:
      - Read dir_entry at offset
      - If entry.inode != 0, process entry
      - offset += entry.rec_len

Path Resolution:

Given path "/a/b/c":
1. current_inode = 2 (root)
2. Split path by '/'
3. For each component:
   a. Read current_inode as directory
   b. Search entries for component name
   c. If found, current_inode = entry.inode
   d. If not found, return -ENOENT
4. Return current_inode

5. Implementation Guide

5.1 Development Environment Setup

Required packages (Ubuntu/Debian):

sudo apt install build-essential
sudo apt install e2fsprogs     # For mkfs.ext2, dumpe2fs, debugfs
sudo apt install hexdump xxd   # For verification

Required packages (macOS):

brew install e2fsprogs
# Note: mounting ext2 on macOS requires ext4fuse or similar

Project structure:

fsexplore/
├── Makefile
├── include/
│   ├── ext2_defs.h
│   ├── fs_context.h
│   ├── superblock.h
│   ├── block_group.h
│   ├── inode.h
│   ├── directory.h
│   └── path.h
├── src/
│   ├── main.c
│   ├── fs_context.c
│   ├── superblock.c
│   ├── block_group.c
│   ├── inode.c
│   ├── directory.c
│   └── path.c
├── tests/
│   ├── test_superblock.c
│   ├── test_inode.c
│   └── test_integration.sh
└── README.md

Makefile:

CC = gcc
CFLAGS = -Wall -Wextra -Werror -std=c11 -I include
LDFLAGS =
SRC = src/main.c src/fs_context.c src/superblock.c src/block_group.c \
      src/inode.c src/directory.c src/path.c
OBJ = $(SRC:.c=.o)
TARGET = fsexplore

all: $(TARGET)

$(TARGET): $(OBJ)
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

clean:
	rm -f $(OBJ) $(TARGET)

test: $(TARGET)
	./tests/test_integration.sh

.PHONY: all clean test

Create test disk image:

# Create 10MB ext2 image
dd if=/dev/zero of=test.img bs=1M count=10
mkfs.ext2 test.img

# Add some content
mkdir -p /tmp/mnt
sudo mount -o loop test.img /tmp/mnt
sudo mkdir /tmp/mnt/subdir
echo "Hello World" | sudo tee /tmp/mnt/hello.txt
echo "Test file" | sudo tee /tmp/mnt/subdir/test.txt
sudo umount /tmp/mnt

# Verify with standard tools
dumpe2fs test.img 2>/dev/null | head -30
debugfs -R "ls -l /" test.img

5.2 Project Structure

See section 5.1 for directory layout. Each source file should:

  • Include only necessary headers
  • Define static functions for internal use
  • Export public functions declared in headers
  • Use consistent error handling (return -1 or NULL on error)

5.3 The Core Question You Are Answering

“How does a filesystem actually store my files on disk? What does ‘ext2’ really mean?”

When you create a file named “hello.txt” and write “Hello World” to it, where do those bytes actually go? How does the OS know which disk sectors contain your file versus someone else’s?

Most developers treat filesystems as black boxes. This project rips away the abstraction. You will read raw disk bytes and decode the exact structures that ext2 uses. When you finish, you will see filesystems not as magic, but as elegant data structures mapped onto persistent storage.

5.4 Concepts You Must Understand First

Before coding, ensure you can answer these questions:

Binary File I/O in C:

  • How do you open a file in binary mode? (fopen(path, "rb"))
  • What is the difference between fread() and fscanf()?
  • How do you seek to a specific byte offset? (fseek(), lseek())
  • Why is byte alignment important when reading structs from disk?

Book Reference: “C Programming: A Modern Approach” Ch. 22 - K.N. King

Structs and Binary Data Layout:

  • How are struct members laid out in memory?
  • What is struct padding, and why does it exist?
  • How does __attribute__((packed)) affect struct layout?
  • Why might the same struct have different sizes on different architectures?

Book Reference: “The C Programming Language” Ch. 6 - Kernighan & Ritchie

Hexadecimal and Binary Number Systems:

  • Why is hex the natural way to read binary data?
  • How do you interpret multi-byte integers (endianness)?
  • What does “0xEF53” as a magic number mean?
  • How do bitmasks work? (e.g., 0x01FF for file permissions)

Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 2 - Bryant & O’Hallaron

5.5 Questions to Guide Your Design

Phase 1: Reading the Superblock

  1. Why is the ext2 superblock always at byte offset 1024?
  2. What happens in the first 1024 bytes? (Boot sector space)
  3. How do you read exactly 1024 bytes starting at offset 1024?
  4. What is the “magic number” for ext2? Where in the superblock is it?
  5. What should your program do if the magic number is wrong?
  6. How do you determine the block size from s_log_block_size?

Phase 2: Navigating to Inodes

  1. Where does the Block Group Descriptor Table start?
  2. How big is each block group descriptor struct?
  3. How do you calculate which block group contains a given inode number?
  4. Given inode number N, what is its byte offset on disk?
  5. What is the size of an inode struct?

Phase 3: Following Block Pointers

  1. An inode has 12 direct block pointers. How much data can they address?
  2. What is i_block[12]? How do you follow it?
  3. If a block pointer is 0, what does that mean? (Sparse file)
  4. How would you read the 13th data block of a file?

Phase 4: Parsing Directories

  1. What fields are in struct ext2_dir_entry_2?
  2. Why is rec_len variable?
  3. How do you iterate through directory entries in a block?
  4. How do you extract the filename from a directory entry?

5.6 Thinking Exercise

Trace ext2 Structures By Hand

Before coding, draw these structures on paper for a minimal ext2 filesystem:

Scenario: A fresh ext2 filesystem with:

  • Block size: 1024 bytes
  • Total blocks: 1024 (1MB filesystem)
  • Inodes: 128
  • One block group

Given files:

/ (root directory, inode 2)
├── hello.txt (inode 12, contains "Hello World\n", 12 bytes)
└── subdir/ (inode 13)
    └── test.txt (inode 14, contains "Test\n", 5 bytes)

Questions to trace:

  1. Superblock (offset 1024)
    • Draw the superblock. Where is the magic number 0xEF53?
    • Calculate: s_blocks_count = 1024, s_inodes_count = 128, s_log_block_size = 0 (means 1024)
  2. Block Group Descriptor Table (offset 2048)
    • Draw the single BGDT entry
    • Where is the inode bitmap? Block bitmap? Inode table?
  3. Inode Table
    • Draw inode 2 (root directory). What blocks does it use?
    • Draw inode 12 (hello.txt). It is 12 bytes, so it uses 1 block. Which block?
  4. Directory Data
    • Root directory’s data block contains directory entries. Draw the entries:
      [inode=2, name="."]
      [inode=2, name=".."]
      [inode=12, name="hello.txt"]
      [inode=13, name="subdir"]
      
    • How big is each entry? (variable due to filename length)
  5. Path Resolution
    • Trace “/subdir/test.txt”:
      1. Start at inode 2 (root)
      2. Read root’s directory data
      3. Find “subdir” entry, get inode 13
      4. Read inode 13’s directory data
      5. Find “test.txt” entry, get inode 14
      6. Read inode 14 to get file data blocks

Draw this layout:

Byte 0    : Boot sector (unused)
Byte 1024 : Superblock
Byte 2048 : Block Group Descriptor Table
Byte 3072 : Block bitmap (1024 bits, one per block)
Byte 4096 : Inode bitmap (128 bits, one per inode)
Byte 5120 : Inode table (128 inodes x 128 bytes = 16384 bytes)
Byte 21504: Data blocks start here

This exercise builds the mental model you need before parsing real disk images.

5.7 Hints in Layers

Hint 1: Start with superblock (Conceptual Direction)

Your first goal: read and display the superblock. Do not try to implement everything at once.

#include <stdio.h>
#include <stdint.h>

#define EXT2_SUPER_MAGIC 0xEF53

struct ext2_superblock {
    uint32_t s_inodes_count;
    uint32_t s_blocks_count;
    uint32_t s_r_blocks_count;
    uint32_t s_free_blocks_count;
    uint32_t s_free_inodes_count;
    uint32_t s_first_data_block;
    uint32_t s_log_block_size;
    // ... more fields
    uint8_t  padding[968];  // Pad to 1024 bytes
} __attribute__((packed));

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <disk.img>\n", argv[0]);
        return 1;
    }

    FILE *disk = fopen(argv[1], "rb");
    if (!disk) {
        perror("fopen");
        return 1;
    }

    // Seek to superblock
    fseek(disk, 1024, SEEK_SET);

    // Read superblock
    struct ext2_superblock sb;
    if (fread(&sb, sizeof(sb), 1, disk) != 1) {
        perror("fread");
        fclose(disk);
        return 1;
    }

    // Validate magic
    uint16_t *magic_ptr = (uint16_t *)((char *)&sb + 56);
    if (*magic_ptr != EXT2_SUPER_MAGIC) {
        fprintf(stderr, "Not an ext2 filesystem (magic: 0x%04X)\n", *magic_ptr);
        fclose(disk);
        return 1;
    }

    printf("Valid ext2 filesystem!\n");
    printf("Block size: %u bytes\n", 1024 << sb.s_log_block_size);
    printf("Total blocks: %u\n", sb.s_blocks_count);
    printf("Total inodes: %u\n", sb.s_inodes_count);

    fclose(disk);
    return 0;
}

Hint 2: Define complete structs (More Specific)

Use the ext2 documentation to define complete structs. The key is proper field ordering and __attribute__((packed)):

// The magic number is at offset 56 within the superblock
// Byte 0-3:   s_inodes_count
// Byte 4-7:   s_blocks_count
// ...
// Byte 56-57: s_magic

Verify your struct matches by checking sizeof() and comparing field offsets.

Hint 3: Calculate offsets carefully (Technical Details)

To find inode N:

uint32_t group = (inode_num - 1) / sb.s_inodes_per_group;
uint32_t index = (inode_num - 1) % sb.s_inodes_per_group;
uint32_t inode_table_block = bgdt[group].bg_inode_table;
off_t offset = (off_t)inode_table_block * block_size + (off_t)index * inode_size;

Note: Inodes are numbered from 1, not 0. Always subtract 1 before division/modulo.

Hint 4: Handle endianness (Verification)

ext2 stores integers in little-endian. On little-endian systems (x86), read directly. On big-endian, convert:

#include <endian.h>

uint32_t blocks = le32toh(sb.s_blocks_count);
uint16_t magic = le16toh(sb.s_magic);

Verify with xxd that your interpretation matches raw bytes.

Hint 5: Parse directory entries carefully

Directory entries are variable-length and packed:

struct ext2_dir_entry_2 {
    uint32_t inode;
    uint16_t rec_len;   // Length of THIS entry (including padding)
    uint8_t  name_len;  // Actual length of filename
    uint8_t  file_type;
    char     name[];    // Variable-length (name_len bytes, NOT null-terminated)
} __attribute__((packed));

// Iteration through a directory block:
char *block_data = read_block(block_num);
char *ptr = block_data;

while (ptr < block_data + block_size) {
    struct ext2_dir_entry_2 *entry = (struct ext2_dir_entry_2 *)ptr;

    if (entry->inode != 0) {  // Skip deleted entries
        char name[256];
        memcpy(name, entry->name, entry->name_len);
        name[entry->name_len] = '\0';
        printf("[%u] %s\n", entry->inode, name);
    }

    ptr += entry->rec_len;  // Move to next entry
}

Hint 6: Use hex editor to verify

When in doubt, use xxd to inspect raw bytes:

# Read superblock
xxd -s 1024 -l 128 disk.img

# Read specific block (block 5, assuming 4KB blocks)
xxd -s $((5 * 4096)) -l 256 disk.img

Compare byte values with your parsed output to find bugs.

5.8 Interview Questions They Will Ask

Conceptual Questions:

  1. “What is an inode, and what information does it store?”
    • Expected: Metadata (permissions, timestamps, size, owner), block pointers, but NOT the filename. Explain why separating names from metadata enables hard links.
  2. “How does ext2 resolve a file path like ‘/home/user/file.txt’?”
    • Expected: Start at root inode (2), read directory entries, find “home” entry’s inode, read that directory, find “user”, repeat for “file.txt”.
  3. “Why does ext2 use direct, indirect, double-indirect, and triple-indirect pointers?”
    • Expected: Balance between small file efficiency (direct pointers are fast) and large file support (indirection allows huge files). Most files are small, so optimize the common case.
  4. “What is the maximum file size in ext2 with 4096-byte blocks?”
    • Expected: Calculate and show work: 12 direct (48KB) + 1024 indirect (4MB) + 1024^2 double (4GB) + 1024^3 triple (4TB) = ~4TB
  5. “What happens if a program writes to byte offset 1,000,000,000 in an empty file without writing the bytes before it?”
    • Expected: Sparse file. Block pointers for unwritten regions are 0 (holes). Only actually written blocks are allocated. Reading holes returns zeros.

Implementation Questions:

  1. “Why is the superblock at offset 1024 instead of 0?”
    • Expected: First 1024 bytes reserved for boot sector/MBR/partition table. Some systems boot from the filesystem’s boot sector.
  2. “What makes ‘.’ and ‘..’ directory entries special?”
    • Expected: . points to current directory’s inode, .. points to parent’s inode. They enable relative path navigation and are required in every directory.
  3. “How would you implement ‘fsck’ (filesystem consistency check)?”
    • Expected: Walk all inodes, verify bitmaps match actual usage, check directory tree integrity, verify link counts, detect orphan inodes.
  4. “What happens to the data when you delete a file?”
    • Expected: Directory entry removed, inode link count decremented. When link count reaches 0, blocks marked free in bitmap. Data NOT erased–remains until overwritten. This is why data recovery is possible.

5.9 Books That Will Help

Topic Book Chapter
Binary file I/O in C “C Programming: A Modern Approach” by K.N. King Ch. 22 (Input/Output)
Struct layout and padding “The C Programming Language” by Kernighan & Ritchie Ch. 6 (Structures)
Binary data representation “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron Ch. 2 (Representing Information)
Filesystem fundamentals “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau Ch. 39-40 (Files, File System Implementation)
ext2 structure details “Understanding the Linux Kernel” by Bovet & Cesati Ch. 18 (Ext2/Ext3 Filesystems)
Directory operations “The Linux Programming Interface” by Michael Kerrisk Ch. 18 (Directories and Links)
Debugging binary data “The Art of Debugging with GDB, DDD, and Eclipse” by Matloff Ch. 1-3 (Examining Memory)

5.10 Implementation Phases

Phase 1: Binary I/O Foundation (Days 1-2)

Goals:

  • Create disk image I/O layer
  • Read raw bytes at specific offsets
  • Verify you can read the superblock magic number

Steps:

  1. Create a minimal program that opens a file in binary mode
  2. Seek to offset 1024 and read 2 bytes at offset 56 (magic number location)
  3. Verify you read 0x53EF (little-endian for 0xEF53)

Testing:

# Create test image
dd if=/dev/zero of=test.img bs=1M count=10
mkfs.ext2 test.img

# Verify magic number manually
xxd -s 1080 -l 2 test.img
# Should show: 53 ef

Key learning: Binary file I/O with fopen("rb"), fseek(), fread()

Phase 2: Superblock Parser (Days 2-3)

Goals:

  • Define complete superblock struct
  • Parse all relevant fields
  • Display human-readable output

Steps:

  1. Define struct ext2_super_block with all fields (use ext2 documentation)
  2. Read the struct from offset 1024
  3. Calculate derived values (block size = 1024 « s_log_block_size)
  4. Print formatted output

Testing:

# Compare with dumpe2fs output
dumpe2fs test.img 2>/dev/null | head -30
./fsexplore test.img superblock
# Values should match

Phase 3: Block Group Navigation (Days 3-4)

Goals:

  • Parse Block Group Descriptor Table
  • Locate inode tables and bitmaps
  • Navigate to specific block groups

Steps:

  1. BGDT starts at block 1 (or 2 for 1KB blocks after superblock)
  2. Each descriptor is 32 bytes
  3. Parse inode table location, bitmap locations
  4. Calculate number of groups from superblock

Testing:

# Verify with dumpe2fs
dumpe2fs test.img 2>/dev/null | grep -A 10 "Group 0"

Phase 4: Inode Reader (Days 4-6)

Goals:

  • Calculate inode location from inode number
  • Read and parse inode structures
  • Decode permissions and file types

Steps:

  1. Given inode N, find its block group: (N - 1) / inodes_per_group
  2. Find offset within group: (N - 1) % inodes_per_group
  3. Get inode table start from BGDT
  4. Read inode at calculated offset

Testing:

# Create files and check their inodes
sudo mount -o loop test.img /tmp/mnt
sudo touch /tmp/mnt/testfile
ls -i /tmp/mnt/testfile  # Shows inode number
sudo umount /tmp/mnt
./fsexplore test.img inode <number>
# Compare with: debugfs -R "stat <number>" test.img

Phase 5: Directory Traversal (Days 6-8)

Goals:

  • Read directory data blocks
  • Parse variable-length directory entries
  • List directory contents

Steps:

  1. Read directory inode (inode 2 for root)
  2. Follow block pointers to read directory data
  3. Parse entries: inode, rec_len, name_len, name
  4. Handle variable-length entries correctly

Testing:

# Add files to test image
sudo mount -o loop test.img /tmp/mnt
sudo mkdir /tmp/mnt/subdir
sudo touch /tmp/mnt/file1.txt /tmp/mnt/file2.txt
sudo umount /tmp/mnt
./fsexplore test.img dir /
# Compare with: debugfs -R "ls -l /" test.img

Phase 6: Path Resolution (Days 8-10)

Goals:

  • Parse path strings
  • Traverse directory tree
  • Find final inode for any path

Steps:

  1. Split path by ‘/’ separator
  2. Start at inode 2 (root)
  3. For each component, read directory, find matching entry
  4. Use entry’s inode as new current inode
  5. Repeat until path exhausted

Testing:

sudo mount -o loop test.img /tmp/mnt
sudo mkdir -p /tmp/mnt/a/b/c
sudo touch /tmp/mnt/a/b/c/deep.txt
sudo umount /tmp/mnt
./fsexplore test.img path /a/b/c/deep.txt

Phase 7: Indirect Blocks (Days 10-14)

Goals:

  • Handle files larger than 48KB
  • Follow indirect, double-indirect, triple-indirect pointers
  • Read any byte from any file

Steps:

  1. For logical block 0-11, use direct pointers
  2. For block 12-1035, read indirect block, use Nth pointer
  3. For blocks 1036+, read double-indirect, then indirect
  4. Implement get_block_num(inode, logical_block) function

Testing:

sudo mount -o loop test.img /tmp/mnt
dd if=/dev/urandom of=/tmp/mnt/largefile bs=1M count=5
sudo umount /tmp/mnt
./fsexplore test.img file-blocks /largefile
# Verify indirect blocks are being used

5.11 Key Implementation Decisions

Decision 1: Read-through vs. Caching

  • Simple: Read from disk for every operation (slower but simpler)
  • Better: Cache superblock and BGDT in context struct
  • Best: Add block cache for frequently accessed blocks

Decision 2: Error Handling Strategy

  • Return -1 and set errno
  • Or return error codes and use error message buffer
  • Be consistent throughout

Decision 3: Path Buffer Management

  • Static buffers with fixed max length (simpler)
  • Dynamic allocation (more flexible, requires careful free)
  • Consider PATH_MAX (4096 on Linux)

Decision 4: Output Format

  • Human-readable text (default)
  • JSON for scripting
  • Raw hex dump option

6. Testing Strategy

6.1 Unit Tests

// test_superblock.c
#include <assert.h>
#include <string.h>
#include "ext2_defs.h"

// Test superblock validation
void test_validate_superblock() {
    struct ext2_super_block sb;
    memset(&sb, 0, sizeof(sb));

    // Set valid magic at correct offset (byte 56)
    sb.s_magic = 0xEF53;
    // Function should return true for valid superblock
    assert(is_valid_ext2(&sb) == 1);

    sb.s_magic = 0x0000;
    assert(is_valid_ext2(&sb) == 0);

    printf("test_validate_superblock: PASSED\n");
}

// Test block size calculation
void test_block_size() {
    struct ext2_super_block sb;
    memset(&sb, 0, sizeof(sb));

    sb.s_log_block_size = 0;  // 1024 << 0 = 1024
    assert(get_block_size(&sb) == 1024);

    sb.s_log_block_size = 1;  // 1024 << 1 = 2048
    assert(get_block_size(&sb) == 2048);

    sb.s_log_block_size = 2;  // 1024 << 2 = 4096
    assert(get_block_size(&sb) == 4096);

    printf("test_block_size: PASSED\n");
}

// Test inode location calculation
void test_inode_location() {
    uint32_t inodes_per_group = 8192;

    // Inode 1 is at group 0, index 0
    assert(get_inode_group(1, inodes_per_group) == 0);
    assert(get_inode_index(1, inodes_per_group) == 0);

    // Inode 8192 is at group 0, index 8191
    assert(get_inode_group(8192, inodes_per_group) == 0);
    assert(get_inode_index(8192, inodes_per_group) == 8191);

    // Inode 8193 is at group 1, index 0
    assert(get_inode_group(8193, inodes_per_group) == 1);
    assert(get_inode_index(8193, inodes_per_group) == 0);

    printf("test_inode_location: PASSED\n");
}

// Test permission string generation
void test_permission_string() {
    char perm[11];

    // Directory with rwxr-xr-x
    mode_to_string(S_IFDIR | 0755, perm);
    assert(strcmp(perm, "drwxr-xr-x") == 0);

    // Regular file with rw-r--r--
    mode_to_string(S_IFREG | 0644, perm);
    assert(strcmp(perm, "-rw-r--r--") == 0);

    // Symlink with rwxrwxrwx
    mode_to_string(S_IFLNK | 0777, perm);
    assert(strcmp(perm, "lrwxrwxrwx") == 0);

    printf("test_permission_string: PASSED\n");
}

int main() {
    test_validate_superblock();
    test_block_size();
    test_inode_location();
    test_permission_string();
    printf("\nAll unit tests passed!\n");
    return 0;
}

6.2 Integration Tests

#!/bin/bash
# test_integration.sh

set -e

TOOL=./fsexplore
TESTIMG=test_integration.img
MNT=/tmp/fsexplore_test_mnt

cleanup() {
    sudo umount $MNT 2>/dev/null || true
    rm -f $TESTIMG
    rmdir $MNT 2>/dev/null || true
}
trap cleanup EXIT

# Create test image
echo "Creating test image..."
dd if=/dev/zero of=$TESTIMG bs=1M count=10 2>/dev/null
mkfs.ext2 -F $TESTIMG 2>/dev/null

# Mount and populate
mkdir -p $MNT
sudo mount -o loop $TESTIMG $MNT
sudo mkdir $MNT/testdir
echo "Hello World" | sudo tee $MNT/testdir/hello.txt > /dev/null
echo "Another file" | sudo tee $MNT/file1.txt > /dev/null
sudo umount $MNT

# Test 1: Superblock magic
echo -n "Test 1 - Superblock magic: "
$TOOL $TESTIMG superblock | grep -q "0xEF53" && echo "PASS" || echo "FAIL"

# Test 2: Block size
echo -n "Test 2 - Block size parsed: "
$TOOL $TESTIMG superblock | grep -q "Block size:" && echo "PASS" || echo "FAIL"

# Test 3: Root directory listing
echo -n "Test 3 - Root directory: "
$TOOL $TESTIMG dir / | grep -q "testdir" && echo "PASS" || echo "FAIL"

# Test 4: Subdirectory listing
echo -n "Test 4 - Subdirectory: "
$TOOL $TESTIMG dir /testdir | grep -q "hello.txt" && echo "PASS" || echo "FAIL"

# Test 5: Path resolution
echo -n "Test 5 - Path resolution: "
$TOOL $TESTIMG path /testdir/hello.txt | grep -q "inode" && echo "PASS" || echo "FAIL"

# Test 6: Root inode (always 2)
echo -n "Test 6 - Root inode is 2: "
$TOOL $TESTIMG inode 2 | grep -q "Directory" && echo "PASS" || echo "FAIL"

# Test 7: Invalid path handling
echo -n "Test 7 - Invalid path: "
$TOOL $TESTIMG path /nonexistent 2>&1 | grep -qi "not found\|error\|ENOENT" && echo "PASS" || echo "FAIL"

# Test 8: Compare with dumpe2fs
echo -n "Test 8 - Match dumpe2fs inode count: "
EXPECTED=$(dumpe2fs $TESTIMG 2>/dev/null | grep "Inode count:" | awk '{print $3}')
ACTUAL=$($TOOL $TESTIMG superblock | grep -i "inode" | head -1 | grep -o '[0-9]*')
[ "$EXPECTED" = "$ACTUAL" ] && echo "PASS" || echo "FAIL (expected $EXPECTED, got $ACTUAL)"

echo ""
echo "Integration tests complete."

6.3 Comparison Testing

# Compare output with standard tools
echo "Comparing with dumpe2fs..."
dumpe2fs test.img 2>/dev/null > expected_sb.txt
./fsexplore test.img superblock > actual_sb.txt
# Manually compare key fields

echo "Comparing with debugfs..."
debugfs -R "stat <2>" test.img 2>/dev/null > expected_inode2.txt
./fsexplore test.img inode 2 > actual_inode2.txt
# Manually compare fields

echo "Comparing directory listing..."
debugfs -R "ls -l /" test.img 2>/dev/null > expected_root.txt
./fsexplore test.img dir / > actual_root.txt
# Compare entries

7. Common Pitfalls and Debugging

Pitfall 1: Struct Padding

Problem: Your struct size does not match on-disk size.

struct bad_inode {
    uint16_t i_mode;    // 2 bytes
    uint16_t i_uid;     // 2 bytes
    uint32_t i_size;    // 4 bytes
};  // Might be 8 bytes due to alignment, but ext2 expects specific layout

Symptom: Fields after the first few are garbage values.

Solution: Use packed attribute and verify size.

struct ext2_inode {
    // fields exactly as documented...
} __attribute__((packed));

// Compile-time verification
_Static_assert(sizeof(struct ext2_inode) == 128, "Inode size mismatch");

Verification:

# Check your struct size
printf "sizeof(struct ext2_inode) = %zu\n", sizeof(struct ext2_inode)
# Should print 128

Pitfall 2: Off-by-One in Inode Numbering

Problem: Inodes are numbered from 1, not 0.

// WRONG: Reading inode 1 at index 1
uint32_t index = inode_num / inodes_per_group;

// CORRECT: Reading inode 1 at index 0
uint32_t index = (inode_num - 1) / inodes_per_group;

Symptom: Root directory (inode 2) shows wrong data, or all inodes are offset by one.

Verification:

# Use debugfs to verify inode 2 location
debugfs -R "imap <2>" test.img

Pitfall 3: Block 0 Special Case (Sparse Files)

Problem: Block pointer 0 does not mean block 0–it means “hole” (sparse file).

// WRONG
uint32_t block = inode->i_block[0];
read_block(block, buffer);  // Reads block 0, which is boot sector!

// CORRECT
uint32_t block = inode->i_block[0];
if (block == 0) {
    // Sparse region - return zeros
    memset(buffer, 0, block_size);
} else {
    read_block(block, buffer);
}

Symptom: Reading files shows boot sector data or crashes.

Pitfall 4: Directory Entry Alignment

Problem: Directory entries have padding to 4-byte boundaries.

// WRONG: Using calculated size
entry = (struct ext2_dir_entry_2 *)((char *)entry + 8 + entry->name_len);

// CORRECT: Using rec_len which includes padding
entry = (struct ext2_dir_entry_2 *)((char *)entry + entry->rec_len);

Symptom: Directory listing shows garbage after first few entries.

Pitfall 5: Filename Not Null-Terminated

Problem: Directory entry names are NOT null-terminated.

// WRONG
printf("Name: %s\n", entry->name);  // May print garbage

// CORRECT
char name[256];
memcpy(name, entry->name, entry->name_len);
name[entry->name_len] = '\0';
printf("Name: %s\n", name);

Pitfall 6: Endianness on Big-Endian Systems

Problem: ext2 is little-endian; big-endian systems need conversion.

// Check if conversion needed
#if __BYTE_ORDER == __BIG_ENDIAN
    #define LE32(x) le32toh(x)
    #define LE16(x) le16toh(x)
#else
    #define LE32(x) (x)
    #define LE16(x) (x)
#endif

uint32_t blocks = LE32(sb.s_blocks_count);

Debugging Techniques

1. Hex dump verification:

# Read superblock bytes manually
xxd -s 1024 -l 256 disk.img

# Read specific inode (inode 2, 128 bytes each, after inode table start)
# First find inode table block from BGDT

2. Known-good comparison:

# Use debugfs to explore
debugfs disk.img
debugfs: stats
debugfs: stat <2>
debugfs: ls -l /
debugfs: dump <12> /tmp/extracted_file

3. Minimal test cases:

# Create minimal filesystem
dd if=/dev/zero of=tiny.img bs=1K count=1024
mkfs.ext2 -F -b 1024 -N 16 tiny.img
# Only 16 inodes, 1MB total - easy to trace manually

4. Add verbose debugging output:

#ifdef DEBUG
    #define DBG(fmt, ...) fprintf(stderr, "[DEBUG] " fmt "\n", ##__VA_ARGS__)
#else
    #define DBG(fmt, ...)
#endif

// Usage
DBG("Reading inode %u at offset %ld", ino, offset);

8. Extensions and Challenges

Extension 1: ext4 Extent Support

ext4 uses extents instead of indirect blocks for more efficient large file storage:

struct ext4_extent_header {
    uint16_t eh_magic;      // 0xF30A
    uint16_t eh_entries;
    uint16_t eh_max;
    uint16_t eh_depth;
    uint32_t eh_generation;
};

struct ext4_extent {
    uint32_t ee_block;      // Logical block number
    uint16_t ee_len;        // Number of blocks
    uint16_t ee_start_hi;   // High 16 bits of physical block
    uint32_t ee_start_lo;   // Low 32 bits of physical block
};

Challenge: Detect extent flag in inode, parse extent tree instead of block pointers.

Extension 2: Deleted File Recovery

// Find deleted inodes by:
// 1. Scanning inode table for inodes with dtime != 0
// 2. Following their block pointers (if still valid)
// 3. Checking block bitmap for actually-free blocks

void recover_deleted_files(struct fs_context *ctx) {
    for (uint32_t ino = 1; ino <= ctx->sb.s_inodes_count; ino++) {
        struct ext2_inode inode;
        read_inode(ctx, ino, &inode);

        if (inode.i_dtime != 0 && inode.i_size > 0) {
            printf("Deleted inode %u, size %u, deleted at %s\n",
                   ino, inode.i_size, ctime(&inode.i_dtime));
            // Attempt recovery...
        }
    }
}

Extension 3: Visual Block Map

$ ./fsexplore disk.img blockmap

Block usage map (# = used, . = free, S = superblock, I = inode table):

Group 0:
     0         1         2         3         4
     01234567890123456789012345678901234567890123456789
 0:  SBBIIIIIIIII################################......
 1:  ####################............................
 2:  ................................................

Legend: S=Superblock B=BGDT/Bitmap I=Inode table #=Data .=Free

Extension 4: JSON Output

{
  "superblock": {
    "magic": "0xEF53",
    "block_size": 4096,
    "inodes_count": 128016,
    "blocks_count": 512000,
    "free_inodes_count": 127993,
    "free_blocks_count": 451231
  },
  "block_groups": [
    {
      "group": 0,
      "inode_bitmap": 3,
      "block_bitmap": 4,
      "inode_table": 5,
      "free_blocks": 28231,
      "free_inodes": 7993
    }
  ]
}

Extension 5: Cross-Filesystem Support

Add support for reading FAT32 filesystems:

  • Parse FAT boot sector (different structure)
  • Follow FAT cluster chains (different allocation method)
  • Handle VFAT long filenames (different directory entry format)

This teaches you how different design decisions affect filesystem structure.

Challenge: Write Support (Advanced)

Implement read-write mode:

  1. Modify directory entries (rename, delete)
  2. Update bitmaps when allocating/freeing blocks
  3. Modify inode metadata
  4. Update superblock free counts

Warning: Test only on disk images, never on real filesystems!


9. Real-World Connections

Forensic Analysis

Forensic investigators use similar tools to:

  • Recover deleted files: Read freed inodes whose blocks have not been overwritten
  • Analyze file timestamps: Track when files were created, modified, accessed
  • Detect tampering: Check for inconsistencies between timestamps and content
  • Extract evidence: Pull files from disk images without mounting

Popular tools like Autopsy, Sleuth Kit, and ext4magic build on these fundamentals.

Data Recovery

Understanding filesystem internals enables:

  • Recovering data from corrupted filesystems: Manually parse structures when fsck fails
  • Rebuilding superblocks: Copy from backup superblock locations
  • Extracting files without mounting: Read damaged filesystems that will not mount
  • Undeleting files: Find and recover recently deleted data

Operating System Development

OS developers implementing filesystem drivers need to:

  • Parse on-disk structures correctly
  • Handle all edge cases (sparse files, hard links, symlinks)
  • Optimize block allocation for performance
  • Implement crash recovery mechanisms

Cloud and Database Systems

Similar concepts apply to:

  • Cloud block storage: AWS EBS, Azure Disk work at block level
  • Database storage engines: MySQL InnoDB, PostgreSQL use similar block/page concepts
  • Object storage internals: S3 and similar systems organize data in blocks
  • Log-structured filesystems: Understanding traditional filesystems helps understand newer designs

10. Resources

Documentation

Books

Book Relevant Chapters
“Operating Systems: Three Easy Pieces” by Arpaci-Dusseau Ch. 39-40 (Files, FS Implementation)
“The Linux Programming Interface” by Michael Kerrisk Ch. 14-15, 18 (Files, Directories)
“Understanding the Linux Kernel” by Bovet & Cesati Ch. 18 (Ext2/Ext3)
“C Programming: A Modern Approach” by K.N. King Ch. 22 (Input/Output)
“Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron Ch. 2 (Data Representation)

Tools

  • dumpe2fs - Display ext2/ext3/ext4 filesystem information
  • debugfs - Interactive ext2 filesystem debugger
  • xxd - Make hexdump of any file
  • mke2fs / mkfs.ext2 - Create ext2 filesystems
  • e2fsck - Check ext2 filesystem consistency
  • stat - Display file metadata

Example Code


11. Self-Assessment Checklist

Before considering this project complete, verify you can:

Understanding

  • Explain what a superblock contains and why it exists
  • Describe the purpose of block groups and their components
  • Calculate an inode’s disk location given its number
  • Describe how directory entries link names to inodes
  • Trace a path resolution from root to any file
  • Explain why indirect blocks exist and how they work
  • Calculate maximum file size for given block size

Implementation

  • Open and validate ext2 disk images
  • Parse and display superblock information matching dumpe2fs
  • Read any inode and display its metadata
  • List contents of any directory
  • Resolve arbitrary file paths to inode numbers
  • Handle indirect blocks for large files

Quality

  • Use a hex editor to manually verify your parsing is correct
  • Create test cases for edge cases (empty directories, sparse files)
  • Handle errors gracefully (bad magic, corrupted data, invalid paths)
  • Compare output with standard tools (dumpe2fs, debugfs)

Bonus

  • Support ext4 extent-based files
  • Implement deleted file detection
  • Add JSON output mode
  • Create visual block usage maps

12. Submission/Completion Criteria

Your project is complete when:

Minimum Requirements

  1. Compiles without warnings using gcc -Wall -Wextra -std=c11

  2. Passes all integration tests in the test script

  3. Correctly parses superblock with output matching dumpe2fs for:
    • Magic number
    • Block size
    • Inode count
    • Block count
    • Free counts
  4. Correctly reads inodes with accurate:
    • File type detection
    • Permission string
    • Size
    • Block pointer listing
  5. Correctly lists directories showing:
    • All entries including . and ..
    • Correct inode numbers
    • File types
  6. Correctly resolves paths for:
    • Root directory
    • Single-level paths
    • Multi-level nested paths
    • Handles non-existent paths with error

Stretch Goals

  1. Handles indirect blocks for files larger than 48KB

  2. Supports ext4 extent-based file reading

  3. Provides hex dump mode with structure annotations

  4. Includes comprehensive error handling with meaningful messages

Verification

Run these commands to verify completion:

# Build
make clean && make

# Run tests
make test

# Compare with standard tools
./fsexplore test.img superblock
dumpe2fs test.img 2>/dev/null | head -20

./fsexplore test.img inode 2
debugfs -R "stat <2>" test.img

./fsexplore test.img dir /
debugfs -R "ls -l /" test.img

You have succeeded when your output matches the standard tools for all tested operations.


Summary

This project teaches you to see through the abstraction of filesystems. By building a tool that reads raw disk bytes and interprets them as structured data, you will understand:

  • How files are really stored: Not as contiguous bytes, but as scattered blocks tracked by inodes
  • How directories work: Special files containing name-to-inode mappings
  • How path resolution happens: Recursive traversal through directory entries
  • How metadata is organized: Superblocks, block groups, bitmaps, inode tables
  • How binary data is parsed: Struct packing, endianness, offset calculations

This knowledge transfers directly to:

  • Data recovery and forensics
  • Operating system development
  • Database storage engine design
  • Cloud storage architecture
  • Security analysis and reverse engineering

Build it, trace through it with a hex editor, and you will never look at ls the same way again.