Project 1: Hex Dump Disk Explorer
Project 1: Hex Dump Disk Explorer
Project Overview
| Attribute | Details |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Language | C |
| Knowledge Area | Filesystems / Binary Parsing |
| Main Book | โOperating Systems: Three Easy Piecesโ by Arpaci-Dusseau |
| Prerequisites | C basics, structs and pointers, hex/binary familiarity |
What Youโll Build
A command-line tool that reads raw disk images or block devices and visualizes filesystem structuresโsuperblocks, inode tables, directory entries, and data blocksโin human-readable format.
Real World Output:
$ ./fsexplore disk.img
SUPERBLOCK at offset 1024:
Magic: 0xEF53 โ (ext2 signature)
Block size: 4096 bytes
Inodes: 128016
Free blocks: 45231
INODE #2 (root directory):
Type: Directory
Size: 4096 bytes
Permissions: drwxr-xr-x
Links: 23
Direct blocks: [258, 0, 0, ...]
DIRECTORY at block 258:
[2] .
[2] ..
[11] lost+found
[12] home
[13] etc
Learning Objectives
By completing this project, you will:
- Parse binary data structures from raw disk images using C structs
- Understand ext2/ext4 on-disk layout including superblock, block groups, and inodes
- Navigate inode tables and decode file metadata (permissions, timestamps, block pointers)
- Traverse directory entries and understand path resolution
- Work with indirect block pointers for large file support
- Debug binary parsing issues using hex editors and careful offset calculations
The Core Question Youโre Answering
โHow does a filesystem actually store my files on disk? What does โext2โ really mean?โ
When you create a file named โhello.txtโ and write โHello Worldโ to it, where do those bytes actually go? How does the OS know which disk sectors contain your file versus someone elseโs?
Most developers treat filesystems as black boxes. This project rips away the abstraction. Youโll read raw disk bytes and decode the exact structures that ext2 uses. When you finish, youโll see filesystems not as magic, but as elegant data structures mapped onto persistent storage.
Deep Theoretical Foundation
1. Why Filesystems Organize Data Into Blocks
Hard drives and SSDs donโt operate on individual bytesโthey work with fixed-size units called sectors (historically 512 bytes, now often 4096 bytes on modern drives). Filesystems organize data into blocks, which are typically 1024, 2048, or 4096 bytes.
Why blocks matter:
- Efficiency: Reading one 4KB block is faster than 4096 individual byte reads
- Addressing: With 32-bit block numbers and 4KB blocks, you can address 16TB of storage
- Metadata overhead: Block-level tracking (one bit per block) uses far less space than byte-level tracking
The fundamental tradeoff:
- Larger blocks โ less metadata, faster sequential I/O, more internal fragmentation
- Smaller blocks โ more metadata, less wasted space for small files
2. The Superblock: Filesystemโs Master Record
Every ext2/ext4 filesystem begins with a superblock at byte offset 1024. The first 1024 bytes are reserved for boot code (MBR/GPT).
The superblock contains critical metadata:
struct ext2_super_block {
uint32_t s_inodes_count; // Total number of inodes
uint32_t s_blocks_count; // Total number of blocks
uint32_t s_free_blocks_count; // Free blocks available
uint32_t s_free_inodes_count; // Free inodes available
uint32_t s_first_data_block; // First data block (0 or 1)
uint32_t s_log_block_size; // Block size = 1024 << this value
uint32_t s_blocks_per_group; // Blocks per block group
uint32_t s_inodes_per_group; // Inodes per block group
uint16_t s_magic; // Magic signature (0xEF53)
// ... many more fields
};
Key insight: The superblock is duplicated in every block group. If the primary superblock corrupts, fsck can restore it from backups.
3. Block Groups: Locality for Performance
ext2/ext4 divides the disk into block groups. Each group contains:
- Superblock copy (in some groups)
- Block Group Descriptor Table copy
- Block bitmap (tracks used/free blocks in this group)
- Inode bitmap (tracks used/free inodes)
- Inode table (actual inode structures)
- Data blocks
Why block groups exist:
- Locality: Files stored near their inodes โ faster access
- Parallelism: Different groups can be accessed concurrently
- Resilience: Damage to one group doesnโt destroy the entire filesystem
Block Group 0 Block Group 1 Block Group 2
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
โ Superblock โ โ SB copy (sparse)โ โ SB copy (sparse)โ
โ BGDT โ โ BGDT copy โ โ BGDT copy โ
โ Block bitmap โ โ Block bitmap โ โ Block bitmap โ
โ Inode bitmap โ โ Inode bitmap โ โ Inode bitmap โ
โ Inode table โ โ Inode table โ โ Inode table โ
โ Data blocks... โ โ Data blocks... โ โ Data blocks... โ
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
4. Inodes: The Heart of Unix Filesystems
An inode (index node) is a fixed-size structure (typically 128 or 256 bytes) that stores everything about a file except its name:
struct ext2_inode {
uint16_t i_mode; // File type + permissions
uint16_t i_uid; // Owner user ID
uint32_t i_size; // File size in bytes
uint32_t i_atime; // Last access time
uint32_t i_ctime; // Creation time
uint32_t i_mtime; // Modification time
uint32_t i_dtime; // Deletion time
uint16_t i_gid; // Group ID
uint16_t i_links_count; // Hard link count
uint32_t i_blocks; // Number of 512-byte blocks
uint32_t i_block[15]; // Block pointers
// ... more fields
};
The block pointer array:
i_block[0-11]: Direct block pointers (12 blocks ร 4KB = 48KB directly addressable)i_block[12]: Single indirect pointer (points to block of pointers)i_block[13]: Double indirect pointeri_block[14]: Triple indirect pointer
Maximum file size calculation (4KB blocks, 4-byte pointers):
- Direct: 12 ร 4KB = 48KB
- Single indirect: 1024 ร 4KB = 4MB
- Double indirect: 1024 ร 1024 ร 4KB = 4GB
- Triple indirect: 1024ยณ ร 4KB = 4TB
- Total: ~4TB maximum file size
5. Directory Entries: Mapping Names to Inodes
Directories are special files whose data blocks contain directory entries:
struct ext2_dir_entry_2 {
uint32_t inode; // Inode number
uint16_t rec_len; // Entry length (for traversal)
uint8_t name_len; // Filename length
uint8_t file_type; // File type (1=file, 2=dir, 7=symlink)
char name[]; // Filename (variable length, not null-terminated)
};
Key insight: Filenames live in directory entries, not in inodes. This enables:
- Hard links: Multiple directory entries pointing to same inode
- Efficient rename: Just update directory entry, not file data
- Case sensitivity: Filesystem controls name comparison
Path resolution algorithm:
- Start at root inode (always inode 2)
- Read rootโs directory data blocks
- Search for first path component
- Get that entryโs inode number
- Repeat for each path component
6. Binary Data Layout and Endianness
ext2 stores all multi-byte integers in little-endian format (least significant byte first). On x86/x64 systems, you can read structs directly. On big-endian systems, you need conversion:
// Convert little-endian to host byte order
#include <endian.h>
uint32_t blocks = le32toh(sb.s_blocks_count);
Struct padding warning:
// This struct might have padding between fields!
struct bad_example {
uint8_t a; // 1 byte
// 3 bytes padding here on many systems
uint32_t b; // 4 bytes
};
// Use packed attribute to match on-disk layout
struct ext2_superblock {
// fields...
} __attribute__((packed));
Project Specification
Functional Requirements
- Open and validate disk images
- Accept file path as command-line argument
- Open in binary read mode
- Validate ext2 magic number (0xEF53)
- Handle errors gracefully (file not found, not ext2, etc.)
- Parse and display superblock
- Read superblock at offset 1024
- Calculate derived values (block size, group count)
- Display in human-readable format
- Navigate block groups
- Parse Block Group Descriptor Table
- Display information for each block group
- Show bitmap locations and inode table positions
- Read and decode inodes
- Accept inode number as input
- Calculate inode location (which group, which offset)
- Display all inode fields with interpretation
- Traverse directory contents
- Read directory inode
- Parse directory entries from data blocks
- Display entries with inode numbers and names
- Follow file paths
- Accept path string (e.g., โ/home/user/file.txtโ)
- Resolve path to inode number
- Display final inode information
Non-Functional Requirements
- Handle disk images from 1MB to 100GB
- Parse in reasonable time (< 1 second for path resolution)
- Provide meaningful error messages
- Support both ext2 and ext4 basic structures
Solution Architecture
Component Design
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ CLI Interface โ
โ (argument parsing, output formatting) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Filesystem Parser โ
โ (superblock, BGDT, inodes, directories) โ
โโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโค
โ Superblock โ Block Group โ Inode โ
โ Parser โ Navigator โ Reader โ
โโโโโโโโโโฌโโโโโโโโโดโโโโโโโโโโฌโโโโโโโโโโดโโโโโโโโโโโฌโโโโโโโโโโโโโ
โ โ โ
โผ โผ โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Block I/O Layer โ
โ (read blocks, handle offsets) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Disk Image File โ
โ (opened with fopen/read) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Data Structures
// Filesystem context (global state)
struct fs_context {
FILE *disk; // Open file handle
struct ext2_super_block sb; // Cached superblock
uint32_t block_size; // Calculated block size
uint32_t groups_count; // Number of block groups
struct ext2_group_desc *bgdt; // Block group descriptors
};
// Parsed inode with derived info
struct parsed_inode {
struct ext2_inode raw; // Raw inode data
uint32_t ino; // Inode number
char type_str[16]; // "file", "dir", "symlink"
char perm_str[11]; // "drwxr-xr-x"
char mtime_str[32]; // Human-readable timestamp
};
// Directory entry list
struct dir_entry_list {
uint32_t inode;
char name[256];
uint8_t file_type;
struct dir_entry_list *next;
};
Key Functions
// Initialization
int fs_open(struct fs_context *ctx, const char *path);
void fs_close(struct fs_context *ctx);
// Low-level I/O
int read_block(struct fs_context *ctx, uint32_t block_num, void *buf);
int read_bytes(struct fs_context *ctx, off_t offset, size_t len, void *buf);
// Superblock operations
int parse_superblock(struct fs_context *ctx);
void print_superblock(struct fs_context *ctx);
// Block group operations
int load_bgdt(struct fs_context *ctx);
void print_block_group(struct fs_context *ctx, int group_num);
// Inode operations
int read_inode(struct fs_context *ctx, uint32_t ino, struct ext2_inode *inode);
void print_inode(struct parsed_inode *pi);
int inode_get_block(struct fs_context *ctx, struct ext2_inode *inode,
uint32_t logical_block, uint32_t *physical_block);
// Directory operations
struct dir_entry_list* read_directory(struct fs_context *ctx,
struct ext2_inode *dir_inode);
void free_dir_list(struct dir_entry_list *list);
// Path resolution
int resolve_path(struct fs_context *ctx, const char *path,
struct ext2_inode *result);
Implementation Guide
Phase 1: Binary I/O Foundation (Days 1-2)
Goals:
- Create disk image I/O layer
- Read raw bytes at specific offsets
- Verify you can read the superblock magic number
Steps:
- Create a minimal program that opens a file in binary mode
- Seek to offset 1024 and read 2 bytes at offset 56 (magic number location)
- Verify you read 0x53EF (little-endian for 0xEF53)
Testing:
# Create test image
dd if=/dev/zero of=test.img bs=1M count=10
mkfs.ext2 test.img
# Verify magic number manually
xxd -s 1024 -l 64 test.img | head -5
Key learning: Binary file I/O with fopen("rb"), fseek(), fread()
Phase 2: Superblock Parser (Days 2-3)
Goals:
- Define complete superblock struct
- Parse all relevant fields
- Display human-readable output
Steps:
- Define
struct ext2_super_blockwith all fields (use ext2 documentation) - Read the struct from offset 1024
- Calculate derived values (block size = 1024 ยซย s_log_block_size)
- Print formatted output
Testing:
# Compare with dumpe2fs output
dumpe2fs test.img 2>/dev/null | head -30
./fsexplore test.img superblock
Key learning: Struct packing, byte order, derived calculations
Phase 3: Block Group Navigation (Days 3-4)
Goals:
- Parse Block Group Descriptor Table
- Locate inode tables and bitmaps
- Navigate to specific block groups
Steps:
- BGDT starts at block 1 (or 2 for 1KB blocks)
- Each descriptor is 32 bytes
- Parse inode table location, bitmap locations
- Calculate number of groups from superblock
Testing:
# Verify with dumpe2fs
dumpe2fs test.img 2>/dev/null | grep -A 10 "Group 0"
Key learning: Multi-level data structures, offset calculations
Phase 4: Inode Reader (Days 4-6)
Goals:
- Calculate inode location from inode number
- Read and parse inode structures
- Decode permissions and file types
Steps:
- Given inode N, find its block group:
(N - 1) / inodes_per_group - Find offset within group:
(N - 1) % inodes_per_group - Get inode table start from BGDT
- Read inode at calculated offset
Testing:
# Create files and check their inodes
sudo mount -o loop test.img /mnt
sudo touch /mnt/testfile
ls -i /mnt/testfile # Shows inode number
sudo umount /mnt
./fsexplore test.img inode <number>
Key learning: Inode addressing, metadata interpretation
Phase 5: Directory Traversal (Days 6-8)
Goals:
- Read directory data blocks
- Parse variable-length directory entries
- List directory contents
Steps:
- Read directory inode (inode 2 for root)
- Follow block pointers to read directory data
- Parse entries: inode, rec_len, name_len, name
- Handle variable-length entries correctly
Testing:
# Add files to test image
sudo mount -o loop test.img /mnt
sudo mkdir /mnt/subdir
sudo touch /mnt/file1.txt /mnt/file2.txt
sudo umount /mnt
./fsexplore test.img dir /
Key learning: Variable-length structures, linked traversal
Phase 6: Path Resolution (Days 8-10)
Goals:
- Parse path strings
- Traverse directory tree
- Find final inode for any path
Steps:
- Split path by โ/โ separator
- Start at inode 2 (root)
- For each component, read directory, find matching entry
- Use entryโs inode as new current inode
- Repeat until path exhausted
Testing:
sudo mount -o loop test.img /mnt
sudo mkdir -p /mnt/a/b/c
sudo touch /mnt/a/b/c/deep.txt
sudo umount /mnt
./fsexplore test.img path /a/b/c/deep.txt
Key learning: Recursive traversal, path parsing
Phase 7: Indirect Blocks (Days 10-14)
Goals:
- Handle files larger than 48KB
- Follow indirect, double-indirect, triple-indirect pointers
- Read any byte from any file
Steps:
- For logical block 0-11, use direct pointers
- For block 12-1035, read indirect block, use Nth pointer
- For blocks 1036+, read double-indirect, then indirect
- Implement
get_block_num(inode, logical_block)function
Testing:
sudo mount -o loop test.img /mnt
dd if=/dev/urandom of=/mnt/largefile bs=1M count=5
sudo umount /mnt
./fsexplore test.img file-blocks /largefile
Key learning: Multi-level indirection, block addressing
Testing Strategy
Unit Tests
// Test superblock validation
void test_validate_superblock() {
struct ext2_super_block sb;
sb.s_magic = 0xEF53;
assert(is_valid_ext2(&sb) == 1);
sb.s_magic = 0x0000;
assert(is_valid_ext2(&sb) == 0);
}
// Test block size calculation
void test_block_size() {
struct ext2_super_block sb;
sb.s_log_block_size = 0; // 1024 << 0 = 1024
assert(get_block_size(&sb) == 1024);
sb.s_log_block_size = 2; // 1024 << 2 = 4096
assert(get_block_size(&sb) == 4096);
}
// Test inode location calculation
void test_inode_location() {
// With 8192 inodes per group, 128 byte inodes:
// Inode 1 is at group 0, index 0
// Inode 8192 is at group 0, index 8191
// Inode 8193 is at group 1, index 0
assert(get_inode_group(1, 8192) == 0);
assert(get_inode_index(1, 8192) == 0);
assert(get_inode_group(8193, 8192) == 1);
assert(get_inode_index(8193, 8192) == 0);
}
Integration Tests
#!/bin/bash
# test_fsexplore.sh
# Create test image with known contents
dd if=/dev/zero of=test.img bs=1M count=10 2>/dev/null
mkfs.ext2 -F test.img 2>/dev/null
sudo mount -o loop test.img /mnt
sudo mkdir /mnt/testdir
echo "Hello World" | sudo tee /mnt/testdir/hello.txt > /dev/null
sudo umount /mnt
# Test superblock
./fsexplore test.img superblock | grep -q "Magic: 0xEF53"
[ $? -eq 0 ] && echo "PASS: Superblock magic" || echo "FAIL: Superblock magic"
# Test root directory
./fsexplore test.img dir / | grep -q "testdir"
[ $? -eq 0 ] && echo "PASS: Root directory" || echo "FAIL: Root directory"
# Test path resolution
./fsexplore test.img path /testdir/hello.txt | grep -q "Type: file"
[ $? -eq 0 ] && echo "PASS: Path resolution" || echo "FAIL: Path resolution"
# Cleanup
rm test.img
Comparison Testing
# Compare your output with standard tools
dumpe2fs test.img 2>/dev/null > expected.txt
./fsexplore test.img superblock > actual.txt
diff -u expected.txt actual.txt
Common Pitfalls and Debugging
Pitfall 1: Struct Padding
Problem: Your struct size doesnโt match on-disk size.
struct bad_inode {
uint16_t i_mode; // 2 bytes
uint16_t i_uid; // 2 bytes + 2 bytes padding
uint32_t i_size; // 4 bytes
}; // Size: 10 bytes, but ext2 expects 128!
Solution: Use packed attribute and verify size.
struct ext2_inode {
// fields...
} __attribute__((packed));
_Static_assert(sizeof(struct ext2_inode) == 128, "Inode size mismatch");
Pitfall 2: Off-by-One in Inode Numbering
Problem: Inodes are numbered from 1, not 0.
Solution:
// WRONG
uint32_t index = inode_num / inodes_per_group;
// CORRECT
uint32_t index = (inode_num - 1) / inodes_per_group;
Pitfall 3: Block 0 Special Case
Problem: Block pointers of 0 donโt mean block 0โthey mean โholeโ (sparse file).
Solution:
if (block_ptr == 0) {
// Sparse region - return zeros, don't try to read block 0
memset(buffer, 0, block_size);
}
Pitfall 4: Directory Entry Alignment
Problem: Directory entries have padding to 4-byte boundaries.
Solution:
// rec_len includes padding to next 4-byte boundary
// Use rec_len for iteration, not (8 + name_len)
entry = (struct ext2_dir_entry_2 *)((char *)entry + entry->rec_len);
Debugging Techniques
1. Hex dump verification:
# Read superblock bytes manually
xxd -s 1024 -l 256 disk.img
# Compare with your parsed values
./fsexplore disk.img superblock --raw
2. Known-good comparison:
# Use debugfs to explore
debugfs disk.img
debugfs: stat /path/to/file
debugfs: ls -l /
3. Minimal test cases:
# Create minimal filesystem
dd if=/dev/zero of=tiny.img bs=1K count=1024
mkfs.ext2 -F -b 1024 -N 16 tiny.img
Extensions and Challenges
Extension 1: ext4 Support
- Parse extent tree instead of indirect blocks
- Handle 48-bit block addresses
- Support extents feature flag
Extension 2: Deleted File Recovery
- Scan inode table for deleted inodes (dtime set)
- Follow block pointers to recover data
- Reconstruct filenames from orphaned directory entries
Extension 3: Visual Block Map
$ ./fsexplore disk.img blockmap
Block usage map (# = used, . = free):
0 1 2 3
0123456789012345678901234567890123456789
0: ########################################
1: ########################################
2: ###########.............................
3: ........................................
Extension 4: JSON Output
{
"superblock": {
"magic": "0xEF53",
"block_size": 4096,
"inodes_count": 128016
},
"groups": [
{"inode_bitmap": 3, "block_bitmap": 4, "inode_table": 5}
]
}
Challenge: Cross-Filesystem Support
Add support for reading FAT32 filesystems:
- Parse FAT boot sector
- Follow cluster chains
- Handle VFAT long filenames
Real-World Connections
Forensic Analysis
Forensic investigators use similar tools to:
- Recover deleted files (reading freed inodes)
- Analyze file timestamps (atime, mtime, ctime)
- Track file modifications through journaling
Data Recovery
Understanding filesystem internals enables:
- Recovering data from corrupted filesystems
- Rebuilding superblocks from backup copies
- Extracting files without mounting (damaged filesystems)
Operating System Development
OS developers need to:
- Implement filesystem drivers
- Optimize block allocation strategies
- Handle crash recovery
Interview Questions
Conceptual
- โWhat is an inode, and what information does it store?โ
- Expected: Metadata (permissions, timestamps, size, owner), block pointers, but NOT the filename
- โHow does ext2 resolve a file path like โ/home/user/file.txtโ?โ
- Expected: Start at root inode (2), read directory entries, find โhomeโ entryโs inode, repeat
- โWhy does ext2 use direct, indirect, double-indirect, and triple-indirect pointers?โ
- Expected: Balance between small file efficiency and large file support
- โWhat is the maximum file size in ext2 with 4096-byte blocks?โ
- Expected: ~4TB (show calculation)
Implementation
- โWhy is the superblock at offset 1024 instead of 0?โ
- Expected: First 1024 bytes reserved for boot sector/MBR
- โWhat makes โ.โ and โ..โ directory entries special?โ
- Expected:
.points to current directoryโs inode,..points to parent
- Expected:
- โHow would you implement โfsckโ (filesystem consistency check)?โ
- Expected: Verify bitmaps match actual allocation, check directory integrity
Self-Assessment Checklist
Before considering this project complete, verify you can:
- Explain what a superblock contains and why it exists
- Calculate an inodeโs disk location given its number
- Describe how directory entries link names to inodes
- Trace a path resolution from root to any file
- Explain why indirect blocks exist and how they work
- Use a hex editor to manually verify your parsing is correct
- Create test cases for edge cases (empty directories, sparse files)
- Handle errors gracefully (bad magic, corrupted data)
Resources
Documentation
- The Second Extended File System - Official ext2 specification
- OSDev Wiki - Ext2 - Implementation guide
Books
- โOperating Systems: Three Easy Piecesโ Ch. 40 - File System Implementation
- โThe Linux Programming Interfaceโ Ch. 14 - File Systems
- โC Programming: A Modern Approachโ Ch. 22 - Input/Output
Tools
dumpe2fs- Display ext2/ext3/ext4 filesystem informationdebugfs- Interactive ext2 filesystem debuggerxxd- Make hexdump of any file