Project 5: Filesystem Comparison Tool

Project Overview

Attribute        Details
---------------  -------------------------------------------------------------
Difficulty       Advanced
Time Estimate    2 weeks
Language         C
Knowledge Area   Performance Analysis / Filesystems
Main Book        "Systems Performance" by Brendan Gregg
Prerequisites    Project 1 or a strong understanding of filesystem structures

What You'll Build

A tool that mounts and analyzes multiple filesystem types (ext2/ext4, FAT32, your custom FS) and produces detailed comparison reports: performance metrics, space efficiency, and feature support.

Real World Output:

$ ./fscompare ext4.img fat32.img myfs.img

FILESYSTEM COMPARISON REPORT
============================

                    ext4        FAT32       MyFS
-------------------------------------------------
Max filename:       255         255         255
Max file size:      16TB        4GB         2GB
Journaling:         Yes         No          Yes
Permissions:        Unix        None        Unix
Case sensitive:     Yes         No          Yes

SPACE EFFICIENCY (1GB disk, 10000 small files):
Metadata overhead:  2.1%        0.8%        3.2%
Avg fragmentation:  1.2%        15.3%       4.1%
Space wasted:       12MB        156MB       31MB

PERFORMANCE (sequential 100MB write):
Write speed:        245 MB/s    198 MB/s    156 MB/s
Read speed:         312 MB/s    287 MB/s    201 MB/s

Learning Objectives

By completing this project, you will:

  1. Parse multiple filesystem formats (ext2/4, FAT32, custom)
  2. Measure real performance with proper benchmarking methodology
  3. Analyze space efficiency and fragmentation
  4. Understand filesystem design tradeoffs through direct comparison
  5. Build practical filesystem analysis tools
  6. Generate meaningful reports from raw filesystem data

The Core Question You're Answering

"Why do different filesystems exist? What fundamental tradeoffs force us to choose between FAT32, ext4, NTFS, and Btrfs?"

Most developers think filesystems are "just different versions of the same thing." They're not. Each represents a fundamentally different answer to the question: What do you optimize for?

  • FAT32 optimizes for: Simplicity and universal compatibility
  • ext4 optimizes for: Performance and features for Linux
  • NTFS optimizes for: Windows integration and security
  • Btrfs/ZFS optimize for: Data integrity and snapshots

This project forces you to understand these tradeoffs by measuring them directly.


Deep Theoretical Foundation

1. Why Different Filesystems Exist

Historical Context

Year   Filesystem   Why It Was Created
-----  -----------  ------------------------------------------
1977   FAT12        Simple filesystem for floppy disks
1993   ext2         Modern Unix filesystem for Linux
1993   NTFS         Windows NT needed security, journaling
1996   FAT32        Larger drives, still simple
2001   ext3         Added journaling to ext2
2005   ZFS          Data integrity, snapshots, pooling
2006   ext4         Extents, larger files, better performance
2007   Btrfs        Linux answer to ZFS features

Fundamental Tradeoffs

Tradeoff 1: Simplicity vs Features

FAT32                                   ext4
Simple ─────────────────────────────────── Complex

- No permissions                        - Full Unix permissions
- No journaling                         - Journaling
- 4GB file limit                        - 16TB files
- Universal compatibility               - Linux-specific

Tradeoff 2: Space Efficiency vs Performance

Small blocks (1KB)                      Large blocks (64KB)
Space efficient ──────────────────────── High throughput

- Less internal fragmentation           - More wasted space
- More metadata overhead                - Faster large file I/O
- Slower random I/O                     - Fewer seeks for big files

Tradeoff 3: Write Speed vs Durability

No journaling                           Full data journaling
Fast writes ────────────────────────── Guaranteed consistency

- Writes go directly to disk            - Writes go to journal first
- Crash can corrupt filesystem          - Crash recovery is fast
- Simple implementation                 - Complex, more I/O

2. ext4 Internal Structure

ext4 is the most common Linux filesystem. Key structures:

Superblock (always at byte offset 1024; that is block 0 or 1 depending on block size):

struct ext4_super_block {
    uint32_t s_inodes_count;      // Total inodes
    uint32_t s_blocks_count_lo;   // Total blocks (low 32 bits)
    uint32_t s_log_block_size;    // Block size = 1024 << this
    uint16_t s_magic;             // 0xEF53
    uint32_t s_feature_compat;    // Compatible features
    uint32_t s_feature_incompat;  // Incompatible features
    // ... many more
};

// Feature flags tell you what features are enabled
#define EXT4_FEATURE_INCOMPAT_EXTENTS    0x40
#define EXT4_FEATURE_COMPAT_HAS_JOURNAL  0x04

Block Groups: ext4 divides disk into block groups for locality:

[Group 0][Group 1][Group 2]...[Group N]

Each group:
[SB copy][BGDT][Block bitmap][Inode bitmap][Inode table][Data]
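
Each group is described by one entry in the block group descriptor table (BGDT). As a rough sketch of that entry (field names follow the kernel's struct ext4_group_desc; the 64-bit hi halves, flags, and checksums are omitted here):

#include <stdint.h>

// Simplified block group descriptor: one entry per group, stored in the BGDT.
// Real ext4 descriptors may be 64 bytes with additional *_hi and checksum fields.
struct ext4_group_desc {
    uint32_t bg_block_bitmap_lo;       // Block number of the block bitmap
    uint32_t bg_inode_bitmap_lo;       // Block number of the inode bitmap
    uint32_t bg_inode_table_lo;        // First block of the inode table
    uint16_t bg_free_blocks_count_lo;  // Free blocks in this group
    uint16_t bg_free_inodes_count_lo;  // Free inodes in this group
    uint16_t bg_used_dirs_count_lo;    // Directories in this group
    // ... flags, checksums, and 64-bit halves omitted
} __attribute__((packed));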

Extents (modern) vs Block Pointers (legacy):

Block pointers:           Extents:
inode.i_block[0] = 100    extent: start=100, length=50
inode.i_block[1] = 101    (one entry describes 50 blocks)
inode.i_block[2] = 102
... (50 entries)

Extents are much more efficient for contiguous files.
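
On disk each extent is a small fixed-size record. A sketch of the 12-byte layout, mirroring the kernel's struct ext4_extent:

#include <stdint.h>

// One 12-byte extent entry maps up to 32768 contiguous blocks.
struct ext4_extent {
    uint32_t ee_block;     // First logical block covered by this extent
    uint16_t ee_len;       // Block count (values above 32768 mark unwritten extents)
    uint16_t ee_start_hi;  // High 16 bits of the starting physical block
    uint32_t ee_start_lo;  // Low 32 bits of the starting physical block
} __attribute__((packed));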

3. FAT32 Internal Structure

FAT32 is elegant in its simplicity:

Boot Sector (sector 0):

struct fat32_boot_sector {
    uint8_t  jmp[3];           // Jump instruction
    uint8_t  oem[8];           // OEM name
    uint16_t bytes_per_sector; // Usually 512
    uint8_t  sectors_per_cluster;
    uint16_t reserved_sectors;
    uint8_t  num_fats;         // Usually 2
    uint32_t sectors_per_fat;
    uint32_t root_cluster;     // Root directory start
    // ...
};

File Allocation Table:

The FAT is just an array of 32-bit cluster numbers:

FAT[0] = reserved
FAT[1] = reserved
FAT[2] = 3         // Cluster 2 → cluster 3
FAT[3] = 4         // Cluster 3 → cluster 4
FAT[4] = 0x0FFFFFF8  // Cluster 4 is end of chain
FAT[5] = 0         // Free cluster
...

To read a file: follow the chain from starting cluster.
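
A minimal sketch of that chain walk, assuming the FAT has already been loaded into an in-memory fat_table[] array and read_cluster() is a helper you provide to read one cluster from the image:

// Read a whole file by following its FAT chain (sketch).
// fat_table[] and read_cluster() are assumed to exist elsewhere in your tool.
void read_file_clusters(uint32_t start_cluster, uint8_t *out, uint32_t cluster_size) {
    uint32_t cluster = start_cluster;

    // 0x0FFFFFF7 (bad cluster) and 0x0FFFFFF8..0x0FFFFFFF (end of chain) stop the walk
    while (cluster >= 2 && cluster < 0x0FFFFFF7) {
        read_cluster(cluster, out);                  // copy one cluster of data
        out += cluster_size;
        cluster = fat_table[cluster] & 0x0FFFFFFF;   // top 4 bits are reserved
    }
}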

Why FAT32 has 4GB file limit: Directory entries store file size in a 32-bit field:

struct fat32_dir_entry {
    char     name[11];        // 8.3 filename
    uint8_t  attr;
    uint16_t cluster_high;    // High 16 bits of start cluster
    uint16_t cluster_low;     // Low 16 bits
    uint32_t file_size;       // Max 4GB-1
};

4. Performance Measurement Fundamentals

What to Measure:

Metric       What It Means                   How to Measure
-----------  ------------------------------  ------------------------
Throughput   MB/s for large sequential I/O   Time large reads/writes
IOPS         Operations per second           Count small random I/O
Latency      Time for a single operation     High-resolution timer
CPU usage    Processing overhead             /proc/stat before/after
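
For the CPU-usage row, one rough approach is to sample the aggregate cpu line of /proc/stat before and after a benchmark and compare the deltas. A sketch (the fields are user, nice, system, idle, iowait, irq, softirq, steal):

#include <stdio.h>

// Sketch: total jiffies consumed so far, read from the first "cpu" line of
// /proc/stat. Sample before and after a benchmark; the busy fraction is
// 1 - (idle delta / total delta).
unsigned long long read_cpu_jiffies(unsigned long long *idle_out) {
    FILE *f = fopen("/proc/stat", "r");
    if (!f) return 0;

    unsigned long long user = 0, nice = 0, sys = 0, idle = 0,
                       iowait = 0, irq = 0, softirq = 0, steal = 0;
    fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
           &user, &nice, &sys, &idle, &iowait, &irq, &softirq, &steal);
    fclose(f);

    if (idle_out) *idle_out = idle + iowait;
    return user + nice + sys + idle + iowait + irq + softirq + steal;
}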

Avoiding Measurement Errors:

  1. Page Cache Lies: The kernel caches recently read data. Drop caches or use O_DIRECT.

     // Drop caches before reading
     system("echo 3 > /proc/sys/vm/drop_caches");

     // Or use O_DIRECT to bypass the page cache
     int fd = open(path, O_RDONLY | O_DIRECT);

  2. Warm-up Effects: The first operations are slower. Run warm-up iterations first.

  3. Statistical Significance: Run multiple iterations and report the median and standard deviation (see the sketch after this list).

  4. Background Noise: Minimize other system activity while benchmarking.
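
A small harness along these lines keeps the statistics honest (a sketch; run_once is whatever single-shot benchmark function you wire in):

#include <math.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

// Run a benchmark several times and report the median and standard deviation.
// run_once() returns one measurement (e.g. MB/s for a single pass).
void run_benchmark(double (*run_once)(void), int warmup, int runs,
                   double *median, double *stddev) {
    double samples[64];
    if (runs > 64) runs = 64;

    for (int i = 0; i < warmup; i++) run_once();            // discard warm-up passes
    for (int i = 0; i < runs; i++) samples[i] = run_once();

    qsort(samples, runs, sizeof(double), cmp_double);
    *median = samples[runs / 2];

    double mean = 0.0, var = 0.0;
    for (int i = 0; i < runs; i++) mean += samples[i] / runs;
    for (int i = 0; i < runs; i++) var += (samples[i] - mean) * (samples[i] - mean) / runs;
    *stddev = sqrt(var);
}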

5. Fragmentation Analysis

What is Fragmentation?

A file is fragmented when its data blocks are not contiguous on disk:

Contiguous file:                 Fragmented file:
Block: 100 101 102 103           Block: 100 200 150 300
──────────────►                  ─┬─►─┬─►─┬─►
(1 seek)                         (4 seeks)


Measuring Fragmentation:

For ext4 (with extents):

// Count extents per file
// 1 extent = perfect
// Many extents = fragmented

int count_extents(int fd) {
    struct fiemap fiemap = {0};
    fiemap.fm_length = ~0ULL;
    ioctl(fd, FS_IOC_FIEMAP, &fiemap);
    return fiemap.fm_mapped_extents;
}

For FAT32 (with FAT chains):

// Count cluster chain discontinuities
int count_fragments(uint32_t start_cluster) {
    int fragments = 1;
    uint32_t prev = start_cluster;
    uint32_t current = fat_table[start_cluster];

    while (current < 0x0FFFFFF8) {
        if (current != prev + 1) {
            fragments++;  // Discontinuity
        }
        prev = current;
        current = fat_table[current];
    }
    return fragments;
}

6. Space Efficiency Analysis

Metadata Overhead:

Total disk space = Metadata + Usable data space

ext4: Superblock, BGDT, bitmaps, inode tables (~2-5%)
FAT32: Boot sector, FAT tables (0.5-2%)

Internal Fragmentation:

A 1-byte file on 4KB block filesystem wastes 4095 bytes.

Average waste per file = block_size / 2

With 10000 files and 4KB blocks:
Expected waste = 10000 × 2KB = 20MB

Calculate efficiency:

struct space_analysis {
    uint64_t total_size;
    uint64_t metadata_size;
    uint64_t data_used;
    uint64_t internal_waste;
    double efficiency;  // data_used / (data_used + waste)
};
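
On a mounted filesystem you can estimate the internal waste per file directly from stat(): st_blocks reports what was actually allocated (in 512-byte units), and anything beyond st_size is slack. A sketch:

#include <stdint.h>
#include <sys/stat.h>

// Internal fragmentation of one file: allocated space minus logical size.
uint64_t internal_waste_bytes(const char *path) {
    struct stat st;
    if (stat(path, &st) < 0) return 0;

    uint64_t allocated = (uint64_t)st.st_blocks * 512;
    if (allocated <= (uint64_t)st.st_size) return 0;   // sparse file or exact fit
    return allocated - (uint64_t)st.st_size;
}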

Project Specification

Functional Requirements

  1. Parse filesystem images
    • Detect filesystem type from magic numbers
    • Parse ext2/ext4 superblock and structures
    • Parse FAT32 boot sector and FAT table
    • Support your custom filesystem format
  2. Feature comparison
    • Maximum file/filename size
    • Journaling support
    • Permission support
    • Case sensitivity
  3. Space efficiency analysis
    • Metadata overhead percentage
    • Internal fragmentation measurement
    • Free space tracking
  4. Fragmentation analysis
    • Per-file extent/fragment count
    • Average fragmentation score
    • Free space fragmentation
  5. Performance benchmarking
    • Sequential read/write throughput
    • Random I/O performance
    • Small file creation speed
  6. Report generation
    • Human-readable text output
    • Machine-readable JSON/CSV
    • Comparison tables

Non-Functional Requirements

  • Parse images up to 1TB
  • Complete analysis in < 5 minutes
  • Accurate timing (microsecond resolution)
  • Reproducible results

Solution Architecture

Component Design

┌──────────────────────────────────────────────────────────────────┐
│                          CLI Interface                           │
│                 (parse arguments, format output)                 │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                        Comparison Engine                         │
│             (coordinate analysis, aggregate results)             │
└────────┬───────────────────────┬───────────────────────┬─────────┘
         │                       │                       │
         ▼                       ▼                       ▼
 ┌───────────────┐       ┌───────────────┐       ┌───────────────┐
 │ Feature       │       │ Space         │       │ Performance   │
 │ Detector      │       │ Analyzer      │       │ Benchmarker   │
 └───────┬───────┘       └───────┬───────┘       └───────┬───────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                        Filesystem Parsers                        │
├─────────────────────┬─────────────────────┬──────────────────────┤
│     ext4 Parser     │    FAT32 Parser     │   Custom FS Parser   │
│    (superblock,     │    (boot sector,    │     (your format)    │
│     inodes,         │     FAT table,      │                      │
│     extents)        │     clusters)       │                      │
└─────────────────────┴─────────────────────┴──────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                        Disk Image File(s)                        │
└──────────────────────────────────────────────────────────────────┘

Data Structures

// Filesystem capabilities
struct fs_features {
    char name[32];
    bool has_journaling;
    bool has_permissions;
    bool case_sensitive;
    bool supports_sparse;
    bool supports_hardlinks;
    bool supports_symlinks;
    uint64_t max_file_size;
    uint32_t max_filename_len;
    uint32_t max_path_len;
};

// Space analysis results
struct space_metrics {
    uint64_t total_bytes;
    uint64_t used_bytes;
    uint64_t free_bytes;
    uint64_t metadata_bytes;
    uint64_t data_bytes;
    double metadata_overhead_pct;
    double space_efficiency_pct;
};

// Fragmentation results
struct frag_metrics {
    uint64_t total_files;
    uint64_t fragmented_files;
    double avg_extents_per_file;
    uint32_t max_extents;
    double fragmentation_score;
};

// Performance results
struct perf_metrics {
    double seq_write_mbps;
    double seq_read_mbps;
    double rand_write_iops;
    double rand_read_iops;
    double create_files_per_sec;
    double metadata_ops_per_sec;
};

// Complete analysis for one filesystem
struct fs_analysis {
    char image_path[256];
    char fs_type[32];
    struct fs_features features;
    struct space_metrics space;
    struct frag_metrics frag;
    struct perf_metrics perf;
};

Key Functions

// Filesystem detection
int detect_fs_type(const char *image_path, char *type_out);

// Parsers
int parse_ext4(const char *image, struct fs_analysis *result);
int parse_fat32(const char *image, struct fs_analysis *result);
int parse_myfs(const char *image, struct fs_analysis *result);

// Feature detection
int detect_features(const char *image, struct fs_features *features);

// Space analysis
int analyze_space(const char *image, struct space_metrics *metrics);
int measure_fragmentation(const char *image, struct frag_metrics *metrics);

// Performance benchmarks
int benchmark_sequential(const char *mount_point, struct perf_metrics *metrics);
int benchmark_random(const char *mount_point, struct perf_metrics *metrics);
int benchmark_metadata(const char *mount_point, struct perf_metrics *metrics);

// Report generation
void print_comparison_report(struct fs_analysis *analyses, int count);
void export_json(struct fs_analysis *analyses, int count, const char *path);
void export_csv(struct fs_analysis *analyses, int count, const char *path);

Implementation Guide

Phase 1: Filesystem Detection (Days 1-2)

Goals:

  • Detect filesystem type from magic numbers
  • Create unified parser interface
  • Handle unknown formats gracefully

Magic Numbers:

// ext2/3/4: 16-bit magic 0xEF53, stored little-endian at byte offset 1080
#define EXT4_MAGIC_OFFSET 1080
#define EXT4_MAGIC 0xEF53

// FAT32: bytes 82-89 = "FAT32   "
#define FAT32_SIG_OFFSET 82
#define FAT32_SIG "FAT32   "

// Your custom filesystem
#define MYFS_MAGIC 0xDEADBEEF

int detect_fs_type(const char *image_path, char *type_out) {
    int fd = open(image_path, O_RDONLY);
    if (fd < 0) {
        strcpy(type_out, "unknown");
        return -1;
    }

    // Check ext4
    uint16_t ext_magic;
    pread(fd, &ext_magic, 2, EXT4_MAGIC_OFFSET);
    if (le16toh(ext_magic) == EXT4_MAGIC) {
        strcpy(type_out, "ext4");
        close(fd);
        return 0;
    }

    // Check FAT32
    char fat_sig[9];
    pread(fd, fat_sig, 8, FAT32_SIG_OFFSET);
    fat_sig[8] = '\0';
    if (strcmp(fat_sig, FAT32_SIG) == 0) {
        strcpy(type_out, "fat32");
        close(fd);
        return 0;
    }

    // Check custom filesystem
    uint32_t myfs_magic;
    pread(fd, &myfs_magic, 4, 0);
    if (le32toh(myfs_magic) == MYFS_MAGIC) {
        strcpy(type_out, "myfs");
        close(fd);
        return 0;
    }

    close(fd);
    strcpy(type_out, "unknown");
    return -1;
}

Phase 2: Feature Detection (Days 2-4)

Goals:

  • Parse filesystem metadata
  • Extract feature flags
  • Determine limits (file size, filename length)

ext4 Features:

int parse_ext4_features(int fd, struct fs_features *f) {
    struct ext4_super_block sb;
    pread(fd, &sb, sizeof(sb), 1024);

    strcpy(f->name, "ext4");

    // Check feature flags
    f->has_journaling = (sb.s_feature_compat & EXT4_FEATURE_COMPAT_HAS_JOURNAL);
    f->has_permissions = true;  // Always for ext4
    f->case_sensitive = true;
    f->supports_sparse = true;
    f->supports_hardlinks = true;
    f->supports_symlinks = true;

    // With extents and the default 4KB block size, the limit is 16TiB
    f->max_file_size = 16ULL * 1024 * 1024 * 1024 * 1024;
    f->max_filename_len = 255;
    f->max_path_len = 4096;

    return 0;
}

FAT32 Features:

int parse_fat32_features(int fd, struct fs_features *f) {
    struct fat32_boot_sector bs;
    pread(fd, &bs, sizeof(bs), 0);

    strcpy(f->name, "fat32");
    f->has_journaling = false;
    f->has_permissions = false;
    f->case_sensitive = false;  // Case-preserving but insensitive
    f->supports_sparse = false;
    f->supports_hardlinks = false;
    f->supports_symlinks = false;
    f->max_file_size = 0xFFFFFFFFULL;  // 4GB - 1
    f->max_filename_len = 255;  // With VFAT
    f->max_path_len = 260;

    return 0;
}

Phase 3: Space Analysis (Days 4-6)

Goals:

  • Calculate metadata overhead
  • Measure used/free space
  • Analyze internal fragmentation

ext4 Space Analysis:

int analyze_ext4_space(int fd, struct space_metrics *m) {
    struct ext4_super_block sb;
    pread(fd, &sb, sizeof(sb), 1024);

    uint32_t block_size = 1024 << sb.s_log_block_size;
    uint64_t total_blocks = sb.s_blocks_count_lo;
    uint64_t free_blocks = sb.s_free_blocks_count_lo;

    m->total_bytes = total_blocks * block_size;
    m->free_bytes = free_blocks * block_size;
    m->used_bytes = m->total_bytes - m->free_bytes;

    // Metadata: superblock copies, BGDT, bitmaps, inode tables
    uint32_t groups = (total_blocks + sb.s_blocks_per_group - 1) /
                      sb.s_blocks_per_group;

    // Rough estimate per group: SB copy(1) + BGDT(1) + bitmaps(2) + inode table
    // (with sparse_super only some groups actually carry SB/BGDT copies)
    uint32_t inodes_per_group = sb.s_inodes_per_group;
    uint32_t inode_table_blocks = (inodes_per_group * sb.s_inode_size + block_size - 1)
                                  / block_size;

    m->metadata_bytes = groups * (4 + inode_table_blocks) * block_size;
    m->data_bytes = m->used_bytes - m->metadata_bytes;

    m->metadata_overhead_pct = 100.0 * m->metadata_bytes / m->total_bytes;
    m->space_efficiency_pct = 100.0 * m->data_bytes / m->used_bytes;

    return 0;
}
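
FAT32 Space Analysis:

The FAT32 side has no trustworthy free-block counter (the FSInfo sector only stores a hint), so a sketch of the equivalent analysis simply scans the FAT: a zero entry means the cluster is free, and the metadata region is the reserved sectors plus the FAT copies. This reuses the boot-sector fields from the theory section and treats everything before the data area as metadata:

int analyze_fat32_space(int fd, struct space_metrics *m) {
    struct fat32_boot_sector bs;
    pread(fd, &bs, sizeof(bs), 0);

    uint32_t cluster_bytes = (uint32_t)bs.bytes_per_sector * bs.sectors_per_cluster;
    uint64_t fat_offset    = (uint64_t)bs.reserved_sectors * bs.bytes_per_sector;
    uint64_t data_start    = fat_offset +
                             (uint64_t)bs.num_fats * bs.sectors_per_fat * bs.bytes_per_sector;

    uint64_t image_size     = lseek(fd, 0, SEEK_END);
    uint64_t total_clusters = (image_size - data_start) / cluster_bytes;

    // Scan the first FAT copy: an entry of 0 marks a free cluster
    uint64_t free_clusters = 0;
    for (uint64_t i = 2; i < total_clusters + 2; i++) {
        uint32_t entry;
        pread(fd, &entry, 4, fat_offset + i * 4);
        if ((le32toh(entry) & 0x0FFFFFFF) == 0) free_clusters++;
    }

    m->total_bytes    = total_clusters * cluster_bytes;
    m->free_bytes     = free_clusters * cluster_bytes;
    m->used_bytes     = m->total_bytes - m->free_bytes;
    m->metadata_bytes = data_start;               // reserved area + both FAT copies
    m->data_bytes     = m->used_bytes;            // FAT32 keeps no metadata in the data area
    m->metadata_overhead_pct = 100.0 * m->metadata_bytes / image_size;
    m->space_efficiency_pct  = m->used_bytes ? 100.0 * m->data_bytes / m->used_bytes : 100.0;

    return 0;
}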

Phase 4: Fragmentation Analysis (Days 6-8)

Goals:

  • Count extents/fragments per file
  • Calculate fragmentation scores
  • Identify heavily fragmented files

Using FIEMAP for ext4:

#include <linux/fiemap.h>
#include <linux/fs.h>
#include <fts.h>

int count_file_extents(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct {
        struct fiemap fm;
        struct fiemap_extent extents[1024];
    } buf;

    memset(&buf, 0, sizeof(buf));
    buf.fm.fm_length = ~0ULL;
    buf.fm.fm_extent_count = 1024;

    if (ioctl(fd, FS_IOC_FIEMAP, &buf.fm) < 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return buf.fm.fm_mapped_extents;
}

int analyze_fragmentation(const char *mount_point, struct frag_metrics *m) {
    m->total_files = 0;
    m->fragmented_files = 0;
    m->max_extents = 0;
    uint64_t total_extents = 0;

    // Walk directory tree
    FTS *fts = fts_open((char*[]){(char*)mount_point, NULL}, FTS_PHYSICAL, NULL);
    if (fts == NULL) return -1;
    FTSENT *ent;

    while ((ent = fts_read(fts)) != NULL) {
        if (ent->fts_info == FTS_F) {  // Regular file
            int extents = count_file_extents(ent->fts_accpath);
            if (extents > 0) {
                m->total_files++;
                total_extents += extents;

                if (extents > 1) {
                    m->fragmented_files++;
                }
                if (extents > m->max_extents) {
                    m->max_extents = extents;
                }
            }
        }
    }

    fts_close(fts);

    if (m->total_files == 0) return -1;  // avoid dividing by zero on an empty tree
    m->avg_extents_per_file = (double)total_extents / m->total_files;
    m->fragmentation_score = 100.0 * m->fragmented_files / m->total_files;

    return 0;
}

Phase 5: Performance Benchmarking (Days 8-11)

Goals:

  • Measure sequential I/O throughput
  • Measure random I/O performance
  • Measure metadata operation speed

Sequential Benchmark:

#include <time.h>

int benchmark_sequential_write(const char *mount_point, double *mbps) {
    char path[512];
    snprintf(path, sizeof(path), "%s/bench_seq_write", mount_point);

    // Use O_DIRECT to bypass cache
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        // Fallback without O_DIRECT (some filesystems reject it)
        fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0) return -1;
    }

    size_t block_size = 1024 * 1024;  // 1MB blocks
    size_t total_size = 100 * block_size;  // 100MB total
    void *buffer;
    posix_memalign(&buffer, 4096, block_size);
    memset(buffer, 'X', block_size);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (size_t written = 0; written < total_size; written += block_size) {
        if (write(fd, buffer, block_size) != block_size) {
            free(buffer);
            close(fd);
            unlink(path);
            return -1;
        }
    }
    fsync(fd);  // Ensure data is on disk

    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) +
                    (end.tv_nsec - start.tv_nsec) / 1e9;

    *mbps = (total_size / (1024.0 * 1024.0)) / elapsed;

    free(buffer);
    close(fd);
    unlink(path);

    return 0;
}

int benchmark_random_read(const char *mount_point, double *iops) {
    char path[512];
    snprintf(path, sizeof(path), "%s/bench_rand_read", mount_point);

    // Create the test file and fill it with real data: ftruncate() alone
    // leaves a sparse file whose reads hit holes instead of the disk
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;

    size_t file_size = 100 * 1024 * 1024;  // 100MB
    char buffer[4096];
    memset(buffer, 'X', sizeof(buffer));
    for (size_t off = 0; off < file_size; off += sizeof(buffer)) {
        write(fd, buffer, sizeof(buffer));
    }
    fsync(fd);

    int num_ops = 10000;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < num_ops; i++) {
        off_t offset = (rand() % (file_size / 4096)) * 4096;
        pread(fd, buffer, 4096, offset);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) +
                    (end.tv_nsec - start.tv_nsec) / 1e9;

    *iops = num_ops / elapsed;

    close(fd);
    unlink(path);

    return 0;
}
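
Small File Creation Benchmark:

The create_files_per_sec field still needs a metadata benchmark. A minimal sketch (the name benchmark_file_creation is illustrative; you might fold this into benchmark_metadata()) that times creating many tiny files and cleans up afterwards:

int benchmark_file_creation(const char *mount_point, double *files_per_sec) {
    char path[512];
    int num_files = 1000;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < num_files; i++) {
        snprintf(path, sizeof(path), "%s/bench_create_%d", mount_point, i);
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0) return -1;
        write(fd, "x", 1);     // one byte so each file actually allocates data
        close(fd);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) +
                     (end.tv_nsec - start.tv_nsec) / 1e9;
    *files_per_sec = num_files / elapsed;

    // Clean up outside the timed region
    for (int i = 0; i < num_files; i++) {
        snprintf(path, sizeof(path), "%s/bench_create_%d", mount_point, i);
        unlink(path);
    }
    return 0;
}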

Phase 6: Report Generation (Days 11-14)

Goals:

  • Format comparison tables
  • Generate multiple output formats
  • Highlight significant differences

Comparison Table:

void print_comparison_table(struct fs_analysis *a, int count) {
    // Header
    printf("\nFILESYSTEM COMPARISON REPORT\n");
    printf("============================\n\n");

    // Feature comparison
    printf("%-20s", "Feature");
    for (int i = 0; i < count; i++) {
        printf("%-15s", a[i].features.name);
    }
    printf("\n");
    printf("%.60s\n", "------------------------------------------------------------");

    // Max file size
    printf("%-20s", "Max file size");
    for (int i = 0; i < count; i++) {
        char buf[32];
        format_size(a[i].features.max_file_size, buf);
        printf("%-15s", buf);
    }
    printf("\n");

    // Journaling
    printf("%-20s", "Journaling");
    for (int i = 0; i < count; i++) {
        printf("%-15s", a[i].features.has_journaling ? "Yes" : "No");
    }
    printf("\n");

    // ... more features ...

    // Space efficiency
    printf("\nSPACE EFFICIENCY\n");
    printf("%.60s\n", "------------------------------------------------------------");

    printf("%-20s", "Metadata overhead");
    for (int i = 0; i < count; i++) {
        printf("%-15.1f%%", a[i].space.metadata_overhead_pct);
    }
    printf("\n");

    printf("%-20s", "Avg fragmentation");
    for (int i = 0; i < count; i++) {
        printf("%-15.1f%%", a[i].frag.fragmentation_score);
    }
    printf("\n");

    // Performance
    printf("\nPERFORMANCE\n");
    printf("%.60s\n", "------------------------------------------------------------");

    printf("%-20s", "Seq write (MB/s)");
    for (int i = 0; i < count; i++) {
        printf("%-15.1f", a[i].perf.seq_write_mbps);
    }
    printf("\n");

    printf("%-20s", "Seq read (MB/s)");
    for (int i = 0; i < count; i++) {
        printf("%-15.1f", a[i].perf.seq_read_mbps);
    }
    printf("\n");
}

void export_json(struct fs_analysis *a, int count, const char *path) {
    FILE *f = fopen(path, "w");
    if (!f) return;

    fprintf(f, "{\n  \"filesystems\": [\n");
    for (int i = 0; i < count; i++) {
        fprintf(f, "    {\n");
        fprintf(f, "      \"name\": \"%s\",\n", a[i].fs_type);
        fprintf(f, "      \"features\": {\n");
        fprintf(f, "        \"journaling\": %s,\n",
                a[i].features.has_journaling ? "true" : "false");
        fprintf(f, "        \"max_file_size\": %llu\n",
                (unsigned long long)a[i].features.max_file_size);
        fprintf(f, "      },\n");
        fprintf(f, "      \"performance\": {\n");
        fprintf(f, "        \"seq_write_mbps\": %.2f,\n", a[i].perf.seq_write_mbps);
        fprintf(f, "        \"seq_read_mbps\": %.2f\n", a[i].perf.seq_read_mbps);
        fprintf(f, "      }\n");
        fprintf(f, "    }%s\n", i < count - 1 ? "," : "");
    }
    fprintf(f, "  ]\n}\n");

    fclose(f);
}
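
CSV Export:

export_csv() can follow the same pattern as the JSON exporter. A minimal sketch with one header row and one row per analyzed filesystem (the column selection here is illustrative):

void export_csv(struct fs_analysis *a, int count, const char *path) {
    FILE *f = fopen(path, "w");
    if (!f) return;

    // Header row, then one row per filesystem
    fprintf(f, "fs_type,journaling,max_file_size,"
               "metadata_overhead_pct,fragmentation_score,"
               "seq_write_mbps,seq_read_mbps\n");

    for (int i = 0; i < count; i++) {
        fprintf(f, "%s,%d,%llu,%.2f,%.2f,%.2f,%.2f\n",
                a[i].fs_type,
                a[i].features.has_journaling ? 1 : 0,
                (unsigned long long)a[i].features.max_file_size,
                a[i].space.metadata_overhead_pct,
                a[i].frag.fragmentation_score,
                a[i].perf.seq_write_mbps,
                a[i].perf.seq_read_mbps);
    }

    fclose(f);
}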

Testing Strategy

Unit Tests

// Test filesystem detection
void test_detect_ext4() {
    // Create minimal ext4 image
    system("dd if=/dev/zero of=/tmp/test.img bs=1M count=10 2>/dev/null");
    system("mkfs.ext4 -F /tmp/test.img >/dev/null 2>&1");

    char type[32];
    int ret = detect_fs_type("/tmp/test.img", type);

    assert(ret == 0);
    assert(strcmp(type, "ext4") == 0);

    unlink("/tmp/test.img");
}

// Test space calculation
void test_space_analysis() {
    // Create image with known contents
    // Verify calculations match expected values
}

// Test fragmentation counting
void test_fragmentation() {
    // Create file with known number of extents
    // Verify count matches
}

Integration Tests

#!/bin/bash
# Test full comparison workflow

# Create test images
dd if=/dev/zero of=ext4.img bs=1M count=100 2>/dev/null
dd if=/dev/zero of=fat32.img bs=1M count=100 2>/dev/null

mkfs.ext4 -F ext4.img >/dev/null 2>&1
mkfs.vfat -F 32 fat32.img >/dev/null 2>&1

# Mount and add test files
mkdir -p /tmp/ext4_mnt /tmp/fat32_mnt
sudo mount -o loop ext4.img /tmp/ext4_mnt
sudo mount -o loop fat32.img /tmp/fat32_mnt

for i in $(seq 1 100); do
    sudo dd if=/dev/urandom of=/tmp/ext4_mnt/file_$i bs=1K count=10 2>/dev/null
    sudo dd if=/dev/urandom of=/tmp/fat32_mnt/file_$i bs=1K count=10 2>/dev/null
done

sudo umount /tmp/ext4_mnt
sudo umount /tmp/fat32_mnt

# Run comparison
./fscompare ext4.img fat32.img > report.txt

# Verify report contains expected sections
grep -q "FILESYSTEM COMPARISON" report.txt && echo "PASS: Header" || echo "FAIL"
grep -q "ext4" report.txt && echo "PASS: ext4 detected" || echo "FAIL"
grep -q "fat32" report.txt && echo "PASS: FAT32 detected" || echo "FAIL"
grep -q "Journaling.*Yes.*No" report.txt && echo "PASS: Features" || echo "FAIL"

rm ext4.img fat32.img report.txt

Common Pitfalls and Debugging

Pitfall 1: Page Cache Invalidation

Problem: Benchmarks show impossibly high speeds (reading from cache).

Solution:

// Drop caches before benchmarks
system("sync; echo 3 > /proc/sys/vm/drop_caches");

// Or use O_DIRECT
int fd = open(path, O_RDONLY | O_DIRECT);

Pitfall 2: Incorrect Struct Packing

Problem: Parsed values are garbage.

Solution:

// Always use packed structs for on-disk formats
struct ext4_super_block {
    // ...
} __attribute__((packed));

// Verify size matches spec
_Static_assert(sizeof(struct ext4_super_block) == 1024, "size mismatch");

Pitfall 3: Endianness

Problem: Values look wrong on big-endian systems.

Solution:

#include <endian.h>

// ext4 and FAT32 are little-endian
uint32_t block_count = le32toh(sb.s_blocks_count_lo);

Extensions and Challenges

Extension 1: Graphical Reports

Generate HTML with charts:

<canvas id="perfChart"></canvas>
<script>
var ctx = document.getElementById('perfChart');
new Chart(ctx, {
    type: 'bar',
    data: { /* from JSON export */ }
});
</script>

Extension 2: NTFS Support

Parse NTFS structures:

  • $MFT (Master File Table)
  • Attribute headers
  • Data runs

Extension 3: Live Comparison

Compare mounted filesystems in real-time:

./fscompare --live /mnt/ext4 /mnt/fat32 --interval 5

Interview Questions

  1. "Why does FAT32 have a 4GB file size limit?"
    • Expected: Directory entry uses 32-bit file size field
  2. "How would you measure filesystem fragmentation?"
    • Expected: Count extents per file (FIEMAP) or follow allocation chains
  3. "What makes benchmarking filesystems difficult?"
    • Expected: Page cache, warm-up effects, background activity, I/O scheduler
  4. "Why is ext4 faster than FAT32 for small files?"
    • Expected: Extent-based allocation, better directory indexing, locality

Self-Assessment Checklist

  • Correctly detect ext4, FAT32, and custom filesystem types
  • Parse superblock/boot sector data accurately
  • Calculate metadata overhead correctly
  • Measure fragmentation using FIEMAP or equivalent
  • Benchmark with proper cache handling
  • Generate readable comparison reports
  • Export data in JSON/CSV formats

Resources

Books

  • "Systems Performance" by Brendan Gregg - Ch. 8 (Filesystems)
  • "Operating Systems: Three Easy Pieces" - Ch. 39-42

Online