Project 5: Filesystem Comparison Tool
Project 5: Filesystem Comparison Tool
Project Overview
| Attribute | Details |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2 weeks |
| Language | C |
| Knowledge Area | Performance Analysis / Filesystems |
| Main Book | โSystems Performanceโ by Brendan Gregg |
| Prerequisites | Project 1 or strong understanding of filesystem structures |
What Youโll Build
A tool that mounts and analyzes multiple filesystem types (ext2/ext4, FAT32, your custom FS) and produces detailed comparison reportsโperformance metrics, space efficiency, feature support.
Real World Output:
$ ./fscompare ext4.img fat32.img myfs.img
FILESYSTEM COMPARISON REPORT
============================
ext4 FAT32 MyFS
-------------------------------------------------
Max filename: 255 255 255
Max file size: 16TB 4GB 2GB
Journaling: Yes No Yes
Permissions: Unix None Unix
Case sensitive: Yes No Yes
SPACE EFFICIENCY (1GB disk, 10000 small files):
Metadata overhead: 2.1% 0.8% 3.2%
Avg fragmentation: 1.2% 15.3% 4.1%
Space wasted: 12MB 156MB 31MB
PERFORMANCE (sequential 100MB write):
Write speed: 245 MB/s 198 MB/s 156 MB/s
Read speed: 312 MB/s 287 MB/s 201 MB/s
Learning Objectives
By completing this project, you will:
- Parse multiple filesystem formats (ext2/4, FAT32, custom)
- Measure real performance with proper benchmarking methodology
- Analyze space efficiency and fragmentation
- Understand filesystem design tradeoffs through direct comparison
- Build practical filesystem analysis tools
- Generate meaningful reports from raw filesystem data
The Core Question Youโre Answering
โWhy do different filesystems exist? What fundamental tradeoffs force us to choose between FAT32, ext4, NTFS, and Btrfs?โ
Most developers think filesystems are โjust different versions of the same thing.โ Theyโre not. Each represents a fundamentally different answer to the question: What do you optimize for?
- FAT32 optimizes for: Simplicity and universal compatibility
- ext4 optimizes for: Performance and features for Linux
- NTFS optimizes for: Windows integration and security
- Btrfs/ZFS optimize for: Data integrity and snapshots
This project forces you to understand these tradeoffs by measuring them directly.
Deep Theoretical Foundation
1. Why Different Filesystems Exist
Historical Context
| Year | Filesystem | Why It Was Created |
|---|---|---|
| 1977 | FAT12 | Simple filesystem for floppy disks |
| 1996 | FAT32 | Larger drives, still simple |
| 1993 | ext2 | Modern Unix filesystem for Linux |
| 1993 | NTFS | Windows NT needed security, journaling |
| 2001 | ext3 | Added journaling to ext2 |
| 2006 | ext4 | Extents, larger files, better performance |
| 2005 | ZFS | Data integrity, snapshots, pooling |
| 2007 | Btrfs | Linux answer to ZFS features |
Fundamental Tradeoffs
Tradeoff 1: Simplicity vs Features
FAT32 ext4
Simple โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ Complex
- No permissions - Full Unix permissions
- No journaling - Journaling
- 4GB file limit - 16TB files
- Universal compatibility - Linux-specific
Tradeoff 2: Space Efficiency vs Performance
Small blocks (1KB) Large blocks (64KB)
Space efficient โโโโโโโโโโโโโโโโโโโโโโ High throughput
- Less internal fragmentation - More wasted space
- More metadata overhead - Faster large file I/O
- Slower random I/O - Fewer seeks for big files
Tradeoff 3: Write Speed vs Durability
No journaling Full data journaling
Fast writes โโโโโโโโโโโโโโโโโโโโโโโโโโ Guaranteed consistency
- Writes go directly to disk - Writes go to journal first
- Crash can corrupt filesystem - Crash recovery is fast
- Simple implementation - Complex, more I/O
2. ext4 Internal Structure
ext4 is the most common Linux filesystem. Key structures:
Superblock (block 0 or 1):
struct ext4_super_block {
uint32_t s_inodes_count; // Total inodes
uint32_t s_blocks_count_lo; // Total blocks (low 32 bits)
uint32_t s_log_block_size; // Block size = 1024 << this
uint16_t s_magic; // 0xEF53
uint32_t s_feature_compat; // Compatible features
uint32_t s_feature_incompat; // Incompatible features
// ... many more
};
// Feature flags tell you what features are enabled
#define EXT4_FEATURE_INCOMPAT_EXTENTS 0x40
#define EXT4_FEATURE_COMPAT_HAS_JOURNAL 0x04
Block Groups: ext4 divides disk into block groups for locality:
[Group 0][Group 1][Group 2]...[Group N]
Each group:
[SB copy][BGDT][Block bitmap][Inode bitmap][Inode table][Data]
Extents (modern) vs Block Pointers (legacy):
Block pointers: Extents:
inode.i_block[0] = 100 extent: start=100, length=50
inode.i_block[1] = 101 (one entry describes 50 blocks)
inode.i_block[2] = 102
... (50 entries)
Extents are much more efficient for contiguous files.
3. FAT32 Internal Structure
FAT32 is elegant in its simplicity:
Boot Sector (sector 0):
struct fat32_boot_sector {
uint8_t jmp[3]; // Jump instruction
uint8_t oem[8]; // OEM name
uint16_t bytes_per_sector; // Usually 512
uint8_t sectors_per_cluster;
uint16_t reserved_sectors;
uint8_t num_fats; // Usually 2
uint32_t sectors_per_fat;
uint32_t root_cluster; // Root directory start
// ...
};
File Allocation Table:
The FAT is just an array of 32-bit cluster numbers:
FAT[0] = reserved
FAT[1] = reserved
FAT[2] = 3 // Cluster 2 โ cluster 3
FAT[3] = 4 // Cluster 3 โ cluster 4
FAT[4] = 0x0FFFFFF8 // Cluster 4 is end of chain
FAT[5] = 0 // Free cluster
...
To read a file: follow the chain from starting cluster.
Why FAT32 has 4GB file limit: Directory entries store file size in a 32-bit field:
struct fat32_dir_entry {
char name[11]; // 8.3 filename
uint8_t attr;
uint16_t cluster_high; // High 16 bits of start cluster
uint16_t cluster_low; // Low 16 bits
uint32_t file_size; // Max 4GB-1
};
4. Performance Measurement Fundamentals
What to Measure:
| Metric | What It Means | How to Measure |
|---|---|---|
| Throughput | MB/s for large sequential I/O | Time large reads/writes |
| IOPS | Operations per second | Count small random I/O |
| Latency | Time for single operation | High-resolution timer |
| CPU Usage | Processing overhead | /proc/stat before/after |
Avoiding Measurement Errors:
- Page Cache Lies: The kernel caches recently read data. Drop caches or use O_DIRECT. ```c // Drop caches before reading system(โecho 3 > /proc/sys/vm/drop_cachesโ);
// Or use O_DIRECT to bypass cache int fd = open(path, O_RDONLY | O_DIRECT);
2. **Warm-up Effects:** First operations are slower. Run warmup iterations.
3. **Statistical Significance:** Run multiple iterations, report median and stddev.
4. **Background Noise:** Minimize other system activity.
### 5. Fragmentation Analysis
**What is Fragmentation?**
A file is fragmented when its data blocks are not contiguous on disk:
Contiguous file: Fragmented file: Block: 100 101 102 103 Block: 100 200 150 300 โโโโโโโโโโโโโโบ โโฌโโบโโฌโโบโโฌโโบ (1 seek) (4 seeks)
**Measuring Fragmentation:**
For ext4 (with extents):
```c
// Count extents per file
// 1 extent = perfect
// Many extents = fragmented
int count_extents(int fd) {
struct fiemap fiemap = {0};
fiemap.fm_length = ~0ULL;
ioctl(fd, FS_IOC_FIEMAP, &fiemap);
return fiemap.fm_mapped_extents;
}
For FAT32 (with FAT chains):
// Count cluster chain discontinuities
int count_fragments(uint32_t start_cluster) {
int fragments = 1;
uint32_t prev = start_cluster;
uint32_t current = fat_table[start_cluster];
while (current < 0x0FFFFFF8) {
if (current != prev + 1) {
fragments++; // Discontinuity
}
prev = current;
current = fat_table[current];
}
return fragments;
}
6. Space Efficiency Analysis
Metadata Overhead:
Total disk space = Metadata + Usable data space
ext4: Superblock, BGDT, bitmaps, inode tables (~2-5%)
FAT32: Boot sector, FAT tables (0.5-2%)
Internal Fragmentation:
A 1-byte file on 4KB block filesystem wastes 4095 bytes.
Average waste per file = block_size / 2
With 10000 files and 4KB blocks:
Expected waste = 10000 ร 2KB = 20MB
Calculate efficiency:
struct space_analysis {
uint64_t total_size;
uint64_t metadata_size;
uint64_t data_used;
uint64_t internal_waste;
double efficiency; // data_used / (data_used + waste)
};
Project Specification
Functional Requirements
- Parse filesystem images
- Detect filesystem type from magic numbers
- Parse ext2/ext4 superblock and structures
- Parse FAT32 boot sector and FAT table
- Support your custom filesystem format
- Feature comparison
- Maximum file/filename size
- Journaling support
- Permission support
- Case sensitivity
- Space efficiency analysis
- Metadata overhead percentage
- Internal fragmentation measurement
- Free space tracking
- Fragmentation analysis
- Per-file extent/fragment count
- Average fragmentation score
- Free space fragmentation
- Performance benchmarking
- Sequential read/write throughput
- Random I/O performance
- Small file creation speed
- Report generation
- Human-readable text output
- Machine-readable JSON/CSV
- Comparison tables
Non-Functional Requirements
- Parse images up to 1TB
- Complete analysis in < 5 minutes
- Accurate timing (microsecond resolution)
- Reproducible results
Solution Architecture
Component Design
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ CLI Interface โ
โ (parse arguments, format output) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Comparison Engine โ
โ (coordinate analysis, aggregate results) โ
โโโโโโโโโฌโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโ
โ โ โ
โผ โผ โผ
โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ
โ Feature โ โ Space โ โ Performance โ
โ Detector โ โ Analyzer โ โ Benchmarker โ
โโโโโโโโโฌโโโโโโโโ โโโโโโโโโฌโโโโโโโโ โโโโโโโโโฌโโโโโโโโ
โ โ โ
โโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Filesystem Parsers โ
โโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ ext4 Parser โ FAT32 Parser โ Custom FS Parser โ
โ (superblock, โ (boot sector, โ (your format) โ
โ inodes, โ FAT table, โ โ
โ extents) โ clusters) โ โ
โโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Disk Image File(s) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Data Structures
// Filesystem capabilities
struct fs_features {
char name[32];
bool has_journaling;
bool has_permissions;
bool case_sensitive;
bool supports_sparse;
bool supports_hardlinks;
bool supports_symlinks;
uint64_t max_file_size;
uint32_t max_filename_len;
uint32_t max_path_len;
};
// Space analysis results
struct space_metrics {
uint64_t total_bytes;
uint64_t used_bytes;
uint64_t free_bytes;
uint64_t metadata_bytes;
uint64_t data_bytes;
double metadata_overhead_pct;
double space_efficiency_pct;
};
// Fragmentation results
struct frag_metrics {
uint64_t total_files;
uint64_t fragmented_files;
double avg_extents_per_file;
uint32_t max_extents;
double fragmentation_score;
};
// Performance results
struct perf_metrics {
double seq_write_mbps;
double seq_read_mbps;
double rand_write_iops;
double rand_read_iops;
double create_files_per_sec;
double metadata_ops_per_sec;
};
// Complete analysis for one filesystem
struct fs_analysis {
char image_path[256];
char fs_type[32];
struct fs_features features;
struct space_metrics space;
struct frag_metrics frag;
struct perf_metrics perf;
};
Key Functions
// Filesystem detection
int detect_fs_type(const char *image_path, char *type_out);
// Parsers
int parse_ext4(const char *image, struct fs_analysis *result);
int parse_fat32(const char *image, struct fs_analysis *result);
int parse_myfs(const char *image, struct fs_analysis *result);
// Feature detection
int detect_features(const char *image, struct fs_features *features);
// Space analysis
int analyze_space(const char *image, struct space_metrics *metrics);
int measure_fragmentation(const char *image, struct frag_metrics *metrics);
// Performance benchmarks
int benchmark_sequential(const char *mount_point, struct perf_metrics *metrics);
int benchmark_random(const char *mount_point, struct perf_metrics *metrics);
int benchmark_metadata(const char *mount_point, struct perf_metrics *metrics);
// Report generation
void print_comparison_report(struct fs_analysis *analyses, int count);
void export_json(struct fs_analysis *analyses, int count, const char *path);
void export_csv(struct fs_analysis *analyses, int count, const char *path);
Implementation Guide
Phase 1: Filesystem Detection (Days 1-2)
Goals:
- Detect filesystem type from magic numbers
- Create unified parser interface
- Handle unknown formats gracefully
Magic Numbers:
// ext2/3/4: bytes 1080-1081 = 0x53EF (little-endian)
#define EXT4_MAGIC_OFFSET 1080
#define EXT4_MAGIC 0xEF53
// FAT32: bytes 82-89 = "FAT32 "
#define FAT32_SIG_OFFSET 82
#define FAT32_SIG "FAT32 "
// Your custom filesystem
#define MYFS_MAGIC 0xDEADBEEF
int detect_fs_type(const char *image_path, char *type_out) {
int fd = open(image_path, O_RDONLY);
// Check ext4
uint16_t ext_magic;
pread(fd, &ext_magic, 2, EXT4_MAGIC_OFFSET);
if (le16toh(ext_magic) == EXT4_MAGIC) {
strcpy(type_out, "ext4");
close(fd);
return 0;
}
// Check FAT32
char fat_sig[9];
pread(fd, fat_sig, 8, FAT32_SIG_OFFSET);
fat_sig[8] = '\0';
if (strcmp(fat_sig, FAT32_SIG) == 0) {
strcpy(type_out, "fat32");
close(fd);
return 0;
}
// Check custom filesystem
uint32_t myfs_magic;
pread(fd, &myfs_magic, 4, 0);
if (le32toh(myfs_magic) == MYFS_MAGIC) {
strcpy(type_out, "myfs");
close(fd);
return 0;
}
close(fd);
strcpy(type_out, "unknown");
return -1;
}
Phase 2: Feature Detection (Days 2-4)
Goals:
- Parse filesystem metadata
- Extract feature flags
- Determine limits (file size, filename length)
ext4 Features:
int parse_ext4_features(int fd, struct fs_features *f) {
struct ext4_super_block sb;
pread(fd, &sb, sizeof(sb), 1024);
strcpy(f->name, "ext4");
// Check feature flags
f->has_journaling = (sb.s_feature_compat & EXT4_FEATURE_COMPAT_HAS_JOURNAL);
f->has_permissions = true; // Always for ext4
f->case_sensitive = true;
f->supports_sparse = true;
f->supports_hardlinks = true;
f->supports_symlinks = true;
// Calculate max file size from block size
uint32_t block_size = 1024 << sb.s_log_block_size;
// With extents, max is 16TB
f->max_file_size = 16ULL * 1024 * 1024 * 1024 * 1024;
f->max_filename_len = 255;
f->max_path_len = 4096;
return 0;
}
FAT32 Features:
int parse_fat32_features(int fd, struct fs_features *f) {
struct fat32_boot_sector bs;
pread(fd, &bs, sizeof(bs), 0);
strcpy(f->name, "fat32");
f->has_journaling = false;
f->has_permissions = false;
f->case_sensitive = false; // Case-preserving but insensitive
f->supports_sparse = false;
f->supports_hardlinks = false;
f->supports_symlinks = false;
f->max_file_size = 0xFFFFFFFFULL; // 4GB - 1
f->max_filename_len = 255; // With VFAT
f->max_path_len = 260;
return 0;
}
Phase 3: Space Analysis (Days 4-6)
Goals:
- Calculate metadata overhead
- Measure used/free space
- Analyze internal fragmentation
ext4 Space Analysis:
int analyze_ext4_space(int fd, struct space_metrics *m) {
struct ext4_super_block sb;
pread(fd, &sb, sizeof(sb), 1024);
uint32_t block_size = 1024 << sb.s_log_block_size;
uint64_t total_blocks = sb.s_blocks_count_lo;
uint64_t free_blocks = sb.s_free_blocks_count_lo;
m->total_bytes = total_blocks * block_size;
m->free_bytes = free_blocks * block_size;
m->used_bytes = m->total_bytes - m->free_bytes;
// Metadata: superblock copies, BGDT, bitmaps, inode tables
uint32_t groups = (total_blocks + sb.s_blocks_per_group - 1) /
sb.s_blocks_per_group;
// Each group has: SB(1) + BGDT(varies) + bitmaps(2) + inode table
uint32_t inodes_per_group = sb.s_inodes_per_group;
uint32_t inode_table_blocks = (inodes_per_group * sb.s_inode_size + block_size - 1)
/ block_size;
m->metadata_bytes = groups * (4 + inode_table_blocks) * block_size;
m->data_bytes = m->used_bytes - m->metadata_bytes;
m->metadata_overhead_pct = 100.0 * m->metadata_bytes / m->total_bytes;
m->space_efficiency_pct = 100.0 * m->data_bytes / m->used_bytes;
return 0;
}
Phase 4: Fragmentation Analysis (Days 6-8)
Goals:
- Count extents/fragments per file
- Calculate fragmentation scores
- Identify heavily fragmented files
Using FIEMAP for ext4:
#include <linux/fiemap.h>
#include <linux/fs.h>
int count_file_extents(const char *path) {
int fd = open(path, O_RDONLY);
if (fd < 0) return -1;
struct {
struct fiemap fm;
struct fiemap_extent extents[1024];
} buf;
memset(&buf, 0, sizeof(buf));
buf.fm.fm_length = ~0ULL;
buf.fm.fm_extent_count = 1024;
if (ioctl(fd, FS_IOC_FIEMAP, &buf.fm) < 0) {
close(fd);
return -1;
}
close(fd);
return buf.fm.fm_mapped_extents;
}
int analyze_fragmentation(const char *mount_point, struct frag_metrics *m) {
m->total_files = 0;
m->fragmented_files = 0;
m->max_extents = 0;
uint64_t total_extents = 0;
// Walk directory tree
FTS *fts = fts_open((char*[]){(char*)mount_point, NULL}, FTS_PHYSICAL, NULL);
FTSENT *ent;
while ((ent = fts_read(fts)) != NULL) {
if (ent->fts_info == FTS_F) { // Regular file
int extents = count_file_extents(ent->fts_accpath);
if (extents > 0) {
m->total_files++;
total_extents += extents;
if (extents > 1) {
m->fragmented_files++;
}
if (extents > m->max_extents) {
m->max_extents = extents;
}
}
}
}
fts_close(fts);
m->avg_extents_per_file = (double)total_extents / m->total_files;
m->fragmentation_score = 100.0 * m->fragmented_files / m->total_files;
return 0;
}
Phase 5: Performance Benchmarking (Days 8-11)
Goals:
- Measure sequential I/O throughput
- Measure random I/O performance
- Measure metadata operation speed
Sequential Benchmark:
#include <time.h>
int benchmark_sequential_write(const char *mount_point, double *mbps) {
char path[512];
snprintf(path, sizeof(path), "%s/bench_seq_write", mount_point);
// Use O_DIRECT to bypass cache
int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
if (fd < 0) {
// Fallback without O_DIRECT
fd = open(path, O_WRONLY | O_CREAT, 0644);
}
size_t block_size = 1024 * 1024; // 1MB blocks
size_t total_size = 100 * block_size; // 100MB total
void *buffer;
posix_memalign(&buffer, 4096, block_size);
memset(buffer, 'X', block_size);
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);
for (size_t written = 0; written < total_size; written += block_size) {
if (write(fd, buffer, block_size) != block_size) {
free(buffer);
close(fd);
unlink(path);
return -1;
}
}
fsync(fd); // Ensure data is on disk
clock_gettime(CLOCK_MONOTONIC, &end);
double elapsed = (end.tv_sec - start.tv_sec) +
(end.tv_nsec - start.tv_nsec) / 1e9;
*mbps = (total_size / (1024.0 * 1024.0)) / elapsed;
free(buffer);
close(fd);
unlink(path);
return 0;
}
int benchmark_random_read(const char *mount_point, double *iops) {
char path[512];
snprintf(path, sizeof(path), "%s/bench_rand_read", mount_point);
// Create test file
int fd = open(path, O_RDWR | O_CREAT, 0644);
size_t file_size = 100 * 1024 * 1024; // 100MB
ftruncate(fd, file_size);
char buffer[4096];
int num_ops = 10000;
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);
for (int i = 0; i < num_ops; i++) {
off_t offset = (rand() % (file_size / 4096)) * 4096;
pread(fd, buffer, 4096, offset);
}
clock_gettime(CLOCK_MONOTONIC, &end);
double elapsed = (end.tv_sec - start.tv_sec) +
(end.tv_nsec - start.tv_nsec) / 1e9;
*iops = num_ops / elapsed;
close(fd);
unlink(path);
return 0;
}
Phase 6: Report Generation (Days 11-14)
Goals:
- Format comparison tables
- Generate multiple output formats
- Highlight significant differences
void print_comparison_table(struct fs_analysis *a, int count) {
// Header
printf("\nFILESYSTEM COMPARISON REPORT\n");
printf("============================\n\n");
// Feature comparison
printf("%-20s", "Feature");
for (int i = 0; i < count; i++) {
printf("%-15s", a[i].features.name);
}
printf("\n");
printf("%.60s\n", "------------------------------------------------------------");
// Max file size
printf("%-20s", "Max file size");
for (int i = 0; i < count; i++) {
char buf[32];
format_size(a[i].features.max_file_size, buf);
printf("%-15s", buf);
}
printf("\n");
// Journaling
printf("%-20s", "Journaling");
for (int i = 0; i < count; i++) {
printf("%-15s", a[i].features.has_journaling ? "Yes" : "No");
}
printf("\n");
// ... more features ...
// Space efficiency
printf("\nSPACE EFFICIENCY\n");
printf("%.60s\n", "------------------------------------------------------------");
printf("%-20s", "Metadata overhead");
for (int i = 0; i < count; i++) {
printf("%-15.1f%%", a[i].space.metadata_overhead_pct);
}
printf("\n");
printf("%-20s", "Avg fragmentation");
for (int i = 0; i < count; i++) {
printf("%-15.1f%%", a[i].frag.fragmentation_score);
}
printf("\n");
// Performance
printf("\nPERFORMANCE\n");
printf("%.60s\n", "------------------------------------------------------------");
printf("%-20s", "Seq write (MB/s)");
for (int i = 0; i < count; i++) {
printf("%-15.1f", a[i].perf.seq_write_mbps);
}
printf("\n");
printf("%-20s", "Seq read (MB/s)");
for (int i = 0; i < count; i++) {
printf("%-15.1f", a[i].perf.seq_read_mbps);
}
printf("\n");
}
void export_json(struct fs_analysis *a, int count, const char *path) {
FILE *f = fopen(path, "w");
fprintf(f, "{\n \"filesystems\": [\n");
for (int i = 0; i < count; i++) {
fprintf(f, " {\n");
fprintf(f, " \"name\": \"%s\",\n", a[i].fs_type);
fprintf(f, " \"features\": {\n");
fprintf(f, " \"journaling\": %s,\n",
a[i].features.has_journaling ? "true" : "false");
fprintf(f, " \"max_file_size\": %lu\n",
a[i].features.max_file_size);
fprintf(f, " },\n");
fprintf(f, " \"performance\": {\n");
fprintf(f, " \"seq_write_mbps\": %.2f,\n", a[i].perf.seq_write_mbps);
fprintf(f, " \"seq_read_mbps\": %.2f\n", a[i].perf.seq_read_mbps);
fprintf(f, " }\n");
fprintf(f, " }%s\n", i < count - 1 ? "," : "");
}
fprintf(f, " ]\n}\n");
fclose(f);
}
Testing Strategy
Unit Tests
// Test filesystem detection
void test_detect_ext4() {
// Create minimal ext4 image
system("dd if=/dev/zero of=/tmp/test.img bs=1M count=10 2>/dev/null");
system("mkfs.ext4 -F /tmp/test.img >/dev/null 2>&1");
char type[32];
int ret = detect_fs_type("/tmp/test.img", type);
assert(ret == 0);
assert(strcmp(type, "ext4") == 0);
unlink("/tmp/test.img");
}
// Test space calculation
void test_space_analysis() {
// Create image with known contents
// Verify calculations match expected values
}
// Test fragmentation counting
void test_fragmentation() {
// Create file with known number of extents
// Verify count matches
}
Integration Tests
#!/bin/bash
# Test full comparison workflow
# Create test images
dd if=/dev/zero of=ext4.img bs=1M count=100 2>/dev/null
dd if=/dev/zero of=fat32.img bs=1M count=100 2>/dev/null
mkfs.ext4 -F ext4.img >/dev/null 2>&1
mkfs.vfat -F 32 fat32.img >/dev/null 2>&1
# Mount and add test files
sudo mount -o loop ext4.img /tmp/ext4_mnt
sudo mount -o loop fat32.img /tmp/fat32_mnt
for i in $(seq 1 100); do
dd if=/dev/urandom of=/tmp/ext4_mnt/file_$i bs=1K count=10 2>/dev/null
dd if=/dev/urandom of=/tmp/fat32_mnt/file_$i bs=1K count=10 2>/dev/null
done
sudo umount /tmp/ext4_mnt
sudo umount /tmp/fat32_mnt
# Run comparison
./fscompare ext4.img fat32.img > report.txt
# Verify report contains expected sections
grep -q "FILESYSTEM COMPARISON" report.txt && echo "PASS: Header" || echo "FAIL"
grep -q "ext4" report.txt && echo "PASS: ext4 detected" || echo "FAIL"
grep -q "fat32" report.txt && echo "PASS: FAT32 detected" || echo "FAIL"
grep -q "Journaling.*Yes.*No" report.txt && echo "PASS: Features" || echo "FAIL"
rm ext4.img fat32.img report.txt
Common Pitfalls and Debugging
Pitfall 1: Page Cache Invalidation
Problem: Benchmarks show impossibly high speeds (reading from cache).
Solution:
// Drop caches before benchmarks
system("sync; echo 3 > /proc/sys/vm/drop_caches");
// Or use O_DIRECT
int fd = open(path, O_RDONLY | O_DIRECT);
Pitfall 2: Incorrect Struct Packing
Problem: Parsed values are garbage.
Solution:
// Always use packed structs for on-disk formats
struct ext4_super_block {
// ...
} __attribute__((packed));
// Verify size matches spec
_Static_assert(sizeof(struct ext4_super_block) == 1024, "size mismatch");
Pitfall 3: Endianness
Problem: Values look wrong on big-endian systems.
Solution:
#include <endian.h>
// ext4 and FAT32 are little-endian
uint32_t block_count = le32toh(sb.s_blocks_count_lo);
Extensions and Challenges
Extension 1: Graphical Reports
Generate HTML with charts:
<canvas id="perfChart"></canvas>
<script>
var ctx = document.getElementById('perfChart');
new Chart(ctx, {
type: 'bar',
data: { /* from JSON export */ }
});
</script>
Extension 2: NTFS Support
Parse NTFS structures:
- $MFT (Master File Table)
- Attribute headers
- Data runs
Extension 3: Live Comparison
Compare mounted filesystems in real-time:
./fscompare --live /mnt/ext4 /mnt/fat32 --interval 5
Interview Questions
- โWhy does FAT32 have a 4GB file size limit?โ
- Expected: Directory entry uses 32-bit file size field
- โHow would you measure filesystem fragmentation?โ
- Expected: Count extents per file (FIEMAP) or follow allocation chains
- โWhat makes benchmarking filesystems difficult?โ
- Expected: Page cache, warm-up effects, background activity, I/O scheduler
- โWhy is ext4 faster than FAT32 for small files?โ
- Expected: Extent-based allocation, better directory indexing, locality
Self-Assessment Checklist
- Correctly detect ext4, FAT32, and custom filesystem types
- Parse superblock/boot sector data accurately
- Calculate metadata overhead correctly
- Measure fragmentation using FIEMAP or equivalent
- Benchmark with proper cache handling
- Generate readable comparison reports
- Export data in JSON/CSV formats
Resources
Books
- โSystems Performanceโ by Brendan Gregg - Ch. 8 (Filesystems)
- โOperating Systems: Three Easy Piecesโ - Ch. 39-42