Project 11: File System Driver (FAT16)
Build a FAT16 file system driver that can read files from a disk image–implementing ATA/IDE disk I/O, FAT table parsing, directory traversal, and file reading–the foundation for any operating system that needs persistent storage.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 3-4 weeks |
| Language | C (alt: Rust) |
| Prerequisites | Projects 6-8 (x86 protected mode, IDT, paging) |
| Key Topics | ATA PIO, FAT16, block I/O, cluster chains, directory entries |
| Platform | x86 32-bit (QEMU) |
| Tools | GCC cross-compiler, NASM, QEMU, GDB |
| Main Book | “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau |
1. Learning Objectives
By completing this project, you will:
- Understand block device I/O: Master how operating systems communicate with storage hardware at the lowest level
- Implement ATA PIO mode: Write a driver that reads sectors from a hard disk using x86 I/O ports
- Parse FAT16 structures: Decode the Boot Parameter Block (BPB), FAT table, and directory entries
- Navigate file hierarchies: Implement directory listing and subdirectory traversal
- Read file contents: Follow cluster chains through the FAT table to read complete files
- Handle fragmentation: Understand why files are stored in chains and how to reassemble them
- Debug storage code: Develop techniques for debugging persistent storage drivers
- Build toward a complete OS: File system support is essential for loading programs and configurations
2. Theoretical Foundation
2.1 Core Concepts
What is a File System?
A file system provides a logical abstraction over raw disk storage. Without a file system, a disk is just a sequence of 512-byte sectors. The file system organizes these sectors into files and directories:
┌─────────────────────────────────────────────────────────────────────────────┐
│ RAW DISK vs FILE SYSTEM │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ RAW DISK (what hardware sees): │
│ ───────────────────────────── │
│ │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐ │
│ │ S0 │ S1 │ S2 │ S3 │ S4 │ S5 │ S6 │ S7 │ S8 │ S9 │ ... │ │
│ └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘ │
│ 512 512 512 512 512 512 512 512 512 512 bytes │
│ │
│ Just numbered sectors. No concept of files, names, or organization. │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ FILE SYSTEM (what users see): │
│ ───────────────────────────── │
│ │
│ / │
│ ├── hello.txt (17 bytes) │
│ ├── documents/ │
│ │ ├── notes.txt (1,240 bytes) │
│ │ └── data.csv (15,892 bytes) │
│ └── programs/ │
│ └── editor.exe (45,312 bytes) │
│ │
│ Files have names, sizes, and can be organized in directories. │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ THE FILE SYSTEM'S JOB: │
│ │
│ • Map file names to disk sectors │
│ • Track which sectors belong to which file │
│ • Manage free space │
│ • Support directories (folders) │
│ • Store metadata (size, dates, permissions) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Why FAT16?
FAT16 (File Allocation Table, 16-bit) is the ideal learning file system:
┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY FAT16 FOR LEARNING? │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ADVANTAGES: │
│ │
│ ✓ Simple structure - only 3 regions to understand │
│ ✓ Publicly documented - Microsoft specification available │
│ ✓ No journaling complexity │
│ ✓ Easy to create test images │
│ ✓ Widely supported (Windows, Linux, embedded) │
│ ✓ Still used in many embedded systems │
│ │
│ LIMITATIONS (that make it a good learning tool): │
│ │
│ • Max partition size ~2GB (perfect for learning) │
│ • Max file size 2GB │
│ • No permissions (keeps things simple) │
│ • No journaling (simpler recovery) │
│ • 8.3 filenames (easier parsing, LFN is optional) │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ COMPARISON: │
│ │
│ File System Complexity Learning Value Real-World Use │
│ ─────────── ────────── ────────────── ────────────── │
│ FAT12 Very Low Good Floppy disks │
│ FAT16 Low Excellent USB drives, cameras │
│ FAT32 Medium Good USB, SD cards │
│ ext2 Medium-High Good Linux legacy │
│ ext4 High Advanced Linux current │
│ NTFS Very High Expert Windows current │
│ ZFS Extreme Master Enterprise storage │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
FAT16 Disk Layout
A FAT16 partition has a simple three-region structure:
┌─────────────────────────────────────────────────────────────────────────────┐
│ FAT16 DISK LAYOUT │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Sector 0 Reserved FAT Root Dir Data │
│ ↓ Sectors Region Region Region │
│ ┌────────┬─────────────────────┬────────────────┬──────────┬────────────┐ │
│ │ Boot │ (optional reserved │ FAT1 │ FAT2 │ Root Dir │ Data │ │
│ │ Sector │ sectors) │ │ (copy) │ Entries │ Clusters │ │
│ └────────┴─────────────────────┴──────┴────────┴──────────┴────────────┘ │
│ │ │ │ │ │ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ │
│ REGION 1: RESERVED SECTORS (starts at sector 0) │
│ ─────────────────────────────────────────────── │
│ • First sector = Boot Sector (BPB) │
│ • Contains file system parameters │
│ • Usually 1 sector, but can be more │
│ │
│ REGION 2: FAT TABLES (File Allocation Tables) │
│ ──────────────────────────────────────────── │
│ • Array of 16-bit cluster pointers │
│ • Maps cluster chains for files │
│ • Usually 2 copies for redundancy │
│ • Size depends on partition size │
│ │
│ REGION 3: ROOT DIRECTORY │
│ ──────────────────────────── │
│ • Fixed-size area for root directory entries │
│ • Each entry is 32 bytes │
│ • Maximum entries defined in BPB (typically 512) │
│ • Unlike subdirectories, cannot grow │
│ │
│ REGION 4: DATA REGION │
│ ──────────────────────── │
│ • Where actual file contents are stored │
│ • Organized in "clusters" (groups of sectors) │
│ • Subdirectories also stored here │
│ • First cluster is cluster 2 (clusters 0,1 are reserved) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Boot Sector (BIOS Parameter Block)
The first sector contains critical file system parameters:
┌─────────────────────────────────────────────────────────────────────────────┐
│ FAT16 BOOT SECTOR STRUCTURE (512 bytes) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Offset Size Name Description │
│ ────── ──── ───────────────── ──────────────────────────────────── │
│ 0x00 3 jmp_boot Jump instruction (EB XX 90) │
│ 0x03 8 oem_name OEM identifier ("MSWIN4.1") │
│ │
│ ─── BIOS PARAMETER BLOCK (BPB) ────────────────────────────────────────── │
│ │
│ 0x0B 2 bytes_per_sector Usually 512 │
│ 0x0D 1 sectors_per_cluster Power of 2 (1,2,4,8,16,32,64,128) │
│ 0x0E 2 reserved_sectors Sectors before FAT (usually 1) │
│ 0x10 1 num_fats Number of FAT copies (usually 2) │
│ 0x11 2 root_entry_count Max root dir entries (512 for FAT16) │
│ 0x13 2 total_sectors_16 Total sectors if < 65536 │
│ 0x15 1 media_type 0xF8 for hard disk │
│ 0x16 2 fat_size_16 Sectors per FAT │
│ 0x18 2 sectors_per_track For CHS geometry (legacy) │
│ 0x1A 2 num_heads For CHS geometry (legacy) │
│ 0x1C 4 hidden_sectors Sectors before partition │
│ 0x20 4 total_sectors_32 Total sectors if >= 65536 │
│ │
│ ─── EXTENDED BOOT RECORD ──────────────────────────────────────────────── │
│ │
│ 0x24 1 drive_number BIOS drive number (0x80 for HD) │
│ 0x25 1 reserved1 Reserved │
│ 0x26 1 boot_signature Extended boot sig (0x29) │
│ 0x27 4 volume_id Volume serial number │
│ 0x2B 11 volume_label Volume label ("MY DISK ") │
│ 0x36 8 fs_type "FAT16 " (informational only) │
│ │
│ ─── BOOT CODE AND SIGNATURE ───────────────────────────────────────────── │
│ │
│ 0x3E 448 boot_code Bootstrap code (or zeros) │
│ 0x1FE 2 signature Boot signature (0xAA55) │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ CALCULATING KEY VALUES: │
│ │
│ fat_start = reserved_sectors │
│ root_dir_start = reserved_sectors + (num_fats * fat_size_16) │
│ root_dir_sectors = (root_entry_count * 32) / bytes_per_sector │
│ data_start = root_dir_start + root_dir_sectors │
│ first_data_sector = reserved_sectors + (num_fats * fat_size_16) │
│ + root_dir_sectors │
│ cluster_to_lba(N) = first_data_sector + (N - 2) * sectors_per_cluster │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The FAT Table (File Allocation Table)
The FAT is an array of 16-bit values that form cluster chains:
┌─────────────────────────────────────────────────────────────────────────────┐
│ FILE ALLOCATION TABLE (FAT) STRUCTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ The FAT is an array where each entry corresponds to a cluster: │
│ │
│ Index: 0 1 2 3 4 5 6 7 8 │
│ ┌──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐ │
│ Value: │0xFFF8│0xFFFF│ 3 │ 4 │0xFFFF│ 7 │ 0 │ 8 │0xFFFF│ │
│ └──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘ │
│ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ │
│ │ │ │ │ │ │ │ │ │ │
│ Media Reserved │ │ End of │ Free │ │ End │
│ ID cluster │ │ file A │ cluster │ │of file B│
│ │ │ │ │ │ │
│ └──────┴────────────┘ └──────┴─────────┘ │
│ File A: clusters 2→3→4 File B: 5→7→8 │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ FAT ENTRY VALUES: │
│ │
│ Value Meaning │
│ ───── ─────── │
│ 0x0000 Free cluster (available for allocation) │
│ 0x0001 Reserved (never used) │
│ 0x0002-0xFFEF Next cluster in chain │
│ 0xFFF0-0xFFF6 Reserved │
│ 0xFFF7 Bad cluster (damaged sector) │
│ 0xFFF8-0xFFFF End of cluster chain (EOF marker) │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ READING A FILE: │
│ │
│ 1. Get starting cluster from directory entry │
│ 2. Read data from that cluster │
│ 3. Look up next cluster in FAT table │
│ 4. If next cluster < 0xFFF8, repeat from step 2 │
│ 5. If next cluster >= 0xFFF8, file is complete │
│ │
│ EXAMPLE: Reading a file starting at cluster 2: │
│ │
│ Cluster 2: Read data, FAT[2] = 3 → continue │
│ Cluster 3: Read data, FAT[3] = 4 → continue │
│ Cluster 4: Read data, FAT[4] = 0xFFFF → END OF FILE │
│ │
│ File contents = data from clusters 2 + 3 + 4 │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Directory Entry Structure
Each directory entry is 32 bytes:
┌─────────────────────────────────────────────────────────────────────────────┐
│ FAT DIRECTORY ENTRY (32 bytes) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Offset Size Name Description │
│ ────── ──── ───────── ─────────────────────────────────────── │
│ 0x00 8 name Short filename (8 chars, space-padded) │
│ 0x08 3 extension Extension (3 chars, space-padded) │
│ 0x0B 1 attributes File attributes (see below) │
│ 0x0C 1 reserved Reserved for Windows NT │
│ 0x0D 1 create_time_10ms Creation time in 10ms increments │
│ 0x0E 2 create_time Creation time (hour:min:sec) │
│ 0x10 2 create_date Creation date (year:month:day) │
│ 0x12 2 access_date Last access date │
│ 0x14 2 cluster_high High 16 bits of cluster (FAT32 only) │
│ 0x16 2 modify_time Last modification time │
│ 0x18 2 modify_date Last modification date │
│ 0x1A 2 cluster_low Starting cluster (low 16 bits) │
│ 0x1C 4 file_size File size in bytes │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ ATTRIBUTE BYTE (offset 0x0B): │
│ │
│ Bit Hex Name Description │
│ ─── ──── ──── ─────────────────────────────────────────── │
│ 0 0x01 READ_ONLY File is read-only │
│ 1 0x02 HIDDEN File is hidden │
│ 2 0x04 SYSTEM System file │
│ 3 0x08 VOLUME_ID Volume label (special entry) │
│ 4 0x10 DIRECTORY Entry is a directory, not a file │
│ 5 0x20 ARCHIVE File has been modified │
│ │
│ Special: 0x0F = Long File Name entry (see LFN section) │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ 8.3 FILENAME FORMAT: │
│ │
│ Filename stored as 8 chars + 3 chars, uppercase, space-padded: │
│ │
│ "HELLO TXT" = HELLO.TXT │
│ "README " = README (no extension) │
│ "DATA DAT" = DATA.DAT │
│ ". " = "." (current directory) │
│ ".. " = ".." (parent directory) │
│ │
│ SPECIAL FIRST-BYTE VALUES: │
│ │
│ 0x00 = End of directory (no more entries) │
│ 0xE5 = Deleted entry (slot can be reused) │
│ 0x05 = First char is actually 0xE5 (escaped) │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ DATE/TIME FORMAT: │
│ │
│ Time (2 bytes): │
│ Bits 15-11: Hours (0-23) │
│ Bits 10-5: Minutes (0-59) │
│ Bits 4-0: Seconds/2 (0-29, representing 0-58) │
│ │
│ Date (2 bytes): │
│ Bits 15-9: Year since 1980 (0-127, so 1980-2107) │
│ Bits 8-5: Month (1-12) │
│ Bits 4-0: Day (1-31) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Long File Name (LFN) Entries
FAT16 supports long filenames through special directory entries:
┌─────────────────────────────────────────────────────────────────────────────┐
│ LONG FILE NAME (LFN) ENTRIES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Long filenames are stored in special directory entries BEFORE the │
│ standard 8.3 entry. Each LFN entry holds up to 13 Unicode characters. │
│ │
│ LFN ENTRY STRUCTURE (32 bytes): │
│ │
│ Offset Size Description │
│ ────── ──── ─────────────────────────────────────────────── │
│ 0x00 1 Sequence number (1-based, 0x40 = last entry) │
│ 0x01 10 Characters 1-5 (Unicode, 2 bytes each) │
│ 0x0B 1 Attribute (always 0x0F for LFN) │
│ 0x0C 1 Type (0 for LFN entry) │
│ 0x0D 1 Checksum of 8.3 name │
│ 0x0E 12 Characters 6-11 (Unicode, 2 bytes each) │
│ 0x1A 2 First cluster (always 0 for LFN) │
│ 0x1C 4 Characters 12-13 (Unicode, 2 bytes each) │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ EXAMPLE: File "My Long Filename.txt" │
│ │
│ Directory entries (read from disk): │
│ │
│ Entry 1: LFN #2 (0x42) "name.txt" │
│ Entry 2: LFN #1 (0x01) "My Long File" │
│ Entry 3: 8.3 entry "MYLONG~1TXT" │
│ │
│ The 8.3 name is auto-generated with ~N suffix. │
│ LFN entries are stored in REVERSE order. │
│ Sequence number 0x40 | N indicates last LFN entry. │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ LFN CHECKSUM ALGORITHM: │
│ │
│ uint8_t lfn_checksum(const char *shortname) { │
│ uint8_t sum = 0; │
│ for (int i = 0; i < 11; i++) { │
│ sum = ((sum & 1) ? 0x80 : 0) + (sum >> 1) + shortname[i]; │
│ } │
│ return sum; │
│ } │
│ │
│ For this project, you can focus on 8.3 names first, then add LFN │
│ support as an extension. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
ATA PIO Mode
ATA (AT Attachment) is the interface to IDE/PATA hard drives. PIO (Programmed I/O) mode uses CPU I/O ports:
┌─────────────────────────────────────────────────────────────────────────────┐
│ ATA PIO MODE INTERFACE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ATA uses two I/O port ranges: │
│ │
│ Primary Controller: 0x1F0 - 0x1F7, 0x3F6 │
│ Secondary Controller: 0x170 - 0x177, 0x376 │
│ │
│ PORT LAYOUT (Primary Controller): │
│ │
│ Port Read Write │
│ ───── ──── ───── │
│ 0x1F0 Data (16-bit) Data (16-bit) │
│ 0x1F1 Error register Features register │
│ 0x1F2 Sector count Sector count │
│ 0x1F3 LBA low (0-7) LBA low (0-7) │
│ 0x1F4 LBA mid (8-15) LBA mid (8-15) │
│ 0x1F5 LBA high (16-23) LBA high (16-23) │
│ 0x1F6 Drive/head Drive/head │
│ 0x1F7 Status Command │
│ 0x3F6 Alt status Device control │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ STATUS REGISTER (0x1F7 read): │
│ │
│ Bit Name Description │
│ ─── ──── ─────────────────────────────────────────────────────────── │
│ 7 BSY Drive is busy (wait for 0 before sending commands) │
│ 6 DRDY Drive is ready │
│ 5 DF Drive fault │
│ 4 SRV Service request │
│ 3 DRQ Data request (ready to transfer data) │
│ 2 CORR Corrected data │
│ 1 IDX Index mark (obsolete) │
│ 0 ERR Error occurred (check error register) │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ DRIVE/HEAD REGISTER (0x1F6): │
│ │
│ Bit 7: 1 (reserved) │
│ Bit 6: LBA mode (1) or CHS mode (0) │
│ Bit 5: 1 (reserved) │
│ Bit 4: Drive (0 = master, 1 = slave) │
│ Bits 3-0: LBA bits 24-27 (or head number in CHS) │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ ATA COMMANDS: │
│ │
│ 0x20 READ SECTORS (PIO) │
│ 0x30 WRITE SECTORS (PIO) │
│ 0xEC IDENTIFY DRIVE │
│ 0xE7 FLUSH CACHE │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Reading a Sector with ATA PIO
┌─────────────────────────────────────────────────────────────────────────────┐
│ ATA PIO READ SEQUENCE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: Wait for drive ready │
│ ───────────────────────────── │
│ │
│ while (inb(0x1F7) & 0x80); // Wait for BSY = 0 │
│ │
│ STEP 2: Select drive and send LBA address │
│ ───────────────────────────────────────── │
│ │
│ outb(0x1F6, 0xE0 | ((lba >> 24) & 0x0F)); // Drive 0, LBA bits 24-27 │
│ outb(0x1F2, 1); // Sector count = 1 │
│ outb(0x1F3, lba & 0xFF); // LBA bits 0-7 │
│ outb(0x1F4, (lba >> 8) & 0xFF); // LBA bits 8-15 │
│ outb(0x1F5, (lba >> 16) & 0xFF); // LBA bits 16-23 │
│ │
│ STEP 3: Send READ command │
│ ───────────────────────── │
│ │
│ outb(0x1F7, 0x20); // READ SECTORS command │
│ │
│ STEP 4: Wait for data ready │
│ ──────────────────────────── │
│ │
│ while (!(inb(0x1F7) & 0x08)); // Wait for DRQ = 1 │
│ │
│ STEP 5: Read data (256 words = 512 bytes) │
│ ───────────────────────────────────────── │
│ │
│ for (int i = 0; i < 256; i++) { │
│ buffer[i] = inw(0x1F0); // Read 16-bit word │
│ } │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ COMPLETE READ FUNCTION: │
│ │
│ void ata_read_sector(uint32_t lba, uint16_t *buffer) { │
│ // Wait for drive ready │
│ while (inb(0x1F7) & 0x80); │
│ │
│ // Select drive 0, LBA mode │
│ outb(0x1F6, 0xE0 | ((lba >> 24) & 0x0F)); │
│ outb(0x1F2, 1); // 1 sector │
│ outb(0x1F3, lba & 0xFF); // LBA low │
│ outb(0x1F4, (lba >> 8) & 0xFF); // LBA mid │
│ outb(0x1F5, (lba >> 16) & 0xFF); // LBA high │
│ │
│ // Send read command │
│ outb(0x1F7, 0x20); │
│ │
│ // Wait for data │
│ while (!(inb(0x1F7) & 0x08)); │
│ │
│ // Read 256 words (512 bytes) │
│ for (int i = 0; i < 256; i++) { │
│ buffer[i] = inw(0x1F0); │
│ } │
│ } │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
2.2 Why This Matters
Foundation for Operating Systems:
- Every OS needs to read files from disk
- File system drivers are essential kernel components
- Understanding FAT helps understand more complex file systems
Understanding Storage:
- How data is organized on disk
- Why fragmentation affects performance
- How file deletion really works
Practical Applications:
- Embedded systems often use FAT
- USB drives and SD cards use FAT32
- Understanding enables data recovery
- Firmware and bootloaders read FAT
2.3 Historical Context
The Evolution of FAT:
┌─────────────────────────────────────────────────────────────────────────────┐
│ HISTORY OF FAT FILE SYSTEMS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1977 - FAT (original) │
│ Created by Marc McDonald for Microsoft Standalone Disk BASIC │
│ Used in MDOS/MIDAS │
│ │
│ 1980 - FAT12 │
│ 12-bit cluster entries │
│ Max volume size: 32 MB │
│ Used on floppy disks │
│ QDOS (later MS-DOS) adopted FAT │
│ │
│ 1984 - FAT16 │
│ 16-bit cluster entries │
│ Originally max 32 MB, later 2 GB │
│ MS-DOS 3.0+, Windows 3.x │
│ │
│ 1996 - FAT32 (Windows 95 OSR2) │
│ 28-bit cluster entries (called "32" for marketing) │
│ Max volume size: 2 TB │
│ Still widely used on USB drives │
│ │
│ 2006 - exFAT │
│ Designed for flash drives │
│ Max file size: 16 EB (exabytes) │
│ Used on SDXC cards │
│ │
│ Today - FAT is everywhere! │
│ - USB drives, SD cards │
│ - Cameras, game consoles │
│ - Embedded systems │
│ - EFI System Partition (ESP) │
│ - Cross-platform compatibility │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
2.4 Common Misconceptions
| Misconception | Reality |
|---|---|
| “FAT is obsolete” | FAT is still the most portable file system, used on billions of devices |
| “FAT12/16/32 are different file systems” | They share the same structure, only differ in cluster pointer size |
| “The FAT has to be at sector 1” | FAT starts after reserved sectors (usually 1, but can be more) |
| “Root directory can grow” | Unlike subdirectories, root directory has fixed size in FAT16 |
| “LFN is part of original FAT” | LFN was added by Microsoft in Windows 95, layered on top of 8.3 |
| “Cluster 0 and 1 can be used” | First usable cluster is always 2; 0 and 1 have special meanings |
3. Project Specification
3.1 What You Will Build
A complete FAT16 file system driver that:
- Reads sectors from disk using ATA PIO mode
- Parses the boot sector to understand file system layout
- Reads the FAT table to follow cluster chains
- Lists directory contents showing files and subdirectories
- Reads file contents from fragmented clusters
- Navigates subdirectories to access nested files
- Provides a simple shell interface for user interaction
3.2 Functional Requirements
| ID | Requirement |
|---|---|
| FR1 | Read sectors from ATA drive using PIO mode |
| FR2 | Parse FAT16 boot sector (BPB) and extract parameters |
| FR3 | Load and interpret the FAT table |
| FR4 | List root directory contents with file names, sizes, and types |
| FR5 | Read entire file contents by following cluster chains |
| FR6 | Support subdirectory navigation |
| FR7 | Handle both short (8.3) filenames and optional LFN |
| FR8 | Provide ls and cat commands |
| FR9 | Handle files larger than one cluster (fragmented files) |
| FR10 | Display meaningful error messages for invalid operations |
3.3 Non-Functional Requirements
| ID | Requirement |
|---|---|
| NFR1 | Read speed should be limited only by hardware, not software |
| NFR2 | Support disk images up to 128 MB (plenty for learning) |
| NFR3 | Work correctly with standard FAT16 images from mkfs.fat |
| NFR4 | Code should be readable and well-documented |
| NFR5 | Must handle edge cases (empty directories, zero-byte files) |
3.4 Example Output
$ make
nasm -f elf32 boot.asm -o boot.o
i686-elf-gcc -c kernel.c -o kernel.o -ffreestanding -Wall -Wextra
i686-elf-gcc -c ata.c -o ata.o -ffreestanding -Wall -Wextra
i686-elf-gcc -c fat16.c -o fat16.o -ffreestanding -Wall -Wextra
i686-elf-gcc -c shell.c -o shell.o -ffreestanding -Wall -Wextra
i686-elf-ld -T link.ld -o kernel.elf boot.o kernel.o ata.o fat16.o shell.o
$ qemu-system-i386 -kernel kernel.elf -hda disk.img -serial stdio
================================================================================
FAT16 File System Driver Demo v1.0
================================================================================
[INIT] Detecting ATA drives...
Primary Master: 64 MB disk detected
[INIT] Reading boot sector...
Bytes per sector: 512
Sectors per cluster: 4
Reserved sectors: 1
Number of FATs: 2
Root entry count: 512
FAT size: 128 sectors
Volume label: MY DISK
Total clusters: 16256
[INIT] File system mounted successfully!
================================================================================
FAT16 Shell
================================================================================
Type 'help' for available commands.
> ls
Directory of /
HELLO.TXT 17 bytes 2024-12-29 14:30
README.MD 256 bytes 2024-12-29 14:32
CONFIG.SYS 48 bytes 2024-12-29 14:28
DOCS/ <DIR> 2024-12-29 14:35
PROGRAMS/ <DIR> 2024-12-29 14:36
5 entries (321 bytes in files)
> cat hello.txt
Hello from file!
> ls docs
Directory of /DOCS
. <DIR>
.. <DIR>
NOTES.TXT 1,240 bytes 2024-12-29 14:35
DATA.CSV 15,892 bytes 2024-12-29 14:35
4 entries (17,132 bytes in files)
> cat docs/notes.txt
These are my notes about FAT16 file system implementation.
The FAT16 file system was developed by Microsoft and has been
widely used in various operating systems including MS-DOS and
Windows 95/98.
Key components:
- Boot sector with BPB
- File Allocation Table
- Root directory
- Data region
...
> cd docs
Current directory: /DOCS
> ls
Directory of /DOCS
. <DIR>
.. <DIR>
NOTES.TXT 1,240 bytes 2024-12-29 14:35
DATA.CSV 15,892 bytes 2024-12-29 14:35
4 entries (17,132 bytes in files)
> cat data.csv
Name,Value,Date
Item1,100,2024-01-01
Item2,200,2024-01-02
Item3,300,2024-01-03
...
> cd ..
Current directory: /
> cat programs/missing.txt
Error: File not found: MISSING.TXT
> help
Available commands:
ls [path] - List directory contents
cat <file> - Display file contents
cd <path> - Change directory
pwd - Print working directory
info - Show file system information
help - Show this help message
> info
=== FAT16 File System Information ===
Volume Label: MY DISK
Volume Serial: 0x12345678
OEM Name: MSDOS5.0
Geometry:
Bytes/Sector: 512
Sectors/Cluster: 4
Cluster Size: 2048 bytes
Layout:
Reserved: 1 sectors
FAT Start: sector 1
FAT Size: 128 sectors x 2 copies
Root Dir Start: sector 257
Root Dir Size: 32 sectors
Data Start: sector 289
Total Clusters: 16256
Usage:
Total Space: 64 MB
Used: 2 MB
Free: 62 MB
>
3.5 Real World Outcome
After completing this project, you will have:
- A working file system driver that can read files from a FAT16 disk image
- Deep understanding of how operating systems access persistent storage
- ATA driver experience for direct hardware communication
- Foundation for OS development – loading programs, reading configs
- Debugging skills for storage-related issues
- Practical FAT knowledge applicable to embedded systems
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────────────────┐
│ FAT16 DRIVER ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────────────────┐│
│ │ USER INTERFACE ││
│ │ ┌──────────────────────────────────────────────────────────────────┐ ││
│ │ │ Shell / Commands │ ││
│ │ │ ls cat cd pwd info help │ ││
│ │ └──────────────────────────────────────────────────────────────────┘ ││
│ └────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐│
│ │ FAT16 FILE SYSTEM LAYER ││
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌────────────────────────┐ ││
│ │ │ Directory Ops │ │ File Ops │ │ Cluster Management │ ││
│ │ │ - list_dir() │ │ - read_file() │ │ - get_next_cluster() │ ││
│ │ │ - find_entry() │ │ - get_size() │ │ - cluster_to_lba() │ ││
│ │ │ - parse_entry()│ │ │ │ - read_cluster() │ ││
│ │ └─────────────────┘ └─────────────────┘ └────────────────────────┘ ││
│ │ │ ││
│ │ ┌───────────────────────────────────────────────────────────────────┐ ││
│ │ │ FAT Table Cache │ ││
│ │ │ (16-bit array mapping cluster -> next cluster) │ ││
│ │ └───────────────────────────────────────────────────────────────────┘ ││
│ │ │ ││
│ │ ┌───────────────────────────────────────────────────────────────────┐ ││
│ │ │ BPB / Superblock Info │ ││
│ │ │ sectors_per_cluster, fat_start, root_start, data_start, etc. │ ││
│ │ └───────────────────────────────────────────────────────────────────┘ ││
│ └────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐│
│ │ BLOCK I/O LAYER ││
│ │ ┌─────────────────────────────────────────────────────────────────┐ ││
│ │ │ Sector Cache (optional) │ ││
│ │ └─────────────────────────────────────────────────────────────────┘ ││
│ │ │ ││
│ │ ┌─────────────────────────────────────────────────────────────────┐ ││
│ │ │ read_sector(lba, buffer) │ ││
│ │ │ read_sectors(lba, count, buffer) │ ││
│ │ └─────────────────────────────────────────────────────────────────┘ ││
│ └────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐│
│ │ ATA DRIVER ││
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ ││
│ │ │ ata_identify() │ │ ata_read_pio() │ │ Port I/O (inb/outb) │ ││
│ │ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ ││
│ └────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐│
│ │ HARDWARE ││
│ │ ATA Controller (0x1F0-0x1F7) ││
│ │ Hard Disk ││
│ └────────────────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | File | Responsibility |
|---|---|---|
| ATA Driver | ata.c / ata.h |
Low-level disk I/O via PIO mode |
| FAT16 Driver | fat16.c / fat16.h |
File system parsing and navigation |
| Shell | shell.c / shell.h |
User command interface |
| Kernel Main | kernel.c |
Initialization and main loop |
| Boot | boot.asm |
Protected mode entry (from previous projects) |
| Linker Script | link.ld |
Memory layout |
4.3 Data Structures
//------------------------------------------
// ATA Driver Structures
//------------------------------------------
typedef struct {
uint16_t base_port; // Command block base (0x1F0 or 0x170)
uint16_t ctrl_port; // Control block (0x3F6 or 0x376)
uint8_t drive; // 0 = master, 1 = slave
uint32_t size_sectors; // Total sectors on drive
char model[41]; // Drive model string
} ata_drive_t;
//------------------------------------------
// FAT16 Boot Parameter Block
//------------------------------------------
typedef struct __attribute__((packed)) {
uint8_t jmp_boot[3]; // Jump instruction
char oem_name[8]; // OEM identifier
uint16_t bytes_per_sector; // Usually 512
uint8_t sectors_per_cluster; // Power of 2
uint16_t reserved_sectors; // Before first FAT
uint8_t num_fats; // Usually 2
uint16_t root_entry_count; // Max root entries (512)
uint16_t total_sectors_16; // If < 65536
uint8_t media_type; // 0xF8 for HD
uint16_t fat_size_16; // Sectors per FAT
uint16_t sectors_per_track; // CHS geometry
uint16_t num_heads; // CHS geometry
uint32_t hidden_sectors; // Before partition
uint32_t total_sectors_32; // If >= 65536
// Extended boot record
uint8_t drive_number; // BIOS drive
uint8_t reserved1;
uint8_t boot_signature; // 0x29 if extended
uint32_t volume_id; // Serial number
char volume_label[11]; // Volume name
char fs_type[8]; // "FAT16 "
} fat16_bpb_t;
//------------------------------------------
// FAT16 Directory Entry
//------------------------------------------
typedef struct __attribute__((packed)) {
char name[8]; // Short name
char ext[3]; // Extension
uint8_t attributes; // File attributes
uint8_t reserved; // NT reserved
uint8_t create_time_tenth; // 10ms units
uint16_t create_time; // Hour:min:sec
uint16_t create_date; // Year:month:day
uint16_t access_date; // Last access
uint16_t cluster_high; // High 16 bits (FAT32)
uint16_t modify_time; // Last modification
uint16_t modify_date;
uint16_t cluster_low; // Starting cluster
uint32_t file_size; // Size in bytes
} fat16_dir_entry_t;
// Attribute flags
#define ATTR_READ_ONLY 0x01
#define ATTR_HIDDEN 0x02
#define ATTR_SYSTEM 0x04
#define ATTR_VOLUME_ID 0x08
#define ATTR_DIRECTORY 0x10
#define ATTR_ARCHIVE 0x20
#define ATTR_LFN 0x0F // Long filename marker
//------------------------------------------
// FAT16 File System State
//------------------------------------------
typedef struct {
// Drive reference
ata_drive_t *drive;
// BPB cached values
uint16_t bytes_per_sector;
uint8_t sectors_per_cluster;
uint16_t reserved_sectors;
uint8_t num_fats;
uint16_t root_entry_count;
uint16_t fat_size;
uint32_t total_sectors;
// Calculated values
uint32_t fat_start_sector;
uint32_t root_dir_start_sector;
uint32_t root_dir_sectors;
uint32_t data_start_sector;
uint32_t total_clusters;
uint32_t cluster_size; // bytes
// FAT table in memory
uint16_t *fat_table;
// Current directory tracking
uint16_t current_dir_cluster; // 0 = root
char current_path[256];
} fat16_fs_t;
//------------------------------------------
// File Handle (for reading files)
//------------------------------------------
typedef struct {
fat16_fs_t *fs;
uint16_t start_cluster;
uint16_t current_cluster;
uint32_t file_size;
uint32_t position; // Current read position
char name[13]; // 8.3 + dot + null
} fat16_file_t;
4.4 Algorithm Overview
Mounting the File System:
1. Read sector 0 (boot sector)
2. Parse BPB structure
3. Calculate derived values:
- fat_start = reserved_sectors
- root_dir_start = reserved + (num_fats * fat_size)
- root_dir_sectors = (root_entries * 32) / bytes_per_sector
- data_start = root_dir_start + root_dir_sectors
4. Allocate and read FAT table into memory
5. Initialize current directory to root
Reading a Directory:
1. If directory is root:
- Read sectors from root_dir_start
- Process up to root_entry_count entries
2. If directory is subdirectory:
- Start at directory's cluster
- Read cluster, process entries
- Follow cluster chain for large directories
3. For each entry:
- Skip if first byte is 0x00 (end) or 0xE5 (deleted)
- Skip if attributes == 0x0F (LFN entry)
- Parse filename, size, attributes, cluster
Reading a File:
1. Find file's directory entry
2. Get starting cluster from entry
3. Initialize position = 0, bytes_read = 0
4. While position < file_size:
a. Calculate offset within current cluster
b. Read data from current cluster
c. Advance position
d. If reached end of cluster:
- Look up next cluster in FAT
- If FAT[cluster] >= 0xFFF8, EOF
- Else current_cluster = FAT[cluster]
5. Return bytes_read
Converting Cluster to LBA:
lba = data_start_sector + (cluster - 2) * sectors_per_cluster
5. Implementation Guide
5.1 Environment Setup
# Same as previous x86 kernel projects
# Verify cross-compiler
i686-elf-gcc --version
# Verify NASM
nasm --version
# Verify QEMU
qemu-system-i386 --version
# Create a FAT16 disk image for testing
dd if=/dev/zero of=disk.img bs=1M count=64
mkfs.fat -F 16 -n "MY DISK" disk.img
# Mount and add test files (Linux)
mkdir mnt
sudo mount disk.img mnt
echo "Hello from file!" | sudo tee mnt/HELLO.TXT
sudo mkdir mnt/DOCS
echo "Test content" | sudo tee mnt/DOCS/TEST.TXT
sudo umount mnt
# Alternative: Use mtools (doesn't need root)
mformat -F -v "MY DISK" -i disk.img ::
echo "Hello from file!" | mwrite -i disk.img - ::/HELLO.TXT
mmd -i disk.img ::/DOCS
echo "Test content" | mwrite -i disk.img - ::/DOCS/TEST.TXT
5.2 Project Structure
fat16_driver/
├── Makefile
├── link.ld # Linker script
├── boot.asm # Protected mode entry
├── kernel.c # Main initialization
├── kernel.h # Common definitions
├── ata.c # ATA PIO driver
├── ata.h
├── fat16.c # FAT16 file system
├── fat16.h
├── shell.c # Command interface
├── shell.h
├── string.c # String utilities
├── string.h
├── vga.c # VGA text output (from P6)
├── vga.h
├── serial.c # Serial debug output
├── serial.h
├── io.h # Port I/O functions
└── disk.img # Test FAT16 image
5.3 The Core Question You’re Answering
“How does an operating system read files from a hard disk?”
You’ll answer this by:
- Writing code that communicates with the disk controller
- Understanding how data is organized into clusters and chains
- Parsing on-disk data structures to find files
- Following chains through the FAT to read complete files
5.4 Concepts You Must Understand First
| Concept | Self-Assessment Question | Reference |
|---|---|---|
| x86 Port I/O | How do inb/outb/inw work? | Intel SDM, Vol 1 |
| Protected Mode | Is your kernel in protected mode? | Project 6 |
| Memory-mapped structs | How to read a struct from raw bytes? | C alignment |
| LBA addressing | How is LBA different from CHS? | ATA specification |
| Endianness | Is FAT little-endian or big-endian? | FAT specification |
5.5 Questions to Guide Your Design
ATA Driver:
- How do you detect if a drive is present?
- What’s the maximum LBA you can address with 28-bit LBA?
- How do you handle read errors?
- Should you poll status or use interrupts?
FAT16 Driver:
- How do you distinguish FAT12 from FAT16 from FAT32?
- Should you load the entire FAT into memory?
- How do you handle the root directory differently from subdirectories?
- What happens when a file spans multiple clusters?
Shell:
- How do you parse user input?
- How do you handle paths with subdirectories?
- What happens when a file is too large to display?
5.6 Thinking Exercise
Before coding, trace through this scenario by hand:
SCENARIO: Reading a 5,000 byte file on FAT16 with 2KB clusters
Given:
- bytes_per_sector = 512
- sectors_per_cluster = 4
- cluster_size = 2048 bytes
- File "DATA.TXT" starts at cluster 10
- FAT table: FAT[10] = 11, FAT[11] = 12, FAT[12] = 0xFFFF
Questions:
1. How many clusters does this file occupy?
2. What are the LBA addresses for each cluster?
3. How many sectors do you need to read?
4. How much of the last cluster is actually file data?
Draw the cluster chain and calculate byte ranges for each cluster.
5.7 Hints in Layers
Hint 1 - Starting Point: Start with the ATA driver. Get single-sector reads working first. Use a simple test: read sector 0 and verify the 0xAA55 signature at offset 0x1FE.
Hint 2 - Next Level: FAT16 detection: After reading the boot sector, check these conditions:
bpb->bytes_per_sectorshould be 512, 1024, 2048, or 4096bpb->sectors_per_clustershould be 1, 2, 4, 8, 16, 32, 64, or 128- The signature at offset 0x1FE should be 0xAA55
Calculate total clusters:
uint32_t root_dir_sectors = ((bpb->root_entry_count * 32) +
(bpb->bytes_per_sector - 1)) /
bpb->bytes_per_sector;
uint32_t data_sectors = total_sectors -
(bpb->reserved_sectors +
(bpb->num_fats * bpb->fat_size_16) +
root_dir_sectors);
uint32_t total_clusters = data_sectors / bpb->sectors_per_cluster;
// FAT type determination:
// If total_clusters < 4085, it's FAT12
// If total_clusters < 65525, it's FAT16
// Otherwise, it's FAT32
Hint 3 - Technical Details: Directory entry parsing:
int parse_dir_entry(fat16_dir_entry_t *entry, char *filename) {
// Check for end of directory
if (entry->name[0] == 0x00) return -1; // No more entries
// Check for deleted entry
if (entry->name[0] == 0xE5) return 0; // Skip this entry
// Check for LFN entry
if ((entry->attributes & 0x0F) == 0x0F) return 0; // Skip LFN
// Check for volume label
if (entry->attributes & ATTR_VOLUME_ID) return 0; // Skip
// Build 8.3 filename
int pos = 0;
for (int i = 0; i < 8 && entry->name[i] != ' '; i++) {
filename[pos++] = entry->name[i];
}
// Add extension if present
if (entry->ext[0] != ' ') {
filename[pos++] = '.';
for (int i = 0; i < 3 && entry->ext[i] != ' '; i++) {
filename[pos++] = entry->ext[i];
}
}
filename[pos] = '\0';
return 1; // Valid entry
}
Hint 4 - Debugging: Add extensive debug output:
void dump_bpb(fat16_bpb_t *bpb) {
serial_printf("BPB Debug:\n");
serial_printf(" bytes_per_sector: %u\n", bpb->bytes_per_sector);
serial_printf(" sectors_per_cluster: %u\n", bpb->sectors_per_cluster);
serial_printf(" reserved_sectors: %u\n", bpb->reserved_sectors);
serial_printf(" num_fats: %u\n", bpb->num_fats);
serial_printf(" root_entry_count: %u\n", bpb->root_entry_count);
serial_printf(" fat_size_16: %u\n", bpb->fat_size_16);
serial_printf(" total_sectors: %u\n",
bpb->total_sectors_16 ? bpb->total_sectors_16
: bpb->total_sectors_32);
}
void dump_fat_chain(fat16_fs_t *fs, uint16_t start_cluster) {
serial_printf("FAT chain from cluster %u:\n ", start_cluster);
uint16_t cluster = start_cluster;
int count = 0;
while (cluster < 0xFFF8 && count < 100) { // Limit to prevent infinite loop
serial_printf("%u -> ", cluster);
cluster = fs->fat_table[cluster];
count++;
}
serial_printf("END\n");
}
5.8 The Interview Questions They’ll Ask
- “Explain how FAT16 stores files on disk.”
- Expected: Boot sector -> FAT table -> Directory entries -> Data clusters
- Mention cluster chains and how FAT entries link clusters
- “What is fragmentation and how does FAT handle it?”
- Expected: Files stored in non-contiguous clusters
- FAT table links clusters regardless of physical location
- Performance impact: more seeks needed
- “How does the operating system know where a file’s data is located?”
- Expected: Directory entry contains starting cluster
- FAT table chains to subsequent clusters
- Data region starts after root directory
- “What’s the difference between a file and a directory in FAT?”
- Expected: Both are directory entries
- Directory has ATTR_DIRECTORY flag set
- Directory’s “file data” is array of more directory entries
- “Why might FAT16 be preferred over FAT32 in some embedded systems?”
- Expected: Smaller FAT table, simpler code, less memory
- Good enough for small storage (< 2GB)
- More compatible with legacy systems
- “How would you implement file deletion in FAT?”
- Expected: Mark directory entry as deleted (0xE5)
- Mark all clusters in chain as free (0x0000)
- Data remains until overwritten (data recovery possible)
- “What happens if the FAT table becomes corrupted?”
- Expected: File data becomes unrecoverable
- That’s why two FAT copies exist
- fsck/chkdsk compare and repair
5.9 Books That Will Help
| Topic | Book | Chapters |
|---|---|---|
| File Systems Overview | “Operating Systems: Three Easy Pieces” | Ch. 39-42 |
| FAT File System | Microsoft FAT Specification | Entire document |
| ATA Interface | ATA/ATAPI-7 Specification | Volume 1, Ch. 5-6 |
| x86 I/O | “Write Great Code, Vol 1” | Ch. 11 |
| Block I/O | “Linux Device Drivers” | Ch. 16 |
5.10 Implementation Phases
Phase 1: ATA Driver (Week 1)
- Implement port I/O functions (inb, outb, inw, outw)
- Write ATA initialization and drive detection
- Implement single-sector read function
- Test by reading sector 0 and verifying boot signature
- Checkpoint: Can read any sector from disk
Phase 2: FAT16 Mounting (Week 1-2)
- Parse boot sector into BPB structure
- Calculate file system geometry (FAT start, root start, data start)
- Allocate and load FAT table into memory
- Verify FAT16 detection (cluster count check)
- Checkpoint: Prints correct file system parameters
Phase 3: Directory Reading (Week 2)
- Implement root directory reading
- Parse directory entries, skip deleted and LFN entries
- Format and display directory listing
- Handle directories with many entries (multiple sectors)
- Checkpoint:
lscommand shows root directory
Phase 4: File Reading (Week 2-3)
- Implement file lookup by name
- Follow cluster chain to read file data
- Handle files larger than one cluster
- Handle edge cases (empty files, exactly cluster-sized files)
- Checkpoint:
catcommand displays file contents
Phase 5: Subdirectory Navigation (Week 3)
- Implement subdirectory reading (similar to file reading)
- Handle “.” and “..” entries
- Implement path parsing and navigation
- Implement
cdcommand - Checkpoint: Can navigate and list subdirectories
Phase 6: Polish and Testing (Week 3-4)
- Add
pwd,info,helpcommands - Create comprehensive test disk image
- Test edge cases (long paths, many files)
- Add error handling and user-friendly messages
- Optional: Add LFN support
- Checkpoint: All tests pass, clean user experience
5.11 Key Implementation Decisions
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| FAT caching | Load entire FAT | Load on demand | Entire FAT (simpler, <128KB for 64MB disk) |
| Sector buffer | Static global | Stack-allocated | Static (stack might be limited) |
| Case sensitivity | Case-sensitive | Case-insensitive | Case-insensitive (FAT standard) |
| Path separator | / Unix style |
\ DOS style |
/ (Unix, more familiar) |
| Error handling | Return codes | Global errno | Return codes (simpler) |
| Directory listing | Simple | Column-formatted | Column-formatted (more readable) |
6. Testing Strategy
6.1 Creating Test Disk Images
#!/bin/bash
# create_test_disk.sh
# Create 64MB FAT16 disk image
dd if=/dev/zero of=disk.img bs=1M count=64
mkfs.fat -F 16 -n "TESTDISK" disk.img
# Add test files using mtools (no root needed)
# Root directory files
echo "Hello from file!" > /tmp/hello.txt
mcopy -i disk.img /tmp/hello.txt ::HELLO.TXT
echo "Short file" > /tmp/short.txt
mcopy -i disk.img /tmp/short.txt ::SHORT.TXT
# Create a file larger than one cluster
dd if=/dev/urandom bs=5000 count=1 | base64 > /tmp/large.txt
mcopy -i disk.img /tmp/large.txt ::LARGE.TXT
# Create subdirectory with files
mmd -i disk.img ::DOCS
echo "Document content" > /tmp/doc.txt
mcopy -i disk.img /tmp/doc.txt ::DOCS/README.TXT
# Create nested subdirectory
mmd -i disk.img ::DOCS/NESTED
echo "Nested file" > /tmp/nested.txt
mcopy -i disk.img /tmp/nested.txt ::DOCS/NESTED/FILE.TXT
# Create empty directory
mmd -i disk.img ::EMPTY
# Create empty file
touch /tmp/empty.txt
mcopy -i disk.img /tmp/empty.txt ::EMPTY.TXT
# Show what was created
echo "Disk contents:"
mdir -i disk.img ::
mdir -i disk.img ::DOCS
6.2 Unit Tests
// test_ata.c
void test_ata_read_sector_0(void) {
uint8_t buffer[512];
int result = ata_read_sector(0, buffer);
assert(result == 0); // No error
// Check boot signature
uint16_t signature = *(uint16_t*)(buffer + 0x1FE);
assert(signature == 0xAA55);
serial_print("test_ata_read_sector_0: PASS\n");
}
void test_ata_read_multiple_sectors(void) {
uint8_t buffer1[512], buffer2[512];
// Read same sector twice, should match
ata_read_sector(0, buffer1);
ata_read_sector(0, buffer2);
for (int i = 0; i < 512; i++) {
assert(buffer1[i] == buffer2[i]);
}
// Read different sectors, should NOT match
ata_read_sector(1, buffer2);
int different = 0;
for (int i = 0; i < 512; i++) {
if (buffer1[i] != buffer2[i]) different++;
}
assert(different > 0); // At least some bytes different
serial_print("test_ata_read_multiple_sectors: PASS\n");
}
// test_fat16.c
void test_fat16_mount(void) {
fat16_fs_t fs;
int result = fat16_mount(&primary_drive, &fs);
assert(result == 0);
// Verify expected values
assert(fs.bytes_per_sector == 512);
assert(fs.sectors_per_cluster == 4); // 2KB clusters for 64MB
serial_print("test_fat16_mount: PASS\n");
}
void test_fat16_list_root(void) {
fat16_fs_t fs;
fat16_mount(&primary_drive, &fs);
fat16_dir_entry_t entries[16];
int count = fat16_list_dir(&fs, 0, entries, 16);
assert(count > 0); // Should have at least one entry
// Look for HELLO.TXT
int found = 0;
for (int i = 0; i < count; i++) {
char name[13];
parse_filename(&entries[i], name);
if (strcmp(name, "HELLO.TXT") == 0) {
found = 1;
assert(entries[i].file_size == 17); // "Hello from file!\n"
}
}
assert(found);
serial_print("test_fat16_list_root: PASS\n");
}
void test_fat16_read_file(void) {
fat16_fs_t fs;
fat16_mount(&primary_drive, &fs);
char buffer[64];
int bytes = fat16_read_file(&fs, "HELLO.TXT", buffer, sizeof(buffer));
assert(bytes == 17);
buffer[bytes] = '\0';
assert(strcmp(buffer, "Hello from file!\n") == 0);
serial_print("test_fat16_read_file: PASS\n");
}
void test_fat16_large_file(void) {
fat16_fs_t fs;
fat16_mount(&primary_drive, &fs);
// LARGE.TXT is ~6700 bytes, spans multiple clusters
char buffer[8192];
int bytes = fat16_read_file(&fs, "LARGE.TXT", buffer, sizeof(buffer));
assert(bytes > 2048); // Should be larger than one cluster
// Verify it's valid base64 (no null bytes in middle)
for (int i = 0; i < bytes - 1; i++) {
assert(buffer[i] != '\0');
}
serial_print("test_fat16_large_file: PASS\n");
}
void test_fat16_subdirectory(void) {
fat16_fs_t fs;
fat16_mount(&primary_drive, &fs);
// Find DOCS directory
fat16_dir_entry_t entry;
int result = fat16_find_entry(&fs, "DOCS", &entry);
assert(result == 0);
assert(entry.attributes & ATTR_DIRECTORY);
// List DOCS directory
fat16_dir_entry_t entries[16];
int count = fat16_list_dir(&fs, entry.cluster_low, entries, 16);
// Should have ".", "..", and "README.TXT"
assert(count >= 3);
serial_print("test_fat16_subdirectory: PASS\n");
}
6.3 Integration Tests
void test_shell_commands(void) {
// Test ls command
shell_execute("ls");
// Manual verification: check serial output
// Test cat command
shell_execute("cat HELLO.TXT");
// Should print: Hello from file!
// Test cd and pwd
shell_execute("cd DOCS");
shell_execute("pwd");
// Should print: /DOCS
// Test relative paths
shell_execute("cat README.TXT");
// Should print: Document content
// Test parent directory
shell_execute("cd ..");
shell_execute("pwd");
// Should print: /
serial_print("test_shell_commands: Manual verification required\n");
}
6.4 Edge Case Tests
void test_edge_cases(void) {
fat16_fs_t fs;
fat16_mount(&primary_drive, &fs);
// Test empty file
char buffer[64];
int bytes = fat16_read_file(&fs, "EMPTY.TXT", buffer, sizeof(buffer));
assert(bytes == 0);
// Test non-existent file
bytes = fat16_read_file(&fs, "NOEXIST.TXT", buffer, sizeof(buffer));
assert(bytes == -1); // Error return
// Test empty directory
fat16_dir_entry_t entries[16];
fat16_dir_entry_t dir_entry;
fat16_find_entry(&fs, "EMPTY", &dir_entry);
int count = fat16_list_dir(&fs, dir_entry.cluster_low, entries, 16);
assert(count == 2); // Only "." and ".."
serial_print("test_edge_cases: PASS\n");
}
7. Common Pitfalls and Debugging
7.1 Common Pitfalls
| Symptom | Likely Cause | Fix |
|---|---|---|
| Garbage when reading sectors | Wrong endianness | x86 is little-endian, FAT is little-endian (should match) |
| First cluster data wrong | Cluster numbering | First data cluster is 2, not 0 |
| Can’t find files | Case sensitivity | Convert filenames to uppercase for comparison |
| Partial file contents | Wrong cluster chain | Verify FAT table reading and interpretation |
| Subdirectory empty | Reading root instead | Root uses sectors, subdirs use clusters |
| Hang on disk read | Not waiting for BSY | Poll status register before and after commands |
| Wrong file size | Attribute confusion | Check attribute byte, don’t confuse with size |
| Truncated long files | Single cluster read | Must follow chain for multi-cluster files |
7.2 Debugging Techniques
Hex Dump Sector:
void dump_sector(uint8_t *buffer) {
for (int row = 0; row < 32; row++) {
serial_printf("%04x: ", row * 16);
for (int col = 0; col < 16; col++) {
serial_printf("%02x ", buffer[row * 16 + col]);
}
serial_printf(" ");
for (int col = 0; col < 16; col++) {
char c = buffer[row * 16 + col];
serial_printf("%c", (c >= 32 && c < 127) ? c : '.');
}
serial_printf("\n");
}
}
Verify Disk Image with Linux:
# Mount the disk image to verify contents
sudo mount -o loop,ro disk.img /mnt
ls -la /mnt
cat /mnt/HELLO.TXT
sudo umount /mnt
# Or use mtools
mdir -i disk.img ::
mtype -i disk.img ::HELLO.TXT
# Hex dump boot sector
xxd disk.img | head -32
# Check FAT entries
# FAT starts at sector 1 (reserved_sectors = 1)
# Each FAT16 entry is 2 bytes
dd if=disk.img bs=512 skip=1 count=1 | xxd | head
GDB with QEMU:
# Terminal 1:
qemu-system-i386 -kernel kernel.elf -hda disk.img -s -S -serial stdio
# Terminal 2:
gdb kernel.elf
(gdb) target remote localhost:1234
(gdb) break fat16_read_file
(gdb) continue
# When breakpoint hits:
(gdb) print *fs
(gdb) print filename
(gdb) x/32xb buffer # Examine buffer contents
7.3 The Classic Bugs
Bug 1: Off-by-two cluster calculation
// WRONG - forgetting that clusters start at 2
uint32_t lba = fs->data_start_sector + cluster * fs->sectors_per_cluster;
// RIGHT
uint32_t lba = fs->data_start_sector + (cluster - 2) * fs->sectors_per_cluster;
Bug 2: Reading root directory as clusters
// WRONG - treating root directory like a subdirectory
uint32_t lba = cluster_to_lba(fs, 0);
// RIGHT - root directory has special location
if (cluster == 0) {
// Root directory: use root_dir_start_sector
lba = fs->root_dir_start_sector;
} else {
// Subdirectory: use cluster chain
lba = cluster_to_lba(fs, cluster);
}
Bug 3: Case-sensitive comparison
// WRONG - FAT stores uppercase
if (strcmp(entry->name, "hello ") == 0)
// RIGHT - compare uppercase
char name_upper[12];
for (int i = 0; i < 11; i++) {
name_upper[i] = toupper(entry->name[i]);
}
if (memcmp(name_upper, "HELLO TXT", 11) == 0)
Bug 4: Not handling packed structures
// WRONG - might have padding on some compilers
struct fat16_bpb {
uint8_t jmp[3];
char oem[8];
uint16_t bytes_per_sector; // Might be at wrong offset!
};
// RIGHT - force packed
struct fat16_bpb {
uint8_t jmp[3];
char oem[8];
uint16_t bytes_per_sector;
} __attribute__((packed));
Bug 5: Not waiting for drive ready
// WRONG - sending commands immediately
outb(0x1F7, 0x20); // Might fail if drive busy
// RIGHT - wait for BSY clear first
while (inb(0x1F7) & 0x80); // Wait for BSY = 0
outb(0x1F7, 0x20);
8. Extensions and Challenges
8.1 Beginner Extensions
- File search: Implement recursive search for files by name
- File size formatting: Show sizes in KB/MB when appropriate
- Date formatting: Parse and display human-readable dates
- Tab completion: Basic filename completion for shell commands
- Command history: Up/down arrows to recall previous commands
8.2 Intermediate Extensions
- Long File Name (LFN) support: Read and display long filenames
- Write support: Implement file creation and writing
- Directory creation: Implement mkdir command
- File copying: Implement cp command (requires write)
- Multiple drives: Support secondary ATA drive
8.3 Advanced Extensions
- FAT32 support: Extend to handle 32-bit cluster numbers
- Caching layer: Add sector cache for better performance
- VFS layer: Abstract file system interface
- Partition table parsing: Read MBR and support partitions
- DMA transfers: Use ATA DMA mode for better performance
8.4 Expert Challenges
- File deletion and defragmentation: Implement delete and defrag
- fsck equivalent: Implement file system consistency checking
- Journaling wrapper: Add simple journaling for crash recovery
- Network file access: Load files from TFTP or HTTP
- Port to ext2: Implement a more complex file system
9. Real-World Connections
9.1 How Linux Does It
Linux VFS Layer: Linux has a Virtual File System layer that abstracts different file systems:
User Space: open("/mnt/usb/file.txt", O_RDONLY)
│
▼
Kernel VFS: struct file_operations { .read, .write, ... }
│
▼
FAT Driver: fat_file_read() → fat_get_cluster() → sb_bread()
│
▼
Block Layer: submit_bio() → request queue
│
▼
ATA Driver: ata_scsi_queuecmd() → ata_pio_task()
Linux FAT Implementation:
- Located in
fs/fat/directory inode.c- inode operationsdir.c- directory operationsfile.c- file operationscache.c- FAT table cachingfatent.c- FAT entry manipulation
9.2 Modern Storage Stack
Your simple ATA PIO driver is the ancestor of modern storage:
Your Implementation: Modern Linux:
───────────────── ─────────────
┌─────────────────┐
│ io_uring / AIO │ Async interface
└────────┬────────┘
│
FAT16 Driver ───► ┌────────▼────────┐
│ VFS Layer │ Abstraction
└────────┬────────┘
│
┌────────▼────────┐
│ Block Layer │ Scheduling, merging
└────────┬────────┘
│
ATA PIO Driver ───► ┌────────▼────────┐
│ libata │ AHCI, NVMe
└────────┬────────┘
│
Direct Port I/O ───► ┌────────▼────────┐
│ DMA / IOMMU │ Hardware acceleration
└─────────────────┘
9.3 Career Applications
Embedded Systems Developer:
- Many embedded devices use FAT for external storage
- SD cards in cameras, IoT devices, automotive
- Understanding FAT is directly applicable
Operating Systems Developer:
- File system development is core OS work
- Companies: Microsoft, Apple, Google, Red Hat
- Foundation for more complex file systems
Storage Systems Engineer:
- Understanding from raw blocks to files
- Companies: NetApp, Pure Storage, Dell EMC
- Performance optimization, reliability
Forensics and Data Recovery:
- Understanding FAT structure enables recovery
- Deleted file recovery
- Disk image analysis
10. Resources
10.1 Essential Reading
| Resource | URL/Location | Usage |
|---|---|---|
| Microsoft FAT Spec | Microsoft Hardware Dev | Definitive reference |
| OSDev FAT | wiki.osdev.org/FAT | Implementation guide |
| OSDev ATA PIO | wiki.osdev.org/ATA_PIO_Mode | ATA driver reference |
| OSTEP File Systems | ostep.org, Ch. 39-42 | Conceptual foundation |
10.2 Reference Implementations
| Project | URL | Notes |
|---|---|---|
| FatFs | elm-chan.org/fsw/ff/ | Embedded FAT library |
| Linux FAT | github.com/torvalds/linux/tree/master/fs/fat | Production implementation |
| SerenityOS FAT | serenityos.org | Clean C++ implementation |
| xv6 file system | github.com/mit-pdos/xv6-public | Unix V6 style (simpler) |
10.3 Specifications
| Document | Description |
|---|---|
| Microsoft FAT Specification | Official FAT12/16/32 spec |
| ATA/ATAPI-7 | T13 standard for ATA interface |
| ECMA-107 | FAT file system standard |
10.4 Tools
| Tool | Usage |
|---|---|
| mtools | Manipulate FAT images without mounting |
| mkfs.fat | Create FAT file systems |
| dosfsck | Check/repair FAT file systems |
| xxd | Hex dump for debugging |
| hd / hexdump | Alternative hex dump |
11. Self-Assessment Checklist
Before Starting
- I have completed Project 6 (Protected Mode Kernel)
- I understand x86 port I/O (inb/outb/inw/outw)
- I can read and understand hardware specifications
- I understand binary/hexadecimal and byte ordering
- I’m comfortable with C structs and pointers
After Phase 2 (FAT16 Mounting)
- I can explain the FAT16 disk layout
- I understand the difference between sectors and clusters
- My driver correctly parses the BPB
- I can calculate FAT start, root start, and data start sectors
- I understand why FAT is duplicated
After Phase 4 (File Reading)
- I can explain how cluster chains work
- My driver reads files spanning multiple clusters
- I understand the difference between root directory and subdirectories
- I can trace a file from directory entry to disk sectors
- I handle edge cases (empty files, exact cluster-size files)
After Completion
- All test cases pass
- I can navigate directories with cd command
- I can display files with cat command
- I understand how to extend to FAT32
- I can answer all interview questions in section 5.8
- I could port this to a different architecture
12. Submission / Completion Criteria
Your project is complete when:
- Functionality:
- ATA driver can read any sector from disk
- FAT16 boot sector is correctly parsed
- Root directory listing works
- File reading works (including multi-cluster files)
- Subdirectory navigation works
- Shell provides ls, cat, cd, pwd commands
- Code Quality:
- Clean separation: ata.c, fat16.c, shell.c
- Well-documented structures and algorithms
- No warnings with -Wall -Wextra
- Handles errors gracefully
- Testing:
- Works with standard FAT16 images from mkfs.fat
- Handles edge cases (empty files, large files)
- Verified against mtools output
- Understanding:
- Can explain FAT16 structure
- Can explain cluster chain following
- Can add new features (LFN, write support)
- Understands real-world applications
Summary
This project bridges the gap between raw disk hardware and the organized files users expect. You’ll learn:
- Block Device I/O – How to talk to storage hardware
- On-Disk Data Structures – How file systems organize data
- Cluster Chains – How files are stored and retrieved
- Directory Hierarchy – How paths map to disk locations
These concepts are fundamental to all operating systems. Whether you work on Linux, Windows, embedded systems, or cloud storage, understanding file systems from the ground up makes you a more effective systems programmer.
Estimated time: 3-4 weeks for complete implementation Difficulty: Expert (requires understanding of Projects 6-8) Reward: Your OS can now load files!
Previous: P10-kernel-multitasking.md - Preemptive multitasking kernel Next: P12-uefi-application.md - UEFI application development