Project 15: Robust Unix I/O Toolkit

Build a “Unix file toolbox” that copies, tees, and transforms streams while producing clear evidence of buffering behavior and syscall counts. Master the art of handling partial operations, understanding buffering trade-offs, and using memory-mapped files correctly.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	1-2 weeks
Language	C (Alternatives: Rust, Zig, C++)
Prerequisites	Basic C, familiarity with file operations
Key Topics	Unix I/O, file descriptors, buffering, mmap, robust I/O patterns
CS:APP Chapters	9, 10

Learning Objectives
Theoretical Foundation
Project Specification
Solution Architecture
Implementation Guide
Testing Strategy
Common Pitfalls
Extensions
Real-World Connections
Resources
Self-Assessment Checklist

1. Learning Objectives

By completing this project, you will:

Internalize the Unix I/O model: Understand that “everything is a file” and what that means for your programs
Master file descriptors: Know how file descriptor tables work and how they’re inherited/shared
Handle partial I/O correctly: Never assume read() or write() transfers all bytes requested
Implement robust I/O wrappers: Build the Rio (Robust I/O) package from CS:APP
Understand buffering trade-offs: Explain when to use buffered vs unbuffered I/O with evidence
Use mmap correctly: Treat memory-mapped files as “VM + file backing,” not magic
Debug I/O problems: Diagnose hangs, truncation, and performance issues systematically

2. Theoretical Foundation

2.1 The Unix I/O Model: Everything Is a File

The elegance of Unix lies in a simple abstraction: everything is a file. This means the same operations work on:

+------------------+------------------------------------------+
|    "File" Type   |              Examples                    |
+------------------+------------------------------------------+
| Regular files    | Documents, binaries, data                |
| Directories      | Folders (though read-only via read())    |
| Devices          | /dev/null, /dev/urandom, /dev/sda        |
| Pipes            | Created by pipe() syscall                |
| Sockets          | Network connections                      |
| Terminals        | /dev/tty, stdin when interactive         |
+------------------+------------------------------------------+

This unification enables powerful composition:

# The same data flows through files, pipes, and network
cat /etc/passwd | grep root | nc server 8080

2.2 File Descriptors: The Handle to Everything

A file descriptor is a small non-negative integer that identifies an open file. When you open a file, the kernel returns a file descriptor that you use for all subsequent operations.

+-----------------------------------------------------------------------+
|                    FILE DESCRIPTOR ARCHITECTURE                        |
+-----------------------------------------------------------------------+
|                                                                       |
|   Process                           Kernel                            |
|   +------------------+              +---------------------------+     |
|   | Descriptor Table |              | Open File Table           |     |
|   | (per process)    |              | (system-wide)             |     |
|   +------------------+              +---------------------------+     |
|   |  0  | ---------->|------------->| File pos: 0               |     |
|   +-----+            |              | Flags: O_RDONLY           |--+  |
|   |  1  | --------+  |              | refcount: 1               |  |  |
|   +-----+         |  |              +---------------------------+  |  |
|   |  2  | -----+  |  |              | File pos: 1024            |  |  |
|   +-----+      |  +--|------------->| Flags: O_WRONLY           |  |  |
|   |  3  | --+  |     |              | refcount: 2               |  |  |
|   +-----+   |  |     |              +---------------------------+  |  |
|   | ... |   |  +-----|------------->| File pos: 0               |  |  |
|   +-----+   |        |              | Flags: O_RDWR             |  |  |
|             |        |              | refcount: 1               |  |  |
|             |        |              +---------------------------+  |  |
|             |        |                           |                 |  |
|             |        |                           v                 |  |
|             |        |              +---------------------------+  |  |
|             |        |              | v-node Table              |  |  |
|             |        |              | (per filesystem object)   |<-+  |
|             |        |              +---------------------------+     |
|             |        |              | inode info                |     |
|             |        |              | file size, permissions    |     |
|             |        |              | device, block pointers    |     |
|             |        |              +---------------------------+     |
|             |        |                                                |
+-----------------------------------------------------------------------+
|                                                                       |
|   Key Insights:                                                       |
|   - Multiple descriptors can point to the same open file entry       |
|   - Each open file entry has its OWN file position                   |
|   - dup() creates new descriptor pointing to SAME open file entry    |
|   - open() always creates NEW open file entry                        |
|   - fork() duplicates descriptor table, SHARES open file entries     |
|                                                                       |
+-----------------------------------------------------------------------+

Standard Descriptors:

Descriptor	Name	Constant	Purpose
0	stdin	STDIN_FILENO	Standard input
1	stdout	STDOUT_FILENO	Standard output
2	stderr	STDERR_FILENO	Standard error

Critical Understanding: Descriptors 0, 1, 2 are just conventions. The shell sets them up before your program runs. You can redirect them:

// Redirect stdout to a file
int fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
dup2(fd, STDOUT_FILENO);  // Now printf() writes to output.txt
close(fd);

2.3 Core System Calls: open(), close(), read(), write()

open() - Opening Files

#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

Flags (combine with |):

Flag	Purpose
`O_RDONLY`	Open for reading only
`O_WRONLY`	Open for writing only
`O_RDWR`	Open for reading and writing
`O_CREAT`	Create file if it doesn’t exist
`O_TRUNC`	Truncate existing file to zero length
`O_APPEND`	Append to end of file
`O_EXCL`	With O_CREAT, fail if file exists

Return value: File descriptor on success, -1 on error (check errno).

close() - Closing Files

int close(int fd);

Important: Always close files when done. Failing to close:

Leaks file descriptors (limited resource, usually 1024 default)
May lose buffered data
Holds locks longer than necessary

read() - Reading Data

ssize_t read(int fd, void *buf, size_t count);

Returns:

Number of bytes actually read (may be LESS than count!)
0 on EOF
-1 on error

CRITICAL: read() is NOT guaranteed to read count bytes!

+-----------------------------------------------------------------------+
|                    READ() RETURN VALUE SEMANTICS                       |
+-----------------------------------------------------------------------+
|                                                                       |
|   Requested: read(fd, buf, 1000)                                     |
|                                                                       |
|   Possible Returns:                                                   |
|   +------------------------------------------------------------------+
|   | Return | Meaning                        | Action Required        |
|   +--------+--------------------------------+------------------------+
|   | 1000   | Got all requested bytes        | Done (rare for large)  |
|   +--------+--------------------------------+------------------------+
|   | 500    | Got partial data (SHORT COUNT) | Loop and read more!   |
|   +--------+--------------------------------+------------------------+
|   | 0      | End of file reached            | Done, no more data     |
|   +--------+--------------------------------+------------------------+
|   | -1     | Error occurred                 | Check errno            |
|   +--------+--------------------------------+------------------------+
|                                                                       |
|   Why Short Counts Happen:                                           |
|   - Reading from pipe/socket: data arrives incrementally             |
|   - Reading from terminal: returns after newline or buffer full      |
|   - Interrupted by signal: may return partial data                   |
|   - Near EOF: only remaining bytes returned                          |
|   - Kernel buffer boundaries                                          |
|                                                                       |
+-----------------------------------------------------------------------+

write() - Writing Data

ssize_t write(int fd, const void *buf, size_t count);

Returns:

Number of bytes actually written (may be LESS than count!)
-1 on error

CRITICAL: write() is also NOT guaranteed to write all bytes!

Short writes happen when:

Writing to a pipe and it fills up
Writing to a socket with limited send buffer
Interrupted by a signal
Disk full condition

2.4 Short Counts: The Core Challenge

The single most important lesson of Chapter 10:

Never assume a single read() or write() transfers all requested bytes.

The Wrong Way (DO NOT DO THIS):

// BROKEN: Assumes read() returns exactly n bytes
char buf[BUFSIZE];
read(fd, buf, BUFSIZE);  // May read less!
process(buf, BUFSIZE);   // Oops, processing uninitialized data

The Right Way (Robust I/O loop):

// Read exactly n bytes, or return error/EOF
ssize_t readn(int fd, void *buf, size_t n) {
    size_t nleft = n;
    ssize_t nread;
    char *ptr = buf;

    while (nleft > 0) {
        if ((nread = read(fd, ptr, nleft)) < 0) {
            if (errno == EINTR)  // Interrupted by signal
                nread = 0;       // Call read() again
            else
                return -1;       // Error
        } else if (nread == 0) {
            break;               // EOF
        }
        nleft -= nread;
        ptr += nread;
    }
    return n - nleft;  // Return bytes actually read
}

2.5 The RIO (Robust I/O) Package

CS:APP introduces the Rio package to handle robust I/O. It provides:

Unbuffered transfer functions: rio_readn, rio_writen
Buffered input functions: rio_readlineb, rio_readnb

+-----------------------------------------------------------------------+
|                         RIO PACKAGE OVERVIEW                           |
+-----------------------------------------------------------------------+
|                                                                       |
|   UNBUFFERED (for binary data):                                       |
|   +-------------------------------------------------------------------+
|   | rio_readn(fd, buf, n)  | Read exactly n bytes (loop on short)    |
|   | rio_writen(fd, buf, n) | Write exactly n bytes (loop on short)   |
|   +-------------------------------------------------------------------+
|                                                                       |
|   BUFFERED (for text lines):                                          |
|   +-------------------------------------------------------------------+
|   | rio_readinitb(rp, fd)    | Associate buffer with descriptor      |
|   | rio_readlineb(rp, buf, n)| Read line up to n bytes (buffered)    |
|   | rio_readnb(rp, buf, n)   | Read n bytes (buffered)               |
|   +-------------------------------------------------------------------+
|                                                                       |
|   Internal Buffer Structure:                                          |
|   +-------------------------------------------------------------------+
|   |  rio_t                                                            |
|   |  +-----------------------------+                                  |
|   |  | rio_fd       | descriptor   |                                  |
|   |  | rio_cnt      | unread bytes |                                  |
|   |  | rio_bufptr   | next byte    |                                  |
|   |  | rio_buf[8192]| internal buf |                                  |
|   |  +-----------------------------+                                  |
|   |                                                                   |
|   |  When rio_readlineb() is called:                                  |
|   |  1. If buffer has data, scan for newline                          |
|   |  2. If buffer empty, refill with read(fd, buf, 8192)              |
|   |  3. Return line without extra syscalls                            |
|   +-------------------------------------------------------------------+
|                                                                       |
+-----------------------------------------------------------------------+

Why buffering matters for lines:

Without buffering, reading one character at a time to find newlines requires one syscall per byte. With buffering:

Approach	Syscalls for 1000-char line
getchar() loop	1000 (one per char)
Rio buffered	~1 (reads 8192 bytes at once)

2.6 Buffered I/O: stdio vs System Calls

Standard I/O (stdio): fopen, fread, fwrite, printf, scanf

System Calls: open, read, write

+-----------------------------------------------------------------------+
|                    STDIO vs SYSTEM CALLS                               |
+-----------------------------------------------------------------------+
|                                                                       |
|   Application                                                         |
|       |                                                               |
|       v                                                               |
|   +-------------------+                                               |
|   |   stdio library   |  <-- User-space buffering                    |
|   |   fread/fwrite    |      (usually 8KB buffer)                    |
|   +-------------------+                                               |
|       |                                                               |
|       v                                                               |
|   +-------------------+                                               |
|   |   read()/write()  |  <-- System call boundary                    |
|   +-------------------+                                               |
|       |                                                               |
|       v                                                               |
|   +-------------------+                                               |
|   |   Kernel buffers  |  <-- Kernel-space buffering                  |
|   |   Page cache      |                                              |
|   +-------------------+                                               |
|       |                                                               |
|       v                                                               |
|   +-------------------+                                               |
|   |   Disk/Device     |                                              |
|   +-------------------+                                               |
|                                                                       |
+-----------------------------------------------------------------------+
|                                                                       |
|   Buffering Modes (setvbuf):                                         |
|   +-------------------+----------------------------------------+      |
|   | _IOFBF (full)     | Flush when buffer full (default files) |      |
|   | _IOLBF (line)     | Flush on newline (default tty stdout)  |      |
|   | _IONBF (none)     | No buffering (default stderr)          |      |
|   +-------------------+----------------------------------------+      |
|                                                                       |
+-----------------------------------------------------------------------+

When to use which:

Use Case	Recommended	Why
Text processing	stdio	Line-oriented, convenient
Binary data	read/write	No format interpretation
Network I/O	read/write	Need precise control
Mixed text/binary	Rio	Best of both worlds
Performance critical	mmap or direct I/O	Minimize copies

Danger: Mixing stdio and system calls:

// DANGEROUS: Don't do this!
FILE *f = fopen("data.txt", "r");
int fd = fileno(f);

fread(buf1, 1, 100, f);   // Reads into stdio buffer (maybe 8KB)
read(fd, buf2, 100);      // Reads PAST what fread buffered!

// You've skipped data because stdio buffered ahead

2.7 File Metadata: stat() and Friends

Files have metadata beyond their contents:

#include <sys/stat.h>

int stat(const char *pathname, struct stat *statbuf);
int fstat(int fd, struct stat *statbuf);
int lstat(const char *pathname, struct stat *statbuf);  // Don't follow symlinks

struct stat {
    dev_t     st_dev;      // Device containing file
    ino_t     st_ino;      // Inode number
    mode_t    st_mode;     // File type and permissions
    nlink_t   st_nlink;    // Number of hard links
    uid_t     st_uid;      // User ID of owner
    gid_t     st_gid;      // Group ID of owner
    dev_t     st_rdev;     // Device ID (if special file)
    off_t     st_size;     // Total size in bytes
    blksize_t st_blksize;  // Block size for I/O
    blkcnt_t  st_blocks;   // Number of 512B blocks
    time_t    st_atime;    // Last access time
    time_t    st_mtime;    // Last modification time
    time_t    st_ctime;    // Last status change time
};

File type macros (for st_mode):

S_ISREG(m)   // Is regular file?
S_ISDIR(m)   // Is directory?
S_ISCHR(m)   // Is character device?
S_ISBLK(m)   // Is block device?
S_ISFIFO(m)  // Is FIFO (named pipe)?
S_ISLNK(m)   // Is symbolic link?
S_ISSOCK(m)  // Is socket?

2.8 Memory-Mapped I/O: mmap()

mmap() maps a file directly into virtual memory, allowing you to access file contents as if they were arrays in memory.

#include <sys/mman.h>

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
int munmap(void *addr, size_t length);

+-----------------------------------------------------------------------+
|                    MEMORY-MAPPED FILE ARCHITECTURE                     |
+-----------------------------------------------------------------------+
|                                                                       |
|   Virtual Address Space                    Physical Memory / Disk     |
|   +---------------------------+            +----------------------+   |
|   |         Stack             |            |                      |   |
|   +---------------------------+            |      Disk File       |   |
|   |         Heap              |            |    +-----------+     |   |
|   +---------------------------+            |    | Block 0   |     |   |
|   |    Memory-Mapped Region   |<---------->|    | Block 1   |     |   |
|   |    (file contents here)   |    PAGE    |    | Block 2   |     |   |
|   |                           |    FAULT   |    | Block 3   |     |   |
|   +---------------------------+            |    +-----------+     |   |
|   |         Data              |            |                      |   |
|   +---------------------------+            +----------------------+   |
|   |         Text              |                                       |
|   +---------------------------+                                       |
|                                                                       |
|   How it Works:                                                       |
|   1. mmap() reserves virtual address range                            |
|   2. No data loaded yet (lazy loading)                                |
|   3. First access triggers page fault                                 |
|   4. Kernel loads page from file into memory                          |
|   5. Page table updated, access continues                             |
|   6. Modifications may be written back (MAP_SHARED)                   |
|                                                                       |
+-----------------------------------------------------------------------+

Protection flags (prot):

Flag	Meaning
`PROT_READ`	Pages can be read
`PROT_WRITE`	Pages can be written
`PROT_EXEC`	Pages can be executed
`PROT_NONE`	Pages cannot be accessed

Mapping flags (flags):

Flag	Meaning
`MAP_SHARED`	Changes visible to other processes, written to file
`MAP_PRIVATE`	Copy-on-write private mapping
`MAP_ANONYMOUS`	No file backing (memory allocation)
`MAP_FIXED`	Map at exact address (dangerous)

Example: Reading a file with mmap:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

void process_file_with_mmap(const char *filename) {
    int fd = open(filename, O_RDONLY);
    if (fd < 0) { perror("open"); return; }

    struct stat sb;
    if (fstat(fd, &sb) < 0) { perror("fstat"); close(fd); return; }

    void *addr = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); close(fd); return; }

    // Now access file as memory!
    char *data = (char *)addr;
    for (size_t i = 0; i < sb.st_size; i++) {
        // Process data[i] - no read() calls needed!
    }

    munmap(addr, sb.st_size);
    close(fd);
}

When to use mmap:

Use Case	mmap Appropriate?
Random access to large file	Yes - avoids seeks
Read file once sequentially	Maybe - read() is simpler
Multiple processes sharing	Yes - MAP_SHARED
File size changes during access	No - mmap size is fixed
Pipes/sockets	No - not seekable

2.9 Pipes and FIFOs

Pipes are unidirectional byte streams between processes.

int pipe(int pipefd[2]);
// pipefd[0] is the read end
// pipefd[1] is the write end

+-----------------------------------------------------------------------+
|                         PIPE ARCHITECTURE                              |
+-----------------------------------------------------------------------+
|                                                                       |
|   Process A                 Kernel Buffer              Process B      |
|   (writer)                  (typically 64KB)           (reader)       |
|                                                                       |
|   +----------+              +----------------+         +----------+   |
|   |          |  write(fd)   |                |  read() |          |   |
|   |   fd[1]  | -----------> | =============> | ------> |   fd[0]  |   |
|   |          |              |     FIFO       |         |          |   |
|   +----------+              +----------------+         +----------+   |
|                                                                       |
|   Properties:                                                         |
|   - Unidirectional (use two pipes for bidirectional)                 |
|   - Data is FIFO ordered                                             |
|   - write() blocks when pipe is full                                 |
|   - read() blocks when pipe is empty                                 |
|   - read() returns 0 when all write ends are closed (EOF)            |
|   - write() to pipe with no readers: SIGPIPE                         |
|                                                                       |
+-----------------------------------------------------------------------+

Classic pattern: parent-child communication:

int pipefd[2];
pipe(pipefd);

pid_t pid = fork();
if (pid == 0) {
    // Child: read from pipe
    close(pipefd[1]);  // Close write end
    char buf[256];
    ssize_t n = read(pipefd[0], buf, sizeof(buf));
    // Process buf...
    close(pipefd[0]);
} else {
    // Parent: write to pipe
    close(pipefd[0]);  // Close read end
    write(pipefd[1], "Hello, child!", 13);
    close(pipefd[1]);
    wait(NULL);
}

FIFOs (Named Pipes): Like pipes, but have names in the filesystem.

mkfifo("/tmp/myfifo", 0644);  // Create named pipe

// Process A:
int fd = open("/tmp/myfifo", O_WRONLY);
write(fd, data, len);

// Process B (can be unrelated):
int fd = open("/tmp/myfifo", O_RDONLY);
read(fd, buf, sizeof(buf));

2.10 dup() and dup2(): Descriptor Manipulation

int dup(int oldfd);           // Duplicate to lowest available fd
int dup2(int oldfd, int newfd);  // Duplicate to specific fd

Key insight: After dup2(fd, newfd), both descriptors point to the same open file entry, sharing position and flags.

I/O Redirection implementation:

// Redirect stdout to a file (what shell does for >)
int fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
dup2(fd, STDOUT_FILENO);  // stdout now writes to output.txt
close(fd);                 // Don't need the extra descriptor
printf("This goes to output.txt\n");

// Redirect stdin from a file (what shell does for <)
int fd = open("input.txt", O_RDONLY);
dup2(fd, STDIN_FILENO);   // stdin now reads from input.txt
close(fd);
// getchar() now reads from input.txt

Pipe connection for shell pipeline:

// Implement: cmd1 | cmd2
int pipefd[2];
pipe(pipefd);

if (fork() == 0) {
    // Child 1: cmd1 (stdout -> pipe)
    close(pipefd[0]);
    dup2(pipefd[1], STDOUT_FILENO);
    close(pipefd[1]);
    exec(cmd1);
}

if (fork() == 0) {
    // Child 2: cmd2 (stdin <- pipe)
    close(pipefd[1]);
    dup2(pipefd[0], STDIN_FILENO);
    close(pipefd[0]);
    exec(cmd2);
}

// Parent: close both ends and wait
close(pipefd[0]);
close(pipefd[1]);
wait(NULL);
wait(NULL);

2.11 I/O Multiplexing: select/poll/epoll Overview

When you need to handle multiple I/O sources (multiple clients, stdin + network, etc.):

+-----------------------------------------------------------------------+
|                    I/O MULTIPLEXING OPTIONS                            |
+-----------------------------------------------------------------------+
|                                                                       |
|   Problem: Server needs to handle multiple clients                    |
|                                                                       |
|   Solutions:                                                          |
|   +---------------+------------------------------------------------+  |
|   | Approach      | Description                                     |  |
|   +---------------+------------------------------------------------+  |
|   | Blocking I/O  | One thread per client (expensive)              |  |
|   | select()      | Wait on multiple fds (O(n) per call)           |  |
|   | poll()        | Like select, no fd limit (still O(n))          |  |
|   | epoll (Linux) | Wait on multiple fds (O(1) per event)          |  |
|   | kqueue (BSD)  | BSD equivalent of epoll                        |  |
|   +---------------+------------------------------------------------+  |
|                                                                       |
|   select() example:                                                   |
|   +---------------------------------------------------------------+   |
|   | fd_set readfds;                                                |   |
|   | FD_ZERO(&readfds);                                             |   |
|   | FD_SET(stdin_fd, &readfds);                                    |   |
|   | FD_SET(socket_fd, &readfds);                                   |   |
|   |                                                                |   |
|   | int n = select(max_fd + 1, &readfds, NULL, NULL, &timeout);    |   |
|   |                                                                |   |
|   | if (FD_ISSET(stdin_fd, &readfds))                              |   |
|   |     handle_stdin();                                            |   |
|   | if (FD_ISSET(socket_fd, &readfds))                             |   |
|   |     handle_client();                                           |   |
|   +---------------------------------------------------------------+   |
|                                                                       |
+-----------------------------------------------------------------------+

2.12 Common Misconceptions

Misconception 1: “read() always returns the requested number of bytes”

Reality: read() can return fewer bytes (short count) for many reasons. Always loop.

Misconception 2: “Closing a file ensures data is written to disk”

Reality: close() ensures data reaches the kernel buffer, not necessarily the disk. Use fsync() for durability.

Misconception 3: “mmap is always faster than read()”

Reality: mmap has overhead for page faults and TLB management. For sequential single-pass reads, read() with large buffers may be faster.

Misconception 4: “stdio buffering is always helpful”

Reality: For network I/O, stdio buffering can cause latency issues. Interactive protocols often need unbuffered or line-buffered I/O.

Misconception 5: “Pipes have infinite capacity”

Reality: Pipes have a finite buffer (typically 64KB on Linux). Writing to a full pipe blocks.

3. Project Specification

3.1 What You Will Build

A Unix I/O toolkit with several components:

rio-cat: Robust file copy that handles short counts
rio-tee: Split input to multiple outputs (like tee)
rio-transform: Apply transformations to streams
rio-stat: Display syscall counts and buffering behavior
mmap-search: Memory-mapped file search

3.2 Functional Requirements

3.2.1 rio-cat: Robust Copy

# Copy file to stdout
./rio-cat input.txt

# Copy stdin to stdout
cat large_file | ./rio-cat

# Copy with syscall tracing
./rio-cat --trace input.txt

Must handle:

Regular files of any size
Pipes (stdin from pipe)
Slow sources (demonstrate short counts)
EOF detection

3.2.2 rio-tee: Stream Splitter

# Copy stdin to stdout AND file
cat data | ./rio-tee output.txt

# Tee to multiple files
cat data | ./rio-tee -a file1.txt file2.txt file3.txt

Must handle:

Multiple output files
Append mode (-a)
Partial writes to any destination

3.2.3 rio-transform: Stream Transformer

# Convert to uppercase
./rio-transform --upper < input.txt

# ROT13 encoding
./rio-transform --rot13 < input.txt

# Hex dump
./rio-transform --hex < binary_file

3.2.4 rio-stat: I/O Statistics

# Show buffering behavior
./rio-stat --buffer-compare input.txt

# Count syscalls
./rio-stat --syscall-count input.txt

# Compare read sizes
./rio-stat --read-sizes 1 64 4096 < input.txt

Output should demonstrate:

Number of read/write syscalls
Average bytes per syscall
Time taken
Performance comparison

3.2.5 mmap-search: Memory-Mapped Search

# Search file using mmap
./mmap-search pattern large_file.txt

# Compare with read-based search
./mmap-search --compare pattern large_file.txt

3.3 Non-Functional Requirements

Robustness: Never hang, never silently truncate
Correctness: Output matches input exactly (for copy operations)
Observability: Trace mode shows what’s happening
Performance: Demonstrate buffering impact with evidence

3.4 Example Session

$ echo "Hello, World!" | ./rio-cat --trace
[TRACE] read(0, buf, 8192) = 14
[TRACE] write(1, buf, 14) = 14
[TRACE] read(0, buf, 8192) = 0 (EOF)
Hello, World!
Total: 2 reads, 1 write

$ ./rio-stat --buffer-compare /usr/share/dict/words
Reading 102401 bytes with different buffer sizes:

Buffer Size  Reads  Writes  Time(ms)
-----------  -----  ------  --------
         1   102401    1     1523.45
        64    1601     1       34.21
       512     201     1       12.87
      4096      26     1        8.43
      8192      13     1        7.91
     65536       2     1        7.12

Conclusion: Larger buffers = fewer syscalls = faster

$ ./mmap-search --compare "hello" large_log.txt
Read-based search:
  Time: 234.5 ms
  Syscalls: 1250

Mmap-based search:
  Time: 89.2 ms
  Syscalls: 1 (mmap) + page faults handled by kernel

Pattern found at offsets: 1234, 5678, 91011

4. Solution Architecture

4.1 High-Level Design

+-----------------------------------------------------------------------+
|                         UNIX I/O TOOLKIT                               |
+-----------------------------------------------------------------------+
|                                                                       |
|   +------------------+     +------------------+     +--------------+  |
|   |    rio-cat       |     |    rio-tee       |     | rio-transform|  |
|   +------------------+     +------------------+     +--------------+  |
|           |                        |                       |          |
|           v                        v                       v          |
|   +------------------------------------------------------------------+|
|   |                        librio (Robust I/O)                        ||
|   |------------------------------------------------------------------|
|   |  rio_readn()   | Read exactly n bytes, handling short counts     ||
|   |  rio_writen()  | Write exactly n bytes, handling short counts    ||
|   |  rio_readlineb()| Read line with buffering                       ||
|   |  rio_readinitb()| Initialize buffered reader                     ||
|   +------------------------------------------------------------------+|
|           |                        |                       |          |
|           v                        v                       v          |
|   +------------------------------------------------------------------+|
|   |                        I/O Tracing Layer                          ||
|   |------------------------------------------------------------------|
|   |  trace_read()  | Log read syscalls                               ||
|   |  trace_write() | Log write syscalls                              ||
|   |  get_stats()   | Return accumulated statistics                   ||
|   +------------------------------------------------------------------+|
|           |                                                           |
|           v                                                           |
|   +------------------------------------------------------------------+|
|   |                     System Calls (read/write/mmap)                ||
|   +------------------------------------------------------------------+|
|                                                                       |
+-----------------------------------------------------------------------+

4.2 Key Data Structures

/* Robust I/O buffer (from CS:APP) */
#define RIO_BUFSIZE 8192

typedef struct {
    int rio_fd;                  /* Descriptor for this buffer */
    ssize_t rio_cnt;             /* Unread bytes in buffer */
    char *rio_bufptr;            /* Next unread byte */
    char rio_buf[RIO_BUFSIZE];   /* Internal buffer */
} rio_t;

/* I/O statistics */
typedef struct {
    size_t read_calls;           /* Number of read() calls */
    size_t write_calls;          /* Number of write() calls */
    size_t bytes_read;           /* Total bytes read */
    size_t bytes_written;        /* Total bytes written */
    size_t short_reads;          /* Reads that returned < requested */
    size_t short_writes;         /* Writes that returned < requested */
    struct timespec start_time;
    struct timespec end_time;
} io_stats_t;

/* Transform function type */
typedef ssize_t (*transform_fn)(const char *in, size_t in_len,
                                char *out, size_t out_size);

4.3 Module Structure

unix-io-toolkit/
+-- src/
|   +-- rio.c           # Robust I/O implementation
|   +-- rio.h
|   +-- trace.c         # Syscall tracing layer
|   +-- trace.h
|   +-- transform.c     # Transformation functions
|   +-- transform.h
|   +-- mmap_utils.c    # mmap helper functions
|   +-- mmap_utils.h
|   +-- rio_cat.c       # Main for rio-cat
|   +-- rio_tee.c       # Main for rio-tee
|   +-- rio_transform.c # Main for rio-transform
|   +-- rio_stat.c      # Main for rio-stat
|   +-- mmap_search.c   # Main for mmap-search
+-- tests/
|   +-- test_rio.c
|   +-- test_short_counts.c
|   +-- test_pipes.c
|   +-- test_large_files.c
|   +-- test_mmap.c
+-- Makefile
+-- README.md

5. Implementation Guide

5.1 Development Environment Setup

# Required tools (Linux)
sudo apt-get install build-essential strace

# macOS (uses dtrace instead of strace)
xcode-select --install

# Create project structure
mkdir -p unix-io-toolkit/{src,tests,artifacts}
cd unix-io-toolkit

5.2 Phase 1: Rio Library Core (Days 1-3)

Goal: Implement the Robust I/O package from CS:APP

Tasks:

Implement rio_readn() - read exactly n bytes
Implement rio_writen() - write exactly n bytes
Implement buffered reader (rio_readlineb, rio_readnb)
Add comprehensive error handling

Implementation:

/* rio.c - Robust I/O functions */
#include "rio.h"
#include <errno.h>
#include <unistd.h>
#include <string.h>

/*
 * rio_readn - Robustly read n bytes (unbuffered)
 * Returns: number of bytes read (can be less than n only on EOF)
 *          -1 on error
 */
ssize_t rio_readn(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    ssize_t nread;
    char *bufp = usrbuf;

    while (nleft > 0) {
        if ((nread = read(fd, bufp, nleft)) < 0) {
            if (errno == EINTR)  /* Interrupted by signal */
                nread = 0;       /* Call read() again */
            else
                return -1;       /* Error, errno set by read() */
        }
        else if (nread == 0)
            break;               /* EOF */

        nleft -= nread;
        bufp += nread;
    }
    return (n - nleft);          /* Return >= 0 */
}

/*
 * rio_writen - Robustly write n bytes (unbuffered)
 * Returns: n on success, -1 on error
 */
ssize_t rio_writen(int fd, const void *usrbuf, size_t n)
{
    size_t nleft = n;
    ssize_t nwritten;
    const char *bufp = usrbuf;

    while (nleft > 0) {
        if ((nwritten = write(fd, bufp, nleft)) <= 0) {
            if (errno == EINTR)  /* Interrupted by signal */
                nwritten = 0;    /* Call write() again */
            else
                return -1;       /* Error */
        }
        nleft -= nwritten;
        bufp += nwritten;
    }
    return n;
}

/*
 * rio_readinitb - Initialize a read buffer
 */
void rio_readinitb(rio_t *rp, int fd)
{
    rp->rio_fd = fd;
    rp->rio_cnt = 0;
    rp->rio_bufptr = rp->rio_buf;
}

/*
 * rio_read - Internal buffered read (not exposed)
 * Refills buffer as needed
 */
static ssize_t rio_read(rio_t *rp, char *usrbuf, size_t n)
{
    int cnt;

    while (rp->rio_cnt <= 0) {  /* Buffer empty, refill it */
        rp->rio_cnt = read(rp->rio_fd, rp->rio_buf, sizeof(rp->rio_buf));

        if (rp->rio_cnt < 0) {
            if (errno != EINTR)
                return -1;       /* Error other than interrupt */
        }
        else if (rp->rio_cnt == 0)
            return 0;            /* EOF */
        else
            rp->rio_bufptr = rp->rio_buf;
    }

    /* Copy min(n, available) bytes from internal buffer */
    cnt = n;
    if ((size_t)rp->rio_cnt < n)
        cnt = rp->rio_cnt;
    memcpy(usrbuf, rp->rio_bufptr, cnt);
    rp->rio_bufptr += cnt;
    rp->rio_cnt -= cnt;
    return cnt;
}

/*
 * rio_readlineb - Read a line (buffered), including newline
 */
ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen)
{
    int n, rc;
    char c, *bufp = usrbuf;

    for (n = 1; n < (int)maxlen; n++) {
        if ((rc = rio_read(rp, &c, 1)) == 1) {
            *bufp++ = c;
            if (c == '\n') {
                n++;
                break;
            }
        } else if (rc == 0) {
            if (n == 1)
                return 0;        /* EOF, no data read */
            else
                break;           /* EOF, some data read */
        } else {
            return -1;           /* Error */
        }
    }
    *bufp = 0;
    return n - 1;
}

/*
 * rio_readnb - Read n bytes (buffered)
 */
ssize_t rio_readnb(rio_t *rp, void *usrbuf, size_t n)
{
    size_t nleft = n;
    ssize_t nread;
    char *bufp = usrbuf;

    while (nleft > 0) {
        if ((nread = rio_read(rp, bufp, nleft)) < 0)
            return -1;
        else if (nread == 0)
            break;
        nleft -= nread;
        bufp += nread;
    }
    return (n - nleft);
}

Checkpoint: Test with pipes that deliver data slowly to verify short count handling.

5.3 Phase 2: I/O Tracing Layer (Days 4-5)

Goal: Add syscall counting and tracing

Tasks:

Create wrapper functions that log syscalls
Track statistics (counts, bytes, timing)
Add trace output mode

/* trace.h */
#ifndef TRACE_H
#define TRACE_H

#include <stddef.h>
#include <sys/types.h>
#include <time.h>

typedef struct {
    size_t read_calls;
    size_t write_calls;
    size_t bytes_read;
    size_t bytes_written;
    size_t short_reads;
    size_t short_writes;
    struct timespec start_time;
} io_stats_t;

/* Initialize tracing */
void trace_init(int enable_trace);

/* Traced I/O operations */
ssize_t traced_read(int fd, void *buf, size_t count);
ssize_t traced_write(int fd, const void *buf, size_t count);

/* Get statistics */
io_stats_t *trace_get_stats(void);
void trace_print_stats(void);
void trace_reset_stats(void);

#endif

/* trace.c */
#include "trace.h"
#include <stdio.h>
#include <unistd.h>

static io_stats_t stats;
static int tracing_enabled = 0;

void trace_init(int enable_trace)
{
    tracing_enabled = enable_trace;
    trace_reset_stats();
    clock_gettime(CLOCK_MONOTONIC, &stats.start_time);
}

ssize_t traced_read(int fd, void *buf, size_t count)
{
    ssize_t result = read(fd, buf, count);

    stats.read_calls++;
    if (result > 0) {
        stats.bytes_read += result;
        if ((size_t)result < count && result > 0) {
            stats.short_reads++;
        }
    }

    if (tracing_enabled) {
        if (result >= 0)
            fprintf(stderr, "[TRACE] read(%d, buf, %zu) = %zd%s\n",
                    fd, count, result, result == 0 ? " (EOF)" : "");
        else
            fprintf(stderr, "[TRACE] read(%d, buf, %zu) = -1 (error)\n",
                    fd, count);
    }

    return result;
}

ssize_t traced_write(int fd, const void *buf, size_t count)
{
    ssize_t result = write(fd, buf, count);

    stats.write_calls++;
    if (result > 0) {
        stats.bytes_written += result;
        if ((size_t)result < count) {
            stats.short_writes++;
        }
    }

    if (tracing_enabled) {
        fprintf(stderr, "[TRACE] write(%d, buf, %zu) = %zd\n",
                fd, count, result);
    }

    return result;
}

void trace_print_stats(void)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);

    double elapsed = (now.tv_sec - stats.start_time.tv_sec) +
                     (now.tv_nsec - stats.start_time.tv_nsec) / 1e9;

    fprintf(stderr, "\n=== I/O Statistics ===\n");
    fprintf(stderr, "Read calls:    %zu\n", stats.read_calls);
    fprintf(stderr, "Write calls:   %zu\n", stats.write_calls);
    fprintf(stderr, "Bytes read:    %zu\n", stats.bytes_read);
    fprintf(stderr, "Bytes written: %zu\n", stats.bytes_written);
    fprintf(stderr, "Short reads:   %zu\n", stats.short_reads);
    fprintf(stderr, "Short writes:  %zu\n", stats.short_writes);
    fprintf(stderr, "Elapsed time:  %.3f ms\n", elapsed * 1000);

    if (stats.read_calls > 0) {
        fprintf(stderr, "Avg read size: %.1f bytes\n",
                (double)stats.bytes_read / stats.read_calls);
    }
}

Checkpoint: Verify traces show actual syscall patterns.

5.4 Phase 3: rio-cat and rio-tee (Days 6-8)

Goal: Build the main copy utilities

rio-cat implementation:

/* rio_cat.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include "rio.h"
#include "trace.h"

#define DEFAULT_BUFSIZE 8192

void usage(const char *prog)
{
    fprintf(stderr, "Usage: %s [--trace] [--bufsize N] [file ...]\n", prog);
    exit(1);
}

int copy_fd_to_stdout(int fd, size_t bufsize, int trace)
{
    char *buf = malloc(bufsize);
    if (!buf) {
        perror("malloc");
        return -1;
    }

    ssize_t nread;
    while ((nread = traced_read(fd, buf, bufsize)) > 0) {
        ssize_t nwritten = 0;
        while (nwritten < nread) {
            ssize_t n = traced_write(STDOUT_FILENO,
                                     buf + nwritten,
                                     nread - nwritten);
            if (n < 0) {
                perror("write");
                free(buf);
                return -1;
            }
            nwritten += n;
        }
    }

    if (nread < 0) {
        perror("read");
        free(buf);
        return -1;
    }

    free(buf);
    return 0;
}

int main(int argc, char **argv)
{
    int trace = 0;
    size_t bufsize = DEFAULT_BUFSIZE;
    int file_start = 1;

    /* Parse arguments */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--trace") == 0) {
            trace = 1;
            file_start = i + 1;
        } else if (strcmp(argv[i], "--bufsize") == 0 && i + 1 < argc) {
            bufsize = atoi(argv[++i]);
            file_start = i + 1;
        } else if (argv[i][0] == '-') {
            usage(argv[0]);
        } else {
            file_start = i;
            break;
        }
    }

    trace_init(trace);

    int status = 0;

    if (file_start >= argc) {
        /* No files specified, read from stdin */
        status = copy_fd_to_stdout(STDIN_FILENO, bufsize, trace);
    } else {
        /* Process each file */
        for (int i = file_start; i < argc; i++) {
            int fd = open(argv[i], O_RDONLY);
            if (fd < 0) {
                perror(argv[i]);
                status = 1;
                continue;
            }
            if (copy_fd_to_stdout(fd, bufsize, trace) < 0) {
                status = 1;
            }
            close(fd);
        }
    }

    if (trace) {
        trace_print_stats();
    }

    return status;
}

rio-tee implementation:

/* rio_tee.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include "rio.h"
#include "trace.h"

#define BUFSIZE 8192
#define MAX_OUTPUTS 64

int main(int argc, char **argv)
{
    int append = 0;
    int trace = 0;
    int file_start = 1;
    int output_fds[MAX_OUTPUTS];
    int num_outputs = 0;

    /* Parse arguments */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-a") == 0) {
            append = 1;
        } else if (strcmp(argv[i], "--trace") == 0) {
            trace = 1;
        } else if (argv[i][0] != '-') {
            file_start = i;
            break;
        }
    }

    trace_init(trace);

    /* Always output to stdout */
    output_fds[num_outputs++] = STDOUT_FILENO;

    /* Open output files */
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    for (int i = file_start; i < argc && num_outputs < MAX_OUTPUTS; i++) {
        int fd = open(argv[i], flags, 0644);
        if (fd < 0) {
            perror(argv[i]);
            /* Continue with other files */
        } else {
            output_fds[num_outputs++] = fd;
        }
    }

    /* Read from stdin, write to all outputs */
    char buf[BUFSIZE];
    ssize_t nread;

    while ((nread = traced_read(STDIN_FILENO, buf, BUFSIZE)) > 0) {
        for (int i = 0; i < num_outputs; i++) {
            ssize_t nleft = nread;
            char *ptr = buf;

            while (nleft > 0) {
                ssize_t nwritten = traced_write(output_fds[i], ptr, nleft);
                if (nwritten < 0) {
                    perror("write");
                    break;
                }
                nleft -= nwritten;
                ptr += nwritten;
            }
        }
    }

    /* Close output files (not stdout) */
    for (int i = 1; i < num_outputs; i++) {
        close(output_fds[i]);
    }

    if (trace) {
        trace_print_stats();
    }

    return (nread < 0) ? 1 : 0;
}

Checkpoint: Test with large files, pipes, and multiple outputs.

5.5 Phase 4: rio-stat and Buffer Comparison (Days 9-10)

Goal: Create diagnostic tool that demonstrates buffering impact

/* rio_stat.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>
#include <sys/stat.h>
#include "trace.h"

void benchmark_buffer_size(const char *filename, size_t bufsize)
{
    int fd = open(filename, O_RDONLY);
    if (fd < 0) {
        perror(filename);
        return;
    }

    /* Reset file position and stats for each test */
    lseek(fd, 0, SEEK_SET);
    trace_init(0);  /* No output, just stats */

    char *buf = malloc(bufsize);
    if (!buf) {
        close(fd);
        return;
    }

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    ssize_t nread;
    size_t total = 0;
    while ((nread = traced_read(fd, buf, bufsize)) > 0) {
        total += nread;
        /* Simulate minimal processing: touch each byte */
        volatile char sum = 0;
        for (ssize_t i = 0; i < nread; i++) {
            sum += buf[i];
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) * 1000.0 +
                     (end.tv_nsec - start.tv_nsec) / 1e6;

    io_stats_t *stats = trace_get_stats();
    printf("%10zu %8zu %7zu %10.2f\n",
           bufsize, stats->read_calls, total, elapsed);

    free(buf);
    close(fd);
}

void compare_buffer_sizes(const char *filename)
{
    struct stat sb;
    if (stat(filename, &sb) < 0) {
        perror(filename);
        return;
    }

    printf("Reading %lld bytes with different buffer sizes:\n\n",
           (long long)sb.st_size);
    printf("%10s %8s %7s %10s\n",
           "Buffer", "Reads", "Bytes", "Time(ms)");
    printf("%10s %8s %7s %10s\n",
           "------", "-----", "-----", "--------");

    size_t sizes[] = {1, 64, 256, 512, 1024, 4096, 8192, 65536};
    int nsizes = sizeof(sizes) / sizeof(sizes[0]);

    for (int i = 0; i < nsizes; i++) {
        benchmark_buffer_size(filename, sizes[i]);
    }

    printf("\nConclusion: Larger buffers = fewer syscalls = faster I/O\n");
}

void demonstrate_short_counts(void)
{
    printf("=== Short Count Demonstration ===\n\n");
    printf("Run this with: (sleep 0.1; echo 'line1'; sleep 0.1; echo 'line2') | %s\n\n",
           "rio-stat --short");

    char buf[1024];
    ssize_t nread;
    int count = 0;

    trace_init(1);

    while ((nread = traced_read(STDIN_FILENO, buf, sizeof(buf))) > 0) {
        count++;
        printf("Read %d returned %zd bytes\n", count, nread);
    }

    printf("\nNote: Each read may return different amounts due to timing!\n");
    trace_print_stats();
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "Usage: %s [--buffer-compare file | --short]\n", argv[0]);
        return 1;
    }

    if (strcmp(argv[1], "--buffer-compare") == 0 && argc >= 3) {
        compare_buffer_sizes(argv[2]);
    } else if (strcmp(argv[1], "--short") == 0) {
        demonstrate_short_counts();
    } else {
        fprintf(stderr, "Unknown option: %s\n", argv[1]);
        return 1;
    }

    return 0;
}

Checkpoint: Output clearly shows syscall count reduction with larger buffers.

5.6 Phase 5: Memory-Mapped Search (Days 11-12)

Goal: Implement and compare mmap-based file access

/* mmap_search.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>

typedef struct {
    size_t *offsets;
    size_t count;
    size_t capacity;
} match_list_t;

void match_list_init(match_list_t *list)
{
    list->count = 0;
    list->capacity = 16;
    list->offsets = malloc(list->capacity * sizeof(size_t));
}

void match_list_add(match_list_t *list, size_t offset)
{
    if (list->count >= list->capacity) {
        list->capacity *= 2;
        list->offsets = realloc(list->offsets, list->capacity * sizeof(size_t));
    }
    list->offsets[list->count++] = offset;
}

void match_list_free(match_list_t *list)
{
    free(list->offsets);
}

/* Read-based search */
double search_with_read(const char *filename, const char *pattern,
                        match_list_t *matches)
{
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return -1;

    struct stat sb;
    fstat(fd, &sb);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    size_t plen = strlen(pattern);
    size_t bufsize = 65536;
    char *buf = malloc(bufsize + plen);
    size_t overlap = 0;
    size_t file_offset = 0;

    ssize_t nread;
    while ((nread = read(fd, buf + overlap, bufsize)) > 0) {
        size_t total = overlap + nread;

        /* Search for pattern in buffer */
        for (size_t i = 0; i <= total - plen; i++) {
            if (memcmp(buf + i, pattern, plen) == 0) {
                match_list_add(matches, file_offset + i - overlap);
            }
        }

        /* Keep potential partial match at end */
        if (total >= plen) {
            overlap = plen - 1;
            memmove(buf, buf + total - overlap, overlap);
        }
        file_offset += nread;
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);
    free(buf);

    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}

/* mmap-based search */
double search_with_mmap(const char *filename, const char *pattern,
                        match_list_t *matches)
{
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return -1;

    struct stat sb;
    fstat(fd, &sb);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    char *addr = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Advise kernel about access pattern (sequential) */
    madvise(addr, sb.st_size, MADV_SEQUENTIAL);

    size_t plen = strlen(pattern);
    for (size_t i = 0; i <= (size_t)sb.st_size - plen; i++) {
        if (memcmp(addr + i, pattern, plen) == 0) {
            match_list_add(matches, i);
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    munmap(addr, sb.st_size);
    close(fd);

    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}

void print_matches(match_list_t *matches, int max_show)
{
    printf("Found %zu matches", matches->count);
    if (matches->count > 0) {
        printf(" at offsets: ");
        int show = matches->count < (size_t)max_show ?
                   matches->count : max_show;
        for (int i = 0; i < show; i++) {
            printf("%zu", matches->offsets[i]);
            if (i < show - 1) printf(", ");
        }
        if (matches->count > (size_t)max_show) {
            printf(", ...");
        }
    }
    printf("\n");
}

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "Usage: %s [--compare] pattern file\n", argv[0]);
        return 1;
    }

    int compare = (strcmp(argv[1], "--compare") == 0);
    const char *pattern = compare ? argv[2] : argv[1];
    const char *filename = compare ? argv[3] : argv[2];

    if (compare) {
        printf("Comparing search methods on '%s' for pattern '%s'\n\n",
               filename, pattern);

        match_list_t read_matches, mmap_matches;
        match_list_init(&read_matches);
        match_list_init(&mmap_matches);

        double read_time = search_with_read(filename, pattern, &read_matches);
        printf("Read-based search:\n");
        printf("  Time: %.2f ms\n", read_time);
        print_matches(&read_matches, 5);

        double mmap_time = search_with_mmap(filename, pattern, &mmap_matches);
        printf("\nMmap-based search:\n");
        printf("  Time: %.2f ms\n", mmap_time);
        print_matches(&mmap_matches, 5);

        printf("\nSpeedup: %.2fx\n",
               mmap_time > 0 ? read_time / mmap_time : 0);

        match_list_free(&read_matches);
        match_list_free(&mmap_matches);
    } else {
        match_list_t matches;
        match_list_init(&matches);
        double time = search_with_mmap(filename, pattern, &matches);
        printf("Search completed in %.2f ms\n", time);
        print_matches(&matches, 10);
        match_list_free(&matches);
    }

    return 0;
}

Checkpoint: Demonstrate mmap advantage for random access patterns.

5.7 Phase 6: Integration and Polish (Days 13-14)

Goal: Complete toolkit, error handling, documentation

Tasks:

Implement rio-transform
Add comprehensive error messages
Write tests for edge cases
Create usage documentation

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Example Tests
Unit Tests	Test individual Rio functions	readn returns exact bytes
Short Count Tests	Verify handling of partial I/O	Read from slow pipe
Large File Tests	Handle files bigger than memory	1GB file copy
Edge Cases	Boundary conditions	Empty file, exact buffer size
Integration Tests	Full pipeline	cat \| tee \| transform

6.2 Critical Test Cases

Test 1: Short Counts from Pipes

#!/bin/bash
# test_short_counts.sh

# Create slow pipe producer
(
    for i in {1..10}; do
        echo "Line $i of data that spans multiple bytes"
        sleep 0.05
    done
) | ./rio-cat --trace 2>&1 | grep -c "short read"
# Should show multiple reads, some possibly short

Test 2: Large File Handling

#!/bin/bash
# test_large_file.sh

# Create large test file
dd if=/dev/urandom of=large_test.bin bs=1M count=100 2>/dev/null

# Copy and compare
./rio-cat large_test.bin > large_copy.bin
diff large_test.bin large_copy.bin && echo "PASS: Large file copy correct"

# Check no hangs with pipe
./rio-cat large_test.bin | ./rio-cat > /dev/null
echo "PASS: Piped large file"

rm large_test.bin large_copy.bin

Test 3: Partial Write Handling

/* test_partial_write.c */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>

/* Create a situation where write() might return less than requested */
int main(void)
{
    int pipefd[2];
    pipe(pipefd);

    /* Fill pipe buffer (typically 64KB on Linux) */
    fcntl(pipefd[1], F_SETFL, O_NONBLOCK);

    char buf[4096];
    memset(buf, 'A', sizeof(buf));

    int total = 0;
    while (write(pipefd[1], buf, sizeof(buf)) > 0) {
        total += sizeof(buf);
    }

    printf("Pipe buffer filled with %d bytes\n", total);
    printf("Now test rio_writen behavior...\n");

    close(pipefd[0]);
    close(pipefd[1]);
    return 0;
}

Test 4: EOF Detection

#!/bin/bash
# test_eof.sh

# Empty file
touch empty.txt
./rio-cat empty.txt | wc -c  # Should be 0

# File without newline at end
printf "no trailing newline" > nonewline.txt
./rio-cat nonewline.txt | wc -c  # Should be 19

rm empty.txt nonewline.txt

Test 5: Binary Data Integrity

#!/bin/bash
# test_binary.sh

# Create binary file with all byte values
python3 -c "import sys; sys.stdout.buffer.write(bytes(range(256)) * 100)" > binary.bin

# Copy and verify
./rio-cat binary.bin | ./rio-cat > binary_copy.bin
cmp binary.bin binary_copy.bin && echo "PASS: Binary data preserved"

rm binary.bin binary_copy.bin

6.3 Automated Test Script

#!/bin/bash
# run_tests.sh

PASS=0
FAIL=0

test_result() {
    if [ $1 -eq 0 ]; then
        echo "PASS: $2"
        ((PASS++))
    else
        echo "FAIL: $2"
        ((FAIL++))
    fi
}

# Test 1: Basic copy
echo "Hello, World!" | ./rio-cat > /tmp/test_out.txt
diff <(echo "Hello, World!") /tmp/test_out.txt > /dev/null
test_result $? "Basic stdin to stdout copy"

# Test 2: File copy
echo "File content" > /tmp/test_in.txt
./rio-cat /tmp/test_in.txt > /tmp/test_out.txt
diff /tmp/test_in.txt /tmp/test_out.txt > /dev/null
test_result $? "File copy"

# Test 3: Tee to file
echo "Tee test" | ./rio-tee /tmp/tee_out.txt > /dev/null
grep -q "Tee test" /tmp/tee_out.txt
test_result $? "Tee to file"

# Test 4: Empty input
./rio-cat < /dev/null > /tmp/empty_out.txt
[ ! -s /tmp/empty_out.txt ]
test_result $? "Empty input handling"

# Test 5: Large file (10MB)
dd if=/dev/zero bs=1M count=10 2>/dev/null | ./rio-cat > /tmp/large_out.txt
[ $(stat -f%z /tmp/large_out.txt 2>/dev/null || stat -c%s /tmp/large_out.txt) -eq 10485760 ]
test_result $? "Large file (10MB)"

# Cleanup
rm -f /tmp/test_*.txt /tmp/tee_out.txt /tmp/large_out.txt

echo ""
echo "Results: $PASS passed, $FAIL failed"
exit $FAIL

7. Common Pitfalls

Pitfall	Symptom	Solution
Not handling short counts	Silent data truncation	Always loop on read/write
Mixing stdio and syscalls	Skipped or duplicate data	Use one approach consistently
Not flushing buffers	Missing output	Call fflush() or use unbuffered
Line buffering assumptions	Interactive I/O delays	Check tty with isatty()

7.2 File Descriptor Leaks

/* WRONG: Leaks fd on error */
int fd = open(file, O_RDONLY);
if (some_error) {
    return -1;  /* fd not closed! */
}

/* RIGHT: Always clean up */
int fd = open(file, O_RDONLY);
if (some_error) {
    close(fd);
    return -1;
}

7.3 mmap Pitfalls

Pitfall	Symptom	Solution
Forgetting munmap	Memory leak	Always pair mmap/munmap
mmap on non-seekable	EINVAL	Check file type first
File truncated after mmap	SIGBUS	Handle or avoid
Writing to MAP_PRIVATE expecting persistence	Data lost	Use MAP_SHARED

7.4 Signal Handling

/* Handle EINTR in I/O loops */
while ((nread = read(fd, buf, n)) < 0) {
    if (errno == EINTR)
        continue;  /* Retry on signal interrupt */
    else
        return -1; /* Real error */
}

7.5 Common Error Patterns

/* WRONG: Assuming read() fills buffer */
char buf[1024];
read(fd, buf, 1024);
printf("%s", buf);  /* buf may not be null-terminated! */

/* RIGHT: Track actual bytes read */
char buf[1024];
ssize_t n = read(fd, buf, sizeof(buf) - 1);
if (n > 0) {
    buf[n] = '\0';  /* Null-terminate */
    printf("%s", buf);
}

8. Extensions

8.1 Beginner Extensions

Add compression: Pipe through gzip
Progress indicator: Show bytes processed for large files
Timeout support: Abort if I/O stalls too long

8.2 Intermediate Extensions

Parallel I/O: Use multiple threads for tee destinations
Scatter/gather: Implement readv/writev wrappers
Direct I/O: Bypass page cache with O_DIRECT
Sparse file handling: Detect and preserve holes

8.3 Advanced Extensions

Async I/O: Use io_uring (Linux 5.1+)
Network I/O: Add socket support with select/epoll
Journaling: Implement write-ahead logging
Fuzz testing: Use AFL or libfuzzer on I/O functions

9. Real-World Connections

9.1 Industry Applications

Application	Relevant Concepts
Database systems	Buffer management, fsync, mmap
Web servers	Non-blocking I/O, sendfile()
Log processing	Large file streaming, buffering
Network proxies	Pipe/socket handling, select/epoll
Backup software	Efficient copying, sparse files

dd: Low-level copy with block size control
cat: Basic file concatenation
tee: Stream splitting
strace: Syscall tracing (use to verify your implementation)
splice/sendfile: Zero-copy data transfer (advanced)

9.3 Interview Relevance

This project prepares you for questions like:

“How would you implement a robust file copy?”
“Explain the difference between buffered and unbuffered I/O”
“When would you use mmap vs read()?”
“How do pipes work?”
“What causes partial reads/writes?”

10. Resources

10.1 Essential Reading

CS:APP Chapter 10: “System-Level I/O” - The core material
CS:APP Chapter 9: “Virtual Memory” - For mmap understanding
“The Linux Programming Interface” by Michael Kerrisk: Chapters 4, 5, 49
Advanced Programming in the UNIX Environment by Stevens: I/O chapters

10.2 Man Pages (Read These!)

man 2 read
man 2 write
man 2 open
man 2 close
man 2 mmap
man 2 pipe
man 2 dup2
man 2 select
man 7 pipe

10.3 Online Resources

Previous: P14 (Build Your Own Malloc) - Memory management foundation
Next: P16 (Concurrency Workbench) - Builds on I/O concepts
Capstone: P17 uses robust I/O for network proxy

11. Self-Assessment Checklist

Understanding

I can explain why read() might return fewer bytes than requested
I understand the difference between file descriptors and open file entries
I can draw the three-level file descriptor architecture (descriptor table, open file table, v-node table)
I know when to use stdio vs system calls vs mmap
I understand how pipe buffering works
I can explain what dup2 does and why it’s useful

Implementation

rio_readn correctly handles short counts and EINTR
rio_writen correctly handles short counts and EINTR
Buffered reader efficiently handles line-oriented input
Tools work correctly with pipes (stdin from pipe)
Tools work correctly with large files (>1GB)
mmap-search correctly handles edge cases

Evidence

Buffer comparison shows clear syscall reduction with larger buffers
Trace mode accurately shows all read/write calls
Short count demonstration shows actual partial reads
mmap vs read comparison shows measurable differences

Debugging Skills

I can use strace to verify my implementation’s syscalls
I can diagnose a “hanging” I/O operation
I can explain why data might be silently truncated
I can identify buffer-mixing bugs

12. Real World Outcome

When you complete this project, here’s exactly what you’ll see when running your tools:

Basic File Copy with rio-cat

$ ./rio-cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...

$ echo "Hello from pipe!" | ./rio-cat
Hello from pipe!

$ ./rio-cat /etc/passwd | wc -l
42

Tracing Mode - See Every Syscall

$ ./rio-cat --trace /etc/passwd > /dev/null

=== RIO TRACE: /etc/passwd ===
[open]   fd=3, path=/etc/passwd, flags=O_RDONLY
[read]   fd=3, requested=8192, returned=2847
[write]  fd=1, requested=2847, returned=2847
[read]   fd=3, requested=8192, returned=0 (EOF)
[close]  fd=3

Summary:
  Total bytes read:    2,847
  Total bytes written: 2,847
  Read syscalls:       2 (avg 1,423 bytes/call)
  Write syscalls:      1
  Short reads:         0
  Short writes:        0

Buffer Size Comparison

$ ./rio-stat --buffer-compare /usr/share/dict/words

=== BUFFER SIZE COMPARISON ===
File: /usr/share/dict/words (976,241 bytes)

Buffer Size    Read Calls    Time (ms)    Throughput
----------------------------------------------------------
        16       61,016       156.3        6.1 MB/s
        64       15,254        42.8       22.3 MB/s
       256        3,814        15.2       62.8 MB/s
      1024          954         8.4      113.7 MB/s
      4096          239         4.2      227.4 MB/s
     16384           60         3.1      308.3 MB/s
     65536           15         2.8      341.3 MB/s

Analysis:
  - 16 -> 65536 byte buffer: 56x throughput improvement
  - Syscall overhead dominates at small buffer sizes
  - Diminishing returns after ~16KB buffer
  - System page size: 4096 bytes (aligned reads most efficient)

Short Count Demonstration

$ ./rio-stat --short

=== SHORT COUNT DEMONSTRATION ===

Creating a slow pipe producer (100 bytes every 50ms)...

Attempt 1: read(fd, buf, 8192)
  Requested: 8192 bytes
  Returned:  100 bytes  <-- SHORT COUNT!
  Reason:    Pipe had only 100 bytes available

Attempt 2: read(fd, buf, 8192)
  Requested: 8192 bytes
  Returned:  100 bytes  <-- SHORT COUNT!
  Reason:    More data arrived from slow producer

... (8 more reads) ...

Total data: 1000 bytes in 10 read() calls
Average per read: 100 bytes (expected with slow producer)

Key lesson:
  read() can return FEWER bytes than requested!
  - Pipes: only returns what's in the buffer
  - Sockets: network packets arrive in chunks
  - Terminals: line-buffered input

  This is why rio_readn() loops until all bytes received.

Memory-Mapped Search

$ ./mmap-search --compare "TODO" src/

=== MMAP vs READ SEARCH COMPARISON ===
Pattern: "TODO"
Directory: src/ (47 files, 234 KB total)

Read-based search:
  Time: 12.4 ms
  Found: 23 matches across 8 files
  Syscalls: 94 (open + read + close for each file)

Mmap-based search:
  Time: 8.7 ms
  Found: 23 matches across 8 files
  Syscalls: 47 (open + mmap + munmap + close)

Speedup: 1.43x faster with mmap

When mmap wins:
  - Random access patterns (jumping around file)
  - Repeated searches of same file
  - Files that fit in page cache

When read() wins:
  - Sequential single-pass processing
  - Small files (mmap overhead not worth it)
  - Files larger than available RAM

Tee Command with Stats

$ cat /var/log/syslog | ./rio-tee --stats backup.log | grep ERROR

=== RIO-TEE STATISTICS ===
Input:  stdin (pipe)
Output: stdout (tty), backup.log (file)

Bytes transferred:  1,247,892
  To stdout:        1,247,892
  To backup.log:    1,247,892

Read calls:         153
Write calls:        306 (2x, one per output)
Average chunk:      8,155 bytes

Time elapsed:       0.847 seconds
Throughput:         1.44 MB/s

13. The Core Question You’re Answering

“Why can’t I just call read() once and expect to get all my data? What could possibly go wrong with simple file I/O?”

This project reveals that Unix I/O is fundamentally about dealing with uncertainty. Network connections deliver data in packets. Pipes buffer limited amounts. Signals can interrupt any system call. Your code must handle all these cases gracefully. The Robust I/O (Rio) package provides wrappers that handle these complexities, turning unreliable primitives into reliable abstractions.

14. Concepts You Must Understand First

Before starting this project, ensure you understand these concepts:

Concept	Why It Matters	Where to Learn
File descriptors	You’ll work with fds directly	CS:APP 10.1
The open/read/write/close syscalls	Core primitives you’ll wrap	CS:APP 10.2-10.4
What errno means	Error handling depends on errno	CS:APP 10.4, man errno
Difference between stdin/stdout and fd 0/1	Stdio vs. syscalls	CS:APP 10.1, 10.10
What buffering means	You’ll implement buffers	CS:APP 10.5
How pipes work (pipe(), dup2())	Testing short counts	CS:APP 10.9
Basic pointer arithmetic	Buffer management	Any C book

15. Questions to Guide Your Design

Work through these questions BEFORE writing code:

Short Count Strategy: What should rio_readn do when read() returns fewer bytes than requested? How do you track progress?
Buffer Design: For buffered I/O (rio_readlineb), where does the buffer live? How big should it be? What state do you track?
Error vs. EOF: How do you distinguish end-of-file (read returns 0) from an error (read returns -1)?
EINTR Handling: What happens if a signal interrupts read()? Should you retry or return an error?
Binary vs. Text: Your rio_readn handles binary data. Your rio_readlineb handles text. What’s different about them?
Thread Safety: If two threads use the same buffered I/O structure, what could go wrong?
Performance Measurement: How will you measure syscall counts and timing to prove buffering helps?

16. Thinking Exercise

Before writing any code, trace through this scenario by hand:

You’re reading from a network socket where data arrives in chunks:

Chunk 1: “Hello” (5 bytes) arrives
Chunk 2: “ World” (6 bytes) arrives
Chunk 3: “!\n” (2 bytes) arrives
Connection closes (EOF)

Your buffer is 4096 bytes, and you’re calling rio_readlineb(rp, buf, 100).

Exercise: On paper, answer:

First read(): What does the kernel return? What goes in your internal buffer?
Scanning for newline: You scan the buffer for ‘\n’. Is it there?
Second read(): You haven’t found ‘\n’. What do you do? What does kernel return?
Third read(): Now what? Is the line complete?
State after rio_readlineb returns: What’s in buf? What’s left in internal buffer? What’s the return value?
Second call to rio_readlineb: Would it need to call read()? Why or why not?

Verify your answers by implementing buffered I/O with debug prints.

17. The Interview Questions They’ll Ask

After completing this project, you’ll be ready for these common interview questions:

“Why might read() return fewer bytes than you requested?”
- Expected: Pipes/sockets may not have enough data; signals can interrupt
- Bonus: Explain EINTR, explain difference between blocking and non-blocking
“Explain the Unix file descriptor table.”
- Expected: Per-process table mapping fd numbers to open file entries
- Bonus: Draw the three-level structure (fd table, open file table, v-node table), explain fork() sharing
“When would you use mmap() instead of read()?”
- Expected: Random access, shared memory, let OS handle caching
- Bonus: Discuss page faults, dirty pages, MAP_PRIVATE vs MAP_SHARED
“What’s the difference between buffered I/O (stdio) and unbuffered I/O (syscalls)?”
- Expected: Buffered reduces syscalls, but adds complexity (flushing, mixing)
- Bonus: Explain line buffering vs. block buffering, setbuf(), setvbuf()
“How would you efficiently copy a large file?”
- Expected: Large buffer sizes, possibly sendfile() or splice()
- Bonus: Discuss zero-copy techniques, when mmap + write is efficient
“What happens if you mix read() and fgets() on the same file descriptor?”
- Expected: Disaster - fgets() has internal buffer that read() doesn’t know about
- Bonus: Explain why you should never mix buffered and unbuffered I/O

18. Hints in Layers

If you’re stuck, reveal hints one at a time:

Hint 1: rio_readn Structure

The key insight is to loop until you’ve read all requested bytes OR hit EOF/error:

ssize_t rio_readn(int fd, void *usrbuf, size_t n) {
    size_t nleft = n;
    ssize_t nread;
    char *bufp = usrbuf;

    while (nleft > 0) {
        if ((nread = read(fd, bufp, nleft)) < 0) {
            if (errno == EINTR)  // Interrupted by signal
                nread = 0;       // Retry
            else
                return -1;       // Real error
        }
        else if (nread == 0)     // EOF
            break;

        nleft -= nread;
        bufp += nread;
    }
    return (n - nleft);  // Return bytes actually read
}

Hint 2: Buffered I/O State

For buffered reading, you need to track:

#define RIO_BUFSIZE 8192

typedef struct {
    int rio_fd;                 // Descriptor for this internal buf
    int rio_cnt;                // Unread bytes in internal buf
    char *rio_bufptr;           // Next unread byte in internal buf
    char rio_buf[RIO_BUFSIZE];  // Internal buffer
} rio_t;

void rio_readinitb(rio_t *rp, int fd) {
    rp->rio_fd = fd;
    rp->rio_cnt = 0;
    rp->rio_bufptr = rp->rio_buf;
}

The key: always check internal buffer first, only call read() when buffer is empty.

Hint 3: Buffered Read Implementation

Two-level approach: low-level fills buffer, high-level returns what user needs:

// Low-level: fill internal buffer
static ssize_t rio_read(rio_t *rp, char *usrbuf, size_t n) {
    int cnt;

    while (rp->rio_cnt <= 0) {  // Buffer empty - refill it
        rp->rio_cnt = read(rp->rio_fd, rp->rio_buf, sizeof(rp->rio_buf));
        if (rp->rio_cnt < 0) {
            if (errno != EINTR)
                return -1;
        }
        else if (rp->rio_cnt == 0)  // EOF
            return 0;
        else
            rp->rio_bufptr = rp->rio_buf;  // Reset pointer
    }

    // Copy min(n, available) bytes from buffer to user
    cnt = n;
    if (rp->rio_cnt < n)
        cnt = rp->rio_cnt;
    memcpy(usrbuf, rp->rio_bufptr, cnt);
    rp->rio_bufptr += cnt;
    rp->rio_cnt -= cnt;
    return cnt;
}

Hint 4: Testing Short Counts

To reliably create short counts, use a slow pipe:

int pipefd[2];
pipe(pipefd);

if (fork() == 0) {
    // Child: write slowly
    close(pipefd[0]);
    for (int i = 0; i < 10; i++) {
        write(pipefd[1], "AAAAAAAAAA", 10);  // 10 bytes
        usleep(50000);  // 50ms delay
    }
    close(pipefd[1]);
    exit(0);
}

// Parent: read with large buffer
close(pipefd[1]);
char buf[8192];
ssize_t n;
while ((n = read(pipefd[0], buf, sizeof(buf))) > 0) {
    printf("read() returned %zd bytes (requested 8192)\n", n);
}

You’ll see read() returning ~10 bytes each time!

19. Books That Will Help

Topic	Book	Chapter/Section
Unix I/O overview	CS:APP 3rd Ed	Chapter 10.1-10.4 “Unix I/O”, “Files”, “Opening and Closing Files”, “Reading and Writing Files”
Robust I/O package	CS:APP 3rd Ed	Chapter 10.5 “Robust Reading and Writing with the Rio Package”
File sharing and redirection	CS:APP 3rd Ed	Chapter 10.8-10.9 “Sharing Files”, “I/O Redirection”
Standard I/O	CS:APP 3rd Ed	Chapter 10.10 “Standard I/O”
Memory-mapped I/O	CS:APP 3rd Ed	Chapter 9.8 “Memory Mapping”
Pipes in depth	The Linux Programming Interface	Chapter 44 “Pipes and FIFOs”
Advanced I/O	Advanced Programming in the UNIX Environment	Chapter 14 “Advanced I/O”
Zero-copy techniques	Linux System Programming	Chapter 4 “Advanced File I/O”

20. Submission / Completion Criteria

Minimum Viable Completion

rio_readn and rio_writen implemented and tested
rio-cat works with files and stdin
Basic trace mode shows syscall counts
Handles large files without issues

Full Completion

All five tools implemented (rio-cat, rio-tee, rio-transform, rio-stat, mmap-search)
Buffered I/O functions implemented
Comprehensive tracing with statistics
Buffer size comparison produces clear evidence
mmap search with performance comparison
All test cases pass

Excellence (Going Above & Beyond)

Async I/O with io_uring
Network socket support
Comprehensive error messages with recovery suggestions
Performance within 10% of system tools (cat, tee)
Fuzz testing for robustness

This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.

Project 15: Robust Unix I/O Toolkit

Quick Reference

Table of Contents

1. Learning Objectives

2. Theoretical Foundation

2.1 The Unix I/O Model: Everything Is a File

2.2 File Descriptors: The Handle to Everything

2.3 Core System Calls: open(), close(), read(), write()

open() - Opening Files

close() - Closing Files

read() - Reading Data

write() - Writing Data

2.4 Short Counts: The Core Challenge

2.5 The RIO (Robust I/O) Package

2.6 Buffered I/O: stdio vs System Calls

2.7 File Metadata: stat() and Friends

2.8 Memory-Mapped I/O: mmap()

2.9 Pipes and FIFOs

2.10 dup() and dup2(): Descriptor Manipulation

2.11 I/O Multiplexing: select/poll/epoll Overview

2.12 Common Misconceptions

3. Project Specification

3.1 What You Will Build

3.2 Functional Requirements

3.2.1 rio-cat: Robust Copy

3.2.2 rio-tee: Stream Splitter

3.2.3 rio-transform: Stream Transformer

3.2.4 rio-stat: I/O Statistics

3.2.5 mmap-search: Memory-Mapped Search

3.3 Non-Functional Requirements

3.4 Example Session

4. Solution Architecture

4.1 High-Level Design

4.2 Key Data Structures

4.3 Module Structure

5. Implementation Guide

5.1 Development Environment Setup

5.2 Phase 1: Rio Library Core (Days 1-3)

5.3 Phase 2: I/O Tracing Layer (Days 4-5)

5.4 Phase 3: rio-cat and rio-tee (Days 6-8)

5.5 Phase 4: rio-stat and Buffer Comparison (Days 9-10)

5.6 Phase 5: Memory-Mapped Search (Days 11-12)

5.7 Phase 6: Integration and Polish (Days 13-14)

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Test 1: Short Counts from Pipes

Test 2: Large File Handling

Test 3: Partial Write Handling

Test 4: EOF Detection

Test 5: Binary Data Integrity

6.3 Automated Test Script

7. Common Pitfalls

7.1 Buffering-Related Bugs

7.2 File Descriptor Leaks

7.3 mmap Pitfalls

7.4 Signal Handling

7.5 Common Error Patterns

8. Extensions

8.1 Beginner Extensions

8.2 Intermediate Extensions

8.3 Advanced Extensions

9. Real-World Connections

9.1 Industry Applications

9.2 Related Tools

9.3 Interview Relevance

10. Resources

10.1 Essential Reading

10.2 Man Pages (Read These!)

10.3 Online Resources

10.4 Related Projects in This Series

11. Self-Assessment Checklist

Understanding

Implementation

Evidence

Debugging Skills

12. Real World Outcome

Basic File Copy with rio-cat

Tracing Mode - See Every Syscall

Buffer Size Comparison