Project 2: An Asynchronous `cat` Clone

Build a file reader that uses libuv’s threadpool-backed async file operations, mastering callback chaining for sequential asynchronous logic.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	Weekend
Language	C
Prerequisites	Project 1, solid C skills (pointers, memory)
Key Topics	File I/O, threadpool, callback chaining, buffer management

1. Learning Objectives

By completing this project, you will:

Use libuv’s asynchronous file system operations (uv_fs_*)
Understand how libuv uses a threadpool for blocking operations
Master callback chaining for sequential async operations
Manage buffers and memory in an async context
Handle errors properly across callback boundaries
Clean up uv_fs_t request objects correctly

2. Theoretical Foundation

2.1 Core Concepts

Asynchronous File I/O

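Unlike network sockets, regular files cannot be driven by readiness APIs such as epoll or kqueue. epoll refuses to watch a regular file at all, and O_NONBLOCK has no effect on disk reads, so a read() can still stall the calling thread while the kernel fetches data.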

libuv solves this by using a thread pool:

┌─────────────────────────────────────────────────────────────────┐
│                     Your C Code (Main Thread)                    │
│                                                                  │
│  uv_fs_open(loop, &req, "file.txt", O_RDONLY, 0, on_open);      │
│                         │                                        │
│                         │ Posts work to thread pool              │
│                         ▼                                        │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                     Thread Pool (4 threads default)          ││
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐       ││
│  │  │ Thread 1 │ │ Thread 2 │ │ Thread 3 │ │ Thread 4 │       ││
│  │  │          │ │          │ │          │ │          │       ││
│  │  │ open()   │ │  (idle)  │ │  (idle)  │ │  (idle)  │       ││
│  │  │ blocking │ │          │ │          │ │          │       ││
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘       ││
│  └─────────────────────────────────────────────────────────────┘│
│                         │                                        │
│                         │ Work completes                         │
│                         ▼                                        │
│  Event loop picks up result, calls on_open() in main thread      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Request Objects

For file operations, libuv uses request objects (uv_fs_t) instead of handles:

// Request object holds state for one operation
uv_fs_t open_req;

// Start the operation
uv_fs_open(loop, &open_req, "file.txt", O_RDONLY, 0, on_open);

// In callback, result is in the request
void on_open(uv_fs_t* req) {
    int fd = req->result;  // File descriptor or error code
}

Key differences from handles:

Handles	Requests
Long-lived	Short-lived (one operation)
Must be `uv_close()`d	Must call `uv_fs_req_cleanup()`
Represent a resource	Represent an operation
e.g., `uv_tcp_t`	e.g., `uv_fs_t`

Callback Chaining

Since operations are asynchronous, you chain them through callbacks:

on_open()  ──►  on_read()  ──►  on_read()  ──►  on_close()
    │              │              │               │
    ▼              ▼              ▼               ▼
Got FD         Got data      Got more data    Cleanup done
Start read     Print it      Print it         Exit
               Read more     EOF? Close

2.2 Why This Matters

Async file I/O is critical for:

High-performance servers: Serving static files without blocking
Database engines: Reading/writing data files
Build tools: Parallel file processing
Log processors: Tailing files without blocking

Understanding the threadpool model helps you:

Know when operations truly run in parallel
Tune the pool size (UV_THREADPOOL_SIZE)
Understand Node.js fs module internals

2.3 Historical Context

Traditional I/O: Programs called read() and blocked
1990s: Thread-per-connection model for parallelism
2000s: Event loops for network, but FS still blocking
2011: libuv introduces threadpool for async FS
Today: Standard pattern in Node.js, Julia, etc.

2.4 Common Misconceptions

Misconception	Reality
“Async file I/O uses kernel async”	Uses threadpool, not kernel async
“All threads are always busy”	The pool has 4 threads by default, and they sit idle until work is queued
“Callbacks run in thread pool”	Callbacks run in main thread
“You can use same request twice”	Only after the operation completes and you call uv_fs_req_cleanup(); otherwise use a new request

3. Project Specification

3.1 What You Will Build

A program that reads a file specified on the command line and prints its contents to stdout, using libuv’s async file operations.

3.2 Functional Requirements

Accept filename as command-line argument
Open file asynchronously
Read file in chunks (64KB at a time)
Print each chunk to stdout
Repeat reading until EOF
Close file asynchronously
Clean up all resources

3.3 Non-Functional Requirements

Handle files of any size
Handle read errors gracefully
Print meaningful error messages
No memory leaks
Compile without warnings

3.4 Example Usage / Output

$ echo "Hello, libuv!" > test.txt
$ ./uv-cat test.txt
Hello, libuv!

$ ./uv-cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...

$ ./uv-cat nonexistent.txt
Error opening file: no such file or directory

3.5 Real World Outcome

A working async file reader that demonstrates:

Threadpool-backed file operations
Callback chaining for sequential logic
Proper buffer and memory management
Production-quality error handling

4. Solution Architecture

4.1 High-Level Design

┌──────────────────────────────────────────────────────────────────┐
│                          main()                                   │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ uv_fs_open(path, on_open)                                  │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │                                  │
│                                ▼                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ on_open()                                                  │  │
│  │   - Check result (error or fd)                             │  │
│  │   - Save fd for later                                      │  │
│  │   - uv_fs_read(fd, buffer, on_read)                       │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │                                  │
│                                ▼                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ on_read()                            ◄───────────────────┐ │  │
│  │   - If result < 0: error, close                          │ │  │
│  │   - If result == 0: EOF, close                           │ │  │
│  │   - If result > 0: print, read more ─────────────────────┘ │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │ (EOF or error)                   │
│                                ▼                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ on_close()                                                 │  │
│  │   - Cleanup all requests                                   │  │
│  │   - Free buffer                                            │  │
│  │   - Loop exits (no more work)                              │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                   │
└──────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component	Type	Purpose
`open_req`	`uv_fs_t`	Request for open operation
`read_req`	`uv_fs_t`	Request for read operations
`close_req`	`uv_fs_t`	Request for close operation
`buffer`	`char[65536]`	Buffer for file data
`fd`	`int`	File descriptor
`iov`	`uv_buf_t`	Buffer descriptor for read

4.3 Data Structures

// File request structure (simplified)
struct uv_fs_s {
    uv_req_type type;      // UV_FS
    uv_loop_t* loop;       // Associated loop
    uv_fs_cb cb;           // Callback function
    ssize_t result;        // Result: bytes read, fd, or error
    void* ptr;             // Internal use
    const char* path;      // File path
    uv_stat_t statbuf;     // For stat operations
    // ... more fields ...
};

// Buffer structure
struct uv_buf_t {
    char* base;    // Pointer to buffer data
    size_t len;    // Buffer length
};

4.4 Algorithm Overview

ALGORITHM: Async File Reader

INPUT: filename from command line
OUTPUT: file contents to stdout

1. PARSE ARGUMENTS
   - Get filename from argv[1]
   - Validate argument exists

2. OPEN FILE
   - Call uv_fs_open(filename, on_open)
   - Run loop

3. ON_OPEN CALLBACK
   - Check result for errors
   - If error, print message and exit
   - If success, save fd
   - Prepare the buffer descriptor (uv_buf_init)
   - Call uv_fs_read(fd, buffer, on_read)

4. ON_READ CALLBACK
   - If result < 0: error, call close
   - If result == 0: EOF, call close
   - If result > 0:
     - Print result bytes to stdout
     - Call uv_fs_read again (loop)

5. ON_CLOSE CALLBACK
   - Cleanup all uv_fs_t requests
   - Free buffer (only if heap-allocated)
   - Loop exits automatically

5. Implementation Guide

5.1 Development Environment Setup

# Create project directory
mkdir uv-cat && cd uv-cat

# Create files
touch main.c

# Create Makefile
cat > Makefile << 'EOF'
CC = gcc
CFLAGS = -Wall -Wextra -g $(shell pkg-config --cflags libuv)
LDFLAGS = $(shell pkg-config --libs libuv)

uv-cat: main.c
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)

clean:
	rm -f uv-cat

.PHONY: clean
EOF

5.2 Project Structure

uv-cat/
├── Makefile
└── main.c

5.3 The Core Question You’re Answering

How do you sequence multiple asynchronous operations when each depends on the result of the previous one?

The answer: Callback chaining - each callback initiates the next operation.

5.4 Concepts You Must Understand First

What’s the difference between uv_fs_t and uv_tcp_t?
- uv_fs_t is a request (one operation)
- uv_tcp_t is a handle (long-lived)
Why does file I/O use a threadpool?
- Kernel FS operations are blocking
- Can’t use epoll/kqueue for files
What does req->result contain?
- For open: file descriptor or negative error
- For read: bytes read, 0 for EOF, or negative error

5.5 Questions to Guide Your Design

State Management:

Where do you store the file descriptor between callbacks?
How does the read callback access the buffer?
Should you use global variables or a context struct?

Buffer Strategy:

How big should the buffer be?
Who allocates the buffer?
Who frees the buffer?

Error Handling:

What if open fails?
What if read fails?
How do you cleanup on error?

5.6 Thinking Exercise

Trace this scenario before coding:

File: "hello.txt" contains "Hello, World!\n" (14 bytes)
Buffer size: 64KB

Time T0: main() calls uv_fs_open()
         - Work posted to threadpool

Time T1: Thread completes open()
         - Result: fd = 3
         - on_open() scheduled

Time T2: on_open() runs
         - Gets fd = 3
         - Calls uv_fs_read()
         - Work posted to threadpool

Time T3: Thread completes read()
         - Result: 14 bytes read
         - on_read() scheduled

Time T4: on_read() runs
         - Prints "Hello, World!\n"
         - Calls uv_fs_read() again

Time T5: Thread completes read()
         - Result: 0 (EOF)
         - on_read() scheduled

Time T6: on_read() runs
         - Sees result == 0
         - Calls uv_fs_close()

Time T7: on_close() runs
         - Cleanup complete
         - Loop exits

Questions:

How many times does on_read() run?
What triggers the close operation?
When does uv_run() return?

5.7 Hints in Layers

Hint 1: Starting Point

Headers you need:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <uv.h>

Key libuv functions:

uv_fs_open(loop, req, path, flags, mode, callback)
uv_fs_read(loop, req, fd, bufs, nbufs, offset, callback)
uv_fs_close(loop, req, fd, callback)
uv_fs_req_cleanup(req)

Hint 2: Global State Structure

// Global state for simplicity
uv_fs_t open_req;
uv_fs_t read_req;
uv_fs_t close_req;

char buffer[65536];
uv_buf_t iov;
int file_fd;

Initialize the buffer descriptor:

iov = uv_buf_init(buffer, sizeof(buffer));

Hint 3: Callback Signatures

// All uv_fs callbacks have this signature
void on_open(uv_fs_t* req);
void on_read(uv_fs_t* req);
void on_close(uv_fs_t* req);

In on_open:

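// `loop` is the uv_loop_t* used in main(), e.g. uv_default_loop()
if (req->result < 0) {
    fprintf(stderr, "Error: %s\n", uv_strerror((int)req->result));
    uv_fs_req_cleanup(req);
    return;
}
file_fd = (int)req->result;  // A non-negative result is the file descriptor
uv_fs_read(loop, &read_req, file_fd, &iov, 1, -1, on_read);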

Hint 4: Read Loop Logic

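void on_read(uv_fs_t* req) {
    ssize_t nread = req->result;   // Save the result before cleanup
    uv_fs_req_cleanup(req);        // Release the request so read_req can be reused

    if (nread < 0) {
        // Error - report it, then still close the file
        fprintf(stderr, "Read error: %s\n", uv_strerror((int)nread));
        uv_fs_close(loop, &close_req, file_fd, on_close);
    } else if (nread == 0) {
        // EOF - close the file
        uv_fs_close(loop, &close_req, file_fd, on_close);
    } else {
        // Got data - print exactly nread bytes (the buffer is not NUL-terminated)
        fwrite(buffer, 1, (size_t)nread, stdout);
        // Read more
        uv_fs_read(loop, &read_req, file_fd, &iov, 1, -1, on_read);
    }
}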

5.8 The Interview Questions They’ll Ask

“Why does libuv use a threadpool for file I/O but not for network I/O?”
- Network I/O has OS-level async support (epoll/kqueue)
- Regular-file reads can block on disk, and epoll cannot watch regular files
- Windows has async file I/O (IOCP), but libuv uses threadpool for consistency
“What happens if all threadpool threads are busy?”
- New work waits in a queue
- Can cause latency issues
- Can increase pool size with UV_THREADPOOL_SIZE
“How would you handle reading multiple files concurrently?”
- Start multiple uv_fs_open() calls
- Each has its own request object and callback chain
- They run in parallel on threadpool threads
“Why do you need to call uv_fs_req_cleanup()?”
- Frees internal memory allocated by libuv
- Path strings, result buffers, etc.
- Prevents memory leaks
“What’s the difference between synchronous and async libuv FS calls?”
- Sync: uv_fs_open(loop, &req, path, flags, mode, NULL)
- Passing NULL for callback makes it synchronous
- Blocks the calling thread

5.9 Books That Will Help

Topic	Book	Chapter
libuv FS operations	An Introduction to libuv	Chapter on Filesystem
POSIX file I/O	Advanced Programming in the UNIX Environment	Chapters 3-4
Buffer management	C Programming: A Modern Approach	Chapter 12 (Pointers and Arrays)
Thread pools	C++ Concurrency in Action	Thread pool patterns

5.10 Implementation Phases

Phase 1: Synchronous Version (1 hour)

First, build a working version using synchronous calls:

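uv_fs_t req;
uv_fs_open(loop, &req, filename, O_RDONLY, 0, NULL);  // NULL = sync
int fd = (int)req.result;   // File descriptor, or a negative error code
uv_fs_req_cleanup(&req);    // Synchronous requests need cleanup too
// ... read and close synchronously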

This helps you understand the API before adding callbacks.

Phase 2: Async Open (1 hour)

Convert open to async:

Add on_open callback
Handle success and error cases
Don’t read yet, just print “Opened: fd = X”

Phase 3: Async Read (2 hours)

Add the read loop:

Implement on_read callback
Handle data, EOF, and error cases
Print data to stdout
Loop until EOF

Phase 4: Async Close (1 hour)

Add proper cleanup:

Implement on_close callback
Call uv_fs_req_cleanup() on all requests
Verify with Valgrind

5.11 Key Implementation Decisions

Decision	Options	Recommendation
State storage	Global vars vs struct	Global for simplicity
Buffer size	4KB, 64KB, 1MB	64KB (good balance)
Buffer allocation	Static/global vs heap	Static global array (simpler, nothing to free, outlives every callback)
Error handling	Exit vs continue	Exit on error (simpler)
Read offset	Track position vs -1	Use -1 (current position)

6. Testing Strategy

Test Files

# Create test files
echo "Hello, World!" > small.txt
dd if=/dev/urandom of=large.bin bs=1M count=10  # 10MB binary
printf "Line 1\nLine 2\nLine 3\n" > lines.txt
touch empty.txt

Test Cases

Test	Command	Expected
Small file	`./uv-cat small.txt`	“Hello, World!”
Large file	`./uv-cat large.bin | wc -c`	10485760
Empty file	`./uv-cat empty.txt`	(no output)
Non-existent	`./uv-cat nope.txt`	Error message
No argument	`./uv-cat`	Usage message

Memory Testing

# Check for leaks
valgrind --leak-check=full ./uv-cat small.txt

# Check with large file
valgrind --leak-check=full ./uv-cat large.bin > /dev/null

7. Common Pitfalls & Debugging

Problem	Symptom	Root Cause	Fix
Double cleanup	Crash or corruption	Calling cleanup twice	Track cleanup state
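Use after cleanup	Crash or stale data	Reading request fields after cleanup	Copy what you need (e.g. req->result) before calling cleanup
Missing cleanup	Memory leak	Forgot `uv_fs_req_cleanup()`	Call it for every request before reusing or discarding it
Wrong offset	Wrong data	Not using -1 for offset	Use -1 to read at current position
Buffer overflow	Corruption	`uv_buf_t` length larger than the real buffer	Build it with `uv_buf_init(buffer, sizeof(buffer))`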
Infinite loop	Hangs	Never reaching EOF	Check read result == 0

Debugging Tips

// Add debug prints
void on_read(uv_fs_t* req) {
    fprintf(stderr, "[DEBUG] on_read: result = %zd\n", req->result);
    // ...
}

8. Extensions & Challenges

Extension 1: Multiple Files

Read multiple files from command line:

./uv-cat file1.txt file2.txt file3.txt

Challenge: How do you chain multiple file operations?

Extension 2: Write Output

Add -o output.txt flag to write instead of stdout.

Challenge: You’ll need uv_fs_write() and proper ordering.

Extension 3: File Stats

Print file size and modification time before contents:

$ ./uv-cat file.txt
Size: 1234 bytes
Modified: 2024-01-15 10:30:00
Contents:
...

Challenge: Use uv_fs_stat() before reading.

Extension 4: Progress Bar

For large files, show a progress bar:

$ ./uv-cat large.bin > output.bin
[=========>          ] 50% (5MB / 10MB)

Challenge: Need to stat file first, track bytes read.

9. Real-World Connections

How Node.js Uses This

// JavaScript
const fs = require('fs');
fs.readFile('data.txt', (err, data) => {
    console.log(data);
});

// Translates to (simplified):
// 1. uv_fs_open() - open file
// 2. uv_fs_fstat() - get file size
// 3. uv_fs_read() - read entire file
// 4. uv_fs_close() - close file
// 5. Deliver data to JavaScript callback

Production Patterns

Use Case	Pattern
Static file server	Async read into response
Log rotation	Async read, transform, write
Config loading	Async read, parse JSON
File watching	`uv_fs_event` + read on change

10. Resources

Documentation

Reference Implementations

11. Self-Assessment Checklist

Before moving to Project 3, verify:

Program reads and prints files correctly
Handles empty files
Handles large files (test with 100MB+)
Prints meaningful error for missing files
No memory leaks (Valgrind clean)
You can explain the callback chain
You understand why threadpool is used
You know when to call uv_fs_req_cleanup()

12. Submission / Completion Criteria

Your project is complete when:

Functional: Correctly reads and prints any file
Robust: Handles errors gracefully
Clean: No warnings, no memory leaks
Documented: Comments explain the callback flow

Bonus: Implement at least one extension.
