Project 2: An Asynchronous cat Clone
Build a file reader that uses libuv’s threadpool-backed async file operations, mastering callback chaining for sequential asynchronous logic.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | Weekend |
| Language | C |
| Prerequisites | Project 1, solid C skills (pointers, memory) |
| Key Topics | File I/O, threadpool, callback chaining, buffer management |
1. Learning Objectives
By completing this project, you will:
- Use libuv’s asynchronous file system operations (uv_fs_*)
- Understand how libuv uses a threadpool for blocking operations
- Master callback chaining for sequential async operations
- Manage buffers and memory in an async context
- Handle errors properly across callback boundaries
- Clean up uv_fs_t request objects correctly
2. Theoretical Foundation
2.1 Core Concepts
Asynchronous File I/O
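Unlike network I/O, file system operations get no help from readiness-based mechanisms like epoll or kqueue. Readiness is not meaningful for regular files (poll() reports them as always ready, and epoll rejects them), yet a read() can still block while the kernel waits on the disk.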
libuv solves this by using a thread pool:
┌─────────────────────────────────────────────────────────────────┐
│ Your C Code (Main Thread) │
│ │
│ uv_fs_open(loop, &req, "file.txt", O_RDONLY, 0, on_open); │
│ │ │
│ │ Posts work to thread pool │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Thread Pool (4 threads default) ││
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ││
│ │ │ Thread 1 │ │ Thread 2 │ │ Thread 3 │ │ Thread 4 │ ││
│ │ │ │ │ │ │ │ │ │ ││
│ │ │ open() │ │ (idle) │ │ (idle) │ │ (idle) │ ││
│ │ │ blocking │ │ │ │ │ │ │ ││
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ ││
│ └─────────────────────────────────────────────────────────────┘│
│ │ │
│ │ Work completes │
│ ▼ │
│ Event loop picks up result, calls on_open() in main thread │
│ │
└─────────────────────────────────────────────────────────────────┘
Request Objects
For file operations, libuv uses request objects (uv_fs_t) instead of handles:
// Request object holds state for one operation
uv_fs_t open_req;
// Start the operation
uv_fs_open(loop, &open_req, "file.txt", O_RDONLY, 0, on_open);
// In callback, result is in the request
void on_open(uv_fs_t* req) {
    int fd = (int)req->result; // File descriptor, or a negative error code
}
Key differences from handles:
| Handles | Requests |
|---|---|
| Long-lived | Short-lived (one operation) |
| Must be uv_close()d | Must call uv_fs_req_cleanup() |
| Represent a resource | Represent an operation |
| e.g., uv_tcp_t | e.g., uv_fs_t |
Callback Chaining
Since operations are asynchronous, you chain them through callbacks:
on_open() ──► on_read() ──► on_read() ──► on_close()
    │             │             │              │
    ▼             ▼             ▼              ▼
  Got FD       Got data    Got more data  Cleanup done
  Start read   Print it    Print it       Exit
               Read more   EOF? Close
2.2 Why This Matters
Async file I/O is critical for:
- High-performance servers: Serving static files without blocking
- Database engines: Reading/writing data files
- Build tools: Parallel file processing
- Log processors: Tailing files without blocking
Understanding the threadpool model helps you:
- Know when operations truly run in parallel
- Tune the pool size (UV_THREADPOOL_SIZE)
- Understand Node.js fs module internals
2.3 Historical Context
- Traditional I/O: Programs called read() and blocked
- 1990s: Thread-per-connection model for parallelism
- 2000s: Event loops for network, but FS still blocking
- 2011: libuv introduces threadpool for async FS
- Today: Standard pattern in Node.js, Julia, etc.
2.4 Common Misconceptions
| Misconception | Reality |
|---|---|
| “Async file I/O uses kernel async” | Uses threadpool, not kernel async |
| “All threads are always busy” | Only 4 threads by default |
| “Callbacks run in thread pool” | Callbacks run in main thread |
| “You can use same request twice” | Must cleanup and reinit, or use new request |
3. Project Specification
3.1 What You Will Build
A program that reads a file specified on the command line and prints its contents to stdout, using libuv’s async file operations.
3.2 Functional Requirements
- Accept filename as command-line argument
- Open file asynchronously
- Read file in chunks (64KB at a time)
- Print each chunk to stdout
- Repeat reading until EOF
- Close file asynchronously
- Clean up all resources
3.3 Non-Functional Requirements
- Handle files of any size
- Handle read errors gracefully
- Print meaningful error messages
- No memory leaks
- Compile without warnings
3.4 Example Usage / Output
$ echo "Hello, libuv!" > test.txt
$ ./uv-cat test.txt
Hello, libuv!
$ ./uv-cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
$ ./uv-cat nonexistent.txt
Error opening file: no such file or directory
3.5 Real World Outcome
A working async file reader that demonstrates:
- Threadpool-backed file operations
- Callback chaining for sequential logic
- Proper buffer and memory management
- Production-quality error handling
4. Solution Architecture
4.1 High-Level Design
┌──────────────────────────────────────────────────────────────────┐
│ main() │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ uv_fs_open(path, on_open) │ │
│ └─────────────────────────────┬──────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ on_open() │ │
│ │ - Check result (error or fd) │ │
│ │ - Save fd for later │ │
│ │ - uv_fs_read(fd, buffer, on_read) │ │
│ └─────────────────────────────┬──────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ on_read() ◄───────────────────┐ │ │
│ │ - If result < 0: error, close │ │ │
│ │ - If result == 0: EOF, close │ │ │
│ │ - If result > 0: print, read more ─────────────────────┘ │ │
│ └─────────────────────────────┬──────────────────────────────┘ │
│ │ (EOF or error) │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ on_close() │ │
│ │ - Cleanup all requests │ │
│ │ - Free buffer │ │
│ │ - Loop exits (no more work) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Type | Purpose |
|---|---|---|
| open_req | uv_fs_t | Request for open operation |
| read_req | uv_fs_t | Request for read operations |
| close_req | uv_fs_t | Request for close operation |
| buffer | char[65536] | Buffer for file data |
| fd | int | File descriptor |
| iov | uv_buf_t | Buffer descriptor for read |
4.3 Data Structures
// File request structure (simplified)
struct uv_fs_s {
    uv_req_type type;    // UV_FS
    uv_loop_t* loop;     // Associated loop
    uv_fs_cb cb;         // Callback function
    ssize_t result;      // Result: bytes read, fd, or error
    void* ptr;           // Internal use
    const char* path;    // File path
    uv_stat_t statbuf;   // For stat operations
    // ... more fields ...
};
// Buffer structure
struct uv_buf_t {
    char* base;          // Pointer to buffer data
    size_t len;          // Buffer length
};
4.4 Algorithm Overview
ALGORITHM: Async File Reader
INPUT: filename from command line
OUTPUT: file contents to stdout
1. PARSE ARGUMENTS
- Get filename from argv[1]
- Validate argument exists
2. OPEN FILE
- Call uv_fs_open(filename, on_open)
- Run loop
3. ON_OPEN CALLBACK
- Check result for errors
- If error, print message and exit
- If success, save fd
- Allocate buffer
- Call uv_fs_read(fd, buffer, on_read)
4. ON_READ CALLBACK
- If result < 0: error, call close
- If result == 0: EOF, call close
- If result > 0:
- Print result bytes to stdout
- Call uv_fs_read again (loop)
5. ON_CLOSE CALLBACK
- Cleanup all uv_fs_t requests
- Free buffer
- Loop exits automatically
5. Implementation Guide
5.1 Development Environment Setup
# Create project directory
mkdir uv-cat && cd uv-cat
# Create files
touch main.c
# Create Makefile
cat > Makefile << 'EOF'
CC = gcc
CFLAGS = -Wall -Wextra -g $(shell pkg-config --cflags libuv)
LDFLAGS = $(shell pkg-config --libs libuv)
uv-cat: main.c
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)

clean:
	rm -f uv-cat
.PHONY: clean
EOF
5.2 Project Structure
uv-cat/
├── Makefile
└── main.c
5.3 The Core Question You’re Answering
How do you sequence multiple asynchronous operations when each depends on the result of the previous one?
The answer: Callback chaining - each callback initiates the next operation.
5.4 Concepts You Must Understand First
- What’s the difference between uv_fs_t and uv_tcp_t?
  - uv_fs_t is a request (one operation)
  - uv_tcp_t is a handle (long-lived)
- Why does file I/O use a threadpool?
  - Kernel FS operations are blocking
  - Can’t use epoll/kqueue for files
- What does req->result contain?
  - For open: file descriptor or negative error
  - For read: bytes read, 0 for EOF, or negative error
5.5 Questions to Guide Your Design
State Management:
- Where do you store the file descriptor between callbacks?
- How does the read callback access the buffer?
- Should you use global variables or a context struct?
Buffer Strategy:
- How big should the buffer be?
- Who allocates the buffer?
- Who frees the buffer?
Error Handling:
- What if open fails?
- What if read fails?
- How do you cleanup on error?
5.6 Thinking Exercise
Trace this scenario before coding:
File: "hello.txt" contains "Hello, World!\n" (14 bytes)
Buffer size: 64KB
Time T0: main() calls uv_fs_open()
- Work posted to threadpool
Time T1: Thread completes open()
- Result: fd = 3
- on_open() scheduled
Time T2: on_open() runs
- Gets fd = 3
- Calls uv_fs_read()
- Work posted to threadpool
Time T3: Thread completes read()
- Result: 14 bytes read
- on_read() scheduled
Time T4: on_read() runs
- Prints "Hello, World!\n"
- Calls uv_fs_read() again
Time T5: Thread completes read()
- Result: 0 (EOF)
- on_read() scheduled
Time T6: on_read() runs
- Sees result == 0
- Calls uv_fs_close()
Time T7: on_close() runs
- Cleanup complete
- Loop exits
Questions:
- How many times does on_read() run?
- What triggers the close operation?
- When does uv_run() return?
5.7 Hints in Layers
Hint 1: Starting Point
Headers you need:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <uv.h>
Key libuv functions:
- uv_fs_open(loop, req, path, flags, mode, callback)
- uv_fs_read(loop, req, fd, bufs, nbufs, offset, callback)
- uv_fs_close(loop, req, fd, callback)
- uv_fs_req_cleanup(req)
Hint 2: Global State Structure
// Global state for simplicity
uv_fs_t open_req;
uv_fs_t read_req;
uv_fs_t close_req;
char buffer[65536];
uv_buf_t iov;
int file_fd;
Initialize the buffer descriptor:
iov = uv_buf_init(buffer, sizeof(buffer));
Hint 3: Callback Signatures
// All uv_fs callbacks have this signature
void on_open(uv_fs_t* req);
void on_read(uv_fs_t* req);
void on_close(uv_fs_t* req);
In on_open:
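if (req->result < 0) {
    fprintf(stderr, "Error opening file: %s\n", uv_strerror((int)req->result));
    uv_fs_req_cleanup(req);
    return;
}
file_fd = (int)req->result;  // Save the fd before cleaning up the request
uv_fs_req_cleanup(req);      // The open request is finished
uv_fs_read(loop, &read_req, file_fd, &iov, 1, -1, on_read);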
Hint 4: Read Loop Logic
void on_read(uv_fs_t* req) {
    ssize_t nread = req->result;  // Copy the result before cleanup
    uv_fs_req_cleanup(req);       // Safe to reuse read_req after this

    if (nread < 0) {
        // Error - report it, then still close the file
        fprintf(stderr, "Read error: %s\n", uv_strerror((int)nread));
        uv_fs_close(loop, &close_req, file_fd, on_close);
    } else if (nread == 0) {
        // EOF - close the file
        uv_fs_close(loop, &close_req, file_fd, on_close);
    } else {
        // Got data - print it
        fwrite(buffer, 1, (size_t)nread, stdout);
        // Read more
        uv_fs_read(loop, &read_req, file_fd, &iov, 1, -1, on_read);
    }
}
5.8 The Interview Questions They’ll Ask
- “Why does libuv use a threadpool for file I/O but not for network I/O?”
  - Network I/O has OS-level async support (epoll/kqueue)
  - File I/O in Linux is fundamentally blocking
  - Windows has async file I/O (IOCP), but libuv uses threadpool for consistency
- “What happens if all threadpool threads are busy?”
  - New work waits in a queue
  - Can cause latency issues
  - Can increase pool size with UV_THREADPOOL_SIZE
- “How would you handle reading multiple files concurrently?”
  - Start multiple uv_fs_open() calls
  - Each has its own request object and callback chain
  - They run in parallel on threadpool threads
- “Why do you need to call uv_fs_req_cleanup()?”
  - Frees internal memory allocated by libuv
  - Path strings, result buffers, etc.
  - Prevents memory leaks
- “What’s the difference between synchronous and async libuv FS calls?”
  - Sync: uv_fs_open(loop, &req, path, flags, mode, NULL)
  - Passing NULL for callback makes it synchronous
  - Blocks the calling thread
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| libuv FS operations | An Introduction to libuv | Chapter on Filesystem |
| POSIX file I/O | Advanced Programming in UNIX | Chapters 3-4 |
| Buffer management | C Programming: A Modern Approach | Chapter 12 (Pointers) |
| Thread pools | C++ Concurrency in Action | Thread pool patterns |
5.10 Implementation Phases
Phase 1: Synchronous Version (1 hour)
First, build a working version using synchronous calls:
uv_fs_t req;
uv_fs_open(loop, &req, filename, O_RDONLY, 0, NULL); // NULL = sync
int fd = req.result;
// ... read and close synchronously
This helps you understand the API before adding callbacks.
Phase 2: Async Open (1 hour)
Convert open to async:
- Add on_open callback
- Handle success and error cases
- Don’t read yet, just print “Opened: fd = X”
Phase 3: Async Read (2 hours)
Add the read loop:
- Implement on_read callback
- Handle data, EOF, and error cases
- Print data to stdout
- Loop until EOF
Phase 4: Async Close (1 hour)
Add proper cleanup:
- Implement on_close callback
- Call uv_fs_req_cleanup() on all requests
- Verify with Valgrind
5.11 Key Implementation Decisions
| Decision | Options | Recommendation |
|---|---|---|
| State storage | Global vars vs struct | Global for simplicity |
| Buffer size | 4KB, 64KB, 1MB | 64KB (good balance) |
| Buffer allocation | Static vs heap | Static global array (simpler, no leaks) |
| Error handling | Exit vs continue | Exit on error (simpler) |
| Read offset | Track position vs -1 | Use -1 (current position) |
6. Testing Strategy
Test Files
# Create test files
echo "Hello, World!" > small.txt
dd if=/dev/urandom of=large.bin bs=1M count=10 # 10MB binary
printf "Line 1\nLine 2\nLine 3\n" > lines.txt
touch empty.txt
Test Cases
| Test | Command | Expected |
|---|---|---|
| Small file | ./uv-cat small.txt | “Hello, World!” |
| Large file | ./uv-cat large.bin \| wc -c | 10485760 |
| Empty file | ./uv-cat empty.txt | (no output) |
| Non-existent | ./uv-cat nope.txt | Error message |
| No argument | ./uv-cat | Usage message |
Memory Testing
# Check for leaks
valgrind --leak-check=full ./uv-cat small.txt
# Check with large file
valgrind --leak-check=full ./uv-cat large.bin > /dev/null
7. Common Pitfalls & Debugging
| Problem | Symptom | Root Cause | Fix |
|---|---|---|---|
| Double cleanup | Crash or corruption | Calling cleanup twice | Track cleanup state |
| Use after cleanup | Crash | Accessing req after cleanup | Copy req->result first; clean up only when done with the request |
| Missing cleanup | Memory leak | Forgot uv_fs_req_cleanup() | Call it on every request before reuse or exit |
| Wrong offset | Wrong data | Not using -1 for offset | Use -1 to read at current position |
| Buffer overflow | Corruption | Buffer too small | Use adequate size |
| Infinite loop | Hangs | Never reaching EOF | Check read result == 0 |
Debugging Tips
// Add debug prints
void on_read(uv_fs_t* req) {
fprintf(stderr, "[DEBUG] on_read: result = %zd\n", req->result);
// ...
}
8. Extensions & Challenges
Extension 1: Multiple Files
Read multiple files from command line:
./uv-cat file1.txt file2.txt file3.txt
Challenge: How do you chain multiple file operations?
Extension 2: Write Output
Add -o output.txt flag to write instead of stdout.
Challenge: You’ll need uv_fs_write() and proper ordering.
Extension 3: File Stats
Print file size and modification time before contents:
$ ./uv-cat file.txt
Size: 1234 bytes
Modified: 2024-01-15 10:30:00
Contents:
...
Challenge: Use uv_fs_stat() before reading.
Extension 4: Progress Bar
For large files, show a progress bar:
$ ./uv-cat large.bin > output.bin
[=========> ] 50% (5MB / 10MB)
Challenge: Need to stat file first, track bytes read.
9. Real-World Connections
How Node.js Uses This
// JavaScript
const fs = require('fs');
fs.readFile('data.txt', (err, data) => {
console.log(data);
});
// Translates to (simplified):
// 1. uv_fs_open() - open file
// 2. uv_fs_fstat() - get file size
// 3. uv_fs_read() - read entire file
// 4. uv_fs_close() - close file
// 5. Deliver data to JavaScript callback
Production Patterns
| Use Case | Pattern |
|---|---|
| Static file server | Async read into response |
| Log rotation | Async read, transform, write |
| Config loading | Async read, parse JSON |
| File watching | uv_fs_event + read on change |
10. Resources
Documentation
Reference Implementations
11. Self-Assessment Checklist
Before moving to Project 3, verify:
- Program reads and prints files correctly
- Handles empty files
- Handles large files (test with 100MB+)
- Prints meaningful error for missing files
- No memory leaks (Valgrind clean)
- You can explain the callback chain
- You understand why threadpool is used
- You know when to call uv_fs_req_cleanup()
12. Submission / Completion Criteria
Your project is complete when:
- Functional: Correctly reads and prints any file
- Robust: Handles errors gracefully
- Clean: No warnings, no memory leaks
- Documented: Comments explain the callback flow
Bonus: Implement at least one extension.