Project 2: An Asynchronous cat Clone

Build a file reader that uses libuv’s threadpool-backed async file operations, mastering callback chaining for sequential asynchronous logic.

Quick Reference

| Attribute | Value |
| --- | --- |
| Difficulty | Advanced |
| Time Estimate | Weekend |
| Language | C |
| Prerequisites | Project 1, solid C skills (pointers, memory) |
| Key Topics | File I/O, threadpool, callback chaining, buffer management |

1. Learning Objectives

By completing this project, you will:

  1. Use libuv’s asynchronous file system operations (uv_fs_*)
  2. Understand how libuv uses a threadpool for blocking operations
  3. Master callback chaining for sequential async operations
  4. Manage buffers and memory in an async context
  5. Handle errors properly across callback boundaries
  6. Clean up uv_fs_t request objects correctly

2. Theoretical Foundation

2.1 Core Concepts

Asynchronous File I/O

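Unlike network sockets, regular files do not work with readiness mechanisms like epoll or kqueue: the kernel treats a regular file as always ready, even when the read will have to wait for the disk. The portable file system calls (open, read, close) are therefore blocking.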

libuv solves this by using a thread pool:

┌─────────────────────────────────────────────────────────────────┐
│                     Your C Code (Main Thread)                    │
│                                                                  │
│  uv_fs_open(loop, &req, "file.txt", O_RDONLY, 0, on_open);      │
│                         │                                        │
│                         │ Posts work to thread pool              │
│                         ▼                                        │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                     Thread Pool (4 threads default)          ││
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐       ││
│  │  │ Thread 1 │ │ Thread 2 │ │ Thread 3 │ │ Thread 4 │       ││
│  │  │          │ │          │ │          │ │          │       ││
│  │  │ open()   │ │  (idle)  │ │  (idle)  │ │  (idle)  │       ││
│  │  │ blocking │ │          │ │          │ │          │       ││
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘       ││
│  └─────────────────────────────────────────────────────────────┘│
│                         │                                        │
│                         │ Work completes                         │
│                         ▼                                        │
│  Event loop picks up result, calls on_open() in main thread      │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Request Objects

For file operations, libuv uses request objects (uv_fs_t) instead of handles:

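// Request object holds state for one operation
uv_fs_t open_req;

// The callback receives the request that just completed
void on_open(uv_fs_t* req) {
    int fd = (int)req->result;  // File descriptor, or a negative UV_* error code
    uv_fs_req_cleanup(req);     // Release libuv's per-request allocations
    // ... use fd, or report the error ...
}

// Start the operation (returns immediately; on_open runs later on the loop thread)
uv_fs_open(loop, &open_req, "file.txt", O_RDONLY, 0, on_open);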

Key differences from handles:

| Handles | Requests |
| --- | --- |
| Long-lived | Short-lived (one operation) |
| Must be uv_close()d | Must call uv_fs_req_cleanup() |
| Represent a resource | Represent an operation |
| e.g., uv_tcp_t | e.g., uv_fs_t |

Callback Chaining

Since operations are asynchronous, you chain them through callbacks:

on_open()  ──►  on_read()  ──►  on_read()  ──►  on_close()
    │              │              │               │
    ▼              ▼              ▼               ▼
Got FD         Got data      Got more data    Cleanup done
Start read     Print it      Print it         Exit
               Read more     EOF? Close

2.2 Why This Matters

Async file I/O is critical for:

  • High-performance servers: Serving static files without blocking
  • Database engines: Reading/writing data files
  • Build tools: Parallel file processing
  • Log processors: Tailing files without blocking

Understanding the threadpool model helps you:

  • Know when operations truly run in parallel
  • Tune the pool size (UV_THREADPOOL_SIZE)
  • Understand Node.js fs module internals

2.3 Historical Context

  • Traditional I/O: Programs called read() and blocked
  • 1990s: Thread-per-connection model for parallelism
  • 2000s: Event loops for network, but FS still blocking
  • 2011: libuv introduces threadpool for async FS
  • Today: Standard pattern in Node.js, Julia, etc.

2.4 Common Misconceptions

| Misconception | Reality |
| --- | --- |
| “Async file I/O uses kernel async” | Uses threadpool, not kernel async |
| “All threads are always busy” | The pool has only 4 threads by default, and they sit idle until work is queued |
| “Callbacks run in thread pool” | Callbacks run in main thread |
| “You can use same request twice” | Call uv_fs_req_cleanup() before reusing it, or use a new request |

3. Project Specification

3.1 What You Will Build

A program that reads a file specified on the command line and prints its contents to stdout, using libuv’s async file operations.

3.2 Functional Requirements

  1. Accept filename as command-line argument
  2. Open file asynchronously
  3. Read file in chunks (64KB at a time)
  4. Print each chunk to stdout
  5. Repeat reading until EOF
  6. Close file asynchronously
  7. Clean up all resources

3.3 Non-Functional Requirements

  1. Handle files of any size
  2. Handle read errors gracefully
  3. Print meaningful error messages
  4. No memory leaks
  5. Compile without warnings

3.4 Example Usage / Output

$ echo "Hello, libuv!" > test.txt
$ ./uv-cat test.txt
Hello, libuv!

$ ./uv-cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...

$ ./uv-cat nonexistent.txt
Error opening file: no such file or directory

3.5 Real World Outcome

A working async file reader that demonstrates:

  • Threadpool-backed file operations
  • Callback chaining for sequential logic
  • Proper buffer and memory management
  • Production-quality error handling

4. Solution Architecture

4.1 High-Level Design

┌──────────────────────────────────────────────────────────────────┐
│                          main()                                   │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ uv_fs_open(path, on_open)                                  │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │                                  │
│                                ▼                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ on_open()                                                  │  │
│  │   - Check result (error or fd)                             │  │
│  │   - Save fd for later                                      │  │
│  │   - uv_fs_read(fd, buffer, on_read)                       │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │                                  │
│                                ▼                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ on_read()                            ◄───────────────────┐ │  │
│  │   - If result < 0: error, close                          │ │  │
│  │   - If result == 0: EOF, close                           │ │  │
│  │   - If result > 0: print, read more ─────────────────────┘ │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │ (EOF or error)                   │
│                                ▼                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ on_close()                                                 │  │
│  │   - Cleanup all requests                                   │  │
│  │   - Free buffer                                            │  │
│  │   - Loop exits (no more work)                              │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                   │
└──────────────────────────────────────────────────────────────────┘

4.2 Key Components

| Component | Type | Purpose |
| --- | --- | --- |
| open_req | uv_fs_t | Request for open operation |
| read_req | uv_fs_t | Request for read operations |
| close_req | uv_fs_t | Request for close operation |
| buffer | char[65536] | Buffer for file data |
| fd | int | File descriptor |
| iov | uv_buf_t | Buffer descriptor for read |

4.3 Data Structures

// File request structure (simplified)
struct uv_fs_s {
    uv_req_type type;      // UV_FS
    uv_loop_t* loop;       // Associated loop
    uv_fs_cb cb;           // Callback function
    ssize_t result;        // Result: bytes read, fd, or error
    void* ptr;             // Operation-specific result data (e.g. readlink target)
    const char* path;      // File path
    uv_stat_t statbuf;     // For stat operations
    // ... more fields ...
};

// Buffer structure
struct uv_buf_t {
    char* base;    // Pointer to buffer data
    size_t len;    // Buffer length
};

4.4 Algorithm Overview

ALGORITHM: Async File Reader

INPUT: filename from command line
OUTPUT: file contents to stdout

1. PARSE ARGUMENTS
   - Get filename from argv[1]
   - Validate argument exists

2. OPEN FILE
   - Call uv_fs_open(filename, on_open)
   - Run loop

3. ON_OPEN CALLBACK
   - Check result for errors
   - If error, print message and exit
   - If success, save fd
   - Allocate buffer
   - Call uv_fs_read(fd, buffer, on_read)

4. ON_READ CALLBACK
   - If result < 0: error, call close
   - If result == 0: EOF, call close
   - If result > 0:
     - Print result bytes to stdout
     - Call uv_fs_read again (loop)

5. ON_CLOSE CALLBACK
   - Cleanup all uv_fs_t requests
   - Free buffer
   - Loop exits automatically

5. Implementation Guide

5.1 Development Environment Setup

# Create project directory
mkdir uv-cat && cd uv-cat

# Create files
touch main.c

# Create Makefile
cat > Makefile << 'EOF'
CC = gcc
CFLAGS = -Wall -Wextra -g $(shell pkg-config --cflags libuv)
LDFLAGS = $(shell pkg-config --libs libuv)

uv-cat: main.c
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)

clean:
	rm -f uv-cat

.PHONY: clean
EOF

5.2 Project Structure

uv-cat/
├── Makefile
└── main.c

5.3 The Core Question You’re Answering

How do you sequence multiple asynchronous operations when each depends on the result of the previous one?

The answer: Callback chaining - each callback initiates the next operation.

5.4 Concepts You Must Understand First

  1. What’s the difference between uv_fs_t and uv_tcp_t?
    • uv_fs_t is a request (one operation)
    • uv_tcp_t is a handle (long-lived)
  2. Why does file I/O use a threadpool?
    • Kernel FS operations are blocking
    • Can’t use epoll/kqueue for files
  3. What does req->result contain?
    • For open: file descriptor or negative error
    • For read: bytes read, 0 for EOF, or negative error

5.5 Questions to Guide Your Design

State Management:

  • Where do you store the file descriptor between callbacks?
  • How does the read callback access the buffer?
  • Should you use global variables or a context struct?

Buffer Strategy:

  • How big should the buffer be?
  • Who allocates the buffer?
  • Who frees the buffer?

Error Handling:

  • What if open fails?
  • What if read fails?
  • How do you cleanup on error?

5.6 Thinking Exercise

Trace this scenario before coding:

File: "hello.txt" contains "Hello, World!\n" (14 bytes)
Buffer size: 64KB

Time T0: main() calls uv_fs_open()
         - Work posted to threadpool

Time T1: Thread completes open()
         - Result: fd = 3
         - on_open() scheduled

Time T2: on_open() runs
         - Gets fd = 3
         - Calls uv_fs_read()
         - Work posted to threadpool

Time T3: Thread completes read()
         - Result: 14 bytes read
         - on_read() scheduled

Time T4: on_read() runs
         - Prints "Hello, World!\n"
         - Calls uv_fs_read() again

Time T5: Thread completes read()
         - Result: 0 (EOF)
         - on_read() scheduled

Time T6: on_read() runs
         - Sees result == 0
         - Calls uv_fs_close()

Time T7: on_close() runs
         - Cleanup complete
         - Loop exits

Questions:

  1. How many times does on_read() run?
  2. What triggers the close operation?
  3. When does uv_run() return?

5.7 Hints in Layers

Hint 1: Starting Point

Headers you need:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <uv.h>

Key libuv functions:

  • uv_fs_open(loop, req, path, flags, mode, callback)
  • uv_fs_read(loop, req, fd, bufs, nbufs, offset, callback)
  • uv_fs_close(loop, req, fd, callback)
  • uv_fs_req_cleanup(req)

Hint 2: Global State Structure

// Global state for simplicity
uv_loop_t* loop;      // set in main(), e.g. loop = uv_default_loop();
uv_fs_t open_req;
uv_fs_t read_req;
uv_fs_t close_req;

char buffer[65536];
uv_buf_t iov;
int file_fd;

Initialize the buffer descriptor:

iov = uv_buf_init(buffer, sizeof(buffer));

Hint 3: Callback Signatures

// All uv_fs callbacks have this signature
void on_open(uv_fs_t* req);
void on_read(uv_fs_t* req);
void on_close(uv_fs_t* req);

In on_open:

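if (req->result < 0) {
    fprintf(stderr, "Error opening file: %s\n", uv_strerror((int)req->result));
    uv_fs_req_cleanup(req);
    return;
}
file_fd = (int)req->result;  // copy the fd out before cleanup
uv_fs_req_cleanup(req);      // the open request is finished
uv_fs_read(loop, &read_req, file_fd, &iov, 1, -1, on_read);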

Hint 4: Read Loop Logic

void on_read(uv_fs_t* req) {
    ssize_t nread = req->result;  // copy the result before cleanup
    uv_fs_req_cleanup(req);       // every completed request needs this

    if (nread > 0) {
        // Got data - print exactly nread bytes (the buffer is not NUL-terminated)
        fwrite(buffer, 1, (size_t)nread, stdout);
        // Read more; offset -1 continues from the current file position
        uv_fs_read(loop, &read_req, file_fd, &iov, 1, -1, on_read);
        return;
    }
    if (nread < 0) {
        fprintf(stderr, "Read error: %s\n", uv_strerror((int)nread));
    }
    // EOF or error - close the file either way
    uv_fs_close(loop, &close_req, file_fd, on_close);
}

5.8 The Interview Questions They’ll Ask

  1. “Why does libuv use a threadpool for file I/O but not for network I/O?”
    • Network I/O has OS-level async support (epoll/kqueue)
    • Regular files always look “ready” to epoll/kqueue, so readiness notification cannot tell you when a read would block
    • Windows has async file I/O (IOCP), but libuv uses threadpool for consistency
  2. “What happens if all threadpool threads are busy?”
    • New work waits in a queue
    • Can cause latency issues
    • Can increase pool size with UV_THREADPOOL_SIZE
  3. “How would you handle reading multiple files concurrently?”
    • Start multiple uv_fs_open() calls
    • Each has its own request object and callback chain
    • They run in parallel on threadpool threads
  4. “Why do you need to call uv_fs_req_cleanup()?”
    • Frees internal memory allocated by libuv
    • Path strings, result buffers, etc.
    • Prevents memory leaks
  5. “What’s the difference between synchronous and async libuv FS calls?”
    • Sync: uv_fs_open(loop, &req, path, flags, mode, NULL)
    • Passing NULL for callback makes it synchronous
    • Blocks the calling thread

5.9 Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| libuv FS operations | An Introduction to libuv | Chapter on Filesystem |
| POSIX file I/O | Advanced Programming in the UNIX Environment | Chapters 3-4 |
| Buffer management | C Programming: A Modern Approach | Chapter 12 (Pointers) |
| Thread pools | C++ Concurrency in Action | Thread pool patterns |

5.10 Implementation Phases

Phase 1: Synchronous Version (1 hour)

First, build a working version using synchronous calls:

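uv_fs_t req;
int fd = uv_fs_open(loop, &req, filename, O_RDONLY, 0, NULL);  // NULL = sync
uv_fs_req_cleanup(&req);  // sync requests need cleanup too
if (fd < 0) {
    fprintf(stderr, "Error opening file: %s\n", uv_strerror(fd));
    return 1;
}
// ... read and close synchronously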

This helps you understand the API before adding callbacks.

Phase 2: Async Open (1 hour)

Convert open to async:

  • Add on_open callback
  • Handle success and error cases
  • Don’t read yet, just print “Opened: fd = X”

Phase 3: Async Read (2 hours)

Add the read loop:

  • Implement on_read callback
  • Handle data, EOF, and error cases
  • Print data to stdout
  • Loop until EOF

Phase 4: Async Close (1 hour)

Add proper cleanup:

  • Implement on_close callback
  • Call uv_fs_req_cleanup() on all requests
  • Verify with Valgrind

5.11 Key Implementation Decisions

| Decision | Options | Recommendation |
| --- | --- | --- |
| State storage | Global vars vs struct | Global for simplicity |
| Buffer size | 4KB, 64KB, 1MB | 64KB (good balance) |
| Buffer allocation | Global (static) vs heap | Global, as in Hint 2 (simpler, nothing to free) |
| Error handling | Exit vs continue | Exit on error (simpler) |
| Read offset | Track position vs -1 | Use -1 (current position) |

6. Testing Strategy

Test Files

# Create test files
echo "Hello, World!" > small.txt
dd if=/dev/urandom of=large.bin bs=1M count=10  # 10MB binary
printf "Line 1\nLine 2\nLine 3\n" > lines.txt
touch empty.txt

Test Cases

| Test | Command | Expected |
| --- | --- | --- |
| Small file | ./uv-cat small.txt | “Hello, World!” |
| Large file | ./uv-cat large.bin \| wc -c | 10485760 |
| Empty file | ./uv-cat empty.txt | (no output) |
| Non-existent | ./uv-cat nope.txt | Error message |
| No argument | ./uv-cat | Usage message |

Memory Testing

# Check for leaks
valgrind --leak-check=full ./uv-cat small.txt

# Check with large file
valgrind --leak-check=full ./uv-cat large.bin > /dev/null

7. Common Pitfalls & Debugging

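| Problem | Symptom | Root Cause | Fix |
| --- | --- | --- | --- |
| Double cleanup | Crash or corruption | Calling cleanup twice | Clean up exactly once per completed operation |
| Use after cleanup | Crash | Accessing req after cleanup | Copy req->result into a local before calling cleanup |
| Missing cleanup | Memory leak | Forgot uv_fs_req_cleanup() | Call it in every callback (on_open, on_read, on_close) |
| Wrong offset | Wrong data | Not using -1 for offset | Use -1 to read at current position |
| Buffer overflow | Corruption | Buffer smaller than iov.len, or printing the buffer as a NUL-terminated string | Size iov from sizeof(buffer) and write exactly req->result bytes |
| Infinite loop | Hangs | Re-reading from a fixed offset, so EOF is never reached | Use offset -1 and stop when result == 0 |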

Debugging Tips

// Add debug prints
void on_read(uv_fs_t* req) {
    fprintf(stderr, "[DEBUG] on_read: result = %zd\n", req->result);
    // ...
}

8. Extensions & Challenges

Extension 1: Multiple Files

Read multiple files from command line:

./uv-cat file1.txt file2.txt file3.txt

Challenge: How do you chain multiple file operations?

Extension 2: Write Output

Add -o output.txt flag to write instead of stdout.

Challenge: You’ll need uv_fs_write() and proper ordering.

Extension 3: File Stats

Print file size and modification time before contents:

$ ./uv-cat file.txt
Size: 1234 bytes
Modified: 2024-01-15 10:30:00
Contents:
...

Challenge: Use uv_fs_stat() before reading.

Extension 4: Progress Bar

For large files, show a progress bar:

$ ./uv-cat large.bin > output.bin
[=========>          ] 50% (5MB / 10MB)

Challenge: Need to stat file first, track bytes read.
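The rendering part of the progress bar is independent of libuv and can be sketched on its own. format_progress is a hypothetical helper; it assumes a 20-character bar and whole-megabyte labels, matching the sample output above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: renders a progress line such as
 * "[=========>          ] 50% (5MB / 10MB)" into `out`. */
void format_progress(uint64_t done, uint64_t total, char* out, size_t cap) {
    enum { WIDTH = 20 };
    char bar[WIDTH + 1];
    uint64_t raw;
    unsigned pct;
    unsigned filled;

    assert(out != NULL && cap > 0);

    /* An empty file counts as complete; clamp in case done exceeds total. */
    raw = total == 0 ? 100 : done * 100 / total;
    pct = raw > 100 ? 100u : (unsigned)raw;
    filled = pct * WIDTH / 100u;

    memset(bar, ' ', WIDTH);
    if (filled > 0) {
        memset(bar, '=', filled - 1);  /* body of the bar */
        bar[filled - 1] = '>';         /* arrow head at the leading edge */
    }
    bar[WIDTH] = '\0';

    snprintf(out, cap, "[%s] %u%% (%lluMB / %lluMB)", bar, pct,
             (unsigned long long)(done >> 20),
             (unsigned long long)(total >> 20));
}
```

In the program, on_read() would add each chunk's size to a running total and print this line to stderr, since stdout is carrying the file contents.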


9. Real-World Connections

How Node.js Uses This

// JavaScript
const fs = require('fs');
fs.readFile('data.txt', (err, data) => {
    console.log(data);
});

// Translates to (simplified):
// 1. uv_fs_open() - open file
// 2. uv_fs_fstat() - get file size
// 3. uv_fs_read() - read entire file
// 4. uv_fs_close() - close file
// 5. Deliver data to JavaScript callback

Production Patterns

| Use Case | Pattern |
| --- | --- |
| Static file server | Async read into response |
| Log rotation | Async read, transform, write |
| Config loading | Async read, parse JSON |
| File watching | uv_fs_event + read on change |

10. Resources

Documentation

Reference Implementations


11. Self-Assessment Checklist

Before moving to Project 3, verify:

  • Program reads and prints files correctly
  • Handles empty files
  • Handles large files (test with 100MB+)
  • Prints meaningful error for missing files
  • No memory leaks (Valgrind clean)
  • You can explain the callback chain
  • You understand why threadpool is used
  • You know when to call uv_fs_req_cleanup()

12. Submission / Completion Criteria

Your project is complete when:

  1. Functional: Correctly reads and prints any file
  2. Robust: Handles errors gracefully
  3. Clean: No warnings, no memory leaks
  4. Documented: Comments explain the callback flow

Bonus: Implement at least one extension.

