Project 20: HTTP Web Server
Build a small but real HTTP server in C: parse requests, serve static files, and run simple CGI-style dynamic handlers. This project ties together Unix I/O, socket programming, and process control into a working networked application.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (alt: Rust, Go) |
| Difficulty | Intermediate |
| Time | 1-2 weeks |
| Chapters | 10, 11, 8 |
| Coolness | ★★★★☆ Real Server |
| Portfolio Value | Interview Gold |
Learning Objectives
By completing this project, you will:
- Master socket programming fundamentals: Understand the socket lifecycle (socket/bind/listen/accept) and client-server communication patterns
- Implement the HTTP/1.0 protocol: Parse request lines and headers, generate proper response headers with status codes
- Handle file I/O for static content: Read files from disk and serve them with correct MIME types and content lengths
- Build robust request parsing: Defensively handle malformed requests, long lines, and edge cases
- Implement CGI for dynamic content: Fork child processes, set up environment variables, and redirect I/O for executable scripts
- Understand concurrent server models: Compare iterative vs. forking vs. threaded approaches
- Apply security best practices: Prevent path traversal attacks, validate inputs, and handle resource limits
- Debug network applications: Use tools like curl, netcat, and tcpdump to diagnose issues
- Connect Unix I/O with networking: See how file descriptors unify disk files, sockets, and pipes
- Prepare for production systems: Understand what production servers like nginx and Apache do beyond this implementation
Deep Theoretical Foundation
The Big Picture: Client-Server Architecture
Before diving into sockets, understand where web servers fit in networked systems:
┌────────────────────────────────────────────────────────────────────────┐
│ CLIENT-SERVER ARCHITECTURE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ CLIENT SERVER │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Web Browser │ │ Web Server │ │
│ │ (Chrome, curl) │ │ (tiny, nginx) │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ │ 1. Initiate connection │ │
│ │ ────────────────────────────>│ │
│ │ │ │
│ │ 2. Send HTTP request │ │
│ │ ────────────────────────────>│ │
│ │ │ │
│ │ 3. Process request │ │
│ │ │ ← Read file / Run CGI │
│ │ │ │
│ │ 4. Send HTTP response │ │
│ │ <────────────────────────────│ │
│ │ │ │
│ │ 5. Close connection │ │
│ │ <────────────────────────────│ │
│ │
│ This is HTTP/1.0 - one request per connection. │
│ HTTP/1.1 adds keep-alive for multiple requests per connection. │
│ │
└────────────────────────────────────────────────────────────────────────┘
Socket Programming Fundamentals
What is a Socket?
A socket is an endpoint for communication - think of it as a “file descriptor for the network.” Just like you can read/write files, you can read/write sockets. The key insight from CS:APP is that Unix I/O unifies everything as file descriptors.
┌────────────────────────────────────────────────────────────────────────┐
│ SOCKET ABSTRACTION │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Unix treats everything as a file: │
│ │
│ FILE DESCRIPTOR TABLE │
│ ┌─────┬──────────────────────────────────────────┐ │
│ │ 0 │ stdin - keyboard input │ │
│ │ 1 │ stdout - terminal output │ │
│ │ 2 │ stderr - error output │ │
│ │ 3 │ regular file - open("/etc/passwd", ...) │ │
│ │ 4 │ socket - socket(AF_INET, ...) │ ← Network! │
│ │ 5 │ pipe - pipe() │ │
│ └─────┴──────────────────────────────────────────┘ │
│ │
│ For fd 3 (file): read(3, buf, n) → bytes from disk │
│ For fd 4 (socket): read(4, buf, n) → bytes from network │
│ │
│ Same system calls, different underlying resources! │
│ │
│ SOCKET ADDRESS = (IP address, Port number) │
│ │
│ Client: 192.168.1.100:54321 (ephemeral port) │
│ Server: 93.184.216.34:80 (well-known port) │
│ │
└────────────────────────────────────────────────────────────────────────┘
The Socket API: Server Side
The server uses a specific sequence of system calls to accept connections:
┌────────────────────────────────────────────────────────────────────────┐
│ SERVER SOCKET LIFECYCLE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: socket() - Create a socket │
│ ───────────────────────────────── │
│ │
│ int listenfd = socket(AF_INET, SOCK_STREAM, 0); │
│ │
│ AF_INET: IPv4 Internet protocols │
│ SOCK_STREAM: TCP (reliable, connection-oriented) │
│ 0: Default protocol (TCP for SOCK_STREAM) │
│ │
│ Returns: File descriptor for the socket (e.g., 3) │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ KERNEL SPACE │ │
│ │ ┌────────────────────────────────────────────┐ │ │
│ │ │ Socket created but not bound to address │ │ │
│ │ │ State: CLOSED │ │ │
│ │ └────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ STEP 2: bind() - Assign address to socket │
│ ───────────────────────────────────────── │
│ │
│ struct sockaddr_in addr; │
│ memset(&addr, 0, sizeof(addr)); // Zero unused fields │
│ addr.sin_family = AF_INET; │
│ addr.sin_port = htons(8080); // Port 8080 │
│ addr.sin_addr.s_addr = htonl(INADDR_ANY); // Any local interface │
│ │
│ bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)); │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Socket now bound to 0.0.0.0:8080 │ │
│ │ State: CLOSED (but with address) │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ STEP 3: listen() - Mark socket as passive (accepting connections) │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ listen(listenfd, SOMAXCONN); // Backlog queue size │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Socket marked as LISTENING │ │
│ │ Kernel creates backlog queue for pending conns │ │
│ │ State: LISTEN │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ STEP 4: accept() - Wait for and accept a connection │
│ ─────────────────────────────────────────────────── │
│ │
│ struct sockaddr_in clientaddr; │
│ socklen_t clientlen = sizeof(clientaddr); │
│ int connfd = accept(listenfd, (struct sockaddr *)&clientaddr, │
│ &clientlen); │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ BLOCKING: Waits until client connects │ │
│ │ Returns NEW file descriptor (connfd) for THIS │ │
│ │ specific client connection │ │
│ │ listenfd remains open for more connections │ │
│ └──────────────────────────────────────────────────┘ │
│ │
│ IMPORTANT DISTINCTION: │
│ - listenfd: Used only for accepting new connections │
│ - connfd: Used for reading/writing with the client │
│ │
└────────────────────────────────────────────────────────────────────────┘
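Steps 1 to 3 above are usually wrapped in one helper. What follows is a minimal IPv4-only sketch of `open_listenfd`, not a definitive implementation: the `unsigned short` parameter and the `-1` error convention are assumptions made here, and the `SO_REUSEADDR` option is an addition that lets the server restart while old connections are still in TIME-WAIT.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on all IPv4 interfaces.
 * Returns the listening descriptor, or -1 on error. */
int open_listenfd(unsigned short port)
{
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;

    /* Allow quick restarts while old connections sit in TIME-WAIT. */
    int optval = 1;
    if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
                   &optval, sizeof(optval)) < 0) {
        close(listenfd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);              /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* any local interface */

    if (bind(listenfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listenfd, SOMAXCONN) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}
```

Passing port 0 asks the kernel to pick a free port, which is handy in tests; `getsockname()` then reports the port that was chosen.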
The Complete Server Accept Loop
┌────────────────────────────────────────────────────────────────────────┐
│ SERVER ACCEPT LOOP │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ int main() { │
│ int listenfd = open_listenfd(port); // Steps 1-3 │
│ │
│ while (1) { // Infinite loop │
│ connfd = accept(listenfd, ...); // Block until client │
│ handle_request(connfd); // Process request │
│ close(connfd); // Done with this client │
│ } │
│ } │
│ │
│ TIMELINE: │
│ │
│ Server Client │
│ ────── ────── │
│ socket() │
│ bind() │
│ listen() │
│ accept() ─────┐ │
│ BLOCKED │ │
│ │ connect() ──────────────────┐ │
│ │ │ │
│ │<─────── TCP 3-way handshake ───────────────────>│ │
│ │ │ │
│ UNBLOCKED ────┘ Connection │ │
│ connfd = 5 established │ │
│ │ │
│ read(connfd) <──────────────────────────── write() "GET /..." │ │
│ │
│ (process request) │
│ │
│ write(connfd) ────────────────────────────> read() response │
│ │
│ close(connfd) │
│ │
│ accept() ─────┐ │
│ BLOCKED │ Ready for next client │
│ │ │
│ │
└────────────────────────────────────────────────────────────────────────┘
TCP Connection Details
Understanding TCP is essential for debugging network issues:
┌────────────────────────────────────────────────────────────────────────┐
│ TCP THREE-WAY HANDSHAKE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Client Server │
│ ────── ────── │
│ │
│ State: CLOSED State: LISTEN │
│ │
│ ───────── SYN (seq=x) ─────────────> │
│ │
│ State: SYN-SENT State: SYN-RECEIVED │
│ │
│ <──────── SYN+ACK (seq=y, ack=x+1) ── │
│ │
│ State: ESTABLISHED │
│ │
│ ───────── ACK (ack=y+1) ────────────> │
│ │
│ State: ESTABLISHED │
│ │
│ WHY THREE STEPS? │
│ ───────────────── │
│ 1. SYN: Client says "I want to connect, my sequence number is x" │
│ 2. SYN+ACK: Server says "OK, my sequence number is y, I got your x" │
│ 3. ACK: Client says "I got your y" - connection established │
│ │
│ This ensures BOTH sides know the other is ready. │
│ │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ TCP CONNECTION TERMINATION │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Client Server │
│ ────── ────── │
│ │
│ close(connfd) │
│ ───────── FIN ──────────────────────> │
│ │
│ State: FIN-WAIT-1 State: CLOSE-WAIT │
│ │
│ <──────── ACK ────────────────────── │
│ │
│ State: FIN-WAIT-2 │
│ (server may still send │
│ data if needed) │
│ │
│ close(connfd) │
│ <──────── FIN ────────────────────── │
│ │
│ State: TIME-WAIT State: LAST-ACK │
│ │
│ ───────── ACK ──────────────────────> │
│ │
│ (wait 2*MSL) State: CLOSED │
│ │
│ State: CLOSED │
│ │
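│ TIME-WAIT: The side that closes first (the client here) waits so it │
│ can re-send the final ACK if it was lost │
│ MSL = Maximum Segment Lifetime (RFC 793: 2 minutes; Linux keeps │
│ TIME-WAIT for a fixed 60 seconds) │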
│ │
└────────────────────────────────────────────────────────────────────────┘
HTTP Protocol Basics
HTTP (Hypertext Transfer Protocol) is a text-based request-response protocol:
┌────────────────────────────────────────────────────────────────────────┐
│ HTTP REQUEST FORMAT │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ REQUEST LINE: │
│ ────────────── │
│ METHOD SP URI SP VERSION CRLF │
│ │
│ Example: "GET /index.html HTTP/1.0\r\n" │
│ │
│ ┌─────────┬──────────────┬─────────────┐ │
│ │ GET │ /index.html │ HTTP/1.0 │ │
│ └─────────┴──────────────┴─────────────┘ │
│ METHOD URI VERSION │
│ │
│ COMMON METHODS: │
│ - GET: Retrieve a resource │
│ - POST: Submit data to a resource │
│ - HEAD: Like GET but only headers (no body) │
│ - PUT: Replace a resource │
│ - DELETE: Remove a resource │
│ │
│ HEADERS: │
│ ──────── │
│ header-name: header-value CRLF │
│ header-name: header-value CRLF │
│ ... │
│ CRLF (empty line marks end of headers) │
│ │
│ Example headers: │
│ Host: www.example.com\r\n │
│ User-Agent: curl/7.68.0\r\n │
│ Accept: */*\r\n │
│ \r\n │
│ │
│ BODY (optional, for POST/PUT): │
│ ───── │
│ Raw bytes as specified by Content-Length header │
│ │
└────────────────────────────────────────────────────────────────────────┘
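The request-line grammar above maps onto a small parser. This is a sketch under stated assumptions: the buffer sizes (`METHOD_MAX`, `URI_MAX`, `VERSION_MAX`) are arbitrary limits chosen here, and the function only checks the overall shape of the line, not whether the method or version is supported.

```c
#include <stdio.h>
#include <string.h>

#define METHOD_MAX  16
#define URI_MAX     1024
#define VERSION_MAX 16

/* Split "METHOD SP URI SP VERSION CRLF" into its three fields.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_request_line(const char *line,
                       char method[METHOD_MAX],
                       char uri[URI_MAX],
                       char version[VERSION_MAX])
{
    /* Field widths are one less than the buffer sizes, so an
     * overlong token cannot overflow its buffer. */
    if (sscanf(line, "%15s %1023s %15s", method, uri, version) != 3)
        return -1;
    if (uri[0] != '/')
        return -1;                       /* expect an absolute path */
    if (strncmp(version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

A caller would answer 400 Bad Request when this returns -1 and only then go on to validate the method.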
┌────────────────────────────────────────────────────────────────────────┐
│ HTTP RESPONSE FORMAT │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ STATUS LINE: │
│ ──────────── │
│ VERSION SP STATUS-CODE SP REASON-PHRASE CRLF │
│ │
│ Example: "HTTP/1.0 200 OK\r\n" │
│ │
│ ┌─────────────┬─────┬───────────┐ │
│ │ HTTP/1.0 │ 200 │ OK │ │
│ └─────────────┴─────┴───────────┘ │
│ VERSION CODE REASON │
│ │
│ COMMON STATUS CODES: │
│ ──────────────────── │
│ 200 OK - Request succeeded │
│ 301 Moved Permanently - Resource relocated │
│ 400 Bad Request - Malformed request syntax │
│ 403 Forbidden - Access denied │
│ 404 Not Found - Resource doesn't exist │
│ 500 Internal Error - Server error │
│ 501 Not Implemented - Method not supported │
│ │
│ HEADERS: │
│ ──────── │
│ Content-Type: text/html\r\n │
│ Content-Length: 1234\r\n │
│ Connection: close\r\n │
│ \r\n │
│ │
│ BODY: │
│ ───── │
│ <html><body>Hello World!</body></html> │
│ │
└────────────────────────────────────────────────────────────────────────┘
Complete HTTP Exchange Example
┌────────────────────────────────────────────────────────────────────────┐
│ COMPLETE HTTP EXCHANGE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ CLIENT REQUEST (raw bytes sent over socket): │
│ ──────────────────────────────────────────── │
│ │
│ GET /hello.html HTTP/1.0\r\n │
│ Host: localhost:8080\r\n │
│ User-Agent: curl/7.68.0\r\n │
│ Accept: */*\r\n │
│ \r\n │
│ │
│ ═══════════════════════════════════════════════════════════════════ │
│ │
│ SERVER RESPONSE (raw bytes sent back): │
│ ────────────────────────────────────── │
│ │
│ HTTP/1.0 200 OK\r\n │
│ Server: Tiny Web Server\r\n │
│ Connection: close\r\n │
│ Content-Length: 47\r\n │
│ Content-Type: text/html\r\n │
│ \r\n │
│ <html><body><h1>Hello World!</h1></body></html> │
│ │
│ ═══════════════════════════════════════════════════════════════════ │
│ │
│ NOTE: \r\n is CRLF (carriage return + line feed, bytes 0x0D 0x0A) │
│ HTTP requires CRLF, not just LF (\n) │
│ │
└────────────────────────────────────────────────────────────────────────┘
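The response head in the exchange above can be produced with a single `snprintf()` call. The helper name `format_response_headers` and its parameter list are assumptions made for this sketch; the header names and order are taken from the example response.

```c
#include <stdio.h>

/* Write the status line and headers for a response into buf.
 * Returns the header length, or -1 if buf is too small. */
int format_response_headers(char *buf, size_t bufsize,
                            int status, const char *reason,
                            const char *content_type,
                            long content_length)
{
    int n = snprintf(buf, bufsize,
                     "HTTP/1.0 %d %s\r\n"
                     "Server: Tiny Web Server\r\n"
                     "Connection: close\r\n"
                     "Content-Length: %ld\r\n"
                     "Content-Type: %s\r\n"
                     "\r\n",
                     status, reason, content_length, content_type);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;                       /* error or truncated output */
    return n;
}
```

Checking the return value against the buffer size matters here: a silently truncated header block would drop the blank line that separates headers from the body.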
MIME Types and Content Negotiation
MIME types tell the browser how to handle the response:
┌────────────────────────────────────────────────────────────────────────┐
│ MIME TYPES │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ MIME = Multipurpose Internet Mail Extensions │
│ Format: type/subtype │
│ │
│ COMMON MIME TYPES: │
│ ────────────────── │
│ │
│ Extension MIME Type Description │
│ ───────── ───────── ─────────── │
│ .html text/html HTML document │
│ .css text/css CSS stylesheet │
│ .js application/javascript JavaScript │
│ .json application/json JSON data │
│ .txt text/plain Plain text │
│ .xml application/xml XML document │
│ │
│ .jpg/.jpeg image/jpeg JPEG image │
│ .png image/png PNG image │
│ .gif image/gif GIF image │
│ .svg image/svg+xml SVG vector │
│ .ico image/x-icon Favicon │
│ │
│ .pdf application/pdf PDF document │
│ .zip application/zip ZIP archive │
│ .mp3 audio/mpeg MP3 audio │
│ .mp4 video/mp4 MP4 video │
│ │
│ .bin application/octet-stream Binary (download) │
│ │
│ WHY IT MATTERS: │
│ ─────────────── │
│ - Browser uses MIME type to decide how to display content │
│ - Wrong MIME type = broken page or download prompt │
│ - Security: Can't trust file extension (validate content) │
│ │
│ IMPLEMENTATION: │
│ ─────────────── │
│ 1. Extract file extension from URI │
│ 2. Look up in a table (or switch statement) │
│ 3. Default to application/octet-stream for unknown │
│ │
└────────────────────────────────────────────────────────────────────────┘
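The three implementation steps listed above fit in a short table lookup. This sketch covers only part of the table, and the case-insensitive comparison is a choice made here rather than something the project requires.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp */

struct mime_entry {
    const char *ext;
    const char *type;
};

static const struct mime_entry mime_table[] = {
    { ".html", "text/html" },
    { ".css",  "text/css" },
    { ".js",   "application/javascript" },
    { ".json", "application/json" },
    { ".txt",  "text/plain" },
    { ".jpg",  "image/jpeg" },
    { ".jpeg", "image/jpeg" },
    { ".png",  "image/png" },
    { ".gif",  "image/gif" },
};

/* Map a file path to a MIME type based on its extension. */
const char *get_mime_type(const char *path)
{
    /* Only look for a dot in the final path component. */
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;
    const char *ext = strrchr(name, '.');

    if (ext != NULL) {
        size_t count = sizeof(mime_table) / sizeof(mime_table[0]);
        for (size_t i = 0; i < count; i++) {
            if (strcasecmp(ext, mime_table[i].ext) == 0)
                return mime_table[i].type;
        }
    }
    return "application/octet-stream";   /* unknown or no extension */
}
```

Restricting the search to the last path component avoids treating a dot in a directory name as the start of an extension.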
CGI and Dynamic Content
CGI (Common Gateway Interface) allows servers to run programs that generate dynamic responses:
┌────────────────────────────────────────────────────────────────────────┐
│ CGI EXECUTION MODEL │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ STATIC CONTENT: DYNAMIC CONTENT (CGI): │
│ ─────────────── ──────────────────────── │
│ │
│ Request: GET /page.html Request: GET /cgi-bin/time.cgi │
│ │
│ Server: Server: │
│ 1. Open file 1. fork() child process │
│ 2. Read contents 2. Set up environment vars │
│ 3. Send to client 3. Redirect stdout to socket │
│ 4. exec() the CGI program │
│ 5. Program writes to stdout │
│ 6. Output goes to client │
│ │
│ FILE-BASED PROGRAM-GENERATED │
│ (pre-written content) (computed on each request) │
│ │
└────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│ CGI FORK/EXEC PATTERN │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ CLIENT REQUEST: │
│ GET /cgi-bin/adder?15&213 HTTP/1.0 │
│ │
│ ═══════════════════════════════════════════════════════════════════ │
│ │
│ SERVER PROCESS │
│ ────────────── │
│ │ │
│ │ if (is_cgi_request(uri)) { │
│ │ │
│ ├──────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ pid = fork(); │ │
│ │ │ │
│ │ if (pid == 0) { // CHILD PROCESS │ │
│ │ │ │ │
│ │ │ // Set CGI environment variables │ │
│ │ │ setenv("QUERY_STRING", "15&213", 1); │ │
│ │ │ setenv("REQUEST_METHOD", "GET", 1); │ │
│ │ │ setenv("CONTENT_LENGTH", "0", 1); │ │
│ │ │ │ │
│ │ │ // Redirect stdout to client socket │ │
│ │ │ dup2(connfd, STDOUT_FILENO); │ │
│ │ │ │ │
│ │ │ // Execute the CGI program │ │
│ │ │ execve("./www/cgi-bin/adder", argv, environ);│ │
│ │ │ │ │
│ │ │ // If we get here, exec failed │ │
│ │ │ _exit(1); │ │
│ │ } │ │
│ │ │ │
│ │ // PARENT: wait for child │ │
│ │ waitpid(pid, NULL, 0); │ │
│ │ │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ CGI PROGRAM (adder.c): │
│ ────────────────────── │
│ │
│ int main() { │
│ char *query = getenv("QUERY_STRING"); // "15&213" │
│ int n1, n2; │
│ sscanf(query, "%d&%d", &n1, &n2); │
│ │
│ // CGI program must output its own headers! │
│ printf("Content-Type: text/html\r\n"); │
│ printf("\r\n"); │
│ printf("<html><body>%d + %d = %d</body></html>\n", │
│ n1, n2, n1 + n2); │
│ return 0; │
│ } │
│ │
│ CGI PROGRAM OUTPUT (sent after the server's status line): │
│ ───────────────── │
│ Content-Type: text/html\r\n │
│ \r\n │
│ <html><body>15 + 213 = 228</body></html> │
│ │
└────────────────────────────────────────────────────────────────────────┘
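The fork/exec pattern above can be pulled into one function with error handling added. This is a sketch, not the project's required interface: the name `run_cgi`, its parameters, and the exit codes 126 and 127 used for setup and exec failures are all assumptions made here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run a CGI program with its stdout redirected to outfd.
 * Returns the child's exit status, or -1 on failure. */
int run_cgi(const char *path, char *const argv[],
            const char *query, int outfd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                          /* child */
        if (setenv("QUERY_STRING", query, 1) < 0 ||
            setenv("REQUEST_METHOD", "GET", 1) < 0)
            _exit(126);
        if (dup2(outfd, STDOUT_FILENO) < 0)  /* stdout now goes to outfd */
            _exit(126);
        execve(path, argv, environ);
        _exit(127);                          /* reached only if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {   /* parent: reap the child */
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the CGI exchange shown in this guide, the status line and Server header precede the program's output, so the server would write those to the socket before calling a function like this. The child uses `_exit()` rather than `exit()` so it does not flush stdio buffers inherited from the parent.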
CGI Environment Variables
┌────────────────────────────────────────────────────────────────────────┐
│ CGI ENVIRONMENT VARIABLES │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ VARIABLE DESCRIPTION EXAMPLE │
│ ──────── ─────────── ─────── │
│ REQUEST_METHOD HTTP method GET │
│ QUERY_STRING URL parameters (after ?) name=john&age=25 │
│ CONTENT_TYPE Body MIME type (for POST) application/json │
│ CONTENT_LENGTH Body size in bytes 42 │
│ PATH_INFO Extra path after script /users/123 │
│ SCRIPT_NAME CGI script path /cgi-bin/handler │
│ SERVER_NAME Server hostname localhost │
│ SERVER_PORT Server port 8080 │
│ SERVER_PROTOCOL Protocol version HTTP/1.0 │
│ REMOTE_ADDR Client IP address 192.168.1.100 │
│ HTTP_* Any HTTP header HTTP_USER_AGENT │
│ │
│ FOR POST REQUESTS: │
│ ────────────────── │
│ - Body data comes from stdin (not QUERY_STRING) │
│ - Read CONTENT_LENGTH bytes from stdin │
│ - Parse based on CONTENT_TYPE │
│ │
└────────────────────────────────────────────────────────────────────────┘
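The `HTTP_*` row in the table follows a mechanical rule: uppercase the header name, replace each `-` with `_`, and add the `HTTP_` prefix. A minimal sketch of that conversion, with `header_to_env_name` as a hypothetical helper name:

```c
#include <ctype.h>
#include <stdio.h>

/* Convert an HTTP header name to its CGI variable name:
 * prefix "HTTP_", uppercase letters, '-' becomes '_'.
 * Returns 0 on success, -1 if out is too small. */
int header_to_env_name(const char *header, char *out, size_t outsize)
{
    int n = snprintf(out, outsize, "HTTP_%s", header);
    if (n < 0 || (size_t)n >= outsize)
        return -1;

    /* Skip the 5-character prefix and rewrite the header part. */
    for (char *p = out + 5; *p != '\0'; p++) {
        if (*p == '-')
            *p = '_';
        else
            *p = (char)toupper((unsigned char)*p);
    }
    return 0;
}
```

The result is the name a server would pass to `setenv()` in the child before `execve()`, with the header's value as the variable's value.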
Concurrent Server Models
A simple iterative server can only handle one client at a time. Production servers need concurrency:
┌────────────────────────────────────────────────────────────────────────┐
│ CONCURRENT SERVER MODELS │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ MODEL 1: ITERATIVE (Sequential) │
│ ─────────────────────────────── │
│ │
│ while (1) { │
│ connfd = accept(listenfd, ...); │
│ handle_request(connfd); // Blocks until complete │
│ close(connfd); │
│ } │
│ │
│ + Simple to implement │
│ - Can only handle one client at a time │
│ - Slow client blocks everyone │
│ │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ MODEL 2: PROCESS-BASED (Fork per connection) │
│ ──────────────────────────────────────────── │
│ │
│ while (1) { │
│ connfd = accept(listenfd, ...); │
│ if (fork() == 0) { // Child process │
│ close(listenfd); // Child doesn't need listener │
│ handle_request(connfd); │
│ close(connfd); │
│ exit(0); │
│ } │
│ close(connfd); // Parent doesn't need connection │
│ } │
│ │
│ + True parallelism (separate address spaces) │
│ + One client crash doesn't affect others │
│ - High overhead (fork is expensive) │
│ - Many processes = high memory usage │
│ - Need to handle zombie processes (SIGCHLD) │
│ │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ MODEL 3: THREAD-BASED (Thread per connection) │
│ ───────────────────────────────────────────── │
│ │
│ while (1) { │
│ int *connfdp = malloc(sizeof(int)); // One copy per thread │
│ *connfdp = accept(listenfd, ...); │
│ pthread_create(&tid, NULL, handle_thread, connfdp); │
│ pthread_detach(tid); // Auto-cleanup when done │
│ } │
│ │
│ void *handle_thread(void *arg) { │
│ int connfd = *((int *)arg); │
│ free(arg); // Allocated by the main thread │
│ handle_request(connfd); │
│ close(connfd); │
│ return NULL; │
│ } │
│ │
│ + Lower overhead than processes │
│ + Shared memory (efficient data sharing) │
│ - Shared memory (race conditions!) │
│ - Thread count can explode under load │
│ - Stack size limits total threads │
│ │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ MODEL 4: I/O MULTIPLEXING (select/poll/epoll) │
│ ───────────────────────────────────────────── │
│ │
│ while (1) { │
│ ready = select(maxfd+1, &read_set, NULL, NULL, NULL); │
│ for (each fd in read_set) { │
│ if (fd == listenfd) │
│ accept new connection │
│ else │
│ handle partial I/O on existing connection │
│ } │
│ } │
│ │
│ + Single thread handles many connections │
│ + Very efficient (no context switch overhead) │
│ - Complex state machine programming │
│ - Can't use blocking I/O │
│ - A single slow operation blocks everything │
│ │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ MODEL 5: THREAD POOL (Pre-created threads) │
│ ────────────────────────────────────────── │
│ │
│ // Create N worker threads at startup │
│ for (int i = 0; i < POOL_SIZE; i++) { │
│ pthread_create(&workers[i], NULL, worker_thread, &work_queue); │
│ } │
│ │
│ while (1) { │
│ connfd = accept(listenfd, ...); │
│ enqueue(&work_queue, connfd); // Add to shared queue │
│ } │
│ │
│ void *worker_thread(void *arg) { │
│ while (1) { │
│ connfd = dequeue(arg); // arg is &work_queue; blocks if empty│
│ handle_request(connfd); │
│ close(connfd); │
│ } │
│ } │
│ │
│ + Bounded resource usage │
│ + Lower overhead than thread-per-connection │
│ + Graceful degradation under load │
│ - Requires thread-safe queue implementation │
│ - Pool sizing is an art │
│ │
└────────────────────────────────────────────────────────────────────────┘
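Model 5 depends on a thread-safe queue. One common way to build it is a fixed-size ring buffer guarded by a mutex and two condition variables, sketched below. The type name `work_queue_t`, the capacity `QUEUE_CAP`, and the pointer-based `enqueue`/`dequeue` signatures are assumptions made for this example.

```c
#include <pthread.h>

#define QUEUE_CAP 16

typedef struct {
    int items[QUEUE_CAP];        /* ring buffer of connection fds */
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} work_queue_t;

void queue_init(work_queue_t *q)
{
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Add a descriptor; blocks while the queue is full. */
void enqueue(work_queue_t *q, int connfd)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = connfd;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Remove the oldest descriptor; blocks while the queue is empty. */
int dequeue(work_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int connfd = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return connfd;
}
```

The `while` loops around `pthread_cond_wait()` guard against spurious wakeups. Because `enqueue` blocks when the queue is full, the accept loop slows down under load instead of letting the backlog of work grow without bound.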
Robust I/O (Rio) Functions
CS:APP provides robust I/O wrappers that handle short counts and interrupts:
┌────────────────────────────────────────────────────────────────────────┐
│ ROBUST I/O (RIO) FUNCTIONS │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ PROBLEM: read() and write() can return fewer bytes than requested │
│ ──────── │
│ │
│ // You want to read 1000 bytes: │
│ n = read(fd, buf, 1000); │
│ │
│ // But you might get: │
│ n = 500 // Short count - only 500 bytes available │
│ n = -1 // Error (check errno) │
│ n = 0 // EOF │
│ │
│ SHORT COUNTS HAPPEN WHEN: │
│ ───────────────────────── │
│ - Reading from network (data arrives in packets) │
│ - Reading from pipes (writer hasn't written enough) │
│ - Reading from terminal (line buffering) │
│ - Signal interrupts the read (errno == EINTR) │
│ │
│ ═══════════════════════════════════════════════════════════════════ │
│ │
│ SOLUTION: Rio functions from CS:APP │
│ ─────────────────────────────────── │
│ │
│ UNBUFFERED (for binary data): │
│ │
│ ssize_t rio_readn(int fd, void *buf, size_t n); │
│ // Reads exactly n bytes (unless EOF or error) │
│ // Returns: bytes read (fewer than n only at EOF), -1 on error │
│ │
│ ssize_t rio_writen(int fd, void *buf, size_t n); │
│ // Writes exactly n bytes (unless error) │
│ // Returns: n if successful, -1 on error │
│ │
│ BUFFERED (for text lines): │
│ │
│ typedef struct { │
│ int rio_fd; // File descriptor │
│ int rio_cnt; // Unread bytes in buffer │
│ char *rio_bufptr; // Next unread byte │
│ char rio_buf[RIO_BUFSIZE]; // Internal buffer │
│ } rio_t; │
│ │
│ void rio_readinitb(rio_t *rp, int fd); │
│ // Initialize rio_t for buffered reading │
│ │
│ ssize_t rio_readlineb(rio_t *rp, void *buf, size_t maxlen); │
│ // Reads next text line (including \n) │
│ // Returns: bytes read (including \n), 0 on EOF, -1 on error │
│ │
│ ssize_t rio_readnb(rio_t *rp, void *buf, size_t n); │
│ // Buffered version of rio_readn │
│ │
│ ═══════════════════════════════════════════════════════════════════ │
│ │
│ WHY BUFFERED READING? │
│ ───────────────────── │
│ │
│ Reading one byte at a time is SLOW (syscall per byte): │
│ │
│ while (read(fd, &c, 1) == 1) { // System call per character! │
│ if (c == '\n') break; │
│ *buf++ = c; │
│ } │
│ │
│ Buffered reading is FAST (one syscall per buffer): │
│ │
│ [ Buffer: 8KB of data from one read() call ] │
│ ^ │
│ rio_bufptr - rio_readlineb advances through buffer │
│ │
└────────────────────────────────────────────────────────────────────────┘
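The unbuffered pair can be written as two short retry loops. This is a sketch in the spirit of the functions described above, not a copy of the CS:APP source; one deliberate difference is that `rio_writen` here takes a `const void *` so string literals can be passed without a cast.

```c
#include <errno.h>
#include <unistd.h>

/* Read up to n bytes, retrying after short counts and EINTR.
 * Returns bytes read (fewer than n only at EOF), or -1 on error. */
ssize_t rio_readn(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nread = read(fd, bufp, nleft);
        if (nread < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: call read() again */
            return -1;
        }
        if (nread == 0)
            break;                   /* EOF */
        nleft -= (size_t)nread;
        bufp += nread;
    }
    return (ssize_t)(n - nleft);
}

/* Write exactly n bytes, retrying after short counts and EINTR.
 * Returns n on success, or -1 on error. */
ssize_t rio_writen(int fd, const void *usrbuf, size_t n)
{
    size_t nleft = n;
    const char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nwritten = write(fd, bufp, nleft);
        if (nwritten <= 0) {
            if (nwritten < 0 && errno == EINTR)
                continue;            /* interrupted: call write() again */
            return -1;
        }
        nleft -= (size_t)nwritten;
        bufp += nwritten;
    }
    return (ssize_t)n;
}
```

Both loops work on any descriptor, so the same code serves a file from disk and writes it to a client socket.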
Project Specification
What You Will Build
A command-line HTTP server called tiny that:
- Accepts TCP connections on a specified port
- Parses HTTP/1.0 requests (GET method at minimum)
- Serves static files from a document root directory
- Supports CGI for dynamic content generation
- Returns proper HTTP responses with status codes and headers
Functional Requirements
┌────────────────────────────────────────────────────────────────────────┐
│ FUNCTIONAL REQUIREMENTS │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ CORE FEATURES (Must Have): │
│ ────────────────────────── │
│ │
│ 1. LISTEN: Accept connections on specified port │
│ ./tiny 8080 │
│ │
│ 2. STATIC CONTENT: Serve files from document root │
│ GET /index.html → ./www/index.html │
│ GET /css/style.css → ./www/css/style.css │
│ │
│ 3. MIME TYPES: Set Content-Type based on file extension │
│ .html → text/html │
│ .css → text/css │
│ .js → application/javascript │
│ .jpg → image/jpeg │
│ .png → image/png │
│ .gif → image/gif │
│ │
│ 4. ERROR RESPONSES: Return appropriate HTTP errors │
│ 404 Not Found - file doesn't exist │
│ 403 Forbidden - permission denied │
│ 400 Bad Request - malformed request │
│ 501 Not Implemented - unsupported method │
│ │
│ ENHANCED FEATURES (Should Have): │
│ ───────────────────────────────── │
│ │
│ 5. CGI SUPPORT: Execute programs in cgi-bin/ │
│ GET /cgi-bin/adder?15&213 → run ./www/cgi-bin/adder │
│ │
│ 6. REQUEST LOGGING: Log each request │
│ [timestamp] client_ip "GET /path" status bytes │
│ │
│ SECURITY REQUIREMENTS: │
│ ────────────────────── │
│ │
│ 7. PATH TRAVERSAL: Block requests containing ".." │
│ GET /../../../etc/passwd → 400 Bad Request │
│ │
│ 8. DOCUMENT ROOT: Never serve files outside doc root │
│ │
└────────────────────────────────────────────────────────────────────────┘
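Requirement 7 can be checked before the URI is ever mapped to a file. The sketch below rejects any path segment that is exactly `..`, which is slightly narrower than rejecting every URI that contains the two characters (a name like `notes..txt` is allowed). `is_safe_path` is a hypothetical helper name, and the function assumes the path has already been percent-decoded.

```c
#include <string.h>

/* Return 1 if no path segment of uri_path is "..", else 0.
 * Assumes the path has already been percent-decoded. */
int is_safe_path(const char *uri_path)
{
    const char *p = uri_path;

    if (p[0] != '/')
        return 0;                        /* require an absolute path */

    while (*p != '\0') {
        while (*p == '/')
            p++;                         /* skip separators */
        size_t len = strcspn(p, "/");    /* length of this segment */
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 0;
        p += len;
    }
    return 1;
}
```

A check like this covers the ".." requirement only; symbolic links inside the document root can still point outside it, so requirement 8 may need a separate check on the resolved file path.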
CLI Interface
# Basic usage
$ ./tiny <port>
$ ./tiny 8080
# Expected startup output
Tiny Web Server started
Listening on port 8080
Document root: ./www
# Request logging format
[2024-12-29 10:15:30] 127.0.0.1 "GET /index.html HTTP/1.0" 200 1234
[2024-12-29 10:15:31] 127.0.0.1 "GET /style.css HTTP/1.0" 200 567
[2024-12-29 10:15:32] 127.0.0.1 "GET /missing.html HTTP/1.0" 404 112
# Graceful shutdown on Ctrl+C
^C
Shutting down...
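One way to get the graceful Ctrl+C shutdown shown above is a SIGINT handler that only sets a flag, leaving the accept loop to notice it and exit cleanly. The names `shutdown_requested` and `install_shutdown_handler` are assumptions made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; the accept loop checks it between connections. */
static volatile sig_atomic_t shutdown_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    shutdown_requested = 1;   /* only async-signal-safe work here */
}

/* Install the SIGINT handler. Returns 0 on success, -1 on error. */
int install_shutdown_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;          /* no SA_RESTART: a blocked accept() fails with EINTR */
    return sigaction(SIGINT, &sa, NULL);
}
```

With `SA_RESTART` left off, a blocked `accept()` returns -1 with `errno` set to `EINTR` when Ctrl+C arrives, so the loop can test the flag, print its shutdown message, and close the listening socket.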
Example HTTP Exchanges
Successful Static File Request
REQUEST:
────────
GET /index.html HTTP/1.0
Host: localhost:8080
User-Agent: curl/7.68.0
Accept: */*
RESPONSE:
─────────
HTTP/1.0 200 OK
Server: Tiny Web Server
Connection: close
Content-Length: 137
Content-Type: text/html
<!DOCTYPE html>
<html>
<head><title>Welcome</title></head>
<body>
<h1>Welcome to Tiny!</h1>
<p>A simple web server.</p>
</body>
</html>
404 Not Found
REQUEST:
────────
GET /nonexistent.html HTTP/1.0
Host: localhost:8080
RESPONSE:
─────────
HTTP/1.0 404 Not Found
Server: Tiny Web Server
Connection: close
Content-Type: text/html
Content-Length: 112
<!DOCTYPE html>
<html>
<head><title>404 Not Found</title></head>
<body>
<h1>404 Not Found</h1>
<p>File not found.</p>
</body>
</html>
CGI Request
REQUEST:
────────
GET /cgi-bin/adder?15&213 HTTP/1.0
Host: localhost:8080
RESPONSE (generated by adder program):
──────────────────────────────────────
HTTP/1.0 200 OK
Server: Tiny Web Server
Connection: close
Content-Type: text/html
<!DOCTYPE html>
<html>
<body>
<h1>Addition Result</h1>
<p>15 + 213 = 228</p>
</body>
</html>
Bad Request (Path Traversal Attempt)
REQUEST:
────────
GET /../../../etc/passwd HTTP/1.0
Host: localhost:8080
RESPONSE:
─────────
HTTP/1.0 400 Bad Request
Server: Tiny Web Server
Connection: close
Content-Type: text/html
Content-Length: 89
<!DOCTYPE html>
<html>
<body>
<h1>400 Bad Request</h1>
<p>Invalid request.</p>
</body>
</html>
Real World Outcome
When you complete this project, here’s exactly what you’ll see:
Starting the Server
$ make
gcc -Wall -O2 -o tiny tiny.c csapp.c -lpthread
gcc -Wall -O2 -o www/cgi-bin/adder adder.c
$ ./tiny 8080
Tiny Web Server started
Listening on port 8080
Document root: ./www
Press Ctrl+C to stop
Testing with curl
# Test static HTML
$ curl -v http://localhost:8080/index.html
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /index.html HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
>
< HTTP/1.0 200 OK
< Server: Tiny Web Server
< Connection: close
< Content-Length: 137
< Content-Type: text/html
<
<!DOCTYPE html>
<html>
<head><title>Welcome</title></head>
<body>
<h1>Welcome to Tiny!</h1>
<p>A simple web server.</p>
</body>
</html>
* Closing connection 0
# Test CGI
$ curl "http://localhost:8080/cgi-bin/adder?5&10"
<!DOCTYPE html>
<html>
<body>
<h1>Addition Result</h1>
<p>5 + 10 = 15</p>
</body>
</html>
# Test 404
$ curl -I http://localhost:8080/missing.html
HTTP/1.0 404 Not Found
Server: Tiny Web Server
Connection: close
Content-Type: text/html
Content-Length: 112
# Test path traversal protection
# (--path-as-is stops curl from collapsing the "../" segments before sending)
$ curl --path-as-is http://localhost:8080/../../../etc/passwd
<!DOCTYPE html>
<html>
<body><h1>400 Bad Request</h1></body>
</html>
Server Log Output
$ ./tiny 8080
Tiny Web Server started
Listening on port 8080
Document root: ./www
[2024-12-29 10:15:30] 127.0.0.1 "GET /index.html HTTP/1.0" 200 137
[2024-12-29 10:15:45] 127.0.0.1 "GET /style.css HTTP/1.0" 200 456
[2024-12-29 10:16:02] 127.0.0.1 "GET /cgi-bin/adder?5&10 HTTP/1.0" 200 -
[2024-12-29 10:16:15] 127.0.0.1 "GET /missing.html HTTP/1.0" 404 112
[2024-12-29 10:16:30] 127.0.0.1 "GET /../etc/passwd HTTP/1.0" 400 89
^C
Shutting down...
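Each log line above has the same shape: timestamp, client address, request line, status, and byte count. A possible formatter is sketched below; `format_log_line` and its parameters are hypothetical, and the sketch always prints a numeric byte count, so the `-` shown for the CGI response would need separate handling.

```c
#include <stdio.h>
#include <time.h>

/* Format one access-log line:
 *   [YYYY-MM-DD HH:MM:SS] client_ip "request line" status bytes
 * Returns the line length, or -1 if buf is too small. */
int format_log_line(char *buf, size_t bufsize, const struct tm *when,
                    const char *client_ip, const char *request_line,
                    int status, long bytes)
{
    char stamp[32];

    if (strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", when) == 0)
        return -1;

    int n = snprintf(buf, bufsize, "[%s] %s \"%s\" %d %ld\n",
                     stamp, client_ip, request_line, status, bytes);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;                       /* error or truncated output */
    return n;
}
```

Taking the time as a `struct tm` keeps the function easy to test; the server would fill it in from `time()` and `localtime()` when the response is sent.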
Solution Architecture
High-Level Design
┌────────────────────────────────────────────────────────────────────────┐
│ SERVER ARCHITECTURE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ main() │
│ │ │
│ ├── parse_args() // Get port from command line │
│ │ │
│ ├── open_listenfd(port) // Socket, bind, listen │
│ │ │
│ └── while (1) { │
│ │ │
│ ├── accept() // Block waiting for client │
│ │ │
│ ├── handle_request() // Process this connection │
│ │ │ │
│ │ ├── read_request() │
│ │ │ ├── rio_readlineb() // Read request line │
│ │ │ └── parse_request() // Extract method/uri/ver │
│ │ │ │
│ │ ├── read_headers() │
│ │ │ └── rio_readlineb() // Read until empty line │
│ │ │ │
│ │ ├── route_request() │
│ │ │ ├── is_static() // Check if static file │
│ │ │ └── is_cgi() // Check if CGI request │
│ │ │ │
│ │ ├── [static] serve_static() │
│ │ │ ├── get_mime_type() │
│ │ │ ├── send_headers() │
│ │ │ └── send_file() // rio_writen │
│ │ │ │
│ │ └── [cgi] serve_cgi() │
│ │ ├── fork() │
│ │ ├── setenv() // CGI variables │
│ │ ├── dup2() // Redirect stdout │
│ │ └── execve() // Run program │
│ │ │
│ └── close(connfd) // Done with this client │
│ } │
│ │
└────────────────────────────────────────────────────────────────────────┘
Module Design
┌────────────────────────────────────────────────────────────────────────┐
│ MODULE STRUCTURE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ tiny.c (main server) │
│ ───────────────────── │
│ - main() Entry point, accept loop │
│ - handle_request() Dispatch to static/CGI │
│ - serve_static() Serve static files │
│ - serve_cgi() Fork and exec CGI │
│ - client_error() Send error responses │
│ │
│ http.c (HTTP parsing) │
│ ───────────────────── │
│ - parse_request_line() Parse "GET /path HTTP/1.0" │
│ - parse_uri() Extract path and query string │
│ - read_headers() Read and parse headers │
│ - get_mime_type() Map extension to MIME type │
│ │
│ net.c (networking) │
│ ────────────────── │
│ - open_listenfd() Create listening socket │
│ - send_response_line() Send "HTTP/1.0 200 OK" │
│ - send_headers() Send response headers │
│ │
│ rio.c (robust I/O) │
│ ────────────────── │
│ - rio_readn() Read exactly n bytes │
│ - rio_writen() Write exactly n bytes │
│ - rio_readlineb() Read line with buffering │
│ - rio_readinitb() Initialize buffered reader │
│ │
│ cgi.c (CGI support) │
│ ─────────────────── │
│ - is_cgi_request() Check if URI is in cgi-bin │
│ - setup_cgi_env() Set CGI environment variables │
│ - exec_cgi() Fork, redirect, exec │
│ │
└────────────────────────────────────────────────────────────────────────┘
Request Processing Flow
┌────────────────────────────────────────────────────────────────────────┐
│ REQUEST PROCESSING FLOW │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ INCOMING REQUEST: "GET /index.html HTTP/1.0\r\n..." │
│ │
│ 1. READ REQUEST LINE │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ rio_readlineb() → "GET /index.html HTTP/1.0\r\n" │ │
│ │ │ │
│ │ parse_request_line(): │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ method = "GET" │ │ │
│ │ │ uri = "/index.html" │ │ │
│ │ │ version = "HTTP/1.0" │ │ │
│ │ └─────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 2. VALIDATE METHOD │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ if (strcmp(method, "GET") != 0 && strcmp(method, "HEAD") != 0) │ │
│ │ return 501 Not Implemented │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 3. READ HEADERS (until empty line) │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ while (rio_readlineb() != "\r\n") { │ │
│ │ // Can parse headers here if needed │ │
│ │ // Host: localhost:8080 │ │
│ │ // User-Agent: curl/7.68.0 │ │
│ │ } │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 4. PARSE URI │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ parse_uri("/index.html"): │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ path = "./www/index.html" │ │ │
│ │ │ query_string = "" │ │ │
│ │ │ is_static = true │ │ │
│ │ └─────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ parse_uri("/cgi-bin/adder?15&213"): │ │
│ │ ┌─────────────────────────────────────────────────────┐ │ │
│ │ │ path = "./www/cgi-bin/adder" │ │ │
│ │ │ query_string = "15&213" │ │ │
│ │ │ is_static = false │ │ │
│ │ └─────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 5. SECURITY CHECK │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ if (strstr(uri, "..")) │ │
│ │ return 400 Bad Request // Path traversal attempt │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 6. ROUTE TO HANDLER │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ if (is_static) serve_static(connfd, path) │ │
│ │ else serve_cgi(...) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
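Step 4 of the flow can be sketched as a single function. This is a minimal illustration rather than the reference implementation: the signature, the `DOCROOT` macro, and the rule that only URIs beginning with `/cgi-bin/` are dynamic are assumptions chosen to match the two examples in the diagram.

```c
#include <stdio.h>
#include <string.h>

#define DOCROOT "./www"   /* assumed document root, matching the diagram */

/*
 * Split a request URI into a filesystem path and a query string.
 * Returns 1 for static content, 0 for CGI, -1 if a buffer is too small.
 */
int parse_uri(const char *uri, char *path, size_t pathcap,
              char *query, size_t querycap)
{
    const char *qmark = strchr(uri, '?');
    size_t urilen = qmark ? (size_t)(qmark - uri) : strlen(uri);

    /* Everything after '?' is the query string (empty if there is none). */
    int n = snprintf(query, querycap, "%s", qmark ? qmark + 1 : "");
    if (n < 0 || (size_t)n >= querycap)
        return -1;

    /* Prefix the URI path (without the query) with the document root. */
    n = snprintf(path, pathcap, "%s%.*s", DOCROOT, (int)urilen, uri);
    if (n < 0 || (size_t)n >= pathcap)
        return -1;

    /* Assumed routing rule: anything under /cgi-bin/ is dynamic. */
    return strncmp(uri, "/cgi-bin/", 9) == 0 ? 0 : 1;
}
```

Returning -1 on truncation lets the caller answer with 400 instead of silently serving a shortened path.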
Static File Serving
┌────────────────────────────────────────────────────────────────────────┐
│ STATIC FILE SERVING │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ serve_static(connfd, filename): │
│ │
│ 1. CHECK FILE EXISTS │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ struct stat sbuf; │ │
│ │ if (stat(filename, &sbuf) < 0) { │ │
│ │ client_error(connfd, 404, "Not Found", "File not found"); │ │
│ │ return; │ │
│ │ } │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 2. CHECK PERMISSIONS │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ if (!S_ISREG(sbuf.st_mode) || !(S_IRUSR & sbuf.st_mode)) { │ │
│ │ client_error(connfd, 403, "Forbidden", "Access denied"); │ │
│ │ return; │ │
│ │ } │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 3. GET FILE INFO │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ size_t filesize = sbuf.st_size; │ │
│ │ char *filetype = get_mime_type(filename); │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 4. SEND RESPONSE HEADERS │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ // One bounded call: sprintf(buf, "%s...", buf) reads and writes │ │
│ │ // the same buffer, which is undefined behavior. │ │
│ │ snprintf(buf, sizeof(buf), │ │
│ │ "HTTP/1.0 200 OK\r\n" │ │
│ │ "Server: Tiny Web Server\r\n" │ │
│ │ "Connection: close\r\n" │ │
│ │ "Content-Length: %zu\r\n" │ │
│ │ "Content-Type: %s\r\n\r\n", filesize, filetype); │ │
│ │ rio_writen(connfd, buf, strlen(buf)); │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ 5. SEND FILE BODY │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ // Option A: Read into buffer and write │ │
│ │ int srcfd = open(filename, O_RDONLY); │ │
│ │ char *srcp = malloc(filesize); │ │
│ │ rio_readn(srcfd, srcp, filesize); │ │
│ │ rio_writen(connfd, srcp, filesize); │ │
│ │ free(srcp); │ │
│ │ close(srcfd); │ │
│ │ │ │
│ │ // Option B: Memory map the file (more efficient) │ │
│ │ int srcfd = open(filename, O_RDONLY); │ │
│ │ char *srcp = mmap(0, filesize, PROT_READ, │ │
│ │ MAP_PRIVATE, srcfd, 0); │ │
│ │ close(srcfd); │ │
│ │ rio_writen(connfd, srcp, filesize); │ │
│ │ munmap(srcp, filesize); │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
Implementation Guide
The Core Question You’re Answering
“When I type a URL in my browser, what sequence of system calls does the server make to send back the webpage, and how do sockets unify disk I/O with network I/O?”
This project reveals that serving web pages is fundamentally about:
- Creating a listening socket and waiting for connections
- Parsing a simple text protocol (HTTP)
- Reading files from disk and sending bytes over the network
- Forking processes to run dynamic programs
You’ll see that Unix’s “everything is a file” philosophy means the same read/write calls work for disk files, network sockets, and pipes - a beautiful abstraction.
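To make the "everything is a file" point concrete, here is a small sketch of a copy loop that never asks what kind of descriptor it holds. The name `copy_fd` and the 4096-byte buffer are illustrative choices, not part of the project's required API.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Copy bytes from infd to outfd until EOF. Both are plain file
 * descriptors, so either one may be a disk file, a pipe, or a socket.
 * Returns the number of bytes copied, or -1 on error.
 */
ssize_t copy_fd(int infd, int outfd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t nread = read(infd, buf, sizeof(buf));
        if (nread < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry the read */
            return -1;
        }
        if (nread == 0)
            break;                  /* EOF */

        /* write() may accept fewer bytes than asked; loop until done. */
        ssize_t off = 0;
        while (off < nread) {
            ssize_t nwritten = write(outfd, buf + off, (size_t)(nread - off));
            if (nwritten < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += nwritten;
        }
        total += nread;
    }
    return total;
}
```

Called as `copy_fd(srcfd, connfd)`, the same loop would stream a file to a client in fixed-size chunks without loading the whole file into memory.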
Concepts You Must Understand First
Before starting this project, ensure you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| File descriptors | Sockets are file descriptors - you read/write them like files | CS:APP 10.1-10.2 |
| Socket API (socket/bind/listen/accept) | Core server setup sequence | CS:APP 11.4 |
| Network byte order | Port numbers must be in network byte order (htons) | CS:APP 11.3 |
| fork() and exec() | CGI requires spawning child processes | CS:APP 8.4 |
| dup2() for I/O redirection | Redirect CGI stdout to socket | CS:APP 10.9 |
| Robust I/O (Rio) | Handle short reads/writes from network | CS:APP 10.5 |
| HTTP protocol basics | Must parse request format correctly | RFC 1945 (HTTP/1.0); RFC 2616 for HTTP/1.1 |
Questions to Guide Your Design
Work through these questions BEFORE writing code:
- Socket Setup: What’s the difference between the listening socket and connection socket? Why do you need both?
- Request Parsing: How will you read the request line? How do you know when headers end?
- URI Parsing: How do you extract the path from the URI? What about query strings for CGI?
- File Mapping: How do you map the URI /index.html to the filesystem path ./www/index.html?
- Error Handling: What happens if a file doesn’t exist? If the request is malformed?
- CGI Execution: Why do you need fork()? How does the CGI program’s output reach the client?
- Resource Management: When do you close file descriptors? What about the child process after exec?
Thinking Exercise
Before writing any code, trace through this scenario by hand:
Given a request:
GET /cgi-bin/adder?15&213 HTTP/1.0
Host: localhost:8080
User-Agent: curl/7.68.0
Exercise: Answer these questions:
- After accept(): What information do you have about the client?
- Parsing the request line: What are the three components? How do you split them?
- Is this static or CGI?: How do you decide? What’s the rule?
- CGI execution sequence: List every system call the parent makes. List every system call the child makes before exec.
- Environment setup: What value goes in QUERY_STRING? What value goes in REQUEST_METHOD?
- I/O redirection: After dup2(connfd, STDOUT_FILENO), where does printf() output go?
- Response headers: The CGI program outputs “Content-Type: text/html\r\n\r\n” - why does it include headers?
Verify your answers by adding print statements and running the server.
Interview Questions They’ll Ask
After completing this project, you’ll be ready for these common interview questions:
- “Explain the socket API for a TCP server.”
  - Expected: socket() creates endpoint, bind() assigns address, listen() marks as passive, accept() waits for clients
  - Bonus: Explain why accept() returns a NEW file descriptor
- “What’s the difference between HTTP/1.0 and HTTP/1.1?”
  - Expected: HTTP/1.1 has persistent connections (keep-alive), chunked transfer encoding, Host header required
  - Bonus: Discuss connection reuse benefits and how servers handle it
- “How would you handle multiple concurrent clients?”
  - Expected: Options include fork() per connection, thread per connection, thread pool, I/O multiplexing
  - Bonus: Discuss trade-offs of each approach (overhead, complexity, resource limits)
- “What is CGI and how does it work?”
  - Expected: Common Gateway Interface - server forks, sets environment variables, redirects stdout to socket, execs program
  - Bonus: Explain why CGI is slow and how FastCGI improves it
- “How do you prevent path traversal attacks?”
  - Expected: Reject URIs containing “..”, validate paths stay under document root
  - Bonus: Discuss realpath() for canonical path resolution, chroot jails
- “What happens if you don’t handle short reads on a socket?”
  - Expected: May read partial request, corrupt data, or hang waiting for more bytes
  - Bonus: Explain why network reads differ from disk reads (packet boundaries, TCP segmentation)
Hints in Layers
If you’re stuck, reveal hints one at a time:
Hint 1: Basic Server Structure
Start with this skeleton:
int main(int argc, char **argv) {
    int listenfd, connfd, port;
    socklen_t clientlen;
    struct sockaddr_in clientaddr;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }
    port = atoi(argv[1]);
    listenfd = open_listenfd(port);
    if (listenfd < 0) {
        fprintf(stderr, "open_listenfd error\n");
        exit(1);
    }
    printf("Tiny Web Server listening on port %d\n", port);

    while (1) {
        clientlen = sizeof(clientaddr);
        connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        if (connfd < 0) {
            perror("accept");
            continue;
        }
        handle_request(connfd);
        close(connfd);
    }
}
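Test with: curl http://localhost:8080/ (while handle_request() is still an empty stub, the server closes the connection without replying, so curl reports an empty reply)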
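The skeleton calls open_listenfd() without showing it. The sketch below is one plausible IPv4-only version of the socket/bind/listen sequence; the `LISTEN_BACKLOG` value and the choice to bind to every local interface are assumptions, and CS:APP's own helper differs in detail.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define LISTEN_BACKLOG 1024   /* assumed queue length; tune as needed */

/* Create an IPv4 listening socket on `port`. Returns the fd, or -1 on error. */
int open_listenfd(int port)
{
    int listenfd, optval = 1;
    struct sockaddr_in serveraddr;

    if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;

    /* Allow an immediate restart without "Address already in use". */
    if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
                   &optval, sizeof(optval)) < 0) {
        close(listenfd);
        return -1;
    }

    memset(&serveraddr, 0, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);     /* any local interface */
    serveraddr.sin_port = htons((unsigned short)port);  /* network byte order */

    if (bind(listenfd, (struct sockaddr *)&serveraddr, sizeof(serveraddr)) < 0 ||
        listen(listenfd, LISTEN_BACKLOG) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}
```

Passing port 0 asks the kernel to pick a free port, which is handy in tests; getsockname() reports the port that was chosen.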
Hint 2: Reading the Request
Use buffered Rio functions to read the request:
void handle_request(int connfd) {
    char buf[MAXLINE], method[MAXLINE], uri[MAXLINE], version[MAXLINE];
    rio_t rio;

    // Initialize buffered reader
    rio_readinitb(&rio, connfd);

    // Read request line
    if (rio_readlineb(&rio, buf, MAXLINE) <= 0)
        return;

    // Parse request line
    if (sscanf(buf, "%s %s %s", method, uri, version) != 3) {
        client_error(connfd, 400, "Bad Request",
                     "Tiny couldn't parse the request line");
        return;
    }

    // Read headers (and discard for now)
    while (rio_readlineb(&rio, buf, MAXLINE) > 0) {
        if (strcmp(buf, "\r\n") == 0)
            break; // Empty line marks end of headers
        // Could parse headers here: Host, Content-Length, etc.
    }

    printf("Method: %s, URI: %s, Version: %s\n", method, uri, version);
    // Now route to handler...
}
The key insight: read line-by-line until you see an empty line (\r\n).
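If you later want the header values rather than discarding them, each line can be split at the first colon. The helper below is a hypothetical sketch (`parse_header_line` is not part of the hint's code); it modifies the line in place and trims the trailing CRLF.

```c
#include <ctype.h>
#include <string.h>

/*
 * Split one header line ("Name: value\r\n") in place.
 * On success returns 0 and points *name and *value into `line`.
 * Returns -1 if the line has no colon or an empty name.
 */
int parse_header_line(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return -1;

    *colon = '\0';                       /* terminate the name */
    *name = line;

    char *v = colon + 1;
    while (*v == ' ' || *v == '\t')      /* skip leading whitespace */
        v++;

    /* Trim trailing CR, LF, and spaces from the value. */
    size_t len = strlen(v);
    while (len > 0 && isspace((unsigned char)v[len - 1]))
        v[--len] = '\0';

    *value = v;
    return 0;
}
```

Header names are case-insensitive, so compare the returned name with strcasecmp() rather than strcmp().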
Hint 3: Serving Static Files
Here’s the static file serving flow:
void serve_static(int connfd, char *filename, int filesize) {
    int srcfd;
    char *srcp, filetype[MAXLINE], buf[MAXBUF];

    // Determine MIME type
    get_filetype(filename, filetype);

    // Send response headers. One bounded snprintf call replaces the
    // sprintf(buf, "%s...", buf) idiom, which is undefined behavior.
    snprintf(buf, sizeof(buf),
             "HTTP/1.0 200 OK\r\n"
             "Server: Tiny Web Server\r\n"
             "Connection: close\r\n"
             "Content-Length: %d\r\n"
             "Content-Type: %s\r\n\r\n", filesize, filetype);
    rio_writen(connfd, buf, strlen(buf));

    // Send file body using mmap
    if ((srcfd = open(filename, O_RDONLY)) < 0)
        return;
    srcp = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, srcfd, 0);
    close(srcfd);               // The mapping stays valid after close
    if (srcp == MAP_FAILED)     // e.g. a zero-length file
        return;
    rio_writen(connfd, srcp, filesize);
    munmap(srcp, filesize);
}
void get_filetype(char *filename, char *filetype) {
    // Compare the final extension only, so "notes.html.bak" is not HTML
    char *ext = strrchr(filename, '.');
    if (ext == NULL)
        strcpy(filetype, "application/octet-stream");
    else if (strcmp(ext, ".html") == 0)
        strcpy(filetype, "text/html");
    else if (strcmp(ext, ".gif") == 0)
        strcpy(filetype, "image/gif");
    else if (strcmp(ext, ".png") == 0)
        strcpy(filetype, "image/png");
    else if (strcmp(ext, ".jpg") == 0)
        strcpy(filetype, "image/jpeg");
    else if (strcmp(ext, ".css") == 0)
        strcpy(filetype, "text/css");
    else if (strcmp(ext, ".js") == 0)
        strcpy(filetype, "application/javascript");
    else
        strcpy(filetype, "application/octet-stream");
}
Key points:
- Send headers first, then body
- Use mmap() for efficient large file serving
- Don’t forget the empty line after headers (\r\n\r\n)
Hint 4: CGI Execution
CGI requires fork/exec with I/O redirection:
extern char **environ;  // Provided by the C runtime; passed to execve()

void serve_cgi(int connfd, char *filename, char *cgiargs) {
    char buf[MAXLINE], *emptylist[] = { NULL };

    // Send minimal response header (CGI will add Content-Type)
    snprintf(buf, sizeof(buf),
             "HTTP/1.0 200 OK\r\n"
             "Server: Tiny Web Server\r\n");
    rio_writen(connfd, buf, strlen(buf));

    pid_t pid = fork();
    if (pid < 0) {              // fork failed: there is no child to wait for
        perror("fork");
        return;
    }
    if (pid == 0) { // Child
        // Set CGI environment variables
        setenv("QUERY_STRING", cgiargs, 1);
        setenv("REQUEST_METHOD", "GET", 1);
        // Redirect stdout to client socket
        dup2(connfd, STDOUT_FILENO);
        // Execute CGI program
        execve(filename, emptylist, environ);
        // If we get here, exec failed
        perror("execve");
        _exit(1);               // Skip the parent's atexit/stdio cleanup
    }
    // Parent: wait for child to complete
    waitpid(pid, NULL, 0);
}
Key points:
- Child sets up environment, redirects I/O, then execs
- Parent waits for child to prevent zombies
- CGI program writes to stdout, which goes to socket
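For the other side of this exchange, here is a sketch of what a CGI program such as the adder might produce. The function name `adder_response` and the exact HTML are assumptions for illustration. Because the server in this hint has already sent the status line and the Server header, the program supplies only the remaining headers, the blank line, and the body.

```c
#include <stdio.h>

/*
 * Build the full CGI output for a query such as "15&213".
 * Returns the number of bytes written, or -1 on a bad query or small buffer.
 */
int adder_response(const char *query, char *out, size_t cap)
{
    int a, b;
    char body[128];

    if (query == NULL || sscanf(query, "%d&%d", &a, &b) != 2)
        return -1;

    int bodylen = snprintf(body, sizeof(body),
                           "<p>%d + %d = %d</p>\r\n", a, b, a + b);
    if (bodylen < 0 || (size_t)bodylen >= sizeof(body))
        return -1;

    /* Remaining headers, the blank line that ends them, then the body. */
    int n = snprintf(out, cap,
                     "Content-Length: %d\r\n"
                     "Content-Type: text/html\r\n"
                     "\r\n"
                     "%s", bodylen, body);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

In an actual CGI executable, main() would pass getenv("QUERY_STRING") to this function and write the result to stdout, which the server has redirected to the client socket.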
Hint 5: Error Handling and Security
Always validate input and send proper error responses:
void client_error(int fd, int status, char *shortmsg, char *longmsg) {
    char buf[MAXBUF], body[MAXBUF];

    // Build HTML error page body in one bounded call
    // (sprintf(body, "%s...", body) would be undefined behavior)
    snprintf(body, sizeof(body),
             "<!DOCTYPE html>\r\n"
             "<html><head><title>%d %s</title></head>\r\n"
             "<body>\r\n"
             "<h1>%d %s</h1>\r\n"
             "<p>%s</p>\r\n"
             "</body></html>\r\n",
             status, shortmsg, status, shortmsg, longmsg);

    // Send response headers
    snprintf(buf, sizeof(buf),
             "HTTP/1.0 %d %s\r\n"
             "Server: Tiny Web Server\r\n"
             "Connection: close\r\n"
             "Content-Type: text/html\r\n"
             "Content-Length: %zu\r\n\r\n",
             status, shortmsg, strlen(body));
    rio_writen(fd, buf, strlen(buf));
    rio_writen(fd, body, strlen(body));
}

// Security: Check for path traversal
int is_safe_path(char *uri) {
    if (strstr(uri, "..") != NULL)
        return 0; // Reject ".." anywhere in path
    if (uri[0] != '/')
        return 0; // Must start with /
    return 1;
}
Key security checks:
- Reject URIs containing “..”
- Verify file exists before serving
- Check file permissions (S_ISREG, readable)
- Don’t expose error details that help attackers
Testing Strategy
Manual Testing with curl
# Test basic connectivity
curl -v http://localhost:8080/
# Test static HTML file
curl http://localhost:8080/index.html
# Test with verbose output (shows headers)
curl -v http://localhost:8080/index.html
# Test HEAD request (headers only)
curl -I http://localhost:8080/index.html
# Test different MIME types
curl -v http://localhost:8080/style.css
curl -v http://localhost:8080/image.png
# Test CGI
curl "http://localhost:8080/cgi-bin/adder?5&10"
# Test 404
curl -v http://localhost:8080/nonexistent.html
# Test path traversal (should be blocked)
# --path-as-is stops curl from collapsing "../" before it sends the request
curl --path-as-is http://localhost:8080/../../../etc/passwd
# Test with raw netcat (see exact bytes)
printf 'GET /index.html HTTP/1.0\r\nHost: localhost\r\n\r\n' | nc localhost 8080
Automated Test Script
#!/bin/bash
# test_server.sh
PORT=8080
BASE_URL="http://localhost:$PORT"
PASS=0
FAIL=0
test_response() {
local url="$1"
local expected_status="$2"
local description="$3"
status=$(curl -s --path-as-is -o /dev/null -w "%{http_code}" "$url")
if [ "$status" = "$expected_status" ]; then
echo "[PASS] $description (got $status)"
((PASS++))
else
echo "[FAIL] $description (expected $expected_status, got $status)"
((FAIL++))
fi
}
echo "=== Testing Tiny Web Server ==="
# Test static files
test_response "$BASE_URL/index.html" "200" "Static HTML"
test_response "$BASE_URL/style.css" "200" "Static CSS"
test_response "$BASE_URL/" "200" "Directory index"
# Test errors
test_response "$BASE_URL/nonexistent.html" "404" "404 Not Found"
test_response "$BASE_URL/../etc/passwd" "400" "Path traversal blocked"
# Test CGI
test_response "$BASE_URL/cgi-bin/adder?5&10" "200" "CGI request"
# Test MIME types
content_type=$(curl -s -I "$BASE_URL/index.html" | grep -i "Content-Type" | tr -d '\r')
if [[ "$content_type" == *"text/html"* ]]; then
echo "[PASS] MIME type for HTML"
((PASS++))
else
echo "[FAIL] MIME type for HTML (got: $content_type)"
((FAIL++))
fi
echo ""
echo "=== Results: $PASS passed, $FAIL failed ==="
Load Testing
# Install Apache Bench (ab) if not present
# apt-get install apache2-utils # Debian/Ubuntu
# brew install httpd # macOS
# Basic load test
ab -n 1000 -c 10 http://localhost:8080/index.html
# Output interpretation:
# - Requests per second: throughput
# - Time per request: latency
# - Failed requests: should be 0!
# Using wrk (more modern)
wrk -t4 -c100 -d30s http://localhost:8080/index.html
# Siege for concurrent testing
siege -c 50 -t 30s http://localhost:8080/index.html
Debugging with tcpdump
# See raw HTTP traffic
sudo tcpdump -i lo -A port 8080
# Save to file for Wireshark analysis
sudo tcpdump -i lo -w capture.pcap port 8080
Common Pitfalls & Debugging
Pitfall 1: Partial Writes
┌────────────────────────────────────────────────────────────────────────┐
│ PARTIAL WRITES BUG │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ SYMPTOM: Large files get truncated, browser shows incomplete page │
│ │
│ BUG: │
│ ───── │
│ write(connfd, buf, filesize); // May not write all bytes! │
│ │
│ Problem: write() can return fewer bytes than requested │
│ (network buffer full, signal interrupt, etc.) │
│ │
│ FIX: │
│ ──── │
│ // Use rio_writen which loops until all bytes written │
│ ssize_t rio_writen(int fd, void *usrbuf, size_t n) { │
│ size_t nleft = n; │
│ ssize_t nwritten; │
│ char *bufp = usrbuf; │
│ │
│ while (nleft > 0) { │
│ if ((nwritten = write(fd, bufp, nleft)) <= 0) { │
│ if (errno == EINTR) // Interrupted by signal │
│ nwritten = 0; │
│ else │
│ return -1; // Real error │
│ } │
│ nleft -= nwritten; │
│ bufp += nwritten; │
│ } │
│ return n; │
│ } │
│ │
│ VERIFICATION: │
│ ───────────── │
│ curl http://localhost:8080/large_file.html | wc -c │
│ # Should match file size │
│ │
└────────────────────────────────────────────────────────────────────────┘
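Short reads are the mirror image of the partial-write problem. The sketch below follows the same retry pattern for the read side; it is modeled on the rio_readn() listed in the module design, though the exact code here is an illustration rather than the textbook source.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read up to n bytes, retrying after short reads and EINTR.
 * Returns the byte count (less than n only at EOF), or -1 on error.
 */
ssize_t rio_readn(int fd, void *usrbuf, size_t n)
{
    size_t nleft = n;
    char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nread = read(fd, bufp, nleft);
        if (nread < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        if (nread == 0)
            break;              /* EOF */
        nleft -= (size_t)nread;
        bufp += nread;
    }
    return (ssize_t)(n - nleft);
}
```

A return value smaller than n therefore means the peer closed the connection or the file ended, not that the caller should try again.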
Pitfall 2: Wrong MIME Type
┌────────────────────────────────────────────────────────────────────────┐
│ WRONG MIME TYPE BUG │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ SYMPTOM: Browser downloads file instead of displaying, │
│ CSS not applied, JavaScript not executed │
│ │
│ BUG: │
│ ───── │
│ // Using wrong MIME type │
│ Content-Type: text/plain // For all files! │
│ │
│ // Or using wrong extension check │
│ if (strstr(filename, "html")) // Matches "foo.htmlx" too! │
│ │
│ FIX: │
│ ──── │
│ // Check extension properly │
│ char *get_mime_type(char *filename) { │
│ char *ext = strrchr(filename, '.'); // Find LAST dot │
│ if (ext == NULL) │
│ return "application/octet-stream"; │
│ │
│ if (strcasecmp(ext, ".html") == 0) │
│ return "text/html"; │
│ if (strcasecmp(ext, ".css") == 0) │
│ return "text/css"; │
│ if (strcasecmp(ext, ".js") == 0) │
│ return "application/javascript"; │
│ if (strcasecmp(ext, ".jpg") == 0 || strcasecmp(ext, ".jpeg") == 0)│
│ return "image/jpeg"; │
│ if (strcasecmp(ext, ".png") == 0) │
│ return "image/png"; │
│ if (strcasecmp(ext, ".gif") == 0) │
│ return "image/gif"; │
│ │
│ return "application/octet-stream"; // Default: binary │
│ } │
│ │
│ VERIFICATION: │
│ ───────────── │
│ curl -I http://localhost:8080/style.css | grep Content-Type │
│ # Should show: Content-Type: text/css │
│ │
└────────────────────────────────────────────────────────────────────────┘
Pitfall 3: Path Traversal Vulnerability
┌────────────────────────────────────────────────────────────────────────┐
│ PATH TRAVERSAL VULNERABILITY │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ SYMPTOM: Attacker can read /etc/passwd or other sensitive files │
│ │
│ BUG: │
│ ───── │
│ // Naive path construction │
│ sprintf(filename, "./www%s", uri); │
│ // URI: "/../../../etc/passwd" │
│ // Result: "./www/../../../etc/passwd" escapes ./www; with enough │
│ // "../" segments it resolves to /etc/passwd │
│ │
│ FIX: │
│ ──── │
│ // Option 1: Reject ".." anywhere │
│ if (strstr(uri, "..") != NULL) { │
│ client_error(connfd, 400, "Bad Request", "Invalid path"); │
│ return; │
│ } │
│ │
│ // Option 2: Use realpath and verify │
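│ char resolved[PATH_MAX]; │
│ char docroot[PATH_MAX]; │
│ │
│ if (realpath("./www", docroot) == NULL) { │
│ client_error(connfd, 500, "Internal Server Error", │
│ "Document root unavailable"); │
│ return; │
│ } │
│ sprintf(filename, "./www%s", uri); │
│ │
│ if (realpath(filename, resolved) == NULL) { │
│ client_error(connfd, 404, "Not Found", "File not found"); │
│ return; │
│ } │
│ │
│ // Verify resolved path is under docroot. The character after the │
│ // prefix must be '/' or '\0'; otherwise a sibling directory such │
│ // as "www-private" would also match. │
│ size_t rootlen = strlen(docroot); │
│ if (strncmp(resolved, docroot, rootlen) != 0 || │
│ (resolved[rootlen] != '/' && resolved[rootlen] != '\0')) { │
│ client_error(connfd, 403, "Forbidden", "Access denied"); │
│ return; │
│ } │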
│ │
│ VERIFICATION: │
│ ───────────── │
│ curl --path-as-is http://localhost:8080/../../../etc/passwd │
│ # Should return 400 or 403, NOT file contents! │
│ # (without --path-as-is, curl removes the "../" segments itself) │
│ │
└────────────────────────────────────────────────────────────────────────┘
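The containment test in Option 2 is easy to get subtly wrong, so it can help to isolate it in a pure function that is testable without touching the filesystem. The helper below is a hypothetical sketch (`path_is_under` is not an existing API); it assumes both arguments are already canonical absolute paths, for example the output of realpath().

```c
#include <string.h>

/*
 * Return 1 if `resolved` is `docroot` itself or lies inside it.
 * Both arguments are assumed to be canonical absolute paths,
 * for example the results of realpath().
 */
int path_is_under(const char *docroot, const char *resolved)
{
    size_t rootlen = strlen(docroot);

    /* Ignore a trailing slash on the root ("/srv/www/" == "/srv/www"). */
    while (rootlen > 1 && docroot[rootlen - 1] == '/')
        rootlen--;

    if (strncmp(resolved, docroot, rootlen) != 0)
        return 0;

    /* Root "/" contains every absolute path. */
    if (rootlen == 1 && docroot[0] == '/')
        return 1;

    /* The match must end at a path-component boundary. */
    return resolved[rootlen] == '/' || resolved[rootlen] == '\0';
}
```

The boundary check is what rejects a sibling such as /srv/www-secret when the document root is /srv/www, a case that a bare prefix comparison would accept.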
Pitfall 4: Zombie Processes
┌────────────────────────────────────────────────────────────────────────┐
│ ZOMBIE PROCESS BUG │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ SYMPTOM: `ps aux` shows many <defunct> processes after CGI requests │
│ │
│ BUG: │
│ ───── │
│ // Parent doesn't wait for child │
│ if (fork() == 0) { │
│ // child does CGI │
│ exit(0); │
│ } │
│ // Parent continues without wait() - ZOMBIE! │
│ │
│ FIX OPTIONS: │
│ ──────────── │
│ │
│ // Option 1: Wait for child (blocks parent) │
│ pid_t pid = fork(); │
│ if (pid == 0) { │
│ // child │
│ } else { │
│ waitpid(pid, NULL, 0); // Block until child exits │
│ } │
│ │
│ // Option 2: Signal handler for SIGCHLD │
│ void sigchld_handler(int sig) { │
│ while (waitpid(-1, NULL, WNOHANG) > 0) │
│ ; // Reap all zombies │
│ } │
│ // In main(): │
│ signal(SIGCHLD, sigchld_handler); │
│ │
│ // Option 3: Double fork (child exits immediately) │
│ if (fork() == 0) { // First child │
│ if (fork() == 0) { // Second child (grandchild) │
│ // Do CGI work │
│ exit(0); │
│ } │
│ exit(0); // First child exits immediately │
│ } │
│ waitpid(-1, NULL, 0); // Parent reaps first child │
│ // Grandchild inherited by init, no zombie │
│ │
│ VERIFICATION: │
│ ───────────── │
│ # Run many CGI requests, then check: │
│ ps aux | grep tiny | grep defunct │
│ # Should show nothing │
│ │
└────────────────────────────────────────────────────────────────────────┘
Pitfall 5: Missing CRLF
┌────────────────────────────────────────────────────────────────────────┐
│ MISSING CRLF BUG │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ SYMPTOM: Browser hangs, shows nothing, or garbled output │
│ │
│ BUG: │
│ ───── │
│ // Using \n instead of \r\n │
│ sprintf(buf, "HTTP/1.0 200 OK\n"); // WRONG! │
│ sprintf(buf, "%sContent-Type: text/html\n\n", buf); // WRONG! │
│ │
│ HTTP REQUIRES \r\n (CRLF): │
│ - Line ending: \r\n │
│ - End of headers: \r\n\r\n │
│ │
│ FIX: │
│ ──── │
│ snprintf(buf, sizeof(buf), │
│ "HTTP/1.0 200 OK\r\n" │
│ "Content-Type: text/html\r\n" │
│ "Content-Length: %d\r\n" │
│ "\r\n", len); // Empty line ends headers │
│ │
│ VERIFICATION: │
│ ───────────── │
│ # Use hexdump to see actual bytes │
│ curl -s http://localhost:8080/ | xxd | head │
│ # Look for 0d 0a (CRLF) not just 0a (LF) │
│ │
└────────────────────────────────────────────────────────────────────────┘
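One way to make a missing CRLF harder to write is to build every response header block in a single place. The function below is a sketch under that assumption: `build_response_headers` and its parameter list are hypothetical, and the header set simply mirrors the one used in the hints.

```c
#include <stdio.h>

/*
 * Format a status line and headers into buf, each line ending in CRLF,
 * followed by the blank line that terminates the header section.
 * Returns the length written, or -1 if buf is too small.
 */
int build_response_headers(char *buf, size_t cap, int status,
                           const char *reason, const char *content_type,
                           long content_length)
{
    int n = snprintf(buf, cap,
                     "HTTP/1.0 %d %s\r\n"
                     "Server: Tiny Web Server\r\n"
                     "Connection: close\r\n"
                     "Content-Length: %ld\r\n"
                     "Content-Type: %s\r\n"
                     "\r\n",
                     status, reason, content_length, content_type);
    if (n < 0 || (size_t)n >= cap)
        return -1;      /* encoding error or truncated output */
    return n;
}
```

The returned length can be passed directly to rio_writen(), and a -1 result tells the caller that the buffer was too small before anything reaches the socket.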
Extensions & Challenges
Beginner Extensions
- Default index file: Serve index.html when a directory is requested
- More MIME types: Add support for PDF, SVG, fonts, JSON
- Directory listing: Generate HTML listing when no index.html
- Access logging: Log requests in Apache combined log format
- HEAD method: Return headers only, no body
Intermediate Extensions
- HTTP/1.1 keep-alive: Handle multiple requests per connection
- POST method: Parse request body, support form submissions
- Chunked transfer: Send responses in chunks for streaming
- Virtual hosts: Serve different content based on Host header
- Basic authentication: Implement HTTP Basic Auth
- Concurrent with fork(): Fork a child for each connection
Advanced Extensions
- Thread pool: Pre-create worker threads, use a work queue
- I/O multiplexing: Use select/poll/epoll for event-driven design
- SSL/TLS: Add HTTPS support with OpenSSL
- WebSocket: Upgrade connections to WebSocket protocol
- HTTP/2: Implement binary framing, multiplexing
- Reverse proxy: Forward requests to backend servers
- Caching: Add Cache-Control headers, If-Modified-Since support
- Compression: gzip/deflate response bodies
Project Ideas
- Static Site Generator Backend: Serve markdown files rendered as HTML
- API Server: Respond to REST requests with JSON
- File Upload Server: Handle multipart form data
- Chat Server: WebSocket-based real-time messaging
- Load Balancer: Distribute requests across backend servers
Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Unix I/O fundamentals | CS:APP 3rd Ed | Chapter 10 “System-Level I/O” |
| Robust I/O functions | CS:APP 3rd Ed | Section 10.5 “Robust Reading and Writing” |
| Socket programming | CS:APP 3rd Ed | Chapter 11 “Network Programming” |
| Socket API details | CS:APP 3rd Ed | Section 11.4 “The Sockets Interface” |
| HTTP and Web servers | CS:APP 3rd Ed | Section 11.5 “Web Servers” |
| Process control (fork/exec) | CS:APP 3rd Ed | Chapter 8 “Exceptional Control Flow” |
| Advanced socket programming | Unix Network Programming Vol. 1 | Chapters 4-6 |
| HTTP protocol details | HTTP: The Definitive Guide | Chapters 1-4 |
| Production web servers | “nginx: A Practical Guide to High Performance” | Entire book |
| TCP/IP internals | TCP/IP Illustrated Vol. 1 | Chapters 12-24 |
Self-Assessment Checklist
Understanding
- I can explain the difference between a listening socket and connection socket
- I understand why accept() returns a new file descriptor
- I can describe the HTTP request/response format with correct line endings
- I know what MIME types are and why they matter
- I can explain how CGI uses fork/exec/dup2 to run programs
- I understand why short reads/writes happen and how Rio handles them
Implementation
- Server accepts connections on the specified port
- Request line parsing extracts method, URI, and version
- Headers are read until empty line
- Static files are served with correct MIME types
- Content-Length header matches actual body size
- 404 returned for missing files
- 403 returned for permission denied
- 400 returned for malformed requests and path traversal
- CGI programs execute and output reaches client
- No zombie processes after CGI requests
- No memory leaks (check with valgrind)
Security
- Path traversal attacks are blocked
- No files served outside document root
- Error messages don’t reveal server internals
- CGI environment is properly sanitized
Performance
- Can handle at least 100 requests/second for small files
- Large files (10MB+) transfer completely
- No connection timeouts for slow clients
- Resource cleanup happens on errors
Growth
- I tested with curl, netcat, and a real browser
- I debugged at least one issue using tcpdump/wireshark
- I understand what production servers do that this doesn’t
- I can discuss HTTP server architecture in an interview
Submission / Completion Criteria
Minimum Viable Completion:
- Server accepts connections and serves static files
- Correct HTTP response format with headers
- Basic MIME type detection (.html, .css, .jpg, .png)
- 404 error for missing files
- Path traversal protection
Full Completion:
- All of the above plus CGI support
- Proper zombie process handling
- Complete MIME type coverage
- Request logging
- Clean error handling for all edge cases
Excellence (Going Above & Beyond):
- Concurrent handling (fork, threads, or I/O multiplexing)
- Keep-alive connections
- Virtual host support
- Access authentication
- Performance benchmarks and optimization
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.