Project 13: HTTP/1.1 Server Implementation

Build a compliant HTTP/1.1 server that serves static files, handles keep-alive connections, supports chunked transfer encoding, and parses HTTP headers correctly.

Quick Reference

Attribute       Value
Difficulty      Level 4 - Expert
Time Estimate   2-3 Weeks
Language        C (primary), Rust/Go (alternatives)
Prerequisites   Socket programming (Project 12), file I/O, string parsing
Key Topics      HTTP protocol, TCP sockets, MIME types, keep-alive, range requests

1. Learning Objectives

After completing this project, you will:

  • Understand HTTP at the byte level: Parse request lines, headers, and bodies according to RFC 7230-7235
  • Implement a state machine for protocol parsing: Handle partial reads, incomplete requests, and streaming data
  • Master TCP connection management: Keep-alive, timeouts, graceful shutdown, and connection reuse
  • Learn efficient file serving techniques: Use sendfile() for zero-copy file transfers
  • Handle HTTP security concerns: Prevent path traversal attacks, validate input, sanitize URLs
  • Implement proper MIME type detection: Map file extensions to Content-Type headers
  • Understand HTTP error handling: Return appropriate status codes (200, 206, 301, 400, 404, 500, etc.)

2. Theoretical Foundation

2.1 Core Concepts

HTTP (HyperText Transfer Protocol) is a request-response protocol that powers the World Wide Web. Understanding HTTP at the byte level is essential for anyone building web applications, debugging network issues, or implementing server infrastructure.

HTTP Request/Response Flow

Client                                  Server
  │                                       │
  │  ──── TCP Connection Setup ────────>  │
  │  <─── TCP SYN-ACK ─────────────────   │
  │                                       │
  │  ──── HTTP Request ────────────────>  │
  │       GET /index.html HTTP/1.1        │
  │       Host: localhost:8080            │
  │       Connection: keep-alive          │
  │                                       │
  │  <─── HTTP Response ───────────────   │
  │       HTTP/1.1 200 OK                 │
  │       Content-Type: text/html         │
  │       Content-Length: 1234            │
  │       <html>...</html>                │
  │                                       │
  │  ──── Next Request (Keep-Alive) ───>  │
  │       GET /style.css HTTP/1.1         │
  │                                       │
  │  <─── Response ────────────────────   │
  │                                       │
  │  ──── Connection Close ────────────>  │
  └───────────────────────────────────────┘

HTTP Message Structure:

HTTP Request Format:
┌────────────────────────────────────────────────┐
│ Request-Line: METHOD SP Request-URI SP HTTP/1.1│
│ CRLF                                           │
├────────────────────────────────────────────────┤
│ Header-1: value1 CRLF                          │
│ Header-2: value2 CRLF                          │
│ ...                                            │
│ CRLF (empty line ends headers)                 │
├────────────────────────────────────────────────┤
│ [Optional Message Body]                        │
└────────────────────────────────────────────────┘

HTTP Response Format:
┌────────────────────────────────────────────────┐
│ Status-Line: HTTP/1.1 SP Status-Code SP Reason │
│ CRLF                                           │
├────────────────────────────────────────────────┤
│ Content-Type: text/html CRLF                   │
│ Content-Length: 1234 CRLF                      │
│ Connection: keep-alive CRLF                    │
│ CRLF (empty line ends headers)                 │
├────────────────────────────────────────────────┤
│ <html>...actual content...</html>              │
└────────────────────────────────────────────────┘

CRLF = Carriage Return + Line Feed (\r\n)
SP = Space character
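
To make the request-line layout concrete, here is a minimal sketch that splits one request line into its three fields. It assumes the complete line is already in memory with the trailing CRLF removed; the names split_request_line and request_line_t are illustrative only and are not part of the project's required API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three request-line fields. */
typedef struct {
    char method[16];
    char uri[2048];
    char version[16];
} request_line_t;

/* Copy n bytes into dst if non-empty and it fits; 0 on success, -1 otherwise. */
static int copy_token(char *dst, size_t cap, const char *src, size_t n) {
    if (n == 0 || n >= cap) return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split "METHOD SP Request-URI SP HTTP-Version" (no trailing CRLF).
 * Returns 0 on success, -1 if the line is malformed. */
int split_request_line(const char *line, size_t len, request_line_t *out) {
    const char *end = line + len;
    const char *sp1 = memchr(line, ' ', len);
    if (!sp1) return -1;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(end - (sp1 + 1)));
    if (!sp2) return -1;
    /* A third space means an unencoded space in the URI: reject. */
    if (memchr(sp2 + 1, ' ', (size_t)(end - (sp2 + 1)))) return -1;

    if (copy_token(out->method, sizeof out->method, line, (size_t)(sp1 - line)) < 0) return -1;
    if (copy_token(out->uri, sizeof out->uri, sp1 + 1, (size_t)(sp2 - sp1 - 1)) < 0) return -1;
    if (copy_token(out->version, sizeof out->version, sp2 + 1, (size_t)(end - sp2 - 1)) < 0) return -1;
    return 0;
}
```

Working on explicit lengths rather than NUL-terminated strings keeps the function safe on raw receive buffers, and the bounded copies reject over-long methods or URIs instead of overflowing.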

Key HTTP/1.1 Features:

  1. Persistent Connections (Keep-Alive): By default, HTTP/1.1 connections are persistent. Multiple requests can be sent over the same TCP connection, avoiding the overhead of connection establishment.

  2. Host Header Requirement: HTTP/1.1 requires the Host header, enabling virtual hosting (multiple domains on one IP).

  3. Chunked Transfer Encoding: Allows sending data in chunks when content length is unknown.

  4. Range Requests: Clients can request specific byte ranges (essential for video streaming and resumable downloads).

  5. Cache-Control: Headers for controlling caching behavior.
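
As an illustration of feature 3, the sketch below frames one chunk the way chunked transfer encoding puts it on the wire: the size in hexadecimal, CRLF, the data, CRLF. The function name encode_chunk and its buffer-based interface are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Write one chunk ("<hex size>\r\n<data>\r\n") into dst.
 * Returns the number of bytes written, or 0 if dst is too small.
 * Passing len == 0 produces the terminating chunk "0\r\n\r\n". */
size_t encode_chunk(char *dst, size_t cap, const void *data, size_t len) {
    char head[32];
    int head_len = snprintf(head, sizeof head, "%zx\r\n", len);
    if (head_len < 0) return 0;
    size_t total = (size_t)head_len + len + 2;   /* size line + data + CRLF */
    if (total > cap) return 0;
    memcpy(dst, head, (size_t)head_len);
    if (len > 0) memcpy(dst + head_len, data, len);
    memcpy(dst + head_len + len, "\r\n", 2);
    return total;
}
```

A response that uses this framing sends Transfer-Encoding: chunked instead of Content-Length, then one call per piece of data, then a final zero-length chunk.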

2.2 Why This Matters

Real-World Importance:

  • Every web interaction uses HTTP: Understanding HTTP deeply helps debug issues, optimize performance, and build better applications
  • Foundation for modern protocols: HTTP/2 and HTTP/3 build upon HTTP/1.1 semantics
  • Security implications: Many vulnerabilities (XSS, CSRF, injection) relate to HTTP handling
  • Performance optimization: Understanding keep-alive, caching, and compression lets you cut latency and bandwidth use substantially

Industry Usage:

  • nginx: Serves roughly a third of all websites by common survey counts, including many high-traffic sites
  • Apache: The most deployed web server historically
  • Node.js http module: Built on similar principles
  • Load balancers (HAProxy, Envoy): Must understand HTTP for routing decisions

Career Relevance:

Building an HTTP server demonstrates:

  • Protocol implementation skills (parsing, state machines)
  • Systems programming proficiency (sockets, file I/O, memory management)
  • Security awareness (input validation, path sanitization)
  • Performance optimization (zero-copy, connection pooling)

2.3 Historical Context

Timeline of HTTP:

HTTP Evolution Timeline

1991  HTTP/0.9  - Simple GET requests, no headers
        │
        v
1996  HTTP/1.0  - Headers, POST, status codes (RFC 1945)
        │         - New connection per request
        v
1997  HTTP/1.1  - Keep-alive default, Host required (RFC 2068)
        │         - Chunked encoding, range requests
        │         - Virtual hosting enabled
        v
2014  HTTP/1.1  - Updated specifications (RFC 7230-7235)
        │         - Clearer semantics, security fixes
        │         - Superseded in 2022 by RFC 9110-9112
        v
2015  HTTP/2    - Binary framing, multiplexing (RFC 7540)
        │         - Header compression (HPACK)
        │         - Server push
        v
2022  HTTP/3    - QUIC transport (RFC 9114)
                  - UDP-based, 0-RTT connections

The UNIX Philosophy in Web Servers:

Early web servers like NCSA HTTPd (1993) and Apache (1995) followed UNIX principles:

  • Configuration via text files
  • Modular architecture
  • Process-per-connection model

Modern servers like nginx (2004) use event-driven architecture (epoll/kqueue) for handling thousands of concurrent connections efficiently.

2.4 Common Misconceptions

Misconception 1: “HTTP is a simple text protocol”

  • Reality: While text-based, HTTP has subtle parsing requirements. Headers can span multiple lines (obsolete but must be handled), header names are case-insensitive, and line endings must be CRLF.

Misconception 2: “Just read until you see the end of the request”

  • Reality: HTTP requests may be split across multiple TCP packets. Your parser must handle partial reads, buffering, and reassembly.

Misconception 3: “Content-Length is always present”

  • Reality: Chunked encoding doesn’t use Content-Length. You must detect and parse chunk sizes.
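
Detecting chunk sizes means reading a hexadecimal number from each chunk header line. A minimal sketch, assuming the line has already been isolated and its CRLF stripped (parse_chunk_size is a name chosen for this example):

```c
#include <stddef.h>
#include <stdint.h>

/* Parse the hex chunk-size at the start of a chunk header line.
 * Chunk extensions after ';' are ignored.
 * Returns 0 on success, -1 on a malformed or overflowing size. */
int parse_chunk_size(const char *line, size_t len, size_t *out) {
    size_t value = 0, i = 0;
    for (; i < len; i++) {
        unsigned char c = (unsigned char)line[i];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') digit = (unsigned)(c - 'A') + 10;
        else break;                                /* end of the hex digits */
        if (value > (SIZE_MAX >> 4)) return -1;    /* next shift would overflow */
        value = (value << 4) | digit;
    }
    if (i == 0) return -1;                         /* no digits at all */
    if (i < len && line[i] != ';') return -1;      /* only extensions may follow */
    *out = value;
    return 0;
}
```

A size of 0 marks the last chunk. The explicit overflow check matters because an attacker controls this number.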

Misconception 4: “Closing the connection signals end of response”

  • Reality: With keep-alive, you must use Content-Length or chunked encoding to delimit responses. Closing the connection is a last resort.

Misconception 5: “Path from URL is the filesystem path”

  • Reality: URL paths must be validated, decoded (URL encoding), and mapped to filesystem paths safely. ../ sequences are attack vectors.
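
The decoding step can be sketched as follows. The function name url_decode and its return convention are assumptions for illustration; the important details are that malformed escapes and encoded NUL bytes are rejected, and that decoding happens before any path checks.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes from src into dst (capacity cap, NUL-terminated).
 * Returns the decoded length, or -1 on a bad escape, an encoded NUL
 * byte (%00), or insufficient space. */
int url_decode(const char *src, char *dst, size_t cap) {
    size_t n = 0;
    while (*src) {
        char c = *src++;
        if (c == '%') {
            int hi = hex_value(src[0]);
            int lo = hi < 0 ? -1 : hex_value(src[1]);  /* never reads past the NUL */
            if (hi < 0 || lo < 0) return -1;
            c = (char)((hi << 4) | lo);
            if (c == '\0') return -1;                  /* reject embedded NUL */
            src += 2;
        }
        if (n + 1 >= cap) return -1;
        dst[n++] = c;
    }
    if (cap == 0) return -1;
    dst[n] = '\0';
    return (int)n;
}
```

Note that "%2e%2e/" decodes to "../", which is why traversal checks must run on the decoded path, not on the raw URL.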

3. Project Specification

3.1 What You Will Build

A fully functional HTTP/1.1 server (myhttpd) that:

  1. Listens on a configurable port for incoming TCP connections
  2. Serves static files from a document root directory
  3. Parses HTTP/1.1 requests including method, path, headers
  4. Supports keep-alive connections with configurable timeout
  5. Returns proper HTTP responses with correct status codes and headers
  6. Handles directory listings (optional, configurable)
  7. Supports range requests for partial content delivery
  8. Implements security measures against path traversal attacks

Architecture Overview

                    ┌─────────────────────────────────────────┐
                    │              myhttpd                     │
                    │                                          │
   Client Request   │  ┌──────────────┐   ┌────────────────┐  │
   ─────────────────>  │ TCP Listener │──>│ Request Parser │  │
                    │  └──────────────┘   └───────┬────────┘  │
                    │                             │            │
                    │                    ┌────────v────────┐   │
                    │                    │ Router/Handler  │   │
                    │                    └────────┬────────┘   │
                    │           ┌────────────────┬─────────┐   │
                    │           v                v         v   │
                    │  ┌─────────────┐  ┌──────────┐ ┌────────┐│
                    │  │Static Files │  │Directory │ │ Error  ││
                    │  │  Handler    │  │ Listing  │ │Handler ││
                    │  └──────┬──────┘  └────┬─────┘ └───┬────┘│
                    │         └──────────────┴───────────┘     │
                    │                        │                 │
   HTTP Response    │              ┌─────────v─────────┐       │
   <─────────────────              │ Response Builder  │       │
                    │              └───────────────────┘       │
                    └─────────────────────────────────────────┘

3.2 Functional Requirements

  1. Socket Operations
    • Create a listening socket on specified port
    • Accept incoming connections
    • Handle multiple concurrent connections (can start with single-threaded)
  2. Request Parsing
    • Parse HTTP request line: METHOD /path HTTP/1.1
    • Parse all headers into key-value pairs
    • Handle request body for POST requests (by Content-Length)
    • Support URL decoding (%20 -> space, etc.)
  3. Response Generation
    • Generate proper HTTP/1.1 response headers
    • Set correct Content-Type based on file extension
    • Set Content-Length from file size
    • Include Date header with current time
  4. Static File Serving
    • Serve files from document root
    • Return 404 for missing files
    • Return 403 for directory access (unless listing enabled)
    • Support index.html as default for directories
  5. Keep-Alive Support
    • Keep HTTP/1.1 connections open by default; honor Connection: close and Connection: keep-alive headers
    • Timeout idle connections after configurable period
    • Close connection after max requests or timeout
  6. Range Requests
    • Parse Range header (bytes=start-end)
    • Return 206 Partial Content with Content-Range
    • Support single range requests (multi-range optional)
  7. Error Handling
    • Return appropriate status codes (400, 403, 404, 500, etc.)
    • Generate HTML error pages
    • Log errors to stderr
  8. Security
    • Prevent path traversal (../)
    • Validate all input
    • Use realpath() to resolve symbolic links safely

3.3 Non-Functional Requirements

  1. Performance
    • Use sendfile() for efficient file transfers
    • Buffer size of at least 8KB for headers
    • Support at least 100 concurrent connections
  2. Reliability
    • Handle all error conditions gracefully
    • No crashes on malformed input
    • Clean resource cleanup on shutdown
  3. Logging
    • Log all requests in combined log format
    • Include client IP, timestamp, request, status, bytes
  4. Portability
    • Compile on Linux and macOS
    • Use POSIX APIs where possible

3.4 Example Usage / Output

# 1. Start server
$ ./myhttpd -p 8080 -d /var/www
HTTP server started
Document root: /var/www
Listening on port 8080

# 2. Test with curl
$ curl -v http://localhost:8080/index.html
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /index.html HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html
< Content-Length: 1234
< Connection: keep-alive
< Last-Modified: Fri, 15 Mar 2024 10:00:00 GMT
<
<!DOCTYPE html>
<html>...

# 3. Test keep-alive
$ curl -v http://localhost:8080/file1.txt http://localhost:8080/file2.txt
* Connection #0 reused (for file2.txt)

# 4. Test 404
$ curl -i http://localhost:8080/nonexistent
HTTP/1.1 404 Not Found
Content-Type: text/html
Content-Length: 48

<html><body><h1>404 Not Found</h1></body></html>

# 5. Test directory listing (if enabled)
$ curl http://localhost:8080/images/
<html><body>
<h1>Index of /images/</h1>
<ul>
  <li><a href="photo1.jpg">photo1.jpg</a></li>
  <li><a href="photo2.png">photo2.png</a></li>
</ul>
</body></html>

# 6. Test range request (for video streaming)
$ curl -H "Range: bytes=0-999" -i http://localhost:8080/video.mp4
HTTP/1.1 206 Partial Content
Content-Range: bytes 0-999/1000000
Content-Length: 1000
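
The Range value in example 6 can be parsed along these lines. This is a sketch for single ranges only; parse_range and its return convention are illustrative assumptions. It covers the three forms bytes=start-end, bytes=start- and bytes=-suffix, and clamps the end to the file size.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parse a single-range value against a file of file_size bytes.
 * On success stores inclusive offsets in *start and *end and returns 0.
 * Returns -1 if the value is malformed or cannot be satisfied (the
 * caller then ignores the header or answers 416). */
int parse_range(const char *value, unsigned long long file_size,
                unsigned long long *start, unsigned long long *end) {
    if (strncmp(value, "bytes=", 6) != 0 || file_size == 0) return -1;
    const char *p = value + 6;
    char *stop;
    errno = 0;
    if (*p == '-') {                               /* suffix form: last N bytes */
        if (p[1] < '0' || p[1] > '9') return -1;
        unsigned long long n = strtoull(p + 1, &stop, 10);
        if (errno || *stop != '\0' || n == 0) return -1;
        if (n > file_size) n = file_size;
        *start = file_size - n;
        *end = file_size - 1;
        return 0;
    }
    if (*p < '0' || *p > '9') return -1;
    unsigned long long first = strtoull(p, &stop, 10);
    if (errno || *stop != '-') return -1;
    unsigned long long last = file_size - 1;       /* default for "start-" */
    if (stop[1] != '\0') {
        if (stop[1] < '0' || stop[1] > '9') return -1;
        last = strtoull(stop + 1, &stop, 10);
        if (errno || *stop != '\0') return -1;     /* rejects multi-range lists */
        if (last > file_size - 1) last = file_size - 1;
    }
    if (first > last || first >= file_size) return -1;
    *start = first;
    *end = last;
    return 0;
}
```

For the request above, the response values follow directly: Content-Range is "bytes start-end/file_size" and Content-Length is end - start + 1.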

3.5 Real World Outcome

What you will see:

  1. Serve web pages: Works with real browsers (Chrome, Firefox, Safari)
  2. Static files: HTML, CSS, JS, images load correctly
  3. Keep-alive: Multiple requests per connection (verify with Wireshark or strace)
  4. Proper headers: Content-Type, Content-Length, Cache-Control all correct

Success Indicators:

  • Browser renders a complete web page with CSS and images
  • curl -v shows proper HTTP headers
  • strace shows sendfile() used for file transfers
  • Multiple requests reuse the same connection
  • No memory leaks (verified with Valgrind)

4. Solution Architecture

4.1 High-Level Design

System Architecture

    ┌─────────────────────────────────────────────────────────────┐
    │                        Main Process                          │
    │                                                              │
    │   ┌──────────────┐                                           │
    │   │   socket()   │ Create listening socket                   │
    │   │   bind()     │ Bind to port                              │
    │   │   listen()   │ Start accepting connections               │
    │   └──────┬───────┘                                           │
    │          │                                                   │
    │          v                                                   │
    │   ┌──────────────┐                                           │
    │   │   accept()   │ ─── Loop: wait for connections ───────┐  │
    │   └──────┬───────┘                                        │  │
    │          │                                                │  │
    │          v                                                │  │
    │   ┌────────────────────────────────────────────────────┐  │  │
    │   │              Connection Handler                     │  │  │
    │   │                                                     │  │  │
    │   │  ┌────────────────┐    ┌─────────────────────────┐ │  │  │
    │   │  │ Read Request   │───>│ Parse HTTP Request      │ │  │  │
    │   │  │ (recv loop)    │    │ - Request line          │ │  │  │
    │   │  └────────────────┘    │ - Headers               │ │  │  │
    │   │                        │ - Body (if POST)        │ │  │  │
    │   │                        └──────────┬──────────────┘ │  │  │
    │   │                                   │                 │  │  │
    │   │                        ┌──────────v──────────────┐ │  │  │
    │   │                        │ Validate & Route        │ │  │  │
    │   │                        │ - Check path safety     │ │  │  │
    │   │                        │ - Map to file           │ │  │  │
    │   │                        └──────────┬──────────────┘ │  │  │
    │   │                                   │                 │  │  │
    │   │             ┌─────────────────────┼─────────────┐   │  │  │
    │   │             v                     v             v   │  │  │
    │   │      ┌───────────┐        ┌────────────┐  ┌──────┐ │  │  │
    │   │      │Serve File │        │Dir Listing │  │Error │ │  │  │
    │   │      │sendfile() │        │Generate    │  │Page  │ │  │  │
    │   │      └─────┬─────┘        └─────┬──────┘  └──┬───┘ │  │  │
    │   │            └────────────────────┴────────────┘      │  │  │
    │   │                           │                         │  │  │
    │   │                 ┌─────────v─────────┐               │  │  │
    │   │                 │ Build Response    │               │  │  │
    │   │                 │ - Status line     │               │  │  │
    │   │                 │ - Headers         │               │  │  │
    │   │                 │ - Body            │               │  │  │
    │   │                 └─────────┬─────────┘               │  │  │
    │   │                           │                         │  │  │
    │   │                 ┌─────────v─────────┐               │  │  │
    │   │                 │ Send Response     │               │  │  │
    │   │                 │ - send() headers  │               │  │  │
    │   │                 │ - sendfile() body │               │  │  │
    │   │                 └─────────┬─────────┘               │  │  │
    │   │                           │                         │  │  │
    │   │                 ┌─────────v─────────┐               │  │  │
    │   │                 │ Keep-Alive?       │───Yes────>────┘  │  │
    │   │                 │ More requests?    │                   │  │
    │   │                 └─────────┬─────────┘                   │  │
    │   │                           │No                           │  │
    │   │                 ┌─────────v─────────┐                   │  │
    │   │                 │ close(conn_fd)    │                   │  │
    │   │                 └───────────────────┘                   │  │
    │   └─────────────────────────────────────────────────────────┘  │
    │                                                                │
    └────────────────────────────────────────────────────────────────┘

4.2 Key Components

  1. Server Initialization (server_init)
    • Parse command-line arguments
    • Create and bind listening socket
    • Set socket options (SO_REUSEADDR)
  2. Connection Handler (handle_connection)
    • Read data from socket
    • Call request parser
    • Route to appropriate handler
    • Build and send response
    • Handle keep-alive loop
  3. Request Parser (parse_request)
    • State machine for incremental parsing
    • Handle partial reads
    • Parse request line, headers, body
  4. Response Builder (build_response)
    • Generate status line
    • Add required headers
    • Handle different content types
  5. File Handler (serve_file)
    • Open and stat file
    • Set Content-Type from extension
    • Use sendfile() for efficient transfer
  6. Security Module (validate_path)
    • URL decode path
    • Resolve with realpath()
    • Verify within document root
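
The three steps of the security module can be sketched as below, assuming the request path has already been URL-decoded. The names path_is_inside and resolve_request_path are hypothetical. The key detail is that the containment check compares against a path boundary, not just a string prefix.

```c
#define _XOPEN_SOURCE 700   /* expose realpath() under strict -std=c11 */
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Return 1 if resolved equals root or lies beneath it, else 0.
 * A bare prefix test is not enough: "/var/www-private" starts with
 * "/var/www" but is outside it, so the next byte must be '/' or NUL. */
int path_is_inside(const char *root, const char *resolved) {
    size_t n = strlen(root);
    if (n > 1 && root[n - 1] == '/') n--;          /* ignore trailing slash */
    if (strncmp(resolved, root, n) != 0) return 0;
    if (n == 1 && root[0] == '/') return 1;        /* root is "/" itself */
    return resolved[n] == '/' || resolved[n] == '\0';
}

/* Map an already URL-decoded request path onto the filesystem.
 * Writes the canonical path to out and returns 0, or returns -1 if
 * the target does not exist or escapes doc_root. */
int resolve_request_path(const char *doc_root, const char *decoded_path,
                         char out[PATH_MAX]) {
    char root_real[PATH_MAX], joined[PATH_MAX];
    if (!realpath(doc_root, root_real)) return -1;
    int n = snprintf(joined, sizeof joined, "%s/%s", root_real, decoded_path);
    if (n < 0 || (size_t)n >= sizeof joined) return -1;
    if (!realpath(joined, out)) return -1;         /* resolves ".." and symlinks */
    return path_is_inside(root_real, out) ? 0 : -1;
}
```

Because realpath() follows symbolic links, a link inside the document root that points outside it is rejected by the same check. The document root itself is also canonicalized, so the comparison still works when the root is reached through a symlink.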

4.3 Data Structures

/* Configuration */
typedef struct {
    int port;
    char document_root[PATH_MAX];
    int enable_dir_listing;
    int keep_alive_timeout;
    int max_request_size;
} server_config_t;

/* HTTP Request */
typedef struct {
    char method[16];           /* GET, POST, HEAD, etc. */
    char uri[2048];            /* Request URI */
    char version[16];          /* HTTP/1.1 */
    char host[256];            /* Host header */
    size_t content_length;     /* Content-Length header */
    int keep_alive;            /* Connection: keep-alive */
    char *headers;             /* Raw headers buffer */
    int header_count;          /* Number of headers */
    /* Range request support */
    int has_range;
    size_t range_start;
    size_t range_end;
} http_request_t;

/* HTTP Response */
typedef struct {
    int status_code;           /* 200, 404, etc. */
    const char *status_text;   /* "OK", "Not Found", etc. */
    char headers[4096];        /* Response headers */
    size_t header_len;
    int fd;                    /* File descriptor for sendfile */
    size_t content_length;
    const char *content_type;
} http_response_t;

/* Connection state */
typedef struct {
    int fd;                    /* Socket file descriptor */
    char buffer[8192];         /* Read buffer */
    size_t buffer_len;         /* Bytes in buffer */
    int state;                 /* Parser state (parser_state_t, see Hint 1) */
    int requests_served;       /* For keep-alive limit */
    time_t last_activity;      /* For timeout */
} connection_t;

/* MIME type mapping */
typedef struct {
    const char *extension;
    const char *mime_type;
} mime_entry_t;

static const mime_entry_t mime_types[] = {
    {".html", "text/html"},
    {".htm",  "text/html"},
    {".css",  "text/css"},
    {".js",   "application/javascript"},
    {".json", "application/json"},
    {".png",  "image/png"},
    {".jpg",  "image/jpeg"},
    {".jpeg", "image/jpeg"},
    {".gif",  "image/gif"},
    {".svg",  "image/svg+xml"},
    {".ico",  "image/x-icon"},
    {".txt",  "text/plain"},
    {".pdf",  "application/pdf"},
    {".mp4",  "video/mp4"},
    {".webm", "video/webm"},
    {NULL,    "application/octet-stream"}
};
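
A lookup over a sentinel-terminated table like the one above might look like this. To compile on its own, the sketch repeats a shortened copy of the table under different names (mime_pair_t, demo_mime_types); lookup_mime_type is likewise a hypothetical name. It compares extensions case-insensitively and only looks at the final path component.

```c
#define _XOPEN_SOURCE 700   /* expose strcasecmp() under strict -std=c11 */
#include <string.h>
#include <strings.h>        /* strcasecmp */

typedef struct { const char *extension; const char *mime_type; } mime_pair_t;

/* Shortened copy of the table; the NULL entry carries the fallback type. */
static const mime_pair_t demo_mime_types[] = {
    {".html", "text/html"},
    {".css",  "text/css"},
    {".jpg",  "image/jpeg"},
    {NULL,    "application/octet-stream"}
};

/* Look up the Content-Type for a path by its final extension. */
const char *lookup_mime_type(const char *path) {
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;   /* ignore dots in directories */
    const char *ext = strrchr(name, '.');
    const mime_pair_t *e = demo_mime_types;
    for (; e->extension != NULL; e++) {
        if (ext && strcasecmp(ext, e->extension) == 0) break;
    }
    return e->mime_type;   /* matched entry, or the sentinel's default */
}
```

Keeping the default in the sentinel entry means the loop needs no separate fallback branch: falling off the end of the table lands on "application/octet-stream".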

4.4 Algorithm Overview

Request Parsing State Machine:

┌───────────────────────────────────────────────────────────────┐
│                    Parser States                               │
│                                                                │
│  ┌─────────────────┐                                          │
│  │ STATE_START     │ ─── Read bytes ───> Check for \r\n       │
│  └────────┬────────┘                                          │
│           │ Found request line                                │
│           v                                                   │
│  ┌─────────────────┐                                          │
│  │ STATE_REQUEST   │ Parse: METHOD SP URI SP VERSION CRLF     │
│  │ LINE            │                                          │
│  └────────┬────────┘                                          │
│           │ Request line complete                              │
│           v                                                   │
│  ┌─────────────────┐                                          │
│  │ STATE_HEADERS   │ ─── Read header ───> key: value CRLF     │
│  │                 │ <── Loop until empty line (CRLF CRLF)    │
│  └────────┬────────┘                                          │
│           │ Empty line found                                  │
│           v                                                   │
│  ┌─────────────────┐                                          │
│  │ STATE_BODY      │ ─── Read Content-Length bytes ──────>    │
│  │ (if POST)       │     (or decode chunked encoding)         │
│  └────────┬────────┘                                          │
│           │ Body complete                                      │
│           v                                                   │
│  ┌─────────────────┐                                          │
│  │ STATE_COMPLETE  │ Request ready for processing              │
│  └─────────────────┘                                          │
│                                                                │
└───────────────────────────────────────────────────────────────┘

Edge Cases:
- \r\n split across recv() calls
- Partial header received
- Content-Length mismatch
- Invalid characters in method/URI
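
The first edge case, a terminator split across recv() calls, is handled naturally if the scanner remembers how much of "\r\n\r\n" it has already matched. A small sketch of that idea; header_scanner_t and scan_for_header_end are names invented for this example.

```c
#include <stddef.h>

/* Number of terminator bytes matched so far; persists between calls. */
typedef struct { int matched; } header_scanner_t;

/* Feed the next len bytes. Returns the count of bytes, within this call's
 * data, up to and including the end of the headers, or -1 if the
 * terminator has not appeared yet (call again with the next recv()). */
long scan_for_header_end(header_scanner_t *s, const char *data, size_t len) {
    static const char term[4] = {'\r', '\n', '\r', '\n'};
    for (size_t i = 0; i < len; i++) {
        if (data[i] == term[s->matched]) {
            if (++s->matched == 4) {
                s->matched = 0;
                return (long)(i + 1);
            }
        } else {
            /* Mismatch: the only partial match worth keeping is a fresh '\r'. */
            s->matched = (data[i] == '\r') ? 1 : 0;
        }
    }
    return -1;
}
```

Initialize the scanner to {0} per request. Since it carries only one integer between calls, it works whether the headers arrive in one read or one byte at a time.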

File Serving Algorithm:

serve_file(path):
    1. URL decode path (convert %xx to characters; reject %00)
    2. Join with document_root and resolve with realpath()
    3. Check that the resolved path equals document_root or starts
       with document_root + "/" (security!)
    4. stat() the path
    5. If directory:
       - Check for index.html
       - Or generate directory listing
    6. If file:
       - Get Content-Type from extension
       - Check Range header for partial content
       - Build response headers
       - Use sendfile() to transfer content
    7. Return appropriate status code

5. Implementation Guide

5.1 Development Environment Setup

# Install required packages (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y build-essential gdb valgrind strace

# Install required packages (macOS)
xcode-select --install
brew install valgrind  # Note: may not work on newer macOS

# Create project directory
mkdir -p ~/projects/myhttpd
cd ~/projects/myhttpd

# Create initial files
touch myhttpd.c Makefile

# Create test document root
mkdir -p www
echo "<html><body><h1>Hello, World!</h1></body></html>" > www/index.html
echo "body { color: blue; }" > www/style.css

Makefile:

CC = gcc
# gnu11 rather than c11 so glibc exposes POSIX calls such as realpath()
CFLAGS = -Wall -Wextra -g -O2 -std=gnu11
LDFLAGS =

TARGET = myhttpd
SRCS = myhttpd.c
OBJS = $(SRCS:.c=.o)

.PHONY: all clean test

all: $(TARGET)

$(TARGET): $(OBJS)
	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

clean:
	rm -f $(TARGET) $(OBJS)

test: $(TARGET)
	./$(TARGET) -p 8080 -d ./www &
	sleep 1
	curl -v http://localhost:8080/
	curl -v http://localhost:8080/index.html
	pkill -f "./$(TARGET)"

5.2 Project Structure

myhttpd/
├── Makefile
├── myhttpd.c          # Main server code
├── http_parser.c      # Request parsing (optional separate file)
├── http_parser.h
├── mime_types.c       # MIME type detection
├── mime_types.h
├── www/               # Test document root
│   ├── index.html
│   ├── style.css
│   ├── script.js
│   └── images/
│       ├── logo.png
│       └── photo.jpg
└── tests/
    ├── test_parser.c
    ├── test_range.sh
    └── test_keepalive.sh

5.3 The Core Question You’re Answering

“How does HTTP—the protocol that powers the web—actually work at the byte level?”

Understanding HTTP deeply is essential for any web developer or systems programmer. You’ll see:

  • Why headers are case-insensitive
  • Why chunked encoding exists
  • How keep-alive improves performance
  • Why CRLF matters

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. HTTP/1.1 Message Format
    • Request line: method, path, version
    • Headers: key-value pairs (case-insensitive keys)
    • Body: optional, length indicated by Content-Length
    • Reference: RFC 7230-7235
  2. Keep-Alive Connections
    • Connection header semantics
    • When to close vs. keep open
    • Timeout handling and request limits
    • Reference: RFC 7230 Section 6
  3. Content Types (MIME)
    • File extension to Content-Type mapping
    • text/html, application/json, image/png, etc.
    • Charset specification (text/html; charset=utf-8)
    • Reference: IANA Media Types registry
  4. Error Responses
    • 2xx Success (200 OK, 206 Partial Content)
    • 3xx Redirect (301 Moved Permanently, 302 Found)
    • 4xx Client Error (400 Bad Request, 404 Not Found)
    • 5xx Server Error (500 Internal Server Error)
    • Proper error pages with HTML body

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Request Parsing
    • How do you handle incomplete requests (data split across recv() calls)?
    • What if headers are spread across multiple TCP packets?
    • How do you detect the end of headers?
    • What is your maximum request size limit?
  2. File Serving
    • How do you handle large files efficiently?
    • Should you use sendfile() or read()/write()?
    • What about memory-mapped files?
    • How do you handle binary vs. text files?
  3. Security
    • How do you prevent path traversal attacks (../../../etc/passwd)?
    • What about symbolic links pointing outside document root?
    • How do you handle null bytes in paths?
    • What HTTP methods should you reject?
  4. Performance
    • How do you minimize system calls?
    • What is the optimal buffer size?
    • Should you use non-blocking I/O?
    • How do you handle slow clients?

5.6 Thinking Exercise

Parse an HTTP Request

Parse this request byte by byte:

GET /index.html HTTP/1.1\r\n
Host: localhost:8080\r\n
Connection: keep-alive\r\n
Accept: text/html\r\n
\r\n

State machine:
1. REQUEST_LINE: read until \r\n
   Parse: method="GET", path="/index.html", version="HTTP/1.1"

2. HEADERS: read lines until \r\n\r\n (empty line)
   For each line: split at the first ":" -> key, value (trim spaces/tabs around value)
   Store in hash table or array

3. Check for Content-Length header
   If present: read that many bytes for body
   If absent: no body (for GET/HEAD)

4. Request complete! Process it.

Edge cases:
- What if "\r\n" is split across recv() calls?
- What if Content-Length is wrong?
- What if path has URL encoding (%20)?

Trace through this manually:

  1. Draw the state transitions
  2. Mark where you’d buffer data
  3. Identify error conditions at each state
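
Step 2 of the state machine, splitting a header line into key and value, can be sketched like this. The function names are illustrative. The sketch splits at the first colon (values such as "localhost:8080" contain colons themselves), trims optional whitespace around the value, and compares names without regard to case.

```c
#define _XOPEN_SOURCE 700   /* expose strcasecmp() under strict -std=c11 */
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Split one header line (CRLF removed) in place into name and value.
 * Returns 0 on success, -1 if there is no colon, the name is empty,
 * or whitespace appears before the colon. */
int split_header_line(char *line, char **name, char **value) {
    char *colon = strchr(line, ':');
    if (!colon || colon == line) return -1;
    for (char *p = line; p < colon; p++)
        if (isspace((unsigned char)*p)) return -1;    /* "Host : x" is invalid */
    *colon = '\0';
    char *v = colon + 1;
    while (*v == ' ' || *v == '\t') v++;              /* leading whitespace */
    char *end = v + strlen(v);
    while (end > v && (end[-1] == ' ' || end[-1] == '\t')) end--;
    *end = '\0';                                      /* trailing whitespace */
    *name = line;
    *value = v;
    return 0;
}

/* Header names compare case-insensitively: "host" equals "Host". */
int header_name_is(const char *name, const char *wanted) {
    return strcasecmp(name, wanted) == 0;
}
```

Splitting in place avoids allocation: name and value point into the original line buffer, so they stay valid only as long as that buffer does.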

5.7 Hints in Layers

Hint 1: Request Parser State Machine

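Use states: STATE_REQUEST_LINE, STATE_HEADERS, STATE_BODY, STATE_COMPLETE, plus STATE_ERROR for malformed input.

typedef enum {
    STATE_REQUEST_LINE,
    STATE_HEADERS,
    STATE_BODY,
    STATE_COMPLETE,
    STATE_ERROR
} parser_state_t;

/* Return values of parse_request() */
enum { PARSE_ERROR = -1, PARSE_INCOMPLETE = 0, PARSE_COMPLETE = 1 };

/* Find CRLF in the first len bytes; the buffer is not NUL-terminated,
   so strstr() could read past the received data */
static char *find_crlf(char *buf, size_t len) {
    for (size_t i = 0; i + 1 < len; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n') return buf + i;
    return NULL;
}

/* Assumes connection_t carries a parser state field (conn->state) */
int parse_request(connection_t *conn, http_request_t *req) {
    char *line_end;
    size_t consumed;

    while (conn->buffer_len > 0) {
        switch (conn->state) {
            case STATE_REQUEST_LINE:
                line_end = find_crlf(conn->buffer, conn->buffer_len);
                if (!line_end) return PARSE_INCOMPLETE;  /* need more data */
                /* Parse: METHOD SP URI SP VERSION CRLF */
                parse_request_line(conn->buffer, line_end - conn->buffer, req);
                /* Shift buffer past this line and shrink the byte count */
                consumed = (size_t)(line_end + 2 - conn->buffer);
                memmove(conn->buffer, line_end + 2, conn->buffer_len - consumed);
                conn->buffer_len -= consumed;
                conn->state = STATE_HEADERS;
                break;
            /* ... STATE_HEADERS and STATE_BODY consume the buffer the same way ... */
            case STATE_COMPLETE:
                return PARSE_COMPLETE;
            default:  /* STATE_ERROR, or a state not implemented yet */
                return PARSE_ERROR;
        }
    }
    return conn->state == STATE_COMPLETE ? PARSE_COMPLETE : PARSE_INCOMPLETE;
}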

Hint 2: Simple MIME Type Detection

const char *get_content_type(const char *path) {
    const char *ext = strrchr(path, '.');
    if (!ext) return "application/octet-stream";
    if (strcmp(ext, ".html") == 0) return "text/html";
    if (strcmp(ext, ".css") == 0) return "text/css";
    if (strcmp(ext, ".js") == 0) return "application/javascript";
    if (strcmp(ext, ".png") == 0) return "image/png";
    if (strcmp(ext, ".jpg") == 0) return "image/jpeg";
    return "application/octet-stream";
}

Hint 3: Response Builder

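char response[4096];
int len = snprintf(response, sizeof(response),
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: %s\r\n"
    "Content-Length: %lld\r\n"
    "Connection: keep-alive\r\n"
    "\r\n",
    content_type, (long long)file_size);  /* cast: off_t width varies */
if (len < 0 || (size_t)len >= sizeof(response)) {
    /* Headers were truncated: answer 500 instead of sending them */
}

/* Send headers; send() may write fewer bytes than requested, so a real
   server loops until all len bytes are out */
send(client_fd, response, len, 0);

/* Send file body; sendfile() can also return early, so repeat the call
   until file_size bytes have been sent */
sendfile(client_fd, file_fd, NULL, file_size);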

Hint 4: Efficient File Sending

Use sendfile() system call to avoid copying file data through userspace:

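/* Linux:
   ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count) */
#include <sys/sendfile.h>

/* Returns bytes sent, which may be fewer than requested; offset is advanced */
ssize_t bytes_sent = sendfile(client_fd, file_fd, &offset, remaining);

/* macOS (different signature!):
   int sendfile(int fd, int s, off_t offset, off_t *len,
                struct sf_hdtr *hdtr, int flags)
   Note the swapped descriptor order: file first, then socket */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

off_t len = file_size;   /* in: bytes to send (0 = until EOF); out: bytes sent */
sendfile(file_fd, client_fd, 0, &len, NULL, 0);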

5.8 The Interview Questions They’ll Ask

  1. “What’s the difference between HTTP/1.0, HTTP/1.1, and HTTP/2?”
    • HTTP/1.0: New connection per request
    • HTTP/1.1: Keep-alive default, Host required, chunked encoding
    • HTTP/2: Binary framing, multiplexing, header compression
  2. “How does HTTP keep-alive work?”
    • Connection persists after response
    • Client can send another request on same socket
    • Server uses Content-Length or chunked encoding to know when response ends
    • Timeout and max-requests limits prevent resource exhaustion
  3. “What is chunked transfer encoding and why does it exist?”
    • Allows sending response before knowing total length
    • Format: chunk-size (hexadecimal) CRLF chunk-data CRLF
    • Ends with: 0 CRLF CRLF
    • Used for dynamic content, streaming
  4. “How would you handle a POST request with a large body?”
    • Read Content-Length header first
    • Stream body to temp file (don’t buffer in memory)
    • Use splice() or similar for efficiency
    • Impose size limits (413 Payload Too Large)
  5. “What is the Host header and why is it required in HTTP/1.1?”
    • Enables virtual hosting: multiple domains on one IP
    • Server uses Host to select which site to serve
    • Missing Host = 400 Bad Request
  6. “How would you implement range requests?”
    • Parse Range header: bytes=start-end
    • Seek to start position in file
    • Send Content-Range header: bytes start-end/total
    • Return 206 Partial Content status
  7. “What security concerns exist in an HTTP server?”
    • Path traversal (../)
    • Buffer overflows in parsing
    • Denial of service (slow clients, large headers)
    • Symbolic link attacks
    • Integer overflows in Content-Length
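
The chunk format from question 3 can be made concrete with a short sketch. It frames one chunk into a caller-supplied buffer; format_chunk is a hypothetical name, and trailer fields and chunk extensions are deliberately left out:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: frame `len` bytes of `data` as one HTTP/1.1 chunk in `out`.
 * Returns the number of bytes written, or -1 if `out` is too small.
 * Passing len == 0 produces the terminating "0\r\n\r\n" (no trailers).
 * The result is raw bytes and is not NUL-terminated. */
static int format_chunk(const char *data, size_t len, char *out, size_t out_size)
{
    /* chunk-size: the payload length in hexadecimal, followed by CRLF */
    int head = snprintf(out, out_size, "%zx\r\n", len);
    if (head < 0 || (size_t)head + len + 2 > out_size)
        return -1;
    memcpy(out + head, data, len);          /* chunk-data */
    memcpy(out + head + len, "\r\n", 2);    /* CRLF that closes the chunk */
    return head + (int)len + 2;
}
```

For example, format_chunk("Hello", 5, ...) yields the ten bytes 5 CRLF Hello CRLF. A response that uses this framing sends Transfer-Encoding: chunked instead of Content-Length.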

5.9 Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| HTTP protocol | RFC 7230-7235 | All |
| Socket programming | “UNIX Network Programming, Vol 1” by Stevens | Ch. 4-6 |
| Web servers | “HTTP: The Definitive Guide” by Gourley & Totty | Ch. 1-5, 15 |
| Advanced I/O | “The Linux Programming Interface” by Kerrisk | Ch. 59-61 |
| Performance | “High Performance Browser Networking” by Grigorik | Ch. 9-12 |

5.10 Implementation Phases

Phase 1: Basic Server (Week 1)

  • Create listening socket
  • Accept one connection
  • Read raw request bytes
  • Print request to stdout
  • Send hardcoded “Hello World” response
  • Close connection

Phase 2: Request Parser (Week 1)

  • Implement state machine parser
  • Parse request line (method, path, version)
  • Parse headers into structure
  • Handle incomplete reads
  • Add URL decoding

Phase 3: File Serving (Week 2)

  • Map path to filesystem
  • Open and stat files
  • Detect MIME types
  • Build proper response headers
  • Send file content

Phase 4: Security & Polish (Week 2)

  • Path traversal prevention
  • realpath() validation
  • Error handling (400, 404, 500)
  • HTML error pages
  • Logging

Phase 5: Keep-Alive & Range (Week 3)

  • Parse Connection header
  • Loop for multiple requests
  • Timeout handling
  • Range request support
  • 206 Partial Content

5.11 Key Implementation Decisions

| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Concurrency | Single-threaded | Thread-per-connection | Start single, add threads later |
| Buffer management | Static buffers | Dynamic allocation | Static (8KB) for simplicity |
| File sending | read/write loop | sendfile() | sendfile() for performance |
| Header storage | Array | Hash table | Array (simpler, usually <20 headers) |
| Path handling | String manipulation | realpath() | realpath() for security |
| Keep-alive | Always on | Honor Connection header | Honor header, default on for HTTP/1.1 |
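
The keep-alive recommendation reduces to a small decision function. The sketch below is one way to write it, assuming the parser has already extracted the version token and the Connection header value; should_keep_alive is a hypothetical name:

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper. `version` is the request's HTTP-version token and
 * `connection` is the value of its Connection header, or NULL if absent.
 * Simplification: treats the header value as a single token, although the
 * real grammar allows a comma-separated list. */
static bool should_keep_alive(const char *version, const char *connection)
{
    if (connection != NULL) {
        if (strcasecmp(connection, "close") == 0)
            return false;               /* client asked to close */
        if (strcasecmp(connection, "keep-alive") == 0)
            return true;                /* explicit opt-in, needed for HTTP/1.0 */
    }
    /* No explicit instruction: persistent by default only in HTTP/1.1 */
    return strcmp(version, "HTTP/1.1") == 0;
}
```

The server would evaluate this once per request and leave the request loop when it returns false, in addition to enforcing its own timeout and request-count limits.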

6. Testing Strategy

6.1 Unit Tests

/* Test request line parsing */
void test_parse_request_line(void) {
    http_request_t req = {0};
    const char *line = "GET /index.html HTTP/1.1";

    int result = parse_request_line(line, strlen(line), &req);

    assert(result == 0);
    assert(strcmp(req.method, "GET") == 0);
    assert(strcmp(req.uri, "/index.html") == 0);
    assert(strcmp(req.version, "HTTP/1.1") == 0);
}

/* Test path traversal detection */
void test_path_traversal(void) {
    assert(validate_path("/var/www", "/var/www/index.html") == 0);
    assert(validate_path("/var/www", "/var/www/../etc/passwd") == -1);
    assert(validate_path("/var/www", "/var/www/sub/../index.html") == 0);
}

/* Test MIME type detection */
void test_mime_types(void) {
    assert(strcmp(get_content_type("/test.html"), "text/html") == 0);
    assert(strcmp(get_content_type("/test.HTML"), "text/html") == 0);
    assert(strcmp(get_content_type("/test.png"), "image/png") == 0);
    assert(strcmp(get_content_type("/test"), "application/octet-stream") == 0);
}

/* Test URL decoding */
void test_url_decode(void) {
    char buf[256];
    url_decode("/path%20with%20spaces", buf, sizeof(buf));
    assert(strcmp(buf, "/path with spaces") == 0);

    url_decode("/path%2F%2E%2E/test", buf, sizeof(buf));
    assert(strcmp(buf, "/path/../test") == 0);
}
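
Note that test_mime_types expects /test.HTML to map to text/html, which only passes if the extension comparison ignores case. A sketch of a table-driven lookup that satisfies those assertions; content_type_for is a hypothetical name and the table is illustrative rather than complete:

```c
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical variant of get_content_type() that ignores extension case.
 * The table is illustrative, not exhaustive. */
static const char *content_type_for(const char *path)
{
    static const struct { const char *ext; const char *type; } table[] = {
        { ".html", "text/html"  },
        { ".css",  "text/css"   },
        { ".png",  "image/png"  },
        { ".jpg",  "image/jpeg" },
    };
    /* Look for the last dot in the file name only, not in a directory name */
    const char *slash = strrchr(path, '/');
    const char *ext = strrchr(slash ? slash : path, '.');
    if (ext != NULL) {
        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
            if (strcasecmp(ext, table[i].ext) == 0)
                return table[i].type;
        }
    }
    return "application/octet-stream";   /* safe default for unknown types */
}
```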

6.2 Integration Tests

#!/bin/bash
# test_integration.sh

SERVER_PID=0
PORT=8081
DOC_ROOT="./test_www"

setup() {
    mkdir -p $DOC_ROOT
    echo "<html><body>Hello</body></html>" > $DOC_ROOT/index.html
    echo "Test file content" > $DOC_ROOT/test.txt
    mkdir -p $DOC_ROOT/subdir
    echo "Subdir file" > $DOC_ROOT/subdir/file.txt

    ./myhttpd -p $PORT -d $DOC_ROOT &
    SERVER_PID=$!
    sleep 1
}

teardown() {
    kill $SERVER_PID 2>/dev/null
    rm -rf $DOC_ROOT
}

test_get_index() {
    response=$(curl -s -w "%{http_code}" http://localhost:$PORT/)
    [[ "$response" == *"Hello"* ]] && [[ "$response" == *"200" ]]
}

test_get_404() {
    status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:$PORT/nonexistent)
    [[ "$status" == "404" ]]
}

test_path_traversal() {
    # --path-as-is stops curl from collapsing ../ segments before it sends the request
    status=$(curl -s -o /dev/null -w "%{http_code}" --path-as-is http://localhost:$PORT/../../../etc/passwd)
    [[ "$status" == "404" ]] || [[ "$status" == "400" ]]
}

test_keep_alive() {
    # Make two requests on same connection
    response=$(curl -v http://localhost:$PORT/index.html http://localhost:$PORT/test.txt 2>&1)
    [[ "$response" == *"Re-using existing connection"* ]] || [[ "$response" == *"reused"* ]]
}

# Run tests
trap teardown EXIT
setup

test_get_index && echo "PASS: test_get_index" || echo "FAIL: test_get_index"
test_get_404 && echo "PASS: test_get_404" || echo "FAIL: test_get_404"
test_path_traversal && echo "PASS: test_path_traversal" || echo "FAIL: test_path_traversal"
test_keep_alive && echo "PASS: test_keep_alive" || echo "FAIL: test_keep_alive"

6.3 Edge Cases to Test

  1. Request Parsing
    • Empty request
    • Request with only \r\n
    • Very long request line (>8KB)
    • Invalid HTTP method
    • Missing Host header
    • Headers with no value
    • Case sensitivity in headers
  2. File Handling
    • Zero-byte files
    • Very large files (>1GB)
    • Files with spaces in name
    • Hidden files (dotfiles)
    • Symbolic links
    • Files without read permission
  3. Keep-Alive
    • Client closes before second request
    • Timeout between requests
    • Maximum requests exceeded
    • Pipelined requests (multiple requests without waiting for response)
  4. Range Requests
    • Invalid range (start > end)
    • Range beyond file size
    • Multiple ranges (optional to support)
    • Range with only start: bytes=100-
    • Range with only end: bytes=-100
  5. Error Conditions
    • Server out of file descriptors
    • Document root doesn’t exist
    • Write error mid-response
    • Client disconnect during sendfile
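
The range edge cases in item 4 are easier to reason about against a concrete parser. The following is a sketch, not a complete implementation: it handles a single byte range, rejects multiple ranges, and folds syntax errors and unsatisfiable ranges into one error code (a fuller version would distinguish them so it can answer 416 for the latter). parse_range is a hypothetical name:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a single-range header value such as "bytes=0-99",
 * "bytes=100-" or "bytes=-100" against a file of `size` bytes.
 * On success stores inclusive byte positions in *start and *end and returns 0.
 * Returns -1 for syntax errors, multiple ranges, and unsatisfiable ranges. */
static int parse_range(const char *value, long long size,
                       long long *start, long long *end)
{
    long long s, e;
    char *rest;

    if (size <= 0 || strncmp(value, "bytes=", 6) != 0)
        return -1;
    const char *p = value + 6;

    if (*p == '-') {                         /* suffix form: the last N bytes */
        if (!isdigit((unsigned char)p[1])) return -1;
        long long n = strtoll(p + 1, &rest, 10);
        if (*rest != '\0' || n == 0) return -1;
        s = (n >= size) ? 0 : size - n;      /* N larger than file: whole file */
        e = size - 1;
    } else {
        if (!isdigit((unsigned char)*p)) return -1;
        s = strtoll(p, &rest, 10);
        if (*rest != '-') return -1;
        p = rest + 1;
        if (*p == '\0') {                    /* open-ended: from start to EOF */
            e = size - 1;
        } else {
            if (!isdigit((unsigned char)*p)) return -1;
            e = strtoll(p, &rest, 10);
            if (*rest != '\0') return -1;    /* trailing junk or a second range */
            if (s > e) return -1;            /* invalid: start after end */
            if (e >= size) e = size - 1;     /* clamp an end beyond the file */
        }
        if (s >= size) return -1;            /* starts beyond the file */
    }
    *start = s;
    *end = e;
    return 0;
}
```

On success the response would carry Content-Range: bytes start-end/size and a Content-Length of end - start + 1.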

6.4 Verification Commands

# Trace system calls
strace -f ./myhttpd -p 8080 -d ./www 2>&1 | tee strace.log

# Check for sendfile usage
strace ./myhttpd -p 8080 -d ./www 2>&1 | grep sendfile

# Memory leak detection
valgrind --leak-check=full --show-leak-kinds=all ./myhttpd -p 8080 -d ./www

# Check file descriptor leaks
ls -la /proc/$(pgrep myhttpd)/fd/

# Load testing
ab -n 1000 -c 10 http://localhost:8080/index.html
wrk -t4 -c100 -d30s http://localhost:8080/index.html

# Protocol compliance
curl -v -H "Host: test" http://localhost:8080/
curl -v -X POST -d "data" http://localhost:8080/
curl -v -H "Connection: close" http://localhost:8080/

# Range request
curl -v -H "Range: bytes=0-99" http://localhost:8080/largefile.txt

7. Common Pitfalls & Debugging

Problem 1: “Browser shows blank page”

  • Why: Missing Content-Length or wrong value
  • Fix: Use stat() to get exact file size, verify with ls -la
  • Test: curl -v http://localhost:8080/ and check Content-Length header

Problem 2: “Connection reset by peer”

  • Why: Closing connection before client finished reading
  • Fix: Shutdown write side first, wait for client to close
  • Code:
    shutdown(client_fd, SHUT_WR);  /* Stop sending */
    /* Read until EOF or error */
    while (recv(client_fd, buf, sizeof(buf), 0) > 0);
    close(client_fd);
    

Problem 3: “Path traversal vulnerability”

  • Why: Not sanitizing .. in paths
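  • Fix: Use realpath() and verify the result is the document root or lies beneath it (a bare prefix match also accepts siblings such as /var/www-evil)
  • Code:
    char resolved[PATH_MAX];
    if (realpath(full_path, resolved) == NULL) {
        return -1;  /* Path does not exist or cannot be resolved */
    }
    /* doc_root must itself be a realpath() result with no trailing slash */
    size_t root_len = strlen(doc_root);
    if (strncmp(resolved, doc_root, root_len) != 0 ||
        (resolved[root_len] != '/' && resolved[root_len] != '\0')) {
        return -1;  /* Path escapes document root */
    }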
    

Problem 4: “Keep-alive not working”

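  • Why: Closing the socket after each response, or sending a missing or wrong Content-Length so the client cannot tell where the response ends. HTTP/1.1 connections are persistent by default; an explicit Connection: keep-alive header only matters for HTTP/1.0 clients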
  • Fix: Check Connection header in request, honor timeout
  • Debug: curl -v should show “Connection #0 to host … left intact”

Problem 5: “Partial file sent”

  • Why: sendfile() may not send all bytes in one call
  • Fix: Loop until all bytes sent
  • Code:
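    size_t remaining = file_size;
    while (remaining > 0) {
        ssize_t sent = sendfile(client_fd, file_fd, NULL, remaining);
        if (sent < 0) {
            if (errno == EINTR) continue;   /* Interrupted by a signal: retry */
            if (errno == EAGAIN) {          /* Non-blocking socket buffer is full */
                /* Wait for POLLOUT with poll() here instead of spinning */
                continue;
            }
            break;  /* Real error */
        }
        if (sent == 0) break;  /* EOF before file_size bytes: file was truncated */
        remaining -= (size_t)sent;
    }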
    

Problem 6: “Request parsing fails randomly”

  • Why: Request split across multiple recv() calls
  • Fix: Buffer all data until \r\n\r\n found
  • Debug: Add logging for each recv() showing bytes received
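
One way to structure the buffering fix is to separate "is the header block complete?" from the recv() loop. The sketch below assumes a blocking socket and a fixed-size buffer; header_block_length and read_headers are hypothetical names. It rescans the buffer from the start after each read, which is simple but not optimal:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: return the number of bytes up to and including the
 * first "\r\n\r\n" in buf[0..len), or 0 if the terminator has not arrived. */
static size_t header_block_length(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}

/* Accumulate recv() data until the header block is complete.
 * Returns the header length; 0 if the peer closed early or the headers do
 * not fit in `cap` bytes; -1 on a socket error. *used receives the total
 * bytes buffered, which may include the start of a body or a pipelined request. */
static ssize_t read_headers(int fd, char *buf, size_t cap, size_t *used)
{
    *used = 0;
    while (*used < cap) {
        ssize_t n = recv(fd, buf + *used, cap - *used, 0);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: retry */
            return -1;
        }
        if (n == 0) return 0;               /* peer closed mid-request */
        *used += (size_t)n;
        size_t head = header_block_length(buf, *used);
        if (head > 0) return (ssize_t)head;
    }
    return 0;                               /* buffer full, no terminator yet */
}
```

Any bytes between the returned header length and *used belong to the request body or to the next pipelined request, so they must be kept rather than discarded.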

Problem 7: “URL with spaces fails”

  • Why: Not URL-decoding the path
  • Fix: Implement URL decode for %xx sequences
  • Test: curl "http://localhost:8080/file%20with%20spaces.txt"
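
A possible shape for the decoder, using the same argument order as url_decode() in the unit tests of section 6.1. This is a sketch under assumptions: the return convention (decoded length or -1) and the rejection of %00 are choices made here, not requirements stated by the project:

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if `c` is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode %xx escapes from `src` into `dst` (at most dst_size bytes, including
 * the terminator). Returns the decoded length, or -1 on a malformed escape,
 * an embedded %00, or insufficient space. '+' is left alone: it means space
 * only in form-encoded query strings, not in paths. */
static int url_decode(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    while (*src != '\0') {
        int c = (unsigned char)*src++;
        if (c == '%') {
            int hi = hex_value((unsigned char)src[0]);
            int lo = (hi < 0) ? -1 : hex_value((unsigned char)src[1]);
            if (lo < 0) return -1;          /* truncated or non-hex escape */
            c = hi * 16 + lo;
            if (c == 0) return -1;          /* %00 would truncate the C string */
            src += 2;
        }
        if (out + 1 >= dst_size) return -1; /* keep room for the terminator */
        dst[out++] = (char)c;
    }
    if (dst_size == 0) return -1;
    dst[out] = '\0';
    return (int)out;
}
```

Decoding has to happen before path validation: %2F%2E%2E decodes to /.., so checking the raw path for traversal would miss it.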

Problem 8: “Memory leak on each request”

  • Why: Not freeing dynamically allocated headers/body
  • Debug: Run with Valgrind, check for “definitely lost” blocks
  • Fix: Free all allocations at end of request handling

8. Extensions & Challenges

8.1 Easy Extensions

  1. Add directory listing
    • Generate HTML page listing files in directory
    • Sort by name, show file sizes and dates
  2. Add Last-Modified header
    • Use stat() mtime for Last-Modified
    • Return 304 Not Modified if the file has not changed since the If-Modified-Since date
  3. Add logging
    • Combined Log Format (like Apache/nginx)
    • Log: IP, date, request, status, bytes, user-agent
  4. Add command-line options
    • -p PORT: Listen port
    • -d ROOT: Document root
    • -t TIMEOUT: Keep-alive timeout
    • -v: Verbose mode
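
For the Last-Modified extension (and the Date header), the timestamp has to be rendered in the fixed HTTP date format, always in GMT. A minimal sketch using strftime(); format_http_date is a hypothetical name:

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: write `t` in the HTTP date format, for example
 * "Sun, 06 Nov 1994 08:49:37 GMT". Returns the string length, or 0 on failure.
 * Assumes the default "C" locale, because %a and %b are locale-dependent.
 * gmtime() uses a static buffer; a threaded server would use gmtime_r(). */
static size_t format_http_date(time_t t, char *buf, size_t size)
{
    const struct tm *utc = gmtime(&t);   /* HTTP dates are never local time */
    if (utc == NULL)
        return 0;
    return strftime(buf, size, "%a, %d %b %Y %H:%M:%S GMT", utc);
}
```

Passing st_mtime from stat() gives the Last-Modified value, and passing time(NULL) gives the Date header.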

8.2 Advanced Challenges

  1. Thread pool for concurrent connections
    • Use pthread with work queue
    • Handle connection lifecycle across threads
  2. TLS/HTTPS support
    • Integrate OpenSSL or mbedTLS
    • Handle certificate loading and handshake
  3. HTTP/2 support
    • Binary framing layer
    • Stream multiplexing
    • Header compression (HPACK)
  4. CGI support
    • Execute scripts for .cgi or .php
    • Set environment variables
    • Capture stdout as response
  5. Reverse proxy mode
    • Forward requests to backend server
    • Support load balancing
  6. WebSocket upgrade
    • Detect Upgrade header
    • Perform WebSocket handshake
    • Handle WebSocket frames

8.3 Research Topics

  1. Zero-copy I/O
    • Compare sendfile, splice, vmsplice
    • Measure performance differences
  2. Event-driven architecture
    • Convert to epoll-based design
    • Compare with thread-per-connection
  3. HTTP/3 and QUIC
    • Understand UDP-based transport
    • Connection migration, 0-RTT
  4. Web server security
    • Study CVEs in nginx/Apache
    • OWASP guidelines for web servers

9. Real-World Connections

9.1 Production Systems Using This

| System | Requests/sec (rough, hardware-dependent) | Key Features |
|---|---|---|
| nginx | 50,000+ | Event-driven, low memory |
| Apache httpd | 10,000+ | Process/thread models, modules |
| lighttpd | 40,000+ | Single-threaded, event-driven |
| Caddy | 20,000+ | Automatic HTTPS, HTTP/2 |
| Node.js http | 30,000+ | JavaScript, async I/O |

9.2 How the Pros Do It

nginx architecture:

Master Process
    │
    ├─── Worker Process 1 (epoll loop)
    │         ├─── Connection 1
    │         ├─── Connection 2
    │         └─── ... (thousands)
    │
    ├─── Worker Process 2 (epoll loop)
    │
    └─── Worker Process N

  • Uses epoll (Linux) or kqueue (BSD/macOS)
  • Single thread per worker handles thousands of connections
  • Memory-efficient: reuses buffers, minimizes allocations
  • Zero-copy with sendfile()

Apache httpd:

  • Multiple processing models (prefork, worker, event)
  • Prefork: process-per-connection
  • Event MPM: hybrid threads + events
  • Rich module ecosystem

9.3 Reading the Source

Start with:

  1. lighttpd: Smaller codebase, good learning example
    • src/connections.c: Connection handling
    • src/request.c: Request parsing
    • src/response.c: Response generation
  2. nginx: Production quality, complex but well-documented
    • src/http/ngx_http_parse.c: HTTP parser
    • src/http/ngx_http_request.c: Request lifecycle
    • src/os/unix/ngx_readv_chain.c: Efficient I/O
  3. h2o: Modern C server with HTTP/2
    • Clean codebase, good documentation

10. Resources

10.1 Man Pages

# Essential
man 2 socket
man 2 bind
man 2 listen
man 2 accept
man 2 recv
man 2 send
man 2 sendfile
man 2 stat
man 2 open
man 2 close
man 2 shutdown

# Helpful
man 7 socket
man 7 tcp
man 7 ip
man 3 realpath
man 3 strftime  # For Date header

10.2 Online Resources

  • RFC 7230: HTTP/1.1 Message Syntax and Routing
    • https://tools.ietf.org/html/rfc7230
  • RFC 7231: HTTP/1.1 Semantics and Content
    • https://tools.ietf.org/html/rfc7231
  • IANA Media Types: Official MIME type registry
    • https://www.iana.org/assignments/media-types/
  • MDN HTTP Guide: Excellent conceptual overview
    • https://developer.mozilla.org/en-US/docs/Web/HTTP
  • Beej’s Guide to Network Programming
    • https://beej.us/guide/bgnet/

10.3 Book Chapters

| Book | Chapters | Focus |
|---|---|---|
| “UNIX Network Programming, Vol 1” by Stevens | Ch. 1-8, 14 | Socket fundamentals |
| “The Linux Programming Interface” by Kerrisk | Ch. 56-63 | Sockets, advanced I/O |
| “HTTP: The Definitive Guide” | Ch. 1-5, 15 | HTTP protocol deep dive |
| “High Performance Browser Networking” | Ch. 9-11 | Performance optimization |
| “Computer Networks” by Tanenbaum | Ch. 7 | HTTP/WWW concepts |

11. Self-Assessment Checklist

Before considering this project complete, verify:

  • I can explain how HTTP request parsing works with a state machine
  • I understand why keep-alive improves performance
  • I can describe what sendfile() does and why it’s efficient
  • I know how to prevent path traversal attacks
  • My implementation handles partial reads correctly
  • My implementation handles large files without excessive memory use
  • I can explain the difference between HTTP/1.0 and HTTP/1.1
  • I understand what MIME types are and how they’re detected
  • I can answer all the interview questions confidently
  • My code passes all tests with zero Valgrind errors
  • Real browsers can load pages from my server
  • Keep-alive connections work correctly
  • Error pages return proper status codes

12. Submission / Completion Criteria

This project is complete when:

  1. Functional Requirements Met:
    • Server starts and listens on specified port
    • Serves static files (HTML, CSS, JS, images)
    • Returns 404 for missing files
    • Returns 400 for malformed requests
    • Handles keep-alive connections
  2. Security Requirements Met:
    • Path traversal attacks are blocked
    • Large requests are rejected
    • No crashes on malformed input
  3. Quality Requirements Met:
    • No memory leaks (verified with Valgrind)
    • No file descriptor leaks
    • Clean shutdown on SIGINT/SIGTERM
  4. Testing Complete:
    • All unit tests pass
    • All integration tests pass
    • Manual testing with real browser successful
    • Performance test with ab or wrk shows reasonable throughput
  5. Documentation:
    • README with build and usage instructions
    • Code comments for complex sections
    • Design decisions documented

Stretch Goals (Optional):

  • Range requests (206 Partial Content)
  • Directory listing
  • ETag support
  • Gzip compression
  • Thread pool for concurrency