Project 6: HTTP Server with Request Pooling (Capstone)

Sprint: 2 - Data & Invariants
Difficulty: Expert
Time Estimate: 2-4 weeks
Prerequisites: All previous Sprint 2 projects, basic understanding of sockets

Overview

What you’ll build: A single-threaded HTTP/1.1 server that handles multiple concurrent connections using select()/poll(), with a custom memory pool for request parsing, demonstrating every concept from this sprint in a production-relevant context.

Why this is the ultimate test: A network server is where memory bugs become security bugs. You must:

Parse untrusted input into owned data structures
Track connection state with strict invariants (partial reads, connection lifecycle)
Prevent buffer overflows in parsing (attackers WILL send malformed data)
Free resources correctly when connections close unexpectedly
Handle ownership of request data across parse/handle/respond phases

The Core Question You’re Answering:

“Can I build a production-quality network service that handles untrusted input, manages complex state, and never leaks memory or crashes?”

Learning Objectives

By the end of this project, you will be able to:

Implement an event-driven server using select()/poll()
Design a connection state machine with clear invariants
Use pool allocation for request handling
Parse HTTP safely with bounds checking
Handle partial reads/writes correctly
Survive fuzzing without crashes or leaks
Build production-quality systems software

Theoretical Foundation

Connection State Machine

CONNECTION STATES:
┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│   ┌─────────────┐  request line parsed  ┌─────────────────────┐    │
│   │   READING   │ ────────────────────► │   READING_HEADERS   │    │
│   │   REQUEST   │                       │  (partial headers)  │    │
│   └─────────────┘                       └─────────────────────┘    │
│                                                    │               │
│                 headers complete (blank line seen) │               │
│   ┌─────────────┐                                  │               │
│   │  PROCESSING │ ◄────────────────────────────────┘               │
│   │  (building  │                                                  │
│   │   response) │                                                  │
│   └─────────────┘                                                  │
│         │                                                          │
│         │ response ready                                           │
│         ▼                                                          │
│   ┌─────────────┐      send complete    ┌─────────────┐            │
│   │   SENDING   │ ────────────────────► │   CLOSING   │            │
│   │  RESPONSE   │                       │             │            │
│   └─────────────┘                       └─────────────┘            │
│         │  ▲                                                       │
│         │  │ partial send (bytes remain): stay in SENDING_RESPONSE │
│         └──┘                                                       │
│                                                                    │
│   Any state: error, timeout, or peer disconnect → CLOSING          │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

INVARIANTS:
1. Each connection is in EXACTLY one state
2. Buffer contains only data valid for current state
3. State transitions only happen on specific events
4. Resources are freed when entering CLOSING state
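
One way to make invariants 1 and 3 checkable is to route every state change through a single function that asserts the transition is legal. The sketch below assumes the one-request-per-connection model used in this project (responses carry Connection: close); transition_allowed and set_state are hypothetical helper names, not part of the Core API.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the ConnectionState enum from the Data Structures section. */
typedef enum {
    CONN_READING_REQUEST,
    CONN_READING_HEADERS,
    CONN_PROCESSING,
    CONN_SENDING_RESPONSE,
    CONN_CLOSING
} ConnectionState;

/* Hypothetical helper: is `from -> to` a legal transition? */
static bool transition_allowed(ConnectionState from, ConnectionState to) {
    /* Errors, timeouts, and disconnects may close a connection from any
     * state; closing an already closing connection is treated as a no-op. */
    if (to == CONN_CLOSING) return true;

    switch (from) {
    case CONN_READING_REQUEST:  return to == CONN_READING_HEADERS;
    case CONN_READING_HEADERS:  return to == CONN_PROCESSING;
    case CONN_PROCESSING:       return to == CONN_SENDING_RESPONSE;
    case CONN_SENDING_RESPONSE: return false;  /* only exit is CLOSING */
    case CONN_CLOSING:          return false;  /* terminal state */
    }
    return false;
}

/* Hypothetical helper: the only place where a state is ever assigned. */
static void set_state(ConnectionState *state, ConnectionState next) {
    assert(transition_allowed(*state, next));
    *state = next;
}
```

Because the state lives in a single enum field, invariant 1 holds by construction; the assert turns an illegal transition into an immediate, debuggable failure instead of silent corruption.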

Request Pool Allocation

Per-Request Memory Pool:
┌────────────────────────────────────────────────────────────────────┐
│                                                                    │
│   Connection starts:                                               │
│   ┌─────────────────────────────────────────────────┐             │
│   │ Pool: 4KB                                        │             │
│   │ [──────────────── available ──────────────────] │             │
│   └─────────────────────────────────────────────────┘             │
│                                                                    │
│   After parsing request:                                           │
│   ┌─────────────────────────────────────────────────┐             │
│   │ Pool: 4KB                                        │             │
│   │ [method][uri][headers][][ available ]            │             │
│   │  ← 312 bytes used →    ← 3784 free →            │             │
│   └─────────────────────────────────────────────────┘             │
│                                                                    │
│   After response sent:                                             │
│   pool_reset() → 0 bytes used, ready for next request             │
│                                                                    │
│   KEY INSIGHT:                                                     │
│   - Zero malloc() calls during request handling                   │
│   - All request data has same lifetime                            │
│   - pool_reset() is O(1), not O(n) frees                         │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
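
The pool in the diagram can be sketched as a bump allocator. This is a minimal version, assuming the arena from the earlier Sprint 2 projects looks roughly like this; the function names match the calls used later in this document, but the struct layout and arena_destroy are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;  /* start of the backing buffer */
    size_t capacity;      /* total bytes in the buffer */
    size_t used;          /* bump offset: bytes handed out so far */
} Arena;

Arena *arena_create(size_t capacity) {
    Arena *a = malloc(sizeof *a);
    if (!a) return NULL;
    a->base = malloc(capacity);
    if (!a->base) { free(a); return NULL; }
    a->capacity = capacity;
    a->used = 0;
    return a;
}

/* align must be a power of two. Returns NULL when the pool is exhausted. */
void *arena_alloc(Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)a->base + a->used;
    size_t pad = (size_t)(-addr & (align - 1));  /* bytes to next aligned address */
    if (pad > a->capacity - a->used) return NULL;
    if (size > a->capacity - a->used - pad) return NULL;
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* O(1): every allocation made since the last reset is released at once. */
void arena_reset(Arena *a) { a->used = 0; }

void arena_destroy(Arena *a) {
    if (!a) return;
    free(a->base);
    free(a);
}
```

The two malloc() calls happen once at startup; after that, request handling only moves the used offset, which is what makes the "zero malloc() calls during request handling" goal reachable.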

Defensive Parsing

HTTP Request Format:
─────────────────────
GET /index.html HTTP/1.1\r\n
Host: localhost\r\n
Content-Length: 0\r\n
\r\n

ATTACK VECTORS (must handle all):
┌────────────────────────────────────────────────────────────────────┐
│ 1. Oversized URI                                                   │
│    GET /AAAAAAA...(10KB)...AAAA HTTP/1.1                          │
│    → Reject with 414 URI Too Long                                  │
│                                                                    │
│ 2. Invalid Content-Length                                          │
│    Content-Length: 99999999999                                     │
│    → Reject with 413 Payload Too Large                            │
│                                                                    │
│ 3. Null bytes in headers                                           │
│    Host: localhost\x00malicious                                   │
│    → Reject with 400 Bad Request                                   │
│                                                                    │
│ 4. Missing terminator                                              │
│    GET /index.html HTTP/1.1\r\nHost: local(connection closes)     │
│    → Detect EOF (read() returns 0), free resources, close          │
│                                                                    │
│ 5. Slowloris attack                                                │
│    Send partial request, hold connection                           │
│    → Timeout after 30 seconds, close                              │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
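
Attack vector 2 is a good place to see defensive parsing in miniature. The sketch below validates a Content-Length value and returns the HTTP status to reject with; parse_content_length and the 1 MiB MAX_BODY_BYTES limit are assumptions chosen for illustration, and the caller is assumed to have already trimmed surrounding whitespace from the header value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BODY_BYTES (1024UL * 1024UL)  /* assumed limit: 1 MiB */

/* Returns 0 and stores the length on success, otherwise the HTTP status
 * code the request should be rejected with. */
static int parse_content_length(const char *value, size_t *out) {
    assert(value != NULL && out != NULL);

    /* strtoull() accepts leading spaces and signs; HTTP does not. */
    if (value[0] < '0' || value[0] > '9') return 400;

    errno = 0;
    char *end = NULL;
    unsigned long long n = strtoull(value, &end, 10);
    if (*end != '\0') return 400;        /* trailing garbage such as "12abc" */
    if (errno == ERANGE) return 413;     /* does not fit in unsigned long long */
    if (n > MAX_BODY_BYTES) return 413;  /* larger than the server will accept */

    *out = (size_t)n;
    return 0;
}
```

With these assumptions, the value 99999999999 from the table parses successfully as a number and is then rejected by the explicit limit check, so the decision never depends on integer overflow behavior.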

Project Specification

Core API

// Server lifecycle
Server* server_create(int port, const char* docroot);
void server_run(Server* server);  // Main event loop
void server_stop(Server* server);
void server_destroy(Server* server);

// Configuration
void server_set_pool_size(Server* server, size_t size);
void server_set_max_connections(Server* server, int max);
void server_set_timeout(Server* server, int seconds);

// Statistics
ServerStats server_get_stats(Server* server);
void server_print_stats(Server* server);

Expected Output

$ ./httpserver --pool-size 4096 --max-connections 128 --port 8080 ./www
[INFO] Initializing memory pools...
[INFO] Created 128 connection pools (4096 bytes each)
[INFO] Total pre-allocated memory: 512 KB
[INFO] Listening on 0.0.0.0:8080
[INFO] Document root: /home/user/project/www
[INFO] Server ready. Press Ctrl+C to stop.

[CONN 0] New connection from 127.0.0.1:52341
[CONN 0] State: READING_REQUEST_LINE
[CONN 0] Read 78 bytes, buffer: 78/4096
[CONN 0] Parsed: GET /index.html HTTP/1.1
[CONN 0] State: READING_HEADERS → PROCESSING
[CONN 0] Headers parsed: 6 headers, used 312 bytes pool
[CONN 0] State: PROCESSING → SENDING_RESPONSE
[CONN 0] Serving file: ./www/index.html (1247 bytes)
[CONN 0] Sent 1247 bytes
[CONN 0] State: SENDING_RESPONSE → CLOSING
[CONN 0] Pool reset, 312 bytes freed
[CONN 0] Connection closed
[CONN 0] Total lifetime: 3ms, heap allocs: 0

Memory Statistics

========== Memory Statistics ==========
Total requests served: 1,247
Total bytes received: 156,783
Total bytes sent: 4,291,034

Pool Statistics:
  Active connections: 3
  Peak connections: 47
  Pool resets: 1,247
  Average pool usage: 412 bytes/request
  Peak pool usage: 3,891 bytes
  Pool overflows: 0

Heap Statistics:
  malloc() calls during runtime: 0
  free() calls during runtime: 0
  Current heap usage: 0 bytes

Connection State Breakdown:
  READING_REQUEST_LINE: 2
  READING_HEADERS: 1
  PROCESSING: 0
  SENDING_RESPONSE: 0

Invariant Checks Passed: 47,291
Invariant Violations: 0
=======================================

Solution Architecture

Data Structures

typedef enum {
    CONN_READING_REQUEST,
    CONN_READING_HEADERS,
    CONN_PROCESSING,
    CONN_SENDING_RESPONSE,
    CONN_CLOSING
} ConnectionState;

typedef struct {
    int fd;
    ConnectionState state;

    // Request parsing
    char read_buffer[8192];
    size_t read_pos;

    // Pool for request data
    Arena* pool;

    // Parsed request (allocated from pool)
    char* method;
    char* uri;
    char* version;
    HttpHeader* headers;
    size_t header_count;

    // Response
    char* response_buffer;
    size_t response_len;
    size_t response_sent;

    // Timing
    time_t connected_at;
    time_t last_activity;
} Connection;

typedef struct {
    int listen_fd;
    Connection* connections[MAX_CONNECTIONS];
    Arena* pools[MAX_CONNECTIONS];

    const char* docroot;
    ServerStats stats;

    bool running;
} Server;
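
The structs above refer to HttpHeader and ServerStats without defining them. A possible shape is sketched below; the field names are taken from how they are used in server_print_stats() and close_connection(), while the exact types (chosen to match the %lu and %d format specifiers) and the stats_on_accept helper are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One parsed header; both strings live in the connection's pool. */
typedef struct {
    char *name;
    char *value;
} HttpHeader;

typedef struct {
    unsigned long total_requests;
    unsigned long total_connections;
    unsigned long bytes_received;
    unsigned long bytes_sent;
    unsigned long pool_resets;
    unsigned long total_pool_usage;  /* sum of pool bytes used per request */
    int active_connections;
    int peak_connections;
    int invariant_violations;
} ServerStats;

/* Hypothetical helper: call once per accepted connection so the peak
 * counter can never lag behind the active counter. */
static void stats_on_accept(ServerStats *s) {
    assert(s != NULL);
    s->active_connections++;
    if (s->active_connections > s->peak_connections) {
        s->peak_connections = s->active_connections;
    }
}
```

Keeping the counter updates in small helpers like this makes "Statistics are accurate" something the invariant checker can verify, for example that peak_connections is never less than active_connections.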

Implementation Guide

Phase 1: Basic Socket Server (Day 1-2)

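Server* server_create(int port, const char* docroot) {
    Server* server = calloc(1, sizeof(Server));
    if (!server) return NULL;
    server->listen_fd = -1;  // So cleanup can tell "not opened" from fd 0

    // Store the canonical absolute docroot so serve_file() can compare it
    // against realpath() output. realpath(path, NULL) returns malloc'd memory.
    server->docroot = realpath(docroot, NULL);
    if (!server->docroot) {
        perror("realpath");
        goto fail;
    }

    // Create listening socket
    server->listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (server->listen_fd < 0) {
        perror("socket");
        goto fail;
    }

    // Allow port reuse (failure here is not fatal)
    int opt = 1;
    setsockopt(server->listen_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    // Bind
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((uint16_t)port);

    if (bind(server->listen_fd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
        perror("bind");
        goto fail;
    }

    if (listen(server->listen_fd, 128) < 0) {
        perror("listen");
        goto fail;
    }

    // Pre-allocate pools
    for (int i = 0; i < MAX_CONNECTIONS; i++) {
        server->pools[i] = arena_create(POOL_SIZE);
        if (!server->pools[i]) {
            fprintf(stderr, "arena_create failed for pool %d\n", i);
            goto fail;
        }
    }

    return server;

fail:
    // server_destroy() must tolerate a partially built server:
    // listen_fd == -1, NULL docroot, and NULL pool slots.
    server_destroy(server);
    return NULL;
}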
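
An event loop must never block inside read() or write(), so the listening socket and every accepted socket are usually switched to non-blocking mode. The following is a minimal sketch using fcntl(); set_nonblocking and read_available are hypothetical helper names, and the -2 "try again later" return value is an arbitrary convention for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int set_nonblocking(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Reads whatever is available right now. Returns the byte count, 0 on EOF,
 * -1 on a real error, or -2 when there is nothing to read yet. */
static ssize_t read_available(int fd, void *buf, size_t cap) {
    ssize_t n = read(fd, buf, cap);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)) {
        return -2;
    }
    return n;
}
```

In this design a -2 result leaves the connection in its current state and waits for the next select() wakeup, 0 moves it to CLOSING, and a positive count feeds the parser.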

Phase 2: Event Loop with select() (Day 2-3)

void server_run(Server* server) {
    server->running = true;

    while (server->running) {
        fd_set read_fds, write_fds;
        FD_ZERO(&read_fds);
        FD_ZERO(&write_fds);

        int max_fd = server->listen_fd;
        FD_SET(server->listen_fd, &read_fds);

        // Add active connections
        for (int i = 0; i < MAX_CONNECTIONS; i++) {
            Connection* conn = server->connections[i];
            if (!conn) continue;

            if (conn->state == CONN_READING_REQUEST ||
                conn->state == CONN_READING_HEADERS) {
                FD_SET(conn->fd, &read_fds);
            }
            if (conn->state == CONN_SENDING_RESPONSE) {
                FD_SET(conn->fd, &write_fds);
            }
            if (conn->fd > max_fd) max_fd = conn->fd;
        }

        struct timeval timeout = {.tv_sec = 1, .tv_usec = 0};
        int ready = select(max_fd + 1, &read_fds, &write_fds, NULL, &timeout);

        if (ready < 0) {
            if (errno == EINTR) continue;
            perror("select");
            break;
        }

        // Accept new connections
        if (FD_ISSET(server->listen_fd, &read_fds)) {
            accept_connection(server);
        }

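        // Handle existing connections
        for (int i = 0; i < MAX_CONNECTIONS; i++) {
            Connection* conn = server->connections[i];
            if (!conn) continue;

            if (FD_ISSET(conn->fd, &read_fds)) {
                handle_read(server, conn);
                // handle_read() may have closed and freed this connection;
                // re-read the slot before touching conn again.
                conn = server->connections[i];
                if (!conn) continue;
            }
            if (FD_ISSET(conn->fd, &write_fds)) {
                handle_write(server, conn);
            }
        }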

        // Check timeouts
        check_timeouts(server);
    }
}
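
The loop above uses select(), which cannot watch descriptors numbered FD_SETSIZE or higher. The same readiness check can be written with poll(), which takes an array instead of a bitmask. This is a minimal sketch, not a drop-in replacement for server_run(); poll_readable and the MAX_POLL limit are assumptions for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>  /* pipe(), write(), close() if you try this with pipes */

#define MAX_POLL 16  /* assumed upper bound for this sketch */

/* Waits up to timeout_ms for any of fds[0..n) to become readable.
 * Fills ready[i] for each descriptor and returns the number of ready
 * descriptors, 0 on timeout, or -1 on error. */
static int poll_readable(const int *fds, size_t n, int timeout_ms, bool *ready) {
    assert(n <= MAX_POLL);
    struct pollfd pfds[MAX_POLL];

    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc < 0) return -1;

    for (size_t i = 0; i < n; i++) {
        /* POLLHUP/POLLERR also count: the next read() reports EOF or the error. */
        ready[i] = (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) != 0;
    }
    return rc;
}
```

A connection in SENDING_RESPONSE would request POLLOUT instead of POLLIN, mirroring the write_fds set in the select() version.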

Phase 3: HTTP Parsing with Bounds Checking (Day 3-5)

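// Copy len bytes into the pool as a NUL-terminated string; NULL if the pool is full.
static char* pool_copy(Arena* pool, const char* src, size_t len) {
    char* dst = arena_alloc(pool, len + 1, 1);
    if (!dst) return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}

// Returns true once a full request line has been parsed. Returns false both
// when more data is needed and when the request was rejected; in the latter
// case send_error() is expected to queue the error response and move the
// connection out of the reading states.
static bool parse_request_line(Connection* conn) {
    char* buf = conn->read_buffer;

    // Bounded search for "\r\n": read_buffer is not NUL-terminated, so
    // strstr() could read past read_pos or stop early at an embedded NUL.
    char* line_end = NULL;
    for (size_t i = 0; i + 1 < conn->read_pos; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            line_end = buf + i;
            break;
        }
    }
    if (!line_end) {
        if (conn->read_pos > MAX_REQUEST_LINE) {
            send_error(conn, 414, "URI Too Long");  // Can never become valid
        }
        return false;  // Incomplete
    }

    size_t line_len = (size_t)(line_end - buf);
    if (line_len > MAX_REQUEST_LINE) {
        send_error(conn, 414, "URI Too Long");
        return false;
    }
    if (memchr(buf, '\0', line_len)) {
        send_error(conn, 400, "Bad Request");  // Embedded NUL byte
        return false;
    }

    // Parse: METHOD URI VERSION
    char* cursor = buf;

    // Method (allocate from pool)
    char* method_end = memchr(cursor, ' ', (size_t)(line_end - cursor));
    if (!method_end || method_end == cursor) {
        send_error(conn, 400, "Bad Request");
        return false;
    }
    conn->method = pool_copy(conn->pool, cursor, (size_t)(method_end - cursor));
    cursor = method_end + 1;

    // URI
    char* uri_end = memchr(cursor, ' ', (size_t)(line_end - cursor));
    if (!uri_end || uri_end == cursor) {
        send_error(conn, 400, "Bad Request");
        return false;
    }
    size_t uri_len = (size_t)(uri_end - cursor);
    if (uri_len > MAX_URI_LENGTH) {
        send_error(conn, 414, "URI Too Long");
        return false;
    }
    conn->uri = pool_copy(conn->pool, cursor, uri_len);
    cursor = uri_end + 1;

    // Version
    conn->version = pool_copy(conn->pool, cursor, (size_t)(line_end - cursor));

    if (!conn->method || !conn->uri || !conn->version) {
        send_error(conn, 507, "Insufficient Storage");  // Pool exhausted
        return false;
    }

    // Validate URI (no null bytes, path traversal, etc.)
    if (!validate_uri(conn->uri)) {
        send_error(conn, 400, "Bad Request");
        return false;
    }

    // Shift buffer: drop the request line and its "\r\n"
    memmove(buf, line_end + 2, conn->read_pos - line_len - 2);
    conn->read_pos -= line_len + 2;

    conn->state = CONN_READING_HEADERS;
    return true;
}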
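
Header lines can be parsed with the same bounded approach as the request line. The sketch below splits one "Name: value" line without allocating, so it can be tested in isolation; HeaderView and parse_header_line are hypothetical names, and the character checks are a simplification of the full HTTP token grammar. In the server, the two spans would then be copied into the pool as an HttpHeader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view type: points into the read buffer, owns nothing. */
typedef struct {
    const char *name;  size_t name_len;
    const char *value; size_t value_len;
} HeaderView;

/* Parses "Name: value" from line[0..len), with the trailing CRLF already
 * removed. Returns false for anything malformed. */
static bool parse_header_line(const char *line, size_t len, HeaderView *out) {
    assert(line != NULL && out != NULL);

    if (memchr(line, '\0', len)) return false;       /* embedded NUL byte */

    const char *colon = memchr(line, ':', len);
    if (!colon || colon == line) return false;       /* missing or empty name */

    for (const char *p = line; p < colon; p++) {
        unsigned char c = (unsigned char)*p;
        if (c <= ' ' || c >= 127) return false;      /* no spaces or control bytes */
    }

    /* Trim optional spaces and tabs around the value. */
    const char *v = colon + 1;
    const char *end = line + len;
    while (v < end && (*v == ' ' || *v == '\t')) v++;
    while (end > v && (end[-1] == ' ' || end[-1] == '\t')) end--;

    for (const char *p = v; p < end; p++) {
        unsigned char c = (unsigned char)*p;
        if ((c < ' ' && c != '\t') || c == 127) return false;
    }

    out->name = line;
    out->name_len = (size_t)(colon - line);
    out->value = v;
    out->value_len = (size_t)(end - v);
    return true;
}
```

Every search is limited by len, so a header such as Host: localhost\x00malicious is rejected on the first check rather than silently truncated.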

Phase 4: File Serving (Day 5-6)

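// Takes the Server explicitly: the docroot lives there, not on the connection.
static void serve_file(Server* server, Connection* conn, const char* path) {
    // Construct full path, rejecting anything that would be truncated
    char full_path[PATH_MAX];
    int n = snprintf(full_path, sizeof(full_path), "%s%s", server->docroot, path);
    if (n < 0 || (size_t)n >= sizeof(full_path)) {
        send_error(conn, 414, "URI Too Long");
        return;
    }

    // Validate path (prevent directory traversal)
    char resolved[PATH_MAX];
    if (!realpath(full_path, resolved)) {
        send_error(conn, 404, "Not Found");
        return;
    }

    // Check it's under docroot. server->docroot must itself be a canonical
    // absolute path, and the match must end at a path separator so that
    // "/srv/www-private" does not pass for a docroot of "/srv/www".
    size_t root_len = strlen(server->docroot);
    if (strncmp(resolved, server->docroot, root_len) != 0 ||
        (resolved[root_len] != '/' && resolved[root_len] != '\0')) {
        send_error(conn, 403, "Forbidden");
        return;
    }

    // Open file
    int fd = open(resolved, O_RDONLY);
    if (fd < 0) {
        send_error(conn, 404, "Not Found");
        return;
    }

    // Get file size; only regular files are served
    struct stat st;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        send_error(conn, 404, "Not Found");
        return;
    }
    size_t file_size = (size_t)st.st_size;

    // Build response header
    char header[512];
    int header_len = snprintf(header, sizeof(header),
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: %s\r\n"
        "Content-Length: %zu\r\n"
        "Connection: close\r\n"
        "\r\n",
        get_content_type(resolved),
        file_size);
    if (header_len < 0 || (size_t)header_len >= sizeof(header)) {
        close(fd);
        send_error(conn, 500, "Internal Server Error");
        return;
    }

    // Allocate response buffer. The whole file must fit in the pool;
    // larger files need to be streamed in chunks instead.
    conn->response_len = (size_t)header_len + file_size;
    conn->response_buffer = arena_alloc(conn->pool, conn->response_len, 1);
    if (!conn->response_buffer) {
        close(fd);
        send_error(conn, 507, "Insufficient Storage");
        return;
    }
    memcpy(conn->response_buffer, header, (size_t)header_len);

    // read() may return fewer bytes than requested, so loop until done
    size_t got = 0;
    while (got < file_size) {
        ssize_t r = read(fd, conn->response_buffer + header_len + got,
                         file_size - got);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) break;
        got += (size_t)r;
    }
    close(fd);
    if (got != file_size) {
        send_error(conn, 500, "Internal Server Error");
        return;
    }

    conn->response_sent = 0;
    conn->state = CONN_SENDING_RESPONSE;
}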
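
serve_file() only prepares the response; the bytes still have to be written, and a non-blocking socket may accept just part of them. A handle_write() implementation could be built around a helper like the sketch below. send_some and SendResult are hypothetical names, and response_sent from the Connection struct plays the role of the sent counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

typedef enum { SEND_DONE, SEND_AGAIN, SEND_ERROR } SendResult;

/* Writes as much of buf[*sent..len) as the descriptor accepts right now
 * and advances *sent by the number of bytes written. */
static SendResult send_some(int fd, const char *buf, size_t len, size_t *sent) {
    assert(buf != NULL && sent != NULL && *sent <= len);

    while (*sent < len) {
        ssize_t n = write(fd, buf + *sent, len - *sent);
        if (n > 0) {
            *sent += (size_t)n;  /* partial write: keep going from the new offset */
            continue;
        }
        if (n < 0 && errno == EINTR) continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return SEND_AGAIN;   /* socket buffer full: wait for writability */
        }
        return SEND_ERROR;
    }
    return SEND_DONE;
}
```

SEND_DONE maps to the SENDING_RESPONSE → CLOSING transition, SEND_AGAIN keeps the connection in SENDING_RESPONSE until the next wakeup, and SEND_ERROR closes it. Writing to a socket the peer has already closed raises SIGPIPE by default, so a server using this pattern would normally ignore that signal at startup.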

Phase 5: Cleanup and Statistics (Day 6-7)

static void close_connection(Server* server, int slot) {
    Connection* conn = server->connections[slot];
    if (!conn) return;

    close(conn->fd);
    arena_reset(conn->pool);  // O(1) cleanup!

    server->stats.total_connections++;
    server->stats.active_connections--;

    free(conn);
    server->connections[slot] = NULL;
}

void server_print_stats(Server* server) {
    printf("========== Server Statistics ==========\n");
    printf("Total requests: %lu\n", server->stats.total_requests);
    printf("Active connections: %d\n", server->stats.active_connections);
    printf("Peak connections: %d\n", server->stats.peak_connections);
    printf("Bytes received: %lu\n", server->stats.bytes_received);
    printf("Bytes sent: %lu\n", server->stats.bytes_sent);
    printf("Pool resets: %lu\n", server->stats.pool_resets);
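    // Guard against division by zero before any request has been served
    printf("Average pool usage: %.1f bytes\n",
           server->stats.total_requests
               ? (double)server->stats.total_pool_usage / server->stats.total_requests
               : 0.0);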
    printf("Invariant violations: %d\n", server->stats.invariant_violations);
    printf("========================================\n");
}

Testing Strategy

Functional Testing

# Basic request
curl -v http://localhost:8080/index.html

# Concurrent connections
ab -n 10000 -c 100 http://localhost:8080/test.html

# Large file
curl http://localhost:8080/large_file.bin -o /dev/null

Fuzzing

# Send malformed requests
./fuzzer --target localhost:8080 --malformed-requests 1000

# Expected: All rejected gracefully, no crashes, no leaks

Memory Verification

$ valgrind --leak-check=full ./httpserver --port 8080 ./www
# Handle 1000 requests, then shutdown
==12345== All heap blocks were freed -- no leaks are possible

Common Pitfalls

Pitfall 1: Partial Reads

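// WRONG: Assuming read() returns complete request
int n = read(conn->fd, buffer, sizeof(buffer));
// May return partial data!

// CORRECT: Accumulate in buffer, check for complete message.
// Reserve one byte so the buffer can be NUL-terminated for strstr().
size_t space = sizeof(conn->read_buffer) - conn->read_pos - 1;
if (space == 0) {
    send_error(conn, 431, "Request Header Fields Too Large");
    return;
}
ssize_t got = read(conn->fd, conn->read_buffer + conn->read_pos, space);
if (got <= 0) {
    // 0 = peer closed; <0 = error (retry later on EAGAIN/EINTR, else close).
    // Never add a negative result to read_pos.
    return;
}
conn->read_pos += (size_t)got;
conn->read_buffer[conn->read_pos] = '\0';

// Check if we have a complete request line
if (strstr(conn->read_buffer, "\r\n")) {
    parse_request_line(conn);
}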

Pitfall 2: Buffer Overflow in URI

// WRONG: No length check
strcpy(path, uri);  // Buffer overflow!

// CORRECT: Bounds checking
if (strlen(uri) >= sizeof(path)) {
    send_error(conn, 414, "URI Too Long");
    return;
}
strncpy(path, uri, sizeof(path) - 1);
path[sizeof(path) - 1] = '\0';
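
Length checks protect the buffer, but the URI contents also need validating before they reach the filesystem. The sketch below shows one possible set of checks behind the validate_uri() call in Phase 3; validate_uri_n is a hypothetical length-aware variant, and the rules (origin-form only, printable ASCII, no ".." segments) are assumptions about what this server should accept. It does not decode percent-escapes, so the realpath() check in serve_file() remains the final safeguard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if uri[0..len) looks like a safe origin-form path. */
static bool validate_uri_n(const char *uri, size_t len) {
    assert(uri != NULL);

    if (len == 0 || uri[0] != '/') return false;  /* must start with '/' */

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)uri[i];
        if (c <= ' ' || c >= 127) return false;   /* NUL, controls, space, non-ASCII */
        if (c == '\\') return false;              /* no backslash separators */
    }

    /* Reject any ".." path segment; stop scanning at the query string. */
    size_t seg = 1;  /* index where the current segment starts */
    for (size_t i = 1; i <= len; i++) {
        if (i == len || uri[i] == '/' || uri[i] == '?') {
            if (i - seg == 2 && uri[seg] == '.' && uri[seg + 1] == '.') {
                return false;
            }
            if (i < len && uri[i] == '?') break;
            seg = i + 1;
        }
    }
    return true;
}
```

Checking whole segments rather than searching for the substring ".." keeps legitimate names such as /notes..txt working while still rejecting /../etc/passwd.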

Pitfall 3: Pool Exhaustion

// WRONG: Not checking allocation
char* data = arena_alloc(conn->pool, size, 1);
memcpy(data, src, size);  // Crash if data is NULL!

// CORRECT: Check for pool exhaustion
char* data = arena_alloc(conn->pool, size, 1);
if (!data) {
    send_error(conn, 507, "Insufficient Storage");
    return;
}

Interview Preparation

Common Questions

“How does select() work for handling multiple connections?”
- Single-threaded event loop
- select() blocks until any fd is ready
- Check which fds are ready, handle them
- Repeat

“Why use memory pools instead of malloc?”
- Zero malloc overhead per request
- All request data has same lifetime
- O(1) cleanup with pool_reset()
- Prevents fragmentation

“How do you prevent buffer overflows when parsing HTTP?”
- Check all lengths before copying
- Validate Content-Length limits
- Reject requests that exceed buffer sizes
- Use safe string functions

“What happens if a client sends data very slowly?”
- Timeout mechanism
- Close connections that are idle too long
- Limit partial request duration
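
The answer to the slow-client question can be sketched with the two timestamps already stored in the Connection struct. The example uses a trimmed-down stand-in struct so it compiles on its own; ConnTiming, connection_expired, and the separate total-lifetime limit are assumptions, while the 30-second idle limit comes from the attack-vector table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for the timing fields of Connection. */
typedef struct {
    time_t connected_at;   /* when the connection was accepted */
    time_t last_activity;  /* last successful read or write */
} ConnTiming;

/* True when the connection has been idle for idle_limit seconds, or has
 * existed for total_limit seconds overall. The second rule stops clients
 * that send one byte just often enough to reset the idle timer. */
static bool connection_expired(const ConnTiming *t, time_t now,
                               int idle_limit, int total_limit) {
    assert(t != NULL);
    if (difftime(now, t->last_activity) >= idle_limit) return true;
    if (difftime(now, t->connected_at) >= total_limit) return true;
    return false;
}

/* Counts expired entries; a real check_timeouts() would close them. */
static size_t count_expired(const ConnTiming *conns, size_t n, time_t now,
                            int idle_limit, int total_limit) {
    size_t expired = 0;
    for (size_t i = 0; i < n; i++) {
        if (connection_expired(&conns[i], now, idle_limit, total_limit)) {
            expired++;
        }
    }
    return expired;
}
```

Because the event loop wakes at least once per second (the select() timeout), running a check like this on every iteration bounds how long an expired connection can linger.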

Self-Assessment Checklist

Functionality

Serves static files correctly
Handles 100+ concurrent connections
Timeouts work correctly
Graceful shutdown releases resources

Security

Rejects oversized requests
Prevents directory traversal
No buffer overflows (survives fuzzing)
Validates all input

Memory

Zero heap allocations during request handling
Valgrind clean after extended use
Pool usage stays bounded
No leaks on connection close

Invariants

State machine transitions correct
Invariant checker passes continuously
Statistics are accurate

Summary

The HTTP Server capstone proves you can build production-quality systems software:

Invariants everywhere: Connection state, buffer validity, pool ownership
Defensive parsing: Untrusted input, bounds checking, validation
Pool allocation: Zero malloc per request, O(1) cleanup
Event-driven architecture: Non-blocking I/O, state machines
Security awareness: Fuzzing survival, attack resistance

When you can build a server that handles thousands of requests, survives fuzzing, never leaks memory, and clearly documents its invariants, you’ve proven mastery of Sprint 2’s core concepts.

Congratulations on completing Sprint 2: Data & Invariants!

You’ve gone from “hoping code works” to “proving code is correct.” You understand ownership, invariants, and defensive design at a professional level. This is the discipline that separates production-quality systems code from fragile prototypes.