Project 5: libhttp-lite (Integrated Boundary System)

Build a minimal HTTP server/client C library with stable APIs, strict parsing, and integrated logging + JSON parsing.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	3-4 weeks
Main Programming Language	C
Alternative Programming Languages	Rust, Zig
Coolness Level	Level 8 - end-to-end integration
Business Potential	Level 7 - embeddable HTTP stack
Prerequisites	Sockets, parsing, ABI design, logging
Key Topics	HTTP parsing, API stability, integration

1. Learning Objectives

By completing this project, you will:

Design a stable C API for HTTP server and client components.
Implement strict HTTP/1.1 parsing with size and timeout limits.
Integrate logging and JSON parsing without global state.
Build a middleware pipeline with explicit ownership rules.
Validate ABI stability and error models across modules.

2. All Theory Needed (Per-Concept Breakdown)

2.1 HTTP Parsing & Stream Boundaries

Fundamentals

HTTP/1.1 is a text-based protocol built on TCP, which is a byte stream. That means you must parse requests incrementally: read bytes, detect request line and headers, then read the body based on Content-Length or Transfer-Encoding. A correct parser must handle partial reads, strict header syntax, and size limits. Unlike JSON, HTTP parsing is stateful and streaming by nature. The boundary here is the protocol: you must accept valid requests and reject malformed or oversized ones without blocking or crashing.

Deep Dive into the Concept

HTTP parsing starts with the request line: METHOD SP PATH SP VERSION CRLF. You must accept known methods and verify the version (e.g., HTTP/1.1). Then come headers: Key: Value lines terminated by CRLF, followed by an empty line (CRLF CRLF). The tricky part is that headers can be split across TCP reads. Your parser must buffer input and only proceed when a full line is available. This is why a streaming parser with a state machine is essential. A naive parser that expects the entire request in one recv call will fail under real network conditions.
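As a rough illustration of the request-line step, the sketch below splits one complete line (CRLF already removed) into method and path and checks the version. The names `req_line` and `parse_request_line` are hypothetical, and the choice to accept only GET and POST in origin-form is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view into the caller's buffer; nothing is copied. */
typedef struct {
    const char *method; size_t method_len;
    const char *path;   size_t path_len;
} req_line;

/* Parse "METHOD SP PATH SP HTTP/1.1" (CRLF already stripped).
 * Returns 0 on success, -1 on any syntax violation. */
static int parse_request_line(const char *line, size_t len, req_line *out)
{
    assert(line != NULL && out != NULL);
    const char *end = line + len;
    const char *sp1 = memchr(line, ' ', len);
    if (sp1 == NULL || sp1 == line) return -1;            /* empty method */
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(end - (sp1 + 1)));
    if (sp2 == NULL || sp2 == sp1 + 1) return -1;         /* empty path */

    /* Strict: exactly "HTTP/1.1" and nothing after it. */
    static const char ver[] = "HTTP/1.1";
    size_t ver_len = (size_t)(end - (sp2 + 1));
    if (ver_len != sizeof ver - 1 || memcmp(sp2 + 1, ver, ver_len) != 0)
        return -1;

    out->method = line;
    out->method_len = (size_t)(sp1 - line);
    out->path = sp1 + 1;
    out->path_len = (size_t)(sp2 - (sp1 + 1));

    /* Assumption: this sketch only knows GET and POST. */
    int is_get  = out->method_len == 3 && memcmp(line, "GET", 3) == 0;
    int is_post = out->method_len == 4 && memcmp(line, "POST", 4) == 0;
    if (!is_get && !is_post) return -1;
    if (out->path[0] != '/') return -1;                   /* origin-form only */
    return 0;
}
```

The function only runs once a full line is buffered, so it never has to deal with partial input; that job belongs to the streaming layer around it.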

The Content-Length header defines how many bytes of body to read. You must validate this value, enforce a maximum (e.g., 1MB), and then read exactly that many bytes. If the client sends fewer bytes or disconnects, treat it as a protocol error. If the client sends more, you must not consume bytes that belong to the next request (for keep-alive connections). If you don’t plan to support keep-alive, you can close the connection after one request; this simplifies boundaries and is acceptable for a minimal library. Document this explicitly.
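One way to validate the Content-Length value is sketched below. The function name `parse_content_length` and its return codes are illustrative assumptions; the 1MB cap mirrors the example limit mentioned above. The value is checked digit by digit so that signs, spaces, and overflow are all rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BODY (1024u * 1024u)   /* 1 MB cap, matching the example limit */

/* Strictly parse a Content-Length value: digits only, no sign, no spaces.
 * Returns 0 and stores the length, -1 if malformed, -2 if above MAX_BODY. */
static int parse_content_length(const char *s, size_t len, size_t *out)
{
    assert(s != NULL && out != NULL);
    if (len == 0) return -1;
    uint64_t v = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') return -1;
        v = v * 10 + (uint64_t)(s[i] - '0');
        if (v > MAX_BODY) return -2;   /* checked each step, so v cannot overflow */
    }
    *out = (size_t)v;
    return 0;
}
```

Distinguishing "malformed" from "too large" lets the caller map the first to 400 and the second to 413.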

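Chunked transfer encoding is a more advanced feature. For a minimal library, you can reject Transfer-Encoding: chunked with 501 Not Implemented (the transfer coding is unsupported) or 411 Length Required (the server insists on Content-Length). That is still valid behavior if documented. Reject requests that carry both Transfer-Encoding and Content-Length as well, because ambiguous framing is a known request-smuggling vector. The boundary contract should be explicit: “This library supports only Content-Length bodies.”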

Parsing must also handle header limits. Attackers can send huge header lines to exhaust memory. Define a max header size (e.g., 8KB) and reject requests exceeding it. Track total bytes read for headers and body separately. If the header limit is exceeded, return 431 Request Header Fields Too Large; if the body limit is exceeded, return 413 Payload Too Large.

Finally, consider response parsing for the client side. The same framing rules apply: status line, headers, body. A client library can be simpler if it only supports responses with Content-Length and no chunking. As long as you document limitations, this is acceptable for a learning project. The key is deterministic behavior and explicit errors, not full RFC completeness.
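For the client side, a status-line check might look like the sketch below. The name `parse_status_line` is hypothetical, and accepting only HTTP/1.1 responses is an assumption in keeping with the documented limitations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse "HTTP/1.1 SP 3DIGIT SP reason-phrase" (CRLF already stripped).
 * Returns the status code (100-599), or -1 if the line is malformed. */
static int parse_status_line(const char *line, size_t len)
{
    static const char prefix[] = "HTTP/1.1 ";
    const size_t plen = sizeof prefix - 1;
    assert(line != NULL);

    /* Need the prefix, three digits, and the space before the reason phrase. */
    if (len < plen + 4 || memcmp(line, prefix, plen) != 0) return -1;
    int code = 0;
    for (size_t i = plen; i < plen + 3; i++) {
        if (line[i] < '0' || line[i] > '9') return -1;
        code = code * 10 + (line[i] - '0');
    }
    if (line[plen + 3] != ' ') return -1;   /* the reason phrase itself may be empty */
    if (code < 100 || code > 599) return -1;
    return code;
}
```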

How This Fits in This Project

Your HTTP parser and limits define the core of libhttp-lite in Sec. 3.5 and Sec. 4.4. You will implement the parsing state machine in Sec. 5.10 Phase 2 and test error responses in Sec. 6.2. This directly reuses concepts from Project 3 (parsing) and Project 2 (TCP framing). Also used in: Project 2, Project 3.

Definitions & Key Terms

Request line -> METHOD SP PATH SP VERSION.
Headers -> Key/value metadata lines terminated by CRLF.
Content-Length -> Body size in bytes.
State machine -> Parser that transitions through known states.

Mental Model Diagram (ASCII)

TCP stream -> [state: REQ_LINE] -> [state: HEADERS] -> [state: BODY]
                   |                         |               |
                   v                         v               v
                parse line             parse header      read N bytes

How It Works (Step-by-Step)

Read bytes until CRLF -> parse request line.
Read header lines until CRLF CRLF.
Validate headers and compute body length.
Read exact body bytes (if any).
Dispatch handler; build response.
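The first four steps can be sketched as a small state machine that accepts bytes in arbitrary chunks. Everything here is an assumption for illustration: the names (`parser`, `parser_feed`, `P_NEED_MORE`), the limits, and the simplifications (request-line validation is omitted, body bytes are counted rather than stored, a repeated Content-Length simply overwrites the earlier one, and Transfer-Encoding is not examined).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { MAX_HEADER_BYTES = 8192, MAX_LINE = 1024, MAX_BODY = 1024 * 1024 };

typedef enum { ST_REQ_LINE, ST_HEADERS, ST_BODY, ST_DONE, ST_ERROR } parse_state;
typedef enum { P_NEED_MORE, P_DONE, P_ERR_SYNTAX,
               P_ERR_HEADERS_TOO_LARGE, P_ERR_BODY_TOO_LARGE } parse_result;

typedef struct {
    parse_state state;
    char   line[MAX_LINE];  /* current, possibly partial, line */
    size_t line_len;
    size_t header_bytes;    /* request line + header bytes seen so far */
    size_t content_length;
    size_t body_read;
} parser;

static void parser_init(parser *p) { memset(p, 0, sizeof *p); }

static parse_result fail(parser *p, parse_result r) { p->state = ST_ERROR; return r; }

/* Called with one complete line (CRLF removed) in p->line. */
static parse_result on_line(parser *p)
{
    static const char key[] = "content-length:";
    const size_t klen = sizeof key - 1;

    if (p->state == ST_REQ_LINE) {          /* request-line checks omitted here */
        if (p->line_len == 0) return fail(p, P_ERR_SYNTAX);
        p->state = ST_HEADERS;
        return P_NEED_MORE;
    }
    if (p->line_len == 0) {                 /* blank line ends the headers */
        p->state = p->content_length > 0 ? ST_BODY : ST_DONE;
        return p->state == ST_DONE ? P_DONE : P_NEED_MORE;
    }
    const char *colon = memchr(p->line, ':', p->line_len);
    if (colon == NULL || colon == p->line) return fail(p, P_ERR_SYNTAX);

    size_t i = 0;
    while (i < klen && i < p->line_len &&
           tolower((unsigned char)p->line[i]) == key[i]) i++;
    if (i < klen) return P_NEED_MORE;       /* some other header: ignored */

    const char *s = p->line + klen, *e = p->line + p->line_len;
    while (s < e && (*s == ' ' || *s == '\t')) s++;
    if (s == e) return fail(p, P_ERR_SYNTAX);
    size_t v = 0;
    for (; s < e; s++) {
        if (*s < '0' || *s > '9') return fail(p, P_ERR_SYNTAX);
        v = v * 10 + (size_t)(*s - '0');
        if (v > MAX_BODY) return fail(p, P_ERR_BODY_TOO_LARGE);
    }
    p->content_length = v;
    return P_NEED_MORE;
}

/* Feed any number of bytes; call again with the next chunk on P_NEED_MORE. */
static parse_result parser_feed(parser *p, const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        switch (p->state) {
        case ST_REQ_LINE:
        case ST_HEADERS:
            if (++p->header_bytes > MAX_HEADER_BYTES)
                return fail(p, P_ERR_HEADERS_TOO_LARGE);
            if (data[i] == '\n') {
                if (p->line_len == 0 || p->line[p->line_len - 1] != '\r')
                    return fail(p, P_ERR_SYNTAX);        /* bare LF */
                p->line_len--;                            /* drop the CR */
                parse_result r = on_line(p);
                p->line_len = 0;
                if (r != P_NEED_MORE) return r;
            } else {
                if (p->line_len == MAX_LINE)
                    return fail(p, P_ERR_HEADERS_TOO_LARGE);
                p->line[p->line_len++] = data[i];
            }
            break;
        case ST_BODY:                                     /* body bytes are only counted */
            if (++p->body_read == p->content_length) {
                p->state = ST_DONE;
                return P_DONE;
            }
            break;
        case ST_DONE:
            return P_DONE;             /* leftover bytes belong to the next request */
        case ST_ERROR:
            return P_ERR_SYNTAX;
        }
    }
    if (p->state == ST_DONE) return P_DONE;
    return p->state == ST_ERROR ? P_ERR_SYNTAX : P_NEED_MORE;
}
```

Because all progress lives in the `parser` struct, the result is the same whether the request arrives in one read or one byte at a time. A fuller version would also report how many input bytes were consumed so that keep-alive connections can hand the remainder to the next request.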

Minimal Concrete Example

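if (state == HEADERS && line_is_empty()) {
    /* Blank line: headers are complete, so the body length is now known. */
    if (content_length > MAX_BODY) return ERR_TOO_LARGE;
    /* Skip BODY when there is nothing to read (DONE is an assumed terminal state). */
    state = (content_length > 0) ? BODY : DONE;
}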

Common Misconceptions

“recv returns a whole request.” -> It returns whatever bytes are available: part of a request, or the end of one plus the start of the next.
“Chunked bodies are optional details.” -> If unsupported, you must reject clearly.
“Headers are small.” -> Attackers can send huge headers.

Check-Your-Understanding Questions

Why must HTTP parsing be incremental?
What is the risk of ignoring Content-Length?
Why should you enforce header size limits?

Check-Your-Understanding Answers

TCP is a stream; requests can be split across reads.
You can misread body boundaries and corrupt the stream.
To prevent memory exhaustion attacks.

Real-World Applications

Embedded HTTP servers in devices.
Internal service gateways.

Where You’ll Apply It

In this project: Sec. 4.4 (parser), Sec. 5.10 Phase 2 (state machine).
Also used in: Project 2, Project 3.

References

RFC 7230 (HTTP/1.1 Message Syntax and Routing; obsoleted by RFC 9112)
“The Linux Programming Interface” - socket chapters

Key Insight

HTTP parsing is a streaming boundary: correctness depends on incremental state, not single reads.

Summary

A minimal HTTP stack still needs strict parsing, size limits, and deterministic state transitions. These boundaries make the server safe and predictable.

Homework/Exercises to Practice the Concept

Implement a parser for request line and headers only.
Add a body reader based on Content-Length.
Reject oversized headers with 431.

Solutions to the Homework/Exercises

Use a buffer and scan for CRLF.
Track content_length and read exactly N bytes.
If header bytes > limit, return error and close.

2.2 API/ABI Stability with Opaque Handles

Fundamentals

A stable C API hides implementation details behind opaque handles and exposes functions instead of structs. ABI stability requires that public types and function signatures remain compatible across versions. If you expose struct http_request in a header, any internal change breaks ABI. By using opaque types and accessor functions, you preserve binary compatibility and can evolve internals safely. This is the same boundary discipline you applied in Projects 1 and 2.

Deep Dive into the Concept

For a reusable HTTP library, consumers might link against a shared object. If you change the size or layout of a public struct, binaries built against the old header will still load and run, but they can crash or misbehave at runtime because they assume the old layout. Opaque handles prevent this by making the struct incomplete in the header:

typedef struct http_server http_server;

The implementation file defines the actual struct. Callers only receive pointers and use accessor functions. This decouples user code from internal representation and gives you freedom to add fields, change layouts, or swap implementations.
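A minimal sketch of that split, shown in one file for brevity. The accessor `http_server_port` and the `max_body` field are hypothetical; in a real layout the first part would live in the public header and the second in a .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what the public header would expose ---- */
typedef struct http_server http_server;            /* incomplete type */
http_server *http_server_create(int port);
void http_server_destroy(http_server *srv);
int http_server_port(const http_server *srv);      /* hypothetical accessor */

/* ---- what the implementation file would contain ---- */
struct http_server {
    int    port;
    size_t max_body;   /* fields can be added here without recompiling callers */
};

http_server *http_server_create(int port)
{
    if (port < 1 || port > 65535) return NULL;
    http_server *srv = calloc(1, sizeof *srv);
    if (srv == NULL) return NULL;
    srv->port = port;
    srv->max_body = 1024 * 1024;
    return srv;
}

/* Safe to call with NULL, like free(). */
void http_server_destroy(http_server *srv) { free(srv); }

int http_server_port(const http_server *srv)
{
    assert(srv != NULL);
    return srv->port;
}
```

Callers never see `sizeof(struct http_server)`, so they cannot allocate one on the stack or depend on its layout.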

Versioning is also important. You should expose HTTP_API_VERSION and optionally http_version() so callers can verify compatibility at compile time or at run time. If you add new functions, keep the old ones intact, and do not change the behavior of functions that have already shipped. If you need to add fields to a public struct (for example, a public http_response), you can version it with a size field similar to the plugin ABI in Project 1. This is a consistent boundary strategy across projects.
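The size-field idea can be sketched as follows. The struct `http_server_options`, its fields, the version string, and `http_options_normalize` are all hypothetical names chosen for illustration; only HTTP_API_VERSION and http_version() come from the text above, and their exact types are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HTTP_API_VERSION "1.1.0"    /* assumed format: compiled into the caller */

/* Reports the version the library itself was built with. */
static const char *http_version(void) { return HTTP_API_VERSION; }

typedef struct {
    size_t struct_size;   /* caller sets this to sizeof(http_server_options) */
    int    port;          /* present since v1 */
    size_t max_body;      /* present since v1 */
    size_t max_header;    /* added in v2 */
} http_server_options;

/* Size of the struct as v1 callers knew it. */
#define HTTP_OPTIONS_V1_SIZE offsetof(http_server_options, max_header)

/* Copy only the fields the caller actually provided and default the rest.
 * Returns 0 on success, -1 if the struct is too small to be any known version. */
static int http_options_normalize(const http_server_options *in,
                                  http_server_options *out)
{
    if (in == NULL || out == NULL || in->struct_size < HTTP_OPTIONS_V1_SIZE)
        return -1;
    memset(out, 0, sizeof *out);
    size_t n = in->struct_size < sizeof *out ? in->struct_size : sizeof *out;
    memcpy(out, in, n);
    if (in->struct_size < sizeof *out)
        out->max_header = 8192;          /* default for callers built against v1 */
    out->struct_size = sizeof *out;
    return 0;
}
```

An application can compare `http_version()` with the `HTTP_API_VERSION` it was compiled against at startup to detect a mismatched shared library.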

Error models must be stable as well. If you return an enum, you should never change the meaning of existing values. Add new ones at the end. For example:

typedef enum { HTTP_OK=0, HTTP_ERR_PARSE=-1, HTTP_ERR_IO=-2 } http_status;

If you later add HTTP_ERR_TOO_LARGE, it should not reuse an existing value. This protects callers who may compare numeric values.
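One possible way to keep released values from drifting is to pin them with compile-time assertions, as in this sketch. The value -3 for HTTP_ERR_TOO_LARGE, the helper `http_status_str`, and its message strings are assumptions; `_Static_assert` requires C11.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    HTTP_OK            =  0,
    HTTP_ERR_PARSE     = -1,
    HTTP_ERR_IO        = -2,
    HTTP_ERR_TOO_LARGE = -3   /* appended later; earlier values untouched */
} http_status;

/* Pin the released values so an accidental renumbering fails to compile. */
_Static_assert(HTTP_OK == 0,         "HTTP_OK is part of the ABI");
_Static_assert(HTTP_ERR_PARSE == -1, "HTTP_ERR_PARSE is part of the ABI");
_Static_assert(HTTP_ERR_IO == -2,    "HTTP_ERR_IO is part of the ABI");

/* Hypothetical helper: stable text for each code, with a fallback so that
 * callers built against a newer header still get a sensible string. */
static const char *http_status_str(http_status s)
{
    switch (s) {
    case HTTP_OK:            return "ok";
    case HTTP_ERR_PARSE:     return "parse error";
    case HTTP_ERR_IO:        return "I/O error";
    case HTTP_ERR_TOO_LARGE: return "request too large";
    }
    return "unknown error";
}
```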

Ownership boundaries are also part of API stability. For example, if http_request_body() returns a pointer, document whether it is borrowed (valid only during handler call) or owned (caller must free). A safe and common rule is: request bodies are owned by the library and valid only during the handler call; response bodies are owned by the caller and copied by the library (or the library takes ownership via a flag). These rules must be explicit and stable.
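The "copied by the library" rule for response bodies might be implemented along these lines. The struct layout and the function names `http_response_set_body`, `http_response_body`, and `http_response_clear` are hypothetical; the struct is shown in full only so the sketch is self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the library's private response struct (hypothetical fields). */
typedef struct http_response {
    char  *body;       /* owned by the library */
    size_t body_len;
} http_response;

/* Copies len bytes from the caller's buffer. The caller keeps ownership of
 * data and may free or reuse it as soon as this returns.
 * Returns 0 on success, -1 on allocation failure (the old body is kept). */
static int http_response_set_body(http_response *res, const void *data, size_t len)
{
    assert(res != NULL && (data != NULL || len == 0));
    char *copy = NULL;
    if (len > 0) {
        copy = malloc(len);
        if (copy == NULL) return -1;
        memcpy(copy, data, len);
    }
    free(res->body);               /* replace any previous body */
    res->body = copy;
    res->body_len = len;
    return 0;
}

/* Borrowed view: valid until the next set_body or clear call. */
static const char *http_response_body(const http_response *res, size_t *len)
{
    if (len != NULL) *len = res->body_len;
    return res->body;
}

static void http_response_clear(http_response *res)
{
    free(res->body);
    res->body = NULL;
    res->body_len = 0;
}
```

Copying costs one allocation per response, but it means a handler can build the body in a stack buffer without worrying about lifetime.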

Finally, consider function naming and namespacing. Prefix all public symbols with http_ to avoid collisions. This is part of ABI hygiene and reduces the risk of conflicts in large projects.

How This Fits in This Project

Opaque handles and versioning shape your public headers (Sec. 3.5) and the stable API design in Sec. 4.2. This mirrors Project 1’s plugin ABI and Project 2’s client handle. Also used in: Project 1, Project 2.

Definitions & Key Terms

Opaque handle -> Public type without exposed fields.
ABI stability -> Binary compatibility across versions.
Accessor function -> Getter used instead of direct struct access.
Namespace prefix -> Common prefix for exported symbols.

Mental Model Diagram (ASCII)

public header: typedef struct http_server http_server;
implementation: struct http_server { ... } // hidden

How It Works (Step-by-Step)

Define opaque types in header.
Implement structs in .c files.
Expose accessors (getters/setters).
Keep enums stable across versions.
Add new functions without breaking old ones.

Minimal Concrete Example

http_server* http_server_create(int port);
void http_server_destroy(http_server* srv);

Common Misconceptions

“Exposing structs is easier.” -> It breaks ABI when internals change.
“Versioning is only for plugins.” -> Shared libraries need it too.
“Return errno directly.” -> It leaks unstable system details into API.

Check-Your-Understanding Questions

Why should http_request be opaque?
How does a size field help with versioned structs?
Why is stable enum ordering important?

Check-Your-Understanding Answers

To prevent ABI breakage and hide internals.
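The library compares the caller's size field with its own struct size, so it knows which fields an older caller actually provided and can default the rest.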
Callers may compare numeric values; changing them breaks behavior.

Real-World Applications

libcurl and libuv APIs.
Plugin systems and shared libraries.

Where You’ll Apply It

In this project: Sec. 3.5 (public API), Sec. 5.11 (decisions).
Also used in: Project 1, Project 2.

References

“C Interfaces and Implementations” - Ch. 1-2
“Expert C Programming” - Ch. 5

Key Insight

Opaque handles are the cheapest insurance policy for ABI stability.

Summary

Stable APIs avoid exposing struct layouts and use explicit versioning. This keeps your HTTP library safe to evolve over time.

Homework/Exercises to Practice the Concept

Convert a public struct API to an opaque handle API.
Add an enum error code and ensure old values remain unchanged.
Add a version function and expose it in the header.

Solutions to the Homework/Exercises

Move struct definition to .c and provide accessors.
Append new enum values without reordering.
Implement http_version() returning a constant string.

2.3 Integration Boundaries: Logging, JSON, Middleware

Fundamentals

libhttp-lite integrates multiple boundaries: HTTP parsing, JSON parsing, and logging. The library must connect these pieces without leaking ownership or global state. Middleware is a plugin-like pipeline where each layer can inspect or modify requests and responses. This requires clear contracts: who owns request bodies, how errors propagate, and how logs are emitted. Integration is where subtle boundary bugs appear.

Deep Dive into the Concept

Integration boundaries are about composition. Your HTTP server reads a request, parses it, and passes it to a handler. That handler might parse a JSON body, log an event, and return a response. Each step has its own ownership rules. For example, the HTTP parser might allocate a buffer for the request body. You must decide: does the handler own that buffer? A safe rule is: the library owns the request body and it is valid only during the handler callback. If the handler needs to keep it, it must copy. This avoids leaks and simplifies cleanup.

Logging integration should be explicit. Do not use global loggers; instead, allow the user to set a logger handle on the server with http_server_set_logger. The server can then log structured events (request method, path, status code, latency). This mirrors the logging library’s API and preserves boundaries. If no logger is set, logging is a no-op.
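A sketch of logger injection under stated assumptions: the `http_logger` handle (a callback plus context pointer), the helper `http_log_request`, and the log line format are all hypothetical, and the server struct is reduced to the one field that matters here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical logger handle: a sink callback plus caller-owned context. */
typedef struct {
    void (*write)(void *ctx, const char *line);
    void *ctx;
} http_logger;

/* Stand-in for the private server struct; only the logger field matters here. */
typedef struct http_server {
    const http_logger *logger;   /* borrowed; NULL means logging is disabled */
} http_server;

static void http_server_set_logger(http_server *srv, const http_logger *logger)
{
    assert(srv != NULL);
    srv->logger = logger;
}

/* Emit one line per request, or do nothing if no logger is set. */
static void http_log_request(const http_server *srv, const char *method,
                             const char *path, int status, long latency_ms)
{
    if (srv->logger == NULL || srv->logger->write == NULL) return;
    char line[256];
    snprintf(line, sizeof line, "%s %s %d %ldms", method, path, status, latency_ms);
    srv->logger->write(srv->logger->ctx, line);
}

/* Example sink: copies the line into a caller-provided 256-byte buffer. */
static void capture_sink(void *ctx, const char *line)
{
    snprintf((char *)ctx, 256, "%s", line);
}
```

Because the sink and its context travel with the server handle, two servers in one process can log to different destinations, and tests can capture output without touching global state.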

JSON parsing integration is another boundary. If a route expects JSON, the handler can call your JSON parser and receive a json_value*. Ownership should be clear: the handler owns the JSON tree and must free it. Errors from the JSON parser must translate into HTTP errors (e.g., 400 Bad Request with a JSON error body). This means your library should provide helper functions to build error responses with a consistent JSON shape.

Middleware is essentially a plugin pipeline. Each middleware function can inspect the request and either continue or short-circuit with a response. Define a simple interface:

typedef http_status (*http_middleware_fn)(http_request*, http_response*);

The contract should specify that middleware runs in order, and if any middleware returns a non-OK status, the pipeline stops and the response is sent. This is similar to plugin lifecycle in Project 1. You should also define whether middleware can modify headers or body and what happens if multiple middleware set the same header.
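The ordering and short-circuit rules could be implemented roughly as below. The `mw_chain` type, the fixed capacity of 8, the example middleware and handler, and the stand-in request/response fields are assumptions for illustration; HTTP_ERR_TOO_LARGE = -3 is likewise an assumed value.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { HTTP_OK = 0, HTTP_ERR_PARSE = -1, HTTP_ERR_IO = -2,
               HTTP_ERR_TOO_LARGE = -3 } http_status;

/* Stand-ins for the opaque types; a real header would hide these fields. */
typedef struct http_request  { const char *path; size_t content_length; } http_request;
typedef struct http_response { int status_code; } http_response;

typedef http_status (*http_middleware_fn)(http_request *, http_response *);

enum { MAX_MIDDLEWARE = 8 };
typedef struct {
    http_middleware_fn fns[MAX_MIDDLEWARE];
    size_t count;
} mw_chain;

static int mw_chain_add(mw_chain *c, http_middleware_fn fn)
{
    if (fn == NULL || c->count == MAX_MIDDLEWARE) return -1;
    c->fns[c->count++] = fn;
    return 0;
}

/* Run middleware in registration order; stop at the first non-OK status.
 * The handler runs only if every middleware returned HTTP_OK. */
static http_status mw_chain_run(const mw_chain *c, http_middleware_fn handler,
                                http_request *req, http_response *res)
{
    assert(handler != NULL);
    for (size_t i = 0; i < c->count; i++) {
        http_status s = c->fns[i](req, res);
        if (s != HTTP_OK) return s;          /* short-circuit */
    }
    return handler(req, res);
}

/* Example middleware: reject bodies above 1 KB with 413. */
static http_status limit_body_mw(http_request *req, http_response *res)
{
    if (req->content_length > 1024) {
        res->status_code = 413;
        return HTTP_ERR_TOO_LARGE;
    }
    return HTTP_OK;
}

/* Example handler: always succeeds. */
static http_status echo_handler(http_request *req, http_response *res)
{
    (void)req;
    res->status_code = 200;
    return HTTP_OK;
}
```

The middleware that short-circuits is responsible for filling in the response; the runner only decides whether to continue.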

Concurrency adds another layer. If the server is multi-threaded, requests can be processed concurrently. That means middleware and handlers must be thread-safe or documented as thread-compatible (safe if each request has its own state). The library should avoid global mutable state and use per-connection or per-request structs. If you implement a thread pool, ensure that request objects are not shared across threads.

Finally, error propagation: if the JSON parser fails or a handler returns an error, the server should generate a standardized JSON error response. This improves client behavior and makes error handling deterministic. Define a shape like:

{"error":"bad_request","code":400,"message":"invalid JSON at line 3"}

This aligns with the “Outcome Completeness” requirement and gives users a predictable error model.
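A helper that produces this shape might look like the sketch below. The names `json_escape` and `http_error_json` and the internal buffer sizes are assumptions. The message is escaped before being embedded, since parser error text can contain quotes or control characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Escape src into dst as JSON string content.
 * Returns 0 on success, -1 if dst is too small. */
static int json_escape(char *dst, size_t cap, const char *src)
{
    size_t n = 0;
    if (cap == 0) return -1;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char tmp[8];
        int len;
        if (c == '"' || c == '\\')
            len = snprintf(tmp, sizeof tmp, "\\%c", c);
        else if (c < 0x20)
            len = snprintf(tmp, sizeof tmp, "\\u%04x", (unsigned)c);
        else
            len = snprintf(tmp, sizeof tmp, "%c", c);   /* UTF-8 bytes pass through */
        if (n + (size_t)len >= cap) return -1;          /* keep room for the NUL */
        memcpy(dst + n, tmp, (size_t)len);
        n += (size_t)len;
    }
    dst[n] = '\0';
    return 0;
}

/* Write {"error":...,"code":...,"message":...} into out.
 * Returns 0 on success, -1 if any buffer is too small. */
static int http_error_json(char *out, size_t cap, const char *error,
                           int code, const char *message)
{
    char err_esc[64], msg_esc[256];
    assert(out != NULL && error != NULL && message != NULL);
    if (json_escape(err_esc, sizeof err_esc, error) != 0) return -1;
    if (json_escape(msg_esc, sizeof msg_esc, message) != 0) return -1;
    int n = snprintf(out, cap, "{\"error\":\"%s\",\"code\":%d,\"message\":\"%s\"}",
                     err_esc, code, msg_esc);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Failing instead of truncating keeps the output valid JSON in every case where the function reports success.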

How This Fits in This Project

Integration choices define your public API (Sec. 3.2), error shape (Sec. 3.7.5), and middleware pipeline (Sec. 4.2). The ownership contracts mirror Projects 3 and 4. Also used in: Project 1, Project 3, Project 4.

Definitions & Key Terms

Middleware -> Function chain that can inspect/modify requests.
Short-circuit -> Stop pipeline and return a response early.
Per-request state -> Data valid only during request handling.
Error translation -> Mapping internal errors to HTTP responses.

Mental Model Diagram (ASCII)

request -> parser -> middleware1 -> middleware2 -> handler -> response
                  ^                    |
                  |                    v
               logger               json parser

How It Works (Step-by-Step)

Parse HTTP request into http_request.
Run middleware chain in order.
If middleware returns error, build response and stop.
Call handler; handler may parse JSON and log.
Build response and send to client.

Minimal Concrete Example

http_server_add_middleware(srv, auth_mw);
http_server_add_route(srv, "/echo", echo_handler);

Common Misconceptions

“Middleware can keep request pointers forever.” -> Request is valid only during callback.
“Global logger is fine.” -> It leaks boundaries and breaks testability.
“JSON parsing errors should return 500.” -> They are client errors (400).

Check-Your-Understanding Questions

Why should request bodies be borrowed, not owned, by handlers?
How do you stop middleware execution safely?
Why is standardized error JSON important?

Check-Your-Understanding Answers

It prevents leaks and clarifies lifetime; handlers copy if needed.
Return a non-OK status and let the server send the response.
It gives clients predictable parsing and error handling.

Real-World Applications

Web frameworks with middleware (Express, Go net/http).
API gateways that log and validate requests.

Where You’ll Apply It

In this project: Sec. 3.7.5 (error JSON), Sec. 5.10 Phase 3 (middleware).
Also used in: Project 1, Project 3, Project 4.

References

“Designing Data-Intensive Applications” - sections on APIs
“Clean Code” - interface design

Key Insight

Integration is a boundary problem: ownership, error propagation, and state lifetimes must be explicit.

Summary

Middleware and integration require clear contracts. If you make ownership and error translation explicit, the system stays predictable even as you add features.

Homework/Exercises to Practice the Concept

Implement a middleware that rejects large bodies with 413.
Add a JSON echo handler that returns 400 on parse errors.
Add logging middleware that records method/path/status.

Solutions to the Homework/Exercises

Check Content-Length before reading body and return 413.
Call JSON parser; on error, return error JSON.
Log after handler returns with status code.

3. Project Specification

3.1 What You Will Build

A minimal HTTP/1.1 server and client library with stable C APIs. The server supports routing, middleware, logging integration, and JSON parsing for request bodies. The client can send requests and parse responses.

3.2 Functional Requirements

HTTP Server: start/stop, register routes, handle requests.
HTTP Client: send GET/POST with body.
Strict Parsing: request/response parsing with size limits.
Logging: structured request logs via injected logger.
Middleware: pluggable pipeline with short-circuit support.
JSON Integration: helper to parse JSON bodies.

3.3 Non-Functional Requirements

Performance: handle 100 req/sec locally.
Reliability: reject malformed requests safely.
Usability: stable API with clear ownership rules.

3.4 Example Usage / Output

[INFO] listening on 0.0.0.0:8080
[INFO] POST /echo 200 1ms

3.5 Data Formats / Schemas / Protocols

HTTP request structure:

METHOD SP PATH SP HTTP/1.1\r\n
Header: Value\r\n
\r\n
<body>

3.6 Edge Cases

Missing Host header.
Content-Length too large.
Malformed header lines.
Client disconnects mid-body.

3.7 Real World Outcome

You can run a small HTTP server, send JSON requests, and receive consistent JSON error responses.

3.7.1 How to Run (Copy/Paste)

make
./httpd --port 8080

3.7.2 Golden Path Demo (Deterministic)

Use fixed JSON input and deterministic logger timestamps.

3.7.3 If CLI: Exact Terminal Transcript

$ ./httpd --port 8080
[INFO] listening on 0.0.0.0:8080

$ curl -s -X POST http://localhost:8080/echo -d '{"msg":"hello"}'
{"msg":"hello"}
$ echo $?
0

Failure demo (invalid JSON):

$ curl -s -X POST http://localhost:8080/echo -d '{"msg":}'
{"error":"bad_request","code":400,"message":"invalid JSON at line 1 col 9"}

Exit Codes (server):

0 normal exit
2 bind/listen failure
3 invalid args

3.7.4 If Web App

N/A (CLI server + API).

3.7.5 If API

Example Request/Response (200):

$ curl -s -X POST http://localhost:8080/echo -d '{"msg":"hello"}'
{"msg":"hello"}

Example Error Response (400):

{"error":"bad_request","code":400,"message":"invalid JSON at line 1 col 9"}

Unified Error Shape:

{"error":"<string>","code":<number>,"message":"<string>"}

3.7.6 If Library

Install/Import: link libhttp-lite.a or libhttp-lite.so.

Minimal Usage:

http_server *srv = http_server_create(8080);
http_server_set_logger(srv, logger);
http_server_add_route(srv, "/echo", echo_handler);
http_server_run(srv);

Expected Output:

Requests logged with method/path/status.
Response body sent to client.

Error Handling Example:

if (http_server_run(srv) != HTTP_OK) {
    fprintf(stderr, "server error: %s\n", http_last_error(srv));
}

4. Solution Architecture

4.1 High-Level Design

client -> tcp -> http parser -> middleware -> handler -> response
                                  |
                                  v
                                logger

4.2 Key Components

4.3 Data Structures (No Full Code)

typedef struct http_request http_request; // opaque

typedef struct {
    const char *method;
    const char *path;
    size_t content_length;
} http_request_view;

4.4 Algorithm Overview

Key Algorithm: request handling

Read from socket into buffer.
Parse request line and headers.
Read body based on Content-Length.
Run middleware chain; if OK, call handler.
Build response and write to socket.

Complexity Analysis:

Time: O(n) for request size n.
Space: O(n) for buffered request.

5. Implementation Guide

5.1 Development Environment Setup

cc --version
make --version

5.2 Project Structure

libhttp-lite/
|-- include/
|   `-- http.h
|-- src/
|   |-- server.c
|   |-- client.c
|   |-- parser.c
|   |-- router.c
|   `-- middleware.c
|-- examples/
|   |-- httpd.c
|   `-- http-client.c
`-- Makefile

5.3 The Core Question You’re Answering

“How do you design an API that integrates multiple boundaries without breaking compatibility?”

5.4 Concepts You Must Understand First

HTTP parsing and stream handling.
Opaque handles and ABI stability.
Ownership and error translation across modules.

5.5 Questions to Guide Your Design

How will you enforce size limits on headers and bodies?
What is the ownership model for request and response buffers?
How will middleware communicate errors?

5.6 Thinking Exercise

Sketch a pipeline: request -> auth middleware -> JSON parse -> handler -> response. Where does each buffer live and who frees it?

5.7 The Interview Questions They’ll Ask

How do you preserve ABI stability across releases?
How do you parse HTTP incrementally?
How do you integrate logging without global state?

5.8 Hints in Layers

Hint 1: Opaque request/response types

typedef struct http_request http_request;

Hint 2: Enforce limits

http_server_set_max_body(srv, 1 * 1024 * 1024);

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Core Server (1 week)

Goals: listener, basic request parsing, routing. Checkpoint: handles GET /health.

Phase 2: Parsing & Limits (1 week)

Goals: strict parsing with limits, error responses. Checkpoint: malformed requests return 400.

Phase 3: Integration (1-2 weeks)

Goals: logging, JSON, middleware. Checkpoint: /echo route returns JSON and logs requests.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Header too large: send 16KB headers -> 431.
Invalid JSON: return 400 with error JSON.
Missing Host header: return 400.

6.3 Test Data

POST /echo HTTP/1.1\r\nHost: x\r\nContent-Length: 5\r\n\r\nhello

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Log raw request bytes for failed parses.
Add asserts for state transitions.

7.3 Performance Traps

Reallocating the buffer on every read; use a growable buffer that doubles its capacity (up to the configured limit) so appends stay amortized O(1).
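A growable buffer with a hard cap can be sketched as follows. The names `byte_buf`, `buf_append`, and `buf_free` and the initial capacity of 256 bytes are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *data; size_t len; size_t cap; } byte_buf;

/* Append n bytes, doubling capacity as needed but never beyond max_cap.
 * Returns 0 on success, -1 if the limit would be exceeded or realloc fails. */
static int buf_append(byte_buf *b, const void *src, size_t n, size_t max_cap)
{
    assert(b != NULL && b->len <= max_cap);
    if (n == 0) return 0;
    if (n > max_cap - b->len) return -1;            /* size limit enforced first */
    size_t need = b->len + n;
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 256;
        while (cap < need)
            cap = (cap > max_cap / 2) ? max_cap : cap * 2;
        if (cap > max_cap) cap = max_cap;
        char *p = realloc(b->data, cap);
        if (p == NULL) return -1;                   /* old buffer is still valid */
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}

static void buf_free(byte_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

Passing the limit into every append means the size check and the growth policy live in one place, so a read loop cannot accidentally grow past the configured maximum.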

8. Extensions & Challenges

8.1 Beginner Extensions

Add GET /time endpoint.
Add request ID header.

8.2 Intermediate Extensions

Add keep-alive support.
Add chunked encoding support.

8.3 Advanced Extensions

Add TLS via OpenSSL.
Add HTTP/2 framing (experimental).

9. Real-World Connections

9.1 Industry Applications

Embedded HTTP servers.
Internal service health endpoints.

9.2 Related Open Source Projects

libmicrohttpd - small HTTP server.
civetweb - embeddable HTTP server.

9.3 Interview Relevance

Protocol parsing and state machines.
API/ABI stability and ownership rules.

10. Resources

10.1 Essential Reading

RFC 7230 (HTTP/1.1 Message Syntax and Routing; obsoleted by RFC 9112)
“C Interfaces and Implementations” - Ch. 1-2

10.2 Video Resources

“HTTP/1.1 Parsing” - networking lecture (searchable title)

10.3 Tools & Documentation

man socket, man bind, man listen, man accept

11. Self-Assessment Checklist

11.1 Understanding

I can explain HTTP request parsing steps.
I can describe ownership of request bodies.
I can explain middleware short-circuit behavior.

11.2 Implementation

Server handles valid requests and rejects invalid ones.
Error responses follow unified JSON shape.
Logs integrate without global state.

11.3 Growth

I can describe ABI stability strategy in an interview.
I documented limits and trade-offs.

12. Submission / Completion Criteria

Minimum Viable Completion:

HTTP server with one route.
Basic parsing with size limits.
Stable public API with opaque handles.

Full Completion:

Middleware, logging, JSON integration.
Error JSON responses for invalid input.
Client library to send requests.

Excellence (Going Above & Beyond):

Keep-alive support.
TLS support.