Project 2: HTTP Connection Pool with Failure Injection

Build a resilient HTTP connection pool that survives timeouts, resets, and half-open sockets.

Quick Reference

Attribute | Value
Difficulty | Intermediate-Advanced
Time Estimate | 2-3 weeks
Language | C (Alternatives: Rust, Go)
Prerequisites | Socket basics, blocking vs non-blocking I/O
Key Topics | TCP states, pooling, failure injection

1. Learning Objectives

By completing this project, you will:

  1. Implement a connection pool with health checks and reuse.
  2. Detect half-open connections and stale sockets.
  3. Inject failures to validate resilience strategies.
  4. Apply circuit breaker and backoff behavior under load.

2. Theoretical Foundation

2.1 Core Concepts

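  • TCP connection lifecycle: the SYN handshake, ESTABLISHED, FIN_WAIT_1/FIN_WAIT_2, CLOSE_WAIT, and TIME_WAIT, and how each state affects pooling.
  • Half-open connections: One side thinks the connection is alive; the other has closed it or crashed, and no FIN or RST ever reached the first side.
  • Keep-alive and timeouts: HTTP keep-alive leaves connections open between requests, and servers close them after an idle timeout, so the client needs idle detection or probes.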
  • Backpressure and load: Pools need limits to prevent resource exhaustion.
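HTTP keep-alive (connection reuse) and TCP keepalive (kernel probes on an idle connection) are separate mechanisms. As a minimal sketch of the second, assuming Linux, the hypothetical helper enable_keepalive() below turns on kernel probes for a socket. The option names are real Linux socket options; the helper name and its parameters are illustrative.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive probes on fd (Linux options).
 * idle_s:     seconds of idleness before the first probe
 * interval_s: seconds between probes
 * count:      unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

With values such as enable_keepalive(fd, 30, 5, 3), a vanished peer would be noticed after roughly 45 seconds of idleness. That is still too slow for a checkout-time decision, which is why the pool also validates sockets itself.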

2.2 Why This Matters

Production services often fail at the integration boundary between application logic and network reality. A pool that blindly reuses sockets will cause bursts of errors during failures.

2.3 Historical Context / Background

Early clients created per-request connections. Keep-alive and pooling emerged to reduce latency and handshake overhead, but they introduced new failure modes.

2.4 Common Misconceptions

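  • “If connect() succeeded once, the socket is good forever.” Servers, load balancers, and NAT devices drop idle connections after a timeout.
  • “Errors will surface immediately.” A write to a half-open socket often succeeds because the kernel only queues the data; the error appears on a later read or write, or after retransmissions time out.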

3. Project Specification

3.1 What You Will Build

A library and CLI tool that maintains a pool of HTTP connections to a target server, issues requests, and handles failures with retries and circuit breaking.

3.2 Functional Requirements

  1. Pool management: Keep a configurable number of idle connections.
  2. Health checks: Validate sockets before reuse.
  3. Failure injection: Simulate timeout, reset, and server crash.
  4. Observability: Metrics for active, idle, failed, and retried requests.

3.3 Non-Functional Requirements

  • Performance: Support hundreds of requests/sec on localhost.
  • Reliability: Recover from server restarts without manual intervention.
  • Usability: Clear CLI flags for pool size and timeouts.

3.4 Example Usage / Output

$ ./pooler --host 127.0.0.1 --port 8080 --concurrency 50 --failures on
requests=1000 ok=972 retries=28 open_circuit=3 avg_ms=12.4

3.5 Real World Outcome

You run a load test against a dev server that you kill and restart. The tool keeps working, retries transient failures, and reports health. Example output:

$ ./pooler --host 127.0.0.1 --port 8080 --concurrency 20 --failures on
[stats] ok=182 fail=6 retry=14 open_circuit=1 pool_active=18 pool_idle=2
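One way to produce a stats line of this shape is sketched below. The struct metrics layout and the metrics_format() helper are assumptions made for illustration; only the field names in the output come from the example above. C11 atomics let requester threads bump counters without a lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical metrics block shared by all requester threads. */
struct metrics {
    atomic_ulong ok;
    atomic_ulong fail;
    atomic_ulong retry;
    atomic_ulong open_circuit;
    atomic_ulong pool_active;
    atomic_ulong pool_idle;
};

/* Format one stats line into buf.
 * Returns the line length, or -1 if buf is too small. */
int metrics_format(struct metrics *m, char *buf, size_t cap)
{
    int n = snprintf(buf, cap,
        "[stats] ok=%lu fail=%lu retry=%lu open_circuit=%lu "
        "pool_active=%lu pool_idle=%lu",
        atomic_load(&m->ok), atomic_load(&m->fail),
        atomic_load(&m->retry), atomic_load(&m->open_circuit),
        atomic_load(&m->pool_active), atomic_load(&m->pool_idle));

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

A worker would record a success with atomic_fetch_add(&m->ok, 1). Each counter is read separately, so a line printed under load is a close approximation rather than an exact snapshot.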

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│  Requester   │──▶│ Connection   │──▶│  HTTP Client │
│  Threads     │   │   Pool       │   │   Engine     │
└──────────────┘   └──────────────┘   └──────────────┘
        │                  │                  │
        ▼                  ▼                  ▼
   Metrics Store     Health Checks       Failure Injector

HTTP Connection Pool Architecture

4.2 Key Components

Component | Responsibility | Key Decisions
Pool | Manage socket lifecycle | Max idle/active sizes
HealthCheck | Validate reuse | SO_KEEPALIVE, ping, or read test
Injector | Simulate failures | Delay, drop, reset, timeout

4.3 Data Structures

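#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint64_t */

/* One pooled connection. */
struct conn {
    int fd;                 /* socket file descriptor */
    int healthy;            /* nonzero if the last health check passed */
    uint64_t last_used_ms;  /* time of last use, in milliseconds */
};

/* The pool and its limits. */
struct pool {
    struct conn *items;     /* array of pooled connections */
    size_t max_idle;        /* cap on connections kept idle */
    size_t max_active;      /* cap on connections in use at once */
};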
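A minimal sketch of acquire and release on top of these structures follows. It assumes two extra counters (idle_count and active_count) that the structs above do not have, treats items as a LIFO stack of idle connections, and leaves out locking, so a threaded build would wrap each call in a mutex. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct conn {
    int fd;
    int healthy;
    uint64_t last_used_ms;
};

struct pool {
    struct conn *items;    /* idle connections, used as a LIFO stack */
    size_t max_idle;
    size_t max_active;
    size_t idle_count;     /* assumed addition: valid entries in items */
    size_t active_count;   /* assumed addition: connections checked out */
};

int pool_init(struct pool *p, size_t max_idle, size_t max_active)
{
    p->items = calloc(max_idle ? max_idle : 1, sizeof *p->items);
    if (!p->items)
        return -1;
    p->max_idle = max_idle;
    p->max_active = max_active;
    p->idle_count = 0;
    p->active_count = 0;
    return 0;
}

/* Reserve an active slot. Returns -1 if max_active is reached.
 * On success returns 0; *out holds an idle connection, or fd == -1
 * when the caller must open a fresh socket itself. */
int pool_acquire(struct pool *p, struct conn *out)
{
    if (p->active_count >= p->max_active)
        return -1;
    p->active_count++;
    if (p->idle_count > 0) {
        *out = p->items[--p->idle_count];   /* most recently used first */
    } else {
        out->fd = -1;
        out->healthy = 0;
        out->last_used_ms = 0;
    }
    return 0;
}

/* Give a connection back. Returns 1 if the pool kept it, 0 if the
 * caller must close c->fd (unhealthy, or the idle list is full). */
int pool_release(struct pool *p, const struct conn *c, uint64_t now_ms)
{
    if (p->active_count > 0)
        p->active_count--;
    if (!c->healthy || p->idle_count >= p->max_idle)
        return 0;
    p->items[p->idle_count] = *c;
    p->items[p->idle_count].last_used_ms = now_ms;
    p->idle_count++;
    return 1;
}
```

LIFO order keeps the warmest connections in use and lets the oldest ones age out, which tends to reduce the chance of picking a socket the server has already timed out.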

4.4 Algorithm Overview

Key Algorithm: reuse with validation

  1. Acquire idle socket from pool.
  2. Run health check (non-blocking read or ping).
  3. If unhealthy, close and replace; else use.

Complexity Analysis:

  • Time: O(1) average for acquire/release, plus one system call per checkout for the health check
  • Space: O(N) for a pool of N connections

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

pooler/
├── src/
│   ├── main.c
│   ├── pool.c
│   ├── http.c
│   └── inject.c
├── tests/
│   └── test_failures.sh
├── Makefile
└── README.md

HTTP Pooler Project Structure

5.3 The Core Question You’re Answering

“How do I keep a pool of sockets safe when the network is lying to me?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. TCP State Machine
    • What are ESTABLISHED, TIME_WAIT, CLOSE_WAIT?
    • How do half-open connections form?
    • Book Reference: “TCP/IP Illustrated, Vol. 1” Ch. 13
  2. Non-blocking sockets
    • How to set O_NONBLOCK and interpret EAGAIN.
    • Book Reference: “TLPI” Ch. 61
  3. Connection reuse
    • How keep-alive works in HTTP/1.1.
    • Book Reference: “Release It!” Ch. 5

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. How do you detect a dead connection before reuse?
  2. When should you close and create a new socket?
  3. How do you avoid retry storms during outages?
  4. What metrics best describe pool health?

5.6 Thinking Exercise

Simulate a Half-Open Connection

  1. Start a server that accepts connections.
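  2. Kill the server. Note that the kernel closes a killed process's sockets and sends FIN or RST, so for a true half-open connection make the server vanish silently instead, for example by dropping its packets with a firewall rule or powering off its VM.
  3. Attempt to write from the client and observe the error, including how many writes it takes before the error appears.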
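For step 3, a small write helper makes the outcome easy to log. try_send() is a hypothetical name; MSG_NOSIGNAL (Linux, POSIX.1-2008) is assumed so that a write to a closed peer returns EPIPE instead of killing the process with SIGPIPE.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send all len bytes. Returns 0 on success, or the errno value
 * (for example EPIPE or ECONNRESET) of the first failure. */
int try_send(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: retry the same bytes */
            return errno;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Call it twice in a row and print strerror() of each result. Over TCP the first call after the peer has closed often returns 0, because the data is only queued locally; the reset that comes back turns the second call into EPIPE or ECONNRESET.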

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is a half-open TCP connection?”
  2. “How would you detect a dead socket in a pool?”
  3. “Why can reusing sockets increase failures?”

5.8 Hints in Layers

Hint 1: Validate on checkout Perform a non-blocking recv() with MSG_PEEK to detect closure.

Hint 2: Use backoff After repeated failures, sleep or open a circuit to reduce retries.

Hint 3: Inject chaos Randomly close sockets to test resilience.
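The backoff from Hint 2 might look like the sketch below: exponential growth with a cap, plus "full jitter" so that many clients do not retry in lockstep. The function names and parameters are illustrative, and rand() is used only for brevity.

```c
#include <stdint.h>
#include <stdlib.h>

/* Upper bound on the delay before retry number `attempt` (0-based):
 * base_ms * 2^attempt, capped at cap_ms. */
uint32_t backoff_ceiling_ms(unsigned attempt, uint32_t base_ms, uint32_t cap_ms)
{
    uint64_t d = base_ms;

    /* Stop doubling once the cap is reached, so d cannot overflow. */
    while (attempt-- > 0 && d < cap_ms)
        d *= 2;
    return d > cap_ms ? cap_ms : (uint32_t)d;
}

/* Full jitter: pick a delay in [0, ceiling]. */
uint32_t backoff_with_jitter_ms(unsigned attempt, uint32_t base_ms, uint32_t cap_ms)
{
    uint32_t ceiling = backoff_ceiling_ms(attempt, base_ms, cap_ms);
    return (uint32_t)((uint64_t)rand() % ((uint64_t)ceiling + 1));
}
```

With base_ms = 100 and cap_ms = 5000 the ceilings run 100, 200, 400, 800, 1600, 3200, 5000, 5000, and so on. Seed the generator once at startup (for example with srand()) so different processes draw different delays.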

5.9 Books That Will Help

Topic | Book | Chapter
TCP states | “TCP/IP Illustrated, Vol. 1” | Ch. 13, 18-19
Socket programming | “TCP/IP Sockets in C” | Ch. 2-4
Stability patterns | “Release It!” | Ch. 5
Sockets | “The Linux Programming Interface” | Ch. 61

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

Goals:

  • Basic HTTP client
  • Single connection reuse

Tasks:

  1. Implement a simple GET request.
  2. Add keep-alive support.

Checkpoint: Reuse one socket for multiple requests.

Phase 2: Core Functionality (5-7 days)

Goals:

  • Connection pool
  • Health checks

Tasks:

  1. Implement pool acquire/release.
  2. Add socket validation on checkout.

Checkpoint: Pool survives server restart.

Phase 3: Polish & Edge Cases (3-5 days)

Goals:

  • Failure injection
  • Metrics and CLI

Tasks:

  1. Add timeout and reset injection.
  2. Track retries and circuit open stats.

Checkpoint: Failure injection does not crash pool.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Health check | SO_KEEPALIVE vs probe | probe | Immediate detection
Retry strategy | immediate vs backoff | backoff | Avoids cascading failures
Concurrency | threads vs event loop | threads | Simpler for project scope
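Backoff limits how fast one request retries; the circuit breaker named in the learning objectives stops new requests altogether while the server is down. The sketch below is one common three-state design, not a prescribed one. All names, the consecutive-failure threshold, and the single trial request in the half-open state are assumptions.

```c
#include <stdint.h>

enum cb_state { CB_CLOSED, CB_OPEN, CB_HALF_OPEN };

struct breaker {
    enum cb_state state;
    unsigned failures;       /* consecutive failures seen so far */
    unsigned threshold;      /* failures needed to open the circuit */
    uint64_t opened_at_ms;   /* when the circuit last opened */
    uint64_t cooldown_ms;    /* how long to stay open before a trial */
};

/* Returns 1 if a request may be sent now, 0 if it must be rejected. */
int cb_allow(struct breaker *b, uint64_t now_ms)
{
    if (b->state == CB_OPEN) {
        if (now_ms - b->opened_at_ms < b->cooldown_ms)
            return 0;
        b->state = CB_HALF_OPEN;   /* cooldown over: allow one trial */
        return 1;
    }
    if (b->state == CB_HALF_OPEN)
        return 0;                  /* a trial request is already in flight */
    return 1;                      /* closed: normal operation */
}

/* Record the outcome of a request (ok != 0 means success). */
void cb_record(struct breaker *b, int ok, uint64_t now_ms)
{
    if (ok) {
        b->state = CB_CLOSED;
        b->failures = 0;
        return;
    }
    /* A failed trial reopens at once; otherwise count toward the threshold. */
    if (b->state == CB_HALF_OPEN || ++b->failures >= b->threshold) {
        b->state = CB_OPEN;
        b->opened_at_ms = now_ms;
        b->failures = 0;
    }
}
```

Each transition from closed or half-open to open is a natural place to increment an open_circuit counter.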

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Validate pool logic | acquire/release behavior
Integration Tests | End-to-end HTTP | server restart test
Chaos Tests | Failure injection | random drops

6.2 Critical Test Cases

  1. Server restart: pool should recover and reconnect.
  2. Timeout injection: requests retried with backoff.
  3. RST injection: close and replace sockets.

6.3 Test Data

GET /health HTTP/1.1
Host: localhost
Connection: keep-alive
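On the wire, HTTP/1.1 requires each header line to end in CRLF and the header block to end with an empty line. A minimal sketch that builds the request above is shown here; build_get() is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a keep-alive GET request into buf.
 * Returns the request length, or -1 if it does not fit in cap bytes. */
int build_get(char *buf, size_t cap, const char *host, const char *path)
{
    int n = snprintf(buf, cap,
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Connection: keep-alive\r\n"
        "\r\n",                      /* empty line ends the headers */
        path, host);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

For example, build_get(buf, sizeof buf, "localhost", "/health") yields the test request with correct line endings, ready to pass to send().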


7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

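Pitfall | Symptom | Solution
Reuse without check | Burst of failures | Validate socket on checkout
No backoff | Thundering herd | Add exponential backoff
Ignoring TIME_WAIT | connect() fails with EADDRNOTAVAIL once ephemeral ports run out | Reuse connections instead of churning them; respect OS limits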

7.2 Debugging Strategies

  • Use ss -tan to view socket states.
  • Log errors by errno to distinguish timeouts from resets.
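Logging by errno is easier with a small classifier. The categories and names below are one possible grouping, chosen for illustration. EAGAIN is treated as a timeout on the assumption that the socket is blocking with SO_RCVTIMEO or SO_SNDTIMEO set, which is how an expired socket timeout is reported.

```c
#include <errno.h>

enum fail_kind { FAIL_NONE, FAIL_TIMEOUT, FAIL_RESET, FAIL_REFUSED, FAIL_OTHER };

/* Map an errno value from connect/send/recv to a coarse failure class. */
enum fail_kind classify_errno(int err)
{
    switch (err) {
    case 0:
        return FAIL_NONE;
    case ETIMEDOUT:        /* TCP gave up retransmitting or connecting */
    case EAGAIN:           /* socket-level timeout expired (see above) */
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return FAIL_TIMEOUT;
    case ECONNRESET:       /* peer sent RST */
    case EPIPE:            /* wrote after the connection was torn down */
        return FAIL_RESET;
    case ECONNREFUSED:     /* nothing listening, e.g. server restarting */
        return FAIL_REFUSED;
    default:
        return FAIL_OTHER;
    }
}
```

Counting failures per class shows at a glance whether an outage looks like a crashed server (refused), a stale pool (reset), or a slow or silent peer (timeout).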

7.3 Performance Traps

Too many active connections can exhaust file descriptors. Cap pool size.
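The cap can be derived from the process's descriptor limit rather than guessed. In this sketch clamp_pool_size() and its reserve parameter are hypothetical; getrlimit() with RLIMIT_NOFILE is the standard way to read the limit.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Clamp a requested pool size so that `reserve` descriptors stay free
 * for stdio, log files, listening sockets, and so on. */
size_t clamp_pool_size(size_t requested, size_t reserve)
{
    struct rlimit rl;

    /* If the limit is unknown or unlimited, keep the request as is. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return requested;
    if (rl.rlim_cur <= reserve)
        return 0;

    size_t room = (size_t)rl.rlim_cur - reserve;
    return requested < room ? requested : room;
}
```

With a soft limit of 1024 and a reserve of 64, a request for 5000 connections would be cut to 960.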


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add configurable request paths.
  • Print latency histogram.

8.2 Intermediate Extensions

  • Add TLS support using OpenSSL.
  • Use epoll to manage many connections.

8.3 Advanced Extensions

  • Implement adaptive pool sizing.
  • Add circuit breaker with rolling window metrics.

9. Real-World Connections

9.1 Industry Applications

  • HTTP client libraries in microservices.
  • Database connection pools with similar failure patterns.

9.2 Related Open Source Projects

  • curl: https://github.com/curl/curl - Robust HTTP client
  • nghttp2: https://github.com/nghttp2/nghttp2 - HTTP/2 stack

9.3 Interview Relevance

  • Demonstrates network failure handling.
  • Shows knowledge of TCP state behavior.

10. Resources

10.1 Essential Reading

  • “TCP/IP Illustrated, Vol. 1” by Stevens & Fall - Ch. 13, 18-19
  • “TCP/IP Sockets in C” by Donahoo & Calvert - Ch. 2-4

10.2 Video Resources

  • TCP state machine deep dives - YouTube
  • Production networking incidents - SRE talks

10.3 Tools & Documentation

  • man 2 connect: Socket connection
  • man 7 tcp: TCP behaviors

10.4 Related Projects

  • Project 1 prepares FD and multiplexing knowledge.
  • Project 3 adds process supervision for network daemons.

11. Self-Assessment Checklist

11.1 Understanding

  • I can describe half-open TCP connections.
  • I can explain why keep-alive can be dangerous.
  • I can justify my retry strategy.

11.2 Implementation

  • Pool recovers from server restarts.
  • Failure injection behaves as expected.
  • Metrics are accurate and stable.

11.3 Growth

  • I documented a real failure case and fix.
  • I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Pool reuses connections safely.
  • Detects and replaces dead sockets.

Full Completion:

  • Failure injection and retries implemented.
  • Metrics and CLI are complete.

Excellence (Going Above & Beyond):

  • Adaptive pool sizing and circuit breaker.
  • TLS support with robust error handling.

This guide was generated from SPRINT_5_SYSTEMS_INTEGRATION_PROJECTS.md. For the complete learning path, see the parent directory.