Project 2: HTTP Connection Pool with Failure Injection
Build a resilient HTTP connection pool that survives timeouts, resets, and half-open sockets.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate-Advanced |
| Time Estimate | 2-3 weeks |
| Language | C (Alternatives: Rust, Go) |
| Prerequisites | Socket basics, blocking vs non-blocking I/O |
| Key Topics | TCP states, pooling, failure injection |
1. Learning Objectives
By completing this project, you will:
- Implement a connection pool with health checks and reuse.
- Detect half-open connections and stale sockets.
- Inject failures to validate resilience strategies.
- Apply circuit breaker and backoff behavior under load.
2. Theoretical Foundation
2.1 Core Concepts
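- TCP connection lifecycle: SYN_SENT, ESTABLISHED, FIN_WAIT_1/FIN_WAIT_2, CLOSE_WAIT, TIME_WAIT, and how each state affects whether a pooled socket can be reused.
- Half-open connections: One side thinks the connection is alive; the other has crashed, rebooted, or lost its state without a FIN or RST ever reaching the peer.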
- Keep-alive and timeouts: HTTP keep-alive needs idle detection or probes.
- Backpressure and load: Pools need limits to prevent resource exhaustion.
2.2 Why This Matters
Production services often fail at the integration boundary between application logic and network reality. A pool that blindly reuses sockets turns a single server restart into a burst of errors, because the idle connections it hands out are already dead.
2.3 Historical Context / Background
Early clients created per-request connections. Keep-alive and pooling emerged to reduce latency and handshake overhead, but they introduced new failure modes.
2.4 Common Misconceptions
- “If connect() succeeded once, the socket is good forever.” Not with idle timeouts.
- “Errors will surface immediately.” Half-open sockets can fail only on write.
3. Project Specification
3.1 What You Will Build
A library and CLI tool that maintains a pool of HTTP connections to a target server, issues requests, and handles failures with retries and circuit breaking.
3.2 Functional Requirements
- Pool management: Keep a configurable number of idle connections.
- Health checks: Validate sockets before reuse.
- Failure injection: Simulate timeout, reset, and server crash.
- Observability: Metrics for active, idle, failed, and retried requests.
3.3 Non-Functional Requirements
- Performance: Support hundreds of requests/sec on localhost.
- Reliability: Recover from server restarts without manual intervention.
- Usability: Clear CLI flags for pool size and timeouts.
3.4 Example Usage / Output
$ ./pooler --host 127.0.0.1 --port 8080 --concurrency 50 --failures on
requests=1000 ok=972 retries=28 open_circuit=3 avg_ms=12.4
3.5 Real World Outcome
You run a load test against a dev server that you kill and restart. The tool keeps working, retries transient failures, and reports health. Example output:
$ ./pooler --host 127.0.0.1 --port 8080 --concurrency 20 --failures on
[stats] ok=182 fail=6 retry=14 open_circuit=1 pool_active=18 pool_idle=2
4. Solution Architecture
4.1 High-Level Design
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Requester │──▶│ Connection │──▶│ HTTP Client │
│ Threads │ │ Pool │ │ Engine │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
▼ ▼ ▼
Metrics Store Health Checks Failure Injector

4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Pool | Manage socket lifecycle | Max idle/active sizes |
| HealthCheck | Validate reuse | SO_KEEPALIVE, ping, or read test |
| Injector | Simulate failures | Delay, drop, reset, timeout |
4.3 Data Structures
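#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint64_t */

struct conn {
    int fd;                 /* socket file descriptor */
    int healthy;            /* result of the last health check */
    uint64_t last_used_ms;  /* timestamp of last use, for idle expiry */
};

struct pool {
    struct conn *items;     /* array of pooled connections */
    size_t max_idle;        /* cap on idle connections kept open */
    size_t max_active;      /* cap on connections checked out at once */
};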
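One possible way to drive these structures is to treat `items` as a stack of idle connections, so the most recently used socket is handed out first. The sketch below repeats the two structs so it compiles on its own and adds two hypothetical counters, `n_idle` and `n_active`, that are not part of the original definition. `pool_pop_idle` and `pool_push_idle` are illustrative names, and the code assumes the caller holds a lock if several threads share the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct conn {
    int fd;
    int healthy;
    uint64_t last_used_ms;
};

struct pool {
    struct conn *items;   /* idle connections, used as a stack */
    size_t max_idle;      /* capacity of items */
    size_t max_active;
    size_t n_idle;        /* hypothetical: current stack depth */
    size_t n_active;      /* hypothetical: connections checked out */
};

/* Pop the most recently used idle connection. Returns 1 and fills *out,
 * or 0 when nothing is idle or the active cap has been reached. */
static int pool_pop_idle(struct pool *p, struct conn *out)
{
    assert(p != NULL && out != NULL);
    if (p->n_idle == 0 || p->n_active >= p->max_active)
        return 0;
    *out = p->items[--p->n_idle];
    p->n_active++;
    return 1;
}

/* Hand a connection back. Returns 1 if it was kept, 0 if it is unhealthy
 * or the idle list is full, in which case the caller should close c.fd. */
static int pool_push_idle(struct pool *p, struct conn c, uint64_t now_ms)
{
    assert(p != NULL && p->n_active > 0);
    p->n_active--;
    if (!c.healthy || p->n_idle >= p->max_idle)
        return 0;
    c.last_used_ms = now_ms;
    p->items[p->n_idle++] = c;
    return 1;
}
```

Both operations touch only the top of the stack, so they run in constant time. Last-in-first-out order also lets rarely used sockets age at the bottom, where an idle-expiry pass can find them.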
4.4 Algorithm Overview
Key Algorithm: reuse with validation
- Acquire idle socket from pool.
- Run health check (non-blocking read or ping).
- If unhealthy, close and replace; else use.
Complexity Analysis:
- Time: O(1) average for acquire/release
- Space: O(N) for pool size
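The health-check step can be sketched with a non-blocking peek. `conn_looks_alive` is a hypothetical helper name, and treating unexpected readable bytes as a reason to discard the socket is a design assumption for idle HTTP/1.1 connections rather than a rule from this guide.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if fd looks reusable, 0 if it should be closed and replaced. */
static int conn_looks_alive(int fd)
{
    assert(fd >= 0);
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 0;   /* peer sent FIN: orderly shutdown */
    if (n > 0)
        return 0;   /* assumption: stray bytes on an idle connection mean it is out of sync */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;   /* nothing to read and no error pending */
    return 0;       /* ECONNRESET, ETIMEDOUT, and similar */
}
```

This check only sees a FIN or RST that has already arrived. A truly half-open connection still passes it, which is why the pool also needs request timeouts and a retry path.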
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install build-essential
5.2 Project Structure
pooler/
├── src/
│ ├── main.c
│ ├── pool.c
│ ├── http.c
│ └── inject.c
├── tests/
│ └── test_failures.sh
├── Makefile
└── README.md

5.3 The Core Question You’re Answering
“How do I keep a pool of sockets safe when the network is lying to me?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- TCP State Machine
  - What are ESTABLISHED, TIME_WAIT, CLOSE_WAIT?
  - How do half-open connections form?
  - Book Reference: “TCP/IP Illustrated, Vol. 1” Ch. 13
- Non-blocking sockets
  - How to set O_NONBLOCK and interpret EAGAIN.
  - Book Reference: “TLPI” Ch. 61
- Connection reuse
  - How keep-alive works in HTTP/1.1.
  - Book Reference: “Release It!” Ch. 5
5.5 Questions to Guide Your Design
Before implementing, think through these:
- How do you detect a dead connection before reuse?
- When should you close and create a new socket?
- How do you avoid retry storms during outages?
- What metrics best describe pool health?
5.6 Thinking Exercise
Simulate a Half-Open Connection
- Start a server that accepts connections.
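- Make the server unreachable without a FIN or RST reaching the client, for example by dropping its packets with a firewall rule or pausing its VM. Killing the process alone is not enough, because the kernel closes its sockets and notifies the peer.
- Attempt to write from the client and observe when the error appears; the first write may succeed, with the failure surfacing only after retransmissions time out.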
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “What is a half-open TCP connection?”
- “How would you detect a dead socket in a pool?”
- “Why can reusing sockets increase failures?”
5.8 Hints in Layers
Hint 1: Validate on checkout
Perform a non-blocking recv() with MSG_PEEK to detect closure.
Hint 2: Use backoff
After repeated failures, sleep or open a circuit to reduce retries.
Hint 3: Inject chaos
Randomly close sockets to test resilience.
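Hint 2 can be made concrete with capped exponential backoff plus jitter. The sketch below is one common formulation, not the only reasonable one. The function names, the doubling factor, and the "pick uniformly between zero and the delay" jitter rule are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Delay before retry number `attempt` (0-based): base * 2^attempt, capped. */
static uint32_t backoff_ms(unsigned attempt, uint32_t base_ms, uint32_t cap_ms)
{
    assert(base_ms > 0 && cap_ms >= base_ms);
    uint64_t d = base_ms;
    while (attempt-- > 0 && d < cap_ms)
        d *= 2;
    return d > cap_ms ? cap_ms : (uint32_t)d;
}

/* Scale a caller-supplied random 32-bit value into [0, delay_ms] so that
 * many clients do not retry in lockstep. */
static uint32_t backoff_jitter_ms(uint32_t delay_ms, uint32_t rnd)
{
    return (uint32_t)(((uint64_t)rnd * ((uint64_t)delay_ms + 1)) >> 32);
}
```

With a base of 100 ms and a cap of 5000 ms, the un-jittered delays are 100, 200, 400, 800 and so on, until they stop growing at the cap. Passing the random value in as a parameter keeps the function deterministic under test.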
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| TCP states | “TCP/IP Illustrated, Vol. 1” | Ch. 13, 18-19 |
| Socket programming | “TCP/IP Sockets in C” | Ch. 2-4 |
| Stability patterns | “Release It!” | Ch. 5 |
| Sockets | “The Linux Programming Interface” | Ch. 61 |
5.10 Implementation Phases
Phase 1: Foundation (3-4 days)
Goals:
- Basic HTTP client
- Single connection reuse
Tasks:
- Implement a simple GET request.
- Add keep-alive support.
Checkpoint: Reuse one socket for multiple requests.
Phase 2: Core Functionality (5-7 days)
Goals:
- Connection pool
- Health checks
Tasks:
- Implement pool acquire/release.
- Add socket validation on checkout.
Checkpoint: Pool survives server restart.
Phase 3: Polish & Edge Cases (3-5 days)
Goals:
- Failure injection
- Metrics and CLI
Tasks:
- Add timeout and reset injection.
- Track retries and circuit open stats.
Checkpoint: Failure injection does not crash pool.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Health check | SO_KEEPALIVE vs probe | probe | Immediate detection |
| Retry strategy | immediate vs backoff | backoff | Avoids cascading failures |
| Concurrency | threads vs event loop | threads | Simpler for project scope |
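For comparison with the probe approach, here is a sketch of what the SO_KEEPALIVE option involves. `enable_keepalive` is a hypothetical helper. The three timing options are Linux-specific socket options, so the sketch compiles them in only where the constants are defined.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keep-alive probes on fd.
 * idle_s: seconds of silence before the first probe
 * interval_s: seconds between probes
 * count: unanswered probes before the connection is dropped */
static int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    assert(idle_s > 0 && interval_s > 0 && count > 0);
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) == -1)
        return -1;
#endif
    return 0;
}
```

Without the timing overrides, Linux waits two hours by default before sending the first probe. That is far too slow for a request pool, and it is one reason a checkout-time probe gives faster detection.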
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate pool logic | acquire/release behavior |
| Integration Tests | End-to-end HTTP | server restart test |
| Chaos Tests | Failure injection | random drops |
6.2 Critical Test Cases
- Server restart: pool should recover and reconnect.
- Timeout injection: requests retried with backoff.
- RST injection: close and replace sockets.
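To exercise the RST case, something has to send a reset. One way, in a test server or proxy you control, is an abortive close using SO_LINGER with a zero timeout. `close_with_reset` is a hypothetical helper name; the socket option itself is standard.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd abortively. With SO_LINGER enabled and a zero timeout, close()
 * on a connected TCP socket discards unsent data and sends RST instead of
 * going through the normal FIN handshake. */
static int close_with_reset(int fd)
{
    assert(fd >= 0);
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == -1) {
        close(fd);   /* still release the descriptor on failure */
        return -1;
    }
    return close(fd);
}
```

The peer typically sees ECONNRESET on its next read or write, which is the signal the pool should treat as "close and replace".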
6.3 Test Data
GET /health HTTP/1.1
Host: localhost
Connection: keep-alive
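On the wire, each header line ends with CRLF and the header block is terminated by an empty line. A small sketch of building this request into a buffer follows. `build_get` is a hypothetical helper, and it assumes the host and path are already valid for use in a request line.

```c
#include <assert.h>
#include <stdio.h>

/* Write a keep-alive GET request for `path` on `host` into buf.
 * Returns the request length, or -1 if buf is too small. */
static int build_get(char *buf, size_t len, const char *host, const char *path)
{
    assert(buf != NULL && host != NULL && path != NULL);
    int n = snprintf(buf, len,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: keep-alive\r\n"
                     "\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The returned length is what the send loop should transmit. Sending the terminating NUL as well would put a stray byte into the next request on a reused connection.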
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Reuse without check | Burst of failures | Validate socket on checkout |
| No backoff | Thundering herd | Add exponential backoff |
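| Ignoring TIME_WAIT | connect() fails with EADDRNOTAVAIL once ephemeral ports run out | Reuse connections instead of reopening them; cap the reconnect rate |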
7.2 Debugging Strategies
- Use ss -tan to view socket states.
- Log errors by errno to distinguish timeouts from resets.
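Logging by errno is easier if raw values are first folded into a few categories. The mapping below is a sketch; the `io_failure` enum and both function names are hypothetical, and the grouping is a judgment call.

```c
#include <assert.h>
#include <errno.h>

enum io_failure { IO_TIMEOUT, IO_RESET, IO_REFUSED, IO_OTHER };

/* Fold an errno value from connect/send/recv into a coarse category. */
static enum io_failure classify_errno(int err)
{
    switch (err) {
    case ETIMEDOUT:
        return IO_TIMEOUT;
    case ECONNRESET:
    case EPIPE:          /* write after the peer has gone away */
        return IO_RESET;
    case ECONNREFUSED:   /* nothing is listening, e.g. during a restart */
        return IO_REFUSED;
    default:
        return IO_OTHER;
    }
}

/* Short label suitable for a log line or metric name. */
static const char *io_failure_name(enum io_failure f)
{
    static const char *const names[] = { "timeout", "reset", "refused", "other" };
    assert((unsigned)f <= IO_OTHER);
    return names[f];
}
```

If you rely on SO_RCVTIMEO for receive timeouts, note that an expired timer surfaces as EAGAIN or EWOULDBLOCK rather than ETIMEDOUT, so the caller has to classify that case from context.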
7.3 Performance Traps
Too many active connections can exhaust file descriptors. Cap pool size.
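One way to pick a safe cap is to ask the OS for the descriptor limit at startup. The sketch below clamps a requested pool size to the soft RLIMIT_NOFILE limit. `clamp_pool_size` and the `reserve` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Clamp a requested pool size to what the soft RLIMIT_NOFILE allows,
 * leaving `reserve` descriptors for stdio, logs, and other files. */
static size_t clamp_pool_size(size_t requested, size_t reserve)
{
    assert(requested > 0);
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1 || rl.rlim_cur == RLIM_INFINITY)
        return requested;          /* limit unknown or unlimited */
    if (rl.rlim_cur <= reserve)
        return 0;                  /* no room for any pooled sockets */
    size_t room = (size_t)rl.rlim_cur - reserve;
    return requested < room ? requested : room;
}
```

Calling this once while parsing CLI flags turns an EMFILE failure under load into a clear startup message.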
8. Extensions & Challenges
8.1 Beginner Extensions
- Add configurable request paths.
- Print latency histogram.
8.2 Intermediate Extensions
- Add TLS support using OpenSSL.
- Use epoll to manage many connections.
8.3 Advanced Extensions
- Implement adaptive pool sizing.
- Add circuit breaker with rolling window metrics.
9. Real-World Connections
9.1 Industry Applications
- HTTP client libraries in microservices.
- Database connection pools with similar failure patterns.
9.2 Related Open Source Projects
- curl: https://github.com/curl/curl - Robust HTTP client
- nghttp2: https://github.com/nghttp2/nghttp2 - HTTP/2 stack
9.3 Interview Relevance
- Demonstrates network failure handling.
- Shows knowledge of TCP state behavior.
10. Resources
10.1 Essential Reading
- “TCP/IP Illustrated, Vol. 1” by Stevens & Fall - Ch. 13, 18-19
- “TCP/IP Sockets in C” by Donahoo & Calvert - Ch. 2-4
10.2 Video Resources
- TCP state machine deep dives - YouTube
- Production networking incidents - SRE talks
10.3 Tools & Documentation
- man 2 connect: Socket connection
- man 7 tcp: TCP behaviors
10.4 Related Projects in This Series
- Project 1 prepares FD and multiplexing knowledge.
- Project 3 adds process supervision for network daemons.
11. Self-Assessment Checklist
11.1 Understanding
- I can describe half-open TCP connections.
- I can explain why keep-alive can be dangerous.
- I can justify my retry strategy.
11.2 Implementation
- Pool recovers from server restarts.
- Failure injection behaves as expected.
- Metrics are accurate and stable.
11.3 Growth
- I documented a real failure case and fix.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Pool reuses connections safely.
- Detects and replaces dead sockets.
Full Completion:
- Failure injection and retries implemented.
- Metrics and CLI are complete.
Excellence (Going Above & Beyond):
- Adaptive pool sizing and circuit breaker.
- TLS support with robust error handling.
This guide was generated from SPRINT_5_SYSTEMS_INTEGRATION_PROJECTS.md. For the complete learning path, see the parent directory.