Project 17: HTTP Proxy Server
A forward proxy that relays HTTP requests and logs traffic metadata.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 2 weeks |
| Main Programming Language | Python |
| Alternative Programming Languages | Go, Rust, C |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 3. The “Service & Support” Model |
| Prerequisites | See concepts below |
| Key Topics | Transport and Ports (TCP and UDP); Operations, Diagnostics, and Security |
1. Learning Objectives
- Build and validate a forward proxy that relays HTTP requests and logs traffic metadata.
- Explain protocol behavior and verify it with a capture or trace.
- Handle edge cases and produce reproducible results.
2. All Theory Needed (Per-Concept Breakdown)
Transport and Ports (TCP and UDP)
Fundamentals: The transport layer provides end-to-end communication between applications. TCP offers reliable, ordered delivery through acknowledgments, retransmission, and flow control. UDP provides minimal overhead without reliability guarantees, which makes it ideal for low-latency or simple query/response protocols like DNS. Ports are the addressing mechanism for applications. A network connection is identified by a 5-tuple: source IP, source port, destination IP, destination port, and protocol. Understanding how TCP and UDP differ is essential for building ping-style diagnostics, port scanners, and proxies.
Deep Dive: TCP is a stateful protocol. It begins with a three-way handshake that establishes initial sequence numbers on both ends. Once established, each side maintains a sliding window of bytes that have been sent but not yet acknowledged. Retransmission occurs on timeout or after duplicate acknowledgments. Flow control uses the receiver’s advertised window to prevent buffer overflow, while congestion control reacts to signs of network congestion by reducing the sending rate. These behaviors are not just theoretical: they explain why a large file transfer can slow down after packet loss, and why a connection may stall if ACKs are filtered or delayed.
UDP is the opposite: it simply wraps application data with source and destination ports and a checksum. There is no handshake, no retransmission, and no ordering. This makes UDP great for short queries, live streaming, or gaming, where timeliness is more important than perfect delivery. But it also means that the application must handle loss, duplication, or reordering if those problems matter. This is why protocols like DNS and DHCP include their own retry logic and transaction IDs.
Ports are the multiplexing mechanism of the transport layer. A single IP address can host many services because each service listens on a different port. Clients use ephemeral ports chosen by the OS to distinguish their connections. Firewalls and NAT devices often make decisions based on ports and protocol state, which is why understanding TCP states (SYN_SENT, ESTABLISHED, FIN_WAIT) is crucial for debugging. For example, a firewall that drops inbound SYN packets but allows inbound ACKs can cause mysterious failures in connection setup while established connections continue to work.
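You can watch the 5-tuple and an OS-chosen ephemeral port without a packet capture by opening a TCP connection over loopback and asking the client socket for its local and peer addresses. This is a minimal sketch using Python's standard socket module (Python 3.8 or later for socket.create_server); the helper name connection_five_tuple is hypothetical, and the listener binds to port 0 so the OS picks any free port.

```python
import socket

def connection_five_tuple():
    """Open a loopback TCP connection and return its 5-tuple as seen by the client.

    Hypothetical helper for illustration only.
    """
    # Port 0 asks the OS for any free listening port.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        server_port = listener.getsockname()[1]
        # connect() makes the kernel perform the SYN / SYN-ACK / ACK handshake.
        with socket.create_connection(("127.0.0.1", server_port)) as client:
            conn, _ = listener.accept()
            with conn:
                src_ip, src_port = client.getsockname()  # ephemeral port picked by the OS
                dst_ip, dst_port = client.getpeername()  # the listener's address
                # The server side sees the same flow with source and destination swapped:
                # conn.getpeername() == (src_ip, src_port)
                return (src_ip, src_port, dst_ip, dst_port, "TCP")
```

Running it twice will usually show different source ports, because each connect() gets a fresh ephemeral port.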
Transport behavior is also shaped by MTU and fragmentation. TCP segments are sized (through the MSS) to fit within the path MTU. If a segment is too large and the ICMP “fragmentation needed” messages that path MTU discovery depends on are blocked, the connection can stall in ways that appear random. UDP has no built-in recovery for lost fragments: if any IP fragment of a large datagram is lost, the whole datagram is discarded, which makes large UDP payloads unreliable. This is why many UDP-based protocols keep messages small or implement their own segmentation and reassembly.
When you build transport-layer tools, you are interacting with a state machine. A port scanner that uses TCP SYN packets is testing how a host responds to a state transition. A proxy server is managing two concurrent TCP state machines and relaying data between them. A VPN tunnel often runs over UDP or TCP and must handle reliability differently depending on the transport. The more you understand transport mechanics, the more precise your debugging and design choices become.
How this fits into the projects: Projects 2, 3, 9, 11, 17, and 19 depend directly on transport behavior.
Definitions & key terms
- 5-tuple: The identifiers of a transport flow.
- Handshake: TCP connection establishment.
- Window: Flow control mechanism for TCP.
- Ephemeral port: Temporary client port assigned by the OS.
Mental model diagram
Client Server
SYN --------------------> (listening)
SYN-ACK <------------------
ACK --------------------> (established)
How it works
- TCP establishes state via handshake.
- Data is sent with sequence numbers and ACKs.
- Loss triggers retransmission and window reduction.
- UDP sends without state; application handles retries if needed.
Invariants: TCP guarantees order if the connection stays up.
Failure modes: half-open connections, blocked SYNs, dropped ACKs.
Minimal concrete example
UDP request/response:
Client -> UDP:53 query id=0x1234
Server -> UDP:53 response id=0x1234
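Because UDP itself never retries or matches replies, the client has to do both, the way DNS does with its transaction ID. The sketch below assumes a made-up framing (a 2-byte ID followed by the payload, not the real DNS wire format) and a loopback echo responder; udp_request and start_echo_responder are hypothetical names.

```python
import os
import socket
import struct
import threading

def start_echo_responder():
    """Toy server for local testing: echo a single datagram back to its sender."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    addr = sock.getsockname()

    def serve_once():
        with sock:
            data, peer = sock.recvfrom(512)
            sock.sendto(data, peer)

    threading.Thread(target=serve_once, daemon=True).start()
    return addr

def udp_request(server, payload, timeout=0.5, retries=3):
    """Send payload tagged with a random 16-bit ID; return (id, reply) for the matching reply.

    Replies with a different ID (stale or spoofed) are ignored; silence triggers a resend.
    """
    txid = struct.unpack("!H", os.urandom(2))[0]
    message = struct.pack("!H", txid) + payload
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.sendto(message, server)
            try:
                while True:
                    data, addr = sock.recvfrom(512)
                    if addr == server and len(data) >= 2 and struct.unpack("!H", data[:2])[0] == txid:
                        return txid, data[2:]
            except socket.timeout:
                continue  # no matching reply in time: resend the same request
    raise TimeoutError(f"no reply from {server} after {retries} attempts")
```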
Common misconceptions
- “UDP is always faster.” It can be, but loss may negate benefits.
- “TCP guarantees delivery across the internet.” It only guarantees delivery within the connection’s lifetime.
Check-your-understanding questions
- Why does a TCP connection need both sequence and acknowledgment numbers?
- When would you choose UDP over TCP for a home network tool?
Check-your-understanding answers
- To track sent bytes and confirm receipt in order.
- For low-latency, small messages where retries are acceptable.
Real-world applications
- Reliable file transfer versus real-time streaming.
- Port scanning and service discovery.
Where you will apply it
- Projects 2, 3, 9, 11, 17, 19
References
- RFC 9293 (TCP), RFC 768 (UDP)
- “TCP/IP Illustrated, Vol 1” by Stevens - Ch. 11-16
Key insights: Transport protocols are state machines; your tools must respect their state.
Summary: TCP and UDP make different promises. Your designs must align with those promises.
Homework/Exercises to practice the concept
- List three application protocols that use UDP and why.
- Draw the TCP close sequence and label each side’s state.
Solutions to the homework/exercises
- DNS (small queries), NTP (time sync), VoIP (latency).
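- FIN/ACK exchange in each direction, with TIME_WAIT on the side that performs the active close (sends the first FIN). Active closer: FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED; passive closer: CLOSE_WAIT -> LAST_ACK -> CLOSED.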
Operations, Diagnostics, and Security
Fundamentals: Diagnostics and security are where theory becomes practical. ICMP provides feedback about connectivity, latency, and routing failures. Tools like ping and traceroute rely on ICMP to make network paths visible. Packet capture reveals the truth of what is happening on the wire, which is essential when logs or assumptions are wrong. Security in home/office networks is built on segmentation, stateful filtering, and secure Wi-Fi configuration. Understanding these tools and controls lets you diagnose failures quickly and prevent common threats.
Deep Dive: ICMP is often misunderstood as “just ping,” but it is the control plane of IP. It reports unreachable destinations, TTL expiry, and fragmentation requirements. Traceroute manipulates TTL values to force routers along a path to send ICMP Time Exceeded messages, which reveals hop-by-hop routing. Understanding ICMP types and codes lets you differentiate between “host unreachable” and “port unreachable,” which can save hours of guessing. But ICMP is not guaranteed to be delivered, so diagnostics must be combined with other evidence like TCP resets and packet captures.
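Distinguishing “host unreachable” from “port unreachable” comes down to two bytes: the ICMP type and code that follow the IP header. This sketch assumes you already hold a raw IPv4 packet, such as one read from a raw ICMP socket (on Linux those reads include the IP header, and opening the socket needs elevated privileges). The function name classify_icmp and the partial lookup table are illustrative; the numeric types and codes are the ones defined in RFC 792.

```python
# (type, code) -> description, a small subset of RFC 792.
ICMP_MESSAGES = {
    (0, 0): "echo reply",
    (3, 0): "net unreachable",
    (3, 1): "host unreachable",
    (3, 3): "port unreachable",
    (3, 4): "fragmentation needed and DF set",
    (8, 0): "echo request",
    (11, 0): "TTL exceeded in transit",
}

def classify_icmp(packet: bytes) -> str:
    """Describe the ICMP message carried in a raw IPv4 packet."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if packet[9] != 1:                   # IP protocol number 1 = ICMP
        raise ValueError("not ICMP")
    if len(packet) < header_len + 2:
        raise ValueError("truncated ICMP header")
    icmp_type, icmp_code = packet[header_len], packet[header_len + 1]
    return ICMP_MESSAGES.get((icmp_type, icmp_code), f"type {icmp_type} code {icmp_code}")
```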
Packet capture is the definitive tool for understanding network behavior. A capture shows headers at every layer, timestamps, retransmissions, and the exact sequence of events that produced a failure. It also reveals hidden behaviors, such as DNS retries, TCP window size changes, or unexpected multicast traffic. Effective capture requires careful filtering and knowledge of what you are looking for. On switched networks, you may only see traffic destined for your host, so you may need port mirroring or monitor mode on Wi-Fi. Without this awareness, you can falsely conclude that traffic is not present when you are simply not observing the right link.
Security in home networks is primarily about reducing attack surface and limiting trust. A stateful firewall tracks connection state and allows return traffic while blocking unsolicited inbound flows. This is different from NAT, though NAT produces a similar effect. Segmentation isolates devices so that a compromised IoT device cannot reach sensitive systems. Wi-Fi security must protect the air interface using WPA2 or WPA3, and should avoid legacy configurations like WEP or open networks. VPNs create encrypted tunnels across untrusted networks; they can be site-to-site (linking networks) or remote access (linking a device to a network). VPNs also alter routing by creating new routes over the tunnel interface, which is why they sometimes break local network access or change DNS behavior.
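Stateful filtering can be modeled as a table lookup: remember each outbound flow, and admit inbound traffic only when it is the exact reverse of a remembered flow. This toy model is not a real firewall; the names Flow and StatefulFilter are hypothetical, and it ignores the entry timeouts and TCP flag checks that real connection trackers apply.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str

    def reply(self) -> "Flow":
        """The 5-tuple that return traffic for this flow would carry."""
        return Flow(self.dst_ip, self.dst_port, self.src_ip, self.src_port, self.proto)

class StatefulFilter:
    """Toy connection tracker: outbound is allowed and remembered, inbound must match."""

    def __init__(self):
        self.established = set()

    def outbound(self, flow: Flow) -> bool:
        self.established.add(flow)
        return True

    def inbound(self, flow: Flow) -> bool:
        # Allowed only if this packet is the reply direction of a recorded outbound flow.
        return flow.reply() in self.established
```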
Operational visibility is not just about troubleshooting failures. It is about validating performance and policy. A bandwidth monitor can show whether a device is saturating the uplink. A DNS sinkhole can show which devices are contacting malicious domains. A firewall log can show attempted scans or misconfigured services. These tools are the foundation of a healthy home or office network.
Finally, security is a system property, not a single setting. A strong Wi-Fi passphrase is meaningless if devices are on the same flat network with no segmentation. A firewall rule is only as good as the routing table that feeds it. The projects in this guide are structured to force you to test and verify these properties, not just configure them.
How this fits into the projects: Projects 2-4 and 9-20 use diagnostics and security concepts directly.
Definitions & key terms
- ICMP: Control messages for IP.
- Stateful firewall: Filters traffic based on connection state.
- Segmentation: Separating devices into isolated networks.
- VPN: Encrypted tunnel over untrusted networks.
Mental model diagram
Client -> Router -> Internet
| | |
| firewall state |
| ICMP feedback |
+-> capture point |
How it works
- Use ICMP to test reachability and path.
- Capture packets to confirm actual behavior.
- Enforce policy with firewall and segmentation.
- Use VPNs to extend trust securely.
Invariants: ICMP is best-effort, not guaranteed.
Failure modes: blocked ICMP, asymmetric routing, misapplied firewall rules.
Minimal concrete example
Traceroute logic:
TTL=1 -> ICMP Time Exceeded from hop 1
TTL=2 -> ICMP Time Exceeded from hop 2
...
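The sending half of that loop needs only an ordinary UDP socket whose TTL is set per probe; receiving the Time Exceeded replies requires a raw ICMP socket and elevated privileges, so it is left out of this sketch. It assumes the UDP-probe style of traceroute with the conventional base port 33434; offsetting the destination port by the TTL, and the names send_probe and probe_plan, are hypothetical choices.

```python
import socket

BASE_PORT = 33434  # conventional traceroute base port, unlikely to have a listener

def send_probe(dest_ip: str, ttl: int) -> socket.socket:
    """Send one empty UDP probe with the given TTL; return the socket for the caller to close."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The router that decrements the TTL to 0 drops the probe and replies with ICMP Time Exceeded.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    sock.sendto(b"", (dest_ip, BASE_PORT + ttl))
    return sock

def probe_plan(max_hops: int = 30):
    """The (ttl, destination port) pairs a run would use, one probe per hop."""
    return [(ttl, BASE_PORT + ttl) for ttl in range(1, max_hops + 1)]
```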
Common misconceptions
- “If ping fails, the host is down.” ICMP may be blocked.
- “NAT equals firewall.” NAT does not express security policy.
Check-your-understanding questions
- Why can traceroute fail even when the destination is reachable?
- Why might a VPN break access to local printers?
Check-your-understanding answers
- Routers or hosts may block ICMP Time Exceeded messages.
- VPN routes may override local routes, sending traffic into the tunnel.
Real-world applications
- Diagnosing intermittent Wi-Fi dropouts.
- Segmenting IoT devices away from trusted systems.
Where you will apply it
- Projects 2-4, 9-20
References
- RFC 792 (ICMP)
- “TCP/IP Illustrated, Vol 1” by Stevens - Ch. 6
Key insights: You cannot secure or fix what you cannot observe.
Summary: Diagnostics and security are practical disciplines that transform network theory into reliable systems.
Homework/Exercises to practice the concept
- Capture a TCP handshake and annotate each packet.
- Design a guest network segmentation plan for your home.
Solutions to the homework/exercises
- Identify SYN, SYN-ACK, ACK, then the first data packet.
- Create a separate VLAN or SSID with no access to LAN subnets.
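When you annotate a captured handshake, each packet's role is encoded in the flags byte at offset 13 of the TCP header. This hypothetical tcp_flags helper is a sketch that takes the TCP header bytes (for example, sliced out of a capture after the IP header) and names the set bits, so a handshake reads as SYN, SYN-ACK, ACK.

```python
# Bit values of TCP header byte 13, listed low bit first.
FLAG_BITS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"), (0x08, "PSH"),
             (0x10, "ACK"), (0x20, "URG"), (0x40, "ECE"), (0x80, "CWR")]

def tcp_flags(tcp_header: bytes) -> str:
    """Return the set flags joined by '-', e.g. 'SYN-ACK'; 'none' if no flag is set."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    flags = tcp_header[13]
    names = [name for bit, name in FLAG_BITS if flags & bit]
    return "-".join(names) if names else "none"
```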
3. Project Specification
3.1 What You Will Build
A forward proxy that relays HTTP requests and logs traffic metadata.
Included:
- CLI tool with clear output
- Validation steps and logging
- Documentation of assumptions
Excluded:
- Production-grade performance tuning
- Full security hardening
3.2 Functional Requirements
- Core function: Accept client connections, forward each HTTP request to its target server, relay the response back, and log request metadata.
- Observable output: Produce deterministic output comparable to the Real World Outcome (section 3.7).
- Error handling: Handle timeouts, invalid inputs, and unreachable hosts gracefully.
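One way to handle timeouts and unreachable hosts gracefully is to translate each upstream failure into an HTTP error the client can display, rather than silently dropping the connection. This sketch assumes the common convention of 504 Gateway Timeout for timeouts and 502 Bad Gateway for refused connections or failed name lookups; connect_upstream, error_response, and the 5-second default are hypothetical choices.

```python
import socket

def connect_upstream(host: str, port: int, timeout: float = 5.0):
    """Return (socket, None) on success, or (None, status) describing the failure."""
    try:
        return socket.create_connection((host, port), timeout=timeout), None
    except socket.timeout:
        # Must be caught before OSError, of which it is a subclass.
        return None, "504 Gateway Timeout"
    except OSError:
        # Refused, unresolvable (socket.gaierror), or unreachable ("No route to host").
        return None, "502 Bad Gateway"

def error_response(status: str) -> bytes:
    """Build a minimal HTTP/1.1 error response to send back to the client."""
    body = status.encode() + b"\n"
    head = (f"HTTP/1.1 {status}\r\nContent-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\nConnection: close\r\n\r\n")
    return head.encode() + body
```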
3.3 Non-Functional Requirements
- Performance: Complete typical tasks within a few seconds on a LAN.
- Reliability: Fail safely and clearly on errors.
- Usability: Provide concise CLI flags and helpful messages.
3.4 Example Usage / Output
$ ./http_proxy --listen 127.0.0.1:8081
[14:10:02] GET http://example.com/ from 127.0.0.1 -> 93.184.216.34
[14:10:03] 200 OK 1256 bytes
3.5 Data Formats / Schemas / Protocols
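Protocols: HTTP/1.1 over TCP (absolute-form request lines from clients, usually forwarded to servers in origin-form; CONNECT tunnels for HTTPS). Related concepts: Transport and Ports (TCP and UDP); Operations, Diagnostics, and Security.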
3.6 Edge Cases
- Target unreachable or timing out
- Malformed or unexpected responses
- Multiple interfaces or subnets
3.7 Real World Outcome
$ ./http_proxy --listen 127.0.0.1:8081
[14:10:02] GET http://example.com/ from 127.0.0.1 -> 93.184.216.34
[14:10:03] 200 OK 1256 bytes
3.7.1 How to Run (Copy/Paste)
- Build: make (or create a virtual environment as needed)
- Run: ./P17-http-proxy-server
- Config: update any constants in a config file or flags
- Working directory: project root
3.7.2 Golden Path Demo (Deterministic)
Run against a known local target and compare with the expected output.
3.7.3 If CLI: provide an exact terminal transcript
$ ./http_proxy --listen 127.0.0.1:8081
[14:10:02] GET http://example.com/ from 127.0.0.1 -> 93.184.216.34
[14:10:03] 200 OK 1256 bytes
4. Solution Architecture
4.1 High-Level Design
CLI Input -> Core Engine -> Output/Logs
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| CLI Parser | Parse flags and inputs | Keep interface minimal |
| Core Engine | Protocol logic and state | Keep deterministic timing |
| Output/Logs | Present results and errors | Use consistent formatting |
4.3 Data Structures (No Full Code)
Request:
- target
- protocol fields
Response:
- status
- timing
State:
- retries
- cache entries
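These fields map naturally onto small record types. The sketch below is one possible shape using Python dataclasses; every name beyond target, status, retries, and the cache (for example method, elapsed_s, body_bytes) is a hypothetical addition, and the cache is simply a dictionary keyed by URL with no expiry.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Request:
    target: str                       # absolute URL, or host:port for CONNECT
    method: str = "GET"               # protocol fields from the request line...
    version: str = "HTTP/1.1"
    headers: Dict[str, str] = field(default_factory=dict)  # ...and the header block

@dataclass
class Response:
    status: int
    reason: str = ""
    elapsed_s: float = 0.0            # timing: request sent -> response complete
    body_bytes: int = 0

@dataclass
class ProxyState:
    retries: int = 0
    cache: Dict[str, Response] = field(default_factory=dict)  # cache entries keyed by URL

    def lookup(self, url: str) -> Optional[Response]:
        return self.cache.get(url)
```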
5. Implementation Guide
5.1 Development Environment Setup
- Ensure required tools (tcpdump, dig, ip) are installed.
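- Use elevated privileges only where raw sockets or packet capture require them (for example, tcpdump); the proxy itself uses ordinary TCP sockets on an unprivileged port such as 8081.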
5.2 Project Structure
project/
  README.md
  docs/
  src/
  tests/
  data/
5.3 The Core Question You’re Answering
“How can a middlebox observe or control web traffic without breaking it?”
5.4 Concepts You Must Understand First
- HTTP request structure
  - How does a proxy-style request line differ?
  - Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 14
- TCP stream forwarding
  - How do you relay data between two sockets?
  - Book Reference: “UNIX Network Programming, Vol 1” - Ch. 5
- Connection lifetimes
  - How do keep-alive connections change your logic?
  - Book Reference: “Computer Networks” - Ch. 6
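When a client is configured to use a proxy, it puts the full URL in the request line (the absolute-form, such as GET http://example.com/index.html HTTP/1.1), whereas a request sent straight to a server carries only the path. This sketch uses urllib.parse.urlsplit from the standard library to pull out the upstream host and port and build the path-only request line to forward; to_origin_form is a hypothetical name, and it accepts only http:// URLs.

```python
from urllib.parse import urlsplit

def to_origin_form(request_line: str):
    """Turn 'GET http://host[:port]/path?q HTTP/1.1' into (host, port, 'GET /path?q HTTP/1.1')."""
    method, target, version = request_line.split(" ", 2)
    url = urlsplit(target)
    if url.scheme != "http" or not url.hostname:
        raise ValueError(f"expected an absolute http:// URL, got {target!r}")
    path = url.path or "/"            # 'http://example.com' means the root path
    if url.query:
        path += "?" + url.query
    return url.hostname, url.port or 80, f"{method} {path} {version}"
```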
5.5 Questions to Guide Your Design
- Parsing vs tunneling
  - Which parts of the request should you parse vs pass through?
- Logging
  - What metadata is useful without capturing full content?
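One reasonable answer is to log the method, URL, client and upstream addresses, status, and byte count, and to leave out headers such as Cookie or Authorization and all body content. These hypothetical formatters reproduce the two-line style of the example output in section 3.4.

```python
from datetime import datetime

def format_request_log(when: datetime, method: str, url: str, client_ip: str, upstream_ip: str) -> str:
    # Metadata only: no headers (Cookie, Authorization) and no body content.
    return f"[{when:%H:%M:%S}] {method} {url} from {client_ip} -> {upstream_ip}"

def format_response_log(when: datetime, status: int, reason: str, nbytes: int) -> str:
    return f"[{when:%H:%M:%S}] {status} {reason} {nbytes} bytes"
```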
5.6 Thinking Exercise
CONNECT method
How does an HTTP proxy handle HTTPS via CONNECT?
Questions to answer:
- What does the proxy see after the tunnel is established?
- What can it not see?
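Once the proxy has answered CONNECT with a 200 response, it stops parsing and copies bytes in both directions. From then on it can still see the target host and port, connection timing, and byte counts, but only encrypted TLS records, not URLs, headers, or content. This sketch assumes one thread per direction, IPv4 or hostname targets, and hypothetical names (handle_connect, pipe); it performs no access control, whereas a real proxy should restrict which ports (for example, 443) it tunnels to.

```python
import socket
import threading

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while chunk := src.recv(65536):
            dst.sendall(chunk)
    except OSError:
        pass  # peer reset or socket closed by the other direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate EOF to the other side
        except OSError:
            pass

def handle_connect(client: socket.socket, request_line: str) -> None:
    """Handle 'CONNECT host:port HTTP/1.1' (headers already read) by tunneling raw bytes."""
    _, authority, _ = request_line.split(" ", 2)
    host, _, port = authority.rpartition(":")
    try:
        upstream = socket.create_connection((host, int(port)), timeout=5)
    except (OSError, ValueError):
        client.sendall(b"HTTP/1.1 502 Bad Gateway\r\nContent-Length: 0\r\n\r\n")
        return
    upstream.settimeout(None)  # the timeout applied to connecting only; relaying blocks
    client.sendall(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    # From here on the proxy sees only opaque bytes (TLS records for HTTPS).
    back = threading.Thread(target=pipe, args=(upstream, client), daemon=True)
    back.start()
    pipe(client, upstream)
    back.join()
    upstream.close()
```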
5.7 The Interview Questions They’ll Ask
- “How does a forward proxy differ from a reverse proxy?”
- “What is the CONNECT method used for?”
- “How do you avoid blocking while relaying data?”
- “What privacy issues do proxies raise?”
- “How do proxies interact with TLS?”
5.8 Hints in Layers
Hint 1: Start with HTTP only
Handle plain HTTP before adding CONNECT.
Hint 2: Use line-based parsing for headers
Detect the end of the headers (the empty line, CRLF CRLF) before relaying the body.
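In practice, detecting the end of the headers means reading until the first empty line (CRLF CRLF) and keeping any extra bytes that arrived in the same recv(), since they already belong to the body. read_head is a hypothetical helper, and the 64 KiB cap is an arbitrary guard against clients that never finish their headers.

```python
import socket

MAX_HEAD = 64 * 1024  # arbitrary cap on the size of the header block

def read_head(sock: socket.socket):
    """Read up to the empty line that ends the headers.

    Returns (request_line, headers, leftover); leftover holds body bytes that
    arrived in the same recv() and must be relayed before reading more.
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        if len(buf) > MAX_HEAD:
            raise ValueError("header block too large")
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed before the end of the headers")
        buf += chunk
    head, _, leftover = buf.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Names are case-insensitive; a repeated header simply overwrites the earlier one here.
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, leftover
```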
Hint 3: Pseudocode outline
- accept client connection
- read request line and headers
- open connection to target server
- relay request and response
- log summary
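The five steps above can be turned into a minimal working proxy. This sketch assumes plain HTTP only, requests without a body beyond what arrives with the headers (such as GET), and one request per client connection: it sends Connection: close upstream so it can relay the response until the server closes, which sidesteps keep-alive handling. Names such as handle_client, serve, and HOP_BY_HOP are hypothetical, and validation is minimal.

```python
import socket
import threading
from urllib.parse import urlsplit

HOP_BY_HOP = ("connection", "proxy-connection", "keep-alive")  # not forwarded upstream

def reply_error(client: socket.socket, status: str) -> None:
    client.sendall(f"HTTP/1.1 {status}\r\nContent-Length: 0\r\nConnection: close\r\n\r\n".encode())

def handle_client(client: socket.socket, log=print) -> None:
    """Serve one plain-HTTP request on `client`, then close it."""
    with client:
        # 1-2. Read the request line and headers, up to the empty line.
        buf = b""
        while b"\r\n\r\n" not in buf:
            chunk = client.recv(4096)
            if not chunk:
                return
            buf += chunk
        head, _, body = buf.partition(b"\r\n\r\n")
        lines = head.decode("iso-8859-1").split("\r\n")
        method, target, version = lines[0].split(" ", 2)
        url = urlsplit(target)
        if url.scheme != "http" or not url.hostname:
            reply_error(client, "400 Bad Request")  # CONNECT is not handled here
            return
        path = (url.path or "/") + (f"?{url.query}" if url.query else "")
        kept = [line for line in lines[1:]
                if line.split(":", 1)[0].strip().lower() not in HOP_BY_HOP]
        upstream_head = "\r\n".join([f"{method} {path} {version}", *kept, "Connection: close", "", ""])
        # 3. Open a connection to the target server.
        try:
            upstream = socket.create_connection((url.hostname, url.port or 80), timeout=10)
        except OSError:
            reply_error(client, "502 Bad Gateway")
            return
        # 4. Relay the request, then stream the response back until the server closes.
        total, first = 0, b""
        with upstream:
            try:
                upstream.sendall(upstream_head.encode("iso-8859-1") + body)
                while chunk := upstream.recv(65536):
                    first = first or chunk
                    client.sendall(chunk)
                    total += len(chunk)
            except OSError:
                pass  # upstream timeout or reset: stop relaying
        # 5. Log a one-line summary (no headers or body content).
        status = first.split(b"\r\n", 1)[0].decode("iso-8859-1") or "no response"
        log(f"{method} {target} -> {status} ({total} bytes)")

def serve(host: str = "127.0.0.1", port: int = 8081) -> None:
    """Accept clients forever, one thread per connection."""
    with socket.create_server((host, port)) as server:
        while True:
            client, _ = server.accept()
            threading.Thread(target=handle_client, args=(client,), daemon=True).start()
```

Calling serve() starts listening on 127.0.0.1:8081; with http_proxy=http://127.0.0.1:8081 set, curl should then fetch plain-HTTP pages through it, and each request prints one summary line.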
Hint 4: Verification
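Set the http_proxy environment variable (for example, http_proxy=http://127.0.0.1:8081) and fetch a plain-HTTP test site with curl or wget, which honor that variable.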
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| TCP relay patterns | “UNIX Network Programming, Vol 1” | Ch. 5 |
| HTTP format | “TCP/IP Illustrated, Vol 1” | Ch. 14 |
| Proxy concepts | “Computer Networks” | Ch. 7 |
5.10 Implementation Phases
- Establish core protocol I/O and a minimal success path.
- Add parsing, validation, and timeouts.
- Add logging, metrics, and polish for output.
5.11 Key Implementation Decisions
- Which interface and capture point provides visibility?
- What timeout and retry strategy balances speed and accuracy?
- How will results be validated against reference tools?
6. Testing Strategy
6.1 Test Categories
- Unit: parsing and validation logic
- Integration: protocol exchange with a real device
- System: full run with reference tools
6.2 Critical Test Cases
- Successful request/response path
- Timeout and retry behavior
- Invalid or unexpected input handling
6.3 Test Data
- Local gateway IP and a known reachable host
- A non-routable IP to trigger timeouts
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
Problem 1: “Hangs on large responses”
- Why: Not streaming data, waiting for EOF.
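- Fix: Relay data in chunks as it arrives, and use the message framing to know when the body ends: stop after Content-Length bytes, at the final zero-length chunk of a chunked body, or when the server closes the connection.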
- Quick test: Download a large file through the proxy.
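The Content-Length case can be sketched as a loop that forwards each chunk as soon as it arrives and stops after exactly that many bytes, instead of buffering the whole body or waiting for an EOF that a keep-alive server never sends. This covers only Content-Length framing; chunked bodies need a chunk-size parser. relay_exact is a hypothetical name.

```python
import socket

def relay_exact(src: socket.socket, dst: socket.socket, length: int, already: bytes = b"") -> int:
    """Forward exactly `length` body bytes from src to dst, streaming chunk by chunk.

    `already` holds body bytes that arrived together with the headers.
    Returns the number of bytes relayed.
    """
    sent = 0
    if already:
        dst.sendall(already[:length])
        sent = min(len(already), length)
    while sent < length:
        # Never read past the end of this body: the next bytes may start another message.
        chunk = src.recv(min(65536, length - sent))
        if not chunk:
            raise ConnectionError(f"upstream closed after {sent} of {length} bytes")
        dst.sendall(chunk)
        sent += len(chunk)
    return sent
```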
Problem 2: “HTTPS fails”
- Why: CONNECT not implemented or blocked.
- Fix: Add CONNECT tunnel logic and allow port 443.
- Quick test: Run curl -x with an HTTPS URL, for example: curl -x http://127.0.0.1:8081 https://example.com/
7.2 Debugging Strategies
- Capture traffic with tcpdump or Wireshark.
- Compare against a known-good tool.
- Log timestamps and retry logic.
7.3 Performance Traps
- Excessive retries causing long runtimes.
- Inefficient parsing under high packet rates.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add basic configuration flags for interface and timeout.
- Improve output formatting and sorting.
8.2 Intermediate Extensions
- Add caching or state persistence.
- Add CSV or JSON output export.
8.3 Advanced Extensions
- Add concurrency with careful rate limiting.
- Add visualization or integration with a dashboard.
9. Real-World Connections
9.1 Industry Applications
- Network diagnostics and troubleshooting
- Security monitoring and policy enforcement
9.2 Related Open Source Projects
- tcpdump / Wireshark
- nmap / dnsmasq / unbound (as applicable)
9.3 Interview Relevance
- Explaining protocol behavior
- Diagnosing failures by layer
10. Resources
10.1 Essential Reading
| Topic | Book | Chapter |
|---|---|---|
| TCP relay patterns | “UNIX Network Programming, Vol 1” | Ch. 5 |
| HTTP format | “TCP/IP Illustrated, Vol 1” | Ch. 14 |
| Proxy concepts | “Computer Networks” | Ch. 7 |
10.2 Video Resources
- Wireshark or tcpdump walkthroughs (search for recent tutorials)
- Vendor or RFC explainers for the relevant protocol
10.3 Tools & Documentation
- RFCs for the protocols used in this project
- man tcpdump, man ip, man ss
10.4 Related Projects in This Series
- Project 2: Build Your Own ping Utility
- Project 3: Build Your Own traceroute Utility
- Project 5: DNS Resolver (Client-Side)
- Project 7: Simple DNS Server (Authoritative)
- Project 9: Port Scanner
- Project 10: DNS Sinkhole (Pi-hole Style)
- Project 11: Simple HTTP Server
- Project 12: Bandwidth Monitor
- Project 13: Simple Packet Filter Firewall
- Project 14: Software Router with NAT
- Project 15: Wake-on-LAN Tool
- Project 19: Simple VPN Server
- Project 20: Complete Home Network Stack (Capstone)