Project 6: Email Header Analyzer and Route Visualizer
Build a forensic tool that parses email headers and reconstructs the message path.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Language | Python (Alternatives: Go, Rust) |
| Prerequisites | RFC 5322 headers, basic parsing |
| Key Topics | Received headers, authentication results, time zones |
1. Learning Objectives
- Parse raw email headers with folding and repeated fields.
- Extract Received chain and reconstruct delivery path.
- Parse Authentication-Results and summarize SPF/DKIM/DMARC.
- Output a clear timeline and route diagram.
2. Theoretical Foundation
2.1 Core Concepts
- Received headers: Each hop prepends a Received header, so reading top to bottom goes from the newest hop to the oldest.
- Authentication-Results: Contains pass/fail results from SPF, DKIM, DMARC.
- Message-ID and Date: Identify the message and capture its originating timestamp.
- Header folding: Long headers wrap onto multiple lines with leading whitespace.
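As a minimal sketch of the folding rule, the function below joins continuation lines onto the previous header and keeps repeated fields in order. The name `unfold_headers` is hypothetical, and collapsing the fold to a single space is a simplification (strict unfolding only removes the line break).

```python
def unfold_headers(raw: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs in order of appearance, keeping duplicates."""
    headers: list[tuple[str, str]] = []
    for line in raw.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        if line[0] in " \t" and headers:
            # Continuation line: append it to the previous header's value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
        # Anything else is malformed and skipped in this sketch.
    return headers
```

Returning a list of pairs rather than a dict is deliberate: a dict would silently drop all but one Received header.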
2.2 Why This Matters
Header analysis is the basis of deliverability debugging and security investigations. It tells you where a message came from and how it was handled.
2.3 Historical Context / Background
Email headers were designed for traceability. Over time, providers added new headers for authentication and filtering.
2.4 Common Misconceptions
- Misconception: The top Received header is the sender. Reality: The bottom-most Received is closest to origin.
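- Misconception: Date header is reliable. Reality: It is set by the sending client and can be forged; Received timestamps added by servers you trust are more reliable, while those claimed by earlier hops can be forged too.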
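One possible heuristic for spotting a questionable Date header is to compare it with the timestamp of the Received header closest to the origin. The sketch below assumes both values are well formed and carry a numeric offset; the function names and the one-hour tolerance are illustrative choices, not a standard.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

def date_header_skew(date_value, earliest_received_value):
    """Return (Received time - Date time) for the hop closest to the origin."""
    sent = parsedate_to_datetime(date_value)
    # The timestamp of a Received header follows its last semicolon.
    _, _, received_date = earliest_received_value.rpartition(";")
    received = parsedate_to_datetime(received_date.strip())
    return received - sent

def looks_suspicious(skew, tolerance=timedelta(hours=1)):
    """Flag a Date header that is far from the first server timestamp."""
    return abs(skew) > tolerance
```

A large skew is only a hint: misconfigured client clocks and delayed submissions produce the same pattern.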
3. Project Specification
3.1 What You Will Build
A CLI that accepts a raw message, parses headers, reconstructs hops, and outputs a timeline and a trust summary for SPF/DKIM/DMARC.
3.2 Functional Requirements
- Parse headers with folding and duplicates.
- Extract and order Received headers.
- Parse Authentication-Results and summarize results.
- Build a route timeline with timestamps.
3.3 Non-Functional Requirements
- Performance: Parse a typical message in under 200 ms.
- Reliability: Handle malformed headers gracefully.
- Usability: Output clear hop-by-hop details.
3.4 Example Usage / Output
$ ./header-analyzer message.eml
Route:
1) mx1.origin.net -> mx2.provider.com 2024-01-03 09:12:31 -0500
2) mx2.provider.com -> mx3.provider.com 2024-01-03 09:12:33 -0500
Auth:
SPF: pass
DKIM: pass (d=example.com)
DMARC: pass (p=reject)
3.5 Real World Outcome
You can explain the exact path and authentication state of any email message and identify suspicious hops or forged headers.
4. Solution Architecture
4.1 High-Level Design
Message Parser
-> Header Normalizer
-> Received Chain Builder
-> Auth Summary Parser
-> Timeline Renderer
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Normalizer | Unfold headers | Preserve order and raw values |
| Chain Builder | Parse Received headers | Parse timestamps and hosts |
| Auth Parser | Extract SPF/DKIM/DMARC | Handle multiple results |
| Renderer | Output report | Text and JSON modes |
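A rough sketch of the Renderer's two modes follows. The `render` function and the dictionary keys (`from`, `by`, `date`, `spf`, `dkim`, `dmarc`) are assumptions made for illustration; the text layout mirrors the example output in section 3.4.

```python
import json

def render(hops, auth, as_json=False):
    """hops: origin-first list of dicts; auth: dict of method -> result."""
    if as_json:
        return json.dumps({"route": hops, "auth": auth}, indent=2)
    lines = ["Route:"]
    for index, hop in enumerate(hops, start=1):
        lines.append(f"{index}) {hop['from']} -> {hop['by']} {hop['date']}")
    lines.append("Auth:")
    for method in ("spf", "dkim", "dmarc"):
        # Report "none" when a method was not evaluated.
        lines.append(f"{method.upper()}: {auth.get(method, 'none')}")
    return "\n".join(lines)
```

Keeping both modes behind one function means the parsed data structure stays the single source of truth for text and JSON output.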
4.3 Data Structures
class ReceivedHop:
    def __init__(self, from_host, by_host, date):
        self.from_host = from_host
        self.by_host = by_host
        self.date = date
4.4 Algorithm Overview
Key Algorithm: Received Ordering
- Collect all Received headers in order of appearance.
- Reverse list to show origin first.
- Parse dates and compute transit times.
Complexity Analysis:
- Time: O(n) in the number of Received headers
- Space: O(n)
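The three steps above can be sketched as follows. This assumes every Received value ends with `; <date>` and that each date carries a numeric offset; `order_hops` is a hypothetical helper name.

```python
from email.utils import parsedate_to_datetime

def order_hops(received_values):
    """Take Received values top to bottom; return (value, when, delta) origin first."""
    timeline = []
    previous = None
    # Headers are prepended by each hop, so reversing puts the origin first.
    for value in reversed(received_values):
        _, _, date_part = value.rpartition(";")
        when = parsedate_to_datetime(date_part.strip())
        # Transit time is the difference from the previous (earlier) hop.
        delta = when - previous if previous is not None else None
        timeline.append((value, when, delta))
        previous = when
    return timeline
```

Because the datetimes are offset-aware, subtraction works across hops that report different time zones.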
5. Implementation Guide
5.1 Development Environment Setup
python -m venv .venv
source .venv/bin/activate
5.2 Project Structure
header-analyzer/
├── parser.py
├── received.py
├── auth_results.py
└── render.py
5.3 The Core Question You’re Answering
“Where did this email really travel, and what did each hop decide about it?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Header folding rules
- Received header format
- Auth results structure
- Time zones and offsets
5.5 Questions to Guide Your Design
- How will you parse dates with varying formats?
- How will you handle missing or malformed Received headers?
- Will you trust Authentication-Results from all hops?
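On the trust question, one common approach is to keep only Authentication-Results headers whose authserv-id (the token before the first semicolon) belongs to a server you control. The sketch below assumes you supply that set yourself; the function names and the example hostnames are hypothetical.

```python
def authserv_id(value):
    """Return the lowercased authserv-id, or '' if the header has none."""
    head = value.split(";", 1)[0].split()
    return head[0].lower() if head else ""

def trusted_auth_results(values, trusted_ids):
    """Keep only header values stamped by a trusted authserv-id."""
    trusted = {name.lower() for name in trusted_ids}
    return [value for value in values if authserv_id(value) in trusted]
```

A sender can insert an Authentication-Results header of their own, so filtering by authserv-id matters more than position alone.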
5.6 Thinking Exercise
If a message has three Received headers and the second from the top has a timestamp that is earlier than the third, what could that mean?
5.7 The Interview Questions They’ll Ask
- “Which Received header is closest to the sender?”
- “How do you detect forged Date headers?”
- “What is Authentication-Results used for?”
5.8 Hints in Layers
Hint 1: Start with unfold
- Join continuation lines to form single headers.
Hint 2: Parse minimal Received fields
- Focus on from, by, and date first.
Hint 3: Add timeline metrics
- Compute delta between hops.
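A minimal sketch of Hint 2 follows. It strips parenthesized comments, then pulls out the `from` host, the `by` host, and the date after the last semicolon. The dataclass mirrors the `ReceivedHop` fields from section 4.3; `strip_comments` and `parse_received` are hypothetical names, and real Received headers contain more clauses than this handles.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional

@dataclass
class ReceivedHop:
    from_host: Optional[str]
    by_host: Optional[str]
    date: Optional[datetime]

def strip_comments(text):
    """Remove (possibly nested) parenthesized comments, innermost first."""
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\([^()]*\)", " ", text)
    return text

def parse_received(value):
    clauses, sep, date_part = value.rpartition(";")
    if not sep:  # no semicolon, so there is no date to parse
        clauses, date_part = value, ""
    clauses = strip_comments(clauses)
    from_match = re.search(r"^\s*from\s+(\S+)", clauses, re.IGNORECASE)
    by_match = re.search(r"\bby\s+(\S+)", clauses, re.IGNORECASE)
    try:
        when = parsedate_to_datetime(strip_comments(date_part).strip())
    except (TypeError, ValueError):
        when = None  # malformed or missing date
    return ReceivedHop(
        from_host=from_match.group(1) if from_match else None,
        by_host=by_match.group(1) if by_match else None,
        date=when,
    )
```

Returning `None` for missing fields keeps malformed hops in the chain so the report can show them instead of silently dropping them.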
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Message format | RFC 5322 | Sections 2-3 |
| Header processing | Internet Email Protocols | Ch. 7 |
| Parsing | TLPI | Ch. 63 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 days)
Goals:
- Parse and unfold headers
Tasks:
- Implement header parser with folding.
- Extract basic fields.
Checkpoint: Print all headers correctly.
Phase 2: Core Functionality (4-5 days)
Goals:
- Build route chain
Tasks:
- Parse Received headers.
- Create timeline output.
Checkpoint: Display hop order and timestamps.
Phase 3: Polish and Edge Cases (2-3 days)
Goals:
- Parse Authentication-Results
Tasks:
- Extract SPF/DKIM/DMARC results.
- Output summary and JSON mode.
Checkpoint: Report correct auth status.
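For Phase 3, a simple starting point is to split the header on semicolons and read the leading `method=result` token of each clause. The sketch below is an approximation of the Authentication-Results grammar, not a full parser: `summarize_auth` is a hypothetical name, and it ignores properties such as `header.d`.

```python
import re

def summarize_auth(value):
    """Map spf/dkim/dmarc to a list of results (DKIM may appear more than once)."""
    summary = {"spf": [], "dkim": [], "dmarc": []}
    # Drop parenthesized comments so their contents are not read as results.
    without_comments = re.sub(r"\([^()]*\)", "", value)
    # The first clause is the authserv-id; later clauses start with method=result.
    for clause in without_comments.split(";")[1:]:
        tokens = clause.split()
        if not tokens or "=" not in tokens[0]:
            continue
        method, _, result = tokens[0].partition("=")
        if method.lower() in summary:
            summary[method.lower()].append(result.lower())
    return summary
```

Lists are used because a message signed by two domains yields two `dkim=` clauses, and collapsing them would hide a failure.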
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Date parsing | strict vs forgiving | forgiving | Real messages are inconsistent |
| Auth results | topmost vs bottom-most header | topmost header added by a trusted receiver | Added closest to the recipient, so hardest for a sender to forge |
| Output | text vs JSON | both | Supports automation |
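The "forgiving" date-parsing recommendation could look roughly like this: try the standard RFC 5322 parser first, then a short list of fallback formats, and return `None` instead of raising. The fallback formats and the choice to treat an unknown offset as UTC are assumptions for this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical fallback formats; extend with whatever your samples contain.
FALLBACK_FORMATS = ("%Y-%m-%d %H:%M:%S %z", "%d %b %Y %H:%M:%S %z")

def parse_date_forgiving(text):
    """Return an offset-aware datetime, or None if nothing matches."""
    text = text.strip()
    try:
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        when = None
    if when is None:
        for fmt in FALLBACK_FORMATS:
            try:
                when = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue
    if when is not None and when.tzinfo is None:
        # Assumption: treat a missing or unknown offset as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return when
```

Normalizing every result to an aware datetime avoids errors later when transit times are computed by subtraction.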
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Header unfolding | folded Subject line |
| Integration Tests | Real messages | Gmail sample message |
| Edge Case Tests | Missing dates | fallback behavior |
6.2 Critical Test Cases
- Multiple Received headers are ordered correctly.
- Folded headers are reconstructed correctly.
- Auth results are parsed into SPF/DKIM/DMARC values.
6.3 Test Data
Received: from a.example.com by b.example.com; Mon, 1 Jan 2024 10:00:00 -0500
7. Common Pitfalls and Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong order of Received | Path reversed | Reverse list to origin first |
| Bad date parsing | crashes | Use robust parser with fallback |
| Trusting untrusted headers | false conclusions | Treat earliest hops as untrusted |
7.2 Debugging Strategies
- Compare with raw header view in email client.
- Print the unfolded headers for inspection.
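When debugging your own unfolding logic, the standard library's `email` package can serve as a reference to compare against. The helper name `stdlib_headers` is hypothetical; the sketch uses `email.policy.default`, which returns unfolded header values.

```python
from email import message_from_string
from email.policy import default

def stdlib_headers(raw):
    """Return (name, value) pairs as parsed by the standard library."""
    msg = message_from_string(raw, policy=default)
    # items() preserves header order and keeps repeated fields.
    return [(name, str(value)) for name, value in msg.items()]
```

Diffing this list against your parser's output on the same message usually points straight at the header that was folded or split incorrectly.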
7.3 Performance Traps
- Running many regex passes over large headers is slow. Parse each header once, incrementally.
8. Extensions and Challenges
8.1 Beginner Extensions
- Add a JSON output mode.
- Highlight suspicious time gaps.
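A sketch of the time-gap check: given origin-first timestamps, flag any hop whose delta is negative (clock skew or a forged header) or longer than a threshold. The function name, the labels, and the five-minute default are illustrative assumptions.

```python
from datetime import timedelta

def flag_gaps(timestamps, max_gap=timedelta(minutes=5)):
    """timestamps: origin-first aware datetimes. Returns (index, delta, label) tuples."""
    flags = []
    for index in range(1, len(timestamps)):
        delta = timestamps[index] - timestamps[index - 1]
        if delta < timedelta(0):
            # A hop that claims to predate the previous one.
            flags.append((index, delta, "negative"))
        elif delta > max_gap:
            flags.append((index, delta, "slow"))
    return flags
```

Small negative deltas are often just unsynchronized clocks, so a report may want to show the size of the gap rather than a bare warning.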
8.2 Intermediate Extensions
- Visualize route as ASCII diagram.
- Detect private IP hops.
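Private-IP detection can lean on the standard `ipaddress` module. The sketch below assumes addresses appear in square brackets inside the Received value, as they commonly do; the regex and the name `private_ips` are hypothetical. Note that `is_private` covers more than the RFC 1918 ranges, including loopback and documentation networks.

```python
import ipaddress
import re

# Matches bracketed literals such as [10.1.2.3] or [IPv6:fd00::1].
IP_LITERAL_RE = re.compile(r"\[(?:IPv6:)?([0-9A-Fa-f:.]+)\]")

def private_ips(received_value):
    """Return the private or loopback addresses found in one Received value."""
    found = []
    for candidate in IP_LITERAL_RE.findall(received_value):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # looked like an address but was not one
        if address.is_private or address.is_loopback:
            found.append(str(address))
    return found
```

A private address on an early hop is normal for internal relays; it is more interesting when it appears where a public sending server was expected.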
8.3 Advanced Extensions
- Integrate with SPF/DKIM verifier tools.
- Export to graph format (Graphviz).
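Graphviz export needs only string formatting. This sketch emits a DOT digraph from origin-first `(from_host, by_host)` pairs and labels each edge with its hop number; `to_dot` and the graph name `route` are arbitrary choices.

```python
def _quote(name):
    """Escape a host name for use inside a double-quoted DOT identifier."""
    return name.replace("\\", "\\\\").replace('"', '\\"')

def to_dot(hops):
    """hops: origin-first list of (from_host, by_host) pairs."""
    lines = ["digraph route {", "  rankdir=LR;"]
    for index, (source, target) in enumerate(hops, start=1):
        lines.append(f'  "{_quote(source)}" -> "{_quote(target)}" [label="{index}"];')
    lines.append("}")
    return "\n".join(lines)
```

The output can be saved to a file and rendered with the `dot` command-line tool, for example `dot -Tpng route.dot -o route.png`.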
9. Real-World Connections
9.1 Industry Applications
- Security analysts use headers for phishing investigations.
- Deliverability teams diagnose routing and authentication.
9.2 Related Open Source Projects
- mail (Ruby gem): https://github.com/mikel/mail - robust mail parsing
- spamassassin: https://spamassassin.apache.org/ - header analysis engine
9.3 Interview Relevance
- Header parsing and forensic reasoning are common security interview topics.
10. Resources
10.1 Essential Reading
- RFC 5322 - email message format
- RFC 8601 - Authentication-Results header (obsoletes RFC 7601)
10.2 Video Resources
- Email header analysis walkthroughs
10.3 Tools and Documentation
- Email client raw source view
- mxtoolbox header analyzer for comparison
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the purpose of Received headers
- I can parse folded headers
- I understand Authentication-Results
11.2 Implementation
- Produces a correct hop timeline
- Summarizes SPF/DKIM/DMARC
- Handles malformed headers
11.3 Growth
- I can reason about message origin with partial data
- I can explain header trust boundaries
12. Submission / Completion Criteria
Minimum Viable Completion:
- Parse headers and list Received hops
Full Completion:
- Provide a timeline and auth summary
Excellence (Going Above and Beyond):
- Visual route graph and JSON export
- Detect anomalies in routing
This guide was generated from EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md. For the complete learning path, see the parent directory.