Project 6: Email Header Analyzer and Route Visualizer

Build a forensic tool that parses email headers and reconstructs the message path.

Quick Reference

Attribute       Value
Difficulty      Intermediate
Time Estimate   1-2 weeks
Language        Python (Alternatives: Go, Rust)
Prerequisites   RFC 5322 headers, basic parsing
Key Topics      Received headers, authentication results, time zones

1. Learning Objectives

  1. Parse raw email headers with folding and repeated fields.
  2. Extract the Received chain and reconstruct the delivery path.
  3. Parse Authentication-Results and summarize SPF/DKIM/DMARC.
  4. Output a clear timeline and route diagram.

2. Theoretical Foundation

2.1 Core Concepts

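  • Received headers: Each server that handles the message prepends its own Received header, so reading top to bottom gives a reverse-chronological chain.
  • Authentication-Results: Records the outcome of each SPF, DKIM, and DMARC check (for example pass, fail, none, or temperror) as evaluated by the server that added the header.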
  • Message-ID and Date: Identify the message and capture its originating timestamp.
  • Header folding: Long headers wrap onto continuation lines, each of which begins with a space or tab.
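
The folding and repeated-field behavior described above can be sketched as follows. This is a minimal illustration, not a full RFC 5322 parser: the function name `unfold_headers` is hypothetical, and the sketch collapses each fold to a single space, whereas strict unfolding only removes the line break and keeps the original whitespace.

```python
def unfold_headers(raw: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs in order of appearance, keeping duplicates."""
    headers: list[tuple[str, str]] = []
    for line in raw.splitlines():
        if line == "":
            break  # the first blank line ends the header block
        if line[0] in " \t" and headers:
            # Continuation line: append it to the previous header's value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
        # Anything else is malformed; this sketch silently skips it.
    return headers


raw = (
    "Received: from a.example.com\r\n"
    "\tby b.example.com;\r\n"
    " Mon, 1 Jan 2024 10:00:00 -0500\r\n"
    "Subject: hi\r\n"
    "\r\n"
    "body text\r\n"
)
for name, value in unfold_headers(raw):
    print(name, "=>", value)
```

Returning a list of pairs rather than a dictionary keeps repeated fields such as Received, and keeps them in their original order.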

2.2 Why This Matters

Header analysis is the basis of deliverability debugging and security investigations. It tells you where a message came from and how it was handled.

2.3 Historical Context / Background

Email headers were designed for traceability. Over time, providers added new headers for authentication and filtering.

2.4 Common Misconceptions

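  • Misconception: The top Received header is the sender. Reality: The topmost Received header was added last, by the server closest to the recipient; the bottom-most Received is closest to the origin.
  • Misconception: The Date header is reliable. Reality: The sender sets it and can forge it; Received timestamps added by servers you trust are more reliable, although Received headers below the first trusted server can be forged as well.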

3. Project Specification

3.1 What You Will Build

A CLI that accepts a raw message, parses headers, reconstructs hops, and outputs a timeline and a trust summary for SPF/DKIM/DMARC.

3.2 Functional Requirements

  1. Parse headers with folding and duplicates.
  2. Extract and order Received headers.
  3. Parse Authentication-Results and summarize results.
  4. Build a route timeline with timestamps.

3.3 Non-Functional Requirements

  • Performance: Parse a typical message in under 200 ms.
  • Reliability: Handle malformed headers gracefully.
  • Usability: Output clear hop-by-hop details.

3.4 Example Usage / Output

$ ./header-analyzer message.eml
Route:
  1) mx1.origin.net -> mx2.provider.com  2024-01-03 09:12:31 -0500
  2) mx2.provider.com -> mx3.provider.com  2024-01-03 09:12:33 -0500
Auth:
  SPF: pass
  DKIM: pass (d=example.com)
  DMARC: pass (p=reject)

3.5 Real World Outcome

You can explain the exact path and authentication state of any email message and identify suspicious hops or forged headers.


4. Solution Architecture

4.1 High-Level Design

Message Parser
  -> Header Normalizer
  -> Received Chain Builder
  -> Auth Summary Parser
  -> Timeline Renderer

4.2 Key Components

Component       Responsibility            Key Decisions
Normalizer      Unfold headers            Preserve order and raw values
Chain Builder   Parse Received headers    Parse timestamps and hosts
Auth Parser     Extract SPF/DKIM/DMARC    Handle multiple results
Renderer        Output report             Text and JSON modes

4.3 Data Structures

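from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ReceivedHop:
    """One hop reconstructed from a single Received header."""
    from_host: Optional[str]   # host named in the "from" clause, if present
    by_host: Optional[str]     # host named in the "by" clause, if present
    date: Optional[datetime]   # timestamp after the final ";", if parseable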

4.4 Algorithm Overview

Key Algorithm: Received Ordering

  1. Collect all Received headers in order of appearance.
  2. Reverse list to show origin first.
  3. Parse dates and compute transit times.

Complexity Analysis:

  • Time: O(n), where n is the number of Received headers
  • Space: O(n)
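
The ordering steps above can be sketched as follows. The function names `hop_times` and `transit_deltas` are hypothetical, and the sketch assumes every Received value is already unfolded and ends with a well-formed timestamp that includes a numeric UTC offset.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def hop_times(received_values: list[str]) -> list[datetime]:
    """Return one timestamp per Received value, origin first."""
    times = []
    for value in reversed(received_values):  # headers appear newest first
        # The timestamp follows the last ';' in the header value.
        _, _, date_text = value.rpartition(";")
        times.append(parsedate_to_datetime(date_text.strip()))
    return times


def transit_deltas(times: list[datetime]) -> list[timedelta]:
    """Return the time spent between each pair of consecutive hops."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Because the parsed timestamps carry their offsets, subtracting them compares absolute instants, so hops stamped in different time zones still produce correct deltas. A negative delta is worth flagging: it usually points to clock skew or a forged header.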

5. Implementation Guide

5.1 Development Environment Setup

python -m venv .venv
source .venv/bin/activate

5.2 Project Structure

header-analyzer/
├── parser.py
├── received.py
├── auth_results.py
└── render.py

5.3 The Core Question You’re Answering

“Where did this email really travel, and what did each hop decide about it?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Header folding rules
  2. Received header format
  3. Auth results structure
  4. Time zones and offsets

5.5 Questions to Guide Your Design

  1. How will you parse dates with varying formats?
  2. How will you handle missing or malformed Received headers?
  3. Will you trust Authentication-Results from all hops?
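
One possible answer to the third question is to keep only Authentication-Results headers whose authserv-id (the token before the first semicolon) belongs to a server you control or trust. The sketch below assumes headers are available as (name, value) pairs; the function name `trusted_auth_results` and the example host names are hypothetical.

```python
def trusted_auth_results(headers, trusted_ids):
    """Keep Authentication-Results values added by a trusted authserv-id."""
    trusted = {t.lower() for t in trusted_ids}
    kept = []
    for name, value in headers:
        if name.lower() != "authentication-results":
            continue
        # The authserv-id is the first token; an optional version may follow it.
        tokens = value.split(";", 1)[0].split()
        if tokens and tokens[0].lower() in trusted:
            kept.append(value)
    return kept


headers = [
    ("Authentication-Results", "mx.provider.com; spf=pass smtp.mailfrom=example.com"),
    ("Authentication-Results", "mail.attacker.example; spf=pass"),
]
print(trusted_auth_results(headers, {"mx.provider.com"}))
```

A sender can insert an Authentication-Results header of their own, so the match on authserv-id is only meaningful if the receiving server also strips forged headers that claim its identity.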

5.6 Thinking Exercise

If a message has three Received headers and the second from the top has a timestamp that is earlier than the third, what could that mean?

5.7 The Interview Questions They’ll Ask

  1. “Which Received header is closest to the sender?”
  2. “How do you detect forged Date headers?”
  3. “What is Authentication-Results used for?”

5.8 Hints in Layers

Hint 1: Start with unfold

  • Join continuation lines to form single headers.

Hint 2: Parse minimal Received fields

  • Focus on from, by, and date first.
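
A minimal sketch of this hint, assuming the Received value has already been unfolded. The function name `parse_received` and the returned dictionary keys are hypothetical, and the regular expressions are deliberately simple: they ignore the with, id, and for clauses and can be fooled by unusual comments.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional

_FROM = re.compile(r"^\s*from\s+(\S+)", re.IGNORECASE)
_BY = re.compile(r"\bby\s+(\S+)", re.IGNORECASE)


def parse_received(value: str) -> dict:
    """Extract the from-host, by-host, and timestamp from one Received value."""
    clauses, sep, date_text = value.rpartition(";")
    if not sep:  # no ';' at all, so there is no timestamp to parse
        clauses, date_text = value, ""
    from_match = _FROM.search(clauses)
    by_match = _BY.search(clauses)
    date: Optional[datetime] = None
    if date_text.strip():
        try:
            date = parsedate_to_datetime(date_text.strip())
        except (TypeError, ValueError):
            date = None  # keep the hop even when the date is unusable
    return {
        "from_host": from_match.group(1) if from_match else None,
        "by_host": by_match.group(1) if by_match else None,
        "date": date,
    }
```

Returning None for a missing field, instead of raising, lets the caller still list a hop whose header is incomplete.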

Hint 3: Add timeline metrics

  • Compute delta between hops.

5.9 Books That Will Help

Topic               Book                        Chapter
Message format      RFC 5322                    Sections 2-3
Header processing   Internet Email Protocols    Ch. 7
Parsing             TLPI                        Ch. 63

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Parse and unfold headers

Tasks:

  1. Implement header parser with folding.
  2. Extract basic fields.

Checkpoint: Print all headers correctly.

Phase 2: Core Functionality (4-5 days)

Goals:

  • Build route chain

Tasks:

  1. Parse Received headers.
  2. Create timeline output.

Checkpoint: Display hop order and timestamps.

Phase 3: Polish and Edge Cases (2-3 days)

Goals:

  • Parse Authentication-Results

Tasks:

  1. Extract SPF/DKIM/DMARC results.
  2. Output summary and JSON mode.

Checkpoint: Report correct auth status.
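
A rough sketch of the extraction step in this phase. The function name `summarize_auth` is hypothetical, and the regular expression is a simplification: it does not understand comments or quoted strings, so treat it as a starting point and not as a compliant Authentication-Results parser.

```python
import re

_RESULT = re.compile(r"\b(spf|dkim|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)


def summarize_auth(value: str) -> dict[str, list[str]]:
    """Collect every SPF, DKIM, and DMARC result from one header value."""
    summary: dict[str, list[str]] = {"spf": [], "dkim": [], "dmarc": []}
    # Skip the authserv-id before the first ';' so it is never scanned.
    _, _, results = value.partition(";")
    for method, result in _RESULT.findall(results):
        summary[method.lower()].append(result.lower())
    return summary
```

Each method maps to a list because a message can carry several DKIM signatures, each with its own result.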

5.11 Key Implementation Decisions

Decision        Options                Recommendation      Rationale
Date parsing    strict vs forgiving    forgiving           Real messages are inconsistent
Auth results    first vs last          last trusted hop    Closer to recipient
Output          text vs JSON           both                Supports automation
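
The forgiving date-parsing recommendation could look like the sketch below. The function name `parse_date_forgiving` is hypothetical, and the fallback to ISO 8601 is an assumption about what nonstandard input you may meet, not something the message format requires.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_date_forgiving(text: str) -> Optional[datetime]:
    """Try the RFC 5322 date format first, then ISO 8601; None if both fail."""
    text = text.strip()
    if not text:
        return None
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        pass  # not an RFC 5322 style date; try the fallback
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None  # the caller decides how to report an unparseable date


print(parse_date_forgiving("Mon, 1 Jan 2024 10:00:00 -0500 (EST)"))
print(parse_date_forgiving("2024-01-03T09:12:31-05:00"))
print(parse_date_forgiving("not a date"))
```

Returning None instead of raising keeps one bad timestamp from aborting the whole report; the renderer can show the hop with an unknown time.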

6. Testing Strategy

6.1 Test Categories

Category             Purpose             Examples
Unit Tests           Header unfolding    folded Subject line
Integration Tests    Real messages       Gmail sample message
Edge Case Tests      Missing dates       fallback behavior

6.2 Critical Test Cases

  1. Multiple Received headers are ordered correctly.
  2. Folded headers are reconstructed correctly.
  3. Auth results are parsed into SPF/DKIM/DMARC values.

6.3 Test Data

Received: from a.example.com by b.example.com; Mon, 1 Jan 2024 10:00:00 -0500

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall                       Symptom              Solution
Wrong order of Received       Path reversed        Reverse list to origin first
Bad date parsing              Crashes              Use robust parser with fallback
Trusting untrusted headers    False conclusions    Treat earliest hops as untrusted

7.2 Debugging Strategies

  • Compare with raw header view in email client.
  • Print the unfolded headers for inspection.

7.3 Performance Traps

  • Running many regular expressions over large header blocks. Prefer a single incremental pass over the lines.

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add a JSON output mode.
  • Highlight suspicious time gaps.

8.2 Intermediate Extensions

  • Visualize route as ASCII diagram.
  • Detect private IP hops.
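
For the private IP extension, one possible approach uses the standard-library ipaddress module. The function name `private_ips` is hypothetical, and the sketch assumes addresses appear in square brackets, as they commonly do in Received headers; addresses written any other way are missed.

```python
import ipaddress
import re

# Matches bracketed literals such as [192.168.1.20] or [IPv6:fd00::1].
_IP_LITERAL = re.compile(r"\[(?:IPv6:)?([0-9A-Fa-f:.]+)\]")


def private_ips(received_value: str) -> list[str]:
    """Return the bracketed addresses in a Received value that are private."""
    found = []
    for candidate in _IP_LITERAL.findall(received_value):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # looked like an address but is not a valid one
        if addr.is_private:
            found.append(str(addr))
    return found


print(private_ips("from laptop (unknown [192.168.1.20]) by mx.example.com"))
```

Note that `is_private` also covers loopback and some other special-purpose ranges, so a hit means "not publicly routable", not necessarily "inside a corporate network".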

8.3 Advanced Extensions

  • Integrate with SPF/DKIM verifier tools.
  • Export to graph format (Graphviz).

9. Real-World Connections

9.1 Industry Applications

  • Security analysts use headers for phishing investigations.
  • Deliverability teams diagnose routing and authentication.

9.2 Related Open Source Projects

  • mail (Ruby library): https://github.com/mikel/mail - robust mail parsing
  • spamassassin: https://spamassassin.apache.org/ - header analysis engine

9.3 Interview Relevance

  • Header parsing and forensic reasoning are common security interview topics.

10. Resources

10.1 Essential Reading

  • RFC 5322 - email message format
  • RFC 8601 - Authentication-Results header (obsoletes RFC 7601)

10.2 Video Resources

  • Email header analysis walkthroughs

10.3 Tools and Documentation

  • Email client raw source view
  • mxtoolbox header analyzer for comparison

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the purpose of Received headers
  • I can parse folded headers
  • I understand Authentication-Results

11.2 Implementation

  • Produces a correct hop timeline
  • Summarizes SPF/DKIM/DMARC
  • Handles malformed headers

11.3 Growth

  • I can reason about message origin with partial data
  • I can explain header trust boundaries

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Parse headers and list Received hops

Full Completion:

  • Provide a timeline and auth summary

Excellence (Going Above and Beyond):

  • Visual route graph and JSON export
  • Detect anomalies in routing

This guide was generated from EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md. For the complete learning path, see the parent directory.