EMAIL SYSTEMS DEEP DIVE PROJECTS
Email Systems Deep Dive: From MX Records to Spam Detection
Goal: Truly understand how email works behind the scenes—protocols, authentication, trust systems, and spam detection—by building tools that interact with every layer of the email stack.
Core Concept Analysis
Email is a federated, store-and-forward messaging system built on protocols designed in the 1980s, with security layers bolted on over decades. To understand it, you need to grasp these fundamental building blocks:
The Email Stack
┌────────────────────────────────────────────────────┐
│ USER LAYER (MUA)                                   │
│ Thunderbird, Gmail Web, Apple Mail, Outlook        │
├────────────────────────────────────────────────────┤
│ RETRIEVAL PROTOCOLS                                │
│ POP3 / IMAP                                        │
├────────────────────────────────────────────────────┤
│ DELIVERY AGENT (MDA)                               │
│ Dovecot, procmail, maildrop                        │
├────────────────────────────────────────────────────┤
│ TRANSFER AGENT (MTA)                               │
│ Postfix, Sendmail, Exim                            │
├────────────────────────────────────────────────────┤
│ SMTP PROTOCOL                                      │
│ RFC 5321 - The actual transport                    │
├────────────────────────────────────────────────────┤
│ AUTHENTICATION LAYER                               │
│ SPF / DKIM / DMARC / ARC                           │
├────────────────────────────────────────────────────┤
│ DNS LAYER                                          │
│ MX Records, TXT Records (SPF/DKIM/DMARC)           │
├────────────────────────────────────────────────────┤
│ REPUTATION SYSTEMS                                 │
│ Blacklists (DNSBLs), Sender Score, Feedback Loops  │
└────────────────────────────────────────────────────┘
Key Concepts You’ll Master
| Concept | What It Does | Why It Matters |
|---|---|---|
| MX Records | DNS entries pointing to mail servers for a domain | Without this, no one knows where to deliver your mail |
| SMTP | The protocol for transferring mail between servers | The actual “language” mail servers speak |
| SPF | DNS record listing the IPs authorized to send for a domain | Lets receivers reject unauthorized servers that use your domain in the envelope sender (MAIL FROM) |
| DKIM | Cryptographic signature that binds a message to a signing domain | Proves the signed headers and body weren’t altered in transit |
| DMARC | Policy for handling SPF/DKIM failures | Tells receivers what to do when auth fails |
| Blacklists | Databases of known spam IPs/domains | The “criminal record” system for mail servers |
| Reputation | Trust score based on sending behavior | Determines if your mail reaches inbox or spam |
| Headers | Metadata recording email’s journey | The forensic trail of every email |
Project 1: Raw SMTP Client from Scratch
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: C, Go, Rust
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
- Difficulty: Level 2: Intermediate
- Knowledge Area: Network Protocols / SMTP
- Software or Tool: SMTP Client
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A command-line tool that sends emails by speaking raw SMTP to mail servers: no email libraries, just socket connections, protocol commands, and a TLS library for STARTTLS.
Why it teaches email: You’ll see exactly what happens when you click “Send”: every EHLO, MAIL FROM, RCPT TO, and DATA command. You’ll understand why emails fail, what “relay denied” means, and how clients authenticate to submission servers.
Core challenges you’ll face:
- Socket-level communication → maps to understanding TCP connections
- SMTP command/response parsing → maps to protocol state machines
- STARTTLS negotiation → maps to encryption in transit
- Handling multi-line responses → maps to protocol parsing edge cases
- Authentication (PLAIN, LOGIN, CRAM-MD5) → maps to SMTP AUTH mechanisms
Key Concepts:
- SMTP Protocol Basics: RFC 5321 - Sections 3 and 4
- Socket Programming: “The Linux Programming Interface” Chapters 56-61 - Michael Kerrisk
- TLS Handshake: “Serious Cryptography, 2nd Edition” Chapter 12 - Jean-Philippe Aumasson
- Protocol State Machines: “TCP/IP Illustrated, Volume 1” Chapter 5 - W. Richard Stevens
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic networking concepts, socket programming basics
Real world outcome:
$ ./smtp-client --server smtp.gmail.com --port 587 \
--from you@gmail.com --to friend@example.com \
--subject "Test from my client" --body "Hello from raw SMTP!"
[CONNECT] Connected to smtp.gmail.com:587
[RECV] 220 smtp.gmail.com ESMTP Ready
[SEND] EHLO myclient.local
[RECV] 250-smtp.gmail.com Hello
[RECV] 250-STARTTLS
[RECV] 250 OK
[SEND] STARTTLS
[RECV] 220 Ready for TLS
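[TLS] Handshake complete (TLS 1.3)
[SEND] EHLO myclient.local
[RECV] 250-smtp.gmail.com Hello
[RECV] 250-AUTH LOGIN PLAIN
[RECV] 250 OK
[SEND] AUTH PLAIN <credentials>
[RECV] 235 Authentication successful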
[SEND] MAIL FROM:<you@gmail.com>
[RECV] 250 OK
[SEND] RCPT TO:<friend@example.com>
[RECV] 250 OK
[SEND] DATA
[RECV] 354 Start mail input
[SEND] <headers and body>
[RECV] 250 Message accepted
[SEND] QUIT
SUCCESS: Email sent!
Implementation Hints:
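SMTP is a text-based, line-oriented protocol. Each command ends with \r\n. Responses start with a 3-digit code (2xx = success, 3xx = intermediate, such as 354 after DATA, 4xx = temporary failure, 5xx = permanent failure). In a multi-line response every line except the last has a hyphen after the code (e.g., 250-SIZE 35882577), and the final line has a space instead (e.g., 250 OK). You must handle this parsing correctly. For STARTTLS, you upgrade the plain socket to TLS mid-connection and then send EHLO again, because the server discards what it told you before the handshake and may advertise different capabilities (such as AUTH) once the channel is encrypted. This is where most beginners get stuck.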
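The reply rule and the AUTH PLAIN encoding are easy to get subtly wrong, so here is a minimal Python sketch of just those pieces plus the TLS upgrade. It assumes you read from the socket through sock.makefile("rb"); the names read_reply, auth_plain_token, and upgrade_to_tls are illustrative helpers, not part of any library, and the network call in upgrade_to_tls is not exercised in the demo.

```python
import base64
import io
import socket
import ssl
from typing import BinaryIO, List, Tuple


def read_reply(stream: BinaryIO) -> Tuple[int, List[str]]:
    """Read one complete (possibly multi-line) SMTP reply.

    Continuation lines look like "250-STARTTLS"; the final line has a
    space (or nothing) after the code, e.g. "250 OK".
    """
    texts: List[str] = []
    while True:
        raw = stream.readline()
        if not raw:
            raise ConnectionError("server closed the connection mid-reply")
        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        if len(line) < 3 or not line[:3].isdigit():
            raise ValueError(f"malformed reply line: {line!r}")
        code = int(line[:3])
        texts.append(line[4:])
        # A hyphen right after the code means more lines follow.
        if len(line) == 3 or line[3] != "-":
            return code, texts


def auth_plain_token(username: str, password: str) -> str:
    """Argument for "AUTH PLAIN": base64 of NUL + user + NUL + password (RFC 4616)."""
    raw = b"\0" + username.encode("utf-8") + b"\0" + password.encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def upgrade_to_tls(sock: socket.socket, host: str) -> ssl.SSLSocket:
    """Wrap the existing plain socket after the server answered STARTTLS with 220."""
    context = ssl.create_default_context()  # verifies the certificate and hostname
    return context.wrap_socket(sock, server_hostname=host)


if __name__ == "__main__":
    fake = io.BytesIO(b"250-smtp.example.com Hello\r\n250-STARTTLS\r\n250 OK\r\n")
    print(read_reply(fake))  # (250, ['smtp.example.com Hello', 'STARTTLS', 'OK'])
```

After upgrade_to_tls returns, create a new makefile() on the wrapped socket before reading the next reply, since the old file object is tied to the plain socket; then send EHLO again.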
Learning milestones:
- Send email to local server without TLS → You understand basic SMTP dialogue
- Successfully negotiate STARTTLS → You understand encryption layer
- Authenticate and send via Gmail → You understand real-world SMTP requirements
Project 2: MX Record Resolver & Mail Router
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: C
- Alternative Programming Languages: Go, Rust, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: DNS / Email Routing
- Software or Tool: DNS Resolver
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A tool that takes any email address and determines exactly which server(s) should receive mail for that domain, including priority ordering and fallback logic.
Why it teaches email: MX (Mail eXchange) records are the foundation of email routing. Without understanding DNS lookups, you can’t understand why emails get lost, how backup mail servers work, or why some domains have multiple MX records with different priorities.
Core challenges you’ll face:
- DNS query construction → maps to understanding DNS protocol
- MX record parsing and priority sorting → maps to mail routing logic
- Handling CNAME chains → maps to DNS resolution complexity
- Fallback to A/AAAA records → maps to implicit MX rules
- DNS caching and TTL → maps to real-world DNS behavior
Key Concepts:
- DNS Protocol: “TCP/IP Illustrated, Volume 1” Chapter 11 - W. Richard Stevens
- MX Record Format: RFC 5321 Section 5 - Address Resolution and Mail Handling
- DNS Message Format: RFC 1035 - Section 3.3.9 (MX RDATA) and Section 4 (Messages, including 4.1.4 on compression)
- Network Byte Order: “Computer Systems: A Programmer’s Perspective” Chapter 11 - Bryant & O’Hallaron
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic networking, understanding of DNS concepts
Real world outcome:
$ ./mx-resolver gmail.com
Querying MX records for: gmail.com
MX Records (sorted by priority):
Priority 5: gmail-smtp-in.l.google.com → 142.250.152.27
Priority 10: alt1.gmail-smtp-in.l.google.com → 142.250.115.26
Priority 20: alt2.gmail-smtp-in.l.google.com → 142.251.9.27
Priority 30: alt3.gmail-smtp-in.l.google.com → 74.125.200.27
Priority 40: alt4.gmail-smtp-in.l.google.com → 64.233.186.27
Mail delivery order: Try priority 5 first, then 10, 20, etc.
TTL: 300 seconds (cached until refresh)
$ ./mx-resolver some-small-site.com
Querying MX records for: some-small-site.com
No MX records found!
Falling back to A record (implicit MX, RFC 5321):
some-small-site.com → 93.184.216.34
Warning: This domain has no explicit MX. Mail delivery may be unreliable.
Implementation Hints:
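You’ll construct raw DNS query packets: a 12-byte header (ID, flags with the RD "recursion desired" bit set, and four section counts), then the question section with the domain name in label format (e.g., \x05gmail\x03com\x00 for gmail.com), question type (MX = 15), and class (IN = 1). Send it via UDP to port 53 of a recursive resolver. Parse the response: the same header layout (check that the ID matches and that the RCODE in the flags is 0), skip the question section, then parse the answer section. Each answer record is a name followed by type, class, TTL, and RDLENGTH; for MX, the RDATA holds a 2-byte preference followed by the mail server name, which may be compressed. DNS name compression replaces the rest of a name with a 2-byte pointer: any length byte whose top two bits are both set (byte & 0xC0 == 0xC0) starts a pointer, and its low 14 bits are an offset to an earlier name in the message. This is the tricky part. If the TC (truncated) flag is set, repeat the query over TCP.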
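To make the header, label encoding, and compression-pointer rules concrete, here is a minimal Python sketch (the project targets C, but the byte layout is identical). It uses only the standard library; the function names are hypothetical, 8.8.8.8 is just an example public resolver, and TCP fallback plus the implicit-MX A/AAAA lookup are left out.

```python
import random
import socket
import struct
from typing import List, Optional, Tuple

QTYPE_MX = 15
QCLASS_IN = 1


def encode_name(domain: str) -> bytes:
    """'gmail.com' -> b'\\x05gmail\\x03com\\x00' (length-prefixed labels)."""
    out = b""
    for label in domain.rstrip(".").split("."):
        data = label.encode("ascii")
        out += bytes([len(data)]) + data
    return out + b"\x00"


def build_query(domain: str, qtype: int = QTYPE_MX, query_id: Optional[int] = None) -> bytes:
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags 0x0100 (RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    return header + encode_name(domain) + struct.pack("!HH", qtype, QCLASS_IN)


def read_name(msg: bytes, offset: int) -> Tuple[str, int]:
    """Decode a possibly compressed name; return (name, offset just past it)."""
    labels: List[str] = []
    resume: Optional[int] = None  # where the caller continues after a pointer
    jumps = 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # 2-byte compression pointer
            if resume is None:
                resume = offset + 2
            offset = struct.unpack_from("!H", msg, offset)[0] & 0x3FFF
            jumps += 1
            if jumps > 64:
                raise ValueError("compression pointer loop")
            continue
        offset += 1
        if length == 0:  # root label ends the name
            break
        labels.append(msg[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), (resume if resume is not None else offset)


def parse_mx_response(msg: bytes) -> List[Tuple[int, str, int]]:
    """Return [(preference, host, ttl), ...], lowest preference (tried first) first."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!HHHHHH", msg, 0)
    if flags & 0x000F:
        raise ValueError(f"DNS error, RCODE={flags & 0x000F}")
    offset = 12
    for _ in range(qdcount):  # skip name + QTYPE + QCLASS
        _, offset = read_name(msg, offset)
        offset += 4
    records = []
    for _ in range(ancount):
        _, offset = read_name(msg, offset)
        rtype, _rclass, ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
        offset += 10
        if rtype == QTYPE_MX:  # ignore CNAMEs and other record types
            preference = struct.unpack_from("!H", msg, offset)[0]
            host, _ = read_name(msg, offset + 2)
            records.append((preference, host, ttl))
        offset += rdlength
    return sorted(records)


def resolve_mx(domain: str, server: str = "8.8.8.8") -> List[Tuple[int, str, int]]:
    """One UDP round trip to an example resolver."""
    query = build_query(domain)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(3)
        sock.sendto(query, (server, 53))
        reply, _ = sock.recvfrom(4096)
    if reply[:2] != query[:2]:
        raise ValueError("response ID does not match query")
    if struct.unpack_from("!H", reply, 2)[0] & 0x0200:
        raise NotImplementedError("TC flag set: retry over TCP (not shown)")
    return parse_mx_response(reply)
```

The pointer handling records only the first jump's position as the resume offset, which is why the parser can follow chains of pointers and still continue correctly after the name.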
Learning milestones:
- Query and parse A records → You understand basic DNS structure
- Handle MX record priority sorting → You understand mail routing
- Handle DNS compression pointers → You’ve mastered DNS parsing
Project 3: SPF Record Parser & Validator
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust, C
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Email Authentication / DNS
- Software or Tool: SPF Validator
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A complete SPF validator that parses SPF DNS records, resolves all mechanisms (including nested includes), and determines if a given IP address is authorized to send email for a domain.
Why it teaches email authentication: SPF is the first line of defense against email spoofing. Understanding its recursive nature (includes, redirects), its IP matching logic, and its failure modes teaches you how email authentication actually works in practice.
Core challenges you’ll face:
- SPF syntax parsing → maps to understanding the SPF grammar
- Recursive include: resolution → maps to understanding DNS lookup chains
- IP/CIDR matching → maps to network address mathematics
- Handling redirect= and exp= → maps to SPF modifiers
- DNS lookup limits (max 10) → maps to SPF anti-DoS mechanisms
- Qualifier interpretation (+, -, ~, ?) → maps to pass/fail/softfail/neutral
Key Concepts:
- SPF Specification: RFC 7208 - Sender Policy Framework
- CIDR Notation: “TCP/IP Illustrated, Volume 1” Chapter 3 - W. Richard Stevens
- DNS TXT Records: “Computer Networks, 5th Edition” Chapter 7 - Tanenbaum
- Recursive Parsing: “Language Implementation Patterns” Chapter 3 - Terence Parr
Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: DNS basics, IP networking, basic parsing skills
Real world outcome:
$ ./spf-check google.com 209.85.220.41
Checking if 209.85.220.41 can send for google.com
Fetching SPF record for google.com...
SPF: v=spf1 include:_spf.google.com ~all
Resolving include:_spf.google.com...
SPF: v=spf1 include:_netblocks.google.com include:_netblocks2.google.com include:_netblocks3.google.com ~all
Resolving include:_netblocks.google.com...
SPF: v=spf1 ip4:35.190.247.0/24 ip4:64.233.160.0/19 ip4:66.102.0.0/20 ip4:66.249.80.0/20 ip4:72.14.192.0/18 ip4:74.125.0.0/16 ip4:108.177.8.0/21 ip4:173.194.0.0/16 ip4:209.85.128.0/17 ... ~all
DNS lookups used: 4 of 10 maximum
Checking IP 209.85.220.41 against mechanisms:
✗ ip4:35.190.247.0/24 - not in range
✗ ip4:64.233.160.0/19 - not in range
...
✓ ip4:209.85.128.0/17 - MATCH! (209.85.128.0 - 209.85.255.255)
RESULT: PASS
IP 209.85.220.41 is AUTHORIZED to send email for google.com
Implementation Hints:
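SPF records are TXT DNS records starting with v=spf1. Parse mechanisms left-to-right (ip4:, ip6:, a, mx, include:, exists:, ptr, all); the first mechanism that matches decides the result. Each can have a qualifier prefix: + (pass, the default), - (fail), ~ (softfail), ? (neutral). The include: mechanism requires recursively fetching another domain’s SPF record, and it matches only if that evaluation returns pass. Count every term that triggers a DNS query (include, a, mx, ptr, exists, and the redirect= modifier); going over 10 makes the result permerror, which is how SPF limits DoS amplification. If nothing matches and there is no redirect=, the result is neutral. For IP matching, convert CIDR notation to a range: 209.85.128.0/17 means the first 17 bits are fixed, so it covers 209.85.128.0 to 209.85.255.255 (2^15 = 32,768 addresses).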
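A minimal Python sketch of the evaluation loop described above, assuming the TXT fetch is injected as a get_txt callable (a hypothetical hook so it can be tested offline with a dict). It handles ip4, ip6, include, all, and redirect=, counts lookups against the limit of 10, and uses the standard ipaddress module for the CIDR math; a, mx, ptr, and exists are deliberately left unimplemented, and all names are illustrative.

```python
import ipaddress
from typing import Callable, Dict, Optional

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
MAX_LOOKUPS = 10


class SpfPermError(Exception):
    """Stands in for SPF's 'permerror' result."""


def _count_lookup(state: Dict[str, int]) -> None:
    state["lookups"] += 1
    if state["lookups"] > MAX_LOOKUPS:
        raise SpfPermError("more than 10 DNS-querying terms")


def check_spf(ip: str, domain: str, get_txt: Callable[[str], str],
              state: Optional[Dict[str, int]] = None) -> str:
    """Evaluate ip4, ip6, include, all and redirect= for one domain.

    get_txt(domain) must return that domain's "v=spf1 ..." string.
    """
    if state is None:
        state = {"lookups": 0}
    addr = ipaddress.ip_address(ip)
    terms = get_txt(domain).split()
    if not terms or terms[0].lower() != "v=spf1":
        raise SpfPermError(f"no SPF record for {domain}")
    redirect = None
    for term in terms[1:]:
        if "=" in term.split(":", 1)[0]:  # modifier such as redirect= or exp=
            key, _, value = term.partition("=")
            if key.lower() == "redirect":
                redirect = value
            continue  # exp= and unknown modifiers are ignored
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            matched = addr.version == network.version and addr in network
        elif name == "include":
            _count_lookup(state)
            matched = check_spf(ip, value, get_txt, state) == "pass"
        elif name == "all":
            matched = True
        else:
            raise NotImplementedError(f"mechanism {name!r} is not handled in this sketch")
        if matched:
            return QUALIFIERS[qualifier]
    if redirect:
        _count_lookup(state)
        return check_spf(ip, redirect, get_txt, state)
    return "neutral"  # nothing matched and no redirect=
```

Raising on unknown mechanisms, rather than skipping them, keeps the sketch honest: silently ignoring a mechanism can turn a real pass into neutral.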
Learning milestones:
- Parse simple SPF records → You understand SPF syntax
- Handle nested includes with lookup counting → You understand SPF’s recursive nature
- Correctly match IPs against CIDR ranges → You’ve mastered SPF validation
Project 4: DKIM Signature Verifier
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust, C
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Cryptography / Email Authentication
- Software or Tool: DKIM Verifier
- Main Book: “Serious Cryptography, 2nd Edition” by Jean-Philippe Aumasson
What you’ll build: A tool that extracts DKIM signatures from email headers, fetches the public key from DNS, reconstructs the signed content exactly as the sender created it, and verifies the RSA/Ed25519 signature.
Why it teaches email authentication: DKIM proves that the signed headers and body have not been altered since a domain holding the private key signed them. Understanding header canonicalization, body hashing, and signature verification teaches you how cryptographic authentication works in real-world protocols.
Core challenges you’ll face:
- DKIM header parsing → maps to understanding DKIM tag-value syntax
- Canonicalization (relaxed vs simple) → maps to why signatures break
- Body hash calculation → maps to content integrity verification
- Public key DNS lookup → maps to key distribution via DNS
- RSA/Ed25519 signature verification → maps to asymmetric cryptography
- Header ordering and selection → maps to what’s actually signed
Key Concepts:
- DKIM Specification: RFC 6376 - DomainKeys Identified Mail
- RSA Signatures: “Serious Cryptography, 2nd Edition” Chapter 11 - Aumasson
- SHA-256 Hashing: “Serious Cryptography, 2nd Edition” Chapter 6 - Aumasson
- Base64 Encoding: “Computer Systems: A Programmer’s Perspective” Chapter 2 - Bryant & O’Hallaron
- Email Header Format: RFC 5322 - Internet Message Format
Difficulty: Expert
Time estimate: 3-4 weeks
Prerequisites: Cryptography basics, email header understanding, base64
Real world outcome:
$ ./dkim-verify email.eml
Parsing email: email.eml
Found DKIM-Signature header:
v=1; a=rsa-sha256; c=relaxed/relaxed;
d=google.com; s=20230601;
h=from:to:subject:date:message-id;
bh=2jUSOH9NhtVGCQWNr9BrIAPreKQjO6Sn7XIkfJVOzv8=;
b=dGVzdCBzaWduYXR1cmU...
Fetching public key from DNS:
Query: 20230601._domainkey.google.com TXT
Key: v=DKIM1; k=rsa; p=MIIBIjANBgkqhki...
Canonicalizing headers (relaxed):
from:sender@google.com
to:recipient@example.com
subject:test email
...
Calculating body hash (relaxed canonicalization):
Body length: 1547 bytes
Computed hash: 2jUSOH9NhtVGCQWNr9BrIAPreKQjO6Sn7XIkfJVOzv8=
Header bh=: 2jUSOH9NhtVGCQWNr9BrIAPreKQjO6Sn7XIkfJVOzv8=
✓ Body hash MATCHES
Verifying RSA-SHA256 signature:
✓ Signature VALID
RESULT: DKIM PASS
This email was cryptographically signed by google.com and has not been modified.
Implementation Hints:
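The DKIM-Signature header contains tags: a (algorithm, usually rsa-sha256), d (signing domain), s (selector), h (signed headers list), bh (body hash), b (signature), and c (canonicalization, header/body). Canonicalization is crucial. For headers, “relaxed” mode lowercases the header field name (not the value), unfolds continuation lines, collapses runs of spaces and tabs to a single space, and removes whitespace at the end of the value and around the colon; “simple” mode keeps headers exactly as received. For the body, both modes ignore trailing empty lines, and “relaxed” also collapses whitespace within lines and strips it at line ends. The body hash (bh) is the base64-encoded SHA-256 of the canonicalized body. The signature (b) covers the canonicalized headers, in the order listed in h (taking the last unused instance when a header appears more than once), followed by the DKIM-Signature header itself with the b= value emptied and no trailing CRLF. Fetch the public key from the <selector>._domainkey.<domain> TXT record. For a=rsa-sha256, verify with RSASSA-PKCS1-v1_5 over SHA-256: conceptually, applying the public key to the signature must yield the padded SHA-256 digest of the signed header data. In practice, let a crypto library perform this check rather than comparing bytes yourself.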
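The canonicalization rules are where most verifiers go wrong, so here is a minimal Python sketch of relaxed header and body canonicalization, the bh= computation, and assembly of the data that b= signs, using only the standard library. The function names are hypothetical; simple canonicalization and the l= body-length tag are not handled. The final RSA check needs a crypto library, for example the third-party cryptography package (load_der_public_key on the base64-decoded p= value, then public_key.verify(signature, data, padding.PKCS1v15(), hashes.SHA256())).

```python
import base64
import hashlib
import re
from typing import List, Tuple

_WSP = re.compile(r"[ \t]+")


def canon_header_relaxed(name: str, value: str) -> str:
    """RFC 6376 'relaxed' header canonicalization for one field."""
    value = re.sub(r"\r?\n", "", value)      # unfold continuation lines
    value = _WSP.sub(" ", value).strip(" ")  # collapse WSP, trim both ends
    return f"{name.strip().lower()}:{value}\r\n"


def canon_body_relaxed(body: bytes) -> bytes:
    """RFC 6376 'relaxed' body canonicalization."""
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in body.split(b"\r\n")]
    while lines and lines[-1] == b"":  # ignore trailing empty lines
        lines.pop()
    if not lines:
        return b""
    return b"\r\n".join(lines) + b"\r\n"


def body_hash_relaxed(body: bytes) -> str:
    """Value to compare with the bh= tag (a=rsa-sha256, relaxed body)."""
    digest = hashlib.sha256(canon_body_relaxed(body)).digest()
    return base64.b64encode(digest).decode("ascii")


def signing_input(headers: List[Tuple[str, str]], signed: List[str], dkim_value: str) -> bytes:
    """Bytes covered by b=, using relaxed header canonicalization.

    headers: (name, raw value) pairs in message order.
    signed: header names from the h= tag, in order.
    dkim_value: raw value of the DKIM-Signature header being checked.
    """
    unused = list(headers)
    parts = []
    for wanted in signed:
        # Take the last not-yet-used instance of this header (bottom-up).
        for i in range(len(unused) - 1, -1, -1):
            if unused[i][0].strip().lower() == wanted.strip().lower():
                parts.append(canon_header_relaxed(*unused.pop(i)))
                break
    no_sig = re.sub(r"(\bb=)[^;]*", r"\1", dkim_value)  # empty b=, leave bh= alone
    parts.append(canon_header_relaxed("DKIM-Signature", no_sig).rstrip("\r\n"))
    return "".join(parts).encode("utf-8")
```

A header named in h= but absent from the message simply contributes nothing here, which matches how verifiers treat nonexistent signed headers.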
Learning milestones:
- Parse DKIM headers and fetch public keys → You understand DKIM structure
- Correctly canonicalize headers and body → You understand why signatures break
- Successfully verify real email signatures → You’ve mastered DKIM
Project 5: DKIM Signer (Outbound Email)
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Cryptography / Email Authentication
- Software or Tool: DKIM Signer
- Main Book: “Serious Cryptography, 2nd Edition” by Jean-Philippe Aumasson
What you’ll build: A tool that generates RSA key pairs, creates DNS TXT record entries for the public key, and signs outgoing emails with DKIM headers that receiving servers can verify.
Why it teaches email authentication: Signing emails is harder than verifying—you must choose which headers to sign, generate proper key pairs, and format everything exactly right. This teaches you both the security model (what signing proves) and the practical deployment (DNS records, key rotation).
Core challenges you’ll face:
- RSA key pair generation → maps to asymmetric key management
- DNS TXT record formatting → maps to key publication
- Header selection strategy → maps to what to protect from tampering
- Canonicalization implementation → maps to ensuring verifiability
- Signature formatting → maps to base64 line wrapping
Key Concepts:
- DKIM Signing: RFC 6376 Section 5 - Signer Actions
- RSA Key Generation: “Serious Cryptography, 2nd Edition” Chapter 11 - Aumasson
- DNS TXT Record Limits: “TCP/IP Illustrated, Volume 1” Chapter 11 - Stevens (255-byte strings)
Difficulty: Expert
Time estimate: 2-3 weeks
Prerequisites: Project 4 (DKIM Verifier), RSA understanding
Real world outcome:
$ ./dkim-signer generate-keys --domain example.com --selector mail2024
Generated RSA-2048 key pair.
Add this TXT record to your DNS:
Host: mail2024._domainkey.example.com
Value: v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
Private key saved to: mail2024.private.pem
$ ./dkim-signer sign email.txt --key mail2024.private.pem --domain example.com --selector mail2024
Signing email...
Headers to sign: from, to, subject, date, message-id
Canonicalization: relaxed/relaxed
Algorithm: rsa-sha256
Added DKIM-Signature header:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=example.com; s=mail2024;
h=from:to:subject:date:message-id;
bh=XgF6uYzcgcROQtd83d1Evx8x2uW+QnNHkSL6XpXMwcU=;
b=kHiL2BqZMdpD8rH+...
Signed email saved to: email.signed.txt
Verify with: ./dkim-verify email.signed.txt
Implementation Hints:
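Generate a 2048-bit RSA key pair. The public key goes in DNS as a TXT record at <selector>._domainkey.<domain> in the format v=DKIM1; k=rsa; p=<base64-encoded-public-key>. The base64 of a 2048-bit key is longer than 255 bytes, the limit for a single TXT string, so publish it as several quoted strings in one TXT record; resolvers concatenate them. When signing: canonicalize the selected headers and the body, compute the body hash, build the DKIM-Signature header with an empty b= tag, canonicalize and concatenate the signed headers followed by that DKIM-Signature header (without its trailing CRLF), sign with the private key, base64-encode the signature, and insert it into the b= tag. Headers to sign must include From (required by RFC 6376), plus To, Subject, Date, and Message-ID for integrity.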
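Key generation itself is best left to a crypto tool or library (for example, openssl genrsa -out mail2024.private.pem 2048, then openssl rsa -in mail2024.private.pem -pubout -outform DER, base64-encoded, for the p= value). The fiddly part is publication, so here is a minimal Python sketch, with hypothetical helper names, that builds the TXT value, splits it into strings of at most 255 bytes, renders a BIND-style zone-file line, and folds a long b= value across header lines.

```python
from typing import List

MAX_TXT_STRING = 255  # one DNS character-string holds at most 255 bytes


def dkim_txt_value(public_key_b64: str) -> str:
    """TXT value; public_key_b64 is the base64 of the DER SubjectPublicKeyInfo."""
    return f"v=DKIM1; k=rsa; p={public_key_b64}"


def split_txt(value: str, limit: int = MAX_TXT_STRING) -> List[str]:
    """Split into chunks of at most `limit` bytes; resolvers concatenate them."""
    data = value.encode("ascii")
    return [data[i:i + limit].decode("ascii") for i in range(0, len(data), limit)]


def zone_file_line(selector: str, domain: str, value: str) -> str:
    """One TXT record made of several quoted character-strings."""
    chunks = " ".join(f'"{c}"' for c in split_txt(value))
    return f"{selector}._domainkey.{domain}. IN TXT {chunks}"


def fold_b64(sig_b64: str, width: int = 72) -> str:
    """Fold a long b= value; verifiers ignore whitespace inside b=."""
    return "\r\n\t".join(sig_b64[i:i + width] for i in range(0, len(sig_b64), width))
```

A 2048-bit public key encodes to 392 base64 characters, so with the 18-character "v=DKIM1; k=rsa; p=" prefix the record needs two strings.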
Learning milestones:
- Generate keys and publish to DNS → You understand key distribution
- Sign emails that pass your own verifier → You’ve closed the loop
- Sign emails that pass Gmail/Outlook verification → You’ve achieved production quality
Project 6: Email Header Analyzer & Route Visualizer
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, JavaScript (Node.js)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Email Forensics / Network Analysis
- Software or Tool: Email Analyzer
- Main Book: “Practical Malware Analysis” by Sikorski & Honig
What you’ll build: A tool that parses raw email headers, reconstructs the email’s journey through servers, visualizes the path with timestamps and geographic locations, and flags suspicious patterns.
Why it teaches email: Email headers are the forensic record of every email. Understanding Received headers, Authentication-Results, and various X- headers teaches you how emails actually traverse the internet and how to detect spoofing or routing anomalies.
Core challenges you’ll face:
- Received header parsing → maps to understanding mail relay chains
- Timestamp extraction and timezone handling → maps to forensic timeline analysis
- IP geolocation → maps to tracing email origins
- Authentication result interpretation → maps to SPF/DKIM/DMARC results
- Spoofing detection → maps to forensic indicators
Key Concepts:
- Email Header Format: RFC 5322 - Internet Message Format
- Received Header Syntax: RFC 5321 Section 4.4 - Trace Information
- Authentication-Results: RFC 8601 - Message Header Field
- Email Forensics: “Email Header Analysis” - Forensics Insider
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic regex, datetime handling
Real world outcome:
$ ./email-analyzer suspicious_email.eml
═══════════════════════════════════════════════════════════════
EMAIL ROUTE ANALYSIS
═══════════════════════════════════════════════════════════════
From: "Amazon Support" <support@amaz0n-security.xyz>
To: victim@example.com
Subject: Your account has been compromised!
Date: 2024-03-15 14:32:00 UTC
═══════════════════════════════════════════════════════════════
DELIVERY PATH (bottom to top = send order)
═══════════════════════════════════════════════════════════════
Step 1: 2024-03-15 14:32:05 UTC
From: mail.shadyserver.ru (185.234.72.19)
To: mx1.example.com
Location: Moscow, Russia
⚠️ WARNING: Origin server in high-risk region
Step 2: 2024-03-15 14:32:08 UTC (+3 seconds)
From: mx1.example.com
To: internal-filter.example.com
Step 3: 2024-03-15 14:32:09 UTC (+1 second)
From: internal-filter.example.com
To: mailbox.example.com (final delivery)
═══════════════════════════════════════════════════════════════
AUTHENTICATION RESULTS
═══════════════════════════════════════════════════════════════
SPF: FAIL (amaz0n-security.xyz does not authorize 185.234.72.19)
DKIM: NONE (no signature present)
DMARC: FAIL (policy=reject)
═══════════════════════════════════════════════════════════════
PHISHING INDICATORS
═══════════════════════════════════════════════════════════════
🚨 HIGH RISK INDICATORS FOUND:
• Domain typosquatting: amaz0n-security.xyz (looks like amazon.com)
• SPF authentication failed
• No DKIM signature
• Originating IP on 3 blacklists
• Reply-To differs from From address
VERDICT: LIKELY PHISHING - Do not click any links!
Implementation Hints:
Received headers are read bottom-to-top: each server prepends its own header, so the bottom one records the oldest hop. Each has the form Received: from <sending-server> by <receiving-server> ... ; <timestamp>. Parse these with regex, extract IPs, and query a geolocation API (such as ip-api.com). Only the headers added by your own servers (from your MX onward) can be trusted; anything below them can be forged by the sender. The Authentication-Results header contains spf=pass/fail, dkim=pass/fail, and dmarc=pass/fail. Build a suspicious pattern detector: mismatched From/Reply-To, typosquatting domains (Levenshtein distance), failed authentication, known bad IP ranges.
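A minimal Python sketch of route reconstruction and the typosquatting check, using the standard email package and a regex that assumes the common "from <host> ... by <host> ... ; <date>" layout (real Received headers vary widely, so expect misses). Only bracketed IPv4 literals are extracted, geolocation is left out, and the function names are hypothetical.

```python
import email
import re
from email.utils import parsedate_to_datetime
from typing import Dict, List

HOP_RE = re.compile(r"\bfrom\s+(?P<src>\S+).*?\bby\s+(?P<dst>\S+)", re.IGNORECASE | re.DOTALL)
IPV4_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def parse_route(raw_message: str) -> List[Dict]:
    """Return hops oldest-first, each later hop carrying its delay in seconds."""
    msg = email.message_from_string(raw_message)
    hops: List[Dict] = []
    for header in reversed(msg.get_all("Received", [])):  # bottom header = first hop
        text = " ".join(str(header).split())                 # unfold, squeeze whitespace
        fields, _, date_text = text.rpartition(";")          # timestamp follows the last ';'
        hop = HOP_RE.search(fields)
        ip = IPV4_RE.search(fields)
        hops.append({
            "from": hop.group("src") if hop else None,
            "by": hop.group("dst") if hop else None,
            "ip": ip.group(1) if ip else None,
            "time": parsedate_to_datetime(date_text.strip()),
        })
    for earlier, later in zip(hops, hops[1:]):
        later["delay"] = (later["time"] - earlier["time"]).total_seconds()
    return hops


def levenshtein(a: str, b: str) -> int:
    """Edit distance; a small value against a brand name hints at typosquatting."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
```

For example, levenshtein("amaz0n", "amazon") is 1, which is the kind of near-miss the phishing report above flags.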
Learning milestones:
- Parse and display email route → You understand Received headers
- Add geolocation and timing analysis → You understand mail flow
- Detect phishing indicators → You’ve built a forensic tool
Project 7: Mini SMTP Server (Receive & Store)
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: C
- Alternative Programming Languages: Go, Rust, Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: Network Servers / Email Infrastructure
- Software or Tool: SMTP Server
- Main Book: “The Linux Programming Interface” by Michael Kerrisk
What you’ll build: An SMTP server that accepts incoming connections, handles the SMTP protocol dialogue, receives email messages, and stores them in Maildir format for later retrieval.
Why it teaches email: Running your own mail server shows you both sides of the protocol. You’ll handle malformed commands, implement anti-relay protection, validate sender domains, and understand why mail servers are notoriously complex to operate.
Core challenges you’ll face:
- Multi-client handling → maps to concurrent network programming
- SMTP state machine → maps to protocol implementation
- Anti-relay protection → maps to security requirements
- Maildir storage format → maps to email persistence
- Graceful error handling → maps to robustness requirements
Key Concepts:
- SMTP Server Requirements: RFC 5321 Section 4 - SMTP Procedures
- Concurrent Servers: “The Linux Programming Interface” Chapter 60 - Kerrisk
- Event-driven I/O (select/epoll): “The Linux Programming Interface” Chapter 63 - Kerrisk
- Maildir Format: “Using maildir format” - D. J. Bernstein
- Socket Programming: “Advanced Programming in the UNIX Environment” Chapter 16 - Stevens
Difficulty: Advanced
Time estimate: 3-4 weeks
Prerequisites: Socket programming, process/thread management
Real world outcome:
$ ./mini-smtp-server --port 2525 --domain testmail.local --maildir ./mail
Mini SMTP Server starting...
Listening on port 2525
Accepting mail for: testmail.local
Storing mail in: ./mail/
[14:32:05] Connection from 127.0.0.1
[14:32:05] <- EHLO client.local
[14:32:05] -> 250-testmail.local Hello client.local
[14:32:05] -> 250-SIZE 10485760
[14:32:05] -> 250 OK
[14:32:05] <- MAIL FROM:<sender@example.com>
[14:32:05] -> 250 OK
[14:32:05] <- RCPT TO:<user@testmail.local>
[14:32:05] -> 250 OK
[14:32:05] <- DATA
[14:32:05] -> 354 Start mail input; end with <CRLF>.<CRLF>
[14:32:05] <- <email content>
[14:32:05] <- .
[14:32:05] -> 250 OK: Message accepted, ID=1710512125.12345.testmail.local
[14:32:05] Stored: ./mail/user/new/1710512125.12345.testmail.local
[14:32:05] <- QUIT
[14:32:05] -> 221 Bye
# Check stored mail:
$ ls ./mail/user/new/
1710512125.12345.testmail.local
$ cat ./mail/user/new/1710512125.12345.testmail.local
From: sender@example.com
To: user@testmail.local
Subject: Test message
...
Implementation Hints:
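Implement a state machine: GREETING → HELO/EHLO → MAIL → RCPT (one or more) → DATA → message body → lone dot → (another MAIL transaction, or QUIT). Each state transition depends on the command received; commands sent out of order get a 503 reply. Anti-relay: only accept mail for domains you control (check the RCPT TO domain and reply 550 otherwise). While reading the body, undo dot-stuffing: when a line starts with a dot, remove that first dot. Maildir format: write each message to a unique file in maildir/username/tmp/, then rename() it into maildir/username/new/ so readers never see a partially written message. Filename format: <timestamp>.<unique>.<hostname>. Handle concurrent connections with fork() (simplest), threads, or async I/O (select/epoll).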
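Keeping the state machine separate from the networking makes it easy to test. Here is a minimal Python sketch (the project targets C, but the logic carries over) with a transport-independent session object and a Maildir writer; the class and function names are hypothetical, and AUTH, STARTTLS, size and line-length limits, and adding a Received header are all omitted. To wire it up, read lines from each accepted connection, call handle_line, and write back any non-None reply followed by \r\n.

```python
import os
import re
import socket
import time
from typing import Callable, List, Optional

ADDR_RE = re.compile(r"<([^>]*)>")


class SmtpSession:
    """Transport-independent SMTP server state machine (not RFC-complete)."""

    def __init__(self, domain: str, deliver: Callable[[str, List[str], bytes], None]):
        self.domain = domain.lower()
        self.deliver = deliver   # called once per accepted message
        self.state = "GREETING"  # until HELO/EHLO arrives
        self.reset()

    def reset(self) -> None:
        self.mail_from: Optional[str] = None
        self.rcpts: List[str] = []
        self.lines: List[str] = []

    def handle_line(self, line: str) -> Optional[str]:
        """Feed one line without its CRLF; return the reply, or None inside DATA."""
        if self.state == "DATA":
            if line == ".":  # end of message
                body = "".join(ln + "\r\n" for ln in self.lines).encode("utf-8")
                self.deliver(self.mail_from or "", list(self.rcpts), body)
                self.reset()
                self.state = "READY"
                return "250 OK: message accepted"
            # Undo dot-stuffing: the client doubled any leading dot.
            self.lines.append(line[1:] if line.startswith(".") else line)
            return None
        verb, _, arg = line.partition(" ")
        verb = verb.upper()
        if verb in ("HELO", "EHLO"):
            self.reset()
            self.state = "READY"
            if verb == "EHLO":
                return f"250-{self.domain} Hello {arg}\r\n250-SIZE 10485760\r\n250 OK"
            return f"250 {self.domain} Hello {arg}"
        if verb == "QUIT":
            return "221 Bye"
        if verb == "RSET":
            self.reset()
            return "250 OK"
        if self.state == "GREETING":
            return "503 Send HELO/EHLO first"
        addr = ADDR_RE.search(arg)
        if verb == "MAIL":
            if not (arg.upper().startswith("FROM:") and addr):
                return "501 Syntax: MAIL FROM:<address>"
            self.reset()
            self.mail_from = addr.group(1)
            return "250 OK"
        if verb == "RCPT":
            if not (arg.upper().startswith("TO:") and addr):
                return "501 Syntax: RCPT TO:<address>"
            if self.mail_from is None:
                return "503 Need MAIL first"
            # Anti-relay: only accept recipients in our own domain.
            if addr.group(1).rpartition("@")[2].lower() != self.domain:
                return "550 Relaying denied"
            self.rcpts.append(addr.group(1))
            return "250 OK"
        if verb == "DATA":
            if not self.rcpts:
                return "503 Need RCPT first"
            self.state = "DATA"
            return "354 Start mail input; end with <CRLF>.<CRLF>"
        return "500 Command not recognized"


def maildir_deliver(root: str, user: str, data: bytes) -> str:
    """Write to tmp/, then rename into new/ so readers never see a partial file."""
    base = os.path.join(root, user)
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(base, sub), exist_ok=True)
    # <timestamp>.<unique>.<hostname>; pid plus a nanosecond clock serve as "unique".
    name = f"{int(time.time())}.{os.getpid()}_{time.monotonic_ns()}.{socket.gethostname()}"
    tmp_path = os.path.join(base, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    new_path = os.path.join(base, "new", name)
    os.rename(tmp_path, new_path)  # atomic within one filesystem
    return new_path
```

Because the session never touches a socket, the same object works unchanged whether each connection runs in a forked child, a thread, or an event loop.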
Learning milestones:
- Accept and store single email → You understand basic SMTP server operation
- Handle multiple concurrent connections → You understand server architecture
- Properly reject relay attempts → You understand mail security
Project 8: Bayesian Spam Classifier
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust, C
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Machine Learning / Email Filtering
- Software or Tool: Spam Filter
- Main Book: “Grokking Algorithms” by Aditya Bhargava
What you’ll build: A spam filter using Naive Bayes classification that learns from labeled examples (ham/spam) and calculates the probability that new emails are spam based on word frequencies.
Why it teaches spam detection: Bayesian filtering was popularized by Paul Graham’s “A Plan for Spam” (2002) and became the core of many content-based spam filters. Understanding how word probabilities combine, how to handle unseen words, and how spammers try to evade detection teaches you both ML fundamentals and the cat-and-mouse game of spam.
Core challenges you’ll face:
- Tokenization and preprocessing → maps to feature extraction
- Probability calculation with smoothing → maps to handling unseen words
- Log-probability to prevent underflow → maps to numerical stability
- Training on imbalanced datasets → maps to real-world ML challenges
- Threshold tuning (precision vs recall) → maps to false positive tradeoffs
Key Concepts:
- Naive Bayes: “Grokking Algorithms” Chapter 10 - Aditya Bhargava
- Spam Filtering Origins: “A Plan for Spam” - Paul Graham (2002)
- Laplace Smoothing: “Naive Bayes Spam Filtering” - Wikipedia
- Machine Learning for Spam: “Machine Learning for Email Spam Filtering” - ScienceDirect
Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Basic probability, some Python
Real world outcome:
$ ./spam-classifier train --ham ./training/ham/ --spam ./training/spam/
Training Bayesian classifier...
Loaded 5000 ham emails, 5000 spam emails
Tokenizing and calculating probabilities...
Top spam indicators:
"viagra": P(spam|word) = 0.9987
"winner": P(spam|word) = 0.9834
"nigeria": P(spam|word) = 0.9756
"click here": P(spam|word) = 0.9612
Top ham indicators:
"meeting": P(ham|word) = 0.9234
"attached": P(ham|word) = 0.8932
"regards": P(ham|word) = 0.8845
Model saved to: spam_model.json
$ ./spam-classifier classify email.eml
Classifying: email.eml
Token analysis:
"congratulations" → spam indicator (0.89)
"won" → spam indicator (0.76)
"lottery" → spam indicator (0.94)
"bank details" → spam indicator (0.91)
Combined probability: P(spam) = 0.9973
VERDICT: SPAM (confidence: 99.73%)
$ ./spam-classifier classify work_email.eml
Token analysis:
"quarterly" → ham indicator (0.82)
"report" → ham indicator (0.71)
"meeting" → ham indicator (0.92)
"attached" → ham indicator (0.89)
Combined probability: P(spam) = 0.0234
VERDICT: HAM (confidence: 97.66%)
Implementation Hints:
For each word, calculate P(word|spam) and P(word|ham) from training data. Use Laplace smoothing to handle unseen words: add 1 to every count, so P(word|spam) = (count(word, spam) + 1) / (total spam tokens + vocabulary size). For classification: P(spam|email) ∝ P(spam) × ∏P(word|spam). Use log probabilities to prevent underflow: score_spam = log P(spam) + Σ log P(word|spam), and likewise score_ham. Compare the two scores; if you need an actual probability, P(spam|email) = 1 / (1 + e^(score_ham - score_spam)). Tokenization matters: lowercase, remove punctuation, maybe stem words. Consider bigrams (“click here”) for better accuracy.
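A minimal Python sketch of the multinomial Naive Bayes approach in these hints, with add-one smoothing, log scores, and a logistic conversion written so it cannot overflow. The tokenizer (lowercase alphanumeric runs plus bigrams) is an illustrative choice, the class name is hypothetical, and both classes need at least one training message because the log prior is undefined otherwise. Note that it scores P(word|class), a different combination rule from the per-token P(spam|word) figures in the sample output.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lowercase words plus adjacent-word bigrams such as 'click here'."""
    words = TOKEN_RE.findall(text.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing and log scores."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.docs: Dict[str, int] = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def _log_score(self, tokens: List[str], label: str, vocab_size: int) -> float:
        total = sum(self.counts[label].values())
        score = math.log(self.docs[label] / sum(self.docs.values()))  # log prior
        for tok in tokens:
            # Unseen tokens get (0 + 1) / (total + V) instead of zero.
            score += math.log((self.counts[label][tok] + 1) / (total + vocab_size))
        return score

    def p_spam(self, text: str) -> float:
        tokens = tokenize(text)
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        diff = (self._log_score(tokens, "ham", vocab_size)
                - self._log_score(tokens, "spam", vocab_size))
        # P(spam|email) = 1 / (1 + e^diff), arranged so math.exp never overflows.
        if diff >= 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))
```

The verdict threshold is a separate choice: raising it above 0.5 trades missed spam for fewer false positives, which is usually the right direction for email.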
Learning milestones:
- Train on small dataset, classify correctly → You understand Naive Bayes
- Handle large vocabulary without underflow → You understand numerical issues
- Tune threshold to minimize false positives → You understand real-world tradeoffs
Project 9: DMARC Policy Engine
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Email Authentication / Policy Enforcement
- Software or Tool: DMARC Validator
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A complete DMARC validator that checks both SPF and DKIM alignment with the From header domain, applies the domain’s DMARC policy (none/quarantine/reject), and generates aggregate reports.
Why it teaches email authentication: DMARC ties SPF and DKIM together with the visible “From” address. Understanding alignment (strict vs relaxed), policy application, and reporting teaches you how modern email authentication actually prevents spoofing at scale.
Core challenges you’ll face:
- DMARC record parsing → maps to policy configuration
- SPF alignment checking → maps to domain matching logic
- DKIM alignment checking → maps to signature domain matching
- Policy application → maps to enforcement decisions
- Aggregate report generation (RUA) → maps to XML report format
Key Concepts:
- DMARC Specification: RFC 7489 - Domain-based Message Authentication
- Alignment Rules: “DMARC Made Simple” - Valimail
- Policy Enforcement: Microsoft DMARC Guide - Microsoft Learn
Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Projects 3 (SPF) and 4 (DKIM)
Real world outcome:
$ ./dmarc-check email.eml
Analyzing email for DMARC compliance...
══════════════════════════════════════════════════════════════
EMAIL DETAILS
══════════════════════════════════════════════════════════════
From header: sender@bigcorp.com
Return-Path: bounce@mail.bigcorp.com
DKIM domain: bigcorp.com
══════════════════════════════════════════════════════════════
DMARC RECORD FOR bigcorp.com
══════════════════════════════════════════════════════════════
v=DMARC1; p=reject; sp=reject; adkim=r; aspf=r;
rua=mailto:dmarc@bigcorp.com; ruf=mailto:dmarc@bigcorp.com; pct=100
Policy: REJECT (100% of failures)
SPF alignment: Relaxed (organizational domain match OK)
DKIM alignment: Relaxed (organizational domain match OK)
══════════════════════════════════════════════════════════════
ALIGNMENT CHECK
══════════════════════════════════════════════════════════════
SPF Check:
From header domain: bigcorp.com
Return-Path domain: mail.bigcorp.com
Organizational domain: bigcorp.com
SPF result: PASS
SPF aligned: ✓ YES (relaxed: mail.bigcorp.com → bigcorp.com)
DKIM Check:
From header domain: bigcorp.com
DKIM d= domain: bigcorp.com
DKIM result: PASS
DKIM aligned: ✓ YES (exact match)
══════════════════════════════════════════════════════════════
DMARC RESULT
══════════════════════════════════════════════════════════════
DMARC: PASS
✓ At least one mechanism (SPF or DKIM) passed AND is aligned
Action: DELIVER (passed policy check)
Implementation Hints:
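DMARC lives at _dmarc.<domain> as a TXT record. Key tags: p (policy: none/quarantine/reject), sp (policy for subdomains), pct (percentage of failing mail the policy applies to), adkim (DKIM alignment: r=relaxed, s=strict), and aspf (SPF alignment, same values; both default to r). Alignment means the domain in the From header must match the authenticated domain: the Return-Path (MAIL FROM) domain for SPF, the d= domain for DKIM. Relaxed mode allows an organizational domain match (mail.example.com aligns with example.com); RFC 7489 determines the organizational domain with a public suffix list. Strict mode requires an exact match. DMARC passes if EITHER SPF or DKIM passes AND aligns. For reporting, aggregate results into the XML format defined in RFC 7489 Appendix C.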
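The parse, align, and decide steps above fit in a few small functions. This is a minimal sketch, not a complete engine: it assumes the SPF and DKIM verdicts (and the domains they authenticated) come from your Project 3 and 4 code as plain strings, `org_domain` is a deliberately naive last-two-labels approximation instead of a public suffix list lookup, and `sp=` and `pct=` are parsed but not applied. All function names are hypothetical.

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT record into a {tag: value} dict; tag names are case-insensitive."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record (v=DMARC1 missing)")
    return tags


def org_domain(domain: str) -> str:
    """NAIVE organizational domain: keep the last two labels.

    Wrong for multi-label suffixes such as co.uk; a real implementation
    should consult the Public Suffix List as RFC 7489 describes.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_aligned(from_domain: str, auth_domain: str, mode: str) -> bool:
    """Strict ('s') needs an exact match; relaxed ('r') compares organizational domains."""
    a = from_domain.lower().rstrip(".")
    b = auth_domain.lower().rstrip(".")
    if mode == "s":
        return a == b
    return org_domain(a) == org_domain(b)


def evaluate(from_domain, spf_result, spf_domain, dkim_result, dkim_domain, record):
    """Return (dmarc_result, action) for one message."""
    tags = parse_dmarc(record)
    spf_ok = spf_result == "pass" and is_aligned(from_domain, spf_domain, tags.get("aspf", "r"))
    dkim_ok = dkim_result == "pass" and is_aligned(from_domain, dkim_domain, tags.get("adkim", "r"))
    if spf_ok or dkim_ok:
        return "pass", "deliver"
    # Failure: apply p= (sp= and pct= are not handled in this sketch).
    policy = tags.get("p", "none").lower()
    action = {"reject": "reject", "quarantine": "quarantine"}.get(policy, "deliver")
    return "fail", action
```

With the bigcorp.com record from the sample output, `evaluate("bigcorp.com", "pass", "mail.bigcorp.com", "pass", "bigcorp.com", record)` returns `("pass", "deliver")`, while a message whose only pass is SPF for an unrelated domain returns `("fail", "reject")`.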
Learning milestones:
- Parse DMARC records and check alignment → You understand DMARC basics
- Correctly apply policy based on results → You understand enforcement
- Generate aggregate reports → You’ve built a complete DMARC engine
Project 10: Email Reputation & Blacklist Checker
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Node.js
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: DNS / Email Reputation
- Software or Tool: Reputation Checker
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A tool that checks an IP address or domain against multiple DNS blacklists (DNSBLs), queries reputation services, and provides a trust score with recommendations.
Why it teaches email trust: Blacklists are how the email ecosystem maintains trust. Understanding DNSBL query format, the different types of lists (spam, malware, open relays), and how delisting works teaches you the reputation system that determines if your emails get delivered.
Core challenges you’ll face:
- DNSBL query format → maps to reversed-IP DNS lookups
- Multiple list aggregation → maps to reputation scoring
- Rate limiting and caching → maps to API etiquette
- Result interpretation → maps to understanding list types
- Delisting guidance → maps to operational knowledge
Key Concepts:
- DNSBL Protocol: RFC 5782 - DNS Blacklists and Whitelists
- Major Blacklists: “MXToolbox Blacklist Check” - Reference implementation
- IP Reputation: “What Is IP Reputation” - Proofpoint
- Sender Score: “Sender Score” - Validity (formerly Return Path)
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: DNS basics
Real world outcome:
$ ./reputation-check 185.234.72.19
Checking reputation for IP: 185.234.72.19
═══════════════════════════════════════════════════════════════
BLACKLIST STATUS (checking 50 DNSBLs)
═══════════════════════════════════════════════════════════════
LISTED ON 5 BLACKLISTS:
✗ Spamhaus ZEN (zen.spamhaus.org)
Reason: SBL (Spamhaus Block List - verified spam source)
Delist: https://www.spamhaus.org/sbl/removal/
✗ Spamcop (bl.spamcop.net)
Reason: User-reported spam source
Delist: Automatic after 24-48 hours if spam stops
✗ Barracuda (b.barracudacentral.org)
Reason: Poor email practices
Delist: https://www.barracudacentral.org/rbl/removal
✗ SORBS (dnsbl.sorbs.net)
Reason: Dynamic IP range
Delist: http://www.sorbs.net/cgi-bin/support
✗ CBL (cbl.abuseat.org)
Reason: Infected host / botnet activity
Delist: https://www.abuseat.org/lookup.cgi
NOT LISTED ON 45 BLACKLISTS:
✓ SpamRats, ✓ PSBL, ✓ Invaluement, ✓ McAfee, ...
═══════════════════════════════════════════════════════════════
REPUTATION SCORE
═══════════════════════════════════════════════════════════════
Score: 23/100 (POOR)
Factors:
• Listed on 5 major blacklists (-40 points)
• IP in known VPS range (-10 points)
• No PTR record (-15 points)
• Recently allocated IP (-12 points)
═══════════════════════════════════════════════════════════════
RECOMMENDATIONS
═══════════════════════════════════════════════════════════════
1. Request delisting from Spamhaus (most impactful)
2. Set up proper PTR record matching your mail server hostname
3. Review server for malware/compromise (CBL listing suggests infection)
4. Consider IP warm-up plan once delisted
Mail sent from this IP will likely be rejected by major providers.
Implementation Hints:
DNSBL queries reverse the IP's octets and append the list domain: to check 185.234.72.19 on zen.spamhaus.org, query 19.72.234.185.zen.spamhaus.org. If the query returns an A record, the IP is listed, and the returned address (such as 127.0.0.2) indicates the reason; many lists also publish a TXT record at the same name with a human-readable explanation. An NXDOMAIN answer means the IP is not listed. Popular lists: zen.spamhaus.org, bl.spamcop.net, b.barracudacentral.org, dnsbl.sorbs.net. Query them in parallel for speed, and cache results while honoring the TTL. Some lists require registration for high-volume queries.
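The query format and parallel lookups can be sketched with the standard library alone. Assumptions: IPv4 only; `socket.gethostbyname` stands in for a real DNS client, so it cannot read TTLs (no caching here) and cannot tell NXDOMAIN apart from timeouts; the `127.255.255.x` check reflects Spamhaus's convention for signalling query errors and may not apply to other lists. Function names are hypothetical.

```python
import ipaddress
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple


def dnsbl_query_name(ip: str, zone: str) -> str:
    """185.234.72.19 + zen.spamhaus.org -> 19.72.234.185.zen.spamhaus.org (IPv4 only)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def check_list(ip: str, zone: str) -> Tuple[str, str]:
    """Return (zone, status) where status is 'not listed', 'listed (...)' or 'query error (...)'."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # Treated as NXDOMAIN ("not listed"). Timeouts and SERVFAIL also raise
        # gaierror here; a DNS library would let you tell them apart.
        return zone, "not listed"
    if answer.startswith("127.255.255."):
        # Spamhaus signals query errors (e.g. a blocked public resolver) this way.
        return zone, f"query error ({answer})"
    return zone, f"listed ({answer})"


def check_all(ip: str, zones: Iterable[str], workers: int = 8) -> Dict[str, str]:
    """Query every list in parallel using a small thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda zone: check_list(ip, zone), zones))
```

A handy smoke test: RFC 5782 requires IPv4 lists to list 127.0.0.2 and never list 127.0.0.1, so `check_all("127.0.0.2", zones)` should come back "listed" on every list that accepts queries from your resolver.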
Learning milestones:
- Query single blacklist successfully → You understand DNSBL format
- Aggregate multiple lists into a score → You understand reputation systems
- Provide actionable delisting guidance → You’ve built a useful tool
Project 11: POP3/IMAP Client
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: C, Go, Rust
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Network Protocols / Email Retrieval
- Software or Tool: Email Client
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A command-line email client that connects to mail servers via POP3 and IMAP, lists messages, downloads them, and displays content—all using raw protocol commands.
Why it teaches email: POP3 and IMAP are how email clients (Outlook, Thunderbird) retrieve mail. Understanding these protocols teaches you the difference between “download and delete” (POP3) vs “server-side folders” (IMAP), and why synchronization is complex.
Core challenges you’ll face:
- POP3 protocol dialogue → maps to simple retrieval protocol
- IMAP command/response parsing → maps to complex stateful protocol
- TLS encryption → maps to secure connections
- MIME parsing → maps to email structure (multipart, attachments)
- Folder management (IMAP) → maps to server-side organization
Key Concepts:
- POP3 Protocol: RFC 1939 - Post Office Protocol
- IMAP Protocol: RFC 3501 - IMAP4rev1 (IMAP4rev2 is RFC 9051)
- MIME Format: RFC 2045 - MIME Part One
- Email Message Format: RFC 5322 - Internet Message Format
Difficulty: Intermediate
Time estimate: 2-3 weeks
Prerequisites: Socket programming, TLS basics
Real world outcome:
$ ./email-client --server imap.gmail.com --port 993 --user you@gmail.com
Password: ********
Connecting to imap.gmail.com:993 (TLS)...
[IMAP] <- * OK Gimap ready
[IMAP] -> a1 LOGIN you@gmail.com ********
[IMAP] <- a1 OK LOGIN completed
$ list-folders
Folders:
INBOX (42 messages, 3 unread)
[Gmail]/Sent Mail (1523 messages)
[Gmail]/Drafts (2 messages)
[Gmail]/Spam (15 messages)
Work (89 messages, 12 unread)
$ select INBOX
Selected: INBOX
42 messages, 3 unread
$ list 1-5
# From Subject Date
1 boss@work.com Q4 Planning Mar 15
2 newsletter@... Weekly Digest Mar 15
3* support@github.com [Action Required] Security Mar 14
4* alice@team.com Review PR #123 Mar 14
5* bob@team.com Re: Design feedback Mar 13
(* = unread)
$ read 3
From: support@github.com
To: you@gmail.com
Subject: [Action Required] Security Alert
Date: Thu, 14 Mar 2024 10:23:45 -0700
Content-Type: multipart/alternative
We noticed a new sign-in to your account...
[View HTML version? y/n]
$ download 3 --save ./email_3.eml
Saved to: ./email_3.eml
Implementation Hints:
POP3 is simple: USER, PASS, LIST, RETR, DELE, QUIT. IMAP is complex: commands carry tags (a1, a2…) so replies can be matched to them, untagged responses start with *, responses can span multiple lines, SELECT chooses a folder, and FETCH retrieves messages. Both need TLS: implicit TLS on port 995 (POP3S) or 993 (IMAPS), or STARTTLS on the standard ports 110 (POP3, where the command is STLS) and 143 (IMAP). MIME parsing is its own challenge: multipart/mixed contains parts separated by boundaries, each part has its own headers, and content can be base64 or quoted-printable encoded.
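A minimal sketch of the raw POP3 dialogue using only `socket` and `ssl`. It encodes the two rules that trip people up: every reply starts with `+OK` or `-ERR`, and a multi-line response ends with a lone `.` line, where any other line starting with a dot has had an extra dot prepended (byte-stuffing). The dialogue functions take a readable and a writable binary file object instead of a socket, which keeps them testable with `io.BytesIO`; `open_pop3s` and the other names are hypothetical.

```python
import socket
import ssl
from typing import BinaryIO, List, Tuple


def read_line(rfile: BinaryIO) -> str:
    raw = rfile.readline()
    if not raw:
        raise ConnectionError("server closed the connection")
    # Simplification: decodes as UTF-8; raw message bodies may need bytes instead.
    return raw.decode("utf-8", "replace").rstrip("\r\n")


def expect_ok(reply: str) -> str:
    if not reply.startswith("+OK"):
        raise RuntimeError(f"POP3 error: {reply}")
    return reply


def command(rfile: BinaryIO, wfile: BinaryIO, line: str) -> str:
    """Send one command (CRLF-terminated) and return its single-line status reply."""
    wfile.write(line.encode("utf-8") + b"\r\n")
    wfile.flush()
    return expect_ok(read_line(rfile))


def read_multiline(rfile: BinaryIO) -> List[str]:
    """Read a multi-line response body up to the terminating '.' line."""
    lines = []
    while True:
        line = read_line(rfile)
        if line == ".":
            return lines
        if line.startswith("."):  # byte-stuffed: drop the extra leading dot
            line = line[1:]
        lines.append(line)


def list_messages(rfile: BinaryIO, wfile: BinaryIO, user: str, password: str) -> List[Tuple[int, int]]:
    """Log in, return [(message_number, size_in_octets), ...], then QUIT."""
    expect_ok(read_line(rfile))  # server greeting
    command(rfile, wfile, f"USER {user}")
    command(rfile, wfile, f"PASS {password}")
    command(rfile, wfile, "LIST")
    listing = [tuple(int(n) for n in line.split()[:2]) for line in read_multiline(rfile)]
    command(rfile, wfile, "QUIT")
    return listing


def open_pop3s(host: str, port: int = 995):
    """Implicit-TLS connection; the default context verifies the server certificate."""
    context = ssl.create_default_context()
    sock = context.wrap_socket(socket.create_connection((host, port), timeout=30),
                               server_hostname=host)
    return sock.makefile("rb"), sock.makefile("wb")
```

Against a real server you would call `rfile, wfile = open_pop3s(host)` and pass both to `list_messages`; RETR follows the same pattern, with `read_multiline` returning the message lines.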
Learning milestones:
- List and download via POP3 → You understand basic retrieval
- Navigate folders and search via IMAP → You understand modern mail access
- Parse and display MIME attachments → You’ve built a real mail client
Project 12: Open Relay Tester
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Bash
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Email Security / Penetration Testing
- Software or Tool: Security Scanner
- Main Book: “Penetration Testing” by Georgia Weidman
What you’ll build: A security tool that tests SMTP servers for open relay vulnerability—the misconfiguration that allows anyone to send email through the server, commonly exploited by spammers.
Why it teaches email security: Open relays drove much of the spam problem of the late 1990s and early 2000s. Understanding how to detect them teaches you SMTP authentication, relay rules, and why mail server configuration is security-critical.
Core challenges you’ll face:
- SMTP dialogue for relay testing → maps to protocol abuse patterns
- Testing multiple relay scenarios → maps to comprehensive security testing
- Detecting soft failures → maps to understanding reject messages
- Handling rate limiting → maps to ethical testing practices
Key Concepts:
- Open Relay Concept: “Open Mail Relay” - Wikipedia
- SMTP Security: RFC 5321 Section 7 - Security Considerations
- Relay Testing: “Test for Open Relay” - MXToolbox reference
Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Basic SMTP knowledge (Project 1)
Real world outcome:
$ ./relay-tester mail.example.com
Testing mail.example.com for open relay vulnerability...
═══════════════════════════════════════════════════════════════
OPEN RELAY TEST RESULTS
═══════════════════════════════════════════════════════════════
Server: mail.example.com (93.184.216.34)
Banner: 220 mail.example.com ESMTP Postfix
Test 1: External to External (most dangerous)
MAIL FROM: <spammer@evil.com>
RCPT TO: <victim@innocent.com>
Result: 554 Relay access denied ✓ PASS
Test 2: External to Internal
MAIL FROM: <spammer@evil.com>
RCPT TO: <user@example.com>
Result: 250 OK (expected - local delivery)
Test 3: Null sender to External
MAIL FROM: <>
RCPT TO: <victim@innocent.com>
Result: 554 Relay access denied ✓ PASS
Test 4: Percent hack
MAIL FROM: <spammer@evil.com>
RCPT TO: <victim%innocent.com@example.com>
Result: 550 Invalid recipient ✓ PASS
Test 5: Source routing
MAIL FROM: <spammer@evil.com>
RCPT TO: <@example.com:victim@innocent.com>
Result: 550 Source routing denied ✓ PASS
Test 6: After fake AUTH
[Attempt AUTH with invalid credentials]
RCPT TO: <victim@innocent.com>
Result: 530 Authentication required ✓ PASS
═══════════════════════════════════════════════════════════════
VERDICT: NOT AN OPEN RELAY ✓
═══════════════════════════════════════════════════════════════
All relay attempts were properly rejected.
This server appears to be correctly configured.
Implementation Hints: Test scenarios: (1) external-to-external relay, (2) percent hack (user%domain@server), (3) source routing (@server:user@domain), (4) null sender, (5) after failed authentication. A properly configured server should reject every external relay attempt with a 5xx reply. Parse reply codes carefully: 250 for an external recipient means the relay was accepted (bad!), 550/554 means it was rejected (good), and a 4xx reply is a temporary failure that proves nothing either way. A 250 for a local recipient is expected. Be ethical: only test servers you own or have permission to test.
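A minimal sketch of the first scenarios using Python's standard `smtplib`. `docmd()` sends each command verbatim, which matters for the source-route test because the higher-level `rcpt()` runs addresses through an address parser that may drop the route. `rset()` aborts every transaction, so DATA is never sent. Assumptions: the probe addresses are hypothetical placeholders you should replace with mailboxes you control, and the failed-AUTH scenario is left out. Because it stops at RCPT TO, a server that only refuses relay after DATA would be reported as accepting, so treat a FAIL as "investigate", not proof.

```python
import smtplib
from typing import List, Tuple

# Hypothetical probe addresses: replace them with mailboxes you control.
EXTERNAL_FROM = "probe@example.net"
EXTERNAL_RCPT = "relaytest@example.org"


def scenarios(server_domain: str) -> List[Tuple[str, str, str, bool]]:
    """(name, MAIL FROM argument, RCPT TO argument, is_relay_attempt) per test."""
    user, ext_domain = EXTERNAL_RCPT.split("@")
    return [
        ("External to external", f"<{EXTERNAL_FROM}>", f"<{EXTERNAL_RCPT}>", True),
        ("External to internal", f"<{EXTERNAL_FROM}>", f"<postmaster@{server_domain}>", False),
        ("Null sender to external", "<>", f"<{EXTERNAL_RCPT}>", True),
        ("Percent hack", f"<{EXTERNAL_FROM}>", f"<{user}%{ext_domain}@{server_domain}>", True),
        ("Source routing", f"<{EXTERNAL_FROM}>", f"<@{server_domain}:{EXTERNAL_RCPT}>", True),
    ]


def classify(code: int, relay_attempt: bool) -> str:
    if 200 <= code < 300:
        return "FAIL (accepted for relay)" if relay_attempt else "OK (local delivery)"
    if 400 <= code < 500:
        return "INCONCLUSIVE (temporary failure)"
    if 500 <= code < 600:
        return "PASS (rejected)"
    return f"UNKNOWN ({code})"


def run_tests(host: str, server_domain: str, port: int = 25) -> List[Tuple[str, int, str]]:
    results = []
    with smtplib.SMTP(host, port, timeout=20) as smtp:  # reads the 220 banner
        smtp.ehlo()
        for name, mail_from, rcpt_to, relay in scenarios(server_domain):
            # docmd() sends the arguments verbatim (no address rewriting).
            mail_code, _ = smtp.docmd("MAIL", f"FROM:{mail_from}")
            if mail_code >= 400:
                results.append((name, mail_code, "INCONCLUSIVE (MAIL FROM refused)"))
            else:
                rcpt_code, _ = smtp.docmd("RCPT", f"TO:{rcpt_to}")
                results.append((name, rcpt_code, classify(rcpt_code, relay)))
            smtp.rset()  # abort the transaction; DATA is never sent
    return results
```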
Learning milestones:
- Detect basic open relay → You understand relay concepts
- Test all bypass techniques → You understand historical exploits
- Generate security report → You’ve built a useful audit tool
Project 13: Email Bounce Parser & Categorizer
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Node.js
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Email Operations / Deliverability
- Software or Tool: Bounce Handler
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A tool that parses bounce messages (DSN - Delivery Status Notifications), extracts the original recipient and failure reason, and categorizes bounces as hard (permanent) or soft (temporary).
Why it teaches email operations: Every email marketer and mail administrator deals with bounces. Understanding DSN format, SMTP error codes, and the difference between “mailbox full” (soft) and “user unknown” (hard) teaches you the feedback loop of email delivery.
Core challenges you’ll face:
- DSN message parsing → maps to multipart/report format
- SMTP status code interpretation → maps to error classification
- Original message extraction → maps to message/rfc822 parsing
- Pattern matching for non-standard bounces → maps to real-world messiness
Key Concepts:
- DSN Format: RFC 3464 - Delivery Status Notifications
- SMTP Enhanced Status Codes: RFC 3463 - Enhanced Mail System Status Codes
- Multipart/Report: RFC 6522 - Report Media Type
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Email header parsing, MIME basics
Real world outcome:
$ ./bounce-parser bounces/*.eml
Parsing 150 bounce messages...
═══════════════════════════════════════════════════════════════
BOUNCE ANALYSIS SUMMARY
═══════════════════════════════════════════════════════════════
Total bounces: 150
HARD BOUNCES (permanent - remove from list): 87
User unknown (5.1.1): 52
- john.doe@oldcompany.com
- jane.smith@defunct.org
- ...
Domain not found (5.1.2): 23
- anyone@typodomian.com
- sales@company.con
- ...
Mailbox disabled (5.2.1): 12
SOFT BOUNCES (temporary - retry later): 48
Mailbox full (4.2.2): 28
Server temporarily unavailable (4.4.1): 15
Rate limited (4.7.1): 5
UNCLASSIFIED (needs manual review): 15
Non-standard format, see ./unclassified/
═══════════════════════════════════════════════════════════════
OUTPUT FILES
═══════════════════════════════════════════════════════════════
hard_bounces.csv - Emails to remove (87 addresses)
soft_bounces.csv - Emails to retry (48 addresses)
bounce_report.json - Full analysis data
Implementation Hints:
DSN messages are multipart/report with these parts: (1) a human-readable explanation, (2) a message/delivery-status part with structured fields (Status: 5.1.1, Action: failed, Final-Recipient: …), and (3) optionally, the original message (message/rfc822) or just its headers (text/rfc822-headers). Status codes: 4.x.x = temporary, 5.x.x = permanent. Common codes: 5.1.1 = bad mailbox, 5.1.2 = bad domain, 4.2.2 = mailbox full. Many servers don’t follow the standards, so also pattern-match common phrases like “User unknown”, “over quota”, and “mailbox not found”.
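A minimal sketch with Python's standard `email` package, which parses a `message/delivery-status` part into a list of header blocks: one per-message block, then one block per recipient. The phrase lists are hypothetical starting points seeded from the hints above, and decoding non-DSN text as UTF-8 is a simplification; function names are hypothetical too.

```python
import email
from email.message import Message
from typing import List, Tuple

# Hypothetical phrase lists for non-standard bounces, seeded from the hints above.
HARD_PHRASES = ("user unknown", "mailbox not found")
SOFT_PHRASES = ("over quota", "mailbox full")


def classify_status(status: str) -> str:
    """Enhanced status codes: 5.x.x is permanent (hard), 4.x.x is temporary (soft)."""
    if status.startswith("5."):
        return "hard"
    if status.startswith("4."):
        return "soft"
    return "unclassified"


def classify_text(text: str) -> str:
    """Fallback for bounces without a machine-readable Status field."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in HARD_PHRASES):
        return "hard"
    if any(phrase in lowered for phrase in SOFT_PHRASES):
        return "soft"
    return "unclassified"


def parse_bounce(raw: str) -> List[Tuple[str, str, str]]:
    """Return (recipient, status, category) tuples; recipient and status are '' for fallbacks."""
    msg = email.message_from_string(raw)
    results = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        # Each blank-line-separated group of fields arrives as its own Message.
        for block in part.get_payload():
            if isinstance(block, Message) and block.get("Status"):
                recipient = block.get("Final-Recipient", "").split(";", 1)[-1].strip()
                status = block["Status"].strip()
                results.append((recipient, status, classify_status(status)))
    if not results:
        text_parts = [p.get_payload(decode=True) or b"" for p in msg.walk()
                      if p.get_content_maintype() == "text"]
        text = b" ".join(text_parts).decode("utf-8", "replace")  # assumes UTF-8
        results.append(("", "", classify_text(text)))
    return results
```

Writing hard bounces to `hard_bounces.csv` and soft ones to `soft_bounces.csv`, as in the sample output, is then a matter of grouping these tuples by their third field.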
Learning milestones:
- Parse standard DSN messages → You understand bounce format
- Correctly classify hard vs soft bounces → You understand deliverability
- Handle non-standard bounce formats → You’ve built a production-ready tool
Project 14: Phishing Email Detector
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Node.js
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Email Security / Threat Detection
- Software or Tool: Security Analyzer
- Main Book: “Practical Malware Analysis” by Sikorski & Honig
What you’ll build: A tool that analyzes emails for phishing indicators: authentication failures, typosquatting domains, suspicious links, urgency language, and mismatched sender information.
Why it teaches email security: Phishing is one of the most common and damaging email threats. Building a detector teaches you the specific techniques attackers use and the indicators that reveal fake emails, combining technical analysis (headers, authentication) with content analysis (URLs, language patterns).
Core challenges you’ll face:
- Authentication result interpretation → maps to SPF/DKIM/DMARC analysis
- Typosquatting detection → maps to string similarity algorithms
- URL extraction and analysis → maps to link safety checking
- NLP for urgency detection → maps to content analysis
- Visual similarity detection → maps to brand impersonation
Key Concepts:
- Phishing Techniques: “Bug Bounty Bootcamp” Chapter 12 - Vickie Li
- String Similarity: Levenshtein distance, homoglyph detection
- Email Authentication: Cloudflare SPF/DKIM/DMARC Guide
- URL Analysis: Safe Browsing API patterns
Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Email header parsing, basic NLP concepts
Real world outcome:
$ ./phishing-detector suspicious_email.eml
═══════════════════════════════════════════════════════════════
PHISHING ANALYSIS REPORT
═══════════════════════════════════════════════════════════════
Email: "Your PayPal Account Has Been Limited"
From: service@paypa1.com
Reply-To: paypal-support@gmail.com
═══════════════════════════════════════════════════════════════
RISK INDICATORS
═══════════════════════════════════════════════════════════════
🚨 CRITICAL: Typosquatting detected
"paypa1.com" is visually similar to "paypal.com"
Levenshtein distance: 1 (letter 'l' → digit '1')
Risk: HIGH
🚨 CRITICAL: Authentication failures
SPF: FAIL (paypa1.com does not authorize sending IP)
DKIM: NONE (no signature)
DMARC: FAIL
Risk: HIGH
🚨 HIGH: Reply-To mismatch
From domain: paypa1.com
Reply-To domain: gmail.com
This is a classic phishing indicator
Risk: HIGH
⚠️ MEDIUM: Suspicious URLs found
Link text: "Verify Your Account"
Actual URL: http://185.234.72.19/paypal-verify/login.php
- Uses raw IP address (not domain)
- HTTP (not HTTPS)
- Path mimics legitimate site
Risk: MEDIUM
⚠️ MEDIUM: Urgency language detected
"immediate action required"
"account will be suspended"
"within 24 hours"
Risk: MEDIUM
═══════════════════════════════════════════════════════════════
VERDICT: PHISHING (Confidence: 97%)
═══════════════════════════════════════════════════════════════
Risk Score: 92/100
Indicators: 2 critical, 1 high, 2 medium
Recommendation: DO NOT interact with this email.
Report to: reportphishing@apwg.org
Implementation Hints: Analyze multiple layers: (1) Authentication - check for SPF/DKIM/DMARC failures in Authentication-Results header. (2) Domain analysis - compare From domain to known brands using Levenshtein distance, check for homoglyphs (paypaI vs paypal using capital I). (3) Link analysis - extract all URLs, check for IP addresses, URL shorteners, mismatched display text vs actual URL. (4) Content analysis - look for urgency words (“immediate”, “suspended”, “verify now”), threats, requests for credentials. Weight factors and compute overall score.
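The domain, Reply-To, and link checks from layers (2) and (3) can be sketched in plain Python. Assumptions: the `HOMOGLYPHS` table, the `rn` to `m` substitution, and the `max_distance=1` threshold are hypothetical starting values (production detectors usually work from the Unicode confusables data and a curated brand list), and the authentication and urgency-language checks are omitted. Function names are hypothetical.

```python
import ipaddress
from email.utils import parseaddr
from typing import List, Optional, Sequence
from urllib.parse import urlsplit

# Hypothetical starter table: characters that render like a Latin letter.
HOMOGLYPHS = {"0": "o", "1": "l", "I": "l"}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def skeleton(domain: str) -> str:
    """Collapse look-alikes BEFORE lowercasing so a capital I still maps to l."""
    folded = domain.replace("rn", "m")
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in folded).lower()


def lookalike_of(domain: str, brands: Sequence[str], max_distance: int = 1) -> Optional[str]:
    """Return the brand a domain imitates, or None if it is a brand itself or unrelated."""
    if domain.lower() in brands:
        return None
    for brand in brands:
        if skeleton(domain) == skeleton(brand) or levenshtein(domain.lower(), brand) <= max_distance:
            return brand
    return None


def domain_of(address_header: str) -> str:
    return parseaddr(address_header)[1].rpartition("@")[2].lower()


def reply_to_mismatch(from_header: str, reply_to_header: str) -> bool:
    return bool(reply_to_header) and domain_of(from_header) != domain_of(reply_to_header)


def url_flags(url: str) -> List[str]:
    """Flag plain-HTTP links and links whose host is a raw IP address."""
    parts = urlsplit(url)
    flags = []
    if parts.scheme == "http":
        flags.append("plain HTTP")
    try:
        ipaddress.ip_address(parts.hostname or "")
        flags.append("raw IP address host")
    except ValueError:
        pass
    return flags
```

On the PayPal sample above, `lookalike_of("paypa1.com", ["paypal.com"])` returns `"paypal.com"` and the link `http://185.234.72.19/paypal-verify/login.php` is flagged for both plain HTTP and a raw IP host; each flag can then feed a weighted score.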
Learning milestones:
- Detect basic authentication failures → You understand email verification
- Identify typosquatting and homoglyphs → You understand domain spoofing
- Analyze links and content patterns → You’ve built a real security tool
Project 15: Email Authentication Dashboard
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: Python (with Flask/FastAPI)
- Alternative Programming Languages: Go, Node.js
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Email Authentication / Web Development
- Software or Tool: Dashboard / Web App
- Main Book: “TCP/IP Illustrated, Volume 1” by W. Richard Stevens
What you’ll build: A web dashboard that takes a domain name and displays its complete email authentication setup—SPF, DKIM selectors, DMARC policy—with validation status and recommendations.
Why it teaches email: This project ties together everything: DNS queries, SPF parsing, DKIM key lookup, DMARC policy interpretation. Building a dashboard forces you to present complex technical information clearly and provide actionable recommendations.
Core challenges you’ll face:
- Integrating multiple DNS queries → maps to comprehensive domain analysis
- DKIM selector discovery → maps to brute-forcing common selectors
- Visualization of authentication chain → maps to explaining to non-technical users
- Recommendation engine → maps to best practices knowledge
Key Concepts:
- SPF/DKIM/DMARC: All previous projects combined
- Web Development: Flask/FastAPI basics
- DNS Queries: All record types (TXT, MX, A)
- Common DKIM Selectors: google, selector1, selector2, k1, etc.
Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Projects 2, 3, 4, 9
Real world outcome:
Browser: http://localhost:5000/check/google.com
╔═══════════════════════════════════════════════════════════════╗
║ EMAIL AUTHENTICATION DASHBOARD ║
║ Domain: google.com ║
╚═══════════════════════════════════════════════════════════════╝
┌─────────────────────────────────────────────────────────────────┐
│ OVERALL SCORE: 95/100 ★★★★★ │
│ "Excellent email authentication setup" │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ MX RECORDS ✓ │
├─────────────────────────────────────────────────────────────────┤
│ Priority 5: gmail-smtp-in.l.google.com │
│ Priority 10: alt1.gmail-smtp-in.l.google.com │
│ Priority 20: alt2.gmail-smtp-in.l.google.com │
│ Status: ✓ Multiple MX records with proper failover │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ SPF RECORD ✓ │
├─────────────────────────────────────────────────────────────────┤
│ v=spf1 include:_spf.google.com ~all │
│ │
│ Analysis: │
│ • Uses includes (good for manageability) │
│ • Ends with ~all (softfail - could be stricter) │
│ • 4 DNS lookups (under 10 limit) │
│ │
│ Recommendation: Consider -all for stricter policy │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DKIM KEYS ✓ │
├─────────────────────────────────────────────────────────────────┤
│ Selectors found: │
│ • 20230601._domainkey.google.com ✓ │
│ Algorithm: RSA-2048 │
│ │
│ Status: DKIM properly configured │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DMARC POLICY ✓ │
├─────────────────────────────────────────────────────────────────┤
│ v=DMARC1; p=reject; sp=reject; rua=mailto:... │
│ │
│ Policy: REJECT (strongest) │
│ Subdomain policy: REJECT │
│ Aggregate reports: Configured │
│ │
│ Status: Excellent DMARC configuration │
└─────────────────────────────────────────────────────────────────┘
Implementation Hints:
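Build on previous projects: reuse your SPF parser, DKIM lookup, and DMARC parser. For DKIM selector discovery, try common selectors: google, selector1, selector2, default, k1, dkim, mail, s1, s2, etc. DNS offers no way to list a domain’s selectors, so guessing is the only option unless you have a sample message whose DKIM-Signature carries the s= tag. Combine the results in a simple scoring system that awards points per record and collects recommendations. Use a web framework (Flask is simplest) to create routes such as /check/<domain>. Return JSON for the API or HTML for the dashboard. Add caching (Redis or in-memory) to avoid repeated DNS queries.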
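The scoring and recommendation core can be kept independent of DNS and the web framework, which makes it easy to test. In this sketch the TXT lookup is injected as a plain function; the selector list comes from the hint above, while the weights (SPF 30, DKIM 30, DMARC 40) and the recommendation wording are hypothetical. MX checks are left out.

```python
from typing import Callable, Dict, List, Optional

# Probe list taken from the hint above.
COMMON_SELECTORS = ["google", "selector1", "selector2", "default", "k1", "dkim", "mail", "s1", "s2"]

# lookup(name) -> list of TXT strings at that name ([] when nothing exists).
TxtLookup = Callable[[str], List[str]]


def find_record(lookup: TxtLookup, name: str, prefix: str) -> Optional[str]:
    for txt in lookup(name):
        if txt.lower().startswith(prefix):
            return txt
    return None


def dmarc_policy(record: str) -> str:
    tags = {}
    for part in record.split(";"):
        key, _, value = part.partition("=")
        tags[key.strip().lower()] = value.strip().lower()
    return tags.get("p", "none")


def analyze_domain(domain: str, lookup: TxtLookup) -> Dict[str, object]:
    spf = find_record(lookup, domain, "v=spf1")
    dmarc = find_record(lookup, f"_dmarc.{domain}", "v=dmarc1")
    selectors = [s for s in COMMON_SELECTORS if lookup(f"{s}._domainkey.{domain}")]
    score, recs = 0, []

    # Hypothetical weights: SPF 30, DKIM 30, DMARC 40 (total 100).
    if spf is None:
        recs.append("Publish an SPF record")
    elif spf.rstrip().endswith("-all"):
        score += 30
    else:
        score += 20
        recs.append("Consider -all for a stricter SPF policy")

    if selectors:
        score += 30
    else:
        recs.append("No DKIM key found under common selectors")

    if dmarc is None:
        recs.append("Publish a DMARC record")
    else:
        policy = dmarc_policy(dmarc)
        score += {"reject": 40, "quarantine": 30}.get(policy, 15)
        if policy != "reject":
            recs.append(f"Move DMARC from p={policy} toward p=reject")

    return {"domain": domain, "spf": spf, "dkim_selectors": selectors,
            "dmarc": dmarc, "score": score, "recommendations": recs}
```

In the web app, `lookup` could wrap dnspython's `dns.resolver.resolve(name, "TXT")`, joining each answer's strings and returning `[]` on `NXDOMAIN` or `NoAnswer`; wrapping it in `functools.lru_cache` gives the in-memory cache, and a Flask `/check/<domain>` route can return `jsonify(analyze_domain(domain, lookup))`.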
Learning milestones:
- Display all authentication records → You’ve integrated previous projects
- Provide actionable recommendations → You understand best practices
- Create a usable web interface → You’ve built a product
Project 16 (Capstone): Full Mail Server Stack
- File: EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md
- Main Programming Language: C (core), Python (tooling)
- Alternative Programming Languages: Go, Rust
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Systems Programming / Email Infrastructure
- Software or Tool: Complete Mail Server
- Main Book: “The Linux Programming Interface” by Michael Kerrisk
What you’ll build: A complete, production-grade mail server from scratch: SMTP server (receiving), SMTP client (sending), authentication validation (SPF/DKIM/DMARC), spam filtering, Maildir storage, and IMAP server for retrieval—all without using Postfix, Dovecot, or other existing mail software.
Why this is the ultimate email project: This capstone forces you to understand every piece of the email puzzle and how they fit together. You’ll face real-world challenges: handling malformed input, managing concurrent connections, preventing abuse, and achieving deliverability.
Core challenges you’ll face:
- MTA implementation → maps to reliable message transfer
- Message queue management → maps to durability and retry logic
- Authentication integration → maps to connecting SPF/DKIM/DMARC
- Content filtering → maps to spam and malware detection
- IMAP server → maps to complex stateful protocol
- Configuration management → maps to operational complexity
- Deliverability → maps to reputation and authentication
Key Concepts:
- Mail Server Architecture: “The Linux Programming Interface” Chapters 56-63 - Kerrisk
- All RFCs: 5321 (SMTP), 5322 (Message Format), 6376 (DKIM), 7208 (SPF), 7489 (DMARC), 3501 (IMAP)
- Queue Management: “Release It!” Chapter 5 - Nygard
- Building Mail Servers: “LinuxBabe Mail Server Guide”
- Self-Hosting Email: “How To Run Your Own Mail Server” - c0ffee.net
Difficulty: Master
Time estimate: 2-3 months
Prerequisites: All previous projects, strong systems programming
Real world outcome:
# Start your mail server
$ ./mailserver start --domain mymail.com --config /etc/mymail/config.yaml
[INIT] Loading configuration...
[INIT] Starting SMTP server on port 25 (inbound)
[INIT] Starting SMTP server on port 587 (submission)
[INIT] Starting IMAP server on port 993
[INIT] Loading spam filter model...
[INIT] Initializing authentication validators...
[READY] Mail server for mymail.com is running
# Server receives an email
[SMTP:25] Connection from 209.85.220.41
[SMTP:25] EHLO mail.google.com
[SMTP:25] MAIL FROM:<sender@gmail.com>
[SMTP:25] RCPT TO:<user@mymail.com>
[AUTH] Checking SPF for gmail.com... PASS
[SMTP:25] DATA
[AUTH] Checking DKIM signature... PASS
[AUTH] Checking DMARC alignment... PASS
[SPAM] Score: 0.02 (HAM)
[STORE] Message saved: /var/mail/user/new/1710523456.12345.mymail.com
[SMTP:25] 250 OK: Message accepted
# User connects via IMAP
[IMAP:993] Connection from 192.168.1.100
[IMAP:993] LOGIN user@mymail.com ********
[IMAP:993] SELECT INBOX
[IMAP:993] FETCH 1:* (FLAGS ENVELOPE)
# User reads their email in Thunderbird
# Server sends an email
$ ./mailserver send --from user@mymail.com --to friend@gmail.com \
--subject "Hello from my own server!" --body "This email was sent by my custom mail server!"
[SEND] Looking up MX for gmail.com...
[SEND] Connecting to gmail-smtp-in.l.google.com:25
[SEND] STARTTLS negotiated
[SIGN] Adding DKIM signature (d=mymail.com s=mail2024)
[SEND] Message sent successfully
[SEND] Remote response: 250 2.0.0 OK
# Check mail deliverability
$ ./mailserver test-deliverability --to test@mail-tester.com
Sending test email...
Waiting for results...
Mail-Tester Score: 9.5/10
✓ SPF: PASS
✓ DKIM: PASS
✓ DMARC: PASS
✓ Blacklist: Not listed
✓ PTR record: Configured
✓ Content: Clean
Your mail server is production-ready!
Implementation Hints: Break this into phases: (1) Basic SMTP receive and store (Project 7 expanded), (2) Add authentication checking, (3) Add spam filtering, (4) Add SMTP sending with DKIM signing, (5) Add IMAP server, (6) Add queue management for retries. Use Maildir for storage (reliable, simple). Implement a message queue for outbound mail with retry logic (try again after 15min, 1hr, 4hr, 1day). Sign all outbound mail with DKIM. For production, you’ll also need: PTR record setup, IP warm-up, feedback loop registration with major providers.
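The core is planned in C, but two pieces of the hint are easy to prototype in Python first: Maildir delivery (write into tmp/, fsync, then rename into new/ so a reader never sees a partial file) and the retry schedule. The unique-name format `seconds.M<microseconds>P<pid>Q<counter>.host` follows a common Maildir convention; function names are hypothetical. In C the same delivery steps map to `open(O_CREAT | O_EXCL)`, `fsync`, and `rename`.

```python
import itertools
import os
import socket
import time
from datetime import timedelta
from typing import Optional

# Retry schedule from the hint: 15 min, 1 h, 4 h, 1 day; after that, bounce.
RETRY_DELAYS = [timedelta(minutes=15), timedelta(hours=1),
                timedelta(hours=4), timedelta(days=1)]

_delivery_counter = itertools.count()


def next_retry(failed_attempts: int) -> Optional[timedelta]:
    """Delay before the next try after `failed_attempts` failures, or None to bounce."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be >= 1")
    if failed_attempts <= len(RETRY_DELAYS):
        return RETRY_DELAYS[failed_attempts - 1]
    return None


def deliver_to_maildir(maildir: str, message: bytes) -> str:
    """Write into tmp/, fsync, then rename into new/; returns the final path."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    now = time.time()
    seconds = int(now)
    micros = int((now - seconds) * 1_000_000)
    # Maildir escapes "/" and ":" in the hostname part of file names.
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    name = f"{seconds}.M{micros}P{os.getpid()}Q{next(_delivery_counter)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "xb") as f:  # "x" refuses to overwrite an existing file
        f.write(message)
        f.flush()
        os.fsync(f.fileno())
    new_path = os.path.join(maildir, "new", name)
    os.rename(tmp_path, new_path)  # atomic when tmp/ and new/ share a filesystem
    return new_path
```

Only after `deliver_to_maildir` returns should the SMTP server answer `250 OK`, so an accepted message is already durable on disk.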
Learning milestones:
- Receive and store mail with authentication checking → You understand inbound mail
- Send mail with DKIM that passes Gmail verification → You understand outbound mail
- Full bidirectional email with IMAP access → You’ve built a complete mail system
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor | Business Value |
|---|---|---|---|---|---|
| 1. Raw SMTP Client | Intermediate | 1-2 weeks | ★★★☆☆ | ★★★★☆ | Resume Gold |
| 2. MX Resolver | Intermediate | 1-2 weeks | ★★★☆☆ | ★★★☆☆ | Micro-SaaS |
| 3. SPF Validator | Advanced | 2-3 weeks | ★★★★☆ | ★★★☆☆ | Service & Support |
| 4. DKIM Verifier | Expert | 3-4 weeks | ★★★★★ | ★★★★☆ | Service & Support |
| 5. DKIM Signer | Expert | 2-3 weeks | ★★★★★ | ★★★★☆ | Service & Support |
| 6. Header Analyzer | Intermediate | 1-2 weeks | ★★★☆☆ | ★★★★★ | Micro-SaaS |
| 7. Mini SMTP Server | Advanced | 3-4 weeks | ★★★★☆ | ★★★★★ | Open Core |
| 8. Spam Classifier | Advanced | 2-3 weeks | ★★★★☆ | ★★★★☆ | Micro-SaaS |
| 9. DMARC Engine | Advanced | 2-3 weeks | ★★★★☆ | ★★★☆☆ | Service & Support |
| 10. Reputation Checker | Intermediate | 1-2 weeks | ★★★☆☆ | ★★★★☆ | Micro-SaaS |
| 11. POP3/IMAP Client | Intermediate | 2-3 weeks | ★★★☆☆ | ★★★☆☆ | Resume Gold |
| 12. Relay Tester | Intermediate | 1 week | ★★☆☆☆ | ★★★☆☆ | Service & Support |
| 13. Bounce Parser | Intermediate | 1-2 weeks | ★★★☆☆ | ★★☆☆☆ | Service & Support |
| 14. Phishing Detector | Advanced | 2-3 weeks | ★★★★☆ | ★★★★★ | Service & Support |
| 15. Auth Dashboard | Advanced | 2-3 weeks | ★★★★☆ | ★★★★☆ | Micro-SaaS |
| 16. Full Mail Server | Master | 2-3 months | ★★★★★ | ★★★★★ | Open Core |
Recommended Learning Path
If you’re a beginner to email systems:
- Start with Project 1 (SMTP Client) - This gives you hands-on experience with the core protocol. You’ll see exactly what happens when email is sent.
- Then Project 2 (MX Resolver) - Understand how email finds its destination. This connects DNS to email routing.
- Then Project 6 (Header Analyzer) - Learn to read the forensic trail of every email. Great for debugging and understanding the full picture.
- Then Project 10 (Reputation Checker) - Understand the trust layer that determines deliverability.
If you want to understand authentication deeply:
- Project 3 (SPF Validator) → Project 4 (DKIM Verifier) → Project 9 (DMARC Engine) → Project 15 (Dashboard)
This progression takes you through the entire authentication stack from simple (SPF) to complex (DMARC) to integrated (dashboard).
If you want to build something production-useful:
- Project 6 (Header Analyzer) + Project 14 (Phishing Detector) - Together these make a powerful email security tool.
- Project 15 (Auth Dashboard) - A marketable SaaS product for domain administrators.
If you want the deepest understanding:
- Work through Projects 1-9 in order, then attempt Project 16 (Full Mail Server). This is the “build everything from scratch” path that will make you a true email expert.
Key Resources Summary
RFCs (The Source of Truth)
- RFC 5321 - SMTP Protocol
- RFC 5322 - Email Message Format
- RFC 7208 - SPF
- RFC 6376 - DKIM
- RFC 7489 - DMARC
- RFC 3501 - IMAP
Books
- “TCP/IP Illustrated, Volume 1” by W. Richard Stevens - Networking foundations
- “The Linux Programming Interface” by Michael Kerrisk - Systems programming
- “Serious Cryptography, 2nd Edition” by Jean-Philippe Aumasson - For DKIM cryptography
- “Grokking Algorithms” by Aditya Bhargava - For spam classification
Online Resources
- Cloudflare: SPF/DKIM/DMARC Explained
- LinuxBabe Mail Server Guide
- MXToolbox - Reference implementation for many tools
- Sender Score - IP reputation reference
Summary
| # | Project | Main Language |
|---|---|---|
| 1 | Raw SMTP Client from Scratch | Python |
| 2 | MX Record Resolver & Mail Router | C |
| 3 | SPF Record Parser & Validator | Python |
| 4 | DKIM Signature Verifier | Python |
| 5 | DKIM Signer (Outbound Email) | Python |
| 6 | Email Header Analyzer & Route Visualizer | Python |
| 7 | Mini SMTP Server (Receive & Store) | C |
| 8 | Bayesian Spam Classifier | Python |
| 9 | DMARC Policy Engine | Python |
| 10 | Email Reputation & Blacklist Checker | Python |
| 11 | POP3/IMAP Client | Python |
| 12 | Open Relay Tester | Python |
| 13 | Email Bounce Parser & Categorizer | Python |
| 14 | Phishing Email Detector | Python |
| 15 | Email Authentication Dashboard | Python (Flask) |
| 16 | Full Mail Server Stack (Capstone) | C + Python |
By completing these projects, you’ll transform from someone who “uses email” to someone who truly understands how the global email system works—from the moment you click send to the spam filter’s decision to deliver or reject.