Project 13: Email Bounce Parser and Categorizer
Build a parser that reads DSN messages and categorizes bounces by reason.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Language | Python (Alternatives: Go, Rust) |
| Prerequisites | MIME parsing, SMTP codes |
| Key Topics | DSN format, status codes, categorization |
1. Learning Objectives
- Parse DSN (Delivery Status Notification) messages.
- Extract bounce codes and human-readable reasons.
- Categorize bounces into hard/soft/blocked types.
- Produce a report useful for deliverability monitoring.
2. Theoretical Foundation
2.1 Core Concepts
- DSN: A standardized MIME report describing delivery failures.
- Status codes: The first digit gives the class. 2.x.x means success, 4.x.x a persistent transient failure, and 5.x.x a permanent failure.
- Hard vs soft bounces: Permanent vs temporary failures.
- Enhanced status codes: Provide structured failure categories.
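As a minimal sketch of the class.subject.detail structure, the snippet below splits an enhanced status code into its three numeric parts following the RFC 3463 syntax. The names `EnhancedStatus`, `parse_enhanced_status`, and `CLASS_MEANING` are illustrative, not part of any library.

```python
import re
from typing import NamedTuple, Optional

# RFC 3463 syntax: class "." subject "." detail, where class is 2, 4, or 5
# and subject/detail are one to three digits each.
_STATUS_RE = re.compile(r"^([245])\.(\d{1,3})\.(\d{1,3})$")

CLASS_MEANING = {
    2: "success",
    4: "persistent transient failure",
    5: "permanent failure",
}


class EnhancedStatus(NamedTuple):
    klass: int    # 2, 4, or 5
    subject: int  # e.g. 1 = addressing, 2 = mailbox, 7 = security or policy
    detail: int


def parse_enhanced_status(text: str) -> Optional[EnhancedStatus]:
    """Return the parsed code, or None if the text is not a valid code."""
    match = _STATUS_RE.match(text.strip())
    if match is None:
        return None
    return EnhancedStatus(*(int(group) for group in match.groups()))
```

For example, `parse_enhanced_status("5.1.1")` yields class 5, subject 1, detail 1, while a string such as `"3.0.0"` is rejected because 3 is not a defined class.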
2.2 Why This Matters
Bounce handling is essential for list hygiene and deliverability. Misclassifying bounces can harm sender reputation.
2.3 Historical Context / Background
The DSN format was first specified in RFC 1894 (1996) and revised in RFC 3464 (2003). It replaced inconsistent, free-text bounce messages with a structured, machine-readable standard.
2.4 Common Misconceptions
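- Misconception: All 5xx are permanent. Reality: some 5xx rejections are policy-based (for example, 5.7.x security or policy codes) and can clear once the underlying policy issue is resolved.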
- Misconception: Subject text is enough. Reality: DSN fields are more reliable.
3. Project Specification
3.1 What You Will Build
A CLI tool that reads an email bounce message, extracts DSN fields, and outputs a category and recommended action.
3.2 Functional Requirements
- Parse MIME multipart/report with message/delivery-status.
- Extract status codes, diagnostic code, and recipient.
- Categorize into hard/soft/blocked/unknown.
- Output structured JSON and text summary.
3.3 Non-Functional Requirements
- Performance: Parse a DSN in under 100 ms.
- Reliability: Handle non-standard bounces gracefully.
- Usability: Provide clear category and reason.
3.4 Example Usage / Output
$ ./bounce-parse bounce.eml
Recipient: user@example.com
Status: 5.1.1
Category: hard_bounce
Reason: mailbox does not exist
Action: suppress address
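One possible way to produce both the text summary above and the JSON form is sketched here. `BounceReport`, `to_text`, and `to_json` are hypothetical names, and the `action` field is an assumption added to carry the recommended action shown in the example output.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BounceReport:
    # Hypothetical record; the fields mirror the example output lines.
    recipient: str
    status: str
    category: str
    reason: str
    action: str


def to_text(report: BounceReport) -> str:
    """Render the human-readable summary, one field per line."""
    return "\n".join([
        f"Recipient: {report.recipient}",
        f"Status: {report.status}",
        f"Category: {report.category}",
        f"Reason: {report.reason}",
        f"Action: {report.action}",
    ])


def to_json(report: BounceReport) -> str:
    """Render the same fields as a JSON object for downstream tools."""
    return json.dumps(asdict(report), indent=2)
```

Keeping both renderers on top of one record type means the text and JSON outputs cannot drift apart.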
3.5 Real World Outcome
You can automate list hygiene and suppression decisions based on reliable bounce categorization.
4. Solution Architecture
4.1 High-Level Design
Message Parser
-> MIME DSN Extractor
-> Status Code Mapper
-> Category Engine
-> Report Generator
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Parser | Parse RFC 5322 messages | Handle folded headers |
| DSN Extractor | Extract delivery-status part | MIME boundary handling |
| Mapper | Interpret status codes | Use RFC 3463 categories |
| Reporter | Output results | JSON and text |
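The Mapper's job can be sketched as a small function from status code to category and action. The rules here are assumptions for illustration: subject 7 (security or policy in RFC 3463) is treated as blocked, any other class 5 code as a hard bounce, and any other class 4 code as a soft bounce. A production mapping would likely need finer distinctions and provider-specific overrides. `categorize` and `ACTIONS` are hypothetical names.

```python
from typing import Tuple

# Hypothetical recommended actions per category.
ACTIONS = {
    "hard_bounce": "suppress address",
    "soft_bounce": "retry later",
    "blocked": "review sending policy or reputation",
    "unknown": "review manually",
}


def categorize(status: str) -> Tuple[str, str]:
    """Map an enhanced status code such as '5.1.1' to (category, action)."""
    parts = status.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        category = "unknown"
    else:
        klass, subject = parts[0], parts[1]
        if klass in ("4", "5") and subject == "7":
            # Assumption: x.7.x (security or policy) means "blocked".
            category = "blocked"
        elif klass == "5":
            category = "hard_bounce"
        elif klass == "4":
            category = "soft_bounce"
        else:
            # Class 2 (success) or anything unexpected is not a bounce we can classify.
            category = "unknown"
    return category, ACTIONS[category]
```

Checking the policy subject before the class is deliberate: a 5.7.1 rejection is permanent for that attempt, but suppressing the address would be the wrong response if the real problem is the sender's reputation.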
4.3 Data Structures
class BounceResult:
    def __init__(self, recipient, status, category, reason):
        self.recipient = recipient
        self.status = status
        self.category = category
        self.reason = reason
4.4 Algorithm Overview
Key Algorithm: DSN Parsing
- Parse MIME parts and find delivery-status.
- Extract fields like Final-Recipient, Status, Diagnostic-Code.
- Map status to category.
Complexity Analysis:
- Time: O(n), where n is the message size
- Space: O(n) when the whole message is held in memory
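The first two steps can be sketched with Python's standard `email` package. This sketch relies on one stdlib behavior: when the parser meets a message/delivery-status part, it exposes each header block (the per-message block, then one per recipient) as a nested message object. `SAMPLE_DSN` is a made-up message and `extract_dsn_fields` is a hypothetical helper name.

```python
from email import message_from_string, policy
from typing import Dict, List

# Made-up DSN for illustration only.
SAMPLE_DSN = """\
From: MAILER-DAEMON@mail.example.com
To: sender@example.org
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain

Delivery to user@example.com failed.

--BOUNDARY
Content-Type: message/delivery-status

Reporting-MTA: dns; mail.example.com

Final-Recipient: rfc822; user@example.com
Action: failed
Status: 5.1.1
Diagnostic-Code: smtp; 550 5.1.1 User unknown

--BOUNDARY--
"""


def extract_dsn_fields(raw: str) -> List[Dict[str, str]]:
    """Return one dict per recipient block found in the delivery-status part."""
    msg = message_from_string(raw, policy=policy.default)
    results = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        blocks = part.get_payload()
        if not isinstance(blocks, list):
            continue  # malformed part; nothing structured to read
        for block in blocks:
            final_recipient = block["Final-Recipient"]
            if final_recipient is None:
                continue  # per-message block (Reporting-MTA etc.)
            # Field format is "address-type; address", e.g. "rfc822; user@example.com".
            address = str(final_recipient).split(";", 1)[-1].strip()
            results.append({
                "recipient": address,
                "status": str(block.get("Status", "")).strip(),
                "diagnostic": str(block.get("Diagnostic-Code", "")).strip(),
            })
    return results
```

Calling `extract_dsn_fields(SAMPLE_DSN)` returns a single entry for user@example.com with status 5.1.1. A DSN that reports several recipients produces one entry per recipient block.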
5. Implementation Guide
5.1 Development Environment Setup
python -m venv .venv
source .venv/bin/activate
5.2 Project Structure
bounce-parser/
├── mime_parser.py
├── dsn_extract.py
├── categorize.py
└── report.py
5.3 The Core Question You’re Answering
“Why did this message bounce, and what should we do about it?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- DSN MIME structure
- Enhanced status codes
- SMTP reply codes
- Policy vs user errors
5.5 Questions to Guide Your Design
- How will you handle bounces that are not DSN formatted?
- Which status codes map to soft vs hard?
- How will you extract a clear reason string?
5.6 Thinking Exercise
If the status code is 4.2.2 (mailbox full), should this be a soft or hard bounce? Why?
5.7 The Interview Questions They’ll Ask
- “What is the difference between SMTP codes and enhanced status codes?”
- “How do you categorize bounces reliably?”
- “Why are soft bounces retried?”
5.8 Hints in Layers
Hint 1: Target delivery-status part
- Look for the message/delivery-status MIME type.
Hint 2: Use RFC 3463 mapping
- The first digit of the status code indicates the class: 2 (success), 4 (temporary failure), or 5 (permanent failure).
Hint 3: Provide fallback parsing
- For non-DSN bounces, use heuristics.
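A rough sketch of the fallback in Hint 3 follows. It searches free text for an enhanced status code, then for known phrases, then for a bare SMTP reply code. The phrase table is a small hypothetical sample, and the phrase-to-code choices are assumptions (for example, some servers report a full mailbox as 5.2.2 rather than 4.2.2). `guess_status` is an illustrative name.

```python
import re
from typing import Optional

# An enhanced status code anywhere in the text, e.g. "550 5.1.1 User unknown".
ENHANCED_RE = re.compile(r"\b([245]\.\d{1,3}\.\d{1,3})\b")
# A bare three-digit SMTP reply code of class 4xx or 5xx.
REPLY_RE = re.compile(r"\b([45]\d{2})\b")

# Hypothetical phrase table; extend it from real bounce samples.
PHRASES = [
    ("user unknown", "5.1.1"),
    ("no such user", "5.1.1"),
    ("mailbox full", "4.2.2"),
    ("over quota", "4.2.2"),
]


def guess_status(body: str) -> Optional[str]:
    """Best-effort status guess for a bounce that carries no DSN part."""
    match = ENHANCED_RE.search(body)
    if match:
        return match.group(1)
    lowered = body.lower()
    for phrase, status in PHRASES:
        if phrase in lowered:
            return status
    match = REPLY_RE.search(body)
    if match:
        # Only the class is known, so report the generic X.0.0 code.
        return f"{match.group(1)[0]}.0.0"
    return None
```

These patterns can produce false positives (part of an IP address or an unrelated number may match), so results from this path are best marked as lower confidence than fields read from a real DSN.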
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| DSN format | RFC 3464 | All |
| Status codes | RFC 3463 | Sections 3-4 |
| SMTP codes | RFC 5321 | Section 4 |
5.10 Implementation Phases
Phase 1: Foundation (3-4 days)
Goals:
- Parse MIME and extract DSN
Tasks:
- Parse headers and MIME boundaries.
- Extract delivery-status fields.
Checkpoint: Print Status and Recipient.
Phase 2: Core Functionality (4-5 days)
Goals:
- Categorize bounces
Tasks:
- Map enhanced status codes.
- Add category logic.
Checkpoint: Correct category for common codes.
Phase 3: Polish and Edge Cases (2-3 days)
Goals:
- Handle non-standard bounces
Tasks:
- Add heuristic parsing for text bounces.
- Output detailed reports.
Checkpoint: Robust output for varied bounces.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing | full MIME vs minimal | minimal with DSN focus | Scope control |
| Categorization | simple vs detailed | detailed mapping | Better actionability |
| Output | text vs JSON | both | Integrations |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Status mapping | 5.1.1 -> hard |
| Integration Tests | DSN samples | postfix bounce |
| Edge Case Tests | Non-DSN bounce | heuristics |
6.2 Critical Test Cases
- 5.1.1 categorized as hard bounce.
- 4.2.2 categorized as soft bounce.
- Policy rejection flagged as blocked.
6.3 Test Data
Status: 5.1.1
Diagnostic-Code: smtp; 550 5.1.1 User unknown
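The Diagnostic-Code line above has the form "type; text". For the smtp type, the text is usually the remote server's reply. The sketch below splits such a value into its parts. `Diagnostic` and `parse_diagnostic` are hypothetical names, and the regular expression assumes a single-line reply.

```python
import re
from typing import NamedTuple, Optional


class Diagnostic(NamedTuple):
    dtype: str                 # diagnostic type, e.g. "smtp"
    reply_code: Optional[int]  # three-digit SMTP reply code, if found
    enhanced: Optional[str]    # enhanced status code, if found
    text: str                  # remaining human-readable text


# Reply code, separator, optional enhanced code, then free text.
_SMTP_RE = re.compile(
    r"^(\d{3})[ -](?:([245]\.\d{1,3}\.\d{1,3})(?:\s+|$))?(.*)$", re.DOTALL
)


def parse_diagnostic(value: str) -> Diagnostic:
    dtype, sep, text = value.partition(";")
    if not sep:
        # No "type;" prefix: keep everything as text with an empty type.
        return Diagnostic("", None, None, value.strip())
    dtype, text = dtype.strip().lower(), text.strip()
    if dtype == "smtp":
        match = _SMTP_RE.match(text)
        if match:
            return Diagnostic(dtype, int(match.group(1)), match.group(2),
                              match.group(3).strip())
    return Diagnostic(dtype, None, None, text)
```

With the test data above, `parse_diagnostic("smtp; 550 5.1.1 User unknown")` gives reply code 550, enhanced code 5.1.1, and the text "User unknown", which can serve as the reason string.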
7. Common Pitfalls and Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Ignoring enhanced status | wrong category | map 5.x.x and 4.x.x |
| Parsing wrong MIME part | empty results | find delivery-status part |
| No fallback | fail on non-DSN | add heuristics |
7.2 Debugging Strategies
- Print MIME part list.
- Use sample bounces from multiple providers.
7.3 Performance Traps
- Running a full MIME parse over messages with large attachments. Parse only the headers and the DSN part.
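One cheap way to avoid unnecessary work is to inspect only the top-level headers before deciding whether a full parse is worthwhile. The sketch below covers just that pre-check, not the selective parsing of the DSN part. It uses the standard library's `BytesHeaderParser`; `header_block` and `looks_like_dsn` are hypothetical helper names.

```python
from email.parser import BytesHeaderParser


def header_block(raw: bytes) -> bytes:
    """Return only the bytes before the first blank line (the top-level headers)."""
    cuts = [i for i in (raw.find(b"\r\n\r\n"), raw.find(b"\n\n")) if i != -1]
    return raw[: min(cuts)] if cuts else raw


def looks_like_dsn(raw: bytes) -> bool:
    """True if the top-level Content-Type declares a delivery-status report."""
    headers = BytesHeaderParser().parsebytes(header_block(raw))
    report_type = headers.get_param("report-type", "")
    return (
        headers.get_content_type() == "multipart/report"
        and str(report_type).lower() == "delivery-status"
    )
```

Messages that fail this check can go straight to the heuristic path without building a MIME tree.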
8. Extensions and Challenges
8.1 Beginner Extensions
- Add user-friendly reason messages.
- Add bulk processing of mailboxes.
8.2 Intermediate Extensions
- Store results in a database.
- Track bounce history per recipient.
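Both intermediate extensions could start from something like the sketch below, which uses the standard `sqlite3` module. The table layout and the function names `open_store`, `record`, and `history` are assumptions for illustration.

```python
import sqlite3
from typing import Dict


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the bounce store; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bounces ("
        " recipient TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " seen_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record(conn: sqlite3.Connection, recipient: str, status: str, category: str) -> None:
    """Append one bounce event; the 'with' block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO bounces (recipient, status, category) VALUES (?, ?, ?)",
            (recipient, status, category),
        )


def history(conn: sqlite3.Connection, recipient: str) -> Dict[str, int]:
    """Return how many bounces of each category a recipient has accumulated."""
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM bounces WHERE recipient = ? GROUP BY category",
        (recipient,),
    )
    return dict(rows.fetchall())
```

Per-category counts like these make it possible to act on patterns, such as an address that keeps producing soft bounces, rather than on a single event.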
8.3 Advanced Extensions
- Integrate with suppression list automation.
- Add provider-specific rules.
9. Real-World Connections
9.1 Industry Applications
- Email platforms use bounce parsing for list hygiene.
- Deliverability teams analyze bounce trends.
9.2 Related Open Source Projects
- bounce-mail: https://github.com/dvermeirjr/bounce-mail
- mail (Ruby mail library): https://github.com/mikel/mail
9.3 Interview Relevance
- Handling DSNs and SMTP status codes is common in email systems roles.
10. Resources
10.1 Essential Reading
- RFC 3464 - DSN format
- RFC 3463 - Enhanced status codes
10.2 Video Resources
- Bounce handling walkthroughs
10.3 Tools and Documentation
- Sample DSNs from Postfix/Exim
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain DSN structure
- I understand enhanced status codes
- I can categorize bounces correctly
11.2 Implementation
- Parses DSN and extracts fields
- Categorizes with correct action
- Handles non-DSN bounces
11.3 Growth
- I can integrate bounce results into list hygiene
- I can explain policy-based failures
12. Submission / Completion Criteria
Minimum Viable Completion:
- Parse DSN and output status and recipient
Full Completion:
- Categorize bounce and provide recommended action
Excellence (Going Above and Beyond):
- Bulk processing and suppression list integration
This guide was generated from EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md. For the complete learning path, see the parent directory.