Project 13: Email Bounce Parser and Categorizer

Build a parser that reads DSN messages and categorizes bounces by reason.

Quick Reference

Attribute       Value
Difficulty      Intermediate
Time Estimate   1-2 weeks
Language        Python (Alternatives: Go, Rust)
Prerequisites   MIME parsing, SMTP codes
Key Topics      DSN format, status codes, categorization

1. Learning Objectives

  1. Parse DSN (Delivery Status Notification) messages.
  2. Extract bounce codes and human-readable reasons.
  3. Categorize bounces into hard/soft/blocked types.
  4. Produce a report useful for deliverability monitoring.

2. Theoretical Foundation

2.1 Core Concepts

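  • DSN: A standardized MIME report (multipart/report containing a message/delivery-status part, RFC 3464) describing the outcome of a delivery attempt, most often a failure.
  • Status codes: The first digit gives the class: 2.x.x for success, 4.x.x for a persistent transient failure, 5.x.x for a permanent failure.
  • Hard vs soft bounces: Permanent failures (suppress the address) vs temporary failures (retry later).
  • Enhanced status codes: The class.subject.detail codes defined in RFC 3463. The subject and detail digits give a structured failure category, such as 5.1.1 for a bad destination mailbox address.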
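
To make the class.subject.detail structure concrete, here is a minimal sketch that splits an enhanced status code into its three numeric parts. The names `EnhancedStatus` and `parse_status` are illustrative, not part of any library.

```python
import re
from typing import NamedTuple, Optional

# class "." subject "." detail: class is 2, 4, or 5; the other two
# fields are one to three digits each (RFC 3463 syntax).
_STATUS_RE = re.compile(r"([245])\.(\d{1,3})\.(\d{1,3})", re.ASCII)

CLASS_MEANING = {
    2: "success",
    4: "persistent transient failure",
    5: "permanent failure",
}


class EnhancedStatus(NamedTuple):
    klass: int    # 2, 4, or 5
    subject: int  # e.g. 1 = addressing, 2 = mailbox, 7 = security/policy
    detail: int   # specific condition within the subject


def parse_status(value: str) -> Optional[EnhancedStatus]:
    """Return the parsed code, or None if the text is not a valid code."""
    match = _STATUS_RE.fullmatch(value.strip())
    if match is None:
        return None
    return EnhancedStatus(*(int(group) for group in match.groups()))


status = parse_status("5.1.1")
if status is not None:
    print(status, CLASS_MEANING[status.klass])
```

Returning `None` for malformed input keeps the "unknown" decision in the caller, which is one reasonable design among several.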

2.2 Why This Matters

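Bounce handling is essential for list hygiene and deliverability. Misclassifying bounces is costly in both directions: treating a hard bounce as soft means repeatedly mailing dead addresses, which harms sender reputation, while treating a soft bounce as hard drops recipients who could still be reached.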

2.3 Historical Context / Background

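The DSN format was standardized in RFC 1894 (1996) and updated by RFC 3464 (2003) to replace the inconsistent, free-text bounce messages produced by individual mail servers with a structured, machine-readable report.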

2.4 Common Misconceptions

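  • Misconception: All 5xx are permanent. Reality: some are policy-based (for example, a 5.7.x rejection of a blocklisted sender) and can clear once the underlying policy issue is resolved.
  • Misconception: Subject text is enough. Reality: subject lines vary by mail server and language; the DSN fields (Status, Action, Diagnostic-Code) are more reliable.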

3. Project Specification

3.1 What You Will Build

A CLI tool that reads an email bounce message, extracts DSN fields, and outputs a category and recommended action.

3.2 Functional Requirements

  1. Parse MIME multipart/report with message/delivery-status.
  2. Extract status codes, diagnostic code, and recipient.
  3. Categorize into hard/soft/blocked/unknown.
  4. Output structured JSON and text summary.

3.3 Non-Functional Requirements

  • Performance: Parse a DSN in under 100 ms.
  • Reliability: Handle non-standard bounces gracefully.
  • Usability: Provide clear category and reason.

3.4 Example Usage / Output

$ ./bounce-parse bounce.eml
Recipient: user@example.com
Status: 5.1.1
Category: hard_bounce
Reason: mailbox does not exist
Action: suppress address
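
As a rough sketch of the two output formats, the functions below render one result as the text summary shown above and as JSON. The dictionary keys and the function names `render_text` and `render_json` are assumptions made for illustration.

```python
import json

# (key, label) pairs in the order used by the text summary.
_FIELDS = [
    ("recipient", "Recipient"),
    ("status", "Status"),
    ("category", "Category"),
    ("reason", "Reason"),
    ("action", "Action"),
]


def render_text(result: dict) -> str:
    """One 'Label: value' line per field; missing fields show as 'unknown'."""
    return "\n".join(
        f"{label}: {result.get(key, 'unknown')}" for key, label in _FIELDS
    )


def render_json(result: dict) -> str:
    """Stable, human-readable JSON for downstream tools."""
    return json.dumps(result, indent=2, sort_keys=True)


example = {
    "recipient": "user@example.com",
    "status": "5.1.1",
    "category": "hard_bounce",
    "reason": "mailbox does not exist",
    "action": "suppress address",
}
print(render_text(example))
print(render_json(example))
```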

3.5 Real World Outcome

You can automate list hygiene and suppression decisions based on reliable bounce categorization.


4. Solution Architecture

4.1 High-Level Design

Message Parser
  -> MIME DSN Extractor
  -> Status Code Mapper
  -> Category Engine
  -> Report Generator

4.2 Key Components

Component       Responsibility                 Key Decisions
Parser          Parse RFC 5322                 Handle folded headers
DSN Extractor   Extract delivery-status part   MIME boundary handling
Mapper          Interpret status codes         Use RFC 3463 categories
Reporter        Output results                 JSON and text

4.3 Data Structures

from dataclasses import dataclass


@dataclass
class BounceResult:
    recipient: str  # address taken from Final-Recipient
    status: str     # enhanced status code, e.g. "5.1.1"
    category: str   # hard, soft, blocked, or unknown
    reason: str     # human-readable explanation

4.4 Algorithm Overview

Key Algorithm: DSN Parsing

  1. Parse MIME parts and find delivery-status.
  2. Extract fields like Final-Recipient, Status, Diagnostic-Code.
  3. Map status to category.

Complexity Analysis:

  • Time: O(n), where n is the message size in bytes
  • Space: O(n) when the whole message is held in memory

5. Implementation Guide

5.1 Development Environment Setup

python -m venv .venv
source .venv/bin/activate

5.2 Project Structure

bounce-parser/
├── mime_parser.py
├── dsn_extract.py
├── categorize.py
└── report.py

5.3 The Core Question You’re Answering

“Why did this message bounce, and what should we do about it?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. DSN MIME structure
  2. Enhanced status codes
  3. SMTP reply codes
  4. Policy vs user errors

5.5 Questions to Guide Your Design

  1. How will you handle bounces that are not DSN formatted?
  2. Which status codes map to soft vs hard?
  3. How will you extract a clear reason string?

5.6 Thinking Exercise

If the status code is 4.2.2 (mailbox full), should this be a soft or hard bounce? Why?

5.7 The Interview Questions They’ll Ask

  1. “What is the difference between SMTP codes and enhanced status codes?”
  2. “How do you categorize bounces reliably?”
  3. “Why are soft bounces retried?”

5.8 Hints in Layers

Hint 1: Target delivery-status part

  • Look for message/delivery-status MIME type.

Hint 2: Use RFC 3463 mapping

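  • The first digit of status indicates class (2 success, 4 temporary, 5 permanent); the second digit gives the subject, such as 1 for addressing, 2 for mailbox, or 7 for security and policy.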

Hint 3: Provide fallback parsing

  • For non-DSN bounces, use heuristics.
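
One possible shape for such heuristics is a set of regular expressions run over the plain-text body. The patterns below are deliberately simple assumptions and will miss many real-world formats; `guess_from_text` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Enhanced code such as 5.1.1, not embedded in a longer dotted number
# (this avoids matching fragments of IP addresses).
_ENHANCED_RE = re.compile(r"(?<![\d.])([245]\.\d{1,3}\.\d{1,3})(?!\.?\d)")
# Three-digit SMTP reply code in the 4xx or 5xx range.
_SMTP_RE = re.compile(r"(?<![\d.])([45]\d{2})(?!\.?\d)")
# Loose email address pattern; good enough for a first guess.
_ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def guess_from_text(body: str) -> Dict[str, Optional[str]]:
    """Best-effort extraction; every field may be None."""
    enhanced = _ENHANCED_RE.search(body)
    smtp = _SMTP_RE.search(body)
    address = _ADDRESS_RE.search(body)
    return {
        "recipient": address.group(0) if address else None,
        "status": enhanced.group(1) if enhanced else None,
        "smtp_code": smtp.group(1) if smtp else None,
    }


text = (
    "Delivery to the following recipient failed permanently:\n\n"
    "    user@example.com\n\n"
    "The error was: 550 5.1.1 User unknown\n"
)
print(guess_from_text(text))
```

Results from this path are less trustworthy than DSN fields, so it may be worth marking them as heuristic in the report.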

5.9 Books That Will Help

Topic          Book       Chapter
DSN format     RFC 3464   All
Status codes   RFC 3463   Sections 2-3
SMTP codes     RFC 5321   Section 4

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

Goals:

  • Parse MIME and extract DSN

Tasks:

  1. Parse headers and MIME boundaries.
  2. Extract delivery-status fields.

Checkpoint: Print Status and Recipient.

Phase 2: Core Functionality (4-5 days)

Goals:

  • Categorize bounces

Tasks:

  1. Map enhanced status codes.
  2. Add category logic.

Checkpoint: Correct category for common codes.
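
A first version of the category logic might look like the sketch below. The rules are assumptions chosen for illustration: subject 7 (security or policy) maps to blocked regardless of class, and every action string other than "suppress address" is hypothetical.

```python
import re

_STATUS_RE = re.compile(r"([245])\.(\d{1,3})\.(\d{1,3})", re.ASCII)

# Hypothetical actions; only "suppress address" comes from the example output.
ACTIONS = {
    "hard_bounce": "suppress address",
    "soft_bounce": "retry later",
    "blocked": "review sending policy and reputation",
    "unknown": "review manually",
}


def categorize(status: str) -> str:
    """Map an enhanced status code to hard_bounce/soft_bounce/blocked/unknown."""
    match = _STATUS_RE.fullmatch(status.strip())
    if match is None:
        return "unknown"
    klass, subject = int(match.group(1)), int(match.group(2))
    if klass == 2:
        return "unknown"  # a success report is not a bounce
    if subject == 7:
        return "blocked"  # security or policy rejection
    return "hard_bounce" if klass == 5 else "soft_bounce"


for code in ("5.1.1", "4.2.2", "5.7.1"):
    category = categorize(code)
    print(code, category, ACTIONS[category])
```

Whether a temporary policy deferral such as 4.7.x should count as blocked or soft is a judgment call; a detailed mapping would likely refine this per subject and detail.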

Phase 3: Polish and Edge Cases (2-3 days)

Goals:

  • Handle non-standard bounces

Tasks:

  1. Add heuristic parsing for text bounces.
  2. Output detailed reports.

Checkpoint: Robust output for varied bounces.

5.11 Key Implementation Decisions

Decision         Options                Recommendation           Rationale
Parsing          full MIME vs minimal   minimal with DSN focus   Scope control
Categorization   simple vs detailed     detailed mapping         Better actionability
Output           text vs JSON           both                     Integrations

6. Testing Strategy

6.1 Test Categories

Category            Purpose          Examples
Unit Tests          Status mapping   5.1.1 -> hard
Integration Tests   DSN samples      Postfix bounce
Edge Case Tests     Non-DSN bounce   heuristics

6.2 Critical Test Cases

  1. 5.1.1 categorized as hard bounce.
  2. 4.2.2 categorized as soft bounce.
  3. Policy rejection flagged as blocked.

6.3 Test Data

Status: 5.1.1
Diagnostic-Code: smtp; 550 5.1.1 User unknown
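
The Diagnostic-Code value above has the form "type; text". For the smtp type, the text is the remote server's reply, which usually starts with a three-digit code and often an enhanced code. A hedged sketch of splitting it apart follows; `Diagnostic` and `parse_diagnostic` are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class Diagnostic(NamedTuple):
    diag_type: str             # e.g. "smtp"
    smtp_code: Optional[int]   # e.g. 550
    enhanced: Optional[str]    # e.g. "5.1.1"
    text: str                  # remaining human-readable reason


# SMTP reply: code, separator, optional enhanced code, free text.
_REPLY_RE = re.compile(
    r"(?P<code>[2-5]\d{2})[ -]+"
    r"(?:(?P<enh>[245]\.\d{1,3}\.\d{1,3})\s+)?"
    r"(?P<text>.*)",
    re.DOTALL,
)


def parse_diagnostic(value: str) -> Diagnostic:
    diag_type, sep, rest = value.partition(";")
    if not sep:
        # No type prefix at all: keep the raw text as the reason.
        return Diagnostic("", None, None, value.strip())
    diag_type, rest = diag_type.strip().lower(), rest.strip()
    match = _REPLY_RE.fullmatch(rest)
    if diag_type != "smtp" or match is None:
        return Diagnostic(diag_type, None, None, rest)
    return Diagnostic(
        "smtp", int(match["code"]), match["enh"], match["text"].strip()
    )


print(parse_diagnostic("smtp; 550 5.1.1 User unknown"))
```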

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall                    Symptom           Solution
Ignoring enhanced status   wrong category    map 5.x.x and 4.x.x
Parsing wrong MIME part    empty results     find delivery-status part
No fallback                fail on non-DSN   add heuristics

7.2 Debugging Strategies

  • Print MIME part list.
  • Use sample bounces from multiple providers.

7.3 Performance Traps

  • Fully parsing messages with large attachments (such as the returned original message) wastes time and memory. Parse only the headers and the delivery-status part.

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add user-friendly reason messages.
  • Add bulk processing of mailboxes.

8.2 Intermediate Extensions

  • Store results in a database.
  • Track bounce history per recipient.
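
As a starting point for both extensions, the sketch below stores results in SQLite and counts bounces per recipient. The table layout and the function names are assumptions, not a prescribed schema.

```python
import sqlite3
from typing import Dict


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the bounce history store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bounces ("
        " recipient TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " seen_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_bounce(conn: sqlite3.Connection, recipient: str,
                  status: str, category: str) -> None:
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO bounces (recipient, status, category)"
            " VALUES (?, ?, ?)",
            (recipient.lower(), status, category),
        )


def count_by_category(conn: sqlite3.Connection, recipient: str) -> Dict[str, int]:
    rows = conn.execute(
        "SELECT category, COUNT(*) FROM bounces"
        " WHERE recipient = ? GROUP BY category",
        (recipient.lower(),),
    )
    return dict(rows.fetchall())


conn = open_history()
record_bounce(conn, "User@example.com", "4.2.2", "soft_bounce")
record_bounce(conn, "user@example.com", "4.2.2", "soft_bounce")
record_bounce(conn, "user@example.com", "5.1.1", "hard_bounce")
print(count_by_category(conn, "user@example.com"))
```

Per-recipient counts like these make it possible to escalate repeated soft bounces to suppression after a threshold of your choosing.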

8.3 Advanced Extensions

  • Integrate with suppression list automation.
  • Add provider-specific rules.

9. Real-World Connections

9.1 Industry Applications

  • Email platforms use bounce parsing for list hygiene.
  • Deliverability teams analyze bounce trends.

9.2 Related Open Source Projects

  • bounce-mail: https://github.com/dvermeirjr/bounce-mail
  • mailparser: https://github.com/mikel/mail

9.3 Interview Relevance

  • Handling DSNs and SMTP status codes is common in email systems roles.

10. Resources

10.1 Essential Reading

  • RFC 3464 - DSN format
  • RFC 3463 - Enhanced status codes

10.2 Video Resources

  • Bounce handling walkthroughs

10.3 Tools and Documentation

  • Sample DSNs from Postfix/Exim

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain DSN structure
  • I understand enhanced status codes
  • I can categorize bounces correctly

11.2 Implementation

  • Parses DSN and extracts fields
  • Categorizes with correct action
  • Handles non-DSN bounces

11.3 Growth

  • I can integrate bounce results into list hygiene
  • I can explain policy-based failures

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Parse DSN and output status and recipient

Full Completion:

  • Categorize bounce and provide recommended action

Excellence (Going Above and Beyond):

  • Bulk processing and suppression list integration

This guide was generated from EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md. For the complete learning path, see the parent directory.