Project 4: DKIM Signature Verifier

Build a verifier that parses DKIM-Signature headers and validates signatures against DNS-published keys.

Quick Reference

Attribute | Value
Difficulty | Expert
Time Estimate | 3-4 weeks
Language | Python (Alternatives: Go, Rust, C)
Prerequisites | DNS TXT, RSA signatures, MIME headers
Key Topics | Canonicalization, body hash, DKIM tags

1. Learning Objectives

  1. Parse DKIM-Signature headers into tag-value pairs.
  2. Implement canonicalization (simple and relaxed) for headers and body.
  3. Fetch DKIM public keys via DNS and validate signatures.
  4. Output pass/fail with clear diagnostic messages.

2. Theoretical Foundation

2.1 Core Concepts

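  • DKIM: DomainKeys Identified Mail uses cryptographic signatures so that a signing domain can take responsibility for a message and receivers can detect changes to the signed content.
  • Canonicalization: Rules for normalizing headers and body before hashing. Relaxed tolerates common in-transit whitespace changes; simple tolerates almost none.
  • Body hash: The bh= tag is the base64-encoded hash (using the hash named in a=, e.g. SHA-256) of the canonicalized body.
  • Selector and domain: s= and d= combine into the DNS name <s>._domainkey.<d>, whose TXT record holds the public key.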
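The selector-plus-domain rule is small but easy to get slightly wrong, so here is a minimal sketch of building the key name and parsing the key record. The function names are hypothetical. The network lookup assumes the third-party dnspython package (dns.resolver.resolve) and imports it lazily, so the two pure helpers run without it.

```python
def dkim_key_name(selector: str, domain: str) -> str:
    """Build the DNS name that holds the DKIM public key: <s>._domainkey.<d>."""
    return f"{selector}._domainkey.{domain.rstrip('.')}"


def parse_key_record(txt: str) -> dict[str, str]:
    """Parse a key record such as 'v=DKIM1; k=rsa; p=MIGf...' into a tag dict."""
    tags = {}
    for part in txt.split(";"):
        if "=" not in part:
            continue  # skip empty segments, e.g. after a trailing ';'
        name, value = part.split("=", 1)
        name, value = name.strip(), value.strip()
        if name == "p":
            value = "".join(value.split())  # base64 key data may contain whitespace
        tags[name] = value
    return tags


def fetch_key_record(selector: str, domain: str) -> str:
    """Fetch the TXT record (assumes the third-party dnspython package)."""
    import dns.resolver  # lazy import: the helpers above work without dnspython

    answer = dns.resolver.resolve(dkim_key_name(selector, domain), "TXT")
    for rdata in answer:
        # A single TXT record may be split into several character-strings.
        return b"".join(rdata.strings).decode("ascii")
    raise LookupError("no TXT record returned")
```

An empty p= in the parsed record means the key has been revoked, which a verifier should report distinctly from a missing record.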

2.2 Why This Matters

DKIM is the integrity layer of email authentication. In a verifier, incorrect canonicalization makes even valid signatures fail, and in a production mail pipeline those false failures turn into deliverability problems.

2.3 Historical Context / Background

DKIM was formed by merging two earlier proposals, DomainKeys and Identified Internet Mail. It lets recipients verify that the signed content was not modified in transit and that the signature was authorized by the signing domain.

2.4 Common Misconceptions

  • Misconception: DKIM validates the sender address. Reality: It validates a signing domain (d=), which need not match the domain in the From: address.
  • Misconception: Only the body is signed. Reality: Selected headers are signed too.

3. Project Specification

3.1 What You Will Build

A tool that accepts a raw RFC 5322 message, extracts DKIM signatures, canonicalizes headers and body, fetches the public key, and verifies the signature.

3.2 Functional Requirements

  1. Parse headers and body, preserving raw header order.
  2. Parse DKIM-Signature tags (v, a, d, s, h, bh, b, c).
  3. Canonicalize headers and body per c=.
  4. Compute body hash and compare to bh=.
  5. Verify signature over the canonicalized header set.
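Requirement 1 is where folded headers usually get lost. Below is a minimal parser sketch that assumes CRLF line endings throughout. It keeps each header's raw name and value (folds included) in original order, which both canonicalization modes need. The name split_message is hypothetical.

```python
def split_message(raw: bytes) -> tuple[list[tuple[str, str]], bytes]:
    """Split a raw CRLF message into ordered (name, value) headers and the body."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        body = b""  # no blank line: the message has headers only
    headers: list[tuple[str, str]] = []
    # surrogateescape lets non-UTF-8 bytes round-trip if re-encoded the same way
    for line in head.decode("utf-8", "surrogateescape").split("\r\n"):
        if line[:1] in (" ", "\t") and headers:
            # Continuation line: re-attach it to the previous header, keeping the fold.
            name, value = headers[-1]
            headers[-1] = (name, value + "\r\n" + line)
        elif ":" in line:
            # Keep the raw name and value; simple canonicalization needs them unchanged.
            name, value = line.split(":", 1)
            headers.append((name, value))
        # Lines that are neither are malformed; this sketch skips them.
    return headers, body
```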

3.3 Non-Functional Requirements

  • Performance: Should verify a message in under 1 second.
  • Reliability: Must handle folded headers and multiple signatures.
  • Usability: Return detailed failure reasons.

3.4 Example Usage / Output

$ ./dkim-verify message.eml
Signature 1: d=example.com s=selector1
Body hash: PASS
Header signature: PASS
Result: DKIM PASS

3.5 Real World Outcome

You can audit messages to prove whether content and headers were altered in transit and which domain authorized them.


4. Solution Architecture

4.1 High-Level Design

Message Parser
  -> DKIM Tag Parser
  -> Canonicalizer
  -> DNS Key Fetcher
  -> Signature Verifier

4.2 Key Components

Component | Responsibility | Key Decisions
Parser | Split headers/body | Preserve raw header order
Canonicalizer | Apply relaxed/simple rules | Follow RFC 6376
Key Fetcher | Look up public key | TXT query on <selector>._domainkey.<domain>
Verifier | RSA verify | Use crypto library

4.3 Data Structures

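from dataclasses import dataclass, field

@dataclass
class DkimSignature:
    tags: dict[str, str]                              # parsed tag=value pairs (v, a, d, s, h, bh, b, ...)
    headers: list[str] = field(default_factory=list)  # signed header names from h=, in order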

4.4 Algorithm Overview

Key Algorithm: Header Canonicalization (relaxed)

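  1. Lowercase the header field name.
  2. Unfold continuation lines, then collapse every run of WSP (spaces and tabs) to a single space.
  3. Delete WSP at the end of the value and on both sides of the colon, then rebuild as “name:value” (no space after the colon) followed by CRLF.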
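The three steps can be sketched in Python as follows and checked against the worked example in RFC 6376 section 3.4.6. The sketch assumes the header has already been split into the raw text before and after its first colon; relaxed_header is a hypothetical helper name.

```python
import re


def relaxed_header(name: str, value: str) -> str:
    """Relaxed header canonicalization (RFC 6376, section 3.4.2).

    name and value are the raw text on either side of the first colon;
    value may contain folded continuation lines but not the final CRLF.
    """
    value = re.sub(r"\r\n(?=[ \t])", "", value)  # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)         # collapse WSP runs to one space
    value = value.strip(" ")                       # drop WSP at both ends of the value
    name = name.strip(" \t").lower()               # lowercase name, drop WSP around it
    return f"{name}:{value}\r\n"
```

For the RFC example header "B : Y<TAB><CRLF><TAB>Z  ", the sketch yields "b:Y Z" plus CRLF, matching the RFC's expected output.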

Complexity Analysis:

  • Time: O(n) in the message length
  • Space: O(n)

5. Implementation Guide

5.1 Development Environment Setup

python -m venv .venv
source .venv/bin/activate
python -m pip install cryptography dnspython

5.2 Project Structure

dkim-verify/
├── message_parser.py
├── dkim_tags.py
├── canonicalize.py
├── dns_keys.py
└── verify.py

5.3 The Core Question You’re Answering

“Can I prove this message was authorized by the signing domain and not modified?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Header Folding
    • RFC 5322 header continuation rules
  2. Canonicalization
    • Simple vs relaxed for headers and body
  3. DKIM Tags
    • Required tags: v, a, d, s, h, bh, b
  4. RSA Signatures
    • Base64 decoding of b= and p=, and RSASSA-PKCS1-v1_5 verification with SHA-256 (a=rsa-sha256)

5.5 Questions to Guide Your Design

  1. How will you choose which headers are signed when multiple appear?
  2. How will you handle body length tag l= if present?
  3. How will you parse and preserve header order?

5.6 Thinking Exercise

If a header appears twice (e.g., two Received headers), which instance is signed? Why does order matter?
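After working through the exercise, you can compare your answer with this sketch. It follows the RFC 6376 rule that repeated fields are matched from the bottom of the header block upward, and that an h= name with no remaining instance contributes nothing (which is why signers sometimes list a header more times than it appears). The function name and the (name, value) tuple format are assumptions.

```python
def select_signed_headers(headers: list[tuple[str, str]], h_tag: str) -> list[tuple[str, str]]:
    """Pick the header instances named in h=, matching repeats from the bottom up."""
    # Group instances of each (lowercased) name, preserving message order.
    remaining: dict[str, list[tuple[str, str]]] = {}
    for name, value in headers:
        remaining.setdefault(name.strip().lower(), []).append((name, value))
    selected = []
    for wanted in h_tag.split(":"):
        instances = remaining.get(wanted.strip().lower())
        if instances:
            # Take the bottom-most instance that has not been used yet.
            selected.append(instances.pop())
        # A name with no instance left contributes nothing (RFC 6376, 5.4.2).
    return selected
```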

5.7 The Interview Questions They’ll Ask

  1. “What is DKIM canonicalization and why is it necessary?”
  2. “What does the bh= tag represent?”
  3. “How do you build the DNS name for the DKIM public key?”

5.8 Hints in Layers

Hint 1: Parse tags into a dict

  • Split on semicolons, then split each part on the first ‘=’ only (base64 values such as bh= and b= can end in ‘=’ padding).
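A minimal sketch of Hint 1 for DKIM-Signature values. Splitting on the first '=' keeps base64 padding intact, and duplicate tag names are rejected because RFC 6376 treats such a tag-list as invalid. Removing folding whitespace only from b= and bh= is a simplification chosen here, and parse_tag_list is a hypothetical name.

```python
def parse_tag_list(value: str) -> dict[str, str]:
    """Parse a DKIM tag-list like ' v=1; a=rsa-sha256; d=example.com; ...'."""
    tags: dict[str, str] = {}
    for part in value.split(";"):
        if not part.strip():
            continue  # empty segment, e.g. after a trailing semicolon
        name, sep, val = part.partition("=")  # split on the FIRST '=' only
        if not sep:
            raise ValueError(f"malformed tag: {part.strip()!r}")
        name = name.strip()
        if name in tags:
            raise ValueError(f"duplicate tag: {name}")
        val = val.strip()
        if name in ("b", "bh"):
            val = "".join(val.split())  # drop folding whitespace inside base64
        tags[name] = val
    return tags
```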

Hint 2: Verify body hash first

  • It needs neither DNS nor RSA, so it isolates canonicalization bugs early.
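Body canonicalization and hashing can be sketched as follows, assuming a=rsa-sha256 and ignoring l=. The two canonicalization functions follow RFC 6376 sections 3.4.3 and 3.4.4; the function names are hypothetical. Note that an empty body canonicalizes to a single CRLF under simple but to nothing under relaxed, so the two give different bh= values.

```python
import base64
import hashlib
import re


def canon_body_simple(body: bytes) -> bytes:
    """Simple body canonicalization (RFC 6376, section 3.4.3)."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]          # drop trailing empty lines
    if not body.endswith(b"\r\n"):
        body += b"\r\n"           # empty body, or no final CRLF: add one
    return body


def canon_body_relaxed(body: bytes) -> bytes:
    """Relaxed body canonicalization (RFC 6376, section 3.4.4)."""
    # Collapse WSP runs to one space and drop WSP at the end of each line.
    lines = [re.sub(rb"[ \t]+", b" ", line).rstrip(b" ") for line in body.split(b"\r\n")]
    while lines and lines[-1] == b"":
        lines.pop()               # ignore empty lines at the end of the body
    return b"".join(line + b"\r\n" for line in lines)


def body_hash(body: bytes, method: str = "simple") -> str:
    """Base64 SHA-256 of the canonicalized body, for comparison with bh=."""
    canon = canon_body_relaxed(body) if method == "relaxed" else canon_body_simple(body)
    return base64.b64encode(hashlib.sha256(canon).digest()).decode("ascii")
```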

Hint 3: Build the signing string carefully

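  • The DKIM-Signature header being verified is appended after the signed headers, canonicalized the same way, with its b= value emptied (the “b=” itself stays) and without a trailing CRLF.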
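A sketch of assembling the bytes that the signature covers, assuming relaxed header canonicalization and UTF-8 header text; with simple header canonicalization the raw header lines would be used unchanged. The b= regex is written so that it does not also empty bh=. All names here are hypothetical.

```python
import re


def relaxed_header(name: str, value: str) -> str:
    """Relaxed header canonicalization (RFC 6376, section 3.4.2)."""
    value = re.sub(r"\r\n(?=[ \t])", "", value)          # unfold
    value = re.sub(r"[ \t]+", " ", value).strip(" ")     # collapse and trim WSP
    name = name.strip(" \t").lower()
    return f"{name}:{value}\r\n"


def strip_b_value(sig_value: str) -> str:
    """Empty the b= tag's value (but not bh=), keeping 'b=' itself."""
    return re.sub(r"(^|;)(\s*b\s*=)[^;]*", r"\1\2", sig_value)


def signing_input(signed: list[tuple[str, str]], sig_name: str, sig_value: str) -> bytes:
    """Bytes passed to signature verification for a relaxed-header signature."""
    parts = [relaxed_header(name, value) for name, value in signed]
    # The DKIM-Signature being verified goes last, b= emptied, no final CRLF.
    parts.append(relaxed_header(sig_name, strip_b_value(sig_value))[:-2])
    return "".join(parts).encode("utf-8")
```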

5.9 Books That Will Help

Topic | Book | Chapter
DKIM spec | RFC 6376 | Sections 3-6
Cryptography | Serious Cryptography | Ch. 6, 11
Message format | RFC 5322 | Sections 2-3

5.10 Implementation Phases

Phase 1: Foundation (1 week)

Goals:

  • Parse message and DKIM tags

Tasks:

  1. Split headers and body.
  2. Parse DKIM tag list.

Checkpoint: Print tags and header list.

Phase 2: Core Functionality (1-2 weeks)

Goals:

  • Canonicalize and compute hashes

Tasks:

  1. Implement relaxed and simple canonicalization.
  2. Compute body hash and compare.

Checkpoint: Correct bh= validation on a known message.

Phase 3: Polish and Edge Cases (1 week)

Goals:

  • Verify signature

Tasks:

  1. Fetch public key from DNS.
  2. Verify RSA signature for signed headers.
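The verification step in Phase 3 can be sketched with the cryptography package. The sketch assumes a=rsa-sha256 and that p= holds a DER SubjectPublicKeyInfo, the encoding most published DKIM keys use; a record carrying a bare RSAPublicKey instead is not handled here. verify_rsa_sha256 is a hypothetical name, and data is the canonicalized header byte string.

```python
import base64

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa


def verify_rsa_sha256(p_tag: str, b_tag: str, data: bytes) -> bool:
    """Check an a=rsa-sha256 DKIM signature.

    p_tag: base64 public key from the DNS record (assumed SubjectPublicKeyInfo DER).
    b_tag: base64 signature from the DKIM-Signature header.
    data:  canonicalized header bytes that were signed.
    """
    key = serialization.load_der_public_key(base64.b64decode("".join(p_tag.split())))
    if not isinstance(key, rsa.RSAPublicKey):
        raise ValueError("p= does not hold an RSA public key")
    signature = base64.b64decode("".join(b_tag.split()))
    try:
        # rsa-sha256 is RSASSA-PKCS1-v1_5 with SHA-256 (RFC 6376, section 3.3.1).
        key.verify(signature, data, padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return False
    return True
```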

Checkpoint: DKIM PASS on a real message.

5.11 Key Implementation Decisions

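Decision | Options | Recommendation | Rationale
Crypto library | cryptography vs openssl CLI | cryptography | Cleaner API
Canonicalization | simple, relaxed | implement both | Relaxed is most common in practice, but simple is the default (c=simple/simple) and still appears
Multi-signature | first vs all | verify all | More accurate; messages often carry more than one signature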

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Canonicalization | folded headers, trailing spaces
Integration Tests | Real messages | Gmail and Yahoo samples
Edge Case Tests | Multiple signatures | Verify all or report failures

6.2 Critical Test Cases

  1. Relaxed canonicalization matches RFC examples.
  2. Body hash mismatch returns DKIM fail.
  3. A missing public key (NXDOMAIN or no DKIM record) returns permerror; a DNS timeout or SERVFAIL returns temperror.

6.3 Test Data

DKIM-Signature: v=1; a=rsa-sha256; d=example.com; s=sel; ...

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Mishandled header folding | Signature fails | Preserve raw header lines
Incorrect canonicalization | False failures | Follow RFC examples
Wrong key name | NXDOMAIN | Use <s>._domainkey.<d>

7.2 Debugging Strategies

  • Compare with opendkim-testmsg output.
  • Log the canonicalized header string with CR and LF made visible (for example via repr()) to spot whitespace differences.

7.3 Performance Traps

  • Re-fetching DNS keys for each message. Cache keys by the full key name (selector and domain), with an expiry.
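One way to cache keys, sketched under the assumption of a fixed expiry; a production verifier would more likely honor the TTL returned with the DNS answer. The fetch callable and the KeyCache name are hypothetical. Keying on both selector and domain avoids collisions when two domains reuse a selector name such as selector1.

```python
import time
from typing import Callable


class KeyCache:
    """Cache DKIM key records per (selector, domain) for a fixed TTL in seconds."""

    def __init__(self, fetch: Callable[[str, str], str], ttl: float = 300.0):
        self._fetch = fetch  # hypothetical function performing the TXT lookup
        self._ttl = ttl
        self._entries: dict[tuple[str, str], tuple[float, str]] = {}

    def get(self, selector: str, domain: str) -> str:
        # DNS names are case-insensitive; normalize case and any trailing dot.
        key = (selector.lower(), domain.lower().rstrip("."))
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no DNS query
        record = self._fetch(*key)
        self._entries[key] = (now + self._ttl, record)
        return record
```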

8. Extensions and Challenges

8.1 Beginner Extensions

  • Report which header failed canonicalization.
  • Output JSON results.

8.2 Intermediate Extensions

  • Support ed25519-sha256 when present.
  • Verify l= body length tag.
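If l= is present, only the first l octets of the canonicalized body are hashed. A minimal sketch with a hypothetical function name; it rejects an l= larger than the canonicalized body, since RFC 6376 says the value must not exceed it.

```python
from typing import Optional


def apply_body_length(canon_body: bytes, l_tag: Optional[str]) -> bytes:
    """Return the part of the canonicalized body covered by l= (whole body if absent)."""
    if l_tag is None:
        return canon_body
    if not l_tag.isdigit():
        raise ValueError(f"l= must be an unsigned decimal integer, got {l_tag!r}")
    length = int(l_tag)
    if length > len(canon_body):
        # RFC 6376 says l= must not exceed the canonicalized body length.
        raise ValueError("l= is larger than the canonicalized body")
    return canon_body[:length]
```

Anything after those l octets is unsigned, so a verifier may also want to report how many octets were ignored.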

8.3 Advanced Extensions

  • Implement ARC verification chain.
  • Build a DKIM lint tool for domains.

9. Real-World Connections

9.1 Industry Applications

  • Mail gateways use DKIM verification for trust scoring.
  • Security tools validate integrity of inbound messages.
  • OpenDKIM: https://github.com/trusteddomainproject/OpenDKIM
  • dkimpy: https://github.com/kjd/idc-dkimpy

9.3 Interview Relevance

  • Canonicalization and signature verification are common email security topics.

10. Resources

10.1 Essential Reading

  • RFC 6376 - DKIM specification
  • RFC 5322 - Message format

10.2 Video Resources

  • DKIM verification walkthroughs

10.3 Tools and Documentation

  • opendkim-testmsg for reference verification
  • dig for DKIM TXT lookups

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain DKIM tags and selectors
  • I understand relaxed vs simple canonicalization
  • I can explain the body hash

11.2 Implementation

  • Verifies real DKIM signatures
  • Handles multiple signatures
  • Reports clear failure reasons

11.3 Growth

  • I can debug DKIM failures by inspecting canonicalized data
  • I can explain DKIM limitations without DMARC

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Parse DKIM header and fetch public key

Full Completion:

  • Verify body hash and header signature

Excellence (Going Above and Beyond):

  • Support multiple algorithms and ARC
  • Provide detailed diagnostics and linting

This guide was generated from EMAIL_SYSTEMS_DEEP_DIVE_PROJECTS.md. For the complete learning path, see the parent directory.