Project 3: DNS Deep Dive Tool
A DNS analysis tool that traces resolution, compares resolvers, and detects misconfigurations.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1 week |
| Main Programming Language | Bash |
| Alternative Programming Languages | Python, Go, Rust |
| Coolness Level | Level 3: Clever |
| Business Potential | 2. Micro-SaaS / Pro tool |
| Prerequisites | Basic Linux CLI |
| Key Topics | DNS Resolution and Name System Behavior |
1. Learning Objectives
By completing this project, you will:
- Build the core tool described in the project and validate output against a golden transcript.
- Explain how the tool maps to the Linux networking layer model.
- Diagnose at least one real or simulated failure using the tool’s output.
2. All Theory Needed (Per-Concept Breakdown)
This section includes every concept required to implement this project successfully.
DNS Resolution and Name System Behavior
Fundamentals
DNS is the internet’s naming system: it maps human-friendly names to resource records such as A, AAAA, MX, and TXT. A client (stub resolver) typically asks a recursive resolver to answer. If the recursive resolver does not have the answer cached, it follows the hierarchy: root servers point to TLD servers, which point to authoritative servers for the domain. RFC 1034 defines the conceptual model and RFC 1035 defines the protocol and message format. The root zone is served by 13 named authorities (A through M) with many anycast instances worldwide. On Linux, name resolution is often mediated by systemd-resolved; resolvectl shows which upstream servers are in use, whether DNSSEC validation is enabled, and which interface supplied the configuration. This chapter teaches you to treat DNS as a multi-stage system with caches, delegation, and failure modes rather than as a simple lookup table.
Deep Dive
DNS resolution is a distributed, cached workflow with explicit authority boundaries. The stub resolver (part of glibc, systemd-resolved, or another resolver component) forwards a query to a recursive resolver. The recursive resolver answers from cache if possible, or performs iterative resolution: it asks a root server for the TLD delegation, asks the TLD server for the domain’s authoritative server, and then asks the authoritative server for the actual record. Each response contains referrals and glue records, and the resolver follows them until it obtains an authoritative answer. This delegation chain explains why DNS failures can occur in specific segments: a root server issue affects only the first step, while a broken authoritative server affects only its zone.
Caching is central to DNS correctness. Every answer has a TTL, and resolvers cache both positive and negative responses. A short TTL allows rapid changes but increases load and latency; a long TTL increases stability but delays recovery from mistakes. Negative caching (caching NXDOMAIN) can cause failures to persist longer than expected. When you troubleshoot DNS, you must distinguish between the authoritative truth and the cached reality. This is why comparing multiple resolvers is such a powerful technique: if one resolver is wrong, it is usually a cache or policy issue; if all resolvers are wrong, the authoritative zone is likely at fault.
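The multi-resolver comparison described above can be reduced to a simple consistency check. The sketch below hard-codes sample answers so the logic is self-contained; in the real tool each line would come from a query such as `dig +short A example.com @8.8.8.8`.

```shell
#!/usr/bin/env bash
# Sketch: decide whether several resolvers agree on an A record.
# Sample data stands in for live dig output; format is "resolver ip".
answers="system 93.184.216.34
8.8.8.8 93.184.216.34
1.1.1.1 93.184.216.34"

# Count distinct IPs; exactly one distinct answer means the resolvers agree.
unique=$(printf '%s\n' "$answers" | awk '{print $2}' | sort -u | wc -l)

if [ "$unique" -eq 1 ]; then
  verdict="consistent"
else
  verdict="inconsistent (cache, policy, or split-horizon suspected)"
fi
echo "Consistent answers: $verdict"
```

If the count is greater than one, the fault is likely on the resolver path rather than in the zone, which is exactly the triage rule stated above.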
Linux introduces an additional layer of complexity: multiple components can manage resolver configuration. systemd-resolved may serve a local stub address (often 127.0.0.53), NetworkManager may set per-interface DNS servers, and VPN clients may override DNS settings. resolvectl surfaces the runtime state, revealing which upstreams are actually being used and which interface contributed them. This is essential when you see “DNS works sometimes,” because the system might be switching between upstreams or applying split DNS rules. Without this visibility you might debug the wrong resolver entirely.
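To make the resolver-path visibility concrete, here is a minimal sketch that pulls the upstream actually in use out of `resolvectl status`-style text. The sample fragment is illustrative (the exact layout varies by systemd version), embedded so the parsing runs without systemd-resolved present.

```shell
#!/usr/bin/env bash
# Sketch: extract the upstream DNS server from resolvectl-style output.
# The sample below imitates the output shape; on a live system you would
# pipe `resolvectl status` in instead.
sample='Global
       Protocols: +DNSSEC
Link 2 (eth0)
Current DNS Server: 192.168.1.1
       DNS Servers: 192.168.1.1 9.9.9.9'

upstream=$(printf '%s\n' "$sample" | awk -F': ' '/Current DNS Server/ {print $2; exit}')
echo "Upstream in use: $upstream"
```

Seeing 127.0.0.53 from applications but a different "Current DNS Server" here is the split that makes "DNS works sometimes" debuggable.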
DNSSEC adds cryptographic integrity. It uses signatures (RRSIG) and chain-of-trust records (DS, DNSKEY) to allow a validating resolver to verify that an answer has not been tampered with. If validation fails, the resolver can return a “bogus” result, which is functionally a failure even if the record exists. This is not a DNSSEC bug; it is the intended protection. The important mental model is: DNSSEC provides integrity, not availability. A missing signature or a broken chain can cause resolution failure even when the authoritative server is reachable.
Failure modes map cleanly to the resolution chain. NXDOMAIN can be legitimate or a poisoned response. SERVFAIL can indicate upstream outages, misconfigured DNSSEC, or authoritative server errors. Inconsistent answers across resolvers point to caching, geo-based responses, or split-horizon DNS. The proper diagnostic approach is layered: query the system resolver (what applications see), query a public recursive resolver (what the internet sees), then query authoritative servers directly (the truth for the zone). If those disagree, you have located the fault boundary. This is exactly the diagnostic muscle the DNS Deep Dive Tool project will train.
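The three diagnostic layers can be written down as three dig invocations. The sketch below prints the commands instead of executing them so it needs no network; drop the echo loop to run them for real (dig ships in dnsutils/bind-utils, and the authoritative server name is taken from the example output later in this guide).

```shell
#!/usr/bin/env bash
# Sketch: the layered diagnostic as dig command lines.
domain="example.com"

layer1="dig +short A $domain"                                 # system resolver: what apps see
layer2="dig +short A $domain @8.8.8.8"                        # public recursive: what the internet sees
layer3="dig +short +norecurse A $domain @a.iana-servers.net"  # authoritative: the zone's truth

for cmd in "$layer1" "$layer2" "$layer3"; do
  echo "$cmd"
done
```

Whichever pair of layers first disagrees marks the fault boundary.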
Finally, remember that DNS is a dependency for nearly all applications. A slow or inconsistent resolver adds latency to every request. That means “network is slow” can be a DNS problem even if packets are flowing perfectly. By treating DNS as a system with hierarchies, caches, and validation, you gain the ability to diagnose outages that look random but are actually deterministic.
How this fits into the projects
- DNS Deep Dive Tool (Project 3)
- Connectivity Diagnostic Suite (Project 2)
Definitions & key terms
- Resolver: Client or service that performs DNS lookups for applications.
- Authoritative server: DNS server that hosts the original records for a zone.
- TTL: Time a record can be cached.
Mental model diagram
App -> Stub Resolver -> Recursive Resolver
|-> Root -> TLD -> Authoritative
|-> Cache
How it works (step-by-step, invariants, failure modes)
- App asks stub resolver for name.
- Stub asks recursive resolver.
- Recursive uses cache or queries root/TLD/authoritative.
- Answer returned, cached for TTL.
Invariants: DNS is hierarchical; records are cached with TTL.
Failure modes: wrong resolver, DNSSEC validation failure, stale cache.
Minimal concrete example
Protocol transcript (simplified):
Query: A example.com
Root -> referral to .com
TLD -> referral to example.com authoritative
Auth -> A 93.184.216.34 TTL 86400
Common misconceptions
- “DNS is just a file.” (It is a distributed, cached system.)
- “If one resolver works, DNS is fine.” (Different resolvers can have different caches.)
Check-your-understanding questions
- What does a recursive resolver do that a stub resolver does not?
- Why can two users see different DNS answers for the same name?
- Why can DNSSEC cause lookups to fail even if records exist?
Check-your-understanding answers
- It performs iterative queries and caching on behalf of the client.
- Caches and different upstream resolvers yield different answers.
- Missing or invalid signatures cause validation failure.
Real-world applications
- Debugging website outages, email misrouting, and CDN propagation issues.
Where you’ll apply it
Projects 2 and 3.
References
- DNS conceptual and protocol standards (RFC 1034/1035).
- Root servers and 13 named authorities (IANA).
- resolvectl description (systemd-resolved interface).
Key insights
DNS failures are often cache or resolver-path problems, not record problems.
Summary
You now know the DNS chain of responsibility and how Linux exposes its resolver state.
Homework/Exercises to practice the concept
- Draw the resolution path for a domain with a CNAME that points to a CDN.
- Explain how TTL affects incident recovery timelines.
Solutions to the homework/exercises
- The resolver must follow the CNAME to its target and query that name’s authoritative servers.
- Short TTLs speed recovery but increase query load; long TTLs delay changes.
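The TTL trade-off in the second answer is easy to quantify. The arithmetic below is a minimal sketch: if the old record carried a 24-hour TTL, a resolver that cached it just before your fix may keep serving the stale answer for the full TTL.

```shell
#!/usr/bin/env bash
# Sketch: worst-case window during which caches may still serve a changed record.
old_ttl=86400                          # seconds; the TTL the *old* record carried
worst_case_hours=$(( old_ttl / 3600 )) # fix at t=0 -> stale answers until t=old_ttl
echo "Stale answers possible for up to ${worst_case_hours} h after the fix"
```

This is why teams lower TTLs ahead of a planned migration, then raise them again afterwards.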
3. Project Specification
3.1 What You Will Build
A DNS analysis tool that traces resolution, compares resolvers, and detects misconfigurations.
3.2 Functional Requirements
- Core data collection: Gather the required system/network data reliably.
- Interpretation layer: Translate raw outputs into human-readable insights.
- Deterministic output: Produce stable, comparable results across runs.
- Error handling: Detect missing privileges, tools, or unsupported interfaces.
3.3 Non-Functional Requirements
- Performance: Runs in under 5 seconds for baseline mode.
- Reliability: Handles missing data sources gracefully.
- Usability: Output is readable without post-processing.
3.4 Example Usage / Output
$ ./dnsdeep.sh example.com
DNS DEEP DIVE
System resolver: systemd-resolved
Trace:
root -> .com -> example.com auth
Records:
A: 93.184.216.34
AAAA: 2606:2800:220:1:248:1893:25c8:1946
NS: a.iana-servers.net
SOA: ns.icann.org hostmaster.icann.org
Resolver comparison:
system: 93.184.216.34 (2 ms)
8.8.8.8: 93.184.216.34 (15 ms)
1.1.1.1: 93.184.216.34 (12 ms)
Health:
DNSSEC: validated
Consistent answers: yes
3.5 Data Formats / Schemas / Protocols
- Input: CLI tool output, kernel state, or service logs.
- Output: A structured report with sections and summarized metrics.
3.6 Edge Cases
- Missing tool binaries or insufficient permissions.
- Interfaces or hosts that return no data.
- Transient states (link flaps, intermittent loss).
3.7 Real World Outcome
A single run of `./dnsdeep.sh example.com` produces the transcript shown in 3.4 Example Usage / Output: resolution trace, record inventory, resolver comparison, and health summary for the target domain.
3.7.1 How to Run (Copy/Paste)
$ ./dnsdeep.sh example.com
3.7.2 Golden Path Demo (Deterministic)
Run the tool against a known-good target and verify every section of the output matches the expected format.
3.7.3 Exact Terminal Transcript (CLI)
A run of `./dnsdeep.sh example.com` must reproduce the transcript in 3.4 Example Usage / Output section for section; that transcript is the golden reference for testing.
4. Solution Architecture
4.1 High-Level Design
[Collector] -> [Parser] -> [Analyzer] -> [Reporter]
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Collector | Gather raw tool output | Which tools to call and with what flags |
| Parser | Normalize raw text/JSON | Text vs JSON parsing strategy |
| Analyzer | Compute insights | Thresholds and heuristics |
| Reporter | Format output | Stable layout and readability |
4.3 Data Structures (No Full Code)
- DnsRecord: name, type, value, TTL
- ResolverResult: resolver, answers, latency
- Observation: timestamp, source, severity, message
4.4 Algorithm Overview
Key Algorithm: Evidence Aggregation
- Collect raw outputs from tools.
- Parse into normalized records.
- Apply interpretation rules and thresholds.
- Render the final report.
Complexity Analysis:
- Time: O(n) over number of records
- Space: O(n) to hold parsed records
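The four-stage pipeline above can be sketched as a chain of bash functions. The collector returns canned data here so the sketch is self-contained; in the real tool it would shell out to dig and resolvectl. Function names and the 100 ms threshold are illustrative, not a required API.

```shell
#!/usr/bin/env bash
# Sketch: Collector -> Parser -> Analyzer -> Reporter as a pipeline.
collect() {                # Collector: raw "resolver ip latency_ms" lines
  printf '%s\n' "system 93.184.216.34 2" "8.8.8.8 93.184.216.34 15"
}
parse() {                  # Parser: normalize raw text into CSV records
  awk '{print $1 "," $2 "," $3}'
}
analyze() {                # Analyzer: flag any lookup slower than 100 ms
  awk -F, '{print ($3 > 100 ? "SLOW" : "OK") "," $0}'
}
report() {                 # Reporter: stable, human-readable layout
  awk -F, '{printf "  %-10s %-16s %3s ms  [%s]\n", $2, $3, $4, $1}'
}
collect | parse | analyze | report
```

Each stage reads stdin and writes stdout, so stages can be tested in isolation against fixed fixtures.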
5. Implementation Guide
5.1 Development Environment Setup
# Install required tools with your distro package manager
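As a concrete starting point, dig is packaged as dnsutils on Debian/Ubuntu and bind-utils on Fedora/RHEL, and resolvectl ships with systemd-resolved on most modern distros. A small availability check (a sketch; the `need` helper is illustrative) saves confusing failures later:

```shell
#!/usr/bin/env bash
# Sketch: verify the tools the project relies on are installed.
# Debian/Ubuntu: sudo apt install dnsutils    (provides dig)
# Fedora/RHEL:   sudo dnf install bind-utils  (provides dig)
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1 - install it before running the tool"
  fi
}
need awk
need dig
need resolvectl
```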
5.2 Project Structure
project-root/
├── src/
│ ├── main
│ ├── collectors/
│ └── formatters/
├── tests/
└── README.md
5.3 The Core Question You’re Answering
“How does a name become an IP, and where can that process break?”
5.4 Concepts You Must Understand First
- DNS hierarchy
- Root -> TLD -> authoritative.
- Book Reference: “DNS and BIND” - Ch. 1-2
- Resource records
- A, AAAA, MX, CNAME, NS, SOA.
- Book Reference: “DNS and BIND” - Ch. 4
- Caching and TTL
- Why answers differ across resolvers.
- Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 14
5.5 Questions to Guide Your Design
- How will you display resolver differences clearly?
- Which record types should be considered “health critical”?
- How will you detect missing MX or mismatched A records?
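For the missing-MX question above, one possible shape of the check is sketched below. The answer strings are simulated; in the tool they would come from `dig +short MX "$domain"`, where an empty answer means the zone publishes no MX records.

```shell
#!/usr/bin/env bash
# Sketch: flag a domain with no MX records (illustrative helper name).
check_mx() {
  local domain="$1" answer="$2"
  if [ -z "$answer" ]; then
    echo "WARN: $domain has no MX records - mail delivery will fail"
  else
    echo "OK: MX for $domain -> $answer"
  fi
}
check_mx "example.com" ""                      # simulated empty answer
check_mx "example.org" "0 mail.example.org."   # simulated normal answer
```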
5.6 Thinking Exercise
Given this answer:
example.com. 3600 IN CNAME example.net.
example.net. 3600 IN A 203.0.113.10
Questions:
- Which name has the A record?
- What should your tool display to avoid confusion?
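One possible answer to the display question: name the owner of each record explicitly. The sketch below parses the exercise's answer (embedded as sample data in the field layout dig prints) and renders the chain so it is clear which name actually holds the A record.

```shell
#!/usr/bin/env bash
# Sketch: render a CNAME chain unambiguously from dig-style answer lines.
answer='example.com. 3600 IN CNAME example.net.
example.net. 3600 IN A 203.0.113.10'

rendered=$(printf '%s\n' "$answer" | awk '
  $4 == "CNAME" { printf "%s is an alias for %s\n", $1, $5 }
  $4 == "A"     { printf "%s has address %s\n", $1, $5 }')
echo "$rendered"
```

The A record belongs to example.net., not example.com., and the output says so directly.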
5.7 The Interview Questions They’ll Ask
- “What is the difference between recursive and authoritative servers?”
- “Why might DNS answers differ between resolvers?”
- “What does the SOA record represent?”
- “How would you test DNSSEC validation?”
- “Why is an MX record missing a problem?”
5.8 Hints in Layers
Hint 1: Use dig +trace for hierarchy.
Hint 2: Query each record type individually.
Hint 3: Compare resolvers with dig @server.
Hint 4: Use resolvectl status to show system resolver.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| DNS basics | “DNS and BIND” | Ch. 1-4 |
| DNS protocol | “TCP/IP Illustrated, Vol 1” | Ch. 14 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 days)
- Define outputs and parse a single tool.
- Produce a minimal report.
Phase 2: Core Functionality (3-5 days)
- Add remaining tools and interpretation logic.
- Implement stable formatting and summaries.
Phase 3: Polish & Edge Cases (2-3 days)
- Handle missing data and failure modes.
- Add thresholds and validation checks.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing format | Text vs JSON | JSON where available | More stable parsing |
| Output layout | Table vs sections | Sections | Readability for humans |
| Sampling | One-shot vs periodic | One-shot + optional loop | Predictable runtime |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate parsing | Parse fixed tool output samples |
| Integration Tests | Validate tool calls | Run against a lab host |
| Edge Case Tests | Handle failures | Missing tool, no permissions |
6.2 Critical Test Cases
- Reference run: Output matches golden transcript.
- Missing tool: Proper error message and partial report.
- Permission denied: Clear guidance for sudo or capabilities.
6.3 Test Data
Input: captured command output
Expected: normalized report with correct totals
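The golden-transcript comparison from 6.2 can be as simple as a diff. In this sketch both sides are embedded strings; in practice the left side would be a checked-in golden file and the right side the live output of ./dnsdeep.sh.

```shell
#!/usr/bin/env bash
# Sketch: reference-run test comparing actual output against a golden transcript.
golden='DNS DEEP DIVE
Consistent answers: yes'
actual='DNS DEEP DIVE
Consistent answers: yes'

if diff <(printf '%s\n' "$golden") <(printf '%s\n' "$actual") >/dev/null; then
  result="PASS: output matches golden transcript"
else
  result="FAIL: output diverged from golden transcript"
fi
echo "$result"
```

Volatile fields such as latencies should be normalized (e.g. replaced with a placeholder) before the diff so the test stays deterministic.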
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong interface | Empty output | Verify interface names |
| Missing privileges | Permission errors | Use sudo or capabilities |
| Misparsed output | Wrong stats | Prefer JSON parsing |
7.2 Debugging Strategies
- Re-run each tool independently to compare raw output.
- Add a verbose mode that dumps raw data sources.
7.3 Performance Traps
- Avoid tight loops without sleep intervals.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add colored status markers.
- Export report to a file.
8.2 Intermediate Extensions
- Add JSON output mode.
- Add baseline comparison.
8.3 Advanced Extensions
- Add multi-host aggregation.
- Add alerting thresholds.
9. Real-World Connections
9.1 Industry Applications
- SRE runbooks and on-call diagnostics.
- Network operations monitoring.
9.2 Related Open Source Projects
- BIND utilities (dig) / ldns (drill)
- dnsmasq / Unbound
9.3 Interview Relevance
- Demonstrates evidence-based debugging and tool mastery.
10. Resources
10.1 Essential Reading
- Primary book listed in the main guide.
- Relevant RFCs and tool manuals.
10.2 Video Resources
- Conference talks on Linux networking and troubleshooting.