Project 3: DNS Deep Dive Tool
A DNS analysis tool that traces resolution, compares resolvers, and detects misconfigurations.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1 week |
| Main Programming Language | Bash |
| Alternative Programming Languages | Python, Go, Rust |
| Coolness Level | Level 3: Clever |
| Business Potential | 2. Micro-SaaS / Pro tool |
| Prerequisites | Basic Linux CLI |
| Key Topics | DNS Resolution and Name System Behavior |
1. Learning Objectives
By completing this project, you will:
- Build the core tool described in the project and validate output against a golden transcript.
- Explain how the tool maps to the Linux networking layer model.
- Diagnose at least one real or simulated failure using the tool’s output.
2. All Theory Needed (Per-Concept Breakdown)
This section includes every concept required to implement this project successfully.
DNS Resolution and Name System Behavior
Fundamentals
DNS is the internet’s naming system: it maps human-friendly names to resource records such as A, AAAA, MX, and TXT. A client (stub resolver) typically asks a recursive resolver to answer. If the recursive resolver does not have the answer cached, it follows the hierarchy: root servers point to TLD servers, which point to authoritative servers for the domain. RFC 1034 defines the conceptual model and RFC 1035 defines the protocol and message format. The root zone is served by 13 named authorities (A through M) with many anycast instances worldwide. On Linux, name resolution is often mediated by systemd-resolved; resolvectl shows which upstream servers are in use, whether DNSSEC validation is enabled, and which interface supplied the configuration. This chapter teaches you to treat DNS as a multi-stage system with caches, delegation, and failure modes rather than as a simple lookup table.
Deep Dive
DNS resolution is a distributed, cached workflow with explicit authority boundaries. The stub resolver (part of glibc, systemd-resolved, or another resolver component) forwards a query to a recursive resolver. The recursive resolver answers from cache if possible, or performs iterative resolution: it asks a root server for the TLD delegation, asks the TLD server for the domain’s authoritative server, and then asks the authoritative server for the actual record. Each response contains referrals and glue records, and the resolver follows them until it obtains an authoritative answer. This delegation chain explains why DNS failures can occur in specific segments: a root server issue affects only the first step, while a broken authoritative server affects only its zone.
Caching is central to DNS correctness. Every answer has a TTL, and resolvers cache both positive and negative responses. A short TTL allows rapid changes but increases load and latency; a long TTL increases stability but delays recovery from mistakes. Negative caching (caching NXDOMAIN) can cause failures to persist longer than expected. When you troubleshoot DNS, you must distinguish between the authoritative truth and the cached reality. This is why comparing multiple resolvers is such a powerful technique: if one resolver is wrong, it is usually a cache or policy issue; if all resolvers are wrong, the authoritative zone is likely at fault.
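The multi-resolver comparison described above can be reduced to a simple consistency check. The sketch below hard-codes sample answers so the logic is self-contained; in the real tool each line would come from a query such as `dig +short A example.com @8.8.8.8`.

```shell
#!/usr/bin/env bash
# Sketch: decide whether several resolvers agree on an A record.
# Sample data stands in for live dig output; format is "resolver ip".
answers="system 93.184.216.34
8.8.8.8 93.184.216.34
1.1.1.1 93.184.216.34"

# Count distinct IPs; exactly one distinct answer means the resolvers agree.
unique=$(printf '%s\n' "$answers" | awk '{print $2}' | sort -u | wc -l)

if [ "$unique" -eq 1 ]; then
  verdict="consistent"
else
  verdict="inconsistent (cache, policy, or split-horizon suspected)"
fi
echo "Consistent answers: $verdict"
```

If the count is greater than one, the fault is likely on the resolver path rather than in the zone, which is exactly the triage rule stated above.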
Linux introduces an additional layer of complexity: multiple components can manage resolver configuration. systemd-resolved may serve a local stub address (often 127.0.0.53), NetworkManager may set per-interface DNS servers, and VPN clients may override DNS settings. resolvectl surfaces the runtime state, revealing which upstreams are actually being used and which interface contributed them. This is essential when you see “DNS works sometimes,” because the system might be switching between upstreams or applying split DNS rules. Without this visibility you might debug the wrong resolver entirely.
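To make the resolver-path visibility concrete, here is a minimal sketch that pulls the upstream actually in use out of `resolvectl status`-style text. The sample fragment is illustrative (the exact layout varies by systemd version), embedded so the parsing runs without systemd-resolved present.

```shell
#!/usr/bin/env bash
# Sketch: extract the upstream DNS server from resolvectl-style output.
# The sample below imitates the output shape; on a live system you would
# pipe `resolvectl status` in instead.
sample='Global
       Protocols: +DNSSEC
Link 2 (eth0)
Current DNS Server: 192.168.1.1
       DNS Servers: 192.168.1.1 9.9.9.9'

upstream=$(printf '%s\n' "$sample" | awk -F': ' '/Current DNS Server/ {print $2; exit}')
echo "Upstream in use: $upstream"
```

Seeing 127.0.0.53 from applications but a different "Current DNS Server" here is the split that makes "DNS works sometimes" debuggable.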
DNSSEC adds cryptographic integrity. It uses signatures (RRSIG) and chain-of-trust records (DS, DNSKEY) to allow a validating resolver to verify that an answer has not been tampered with. If validation fails, the resolver can return a “bogus” result, which is functionally a failure even if the record exists. This is not a DNSSEC bug; it is the intended protection. The important mental model is: DNSSEC provides integrity, not availability. A missing signature or a broken chain can cause resolution failure even when the authoritative server is reachable.
Failure modes map cleanly to the resolution chain. NXDOMAIN can be legitimate or a poisoned response. SERVFAIL can indicate upstream outages, misconfigured DNSSEC, or authoritative server errors. Inconsistent answers across resolvers point to caching, geo-based responses, or split-horizon DNS. The proper diagnostic approach is layered: query the system resolver (what applications see), query a public recursive resolver (what the internet sees), then query authoritative servers directly (the truth for the zone). If those disagree, you have located the fault boundary. This is exactly the diagnostic muscle the DNS Deep Dive Tool project will train.
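The three diagnostic layers can be written down as three dig invocations. The sketch below prints the commands instead of executing them so it needs no network; drop the echo loop to run them for real (dig ships in dnsutils/bind-utils, and the authoritative server name is taken from the example output later in this guide).

```shell
#!/usr/bin/env bash
# Sketch: the layered diagnostic as dig command lines.
domain="example.com"

layer1="dig +short A $domain"                                 # system resolver: what apps see
layer2="dig +short A $domain @8.8.8.8"                        # public recursive: what the internet sees
layer3="dig +short +norecurse A $domain @a.iana-servers.net"  # authoritative: the zone's truth

for cmd in "$layer1" "$layer2" "$layer3"; do
  echo "$cmd"
done
```

Whichever pair of layers first disagrees marks the fault boundary.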
Finally, remember that DNS is a dependency for nearly all applications. A slow or inconsistent resolver adds latency to every request. That means “network is slow” can be a DNS problem even if packets are flowing perfectly. By treating DNS as a system with hierarchies, caches, and validation, you gain the ability to diagnose outages that look random but are actually deterministic.
How this fits into the projects
- DNS Deep Dive Tool (Project 3)
- Connectivity Diagnostic Suite (Project 2)
Definitions & key terms
- Resolver: Client or service that performs DNS lookups for applications.
- Authoritative server: DNS server that hosts the original records for a zone.
- TTL: Time a record can be cached.
Mental model diagram
App -> Stub Resolver -> Recursive Resolver
|-> Root -> TLD -> Authoritative
|-> Cache
How it works (step-by-step, invariants, failure modes)
- App asks stub resolver for name.
- Stub asks recursive resolver.
- Recursive uses cache or queries root/TLD/authoritative.
- Answer returned, cached for TTL.
Invariants: DNS is hierarchical; records are cached with TTL.
Failure modes: wrong resolver, DNSSEC validation failure, stale cache.
Minimal concrete example
Protocol transcript (simplified):
Query: A example.com
Root -> referral to .com
TLD -> referral to example.com authoritative
Auth -> A 93.184.216.34 TTL 86400
Common misconceptions
- “DNS is just a file.” (It is a distributed, cached system.)
- “If one resolver works, DNS is fine.” (Different resolvers can have different caches.)
Check-your-understanding questions
- What does a recursive resolver do that a stub resolver does not?
- Why can two users see different DNS answers for the same name?
- Why can DNSSEC cause lookups to fail even if records exist?
Check-your-understanding answers
- It performs iterative queries and caching on behalf of the client.
- Caches and different upstream resolvers yield different answers.
- Missing or invalid signatures cause validation failure.
Real-world applications
- Debugging website outages, email misrouting, and CDN propagation issues.
Where you’ll apply it
Projects 2 and 3.
References
- DNS conceptual and protocol standards (RFC 1034/1035).
- Root servers and 13 named authorities (IANA).
- resolvectl description (systemd-resolved interface).
Key insights
DNS failures are often cache or resolver-path problems, not record problems.
Summary
You now know the DNS chain of responsibility and how Linux exposes its resolver state.
Homework/Exercises to practice the concept
- Draw the resolution path for a domain with a CNAME that points to a CDN.
- Explain how TTL affects incident recovery timelines.
Solutions to the homework/exercises
- The resolver must follow the CNAME to its target and query that name’s authoritative servers.
- Short TTLs speed recovery but increase query load; long TTLs delay changes.
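The TTL trade-off in the second answer is easy to quantify. The arithmetic below is a minimal sketch: if the old record carried a 24-hour TTL, a resolver that cached it just before your fix may keep serving the stale answer for the full TTL.

```shell
#!/usr/bin/env bash
# Sketch: worst-case window during which caches may still serve a changed record.
old_ttl=86400                          # seconds; the TTL the *old* record carried
worst_case_hours=$(( old_ttl / 3600 )) # fix at t=0 -> stale answers until t=old_ttl
echo "Stale answers possible for up to ${worst_case_hours} h after the fix"
```

This is why teams lower TTLs ahead of a planned migration, then raise them again afterwards.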
3. Project Specification
3.1 What You Will Build
A DNS analysis tool that traces resolution, compares resolvers, and detects misconfigurations.
3.2 Functional Requirements
- Core data collection: Gather the required system/network data reliably.
- Interpretation layer: Translate raw outputs into human-readable insights.
- Deterministic output: Produce stable, comparable results across runs.
- Error handling: Detect missing privileges, tools, or unsupported interfaces.
3.3 Non-Functional Requirements
- Performance: Runs in under 5 seconds for baseline mode.
- Reliability: Handles missing data sources gracefully.
- Usability: Output is readable without post-processing.
3.4 Example Usage / Output
$ ./dnsdeep.sh example.com
DNS DEEP DIVE
System resolver: systemd-resolved
Trace:
root -> .com -> example.com auth
Records:
A: 93.184.216.34
AAAA: 2606:2800:220:1:248:1893:25c8:1946
NS: a.iana-servers.net
SOA: ns.icann.org hostmaster.icann.org
Resolver comparison:
system: 93.184.216.34 (2 ms)
8.8.8.8: 93.184.216.34 (15 ms)
1.1.1.1: 93.184.216.34 (12 ms)
Health:
DNSSEC: validated
Consistent answers: yes
3.5 Data Formats / Schemas / Protocols
- Input: CLI tool output, kernel state, or service logs.
- Output: A structured report with sections and summarized metrics.
3.6 Edge Cases
- Missing tool binaries or insufficient permissions.
- Interfaces or hosts that return no data.
- Transient states (link flaps, intermittent loss).
3.7 Real World Outcome
A single run of `./dnsdeep.sh example.com` produces the transcript shown in 3.4 Example Usage / Output: resolution trace, record inventory, resolver comparison, and health summary for the target domain.
3.7.1 How to Run (Copy/Paste)
$ ./dnsdeep.sh example.com
3.7.2 Golden Path Demo (Deterministic)
Run the tool against a known-good target and verify every section of the output matches the expected format.
3.7.3 Exact Terminal Transcript (CLI)
A run of `./dnsdeep.sh example.com` must reproduce the transcript in 3.4 Example Usage / Output section for section; that transcript is the golden reference for testing.
4. Solution Architecture
4.1 High-Level Design
[Collector] -> [Parser] -> [Analyzer] -> [Reporter]
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Collector | Gather raw tool output | Which tools to call and with what flags |
| Parser | Normalize raw text/JSON | Text vs JSON parsing strategy |
| Analyzer | Compute insights | Thresholds and heuristics |
| Reporter | Format output | Stable layout and readability |
4.3 Data Structures (No Full Code)
- DnsRecord: name, type, value, TTL
- ResolverResult: resolver, answers, latency
- Observation: timestamp, source, severity, message
4.4 Algorithm Overview
Key Algorithm: Evidence Aggregation
- Collect raw outputs from tools.
- Parse into normalized records.
- Apply interpretation rules and thresholds.
- Render the final report.
Complexity Analysis:
- Time: O(n) over number of records
- Space: O(n) to hold parsed records
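The four-stage pipeline above can be sketched as a chain of bash functions. The collector returns canned data here so the sketch is self-contained; in the real tool it would shell out to dig and resolvectl. Function names and the 100 ms threshold are illustrative, not a required API.

```shell
#!/usr/bin/env bash
# Sketch: Collector -> Parser -> Analyzer -> Reporter as a pipeline.
collect() {                # Collector: raw "resolver ip latency_ms" lines
  printf '%s\n' "system 93.184.216.34 2" "8.8.8.8 93.184.216.34 15"
}
parse() {                  # Parser: normalize raw text into CSV records
  awk '{print $1 "," $2 "," $3}'
}
analyze() {                # Analyzer: flag any lookup slower than 100 ms
  awk -F, '{print ($3 > 100 ? "SLOW" : "OK") "," $0}'
}
report() {                 # Reporter: stable, human-readable layout
  awk -F, '{printf "  %-10s %-16s %3s ms  [%s]\n", $2, $3, $4, $1}'
}
collect | parse | analyze | report
```

Each stage reads stdin and writes stdout, so stages can be tested in isolation against fixed fixtures.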
5. Implementation Guide
5.1 Development Environment Setup
# Install required tools with your distro package manager
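As a concrete starting point, dig is packaged as dnsutils on Debian/Ubuntu and bind-utils on Fedora/RHEL, and resolvectl ships with systemd-resolved on most modern distros. A small availability check (a sketch; the `need` helper is illustrative) saves confusing failures later:

```shell
#!/usr/bin/env bash
# Sketch: verify the tools the project relies on are installed.
# Debian/Ubuntu: sudo apt install dnsutils    (provides dig)
# Fedora/RHEL:   sudo dnf install bind-utils  (provides dig)
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1 - install it before running the tool"
  fi
}
need awk
need dig
need resolvectl
```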
5.2 Project Structure
project-root/
├── src/
│ ├── main
│ ├── collectors/
│ └── formatters/
├── tests/
└── README.md
5.3 The Core Question You’re Answering
“How does a name become an IP, and where can that process break?”
5.4 Concepts You Must Understand First
- DNS hierarchy
- Root -> TLD -> authoritative.
- Book Reference: “DNS and BIND” - Ch. 1-2
- Resource records
- A, AAAA, MX, CNAME, NS, SOA.
- Book Reference: “DNS and BIND” - Ch. 4
- Caching and TTL
- Why answers differ across resolvers.
- Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 14
5.5 Questions to Guide Your Design
- How will you display resolver differences clearly?
- Which record types should be considered “health critical”?
- How will you detect missing MX or mismatched A records?
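For the missing-MX question above, one possible shape of the check is sketched below. The answer strings are simulated; in the tool they would come from `dig +short MX "$domain"`, where an empty answer means the zone publishes no MX records.

```shell
#!/usr/bin/env bash
# Sketch: flag a domain with no MX records (illustrative helper name).
check_mx() {
  local domain="$1" answer="$2"
  if [ -z "$answer" ]; then
    echo "WARN: $domain has no MX records - mail delivery will fail"
  else
    echo "OK: MX for $domain -> $answer"
  fi
}
check_mx "example.com" ""                      # simulated empty answer
check_mx "example.org" "0 mail.example.org."   # simulated normal answer
```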
5.6 Thinking Exercise
Given this answer:
example.com. 3600 IN CNAME example.net.
example.net. 3600 IN A 203.0.113.10
Questions:
- Which name has the A record?
- What should your tool display to avoid confusion?
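One possible answer to the display question: name the owner of each record explicitly. The sketch below parses the exercise's answer (embedded as sample data in the field layout dig prints) and renders the chain so it is clear which name actually holds the A record.

```shell
#!/usr/bin/env bash
# Sketch: render a CNAME chain unambiguously from dig-style answer lines.
answer='example.com. 3600 IN CNAME example.net.
example.net. 3600 IN A 203.0.113.10'

rendered=$(printf '%s\n' "$answer" | awk '
  $4 == "CNAME" { printf "%s is an alias for %s\n", $1, $5 }
  $4 == "A"     { printf "%s has address %s\n", $1, $5 }')
echo "$rendered"
```

The A record belongs to example.net., not example.com., and the output says so directly.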
5.7 The Interview Questions They’ll Ask
- “What is the difference between recursive and authoritative servers?”
- “Why might DNS answers differ between resolvers?”
- “What does the SOA record represent?”
- “How would you test DNSSEC validation?”
- “Why is an MX record missing a problem?”
5.8 Hints in Layers
Hint 1: Use dig +trace for hierarchy.
Hint 2: Query each record type individually.
Hint 3: Compare resolvers with dig @server.
Hint 4: Use resolvectl status to show system resolver.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| DNS basics | “DNS and BIND” | Ch. 1-4 |
| DNS protocol | “TCP/IP Illustrated, Vol 1” | Ch. 14 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 days)
- Define outputs and parse a single tool.
- Produce a minimal report.
Phase 2: Core Functionality (3-5 days)
- Add remaining tools and interpretation logic.
- Implement stable formatting and summaries.
Phase 3: Polish & Edge Cases (2-3 days)
- Handle missing data and failure modes.
- Add thresholds and validation checks.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing format | Text vs JSON | JSON where available | More stable parsing |
| Output layout | Table vs sections | Sections | Readability for humans |
| Sampling | One-shot vs periodic | One-shot + optional loop | Predictable runtime |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate parsing | Parse fixed tool output samples |
| Integration Tests | Validate tool calls | Run against a lab host |
| Edge Case Tests | Handle failures | Missing tool, no permissions |
6.2 Critical Test Cases
- Reference run: Output matches golden transcript.
- Missing tool: Proper error message and partial report.
- Permission denied: Clear guidance for sudo or capabilities.
6.3 Test Data
Input: captured command output
Expected: normalized report with correct totals
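The golden-transcript comparison from 6.2 can be as simple as a diff. In this sketch both sides are embedded strings; in practice the left side would be a checked-in golden file and the right side the live output of ./dnsdeep.sh.

```shell
#!/usr/bin/env bash
# Sketch: reference-run test comparing actual output against a golden transcript.
golden='DNS DEEP DIVE
Consistent answers: yes'
actual='DNS DEEP DIVE
Consistent answers: yes'

if diff <(printf '%s\n' "$golden") <(printf '%s\n' "$actual") >/dev/null; then
  result="PASS: output matches golden transcript"
else
  result="FAIL: output diverged from golden transcript"
fi
echo "$result"
```

Volatile fields such as latencies should be normalized (e.g. replaced with a placeholder) before the diff so the test stays deterministic.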
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong interface | Empty output | Verify interface names |
| Missing privileges | Permission errors | Use sudo or capabilities |
| Misparsed output | Wrong stats | Prefer JSON parsing |
7.2 Debugging Strategies
- Re-run each tool independently to compare raw output.
- Add a verbose mode that dumps raw data sources.
7.3 Performance Traps
- Avoid tight loops without sleep intervals.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add colored status markers.
- Export report to a file.
8.2 Intermediate Extensions
- Add JSON output mode.
- Add baseline comparison.
8.3 Advanced Extensions
- Add multi-host aggregation.
- Add alerting thresholds.
9. Real-World Connections
9.1 Industry Applications
- SRE runbooks and on-call diagnostics.
- Network operations monitoring.
9.2 Related Open Source Projects
- BIND utilities (dig) / ldns (drill)
- dnsmasq / Unbound
9.3 Interview Relevance
- Demonstrates evidence-based debugging and tool mastery.
10. Resources
10.1 Essential Reading
- Primary book listed in the main guide.
- Relevant RFCs and tool manuals.
10.2 Video Resources
- Conference talks on Linux networking and troubleshooting.