Project 2: Connectivity Diagnostic Suite

A diagnostic tool that runs layered connectivity tests and produces a structured report.

Quick Reference

  • Difficulty: Level 2 (Intermediate)
  • Time Estimate: 1 week
  • Main Programming Language: Bash
  • Alternative Programming Languages: Python, Go
  • Coolness Level: Level 3 (Clever)
  • Business Potential: 2. Micro-SaaS / Pro tool
  • Prerequisites: Basic Linux CLI
  • Key Topics: IP Addressing, Routing, and Path Discovery; DNS Resolution and Name System Behavior; ICMP and Path Probing

1. Learning Objectives

By completing this project, you will:

  1. Build the core tool described in the project and validate output against a golden transcript.
  2. Explain how the tool maps to the Linux networking layer model.
  3. Diagnose at least one real or simulated failure using the tool’s output.

2. All Theory Needed (Per-Concept Breakdown)

This section includes every concept required to implement this project successfully.

IP Addressing, Routing, and Path Discovery

Fundamentals

Routing is the decision process that answers, “Where should this packet go next?” Linux chooses routes using longest-prefix match and attaches that choice to an egress interface and, if needed, a next-hop gateway. The ip tool exposes both the routing tables and the policy rules that choose which table to consult, and it can ask the kernel directly which route would be used for a given destination. Path discovery tools translate that decision into evidence: tracepath probes the path and reports Path MTU (PMTU), while mtr repeatedly probes hops to surface loss and latency patterns. Together, these tools let you move from assumptions (“the route is fine”) to proof (“the kernel will use this gateway, and hop 6 drops 30% of probes”). That shift from inference to evidence is the central skill in routing diagnostics.

Deep Dive

Linux routing is a policy engine, not a single static table. Before any prefix matching occurs, the kernel consults routing policy rules. These rules can select a routing table based on source address, incoming interface, firewall mark, or user-defined priority. Once a table is chosen, the kernel performs longest-prefix match: the most specific prefix wins, and metrics break ties among equally specific routes. The final selection yields an egress interface, a next hop (if the destination is not directly connected), and a preferred source IP. This explains many “route exists but traffic fails” scenarios: the route might exist in a table that is never selected for that traffic, or the preferred source IP might not be reachable on the chosen path.

The most important command in this domain is ip route get <destination>. It queries the kernel’s decision engine and returns exactly what would happen if a packet were sent: the chosen route, interface, and source address. It is your truth oracle because it reflects the kernel’s actual behavior, not your interpretation of the routing table. But a routing decision alone does not guarantee reachability. The next hop must still be reachable at Layer 2, and the path beyond the next hop must accept and forward the packet. That is why route diagnosis always includes neighbor resolution and path probing.
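A minimal transcript of that check, assuming a host whose default route points at 192.168.1.1 on eth0 (the addresses, interface name, and uid are placeholders; exact output fields vary by kernel version):

$ ip route get 203.0.113.10
203.0.113.10 via 192.168.1.1 dev eth0 src 192.168.1.100 uid 1000
    cache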

Path discovery tools provide that second half. tracepath sends probes with increasing TTL values and reports where ICMP responses are generated. It also discovers PMTU by observing “Packet Too Big” responses and tracking the smallest MTU on the path. mtr adds repetition, showing latency and loss over time rather than a single snapshot. This matters because routing problems often manifest as intermittent congestion or packet loss at specific hops. A static traceroute might miss a transient spike; a rolling mtr report reveals it. The pairing of ip route get (decision evidence) with mtr (path behavior) is a powerful diagnostic habit.
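A hedged example of that pairing (the target name is illustrative, and mtr may need to be installed separately):

$ tracepath -n example.com                      # one-shot hop list plus the discovered PMTU
$ mtr --report --report-cycles 20 example.com   # 20 probes per hop: loss %, average and worst latency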

PMTU is a classic foot-gun. The path MTU is the smallest MTU on the path between two hosts. If you send packets larger than the PMTU and fragmentation is disabled (as it usually is on modern networks), routers drop them and return an ICMP error: “Fragmentation Needed” in IPv4, “Packet Too Big” in IPv6. If those ICMP messages are blocked, the sender never learns the correct size. The result is the infamous symptom: small packets work, large packets hang. Linux tools surface this in multiple ways: tracepath reports PMTU directly; tcpdump reveals the ICMP errors; and iperf3 shows throughput collapse when MTU mismatches cause retransmissions. Understanding PMTU shifts your diagnosis from “the server is slow” to “the path is constrained.”
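You can reproduce the symptom deliberately by pinning the Don’t Fragment bit and varying the payload size; a sketch assuming a standard 1500-byte Ethernet MTU (1472 bytes of ICMP payload plus 28 bytes of IP and ICMP headers):

$ ping -c 3 -M do -s 1472 example.com   # 1472 + 28 = 1500 bytes: fits the MTU, should succeed
$ ping -c 3 -M do -s 1600 example.com   # exceeds the local MTU: rejected before it leaves
ping: local error: message too long, mtu=1500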

Advanced routing problems often involve policy routing and multiple interfaces. VPNs, source-based routing, and multi-homed hosts can send different destinations through different uplinks. The kernel may choose a route based on source address or marks assigned by firewall rules. If you only look at the main table, you will miss the true behavior. The correct workflow is: inspect ip rule, identify which table is in use for the traffic in question, use ip route get with a source address when needed, and then validate with path probes. This discipline separates a correct, reproducible diagnosis from a lucky guess.

Finally, remember that routing is only one layer. A correct route can still fail if neighbor resolution fails or if the next-hop router is down. That is why routing diagnosis must be layered: (1) What route does the kernel choose? (2) Can the next hop be resolved at L2? (3) What does the path beyond the next hop look like? The tools in this guide map directly to those questions, and the projects will force you to practice that sequence until it is reflexive.

How this fits into the projects

  • Connectivity Diagnostic Suite (Project 2)
  • Routing Table Explorer (Project 7)
  • Bandwidth Monitor (Project 8)

Definitions & key terms

  • Longest-prefix match: Route selection rule where the most specific prefix wins.
  • PMTU: Path MTU, the smallest MTU along a path.
  • Policy routing: Selecting a routing table based on metadata, not just destination.

Mental model diagram

Destination IP
   |
   v
Policy rules -> Routing table -> Best prefix -> Next hop + egress
   |
   v
tracepath / mtr validate path and MTU

How it works (step-by-step, invariants, failure modes)

  1. Select routing table based on policy rules.
  2. Find best prefix match.
  3. Choose next hop and source IP.
  4. Resolve next hop at L2.
  5. Probe path for PMTU and latency.

Invariants: best prefix wins; PMTU <= smallest link MTU.
Failure modes: wrong table, blackhole route, ICMP blocked, PMTUD failure.

Minimal concrete example (route lookup transcript):

Destination: 203.0.113.10
Selected: 0.0.0.0/0 via 192.168.1.1 dev eth0 src 192.168.1.100

Common misconceptions

  • “The default route is used for everything.” (Only when no more specific prefix matches.)
  • “Traceroute proves connectivity.” (It only proves ICMP/TTL handling, not application reachability.)

Check-your-understanding questions

  1. Why can a /24 override a /16 route?
  2. What does tracepath report that traceroute might not?
  3. How can policy routing send the same destination via different paths?

Check-your-understanding answers

  1. Longest-prefix match chooses the most specific route.
  2. The path MTU (and where it changes along the path).
  3. Different tables can be selected based on source or marks.

Real-world applications

  • VPN split-tunneling, multi-homed servers, and performance debugging.

Where you’ll apply it: Projects 2, 7, and 8.

References

  • ip(8) description and routing functionality.
  • tracepath description (path + MTU discovery).
  • mtr description (combines traceroute and ping).
  • PMTU discovery standards.

Key insights: Routing is a choice plus a constraint; you must verify both the chosen path and its MTU limits.

Summary: You can now predict route selection and validate the path end-to-end using tracepath and mtr.

Homework/Exercises to practice the concept

  • Given a routing table with overlapping prefixes, predict which route is chosen for five destinations.
  • Use a diagram to show how PMTU failures cause “large packet” hangs.

Solutions to the homework/exercises

  • The most specific prefix always wins; ties go to lowest metric.
  • Large packets drop when the path MTU is smaller; ICMP “Packet Too Big” feedback is required to adapt.

DNS Resolution and Name System Behavior

Fundamentals

DNS is the internet’s naming system: it maps human-friendly names to resource records such as A, AAAA, MX, and TXT. A client (stub resolver) typically asks a recursive resolver to answer. If the recursive resolver does not have the answer cached, it follows the hierarchy: root servers point to TLD servers, which point to authoritative servers for the domain. RFC 1034 defines the conceptual model and RFC 1035 defines the protocol and message format. The root zone is served by 13 named authorities (A through M) with many anycast instances worldwide. On Linux, name resolution is often mediated by systemd-resolved; resolvectl shows which upstream servers are in use, whether DNSSEC validation is enabled, and which interface supplied the configuration. This chapter teaches you to treat DNS as a multi-stage system with caches, delegation, and failure modes rather than as a simple lookup table.

Deep Dive

DNS resolution is a distributed, cached workflow with explicit authority boundaries. The stub resolver (part of glibc, systemd-resolved, or another resolver component) forwards a query to a recursive resolver. The recursive resolver answers from cache if possible, or performs iterative resolution: it asks a root server for the TLD delegation, asks the TLD server for the domain’s authoritative server, and then asks the authoritative server for the actual record. Each response contains referrals and glue records, and the resolver follows them until it obtains an authoritative answer. This delegation chain explains why DNS failures can occur in specific segments: a root server issue affects only the first step, while a broken authoritative server affects only its zone.
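You can watch the delegation chain yourself with dig’s trace mode; a trimmed, annotated transcript (server names, TTLs, and the answer are illustrative):

$ dig +trace example.com A
.               518400  IN  NS  a.root-servers.net.       <- root referral
com.            172800  IN  NS  a.gtld-servers.net.       <- TLD referral
example.com.    172800  IN  NS  a.iana-servers.net.       <- authoritative referral
example.com.    86400   IN  A   93.184.216.34             <- final answer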

Caching is central to DNS correctness. Every answer has a TTL, and resolvers cache both positive and negative responses. A short TTL allows rapid changes but increases load and latency; a long TTL increases stability but delays recovery from mistakes. Negative caching (caching NXDOMAIN) can cause failures to persist longer than expected. When you troubleshoot DNS, you must distinguish between the authoritative truth and the cached reality. This is why comparing multiple resolvers is such a powerful technique: if one resolver is wrong, it is usually a cache or policy issue; if all resolvers are wrong, the authoritative zone is likely at fault.
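One quick way to see the cached reality is to repeat a query against the same resolver and watch the TTL count down; a sketch (127.0.0.53 is the usual systemd-resolved stub address, and the numbers are illustrative):

$ dig +noall +answer example.com @127.0.0.53
example.com.   86400   IN   A   93.184.216.34
$ sleep 60; dig +noall +answer example.com @127.0.0.53
example.com.   86340   IN   A   93.184.216.34    (TTL decreasing: answer served from cache)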

Linux introduces an additional layer of complexity: multiple components can manage resolver configuration. systemd-resolved may serve a local stub address (often 127.0.0.53), NetworkManager may set per-interface DNS servers, and VPN clients may override DNS settings. resolvectl surfaces the runtime state, revealing which upstreams are actually being used and which interface contributed them. This is essential when you see “DNS works sometimes,” because the system might be switching between upstreams or applying split DNS rules. Without this visibility you might debug the wrong resolver entirely.
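A hedged set of commands for surfacing that state (output differs per system; these subcommands assume systemd-resolved is in use):

$ resolvectl status              # per-link DNS servers, search domains, and DNSSEC setting
$ resolvectl dns                 # compact listing of configured servers per link
$ resolvectl query example.com   # resolve through the same path applications use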

DNSSEC adds cryptographic integrity. It uses signatures (RRSIG) and chain-of-trust records (DS, DNSKEY) to allow a validating resolver to verify that an answer has not been tampered with. If validation fails, the resolver can return a “bogus” result, which is functionally a failure even if the record exists. This is not a DNSSEC bug; it is the intended protection. The important mental model is: DNSSEC provides integrity, not availability. A missing signature or a broken chain can cause resolution failure even when the authoritative server is reachable.
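A minimal client-side check, assuming your resolver validates DNSSEC (example.com is a signed zone, so it makes a convenient test name):

$ dig +dnssec example.com A      # requests DNSSEC records; RRSIG entries appear in the answer section
$ dig example.com A | grep flags # the 'ad' flag in the header means the resolver validated the answer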

Failure modes map cleanly to the resolution chain. NXDOMAIN can be legitimate or a poisoned response. SERVFAIL can indicate upstream outages, misconfigured DNSSEC, or authoritative server errors. Inconsistent answers across resolvers point to caching, geo-based responses, or split-horizon DNS. The proper diagnostic approach is layered: query the system resolver (what applications see), query a public recursive resolver (what the internet sees), then query authoritative servers directly (the truth for the zone). If those disagree, you have located the fault boundary. This is exactly the diagnostic muscle the DNS Deep Dive Tool project will train.

Finally, remember that DNS is a dependency for nearly all applications. A slow or inconsistent resolver adds latency to every request. That means “network is slow” can be a DNS problem even if packets are flowing perfectly. By treating DNS as a system with hierarchies, caches, and validation, you gain the ability to diagnose outages that look random but are actually deterministic.

How this fits into the projects

  • DNS Deep Dive Tool (Project 3)
  • Connectivity Diagnostic Suite (Project 2)

Definitions & key terms

  • Resolver: Client or service that performs DNS lookups for applications.
  • Authoritative server: DNS server that hosts the original records for a zone.
  • TTL: Time a record can be cached.

Mental model diagram

App -> Stub Resolver -> Recursive Resolver
                       |-> Root -> TLD -> Authoritative
                       |-> Cache

How it works (step-by-step, invariants, failure modes)

  1. App asks stub resolver for name.
  2. Stub asks recursive resolver.
  3. Recursive uses cache or queries root/TLD/authoritative.
  4. Answer returned, cached for TTL.

Invariants: DNS is hierarchical; records are cached with TTL.
Failure modes: wrong resolver, DNSSEC validation failure, stale cache.

Minimal concrete example (protocol transcript, simplified):

Query: A example.com
Root -> referral to .com
TLD -> referral to example.com authoritative
Auth -> A 93.184.216.34 TTL 86400

Common misconceptions

  • “DNS is just a file.” (It is a distributed, cached system.)
  • “If one resolver works, DNS is fine.” (Different resolvers can have different caches.)

Check-your-understanding questions

  1. What does a recursive resolver do that a stub resolver does not?
  2. Why can two users see different DNS answers for the same name?
  3. Why can DNSSEC cause lookups to fail even if records exist?

Check-your-understanding answers

  1. It performs iterative queries and caching on behalf of the client.
  2. Caches and different upstream resolvers yield different answers.
  3. Missing or invalid signatures cause validation failure.

Real-world applications

  • Debugging website outages, email misrouting, and CDN propagation issues.

Where you’ll apply it: Projects 2 and 3.

References

  • DNS conceptual and protocol standards (RFC 1034/1035).
  • Root servers and 13 named authorities (IANA).
  • resolvectl description (systemd-resolved interface).

Key insights: DNS failures are often cache or resolver-path problems, not record problems.

Summary: You now know the DNS chain of responsibility and how Linux exposes its resolver state.

Homework/Exercises to practice the concept

  • Draw the resolution path for a domain with a CNAME that points to a CDN.
  • Explain how TTL affects incident recovery timelines.

Solutions to the homework/exercises

  • The resolver must follow the CNAME to its target and query that name’s authoritative servers.
  • Short TTLs speed recovery but increase query load; long TTLs delay changes.

ICMP and Path Probing

Fundamentals

ICMP is the control and error-reporting protocol that helps IP work. It is used for reachability tests (Echo Request/Reply), error signals (Destination Unreachable), and path discovery (Time Exceeded). Tools like ping, traceroute, tracepath, and mtr rely on ICMP in different ways. Traceroute is built on a simple idea: send packets with increasing TTL values and observe ICMP Time Exceeded replies from routers along the path. If a hop does not respond, you may see asterisks, which can be caused by rate limiting, firewalls, or asymmetric paths. The core skill is to interpret ICMP signals as evidence about where a packet was handled and why it stopped.

Deep Dive

ICMP sits alongside IP as a signaling protocol. It does not carry application data, but it carries explanations: unreachable, expired, fragmentation needed, redirects. That makes it the backbone of network diagnostics. Ping uses ICMP Echo Request/Reply to establish basic reachability and round-trip time. Traceroute and tracepath use TTL expiration to expose the route: a packet is sent with TTL=1, the first router decrements to 0 and returns an ICMP Time Exceeded message. Then TTL=2 exposes the second hop, and so on until the destination responds. The response at the final hop depends on the probe type: traditional traceroute sends UDP probes to high ports to elicit an ICMP Port Unreachable, while TCP traceroute sends SYN packets and uses SYN-ACK or RST responses. Each method has different behaviors under firewalls, which is why asterisks are not a definitive sign of failure.
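The probe type is a command-line choice; a sketch (Linux traceroute flags shown, TCP mode usually requires root, and port 443 is just an example service port):

$ traceroute example.com                   # classic UDP probes to high-numbered ports
$ sudo traceroute -T -p 443 example.com    # TCP SYN probes; often survive firewalls that drop UDP/ICMP probes
$ tracepath example.com                    # unprivileged probes that also report PMTU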

Rate limiting is a major source of confusion. Routers often rate-limit ICMP responses to protect themselves. That means a hop can show 100% loss in traceroute while the end-to-end path is perfectly healthy. The correct interpretation is to focus on the destination and on loss that persists beyond a hop. If hop 4 shows loss but hop 5 and the destination do not, hop 4 is merely rate limiting responses. If loss appears at hop 4 and persists to all subsequent hops, that indicates a real problem at or beyond hop 4.
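A hedged illustration of that reading, with invented addresses and numbers (real mtr reports include more columns):

$ mtr --report -c 50 example.com
HOST: laptop            Loss%   Snt    Avg
 4.  10.20.0.1          32.0%    50   11.2   <- loss only here: likely ICMP rate limiting
 5.  203.0.113.1         0.0%    50   12.0
 9.  93.184.216.34       0.0%    50   14.8   <- destination clean: end-to-end path is healthy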

Path MTU discovery depends on ICMP as well. When a packet is too large to be forwarded without fragmentation, routers send an ICMP “Packet Too Big” (IPv6) or “Fragmentation Needed” (IPv4) message with the next-hop MTU. If those messages are blocked, PMTUD fails, leading to the classic symptom that small packets succeed and large ones hang. tracepath is designed to surface PMTU by detecting these ICMP responses without requiring root. mtr can reveal intermittent ICMP loss, which may correlate with congestion or policy.
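If you suspect those messages are generated but filtered, you can watch for them on the wire; a sketch (eth0 is a placeholder interface, and the IPv6 filter assumes no extension headers precede the ICMPv6 header):

$ sudo tcpdump -ni eth0 'icmp and icmp[0] == 3 and icmp[1] == 4'   # IPv4 Fragmentation Needed (type 3, code 4)
$ sudo tcpdump -ni eth0 'icmp6 and ip6[40] == 2'                   # IPv6 Packet Too Big (type 2)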

Asymmetry is another subtlety: the path from A to B can differ from B to A. Traceroute shows only the forward path of the probes, but ICMP replies traverse the return path. If the return path is filtered or congested, you may see false negatives in hop responses. This is why an ICMP-based picture must be interpreted cautiously: it shows where probes and responses traveled, not necessarily the full bidirectional path.

When you design a diagnostic workflow, ICMP should be layered. Start with ping to verify basic reachability. Use traceroute or tracepath to map hops and PMTU. Use mtr to observe variability over time. If ICMP is blocked, switch to TCP-based probes (e.g., TCP traceroute or a TCP connect test) to confirm application reachability. The goal is to use ICMP as a signal, not a single source of truth.
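A minimal Bash sketch of that layering (the script name, default port, and probe counts are invented; the TCP test relies on Bash’s /dev/tcp feature):

#!/usr/bin/env bash
# probe.sh: layered reachability check - ICMP first, then path, then an application-level TCP probe.
target="${1:?usage: probe.sh <host> [port]}"
port="${2:-443}"

ping -c 3 -W 2 "$target" >/dev/null && echo "ICMP: reachable" || echo "ICMP: no reply (may be filtered)"
tracepath -n "$target" | tail -n 3          # last hops plus the resulting pmtu line
mtr --report -c 10 "$target"                # short loss/latency sample per hop

# TCP connect test: can succeed even where ICMP is blocked, as long as the port is open.
if timeout 3 bash -c "exec 3<>/dev/tcp/$target/$port"; then
    echo "TCP $port: open"
else
    echo "TCP $port: closed or filtered"
fi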

How this fits into the projects

  • Connectivity Diagnostic Suite (Project 2)
  • Network Troubleshooting Wizard (Project 13)

Definitions & key terms

  • ICMP: Internet Control Message Protocol; signals errors and diagnostics for IP.
  • TTL: Time To Live; decremented by each router to prevent loops.
  • Time Exceeded: ICMP message sent when TTL reaches zero.

Mental model diagram

Probe TTL=1 -> Router1 -> ICMP Time Exceeded
Probe TTL=2 -> Router2 -> ICMP Time Exceeded
Probe TTL=3 -> Router3 -> ICMP Time Exceeded
...
Probe TTL=N -> Destination -> Reply

How it works (step-by-step, invariants, failure modes)

  1. Send probe with TTL=1.
  2. Router decrements TTL to 0 and returns Time Exceeded.
  3. Repeat with TTL=2,3,… to map hops.
  4. Interpret destination response to confirm reachability.

Invariants: TTL decrements per hop; ICMP responses can be rate-limited.
Failure modes: firewalls block ICMP, asymmetry hides hops, PMTUD messages dropped.

Minimal concrete example (protocol transcript, simplified):

TTL=1 -> 192.168.1.1 replies Time Exceeded
TTL=2 -> 10.0.0.1 replies Time Exceeded
TTL=3 -> * (no reply)
TTL=4 -> 203.0.113.10 replies Time Exceeded
TTL=5 -> 93.184.216.34 replies Port Unreachable

Common misconceptions

  • “Asterisks mean the hop is down.” (Often just rate limiting.)
  • “Traceroute shows the full round-trip path.” (It shows the forward path of probes; return path may differ.)

Check-your-understanding questions

  1. Why can traceroute show loss at an intermediate hop while the destination is reachable?
  2. How does TCP traceroute differ from UDP traceroute?
  3. What role does ICMP play in PMTUD?

Check-your-understanding answers

  1. Routers may rate-limit ICMP responses, causing apparent loss without real failure.
  2. TCP traceroute sends SYN packets and uses SYN-ACK/RST responses, often bypassing ICMP blocks.
  3. Routers send ICMP “Packet Too Big” messages to indicate the path MTU.

Real-world applications

  • Diagnosing ISP outages, firewall blocks, and MTU black holes.

Where you’ll apply it: Projects 2 and 13.

References

  • RFC 792 (ICMP)
  • RFC 1191 / RFC 8201 (PMTU discovery)

Key insights: ICMP is evidence, not verdict; interpret it in context with routing and application probes.

Summary: You can now use ICMP-based tools to map paths, detect PMTU issues, and interpret loss correctly.

Homework/Exercises to practice the concept

  • Run traceroute and mtr to two destinations and compare hop stability.
  • Explain why hop-level loss can be misleading.

Solutions to the homework/exercises

  • Hop stability often varies due to rate limiting; only loss that persists to the destination indicates a true problem.
  • Intermediate loss without destination loss is usually a control-plane artifact, not a data-plane failure.

3. Project Specification

3.1 What You Will Build

A diagnostic tool that runs layered connectivity tests and produces a structured report.

3.2 Functional Requirements

  1. Core data collection: Gather the required system/network data reliably.
  2. Interpretation layer: Translate raw outputs into human-readable insights.
  3. Deterministic output: Produce stable, comparable results across runs.
  4. Error handling: Detect missing privileges, tools, or unsupported interfaces.

3.3 Non-Functional Requirements

  • Performance: Runs in under 5 seconds for baseline mode.
  • Reliability: Handles missing data sources gracefully.
  • Usability: Output is readable without post-processing.

3.4 Example Usage / Output

$ ./netdiag.sh example.com

NETWORK CONNECTIVITY REPORT
Target: example.com (93.184.216.34)
Source: 192.168.1.100 (eth0)

DNS: PASS (23 ms)
ICMP: PASS (0% loss, avg 12 ms)
TCP 443: FAIL (timeout)
Traceroute: path stalls after hop 4
PMTU: 1472 bytes
Diagnosis: likely firewall or upstream block on TCP 443

3.5 Data Formats / Schemas / Protocols

  • Input: CLI tool output, kernel state, or service logs.
  • Output: A structured report with sections and summarized metrics.

3.6 Edge Cases

  • Missing tool binaries or insufficient permissions.
  • Interfaces or hosts that return no data.
  • Transient states (link flaps, intermittent loss).

3.7 Real World Outcome

$ ./netdiag.sh example.com

NETWORK CONNECTIVITY REPORT
Target: example.com (93.184.216.34)
Source: 192.168.1.100 (eth0)

DNS: PASS (23 ms)
ICMP: PASS (0% loss, avg 12 ms)
TCP 443: FAIL (timeout)
Traceroute: path stalls after hop 4
PMTU: 1472 bytes
Diagnosis: likely firewall or upstream block on TCP 443

3.7.1 How to Run (Copy/Paste)

$ ./netdiag.sh <target>

3.7.2 Golden Path Demo (Deterministic)

Run the tool against a known-good target and verify every section of the output matches the expected format.

3.7.3 Exact Terminal Transcript (CLI)

$ ./netdiag.sh example.com

NETWORK CONNECTIVITY REPORT
Target: example.com (93.184.216.34)
Source: 192.168.1.100 (eth0)

DNS: PASS (23 ms)
ICMP: PASS (0% loss, avg 12 ms)
TCP 443: FAIL (timeout)
Traceroute: path stalls after hop 4
PMTU: 1472 bytes
Diagnosis: likely firewall or upstream block on TCP 443

4. Solution Architecture

4.1 High-Level Design

[Collector] -> [Parser] -> [Analyzer] -> [Reporter]

4.2 Key Components

Component Responsibility Key Decisions
Collector Gather raw tool output Which tools to call and with what flags
Parser Normalize raw text/JSON Text vs JSON parsing strategy
Analyzer Compute insights Thresholds and heuristics
Reporter Format output Stable layout and readability

4.3 Data Structures (No Full Code)

  • InterfaceRecord: name, state, addresses, stats
  • RouteRecord: prefix, gateway, interface, metric
  • Observation: timestamp, source, severity, message

4.4 Algorithm Overview

Key Algorithm: Evidence Aggregation

  1. Collect raw outputs from tools.
  2. Parse into normalized records.
  3. Apply interpretation rules and thresholds.
  4. Render the final report.

Complexity Analysis:

  • Time: O(n) over number of records
  • Space: O(n) to hold parsed records
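A minimal Bash sketch of the four stages above, using ping as the only collector (function names and thresholds are invented for illustration):

#!/usr/bin/env bash
# Evidence aggregation sketch: collect -> parse -> analyze -> report, for a single tool.
collect() { ping -c 5 -q "$1" 2>&1; }                        # raw tool output

parse() {                                                    # normalize into key=value records
  awk -F'[ /%]+' '/packet loss/ {print "loss=" $6}
                  /^rtt/        {print "avg_ms=" $8}'
}

analyze() {                                                  # apply thresholds to the records
  while IFS='=' read -r key val; do
    case "$key" in
      loss)   [ "${val%.*}" -gt 0 ] && echo "WARN  packet loss ${val}%" || echo "OK    packet loss 0%";;
      avg_ms) echo "INFO  average latency ${val} ms";;
    esac
  done
}

collect "$1" | parse | analyze                               # reporter: one finding per line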

5. Implementation Guide

5.1 Development Environment Setup

# Install required tools with your distro package manager
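For example, on a Debian/Ubuntu-based system (package names are distribution-specific and vary by release; use dnf, pacman, or zypper equivalents elsewhere):

$ sudo apt-get install iproute2 iputils-ping iputils-tracepath traceroute mtr-tiny bind9-dnsutils tcpdump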

5.2 Project Structure

project-root/
├── src/
│   ├── main
│   ├── collectors/
│   └── formatters/
├── tests/
└── README.md

5.3 The Core Question You’re Answering

“If the connection is broken or slow, exactly where is it failing and why?”

5.4 Concepts You Must Understand First

  1. ICMP behavior
    • What is ICMP used for?
    • Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 6-7
  2. Traceroute mechanics
    • How TTL reveals hops.
    • Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 8
  3. PMTU discovery
    • Why large packets can fail when small packets work.
    • Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 20

5.5 Questions to Guide Your Design

  1. What order of tests minimizes time while maximizing insight?
  2. How will you differentiate DNS failures from routing failures?
  3. Which outputs should map to specific root causes?

5.6 Thinking Exercise

Interpret this traceroute:

1  192.168.1.1  1 ms
2  10.0.0.1    12 ms
3  * * *
4  * * *
5  93.184.216.34  15 ms

Questions:

  • Is the destination reachable?
  • Why are hops 3-4 missing?

5.7 The Interview Questions They’ll Ask

  1. “Ping works but HTTPS fails. What do you check next?”
  2. “What do asterisks in traceroute mean?”
  3. “How is tracepath different from traceroute?”
  4. “How do you detect MTU problems?”
  5. “Why might ICMP be blocked but TCP still work?”

5.8 Hints in Layers

Hint 1: Start with DNS, then ICMP, then TCP.
Hint 2: Use mtr in report mode for loss/latency trends.
Hint 3: Use tracepath to report PMTU without root.
Hint 4: Correlate a failing hop with upstream ISP boundaries.

5.9 Books That Will Help

Topic Book Chapter
ICMP “TCP/IP Illustrated, Vol 1” Ch. 6-7
Traceroute “TCP/IP Illustrated, Vol 1” Ch. 8
PMTU “TCP/IP Illustrated, Vol 1” Ch. 20

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

  • Define outputs and parse a single tool.
  • Produce a minimal report.

Phase 2: Core Functionality (3-5 days)

  • Add remaining tools and interpretation logic.
  • Implement stable formatting and summaries.

Phase 3: Polish & Edge Cases (2-3 days)

  • Handle missing data and failure modes.
  • Add thresholds and validation checks.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Parsing format Text vs JSON JSON where available More stable parsing
Output layout Table vs sections Sections Readability for humans
Sampling One-shot vs periodic One-shot + optional loop Predictable runtime

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Validate parsing Parse fixed tool output samples
Integration Tests Validate tool calls Run against a lab host
Edge Case Tests Handle failures Missing tool, no permissions

6.2 Critical Test Cases

  1. Reference run: Output matches golden transcript.
  2. Missing tool: Proper error message and partial report.
  3. Permission denied: Clear guidance for sudo or capabilities.

6.3 Test Data

Input: captured command output
Expected: normalized report with correct totals
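A sketch of the reference-run check (the fixture path is invented, and variable fields such as latencies would need to be normalized, e.g. with sed, before a strict diff can pass):

#!/usr/bin/env bash
# Compare a live run against the stored golden transcript.
./netdiag.sh example.com > /tmp/actual.txt || true    # the tool may exit non-zero when it reports a failure
if diff -u tests/golden/example.com.txt /tmp/actual.txt; then
    echo "PASS: output matches golden transcript"
else
    echo "FAIL: output drifted from golden transcript" >&2
    exit 1
fi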

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Wrong interface Empty output Verify interface names
Missing privileges Permission errors Use sudo or capabilities
Misparsed output Wrong stats Prefer JSON parsing

7.2 Debugging Strategies

  • Re-run each tool independently to compare raw output.
  • Add a verbose mode that dumps raw data sources.

7.3 Performance Traps

  • Avoid tight loops without sleep intervals.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add colored status markers.
  • Export report to a file.

8.2 Intermediate Extensions

  • Add JSON output mode.
  • Add baseline comparison.

8.3 Advanced Extensions

  • Add multi-host aggregation.
  • Add alerting thresholds.

9. Real-World Connections

9.1 Industry Applications

  • SRE runbooks and on-call diagnostics.
  • Network operations monitoring.

9.2 Related Tools

  • tcpdump / iproute2 / nftables
  • mtr / iperf3

9.3 Interview Relevance

  • Demonstrates evidence-based debugging and tool mastery.

10. Resources

10.1 Essential Reading

  • Primary book listed in the main guide.
  • Relevant RFCs and tool manuals.

10.2 Video Resources

  • Conference talks on Linux networking and troubleshooting.