Project 9: Network Diagnostic Toolkit

Build a CLI suite that wraps core networking tools into a consistent, automated diagnostic workflow.

Quick Reference

Attribute       Value
Difficulty      Level 3: Advanced
Time Estimate   2-3 weeks
Language        Bash (Alternatives: Python, Go)
Prerequisites   Projects 3-6, basic TCP/IP, familiarity with curl and ping
Key Topics      network diagnostics, parallel execution, parsing, reporting

1. Learning Objectives

By completing this project, you will:

  1. Automate common network diagnostics into a single CLI.
  2. Parse and normalize outputs from tools like ping, traceroute, dig.
  3. Run checks in parallel with timeouts and retries.
  4. Generate human-readable and machine-readable reports.
  5. Build a modular diagnostic framework for new checks.

2. Theoretical Foundation

2.1 Core Concepts

  • Latency and packet loss: Measuring reliability and responsiveness.
  • DNS resolution: Tracing issues across resolvers.
  • Routing: Understanding hop-by-hop paths and bottlenecks.
  • Connectivity vs reachability: A TCP connect can succeed while ICMP probes are filtered, and vice versa.
  • Timeouts and retries: Robustness for flaky networks.
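
A minimal sketch of that last idea, a retry wrapper with capped attempts and exponential backoff (the function name and limits are illustrative, not part of the spec):

with_retries() {
  # Run a command up to $1 times, doubling the delay between attempts.
  local tries="$1"; shift
  local delay=1 i
  for (( i = 1; i <= tries; i++ )); do
    "$@" && return 0
    (( i < tries )) && { sleep "$delay"; (( delay *= 2 )); }
  done
  return 1   # all attempts failed
}

# Usage: with_retries 3 ping -c 1 -W 2 example.com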

2.2 Why This Matters

When networks fail, time is critical. A repeatable diagnostic tool reduces guesswork, supports incident response, and provides evidence for root cause analysis.

2.3 Historical Context / Background

Tools like ping and traceroute date back to early Unix systems and remain foundational. Modern environments wrap these tools into automated playbooks; you will build your own.

2.4 Common Misconceptions

  • “Ping failure means host is down.” ICMP may be blocked (see the demo after this list).
  • “DNS works if one resolver works.” Cached results can hide issues.
  • “Traceroute always shows the real path.” Firewalls and load balancing distort paths.
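
A quick way to observe the first misconception (the host is a placeholder; pick one that filters ICMP):

host=example.com
ping -c 3 -W 2 "$host"        # may report 100% loss if ICMP is filtered
nc -z -w 3 "$host" 443 && echo "tcp/443 reachable despite ping failure"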

3. Project Specification

3.1 What You Will Build

A CLI toolkit called netdiag that performs a standard diagnostic suite: ping, DNS lookup, traceroute, TCP port check, and HTTP request validation. It outputs a summary report plus optional JSON.

3.2 Functional Requirements

  1. Ping test: Latency and packet loss.
  2. DNS test: Resolve via system and custom resolvers.
  3. Traceroute: Show hop latency and detect breaks.
  4. Port check: TCP connect test to host:port.
  5. HTTP check: Validate response code and headers.
  6. Parallel mode: Run multiple hosts concurrently.
  7. Output: Summary + JSON report.

3.3 Non-Functional Requirements

  • Reliability: Handle missing tools gracefully.
  • Speed: Parallel checks with timeouts.
  • Usability: Clear summary and exit codes.

3.4 Example Usage / Output

$ netdiag example.com --port 443 --http
[netdiag] ping: avg=14ms loss=0%
[netdiag] dns: system=93.184.216.34 resolver=93.184.216.34
[netdiag] tcp: 443 open (12ms)
[netdiag] http: 200 OK
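
The JSON report format is not fixed by the specification; one possible shape, emitting one object per host:

$ netdiag example.com --port 443 --http --json
{
  "host": "example.com",
  "ping": {"avg_ms": 14, "loss_pct": 0},
  "dns": {"system": "93.184.216.34", "resolver": "93.184.216.34"},
  "tcp": {"port": 443, "open": true, "ms": 12},
  "http": {"status": 200}
}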

3.5 Real World Outcome

You can run netdiag during outages to quickly identify whether failures are DNS, routing, firewall, or service-level issues.


4. Solution Architecture

4.1 High-Level Design

hosts -> check runner -> parsers -> report builder -> stdout/json
                 |         |
                 |         +-> normalized metrics
                 +-> parallel workers

4.2 Key Components

Component       Responsibility             Key Decisions
Runner          Orchestrate checks         sequential vs parallel
Parsers         Normalize outputs          regex vs field parsing
Report builder  Summarize results          table + JSON
Tool adapter    Detect tool availability   fallback strategies

4.3 Data Structures

declare -A RESULT                # bash associative array, one per host
RESULT[ping_avg]=14              # average RTT in ms
RESULT[dns_ip]="93.184.216.34"   # resolved address

4.4 Algorithm Overview

Key Algorithm: Parallel Checks

  1. Build job list from host + check.
  2. Spawn background jobs with timeouts.
  3. Collect exit codes and outputs.
  4. Merge results into report.

Complexity Analysis:

  • Work: O(h*c) checks in total (h hosts, c checks)
  • Wall-clock time: roughly O(h*c/p) with p parallel workers
  • Space: O(h*c) for results
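
A minimal sketch of steps 2-3, assuming a run_checks function that performs all checks for one host (names, the 30s timeout, and the temp-file layout are illustrative):

outdir=$(mktemp -d)        # one output file per host
declare -A pids
export -f run_checks       # timeout(1) runs external commands, so expose
                           # the function to a child bash
for host in "${HOSTS[@]}"; do
  timeout 30 bash -c 'run_checks "$1"' _ "$host" >"$outdir/$host.out" 2>&1 &
  pids[$host]=$!
done
for host in "${!pids[@]}"; do
  wait "${pids[$host]}" && status=ok || status=fail
  echo "$status" >"$outdir/$host.status"
done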

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install iputils-ping traceroute dnsutils curl netcat-openbsd

5.2 Project Structure

netdiag/
|-- netdiag
|-- lib/
|   |-- checks.sh
|   |-- parse.sh
|   `-- report.sh
`-- fixtures/

5.3 The Core Question You Are Answering

“How can I prove where the network is failing, not just that it is failing?”

5.4 Concepts You Must Understand First

  1. ICMP vs TCP checks
  2. DNS resolution and caching
  3. Timeouts and retry strategies

5.5 Questions to Guide Your Design

  • How will you standardize outputs from different tools?
  • How do you handle missing traceroute or restricted ICMP?
  • What exit code should represent “partial failure”?
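
For the last question, one possible exit-code convention (purely illustrative; whatever you choose, document it in --help):

exit_status() {
  local passed="$1" total="$2"
  if   (( passed == total )); then return 0   # all checks passed
  elif (( passed == 0 ));     then return 1   # everything failed
  else                             return 2   # partial failure
  fi
}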

5.6 Thinking Exercise

Create a flowchart for a failed HTTP check: is it DNS? TCP? TLS? HTTP?

5.7 The Interview Questions They Will Ask

  1. Why might ping fail but TCP still succeed?
  2. How do you detect DNS resolution issues?
  3. How do you design retries without amplifying outages?

5.8 Hints in Layers

Hint 1: Start by wrapping ping and parsing avg latency.

Hint 2: Add DNS resolution with dig +short.

Hint 3: Add nc -z for TCP port tests.

Hint 4: Add parallelism with & and wait.
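
Minimal sketches of Hints 1-3 (option choices such as -c 4 and -W 2 are assumptions, not requirements):

ping_avg() {   # Hint 1: print average RTT in ms, nothing on failure
  LANG=C ping -c 4 -W 2 "$1" 2>/dev/null |
    awk -F'/' '/^(rtt|round-trip)/ {print $5}'
}

dns_ip() {     # Hint 2: first answer from dig (may be a CNAME target),
               # optionally via a custom resolver passed as $2
  dig +short "$1" ${2:+@"$2"} | head -n 1
}

tcp_open() {   # Hint 3: exit 0 if host ($1) accepts TCP on port ($2)
  nc -z -w 3 "$1" "$2"
}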

5.9 Books That Will Help

Topic                    Book                                          Chapter
Networking basics        “Computer Networking: A Top-Down Approach”    Ch. 1-2
Linux networking tools   “Linux Network Administrator’s Guide”         Ch. 6

5.10 Implementation Phases

Phase 1: Core Checks (4-5 days)

Goals:

  • Ping, DNS, TCP check.

Tasks:

  1. Implement check functions with parsers.
  2. Build summary output.

Checkpoint: Output matches manual commands.

Phase 2: Parallel Execution (4-5 days)

Goals:

  • Run checks concurrently.

Tasks:

  1. Background jobs per host.
  2. Collect and merge results.

Checkpoint: Performance improves with multiple hosts.

Phase 3: Reports and Polish (3-4 days)

Goals:

  • JSON output and error handling.

Tasks:

  1. Add --json output.
  2. Add timeouts and retries.

Checkpoint: JSON output is valid and consistent.
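
A minimal sketch of --json built on the RESULT array from section 4.3, assuming all values are numbers or plain strings (a real implementation must escape quotes and emit numbers without quoting):

emit_json() {
  local first=1 key
  printf '{'
  for key in "${!RESULT[@]}"; do
    (( first )) || printf ','
    first=0
    printf '"%s":"%s"' "$key" "${RESULT[$key]}"
  done
  printf '}\n'
}

Piping the output through jq . (or python3 -m json.tool) is an easy way to automate the "JSON output is valid" checkpoint.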

5.11 Key Implementation Decisions

Decision      Options                     Recommendation    Rationale
Parallelism   background jobs vs xargs    background jobs   simplest to implement
Parsing       regex vs structured flags   structured flags  more reliable
Output        table only vs JSON          both              human + machine readable

6. Testing Strategy

6.1 Test Categories

Category      Purpose              Examples
Unit          Parser correctness   ping output fixtures
Integration   Multi-check runs     localhost tests
Edge Cases    Missing tools        simulate an absent traceroute

6.2 Critical Test Cases

  1. DNS resolver returns NXDOMAIN.
  2. TCP port closed returns exit code 1.
  3. Ping loss > 50% triggers warning.

6.3 Test Data

fixtures/ping_sample.txt
fixtures/dig_sample.txt
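
A minimal sketch of a fixture-driven unit test, assuming lib/parse.sh defines a parse_ping_avg function that reads raw ping output on stdin (both names are assumptions):

#!/usr/bin/env bash
source lib/parse.sh

actual=$(parse_ping_avg < fixtures/ping_sample.txt)
expected="14.107"   # the avg value recorded in the fixture

if [[ "$actual" == "$expected" ]]; then
  echo "PASS parse_ping_avg"
else
  echo "FAIL parse_ping_avg: got '$actual', want '$expected'" >&2
  exit 1
fi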

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall                    Symptom            Solution
Parsing localized output   regex fails        force LANG=C
No timeouts                commands hang      use timeout
Blind retries              extended outages   cap retries
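
The first two fixes combine naturally at the call site:

# Pin the locale so the rtt line parses predictably, and cap runtime at 10s.
LANG=C timeout 10 ping -c 4 example.com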

7.2 Debugging Strategies

  • Log raw tool output in debug mode.
  • Compare output with manual runs.

7.3 Performance Traps

Over-parallelizing can overwhelm DNS resolvers or networks. Limit concurrency.
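
One way to enforce that limit, a sketch assuming bash >= 4.3 for wait -n and the run_checks function from section 4.4:

MAX_JOBS=8
for host in "${HOSTS[@]}"; do
  while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
    wait -n             # block until any one background job exits
  done
  run_checks "$host" &
done
wait                    # drain the remaining jobs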


8. Extensions and Challenges

8.1 Beginner Extensions

  • Add --verbose mode.
  • Add saved reports.

8.2 Intermediate Extensions

  • Add TLS handshake timing (see the curl sketch below).
  • Add bandwidth test integration (iperf).
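
For TLS handshake timing, curl's --write-out variables already expose the phase boundaries (time_namelookup, time_connect, and time_appconnect are documented curl variables):

curl -s -o /dev/null \
  -w 'dns=%{time_namelookup}s tcp=%{time_connect}s tls=%{time_appconnect}s\n' \
  https://example.com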

8.3 Advanced Extensions

  • Build a continuous monitoring mode (sketched below).
  • Add Slack or email notifications.
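
A continuous mode can start as little more than a loop; a minimal sketch (the interval and output file are assumptions):

while true; do
  netdiag example.com --port 443 --http --json >> reports.ndjson
  sleep 60
done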

9. Real-World Connections

9.1 Industry Applications

  • Network troubleshooting in SRE.
  • Site reliability triage tools.

9.2 Related Tools

  • mtr: combined ping/traceroute tool.
  • smokeping: latency monitoring.

9.3 Interview Relevance

  • Demonstrates understanding of network diagnostics.
  • Shows ability to automate operational playbooks.

10. Resources

10.1 Essential Reading

  • man ping, man traceroute, man dig, man nc

10.2 Video Resources

  • “How Traceroute Works” (YouTube)

10.3 Tools and Documentation

  • curl, nc, dig, timeout
  • Project 3: Log Parser & Alert System
  • Project 10: Deployment Automation

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain why ICMP and TCP checks differ.
  • I can interpret traceroute output.

11.2 Implementation

  • Reports generate for multiple hosts.
  • JSON output validates.

11.3 Growth

  • I can extend with new checks quickly.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Ping, DNS, TCP checks with summary

Full Completion:

  • Parallel mode + JSON output

Excellence (Going Above & Beyond):

  • Continuous monitoring + notifications