Project 7: Routing Table Explorer
A routing explorer that shows route selection, next hop, and neighbor cache state.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1 week |
| Main Programming Language | Bash |
| Alternative Programming Languages | Python, Go |
| Coolness Level | Level 3: Clever |
| Business Potential | Level 2: Micro-SaaS / Pro tool |
| Prerequisites | Basic Linux CLI |
| Key Topics | Interfaces, Link Layer, and Neighbor Discovery; IP Addressing, Routing, and Path Discovery |
1. Learning Objectives
By completing this project, you will:
- Build the core tool described in the project and validate output against a golden transcript.
- Explain how the tool maps to the Linux networking layer model.
- Diagnose at least one real or simulated failure using the tool’s output.
2. All Theory Needed (Per-Concept Breakdown)
This section includes every concept required to implement this project successfully.
Interfaces, Link Layer, and Neighbor Discovery
Fundamentals
Interfaces are the kernel’s handle for network connectivity. Each interface has a name, link state, MTU, MAC address (for Ethernet), and byte/error counters. When traffic stays on the local link, delivery is done at Layer 2, so IP addresses must be mapped to MAC addresses using ARP (IPv4) or Neighbor Discovery (IPv6). Linux exposes interface state and addressing with iproute2 (ip link, ip addr), physical capabilities with ethtool, and configuration ownership with nmcli when NetworkManager is in control. A key principle: “UP” only means the interface is administratively enabled; it does not guarantee a physical link. To know whether packets can truly move, you must verify carrier state, negotiated speed/duplex, and neighbor cache entries for the next hop. This chapter gives you the vocabulary and evidence sources that anchor every other networking tool.
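To ground these checks in commands, here is a minimal evidence-gathering sequence; eth0 is a placeholder interface name, and the exact output fields vary by driver and iproute2 version:

$ ip -br link show eth0                   # admin/oper state and MAC
$ ip -br addr show eth0                   # assigned IPv4/IPv6 addresses
$ cat /sys/class/net/eth0/carrier         # 1 = carrier present, 0 = no link
$ sudo ethtool eth0 | grep -E 'Speed|Duplex|Link detected'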
Deep Dive
A Linux interface is a convergence of hardware, driver, kernel state, and optional user-space control planes. The kernel tracks administrative state (UP/DOWN), operational state (e.g., LOWER_UP), MTU, and per-queue statistics. Administrative UP means the kernel will attempt to transmit; operational state indicates whether the link is actually usable. The driver determines whether a carrier is present and negotiates speed and duplex. This is why ethtool matters so much: it is the only tool that asks the driver what the hardware actually negotiated, which can reveal subtle failure modes such as auto-negotiation mismatches, disabled offloads, or a link that flaps under load. Many performance “mysteries” are rooted here, not in routing or DNS.
Layer 2 is where IP becomes deliverable. On IPv4, ARP is the protocol that resolves an IP address to a MAC address. The kernel maintains a neighbor cache; when it needs to transmit and no mapping exists, it broadcasts an ARP request and waits for a response. If the response is missing, packets may be queued or dropped. IPv6 uses Neighbor Discovery (NDP) instead of ARP, but the logic is similar: resolve a next-hop link-layer address before transmitting. The neighbor cache has states like REACHABLE, STALE, DELAY, and FAILED. These states explain intermittent outages: a STALE entry works until a probe fails; a FAILED entry means the kernel has given up and won’t transmit until a new resolution attempt succeeds.
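To see these states directly, dump the kernel's neighbor cache; the addresses and MACs below are illustrative:

$ ip neigh show
192.168.1.1 dev eth0 lladdr 00:11:22:33:44:55 REACHABLE
192.168.1.42 dev eth0 lladdr aa:bb:cc:dd:ee:ff STALE
10.0.0.7 dev eth0 FAILED
$ ip -6 neigh show                        # the IPv6 (NDP) equivalent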
Modern Linux is saturated with virtual interfaces. Bridges, veth pairs, VLANs, and tunnels are software constructs that behave like physical interfaces but represent logical connectivity. Containers and Kubernetes rely on veth pairs to connect isolated namespaces to bridges. That means the same “interface truth” applies in virtual environments: you still need to check link state, addresses, and neighbor resolution, but the physical meaning is different. A veth “carrier down” can mean a peer namespace isn’t up. A bridge can mask multiple endpoints behind a single MAC. The interpretation changes, but the tools do not.
Configuration ownership is another hidden complexity. On many systems, NetworkManager or systemd-networkd owns interface configuration, and manual changes can be overwritten. nmcli shows the manager’s view: which connection profiles exist, which interface they bind to, and which IPs and DNS servers are in effect. If ip addr and nmcli disagree, that is evidence that the kernel state and the manager’s intended state are diverging. That mismatch is often the cause of “it worked, then it reverted” incidents. The correct troubleshooting practice is to identify the owner, inspect both perspectives, and then decide whether you are diagnosing a kernel state problem (carrier, driver, ARP) or a control-plane problem (configuration drift).
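A quick way to compare the two perspectives, assuming NetworkManager is the active manager on the host; eth0 is a placeholder:

$ ip -br addr show eth0                                 # kernel state
$ nmcli -g GENERAL.STATE,IP4.ADDRESS device show eth0   # manager's view
$ nmcli connection show --active                        # which profile owns what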
Finally, interface metrics are not just numbers; they are diagnostics. RX/TX errors, dropped packets, or increasing queue drops indicate link issues or overload. Seeing these counters rise while higher-layer tools show intermittent loss is a strong signal that the fault is at or below the interface layer. In other words, before you chase a routing bug, you must prove the interface is physically and logically healthy. This is why interface and neighbor checks are always the first steps in a serious network investigation.
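The counters are one command away; what matters is whether errors and drops rise between runs, not any single snapshot. This output is illustrative:

$ ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
    123456789 456789       0       3       0     120
    TX:  bytes packets errors dropped carrier collsns
     98765432 345678       0       0       0       0
$ watch -n1 'ip -s link show eth0'        # watch the counters move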
MTU and tagging details add another dimension. If a VLAN tag or tunnel reduces effective MTU, packets larger than the path can be dropped or fragmented, which manifests as “some connections work, others hang.” Likewise, checksum and segmentation offloads can change how packet captures look: tcpdump may show incorrect checksums because the NIC computes them later. Knowing that offloads exist helps you interpret evidence correctly, so you do not misdiagnose a healthy link as faulty. The interface layer is where these physical and logical constraints converge, making it the foundation for everything else you will observe.
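To check whether offloads are active (and therefore whether capture checksums can be trusted), assuming a NIC that supports these features:

$ ethtool -k eth0 | grep -E 'checksumming|segmentation-offload'
rx-checksumming: on
tx-checksumming: on
tcp-segmentation-offload: on
generic-segmentation-offload: on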
How this fits into the projects
- Network Interface Inspector (Project 1)
- Routing Table Explorer (Project 7)
- Network Namespace Laboratory (Project 11)
Definitions & key terms
- MAC address: Link-layer hardware address used to deliver Ethernet frames.
- Carrier: Physical link presence as reported by the driver.
- Neighbor cache: Kernel table mapping IP addresses to link-layer addresses.
Mental model diagram
IP packet -> need next hop -> neighbor cache lookup
    |                             |
    |                             +-- hit  -> MAC known
    |                             +-- miss -> ARP/NDP query
    v
Ethernet frame -> driver -> NIC -> wire
How it works (step-by-step, invariants, failure modes)
- Interface administratively UP.
- Driver reports carrier and negotiated link.
- Kernel chooses next hop.
- Neighbor cache resolves MAC.
- Frame transmitted.
Invariants: MAC resolution is required for L2 delivery; carrier must be present to transmit.
Failure modes: link down, wrong VLAN, ARP/ND failure, manager overwriting manual configuration.
Minimal concrete example
Protocol transcript (ARP):
Host: Who has 192.168.1.1? Tell 192.168.1.100
Gateway: 192.168.1.1 is at 00:11:22:33:44:55
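You can watch this exchange live with tcpdump (root or CAP_NET_RAW required); eth0 and the addresses are placeholders, and the output is abridged:

$ sudo tcpdump -n -i eth0 arp
ARP, Request who-has 192.168.1.1 tell 192.168.1.100, length 28
ARP, Reply 192.168.1.1 is-at 00:11:22:33:44:55, length 28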
Common misconceptions
- “UP means the cable is fine.” (Carrier state matters.)
- “ARP is only for the default gateway.” (ARP is for any same-subnet destination.)
Check-your-understanding questions
- What is the difference between administrative state and carrier state?
- Why does a missing neighbor entry cause packets to be dropped?
- When would ethtool show no speed even if the interface is UP?
Check-your-understanding answers
- Admin state is a software flag; carrier is physical link presence.
- The kernel cannot build an Ethernet frame without a MAC.
- Virtual interfaces or link-down conditions often show no speed.
Real-world applications
- Diagnosing link flaps, ARP storms, and NIC driver issues.
Where you'll apply it
Projects 1, 7, and 11.
References
- ethtool description (driver/hardware settings).
- nmcli description (NetworkManager control and status).
Key insights
Physical truth (carrier, speed, ARP) is the foundation for every higher-layer fix.
Summary
Interfaces and neighbors determine whether packets can leave the host at all; validate them before blaming routes or DNS.
Homework/Exercises to practice the concept
- Draw the neighbor cache state transitions for a host that goes idle and then becomes active again.
- Label where carrier loss would appear in the data path.
Solutions to the homework/exercises
- Idle host moves to STALE, then probes on use; if no reply, becomes FAILED.
- Carrier loss is reported by the driver and visible before any routing decision.
IP Addressing, Routing, and Path Discovery
Fundamentals
Routing is the decision process that answers, “Where should this packet go next?” Linux chooses routes using longest-prefix match and attaches that choice to an egress interface and, if needed, a next-hop gateway. The ip tool exposes both the routing tables and the policy rules that choose which table to consult, and it can ask the kernel directly which route would be used for a given destination. Path discovery tools translate that decision into evidence: tracepath probes the path and reports Path MTU (PMTU), while mtr repeatedly probes hops to surface loss and latency patterns. Together, these tools let you move from assumptions (“the route is fine”) to proof (“the kernel will use this gateway, and hop 6 drops 30% of probes”). That shift from inference to evidence is the central skill in routing diagnostics.
Deep Dive
Linux routing is a policy engine, not a single static table. Before any prefix matching occurs, the kernel consults routing policy rules. These rules can select a routing table based on source address, incoming interface, firewall mark, or user-defined priority. Once a table is chosen, the kernel performs longest-prefix match: the most specific prefix wins, and metrics break ties among equally specific routes. The final selection yields an egress interface, a next hop (if the destination is not directly connected), and a preferred source IP. This explains many “route exists but traffic fails” scenarios: the route might exist in a table that is never selected for that traffic, or the preferred source IP might not be reachable on the chosen path.
The most important command in this domain is ip route get <destination>. It queries the kernel’s decision engine and returns exactly what would happen if a packet were sent: the chosen route, interface, and source address. It is your truth oracle because it reflects the kernel’s actual behavior, not your interpretation of the routing table. But a routing decision alone does not guarantee reachability. The next hop must still be reachable at Layer 2, and the path beyond the next hop must accept and forward the packet. That is why route diagnosis always includes neighbor resolution and path probing.
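A hedged example of the command in action; the destination and addresses are illustrative:

$ ip route get 203.0.113.10
203.0.113.10 via 192.168.1.1 dev eth0 src 192.168.1.100 uid 1000
    cache
$ ip route get 203.0.113.10 from 192.168.1.100   # pin a specific source address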
Path discovery tools provide that second half. tracepath sends probes with increasing TTL values and reports where ICMP responses are generated. It also discovers PMTU by observing “Packet Too Big” responses and tracking the smallest MTU on the path. mtr adds repetition, showing latency and loss over time rather than a single snapshot. This matters because routing problems often manifest as intermittent congestion or packet loss at specific hops. A static traceroute might miss a transient spike; a rolling mtr report reveals it. The pairing of ip route get (decision evidence) with mtr (path behavior) is a powerful diagnostic habit.
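Typical invocations; the flags shown are standard, and the target is a placeholder:

$ tracepath -n 203.0.113.10          # hop-by-hop probes plus PMTU discovery
$ mtr -n -r -c 100 203.0.113.10      # 100 rounds, numeric output, report mode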
PMTU is a classic foot-gun. The path MTU is the smallest MTU on the path between two hosts. If you send packets larger than the PMTU and fragmentation is disabled (as it often is for modern networks), routers will drop them and send ICMP “Packet Too Big.” If those ICMP messages are blocked, the sender never learns the correct size. The result is the infamous symptom: small packets work, large packets hang. Linux tools surface this in multiple ways: tracepath reports PMTU directly; tcpdump reveals ICMP errors; and iperf3 shows throughput collapse when MTU mismatches cause retransmissions. Understanding PMTU shifts your diagnosis from “the server is slow” to “the path is constrained.”
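A classic manual PMTU probe, assuming a 1500-byte local MTU: -M do forbids fragmentation, and 1472 = 1500 minus 28 bytes of IP and ICMP headers. The error line is illustrative:

$ ping -c 3 -M do -s 1472 203.0.113.10   # fits a 1500-byte MTU: should succeed
$ ping -c 3 -M do -s 1600 203.0.113.10   # exceeds it: should fail fast
ping: local error: message too long, mtu=1500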
Advanced routing problems often involve policy routing and multiple interfaces. VPNs, source-based routing, and multi-homed hosts can send different destinations through different uplinks. The kernel may choose a route based on source address or marks assigned by firewall rules. If you only look at the main table, you will miss the true behavior. The correct workflow is: inspect ip rule, identify which table is in use for the traffic in question, use ip route get with a source address when needed, and then validate with path probes. This discipline separates a correct, reproducible diagnosis from a lucky guess.
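That workflow as commands; the vpn table name, rule priorities, and addresses are placeholders:

$ ip rule show
0:      from all lookup local
100:    from 10.255.0.0/24 lookup vpn
32766:  from all lookup main
32767:  from all lookup default
$ ip route show table vpn                              # routes the main table hides
$ ip route get 203.0.113.10 from 10.255.0.5 iif tun0   # decision for that source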
Finally, remember that routing is only one layer. A correct route can still fail if neighbor resolution fails or if the next-hop router is down. That is why routing diagnosis must be layered: (1) What route does the kernel choose? (2) Can the next hop be resolved at L2? (3) What does the path beyond the next hop look like? The tools in this guide map directly to those questions, and the projects will force you to practice that sequence until it is reflexive.
How this fits into the projects
- Connectivity Diagnostic Suite (Project 2)
- Routing Table Explorer (Project 7)
- Bandwidth Monitor (Project 8)
Definitions & key terms
- Longest-prefix match: Route selection rule where the most specific prefix wins.
- PMTU: Path MTU, the smallest MTU along a path.
- Policy routing: Selecting a routing table based on metadata, not just destination.
Mental model diagram
Destination IP
      |
      v
Policy rules -> Routing table -> Best prefix -> Next hop + egress
                                                      |
                                                      v
                                  tracepath / mtr validate path and MTU
How it works (step-by-step, invariants, failure modes)
- Select routing table based on policy rules.
- Find best prefix match.
- Choose next hop and source IP.
- Resolve next hop at L2.
- Probe path for PMTU and latency.
Invariants: the most specific prefix wins; PMTU <= smallest link MTU on the path.
Failure modes: wrong table selected, blackhole route, ICMP blocked, PMTUD failure.
Minimal concrete example
Route lookup transcript:
Destination: 203.0.113.10
Selected: 0.0.0.0/0 via 192.168.1.1 dev eth0 src 192.168.1.100
Common misconceptions
- “The default route is used for everything.” (Only when no more specific prefix matches.)
- “Traceroute proves connectivity.” (It only proves ICMP/TTL handling, not application reachability.)
Check-your-understanding questions
- Why can a /24 override a /16 route?
- What does tracepath report that traceroute might not?
- How can policy routing send the same destination via different paths?
Check-your-understanding answers
- Longest-prefix match chooses the most specific route.
- tracepath reports the discovered Path MTU, which plain traceroute does not show.
- Different tables can be selected based on source or marks.
Real-world applications
- VPN split-tunneling, multi-homed servers, and performance debugging.
Where you'll apply it
Projects 2, 7, and 8.
References
- ip(8) description and routing functionality.
- tracepath description (path + MTU discovery).
- mtr description (combines traceroute and ping).
- PMTU discovery standards.
Key insights
Routing is a choice plus a constraint; you must verify both the chosen path and its MTU limits.
Summary
You can now predict route selection and validate the path end-to-end using tracepath and mtr.
Homework/Exercises to practice the concept
- Given a routing table with overlapping prefixes, predict which route is chosen for five destinations.
- Use a diagram to show how PMTU failures cause “large packet” hangs.
Solutions to the homework/exercises
- The most specific prefix always wins; ties go to lowest metric.
- Large packets drop when the path MTU is smaller; ICMP “Packet Too Big” feedback is required to adapt.
3. Project Specification
3.1 What You Will Build
A routing explorer that shows route selection, next hop, and neighbor cache state.
3.2 Functional Requirements
- Core data collection: Gather the required system/network data reliably.
- Interpretation layer: Translate raw outputs into human-readable insights.
- Deterministic output: Produce stable, comparable results across runs.
- Error handling: Detect missing privileges, tools, or unsupported interfaces.
3.3 Non-Functional Requirements
- Performance: Runs in under 5 seconds for baseline mode.
- Reliability: Handles missing data sources gracefully.
- Usability: Output is readable without post-processing.
3.4 Example Usage / Output
$ ./routeexplore.sh
ROUTING TABLE
default via 192.168.1.1 dev eth0 metric 100
10.0.0.0/8 via 10.255.0.1 dev tun0 metric 50
192.168.1.0/24 dev eth0 scope link
Route lookup: 8.8.8.8
selected: default via 192.168.1.1 dev eth0
source: 192.168.1.100
Neighbor cache:
192.168.1.1 -> 00:11:22:33:44:55 REACHABLE
3.5 Data Formats / Schemas / Protocols
- Input: CLI tool output, kernel state, or service logs.
- Output: A structured report with sections and summarized metrics.
3.6 Edge Cases
- Missing tool binaries or insufficient permissions.
- Interfaces or hosts that return no data.
- Transient states (link flaps, intermittent loss).
3.7 Real World Outcome
$ ./routeexplore.sh
ROUTING TABLE
default via 192.168.1.1 dev eth0 metric 100
10.0.0.0/8 via 10.255.0.1 dev tun0 metric 50
192.168.1.0/24 dev eth0 scope link
Route lookup: 8.8.8.8
selected: default via 192.168.1.1 dev eth0
source: 192.168.1.100
Neighbor cache:
192.168.1.1 -> 00:11:22:33:44:55 REACHABLE
3.7.1 How to Run (Copy/Paste)
$ ./routeexplore.sh [options]
3.7.2 Golden Path Demo (Deterministic)
Run the tool against a known-good target and verify every section of the output matches the expected format.
3.7.3 Exact Terminal Transcript
The transcript shown in 3.7 above is the golden output; capture it as your golden file and verify the tool reproduces it byte for byte.
4. Solution Architecture
4.1 High-Level Design
[Collector] -> [Parser] -> [Analyzer] -> [Reporter]
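A minimal Bash sketch of this pipeline, assuming iproute2 is installed; the parse and analyze bodies are illustrative stubs, not the full implementation:

#!/usr/bin/env bash
set -euo pipefail

collect() {               # Collector: raw kernel state
  ip route show
  ip rule show
  ip neigh show
}

parse() {                 # Parser: one normalized record per line
  awk 'NF'                # stub: drop blank lines
}

analyze() {               # Analyzer: apply interpretation rules
  awk '!/^unreachable/'   # stub: real heuristics and thresholds go here
}

report() {                # Reporter: stable, human-readable layout
  echo "ROUTING REPORT"
  cat
}

collect | parse | analyze | report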
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Collector | Gather raw tool output | Which tools to call and with what flags |
| Parser | Normalize raw text/JSON | Text vs JSON parsing strategy |
| Analyzer | Compute insights | Thresholds and heuristics |
| Reporter | Format output | Stable layout and readability |
4.3 Data Structures (No Full Code)
- InterfaceRecord: name, state, addresses, stats
- RouteRecord: prefix, gateway, interface, metric
- Observation: timestamp, source, severity, message
4.4 Algorithm Overview
Key Algorithm: Evidence Aggregation
- Collect raw outputs from tools.
- Parse into normalized records.
- Apply interpretation rules and thresholds.
- Render the final report.
Complexity Analysis:
- Time: O(n) over number of records
- Space: O(n) to hold parsed records
5. Implementation Guide
5.1 Development Environment Setup
# Install required tools with your distro package manager, for example (Debian/Ubuntu):
$ sudo apt install iproute2 ethtool mtr-tiny iputils-tracepath
5.2 Project Structure
project-root/
├── src/
│ ├── main
│ ├── collectors/
│ └── formatters/
├── tests/
└── README.md
5.3 The Core Question You’re Answering
“When I send a packet to X, how does the kernel decide where it goes?”
5.4 Concepts You Must Understand First
- Route selection
  - Longest-prefix match.
  - Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 8
- Gateway vs direct routes
  - When a next hop is required.
  - Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 8-9
- Neighbor cache
  - ARP/ND resolution.
  - Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 4
5.5 Questions to Guide Your Design
- How will you display source address selection?
- How will you show route choice explanations?
- How will you include policy routing tables?
5.6 Thinking Exercise
Given these routes:
10.0.0.0/8 via 10.0.0.1
10.0.0.0/16 via 10.0.0.2
Question: Which route is used for 10.0.5.10 and why?
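Once you have an answer, you can check it against the kernel on a host (or scratch network namespace) where these two routes are installed:

$ ip route get 10.0.5.10    # the kernel's choice should match your reasoning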
5.7 The Interview Questions They’ll Ask
- “What is longest-prefix match?”
- “How do you view the routing table in Linux?”
- “What does an incomplete ARP entry mean?”
- “How do you add a static route?”
- “How do you test a route decision?”
5.8 Hints in Layers
Hint 1: Use ip route get for decision evidence.
Hint 2: Show neighbor cache alongside routes.
Hint 3: Include policy rules (ip rule show).
Hint 4: Highlight default route separately.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Routing | “TCP/IP Illustrated, Vol 1” | Ch. 8-9 |
| ARP | “TCP/IP Illustrated, Vol 1” | Ch. 4 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 days)
- Define outputs and parse a single tool.
- Produce a minimal report.
Phase 2: Core Functionality (3-5 days)
- Add remaining tools and interpretation logic.
- Implement stable formatting and summaries.
Phase 3: Polish & Edge Cases (2-3 days)
- Handle missing data and failure modes.
- Add thresholds and validation checks.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing format | Text vs JSON | JSON where available | More stable parsing |
| Output layout | Table vs sections | Sections | Readability for humans |
| Sampling | One-shot vs periodic | One-shot + optional loop | Predictable runtime |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate parsing | Parse fixed tool output samples |
| Integration Tests | Validate tool calls | Run against a lab host |
| Edge Case Tests | Handle failures | Missing tool, no permissions |
6.2 Critical Test Cases
- Reference run: Output matches golden transcript.
- Missing tool: Proper error message and partial report.
- Permission denied: Clear guidance for sudo or capabilities.
6.3 Test Data
Input: captured command output
Expected: normalized report with correct totals
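A minimal golden-transcript check, assuming a fixture at tests/golden.txt captured from a known-good run; the paths are placeholders:

$ ./routeexplore.sh > /tmp/actual.txt
$ diff -u tests/golden.txt /tmp/actual.txt && echo PASS || echo FAIL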
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong interface | Empty output | Verify interface names |
| Missing privileges | Permission errors | Use sudo or capabilities |
| Misparsed output | Wrong stats | Prefer JSON parsing |
7.2 Debugging Strategies
- Re-run each tool independently to compare raw output.
- Add a verbose mode that dumps raw data sources.
7.3 Performance Traps
- Avoid tight loops without sleep intervals.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add colored status markers.
- Export report to a file.
8.2 Intermediate Extensions
- Add JSON output mode.
- Add baseline comparison.
8.3 Advanced Extensions
- Add multi-host aggregation.
- Add alerting thresholds.
9. Real-World Connections
9.1 Industry Applications
- SRE runbooks and on-call diagnostics.
- Network operations monitoring.
9.2 Related Open Source Projects
- tcpdump / iproute2 / nftables
- mtr / iperf3
9.3 Interview Relevance
- Demonstrates evidence-based debugging and tool mastery.
10. Resources
10.1 Essential Reading
- Primary book listed in the main guide.
- Relevant RFCs and tool manuals.
10.2 Video Resources
- Conference talks on Linux networking and troubleshooting.