Project 7: Routing Table Explorer
A routing explorer that shows route selection, next hop, and neighbor cache state.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1 week |
| Main Programming Language | Bash |
| Alternative Programming Languages | Python, Go |
| Coolness Level | Level 3: Clever |
| Business Potential | Level 2: Micro-SaaS / Pro tool |
| Prerequisites | Basic Linux CLI |
| Key Topics | Interfaces, Link Layer, and Neighbor Discovery; IP Addressing, Routing, and Path Discovery |
1. Learning Objectives
By completing this project, you will:
- Build the core tool described in the project and validate output against a golden transcript.
- Explain how the tool maps to the Linux networking layer model.
- Diagnose at least one real or simulated failure using the tool’s output.
2. All Theory Needed (Per-Concept Breakdown)
This section includes every concept required to implement this project successfully.
Interfaces, Link Layer, and Neighbor Discovery
Fundamentals
Interfaces are the kernel’s handle for network connectivity. Each interface has a name, link state, MTU, MAC address (for Ethernet), and byte/error counters. When traffic stays on the local link, delivery is done at Layer 2, so IP addresses must be mapped to MAC addresses using ARP (IPv4) or Neighbor Discovery (IPv6). Linux exposes interface state and addressing with iproute2 (ip link, ip addr), physical capabilities with ethtool, and configuration ownership with nmcli when NetworkManager is in control. A key principle: “UP” only means the interface is administratively enabled; it does not guarantee a physical link. To know whether packets can truly move, you must verify carrier state, negotiated speed/duplex, and neighbor cache entries for the next hop. This chapter gives you the vocabulary and evidence sources that anchor every other networking tool.
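To ground these checks in commands, here is a minimal evidence-gathering sequence; eth0 is a placeholder interface name, and the exact output fields vary by driver and iproute2 version:

$ ip -br link show eth0                   # admin/oper state and MAC
$ ip -br addr show eth0                   # assigned IPv4/IPv6 addresses
$ cat /sys/class/net/eth0/carrier         # 1 = carrier present, 0 = no link
$ sudo ethtool eth0 | grep -E 'Speed|Duplex|Link detected'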
Deep Dive
A Linux interface is a convergence of hardware, driver, kernel state, and optional user-space control planes. The kernel tracks administrative state (UP/DOWN), operational state (e.g., LOWER_UP), MTU, and per-queue statistics. Administrative UP means the kernel will attempt to transmit; operational state indicates whether the link is actually usable. The driver determines whether a carrier is present and negotiates speed and duplex. This is why ethtool matters so much: it is the only tool that asks the driver what the hardware actually negotiated, which can reveal subtle failure modes such as auto-negotiation mismatches, disabled offloads, or a link that flaps under load. Many performance “mysteries” are rooted here, not in routing or DNS.
Layer 2 is where IP becomes deliverable. On IPv4, ARP is the protocol that resolves an IP address to a MAC address. The kernel maintains a neighbor cache; when it needs to transmit and no mapping exists, it broadcasts an ARP request and waits for a response. If the response is missing, packets may be queued or dropped. IPv6 uses Neighbor Discovery (NDP) instead of ARP, but the logic is similar: resolve a next-hop link-layer address before transmitting. The neighbor cache has states like REACHABLE, STALE, DELAY, and FAILED. These states explain intermittent outages: a STALE entry works until a probe fails; a FAILED entry means the kernel has given up and won’t transmit until a new resolution attempt succeeds.
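To see these states directly, dump the kernel's neighbor cache; the addresses and MACs below are illustrative:

$ ip neigh show
192.168.1.1 dev eth0 lladdr 00:11:22:33:44:55 REACHABLE
192.168.1.42 dev eth0 lladdr aa:bb:cc:dd:ee:ff STALE
10.0.0.7 dev eth0 FAILED
$ ip -6 neigh show                        # the IPv6 (NDP) equivalent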
Modern Linux is saturated with virtual interfaces. Bridges, veth pairs, VLANs, and tunnels are software constructs that behave like physical interfaces but represent logical connectivity. Containers and Kubernetes rely on veth pairs to connect isolated namespaces to bridges. That means the same “interface truth” applies in virtual environments: you still need to check link state, addresses, and neighbor resolution, but the physical meaning is different. A veth “carrier down” can mean a peer namespace isn’t up. A bridge can mask multiple endpoints behind a single MAC. The interpretation changes, but the tools do not.
Configuration ownership is another hidden complexity. On many systems, NetworkManager or systemd-networkd owns interface configuration, and manual changes can be overwritten. nmcli shows the manager’s view: which connection profiles exist, which interface they bind to, and which IPs and DNS servers are in effect. If ip addr and nmcli disagree, that is evidence that the kernel state and the manager’s intended state are diverging. That mismatch is often the cause of “it worked, then it reverted” incidents. The correct troubleshooting practice is to identify the owner, inspect both perspectives, and then decide whether you are diagnosing a kernel state problem (carrier, driver, ARP) or a control-plane problem (configuration drift).
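A quick way to compare the two perspectives, assuming NetworkManager is the active manager on the host; eth0 is a placeholder:

$ ip -br addr show eth0                                 # kernel state
$ nmcli -g GENERAL.STATE,IP4.ADDRESS device show eth0   # manager's view
$ nmcli connection show --active                        # which profile owns what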
Finally, interface metrics are not just numbers; they are diagnostics. RX/TX errors, dropped packets, or increasing queue drops indicate link issues or overload. Seeing these counters rise while higher-layer tools show intermittent loss is a strong signal that the fault is at or below the interface layer. In other words, before you chase a routing bug, you must prove the interface is physically and logically healthy. This is why interface and neighbor checks are always the first steps in a serious network investigation.
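The counters are one command away; what matters is whether errors and drops rise between runs, not any single snapshot. This output is illustrative:

$ ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
    123456789 456789       0       3       0     120
    TX:  bytes packets errors dropped carrier collsns
     98765432 345678       0       0       0       0
$ watch -n1 'ip -s link show eth0'        # watch the counters move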
MTU and tagging details add another dimension. If a VLAN tag or tunnel reduces effective MTU, packets larger than the path can be dropped or fragmented, which manifests as “some connections work, others hang.” Likewise, checksum and segmentation offloads can change how packet captures look: tcpdump may show incorrect checksums because the NIC computes them later. Knowing that offloads exist helps you interpret evidence correctly, so you do not misdiagnose a healthy link as faulty. The interface layer is where these physical and logical constraints converge, making it the foundation for everything else you will observe.
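To check whether offloads are active (and therefore whether capture checksums can be trusted), assuming a NIC that supports these features:

$ ethtool -k eth0 | grep -E 'checksumming|segmentation-offload'
rx-checksumming: on
tx-checksumming: on
tcp-segmentation-offload: on
generic-segmentation-offload: on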
How this fits into the projects
- Network Interface Inspector (Project 1)
- Routing Table Explorer (Project 7)
- Network Namespace Laboratory (Project 11)
Definitions & key terms
- MAC address: Link-layer hardware address used to deliver Ethernet frames.
- Carrier: Physical link presence as reported by the driver.
- Neighbor cache: Kernel table mapping IP addresses to link-layer addresses.
Mental model diagram
IP packet -> need next hop -> neighbor cache lookup
    |                             |
    |                             +-- hit  -> MAC known
    |                             +-- miss -> ARP/NDP query
    v
Ethernet frame -> driver -> NIC -> wire
How it works (step-by-step, invariants, failure modes)
- Interface administratively UP.
- Driver reports carrier and negotiated link.
- Kernel chooses next hop.
- Neighbor cache resolves MAC.
- Frame transmitted.
Invariants: MAC resolution is required for L2 delivery; carrier must be present to transmit.
Failure modes: link down, wrong VLAN, ARP/ND failure, manager overwriting manual configuration.
Minimal concrete example
Protocol transcript (ARP):
Host: Who has 192.168.1.1? Tell 192.168.1.100
Gateway: 192.168.1.1 is at 00:11:22:33:44:55
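You can watch this exchange live with tcpdump (root or CAP_NET_RAW required); eth0 and the addresses are placeholders, and the output is abridged:

$ sudo tcpdump -n -i eth0 arp
ARP, Request who-has 192.168.1.1 tell 192.168.1.100, length 28
ARP, Reply 192.168.1.1 is-at 00:11:22:33:44:55, length 28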
Common misconceptions
- “UP means the cable is fine.” (Carrier state matters.)
- “ARP is only for the default gateway.” (ARP is for any same-subnet destination.)
Check-your-understanding questions
- What is the difference between administrative state and carrier state?
- Why does a missing neighbor entry cause packets to be dropped?
- When would ethtool show no speed even if the interface is UP?
Check-your-understanding answers
- Admin state is a software flag; carrier is physical link presence.
- The kernel cannot build an Ethernet frame without a MAC.
- Virtual interfaces or link-down conditions often show no speed.
Real-world applications
- Diagnosing link flaps, ARP storms, and NIC driver issues.
Where you'll apply it
Projects 1, 7, and 11.
References
- ethtool description (driver/hardware settings).
- nmcli description (NetworkManager control and status).
Key insights
Physical truth (carrier, speed, ARP) is the foundation for every higher-layer fix.
Summary
Interfaces and neighbors determine whether packets can leave the host at all; validate them before blaming routes or DNS.
Homework/Exercises to practice the concept
- Draw the neighbor cache state transitions for a host that goes idle and then becomes active again.
- Label where carrier loss would appear in the data path.
Solutions to the homework/exercises
- Idle host moves to STALE, then probes on use; if no reply, becomes FAILED.
- Carrier loss is reported by the driver and visible before any routing decision.
IP Addressing, Routing, and Path Discovery
Fundamentals
Routing is the decision process that answers, “Where should this packet go next?” Linux chooses routes using longest-prefix match and attaches that choice to an egress interface and, if needed, a next-hop gateway. The ip tool exposes both the routing tables and the policy rules that choose which table to consult, and it can ask the kernel directly which route would be used for a given destination. Path discovery tools translate that decision into evidence: tracepath probes the path and reports Path MTU (PMTU), while mtr repeatedly probes hops to surface loss and latency patterns. Together, these tools let you move from assumptions (“the route is fine”) to proof (“the kernel will use this gateway, and hop 6 drops 30% of probes”). That shift from inference to evidence is the central skill in routing diagnostics.
Deep Dive
Linux routing is a policy engine, not a single static table. Before any prefix matching occurs, the kernel consults routing policy rules. These rules can select a routing table based on source address, incoming interface, firewall mark, or user-defined priority. Once a table is chosen, the kernel performs longest-prefix match: the most specific prefix wins, and metrics break ties among equally specific routes. The final selection yields an egress interface, a next hop (if the destination is not directly connected), and a preferred source IP. This explains many “route exists but traffic fails” scenarios: the route might exist in a table that is never selected for that traffic, or the preferred source IP might not be reachable on the chosen path.
The most important command in this domain is ip route get <destination>. It queries the kernel’s decision engine and returns exactly what would happen if a packet were sent: the chosen route, interface, and source address. It is your truth oracle because it reflects the kernel’s actual behavior, not your interpretation of the routing table. But a routing decision alone does not guarantee reachability. The next hop must still be reachable at Layer 2, and the path beyond the next hop must accept and forward the packet. That is why route diagnosis always includes neighbor resolution and path probing.
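A hedged example of the command in action; the destination and addresses are illustrative:

$ ip route get 203.0.113.10
203.0.113.10 via 192.168.1.1 dev eth0 src 192.168.1.100 uid 1000
    cache
$ ip route get 203.0.113.10 from 192.168.1.100   # pin a specific source address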
Path discovery tools provide that second half. tracepath sends probes with increasing TTL values and reports where ICMP responses are generated. It also discovers PMTU by observing “Packet Too Big” responses and tracking the smallest MTU on the path. mtr adds repetition, showing latency and loss over time rather than a single snapshot. This matters because routing problems often manifest as intermittent congestion or packet loss at specific hops. A static traceroute might miss a transient spike; a rolling mtr report reveals it. The pairing of ip route get (decision evidence) with mtr (path behavior) is a powerful diagnostic habit.
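Typical invocations; the flags shown are standard, and the target is a placeholder:

$ tracepath -n 203.0.113.10          # hop-by-hop probes plus PMTU discovery
$ mtr -n -r -c 100 203.0.113.10      # 100 rounds, numeric output, report mode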
PMTU is a classic foot-gun. The path MTU is the smallest MTU on the path between two hosts. If you send packets larger than the PMTU and fragmentation is disabled (as it often is for modern networks), routers will drop them and send ICMP “Packet Too Big.” If those ICMP messages are blocked, the sender never learns the correct size. The result is the infamous symptom: small packets work, large packets hang. Linux tools surface this in multiple ways: tracepath reports PMTU directly; tcpdump reveals ICMP errors; and iperf3 shows throughput collapse when MTU mismatches cause retransmissions. Understanding PMTU shifts your diagnosis from “the server is slow” to “the path is constrained.”
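A classic manual PMTU probe, assuming a 1500-byte local MTU: -M do forbids fragmentation, and 1472 = 1500 minus 28 bytes of IP and ICMP headers. The error line is illustrative:

$ ping -c 3 -M do -s 1472 203.0.113.10   # fits a 1500-byte MTU: should succeed
$ ping -c 3 -M do -s 1600 203.0.113.10   # exceeds it: should fail fast
ping: local error: message too long, mtu=1500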
Advanced routing problems often involve policy routing and multiple interfaces. VPNs, source-based routing, and multi-homed hosts can send different destinations through different uplinks. The kernel may choose a route based on source address or marks assigned by firewall rules. If you only look at the main table, you will miss the true behavior. The correct workflow is: inspect ip rule, identify which table is in use for the traffic in question, use ip route get with a source address when needed, and then validate with path probes. This discipline separates a correct, reproducible diagnosis from a lucky guess.
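That workflow as commands; the vpn table name, rule priorities, and addresses are placeholders:

$ ip rule show
0:      from all lookup local
100:    from 10.255.0.0/24 lookup vpn
32766:  from all lookup main
32767:  from all lookup default
$ ip route show table vpn                              # routes the main table hides
$ ip route get 203.0.113.10 from 10.255.0.5 iif tun0   # decision for that source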
Finally, remember that routing is only one layer. A correct route can still fail if neighbor resolution fails or if the next-hop router is down. That is why routing diagnosis must be layered: (1) What route does the kernel choose? (2) Can the next hop be resolved at L2? (3) What does the path beyond the next hop look like? The tools in this guide map directly to those questions, and the projects will force you to practice that sequence until it is reflexive.
How this fits into the projects
- Connectivity Diagnostic Suite (Project 2)
- Routing Table Explorer (Project 7)
- Bandwidth Monitor (Project 8)
Definitions & key terms
- Longest-prefix match: Route selection rule where the most specific prefix wins.
- PMTU: Path MTU, the smallest MTU along a path.
- Policy routing: Selecting a routing table based on metadata, not just destination.
Mental model diagram
Destination IP
      |
      v
Policy rules -> Routing table -> Best prefix -> Next hop + egress
                                                      |
                                                      v
                                  tracepath / mtr validate path and MTU
How it works (step-by-step, invariants, failure modes)
- Select routing table based on policy rules.
- Find best prefix match.
- Choose next hop and source IP.
- Resolve next hop at L2.
- Probe path for PMTU and latency.
Invariants: the most specific prefix wins; PMTU <= smallest link MTU on the path.
Failure modes: wrong table selected, blackhole route, ICMP blocked, PMTUD failure.
Minimal concrete example
Route lookup transcript:
Destination: 203.0.113.10
Selected: 0.0.0.0/0 via 192.168.1.1 dev eth0 src 192.168.1.100
Common misconceptions
- “The default route is used for everything.” (Only when no more specific prefix matches.)
- “Traceroute proves connectivity.” (It only proves ICMP/TTL handling, not application reachability.)
Check-your-understanding questions
- Why can a /24 override a /16 route?
- What does tracepath report that traceroute might not?
- How can policy routing send the same destination via different paths?
Check-your-understanding answers
- Longest-prefix match chooses the most specific route.
- tracepath reports the discovered Path MTU, which plain traceroute does not show.
- Different tables can be selected based on source or marks.
Real-world applications
- VPN split-tunneling, multi-homed servers, and performance debugging.
Where you'll apply it
Projects 2, 7, and 8.
References
- ip(8) description and routing functionality.
- tracepath description (path + MTU discovery).
- mtr description (combines traceroute and ping).
- PMTU discovery standards.
Key insights
Routing is a choice plus a constraint; you must verify both the chosen path and its MTU limits.
Summary
You can now predict route selection and validate the path end-to-end using tracepath and mtr.
Homework/Exercises to practice the concept
- Given a routing table with overlapping prefixes, predict which route is chosen for five destinations.
- Use a diagram to show how PMTU failures cause “large packet” hangs.
Solutions to the homework/exercises
- The most specific prefix always wins; ties go to lowest metric.
- Large packets drop when the path MTU is smaller; ICMP “Packet Too Big” feedback is required to adapt.
3. Project Specification
3.1 What You Will Build
A routing explorer that shows route selection, next hop, and neighbor cache state.
3.2 Functional Requirements
- Core data collection: Gather the required system/network data reliably.
- Interpretation layer: Translate raw outputs into human-readable insights.
- Deterministic output: Produce stable, comparable results across runs.
- Error handling: Detect missing privileges, tools, or unsupported interfaces.
3.3 Non-Functional Requirements
- Performance: Runs in under 5 seconds for baseline mode.
- Reliability: Handles missing data sources gracefully.
- Usability: Output is readable without post-processing.
3.4 Example Usage / Output
$ ./routeexplore.sh
ROUTING TABLE
default via 192.168.1.1 dev eth0 metric 100
10.0.0.0/8 via 10.255.0.1 dev tun0 metric 50
192.168.1.0/24 dev eth0 scope link
Route lookup: 8.8.8.8
selected: default via 192.168.1.1 dev eth0
source: 192.168.1.100
Neighbor cache:
192.168.1.1 -> 00:11:22:33:44:55 REACHABLE
3.5 Data Formats / Schemas / Protocols
- Input: CLI tool output, kernel state, or service logs.
- Output: A structured report with sections and summarized metrics.
3.6 Edge Cases
- Missing tool binaries or insufficient permissions.
- Interfaces or hosts that return no data.
- Transient states (link flaps, intermittent loss).
3.7 Real World Outcome
$ ./routeexplore.sh
ROUTING TABLE
default via 192.168.1.1 dev eth0 metric 100
10.0.0.0/8 via 10.255.0.1 dev tun0 metric 50
192.168.1.0/24 dev eth0 scope link
Route lookup: 8.8.8.8
selected: default via 192.168.1.1 dev eth0
source: 192.168.1.100
Neighbor cache:
192.168.1.1 -> 00:11:22:33:44:55 REACHABLE
3.7.1 How to Run (Copy/Paste)
$ ./routeexplore.sh [options]
3.7.2 Golden Path Demo (Deterministic)
Run the tool against a known-good target and verify every section of the output matches the expected format.
3.7.3 Exact Terminal Transcript
The transcript shown in 3.7 above is the golden output; capture it as your golden file and verify the tool reproduces it byte for byte.
4. Solution Architecture
4.1 High-Level Design
[Collector] -> [Parser] -> [Analyzer] -> [Reporter]
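A minimal Bash sketch of this pipeline, assuming iproute2 is installed; the parse and analyze bodies are illustrative stubs, not the full implementation:

#!/usr/bin/env bash
set -euo pipefail

collect() {               # Collector: raw kernel state
  ip route show
  ip rule show
  ip neigh show
}

parse() {                 # Parser: one normalized record per line
  awk 'NF'                # stub: drop blank lines
}

analyze() {               # Analyzer: apply interpretation rules
  awk '!/^unreachable/'   # stub: real heuristics and thresholds go here
}

report() {                # Reporter: stable, human-readable layout
  echo "ROUTING REPORT"
  cat
}

collect | parse | analyze | report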
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Collector | Gather raw tool output | Which tools to call and with what flags |
| Parser | Normalize raw text/JSON | Text vs JSON parsing strategy |
| Analyzer | Compute insights | Thresholds and heuristics |
| Reporter | Format output | Stable layout and readability |
4.3 Data Structures (No Full Code)
- InterfaceRecord: name, state, addresses, stats
- RouteRecord: prefix, gateway, interface, metric
- Observation: timestamp, source, severity, message
4.4 Algorithm Overview
Key Algorithm: Evidence Aggregation
- Collect raw outputs from tools.
- Parse into normalized records.
- Apply interpretation rules and thresholds.
- Render the final report.
Complexity Analysis:
- Time: O(n) over number of records
- Space: O(n) to hold parsed records
5. Implementation Guide
5.1 Development Environment Setup
# Install required tools with your distro package manager, for example (Debian/Ubuntu):
$ sudo apt install iproute2 ethtool mtr-tiny iputils-tracepath
5.2 Project Structure
project-root/
├── src/
│ ├── main
│ ├── collectors/
│ └── formatters/
├── tests/
└── README.md
5.3 The Core Question You’re Answering
“When I send a packet to X, how does the kernel decide where it goes?”
5.4 Concepts You Must Understand First
- Route selection
  - Longest-prefix match.
  - Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 8
- Gateway vs direct routes
  - When a next hop is required.
  - Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 8-9
- Neighbor cache
  - ARP/ND resolution.
  - Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 4
5.5 Questions to Guide Your Design
- How will you display source address selection?
- How will you show route choice explanations?
- How will you include policy routing tables?
5.6 Thinking Exercise
Given these routes:
10.0.0.0/8 via 10.0.0.1
10.0.0.0/16 via 10.0.0.2
Question: Which route is used for 10.0.5.10 and why?
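Once you have an answer, you can check it against the kernel on a host (or scratch network namespace) where these two routes are installed:

$ ip route get 10.0.5.10    # the kernel's choice should match your reasoning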
5.7 The Interview Questions They’ll Ask
- “What is longest-prefix match?”
- “How do you view the routing table in Linux?”
- “What does an incomplete ARP entry mean?”
- “How do you add a static route?”
- “How do you test a route decision?”
5.8 Hints in Layers
Hint 1: Use ip route get for decision evidence.
Hint 2: Show neighbor cache alongside routes.
Hint 3: Include policy rules (ip rule show).
Hint 4: Highlight default route separately.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Routing | “TCP/IP Illustrated, Vol 1” | Ch. 8-9 |
| ARP | “TCP/IP Illustrated, Vol 1” | Ch. 4 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 days)
- Define outputs and parse a single tool.
- Produce a minimal report.
Phase 2: Core Functionality (3-5 days)
- Add remaining tools and interpretation logic.
- Implement stable formatting and summaries.
Phase 3: Polish & Edge Cases (2-3 days)
- Handle missing data and failure modes.
- Add thresholds and validation checks.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing format | Text vs JSON | JSON where available | More stable parsing |
| Output layout | Table vs sections | Sections | Readability for humans |
| Sampling | One-shot vs periodic | One-shot + optional loop | Predictable runtime |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate parsing | Parse fixed tool output samples |
| Integration Tests | Validate tool calls | Run against a lab host |
| Edge Case Tests | Handle failures | Missing tool, no permissions |
6.2 Critical Test Cases
- Reference run: Output matches golden transcript.
- Missing tool: Proper error message and partial report.
- Permission denied: Clear guidance for sudo or capabilities.
6.3 Test Data
Input: captured command output
Expected: normalized report with correct totals
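A minimal golden-transcript check, assuming a fixture at tests/golden.txt captured from a known-good run; the paths are placeholders:

$ ./routeexplore.sh > /tmp/actual.txt
$ diff -u tests/golden.txt /tmp/actual.txt && echo PASS || echo FAIL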
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong interface | Empty output | Verify interface names |
| Missing privileges | Permission errors | Use sudo or capabilities |
| Misparsed output | Wrong stats | Prefer JSON parsing |
7.2 Debugging Strategies
- Re-run each tool independently to compare raw output.
- Add a verbose mode that dumps raw data sources.
7.3 Performance Traps
- Avoid tight loops without sleep intervals.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add colored status markers.
- Export report to a file.
8.2 Intermediate Extensions
- Add JSON output mode.
- Add baseline comparison.
8.3 Advanced Extensions
- Add multi-host aggregation.
- Add alerting thresholds.
9. Real-World Connections
9.1 Industry Applications
- SRE runbooks and on-call diagnostics.
- Network operations monitoring.
9.2 Related Open Source Projects
- tcpdump / iproute2 / nftables
- mtr / iperf3
9.3 Interview Relevance
- Demonstrates evidence-based debugging and tool mastery.
10. Resources
10.1 Essential Reading
- Primary book listed in the main guide.
- Relevant RFCs and tool manuals.
10.2 Video Resources
- Conference talks on Linux networking and troubleshooting.