Project 1: Network Device Scanner (ARP Discovery Tool)

A CLI tool that discovers all devices on your local network by sending ARP requests and collecting replies.

Quick Reference

Attribute                           Value
Difficulty                          Level 1: Beginner
Time Estimate                       Weekend
Main Programming Language           Python
Alternative Programming Languages   C, Go, Rust
Coolness Level                      Level 3: Genuinely Clever
Business Potential                  2. The “Micro-SaaS / Pro Tool”
Prerequisites                       See concepts below
Key Topics                          Link Layer and LAN Behavior (Ethernet, Wi-Fi, ARP, Switching); IP Addressing, Subnetting, and Routing (Including NAT)

1. Learning Objectives

  1. Build and validate a CLI tool that discovers devices on your local network by sending ARP requests and collecting replies.
  2. Explain ARP request/reply behavior and verify it with a packet capture (for example, tcpdump -n -e arp).
  3. Handle edge cases (silent hosts, multiple interfaces, malformed replies) and produce reproducible results.

2. All Theory Needed (Per-Concept Breakdown)

Link Layer and LAN Behavior (Ethernet, Wi-Fi, ARP, Switching)

Fundamentals

The link layer is the realm of MAC addresses, frames, and local delivery. It is responsible for getting data from one device to another device on the same local network segment. Ethernet and Wi-Fi are the most common link layers in home and office networks. Ethernet uses switches that learn where MAC addresses live and forward frames to the correct port. Wi-Fi is a shared medium where devices contend for airtime and associate with an access point, which then bridges traffic to the wired LAN. ARP (Address Resolution Protocol) is the bridge between IP and MAC addresses, allowing an IP address to be mapped to a link-layer destination. The link layer defines the boundaries of broadcast domains, which strongly influences performance and security.

Deep Dive

The link layer is the place where “local” really means local. A switch builds a table of MAC-address-to-port mappings by observing the source MAC of incoming frames. When it sees a destination MAC it does not know, it floods the frame out all ports (except the one it arrived on). Over time, this learning behavior makes forwarding efficient, but it also creates risks like MAC table overflow or broadcast storms if the network is poorly segmented. Unlike routers, switches do not understand IP addresses; they only see MACs and EtherType values. This is why ARP is necessary. When a host wants to send to an IP on the same subnet, it broadcasts an ARP request asking who owns that IP, and the owner replies with its MAC. That reply is cached for a limited time. If a cached entry goes stale (for example, the IP address has moved to a different device), frames are sent to the wrong MAC until the entry expires and ARP runs again, so devices appear to “mysteriously” fail or slow down.
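The learn-then-forward behavior described above can be modeled in a few lines. This is a toy sketch, not how any particular switch is implemented: the LearningSwitch class, its receive method, and the short MAC strings are all illustrative names, and real switches also age out entries and keep a table per VLAN.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy model of MAC learning; class and method names are illustrative."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the set of ports the frame is forwarded out of."""
        # Learn: the source MAC is reachable through the ingress port.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown unicast: flood everywhere except ingress.
            return self.ports - {in_port}
        if out_port == in_port:
            # Destination lives on the ingress segment: nothing to forward.
            return set()
        return {out_port}


switch = LearningSwitch(ports=[1, 2, 3])
print(switch.receive(1, "aa:aa", "bb:bb"))  # unknown destination, flooded
print(switch.receive(2, "bb:bb", "aa:aa"))  # aa:aa was learned on port 1
```

The first frame is flooded because bb:bb has not been seen yet; the reply is unicast because the switch learned aa:aa from the first frame's source address.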

Wi-Fi adds complexity because the medium is shared and half-duplex. Devices must contend for airtime, and interference or poor signal quality can cause retransmissions that look like packet loss at higher layers. An access point is effectively a bridge between the Wi-Fi link and the wired Ethernet LAN. When you see a device “connected” but unable to reach the internet, the cause could be at the association layer (link) rather than at IP or DNS. Another subtlety is that Wi-Fi uses different frame formats and encryption (WPA2/WPA3) to protect frames on the air. This encryption is per-link, not end-to-end. Once a frame leaves the access point and enters the wired LAN, that Wi-Fi encryption no longer applies.

VLANs are a link-layer technique for segmentation. They allow multiple logical networks to share the same physical switch by tagging frames. In a home/office setting, VLANs are used to separate guest networks or IoT devices from trusted devices. This is a key security and performance tool, but it requires that all participating switches and access points handle VLAN tags correctly. Misconfigured VLAN tagging leads to symptoms like DHCP working on one SSID but not another, or devices that can access the router but not other devices.
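To make "tagging frames" concrete, here is a minimal sketch that reads the VLAN ID from an 802.1Q-tagged Ethernet frame. It assumes a single tag (no stacked tags) and a raw frame that starts at the destination MAC; the function name vlan_id is made up for this example.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_id(frame: bytes):
    """Return the VLAN ID of a tagged frame, or None if it is untagged."""
    if len(frame) < 16:
        return None
    # Bytes 12-13 hold the TPID; bytes 14-15 hold the tag control field.
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    # The low 12 bits of the tag control field are the VLAN ID.
    return tci & 0x0FFF


macs = b"\xff" * 6 + bytes.fromhex("001122334455")
tagged = macs + struct.pack("!HHH", TPID_8021Q, (3 << 13) | 20, 0x0806)
untagged = macs + struct.pack("!H", 0x0806) + b"\x00" * 28
print(vlan_id(tagged), vlan_id(untagged))
```

A scanner that parses raw frames has to account for this: in a tagged frame the real EtherType sits four bytes later than in an untagged one.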

Understanding link-layer behavior is essential for building scanners and sniffers. A packet sniffer on a wired switch port sees only the frames destined for that port unless you use port mirroring. On Wi-Fi, you might need monitor mode to see frames not destined for your device. This is a practical limitation that affects how you validate your tools. It is also why many network tools appear to “miss” traffic when run on the wrong interface or in the wrong capture mode.

Finally, link-layer security is not the same as network-layer security. ARP has no authentication, which is why ARP spoofing is possible. Wi-Fi encryption does not prevent a malicious device from joining the network if the passphrase is weak. If you understand the link layer, you can explain and detect these risks with concrete evidence, such as duplicate IP address warnings or sudden changes in ARP cache entries.
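One way to turn "sudden changes in ARP cache entries" into evidence is to compare two snapshots of the IP-to-MAC mapping. The sketch below assumes each snapshot is already a plain dict (how you collect it is up to you), and the function names are hypothetical. A changed or shared MAC is only a signal to investigate: DHCP reassignment and routers that answer for several IPs produce the same pattern legitimately.

```python
def changed_mappings(before, after):
    """Return {ip: (old_mac, new_mac)} for IPs whose MAC changed."""
    changes = {}
    for ip, new_mac in after.items():
        old_mac = before.get(ip)
        if old_mac is not None and old_mac.lower() != new_mac.lower():
            changes[ip] = (old_mac, new_mac)
    return changes


def shared_macs(snapshot):
    """Return {mac: [ips]} for MACs that answer for more than one IP."""
    by_mac = {}
    for ip, mac in snapshot.items():
        by_mac.setdefault(mac.lower(), []).append(ip)
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}


before = {"192.168.1.1": "aa:bb:cc:11:22:33", "192.168.1.12": "10:aa:bb:cc:dd:ee"}
after = {"192.168.1.1": "10:aa:bb:cc:dd:ee", "192.168.1.12": "10:AA:BB:CC:DD:EE"}
print(changed_mappings(before, after))
print(shared_macs(after))
```

In this made-up example the gateway's IP suddenly maps to the MAC of another host, which is the classic footprint of ARP spoofing.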

How this fits into the projects

Projects 1, 4, 12, 15, and 18 force you to understand ARP, frames, and capture limitations.

Definitions & key terms

  • MAC address: A link-layer identifier used for local delivery.
  • Frame: The unit of data at the link layer.
  • ARP: Protocol that maps IP addresses to MAC addresses on a LAN.
  • Broadcast domain: The scope of a link-layer broadcast.
  • VLAN: A logical segmentation of a link layer.

Mental model diagram

Device A        Switch          Device B
AA:AA           CAM Table       BB:BB
  |                |              |
  | ARP who-has?   | flood        |
  |--------------->|------------->|
  |                | ARP reply    |
  |<---------------|<-------------|
  | data frame     | unicast      |

How it works

  1. Host wants to send to IP in same subnet.
  2. Host broadcasts ARP request for target IP.
  3. Target replies with its MAC address.
  4. Host caches mapping and sends frames directly.

Invariants: ARP traffic does not cross routers.

Failure modes: ARP cache poisoning, switch flooding, weak Wi-Fi encryption.

Minimal concrete example

ARP exchange (text):
Request: Who has 192.168.1.50? Tell 192.168.1.10
Reply: 192.168.1.50 is at 00:11:22:33:44:55
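The same request can be written out as bytes. The sketch below packs an Ethernet header followed by an ARP payload laid out as in RFC 826 for Ethernet and IPv4. The function name build_arp_request and the sender MAC are assumptions for illustration, and the code only builds the frame; it does not send it.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Build a 42-byte 'who-has target_ip tell src_ip' Ethernet frame."""
    # Ethernet header: broadcast destination, our MAC, EtherType ARP.
    ethernet = struct.pack("!6s6sH", b"\xff" * 6, src_mac, ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        ARP_REQUEST,                  # opcode
        src_mac,                      # sender hardware address
        socket.inet_aton(src_ip),     # sender protocol address
        b"\x00" * 6,                  # target hardware address (unknown)
        socket.inet_aton(target_ip),  # target protocol address
    )
    return ethernet + arp


frame = build_arp_request(bytes.fromhex("0a1b2c3d4e5f"), "192.168.1.10", "192.168.1.50")
print(len(frame), frame.hex())
```

Note the two places the broadcast shows up: the Ethernet destination is ff:ff:ff:ff:ff:ff, while the ARP target hardware address is left as zeros because that is the value being asked for.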

Common misconceptions

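  • “Switches block broadcasts.” Switches forward broadcasts to all ports in the same VLAN except the one the frame arrived on.
  • “Wi-Fi is just wireless Ethernet.” Wi-Fi is a shared, half-duplex medium with per-link encryption, so its behavior and security differ from switched Ethernet.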

Check-your-understanding questions

  1. Why does ARP not work across subnets?
  2. What does a switch do with an unknown destination MAC?

Check-your-understanding answers

  1. ARP requests are link-local broadcasts; routers do not forward them.
  2. It floods the frame out all ports except the ingress port.

Real-world applications

  • Diagnosing why a device is visible but not reachable.
  • Segmenting IoT devices with VLANs and guest Wi-Fi.

Where you will apply it

  • Projects 1, 4, 12, 15, 18

References

  • “TCP/IP Illustrated, Vol 1” by Stevens - Ch. 2, 4
  • RFC 826 (ARP)

Key insights

The link layer is the truth of local delivery; everything else is built on it.

Summary

Mastering ARP, MACs, and switching gives you real control over your LAN behavior.

Homework/Exercises to practice the concept

  1. Map your own ARP cache and identify every device.
  2. Draw your network and mark the broadcast domain boundaries.

Solutions to the homework/exercises

  1. Use arp -a and compare with your router client list.
  2. Each router interface bounds a broadcast domain, and each VLAN is a separate broadcast domain as well.

IP Addressing, Subnetting, and Routing (Including NAT)

Fundamentals

IP is the addressing and routing system of the internet. IPv4 uses 32-bit addresses and relies on subnet masks (CIDR) to determine which destinations are local versus remote. Your device sends local traffic directly and remote traffic to the default gateway (your router). Routing is a hop-by-hop decision made by routers using the longest prefix match in the routing table. NAT (Network Address Translation) is common at the edge: it maps many private addresses to a single public address. Subnetting is the tool that partitions a network into smaller segments, which is critical for performance, security, and scaling in home/office environments.

Deep Dive

An IP address has two parts: the network prefix and the host identifier. CIDR notation (for example, /24) tells you how many bits belong to the prefix. The prefix defines the boundary of local delivery. If the destination IP shares the same prefix, the host uses ARP to find the destination MAC and sends directly. If not, it sends to the default gateway. This simple rule explains most “why can’t I reach this device” problems. Subnetting is the practice of choosing a prefix length that fits your network size and segmentation goals. Too large a subnet creates unnecessary broadcast traffic and larger failure domains. Too small a subnet causes address shortages or awkward routing rules.
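The local-versus-remote rule can be sketched with Python's standard ipaddress module. The function name next_hop and the addresses are illustrative; the sketch only decides which IP the host would ARP for, it does not send anything.

```python
import ipaddress


def next_hop(local_interface: str, gateway: str, destination: str) -> str:
    """Return the IP address the host must resolve with ARP to reach destination."""
    interface = ipaddress.ip_interface(local_interface)
    if ipaddress.ip_address(destination) in interface.network:
        return destination  # same prefix: deliver directly
    return gateway          # different prefix: hand the packet to the router


print(next_hop("192.168.1.10/24", "192.168.1.1", "192.168.1.50"))
print(next_hop("192.168.1.10/24", "192.168.1.1", "8.8.8.8"))
# With a /26 mask, .200 is no longer in the local prefix.
print(next_hop("192.168.1.10/26", "192.168.1.1", "192.168.1.200"))
```

The third call shows why a wrong subnet mask breaks reachability: the same destination flips from "direct" to "via gateway" when only the prefix length changes.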

Routing decisions are made by examining the destination IP and finding the most specific (longest) match in the routing table. A default route (0.0.0.0/0) catches everything else. Each router decrements the TTL field, which prevents routing loops from persisting forever. When TTL reaches zero, the router drops the packet and sends an ICMP Time Exceeded message. This is the mechanism that traceroute exploits. In home networks, your router usually has a handful of routes: the local LAN, perhaps a guest LAN, and a default route to your ISP. But in small office networks, you might add static routes to reach a lab subnet, or a VPN route to reach a remote office. Misconfigured routes cause asymmetric paths, which are hard to debug without captures.

NAT adds a stateful translation table at the edge. It replaces private source IPs and ports with a public IP and a chosen source port (often called PAT). When replies come back, the router uses this table to translate them back to the internal host. This is how many devices share one public address. NAT also breaks the original end-to-end model of the internet, which is why inbound connections typically require explicit port forwarding. Understanding NAT is critical for troubleshooting “I can browse the web but cannot host a server” issues, and for understanding why some peer-to-peer applications struggle. NAT is not a firewall, but it has similar observable effects because unsolicited inbound traffic is dropped by default when no translation table entry exists.
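A stripped-down model of the translation table helps show why replies get through and unsolicited traffic does not. This is a sketch under simplifying assumptions: the PatTable class is hypothetical, ports are handed out sequentially, and entries never expire, whereas real NAT implementations track the full connection tuple and time entries out.

```python
class PatTable:
    """Toy port address translation table (illustrative only)."""

    def __init__(self, public_ip, first_port=45001):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound_map = {}  # (private_ip, private_port) -> public_port
        self.inbound_map = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Translate an outgoing flow, creating a table entry if needed."""
        key = (private_ip, private_port)
        if key not in self.outbound_map:
            self.outbound_map[key] = self.next_port
            self.inbound_map[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.outbound_map[key])

    def inbound(self, public_port):
        """Translate a reply; None means no entry exists, so drop it."""
        return self.inbound_map.get(public_port)


nat = PatTable("203.0.113.5")
print(nat.outbound("192.168.1.50", 51515))  # first flow gets the first port
print(nat.inbound(45001))                   # reply maps back to the host
print(nat.inbound(50000))                   # unsolicited: no entry
```

Port forwarding amounts to adding an inbound entry by hand before any outbound packet has created one.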

IPv6 changes the addressing landscape. It uses 128-bit addresses, which removes the address scarcity that makes NAT necessary at the edge. Instead of ARP, IPv6 uses Neighbor Discovery (ND). IPv6 hosts often use Stateless Address Autoconfiguration (SLAAC) to build their own addresses from router advertisements. In practice, many home networks operate dual-stack (IPv4 and IPv6). This means troubleshooting can involve two parallel protocol stacks, with different failure modes. A device might fail over IPv4 but work over IPv6, or the reverse. Understanding IP addressing in both versions makes you far more effective at diagnosing real-world issues.

Finally, routing and addressing are where policy is enforced. VLANs map to IP subnets. Firewall rules frequently reference IP ranges. VPNs create new routes. If you know how to design and reason about addresses and routing tables, you can predict how traffic will flow before you even run a packet capture. That ability is what transforms you from a user of networks into an engineer of networks.

How this fits into the projects

Subnet math and routing logic are required for Projects 1-3, 9, 14, 18, and 19.

Definitions & key terms

  • CIDR: Prefix notation for networks (e.g., /24).
  • Default gateway: Router used for off-subnet traffic.
  • Longest prefix match: Routing rule that chooses the most specific route.
  • NAT/PAT: Translation of internal addresses to a public address with ports.

Mental model diagram

LAN 192.168.1.0/24       Router/NAT          Internet
Host 192.168.1.50  ->  [NAT table]  ->  203.0.113.5:45001

How it works

  1. Host checks if destination is in local subnet.
  2. If local, ARP and send directly; if not, send to gateway.
  3. Router chooses route via longest prefix match.
  4. NAT rewrites source IP/port for outbound flows.

Invariants: TTL always decrements at routers.

Failure modes: wrong subnet mask, missing default route, NAT table exhaustion.

Minimal concrete example

Routing table snippet:
192.168.1.0/24 -> eth0 (direct)
0.0.0.0/0 -> 192.168.1.1 (default)
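Longest prefix match over the two routes above can be sketched as follows. The ROUTES list mirrors the snippet, the lookup function is a hypothetical name, and the linear scan is chosen for clarity; real routers use specialized data structures for the same decision.

```python
import ipaddress

# (prefix, next hop) pairs mirroring the routing table snippet above.
ROUTES = [
    ("192.168.1.0/24", "eth0 (direct)"),
    ("0.0.0.0/0", "192.168.1.1 (default)"),
]


def lookup(destination, routes=ROUTES):
    """Return the next hop of the most specific matching route, or None."""
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None  # no route: the packet would be dropped
    # The longest prefix is the most specific match.
    return max(matches)[1]


print(lookup("192.168.1.50"))
print(lookup("8.8.8.8"))
```

Both routes match 192.168.1.50, but the /24 wins over the /0 because it is more specific.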

Common misconceptions

  • “Two devices with the same IP will work if they are on different switches.” They will conflict if on the same subnet.
  • “NAT protects me from all inbound attacks.” It does not replace a firewall.

Check-your-understanding questions

  1. Why does a /24 network have 254 usable addresses?
  2. What happens when no route matches a destination?

Check-your-understanding answers

  1. Two addresses are reserved: network and broadcast.
  2. The packet is dropped (and often an ICMP unreachable is sent).

Real-world applications

  • Planning guest and IoT subnets.
  • Diagnosing port forwarding and inbound access problems.

Where you will apply it

  • Projects 1-3, 9, 14, 18, 19

References

  • “TCP/IP Illustrated, Vol 1” by Stevens - Ch. 3
  • RFC 791 (IPv4), RFC 8200 (IPv6)

Key insights

Addressing and routing are the map; NAT and policy are the gatekeepers.

Summary

If you can compute subnets and read routing tables, you can predict traffic flow.

Homework/Exercises to practice the concept

  1. Divide 192.168.10.0/24 into four /26 subnets.
  2. Sketch a routing table for a network with a guest VLAN and a VPN.

Solutions to the homework/exercises

  1. /26 blocks at .0, .64, .128, .192.
  2. Include routes for each subnet plus a default route to the ISP.
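The first exercise can be checked with the standard ipaddress module. This is a quick verification sketch rather than part of the scanner; the variable names are arbitrary.

```python
import ipaddress

parent = ipaddress.ip_network("192.168.10.0/24")
subnets = list(parent.subnets(new_prefix=26))

for network in subnets:
    hosts = list(network.hosts())  # excludes network and broadcast addresses
    print(network, "hosts", hosts[0], "to", hosts[-1],
          "broadcast", network.broadcast_address)

# A /24 has 256 addresses, 254 of them usable by hosts.
print(parent.num_addresses, len(list(parent.hosts())))
```

The same hosts() call is a convenient way to generate the list of addresses a scanner should probe for a given prefix.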

3. Project Specification

3.1 What You Will Build

A CLI tool that discovers all devices on your local network by sending ARP requests and collecting replies.

Included:

  • CLI tool with clear output
  • Validation steps and logging
  • Documentation of assumptions

Excluded:

  • Production-grade performance tuning
  • Full security hardening

3.2 Functional Requirements

  1. Core function: Send an ARP request to every host address in the local subnet and collect the replies.
  2. Observable output: Print a table of IP address, MAC address, and vendor, in a stable order so runs can be compared with the Real World Outcome.
  3. Error handling: Handle timeouts, invalid inputs, and unreachable hosts gracefully.

3.3 Non-Functional Requirements

  • Performance: Complete typical tasks within a few seconds on a LAN.
  • Reliability: Fail safely and clearly on errors.
  • Usability: Provide concise CLI flags and helpful messages.

3.4 Example Usage / Output

You run the scanner and get a device table with IP, MAC, and vendor lookup:

$ sudo ./lan_scan
Scanning 192.168.1.0/24 on en0...

IP Address       MAC Address          Vendor
-----------------------------------------------------
192.168.1.1      aa:bb:cc:11:22:33    Netgear
192.168.1.12     10:aa:bb:cc:dd:ee    Apple
192.168.1.25     b8:27:eb:12:34:56    Raspberry Pi
192.168.1.80     44:55:66:77:88:99    Samsung

Found 4 devices in 2.1 seconds
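A table like this can be produced with fixed-width formatting. The sketch below is one possible approach: the function name format_table and the column widths (17 and 21 characters, with a 53-dash rule) are assumptions chosen to resemble the sample, and rows are sorted numerically by IP so repeated runs are comparable.

```python
import ipaddress


def format_table(devices):
    """Format (ip, mac, vendor) tuples as a fixed-width table, sorted by IP."""
    # Sort numerically so 192.168.1.12 comes before 192.168.1.80.
    rows = sorted(devices, key=lambda device: ipaddress.ip_address(device[0]))
    lines = [f"{'IP Address':<17}{'MAC Address':<21}Vendor", "-" * 53]
    for ip, mac, vendor in rows:
        lines.append(f"{ip:<17}{mac:<21}{vendor}")
    return "\n".join(lines)


devices = [
    ("192.168.1.80", "44:55:66:77:88:99", "Samsung"),
    ("192.168.1.1", "aa:bb:cc:11:22:33", "Netgear"),
    ("192.168.1.12", "10:aa:bb:cc:dd:ee", "Apple"),
]
print(format_table(devices))
```

Sorting on the parsed address rather than the string avoids the common mistake where "192.168.1.100" sorts before "192.168.1.20".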

3.5 Data Formats / Schemas / Protocols

Protocols: ARP (RFC 826) carried in Ethernet frames, plus IPv4 addressing and subnetting (RFC 791) to derive the range of addresses to scan.

3.6 Edge Cases

  • Target unreachable or timing out
  • Malformed or unexpected responses
  • Multiple interfaces or subnets

3.7 Real World Outcome

The expected result is the device table and summary line shown in 3.4 Example Usage / Output.

3.7.1 How to Run (Copy/Paste)

  • Build: make (or create a virtual environment as needed)
  • Run: sudo ./lan_scan (the command used in the example transcript; sending raw ARP frames needs elevated privileges)
  • Config: update any constants in a config file or flags
  • Working directory: project root

3.7.2 Golden Path Demo (Deterministic)

Scan a single known local target, such as the default gateway, and compare the reported MAC address with the entry shown by arp -a.

3.7.3 Exact Terminal Transcript

The transcript is the one shown in 3.4 Example Usage / Output.

4. Solution Architecture

4.1 High-Level Design

CLI Input -> Core Engine -> Output/Logs

4.2 Key Components

Component     Responsibility               Key Decisions
CLI Parser    Parse flags and inputs       Keep interface minimal
Core Engine   Protocol logic and state     Keep deterministic timing
Output/Logs   Present results and errors   Use consistent formatting

4.4 Data Structures (No Full Code)

Request:
  - target
  - protocol fields
Response:
  - status
  - timing
State:
  - retries
  - cache entries

5. Implementation Guide

5.1 Development Environment Setup

  • Ensure required tools (tcpdump, arp, and ip or ifconfig) are installed.
  • Use elevated privileges where raw sockets are required.

5.2 Project Structure

project/
  README.md
  docs/
  src/
  tests/
  data/

5.3 The Core Question You’re Answering

“How does my computer discover who is on the local network without any central directory?”

This project makes ARP visible and forces you to handle real broadcast behavior and timing.

5.4 Concepts You Must Understand First

  1. MAC addressing and OUIs
    • What do the first 3 bytes represent?
    • Book Reference: “Computer Networks” - Ch. 4
  2. ARP request/reply flow
    • Why does ARP use broadcast for requests and unicast for replies?
    • Book Reference: “TCP/IP Illustrated, Vol 1” - Ch. 4
  3. Subnet math
    • How do you compute host ranges from CIDR?
    • Book Reference: “TCP/IP Guide” - Ch. 10
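As a starting point for the OUI question, the sketch below splits a MAC address into its first three bytes and checks the locally administered bit of the first byte. The function names are hypothetical, and the one-entry vendor table is only a stand-in taken from this document's sample output; a real tool would load the IEEE registry instead.

```python
# Illustrative stand-in for a real OUI database (entry from the sample output).
SAMPLE_VENDORS = {"b8:27:eb": "Raspberry Pi"}


def split_octets(mac: str):
    """Normalize a MAC like B8-27-EB-12-34-56 into six lowercase octets."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return [octet.zfill(2) for octet in octets]


def oui(mac: str) -> str:
    """Return the first three bytes, which identify the assigning organization."""
    return ":".join(split_octets(mac)[:3])


def is_locally_administered(mac: str) -> bool:
    """True if the address was set by software rather than assigned by a vendor."""
    return bool(int(split_octets(mac)[0], 16) & 0x02)


def vendor(mac: str) -> str:
    if is_locally_administered(mac):
        return "Unknown (locally administered)"
    return SAMPLE_VENDORS.get(oui(mac), "Unknown")


print(oui("B8:27:EB:12:34:56"), vendor("B8:27:EB:12:34:56"))
print(vendor("02:00:00:00:00:01"))
```

The locally administered check matters in practice because devices that randomize their Wi-Fi MAC address will never match a vendor prefix.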

5.5 Questions to Guide Your Design

  1. Packet construction
    • What fields must be set for a valid ARP request?
    • How do you specify broadcast at the link layer?
  2. Scanning strategy
    • Sequential vs parallel probing: how do you trade speed and noise?
    • How long do you wait before declaring a host silent?

5.6 Thinking Exercise

Trace a single ARP discovery

Draw the exact frames exchanged when your laptop discovers the router’s MAC for the first time.

Questions to answer:

  • Which fields change between request and reply?
  • Where is the mapping cached after the exchange?

5.7 The Interview Questions They’ll Ask

  1. “Why does ARP not cross router boundaries?”
  2. “What is a broadcast domain, and how does it affect ARP?”
  3. “How would you detect ARP spoofing on a LAN?”
  4. “What happens when two devices claim the same IP?”
  5. “Why is ARP still used in IPv4 but not in IPv6?”

5.8 Hints in Layers

Hint 1: Start small

Probe a single known IP first and confirm you can see the reply with tcpdump.

Hint 2: Choose the right interface

Scanning on the wrong interface yields no replies. Verify with ip addr or ifconfig.

Hint 3: Pseudocode outline

- read local IP and subnet mask
- compute host range
- for each host: send ARP who-has
- wait for replies for N seconds
- parse replies into table
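Two steps of the outline, "compute host range" and "parse replies into table", can be sketched without touching the network. The function names are hypothetical, the parser assumes untagged Ethernet frames carrying IPv4 ARP, and sending and receiving are deliberately left out because they depend on your platform and library choice.

```python
import ipaddress
import socket
import struct


def hosts_to_probe(cidr: str):
    """List every usable host address in the interface's subnet."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(host) for host in network.hosts()]


def parse_arp_reply(frame: bytes):
    """Return (ip, mac) from an ARP reply frame, or None for anything else."""
    if len(frame) < 42:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0806:  # not ARP (VLAN-tagged frames also land here)
        return None
    htype, ptype, hlen, plen, opcode, sender_mac, sender_ip, _, _ = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    # Accept only Ethernet/IPv4 ARP with opcode 2 (reply).
    if (htype, ptype, hlen, plen, opcode) != (1, 0x0800, 6, 4, 2):
        return None
    mac = ":".join(f"{byte:02x}" for byte in sender_mac)
    return socket.inet_ntoa(sender_ip), mac


targets = hosts_to_probe("192.168.1.10/24")
print(len(targets), targets[0], targets[-1])
```

The mapping you want lives in the sender fields of the reply, not the target fields, which is a useful thing to confirm against a capture.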

Hint 4: Verification tools

Use arp -a and tcpdump -n -e arp to validate traffic.

5.9 Books That Will Help

Topic              Book                          Chapter
ARP protocol       “TCP/IP Illustrated, Vol 1”   Ch. 4
Ethernet framing   “Computer Networks”           Ch. 4
Subnet math        “TCP/IP Guide”                Ch. 10

5.10 Implementation Phases

  1. Establish core protocol I/O and a minimal success path.
  2. Add parsing, validation, and timeouts.
  3. Add logging, metrics, and polish for output.

5.11 Key Implementation Decisions

  • Which interface and capture point provides visibility?
  • What timeout and retry strategy balances speed and accuracy?
  • How will results be validated against reference tools?
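For the timeout question, one option is a single overall deadline rather than a per-host wait. The sketch below assumes a hypothetical recv_once(remaining_seconds) callback that returns an (ip, mac) pair or None when nothing arrived; the clock is injectable so the loop can be tested without real waiting.

```python
import time


def collect_replies(recv_once, expected, timeout=2.0, clock=time.monotonic):
    """Gather replies until every expected IP answered or the deadline passes.

    Returns (found, silent): a dict of ip -> mac and the set of IPs
    that never replied.
    """
    deadline = clock() + timeout
    found = {}
    pending = set(expected)
    while pending:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # deadline reached: whatever is pending is declared silent
        reply = recv_once(remaining)
        if reply is None:
            continue  # nothing this round; the loop re-checks the deadline
        ip, mac = reply
        found[ip] = mac
        pending.discard(ip)
    return found, pending


# Simulated receiver and clock so the example runs without a network.
replies = iter([("192.168.1.1", "aa:bb:cc:11:22:33"), None,
                ("192.168.1.12", "10:aa:bb:cc:dd:ee")])
ticks = iter(range(100))
found, silent = collect_replies(
    recv_once=lambda remaining: next(replies, None),
    expected=["192.168.1.1", "192.168.1.12", "192.168.1.50"],
    timeout=2.0,
    clock=lambda: next(ticks) * 0.5,
)
print(found, silent)
```

A monotonic clock is used by default so that wall-clock adjustments during a scan cannot shorten or extend the wait.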

6. Testing Strategy

6.1 Test Categories

  • Unit: parsing and validation logic
  • Integration: protocol exchange with a real device
  • System: full run with reference tools

6.2 Critical Test Cases

  • Successful request/response path
  • Timeout and retry behavior
  • Invalid or unexpected input handling

6.3 Test Data

  • Local gateway IP and a known reachable host
  • An unassigned IP address in the local subnet to trigger timeouts

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Problem 1: “No replies at all”

  • Why: Wrong interface or Wi-Fi client isolation.
  • Fix: Select the active interface; test on wired LAN if possible.
  • Quick test: tcpdump -n -e arp

Problem 2: “Only some devices show up”

  • Why: Devices are asleep or sit on another subnet.
  • Fix: Wake devices or limit scanning to local subnet.
  • Quick test: Ping a target then rescan.

7.2 Debugging Strategies

  • Capture traffic with tcpdump or Wireshark.
  • Compare against a known-good tool.
  • Log timestamps and retry logic.

7.3 Performance Traps

  • Excessive retries causing long runtimes.
  • Inefficient parsing under high packet rates.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add basic configuration flags for interface and timeout.
  • Improve output formatting and sorting.

8.2 Intermediate Extensions

  • Add caching or state persistence.
  • Add CSV or JSON output export.

8.3 Advanced Extensions

  • Add concurrency with careful rate limiting.
  • Add visualization or integration with a dashboard.

9. Real-World Connections

9.1 Industry Applications

  • Network diagnostics and troubleshooting
  • Security monitoring and policy enforcement

9.2 Related Tools

  • tcpdump / Wireshark
  • nmap / dnsmasq / unbound (as applicable)

9.3 Interview Relevance

  • Explaining protocol behavior
  • Diagnosing failures by layer

10. Resources

10.1 Essential Reading

Topic              Book                          Chapter
ARP protocol       “TCP/IP Illustrated, Vol 1”   Ch. 4
Ethernet framing   “Computer Networks”           Ch. 4
Subnet math        “TCP/IP Guide”                Ch. 10

10.2 Video Resources

  • Wireshark or tcpdump walkthroughs (search for recent tutorials)
  • Vendor or RFC explainers for the relevant protocol

10.3 Tools & Documentation

  • RFCs for the protocols used in this project
  • man tcpdump, man ip, man arp