Project 10: Network Log Analyzer

A log analyzer that extracts network-related events and correlates them over time.

Quick Reference

Attribute Value
Difficulty Level 2: Intermediate
Time Estimate 1 week
Main Programming Language Bash + awk
Alternative Programming Languages Python, Go
Coolness Level Level 2: Practical
Business Potential 3. Service & Support
Prerequisites Basic Linux CLI
Key Topics System Logging and Evidence Correlation, Packet Capture, BPF, and Observability

1. Learning Objectives

By completing this project, you will:

  1. Build the core tool described in the project and validate output against a golden transcript.
  2. Explain how the tool maps to the Linux networking layer model.
  3. Diagnose at least one real or simulated failure using the tool’s output.

2. All Theory Needed (Per-Concept Breakdown)

This section includes every concept required to implement this project successfully.

System Logging and Evidence Correlation

Fundamentals Logs are the system’s memory. Linux provides the kernel ring buffer (visible via dmesg) and a structured system journal (queryable via journalctl). Network issues often leave traces in both: link up/down events, driver resets, DHCP renewals, and firewall drops. The key skill is correlation — aligning events across sources and time to reconstruct what happened. This concept teaches you to treat logs as evidence, not noise, and to connect kernel events with user-space services.

Deep Dive The kernel ring buffer is a circular in-memory log of kernel messages. dmesg reads it, showing driver events, link changes, and low-level warnings. The systemd journal, on the other hand, stores structured logs from services and can persist across reboots. journalctl allows filtering by time, unit, priority, or message content. When diagnosing network issues, you often need both: dmesg tells you the hardware and kernel-level story, while journalctl tells you what NetworkManager, systemd-resolved, or firewall services decided to do.
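
As a minimal starting point, the two stories can be pulled side by side; the interface and unit names below are illustrative and vary by distribution:

# Kernel story: driver events, link changes, low-level warnings
sudo dmesg --time-format=iso | grep -iE 'eth0|link' | tail -n 20

# Service story: what NetworkManager decided to do
journalctl -u NetworkManager --since "1 hour ago" --no-pager | tail -n 20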

Time correlation is the hardest part. dmesg can show monotonic time or wall-clock time depending on options; journalctl can show wall time with precise timestamps. To build a timeline, normalize timestamps to a single format and then order events. This reveals causal chains: “link down at 03:45:12, driver reset at 03:45:13, DHCP renewed at 03:45:20.” Without this ordering, log lines look like unrelated noise.
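
A minimal sketch of that normalization, assuming NetworkManager is the service of interest: tag each source, put the ISO timestamp first, and let a lexical sort produce the timeline (ISO timestamps sort in time order):

{
  dmesg --time-format=iso | awk '{ts=$1; $1=""; print ts " [kernel] " $0}'
  journalctl -u NetworkManager -o short-iso --no-pager \
    | awk '!/^--/ {ts=$1; $1=""; print ts " [service]" $0}'
} | sort > timeline.txt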

The journal’s structured fields are powerful. You can filter by _SYSTEMD_UNIT to isolate a service, by PRIORITY for severity, or by _BOOT_ID to isolate a single boot session. This matters when diagnosing intermittent outages that span reboots. It also matters when firewall logs are noisy: you can query only kernel logs with journalctl -k, or search for DROP/REJECT strings. A disciplined approach is to start broad (all network-related messages), then narrow to the suspect service or time range.
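
A few examples of that broad-to-narrow approach; the unit names are placeholders for whatever manages networking on your system:

journalctl -k --since today | grep -E 'DROP|REJECT'           # kernel messages only, e.g. firewall log lines
journalctl _SYSTEMD_UNIT=systemd-resolved.service -p warning  # one unit, warnings and above
journalctl -b -1 -u NetworkManager                            # the previous boot session only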

Network logs often contain patterns. Link flaps (“link down” followed quickly by “link up”) indicate physical issues or driver instability. “Tx timeout” or “reset adapter” messages suggest hardware or driver faults. DHCP logs show lease acquisition and renewal times, which can explain transient loss of connectivity. Firewall logs show blocked packets, which can explain application-level timeouts. Learning these patterns turns log analysis from grep spam into reliable diagnosis.
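
A sketch of turning those patterns into a first-pass classifier; the regular expressions are illustrative, and real driver messages vary by NIC and driver:

dmesg --time-format=iso | awk '
  /Link is Down/              { print $1, "LINK_DOWN" }
  /Link is Up/                { print $1, "LINK_UP" }
  /Tx.*timeout|Reset adapter/ { print $1, "DRIVER_FAULT" }
  /DROP|REJECT/               { print $1, "FIREWALL_DROP" }
'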

Correlation across data sources is the final step. If tcpdump shows SYN packets arriving but journalctl shows firewall drops, you have direct evidence of policy blocking. If ss shows no listening socket and the journal shows the service failed to start, you have an application issue. Logs do not replace packet capture or socket state; they complement them. The goal is a multi-source narrative: “what happened, in what order, and why.”

How this fits into the projects

  • Network Log Analyzer (Project 10)
  • Real-Time Network Security Monitor (Project 15)

Definitions & key terms

  • Kernel ring buffer: In-memory log of kernel messages (dmesg).
  • Journal: Structured system logs managed by systemd.
  • Correlation: Aligning events across sources and time to build causality.

Mental model diagram

Kernel events (dmesg)  --+
Service logs (journal) --+-->  Timeline  -->  Diagnosis

How it works (step-by-step, invariants, failure modes)

  1. Collect kernel and service logs.
  2. Normalize timestamps.
  3. Filter by time and service.
  4. Correlate events into a timeline.

Invariants: kernel events precede service reactions; timestamps must be normalized. Failure modes: missing logs, log rotation, time drift.

Minimal concrete example Log transcript (simplified):

03:45:12 eth0: Link is Down
03:45:13 eth0: Reset adapter
03:45:20 DHCP: Bound to 192.168.1.100

Common misconceptions

  • “Logs are too noisy to be useful.” (Noise is reduced by filtering and correlation.)
  • “dmesg is enough.” (Service decisions live in journalctl.)

Check-your-understanding questions

  1. Why might dmesg timestamps differ from journalctl timestamps?
  2. What does a repeated link up/down pattern indicate?
  3. How do firewall logs help explain application timeouts?

Check-your-understanding answers

  1. dmesg can show monotonic time; journalctl shows wall time.
  2. Physical link issues or driver instability.
  3. Dropped packets cause connection failures even when routes exist.

Real-world applications

  • Root-cause analysis for outages, driver faults, and firewall blocks.

Where you’ll apply it Projects 10 and 15.

References

  • systemd journal documentation
  • Linux kernel logging overview

Key insights Logs are only useful when aligned in time and interpreted as a causal chain.

Summary You can now extract, filter, and correlate logs to explain network behavior.

Homework/Exercises to practice the concept

  • Build a timeline from kernel and journal logs for a known restart event.
  • Identify at least three patterns that indicate hardware vs configuration faults.

Solutions to the homework/exercises

  • Timeline shows kernel link down, service restart, DHCP renew, link up.
  • Hardware faults: reset adapter, Tx timeout; configuration faults: DNS changes, service restarts.

Packet Capture, BPF, and Observability

Fundamentals Packet capture is the closest thing you have to ground truth in networking: it reveals what actually traversed the interface, not what a tool inferred. tcpdump is the canonical CLI packet analyzer on Linux, and it relies on libpcap’s filter language (pcap-filter) to select which packets to capture. Filters are essential because capturing “everything” is noisy, expensive, and often unsafe. The skill here is translating a hypothesis into a filter: “show only SYN packets to port 443,” or “show DNS responses larger than 512 bytes.” When you can ask precise questions and capture precise evidence, you move from guesswork to proof.

Deep Dive Packet capture is powerful precisely because it bypasses abstraction. Tools like ss and ip summarize state, but they infer behavior from kernel structures. tcpdump captures actual packets and prints key fields: timestamps, addresses, ports, flags, sequence numbers, and lengths. Those fields are enough to reconstruct protocol behavior. A three-way handshake is visible as SYN, SYN-ACK, ACK. A reset is visible as RST. Loss or retransmission is visible as repeated sequence numbers or missing ACKs. In other words, packet capture is not just “packets,” it is narrative evidence.

Filters make capture usable. The libpcap language supports protocol qualifiers (tcp, udp, icmp), host and network selectors, port selectors, and even byte-level offsets. That means you can express questions like “show all TCP SYN packets from 203.0.113.9” or “show DNS responses with the TC bit set.” The filters run in the kernel, so they reduce overhead and keep captures focused. That is critical on busy servers, where unfiltered capture can drop packets or distort performance. Good operators always constrain scope: the smallest time window, the narrowest filter, and the minimal payload inspection needed to answer the question.
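
Some example filters for questions like these; the addresses, ports, and interface name are placeholders:

sudo tcpdump -i eth0 -n 'tcp dst port 443 and tcp[tcpflags] & tcp-syn != 0'       # SYN packets to port 443
sudo tcpdump -i eth0 -n 'src host 203.0.113.9 and tcp[tcpflags] & tcp-syn != 0'   # SYNs from one source
sudo tcpdump -i eth0 -n 'udp src port 53 and udp[10] & 0x02 != 0'                 # DNS answers with the TC bit set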

Interpreting output requires protocol literacy. TCP flags reveal connection lifecycle. Sequence and acknowledgment numbers show ordering and loss. Window sizes hint at flow control. UDP lacks a state machine, so you focus on port pairs and timing. ICMP messages often explain failures: “Destination Unreachable” or “Packet Too Big” are not noise — they are direct explanations from the network. If you see an incoming SYN in tcpdump but no SYN_RECV in ss, you know the packet was dropped before socket handling. That simple correlation often pinpoints firewall or routing errors in minutes.
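
A minimal correlation sketch, assuming a service that should be listening on port 8080 (a placeholder): watch the wire in one terminal and the socket table in another.

# Terminal 1: do SYNs for port 8080 actually arrive?
sudo tcpdump -i eth0 -n 'tcp dst port 8080 and tcp[tcpflags] & tcp-syn != 0'

# Terminal 2: does the kernel ever create a half-open socket, and is anything listening?
ss -tn state syn-recv '( sport = :8080 )'
ss -ltn '( sport = :8080 )'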

Packet capture also intersects with performance and privacy. On high-throughput links, capturing payloads can be expensive. Many teams capture headers only or truncate payloads to reduce risk. Some environments require explicit approval for packet capture because it can contain sensitive data. The right approach is to capture the minimum necessary data and to document why you captured it. This is part of professional network hygiene: evidence gathering should not become a liability.
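
For example, a bounded capture might look like this, with illustrative values: headers only via a small snap length, a hard packet count, and a capture file for offline analysis.

sudo tcpdump -i eth0 -n -s 96 -c 1000 -w evidence.pcap 'tcp port 443'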

Observability is broader than packets. When a link flaps, dmesg records the driver event. When a firewall drops traffic, journalctl may record it if logging is enabled. By correlating packet capture with logs and socket state, you can produce a complete causal chain: “carrier dropped, ARP failed, SYNs were dropped, no socket established.” This multi-source correlation is the difference between “it seems broken” and “here is the exact failure sequence.” That is the standard expected in production incident reports.

Finally, be aware of capture artifacts. Offloads can make captured checksums appear wrong, even when packets are valid. Promiscuous mode affects what you see. Capturing on a bridge or veth interface can show duplicate or transformed packets. These artifacts are not bugs; they are features of modern networking stacks. The expert skill is to recognize them, adjust the capture point or filter, and interpret results in context. This chapter trains that discipline so your packet evidence is both correct and actionable.
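
A quick way to check for the checksum-offload artifact (the interface name is a placeholder); recent tcpdump versions can also skip checksum verification entirely:

ethtool -k eth0 | grep -E 'tx-checksumming|segmentation-offload'
sudo tcpdump -i eth0 -n -K 'tcp port 443'    # -K / --dont-verify-checksums, where supported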

How this fits into the projects

  • Live Packet Capture Dashboard (Project 5)
  • Network Log Analyzer (Project 10)
  • Real-Time Network Security Monitor (Project 15)

Definitions & key terms

  • pcap filter: Expression language used by libpcap/tcpdump to select packets.
  • Capture scope: The time window and filter criteria that bound a capture.
  • Kernel ring buffer: In-memory log of kernel messages (dmesg).

Mental model diagram

Packet -> kernel -> tcpdump (filtered)
         |                 |
         |                 +-- evidence (flags, ports, timing)
         +-- logs (dmesg/journalctl)

How it works (step-by-step, invariants, failure modes)

  1. Apply BPF filter in kernel.
  2. Capture matching packets.
  3. Interpret headers and flags.
  4. Correlate with socket state and logs.

Invariants: filters are applied before capture; tcpdump output is time-ordered. Failure modes: capturing too much, missing packets due to filter mistakes.

Minimal concrete example Packet transcript (simplified):

12:00:01 IP 192.0.2.10.52341 > 198.51.100.20.443: Flags [S]
12:00:01 IP 198.51.100.20.443 > 192.0.2.10.52341: Flags [S.]
12:00:01 IP 192.0.2.10.52341 > 198.51.100.20.443: Flags [.]

Common misconceptions

  • “tcpdump shows everything.” (It only shows what you filter and what the NIC sees.)
  • “If tcpdump sees a packet, the app must see it.” (Firewall or routing can still drop it.)

Check-your-understanding questions

  1. Why is filtering in the kernel important?
  2. How can you tell a TCP handshake from tcpdump output?
  3. What kind of evidence would prove a firewall drop?

Check-your-understanding answers

  1. It reduces overhead and prevents excessive capture.
  2. SYN, SYN-ACK, ACK sequence appears in order.
  3. Incoming SYN seen in tcpdump, no SYN_RECV in ss, plus firewall log entry.

Real-world applications

  • Security investigations, performance debugging, and protocol verification.

Where you’ll apply it Projects 5, 10, 15.

References

  • tcpdump description (packet capture and filter expression).
  • pcap filter language (libpcap).
  • dmesg description (kernel ring buffer).
  • journalctl description (systemd journal).

Key insights Packets are the final authority; all other tools are interpretations.

Summary You can now capture targeted traffic and correlate it with logs and socket state to build evidence-backed diagnoses.

Homework/Exercises to practice the concept

  • Write three BPF filters for (a) DNS, (b) HTTPS, (c) TCP SYN only.
  • Sketch a timeline that aligns tcpdump output with socket states.

Solutions to the homework/exercises

  • DNS: udp port 53; HTTPS: tcp port 443; SYN only: tcp[tcpflags] & tcp-syn != 0.
  • A SYN observed should be followed by SYN_RECV in ss; if not, a drop occurred before socket handling.

3. Project Specification

3.1 What You Will Build

A log analyzer that extracts network-related events and correlates them over time.

3.2 Functional Requirements

  1. Core data collection: Gather the required system/network data reliably.
  2. Interpretation layer: Translate raw outputs into human-readable insights.
  3. Deterministic output: Produce stable, comparable results across runs.
  4. Error handling: Detect missing privileges, tools, or unsupported interfaces.

3.3 Non-Functional Requirements

  • Performance: Runs in under 5 seconds for baseline mode.
  • Reliability: Handles missing data sources gracefully.
  • Usability: Output is readable without post-processing.

3.4 Example Usage / Output

$ sudo ./netlog-analyzer.sh --last 24h

LINK EVENTS:
  03:45:12 eth0 LINK DOWN
  03:45:15 eth0 LINK UP 1000Mbps

DRIVER EVENTS:
  eth1: Tx Unit Hang detected

FIREWALL DROPS:
  2,595 blocked attempts (top ports: 22, 3389)

Timeline:
  03:45:12 link down -> 03:45:20 DHCP renewed

3.5 Data Formats / Schemas / Protocols

  • Input: CLI tool output, kernel state, or service logs.
  • Output: A structured report with sections and summarized metrics.

3.6 Edge Cases

  • Missing tool binaries or insufficient permissions.
  • Interfaces or hosts that return no data.
  • Transient states (link flaps, intermittent loss).

3.7 Real World Outcome

$ sudo ./netlog-analyzer.sh --last 24h

LINK EVENTS:
  03:45:12 eth0 LINK DOWN
  03:45:15 eth0 LINK UP 1000Mbps

DRIVER EVENTS:
  eth1: Tx Unit Hang detected

FIREWALL DROPS:
  2,595 blocked attempts (top ports: 22, 3389)

Timeline:
  03:45:12 link down -> 03:45:20 DHCP renewed

3.7.1 How to Run (Copy/Paste)

$ sudo ./netlog-analyzer.sh --last 24h

3.7.2 Golden Path Demo (Deterministic)

Run the tool against a known-good target and verify every section of the output matches the expected format.

3.7.3 Exact Terminal Transcript (CLI)

$ sudo ./netlog-analyzer.sh --last 24h

LINK EVENTS:
  03:45:12 eth0 LINK DOWN
  03:45:15 eth0 LINK UP 1000Mbps

DRIVER EVENTS:
  eth1: Tx Unit Hang detected

FIREWALL DROPS:
  2,595 blocked attempts (top ports: 22, 3389)

Timeline:
  03:45:12 link down -> 03:45:20 DHCP renewed

4. Solution Architecture

4.1 High-Level Design

[Collector] -> [Parser] -> [Analyzer] -> [Reporter]

4.2 Key Components

Component Responsibility Key Decisions
Collector Gather raw tool output Which tools to call and with what flags
Parser Normalize raw text/JSON Text vs JSON parsing strategy
Analyzer Compute insights Thresholds and heuristics
Reporter Format output Stable layout and readability

4.3 Data Structures (No Full Code)

  • InterfaceRecord: name, state, addresses, stats
  • RouteRecord: prefix, gateway, interface, metric
  • Observation: timestamp, source, severity, message

4.4 Algorithm Overview

Key Algorithm: Evidence Aggregation

  1. Collect raw outputs from tools.
  2. Parse into normalized records.
  3. Apply interpretation rules and thresholds.
  4. Render the final report.

Complexity Analysis:

  • Time: O(n) over number of records
  • Space: O(n) to hold parsed records
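
A minimal sketch of this flow in the project's main language; the sources, patterns, and unit names are placeholders rather than the reference implementation:

#!/usr/bin/env bash
# Collector -> Parser -> Analyzer -> Reporter; run as root or via sudo.
set -euo pipefail

collect() { dmesg --time-format=iso; journalctl -u NetworkManager -o short-iso --no-pager; }
parse()   { awk '!/^--/ {print $1 "\t" substr($0, index($0, " ") + 1)}'; }   # -> "timestamp<TAB>message"
analyze() { grep -E 'Link is (Up|Down)|Reset adapter|DHCP|DROP' || true; }   # placeholder interpretation rules
report()  { sort | awk -F'\t' '{printf "  %s  %s\n", $1, $2}'; }             # stable, readable layout

echo "TIMELINE:"
collect | parse | analyze | report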

5. Implementation Guide

5.1 Development Environment Setup

# Install required tools with your distro package manager
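# Example for Debian/Ubuntu (package names are illustrative; dmesg and
# journalctl ship with util-linux and systemd on most distributions):
sudo apt-get install -y tcpdump gawk
# Example for Fedora/RHEL:
sudo dnf install -y tcpdump gawk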

5.2 Project Structure

project-root/
├── src/
│   ├── main
│   ├── collectors/
│   └── formatters/
├── tests/
└── README.md

5.3 The Core Question You’re Answering

“What network-related events happened, and what do they mean?”

5.4 Concepts You Must Understand First

  1. Kernel ring buffer
    • dmesg reads kernel messages.
    • Book Reference: “How Linux Works” - Ch. 4
  2. systemd journal
    • journalctl queries system logs.
    • Book Reference: “How Linux Works” - Ch. 6
  3. Driver message patterns
    • Recognize link up/down and error patterns.
    • Book Reference: “Linux Kernel Development” - Ch. 17

5.5 Questions to Guide Your Design

  1. Which regex patterns best capture network events?
  2. How will you align timestamps across sources?
  3. How will you summarize without losing key events?

5.6 Thinking Exercise

Interpret:

eth0: Link is Down
eth0: Link is Up 1000 Mbps
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Question: What sequence occurred and why might it matter?

5.7 The Interview Questions They’ll Ask

  1. “Where do you look for NIC driver errors?”
  2. “What is the difference between dmesg and journalctl?”
  3. “How do you filter logs by service?”
  4. “What does a link flap indicate?”
  5. “How do you correlate log events?”

5.8 Hints in Layers

  • Hint 1: Use dmesg --time-format=iso.
  • Hint 2: Use journalctl -u NetworkManager.
  • Hint 3: Extract timestamps, then sort.
  • Hint 4: Group by interface and event type (a sketch combining the hints follows below).
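
A minimal sketch combining the four hints, assuming eth-style interface names and the NetworkManager unit; it counts events per interface and event type rather than printing every line:

{
  dmesg --time-format=iso
  journalctl -u NetworkManager -o short-iso --no-pager
} | awk '
  match($0, /eth[0-9]+/) {
    iface = substr($0, RSTART, RLENGTH)
    type = ""
    if ($0 ~ /Link is Down/)        type = "LINK_DOWN"
    else if ($0 ~ /Link is Up/)     type = "LINK_UP"
    else if ($0 ~ /DHCP|dhclient/)  type = "DHCP"
    if (type != "") count[iface " " type]++
  }
  END { for (k in count) print k, count[k] }
' | sort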

5.9 Books That Will Help

Topic Book Chapter
Linux logging “How Linux Works” Ch. 4, 6
Ops practice “The Practice of System and Network Administration” Ch. 21

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

  • Define outputs and parse a single tool.
  • Produce a minimal report.

Phase 2: Core Functionality (3-5 days)

  • Add remaining tools and interpretation logic.
  • Implement stable formatting and summaries.

Phase 3: Polish & Edge Cases (2-3 days)

  • Handle missing data and failure modes.
  • Add thresholds and validation checks.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Parsing format Text vs JSON JSON where available More stable parsing
Output layout Table vs sections Sections Readability for humans
Sampling One-shot vs periodic One-shot + optional loop Predictable runtime

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Validate parsing Parse fixed tool output samples
Integration Tests Validate tool calls Run against a lab host
Edge Case Tests Handle failures Missing tool, no permissions

6.2 Critical Test Cases

  1. Reference run: Output matches golden transcript.
  2. Missing tool: Proper error message and partial report.
  3. Permission denied: Clear guidance for sudo or capabilities.

6.3 Test Data

Input: captured command output
Expected: normalized report with correct totals
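
A minimal sketch of the reference-run test, assuming the analyzer can read recorded fixtures through a hypothetical NETLOG_INPUT_DIR variable and that the golden transcript lives under tests/:

#!/usr/bin/env bash
set -euo pipefail

# NETLOG_INPUT_DIR is a hypothetical hook for reading recorded logs instead of the live system.
NETLOG_INPUT_DIR=fixtures/sample-logs ./netlog-analyzer.sh --last 24h > /tmp/report.txt
# Compare against the golden transcript; any difference fails the test.
diff -u tests/golden-report.txt /tmp/report.txt && echo "PASS: output matches golden transcript"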

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Wrong interface Empty output Verify interface names
Missing privileges Permission errors Use sudo or capabilities
Misparsed output Wrong stats Prefer JSON parsing

7.2 Debugging Strategies

  • Re-run each tool independently to compare raw output.
  • Add a verbose mode that dumps raw data sources.

7.3 Performance Traps

  • Avoid tight loops without sleep intervals.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add colored status markers.
  • Export report to a file.

8.2 Intermediate Extensions

  • Add JSON output mode.
  • Add baseline comparison.

8.3 Advanced Extensions

  • Add multi-host aggregation.
  • Add alerting thresholds.

9. Real-World Connections

9.1 Industry Applications

  • SRE runbooks and on-call diagnostics.
  • Network operations monitoring.

9.2 Related Tools

  • tcpdump / iproute2 / nftables
  • mtr / iperf3

9.3 Interview Relevance

  • Demonstrates evidence-based debugging and tool mastery.

10. Resources

10.1 Essential Reading

  • Primary book listed in the main guide.
  • Relevant RFCs and tool manuals.

10.2 Video Resources

  • Conference talks on Linux networking and troubleshooting.