Project 12: Traffic Control Simulator

A simulator that applies network profiles (latency, loss, bandwidth) to an interface.

Quick Reference

Attribute Value
Difficulty Level 4: Expert
Time Estimate 1 week
Main Programming Language Bash
Alternative Programming Languages Python, Go
Coolness Level Level 4: Hardcore
Business Potential 3. Service & Support
Prerequisites Basic Linux CLI
Key Topics Traffic Control and Performance Measurement

1. Learning Objectives

By completing this project, you will:

  1. Build the core tool described in the project and validate output against a golden transcript.
  2. Explain how the tool maps to the Linux networking layer model.
  3. Diagnose at least one real or simulated failure using the tool’s output.

2. All Theory Needed (Per-Concept Breakdown)

This section includes every concept required to implement this project successfully.

Traffic Control and Performance Measurement

Fundamentals Traffic control (tc) is the Linux subsystem that shapes how packets leave an interface. It operates on queues: packets are enqueued into a queueing discipline (qdisc), scheduled for transmission, and optionally classified into classes with different rates or priorities. netem is a qdisc that simulates network impairments like delay, jitter, loss, duplication, and reordering. Measurement completes the loop: iperf3 provides controlled throughput tests for TCP and UDP so you can quantify the effect of shaping or diagnose bottlenecks. The essential idea is simple: you cannot optimize or validate performance without both a way to measure it and a way to change it.
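
As a minimal sketch of that loop (assuming an interface named eth0 and root privileges):

# add 100ms of egress delay with netem
tc qdisc add dev eth0 root netem delay 100ms
# inspect what is attached
tc qdisc show dev eth0
# remove the shaping, restoring the default qdisc
tc qdisc del dev eth0 root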

Deep Dive Performance issues are often reported as feelings: “the network is slow.” Traffic control gives you a concrete model. When the kernel wants to transmit a packet, it is placed into a qdisc, which decides when and how that packet leaves. A classless qdisc like pfifo is a simple queue; a classful qdisc like HTB allows you to carve bandwidth into classes, reserve minimum rates, and cap maximums. Filters direct packets into those classes based on IPs, ports, or other fields. This is a programmable version of the fairness and contention that occurs on real networks, and it is the foundation of QoS on Linux.
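
A sketch of that carving (eth0 and the rates are illustrative): an HTB root with two classes and a u32 filter that steers SSH into the faster class.

# classful shaping: unclassified traffic falls into class 1:20
tc qdisc add dev eth0 root handle 1: htb default 20
# interactive class: 8 Mbit/s guaranteed, may borrow up to 10 Mbit/s
tc class add dev eth0 parent 1: classid 1:10 htb rate 8mbit ceil 10mbit
# bulk class: 2 Mbit/s guaranteed
tc class add dev eth0 parent 1: classid 1:20 htb rate 2mbit ceil 10mbit
# filter: SSH (destination port 22) goes to the interactive class
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dport 22 0xffff flowid 1:10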

Queue behavior determines latency and throughput. If the queue grows faster than it drains, latency increases and drops may occur when buffers fill. tc lets you control queue size and rate, which means you can intentionally shape throughput and latency. You can prioritize SSH over bulk transfers, cap noisy neighbors, or simulate congested links. The key is that shaping is per-interface; you must target the correct interface and direction, or you will affect the wrong traffic.

netem adds realism. It can introduce a fixed delay, a distribution of jitter, random loss, or packet reordering. This is not just for testing; it is essential for understanding how applications behave under real-world conditions. A web app may seem fine on a LAN but fail under 200ms latency and 2% loss. netem lets you reproduce that environment in a lab. However, netem affects egress by default. Ingress shaping requires an Intermediate Functional Block (IFB) device to redirect incoming traffic to a controllable queue. This asymmetry is often the source of confusion: you apply netem to eth0 and wonder why download traffic is unaffected. The fix is to shape the correct direction explicitly.
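
A sketch of the IFB redirect (device and module names are the ifb defaults; requires root):

# create and raise the IFB device
modprobe ifb numifbs=1
ip link set dev ifb0 up
# attach an ingress qdisc to eth0 and mirror incoming packets to ifb0
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    action mirred egress redirect dev ifb0
# shape "ingress" by shaping ifb0's egress
tc qdisc add dev ifb0 root netem delay 100ms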

Measurement provides truth. iperf3 runs a server and client to generate controlled traffic. TCP tests reveal achievable throughput and retransmissions; UDP tests reveal loss and jitter at a target bitrate. Because iperf3 requires two endpoints, it reinforces an important principle: throughput is a property of a path, not a single host. Combine iperf3 with interface counters (/proc/net/dev or ip -s link) and you can reconcile “what the test says” with “what the link did,” which helps you localize bottlenecks. A low iperf3 result with low interface utilization suggests a constraint along the path; high CPU usage combined with local drops suggests a host-side limit.
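
Concretely (192.0.2.10 is a placeholder server address):

# on the server endpoint
iperf3 -s
# TCP test from the client: throughput and retransmissions
iperf3 -c 192.0.2.10 -t 10
# UDP test at a 5 Mbit/s target: loss and jitter
iperf3 -c 192.0.2.10 -u -b 5M
# reconcile with interface counters on either endpoint
ip -s link show eth0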

Disciplined experimentation is what makes tc valuable. Establish a baseline with no shaping. Change one variable at a time (add 100ms delay, then add 2% loss, then add a 5 Mbps cap). Measure after each change and document results. This is how you build a performance model that predicts behavior rather than reacting to anecdotes. The Traffic Control Simulator project forces you to practice this experimental method.
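
One way to script that discipline (SERVER and eth0 are placeholders; run as root):

SERVER=192.0.2.10
iperf3 -c "$SERVER" -J > baseline.json     # baseline: no shaping
tc qdisc add dev eth0 root netem delay 100ms
iperf3 -c "$SERVER" -J > delay.json        # one change: +100ms delay
tc qdisc change dev eth0 root netem delay 100ms loss 2%
iperf3 -c "$SERVER" -J > delay-loss.json   # next change: +2% loss
tc qdisc del dev eth0 root                 # return to baseline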

Finally, remember operational safety. tc can accidentally cut off your own access if you shape the wrong interface or apply too strict a profile. Always keep a rollback command ready and test in a lab environment before applying in production. This is not just caution; it is part of professional performance engineering.
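
A defensive pattern worth sketching (assumes a root shell and eth0): arm the rollback before applying the profile, so a lost session still recovers.

# schedule an automatic rollback in 5 minutes, before shaping anything
( sleep 300 && tc qdisc del dev eth0 root ) &
ROLLBACK_PID=$!
# apply the risky profile
tc qdisc add dev eth0 root netem delay 200ms loss 2%
# if access survives, cancel the timer and clean up deliberately
kill "$ROLLBACK_PID"
tc qdisc del dev eth0 root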

How this fits into the projects

  • Bandwidth Monitor & Performance Tester (Project 8)
  • Traffic Control Simulator (Project 12)

Definitions & key terms

  • qdisc: Queueing discipline that controls packet scheduling on an interface.
  • netem: qdisc that emulates delay, loss, jitter, and reordering.
  • Throughput: Amount of data transferred per unit time.

Mental model diagram

Packets -> root qdisc -> filter -> class -> dequeue -> NIC
                                      \-> netem delay/loss

How it works (step-by-step, invariants, failure modes)

  1. Packet is enqueued into the root qdisc.
  2. A filter assigns it to a class (if the qdisc is classful).
  3. The qdisc schedules the dequeue according to its policy.
  4. netem, if attached, adds delay, loss, or reordering.
  5. Packet is transmitted; measure the result with iperf3.

Invariants: a qdisc is attached per interface; netem affects egress only (ingress needs IFB).
Failure modes: shaping the wrong interface; forgetting to clear rules afterward.

Minimal concrete example (simplified performance transcript):

Baseline: 940 Mbps TCP
After netem 100ms delay: 85 Mbps TCP
After netem 2% loss: TCP retransmits spike

Common misconceptions

  • “tc changes the whole network.” (It changes traffic on a specific interface.)
  • “UDP throughput equals link capacity.” (Loss increases if you exceed capacity.)

Check-your-understanding questions

  1. Why does netem only affect egress by default?
  2. What does iperf3 require before a test can run?
  3. How do you distinguish bandwidth limit vs latency limit?

Check-your-understanding answers

  1. qdiscs attach to egress queues; ingress requires IFB.
  2. A server and a client.
  3. Bandwidth limit shows a flat cap; latency limit shows throughput collapse with higher RTT.

Real-world applications

  • Capacity testing, QoS validation, and resilience testing for mobile users.

Where you’ll apply it: Projects 8 and 12.

References

  • tc(8) man page: qdiscs, classes, and filters.
  • tc-netem(8) man page: delay, loss, duplication, and reordering.
  • iperf3 documentation: TCP/UDP/SCTP throughput tests in a client/server model.

Key insights: You cannot optimize what you cannot measure; tc and iperf3 give you both control and measurement.

Summary: You can now shape traffic and measure the effects, which is essential for realistic performance debugging.

Homework/Exercises to practice the concept

  • Design a “4G poor” profile with target bandwidth, latency, and loss.
  • Describe how you would verify that the profile was applied correctly.

Solutions to the homework/exercises

  • Example: 5 Mbps, 100ms latency, 2% loss, jitter 50ms.
  • Verify with ping RTT, iperf3 throughput, and tc qdisc show output (see the sketch below).
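
One way to express that profile with tc (eth0 and 192.0.2.10 are placeholders; run as root):

# cap bandwidth with HTB, then layer netem impairments under the class
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 5mbit
tc qdisc add dev eth0 parent 1:10 handle 10: netem delay 100ms 50ms loss 2%
# verify: rules, latency, throughput
tc qdisc show dev eth0
ping -c 10 192.0.2.10
iperf3 -c 192.0.2.10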

3. Project Specification

3.1 What You Will Build

A simulator that applies network profiles (latency, loss, bandwidth) to an interface.

3.2 Functional Requirements

  1. Core data collection: Gather the required system/network data reliably.
  2. Interpretation layer: Translate raw outputs into human-readable insights.
  3. Deterministic output: Produce stable, comparable results across runs.
  4. Error handling: Detect missing privileges, tools, or unsupported interfaces.

3.3 Non-Functional Requirements

  • Performance: Runs in under 5 seconds for baseline mode.
  • Reliability: Handles missing data sources gracefully.
  • Usability: Output is readable without post-processing.

3.4 Example Usage / Output

$ sudo ./tc-sim.sh apply 4g-poor eth0

Applied profile: 5 Mbps, 100ms, 2% loss

Validation:
  ping RTT ~100-150ms
  iperf3 throughput ~5 Mbps

3.5 Data Formats / Schemas / Protocols

  • Input: CLI tool output, kernel state, or service logs.
  • Output: A structured report with sections and summarized metrics.

3.6 Edge Cases

  • Missing tool binaries or insufficient permissions.
  • Interfaces or hosts that return no data.
  • Transient states (link flaps, intermittent loss).

3.7 Real World Outcome

The outcome is externally observable: after applying a profile, ping RTT rises by the configured delay and iperf3 settles near the configured cap. The exact transcript appears in 3.7.3.

3.7.1 How to Run (Copy/Paste)

$ sudo ./tc-sim.sh apply <profile> <interface>   # e.g. apply 4g-poor eth0

3.7.2 Golden Path Demo (Deterministic)

Run the tool against a known-good target and verify every section of the output matches the expected format.

3.7.3 Exact Terminal Transcript (CLI)

$ sudo ./tc-sim.sh apply 4g-poor eth0

Applied profile: 5 Mbps, 100ms, 2% loss

Validation:
  ping RTT ~100-150ms
  iperf3 throughput ~5 Mbps

4. Solution Architecture

4.1 High-Level Design

[Collector] -> [Parser] -> [Analyzer] -> [Reporter]

4.2 Key Components

Component Responsibility Key Decisions
Collector Gather raw tool output Which tools to call and with what flags
Parser Normalize raw text/JSON Text vs JSON parsing strategy
Analyzer Compute insights Thresholds and heuristics
Reporter Format output Stable layout and readability

4.3 Data Structures (No Full Code)

  • ProfileRecord: name, rate, delay, jitter, loss
  • InterfaceRecord: name, state, addresses, stats
  • RouteRecord: prefix, gateway, interface, metric
  • Observation: timestamp, source, severity, message

4.4 Algorithm Overview

Key Algorithm: Evidence Aggregation

  1. Collect raw outputs from tools.
  2. Parse into normalized records.
  3. Apply interpretation rules and thresholds.
  4. Render the final report.

Complexity Analysis:

  • Time: O(n) over number of records
  • Space: O(n) to hold parsed records
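
A minimal Bash sketch of that pipeline (the threshold and wording are illustrative):

#!/usr/bin/env bash
# collect: raw qdisc statistics for the target interface
raw=$(tc -s qdisc show dev "${1:-eth0}")
# parse: extract the qdisc drop counter
drops=$(grep -oP 'dropped \K[0-9]+' <<< "$raw" | head -1)
# analyze + report: apply a simple threshold rule
if [ "${drops:-0}" -gt 0 ]; then
    echo "WARN: $drops packets dropped by qdisc (queue may be overflowing)"
else
    echo "OK: no qdisc drops observed"
fi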

5. Implementation Guide

5.1 Development Environment Setup

# Install required tools with your distro package manager
# e.g. Debian/Ubuntu (package names may differ on other distros):
sudo apt install iproute2 iperf3

5.2 Project Structure

project-root/
├── src/
│   ├── main
│   ├── collectors/
│   └── formatters/
├── tests/
└── README.md

5.3 The Core Question You’re Answering

“How do I intentionally simulate poor network conditions on Linux?”

5.4 Concepts You Must Understand First

  1. Qdiscs and classes
    • Queueing model.
  2. netem impairment
    • Delay, loss, reorder.
  3. Egress vs ingress
    • Why shaping is mostly outbound.

5.5 Questions to Guide Your Design

  1. How will you ensure rules can be removed safely?
  2. How will you combine rate limiting and delay?
  3. What profiles best represent real-world networks?

5.6 Thinking Exercise

Design a “satellite” profile: high latency, moderate bandwidth, low loss. Explain expected app behavior.

5.7 The Interview Questions They’ll Ask

  1. “What is a qdisc?”
  2. “How do you add latency on Linux?”
  3. “Why is ingress shaping harder?”
  4. “What does netem simulate?”
  5. “How would you clean up tc rules?”

5.8 Hints in Layers

Hint 1: Start with a single netem qdisc.
Hint 2: Add HTB for bandwidth limits.
Hint 3: Validate with ping and iperf3.
Hint 4: Always provide a clear reset command.

5.9 Books That Will Help

Topic Book Chapter
Traffic control LARTC HOWTO Full
Performance “Systems Performance” Ch. 10

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

  • Define outputs and parse a single tool.
  • Produce a minimal report.

Phase 2: Core Functionality (3-5 days)

  • Add remaining tools and interpretation logic.
  • Implement stable formatting and summaries.

Phase 3: Polish & Edge Cases (2-3 days)

  • Handle missing data and failure modes.
  • Add thresholds and validation checks.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Parsing format Text vs JSON JSON where available More stable parsing
Output layout Table vs sections Sections Readability for humans
Sampling One-shot vs periodic One-shot + optional loop Predictable runtime
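
The JSON recommendation is practical because iproute2 emits JSON natively; a sketch with jq (assumes jq is installed and field names as emitted by recent iproute2):

# stable: parse structured output
ip -j -s link show eth0 | jq '.[0].stats64.rx.bytes'
# fragile: scrape columns from the text layout
ip -s link show eth0 | awk '/RX:/{getline; print $1; exit}'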

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Validate parsing Parse fixed tool output samples
Integration Tests Validate tool calls Run against a lab host
Edge Case Tests Handle failures Missing tool, no permissions

6.2 Critical Test Cases

  1. Reference run: Output matches golden transcript.
  2. Missing tool: Proper error message and partial report.
  3. Permission denied: Clear guidance for sudo or capabilities.

6.3 Test Data

Input: captured command output
Expected: normalized report with correct totals
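
For the reference run in 6.2, the golden-transcript check can be a single diff (paths are hypothetical):

# compare a fresh run against the stored golden transcript
./tc-sim.sh apply 4g-poor eth0 > /tmp/actual.txt 2>&1
diff -u tests/golden/apply-4g-poor.txt /tmp/actual.txt && echo PASS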

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Wrong interface Empty output Verify interface names
Missing privileges Permission errors Use sudo or capabilities
Misparsed output Wrong stats Prefer JSON parsing

7.2 Debugging Strategies

  • Re-run each tool independently to compare raw output.
  • Add a verbose mode that dumps raw data sources.

7.3 Performance Traps

  • Avoid tight loops without sleep intervals.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add colored status markers.
  • Export report to a file.

8.2 Intermediate Extensions

  • Add JSON output mode.
  • Add baseline comparison.

8.3 Advanced Extensions

  • Add multi-host aggregation.
  • Add alerting thresholds.

9. Real-World Connections

9.1 Industry Applications

  • SRE runbooks and on-call diagnostics.
  • Network operations monitoring.

9.2 Related Tools

  • tcpdump / iproute2 / nftables
  • mtr / iperf3

9.3 Interview Relevance

  • Demonstrates evidence-based debugging and tool mastery.

10. Resources

10.1 Essential Reading

  • Primary book listed in the main guide.
  • Relevant RFCs and tool manuals.

10.2 Video Resources

  • Conference talks on Linux networking and troubleshooting.