Project 12: Traffic Control Simulator

A simulator that applies network profiles (latency, loss, bandwidth) to an interface.

Quick Reference

Attribute Value
Difficulty Level 4: Expert
Time Estimate 1 week
Main Programming Language Bash
Alternative Programming Languages Python, Go
Coolness Level Level 4: Hardcore
Business Potential 3. Service & Support
Prerequisites Basic Linux CLI
Key Topics Traffic Control and Performance Measurement

1. Learning Objectives

By completing this project, you will:

  1. Build the core tool described in the project and validate output against a golden transcript.
  2. Explain how the tool maps to the Linux networking layer model.
  3. Diagnose at least one real or simulated failure using the tool’s output.

2. All Theory Needed (Per-Concept Breakdown)

This section includes every concept required to implement this project successfully.

Traffic Control and Performance Measurement

Fundamentals Traffic control (tc) is the Linux subsystem that shapes how packets leave an interface. It operates on queues: packets are enqueued into a queueing discipline (qdisc), scheduled for transmission, and optionally classified into classes with different rates or priorities. netem is a qdisc that simulates network impairments like delay, jitter, loss, duplication, and reordering. Measurement completes the loop: iperf3 provides controlled throughput tests for TCP and UDP so you can quantify the effect of shaping or diagnose bottlenecks. The essential idea is simple: you cannot optimize or validate performance without both a way to measure it and a way to change it.
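
As a minimal sketch of that loop (assuming an interface named eth0 and root privileges):

# add 100ms of egress delay with netem
tc qdisc add dev eth0 root netem delay 100ms
# inspect what is attached
tc qdisc show dev eth0
# remove the shaping, restoring the default qdisc
tc qdisc del dev eth0 root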

Deep Dive Performance issues are often reported as feelings: “the network is slow.” Traffic control gives you a concrete model. When the kernel wants to transmit a packet, it is placed into a qdisc, which decides when and how that packet leaves. A classless qdisc like pfifo is a simple queue; a classful qdisc like HTB allows you to carve bandwidth into classes, reserve minimum rates, and cap maximums. Filters direct packets into those classes based on IPs, ports, or other fields. This is a programmable version of the fairness and contention that occurs on real networks, and it is the foundation of QoS on Linux.
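
A sketch of that carving (eth0 and the rates are illustrative): an HTB root with two classes and a u32 filter that steers SSH into the faster class.

# classful shaping: unclassified traffic falls into class 1:20
tc qdisc add dev eth0 root handle 1: htb default 20
# interactive class: 8 Mbit/s guaranteed, may borrow up to 10 Mbit/s
tc class add dev eth0 parent 1: classid 1:10 htb rate 8mbit ceil 10mbit
# bulk class: 2 Mbit/s guaranteed
tc class add dev eth0 parent 1: classid 1:20 htb rate 2mbit ceil 10mbit
# filter: SSH (destination port 22) goes to the interactive class
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dport 22 0xffff flowid 1:10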

Queue behavior determines latency and throughput. If the queue grows faster than it drains, latency increases and drops may occur when buffers fill. tc lets you control queue size and rate, which means you can intentionally shape throughput and latency. You can prioritize SSH over bulk transfers, cap noisy neighbors, or simulate congested links. The key is that shaping is per-interface; you must target the correct interface and direction, or you will affect the wrong traffic.

netem adds realism. It can introduce a fixed delay, a distribution of jitter, random loss, or packet reordering. This is not just for testing; it is essential for understanding how applications behave under real-world conditions. A web app may seem fine on a LAN but fail under 200ms latency and 2% loss. netem lets you reproduce that environment in a lab. However, netem affects egress by default. Ingress shaping requires an Intermediate Functional Block (IFB) device to redirect incoming traffic to a controllable queue. This asymmetry is often the source of confusion: you apply netem to eth0 and wonder why download traffic is unaffected. The fix is to shape the correct direction explicitly.
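
A sketch of the IFB redirect (device and module names are the ifb defaults; requires root):

# create and raise the IFB device
modprobe ifb numifbs=1
ip link set dev ifb0 up
# attach an ingress qdisc to eth0 and mirror incoming packets to ifb0
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    action mirred egress redirect dev ifb0
# shape "ingress" by shaping ifb0's egress
tc qdisc add dev ifb0 root netem delay 100ms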

Measurement provides truth. iperf3 runs a server and client to generate controlled traffic. TCP tests reveal achievable throughput and retransmissions; UDP tests reveal loss and jitter at a target bitrate. Because iperf3 requires two endpoints, it reinforces an important principle: throughput is a property of a path, not a single host. Combine iperf3 with interface counters (/proc/net/dev or ip -s link) and you can reconcile “what the test says” with “what the link did,” which helps you localize bottlenecks. A low iperf3 result with low interface utilization suggests a constraint along the path; high CPU usage combined with local drops suggests a host-side limit.
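
Concretely (192.0.2.10 is a placeholder server address):

# on the server endpoint
iperf3 -s
# TCP test from the client: throughput and retransmissions
iperf3 -c 192.0.2.10 -t 10
# UDP test at a 5 Mbit/s target: loss and jitter
iperf3 -c 192.0.2.10 -u -b 5M
# reconcile with interface counters on either endpoint
ip -s link show eth0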

Disciplined experimentation is what makes tc valuable. Establish a baseline with no shaping. Change one variable at a time (add 100ms delay, then add 2% loss, then add a 5 Mbps cap). Measure after each change and document results. This is how you build a performance model that predicts behavior rather than reacting to anecdotes. The Traffic Control Simulator project forces you to practice this experimental method.
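
One way to script that discipline (SERVER and eth0 are placeholders; run as root):

SERVER=192.0.2.10
iperf3 -c "$SERVER" -J > baseline.json     # baseline: no shaping
tc qdisc add dev eth0 root netem delay 100ms
iperf3 -c "$SERVER" -J > delay.json        # one change: +100ms delay
tc qdisc change dev eth0 root netem delay 100ms loss 2%
iperf3 -c "$SERVER" -J > delay-loss.json   # next change: +2% loss
tc qdisc del dev eth0 root                 # return to baseline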

Finally, remember operational safety. tc can accidentally cut off your own access if you shape the wrong interface or apply too strict a profile. Always keep a rollback command ready and test in a lab environment before applying in production. This is not just caution; it is part of professional performance engineering.
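
A defensive pattern worth sketching (assumes a root shell and eth0): arm the rollback before applying the profile, so a lost session still recovers.

# schedule an automatic rollback in 5 minutes, before shaping anything
( sleep 300 && tc qdisc del dev eth0 root ) &
ROLLBACK_PID=$!
# apply the risky profile
tc qdisc add dev eth0 root netem delay 200ms loss 2%
# if access survives, cancel the timer and clean up deliberately
kill "$ROLLBACK_PID"
tc qdisc del dev eth0 root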

How this fits into the projects

  • Bandwidth Monitor & Performance Tester (Project 8)
  • Traffic Control Simulator (Project 12)

Definitions & key terms

  • qdisc: Queueing discipline that controls packet scheduling on an interface.
  • netem: qdisc that emulates delay, loss, jitter, and reordering.
  • Throughput: Amount of data transferred per unit time.

Mental model diagram

Packets -> root qdisc -> filter -> class -> dequeue -> NIC
                                      \-> netem delay/loss

How it works (step-by-step, invariants, failure modes)

  1. Packet is enqueued into the root qdisc.
  2. A filter assigns it to a class (if the qdisc is classful).
  3. The qdisc schedules the dequeue according to its policy.
  4. netem, if attached, adds delay, loss, or reordering.
  5. Packet is transmitted; measure the result with iperf3.

Invariants: a qdisc is attached per interface; netem affects egress only (ingress needs IFB).
Failure modes: shaping the wrong interface; forgetting to clear rules afterward.

Minimal concrete example (simplified performance transcript):

Baseline: 940 Mbps TCP
After netem 100ms delay: 85 Mbps TCP
After netem 2% loss: TCP retransmits spike

Common misconceptions

  • “tc changes the whole network.” (It changes traffic on a specific interface.)
  • “UDP throughput equals link capacity.” (Loss increases if you exceed capacity.)

Check-your-understanding questions

  1. Why does netem only affect egress by default?
  2. What does iperf3 require before a test can run?
  3. How do you distinguish bandwidth limit vs latency limit?

Check-your-understanding answers

  1. qdiscs attach to egress queues; ingress requires IFB.
  2. A server and a client.
  3. Bandwidth limit shows a flat cap; latency limit shows throughput collapse with higher RTT.

Real-world applications

  • Capacity testing, QoS validation, and resilience testing for mobile users.

Where you’ll apply it: Projects 8 and 12.

References

  • tc(8) man page: qdiscs, classes, and filters.
  • tc-netem(8) man page: delay, loss, duplication, and reordering.
  • iperf3 documentation: TCP/UDP/SCTP throughput tests in a client/server model.

Key insights: You cannot optimize what you cannot measure; tc and iperf3 give you both control and measurement.

Summary: You can now shape traffic and measure the effects, which is essential for realistic performance debugging.

Homework/Exercises to practice the concept

  • Design a “4G poor” profile with target bandwidth, latency, and loss.
  • Describe how you would verify that the profile was applied correctly.

Solutions to the homework/exercises

  • Example: 5 Mbps, 100ms latency, 2% loss, jitter 50ms.
  • Verify with ping RTT, iperf3 throughput, and tc qdisc show output (see the sketch below).
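
One way to express that profile with tc (eth0 and 192.0.2.10 are placeholders; run as root):

# cap bandwidth with HTB, then layer netem impairments under the class
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 5mbit
tc qdisc add dev eth0 parent 1:10 handle 10: netem delay 100ms 50ms loss 2%
# verify: rules, latency, throughput
tc qdisc show dev eth0
ping -c 10 192.0.2.10
iperf3 -c 192.0.2.10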

3. Project Specification

3.1 What You Will Build

A simulator that applies network profiles (latency, loss, bandwidth) to an interface.

3.2 Functional Requirements

  1. Core data collection: Gather the required system/network data reliably.
  2. Interpretation layer: Translate raw outputs into human-readable insights.
  3. Deterministic output: Produce stable, comparable results across runs.
  4. Error handling: Detect missing privileges, tools, or unsupported interfaces.

3.3 Non-Functional Requirements

  • Performance: Runs in under 5 seconds for baseline mode.
  • Reliability: Handles missing data sources gracefully.
  • Usability: Output is readable without post-processing.

3.4 Example Usage / Output

$ sudo ./tc-sim.sh apply 4g-poor eth0

Applied profile: 5 Mbps, 100ms, 2% loss

Validation:
  ping RTT ~100-150ms
  iperf3 throughput ~5 Mbps

3.5 Data Formats / Schemas / Protocols

  • Input: CLI tool output, kernel state, or service logs.
  • Output: A structured report with sections and summarized metrics.

3.6 Edge Cases

  • Missing tool binaries or insufficient permissions.
  • Interfaces or hosts that return no data.
  • Transient states (link flaps, intermittent loss).

3.7 Real World Outcome

The outcome is externally observable: after applying a profile, ping RTT rises by the configured delay and iperf3 settles near the configured cap. The exact transcript appears in 3.7.3.

3.7.1 How to Run (Copy/Paste)

$ sudo ./tc-sim.sh apply <profile> <interface>   # e.g. apply 4g-poor eth0

3.7.2 Golden Path Demo (Deterministic)

Run the tool against a known-good target and verify every section of the output matches the expected format.

3.7.3 Exact Terminal Transcript (CLI)

$ sudo ./tc-sim.sh apply 4g-poor eth0

Applied profile: 5 Mbps, 100ms, 2% loss

Validation:
  ping RTT ~100-150ms
  iperf3 throughput ~5 Mbps

4. Solution Architecture

4.1 High-Level Design

[Collector] -> [Parser] -> [Analyzer] -> [Reporter]

4.2 Key Components

Component Responsibility Key Decisions
Collector Gather raw tool output Which tools to call and with what flags
Parser Normalize raw text/JSON Text vs JSON parsing strategy
Analyzer Compute insights Thresholds and heuristics
Reporter Format output Stable layout and readability

4.3 Data Structures (No Full Code)

  • ProfileRecord: name, rate, delay, jitter, loss
  • InterfaceRecord: name, state, addresses, stats
  • RouteRecord: prefix, gateway, interface, metric
  • Observation: timestamp, source, severity, message

4.4 Algorithm Overview

Key Algorithm: Evidence Aggregation

  1. Collect raw outputs from tools.
  2. Parse into normalized records.
  3. Apply interpretation rules and thresholds.
  4. Render the final report.

Complexity Analysis:

  • Time: O(n) over number of records
  • Space: O(n) to hold parsed records
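
A minimal Bash sketch of that pipeline (the threshold and wording are illustrative):

#!/usr/bin/env bash
# collect: raw qdisc statistics for the target interface
raw=$(tc -s qdisc show dev "${1:-eth0}")
# parse: extract the qdisc drop counter
drops=$(grep -oP 'dropped \K[0-9]+' <<< "$raw" | head -1)
# analyze + report: apply a simple threshold rule
if [ "${drops:-0}" -gt 0 ]; then
    echo "WARN: $drops packets dropped by qdisc (queue may be overflowing)"
else
    echo "OK: no qdisc drops observed"
fi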

5. Implementation Guide

5.1 Development Environment Setup

# Install required tools with your distro package manager
# e.g. Debian/Ubuntu (package names may differ on other distros):
sudo apt install iproute2 iperf3

5.2 Project Structure

project-root/
├── src/
│   ├── main
│   ├── collectors/
│   └── formatters/
├── tests/
└── README.md

5.3 The Core Question You’re Answering

“How do I intentionally simulate poor network conditions on Linux?”

5.4 Concepts You Must Understand First

  1. Qdiscs and classes
    • Queueing model.
  2. netem impairment
    • Delay, loss, reorder.
  3. Egress vs ingress
    • Why shaping is mostly outbound.

5.5 Questions to Guide Your Design

  1. How will you ensure rules can be removed safely?
  2. How will you combine rate limiting and delay?
  3. What profiles best represent real-world networks?

5.6 Thinking Exercise

Design a “satellite” profile: high latency, moderate bandwidth, low loss. Explain expected app behavior.

5.7 The Interview Questions They’ll Ask

  1. “What is a qdisc?”
  2. “How do you add latency on Linux?”
  3. “Why is ingress shaping harder?”
  4. “What does netem simulate?”
  5. “How would you clean up tc rules?”

5.8 Hints in Layers

Hint 1: Start with a single netem qdisc.
Hint 2: Add HTB for bandwidth limits.
Hint 3: Validate with ping and iperf3.
Hint 4: Always provide a clear reset command.

5.9 Books That Will Help

Topic Book Chapter
Traffic control LARTC HOWTO Full
Performance “Systems Performance” Ch. 10

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

  • Define outputs and parse a single tool.
  • Produce a minimal report.

Phase 2: Core Functionality (3-5 days)

  • Add remaining tools and interpretation logic.
  • Implement stable formatting and summaries.

Phase 3: Polish & Edge Cases (2-3 days)

  • Handle missing data and failure modes.
  • Add thresholds and validation checks.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Parsing format Text vs JSON JSON where available More stable parsing
Output layout Table vs sections Sections Readability for humans
Sampling One-shot vs periodic One-shot + optional loop Predictable runtime
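
The JSON recommendation is practical because iproute2 emits JSON natively; a sketch with jq (assumes jq is installed and field names as emitted by recent iproute2):

# stable: parse structured output
ip -j -s link show eth0 | jq '.[0].stats64.rx.bytes'
# fragile: scrape columns from the text layout
ip -s link show eth0 | awk '/RX:/{getline; print $1; exit}'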

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Validate parsing Parse fixed tool output samples
Integration Tests Validate tool calls Run against a lab host
Edge Case Tests Handle failures Missing tool, no permissions

6.2 Critical Test Cases

  1. Reference run: Output matches golden transcript.
  2. Missing tool: Proper error message and partial report.
  3. Permission denied: Clear guidance for sudo or capabilities.

6.3 Test Data

Input: captured command output
Expected: normalized report with correct totals
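
For the reference run in 6.2, the golden-transcript check can be a single diff (paths are hypothetical):

# compare a fresh run against the stored golden transcript
./tc-sim.sh apply 4g-poor eth0 > /tmp/actual.txt 2>&1
diff -u tests/golden/apply-4g-poor.txt /tmp/actual.txt && echo PASS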

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Wrong interface Empty output Verify interface names
Missing privileges Permission errors Use sudo or capabilities
Misparsed output Wrong stats Prefer JSON parsing

7.2 Debugging Strategies

  • Re-run each tool independently to compare raw output.
  • Add a verbose mode that dumps raw data sources.

7.3 Performance Traps

  • Avoid tight loops without sleep intervals.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add colored status markers.
  • Export report to a file.

8.2 Intermediate Extensions

  • Add JSON output mode.
  • Add baseline comparison.

8.3 Advanced Extensions

  • Add multi-host aggregation.
  • Add alerting thresholds.

9. Real-World Connections

9.1 Industry Applications

  • SRE runbooks and on-call diagnostics.
  • Network operations monitoring.

9.2 Related Tools

  • tcpdump / iproute2 / nftables
  • mtr / iperf3

9.3 Interview Relevance

  • Demonstrates evidence-based debugging and tool mastery.

10. Resources

10.1 Essential Reading

  • Primary book listed in the main guide.
  • Relevant RFCs and tool manuals.

10.2 Video Resources

  • Conference talks on Linux networking and troubleshooting.