Project 3: Process Scheduler Visualization Tool

Build a real-time scheduler observer that shows context switches, run-queue depth, and per-CPU activity.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Level 3: Advanced |
| Time Estimate | 2 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Python |
| Coolness Level | Level 3: Clever |
| Business Potential | Level 3: Observability tooling |
| Prerequisites | C or Python, /proc familiarity, basic scheduling theory |
| Key Topics | CFS, run queues, tracepoints, time-series aggregation |

1. Learning Objectives

By completing this project, you will:

  1. Explain how the CFS scheduler selects the next runnable task.
  2. Read and interpret /proc/schedstat and /proc/stat.
  3. Capture sched_switch events via tracefs/perf.
  4. Aggregate event streams into per-CPU timelines.
  5. Build a deterministic, testable visualization output.
  6. Identify scheduling anomalies like long run-queue depth or starvation.

2. All Theory Needed (Per-Concept Breakdown)

2.1 CFS Scheduling Model and Run Queues

Fundamentals

Linux’s CFS (Completely Fair Scheduler) aims to distribute CPU time fairly. It keeps runnable tasks in a red-black tree ordered by virtual runtime (vruntime). The task with the smallest vruntime runs next, approximating fairness.

Deep Dive into the concept

Each CPU has a run queue (rq) and a CFS run queue (cfs_rq). vruntime increases proportional to actual runtime, scaled by task weight (from nice value). This creates the illusion of equal CPU time for tasks of equal weight. Load balancing periodically migrates tasks between CPUs. In your visualizer, run-queue depth indicates contention, and large vruntime gaps suggest dominance by a task. Understanding this model makes your visualization meaningful rather than a random stream of PIDs.

How this fits in the project

You’ll map sched_switch events to CFS states in Section 3.2 and Section 4.4, and interpret run-queue depth in Section 5.10.

Definitions & key terms

  • CFS -> Completely Fair Scheduler
  • vruntime -> weighted runtime used for scheduling order
  • run queue -> per-CPU list/tree of runnable tasks
  • nice -> user-visible priority weight

Mental model diagram (ASCII)

Run queue (rb-tree, ordered by vruntime)
  leftmost -> next to run

How it works (step-by-step)

  1. Task becomes runnable and is inserted into run queue.
  2. Scheduler picks leftmost (smallest vruntime).
  3. Task runs for a time slice and its vruntime increases.
  4. Task is reinserted and a new task is selected.

Minimal concrete example

Task A vr=10, Task B vr=12, Task C vr=20 -> pick A
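This selection rule can be sketched in a few lines of Python (one of the project's alternative languages). The task names, weights, and time values are illustrative, not the kernel's actual tables:

```python
# Toy CFS simulation: pick the task with the smallest vruntime, then
# charge it runtime scaled inversely by its weight (heavier = slower growth).

def pick_next(tasks):
    """Return the name of the runnable task with the smallest vruntime."""
    return min(tasks, key=tasks.get)

def run_for(tasks, name, delta_exec, weight, base_weight=1024):
    """Advance a task's vruntime by delta_exec scaled by base_weight/weight."""
    tasks[name] += delta_exec * base_weight / weight

tasks = {"A": 10.0, "B": 12.0, "C": 20.0}
order = []
for _ in range(3):
    nxt = pick_next(tasks)
    order.append(nxt)
    run_for(tasks, nxt, delta_exec=5, weight=1024)

print(order)  # -> ['A', 'B', 'A']
```

A runs first (smallest vruntime), its vruntime grows past B's, so B runs next, and then A again; this is the "fairness machine" in miniature.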

Common misconceptions

  • Misconception: CFS uses fixed time quanta. Correction: time slices vary based on load and weight.
  • Misconception: higher nice always runs less. Correction: it runs less proportionally, not necessarily less in wall time.

Check-your-understanding questions

  1. Why is vruntime used instead of raw runtime?
  2. What does run-queue depth indicate?
  3. How does nice affect scheduling?

Check-your-understanding answers

  1. It normalizes runtime by task weight.
  2. Contention for the CPU.
  3. It changes task weight, affecting vruntime growth.

Real-world applications

  • Performance debugging (CPU-bound vs IO-bound workloads)
  • Capacity planning and latency analysis

References

  • “Linux Kernel Development” (Love), scheduler chapters
  • “Operating Systems: Three Easy Pieces” Ch. 7-10

Key insights

Scheduler theory explains what your visualization means.

Summary

CFS is a fairness machine; your tool reveals its decisions.

Homework/Exercises to practice the concept

  1. Predict how two tasks with nice -5 and +5 share CPU.
  2. Simulate a run queue with three tasks and track vruntime changes.

Solutions to the homework/exercises

  1. The nice -5 task receives more CPU time because its weight is higher; with the kernel's standard weight table (nice -5 ≈ 3121 vs nice +5 ≈ 335), the split is roughly 9:1.
  2. The smallest vruntime task runs next; others wait.

2.2 Tracepoints and sched_switch Events

Fundamentals

The kernel emits tracepoints for scheduling events. sched:sched_switch fires on every context switch and contains previous and next task info. You can read these events via tracefs or perf.

Deep Dive into the concept

Tracepoints expose structured events with fields like prev_comm, prev_pid, next_pid, and timestamps. When enabled, the kernel writes event records to a ring buffer that user space reads. You must mount tracefs, enable events, and read trace_pipe or perf buffers. Proper timestamps should come from a monotonic clock to avoid skew.

How this fits in the project

Your event ingestion layer in Section 4.2 depends on correct tracepoint handling.

Definitions & key terms

  • tracepoint -> kernel instrumentation hook
  • tracefs -> virtual filesystem for tracing
  • sched_switch -> event emitted on context switch
  • ring buffer -> circular buffer of event records

Mental model diagram (ASCII)

CPU core -> sched_switch -> ring buffer -> user reader

How it works (step-by-step)

  1. Mount tracefs and enable sched_switch.
  2. Kernel writes events into per-CPU ring buffers.
  3. User-space reads and parses events.
  4. Events are aggregated into timelines.

Minimal concrete example

echo 1 | sudo tee /sys/kernel/tracing/events/sched/sched_switch/enable
cat /sys/kernel/tracing/trace_pipe
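Once events are flowing, each trace_pipe line can be parsed into a structured record. A hedged Python sketch follows; the exact flags field between the CPU column and the timestamp varies across kernel versions (so the pattern skips it loosely), and the sample line is illustrative:

```python
import re

# Parse one trace_pipe sched_switch line into a dict, or return None
# if the line does not match (e.g. headers or other event types).
LINE_RE = re.compile(
    r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>[\d.]+):\s+sched_switch:\s+"
    r"prev_comm=(?P<prev_comm>.+?)\s+prev_pid=(?P<prev_pid>\d+)\s+"
    r"prev_prio=\d+\s+prev_state=\S+\s+==>\s+"
    r"next_comm=(?P<next_comm>.+?)\s+next_pid=(?P<next_pid>\d+)"
)

def parse_switch(line):
    m = LINE_RE.search(line)
    if not m:
        return None
    d = m.groupdict()
    return {"cpu": int(d["cpu"]), "ts": float(d["ts"]),
            "prev": (d["prev_comm"], int(d["prev_pid"])),
            "next": (d["next_comm"], int(d["next_pid"]))}

sample = ("  <idle>-0  [001] d..2.  123.456789: sched_switch: "
          "prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R "
          "==> next_comm=bash next_pid=2222 next_prio=120")
print(parse_switch(sample))
```

Returning None for non-matching lines lets the reader loop skip headers and other enabled events without special-casing them.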

Common misconceptions

  • Misconception: tracepoints are always enabled. Correction: they must be explicitly enabled.
  • Misconception: trace_pipe preserves event order across CPUs. Correction: ordering across CPUs is approximate; use timestamps.

Check-your-understanding questions

  1. Why do you need timestamps for events from different CPUs?
  2. What happens if tracefs isn’t mounted?

Check-your-understanding answers

  1. Events are per-CPU and not globally ordered.
  2. You’ll see no events and the enable path fails.

Real-world applications

  • Kernel performance profiling
  • Latency investigations

References

  • Kernel tracing docs
  • perf trace documentation

Key insights

Tracepoints are the API that makes the scheduler observable.

Summary

Without tracepoints, scheduler behavior is invisible.

Homework/Exercises to practice the concept

  1. Enable sched_switch and count events during a CPU stress test.
  2. Compare trace_pipe output with perf trace.

Solutions to the homework/exercises

  1. Use stress-ng --cpu 2 and count lines.
  2. Both show the same switches but with different formats.

2.3 Time-Series Aggregation and Visualization

Fundamentals

Raw events are noisy. You need to aggregate events into time buckets to show run-queue depth, CPU utilization, and context switch rates. Visualization can be as simple as a TUI timeline.

Deep Dive into the concept

Aggregation involves grouping events by time window (e.g., 100ms) and CPU. The run-queue depth can be derived from /proc/schedstat deltas, while context-switch rate comes from counting sched_switch events. A deterministic output mode helps tests: use fixed windows and monotonic timestamps. Visualization can be textual bars or ASCII timelines.

How this fits in the project

This is how you turn raw events into the real-world outcome in Section 3.7 and Section 5.10 Phase 3.

Definitions & key terms

  • bucket -> time window for aggregation
  • utilization -> fraction of time CPU is busy
  • timeline -> per-CPU representation of running tasks

Mental model diagram (ASCII)

Time -> |----100ms----|----100ms----|
CPU0 -> nginx | postgres | nginx

How it works (step-by-step)

  1. Capture events with timestamps.
  2. Bucket events per CPU into fixed windows.
  3. Compute counts (context switches, run-queue depth).
  4. Render as ASCII timeline.

Minimal concrete example

CPU0: A A A B B
CPU1: C C D D D
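The bucketing step can be sketched in Python: group (timestamp, cpu) switch events into fixed windows and count them per (window, cpu). The window size and sample events below are arbitrary:

```python
from collections import defaultdict

def bucket_switches(events, window_s=0.1):
    """Count context switches per (window_index, cpu).

    events: iterable of (timestamp_seconds, cpu) pairs.
    """
    counts = defaultdict(int)
    for ts, cpu in events:
        counts[(int(ts // window_s), cpu)] += 1
    return dict(counts)

events = [(0.01, 0), (0.05, 0), (0.12, 0), (0.03, 1)]
print(bucket_switches(events))
# -> {(0, 0): 2, (1, 0): 1, (0, 1): 1}
```

Because the window boundary is a pure function of the timestamp, replaying the same event list always produces the same buckets, which is exactly what the deterministic test mode needs.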

Common misconceptions

  • Misconception: /proc/stat gives per-process usage. Correction: it provides per-CPU aggregate counters; per-process usage requires /proc/<pid>/stat or tracepoints.

Check-your-understanding questions

  1. Why is a fixed time bucket useful for deterministic tests?
  2. How do you compute context switches per second?

Check-your-understanding answers

  1. It avoids jitter and wall-clock variance.
  2. Count sched_switch events per time window.

Real-world applications

  • Monitoring tools (top, htop, perfetto)
  • Scheduling anomaly detection

References

  • “Systems Performance” (Gregg), time-series chapters

Key insights

Aggregation turns kernel noise into human insight.

Summary

Your tool is only as useful as the aggregation you build.

Homework/Exercises to practice the concept

  1. Aggregate /proc/stat deltas into 1s CPU utilization.
  2. Render a text bar for CPU busy time.

Solutions to the homework/exercises

  1. Compute delta idle vs total and derive utilization.
  2. Use # for busy, . for idle.
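Solution 1 can be sketched in Python against two snapshots of a /proc/stat cpu line. The jiffy values here are made up, and "idle" is taken as the idle + iowait fields (fields 4 and 5 of the line):

```python
def cpu_fields(line):
    """Parse a '/proc/stat' cpu line into its integer jiffy counters."""
    return [int(x) for x in line.split()[1:]]

def utilization(before, after):
    """Busy fraction between two snapshots of the same cpu line."""
    b, a = cpu_fields(before), cpu_fields(after)
    total = sum(a) - sum(b)
    idle = (a[3] + a[4]) - (b[3] + b[4])  # idle + iowait deltas
    return 1 - idle / total if total else 0.0

# Hypothetical snapshots taken ~1s apart:
before = "cpu0 100 0 100 800 0 0 0 0 0 0"
after  = "cpu0 160 0 140 900 0 0 0 0 0 0"
print(utilization(before, after))  # 100 busy of 200 total jiffies -> 0.5
```

The same delta-over-delta pattern applies to any /proc counter: absolute values are meaningless on their own, rates are what you render.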

3. Project Specification

3.1 What You Will Build

A CLI/TUI tool, schedviz, that shows per-CPU scheduling activity: current task, recent switches, run-queue depth, and summary statistics. It supports fixed-window aggregation for deterministic outputs.

3.2 Functional Requirements

  1. Read sched_switch tracepoints in real time.
  2. Parse /proc/schedstat and /proc/stat for run-queue depth and CPU usage.
  3. Render per-CPU timelines in text or TUI mode.
  4. Provide summary stats (context switches/sec, average run-queue depth).
  5. Support --window-ms and --fixed-ts for deterministic buckets.
  6. Handle multi-CPU systems and process exits gracefully.

3.3 Non-Functional Requirements

  • Performance: handle 10k events/sec without dropping.
  • Reliability: survives tracefs being temporarily unavailable.
  • Usability: output is readable in 80x24 terminals.

3.4 Example Usage / Output

$ sudo ./schedviz --fixed-ts --window-ms 100
CPU0: nginx(1234) -> postgres(5678) -> nginx(1234)
CPU1: kworker/1:0 -> bash(2222) -> kworker/1:0
runq depth: [3,1,2,1]
ctx switches/sec: 1240

3.5 Data Formats / Schemas / Protocols

  • sched_switch event format from tracefs
  • /proc/schedstat parsed into per-CPU counters

3.6 Edge Cases

  • No sched_switch events (tracefs not mounted)
  • Processes exit mid-window
  • CPU hotplug changes CPU count

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

sudo ./schedviz --fixed-ts --window-ms 100

3.7.2 Golden Path Demo (Deterministic)

  • Use --fixed-ts and 100ms windows to generate stable snapshots.

3.7.3 CLI Transcript (Success + Failure)

$ sudo ./schedviz --fixed-ts
CPU0: idle -> bash(1234) -> gcc(5000)
runq depth: [1,2,1]
ctx switches/sec: 800

$ ./schedviz
error: schedviz requires tracefs access (try sudo)
exit code: 2

3.7.4 Exit Codes

  • 0 success
  • 2 missing permissions

4. Solution Architecture

4.1 High-Level Design

Tracefs Reader -> Event Parser -> Aggregator -> Renderer

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Reader | read tracefs/perf | trace_pipe for v1 |
| Parser | parse sched_switch lines | regex vs manual parse |
| Aggregator | bucket events | fixed window size |
| Renderer | ASCII timeline | terminal-friendly output |

4.3 Data Structures (No Full Code)

struct cpu_stats {
    uint64_t switches;     /* context switches observed in the current window */
    uint64_t runq_depth;   /* most recent run-queue depth sample */
    char last_task[32];    /* comm of the last task seen running on this CPU */
};

4.4 Algorithm Overview

  1. Enable tracepoint.
  2. Read events into per-CPU buckets.
  3. Every window, compute metrics and render.

Complexity: O(E) per window, where E is the number of events in that window.


5. Implementation Guide

5.1 Development Environment Setup

sudo apt install build-essential linux-tools-common stress-ng

(build-essential compiles the C sources; stress-ng is used in the exercises.)

5.2 Project Structure

schedviz/
|-- src/
|   |-- main.c
|   |-- tracefs.c
|   |-- agg.c
|   `-- render.c
`-- Makefile

5.3 The Core Question You’re Answering

“Can I make the scheduler’s decisions visible in real time?”

5.4 Concepts You Must Understand First

  1. CFS scheduling model.
  2. Tracepoints and event ingestion.
  3. Time-window aggregation.

5.5 Questions to Guide Your Design

  1. What window size is most useful?
  2. How will you handle CPU hotplug?
  3. Which metrics are most valuable to show?

5.6 Thinking Exercise

Predict how make -j8 changes run-queue depth on a 4-core machine.

5.7 The Interview Questions They’ll Ask

  1. What is sched_switch and why is it important?
  2. How does CFS balance fairness vs responsiveness?

5.8 Hints in Layers

  • Hint 1: Start with /proc/stat utilization.
  • Hint 2: Add tracefs sched_switch events.
  • Hint 3: Build simple ASCII timelines.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Scheduling | Linux Kernel Development | Scheduler chapters |
| OS scheduling | OSTEP | Ch. 7-10 |
| Performance | Systems Performance | Timeline chapters |

5.10 Implementation Phases

Phase 1: Event capture (3 days)

  • Enable tracepoint and print raw events.

Phase 2: Aggregation (4 days)

  • Bucket events and compute metrics.

Phase 3: Visualization (3-4 days)

  • Render timelines and stats.
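The text rendering from Phase 3 can be sketched as a small Python helper, assuming the `#`/`.` bar convention from Section 2.3's exercises:

```python
def bar(frac, width=20, busy="#", idle="."):
    """Render a utilization fraction (0.0-1.0) as a fixed-width ASCII bar.

    Values outside [0, 1] are clamped so a noisy sample can't break layout.
    """
    n = round(max(0.0, min(1.0, frac)) * width)
    return busy * n + idle * (width - n)

print(bar(0.5))  # '##########..........'
print(bar(1.2))  # clamped: '####################'
```

A fixed width keeps each CPU row stable across refreshes, which matters for the 80x24 usability requirement in Section 3.3.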

6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit | parser | parse a sample sched_switch line |
| Integration | capture | stress CPU and count events |
| Edge | no events | tracefs disabled |

6.2 Critical Test Cases

  1. Known sched_switch log parses correctly.
  2. Deterministic output for fixed window size.

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Tracefs not mounted | no events | mount tracefs |
| Mixed timestamps | jittery output | use monotonic clock |

7.2 Debugging Strategies

  • Compare counts against /proc/stat context switch counter.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add JSON output for metrics.

8.2 Intermediate Extensions

  • Add per-process CPU usage estimates.

8.3 Advanced Extensions

  • Export perfetto trace format.

9. Real-World Connections

9.1 Industry Applications

  • Identifying CPU contention in production workloads
  • Tools in this space: perf, bcc, sysstat

9.2 Interview Relevance

  • Explaining how schedulers make decisions

10. Resources

  • Kernel tracing docs
  • perf trace docs

11. Self-Assessment Checklist

  • I can interpret sched_switch events.
  • I can explain run-queue depth metrics.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Captures and displays sched_switch events.

Full Completion:

  • Aggregates and renders per-CPU timelines.

Excellence:

  • Exports to external visualization tools.

13. Determinism Notes

  • Use --fixed-ts and fixed window size for tests.