Project 15: Real-Time Loop Latency Profiler

Measure loop jitter on Linux, analyze worst-case timing, and understand where determinism breaks.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	1–2 weekends
Main Programming Language	C
Alternative Programming Languages	Rust, Go, Python
Coolness Level	High
Business Potential	Medium
Prerequisites	C basics, Linux timing APIs
Key Topics	Scheduler jitter, high-resolution timers, measurement overhead

1. Learning Objectives

By completing this project, you will:

Implement a precise timing loop using clock_gettime and clock_nanosleep.
Measure and visualize jitter under different loads.
Identify sources of timing variance on Linux.
Apply scheduling and CPU affinity to reduce jitter.

2. All Theory Needed (Per-Concept Breakdown)

Concept 1: Linux Scheduling, Jitter, and Timing Measurement

Fundamentals

Linux is not a hard real-time OS. Processes are scheduled by the kernel, and timing depends on CPU load, interrupts, and scheduler decisions. Jitter is the variation between expected and actual wake-up times. Measuring it requires high-resolution timers (clock_gettime, clock_nanosleep) and logging that adds as little overhead as possible. The goal is to quantify how predictable your loop is and which factors make it worse.

Deep Dive into the concept

Linux scheduling uses time slices and priorities to allocate CPU time. Real-time scheduling policies (SCHED_FIFO, SCHED_RR) can reduce jitter by giving a task strict priority over normal tasks, but a misbehaving real-time task can starve everything else. Standard scheduling (SCHED_OTHER) is fair but not deterministic. On the Pi Zero 2 W, interrupts from USB, Wi-Fi, and storage can add latency. Even a tight loop can be preempted by kernel threads or delayed by interrupt handlers.
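A minimal sketch of requesting SCHED_FIFO from C, assuming a Linux target. The helper names request_fifo and current_policy_name are hypothetical, not part of the project spec. Without root or CAP_SYS_NICE the kernel refuses the request with EPERM, so the sketch reports the failure and leaves the process on its current policy. Note that Linux by default reserves part of each second for non-real-time tasks (/proc/sys/kernel/sched_rt_runtime_us is 950000 out of a 1000000 µs period), which limits how badly a runaway FIFO loop can starve the system.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: switch the calling process to SCHED_FIFO.
 * Returns 0 on success, -1 on failure. Without root or CAP_SYS_NICE the
 * kernel refuses with EPERM and the process keeps its current policy. */
int request_fifo(int priority)
{
    int lo = sched_get_priority_min(SCHED_FIFO);   /* 1 on Linux */
    int hi = sched_get_priority_max(SCHED_FIFO);   /* 99 on Linux */
    if (priority < lo || priority > hi) {
        fprintf(stderr, "priority %d outside SCHED_FIFO range [%d, %d]\n",
                priority, lo, hi);
        return -1;
    }

    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {   /* 0 = this process */
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Hypothetical helper: name of the calling process's current policy. */
const char *current_policy_name(void)
{
    switch (sched_getscheduler(0)) {
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    case SCHED_OTHER: return "SCHED_OTHER";
    default:          return "other";
    }
}
```

A moderate priority (for example 50) is usually a safer starting point than 99, which would compete with the kernel's own high-priority threads.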

Measuring jitter involves comparing the desired period to the actual time between iterations. A typical loop sets a target period (e.g., 5 ms) and records the time at the start of each iteration. Jitter is the difference between the actual interval and the target. You should record both average and worst-case jitter. A histogram or percentile analysis (e.g., 99th percentile) is useful for real-world decisions. For example, a control loop might tolerate 1 ms average jitter but not 10 ms spikes.
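The statistics described above (average, worst case, 99th percentile) can be computed after the run with a single sort. This is a sketch under stated assumptions: the struct and function names are hypothetical, the percentile uses the nearest-rank method, and the input is whatever per-iteration jitter you recorded. If you use the interval-based definition, which can be negative, consider averaging absolute values instead.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of one measurement run. */
struct jitter_stats {
    double avg_us;
    long   max_us;
    long   p99_us;
};

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile (pct in 1..100) of an ascending sorted array:
 * rank = ceil(pct * n / 100), using integer arithmetic. */
static long percentile_sorted(const long *sorted, size_t n, unsigned pct)
{
    size_t rank = ((size_t)pct * n + 99) / 100;
    if (rank == 0) rank = 1;
    if (rank > n) rank = n;
    return sorted[rank - 1];
}

/* Sort a copy of the samples, then derive avg, max and p99.
 * Returns 0 on success, -1 on empty input or allocation failure. */
int compute_stats(const long *jitter_us, size_t n, struct jitter_stats *out)
{
    if (n == 0) return -1;
    long *sorted = malloc(n * sizeof *sorted);
    if (!sorted) return -1;
    memcpy(sorted, jitter_us, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_long);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += (double)sorted[i];

    out->avg_us = sum / (double)n;
    out->max_us = sorted[n - 1];
    out->p99_us = percentile_sorted(sorted, n, 99);
    free(sorted);
    return 0;
}
```

Sorting costs O(n log n), which is fine after the run but is exactly the kind of work that should stay out of the timing loop.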

High-resolution timing uses CLOCK_MONOTONIC to avoid issues with wall-clock changes. clock_nanosleep in absolute mode (TIMER_ABSTIME) is preferred because it avoids accumulating drift. A naive loop that sleeps for a fixed relative period drifts, because each iteration's work and wake-up latency are added on top of the sleep. The correct approach is to compute the next wake time as t0 + n * period and sleep until that absolute time. This also provides a clear jitter measure: actual_time - target_time.
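The absolute-sleep loop described above can be sketched as follows, assuming a POSIX system with clock_nanosleep. The helper names (timespec_add_ns, timespec_diff_ns, run_absolute_loop) are hypothetical. Each target is advanced by exactly one period from the previous target, never from the measured wake time, so late wake-ups do not accumulate. If an iteration overruns its period, the next clock_nanosleep returns immediately; the sketch does not try to skip missed periods.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Add ns (>= 0) nanoseconds to *t, keeping tv_nsec in [0, 1e9). */
static void timespec_add_ns(struct timespec *t, int64_t ns)
{
    t->tv_sec  += (time_t)(ns / NSEC_PER_SEC);
    t->tv_nsec += (long)(ns % NSEC_PER_SEC);
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_sec  += 1;
        t->tv_nsec -= NSEC_PER_SEC;
    }
}

/* a - b in nanoseconds. */
static int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC
         + (a->tv_nsec - b->tv_nsec);
}

/* Run n_samples iterations at period_ns and store each wake-up latency
 * (actual - target) in latency_ns[]. Returns 0 or an error number. */
int run_absolute_loop(int64_t period_ns, size_t n_samples, int64_t *latency_ns)
{
    struct timespec next, now;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0) return errno;  /* t0 */

    for (size_t i = 0; i < n_samples; i++) {
        timespec_add_ns(&next, period_ns);     /* target = t0 + (i + 1) * period */
        int rc;
        do {
            /* clock_nanosleep returns the error number directly. */
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);                 /* absolute target: just retry */
        if (rc != 0) return rc;

        clock_gettime(CLOCK_MONOTONIC, &now);
        latency_ns[i] = timespec_diff_ns(&now, &next);
    }
    return 0;
}
```

Retrying on EINTR is safe here precisely because the target is absolute; with a relative sleep, a retry would have to recompute the remaining time.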

Measurement overhead can distort results. Logging to disk for every iteration can add latency and jitter. The solution is to store results in memory and flush them after the measurement interval. Another approach is to sample every N iterations. CPU frequency scaling can also affect timing; for consistent results, you may want to fix the CPU governor to performance mode.
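One way to keep logging out of the loop is a buffer allocated before the run and written to CSV afterwards. A sketch under stated assumptions: struct sample_buf and its functions are hypothetical, and actual_us is interpreted here as the wake time in microseconds since loop start (target offset plus jitter), since the spec does not define that column precisely.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical preallocated sample buffer. */
struct sample_buf {
    int64_t *jitter_us;
    size_t   cap;
    size_t   len;
};

int sample_buf_init(struct sample_buf *b, size_t cap)
{
    size_t bytes = cap * sizeof *b->jitter_us;
    b->jitter_us = malloc(bytes);
    if (!b->jitter_us) return -1;
    /* Write every byte now so each page is touched before the loop starts,
     * instead of taking first-touch page faults during the measurement. */
    memset(b->jitter_us, 0xff, bytes);
    b->cap = cap;
    b->len = 0;
    return 0;
}

/* Inside the loop: a plain array store, no I/O, no allocation.
 * Samples beyond capacity are dropped. */
static inline void sample_buf_push(struct sample_buf *b, int64_t jitter_us)
{
    if (b->len < b->cap)
        b->jitter_us[b->len++] = jitter_us;
}

/* After the loop: write the whole report in one pass. */
int sample_buf_write_csv(const struct sample_buf *b, const char *path,
                         int64_t period_us)
{
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    fprintf(f, "iteration,actual_us,jitter_us\n");
    for (size_t i = 0; i < b->len; i++) {
        /* Assumption: iteration i targets (i + 1) * period after start. */
        int64_t actual = (int64_t)(i + 1) * period_us + b->jitter_us[i];
        fprintf(f, "%zu,%lld,%lld\n", i, (long long)actual,
                (long long)b->jitter_us[i]);
    }
    return fclose(f) == 0 ? 0 : -1;
}

void sample_buf_free(struct sample_buf *b)
{
    free(b->jitter_us);
    b->jitter_us = NULL;
    b->cap = b->len = 0;
}
```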

You can try to reduce jitter by pinning the process to a CPU core (sched_setaffinity), setting a real-time priority, disabling unnecessary services, and reducing I/O during the test. However, Linux will still have interrupts and kernel tasks that can preempt you. The goal is to understand these limits, not to eliminate all jitter.
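Pinning with sched_setaffinity takes only a few lines on Linux. A minimal sketch: pin_to_cpu and is_pinned_to are hypothetical helper names, and _GNU_SOURCE is needed for the cpu_set_t macros. Call it once before the timing loop, then compare runs with and without it.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Restrict the calling process to a single CPU. Returns 0 on success. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof set, &set) != 0) {   /* 0 = this process */
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

/* Returns 1 if the calling process is allowed to run only on `cpu`. */
int is_pinned_to(int cpu)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0) return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

Pinning alone mainly avoids migrations between cores; other tasks and interrupts can still run on the chosen core.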

How this fits into the projects

This concept is used in §3 and §5.10 and informs Project 5 (PWM) and the capstone project.

Definitions & key terms

Jitter: Timing variability relative to target period.
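CLOCK_MONOTONIC: Clock that never jumps when wall-clock time is changed (manually or by an NTP step); NTP may still slew its rate slightly.
SCHED_FIFO: Fixed-priority real-time scheduling policy; a task runs until it blocks, yields, or is preempted by a higher-priority task.
CPU affinity: Restricting a process to a chosen set of CPU cores, often a single core.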

Mental model diagram (ASCII)

Target timeline: |----|----|----|
Actual timeline: |-----|---|------|
Jitter = actual - target
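The diagram's relation can be written more precisely. The symbols here are introduced for illustration (T is the period, t_n the target and a_n the measured wake-up time of iteration n, J_n the jitter); the last line shows how the interval-based definition relates to the absolute one.

```latex
\begin{aligned}
t_n &= t_0 + nT && \text{target wake-up of iteration } n\\
J_n &= a_n - t_n && \text{wake-up jitter}\\
(a_n - a_{n-1}) - T &= J_n - J_{n-1} && \text{interval-based jitter}
\end{aligned}
```

For example, with T = 5 ms, a_2 = t_0 + 10.2 ms and a_3 = t_0 + 15.7 ms give J_2 = 0.2 ms, J_3 = 0.7 ms, and an interval jitter of 0.5 ms. If instead J_2 = 0.9 ms and J_3 = 0.1 ms, the interval jitter is -0.8 ms, which is why the interval-based form can go negative.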

How it works (step-by-step, with invariants and failure modes)

Set target period and start time.
Sleep until next absolute time.
Record actual time and compute jitter.
Log results after test.

Failure modes:

Logging inside loop -> inflated jitter.
Using wall clock -> time jumps.
CPU scaling -> inconsistent results.

Minimal concrete example

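struct timespec t0, next;
clock_gettime(CLOCK_MONOTONIC, &t0);   /* t0: start of the schedule */
/* each iteration: next = t0 + n * period, then sleep until it */
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);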

Common misconceptions

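“Linux can be hard real-time.” Stock Linux gives no hard latency guarantees. PREEMPT_RT (long an out-of-tree patch set, merged into mainline in Linux 6.12) greatly lowers worst-case latency, but its bounds come from measurement rather than formal guarantees.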
“Average jitter is enough.” Worst-case matters most.

Check-your-understanding questions

Why use CLOCK_MONOTONIC instead of CLOCK_REALTIME?
How does absolute sleep reduce drift?
Why does logging inside the loop distort results?

Check-your-understanding answers

It avoids time jumps from NTP or manual clock changes.
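Each target is computed from the fixed start (t0 + n * period), so a late wake-up does not shift later targets and errors do not accumulate.
Logging makes system calls and possibly blocking I/O inside the measured interval, adding latency and extra chances for the loop to be preempted.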

Real-world applications

Control loops, robotics, and timing-sensitive systems.

Where you’ll apply it

This project: §3.2, §5.10.
Other projects: Project 5.

References

“Operating Systems: Three Easy Pieces” — scheduling
“Systems Performance” — latency measurement

Key insights

Worst-case jitter is the real limit of Linux timing predictability.

Summary

A latency profiler quantifies how Linux scheduling affects real-time loops.

Homework/Exercises to practice the concept

Measure jitter at 1 ms and 10 ms periods.
Compare jitter with and without CPU affinity.
Record jitter histogram and analyze percentiles.

Solutions to the homework/exercises

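Absolute jitter is often similar at both periods, but at 1 ms it is a much larger fraction of the period, and ten times as many wake-ups per second give rare spikes more chances to appear.
Pinning usually reduces variance by avoiding migrations between cores; the effect is larger if the chosen core is also kept free of other work.
The 99th percentile describes the tail; the true worst case only appears in the maximum (or in higher percentiles such as 99.9 and 99.99).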

3. Project Specification

3.1 What You Will Build

A latency profiler that measures loop jitter and produces a CSV report with statistics.

3.2 Functional Requirements

Run a timing loop at a configurable period.
Record jitter statistics (avg, max, percentiles).
Test under different load conditions.
Export results to CSV.

3.3 Non-Functional Requirements

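Performance: completes 1,000,000 iterations without crashing (about 83 minutes at a 5 ms period).
Reliability: average and percentile jitter reproducible within 10% across repeated runs under the same load; maximum jitter is expected to vary more.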
Usability: Clear output and charts.

3.4 Example Usage / Output

$ ./latency_profile
Target period: 5 ms
Avg jitter: 0.7 ms
Max jitter: 6.3 ms

3.5 Data Formats / Schemas / Protocols

CSV output:

iteration,actual_us,jitter_us

3.6 Edge Cases

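Negative jitter: possible when jitter is measured as interval minus period (a short interval follows a late wake-up). With absolute targets on CLOCK_MONOTONIC, actual_time - target_time should not be negative unless the sleep returned early (for example on EINTR), so treat negative values there as a bug.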
CPU frequency scaling changes.
Logging too frequently.

3.7 Real World Outcome

You can quantify worst-case jitter and explain timing limits.

3.7.1 How to Run (Copy/Paste)

./latency_profile --period-us 5000 --samples 100000

3.7.2 Golden Path Demo (Deterministic)

export FIXED_TIME="2026-01-01T13:30:00Z"
./latency_profile --simulate --period-us 5000

Expected output:

[2026-01-01T13:30:00Z] Avg jitter: 0.7 ms

3.7.3 Failure Demo (Deterministic)

./latency_profile --period-us 0

Expected output:

[ERROR] Invalid period

Exit code: 151

3.7.4 CLI Exit Codes

0: Success
150: Timer init failure
151: Invalid period
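A minimal sketch of validating --period-us so that an invalid value maps to exit code 151. The function parse_period_us and the enum names are hypothetical; only the flag and the codes 150/151 come from the spec above.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical names for the documented exit codes. */
enum { EXIT_TIMER_INIT = 150, EXIT_BAD_PERIOD = 151 };

/* Parse a --period-us value. Returns 0 and stores the period on success,
 * or EXIT_BAD_PERIOD for empty, non-numeric, trailing characters,
 * zero/negative, or out-of-range input. */
int parse_period_us(const char *s, long *period_us)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v <= 0)
        return EXIT_BAD_PERIOD;
    *period_us = v;
    return 0;
}
```

In main, a nonzero result would be returned directly as the process exit code after printing the error message from 3.7.3.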

4. Solution Architecture

4.1 High-Level Design

Timing Loop -> Jitter Calculator -> Stats -> CSV Output

4.2 Key Components

4.3 Data Structures (No Full Code)

struct sample { long actual_us; long jitter_us; };  /* one CSV row; iteration is the array index */

4.4 Algorithm Overview

Key Algorithm: Absolute Sleep Loop

Compute next target time.
Sleep until target.
Compute jitter and record.

Complexity Analysis:

Time: O(n) for n samples, plus O(n log n) if percentiles are computed by sorting
Space: O(n) for samples

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install -y build-essential

5.2 Project Structure

project-root/
├── latency_profile.c
├── stats.c
└── README.md

5.3 The Core Question You’re Answering

“How predictable is timing on Linux, and where does it break?”

5.4 Concepts You Must Understand First

Scheduler policies and priorities.
High-resolution timers.
Measurement overhead.

5.5 Questions to Guide Your Design

What is your acceptable jitter budget?
How will you log without skewing results?

5.6 Thinking Exercise

Define a timing requirement for a motor control loop and compare it to measured jitter.

5.7 The Interview Questions They’ll Ask

Why is Linux not a hard real-time OS?
What is priority inversion?
How do you measure jitter accurately?

5.8 Hints in Layers

Hint 1: Start with a 10 ms period.

Hint 2: Use clock_nanosleep with TIMER_ABSTIME.

Hint 3: Buffer results and flush at end.

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Basic timing loop (3 hours)

Implement monotonic timing loop.

Phase 2: Stats (4 hours)

Calculate mean, max, percentiles.

Phase 3: Load tests (3 hours)

Add CPU and I/O stress and compare.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Jitter stats computed correctly.
Results stable under repeated runs.
Invalid period -> exit 151.

6.3 Test Data

period=5000us, samples=10000

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Compare runs with and without load.
Plot jitter histogram.

7.3 Performance Traps

Excessive stats calculations in loop increase jitter.

8. Extensions & Challenges

8.1 Beginner Extensions

Add basic histogram plotting in ASCII.
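A sketch of an ASCII histogram for this extension, assuming fixed-width bins. HIST_BINS, histogram and print_histogram are hypothetical names; negative values are clamped into the first bin and anything past the last boundary is counted in the final "+" bin, so spikes are never silently dropped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HIST_BINS 10

/* Count samples into HIST_BINS bins of width bin_us microseconds. */
void histogram(const int64_t *jitter_us, size_t n, int64_t bin_us,
               size_t counts[HIST_BINS])
{
    for (size_t bin = 0; bin < HIST_BINS; bin++) counts[bin] = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t v = jitter_us[i] < 0 ? 0 : jitter_us[i];   /* clamp negatives */
        size_t bin = (size_t)(v / bin_us);
        if (bin >= HIST_BINS) bin = HIST_BINS - 1;          /* overflow bin */
        counts[bin]++;
    }
}

/* One row per bin; the longest bar is `width` characters. */
void print_histogram(FILE *out, const size_t counts[HIST_BINS],
                     int64_t bin_us, int width)
{
    size_t peak = 0;
    for (size_t bin = 0; bin < HIST_BINS; bin++)
        if (counts[bin] > peak) peak = counts[bin];

    for (size_t bin = 0; bin < HIST_BINS; bin++) {
        int bar = peak ? (int)(counts[bin] * (size_t)width / peak) : 0;
        fprintf(out, "%6lld us%s |", (long long)((int64_t)bin * bin_us),
                bin == HIST_BINS - 1 ? "+" : " ");
        for (int k = 0; k < bar; k++) fputc('#', out);
        fprintf(out, " %zu\n", counts[bin]);
    }
}
```

Rare spikes produce bars too short to see on a linear scale; the printed counts (or a logarithmic bar length) keep them visible.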

8.2 Intermediate Extensions

Add CPU affinity and compare results.

8.3 Advanced Extensions

Compare with PREEMPT_RT kernel.

9. Real-World Connections

9.1 Industry Applications

Robotics, motor control, and timing-critical systems.

Similar tool: cyclictest from the rt-tests suite.

9.3 Interview Relevance

Hard vs. soft real-time is a common interview topic.

10. Resources

10.1 Essential Reading

Linux clock_nanosleep and scheduling docs.

10.2 Video Resources

Real-time Linux tutorials.

10.3 Tools & Documentation

stress-ng for load testing.


11. Self-Assessment Checklist

11.1 Understanding

I can explain jitter and scheduling impact.
I can explain monotonic timing.

11.2 Implementation

Jitter stats are reproducible.
Worst-case jitter is documented.

11.3 Growth

I can discuss real-time limits in interviews.

12. Submission / Completion Criteria

Minimum Viable Completion:

1000-sample jitter report.

Full Completion:

Jitter report under idle and load.

Excellence (Going Above & Beyond):

Compare to PREEMPT_RT kernel and document improvements.