Project 7: Interrupt Latency Profiler
A measurement harness that profiles interrupt latency under different system loads and priorities.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | C++, Rust, Ada |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 1. The “Resume Gold” |
| Prerequisites | Timers, GPIO toggling, NVIC basics |
| Key Topics | NVIC priority, latency, profiling, jitter |
1. Learning Objectives
By completing this project, you will:
- Measure ISR latency with GPIO and/or cycle counters.
- Quantify worst-case latency under load.
- Understand how priorities affect preemption.
- Produce a latency report with min/avg/max values.
2. All Theory Needed (Per-Concept Breakdown)
NVIC, Interrupt Priority, and Latency Measurement
Fundamentals Interrupts allow the MCU to respond to events without polling. The NVIC assigns priorities and decides which interrupt runs next. Latency is the time between an interrupt event and when its handler actually executes. Measuring latency is critical for real-time systems because it defines how quickly you can respond to sensors, timers, or communication events.
Deep Dive into the concept On Cortex-M, when an interrupt event occurs, the NVIC determines whether it can preempt the current execution. If the new interrupt has higher priority (a numerically lower priority value), the core pushes the caller-saved registers onto the stack, fetches the vector, and jumps to the handler. On a Cortex-M4 running from zero-wait-state memory this entry sequence takes 12 cycles; flash wait states, periods with interrupts disabled, and higher-priority ISRs that are already running all add to it. The STM32F3 implements 4 priority bits (16 levels); you must define a priority scheme that matches your system’s timing requirements.
Latency is measured by toggling a GPIO at the first instruction of an ISR and measuring the delay from the event source. For timer interrupts, you can compare the timer output event to the GPIO toggle. For external interrupts, you can inject a known edge and measure the response. Another tool is the DWT cycle counter, which counts CPU cycles between the event and handler execution.
In many systems, average latency is less important than worst-case latency, because the worst case determines whether deadlines are missed. A good profiler therefore stresses the system by enabling other interrupts and running CPU-heavy tasks, then records the maximum latency observed. Your results guide design decisions: if latency is too high, you might raise the interrupt’s priority, reduce ISR work, or move transfers to DMA. This project teaches you that ‘real-time’ is a measured property, not a claim.
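The preemption rule described above can be sketched in plain C. This is a host-side illustration, not vendor code: `PRIO_BITS`, `prio_to_register`, `preempt_level`, and `can_preempt` are hypothetical names, and the value 4 assumes an STM32F3-class part. On target, CMSIS `NVIC_SetPriority` performs the same shift for you.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 implemented priority bits, as on STM32F3 parts. */
#define PRIO_BITS 4u

/* Value stored in the 8-bit NVIC priority field. The implemented bits sit in
 * the most significant positions, so priority 1 is written as 0x10. */
uint8_t prio_to_register(uint32_t prio)
{
    return (uint8_t)((prio << (8u - PRIO_BITS)) & 0xFFu);
}

/* Preempt (group) part of a priority. `sub_bits` is the number of low-order
 * bits reserved for subpriority by the priority grouping (0..PRIO_BITS). */
uint32_t preempt_level(uint32_t prio, uint32_t sub_bits)
{
    return prio >> sub_bits;
}

/* A pending interrupt preempts a running handler only when its preempt level
 * is numerically lower (lower number = more urgent). Equal levels wait. */
bool can_preempt(uint32_t pending_prio, uint32_t active_prio, uint32_t sub_bits)
{
    return preempt_level(pending_prio, sub_bits) <
           preempt_level(active_prio, sub_bits);
}
```

With no subpriority bits, a timer at priority 1 preempts a UART handler at priority 3, but two handlers at priority 2 never preempt each other. That second case is worth testing on hardware, because the waiting handler's latency then includes the full run time of the other one.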
How this fits in the project In Interrupt Latency Profiler, you build a latency profiler that measures interrupt response under load and produces a report of min/avg/max latency.
Definitions & key terms
- NVIC -> Nested Vectored Interrupt Controller that prioritizes and dispatches interrupts.
- Latency -> Time between an event and ISR execution.
- Preemption -> Higher-priority interrupt interrupting lower-priority code.
- DWT -> Data Watchpoint and Trace unit, includes cycle counter.
- Critical section -> Code where interrupts are temporarily disabled.
Mental model diagram (ASCII)
Event -> NVIC arbitration -> stacking -> ISR entry -> GPIO toggle
How it works (step-by-step, with invariants and failure modes)
- Configure a periodic timer interrupt as the event source.
- Toggle a GPIO at the first line of the ISR.
- Measure delay between timer event and GPIO edge using scope/logic analyzer.
- Stress the system with other interrupts and compute max latency.
- Invariant: ISR entry time remains bounded; failure mode: unbounded latency due to long critical sections.
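The last measurement step, tracking the maximum alongside min and average, can be done without storing every sample. The sketch below is one possible accumulator; the type and function names are hypothetical, and samples are assumed to be latencies in CPU cycles.

```c
#include <stdint.h>

/* Running min/avg/max over latency samples, in cycles. */
typedef struct {
    uint32_t min_cycles;
    uint32_t max_cycles;
    uint64_t sum_cycles;   /* 64-bit so long runs do not overflow the sum */
    uint32_t count;
} latency_stats_t;

void latency_stats_reset(latency_stats_t *s)
{
    s->min_cycles = UINT32_MAX;
    s->max_cycles = 0u;
    s->sum_cycles = 0u;
    s->count = 0u;
}

/* O(1) update; cheap enough to call once per captured sample. */
void latency_stats_add(latency_stats_t *s, uint32_t cycles)
{
    if (cycles < s->min_cycles) { s->min_cycles = cycles; }
    if (cycles > s->max_cycles) { s->max_cycles = cycles; }
    s->sum_cycles += cycles;
    s->count++;
}

/* Integer average; returns 0 when no samples have been recorded. */
uint32_t latency_stats_avg(const latency_stats_t *s)
{
    return (s->count == 0u) ? 0u : (uint32_t)(s->sum_cycles / s->count);
}
```

Keeping the statistics in cycles and converting to microseconds only when printing avoids floating-point work near the ISR.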
Minimal concrete example
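void TIM3_IRQHandler(void) {
    GPIOA->ODR ^= (1u << 5);   // first statement: toggle PA5 as the timing probe
    TIM3->SR = ~TIM_SR_UIF;    // clear UIF only; SR flags are write-0-to-clear, so avoid read-modify-write
}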
Common misconceptions
- “Average latency is enough” -> Ignores worst-case deadlines.
- “Interrupts are always immediate” -> Ignores priority and masking.
- “ISRs can do heavy work” -> Ignores that long ISRs increase latency for others.
Check-your-understanding questions
- What factors increase worst-case interrupt latency?
- How do priorities affect preemption?
- Why is GPIO toggling a good measurement method?
Check-your-understanding answers
- Long critical sections, higher-priority ISRs, and flash wait states increase latency.
- A higher-priority interrupt preempts a running lower-priority handler; interrupts of equal or lower priority stay pending until that handler returns.
- It produces a physical edge that can be measured precisely with external tools.
Real-world applications
- Safety-critical control loops.
- Real-time motor control and power electronics.
- High-speed data acquisition systems.
Where you’ll apply it
- In this project: see Section 6.2 Critical Test Cases and Section 7.2 Debugging Strategies.
- Also used in: P03 Timer-Driven LED Sequencer.
References
- ARM Cortex-M4 Technical Reference Manual (exception entry/exit).
- Joseph Yiu, ‘The Definitive Guide to ARM Cortex-M3/M4’ (NVIC).
- ST application notes on interrupt latency measurement.
Key insights
- Real-time behavior is defined by worst-case latency, not average latency.
Summary Interrupt profiling makes latency visible and measurable. By understanding NVIC priorities and measuring response time, you can engineer systems that meet real-time deadlines.
Homework/Exercises to practice the concept
- Measure ISR latency with and without other interrupts enabled.
- Configure two interrupts with different priorities and observe preemption.
- Use the DWT cycle counter to compute cycles between event and ISR entry.
Solutions to the homework/exercises
- Latency increases when other interrupts run; log max latency as your design bound.
- The higher-priority ISR preempts immediately; the lower-priority ISR waits until it returns.
- DWT->CYCCNT gives cycle-accurate timing; subtract to compute latency.
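The subtraction in the last solution can be written as two small helpers. This is a host-runnable sketch with hypothetical names; on target, the two readings would come from `DWT->CYCCNT` after the counter has been enabled, and `cpu_hz` is whatever core clock you have actually configured (72 MHz is assumed in the examples).

```c
#include <stdint.h>

/* Cycles elapsed between two CYCCNT readings. Unsigned subtraction gives
 * the right answer even if the 32-bit counter wrapped once in between. */
uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds for a given core clock in Hz.
 * The 64-bit intermediate keeps the multiplication from overflowing. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    return ((uint64_t)cycles * 1000000000u) / cpu_hz;
}
```

At 72 MHz, 72 cycles is exactly 1000 ns, and a 12-cycle entry works out to about 166 ns, which gives a sense of the resolution the cycle counter offers compared with a typical logic analyzer.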
Timers, Prescalers, and Interrupt Scheduling
Fundamentals Hardware timers are the MCU’s timekeeping engines. They count ticks derived from a clock source and can generate periodic events without CPU busy-waiting. By configuring a prescaler and auto-reload value, you set a precise interval. Timers can trigger interrupts, toggle pins, or drive DMA. Interrupts allow your firmware to react to timer events with deterministic latency, which is the basis of real-time scheduling. Using timers and interrupts instead of delay loops is the core technique for reliable embedded timing.
Deep Dive into the concept STM32 timers are flexible peripherals that can operate as basic timebases, output compare generators, PWM engines, or input capture units. At their core is a counter clocked by a timer clock derived from the APB bus. The prescaler divides the timer clock, and the auto-reload register sets the period at which the counter resets. Each update event can trigger an interrupt, set a status flag, or generate a DMA request. The advantage over software delays is determinism: the hardware counts regardless of CPU load.
However, interrupts introduce latency. The NVIC prioritizes interrupts, and higher-priority ISRs can delay timer handlers. If you rely on timer interrupts for scheduling, you must measure latency and account for jitter. A timer-driven LED sequencer is an ideal training ground: you build a state machine that advances on each timer tick, ensuring that your pattern timing is stable even if the CPU is busy. Advanced timers also support one-pulse mode, dead-time insertion, and complementary outputs, which are critical for motor control but beyond the basics here.
For a correct timer configuration, you compute the prescaler as (timer_clock / tick_rate) - 1, and the auto-reload as (tick_rate × desired_interval) - 1. The prescaler register is 16 bits wide, so the divider cannot exceed 65,536. You then verify by toggling a GPIO on each interrupt and measuring the period. Misconfigurations are easy to spot: a factor-of-two error usually means you forgot the APB timer clock doubling rule. Another common failure is forgetting to clear interrupt flags, which makes the handler re-enter immediately. In short, timers are your deterministic clockwork. Mastering them means you can build schedules, measure drift, and guarantee timing, which is the heart of embedded systems.
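The prescaler and auto-reload arithmetic can be checked on a host before it goes into firmware. The sketch below assumes a timer with a 16-bit counter and takes the interval in milliseconds; `timer_clock_hz`, `timer_cfg_t`, and `timer_cfg_compute` are illustrative names, not part of any ST library.

```c
#include <stdbool.h>
#include <stdint.h>

/* APB doubling rule: the timer clock equals the APB clock when the APB
 * prescaler is 1, and twice the APB clock otherwise. */
uint32_t timer_clock_hz(uint32_t pclk_hz, uint32_t apb_prescaler)
{
    return (apb_prescaler == 1u) ? pclk_hz : 2u * pclk_hz;
}

typedef struct {
    uint16_t psc;   /* value for the prescaler register  */
    uint16_t arr;   /* value for the auto-reload register */
} timer_cfg_t;

/* PSC = timer_clk / tick_hz - 1, ARR = tick_hz * interval - 1.
 * Returns false when the request cannot be met exactly or does not fit
 * in 16-bit registers (assumption: 16-bit counter). */
bool timer_cfg_compute(uint32_t timer_clk_hz, uint32_t tick_hz,
                       uint32_t interval_ms, timer_cfg_t *out)
{
    if (tick_hz == 0u || interval_ms == 0u || (timer_clk_hz % tick_hz) != 0u) {
        return false;
    }
    uint32_t div = timer_clk_hz / tick_hz;                      /* PSC + 1 */
    uint64_t ticks = ((uint64_t)tick_hz * interval_ms) / 1000u; /* ARR + 1 */
    if (div == 0u || div > 65536u || ticks == 0u || ticks > 65536u) {
        return false;
    }
    out->psc = (uint16_t)(div - 1u);
    out->arr = (uint16_t)(ticks - 1u);
    return true;
}
```

For a 72 MHz timer clock, a 10 kHz tick and a 250 ms interval yield PSC = 7199 and ARR = 2499. Asking for a 1 kHz tick is rejected, because a divider of 72,000 does not fit the prescaler.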
How this fits in the project In Interrupt Latency Profiler, a hardware timer is the repeatable event source: its update interrupt fires at a fixed period, and you measure how long the handler takes to start after each update event.
Definitions & key terms
- Prescaler -> Divider that slows the timer clock.
- Auto-reload -> Value that sets the timer period.
- Update event -> Timer overflow event that can trigger interrupt or DMA.
- Interrupt latency -> Time between event and ISR execution.
- Jitter -> Variation in interrupt timing due to system load.
Mental model diagram (ASCII)
Timer clock -> prescaler -> counter -> overflow -> ISR -> state machine step
How it works (step-by-step, with invariants and failure modes)
- Select a timer and enable its clock.
- Compute prescaler and auto-reload for desired tick rate.
- Enable update interrupt and configure NVIC priority.
- In ISR, update the state machine and clear interrupt flags.
- Invariant: timer tick period remains stable; failure mode: drift or missed interrupts if flags are not cleared.
Minimal concrete example
// Timer ISR advances the LED pattern on each update event
void TIM2_IRQHandler(void) {
    if (TIM2->SR & TIM_SR_UIF) {
        TIM2->SR = ~TIM_SR_UIF;   // clear UIF only; SR flags are write-0-to-clear
        step_led_pattern();
    }
}
Common misconceptions
- “Delays are fine for scheduling” -> Ignores CPU load and jitter.
- “Timer interrupts are always on time” -> Ignores NVIC priority and blocking ISRs.
- “Auto-reload equals milliseconds” -> Ignores prescaler and timer clock source.
Check-your-understanding questions
- What causes a factor-of-two error in timer frequency?
- How do you measure jitter in a timer-driven system?
- Why must you clear the update flag in the ISR?
Check-your-understanding answers
- Forgetting that timers run at twice the APB clock when the APB prescaler is not 1.
- Toggle a GPIO in the ISR and measure period variation with a logic analyzer.
- If the flag is not cleared, the interrupt stays pending and the ISR re-enters as soon as it returns, starving lower-priority code.
Real-world applications
- LED sequencing and deterministic UI timing.
- Sensor sampling schedules with fixed intervals.
- Control loops in motors or power electronics.
Where you’ll apply it
- In this project: see Section 4.4 Algorithm Overview and Section 5.10 Implementation Phases.
- Also used in: P04 PWM Motor or LED Brightness Controller, P05 ADC Sensor Sampler + Logging.
References
- STM32F3 Reference Manual (timer chapter).
- Joseph Yiu, ‘The Definitive Guide to ARM Cortex-M3/M4’ (interrupts).
- Elecia White, ‘Making Embedded Systems’ (timing and scheduling).
Key insights
- Timers create deterministic time; interrupts make that time actionable without blocking the CPU.
Summary Timers and interrupts are the foundation of real-time embedded behavior. They replace blocking delays with precise schedules and make timing measurable.
Homework/Exercises to practice the concept
- Calculate prescaler and auto-reload for a 250 ms timer tick at 72 MHz.
- Toggle a GPIO on each timer interrupt and measure drift over 5 minutes.
- Experiment with different NVIC priorities and observe latency.
Solutions to the homework/exercises
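- At 72 MHz, prescaler 7199 gives a 10 kHz counter tick, and auto-reload 2499 gives a 250 ms period (2,500 ticks). A prescaler of 71999 for a 1 kHz tick would not fit the 16-bit prescaler register.
- Drift near zero indicates correct timer configuration; large drift indicates a clock misconfiguration.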
- Raising priority reduces latency but can starve lower-priority handlers.
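Once edge timestamps have been exported from a logic analyzer, the jitter for these exercises can be computed offline. The helper below is a sketch with a hypothetical name; it assumes the timestamps are in one consistent unit (for example nanoseconds) and are in increasing order.

```c
#include <stddef.h>
#include <stdint.h>

/* Peak-to-peak jitter: longest period minus shortest period between
 * consecutive edges. Returns 0 when fewer than two periods are available. */
uint32_t period_jitter(const uint32_t *edges, size_t n)
{
    if (n < 3u) {
        return 0u;
    }
    uint32_t min_p = UINT32_MAX;
    uint32_t max_p = 0u;
    for (size_t i = 1u; i < n; i++) {
        uint32_t p = edges[i] - edges[i - 1u];
        if (p < min_p) { min_p = p; }
        if (p > max_p) { max_p = p; }
    }
    return max_p - min_p;
}
```

For edges at 0, 1000, 2003, 2999, and 4000 the periods are 1000, 1003, 996, and 1001, so the peak-to-peak jitter is 7 units even though the average period is exactly 1000.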
Timebase, SysTick, and Measurement Discipline
Fundamentals A reliable timebase is how you reason about time in firmware. SysTick is a dedicated timer tied to the core clock that can generate periodic interrupts. If you configure it for 1 kHz, you get a millisecond tick that becomes your system heartbeat. This tick drives delays, scheduling, timeouts, and timestamps. But SysTick is only reliable if you know the core clock frequency and if you measure the actual timing. Without measurement, a ‘1 ms tick’ is just a guess. A bring-up project must therefore treat SysTick as a calibration point and validate it with real-world observation.
Deep Dive into the concept SysTick is a 24-bit down-counter integrated into the Cortex-M core. It takes the core clock (or core clock divided by 8) and counts down from a reload value to zero, then sets a flag and optionally fires an interrupt. Because it is in the core, it is unaffected by peripheral bus prescalers, which makes it a good reference for measuring the system clock. To configure SysTick, you load the reload register with (core_clock / desired_tick) - 1, select the clock source, enable the counter, and enable its interrupt. The interrupt handler typically increments a global tick counter. On STM32, HAL_Delay and other drivers are often built on this tick.
The main danger is implicit coupling: if SysTick is configured incorrectly, every delay and timeout in your system is wrong. Another hazard is jitter. SysTick interrupts can be delayed by higher-priority interrupts, which means the tick is not a precise real-time clock, but a best-effort scheduler. For measurement, you should use SysTick only as a reference and then validate it with a GPIO toggle or timer output. A robust timebase audit logs the computed reload value, the configured clock source, and the observed period. The audit should also include a drift test: toggle a pin every N ticks for several minutes and measure drift relative to a stopwatch or logic analyzer. If drift is observed, the root cause is often clock source accuracy (HSI vs HSE) or an incorrect reload value due to a wrong core clock assumption.
You also need to consider wraparound. A 24-bit counter at 72 MHz wraps after about 233 ms if used without an interrupt, and a 32-bit tick counter wraps after ~49 days at 1 kHz. In embedded systems, you typically handle wraparound by using unsigned arithmetic and comparing time differences rather than absolute values. Finally, you must decide what ‘good enough’ measurement means. For LED blinking, 1-2% accuracy is acceptable. For UART baud or ADC sampling, you need tighter tolerances. The discipline is to tie every time-dependent feature to a measured reference rather than an assumption.
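The reload formula and the 24-bit limit can be captured in one checked helper. This is a minimal sketch; `systick_reload` and `SYSTICK_MAX_RELOAD` are hypothetical names, and the core clock is assumed to be the SysTick clock source.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* SysTick is a 24-bit counter */

/* reload = core_hz / tick_hz - 1. Returns false when the tick rate is zero,
 * faster than the core clock, or too slow to fit in 24 bits. */
bool systick_reload(uint32_t core_hz, uint32_t tick_hz, uint32_t *reload)
{
    if (tick_hz == 0u) {
        return false;
    }
    uint32_t counts = core_hz / tick_hz;
    if (counts == 0u || (counts - 1u) > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = counts - 1u;
    return true;
}
```

At 72 MHz a 1 kHz tick gives a reload of 71,999, while a 1 Hz tick is rejected because 72,000,000 counts exceed the 24-bit range. Logging the value this returns next to the measured period is one way to produce the audit described above.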
How this fits in the project In Interrupt Latency Profiler, SysTick is your baseline for clock validation. You compute the reload value, log it, and then validate the resulting tick with physical measurement.
Definitions & key terms
- SysTick -> Core-integrated timer used for periodic interrupts and timekeeping.
- Reload value -> The count value loaded into SysTick before it starts counting down.
- Tick -> A periodic time event, commonly 1 ms in embedded systems.
- Jitter -> Variation in the timing of periodic events due to interrupt latency.
- Drift -> Long-term timing error relative to a reference clock.
Mental model diagram (ASCII)
Core clock -> SysTick down-counter -> interrupt -> tick++
|
v
timing reference
How it works (step-by-step, with invariants and failure modes)
- Compute reload value based on expected core clock and desired tick rate.
- Configure SysTick to use core clock and enable interrupt.
- Increment a global tick counter in the SysTick handler.
- Toggle a GPIO every N ticks and measure the period.
- Invariant: tick increments at configured frequency; failure mode: drift or jitter due to wrong clock or interrupt priority.
Minimal concrete example
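#include <stdint.h>
static volatile uint32_t g_tick_ms;   // written in the ISR and read in delay_ms(), so it must be volatile
void SysTick_Handler(void) {
    g_tick_ms++;
}
void delay_ms(uint32_t ms) {
    uint32_t start = g_tick_ms;
    while ((uint32_t)(g_tick_ms - start) < ms) {   // unsigned difference stays correct across wraparound
        __WFI();   // sleep until the next interrupt
    }
}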
Common misconceptions
- “SysTick gives precise real-time scheduling” -> Ignores interrupt latency and jitter.
- “If LED blinks, tick is correct” -> Ignores the need to measure drift over time.
- “Tick counters never overflow” -> Ignores wraparound in long-running systems.
Check-your-understanding questions
- Why might SysTick interrupts be delayed even if configured correctly?
- How do you compute reload for 1 kHz at 72 MHz?
- What is the safest way to compare timeouts with wraparound?
Check-your-understanding answers
- Higher-priority interrupts can preempt SysTick, causing jitter.
- Reload = (72,000,000 / 1,000) - 1 = 71,999.
- Use unsigned subtraction: if (now - start) >= timeout.
Real-world applications
- Scheduling periodic sensor sampling.
- Timeouts in communication protocols.
- Measuring CPU load and real-time performance.
Where you’ll apply it
- In this project: see Section 3.4 Example Usage and Section 6.2 Critical Test Cases.
- Also used in: P03 Timer-Driven LED Sequencer.
References
- ARM Cortex-M4 Technical Reference Manual (SysTick).
- Joseph Yiu, ‘The Definitive Guide to ARM Cortex-M3/M4’ (timers and exceptions).
- Elecia White, ‘Making Embedded Systems’ (timing discipline).
Key insights
- A tick is only trustworthy when you can measure it and bound its jitter and drift.
Summary SysTick provides a convenient timebase, but it is only as accurate as your clock configuration and interrupt discipline. Treat it as a calibrated instrument, not a magic delay source.
Homework/Exercises to practice the concept
- Compute the reload value for 2 kHz and verify it in code.
- Measure the drift of a 1 Hz LED blink over 10 minutes.
- Experiment with interrupt priorities and observe SysTick jitter.
Solutions to the homework/exercises
- Reload for 2 kHz at 72 MHz is 35,999.
- A 1 Hz blink should produce 600 blinks in 10 minutes (600 seconds); log the error and compute percent drift.
- Setting a higher-priority timer interrupt increases SysTick jitter; adjust priorities if needed.
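The drift computation from the second solution can be expressed in parts per million, which is easier to compare against oscillator datasheets than a percentage. The function name and the millisecond units below are assumptions made for this sketch.

```c
#include <stdint.h>

/* Drift in parts per million. `expected_ms` is the time the firmware believes
 * has elapsed; `measured_ms` is what the reference (stopwatch or analyzer)
 * reports. A positive result means the firmware clock runs slow. */
int32_t drift_ppm(uint32_t expected_ms, uint32_t measured_ms)
{
    if (expected_ms == 0u) {
        return 0;
    }
    int64_t diff = (int64_t)measured_ms - (int64_t)expected_ms;
    return (int32_t)((diff * 1000000) / (int64_t)expected_ms);
}
```

If 600 blinks that should span 600,000 ms actually take 600,060 ms, the drift is 100 ppm, or 0.01%.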
3. Project Specification
3.1 What You Will Build
A latency profiler that triggers interrupts from a timer, toggles a GPIO at ISR entry, and reports latency statistics.
3.2 Functional Requirements
- Latency Measurement: Measure latency from event to ISR entry using GPIO or DWT.
- Priority Tests: Test multiple priority configurations and compare results.
- Load Tests: Introduce background load and measure worst-case latency.
- Report: Output min/avg/max latency statistics.
3.3 Non-Functional Requirements
- Performance: Measurement resolution <1 us with DWT or external logic analyzer.
- Reliability: Repeatable results across runs.
- Usability: Simple configuration for priorities and load levels.
3.4 Example Usage / Output
Latency (us): min=1.2 avg=1.6 max=3.8
Priority: timer=1, uart=3
Load: 40% CPU
Status: PASS
3.5 Data Formats / Schemas / Protocols
Latency report: LAT_MIN_US=<f> LAT_AVG_US=<f> LAT_MAX_US=<f>
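A report line in this format can be produced with a single `snprintf` call. The sketch below assumes one decimal place, matching the example output in Section 3.4; `format_latency_report` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/* Write one report line into `buf`. Returns the snprintf result: the number
 * of characters the full line needs, excluding the terminating NUL. */
int format_latency_report(char *buf, size_t len,
                          double min_us, double avg_us, double max_us)
{
    return snprintf(buf, len,
                    "LAT_MIN_US=%.1f LAT_AVG_US=%.1f LAT_MAX_US=%.1f",
                    min_us, avg_us, max_us);
}
```

Call it from the main loop after sampling has finished, not from the ISR. On small C libraries, floating-point `printf` support may need to be enabled at link time; if that is a problem, printing integer nanoseconds is a simple alternative.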
3.6 Edge Cases
- Interrupts disabled during critical sections.
- High-priority ISR starves lower-priority interrupts.
- Measurement pin toggled too late in ISR.
- DWT counter not enabled or overflowing.
3.7 Real World Outcome
You will produce a latency report that quantifies worst-case response time.
3.7.1 How to Run (Copy/Paste)
$ make flash
$ screen /dev/tty.usbmodem* 115200
3.7.2 Golden Path Demo (Deterministic)
- Run with fixed load and priority settings; record results.
3.7.3 CLI Transcript (Success)
LAT_MIN_US=1.2 LAT_AVG_US=1.6 LAT_MAX_US=3.8
RESULT=PASS
# Exit code: 0
3.7.4 Failure Demo (Interrupts Masked)
LAT_MAX_US=120.0
RESULT=FAIL
# Exit code: 2
4. Solution Architecture
Timer events generate interrupts; ISR toggles a GPIO and DWT captures cycle counts for latency measurement.
4.1 High-Level Design
Timer Event -> NVIC -> ISR Entry -> GPIO Toggle -> Latency Report
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Event Source | Timer or external trigger | Timer for repeatability |
| ISR Probe | GPIO toggle and DWT capture | Probe at first instruction |
| Latency Analyzer | Computes statistics | Store samples in buffer |
4.3 Data Structures (No Full Code)
typedef struct {
uint32_t samples[256];
uint16_t count;
} latency_buf_t;
4.4 Algorithm Overview
Latency Capture
- Record cycle counter on event.
- Record cycle counter at ISR entry.
- Compute delta and store.
Complexity: O(1) per sample.
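One way to implement the three capture steps is a push function on the sample buffer. The struct repeats the layout from Section 4.3 so the sketch is self-contained; `latency_buf_push` is a hypothetical name, and both arguments are assumed to be readings of the same free-running cycle counter.

```c
#include <stdbool.h>
#include <stdint.h>

#define LATENCY_BUF_LEN 256u

/* Same layout as the buffer in Section 4.3. */
typedef struct {
    uint32_t samples[LATENCY_BUF_LEN];
    uint16_t count;
} latency_buf_t;

/* Store one sample: counter value at ISR entry minus counter value at the
 * event. Returns false when the buffer is full, so the caller can stop
 * sampling and hand the data to the analyzer. */
bool latency_buf_push(latency_buf_t *buf, uint32_t event_cycles,
                      uint32_t entry_cycles)
{
    if (buf->count >= LATENCY_BUF_LEN) {
        return false;
    }
    buf->samples[buf->count] = entry_cycles - event_cycles;
    buf->count++;
    return true;
}
```

The function does no formatting or I/O, which keeps the per-sample cost constant and leaves logging to the main loop.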
5. Implementation Guide
5.1 Development Environment Setup
make init
make flash
screen /dev/tty.usbmodem* 115200
5.2 Project Structure
project-root/
|-- src/
| |-- main.c
| |-- drivers/
| `-- app/
|-- include/
|-- Makefile
`-- README.md
5.3 The Core Question You’re Answering
“What is my worst-case interrupt latency under real load?”
5.4 Concepts You Must Understand First
- NVIC priorities and preemption.
- GPIO timing measurement.
- DWT cycle counter usage.
5.5 Questions to Guide Your Design
- How will you trigger the event deterministically?
- Where exactly will you toggle the GPIO?
- What constitutes unacceptable latency?
5.6 Thinking Exercise
Latency Budget
Deadline = 50 us
Worst-case latency <= 10 us
ISR execution <= 40 us
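The budget above is simple arithmetic, but writing it as a function makes it easy to turn into a pass/fail check against measured values. This is an illustrative sketch; `budget_slack_us` is a hypothetical name and all inputs are in whole microseconds.

```c
#include <stdint.h>

/* Slack remaining after worst-case latency and ISR execution time.
 * Zero means the deadline is met with no margin; negative means a miss. */
int32_t budget_slack_us(uint32_t deadline_us, uint32_t worst_latency_us,
                        uint32_t isr_exec_us)
{
    return (int32_t)deadline_us - (int32_t)worst_latency_us -
           (int32_t)isr_exec_us;
}
```

With the numbers above (50, 10, 40) the slack is exactly zero, so any latency spike beyond 10 us is a deadline miss. A measured worst case of 120 us, as in the masked-interrupt failure demo, leaves the budget 110 us short.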
5.7 The Interview Questions They’ll Ask
- What is interrupt latency and why does worst-case matter?
- How do priorities affect preemption?
- How do you measure ISR latency on hardware?
5.8 Hints in Layers
- Hint 1: Toggle GPIO at first line of ISR.
- Hint 2: Use DWT cycle counter for higher resolution.
- Hint 3: Introduce synthetic CPU load for stress testing.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Interrupts | Definitive Guide to ARM Cortex-M | Ch. 7 |
| Real-time design | Making Embedded Systems | Ch. 8 |
5.10 Implementation Phases
Phase 1: Baseline Measurement (2 days)
Measure latency with minimal load.
Phase 2: Priority Experiments (4 days)
Change NVIC priorities and record effects.
Phase 3: Load Testing (4 days)
Introduce load and compute worst-case latency.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Measurement | GPIO vs DWT | Both | GPIO for physical timing, DWT for precision |
| Event source | Timer vs external | Timer | Deterministic events |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Stats calculation | Min/avg/max computation |
| Integration Tests | Latency capture | GPIO timing validation |
| Edge Case Tests | Interrupt masking | Critical section delays |
6.2 Critical Test Cases
- Baseline Latency: Latency stays within expected range with minimal load.
- Priority Impact: Lower priority interrupts show increased latency.
- Masked Interrupts: Latency spikes when interrupts are disabled.
6.3 Test Data
Min=1.1 us, Avg=1.5 us, Max=3.7 us
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| GPIO toggled late | Latency appears larger than reality | Toggle at ISR entry |
| DWT disabled | No cycle counts | Enable CYCCNT |
| Logging in ISR | Latency skewed | Store samples and log later |
7.2 Debugging Strategies
- Use a logic analyzer to validate GPIO timing.
- Compare DWT cycles to measured GPIO edges.
- Reduce ISR work and see latency improve.
7.3 Performance Traps
Heavy ISRs raise the latency of every interrupt at the same or lower priority; keep handlers short and move the work out of the ISR.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a histogram of latency values.
8.2 Intermediate Extensions
- Measure latency for external interrupt input.
8.3 Advanced Extensions
- Automate sweep of priority configurations and plot results.
9. Real-World Connections
9.1 Industry Applications
- Motor control: Interrupt latency determines control loop stability.
- Safety systems: Guaranteed response times for fault detection.
9.2 Related Open Source Projects
- FreeRTOS: RTOS timing and interrupt latency discussions.
- CMSIS: NVIC and DWT access reference.
9.3 Interview Relevance
- Interrupt latency measurement scenarios.
- Priority and preemption design questions.
10. Resources
10.1 Essential Reading
- Definitive Guide to ARM Cortex-M by Joseph Yiu - NVIC and exception details.
- Making Embedded Systems by Elecia White - Real-time timing discipline.
10.2 Video Resources
- Cortex-M interrupt latency measurement tutorial.
- NVIC priority configuration demo.
10.3 Tools & Documentation
- Logic analyzer: Measure GPIO timing.
- STM32CubeIDE: Build and debug.
10.4 Related Projects in This Series
- P03 Timer-Driven LED Sequencer - uses timer interrupts.
- P14 Fault Injection and Watchdog Recovery - reliability under faults.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain NVIC priority and preemption.
- I understand worst-case vs average latency.
- I can use DWT for cycle measurements.
11.2 Implementation
- Latency report produced with min/avg/max.
- Load tests show expected changes.
- Measurements repeat across runs.
11.3 Growth
- I can design a latency budget for a new system.
- I can improve latency by reducing ISR work.
- I can explain these measurements in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Latency measurement works.
- Report produced.
- At least two priority configs tested.
Full Completion:
- Load tests performed.
- Worst-case latency documented.
Excellence (Going Above & Beyond):
- Automated latency sweep and plots created.