Learn RTOS from Scratch in C: From Bare Metal to a Preemptive Kernel

Goal: Build a real, working RTOS kernel on a Cortex-M microcontroller and understand every layer that makes it work: reset and boot, memory layout, interrupts, system tick, context switching, task states, scheduling, synchronization, and time services. By the end, you will be able to design and implement a deterministic scheduler, debug interrupt-level code, and reason about latency, jitter, and deadlines like a real embedded systems engineer. You will also understand how commercial RTOS kernels (FreeRTOS, Zephyr, ThreadX, etc.) are structured internally because you will have built their core from scratch.

Introduction

A Real-Time Operating System (RTOS) is a minimal kernel built to make timing behavior bounded and predictable. Unlike a general-purpose OS, an RTOS is designed for deterministic response: tasks must run within known deadlines, often on tiny microcontrollers with tight memory and CPU budgets. This guide walks you from bare-metal C to a fully preemptive RTOS kernel with tasks, interrupts, context switching, and synchronization primitives.

What you will build:

A bare-metal firmware with custom startup code and linker script
A millisecond system tick driven by SysTick
A cooperative scheduler with explicit yields
A preemptive, priority-based scheduler driven by interrupts
Synchronization primitives (mutexes, semaphores, queues, event flags)
Time services (sleep, timeout, periodic timers)
Memory pools and stack safety checks
Instrumentation for latency measurement and debugging

Scope boundaries:

Target architecture: ARM Cortex-M (examples use STM32F4 class MCUs)
Kernel only (no filesystem, no networking stack)
C and small amounts of ARM assembly for context switching
No dynamic memory allocation required (static-only is acceptable)

Big picture diagram

                         APPLICATION TASKS
┌──────────────────────────────────────────────────────────┐
│   Task A     Task B     Task C     Idle/Background       │
│  (sensor)   (control)  (logging)   (low power)           │
└───────────────┬───────────────┬───────────────┬──────────┘
                │ syscalls: sleep/yield/lock/post
                v
┌──────────────────────────────────────────────────────────┐
│                     RTOS KERNEL                          │
│  Scheduler  Task Control Blocks  IPC (mutex/queue)       │
│  Time Mgmt  Priority + States     Context Switch         │
└───────────────┬───────────────────────────┬──────────────┘
                │ SysTick + PendSV + NVIC
                v
┌──────────────────────────────────────────────────────────┐
│                HARDWARE ABSTRACTION LAYER                │
│  SysTick  GPIO  UART  Timers  Interrupt Controller       │
└──────────────────────────────────────────────────────────┘
                │ Memory-mapped registers
                v
┌──────────────────────────────────────────────────────────┐
│                     PHYSICAL HARDWARE                    │
│   Cortex-M CPU   Flash/RAM   Timers   GPIO   UART         │
└──────────────────────────────────────────────────────────┘

How to Use This Guide

Read the Theory Primer as a mini-book. Each concept maps directly to projects.
Build the projects in order. Each project depends on the previous ones.
Use the Project-to-Concept Map to revisit chapters if you get stuck.
Keep your board connected and use OpenOCD + GDB for real debugging.
Keep a lab notebook: record register values, ISR timing, and bugs you fix.

Prerequisites & Background

Essential Prerequisites (Must Have)

Solid C programming (pointers, structs, volatile, memory layout)
Comfort reading datasheets and reference manuals
Basic digital logic and CPU architecture concepts

Helpful But Not Required

ARM assembly and calling conventions
Experience with embedded toolchains (GCC, Make, GDB)
Understanding of OS basics (processes, scheduling)

Self-Assessment Questions

Can you explain what volatile means in embedded C?
Do you know how a linker script controls memory placement?
Can you read a peripheral register map and configure a GPIO pin?
Do you understand why interrupts are asynchronous to normal execution?

Development Environment Setup

Board: STM32F4 (Black Pill or Nucleo)
Debugger: ST-Link V2
Toolchain: arm-none-eabi-gcc, make, openocd, arm-none-eabi-gdb
Optional: Logic analyzer or oscilloscope for timing verification

Time Investment

Projects 1-2: 1 weekend
Projects 3-4: 2-3 weeks total (context switch is the hard part)
Projects 5-10: 3-5 weeks depending on debugging depth

Important Reality Check

This is systems programming at the metal. You will hit hard faults, lock up the board, and debug memory corruption. That is the point. Expect frustration and be systematic: change one thing at a time, instrument, and verify.

Big Picture / Mental Model

Think of your RTOS as a loop of time + state + decision:

Tick Interrupt --> Update timers --> Choose next task --> Context switch
      ^                                                   |
      |                                                   v
  Hardware clock                                   Task runs until:
      |                                      - yields
      |                                      - blocks
      +-------------------------------------- - is preempted

The kernel is essentially the code that runs between those arrows.

Theory Primer

1) Real-Time Fundamentals and Timing Guarantees

Fundamentals

Real-time systems are not about running fast; they are about running predictably. A task is real-time when it has a deadline or maximum response time, and missing that deadline is a failure (hard real-time) or a degradation (soft real-time). Key quantities include latency (time between event and response), jitter (variation in timing), and worst-case execution time (WCET). Determinism is more important than raw throughput. In embedded systems, deadlines often come from the physical world: a motor control loop must update at a fixed period, a sensor must be sampled before data becomes stale, or a safety signal must be handled within microseconds.

Deep Dive into the Concept

A real-time system is a scheduling problem constrained by physics. If you have a periodic task that must run every 1 ms for 50 us, the processor must be available every millisecond, or the output becomes invalid. This means you must reason about utilization (total CPU time consumed by tasks), priority (which task runs first when multiple are ready), and blocking (time spent waiting for resources). In a bare-metal loop, your timing is implicit: code runs in a fixed order, and interrupts can preempt at any point. In an RTOS, timing becomes explicit: tasks are created, priorities are assigned, and the kernel enforces a deterministic order.

Real-time theory gives you two key mental tools. First, schedulability analysis: can the CPU meet all deadlines? Classic results for Rate Monotonic Scheduling (RMS) and Earliest Deadline First (EDF) answer that for independent periodic tasks whose deadlines equal their periods: RMS is guaranteed to meet all deadlines when total utilization is at most n(2^(1/n) - 1) for n tasks (about 69% as n grows), and EDF when utilization is at most 100%. Second, response-time analysis: given a task and its priority, what is the worst-case time from event to completion? Even if you do not run formal proofs, you must think in these terms to design a reliable kernel. Your RTOS is the mechanism that enforces the priorities these analyses assume.
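Response-time analysis can be sketched as a short fixed-point iteration. The version below is a minimal host-side illustration, assuming independent periodic tasks sorted from highest to lowest priority, deadlines equal to periods, nonzero periods, and no blocking or context-switch overhead. The names `rt_task` and `response_time` are hypothetical, not from any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task model: times in any consistent unit (e.g. microseconds). */
struct rt_task {
    uint32_t wcet;    /* worst-case execution time C */
    uint32_t period;  /* period T, also used as the deadline; must be > 0 */
};

/*
 * Worst-case response time of tasks[i], assuming tasks[0..i-1] have higher
 * priority. Iterates R = C_i + sum(ceil(R / T_j) * C_j) until R stops
 * changing. Returns false if R exceeds the deadline (not schedulable).
 */
bool response_time(const struct rt_task *tasks, size_t i, uint32_t *out)
{
    assert(tasks != NULL && out != NULL);
    uint32_t r = tasks[i].wcet;
    for (;;) {
        uint32_t next = tasks[i].wcet;
        for (size_t j = 0; j < i; j++) {
            /* Number of times task j can be released within a window r. */
            uint32_t releases = (r + tasks[j].period - 1u) / tasks[j].period;
            next += releases * tasks[j].wcet;
        }
        if (next > tasks[i].period) {
            return false;            /* deadline miss in the worst case */
        }
        if (next == r) {
            *out = r;                /* fixed point reached */
            return true;
        }
        r = next;
    }
}
```

For example, with (C, T) pairs of (1, 4), (2, 6), and (3, 12), the lowest-priority task converges to a worst-case response time of 10, which is inside its deadline of 12.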

In practice, embedded engineers often use a hybrid approach: priority-based preemption for critical tasks, while lower priority tasks run when the CPU is idle. The system tick drives periodic scheduling decisions, while interrupts handle urgent asynchronous events. Understanding the differences between hard, firm, and soft real-time allows you to choose appropriate design tradeoffs. For example, missing a motor control update could damage hardware (hard), a late video frame is useless but can simply be dropped (firm), and a missed UI update just causes stutter (soft).

How This Fits on Projects

All projects depend on timing. Projects 2, 4, 5, 8, and 10 explicitly measure or enforce timing; projects 6 and 7 show how timing can be destroyed by blocking or priority inversion.

Definitions & Key Terms

Deadline: latest acceptable completion time for a task
Latency: time from event to response
Jitter: variation in response timing
WCET: worst-case execution time
Hard Real-Time: missed deadline is a failure
Firm Real-Time: a late result is worthless, but occasional misses are tolerable
Soft Real-Time: missed deadline is a degradation

Mental Model Diagram

Event occurs ----> [Latency] ----> Task executes ----> Deadline
                     ^                 ^
                     |                 |
                   jitter          execution time

How It Works (Step-by-Step)

An external event (sensor interrupt) or time event (tick) occurs.
The system captures the event and marks a task ready.
Scheduler decides if the ready task should preempt the current one.
Task runs and completes before its deadline.
The system records timing to verify jitter and latency bounds.
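The last step, recording timing, can be as small as a running min/max accumulator. The sketch below assumes both timestamps come from the same free-running 32-bit counter (for example a cycle counter or hardware timer); the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Running latency statistics, in the units of the timestamp source. */
struct latency_stats {
    uint32_t min;
    uint32_t max;
    uint64_t sum;
    uint32_t count;
};

void latency_init(struct latency_stats *s)
{
    assert(s != 0);
    s->min = UINT32_MAX;
    s->max = 0;
    s->sum = 0;
    s->count = 0;
}

/* Record one sample. Unsigned subtraction stays correct across a single
 * wraparound of the counter between event and response. */
void latency_record(struct latency_stats *s, uint32_t t_event, uint32_t t_response)
{
    uint32_t latency = t_response - t_event;
    if (latency < s->min) s->min = latency;
    if (latency > s->max) s->max = latency;
    s->sum += latency;
    s->count++;
}

/* Peak-to-peak jitter: spread between best and worst observed latency. */
uint32_t latency_jitter(const struct latency_stats *s)
{
    return (s->count == 0u) ? 0u : (s->max - s->min);
}
```

Keeping only min, max, sum, and count costs a few words of RAM per measured path, so it can stay enabled in long soak tests where worst-case outliers show up.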

Minimal Concrete Example

// Hard real-time: 1 ms motor control loop
#define PERIOD_TICKS 1
void motor_task(void) {
    while (1) {
        control_step();      // must finish in <1 ms
        sleep_ticks(PERIOD_TICKS);
    }
}

Common Misconceptions

“Real-time means fast” (it means predictable)
“An RTOS guarantees deadlines” (only if you design correctly)
“If average utilization < 100%, it’s fine” (worst-case matters)

Check-Your-Understanding Questions

What is the difference between latency and jitter?
Why is WCET more important than average execution time?
When can soft real-time be acceptable?

Check-Your-Understanding Answers

Latency is the delay from event to response; jitter is the variation in that delay.
A single worst-case overrun can violate deadlines even if average time is low.
When occasional deadline misses only reduce quality, not safety or correctness.

Real-World Applications

Motor control loops in robotics
Engine control units in automotive
Medical devices that must respond to alarms within strict bounds

Where You’ll Apply It

Projects 2, 4, 5, 8, and 10.

References

RTOS task priorities and immediate switch behavior: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__ThreadMgmt.html
IoT device scale and growth context (for why real-time matters): https://iot-analytics.com/number-connected-iot-devices/

Key Insight

Determinism, not speed, defines real-time success.

Summary

Real-time engineering is about meeting deadlines predictably. You must understand latency, jitter, and WCET and design your kernel to enforce timing rules.

Homework/Exercises to Practice the Concept

Measure the jitter of a 1 kHz loop using GPIO toggles and a logic analyzer.
Compute CPU utilization for three periodic tasks and determine if deadlines are feasible.
Identify a system in your life that is hard real-time and justify why.

Solutions to the Homework/Exercises

Toggle a pin at loop entry and exit, measure timing variance with a scope.
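Sum each task's execution time divided by its period (U = C1/T1 + C2/T2 + C3/T3). Under rate-monotonic priorities, deadlines are guaranteed if U is at most n(2^(1/n) - 1), about 0.78 for three tasks; above that bound you need an exact response-time check.
Example: an airbag controller must fire within milliseconds of detecting a crash; a late response is a safety failure.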
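The utilization exercise can be checked on a host PC with a few lines of C. This is a rough sketch assuming independent periodic tasks with deadlines equal to periods; the function names are made up for illustration, and the n-th root is found by bisection only to avoid depending on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Total utilization U = sum(C_i / T_i) for n periodic tasks. */
double total_utilization(const double *wcet, const double *period, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(period[i] > 0.0);
        u += wcet[i] / period[i];
    }
    return u;
}

/*
 * Liu-Layland bound for rate-monotonic priorities: n * (2^(1/n) - 1).
 * Bisection solves x^n = 2 on [1, 2] without needing libm.
 */
double rms_bound(size_t n)
{
    assert(n > 0);
    double lo = 1.0, hi = 2.0;
    for (int iter = 0; iter < 60; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (size_t k = 0; k < n; k++) {
            p *= mid;
        }
        if (p < 2.0) lo = mid; else hi = mid;
    }
    return (double)n * (lo - 1.0);
}
```

The bound is sufficient, not necessary: a task set with U above `rms_bound(n)` but at or below 1.0 may still meet every deadline, so treat a failed check as "needs closer analysis" rather than "infeasible".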

2) Bare-Metal Boot, Memory Map, and Linker Control

Fundamentals

On a microcontroller, there is no OS to prepare your program. After reset, the CPU loads the initial stack pointer from the first word of the vector table and then jumps to the reset handler whose address is stored in the second word. You must provide a startup file and linker script to describe where code and data live in flash and RAM. This is where your RTOS begins: if the vector table is wrong, interrupts will never work; if the stack is misplaced, context switches will crash.

Deep Dive into the Concept

The Cortex-M boot sequence is deterministic. At reset, the CPU reads the first two 32-bit words of the vector table: the initial stack pointer and the reset handler address. This means your linker script must place the vector table where the core fetches it at reset: address 0x00000000, which STM32 parts alias to flash at 0x08000000 in the default boot mode. Your startup code sets up .data (copying initialized data from flash to RAM) and clears .bss. After that, it calls main.

In an RTOS, you will add additional sections: task stacks, TCB arrays, and possibly memory pools. You must decide which objects live in RAM and which in flash. You must also manage stack alignment and avoid overlap with the heap or other buffers. Linker control is your guarantee that each task has a safe, dedicated stack region.

This chapter is where you learn why embedded firmware is not just C code. The binary is a memory layout. If you understand linker scripts, you can create multiple memory regions, reserve space for a bootloader, and place the vector table in a custom region. Every bug here is catastrophic: wrong addresses cause hard faults that seem mysterious until you realize your stack pointer points into flash.

How This Fits on Projects

Project 1 builds the linker script and startup. Projects 3-4 depend on stack layout and vector table correctness.

Definitions & Key Terms

Vector table: array of pointers to exception handlers
Reset handler: first code executed after reset
Linker script: map of memory regions and section placement
.data / .bss: initialized and zero-initialized data segments

Mental Model Diagram

Flash (ROM)                         RAM
┌───────────────┐             ┌───────────────┐
│ Vector table  │----+        │ .data         │
│ .text (code)  │    | copy   │ .bss (zero)   │
│ .rodata       │    +------> │ stacks        │
└───────────────┘             └───────────────┘
       ^ reset reads SP, PC

How It Works (Step-by-Step)

CPU reads initial SP and reset handler from vector table.
Reset handler copies .data from flash to RAM and zeroes .bss (the hardware has already loaded the stack pointer).
C runtime (minimal) is prepared.
main() runs and your RTOS initialization begins.
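The RAM initialization step above can be written as a plain C routine and exercised on a host before it ever runs on the board. This is a sketch, not a complete startup file: the function name is hypothetical, and the linker symbol names in the trailing comment (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`) are a common convention that your own linker script would have to define. The `assert` is a host-side sanity check you would drop or compile out on target.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copy initialized data from its load address (flash) to its run address
 * (RAM), then zero the .bss range. Ranges are word-aligned and half-open:
 * [start, end).
 */
void init_ram_sections(const uint32_t *data_load,
                       uint32_t *data_start, uint32_t *data_end,
                       uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_start <= data_end && bss_start <= bss_end);
    for (uint32_t *dst = data_start; dst < data_end; dst++) {
        *dst = *data_load++;
    }
    for (uint32_t *dst = bss_start; dst < bss_end; dst++) {
        *dst = 0u;
    }
}

/*
 * On the target, a reset handler could call it with linker-defined symbols
 * (names assumed, see above):
 *
 *   extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
 *   void Reset_Handler(void) {
 *       init_ram_sections(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
 *       main();
 *       for (;;) { }
 *   }
 */
```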

Minimal Concrete Example

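/* linker.ld */
/* Symbol names (Reset_Handler, _estack, _sidata, ...) follow a common
   convention; the startup code must use the same names. */
ENTRY(Reset_Handler)

MEMORY
{
  FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
  RAM (rwx)  : ORIGIN = 0x20000000, LENGTH = 128K
}

_estack = ORIGIN(RAM) + LENGTH(RAM);  /* initial MSP: top of RAM */

SECTIONS
{
  .isr_vector : { KEEP(*(.isr_vector)) } > FLASH   /* must come first */
  .text : { *(.text*) *(.rodata*) } > FLASH
  _sidata = LOADADDR(.data);          /* flash copy of .data */
  .data : { _sdata = .; *(.data*) . = ALIGN(4); _edata = .; } > RAM AT > FLASH
  .bss  : { _sbss = .; *(.bss*) *(COMMON) . = ALIGN(4); _ebss = .; } > RAM
}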

Common Misconceptions

“The compiler decides memory layout” (the linker script does)
“Stacks are automatic” (you must allocate them)
“Vector table can be anywhere” (at reset the CPU fetches it from a fixed address; only afterward can VTOR relocate it)

Check-Your-Understanding Questions

What happens if .data is not copied to RAM?
Why does the CPU read the initial SP from flash?
What causes a hard fault if the stack pointer is wrong?

Check-Your-Understanding Answers

Variables with initial values will be incorrect.
The CPU must know where the stack begins before code runs.
The first push/pop or interrupt will write to an invalid address.

Real-World Applications

Bootloaders that remap the vector table
Firmware with multiple memory regions (boot + app)

Where You’ll Apply It

Projects 1, 3, 4, 9.

References

Cortex-M4 System timer and registers (for memory map examples): https://manuals.plus/m/a5ff6b2d88bcbc6b3ccdc368778a3c98f71c37239afbd2620c60acbffbbaa1fa

Key Insight

Your RTOS is only as reliable as your memory map.

Summary

The linker script and startup file define the physical reality your kernel runs in. Without them, there is no safe stack, no interrupts, and no RTOS.

Homework/Exercises to Practice the Concept

Modify a linker script to reserve 4 KB for a bootloader.
Add a separate .stack section and place it at the top of RAM.
Relocate the vector table and verify interrupts still fire.

Solutions to the Homework/Exercises

Create a FLASH region starting after the bootloader size.
Add .stack section and symbol to reserve space.
Place the table at the new address, respecting the alignment the core requires, then write that address to the VTOR register (SCB->VTOR).

3) Interrupts, SysTick, and the Exception Model

Fundamentals

Interrupts are asynchronous events that suspend normal execution. On Cortex-M, interrupts and exceptions are managed by the NVIC and a fixed vector table. SysTick is a built-in 24-bit timer designed to generate periodic interrupts used by RTOS kernels for scheduling. The CPU automatically saves a subset of registers on exception entry and restores them on return, enabling fast, deterministic response.

Deep Dive into the Concept

The Cortex-M exception model is the foundation of a preemptive RTOS. When an interrupt fires, the CPU automatically pushes registers onto the current stack (R0-R3, R12, LR, PC, xPSR) and switches to Handler mode. On a Cortex-M4F with an active floating-point context, the frame is extended with S0-S15 and FPSCR. The CPU then loads the ISR address from the vector table and executes the handler. On return, the processor uses a special EXC_RETURN value to restore state and continue where it left off. This is how tasks can be interrupted and later resumed safely.

SysTick is a 24-bit down-counter that reloads from a programmed value and can generate an interrupt each time it reaches zero. The ARM documentation defines registers such as CTRL (control/status), LOAD (reload value), VAL (current value), and CALIB (calibration). This standardized timer allows your RTOS to be portable across Cortex-M devices. By programming SysTick to fire every 1 ms, you create the heartbeat for timekeeping and preemptive scheduling. The interrupt handler updates the tick count, manages sleeping tasks, and triggers a context switch via PendSV.
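Because the reload register is only 24 bits wide, it is worth checking that a requested tick rate actually fits. The helper below is a hedged, host-testable sketch; `systick_reload_for` is a hypothetical name and nothing here touches hardware registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* LOAD is a 24-bit field */

/*
 * Compute the SysTick LOAD value for a desired tick rate.
 * The counter runs LOAD, LOAD-1, ..., 0, so one period is LOAD + 1 clocks.
 * Returns false if the period does not fit in 24 bits or is too short.
 */
bool systick_reload_for(uint32_t clock_hz, uint32_t tick_hz, uint32_t *reload)
{
    assert(reload != 0);
    if (tick_hz == 0u || clock_hz < tick_hz) {
        return false;
    }
    uint32_t cycles = clock_hz / tick_hz;      /* clocks per tick */
    if (cycles < 2u || (cycles - 1u) > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = cycles - 1u;
    return true;
}
```

At 168 MHz a 1 ms tick needs a reload of 167999, which fits easily, while a 1 s tick would not fit and has to be built from shorter ticks in software. Note that the CMSIS helper `SysTick_Config(ticks)` takes the cycle count and subtracts one itself.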

The crucial design principle: keep ISRs short and deterministic. Your RTOS should defer heavy work to tasks, not do it in the interrupt itself. This is why you will build message queues and deferred processing in later projects.

How This Fits on Projects

Projects 2 and 4 implement SysTick and ISR-driven preemption. Projects 7 and 8 rely on ISR-safe IPC.

Definitions & Key Terms

ISR: Interrupt Service Routine
NVIC: Nested Vectored Interrupt Controller
SysTick: Cortex-M system timer
EXC_RETURN: special value used to return from exception

Mental Model Diagram

Normal Task ----> [Interrupt occurs]
      |           CPU pushes R0-R3,R12,LR,PC,xPSR
      v
  ISR runs (short)
      |
      v
  CPU pops saved registers, resume task

How It Works (Step-by-Step)

SysTick counter reaches zero.
NVIC asserts SysTick exception.
CPU saves a hardware stack frame automatically.
SysTick handler updates kernel tick and possibly triggers PendSV.
Exception return restores registers and continues execution.
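The kernel side of the tick step can be modeled without hardware. In the sketch below, every name (`sleeper`, `task_sleep`, `g_switch_pending`) is an assumption made for illustration; `g_switch_pending` stands in for pending PendSV, which on a real Cortex-M is done by setting the PENDSVSET bit in SCB->ICSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum task_state { TASK_READY, TASK_SLEEPING };

struct sleeper {
    enum task_state state;
    uint32_t ticks_left;   /* remaining sleep time in ticks */
};

#define MAX_TASKS 4u
static struct sleeper g_tasks[MAX_TASKS];   /* zero-init: all TASK_READY */
static volatile uint32_t g_ticks;
static volatile bool g_switch_pending;      /* stands in for pending PendSV */

/* Put a task to sleep for a number of ticks (0 leaves it READY). */
void task_sleep(uint32_t id, uint32_t ticks)
{
    assert(id < MAX_TASKS);
    if (ticks > 0u) {
        g_tasks[id].state = TASK_SLEEPING;
        g_tasks[id].ticks_left = ticks;
    }
}

/* Body of the tick interrupt: advance time, wake expired sleepers, and
 * request a context switch only if some task actually became READY. */
void rtos_tick(void)
{
    g_ticks++;
    for (uint32_t i = 0; i < MAX_TASKS; i++) {
        if (g_tasks[i].state == TASK_SLEEPING && --g_tasks[i].ticks_left == 0u) {
            g_tasks[i].state = TASK_READY;
            g_switch_pending = true;
        }
    }
}
```

Scanning every task on each tick is O(n) work inside an ISR. That is acceptable for a handful of tasks; a list sorted by wake time is the usual next step when the scan starts to show up in latency measurements.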

Minimal Concrete Example

volatile uint32_t g_tick = 0;

void SysTick_Handler(void) {
    g_tick++;
    rtos_tick(); // update timers, set PendSV if needed
}

int main(void) {
    SysTick_Config(SystemCoreClock / 1000); // 1ms tick
    while (1) {}
}

Common Misconceptions

“Interrupts save all registers” (only a subset is hardware-stacked)
“ISRs are just normal functions” (they run in Handler mode)
“SysTick is peripheral-specific” (it is ARM core IP)

Check-Your-Understanding Questions

Which registers are stacked automatically on Cortex-M exception entry?
Why should ISRs be short?
What is SysTick used for in an RTOS?

Check-Your-Understanding Answers

R0-R3, R12, LR, PC, xPSR.
Long ISRs increase latency and jitter for other interrupts.
Periodic tick for timekeeping and scheduler activation.

Real-World Applications

Periodic sensor sampling
Communication timeouts
Motor control loops driven by interrupts

Where You’ll Apply It

Projects 2, 4, 7, 8.

References

Cortex-M4 SysTick timer and register details: https://manuals.plus/m/a5ff6b2d88bcbc6b3ccdc368778a3c98f71c37239afbd2620c60acbffbbaa1fa
SysTick used for periodic OS context switching: https://arm-software.github.io/CMSIS_6/main/Core/group__SysTick__gr.html
Exception stack frame description and EXC_RETURN behavior: https://community.arm.com/support-forums/f/architectures-and-processors-forum/5291/the-reason-why-the-exception-frame-forms-on-psp

Key Insight

Interrupts and SysTick are the hardware heartbeat that makes preemption possible.

Summary

The exception model defines how the CPU saves and restores state. SysTick provides a portable, periodic interrupt that drives your scheduler.

Homework/Exercises to Practice the Concept

Configure SysTick to generate a 2 ms tick and verify using GPIO toggles.
Modify the SysTick ISR to count missed ticks if it runs late.
Trigger a manual SysTick exception from software and observe behavior.

Solutions to the Homework/Exercises

Set reload to (SystemCoreClock/500) - 1 and toggle GPIO in ISR.
Compare current timer value at ISR entry to expected threshold.
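Set the PENDSTSET bit in the Interrupt Control and State Register (SCB->ICSR) to pend SysTick from software. COUNTFLAG in SysTick CTRL is a read-only status bit and cannot be used to trigger the interrupt.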

4) Context Switching, Stack Frames, and Task Control Blocks

Fundamentals

A task is just a function with its own stack and saved CPU context. A context switch saves the state of one task and restores another. On Cortex-M, hardware automatically saves part of the state on exception entry, while software (your kernel) saves the rest. A Task Control Block (TCB) stores the stack pointer and metadata for each task.

Deep Dive into the Concept

Context switching is the core of multitasking. On Cortex-M, the processor uses two stack pointers: MSP (Main Stack Pointer) and PSP (Process Stack Pointer). By running tasks on PSP and exceptions on MSP, you can isolate kernel and task stacks. When a context switch is requested, the kernel pends the PendSV exception, which the kernel configures to the lowest exception priority. PendSV is designed for this purpose: at that priority it runs only after all other active interrupts have completed, so a switch never happens in the middle of an ISR.

During PendSV, you save callee-saved registers (R4-R11) onto the current task’s stack. The hardware already saved R0-R3, R12, LR, PC, xPSR when entering the exception. You then store the PSP into the current TCB, choose the next task, load its PSP, restore its R4-R11, and exit the exception. The CPU automatically restores the rest of the context, and the new task resumes as if it had never been interrupted.

This logic explains why stacks must be carefully initialized. To start a task for the first time, you fake a stack frame as if the task had been interrupted. That means placing initial values for xPSR (Thumb bit set), PC (task entry), LR (task exit handler), and general registers. The first context switch simply “returns” into the task.
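A fake initial frame can be built with ordinary array writes. The sketch below assumes the handler saves R4-R11 directly below the hardware frame, no floating-point context, and a full-descending stack. Addresses are passed as plain 32-bit values so the routine can also be tested on a host; `task_stack_init` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define INITIAL_XPSR 0x01000000u   /* only the Thumb (T) bit set */

/*
 * Build the initial frame for a task that has never run.
 * Layout from low to high address:
 *   [0..7]  R4-R11   (software-saved)
 *   [8..11] R0-R3    (hardware frame starts here)
 *   [12] R12  [13] LR  [14] PC  [15] xPSR
 * Returns the value to store as the task's saved stack pointer.
 */
uint32_t *task_stack_init(uint32_t *stack_top, uint32_t entry_addr,
                          uint32_t exit_addr, uint32_t arg)
{
    assert(stack_top != 0);
    /* The ARM calling convention expects 8-byte stack alignment. */
    uint32_t *sp = (uint32_t *)((uintptr_t)stack_top & ~(uintptr_t)7u);
    sp -= 16;
    for (int i = 0; i < 16; i++) {
        sp[i] = 0u;                    /* R4-R11, R1-R3, R12 start as zero */
    }
    sp[8]  = arg;                      /* R0: first argument to the task */
    sp[13] = exit_addr;                /* LR: where the task "returns" to */
    sp[14] = entry_addr & ~1u;         /* PC: entry point, bit 0 cleared */
    sp[15] = INITIAL_XPSR;
    return sp;
}
```

On target, `entry_addr` would be something like `(uint32_t)(uintptr_t)task_fn`. Filling the unused registers with recognizable values (for example 0x04040404 for R4) instead of zero makes a wrong frame layout easy to spot in the debugger.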

How This Fits on Projects

Projects 3 and 4 implement context switching and PendSV. Project 9 uses stack sizing and overflow detection.

Definitions & Key Terms

TCB: Task Control Block
PSP/MSP: Process and Main Stack Pointers
PendSV: Exception used for deferred context switches
Stack frame: saved register context on the stack

Mental Model Diagram

Task A stack: [R4..R11][HW frame]
      |
      | save PSP -> TCB A
      v
Switch
      ^
      | load PSP <- TCB B
Task B stack: [R4..R11][HW frame]

How It Works (Step-by-Step)

Scheduler decides to switch tasks.
Kernel triggers PendSV.
PendSV saves R4-R11 onto current task stack.
PSP stored in current TCB.
Next TCB selected; PSP loaded.
R4-R11 restored from new task stack.
Exception return restores HW frame and resumes task.

Minimal Concrete Example

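struct tcb { uint32_t *sp; uint8_t prio; };  // sp must stay the first member
struct tcb *current_tcb;                     // TCB of the running task

// Assumes tasks run on PSP, no FPU context is in use, and
// schedule_next() updates current_tcb to the task to run next.
__attribute__((naked)) void PendSV_Handler(void) {
    __asm volatile(
        "mrs r0, psp            \n" // get PSP
        "stmdb r0!, {r4-r11}    \n" // save callee-saved
        "ldr r1, =current_tcb   \n"
        "ldr r2, [r1]           \n"
        "str r0, [r2]           \n" // save SP
        "stmdb sp!, {r3, lr}    \n" // keep EXC_RETURN: bl overwrites LR
        "bl  schedule_next      \n" // select next task
        "ldmia sp!, {r3, lr}    \n" // restore EXC_RETURN (r3 pads to 8 bytes)
        "ldr r1, =current_tcb   \n"
        "ldr r2, [r1]           \n"
        "ldr r0, [r2]           \n" // load SP
        "ldmia r0!, {r4-r11}    \n" // restore
        "msr psp, r0            \n"
        "bx lr                  \n" // exception return
    );
}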

Common Misconceptions

“Context switch saves all registers automatically” (only some are automatic)
“Tasks run on MSP” (best practice is PSP)
“PendSV is just another interrupt” (it is a system exception meant to be set to the lowest priority so switches are deferred until ISRs finish)

Check-Your-Understanding Questions

Why does the CPU only save part of the context automatically?
Why is PendSV used for context switching?
What must be in a task’s initial stack frame?

Check-Your-Understanding Answers

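To keep exception entry short and predictable. The stacked set matches the caller-saved registers of the ARM calling convention, so handlers can be ordinary C functions; software saves R4-R11 only when it actually switches tasks.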
It runs after all higher priority interrupts, making switching safe.
xPSR (Thumb bit), PC, LR, and initial register values.

Real-World Applications

Any multitasking embedded system
Thread switching in commercial RTOS kernels

Where You’ll Apply It

Projects 3, 4, 9.

References

Exception entry stack frame and PendSV usage discussion: https://community.arm.com/support-forums/f/architectures-and-processors-forum/5291/the-reason-why-the-exception-frame-forms-on-psp

Key Insight

A context switch is just controlled stack manipulation plus the exception return mechanism.

Summary

Tasks are stacks with saved registers. PendSV provides a safe hook to swap those stacks.

Homework/Exercises to Practice the Concept

Draw the exact stack layout for a task that has never run.
Manually simulate a context switch using a debugger and registers.
Add a guard pattern to detect stack overflow.

Solutions to the Homework/Exercises

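From high to low address the stack should contain xPSR, PC, LR, R12, R3, R2, R1, R0 (the hardware frame), then placeholder values for R11 down to R4; the saved stack pointer points at the R4 slot.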
Use GDB to push/pop registers and update PSP, then continue.
Fill stack with 0xDEADBEEF and check for corruption.
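The guard-pattern idea can be extended into a simple high-water-mark check. This is a sketch with assumed names (`stack_paint`, `stack_unused_words`, `stack_overflowed`); it detects overflow after the fact and cannot catch a write that jumps past the guard words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu

/* Paint the whole stack with a known pattern before the task first runs. */
void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/*
 * Count untouched words from the low end. Cortex-M stacks grow downward,
 * so stack[0] is the last word to be used; the count is the free headroom.
 */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}

/* Overflow is likely if the lowest guard words no longer hold the pattern. */
int stack_overflowed(const uint32_t *stack, size_t words, size_t guard_words)
{
    return stack_unused_words(stack, words) < guard_words;
}
```

Calling `stack_overflowed` for the outgoing task on every context switch is cheap when `guard_words` is small, and logging `stack_unused_words` from the idle task gives a running estimate of how much each stack can be trimmed.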

5) Scheduling, Task States, and Priority

Fundamentals

Scheduling decides which task runs next. An RTOS typically uses fixed-priority preemptive scheduling: the highest-priority READY task always runs. Tasks move between states (READY, RUNNING, BLOCKED) based on events and timeouts. Preemption ensures higher-priority tasks can interrupt lower-priority ones.

Deep Dive into the Concept

A scheduler is a policy implemented by a small amount of code. In cooperative scheduling, tasks run until they call yield() or block. In preemptive scheduling, a periodic interrupt (SysTick) forces the kernel to choose the next task. Fixed-priority preemption is simple and deterministic: if a task with higher priority becomes READY, the system switches immediately. This is the behavior described in CMSIS-RTOS.

More advanced policies include round-robin (time slicing among equal-priority tasks), rate-monotonic scheduling (shorter periods get higher priority), and earliest-deadline-first scheduling (dynamic priorities). You will implement fixed priority first because it is predictable and easiest to debug. Later projects add timeouts and sleep states so tasks can block without busy-waiting, which improves determinism and efficiency.

Task states are critical to correctness. A task is READY when it can run, RUNNING when it owns the CPU, and BLOCKED when it waits for an event (mutex, queue, timer). When a blocked task becomes ready, it might preempt the current task. If you mishandle states, you can starve tasks or create priority inversion. Priority inversion happens when a low-priority task holds a resource needed by a high-priority task. RTOS kernels typically mitigate this with priority inheritance on mutexes.

How This Fits on Projects

Projects 3-6 implement scheduling and task states. Project 6 demonstrates priority inversion and inheritance.

Definitions & Key Terms

READY/RUNNING/BLOCKED: task states
Preemption: interrupting a running task
Priority inheritance: temporarily boosting a task that holds a needed resource
Time slicing: round-robin scheduling among equal priorities

Mental Model Diagram

READY ---> RUNNING ---> BLOCKED ---> READY
     ^         |             |
     |         v             |
     +---- preempted <-------+

How It Works (Step-by-Step)

SysTick or event makes a task READY.
Scheduler compares priorities.
If new task has higher priority, switch immediately.
If equal priority, apply round-robin if enabled.
If a task blocks, scheduler chooses next READY task.
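The selection rules above, including round-robin among equal priorities, fit in one loop. The sketch below is one possible shape, assuming a larger `prio` number means higher priority and that the currently RUNNING task is still eligible; `pick_next` and the `task` struct are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

enum state { READY, RUNNING, BLOCKED };

struct task {
    enum state state;
    uint8_t prio;          /* larger number = higher priority (assumed) */
};

/*
 * Pick the next task index. The search starts just after `current`, and a
 * candidate replaces the best so far only if its priority is strictly
 * higher, so equal-priority tasks take turns (round-robin) while a
 * higher-priority runnable task always wins. Returns -1 if nothing can run.
 */
int pick_next(const struct task *tasks, int count, int current)
{
    assert(tasks != 0 && count > 0 && current >= 0 && current < count);
    int best = -1;
    for (int step = 1; step <= count; step++) {
        int i = (current + step) % count;   /* current itself is checked last */
        if (tasks[i].state == BLOCKED) {
            continue;
        }
        if (best < 0 || tasks[i].prio > tasks[best].prio) {
            best = i;
        }
    }
    return best;
}
```

To get time slicing, call this from the tick only when the running task's slice has expired; calling it on every tick rotates equal-priority tasks each millisecond, which may add more switching overhead than you want.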

Minimal Concrete Example

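// Returns the index of the highest-priority READY task.
// Assumes a larger prio value means higher priority and that idle_task is
// always READY with the lowest priority, so a valid index always exists.
int schedule_next(void) {
    int best = idle_task;
    for (int i = 0; i < task_count; i++) {
        // Strict '>' keeps the first task found among equal priorities.
        if (tasks[i].state == READY && tasks[i].prio > tasks[best].prio) {
            best = i;
        }
    }
    return best;
}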

Common Misconceptions

“Priority always prevents starvation” (low priority can starve if no time slicing)
“Round robin is always better” (it can harm determinism for real-time tasks)
“Priority inversion cannot happen in small systems” (it can happen anywhere)

Check-Your-Understanding Questions

When does a preemptive scheduler switch tasks?
Why can priority inversion break real-time guarantees?
What is the role of an idle task?

Check-Your-Understanding Answers

When a higher-priority task becomes READY.
It delays a high-priority task behind a lower-priority resource holder.
It runs when no other task is READY and can put the CPU into a low-power mode.

Real-World Applications

Motor control prioritized above logging
Safety checks prioritized above UI updates

Where You’ll Apply It

Projects 3, 4, 5, 6.

References

Priority switch behavior and priority inheritance note: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__ThreadMgmt.html
Mutex priority inheritance attribute: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__MutexMgmt.html

Key Insight

Scheduling policy is the core contract between your code and time.

Summary

Fixed-priority preemption with correct task states is the simplest deterministic RTOS model.

Homework/Exercises to Practice the Concept

Implement round-robin for equal-priority tasks and measure jitter impact.
Create three tasks with different priorities and verify preemption order.
Simulate a priority inversion scenario and describe the outcome.

Solutions to the Homework/Exercises

Track a time slice counter and rotate tasks of equal priority.
Use GPIO toggles to visualize which task runs first.
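Low-priority task locks the mutex, high-priority task blocks on it, and the medium-priority task preempts the low one; the high-priority task is then delayed for as long as the medium-priority task keeps running.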

6) Synchronization and Inter-Task Communication (IPC)

Fundamentals

Tasks share resources and must coordinate. Synchronization primitives prevent corruption and enforce ordering. Mutexes provide mutual exclusion; semaphores signal events or resource availability; message queues pass data between tasks; event flags coordinate multiple conditions. Improper synchronization leads to deadlocks, priority inversion, and missed deadlines.

Deep Dive into the Concept

The simplest synchronization primitive is a critical section: disable interrupts, manipulate shared state, re-enable. This is fast but increases interrupt latency, so it must be short. Mutexes are more flexible: tasks can block, allowing the CPU to run other tasks instead of busy-waiting. However, mutexes make priority inversion possible. Priority inheritance (optional but common) temporarily raises the priority of the mutex owner to that of the highest waiting task, preventing unbounded blocking. CMSIS-RTOS2 documents this behavior and exposes it through a mutex attribute (osMutexPrioInherit).
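The inheritance rule itself is only a few lines once blocking is set aside. The model below is a deliberately simplified sketch: it assumes a task holds at most one mutex, that a larger number means higher priority, and that both functions are called inside a critical section. All names (`pi_task`, `pi_mutex`, `pi_mutex_lock`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pi_task {
    uint8_t base_prio;     /* priority assigned at creation */
    uint8_t eff_prio;      /* priority the scheduler actually uses */
};

struct pi_mutex {
    struct pi_task *owner; /* NULL when unlocked */
};

/* Try to lock. On contention, lend the caller's priority to the owner and
 * return false; a real kernel would also block the caller here. */
bool pi_mutex_lock(struct pi_mutex *m, struct pi_task *self)
{
    assert(m != NULL && self != NULL);
    if (m->owner == NULL) {
        m->owner = self;
        return true;
    }
    if (self->eff_prio > m->owner->eff_prio) {
        m->owner->eff_prio = self->eff_prio;   /* inheritance */
    }
    return false;
}

/* Unlock: only the owner may release; it drops back to its base priority. */
void pi_mutex_unlock(struct pi_mutex *m, struct pi_task *self)
{
    assert(m != NULL && m->owner == self);
    self->eff_prio = self->base_prio;
    m->owner = NULL;
}
```

With the boost in place, a medium-priority task can no longer preempt the low-priority owner, so the high-priority waiter is delayed only for the length of the owner's critical section. Supporting nested mutexes requires tracking every mutex a task holds, which this sketch leaves out.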

Semaphores are counters with no notion of ownership: a binary semaphore can signal an event, while a counting semaphore can track multiple resources. Queues provide structured communication: tasks send messages, and receivers block until data arrives. Event flags allow a task to wait for multiple conditions simultaneously (e.g., sensor ready AND buffer free). A robust RTOS needs all these primitives, and you must implement them with interrupt-safe operations to avoid race conditions.
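The "sensor ready AND buffer free" style of wait reduces to a bitmask test. The following is a non-blocking sketch with assumed names (`flags_satisfied`, `flags_try_wait`); a kernel version would block the caller instead of returning false and would run inside a critical section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum wait_mode { WAIT_ANY, WAIT_ALL };

/* True if the flags currently set satisfy the waiter's mask and mode. */
bool flags_satisfied(uint32_t set, uint32_t mask, enum wait_mode mode)
{
    assert(mask != 0u);
    uint32_t hit = set & mask;
    return (mode == WAIT_ALL) ? (hit == mask) : (hit != 0u);
}

/* Non-blocking wait: on success, report the matched bits and clear every
 * bit named in the mask. */
bool flags_try_wait(uint32_t *flags, uint32_t mask, enum wait_mode mode,
                    uint32_t *matched)
{
    assert(flags != 0 && matched != 0);
    if (!flags_satisfied(*flags, mask, mode)) {
        return false;    /* a real kernel would block the caller here */
    }
    *matched = *flags & mask;
    *flags &= ~mask;
    return true;
}
```

Clearing on a successful wait is a design choice made here for simplicity; some kernels make auto-clear optional so several tasks can observe the same flag.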

IPC is not just about correctness; it is about determinism. If a high-priority task blocks on a queue, it must unblock within a bounded time. If a queue is full, the sender must block or drop data predictably. Real-time design means you must define these behaviors explicitly.

How This Fits on Projects

Projects 6 and 7 implement mutexes and queues. Project 8 uses event flags and software timers.

Definitions & Key Terms

Mutex: mutual exclusion lock
Semaphore: signaling or counting primitive
Queue: FIFO buffer for messages
Event flags: bitmask-based synchronization

Mental Model Diagram

Producer Task --> [Queue] --> Consumer Task
            \                 /
             \--- semaphore --

How It Works (Step-by-Step)

Task tries to acquire a mutex.
If available, it locks and continues; else it blocks.
When released, highest priority waiting task is unblocked.
Queues store messages in a ring buffer; send/receive block on full/empty.
Event flags allow tasks to wait for multiple conditions via bitmask.
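The ring buffer behind a message queue can be sketched without any kernel support. The version below is non-blocking and assumes a fixed capacity and 32-bit messages; the names are illustrative, and a real kernel would wrap each call in a critical section and block the caller where this returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAPACITY 4u

struct msg_queue {
    uint32_t buf[QUEUE_CAPACITY];
    uint32_t head;    /* index of the oldest message */
    uint32_t count;   /* number of stored messages */
};

/* Non-blocking send; false means "full", where a kernel would block or drop. */
bool queue_try_send(struct msg_queue *q, uint32_t msg)
{
    assert(q != 0);
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    q->buf[(q->head + q->count) % QUEUE_CAPACITY] = msg;
    q->count++;
    return true;
}

/* Non-blocking receive; false means "empty". Messages leave in FIFO order. */
bool queue_try_recv(struct msg_queue *q, uint32_t *msg)
{
    assert(q != 0 && msg != 0);
    if (q->count == 0u) {
        return false;
    }
    *msg = q->buf[q->head];
    q->head = (q->head + 1u) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

Tracking `head` plus `count` (rather than head and tail) lets all slots be used and makes the full and empty cases unambiguous, at the cost of both sides writing `count`, which is why the calls need a critical section.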

Minimal Concrete Example

// Simple queue send (blocking)
// Re-check after every wake-up: another sender may have refilled the queue.
while (queue_full(q)) {
    block_current_task(q);
}
queue_put(q, msg);
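The ring buffer behind such a queue can be sketched as follows. This is a host-runnable illustration under stated assumptions: queue_t, queue_try_put, and queue_try_get are hypothetical names, the capacity is arbitrary, and enter_critical/exit_critical are empty placeholders for the interrupt masking a Cortex-M build would need. The functions return false where a blocking version would put the caller to sleep.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAPACITY 4u  /* illustrative size */

typedef struct {
    uint32_t buf[QUEUE_CAPACITY];
    uint32_t head;   /* index of the oldest message */
    uint32_t tail;   /* index of the next free slot */
    uint32_t count;  /* messages currently stored */
} queue_t;

/* Placeholders: on Cortex-M these would mask and restore interrupts. */
static void enter_critical(void) { }
static void exit_critical(void) { }

/* Returns false instead of blocking when the queue is full. */
bool queue_try_put(queue_t *q, uint32_t msg)
{
    bool ok = false;
    enter_critical();
    if (q->count < QUEUE_CAPACITY) {
        q->buf[q->tail] = msg;
        q->tail = (q->tail + 1u) % QUEUE_CAPACITY;
        q->count++;
        ok = true;
    }
    exit_critical();
    return ok;
}

/* Returns false instead of blocking when the queue is empty. */
bool queue_try_get(queue_t *q, uint32_t *out)
{
    bool ok = false;
    enter_critical();
    if (q->count > 0u) {
        *out = q->buf[q->head];
        q->head = (q->head + 1u) % QUEUE_CAPACITY;
        q->count--;
        ok = true;
    }
    exit_critical();
    return ok;
}
```

A blocking send would loop on queue_try_put and block the task between attempts, which keeps the full/empty decision and the index update inside one critical section.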

Common Misconceptions

“Mutex = semaphore” (mutexes are for mutual exclusion and ownership)
“Disabling interrupts is always OK” (it increases latency)
“Queues are just buffers” (they are synchronization objects too)

Check-Your-Understanding Questions

When should you use a mutex instead of a semaphore?
What problem does priority inheritance solve?
Why must queue operations be interrupt-safe?

Check-Your-Understanding Answers

Use a mutex when a resource has an owner and must be released by the same task that locked it.
It prevents high-priority tasks from being blocked indefinitely by lower-priority holders.
Because ISRs can modify the queue concurrently.

Real-World Applications

Sensor producer feeding a logging task
UART driver signaling a processing task

Where You’ll Apply It

Projects 6, 7, 8.

References

CMSIS-RTOS2 priority inheritance description: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__ThreadMgmt.html
CMSIS-RTOS2 mutex attribute for priority inheritance: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__MutexMgmt.html

Key Insight

Synchronization defines both safety and timing; it is a real-time concern.

Summary

Mutexes, semaphores, queues, and event flags are the core tools for safe concurrency.

Homework/Exercises to Practice the Concept

Build a binary semaphore that an ISR can give and a task can take.
Implement a fixed-size queue with blocking send/receive.
Demonstrate priority inversion with three tasks.

Solutions to the Homework/Exercises

Protect the semaphore count with critical sections.
Use head/tail indices and block when full/empty.
The low task locks the mutex, the high task blocks on it, and the medium task runs in its place until priority inheritance is applied.

7) Time Services and Software Timers

Fundamentals

An RTOS must provide time-based services: delays, timeouts, periodic timers, and tick counters. These services are typically built on top of the system tick. A task should be able to sleep without busy-waiting, and timers should execute callbacks or release tasks on schedule.

Deep Dive into the Concept

Time services turn the SysTick heartbeat into usable APIs. The kernel maintains a tick counter and data structures for delayed tasks. A simple approach: store a wake-up tick in each task and scan all tasks each tick. A more efficient approach: maintain a sorted delay list or a timer wheel. The tradeoff is complexity versus CPU overhead.
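A sorted delay list might look like the sketch below. It is an illustration under assumptions: delay_node_t and the function names are hypothetical, nodes would normally be embedded in the TCB, and all pending wake ticks are assumed to lie within 2^31 ticks of each other so the signed-difference comparison stays valid across wraparound.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct delay_node {
    uint32_t wake_tick;
    struct delay_node *next;
} delay_node_t;

/* True when tick a is at or after tick b (wrap-safe within 2^31 ticks). */
static int tick_reached(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Insert so the earliest wake-up stays at the head: O(n) insert. */
void delay_insert(delay_node_t **head, delay_node_t *n)
{
    while (*head != NULL && tick_reached(n->wake_tick, (*head)->wake_tick)) {
        head = &(*head)->next;
    }
    n->next = *head;
    *head = n;
}

/* Unlink and return the next expired node, or NULL if none is due.
 * The tick handler only ever looks at the head: O(1) per check. */
delay_node_t *delay_pop_expired(delay_node_t **head, uint32_t now)
{
    delay_node_t *n = *head;
    if (n == NULL || !tick_reached(now, n->wake_tick)) {
        return NULL;
    }
    *head = n->next;
    n->next = NULL;
    return n;
}
```

The cost moves from the tick handler (which runs every millisecond) to the sleep call (which runs only when a task blocks), which is usually the better trade for ISR latency.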

Software timers are virtual timers built on top of the same tick. A timer object stores a period and callback. On each tick the kernel decrements every active timer's remaining count and fires its callback when the count reaches zero. In safety-critical systems, callbacks should be minimal and often just unblock a task.
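The decrement-and-fire scheme can be sketched as a small timer table. The names sw_timer_t, sw_timers_tick, and count_cb are hypothetical, and the convention that period 0 means one-shot and remaining 0 means stopped is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*timer_cb_t)(void *arg);

typedef struct {
    uint32_t remaining;  /* ticks until next expiry; 0 = stopped */
    uint32_t period;     /* reload value; 0 = one-shot */
    timer_cb_t cb;
    void *arg;
} sw_timer_t;

/* Call once per tick for the whole timer table. */
void sw_timers_tick(sw_timer_t *timers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        sw_timer_t *t = &timers[i];
        if (t->remaining == 0u) {
            continue;                   /* stopped timer */
        }
        if (--t->remaining == 0u) {
            t->remaining = t->period;   /* periodic reloads; one-shot stays 0 */
            if (t->cb != NULL) {
                t->cb(t->arg);
            }
        }
    }
}

/* Example callback: counts expirations through its argument. */
void count_cb(void *arg)
{
    (*(uint32_t *)arg)++;
}
```

Because the reload happens before the callback runs, a slow callback does not stretch the period, although it still delays everything else handled in the same tick.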

Advanced systems use tickless idle: when no tasks are ready, the kernel programs a hardware timer to wake up at the next scheduled event and stops the SysTick to save power. You will not implement full tickless idle in this guide, but you will build the foundations: accurate time tracking and minimal ISR overhead.

How This Fits on Projects

Projects 5 and 8 implement delays, timeouts, and software timers.

Definitions & Key Terms

Tick: periodic timebase (often 1 ms)
Sleep/Delay: task blocks for N ticks
Timer: callback after a delay or periodically
Tickless idle: suppressing ticks during idle

Mental Model Diagram

Tick -> update tick count -> check timers -> wake tasks

How It Works (Step-by-Step)

SysTick fires every N cycles.
Kernel increments global tick.
Timer list updated; expired timers run callbacks or wake tasks.
Scheduler runs highest priority READY task.

Minimal Concrete Example

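void sleep_ticks(uint32_t ticks) {
    current->wake_tick = g_tick + ticks;  // unsigned add wraps around safely
    current->state = BLOCKED;             // tick handler marks it READY again
    schedule();                           // switch to the next READY task
}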
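The other half of sleep_ticks is the wake-up check that runs from the tick handler. The following host-runnable sketch uses the simple linear scan; tcb_t, tick_due, and wake_sleepers are hypothetical names, and it assumes BLOCKED is used only for sleeping tasks and that no delay exceeds 2^31 ticks.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { READY, BLOCKED } task_state_t;

typedef struct {
    task_state_t state;
    uint32_t wake_tick;
} tcb_t;

/* Wrap-safe "now has reached deadline" test. */
static int tick_due(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* Linear scan run from the tick handler; returns how many tasks woke. */
size_t wake_sleepers(tcb_t *tasks, size_t n, uint32_t now)
{
    size_t woken = 0;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state == BLOCKED && tick_due(now, tasks[i].wake_tick)) {
            tasks[i].state = READY;
            woken++;
        }
    }
    return woken;
}
```

A plain `now >= wake_tick` comparison would wake a task immediately if its wake tick had wrapped past zero; the signed difference avoids that.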

Common Misconceptions

“Delay is just a busy loop” (busy loops waste CPU)
“Timers are independent of ticks” (software timers usually depend on tick)
“Tickless idle is trivial” (it needs careful time accounting)

Check-Your-Understanding Questions

Why do we avoid busy-wait delays in an RTOS?
What is the difference between a one-shot and periodic timer?
What can go wrong if tick overflow is not handled?

Check-Your-Understanding Answers

It wastes CPU and breaks determinism.
One-shot fires once; periodic reloads each period.
Time comparisons can fail, causing missed wakeups.

Real-World Applications

Periodic sensor sampling
Timeouts in communication protocols

Where You’ll Apply It

Projects 5, 8.

References

SysTick periodic interrupts for OS scheduling: https://arm-software.github.io/CMSIS_6/main/Core/group__SysTick__gr.html

Key Insight

Time services are where the OS becomes useful to applications.

Summary

Delays, timeouts, and timers turn hardware ticks into predictable scheduling.

Homework/Exercises to Practice the Concept

Implement a sorted delay list and compare with a linear scan.
Add a periodic timer that toggles a GPIO.
Make your tick counter wrap safely after overflow.

Solutions to the Homework/Exercises

Keep list sorted by wake tick and pop expired timers.
Create a timer object with period and callback.
Use unsigned arithmetic comparisons with wraparound.

8) Memory Management and Stack Safety

Fundamentals

Embedded systems rarely use a full heap. Instead, they rely on static allocation and fixed-size pools. Every task needs a stack; if the stack overflows, it corrupts memory silently. RTOS kernels must provide stack sizing, overflow detection, and safe allocation strategies.

Deep Dive into the Concept

Memory is the tightest constraint on many MCUs. A Cortex-M4 might have 128 KB of RAM, and each task stack consumes part of it. You must estimate stack depth based on call depth, interrupt nesting, and local variables. A safe strategy is to fill stacks with a known pattern (0xA5A5A5A5) and later check the high-water mark. Many commercial RTOS kernels provide stack watermarking for this reason.

Dynamic allocation can introduce fragmentation and unpredictability. For a real-time kernel, deterministic memory behavior is often more important than flexibility. Fixed-size block pools provide constant-time allocation. You will implement a simple pool allocator to guarantee that allocations succeed or fail predictably.
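A fixed-size block pool can be sketched with an intrusive free list. The block size, block count, and function names here are illustrative assumptions, and the sketch omits the critical sections a kernel would need if tasks and ISRs share the pool.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE  32u  /* bytes per block (illustrative) */
#define BLOCK_COUNT 4u   /* blocks in the pool (illustrative) */

/* A free block stores the link in its own memory, so the pool
 * needs no separate bookkeeping array. */
typedef union block {
    union block *next;
    uint8_t payload[BLOCK_SIZE];
} block_t;

static block_t pool_storage[BLOCK_COUNT];
static block_t *free_list;

/* Thread every block onto the free list. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

/* O(1): pop the head of the free list, or NULL when exhausted. */
void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

/* O(1): push the block back. The caller must pass a pool block once. */
void pool_free(void *p)
{
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}
```

Both operations touch a fixed number of words regardless of pool state, which is what makes the allocation time predictable; the price is that every request gets the same block size.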

Stack safety also interacts with the exception model. Exception entry pushes a register frame onto the stack that was active at the time (PSP for tasks, MSP otherwise), so an overflowing or corrupted stack can produce an immediate HardFault. Some systems use guard regions or the MPU (Memory Protection Unit) to detect overflow. While we will not implement MPU protection here, you will design for safety by reserving guard patterns and checking them periodically.

How This Fits on Projects

Project 9 implements memory pools and stack overflow detection.

Definitions & Key Terms

Stack watermark: deepest stack usage measurement
Fragmentation: free heap memory split into scattered pieces too small to satisfy a request
Memory pool: fixed-size block allocator
Guard pattern: known value to detect overflow

Mental Model Diagram

Task Stack (grows down)
┌───────────────┐  High addr (initial SP)
│  Used stack   │
│---------------│ <-- high-water mark
│   Free space  │
│   (pattern)   │
└───────────────┘  Low addr

How It Works (Step-by-Step)

Allocate fixed stack array for each task.
Fill with guard pattern.
Periodically scan to find deepest usage.
If guard region corrupted, flag overflow.
Use memory pools for deterministic allocation.

Minimal Concrete Example

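#include <stdint.h>

#define STACK_SIZE 256  // in 32-bit words (1 KB)
uint32_t task1_stack[STACK_SIZE];

// Fill the whole stack with the pattern before the task first runs.
void init_stack(uint32_t *stack) {
    for (int i = 0; i < STACK_SIZE; i++) stack[i] = 0xA5A5A5A5;
}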
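Once the stack is filled, the high-water mark can be read back by counting how many pattern words survive at the low end. A minimal sketch, assuming a descending stack whose lowest address is index 0; stack_fill and stack_unused_words are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u

/* Paint the whole stack before the task first runs. */
void stack_fill(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* The stack grows down from stack[words - 1], so never-used words sit
 * at the low indices. Count them from index 0 until the first word
 * that no longer holds the pattern. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

A result of zero means the task has reached (and possibly passed) the end of its stack. The measurement can under-report usage if the task happens to store the pattern value itself, so treat it as an estimate with margin.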

Common Misconceptions

“Stack size is always enough if it works once” (usage varies by path)
“Heap is fine for RTOS” (fragmentation breaks determinism)
“Overflow always crashes immediately” (it often corrupts silently)

Check-Your-Understanding Questions

Why is heap fragmentation dangerous in real-time systems?
How do you measure maximum stack usage safely?
What is the benefit of fixed-size memory pools?

Check-Your-Understanding Answers

It makes allocation time unpredictable and can fail unexpectedly.
Fill with a pattern and measure how much was overwritten.
Constant-time allocation and deterministic behavior.

Real-World Applications

Safety-critical systems requiring deterministic memory
Certified embedded systems with static allocation only

Where You’ll Apply It

Project 9.

References

SysTick and system timing (for stack usage during ISR): https://arm-software.github.io/CMSIS_6/main/Core/group__SysTick__gr.html

Key Insight

Memory determinism is as important as CPU determinism in an RTOS.

Summary

You must allocate stacks and memory in a predictable, measurable way to keep real-time guarantees.

Homework/Exercises to Practice the Concept

Implement a fixed block allocator for 32-byte objects.
Measure stack high-water marks for each task.
Trigger a stack overflow and catch it with a guard pattern.

Solutions to the Homework/Exercises

Use a free list of block pointers.
Scan from the lowest stack address upward and count the words that still hold 0xA5A5A5A5.
Make a recursive function and verify overflow detection.

Glossary

RTOS: Real-Time Operating System with deterministic scheduling
TCB: Task Control Block
ISR: Interrupt Service Routine
SysTick: ARM system timer
PendSV: exception for deferred context switches
Priority inversion: a lower-priority task delays a higher-priority task, typically by holding a lock it needs
WCET: worst-case execution time
Jitter: variation in timing
Tickless idle: stopping periodic ticks to save power

Why RTOS Matters

Modern systems are filled with embedded devices. IoT Analytics estimates 18.5 billion connected IoT devices in 2024 and projects 21.1 billion by the end of 2025. These devices must respond predictably to real-world events. Many are safety-critical (automotive, medical, industrial control), where a missed deadline can cause physical damage. An RTOS provides the structure and determinism required for such systems.

Context & Evolution

Early embedded systems used super-loops and interrupts. As systems grew, the complexity of concurrency and timing made these designs fragile. RTOS kernels emerged to formalize scheduling, synchronization, and time management, replacing ad-hoc designs with predictable mechanisms.

Old vs New (ASCII)

Super-loop design                    RTOS design
┌──────────────┐                    ┌──────────────┐
│ loop()       │                    │ scheduler    │
│  taskA()     │                    │ taskA (prio) │
│  taskB()     │                    │ taskB (prio) │
│  taskC()     │                    │ taskC (prio) │
└──────────────┘                    └──────────────┘
        ^                                    ^
   timing implicit                     timing explicit

Concept Summary Table

Concept	What You Must Internalize	Key Artifacts	Projects
Real-Time Fundamentals	deadlines, latency, jitter, WCET	timing budget	2,4,5,8,10
Boot + Memory Map	startup, vector table, linker script	linker.ld, startup.s	1,3,4,9
Interrupts + SysTick	exception entry, SysTick registers	ISR, SysTick_Handler	2,4,7,8
Context Switching	stack frame, PSP/MSP, PendSV	PendSV_Handler, TCB	3,4,9
Scheduling + States	READY/RUNNING/BLOCKED, priority	scheduler.c	3,4,5,6
Synchronization + IPC	mutex, sem, queue, events	mutex.c, queue.c	6,7,8
Time Services	delay, timeout, timers	tick.c, timer.c	5,8
Memory Management	stacks, pools, overflow checks	mempool.c	9

Project-to-Concept Map

Project	Concepts Applied
1. Bare-Metal Hello World	Boot + Memory Map
2. System Tick Interrupt	Interrupts + SysTick, Real-Time Fundamentals
3. Cooperative Scheduler	Context Switching, Scheduling
4. Preemptive Scheduler	Interrupts + SysTick, Scheduling, Context Switching
5. Sleep/Delay + Idle Task	Time Services, Scheduling
6. Mutex + Priority Inversion	Synchronization, Scheduling
7. Message Queue + ISR Deferral	IPC, Interrupts
8. Event Flags + Software Timers	Time Services, IPC
9. Memory Pool + Stack Safety	Memory Management
10. Latency Measurement Toolkit	Real-Time Fundamentals, Interrupts

Deep Dive Reading by Concept

Concept	Book + Chapters	Why This Matters
Real-Time Fundamentals	Real-Time Concepts for Embedded Systems Ch. 1-3	Defines hard vs soft real-time and timing metrics
Boot + Memory Map	Making Embedded Systems (2nd ed) Ch. 3-4	Board bring-up, datasheets, and timing setup underpin every RTOS
Interrupts + SysTick	Making Embedded Systems (2nd ed) Ch. 5	Interrupt timing and ISR structure define kernel responsiveness
Context Switching	Real-Time Concepts for Embedded Systems Ch. 5	Task switching and CPU context preservation
Scheduling + Tasks	Real-Time Concepts for Embedded Systems Ch. 4-5	RTOS fundamentals and task design
Synchronization + IPC	Real-Time Concepts for Embedded Systems Ch. 6-7, 15	Semaphores, queues, and communication
Time Services	Real-Time Concepts for Embedded Systems Ch. 11	Timer services and scheduling
Memory Management	Real-Time Concepts for Embedded Systems Ch. 13	Deterministic allocation and memory control
RTOS Fundamentals	Zephyr RTOS Embedded C Programming Ch. 2, 4, 5	Practical RTOS primitives

Quick Start (First 48 Hours)

Day 1: Toolchain and Bare Metal

Install arm-none-eabi-gcc, OpenOCD, GDB
Build and flash Project 1 (LED blink)
Verify you can single-step in GDB

Day 2: Interrupts and Tick

Implement Project 2 SysTick
Toggle a GPIO in the ISR and measure with a logic analyzer
Confirm tick count accuracy at 1 ms

If you get stuck, pause and read the Theory Primer chapters on Boot and Interrupts.

Recommended Learning Paths

Embedded First Path: Projects 1-4, then 5-6, then 7-10
OS Background Path: Read Theory Primer 4-6 first, then Projects 1-4
Timing Obsessed Path: Projects 2, 4, 5, 10 first, then the rest

Success Metrics

You can explain the Cortex-M exception stack frame from memory
You can implement a context switch in 20 lines of assembly
You can measure task latency and jitter with GPIO and a scope
Your kernel can run at least 5 tasks with deterministic timing
You can demonstrate and resolve priority inversion

Optional Appendices

Appendix A: GDB/OpenOCD Debugging Cheatsheet

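# Terminal 1: start OpenOCD (GDB server listens on port 3333)
openocd -f interface/stlink.cfg -f target/stm32f4x.cfg
# Terminal 2: connect GDB, halt the target, and flash the ELF
arm-none-eabi-gdb build.elf
(gdb) target remote :3333
(gdb) monitor reset halt
(gdb) load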

Appendix B: Measuring Latency with GPIO

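Set a GPIO high at ISR entry and low at ISR exit
Measure the pulse width with a scope: this is the ISR execution time
Latency = time from the triggering event (on a second scope channel) to the rising edge at ISR entry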

Project Overview Table

Project	Core Output	Core Concepts	Difficulty
1. Bare-Metal Hello World	LED blink, custom linker/startup	Boot + memory map	Advanced
2. System Tick Interrupt	1 ms SysTick + ISR	Interrupts + SysTick	Advanced
3. Cooperative Scheduler	Two tasks, manual yield	Context switch + TCB	Expert
4. Preemptive Scheduler	Priority preemption	Scheduling + SysTick	Expert
5. Sleep/Delay + Idle Task	Tick-based sleep	Time services	Expert
6. Mutex + Priority Inversion	Priority inheritance demo	Sync + scheduling	Expert
7. Message Queue + ISR Deferral	Producer-consumer queue	IPC + ISR safety	Expert
8. Event Flags + Timers	Periodic timer callbacks	Time + IPC	Expert
9. Memory Pool + Stack Safety	Deterministic allocation	Memory mgmt	Expert
10. Latency Measurement Toolkit	Jitter and latency report	Real-time analysis	Expert

Project List

Project 1: The Bare-Metal “Hello, World”

Real World Outcome

You will flash a raw ELF/bin to the MCU and see a single LED blink at a fixed interval. When you connect GDB, you can set a breakpoint at main and single-step through register writes. Example terminal session:

$ make flash
Open On-Chip Debugger 0.12.0
Info : Listening on port 3333 for gdb connections
Info : stm32f4x.cpu: hardware has 6 breakpoints, 4 watchpoints
wrote 8192 bytes from file build/rtos.elf in 0.542s

The Core Question You’re Answering

How does a CPU start running code with no operating system, and how do I control hardware directly?

Concepts You Must Understand First

Boot + memory map (Theory Primer 2; Making Embedded Systems 2nd ed Ch. 3-4)
Interrupts + vector table basics (Primer 3; Making Embedded Systems 2nd ed Ch. 5)
Memory-mapped I/O (Primer 2; Making Embedded Systems 2nd ed Ch. 4)

Questions to Guide Your Design

Where is the vector table placed and how is it aligned?
Which GPIO register controls the LED pin?
How will you create a delay without a timer?

Thinking Exercise

Draw the exact memory map (Flash + RAM) and mark where .text, .data, .bss, and your stack reside.

The Interview Questions They’ll Ask

What is in the first two words of the vector table?
Why must volatile be used for peripheral registers?
What does a linker script do?
Why do you clear .bss on reset?

Hints in Layers

Start by writing a minimal startup.s with vector table and reset handler.
Use the reference manual to find RCC and GPIO registers.
Toggle the pin using BSRR to avoid read-modify-write hazards.
Add a delay loop only after GPIO output works.
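The BSRR hint can be made concrete with a few pure helpers. On STM32F4 parts, writing 1 to bit n of GPIOx_BSRR sets pin n and writing 1 to bit n+16 resets it; zero bits are ignored, so no read-modify-write of the output register is needed. The helper names are hypothetical.

```c
#include <stdint.h>

/* BSRR word that drives one pin high (bits 0..15 are "set" bits). */
static inline uint32_t bsrr_set(unsigned pin)
{
    return 1u << pin;
}

/* BSRR word that drives one pin low (bits 16..31 are "reset" bits). */
static inline uint32_t bsrr_reset(unsigned pin)
{
    return 1u << (pin + 16u);
}

/* BSRR word that flips a pin, given the current output data register
 * (ODR) value. Only the chosen pin is affected by the write. */
uint32_t bsrr_toggle_word(uint32_t odr, unsigned pin)
{
    return (odr & (1u << pin)) ? bsrr_reset(pin) : bsrr_set(pin);
}
```

On target the result would be written to the port register, for example `GPIOD->BSRR = bsrr_toggle_word(GPIOD->ODR, 12);`, where port D pin 12 is only an assumed LED location; check your board's schematic.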

Books That Will Help

Book	Chapters	Why
Making Embedded Systems (2nd ed)	Ch. 3-4	Board bring-up, I/O, timers
The GNU Make Book	Ch. 1-3	Build system basics

Common Pitfalls & Debugging

Problem: LED never blinks

Why: GPIO clock not enabled in RCC
Fix: Set the correct bit in RCC_AHB1ENR
Quick test: Read back RCC_AHB1ENR in GDB

Problem: HardFault on startup

Why: Stack pointer invalid or vector table misaligned
Fix: Verify linker script and vector table address
Quick test: Inspect SP after reset in GDB

Definition of Done

Vector table is in correct flash location
Reset handler sets up .data and .bss
GPIO configured correctly
LED blinks at a visible rate

Project 2: The System Tick Interrupt

Real World Outcome

Your firmware prints a tick counter over UART every second, and a GPIO toggles on every 1 ms tick (1,000 toggles per second, which appears as a 500 Hz square wave on a scope). Example UART output:

[000001000] tick=1000
[000002000] tick=2000
[000003000] tick=3000

The Core Question You’re Answering

How do I create a precise hardware timebase that can drive an RTOS scheduler?

Concepts You Must Understand First

SysTick registers and reload values (Primer 3; Making Embedded Systems 2nd ed Ch. 4)
Interrupt entry/exit timing (Primer 3; Real-Time Concepts Ch. 10)
Real-time timing fundamentals (Primer 1; Real-Time Concepts Ch. 1,4)

Questions to Guide Your Design

What reload value yields a 1 ms interrupt?
How do you minimize ISR execution time?
How will you verify tick accuracy?

Thinking Exercise

Calculate the maximum jitter introduced if your SysTick ISR takes 5 us on a 1 ms tick.

The Interview Questions They’ll Ask

Why is SysTick preferred for RTOS tick generation?
What happens if SysTick fires while interrupts are disabled?
How can you detect if your ISR is too slow?
What is COUNTFLAG used for?

Hints in Layers

Use CMSIS SysTick_Config(SystemCoreClock/1000).
Toggle a GPIO at ISR entry to measure timing.
Keep ISR under 10 us to minimize jitter.
Store tick counter in volatile.
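The reload arithmetic behind the first hint can be sketched as a pure function. SysTick counts from RELOAD down to 0, so one period is RELOAD + 1 core cycles, and RELOAD is a 24-bit field. The function name and the example clock rates are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* RELOAD is a 24-bit field */

/* Computes the RELOAD value for a desired tick rate.
 * Returns false if the rate is zero, faster than the core clock,
 * or too slow to fit in 24 bits. */
bool systick_reload_for(uint32_t core_hz, uint32_t tick_hz, uint32_t *reload)
{
    if (tick_hz == 0u || core_hz < tick_hz) {
        return false;
    }
    uint32_t cycles = core_hz / tick_hz;       /* cycles per tick */
    if (cycles - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = cycles - 1u;                     /* counter runs RELOAD..0 */
    return true;
}
```

With an assumed 168 MHz core clock, a 1 kHz tick gives a reload of 167999. A 1 Hz tick does not fit in 24 bits at that clock, which is one reason long delays are counted in ticks rather than programmed directly into SysTick.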

Books That Will Help

Book	Chapters	Why
Real-Time Concepts for Embedded Systems	Ch. 11	Timer services
Making Embedded Systems (2nd ed)	Ch. 4-5	Timers and interrupts

Common Pitfalls & Debugging

Problem: SysTick never fires

Why: CTRL not enabling interrupt
Fix: Set ENABLE and TICKINT bits
Quick test: Read SysTick CTRL in GDB

Problem: Tick count drifts

Why: Wrong SystemCoreClock value
Fix: Verify clock config (PLL, prescalers)
Quick test: Compare with scope measurement

Definition of Done

SysTick ISR fires every 1 ms
GPIO toggle measured at 1 kHz (a 500 Hz square wave on the scope)
Tick counter increments correctly
UART reports tick values without drift

Project 3: A Cooperative Multi-Tasking Scheduler

Real World Outcome

Two tasks blink LEDs independently. The scheduler switches tasks when each calls yield(). GDB shows separate stack pointers per task. You can add a third task without changing kernel logic.

The Core Question You’re Answering

How does a context switch work at the register and stack level?

Concepts You Must Understand First

Context switching and stack frames (Primer 4; Real-Time Concepts Ch. 5)
Boot and memory layout (Primer 2; Making Embedded Systems 2nd ed Ch. 3)
Task basics and states (Primer 5; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

Where will each task’s stack live?
How will you initialize a task stack frame?
What registers must be saved/restored manually?

Thinking Exercise

Sketch the stack of a newly created task and label each saved register.

The Interview Questions They’ll Ask

Why are R4-R11 saved manually in PendSV?
What is the difference between MSP and PSP?
How does the scheduler decide the next task?
What happens if two tasks share a stack?

Hints in Layers

Start with a TCB containing only a stack pointer.
Build a fake exception frame for new tasks.
Use PendSV to save/restore registers.
Verify context switch by watching R4 values per task.
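The fake exception frame from the second hint might be built like this. It is a sketch under assumptions: no FPU context is stacked, the switch code restores R4-R11 from the lowest addresses before the exception return, and the caller supplies an 8-byte-aligned stack top. task_stack_init and exit_handler are hypothetical names, and addresses are passed as uint32_t because Cortex-M pointers are 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

#define INITIAL_XPSR 0x01000000u  /* only the Thumb bit (bit 24) set */

/* Builds the initial 16-word context and returns the task's saved SP.
 * Layout from the returned SP upward:
 *   R4..R11, R0, R1, R2, R3, R12, LR, PC, xPSR */
uint32_t *task_stack_init(uint32_t *stack, size_t words,
                          uint32_t entry, uint32_t exit_handler, uint32_t arg)
{
    uint32_t *sp = stack + words;  /* full-descending: start past the top */

    /* Frame the hardware pops on exception return (high to low). */
    *--sp = INITIAL_XPSR;          /* xPSR */
    *--sp = entry & ~1u;           /* PC: task entry, bit 0 cleared */
    *--sp = exit_handler;          /* LR: where the task lands if it returns */
    *--sp = 0u;                    /* R12 */
    *--sp = 0u;                    /* R3 */
    *--sp = 0u;                    /* R2 */
    *--sp = 0u;                    /* R1 */
    *--sp = arg;                   /* R0: first argument to the task */

    /* Registers the switch code restores in software: R11 down to R4. */
    for (int i = 0; i < 8; i++) {
        *--sp = 0u;
    }
    return sp;                     /* store this in the TCB */
}
```

Filling the register slots with recognizable values instead of zeros (for example 0x04040404 for R4) makes the frame easier to spot in a GDB memory dump while debugging the first switch.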

Books That Will Help

Book	Chapters	Why
Real-Time Concepts for Embedded Systems	Ch. 4-5	RTOS task basics
Making Embedded Systems (2nd ed)	Ch. 6	Managing flow of activity

Common Pitfalls & Debugging

Problem: Task crashes on first run

Why: Incorrect initial stack frame (xPSR Thumb bit missing)
Fix: Set xPSR to 0x01000000
Quick test: Inspect stack frame in memory

Problem: Tasks overwrite each other

Why: Stacks overlap in RAM
Fix: Reserve distinct stack arrays
Quick test: Fill stacks with patterns and check overlap

Definition of Done

At least two tasks run and yield correctly
Each task has its own stack
Context switch saves and restores registers
Scheduler scales to N tasks

Project 4: A Preemptive, Priority-Based Scheduler

Real World Outcome

You will see a high-priority task preempt a lower-priority task on every tick. A GPIO toggled in the high-priority task interrupts the low-priority blink pattern. This confirms preemption.

The Core Question You’re Answering

How does the OS forcefully take control of the CPU to meet deadlines?

Concepts You Must Understand First

SysTick interrupts (Primer 3; Real-Time Concepts Ch. 10-11)
Scheduling and priorities (Primer 5; Real-Time Concepts Ch. 4-5)
Context switching (Primer 4; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

How do you trigger PendSV from SysTick?
How do you manage READY/RUNNING states safely?
How do you ensure the highest priority task runs immediately?

Thinking Exercise

Simulate three tasks (priorities 3, 2, 1) and draw the execution timeline when a high-priority task becomes ready mid-tick.

The Interview Questions They’ll Ask

Why is PendSV given the lowest priority?
What is priority inversion and how would it show up here?
How can you ensure preemption happens immediately?
What happens if no task is READY?

Hints in Layers

In SysTick ISR, set PendSV pending bit.
Maintain task state array and select highest READY.
Add an idle task with lowest priority.
Verify preemption with GPIO toggles.
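The selection step from the hints can be sketched as a scan over a task array. The names are hypothetical, and the sketch assumes that larger numbers mean higher priority and that tasks[0] is the idle task, which has the lowest priority and never blocks.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED } task_state_t;

typedef struct {
    task_state_t state;
    uint8_t prio;  /* larger number = higher priority (assumed convention) */
} tcb_t;

/* Returns the index of the highest-priority runnable task.
 * tasks[0] is the idle task, so there is always a valid answer.
 * On equal priority the lower index wins. */
size_t pick_next_task(const tcb_t *tasks, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (tasks[i].state != TASK_BLOCKED && tasks[i].prio > tasks[best].prio) {
            best = i;
        }
    }
    return best;
}
```

On target, the SysTick handler would typically not switch directly. It requests the switch by pending PendSV, which with CMSIS is `SCB->ICSR = SCB_ICSR_PENDSVSET_Msk;`, and the PendSV handler then calls the selection function and swaps stacks.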

Books That Will Help

Book	Chapters	Why
Real-Time Concepts for Embedded Systems	Ch. 4-5	RTOS scheduling
Making Embedded Systems (2nd ed)	Ch. 5-6	Interrupts and flow of activity

Common Pitfalls & Debugging

Problem: Preemption never happens

Why: PendSV priority too high
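Fix: Set PendSV to the lowest priority (it lives in the SCB system handler priority register SHPR3; the CMSIS call NVIC_SetPriority accepts PendSV_IRQn)
Quick test: Check the PendSV priority field in SCB SHPR3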

Problem: Kernel crashes in ISR

Why: Re-entrant scheduler or interrupts not masked
Fix: Use critical sections around scheduler data
Quick test: Disable interrupts during task list update

Definition of Done

SysTick triggers preemption
Highest priority READY task runs immediately
Idle task runs when no tasks are ready
Task states are consistent under load

Project 5: Sleep, Delay, and Idle Task

Real World Outcome

Tasks can call sleep_ms(100) and reliably resume. An idle task runs when no work is available and toggles a GPIO slowly. Tick-based timing is visible on a scope.

The Core Question You’re Answering

How do tasks block without busy-waiting and still meet deadlines?

Concepts You Must Understand First

Time services and delays (Primer 7; Real-Time Concepts Ch. 11)
Scheduling and task states (Primer 5; Real-Time Concepts Ch. 4-5)
Tick wraparound handling (Primer 7; Real-Time Concepts Ch. 9,11)

Questions to Guide Your Design

How will you track wake-up ticks?
How do you handle tick counter overflow?
How does the idle task reduce power?

Thinking Exercise

Design a data structure to store sleeping tasks efficiently. Compare a linear scan vs sorted list.

The Interview Questions They’ll Ask

Why is busy-waiting bad in RTOS design?
What happens when tick counter wraps around?
How does an idle task improve power efficiency?
How do you ensure a sleeping task wakes exactly on time?

Hints in Layers

Store wake_tick in each TCB.
Each tick, check for expired tasks and mark READY.
Use unsigned arithmetic to handle wraparound.
Add idle hook to enter low-power mode.

Books That Will Help

Book	Chapters	Why
Real-Time Concepts for Embedded Systems	Ch. 11	Timer services
Zephyr RTOS Embedded C Programming	Ch. 2	RTOS fundamentals

Common Pitfalls & Debugging

Problem: Tasks never wake

Why: Incorrect tick comparison with overflow
Fix: Use if ((int32_t)(now - wake) >= 0)
Quick test: Force tick near overflow and test

Problem: Idle task runs too often

Why: READY tasks never marked correctly
Fix: Validate state transitions
Quick test: Log state changes via UART

Definition of Done

Tasks can sleep for N ticks
Idle task runs when all others are blocked
Tick overflow handled correctly
Measured sleep times are accurate

Project 6: Mutexes and Priority Inversion Demo

Real World Outcome

You will create three tasks (low, medium, high priority). The low task locks a mutex, the high task blocks on it, and the medium task runs and delays the high task. Then you enable priority inheritance and observe the fix.

The Core Question You’re Answering

How do you prevent a low-priority task from breaking real-time guarantees?

Concepts You Must Understand First

Synchronization and mutexes (Primer 6; Real-Time Concepts Ch. 6,15)
Scheduling and priorities (Primer 5; Real-Time Concepts Ch. 4-5)
Task states and blocking (Primer 5; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

How does a mutex track its owner?
How will you implement priority inheritance?
What happens when a task holding a mutex blocks?

Thinking Exercise

Draw a timing diagram showing priority inversion with and without inheritance.

The Interview Questions They’ll Ask

What is priority inversion?
How does priority inheritance work?
Why is priority inheritance typically limited to mutexes?
Can priority inheritance cause deadlocks?

Hints in Layers

Add owner and lock_count to the mutex struct.
When high-priority task blocks, temporarily boost owner priority.
Restore original priority on unlock.
Log priority changes to UART for visibility.

Books That Will Help

Book	Chapters	Why
Real-Time Concepts for Embedded Systems	Ch. 6	Semaphores and mutexes
Zephyr RTOS Embedded C Programming	Ch. 4	Multithreading and synchronization

Common Pitfalls & Debugging

Problem: Deadlock after inheritance

Why: Recursive lock without support
Fix: Either disallow or implement recursive mutex
Quick test: Add assertions on lock count

Problem: Priority never restored

Why: Missing restore path on unlock
Fix: Store original priority in mutex
Quick test: Print priority before/after unlock

Definition of Done

Priority inversion reproduced and measured
Priority inheritance fixes the inversion
Mutex ownership and unlock correctness verified
No deadlocks under stress

Project 7: Message Queue and ISR Deferral

Real World Outcome

An ISR pushes sensor data into a queue, and a lower-priority task processes it. UART logs show clean producer-consumer flow without dropped messages.

The Core Question You’re Answering

How do you safely move data from an interrupt to a task without losing determinism?

Concepts You Must Understand First

Interrupt model and ISR constraints (Primer 3; Real-Time Concepts Ch. 10)
IPC queues (Primer 6; Real-Time Concepts Ch. 7)
Task wakeup mechanics (Primer 5; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

How will ISR-safe queue operations differ from task-level operations?
What happens when the queue is full?
How will you wake a blocked consumer task?

Thinking Exercise

Design a ring buffer with head/tail indices and explain how to avoid race conditions.

The Interview Questions They’ll Ask

Why should ISRs not do heavy processing?
How do you make a queue ISR-safe?
What is a deferred interrupt?
How do you handle queue overflow?

Hints in Layers

Use a fixed-size ring buffer with power-of-two size.
In ISR, disable interrupts only around index updates.
If queue is full, drop oldest or increment a loss counter.
Wake the consumer task by setting its state to READY.
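The first three hints can be combined into a single-producer, single-consumer ring. This sketch assumes one ISR writes and one task reads on a single-core MCU, uses free-running indices masked by a power-of-two size, and drops the newest sample when full while counting the loss (the drop-oldest variant is not shown). All names are hypothetical.

```c
#include <stdint.h>

#define RB_SIZE 8u              /* must be a power of two */
#define RB_MASK (RB_SIZE - 1u)

typedef struct {
    volatile uint16_t data[RB_SIZE];
    volatile uint32_t head;     /* written only by the producer (ISR) */
    volatile uint32_t tail;     /* written only by the consumer (task) */
    volatile uint32_t dropped;  /* samples rejected because the ring was full */
} isr_ring_t;

/* ISR side: never blocks. Returns 1 on success, 0 if the sample was dropped. */
int ring_push_from_isr(isr_ring_t *r, uint16_t sample)
{
    if ((uint32_t)(r->head - r->tail) == RB_SIZE) {
        r->dropped++;
        return 0;
    }
    r->data[r->head & RB_MASK] = sample;
    r->head = r->head + 1u;     /* publish only after the data is stored */
    return 1;
}

/* Task side: returns 1 and fills *out, or 0 if the ring is empty. */
int ring_pop(isr_ring_t *r, uint16_t *out)
{
    if (r->head == r->tail) {
        return 0;
    }
    *out = r->data[r->tail & RB_MASK];
    r->tail = r->tail + 1u;     /* free the slot only after the data is read */
    return 1;
}
```

Because each index has exactly one writer, this particular layout needs no interrupt masking at all. As soon as a second producer or consumer is added, the index updates must go back inside a critical section.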

Books That Will Help

Book	Chapters	Why
Real-Time Concepts for Embedded Systems	Ch. 7, 10	Message queues and interrupts
Making Embedded Systems (2nd ed)	Ch. 5-6	Interrupts and flow control

Common Pitfalls & Debugging

Problem: Queue corrupted after ISR

Why: Non-atomic head/tail update
Fix: Use critical section or atomic ops
Quick test: Enable queue integrity checks

Problem: Consumer never wakes

Why: Missing state transition or PendSV trigger
Fix: Set task READY and trigger scheduler
Quick test: Set breakpoint in scheduler

Definition of Done

ISR can enqueue data without blocking
Consumer task processes all data
Overflow behavior is defined and tested
No race conditions observed

Project 8: Event Flags and Software Timers

Real World Outcome

Multiple tasks wait on event flags (bitmask). A software timer fires every 100 ms and sets a flag; another timer provides a 1-second heartbeat. LEDs and UART logs confirm flag-driven execution.

The Core Question You’re Answering

How do you coordinate multiple conditions and periodic events efficiently?

Concepts You Must Understand First

Event flags and kernel objects (Primer 6; Real-Time Concepts Ch. 8,15)
Software timers and timeouts (Primer 7; Real-Time Concepts Ch. 11)
Scheduling under time constraints (Primer 5; Real-Time Concepts Ch. 4-5)

Questions to Guide Your Design

How will you store and atomically update event flags?
What data structure will manage multiple timers?
How will you handle missed or delayed timer events?

Thinking Exercise

Design a bitmask-based wait that supports wait-any and wait-all semantics.

The Interview Questions They’ll Ask

What is the difference between a queue and event flags?
How do you implement periodic software timers?
Why should timer callbacks be short?
How do you avoid timer drift?

Hints in Layers

Use a 32-bit mask for event flags.
Support wait-any by unblocking when (flags & mask) != 0.
Use a sorted timer list for efficiency.
Use absolute next-fire time to reduce drift.
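The wait-any and wait-all predicates from the hints can be sketched as two small functions. The names are hypothetical, the mask is assumed to be non-zero, and a kernel would call flags_try_take inside a critical section so that the test and the clear happen atomically.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { WAIT_ANY, WAIT_ALL } wait_mode_t;

/* True when the current flags satisfy the wait condition. */
bool flags_satisfied(uint32_t flags, uint32_t mask, wait_mode_t mode)
{
    if (mode == WAIT_ALL) {
        return (flags & mask) == mask;
    }
    return (flags & mask) != 0u;
}

/* If the condition holds, clears the matched bits and returns them.
 * Returns 0 (and leaves the flags untouched) if the caller must keep waiting. */
uint32_t flags_try_take(uint32_t *flags, uint32_t mask, wait_mode_t mode)
{
    if (!flags_satisfied(*flags, mask, mode)) {
        return 0u;
    }
    uint32_t hit = *flags & mask;
    *flags &= ~hit;
    return hit;
}
```

Clearing only the bits that were actually matched keeps unrelated flags sticky for other waiters, which addresses the "flag cleared before task unblocks" pitfall for tasks that wait on different bits.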

Books That Will Help

Book	Chapters	Why
Real-Time Concepts for Embedded Systems	Ch. 8, 11	Kernel objects and timers
Zephyr RTOS Embedded C Programming	Ch. 5	Work queues and messaging

Common Pitfalls & Debugging

Problem: Flags missed

Why: Flag cleared before task unblocks
Fix: Use sticky bits until acknowledged
Quick test: Add logging around flag set/clear

Problem: Timer drift

Why: Next trigger based on current time instead of absolute
Fix: Add period to scheduled time
Quick test: Compare timestamps over 1000 cycles

Definition of Done

Event flags support wait-any and wait-all
Software timers fire at correct periods
Timer callbacks are deterministic
No missed events under load

Project 9: Memory Pool and Stack Safety

Real World Outcome

Your kernel allocates buffers from a fixed memory pool with constant-time behavior. Stack high-water marks are reported over UART, and deliberate overflow triggers a safe error.

The Core Question You’re Answering

How do you make memory usage deterministic and safe in an RTOS?

Concepts You Must Understand First

Memory pools and deterministic allocation (Primer 8; Real-Time Concepts Ch. 13)
Stack safety and sizing (Primer 8; Making Embedded Systems 2nd ed Ch. 11)
Context switching (Primer 4; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

How big should each task stack be?
How will you detect stack overflow at runtime?
What pool sizes are appropriate for IPC messages?

Thinking Exercise

Estimate stack depth for a task that calls three nested functions, each with 64 bytes of local data.
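
One rough way to set up the estimate is sketched below. The 64 bytes of locals and the call depth of three come from the exercise. The 32-byte hardware exception frame (R0-R3, R12, LR, PC, xPSR) applies to Cortex-M without FPU state. The other numbers are assumptions to replace with measurements: 16 bytes of per-call overhead, a context switch that saves R4-R11 on the task stack, and a 25% margin.

```c
#include <stdint.h>

enum {
    LOCALS_PER_CALL  = 64, /* from the exercise */
    CALL_DEPTH       = 3,  /* from the exercise */
    FRAME_OVERHEAD   = 16, /* ASSUMPTION: saved LR, callee-saved regs, padding */
    HW_EXC_FRAME     = 32, /* R0-R3, R12, LR, PC, xPSR stacked on exception entry */
    SW_SAVED_CONTEXT = 32, /* ASSUMPTION: R4-R11 pushed by the context switch */
    MARGIN_PERCENT   = 25  /* ASSUMPTION: safety margin */
};

/* Returns an estimated stack size in bytes, rounded up to 8-byte alignment. */
uint32_t stack_estimate_bytes(void)
{
    uint32_t calls = CALL_DEPTH * (LOCALS_PER_CALL + FRAME_OVERHEAD); /* 240 */
    uint32_t base  = calls + HW_EXC_FRAME + SW_SAVED_CONTEXT;         /* 304 */
    uint32_t total = base + (base * MARGIN_PERCENT) / 100u;           /* 380 */
    return (total + 7u) & ~7u;                                        /* 384 */
}
```

Treat the result as a starting point only. Compiler optimization level, FPU use, and interrupts that run on the task stack can all change the real figure, so measure the actual usage once the task runs.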

The Interview Questions They’ll Ask

Why do RTOS kernels often avoid malloc/free?
What is a stack watermark and how is it measured?
How do memory pools improve determinism?
What happens if a stack overflows during an ISR?

Hints in Layers

Fill stacks with a known pattern at init.
Scan stack memory periodically to find the high-water mark.
Build a pool as a linked list of fixed blocks.
On overflow, trigger a safe fault handler.
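
The fill-and-scan hints might look like the sketch below. It assumes a descending stack (as on Cortex-M), so the lowest addresses are the last to be touched, and it uses hypothetical names (`STACK_FILL`, `stack_paint`, `stack_free_words`). The fill value is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u /* arbitrary pattern; any unlikely value works */

/* Fill the whole stack with the pattern before the task first runs. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count untouched words starting at the lowest address (index 0).
 * With a descending stack this is the remaining headroom; the
 * high-water mark is (words - headroom). */
size_t stack_free_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

A result of zero means the pattern at the very bottom is gone, so the task has probably already written past its stack. The scan is linear in stack size, so it belongs in a low-priority task rather than an ISR.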

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 13 | Memory management |
| Making Embedded Systems (2nd ed) | Ch. 11 | Optimization and resource limits |

Common Pitfalls & Debugging

Problem: Pool allocator corrupts memory

Why: Double free or invalid pointer
Fix: Add allocation state flags
Quick test: Stress test with random allocations
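
A minimal sketch of a fixed-block pool with per-block state flags, so that a double free or a stray pointer is rejected instead of corrupting the free list. All names and sizes (`pool_t`, `POOL_BLOCKS`, `POOL_BLOCK_SIZE`) are hypothetical. The sketch assumes the caller provides mutual exclusion, for example a short critical section around each call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     8u
#define POOL_BLOCK_SIZE 32u /* illustrative size */

/* A free block stores the free-list link in its own memory. */
typedef union pool_block {
    union pool_block *next;                  /* valid only while free */
    uint8_t           data[POOL_BLOCK_SIZE]; /* payload while allocated */
} pool_block_t;

typedef struct {
    pool_block_t  blocks[POOL_BLOCKS];
    pool_block_t *free_list;
    bool          in_use[POOL_BLOCKS];       /* allocation state flags */
} pool_t;

void pool_init(pool_t *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        p->blocks[i].next = p->free_list;    /* push block onto free list */
        p->free_list = &p->blocks[i];
        p->in_use[i] = false;
    }
}

/* O(1): pop the head of the free list, or NULL when exhausted. */
void *pool_alloc(pool_t *p)
{
    pool_block_t *b = p->free_list;
    if (b == NULL) {
        return NULL;
    }
    p->free_list = b->next;
    p->in_use[b - p->blocks] = true;
    return b->data;
}

/* O(1): validate the pointer, then push the block back.
 * Returns false for out-of-range, misaligned, or already-free pointers. */
bool pool_free(pool_t *p, void *ptr)
{
    uintptr_t base = (uintptr_t)p->blocks;
    uintptr_t addr = (uintptr_t)ptr;
    if (addr < base || addr >= base + sizeof p->blocks) {
        return false;
    }
    size_t off = (size_t)(addr - base);
    if (off % sizeof(pool_block_t) != 0u) {
        return false;
    }
    size_t idx = off / sizeof(pool_block_t);
    if (!p->in_use[idx]) {
        return false;                        /* double free detected */
    }
    p->in_use[idx] = false;
    p->blocks[idx].next = p->free_list;
    p->free_list = &p->blocks[idx];
    return true;
}
```

Both operations touch a fixed number of words regardless of pool size, which is what makes allocation time constant. A failed `pool_free` is a good place to raise an assertion during development.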

Problem: Stack watermark always reports zero free space

Why: The fill pattern has been overwritten across the whole stack
Fix: Increase stack size, reduce recursion
Quick test: Measure after minimal workload

Definition of Done

Memory pool allocations are constant time
Stack high-water marks reported
Overflow detection triggers error
No heap usage in kernel

Project 10: Latency and Jitter Measurement Toolkit

Real World Outcome

You will generate a report of interrupt latency and task jitter. GPIO pulses show worst-case ISR latency, and UART logs show jitter statistics. This turns your RTOS into a measurable system.

The Core Question You’re Answering

How do you verify that your RTOS actually meets timing guarantees?

Concepts You Must Understand First

Real-time fundamentals and metrics (Primer 1; Real-Time Concepts Ch. 16)
Interrupt timing behavior (Primer 3; Real-Time Concepts Ch. 10)
Time services and tick accuracy (Primer 7; Real-Time Concepts Ch. 11)

Questions to Guide Your Design

What metrics will you capture (latency, jitter, WCET)?
How will you timestamp events without disturbing timing?
How will you report the data?

Thinking Exercise

Design an experiment to measure how much jitter increases when you enable a heavy ISR.

The Interview Questions They’ll Ask

How do you measure interrupt latency on real hardware?
What is the difference between latency and response time?
Why is jitter important in control systems?
How do you confirm worst-case behavior?

Hints in Layers

Toggle a GPIO at ISR entry and exit, and measure the pulse with an oscilloscope.
Use the DWT cycle counter if available for precise timing.
Log timestamps to UART in a low-priority task.
Stress the system with extra ISRs to see worst-case effects.
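
Once raw cycle-counter readings are available, the statistics can be accumulated with a few integer operations. The sketch below uses hypothetical names (`lat_stats_t`, `lat_record`, and so on) and assumes `start` and `end` are reads of a free-running 32-bit counter such as the DWT cycle counter. Reporting jitter as peak-to-peak (max minus min) is one common choice, not the only one.

```c
#include <stdint.h>

/* Running latency statistics, in counter cycles. */
typedef struct {
    uint32_t min;
    uint32_t max;
    uint64_t sum;
    uint32_t count;
} lat_stats_t;

void lat_init(lat_stats_t *s)
{
    s->min = UINT32_MAX;
    s->max = 0u;
    s->sum = 0u;
    s->count = 0u;
}

/* Record one sample. Unsigned subtraction gives the right interval even
 * if the counter wrapped once between start and end. */
void lat_record(lat_stats_t *s, uint32_t start, uint32_t end)
{
    uint32_t d = end - start;
    if (d < s->min) { s->min = d; }
    if (d > s->max) { s->max = d; }
    s->sum += d;
    s->count++;
}

/* Peak-to-peak jitter: worst case minus best case. */
uint32_t lat_jitter(const lat_stats_t *s)
{
    return (s->count > 0u) ? (s->max - s->min) : 0u;
}

uint32_t lat_mean(const lat_stats_t *s)
{
    return (s->count > 0u) ? (uint32_t)(s->sum / s->count) : 0u;
}
```

Keeping `lat_record` to a few compares and adds makes it cheap enough to call from the measured path. Formatting and UART output can then happen later in a low-priority task. Intervals longer than one full counter wrap cannot be measured this way.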

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 16 | Common design problems |
| Making Embedded Systems (2nd ed) | Ch. 4-5 | Timing and interrupt fundamentals |

Common Pitfalls & Debugging

Problem: Measurement itself changes timing

Why: UART logging in ISR adds delay
Fix: Only toggle GPIO in ISR; log later in task
Quick test: Compare latency with and without logging

Problem: Jitter values inconsistent

Why: Non-deterministic interrupts or DMA
Fix: Disable non-essential peripherals during test
Quick test: Run tests in minimal configuration

Definition of Done

Latency measured with GPIO pulses
Jitter statistics reported over UART
Worst-case measurements captured under load
Results documented in a timing report