Learn RTOS from Scratch in C: From Bare Metal to a Preemptive Kernel

Goal: Build a real, working RTOS kernel on a Cortex-M microcontroller and understand every layer that makes it work: reset and boot, memory layout, interrupts, system tick, context switching, task states, scheduling, synchronization, and time services. By the end, you will be able to design and implement a deterministic scheduler, debug interrupt-level code, and reason about latency, jitter, and deadlines like a real embedded systems engineer. You will also understand how commercial RTOS kernels (FreeRTOS, Zephyr, ThreadX, etc.) are structured internally because you will have built their core from scratch.

Introduction

A Real-Time Operating System (RTOS) is a small kernel built to make timing behavior predictable. Unlike a general-purpose OS, an RTOS is designed for deterministic response: tasks must run within known deadlines, often on tiny microcontrollers with tight memory and CPU budgets. This guide walks you from bare-metal C to a fully preemptive RTOS kernel with tasks, interrupts, context switching, and synchronization primitives.

What you will build:

  • A bare-metal firmware with custom startup code and linker script
  • A millisecond system tick driven by SysTick
  • A cooperative scheduler with explicit yields
  • A preemptive, priority-based scheduler driven by interrupts
  • Synchronization primitives (mutexes, semaphores, queues, event flags)
  • Time services (sleep, timeout, periodic timers)
  • Memory pools and stack safety checks
  • Instrumentation for latency measurement and debugging

Scope boundaries:

  • Target architecture: ARM Cortex-M (examples use STM32F4 class MCUs)
  • Kernel only (no filesystem, no networking stack)
  • C and small amounts of ARM assembly for context switching
  • No dynamic memory allocation required (static-only is acceptable)

Big picture diagram

                         APPLICATION TASKS
┌──────────────────────────────────────────────────────────┐
│   Task A     Task B     Task C     Idle/Background       │
│  (sensor)   (control)  (logging)   (low power)           │
└───────────────┬───────────────┬───────────────┬──────────┘
                │ syscalls: sleep/yield/lock/post
                v
┌──────────────────────────────────────────────────────────┐
│                     RTOS KERNEL                          │
│  Scheduler  Task Control Blocks  IPC (mutex/queue)       │
│  Time Mgmt  Priority + States     Context Switch         │
└───────────────┬───────────────────────────┬──────────────┘
                │ SysTick + PendSV + NVIC
                v
┌──────────────────────────────────────────────────────────┐
│                HARDWARE ABSTRACTION LAYER                │
│  SysTick  GPIO  UART  Timers  Interrupt Controller       │
└──────────────────────────────────────────────────────────┘
                │ Memory-mapped registers
                v
┌──────────────────────────────────────────────────────────┐
│                     PHYSICAL HARDWARE                    │
│   Cortex-M CPU   Flash/RAM   Timers   GPIO   UART        │
└──────────────────────────────────────────────────────────┘

How to Use This Guide

  1. Read the Theory Primer as a mini-book. Each concept maps directly to projects.
  2. Build the projects in order. Each project depends on the previous ones.
  3. Use the Project-to-Concept Map to revisit chapters if you get stuck.
  4. Keep your board connected and use OpenOCD + GDB for real debugging.
  5. Keep a lab notebook: record register values, ISR timing, and bugs you fix.

Prerequisites & Background

Essential Prerequisites (Must Have)

  • Solid C programming (pointers, structs, volatile, memory layout)
  • Comfort reading datasheets and reference manuals
  • Basic digital logic and CPU architecture concepts

Helpful But Not Required

  • ARM assembly and calling conventions
  • Experience with embedded toolchains (GCC, Make, GDB)
  • Understanding of OS basics (processes, scheduling)

Self-Assessment Questions

  • Can you explain what volatile means in embedded C?
  • Do you know how a linker script controls memory placement?
  • Can you read a peripheral register map and configure a GPIO pin?
  • Do you understand why interrupts are asynchronous to normal execution?

Development Environment Setup

  • Board: STM32F4 (Black Pill or Nucleo)
  • Debugger: ST-Link V2
  • Toolchain: arm-none-eabi-gcc, make, openocd, arm-none-eabi-gdb
  • Optional: Logic analyzer or oscilloscope for timing verification

Time Investment

  • Project 1-2: 1 weekend
  • Project 3-4: 2-3 weeks total (context switch is the hard part)
  • Project 5-10: 3-5 weeks depending on debugging depth

Important Reality Check

This is systems programming at the metal. You will hit hard faults, lock up the board, and debug memory corruption. That is the point. Expect frustration and be systematic: change one thing at a time, instrument, and verify.

Big Picture / Mental Model

Think of your RTOS as a loop of time + state + decision:

Tick Interrupt --> Update timers --> Choose next task --> Context switch
      ^                                                   |
      |                                                   v
  Hardware clock                                   Task runs until:
      |                                      - yields
      |                                      - blocks
      +-------------------------------------- - is preempted

The kernel is essentially the code that runs between those arrows.

Theory Primer

1) Real-Time Fundamentals and Timing Guarantees

Fundamentals

Real-time systems are not about running fast; they are about running predictably. A task is real-time when it has a deadline or maximum response time, and missing that deadline is a failure (hard real-time) or a degradation (soft real-time). Key quantities include latency (time between event and response), jitter (variation in timing), and worst-case execution time (WCET). Determinism is more important than raw throughput. In embedded systems, deadlines often come from the physical world: a motor control loop must update at a fixed period, a sensor must be sampled before data becomes stale, or a safety signal must be handled within microseconds.

Deep Dive into the Concept

A real-time system is a scheduling problem constrained by physics. If you have a periodic task that must run every 1 ms for 50 us, the processor must be available every millisecond, or the output becomes invalid. This means you must reason about utilization (total CPU time consumed by tasks), priority (which task runs first when multiple are ready), and blocking (time spent waiting for resources). In a bare-metal loop, your timing is implicit: code runs in a fixed order, and interrupts can preempt at any point. In an RTOS, timing becomes explicit: tasks are created, priorities are assigned, and the kernel enforces a deterministic order.

Real-time theory gives you two key mental tools. First, schedulability analysis: can the CPU meet all deadlines? Classic results for Rate Monotonic Scheduling (RMS) and Earliest Deadline First (EDF) answer that under explicit assumptions, such as independent periodic tasks whose deadlines equal their periods. Second, response-time analysis: given a task and its priority, what is the worst-case time from event to completion? Even if you do not run formal proofs, you must think in these terms to design a reliable kernel. Your RTOS is the mechanism that enforces the priorities and timing rules those analyses assume.
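
To make response-time analysis concrete, here is a minimal sketch of the classic fixed-priority recurrence R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j, iterated until R stops changing. The `rt_task` struct, the function name, and the microsecond units are assumptions made for illustration, and the model ignores blocking and context-switch overhead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task model: times in microseconds, deadline == period. */
struct rt_task {
    uint32_t wcet_us;    /* worst-case execution time C */
    uint32_t period_us;  /* period T (also the deadline here) */
};

/*
 * Worst-case response time of tasks[i], where tasks[0..i-1] all have
 * higher priority. Returns false if the result would exceed the period.
 */
bool response_time(const struct rt_task *tasks, size_t i, uint32_t *out_us)
{
    assert(tasks != NULL && out_us != NULL);
    uint32_t r = tasks[i].wcet_us;
    for (;;) {
        uint32_t next = tasks[i].wcet_us;
        for (size_t j = 0; j < i; j++) {
            /* ceil(r / T_j): how many times task j can be released within r */
            uint32_t releases =
                (r + tasks[j].period_us - 1u) / tasks[j].period_us;
            next += releases * tasks[j].wcet_us;
        }
        if (next > tasks[i].period_us) {
            return false;            /* deadline missed in the worst case */
        }
        if (next == r) {
            *out_us = r;             /* fixed point reached */
            return true;
        }
        r = next;
    }
}
```

For tasks of (50 us, 1 ms), (200 us, 5 ms), and (1 ms, 20 ms) listed in priority order, the recurrence settles at 50 us, 250 us, and 1300 us respectively.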

In practice, embedded engineers often use a hybrid approach: priority-based preemption for critical tasks, while lower priority tasks run when the CPU is idle. The system tick drives periodic scheduling decisions, while interrupts handle urgent asynchronous events. Understanding the differences between hard, firm, and soft real-time allows you to choose appropriate design tradeoffs. For example, missing a motor control update could damage hardware (hard), while missing a UI update just causes stutter (soft).

How This Fits on Projects

All projects depend on timing. Projects 2, 4, 5, 8, and 10 explicitly measure or enforce timing; projects 6 and 7 show how timing can be destroyed by blocking or priority inversion.

Definitions & Key Terms

  • Deadline: latest acceptable completion time for a task
  • Latency: time from event to response
  • Jitter: variation in response timing
  • WCET: worst-case execution time
  • Hard Real-Time: missed deadline is a failure
  • Firm Real-Time: a late result is useless, but an occasional miss is tolerable
  • Soft Real-Time: missed deadline is a degradation

Mental Model Diagram

Event occurs ----> [Latency] ----> Task executes ----> Deadline
                     ^                 ^
                     |                 |
                   jitter          execution time

How It Works (Step-by-Step)

  1. An external event (sensor interrupt) or time event (tick) occurs.
  2. The system captures the event and marks a task ready.
  3. Scheduler decides if the ready task should preempt the current one.
  4. Task runs and completes before its deadline.
  5. The system records timing to verify jitter and latency bounds.

Minimal Concrete Example

// Hard real-time: 1 ms motor control loop
#define PERIOD_TICKS 1
void motor_task(void) {
    while (1) {
        control_step();      // must finish in <1 ms
        sleep_ticks(PERIOD_TICKS);
    }
}

Common Misconceptions

  • “Real-time means fast” (it means predictable)
  • “An RTOS guarantees deadlines” (only if you design correctly)
  • “If average utilization < 100%, it’s fine” (worst-case matters)

Check-Your-Understanding Questions

  1. What is the difference between latency and jitter?
  2. Why is WCET more important than average execution time?
  3. When can soft real-time be acceptable?

Check-Your-Understanding Answers

  1. Latency is the delay from event to response; jitter is the variation in that delay.
  2. A single worst-case overrun can violate deadlines even if average time is low.
  3. When occasional deadline misses only reduce quality, not safety or correctness.

Real-World Applications

  • Motor control loops in robotics
  • Engine control units in automotive
  • Medical devices that must respond to alarms within strict bounds

Where You’ll Apply It

Projects 2, 4, 5, 8, and 10.

References

  • RTOS task priorities and immediate switch behavior: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__ThreadMgmt.html
  • IoT device scale and growth context (for why real-time matters): https://iot-analytics.com/number-connected-iot-devices/

Key Insight

Determinism, not speed, defines real-time success.

Summary

Real-time engineering is about meeting deadlines predictably. You must understand latency, jitter, and WCET and design your kernel to enforce timing rules.

Homework/Exercises to Practice the Concept

  1. Measure the jitter of a 1 kHz loop using GPIO toggles and a logic analyzer.
  2. Compute CPU utilization for three periodic tasks and determine if deadlines are feasible.
  3. Identify a system in your life that is hard real-time and justify why.

Solutions to the Homework/Exercises

  1. Toggle a pin at loop entry and exit, measure timing variance with a scope.
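  2. Divide each task's execution time by its period and add the results; the total must stay below the bound for your policy (about 0.78 for three tasks under rate-monotonic priorities, 1.0 under EDF).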
  3. Example: airbag deployment must respond within milliseconds or fails safety.
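
The utilization exercise can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, the table holds the Liu and Layland bound n(2^(1/n) - 1) for one to four tasks (rounded down), and passing the test is sufficient but not necessary for schedulability under rate-monotonic priorities.

```c
#include <assert.h>
#include <stddef.h>

/* Liu & Layland bound n(2^(1/n) - 1) for n = 1..4, rounded down. */
static const double rms_bound[] = { 1.0, 0.828, 0.779, 0.756 };

/* Total utilization U = sum(C_i / T_i); both arrays use the same time unit. */
double utilization(const double *wcet, const double *period, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(period[i] > 0.0);
        u += wcet[i] / period[i];
    }
    return u;
}

/* Sufficient (not necessary) rate-monotonic test for up to four tasks. */
int rms_guaranteed(double u, size_t n)
{
    assert(n >= 1 && n <= 4);
    return u <= rms_bound[n - 1];
}
```

A task set that fails this test may still be schedulable; response-time analysis gives the exact answer for fixed priorities.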

2) Bare-Metal Boot, Memory Map, and Linker Control

Fundamentals

On a microcontroller, there is no OS to prepare your program. On reset, the CPU loads an initial stack pointer and the reset handler address from the start of the vector table, then jumps to your reset handler. You must provide a startup file and linker script to describe where code and data live in flash and RAM. This is where your RTOS begins: if the vector table is wrong, interrupts will never work; if the stack is misplaced, context switches will crash.

Deep Dive into the Concept

The Cortex-M boot sequence is deterministic. At reset, the CPU reads the first two 32-bit words of the vector table: the initial stack pointer and the reset handler address. This means your linker script must place the vector table where the CPU looks for it: address 0x00000000, which on STM32 parts booting from flash is an alias of 0x08000000 (or a remapped location). Your startup code sets up .data (copying initialized data from flash to RAM) and clears .bss. After that, it calls main.
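
The .data copy and .bss clear can be sketched as one small function. The version below takes plain pointers so it can be exercised on a host; on the target, the arguments would come from linker-script symbols. The symbol names in the trailing comment (`_sidata`, `_sdata`, and so on) are a common convention and an assumption here, not something this guide's linker script defines yet.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Copy initialized data from its load address (flash) into RAM,
 * then zero the .bss range. All ranges are word-aligned.
 */
void init_ram(const uint32_t *data_load,
              uint32_t *data_start, uint32_t *data_end,
              uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_start <= data_end && bss_start <= bss_end); /* host-side sanity check */
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }
    while (bss_start < bss_end) {
        *bss_start++ = 0u;
    }
}

/*
 * On target (symbol names are assumptions about your linker script):
 *
 *   extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
 *   void Reset_Handler(void) {
 *       init_ram(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
 *       main();
 *       for (;;) {}
 *   }
 */
```

The function touches no global variables, which matters because globals are not valid until it has finished.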

In an RTOS, you will add additional sections: task stacks, TCB arrays, and possibly memory pools. You must decide which objects live in RAM and which in flash. You must also manage stack alignment and avoid overlap with the heap or other buffers. Linker control is your guarantee that each task has a safe, dedicated stack region.

This chapter is where you learn why embedded firmware is not just C code. The binary is a memory layout. If you understand linker scripts, you can create multiple memory regions, reserve space for a bootloader, and place the vector table in a custom region. Every bug here is catastrophic: wrong addresses cause hard faults that seem mysterious until you realize your stack pointer points into flash.

How This Fits on Projects

Project 1 builds the linker script and startup. Projects 3-4 depend on stack layout and vector table correctness.

Definitions & Key Terms

  • Vector table: array of pointers to exception handlers
  • Reset handler: first code executed after reset
  • Linker script: map of memory regions and section placement
  • .data / .bss: initialized and zero-initialized data segments

Mental Model Diagram

Flash (ROM)                         RAM
┌───────────────┐             ┌───────────────┐
│ Vector table  │----+        │ .data         │
│ .text (code)  │    | copy   │ .bss (zero)   │
│ .rodata       │    +------> │ stacks        │
└───────────────┘             └───────────────┘
       ^ reset reads SP, PC

How It Works (Step-by-Step)

  1. CPU reads initial SP and reset handler from vector table.
  2. Reset handler sets stack pointer and initializes RAM sections.
  3. C runtime (minimal) is prepared.
  4. main() runs and your RTOS initialization begins.

Minimal Concrete Example

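/* linker.ld */
/* Symbol names (Reset_Handler, _estack, _sidata, ...) are a common convention,
   assumed here; your startup code must use the same names. */
ENTRY(Reset_Handler)

MEMORY
{
  FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
  RAM (rwx)  : ORIGIN = 0x20000000, LENGTH = 128K
}

_estack = ORIGIN(RAM) + LENGTH(RAM);   /* initial SP: top of RAM */

SECTIONS
{
  .isr_vector : { KEEP(*(.isr_vector)) } > FLASH
  .text : { *(.text*) *(.rodata*) . = ALIGN(4); } > FLASH
  _sidata = LOADADDR(.data);           /* flash copy of .data */
  .data : { _sdata = .; *(.data*) . = ALIGN(4); _edata = .; } > RAM AT > FLASH
  .bss  : { _sbss = .; *(.bss*) *(COMMON) . = ALIGN(4); _ebss = .; } > RAM
}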

Common Misconceptions

  • “The compiler decides memory layout” (the linker script does)
  • “Stacks are automatic” (you must allocate them)
  • “Vector table can be anywhere” (at reset the CPU fetches it from a fixed address; it can only be relocated afterwards, through VTOR)

Check-Your-Understanding Questions

  1. What happens if .data is not copied to RAM?
  2. Why does the CPU read the initial SP from flash?
  3. What causes a hard fault if the stack pointer is wrong?

Check-Your-Understanding Answers

  1. Variables with initial values will be incorrect.
  2. The CPU must know where the stack begins before code runs.
  3. The first push/pop or interrupt will write to an invalid address.

Real-World Applications

  • Bootloaders that remap the vector table
  • Firmware with multiple memory regions (boot + app)

Where You’ll Apply It

Projects 1, 3, 4, 9.

References

  • Cortex-M4 System timer and registers (for memory map examples): https://manuals.plus/m/a5ff6b2d88bcbc6b3ccdc368778a3c98f71c37239afbd2620c60acbffbbaa1fa

Key Insight

Your RTOS is only as reliable as your memory map.

Summary

The linker script and startup file define the physical reality your kernel runs in. Without them, there is no safe stack, no interrupts, and no RTOS.

Homework/Exercises to Practice the Concept

  1. Modify a linker script to reserve 4 KB for a bootloader.
  2. Add a separate .stack section and place it at the top of RAM.
  3. Relocate the vector table and verify interrupts still fire.

Solutions to the Homework/Exercises

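  1. Move the application FLASH region up by the bootloader size (ORIGIN = 0x08001000 and LENGTH reduced by 4K in the example above).
  2. Add a .stack section and symbols that reserve space at the top of RAM.
  3. Write the new table address to the VTOR register (SCB->VTOR), keeping the table aligned as the reference manual requires.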

3) Interrupts, SysTick, and the Exception Model

Fundamentals

Interrupts are asynchronous events that suspend normal execution. On Cortex-M, interrupts and exceptions are managed by the NVIC and a fixed vector table. SysTick is a built-in 24-bit timer designed to generate periodic interrupts used by RTOS kernels for scheduling. The CPU automatically saves a subset of registers on exception entry and restores them on return, enabling fast, deterministic response.

Deep Dive into the Concept

The Cortex-M exception model is the foundation of a preemptive RTOS. When an interrupt fires, the CPU automatically pushes registers onto the current stack (R0-R3, R12, LR, PC, xPSR) and switches to Handler mode. It then loads the ISR address from the vector table and executes the handler. On return, the processor uses a special EXC_RETURN value to restore state and continue where it left off. This is how tasks can be interrupted and later resumed safely.

SysTick is a 24-bit down-counter that reloads from a programmed value and can generate an interrupt each time it reaches zero. The ARM documentation defines registers such as CTRL (control/status), LOAD (reload value), VAL (current value), and CALIB (calibration). This standardized timer allows your RTOS to be portable across Cortex-M devices. By programming SysTick to fire every 1 ms, you create the heartbeat for timekeeping and preemptive scheduling. The interrupt handler updates the tick count, manages sleeping tasks, and requests a context switch via PendSV when a different task should run.
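
As a rough sketch of what programming those registers looks like without the CMSIS helper, the function below fills in LOAD, VAL, and CTRL and rejects reload values that do not fit in 24 bits. The struct and function names are illustrative assumptions; the register order and the CTRL bit positions follow the ARM SysTick definition. Passing the register block as a pointer lets you try the logic on a host with a fake struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SysTick register block (located at 0xE000E010 on Cortex-M). */
struct systick_regs {
    volatile uint32_t CTRL;   /* control and status */
    volatile uint32_t LOAD;   /* reload value (24 bits) */
    volatile uint32_t VAL;    /* current value; any write clears it */
    volatile uint32_t CALIB;  /* calibration (read-only) */
};

#define SYSTICK_CTRL_ENABLE    (1u << 0)
#define SYSTICK_CTRL_TICKINT   (1u << 1)   /* raise the exception at zero */
#define SYSTICK_CTRL_CLKSOURCE (1u << 2)   /* 1 = processor clock */
#define SYSTICK_MAX_RELOAD     0x00FFFFFFu

/* Start a periodic tick; returns false if the period cannot be represented. */
bool systick_start(struct systick_regs *st, uint32_t core_hz, uint32_t tick_hz)
{
    assert(st != 0 && tick_hz != 0u);
    uint32_t cycles = core_hz / tick_hz;
    if (cycles == 0u || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return false;                 /* does not fit in 24 bits */
    }
    st->LOAD = cycles - 1u;           /* counter runs LOAD..0, so N cycles needs N-1 */
    st->VAL  = 0u;                    /* restart the current count */
    st->CTRL = SYSTICK_CTRL_CLKSOURCE | SYSTICK_CTRL_TICKINT | SYSTICK_CTRL_ENABLE;
    return true;
}
```

On the target you would call it as `systick_start((struct systick_regs *)0xE000E010u, SystemCoreClock, 1000u)`. With a 168 MHz core clock (an example value) the reload for a 1 ms tick is 167999.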

The crucial design principle: keep ISRs short and deterministic. Your RTOS should defer heavy work to tasks, not do it in the interrupt itself. This is why you will build message queues and deferred processing in later projects.

How This Fits on Projects

Projects 2 and 4 implement SysTick and ISR-driven preemption. Projects 7 and 8 rely on ISR-safe IPC.

Definitions & Key Terms

  • ISR: Interrupt Service Routine
  • NVIC: Nested Vectored Interrupt Controller
  • SysTick: Cortex-M system timer
  • EXC_RETURN: special value used to return from exception

Mental Model Diagram

Normal Task ----> [Interrupt occurs]
      |           CPU pushes R0-R3,R12,LR,PC,xPSR
      v
  ISR runs (short)
      |
      v
  CPU pops saved registers, resume task

How It Works (Step-by-Step)

  1. SysTick counter reaches zero.
  2. NVIC asserts SysTick exception.
  3. CPU saves a hardware stack frame automatically.
  4. SysTick handler updates kernel tick and possibly triggers PendSV.
  5. Exception return restores registers and continues execution.

Minimal Concrete Example

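#include <stdint.h>
#include "stm32f4xx.h"   // CMSIS device header: SysTick_Config(), SystemCoreClock

void rtos_tick(void);    // kernel hook, defined elsewhere

volatile uint32_t g_tick = 0;

void SysTick_Handler(void) {
    g_tick++;
    rtos_tick(); // update timers, set PendSV if needed
}

int main(void) {
    SysTick_Config(SystemCoreClock / 1000); // 1ms tick
    while (1) {}
}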

Common Misconceptions

  • “Interrupts save all registers” (only a subset is hardware-stacked)
  • “ISRs are just normal functions” (they run in Handler mode)
  • “SysTick is a vendor-specific peripheral” (it is part of the ARM core, so it behaves the same on every Cortex-M that includes it)

Check-Your-Understanding Questions

  1. Which registers are stacked automatically on Cortex-M exception entry?
  2. Why should ISRs be short?
  3. What is SysTick used for in an RTOS?

Check-Your-Understanding Answers

  1. R0-R3, R12, LR, PC, xPSR.
  2. Long ISRs increase latency and jitter for other interrupts.
  3. Periodic tick for timekeeping and scheduler activation.

Real-World Applications

  • Periodic sensor sampling
  • Communication timeouts
  • Motor control loops driven by interrupts

Where You’ll Apply It

Projects 2, 4, 7, 8.

References

  • Cortex-M4 SysTick timer and register details: https://manuals.plus/m/a5ff6b2d88bcbc6b3ccdc368778a3c98f71c37239afbd2620c60acbffbbaa1fa
  • SysTick used for periodic OS context switching: https://arm-software.github.io/CMSIS_6/main/Core/group__SysTick__gr.html
  • Exception stack frame description and EXC_RETURN behavior: https://community.arm.com/support-forums/f/architectures-and-processors-forum/5291/the-reason-why-the-exception-frame-forms-on-psp

Key Insight

Interrupts and SysTick are the hardware heartbeat that makes preemption possible.

Summary

The exception model defines how the CPU saves and restores state. SysTick provides a portable, periodic interrupt that drives your scheduler.

Homework/Exercises to Practice the Concept

  1. Configure SysTick to generate a 2 ms tick and verify using GPIO toggles.
  2. Modify the SysTick ISR to count missed ticks if it runs late.
  3. Trigger a manual SysTick exception from software and observe behavior.

Solutions to the Homework/Exercises

  1. Set reload to (SystemCoreClock/500) - 1 and toggle GPIO in ISR.
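  2. Read the SysTick VAL register at ISR entry; LOAD - VAL is the number of cycles by which the handler started late. Compare it to a threshold and count the overruns.
  3. Set the PENDSTSET bit in SCB->ICSR to pend SysTick from software; COUNTFLAG in SysTick CTRL is read-only and cannot trigger the interrupt.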

4) Context Switching, Stack Frames, and Task Control Blocks

Fundamentals

A task is just a function with its own stack and saved CPU context. A context switch saves the state of one task and restores another. On Cortex-M, hardware automatically saves part of the state on exception entry, while software (your kernel) saves the rest. A Task Control Block (TCB) stores the stack pointer and metadata for each task.

Deep Dive into the Concept

Context switching is the core of multitasking. On Cortex-M, the processor uses two stack pointers: MSP (Main Stack Pointer) and PSP (Process Stack Pointer). By running tasks on PSP and exceptions on MSP, you can isolate kernel and task stacks. When a context switch is requested, the kernel pends the PendSV exception, which the kernel configures to run at the lowest priority. PendSV is designed for this purpose: at the lowest priority it runs only after all other interrupt handlers have completed, so a switch never happens in the middle of an ISR.

During PendSV, you save callee-saved registers (R4-R11) onto the current task’s stack. The hardware already saved R0-R3, R12, LR, PC, xPSR when entering the exception. You then store the PSP into the current TCB, choose the next task, load its PSP, restore its R4-R11, and exit the exception. The CPU automatically restores the rest of the context, and the new task resumes as if it had never been interrupted. If tasks use the FPU (available on Cortex-M4F parts), the floating-point registers S16-S31 need the same treatment; the examples here assume the FPU is unused.

This logic explains why stacks must be carefully initialized. To start a task for the first time, you fake a stack frame as if the task had been interrupted. That means placing initial values for xPSR (Thumb bit set), PC (task entry), LR (task exit handler), and general registers. The first context switch simply “returns” into the task.
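
Here is one way the fake frame could be built, as a sketch rather than a finished implementation. The function names are hypothetical, the layout assumes a PendSV handler that saves and restores R4-R11 below the hardware frame, and the FPU is assumed unused. Bit 0 of the stacked PC is cleared as a precaution, since the Thumb state comes from xPSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INITIAL_XPSR 0x01000000u   /* Thumb bit (bit 24) set */

/* Hypothetical trap for tasks that return; a real kernel might delete the task. */
void task_exit_trap(void)
{
    for (;;) { }
}

/*
 * Build the initial frame at the top of a task stack and return the
 * stack pointer to store in the TCB. Layout from the returned pointer up:
 * R4..R11 (software-saved), then R0-R3, R12, LR, PC, xPSR (hardware frame).
 */
uint32_t *task_stack_init(uint32_t *stack, size_t words,
                          void (*entry)(void), void (*on_exit)(void))
{
    assert(stack != NULL && words >= 17u);  /* 16 words plus alignment slack */

    /* Start at the top, aligned down to 8 bytes as the AAPCS expects. */
    uintptr_t top = (uintptr_t)(stack + words) & ~(uintptr_t)7u;
    uint32_t *sp = (uint32_t *)top;

    *(--sp) = INITIAL_XPSR;                        /* xPSR */
    *(--sp) = (uint32_t)(uintptr_t)entry & ~1u;    /* PC: task entry point */
    *(--sp) = (uint32_t)(uintptr_t)on_exit;        /* LR: where a returning task lands */
    *(--sp) = 0u;                                  /* R12 */
    *(--sp) = 0u;                                  /* R3 */
    *(--sp) = 0u;                                  /* R2 */
    *(--sp) = 0u;                                  /* R1 */
    *(--sp) = 0u;                                  /* R0: could carry a task argument */
    for (int i = 0; i < 8; i++) {
        *(--sp) = 0u;                              /* R11 down to R4 */
    }
    return sp;
}
```

Storing the returned pointer in the TCB makes the first switch into the task look exactly like resuming a task that had been preempted.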

How This Fits on Projects

Projects 3 and 4 implement context switching and PendSV. Project 9 uses stack sizing and overflow detection.

Definitions & Key Terms

  • TCB: Task Control Block
  • PSP/MSP: Process and Main Stack Pointers
  • PendSV: Exception used for deferred context switches
  • Stack frame: saved register context on the stack

Mental Model Diagram

Task A stack: [R4..R11][HW frame]
      |
      | save PSP -> TCB A
      v
Switch
      ^
      | load PSP <- TCB B
Task B stack: [R4..R11][HW frame]

How It Works (Step-by-Step)

  1. Scheduler decides to switch tasks.
  2. Kernel triggers PendSV.
  3. PendSV saves R4-R11 onto current task stack.
  4. PSP stored in current TCB.
  5. Next TCB selected; PSP loaded.
  6. R4-R11 restored from new task stack.
  7. Exception return restores HW frame and resumes task.

Minimal Concrete Example

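struct tcb { uint32_t *sp; uint8_t prio; };  // sp must stay first: the asm stores it at offset 0
struct tcb *current_tcb;                     // task that currently owns the CPU

// Assumes tasks run on PSP and the FPU is unused.
// schedule_next() is expected to point current_tcb at the next task.
__attribute__((naked)) void PendSV_Handler(void) {
    __asm volatile(
        "mrs r0, psp            \n" // get PSP
        "stmdb r0!, {r4-r11}    \n" // save callee-saved
        "ldr r1, =current_tcb   \n"
        "ldr r2, [r1]           \n"
        "str r0, [r2]           \n" // save SP
        "push {r3, lr}          \n" // keep EXC_RETURN across the call (r3 pads to 8 bytes)
        "bl  schedule_next      \n" // select next task
        "pop {r3, lr}           \n" // restore EXC_RETURN
        "ldr r1, =current_tcb   \n"
        "ldr r2, [r1]           \n"
        "ldr r0, [r2]           \n" // load SP
        "ldmia r0!, {r4-r11}    \n" // restore
        "msr psp, r0            \n"
        "bx lr                  \n" // exception return
    );
}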

Common Misconceptions

  • “Context switch saves all registers automatically” (only some are automatic)
  • “Tasks run on MSP” (best practice is PSP)
  • “PendSV is just another interrupt” (it is a software-pended exception that kernels set to the lowest priority)

Check-Your-Understanding Questions

  1. Why does the CPU only save part of the context automatically?
  2. Why is PendSV used for context switching?
  3. What must be in a task’s initial stack frame?

Check-Your-Understanding Answers

  1. To minimize interrupt latency; software saves the rest when needed.
  2. Set to the lowest priority, it runs only after all other interrupt handlers finish, so a switch never happens in the middle of an ISR.
  3. xPSR (Thumb bit), PC, LR, and initial register values.

Real-World Applications

  • Any multitasking embedded system
  • Thread switching in commercial RTOS kernels

Where You’ll Apply It

Projects 3, 4, 9.

References

  • Exception entry stack frame and PendSV usage discussion: https://community.arm.com/support-forums/f/architectures-and-processors-forum/5291/the-reason-why-the-exception-frame-forms-on-psp

Key Insight

A context switch is just controlled stack manipulation plus the exception return mechanism.

Summary

Tasks are stacks with saved registers. PendSV provides a safe hook to swap those stacks.

Homework/Exercises to Practice the Concept

  1. Draw the exact stack layout for a task that has never run.
  2. Manually simulate a context switch using a debugger and registers.
  3. Add a guard pattern to detect stack overflow.

Solutions to the Homework/Exercises

  1. Stack should contain xPSR, PC, LR, R12, R3-R0 plus fake R4-R11.
  2. Use GDB to push/pop registers and update PSP, then continue.
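  3. Fill the stack with a known pattern such as 0xDEADBEEF and check whether the words at the lowest addresses (where a descending stack overflows) have been overwritten.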
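
The guard-pattern exercise can be sketched as three small helpers. The names are hypothetical, and the approach only detects overflow after the fact: a task that jumps over the guard word without writing to it goes unnoticed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu

/* Paint the whole stack before building the task's initial frame. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count untouched words from the low end. Stacks grow downward,
 * so this is the headroom the task has never used. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}

/* The lowest word acts as the guard: if it changed, the stack overflowed. */
int stack_overflowed(const uint32_t *stack, size_t words)
{
    assert(words > 0u);
    return stack[0] != STACK_FILL;
}
```

Checking the guard on every context switch is cheap; scanning for the high-water mark is better left to an idle or diagnostic task.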

5) Scheduling, Task States, and Priority

Fundamentals

Scheduling decides which task runs next. An RTOS typically uses fixed-priority preemptive scheduling: the highest-priority READY task always runs. Tasks move between states (READY, RUNNING, BLOCKED) based on events and timeouts. Preemption ensures higher-priority tasks can interrupt lower-priority ones.

Deep Dive into the Concept

A scheduler is a policy implemented by a small amount of code. In cooperative scheduling, tasks run until they call yield() or block. In preemptive scheduling, a periodic interrupt (SysTick) forces the kernel to choose the next task. Fixed-priority preemption is simple and deterministic: if a task with higher priority becomes READY, the system switches immediately. This is the behavior described in CMSIS-RTOS.

More advanced policies include round-robin (time slicing among equal-priority tasks), rate-monotonic scheduling (shorter periods get higher priority), and earliest-deadline-first scheduling (dynamic priorities). You will implement fixed priority first because it is predictable and easiest to debug. Later projects add timeouts and sleep states so tasks can block without busy-waiting, which improves determinism and efficiency.
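
Round-robin among equal priorities can be layered on a fixed-priority pick with very little code. In this sketch the scan starts just after the current task and only a strictly higher priority replaces the candidate, so equal-priority tasks take turns. The struct, the function name, and the convention that a larger `prio` value is more urgent are all assumptions; the running task is treated as READY for the purpose of the scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum task_state { TASK_READY, TASK_BLOCKED };

struct task {
    enum task_state state;
    uint8_t prio;            /* larger value = more urgent (assumed convention) */
};

/*
 * Return the index of the task to run next. Call it when the running
 * task blocks or when its time slice expires.
 */
size_t pick_next(const struct task *tasks, size_t count, size_t current)
{
    assert(tasks != NULL && count > 0u && current < count);
    size_t best = SIZE_MAX;                    /* nothing found yet */
    for (size_t k = 1; k <= count; k++) {
        size_t i = (current + k) % count;      /* current itself is checked last */
        if (tasks[i].state != TASK_READY) {
            continue;
        }
        if (best == SIZE_MAX || tasks[i].prio > tasks[best].prio) {
            best = i;
        }
    }
    assert(best != SIZE_MAX);                  /* an always-READY idle task guarantees this */
    return best;
}
```

Because the current task is examined last, it keeps the CPU only when no other READY task has an equal or higher priority.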

Task states are critical to correctness. A task is READY when it can run, RUNNING when it owns the CPU, and BLOCKED when it waits for an event (mutex, queue, timer). When a blocked task becomes ready, it might preempt the current task. If you mishandle states, you can starve tasks or create priority inversion. Priority inversion happens when a low-priority task holds a resource needed by a high-priority task and medium-priority tasks preempt the holder, so the high-priority task waits for an unbounded time. RTOS kernels typically mitigate this with priority inheritance on mutexes.
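
The state transitions for sleeping tasks can be sketched as follows. The struct fields and function names are assumptions; the key detail is the signed difference in the wake check, which stays correct when the 32-bit tick counter wraps around. On the target, both functions would run with interrupts masked or from the tick handler itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum task_state { TASK_READY, TASK_RUNNING, TASK_BLOCKED };

struct task {
    enum task_state state;
    bool     has_timeout;   /* true while the task waits on the tick */
    uint32_t wake_tick;     /* tick count at which to wake */
};

/* RUNNING -> BLOCKED: put the calling task to sleep for `ticks` ticks. */
void task_sleep(struct task *t, uint32_t now, uint32_t ticks)
{
    assert(t != NULL && t->state == TASK_RUNNING);
    t->wake_tick = now + ticks;     /* unsigned wraparound is intended */
    t->has_timeout = true;
    t->state = TASK_BLOCKED;
}

/* BLOCKED -> READY: called on every tick; returns how many tasks woke up. */
size_t wake_sleepers(struct task *tasks, size_t count, uint32_t now)
{
    size_t woken = 0;
    for (size_t i = 0; i < count; i++) {
        struct task *t = &tasks[i];
        /* The signed difference is >= 0 once `now` has reached wake_tick. */
        if (t->state == TASK_BLOCKED && t->has_timeout &&
            (int32_t)(now - t->wake_tick) >= 0) {
            t->state = TASK_READY;
            t->has_timeout = false;
            woken++;
        }
    }
    return woken;
}
```

A nonzero return value is the cue for the tick handler to request a scheduling pass, since a newly READY task may outrank the running one.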

How This Fits on Projects

Projects 3-6 implement scheduling and task states. Project 6 demonstrates priority inversion and inheritance.

Definitions & Key Terms

  • READY/RUNNING/BLOCKED: task states
  • Preemption: interrupting a running task
  • Priority inheritance: temporarily boosting a task that holds a needed resource
  • Time slicing: round-robin scheduling among equal priorities

Mental Model Diagram

READY ---> RUNNING ---> BLOCKED ---> READY
     ^         |             |
     |         v             |
     +---- preempted <-------+

How It Works (Step-by-Step)

  1. SysTick or event makes a task READY.
  2. Scheduler compares priorities.
  3. If new task has higher priority, switch immediately.
  4. If equal priority, apply round-robin if enabled.
  5. If a task blocks, scheduler chooses next READY task.

Minimal Concrete Example

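// Fixed-priority pick: a larger prio value means more urgent (assumed convention).
// Falls back to the idle task when nothing else is READY.
int schedule_next(void) {
    int best = idle_task;
    for (int i = 0; i < task_count; i++) {
        if (tasks[i].state == READY && tasks[i].prio > tasks[best].prio) {
            best = i;
        }
    }
    return best;
}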

Common Misconceptions

  • “Priority always prevents starvation” (low-priority tasks starve whenever higher-priority tasks never block; time slicing only shares the CPU among equal priorities)
  • “Round robin is always better” (it can harm determinism for real-time tasks)
  • “Priority inversion cannot happen in small systems” (it can happen anywhere)

Check-Your-Understanding Questions

  1. When does a preemptive scheduler switch tasks?
  2. Why can priority inversion break real-time guarantees?
  3. What is the role of an idle task?

Check-Your-Understanding Answers

  1. When a higher-priority task becomes READY, or when the running task blocks, yields, or (with time slicing) uses up its slice.
  2. It delays a high-priority task behind a lower-priority resource holder.
  3. It runs when no tasks are ready and can enter low-power mode.

Real-World Applications

  • Motor control prioritized above logging
  • Safety checks prioritized above UI updates

Where You’ll Apply It

Projects 3, 4, 5, 6.

References

  • Priority switch behavior and priority inheritance note: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__ThreadMgmt.html
  • Mutex priority inheritance attribute: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__MutexMgmt.html

Key Insight

Scheduling policy is the core contract between your code and time.

Summary

Fixed-priority preemption with correct task states is the simplest deterministic RTOS model.

Homework/Exercises to Practice the Concept

  1. Implement round-robin for equal-priority tasks and measure jitter impact.
  2. Create three tasks with different priorities and verify preemption order.
  3. Simulate a priority inversion scenario and describe the outcome.

Solutions to the Homework/Exercises

  1. Track a time slice counter and rotate tasks of equal priority.
  2. Use GPIO toggles to visualize which task runs first.
  3. Low-priority task locks mutex, high-priority blocks, medium-priority runs.

6) Synchronization and Inter-Task Communication (IPC)

Fundamentals

Tasks share resources and must coordinate. Synchronization primitives prevent corruption and enforce ordering. Mutexes provide mutual exclusion; semaphores signal events or resource availability; message queues pass data between tasks; event flags coordinate multiple conditions. Improper synchronization leads to deadlocks, priority inversion, and missed deadlines.

Deep Dive into the Concept

The simplest synchronization primitive is a critical section: disable interrupts, manipulate shared state, re-enable. This is fast but increases interrupt latency, so it must be short. Mutexes are more flexible: tasks can block, allowing the CPU to run other tasks instead of busy-waiting. However, mutexes can introduce priority inversion. Priority inheritance (optional but common) temporarily raises the priority of the mutex owner to that of the highest-priority waiting task, which bounds the blocking time. CMSIS-RTOS2 documents this behavior and exposes it through a mutex attribute.
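
The core of priority inheritance fits in a few lines. This is a deliberately reduced sketch: the names are hypothetical, there is no wait list or handoff to the highest-priority waiter, nested mutexes are not handled, and on the target both functions would need to run inside a critical section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct task {
    uint8_t base_prio;   /* priority assigned at creation */
    uint8_t prio;        /* effective priority; larger = more urgent (assumed) */
};

struct mutex {
    struct task *owner;  /* NULL when the mutex is free */
};

/* Returns 1 if the caller now owns the mutex, 0 if it has to block. */
int mutex_try_lock(struct mutex *m, struct task *caller)
{
    assert(m != NULL && caller != NULL);
    if (m->owner == NULL) {
        m->owner = caller;
        return 1;
    }
    if (caller->prio > m->owner->prio) {
        m->owner->prio = caller->prio;   /* owner inherits the waiter's priority */
    }
    return 0;
}

/* Release the mutex and drop any inherited priority. */
void mutex_unlock(struct mutex *m, struct task *caller)
{
    assert(m != NULL && m->owner == caller);   /* only the owner may unlock */
    caller->prio = caller->base_prio;
    m->owner = NULL;
}
```

With the boost in place, a medium-priority task can no longer preempt the low-priority owner while a high-priority task is waiting.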

Semaphores are counters with no owner: a binary semaphore can signal an event, while a counting semaphore can track multiple resources. Queues provide structured communication: tasks send messages, and receivers block until data arrives. Event flags allow a task to wait for multiple conditions simultaneously (e.g., sensor ready AND buffer free). A robust RTOS needs all these primitives, and you must implement them with interrupt-safe operations to avoid race conditions.
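
The "sensor ready AND buffer free" case comes down to a bitmask test. The sketch below shows a non-blocking poll with wait-all and wait-any modes; the names are hypothetical, and a real kernel would block the task instead of returning 0 and would protect the flag word from concurrent ISR access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EVT_SENSOR_READY (1u << 0)   /* example flag assignments */
#define EVT_BUFFER_FREE  (1u << 1)

struct event_group {
    uint32_t flags;
};

/* Set one or more flags (from a task or, with care, from an ISR). */
void event_set(struct event_group *g, uint32_t bits)
{
    assert(g != NULL);
    g->flags |= bits;
}

/*
 * Check whether the wait condition is met. Returns the matched bits and
 * clears them, or returns 0 if the caller would have to keep waiting.
 */
uint32_t event_poll(struct event_group *g, uint32_t mask, bool wait_all)
{
    assert(g != NULL && mask != 0u);
    uint32_t hit = g->flags & mask;
    bool satisfied = wait_all ? (hit == mask) : (hit != 0u);
    if (!satisfied) {
        return 0u;
    }
    g->flags &= ~hit;                 /* consume the matched flags */
    return hit;
}
```

Clearing on a successful wait is one possible policy; leaving flags set until explicitly cleared is equally valid, as long as the choice is documented.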

IPC is not just about correctness; it is about determinism. If a high-priority task blocks on a queue, it must unblock within a bounded time. If a queue is full, the sender must block or drop data predictably. Real-time design means you must define these behaviors explicitly.
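
One way to make the full-queue behavior explicit is a bounded ring buffer whose send operation reports failure and counts drops instead of silently overwriting. The capacity, field names, and function names below are illustrative assumptions, and the sketch omits the critical section that target code needs around each operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 4u

struct queue {
    uint32_t items[QUEUE_CAPACITY];
    size_t   head;      /* index of the oldest message */
    size_t   tail;      /* index of the next free slot */
    size_t   count;     /* messages currently stored */
    uint32_t dropped;   /* sends rejected because the queue was full */
};

/* Non-blocking send: returns false and records a drop when the queue is full. */
bool queue_try_send(struct queue *q, uint32_t msg)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY) {
        q->dropped++;
        return false;
    }
    q->items[q->tail] = msg;
    q->tail = (q->tail + 1u) % QUEUE_CAPACITY;
    q->count++;
    return true;
}

/* Non-blocking receive: returns false when the queue is empty. */
bool queue_try_receive(struct queue *q, uint32_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0u) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1u) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

A blocking send can be built on top by parking the caller when `queue_try_send` fails; an ISR would typically use the non-blocking form and accept the drop.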

How This Fits on Projects

Projects 6 and 7 implement mutexes and queues. Project 8 uses event flags and software timers.

Definitions & Key Terms

  • Mutex: mutual exclusion lock
  • Semaphore: signaling or counting primitive
  • Queue: FIFO buffer for messages
  • Event flags: bitmask-based synchronization

Mental Model Diagram

Producer Task --> [Queue] --> Consumer Task
            \                 /
             \--- semaphore --

How It Works (Step-by-Step)

  1. Task tries to acquire a mutex.
  2. If available, it locks and continues; else it blocks.
  3. When released, highest priority waiting task is unblocked.
  4. Queues store messages in a ring buffer; send/receive block on full/empty.
  5. Event flags allow tasks to wait for multiple conditions via bitmask.
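Step 4 can be sketched as a small ring buffer with non-blocking operations. The names (queue_t, queue_try_send, queue_try_receive) and the capacity of 4 are illustrative assumptions; the blocking behavior would be layered on top by the kernel, and every call would need to run inside a critical section if an ISR can touch the same queue.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAPACITY 4u   /* illustrative size */

typedef struct {
    uint32_t buf[QUEUE_CAPACITY];
    uint32_t head;   /* next slot to read */
    uint32_t tail;   /* next slot to write */
    uint32_t count;  /* messages currently stored */
} queue_t;

/* Non-blocking send: returns false when the queue is full so the caller
 * (the kernel) can decide to block the task or drop the message. */
bool queue_try_send(queue_t *q, uint32_t msg)
{
    if (q->count == QUEUE_CAPACITY) return false;
    q->buf[q->tail] = msg;
    q->tail = (q->tail + 1u) % QUEUE_CAPACITY;
    q->count++;
    return true;
}

/* Non-blocking receive: returns false when the queue is empty. */
bool queue_try_receive(queue_t *q, uint32_t *out)
{
    if (q->count == 0u) return false;
    *out = q->buf[q->head];
    q->head = (q->head + 1u) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

Keeping an explicit count makes the full and empty cases unambiguous, at the cost of one more field that both sides modify.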

Minimal Concrete Example

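// Simple queue send (blocking)
// Re-check after every wakeup: another sender may have refilled the queue.
while (queue_full(q)) {
    block_current_task(q);   // returns once a receiver has made space
}
queue_put(q, msg);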

Common Misconceptions

  • “Mutex = semaphore” (mutexes are for mutual exclusion and ownership)
  • “Disabling interrupts is always OK” (it increases latency)
  • “Queues are just buffers” (they are synchronization objects too)

Check-Your-Understanding Questions

  1. When should you use a mutex instead of a semaphore?
  2. What problem does priority inheritance solve?
  3. Why must queue operations be interrupt-safe?

Check-Your-Understanding Answers

  1. Use a mutex when a resource has an owner and must be released by the same task that locked it.
  2. It bounds how long a high-priority task can be blocked: the lower-priority holder is boosted so medium-priority tasks cannot preempt it.
  3. Because ISRs can modify the queue concurrently.

Real-World Applications

  • Sensor producer feeding a logging task
  • UART driver signaling a processing task

Where You’ll Apply It

Projects 6, 7, 8.

References

  • CMSIS-RTOS2 priority inheritance description: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__ThreadMgmt.html
  • CMSIS-RTOS2 mutex attribute for priority inheritance: https://arm-software.github.io/CMSIS_6/main/RTOS2/group__CMSIS__RTOS__MutexMgmt.html

Key Insight

Synchronization defines both safety and timing; it is a real-time concern.

Summary

Mutexes, semaphores, queues, and event flags are the core tools for safe concurrency.

Homework/Exercises to Practice the Concept

  1. Build a binary semaphore that an ISR can give and a task can take.
  2. Implement a fixed-size queue with blocking send/receive.
  3. Demonstrate priority inversion with three tasks.

Solutions to the Homework/Exercises

  1. Protect the semaphore count with critical sections.
  2. Use head/tail indices and block when full/empty.
  3. The low task locks the mutex, the high task blocks on it, and the medium task runs ahead of both until inheritance is applied.

7) Time Services and Software Timers

Fundamentals

An RTOS must provide time-based services: delays, timeouts, periodic timers, and tick counters. These services are typically built on top of the system tick. A task should be able to sleep without busy-waiting, and timers should execute callbacks or release tasks on schedule.

Deep Dive into the Concept

Time services turn the SysTick heartbeat into usable APIs. The kernel maintains a tick counter and data structures for delayed tasks. A simple approach: store a wake-up tick in each task and scan all tasks each tick. A more efficient approach: maintain a sorted delay list or a timer wheel. The tradeoff is complexity versus CPU overhead.
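A sorted delay list might look like the sketch below. The sleeper_t node and the function names are hypothetical; in your kernel the node would typically live inside the TCB. Ordering uses a signed difference so it keeps working when the 32-bit tick counter wraps, assuming no delay exceeds 2^31 ticks.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct sleeper {
    uint32_t wake_tick;
    struct sleeper *next;
} sleeper_t;

/* True when tick a is strictly earlier than b, valid while the two are
 * less than 2^31 ticks apart (wrap-safe signed difference). */
static int tick_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Insert s so the list stays ordered by wake time, earliest first. */
void delay_list_insert(sleeper_t **head, sleeper_t *s)
{
    while (*head != NULL && !tick_before(s->wake_tick, (*head)->wake_tick)) {
        head = &(*head)->next;
    }
    s->next = *head;
    *head = s;
}

/* Called each tick: unlink and return one expired entry, or NULL. */
sleeper_t *delay_list_pop_expired(sleeper_t **head, uint32_t now)
{
    sleeper_t *s = *head;
    if (s == NULL || tick_before(now, s->wake_tick)) {
        return NULL;
    }
    *head = s->next;
    s->next = NULL;
    return s;
}
```

Insertion is O(n), but the tick handler only inspects the head, which is the cost you want to keep small because it runs on every tick.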

Software timers are virtual timers built on top of the same tick. A timer object stores a period and callback. Each tick decrements timers and triggers callbacks when zero. In safety-critical systems, callbacks should be minimal and often just unblock a task.
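The decrement-and-fire scheme described above can be sketched as follows. The sw_timer_t layout, the convention that period 0 means one-shot, and the count_cb example callback are all assumptions made for illustration.

```c
#include <stdint.h>

typedef void (*timer_cb_t)(void *arg);

typedef struct {
    uint32_t remaining;  /* ticks until expiry; 0 means stopped */
    uint32_t period;     /* reload value; 0 means one-shot */
    timer_cb_t callback;
    void *arg;
} sw_timer_t;

/* Example callback: just counts how often it fired. */
void count_cb(void *arg)
{
    (*(uint32_t *)arg)++;
}

/* Call once per tick for the whole timer table. */
void sw_timers_tick(sw_timer_t *timers, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        sw_timer_t *t = &timers[i];
        if (t->remaining == 0u) continue;   /* stopped */
        if (--t->remaining == 0u) {
            t->remaining = t->period;       /* re-arm if periodic */
            t->callback(t->arg);
        }
    }
}
```

This version touches every timer on every tick, which is the simple end of the tradeoff; a sorted list or timer wheel reduces the per-tick work.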

Advanced systems use tickless idle: when no tasks are ready, the kernel programs a hardware timer to wake up at the next scheduled event and stops the SysTick to save power. You will not implement full tickless idle in this guide, but you will build the foundations: accurate time tracking and minimal ISR overhead.

How This Fits on Projects

Projects 5 and 8 implement delays, timeouts, and software timers.

Definitions & Key Terms

  • Tick: periodic timebase (often 1 ms)
  • Sleep/Delay: task blocks for N ticks
  • Timer: callback after a delay or periodically
  • Tickless idle: suppressing ticks during idle

Mental Model Diagram

Tick -> update tick count -> check timers -> wake tasks

How It Works (Step-by-Step)

  1. SysTick fires every N cycles.
  2. Kernel increments global tick.
  3. Timer list updated; expired timers run callbacks or wake tasks.
  4. Scheduler runs highest priority READY task.

Minimal Concrete Example

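void sleep_ticks(uint32_t ticks) {
    current->wake_tick = g_tick + ticks;  // unsigned add wraps safely
    current->state = BLOCKED;             // tick handler sets READY at wake_tick
    schedule();                           // switch to another READY task
}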

Common Misconceptions

  • “Delay is just a busy loop” (busy loops waste CPU)
  • “Timers are independent of ticks” (software timers usually depend on tick)
  • “Tickless idle is trivial” (it needs careful time accounting)

Check-Your-Understanding Questions

  1. Why do we avoid busy-wait delays in an RTOS?
  2. What is the difference between a one-shot and periodic timer?
  3. What can go wrong if tick overflow is not handled?

Check-Your-Understanding Answers

  1. It wastes CPU and breaks determinism.
  2. One-shot fires once; periodic reloads each period.
  3. Time comparisons can fail, causing missed wakeups.

Real-World Applications

  • Periodic sensor sampling
  • Timeouts in communication protocols

Where You’ll Apply It

Projects 5, 8.

References

  • SysTick periodic interrupts for OS scheduling: https://arm-software.github.io/CMSIS_6/main/Core/group__SysTick__gr.html

Key Insight

Time services are where the OS becomes useful to applications.

Summary

Delays, timeouts, and timers turn hardware ticks into predictable scheduling.

Homework/Exercises to Practice the Concept

  1. Implement a sorted delay list and compare with a linear scan.
  2. Add a periodic timer that toggles a GPIO.
  3. Make your tick counter wrap safely after overflow.

Solutions to the Homework/Exercises

  1. Keep list sorted by wake tick and pop expired timers.
  2. Create a timer object with period and callback.
  3. Use unsigned arithmetic comparisons with wraparound.
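For the third exercise, one common way to compare ticks across a wrap is to subtract first and then interpret the result as signed. The helper name tick_reached is hypothetical, and the approach assumes delays shorter than 2^31 ticks.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once `now` has reached or passed `wake`, even if the 32-bit tick
 * counter wrapped in between. Valid for delays shorter than 2^31 ticks. */
bool tick_reached(uint32_t now, uint32_t wake)
{
    return (int32_t)(now - wake) >= 0;
}
```

A plain `now >= wake` fails here: with now = 0xFFFFFFF0 and a wake time that wrapped to 0x10, it reports the deadline as already reached.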

8) Memory Management and Stack Safety

Fundamentals

Embedded systems rarely use a full heap. Instead, they rely on static allocation and fixed-size pools. Every task needs a stack; if the stack overflows, it corrupts memory silently. RTOS kernels must provide stack sizing, overflow detection, and safe allocation strategies.

Deep Dive into the Concept

Memory is the tightest constraint on many MCUs. A Cortex-M4 might have 128 KB of RAM, and each task stack consumes part of it. You must estimate stack depth based on call depth, interrupt nesting, and local variables. A safe strategy is to fill stacks with a known pattern (0xA5A5A5A5) and later check the high-water mark. Many commercial RTOS kernels provide stack watermarking for this reason.
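A watermark check can be sketched as below. The function names are hypothetical, and the code assumes a descending stack, so the untouched pattern words sit at the lowest indices of the array.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u

/* Paint the whole stack before the task first runs. */
void stack_fill(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) stack[i] = STACK_FILL;
}

/* Cortex-M stacks grow downward, so untouched words sit at the low end
 * (index 0). Count them to get the unused headroom in words. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) n++;
    return n;
}
```

A result of 0 means the lowest word was overwritten, which is a strong hint that the task has overflowed. The measurement is a lower bound on usage: a task that happens to write the pattern value, or skips over words without writing them, can hide part of its depth.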

Dynamic allocation can introduce fragmentation and unpredictability. For a real-time kernel, deterministic memory behavior is often more important than flexibility. Fixed-size block pools provide constant-time allocation. You will implement a simple pool allocator to guarantee that allocations succeed or fail predictably.
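A fixed-size block pool can be sketched with an intrusive free list. The type and function names, the 32-byte block size, and the count of 8 are illustrative assumptions; a real kernel would also guard alloc and free with a critical section.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  32u
#define POOL_BLOCK_COUNT 8u

/* A free block stores the pointer to the next free block in its own
 * first bytes, so the free list needs no extra memory. */
typedef union pool_block {
    union pool_block *next;
    uint8_t payload[POOL_BLOCK_SIZE];
} pool_block_t;

typedef struct {
    pool_block_t blocks[POOL_BLOCK_COUNT];
    pool_block_t *free_list;
} pool_t;

void pool_init(pool_t *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* O(1): pop the head of the free list, or NULL when exhausted. */
void *pool_alloc(pool_t *p)
{
    pool_block_t *b = p->free_list;
    if (b != NULL) p->free_list = b->next;
    return b;
}

/* O(1): push the block back. The caller must pass a pointer that came
 * from pool_alloc on the same pool. */
void pool_free(pool_t *p, void *ptr)
{
    pool_block_t *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

Both operations take the same number of steps regardless of how many blocks are in use, and exhaustion is reported as NULL rather than as a slow search.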

Stack safety also interacts with the exception model. On exception entry the hardware pushes a register frame onto the active stack (the task's PSP, or MSP if the task runs on it), and handlers then run on MSP, so a nearly full task stack can overflow on an interrupt and corrupt neighboring memory or trigger a hard fault. Some systems use guard regions or the MPU (Memory Protection Unit) to detect overflow. While we will not implement MPU protection here, you will design for safety by reserving guard patterns and checking them periodically.

How This Fits on Projects

Project 9 implements memory pools and stack overflow detection.

Definitions & Key Terms

  • Stack watermark: deepest stack usage measurement
  • Fragmentation: unused memory scattered in heap
  • Memory pool: fixed-size block allocator
  • Guard pattern: known value to detect overflow

Mental Model Diagram

Task Stack (grows down)
┌───────────────┐  High addr (initial SP)
│  Used stack   │
│---------------│ <-- high-water mark
│  Free space   │
│   (pattern)   │
└───────────────┘  Low addr (overflow past here)

How It Works (Step-by-Step)

  1. Allocate fixed stack array for each task.
  2. Fill with guard pattern.
  3. Periodically scan to find deepest usage.
  4. If guard region corrupted, flag overflow.
  5. Use memory pools for deterministic allocation.

Minimal Concrete Example

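#include <stdint.h>

#define STACK_SIZE 256              // stack depth in 32-bit words (1 KB)
uint32_t task1_stack[STACK_SIZE];

// Fill the whole stack with the guard pattern before the task first runs.
void init_stack(uint32_t *stack) {
    for (uint32_t i = 0; i < STACK_SIZE; i++) stack[i] = 0xA5A5A5A5u;
}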

Common Misconceptions

  • “Stack size is always enough if it works once” (usage varies by path)
  • “Heap is fine for RTOS” (fragmentation breaks determinism)
  • “Overflow always crashes immediately” (it often corrupts silently)

Check-Your-Understanding Questions

  1. Why is heap fragmentation dangerous in real-time systems?
  2. How do you measure maximum stack usage safely?
  3. What is the benefit of fixed-size memory pools?

Check-Your-Understanding Answers

  1. It makes allocation time unpredictable and can fail unexpectedly.
  2. Fill with a pattern and measure how much was overwritten.
  3. Constant-time allocation and deterministic behavior.

Real-World Applications

  • Safety-critical systems requiring deterministic memory
  • Certified embedded systems with static allocation only

Where You’ll Apply It

Project 9.

References

  • SysTick and system timing (for stack usage during ISR): https://arm-software.github.io/CMSIS_6/main/Core/group__SysTick__gr.html

Key Insight

Memory determinism is as important as CPU determinism in an RTOS.

Summary

You must allocate stacks and memory in a predictable, measurable way to keep real-time guarantees.

Homework/Exercises to Practice the Concept

  1. Implement a fixed block allocator for 32-byte objects.
  2. Measure stack high-water marks for each task.
  3. Trigger a stack overflow and catch it with a guard pattern.

Solutions to the Homework/Exercises

  1. Use a free list of block pointers.
  2. Scan for untouched 0xA5A5A5A5 values.
  3. Make a recursive function and verify overflow detection.

Glossary

  • RTOS: Real-Time Operating System with deterministic scheduling
  • TCB: Task Control Block
  • ISR: Interrupt Service Routine
  • SysTick: ARM system timer
  • PendSV: exception for deferred context switches
  • Priority inversion: low-priority task blocks high-priority task
  • WCET: worst-case execution time
  • Jitter: variation in timing
  • Tickless idle: stopping periodic ticks to save power

Why RTOS Matters

Modern systems are filled with embedded devices. IoT Analytics estimates 18.5 billion connected IoT devices in 2024 and projects 21.1 billion by the end of 2025. These devices must respond predictably to real-world events. Many are safety-critical (automotive, medical, industrial control), where a missed deadline can cause physical damage. An RTOS provides the structure and determinism required for such systems.

Context & Evolution

Early embedded systems used super-loops and interrupts. As systems grew, the complexity of concurrency and timing made these designs fragile. RTOS kernels emerged to formalize scheduling, synchronization, and time management, replacing ad-hoc designs with predictable mechanisms.

Old vs New (ASCII)

Super-loop design                    RTOS design
┌──────────────┐                    ┌──────────────┐
│ loop()       │                    │ scheduler    │
│  taskA()     │                    │ taskA (prio) │
│  taskB()     │                    │ taskB (prio) │
│  taskC()     │                    │ taskC (prio) │
└──────────────┘                    └──────────────┘
        ^                                    ^
   timing implicit                     timing explicit

Concept Summary Table

| Concept | What You Must Internalize | Key Artifacts | Projects |
|---|---|---|---|
| Real-Time Fundamentals | deadlines, latency, jitter, WCET | timing budget | 2,4,5,8,10 |
| Boot + Memory Map | startup, vector table, linker script | linker.ld, startup.s | 1,3,4,9 |
| Interrupts + SysTick | exception entry, SysTick registers | ISR, SysTick_Handler | 2,4,7,8 |
| Context Switching | stack frame, PSP/MSP, PendSV | PendSV_Handler, TCB | 3,4,9 |
| Scheduling + States | READY/RUNNING/BLOCKED, priority | scheduler.c | 3,4,5,6 |
| Synchronization + IPC | mutex, sem, queue, events | mutex.c, queue.c | 6,7,8 |
| Time Services | delay, timeout, timers | tick.c, timer.c | 5,8 |
| Memory Management | stacks, pools, overflow checks | mempool.c | 9 |

Project-to-Concept Map

| Project | Concepts Applied |
|---|---|
| 1. Bare-Metal Hello World | Boot + Memory Map |
| 2. System Tick Interrupt | Interrupts + SysTick, Real-Time Fundamentals |
| 3. Cooperative Scheduler | Context Switching, Scheduling |
| 4. Preemptive Scheduler | Interrupts + SysTick, Scheduling, Context Switching |
| 5. Sleep/Delay + Idle Task | Time Services, Scheduling |
| 6. Mutex + Priority Inversion | Synchronization, Scheduling |
| 7. Message Queue + ISR Deferral | IPC, Interrupts |
| 8. Event Flags + Software Timers | Time Services, IPC |
| 9. Memory Pool + Stack Safety | Memory Management |
| 10. Latency Measurement Toolkit | Real-Time Fundamentals, Interrupts |

Deep Dive Reading by Concept

| Concept | Book + Chapters | Why This Matters |
|---|---|---|
| Real-Time Fundamentals | Real-Time Concepts for Embedded Systems Ch. 1-3 | Defines hard vs soft real-time and timing metrics |
| Boot + Memory Map | Making Embedded Systems (2nd ed) Ch. 3-4 | Board bring-up, datasheets, and timing setup underpin every RTOS |
| Interrupts + SysTick | Making Embedded Systems (2nd ed) Ch. 5 | Interrupt timing and ISR structure define kernel responsiveness |
| Context Switching | Real-Time Concepts for Embedded Systems Ch. 5 | Task switching and CPU context preservation |
| Scheduling + Tasks | Real-Time Concepts for Embedded Systems Ch. 4-5 | RTOS fundamentals and task design |
| Synchronization + IPC | Real-Time Concepts for Embedded Systems Ch. 6-7, 15 | Semaphores, queues, and communication |
| Time Services | Real-Time Concepts for Embedded Systems Ch. 11 | Timer services and scheduling |
| Memory Management | Real-Time Concepts for Embedded Systems Ch. 13 | Deterministic allocation and memory control |
| RTOS Fundamentals | Zephyr RTOS Embedded C Programming Ch. 2, 4, 5 | Practical RTOS primitives |

Quick Start (First 48 Hours)

Day 1: Toolchain and Bare Metal

  • Install arm-none-eabi-gcc, OpenOCD, GDB
  • Build and flash Project 1 (LED blink)
  • Verify you can single-step in GDB

Day 2: Interrupts and Tick

  • Implement Project 2 SysTick
  • Toggle a GPIO in the ISR and measure with a logic analyzer
  • Confirm tick count accuracy at 1 ms

If you get stuck, pause and read the Theory Primer chapters on Boot and Interrupts.

Recommended Learning Paths

  1. Embedded First Path: Projects 1-4, then 5-6, then 7-10
  2. OS Background Path: Read Theory Primer 4-6 first, then Projects 1-4
  3. Timing Obsessed Path: Projects 2, 4, 5, 10 first, then the rest

Success Metrics

  • You can explain the Cortex-M exception stack frame from memory
  • You can implement a context switch in 20 lines of assembly
  • You can measure task latency and jitter with GPIO and a scope
  • Your kernel can run at least 5 tasks with deterministic timing
  • You can demonstrate and resolve priority inversion

Optional Appendices

Appendix A: GDB/OpenOCD Debugging Cheatsheet

# Terminal 1: start the OpenOCD GDB server (leave it running)
openocd -f interface/stlink.cfg -f target/stm32f4x.cfg

# Terminal 2: connect GDB, halt the core, and flash
arm-none-eabi-gdb build.elf
(gdb) target remote :3333
(gdb) monitor reset halt
(gdb) load

Appendix B: Measuring Latency with GPIO

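  • Set a GPIO high at ISR entry and low at ISR exit
  • Capture the interrupt source signal and the GPIO on two scope channels
  • Latency = time from the triggering event to the GPIO rising edge; pulse width = ISR execution time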

Project Overview Table

| Project | Core Output | Core Concepts | Difficulty |
|---|---|---|---|
| 1. Bare-Metal Hello World | LED blink, custom linker/startup | Boot + memory map | Advanced |
| 2. System Tick Interrupt | 1 ms SysTick + ISR | Interrupts + SysTick | Advanced |
| 3. Cooperative Scheduler | Two tasks, manual yield | Context switch + TCB | Expert |
| 4. Preemptive Scheduler | Priority preemption | Scheduling + SysTick | Expert |
| 5. Sleep/Delay + Idle Task | Tick-based sleep | Time services | Expert |
| 6. Mutex + Priority Inversion | Priority inheritance demo | Sync + scheduling | Expert |
| 7. Message Queue + ISR Deferral | Producer-consumer queue | IPC + ISR safety | Expert |
| 8. Event Flags + Timers | Periodic timer callbacks | Time + IPC | Expert |
| 9. Memory Pool + Stack Safety | Deterministic allocation | Memory mgmt | Expert |
| 10. Latency Measurement Toolkit | Jitter and latency report | Real-time analysis | Expert |

Project List

Project 1: The Bare-Metal “Hello, World”

Real World Outcome

You will flash a raw ELF/bin to the MCU and see a single LED blink at a fixed interval. When you connect GDB, you can set a breakpoint at main and single-step through register writes. Example terminal session:

$ make flash
Open On-Chip Debugger 0.12.0
Info : Listening on port 3333 for gdb connections
Info : stm32f4x.cpu: hardware has 6 breakpoints, 4 watchpoints
wrote 8192 bytes from file build/rtos.elf in 0.542s

The Core Question You’re Answering

How does a CPU start running code with no operating system, and how do I control hardware directly?

Concepts You Must Understand First

  • Boot + memory map (Theory Primer 2; Making Embedded Systems 2nd ed Ch. 3-4)
  • Interrupts + vector table basics (Primer 3; Making Embedded Systems 2nd ed Ch. 5)
  • Memory-mapped I/O (Primer 2; Making Embedded Systems 2nd ed Ch. 4)

Questions to Guide Your Design

  • Where is the vector table placed and how is it aligned?
  • Which GPIO register controls the LED pin?
  • How will you create a delay without a timer?

Thinking Exercise

Draw the exact memory map (Flash + RAM) and mark where .text, .data, .bss, and your stack reside.

The Interview Questions They’ll Ask

  1. What is in the first two words of the vector table?
  2. Why must volatile be used for peripheral registers?
  3. What does a linker script do?
  4. Why do you clear .bss on reset?

Hints in Layers

  1. Start by writing a minimal startup.s with vector table and reset handler.
  2. Use the reference manual to find RCC and GPIO registers.
  3. Toggle the pin using BSRR to avoid read-modify-write hazards.
  4. Add a delay loop only after GPIO output works.

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Making Embedded Systems (2nd ed) | Ch. 3-4 | Board bring-up, I/O, timers |
| The GNU Make Book | Ch. 1-3 | Build system basics |

Common Pitfalls & Debugging

Problem: LED never blinks

  • Why: GPIO clock not enabled in RCC
  • Fix: Set the correct bit in RCC_AHB1ENR
  • Quick test: Read back RCC_AHB1ENR in GDB

Problem: HardFault on startup

  • Why: Stack pointer invalid or vector table misaligned
  • Fix: Verify linker script and vector table address
  • Quick test: Inspect SP after reset in GDB

Definition of Done

  • Vector table is in correct flash location
  • Reset handler sets up .data and .bss
  • GPIO configured correctly
  • LED blinks at a visible rate

Project 2: The System Tick Interrupt

Real World Outcome

Your firmware prints a tick counter over UART every second, and a GPIO toggles once per 1 ms tick (1,000 toggles per second, which appears on a scope as a 500 Hz square wave). Example UART output:

[000001000] tick=1000
[000002000] tick=2000
[000003000] tick=3000

The Core Question You’re Answering

How do I create a precise hardware timebase that can drive an RTOS scheduler?

Concepts You Must Understand First

  • SysTick registers and reload values (Primer 3; Making Embedded Systems 2nd ed Ch. 4)
  • Interrupt entry/exit timing (Primer 3; Real-Time Concepts Ch. 10)
  • Real-time timing fundamentals (Primer 1; Real-Time Concepts Ch. 1,4)

Questions to Guide Your Design

  • What reload value yields a 1 ms interrupt?
  • How do you minimize ISR execution time?
  • How will you verify tick accuracy?

Thinking Exercise

Calculate the maximum jitter introduced if your SysTick ISR takes 5 us on a 1 ms tick.

The Interview Questions They’ll Ask

  1. Why is SysTick preferred for RTOS tick generation?
  2. What happens if SysTick fires while interrupts are disabled?
  3. How can you detect if your ISR is too slow?
  4. What is COUNTFLAG used for?

Hints in Layers

  1. Use CMSIS SysTick_Config(SystemCoreClock/1000).
  2. Toggle a GPIO at ISR entry to measure timing.
  3. Keep ISR under 10 us to minimize jitter.
  4. Store tick counter in volatile.

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 11 | Timer services |
| Making Embedded Systems (2nd ed) | Ch. 4-5 | Timers and interrupts |

Common Pitfalls & Debugging

Problem: SysTick never fires

  • Why: CTRL not enabling interrupt
  • Fix: Set ENABLE and TICKINT bits
  • Quick test: Read SysTick CTRL in GDB

Problem: Tick count drifts

  • Why: Wrong SystemCoreClock value
  • Fix: Verify clock config (PLL, prescalers)
  • Quick test: Compare with scope measurement

Definition of Done

  • SysTick ISR fires every 1 ms
  • GPIO toggles every 1 ms (500 Hz square wave on the scope)
  • Tick counter increments correctly
  • UART reports tick values without drift

Project 3: A Cooperative Multi-Tasking Scheduler

Real World Outcome

Two tasks blink LEDs independently. The scheduler switches tasks when each calls yield(). GDB shows separate stack pointers per task. You can add a third task without changing kernel logic.

The Core Question You’re Answering

How does a context switch work at the register and stack level?

Concepts You Must Understand First

  • Context switching and stack frames (Primer 4; Real-Time Concepts Ch. 5)
  • Boot and memory layout (Primer 2; Making Embedded Systems 2nd ed Ch. 3)
  • Task basics and states (Primer 5; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

  • Where will each task’s stack live?
  • How will you initialize a task stack frame?
  • What registers must be saved/restored manually?

Thinking Exercise

Sketch the stack of a newly created task and label each saved register.

The Interview Questions They’ll Ask

  1. Why are R4-R11 saved manually in PendSV?
  2. What is the difference between MSP and PSP?
  3. How does the scheduler decide the next task?
  4. What happens if two tasks share a stack?

Hints in Layers

  1. Start with a TCB containing only a stack pointer.
  2. Build a fake exception frame for new tasks.
  3. Use PendSV to save/restore registers.
  4. Verify context switch by watching R4 values per task.
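Hint 2 can be prototyped on a host before you write the assembly. The sketch below assumes your PendSV handler saves R4-R11 directly below the hardware-stacked frame (for example with STMDB), so a new task needs 16 words. The function name and parameters are hypothetical, and addresses are passed as plain 32-bit values so the code also compiles on a 64-bit host.

```c
#include <stddef.h>
#include <stdint.h>

#define INITIAL_XPSR 0x01000000u   /* Thumb bit (T) set */
#define FRAME_WORDS  16u           /* 8 hardware-stacked + R4-R11 */

/* Builds the initial frame at the top of `stack` and returns the value to
 * store as the task's saved stack pointer, or NULL if the stack is too
 * small. Layout from the returned pointer upward:
 * R4..R11, then R0, R1, R2, R3, R12, LR, PC, xPSR. */
uint32_t *task_stack_init(uint32_t *stack, size_t words,
                          uint32_t entry, uint32_t exit_handler)
{
    if (words < FRAME_WORDS + 2u) return NULL;

    uintptr_t top = (uintptr_t)(stack + words);
    top &= ~(uintptr_t)7u;                    /* 8-byte alignment */
    uint32_t *sp = (uint32_t *)top - FRAME_WORDS;

    for (size_t i = 0; i < FRAME_WORDS; i++) sp[i] = 0u;  /* general registers */
    sp[13] = exit_handler;                    /* LR: where the task "returns" */
    sp[14] = entry & ~1u;                     /* PC: entry with bit 0 cleared */
    sp[15] = INITIAL_XPSR;                    /* xPSR */
    return sp;
}
```

On the target you would pass the task function's address as `entry` and the address of a trap function as `exit_handler`, so a task that returns by mistake lands somewhere you can catch it.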

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 4-5 | RTOS task basics |
| Making Embedded Systems (2nd ed) | Ch. 6 | Managing flow of activity |

Common Pitfalls & Debugging

Problem: Task crashes on first run

  • Why: Incorrect initial stack frame (xPSR Thumb bit missing)
  • Fix: Set xPSR to 0x01000000
  • Quick test: Inspect stack frame in memory

Problem: Tasks overwrite each other

  • Why: Stacks overlap in RAM
  • Fix: Reserve distinct stack arrays
  • Quick test: Fill stacks with patterns and check overlap

Definition of Done

  • At least two tasks run and yield correctly
  • Each task has its own stack
  • Context switch saves and restores registers
  • Scheduler scales to N tasks

Project 4: A Preemptive, Priority-Based Scheduler

Real World Outcome

You will see a high-priority task preempt a lower-priority task on every tick. A GPIO toggled in the high-priority task interrupts the low-priority blink pattern. This confirms preemption.

The Core Question You’re Answering

How does the OS forcefully take control of the CPU to meet deadlines?

Concepts You Must Understand First

  • SysTick interrupts (Primer 3; Real-Time Concepts Ch. 10-11)
  • Scheduling and priorities (Primer 5; Real-Time Concepts Ch. 4-5)
  • Context switching (Primer 4; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

  • How do you trigger PendSV from SysTick?
  • How do you manage READY/RUNNING states safely?
  • How do you ensure the highest priority task runs immediately?

Thinking Exercise

Simulate three tasks (priorities 3, 2, 1) and draw the execution timeline when a high-priority task becomes ready mid-tick.

The Interview Questions They’ll Ask

  1. Why is PendSV given the lowest priority?
  2. What is priority inversion and how would it show up here?
  3. How can you ensure preemption happens immediately?
  4. What happens if no task is READY?

Hints in Layers

  1. In SysTick ISR, set PendSV pending bit.
  2. Maintain task state array and select highest READY.
  3. Add an idle task with lowest priority.
  4. Verify preemption with GPIO toggles.
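Hints 2 and 3 together suggest a selection routine like the sketch below. The tcb_t fields and the function name are hypothetical; the code assumes that a higher number means higher priority and that slot 0 holds the idle task, which never blocks.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED } task_state_t;

typedef struct {
    task_state_t state;
    uint8_t priority;    /* higher number = higher priority (assumption) */
} tcb_t;

/* Returns the index of the highest-priority runnable task. Slot 0 is
 * assumed to hold the idle task, which is never BLOCKED, so the result
 * is always valid. */
size_t pick_next_task(const tcb_t *tasks, size_t n)
{
    size_t best = 0;  /* idle task */
    for (size_t i = 1; i < n; i++) {
        if (tasks[i].state == TASK_BLOCKED) continue;
        if (tasks[i].priority > tasks[best].priority) best = i;
    }
    return best;
}
```

The scan is O(n) per scheduling decision, which is fine for a handful of tasks. Because the comparison is strict, ties go to the lowest index; round-robin among equal priorities needs extra state.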

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 4-5 | RTOS scheduling |
| Making Embedded Systems (2nd ed) | Ch. 5-6 | Interrupts and flow of activity |

Common Pitfalls & Debugging

Problem: Preemption never happens

  • Why: PendSV priority too high
  • Fix: Set PendSV to the lowest priority, e.g. NVIC_SetPriority(PendSV_IRQn, (1 << __NVIC_PRIO_BITS) - 1)
  • Quick test: Check the PendSV priority field in SHPR3 (0xE000ED20, bits 23:16)

Problem: Kernel crashes in ISR

  • Why: Re-entrant scheduler or interrupts not masked
  • Fix: Use critical sections around scheduler data
  • Quick test: Disable interrupts during task list update

Definition of Done

  • SysTick triggers preemption
  • Highest priority READY task runs immediately
  • Idle task runs when no tasks are ready
  • Task states are consistent under load

Project 5: Sleep, Delay, and Idle Task

Real World Outcome

Tasks can call sleep_ms(100) and reliably resume. An idle task runs when no work is available and toggles a GPIO slowly. Tick-based timing is visible on a scope.

The Core Question You’re Answering

How do tasks block without busy-waiting and still meet deadlines?

Concepts You Must Understand First

  • Time services and delays (Primer 7; Real-Time Concepts Ch. 11)
  • Scheduling and task states (Primer 5; Real-Time Concepts Ch. 4-5)
  • Tick wraparound handling (Primer 7; Real-Time Concepts Ch. 9,11)

Questions to Guide Your Design

  • How will you track wake-up ticks?
  • How do you handle tick counter overflow?
  • How does the idle task reduce power?

Thinking Exercise

Design a data structure to store sleeping tasks efficiently. Compare a linear scan vs sorted list.

The Interview Questions They’ll Ask

  1. Why is busy-waiting bad in RTOS design?
  2. What happens when tick counter wraps around?
  3. How does an idle task improve power efficiency?
  4. How do you ensure a sleeping task wakes exactly on time?

Hints in Layers

  1. Store wake_tick in each TCB.
  2. Each tick, check for expired tasks and mark READY.
  3. Use unsigned arithmetic to handle wraparound.
  4. Add idle hook to enter low-power mode.

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 11 | Timer services |
| Zephyr RTOS Embedded C Programming | Ch. 2 | RTOS fundamentals |

Common Pitfalls & Debugging

Problem: Tasks never wake

  • Why: Incorrect tick comparison with overflow
  • Fix: Use if ((int32_t)(now - wake) >= 0)
  • Quick test: Force tick near overflow and test

Problem: Idle task runs too often

  • Why: READY tasks never marked correctly
  • Fix: Validate state transitions
  • Quick test: Log state changes via UART

Definition of Done

  • Tasks can sleep for N ticks
  • Idle task runs when all others are blocked
  • Tick overflow handled correctly
  • Measured sleep times are accurate

Project 6: Mutexes and Priority Inversion Demo

Real World Outcome

You will create three tasks (low, medium, high priority). The low task locks a mutex, the high task blocks on it, and the medium task runs, delaying the high task. Then you enable priority inheritance and observe the fix.

The Core Question You’re Answering

How do you prevent a low-priority task from breaking real-time guarantees?

Concepts You Must Understand First

  • Synchronization and mutexes (Primer 6; Real-Time Concepts Ch. 6,15)
  • Scheduling and priorities (Primer 5; Real-Time Concepts Ch. 4-5)
  • Task states and blocking (Primer 5; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

  • How does a mutex track its owner?
  • How will you implement priority inheritance?
  • What happens when a task holding a mutex blocks?

Thinking Exercise

Draw a timing diagram showing priority inversion with and without inheritance.

The Interview Questions They’ll Ask

  1. What is priority inversion?
  2. How does priority inheritance work?
  3. Why is priority inheritance typically limited to mutexes?
  4. Can priority inheritance cause deadlocks?

Hints in Layers

  1. Add owner and lock_count to the mutex struct.
  2. When high-priority task blocks, temporarily boost owner priority.
  3. Restore original priority on unlock.
  4. Log priority changes to UART for visibility.

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 6 | Semaphores and mutexes |
| Zephyr RTOS Embedded C Programming | Ch. 4 | Multithreading and synchronization |

Common Pitfalls & Debugging

Problem: Deadlock after inheritance

  • Why: Recursive lock without support
  • Fix: Either disallow or implement recursive mutex
  • Quick test: Add assertions on lock count

Problem: Priority never restored

  • Why: Missing restore path on unlock
  • Fix: Store original priority in mutex
  • Quick test: Print priority before/after unlock

Definition of Done

  • Priority inversion reproduced and measured
  • Priority inheritance fixes the inversion
  • Mutex ownership and unlock correctness verified
  • No deadlocks under stress

Project 7: Message Queue and ISR Deferral

Real World Outcome

An ISR pushes sensor data into a queue, and a lower-priority task processes it. UART logs show clean producer-consumer flow without dropped messages.

The Core Question You’re Answering

How do you safely move data from an interrupt to a task without losing determinism?

Concepts You Must Understand First

  • Interrupt model and ISR constraints (Primer 3; Real-Time Concepts Ch. 10)
  • IPC queues (Primer 6; Real-Time Concepts Ch. 7)
  • Task wakeup mechanics (Primer 5; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

  • How will ISR-safe queue operations differ from task-level operations?
  • What happens when the queue is full?
  • How will you wake a blocked consumer task?

Thinking Exercise

Design a ring buffer with head/tail indices and explain how to avoid race conditions.

The Interview Questions They’ll Ask

  1. Why should ISRs not do heavy processing?
  2. How do you make a queue ISR-safe?
  3. What is a deferred interrupt?
  4. How do you handle queue overflow?

Hints in Layers

  1. Use a fixed-size ring buffer with power-of-two size.
  2. In ISR, disable interrupts only around index updates.
  3. If queue is full, drop oldest or increment a loss counter.
  4. Wake the consumer task by setting its state to READY.
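One way to combine hints 1 and 3 is a single-producer, single-consumer ring with free-running indices. This sketch takes the loss-counter option (the newest sample is dropped when the queue is full). All names and the size of 8 are assumptions, and it relies on a single-core MCU where the ISR is the only writer of tail and the task is the only writer of head; with several producers you would need a critical section around the push.

```c
#include <stdbool.h>
#include <stdint.h>

#define ISRQ_SIZE 8u                 /* must be a power of two */
#define ISRQ_MASK (ISRQ_SIZE - 1u)

typedef struct {
    volatile uint32_t head;          /* written only by the consumer task */
    volatile uint32_t tail;          /* written only by the producer ISR */
    volatile uint32_t dropped;       /* samples lost because the queue was full */
    uint16_t buf[ISRQ_SIZE];
} isr_queue_t;

/* Called from the ISR: never blocks. */
bool isrq_push_from_isr(isr_queue_t *q, uint16_t sample)
{
    if (q->tail - q->head == ISRQ_SIZE) {   /* free-running indices */
        q->dropped++;
        return false;
    }
    q->buf[q->tail & ISRQ_MASK] = sample;
    q->tail++;                              /* publish after the write */
    return true;
}

/* Called from the consumer task. */
bool isrq_pop(isr_queue_t *q, uint16_t *out)
{
    if (q->head == q->tail) return false;
    *out = q->buf[q->head & ISRQ_MASK];
    q->head++;
    return true;
}
```

The power-of-two size lets the indices run freely and wrap with the 32-bit counter, so `tail - head` is always the fill level and the mask replaces a modulo.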

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 7, 10 | Message queues and interrupts |
| Making Embedded Systems (2nd ed) | Ch. 5-6 | Interrupts and flow control |

Common Pitfalls & Debugging

Problem: Queue corrupted after ISR

  • Why: Non-atomic head/tail update
  • Fix: Use critical section or atomic ops
  • Quick test: Enable queue integrity checks

Problem: Consumer never wakes

  • Why: Missing state transition or PendSV trigger
  • Fix: Set task READY and trigger scheduler
  • Quick test: Set breakpoint in scheduler

Definition of Done

  • ISR can enqueue data without blocking
  • Consumer task processes all data
  • Overflow behavior is defined and tested
  • No race conditions observed

Project 8: Event Flags and Software Timers

Real World Outcome

Multiple tasks wait on event flags (bitmask). A software timer fires every 100 ms and sets a flag; another timer provides a 1-second heartbeat. LEDs and UART logs confirm flag-driven execution.

The Core Question You’re Answering

How do you coordinate multiple conditions and periodic events efficiently?

Concepts You Must Understand First

  • Event flags and kernel objects (Primer 6; Real-Time Concepts Ch. 8,15)
  • Software timers and timeouts (Primer 7; Real-Time Concepts Ch. 11)
  • Scheduling under time constraints (Primer 5; Real-Time Concepts Ch. 4-5)

Questions to Guide Your Design

  • How will you store and atomically update event flags?
  • What data structure will manage multiple timers?
  • How will you handle missed or delayed timer events?

Thinking Exercise

Design a bitmask-based wait that supports wait-any and wait-all semantics.

The Interview Questions They’ll Ask

  1. What is the difference between a queue and event flags?
  2. How do you implement periodic software timers?
  3. Why should timer callbacks be short?
  4. How do you avoid timer drift?

Hints in Layers

  1. Use a 32-bit mask for event flags.
  2. Support wait-any by unblocking when (flags & mask) != 0.
  3. Keep the timer list sorted by expiry time so the tick handler only has to check the head.
  4. Use absolute next-fire time to reduce drift.
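
The wait-any and wait-all checks from the hints can be sketched like this. The names `wait_mode_t`, `flags_satisfied`, and `flags_take` are hypothetical, and the sketch assumes the caller already holds a critical section whenever the shared flag word is read or modified.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { WAIT_ANY, WAIT_ALL } wait_mode_t;

/* True when the current flag word satisfies the waiter's mask and mode. */
static bool flags_satisfied(uint32_t flags, uint32_t mask, wait_mode_t mode)
{
    if (mode == WAIT_ALL) {
        return (flags & mask) == mask;  /* every requested bit is set */
    }
    return (flags & mask) != 0u;        /* at least one requested bit is set */
}

/* Consume the matched bits on wake-up: returns them and clears them in *flags.
 * Assumption: called inside a critical section. */
static uint32_t flags_take(uint32_t *flags, uint32_t mask)
{
    uint32_t matched = *flags & mask;
    *flags &= ~matched;
    return matched;
}
```

A setter would OR new bits into the flag word, then walk the list of waiting tasks and make READY each one for which `flags_satisfied` returns true. Whether the woken task clears its bits with `flags_take` or leaves them set for other waiters is a design choice the kernel has to define.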

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 8, 11 | Kernel objects and timers |
| Zephyr RTOS Embedded C Programming | Ch. 5 | Work queues and messaging |

Common Pitfalls & Debugging

Problem: Flags missed

  • Why: Flag cleared before the waiting task unblocks and reads it
  • Fix: Keep bits sticky until the waiting task acknowledges them
  • Quick test: Add logging around flag set/clear

Problem: Timer drift

  • Why: Next trigger computed from the current time (now + period) instead of the previous scheduled time
  • Fix: Add the period to the previous scheduled fire time
  • Quick test: Compare timestamps over 1000 cycles
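
A drift-free periodic timer can be sketched as below. The names `sw_timer_t`, `tick_reached`, and `sw_timer_poll` are hypothetical. The sketch assumes a free-running 32-bit tick counter, a period greater than zero, and a compiler where casting the unsigned difference to `int32_t` yields the usual two's-complement value (true of the common Cortex-M toolchains).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t next_fire;   /* absolute tick at which the timer is due */
    uint32_t period;      /* period in ticks; assumption: > 0 */
    uint32_t fired;       /* stand-in for invoking the callback */
} sw_timer_t;

/* Wrap-safe "now >= deadline" for deadlines less than 2^31 ticks away. */
static bool tick_reached(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}

/* Called from the tick handler or a timer task with the current tick count. */
static void sw_timer_poll(sw_timer_t *t, uint32_t now)
{
    while (tick_reached(now, t->next_fire)) {
        t->fired++;                   /* a real kernel would run the callback here */
        t->next_fire += t->period;    /* schedule from the planned time, not from now */
    }
}
```

Because the next deadline is derived from the previous deadline, a poll that arrives a few ticks late does not shift later expirations. The `while` loop fires once per missed period; a kernel that prefers to skip missed periods would advance `next_fire` past `now` without running the callback each time.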

Definition of Done

  • Event flags support wait-any and wait-all
  • Software timers fire at correct periods
  • Timer callbacks are deterministic
  • No missed events under load

Project 9: Memory Pool and Stack Safety

Real World Outcome

Your kernel allocates buffers from a fixed memory pool with constant-time behavior. Stack high-water marks are reported over UART, and deliberate overflow triggers a safe error.

The Core Question You’re Answering

How do you make memory usage deterministic and safe in an RTOS?

Concepts You Must Understand First

  • Memory pools and deterministic allocation (Primer 8; Real-Time Concepts Ch. 13)
  • Stack safety and sizing (Primer 8; Making Embedded Systems 2nd ed Ch. 11)
  • Context switching (Primer 4; Real-Time Concepts Ch. 5)

Questions to Guide Your Design

  • How big should each task stack be?
  • How will you detect stack overflow at runtime?
  • What pool sizes are appropriate for IPC messages?

Thinking Exercise

Estimate stack depth for a task that calls three nested functions, each with 64 bytes of local data.

The Interview Questions They’ll Ask

  1. Why do RTOS kernels often avoid malloc/free?
  2. What is a stack watermark and how is it measured?
  3. How do memory pools improve determinism?
  4. What happens if a stack overflows during an ISR?

Hints in Layers

  1. Fill stacks with a known pattern at init.
  2. Periodically scan each stack for the first overwritten word to find its high-water mark.
  3. Build a pool as a linked list of fixed blocks.
  4. On overflow, trigger a safe fault handler.
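
The fill-and-scan approach from the first two hints can be sketched as follows. The names `STACK_FILL`, `stack_fill`, and `stack_unused_words` are hypothetical, and the sketch assumes a full-descending stack (as on Cortex-M), where the task starts at the highest address and grows toward index 0 of the array.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u   /* assumption: any unlikely pattern works */

/* Paint the whole stack before the task's initial frame is built. */
static void stack_fill(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count the words at the low end that still hold the fill pattern.
 * The stack grows down toward stack[0], so this is the free headroom;
 * the high-water mark is (words - return value). */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

A result of zero means the task has written all the way to the end of its stack and has most likely overflowed. The scan is linear in the stack size, so it belongs in a low-priority task rather than in the tick handler.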

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 13 | Memory management |
| Making Embedded Systems (2nd ed) | Ch. 11 | Optimization and resource limits |

Common Pitfalls & Debugging

Problem: Pool allocator corrupts memory

  • Why: Double free or invalid pointer
  • Fix: Add allocation state flags
  • Quick test: Stress test with random allocations
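
A fixed-block pool with allocation state flags might look like the sketch below. The names `pool_t`, `pool_init`, `pool_alloc`, and `pool_free`, along with the block count and block size, are assumptions chosen for illustration. The sketch is not interrupt-safe on its own: a kernel would wrap `pool_alloc` and `pool_free` in a critical section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     4u
#define POOL_BLOCK_SIZE 32u

/* A free block stores the free-list link inside its own payload area. */
typedef union block {
    union block *next;
    uint8_t payload[POOL_BLOCK_SIZE];
} block_t;

typedef struct {
    block_t  blocks[POOL_BLOCKS];
    block_t *free_list;
    bool     in_use[POOL_BLOCKS];    /* allocation state flags */
} pool_t;

static void pool_init(pool_t *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        p->in_use[i] = false;
        p->blocks[i].next = p->free_list;   /* push each block onto the list */
        p->free_list = &p->blocks[i];
    }
}

/* O(1): pop the head of the free list, or return NULL when exhausted. */
static void *pool_alloc(pool_t *p)
{
    block_t *b = p->free_list;
    if (b == NULL) {
        return NULL;
    }
    p->free_list = b->next;
    p->in_use[b - p->blocks] = true;
    return b;
}

/* O(1): rejects foreign pointers, misaligned pointers, and double frees. */
static bool pool_free(pool_t *p, void *ptr)
{
    uintptr_t base = (uintptr_t)p->blocks;
    uintptr_t addr = (uintptr_t)ptr;
    if (addr < base || addr >= base + sizeof p->blocks) {
        return false;                        /* not from this pool */
    }
    uintptr_t off = addr - base;
    if ((off % sizeof(block_t)) != 0u) {
        return false;                        /* not a block start */
    }
    size_t idx = (size_t)(off / sizeof(block_t));
    if (!p->in_use[idx]) {
        return false;                        /* double free */
    }
    p->in_use[idx] = false;
    p->blocks[idx].next = p->free_list;
    p->free_list = &p->blocks[idx];
    return true;
}
```

Both operations touch only the head of the free list, which is what makes allocation time constant. The `in_use` array costs one flag per block and turns silent free-list corruption into a detectable error return.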

Problem: Stack watermark always zero

  • Why: Fill pattern overwritten across the whole stack (stack too small or already overflowed)
  • Fix: Increase stack size, reduce recursion
  • Quick test: Measure after minimal workload

Definition of Done

  • Memory pool allocations are constant time
  • Stack high-water marks reported
  • Overflow detection triggers error
  • No heap usage in kernel

Project 10: Latency and Jitter Measurement Toolkit

Real World Outcome

You will generate a report of interrupt latency and task jitter. GPIO pulses show worst-case ISR latency, and UART logs show jitter statistics. This turns your RTOS into a measurable system.

The Core Question You’re Answering

How do you verify that your RTOS actually meets timing guarantees?

Concepts You Must Understand First

  • Real-time fundamentals and metrics (Primer 1; Real-Time Concepts Ch. 16)
  • Interrupt timing behavior (Primer 3; Real-Time Concepts Ch. 10)
  • Time services and tick accuracy (Primer 7; Real-Time Concepts Ch. 11)

Questions to Guide Your Design

  • What metrics will you capture (latency, jitter, WCET)?
  • How will you timestamp events without disturbing timing?
  • How will you report the data?

Thinking Exercise

Design an experiment to measure how much jitter increases when you enable a heavy ISR.

The Interview Questions They’ll Ask

  1. How do you measure interrupt latency on real hardware?
  2. What is the difference between latency and response time?
  3. Why is jitter important in control systems?
  4. How do you confirm worst-case behavior?

Hints in Layers

  1. Toggle a GPIO at ISR entry and exit, measure with a scope.
  2. Use the DWT cycle counter if available for precise timing.
  3. Log timestamps to UART in a low-priority task.
  4. Stress the system with extra ISRs to see worst-case effects.
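
Once timestamps are captured, the jitter statistics can be accumulated with a few integer operations per sample. The sketch below is hardware-independent: the names `jitter_stats_t`, `jitter_init`, `jitter_add`, and `jitter_peak_to_peak` are hypothetical, and it assumes the timestamps come from a free-running 32-bit counter such as the DWT cycle counter mentioned above.

```c
#include <stdint.h>

typedef struct {
    uint32_t min;     /* shortest observed interval */
    uint32_t max;     /* longest observed interval  */
    uint64_t sum;     /* for the mean: sum / count  */
    uint32_t count;
} jitter_stats_t;

static void jitter_init(jitter_stats_t *s)
{
    s->min = UINT32_MAX;
    s->max = 0u;
    s->sum = 0u;
    s->count = 0u;
}

/* Record one interval between two consecutive timestamps.
 * Unsigned subtraction stays correct across one counter wrap. */
static void jitter_add(jitter_stats_t *s, uint32_t prev_stamp, uint32_t stamp)
{
    uint32_t interval = stamp - prev_stamp;
    if (interval < s->min) { s->min = interval; }
    if (interval > s->max) { s->max = interval; }
    s->sum += interval;
    s->count++;
}

/* Peak-to-peak jitter: spread between the longest and shortest interval. */
static uint32_t jitter_peak_to_peak(const jitter_stats_t *s)
{
    return (s->count != 0u) ? (s->max - s->min) : 0u;
}
```

The periodic task would call `jitter_add` with its previous and current timestamps, and a low-priority task would print `min`, `max`, and `sum / count` over UART. Keeping only running statistics avoids storing every sample, at the cost of losing the distribution shape; a histogram is a natural next step if that matters.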

Books That Will Help

| Book | Chapters | Why |
|---|---|---|
| Real-Time Concepts for Embedded Systems | Ch. 16 | Common design problems |
| Making Embedded Systems (2nd ed) | Ch. 4-5 | Timing and interrupt fundamentals |

Common Pitfalls & Debugging

Problem: Measurement itself changes timing

  • Why: UART logging in ISR adds delay
  • Fix: Only toggle GPIO in ISR; log later in task
  • Quick test: Compare latency with and without logging

Problem: Jitter values inconsistent

  • Why: Non-deterministic interrupts or DMA
  • Fix: Disable non-essential peripherals during test
  • Quick test: Run tests in minimal configuration

Definition of Done

  • Latency measured with GPIO pulses
  • Jitter statistics reported over UART
  • Worst-case measurements captured under load
  • Results documented in a timing report