Project 3: A Cooperative Multi-Tasking Scheduler

Build a cooperative scheduler where tasks explicitly yield, and implement context switching with per-task stacks on Cortex-M.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 2-3 weeks
Main Programming Language C with small ARM assembly
Alternative Programming Languages Rust, Ada
Coolness Level Very High
Business Potential Medium
Prerequisites Projects 1-2, SysTick basics, ARM register familiarity
Key Topics Task stacks, context switch, cooperative scheduling, TCBs, task states

1. Learning Objectives

By completing this project, you will:

  1. Create multiple tasks with separate stacks and initialize their contexts.
  2. Implement a cooperative yield() that switches tasks correctly.
  3. Explain the Cortex-M exception frame and how it enables context switching.
  4. Debug context switches using GDB and register inspection.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Task Stacks and Cortex-M Context Frames

Fundamentals

Each task in a multitasking system must have its own stack to store local variables, return addresses, and saved registers. On Cortex-M, the hardware automatically pushes a fixed exception frame on interrupt entry. By carefully crafting a fake exception frame on a new task’s stack, you can trick the CPU into “returning” into the task as if it had been interrupted earlier. The key is understanding which registers are stacked automatically (R0-R3, R12, LR, PC, xPSR) and which are saved manually (R4-R11) by your context switch code. This separation is the foundation of efficient context switching on Cortex-M.

Additional fundamentals: A task stack is the only place where that task’s local state lives. Sharing stacks between tasks is equivalent to sharing a single thread of execution, which defeats multitasking. The requirement for independent stacks is non-negotiable, and it is the root of why context switching is a stack manipulation problem.

Deep Dive into the concept

A task stack is not just a memory buffer; it is a snapshot of the CPU state at a specific time. When a task is running, its stack grows downward as functions call other functions. When the task is interrupted (or yields), the CPU pushes an exception frame. This frame includes the program counter and status register, which defines where execution resumes. The RTOS must preserve additional registers (R4-R11) that are not automatically stacked. In cooperative scheduling, you can trigger a context switch using a software exception (like PendSV or SVC) or by manually saving registers and swapping stack pointers. On Cortex-M, the PendSV exception is designed for context switching because it can be assigned the lowest priority and runs only when no other interrupts are active.

Creating a new task requires building an initial stack that looks like an interrupted context. This means placing the initial PC (task entry function), a valid LR (usually a task exit handler), and a default xPSR with the Thumb bit set. If the xPSR Thumb bit is missing, the CPU will fault immediately when it tries to “return” into the task. You also need to set a sensible initial R0 value if you want to pass an argument. The stack pointer should be aligned to 8 bytes per the ARM ABI. An incorrect stack alignment may work sometimes but will eventually cause hard-to-debug faults, especially when the CPU uses 64-bit data or floating-point registers.

The stack is also the most common source of RTOS bugs. If the stack is too small, it will overflow and corrupt adjacent memory. In a cooperative scheduler, you might not notice immediately, because tasks yield voluntarily. But as soon as a deep call or recursion occurs, corruption can manifest. This is why later projects introduce stack watermarks and overflow detection. For now, you must choose stack sizes conservatively and verify them with GDB or a stack fill pattern. You should also remember the difference between MSP (main stack pointer) and PSP (process stack pointer). Cortex-M supports two stack pointers; the RTOS often keeps the kernel on MSP and tasks on PSP. Even in a cooperative scheduler, using PSP for tasks makes the later transition to preemptive scheduling smoother.

Finally, context switching is about invariants. The invariant is that every task’s saved context is self-consistent: its stack pointer points to a valid frame, and the frame holds valid register values. When you switch tasks, you must store the outgoing task’s SP in its TCB and restore the incoming task’s SP into the CPU. If you do anything else, you will crash. The ability to reason about these invariants is what makes you capable of debugging RTOS-level issues. This project is your first chance to build and validate that reasoning.

Additional depth: The Cortex-M exception frame has a specific layout and alignment that affects both correctness and debugging. The hardware-stacked frame is 8 words, and the stack pointer is automatically aligned to 8 bytes before stacking. If you manually adjust SP or misalign it, the hardware stacking can fail or create hard-to-debug faults. When you create a new task, you must ensure the stack pointer points to a valid frame that the CPU can “unstack” on return. That means the stacked PC must be a valid function entry with the Thumb bit set, and the stacked xPSR must have T-bit set. If you plan to use floating-point instructions later, you must also account for extended frames (lazy stacking) and ensure the stack has room for extra registers. These details are easy to miss but are critical for a stable context switch.

Another important detail is how the compiler uses registers. The Cortex-M calling convention treats R4-R11 as callee-saved and R0-R3/R12 as caller-saved. This is why the hardware stacks R0-R3/R12 and your context switch saves R4-R11. If you mistakenly save the wrong set, tasks will corrupt each other’s state in subtle ways. When debugging, you can validate this by writing distinct values into R4-R11 in each task and confirming they persist across yields. This is a practical, concrete check that your context switch is correct.

Stack sizing is also more nuanced than just “large enough.” Each task has its own stack plus the interrupt stacking that occurs if an interrupt fires while the task is running. If you use MSP for interrupts, the task stack only needs to cover task call depth. If you use PSP for both tasks and interrupts, then the task stack must also handle ISR frames, which increases the required size. This project is a good place to decide on that split and to document it clearly. A later improvement is to track high-water marks and adjust stack sizes accordingly.

Finally, a cooperative context switch can be implemented without PendSV by directly switching stack pointers in C/assembly. This can be tempting, but PendSV offers a cleaner and more robust mechanism that aligns with the interrupt architecture. Even in cooperative mode, using PendSV prepares you for preemption and keeps the kernel design consistent. The deeper point is that you are building a reusable kernel architecture, not just a single demo.

How this fit on projects

This concept is used directly in Sec. 3.1 (task creation), Sec. 4.4 (data structures), and Sec. 5.10 (Phase 2). It is also required for the preemptive scheduler in Project 4.

Definitions & key terms

  • TCB: Task Control Block storing task state and stack pointer.
  • Exception frame: Hardware-saved registers on interrupt entry.
  • PSP/MSP: Process and Main Stack Pointers on Cortex-M.
  • PendSV: Low-priority exception used for context switching.

Mental model diagram (ASCII)

Task A stack -> [saved R4-R11][hw exception frame]
             -> SP_A stored in TCB
Task B stack -> [saved R4-R11][hw exception frame]
             -> SP_B stored in TCB

How it works (step-by-step, with invariants and failure modes)

  1. Task yields -> context switch triggered.
  2. Save R4-R11 to current stack, store SP in TCB.
  3. Load next task’s SP from its TCB.
  4. Restore R4-R11, return from exception to task.
  5. Failure modes: invalid xPSR or misaligned stack -> HardFault.

Minimal concrete example

// Pseudocode for initializing task stack
uint32_t *sp = stack_top;
*(--sp) = 0x01000000; // xPSR (Thumb)
*(--sp) = (uint32_t)task_entry; // PC
*(--sp) = (uint32_t)task_exit;  // LR
// R12, R3, R2, R1, R0
for (int i = 0; i < 5; i++) *(--sp) = 0;
// R4-R11 saved manually later

Common misconceptions

  • “One stack is enough for all tasks” -> tasks must have isolated stacks.
  • “The compiler handles context switching” -> you must save/restore registers.
  • “Any initial PC works” -> must be Thumb state and valid address.

Check-your-understanding questions

  1. Which registers are automatically stacked on Cortex-M interrupt entry?
  2. Why must the initial xPSR have bit 24 set?
  3. What is the difference between MSP and PSP?

Check-your-understanding answers

  1. R0-R3, R12, LR, PC, xPSR.
  2. Bit 24 sets Thumb state; without it, the CPU faults.
  3. MSP is for handlers/kernel, PSP is for thread tasks.

Real-world applications

  • Every RTOS port layer on Cortex-M.
  • Cooperative fibers or coroutines in embedded runtimes.

Where you’ll apply it

  • This project: Sec. 3.1, Sec. 4.4, Sec. 5.10 Phase 2.
  • Also used in: Project 4 and Project 9.

References

  • ARM Cortex-M exception handling documentation.
  • “Real-Time Concepts for Embedded Systems” by Qing Li, Ch. 5.

Key insights

Context switching is just disciplined stack manipulation plus strict invariants.

Summary

Task stacks and exception frames are the mechanism that makes multitasking possible on Cortex-M.

Homework/Exercises to practice the concept

  1. Draw the full stack frame for a task after its first yield.
  2. Modify the stack init to pass an argument via R0.
  3. Use GDB to inspect the stack before and after a context switch.

Solutions to the homework/exercises

  1. Include hw frame plus manually saved registers.
  2. Place the argument value in the R0 slot of the initial frame.
  3. Set breakpoints and inspect memory around SP for each task.

2.2 Cooperative Scheduling and Task State Management

Fundamentals

A cooperative scheduler switches tasks only when the running task voluntarily yields. This makes the scheduler simpler and deterministic: no task can be preempted unexpectedly. The scheduler must maintain a list of runnable tasks and a current task index. Each task calls yield() to give up the CPU, at which point the scheduler selects the next task in a round-robin or priority order. Tasks that never yield will starve others, which is why cooperative scheduling is best for well-behaved tasks or educational kernels.

Additional fundamentals: Cooperative scheduling is essentially a social contract between tasks. The kernel is simple, but the tasks must be polite and yield often. This makes it a great teaching model, because it exposes scheduling logic without the complexity of preemption, while still forcing you to reason about fairness and responsiveness.

Deep Dive into the concept

In a cooperative scheduler, the system is essentially a controlled handoff. The scheduler can be implemented as a simple loop that resumes each task in sequence. But with real stacks and context switching, the scheduler becomes a small kernel: it must track task states, save/restore contexts, and ensure the invariants for each task are preserved. The Task Control Block (TCB) typically stores the task’s stack pointer, state (READY, RUNNING, BLOCKED), and sometimes priority. In cooperative mode, READY tasks are those that are not blocked; RUNNING is the currently executing task. When a task calls yield(), it transitions itself from RUNNING to READY and then invokes the scheduler.

Task state management is about correctness and fairness. A round-robin scheduler is fair among READY tasks, but does not prioritize critical work. Cooperative scheduling can also be combined with priorities, but the enforcement is still voluntary; a high-priority task cannot preempt a lower-priority one. This distinction is crucial. Many embedded systems start with cooperative designs because they are easier to debug and require less ISR complexity. However, as soon as timing guarantees are required, preemption is necessary. This project teaches the mechanics of task switching first, so you can later add preemption with a clear mental model.

The yield() function is a kernel boundary. It is the only place where context switching is allowed. This makes debugging easier because you can set breakpoints in yield() and inspect task stacks and TCBs. It also means you must design tasks to yield frequently enough to keep the system responsive. A typical pattern is to call yield() in long-running loops or after processing a chunk of work. In later projects, blocking calls like sleep() or wait() will implicitly yield, but in this project you implement the explicit mechanism.

A cooperative scheduler must also handle task creation and exit. When a task finishes, it should not return to nowhere. Many kernels define a task exit handler that cleans up and marks the task as DEAD or READY for reuse. Without this, a task returning will corrupt control flow. You also need an idle task as a safety net when no user tasks are ready. Even in cooperative scheduling, an idle loop can handle background work or power-saving in later steps.

The key tradeoff is simplicity versus responsiveness. Cooperative scheduling is simpler because it avoids the complexity of interrupt-based preemption, but it relies on trust in the tasks. This project is intentionally structured to make you aware of that tradeoff and to teach the machinery of task switching before you add more complexity.

Additional depth: Cooperative scheduling has a direct relationship to software design discipline. Tasks must be written so they naturally yield at safe points, such as after processing one item in a queue or after completing a fixed amount of work. This encourages modularity and clear control flow, but it also requires careful planning. If you have a task that performs a long computation without yielding, you must restructure it into smaller chunks or add explicit yield points. This is a valuable exercise because it forces you to think about responsiveness and fairness, even before preemption exists.

In a cooperative system, blocking operations are also different. Without a scheduler that can preempt, blocking is often implemented as a yield loop that checks for a condition. This can lead to busy-waiting if not designed carefully. A cleaner approach is to use a blocking API that changes task state and yields the CPU immediately. This requires the scheduler to maintain state and to resume the task when the condition is met. Even though this project is cooperative, you can still implement proper blocking semantics to practice the kernel structures you will use later.

Fairness is another subtlety. A simple round-robin scheduler is fair only if all tasks yield at similar rates. If one task yields too frequently, it can effectively dominate the CPU because it is always READY and always first in the round-robin order. You can mitigate this by tracking the last-run index and rotating fairly, or by adding time-slicing logic even in a cooperative kernel. These choices influence how smooth and predictable your system feels.

Finally, a cooperative scheduler is an ideal environment for debugging because context switches happen at known points. You can place breakpoints in yield() and inspect task stacks and TCBs without worrying about asynchronous preemption. Use this to build confidence in your task state transitions and stack integrity. The debugging skills you develop here are essential when you move to preemptive scheduling, where issues become harder to reproduce.

How this fit on projects

This concept is implemented in Sec. 3.2 (yield), Sec. 4.2 (scheduler design), and Sec. 5.10 (Phase 1). It prepares you for preemption in Project 4.

Definitions & key terms

  • Cooperative scheduling: Tasks yield voluntarily; no forced preemption.
  • Yield: A call that transfers control to the scheduler.
  • READY/RUNNING/BLOCKED: Task states used by the scheduler.
  • Idle task: Default task when no others are ready.

Mental model diagram (ASCII)

Task A -> yield -> Scheduler -> Task B -> yield -> Scheduler -> Task C

How it works (step-by-step, with invariants and failure modes)

  1. Task calls yield().
  2. Scheduler saves current task context.
  3. Scheduler selects next READY task.
  4. Scheduler restores context and resumes task.
  5. Failure mode: task never yields -> starvation.

Minimal concrete example

void yield(void) {
    current_tcb->state = READY;
    schedule_next();
    context_switch();
}

Common misconceptions

  • “Cooperative means no scheduler” -> you still need task management.
  • “Round-robin always fair” -> a task can still hog CPU by not yielding.
  • “Idle task is optional” -> it is required to handle no READY tasks.

Check-your-understanding questions

  1. Why can a cooperative scheduler starve tasks?
  2. What is the role of an idle task?
  3. How do task states change when a task yields?

Check-your-understanding answers

  1. If a task never yields, others never run.
  2. It provides a safe execution path when no tasks are READY.
  3. RUNNING -> READY for the yielding task; next task becomes RUNNING.

Real-world applications

  • Simple embedded loops with multiple periodic tasks.
  • Cooperative multitasking in early microcontroller systems.

Where you’ll apply it

  • This project: Sec. 3.1, Sec. 4.2, Sec. 5.10 Phase 1.
  • Also used in: Project 5.

References

  • “Real-Time Concepts for Embedded Systems” by Qing Li, Ch. 4-5.
  • Classic cooperative schedulers in early RTOS designs.

Key insights

Cooperative scheduling teaches the mechanics of multitasking without the complexity of preemption.

Summary

A cooperative scheduler is a disciplined handoff system; its correctness depends on well-behaved tasks and robust state management.

Homework/Exercises to practice the concept

  1. Add a third task and verify it runs in round-robin order.
  2. Modify yield() to implement simple priority selection.
  3. Create a task that never yields and observe starvation.

Solutions to the homework/exercises

  1. Add another TCB and confirm its stack is independent.
  2. Choose the highest priority READY task in the scheduler.
  3. Observe that other tasks stop running; fix by adding yield points.

3. Project Specification

3.1 What You Will Build

A cooperative multitasking kernel that runs at least two tasks, each with its own stack, and switches between them only when tasks call yield().

3.2 Functional Requirements

  1. Task Creation: Tasks are created with independent stacks.
  2. Context Switch: yield() switches tasks without corruption.
  3. Scheduler: Supports round-robin selection considered stable.

3.3 Non-Functional Requirements

  • Performance: Context switch < 20 us on target hardware.
  • Reliability: No stack corruption after 10,000 switches.
  • Usability: GDB can inspect task stacks and registers.

3.4 Example Usage / Output

  • Two LEDs blink independently.
  • UART prints “Task A” and “Task B” alternately.

3.5 Data Formats / Schemas / Protocols

  • TCB structure includes stack pointer and state.

3.6 Edge Cases

  • Task returns without exit handler.
  • Stack size too small -> overflow.
  • Null task pointer -> hard fault.

3.7 Real World Outcome

Two tasks run cooperatively and visibly alternate, proving that separate stacks and context switching are correct.

3.7.1 How to Run (Copy/Paste)

make clean all
make flash

Exit codes:

  • make: 0 success, 2 build failure.
  • openocd: 0 flash success, 1 connection failure.

3.7.2 Golden Path Demo (Deterministic)

  1. Task A toggles LED1 every 200 ms and calls yield().
  2. Task B toggles LED2 every 300 ms and calls yield().
  3. Both LEDs blink in their own rhythm without interfering.

3.7.3 Failure Demo (Deterministic)

  1. Remove yield() from Task A loop.
  2. Flash and observe only LED1 toggling.
  3. Restore yield() and verify both LEDs run.

4. Solution Architecture

4.1 High-Level Design

Tasks -> yield() -> scheduler -> context switch -> next task

4.2 Key Components

Component Responsibility Key Decisions
tcb.c Task control blocks Store SP and state
sched.c Task selection Round-robin order
context.S Save/restore registers Use PendSV or inline asm

4.4 Data Structures (No Full Code)

typedef struct {
    uint32_t *sp;
    uint8_t state;
    uint8_t priority;
} tcb_t;

4.4 Algorithm Overview

Key Algorithm: Cooperative Switch

  1. Save current context to stack.
  2. Update current TCB with SP.
  3. Choose next READY task.
  4. Restore next task context.

Complexity Analysis:

  • Time: O(n) for simple round-robin selection.
  • Space: O(n) for TCB list.

5. Implementation Guide

5.1 Development Environment Setup

brew install arm-none-eabi-gcc openocd

5.2 Project Structure

rtos-p03/
|-- src/
|   |-- sched.c
|   |-- tcb.c
|   `-- main.c
|-- asm/
|   `-- context.S
|-- include/
`-- Makefile

5.3 The Core Question You’re Answering

“How can I switch between tasks reliably by saving and restoring CPU state?”

5.4 Concepts You Must Understand First

  1. Cortex-M exception frame and task stack layout.
  2. Cooperative scheduling and task states.

5.5 Questions to Guide Your Design

  1. How will you build the initial stack frame for each task?
  2. Where will each task’s stack reside in RAM?
  3. How will you prevent tasks from returning into invalid code?

5.6 Thinking Exercise

Draw a timeline of task execution showing yields and stack pointer changes.

5.7 The Interview Questions They’ll Ask

  1. “Why are R4-R11 saved manually?”
  2. “What happens if a task stack overlaps another?”
  3. “Why does cooperative scheduling require well-behaved tasks?”

5.8 Hints in Layers

Hint 1: Start with two tasks Keep it simple; ensure both stacks are independent.

Hint 2: Validate the xPSR Set the Thumb bit in the initial frame.

Hint 3: Use PendSV PendSV is designed for context switching and simplifies restoration.


5.9 Books That Will Help

Topic Book Chapter
Context switching “Real-Time Concepts for Embedded Systems” Ch. 5
Cortex-M internals “The Definitive Guide to ARM Cortex-M” Ch. 7

5.10 Implementation Phases

Phase 1: Task Creation (3-4 days)

Goals: Build TCBs and stacks. Tasks: Define TCB structure, allocate stacks, initialize frames. Checkpoint: Task can start and toggle a LED.

Phase 2: Cooperative Scheduler (4-6 days)

Goals: Implement yield() and round-robin selection. Tasks: Save/restore context, switch tasks. Checkpoint: Two tasks alternate on yield.

Phase 3: Debug and Validate (3-4 days)

Goals: Prove no stack corruption. Tasks: Fill stacks with pattern and check watermark. Checkpoint: Consistent stack usage after many switches.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Context switch path PendSV vs manual call PendSV Designed for low-priority switching
Stack size Small vs generous Generous Avoid early overflow

6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |——————|——————————|————————–| | Unit Tests | Validate stack init | Check xPSR value | | Integration Tests | Task switching correctness | Two tasks blink | | Edge Case Tests | Yield/return errors | Task return handler |

6.2 Critical Test Cases

  1. Task A and B switch 10,000 times without fault.
  2. Stack watermark stays above safety margin.
  3. Task returning triggers exit handler.

6.3 Test Data

Expected stack alignment: 8-byte
Expected xPSR: 0x01000000

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |——————————|————————|———————————-| | Thumb bit missing | HardFault on first run | Set xPSR to 0x01000000 | | Misaligned stack | Random crashes | Align stack to 8 bytes | | Task never yields | Starvation | Add yield points |

7.2 Debugging Strategies

  • Inspect SP/PC for each task in GDB.
  • Fill stacks with a pattern and scan for corruption.

7.3 Performance Traps

  • Excessive context switch frequency reduces CPU time for real work.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a third cooperative task.
  • Implement a basic priority field (no preemption).

8.2 Intermediate Extensions

  • Add task exit cleanup and reuse stacks.
  • Implement task_sleep() that yields for N ticks.

8.3 Advanced Extensions

  • Switch tasks with PSP while keeping kernel on MSP.
  • Add stack canaries for overflow detection.

9. Real-World Connections

9.1 Industry Applications

  • Simple cooperative schedulers in IoT devices.
  • Legacy systems where preemption is too risky.
  • FreeRTOS: Cooperative scheduling mode (configUSE_PREEMPTION=0).
  • ChibiOS: Task stack initialization patterns.

9.3 Interview Relevance

  • Context switching, stack frames, and task states are core RTOS interview topics.

10. Resources

10.1 Essential Reading

  • “The Definitive Guide to ARM Cortex-M” by Joseph Yiu.
  • “Real-Time Concepts for Embedded Systems” by Qing Li.

10.2 Video Resources

  • RTOS context switch walkthroughs on Cortex-M.

10.3 Tools & Documentation

  • GDB register inspection and memory dump commands.

11. Self-Assessment Checklist

11.1 Understanding

  • I can draw the Cortex-M exception frame layout.
  • I understand why tasks need independent stacks.

11.2 Implementation

  • Two tasks switch reliably via yield().
  • Stack usage is measured and safe.

11.3 Growth

  • I can explain cooperative vs preemptive scheduling.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Two tasks run and yield correctly.
  • Context switch saves/restores registers without fault.

Full Completion:

  • Stack watermarks measured and documented.
  • Task exit handler implemented.

Excellence (Going Above & Beyond):

  • PSP/MSP split implemented.
  • Context switch verified with cycle counts.