Project 15: Tiny Operating System
Build a minimal but complete operating system with preemptive multitasking, memory protection (using MPU), IPC (semaphores, queues), and a simple shell—like a tiny FreeRTOS you built yourself.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C + ARM Assembly |
| Difficulty | Master |
| Time | 1-2 months |
| Prerequisites | Projects 6-7, 9, strong OS theory |
| Key Topics | Multitasking, MPU, Syscalls, IPC |
| Coolness | Level 5: Pure Magic (Super Cool) |
| Portfolio Value | Industry Disruptor |
| Hardware | STM32 board |
Learning Objectives
By completing this project, you will:
- Implement privilege separation: Understand ARM privilege levels and mode switching
- Configure the Memory Protection Unit: Set up MPU regions for kernel and user space isolation
- Build a system call interface: Use SVCall to provide kernel services to user tasks
- Implement inter-process communication: Create semaphores, mutexes, and message queues
- Create a preemptive scheduler: Round-robin or priority-based task scheduling
- Build a command shell: Interactive interface for process management
- Integrate everything from previous projects: This capstone ties together all ARM concepts
The Core Question You’re Answering
“How does an operating system provide isolation, resource management, and coordination for multiple concurrent tasks on a single-core processor, and what hardware features make this possible on ARM?”
This question forces you to understand that an OS is not magic—it’s a carefully designed layer of software that leverages specific hardware features (privilege modes, MPU, interrupts, timers) to create the illusion of multiple programs running simultaneously while protecting them from each other.
Concepts You Must Understand First
Before starting this project, ensure you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Context switching | Core of multitasking | Project 7 |
| ARM exception model | Syscalls use SVCall | Project 9 |
| Timer interrupts | SysTick drives preemption | Project 7, 12 |
| Stack management | Each task needs its own stack | Project 5, 7 |
| ARM privilege levels | Kernel vs user mode | ARM TRM |
| Scheduling algorithms | Round-robin, priority | OSTEP |
Key Concepts Deep Dive
- ARM Privilege Levels
- What is the difference between privileged (Handler) and unprivileged (Thread) modes?
- How do you switch from privileged to unprivileged mode?
- Why can’t unprivileged code access certain registers?
- How does the processor automatically switch to privileged mode on exceptions?
- “The Definitive Guide to ARM Cortex-M3/M4” Chapter 3
- Memory Protection Unit (MPU)
- What does the MPU protect against?
- How do you define memory regions (base, size, permissions)?
- What happens when an MPU violation occurs?
- How do you configure different regions for different tasks?
- “The Definitive Guide to ARM Cortex-M3/M4” Chapter 11
- System Calls (SVCall)
- How does the SVC instruction work?
- How do you extract the syscall number from the SVC instruction?
- How do you pass parameters and return values?
- What is the advantage of SVCall over direct function calls?
- “Operating Systems: Three Easy Pieces” Chapter 6
- Inter-Process Communication
- What are the fundamental IPC primitives (semaphores, mutexes, queues)?
- How do semaphores enable synchronization and mutual exclusion?
- What is priority inversion and how do you prevent it?
- How do message queues enable asynchronous communication?
- “Operating Systems: Three Easy Pieces” Chapters 27-31
- Process State and Scheduling
- What states can a task be in (Ready, Running, Blocked)?
- How does the scheduler choose which task to run next?
- What triggers a context switch (voluntary vs preemptive)?
- How do you handle task creation and termination?
- “Operating Systems: Three Easy Pieces” Chapters 7-8
Theoretical Foundation
Operating System Layer Cake
A minimal OS provides these layers of abstraction:
OS Architecture:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ User Space (Unprivileged) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Task 1 Task 2 Task 3 Shell │ │
│ │ [user] [user] [user] [user] │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ SVC (syscall) │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Kernel Space │ │
│ │ ┌─────────────────────────────────────────────────────┐│ │
│ │ │ Syscall Handler │ Scheduler │ IPC │ Memory Manager ││ │
│ │ └─────────────────────────────────────────────────────┘│ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Hardware │ │
│ │ MPU │ SysTick │ NVIC │ UART │ GPIO │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
ARM Privilege Modes
Cortex-M has two privilege levels:
Privilege Levels:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Privileged (Handler Mode) │
│ ───────────────────────── │
│ • Always in Handler mode during exceptions │
│ • Full access to all registers (including control regs) │
│ • Can access MPU, NVIC, SCB │
│ • Can execute privileged instructions │
│ • Kernel code runs here │
│ │
│ Unprivileged (Thread Mode) │
│ ───────────────────────── │
│ • Configured via CONTROL register │
│ • Limited register access │
│ • Cannot directly modify MPU, NVIC │
│ • Cannot execute privileged instructions │
│ • User tasks run here │
│ │
│ Transition: │
│ • Exception entry: Automatic to Privileged │
│ • Exception return: Can specify unprivileged via EXC_RETURN │
│ • CONTROL register: Can only be written in privileged mode │
│ │
└─────────────────────────────────────────────────────────────────┘
Memory Protection Unit (MPU)
The MPU enforces memory access permissions:
MPU Configuration:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ MPU Region Definition: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Region Number │ Base Address │ Size │ Permissions │ EN │ │
│ ├─────────────────────────────────────────────────────────┤ │
│ │ 0 │ 0x08000000 │ 32K │ RO+X │ 1 │ │ Kernel code
│ │ 1 │ 0x20000000 │ 8K │ RW │ 1 │ │ Kernel data
│ │ 2 │ 0x08008000 │ 64K │ RO+X (user) │ 1 │ │ User code
│ │ 3 │ 0x20002000 │ 4K │ RW (user) │ 1 │ │ Task 1 stack
│ │ 4 │ 0x20003000 │ 4K │ RW (user) │ 0 │ │ Task 2 stack
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Access Permissions (AP bits): │
│ • 000: No access │
│ • 001: Privileged RW only │
│ • 010: Privileged RW, User RO │
│ • 011: Full access (Privileged and User RW) │
│ │
│ On violation: │
│ • MemManage fault exception triggers │
│ • MMFSR indicates type of violation │
│ • Kernel can terminate task or log error │
│ │
└─────────────────────────────────────────────────────────────────┘
System Call Mechanism
System calls use the SVC instruction:
System Call Flow:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ User Code: │
│ ────────── │
│ int result = sys_write(fd, buf, len); │
│ │
│ Expands to: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ MOV R0, fd │ │
│ │ MOV R1, buf │ │
│ │ MOV R2, len │ │
│ │ SVC #1 ; Syscall number in immediate │ │
│ │ ; Return value in R0 │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Kernel (SVC Handler): │
│ ───────────────────── │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 1. Read stacked PC to find SVC instruction │ │
│ │ 2. Extract syscall number from immediate field │ │
│ │ 3. Read arguments from stacked R0, R1, R2 │ │
│ │ 4. Execute kernel function │ │
│ │ 5. Store return value in stacked R0 │ │
│ │ 6. Return from exception │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Task State Machine
Tasks transition between states:
Task States:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────────────────────────────────┐ │
│ │ │ │
│ ▼ │ │
│ ┌─────────┐ schedule ┌─────────┐ │ │
│ │ READY │───────────────▶│ RUNNING │ │ │
│ └─────────┘ └─────────┘ │ │
│ ▲ │ │ │
│ │ │ wait() │ preempt/ │
│ signal │ │ yield │
│ │ ▼ │ │
│ │ ┌─────────┐ │ │
│ └─────────────────────│ BLOCKED │ │ │
│ └─────────┘ │ │
│ │ │
│ ┌─────────┐ │ │
│ │ ZOMBIE │◀──────────────────────────┘ │
│ └─────────┘ exit() │
│ │
│ State Transitions: │
│ • READY → RUNNING: Scheduler selects this task │
│ • RUNNING → READY: Preemption (time slice expired) │
│ • RUNNING → BLOCKED: Task calls wait() on semaphore/queue │
│ • BLOCKED → READY: Resource becomes available (signal) │
│ • RUNNING → ZOMBIE: Task calls exit() or returns from main │
│ │
└─────────────────────────────────────────────────────────────────┘
Preemptive Scheduling
SysTick enables time-sliced scheduling:
Preemptive Scheduling:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Time ─────────────────────────────────────────────────────▶ │
│ │
│ SysTick: ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ │
│ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ │
│ │
│ Task 1: ████████ ████████ ████████ │
│ Task 2: ████████ ████████ ████████ │
│ Task 3: ████████ ████████ ████████ │
│ │
│ Each tick: │
│ 1. SysTick interrupt fires │
│ 2. Current task is preempted (context saved) │
│ 3. Scheduler selects next ready task │
│ 4. New task's context is restored │
│ 5. Return from interrupt to new task │
│ │
│ Time quantum: e.g., 10ms per task │
│ │
└─────────────────────────────────────────────────────────────────┘
Semaphores and Synchronization
Semaphores coordinate access to shared resources:
Semaphore Operation:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Semaphore Structure: │
│ ┌─────────────────────┐ │
│ │ count: int │ Current value (available resources) │
│ │ wait_queue: list │ Tasks waiting for this semaphore │
│ └─────────────────────┘ │
│ │
│ wait(sem): signal(sem): │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ disable interrupts │ │ disable interrupts │ │
│ │ if (sem.count > 0) │ │ sem.count++ │ │
│ │ sem.count-- │ │ if (wait_queue) │ │
│ │ return │ │ wake first task │ │
│ │ else │ │ enable interrupts │ │
│ │ add to wait_queue│ └────────────────────┘ │
│ │ block this task │ │
│ │ enable interrupts │ │
│ └────────────────────┘ │
│ │
│ Mutex = binary semaphore (count = 0 or 1) │
│ │
└─────────────────────────────────────────────────────────────────┘
Message Queues
Queues enable inter-task communication:
Message Queue:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Queue Structure: │
│ ┌─────────────────────────────────────────────┐ │
│ │ buffer[N]: messages │ │
│ │ head, tail: indices │ │
│ │ count: current messages │ │
│ │ sem_full: semaphore for full condition │ │
│ │ sem_empty: semaphore for empty condition │ │
│ └─────────────────────────────────────────────┘ │
│ │
│ send(queue, msg): receive(queue, &msg): │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ wait(sem_full) │ │ wait(sem_empty) │ │
│ │ buffer[tail]=msg │ │ msg=buffer[head] │ │
│ │ tail = (tail+1)%N│ │ head=(head+1)%N │ │
│ │ signal(sem_empty)│ │ signal(sem_full) │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
│ Producer ──[msg]──▶ Queue ──[msg]──▶ Consumer │
│ │
└─────────────────────────────────────────────────────────────────┘
Common Misconceptions
Misconception 1: “User mode can’t do anything useful” Reality: User mode can access its own memory and execute normal instructions. Only privileged operations (I/O, MPU config) are restricted.
Misconception 2: “Each task needs its own MPU configuration” Reality: You can share code regions and only change data/stack regions on context switch, or use a simpler model with fewer regions.
Misconception 3: “Preemption is complex” Reality: Preemption is just a context switch triggered by a timer interrupt instead of a voluntary yield.
Misconception 4: “Syscalls are slow” Reality: SVC is fast—just exception entry/exit overhead. The kernel function itself dominates latency.
Project Specification
What You Will Build
A complete minimal operating system with:
- Preemptive multitasking with up to 8 tasks
- User/kernel mode separation with MPU protection
- System call interface for kernel services
- Semaphores, mutexes, and message queues
- Interactive shell for process management
- Task creation, termination, and resource cleanup
Functional Requirements
- Task Management:
- Create tasks with specified stack size and priority
- Terminate tasks gracefully with resource cleanup
- Support at least 8 concurrent tasks
- Track task state (Ready, Running, Blocked, Zombie)
- Scheduling:
- Preemptive round-robin or priority-based scheduling
- Configurable time quantum (default 10ms)
- Idle task when no other tasks ready
- Yield syscall for voluntary context switch
- Memory Protection:
- Kernel code/data inaccessible from user tasks
- Each task has protected stack region
- MPU fault handler for violations
- Shared code region for user tasks
- System Calls:
sys_write(fd, buf, len)- Output to consolesys_read(fd, buf, len)- Input from consolesys_yield()- Voluntary context switchsys_exit(code)- Terminate current tasksys_spawn(entry, priority)- Create new tasksys_sleep(ms)- Sleep for duration- Semaphore operations (wait, signal, create)
- Message queue operations (send, receive, create)
- IPC Primitives:
- Binary semaphores (mutexes)
- Counting semaphores
- Message queues with configurable size
- Blocking and non-blocking variants
- Shell Interface:
ps- List all tasks with state and priorityexec <name>- Start a new taskkill <pid>- Terminate a taskmem- Show memory usagesem- Show semaphore statusmsg- Show message queue status
Non-Functional Requirements
- Footprint: Kernel < 16KB Flash, < 4KB RAM
- Latency: Syscall < 10us, context switch < 50us
- Reliability: No kernel panics on user misbehavior
- Determinism: Consistent scheduling behavior
Real World Outcome
When complete, your OS will work like this:
=== TinyOS v1.0 ===
Kernel: 8KB Flash, 2KB RAM
User space: 120KB Flash, 30KB RAM
Tasks: 8 max, priority 0-7
Boot sequence:
[OK] MPU configured: 8 regions
[OK] Kernel in privileged mode
[OK] User tasks in unprivileged mode
[OK] SysTick @ 1ms
[OK] Shell task started
TinyOS> ps
PID NAME PRI STATE STACK CPU%
1 idle 7 READY 128 85%
2 shell 2 RUNNING 512 2%
3 blinker 3 READY 256 1%
4 sensor 1 BLOCKED 256 12%
TinyOS> exec counter
Starting task 'counter' (PID 5)
TinyOS> kill 3
Task 'blinker' terminated
TinyOS> mem
Kernel heap: 1024/2048 bytes used
User heap: 4096/30720 bytes used
Task stacks: 1280 bytes total
Free memory: 25344 bytes
TinyOS> sem
SEM VALUE WAITERS
uart_tx 1 (none)
sensor_rdy 0 sensor(4)
data_mutex 1 (none)
TinyOS> msg
QUEUE SIZE PENDING
cmd_q 16 2 messages
data_q 64 0 messages
TinyOS> exec producer
Starting task 'producer' (PID 6)
TinyOS> exec consumer
Starting task 'consumer' (PID 7)
TinyOS> ps
PID NAME PRI STATE STACK CPU%
1 idle 7 READY 128 62%
2 shell 2 RUNNING 512 3%
4 sensor 1 READY 256 15%
5 counter 4 READY 256 5%
6 producer 3 BLOCKED 256 8%
7 consumer 3 READY 256 7%
# User task trying to access kernel memory:
TinyOS> exec bad_task
Starting task 'bad_task' (PID 8)
[FAULT] MPU violation at 0x20000010 by task 8
[FAULT] Terminating task 'bad_task'
TinyOS> ps
PID NAME PRI STATE STACK CPU%
1 idle 7 READY 128 62%
2 shell 2 RUNNING 512 3%
...
# Task 8 was terminated due to protection fault
Questions to Guide Your Design
Work through these questions BEFORE writing code:
-
Memory Layout: How do you partition Flash and RAM between kernel and user space?
-
Stack Allocation: Do you use static stacks or a stack allocator? How big should stacks be?
-
MPU Regions: How many regions do you need? How do you handle task switching with limited regions?
-
Syscall Design: How do you pass arguments? Return values? Errors?
-
Priority Inversion: What if a high-priority task waits on a semaphore held by a low-priority task?
-
Idle Task: What should the idle task do? (WFI instruction for power saving)
-
Task Exit: How do you clean up task resources? Who reclaims the stack?
-
Reentrant Kernel: Can one syscall be interrupted by another? Should it?
Thinking Exercise
Before writing any code, design these scenarios on paper:
Scenario 1: Task Creation
- Shell calls
sys_spawn("blinker", 3) - What kernel structures are created?
- How is the new task’s stack initialized?
- What MPU regions change (if any)?
- When does the new task first run?
Scenario 2: Mutex Deadlock
- Task A holds mutex M1, waiting for M2
- Task B holds mutex M2, waiting for M1
- How would you detect this?
- How would you prevent it?
Scenario 3: MPU Fault Handling
- User task reads from address 0x20000010 (kernel data)
- MPU triggers MemManage fault
- What information is available to the handler?
- How do you terminate just that task, not the whole system?
Solution Architecture
High-Level Design
┌─────────────────────────────────────────────────────────────────┐
│ TinyOS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ User Space │
│ ────────── │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Shell │ │ Blinker │ │ Sensor │ │ Idle │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ └────────────┴────────────┴────────────┘ │
│ │ │
│ SVC (syscall) │
│ │ │
│ ─────────────────────────┼───────────────────────────────── │
│ ▼ │
│ Kernel Space │
│ ──────────── │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Syscall │ Scheduler │ IPC │ Memory │ Shell │ │
│ │ Handler │ │ Manager │ Manager │ │ │
│ └──────┬──────────────┬───────────┬───────────┬──────┬────┘ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Hardware Abstraction │ │
│ │ SysTick │ MPU │ NVIC │ UART │ GPIO │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Memory Map
Memory Layout:
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Flash (128KB example): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 0x08000000 Vector Table (kernel) 256 bytes │ │
│ │ 0x08000100 Kernel Code ~8 KB │ │
│ │ 0x08002000 User Code (shared) ~118 KB │ │
│ │ 0x0801FFFF End of Flash │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ RAM (32KB example): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ 0x20000000 Kernel Data/BSS ~2 KB │ │
│ │ 0x20000800 Kernel Heap ~2 KB │ │
│ │ 0x20001000 User Heap ~4 KB │ │
│ │ 0x20002000 Task Stacks (8 x 1KB) ~8 KB │ │
│ │ 0x2000A000 Free ~16 KB │ │
│ │ 0x20007FFF End of RAM │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Data Structures
// Task Control Block
typedef struct TCB {
uint32_t *stack_ptr; // Current stack pointer (must be first!)
uint32_t *stack_base; // Bottom of stack
uint32_t stack_size; // Size of stack
uint8_t state; // READY, RUNNING, BLOCKED, ZOMBIE
uint8_t priority; // 0 (highest) to 7 (lowest)
uint8_t pid; // Process ID
char name[16]; // Task name
uint32_t sleep_until; // For sleep syscall
void *waiting_on; // Semaphore/queue we're blocked on
struct TCB *next; // Next in ready/wait queue
} TCB;
// Task states
#define TASK_READY 0
#define TASK_RUNNING 1
#define TASK_BLOCKED 2
#define TASK_ZOMBIE 3
// Semaphore
typedef struct {
volatile int32_t count;
TCB *wait_queue;
char name[16];
} Semaphore;
// Message Queue
typedef struct {
uint8_t *buffer;
uint32_t msg_size;
uint32_t capacity;
uint32_t count;
uint32_t head;
uint32_t tail;
Semaphore sem_full;
Semaphore sem_empty;
Semaphore sem_mutex;
char name[16];
} MessageQueue;
// Kernel state
typedef struct {
TCB *current_task;
TCB *ready_queue[8]; // One queue per priority
TCB tasks[MAX_TASKS];
uint8_t num_tasks;
uint32_t ticks; // System tick counter
Semaphore semaphores[MAX_SEMS];
MessageQueue queues[MAX_QUEUES];
} Kernel;
Syscall Table
// System call numbers
#define SYS_YIELD 0
#define SYS_EXIT 1
#define SYS_WRITE 2
#define SYS_READ 3
#define SYS_SPAWN 4
#define SYS_SLEEP 5
#define SYS_SEM_CREATE 10
#define SYS_SEM_WAIT 11
#define SYS_SEM_SIGNAL 12
#define SYS_QUEUE_CREATE 20
#define SYS_QUEUE_SEND 21
#define SYS_QUEUE_RECV 22
// Syscall wrapper (user space)
static inline int syscall(int num, int arg0, int arg1, int arg2) {
register int r0 __asm__("r0") = arg0;
register int r1 __asm__("r1") = arg1;
register int r2 __asm__("r2") = arg2;
register int ret __asm__("r0");
__asm__ volatile (
"svc %[num]"
: "=r" (ret)
: [num] "I" (num), "r" (r0), "r" (r1), "r" (r2)
: "memory"
);
return ret;
}
// User space wrappers
void sys_yield(void) { syscall(SYS_YIELD, 0, 0, 0); }
void sys_exit(int code) { syscall(SYS_EXIT, code, 0, 0); }
int sys_write(int fd, const void *buf, int len) {
return syscall(SYS_WRITE, fd, (int)buf, len);
}
Implementation Guide
Development Environment Setup
# Required tools
sudo apt-get install gcc-arm-none-eabi gdb-multiarch openocd
# Create project structure
mkdir -p tinyos/{kernel,user,inc}
cd tinyos
Project Structure
tinyos/
├── kernel/
│ ├── main.c # Kernel entry point
│ ├── scheduler.c # Task scheduling
│ ├── syscall.c # Syscall handler
│ ├── ipc.c # Semaphores, queues
│ ├── mpu.c # MPU configuration
│ ├── fault.c # Fault handlers
│ └── startup.s # Vector table, context switch
├── user/
│ ├── shell.c # Shell task
│ ├── syscall_wrappers.c # Syscall user-side
│ ├── blinker.c # Example task
│ └── user_entry.c # User task entry points
├── inc/
│ ├── kernel.h # Kernel interfaces
│ ├── syscall.h # Syscall numbers
│ ├── ipc.h # IPC structures
│ └── stm32f4xx.h # Hardware definitions
├── linker_kernel.ld # Kernel linker script
├── linker_user.ld # User linker script (optional)
└── Makefile
Implementation Phases
Phase 1: Basic Kernel (Days 1-7)
Goals:
- Boot into kernel mode
- Run a single task in unprivileged mode
- Implement syscall mechanism
Tasks:
- Set up kernel startup code (stack, BSS clear)
- Create first user task structure
- Switch to unprivileged mode
- Implement SVC handler skeleton
- Implement sys_yield (no-op at first)
- Test: User task runs and can call syscall
Checkpoint: User task runs and successfully calls sys_write.
// Kernel main
void kernel_main(void) {
// Initialize kernel structures
kernel_init();
// Create shell task
task_create(shell_main, "shell", 2, 512);
// Create idle task
task_create(idle_main, "idle", 7, 128);
// Start first task
scheduler_start();
// Never returns
}
// Switch to unprivileged mode (assembly)
// startup.s:
.global start_first_task
.thumb_func
start_first_task:
// Load task's stack pointer
ldr r0, =current_task
ldr r0, [r0]
ldr r0, [r0] // r0 = stack_ptr
// Restore context from stack
ldmia r0!, {r4-r11} // Restore R4-R11
msr psp, r0 // Set PSP to task's stack
// Switch to unprivileged mode using PSP
mov r0, #3 // CONTROL: PSP, unprivileged
msr control, r0
isb
// "Return" to user task
pop {r0-r3, r12, lr}
pop {pc}
Phase 2: Preemptive Scheduling (Days 8-14)
Goals:
- Multiple tasks running
- SysTick-driven preemption
- Round-robin scheduling
Tasks:
- Implement ready queue
- Configure SysTick for 1ms ticks
- Implement PendSV for context switching
- Add multiple test tasks
- Implement sys_yield properly
- Test: Tasks take turns running
Checkpoint: Multiple tasks run in round-robin fashion.
// SysTick handler
void SysTick_Handler(void) {
kernel.ticks++;
// Check for sleeping tasks to wake
wake_sleeping_tasks();
// Trigger context switch
SCB->ICSR |= SCB_ICSR_PENDSVSET_Msk;
}
// PendSV handler (context switch)
// In assembly for precise control
.global PendSV_Handler
.thumb_func
PendSV_Handler:
// Save current context
mrs r0, psp
stmdb r0!, {r4-r11} // Push R4-R11
// Save stack pointer to current TCB
ldr r1, =current_task
ldr r1, [r1]
str r0, [r1] // Save SP to TCB
// Call scheduler to get next task
push {lr}
bl scheduler_next_task
pop {lr}
// Load new task's stack pointer
ldr r1, =current_task
ldr r1, [r1]
ldr r0, [r1]
// Restore context
ldmia r0!, {r4-r11} // Pop R4-R11
msr psp, r0
// Return to new task
bx lr
// Scheduler
TCB *scheduler_next_task(void) {
// Find highest priority ready task
for (int p = 0; p < 8; p++) {
if (kernel.ready_queue[p]) {
TCB *task = kernel.ready_queue[p];
kernel.ready_queue[p] = task->next;
task->state = TASK_RUNNING;
kernel.current_task = task;
return task;
}
}
// Should never reach here (idle task is always ready)
return NULL;
}
Phase 3: MPU Protection (Days 15-21)
Goals:
- Configure MPU regions
- Kernel memory protected
- Fault on violations
Tasks:
- Define memory regions
- Implement MPU configuration
- Set up MemManage fault handler
- Test protection with bad accesses
- Terminate violating task
Checkpoint: User task that touches kernel memory is terminated, system continues.
// MPU configuration
void mpu_init(void) {
// Disable MPU during configuration
MPU->CTRL = 0;
// Region 0: Kernel code (read-only, privileged only)
MPU->RNR = 0;
MPU->RBAR = 0x08000000 | MPU_RBAR_VALID_Msk | 0;
MPU->RASR = MPU_RASR_ENABLE_Msk |
(14 << MPU_RASR_SIZE_Pos) | // 32KB
(5 << MPU_RASR_AP_Pos); // RO, privileged only
// Region 1: Kernel data (read-write, privileged only)
MPU->RNR = 1;
MPU->RBAR = 0x20000000 | MPU_RBAR_VALID_Msk | 1;
MPU->RASR = MPU_RASR_ENABLE_Msk |
(12 << MPU_RASR_SIZE_Pos) | // 8KB
(1 << MPU_RASR_AP_Pos) | // RW, privileged only
MPU_RASR_XN_Msk; // No execute
// Region 2: User code (read-only, user accessible)
MPU->RNR = 2;
MPU->RBAR = 0x08008000 | MPU_RBAR_VALID_Msk | 2;
MPU->RASR = MPU_RASR_ENABLE_Msk |
(16 << MPU_RASR_SIZE_Pos) | // 128KB
(6 << MPU_RASR_AP_Pos); // RO, user accessible
// Region 3-7: Task stacks (configured per task)
// Enable MPU with default memory map for privileged
MPU->CTRL = MPU_CTRL_ENABLE_Msk |
MPU_CTRL_PRIVDEFENA_Msk;
__DSB();
__ISB();
}
// Configure stack region for task
void mpu_configure_task_stack(TCB *task, int region) {
MPU->RNR = region;
MPU->RBAR = (uint32_t)task->stack_base | MPU_RBAR_VALID_Msk | region;
MPU->RASR = MPU_RASR_ENABLE_Msk |
(9 << MPU_RASR_SIZE_Pos) | // 1KB (adjust as needed)
(3 << MPU_RASR_AP_Pos) | // RW, full access
MPU_RASR_XN_Msk; // No execute
__DSB();
__ISB();
}
// MemManage fault handler
void MemManage_Handler(void) {
uint32_t mmfar = SCB->MMFAR;
uint32_t cfsr = SCB->CFSR;
printf("[FAULT] MPU violation at 0x%08X by task %d\n",
mmfar, kernel.current_task->pid);
printf("[FAULT] CFSR: 0x%08X\n", cfsr);
printf("[FAULT] Terminating task '%s'\n", kernel.current_task->name);
// Terminate the offending task
task_terminate(kernel.current_task);
// Clear fault status
SCB->CFSR = cfsr;
// Trigger reschedule
SCB->ICSR |= SCB_ICSR_PENDSVSET_Msk;
}
Phase 4: IPC (Days 22-28)
Goals:
- Implement semaphores
- Implement message queues
- Task blocking on IPC
Tasks:
- Implement semaphore create/wait/signal
- Implement blocking (move to wait queue)
- Implement waking (move to ready queue)
- Implement message queue
- Test producer/consumer scenario
Checkpoint: Producer/consumer with message queue works correctly.
// Semaphore wait
int sem_wait(Semaphore *sem) {
// Must be atomic
__disable_irq();
if (sem->count > 0) {
sem->count--;
__enable_irq();
return 0;
}
// Block current task
TCB *task = kernel.current_task;
task->state = TASK_BLOCKED;
task->waiting_on = sem;
// Add to wait queue
task->next = sem->wait_queue;
sem->wait_queue = task;
__enable_irq();
// Trigger context switch
SCB->ICSR |= SCB_ICSR_PENDSVSET_Msk;
__ISB();
// When we get here, we've been woken up
return 0;
}
// Semaphore signal
int sem_signal(Semaphore *sem) {
__disable_irq();
if (sem->wait_queue) {
// Wake first waiting task
TCB *task = sem->wait_queue;
sem->wait_queue = task->next;
task->state = TASK_READY;
task->waiting_on = NULL;
// Add to ready queue
add_to_ready_queue(task);
} else {
sem->count++;
}
__enable_irq();
return 0;
}
// Message queue send
int queue_send(MessageQueue *queue, void *msg) {
sem_wait(&queue->sem_full); // Wait if queue is full
sem_wait(&queue->sem_mutex); // Lock queue
// Copy message to buffer
memcpy(queue->buffer + (queue->tail * queue->msg_size),
msg, queue->msg_size);
queue->tail = (queue->tail + 1) % queue->capacity;
queue->count++;
sem_signal(&queue->sem_mutex); // Unlock queue
sem_signal(&queue->sem_empty); // Signal message available
return 0;
}
Phase 5: Shell and Polish (Days 29-35+)
Goals:
- Interactive shell
- Complete syscall set
- Thorough testing
Tasks:
- Implement shell task with command parsing
- Implement ps, kill, exec commands
- Add memory stats command
- Test all edge cases
- Optimize and clean up
Checkpoint: Full OS with shell managing tasks.
Hints in Layers
Hint 1: Getting Started
Start without MPU:
- Run two tasks in round-robin
- Get context switching working
- Then add MPU protection
- Debug kernel first, protect later
Hint 2: Context Switch Debugging
Common issues:
- Missing R4-R11 in save/restore (they’re not auto-saved)
- Wrong stack pointer (PSP vs MSP)
- EXC_RETURN value wrong (should be 0xFFFFFFFD for PSP/Thread)
- Stack misalignment (must be 8-byte aligned)
Hint 3: MPU Size Encoding
MPU region size is encoded as power of 2:
- Size = 2^(RASR.SIZE+1)
- SIZE=4 → 32 bytes
- SIZE=9 → 1KB
- SIZE=14 → 32KB
- Minimum size is 32 bytes (SIZE=4)
Hint 4: Syscall Number Extraction
The SVC number is in the instruction:
void SVC_Handler_C(uint32_t *stack) {
uint32_t pc = stack[6]; // Return address
uint8_t svc_num = ((uint8_t *)pc)[-2]; // SVC number
}
Hint 5: Blocking Correctly
When a task blocks:
- Set state to BLOCKED
- Remove from ready queue
- Add to wait queue
- Call scheduler to run next task
- When signaled: state to READY, add to ready queue
Hint 6: Priority Inversion
If high-priority task waits on semaphore held by low-priority task:
- Low task never runs (high is spinning)
- Solution: Priority inheritance
- Temporarily boost low task’s priority
Testing Strategy
Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test kernel primitives | Semaphore wait/signal |
| Scheduling | Test task switching | Multiple tasks, priorities |
| Protection | Test MPU | Bad access terminated |
| IPC | Test communication | Producer/consumer |
| Stress | Test limits | Many tasks, long runs |
Critical Test Cases
- Basic Scheduling: Three tasks run round-robin
- Priority: High priority preempts low
- Semaphore: Producer/consumer synchronized
- Message Queue: Messages delivered in order
- MPU Violation: Bad access terminates task
- Task Exit: Resources cleaned up
- Long Run: 1 hour with no crashes
- Stress: 8 tasks all active
Test Tasks
// Producer task
void producer_main(void) {
int count = 0;
while (1) {
Message msg = { .value = count++ };
queue_send(&data_queue, &msg);
sys_sleep(100);
}
}
// Consumer task
void consumer_main(void) {
while (1) {
Message msg;
queue_receive(&data_queue, &msg);
printf("Received: %d\n", msg.value);
}
}
// Bad task (for MPU testing)
void bad_task_main(void) {
volatile int *kernel_data = (int *)0x20000010;
*kernel_data = 0xDEADBEEF; // Should fault!
}
Common Pitfalls & Debugging
Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| PSP not set | Crash on first task | Set PSP before unprivileged |
| R4-R11 lost | Corrupted data after switch | Save/restore in PendSV |
| MPU order | Wrong region takes priority | Higher number = higher priority |
| Interrupt in syscall | Corruption | Disable IRQ in critical sections |
| Stack overflow | Random crashes | Check stack pointer, add guard |
| Deadlock | System hangs | Analyze wait graphs |
Debugging Strategies
- LED Debugging: Toggle LED at key points
- UART Logging: Printf in kernel (not user)
- Fault Analysis: Dump all status registers
- GDB Remote: Use your Project 14 stub!
- Statistics: Track context switches, syscalls
Kernel Panic Handler
void kernel_panic(const char *msg) {
__disable_irq();
printf("\n\n=== KERNEL PANIC ===\n");
printf("Message: %s\n", msg);
printf("Current task: %s (PID %d)\n",
kernel.current_task->name, kernel.current_task->pid);
// Dump registers, stack, etc.
dump_system_state();
// Halt
while (1) {
// Blink LED rapidly
toggle_led();
delay_ms(100);
}
}
Extensions & Challenges
Beginner Extensions
- More syscalls: getpid, nice, time
- Task arguments: Pass arguments to new tasks
- Exit codes: Report exit codes to parent
- Statistics: CPU usage per task
Intermediate Extensions
- Priority inheritance: Solve priority inversion
- Signals: SIGTERM, SIGKILL
- File system: Simple in-memory FS
- Dynamic memory: malloc/free for user tasks
- Drivers: Abstract driver interface
Advanced Extensions
- Virtual memory: Simplified paging
- Networking: TCP/IP stack
- USB support: USB device stack
- Multi-core: SMP scheduling
- Security: Capabilities, namespaces
Self-Assessment Checklist
Understanding
- I can explain ARM privilege levels and mode switching
- I understand how the MPU enforces memory protection
- I can describe the syscall flow from user to kernel
- I understand preemptive vs cooperative scheduling
- I can explain semaphore and queue operations
- I know how priority inversion occurs and how to prevent it
Implementation
- Tasks run in unprivileged mode
- Context switch works correctly
- MPU protects kernel from user
- Syscalls work for all implemented calls
- Semaphores block and wake correctly
- Message queues work with producer/consumer
- Shell can manage tasks
Quality
- System runs for hours without crashes
- Bad tasks don’t crash the kernel
- Kernel code is well-organized
- Memory usage is reasonable
The Interview Questions They’ll Ask
After completing this project, you’ll be ready for these questions:
- “How does an OS provide process isolation?”
- Hardware support: MMU/MPU restricts memory access
- Privilege levels: User code can’t execute privileged instructions
- Syscalls: Only way to request kernel services
- Each process has own address space
- “Explain how context switching works”
- Save current task’s registers to its stack
- Update current TCB with new stack pointer
- Select next task to run (scheduler)
- Load new task’s registers from its stack
- Return to new task
- “What is a deadlock and how do you prevent it?”
- Deadlock: Circular wait for resources
- Prevention: Lock ordering, timeout, detection
- Avoidance: Banker’s algorithm
- Recovery: Kill one process
- “How do syscalls work?”
- User executes SVC instruction with syscall number
- CPU switches to privileged mode, jumps to handler
- Kernel extracts number and arguments from stack
- Kernel performs operation, stores return value
- Returns to user mode
- “What is priority inversion and how do you solve it?”
- High-priority task waits on resource held by low-priority
- Medium-priority task runs, blocking low from releasing
- Solution: Priority inheritance (boost low temporarily)
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| OS fundamentals | “Operating Systems: Three Easy Pieces” | All chapters |
| ARM privilege modes | “Definitive Guide to ARM Cortex-M3/M4” | Chapter 3 |
| MPU | “Definitive Guide to ARM Cortex-M3/M4” | Chapter 11 |
| Scheduling | “Operating Systems: Three Easy Pieces” | Chapters 7-10 |
| Synchronization | “Operating Systems: Three Easy Pieces” | Chapters 26-31 |
| Embedded OS design | “Making Embedded Systems” | Chapters 9-10 |
Resources
Documentation
- ARM Cortex-M Technical Reference Manual
- ARM MPU documentation
- FreeRTOS source code (reference implementation)
Example Projects
- FreeRTOS
- Zephyr RTOS
- RIOT OS
- xv6 - Teaching OS
Related Projects in This Series
- Previous: Project 14: GDB Stub
- Next: Project 16: Game Console Capstone
- Prerequisites: Project 7: Context Switcher, Project 9: Exception Handler
This guide was expanded from LEARN_ARM_DEEP_DIVE.md. For the complete learning path, see the project index.