Project 9: Memory Pool and Stack Safety

Build deterministic memory pools and stack safety checks to eliminate heap nondeterminism and detect stack overflow early.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	1-2 weeks
Main Programming Language	C
Alternative Programming Languages	Rust, Ada
Coolness Level	High
Business Potential	High
Prerequisites	Projects 1-8, task stacks, scheduler internals
Key Topics	Memory pools, deterministic allocation, stack watermarking, overflow

1. Learning Objectives

By completing this project, you will:

Implement a fixed-size memory pool allocator.
Measure stack high-water marks for each task.
Detect and handle stack overflow safely.
Eliminate dynamic heap usage from kernel code.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Fixed-Size Memory Pools and Deterministic Allocation

Fundamentals

A memory pool is a preallocated set of fixed-size blocks. Allocations and frees are constant time because they only manipulate a free list. This makes memory usage deterministic and avoids fragmentation, which is essential in real-time systems. Unlike malloc, a pool guarantees bounded allocation time and predictable memory use.

Additional fundamentals: Deterministic allocation is one of the biggest differences between desktop software and RTOS firmware. A memory pool turns allocation into a predictable operation with bounded time and no fragmentation. This makes timing analysis possible, which is crucial for real-time guarantees. When memory costs are explicit, you can justify pool sizes and prove that worst-case memory use is safe.

Deep Dive into the concept

Dynamic memory allocation with malloc is unpredictable in real-time systems because it can fragment memory and take unbounded time. Memory pools avoid this by dividing a fixed region into equal-sized blocks and managing them with a free list. Allocation removes a block from the list; free returns it. The time for these operations is O(1). The downside is that all blocks are the same size, so you may waste some memory for small allocations. A common approach is to use multiple pools for different sizes (e.g., 32, 64, 128 bytes), which balances determinism and efficiency.

Pool design also includes safety checks. You should guard against double frees and invalid pointers. This can be done by embedding a magic value in each block or by maintaining a bitmap of allocated blocks. For a learning project, a simple free-list with an optional allocation flag is enough. Another concern is alignment: blocks should be aligned to at least 4 or 8 bytes for safe access. The pool can enforce alignment by defining the block size as a multiple of the alignment.

Pools are often used for RTOS objects like queues, semaphores, and message buffers because they provide predictable allocation time. In this project, you will use pools for IPC messages or for task stacks. You must decide what each pool is used for and document the limits. If a pool is exhausted, you should return NULL or signal an error; blocking is not usually acceptable because it could cause a deadlock.

In practice, pools are initialized at boot and never resized. This fits the static nature of many embedded systems. You can verify pool usage by tracking allocation counts and high-water marks. This provides insight into how much memory you actually need and helps avoid over-provisioning.

Additional depth: Deterministic allocation also depends on predictable failure behavior. When a pool is exhausted, you must decide whether to return NULL, assert, or block. In real-time systems, blocking can cause deadlocks if the blocked task holds other resources. Returning NULL forces the caller to handle allocation failure explicitly, which is often the safest approach. You should document this behavior clearly and provide simple error reporting (e.g., a counter or log). This makes failures visible and helps you tune pool sizes.

Another design choice is pool placement in memory. Some MCUs have multiple RAM regions with different performance characteristics. For example, CCM RAM is fast but not accessible by DMA. If you allocate DMA buffers from the wrong pool, peripheral transfers will fail. It can be useful to have separate pools for DMA-capable and non-DMA memory. Even if you do not implement this now, mention it as a future extension to show awareness of hardware constraints.

Memory pools can also be used for kernel objects like TCBs, semaphores, and queues, which avoids dynamic allocation during runtime. A common RTOS design pattern is to create all tasks and objects at boot and then run without any further allocation. This eliminates runtime allocation failures entirely and simplifies worst-case timing analysis. Your pool API should support this usage by making initialization explicit and by providing clear statistics on pool usage.

Finally, pool debugging is important. If you accidentally free a pointer that is not from the pool, you can corrupt the free list and crash later. Adding a simple range check (verify pointer within pool range) and alignment check can catch these errors early. This adds a few cycles but pays off in robustness.

How this fit on projects

Memory pools are implemented in Sec. 3.1 and used to allocate IPC buffers or task control structures. They are also referenced when measuring stack safety in Sec. 3.7.

Definitions & key terms

Memory pool: Fixed-size block allocator.
Free list: Linked list of available blocks.
Fragmentation: Wasted memory due to variable-size allocations.
Determinism: Bounded and predictable timing behavior.

Mental model diagram (ASCII)

[Pool]
[Block]->[Block]->[Block] (free list)

How it works (step-by-step, with invariants and failure modes)

Pool initialized with N blocks.
Allocation pops a block from free list.
Free pushes block back to free list.
Failure mode: double free corrupts list.

Minimal concrete example

typedef struct block {
    struct block *next;
} block_t;

void *pool_alloc(pool_t *p) {
    if (!p->free) return NULL;
    block_t *b = p->free;
    p->free = b->next;
    return b;
}

Common misconceptions

“malloc is fine for RTOS” -> it can be unpredictable.
“Pools waste too much memory” -> multiple pool sizes reduce waste.
“Pool exhaustion should block” -> it can cause deadlocks.

Check-your-understanding questions

Why is pool allocation deterministic?
What is the main tradeoff of fixed-size blocks?
How would you detect double free?

Check-your-understanding answers

It only manipulates a free list, O(1).
Potential internal fragmentation due to fixed size.
Use allocation flags or a magic value per block.

Real-world applications

RTOS object allocation.
Packet buffers in networking stacks.

Where you’ll apply it

This project: Sec. 3.1, Sec. 3.7.2, Sec. 5.10 Phase 1.
Also used in: Project 7 and Project 8.

References

“Real-Time Concepts for Embedded Systems” by Qing Li, Ch. 13.
Memory pool designs in embedded RTOS documentation.

Key insights

Fixed-size pools trade flexibility for predictable timing and safety.

Summary

Memory pools provide constant-time allocation and eliminate fragmentation, which is essential for real-time determinism.

Homework/Exercises to practice the concept

Implement a pool with 32-byte blocks and test allocation/free.
Add an allocation counter and track high-water usage.
Simulate pool exhaustion and verify error handling.

Solutions to the homework/exercises

Allocate and free blocks in a loop; ensure no corruption.
Track max in-use blocks and print over UART.
Return NULL and log an error when pool is empty.

2.2 Stack Watermarking and Overflow Detection

Fundamentals

Each task stack must be large enough to handle its worst-case call depth. Stack overflows corrupt memory and cause unpredictable failures. Stack watermarking is a technique where you fill a stack with a known pattern at initialization and later scan it to find how much was used. Overflow detection can be implemented with guard values or MPU regions to catch corruption early.

Additional fundamentals: Stack safety is a reliability issue, not just a sizing problem. If a stack overflows, it corrupts memory silently and can lead to intermittent failures that are extremely hard to debug. Watermarks and canaries give you concrete signals that something went wrong.

Deep Dive into the concept

The stack is the most common source of RTOS faults because it is invisible during normal operation. Each task must be given a stack size large enough for its deepest call chain, local variables, and interrupt nesting (if interrupts use the same stack). Watermarking is a practical technique: you fill the entire stack region with a known pattern like 0xA5A5A5A5 at boot. Later, you scan from the base until the pattern changes; the distance gives you the maximum stack usage. This provides empirical evidence for sizing and can be logged periodically.

Overflow detection can be added by placing a guard word (canary) at the end of the stack. If the canary is overwritten, you know the stack overflowed. Some MCUs offer a Memory Protection Unit (MPU) that can create a guard region; any access to that region triggers a fault. This is more robust but also more complex to configure. For this project, a canary-based check is acceptable. You should decide where to place the canary: at the bottom of the stack (lowest address) if the stack grows downward. You should check it during context switches or on each tick.

Stack overflow can occur in ISRs as well. If you use MSP for interrupts and PSP for tasks, ISR stack overflow is less likely to corrupt task stacks, but you still need to size MSP correctly. Many RTOSes keep the kernel and interrupts on MSP and tasks on PSP. This is a good practice and can be added as an extension.

Another important concept is stack usage under worst-case conditions. A task may call a rarely-used function with large local buffers. This can cause a sudden stack spike that only occurs under specific conditions. Watermarking helps catch these cases, but it only records maximum usage after the fact. If you want proactive detection, you can check stack usage each context switch and trigger a safe fault if it exceeds a threshold. This prevents silent corruption and provides a clear failure mode.

Additional depth: Stack watermarking can be combined with periodic checks to detect rising usage trends. For example, you can scan stack usage every second and log the maximum. If you see the watermark creeping up over time, it may indicate a memory leak, recursion, or an unexpected code path. This can be invaluable for long-running systems where a rare execution path causes a stack spike after hours or days.

Overflow detection should also consider which stack is being checked. If you separate MSP and PSP, you should check both. Many systems forget to monitor the main stack (MSP), which can overflow during nested interrupts or during early boot. This can lead to obscure crashes that are hard to reproduce. A robust approach is to assign a known size to MSP, fill it with a pattern, and periodically check its canary as well.

Another subtlety is that stack canaries only tell you that an overflow has already happened; they do not prevent it. If you want proactive prevention, you can set a threshold and trigger a warning before the canary is overwritten. For example, if the watermark exceeds 80% of the stack, you can log a warning or reduce load. This is particularly useful in systems that must run for long periods without reset.

Finally, remember that stack usage is affected by compiler optimizations and debug settings. Debug builds often use more stack due to less optimization and additional debug metadata. If you size stacks based on debug builds, you may over-allocate. If you size based on release builds, you might miss worst-case debug usage. The best approach is to test both and document the differences, especially if you plan to ship a release build.

How this fit on projects

Stack watermarking is implemented in Sec. 3.1 and validated in Sec. 3.7.2. It also informs context switching design in Project 3.

Definitions & key terms

Watermark: Maximum observed stack usage.
Canary: Guard value to detect overflow.
MPU: Memory Protection Unit for guarding regions.
Stack growth direction: Downward on Cortex-M.

Mental model diagram (ASCII)

High addr
[stack top] <- SP starts here
[used region]
[unused 0xA5 pattern]
[canary]
Low addr

How it works (step-by-step, with invariants and failure modes)

Fill stack with pattern and set canary.
Task runs; stack overwrites pattern.
Periodically scan for watermark.
Check canary on context switch.
Failure mode: canary overwritten -> overflow detected.

Minimal concrete example

void stack_init(uint32_t *base, size_t words) {
    for (size_t i = 0; i < words; i++) base[i] = 0xA5A5A5A5;
    base[0] = 0xDEADBEEF; // canary
}

Common misconceptions

“Stack size is obvious” -> it often is not.
“Overflow will crash immediately” -> it can silently corrupt data.
“Watermarks are too expensive” -> scanning can be infrequent.

Check-your-understanding questions

Why is stack overflow dangerous in RTOS tasks?
What does a watermark tell you that static analysis cannot?
How does a canary detect overflow?

Check-your-understanding answers

It overwrites adjacent memory and corrupts system state.
It measures real usage under actual execution paths.
If the canary value changes, the stack exceeded its boundary.

Real-world applications

Safety-critical systems that require stack usage validation.
Long-running devices where stack corruption is catastrophic.

Where you’ll apply it

This project: Sec. 3.1, Sec. 3.7.2, Sec. 5.10 Phase 2.
Also used in: Project 3.

References

ARM application notes on stack usage.
“Making Embedded Systems” by Elecia White, Ch. 11.

Key insights

Stack safety is non-negotiable in RTOS design; detection mechanisms are essential.

Summary

Watermarking and canaries provide practical, low-cost defenses against stack overflow.

Homework/Exercises to practice the concept

Implement stack watermark scanning and log the result.
Add a canary check on each context switch.
Deliberately overflow a task stack and confirm detection.

Solutions to the homework/exercises

Compare watermark before and after task workload.
If canary corrupted, trigger safe fault handler.
Allocate a large local array to force overflow and verify detection.

3. Project Specification

3.1 What You Will Build

A fixed-size memory pool allocator and a stack safety subsystem that tracks high-water marks and detects overflow conditions for each task.

3.2 Functional Requirements

Memory Pool: Constant-time allocate/free for fixed-size blocks.
Stack Watermark: Measure maximum stack usage per task.
Overflow Detection: Detect and handle stack overflow safely.

3.3 Non-Functional Requirements

Performance: Pool alloc/free < 10 us.
Reliability: Overflow detection triggers consistently.
Usability: Stack usage reported over UART.

3.4 Example Usage / Output

[stack] task=control used=312 bytes
[pool] blocks_in_use=6 max=8

3.5 Data Formats / Schemas / Protocols

Pool struct: {free_list, block_size, total_blocks}.

3.6 Edge Cases

Double free of pool block.
Pool exhaustion.
Stack overflow during ISR.

3.7 Real World Outcome

Memory allocation is deterministic, and stack safety checks prevent silent corruption, making the RTOS robust for long-running deployments.

3.7.1 How to Run (Copy/Paste)

make clean all
make flash

Exit codes:

make: 0 success, 2 build failure.
openocd: 0 flash success, 1 connection failure.

3.7.2 Golden Path Demo (Deterministic)

Allocate and free pool blocks in a loop.
Log maximum blocks in use and stack watermarks.
Verify no errors and stable outputs.

3.7.3 Failure Demo (Deterministic)

Deliberately overflow a task stack with large local array.
Canary check triggers a fault handler.
Restore safe stack size and verify normal operation.

4. Solution Architecture

4.1 High-Level Design

Pool alloc/free -> deterministic memory
Stack init -> watermark + canary -> overflow detection

4.2 Key Components

Component	Responsibility	Key Decisions
`pool.c`	Fixed-size allocator	Block size and count
`stack.c`	Watermark scanning and canary checks	Scan frequency
`fault.c`	Safe overflow handling	Reset vs safe halt

4.4 Data Structures (No Full Code)

typedef struct {
    block_t *free;
    uint16_t block_size;
    uint16_t total_blocks;
} pool_t;

4.4 Algorithm Overview

Key Algorithm: Watermark Scan

Scan stack from base until pattern changes.
Compute used bytes = total - unused.
Update high-water mark.

Complexity Analysis:

Time: O(n) per scan.
Space: O(1).

5. Implementation Guide

5.1 Development Environment Setup

brew install arm-none-eabi-gcc openocd

5.2 Project Structure

rtos-p09/
|-- src/
|   |-- pool.c
|   |-- stack.c
|   `-- main.c
|-- include/
`-- Makefile

5.3 The Core Question You’re Answering

“How do I make memory usage deterministic and detect stack overflows before they corrupt my system?”

5.4 Concepts You Must Understand First

Fixed-size memory pools.
Stack watermarking and overflow detection.

5.5 Questions to Guide Your Design

What block sizes should the pool support?
How often should you scan stacks for watermarks?
What should the system do when overflow is detected?

5.6 Thinking Exercise

Estimate stack depth for a task with three nested function calls and 64-byte locals each.

5.7 The Interview Questions They’ll Ask

“Why avoid malloc in RTOS kernels?”
“What is a stack high-water mark?”
“How would you detect stack overflow?”

5.8 Hints in Layers

Hint 1: Start with one pool Add more sizes later if needed.

Hint 2: Fill with pattern Watermarking depends on a known fill value.

Hint 3: Canary check Check a single value for quick overflow detection.

5.9 Books That Will Help

Topic	Book	Chapter
Memory pools	“Real-Time Concepts for Embedded Systems”	Ch. 13
Stack usage	“Making Embedded Systems”	Ch. 11

5.10 Implementation Phases

Phase 1: Memory Pool (3-4 days)

Goals: Implement allocator and tests. Tasks: init pool, alloc/free, error handling. Checkpoint: Allocation stable under stress.

Phase 2: Stack Watermarking (3-4 days)

Goals: Measure stack usage. Tasks: fill pattern, scan, report. Checkpoint: Watermark output matches expected usage.

Phase 3: Overflow Detection (2-3 days)

Goals: Detect overflow and fail safely. Tasks: canary check, fault handler. Checkpoint: Overflow triggers safe fault.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Pool block size	Single vs multiple	Single	Simpler for first implementation
Overflow response	Reset vs safe halt	Safe halt	Preserve debugging context

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Pool alloc/free repeated without corruption.
Stack watermark reported correctly.
Canary check triggers on overflow.

6.3 Test Data

Pattern: 0xA5A5A5A5
Canary: 0xDEADBEEF

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Log pool usage and stack watermarks periodically.
Use GDB to inspect pool free list and stack contents.

7.3 Performance Traps

Scanning stacks too frequently can add overhead; scan periodically.

8. Extensions & Challenges

8.1 Beginner Extensions

Add multiple pool sizes for different allocations.
Add pool usage statistics.

8.2 Intermediate Extensions

Use MPU to create guard regions.
Add stack overflow hook for diagnostics.

8.3 Advanced Extensions

Implement a small slab allocator for variable sizes.
Add compile-time stack usage analysis.

9. Real-World Connections

9.1 Industry Applications

Safety-critical RTOS kernels with deterministic allocation.
Long-running devices requiring robust stack safety.

FreeRTOS: Heap alternatives and stack checks.
Zephyr: Memory slabs and stack sentinel.

9.3 Interview Relevance

Memory determinism and stack safety are common embedded interview topics.

10. Resources

10.1 Essential Reading

“Real-Time Concepts for Embedded Systems” by Qing Li.
“Making Embedded Systems” by Elecia White.

10.2 Video Resources

Embedded memory management tutorials.

10.3 Tools & Documentation

GDB for inspecting memory and stack usage.

Project 10

11. Self-Assessment Checklist

11.1 Understanding

I can explain why pools are deterministic.
I understand how stack watermarks work.

11.2 Implementation

Pool allocator works under stress.
Stack overflow detection triggers correctly.

11.3 Growth

I can justify stack sizing decisions.

12. Submission / Completion Criteria

Minimum Viable Completion:

Fixed-size pool allocator implemented.
Stack watermark reporting working.

Full Completion:

Overflow detection triggers safe handler.
Pool usage statistics logged.

Excellence (Going Above & Beyond):

MPU guard regions configured.
Multi-size pool allocator added.