Project 9: Exception Handler & Fault Analyzer
Build a comprehensive fault handler that catches all ARM exceptions (HardFault, MemManage, BusFault, UsageFault), decodes the fault cause, prints a detailed stack trace, and optionally recovers or reboots.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4 - Expert |
| Time Estimate | 1-2 weeks |
| Language | C with ARM Assembly (primary), Rust (alternative) |
| Prerequisites | Projects 3-4 (bare-metal, UART), strong understanding of stack and calling conventions |
| Key Topics | Exception Model, Fault Registers, Stack Unwinding, CFSR, HFSR, MMFAR, BFAR |
1. Learning Objectives
After completing this project, you will:
- Understand the ARM Cortex-M exception model and vector table
- Master the structure of stacked exception frames
- Decode all fault status registers (CFSR, HFSR, MMFAR, BFAR)
- Determine the exact faulting instruction from the stacked PC
- Implement stack unwinding to generate call traces
- Understand the difference between precise and imprecise bus faults
- Know how to safely recover from or reset after faults
- Handle both MSP and PSP stack pointers in exception handlers
- Debug embedded systems crashes like a professional
2. Theoretical Foundation
2.1 Core Concepts
The ARM Exception Model
ARM Cortex-M processors have a sophisticated exception handling system. When a fault occurs, the processor automatically:
- Pushes an exception frame onto the stack
- Switches to Handler mode
- Loads the appropriate vector from the vector table
- Begins executing the exception handler
Exception Flow:
                                     ┌──────────────────┐
Normal execution ───────────────────►│   Fault occurs   │
                                     └────────┬─────────┘
                                              │
                                     ┌────────▼─────────┐
                                     │  Save CPU state  │
                                     │ (R0-R3, R12, LR, │
                                     │    PC, xPSR)     │
                                     └────────┬─────────┘
                                              │
                                     ┌────────▼─────────┐
                                     │   Load handler   │
                                     │ from vector table│
                                     └────────┬─────────┘
                                              │
                                     ┌────────▼─────────┐
                                     │ Execute handler  │
                                     │   (your code!)   │
                                     └────────┬─────────┘
                                              │
                                     ┌────────▼─────────┐
                                     │ Return or reset  │
                                     └──────────────────┘
The Exception Frame
When an exception occurs, the processor automatically pushes 8 registers onto the current stack:
Stack Layout After Exception (grows downward):
┌───────────────────────────┐  Higher addresses
│ xPSR                      │  ← SP + 28
├───────────────────────────┤
│ PC (Return Address)       │  ← SP + 24
├───────────────────────────┤
│ LR (Link Register)        │  ← SP + 20
├───────────────────────────┤
│ R12                       │  ← SP + 16
├───────────────────────────┤
│ R3                        │  ← SP + 12
├───────────────────────────┤
│ R2                        │  ← SP + 8
├───────────────────────────┤
│ R1                        │  ← SP + 4
├───────────────────────────┤
│ R0                        │  ← SP (stack pointer)
└───────────────────────────┘  Lower addresses
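Note: If the FPU context is active, the processor pushes 18 more words (S0-S15, FPSCR, and a reserved word) above these 8, forming a 26-word extended frame. In that case, bit 4 of EXC_RETURN is cleared.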
The stacked PC contains the address of the instruction that was executing (or about to execute) when the fault occurred. This is crucial for debugging.
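The layout above maps directly onto a C struct. The following is a minimal sketch. The type name stacked_frame_t and the helper stacked_pc() are hypothetical; they mirror exception_frame_t from Section 4.3. Compile-time checks pin each member to the SP offset shown in the diagram, so a handler can index the frame without guessing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 8-word basic frame pushed on exception entry. */
typedef struct {
    uint32_t r0;    /* SP + 0  */
    uint32_t r1;    /* SP + 4  */
    uint32_t r2;    /* SP + 8  */
    uint32_t r3;    /* SP + 12 */
    uint32_t r12;   /* SP + 16 */
    uint32_t lr;    /* SP + 20 */
    uint32_t pc;    /* SP + 24: return address (faulting instruction if precise) */
    uint32_t xpsr;  /* SP + 28 */
} stacked_frame_t;

/* Compile-time checks that the struct matches the hardware layout. */
_Static_assert(offsetof(stacked_frame_t, r0) == 0, "R0 at SP+0");
_Static_assert(offsetof(stacked_frame_t, r12) == 16, "R12 at SP+16");
_Static_assert(offsetof(stacked_frame_t, pc) == 24, "PC at SP+24");
_Static_assert(offsetof(stacked_frame_t, xpsr) == 28, "xPSR at SP+28");
_Static_assert(sizeof(stacked_frame_t) == 32, "basic frame is 8 words");

/* Given the stack pointer captured at exception entry, return the stacked PC
   (word index 6, i.e. byte offset 24). */
static inline uint32_t stacked_pc(const uint32_t *sp) {
    return sp[offsetof(stacked_frame_t, pc) / sizeof(uint32_t)];
}
```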
Fault Status Registers
ARM Cortex-M provides detailed fault information through several System Control Block (SCB) registers:
Fault Register Map:
┌───────────────────────────────────────────────────────────┐
│ CFSR (0xE000ED28) - 32 bits                               │
│ ┌──────────────────┬──────────────────┬─────────────────┐ │
│ │ UsageFault (16)  │ BusFault (8)     │ MemManage (8)   │ │
│ │ Bits 31-16       │ Bits 15-8        │ Bits 7-0        │ │
│ └──────────────────┴──────────────────┴─────────────────┘ │
├───────────────────────────────────────────────────────────┤
│ HFSR (0xE000ED2C) - HardFault Status                      │
│ • VECTTBL: Vector table read fault                        │
│ • FORCED: Escalated from another fault                    │
│ • DEBUGEVT: Debug event occurred                          │
├───────────────────────────────────────────────────────────┤
│ MMFAR (0xE000ED34) - MemManage Fault Address              │
│ • Contains the address that caused MemManage fault        │
│ • Only valid if MMARVALID bit is set in CFSR              │
├───────────────────────────────────────────────────────────┤
│ BFAR (0xE000ED38) - BusFault Address                      │
│ • Contains the address that caused BusFault               │
│ • Only valid if BFARVALID bit is set in CFSR              │
└───────────────────────────────────────────────────────────┘
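The three sub-registers share one word, so a handler can read CFSR once and split it. The following is a minimal sketch; cfsr_parts_t and split_cfsr() are hypothetical names. The sub-registers can also be read individually: MMFSR as a byte at 0xE000ED28, BFSR as a byte at 0xE000ED29, and UFSR as a halfword at 0xE000ED2A.

```c
#include <stdint.h>

/* The three sub-registers packed into CFSR (0xE000ED28). */
typedef struct {
    uint8_t  mmfsr;  /* bits 7-0:   MemManage Fault Status */
    uint8_t  bfsr;   /* bits 15-8:  BusFault Status */
    uint16_t ufsr;   /* bits 31-16: UsageFault Status */
} cfsr_parts_t;

/* Split a single 32-bit CFSR read into its sub-registers. */
static cfsr_parts_t split_cfsr(uint32_t cfsr) {
    cfsr_parts_t parts;
    parts.mmfsr = (uint8_t)(cfsr & 0xFFu);
    parts.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    parts.ufsr  = (uint16_t)(cfsr >> 16);
    return parts;
}
```

Consider a precise BusFault with a valid BFAR. CFSR = 0x00008200 splits into BFSR = 0x82: PRECISERR (bit 1 of BFSR) plus BFARVALID (bit 7 of BFSR).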
Types of Faults
Fault Hierarchy:
┌───────────────────────────────────────────────────────────┐
│ HardFault                                                 │
│ • Highest priority fault handler                          │
│ • Catches faults that can't be handled by other handlers  │
│ • Also catches faults when other handlers are disabled    │
├───────────────────────────────────────────────────────────┤
│ MemManage Fault                                           │
│ • MPU violations                                          │
│ • Execute from non-executable region                      │
│ • Access to restricted region                             │
├───────────────────────────────────────────────────────────┤
│ BusFault                                                  │
│ • Bus error during memory access                          │
│ • Access to invalid peripheral address                    │
│ • Can be precise (address known) or imprecise             │
├───────────────────────────────────────────────────────────┤
│ UsageFault                                                │
│ • Undefined instruction                                   │
│ • Invalid state transition                                │
│ • Unaligned access (if configured)                        │
│ • Division by zero (if configured)                        │
└───────────────────────────────────────────────────────────┘
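Of these handlers, only HardFault is enabled at reset. MemManage, BusFault, and UsageFault must be enabled in the System Handler Control and State Register (SHCSR, 0xE000ED24). Otherwise their faults escalate to HardFault with HFSR.FORCED set. The following is a sketch. The macro and function names are hypothetical; CMSIS provides equivalents named SCB_SHCSR_*_Msk. The function takes the register as a pointer so it can be tested off-target.

```c
#include <stdint.h>

#define SHCSR_MEMFAULTENA (1u << 16)  /* enable MemManage handler */
#define SHCSR_BUSFAULTENA (1u << 17)  /* enable BusFault handler */
#define SHCSR_USGFAULTENA (1u << 18)  /* enable UsageFault handler */

/* On hardware, pass (volatile uint32_t *)0xE000ED24 (SCB->SHCSR in CMSIS). */
static void enable_fault_handlers(volatile uint32_t *shcsr) {
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```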
2.2 Why This Matters
This is the skill that separates embedded experts from beginners. When a production device crashes in the field, you need to diagnose the problem from minimal information. A good fault handler provides:
- Immediate diagnosis: Know exactly what went wrong
- Crash forensics: Understand the call chain that led to the crash
- Field debugging: Devices in the field can report crash information
- Root cause analysis: Fix the actual bug, not just the symptom
Industry usage:
- Automotive systems use fault handlers to log crashes before entering safe mode
- Medical devices record fault information for regulatory compliance
- IoT devices report fault data to cloud monitoring systems
- Operating systems use fault handlers for memory protection and process isolation
2.3 Historical Context
The exception model in Cortex-M processors evolved from earlier ARM architectures but was significantly simplified for embedded use. Key innovations:
- Automatic context saving: The processor saves registers automatically
- Tail-chaining: Efficient handling of back-to-back exceptions
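- Late-arrival: If a higher-priority exception arrives while a lower-priority one is still being stacked, the processor services the higher-priority one first, reusing the same stacked context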
- Configurable fault handlers: Separate handlers for different fault types
The Cortex-M exception model was designed to enable deterministic, low-latency interrupt handling suitable for real-time systems, while also providing robust fault detection and debugging capabilities.
2.4 Common Misconceptions
Misconception 1: “HardFault means a hardware problem”
- Reality: The processor raises HardFault in response to errors that usually originate in software, such as invalid memory accesses, invalid instructions, or faults escalated from disabled handlers. It’s called “hard” because it’s the fault of last resort.
Misconception 2: “The PC in the exception frame points to the faulting instruction”
- Reality: The stacked PC points to the instruction that was being executed or was about to execute. For precise faults, this is the faulting instruction. For imprecise faults, the CPU may have advanced past it.
Misconception 3: “BFAR/MMFAR always contain valid addresses”
- Reality: These registers are only valid when the corresponding VALID bit is set in CFSR.
Misconception 4: “You can always recover from a fault”
- Reality: Some faults corrupt state irreparably. The safest response is often to log and reset.
Misconception 5: “EXC_RETURN values are regular addresses”
- Reality: Values such as 0xFFFFFFF1, 0xFFFFFFF9, and 0xFFFFFFFD (0xFFFFFFE1/E9/ED when an FPU frame was stacked) are EXC_RETURN codes that control exception return behavior. They are not actual memory addresses.
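The EXC_RETURN bits can be decoded mechanically. The following is a sketch for the ARMv7-M encodings used by Cortex-M3/M4/M7; ARMv8-M uses a different layout. exc_return_info_t, is_exc_return(), and decode_exc_return() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool uses_psp;     /* bit 2: 1 = frame on PSP, 0 = frame on MSP */
    bool thread_mode;  /* bit 3: 1 = return to Thread mode, 0 = Handler mode */
    bool fp_frame;     /* bit 4 == 0: extended frame including FP state */
} exc_return_info_t;

/* True if lr matches the ARMv7-M EXC_RETURN pattern (0xFFFFFFE1 ... 0xFFFFFFFD). */
static bool is_exc_return(uint32_t lr) {
    return (lr & 0xFFFFFFE0u) == 0xFFFFFFE0u && (lr & 0x3u) == 0x1u;
}

static exc_return_info_t decode_exc_return(uint32_t lr) {
    exc_return_info_t info;
    info.uses_psp    = (lr & (1u << 2)) != 0;
    info.thread_mode = (lr & (1u << 3)) != 0;
    info.fp_frame    = (lr & (1u << 4)) == 0;
    return info;
}
```

For example, 0xFFFFFFF9 decodes as Thread mode, MSP, and no FP frame. 0xFFFFFFFD differs only in bit 2, which selects the PSP.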
3. Project Specification
3.1 What You Will Build
A comprehensive fault handling system that:
- Catches all ARM Cortex-M exception types
- Decodes fault status registers into human-readable output
- Prints the stacked register values
- Generates a call stack trace
- Provides recovery options (reset or continue)
- Works on STM32 or similar Cortex-M boards
3.2 Functional Requirements
- Exception handlers: Implement HardFault, MemManage, BusFault, and UsageFault handlers
- Register dump: Print all stacked registers (R0-R3, R12, LR, PC, xPSR)
- Fault decoding: Parse and explain each bit in CFSR, HFSR
- Address reporting: Show MMFAR and BFAR when valid
- Stack trace: Unwind the stack to show call chain
- Trigger commands: Provide test functions that cause each fault type
- Recovery: Implement safe reset or recovery mechanisms
3.3 Non-Functional Requirements
- Reliability: Handler must not itself fault
- Minimalism: Handler should not depend on potentially corrupted state
- Speed: Fault information captured quickly before further corruption
- Portability: Work on any Cortex-M3/M4/M7 device
- Memory safety: Use minimal stack space in handler
3.4 Example Usage / Output
=== ARM Fault Handler Test Suite ===
Available fault triggers:
1. Trigger HardFault (invalid address)
2. Trigger BusFault (invalid peripheral)
3. Trigger UsageFault (undefined instruction)
4. Trigger MemManage (MPU violation)
5. Trigger UsageFault (divide by zero)
6. Trigger unaligned access
Select test (1-6): 2
Triggering BusFault...
!!! HARD FAULT DETECTED !!!
Fault Type: BusFault (Precise)
Fault Address: 0xE0100000 (invalid peripheral access)
Faulting Instruction: 0x08001234 (LDR R0, [R1])
Stacked Registers:
R0 = 0x00000042 R1 = 0xE0100000
R2 = 0x00000000 R3 = 0x20001234
R12 = 0x08004567 LR = 0x0800089B
PC = 0x08001234 xPSR= 0x61000000
Fault Status:
CFSR = 0x00008200 [PRECISERR, BFARVALID]
HFSR = 0x40000000 [FORCED]
BFAR = 0xE0100000 (Valid)
CFSR Breakdown:
MemManage: (none)
BusFault: PRECISERR - Precise data bus error; BFARVALID - BFAR holds the fault address
UsageFault: (none)
Stack Trace:
#0 0x08001234 in read_sensor() at sensors.c:47
#1 0x0800089A in main() at main.c:123
#2 0x080000A2 in Reset_Handler() at startup.s:45
Call Stack Memory:
0x20003FF0: 0x08001234 <- Fault PC
0x20003FEC: 0x0800089A <- Called from
0x20003FE8: 0x080000A2 <- Called from
Action: System reset in 5 seconds...
3.5 Real World Outcome
What success looks like:
- Working fault handlers: All fault types are caught and reported
- Clear diagnostics: Output clearly identifies the fault cause and location
- Stack unwinding: Call trace shows how execution reached the fault
- Recovery mechanism: System can safely reset or recover
- Deep understanding: You can explain the ARM exception model to others
4. Solution Architecture
4.1 High-Level Design
Fault Handler System

Vector Table
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────┐
│  Reset  │   NMI   │ HardFlt │ MemMgmt │ BusFlt  │ ... │
└─────────┴─────────┴────┬────┴────┬────┴────┬────┴─────┘
                         │         │         │
                         ▼         ▼         ▼
┌───────────────────────────────────────────────────────┐
│ Assembly Wrappers (Naked Functions)                   │
│  • Determine which stack was in use (MSP/PSP)         │
│  • Get the exception frame pointer                    │
│  • Call C handler with frame pointer                  │
└───────────────────────────┬───────────────────────────┘
                            ▼
┌───────────────────────────────────────────────────────┐
│ C Handler Implementation                              │
│                                                       │
│  Read Fault Regs  CFSR, HFSR, MMFAR, BFAR             │
│         ▼                                             │
│  Print Registers  Stacked R0-R3, R12, LR, PC, xPSR    │
│         ▼                                             │
│  Decode Faults    Parse CFSR bits, identify cause     │
│         ▼                                             │
│  Stack Unwind     Follow frame pointers               │
│         ▼                                             │
│  Recovery Action  Reset, infinite loop, or return     │
└───────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Purpose | Key Functions |
|---|---|---|
| Vector Table | Routes exceptions to handlers | Assembly definitions |
| Assembly Wrapper | Determines stack, extracts frame | HardFault_Handler (naked) |
| C Handler | Main fault analysis logic | HardFault_Handler_C() |
| Fault Decoder | Parses status registers | decode_cfsr(), decode_hfsr() |
| Stack Unwinder | Generates call trace | unwind_stack() |
| Output System | Prints diagnostics | Uses UART from Project 4 |
| Recovery Module | Handles post-fault actions | system_reset(), hang() |
4.3 Data Structures
#include <stdbool.h>
#include <stdint.h>
// Exception frame pushed automatically by hardware
typedef struct {
uint32_t r0;
uint32_t r1;
uint32_t r2;
uint32_t r3;
uint32_t r12;
uint32_t lr;
uint32_t pc; // Return address (the faulting instruction for precise faults)
uint32_t xpsr;
} exception_frame_t;
// Extended frame when FPU context is active
typedef struct {
exception_frame_t basic;
uint32_t s0_s15[16]; // S0-S15 FPU registers
uint32_t fpscr; // FPU Status and Control
uint32_t reserved;
} exception_frame_fpu_t;
// Decoded fault information
typedef struct {
uint32_t cfsr; // Configurable Fault Status Register
uint32_t hfsr; // HardFault Status Register
uint32_t mmfar; // MemManage Fault Address
uint32_t bfar; // BusFault Address
uint32_t afsr; // Auxiliary Fault Status Register
bool mmfar_valid;
bool bfar_valid;
} fault_info_t;
// Stack trace entry
typedef struct {
uint32_t pc; // Program counter
uint32_t lr; // Link register (return address)
const char *func_name; // Function name (if available)
} stack_frame_t;
4.4 Algorithm Overview
FUNCTION handle_fault(exception_frame):
// Phase 1: Capture fault state immediately
fault_info = read_fault_registers()
// Phase 2: Output basic information
print("!!! FAULT DETECTED !!!")
print_exception_frame(exception_frame)
// Phase 3: Decode and explain the fault
IF fault_info.cfsr & MEMMANAGE_BITS:
decode_memmanage_fault(fault_info)
IF fault_info.cfsr & BUSFAULT_BITS:
decode_bus_fault(fault_info)
IF fault_info.cfsr & USAGEFAULT_BITS:
decode_usage_fault(fault_info)
// Phase 4: Generate stack trace
unwind_stack(exception_frame)
// Phase 5: Clear fault status bits (write-1-to-clear)
CFSR = fault_info.cfsr
HFSR = fault_info.hfsr
// Phase 6: Recovery action
IF recoverable(fault_info):
attempt_recovery(exception_frame)
ELSE:
print("System reset in 5 seconds...")
delay(5000) // busy-wait loop: SysTick interrupts cannot preempt a HardFault handler
NVIC_SystemReset()
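Phases 1 and 5 depend on ordering: capture everything first, then clear. CFSR, HFSR, DFSR, MMFAR, BFAR, and AFSR occupy consecutive words from 0xE000ED28 to 0xE000ED3C, so one option is to overlay them with a struct. The following is a sketch. scb_fault_regs_t, fault_snapshot_t, and capture_and_clear() are hypothetical names. Passing the block as a pointer lets the same code run against a fake register block in a host test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overlay of the contiguous SCB fault registers starting at 0xE000ED28. */
typedef struct {
    volatile uint32_t cfsr;   /* 0xE000ED28 */
    volatile uint32_t hfsr;   /* 0xE000ED2C */
    volatile uint32_t dfsr;   /* 0xE000ED30 (debug fault status, unused here) */
    volatile uint32_t mmfar;  /* 0xE000ED34 */
    volatile uint32_t bfar;   /* 0xE000ED38 */
    volatile uint32_t afsr;   /* 0xE000ED3C */
} scb_fault_regs_t;

#define SCB_FAULT_REGS ((scb_fault_regs_t *)0xE000ED28u)

typedef struct {
    uint32_t cfsr, hfsr, mmfar, bfar, afsr;
    bool mmfar_valid, bfar_valid;
} fault_snapshot_t;

/* Snapshot every register first, then clear the sticky status bits by
   writing back exactly the values that were read (write-1-to-clear). */
static fault_snapshot_t capture_and_clear(scb_fault_regs_t *regs) {
    fault_snapshot_t s;
    s.cfsr  = regs->cfsr;
    s.hfsr  = regs->hfsr;
    s.mmfar = regs->mmfar;  /* read addresses before the VALID bits are cleared */
    s.bfar  = regs->bfar;
    s.afsr  = regs->afsr;
    s.mmfar_valid = (s.cfsr & (1u << 7)) != 0;   /* MMARVALID */
    s.bfar_valid  = (s.cfsr & (1u << 15)) != 0;  /* BFARVALID */
    regs->cfsr = s.cfsr;
    regs->hfsr = s.hfsr;
    return s;
}
```

On target, the call would be capture_and_clear(SCB_FAULT_REGS).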
5. Implementation Guide
5.1 Development Environment Setup
# Verify toolchain
$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Arm Embedded Toolchain) 12.2.1
# Verify debugger
$ openocd --version
Open On-Chip Debugger 0.12.0
# Create project directory
$ mkdir -p ~/projects/arm-fault-handler
$ cd ~/projects/arm-fault-handler
# Create initial file structure
$ touch main.c fault_handler.c fault_handler.h startup.s linker.ld Makefile
# Hardware: STM32 Nucleo board connected via ST-Link
5.2 Project Structure
arm-fault-handler/
├── main.c # Test application with fault triggers
├── fault_handler.c # Fault handler implementation
├── fault_handler.h # Fault handler interface
├── fault_decode.c # CFSR/HFSR decoding functions
├── stack_unwind.c # Stack unwinding implementation
├── uart.c # UART driver (from Project 4)
├── uart.h
├── startup.s # Vector table and reset handler
├── linker.ld # Linker script
├── stm32f4xx.h # Register definitions
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
“How does an ARM Cortex-M processor report errors, and how can I extract maximum debugging information from a crash?”
Before coding, understand: When your program does something illegal (invalid memory access, undefined instruction, etc.), the processor doesn’t just crash randomly. It follows a precise protocol: save state, set status bits, and jump to your handler. Your job is to read what the processor tells you.
5.4 Concepts You Must Understand First
Stop and research these before coding:
- EXC_RETURN Values
- What does LR contain during an exception? (Hint: 0xFFFFFFF*)
- How do you determine which stack (MSP/PSP) was in use?
- Book Reference: “The Definitive Guide to ARM Cortex-M3” Chapter 8 - Yiu
- CFSR Register Structure
- How is CFSR divided into three sub-registers?
- Which bits indicate address validity?
- Book Reference: “The Definitive Guide to ARM Cortex-M3” Chapter 12 - Yiu
- Precise vs Imprecise Faults
- What makes a bus fault imprecise?
- Why can’t you always determine the faulting address?
- Book Reference: ARM Cortex-M Technical Reference Manual
- Naked Functions
- Why do assembly wrappers need __attribute__((naked))?
- What does the compiler add to normal functions that would break exception handlers?
5.5 Questions to Guide Your Design
Before implementing, think through these:
- Stack Selection
- How do you know if the faulted code was using MSP or PSP?
- What if the stack pointer itself is corrupted?
- Output Method
- Can you safely use printf() in a fault handler?
- What if the fault was caused by UART code?
- Recursion Protection
- What if your fault handler itself faults?
- How do you detect and handle nested faults?
- Memory Safety
- What if the stacked PC points to invalid memory?
- How do you safely read potentially corrupted memory?
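For the stack-selection and memory-safety questions above, one common approach is to range-check an address before dereferencing it. The following is a minimal sketch that assumes an STM32F407-style memory map: 1 MB of flash at 0x08000000 and 128 KB of SRAM at 0x20000000. The constants and function names are hypothetical; in a real project, take the ranges from your own linker script. These checks do not detect holes inside a range.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED memory map; replace with your device's ranges or linker symbols. */
#define FLASH_START 0x08000000u
#define FLASH_END   0x08100000u  /* exclusive */
#define SRAM_START  0x20000000u
#define SRAM_END    0x20020000u  /* exclusive */

/* True if a word read at addr is aligned and stays inside SRAM. */
static bool is_readable_ram_word(uint32_t addr) {
    return (addr & 3u) == 0u && addr >= SRAM_START && addr <= SRAM_END - 4u;
}

/* True if the whole 8-word basic exception frame at sp lies inside SRAM. */
static bool frame_is_readable(uint32_t sp) {
    return is_readable_ram_word(sp) && sp <= SRAM_END - 32u;
}

/* True if addr could be Thumb code in flash (bit 0, the Thumb bit, is ignored). */
static bool is_plausible_code_addr(uint32_t addr) {
    uint32_t a = addr & ~1u;
    return a >= FLASH_START && a < FLASH_END;
}
```

A handler would call frame_is_readable() on the MSP or PSP value before touching the frame, and is_plausible_code_addr() on the stacked PC before disassembling or symbolizing it.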
5.6 Thinking Exercise
Trace Through an Exception
Before coding, trace what happens when code executes *(volatile int *)0xE0100000; (invalid address):
1. The LDR instruction issues a data read from 0xE0100000
2. Bus returns error (no peripheral at that address)
3. Processor detects BusFault condition
4. CFSR.BFSR.PRECISERR bit set to 1
5. BFAR = 0xE0100000 and CFSR.BFSR.BFARVALID set to 1
6. HFSR.FORCED = 1 (escalated because the BusFault handler is disabled by default: SHCSR.BUSFAULTENA = 0 after reset)
7. Processor pushes R0-R3, R12, LR, PC, xPSR to current stack
8. LR = 0xFFFFFFF9 (return to Thread mode, MSP)
9. PC loaded from vector table offset 0x0C (HardFault)
10. Your HardFault_Handler starts executing
Questions while tracing:
- Why did it escalate to HardFault?
- What does 0xFFFFFFF9 mean vs 0xFFFFFFFD?
- How do you get back to the stacked PC value?
5.7 Hints in Layers
Hint 1: Getting the Exception Frame
The first challenge is getting a pointer to the exception frame. The processor pushed it onto either MSP or PSP. You check bit 2 of LR (EXC_RETURN) to determine which stack was in use.
Hint 2: Assembly Wrapper Pattern
#include <stdint.h>
// C handler; receives the stacked exception frame pointer in R0
void HardFault_Handler_C(uint32_t *stacked_frame);
// The wrapper must be naked to avoid compiler-generated prologue
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile (
        "TST LR, #4            \n" // Test bit 2 of EXC_RETURN
        "ITE EQ                \n" // If-Then-Else
        "MRSEQ R0, MSP         \n" // Bit 2 == 0: frame is on MSP
        "MRSNE R0, PSP         \n" // Bit 2 == 1: frame is on PSP
        "B HardFault_Handler_C \n" // Tail-jump to C handler (R0 = frame pointer)
    );
}
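The wrapper passes the frame pointer in R0, and the C side then needs to print it. printf() may pull in heap, locks, or a large stack, so a safer sketch formats into a fixed buffer using hand-written hex conversion. format_hex32() and format_register_dump() are hypothetical helper names. The frame argument is assumed to point at the stacked R0.

```c
#include <stddef.h>
#include <stdint.h>

/* Write "0x" plus 8 uppercase hex digits into out[0..9]; no NUL, no printf. */
static void format_hex32(char *out, uint32_t v) {
    static const char digits[] = "0123456789ABCDEF";
    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 8; i++) {
        out[2 + i] = digits[(v >> (28 - 4 * i)) & 0xFu];
    }
}

/* Format the 8 stacked registers as "NAME = 0xXXXXXXXX\r\n" lines.
   Each line is 4 + 3 + 10 + 2 = 19 chars, so 8 lines need 152 bytes + NUL.
   Returns the number of characters written, or 0 if buf is too small. */
static size_t format_register_dump(const uint32_t *frame, char *buf, size_t cap) {
    static const char names[8][5] = {
        "R0  ", "R1  ", "R2  ", "R3  ", "R12 ", "LR  ", "PC  ", "xPSR"
    };
    const size_t line_len = 19;
    if (cap < 8 * line_len + 1) return 0;
    char *p = buf;
    for (int i = 0; i < 8; i++) {
        for (int k = 0; k < 4; k++) *p++ = names[i][k];
        *p++ = ' ';
        *p++ = '=';
        *p++ = ' ';
        format_hex32(p, frame[i]);
        p += 10;
        *p++ = '\r';
        *p++ = '\n';
    }
    *p = '\0';
    return (size_t)(p - buf);
}
```

On target, HardFault_Handler_C could keep a static char buf[160], fill it with format_register_dump(stacked_frame, buf, sizeof buf), and then send it byte by byte with a polled UART routine such as the Project 4 driver.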
Hint 3: Reading Fault Registers
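#include <stdbool.h>
#include <stdint.h>
#include "fault_handler.h" // assumed to declare fault_info_t (Section 4.3)
#define SCB_CFSR  (*(volatile uint32_t *)0xE000ED28)
#define SCB_HFSR  (*(volatile uint32_t *)0xE000ED2C)
#define SCB_MMFAR (*(volatile uint32_t *)0xE000ED34)
#define SCB_BFAR  (*(volatile uint32_t *)0xE000ED38)
#define SCB_AFSR  (*(volatile uint32_t *)0xE000ED3C)
void read_fault_info(fault_info_t *info) {
    info->cfsr  = SCB_CFSR;
    info->hfsr  = SCB_HFSR;
    info->mmfar = SCB_MMFAR;
    info->bfar  = SCB_BFAR;
    info->afsr  = SCB_AFSR; // Implementation-defined auxiliary fault status
    info->mmfar_valid = (info->cfsr & (1u << 7)) != 0;  // MMARVALID
    info->bfar_valid  = (info->cfsr & (1u << 15)) != 0; // BFARVALID
}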
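Hint 3 captures HFSR, and Phase 3 also calls for decoding it. The following is a minimal sketch using the three HFSR bits listed in Section 2.1, at their ARMv7-M bit positions. describe_hfsr() is a hypothetical name, and it reports only the first matching bit.

```c
#include <stdint.h>

#define HFSR_VECTTBL  (1u << 1)   /* bus fault while reading the vector table */
#define HFSR_FORCED   (1u << 30)  /* escalated configurable fault: check CFSR */
#define HFSR_DEBUGEVT (1u << 31)  /* debug event, e.g. BKPT with no debugger */

/* Return a one-line explanation for the first HFSR bit found. */
static const char *describe_hfsr(uint32_t hfsr) {
    if (hfsr & HFSR_VECTTBL)  return "VECTTBL: bus fault on vector table read";
    if (hfsr & HFSR_FORCED)   return "FORCED: escalated MemManage/BusFault/UsageFault, see CFSR";
    if (hfsr & HFSR_DEBUGEVT) return "DEBUGEVT: debug event escalated to HardFault";
    return "(no HFSR bits set)";
}
```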
Hint 4: CFSR Bit Decoding
#include <stdint.h>
// CFSR bit definitions
// MemManage Fault Status (MMFSR), bits 7-0
#define CFSR_IACCVIOL    (1u << 0)   // Instruction access violation
#define CFSR_DACCVIOL    (1u << 1)   // Data access violation
#define CFSR_MUNSTKERR   (1u << 3)   // MemManage fault on unstacking
#define CFSR_MSTKERR     (1u << 4)   // MemManage fault on stacking
#define CFSR_MLSPERR     (1u << 5)   // MemManage fault during lazy FP state preservation (FPU cores)
#define CFSR_MMARVALID   (1u << 7)   // MMFAR valid
// BusFault Status (BFSR), bits 15-8
#define CFSR_IBUSERR     (1u << 8)   // Bus fault on instruction fetch
#define CFSR_PRECISERR   (1u << 9)   // Precise data bus error
#define CFSR_IMPRECISERR (1u << 10)  // Imprecise data bus error
#define CFSR_UNSTKERR    (1u << 11)  // Bus fault on unstacking
#define CFSR_STKERR      (1u << 12)  // Bus fault on stacking
#define CFSR_LSPERR      (1u << 13)  // Bus fault during lazy FP state preservation (FPU cores)
#define CFSR_BFARVALID   (1u << 15)  // BFAR valid
// UsageFault Status (UFSR), bits 31-16
#define CFSR_UNDEFINSTR  (1u << 16)  // Undefined instruction
#define CFSR_INVSTATE    (1u << 17)  // Invalid state (Thumb bit)
#define CFSR_INVPC       (1u << 18)  // Invalid PC load
#define CFSR_NOCP        (1u << 19)  // No coprocessor
#define CFSR_UNALIGNED   (1u << 24)  // Unaligned access
#define CFSR_DIVBYZERO   (1u << 25)  // Divide by zero
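Once the masks are defined, a lookup table avoids a long if-chain and keeps each name next to its explanation. The following is a sketch; cfsr_bit_t and decode_cfsr() are hypothetical names. It collects every set bit so the caller can print both the name and its meaning.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t mask;
    const char *name;
    const char *meaning;
} cfsr_bit_t;

static const cfsr_bit_t cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL",    "MemManage: instruction access violation" },
    { 1u << 1,  "DACCVIOL",    "MemManage: data access violation" },
    { 1u << 3,  "MUNSTKERR",   "MemManage: fault on unstacking" },
    { 1u << 4,  "MSTKERR",     "MemManage: fault on stacking" },
    { 1u << 5,  "MLSPERR",     "MemManage: fault during lazy FP stacking" },
    { 1u << 7,  "MMARVALID",   "MMFAR holds the faulting address" },
    { 1u << 8,  "IBUSERR",     "BusFault: instruction fetch error" },
    { 1u << 9,  "PRECISERR",   "BusFault: precise data bus error" },
    { 1u << 10, "IMPRECISERR", "BusFault: imprecise data bus error" },
    { 1u << 11, "UNSTKERR",    "BusFault: fault on unstacking" },
    { 1u << 12, "STKERR",      "BusFault: fault on stacking" },
    { 1u << 13, "LSPERR",      "BusFault: fault during lazy FP stacking" },
    { 1u << 15, "BFARVALID",   "BFAR holds the faulting address" },
    { 1u << 16, "UNDEFINSTR",  "UsageFault: undefined instruction" },
    { 1u << 17, "INVSTATE",    "UsageFault: invalid state (Thumb bit)" },
    { 1u << 18, "INVPC",       "UsageFault: invalid PC load" },
    { 1u << 19, "NOCP",        "UsageFault: no coprocessor" },
    { 1u << 24, "UNALIGNED",   "UsageFault: unaligned access" },
    { 1u << 25, "DIVBYZERO",   "UsageFault: divide by zero" },
};

/* Store pointers to the entries for every set bit in hits[] (up to max).
   Returns how many bits were set, which may exceed max. */
static size_t decode_cfsr(uint32_t cfsr, const cfsr_bit_t **hits, size_t max) {
    size_t count = 0;
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            if (count < max) hits[count] = &cfsr_bits[i];
            count++;
        }
    }
    return count;
}
```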
5.8 The Interview Questions They’ll Ask
Prepare to answer these:
- “Explain the ARM Cortex-M exception model”
- Vector table at address 0x0 after reset, relocatable through VTOR on Cortex-M3 and later
- Hardware automatically saves context
- Nested exceptions supported via NVIC priority
- EXC_RETURN controls exception return behavior
- “What’s the difference between MSP and PSP?”
- MSP: Main Stack Pointer, always used in Handler mode and used by Thread mode after reset
- PSP: Process Stack Pointer, used by threads/tasks
- Allows separation of kernel and user stacks
- CONTROL register selects which is active in Thread mode
- “How do you handle a fault in the fault handler?”
- A fault inside a MemManage, BusFault, or UsageFault handler escalates to HardFault
- HardFault in HardFault handler causes lockup
- Use minimal code, avoid memory allocation
- Consider watchdog timer as last resort
- “What does EXC_RETURN value 0xFFFFFFF9 mean?”
- Return to Thread mode
- Use MSP for stack
- No floating point context
- (Compare to 0xFFFFFFFD: return to Thread mode using PSP)
- “How would you debug a crash that only happens in production?”
- Implement fault handler that logs to persistent storage
- Store registers, stack trace, and fault registers
- Transmit crash data on next boot
- Consider using hardware watchdog for hangs
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Exception model | “The Definitive Guide to ARM Cortex-M3 and Cortex-M4” by Yiu | Ch. 8 |
| Fault handling | “The Definitive Guide to ARM Cortex-M3 and Cortex-M4” by Yiu | Ch. 12 |
| Stack unwinding | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 3 |
| Debug features | ARM Cortex-M Technical Reference Manual | Ch. 11 |
| Practical debugging | “Making Embedded Systems” by White | Ch. 10 |
5.10 Implementation Phases
Phase 1: Basic HardFault Handler (3-4 hours)
- Implement assembly wrapper to get stack pointer
- Write C handler that prints stacked PC
- Verify it catches a simple fault (null pointer dereference)
Phase 2: Full Register Dump (2-3 hours)
- Print all stacked registers
- Add CFSR, HFSR values
- Print MMFAR/BFAR when valid
Phase 3: Fault Decoding (3-4 hours)
- Implement CFSR bit parsing
- Create human-readable fault descriptions
- Add HFSR decoding
Phase 4: Stack Unwinding (4-6 hours)
- Implement frame pointer following
- Generate call trace
- Handle edge cases (stack corruption)
Phase 5: Test Suite & Polish (3-4 hours)
- Create functions to trigger each fault type
- Test all code paths
- Add recovery/reset mechanism
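Following frame pointers in GCC Thumb code can be unreliable, because R7 does not necessarily form a simple linked chain. A common alternative technique is a heuristic scan of the stack for words that look like return addresses. The following is a sketch under assumed code-range constants, which should come from the linker script. scan_stack_for_return_addrs() is a hypothetical name. Stale values left on the stack also match, so treat the output as a list of hints rather than a guaranteed call chain.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED code range; take these from the linker script (.text bounds). */
#define CODE_START 0x08000000u
#define CODE_END   0x08100000u  /* exclusive */

/* Walk stack words from sp up to stack_top and report values that look like
   Thumb return addresses: bit 0 set and inside the code range. */
static size_t scan_stack_for_return_addrs(const uint32_t *sp,
                                          const uint32_t *stack_top,
                                          uint32_t *out, size_t max_out) {
    size_t found = 0;
    for (const uint32_t *p = sp; p < stack_top && found < max_out; p++) {
        uint32_t w = *p;
        uint32_t addr = w & ~1u;
        if ((w & 1u) && addr >= CODE_START && addr < CODE_END) {
            out[found++] = addr;  /* report with the Thumb bit cleared */
        }
    }
    return found;
}
```

In the handler, the scan would typically start just past the exception frame: 8 words, or 26 if EXC_RETURN bit 4 is clear. It would stop at the initial stack pointer, for example a linker symbol such as _estack; the exact name depends on your linker script.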
5.11 Key Implementation Decisions
| Decision | Trade-offs |
|---|---|
| Separate handlers vs unified | Separate: cleaner routing. Unified: simpler, all escalate to HardFault |
| Printf vs direct UART | Printf: convenient. Direct UART: safer, no malloc |
| Symbol table for stack trace | With: function names. Without: smaller binary, still useful |
| Recovery vs reset | Recovery: complex, risk of corruption. Reset: safe, lose state |
| FPU context handling | Handle: complete. Ignore: simpler, only for non-FPU code |
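If you choose to ship a symbol table, mapping a PC to its function is a binary search over function start addresses. The following is a sketch with a hypothetical four-entry table. In practice, the table would be generated at build time from the output of arm-none-eabi-nm -n, which sorts symbols by address. symbol_t and lookup_symbol() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;     /* function start address, Thumb bit cleared */
    const char *name;
} symbol_t;

/* HYPOTHETICAL table, sorted by address. */
static const symbol_t symbols[] = {
    { 0x08000080u, "Reset_Handler" },
    { 0x08000800u, "main" },
    { 0x08001200u, "read_sensor" },
    { 0x08001400u, "uart_putc" },
};

/* Return the last symbol whose start address is <= pc, or NULL if pc is
   below the first symbol. Without symbol sizes, pcs past the end of the
   last function still map to it. */
static const symbol_t *lookup_symbol(uint32_t pc) {
    size_t lo = 0, hi = sizeof symbols / sizeof symbols[0];
    pc &= ~1u;
    while (lo < hi) {  /* find the first entry with addr > pc */
        size_t mid = lo + (hi - lo) / 2;
        if (symbols[mid].addr <= pc) lo = mid + 1;
        else hi = mid;
    }
    return lo == 0 ? NULL : &symbols[lo - 1];
}
```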
6. Testing Strategy
6.1 Fault Trigger Functions
#include <stdint.h>
#include "stm32f4xx.h" // CMSIS device header: SCB, SCB_CCR_*_Msk
// Trigger HardFault via invalid address read
void trigger_hardfault_invalid_addr(void) {
    volatile int *p = (volatile int *)0xFFFFFFFF;
    int x = *p; // Bus fault, escalates to HardFault unless BusFault is enabled
    (void)x;
}
// Trigger BusFault via invalid peripheral access
// (reaches BusFault_Handler only if SHCSR.BUSFAULTENA is set; otherwise HardFault)
void trigger_busfault_peripheral(void) {
    volatile int *p = (volatile int *)0xE0100000; // No peripheral mapped here
    int x = *p;
    (void)x;
}
// Trigger UsageFault via undefined instruction
void trigger_usagefault_undef(void) {
    // Execute undefined instruction
    __asm volatile (".word 0xFFFFFFFF");
}
// Trigger UsageFault via divide by zero
void trigger_usagefault_divzero(void) {
    SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk; // Enable div-by-zero trap first (off by default)
    volatile int a = 10, b = 0;
    int c = a / b; // SDIV with a zero divisor sets DIVBYZERO
    (void)c;
}
// Trigger unaligned access fault
void trigger_unaligned(void) {
    SCB->CCR |= SCB_CCR_UNALIGN_TRP_Msk; // Enable unaligned trap first
    uint32_t buffer[2] = {0};            // Word-aligned storage
    volatile uint32_t *p = (volatile uint32_t *)((uint8_t *)buffer + 1); // Always misaligned
    *p = 0x12345678;                     // Unaligned word store sets UNALIGNED
}
6.2 Integration Tests
| Test | Trigger | Expected Result |
|---|---|---|
| Invalid address | Read from 0xFFFFFFFF | HardFault, shows PRECISERR or IMPRECISERR |
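| Invalid peripheral | Read from 0xE0100000 | BusFault, BFAR = 0xE0100000 |
| Undefined instruction | Execute 0xFFFFFFFF | UsageFault, UNDEFINSTR |
| Divide by zero | a / 0 | UsageFault, DIVBYZERO |
| Unaligned access | Unaligned word write | UsageFault, UNALIGNED |
| Stack overflow | Deep recursion | STKERR/MSTKERR if the stack reaches invalid memory; otherwise silent data corruption |

Note: BusFault, MemManage, and UsageFault reach their own handlers only if enabled in SHCSR. Otherwise they escalate to HardFault with HFSR.FORCED set.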
6.3 Verification Checklist
[ ] The expected handler (HardFault, or the enabled configurable handler) is called for each test
[ ] Stacked PC points to or near faulting instruction
[ ] CFSR correctly identifies fault type
[ ] BFAR/MMFAR contain correct address when valid
[ ] Stack trace shows correct call chain
[ ] System resets cleanly after fault
[ ] Handler doesn't crash on corrupted stack
[ ] Output is readable via UART
7. Common Pitfalls & Debugging
Problem 1: “Handler never gets called”
- Why: Vector table not correctly placed or handler symbol missing
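- Fix: Check that the linker script places the vector table at the boot address (flash at 0x08000000 on STM32, aliased to 0x0) or that VTOR points to it, and verify the handler is not optimized out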
- Debug: Use debugger to check vector table contents
Problem 2: “Wrong stack pointer used”
- Why: EXC_RETURN bit test incorrect
- Fix: Ensure TST LR, #4 and the ITE EQ / MRSEQ / MRSNE sequence are correct
- Debug: Print the LR value and manually verify which stack should be used
Problem 3: “BFAR/MMFAR always shows 0”
- Why: Reading after clearing CFSR, or valid bit not set
- Fix: Read fault addresses before clearing CFSR, check VALID bits
- Debug: Print raw CFSR value, check valid bits
Problem 4: “Handler faults (nested fault)”
- Why: Using corrupted stack, calling unsafe functions
- Fix: Minimize handler code, don’t use malloc/printf
- Debug: Use dedicated fault handler stack, simplify handler
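A simple guard against nested faults is a global depth counter checked at handler entry. The following is a sketch; fault_enter() and fault_leave() are hypothetical names. It only catches re-entry through escalation, for example a configurable handler faulting into HardFault. A fault inside HardFault itself locks up the core instead, which is why a watchdog remains the last resort.

```c
#include <stdbool.h>
#include <stdint.h>

/* Nonzero while fault reporting is in progress; a plain global, so checking
   it needs no stack frame contents or heap. */
static volatile uint32_t fault_depth = 0;

/* Returns true for the outermost fault. A second call before fault_leave()
   means the reporting code itself faulted, so the caller should skip all
   reporting and reset immediately. */
static bool fault_enter(void) {
    return ++fault_depth == 1u;
}

/* Only needed on the (rare) path that recovers and returns. */
static void fault_leave(void) {
    if (fault_depth > 0u) fault_depth--;
}
```

In HardFault_Handler_C, a line such as if (!fault_enter()) NVIC_SystemReset(); would skip reporting on the second entry. NVIC_SystemReset() is the CMSIS-Core reset function.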
Problem 5: “Stack trace is garbage”
- Why: Stack corrupted, or frame pointer not used
- Fix: Compile with -fno-omit-frame-pointer and validate addresses before dereferencing them
- Debug: Print raw stack memory and trace it manually
Problem 6: “Imprecise fault shows wrong PC”
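- Why: The failing write was buffered, so the CPU kept executing and the stacked PC points past the store that caused the error
- Fix: Inspect the instructions shortly before the stacked PC; BFAR is not valid for imprecise faults (BFARVALID stays clear)
- Debug: On Cortex-M3/M4, disable write buffering (ACTLR.DISDEFWBUF, bit 1) so bus errors become precise: SCnSCB->ACTLR |= 2;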
8. Extensions & Challenges
8.1 Easy Extensions
| Extension | Description | Learning |
|---|---|---|
| Fault counter | Count each fault type | Persistent diagnostics |
| Last fault storage | Store in backup RAM | Survives reset |
| LED indication | Blink pattern for fault type | Visual debugging |
| Watchdog integration | Reset if handler hangs | System reliability |
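For the fault counter and last-fault storage extensions, a record in RAM that the startup code does not zero can survive a software reset. The following is a sketch. It assumes linker.ld provides a NOLOAD section named .noinit that startup code leaves untouched; that section, the magic value, and all names are assumptions. The magic value and checksum guard against the random contents SRAM holds after power-up.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: a NOLOAD ".noinit" output section exists on target builds. */
#if defined(__arm__)
#define NOINIT __attribute__((section(".noinit")))
#else
#define NOINIT /* host build: ordinary zero-initialized global */
#endif

#define CRASH_MAGIC 0xC0FFEE42u  /* arbitrary marker value */

typedef struct {
    uint32_t magic;     /* CRASH_MAGIC when the record is valid */
    uint32_t pc, lr, cfsr, hfsr, bfar;
    uint32_t count;     /* faults recorded since the record was last cleared */
    uint32_t checksum;  /* XOR of all fields above */
} crash_record_t;

static NOINIT crash_record_t crash_record;

static uint32_t crash_checksum(const crash_record_t *r) {
    return r->magic ^ r->pc ^ r->lr ^ r->cfsr ^ r->hfsr ^ r->bfar ^ r->count;
}

static bool crash_record_valid(void) {
    return crash_record.magic == CRASH_MAGIC &&
           crash_record.checksum == crash_checksum(&crash_record);
}

/* Called from the fault handler just before NVIC_SystemReset(). */
static void crash_save(uint32_t pc, uint32_t lr, uint32_t cfsr,
                       uint32_t hfsr, uint32_t bfar) {
    uint32_t count = crash_record_valid() ? crash_record.count + 1u : 1u;
    crash_record.magic = CRASH_MAGIC;
    crash_record.pc = pc;
    crash_record.lr = lr;
    crash_record.cfsr = cfsr;
    crash_record.hfsr = hfsr;
    crash_record.bfar = bfar;
    crash_record.count = count;
    crash_record.checksum = crash_checksum(&crash_record);
}

/* Called once at boot: copies the record and returns true if the last run crashed. */
static bool crash_load(crash_record_t *out) {
    if (!crash_record_valid()) return false;
    *out = crash_record;
    return true;
}

/* Called after the crash has been reported, so it is not reported twice. */
static void crash_clear(void) {
    crash_record.magic = 0u;
}
```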
8.2 Advanced Challenges
| Challenge | Description | Learning |
|---|---|---|
| Full symbol lookup | Map PC to function names | ELF parsing, debug info |
| Crash log to flash | Persist fault info | Flash programming |
| Remote reporting | Send faults via network | Error reporting systems |
| Fault injection | Controlled fault testing | Testing methodologies |
| MPU configuration | Set up memory protection | Hardware security |
| RTOS integration | Handle task faults | Task isolation |
8.3 Research Topics
- How do commercial RTOSes (FreeRTOS, Zephyr) handle faults?
- What is ARM TrustZone and how does it affect exception handling?
- How do debuggers like GDB implement breakpoints on ARM?
- What is lockup state and how do you recover from it?
9. Real-World Connections
9.1 Production Systems Using This
| System | How It Uses Fault Handling | Notable Feature |
|---|---|---|
| Automotive ECUs | Log faults, enter limp mode | Safety-critical recovery |
| Medical devices | Record faults for FDA compliance | Audit trail |
| FreeRTOS | MPU port (FreeRTOS-MPU), stack overflow hooks | Task isolation |
| Zephyr RTOS | Comprehensive fault framework | Detailed diagnostics |
| ARM Mbed OS | Error handling hooks | Crash reporting |
9.2 How the Pros Do It
FreeRTOS:
- Optional MPU port (FreeRTOS-MPU) that isolates tasks
- Stack overflow detection
- Heap corruption detection
Zephyr RTOS:
- Detailed fault dump with all registers
- Memory domain protection with MPU
- Fault log to flash storage
Production devices:
- Store crash data in backup RAM or flash
- Transmit crash reports on next boot
- Implement watchdog as last-resort recovery
10. Self-Assessment Checklist
Before considering this project complete, verify:
- I can explain the ARM Cortex-M exception model and vector table
- I understand the structure of the hardware-pushed exception frame
- I can decode CFSR to identify any fault type
- I know the difference between MSP and PSP and when each is used
- I can explain EXC_RETURN values (0xFFFFFFF9, 0xFFFFFFFD, etc.)
- My handler correctly identifies the faulting instruction address
- My handler prints all fault status registers with decoded meanings
- I implemented stack unwinding to show the call chain
- My handler doesn’t itself cause faults
- I can trigger and correctly diagnose all fault types
- I understand the difference between precise and imprecise faults
- I can answer all the interview questions listed above
Next Steps
After completing this project, you’ll be well-prepared for:
- Project 14: GDB Stub - Use debug features for remote debugging
- Project 15: Tiny OS - Implement process isolation with fault handling
- Production debugging - Apply these skills to real embedded systems
The fault handling expertise you’ve gained is essential for any professional embedded developer. When production devices crash, you’ll be the one who can diagnose the problem.