Project 9: Exception Handler & Fault Analyzer

Build a comprehensive fault handler that catches all Cortex-M fault exceptions (HardFault, MemManage, BusFault, UsageFault), decodes the fault cause, prints a detailed stack trace, and optionally recovers or reboots.

Quick Reference

Attribute	Value
Difficulty	Level 4 - Expert
Time Estimate	1-2 weeks
Language	C with ARM Assembly (primary), Rust (alternative)
Prerequisites	Projects 3-4 (bare-metal, UART), strong understanding of stack and calling conventions
Key Topics	Exception Model, Fault Registers, Stack Unwinding, CFSR, HFSR, MMFAR, BFAR

1. Learning Objectives

After completing this project, you will:

Understand the ARM Cortex-M exception model and vector table
Master the structure of stacked exception frames
Decode all fault status registers (CFSR, HFSR, MMFAR, BFAR)
Determine the exact faulting instruction from the stacked PC
Implement stack unwinding to generate call traces
Understand the difference between precise and imprecise bus faults
Know how to safely recover from or reset after faults
Handle both MSP and PSP stack pointers in exception handlers
Debug embedded systems crashes like a professional

2. Theoretical Foundation

2.1 Core Concepts

The ARM Exception Model

ARM Cortex-M processors have a sophisticated exception handling system. When a fault occurs, the processor automatically:

Pushes an exception frame onto the stack
Switches to Handler mode
Loads the appropriate vector from the vector table
Begins executing the exception handler

Exception Flow:
                                    ┌──────────────────┐
Normal execution ───────────────────►│ Fault occurs     │
                                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │ Save CPU state   │
                                    │ (R0-R3, R12, LR, │
                                    │  PC, xPSR)       │
                                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │ Load handler     │
                                    │ from vector table│
                                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │ Execute handler  │
                                    │ (your code!)     │
                                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │ Return or reset  │
                                    └──────────────────┘

The Exception Frame

When an exception occurs, the processor automatically pushes 8 registers onto the current stack:

Stack Layout After Exception (grows downward):
┌───────────────────────────────────────────┐  Higher addresses
│               xPSR                        │  ← SP + 28
├───────────────────────────────────────────┤
│               PC (Return Address)         │  ← SP + 24
├───────────────────────────────────────────┤
│               LR (Link Register)          │  ← SP + 20
├───────────────────────────────────────────┤
│               R12                         │  ← SP + 16
├───────────────────────────────────────────┤
│               R3                          │  ← SP + 12
├───────────────────────────────────────────┤
│               R2                          │  ← SP + 8
├───────────────────────────────────────────┤
│               R1                          │  ← SP + 4
├───────────────────────────────────────────┤
│               R0                          │  ← SP (stack pointer)
└───────────────────────────────────────────┘  Lower addresses

Note: If the FPU context was active, the processor pushes 18 additional words (S0-S15, FPSCR, and one reserved word), so the frame grows from 32 to 104 bytes.

The stacked PC contains the address of the instruction that was executing (or about to execute) when the fault occurred. This is crucial for debugging.
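
The frame layout also lets a handler work out where SP pointed just before the fault, which is the natural starting point for walking the interrupted code's stack. The following is a minimal sketch, assuming the handler already has the frame address, the stacked xPSR, and the EXC_RETURN value from LR. The helper name sp_before_exception and the macro names are hypothetical, not part of any vendor header.

```c
#include <stdint.h>

#define BASIC_FRAME_BYTES     32u        /* R0-R3, R12, LR, PC, xPSR */
#define FPU_EXTRA_BYTES       72u        /* S0-S15, FPSCR, one reserved word */
#define XPSR_STACK_ALIGN_BIT  (1u << 9)  /* set when hardware inserted 4 bytes of padding */
#define EXC_RETURN_BASIC_BIT  (1u << 4)  /* 1 = basic frame, 0 = extended (FPU) frame */

/* Hypothetical helper: value SP held immediately before exception entry. */
uint32_t sp_before_exception(uint32_t frame_addr, uint32_t stacked_xpsr,
                             uint32_t exc_return)
{
    uint32_t sp = frame_addr + BASIC_FRAME_BYTES;

    if ((exc_return & EXC_RETURN_BASIC_BIT) == 0u) {
        sp += FPU_EXTRA_BYTES;   /* extended frame was stacked */
    }
    if ((stacked_xpsr & XPSR_STACK_ALIGN_BIT) != 0u) {
        sp += 4u;                /* undo the alignment padding word */
    }
    return sp;
}
```

With no padding and no FPU context, a frame at 0x20003FE0 implies that SP was 0x20004000 before the fault.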

Fault Status Registers

ARM Cortex-M provides detailed fault information through several System Control Block (SCB) registers:

Fault Register Map:
┌─────────────────────────────────────────────────────────────────┐
│                    CFSR (0xE000ED28) - 32 bits                   │
│  ┌──────────────────┬──────────────────┬──────────────────┐     │
│  │ UsageFault (16)  │ BusFault (8)     │ MemManage (8)    │     │
│  │ Bits 31-16       │ Bits 15-8        │ Bits 7-0         │     │
│  └──────────────────┴──────────────────┴──────────────────┘     │
├─────────────────────────────────────────────────────────────────┤
│                    HFSR (0xE000ED2C) - HardFault Status          │
│  • VECTTBL: Vector table read fault                             │
│  • FORCED: Escalated from another fault                         │
│  • DEBUGEVT: Debug event occurred                               │
├─────────────────────────────────────────────────────────────────┤
│                    MMFAR (0xE000ED34) - MemManage Fault Address  │
│  • Contains the address that caused MemManage fault             │
│  • Only valid if MMARVALID bit is set in CFSR                   │
├─────────────────────────────────────────────────────────────────┤
│                    BFAR (0xE000ED38) - BusFault Address          │
│  • Contains the address that caused BusFault                    │
│  • Only valid if BFARVALID bit is set in CFSR                   │
└─────────────────────────────────────────────────────────────────┘
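
The register map above can be turned into a few small accessors. This sketch only shows how the three CFSR sub-registers and the HFSR flags are extracted from raw 32-bit values; the function names are hypothetical, and reading the actual registers is left to the handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* CFSR sub-registers, per the layout above. */
uint8_t  cfsr_mmfsr(uint32_t cfsr) { return (uint8_t)(cfsr & 0xFFu); }         /* bits 7-0   */
uint8_t  cfsr_bfsr(uint32_t cfsr)  { return (uint8_t)((cfsr >> 8) & 0xFFu); }  /* bits 15-8  */
uint16_t cfsr_ufsr(uint32_t cfsr)  { return (uint16_t)(cfsr >> 16); }          /* bits 31-16 */

/* HFSR flags. */
#define HFSR_VECTTBL   (1u << 1)   /* vector table read fault */
#define HFSR_FORCED    (1u << 30)  /* escalated from a configurable fault */
#define HFSR_DEBUGEVT  (1u << 31)  /* debug event occurred */

/* True when the HardFault was escalated, so the real cause is in CFSR. */
bool hfsr_is_escalated(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}
```

When hfsr_is_escalated() returns true, the handler should decode CFSR next, because HFSR alone does not say which fault was escalated.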

Types of Faults

Fault Hierarchy:
┌─────────────────────────────────────────────────────────────────┐
│                        HardFault                                │
│  • Highest priority fault handler                               │
│  • Catches faults that can't be handled by other handlers       │
│  • Also catches faults when other handlers are disabled         │
├─────────────────────────────────────────────────────────────────┤
│                        MemManage Fault                          │
│  • MPU violations                                               │
│  • Execute from non-executable region                           │
│  • Access to restricted region                                  │
├─────────────────────────────────────────────────────────────────┤
│                        BusFault                                 │
│  • Bus error during memory access                               │
│  • Access to invalid peripheral address                         │
│  • Can be precise (address known) or imprecise                  │
├─────────────────────────────────────────────────────────────────┤
│                        UsageFault                               │
│  • Undefined instruction                                        │
│  • Invalid state transition                                     │
│  • Unaligned access (if configured)                             │
│  • Division by zero (if configured)                             │
└─────────────────────────────────────────────────────────────────┘
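
MemManage, BusFault, and UsageFault only reach their own handlers once they are enabled in the System Handler Control and State Register (SHCSR, at 0xE000ED24); otherwise they escalate to HardFault. The sketch below computes the new register value as a pure function so it can be checked off-target. The function name is hypothetical, and the on-target write in the comment assumes a direct register access rather than a vendor header.

```c
#include <stdint.h>

#define SHCSR_MEMFAULTENA  (1u << 16)  /* enable MemManage handler  */
#define SHCSR_BUSFAULTENA  (1u << 17)  /* enable BusFault handler   */
#define SHCSR_USGFAULTENA  (1u << 18)  /* enable UsageFault handler */

/* Hypothetical helper: SHCSR value with all three configurable faults enabled. */
uint32_t shcsr_with_faults_enabled(uint32_t shcsr)
{
    return shcsr | SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}

/* On target (assumed usage):
 *   volatile uint32_t *shcsr = (volatile uint32_t *)0xE000ED24;
 *   *shcsr = shcsr_with_faults_enabled(*shcsr);
 */
```

Leaving the faults disabled is also a valid design: everything then arrives in HardFault with HFSR.FORCED set, and a single handler decodes CFSR.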

2.2 Why This Matters

This is the skill that separates embedded experts from beginners. When a production device crashes in the field, you need to diagnose the problem from minimal information. A good fault handler provides:

Immediate diagnosis: Know exactly what went wrong
Crash forensics: Understand the call chain that led to the crash
Field debugging: Devices in the field can report crash information
Root cause analysis: Fix the actual bug, not just the symptom

Industry usage:

Automotive systems use fault handlers to log crashes before entering safe mode
Medical devices record fault information for regulatory compliance
IoT devices report fault data to cloud monitoring systems
Operating systems use fault handlers for memory protection and process isolation

2.3 Historical Context

The exception model in Cortex-M processors evolved from earlier ARM architectures but was significantly simplified for embedded use. Key innovations:

Automatic context saving: The processor saves registers automatically
Tail-chaining: Efficient handling of back-to-back exceptions
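Late arrival: A higher-priority exception that arrives while the processor is still stacking for a lower-priority one is serviced first, reusing the context already saved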
Configurable fault handlers: Separate handlers for different fault types

The Cortex-M exception model was designed to enable deterministic, low-latency interrupt handling suitable for real-time systems, while also providing robust fault detection and debugging capabilities.

2.4 Common Misconceptions

Misconception 1: “HardFault means a hardware problem”

Reality: HardFault is almost always caused by software errors such as invalid memory accesses or undefined instructions, not by failing hardware. It is the fault of last resort: any fault that no other enabled handler can take escalates to it.

Misconception 2: “The PC in the exception frame points to the faulting instruction”

Reality: The stacked PC points to the instruction that was being executed or was about to execute. For precise faults, this is the faulting instruction. For imprecise faults, the CPU may have advanced past it.

Misconception 3: “BFAR/MMFAR always contain valid addresses”

Reality: These registers are only valid when the corresponding VALID bit is set in CFSR.

Misconception 4: “You can always recover from a fault”

Reality: Some faults corrupt state irreparably. The safest response is often to log and reset.

Misconception 5: “EXC_RETURN values are regular addresses”

Reality: The special values 0xFFFFFFF* are EXC_RETURN codes that control exception return behavior, not actual memory addresses.

3. Project Specification

3.1 What You Will Build

A comprehensive fault handling system that:

Catches all ARM Cortex-M exception types
Decodes fault status registers into human-readable output
Prints the stacked register values
Generates a call stack trace
Provides recovery options (reset or continue)
Works on STM32 or similar Cortex-M boards

3.2 Functional Requirements

Exception handlers: Implement HardFault, MemManage, BusFault, and UsageFault handlers
Register dump: Print all stacked registers (R0-R3, R12, LR, PC, xPSR)
Fault decoding: Parse and explain each bit in CFSR, HFSR
Address reporting: Show MMFAR and BFAR when valid
Stack trace: Unwind the stack to show call chain
Trigger commands: Provide test functions that cause each fault type
Recovery: Implement safe reset or recovery mechanisms

3.3 Non-Functional Requirements

Reliability: Handler must not itself fault
Minimalism: Handler should not depend on potentially corrupted state
Speed: Fault information captured quickly before further corruption
Portability: Work on any Cortex-M3/M4/M7 device
Memory safety: Use minimal stack space in handler

3.4 Example Usage / Output

=== ARM Fault Handler Test Suite ===

Available fault triggers:
  1. Trigger HardFault (invalid address)
  2. Trigger BusFault (invalid peripheral)
  3. Trigger UsageFault (undefined instruction)
  4. Trigger MemManage (MPU violation)
  5. Trigger UsageFault (divide by zero)
  6. Trigger unaligned access

Select test (1-6): 2

Triggering BusFault...

!!! HARD FAULT DETECTED !!!

Fault Type: BusFault (Precise)
Fault Address: 0xE0100000 (invalid peripheral access)
Faulting Instruction: 0x08001234 (LDR R0, [R1])

Stacked Registers:
  R0  = 0x00000042    R1  = 0xE0100000
  R2  = 0x00000000    R3  = 0x20001234
  R12 = 0x08004567    LR  = 0x0800089B
  PC  = 0x08001234    xPSR= 0x61000000

Fault Status:
  CFSR  = 0x00008200  [PRECISERR, BFARVALID]
  HFSR  = 0x40000000  [FORCED]
  BFAR  = 0xE0100000  (Valid)

CFSR Breakdown:
  MemManage: (none)
  BusFault:  PRECISERR - Precise data bus error
  UsageFault: (none)

Stack Trace:
  #0 0x08001234 in read_sensor() at sensors.c:47
  #1 0x0800089A in main() at main.c:123
  #2 0x080000A2 in Reset_Handler() at startup.s:45

Call Stack Memory:
  0x20003FF0: 0x08001234  <- Fault PC
  0x20003FEC: 0x0800089A  <- Called from
  0x20003FE8: 0x080000A2  <- Called from

Action: System reset in 5 seconds...

3.5 Real World Outcome

What success looks like:

Working fault handlers: All fault types are caught and reported
Clear diagnostics: Output clearly identifies the fault cause and location
Stack unwinding: Call trace shows how execution reached the fault
Recovery mechanism: System can safely reset or recover
Deep understanding: You can explain the ARM exception model to others

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                    Fault Handler System                          │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                   Vector Table                           │    │
│  │  ┌─────────┬─────────┬─────────┬─────────┬─────────┐    │    │
│  │  │ Reset   │ NMI     │ HardFlt │ MemMgmt │ BusFlt  │... │    │
│  │  └────┬────┴────┬────┴────┬────┴────┬────┴────┬────┘    │    │
│  └───────│─────────│─────────│─────────│─────────│─────────┘    │
│          │         │         │         │         │               │
│          ▼         ▼         ▼         ▼         ▼               │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │            Assembly Wrappers (Naked Functions)           │    │
│  │                                                          │    │
│  │  • Determine which stack was in use (MSP/PSP)           │    │
│  │  • Get the exception frame pointer                       │    │
│  │  • Call C handler with frame pointer                     │    │
│  └─────────────────────────────┬───────────────────────────┘    │
│                                │                                 │
│                                ▼                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │               C Handler Implementation                   │    │
│  │                                                          │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Read Fault Regs  │ CFSR, HFSR, MMFAR, BFAR           │    │
│  │  └────────┬─────────┘                                   │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Print Registers  │ Stacked R0-R3, R12, LR, PC, xPSR  │    │
│  │  └────────┬─────────┘                                   │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Decode Faults    │ Parse CFSR bits, identify cause   │    │
│  │  └────────┬─────────┘                                   │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Stack Unwind     │ Follow frame pointers              │    │
│  │  └────────┬─────────┘                                   │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Recovery Action  │ Reset, infinite loop, or return   │    │
│  │  └──────────────────┘                                   │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component	Purpose	Key Functions
Vector Table	Routes exceptions to handlers	Assembly definitions
Assembly Wrapper	Determines stack, extracts frame	HardFault_Handler (naked)
C Handler	Main fault analysis logic	HardFault_Handler_C()
Fault Decoder	Parses status registers	decode_cfsr(), decode_hfsr()
Stack Unwinder	Generates call trace	unwind_stack()
Output System	Prints diagnostics	Uses UART from Project 4
Recovery Module	Handles post-fault actions	system_reset(), hang()

4.3 Data Structures

// Exception frame pushed automatically by hardware
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;      // Return address; the faulting instruction for precise faults
    uint32_t xpsr;
} exception_frame_t;

// Extended frame when FPU context is active
typedef struct {
    exception_frame_t basic;
    uint32_t s0_s15[16];   // S0-S15 FPU registers
    uint32_t fpscr;        // FPU Status and Control
    uint32_t reserved;
} exception_frame_fpu_t;

// Decoded fault information
typedef struct {
    uint32_t cfsr;         // Configurable Fault Status Register
    uint32_t hfsr;         // HardFault Status Register
    uint32_t mmfar;        // MemManage Fault Address
    uint32_t bfar;         // BusFault Address
    uint32_t afsr;         // Auxiliary Fault Status Register
    bool mmfar_valid;
    bool bfar_valid;
} fault_info_t;

// Stack trace entry
typedef struct {
    uint32_t pc;           // Program counter
    uint32_t lr;           // Link register (return address)
    const char *func_name; // Function name (if available)
} stack_frame_t;
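
Because the frame structs are overlaid on memory written by hardware, their layout has to match the stacking order exactly. One way to guard against an accidental reordering is a set of compile-time checks. This sketch repeats the two frame definitions from above so that it compiles on its own; the checks themselves are an addition, not part of the original design.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

typedef struct {
    exception_frame_t basic;
    uint32_t s0_s15[16];
    uint32_t fpscr;
    uint32_t reserved;
} exception_frame_fpu_t;

/* The build fails if the layout drifts from the hardware frame. */
_Static_assert(sizeof(exception_frame_t) == 32, "basic frame is 8 words");
_Static_assert(offsetof(exception_frame_t, pc) == 24, "PC sits at SP + 24");
_Static_assert(offsetof(exception_frame_t, xpsr) == 28, "xPSR sits at SP + 28");
_Static_assert(sizeof(exception_frame_fpu_t) == 104, "extended frame is 26 words");
```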

4.4 Algorithm Overview

FUNCTION handle_fault(exception_frame):
    // Phase 1: Capture fault state immediately
    fault_info = read_fault_registers()

    // Phase 2: Output basic information
    print("!!! FAULT DETECTED !!!")
    print_exception_frame(exception_frame)

    // Phase 3: Decode and explain the fault
    IF fault_info.cfsr & MEMMANAGE_BITS:
        decode_memmanage_fault(fault_info)
    IF fault_info.cfsr & BUSFAULT_BITS:
        decode_bus_fault(fault_info)
    IF fault_info.cfsr & USAGEFAULT_BITS:
        decode_usage_fault(fault_info)

    // Phase 4: Generate stack trace
    unwind_stack(exception_frame)

    // Phase 5: Clear fault status bits (write-1-to-clear)
    CFSR = fault_info.cfsr
    HFSR = fault_info.hfsr

    // Phase 6: Recovery action
    IF recoverable(fault_info):
        attempt_recovery(exception_frame)
    ELSE:
        print("System reset in 5 seconds...")
        delay(5000)
        NVIC_SystemReset()
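
One possible form of attempt_recovery is to resume after the faulting instruction by advancing the stacked PC. Thumb instructions are 2 or 4 bytes long, and the top five bits of the first halfword say which. The sketch below assumes a precise fault (so the stacked PC really is the faulting instruction) and that skipping the instruction is acceptable for the application. Both function names are hypothetical.

```c
#include <stdint.h>

/* Size in bytes of the Thumb instruction whose first halfword is given.
 * Halfwords starting with 0b11101, 0b11110, or 0b11111 begin a 32-bit
 * instruction; everything else is a 16-bit instruction. */
unsigned thumb_insn_size(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)first_halfword >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* New value to store into the stacked PC so that exception return
 * resumes at the instruction after the faulting one. */
uint32_t pc_after_faulting_insn(uint32_t stacked_pc, uint16_t first_halfword)
{
    return stacked_pc + thumb_insn_size(first_halfword);
}
```

Skipping is only safe when the skipped instruction's result is not needed afterwards, and it does not account for instructions inside an IT block, so treating it as a test-bench convenience rather than a production recovery path is the more cautious choice.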

5. Implementation Guide

5.1 Development Environment Setup

# Verify toolchain
$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Arm Embedded Toolchain) 12.2.1

# Verify debugger
$ openocd --version
Open On-Chip Debugger 0.12.0

# Create project directory
$ mkdir -p ~/projects/arm-fault-handler
$ cd ~/projects/arm-fault-handler

# Create initial file structure
$ touch main.c fault_handler.c fault_handler.h startup.s linker.ld Makefile

# Hardware: STM32 Nucleo board connected via ST-Link

5.2 Project Structure

arm-fault-handler/
├── main.c              # Test application with fault triggers
├── fault_handler.c     # Fault handler implementation
├── fault_handler.h     # Fault handler interface
├── fault_decode.c      # CFSR/HFSR decoding functions
├── stack_unwind.c      # Stack unwinding implementation
├── uart.c              # UART driver (from Project 4)
├── uart.h
├── startup.s           # Vector table and reset handler
├── linker.ld           # Linker script
├── stm32f4xx.h         # Register definitions
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How does an ARM Cortex-M processor report errors, and how can I extract maximum debugging information from a crash?”

Before coding, understand: When your program does something illegal (invalid memory access, undefined instruction, etc.), the processor doesn’t just crash randomly. It follows a precise protocol: save state, set status bits, and jump to your handler. Your job is to read what the processor tells you.

5.4 Concepts You Must Understand First

Stop and research these before coding:

EXC_RETURN Values
- What does LR contain during an exception? (Hint: 0xFFFFFFF*)
- How do you determine which stack (MSP/PSP) was in use?
- Book Reference: “The Definitive Guide to ARM Cortex-M3” Chapter 8 - Yiu
CFSR Register Structure
- How is CFSR divided into three sub-registers?
- Which bits indicate address validity?
- Book Reference: “The Definitive Guide to ARM Cortex-M3” Chapter 12 - Yiu
Precise vs Imprecise Faults
- What makes a bus fault imprecise?
- Why can’t you always determine the faulting address?
- Book Reference: ARM Cortex-M Technical Reference Manual
Naked Functions
- Why do assembly wrappers need __attribute__((naked))?
- What does the compiler add to normal functions that would break exception handlers?

5.5 Questions to Guide Your Design

Before implementing, think through these:

Stack Selection
- How do you know if the faulted code was using MSP or PSP?
- What if the stack pointer itself is corrupted?
Output Method
- Can you safely use printf() in a fault handler?
- What if the fault was caused by UART code?
Recursion Protection
- What if your fault handler itself faults?
- How do you detect and handle nested faults?
Memory Safety
- What if the stacked PC points to invalid memory?
- How do you safely read potentially corrupted memory?

5.6 Thinking Exercise

Trace Through an Exception

Before coding, trace what happens when code executes *(volatile int *)0xE0100000; (invalid address):

LDR instruction issues a data read from 0xE0100000
Bus returns error (no peripheral at that address)
Processor detects BusFault condition
CFSR.BFSR.PRECISERR bit set to 1
BFAR = 0xE0100000
HFSR.FORCED = 1 (escalated to HardFault because the BusFault handler is disabled unless you enable it in SHCSR)
Processor pushes R0-R3, R12, LR, PC, xPSR to current stack
LR = 0xFFFFFFF9 (return to Thread mode, MSP)
PC loaded from vector table offset 0x0C (HardFault)
Your HardFault_Handler starts executing

Questions while tracing:
- Why did it escalate to HardFault?
- What does 0xFFFFFFF9 mean vs 0xFFFFFFFD?
- How do you get back to the stacked PC value?

5.7 Hints in Layers

Hint 1: Getting the Exception Frame

The first challenge is getting a pointer to the exception frame. The processor pushed it onto either MSP or PSP. You check bit 2 of LR (EXC_RETURN) to determine which stack was in use.
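
The same EXC_RETURN value carries two more useful bits besides the stack selector. A small decoder, written here as a sketch with hypothetical type and function names, makes the encoding explicit:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool uses_psp;     /* bit 2: 1 = frame is on PSP, 0 = frame is on MSP      */
    bool thread_mode;  /* bit 3: 1 = return to Thread mode, 0 = Handler mode   */
    bool fpu_frame;    /* bit 4: 0 = extended (FPU) frame was stacked          */
} exc_return_info_t;

/* Decode the EXC_RETURN value found in LR on handler entry. */
exc_return_info_t exc_return_decode(uint32_t exc_return)
{
    exc_return_info_t info;
    info.uses_psp    = (exc_return & (1u << 2)) != 0u;
    info.thread_mode = (exc_return & (1u << 3)) != 0u;
    info.fpu_frame   = (exc_return & (1u << 4)) == 0u;
    return info;
}
```

For 0xFFFFFFF9 this yields Thread mode, MSP, basic frame; for 0xFFFFFFFD it yields Thread mode, PSP, basic frame.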

Hint 2: Assembly Wrapper Pattern

// The wrapper must be naked to avoid compiler-generated prologue
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile (
        "TST LR, #4      \n"  // Test bit 2 of EXC_RETURN
        "ITE EQ          \n"  // If-Then-Else
        "MRSEQ R0, MSP   \n"  // If 0, exception used MSP
        "MRSNE R0, PSP   \n"  // If 1, exception used PSP
        "B HardFault_Handler_C\n"  // Jump to C handler
    );
}

Hint 3: Reading Fault Registers

#define SCB_CFSR    (*(volatile uint32_t *)0xE000ED28)
#define SCB_HFSR    (*(volatile uint32_t *)0xE000ED2C)
#define SCB_MMFAR   (*(volatile uint32_t *)0xE000ED34)
#define SCB_BFAR    (*(volatile uint32_t *)0xE000ED38)

void read_fault_info(fault_info_t *info) {
    info->cfsr = SCB_CFSR;
    info->hfsr = SCB_HFSR;
    info->mmfar = SCB_MMFAR;
    info->bfar = SCB_BFAR;
    info->mmfar_valid = (info->cfsr & (1 << 7)) != 0;  // MMARVALID
    info->bfar_valid = (info->cfsr & (1 << 15)) != 0;  // BFARVALID
}

Hint 4: CFSR Bit Decoding

// CFSR bit definitions
#define CFSR_IACCVIOL   (1 << 0)   // Instruction access violation
#define CFSR_DACCVIOL   (1 << 1)   // Data access violation
#define CFSR_MUNSTKERR  (1 << 3)   // MemManage fault on unstacking
#define CFSR_MSTKERR    (1 << 4)   // MemManage fault on stacking
#define CFSR_MMARVALID  (1 << 7)   // MMFAR valid

#define CFSR_IBUSERR    (1 << 8)   // Bus fault on instruction fetch
#define CFSR_PRECISERR  (1 << 9)   // Precise data bus error
#define CFSR_IMPRECISERR (1 << 10) // Imprecise data bus error
#define CFSR_UNSTKERR   (1 << 11)  // Bus fault on unstacking
#define CFSR_STKERR     (1 << 12)  // Bus fault on stacking
#define CFSR_BFARVALID  (1 << 15)  // BFAR valid

#define CFSR_UNDEFINSTR (1 << 16)  // Undefined instruction
#define CFSR_INVSTATE   (1 << 17)  // Invalid state (Thumb bit)
#define CFSR_INVPC      (1 << 18)  // Invalid PC load
#define CFSR_NOCP       (1 << 19)  // No coprocessor
#define CFSR_UNALIGNED  (1 << 24)  // Unaligned access
#define CFSR_DIVBYZERO  (1 << 25)  // Divide by zero
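
A table-driven decoder keeps the bit-to-name mapping in one place and avoids a long chain of if statements. The sketch below covers only the bits listed above. The names cfsr_flag_t and cfsr_for_each_flag are hypothetical, and the emit callback stands in for whatever output routine the handler uses (for example, a UART string writer).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

static const cfsr_flag_t cfsr_flags[] = {
    { 1u << 0,  "IACCVIOL"    }, { 1u << 1,  "DACCVIOL"  },
    { 1u << 3,  "MUNSTKERR"   }, { 1u << 4,  "MSTKERR"   },
    { 1u << 7,  "MMARVALID"   },
    { 1u << 8,  "IBUSERR"     }, { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR"  },
    { 1u << 12, "STKERR"      }, { 1u << 15, "BFARVALID" },
    { 1u << 16, "UNDEFINSTR"  }, { 1u << 17, "INVSTATE"  },
    { 1u << 18, "INVPC"       }, { 1u << 19, "NOCP"      },
    { 1u << 24, "UNALIGNED"   }, { 1u << 25, "DIVBYZERO" },
};

/* Calls emit(name) for every known flag set in cfsr (emit may be NULL)
 * and returns how many known flags were set. */
size_t cfsr_for_each_flag(uint32_t cfsr, void (*emit)(const char *name))
{
    size_t count = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if ((cfsr & cfsr_flags[i].mask) != 0u) {
            if (emit != NULL) {
                emit(cfsr_flags[i].name);
            }
            count++;
        }
    }
    return count;
}
```

A CFSR of 0x00008200, for instance, reports PRECISERR and BFARVALID.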

5.8 The Interview Questions They’ll Ask

Prepare to answer these:

“Explain the ARM Cortex-M exception model”
- Vector table at address 0x0 after reset, relocatable through VTOR
- Hardware automatically saves context
- Nested exceptions supported via NVIC priority
- EXC_RETURN controls exception return behavior
“What’s the difference between MSP and PSP?”
- MSP: Main Stack Pointer, used by handlers and privileged code
- PSP: Process Stack Pointer, used by threads/tasks
- Allows separation of kernel and user stacks
- CONTROL register selects which is active in Thread mode
“How do you handle a fault in the fault handler?”
- Nested fault escalates to HardFault
- HardFault in HardFault handler causes lockup
- Use minimal code, avoid memory allocation
- Consider watchdog timer as last resort
“What does EXC_RETURN value 0xFFFFFFF9 mean?”
- Return to Thread mode
- Use MSP for stack
- No floating point context
- (Compare to 0xFFFFFFFD: use PSP)
“How would you debug a crash that only happens in production?”
- Implement fault handler that logs to persistent storage
- Store registers, stack trace, and fault registers
- Transmit crash data on next boot
- Consider using hardware watchdog for hangs
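
For the production case, the crash data has to survive a reset and be distinguishable from random RAM contents on the next boot. A minimal sketch of such a record follows, assuming it lives in a RAM region that startup code does not clear (for example, a no-init linker section, which is not shown here). The struct, the magic value, and the function names are all hypothetical, and the XOR checksum is chosen for brevity; a CRC would be more robust.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRASH_MAGIC 0xDEADFA17u  /* arbitrary marker (assumption) */

typedef struct {
    uint32_t magic;
    uint32_t pc;
    uint32_t lr;
    uint32_t cfsr;
    uint32_t hfsr;
    uint32_t fault_addr;
    uint32_t checksum;
} crash_record_t;

/* XOR of every field except the checksum itself. */
static uint32_t crash_checksum(const crash_record_t *r)
{
    return r->magic ^ r->pc ^ r->lr ^ r->cfsr ^ r->hfsr ^ r->fault_addr;
}

/* Called from the fault handler just before reset. */
void crash_record_fill(crash_record_t *r, uint32_t pc, uint32_t lr,
                       uint32_t cfsr, uint32_t hfsr, uint32_t fault_addr)
{
    r->magic      = CRASH_MAGIC;
    r->pc         = pc;
    r->lr         = lr;
    r->cfsr       = cfsr;
    r->hfsr       = hfsr;
    r->fault_addr = fault_addr;
    r->checksum   = crash_checksum(r);
}

/* Called early at boot to decide whether a crash report is pending. */
bool crash_record_is_valid(const crash_record_t *r)
{
    return r->magic == CRASH_MAGIC && r->checksum == crash_checksum(r);
}
```

After reporting a valid record, boot code would normally clear the magic field so the same crash is not reported twice.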

5.9 Books That Will Help

Topic	Book	Chapter
Exception model	“The Definitive Guide to ARM Cortex-M3 and Cortex-M4” by Yiu	Ch. 8
Fault handling	“The Definitive Guide to ARM Cortex-M3 and Cortex-M4” by Yiu	Ch. 12
Stack unwinding	“Computer Systems: A Programmer’s Perspective” by Bryant	Ch. 3
Debug features	ARM Cortex-M Technical Reference Manual	Ch. 11
Practical debugging	“Making Embedded Systems” by White	Ch. 10

5.10 Implementation Phases

Phase 1: Basic HardFault Handler (3-4 hours)

Implement assembly wrapper to get stack pointer
Write C handler that prints stacked PC
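Verify it catches a simple fault (for example, a read from an unmapped address such as 0xE0100000; a NULL pointer read may not fault, because many parts map flash or boot memory at address 0)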

Phase 2: Full Register Dump (2-3 hours)

Print all stacked registers
Add CFSR, HFSR values
Print MMFAR/BFAR when valid

Phase 3: Fault Decoding (3-4 hours)

Implement CFSR bit parsing
Create human-readable fault descriptions
Add HFSR decoding

Phase 4: Stack Unwinding (4-6 hours)

Implement frame pointer following
Generate call trace
Handle edge cases (stack corruption)

Phase 5: Test Suite & Polish (3-4 hours)

Create functions to trigger each fault type
Test all code paths
Add recovery/reset mechanism

5.11 Key Implementation Decisions

Decision	Trade-offs
Separate handlers vs unified	Separate: cleaner routing. Unified: simpler, all escalate to HardFault
Printf vs direct UART	Printf: convenient. Direct UART: safer, no malloc
Symbol table for stack trace	With: function names. Without: smaller binary, still useful
Recovery vs reset	Recovery: complex, risk of corruption. Reset: safe, lose state
FPU context handling	Handle: complete. Ignore: simpler, only for non-FPU code

6. Testing Strategy

6.1 Fault Trigger Functions

// Trigger HardFault via invalid address read
void trigger_hardfault_invalid_addr(void) {
    volatile int *p = (int *)0xFFFFFFFF;
    int x = *p;  // Bus fault, escalates to HardFault
    (void)x;
}

// Trigger BusFault via invalid peripheral access
void trigger_busfault_peripheral(void) {
    volatile int *p = (int *)0xE0100000;  // Invalid peripheral
    int x = *p;
    (void)x;
}

// Trigger UsageFault via undefined instruction
void trigger_usagefault_undef(void) {
    // Execute undefined instruction
    __asm volatile (".word 0xFFFFFFFF");
}

// Trigger UsageFault via divide by zero
void trigger_usagefault_divzero(void) {
    // Enable div-by-zero trap first
    SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk;

    volatile int a = 10, b = 0;
    int c = a / b;
    (void)c;
}

// Trigger unaligned access fault
void trigger_unaligned(void) {
    // Enable unaligned trap first
    SCB->CCR |= SCB_CCR_UNALIGN_TRP_Msk;

    uint8_t buffer[8] = {0};
    volatile uint32_t *p = (uint32_t *)(buffer + 1);
    *p = 0x12345678;
}

6.2 Integration Tests

Test	Trigger	Expected Result
Invalid address	Read from 0xFFFFFFFF	HardFault, shows PRECISERR or IMPRECISERR
Invalid peripheral	Read from 0xE0100000	BusFault, BFAR = 0xE0100000
Undefined instruction	Execute 0xFFFFFFFF	UsageFault, UNDEFINSTR
Divide by zero	a / 0	UsageFault, DIVBYZERO
Unaligned access	Unaligned word write	UsageFault, UNALIGNED
Stack overflow	Deep recursion	HardFault, corrupted stack

6.3 Verification Checklist

[ ] HardFault handler is called for each test
[ ] Stacked PC points to or near faulting instruction
[ ] CFSR correctly identifies fault type
[ ] BFAR/MMFAR contain correct address when valid
[ ] Stack trace shows correct call chain
[ ] System resets cleanly after fault
[ ] Handler doesn't crash on corrupted stack
[ ] Output is readable via UART

7. Common Pitfalls & Debugging

Problem 1: “Handler never gets called”

Why: Vector table not correctly placed or handler symbol missing
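Fix: Check that the linker script places the vector table where the core looks for it (address 0x0, or wherever VTOR points; on STM32, flash at 0x08000000 is aliased to 0x0 when booting from flash), and verify the handler is not optimized out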
Debug: Use debugger to check vector table contents

Problem 2: “Wrong stack pointer used”

Why: EXC_RETURN bit test incorrect
Fix: Ensure TST LR, #4 and ITE EQ/MRSEQ/MRSNE are correct
Debug: Print LR value, manually verify which stack should be used

Problem 3: “BFAR/MMFAR always shows 0”

Why: Reading after clearing CFSR, or valid bit not set
Fix: Read fault addresses before clearing CFSR, check VALID bits
Debug: Print raw CFSR value, check valid bits

Problem 4: “Handler faults (nested fault)”

Why: Using corrupted stack, calling unsafe functions
Fix: Minimize handler code, don’t use malloc/printf
Debug: Use dedicated fault handler stack, simplify handler
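
Avoiding printf means the handler needs its own way to format register values. A fixed-width hex formatter is enough for a register dump and uses no heap, no locale, and only a few bytes of stack. The function name is hypothetical; the caller supplies an 11-byte buffer and passes the result to the UART string routine.

```c
#include <stdint.h>

/* Writes "0x" followed by 8 uppercase hex digits and a terminating NUL
 * into out (11 bytes) and returns out. */
char *u32_to_hex(uint32_t value, char out[11])
{
    static const char digits[] = "0123456789ABCDEF";

    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 8; i++) {
        /* Most significant nibble first. */
        out[2 + i] = digits[(value >> (28 - 4 * i)) & 0xFu];
    }
    out[10] = '\0';
    return out;
}
```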

Problem 5: “Stack trace is garbage”

Why: Stack corrupted, or frame pointer not used
Fix: Compile with -fno-omit-frame-pointer, validate addresses
Debug: Print raw stack memory, manually trace
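
When frame pointers are unavailable, a common fallback is to scan the stack for words that look like return addresses. This is a heuristic, not frame-pointer unwinding: it can report stale values left over from earlier calls, and it can mistake stored function pointers for return addresses. The sketch assumes the caller supplies the bounds of the code region (for example, from linker symbols); the function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scans word_count stack words and collects those that look like Thumb
 * return addresses: bit 0 set and the address inside [code_start, code_end).
 * Stores the addresses with the Thumb bit cleared; returns how many were found. */
size_t scan_return_addresses(const uint32_t *stack, size_t word_count,
                             uint32_t code_start, uint32_t code_end,
                             uint32_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = 0; i < word_count && found < max_out; i++) {
        uint32_t word = stack[i];
        uint32_t addr = word & ~1u;

        if ((word & 1u) != 0u && addr >= code_start && addr < code_end) {
            out[found++] = addr;
        }
    }
    return found;
}
```

Before calling it on a real stack, the handler should also confirm that the scanned range lies inside RAM, so that the scan itself cannot trigger a second fault.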

Problem 6: “Imprecise fault shows wrong PC”

Why: Imprecise faults don’t stall the pipeline
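Fix: Look at the instructions just before the stacked PC; BFAR is not valid for imprecise faults (BFARVALID stays clear)
Debug: Disable write buffering with SCnSCB->ACTLR |= 2; (DISDEFWBUF, on Cortex-M3/M4) so the fault becomes precise, then reproduce it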

8. Extensions & Challenges

8.1 Easy Extensions

Extension	Description	Learning
Fault counter	Count each fault type	Persistent diagnostics
Last fault storage	Store in backup RAM	Survives reset
LED indication	Blink pattern for fault type	Visual debugging
Watchdog integration	Reset if handler hangs	System reliability

8.2 Advanced Challenges

Challenge	Description	Learning
Full symbol lookup	Map PC to function names	ELF parsing, debug info
Crash log to flash	Persist fault info	Flash programming
Remote reporting	Send faults via network	Error reporting systems
Fault injection	Controlled fault testing	Testing methodologies
MPU configuration	Set up memory protection	Hardware security
RTOS integration	Handle task faults	Task isolation

8.3 Research Topics

How do commercial RTOSes (FreeRTOS, Zephyr) handle faults?
What is ARM TrustZone and how does it affect exception handling?
How do debuggers like GDB implement breakpoints on ARM?
What is lockup state and how do you recover from it?

9. Real-World Connections

9.1 Production Systems Using This

System	How It Uses Fault Handling	Notable Feature
Automotive ECUs	Log faults, enter limp mode	Safety-critical recovery
Medical devices	Record faults for FDA compliance	Audit trail
FreeRTOS	Stack overflow hook; MPU ports restrict task memory access	Task isolation
Zephyr RTOS	Comprehensive fault framework	Detailed diagnostics
ARM Mbed OS	Error handling hooks	Crash reporting

9.2 How the Pros Do It

FreeRTOS:

Optional MPU ports that run tasks with restricted memory access
Stack overflow detection
Heap corruption detection

Zephyr RTOS:

Detailed fault dump with all registers
Memory domain protection with MPU
Fault log to flash storage

Production devices:

Store crash data in backup RAM or flash
Transmit crash reports on next boot
Implement watchdog as last-resort recovery

10. Self-Assessment Checklist

Before considering this project complete, verify:

Next Steps

After completing this project, you’ll be well-prepared for:

Project 14: GDB Stub - Use debug features for remote debugging
Project 15: Tiny OS - Implement process isolation with fault handling
Production debugging - Apply these skills to real embedded systems

The fault handling expertise you’ve gained is essential for any professional embedded developer. When production devices crash, you’ll be the one who can diagnose the problem.