Project 9: Exception Handler & Fault Analyzer

Build a comprehensive fault handler that catches all ARM exceptions (HardFault, MemManage, BusFault, UsageFault), decodes the fault cause, prints a detailed stack trace, and optionally recovers or reboots.

Quick Reference

Attribute       | Value
Difficulty      | Level 4 - Expert
Time Estimate   | 1-2 weeks
Language        | C with ARM Assembly (primary), Rust (alternative)
Prerequisites   | Projects 3-4 (bare-metal, UART), strong understanding of stack and calling conventions
Key Topics      | Exception Model, Fault Registers, Stack Unwinding, CFSR, HFSR, MMFAR, BFAR

1. Learning Objectives

After completing this project, you will:

  • Understand the ARM Cortex-M exception model and vector table
  • Master the structure of stacked exception frames
  • Decode all fault status registers (CFSR, HFSR, MMFAR, BFAR)
  • Determine the exact faulting instruction from the stacked PC
  • Implement stack unwinding to generate call traces
  • Understand the difference between precise and imprecise bus faults
  • Know how to safely recover from or reset after faults
  • Handle both MSP and PSP stack pointers in exception handlers
  • Debug embedded systems crashes like a professional

2. Theoretical Foundation

2.1 Core Concepts

The ARM Exception Model

ARM Cortex-M processors have a sophisticated exception handling system. When a fault occurs, the processor automatically:

  1. Pushes an exception frame onto the current stack (MSP or PSP)
  2. Switches to Handler mode and loads LR with an EXC_RETURN value
  3. Loads the handler address from the vector table
  4. Begins executing the exception handler

Exception Flow:
                                    ┌──────────────────┐
Normal execution ───────────────────►│ Fault occurs     │
                                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │ Save CPU state   │
                                    │ (R0-R3, R12, LR, │
                                    │  PC, xPSR)       │
                                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │ Load handler     │
                                    │ from vector table│
                                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │ Execute handler  │
                                    │ (your code!)     │
                                    └────────┬─────────┘
                                             │
                                    ┌────────▼─────────┐
                                    │ Return or reset  │
                                    └──────────────────┘

The Exception Frame

When an exception occurs, the processor automatically pushes 8 registers onto the current stack:

Stack Layout After Exception (grows downward):
┌───────────────────────────────────────────┐  Higher addresses
│               xPSR                        │  ← SP + 28
├───────────────────────────────────────────┤
│               PC (Return Address)         │  ← SP + 24
├───────────────────────────────────────────┤
│               LR (Link Register)          │  ← SP + 20
├───────────────────────────────────────────┤
│               R12                         │  ← SP + 16
├───────────────────────────────────────────┤
│               R3                          │  ← SP + 12
├───────────────────────────────────────────┤
│               R2                          │  ← SP + 8
├───────────────────────────────────────────┤
│               R1                          │  ← SP + 4
├───────────────────────────────────────────┤
│               R0                          │  ← SP (stack pointer)
└───────────────────────────────────────────┘  Lower addresses

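Note: If FPU context was active (EXC_RETURN bit 4 = 0), the processor pushes an extended frame with 18 additional words (S0-S15, FPSCR, and a reserved word), 26 words in total. The hardware may also insert one padding word to keep the stack 8-byte aligned, which it records in bit 9 of the stacked xPSR.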

The stacked PC contains the address of the instruction that was executing (or about to execute) when the fault occurred. This is crucial for debugging.
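A minimal sketch of overlaying the frame in C, assuming `sp` is the MSP or PSP value captured on handler entry (the assembly wrapper in section 5.7 obtains it). The `_Static_assert` checks tie each field to the "SP + n" offsets in the diagram above; `stacked_pc()` is a hypothetical helper, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic frame pushed by hardware on exception entry (lowest address first). */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code, not EXC_RETURN */
    uint32_t pc;    /* return address: the faulting instruction for precise faults */
    uint32_t xpsr;
} exception_frame_t;

/* Offsets must match the "SP + n" labels in the stack diagram. */
_Static_assert(offsetof(exception_frame_t, r12)  == 16, "R12 at SP+16");
_Static_assert(offsetof(exception_frame_t, lr)   == 20, "LR at SP+20");
_Static_assert(offsetof(exception_frame_t, pc)   == 24, "PC at SP+24");
_Static_assert(offsetof(exception_frame_t, xpsr) == 28, "xPSR at SP+28");
_Static_assert(sizeof(exception_frame_t) == 32, "basic frame is 8 words");

/* Hypothetical helper: read the stacked PC given the stack pointer
 * that was active when the exception was taken. */
static inline uint32_t stacked_pc(const uint32_t *sp) {
    const exception_frame_t *frame = (const exception_frame_t *)sp;
    return frame->pc;
}
```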

Fault Status Registers

ARM Cortex-M provides detailed fault information through several System Control Block (SCB) registers:

Fault Register Map:
┌─────────────────────────────────────────────────────────────────┐
│                    CFSR (0xE000ED28) - 32 bits                   │
│  ┌──────────────────┬──────────────────┬──────────────────┐     │
│  │ UsageFault (16)  │ BusFault (8)     │ MemManage (8)    │     │
│  │ Bits 31-16       │ Bits 15-8        │ Bits 7-0         │     │
│  └──────────────────┴──────────────────┴──────────────────┘     │
├─────────────────────────────────────────────────────────────────┤
│                    HFSR (0xE000ED2C) - HardFault Status          │
│  • VECTTBL: Vector table read fault                             │
│  • FORCED: Escalated from another fault                         │
│  • DEBUGEVT: Debug event occurred                               │
├─────────────────────────────────────────────────────────────────┤
│                    MMFAR (0xE000ED34) - MemManage Fault Address  │
│  • Contains the address that caused MemManage fault             │
│  • Only valid if MMARVALID bit is set in CFSR                   │
├─────────────────────────────────────────────────────────────────┤
│                    BFAR (0xE000ED38) - BusFault Address          │
│  • Contains the address that caused BusFault                    │
│  • Only valid if BFARVALID bit is set in CFSR                   │
└─────────────────────────────────────────────────────────────────┘
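Because CFSR packs three sub-registers into one word, a decoder typically splits it first. The sketch below does this with shifts and masks; the `cfsr_parts_t` type and `split_cfsr()` name are hypothetical. The sub-registers can also be read directly with byte or halfword accesses at 0xE000ED28 (MMFSR), 0xE000ED29 (BFSR), and 0xE000ED2A (UFSR).

```c
#include <stdint.h>

/* CFSR sub-registers (ARMv7-M): MMFSR = bits 7-0, BFSR = bits 15-8,
 * UFSR = bits 31-16. */
typedef struct {
    uint8_t  mmfsr;
    uint8_t  bfsr;
    uint16_t ufsr;
} cfsr_parts_t;

static cfsr_parts_t split_cfsr(uint32_t cfsr) {
    cfsr_parts_t p;
    p.mmfsr = (uint8_t)(cfsr & 0xFFu);          /* MemManage status */
    p.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);   /* BusFault status */
    p.ufsr  = (uint16_t)(cfsr >> 16);           /* UsageFault status */
    return p;
}
```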

Types of Faults

Fault Hierarchy:
┌─────────────────────────────────────────────────────────────────┐
│                        HardFault                                │
│  • Highest priority fault handler                               │
│  • Catches faults that can't be handled by other handlers       │
│  • Also catches faults when other handlers are disabled         │
├─────────────────────────────────────────────────────────────────┤
│                        MemManage Fault                          │
│  • MPU violations                                               │
│  • Execute from non-executable region                           │
│  • Access to restricted region                                  │
├─────────────────────────────────────────────────────────────────┤
│                        BusFault                                 │
│  • Bus error during memory access                               │
│  • Access to invalid peripheral address                         │
│  • Can be precise (address known) or imprecise                  │
├─────────────────────────────────────────────────────────────────┤
│                        UsageFault                               │
│  • Undefined instruction                                        │
│  • Invalid state transition                                     │
│  • Unaligned access (if configured)                             │
│  • Division by zero (if configured)                             │
└─────────────────────────────────────────────────────────────────┘
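After reset only HardFault is enabled, so MemManage, BusFault, and UsageFault all escalate to HardFault with HFSR.FORCED set. A sketch of enabling the separate handlers through SHCSR (0xE000ED24, bits 16-18) follows. The helper names are hypothetical, and the register write is guarded by `__arm__` so the bit logic can also be checked on a host.

```c
#include <stdint.h>

#define SHCSR_MEMFAULTENA (1u << 16)   /* enable MemManage handler */
#define SHCSR_BUSFAULTENA (1u << 17)   /* enable BusFault handler */
#define SHCSR_USGFAULTENA (1u << 18)   /* enable UsageFault handler */

/* Pure helper: returns the SHCSR value with all three handlers enabled. */
static inline uint32_t shcsr_with_fault_handlers(uint32_t shcsr) {
    return shcsr | SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}

#if defined(__arm__)
#define SCB_SHCSR (*(volatile uint32_t *)0xE000ED24u)

/* Hypothetical init function: call once early in main(). */
void enable_fault_handlers(void) {
    SCB_SHCSR = shcsr_with_fault_handlers(SCB_SHCSR);
    __asm volatile ("dsb\n\tisb" ::: "memory");  /* make the change take effect */
}
#endif
```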

2.2 Why This Matters

This is the skill that separates embedded experts from beginners. When a production device crashes in the field, you need to diagnose the problem from minimal information. A good fault handler provides:

  • Immediate diagnosis: Know exactly what went wrong
  • Crash forensics: Understand the call chain that led to the crash
  • Field debugging: Devices in the field can report crash information
  • Root cause analysis: Fix the actual bug, not just the symptom

Industry usage:

  • Automotive systems use fault handlers to log crashes before entering safe mode
  • Medical devices record fault information for regulatory compliance
  • IoT devices report fault data to cloud monitoring systems
  • Operating systems use fault handlers for memory protection and process isolation

2.3 Historical Context

The exception model in Cortex-M processors evolved from earlier ARM architectures but was significantly simplified for embedded use. Key innovations:

  • Automatic context saving: The processor saves registers automatically
  • Tail-chaining: Efficient handling of back-to-back exceptions
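  • Late arrival: If a higher-priority exception arrives while the processor is still stacking for a lower-priority one, the processor services the higher-priority exception first, reusing the frame already being saved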
  • Configurable fault handlers: Separate handlers for different fault types

The Cortex-M exception model was designed to enable deterministic, low-latency interrupt handling suitable for real-time systems, while also providing robust fault detection and debugging capabilities.

2.4 Common Misconceptions

Misconception 1: “HardFault means a hardware problem”

  • Reality: HardFault is raised by the processor, most often because of software bugs such as invalid memory accesses or undefined instructions, or because a configurable fault escalated. It is the catch-all fault of last resort.

Misconception 2: “The PC in the exception frame points to the faulting instruction”

  • Reality: The stacked PC points to the instruction that was being executed or was about to execute. For precise faults, this is the faulting instruction. For imprecise faults, the CPU may have advanced past it.

Misconception 3: “BFAR/MMFAR always contain valid addresses”

  • Reality: MMFAR is valid only when CFSR.MMARVALID is set, and BFAR only when CFSR.BFARVALID is set. Imprecise bus faults never set BFARVALID.

Misconception 4: “You can always recover from a fault”

  • Reality: Some faults corrupt state irreparably. The safest response is often to log and reset.

Misconception 5: “EXC_RETURN values are regular addresses”

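  • Reality: The special values loaded into LR on exception entry (0xFFFFFFF1, 0xFFFFFFF9, 0xFFFFFFFD, and with an FPU frame 0xFFFFFFE1, 0xFFFFFFE9, 0xFFFFFFED) are EXC_RETURN codes that control exception return behavior, not actual memory addresses.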
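A small decoder for the EXC_RETURN bits, assuming ARMv7-M semantics (Cortex-M3/M4/M7): bit 2 selects the stack, bit 3 the return mode, and bit 4 whether an extended FPU frame was pushed. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool thread_mode;   /* bit 3: 1 = return to Thread mode, 0 = Handler mode */
    bool uses_psp;      /* bit 2: 1 = frame is on PSP, 0 = frame is on MSP */
    bool fpu_frame;     /* bit 4: 0 = extended frame including FP state */
} exc_return_info_t;

/* ARMv7-M EXC_RETURN values all have bits 31-5 set. */
static bool is_exc_return(uint32_t lr) {
    return (lr & 0xFFFFFFE0u) == 0xFFFFFFE0u;
}

static exc_return_info_t decode_exc_return(uint32_t lr) {
    exc_return_info_t info;
    info.thread_mode = (lr & (1u << 3)) != 0;
    info.uses_psp    = (lr & (1u << 2)) != 0;
    info.fpu_frame   = (lr & (1u << 4)) == 0;
    return info;
}
```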

3. Project Specification

3.1 What You Will Build

A comprehensive fault handling system that:

  • Catches all ARM Cortex-M exception types
  • Decodes fault status registers into human-readable output
  • Prints the stacked register values
  • Generates a call stack trace
  • Provides recovery options (reset or continue)
  • Works on STM32 or similar Cortex-M boards

3.2 Functional Requirements

  1. Exception handlers: Implement HardFault, MemManage, BusFault, and UsageFault handlers
  2. Register dump: Print all stacked registers (R0-R3, R12, LR, PC, xPSR)
  3. Fault decoding: Parse and explain each bit in CFSR, HFSR
  4. Address reporting: Show MMFAR and BFAR when valid
  5. Stack trace: Unwind the stack to show call chain
  6. Trigger commands: Provide test functions that cause each fault type
  7. Recovery: Implement safe reset or recovery mechanisms

3.3 Non-Functional Requirements

  1. Reliability: Handler must not itself fault
  2. Minimalism: Handler should not depend on potentially corrupted state
  3. Speed: Fault information captured quickly before further corruption
  4. Portability: Work on any Cortex-M3/M4/M7 device
  5. Memory safety: Use minimal stack space in handler
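Requirements 2 and 5 argue against calling printf() from the handler. A minimal sketch of a fixed-width hex formatter that needs no heap and no libc formatting; `u32_to_hex()` is a hypothetical name, and its output would go to a UART routine such as the one from Project 4.

```c
#include <stdint.h>

/* Writes "0x" plus 8 uppercase hex digits and a NUL into out[11].
 * No printf, no heap, no locale: suitable for use inside a fault handler. */
static void u32_to_hex(uint32_t value, char out[11]) {
    static const char digits[] = "0123456789ABCDEF";
    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 8; i++) {
        /* most significant nibble first */
        out[2 + i] = digits[(value >> (28 - 4 * i)) & 0xFu];
    }
    out[10] = '\0';
}
```

The fixed 11-byte buffer can live on the handler's stack, which keeps stack usage small and predictable.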

3.4 Example Usage / Output

=== ARM Fault Handler Test Suite ===

Available fault triggers:
  1. Trigger HardFault (invalid address)
  2. Trigger BusFault (invalid peripheral)
  3. Trigger UsageFault (undefined instruction)
  4. Trigger MemManage (MPU violation)
  5. Trigger UsageFault (divide by zero)
  6. Trigger unaligned access

Select test (1-6): 2

Triggering BusFault...

!!! HARD FAULT DETECTED !!!

Fault Type: BusFault (Precise)
Fault Address: 0xE0100000 (invalid peripheral access)
Faulting Instruction: 0x08001234 (LDR R0, [R1])

Stacked Registers:
  R0  = 0x00000042    R1  = 0xE0100000
  R2  = 0x00000000    R3  = 0x20001234
  R12 = 0x08004567    LR  = 0x0800089B
  PC  = 0x08001234    xPSR= 0x61000000

Fault Status:
  CFSR  = 0x00008200  [PRECISERR BFARVALID]
  HFSR  = 0x40000000  [FORCED]
  BFAR  = 0xE0100000  (Valid)

CFSR Breakdown:
  MemManage: (none)
  BusFault:  PRECISERR - Precise data bus error; BFARVALID - BFAR holds the fault address
  UsageFault: (none)

Stack Trace:
  #0 0x08001234 in read_sensor() at sensors.c:47
  #1 0x0800089A in main() at main.c:123
  #2 0x080000A2 in Reset_Handler() at startup.s:45

Call Stack Memory:
  0x20003FE8: 0x08001234  <- Fault PC
  0x20003FEC: 0x0800089B  <- Called from
  0x20003FF0: 0x080000A3  <- Called from

Action: System reset in 5 seconds...

3.5 Real World Outcome

What success looks like:

  1. Working fault handlers: All fault types are caught and reported
  2. Clear diagnostics: Output clearly identifies the fault cause and location
  3. Stack unwinding: Call trace shows how execution reached the fault
  4. Recovery mechanism: System can safely reset or recover
  5. Deep understanding: You can explain the ARM exception model to others

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                    Fault Handler System                          │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                   Vector Table                           │    │
│  │  ┌─────────┬─────────┬─────────┬─────────┬─────────┐    │    │
│  │  │ Reset   │ NMI     │ HardFlt │ MemMgmt │ BusFlt  │... │    │
│  │  └────┬────┴────┬────┴────┬────┴────┬────┴────┬────┘    │    │
│  └───────│─────────│─────────│─────────│─────────│─────────┘    │
│          │         │         │         │         │               │
│          ▼         ▼         ▼         ▼         ▼               │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │            Assembly Wrappers (Naked Functions)           │    │
│  │                                                          │    │
│  │  • Determine which stack was in use (MSP/PSP)           │    │
│  │  • Get the exception frame pointer                       │    │
│  │  • Call C handler with frame pointer                     │    │
│  └─────────────────────────────┬───────────────────────────┘    │
│                                │                                 │
│                                ▼                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │               C Handler Implementation                   │    │
│  │                                                          │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Read Fault Regs  │ CFSR, HFSR, MMFAR, BFAR           │    │
│  │  └────────┬─────────┘                                   │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Print Registers  │ Stacked R0-R3, R12, LR, PC, xPSR  │    │
│  │  └────────┬─────────┘                                   │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Decode Faults    │ Parse CFSR bits, identify cause   │    │
│  │  └────────┬─────────┘                                   │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Stack Unwind     │ Follow frame pointers              │    │
│  │  └────────┬─────────┘                                   │    │
│  │           │                                              │    │
│  │           ▼                                              │    │
│  │  ┌──────────────────┐                                   │    │
│  │  │ Recovery Action  │ Reset, infinite loop, or return   │    │
│  │  └──────────────────┘                                   │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component        | Purpose                          | Key Functions
Vector Table     | Routes exceptions to handlers    | Assembly definitions
Assembly Wrapper | Determines stack, extracts frame | HardFault_Handler (naked)
C Handler        | Main fault analysis logic        | HardFault_Handler_C()
Fault Decoder    | Parses status registers          | decode_cfsr(), decode_hfsr()
Stack Unwinder   | Generates call trace             | unwind_stack()
Output System    | Prints diagnostics               | Uses UART from Project 4
Recovery Module  | Handles post-fault actions       | system_reset(), hang()

4.3 Data Structures

#include <stdbool.h>
#include <stdint.h>

// Exception frame pushed automatically by hardware
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;      // LR of the interrupted code (not EXC_RETURN)
    uint32_t pc;      // Return address; the faulting instruction for precise faults
    uint32_t xpsr;
} exception_frame_t;

// Extended frame when FPU context is active (EXC_RETURN bit 4 == 0)
typedef struct {
    exception_frame_t basic;
    uint32_t s0_s15[16];   // S0-S15 FPU registers (with lazy stacking, only
                           // written once the handler itself uses the FPU)
    uint32_t fpscr;        // FPU Status and Control
    uint32_t reserved;     // Pads the 26-word frame to a multiple of 8 bytes
} exception_frame_fpu_t;

// Decoded fault information
typedef struct {
    uint32_t cfsr;         // Configurable Fault Status Register
    uint32_t hfsr;         // HardFault Status Register
    uint32_t mmfar;        // MemManage Fault Address
    uint32_t bfar;         // BusFault Address
    uint32_t afsr;         // Auxiliary Fault Status Register
    bool mmfar_valid;
    bool bfar_valid;
} fault_info_t;

// Stack trace entry
typedef struct {
    uint32_t pc;           // Program counter
    uint32_t lr;           // Link register (return address)
    const char *func_name; // Function name (if available)
} stack_frame_t;
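To print the caller's stack or start a stack scan, the handler needs the SP value from before the exception. A sketch under ARMv7-M assumptions: the frame is 8 words (26 with FPU state, signalled by EXC_RETURN bit 4 = 0), plus one padding word when bit 9 of the stacked xPSR is set. The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BASIC_FRAME_BYTES    (8u * 4u)   /* R0-R3, R12, LR, PC, xPSR */
#define EXTENDED_FRAME_BYTES (26u * 4u)  /* basic + S0-S15 + FPSCR + reserved */
#define XPSR_STACK_ALIGN     (1u << 9)   /* set if a padding word was inserted */

/* Reconstructs the interrupted code's SP. frame_addr is the MSP/PSP value
 * at handler entry; exc_return is the handler's LR. */
static uint32_t sp_before_exception(uint32_t frame_addr, uint32_t stacked_xpsr,
                                    uint32_t exc_return) {
    bool extended = (exc_return & (1u << 4)) == 0;
    uint32_t sp = frame_addr + (extended ? EXTENDED_FRAME_BYTES : BASIC_FRAME_BYTES);
    if (stacked_xpsr & XPSR_STACK_ALIGN) {
        sp += 4u;   /* skip the alignment padding word */
    }
    return sp;
}
```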

4.4 Algorithm Overview

FUNCTION handle_fault(exception_frame):
    // Phase 1: Capture fault state immediately
    fault_info = read_fault_registers()

    // Phase 2: Output basic information
    print("!!! FAULT DETECTED !!!")
    print_exception_frame(exception_frame)

    // Phase 3: Decode and explain the fault
    IF fault_info.cfsr & MEMMANAGE_BITS:
        decode_memmanage_fault(fault_info)
    IF fault_info.cfsr & BUSFAULT_BITS:
        decode_bus_fault(fault_info)
    IF fault_info.cfsr & USAGEFAULT_BITS:
        decode_usage_fault(fault_info)

    // Phase 4: Generate stack trace
    unwind_stack(exception_frame)

    // Phase 5: Clear fault status bits (write-1-to-clear)
    CFSR = fault_info.cfsr
    HFSR = fault_info.hfsr

    // Phase 6: Recovery action
    IF recoverable(fault_info):
        attempt_recovery(exception_frame)
    ELSE:
        print("System reset in 5 seconds...")
        delay(5000)
        NVIC_SystemReset()
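Phases 1 and 5 are the steps most often done in the wrong order. The C sketch below reads every fault register into a snapshot first and only then writes CFSR and HFSR back to clear them. The register block follows the SCB addresses (CFSR at 0xE000ED28 through AFSR at 0xE000ED3C); passing the block as a pointer is a design choice made here so the function can be exercised against a RAM copy off-target. `capture_and_clear()` and `scb_fault_regs_t` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCB fault registers are contiguous from 0xE000ED28 on ARMv7-M. */
typedef struct {
    uint32_t cfsr;   /* 0xE000ED28 */
    uint32_t hfsr;   /* 0xE000ED2C */
    uint32_t dfsr;   /* 0xE000ED30 */
    uint32_t mmfar;  /* 0xE000ED34 */
    uint32_t bfar;   /* 0xE000ED38 */
    uint32_t afsr;   /* 0xE000ED3C */
} scb_fault_regs_t;

#define SCB_FAULT_REGS ((volatile scb_fault_regs_t *)0xE000ED28u)

typedef struct {
    uint32_t cfsr, hfsr, mmfar, bfar, afsr;
    bool mmfar_valid, bfar_valid;
} fault_info_t;

/* Read everything first, then clear the sticky status bits by writing
 * the captured 1s back (write-1-to-clear). */
static void capture_and_clear(volatile scb_fault_regs_t *regs, fault_info_t *info) {
    info->cfsr  = regs->cfsr;
    info->hfsr  = regs->hfsr;
    info->mmfar = regs->mmfar;   /* must be read before CFSR is cleared */
    info->bfar  = regs->bfar;
    info->afsr  = regs->afsr;
    info->mmfar_valid = (info->cfsr & (1u << 7)) != 0;   /* MMARVALID */
    info->bfar_valid  = (info->cfsr & (1u << 15)) != 0;  /* BFARVALID */

    regs->cfsr = info->cfsr;     /* also clears MMARVALID and BFARVALID */
    regs->hfsr = info->hfsr;
}
```

On target the call would be `capture_and_clear(SCB_FAULT_REGS, &info)`; everything after that, including printing, works from the snapshot.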

5. Implementation Guide

5.1 Development Environment Setup

# Verify toolchain
$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Arm Embedded Toolchain) 12.2.1

# Verify debugger
$ openocd --version
Open On-Chip Debugger 0.12.0

# Create project directory
$ mkdir -p ~/projects/arm-fault-handler
$ cd ~/projects/arm-fault-handler

# Create initial file structure
$ touch main.c fault_handler.c fault_handler.h fault_decode.c stack_unwind.c startup.s linker.ld Makefile

# Hardware: STM32 Nucleo board connected via ST-Link

5.2 Project Structure

arm-fault-handler/
├── main.c              # Test application with fault triggers
├── fault_handler.c     # Fault handler implementation
├── fault_handler.h     # Fault handler interface
├── fault_decode.c      # CFSR/HFSR decoding functions
├── stack_unwind.c      # Stack unwinding implementation
├── uart.c              # UART driver (from Project 4)
├── uart.h
├── startup.s           # Vector table and reset handler
├── linker.ld           # Linker script
├── stm32f4xx.h         # Register definitions
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How does an ARM Cortex-M processor report errors, and how can I extract maximum debugging information from a crash?”

Before coding, understand: When your program does something illegal (invalid memory access, undefined instruction, etc.), the processor doesn’t just crash randomly. It follows a precise protocol: save state, set status bits, and jump to your handler. Your job is to read what the processor tells you.

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. EXC_RETURN Values
    • What does LR contain during an exception? (Hint: an EXC_RETURN value such as 0xFFFFFFF9, or 0xFFFFFFE9 when an FPU frame was pushed)
    • How do you determine which stack (MSP/PSP) was in use?
    • Book Reference: “The Definitive Guide to ARM Cortex-M3” Chapter 8 - Yiu
  2. CFSR Register Structure
    • How is CFSR divided into three sub-registers?
    • Which bits indicate address validity?
    • Book Reference: “The Definitive Guide to ARM Cortex-M3” Chapter 12 - Yiu
  3. Precise vs Imprecise Faults
    • What makes a bus fault imprecise?
    • Why can’t you always determine the faulting address?
    • Book Reference: ARM Cortex-M Technical Reference Manual
  4. Naked Functions
    • Why do assembly wrappers need __attribute__((naked))?
    • What does the compiler add to normal functions that would break exception handlers?

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Stack Selection
    • How do you know if the faulted code was using MSP or PSP?
    • What if the stack pointer itself is corrupted?
  2. Output Method
    • Can you safely use printf() in a fault handler?
    • What if the fault was caused by UART code?
  3. Recursion Protection
    • What if your fault handler itself faults?
    • How do you detect and handle nested faults?
  4. Memory Safety
    • What if the stacked PC points to invalid memory?
    • How do you safely read potentially corrupted memory?
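One way to answer the recursion question is a depth counter checked at the top of the shared C handler. A fault inside the HardFault handler itself locks up the core instead of re-entering, so this guard only helps when the MemManage, BusFault, and UsageFault handlers funnel into the same C code and a second fault escalates to HardFault. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical re-entry guard for a shared C fault handler. */
static volatile uint32_t fault_depth = 0;

/* Returns true on the first entry, false if a fault is already being handled. */
static bool fault_handler_enter(void) {
    return fault_depth++ == 0u;
}
```

In use, the handler would call `fault_handler_enter()` first and, on a false return, skip all output and go straight to a reset.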

5.6 Thinking Exercise

Trace Through an Exception

Before coding, trace what happens when code executes *(volatile int *)0xE0100000; (invalid address):

1. LDR instruction issues a data read to 0xE0100000
2. Bus returns error (no peripheral at that address)
3. Processor detects BusFault condition
4. CFSR.BFSR.PRECISERR bit set to 1
5. BFAR = 0xE0100000
6. HFSR.FORCED = 1 (escalated because the BusFault handler is not enabled; SHCSR.BUSFAULTENA is 0 after reset)
7. Processor pushes R0-R3, R12, LR, PC, xPSR to current stack
8. LR = 0xFFFFFFF9 (return to Thread mode, MSP)
9. PC loaded from vector table offset 0x0C (HardFault)
10. Your HardFault_Handler starts executing

Questions while tracing:
- Why did it escalate to HardFault?
- What does 0xFFFFFFF9 mean vs 0xFFFFFFFD?
- How do you get back to the stacked PC value?
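The escalation question can be answered from HFSR alone: FORCED means a configurable fault escalated, so the real cause is in CFSR. A minimal classifier, with hypothetical type and function names:

```c
#include <stdint.h>

#define HFSR_VECTTBL  (1u << 1)   /* bus fault on vector table read */
#define HFSR_FORCED   (1u << 30)  /* configurable fault escalated to HardFault */
#define HFSR_DEBUGEVT (1u << 31)  /* debug event escalated to HardFault */

typedef enum {
    HARDFAULT_UNKNOWN,
    HARDFAULT_VECTOR_TABLE,
    HARDFAULT_ESCALATED,   /* decode CFSR for the real cause */
    HARDFAULT_DEBUG_EVENT
} hardfault_cause_t;

static hardfault_cause_t classify_hardfault(uint32_t hfsr) {
    if (hfsr & HFSR_VECTTBL)  return HARDFAULT_VECTOR_TABLE;
    if (hfsr & HFSR_FORCED)   return HARDFAULT_ESCALATED;
    if (hfsr & HFSR_DEBUGEVT) return HARDFAULT_DEBUG_EVENT;
    return HARDFAULT_UNKNOWN;
}
```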

5.7 Hints in Layers

Hint 1: Getting the Exception Frame

The first challenge is getting a pointer to the exception frame. The processor pushed it onto either MSP or PSP. You check bit 2 of LR (EXC_RETURN) to determine which stack was in use.

Hint 2: Assembly Wrapper Pattern

// The wrapper must be naked to avoid compiler-generated prologue
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile (
        "TST LR, #4      \n"  // Test bit 2 of EXC_RETURN
        "ITE EQ          \n"  // If-Then-Else
        "MRSEQ R0, MSP   \n"  // If 0, exception used MSP
        "MRSNE R0, PSP   \n"  // If 1, exception used PSP
        "B HardFault_Handler_C\n"  // Jump to C handler
    );
}

Hint 3: Reading Fault Registers

#include <stdbool.h>
#include <stdint.h>
// fault_info_t as defined in section 4.3

#define SCB_CFSR    (*(volatile uint32_t *)0xE000ED28)
#define SCB_HFSR    (*(volatile uint32_t *)0xE000ED2C)
#define SCB_MMFAR   (*(volatile uint32_t *)0xE000ED34)
#define SCB_BFAR    (*(volatile uint32_t *)0xE000ED38)
#define SCB_AFSR    (*(volatile uint32_t *)0xE000ED3C)

void read_fault_info(fault_info_t *info) {
    // Read the address registers before anything clears CFSR
    info->cfsr = SCB_CFSR;
    info->hfsr = SCB_HFSR;
    info->mmfar = SCB_MMFAR;
    info->bfar = SCB_BFAR;
    info->afsr = SCB_AFSR;
    info->mmfar_valid = (info->cfsr & (1u << 7)) != 0;   // MMARVALID
    info->bfar_valid = (info->cfsr & (1u << 15)) != 0;   // BFARVALID
}

Hint 4: CFSR Bit Decoding

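// CFSR bit definitions: MemManage (bits 7-0)
#define CFSR_IACCVIOL   (1u << 0)   // Instruction access violation
#define CFSR_DACCVIOL   (1u << 1)   // Data access violation
#define CFSR_MUNSTKERR  (1u << 3)   // MemManage fault on unstacking
#define CFSR_MSTKERR    (1u << 4)   // MemManage fault on stacking
#define CFSR_MLSPERR    (1u << 5)   // MemManage fault during lazy FP state save (FPU parts)
#define CFSR_MMARVALID  (1u << 7)   // MMFAR valid

// BusFault (bits 15-8)
#define CFSR_IBUSERR    (1u << 8)   // Bus fault on instruction fetch
#define CFSR_PRECISERR  (1u << 9)   // Precise data bus error
#define CFSR_IMPRECISERR (1u << 10) // Imprecise data bus error
#define CFSR_UNSTKERR   (1u << 11)  // Bus fault on unstacking
#define CFSR_STKERR     (1u << 12)  // Bus fault on stacking
#define CFSR_LSPERR     (1u << 13)  // Bus fault during lazy FP state save (FPU parts)
#define CFSR_BFARVALID  (1u << 15)  // BFAR valid

// UsageFault (bits 31-16)
#define CFSR_UNDEFINSTR (1u << 16)  // Undefined instruction
#define CFSR_INVSTATE   (1u << 17)  // Invalid state (Thumb bit)
#define CFSR_INVPC      (1u << 18)  // Invalid PC load
#define CFSR_NOCP       (1u << 19)  // No coprocessor
#define CFSR_UNALIGNED  (1u << 24)  // Unaligned access
#define CFSR_DIVBYZERO  (1u << 25)  // Divide by zero

// HFSR bit definitions
#define HFSR_VECTTBL    (1u << 1)   // Bus fault on vector table read
#define HFSR_FORCED     (1u << 30)  // Configurable fault escalated to HardFault
#define HFSR_DEBUGEVT   (1u << 31)  // Debug event escalated to HardFault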
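With masks like these, a table-driven decoder keeps each explanation string next to the bit it describes. In this sketch `emit` is a caller-supplied output hook (for example a thin wrapper around the Project 4 UART code); the table and function names are hypothetical, and the FPU lazy-stacking bits are omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t mask;
    const char *name;
    const char *meaning;
} cfsr_bit_t;

static const cfsr_bit_t cfsr_bits[] = {
    {1u << 0,  "IACCVIOL",    "instruction access violation"},
    {1u << 1,  "DACCVIOL",    "data access violation"},
    {1u << 3,  "MUNSTKERR",   "MemManage fault on unstacking"},
    {1u << 4,  "MSTKERR",     "MemManage fault on stacking"},
    {1u << 7,  "MMARVALID",   "MMFAR holds the fault address"},
    {1u << 8,  "IBUSERR",     "bus fault on instruction fetch"},
    {1u << 9,  "PRECISERR",   "precise data bus error"},
    {1u << 10, "IMPRECISERR", "imprecise data bus error"},
    {1u << 11, "UNSTKERR",    "bus fault on unstacking"},
    {1u << 12, "STKERR",      "bus fault on stacking"},
    {1u << 15, "BFARVALID",   "BFAR holds the fault address"},
    {1u << 16, "UNDEFINSTR",  "undefined instruction"},
    {1u << 17, "INVSTATE",    "invalid state (Thumb bit)"},
    {1u << 18, "INVPC",       "invalid PC load"},
    {1u << 19, "NOCP",        "no coprocessor"},
    {1u << 24, "UNALIGNED",   "unaligned access"},
    {1u << 25, "DIVBYZERO",   "divide by zero"},
};

/* Calls emit() once per set bit, lowest bit first, and returns the number
 * of bits decoded. emit may be NULL to only count. */
static size_t decode_cfsr(uint32_t cfsr,
                          void (*emit)(const char *name, const char *meaning)) {
    size_t count = 0;
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            if (emit) {
                emit(cfsr_bits[i].name, cfsr_bits[i].meaning);
            }
            count++;
        }
    }
    return count;
}
```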

5.8 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the ARM Cortex-M exception model”
    • Vector table at address 0x0 after reset, relocatable through VTOR on Cortex-M3/M4/M7
    • Hardware automatically saves context
    • Nested exceptions supported via NVIC priority
    • EXC_RETURN controls exception return behavior
  2. “What’s the difference between MSP and PSP?”
    • MSP: Main Stack Pointer, always used in Handler mode and by Thread mode after reset
    • PSP: Process Stack Pointer, used by threads/tasks
    • Allows separation of kernel and user stacks
    • CONTROL register selects which is active in Thread mode
  3. “How do you handle a fault in the fault handler?”
    • Nested fault escalates to HardFault
    • HardFault in HardFault handler causes lockup
    • Use minimal code, avoid memory allocation
    • Consider watchdog timer as last resort
  4. “What does EXC_RETURN value 0xFFFFFFF9 mean?”
    • Return to Thread mode
    • Use MSP for stack
    • No floating point context
    • (Compare to 0xFFFFFFFD: use PSP)
  5. “How would you debug a crash that only happens in production?”
    • Implement fault handler that logs to persistent storage
    • Store registers, stack trace, and fault registers
    • Transmit crash data on next boot
    • Consider using hardware watchdog for hangs

5.9 Books That Will Help

Topic              | Book                                                         | Chapter
Exception model    | “The Definitive Guide to ARM Cortex-M3 and Cortex-M4” by Yiu | Ch. 8
Fault handling     | “The Definitive Guide to ARM Cortex-M3 and Cortex-M4” by Yiu | Ch. 12
Stack unwinding    | “Computer Systems: A Programmer’s Perspective” by Bryant     | Ch. 3
Debug features     | ARM Cortex-M Technical Reference Manual                      | Ch. 11
Practical debugging| “Making Embedded Systems” by White                           | Ch. 10

5.10 Implementation Phases

Phase 1: Basic HardFault Handler (3-4 hours)

  • Implement assembly wrapper to get stack pointer
  • Write C handler that prints stacked PC
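  • Verify it catches a simple fault, such as a read from an invalid address (a null-pointer read usually does not fault on Cortex-M, because address 0x0 typically maps to the vector table)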

Phase 2: Full Register Dump (2-3 hours)

  • Print all stacked registers
  • Add CFSR, HFSR values
  • Print MMFAR/BFAR when valid

Phase 3: Fault Decoding (3-4 hours)

  • Implement CFSR bit parsing
  • Create human-readable fault descriptions
  • Add HFSR decoding

Phase 4: Stack Unwinding (4-6 hours)

  • Implement frame pointer following
  • Generate call trace
  • Handle edge cases (stack corruption)
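Following frame pointers requires a known frame layout. A simpler fallback that many handlers use is to scan the stack for words that look like Thumb return addresses. The sketch below assumes a code region of 0x08000000 to 0x08100000 (adjust for your part) and an upper bound such as the linker script's end-of-stack symbol; both are assumptions, as are the function names. Results are candidates only, since stale values left on the stack also match.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed flash range (1 MB at the STM32 flash base); adjust for your device. */
#define CODE_START 0x08000000u
#define CODE_END   0x08100000u

/* A Thumb return address has bit 0 set and points into code. */
static int looks_like_return_addr(uint32_t value) {
    return (value & 1u) && value >= CODE_START && value < CODE_END;
}

/* Scans stack words from sp up to stack_top (exclusive) and records values
 * that look like return addresses. Never reads outside [sp, stack_top). */
static size_t scan_stack(const uint32_t *sp, const uint32_t *stack_top,
                         uint32_t *out, size_t max_out) {
    size_t n = 0;
    for (const uint32_t *p = sp; p < stack_top && n < max_out; p++) {
        if (looks_like_return_addr(*p)) {
            out[n++] = *p & ~1u;   /* clear the Thumb bit for symbol lookup */
        }
    }
    return n;
}
```

Start the scan at the pre-exception SP, above the hardware-pushed frame, and print the stacked PC separately: it has bit 0 clear, so the scanner would skip it.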

Phase 5: Test Suite & Polish (3-4 hours)

  • Create functions to trigger each fault type
  • Test all code paths
  • Add recovery/reset mechanism

5.11 Key Implementation Decisions

Decision                     | Trade-offs
Separate handlers vs unified | Separate: cleaner routing. Unified: simpler, all escalate to HardFault
Printf vs direct UART        | Printf: convenient. Direct UART: safer, no malloc
Symbol table for stack trace | With: function names. Without: smaller binary, still useful
Recovery vs reset            | Recovery: complex, risk of corruption. Reset: safe, lose state
FPU context handling         | Handle: complete. Ignore: simpler, only for non-FPU code

6. Testing Strategy

6.1 Fault Trigger Functions

#include <stdint.h>
#include "stm32f4xx.h"  // CMSIS device header (assumed): provides SCB and SCB_CCR_*_Msk

// Trigger HardFault via invalid address read
void trigger_hardfault_invalid_addr(void) {
    volatile int *p = (volatile int *)0xFFFFFFFF;
    int x = *p;  // Bus fault, escalates to HardFault
    (void)x;
}

// Trigger BusFault via invalid peripheral access
void trigger_busfault_peripheral(void) {
    volatile int *p = (volatile int *)0xE0100000;  // Invalid peripheral
    int x = *p;
    (void)x;
}

// Trigger UsageFault via undefined instruction
void trigger_usagefault_undef(void) {
    // UDF #0 (Thumb encoding 0xDE00) is permanently undefined and raises
    // UNDEFINSTR. A 32-bit pattern such as 0xFFFFFFFF falls in the
    // coprocessor encoding space and may report NOCP instead.
    __asm volatile (".short 0xDE00");
}

// Trigger UsageFault via divide by zero
void trigger_usagefault_divzero(void) {
    // Enable div-by-zero trap first (without it, SDIV returns 0)
    SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk;

    volatile int a = 10, b = 0;
    int c = a / b;
    (void)c;
}

// Trigger unaligned access fault
void trigger_unaligned(void) {
    // Enable unaligned trap first
    SCB->CCR |= SCB_CCR_UNALIGN_TRP_Msk;

    uint8_t buffer[8] = {0};
    volatile uint32_t *p = (volatile uint32_t *)(buffer + 1);
    *p = 0x12345678;  // Unaligned word store raises UNALIGNED
}
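The test menu lists a MemManage test, but the trigger functions above do not cover it. A sketch of one, assuming an ARMv7-M MPU with at least 8 regions and privileged execution: region 7 is set to no-access over a 32-byte buffer, which is then read. The register addresses are the architectural MPU_CTRL/RNR/RBAR/RASR locations; the function, macro, and buffer names are hypothetical. Unless SHCSR.MEMFAULTENA is set, the fault escalates to HardFault.

```c
#include <stdint.h>

/* MPU_RASR fields (ARMv7-M PMSA). */
#define RASR_ENABLE      (1u << 0)
#define RASR_SIZE(n)     ((uint32_t)(n) << 1)   /* region = 2^(n+1) bytes */
#define RASR_AP_NONE     (0u << 24)             /* no access at any privilege */
#define RASR_XN          (1u << 28)             /* execute never */

/* RASR value for a no-access, execute-never region of 2^(size_field+1) bytes. */
static inline uint32_t mpu_rasr_no_access(uint32_t size_field) {
    return RASR_XN | RASR_AP_NONE | RASR_SIZE(size_field) | RASR_ENABLE;
}

#if defined(__arm__)
#define MPU_CTRL (*(volatile uint32_t *)0xE000ED94u)
#define MPU_RNR  (*(volatile uint32_t *)0xE000ED98u)
#define MPU_RBAR (*(volatile uint32_t *)0xE000ED9Cu)
#define MPU_RASR (*(volatile uint32_t *)0xE000EDA0u)
#define MPU_CTRL_ENABLE     (1u << 0)
#define MPU_CTRL_PRIVDEFENA (1u << 2)   /* default map for all other addresses */

/* 32-byte guard block; a region base must be aligned to the region size. */
static volatile uint32_t mpu_guard[8] __attribute__((aligned(32)));

void trigger_memmanage_mpu(void) {
    MPU_RNR  = 7u;                             /* select region 7 */
    MPU_RBAR = (uint32_t)&mpu_guard[0];        /* base address */
    MPU_RASR = mpu_rasr_no_access(4u);         /* 2^(4+1) = 32 bytes */
    MPU_CTRL = MPU_CTRL_PRIVDEFENA | MPU_CTRL_ENABLE;
    __asm volatile ("dsb\n\tisb" ::: "memory");
    (void)mpu_guard[0];  /* DACCVIOL; MMFAR = &mpu_guard[0] when MMARVALID */
}
#endif
```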

6.2 Integration Tests

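Test                  | Trigger                          | Expected Result
Invalid address       | Read from 0xFFFFFFFF             | HardFault, shows PRECISERR or IMPRECISERR
Invalid peripheral    | Read from 0xE0100000             | BusFault, BFAR = 0xE0100000
Undefined instruction | Execute an undefined instruction | UsageFault, UNDEFINSTR
Divide by zero        | a / 0                            | UsageFault, DIVBYZERO
Unaligned access      | Unaligned word access            | UsageFault, UNALIGNED
Stack overflow        | Deep recursion                   | HardFault, corrupted stack

The BusFault and UsageFault results assume those handlers are enabled in SHCSR. Otherwise each escalates to HardFault with HFSR.FORCED set, and CFSR shows the same bits.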

6.3 Verification Checklist

[ ] HardFault handler is called for each test
[ ] Stacked PC points to or near faulting instruction
[ ] CFSR correctly identifies fault type
[ ] BFAR/MMFAR contain correct address when valid
[ ] Stack trace shows correct call chain
[ ] System resets cleanly after fault
[ ] Handler doesn't crash on corrupted stack
[ ] Output is readable via UART

7. Common Pitfalls & Debugging

Problem 1: “Handler never gets called”

  • Why: Vector table not correctly placed or handler symbol missing
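  • Fix: Check that the linker script places the vector table at the start of boot memory (0x08000000 on STM32, aliased to 0x0 at reset) or that VTOR points to it, that the handler name matches the startup file exactly, and that the handler is not optimized out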
  • Debug: Use debugger to check vector table contents

Problem 2: “Wrong stack pointer used”

  • Why: EXC_RETURN bit test incorrect
  • Fix: Ensure TST LR, #4 and ITE EQ/MRSEQ/MRSNE are correct
  • Debug: Print LR value, manually verify which stack should be used

Problem 3: “BFAR/MMFAR always shows 0”

  • Why: Reading after clearing CFSR, or valid bit not set
  • Fix: Read fault addresses before clearing CFSR, check VALID bits
  • Debug: Print raw CFSR value, check valid bits

Problem 4: “Handler faults (nested fault)”

  • Why: Using corrupted stack, calling unsafe functions
  • Fix: Minimize handler code, don’t use malloc/printf
  • Debug: Use dedicated fault handler stack, simplify handler

Problem 5: “Stack trace is garbage”

  • Why: Stack corrupted, or frame pointer not used
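  • Fix: Compile with -fno-omit-frame-pointer and validate every address before dereferencing it. GCC's Thumb frame layout does not always form a walkable chain, so many handlers fall back to scanning the stack for return addresses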
  • Debug: Print raw stack memory, manually trace

Problem 6: “Imprecise fault shows wrong PC”

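  • Why: The faulting store sat in the write buffer, so the processor kept executing before the bus reported the error
  • Fix: Inspect the stores just before the stacked PC; BFAR is not valid for imprecise faults (BFARVALID stays 0)
  • Debug: Disable default write buffering with SCnSCB->ACTLR |= 2; (DISDEFWBUF on Cortex-M3/M4) to make such faults precise, at a performance cost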

8. Extensions & Challenges

8.1 Easy Extensions

Extension            | Description                  | Learning
Fault counter        | Count each fault type        | Persistent diagnostics
Last fault storage   | Store in backup RAM          | Survives reset
LED indication       | Blink pattern for fault type | Visual debugging
Watchdog integration | Reset if handler hangs       | System reliability
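For the last-fault-storage extension, a sketch of a record that can distinguish a real crash report from the random contents of RAM after power-on: a magic word plus a checksum. Placing the record in backup RAM or a `.noinit` section, so startup code does not zero it, depends on your linker script and is assumed here; the magic value, XOR checksum, and all names are arbitrary, hypothetical choices.

```c
#include <stdbool.h>
#include <stdint.h>

#define CRASH_MAGIC 0xDEADFA17u   /* hypothetical marker value */

typedef struct {
    uint32_t magic;
    uint32_t pc, lr, cfsr, hfsr, bfar, mmfar;
    uint32_t checksum;   /* XOR over all fields above */
} crash_record_t;

static uint32_t crash_checksum(const crash_record_t *r) {
    return r->magic ^ r->pc ^ r->lr ^ r->cfsr ^ r->hfsr ^ r->bfar ^ r->mmfar;
}

/* Called from the fault handler just before reset. */
static void crash_record_save(crash_record_t *r, uint32_t pc, uint32_t lr,
                              uint32_t cfsr, uint32_t hfsr,
                              uint32_t bfar, uint32_t mmfar) {
    r->magic = CRASH_MAGIC;
    r->pc = pc;
    r->lr = lr;
    r->cfsr = cfsr;
    r->hfsr = hfsr;
    r->bfar = bfar;
    r->mmfar = mmfar;
    r->checksum = crash_checksum(r);
}

/* Called at boot: true only if a record was saved and survived reset intact. */
static bool crash_record_valid(const crash_record_t *r) {
    return r->magic == CRASH_MAGIC && r->checksum == crash_checksum(r);
}
```

At boot, check `crash_record_valid()`, report the record, then clear `magic` so the same crash is not reported twice.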

8.2 Advanced Challenges

Challenge          | Description              | Learning
Full symbol lookup | Map PC to function names | ELF parsing, debug info
Crash log to flash | Persist fault info       | Flash programming
Remote reporting   | Send faults via network  | Error reporting systems
Fault injection    | Controlled fault testing | Testing methodologies
MPU configuration  | Set up memory protection | Hardware security
RTOS integration   | Handle task faults       | Task isolation

8.3 Research Topics

  • How do commercial RTOSes (FreeRTOS, Zephyr) handle faults?
  • What is ARM TrustZone and how does it affect exception handling?
  • How do debuggers like GDB implement breakpoints on ARM?
  • What is lockup state and how do you recover from it?

9. Real-World Connections

9.1 Production Systems Using This

System          | How It Uses Fault Handling                       | Notable Feature
Automotive ECUs | Log faults, enter limp mode                      | Safety-critical recovery
Medical devices | Record faults for FDA compliance                 | Audit trail
FreeRTOS        | Stack overflow and malloc-failed hooks, MPU port | Task isolation (FreeRTOS-MPU)
Zephyr RTOS     | Comprehensive fault framework                    | Detailed diagnostics
ARM Mbed OS     | Error handling hooks                             | Crash reporting

9.2 How the Pros Do It

FreeRTOS:

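  • Task isolation through the MPU port (FreeRTOS-MPU); the fault handlers themselves are supplied by the application
  • Stack overflow detection (configCHECK_FOR_STACK_OVERFLOW, vApplicationStackOverflowHook())
  • Heap corruption and allocation-failure checks (configASSERT(), vApplicationMallocFailedHook())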

Zephyr RTOS:

  • Detailed fault dump with all registers
  • Memory domain protection with MPU
  • Fault log to flash storage

Production devices:

  • Store crash data in backup RAM or flash
  • Transmit crash reports on next boot
  • Implement watchdog as last-resort recovery

10. Self-Assessment Checklist

Before considering this project complete, verify:

  • I can explain the ARM Cortex-M exception model and vector table
  • I understand the structure of the hardware-pushed exception frame
  • I can decode CFSR to identify any fault type
  • I know the difference between MSP and PSP and when each is used
  • I can explain EXC_RETURN values (0xFFFFFFF9, 0xFFFFFFFD, etc.)
  • My handler correctly identifies the faulting instruction address
  • My handler prints all fault status registers with decoded meanings
  • I implemented stack unwinding to show the call chain
  • My handler doesn’t itself cause faults
  • I can trigger and correctly diagnose all fault types
  • I understand the difference between precise and imprecise faults
  • I can answer all the interview questions listed above

Next Steps

After completing this project, you’ll be well-prepared for:

  • Project 14: GDB Stub - Use debug features for remote debugging
  • Project 15: Tiny OS - Implement process isolation with fault handling
  • Production debugging - Apply these skills to real embedded systems

The fault handling expertise you’ve gained is essential for any professional embedded developer. When production devices crash, you’ll be the one who can diagnose the problem.