Project 8: ARM Emulator

Build an ARM emulator that can execute real ARM binaries, emulating the CPU, memory, and basic I/O. It will run simple bare-metal programs and show the internal state of registers and memory—like building a mini-QEMU.


Quick Reference

Attribute       | Value
Language        | C (alt: Rust, C++, Go)
Difficulty      | Master
Time            | 1 month+
Coolness        | ★★★★★ Pure Magic (Super Cool)
Portfolio Value | Open Core Infrastructure
Prerequisites   | Project 1 (instruction decoder), Projects 2-3 (ARM understanding), strong C skills
Key Topics      | ISA emulation, CPU simulation, barrel shifter, condition codes

Learning Objectives

By completing this project, you will:

  1. Fully understand the ARM instruction set: Every instruction must be decoded and executed correctly
  2. Implement the ARM execution model: PC+8 pipeline offset, condition codes, barrel shifter
  3. Build a virtual machine: Memory management, register file, program execution
  4. Master low-level programming: Bit manipulation, instruction encoding, state machines
  5. Create developer tools: Debugger-like stepping, breakpoints, memory inspection
  6. Understand CPU architecture deeply: After this, ARM will have no secrets

The Core Question You’re Answering

“What exactly happens when an ARM processor executes an instruction?”

Building an emulator forces complete understanding. You can’t fake it—every instruction must be decoded bit by bit and executed with correct semantics. This project is the ultimate test of ARM knowledge and the most powerful way to internalize the architecture.


Concepts You Must Understand First

Concept                  | Why It Matters                              | Where to Learn
ARM instruction encoding | Every bit must be decoded correctly         | ARM ARM, Project 1
Condition codes (NZCV)   | Almost all instructions are conditional     | ARM ARM Section A2
Barrel shifter           | ARM’s unique feature for operand processing | ARM ARM Section A5
PC+8 offset              | Pipeline affects PC-relative addressing     | ARM TRM
Load/Store architecture  | Memory access patterns differ from x86      | ARM fundamentals
Processor modes          | Different register banks, privileges        | ARM ARM Section A2

Self-Assessment Questions

  1. Instruction format: What bits determine if an instruction is data processing or load/store?
  2. Condition codes: What condition code value means “always execute”?
  3. Barrel shifter: How do you extract the shift type and amount from bits 11-4?
  4. Immediate encoding: An ARM immediate is 8 bits + 4-bit rotation. How do you decode 0x0F1?
  5. PC behavior: When executing instruction at 0x1000, what value does reading PC give?

Theoretical Foundation

CPU Emulation Fundamentals

An emulator simulates the behavior of hardware in software:

Real Hardware:                    Emulator:
┌─────────────────────┐          ┌─────────────────────────────────┐
│ ARM Silicon         │          │ Software Implementation         │
│                     │          │                                 │
│ ┌─────────────────┐ │          │ struct ARM_CPU {                │
│ │ Registers       │ │  ═══▶   │     uint32_t r[16];  // R0-R15  │
│ │ R0-R15, CPSR    │ │          │     uint32_t cpsr;              │
│ └─────────────────┘ │          │ };                              │
│                     │          │                                 │
│ ┌─────────────────┐ │          │ uint8_t memory[MEM_SIZE];       │
│ │ Memory Bus      │ │  ═══▶   │                                 │
│ └─────────────────┘ │          │ void execute(cpu, insn);        │
│                     │          │                                 │
└─────────────────────┘          └─────────────────────────────────┘

The Fetch-Decode-Execute Cycle

Every CPU (and emulator) runs this loop:

┌─────────────────────────────────────────────────────────────────────┐
│                    Fetch-Decode-Execute Cycle                       │
└─────────────────────────────────────────────────────────────────────┘

    ┌──────────────┐
    │              │
    │    FETCH     │────▶ Read instruction from memory at PC
    │              │      insn = mem[PC]
    └──────┬───────┘
           │
           ▼
    ┌──────────────┐
    │              │
    │   DECODE     │────▶ Extract opcode, operands, conditions
    │              │      cond = insn >> 28
    │              │      opcode = (insn >> 21) & 0xF
    └──────┬───────┘
           │
           ▼
    ┌──────────────┐
    │              │
    │  CONDITION   │────▶ Check if condition passes (NZCV flags)
    │    CHECK     │      if (!condition_passed(cpsr, cond)) skip
    │              │
    └──────┬───────┘
           │
           ▼
    ┌──────────────┐
    │              │
    │   EXECUTE    │────▶ Perform the operation
    │              │      result = ALU(Rn, operand2)
    │              │      Rd = result (maybe update flags)
    └──────┬───────┘
           │
           ▼
    ┌──────────────┐
    │              │
    │  UPDATE PC   │────▶ PC += 4 (or branch target)
    │              │
    └──────┬───────┘
           │
           └──────────────────────────▶ (repeat)

ARM Instruction Encoding Overview

ARM instructions are 32 bits with a consistent structure:

Bit layout overview:
31  28 27 26 25 24  21 20 19  16 15  12 11            0
┌─────┬─────┬──┬─────┬──┬─────┬─────┬──────────────────┐
│Cond │ Op  │I │OpCd │S │ Rn  │ Rd  │    Operand2      │
└─────┴─────┴──┴─────┴──┴─────┴─────┴──────────────────┘

Cond (31-28): Condition field - when to execute
             0000 = EQ (Z=1)
             0001 = NE (Z=0)
             ...
             1110 = AL (always)
             1111 = NV (never/special)

Op (27-26): Major instruction class
           00 = Data Processing / Multiply
           01 = Load/Store single
           10 = Branch / Block transfer
           11 = Coprocessor / SWI

I (25): Immediate bit
       0 = Operand2 is register (with shift)
       1 = Operand2 is immediate (with rotation)

OpCode (24-21): Specific operation within class
               For data processing: AND, EOR, SUB, RSB, ADD, etc.

S (20): Set flags bit
       0 = Don't update CPSR
       1 = Update CPSR based on result
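
The field boundaries above map directly onto shifts and masks. The following is a minimal sketch of that extraction for a data-processing instruction; the `DPFields` struct and the `dp_extract` name are illustrative assumptions, not part of the project's API.

```c
#include <stdint.h>

// Hypothetical container for the fields in the layout above.
typedef struct {
    uint8_t  cond;      // bits 31-28
    uint8_t  op_class;  // bits 27-26
    uint8_t  i;         // bit 25
    uint8_t  opcode;    // bits 24-21
    uint8_t  s;         // bit 20
    uint8_t  rn;        // bits 19-16
    uint8_t  rd;        // bits 15-12
    uint16_t operand2;  // bits 11-0
} DPFields;

// Splits a 32-bit instruction word into its data-processing fields.
static DPFields dp_extract(uint32_t insn) {
    DPFields f;
    f.cond     = (insn >> 28) & 0xF;
    f.op_class = (insn >> 26) & 0x3;
    f.i        = (insn >> 25) & 0x1;
    f.opcode   = (insn >> 21) & 0xF;
    f.s        = (insn >> 20) & 0x1;
    f.rn       = (insn >> 16) & 0xF;
    f.rd       = (insn >> 12) & 0xF;
    f.operand2 = insn & 0xFFF;
    return f;
}
```

For 0xE0802001 (ADD R2, R0, R1) this gives cond=0xE, opcode=0x4, S=0, Rn=0, Rd=2 and Operand2=0x001. For 0xE3520005 (CMP R2, #5) it gives I=1, opcode=0xA, S=1, Rn=2 and Operand2=0x005.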

Data Processing Instructions

The most common instruction type:

Format: <OP>{S}{cond} Rd, Rn, Operand2

Examples:
  ADD R0, R1, R2        ; R0 = R1 + R2
  ADDS R0, R1, #5       ; R0 = R1 + 5, update flags
  MOVEQ R3, R4          ; R3 = R4 if Z flag set
  AND R5, R6, R7, LSL #2 ; R5 = R6 & (R7 << 2)

Opcodes (bits 24-21):
  0000 = AND   Rd = Rn & Op2
  0001 = EOR   Rd = Rn ^ Op2
  0010 = SUB   Rd = Rn - Op2
  0011 = RSB   Rd = Op2 - Rn (reverse subtract)
  0100 = ADD   Rd = Rn + Op2
  0101 = ADC   Rd = Rn + Op2 + Carry
  0110 = SBC   Rd = Rn - Op2 - !Carry
  0111 = RSC   Rd = Op2 - Rn - !Carry
  1000 = TST   Rn & Op2 (flags only, no Rd)
  1001 = TEQ   Rn ^ Op2 (flags only, no Rd)
  1010 = CMP   Rn - Op2 (flags only, no Rd)
  1011 = CMN   Rn + Op2 (flags only, no Rd)
  1100 = ORR   Rd = Rn | Op2
  1101 = MOV   Rd = Op2 (Rn ignored)
  1110 = BIC   Rd = Rn & ~Op2 (bit clear)
  1111 = MVN   Rd = ~Op2 (move not)
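
Because the opcode field is a dense 4-bit index, the table above can be carried into the disassembler and executor as plain lookup data. This is a sketch only; `DP_MNEMONICS`, `dp_is_compare` and `dp_ignores_rn` are names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

// Mnemonics indexed by the 4-bit opcode field (bits 24-21).
static const char *const DP_MNEMONICS[16] = {
    "AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
    "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"
};

// TST, TEQ, CMP and CMN (opcodes 0b10xx) only set flags and never write Rd.
static bool dp_is_compare(uint8_t opcode) {
    return (opcode & 0xC) == 0x8;
}

// MOV and MVN take a single source operand, so the Rn field is ignored.
static bool dp_ignores_rn(uint8_t opcode) {
    return opcode == 0xD || opcode == 0xF;
}
```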

The Barrel Shifter

ARM’s barrel shifter allows operands to be shifted as part of any data processing instruction:

Operand2 encoding when I=0 (register):
Bits 11-0:
  11       7  6   5   4   3    0
┌───────────┬───────┬───┬───────┐
│    Amt    │ Shift │ 0 │  Rm   │  Immediate shift amount (5 bits, 11-7)
└───────────┴───────┴───┴───────┘

  11    8  7   6   5   4   3    0
┌────────┬───┬───────┬───┬───────┐
│   Rs   │ 0 │ Shift │ 1 │  Rm   │  Register-specified shift amount (Rs bits 7-0)
└────────┴───┴───────┴───┴───────┘

Shift types (bits 6-5):
  00 = LSL (Logical Shift Left)
  01 = LSR (Logical Shift Right)
  10 = ASR (Arithmetic Shift Right - sign extend)
  11 = ROR (Rotate Right)

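Special cases (immediate shift form only, bit 4 = 0):
  shift=00, amount=0      means no shift (carry unchanged)
  shift=01/10, amount=0   means shift by 32
  shift=11, amount=0      means RRX (rotate right by 1 through carry)
In the register-specified form (bit 4 = 1), an amount of 0 always means
no shift and leaves the carry unchanged, whatever the shift type.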

Immediate encoding (when I=1):

Bits 11-0:
  11  8   7  0
┌─────┬───────┐
│ Rot │ Imm8  │
└─────┴───────┘

Value = Imm8 rotated right by (Rot * 2) positions

Example: Rot=0x0F, Imm8=0x01
  Rotate 0x01 right by 30 bits = 0x00000004
  (Or equivalently, rotate left by 2)
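
Returning to the register form of Operand2 (I=0), the two layouts differ only in bit 4 and in how bits 11-7 are read. A minimal field-extraction sketch follows; `ShiftOperand` and `decode_shift_operand` are hypothetical names used only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical view of the register form of Operand2.
typedef struct {
    uint8_t rm;          // bits 3-0
    uint8_t shift_type;  // bits 6-5: 0=LSL, 1=LSR, 2=ASR, 3=ROR
    bool    by_register; // bit 4: 1 = amount comes from Rs
    uint8_t shift_imm;   // bits 11-7, meaningful when !by_register
    uint8_t rs;          // bits 11-8, meaningful when by_register
} ShiftOperand;

static ShiftOperand decode_shift_operand(uint32_t insn) {
    ShiftOperand s = {0};
    s.rm          = insn & 0xF;
    s.shift_type  = (insn >> 5) & 0x3;
    s.by_register = (insn >> 4) & 0x1;
    if (s.by_register) {
        s.rs = (insn >> 8) & 0xF;          // bit 7 is 0 in this form
    } else {
        s.shift_imm = (insn >> 7) & 0x1F;  // full 5-bit amount
    }
    return s;
}
```

For 0xE0065107 (AND R5, R6, R7, LSL #2) this yields Rm=7, LSL, immediate amount 2. For 0xE1A00231 (MOV R0, R1, LSR R2) it yields Rm=1, LSR, amount taken from R2.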

Load/Store Instructions

Format: LDR{cond} Rd, [Rn, #offset]    ; Load word
        STR{cond} Rd, [Rn, #offset]    ; Store word
        LDRB, STRB                      ; Byte versions

Addressing modes:
  [Rn]              ; Base register only
  [Rn, #imm]        ; Base + immediate offset
  [Rn, Rm]          ; Base + register offset
  [Rn, Rm, LSL #n]  ; Base + shifted register
  [Rn, #imm]!       ; Pre-indexed with writeback
  [Rn], #imm        ; Post-indexed

Encoding:
  Bit 25: 0 = immediate offset, 1 = register offset
  Bit 24: 0 = post-indexed, 1 = pre-indexed (or offset)
  Bit 23: 0 = subtract offset, 1 = add offset
  Bit 22: 0 = word, 1 = byte
  Bit 21: 0 = no writeback, 1 = writeback
  Bit 20: 0 = store, 1 = load
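
The six control bits listed above can be pulled out in one pass. The sketch below assumes the single-register LDR/STR encoding (bits 27-26 = 01); the `LSFields` struct and `ls_extract` function are illustrative names, not part of the project.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical container for the LDR/STR control bits.
typedef struct {
    bool     reg_offset;   // bit 25 (I): 1 = register offset
    bool     pre_indexed;  // bit 24 (P)
    bool     add_offset;   // bit 23 (U)
    bool     byte_access;  // bit 22 (B)
    bool     writeback;    // bit 21 (W)
    bool     load;         // bit 20 (L)
    uint8_t  rn;           // bits 19-16, base register
    uint8_t  rd;           // bits 15-12, data register
    uint16_t offset12;     // bits 11-0, used when !reg_offset
} LSFields;

static LSFields ls_extract(uint32_t insn) {
    LSFields f;
    f.reg_offset  = (insn >> 25) & 1;
    f.pre_indexed = (insn >> 24) & 1;
    f.add_offset  = (insn >> 23) & 1;
    f.byte_access = (insn >> 22) & 1;
    f.writeback   = (insn >> 21) & 1;
    f.load        = (insn >> 20) & 1;
    f.rn          = (insn >> 16) & 0xF;
    f.rd          = (insn >> 12) & 0xF;
    f.offset12    = insn & 0xFFF;
    return f;
}
```

As a check, 0xE5910004 is LDR R0, [R1, #4] (pre-indexed, add, word, no writeback, load) and 0xE4432001 is STRB R2, [R3], #-1 (post-indexed, subtract, byte, store).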

Branch Instructions

Format: B{cond} label      ; Branch
        BL{cond} label     ; Branch with Link (function call)

Encoding (bits 27-25 = 101):
  Bit 24: 0 = Branch, 1 = Branch with Link (saves return address in LR)
  Bits 23-0: Signed 24-bit offset (in words)

The offset is:
  1. Sign-extended to 32 bits
  2. Left-shifted by 2 (word alignment)
  3. Added to PC+8 (due to pipeline)

Target = PC + 8 + (SignExtend(offset) << 2)

Example: 0xEAFFFFFE = B . (branch to self, infinite loop)
  Cond = 0xE (always)
  Offset = 0xFFFFFE = -2 (signed)
  Target = PC + 8 + (-2 * 4) = PC + 8 - 8 = PC
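
The three steps above fit in a few lines of C. This sketch uses unsigned arithmetic throughout so that sign extension and the shift are well defined; `branch_target` is a hypothetical helper, and `insn_addr` is assumed to be the address of the branch instruction itself (not the PC+8 value).

```c
#include <stdint.h>

// Computes the destination of a B/BL instruction located at insn_addr.
static uint32_t branch_target(uint32_t insn_addr, uint32_t insn) {
    uint32_t offset = insn & 0x00FFFFFFu;   // 24-bit signed field
    if (offset & 0x00800000u) {
        offset |= 0xFF000000u;              // sign-extend to 32 bits
    }
    offset <<= 2;                           // words to bytes
    return insn_addr + 8 + offset;          // wraps modulo 2^32
}
```

With the example above, `branch_target(0x1000, 0xEAFFFFFE)` returns 0x1000, and the `BGE` from the sample trace, `branch_target(0x10, 0xAA000002)`, returns 0x20.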

Condition Code Evaluation

Nearly every instruction in the 32-bit ARM encoding carries a 4-bit condition field, checked against the NZCV flags before execution:

bool condition_passed(uint32_t cpsr, uint8_t cond) {
    bool N = (cpsr >> 31) & 1;  // Negative
    bool Z = (cpsr >> 30) & 1;  // Zero
    bool C = (cpsr >> 29) & 1;  // Carry
    bool V = (cpsr >> 28) & 1;  // Overflow

    switch (cond) {
        case 0x0: return Z;              // EQ: Equal
        case 0x1: return !Z;             // NE: Not equal
        case 0x2: return C;              // CS/HS: Carry set
        case 0x3: return !C;             // CC/LO: Carry clear
        case 0x4: return N;              // MI: Minus/negative
        case 0x5: return !N;             // PL: Plus/positive
        case 0x6: return V;              // VS: Overflow
        case 0x7: return !V;             // VC: No overflow
        case 0x8: return C && !Z;        // HI: Unsigned higher
        case 0x9: return !C || Z;        // LS: Unsigned lower or same
        case 0xA: return N == V;         // GE: Signed >=
        case 0xB: return N != V;         // LT: Signed <
        case 0xC: return !Z && (N == V); // GT: Signed >
        case 0xD: return Z || (N != V);  // LE: Signed <=
        case 0xE: return true;           // AL: Always
        case 0xF: return true;           // ARMv5+: unconditional instruction space (decode
                                         // separately); on ARMv4 the NV condition is unpredictable
    }
    return false;
}

Updating Flags

When S bit is set, update CPSR:

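// Contract for arithmetic ops: pass the operands as they were added.
// A subtraction x - y must arrive as op1 = x, op2 = ~y (x - y == x + ~y + 1),
// otherwise the V rule below gives the wrong answer.
void update_flags(ARM_CPU *cpu, uint32_t result, uint32_t op1, uint32_t op2,
                  bool is_add, bool *carry_out) {
    // N flag: bit 31 of result
    if (result & 0x80000000) cpu->regs.cpsr |= CPSR_N;
    else cpu->regs.cpsr &= ~CPSR_N;

    // Z flag: result is zero
    if (result == 0) cpu->regs.cpsr |= CPSR_Z;
    else cpu->regs.cpsr &= ~CPSR_Z;

    // C flag: a carry supplied by the caller takes priority (shifter carry
    // for logical ops, adder carry for arithmetic ops)
    if (carry_out) {
        if (*carry_out) cpu->regs.cpsr |= CPSR_C;
        else cpu->regs.cpsr &= ~CPSR_C;
    } else if (is_add) {
        // Unsigned overflow. The second test catches a carry-in of 1 with
        // op2 == 0xFFFFFFFF, where the result wraps back to exactly op1.
        if (result < op1 || (result == op1 && op2 != 0)) cpu->regs.cpsr |= CPSR_C;
        else cpu->regs.cpsr &= ~CPSR_C;
    }

    // V flag: signed overflow of the addition op1 + op2 (+ carry-in)
    if (is_add) {
        bool sign1 = op1 & 0x80000000;
        bool sign2 = op2 & 0x80000000;
        bool signr = result & 0x80000000;
        // Overflow if signs of operands same, but result different
        if ((sign1 == sign2) && (sign1 != signr))
            cpu->regs.cpsr |= CPSR_V;
        else
            cpu->regs.cpsr &= ~CPSR_V;
    }
}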
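
The overflow rule shown above is the addition rule. One common way to get C and V right for every arithmetic opcode is to express them all as a single add-with-carry, using the identity a - b = a + ~b + 1. The sketch below follows that approach; `AddResult`, `add_with_carry` and `sub_flags` are names assumed for this example, and the 64-bit intermediate is just one convenient way to capture the carry.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    bool n, z, c, v;
} AddResult;

// Computes a + b + carry_in (carry_in is 0 or 1) with all four flags.
static AddResult add_with_carry(uint32_t a, uint32_t b, uint32_t carry_in) {
    AddResult r;
    uint64_t wide = (uint64_t)a + b + carry_in;
    r.result = (uint32_t)wide;
    r.n = (r.result >> 31) & 1;
    r.z = (r.result == 0);
    r.c = (wide >> 32) & 1;  // carry out of bit 31
    // Signed overflow: operands share a sign and the result's sign differs.
    r.v = ((~(a ^ b) & (a ^ r.result)) >> 31) & 1;
    return r;
}

// SUB/CMP as an addition. C == 1 then means "no borrow", as on ARM.
static AddResult sub_flags(uint32_t a, uint32_t b) {
    return add_with_carry(a, ~b, 1);
}
```

Under this scheme SBC is `add_with_carry(a, ~b, C)`, and RSB/RSC simply swap the operands. `sub_flags(3, 5)` gives result 0xFFFFFFFE with N=1, Z=0, C=0, V=0.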

Why This Matters

Building an ARM emulator is valuable for:

  • Deep understanding: No way to fake knowledge—every instruction must work
  • Tool development: Debuggers, profilers, binary analysis
  • Education: Teach others how CPUs work
  • Security research: Analyze ARM binaries without hardware
  • Vintage computing: Emulate classic ARM systems
  • Interview preparation: Demonstrates mastery of computer architecture

Project Specification

What You Will Build

An ARM emulator with:

  1. Full ARMv4/v5 instruction support: Data processing, load/store, branches
  2. Memory system: Configurable size, read/write operations
  3. Verbose execution mode: Show each instruction’s effect
  4. Debugging features: Breakpoints, single-step, register/memory inspection
  5. ELF loader: Load real ARM binaries
  6. I/O emulation: Basic UART for printf debugging
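
For the UART item, one simple approach is to treat a single address as a memory-mapped transmit register and intercept byte stores to it. The sketch below is an assumption-heavy illustration: the address `UART_TX_ADDR`, the `Bus` struct and both function names are invented here and do not correspond to any particular board or to the project's final memory API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define UART_TX_ADDR 0x10000000u  // assumed address, outside emulated RAM
#define UART_LOG_MAX 256

typedef struct {
    uint8_t *ram;
    uint32_t ram_size;
    char     uart_log[UART_LOG_MAX];  // bytes the guest has "transmitted"
    size_t   uart_len;
} Bus;

// Routes a byte store to the UART or to RAM.
// Returns 0 on success, -1 if the address is unmapped.
static int bus_write8(Bus *bus, uint32_t addr, uint8_t value) {
    if (addr == UART_TX_ADDR) {
        // Bytes beyond the log capacity are dropped in this sketch.
        if (bus->uart_len < UART_LOG_MAX) {
            bus->uart_log[bus->uart_len++] = (char)value;
        }
        return 0;
    }
    if (addr < bus->ram_size) {
        bus->ram[addr] = value;
        return 0;
    }
    return -1;
}

// Writes everything captured so far to a host stream and clears the log.
static void uart_flush(Bus *bus, FILE *out) {
    fwrite(bus->uart_log, 1, bus->uart_len, out);
    bus->uart_len = 0;
}
```

A guest program would then print a character with a single `STRB` to the UART address, and the emulator could call `uart_flush(&bus, stdout)` after each step or at exit.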

Functional Requirements

  1. CPU Emulation:
    • 16 general-purpose registers (R0-R15)
    • CPSR with NZCV flags
    • User mode execution (modes optional)
    • Correct PC+8 handling
  2. Instruction Support (minimum):
    • Data processing: AND, EOR, SUB, RSB, ADD, ADC, SBC, RSC, TST, TEQ, CMP, CMN, ORR, MOV, BIC, MVN
    • Load/Store: LDR, STR, LDRB, STRB
    • Branches: B, BL
    • Barrel shifter: All shift types
    • Multiply: MUL, MLA (optional)
  3. Memory System:
    • Configurable size (default 64KB-1MB)
    • Word, halfword, byte access
    • Basic ROM/RAM regions
  4. Debug Features:
    • Verbose mode: Print each instruction
    • Single-step execution
    • Breakpoints
    • Register dump
    • Memory dump

Non-Functional Requirements

  • Correctness: Pass ARM test suites
  • Performance: Run simple programs at reasonable speed
  • Debuggability: Clear error messages, state inspection
  • Portability: Build on Linux/macOS/Windows

Real World Outcome

$ ./arm-emu -v program.bin

ARM Emulator v1.0
Loading binary: program.bin (256 bytes)
Memory: 64KB @ 0x00000000

[0x00000000] E3A00001  MOV R0, #1
    R0: 0x00000000 → 0x00000001

[0x00000004] E3A01002  MOV R1, #2
    R1: 0x00000000 → 0x00000002

[0x00000008] E0802001  ADD R2, R0, R1
    R2: 0x00000000 → 0x00000003

[0x0000000C] E3520005  CMP R2, #5
    CPSR: N=1 Z=0 C=0 V=0 (3 < 5)

[0x00000010] AA000002  BGE 0x00000020
    Branch NOT taken (N != V)

[0x00000014] E2822001  ADD R2, R2, #1
    R2: 0x00000003 → 0x00000004

... execution continues ...

=== Execution Complete ===
Cycles: 47
Instructions: 42
Final state:
  R0=0x00000005  R1=0x00000002  R2=0x00000007  R3=0x00000000
  R4=0x00000000  R5=0x00000000  R6=0x00000000  R7=0x00000000
  R8=0x00000000  R9=0x00000000  R10=0x00000000 R11=0x00000000
  R12=0x00000000 SP=0x00010000  LR=0x00000000  PC=0x00000038
  CPSR=0x60000010 [nZCv, User mode]

Interactive debug mode:

$ ./arm-emu -d program.bin
ARM Emulator Debug Mode
Type 'help' for commands.

(emu) run
Stopped at breakpoint 0x00000010

(emu) regs
R0=0x00000001  R1=0x00000002  R2=0x00000003  R3=0x00000000
...
PC=0x00000010  CPSR=0x80000010 [Nzcv]

(emu) step
[0x00000010] AA000002  BGE 0x00000020
    Branch NOT taken

(emu) mem 0x00001000 16
00001000: 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A 00 00  Hello, World!...

(emu) break 0x00000030
Breakpoint set at 0x00000030

(emu) continue
Stopped at breakpoint 0x00000030

(emu) quit
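
The `mem` output in the session above is a conventional hex dump: address, up to 16 hex bytes, then printable ASCII with `.` for everything else. A possible formatter is sketched below; `format_dump_line` and `DUMP_LINE_MAX` are hypothetical names, and the caller is assumed to supply a buffer of at least `DUMP_LINE_MAX` characters.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// 9 (address and colon) + 48 (hex) + 2 (gap) + 16 (ASCII) + NUL fits in 80.
#define DUMP_LINE_MAX 80

// Formats up to 16 bytes into out. Returns the number of characters
// written, not counting the terminating NUL.
static int format_dump_line(char *out, uint32_t addr,
                            const uint8_t *bytes, size_t count) {
    if (count > 16) count = 16;
    int pos = snprintf(out, DUMP_LINE_MAX, "%08X:", (unsigned)addr);
    for (size_t i = 0; i < 16; i++) {
        if (i < count) {
            pos += snprintf(out + pos, DUMP_LINE_MAX - pos, " %02X",
                            (unsigned)bytes[i]);
        } else {
            pos += snprintf(out + pos, DUMP_LINE_MAX - pos, "   ");  // pad short rows
        }
    }
    pos += snprintf(out + pos, DUMP_LINE_MAX - pos, "  ");
    for (size_t i = 0; i < count; i++) {
        char c = (bytes[i] >= 0x20 && bytes[i] < 0x7F) ? (char)bytes[i] : '.';
        pos += snprintf(out + pos, DUMP_LINE_MAX - pos, "%c", c);
    }
    return pos;
}
```

The debugger's `mem` command can call this once per 16-byte row and print each line with `puts`.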

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────────┐
│                          ARM Emulator                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐           │
│  │    Loader    │──▶│     CPU      │──▶│    Debug     │           │
│  │  (ELF/bin)   │   │   Execute    │   │   Interface  │           │
│  └──────────────┘   └──────────────┘   └──────────────┘           │
│                            │                                        │
│         ┌──────────────────┼──────────────────┐                    │
│         │                  │                  │                    │
│         ▼                  ▼                  ▼                    │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐           │
│  │   Decoder    │   │   Memory     │   │   Disasm     │           │
│  │  (parse insn)│   │   System     │   │  (for debug) │           │
│  └──────────────┘   └──────────────┘   └──────────────┘           │
│         │                  ▲                                       │
│         │                  │                                       │
│         ▼                  │                                       │
│  ┌──────────────┐         │                                        │
│  │    ALU       │─────────┘                                        │
│  │ (operations) │                                                  │
│  └──────────────┘                                                  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Key Components

Component    | Responsibility               | Key Functions
Loader       | Load binaries into memory    | load_binary, load_elf
CPU Core     | Main execution loop          | run, step, reset
Decoder      | Parse instruction bits       | decode_insn, identify type
ALU          | Arithmetic/logic operations  | exec_data_proc, barrel_shift
Memory       | Read/write emulated memory   | mem_read, mem_write
Disassembler | Convert instructions to text | disassemble
Debug        | User interaction             | set_breakpoint, dump_regs

Data Structures

// CPU state
typedef struct {
    uint32_t r[16];        // R0-R15 (R13=SP, R14=LR, R15=PC)
    uint32_t cpsr;         // Current Program Status Register
    uint32_t spsr;         // Saved PSR (for exceptions)

    // Banked registers for modes (optional for basic emulator)
    uint32_t r_fiq[7];     // R8-R14 for FIQ mode
    uint32_t r_svc[2];     // R13, R14 for Supervisor mode
    uint32_t r_abt[2];     // R13, R14 for Abort mode
    uint32_t r_irq[2];     // R13, R14 for IRQ mode
    uint32_t r_und[2];     // R13, R14 for Undefined mode
} ARM_Registers;

typedef struct {
    ARM_Registers regs;

    uint8_t *memory;       // Emulated memory
    size_t mem_size;

    uint64_t cycles;       // Cycle counter
    uint64_t insn_count;   // Instruction counter
    bool halted;           // CPU halted

    // Debug state
    uint32_t breakpoints[16];
    int num_breakpoints;
    bool verbose;          // Print each instruction
    bool single_step;
} ARM_CPU;

// Decoded instruction (intermediate representation)
typedef struct {
    uint8_t cond;          // Condition code
    uint8_t type;          // Instruction type
    uint8_t opcode;        // Operation code
    uint8_t rn, rd, rs, rm; // Register operands
    uint32_t imm;          // Immediate value
    uint8_t shift_type;    // Shift type
    uint8_t shift_amount;  // Shift amount
    bool s_bit;            // Update flags?
    bool i_bit;            // Immediate operand?
    bool p_bit;            // Pre/post indexing
    bool u_bit;            // Add/subtract offset
    bool b_bit;            // Byte/word
    bool w_bit;            // Writeback
    bool l_bit;            // Load/store
} DecodedInsn;

// CPSR flags
// Unsigned constants: 1 << 31 overflows a signed int, which is undefined in C
#define CPSR_N  (1u << 31)  // Negative
#define CPSR_Z  (1u << 30)  // Zero
#define CPSR_C  (1u << 29)  // Carry
#define CPSR_V  (1u << 28)  // Overflow
#define CPSR_MODE_MASK 0x1F

Main Execution Loop

void cpu_run(ARM_CPU *cpu) {
    while (!cpu->halted) {
        // Fetch: r[15] always holds the executing instruction's address + 8
        uint32_t insn_addr = cpu->regs.r[15] - 8;
        uint32_t insn = mem_read32(cpu, insn_addr);
        if (cpu->halted) {
            break;  // Fetch went out of bounds
        }

        // Decode
        DecodedInsn decoded;
        decode_instruction(insn, &decoded);

        if (cpu->verbose) {
            print_instruction(cpu, insn_addr, insn, &decoded);
        }

        // Check condition, then execute or skip
        if (condition_passed(cpu->regs.cpsr, decoded.cond)) {
            execute_instruction(cpu, &decoded);  // Handlers update PC themselves
        } else {
            cpu->regs.r[15] += 4;  // Skipped instructions still advance PC
        }

        cpu->insn_count++;
        cpu->cycles++;

        // Check for a breakpoint on the next instruction to execute
        if (is_breakpoint(cpu, cpu->regs.r[15] - 8)) {
            cpu->halted = true;
            printf("Breakpoint at 0x%08X\n", cpu->regs.r[15] - 8);
        }

        if (cpu->single_step) {
            cpu->halted = true;
        }
    }
}

Instruction Dispatch

void execute_instruction(ARM_CPU *cpu, DecodedInsn *insn) {
    uint32_t type = insn->type;

    switch (type) {
        case INSN_DATA_PROC:
            exec_data_processing(cpu, insn);
            break;

        case INSN_MULTIPLY:
            exec_multiply(cpu, insn);
            break;

        case INSN_LOAD_STORE:
            exec_load_store(cpu, insn);
            break;

        case INSN_LOAD_STORE_MULTI:
            exec_load_store_multiple(cpu, insn);
            break;

        case INSN_BRANCH:
            exec_branch(cpu, insn);
            break;

        case INSN_SWI:
            exec_swi(cpu, insn);
            break;

        default:
            printf("Unimplemented instruction type: %u\n", (unsigned)type);
            cpu->halted = true;
    }
}

Implementation Guide

Development Environment Setup

# Required tools
sudo apt-get install gcc gdb

# ARM cross-compiler for creating test binaries
sudo apt-get install gcc-arm-none-eabi

# Create project structure
mkdir -p arm-emu/{src,include,tests,binaries}
cd arm-emu

Project Structure

arm-emu/
├── src/
│   ├── main.c             # Entry point, CLI
│   ├── cpu.c              # CPU state, execution loop
│   ├── decode.c           # Instruction decoder
│   ├── execute.c          # Instruction execution
│   ├── alu.c              # ALU operations, barrel shifter
│   ├── memory.c           # Memory system
│   ├── loader.c           # Binary/ELF loader
│   ├── disasm.c           # Disassembler
│   └── debug.c            # Debug interface
├── include/
│   ├── cpu.h
│   ├── decode.h
│   ├── memory.h
│   └── debug.h
├── tests/
│   ├── test_alu.c         # ALU unit tests
│   ├── test_decode.c      # Decoder tests
│   └── programs/          # Test ARM programs
│       ├── simple.s       # Simple test
│       ├── loop.s         # Loop test
│       └── fibonacci.s    # Fibonacci
├── Makefile
└── README.md

Implementation Phases

Phase 1: CPU State and Memory (Days 1-4)

Goals:

  • Define CPU state structure
  • Implement memory system
  • Create basic initialization

Tasks:

  1. Define ARM_CPU structure with registers
  2. Implement mem_read8/16/32 and mem_write8/16/32
  3. Implement cpu_init() and cpu_reset()
  4. Load raw binary into memory
  5. Test with memory read/write

Key code:

ARM_CPU *cpu_create(size_t mem_size) {
    ARM_CPU *cpu = calloc(1, sizeof(ARM_CPU));
    if (!cpu) return NULL;
    cpu->memory = calloc(mem_size, 1);
    if (!cpu->memory) {
        free(cpu);
        return NULL;
    }
    cpu->mem_size = mem_size;
    cpu_reset(cpu);
    return cpu;
}

void cpu_reset(ARM_CPU *cpu) {
    memset(&cpu->regs, 0, sizeof(cpu->regs));
    cpu->regs.cpsr = 0x10;  // User mode
    cpu->regs.r[15] = 8;    // PC starts at 0, but reads as 8 (pipeline)
    cpu->halted = false;
    cpu->cycles = 0;
    cpu->insn_count = 0;
}

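uint32_t mem_read32(ARM_CPU *cpu, uint32_t addr) {
    // Compare against mem_size - 4 instead of computing addr + 3, which
    // would wrap around for addresses near 0xFFFFFFFF
    if (cpu->mem_size < 4 || addr > cpu->mem_size - 4) {
        printf("Memory read out of bounds: 0x%08X\n", addr);
        cpu->halted = true;
        return 0;
    }
    // Assemble little-endian bytes: works on any host and any alignment
    const uint8_t *p = cpu->memory + addr;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

void mem_write32(ARM_CPU *cpu, uint32_t addr, uint32_t value) {
    if (cpu->mem_size < 4 || addr > cpu->mem_size - 4) {
        printf("Memory write out of bounds: 0x%08X\n", addr);
        cpu->halted = true;
        return;
    }
    uint8_t *p = cpu->memory + addr;
    p[0] = value & 0xFF;
    p[1] = (value >> 8) & 0xFF;
    p[2] = (value >> 16) & 0xFF;
    p[3] = (value >> 24) & 0xFF;
}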

Checkpoint: CPU initializes, memory read/write works.
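
Task 2 also calls for byte and halfword accessors. The sketch below shows one way to write them with an overflow-safe bounds check and explicit little-endian byte order. To stay self-contained it uses a stand-in `Mem` struct with a `fault` flag rather than `ARM_CPU`; in the emulator the same logic would operate on `cpu->memory`, `cpu->mem_size` and `cpu->halted`. The `in_bounds` helper is an assumption of this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for the memory-related fields of ARM_CPU.
typedef struct {
    uint8_t *data;
    size_t   size;
    bool     fault;  // set on any out-of-bounds access
} Mem;

// True if [addr, addr + len) lies inside memory. Written without
// computing addr + len, so it cannot wrap around.
static bool in_bounds(const Mem *m, uint32_t addr, size_t len) {
    return len <= m->size && addr <= m->size - len;
}

static uint8_t mem_read8(Mem *m, uint32_t addr) {
    if (!in_bounds(m, addr, 1)) { m->fault = true; return 0; }
    return m->data[addr];
}

static void mem_write8(Mem *m, uint32_t addr, uint8_t value) {
    if (!in_bounds(m, addr, 1)) { m->fault = true; return; }
    m->data[addr] = value;
}

static uint16_t mem_read16(Mem *m, uint32_t addr) {
    if (!in_bounds(m, addr, 2)) { m->fault = true; return 0; }
    // Little-endian: low byte at the lower address
    return (uint16_t)(m->data[addr] | (m->data[addr + 1] << 8));
}

static void mem_write16(Mem *m, uint32_t addr, uint16_t value) {
    if (!in_bounds(m, addr, 2)) { m->fault = true; return; }
    m->data[addr]     = value & 0xFF;
    m->data[addr + 1] = (value >> 8) & 0xFF;
}
```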

Phase 2: Instruction Decoder (Days 5-10)

Goals:

  • Decode all instruction types
  • Handle barrel shifter encoding
  • Parse immediate values

Tasks:

  1. Implement condition extraction
  2. Implement instruction type identification (bits 27-25)
  3. Decode data processing instructions
  4. Decode load/store instructions
  5. Decode branch instructions
  6. Implement barrel shifter decode

Key code:

void decode_instruction(uint32_t insn, DecodedInsn *decoded) {
    decoded->cond = (insn >> 28) & 0xF;

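    uint8_t bits_27_25 = (insn >> 25) & 0x7;
    uint8_t bits_7_4 = (insn >> 4) & 0xF;

    // Determine instruction type
    if (bits_27_25 == 0b000 && bits_7_4 == 0b1001) {
        // Multiply/swap space. MUL and MLA have bits 27-22 == 000000;
        // long multiplies and SWP are not handled here.
        if (((insn >> 22) & 0x3F) == 0) {
            decoded->type = INSN_MULTIPLY;
            decode_multiply(insn, decoded);
        } else {
            decoded->type = INSN_UNDEFINED;
        }
    } else if (bits_27_25 == 0b000 && (bits_7_4 & 0b1001) == 0b1001) {
        // Halfword and signed transfers (LDRH, STRH, LDRSB, LDRSH): bit 7
        // and bit 4 both set, which no data processing encoding uses
        decoded->type = INSN_UNDEFINED;
    } else if (bits_27_25 == 0b000 || bits_27_25 == 0b001) {
        decoded->type = INSN_DATA_PROC;
        decode_data_proc(insn, decoded);
    } else if (bits_27_25 == 0b010 || bits_27_25 == 0b011) {
        decoded->type = INSN_LOAD_STORE;
        decode_load_store(insn, decoded);
    } else if (bits_27_25 == 0b100) {
        decoded->type = INSN_LOAD_STORE_MULTI;
        decode_ldm_stm(insn, decoded);
    } else if (bits_27_25 == 0b101) {
        decoded->type = INSN_BRANCH;
        decode_branch(insn, decoded);
    } else if (bits_27_25 == 0b111 && ((insn >> 24) & 1)) {
        // SWI needs bit 24 set; 0b111 with bit 24 clear is coprocessor space
        decoded->type = INSN_SWI;
        decode_swi(insn, decoded);
    } else {
        decoded->type = INSN_UNDEFINED;
    }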
}

void decode_data_proc(uint32_t insn, DecodedInsn *decoded) {
    decoded->i_bit = (insn >> 25) & 1;
    decoded->opcode = (insn >> 21) & 0xF;
    decoded->s_bit = (insn >> 20) & 1;
    decoded->rn = (insn >> 16) & 0xF;
    decoded->rd = (insn >> 12) & 0xF;

    if (decoded->i_bit) {
        // Immediate operand
        uint32_t imm8 = insn & 0xFF;
        uint32_t rot = ((insn >> 8) & 0xF) * 2;
        // rot == 0 is special-cased: shifting a 32-bit value by 32 is undefined in C
        decoded->imm = rot ? ((imm8 >> rot) | (imm8 << (32 - rot))) : imm8;
    } else {
        // Register operand with shift
        decoded->rm = insn & 0xF;
        decoded->shift_type = (insn >> 5) & 0x3;

        if ((insn >> 4) & 1) {
            // Register-specified shift
            decoded->rs = (insn >> 8) & 0xF;
            decoded->shift_amount = 0;  // Will read from Rs
        } else {
            // Immediate shift
            decoded->shift_amount = (insn >> 7) & 0x1F;
            decoded->rs = 0xFF;  // Not used
        }
    }
}

Checkpoint: Decoder correctly parses all instruction types.

Phase 3: Data Processing Execution (Days 11-18)

Goals:

  • Implement all 16 data processing opcodes
  • Implement barrel shifter
  • Implement flag updates

Tasks:

  1. Implement barrel_shift() for all shift types
  2. Implement each opcode (AND, EOR, SUB, etc.)
  3. Implement flag calculation
  4. Handle PC as destination (branch-like behavior)
  5. Test with simple assembly programs

Key code:

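uint32_t barrel_shift(ARM_CPU *cpu, uint32_t value, uint8_t shift_type,
                      uint8_t shift_amount, bool *carry_out) {
    // shift_amount == 0 is interpreted as the immediate-shift encoding.
    // A register-specified shift of 0 leaves value and carry unchanged,
    // so callers must handle that case before calling this function.
    if (shift_amount == 0) {
        switch (shift_type) {
            case 0: return value;  // LSL #0 = no shift, carry unchanged
            case 1: *carry_out = (value >> 31) & 1; return 0;  // LSR #32
            case 2: *carry_out = (value >> 31) & 1;
                    return (value & 0x80000000) ? 0xFFFFFFFF : 0;  // ASR #32
            case 3: { // RRX: rotate right by 1 through the carry flag
                uint32_t carry_in = (cpu->regs.cpsr & CPSR_C) ? 0x80000000 : 0;
                *carry_out = value & 1;
                return carry_in | (value >> 1);
            }
        }
    }

    // Amounts of 32 or more can only come from a register-specified shift.
    // Shifting a 32-bit value by >= 32 is undefined in C, so handle them first.
    switch (shift_type) {
        case 0: // LSL
            if (shift_amount > 32) { *carry_out = false; return 0; }
            if (shift_amount == 32) { *carry_out = value & 1; return 0; }
            *carry_out = (value >> (32 - shift_amount)) & 1;
            return value << shift_amount;

        case 1: // LSR
            if (shift_amount > 32) { *carry_out = false; return 0; }
            if (shift_amount == 32) { *carry_out = (value >> 31) & 1; return 0; }
            *carry_out = (value >> (shift_amount - 1)) & 1;
            return value >> shift_amount;

        case 2: // ASR
            if (shift_amount >= 32) {
                *carry_out = (value >> 31) & 1;
                return (value & 0x80000000) ? 0xFFFFFFFF : 0;
            }
            *carry_out = (value >> (shift_amount - 1)) & 1;
            return (uint32_t)((int32_t)value >> shift_amount);

        case 3: // ROR
            shift_amount &= 31;
            if (shift_amount == 0) {  // Non-zero multiple of 32
                *carry_out = (value >> 31) & 1;
                return value;
            }
            *carry_out = (value >> (shift_amount - 1)) & 1;
            return (value >> shift_amount) | (value << (32 - shift_amount));
    }
    return value;
}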

void exec_data_processing(ARM_CPU *cpu, DecodedInsn *insn) {
    uint32_t op1 = cpu->regs.r[insn->rn];
    uint32_t op2;
    uint32_t c_flag = (cpu->regs.cpsr & CPSR_C) ? 1 : 0;
    bool carry_out = c_flag;  // Shifter carry defaults to the current C flag

    // Get operand 2
    if (insn->i_bit) {
        op2 = insn->imm;
    } else {
        uint32_t rm_val = cpu->regs.r[insn->rm];
        if (insn->rs != 0xFF) {
            // Register-specified shift: an amount of 0 means "no shift",
            // not the LSR #32 / ASR #32 / RRX immediate encodings
            uint8_t shift_amt = cpu->regs.r[insn->rs] & 0xFF;
            op2 = shift_amt
                ? barrel_shift(cpu, rm_val, insn->shift_type, shift_amt, &carry_out)
                : rm_val;
        } else {
            op2 = barrel_shift(cpu, rm_val, insn->shift_type,
                               insn->shift_amount, &carry_out);
        }
    }

    // Arithmetic ops are all computed as a + b + cin, using x - y == x + ~y + 1,
    // so one rule covers carry and overflow for ADD, SUB, RSB and the carry forms
    uint32_t a = op1, b = op2, cin = 0;
    uint32_t result = 0;
    bool write_result = true;
    bool is_arithmetic = true;

    switch (insn->opcode) {
        case 0x0: result = op1 & op2; is_arithmetic = false; break;   // AND
        case 0x1: result = op1 ^ op2; is_arithmetic = false; break;   // EOR
        case 0x2: b = ~op2; cin = 1; break;                           // SUB
        case 0x3: a = op2; b = ~op1; cin = 1; break;                  // RSB
        case 0x4: break;                                              // ADD
        case 0x5: cin = c_flag; break;                                // ADC
        case 0x6: b = ~op2; cin = c_flag; break;                      // SBC
        case 0x7: a = op2; b = ~op1; cin = c_flag; break;             // RSC
        case 0x8: result = op1 & op2; is_arithmetic = false;
                  write_result = false; break;                        // TST
        case 0x9: result = op1 ^ op2; is_arithmetic = false;
                  write_result = false; break;                        // TEQ
        case 0xA: b = ~op2; cin = 1; write_result = false; break;     // CMP
        case 0xB: write_result = false; break;                        // CMN
        case 0xC: result = op1 | op2; is_arithmetic = false; break;   // ORR
        case 0xD: result = op2; is_arithmetic = false; break;         // MOV
        case 0xE: result = op1 & ~op2; is_arithmetic = false; break;  // BIC
        case 0xF: result = ~op2; is_arithmetic = false; break;        // MVN
        default: return;
    }

    if (is_arithmetic) {
        uint64_t sum = (uint64_t)a + b + cin;
        result = (uint32_t)sum;
        carry_out = (sum >> 32) & 1;  // For subtraction, 1 means "no borrow"
    }

    // Update flags if S bit set. The operands are passed as they were added,
    // so the addition overflow rule in update_flags also covers subtraction.
    if (insn->s_bit) {
        update_flags(cpu, result, a, b, is_arithmetic, &carry_out);
    }

    // Write result
    if (write_result && insn->rd == 15) {
        // Writing to PC is a branch: keep r[15] at "target + 8"
        cpu->regs.r[15] = result + 8;
    } else {
        if (write_result) cpu->regs.r[insn->rd] = result;
        cpu->regs.r[15] += 4;  // Advance PC normally
    }
}

Checkpoint: Simple ALU programs execute correctly.
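
The optional MUL and MLA instructions share the data-processing class bits but use a different field layout: Rd sits in bits 19-16 and Rn (the accumulate operand) in bits 15-12. The sketch below shows a possible decode and execute pair; `MulFields`, `decode_mul` and `exec_mul` are hypothetical names, and the register file is passed as a plain array to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t rd, rn, rs, rm;
    bool accumulate;  // bit 21: 1 = MLA, 0 = MUL
    bool set_flags;   // bit 20
} MulFields;

// Returns false if insn is not MUL/MLA
// (bits 27-22 must be 000000 and bits 7-4 must be 1001).
static bool decode_mul(uint32_t insn, MulFields *out) {
    if (((insn >> 22) & 0x3F) != 0 || ((insn >> 4) & 0xF) != 0x9) {
        return false;
    }
    out->accumulate = (insn >> 21) & 1;
    out->set_flags  = (insn >> 20) & 1;
    out->rd = (insn >> 16) & 0xF;
    out->rn = (insn >> 12) & 0xF;
    out->rs = (insn >> 8) & 0xF;
    out->rm = insn & 0xF;
    return true;
}

// Computes Rd = Rm * Rs (+ Rn for MLA), truncated to 32 bits.
static uint32_t exec_mul(const uint32_t regs[16], const MulFields *m) {
    uint32_t result = regs[m->rm] * regs[m->rs];
    if (m->accumulate) {
        result += regs[m->rn];
    }
    return result;
}
```

When the S bit is set, only N and Z need to be derived from the result; this sketch leaves C and V untouched. For example, 0xE0000291 decodes as MUL R0, R1, R2 and 0xE0234291 as MLA R3, R1, R2, R4.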

Phase 4: Load/Store and Branches (Days 19-25)

Goals:

  • Implement LDR, STR, LDRB, STRB
  • Implement B, BL
  • Handle all addressing modes

Tasks:

  1. Implement load/store with immediate offset
  2. Implement load/store with register offset
  3. Implement pre/post indexing
  4. Implement writeback
  5. Implement B and BL
  6. Test with loops and memory access

Key code:

void exec_load_store(ARM_CPU *cpu, DecodedInsn *insn) {
    uint32_t base = cpu->regs.r[insn->rn];
    uint32_t offset;

    if (!insn->i_bit) {
        // Immediate offset
        offset = insn->imm;
    } else {
        // Register offset (may be shifted)
        bool carry;
        offset = barrel_shift(cpu, cpu->regs.r[insn->rm],
                              insn->shift_type, insn->shift_amount, &carry);
    }

    // Add or subtract offset
    uint32_t addr;
    if (insn->p_bit) {
        // Pre-indexed
        addr = insn->u_bit ? base + offset : base - offset;
    } else {
        // Post-indexed
        addr = base;
    }

    if (insn->l_bit) {
        // Load
        uint32_t value;
        if (insn->b_bit) {
            value = mem_read8(cpu, addr);
        } else {
            value = mem_read32(cpu, addr);
        }
        cpu->regs.r[insn->rd] = value;
    } else {
        // Store
        uint32_t value = cpu->regs.r[insn->rd];
        if (insn->b_bit) {
            mem_write8(cpu, addr, value & 0xFF);
        } else {
            mem_write32(cpu, addr, value);
        }
    }

    // Writeback
    if (insn->w_bit || !insn->p_bit) {
        if (insn->p_bit) {
            cpu->regs.r[insn->rn] = addr;
        } else {
            cpu->regs.r[insn->rn] = insn->u_bit ? base + offset : base - offset;
        }
    }

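    if (insn->l_bit && insn->rd == 15) {
        // A load just wrote PC: add 8 so r[15] keeps the "address + 8"
        // convention, as the data processing code does for PC writes
        cpu->regs.r[15] += 8;
    } else {
        cpu->regs.r[15] += 4;  // Advance PC normally
    }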
}

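void exec_branch(ARM_CPU *cpu, DecodedInsn *insn) {
    // Sign-extend the 24-bit offset in unsigned arithmetic
    // (left-shifting a negative signed value is undefined behavior in C)
    uint32_t offset = insn->imm & 0x00FFFFFF;
    if (offset & 0x00800000) {
        offset |= 0xFF000000;  // Sign extend
    }
    offset <<= 2;  // The field counts words, not bytes

    if (insn->l_bit) {
        // Branch with Link - save return address
        // r[15] holds current+8; the return address is current+4
        cpu->regs.r[14] = cpu->regs.r[15] - 4;
    }

    // Target = current + 8 + offset = r[15] + offset.
    // Store target + 8 so r[15] keeps the "address + 8" convention used by
    // the data processing code and by the fetch at r[15] - 8.
    cpu->regs.r[15] = cpu->regs.r[15] + offset + 8;
}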

Checkpoint: Loops and subroutine calls work.
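
Branch targets are easy to check in isolation before wiring them into the CPU state. The sketch below computes the architectural target from the instruction's own address and its 24-bit field; branch_offset and branch_target are hypothetical helper names, not functions from the guide.

```c
#include <stdint.h>

/* Sign-extend the 24-bit field of B/BL and convert it to a byte offset. */
uint32_t branch_offset(uint32_t imm24) {
    uint32_t off = imm24 & 0x00FFFFFFu;
    if (off & 0x00800000u) {
        off |= 0xFF000000u;   /* copy the sign bit into bits 31..24 */
    }
    return off << 2;          /* instructions are word aligned */
}

/* Address the CPU jumps to for a branch located at insn_addr. */
uint32_t branch_target(uint32_t insn_addr, uint32_t imm24) {
    return insn_addr + 8u + branch_offset(imm24);  /* wraps modulo 2^32 */
}
```

A handy sanity check is the encoding 0xEAFFFFFE, a branch to itself: with imm24 = 0xFFFFFE the offset is -8, which cancels the +8 and lands back on the instruction's own address.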

Phase 5: Debug Interface and Polish (Days 26-35)

Goals:

  • Implement interactive debugger
  • Add disassembler
  • Create comprehensive tests

Tasks:

  1. Implement command parser
  2. Implement breakpoints
  3. Implement single-step
  4. Implement register/memory dump
  5. Add disassembly output
  6. Test with Fibonacci, etc.

Checkpoint: Can debug programs interactively, step through code.
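
For the breakpoint task, a small fixed-size table is usually enough. The following is a minimal sketch under that assumption; BreakpointSet, MAX_BREAKPOINTS, and the bp_* functions are illustrative names that do not appear elsewhere in the guide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BREAKPOINTS 16  /* arbitrary limit chosen for this sketch */

typedef struct {
    uint32_t addr[MAX_BREAKPOINTS];
    size_t count;
} BreakpointSet;

/* Linear search is fine for a handful of entries. */
bool bp_contains(const BreakpointSet *bps, uint32_t addr) {
    for (size_t i = 0; i < bps->count; i++) {
        if (bps->addr[i] == addr) return true;
    }
    return false;
}

/* Returns false if the table is full; adding an existing address is a no-op. */
bool bp_add(BreakpointSet *bps, uint32_t addr) {
    if (bp_contains(bps, addr)) return true;
    if (bps->count == MAX_BREAKPOINTS) return false;
    bps->addr[bps->count++] = addr;
    return true;
}

/* Removes addr by moving the last entry into its slot; false if not found. */
bool bp_remove(BreakpointSet *bps, uint32_t addr) {
    for (size_t i = 0; i < bps->count; i++) {
        if (bps->addr[i] == addr) {
            bps->addr[i] = bps->addr[--bps->count];
            return true;
        }
    }
    return false;
}
```

A run loop could call bp_contains with the address of the next instruction before each step. If r[15] stores the instruction address + 8, as in the code above, that address is r[15] - 8.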


Hints in Layers

Hint 1: PC Pipeline Offset

Classic ARM cores such as the ARM7TDMI have a 3-stage pipeline (fetch, decode, execute), and the architecture preserves its visible effect: an instruction that reads PC gets its own address + 8:

// Instruction at 0x1000
MOV R0, PC    ; R0 = 0x1008, not 0x1000!

// In emulator, when PC is operand:
uint32_t pc_value = cpu->regs.r[15];  // Already has +8 offset

// When fetching:
uint32_t insn = mem_read32(cpu, cpu->regs.r[15] - 8);

Hint 2: Immediate Rotation

ARM immediate values use 8 bits + 4-bit rotation:

// Bits 11-0 of instruction:
// [11:8] = rotate (r), [7:0] = immediate (i)
// value = i ROR (r * 2)

uint32_t decode_immediate(uint32_t bits) {
    uint32_t imm8 = bits & 0xFF;
    uint32_t rotate = ((bits >> 8) & 0xF) * 2;
    if (rotate == 0) return imm8;
    return (imm8 >> rotate) | (imm8 << (32 - rotate));
}

// Examples:
// 0x001 → 0x00000001 (no rotation)
// 0x101 → 0x40000000 (1 rotated right by 2)
// 0xF01 → 0x00000004 (1 rotated right by 30 = left by 2)

Hint 3: Carry Flag in Shifts

The barrel shifter produces a carry out that affects the C flag:

// For LSL, carry is the last bit shifted out:
// LSL #5: carry = bit 27 of original value

// For LSR/ASR, carry is bit (shift_amount - 1):
// LSR #1: carry = bit 0

// For ROR, carry is bit (shift_amount - 1):
// ROR #4: carry = bit 3

// Special cases (immediate shift amount):
// LSL #0 leaves carry unchanged
// LSR #0 and ASR #0 encode a shift by 32
// ROR #0 encodes RRX (rotate right by one through the carry flag)

Hint 4: Testing Strategy

Create minimal test programs:

# test_add.s - Test ADD instruction
    MOV R0, #5
    MOV R1, #3
    ADD R2, R0, R1   @ R2 should be 8
    SWI #0           @ Halt (you define this)

# Assemble with:
arm-none-eabi-as -o test.o test_add.s
arm-none-eabi-ld -Ttext=0x0 -o test.elf test.o
arm-none-eabi-objcopy -O binary test.elf test.bin

Check each register value after execution.
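
Once objcopy has produced a flat binary, it has to be copied into emulated memory. A minimal sketch of such a loader follows, assuming memory is a plain byte array; load_flat_binary and its parameters are hypothetical and not the guide's load_program.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reads a raw binary into mem[load_addr..]. Returns the number of bytes
 * loaded, or -1 if the file cannot be read or does not fit. */
long load_flat_binary(const char *path, uint8_t *mem, size_t mem_size,
                      size_t load_addr) {
    if (load_addr > mem_size) return -1;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t room = mem_size - load_addr;
    size_t n = fread(mem + load_addr, 1, room, f);
    /* If another byte can still be read, the image is larger than the space. */
    int too_big = (n == room) && (fgetc(f) != EOF);
    int failed = ferror(f);
    fclose(f);
    return (too_big || failed) ? -1 : (long)n;
}
```

Since the example links with -Ttext=0x0, the image would be loaded at address 0 and execution would start there.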

Hint 5: SWI for Basic I/O

Implement SWI (Software Interrupt) for basic I/O:

void exec_swi(ARM_CPU *cpu, DecodedInsn *insn) {
    uint32_t call_num = insn->imm & 0xFFFFFF;

    switch (call_num) {
        case 0:  // Exit
            cpu->halted = true;
            break;
        case 1:  // Print char in R0
            putchar(cpu->regs.r[0] & 0xFF);
            break;
        case 2:  // Print number in R0 as a signed decimal
            // Cast so the argument type matches %d (r[0] is uint32_t)
            printf("%d", (int)cpu->regs.r[0]);
            break;
        // Add more as needed
    }
    cpu->regs.r[15] += 4;
}

Hint 6: Debugging Your Emulator

When things go wrong:

// Add instruction tracing
void trace_instruction(ARM_CPU *cpu, uint32_t pc, uint32_t insn) {
    printf("[%08X] %08X  ", pc, insn);
    disassemble(insn);
    printf("\n");
    printf("  R0=%08X R1=%08X R2=%08X R3=%08X\n",
           cpu->regs.r[0], cpu->regs.r[1], cpu->regs.r[2], cpu->regs.r[3]);
    printf("  CPSR=%08X [%c%c%c%c]\n", cpu->regs.cpsr,
           (cpu->regs.cpsr & CPSR_N) ? 'N' : 'n',
           (cpu->regs.cpsr & CPSR_Z) ? 'Z' : 'z',
           (cpu->regs.cpsr & CPSR_C) ? 'C' : 'c',
           (cpu->regs.cpsr & CPSR_V) ? 'V' : 'v');
}

Compare with QEMU or real hardware for reference.


Testing Strategy

Test Categories

Category | Purpose | Examples
Unit | Test individual functions | Barrel shifter, flag calculation
Decode | Verify instruction parsing | All instruction formats
Execute | Test each opcode | ADD, SUB, MOV, etc.
Integration | Run complete programs | Loops, subroutines
Comparison | Match real ARM behavior | Run same code on QEMU

Critical Test Cases

// Test 1: Basic MOV
void test_mov(void) {
    uint32_t insn = 0xE3A00042;  // MOV R0, #0x42
    ARM_CPU cpu;
    cpu_init(&cpu);
    load_word(&cpu, 0, insn);
    cpu_step(&cpu);
    assert(cpu.regs.r[0] == 0x42);
}

// Test 2: Barrel shifter
void test_shift(void) {
    bool carry;
    assert(barrel_shift(NULL, 0x1, 0, 4, &carry) == 0x10);  // LSL #4
    assert(barrel_shift(NULL, 0x10, 1, 4, &carry) == 0x1);  // LSR #4
    assert(barrel_shift(NULL, 0x80000000, 2, 4, &carry) == 0xF8000000);  // ASR #4
}

// Test 3: Condition codes
void test_conditions(void) {
    ARM_CPU cpu;
    cpu_init(&cpu);

    // Test EQ (Z set)
    cpu.regs.cpsr = CPSR_Z;
    assert(condition_passed(cpu.regs.cpsr, 0) == true);
    assert(condition_passed(cpu.regs.cpsr, 1) == false);  // NE

    // Test signed comparisons
    cpu.regs.cpsr = CPSR_N | CPSR_V;  // N=V, so GE is true
    assert(condition_passed(cpu.regs.cpsr, 0xA) == true);  // GE
}

// Test 4: Fibonacci
void test_fibonacci(void) {
    ARM_CPU cpu;
    cpu_init(&cpu);
    load_program(&cpu, "tests/fibonacci.bin");
    cpu_run(&cpu);
    // Expect R0 = 55 (10th Fibonacci number)
    assert(cpu.regs.r[0] == 55);
}

Test ARM Programs

# fibonacci.s - Calculate Fibonacci(10)
    MOV R0, #10      @ n
    MOV R1, #0       @ fib(0)
    MOV R2, #1       @ fib(1)

loop:
    CMP R0, #0
    BEQ done
    ADD R3, R1, R2   @ next = fib(n-1) + fib(n-2)
    MOV R1, R2       @ shift
    MOV R2, R3
    SUB R0, R0, #1
    B loop

done:
    MOV R0, R1       @ result in R0
    SWI #0           @ halt

Common Pitfalls & Debugging

Frequent Mistakes

Pitfall | Symptom | Solution
PC offset wrong | Branches go to wrong place | Remember PC+8 in ARM mode
Immediate decode wrong | Wrong values loaded | Test rotation separately
Carry not updated | Condition checks fail | With S=1, logical ops set C from the shifter carry-out
Signed vs unsigned | ASR drops the sign bits; V flag wrong | Sign-fill the top bits in ASR; derive V from operand and result signs
Byte order | Memory garbled | Ensure little-endian
Missing S-bit check | Flags update unexpectedly | Only update if S=1
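
For the byte order pitfall, assembling words from individual bytes makes the emulator independent of the host's endianness. This is a minimal sketch; read32_le and write32_le are hypothetical names, and the guide's mem_read32 and mem_write32 may be structured differently.

```c
#include <stdint.h>

/* Assemble a word from four bytes, least significant byte first. */
uint32_t read32_le(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Store a word as four bytes, least significant byte first. */
void write32_le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

With this layout, the bytes 42 00 A0 E3 in memory read back as 0xE3A00042, the MOV R0, #0x42 encoding used in the test cases.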

Debugging Tips

  1. Single instruction tests: Test each instruction type in isolation
  2. Compare with QEMU: Run same code, compare register states
  3. Trace everything: Print before/after for each instruction
  4. Use known binaries: ARM toolchain’s test programs
  5. Check edge cases: Shift by 0, shift by 32, immediate 0
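
The shift edge cases from tip 5 can be pinned down in one small function. The sketch below covers only immediate-form shifts (a 5-bit amount taken from the instruction); shift_imm and the SHIFT_* constants are assumptions for illustration and do not match the signature of the guide's barrel_shift.

```c
#include <stdbool.h>
#include <stdint.h>

enum { SHIFT_LSL = 0, SHIFT_LSR = 1, SHIFT_ASR = 2, SHIFT_ROR = 3 };

/* Immediate-form shift: amount is the 5-bit field (0..31).
 * carry_in is the current C flag; *carry_out receives the shifter carry. */
uint32_t shift_imm(uint32_t value, int type, uint32_t amount,
                   bool carry_in, bool *carry_out) {
    amount &= 31;
    switch (type) {
    case SHIFT_LSL:
        if (amount == 0) { *carry_out = carry_in; return value; }
        *carry_out = (value >> (32 - amount)) & 1;
        return value << amount;
    case SHIFT_LSR:
        if (amount == 0) {                 /* encodes LSR #32 */
            *carry_out = value >> 31;
            return 0;
        }
        *carry_out = (value >> (amount - 1)) & 1;
        return value >> amount;
    case SHIFT_ASR: {
        uint32_t sign = (value >> 31) ? 0xFFFFFFFFu : 0u;
        if (amount == 0) {                 /* encodes ASR #32 */
            *carry_out = value >> 31;
            return sign;
        }
        *carry_out = (value >> (amount - 1)) & 1;
        /* Fill the vacated high bits by hand instead of relying on signed >>. */
        return (value >> amount) | (sign << (32 - amount));
    }
    default:                               /* SHIFT_ROR */
        if (amount == 0) {                 /* encodes RRX */
            *carry_out = value & 1;
            return ((uint32_t)carry_in << 31) | (value >> 1);
        }
        *carry_out = (value >> (amount - 1)) & 1;
        return (value >> amount) | (value << (32 - amount));
    }
}
```

Register-specified shifts (amount taken from a register) follow different rules for amounts of 0 and of 32 or more, so they would need a separate path.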

Extensions & Challenges

Beginner Extensions

  • Thumb mode: Add 16-bit Thumb instruction support
  • More verbose output: Show barrel shifter operations
  • Instruction counts: Profile which instructions execute most

Intermediate Extensions

  • Full multiply support: MUL, MLA, UMULL, SMULL
  • Load/Store Multiple: LDM, STM
  • Coprocessor stubs: Handle CP15 for system control
  • ELF loader: Parse ELF files, set up sections

Advanced Extensions

  • ARM/Thumb interworking: BX instruction
  • Processor modes: Implement all 7 modes with banked registers
  • Exceptions: Implement exception handling
  • Memory protection: Add simple MPU emulation
  • JIT compilation: Translate hot paths to native code

The Interview Questions They’ll Ask

  1. “How does an ARM emulator work?”
    • Fetch-decode-execute loop, condition checking, barrel shifter
  2. “What is the barrel shifter and why is it important?”
    • Shifts or rotates operand 2 inside the same data processing instruction; useful for multiplying or dividing by powers of two
  3. “Explain ARM’s condition codes”
    • NZCV flags, almost all instructions conditional, saves branches
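  4. “Why does reading PC return the current instruction's address + 8?”
    • 3-stage pipeline (fetch, decode, execute): while an instruction executes, PC already points at the instruction being fetched, two words ahead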
  5. “How do you test an emulator for correctness?”
    • Unit tests, comparison with real hardware, test suites
  6. “What’s the difference between emulation and simulation?”
    • Emulation: functionally identical behavior
    • Simulation: may model timing, power, etc.
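
For question 3, it helps to have the full condition table in front of you. The sketch below evaluates the 4-bit condition field against the NZCV bits in CPSR bits 31 to 28. The name cond_passed is a stand-in for the guide's condition_passed, whose actual implementation is not shown here, and returning false for 0xF is a simplifying assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an instruction with condition field `cond` should execute. */
bool cond_passed(uint32_t cpsr, uint32_t cond) {
    bool n = (cpsr >> 31) & 1;
    bool z = (cpsr >> 30) & 1;
    bool c = (cpsr >> 29) & 1;
    bool v = (cpsr >> 28) & 1;
    switch (cond & 0xF) {
    case 0x0: return z;              /* EQ: equal */
    case 0x1: return !z;             /* NE: not equal */
    case 0x2: return c;              /* CS/HS: unsigned higher or same */
    case 0x3: return !c;             /* CC/LO: unsigned lower */
    case 0x4: return n;              /* MI: negative */
    case 0x5: return !n;             /* PL: positive or zero */
    case 0x6: return v;              /* VS: overflow */
    case 0x7: return !v;             /* VC: no overflow */
    case 0x8: return c && !z;        /* HI: unsigned higher */
    case 0x9: return !c || z;        /* LS: unsigned lower or same */
    case 0xA: return n == v;         /* GE: signed greater or equal */
    case 0xB: return n != v;         /* LT: signed less than */
    case 0xC: return !z && n == v;   /* GT: signed greater than */
    case 0xD: return z || n != v;    /* LE: signed less or equal */
    case 0xE: return true;           /* AL: always */
    default:  return false;          /* 0xF: treated as never in this sketch */
    }
}
```

Conditions come in complementary pairs that differ only in the lowest bit, so an alternative design evaluates the eight even cases and inverts the result when bit 0 is set (with 0xE and 0xF handled separately).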

Books That Will Help

Topic | Book | Chapter
ARM instruction set | ARM Architecture Reference Manual | Section A
Emulator design | Computer Systems: A Programmer’s Perspective | Ch. 4
CPU pipelines | Computer Organization and Design ARM Edition | Ch. 4
Condition codes | The Art of ARM Assembly, Vol 1 | Ch. 4
Memory systems | Computer Architecture (Hennessy) | Ch. 5
Binary analysis | Practical Binary Analysis | Ch. 2-3

Self-Assessment Checklist

Understanding

  • I can decode any ARM data processing instruction by hand
  • I understand how the barrel shifter encodes shift type and amount
  • I can evaluate condition codes given NZCV flag states
  • I understand the PC+8 pipeline offset
  • I can explain the difference between immediate and register operand encoding
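
One way to check the last item is to go in the opposite direction from decoding: given a 32-bit constant, find the 12-bit immediate field that produces it, if one exists. This sketch tries each even rotation in turn; encode_immediate and rol32 are hypothetical helpers, not part of the guide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by n bits (n taken modulo 32). */
static uint32_t rol32(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Tries every even rotation, smallest first. On success writes the 12-bit
 * field (rotate in [11:8], imm8 in [7:0]) to *field and returns true. */
bool encode_immediate(uint32_t value, uint32_t *field) {
    for (unsigned r = 0; r < 16; r++) {
        /* value == imm8 ROR (2r)  is equivalent to  imm8 == value ROL (2r) */
        uint32_t imm8 = rol32(value, 2 * r);
        if (imm8 <= 0xFF) {
            *field = (r << 8) | imm8;
            return true;
        }
    }
    return false;
}
```

Constants such as 0x101 or 0x12345678 have no encoding because their set bits do not fit in any 8-bit window at an even rotation; assemblers handle such values with a literal pool load or a multi-instruction sequence instead.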

Implementation

  • All 16 data processing opcodes work correctly
  • Barrel shifter handles all shift types including edge cases
  • Load/store works with all addressing modes
  • Branches calculate correct targets
  • Flags are updated correctly when S bit is set

Testing

  • Simple programs (ADD, MOV) pass
  • Loop programs work
  • Fibonacci computes correctly
  • Results match QEMU or real ARM

Learning Milestones

  1. Simple programs run (MOV, ADD) → Basic decode/execute works
  2. Branches and loops work → Control flow is correct
  3. Fibonacci computes correctly → Arithmetic, flags, and branches work together
  4. You can run real test binaries → Emulator is production-quality

This guide was expanded from LEARN_ARM_DEEP_DIVE.md. For the complete learning path, see the project index.