Project 8: ARM Emulator
Build an ARM emulator that can execute real ARM binaries, emulating the CPU, memory, and basic I/O. It will run simple bare-metal programs and show the internal state of registers and memory—like building a mini-QEMU.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (alt: Rust, C++, Go) |
| Difficulty | Master |
| Time | 1 month+ |
| Coolness | ★★★★★ Pure Magic (Super Cool) |
| Portfolio Value | Open Core Infrastructure |
| Prerequisites | Project 1 (instruction decoder), Projects 2-3 (ARM understanding), strong C skills |
| Key Topics | ISA emulation, CPU simulation, barrel shifter, condition codes |
Learning Objectives
By completing this project, you will:
- Fully understand the ARM instruction set: Every instruction must be decoded and executed correctly
- Implement the ARM execution model: PC+8 pipeline offset, condition codes, barrel shifter
- Build a virtual machine: Memory management, register file, program execution
- Master low-level programming: Bit manipulation, instruction encoding, state machines
- Create developer tools: Debugger-like stepping, breakpoints, memory inspection
- Understand CPU architecture deeply: After this, ARM will have no secrets
The Core Question You’re Answering
“What exactly happens when an ARM processor executes an instruction?”
Building an emulator forces complete understanding. You can’t fake it—every instruction must be decoded bit by bit and executed with correct semantics. This project is the ultimate test of ARM knowledge and the most powerful way to internalize the architecture.
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| ARM instruction encoding | Every bit must be decoded correctly | ARM ARM, Project 1 |
| Condition codes (NZCV) | Almost all instructions are conditional | ARM ARM Section A2 |
| Barrel shifter | ARM’s unique feature for operand processing | ARM ARM Section A5 |
| PC+8 offset | Pipeline affects PC-relative addressing | ARM TRM |
| Load/Store architecture | Memory access patterns differ from x86 | ARM fundamentals |
| Processor modes | Different register banks, privileges | ARM ARM Section A2 |
Self-Assessment Questions
- Instruction format: What bits determine if an instruction is data processing or load/store?
- Condition codes: What condition code value means “always execute”?
- Barrel shifter: How do you extract the shift type and amount from bits 11-4?
- Immediate encoding: An ARM immediate is 8 bits + 4-bit rotation. How do you decode 0x0F1?
- PC behavior: When executing instruction at 0x1000, what value does reading PC give?
Theoretical Foundation
CPU Emulation Fundamentals
An emulator simulates the behavior of hardware in software:
Real Hardware: Emulator:
┌─────────────────────┐ ┌─────────────────────────────────┐
│ ARM Silicon │ │ Software Implementation │
│ │ │ │
│ ┌─────────────────┐ │ │ struct ARM_CPU { │
│ │ Registers │ │ ═══▶ │ uint32_t r[16]; // R0-R15 │
│ │ R0-R15, CPSR │ │ │ uint32_t cpsr; │
│ └─────────────────┘ │ │ }; │
│ │ │ │
│ ┌─────────────────┐ │ │ uint8_t memory[MEM_SIZE]; │
│ │ Memory Bus │ │ ═══▶ │ │
│ └─────────────────┘ │ │ void execute(cpu, insn); │
│ │ │ │
└─────────────────────┘ └─────────────────────────────────┘
The Fetch-Decode-Execute Cycle
Every CPU (and emulator) runs this loop:
┌─────────────────────────────────────────────────────────────────────┐
│ Fetch-Decode-Execute Cycle │
└─────────────────────────────────────────────────────────────────────┘
┌──────────────┐
│ │
│ FETCH │────▶ Read instruction from memory at PC
│ │ insn = mem[PC]
└──────┬───────┘
│
▼
┌──────────────┐
│ │
│ DECODE │────▶ Extract opcode, operands, conditions
│ │ cond = insn >> 28
│ │ opcode = (insn >> 21) & 0xF
└──────┬───────┘
│
▼
┌──────────────┐
│ │
│ CONDITION │────▶ Check if condition passes (NZCV flags)
│ CHECK │ if (!condition_passed(cpsr, cond)) skip
│ │
└──────┬───────┘
│
▼
┌──────────────┐
│ │
│ EXECUTE │────▶ Perform the operation
│ │ result = ALU(Rn, operand2)
│ │ Rd = result (maybe update flags)
└──────┬───────┘
│
▼
┌──────────────┐
│ │
│ UPDATE PC │────▶ PC += 4 (or branch target)
│ │
└──────┬───────┘
│
└──────────────────────────▶ (repeat)
ARM Instruction Encoding Overview
ARM instructions are 32 bits with a consistent structure:
Bit layout overview:
31 28 27 26 25 24 21 20 19 16 15 12 11 0
┌─────┬─────┬──┬─────┬──┬─────┬─────┬──────────────────┐
│Cond │ Op │I │OpCd │S │ Rn │ Rd │ Operand2 │
└─────┴─────┴──┴─────┴──┴─────┴─────┴──────────────────┘
Cond (31-28): Condition field - when to execute
0000 = EQ (Z=1)
0001 = NE (Z=0)
...
1110 = AL (always)
1111 = NV (never/special)
Op (27-26): Major instruction class
00 = Data Processing / Multiply
01 = Load/Store single
10 = Branch / Block transfer
11 = Coprocessor / SWI
I (25): Immediate bit
0 = Operand2 is register (with shift)
1 = Operand2 is immediate (with rotation)
OpCode (24-21): Specific operation within class
For data processing: AND, EOR, SUB, RSB, ADD, etc.
S (20): Set flags bit
0 = Don't update CPSR
1 = Update CPSR based on result
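As a rough illustration of the layout above, the fields of a data processing instruction can be pulled apart with shifts and masks. The `InsnFields` struct and `extract_fields` function are names made up for this sketch; they are not part of the project skeleton.

```c
#include <stdint.h>

/* Hypothetical container for the fields shown in the bit layout. */
typedef struct {
    uint8_t  cond;     /* bits 31-28: condition */
    uint8_t  op;       /* bits 27-26: major class */
    uint8_t  i;        /* bit 25: immediate flag */
    uint8_t  opcode;   /* bits 24-21: operation */
    uint8_t  s;        /* bit 20: set flags */
    uint8_t  rn;       /* bits 19-16: first operand register */
    uint8_t  rd;       /* bits 15-12: destination register */
    uint16_t operand2; /* bits 11-0: shifter operand */
} InsnFields;

/* Split a 32-bit ARM instruction word into its raw fields. */
InsnFields extract_fields(uint32_t insn) {
    InsnFields f;
    f.cond     = (insn >> 28) & 0xF;
    f.op       = (insn >> 26) & 0x3;
    f.i        = (insn >> 25) & 0x1;
    f.opcode   = (insn >> 21) & 0xF;
    f.s        = (insn >> 20) & 0x1;
    f.rn       = (insn >> 16) & 0xF;
    f.rd       = (insn >> 12) & 0xF;
    f.operand2 = insn & 0xFFF;
    return f;
}
```

For the word `0xE0802001` (`ADD R2, R0, R1`), this gives cond 0xE, opcode 0x4, S 0, Rn 0, Rd 2, and operand2 0x001.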
Data Processing Instructions
The most common instruction type:
Format: <OP>{S}{cond} Rd, Rn, Operand2
Examples:
ADD R0, R1, R2 ; R0 = R1 + R2
ADDS R0, R1, #5 ; R0 = R1 + 5, update flags
MOVEQ R3, R4 ; R3 = R4 if Z flag set
AND R5, R6, R7, LSL #2 ; R5 = R6 & (R7 << 2)
Opcodes (bits 24-21):
0000 = AND Rd = Rn & Op2
0001 = EOR Rd = Rn ^ Op2
0010 = SUB Rd = Rn - Op2
0011 = RSB Rd = Op2 - Rn (reverse subtract)
0100 = ADD Rd = Rn + Op2
0101 = ADC Rd = Rn + Op2 + Carry
0110 = SBC Rd = Rn - Op2 - !Carry
0111 = RSC Rd = Op2 - Rn - !Carry
1000 = TST Rn & Op2 (flags only, no Rd)
1001 = TEQ Rn ^ Op2 (flags only, no Rd)
1010 = CMP Rn - Op2 (flags only, no Rd)
1011 = CMN Rn + Op2 (flags only, no Rd)
1100 = ORR Rd = Rn | Op2
1101 = MOV Rd = Op2 (Rn ignored)
1110 = BIC Rd = Rn & ~Op2 (bit clear)
1111 = MVN Rd = ~Op2 (move not)
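Because the opcode field is a dense 4-bit index, the table above maps naturally onto an array lookup, which is handy for the disassembler. The sketch below assumes nothing beyond the table itself; `dp_mnemonic` and `dp_writes_rd` are hypothetical helper names.

```c
#include <stdint.h>

/* Mnemonics indexed by the opcode field (bits 24-21), in table order. */
static const char *const DP_MNEMONICS[16] = {
    "AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
    "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"
};

/* Name of the data processing operation encoded in `insn`. */
const char *dp_mnemonic(uint32_t insn) {
    return DP_MNEMONICS[(insn >> 21) & 0xF];
}

/* Opcodes 0x8-0xB (TST, TEQ, CMP, CMN) only set flags and never write Rd. */
int dp_writes_rd(uint32_t insn) {
    uint32_t opcode = (insn >> 21) & 0xF;
    return opcode < 0x8 || opcode > 0xB;
}
```

The caller is expected to have already established that the word is a data processing instruction; the lookup itself does no classification.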
The Barrel Shifter
ARM’s barrel shifter allows operands to be shifted as part of any data processing instruction:
Operand2 encoding when I=0 (register):
Bits 11-0:
 11      7 6     5  4  3     0
┌─────────┬───────┬───┬───────┐
│ Amount  │ Shift │ 0 │  Rm   │ Immediate shift amount (5 bits, 11-7)
└─────────┴───────┴───┴───────┘
 11    8  7  6     5  4  3     0
┌───────┬───┬───────┬───┬───────┐
│  Rs   │ 0 │ Shift │ 1 │  Rm   │ Register-specified shift amount (low byte of Rs)
└───────┴───┴───────┴───┴───────┘
Shift types (bits 6-5):
00 = LSL (Logical Shift Left)
01 = LSR (Logical Shift Right)
10 = ASR (Arithmetic Shift Right - sign extend)
11 = ROR (Rotate Right)
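Special cases (immediate shift amount form only):
shift=00, amount=0 means no shift (carry unchanged)
shift=01/10, amount=0 means shift by 32
shift=11, amount=0 means RRX (rotate right by one bit through carry)
With a register-specified amount, a value of 0 in Rs leaves the operand and the carry unchanged for every shift type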
Immediate encoding (when I=1):
Bits 11-0:
11 8 7 0
┌─────┬───────┐
│ Rot │ Imm8 │
└─────┴───────┘
Value = Imm8 rotated right by (Rot * 2) positions
Example: Rot=0x0F, Imm8=0x01
Rotate 0x01 right by 30 bits = 0x00000004
(Or equivalently, rotate left by 2)
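When writing test programs it helps to go the other way and ask whether a constant can be encoded at all. The following is a minimal sketch of that inverse search, assuming only the Imm8/Rot scheme described above; `encode_arm_immediate` and `rotl32` are hypothetical names. It returns the first encoding it finds, so a value with several valid encodings (4 is both 0x004 and 0xF01) yields the one with the smallest rotation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by n bits, 0 <= n < 32 (n == 0 handled to avoid a shift by 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/*
 * Try to express `value` as Imm8 rotated right by (Rot * 2).
 * On success, stores the 12-bit field (Rot << 8 | Imm8) and returns true.
 */
bool encode_arm_immediate(uint32_t value, uint32_t *imm12) {
    for (unsigned rot = 0; rot < 16; rot++) {
        /* Undo the rotate-right by rotating left the same distance. */
        uint32_t imm8 = rotl32(value, rot * 2);
        if (imm8 <= 0xFF) {
            *imm12 = (rot << 8) | imm8;
            return true;
        }
    }
    return false; /* Not representable; real code would use a literal pool */
}
```

A constant such as 0x101 has set bits nine positions apart, so no 8-bit window covers it and the function returns false.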
Load/Store Instructions
Format: LDR{cond} Rd, [Rn, #offset] ; Load word
STR{cond} Rd, [Rn, #offset] ; Store word
LDRB, STRB ; Byte versions
Addressing modes:
[Rn] ; Base register only
[Rn, #imm] ; Base + immediate offset
[Rn, Rm] ; Base + register offset
[Rn, Rm, LSL #n] ; Base + shifted register
[Rn, #imm]! ; Pre-indexed with writeback
[Rn], #imm ; Post-indexed
Encoding:
Bit 25: 0 = immediate offset, 1 = register offset
Bit 24: 0 = post-indexed, 1 = pre-indexed (or offset)
Bit 23: 0 = subtract offset, 1 = add offset
Bit 22: 0 = word, 1 = byte
Bit 21: 0 = no writeback, 1 = writeback
Bit 20: 0 = store, 1 = load
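The guide refers to a `decode_load_store` step without showing it. As a hedged sketch of what that step has to extract, the function below reads the six control bits listed above plus the register and offset fields. `LoadStoreFields` and `decode_ldr_str` are illustrative names, not the project's actual API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct mirroring the bit list above. */
typedef struct {
    bool     register_offset; /* bit 25: 1 = offset comes from Rm (shifted) */
    bool     pre_indexed;     /* bit 24 */
    bool     add_offset;      /* bit 23 */
    bool     byte_access;     /* bit 22 */
    bool     writeback;       /* bit 21 */
    bool     is_load;         /* bit 20 */
    uint8_t  rn;              /* bits 19-16: base register */
    uint8_t  rd;              /* bits 15-12: source/destination register */
    uint16_t offset12;        /* bits 11-0: immediate, or shift fields plus Rm */
} LoadStoreFields;

/* Extract the single-register load/store fields from an instruction word. */
LoadStoreFields decode_ldr_str(uint32_t insn) {
    LoadStoreFields f;
    f.register_offset = (insn >> 25) & 1;
    f.pre_indexed     = (insn >> 24) & 1;
    f.add_offset      = (insn >> 23) & 1;
    f.byte_access     = (insn >> 22) & 1;
    f.writeback       = (insn >> 21) & 1;
    f.is_load         = (insn >> 20) & 1;
    f.rn              = (insn >> 16) & 0xF;
    f.rd              = (insn >> 12) & 0xF;
    f.offset12        = insn & 0xFFF;
    return f;
}
```

For example, `0xE5912004` decodes as `LDR R2, [R1, #4]` (pre-indexed, add, word, no writeback, load) and `0xE4C10001` as `STRB R0, [R1], #1` (post-indexed byte store).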
Branch Instructions
Format: B{cond} label ; Branch
BL{cond} label ; Branch with Link (function call)
Encoding (bits 27-25 = 101):
Bit 24: 0 = Branch, 1 = Branch with Link (saves return address in LR)
Bits 23-0: Signed 24-bit offset (in words)
The offset is:
1. Sign-extended to 32 bits
2. Left-shifted by 2 (word alignment)
3. Added to PC+8 (due to pipeline)
Target = PC + 8 + (SignExtend(offset) << 2)
Example: 0xEAFFFFFE = B . (branch to itself, an infinite loop)
Cond = 0xE (always)
Offset = 0xFFFFFE = -2 (signed)
Target = PC + 8 + (-2 * 4) = PC + 8 - 8 = PC
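The three steps above can be checked in isolation with a small pure function. This is a sketch under the assumption that the caller passes the address of the branch instruction itself; `branch_target` is a hypothetical helper name. It uses unsigned arithmetic throughout so that negative offsets wrap instead of relying on signed shifts.

```c
#include <stdint.h>

/* Target address of a B/BL instruction word located at `insn_addr`. */
uint32_t branch_target(uint32_t insn, uint32_t insn_addr) {
    uint32_t offset = insn & 0x00FFFFFFu;   /* bits 23-0 */
    if (offset & 0x00800000u) {
        offset |= 0xFF000000u;              /* sign-extend to 32 bits */
    }
    offset <<= 2;                           /* words to bytes */
    return insn_addr + 8 + offset;          /* PC reads as address + 8 */
}
```

With the example above, `branch_target(0xEAFFFFFE, 0x1000)` gives 0x1000, the address of the branch itself.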
Condition Code Evaluation
Every ARM instruction can be conditional:
bool condition_passed(uint32_t cpsr, uint8_t cond) {
bool N = (cpsr >> 31) & 1; // Negative
bool Z = (cpsr >> 30) & 1; // Zero
bool C = (cpsr >> 29) & 1; // Carry
bool V = (cpsr >> 28) & 1; // Overflow
switch (cond) {
case 0x0: return Z; // EQ: Equal
case 0x1: return !Z; // NE: Not equal
case 0x2: return C; // CS/HS: Carry set
case 0x3: return !C; // CC/LO: Carry clear
case 0x4: return N; // MI: Minus/negative
case 0x5: return !N; // PL: Plus/positive
case 0x6: return V; // VS: Overflow
case 0x7: return !V; // VC: No overflow
case 0x8: return C && !Z; // HI: Unsigned higher
case 0x9: return !C || Z; // LS: Unsigned lower or same
case 0xA: return N == V; // GE: Signed >=
case 0xB: return N != V; // LT: Signed <
case 0xC: return !Z && (N == V); // GT: Signed >
case 0xD: return Z || (N != V); // LE: Signed <=
case 0xE: return true; // AL: Always
case 0xF: return true; // 0xF: NV (never) in ARMv4, unconditional-instruction space in ARMv5+; treating it as AL is a simplification
}
return false;
}
Updating Flags
When S bit is set, update CPSR:
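// Uses ARM_CPU as defined under Data Structures (flags live in cpu->regs.cpsr)
void update_flags(ARM_CPU *cpu, uint32_t result, uint32_t op1, uint32_t op2,
                  bool is_add, bool *carry_out) {
    uint32_t cpsr = cpu->regs.cpsr;
    // N flag: bit 31 of result
    if (result & 0x80000000) cpsr |= CPSR_N;
    else cpsr &= ~CPSR_N;
    // Z flag: result is zero
    if (result == 0) cpsr |= CPSR_Z;
    else cpsr &= ~CPSR_Z;
    if (is_add) {
        // Plain addition (ADD, CMN): C is the unsigned carry out of bit 31.
        // This test assumes no carry-in; ADC needs a wider (64-bit) sum.
        if (result < op1) cpsr |= CPSR_C;
        else cpsr &= ~CPSR_C;
        // V flag: operands share a sign, but the result has the other sign
        bool sign1 = (op1 & 0x80000000) != 0;
        bool sign2 = (op2 & 0x80000000) != 0;
        bool signr = (result & 0x80000000) != 0;
        if ((sign1 == sign2) && (sign1 != signr)) cpsr |= CPSR_V;
        else cpsr &= ~CPSR_V;
    } else if (carry_out) {
        // Logical operations: C is the barrel shifter's carry-out, V is unchanged
        if (*carry_out) cpsr |= CPSR_C;
        else cpsr &= ~CPSR_C;
    }
    // Subtraction (SUB, RSB, SBC, RSC, CMP) is not handled by the is_add path:
    // there C = NOT borrow (set when no borrow occurs), and V is set when the
    // operands have different signs and the result's sign differs from the minuend.
    cpu->regs.cpsr = cpsr;
}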
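One common way to get C and V right for every arithmetic opcode is to treat each of them as a 33-bit addition with carry-in, using the identity a - b = a + ~b + 1 for subtraction. The sketch below shows that approach in isolation; `AluResult`, `add_with_carry`, and `sub_flags` are hypothetical names and this is one possible formulation, not the only one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of an arithmetic operation together with the four flags. */
typedef struct {
    uint32_t result;
    bool n, z, c, v;
} AluResult;

/* Compute a + b + carry_in and derive NZCV from the 64-bit sum. */
AluResult add_with_carry(uint32_t a, uint32_t b, uint32_t carry_in) {
    uint64_t wide = (uint64_t)a + b + (carry_in & 1);
    AluResult r;
    r.result = (uint32_t)wide;
    r.n = (r.result >> 31) & 1;
    r.z = (r.result == 0);
    r.c = (wide >> 32) & 1;  /* carry out of bit 31 */
    /* Overflow: both inputs share a sign and the result's sign differs. */
    r.v = ((~(a ^ b) & (a ^ r.result)) >> 31) & 1;
    return r;
}

/* SUB/CMP: a - b == a + ~b + 1, so C ends up meaning "no borrow". */
AluResult sub_flags(uint32_t a, uint32_t b) {
    return add_with_carry(a, ~b, 1);
}
```

`sub_flags(3, 5)` yields N=1 Z=0 C=0 V=0, which is the flag state a `CMP R2, #5` with R2 = 3 should produce. ADC, SBC, and RSC follow by passing the current C flag as `carry_in`.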
Why This Matters
Building an ARM emulator is valuable for:
- Deep understanding: No way to fake knowledge—every instruction must work
- Tool development: Debuggers, profilers, binary analysis
- Education: Teach others how CPUs work
- Security research: Analyze ARM binaries without hardware
- Vintage computing: Emulate classic ARM systems
- Interview preparation: Demonstrates mastery of computer architecture
Project Specification
What You Will Build
An ARM emulator with:
- Full ARMv4/v5 instruction support: Data processing, load/store, branches
- Memory system: Configurable size, read/write operations
- Verbose execution mode: Show each instruction’s effect
- Debugging features: Breakpoints, single-step, register/memory inspection
- ELF loader: Load real ARM binaries
- I/O emulation: Basic UART for printf debugging
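The UART item can start out very small: a single write-only transmit register that the memory system intercepts before touching RAM. The sketch below is an assumption-laden illustration, not a model of any real UART. The address `UART_TX_ADDR`, the `Bus` struct, and `bus_write8` are all invented for the example, and output is captured in a buffer (rather than printed) so it can be inspected in tests.

```c
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE     256u         /* tiny RAM, just for the sketch */
#define UART_TX_ADDR 0x10000000u  /* hypothetical transmit register address */

typedef struct {
    uint8_t ram[RAM_SIZE];
    char    uart_log[64];         /* captured output, NUL-terminated */
    size_t  uart_len;
} Bus;

/* Write one byte. Returns 0 on success, -1 for an unmapped address. */
int bus_write8(Bus *bus, uint32_t addr, uint8_t value) {
    if (addr == UART_TX_ADDR) {
        /* Device write: append to the log, dropping bytes once it is full. */
        if (bus->uart_len + 1 < sizeof bus->uart_log) {
            bus->uart_log[bus->uart_len++] = (char)value;
            bus->uart_log[bus->uart_len] = '\0';
        }
        return 0;
    }
    if (addr < RAM_SIZE) {
        bus->ram[addr] = value;
        return 0;
    }
    return -1;
}
```

A guest program would then print a character with a plain `STRB` to the UART address; an interactive build could call `putchar` at the same point instead of logging.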
Functional Requirements
- CPU Emulation:
- 16 general-purpose registers (R0-R15)
- CPSR with NZCV flags
- User mode execution (modes optional)
- Correct PC+8 handling
- Instruction Support (minimum):
- Data processing: AND, EOR, SUB, RSB, ADD, ADC, SBC, RSC, TST, TEQ, CMP, CMN, ORR, MOV, BIC, MVN
- Load/Store: LDR, STR, LDRB, STRB
- Branches: B, BL
- Barrel shifter: All shift types
- Multiply: MUL, MLA (optional)
- Memory System:
- Configurable size (default 64KB-1MB)
- Word, halfword, byte access
- Basic ROM/RAM regions
- Debug Features:
- Verbose mode: Print each instruction
- Single-step execution
- Breakpoints
- Register dump
- Memory dump
Non-Functional Requirements
- Correctness: Pass ARM test suites
- Performance: Run simple programs at reasonable speed
- Debuggability: Clear error messages, state inspection
- Portability: Build on Linux/macOS/Windows
Real World Outcome
$ ./arm-emu -v program.bin
ARM Emulator v1.0
Loading binary: program.bin (256 bytes)
Memory: 64KB @ 0x00000000
[0x00000000] E3A00001 MOV R0, #1
R0: 0x00000000 → 0x00000001
[0x00000004] E3A01002 MOV R1, #2
R1: 0x00000000 → 0x00000002
[0x00000008] E0802001 ADD R2, R0, R1
R2: 0x00000000 → 0x00000003
[0x0000000C] E3520005 CMP R2, #5
CPSR: N=1 Z=0 C=0 V=0 (3 < 5)
[0x00000010] AA000002 BGE 0x00000020
Branch NOT taken (N != V)
[0x00000014] E2822001 ADD R2, R2, #1
R2: 0x00000003 → 0x00000004
... execution continues ...
=== Execution Complete ===
Cycles: 47
Instructions: 42
Final state:
R0=0x00000005 R1=0x00000002 R2=0x00000007 R3=0x00000000
R4=0x00000000 R5=0x00000000 R6=0x00000000 R7=0x00000000
R8=0x00000000 R9=0x00000000 R10=0x00000000 R11=0x00000000
R12=0x00000000 SP=0x00010000 LR=0x00000000 PC=0x00000038
CPSR=0x60000010 [nZCv, User mode]
Interactive debug mode:
$ ./arm-emu -d program.bin
ARM Emulator Debug Mode
Type 'help' for commands.
(emu) run
Stopped at breakpoint 0x00000010
(emu) regs
R0=0x00000001 R1=0x00000002 R2=0x00000003 R3=0x00000000
...
PC=0x00000010 CPSR=0x80000010 [Nzcv]
(emu) step
[0x00000010] AA000002 BGE 0x00000020
Branch NOT taken
(emu) mem 0x00001000 16
00001000: 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A 00 00 Hello, World!...
(emu) break 0x00000030
Breakpoint set at 0x00000030
(emu) continue
Stopped at breakpoint 0x00000030
(emu) quit
Solution Architecture
High-Level Design
┌─────────────────────────────────────────────────────────────────────┐
│ ARM Emulator │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Loader │──▶│ CPU │──▶│ Debug │ │
│ │ (ELF/bin) │ │ Execute │ │ Interface │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Decoder │ │ Memory │ │ Disasm │ │
│ │ (parse insn)│ │ System │ │ (for debug) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ ▲ │
│ │ │ │
│ ▼ │ │
│ ┌──────────────┐ │ │
│ │ ALU │─────────┘ │
│ │ (operations) │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Key Components
| Component | Responsibility | Key Functions |
|---|---|---|
| Loader | Load binaries into memory | load_binary, load_elf |
| CPU Core | Main execution loop | run, step, reset |
| Decoder | Parse instruction bits | decode_insn, identify type |
| ALU | Arithmetic/logic operations | exec_data_proc, barrel_shift |
| Memory | Read/write emulated memory | mem_read, mem_write |
| Disassembler | Convert instructions to text | disassemble |
| Debug | User interaction | set_breakpoint, dump_regs |
Data Structures
// CPU state
typedef struct {
uint32_t r[16]; // R0-R15 (R13=SP, R14=LR, R15=PC)
uint32_t cpsr; // Current Program Status Register
uint32_t spsr; // Saved PSR (for exceptions)
// Banked registers for modes (optional for basic emulator)
uint32_t r_fiq[7]; // R8-R14 for FIQ mode
uint32_t r_svc[2]; // R13, R14 for Supervisor mode
uint32_t r_abt[2]; // R13, R14 for Abort mode
uint32_t r_irq[2]; // R13, R14 for IRQ mode
uint32_t r_und[2]; // R13, R14 for Undefined mode
} ARM_Registers;
typedef struct {
ARM_Registers regs;
uint8_t *memory; // Emulated memory
size_t mem_size;
uint64_t cycles; // Cycle counter
uint64_t insn_count; // Instruction counter
bool halted; // CPU halted
// Debug state
uint32_t breakpoints[16];
int num_breakpoints;
bool verbose; // Print each instruction
bool single_step;
} ARM_CPU;
// Decoded instruction (intermediate representation)
typedef struct {
uint8_t cond; // Condition code
uint8_t type; // Instruction type
uint8_t opcode; // Operation code
uint8_t rn, rd, rs, rm; // Register operands
uint32_t imm; // Immediate value
uint8_t shift_type; // Shift type
uint8_t shift_amount; // Shift amount
bool s_bit; // Update flags?
bool i_bit; // Immediate operand?
bool p_bit; // Pre/post indexing
bool u_bit; // Add/subtract offset
bool b_bit; // Byte/word
bool w_bit; // Writeback
bool l_bit; // Load/store
} DecodedInsn;
// CPSR flags
#define CPSR_N (1u << 31) // Negative (unsigned constant: 1 << 31 overflows a signed int)
#define CPSR_Z (1u << 30) // Zero
#define CPSR_C (1u << 29) // Carry
#define CPSR_V (1u << 28) // Overflow
#define CPSR_MODE_MASK 0x1F
Main Execution Loop
void cpu_run(ARM_CPU *cpu) {
while (!cpu->halted) {
// Fetch
uint32_t pc = cpu->regs.r[15];
uint32_t insn = mem_read32(cpu, pc - 8); // PC is ahead due to pipeline
if (cpu->halted) break; // Fetch faulted (e.g. PC out of bounds): do not execute the dummy word
// Decode condition
uint8_t cond = insn >> 28;
// Check condition
if (!condition_passed(cpu->regs.cpsr, cond)) {
cpu->regs.r[15] += 4; // Skip instruction
cpu->cycles++;
continue;
}
// Decode and execute
DecodedInsn decoded;
decode_instruction(insn, &decoded);
if (cpu->verbose) {
print_instruction(cpu, pc - 8, insn, &decoded);
}
execute_instruction(cpu, &decoded);
// Check for breakpoints
if (is_breakpoint(cpu, cpu->regs.r[15] - 8)) {
cpu->halted = true;
printf("Breakpoint at 0x%08X\n", cpu->regs.r[15] - 8);
}
cpu->insn_count++;
cpu->cycles++;
if (cpu->single_step) {
cpu->halted = true;
}
}
}
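The loop above calls `is_breakpoint` without defining it. With at most 16 entries, a linear scan over a fixed array is enough. The sketch below keeps the table in its own struct so it stands alone; `BreakpointTable`, `bp_contains`, `bp_add`, and `bp_remove` are hypothetical names, and in the emulator the same logic would operate on the `breakpoints` and `num_breakpoints` fields of `ARM_CPU`.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BREAKPOINTS 16

typedef struct {
    uint32_t addrs[MAX_BREAKPOINTS];
    int      count;
} BreakpointTable;

/* True if `addr` is currently a breakpoint. */
bool bp_contains(const BreakpointTable *t, uint32_t addr) {
    for (int i = 0; i < t->count; i++) {
        if (t->addrs[i] == addr) return true;
    }
    return false;
}

/* Add a breakpoint. Returns false only when the table is full. */
bool bp_add(BreakpointTable *t, uint32_t addr) {
    if (bp_contains(t, addr)) return true;      /* already set */
    if (t->count >= MAX_BREAKPOINTS) return false;
    t->addrs[t->count++] = addr;
    return true;
}

/* Remove a breakpoint. Returns false if it was not set. */
bool bp_remove(BreakpointTable *t, uint32_t addr) {
    for (int i = 0; i < t->count; i++) {
        if (t->addrs[i] == addr) {
            t->addrs[i] = t->addrs[--t->count]; /* fill the gap with the last entry */
            return true;
        }
    }
    return false;
}
```

Order is not preserved on removal, which is fine for a membership test that runs once per executed instruction.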
Instruction Dispatch
void execute_instruction(ARM_CPU *cpu, DecodedInsn *insn) {
uint32_t type = insn->type;
switch (type) {
case INSN_DATA_PROC:
exec_data_processing(cpu, insn);
break;
case INSN_MULTIPLY:
exec_multiply(cpu, insn);
break;
case INSN_LOAD_STORE:
exec_load_store(cpu, insn);
break;
case INSN_LOAD_STORE_MULTI:
exec_load_store_multiple(cpu, insn);
break;
case INSN_BRANCH:
exec_branch(cpu, insn);
break;
case INSN_SWI:
exec_swi(cpu, insn);
break;
default:
printf("Unimplemented instruction type: %u\n", (unsigned)type);
cpu->halted = true;
}
}
Implementation Guide
Development Environment Setup
# Required tools
sudo apt-get install gcc gdb
# ARM cross-compiler for creating test binaries
sudo apt-get install gcc-arm-none-eabi
# Create project structure
mkdir -p arm-emu/{src,include,tests,binaries}
cd arm-emu
Project Structure
arm-emu/
├── src/
│ ├── main.c # Entry point, CLI
│ ├── cpu.c # CPU state, execution loop
│ ├── decode.c # Instruction decoder
│ ├── execute.c # Instruction execution
│ ├── alu.c # ALU operations, barrel shifter
│ ├── memory.c # Memory system
│ ├── loader.c # Binary/ELF loader
│ ├── disasm.c # Disassembler
│ └── debug.c # Debug interface
├── include/
│ ├── cpu.h
│ ├── decode.h
│ ├── memory.h
│ └── debug.h
├── tests/
│ ├── test_alu.c # ALU unit tests
│ ├── test_decode.c # Decoder tests
│ └── programs/ # Test ARM programs
│ ├── simple.s # Simple test
│ ├── loop.s # Loop test
│ └── fibonacci.s # Fibonacci
├── Makefile
└── README.md
Implementation Phases
Phase 1: CPU State and Memory (Days 1-4)
Goals:
- Define CPU state structure
- Implement memory system
- Create basic initialization
Tasks:
- Define ARM_CPU structure with registers
- Implement mem_read8/16/32 and mem_write8/16/32
- Implement cpu_init() and cpu_reset()
- Load raw binary into memory
- Test with memory read/write
Key code:
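void cpu_reset(ARM_CPU *cpu); // Defined below

ARM_CPU *cpu_create(size_t mem_size) {
    ARM_CPU *cpu = calloc(1, sizeof(ARM_CPU));
    if (!cpu) return NULL;
    cpu->memory = calloc(mem_size, 1);
    if (!cpu->memory) {
        free(cpu);
        return NULL;
    }
    cpu->mem_size = mem_size;
    cpu_reset(cpu);
    return cpu;
}
void cpu_reset(ARM_CPU *cpu) {
    memset(&cpu->regs, 0, sizeof(cpu->regs));
    cpu->regs.cpsr = 0x10; // User mode
    cpu->regs.r[15] = 8;   // PC starts at 0, but reads as 8 (pipeline)
    cpu->halted = false;
    cpu->cycles = 0;
    cpu->insn_count = 0;
}
uint32_t mem_read32(ARM_CPU *cpu, uint32_t addr) {
    // Compare in 64 bits so that addr + 4 cannot wrap around near 0xFFFFFFFF
    if ((uint64_t)addr + 4 > cpu->mem_size) {
        printf("Memory read out of bounds: 0x%08X\n", addr);
        cpu->halted = true;
        return 0;
    }
    // Assemble the word byte by byte: little-endian on any host,
    // and no alignment requirement on the host pointer
    const uint8_t *p = cpu->memory + addr;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
void mem_write32(ARM_CPU *cpu, uint32_t addr, uint32_t value) {
    if ((uint64_t)addr + 4 > cpu->mem_size) {
        printf("Memory write out of bounds: 0x%08X\n", addr);
        cpu->halted = true;
        return;
    }
    // Store least significant byte first (little-endian)
    uint8_t *p = cpu->memory + addr;
    p[0] = value & 0xFF;
    p[1] = (value >> 8) & 0xFF;
    p[2] = (value >> 16) & 0xFF;
    p[3] = (value >> 24) & 0xFF;
}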
Checkpoint: CPU initializes, memory read/write works.
Phase 2: Instruction Decoder (Days 5-10)
Goals:
- Decode all instruction types
- Handle barrel shifter encoding
- Parse immediate values
Tasks:
- Implement condition extraction
- Implement instruction type identification (bits 27-25)
- Decode data processing instructions
- Decode load/store instructions
- Decode branch instructions
- Implement barrel shifter decode
Key code:
void decode_instruction(uint32_t insn, DecodedInsn *decoded) {
    decoded->cond = (insn >> 28) & 0xF;
    uint8_t bits_27_25 = (insn >> 25) & 0x7;
    uint8_t bit_4 = (insn >> 4) & 0x1;
    uint8_t bit_7 = (insn >> 7) & 0x1;
    // Determine instruction type (0b... literals need GCC/Clang or C23)
    if (bits_27_25 == 0b000 && bit_4 == 1 && bit_7 == 1) {
        // Never data processing: multiplies, SWP, and halfword/signed transfers live here
        if ((insn & 0x0F0000F0) == 0x00000090) {
            // Bits 27-24 = 0000 and bits 7-4 = 1001: MUL, MLA, and the long multiplies
            decoded->type = INSN_MULTIPLY;
            decode_multiply(insn, decoded);
        } else {
            // SWP, LDRH/STRH, LDRSB/LDRSH: not covered by this guide's decoder
            decoded->type = INSN_UNDEFINED;
        }
    } else if (bits_27_25 == 0b000 || bits_27_25 == 0b001) {
        decoded->type = INSN_DATA_PROC;
        decode_data_proc(insn, decoded);
    } else if (bits_27_25 == 0b010 || bits_27_25 == 0b011) {
        decoded->type = INSN_LOAD_STORE;
        decode_load_store(insn, decoded);
    } else if (bits_27_25 == 0b100) {
        decoded->type = INSN_LOAD_STORE_MULTI;
        decode_ldm_stm(insn, decoded);
    } else if (bits_27_25 == 0b101) {
        decoded->type = INSN_BRANCH;
        decode_branch(insn, decoded);
    } else if (bits_27_25 == 0b111 && ((insn >> 24) & 1)) {
        // Bits 27-24 = 1111 is SWI; with bit 24 clear it is a coprocessor instruction
        decoded->type = INSN_SWI;
        decode_swi(insn, decoded);
    } else {
        decoded->type = INSN_UNDEFINED;
    }
}
void decode_data_proc(uint32_t insn, DecodedInsn *decoded) {
decoded->i_bit = (insn >> 25) & 1;
decoded->opcode = (insn >> 21) & 0xF;
decoded->s_bit = (insn >> 20) & 1;
decoded->rn = (insn >> 16) & 0xF;
decoded->rd = (insn >> 12) & 0xF;
if (decoded->i_bit) {
// Immediate operand
uint32_t imm8 = insn & 0xFF;
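uint32_t rot = ((insn >> 8) & 0xF) * 2;
// Rotate right by rot. rot == 0 is handled separately because shifting a
// 32-bit value by 32 is undefined behavior in C.
decoded->imm = rot ? ((imm8 >> rot) | (imm8 << (32 - rot))) : imm8;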
} else {
// Register operand with shift
decoded->rm = insn & 0xF;
decoded->shift_type = (insn >> 5) & 0x3;
if ((insn >> 4) & 1) {
// Register-specified shift
decoded->rs = (insn >> 8) & 0xF;
decoded->shift_amount = 0; // Will read from Rs
} else {
// Immediate shift
decoded->shift_amount = (insn >> 7) & 0x1F;
decoded->rs = 0xFF; // Not used
}
}
}
Checkpoint: Decoder correctly parses all instruction types.
Phase 3: Data Processing Execution (Days 11-18)
Goals:
- Implement all 16 data processing opcodes
- Implement barrel shifter
- Implement flag updates
Tasks:
- Implement barrel_shift() for all shift types
- Implement each opcode (AND, EOR, SUB, etc.)
- Implement flag calculation
- Handle PC as destination (branch-like behavior)
- Test with simple assembly programs
Key code:
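uint32_t barrel_shift(ARM_CPU *cpu, uint32_t value, uint8_t shift_type,
                      uint8_t shift_amount, bool *carry_out) {
    if (shift_amount == 0) {
        // Immediate-form encodings of a zero amount. A register-specified
        // amount of 0 means "no shift" and must be handled by the caller.
        switch (shift_type) {
        case 0: return value;                               // LSL #0: no shift, carry unchanged
        case 1: *carry_out = (value >> 31) & 1; return 0;   // LSR #32
        case 2: *carry_out = (value >> 31) & 1;             // ASR #32
                return (value & 0x80000000) ? 0xFFFFFFFF : 0;
        default: {                                          // RRX
            uint32_t carry_in = (cpu->regs.cpsr & CPSR_C) ? 0x80000000 : 0;
            *carry_out = value & 1;
            return carry_in | (value >> 1);
        }
        }
    }
    // Amounts of 32 or more only occur with a register-specified shift.
    // C leaves shifts by >= 32 undefined, so they are handled explicitly.
    switch (shift_type) {
    case 0: // LSL: carry is the last bit shifted out of the top
        if (shift_amount > 32) { *carry_out = false; return 0; }
        if (shift_amount == 32) { *carry_out = value & 1; return 0; }
        *carry_out = (value >> (32 - shift_amount)) & 1;
        return value << shift_amount;
    case 1: // LSR: carry is bit (shift_amount - 1)
        if (shift_amount > 32) { *carry_out = false; return 0; }
        if (shift_amount == 32) { *carry_out = (value >> 31) & 1; return 0; }
        *carry_out = (value >> (shift_amount - 1)) & 1;
        return value >> shift_amount;
    case 2: // ASR: sign bit fills from the top
        if (shift_amount >= 32) {
            *carry_out = (value >> 31) & 1;
            return (value & 0x80000000) ? 0xFFFFFFFF : 0;
        }
        *carry_out = (value >> (shift_amount - 1)) & 1;
        // Signed right shift is arithmetic on mainstream compilers
        return (uint32_t)((int32_t)value >> shift_amount);
    default: // ROR: only the low 5 bits of the amount matter
        shift_amount &= 31;
        if (shift_amount == 0) { *carry_out = (value >> 31) & 1; return value; }
        *carry_out = (value >> (shift_amount - 1)) & 1;
        return (value >> shift_amount) | (value << (32 - shift_amount));
    }
}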
void exec_data_processing(ARM_CPU *cpu, DecodedInsn *insn) {
    uint32_t op1 = cpu->regs.r[insn->rn];
    uint32_t op2;
    uint32_t c_in = (cpu->regs.cpsr & CPSR_C) ? 1 : 0;
    bool carry_out = (c_in != 0); // Shifter carry defaults to the current C flag
    // Get operand 2
    if (insn->i_bit) {
        op2 = insn->imm;
    } else {
        uint32_t rm_val = cpu->regs.r[insn->rm];
        if (insn->rs != 0xFF) {
            // Register-specified shift: the amount is the low byte of Rs, and
            // an amount of 0 leaves both the value and the carry unchanged
            uint8_t shift_amt = cpu->regs.r[insn->rs] & 0xFF;
            op2 = shift_amt
                ? barrel_shift(cpu, rm_val, insn->shift_type, shift_amt, &carry_out)
                : rm_val;
        } else {
            op2 = barrel_shift(cpu, rm_val, insn->shift_type,
                               insn->shift_amount, &carry_out);
        }
    }
    // Arithmetic opcodes are all computed as a + b + cin in 64 bits.
    // Subtraction uses the identity x - y = x + ~y + 1.
    uint32_t a = 0, b = 0, cin = 0, result = 0;
    bool write_result = true;
    bool is_arithmetic = true;
    switch (insn->opcode) {
    case 0x0: result = op1 & op2; is_arithmetic = false; break;  // AND
    case 0x1: result = op1 ^ op2; is_arithmetic = false; break;  // EOR
    case 0x2: a = op1; b = ~op2; cin = 1; break;                 // SUB
    case 0x3: a = op2; b = ~op1; cin = 1; break;                 // RSB
    case 0x4: a = op1; b = op2; cin = 0; break;                  // ADD
    case 0x5: a = op1; b = op2; cin = c_in; break;               // ADC
    case 0x6: a = op1; b = ~op2; cin = c_in; break;              // SBC
    case 0x7: a = op2; b = ~op1; cin = c_in; break;              // RSC
    case 0x8: result = op1 & op2; is_arithmetic = false;
              write_result = false; break;                       // TST
    case 0x9: result = op1 ^ op2; is_arithmetic = false;
              write_result = false; break;                       // TEQ
    case 0xA: a = op1; b = ~op2; cin = 1; write_result = false; break; // CMP
    case 0xB: a = op1; b = op2; cin = 0; write_result = false; break;  // CMN
    case 0xC: result = op1 | op2; is_arithmetic = false; break;  // ORR
    case 0xD: result = op2; is_arithmetic = false; break;        // MOV
    case 0xE: result = op1 & ~op2; is_arithmetic = false; break; // BIC
    case 0xF: result = ~op2; is_arithmetic = false; break;       // MVN
    default: return;
    }
    bool overflow = false;
    if (is_arithmetic) {
        uint64_t wide = (uint64_t)a + b + cin;
        result = (uint32_t)wide;
        carry_out = (wide >> 32) & 1; // For subtraction this is NOT borrow
        overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1;
    }
    // Update flags if S bit set
    if (insn->s_bit) {
        // is_add = false: update_flags sets N and Z, and takes C from carry_out
        update_flags(cpu, result, op1, op2, false, &carry_out);
        if (is_arithmetic) {
            if (overflow) cpu->regs.cpsr |= CPSR_V;
            else cpu->regs.cpsr &= ~CPSR_V;
        }
    }
    // Write result and advance PC
    if (write_result && insn->rd == 15) {
        // Writing to PC is a branch: keep the "address + 8" convention
        cpu->regs.r[15] = result + 8;
    } else {
        if (write_result) cpu->regs.r[insn->rd] = result;
        cpu->regs.r[15] += 4;
    }
}
Checkpoint: Simple ALU programs execute correctly.
Phase 4: Load/Store and Branches (Days 19-25)
Goals:
- Implement LDR, STR, LDRB, STRB
- Implement B, BL
- Handle all addressing modes
Tasks:
- Implement load/store with immediate offset
- Implement load/store with register offset
- Implement pre/post indexing
- Implement writeback
- Implement B and BL
- Test with loops and memory access
Key code:
void exec_load_store(ARM_CPU *cpu, DecodedInsn *insn) {
uint32_t base = cpu->regs.r[insn->rn];
uint32_t offset;
if (!insn->i_bit) {
// Immediate offset
offset = insn->imm;
} else {
// Register offset (may be shifted)
bool carry;
offset = barrel_shift(cpu, cpu->regs.r[insn->rm],
insn->shift_type, insn->shift_amount, &carry);
}
// Add or subtract offset
uint32_t addr;
if (insn->p_bit) {
// Pre-indexed
addr = insn->u_bit ? base + offset : base - offset;
} else {
// Post-indexed
addr = base;
}
if (insn->l_bit) {
// Load
uint32_t value;
if (insn->b_bit) {
value = mem_read8(cpu, addr);
} else {
value = mem_read32(cpu, addr);
}
cpu->regs.r[insn->rd] = value;
} else {
// Store
uint32_t value = cpu->regs.r[insn->rd];
if (insn->b_bit) {
mem_write8(cpu, addr, value & 0xFF);
} else {
mem_write32(cpu, addr, value);
}
}
// Writeback
if (insn->w_bit || !insn->p_bit) {
if (insn->p_bit) {
cpu->regs.r[insn->rn] = addr;
} else {
cpu->regs.r[insn->rn] = insn->u_bit ? base + offset : base - offset;
}
}
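if (insn->l_bit && insn->rd == 15) {
cpu->regs.r[15] += 8; // Loaded a branch target into PC: re-apply the +8 pipeline convention
} else {
cpu->regs.r[15] += 4;
}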
}
void exec_branch(ARM_CPU *cpu, DecodedInsn *insn) {
    // Sign-extend the 24-bit offset, then convert words to bytes.
    // Unsigned arithmetic avoids left-shifting a negative value (undefined in C).
    uint32_t offset = insn->imm & 0x00FFFFFF;
    if (offset & 0x00800000) {
        offset |= 0xFF000000; // Sign extend
    }
    offset <<= 2; // Word alignment
    if (insn->l_bit) {
        // Branch with Link: r[15] holds current + 8, the return address is current + 4
        cpu->regs.r[14] = cpu->regs.r[15] - 4;
    }
    // Target = (current + 8) + offset. r[15] must then read as target + 8,
    // because the fetch stage reads from r[15] - 8.
    cpu->regs.r[15] = cpu->regs.r[15] + offset + 8;
}
Checkpoint: Loops and subroutine calls work.
Phase 5: Debug Interface and Polish (Days 26-35)
Goals:
- Implement interactive debugger
- Add disassembler
- Create comprehensive tests
Tasks:
- Implement command parser
- Implement breakpoints
- Implement single-step
- Implement register/memory dump
- Add disassembly output
- Test with Fibonacci, etc.
Checkpoint: Can debug programs interactively, step through code.
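The command parser for the debug prompt does not need to be elaborate. The following is a minimal sketch built on `sscanf`, covering only the commands that appear in the sample session earlier in this guide. `Command`, `CmdKind`, and `parse_command` are hypothetical names, addresses are assumed to be hexadecimal, and the default dump length of 16 bytes is an arbitrary choice for the example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    CMD_UNKNOWN, CMD_RUN, CMD_STEP, CMD_CONTINUE,
    CMD_REGS, CMD_BREAK, CMD_MEM, CMD_QUIT
} CmdKind;

typedef struct {
    CmdKind  kind;
    uint32_t addr;   /* for break / mem */
    uint32_t count;  /* for mem: number of bytes */
} Command;

/* Parse one line typed at the (emu) prompt. */
Command parse_command(const char *line) {
    Command cmd = { CMD_UNKNOWN, 0, 0 };
    char word[16] = "";
    unsigned long a = 0, n = 0;
    /* %15s bounds the copy into `word`; %lx accepts an optional 0x prefix. */
    int fields = sscanf(line, "%15s %lx %lu", word, &a, &n);
    if (fields < 1) return cmd;  /* empty line */

    if (strcmp(word, "run") == 0)           cmd.kind = CMD_RUN;
    else if (strcmp(word, "step") == 0)     cmd.kind = CMD_STEP;
    else if (strcmp(word, "continue") == 0) cmd.kind = CMD_CONTINUE;
    else if (strcmp(word, "regs") == 0)     cmd.kind = CMD_REGS;
    else if (strcmp(word, "quit") == 0)     cmd.kind = CMD_QUIT;
    else if (strcmp(word, "break") == 0 && fields >= 2) {
        cmd.kind = CMD_BREAK;
        cmd.addr = (uint32_t)a;
    } else if (strcmp(word, "mem") == 0 && fields >= 2) {
        cmd.kind = CMD_MEM;
        cmd.addr = (uint32_t)a;
        cmd.count = (fields >= 3) ? (uint32_t)n : 16;
    }
    return cmd;
}
```

The debugger loop would read a line with `fgets`, pass it to `parse_command`, and dispatch on `kind`. Commands that require an address but lack one fall through as `CMD_UNKNOWN`.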
Hints in Layers
Hint 1: PC Pipeline Offset
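Classic ARM cores such as the ARM7TDMI have a 3-stage pipeline (fetch, decode, execute), and later cores keep the same visible behavior for compatibility. When an instruction reads PC in ARM state, it gets the address of the current instruction + 8: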
// Instruction at 0x1000
MOV R0, PC ; R0 = 0x1008, not 0x1000!
// In emulator, when PC is operand:
uint32_t pc_value = cpu->regs.r[15]; // Already has +8 offset
// When fetching:
uint32_t insn = mem_read32(cpu, cpu->regs.r[15] - 8);
Hint 2: Immediate Rotation
ARM immediate values use 8 bits + 4-bit rotation:
// Bits 11-0 of instruction:
// [11:8] = rotate (r), [7:0] = immediate (i)
// value = i ROR (r * 2)
uint32_t decode_immediate(uint32_t bits) {
uint32_t imm8 = bits & 0xFF;
uint32_t rotate = ((bits >> 8) & 0xF) * 2;
if (rotate == 0) return imm8;
return (imm8 >> rotate) | (imm8 << (32 - rotate));
}
// Examples:
// 0x001 → 0x00000001 (no rotation)
// 0x101 → 0x40000000 (1 rotated right by 2)
// 0xF01 → 0x00000004 (1 rotated right by 30 = left by 2)
Hint 3: Carry Flag in Shifts
The barrel shifter produces a carry out that affects the C flag:
// For LSL, carry is the last bit shifted out:
// LSL #5: carry = bit 27 of original value
// For LSR/ASR, carry is bit (shift_amount - 1):
// LSR #1: carry = bit 0
// For ROR, carry is bit (shift_amount - 1):
// ROR #4: carry = bit 3
// Special: LSL #0 leaves carry unchanged
// Special: LSR/ASR #0 means shift by 32 (in immediate form)
// Special: ROR #0 means RRX (rotate right extended)
Hint 4: Testing Strategy
Create minimal test programs:
# test_add.s - Test ADD instruction
MOV R0, #5
MOV R1, #3
ADD R2, R0, R1 @ R2 should be 8
SWI #0 @ Halt (you define this)
# Assemble with:
arm-none-eabi-as -o test.o test_add.s
arm-none-eabi-ld -Ttext=0x0 -o test.elf test.o
arm-none-eabi-objcopy -O binary test.elf test.bin
Check each register value after execution.
Hint 5: SWI for Basic I/O
Implement SWI (Software Interrupt) for basic I/O:
void exec_swi(ARM_CPU *cpu, DecodedInsn *insn) {
uint32_t call_num = insn->imm & 0xFFFFFF;
switch (call_num) {
case 0: // Exit
cpu->halted = true;
break;
case 1: // Print char in R0
putchar(cpu->regs.r[0] & 0xFF);
break;
case 2: // Print number in R0
printf("%d", (int)cpu->regs.r[0]); // Cast: r[0] is uint32_t, %d expects int
break;
// Add more as needed
}
cpu->regs.r[15] += 4;
}
Hint 6: Debugging Your Emulator
When things go wrong:
// Add instruction tracing
void trace_instruction(ARM_CPU *cpu, uint32_t pc, uint32_t insn) {
printf("[%08X] %08X ", pc, insn);
disassemble(insn);
printf("\n");
printf(" R0=%08X R1=%08X R2=%08X R3=%08X\n",
cpu->regs.r[0], cpu->regs.r[1], cpu->regs.r[2], cpu->regs.r[3]);
printf(" CPSR=%08X [%c%c%c%c]\n", cpu->regs.cpsr,
(cpu->regs.cpsr & CPSR_N) ? 'N' : 'n',
(cpu->regs.cpsr & CPSR_Z) ? 'Z' : 'z',
(cpu->regs.cpsr & CPSR_C) ? 'C' : 'c',
(cpu->regs.cpsr & CPSR_V) ? 'V' : 'v');
}
Compare with QEMU or real hardware for reference.
Testing Strategy
Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Test individual functions | Barrel shifter, flag calculation |
| Decode | Verify instruction parsing | All instruction formats |
| Execute | Test each opcode | ADD, SUB, MOV, etc. |
| Integration | Run complete programs | Loops, subroutines |
| Comparison | Match real ARM behavior | Run same code on QEMU |
Critical Test Cases
// Test 1: Basic MOV
void test_mov(void) {
uint32_t insn = 0xE3A00042; // MOV R0, #0x42
ARM_CPU cpu;
cpu_init(&cpu);
load_word(&cpu, 0, insn);
cpu_step(&cpu);
assert(cpu.regs.r[0] == 0x42);
}
// Test 2: Barrel shifter
void test_shift(void) {
bool carry;
assert(barrel_shift(NULL, 0x1, 0, 4, &carry) == 0x10); // LSL #4
assert(barrel_shift(NULL, 0x10, 1, 4, &carry) == 0x1); // LSR #4
assert(barrel_shift(NULL, 0x80000000, 2, 4, &carry) == 0xF8000000); // ASR #4
}
// Test 3: Condition codes
void test_conditions(void) {
ARM_CPU cpu;
cpu_init(&cpu);
// Test EQ (Z set)
cpu.regs.cpsr = CPSR_Z;
assert(condition_passed(cpu.regs.cpsr, 0) == true);
assert(condition_passed(cpu.regs.cpsr, 1) == false); // NE
// Test signed comparisons
cpu.regs.cpsr = CPSR_N | CPSR_V; // N=V, so GE is true
assert(condition_passed(cpu.regs.cpsr, 0xA) == true); // GE
}
// Test 4: Fibonacci
void test_fibonacci(void) {
ARM_CPU cpu;
cpu_init(&cpu);
load_program(&cpu, "tests/fibonacci.bin");
cpu_run(&cpu);
// Expect R0 = 55 (10th Fibonacci number)
assert(cpu.regs.r[0] == 55);
}
Test ARM Programs
# fibonacci.s - Calculate Fibonacci(10)
MOV R0, #10 @ n
MOV R1, #0 @ fib(0)
MOV R2, #1 @ fib(1)
loop:
CMP R0, #0
BEQ done
ADD R3, R1, R2 @ next = fib(n-1) + fib(n-2)
MOV R1, R2 @ shift
MOV R2, R3
SUB R0, R0, #1
B loop
done:
MOV R0, R1 @ result in R0
SWI #0 @ halt
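To know what the emulator should produce, it can help to have a host-side reference that follows the assembly register by register. The sketch below mirrors the loop above in C; `fib_reference` is a hypothetical name and the local variables are named after the registers they stand for.

```c
#include <stdint.h>

/* Host-side mirror of fibonacci.s: returns the value left in R0 at the SWI. */
uint32_t fib_reference(uint32_t n) {
    uint32_t r0 = n;  /* MOV R0, #n */
    uint32_t r1 = 0;  /* MOV R1, #0 */
    uint32_t r2 = 1;  /* MOV R2, #1 */
    while (r0 != 0) {           /* CMP R0, #0 / BEQ done */
        uint32_t r3 = r1 + r2;  /* ADD R3, R1, R2 */
        r1 = r2;                /* MOV R1, R2 */
        r2 = r3;                /* MOV R2, R3 */
        r0 -= 1;                /* SUB R0, R0, #1 */
    }
    return r1;                  /* MOV R0, R1 */
}
```

`fib_reference(10)` returns 55, matching the value the Fibonacci test asserts on R0.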
Common Pitfalls & Debugging
Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| PC offset wrong | Branches go to wrong place | Remember PC+8 in ARM mode |
| Immediate decode wrong | Wrong values loaded | Test rotation separately |
| Carry not updated | Condition checks fail | Update C in logical ops too |
| Signed vs unsigned | Wrong overflow detection | Use correct cast in ASR |
| Byte order | Memory garbled | Ensure little-endian |
| Missing S-bit check | Flags update unexpectedly | Only update if S=1 |
Debugging Tips
- Single instruction tests: Test each instruction type in isolation
- Compare with QEMU: Run same code, compare register states
- Trace everything: Print before/after for each instruction
- Use known binaries: ARM toolchain’s test programs
- Check edge cases: Shift by 0, shift by 32, immediate 0
Extensions & Challenges
Beginner Extensions
- Thumb mode: Add 16-bit Thumb instruction support
- More verbose output: Show barrel shifter operations
- Instruction counts: Profile which instructions execute most
Intermediate Extensions
- Full multiply support: MUL, MLA, UMULL, SMULL
- Load/Store Multiple: LDM, STM
- Coprocessor stubs: Handle CP15 for system control
- ELF loader: Parse ELF files, set up sections
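For the ELF loader, a reasonable first step is to validate the file header and pull out the entry point before dealing with program headers. The sketch below assumes a 32-bit little-endian ELF image already read into memory and uses the standard ELF32 header offsets (`e_machine` at byte 18, `e_entry` at byte 24; `EM_ARM` is 40). `elf32_arm_entry` and the `rd16le`/`rd32le` helpers are hypothetical names. It does not load any segments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values without assuming host alignment or endianness. */
static uint16_t rd16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t rd32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Validate an ELF32 little-endian ARM header and fetch its entry point. */
bool elf32_arm_entry(const uint8_t *image, size_t size, uint32_t *entry) {
    if (size < 52) return false;                 /* ELF32 header is 52 bytes */
    if (image[0] != 0x7F || image[1] != 'E' ||
        image[2] != 'L'  || image[3] != 'F') return false;
    if (image[4] != 1) return false;             /* EI_CLASS: 32-bit */
    if (image[5] != 1) return false;             /* EI_DATA: little-endian */
    if (rd16le(image + 18) != 40) return false;  /* e_machine: EM_ARM */
    *entry = rd32le(image + 24);                 /* e_entry */
    return true;
}
```

On success the emulator would set PC from `*entry` (plus 8, under this guide's convention) after copying the loadable segments into emulated memory.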
Advanced Extensions
- ARM/Thumb interworking: BX instruction
- Processor modes: Implement all 7 modes with banked registers
- Exceptions: Implement exception handling
- Memory protection: Add simple MPU emulation
- JIT compilation: Translate hot paths to native code
The Interview Questions They’ll Ask
- “How does an ARM emulator work?”
- Fetch-decode-execute loop, condition checking, barrel shifter
- “What is the barrel shifter and why is it important?”
- Shifts or rotates operand2 within the same data processing instruction, so multiplying or dividing by a power of two and scaled indexing need no separate shift instruction
- “Explain ARM’s condition codes”
- NZCV flags, almost all instructions conditional, saves branches
- “Why is PC+8 instead of PC?”
- 3-stage pipeline (fetch, decode, execute): while an instruction executes, PC already holds the address being fetched, two instructions (8 bytes) ahead
- “How do you test an emulator for correctness?”
- Unit tests, comparison with real hardware, test suites
- “What’s the difference between emulation and simulation?”
- Emulation: functionally identical behavior
- Simulation: may model timing, power, etc.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| ARM instruction set | ARM Architecture Reference Manual | Section A |
| Emulator design | Computer Systems: A Programmer’s Perspective | Ch. 4 |
| CPU pipelines | Computer Organization and Design ARM Edition | Ch. 4 |
| Condition codes | The Art of ARM Assembly, Vol 1 | Ch. 4 |
| Memory systems | Computer Architecture (Hennessy) | Ch. 5 |
| Binary analysis | Practical Binary Analysis | Ch. 2-3 |
Self-Assessment Checklist
Understanding
- I can decode any ARM data processing instruction by hand
- I understand how the barrel shifter encodes shift type and amount
- I can evaluate condition codes given NZCV flag states
- I understand the PC+8 pipeline offset
- I can explain the difference between immediate and register operand encoding
Implementation
- All 16 data processing opcodes work correctly
- Barrel shifter handles all shift types including edge cases
- Load/store works with all addressing modes
- Branches calculate correct targets
- Flags are updated correctly when S bit is set
Testing
- Simple programs (ADD, MOV) pass
- Loop programs work
- Fibonacci computes correctly
- Results match QEMU or real ARM
Learning Milestones
- Simple programs run (MOV, ADD) → Basic decode/execute works
- Branches and loops work → Control flow is correct
- Fibonacci computes correctly → Arithmetic, flag updates, and conditional branches work together
- You can run real test binaries → Emulator is production-quality
This guide was expanded from LEARN_ARM_DEEP_DIVE.md. For the complete learning path, see the project index.