Project 8: ABI Conformance Audit
Audit binaries for ABI rule compliance.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3 |
| Time Estimate | 10-14 hours |
| Main Programming Language | Python |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Level 3 |
| Business Potential | Level 3 |
| Prerequisites | Concept 2: Registers & Calling Conventions, Concept 6: Toolchain & ELF |
| Key Topics | disassembly parsing, ABI checks |
1. Learning Objectives
By completing this project, you will:
- Translate ARM concepts into observable outputs you can verify.
- Explain why each toolchain or hardware step is necessary.
- Detect and fix at least one realistic failure mode.
- Communicate the result clearly in a technical review or interview.
2. All Theory Needed (Per-Concept Breakdown)
Registers & Calling Conventions
Fundamentals Registers are the CPU’s fastest storage and form the working set for computation. ARM profiles differ in register count and special-purpose roles, but all share the idea that function calls require a contract: how arguments and return values are passed, which registers must be preserved, and how the stack is organized. The procedure call standard defines this contract so independently compiled code can interoperate. In Cortex-M, the register file is small and heavily constrained by Thumb encoding, whereas AArch64 offers 31 general-purpose registers, each with a 64-bit (X) and a 32-bit (W) view. The stack pointer and link register govern call/return flow, and misusing them corrupts control flow even if individual instructions look correct.
Deep Dive The register file is the interface between the ISA and your mental model. On Cortex-M0+, you have 16 architectural registers (r0–r15), with r13 as the stack pointer, r14 as the link register (return address), and r15 as the program counter. Thumb encodings make low registers (r0–r7) more convenient, and this shapes how you allocate values and temporaries. On AArch64, the register file expands to 31 general-purpose registers (x0–x30), plus a dedicated SP. The lower 32-bit view (w0–w30) is an alias, not separate storage. This abundance reduces register pressure but increases the importance of ABI rules to maintain interoperability.
Calling conventions are not optional. They define which registers hold arguments, which registers are caller-saved or callee-saved, and how the stack is aligned. The AAPCS64 procedure call standard formalizes this for AArch64, and the same principle applies to Cortex-M via the 32-bit AAPCS, the procedure call standard within the ARM EABI. If a function trashes a callee-saved register, its caller silently continues with corrupted data; if it returns with an unbalanced stack pointer, a later return can load a garbage address. The stack is not merely a place to store locals: it is a control-flow structure with strict invariants. Typical invariants include: the stack pointer must remain aligned to a fixed boundary at call boundaries, return addresses must be preserved (often via LR), and exception frames must be compatible with hardware expectations.
The difference between M-profile and A-profile also affects how you reason about stack frames and interrupts. Cortex-M pushes a standard register frame on interrupt entry and uses special EXC_RETURN values to restore state. AArch64 exceptions are higher-level and interact with exception levels (EL0–EL3). This means the calling convention is entangled with the exception model: you must ensure that the register context saved by an ISR matches what the hardware expects, and that your handler preserves the right registers. Even if you never write an ISR, debugging requires you to recognize how the stack frame was laid out and which registers are live at a given moment.
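To make the hardware-stacked frame concrete, here is a minimal sketch that decodes the eight words a Cortex-M core pushes on exception entry. It assumes the basic frame (no floating-point state) and little-endian memory; `decode_exception_frame` is a hypothetical helper name, and the bytes would come from a memory dump taken at the stacked SP.

```python
import struct

# Order in which Cortex-M hardware stacks registers on exception entry
# (basic frame, no floating-point state), lowest address first.
FRAME_FIELDS = ("r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr")


def decode_exception_frame(raw: bytes) -> dict:
    """Decode the first 32 bytes at the stacked SP into named registers."""
    if len(raw) < 32:
        raise ValueError("basic exception frame is 8 words (32 bytes)")
    # Assumption: little-endian data, the usual Cortex-M configuration.
    values = struct.unpack("<8I", raw[:32])
    return dict(zip(FRAME_FIELDS, values))
```

The stacked `pc` is the address execution resumes at, which is usually the first value to inspect when a fault handler fires.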
Finally, registers are a performance and correctness story. Keeping hot values in registers avoids memory latency, but retaining too many values across calls increases spill overhead and complexity. The ABI is a compromise: caller-saved registers allow fast leaf functions to avoid stack usage, while callee-saved registers allow values to persist across deeper calls. Understanding this trade-off is what turns disassembly from a wall of text into a readable narrative.
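The caller-saved versus callee-saved split can be written down as a small lookup, which is also a natural starting point for the audit tool. The sketch below encodes the AAPCS64 roles of the general-purpose registers; `classify_register` is a hypothetical helper name, not part of any library.

```python
def classify_register(name: str) -> str:
    """Return the AAPCS64 role of an x/w register name such as 'x19' or 'w0'."""
    name = name.lower()
    if name[:1] not in ("x", "w") or not name[1:].isdigit():
        raise ValueError(f"not a general-purpose register: {name!r}")
    n = int(name[1:])
    if n <= 7:
        return "argument/result (caller-saved)"
    if n == 8:
        return "indirect result location (caller-saved)"
    if n <= 15:
        return "temporary (caller-saved)"
    if n <= 17:
        return "intra-procedure-call scratch (caller-saved)"
    if n == 18:
        return "platform register"
    if n <= 28:
        return "callee-saved"
    if n == 29:
        return "frame pointer (callee-saved)"
    if n == 30:
        return "link register"
    raise ValueError(f"register number out of range: {name!r}")
```

A `w` name maps to the same role as its `x` counterpart because both refer to the same storage.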
How this fits on projects
- Central to P02 (Register and Stack Visualizer) and P08 (ABI Conformance Audit, this project).
Definitions & key terms
- Register file: The set of architectural registers visible to instructions.
- Caller-saved: Registers a caller must preserve if it needs their values after a call.
- Callee-saved: Registers a callee must preserve across the call.
- Stack frame: A structured region on the stack holding locals and saved state.
- Procedure call standard: ABI rules for passing arguments and preserving registers.
Mental model diagram
Cortex-M0+ Register File:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
32 bits wide
◀──────────────────▶
┌───────────────────────────────┐
r0 │ General Purpose (argument 1) │ ─┐
├───────────────────────────────┤ │
r1 │ General Purpose (argument 2) │ │ Low registers:
├───────────────────────────────┤ │ - All Thumb instructions work
r2 │ General Purpose (argument 3) │ │ - Used for function arguments
├───────────────────────────────┤ │ and return values
r3 │ General Purpose (argument 4) │ │ - Caller-saved (scratch)
├───────────────────────────────┤ │
r4 │ General Purpose (preserved) │ │
├───────────────────────────────┤ │
r5 │ General Purpose (preserved) │ │
├───────────────────────────────┤ │
r6 │ General Purpose (preserved) │ │
├───────────────────────────────┤ │
r7 │ General Purpose (frame ptr) │ ─┘
├───────────────────────────────┤
r8 │ General Purpose (preserved) │ ─┐ High registers:
├───────────────────────────────┤ │ - Only some instructions work
r9 │ General Purpose (preserved) │ │ - Must move to low reg for
├───────────────────────────────┤ │ most operations
r10 │ General Purpose (preserved) │ │ - Callee-saved
├───────────────────────────────┤ │
r11 │ General Purpose (preserved) │ │
├───────────────────────────────┤ │
r12 │ Intra-Procedure Call scratch │ ─┘
├═══════════════════════════════┤ ──── SPECIAL REGISTERS ────
r13 │ Stack Pointer (SP) │ Points to top of stack
├───────────────────────────────┤ (actually 2 SPs: MSP and PSP)
r14 │ Link Register (LR) │ Return address for functions
├───────────────────────────────┤
r15 │ Program Counter (PC) │ Address of next instruction
└───────────────────────────────┘
┌───────────────────────────────┐
xPSR│ N│Z│C│V│ ... │ Exception # │ Program Status Register:
└─┬─┴─┴─┴─────────────────────┴─┘ N = Negative, Z = Zero
│ C = Carry, V = Overflow
└─ Condition flags set by
arithmetic operations
IMPORTANT: Cortex-M0+ (ARMv6-M) cannot encode an arbitrary 32-bit immediate
in one instruction. Load constants and addresses from a literal pool with a
PC-relative LDR, or build them up manually.

AArch64 Register File:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
64 bits wide (X registers)
◀──────────────────────────────────────────▶
│ 32 bits (W register alias) │
│ ◀─────────────────────┤
┌───────────────────────────────────────────────────────────────┐
x0 │ Argument 1 / Return value │ w0
├───────────────────────────────────────────────────────────────┤
x1 │ Argument 2 / Return value (for 128-bit returns) │ w1
├───────────────────────────────────────────────────────────────┤
x2 │ Argument 3 │ w2
├───────────────────────────────────────────────────────────────┤
... ...
├───────────────────────────────────────────────────────────────┤
x7 │ Argument 8 │ w7
├───────────────────────────────────────────────────────────────┤
x8 │ Indirect result location (for large struct returns) │ w8
├───────────────────────────────────────────────────────────────┤
x9 │ Temporary / Caller-saved │ w9
├───────────────────────────────────────────────────────────────┤
... │ x9-x15: Temporaries (caller-saved) │
├───────────────────────────────────────────────────────────────┤
x16 │ IP0 - Intra-procedure-call scratch (PLT, veneers) │ w16
├───────────────────────────────────────────────────────────────┤
x17 │ IP1 - Intra-procedure-call scratch │ w17
├───────────────────────────────────────────────────────────────┤
x18 │ Platform register (reserved on some OSes) │ w18
├───────────────────────────────────────────────────────────────┤
x19 │ Callee-saved (must preserve across calls) │ w19
├───────────────────────────────────────────────────────────────┤
... │ x19-x28: Callee-saved │
├═══════════════════════════════════════════════════════════════┤
x29 │ Frame Pointer (FP) │ w29
├───────────────────────────────────────────────────────────────┤
x30 │ Link Register (LR) - return address │ w30
├───────────────────────────────────────────────────────────────┤
SP │ Stack Pointer (not a GPR, dedicated register) │ wsp
├───────────────────────────────────────────────────────────────┤
PC │ Program Counter (not directly accessible like ARM32!) │
└───────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────┐
XZR │ Zero Register (reads as 0, writes discarded) │ wzr
└───────────────────────────────────────────────────────────────┘
^ Supplies a constant zero operand, so no register has to be cleared first
SIMD/Floating-Point Registers (32 × 128-bit):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌───────────────────────────────────────────────────────────────┐
v0 │ B0│B1│B2│...│B15│ ← 16 bytes = 128 bits (Q0/V0) │
│ H0│H1│...│H7 │ ← 8 halfwords │
│ S0│S1│S2│S3 │ ← 4 singles (float) │
│ D0│D1 │ ← 2 doubles │
└───────────────────────────────────────────────────────────────┘
Used for: floating-point, SIMD (NEON), and crypto operations
Key Differences from ARM32:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• 31 GPRs (vs 16) → Less register pressure, fewer spills
• PC not directly readable/writable → Use ADR/ADRP for addresses
• Zero register (xzr/wzr) → MOV x0, xzr instead of MOV r0, #0
• No conditional execution → Use CSEL, CSINC instead
• 64-bit addresses → Can address all of RAM directly
• All instructions 32-bit → No 16-bit Thumb encoding

How it works (step-by-step, with invariants and failure modes)
- Arguments are placed in registers according to the ABI contract.
- Callee preserves its required registers and sets up a stack frame if needed.
- Return value is placed in the agreed register(s) before restoring SP and LR.
- Failure mode: If SP alignment is violated or callee-saved registers are clobbered, returns jump to wrong addresses or data corrupts silently.
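The alignment invariant in the steps above is easy to check mechanically. The following sketch assumes the usual requirements at public interfaces (8 bytes for the 32-bit AAPCS, 16 bytes for AAPCS64); the function names and the `abi` keys are hypothetical.

```python
# Required SP alignment in bytes at public interfaces.
STACK_ALIGNMENT = {"aapcs32": 8, "aapcs64": 16}


def sp_is_aligned(sp: int, abi: str) -> bool:
    """Return True if sp satisfies the ABI's call-boundary alignment."""
    return sp % STACK_ALIGNMENT[abi] == 0


def aligned_frame_size(size: int, abi: str) -> int:
    """Round a frame size up so SP stays aligned after allocation."""
    alignment = STACK_ALIGNMENT[abi]
    return (size + alignment - 1) // alignment * alignment
```

An audit can apply `sp_is_aligned` to the immediate in a prologue's `sub sp, sp, #N` to flag frames that break alignment before any call is made.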
Minimal concrete example (pseudo-assembly, not runnable)
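CALL f(a, b):
ARG0 <- a ; caller places the first argument
ARG1 <- b ; caller places the second argument
LR <- return_address ; written by the branch-with-link instruction
SP <- SP - frame_size ; callee prologue allocates its frame
...
return: SP <- SP + frame_size, jump LR ; callee epilogue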
Common misconceptions
- “More registers always mean simpler code” → ABI rules still constrain usage.
- “Stack is just a data structure” → It also encodes control flow.
Check-your-understanding questions
- Why does the ABI require callee-saved registers?
- What happens if a function returns without restoring SP?
- Why does AArch64 expose 31 GPRs instead of 16?
Check-your-understanding answers
- It lets callers rely on certain registers surviving across calls, enabling composition.
- The return address is read from the wrong stack location, leading to a crash or silent corruption.
- AArch64’s design prioritizes performance and reduced spills; the larger register file supports that.
Real-world applications
- Disassembly analysis for security or performance auditing.
- Interfacing assembly routines with C libraries.
Where you’ll apply it
- This project: see §3.1 and §5.4 in P08-abi-conformance-audit.md
- P02 Register and Stack Visualizer
- P08 ABI Conformance Audit
References
- AArch64 register model overview.
- AAPCS64 procedure call standard.
Key insights Calling conventions are the glue that makes low-level code composable.
Summary Registers and the stack are not independent; they form a contract that every function must obey.
Homework/Exercises to practice the concept
- Draw a stack frame for a function that calls two helpers and uses three local variables.
- Identify which registers you would preserve in a callee according to a generic ABI.
Solutions to the homework/exercises
- The frame must allocate locals, save the return address, and preserve any callee-saved registers used.
- Preserve registers designated as callee-saved in the ABI; all others are caller-saved.
Toolchain & ELF
Fundamentals Assembly alone is not executable; you need a toolchain to assemble, link, and package code into a binary format. GNU as (the GNU assembler) accepts assembly source and emits object files; the linker combines objects into an executable with sections and symbols. On most ARM systems, the object format is ELF, defined by the System V ABI family. Understanding sections, symbols, and relocations is essential for boot images, firmware layout, and disassembly.
Deep Dive The toolchain is a pipeline: source → object → linked image. The assembler parses directives, encodes instructions for the target ISA, and emits relocatable objects. The linker then resolves symbols, assigns addresses, applies relocations, and produces a final ELF file or a raw binary. This is not a black box: if your startup code lands at the wrong address or your vector table is misaligned, the linker script is responsible. The GNU assembler manual documents directive syntax and how the assembler handles sections, alignment, and symbols.
ELF (Executable and Linkable Format) is the standard container for compiled objects. It defines headers, sections, and symbol tables so tools can reason about what is in a binary. ELF’s strength is transparency: you can inspect sections such as .text (code), .data (initialized data), .bss (zero-initialized data), and custom sections for vector tables or boot metadata. In embedded contexts, you often convert ELF into a raw binary that can be flashed, but the ELF remains the authoritative artifact for debugging because it contains symbols and relocation information.
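As a small illustration of that transparency, the first 20 bytes of any ELF file already say whether it targets 32-bit ARM or AArch64. The sketch below reads them with the standard `struct` module; `read_elf_ident` is a hypothetical helper name, and only the identification bytes plus `e_type` and `e_machine` are decoded.

```python
import struct

EM_ARM = 40        # e_machine value for 32-bit ARM
EM_AARCH64 = 183   # e_machine value for AArch64


def read_elf_ident(data: bytes) -> dict:
    """Decode class, byte order, type, and machine from an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = data[4]  # 1 = 32-bit objects, 2 = 64-bit objects
    ei_data = data[5]   # 1 = little-endian, 2 = big-endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("invalid ELF identification bytes")
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "little_endian": ei_data == 1,
        "type": e_type,
        "machine": e_machine,
    }
```

An audit tool could use the `machine` field to choose between the 32-bit and 64-bit rule sets before looking at any code.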
Relocations are where everything connects. When the assembler emits an instruction that references a symbol whose address is not yet known, it emits a relocation entry. The linker later resolves it. This is how references to labels, functions, and global variables are patched. If you understand relocations, you can interpret why certain instructions appear in disassembly, and you can identify errors like “relocation overflow” or “undefined reference.” The same reasoning applies to position-independent code or shared libraries on A-profile systems.
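A toy model shows what "resolving" a relocation means in practice. The sketch below patches a 32-bit little-endian word with symbol address plus addend, which is the computation behind simple absolute relocations. It is a deliberate simplification: real relocation types differ per architecture, and `apply_abs32` is a hypothetical name.

```python
import struct


def apply_abs32(section: bytearray, offset: int, symbol_addr: int,
                addend: int = 0) -> None:
    """Patch the word at offset with symbol_addr + addend (little-endian)."""
    value = symbol_addr + addend
    if not 0 <= value <= 0xFFFFFFFF:
        # Loosely mirrors the linker's "relocation overflow" diagnostic.
        raise OverflowError("relocation overflow: value does not fit in 32 bits")
    struct.pack_into("<I", section, offset, value)
```

For example, patching a vector-table slot with a handler at `0x08000100` and an addend of 1 writes `0x08000101`; on Cortex-M the low bit marks the target as Thumb code.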
In practical terms, mastering the toolchain lets you answer questions like: Why is my vector table not at the start of flash? Why does the linker place my .data in RAM but my .text in flash? Why does a symbol show up as undefined? These are the exact questions you will encounter in bare-metal ARM development, and they can only be solved by understanding ELF and the linker. The toolchain also connects to diagnostics: objdump and readelf are not just utilities; they are the microscope that lets you see what the assembler and linker actually produced.
How this fits on projects
- Core to P01 (Toolchain Pipeline Explorer) and P10 (Capstone Monitor).
Definitions & key terms
- Assembler: Translates assembly source into object files.
- Linker: Resolves symbols and produces an executable or binary.
- ELF: Executable and Linkable Format for binaries.
- Relocation: A placeholder that the linker resolves to a final address.
Mental model diagram
Toolchain Flow
──────────────
Source (.s) → Assembler → Object (.o) → Linker → ELF (.elf) → Binary (.bin)
│ │
Symbols/Relocs Sections/Addresses
How it works (step-by-step, with invariants and failure modes)
- Assemble source into relocatable objects.
- Link with a linker script or default layout.
- Verify ELF sections and symbols.
- Failure mode: wrong section placement → boot hangs or interrupts jump to wrong address.
Minimal concrete example (pseudo, not runnable)
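.section .vectors /* dedicated section for the vector table */
.word reset_handler /* table entry holding the reset handler address */
/* linker script: place .vectors at the start of flash */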
Common misconceptions
- “ELF is only for OS programs” → It is central in embedded, too.
- “Linker script is optional” → Not when you need precise memory layout.
Check-your-understanding questions
- What is the role of a relocation entry?
- Why do embedded projects often convert ELF to raw binary?
- What is the difference between .text and .bss?
Check-your-understanding answers
- It records a reference the linker must patch with a final address.
- Flashing tools often want raw bytes, but ELF holds symbols for debugging.
- .text holds code; .bss holds zero-initialized data.
Real-world applications
- Firmware image layout, boot loaders, and disassembly tooling.
Where you’ll apply it
- This project: see §3.1 and §5.4 in P08-abi-conformance-audit.md
- P01 Toolchain Pipeline Explorer
- P10 Capstone Monitor
References
- GNU assembler manual.
- ELF format and ABI overview.
Key insights The toolchain is the bridge between assembly and hardware; without it, nothing runs.
Summary Understanding ELF and linking turns build failures into solvable layout problems.
Homework/Exercises to practice the concept
- Identify three sections you expect in a bare-metal ELF and explain why.
- Explain how a symbol reference becomes a concrete address.
Solutions to the homework/exercises
- .text for code, .data for initialized globals, .bss for zeroed globals.
- The assembler emits a relocation that the linker resolves to the final address.
3. Project Specification
3.1 What You Will Build
A checker that flags ABI violations like clobbered callee-saved registers.
3.2 Functional Requirements
- Requirement 1: Parse symbols and identify function boundaries
- Requirement 2: Detect missing save/restore patterns
- Requirement 3: Output a report with violations
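Requirement 2 can be prototyped with a simple ordered scan over one function's instructions. The sketch below is a heuristic under stated assumptions: AArch64 only, instructions given as plain `mnemonic operands` strings, a register counts as saved once it is stored relative to `sp`, and the destination is the first operand (the first two for `ldp`). All names are hypothetical, and base-register writeback and control flow are ignored.

```python
import re

CALLEE_SAVED = {f"x{n}" for n in range(19, 29)}
STORES = {"str", "stp", "stur"}
NO_DEST = {"cmp", "cmn", "tst", "b", "bl", "br", "blr", "ret",
           "cbz", "cbnz", "tbz", "tbnz", "nop"}
REG_RE = re.compile(r"\b[xw](\d{1,2})\b")


def _regs(text: str) -> list:
    """Return registers named in text, normalized to their x-names."""
    return [f"x{num}" for num in REG_RE.findall(text.lower())]


def find_clobbers(instructions) -> list:
    """Return callee-saved registers written before being saved to the stack."""
    saved, clobbered = set(), set()
    for ins in instructions:
        parts = ins.split(None, 1)
        if not parts:
            continue
        mnemonic = parts[0].lower()
        operands = parts[1] if len(parts) > 1 else ""
        if mnemonic in STORES:
            data_regs, _, address = operands.partition("[")
            if address.lstrip().lower().startswith("sp"):
                saved.update(_regs(data_regs))  # spilled to the stack
            continue
        if mnemonic in NO_DEST or mnemonic.startswith("b."):
            continue
        dest_count = 2 if mnemonic == "ldp" else 1
        dests = ",".join(operands.split(",")[:dest_count])
        for reg in _regs(dests):
            if reg in CALLEE_SAVED and reg not in saved:
                clobbered.add(reg)
    return sorted(clobbered, key=lambda r: int(r[1:]))
```

Because the scan is order-sensitive, a write that precedes the save is still reported, which matches the "modifies x19 without saving" style of warning in the example output. A write to `w19` is treated as a write to `x19`, since it changes the full register.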
3.3 Non-Functional Requirements
- Deterministic output
- Low false positives
3.4 Example Usage / Output
$ abi-audit sample.elf
OK: foo preserves x19-x20
WARN: bar modifies x19 without saving
$ abi-audit --missing-file
error: input file not found
exit code: 2
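The error path in this transcript can be sketched with `argparse` and an explicit return code. Everything here is an assumption for illustration: the `main` entry point, the constant names, and the choice to print the error on stderr are not specified by the project text.

```python
import argparse
import os
import sys

EXIT_OK = 0
EXIT_INPUT_ERROR = 2  # matches the "exit code: 2" shown in the transcript


def main(argv=None) -> int:
    """Validate the input path and return a process exit code."""
    parser = argparse.ArgumentParser(prog="abi-audit")
    parser.add_argument("path", help="ELF file to audit")
    args = parser.parse_args(argv)
    if not os.path.isfile(args.path):
        print("error: input file not found", file=sys.stderr)
        return EXIT_INPUT_ERROR
    # The real audit would run here; this sketch only validates the input.
    return EXIT_OK
```

A launcher script would call `sys.exit(main())`. Returning the code instead of exiting inside `main` keeps the error path testable without spawning a process.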
3.5 Data Formats / Schemas / Protocols
- Report: function, rule, status
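One possible shape for a report record is sketched below, assuming each finding also carries the function's address (so output can be ordered deterministically) and a free-text detail. Both extra fields and all names are hypothetical additions to the three fields listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Finding:
    address: int    # function start address; first field, so it drives sorting
    function: str
    rule: str
    status: str     # "OK" or "WARN"
    detail: str


def render_report(findings) -> str:
    """Render findings as text lines in a stable, address-sorted order."""
    return "\n".join(
        f"{f.status}: {f.function} {f.detail}" for f in sorted(findings)
    )
```

Sorting by address rather than by discovery order keeps the report identical across runs, which the non-functional requirements ask for.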
3.6 Edge Cases
- Stripped binaries
- Inline functions
3.7 Real World Outcome
This is the golden reference for success:
- You can explain ABI violations in disassembly.
3.7.1 How to Run (Copy/Paste)
- Build: follow the toolchain steps defined in this guide
- Run: use the CLI examples in §3.4 with fixed inputs
- Expected directory: project root
3.7.2 Golden Path Demo (Deterministic)
Run with a fixed input set and confirm output matches §3.4 exactly.
3.7.3 If CLI: Exact Terminal Transcript
Identical to the transcript in §3.4.
4. Solution Architecture
4.1 High-Level Design
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Input Layer │───▶│ Core Logic │───▶│ Output Layer │
└──────────────┘ └──────────────┘ └──────────────┘
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Input Parser | Validate and normalize input | Strict error handling |
| Core Engine | Perform the main computation | Deterministic paths |
| Reporter | Produce user-facing output | Stable formatting |
4.3 Data Structures (No Full Code)
Record Entry {
name: string
fields: list
notes: text
}
4.4 Algorithm Overview
Key Algorithm: Core Flow
- Parse input and validate parameters.
- Execute the core transformation or analysis.
- Emit deterministic output or error summary.
Complexity Analysis:
- Time: O(n) in the size of input records
- Space: O(n) for stored mappings and logs
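The linear bound is visible in a single-pass splitter that groups disassembly lines by function. The sketch assumes `objdump -d` style text for a fixed-width 32-bit encoding (a symbol header such as `0000000000000000 <foo>:` followed by `address: encoding mnemonic operands` lines); the regular expressions and `split_functions` are hypothetical, and 16-bit Thumb encodings would need a looser pattern.

```python
import re

# "0000000000000000 <foo>:" marks the start of a function.
HEADER_RE = re.compile(r"^[0-9a-fA-F]+ <([^>]+)>:\s*$")
# "   0:  a9bf7bfd  stp x29, x30, [sp, #-16]!" is one instruction.
INSN_RE = re.compile(r"^\s*[0-9a-fA-F]+:\s+[0-9a-fA-F]{8}\s+(.*\S)\s*$")


def split_functions(disassembly: str) -> dict:
    """Map each function name to its instruction strings, in file order."""
    functions = {}
    current = None
    for line in disassembly.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = functions.setdefault(header.group(1), [])
            continue
        insn = INSN_RE.match(line)
        if insn and current is not None:
            # Normalize tabs so later stages see "mnemonic operands".
            current.append(insn.group(1).replace("\t", " "))
    return functions
```

Each line is visited once and stored at most once, which gives O(n) time and space in the number of disassembly lines.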
5. Implementation Guide
5.1 Development Environment Setup
# Install toolchain and verify versions
toolchain --version
5.2 Project Structure
project-root/
├── src/
│ ├── core
│ └── io
├── tests/
│ └── fixtures
├── docs/
└── README.md
5.3 The Core Question You’re Answering
“Audit binaries for ABI rule compliance.”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Registers & Calling Conventions
- What is the key invariant you must preserve?
- Toolchain & ELF
- What is the key invariant you must preserve?
5.5 Questions to Guide Your Design
- Data Flow
- How does input become output?
- Which steps must be deterministic?
- Validation
- What is the simplest test that proves correctness?
- How will you detect regressions?
5.6 Thinking Exercise
Trace the Critical Path
Write a step-by-step trace of the most important workflow in this project.
Questions to answer:
- Where could a subtle bug hide?
- What would you log to prove correctness?
5.7 The Interview Questions They’ll Ask
- “What is the core invariant this project relies on?”
- “How would you debug a failure in this workflow?”
- “What trade-offs did you make in design?”
- “How does this map to real hardware or toolchains?”
- “How do you prove your output is correct?”
5.8 Hints in Layers
Hint 1: Start small. Focus on the smallest input that still demonstrates the concept.
Hint 2: Make output deterministic. Fix inputs and produce stable logs before expanding functionality.
Hint 3: Validate against a known reference. Compare with a known-good output or specification.
Hint 4: Add instrumentation. Log internal steps so you can verify each phase explicitly.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Core concept | “ARM Assembly Language” by William Hohl | Ch. 3-5 |
| Binary formats | “Linkers and Loaders” by John R. Levine | Ch. 1-3 |
5.10 Implementation Phases
Phase 1: Foundation (2-4 hours)
Goals:
- Establish a minimal working pipeline
- Validate one end-to-end path
Tasks:
- Build the smallest viable input and output
- Verify outputs against a reference
Checkpoint: Output matches expected golden path
Phase 2: Core Functionality (4-8 hours)
Goals:
- Implement main logic and validation
- Add structured error handling
Tasks:
- Implement the core transformation
- Add deterministic reporting
Checkpoint: Core tests pass reliably
Phase 3: Polish & Edge Cases (2-4 hours)
Goals:
- Cover edge cases
- Improve output clarity
Tasks:
- Add negative tests
- Document limitations
Checkpoint: All edge cases handled gracefully
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Input format | Free-form vs structured | Structured | Easier validation |
| Output format | Human vs machine | Both | Supports verification and tooling |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate core logic | Field parsing, bounds checks |
| Integration Tests | Validate full flow | End-to-end CLI runs |
| Edge Case Tests | Validate boundaries | Empty input, invalid flags |
6.2 Critical Test Cases
- Golden path: Fixed input produces known output.
- Invalid input: Error path triggers correct exit code.
- Boundary case: Maximum supported value handled correctly.
6.3 Test Data
Input: fixed seed or fixed fixture
Expected: exact output text from §3.4
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Misaligned assumptions | Unexpected output | Re-check invariants |
| Missing validation | Silent failures | Add explicit checks |
| Non-determinism | Flaky output | Fix inputs and seeds |
7.2 Debugging Strategies
- Trace everything: Log each step with stable ordering
- Compare against reference: Use known-good outputs
7.3 Performance Traps
- Avoid repeated parsing of the same input; cache results when possible
8. Extensions & Challenges
8.1 Beginner Extensions
- Add one extra output format
- Add a help screen with examples
8.2 Intermediate Extensions
- Add a verification mode that compares two outputs
- Add structured JSON output
8.3 Advanced Extensions
- Add a batch mode for large inputs
- Add cross-target comparisons (M vs A profile)
9. Real-World Connections
9.1 Industry Applications
- Firmware bring-up: use the same checks to validate early boot images
- Security audits: analyze binaries for ABI or control-flow correctness
9.2 Related Open Source Projects
- binutils: source of many ARM tooling workflows
- QEMU: emulator used for ARM testing
9.3 Interview Relevance
- Explains why ARM behavior differs across profiles
- Demonstrates toolchain literacy and debugging rigor
10. Resources
10.1 Essential Reading
- “ARM Assembly Language” by William Hohl - practical instruction usage
- “Linkers and Loaders” by John R. Levine - binary layout
10.2 Video Resources
- ARM architecture overview talks and lectures
10.3 Tools & Documentation
- GNU binutils documentation
- Arm developer documentation
10.4 Related Projects in This Series
- This project connects with: P01-toolchain-pipeline-explorer.md, P02-register-stack-visualizer.md, P03-thumb-encoder-decoder.md
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the core concept without notes
- I can explain why my design choices were necessary
- I can describe one realistic failure mode
11.2 Implementation
- All functional requirements are met
- Tests pass deterministically
- Edge cases are documented
11.3 Growth
- I can describe what I would improve next time
- I can explain this project in an interview
12. Submission / Completion Criteria
Minimum Viable Completion:
- Core functionality works on reference inputs
- Deterministic golden path is documented
- At least one failure path is demonstrated
Full Completion:
- All minimum criteria plus:
- Edge cases are covered with tests
- Output format is stable and documented
Excellence (Going Above & Beyond):
- Add a comparison against a second target
- Provide a short write-up of lessons learned