Project 1: Toolchain Pipeline Explorer
Build a repeatable pipeline that assembles, links, and inspects Cortex-M and AArch64 binaries.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2 |
| Time Estimate | 6-10 hours |
| Main Programming Language | Assembly + C |
| Alternative Programming Languages | Rust, Zig |
| Coolness Level | Level 2 |
| Business Potential | Level 2 |
| Prerequisites | Binary/hex basics, CLI tooling comfort, Concept 6: Toolchain & ELF |
| Key Topics | ELF sections, linker layout, ISA targeting |
1. Learning Objectives
By completing this project, you will:
- Translate ARM concepts into observable outputs you can verify.
- Explain why each toolchain or hardware step is necessary.
- Detect and fix at least one realistic failure mode.
- Communicate the result clearly in a technical review or interview.
2. All Theory Needed (Per-Concept Breakdown)
Profiles & Execution States
Fundamentals
ARM is a family of architectures organized into profiles optimized for different constraints. The A-profile targets application processors that run rich OSes (phones, laptops, servers), the M-profile targets microcontrollers with tight power and memory budgets, and the R-profile targets deterministic real-time systems. Each profile implies a different set of instructions, privilege models, and system features. Within a profile, ARM may also define execution states (AArch64 and AArch32 in ARMv8-A) that determine register width, instruction encoding, and address space. AArch64 is the 64-bit execution state introduced in ARMv8-A, while AArch32 is the 32-bit state; M-profile has no such choice and runs only Thumb encodings, which favor code density and simpler decode logic. This is why “ARM assembly” is not a single language: the same mnemonic can encode differently, or even be invalid, depending on profile and state.
Deep Dive
The profile split is the most important high-level idea in ARM. A-profile cores are built to host complex operating systems with virtual memory, multi-core scheduling, and high performance. That means features like exception levels, MMUs, and richer instruction sets matter. In contrast, M-profile focuses on minimal latency, low power, and deterministic behavior: it strips away many features to reduce silicon cost and simplify real-time response. R-profile sits in between: it offers more performance than M-profile while keeping the deterministic response that A-profile trades away for throughput. When you choose to write assembly, you are implicitly choosing a profile, and that choice changes everything from the boot flow to the toolchain arguments you use.
Execution states deepen the split. In ARMv8-A, AArch64 brings a new 64-bit register file and a fixed-length 32-bit instruction encoding (A64). AArch32 keeps the 32-bit model with the A32 and T32 (Thumb) instruction sets. This means that for A-profile hardware, your toolchain invocation must select the intended execution state; otherwise, even valid mnemonics may assemble into the wrong encoding or fail. M-profile, by contrast, uses Thumb encodings by design, favoring compact instructions and simpler decode paths. These constraints are not academic. They drive register availability, calling convention differences, and even the structure of your interrupt handlers. If you write code that assumes AArch64 but run it on Cortex-M, the encoding and semantics are incompatible.
Another subtle but critical effect of profile and state is the system-level context. A-profile expects multiple privilege levels and potentially a hypervisor. M-profile’s exception model is simpler: its vector table is a plain array of handler addresses that the hardware reads directly, and the core typically boots straight into a single firmware image. R-profile targets systems where real-time guarantees trump throughput; this affects interrupt priority, memory latency assumptions, and peripheral access patterns. Understanding profile choice lets you reason about why an instruction exists, why a particular addressing mode is missing, and why certain system registers are visible or hidden.
Finally, architecture profiles determine the ecosystem around your work. A-profile benefits from abundant tooling, standardized ABIs, and OS integration, while M-profile leans on vendor SDKs, board-specific memory maps, and smaller toolchains. This guide intentionally spans both because many real-world systems combine them: a Linux-capable application processor for high-level features and a microcontroller for deterministic control. Once you see that split, you can design experiments and projects that map to the right target without confusion.
How this fits on projects
- Shapes target selection in P01 (Toolchain Pipeline Explorer) and P07 (Exception Level Lab).
- Determines encoding assumptions in P03 (Thumb Encoder/Decoder).
Definitions & key terms
- Profile: A family of ARM features optimized for a market segment (A/M/R).
- Execution state: The architectural state (AArch64 or AArch32) that defines register width and the available instruction sets. Thumb (T32) is an instruction set, not an execution state.
- AArch64: 64-bit execution state introduced in ARMv8-A.
- AArch32: 32-bit execution state in ARMv8-A.
- Thumb: Compact instruction encoding (T32) with 16-bit and 32-bit forms; the only instruction set on M-profile and also available in AArch32.
Mental model diagram
ARM Architecture Evolution:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                 ┌─────────────────────────────────────────────┐
                 │           ARM Holdings (IP owner)           │
                 │  Designs architectures, licenses to others  │
                 └─────────────────────┬───────────────────────┘
                                       │
         ┌─────────────────────────────┼─────────────────────────────┐
         │                             │                             │
         ▼                             ▼                             ▼
┌────────────────┐            ┌────────────────┐            ┌────────────────┐
│   M-Profile    │            │   A-Profile    │            │   R-Profile    │
│Microcontrollers│            │  Applications  │            │   Real-Time    │
│   (Embedded)   │            │ (Phones, PCs)  │            │  (Automotive)  │
└────────────────┘            └────────────────┘            └────────────────┘
         │                             │
    ┌────┴────┐            ┌───────────┼───────────┐
    │         │            │           │           │
    ▼         ▼            ▼           ▼           ▼
┌───────┐ ┌────────┐  ┌─────────┐ ┌─────────┐ ┌─────────┐
│Cortex │ │Cortex  │  │Cortex-A7│ │Cortex-A │ │Cortex-A │
│ -M0+  │ │-M4/M7  │  │Cortex-A9│ │53/55/72 │ │76/78/X  │
│       │ │        │  │(32-bit) │ │(64-bit) │ │(64-bit) │
└───────┘ └────────┘  └─────────┘ └─────────┘ └─────────┘
    │         │            │           │           │
    │         │            │           │           │
  Thumb    Thumb-2       ARM32      AArch64     AArch64
  only     + DSP        + Thumb     + NEON      + SVE2
           + FPU
┌──────────────────────────────────────────────────────────────────────────────┐
│ YOUR TARGETS:                                                                │
│                                                                              │
│  Raspberry Pi Pico (RP2040)            Raspberry Pi 3/4/5                    │
│  ├─ Dual Cortex-M0+ cores              ├─ Cortex-A53/A72/A76 cores           │
│  ├─ ARMv6-M architecture               ├─ ARMv8-A architecture (AArch64)     │
│  ├─ Thumb instruction set              ├─ A64 instruction set                │
│  ├─ 16 registers (r0-r15)              ├─ 31 registers (x0-x30)              │
│  ├─ 133 MHz max clock                  ├─ 1.5-2.4 GHz clock                  │
│  └─ 264 KB RAM, no OS                  └─ 1-8 GB RAM, Linux capable          │
└──────────────────────────────────────────────────────────────────────────────┘

How it works (step-by-step, with invariants and failure modes)
- Choose the target profile (A/M/R) based on system constraints and OS expectations.
- Select the execution state (AArch64 or AArch32) and instruction set (A64, A32, or Thumb) to match the core and the toolchain output.
- Assemble and link with profile/state-specific flags; encoding mismatches yield invalid opcodes.
- Boot into the expected privilege level; if the firmware expects EL2 and you start at EL1, early setup fails.
- Validate on target or emulator; incorrect profile assumptions manifest as illegal instruction faults or boot hangs.
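One inexpensive check before running anything is to read the target that the toolchain recorded in the ELF header. The sketch below is a minimal illustration, not part of the project specification: `describe_elf_target` and `fake_header` are hypothetical helper names, and the demo headers are synthesized in memory rather than read from a real build. Note that `e_machine` alone cannot tell Cortex-M code from A-profile AArch32 code; both report the 32-bit Arm machine value.

```python
import struct

# e_machine values from the ELF machine registry.
EM_ARM = 40        # 32-bit Arm (AArch32 on A/R-profile, Thumb on M-profile)
EM_AARCH64 = 183   # 64-bit Arm (AArch64)


def describe_elf_target(header: bytes) -> str:
    """Return a short label for the ISA family recorded in an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = header[4]   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    data_enc = header[5]    # EI_DATA: 1 = little-endian, 2 = big-endian
    endian = "<" if data_enc == 1 else ">"
    # e_type and e_machine are 16-bit fields at offsets 16 and 18 in both classes.
    _e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    bits = {1: 32, 2: 64}[elf_class]
    names = {EM_ARM: "ARM (AArch32/Thumb)", EM_AARCH64: "AArch64"}
    return f"{names.get(e_machine, 'unknown')} / ELF{bits}"


def fake_header(elf_class: int, machine: int) -> bytes:
    """Build just enough of a little-endian ELF header for the demo."""
    ident = b"\x7fELF" + bytes([elf_class, 1, 1]) + bytes(9)
    return ident + struct.pack("<HH", 2, machine)


print(describe_elf_target(fake_header(1, EM_ARM)))
print(describe_elf_target(fake_header(2, EM_AARCH64)))
```

With a real build, the first 20 bytes of the file can be passed in place of the synthesized header.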
Minimal concrete example (pseudo-assembly, not runnable)
Select Target = {Profile: M, State: Thumb}
Assemble([LOAD R0, [ADDR]], Target)
If Target != CPU_State → Fault: Illegal Instruction
Common misconceptions
- “ARM assembly is one language” → It is a family with profile/state splits.
- “Thumb is only a compact mode” → It also shapes register access and available instructions.
- “AArch64 is just ARM32 with bigger registers” → It changes the register file and encoding model.
Check-your-understanding questions
- Why can Cortex-M code not run on a Cortex-A core without translation?
- What is the difference between AArch64 and AArch32?
- How does the profile choice affect your toolchain flags?
Check-your-understanding answers
- Cortex-M uses the M-profile with Thumb encodings and a different system model; Cortex-A expects A-profile with AArch64/AArch32 states.
- AArch64 is a 64-bit execution state with a new register file and A64 encoding; AArch32 is 32-bit with different encodings.
- The assembler and linker must emit instructions for the correct ISA and object format; mismatches produce illegal opcodes or link errors.
Real-world applications
- Microcontroller firmware (M-profile) in sensors, robotics, and embedded control.
- Application processors (A-profile) in mobile, desktop, and servers.
Where you’ll apply it
- This project: see §3.1 and §5.4 in P01-toolchain-pipeline-explorer.md
- P01 Toolchain Pipeline Explorer
- P03 Thumb Instruction Encoder/Decoder
- P07 AArch64 Exception Level Lab
References
- Arm M-profile overview.
- Arm R-profile overview.
- Arm A-profile overview and execution states.
Key insights
Your “ARM assembly” only makes sense once you name the profile and execution state.
Summary
Profiles and execution states are the root of every other difference in ARM assembly. When you get this right, the rest of the system becomes predictable.
Homework/Exercises to practice the concept
- Pick two devices (one microcontroller, one phone) and identify their ARM profile and execution state.
- Write a one-paragraph explanation of why Thumb exists.
Solutions to the homework/exercises
- Example: RP2040 is M-profile with Thumb; a modern smartphone SoC is A-profile with AArch64.
- Thumb improves code density and decoder simplicity, which is crucial for small embedded systems.
Toolchain & ELF
Fundamentals
Assembly alone is not executable; you need a toolchain to assemble, link, and package code into a binary format. GNU as (the GNU assembler) accepts assembly source and emits object files; the linker combines objects into an executable with sections and symbols. On most ARM systems, the object format is ELF, defined by the System V ABI family. Understanding sections, symbols, and relocations is essential for boot images, firmware layout, and disassembly.
Deep Dive
The toolchain is a pipeline: source → object → linked image. The assembler parses directives, encodes instructions for the target ISA, and emits relocatable objects. The linker then resolves symbols, assigns addresses, applies relocations, and produces a final ELF file or a raw binary. This is not a black box: if your startup code lands at the wrong address or your vector table is misaligned, the linker script is responsible. The GNU assembler manual documents directive syntax and how the assembler handles sections, alignment, and symbols.
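The three stages can be written down as explicit command lines, which is a useful first step toward a repeatable pipeline. The sketch below only builds the argument lists and does not run anything. The `TARGETS` table and `pipeline_commands` are hypothetical names, and the tool prefixes are assumptions: `arm-none-eabi-` and `aarch64-linux-gnu-` are common install names, but distributions differ (for example `aarch64-none-elf-`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    prefix: str       # cross-tool prefix (assumed install name)
    as_flags: tuple   # assembler flags that pin the ISA


# Hypothetical target table; adjust prefixes to match the local installation.
TARGETS = {
    "cortex-m0": Target("arm-none-eabi-", ("-mcpu=cortex-m0", "-mthumb")),
    "aarch64": Target("aarch64-linux-gnu-", ("-march=armv8-a",)),
}


def pipeline_commands(target: str, source: str, script: str) -> list:
    """Return assemble, link, and objcopy command lines for one source file."""
    t = TARGETS[target]
    stem = source.rsplit(".", 1)[0]
    return [
        # source -> relocatable object
        [t.prefix + "as", *t.as_flags, "-o", f"{stem}.o", source],
        # object -> linked ELF, placed according to the linker script
        [t.prefix + "ld", "-T", script, "-o", f"{stem}.elf", f"{stem}.o"],
        # ELF -> raw bytes suitable for flashing
        [t.prefix + "objcopy", "-O", "binary", f"{stem}.elf", f"{stem}.bin"],
    ]


for cmd in pipeline_commands("cortex-m0", "sample.s", "link.ld"):
    print(" ".join(cmd))
```

Keeping the commands as data makes them easy to log, diff between targets, and hand to `subprocess.run` once the tools are confirmed to be installed.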
ELF (Executable and Linkable Format) is the standard container for compiled objects. It defines headers, sections, and symbol tables so tools can reason about what is in a binary. ELF’s strength is transparency: you can inspect sections such as .text (code), .data (initialized data), .bss (zero-initialized data), and custom sections for vector tables or boot metadata. In embedded contexts, you often convert ELF into a raw binary that can be flashed, but the ELF remains the authoritative artifact for debugging because it contains symbols and relocation information.
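To make that transparency concrete, here is a minimal sketch of walking the section header table of a 32-bit little-endian ELF image. It is an illustration under stated assumptions rather than a full parser: `read_sections` and `build_demo_image` are hypothetical names, the demo image is synthesized in memory and contains only headers and a string table, and the addresses and sizes are invented for the example.

```python
import struct

EHDR = struct.Struct("<16sHHIIIIIHHHHHH")   # ELF32 little-endian file header (52 bytes)
SHDR = struct.Struct("<10I")                # ELF32 section header (40 bytes)


def read_sections(image: bytes) -> list:
    """Return (name, address, size) for every named section in an ELF32 image."""
    fields = EHDR.unpack_from(image, 0)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    headers = [SHDR.unpack_from(image, shoff + i * shentsize) for i in range(shnum)]
    str_off = headers[shstrndx][4]          # file offset of the section name table
    out = []
    for name_off, _type, _flags, addr, _off, size, *_rest in headers:
        start = str_off + name_off
        name = image[start:image.index(b"\0", start)].decode()
        if name:                            # skip the mandatory null section
            out.append((name, addr, size))
    return out


def build_demo_image() -> bytes:
    """Synthesize headers plus a name table; section contents are omitted."""
    strtab = b"\0.vectors\0.text\0.shstrtab\0"
    shoff = EHDR.size + len(strtab)
    ident = b"\x7fELF\x01\x01\x01" + bytes(9)
    ehdr = EHDR.pack(ident, 2, 40, 1, 0, 0, shoff, 0, EHDR.size, 0, 0, SHDR.size, 4, 3)
    sections = [
        (0, 0, 0, 0, 0, 0, 0, 0, 0, 0),                       # null section
        (1, 1, 2, 0x10000000, 0, 0x40, 0, 0, 4, 0),           # .vectors
        (10, 1, 6, 0x10000040, 0, 0x120, 0, 0, 4, 0),         # .text
        (16, 3, 0, 0, EHDR.size, len(strtab), 0, 0, 1, 0),    # .shstrtab
    ]
    return ehdr + strtab + b"".join(SHDR.pack(*s) for s in sections)


for name, addr, size in read_sections(build_demo_image()):
    print(f"{name} @ 0x{addr:08x} size 0x{size:x}")
```

In practice `readelf -S` or `objdump -h` report the same table; walking it by hand mainly helps when you need stable, machine-readable output.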
Relocations are where everything connects. When the assembler emits an instruction that references a symbol whose address is not yet known, it emits a relocation entry. The linker later resolves it. This is how references to labels, functions, and global variables are patched. If you understand relocations, you can interpret why certain instructions appear in disassembly, and you can identify errors like “relocation overflow” or “undefined reference.” The same reasoning applies to position-independent code or shared libraries on A-profile systems.
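The patching step can be modelled in a few lines. The sketch below is loosely based on an absolute 32-bit relocation, where the final value is the symbol address plus an addend already stored in the slot; it deliberately ignores details such as the Thumb bit and relocation type codes. `apply_abs32` and the symbol values are hypothetical and exist only for illustration.

```python
import struct


def apply_abs32(section: bytearray, relocs: list, symbols: dict) -> None:
    """Patch each 32-bit slot with symbol address + the addend stored in the slot."""
    for offset, symbol in relocs:
        if symbol not in symbols:
            # Mirrors the linker's "undefined reference" diagnostic.
            raise KeyError(f"undefined reference to '{symbol}'")
        (addend,) = struct.unpack_from("<I", section, offset)
        value = symbols[symbol] + addend
        if value > 0xFFFFFFFF:
            # Mirrors a "relocation overflow": the value no longer fits the field.
            raise OverflowError(f"relocation overflow at offset {offset:#x}")
        struct.pack_into("<I", section, offset, value)


# Two words: the first refers to reset_handler, the second to table + 8.
obj = bytearray(struct.pack("<II", 0, 8))
relocs = [(0, "reset_handler"), (4, "table")]
symbols = {"reset_handler": 0x10000040, "table": 0x20000000}

apply_abs32(obj, relocs, symbols)
print([hex(word) for word in struct.unpack("<II", obj)])
```

The two error paths correspond to the diagnostics named above: a symbol nobody defines, and a resolved value too large for the field being patched.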
In practical terms, mastering the toolchain lets you answer questions like: Why is my vector table not at the start of flash? Why does the linker place my .data in RAM but my .text in flash? Why does a symbol show up as undefined? These are the exact questions you will encounter in bare-metal ARM development, and they can only be solved by understanding ELF and the linker. The toolchain also connects to diagnostics: objdump and readelf are not just utilities; they are the microscope that lets you see what the assembler and linker actually produced.
How this fits on projects
- Core to P01 (Toolchain Pipeline Explorer) and P10 (Capstone Monitor).
Definitions & key terms
- Assembler: Translates assembly source into object files.
- Linker: Resolves symbols and produces an executable or binary.
- ELF: Executable and Linkable Format for binaries.
- Relocation: A record that tells the linker which location to patch once a symbol's final address is known.
Mental model diagram
Toolchain Flow
──────────────
Source (.s) → Assembler → Object (.o) → Linker → ELF (.elf) → Binary (.bin)
                               │                     │
                        Symbols/Relocs      Sections/Addresses
How it works (step-by-step, with invariants and failure modes)
- Assemble source into relocatable objects.
- Link with a linker script or default layout.
- Verify ELF sections and symbols.
- Failure mode: wrong section placement → boot hangs or interrupts jump to wrong address.
Minimal concrete example (pseudo, not runnable)
.section .vectors
.word reset_handler
.linker: place .vectors at flash start
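To see what the pseudo example should produce in flash, the sketch below packs the first words of an M-profile style vector table by hand: the initial stack pointer followed by handler addresses with bit 0 set to mark Thumb state. `build_vector_table` is a hypothetical helper, and the stack and handler addresses are assumptions chosen for illustration (the stack value corresponds to the top of a 264 KB RAM region starting at 0x20000000).

```python
import struct


def build_vector_table(initial_sp: int, handlers: list) -> bytes:
    """Pack an M-profile style vector table: initial SP, then handler addresses."""
    # Handler entries carry bit 0 set to mark Thumb state; the SP entry does not.
    words = [initial_sp] + [addr | 1 for addr in handlers]
    return struct.pack(f"<{len(words)}I", *words)


# Assumed layout: stack at the top of RAM, reset handler right after the table.
table = build_vector_table(0x20042000, [0x10000040])
print(table.hex())
```

Comparing these bytes with the start of the generated .bin file is a quick way to confirm that the linker really placed .vectors first.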
Common misconceptions
- “ELF is only for OS programs” → It is central in embedded, too.
- “Linker script is optional” → Not when you need precise memory layout.
Check-your-understanding questions
- What is the role of a relocation entry?
- Why do embedded projects often convert ELF to raw binary?
- What is the difference between .text and .bss?
Check-your-understanding answers
- It records a reference the linker must patch with a final address.
- Flashing tools often want raw bytes, but ELF holds symbols for debugging.
- .text holds code; .bss holds zero-initialized data.
Real-world applications
- Firmware image layout, boot loaders, and disassembly tooling.
Where you’ll apply it
- This project: see §3.1 and §5.4 in P01-toolchain-pipeline-explorer.md
- P01 Toolchain Pipeline Explorer
- P10 Capstone Monitor
References
- GNU assembler manual.
- ELF format and ABI overview.
Key insights
The toolchain is the bridge between assembly and hardware; without it, nothing runs.
Summary
Understanding ELF and linking turns build failures into solvable layout problems.
Homework/Exercises to practice the concept
- Identify three sections you expect in a bare-metal ELF and explain why.
- Explain how a symbol reference becomes a concrete address.
Solutions to the homework/exercises
- .text for code, .data for initialized globals, .bss for zeroed globals.
- The assembler emits a relocation that the linker resolves to the final address.
3. Project Specification
3.1 What You Will Build
A small CLI workflow that emits an ELF, inspects sections, and validates entry points for two ARM targets.
3.2 Functional Requirements
- Requirement 1: Produce ELF artifacts for both Cortex-M and AArch64 targets
- Requirement 2: Display section addresses and sizes in a stable, parseable format
- Requirement 3: Emit a symbol summary including entry points and vector table locations
3.3 Non-Functional Requirements
- Deterministic output across runs
- Clear error messages for missing tools
3.4 Example Usage / Output
$ arm-toolchain-lab --target cortex-m0 --show-sections
ELF: sample.elf
.vectors @ 0x10000000 size 0x40
.text @ 0x10000040 size 0x120
$ arm-toolchain-lab --target aarch64 --show-sections
ELF: sample.elf
.text @ 0x00080000 size 0x180
$ arm-toolchain-lab --target cortex-m0 --show-sections --bad-flag
error: unknown flag "--bad-flag"
exit code: 2
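The error path in the transcript can be sketched with the standard `argparse` module, which already exits with status 2 on usage errors. The sketch below is one possible shape, not the required implementation: `LabParser`, `build_parser`, and `parse` are hypothetical names, and only the two flags shown in the transcript are modelled.

```python
import argparse
import sys


class LabParser(argparse.ArgumentParser):
    def error(self, message: str):
        # argparse already exits with status 2; only the wording is customised.
        print(f"error: {message}", file=sys.stderr)
        sys.exit(2)


def build_parser() -> argparse.ArgumentParser:
    parser = LabParser(prog="arm-toolchain-lab")
    parser.add_argument("--target", required=True, choices=["cortex-m0", "aarch64"])
    parser.add_argument("--show-sections", action="store_true")
    return parser


def parse(argv: list) -> argparse.Namespace:
    """Parse argv, rejecting any flag the tool does not know about."""
    parser = build_parser()
    args, unknown = parser.parse_known_args(argv)
    if unknown:
        parser.error(f'unknown flag "{unknown[0]}"')
    return args


print(parse(["--target", "cortex-m0", "--show-sections"]))
```

Routing every usage problem through a single `error` method keeps the message format and exit code identical across failure modes, which makes them easy to assert in tests.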
3.5 Data Formats / Schemas / Protocols
- Sections table: name, address, size
- Symbols table: name, address, type
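A sections table with these three fields maps naturally onto a small record type plus a formatter. The sketch below assumes the line format shown in §3.4; the `Section` class and `format_sections` function are hypothetical names, and the `.rodata` entry is invented for the demo. Sorting by address and then by name is one way to keep the report independent of the order in which sections were read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    address: int
    size: int


def format_sections(elf_name: str, sections: list) -> str:
    """Render a sections report with a stable ordering and fixed-width addresses."""
    # Sort by address, then name, so the report does not depend on input order.
    ordered = sorted(sections, key=lambda s: (s.address, s.name))
    lines = [f"ELF: {elf_name}"]
    lines += [f"{s.name} @ 0x{s.address:08x} size 0x{s.size:x}" for s in ordered]
    return "\n".join(lines)


demo = [
    Section(".rodata", 0x00080180, 0x20),   # hypothetical extra section
    Section(".text", 0x00080000, 0x180),
]
print(format_sections("sample.elf", demo))
```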
3.6 Edge Cases
- Missing toolchain
- Stripped symbols
- Unsupported target name
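Two of these edge cases can be caught before any work starts by checking the target name and looking for the required tools on PATH. The sketch below uses `shutil.which` from the standard library; `REQUIRED_TOOLS`, `check_environment`, the error wording, and the tool names are assumptions for illustration and should be adjusted to the local toolchain.

```python
import shutil

# Hypothetical mapping from target name to the tools the pipeline would invoke.
REQUIRED_TOOLS = {
    "cortex-m0": ["arm-none-eabi-as", "arm-none-eabi-ld", "arm-none-eabi-objdump"],
    "aarch64": ["aarch64-linux-gnu-as", "aarch64-linux-gnu-ld", "aarch64-linux-gnu-objdump"],
}


def check_environment(target: str, which=shutil.which) -> list:
    """Return human-readable problems; an empty list means ready to run."""
    if target not in REQUIRED_TOOLS:
        known = ", ".join(sorted(REQUIRED_TOOLS))
        return [f'error: unsupported target "{target}" (known: {known})']
    return [
        f'error: missing tool "{tool}" on PATH'
        for tool in REQUIRED_TOOLS[target]
        if which(tool) is None
    ]


for problem in check_environment("cortex-m0"):
    print(problem)
```

Passing `which` as a parameter lets tests simulate a missing toolchain without touching the real environment.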
3.7 Real World Outcome
This is the golden reference for success:
- A learner can compare the ELF layout for two targets and explain why sections moved.
- The CLI produces deterministic output and a clear error format.
3.7.1 How to Run (Copy/Paste)
- Build: follow the toolchain steps defined in this guide
- Run: use the CLI examples in §3.4 with fixed inputs
- Expected directory: project root
3.7.2 Golden Path Demo (Deterministic)
Run with a fixed input set and confirm output matches §3.4 exactly.
3.7.3 If CLI: Exact Terminal Transcript
$ arm-toolchain-lab --target cortex-m0 --show-sections
ELF: sample.elf
.vectors @ 0x10000000 size 0x40
.text @ 0x10000040 size 0x120
$ arm-toolchain-lab --target aarch64 --show-sections
ELF: sample.elf
.text @ 0x00080000 size 0x180
$ arm-toolchain-lab --target cortex-m0 --show-sections --bad-flag
error: unknown flag "--bad-flag"
exit code: 2
4. Solution Architecture
4.1 High-Level Design
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Input Layer │───▶│ Core Logic │───▶│ Output Layer │
└──────────────┘ └──────────────┘ └──────────────┘
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Input Parser | Validate and normalize input | Strict error handling |
| Core Engine | Perform the main computation | Deterministic paths |
| Reporter | Produce user-facing output | Stable formatting |
4.3 Data Structures (No Full Code)
Record Entry {
name: string
fields: list
notes: text
}
4.4 Algorithm Overview
Key Algorithm: Core Flow
- Parse input and validate parameters.
- Execute the core transformation or analysis.
- Emit deterministic output or error summary.
Complexity Analysis:
- Time: O(n) in the size of input records
- Space: O(n) for stored mappings and logs
5. Implementation Guide
5.1 Development Environment Setup
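# Install the cross toolchains, then confirm that each tool is on PATH
arm-none-eabi-as --version
aarch64-linux-gnu-as --version   # prefix may differ, e.g. aarch64-none-elf-
readelf --version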
5.2 Project Structure
project-root/
├── src/
│ ├── core
│ └── io
├── tests/
│ └── fixtures
├── docs/
└── README.md
5.3 The Core Question You’re Answering
“Build a repeatable pipeline that assembles, links, and inspects Cortex-M and AArch64 binaries.”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Profiles & Execution States
- What is the key invariant you must preserve?
- Toolchain & ELF
- What is the key invariant you must preserve?
5.5 Questions to Guide Your Design
- Data Flow
- How does input become output?
- Which steps must be deterministic?
- Validation
- What is the simplest test that proves correctness?
- How will you detect regressions?
5.6 Thinking Exercise
Trace the Critical Path
Write a step-by-step trace of the most important workflow in this project.
Questions to answer:
- Where could a subtle bug hide?
- What would you log to prove correctness?
5.7 The Interview Questions They’ll Ask
- “What is the core invariant this project relies on?”
- “How would you debug a failure in this workflow?”
- “What trade-offs did you make in design?”
- “How does this map to real hardware or toolchains?”
- “How do you prove your output is correct?”
5.8 Hints in Layers
Hint 1: Start small Focus on the smallest input that still demonstrates the concept.
Hint 2: Make output deterministic Fix inputs and produce stable logs before expanding functionality.
Hint 3: Validate against a known reference Compare with a known-good output or specification.
Hint 4: Add instrumentation Log internal steps so you can verify each phase explicitly.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Core concept | “ARM Assembly Language” by William Hohl | Ch. 3-5 |
| Binary formats | “Linkers and Loaders” by John R. Levine | Ch. 1-3 |
5.10 Implementation Phases
Phase 1: Foundation (2-4 hours)
Goals:
- Establish a minimal working pipeline
- Validate one end-to-end path
Tasks:
- Build the smallest viable input and output
- Verify outputs against a reference
Checkpoint: Output matches expected golden path
Phase 2: Core Functionality (4-8 hours)
Goals:
- Implement main logic and validation
- Add structured error handling
Tasks:
- Implement the core transformation
- Add deterministic reporting
Checkpoint: Core tests pass reliably
Phase 3: Polish & Edge Cases (2-4 hours)
Goals:
- Cover edge cases
- Improve output clarity
Tasks:
- Add negative tests
- Document limitations
Checkpoint: All edge cases handled gracefully
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Input format | Free-form vs structured | Structured | Easier validation |
| Output format | Human vs machine | Both | Supports verification and tooling |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate core logic | Field parsing, bounds checks |
| Integration Tests | Validate full flow | End-to-end CLI runs |
| Edge Case Tests | Validate boundaries | Empty input, invalid flags |
6.2 Critical Test Cases
- Golden path: Fixed input produces known output.
- Invalid input: Error path triggers correct exit code.
- Boundary case: Maximum supported value handled correctly.
6.3 Test Data
Input: fixed seed or fixed fixture
Expected: exact output text from §3.4
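An exact-match golden test is easier to debug when a mismatch is reported as a diff rather than a bare failure. The sketch below uses `difflib` from the standard library; `golden_diff` is a hypothetical helper, and the expected text is the AArch64 output from §3.4.

```python
import difflib


def golden_diff(expected: str, actual: str) -> list:
    """Return unified-diff lines; an empty list means the output matches."""
    return list(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="expected", tofile="actual", lineterm=""))


EXPECTED = "ELF: sample.elf\n.text @ 0x00080000 size 0x180"

# Simulated regression: the reported size changed.
actual = "ELF: sample.elf\n.text @ 0x00080000 size 0x184"
for line in golden_diff(EXPECTED, actual):
    print(line)
```

In a test suite, the assertion would be that `golden_diff(expected, actual)` is empty, with the diff lines attached to the failure message.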
7. Common Pitfalls & Debugging
7.1 Frequent Pitfalls
| Pitfall | Symptom | Solution |
|---|---|---|
| Misaligned assumptions | Unexpected output | Re-check invariants |
| Missing validation | Silent failures | Add explicit checks |
| Non-determinism | Flaky output | Fix inputs and seeds |
7.2 Debugging Strategies
- Trace everything: Log each step with stable ordering
- Compare against reference: Use known-good outputs
7.3 Performance Traps
- Avoid repeated parsing of the same input; cache results when possible
8. Extensions & Challenges
8.1 Beginner Extensions
- Add one extra output format
- Add a help screen with examples
8.2 Intermediate Extensions
- Add a verification mode that compares two outputs
- Add structured JSON output
8.3 Advanced Extensions
- Add a batch mode for large inputs
- Add cross-target comparisons (M vs A profile)
9. Real-World Connections
9.1 Industry Applications
- Firmware bring-up: use the same checks to validate early boot images
- Security audits: analyze binaries for ABI or control-flow correctness
9.2 Related Open Source Projects
- binutils: source of many ARM tooling workflows
- QEMU: emulator used for ARM testing
9.3 Interview Relevance
- Explains why ARM behavior differs across profiles
- Demonstrates toolchain literacy and debugging rigor
10. Resources
10.1 Essential Reading
- “ARM Assembly Language” by William Hohl - practical instruction usage
- “Linkers and Loaders” by John R. Levine - binary layout
10.2 Video Resources
- ARM architecture overview talks and lectures
10.3 Tools & Documentation
- GNU binutils documentation
- Arm developer documentation
10.4 Related Projects in This Series
- This project connects with: P02-register-stack-visualizer.md, P03-thumb-encoder-decoder.md, P04-mmio-memory-map-notebook.md
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the core concept without notes
- I can explain why my design choices were necessary
- I can describe one realistic failure mode
11.2 Implementation
- All functional requirements are met
- Tests pass deterministically
- Edge cases are documented
11.3 Growth
- I can describe what I would improve next time
- I can explain this project in an interview
12. Submission / Completion Criteria
Minimum Viable Completion:
- Core functionality works on reference inputs
- Deterministic golden path is documented
- At least one failure path is demonstrated
Full Completion:
- All minimum criteria plus:
- Edge cases are covered with tests
- Output format is stable and documented
Excellence (Going Above & Beyond):
- Add a comparison against a second target
- Provide a short write-up of lessons learned