Project 1: Toolchain Pipeline Explorer

Build a repeatable pipeline that assembles, links, and inspects Cortex-M and AArch64 binaries.

Quick Reference

Attribute	Value
Difficulty	Level 2
Time Estimate	6-10 hours
Main Programming Language	Assembly + C
Alternative Programming Languages	Rust, Zig
Coolness Level	Level 2
Business Potential	Level 2
Prerequisites	Binary/hex basics, CLI tooling comfort, Concept 6: Toolchain & ELF
Key Topics	ELF sections, linker layout, ISA targeting

1. Learning Objectives

By completing this project, you will:

Translate ARM concepts into observable outputs you can verify.
Explain why each toolchain or hardware step is necessary.
Detect and fix at least one realistic failure mode.
Communicate the result clearly in a technical review or interview.

2. All Theory Needed (Per-Concept Breakdown)

Profiles & Execution States

Fundamentals

ARM is a family of architectures organized into profiles optimized for different constraints. The A-profile targets application processors that run rich OSes (phones, laptops, servers), the M-profile targets microcontrollers with tight power and memory budgets, and the R-profile targets deterministic real-time systems (see the Arm M-, R-, and A-profile overviews under References). Each profile implies a different set of instructions, privilege models, and system features. Within a profile, ARM defines execution states (such as AArch64 or AArch32) that determine register width, instruction encoding, and address space. AArch64 is the 64-bit execution state introduced in ARMv8-A, while AArch32 is the 32-bit state; M-profile uses Thumb encodings for compact code density and simpler decode logic (Arm A-profile overview). This is why “ARM assembly” is not a single language: the same mnemonic can encode differently, or even be invalid, depending on profile and state.

Deep Dive

The profile split is the most important high-level idea in ARM. A-profile cores are built to host complex operating systems with virtual memory, multi-core scheduling, and high performance. That means features like exception levels, MMUs, and richer instruction sets matter. In contrast, M-profile focuses on minimal latency, low power, and deterministic behavior: it strips away many features to reduce silicon cost and simplify real-time response. R-profile sits in between: it is more predictable than A-profile and offers higher performance than M-profile (Arm M-profile and R-profile overviews). When you choose to write assembly, you’re implicitly choosing a profile, and that choice changes everything from the boot flow to the toolchain arguments you use.

Execution states deepen the split. In ARMv8-A, AArch64 brings a new 64-bit register file and a fixed-length 32-bit instruction encoding (A64). AArch32 keeps the 32-bit model with the A32 and T32 (Thumb) instruction sets. This means that for A-profile hardware, your build must target the intended execution state, usually by choosing the matching toolchain and flags; otherwise, even valid mnemonics may assemble into the wrong encoding or fail (Arm A-profile overview). M-profile, by contrast, uses Thumb encodings by design, favoring compact instructions and simpler decode paths. These constraints are not academic. They drive register availability, calling convention differences, and even the structure of your interrupt handlers. If you write code that assumes AArch64 but run it on Cortex-M, the encoding and semantics are incompatible.
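To make the encoding difference concrete, here is a minimal host-side Python sketch (not target code) that classifies a Thumb halfword as the start of a 16-bit or a 32-bit instruction. The function and constant names are illustrative. The rule it applies is that a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 begins a 32-bit encoding, whereas every A64 instruction is 4 bytes.

```python
def thumb_instruction_size(first_halfword: int) -> int:
    """Return the size in bytes (2 or 4) of a Thumb instruction.

    A first halfword whose top five bits are 0b11101, 0b11110, or
    0b11111 starts a 32-bit encoding; anything else is a complete
    16-bit instruction.
    """
    if not 0 <= first_halfword <= 0xFFFF:
        raise ValueError("halfword must fit in 16 bits")
    top5 = first_halfword >> 11
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


A64_INSTRUCTION_SIZE = 4  # A64 encodings are fixed at 32 bits

if __name__ == "__main__":
    print(thumb_instruction_size(0x4770))  # BX LR
    print(thumb_instruction_size(0xF000))  # first halfword of a BL
```

A disassembler for Thumb has to make this decision at every instruction boundary, which is one reason a byte stream decoded with the wrong instruction set produces nonsense.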

Another subtle but critical effect of profile and state is the system-level context. A-profile expects multiple privilege levels and potentially a hypervisor. M-profile’s exception model is simpler: its vector table is a plain list of handler addresses that the hardware reads directly on exception entry, and it typically boots straight into a single firmware image. R-profile targets systems where real-time guarantees trump throughput; this affects interrupt priority, memory latency assumptions, and peripheral access patterns. Understanding profile choice lets you reason about why an instruction exists, why a particular addressing mode is missing, and why certain system registers are visible or hidden.

Finally, architecture profiles determine the ecosystem around your work. A-profile benefits from abundant tooling, standardized ABIs, and OS integration, while M-profile leans on vendor SDKs, board-specific memory maps, and smaller toolchains. This guide intentionally spans both because many real-world systems combine them: a Linux-capable application processor for high-level features and a microcontroller for deterministic control. Once you see that split, you can design experiments and projects that map to the right target without confusion.

How this fits on projects

Shapes target selection in P01 (Toolchain Pipeline Explorer) and P07 (Exception Level Lab).
Determines encoding assumptions in P03 (Thumb Encoder/Decoder).

Definitions & key terms

Profile: A family of ARM features optimized for a market segment (A/M/R).
Execution state: The architectural state (AArch64 or AArch32) that defines register width and the instruction sets available: A64 in AArch64, A32 and T32 (Thumb) in AArch32.
AArch64: 64-bit execution state introduced in ARMv8-A.
AArch32: 32-bit execution state in ARMv8-A.
Thumb: Compact instruction set (T32) with 16-bit and 32-bit encodings; it is the only instruction set M-profile cores execute.

Mental model diagram

ARM Architecture Evolution:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

                              ┌─────────────────────────────────────────────┐
                              │            ARM Holdings (IP owner)           │
                              │   Designs architectures, licenses to others  │
                              └─────────────────────┬───────────────────────┘
                                                    │
           ┌────────────────────────────────────────┼────────────────────────────┐
           │                                        │                            │
           ▼                                        ▼                            ▼
    ┌──────────────┐                    ┌────────────────┐              ┌────────────────┐
    │   M-Profile  │                    │    A-Profile   │              │   R-Profile    │
    │ Microcontrollers                  │  Applications  │              │   Real-Time    │
    │ (Embedded)   │                    │ (Phones, PCs)  │              │   (Automotive) │
    └──────────────┘                    └────────────────┘              └────────────────┘
           │                                        │
    ┌──────┴──────┐                    ┌────────────┼────────────┐
    │             │                    │            │            │
    ▼             ▼                    ▼            ▼            ▼
 ┌───────┐   ┌────────┐          ┌─────────┐  ┌─────────┐  ┌─────────┐
 │Cortex │   │Cortex  │          │Cortex-A7│  │Cortex-A │  │Cortex-A │
 │ -M0+  │   │-M4/M7  │          │Cortex-A9│  │53/55/72 │  │76/78/X  │
 │       │   │        │          │(32-bit) │  │(64-bit) │  │(64-bit) │
 └───────┘   └────────┘          └─────────┘  └─────────┘  └─────────┘
     │            │                   │            │            │
     │            │                   │            │            │
  Thumb       Thumb-2              ARM32      AArch64      AArch64
  only        + DSP               + Thumb    + NEON       + NEON
                + FPU

┌──────────────────────────────────────────────────────────────────────────────┐
│ YOUR TARGETS:                                                                 │
│                                                                               │
│ Raspberry Pi Pico (RP2040)     Raspberry Pi 3/4/5                            │
│ ├─ Dual Cortex-M0+ cores       ├─ Cortex-A53/A72/A76 cores                   │
│ ├─ ARMv6-M architecture        ├─ ARMv8-A architecture (AArch64)             │
│ ├─ Thumb instruction set       ├─ A64 instruction set                        │
│ ├─ 16 registers (r0-r15)       ├─ 31 registers (x0-x30)                      │
│ ├─ 133 MHz max clock           ├─ 1.2-2.4 GHz clock                          │
│ └─ 264 KB RAM, no OS           └─ 1-8 GB RAM, Linux capable                  │
└──────────────────────────────────────────────────────────────────────────────┘

ARM Architecture Family Tree

How it works (step-by-step, with invariants and failure modes)

Choose the target profile (A/M/R) based on system constraints and OS expectations.
Select the execution state and instruction set (AArch64 with A64, AArch32 with A32 or T32, or Thumb on M-profile) based on what the core supports and what the toolchain emits.
Assemble and link with profile/state-specific flags; encoding mismatches yield invalid opcodes.
Boot into the expected privilege level; if the firmware expects EL2 and you start at EL1, early setup fails.
Validate on target or emulator; incorrect profile assumptions manifest as illegal instruction faults or boot hangs.
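The first three steps can be sketched as a lookup from target name to assembler invocation. This is a minimal Python sketch under stated assumptions: the target names mirror the CLI examples in §3.4, the tool prefixes (arm-none-eabi-, aarch64-linux-gnu-) are common but depend on your installation, and the Target record and assemble_command helper are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    profile: str          # "M" or "A"
    instruction_set: str  # "T32" or "A64"
    assembler: str        # tool name; the prefix depends on your install
    flags: tuple          # flags passed to the assembler


# Assumed tool prefixes: distributions ship different ones.
TARGETS = {
    "cortex-m0": Target("M", "T32", "arm-none-eabi-as",
                        ("-mcpu=cortex-m0", "-mthumb")),
    "aarch64": Target("A", "A64", "aarch64-linux-gnu-as",
                      ("-march=armv8-a",)),
}


def assemble_command(target_name: str, source: str, output: str) -> list:
    """Build the assembler argv for a named target, or raise KeyError."""
    target = TARGETS[target_name]
    return [target.assembler, *target.flags, "-o", output, source]


if __name__ == "__main__":
    print(" ".join(assemble_command("cortex-m0", "start.s", "start.o")))
```

Keeping the profile and instruction set next to the flags makes a mismatch visible in one place instead of scattered across build scripts.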

Minimal concrete example (pseudo-assembly, not runnable)

Select Target = {Profile: M, Instruction set: Thumb}
Assemble([LOAD R0, [ADDR]], Target)
If Target.InstructionSet != CPU.InstructionSet → Fault: Illegal Instruction

Common misconceptions

“ARM assembly is one language” → It is a family with profile/state splits.
“Thumb is only a compact mode” → It also shapes register access and available instructions.
“AArch64 is just ARM32 with bigger registers” → It changes the register file and encoding model.

Check-your-understanding questions

Why can Cortex-M code not run on a Cortex-A core without translation?
What is the difference between AArch64 and AArch32?
How does the profile choice affect your toolchain flags?

Check-your-understanding answers

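Cortex-M firmware depends on the M-profile system model (a vector table of addresses, its exception entry sequence, system registers, and memory map), which A-profile does not provide. A Cortex-A core in AArch32 state can decode Thumb instructions, but in AArch64 state it executes only A64.
AArch64 is a 64-bit execution state with a new register file and A64 encoding; AArch32 is the 32-bit state and uses the A32 and T32 encodings.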
The assembler and linker must emit instructions for the correct ISA and object format; mismatches produce illegal opcodes or link errors.

Real-world applications

Microcontroller firmware (M-profile) in sensors, robotics, and embedded control.
Application processors (A-profile) in mobile, desktop, and servers.

Where you’ll apply it

This project: see §3.1 and §5.4 in P01-toolchain-pipeline-explorer.md
P01 Toolchain Pipeline Explorer
P03 Thumb Instruction Encoder/Decoder
P07 AArch64 Exception Level Lab

References

Arm M-profile overview.
Arm R-profile overview.
Arm A-profile overview and execution states.

Key insights

Your “ARM assembly” only makes sense once you name the profile and execution state.

Summary

Profiles and execution states are the root of every other difference in ARM assembly. When you get this right, the rest of the system becomes predictable.

Homework/Exercises to practice the concept

Pick two devices (one microcontroller, one phone) and identify their ARM profile and execution state.
Write a one-paragraph explanation of why Thumb exists.

Solutions to the homework/exercises

Example: RP2040 is M-profile with Thumb; a modern smartphone SoC is A-profile with AArch64.
Thumb improves code density and decoder simplicity, which is crucial for small embedded systems.

Toolchain & ELF

Fundamentals

Assembly source alone is not executable; you need a toolchain to assemble, link, and package code into a binary format. GNU as (the GNU assembler) accepts assembly source and emits object files; the linker combines objects into an executable with sections and symbols (GNU assembler manual). On most ARM systems, the object format is ELF, defined by the System V ABI family (ELF format and ABI overview). Understanding sections, symbols, and relocations is essential for boot images, firmware layout, and disassembly.

Deep Dive

The toolchain is a pipeline: source → object → linked image. The assembler parses directives, encodes instructions for the target ISA, and emits relocatable objects. The linker then resolves symbols, assigns addresses, applies relocations, and produces a final ELF file, which can then be converted to a raw binary. This is not a black box: if your startup code lands at the wrong address or your vector table is misaligned, the linker script and the section directives in your source are the first places to look. The GNU assembler manual documents directive syntax and how the assembler handles sections, alignment, and symbols.

ELF (Executable and Linkable Format) is the standard container for compiled objects. It defines headers, sections, and symbol tables so tools can reason about what is in a binary (ELF format and ABI overview). ELF’s strength is transparency: you can inspect sections such as .text (code), .data (initialized data), .bss (zero-initialized data), and custom sections for vector tables or boot metadata. In embedded contexts, you often convert ELF into a raw binary that can be flashed, but the ELF remains the authoritative artifact for debugging because it keeps section addresses, symbols, and, when built with debug options, source-level debug information that the raw binary discards.
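As an illustration of that transparency, the following Python sketch decodes the first fields of an ELF header using only the standard library. It is a simplified reader, not a full parser: it covers the identification bytes, e_type, e_machine, and e_entry. The function name and the returned dictionary keys are choices made for this example.

```python
import struct

EM_ARM = 40        # e_machine value for 32-bit Arm
EM_AARCH64 = 183   # e_machine value for AArch64


def read_elf_header(data: bytes) -> dict:
    """Decode the class, machine, type, and entry point of an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = data[4]                      # 1 = ELF32, 2 = ELF64
    endian = "<" if data[5] == 1 else ">"    # 1 = little-endian
    entry_format = "I" if elf_class == 1 else "Q"
    # After the 16-byte e_ident: e_type, e_machine, e_version, e_entry.
    e_type, e_machine, _version, e_entry = struct.unpack_from(
        endian + "HHI" + entry_format, data, 16)
    return {"bits": 32 if elf_class == 1 else 64,
            "machine": e_machine, "type": e_type, "entry": e_entry}
```

For a real file you would pass the first 64 bytes, for example `read_elf_header(open("sample.elf", "rb").read(64))`, and compare `machine` against EM_ARM or EM_AARCH64 to confirm the build targeted the intended architecture.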

Relocations are where everything connects. When the assembler emits an instruction that references a symbol whose address is not yet known, it emits a relocation entry. The linker later resolves it. This is how references to labels, functions, and global variables are patched. If you understand relocations, you can interpret why certain instructions appear in disassembly, and you can identify errors like “relocation overflow” or “undefined reference.” The same reasoning applies to position-independent code or shared libraries on A-profile systems.
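A small Python model can show what "resolving a relocation" means at the byte level. This sketch imitates an absolute 32-bit relocation of the kind used for vector table entries (R_ARM_ABS32 with the addend stored in place): the patched value is (S + A) | T, where S is the symbol address, A is the word already in the section, and T is 1 only for Thumb function symbols. The function name and the example address are illustrative.

```python
import struct


def apply_abs32(section: bytearray, offset: int,
                symbol_address: int, thumb_function: bool = False) -> int:
    """Patch a 32-bit little-endian word the way an absolute relocation would.

    The addend A is read from the section itself, the symbol address S is
    added, and bit 0 is set when the symbol is a Thumb function.
    """
    (addend,) = struct.unpack_from("<I", section, offset)
    value = (symbol_address + addend) & 0xFFFFFFFF
    if thumb_function:
        value |= 1
    struct.pack_into("<I", section, offset, value)
    return value


if __name__ == "__main__":
    vectors = bytearray(8)  # two empty vector table slots
    patched = apply_abs32(vectors, 4, 0x10000100, thumb_function=True)
    print(hex(patched), vectors.hex())
```

This also explains a detail that often surprises people reading a Cortex-M vector table in a hex dump: handler entries are odd numbers because bit 0 marks the target as Thumb code.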

In practical terms, mastering the toolchain lets you answer questions like: Why is my vector table not at the start of flash? Why does the linker place my .data in RAM but my .text in flash? Why does a symbol show up as undefined? These are the exact questions you will encounter in bare-metal ARM development, and they can only be solved by understanding ELF and the linker. The toolchain also connects to diagnostics: objdump and readelf are not just utilities; they are the microscope that lets you see what the assembler and linker actually produced.

How this fits on projects

Core to P01 (Toolchain Pipeline Explorer) and P10 (Capstone Monitor).

Definitions & key terms

Assembler: Translates assembly source into object files.
Linker: Resolves symbols and produces an executable or binary.
ELF: Executable and Linkable Format for binaries.
Relocation: A record that tells the linker which bytes to patch once a symbol's final address is known.

Mental model diagram

Toolchain Flow
──────────────
Source (.s) → Assembler → Object (.o) → Linker → ELF (.elf) → Binary (.bin)
                    │                 │
               Symbols/Relocs    Sections/Addresses

How it works (step-by-step, with invariants and failure modes)

Assemble source into relocatable objects.
Link with a linker script or default layout.
Verify ELF sections and symbols.
Failure mode: wrong section placement → boot hangs or interrupts jump to wrong address.
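The verification step and its failure mode can be expressed as a small check. The Python sketch below assumes sections are available as (name, address, size) tuples, that the vector table section is named .vectors, and that it should sit at the start of flash; the function name and the message wording are hypothetical.

```python
def check_layout(sections, flash_origin):
    """Return a list of layout problems; an empty list means none were found.

    `sections` is a list of (name, address, size) tuples.
    """
    problems = []
    by_name = {name: (address, size) for name, address, size in sections}
    if ".vectors" not in by_name:
        problems.append(".vectors section is missing")
    elif by_name[".vectors"][0] != flash_origin:
        problems.append(".vectors at 0x%08x, expected 0x%08x"
                        % (by_name[".vectors"][0], flash_origin))
    # Sort by address so that only neighbours need an overlap check.
    ordered = sorted(sections, key=lambda s: (s[1], s[0]))
    for (n1, a1, s1), (n2, a2, _s2) in zip(ordered, ordered[1:]):
        if a1 + s1 > a2:
            problems.append("%s overlaps %s" % (n1, n2))
    return problems
```

Running a check like this right after linking turns a boot hang on hardware into a one-line message at build time.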

Minimal concrete example (pseudo, not runnable)

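.section .vectors, "a"
.word _stack_top        @ Cortex-M entry 0: initial stack pointer (illustrative symbol name)
.word reset_handler     @ Cortex-M entry 1: address of the reset handler
linker script: place .vectors first in the flash region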

Common misconceptions

“ELF is only for OS programs” → It is central in embedded, too.
“Linker script is optional” → Not when you need precise memory layout.

Check-your-understanding questions

What is the role of a relocation entry?
Why do embedded projects often convert ELF to raw binary?
What is the difference between .text and .bss?

Check-your-understanding answers

It records a reference the linker must patch with a final address.
Flashing tools often want raw bytes, but ELF holds symbols for debugging.
.text holds code; .bss holds zero-initialized data.
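The last two answers can be checked with simple arithmetic. This Python sketch estimates flash and RAM use from section sizes, assuming a typical microcontroller layout in which code runs from flash and startup code copies .data to RAM and zero-fills .bss. It deliberately ignores other sections such as .rodata, and the function name is illustrative.

```python
def footprint(sizes: dict) -> dict:
    """Estimate flash and RAM use from section sizes in bytes.

    .text and .data occupy the flashed image (.data is stored there as an
    initial copy); .data and .bss occupy RAM at run time. .bss contributes
    no bytes to the image because startup code zero-fills it.
    """
    text = sizes.get(".text", 0)
    data = sizes.get(".data", 0)
    bss = sizes.get(".bss", 0)
    return {"flash": text + data, "ram": data + bss}


if __name__ == "__main__":
    print(footprint({".text": 0x120, ".data": 0x10, ".bss": 0x200}))
```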

Real-world applications

Firmware image layout, boot loaders, and disassembly tooling.

Where you’ll apply it

This project: see §3.1 and §5.4 in P01-toolchain-pipeline-explorer.md
P01 Toolchain Pipeline Explorer
P10 Capstone Monitor

References

GNU assembler manual.
ELF format and ABI overview.

Key insights

The toolchain is the bridge between assembly and hardware; without it, nothing runs.

Summary

Understanding ELF and linking turns build failures into solvable layout problems.

Homework/Exercises to practice the concept

Identify three sections you expect in a bare-metal ELF and explain why.
Explain how a symbol reference becomes a concrete address.

Solutions to the homework/exercises

.text for code, .data for initialized globals, .bss for zeroed globals.
The assembler emits a relocation that the linker resolves to the final address.

3. Project Specification

3.1 What You Will Build

A small CLI workflow that emits an ELF, inspects sections, and validates entry points for two ARM targets.

3.2 Functional Requirements

Requirement 1: Produce ELF artifacts for both Cortex-M and AArch64 targets
Requirement 2: Display section addresses and sizes in a stable, parseable format
Requirement 3: Emit a symbol summary including entry points and vector table locations
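One way to approach Requirement 3 is to parse the text that nm prints. The Python sketch below assumes nm's default output, where defined symbols appear as `address type name` and undefined symbols have no address column. The function name and the tuple order (name, address, type, matching §3.5) are choices made for this example.

```python
def parse_nm(output: str):
    """Parse default `nm` output into (name, address, type) tuples.

    Defined symbols look like `10000100 T reset_handler`; undefined ones
    have no address, for example `         U main`, and get address None.
    The result is sorted by address, with undefined symbols last.
    """
    symbols = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            address, kind, name = parts
            symbols.append((name, int(address, 16), kind))
        elif len(parts) == 2:
            kind, name = parts
            symbols.append((name, None, kind))
    return sorted(symbols, key=lambda s: (s[1] is None, s[1] or 0, s[0]))
```

Sorting inside the parser keeps the summary stable even if the tool's output order changes between versions.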

3.3 Non-Functional Requirements

Deterministic output across runs
Clear error messages for missing tools

3.4 Example Usage / Output

$ arm-toolchain-lab --target cortex-m0 --show-sections
ELF: sample.elf
.text  @ 0x10000000  size 0x120
.vectors @ 0x10000000 size 0x40

$ arm-toolchain-lab --target aarch64 --show-sections
ELF: sample.elf
.text  @ 0x00080000  size 0x180

$ arm-toolchain-lab --target cortex-m0 --show-sections --bad-flag
error: unknown flag "--bad-flag"
exit code: 2
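The error path in the last transcript could be implemented along these lines. This is a minimal Python sketch using argparse: the flag names come from the examples above, while the list of known targets, the function name, and the wording of the unsupported-target message are assumptions. argparse itself exits with status 2 on its own parse errors, which matches the exit code shown.

```python
import argparse

KNOWN_TARGETS = ("cortex-m0", "aarch64")  # assumed from the examples above


def parse_cli(argv):
    """Return (exit_code, result): an error string or the parsed namespace."""
    parser = argparse.ArgumentParser(prog="arm-toolchain-lab", add_help=False)
    parser.add_argument("--target")
    parser.add_argument("--show-sections", action="store_true")
    args, unknown = parser.parse_known_args(argv)
    if unknown:
        # Report the first unrecognized flag in the format shown in the transcript.
        return 2, 'error: unknown flag "%s"' % unknown[0]
    if args.target not in KNOWN_TARGETS:
        return 2, 'error: unsupported target "%s"' % args.target
    return 0, args


if __name__ == "__main__":
    print(parse_cli(["--target", "cortex-m0", "--show-sections", "--bad-flag"]))
```

Returning the exit code instead of calling sys.exit inside the parser keeps the error path easy to test.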

3.5 Data Formats / Schemas / Protocols

Sections table: name, address, size
Symbols table: name, address, type
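A sections table in this shape can be rendered deterministically with a few lines of Python. The sketch below assumes single-space separators and eight-digit addresses; the exact spacing your tool uses is a design decision, and the function name is illustrative.

```python
def format_sections(sections):
    """Render (name, address, size) tuples as one line per section.

    Rows are sorted by address, then name, so the output does not depend
    on the order in which sections were discovered.
    """
    rows = sorted(sections, key=lambda s: (s[1], s[0]))
    return "\n".join("%s @ 0x%08x size 0x%x" % (name, address, size)
                     for name, address, size in rows)


if __name__ == "__main__":
    print(format_sections([(".text", 0x10000040, 0x120),
                           (".vectors", 0x10000000, 0x40)]))
```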

3.6 Edge Cases

Missing toolchain
Stripped symbols
Unsupported target name
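The missing-toolchain case can be detected before any build step runs. This Python sketch uses shutil.which from the standard library; the helper names and the error message format are hypothetical, and the lookup function is injectable so the check can be tested without a toolchain installed.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH, sorted."""
    return sorted(t for t in tools if which(t) is None)


def tool_error(tools, which=shutil.which):
    """Return an error string for missing tools, or None if all are present."""
    missing = missing_tools(tools, which)
    if missing:
        return "error: missing tools: " + ", ".join(missing)
    return None


if __name__ == "__main__":
    # Tool names are examples; prefixes depend on the installed toolchain.
    print(tool_error(["arm-none-eabi-as", "arm-none-eabi-ld"]))
```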

3.7 Real World Outcome

This is the golden reference for success:

A learner can compare the ELF layout for two targets and explain why sections moved.
The CLI produces deterministic output and a clear error format.

3.7.1 How to Run (Copy/Paste)

Build: follow the toolchain steps defined in this guide
Run: use the CLI examples in §3.4 with fixed inputs
Expected directory: project root

3.7.2 Golden Path Demo (Deterministic)

Run with a fixed input set and confirm output matches §3.4 exactly.

3.7.3 If CLI: Exact Terminal Transcript

$ arm-toolchain-lab --target cortex-m0 --show-sections
ELF: sample.elf
.text  @ 0x10000000  size 0x120
.vectors @ 0x10000000 size 0x40

$ arm-toolchain-lab --target aarch64 --show-sections
ELF: sample.elf
.text  @ 0x00080000  size 0x180

$ arm-toolchain-lab --target cortex-m0 --show-sections --bad-flag
error: unknown flag "--bad-flag"
exit code: 2

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Input Layer  │───▶│ Core Logic   │───▶│ Output Layer │
└──────────────┘     └──────────────┘     └──────────────┘

4.2 Key Components

Component	Responsibility	Key Decisions
Input Parser	Validate and normalize input	Strict error handling
Core Engine	Perform the main computation	Deterministic paths
Reporter	Produce user-facing output	Stable formatting

4.3 Data Structures (No Full Code)

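Record SectionEntry {
  name: string
  address: unsigned integer
  size: unsigned integer
}
Record SymbolEntry {
  name: string
  address: unsigned integer
  type: string
}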

4.4 Algorithm Overview

Key Algorithm: Core Flow

Parse input and validate parameters.
Execute the core transformation or analysis.
Emit deterministic output or error summary.

Complexity Analysis:

Time: O(n) in the size of input records
Space: O(n) for stored mappings and logs

5. Implementation Guide

5.1 Development Environment Setup

# Install GNU toolchains for both targets, then verify versions (tool prefixes vary by distribution)
arm-none-eabi-as --version
aarch64-linux-gnu-as --version

5.2 Project Structure

project-root/
├── src/
│   ├── core
│   └── io
├── tests/
│   └── fixtures
├── docs/
└── README.md

5.3 The Core Question You’re Answering

“Build a repeatable pipeline that assembles, links, and inspects Cortex-M and AArch64 binaries.”

5.4 Concepts You Must Understand First

Stop and research these before coding:

Profiles & Execution States
- What is the key invariant you must preserve?
Toolchain & ELF
- What is the key invariant you must preserve?

5.5 Questions to Guide Your Design

Data Flow
- How does input become output?
- Which steps must be deterministic?
Validation
- What is the simplest test that proves correctness?
- How will you detect regressions?

5.6 Thinking Exercise

Trace the Critical Path

Write a step-by-step trace of the most important workflow in this project.

Questions to answer:

Where could a subtle bug hide?
What would you log to prove correctness?

5.7 The Interview Questions They’ll Ask

“What is the core invariant this project relies on?”
“How would you debug a failure in this workflow?”
“What trade-offs did you make in design?”
“How does this map to real hardware or toolchains?”
“How do you prove your output is correct?”

5.8 Hints in Layers

Hint 1: Start small. Focus on the smallest input that still demonstrates the concept.

Hint 2: Make output deterministic. Fix inputs and produce stable logs before expanding functionality.

Hint 3: Validate against a known reference. Compare with a known-good output or specification.

Hint 4: Add instrumentation. Log internal steps so you can verify each phase explicitly.

5.9 Books That Will Help

Topic	Book	Chapter
Core concept	“ARM Assembly Language” by William Hohl	Ch. 3-5
Binary formats	“Linkers and Loaders” by John R. Levine	Ch. 1-3

5.10 Implementation Phases

Phase 1: Foundation (2-4 hours)

Goals:

Establish a minimal working pipeline
Validate one end-to-end path

Tasks:

1. Build the smallest viable input and output
2. Verify outputs against a reference

Checkpoint: Output matches expected golden path

Phase 2: Core Functionality (4-8 hours)

Goals:

Implement main logic and validation
Add structured error handling

Tasks:

1. Implement the core transformation
2. Add deterministic reporting

Checkpoint: Core tests pass reliably

Phase 3: Polish & Edge Cases (2-4 hours)

Goals:

Cover edge cases
Improve output clarity

Tasks:

1. Add negative tests
2. Document limitations

Checkpoint: All edge cases handled gracefully

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Input format	Free-form vs structured	Structured	Easier validation
Output format	Human vs machine	Both	Supports verification and tooling

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Validate core logic	Field parsing, bounds checks
Integration Tests	Validate full flow	End-to-end CLI runs
Edge Case Tests	Validate boundaries	Empty input, invalid flags

6.2 Critical Test Cases

Golden path: Fixed input produces known output.
Invalid input: Error path triggers correct exit code.
Boundary case: Maximum supported value handled correctly.

6.3 Test Data

Input: fixed seed or fixed fixture
Expected: exact output text from §3.4
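A golden-output comparison is easier to act on when a failure shows what changed. The following Python sketch uses difflib from the standard library to produce a unified diff between the expected and actual text; the function name is illustrative.

```python
import difflib


def golden_diff(expected: str, actual: str) -> str:
    """Return a unified diff, or an empty string when the outputs match."""
    lines = difflib.unified_diff(expected.splitlines(), actual.splitlines(),
                                 fromfile="expected", tofile="actual",
                                 lineterm="")
    return "\n".join(lines)


if __name__ == "__main__":
    print(golden_diff("ELF: sample.elf\n.text @ 0x0 size 0x10",
                      "ELF: sample.elf\n.text @ 0x0 size 0x20"))
```

A test can then assert that the diff is empty and print it when it is not.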

7. Common Pitfalls & Debugging

Pitfall	Symptom	Solution
Misaligned assumptions	Unexpected output	Re-check invariants
Missing validation	Silent failures	Add explicit checks
Non-determinism	Flaky output	Fix inputs and seeds

7.2 Debugging Strategies

Trace everything: Log each step with stable ordering
Compare against reference: Use known-good outputs

7.3 Performance Traps

Avoid repeated parsing of the same input; cache results when possible

8. Extensions & Challenges

8.1 Beginner Extensions

Add one extra output format
Add a help screen with examples

8.2 Intermediate Extensions

Add a verification mode that compares two outputs
Add structured JSON output

8.3 Advanced Extensions

Add a batch mode for large inputs
Add cross-target comparisons (M vs A profile)

9. Real-World Connections

9.1 Industry Applications

Firmware bring-up: use the same checks to validate early boot images
Security audits: analyze binaries for ABI or control-flow correctness

9.2 Related Open Source Projects

GNU binutils: provides the assembler (as), linker (ld), objdump, and readelf discussed in this guide
QEMU: emulator used for ARM testing

9.3 Interview Relevance

Explains why ARM behavior differs across profiles
Demonstrates toolchain literacy and debugging rigor

10. Resources

10.1 Essential Reading

“ARM Assembly Language” by William Hohl - practical instruction usage
“Linkers and Loaders” by John R. Levine - binary layout

10.2 Video Resources

ARM architecture overview talks and lectures

10.3 Tools & Documentation

GNU binutils documentation
Arm developer documentation

10.4 Related Projects in This Series

This project connects with: P02-register-stack-visualizer.md, P03-thumb-encoder-decoder.md, P04-mmio-memory-map-notebook.md

11. Self-Assessment Checklist

11.1 Understanding

I can explain the core concept without notes
I can explain why my design choices were necessary
I can describe one realistic failure mode

11.2 Implementation

All functional requirements are met
Tests pass deterministically
Edge cases are documented

11.3 Growth

I can describe what I would improve next time
I can explain this project in an interview

12. Submission / Completion Criteria

Minimum Viable Completion:

Core functionality works on reference inputs
Deterministic golden path is documented
At least one failure path is demonstrated

Full Completion:

All minimum criteria plus:
Edge cases are covered with tests
Output format is stable and documented

Excellence (Going Above & Beyond):

Add a comparison against a second target
Provide a short write-up of lessons learned
