Project 2: Addressing Mode Calculator

A CLI tool that computes effective addresses from base, index, scale, and displacement.

Quick Reference

Attribute | Value
Difficulty | Level 3
Time Estimate | Weekend
Main Programming Language | Python or C
Alternative Programming Languages | Rust, Go
Coolness Level | Level 3
Business Potential | 1
Prerequisites | Data Representation, Memory, and Addressing
Key Topics | Data Representation, Memory, and Addressing

1. Learning Objectives

By completing this project, you will:

  1. Explain how an addressing mode calculator exposes key x86-64 behaviors.
  2. Build a deterministic tool with clear, inspectable output.
  3. Validate correctness against a golden reference output.
  4. Connect the tool output to ABI and architecture rules.
  5. Explain why address calculation is at the heart of memory operations and compiler output.

2. All Theory Needed (Per-Concept Breakdown)

Data Representation, Memory, and Addressing

Fundamentals

x86-64 is a byte-addressed, little-endian architecture. Data representation determines how values appear in memory, how loads and stores reconstruct those values, and how alignment affects performance. Memory addressing is not just “base + offset”; it is a rich set of forms including base, index, scale, and displacement, plus RIP-relative addressing in 64-bit mode. These addressing forms are part of the ISA and are a primary tool for compilers. Virtual memory adds another layer: the addresses you see in registers are virtual, translated by page tables configured by the OS. When you write or analyze assembly, you are always navigating both representation and translation. Official architecture references and ABI specifications describe these addressing forms and constraints. (Sources: Intel SDM, Microsoft x64 architecture docs)

Deep Dive

Data representation is the mapping between abstract values and physical bytes. On x86-64, signed integers use two’s complement and multi-byte values are stored in little-endian order: the least significant byte sits at the lowest memory address. When you inspect a byte-by-byte memory dump, a multi-byte value therefore appears reversed relative to how you would write it in hex. This matters for debugging and binary analysis; it also matters for writing correct parsing and serialization logic.
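As a small illustration of the byte order described above, the following Python sketch serializes a 64-bit value with the standard `int.to_bytes` and `int.from_bytes` methods. The sample value is arbitrary and chosen only so that each byte is easy to spot.

```python
# Illustrative value only; any 64-bit integer works.
value = 0x1122334455667788

# Serialize to 8 bytes, least significant byte first (little-endian).
le_bytes = value.to_bytes(8, byteorder="little")

# A hexdump lists bytes in address order, so the low byte 0x88 comes first.
dump = " ".join(f"{b:02x}" for b in le_bytes)
print(dump)  # 88 77 66 55 44 33 22 11

# Reading the same bytes back as little-endian recovers the original value.
round_trip = int.from_bytes(le_bytes, byteorder="little")
print(hex(round_trip))  # 0x1122334455667788
```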

Memory addressing is a key differentiator between x86-64 and many simpler ISAs. The architecture supports effective addresses of the form base + index * scale + displacement, where scale can be 1, 2, 4, or 8. This lets the CPU calculate addresses for arrays and structures in a single instruction, which is why compiler output often uses complex addressing instead of explicit multiply or add instructions. In long mode, RIP-relative addressing is widely used for position-independent code; it allows the instruction stream to refer to nearby constants and jump tables without absolute addresses. That is why you will see references relative to RIP rather than absolute pointers in modern binaries.
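The formula above can be sketched in a few lines of Python. The function name `effective_address` and its keyword defaults are illustrative choices, not part of any specification; the 64-bit mask reflects the assumption that the calculation uses 64-bit address size, where the result wraps modulo 2**64.

```python
MASK64 = (1 << 64) - 1  # results wrap modulo 2**64 with 64-bit address size
VALID_SCALES = (1, 2, 4, 8)

def effective_address(base=0, index=0, scale=1, disp=0):
    """Return base + index*scale + disp, truncated to 64 bits."""
    if scale not in VALID_SCALES:
        raise ValueError(f"scale must be one of {VALID_SCALES}, got {scale}")
    return (base + index * scale + disp) & MASK64

# Element 5 of an array of 4-byte integers starting at 0x2000:
print(hex(effective_address(base=0x2000, index=5, scale=4)))  # 0x2014
```

A negative displacement works the same way: `effective_address(base=0x1000, disp=-8)` yields `0xff8`.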

Virtual memory is the next layer of meaning. The addresses in registers are virtual; they are translated to physical addresses using a page table hierarchy. As a result, two different processes can have the same virtual address mapping to different physical memory. The OS enforces protection and isolation through page permissions. When you read assembly, you see the virtual addresses. The mapping is invisible unless you consult page tables or OS introspection tools. This is one reason memory corruption bugs can appear non-deterministic: a bad pointer may still land on mapped, readable memory, just not the memory the program intended.
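To make the page table hierarchy a little more concrete, here is a minimal sketch of how a virtual address is divided into table indices. It assumes 4-level paging with 4 KiB pages (48-bit virtual addresses), which is one common configuration; other paging modes split the bits differently. The function and key names are illustrative.

```python
def split_virtual_address(vaddr):
    """Split a virtual address into 4-level page-table indices (4 KiB pages)."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,   # bits 47..39
        "pdpt": (vaddr >> 30) & 0x1FF,   # bits 38..30
        "pd": (vaddr >> 21) & 0x1FF,     # bits 29..21
        "pt": (vaddr >> 12) & 0x1FF,     # bits 20..12
        "offset": vaddr & 0xFFF,         # bits 11..0, byte offset in the page
    }

# Build an address from known indices so the split is easy to verify by eye.
vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
print(split_virtual_address(vaddr))
```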

Alignment is another subtlety. Many instructions perform better when data is aligned to its natural width (for example, 8-byte aligned for 64-bit values). Misaligned loads are supported in x86-64 but can be slower or cause extra microarchitectural work. ABI conventions often require stack alignment to 16 bytes at call boundaries, which ensures that SIMD operations and stack-based data are aligned. This alignment rule is part of the ABI, not just a performance hint.
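An alignment check reduces to testing the low bits of an address. The helpers below are a sketch; the names and the default cap of 16 bytes in `largest_alignment` are assumptions made for illustration.

```python
def is_aligned(address, alignment):
    """True if address is a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return address & (alignment - 1) == 0

def largest_alignment(address, limit=16):
    """Largest power of two, up to limit, that divides address."""
    alignment = 1
    while alignment < limit and address % (alignment * 2) == 0:
        alignment *= 2
    return alignment

print(is_aligned(0x1118, 8))      # True
print(is_aligned(0x1118, 16))     # False
print(largest_alignment(0x1118))  # 8
```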

Addressing modes also influence instruction encoding. The ModR/M and SIB bytes encode the base, index, and scale, and select whether a displacement follows. Some combinations are invalid or have special meaning: in 64-bit mode, a ModR/M byte with mod = 00 and r/m = 101 selects RIP-relative addressing; a SIB byte with base = 101 under mod = 00 means a 32-bit displacement with no base register; and a SIB index field of 100 (with REX.X clear) means “no index,” which is why RSP cannot serve as an index register. Understanding this encoding is critical for building decoders and for interpreting bytes in memory. It is also how you can verify that a disassembler is correct: the addressing mode can be inferred from the encoding and compared to the textual rendering.
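The bit layout of the two bytes can be sketched as follows. The function names are illustrative, and the sketch ignores the REX prefix bits that extend the register fields to four bits. The sample bytes come from the encoding of `mov rax, [rbx + rcx*8 + 0x18]`, which is `48 8B 44 CB 18`.

```python
def decode_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def decode_sib(byte):
    """Split a SIB byte into (scale_factor, index_field, base_field)."""
    scale = 1 << ((byte >> 6) & 0b11)  # 2-bit field encodes 1, 2, 4, or 8
    return scale, (byte >> 3) & 0b111, byte & 0b111

# ModR/M 0x44: mod=01 (8-bit displacement follows), reg=000 (rax), rm=100 (SIB follows)
print(decode_modrm(0x44))  # (1, 0, 4)

# SIB 0xCB: scale=8, index=001 (rcx), base=011 (rbx)
print(decode_sib(0xCB))    # (8, 1, 3)
```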

Finally, consider how data representation affects control flow and calling conventions. Arguments passed by reference are simply addresses; the ABI does not enforce type. That means assembly must interpret the bytes correctly, or the program will behave incorrectly even if the instruction sequence is “valid.” This is where assembly becomes a discipline: you must know what the bytes mean, and that meaning is not written anywhere except in the ABI and the program’s logic.

How this fits in the projects

  • Projects 2-4 are explicitly about effective address calculation and RIP-relative forms.
  • Projects 9-10 require precise understanding of data layout and alignment inside ELF/PE sections.

Definitions & key terms

  • Little-endian: Least significant byte at lowest address.
  • Effective address: The computed address used by a memory instruction.
  • RIP-relative: Addressing relative to the instruction pointer.
  • Virtual memory: The address space seen by a process, mapped to physical memory.
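  • Alignment: The property that an address is a multiple of some power of two, usually the size of the data; it can be required for correctness or merely helpful for performance.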

Mental model diagram

VALUE -> BYTES -> VIRTUAL ADDRESS -> PAGE TABLE -> PHYSICAL ADDRESS

      +---------+          +------------------+
      |  Value  |  encode  |  Byte Sequence   |
      +---------+          +---------+--------+
                                    |
                                    v
                         +----------------------+
                         |  Effective Address   |
                         | base + index*scale + |
                         |      displacement    |
                         +----------+-----------+
                                    |
                                    v
                         +----------------------+
                         |   Virtual Address    |
                         +----------+-----------+
                                    |
                                    v
                         +----------------------+
                         |   Page Translation   |
                         +----------+-----------+
                                    |
                                    v
                         +----------------------+
                         |   Physical Address   |
                         +----------------------+

How it works

  1. The instruction computes an effective address from base/index/scale/disp.
  2. The CPU adds the segment base (zero in 64-bit mode except for FS and GS) and uses the result as a virtual address.
  3. MMU translates virtual to physical using page tables.
  4. Data is loaded or stored in little-endian byte order.

Invariants and failure modes:

  • Invariant: Effective address is computed before translation.
  • Failure: Misinterpreting endianness yields wrong values.
  • Invariant: ABI defines alignment at call boundaries.
  • Failure: Misalignment can break SIMD assumptions or slow down code.

Minimal concrete example (pseudo-assembly, not real code)

# PSEUDOCODE ONLY
# Compute address of element i in an array of 8-byte elements
EFFECTIVE_ADDRESS = BASE_PTR + INDEX * 8 + OFFSET
LOAD64 REG_X, [EFFECTIVE_ADDRESS]
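For readers who want something executable, the same idea can be sketched in Python with the standard `struct` module. The `memory` buffer is a hypothetical stand-in for an address space in which address 0 is `memory[0]`; the stored value and the sizes are arbitrary.

```python
import struct

memory = bytearray(64)            # hypothetical flat memory, address 0 = memory[0]
BASE_PTR, INDEX, OFFSET = 8, 3, 0

# Effective address of element 3 in an array of 8-byte elements.
effective_address = BASE_PTR + INDEX * 8 + OFFSET   # 8 + 24 + 0 = 32

# Store a 64-bit value there so the load has something to read.
struct.pack_into("<Q", memory, effective_address, 0xDEADBEEFCAFEF00D)

# "<Q" reads an unsigned 64-bit integer in little-endian byte order.
(reg_x,) = struct.unpack_from("<Q", memory, effective_address)
print(hex(effective_address), hex(reg_x))  # 0x20 0xdeadbeefcafef00d
```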

Common misconceptions

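  • “x86-64 is big-endian.” It is little-endian; instructions such as BSWAP and MOVBE exist for converting to and from big-endian data.
  • “All addresses are physical.” User code uses virtual addresses.
  • “Alignment is optional.” Some of it is mandatory: the ABI requires 16-byte stack alignment at call boundaries, and aligned SIMD instructions such as MOVAPS fault on misaligned memory operands.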

Check-your-understanding questions

  1. Why does little-endian matter when reading a hexdump?
  2. What is the difference between effective and virtual address?
  3. Why do compilers use base+index*scale addressing?

Check-your-understanding answers

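  1. A hexdump lists bytes in address order, so a multi-byte value appears with its least significant byte first, reversed relative to how the number is written.
  2. The effective address is the result of the instruction’s base + index * scale + displacement calculation. Adding the segment base (zero in 64-bit mode except for FS and GS) gives the virtual address, which the MMU then translates to a physical address.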
  3. It encodes array indexing in a single instruction.

Real-world applications

  • Debugging pointer arithmetic errors
  • Building instruction decoders and disassemblers
  • Understanding how compilers lay out data

Where you will apply it: Projects 2, 3, 4, 9, 10

References

  • Intel 64 and IA-32 Architectures Software Developer’s Manual (Intel)
  • Microsoft x64 architecture documentation
  • “Computer Systems: A Programmer’s Perspective” by Bryant and O’Hallaron - Ch. 3

Key insight: Memory is not just bytes; it is a layered mapping between representation and address translation.

Summary: Effective addressing and data layout are the glue between values in your head and bytes in memory.

Homework/Exercises to practice the concept

  • Convert a 64-bit integer into its little-endian byte sequence.
  • Compute effective addresses for an array with different indices.

Solutions to the homework/exercises

  • List the bytes from least significant to most significant.
  • Use base + index * element_size + offset.

3. Project Specification

3.1 What You Will Build

A CLI tool that computes effective addresses from base, index, scale, and displacement.

Why this teaches x86-64: Address calculation is the heart of memory operations and compiler output.

Included:

  • Deterministic CLI output for a fixed input
  • Clear mapping between inputs and architectural meaning
  • A small test suite with edge cases

Excluded:

  • Full compiler or full disassembler coverage
  • Production-grade UI or packaging

3.2 Functional Requirements

  1. Deterministic Output: Same input yields identical output.
  2. Architecture-Aware: Output references ABI/ISA rules where relevant.
  3. Validation Mode: Provide a compare mode against a golden output.

3.3 Non-Functional Requirements

  • Performance: Fast enough for small inputs and interactive use.
  • Reliability: Handles malformed inputs with clear errors.
  • Usability: Outputs are readable and documented.

3.4 Example Usage / Output

$ x64addr --base 0x1000 --index 0x20 --scale 8 --disp 0x18

EFFECTIVE_ADDRESS = 0x0000000000001118
ALIGNMENT: 8-byte aligned
RIP_RELATIVE: false
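The report above could be produced by a small rendering function like the sketch below. The function name is hypothetical, and two details are assumptions rather than requirements from this specification: the ALIGNMENT line reports the largest power-of-two alignment up to 16 bytes, and hex digits are printed in lowercase.

```python
MASK64 = (1 << 64) - 1

def render_report(base, index, scale, disp, rip_relative=False):
    """Return the three report lines for one address computation."""
    ea = (base + index * scale + disp) & MASK64
    # Assumption: report the largest power-of-two alignment, capped at 16.
    alignment = 16
    while alignment > 1 and ea % alignment:
        alignment //= 2
    return "\n".join([
        f"EFFECTIVE_ADDRESS = 0x{ea:016x}",
        f"ALIGNMENT: {alignment}-byte aligned",
        f"RIP_RELATIVE: {str(rip_relative).lower()}",
    ])

print(render_report(0x1000, 0x20, 8, 0x18))
```

Here 0x1000 + 0x20 * 8 + 0x18 = 0x1118, which is divisible by 8 but not by 16, so the sketch reproduces the three lines shown in the example.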

3.5 Data Formats / Schemas / Protocols

  • Input format: line-oriented text or hex bytes (documented in README)
  • Output format: stable, human-readable report with labeled fields

3.6 Edge Cases

  • Empty input or missing fields
  • Invalid numeric values or malformed hex
  • Inputs that exercise maximum/minimum bounds

3.7 Real World Outcome

This section is your golden reference. Match it exactly.

3.7.1 How to Run (Copy/Paste)

  • Build: (if needed) make or equivalent
  • Run: P02-addressing-mode-calculator with sample input
  • Working directory: project root

3.7.2 Golden Path Demo (Deterministic)

Run with the provided demo input and confirm output matches the transcript.

3.7.3 Exact Terminal Transcript (CLI)

$ x64addr --base 0x1000 --index 0x20 --scale 8 --disp 0x18

EFFECTIVE_ADDRESS = 0x0000000000001118
ALIGNMENT: 8-byte aligned
RIP_RELATIVE: false

4. Solution Architecture

4.1 High-Level Design

INPUT -> PARSER -> MODEL -> RENDERER -> REPORT

4.2 Key Components

Component | Responsibility | Key Decisions
Parser | Turn input into structured records | Strict vs permissive parsing
Model | Apply ISA/ABI rules | Deterministic state transitions
Renderer | Produce readable output | Stable formatting

4.3 Data Structures (No Full Code)

  • Record: holds one instruction/event with decoded fields
  • State: represents register/flag or address state
  • Report: list of formatted output lines

4.4 Algorithm Overview

Key Algorithm: Parse and Evaluate

  1. Parse input into records.
  2. Apply rules to update state.
  3. Render the state and summary output.

Complexity Analysis:

  • Time: O(n) over input records
  • Space: O(n) for report output

5. Implementation Guide

5.1 Development Environment Setup

# Ensure basic tools are installed
# build-essential or clang, plus objdump/readelf if needed

5.2 Project Structure

project-root/
├── src/
│   ├── main.*
│   ├── parser.*
│   └── model.*
├── tests/
│   └── test_cases.*
└── README.md

5.3 The Core Question You’re Answering

How does the CPU compute the address used by a memory instruction?

5.4 Concepts You Must Understand First

  1. Addressing Modes
    • What is base + index * scale + displacement?
    • Book Reference: “Computer Systems: A Programmer’s Perspective” - Ch. 3
  2. Alignment
    • Why does alignment matter for performance?
    • Book Reference: “Write Great Code, Volume 1” - Ch. 6

5.5 Questions to Guide Your Design

  1. Input Model
    • How will users specify base/index/scale/disp?
    • How will you handle missing components?
  2. Validation
    • How will you detect invalid scales?
    • How will you report alignment conditions?
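One possible answer to these questions, sketched with Python’s standard `argparse` module: every component is optional and defaults to a neutral value, and invalid scales are rejected by the parser itself. The flag names follow the example in section 3.4; the helper names and the choice of defaults are assumptions.

```python
import argparse

def parse_int(text):
    """Parse an integer written in Python literal syntax, e.g. 42 or 0x2a."""
    try:
        return int(text, 0)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a valid integer: {text!r}")

def build_parser():
    parser = argparse.ArgumentParser(prog="x64addr")
    # Missing components default to zero; a missing scale defaults to 1.
    parser.add_argument("--base", type=parse_int, default=0)
    parser.add_argument("--index", type=parse_int, default=0)
    parser.add_argument("--scale", type=parse_int, default=1, choices=(1, 2, 4, 8))
    parser.add_argument("--disp", type=parse_int, default=0)
    return parser

args = build_parser().parse_args(["--base", "0x1000", "--disp", "0x18"])
print(args.base, args.index, args.scale, args.disp)  # 4096 0 1 24
```

With this sketch, `--scale 3` makes `argparse` print a usage error and exit. Negative hex values are safest passed in the `--disp=-0x8` form so they are not mistaken for options.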

5.6 Thinking Exercise

Address Walkthrough

Given a base of 0x1000, an index of 0x3, and a displacement of 0x20, compute the effective address for each scale (1, 2, 4, and 8). Explain which results are aligned for 8-byte data.

Questions to answer:

  • Which combination yields the smallest effective address?
  • How does alignment change when scale changes?
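After working the exercise by hand, a short loop like the following can be used to check the answers. It is only a sketch; the variable names are arbitrary.

```python
base, index, disp = 0x1000, 0x3, 0x20

results = {}
for scale in (1, 2, 4, 8):
    ea = base + index * scale + disp
    results[scale] = ea
    aligned = "yes" if ea % 8 == 0 else "no"
    print(f"scale={scale}: ea=0x{ea:x}, 8-byte aligned: {aligned}")
```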

5.7 The Interview Questions They’ll Ask

  1. “What addressing modes does x86-64 support?”
  2. “Why does RIP-relative addressing exist?”
  3. “How does alignment affect performance?”
  4. “What happens if you use a misaligned address?”
  5. “How would you compute an address for a struct field?”

5.8 Hints in Layers

Hint 1 (Starting Point): Represent the address components as integers and compute a single formula.

Hint 2 (Next Level): Add validation for scale values and optional fields.

Hint 3 (Technical Details): Treat missing base or index as zero. Keep output in canonical 64-bit hex.

Hint 4 (Tools/Debugging): Cross-check a few results by hand and confirm the tool matches.

5.9 Books That Will Help

Topic | Book | Chapter
Addressing modes | “Computer Systems: A Programmer’s Perspective” | Ch. 3
Data layout | “Write Great Code, Volume 1” | Ch. 6

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Parse input format
  • Produce a minimal output

Tasks:

  1. Define input grammar and example files.
  2. Implement a minimal parser and renderer.

Checkpoint: Golden output matches a small input.

Phase 2: Core Functionality (1 week)

Goals:

  • Implement full rule set
  • Add validation and errors

Tasks:

  1. Implement rule engine for core cases.
  2. Add error handling for invalid inputs.

Checkpoint: All core tests pass.

Phase 3: Polish & Edge Cases (2-3 days)

Goals:

  • Add edge-case coverage
  • Improve output readability

Tasks:

  1. Add edge-case tests.
  2. Refine output formatting and summary.

Checkpoint: Output matches golden transcript for all cases.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Input format | Text, JSON | Text | Easiest to audit and diff
Output format | Plain text, JSON | Plain text | Matches CLI tooling

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Validate parsing and rule application | Valid/invalid inputs
Integration Tests | End-to-end output comparison | Golden transcripts
Edge Case Tests | Stress unusual inputs | Empty input, max values

6.2 Critical Test Cases

  1. Minimal Input: One record, verify output.
  2. Boundary Values: Largest/smallest values.
  3. Malformed Input: Ensure clean error messages.
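These three cases might look like the following with the standard `unittest` module. The `effective_address` function here is a hypothetical stand-in for your own model code, and treating the result as wrapping to 64 bits is an assumption about how the tool defines its bounds.

```python
import unittest

MASK64 = (1 << 64) - 1

def effective_address(base=0, index=0, scale=1, disp=0):
    """Hypothetical function under test; replace with your own model code."""
    if scale not in (1, 2, 4, 8):
        raise ValueError(f"invalid scale: {scale}")
    return (base + index * scale + disp) & MASK64

class EffectiveAddressTests(unittest.TestCase):
    def test_minimal_input(self):
        # Base only: index and displacement default to zero.
        self.assertEqual(effective_address(base=0x1000), 0x1000)

    def test_boundary_wraps_to_64_bits(self):
        # Largest 64-bit base plus one wraps around to zero.
        self.assertEqual(effective_address(base=MASK64, disp=1), 0)

    def test_malformed_scale_is_rejected(self):
        with self.assertRaises(ValueError):
            effective_address(base=0x1000, index=1, scale=3)
```

Saved in a test file, the cases can be run with `python -m unittest`.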

6.3 Test Data

INPUT: sample_min.txt
EXPECTED: matches golden transcript

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Wrong assumptions | Output mismatches | Re-read ABI/ISA rules
Off-by-one parsing | Missing fields | Add explicit length checks
Ambiguous output | Hard to verify | Add labels and separators

Project-specific pitfalls

Problem 1: “Addresses are wrong when index is missing”

  • Why: Missing index treated as garbage instead of zero.
  • Fix: Default missing fields to zero.
  • Quick test: Base-only input should equal base + disp.

7.2 Debugging Strategies

  • Golden diffing: Use diff to compare outputs line by line.
  • State logging: Print intermediate state after each step.
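Golden diffing can also be done inside the tool or its test suite. The sketch below uses the standard `difflib` module; the function name is illustrative, and the golden text is the transcript from section 3.7.3 with a simulated regression applied to it.

```python
import difflib

def diff_against_golden(actual, golden):
    """Return unified-diff lines; an empty list means the outputs match."""
    return list(difflib.unified_diff(
        golden.splitlines(), actual.splitlines(),
        fromfile="golden", tofile="actual", lineterm="",
    ))

golden = (
    "EFFECTIVE_ADDRESS = 0x0000000000001118\n"
    "ALIGNMENT: 8-byte aligned\n"
    "RIP_RELATIVE: false"
)
actual = golden.replace("8-byte", "4-byte")  # simulate a regression

for line in diff_against_golden(actual, golden):
    print(line)
```

A compare mode could exit with a non-zero status whenever the returned list is non-empty.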

7.3 Performance Traps

  • Avoid over-optimizing; correctness and determinism matter most.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a new input case and golden output
  • Add a summary line with counts

8.2 Intermediate Extensions

  • Add JSON output mode
  • Add validation warnings for suspicious inputs

8.3 Advanced Extensions

  • Support additional ABI or instruction variants
  • Integrate with a real binary to collect inputs

9. Real-World Connections

9.1 Industry Applications

  • Profilers and tracers: Use similar decoding and state models.
  • Security analysis: Use precise ABI knowledge to interpret crashes.
  • objdump: reference tool for binary inspection.
  • llvm-objdump: LLVM-based disassembly and inspection.

9.3 Interview Relevance

  • ABI and calling conventions are common systems interview topics.
  • Explaining decoding and linking demonstrates low-level fluency.

10. Resources

10.1 Essential Reading

  • Intel 64 and IA-32 Architectures Software Developer’s Manual - ISA reference
  • System V AMD64 ABI Draft 0.99.7 - calling convention rules

10.2 Video Resources

  • Vendor and university lectures on x86-64 and ABIs (search official channels)