Project 1: Register and Flags Simulator
A CLI simulator that executes a tiny pseudo-instruction set and prints register/flag transitions.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3 |
| Time Estimate | Weekend |
| Main Programming Language | Python or C (text-based simulator) |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Level 3 |
| Business Potential | 1 |
| Prerequisites | Architectural State and Execution Modes, Data Representation, Memory, and Addressing |
| Key Topics | Architectural State and Execution Modes, Data Representation, Memory, and Addressing |
1. Learning Objectives
By completing this project, you will:
- Explain why a register and flags simulator reveals key x86-64 behaviors.
- Build a deterministic tool with clear, inspectable output.
- Validate correctness against a golden reference output.
- Connect the tool output to ABI and architecture rules.
- Build a mental model of architectural state changes without needing real code.
2. All Theory Needed (Per-Concept Breakdown)
Architectural State and Execution Modes
Fundamentals
The x86-64 architecture defines a precise set of architectural state: general-purpose registers, flags, instruction pointer, and control registers. x86-64 is a 64-bit extension of the x86 family; it adds 64-bit registers and a 64-bit address space while preserving backward compatibility. In practice, the CPU runs in long mode when executing 64-bit code. Long mode includes both 64-bit submode and compatibility submode for 32-bit code, which is why the same processor can run modern OSes and legacy binaries. Understanding architectural state is foundational: every instruction is just a transformation of state, and every bug is a mismatch between assumed state and actual state. Official architecture references describe the state, registers, and execution modes in detail, and those documents are the ground truth for everything you will do in this guide. (Sources: Intel SDM, Microsoft x64 architecture docs)
Deep Dive
Architectural state is the contract between software and hardware. On x86-64, this contract includes 16 general-purpose registers (GPRs), instruction pointer (RIP), flags (RFLAGS), vector registers, and a collection of control and model-specific registers. Most assembly you write or analyze uses the GPRs, RIP, and RFLAGS. A key idea is that x86-64 is not a clean break from x86; it is an extension. The architecture introduces 64-bit registers by extending the existing 32-bit ones. The 64-bit submode uses RIP-relative addressing as a first-class form of memory reference, enabling position-independent code. Long mode also changes segmentation behavior: segmentation is mostly disabled for code and data, with flat 64-bit addressing, while FS/GS remain available for thread-local storage and certain OS conventions. Compatibility submode exists to run 32-bit code, which uses the legacy 32-bit register view, limited address space, and different calling conventions. This duality matters because it affects how you reason about binaries, how tools interpret instruction encodings, and how the OS sets up execution.
The register file has subtleties that matter in real code. The 64-bit GPRs can be accessed as 32-bit, 16-bit, and 8-bit sub-registers. Writes to the 32-bit sub-registers zero-extend into the full 64-bit register, which is a performance and correctness feature. Writes to 8-bit and 16-bit sub-registers do not zero-extend and can create partial-register dependencies. That can cause performance penalties on some microarchitectures, which is why compilers prefer 32-bit writes when possible. The RFLAGS register contains condition codes and control flags such as the zero flag, carry flag, and direction flag. Understanding which instructions modify which flags is critical when you analyze branches and conditionals. Even if you are not writing assembly, reading disassembly requires you to track how flags and registers evolve.
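The sub-register write rules can be sketched in a few lines of Python. This is a minimal model, not how any real tool does it: `write_sub_register` is a hypothetical helper name, and it only covers the low 8, 16, 32, or 64 bits of a register (high-byte registers such as AH are left out).

```python
MASK64 = (1 << 64) - 1


def write_sub_register(old: int, value: int, width: int) -> int:
    """Return the 64-bit register value after writing its low `width` bits."""
    if width not in (8, 16, 32, 64):
        raise ValueError(f"unsupported width: {width}")
    mask = (1 << width) - 1
    value &= mask
    if width >= 32:
        # A 32-bit write zero-extends into bits 63..32; a 64-bit write
        # replaces the whole register.
        return value
    # 8-bit and 16-bit writes keep every bit above the written field.
    return (old & ~mask & MASK64) | value


rax = 0x1122334455667788
print(hex(write_sub_register(rax, 0xAABBCCDD, 32)))  # 0xaabbccdd
print(hex(write_sub_register(rax, 0xBEEF, 16)))      # 0x112233445566beef
print(hex(write_sub_register(rax, 0x99, 8)))         # 0x1122334455667799
```

The 16-bit and 8-bit results still carry the old upper bits, which is the source of the partial-register dependency described above.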
Execution mode is another layer of architectural state. The CPU can run in real mode, protected mode, or long mode, each with different addressing and privilege semantics. For x86-64 user-mode work, you primarily live in long mode under an OS that manages paging and privilege transitions. The OS configures control registers, enables paging, and establishes the ABI conventions that user code must follow. That is why architecture knowledge is always paired with ABI knowledge; the architecture defines what is possible, the ABI defines what is expected.
Finally, x86-64 is a CISC architecture with variable-length instructions and a complex encoding scheme. Encoding is tied to architectural state because RIP advances by the decoded length of each instruction, and that length depends on correctly interpreting prefixes and operand sizes. When you study architectural state, keep the decoder in mind: it is how the CPU interprets the instruction stream, and its rules are fixed by the vendor manuals. You cannot reason about control flow without understanding how the instruction pointer moves, and you cannot reason about side effects without knowing which registers and flags are architectural and which are microarchitectural.
How this fits in the projects
- Projects 1-4 build tools that show and validate architectural state transitions.
- Projects 5-8 require precise understanding of registers, flags, and execution mode to explain control flow and syscalls.
Definitions & key terms
- Architectural state: The CPU state visible and defined by the ISA.
- Long mode: 64-bit execution mode that enables 64-bit addressing and registers.
- Compatibility submode: 32-bit execution within long mode.
- RIP: The instruction pointer in 64-bit mode.
- RFLAGS: The status and control flags register.
Mental model diagram
+------------------------------+
| Architectural State |
+------------------------------+
| GPRs: RAX..R15 |
| RIP (instruction pointer) |
| RFLAGS (status/control) |
| SIMD regs (XMM/YMM/ZMM) |
| Control regs (CR0/CR3/CR4) |
+--------------+---------------+
|
v
+------------------------------+
| Execution Mode (Long) |
| - 64-bit submode |
| - 32-bit compat submode |
+--------------+---------------+
|
v
+------------------------------+
| Instruction Decoder |
| (interprets byte stream) |
+------------------------------+
How it works
- OS configures control registers to enable long mode and paging.
- CPU fetches instruction bytes at RIP.
- Decoder interprets prefixes and operand sizes based on mode.
- Instruction executes, updating registers and RFLAGS.
- RIP advances by decoded instruction length.
Invariants and failure modes:
- Invariant: RIP always points to the next instruction boundary.
- Failure: incorrect decoding changes RIP and corrupts control flow.
- Invariant: mode determines operand sizes and address sizes.
- Failure: mode confusion leads to misinterpreting data as code.
Minimal concrete example (pseudo-assembly, not real code)
# PSEUDOCODE ONLY
STATE:
REG_A = 5
REG_B = 7
INSTRUCTION STREAM:
LOAD64 REG_TMP, [ADDR_X]
ADD64 REG_A, REG_B
CMP64 REG_A, REG_TMP
JUMP_IF_ZERO LABEL_OK
Common misconceptions
- “x86-64 is totally different from x86.” It is an extension with compatibility.
- “All registers are independent.” Sub-register writes can affect full registers.
- “Flags are only for comparisons.” Many arithmetic instructions update flags.
Check-your-understanding questions
- Why does writing a 32-bit sub-register zero-extend the 64-bit register?
- What does compatibility submode allow in long mode?
- Why is RIP-relative addressing important for position-independent code?
Check-your-understanding answers
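- A 32-bit write then never depends on the register's old upper bits, which avoids partial-register dependencies and produces a zero-extended 64-bit value with no extra instruction.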
- It allows 32-bit code to run on a 64-bit CPU under a 64-bit OS.
- It allows code to reference nearby data without absolute addresses.
Real-world applications
- Reverse engineering compiled binaries
- Debugging crashes and register corruption
- Building profilers and tracers
Where you will apply it: Projects 1, 2, 3, 4, 5, 7
References
- Intel 64 and IA-32 Architectures Software Developer’s Manual (Intel)
- Microsoft x64 architecture documentation
- “The Art of 64-Bit Assembly, Volume 1” by Randall Hyde - Ch. 1-3
Key insights: Architectural state is the smallest truth you can trust when everything else is uncertain.
Summary: You cannot reason about x86-64 without knowing what the CPU state is and how execution mode shapes instruction meaning.
Homework/Exercises to practice the concept
- List all architectural registers you can name and group them by purpose.
- Draw the state transitions of a simple conditional branch.
Solutions to the homework/exercises
- Group into GPRs, RIP, RFLAGS, SIMD, control registers.
- Show state before compare, flags after compare, and RIP change on branch.
Data Representation, Memory, and Addressing
Fundamentals
x86-64 is a byte-addressed, little-endian architecture. Data representation determines how values appear in memory, how loads and stores reconstruct those values, and how alignment affects performance. Memory addressing is not just “base + offset”; it is a rich set of forms including base, index, scale, and displacement, plus RIP-relative addressing in 64-bit mode. These addressing forms are part of the ISA and are a primary tool for compilers. Virtual memory adds another layer: the addresses you see in registers are virtual, translated by page tables configured by the OS. When you write or analyze assembly, you are always navigating both representation and translation. Official architecture references and ABI specifications describe these addressing forms and constraints. (Sources: Intel SDM, Microsoft x64 architecture docs)
Deep Dive
Data representation is the mapping between abstract values and physical bytes. On x86-64, integers are typically two’s complement and stored in little-endian order. That means the least significant byte sits at the lowest memory address. When you inspect memory dumps, the order will appear reversed relative to the human-readable hex. This matters for debugging and binary analysis; it also matters for writing correct parsing and serialization logic.
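A quick way to see the byte order is to round-trip a value through Python's built-in `int.to_bytes` and `int.from_bytes`. The helper name `to_little_endian` is just a label chosen for this sketch.

```python
def to_little_endian(value: int, size: int = 8) -> bytes:
    """Encode an unsigned integer as `size` bytes, least significant first."""
    return value.to_bytes(size, byteorder="little")


value = 0x1122334455667788
raw = to_little_endian(value)
print(raw.hex(" "))  # 88 77 66 55 44 33 22 11

# Decoding with the matching byte order recovers the original value.
print(hex(int.from_bytes(raw, byteorder="little")))  # 0x1122334455667788

# Decoding the same bytes as big-endian gives a different number.
print(hex(int.from_bytes(raw, byteorder="big")))     # 0x8877665544332211
```

The first printed line is what a memory dump of that 64-bit value would show, reading from the lowest address upward.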
Memory addressing is a key differentiator between x86-64 and many simpler ISAs. The architecture supports effective addresses of the form base + index * scale + displacement, where scale can be 1, 2, 4, or 8. This lets the CPU calculate addresses for arrays and structures in a single instruction, which is why compiler output often uses complex addressing instead of explicit multiply or add instructions. In long mode, RIP-relative addressing is widely used for position-independent code; it allows the instruction stream to refer to nearby constants and jump tables without absolute addresses. That is why you will see references relative to RIP rather than absolute pointers in modern binaries.
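The address formula is small enough to write out directly. The sketch below assumes 64-bit wraparound and treats a RIP-relative displacement as relative to the address of the next instruction; the function names and sample addresses are illustrative only.

```python
MASK64 = (1 << 64) - 1


def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index * scale + displacement, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError(f"scale must be 1, 2, 4, or 8, got {scale}")
    return (base + index * scale + disp) & MASK64


def rip_relative(next_rip: int, disp: int) -> int:
    """Compute a RIP-relative target from the next instruction's address."""
    return (next_rip + disp) & MASK64


# Element 3 of an array of 8-byte items, plus a 16-byte field offset.
print(hex(effective_address(base=0x1000, index=3, scale=8, disp=16)))  # 0x1028

# A negative displacement reaches data placed before the instruction.
print(hex(rip_relative(next_rip=0x401007, disp=-7)))  # 0x401000
```

Because only the displacement is stored in the instruction, the same bytes work wherever the code is loaded, which is what makes the form position-independent.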
Virtual memory is the next layer of meaning. The addresses in registers are virtual; they are translated to physical addresses using a page table hierarchy. As a result, the same virtual address in two different processes can map to different physical memory. The OS enforces protection and isolation through page permissions. When you read assembly, you see only virtual addresses; the mapping is invisible unless you consult page tables or OS introspection tools. This is one reason memory corruption bugs can appear non-deterministic: a bad pointer may land on memory that is validly mapped but holds something other than what the code expects.
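To make the per-process mapping concrete, here is a deliberately simplified translation sketch. It assumes 4 KiB pages and flattens the real multi-level page table hierarchy into a single dictionary from virtual page number to physical frame number; the table contents are made up.

```python
PAGE_SHIFT = 12                      # 4 KiB pages (assumption for this sketch)
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(page_table: dict[int, int], vaddr: int) -> int:
    """Translate a virtual address using a flat {page number: frame number} map."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & PAGE_MASK
    if vpn not in page_table:
        raise LookupError(f"page fault at 0x{vaddr:x}")
    return (page_table[vpn] << PAGE_SHIFT) | offset


# Two hypothetical processes map the same virtual page to different frames.
process_a = {0x400: 0x1A2}
process_b = {0x400: 0x7F3}

print(hex(translate(process_a, 0x400123)))  # 0x1a2123
print(hex(translate(process_b, 0x400123)))  # 0x7f3123
```

The page offset passes through unchanged; only the page number is remapped, and an unmapped page raises an error in place of a hardware page fault.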
Alignment is another subtlety. Many instructions perform better when data is aligned to its natural width (for example, 8-byte aligned for 64-bit values). Misaligned loads are supported in x86-64 but can be slower or cause extra microarchitectural work. ABI conventions often require stack alignment to 16 bytes at call boundaries, which ensures that SIMD operations and stack-based data are aligned. This alignment rule is part of the ABI, not just a performance hint.
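Alignment checks reduce to bit masking when the alignment is a power of two. The helpers below are a small sketch with hypothetical names; the stack pointer value is invented for the example.

```python
def _check_power_of_two(alignment: int) -> None:
    if alignment <= 0 or (alignment & (alignment - 1)) != 0:
        raise ValueError(f"alignment must be a power of two, got {alignment}")


def is_aligned(addr: int, alignment: int) -> bool:
    """True if `addr` is a multiple of `alignment`."""
    _check_power_of_two(alignment)
    return (addr & (alignment - 1)) == 0


def align_down(addr: int, alignment: int) -> int:
    """Round `addr` down to the nearest multiple of `alignment`."""
    _check_power_of_two(alignment)
    return addr & ~(alignment - 1)


def align_up(addr: int, alignment: int) -> int:
    """Round `addr` up to the nearest multiple of `alignment`."""
    _check_power_of_two(alignment)
    return (addr + alignment - 1) & ~(alignment - 1)


rsp = 0x7FFC00001238
print(is_aligned(rsp, 16))        # False
print(hex(align_down(rsp, 16)))   # 0x7ffc00001230
print(hex(align_up(0x1001, 8)))   # 0x1008
```

A simulator could use `is_aligned` to flag a call made with a stack pointer that breaks the 16-byte rule.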
Addressing modes also influence instruction encoding. The ModR/M and SIB bytes encode the base, index, scale, and displacement. Some combinations are invalid or have special meaning (for example, certain base/index fields imply RIP-relative addressing or a displacement-only form). Understanding this encoding is critical for building decoders and for interpreting bytes in memory. It is also how you can verify that a disassembler is correct: the addressing mode can be inferred from the encoding and compared to the textual rendering.
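The field layout of the two bytes can be sketched as plain bit slicing. This is a partial illustration only: it ignores REX prefix extension bits, does not decode the special SIB cases (such as "no index" or "no base"), and the strings returned by `describe_modrm` are descriptive labels invented for this example.

```python
def split_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into (mod, reg, rm): bits 7-6, 5-3, 2-0."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def split_sib(byte: int) -> tuple[int, int, int]:
    """Split a SIB byte into (scale factor, index, base)."""
    scale = 1 << ((byte >> 6) & 0b11)  # 2-bit field encodes 1, 2, 4, or 8
    return scale, (byte >> 3) & 0b111, byte & 0b111


def describe_modrm(byte: int) -> str:
    """Name the addressing form selected by a ModR/M byte in 64-bit mode."""
    mod, _reg, rm = split_modrm(byte)
    if mod == 0b11:
        return "register-direct"
    if rm == 0b100:
        return "SIB byte follows"
    if mod == 0b00 and rm == 0b101:
        return "RIP-relative disp32"
    return {0b00: "[reg]", 0b01: "[reg + disp8]", 0b10: "[reg + disp32]"}[mod]


print(describe_modrm(0x05))  # RIP-relative disp32
print(describe_modrm(0x04))  # SIB byte follows
print(describe_modrm(0xC3))  # register-direct
print(split_sib(0xC8))       # (8, 1, 0)
```

Comparing these fields against a disassembler's text for the same bytes is one way to do the cross-check described above.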
Finally, consider how data representation affects control flow and calling conventions. Arguments passed by reference are simply addresses; the ABI does not enforce type. That means assembly must interpret the bytes correctly, or the program will behave incorrectly even if the instruction sequence is “valid.” This is where assembly becomes a discipline: you must know what the bytes mean, and that meaning is not written anywhere except in the ABI and the program’s logic.
How this fits in the projects
- Projects 2-4 are explicitly about effective address calculation and RIP-relative forms.
- Projects 9-10 require precise understanding of data layout and alignment inside ELF/PE sections.
Definitions & key terms
- Little-endian: Least significant byte at lowest address.
- Effective address: The computed address used by a memory instruction.
- RIP-relative: Addressing relative to the instruction pointer.
- Virtual memory: The address space seen by a process, mapped to physical memory.
- Alignment: Address boundary that improves correctness or performance.
Mental model diagram
VALUE -> BYTES -> VIRTUAL ADDRESS -> PAGE TABLE -> PHYSICAL ADDRESS
+---------+ +------------------+
| Value | encode | Byte Sequence |
+---------+ +---------+--------+
|
v
+----------------------+
| Effective Address |
| base + index*scale + |
| displacement |
+----------+-----------+
|
v
+----------------------+
| Virtual Address |
+----------+-----------+
|
v
+----------------------+
| Page Translation |
+----------+-----------+
|
v
+----------------------+
| Physical Address |
+----------------------+
How it works
- Program computes effective address from base/index/scale/disp.
- CPU uses that effective address as a virtual address.
- MMU translates virtual to physical using page tables.
- Data is loaded or stored in little-endian byte order.
Invariants and failure modes:
- Invariant: Effective address is computed before translation.
- Failure: Misinterpreting endianness yields wrong values.
- Invariant: ABI defines alignment at call boundaries.
- Failure: Misalignment can break SIMD assumptions or slow down code.
Minimal concrete example (pseudo-assembly, not real code)
# PSEUDOCODE ONLY
# Compute address of element i in an array of 8-byte elements
EFFECTIVE_ADDRESS = BASE_PTR + INDEX * 8 + OFFSET
LOAD64 REG_X, [EFFECTIVE_ADDRESS]
Common misconceptions
- “x86-64 is big-endian.” It is little-endian.
- “All addresses are physical.” User code uses virtual addresses.
- “Alignment is optional.” It is required by ABI for some operations.
Check-your-understanding questions
- Why does little-endian matter when reading a hexdump?
- What is the difference between effective and virtual address?
- Why do compilers use base+index*scale addressing?
Check-your-understanding answers
- The byte order is reversed relative to human-readable hex.
- Effective is computed by the instruction; virtual is then translated.
- It encodes array indexing in a single instruction.
Real-world applications
- Debugging pointer arithmetic errors
- Building instruction decoders and disassemblers
- Understanding how compilers lay out data
Where you will apply it: Projects 2, 3, 4, 9, 10
References
- Intel 64 and IA-32 Architectures Software Developer’s Manual (Intel)
- Microsoft x64 architecture documentation
- “Computer Systems: A Programmer’s Perspective” by Bryant and O’Hallaron - Ch. 3
Key insights: Memory is not just bytes; it is a layered mapping between representation and address translation.
Summary: Effective addressing and data layout are the glue between values in your head and bytes in memory.
Homework/Exercises to practice the concept
- Convert a 64-bit integer into its little-endian byte sequence.
- Compute effective addresses for an array with different indices.
Solutions to the homework/exercises
- List the bytes from least significant to most significant.
- Use base + index * element_size + offset.
3. Project Specification
3.1 What You Will Build
A CLI simulator that executes a tiny pseudo-instruction set and prints register/flag transitions.
Why this teaches x86-64: You build the mental model of architectural state changes without needing real code.
Included:
- Deterministic CLI output for a fixed input
- Clear mapping between inputs and architectural meaning
- A small test suite with edge cases
Excluded:
- Full compiler or full disassembler coverage
- Production-grade UI or packaging
3.2 Functional Requirements
- Deterministic Output: Same input yields identical output.
- Architecture-Aware: Output references ABI/ISA rules where relevant.
- Validation Mode: Provide a compare mode against a golden output.
3.3 Non-Functional Requirements
- Performance: Fast enough for small inputs and interactive use.
- Reliability: Handles malformed inputs with clear errors.
- Usability: Outputs are readable and documented.
3.4 Example Usage / Output
Your simulator produces a trace that looks like a real CPU state dump, but using a simplified pseudo-ISA.
$ x64sim --program demo.trace
STEP 0
REG_A=0x0000000000000005 REG_B=0x0000000000000007
FLAGS: Z=0 N=0 C=0 O=0
STEP 1
OP=ADD64 REG_A, REG_B
REG_A=0x000000000000000C REG_B=0x0000000000000007
FLAGS: Z=0 N=0 C=0 O=0
STEP 2
OP=CMP64 REG_A, 0x000000000000000C
FLAGS: Z=1 N=0 C=0 O=0
3.5 Data Formats / Schemas / Protocols
- Input format: line-oriented text or hex bytes (documented in README)
- Output format: stable, human-readable report with labeled fields
3.6 Edge Cases
- Empty input or missing fields
- Invalid numeric values or malformed hex
- Inputs that exercise maximum/minimum bounds
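One way to cover these edge cases is a strict numeric parser that fails with a specific message. The sketch below assumes operands are written as `0x`-prefixed hexadecimal, as in the example trace; `parse_u64` and its error wording are hypothetical, not part of any specified input format.

```python
import string

MASK64 = (1 << 64) - 1


def parse_u64(token: str) -> int:
    """Parse a 0x-prefixed hex literal into an unsigned 64-bit integer."""
    text = token.strip()
    if not text:
        raise ValueError("empty numeric field")
    if not text.lower().startswith("0x"):
        raise ValueError(f"expected 0x-prefixed hex, got {token!r}")
    digits = text[2:]
    # Check digits explicitly so inputs like "0x1_0" are rejected too.
    if not digits or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"malformed hex literal: {token!r}")
    value = int(digits, 16)
    if value > MASK64:
        raise ValueError(f"value does not fit in 64 bits: {token!r}")
    return value


print(parse_u64("0x000000000000000C"))     # 12
print(hex(parse_u64("0xFFFFFFFFFFFFFFFF")))  # 0xffffffffffffffff
for bad in ["", "12", "0x", "0xZZ", "0x10000000000000000"]:
    try:
        parse_u64(bad)
    except ValueError as err:
        print("error:", err)
```

Rejecting out-of-range values at parse time keeps the model free of silent truncation, which makes boundary tests easier to reason about.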
3.7 Real World Outcome
This section is your golden reference. Match it exactly.
3.7.1 How to Run (Copy/Paste)
- Build (if needed): `make` or equivalent
- Run: `P01-register-and-flags-simulator` with sample input
- Working directory: project root
3.7.2 Golden Path Demo (Deterministic)
Run with the provided demo input and confirm output matches the transcript.
3.7.3 Exact Terminal Transcript (CLI)
Your simulator produces a trace that looks like a real CPU state dump, but using a simplified pseudo-ISA.
$ x64sim --program demo.trace
STEP 0
REG_A=0x0000000000000005 REG_B=0x0000000000000007
FLAGS: Z=0 N=0 C=0 O=0
STEP 1
OP=ADD64 REG_A, REG_B
REG_A=0x000000000000000C REG_B=0x0000000000000007
FLAGS: Z=0 N=0 C=0 O=0
STEP 2
OP=CMP64 REG_A, 0x000000000000000C
FLAGS: Z=1 N=0 C=0 O=0
4. Solution Architecture
4.1 High-Level Design
INPUT -> PARSER -> MODEL -> RENDERER -> REPORT
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Parser | Turn input into structured records | Strict vs permissive parsing |
| Model | Apply ISA/ABI rules | Deterministic state transitions |
| Renderer | Produce readable output | Stable formatting |
4.3 Data Structures (No Full Code)
- Record: holds one instruction/event with decoded fields
- State: represents register/flag or address state
- Report: list of formatted output lines
4.4 Algorithm Overview
Key Algorithm: Parse and Evaluate
- Parse input into records.
- Apply rules to update state.
- Render the state and summary output.
Complexity Analysis:
- Time: O(n) over input records
- Space: O(n) for report output
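The parse and render ends of the pipeline might look like the sketch below. It assumes a line-oriented text format of the form `OPCODE DST, SRC` with optional `#` comments, which is a guess at one reasonable grammar; `Record`, `parse_line`, and `render_state` are hypothetical names, and the model step in between is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One parsed instruction: an opcode plus its operand strings."""
    opcode: str
    operands: tuple[str, ...]


def parse_line(line: str) -> Record:
    """Parse 'OPCODE OP1, OP2' into a Record; '#' starts a comment."""
    text = line.split("#", 1)[0].strip()
    if not text:
        raise ValueError("empty instruction line")
    opcode, _, rest = text.partition(" ")
    operands = tuple(part.strip() for part in rest.split(",")) if rest.strip() else ()
    if any(not operand for operand in operands):
        raise ValueError(f"missing operand in {line!r}")
    return Record(opcode.upper(), operands)


def render_state(regs: dict[str, int], flags: dict[str, int]) -> list[str]:
    """Format registers and flags as two stable, labeled report lines."""
    reg_line = " ".join(f"{name}=0x{value:016X}" for name, value in regs.items())
    flag_line = "FLAGS: " + " ".join(f"{name}={bit}" for name, bit in flags.items())
    return [reg_line, flag_line]


print(parse_line("ADD64 REG_A, REG_B"))
for line in render_state({"REG_A": 5, "REG_B": 7}, {"Z": 0, "N": 0, "C": 0, "O": 0}):
    print(line)
```

Fixed-width hex and a fixed field order keep the report stable, so a plain text diff against the golden transcript is meaningful.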
5. Implementation Guide
5.1 Development Environment Setup
# Ensure basic tools are installed
# build-essential or clang, plus objdump/readelf if needed
5.2 Project Structure
project-root/
├── src/
│ ├── main.*
│ ├── parser.*
│ └── model.*
├── tests/
│ └── test_cases.*
└── README.md
5.3 The Core Question You’re Answering
How does a single instruction change the CPU state, and how do flags encode that change?
5.4 Concepts You Must Understand First
- Architectural State
- What registers and flags exist?
- Book Reference: “The Art of 64-Bit Assembly, Volume 1” - Ch. 1-3
- Data Representation
- How do integers appear in registers?
- Book Reference: “Computer Systems: A Programmer’s Perspective” - Ch. 3
5.5 Questions to Guide Your Design
- State Modeling
- How will you represent registers and flags in memory?
- How will you update flags deterministically?
- Trace Format
- What trace format will you accept as input?
- How will you render each step for human readability?
5.6 Thinking Exercise
Trace the Flags
Draw a table of REG_A, REG_B, and flags after each pseudo-instruction in a three-step program. Explain why the zero flag changes when it does.
Questions to answer:
- Which operations update flags in your simulator?
- What does a flag mean when no arithmetic occurred?
5.7 The Interview Questions They’ll Ask
- “How do flags influence conditional branches?”
- “Why do some instructions update flags and others do not?”
- “What is the difference between signed and unsigned comparisons at the flag level?”
- “How would you simulate overflow in 64-bit arithmetic?”
- “Why are partial register updates tricky?”
5.8 Hints in Layers
Hint 1: Starting Point Design a small struct that holds registers and flags, and a function that applies one instruction.
Hint 2: Next Level Implement arithmetic as pure functions that return both a value and a flag set.
Hint 3: Technical Details Use a table-driven dispatch: opcode -> handler. Keep handlers small and deterministic.
Hint 4: Tools/Debugging Create a golden trace file and compare output with a diff tool after each change.
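Hints 1 through 3 can be combined into a small sketch: a state container, a pure arithmetic helper that returns both the value and the flags, and a dispatch table. All names here (`State`, `add64`, `HANDLERS`, `step`) are illustrative, only two opcodes are shown, and the choice that `MOV64` leaves flags untouched is an assumption for this pseudo-ISA.

```python
from dataclasses import dataclass, field

MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def add64(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Pure helper: return (wrapped result, flags) for a 64-bit add."""
    full = a + b
    result = full & MASK64
    flags = {
        "Z": int(result == 0),
        "N": int((result & SIGN64) != 0),
        "C": int(full > MASK64),  # unsigned carry out of bit 63
        # Signed overflow: inputs share a sign that the result does not.
        "O": int((~(a ^ b) & (a ^ result) & SIGN64) != 0),
    }
    return result, flags


@dataclass
class State:
    regs: dict[str, int] = field(default_factory=lambda: {"REG_A": 0, "REG_B": 0})
    flags: dict[str, int] = field(default_factory=lambda: dict(Z=0, N=0, C=0, O=0))


def value_of(state: State, operand: str) -> int:
    """Resolve an operand that is either a register name or a literal."""
    if operand in state.regs:
        return state.regs[operand]
    return int(operand, 0) & MASK64


def op_mov64(state: State, dst: str, src: str) -> None:
    state.regs[dst] = value_of(state, src)  # assumption: flags unchanged


def op_add64(state: State, dst: str, src: str) -> None:
    state.regs[dst], state.flags = add64(state.regs[dst], value_of(state, src))


HANDLERS = {"MOV64": op_mov64, "ADD64": op_add64}


def step(state: State, opcode: str, dst: str, src: str) -> None:
    """Apply one instruction by looking up its handler."""
    handler = HANDLERS.get(opcode)
    if handler is None:
        raise ValueError(f"unknown opcode: {opcode}")
    handler(state, dst, src)


state = State(regs={"REG_A": 5, "REG_B": 7})
step(state, "ADD64", "REG_A", "REG_B")
print(f"REG_A=0x{state.regs['REG_A']:016X}", state.flags)
# REG_A=0x000000000000000C {'Z': 0, 'N': 0, 'C': 0, 'O': 0}
```

Keeping `add64` free of state makes it easy to unit test on boundary values before wiring it into the dispatch table.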
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Register model | “The Art of 64-Bit Assembly, Volume 1” | Ch. 1-3 |
| Data representation | “Computer Systems: A Programmer’s Perspective” | Ch. 3 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 days)
Goals:
- Parse input format
- Produce a minimal output
Tasks:
- Define input grammar and example files.
- Implement a minimal parser and renderer.
Checkpoint: Golden output matches a small input.
Phase 2: Core Functionality (1 week)
Goals:
- Implement full rule set
- Add validation and errors
Tasks:
- Implement rule engine for core cases.
- Add error handling for invalid inputs.
Checkpoint: All core tests pass.
Phase 3: Polish & Edge Cases (2-3 days)
Goals:
- Add edge-case coverage
- Improve output readability
Tasks:
- Add edge-case tests.
- Refine output formatting and summary.
Checkpoint: Output matches golden transcript for all cases.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Input format | Text, JSON | Text | Easiest to audit and diff |
| Output format | Plain text, JSON | Plain text | Matches CLI tooling |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate parsing and rule application | Valid/invalid inputs |
| Integration Tests | End-to-end output comparison | Golden transcripts |
| Edge Case Tests | Stress unusual inputs | Empty input, max values |
6.2 Critical Test Cases
- Minimal Input: One record, verify output.
- Boundary Values: Largest/smallest values.
- Malformed Input: Ensure clean error messages.
6.3 Test Data
INPUT: sample_min.txt
EXPECTED: matches golden transcript
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong assumptions | Output mismatches | Re-read ABI/ISA rules |
| Off-by-one parsing | Missing fields | Add explicit length checks |
| Ambiguous output | Hard to verify | Add labels and separators |
Project-specific pitfalls
Problem 1: “Flags look wrong after subtraction”
- Why: Signed vs unsigned overflow rules were mixed.
- Fix: Implement carry and overflow separately.
- Quick test: Run a trace with a known overflow and compare flags.
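A sketch of the fix, following x86-style subtraction semantics: carry is an unsigned borrow, overflow is a signed-range violation, and the two are computed independently. The function name `sub64_flags` is hypothetical.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def sub64_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Return (wrapped result, flags) for the 64-bit subtraction a - b."""
    result = (a - b) & MASK64
    flags = {
        "Z": int(result == 0),
        "N": int((result & SIGN64) != 0),
        # Carry (borrow): the unsigned value of a is smaller than b.
        "C": int(a < b),
        # Overflow: operands differ in sign and the result's sign differs from a.
        "O": int(((a ^ b) & (a ^ result) & SIGN64) != 0),
    }
    return result, flags


# 0 - 1: unsigned borrow, but no signed overflow (0 - 1 = -1 fits).
print(sub64_flags(0, 1)[1])                   # {'Z': 0, 'N': 1, 'C': 1, 'O': 0}

# Most negative value - 1: signed overflow, but no unsigned borrow.
print(sub64_flags(0x8000000000000000, 1)[1])  # {'Z': 0, 'N': 0, 'C': 0, 'O': 1}

# Equal operands, as in the CMP64 step of the demo trace.
print(sub64_flags(0xC, 0xC)[1])               # {'Z': 1, 'N': 0, 'C': 0, 'O': 0}
```

The first two cases make a good quick test because each sets exactly one of the two flags; an implementation that mixes the rules will fail at least one of them.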
7.2 Debugging Strategies
- Golden diffing: Use diff to compare outputs line by line.
- State logging: Print intermediate state after each step.
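Golden diffing can also be done inside the tool with the standard library's `difflib`, which is one possible basis for a compare mode. The function name `diff_against_golden` and the sample strings are made up for this sketch.

```python
import difflib


def diff_against_golden(actual: str, golden: str) -> list[str]:
    """Return unified diff lines; an empty list means the outputs match."""
    return list(
        difflib.unified_diff(
            golden.splitlines(),
            actual.splitlines(),
            fromfile="golden",
            tofile="actual",
            lineterm="",
        )
    )


golden = "STEP 2\nFLAGS: Z=1 N=0 C=0 O=0\n"
actual = "STEP 2\nFLAGS: Z=0 N=0 C=0 O=0\n"

print(diff_against_golden(golden, golden))  # []
for line in diff_against_golden(actual, golden):
    print(line)
```

Returning the diff as a list lets a test assert on emptiness while a CLI compare mode prints the same lines for a human.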
7.3 Performance Traps
- Avoid over-optimizing; correctness and determinism matter most.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a new input case and golden output
- Add a summary line with counts
8.2 Intermediate Extensions
- Add JSON output mode
- Add validation warnings for suspicious inputs
8.3 Advanced Extensions
- Support additional ABI or instruction variants
- Integrate with a real binary to collect inputs
9. Real-World Connections
9.1 Industry Applications
- Profilers and tracers: Use similar decoding and state models.
- Security analysis: Use precise ABI knowledge to interpret crashes.
9.2 Related Open Source Projects
- objdump: reference tool for binary inspection.
- llvm-objdump: LLVM-based disassembly and inspection.
9.3 Interview Relevance
- ABI and calling conventions are common systems interview topics.
- Explaining decoding and linking demonstrates low-level fluency.
10. Resources
10.1 Essential Reading
- Intel 64 and IA-32 Architectures Software Developer’s Manual - ISA reference
- System V AMD64 ABI Draft 0.99.7 - calling convention rules
10.2 Video Resources
- Vendor and university lectures on x86-64 and ABIs (search official channels)