Project 5: The Assembly Level - Disassemble and Stepi

Debug an optimized program at the instruction level and learn what the CPU actually executes.

Quick Reference

Attribute | Value
Difficulty | Advanced
Time Estimate | 3-4 hours
Language | GDB commands / x86-64 assembly
Prerequisites | Project 1, basic registers, calling convention
Key Topics | disassemble, stepi, registers, optimizations

1. Learning Objectives

By completing this project, you will:

  1. Navigate disassembly and correlate it to source code.
  2. Step instruction-by-instruction with stepi and nexti.
  3. Inspect register state and calling convention behavior.
  4. Explain how compiler optimizations change program structure.

2. Theoretical Foundation

2.1 Core Concepts

  • Instruction pointer (RIP): Points to the next instruction to execute.
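  • Calling convention (System V AMD64 ABI, used on Linux and macOS): the first six integer or pointer arguments go in rdi, rsi, rdx, rcx, r8, r9, and an integer result is returned in rax (eax for a 32-bit int).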
  • Optimization effects: Variables may never exist in memory; code can be reordered or removed.

2.2 Why This Matters

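When debugging optimized code, source-level stepping can mislead you: execution may jump backward between lines, skip lines entirely, or report variables as <optimized out>. The disassembly shows exactly which instructions ran.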

2.3 Historical Context / Background

Compiler optimization research matured in the 1970s and 1980s. Modern compilers aggressively transform code for speed, which makes assembly-level debugging essential.

2.4 Common Misconceptions

  • “C lines map 1:1 to assembly”: Often false under optimization.
  • “Variables always exist”: Many are optimized into registers or removed.

3. Project Specification

3.1 What You Will Build

A small C program compiled with -O2 -g. You will step through its instructions, inspect registers, and demonstrate how the optimizer rewrites the code.

3.2 Functional Requirements

  1. Compile with debug symbols and optimizations.
  2. Disassemble main and a helper function.
  3. Step into and through the helper using stepi.
  4. Read register values to confirm computations.

3.3 Non-Functional Requirements

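  • Performance: Keep the program small so the disassembly stays readable.
  • Reliability: Analyze one binary throughout; rebuilding can change addresses and instruction layout.
  • Usability: Use the TUI (layout asm, layout regs) for readability.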

3.4 Example Usage / Output

(gdb) disassemble main
(gdb) layout asm
(gdb) stepi
(gdb) info registers rax rdi rsi

3.5 Real World Outcome

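Assuming calculate computes (a + b) * 2 and main calls calculate(10, 20), you will see assembly and register changes during execution:

$ gcc -g -O2 -o optimized optimized.c
$ gdb ./optimized
(gdb) break calculate
(gdb) run
(gdb) disassemble calculate
Dump of assembler code for function calculate:
=> 0x... <+0>: lea (%rdi,%rsi,1),%eax
   0x... <+3>: add %eax,%eax
   0x... <+5>: ret
End of assembler dump.
(gdb) stepi
(gdb) stepi
(gdb) info registers rax
rax            0x3c  60

The first stepi executes lea (eax = 10 + 20 = 30); the second executes add (eax = 30 + 30 = 60). Writing eax also zeroes the upper 32 bits of rax, so rax reads 0x3c. Exact addresses vary, and toolchains that enable -fcf-protection by default may place an endbr64 instruction at <+0>, shifting the offsets.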

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌───────────────┐     ┌──────────────┐
│ optimized.c  │────▶│ gcc -O2 -g    │────▶│ gdb assembly │
└──────────────┘     └───────────────┘     └──────────────┘

4.2 Key Components

Component | Responsibility | Key Decisions
Optimized binary | Provide realistic assembly | Use -O2
GDB session | Inspect instructions | Use layout asm
Register view | Verify computation | Inspect rax, rdi, rsi

4.3 Data Structures

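/* One row of the instruction trace; unsigned long is 64 bits on x86-64 Linux (LP64). */
struct RegisterSnapshot {
    unsigned long rax;  /* return register (an int result lives in eax, its low half) */
    unsigned long rdi;  /* first integer argument */
    unsigned long rsi;  /* second integer argument */
};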

4.4 Algorithm Overview

Key Algorithm: Instruction trace

  1. Stop at main.
  2. Disassemble functions.
  3. Step instruction-by-instruction.
  4. Read registers after each instruction.

Complexity Analysis:

  • Time: O(I), where I is the number of instructions stepped.
  • Space: O(1).

5. Implementation Guide

5.1 Development Environment Setup

gcc -g -O2 -o optimized optimized.c

5.2 Project Structure

project-root/
├── optimized.c
└── optimized

5.3 The Core Question You’re Answering

“What is the CPU actually executing, and how does it differ from my source code?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Calling convention
    • Argument registers and return register
  2. Basic x86-64 instructions
    • mov, lea, add, call, ret
  3. Optimization levels
    • -O0 vs -O2 vs -O3

5.5 Questions to Guide Your Design

  1. Which register holds the return value?
  2. How do you map an instruction to a C line?
  3. Why might the compiler remove a variable?

5.6 Thinking Exercise

How would you force the compiler to keep a variable in memory so it is visible in GDB?

5.7 The Interview Questions They’ll Ask

  1. What is the calling convention on x86-64?
  2. Why does optimized code make debugging harder?
  3. How do you step a single instruction in GDB?

5.8 Hints in Layers

Hint 1: Use TUI

  • layout asm, layout regs

Hint 2: Mixed view

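  • disassemble /s main (interleaves source lines with instructions; prefer it over the deprecated /m, which handles optimized code poorly)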

Hint 3: Inspect registers

  • info registers rax rdi rsi

5.9 Books That Will Help

Topic | Book | Chapter
Assembly basics | CSAPP | Ch. 3
Optimizations | CSAPP | Ch. 5
GDB assembly features | GDB Manual | “Examining Source Files” (Source and Machine Code)

5.10 Implementation Phases

Phase 1: Foundation (45 minutes)

Goals:

  • Produce an optimized binary.

Tasks:

  1. Compile with -O2 -g.
  2. Disassemble main.

Checkpoint: You can see assembly output.

Phase 2: Core Functionality (60 minutes)

Goals:

  • Step and inspect.

Tasks:

  1. Step through instructions with stepi.
  2. Record register values.

Checkpoint: You can explain how calculate works in assembly.

Phase 3: Polish & Edge Cases (45 minutes)

Goals:

  • Compare with -O0 build.

Tasks:

  1. Build a non-optimized binary.
  2. Compare disassembly and variable visibility.

Checkpoint: You can explain optimization differences.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Optimization | -O0, -O2 | -O2 | Demonstrates real-world issues
View mode | TUI vs plain | TUI | Easier to follow

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Disassembly | Confirm view | disassemble main works
Stepping | Validate instruction stepping | stepi advances RIP
Register read | Confirm state | info registers rax

6.2 Critical Test Cases

  1. disassemble calculate shows the lea, add, and ret instructions.
  2. stepi follows the call into calculate and the ret back into main.
  3. rax holds the expected result (60) after add executes.

6.3 Test Data

calculate(10, 20) -> 60 (lea: 10 + 20 = 30; add: 30 + 30 = 60)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Optimized-out variables | <optimized out> | Inspect registers instead
Confusing source lines | Stepping jumps between lines | Use disassemble /s
Missing symbols | No source mapping | Compile with -g

7.2 Debugging Strategies

  • Track rip: x/10i $rip lists the next 10 instructions, and display/i $pc reprints the current instruction after every step.
  • Compare -O0 and -O2 builds to learn changes.

7.3 Performance Traps

None for this small target. Be aware, though, that stepi traps into the debugger after every instruction, so single-stepping long-running code is far slower than letting it run.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Use nexti to skip over function calls.

8.2 Intermediate Extensions

  • Debug a loop unrolled by the compiler.

8.3 Advanced Extensions

  • Inspect SIMD instructions and vector registers.

9. Real-World Connections

9.1 Industry Applications

  • Performance debugging: Identify compiler reordering and optimization effects.
  • Security: Understand how binary-level behavior differs from source.

9.2 Related Tools

  • Compiler Explorer: Compare source and assembly side by side.
  • GDB: The GNU Project debugger.

9.3 Interview Relevance

  • Demonstrates low-level understanding of how code executes.

10. Resources

10.1 Essential Reading

  • CSAPP - Machine-level representation and optimizations.
  • Intel SDM - Instruction reference.

10.2 Video Resources

  • Search: “gdb assembly stepi”.

10.3 Tools & Documentation

  • GDB: https://sourceware.org/gdb/
  • objdump: Offline disassembly without running the program (objdump -d, or objdump -S to interleave source when built with -g).

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how call pushes the return address that ret later pops into rip.
  • I can map an instruction to a C statement.

11.2 Implementation

  • I stepped through an optimized function.
  • I verified results in registers.

11.3 Growth

  • I can debug code even when source-level stepping is misleading.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Disassemble and step through calculate.

Full Completion:

  • Explain register values at each instruction.

Excellence (Going Above & Beyond):

  • Compare instruction sequences for -O0 and -O2 builds.

This guide was generated from LEARN_GDB_DEEP_DIVE.md. For the complete learning path, see the parent directory README.