Project 18: C to Assembly Translator (“From C to Assembly”)

Build an educational tool showing how C constructs translate to assembly, with side-by-side comparison of -O0 vs -O2 output, annotated with optimization explanations.


Quick Reference

Attribute        Value
Language         C (with shell scripting for orchestration)
Difficulty       Level 5 (Master)
Time             3 Weeks
Book Reference   CS:APP Chapter 3, Expert C Programming Ch. 8
Coolness         Performance Gold - See what your code becomes
Portfolio Value  Exceptional - Demonstrates deep systems knowledge

Learning Objectives

By completing this project, you will:

  1. Master the compilation pipeline: Understand how C source code transforms through preprocessing, compilation, assembly, and linking into executable machine code

  2. Read and interpret x86-64 assembly: Recognize common patterns for loops, conditionals, function calls, and data access in compiler-generated assembly

  3. Understand optimization transformations: Know how -O0, -O1, -O2, and -O3 change code generation and why each transformation improves performance

  4. Map C constructs to assembly patterns: Predict what assembly the compiler will generate for any C construct

  5. Analyze code generation differences: Compare how GCC and Clang generate different assembly for the same source code

  6. Understand calling conventions: Know how arguments are passed, how return values work, and how the stack frame is managed

  7. Identify performance bottlenecks: Use assembly analysis to find inefficient code patterns and optimize effectively

  8. Build educational tooling: Create tools that help others learn low-level programming concepts


The Core Question You’re Answering

“How does the compiler transform each C construct into machine instructions, and how do optimizations fundamentally change this translation?”

Most programmers treat the compiler as a black box - C code goes in, executables come out. But understanding what happens inside that box is essential for:

  • Performance optimization: Writing C code that compiles to efficient assembly
  • Debugging: Understanding why code behaves unexpectedly at the machine level
  • Security research: Analyzing how vulnerabilities manifest in machine code
  • Systems programming: Writing code that interacts correctly with hardware and OS
  • Interview excellence: Demonstrating deep systems knowledge

When you finish this project, you will be able to look at C code and mentally compile it to assembly. You will understand why certain “obvious” optimizations are automatic while others require explicit code changes.


Theoretical Foundation

The Compilation Pipeline

Understanding what happens between gcc source.c and executable output:

THE C COMPILATION PIPELINE
================================================================================

  source.c                    Preprocessed                  Assembly
 ┌──────────────┐            ┌──────────────┐            ┌──────────────┐
 │ #include     │            │ (includes    │            │ .text        │
 │ #define      │   cpp      │  expanded)   │   cc1      │ .globl main  │
 │              │ ─────────► │              │ ─────────► │ main:        │
 │ int main() { │            │ int main() { │            │   push rbp   │
 │   return 0;  │            │   return 0;  │            │   mov rbp,rsp│
 │ }            │            │ }            │            │   xor eax,eax│
 └──────────────┘            └──────────────┘            │   pop rbp    │
      .c                          .i                     │   ret        │
                                                         └──────────────┘
                                                               .s

                                        │
                                        │  as (assembler)
                                        ▼

                              Object File                  Executable
                             ┌──────────────┐            ┌──────────────┐
                             │ ELF Header   │            │ ELF Header   │
                             │ .text (code) │   ld       │ .text        │
                             │ .data        │ ─────────► │ .data        │
                             │ .symtab      │  (linker)  │ .rodata      │
                             │ .rela.text   │            │ ...          │
                             └──────────────┘            └──────────────┘
                                  .o                          a.out

WHAT EACH STAGE DOES:
─────────────────────────────────────────────────────────────────────────────────
Preprocessor (cpp):
  - Expands #include directives (copies header file contents)
  - Expands #define macros
  - Processes #if/#ifdef conditionals
  - Handles #pragma directives

Compiler (cc1):
  - Parses C into Abstract Syntax Tree (AST)
  - Performs semantic analysis (type checking)
  - Generates Intermediate Representation (IR)
  - Applies optimizations (if enabled)
  - Generates target assembly

Assembler (as):
  - Converts assembly mnemonics to machine code
  - Resolves local labels
  - Generates object file with relocations

Linker (ld):
  - Combines multiple object files
  - Resolves external symbol references
  - Applies relocations
  - Creates executable with proper sections
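
To watch these stages one at a time, a file with one macro and one function is enough. The sketch below is an assumed example: the file name stages.c, the macro SQUARE, the function area, and the extra object main.o are made up for illustration. The commands in the comment use standard GCC driver options (-E, -S, -c).

```c
/* stages.c - hypothetical file name, used only for this illustration.
 *
 * Run each pipeline stage separately (standard GCC driver options):
 *   gcc -E stages.c -o stages.i    preprocess only: SQUARE is expanded
 *   gcc -S stages.i -o stages.s    compile only: C becomes assembly text
 *   gcc -c stages.s -o stages.o    assemble only: relocatable object file
 *   gcc stages.o main.o -o prog    link: main.o is an assumed second object
 *                                  that defines main
 */
#define SQUARE(v) ((v) * (v))

/* After `gcc -E`, the return statement reads: return ((side) * (side)); */
int area(int side) {
    return SQUARE(side);
}
```

Opening stages.i, stages.s, and the output of `objdump -d stages.o` side by side shows the same function in each of the forms from the diagram.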

Compiler Intermediate Representation

Modern compilers don’t translate C directly to assembly. They use intermediate representations:

COMPILER IR AND OPTIMIZATION STAGES
================================================================================

       C Source Code
            │
            ▼
    ┌───────────────┐
    │     Parser    │ ──► Abstract Syntax Tree (AST)
    └───────────────┘
            │
            ▼
    ┌───────────────┐
    │  Semantic     │ ──► Type-checked AST
    │  Analysis     │
    └───────────────┘
            │
            ▼
    ┌───────────────┐
    │   IR Gen      │ ──► High-Level IR (GIMPLE in GCC, LLVM IR in Clang)
    └───────────────┘
            │
            ▼
    ┌───────────────────────────────────────────────────────────────────────┐
    │                     OPTIMIZATION PASSES                               │
    │                                                                       │
    │  -O0 (none):     Skip most optimizations, maximum debuggability       │
    │                                                                       │
    │  -O1 (basic):    Dead code elimination, constant folding,             │
    │                  basic block merging, simple register allocation      │
    │                                                                       │
    │  -O2 (standard): + Inlining, loop optimizations, instruction          │
    │                  scheduling, common subexpression elimination,        │
    │                  strength reduction, tail call optimization           │
    │                                                                       │
    │  -O3 (aggressive): + Vectorization, aggressive inlining,              │
    │                    loop unrolling, function cloning                   │
    │                                                                       │
    │  -Os (size):     Like -O2 but optimizes for code size                 │
    │                                                                       │
    │  -Ofast:         -O3 + unsafe math optimizations                      │
    └───────────────────────────────────────────────────────────────────────┘
            │
            ▼
    ┌───────────────┐
    │  Register     │ ──► Low-Level IR with physical registers
    │  Allocation   │
    └───────────────┘
            │
            ▼
    ┌───────────────┐
    │  Code Gen     │ ──► Target Assembly (x86-64, ARM, etc.)
    └───────────────┘

EXAMPLE: GCC GIMPLE IR for a simple function

C Source:
    int square(int x) {
        return x * x;
    }

GIMPLE (gcc -fdump-tree-gimple):
    square (int x)
    {
      int D.1234;
      D.1234 = x * x;
      return D.1234;
    }

x86-64 Assembly Essentials

Understanding the target assembly language:

x86-64 REGISTER CONVENTIONS (System V AMD64 ABI)
================================================================================

GENERAL PURPOSE REGISTERS (64-bit):
─────────────────────────────────────────────────────────────────────────────────
Register   64-bit   32-bit   16-bit   8-bit    Purpose
─────────────────────────────────────────────────────────────────────────────────
RAX        rax      eax      ax       al       Return value, accumulator
RBX        rbx      ebx      bx       bl       Callee-saved
RCX        rcx      ecx      cx       cl       4th argument, counter
RDX        rdx      edx      dx       dl       3rd argument, I/O
RSI        rsi      esi      si       sil      2nd argument, source index
RDI        rdi      edi      di       dil      1st argument, dest index
RBP        rbp      ebp      bp       bpl      Base pointer (callee-saved)
RSP        rsp      esp      sp       spl      Stack pointer
R8         r8       r8d      r8w      r8b      5th argument
R9         r9       r9d      r9w      r9b      6th argument
R10        r10      r10d     r10w     r10b     Caller-saved temp
R11        r11      r11d     r11w     r11b     Caller-saved temp
R12-R15    r12-r15  r12d-r15d r12w-r15w r12b-r15b Callee-saved

ARGUMENT PASSING ORDER:
  Integer/Pointer: RDI, RSI, RDX, RCX, R8, R9, then stack
  Floating Point:  XMM0-XMM7, then stack
  Return Value:    RAX (integer), XMM0 (float), RDX:RAX (128-bit)

CALLEE-SAVED vs CALLER-SAVED:
  Callee-saved (function must preserve): RBX, RBP, R12-R15
  Caller-saved (function may clobber):   RAX, RCX, RDX, RSI, RDI, R8-R11

COMMON INSTRUCTION PATTERNS:
─────────────────────────────────────────────────────────────────────────────────

Data Movement:
  mov  dst, src        ; dst = src
  lea  dst, [addr]     ; dst = address (Load Effective Address)
  movzx dst, src       ; Move with zero extension
  movsx dst, src       ; Move with sign extension
  push src             ; Push onto stack
  pop  dst             ; Pop from stack

Arithmetic:
  add  dst, src        ; dst += src
  sub  dst, src        ; dst -= src
  imul dst, src        ; dst *= src (signed)
  neg  dst             ; dst = -dst
  inc  dst             ; dst++
  dec  dst             ; dst--
  xor  dst, dst        ; dst = 0 (fast way to zero a register)

Comparisons and Jumps:
  cmp  a, b            ; Set flags based on a - b
  test a, b            ; Set flags based on a & b
  je   label           ; Jump if equal (ZF=1)
  jne  label           ; Jump if not equal (ZF=0)
  jl   label           ; Jump if less (signed)
  jg   label           ; Jump if greater (signed)
  jb   label           ; Jump if below (unsigned)
  ja   label           ; Jump if above (unsigned)
  jmp  label           ; Unconditional jump

Function Calls:
  call func            ; Push return address, jump to func
  ret                  ; Pop return address, jump there
  leave                ; mov rsp, rbp; pop rbp (cleanup frame)
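
A function with more than six integer arguments makes the argument-passing rules above visible in the generated code. This is a minimal sketch with made-up names (sum8, call_sum8); compile it with -O0 and compare against the register table to see the first six values placed in registers and the last two pushed on the stack.

```c
/* Eight integer arguments: under the System V AMD64 ABI the first six
 * arrive in rdi, rsi, rdx, rcx, r8, r9; g and h are passed on the stack. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h) {
    return a + b + c + d + e + f + g + h;   /* result is returned in rax */
}

/* The caller loads the six registers and pushes 7 and 8 before the call.
 * It must also assume rax, rcx, rdx, rsi, rdi, and r8-r11 are clobbered. */
long call_sum8(void) {
    return sum8(1, 2, 3, 4, 5, 6, 7, 8);
}
```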

Optimization Transformations

Key optimizations that change generated code dramatically:

COMMON COMPILER OPTIMIZATIONS
================================================================================

1. CONSTANT FOLDING
───────────────────
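Before:                         After:
    int a = 3;                      int x = 7;
    int x = a + 4;

Assembly change:
  -O0: mov DWORD PTR [rbp-4], 3      ; a = 3
       mov eax, DWORD PTR [rbp-4]
       add eax, 4
       mov DWORD PTR [rbp-8], eax    ; x = a + 4, computed at run time
  -O2: mov eax, 7                    ; x folded at compile time, no stack slot

Note: a literal expression such as 3 + 4 is folded even at -O0. The
difference only appears once the constant flows through a variable.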

2. DEAD CODE ELIMINATION
────────────────────────
Before:                         After:
    int x = 5;                      return 10;
    int y = x + 5;                  // x, y never used
    return 10;

3. COMMON SUBEXPRESSION ELIMINATION (CSE)
─────────────────────────────────────────
Before:                         After:
    int a = b * c + d;              int temp = b * c;
    int e = b * c + f;              int a = temp + d;
                                    int e = temp + f;

4. STRENGTH REDUCTION
─────────────────────
Converts expensive operations to cheaper ones:

Before:                         After:
    x * 2                           x << 1
    x * 8                           x << 3
    x / 4                           x >> 2 (if x unsigned)
    x % 8                           x & 7 (if x unsigned)

Assembly change for x * 4 (x is the first int argument):
  -O0: mov eax, DWORD PTR [rbp-4]    ; Reload x from its stack slot
       sal eax, 2                    ; GCC typically shifts even at -O0
  -O2: lea eax, [0+rdi*4]            ; Scaled address arithmetic, no stack access

5. LOOP INVARIANT CODE MOTION
─────────────────────────────
Before:                         After:
    for (i = 0; i < n; i++) {       int temp = a * b;
        sum += arr[i] * a * b;      for (i = 0; i < n; i++) {
    }                                   sum += arr[i] * temp;
                                    }

6. LOOP UNROLLING
─────────────────
Before:                         After:
    for (i = 0; i < 4; i++) {       sum += arr[0];
        sum += arr[i];              sum += arr[1];
    }                               sum += arr[2];
                                    sum += arr[3];

7. INLINING
───────────
Before:                         After:
    int square(int x) {             // square() call eliminated
        return x * x;               // Code inserted directly:
    }                               y = x * x;
    ...
    y = square(x);

8. TAIL CALL OPTIMIZATION
─────────────────────────
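Before (tail-recursive):        After (iterative):
    int fact(int n, int acc) {      // Recursive call converted to jump
        if (n <= 1) return acc;     // Stack doesn't grow
        return fact(n-1, n*acc);
    }

Note: the call must be the last action in the function. The form
`return n * factorial(n-1)` is not a tail call, because the multiply
runs after the call returns.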

9. REGISTER ALLOCATION
──────────────────────
-O0: Variables live on stack, constant loads/stores
-O2: Variables kept in registers, minimal memory access

Assembly change for sum += arr[i]:
  -O0: mov eax, DWORD PTR [rbp-4]    ; Load sum from stack
       add eax, DWORD PTR [rbp-8]    ; Load arr[i] and add
       mov DWORD PTR [rbp-4], eax    ; Store sum back

  -O2: add eax, DWORD PTR [rdi]      ; Sum stays in eax, arr ptr in rdi
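
The "(if x unsigned)" qualifier in the strength reduction table matters: for signed values a plain shift gives the wrong answer on negative inputs, so the compiler has to add a correction. The sketch below writes that correction out in C. The function names are made up, and it assumes a 32-bit int and an arithmetic right shift for negative values, which C leaves implementation-defined but GCC and Clang provide.

```c
/* Unsigned: x / 4 and x >> 2 are the same operation. */
unsigned div4_unsigned(unsigned x) {
    return x >> 2;
}

/* Signed: C division truncates toward zero, but an arithmetic right shift
 * rounds toward negative infinity. Adding a bias of 3 to negative inputs
 * before shifting makes the two agree.
 * Assumes 32-bit int and arithmetic >> on negative values. */
int div4_signed(int x) {
    int bias = (x >> 31) & 3;   /* 3 when x is negative, 0 otherwise */
    return (x + bias) >> 2;
}
```

For example, -7 / 4 is -1 in C, while -7 >> 2 on its own yields -2; with the bias, (-7 + 3) >> 2 gives -1.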

Why This Matters

Understanding C-to-assembly translation matters for:

REAL-WORLD APPLICATIONS
================================================================================

1. PERFORMANCE ENGINEERING
───────────────────────────
   Problem: "Why is this function slow?"
   Solution: Look at generated assembly to find:
   - Excessive memory traffic (variables not in registers)
   - Missed optimizations (loop not vectorized)
   - Poor instruction scheduling (pipeline stalls)

2. DEBUGGING RELEASE BUILDS
───────────────────────────
   Problem: "Works in debug, crashes in release"
   Reason: Often undefined behavior that the optimizer exploits:
   - Signed overflow (compiler assumes it never happens)
   - Uninitialized variables (value is indeterminate; -O0 stack contents can hide the bug)
   - NULL pointer checks removed after dereference

3. SECURITY ANALYSIS
────────────────────
   Understanding exploits requires knowing:
   - How stack frames are laid out
   - How function calls work (return address location)
   - How bounds checking is implemented (or not)

4. EMBEDDED SYSTEMS
───────────────────
   Constraints require understanding:
   - Code size (which optimizations shrink code)
   - Hardware registers (when volatile is needed to force real loads and stores)
   - Timing-sensitive code (instruction count matters)

5. COMPETITIVE PROGRAMMING
──────────────────────────
   When microseconds matter:
   - Know which code patterns are fast
   - Understand what the compiler will optimize
   - Write code that helps the optimizer

6. TECHNICAL INTERVIEWS
───────────────────────
   Questions like:
   - "What assembly does this C code generate?"
   - "Why might this optimization break this code?"
   - "How would you optimize this at the assembly level?"
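
The signed-overflow case from "Debugging release builds" above is easy to reproduce. In this sketch (function names are made up for illustration), the first function relies on wraparound and has undefined behavior at INT_MAX, so an optimizing compiler is allowed to reduce it to a constant; the second expresses the same intent without overflowing.

```c
#include <limits.h>

/* Undefined behavior when x == INT_MAX (signed overflow). An optimizer may
 * assume overflow never happens and compile this as `return 1`. */
int next_is_larger_ub(int x) {
    return x + 1 > x;
}

/* Well-defined rewrite: compare against the limit instead of overflowing. */
int next_is_larger_safe(int x) {
    return x < INT_MAX;
}
```

Comparing the -O0 and -O2 assembly of next_is_larger_ub is a quick way to see the optimizer exploit the assumption.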

Historical Context

EVOLUTION OF C COMPILATION
================================================================================

1970s - Early C Compilers
────────────────────────────────────────
Dennis Ritchie's original PDP-11 compiler was simple:
- Single-pass compilation
- Minimal optimization
- Close correspondence between C and assembly
- "Portable assembly language" was literal

1980s - Optimization Begins
────────────────────────────────────────
As hardware diversified:
- Register allocation algorithms developed
- Basic block optimizations
- Peephole optimization (local instruction patterns)
- GCC created (1987) - first major open-source optimizing compiler

1990s - Advanced Optimization
────────────────────────────────────────
- SSA (Static Single Assignment) form for better analysis
- Interprocedural optimization
- Profile-guided optimization
- Loop transformations (unrolling, vectorization)

2000s - LLVM Revolution
────────────────────────────────────────
Chris Lattner creates LLVM (2003):
- Modular compiler infrastructure
- Clean IR for analysis
- JIT compilation capability
- Clang frontend (2007)

2010s-Present - Modern Optimizations
────────────────────────────────────────
- Auto-vectorization (SIMD without intrinsics)
- Link-time optimization (LTO)
- Polyhedral model for loop nests
- Machine learning for heuristics

TODAY:
Modern compilers (GCC 14, Clang 18) apply 100+ optimization passes,
transforming your C code in ways the original designers couldn't imagine.

Common Misconceptions

MISCONCEPTIONS ABOUT C AND ASSEMBLY
================================================================================

MYTH 1: "C is just portable assembly"
─────────────────────────────────────
Reality: Modern C is a HIGH-LEVEL language that gets heavily transformed.
- Your C code may have no direct correspondence to the output
- The optimizer may eliminate, reorder, or transform your code completely
- The "as-if" rule means ANY transformation preserving observable behavior is valid

MYTH 2: "Adding 'register' keyword makes variables use registers"
─────────────────────────────────────────────────────────────────
Reality: Modern optimizing compilers ignore 'register' as an allocation hint.
- Compilers do register allocation far better than humans
- The keyword's only remaining effect is to forbid taking the address (&) of the variable
- Was relevant in 1970s, obsolete today

MYTH 3: "Hand-written assembly is always faster"
────────────────────────────────────────────────
Reality: Compilers usually win.
- Compilers know instruction latencies for specific CPUs
- Compilers apply transformations humans miss
- Compilers can do whole-program optimization
- Exception: SIMD intrinsics, crypto, specific hardware

MYTH 4: "Micro-optimizations in C translate to assembly"
─────────────────────────────────────────────────────────
Reality: Many "optimizations" make no difference:
- x++ vs ++x (identical in most contexts)
- for vs while (identical structure)
- Explicit unrolling (compiler does it better)

The compiler sees through these and generates equivalent code.

MYTH 5: "More optimization levels = better"
───────────────────────────────────────────
Reality: -O3 can be WORSE than -O2:
- Aggressive inlining increases code size (cache misses)
- Loop unrolling may hurt for small iteration counts
- Vectorization has overhead for short arrays
- -Ofast may produce incorrect results (unsafe math)

Profile before assuming higher -O is better.
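
Myth 4 is easy to check for yourself. The three functions below (names made up for this sketch) spell the same loop with post-increment, pre-increment, and while. With optimization enabled, GCC and Clang are expected to emit equivalent code for all three, though that is worth confirming on your own compiler rather than taking on trust.

```c
/* Post-increment in a for loop. */
int sum_for_post(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

/* Pre-increment in a for loop: the unused result of i++ vs ++i is dropped. */
int sum_for_pre(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

/* The same loop spelled with while. */
int sum_while(const int *a, int n) {
    int s = 0;
    int i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}
```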

Project Specification

What You Will Build

A C-to-Assembly teaching tool that:

  1. Takes C source code as input
  2. Displays the original C code with syntax highlighting
  3. Shows side-by-side assembly output at different optimization levels
  4. Annotates the assembly with explanations of optimizations applied
  5. Highlights the correspondence between C constructs and assembly patterns

$ ./c2asm examples/loop.c

================================================================================
                         C TO ASSEMBLY TRANSLATOR
================================================================================

=== Original C Code ===
┌──────────────────────────────────────────────────────────────────────────────┐
│ int sum(int *arr, int n) {                                                   │
│     int total = 0;                                                           │
│     for (int i = 0; i < n; i++) {                                            │
│         total += arr[i];                                                     │
│     }                                                                        │
│     return total;                                                            │
│ }                                                                            │
└──────────────────────────────────────────────────────────────────────────────┘

=== Assembly at -O0 (No Optimization) ===
┌──────────────────────────────────────────────────────────────────────────────┐
│ sum:                                                                         │
│     push    rbp                         ; Save caller's frame pointer        │
│     mov     rbp, rsp                    ; Set up our frame                   │
│     mov     QWORD PTR [rbp-24], rdi     ; arr stored on stack               │
│     mov     DWORD PTR [rbp-28], esi     ; n stored on stack                 │
│     mov     DWORD PTR [rbp-4], 0        ; total = 0                         │
│     mov     DWORD PTR [rbp-8], 0        ; i = 0                             │
│ .L2:                                    ; Loop header                        │
│     mov     eax, DWORD PTR [rbp-8]      ; Load i                            │
│     cmp     eax, DWORD PTR [rbp-28]     ; Compare i with n                  │
│     jge     .L3                         ; Exit if i >= n                     │
│     mov     eax, DWORD PTR [rbp-8]      ; Load i again                      │
│     cdqe                                ; Sign-extend to 64-bit             │
│     lea     rdx, [0+rax*4]              ; rdx = i * 4 (byte offset)         │
│     mov     rax, QWORD PTR [rbp-24]     ; Load arr pointer                  │
│     add     rax, rdx                    ; rax = &arr[i]                     │
│     mov     eax, DWORD PTR [rax]        ; Load arr[i]                       │
│     add     DWORD PTR [rbp-4], eax      ; total += arr[i]                   │
│     add     DWORD PTR [rbp-8], 1        ; i++                               │
│     jmp     .L2                         ; Back to loop header               │
│ .L3:                                    ; After loop                         │
│     mov     eax, DWORD PTR [rbp-4]      ; Load total                        │
│     pop     rbp                         ; Restore frame pointer             │
│     ret                                 ; Return total in eax               │
└──────────────────────────────────────────────────────────────────────────────┘

Instruction count in loop body: 12

=== Assembly at -O2 (Optimized) ===
┌──────────────────────────────────────────────────────────────────────────────┐
│ sum:                                                                         │
│     test    esi, esi                    ; Test if n <= 0                     │
│     jle     .L4                         ; Early exit if n <= 0              │
│     lea     rcx, [rdi+rsi*4]            ; rcx = arr + n (end pointer)       │
│     xor     eax, eax                    ; total = 0 (fast zeroing)          │
│ .L3:                                    ; Loop body                          │
│     add     eax, DWORD PTR [rdi]        ; total += *arr                     │
│     add     rdi, 4                      ; arr++ (pointer increment)         │
│     cmp     rdi, rcx                    ; Compare with end pointer          │
│     jne     .L3                         ; Continue if not at end            │
│     ret                                 ; Return total in eax               │
│ .L4:                                    ; n <= 0 case                        │
│     xor     eax, eax                    ; Return 0                          │
│     ret                                                                      │
└──────────────────────────────────────────────────────────────────────────────┘

Instruction count in loop body: 4

=== Optimization Analysis ===
┌──────────────────────────────────────────────────────────────────────────────┐
│ TRANSFORMATIONS APPLIED:                                                     │
│                                                                              │
│ 1. REGISTER ALLOCATION                                                       │
│    - 'total' kept in eax (not on stack)                                     │
│    - 'arr' pointer advanced in rdi                                          │
│    - No stack frame needed (leaf function optimization)                     │
│                                                                              │
│ 2. INDEX TO POINTER CONVERSION                                              │
│    - arr[i] becomes *arr with arr++                                         │
│    - Eliminates index multiplication on each iteration                      │
│    - End pointer calculated once (rcx = arr + n)                            │
│                                                                              │
│ 3. FRAME POINTER ELIMINATION                                                │
│    - No push rbp / mov rbp, rsp                                             │
│    - Saves 2 instructions                                                   │
│                                                                              │
│ 4. INSTRUCTION SELECTION                                                    │
│    - xor eax, eax instead of mov eax, 0 (2 bytes, not 5)                   │
│    - test esi, esi instead of cmp esi, 0 (same, but idiomatic)             │
│                                                                              │
│ PERFORMANCE IMPACT:                                                          │
│    -O0: ~12 instructions per iteration                                      │
│    -O2: ~4 instructions per iteration                                       │
│    Roughly 3x fewer instructions in the loop                                │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
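
The "index to pointer conversion" in the analysis above can be written back as C. This is a sketch of what the -O2 loop computes, not compiler output: sum_indexed mirrors the original source, and sum_pointer (a made-up name) follows the optimized assembly line by line.

```c
/* What the source asks for: index the array on every iteration. */
int sum_indexed(const int *arr, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += arr[i];
    }
    return total;
}

/* Roughly what the -O2 loop does, written back as C. */
int sum_pointer(const int *arr, int n) {
    if (n <= 0) return 0;          /* test esi, esi / jle .L4      */
    const int *end = arr + n;      /* lea rcx, [rdi+rsi*4]         */
    int total = 0;                 /* xor eax, eax                 */
    do {
        total += *arr;             /* add eax, DWORD PTR [rdi]     */
        arr++;                     /* add rdi, 4                   */
    } while (arr != end);          /* cmp rdi, rcx / jne .L3       */
    return total;
}
```

The loop test moves to the bottom, which is why the early n <= 0 check is needed before entering it.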

Functional Requirements

Core Functionality:

  1. C Code Input
    • Accept C source files as command-line arguments
    • Accept C code from stdin (for piping)
    • Support inline C code snippets for quick testing
    • Handle function-level and file-level input
  2. Assembly Generation
    • Generate assembly for multiple optimization levels (-O0, -O1, -O2, -O3, -Os)
    • Support both GCC and Clang
    • Support both AT&T and Intel syntax
    • Preserve debug information for source mapping
  3. Comparison Display
    • Side-by-side display of different optimization levels
    • Syntax highlighting for both C and assembly
    • Line numbering for reference
    • Instruction count comparison
  4. Annotation System
    • Inline comments explaining each assembly instruction
    • Optimization transformation identification
    • Register usage tracking
    • Calling convention annotations
  5. Analysis Features
    • Count instructions (total and per basic block)
    • Identify optimization patterns applied
    • Compare GCC vs Clang output
    • Detect undefined behavior risks

Non-Functional Requirements

  • Performance: Process typical source files in under 1 second
  • Portability: Works on Linux and macOS
  • Usability: Clear, educational output suitable for learning
  • Extensibility: Easy to add new analysis features
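
The instruction-count feature listed under Analysis Features can start from a simple line classifier. The heuristic below is an assumption of this sketch, not a full assembly parser: a line counts as an instruction unless it is blank, starts with '.' (directive or local label), ends with ':' (label), or starts with a comment character. Labels followed by a trailing comment on the same line are not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Counts instruction lines in compiler-generated assembly text using the
 * heuristic described above. */
size_t count_instructions(const char *asm_text) {
    size_t count = 0;
    const char *line = asm_text;

    while (*line != '\0') {
        /* Find the end of the current line. */
        const char *end = strchr(line, '\n');
        if (end == NULL) end = line + strlen(line);

        /* Trim leading and trailing whitespace. */
        const char *first = line;
        while (first < end && isspace((unsigned char)*first)) first++;
        const char *last = end;
        while (last > first && isspace((unsigned char)last[-1])) last--;

        if (first < last) {
            int directive_or_label = (*first == '.') || (last[-1] == ':');
            int comment = (*first == '#') || (*first == ';');
            if (!directive_or_label && !comment) count++;
        }

        /* Step past the newline, or stop at the terminator. */
        line = (*end == '\n') ? end + 1 : end;
    }
    return count;
}
```

Counting per basic block would build on the same loop by resetting the counter at each label.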

Example Usage / Output

Example 1: Simple Arithmetic

$ ./c2asm -c "int square(int x) { return x * x; }"

=== C Code ===
int square(int x) { return x * x; }

=== -O0 ===                              === -O2 ===
square:                                   square:
    push    rbp                               mov     eax, edi
    mov     rbp, rsp                          imul    eax, edi
    mov     DWORD PTR [rbp-4], edi            ret
    mov     eax, DWORD PTR [rbp-4]
    imul    eax, eax
    pop     rbp
    ret

Analysis: O2 eliminates stack operations, computes directly in registers.
          The argument is copied from edi (1st arg) to eax, then multiplied by edi.

Example 2: Conditional

$ ./c2asm examples/abs.c

=== C Code ===
int abs_val(int x) {
    if (x < 0)
        return -x;
    return x;
}

=== -O0 ===                              === -O2 ===
abs_val:                                  abs_val:
    push    rbp                               mov     eax, edi
    mov     rbp, rsp                          neg     eax
    mov     DWORD PTR [rbp-4], edi            cmovs   eax, edi
    cmp     DWORD PTR [rbp-4], 0              ret
    jns     .L2
    neg     DWORD PTR [rbp-4]
    mov     eax, DWORD PTR [rbp-4]
    jmp     .L3
.L2:
    mov     eax, DWORD PTR [rbp-4]
.L3:
    pop     rbp
    ret

Analysis: O2 uses conditional move (cmovs) to avoid branch.
          Branchless code can be faster on modern CPUs when the branch is
          hard to predict (no misprediction penalty).

Example 3: Struct Access

$ ./c2asm examples/struct.c

=== C Code ===
struct Point {
    int x;
    int y;
};

int get_x(struct Point *p) {
    return p->x;
}

int get_y(struct Point *p) {
    return p->y;
}

=== -O2 Assembly ===
get_x:
    mov     eax, DWORD PTR [rdi]       ; rdi = p, x is at offset 0
    ret

get_y:
    mov     eax, DWORD PTR [rdi+4]     ; y is at offset 4
    ret

Analysis: Struct member access is just pointer + offset.
          If an accessor is inlined into its caller, it reduces to this single load.
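
The offsets in the assembly above come straight from the struct layout, which you can inspect with the standard offsetof macro. This sketch repeats the struct from the example; the three helper function names are made up for illustration.

```c
#include <stddef.h>

struct Point {
    int x;
    int y;
};

/* Byte offsets the compiler bakes into the addressing mode:
 * [rdi] for x and [rdi+4] for y when int is 4 bytes. */
size_t offset_of_x(void) { return offsetof(struct Point, x); }
size_t offset_of_y(void) { return offsetof(struct Point, y); }

/* p->y written out as the address arithmetic the load performs. */
int get_y_manual(const struct Point *p) {
    return *(const int *)((const char *)p + offsetof(struct Point, y));
}
```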

Example 4: Function Call

$ ./c2asm examples/call.c

=== C Code ===
int add(int a, int b) {
    return a + b;
}

int compute(int x, int y, int z) {
    return add(x, add(y, z));
}

=== -O0 ===                              === -O2 ===
add:                                      add:
    push    rbp                               lea     eax, [rdi+rsi]
    mov     rbp, rsp                          ret
    mov     DWORD PTR [rbp-4], edi
    mov     DWORD PTR [rbp-8], esi        compute:
    mov     edx, DWORD PTR [rbp-4]            lea     eax, [rdi+rsi]
    mov     eax, DWORD PTR [rbp-8]            add     eax, edx
    add     eax, edx                          ret
    pop     rbp
    ret

compute:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 16
    mov     DWORD PTR [rbp-4], edi
    mov     DWORD PTR [rbp-8], esi
    mov     DWORD PTR [rbp-12], edx
    mov     edx, DWORD PTR [rbp-12]
    mov     eax, DWORD PTR [rbp-8]
    mov     esi, edx
    mov     edi, eax
    call    add
    mov     edx, eax
    mov     eax, DWORD PTR [rbp-4]
    mov     esi, edx
    mov     edi, eax
    call    add
    leave
    ret

Analysis: O2 inlines add() into compute(), eliminating both call instructions.
          lea used for addition (single instruction, no flags affected).

Example 5: Switch Statement

$ ./c2asm examples/switch.c

=== C Code ===
int grade(int score) {
    switch (score / 10) {
        case 10:
        case 9: return 'A';
        case 8: return 'B';
        case 7: return 'C';
        case 6: return 'D';
        default: return 'F';
    }
}

=== -O2 Assembly ===
grade:
    mov     eax, edi
    mov     edx, 1717986919              ; Magic number for division by 10
    imul    edx
    sar     edx, 2
    sar     edi, 31
    sub     edx, edi                     ; edx = score / 10
    cmp     edx, 10
    ja      .L2                          ; Default if > 10 or negative (unsigned compare)
    mov     eax, edx
    jmp     [QWORD PTR .L4[0+rax*8]]     ; Jump table dispatch
.L4:
    .quad   .L2                          ; 0: 'F'
    .quad   .L2                          ; 1: 'F'
    .quad   .L2                          ; 2: 'F'
    .quad   .L2                          ; 3: 'F'
    .quad   .L2                          ; 4: 'F'
    .quad   .L2                          ; 5: 'F'
    .quad   .L9                          ; 6: 'D'
    .quad   .L8                          ; 7: 'C'
    .quad   .L7                          ; 8: 'B'
    .quad   .L6                          ; 9: 'A'
    .quad   .L6                          ; 10: 'A'
.L6:
    mov     eax, 65                      ; 'A'
    ret
; ... other cases ...
.L2:
    mov     eax, 70                      ; 'F'
    ret

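Analysis: Switch compiled to jump table for O(1) dispatch.
          Division by 10 becomes a multiply by 1717986919 = ceil(2^34 / 10)
          followed by shifts (much cheaper than idiv); the final sub
          corrects the rounding for negative scores.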
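
The magic-number division can be checked directly in C. The sketch below mirrors the listing step by step. The name div10_magic is made up for illustration, and the code assumes that right-shifting a negative signed value is an arithmetic shift (true for GCC and Clang on x86-64, but implementation-defined in the C standard).

```c
#include <stdint.h>

/* Divide by 10 the way the -O2 listing does: multiply by the magic
   constant, keep the high bits, then correct for negative inputs.
   Assumes >> on negative signed values is an arithmetic shift. */
int32_t div10_magic(int32_t x) {
    int64_t product = (int64_t)x * 1717986919LL;  /* imul: full 64-bit product */
    int32_t q = (int32_t)(product >> 34);         /* high half (>> 32), then sar 2 */
    return q - (x >> 31);                         /* x >> 31 is -1 for negative x */
}
```

Comparing div10_magic(x) against x / 10 for a range of inputs, including negative ones, is a quick way to convince yourself that the compiler's transformation is exact.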

Real World Outcome

After building this tool, you will be able to:

  1. Look at any C code and predict its assembly: Know before compiling what the output will look like

  2. Optimize code intentionally: Write C that helps the compiler generate better assembly, rather than hoping it figures things out

  3. Debug optimization bugs: When code works at -O0 but breaks at -O2, identify which transformation caused the problem

  4. Understand performance: Know why certain code patterns are fast or slow by seeing the actual instructions

  5. Ace systems interviews: Answer questions like “what assembly does this generate?” confidently


Solution Architecture

High-Level Design

C TO ASSEMBLY TRANSLATOR ARCHITECTURE
================================================================================

  User Input                          Output
      │                                  │
      ▼                                  │
┌───────────────────────────────────────────────────────────────────────────────┐
│                            C2ASM TOOL                                         │
├───────────────────────────────────────────────────────────────────────────────┤
│                                                                               │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐                   │
│  │   INPUT     │      │  COMPILER   │      │   PARSER    │                   │
│  │   PARSER    │ ───► │   DRIVER    │ ───► │  & DIFFER   │                   │
│  │             │      │             │      │             │                   │
│  │ - File      │      │ - Run GCC   │      │ - Parse ASM │                   │
│  │ - Stdin     │      │ - Run Clang │      │ - Extract   │                   │
│  │ - CLI code  │      │ - Multiple  │      │   functions │                   │
│  │             │      │   opt levels│      │ - Compute   │                   │
│  │             │      │             │      │   diff      │                   │
│  └─────────────┘      └─────────────┘      └─────────────┘                   │
│                                                  │                            │
│                                                  ▼                            │
│  ┌───────────────────────────────────────────────────────────────────────┐   │
│  │                         ANALYSIS ENGINE                                │   │
│  │                                                                        │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │   │
│  │  │ Instruction │  │ Optimization│  │  Register   │  │   C/ASM     │   │   │
│  │  │   Counter   │  │  Detector   │  │  Analyzer   │  │   Mapper    │   │   │
│  │  │             │  │             │  │             │  │             │   │   │
│  │  │ Total count │  │ Identify    │  │ Track usage │  │ Correlate   │   │   │
│  │  │ Loop body   │  │ which opts  │  │ Calling     │  │ C lines to  │   │   │
│  │  │ Basic block │  │ were applied│  │ convention  │  │ ASM blocks  │   │   │
│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘   │   │
│  │                                                                        │   │
│  └───────────────────────────────────────────────────────────────────────┘   │
│                                                  │                            │
│                                                  ▼                            │
│  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐                   │
│  │  ANNOTATOR  │ ───► │  FORMATTER  │ ───► │  DISPLAY    │ ─────────────►   │
│  │             │      │             │      │             │                   │
│  │ Add inline  │      │ Syntax      │      │ Terminal    │                   │
│  │ comments    │      │ highlighting│      │ or HTML     │                   │
│  │             │      │ Box drawing │      │ output      │                   │
│  └─────────────┘      └─────────────┘      └─────────────┘                   │
│                                                                               │
└───────────────────────────────────────────────────────────────────────────────┘

Key Components

1. Input Parser

  • Handles different input modes (file, stdin, inline code)
  • Validates C syntax (basic check before compilation)
  • Extracts functions for individual analysis

2. Compiler Driver

  • Wraps GCC and Clang invocations
  • Manages temporary files
  • Captures assembly output with various flags
  • Handles compiler errors gracefully

3. Assembly Parser

  • Tokenizes assembly output
  • Identifies functions, labels, instructions
  • Extracts basic blocks
  • Handles both AT&T and Intel syntax

4. Analysis Engine

  • Counts instructions per function and loop
  • Detects optimization patterns
  • Tracks register allocation
  • Maps C source to assembly blocks

5. Annotator

  • Adds explanatory comments to assembly
  • Describes instruction purpose
  • Notes calling convention details
  • Explains optimization transformations

6. Display/Formatter

  • Creates side-by-side view
  • Syntax highlighting
  • Box drawing for visual clarity
  • Multiple output formats (terminal, HTML)

Data Structures

#include <stdbool.h>   /* bool is used by OptimizationPattern below */

/* Assembly instruction representation */
typedef struct {
    char *label;           /* Label if this line has one (e.g., ".L2:") */
    char *mnemonic;        /* Instruction name (mov, add, jmp, etc.) */
    char *operands[3];     /* Up to 3 operands */
    int operand_count;
    char *original_line;   /* Original text */
    char *comment;         /* Our added annotation */
    int source_line;       /* Corresponding C source line (-1 if unknown) */
} AsmInstruction;

/* Basic block (sequence of instructions ending in control flow) */
typedef struct {
    char *label;
    AsmInstruction *instructions;
    int instruction_count;
    char **successors;     /* Labels this block can jump to */
    int successor_count;
} BasicBlock;

/* Function in assembly */
typedef struct {
    char *name;
    BasicBlock *blocks;
    int block_count;
    int total_instructions;
    int is_leaf;           /* True if function makes no calls */
} AsmFunction;

/* Comparison result */
typedef struct {
    char *c_source;
    AsmFunction *opt_O0;
    AsmFunction *opt_O1;
    AsmFunction *opt_O2;
    AsmFunction *opt_O3;
    char **transformations; /* List of optimizations detected */
    int transformation_count;
} ComparisonResult;

/* Optimization pattern */
typedef struct {
    char *name;                    /* e.g., "Strength Reduction" */
    char *description;
    char *before_pattern;          /* What O0 looks like */
    char *after_pattern;           /* What O2 looks like */
    bool (*detector)(AsmFunction *O0, AsmFunction *O2);
} OptimizationPattern;

Algorithm Overview

Assembly Generation:

1. Create temp file with C source
2. For each optimization level:
   a. Run: gcc -S -o output.s -O{level} -fverbose-asm source.c
   b. Parse resulting assembly file
   c. Store in data structure
3. Clean up temp files
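
Step 2a can be sketched as a small helper that formats the command line for one optimization level. The function name build_compile_command and its parameters are assumptions for illustration, and the sketch assumes the paths contain no spaces or shell metacharacters, since they are not quoted.

```c
#include <stdio.h>

/* Build the command line for one optimization level.
   Returns 0 on success, -1 if the buffer is too small.
   Hypothetical helper: paths are inserted unquoted. */
int build_compile_command(char *buf, size_t size, const char *compiler,
                          int opt_level, const char *source_path,
                          const char *output_path) {
    int n = snprintf(buf, size, "%s -S -O%d -fverbose-asm -o %s %s",
                     compiler, opt_level, output_path, source_path);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Calling this once per level (0 through 3) with a different output path each time gives the driver the four commands it needs to run.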

C-to-ASM Correlation:

1. Compile with: gcc -g -S -fverbose-asm source.c
2. Parse .loc directives in assembly (debug info)
3. Build mapping: C line number -> assembly instruction range
4. Store correlation for display

Optimization Detection:

1. For each known optimization pattern:
   a. Check if O0 has "before" pattern
   b. Check if O2 has "after" pattern
   c. If both, record the transformation
2. Return list of detected optimizations
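
The detection loop above can be sketched with a table of detector functions. To keep the example self-contained it uses simplified stand-in types (MiniFunction, MiniPattern) rather than the full AsmFunction and OptimizationPattern structures; all names here are illustrative assumptions, and the two detectors are heuristics, not proofs that an optimization happened.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for AsmFunction: just the mnemonics, in order. */
typedef struct {
    const char **mnemonics;
    int count;
} MiniFunction;

/* Simplified stand-in for OptimizationPattern. */
typedef struct {
    const char *name;
    bool (*detector)(const MiniFunction *o0, const MiniFunction *o2);
} MiniPattern;

static int count_mnemonic(const MiniFunction *fn, const char *mnemonic) {
    int n = 0;
    for (int i = 0; i < fn->count; i++)
        if (strcmp(fn->mnemonics[i], mnemonic) == 0) n++;
    return n;
}

/* "Before": O0 has calls. "After": O2 has fewer of them. */
static bool fewer_calls(const MiniFunction *o0, const MiniFunction *o2) {
    return count_mnemonic(o0, "call") > count_mnemonic(o2, "call");
}

/* "Before": O0 multiplies. "After": O2 shifts and no longer multiplies. */
static bool multiply_became_shift(const MiniFunction *o0,
                                  const MiniFunction *o2) {
    return count_mnemonic(o0, "imul") > 0 &&
           count_mnemonic(o2, "imul") == 0 &&
           (count_mnemonic(o2, "sal") > 0 || count_mnemonic(o2, "shl") > 0);
}

static const MiniPattern known_patterns[] = {
    {"Possible inlining",  fewer_calls},
    {"Strength reduction", multiply_became_shift},
};

/* Run every detector and store the names of the ones that fire in out[].
   Returns how many were detected (at most max_out). */
int detect_optimizations(const MiniFunction *o0, const MiniFunction *o2,
                         const char **out, int max_out) {
    int found = 0;
    int total = (int)(sizeof known_patterns / sizeof known_patterns[0]);
    for (int i = 0; i < total && found < max_out; i++)
        if (known_patterns[i].detector(o0, o2))
            out[found++] = known_patterns[i].name;
    return found;
}
```

Keeping the patterns in a table means a new optimization only needs a new detector function and one more table row.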

Implementation Guide

Development Environment Setup

Required Tools:

# Compilers
sudo apt install gcc clang          # Linux
brew install gcc llvm               # macOS

# Verify
gcc --version
clang --version

# Useful flags to know:
gcc -S source.c              # Generate assembly
gcc -S -O0 source.c          # No optimization
gcc -S -O2 source.c          # Standard optimization
gcc -S -masm=intel source.c  # Intel syntax
gcc -S -fverbose-asm source.c  # Include C source as comments

# See preprocessing:
gcc -E source.c              # Just preprocess

# See intermediate representation:
gcc -c -fdump-tree-gimple source.c  # GCC GIMPLE IR (writes a *.gimple dump file; -c skips linking)
clang -emit-llvm -S source.c        # LLVM IR

Project Structure

c2asm/
├── Makefile
├── README.md
├── include/
│   ├── c2asm.h              # Main header
│   ├── parser.h             # Input parsing
│   ├── compiler.h           # Compiler driver
│   ├── asm_parser.h         # Assembly parser
│   ├── analysis.h           # Analysis engine
│   ├── annotator.h          # Annotation system
│   ├── display.h            # Output formatting
│   └── patterns.h           # Optimization patterns
├── src/
│   ├── main.c               # Entry point, CLI
│   ├── parser.c             # Input handling
│   ├── compiler.c           # GCC/Clang invocation
│   ├── asm_parser.c         # Assembly tokenization
│   ├── analysis.c           # Instruction counting, etc.
│   ├── annotator.c          # Comment generation
│   ├── display.c            # Terminal output
│   └── patterns.c           # Optimization detection
├── data/
│   └── annotations.txt      # Instruction annotation database
├── examples/
│   ├── loop.c               # Various test cases
│   ├── conditional.c
│   ├── struct.c
│   ├── recursion.c
│   └── advanced.c
└── tests/
    ├── test_parser.c
    ├── test_analysis.c
    └── run_tests.sh

The Core Question You’re Answering

“How does the compiler transform each C construct into machine instructions, and how do optimizations change this?”

This question drives every design decision:

  • We show multiple optimization levels to reveal the transformation
  • We annotate assembly to explain what each instruction does
  • We detect and name optimization patterns
  • We count instructions to quantify the improvement

Concepts You Must Understand First

Before implementing, ensure you understand:

  1. x86-64 calling convention (System V AMD64 ABI)
    • How arguments are passed (RDI, RSI, RDX, RCX, R8, R9, then stack)
    • Return values (RAX for integers, XMM0 for floats)
    • Callee-saved vs caller-saved registers
    • Stack frame layout
  2. Assembly syntax (AT&T vs Intel)
    • AT&T: movl %eax, %ebx (source, dest)
    • Intel: mov ebx, eax (dest, source)
    • Our tool should handle both
  3. Common optimization transformations
    • Constant folding, dead code elimination
    • Strength reduction, loop transformations
    • Inlining, register allocation
  4. GCC/Clang command-line interface
    • How to generate assembly (-S)
    • How to set optimization level (-O0 to -O3)
    • How to include debug info (-g)
    • How to get verbose assembly (-fverbose-asm)

Questions to Guide Your Design

Input Handling:

  • How will you handle functions vs complete programs?
  • Should you support headers (#include)?
  • How will you detect and report C syntax errors?

Compilation:

  • Should you use temp files or pipes?
  • How will you handle compiler not found?
  • Should you cache compiled results?

Assembly Parsing:

  • How will you identify function boundaries?
  • How will you handle labels vs instructions?
  • How will you parse operands (registers, memory, immediates)?

Analysis:

  • How will you identify loops in assembly?
  • How will you detect which optimizations were applied?
  • How will you map C source lines to assembly?

Display:

  • How wide should the terminal output be?
  • How will you handle very long instructions?
  • Should you support HTML output for documentation?

Thinking Exercise

Before writing code, predict the assembly for these C constructs:

Exercise 1: Simple assignment

int x = 42;

At -O0, what happens? What about at -O2 if x is never used?

Exercise 2: Array access

int arr[100];
int y = arr[i];

What calculations are needed for arr[i]?

Exercise 3: Function call

int result = add(a, b);

What happens before, during, and after the call instruction?

Exercise 4: Loop

for (int i = 0; i < 10; i++) sum += i;

What’s the difference between -O0 and -O2?

Hints in Layers

Hint 1: Getting Started - Compiler Wrapper

Start by wrapping the compiler call. Create a function that takes C code and returns assembly:

// compiler.h
typedef struct {
    char *assembly;          // The generated assembly text
    int status;              // 0 = success, non-zero = error
    char *error_message;     // Compiler error output if any
} CompileResult;

CompileResult compile_to_asm(const char *c_code, int opt_level,
                             const char *compiler, const char *syntax);

// Usage:
CompileResult result = compile_to_asm(
    "int add(int a, int b) { return a + b; }",
    2,        // -O2
    "gcc",    // or "clang"
    "intel"   // or "att"
);

Implementation approach:

  1. Write C code to a temp file
  2. Call gcc/clang with appropriate flags
  3. Read the output .s file
  4. Parse any error output
  5. Clean up temp files
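
Step 1 might look like the following sketch, which uses POSIX mkstemp() to create the file safely. The function name write_temp_source and the /tmp/c2asm_ prefix are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write c_code to a fresh temp file. On success returns 0 and leaves the
   file's path in path_out; the caller should unlink() it when done.
   Hypothetical helper: name and path prefix are illustrative. */
int write_temp_source(const char *c_code, char *path_out, size_t path_size) {
    const char *template = "/tmp/c2asm_XXXXXX";
    if (strlen(template) + 1 > path_size) return -1;
    strcpy(path_out, template);

    int fd = mkstemp(path_out);          /* Creates and opens a unique file */
    if (fd == -1) return -1;

    size_t len = strlen(c_code);
    ssize_t written = write(fd, c_code, len);
    close(fd);
    if (written < 0 || (size_t)written != len) {
        unlink(path_out);                /* Do not leave a partial file behind */
        return -1;
    }
    return 0;
}
```

Because the generated name has no .c extension, the compiler has to be told the language explicitly, for example with gcc -x c.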

Hint 2: Assembly Parsing Strategy

Parsing assembly requires handling several cases:

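// Line types you'll encounter:
"func_name:"              // Function label
".L2:"                    // Local label
"    mov    eax, edi"     // Instruction with operands
"    ret"                 // Instruction without operands
"# comment"               // Comment (GNU as uses '#' in both AT&T and Intel syntax)
"; comment"               // Comment (NASM/MASM style, also used in our annotated output)
"    .cfi_startproc"      // Assembler directive (usually indented)
"    .loc 1 5 0"          // Debug location info

// Parsing approach:
AsmInstruction parse_line(const char *line) {
    AsmInstruction instr = {0};

    // Directives and instructions are indented, so skip leading whitespace
    while (*line == ' ' || *line == '\t') line++;

    // Check for label (ends with ':'). This must come before the
    // directive check because local labels such as ".L2:" start with '.'
    if (has_label(line)) {
        instr.label = extract_label(line);
        return instr;
    }

    // Skip blank lines, directives (start with '.'), and comments.
    // The mnemonic stays NULL so callers can tell there is no instruction.
    if (line[0] == '\0' || line[0] == '.' || line[0] == '#' || line[0] == ';') {
        return instr;
    }

    // Parse instruction: tokenize() is expected to return the mnemonic
    // followed by the operands
    char **parts = tokenize(line);
    instr.mnemonic = parts[0];
    // ... parse operands

    return instr;
}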
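
The "parse operands" step is where most of the subtlety sits, because commas can appear inside a memory operand. One possible approach, sketched here with a hypothetical split_operands helper and an assumed 64-byte limit per operand, is to split only at commas that are outside brackets or parentheses:

```c
#include <string.h>

#define MAX_OPERAND_LEN 64   /* Assumed limit, chosen for illustration */

/* Split "eax, DWORD PTR [rbp-8]" into operands at top-level commas.
   Commas inside [...] or (...) belong to a memory operand and are kept.
   Returns the number of operands stored (at most max_operands). */
int split_operands(const char *text, char out[][MAX_OPERAND_LEN],
                   int max_operands) {
    int count = 0, depth = 0;
    char current[MAX_OPERAND_LEN];
    size_t len = 0;

    for (const char *p = text; ; p++) {
        int at_end = (*p == '\0');
        if (at_end || (*p == ',' && depth == 0)) {
            while (len > 0 && current[len - 1] == ' ') len--;  /* Trim right */
            current[len] = '\0';
            if (len > 0 && count < max_operands)
                strcpy(out[count++], current);
            len = 0;
            if (at_end) break;
            continue;
        }
        if (*p == '[' || *p == '(') depth++;
        if (*p == ']' || *p == ')') depth--;
        if (len == 0 && (*p == ' ' || *p == '\t')) continue;   /* Trim left */
        if (len < MAX_OPERAND_LEN - 1) current[len++] = *p;    /* Truncate if long */
    }
    return count;
}
```

The same function handles Intel operands such as DWORD PTR [rbp-8] and AT&T operands such as 8(%rdi,%rsi,4), which keeps the syntax-specific logic out of the tokenizer.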

Hint 3: Instruction Annotation Database

Create a database of common instructions and their meanings:

// annotations.h
typedef struct {
    const char *mnemonic;
    const char *description;
} InstructionInfo;

static const InstructionInfo instruction_db[] = {
    {"mov",    "Move data between locations"},
    {"lea",    "Load effective address (compute address without accessing memory)"},
    {"add",    "Add source to destination"},
    {"sub",    "Subtract source from destination"},
    {"imul",   "Signed multiply"},
    {"xor",    "Bitwise XOR (xor eax,eax = zero register)"},
    {"push",   "Push value onto stack"},
    {"pop",    "Pop value from stack"},
    {"call",   "Call function (push return address, jump)"},
    {"ret",    "Return from function (pop return address, jump)"},
    {"cmp",    "Compare (subtract and set flags, discard result)"},
    {"test",   "Bitwise AND and set flags (discard result)"},
    {"je",     "Jump if equal (ZF=1)"},
    {"jne",    "Jump if not equal (ZF=0)"},
    {"jl",     "Jump if less (signed)"},
    {"jg",     "Jump if greater (signed)"},
    {"jmp",    "Unconditional jump"},
    // ... more
    {NULL, NULL}
};

const char *get_annotation(const char *mnemonic) {
    for (int i = 0; instruction_db[i].mnemonic; i++) {
        if (strcmp(instruction_db[i].mnemonic, mnemonic) == 0) {
            return instruction_db[i].description;
        }
    }
    return "Unknown instruction";
}
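
An exact-match lookup misses AT&T mnemonics, which carry a size suffix (movl, addq). A possible extension, sketched below with a tiny stand-in table (mini_db) and hypothetical function names, tries the mnemonic as written first and only then retries without a trailing b, w, l, or q. Trying the exact form first matters because real mnemonics such as call and imul also end in one of those letters.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *mnemonic;
    const char *description;
} MnemonicInfo;

/* Tiny stand-in for instruction_db, just enough for the example. */
static const MnemonicInfo mini_db[] = {
    {"mov",  "Move data between locations"},
    {"add",  "Add source to destination"},
    {"call", "Call function (push return address, jump)"},
    {"imul", "Signed multiply"},
    {NULL, NULL}
};

static const char *lookup_exact(const char *mnemonic) {
    for (int i = 0; mini_db[i].mnemonic; i++)
        if (strcmp(mini_db[i].mnemonic, mnemonic) == 0)
            return mini_db[i].description;
    return NULL;
}

/* Try the mnemonic as written; if that fails and it ends in an AT&T
   size suffix (b, w, l, q), retry without the suffix.
   Returns NULL if neither form is known. */
const char *lookup_with_suffix(const char *mnemonic) {
    const char *hit = lookup_exact(mnemonic);
    size_t len = strlen(mnemonic);
    if (hit != NULL || len < 2 || len > 15) return hit;
    if (strchr("bwlq", mnemonic[len - 1]) == NULL) return NULL;

    char base[16];
    memcpy(base, mnemonic, len - 1);   /* Copy everything but the suffix */
    base[len - 1] = '\0';
    return lookup_exact(base);
}
```

Returning NULL instead of a fixed "Unknown instruction" string lets the caller decide whether to print a placeholder or omit the annotation.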

Hint 4: Optimization Detection

Detect optimizations by comparing O0 vs O2 output:

// patterns.c

// Pattern: Strength reduction (multiply to shift)
bool detect_strength_reduction(AsmFunction *O0, AsmFunction *O2) {
    bool has_mul_O0 = find_instruction(O0, "imul") != NULL ||
                      find_instruction(O0, "mul") != NULL;
    bool has_shift_O2 = find_instruction(O2, "sal") != NULL ||
                        find_instruction(O2, "shl") != NULL;

    // If O0 has multiply and O2 has shift instead, it's strength reduction
    return has_mul_O0 && has_shift_O2 && !find_instruction(O2, "imul");
}

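// Pattern: Frame pointer elimination
// (C has no overloading, so matching on mnemonic plus operands needs its
// own helper rather than a three-argument call to find_instruction)
bool detect_fp_elimination(AsmFunction *O0, AsmFunction *O2) {
    bool has_frame_O0 = find_instruction_with_operands(O0, "push", "rbp") &&
                        find_sequence(O0, "mov", "rbp, rsp");
    bool has_frame_O2 = find_instruction_with_operands(O2, "push", "rbp");

    return has_frame_O0 && !has_frame_O2;
}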

// Pattern: Inlining
bool detect_inlining(AsmFunction *O0, AsmFunction *O2) {
    int calls_O0 = count_instruction(O0, "call");
    int calls_O2 = count_instruction(O2, "call");

    // Fewer calls suggests inlining, but a tail call compiled to jmp
    // also lowers the count, so treat this as a heuristic
    return calls_O0 > calls_O2;
}

Hint 5: Source-to-Assembly Mapping

Use debug info to correlate C and assembly:

// With -g flag, GCC emits .loc directives:
//   .loc <file> <line> <column>
// before the assembly instructions for that source line.

// Example compiler output with -g -S -fverbose-asm:
// .loc 1 3 5
//     mov     eax, DWORD PTR [rbp-4]  # sum, sum
//     add     eax, DWORD PTR [rbp-8]  # sum, i

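// Parse .loc to build mapping (uses <stdio.h>, <stdlib.h>, <string.h>):
typedef struct {
    int source_line;
    int asm_start_line;    // First assembly line after the .loc
    int asm_end_line;      // Last assembly line before the next .loc
} SourceMapping;

// Fills *mappings (caller frees) with one entry per .loc directive.
void build_source_mapping(const char *asm_text, SourceMapping **mappings,
                          int *count) {
    int asm_line = 0, capacity = 0;
    *mappings = NULL;
    *count = 0;

    for (const char *p = asm_text; *p != '\0'; ) {
        const char *eol = strchr(p, '\n');
        const char *next = eol ? eol + 1 : p + strlen(p);
        asm_line++;

        // Directives are usually indented, so skip leading whitespace
        while (*p == ' ' || *p == '\t') p++;

        // Parse: .loc <file> <line> [<column>] [options...]
        // (named src_line so it does not shadow the text being scanned)
        int file, src_line;
        if (sscanf(p, ".loc %d %d", &file, &src_line) == 2) {
            if (*count > 0)    // Close the previous range
                (*mappings)[*count - 1].asm_end_line = asm_line - 1;
            if (*count == capacity) {
                capacity = capacity ? capacity * 2 : 16;
                SourceMapping *grown =
                    realloc(*mappings, capacity * sizeof **mappings);
                if (grown == NULL) return;
                *mappings = grown;
            }
            (*mappings)[(*count)++] =
                (SourceMapping){src_line, asm_line + 1, asm_line + 1};
        }
        p = next;
    }
    if (*count > 0)
        (*mappings)[*count - 1].asm_end_line = asm_line;
}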

Hint 6: Display Formatting

Create side-by-side display with alignment:

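// display.c

#define COL_WIDTH 40
#define MAX(a, b) ((a) > (b) ? (a) : (b))

char *format_instruction(AsmInstruction *instr);  // Defined below

// AsmFunction stores instructions per basic block, so walk the blocks
// to find the i-th instruction overall (NULL if i is out of range)
static AsmInstruction *nth_instruction(AsmFunction *fn, int i) {
    for (int b = 0; b < fn->block_count; b++) {
        if (i < fn->blocks[b].instruction_count)
            return &fn->blocks[b].instructions[i];
        i -= fn->blocks[b].instruction_count;
    }
    return NULL;
}

void display_side_by_side(AsmFunction *left, AsmFunction *right,
                          const char *left_title, const char *right_title) {
    // Print header
    printf("=== %-*s === %-*s\n", COL_WIDTH-4, left_title,
                                   COL_WIDTH-4, right_title);

    // Print separator
    print_separator(COL_WIDTH * 2);

    // Print instructions
    int max_lines = MAX(left->total_instructions, right->total_instructions);

    for (int i = 0; i < max_lines; i++) {
        AsmInstruction *l = nth_instruction(left, i);
        AsmInstruction *r = nth_instruction(right, i);

        // Left side, padded to the column width. format_instruction()
        // returns a static buffer, so print it before formatting the next one.
        printf("%-*s", COL_WIDTH, l ? format_instruction(l) : "");

        // Right side
        printf("%s\n", r ? format_instruction(r) : "");
    }
}

// Add annotations inline
char *format_instruction(AsmInstruction *instr) {
    static char buf[256];

    if (instr->label) {
        snprintf(buf, sizeof(buf), "%s:", instr->label);
    } else if (instr->mnemonic == NULL) {
        buf[0] = '\0';   // Directive or comment line: nothing to show
    } else {
        const char *annot = get_annotation(instr->mnemonic);
        snprintf(buf, sizeof(buf), "    %-8s %-20s ; %s",
                 instr->mnemonic,
                 join_operands(instr),
                 annot);
    }

    return buf;
}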

The Interview Questions They’ll Ask

After completing this project, you’ll be ready for:

  1. “What assembly does this C code generate?”
    int max(int a, int b) { return a > b ? a : b; }
    
    • At -O0: conditional jump
    • At -O2: conditional move (cmov)
    • Explain why cmov is often faster: it avoids branch mispredictions when the condition is hard to predict (a well-predicted branch can still win)
  2. “Why does -O2 produce different code than -O0?”
    • Register allocation: variables in registers vs stack
    • Instruction selection: lea vs add, xor vs mov 0
    • Inlining: function calls eliminated
    • Loop transformations: unrolling, strength reduction
  3. “This code works at -O0 but crashes at -O2. Why?”
    • Likely undefined behavior that optimizer exploits
    • Examples: signed overflow, uninitialized read, null pointer dereference
    • The compiler assumes UB never happens and optimizes accordingly
  4. “How are function arguments passed on x86-64?”
    • First 6 integer args: RDI, RSI, RDX, RCX, R8, R9
    • Additional args: pushed on stack (right to left)
    • Floating point: XMM0-XMM7
    • Return value: RAX (or RDX:RAX for 128-bit)
  5. “What optimizations can the compiler do automatically?”
    • Almost always (at -O1 and above): constant folding, dead code elimination, basic CSE
    • Often: inlining, loop unrolling, strength reduction
    • Sometimes: vectorization (if the pattern is right)
    • Rarely: algorithmic improvements (an O(n^2) algorithm generally stays O(n^2))
  6. “When should you hand-write assembly instead of letting the compiler optimize?”
    • Almost never for general code
    • Exceptions: cryptographic primitives, SIMD intrinsics, specific hardware
    • Compilers know instruction latencies for specific CPUs
    • Profile before assuming you can beat the compiler

Books That Will Help

| Topic | Book | Chapter/Section |
|---|---|---|
| x86-64 Assembly | CS:APP 3rd Ed | Ch. 3: Machine-Level Representation of Programs |
| Optimization | Expert C Programming | Ch. 8: Why Programmers Can’t Tell Halloween from Christmas Day |
| Calling Conventions | CS:APP 3rd Ed | Sect. 3.7: Procedures |
| Compiler Internals | Engineering a Compiler | Ch. 1: Overview of Compilation |
| Code Generation | Engineering a Compiler | Ch. 11: Instruction Selection |
| Register Allocation | Engineering a Compiler | Ch. 13: Register Allocation |
| Optimization Theory | Compilers (Dragon Book) | Ch. 8-9: Code Generation and Optimization |

Implementation Phases

Phase 1: Basic Compiler Wrapper (Days 1-4)

  • Create temp file handling
  • Implement gcc/clang invocation
  • Capture assembly output
  • Handle errors gracefully
  • Test with simple C programs

Phase 2: Assembly Parser (Days 5-9)

  • Tokenize assembly lines
  • Identify instructions vs labels vs directives
  • Parse operands (registers, memory, immediates)
  • Build function and basic block structures
  • Handle both AT&T and Intel syntax

Phase 3: Side-by-Side Display (Days 10-12)

  • Format output columns
  • Align corresponding code
  • Add instruction count summary
  • Syntax highlighting (optional)
  • Box drawing for visual clarity

Phase 4: Annotation System (Days 13-15)

  • Build instruction database
  • Add inline comments
  • Detect and explain optimization patterns
  • Track register usage
  • Map C source to assembly (using debug info)

Phase 5: Analysis Features (Days 16-18)

  • Count instructions per function
  • Identify loops and count loop body instructions
  • Detect specific optimizations
  • Compare GCC vs Clang output
  • Generate optimization summary

Phase 6: Polish and Extensions (Days 19-21)

  • Add more optimization patterns
  • HTML output option
  • Interactive mode
  • Comprehensive test suite
  • Documentation and examples

Key Implementation Decisions

  1. Temp file vs pipe?
    • Temp files are simpler and more reliable
    • Use mkstemp() for safety
    • Clean up in atexit() handler
  2. How to detect function boundaries?
    • Look for global labels (not starting with .L)
    • Parse .globl directive
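    • End the function at .cfi_endproc, the .size directive, or the next function label (a function can contain several ret instructions, or none if it ends in a tail call)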
  3. How to handle different assembly syntaxes?
    • Default to Intel (more readable)
    • Use -masm=intel for GCC
    • Recent Clang versions also accept -masm=intel; an alternative is -mllvm -x86-asm-syntax=intel
  4. How much annotation is helpful?
    • Basic: instruction meaning only
    • Standard: instruction + context (e.g., “saving caller’s frame pointer”)
    • Verbose: full explanation of why this code is generated
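
For decision 2, the label test could be sketched as follows. The helper name is_function_label is an assumption, and the check is deliberately simple: it also accepts labels of global variables, so a fuller implementation would combine it with the .type name, @function directive that GCC emits on ELF targets.

```c
#include <stdbool.h>
#include <string.h>

/* True if the line is a label that could start a function: its first
   token ends with ':' and it is not a compiler-local label such as
   ".L2:" or ".LC0:". Heuristic only: data labels also match. */
bool is_function_label(const char *line) {
    while (*line == ' ' || *line == '\t') line++;     /* Skip indentation */

    size_t len = strcspn(line, " \t\r\n#");           /* First token only */
    if (len < 2 || line[len - 1] != ':') return false;
    return line[0] != '.';
}
```

Looking only at the first token means a trailing comment on the label line does not confuse the check.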

Testing Strategy

Test Categories

| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test individual components | Parser, compiler driver |
| Integration Tests | Test full pipeline | C code to annotated output |
| Regression Tests | Ensure consistent output | Known C patterns |
| Comparison Tests | Verify GCC vs Clang handling | Same input, different compilers |

Critical Test Cases

Basic Types:

// test_basic.c
int return_int(void) { return 42; }
int add(int a, int b) { return a + b; }
void do_nothing(void) { }

Conditionals:

// test_conditional.c
int max(int a, int b) { return a > b ? a : b; }
int abs_val(int x) { return x < 0 ? -x : x; }

Loops:

// test_loops.c
int sum_array(int *arr, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) total += arr[i];
    return total;
}

int factorial(int n) {
    int result = 1;
    while (n > 1) result *= n--;
    return result;
}

Structs:

// test_struct.c
struct Point { int x, y; };
int get_x(struct Point *p) { return p->x; }
void set_x(struct Point *p, int x) { p->x = x; }

Function Calls:

// test_call.c
int helper(int x) { return x * 2; }
int caller(int y) { return helper(y) + helper(y + 1); }

Switch Statements:

// test_switch.c
int classify(int x) {
    switch (x) {
        case 0: return -1;
        case 1: case 2: return 0;
        case 3: return 1;
        default: return 99;
    }
}

Test Script

#!/bin/bash
# test_c2asm.sh

C2ASM=./c2asm
PASS=0
FAIL=0

test_case() {
    name=$1
    input=$2
    expected_pattern=$3

    output=$($C2ASM -c "$input" 2>&1)

    if echo "$output" | grep -q "$expected_pattern"; then
        echo "PASS: $name"
        ((PASS++))
    else
        echo "FAIL: $name"
        echo "  Input: $input"
        echo "  Expected pattern: $expected_pattern"
        echo "  Output: $output"
        ((FAIL++))
    fi
}

# Test cases
test_case "Return constant" \
    "int f(void) { return 42; }" \
    "mov.*eax.*42"

test_case "Addition" \
    "int add(int a, int b) { return a + b; }" \
    "add"

test_case "Function call" \
    "extern int bar(int); int foo(int x) { return bar(x); }" \
    "call"

test_case "Loop at O2 uses registers" \
    "int sum(int n) { int s=0; for(int i=0;i<n;i++) s+=i; return s; }" \
    "eax"

echo ""
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" -eq 0 ]   # Exit non-zero if any test failed

Common Pitfalls & Debugging

Frequent Mistakes

| Pitfall | Symptom | Solution |
|---|---|---|
| Temp file not cleaned up | /tmp fills up | Use atexit() handler |
| Parsing AT&T when expecting Intel | Wrong operand order | Check compiler flags |
| Missing function boundaries | All code in one function | Look for .globl and non-.L labels |
| Incorrect instruction count | Off-by-one, or directives counted as instructions | Filter directives before counting |
| Broken on macOS | Different assembler output | Handle Mach-O vs ELF differences |
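
One concrete Mach-O difference is that C symbols carry a leading underscore on macOS, so add appears as _add in the assembly. A minimal sketch of undoing that, using a hypothetical helper name, could be:

```c
/* On macOS (Mach-O), C symbols carry a leading underscore: add -> _add.
   Return the name as written in the C source. Apply this only to
   Mach-O output; on ELF a leading underscore is part of the real name. */
const char *c_name_from_macho_symbol(const char *symbol) {
    return (symbol[0] == '_' && symbol[1] != '\0') ? symbol + 1 : symbol;
}
```

Other differences, such as directive names and local label prefixes, need similar per-format handling.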

Debugging Your Tool

/* Add verbose mode for debugging */
#ifdef DEBUG
#define DBG(fmt, ...) fprintf(stderr, "[DEBUG] " fmt "\n", ##__VA_ARGS__)
#else
#define DBG(fmt, ...)
#endif

/* In compiler.c */
int run_compiler(const char *source_file, const char *output_file, int opt_level) {
    char cmd[1024];
    snprintf(cmd, sizeof(cmd), "gcc -S -O%d -masm=intel -o %s %s 2>&1",
             opt_level, output_file, source_file);

    DBG("Running: %s", cmd);

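    FILE *fp = popen(cmd, "r");
    if (fp == NULL) return -1;

    char buf[512];
    while (fgets(buf, sizeof(buf), fp))   // Diagnostics (stderr merged by 2>&1)
        DBG("gcc: %s", buf);

    int status = pclose(fp);              // Wait status of the shell command
    if (status != -1 && WIFEXITED(status))
        status = WEXITSTATUS(status);     // Needs <sys/wait.h>

    DBG("Compiler exit code: %d", status);
    return status;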
}

/* In asm_parser.c */
AsmInstruction parse_line(const char *line) {
    DBG("Parsing line: [%s]", line);

    // ...parsing code...

    DBG("Result: mnemonic=%s, operands=%d",
        instr.mnemonic ? instr.mnemonic : "(null)",
        instr.operand_count);

    return instr;
}

Testing with Known Output

# Generate reference output manually
echo 'int add(int a, int b) { return a + b; }' > /tmp/test.c
gcc -S -O2 -masm=intel -o /tmp/test.s /tmp/test.c
cat /tmp/test.s

# Expected output (GCC 11, x86-64):
#   .file   "test.c"
#   .intel_syntax noprefix
#   .text
#   .globl  add
#   .type   add, @function
# add:
#   .cfi_startproc
#   lea     eax, [rdi+rsi]
#   ret
#   .cfi_endproc

Extensions & Challenges

Beginner Extensions

  • Add color output: Highlight registers, memory, immediates differently
  • Add -v verbose mode: Show all compiler flags and temp files
  • Support reading from stdin: echo "int f() { return 0; }" | ./c2asm
  • Add instruction count summary: Total instructions per optimization level

Intermediate Extensions

  • GCC vs Clang comparison: Side-by-side comparison of both compilers
  • LLVM IR view: Show intermediate representation with -emit-llvm
  • Basic block visualization: Show control flow graph
  • Detect more optimizations: Vectorization, tail calls, etc.
  • Measure compilation time: Show how long each optimization level takes

Advanced Extensions

  • Profile-guided comparison: Compare -O2 vs -O2 with PGO
  • Memory layout visualization: Show how structs map to assembly accesses
  • Interactive mode: REPL for exploring C-to-assembly
  • Web interface: Build a Godbolt-like tool (local version)
  • Disassembly support: Compare source assembly to binary disassembly
  • ARM/RISC-V support: Cross-compile and show different architectures

Real-World Connections

Industry Tools

Compiler Explorer (Godbolt): https://godbolt.org

  • The gold standard for online C-to-assembly exploration
  • Your tool provides similar functionality locally
  • Learn from its interface design

perf annotate: Linux performance tool

  • Shows assembly with performance counters
  • Identifies hot instructions

Hopper/IDA/Ghidra: Disassemblers

  • Work from the other direction: binary to assembly
  • Useful for comparing compiler output to final binary

Use Cases

  1. Performance Debugging
    • “Why is this loop slow?”
    • Look at assembly to find missed optimizations
  2. Security Research
    • Understanding how stack protectors work
    • Analyzing how mitigations are implemented
  3. Compiler Development
    • Testing optimization passes
    • Comparing different compiler versions
  4. Education
    • Teaching computer architecture
    • Demonstrating compilation concepts

Resources

Essential References

| Resource | URL | Description |
|---|---|---|
| Godbolt | https://godbolt.org | Online compiler explorer |
| x86-64 ABI | https://gitlab.com/x86-psABIs/x86-64-ABI | Calling convention spec |
| Intel Manual | https://software.intel.com/sdm | Official instruction reference |
| GCC Options | https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html | Compiler flags |
| LLVM Lang Ref | https://llvm.org/docs/LangRef.html | LLVM IR documentation |

Related Projects

  • P04: Stack Frame Inspector - Understand calling conventions in detail
  • P06: Symbol Table Analyzer - Understand linking and symbols
  • P17: Calling Convention Visualizer - Complement to this project

Self-Assessment Checklist

Understanding Verification

  • I can explain what each compiler stage does (preprocessing, compilation, assembly, linking)
  • I know the x86-64 calling convention for first 6 integer arguments
  • I can identify callee-saved vs caller-saved registers
  • I understand why -O2 produces different code than -O0
  • I can explain at least 5 compiler optimizations

Implementation Verification

  • My tool successfully compiles C code to assembly
  • My tool parses assembly into structured data
  • My tool displays side-by-side comparison of optimization levels
  • My tool adds meaningful annotations to assembly
  • My tool detects at least 3 optimization patterns

Quality Verification

  • The tool handles compiler errors gracefully
  • The tool works with both GCC and Clang
  • Output is clear and educational
  • Test suite passes for all example programs

Growth Verification

  • I can look at C code and mentally predict its assembly
  • I can explain performance differences based on generated assembly
  • I can use this knowledge to write more efficient C code
  • I can answer interview questions about compilation

Submission / Completion Criteria

Minimum Viable Completion

  • Compiles C code to assembly at multiple optimization levels
  • Parses and displays assembly output
  • Shows instruction count comparison
  • Works with simple functions

Full Completion

  • Side-by-side display of O0 vs O2
  • Inline annotations explaining instructions
  • Detects and reports optimization patterns
  • Handles loops, conditionals, function calls, structs
  • Works with both GCC and Clang
  • Source-to-assembly line mapping
  • Comprehensive test suite

Excellence (Going Above & Beyond)

  • Interactive mode
  • HTML output for documentation
  • LLVM IR intermediate view
  • Vectorization analysis
  • Profile-guided optimization comparison
  • Cross-compilation support (ARM, RISC-V)
  • Published as open-source tool

Thinking Exercise

Before writing code, work through these exercises by hand:

Exercise 1: Loop Analysis

Given this C code:

int count_zeros(int *arr, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (arr[i] == 0) count++;
    }
    return count;
}

  1. At -O0, what variables go on the stack?
  2. At -O2, what variables stay in registers?
  3. What optimization converts arr[i] to pointer arithmetic?
  4. How many instructions are in the loop body at -O0 vs -O2?

Exercise 2: Function Inlining

Given:

static int square(int x) { return x * x; }
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

  1. What happens to square() at -O2?
  2. Why is static important here?
  3. What does the final sum_of_squares() assembly look like?

Exercise 3: Undefined Behavior

Given:

int bad_code(int x) {
    if (x + 1 > x) return 1;
    return 0;
}

  1. What does -O0 produce?
  2. What does -O2 produce? Why?
  3. What undefined behavior enables this transformation?

This guide was expanded from EXPERT_C_PROGRAMMING_DEEP_DIVE.md. For the complete learning path, see the project index.