Project 9: Write Programs in Assembly (x86-64 or ARM)

Write real assembly programs: a number printer, string reversal, bubble sort, and finally a simple shell or text editor. No C, no libraries, just raw syscalls.

Project Overview

| Attribute | Details |
|-----------|---------|
| Difficulty | Advanced |
| Time Estimate | 2-4 weeks (for multiple programs) |
| Primary Language | Assembly (x86-64 or ARM64) |
| Alternative Languages | x86-64, ARM64, RISC-V |
| Knowledge Area | Assembly Programming / Low-level Coding |
| Tools Required | NASM or GAS (assembler), ld (linker), GDB/LLDB |
| Primary Reference | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron |

1. Learning Objectives

By completing this project, you will be able to:

  1. Write complete programs in assembly without any C runtime or standard library
  2. Make direct system calls using the Linux syscall interface
  3. Manage registers correctly following calling conventions (caller-saved vs callee-saved)
  4. Implement control flow using conditional and unconditional branches
  5. Build and manipulate the stack for local variables and function calls
  6. Process strings and arrays at the byte level
  7. Read compiler output fluently and understand what your C code becomes
  8. Debug at the instruction level using GDB or LLDB

2. Deep Theoretical Foundation

2.1 Why Assembly? Why Now?

Every program you’ve ever written compiles down to assembly. Understanding assembly isn’t just an academic exercise; it’s the key to:

  • Debugging impossible bugs: When the debugger shows you disassembly, you need to read it
  • Writing secure code: Buffer overflows, ROP chains, and exploits happen at the assembly level
  • Performance optimization: Knowing what the CPU actually executes helps you write faster code
  • Reverse engineering: Understanding malware, proprietary protocols, or undocumented APIs
  • Embedded systems: Writing bootloaders, device drivers, and bare-metal code

2.2 The x86-64 Register Set

The x86-64 architecture provides 16 general-purpose 64-bit registers:

x86-64 General-Purpose Registers:
+-------------------------------------------------------------------------+
|  64-bit   |  32-bit  |  16-bit  |   8-bit High  |   8-bit Low           |
+-----------+----------+----------+---------------+-----------------------+
|   RAX     |   EAX    |    AX    |      AH       |      AL               |
|   RBX     |   EBX    |    BX    |      BH       |      BL               |
|   RCX     |   ECX    |    CX    |      CH       |      CL               |
|   RDX     |   EDX    |    DX    |      DH       |      DL               |
|   RSI     |   ESI    |    SI    |      -        |      SIL              |
|   RDI     |   EDI    |    DI    |      -        |      DIL              |
|   RBP     |   EBP    |    BP    |      -        |      BPL              |
|   RSP     |   ESP    |    SP    |      -        |      SPL              |
|   R8      |   R8D    |    R8W   |      -        |      R8B              |
|   R9      |   R9D    |    R9W   |      -        |      R9B              |
|   R10     |   R10D   |   R10W   |      -        |      R10B             |
|   R11     |   R11D   |   R11W   |      -        |      R11B             |
|   R12     |   R12D   |   R12W   |      -        |      R12B             |
|   R13     |   R13D   |   R13W   |      -        |      R13B             |
|   R14     |   R14D   |   R14W   |      -        |      R14B             |
|   R15     |   R15D   |   R15W   |      -        |      R15B             |
+-----------+----------+----------+---------------+-----------------------+

Special Registers:
RIP  - Instruction Pointer (program counter)
RFLAGS - Status flags (zero, carry, sign, overflow, etc.)

Historical Naming: The original names (AX, BX, CX, DX) come from the 16-bit 8086:

  • AX: Accumulator (math operations)
  • BX: Base (base pointer for memory)
  • CX: Counter (loop counter)
  • DX: Data (I/O operations)

Modern code uses all registers fairly interchangeably, but the names persist.
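The sub-register columns in the table are views onto the same storage, and the rules for partial writes are easy to get wrong. The following is a minimal sketch in NASM syntax, assuming Linux x86-64. It relies on two architectural rules: a write to a 32-bit name zero-extends into the full 64-bit register, while writes to the 8-bit and 16-bit names leave the other bits untouched. The constants are arbitrary illustration values.

```nasm
; Sketch: how sub-register writes affect the full 64-bit register.
section .text
global _start
_start:
    mov rax, 0x1122334455667788
    mov al, 0xFF        ; RAX = 0x11223344556677FF (only bits 0-7 change)
    mov ah, 0x00        ; RAX = 0x11223344556600FF (only bits 8-15 change)
    mov ax, 0x1234      ; RAX = 0x1122334455661234 (only bits 0-15 change)
    mov eax, 0x1        ; RAX = 0x0000000000000001 (upper 32 bits zeroed)

    mov edi, eax        ; exit status = 1
    mov eax, 60         ; exit
    syscall
```

Single-stepping this in GDB with stepi and printing the register with p/x $rax after each instruction is a quick way to confirm the values in the comments.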

2.3 The System V AMD64 Calling Convention

When calling functions, registers have specific roles. (System calls use a slightly different register assignment, covered in Section 2.4.)

System V AMD64 ABI Calling Convention:
+------------------------------------------------------------------------+
|                         FUNCTION ARGUMENTS                             |
+------------------------------------------------------------------------+
|  Argument #  |  Integer/Pointer  |  Floating Point  |  Stack Order     |
|--------------|-------------------|------------------|------------------|
|      1       |       RDI         |      XMM0        |       -          |
|      2       |       RSI         |      XMM1        |       -          |
|      3       |       RDX         |      XMM2        |       -          |
|      4       |       RCX         |      XMM3        |       -          |
|      5       |       R8          |      XMM4        |       -          |
|      6       |       R9          |      XMM5        |       -          |
|     7+       |      (stack)      |      XMM6-7      |   Right to left  |
+------------------------------------------------------------------------+

+------------------------------------------------------------------------+
|                         RETURN VALUES                                  |
+------------------------------------------------------------------------+
|  RAX  - Integer/pointer return value (up to 64 bits)                   |
|  RDX  - Second return value (for 128-bit returns)                      |
|  XMM0 - Floating point return value                                    |
+------------------------------------------------------------------------+

+------------------------------------------------------------------------+
|                     REGISTER PRESERVATION                              |
+------------------------------------------------------------------------+
|  Caller-Saved (scratch)  |  Callee-Saved (preserved)                   |
|--------------------------|---------------------------------------------|
|  RAX, RCX, RDX           |  RBX, RBP, R12, R13, R14, R15               |
|  RSI, RDI                |  RSP (must be restored)                     |
|  R8, R9, R10, R11        |                                              |
+------------------------------------------------------------------------+
|  Caller-saved: Function MAY destroy these                              |
|  Callee-saved: Function MUST preserve these (push/pop)                 |
+------------------------------------------------------------------------+

Key insight: When you call a function, assume RAX, RCX, RDX, RSI, RDI, R8-R11 are destroyed. If you need values in those registers after the call, save them first.
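To make the preservation rules concrete, here is a small sketch in NASM syntax, assuming Linux x86-64. The helper add_twice is a hypothetical function invented for this illustration. It uses RBX internally, so it saves and restores it; the caller, in turn, keeps a value in RBX precisely because it is callee-saved.

```nasm
; Sketch: a callee that uses RBX must save and restore it.
section .text
global _start

; add_twice(rdi = a, rsi = b) -> rax = a + 2*b   (hypothetical helper)
add_twice:
    push rbx                ; RBX is callee-saved: preserve the caller's value
    mov rbx, rsi            ; now free to use RBX as scratch
    lea rax, [rdi + rbx*2]  ; RAX = a + 2*b
    pop rbx                 ; restore before returning
    ret

_start:
    mov rbx, 7              ; value we rely on surviving the call
    mov rdi, 10             ; argument 1
    mov rsi, 5              ; argument 2
    call add_twice          ; RAX = 10 + 2*5 = 20; RDI/RSI may be clobbered
    add rax, rbx            ; RBX is still 7, so RAX = 27

    mov rdi, rax            ; exit status = 27
    mov eax, 60             ; exit
    syscall
```

Had the caller kept its value in RCX or R8 instead, it would need to push it before the call and pop it afterwards, since the callee is allowed to overwrite those.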

2.4 Linux System Calls

System calls are how user programs request services from the kernel. On x86-64 Linux:

Linux System Call Interface (x86-64):
+------------------------------------------------------------------------+
|                         SYSCALL SETUP                                  |
+------------------------------------------------------------------------+
|  RAX  - System call number                                             |
|  RDI  - First argument                                                 |
|  RSI  - Second argument                                                |
|  RDX  - Third argument                                                 |
|  R10  - Fourth argument  (note: NOT RCX!)                              |
|  R8   - Fifth argument                                                 |
|  R9   - Sixth argument                                                 |
+------------------------------------------------------------------------+

+------------------------------------------------------------------------+
|                      SYSCALL EXECUTION                                 |
+------------------------------------------------------------------------+
|  SYSCALL instruction triggers kernel mode transition                   |
|  Kernel executes requested service                                     |
|  RAX contains return value (or negative error code)                    |
|  RCX and R11 are destroyed by SYSCALL instruction                      |
+------------------------------------------------------------------------+

Common System Calls:
+--------+------------------+---------------------------------------+
| Number |      Name        |           Arguments                   |
+--------+------------------+---------------------------------------+
|   0    |   read           | fd, buf, count                        |
|   1    |   write          | fd, buf, count                        |
|   2    |   open           | pathname, flags, mode                 |
|   3    |   close          | fd                                    |
|  57    |   fork           | (none)                                |
|  59    |   execve         | pathname, argv, envp                  |
|  60    |   exit           | status                                |
|  61    |   wait4          | pid, status, options, rusage          |
+--------+------------------+---------------------------------------+
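The register setup and the negative-error convention above can be sketched as a complete program. This is a minimal illustration in NASM syntax, not a reference implementation; the label names and the message text are assumptions. It calls write, then checks the sign of RAX: a non-negative value is the number of bytes written, and a negative value is a negated errno.

```nasm
; Sketch: write() with an error check. Labels are illustrative.
section .data
msg:    db "syscall ok", 10
msglen  equ $ - msg         ; length computed at assembly time

section .text
global _start
_start:
    mov eax, 1              ; syscall number: write
    mov edi, 1              ; fd 1 = stdout
    lea rsi, [rel msg]      ; buffer address
    mov edx, msglen         ; byte count
    syscall                 ; clobbers RCX and R11; result in RAX

    xor edi, edi            ; assume success: exit status 0
    test rax, rax
    jns .done               ; non-negative = bytes written
    neg rax                 ; negative = -errno, so negate to get errno
    mov edi, eax            ; use errno as the exit status
.done:
    mov eax, 60             ; syscall number: exit
    syscall
```

Running it with standard output closed (for example ./prog >&- in a POSIX shell) should take the error path, and echo $? then shows the errno value instead of 0.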

2.5 The Stack

The stack is fundamental to assembly programming. It grows downward (toward lower addresses) and holds return addresses, saved registers, and local variables:

Stack Frame Layout:
+------------------------------------------------------------------------+
|  High addresses                                                        |
|                                                                        |
|  +------------------------------------------------------------------+  |
|  |  Caller's stack frame                                            |  |
|  +------------------------------------------------------------------+  |
|  |  Return address (pushed by CALL)                          8 bytes|  |
|  +------------------------------------------------------------------+  |
|  |  Saved RBP (if using frame pointer)                       8 bytes|  | <-- RBP
|  +------------------------------------------------------------------+  |
|  |  Local variable 1                                                |  |
|  +------------------------------------------------------------------+  |
|  |  Local variable 2                                                |  |
|  +------------------------------------------------------------------+  |
|  |  ...                                                             |  |
|  +------------------------------------------------------------------+  |
|  |  Saved callee-saved registers (RBX, R12-R15 if used)             |  |
|  +------------------------------------------------------------------+  |
|  |  Red zone (128 bytes below RSP - can use without adjusting RSP)  |  | <-- RSP
|  +------------------------------------------------------------------+  |
|                                                                        |
|  Low addresses (stack grows this direction)                            |
+------------------------------------------------------------------------+

Stack Operations:
+----------------------------------------+
|  PUSH RAX   |  SUB RSP, 8              |
|             |  MOV [RSP], RAX          |
+----------------------------------------+
|  POP  RAX   |  MOV RAX, [RSP]          |
|             |  ADD RSP, 8              |
+----------------------------------------+

Red Zone: On x86-64 Linux, leaf functions (those that don’t call other functions) can use 128 bytes below RSP without adjusting RSP. This optimization avoids stack pointer manipulation for simple functions.
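The frame layout above can be sketched as a function with a classic prologue and epilogue. This is an illustrative NASM example, assuming Linux x86-64; sum_squares is a hypothetical helper, and a real compiler would likely keep these values in registers rather than spill them to the stack.

```nasm
; Sketch: a frame-pointer based function with two 8-byte locals.
section .text
global _start

; sum_squares(rdi = a, rsi = b) -> rax = a*a + b*b   (hypothetical helper)
sum_squares:
    push rbp                ; save caller's frame pointer
    mov rbp, rsp            ; establish our frame
    sub rsp, 16             ; reserve two locals; RSP stays 16-byte aligned

    mov rax, rdi
    imul rax, rdi
    mov [rbp - 8], rax      ; local 1 = a*a
    mov rax, rsi
    imul rax, rsi
    mov [rbp - 16], rax     ; local 2 = b*b

    mov rax, [rbp - 8]
    add rax, [rbp - 16]     ; return a*a + b*b

    mov rsp, rbp            ; release locals
    pop rbp                 ; restore caller's frame pointer
    ret                     ; pop return address into RIP

_start:
    mov edi, 3
    mov esi, 4
    call sum_squares        ; RAX = 9 + 16 = 25
    mov edi, eax            ; exit status = 25
    mov eax, 60             ; exit
    syscall
```

Because sum_squares is a leaf function, it could instead skip the sub rsp, 16 and store its locals at [rsp - 8] and [rsp - 16], inside the red zone.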

2.6 Memory Addressing Modes

x86-64 provides flexible ways to address memory:

Addressing Mode Syntax:
+----------------------------------------------------------------------------+
|  Mode              |  Syntax                   |  Effective Address        |
|--------------------+---------------------------+---------------------------|
|  Immediate         |  MOV RAX, 42              |  RAX = 42                 |
|  Register          |  MOV RAX, RBX             |  RAX = RBX                |
|  Direct            |  MOV RAX, [0x1000]        |  RAX = mem[0x1000]        |
|  Register Indirect |  MOV RAX, [RBX]           |  RAX = mem[RBX]           |
|  Base + Offset     |  MOV RAX, [RBX+8]         |  RAX = mem[RBX+8]         |
|  Base + Index      |  MOV RAX, [RBX+RCX]       |  RAX = mem[RBX+RCX]       |
|  Scaled Index      |  MOV RAX, [RBX+RCX*8]     |  RAX = mem[RBX+RCX*8]     |
|  Full Form         |  MOV RAX, [RBX+RCX*8+16]  |  RAX = mem[RBX+RCX*8+16]  |
+----------------------------------------------------------------------------+

Scale factors: 1, 2, 4, or 8 (matching data type sizes)

Array Access Example:
int64_t array[100];      // C
// array[i] in assembly:
// MOV RAX, [RBX + RCX*8]   ; RBX = base, RCX = index, 8 = sizeof(int64_t)
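As a runnable version of the array access pattern, the sketch below sums a small array of 64-bit integers using scaled-index addressing. It is written in NASM syntax for Linux x86-64; the array contents and label names are assumptions chosen so the result fits in an exit status.

```nasm
; Sketch: summing a qword array with [base + index*8].
section .data
values: dq 1, 2, 3, 4
count   equ ($ - values) / 8    ; number of elements, computed by the assembler

section .text
global _start
_start:
    lea rbx, [rel values]       ; RBX = base address
    xor ecx, ecx                ; RCX = index i = 0
    xor eax, eax                ; RAX = running sum
.next:
    add rax, [rbx + rcx*8]      ; sum += values[i]
    inc rcx                     ; i++
    cmp rcx, count
    jb .next                    ; loop while i < count (unsigned)

    mov edi, eax                ; exit status = 1 + 2 + 3 + 4 = 10
    mov eax, 60                 ; exit
    syscall
```

The index in RCX counts elements, not bytes; the *8 scale in the addressing mode does the multiplication by the element size.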

2.7 Flags Register

The RFLAGS register contains condition codes set by arithmetic and comparison operations:

Important Flags:
+------------------------------------------------------------------------+
|  Flag  |  Name       |  Set When                                       |
|--------+-------------+-------------------------------------------------|
|   ZF   |  Zero       |  Result is zero                                 |
|   SF   |  Sign       |  Result is negative (high bit set)              |
|   CF   |  Carry      |  Unsigned overflow/underflow                    |
|   OF   |  Overflow   |  Signed overflow                                |
+------------------------------------------------------------------------+

Conditional Jumps:
+------------------------------------------------------------------------+
|  Instruction  |  Condition           |  Common Use                     |
|---------------+----------------------+---------------------------------|
|  JE / JZ      |  ZF=1                |  Jump if equal (after CMP)      |
|  JNE / JNZ    |  ZF=0                |  Jump if not equal              |
|  JL / JNGE    |  SF!=OF              |  Jump if less (signed)          |
|  JG / JNLE    |  ZF=0 and SF=OF      |  Jump if greater (signed)       |
|  JB / JNAE    |  CF=1                |  Jump if below (unsigned)       |
|  JA / JNBE    |  CF=0 and ZF=0       |  Jump if above (unsigned)       |
|  JS           |  SF=1                |  Jump if sign (negative)        |
|  JO           |  OF=1                |  Jump if overflow               |
+------------------------------------------------------------------------+
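The difference between the signed and unsigned jumps is easiest to see when the same comparison gives opposite answers. The following NASM sketch (Linux x86-64, illustrative label names) compares -1 with 1 twice and records which branches fall through in the exit status.

```nasm
; Sketch: the same CMP, read through signed and unsigned jumps.
section .text
global _start
_start:
    xor edi, edi            ; exit status accumulates bits below
    mov rax, -1             ; 0xFFFFFFFFFFFFFFFF
    mov rbx, 1

    cmp rax, rbx            ; computes RAX - RBX and sets flags
    jnl .skip_signed        ; signed view: -1 < 1, so JNL is not taken
    or edi, 1               ; bit 0: "less" in the signed sense
.skip_signed:
    cmp rax, rbx            ; redo the compare (OR changed the flags)
    jnb .skip_unsigned      ; unsigned view: 2^64-1 > 1, so JNB is taken
    or edi, 2               ; bit 1 would mean "below"; not reached
.skip_unsigned:
    mov eax, 60             ; exit with status 1
    syscall
```

The second cmp is needed because OR, like most arithmetic and logic instructions, overwrites the flags set by the first one.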

3. Complete Project Specification

This project consists of four sub-projects, each building on the previous:

Sub-Project A: Hello World (2-3 hours)

Write a program that prints “Hello from raw assembly!” using only the write syscall.

Sub-Project B: Integer to ASCII Printer (1-2 days)

Write a program that converts an integer to its decimal string representation and prints it.

Sub-Project C: Bubble Sort (2-3 days)

Implement bubble sort for an array of integers, entirely in assembly.

Sub-Project D: Mini Shell (1-2 weeks)

Build a simple shell that reads commands, forks, and executes programs.


4. Real World Outcome

When you complete all sub-projects, you’ll have:

Sub-Project A: Hello World

$ nasm -f elf64 hello.asm && ld hello.o -o hello && ./hello
Hello from raw assembly!

$ wc -c hello
352 hello   # No libc, no bloat. The exact size depends on your linker version and flags.

$ file hello
hello: ELF 64-bit LSB executable, x86-64, statically linked, not stripped

Sub-Project B: Integer Printer

$ ./print_int
Enter a number to print: 12345
12345

$ ./print_int
Testing negative: -42
-42

$ ./print_int
Testing zero: 0
0

$ ./print_int
Testing large: 9223372036854775807
9223372036854775807

Sub-Project C: Bubble Sort

$ ./bubble_sort
Original array: 64 34 25 12 22 11 90
Sorted array:   11 12 22 25 34 64 90

Comparisons: 21
Swaps: 14

Sub-Project D: Mini Shell

$ ./minish
minish> echo hello world
hello world
minish> ls -la
total 48
drwxr-xr-x  2 user user 4096 Dec 29 10:00 .
drwxr-xr-x 10 user user 4096 Dec 29 09:00 ..
-rwxr-xr-x  1 user user 8192 Dec 29 10:00 minish
-rw-r--r--  1 user user 2048 Dec 29 10:00 minish.asm
minish> pwd
/home/user/assembly
minish> exit
Goodbye!

5. The Core Questions You’re Answering

Each sub-project answers fundamental questions:

Sub-Project A: Hello World

“How does a program actually output text without printf?”

You’ll learn that printf is just a wrapper around the write system call.

Sub-Project B: Integer Printer

“How do numbers become the characters we see on screen?”

You’ll implement the classic algorithm behind routines like printf: repeated division by 10 to extract digits.

Sub-Project C: Bubble Sort

“How do loops and array access work at the CPU level?”

You’ll see that array[i] is just base + i * size and loops are conditional jumps.

Sub-Project D: Mini Shell

“How does a shell actually run programs?”

You’ll understand fork/exec, the foundation of Unix process creation.
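The fork/exec/wait pattern can be sketched in a few dozen lines. This is a minimal illustration in NASM syntax for Linux x86-64, not the mini shell itself: the program path /bin/echo, its single argument, and the empty environment are hard-coded assumptions, where a real shell would build argv from the input line.

```nasm
; Sketch: fork, execve in the child, wait4 in the parent.
; The path /bin/echo and its argument are illustrative assumptions.
section .data
path:   db "/bin/echo", 0
arg1:   db "hello", 0
argv:   dq path, arg1, 0        ; NULL-terminated pointer array
envp:   dq 0                    ; empty environment

section .text
global _start
_start:
    mov eax, 57                 ; fork
    syscall
    test rax, rax
    js .fail                    ; negative: fork failed
    jnz .parent                 ; positive: parent, RAX = child PID

    ; Child (RAX == 0): replace this process image.
    mov eax, 59                 ; execve(path, argv, envp)
    lea rdi, [rel path]
    lea rsi, [rel argv]
    lea rdx, [rel envp]
    syscall                     ; returns only on failure
    mov edi, 127                ; shell convention for "could not run command"
    mov eax, 60                 ; exit
    syscall

.parent:
    mov rdi, rax                ; pid to wait for
    xor esi, esi                ; status pointer = NULL (ignore status)
    xor edx, edx                ; options = 0
    xor r10d, r10d              ; rusage = NULL (4th argument goes in R10)
    mov eax, 61                 ; wait4
    syscall
    xor edi, edi                ; exit status 0
    mov eax, 60                 ; exit
    syscall

.fail:
    mov edi, 1                  ; exit status 1 on fork failure
    mov eax, 60
    syscall
```

Without the wait4 call the parent could exit or print its next prompt before the child finishes, and the finished child would linger as a zombie until reaped.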


6. Concepts You Must Understand First

Before writing assembly, verify you can answer these questions:

6.1 Binary and Hexadecimal

| Concept | Questions to Answer | Reference |
|---------|---------------------|-----------|
| Hex notation | What is 0xFF in decimal? Binary? | CS:APP Ch. 2.1 |
| Two’s complement | How is -1 represented in 64 bits? | CS:APP Ch. 2.2 |
| Byte order | Is x86-64 big or little endian? | CS:APP Ch. 2.1.3 |
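Once you have worked out your own answers, a tiny experiment can confirm them. The sketch below, in NASM syntax for Linux x86-64, stores a 32-bit constant and a 64-bit -1, then returns the byte at the lowest address of the constant as the exit status. The label names and constants are assumptions for illustration.

```nasm
; Sketch: observing byte order and two's complement in memory.
section .data
value:  dd 0x11223344           ; laid out in memory as 44 33 22 11
minus1: dq -1                   ; two's complement: eight 0xFF bytes

section .text
global _start
_start:
    movzx edi, byte [rel value] ; byte at the lowest address = 0x44
    mov eax, 60                 ; exit status = 0x44 = 68
    syscall
```

In GDB, x/4xb &value and x/8xb &minus1 show the individual bytes in address order, which makes both representations visible at a glance.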

6.2 Memory Model

| Concept | Questions to Answer | Reference |
|---------|---------------------|-----------|
| Stack growth | Does the stack grow up or down? | CS:APP Ch. 3.7 |
| Alignment | Why must RSP be 16-byte aligned before CALL? | System V ABI |
| Sections | What goes in .text vs .data vs .bss? | CS:APP Ch. 7.4 |

6.3 System Calls

| Concept | Questions to Answer | Reference |
|---------|---------------------|-----------|
| Syscall interface | What register holds the syscall number? | Linux syscall table |
| Return values | How do you know if a syscall failed? | TLPI Ch. 3 |
| errno | What does a negative return value mean? | TLPI Ch. 3 |


7. Questions to Guide Your Design

Sub-Project A: Hello World

Data Section Questions:

  • Where should the string “Hello from raw assembly!” live?
  • How do you calculate the length of a string at assembly time?
  • What does db mean? What about equ?

Syscall Questions:

  • What is the syscall number for write?
  • What does file descriptor 1 represent?
  • What does it mean if the raw write syscall returns a negative value?

Sub-Project B: Integer Printer

Algorithm Questions:

  • How do you extract the least significant digit from a number?
  • After extracting a digit, what operation gives you the remaining number?
  • Digits come out in reverse order. How do you handle this?
  • How do you convert digit 5 to character ‘5’?

Edge Cases:

  • How do you handle zero?
  • How do you handle negative numbers?
  • What’s the maximum number of digits in a 64-bit integer?

Sub-Project C: Bubble Sort

Memory Layout Questions:

  • How do you store an array in the .data section?
  • Given the base address and index, how do you calculate element address?
  • What’s the scale factor for 64-bit integers?

Loop Questions:

  • How do you implement a nested loop?
  • How do you implement “swap if greater”?
  • When should the outer loop terminate?

Sub-Project D: Mini Shell

Input Questions:

  • How do you read a line of input without libc?
  • How do you know when the user pressed Enter?
  • What syscall reads from standard input?

Process Questions:

  • What does fork() return to parent vs child?
  • What happens if execve succeeds? Fails?
  • Why do you need to wait for the child?

8. Thinking Exercise

Exercise A: Trace the Hello World Program

Before writing code, trace through what happens when your program runs:

1. Kernel loads your program at some address (let's say 0x401000)
2. RIP is set to _start (entry point)
3. CPU fetches instruction at _start

Walk through each instruction:
   MOV RAX, 1      ; RAX = 1 (sys_write)
   MOV RDI, 1      ; RDI = 1 (stdout)
   MOV RSI, ???    ; RSI = address of message - what is this address?
   MOV RDX, ???    ; RDX = length - how is this calculated?
   SYSCALL         ; What happens in the kernel?

   MOV RAX, 60     ; RAX = 60 (sys_exit)
   XOR RDI, RDI    ; RDI = 0 (exit status) - why XOR instead of MOV?
   SYSCALL         ; Program terminates

Exercise B: Integer to ASCII by Hand

Convert the number 12345 to ASCII by hand:

Step 1: 12345 % 10 = ?     Digit: ?    Remaining: ?
Step 2: ?     % 10 = ?     Digit: ?    Remaining: ?
Step 3: ?     % 10 = ?     Digit: ?    Remaining: ?
Step 4: ?     % 10 = ?     Digit: ?    Remaining: ?
Step 5: ?     % 10 = ?     Digit: ?    Remaining: ?

Digits extracted (in order): ?, ?, ?, ?, ?
Digits reversed: ?, ?, ?, ?, ?
ASCII values: ?, ?, ?, ?, ? (hint: '0' = 0x30)

Exercise C: Bubble Sort Pass

Given array [64, 34, 25, 12], trace one pass of bubble sort:

Pass 1:
  Compare array[0] and array[1]: 64 vs 34 -> swap? [?, ?, ?, ?]
  Compare array[1] and array[2]: ?  vs ?  -> swap? [?, ?, ?, ?]
  Compare array[2] and array[3]: ?  vs ?  -> swap? [?, ?, ?, ?]

Result after pass 1: [?, ?, ?, ?]
Largest element should be at: position ?

9. The Interview Questions They’ll Ask

After completing this project, you should confidently answer:

Basic Questions

  1. “What’s the difference between CALL and JMP?”
    • Expected: CALL pushes return address, JMP doesn’t
    • Bonus: Explain how RET uses the pushed address
  2. “How do you pass arguments to a function on x86-64 Linux?”
    • Expected: RDI, RSI, RDX, RCX, R8, R9, then stack
    • Bonus: Explain caller-saved vs callee-saved
  3. “What happens when you execute SYSCALL?”
    • Expected: CPU transitions to kernel mode, executes syscall handler
    • Bonus: Explain how the kernel knows which syscall to run

Intermediate Questions

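  1. “Why is XOR RAX, RAX used to zero a register instead of MOV RAX, 0?”
    • Expected: Shorter encoding (XOR EAX, EAX is 2 bytes and XOR RAX, RAX is 3, vs 7 bytes for MOV RAX, 0 encoded with a sign-extended 32-bit immediate)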
    • Bonus: Discuss dependency breaking and register renaming
  2. “What’s the red zone and when can you use it?”
    • Expected: 128 bytes below RSP, usable without adjusting RSP
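    • Bonus: Mainly useful in leaf functions; the ABI guarantees that signal handlers will not clobber it in user space, but kernel code cannot use it because interrupts would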
  3. “How would you implement strlen in assembly?”
    • Expected: Loop scanning for null byte, counting iterations
    • Bonus: Discuss SCASB/REPNE or SIMD approaches
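For the strlen question, the expected answer can be sketched as a simple byte-scanning loop. The example below uses NASM syntax on Linux x86-64; my_strlen and the sample string are hypothetical names for this illustration, and the loop is the straightforward version rather than an optimized one.

```nasm
; Sketch: strlen as a byte-scanning loop. Names are illustrative.
section .data
sample: db "assembly", 0

section .text
global _start

; my_strlen(rdi = pointer to NUL-terminated string) -> rax = length
my_strlen:
    xor eax, eax                ; length = 0
.scan:
    cmp byte [rdi + rax], 0     ; reached the terminator?
    je .found
    inc rax                     ; count this byte and move on
    jmp .scan
.found:
    ret

_start:
    lea rdi, [rel sample]
    call my_strlen              ; RAX = 8
    mov edi, eax                ; exit status = 8
    mov eax, 60                 ; exit
    syscall
```

Using RAX as both the index and the result avoids a separate pointer increment, since [rdi + rax] addresses the current byte directly.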

Advanced Questions

  1. “What’s the most efficient way to swap two registers without a temporary?”
    • Expected: XCHG instruction, or three XORs
    • Bonus: Discuss that XCHG with a memory operand is implicitly locked, as if it carried a LOCK prefix
  2. “How does fork actually create a new process?”
    • Expected: Kernel copies process state, both return from fork
    • Bonus: Discuss copy-on-write optimization
  3. “What’s position-independent code and why does it matter?”
    • Expected: Code that works regardless of load address, needed for shared libraries
    • Bonus: Explain RIP-relative addressing

10. Solution Architecture

Sub-Project A: Hello World Structure

hello.asm:
+------------------------------------------------------------------------+
|  section .data                                                         |
|  +------------------------------------------------------------------+  |
|  |  msg: db "Hello from raw assembly!", 10                          |  |
|  |  len: equ $ - msg                                                |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
|  section .text                                                         |
|  +------------------------------------------------------------------+  |
|  |  global _start                                                   |  |
|  |                                                                  |  |
|  |  _start:                                                         |  |
|  |      ; Setup and call write(1, msg, len)                         |  |
|  |      ; Setup and call exit(0)                                    |  |
|  +------------------------------------------------------------------+  |
+------------------------------------------------------------------------+

Sub-Project B: Integer Printer Structure

print_int.asm:
+------------------------------------------------------------------------+
|  section .bss                                                          |
|  +------------------------------------------------------------------+  |
|  |  buffer: resb 21    ; Space for 20 digits + null                 |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
|  section .text                                                         |
|  +------------------------------------------------------------------+  |
|  |  _start:                                                         |  |
|  |      ; Load test number                                          |  |
|  |      ; Call int_to_str                                           |  |
|  |      ; Call print_string                                         |  |
|  |      ; Exit                                                      |  |
|  |                                                                  |  |
|  |  int_to_str:                                                     |  |
|  |      ; Convert integer in RDI to string at RSI                   |  |
|  |      ; Handle sign                                               |  |
|  |      ; Extract digits with div                                   |  |
|  |      ; Reverse digit order                                       |  |
|  |      ; Return length in RAX                                      |  |
|  |                                                                  |  |
|  |  print_string:                                                   |  |
|  |      ; Write string at RDI with length RSI                       |  |
|  +------------------------------------------------------------------+  |
+------------------------------------------------------------------------+

Algorithm Flow for int_to_str:
+------------------------------------------------------------------------+
|                                                                        |
|  Input: RDI = number to convert, RSI = buffer pointer                  |
|                                                                        |
|     +-------------+                                                    |
|     | Is negative?|                                                    |
|     +------+------+                                                    |
|            |                                                           |
|    +-------+--------+                                                  |
|    | Yes            | No                                               |
|    v                v                                                  |
|  Store '-'       Continue                                              |
|  Negate number                                                         |
|            |                                                           |
|            v                                                           |
|     +-------------+                                                    |
|     | digit_loop  |<-----------+                                       |
|     +------+------+            |                                       |
|            |                   |                                       |
|            v                   |                                       |
|     DIV by 10                  |                                       |
|     Remainder = digit          |                                       |
|     Push digit                 |                                       |
|            |                   |                                       |
|            v                   |                                       |
|     +-------------+            |                                       |
|     |Quotient = 0?|--No--------+                                       |
|     +------+------+                                                    |
|            | Yes                                                       |
|            v                                                           |
|     +-------------+                                                    |
|     | Pop digits  |                                                    |
|     | Store ASCII |                                                    |
|     +------+------+                                                    |
|            |                                                           |
|            v                                                           |
|     Return length                                                      |
|                                                                        |
+------------------------------------------------------------------------+
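The flow above can be sketched as follows. This is one possible implementation in NASM syntax for Linux x86-64, not a definitive one: the register choices (R8 as write cursor, R9 as divisor, RCX as digit counter) and the test value in the driver are assumptions. It follows the diagram by pushing digits onto the stack and popping them to reverse the order.

```nasm
; Sketch of int_to_str following the flow above, plus a small driver.
section .bss
buffer: resb 21                 ; room for sign + digits + newline

section .text
global _start

; int_to_str(rdi = signed 64-bit value, rsi = buffer) -> rax = length
int_to_str:
    mov rax, rdi                ; RAX = working value (dividend)
    mov r8, rsi                 ; R8 = write cursor
    test rax, rax
    jns .convert
    mov byte [r8], '-'          ; negative: emit sign, then negate
    inc r8
    neg rax                     ; unsigned DIV below also copes with INT64_MIN
.convert:
    mov r9d, 10                 ; divisor
    xor ecx, ecx                ; digit count
.digit_loop:
    xor edx, edx                ; DIV divides RDX:RAX, so clear RDX
    div r9                      ; RAX = quotient, RDX = remainder
    push rdx                    ; save digit (least significant first)
    inc rcx
    test rax, rax
    jnz .digit_loop             ; repeat until quotient is 0
.pop_loop:
    pop rdx                     ; most significant digit comes off first
    add dl, '0'                 ; 0..9 -> '0'..'9'
    mov [r8], dl
    inc r8
    dec rcx
    jnz .pop_loop
    mov rax, r8
    sub rax, rsi                ; length = cursor - start
    ret

_start:
    mov rdi, -42                ; test value (assumption for the demo)
    lea rsi, [rel buffer]
    call int_to_str             ; buffer = "-42", RAX = 3

    lea rsi, [rel buffer]       ; reload: RSI is caller-saved
    mov byte [rsi + rax], 10    ; append newline
    lea rdx, [rax + 1]          ; count = length + newline
    mov eax, 1                  ; write(1, buffer, count)
    mov edi, 1
    syscall

    xor edi, edi                ; exit(0)
    mov eax, 60
    syscall
```

An alternative design fills the buffer from its end toward its start, which removes the push/pop pass at the cost of returning a pointer into the middle of the buffer. Because the loop body runs before the quotient test, an input of zero still produces the single digit "0".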

Sub-Project C: Bubble Sort Structure

bubble_sort.asm:
+------------------------------------------------------------------------+
|  section .data                                                         |
|  +------------------------------------------------------------------+  |
|  |  array: dq 64, 34, 25, 12, 22, 11, 90                            |  |
|  |  count: equ ($ - array) / 8                                       |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
|  section .text                                                         |
|  +------------------------------------------------------------------+  |
|  |  bubble_sort:                                                    |  |
|  |      ; Outer loop: n-1 passes                                    |  |
|  |      ; Inner loop: compare adjacent pairs                        |  |
|  |      ; Swap if out of order                                      |  |
|  |      ; Optimization: track if any swaps occurred                 |  |
|  +------------------------------------------------------------------+  |
+------------------------------------------------------------------------+

Nested Loop Structure:
+------------------------------------------------------------------------+
|                                                                        |
|  outer_loop: (i from n-1 down to 1)                                    |
|  +------------------------------------------------------------------+  |
|  |                                                                  |  |
|  |  inner_loop: (j from 0 to i-1)                                   |  |
|  |  +------------------------------------------------------------+  |  |
|  |  |                                                            |  |  |
|  |  |  Load array[j] and array[j+1]                              |  |  |
|  |  |        |                                                   |  |  |
|  |  |        v                                                   |  |  |
|  |  |  Compare: array[j] > array[j+1]?                           |  |  |
|  |  |        |                                                   |  |  |
|  |  |  +-----+-----+                                             |  |  |
|  |  |  | Yes       | No                                          |  |  |
|  |  |  v           v                                             |  |  |
|  |  |  Swap     Continue                                         |  |  |
|  |  |        |                                                   |  |  |
|  |  |        v                                                   |  |  |
|  |  |  j++, continue if j < i                                    |  |  |
|  |  |                                                            |  |  |
|  |  +------------------------------------------------------------+  |  |
|  |                                                                  |  |
|  |  i--, continue if i > 0                                          |  |
|  |                                                                  |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
+------------------------------------------------------------------------+

Sub-Project D: Mini Shell Structure

minish.asm:
+------------------------------------------------------------------------+
|  section .data                                                         |
|  +------------------------------------------------------------------+  |
|  |  prompt: db "minish> ", 0                                        |  |
|  |  newline: db 10                                                  |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
|  section .bss                                                          |
|  +------------------------------------------------------------------+  |
|  |  input_buf: resb 256                                             |  |
|  |  argv_buf: resq 32    ; Space for 32 argument pointers           |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
|  section .text                                                         |
|  +------------------------------------------------------------------+  |
|  |  main_loop:                                                      |  |
|  |      ; Print prompt                                              |  |
|  |      ; Read line                                                 |  |
|  |      ; Check for exit                                            |  |
|  |      ; Parse into argv                                           |  |
|  |      ; Fork                                                      |  |
|  |      ; In child: execve                                          |  |
|  |      ; In parent: wait                                           |  |
|  |      ; Loop                                                      |  |
|  +------------------------------------------------------------------+  |
+------------------------------------------------------------------------+

Process Flow:
+------------------------------------------------------------------------+
|                                                                        |
|  +-----------+     +----------+     +-----------+                      |
|  | Print     |---->| Read     |---->| Parse     |                      |
|  | Prompt    |     | Input    |     | Arguments |                      |
|  +-----------+     +----------+     +-----+-----+                      |
|                                           |                            |
|                                           v                            |
|                                    +------+------+                     |
|                                    | "exit"?     |                     |
|                                    +------+------+                     |
|                                           |                            |
|                               +-----------+-----------+                |
|                               | Yes                   | No             |
|                               v                       v                |
|                          Exit program           +-----+-----+          |
|                                                 |   fork()  |          |
|                                                 +-----+-----+          |
|                                                       |                |
|                            +-------------+------------+                |
|                            |                          |                |
|                            v                          v                |
|                       Child (0)                 Parent (pid)           |
|                            |                          |                |
|                            v                          v                |
|                    +-------+-------+           +------+------+         |
|                    |    execve()   |           |   wait()    |         |
|                    | (never returns|           +------+------+         |
|                    |  on success)  |                  |                |
|                    +---------------+                  v                |
|                                              Back to main_loop         |
|                                                                        |
+------------------------------------------------------------------------+

11. Implementation Guide

Phase 1: Sub-Project A - Hello World (Day 1)

Step 1: Create the source file structure

Create hello.asm with:

  • A .data section for your message
  • A .text section for your code
  • The _start label (entry point for programs without libc)

Step 2: Write the message string

Use db (define byte) to create your string:

  • Include a newline character (10 or 0x0A)
  • Calculate length using equ $ - msg

Step 3: Make the write syscall

Setup:

  • RAX = 1 (sys_write)
  • RDI = 1 (stdout file descriptor)
  • RSI = address of message
  • RDX = length of message

Execute: syscall

Step 4: Make the exit syscall

Setup:

  • RAX = 60 (sys_exit)
  • RDI = 0 (exit status)

Execute: syscall

Step 5: Assemble and link

nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello
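
Putting steps 1 through 4 together, a minimal hello.asm might look like the sketch below. The message text and the label names msg and len are placeholders, not something the project prescribes.

```nasm
; hello.asm - minimal sketch of steps 1-4 (Linux x86-64, NASM syntax)
section .data
    msg:  db "Hello, World!", 10     ; placeholder text plus newline (ASCII 10)
    len:  equ $ - msg                ; length, computed at assembly time

section .text
    global _start                    ; make the entry point visible to ld

_start:
    mov     rax, 1                   ; sys_write
    mov     rdi, 1                   ; fd 1 = stdout
    mov     rsi, msg                 ; address of the message
    mov     rdx, len                 ; number of bytes to write
    syscall

    mov     rax, 60                  ; sys_exit
    xor     edi, edi                 ; exit status 0
    syscall
```

Without the final exit syscall the CPU would keep executing whatever bytes follow the code, which usually ends in a segmentation fault.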

Phase 2: Sub-Project B - Integer Printer (Days 2-3)

Step 1: Handle the sign

If number < 0:
    Store '-' in buffer
    Negate number (make it positive)
    Advance buffer pointer

Step 2: Extract digits using division

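The DIV instruction divides the 128-bit value RDX:RAX by its operand, which must be a register or memory location (DIV cannot take an immediate):

  • Quotient goes in RAX
  • Remainder goes in RDX

xor rdx, rdx       ; Clear the high half of the dividend
mov rax, [number]  ; Value to convert (number is a placeholder label)
mov rcx, 10        ; The divisor has to live in a register
div rcx
; Now: RAX = number/10, RDX = number%10 (the digit)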

Step 3: Build the string backwards

Since digits come out least-significant first:

  1. Push each digit onto the stack
  2. After all digits extracted, pop them to reverse

Or:

  1. Start writing at end of buffer
  2. Decrement buffer pointer for each digit
  3. Return pointer to first character

Step 4: Convert digit to ASCII

Digit 0-9 becomes ‘0’-‘9’ by adding 0x30 (ASCII value of ‘0’)
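
One way the four steps could fit together is sketched below, using the "start at the end of the buffer" variant. The sample value, the 32-byte buffer, and the labels number and buf are assumptions for illustration. Because the string is built backwards, this version writes the minus sign last rather than first.

```nasm
; printint.asm - sketch: print a signed 64-bit integer followed by a newline
section .data
    number: dq -12345                ; sample value (hypothetical)

section .bss
    buf:    resb 32                  ; room for sign + 20 digits + newline

section .text
    global _start

_start:
    mov     rax, [number]
    lea     rsi, [buf + 31]          ; last byte of the buffer
    mov     byte [rsi], 10           ; trailing newline
    mov     rcx, 10                  ; divisor
    xor     r8d, r8d                 ; r8 = 1 if the value was negative
    test    rax, rax
    jns     .digits
    neg     rax                      ; magnitude; the most negative value still
    mov     r8d, 1                   ; works because DIV treats RAX as unsigned

.digits:                             ; divides at least once, so 0 prints as "0"
    xor     edx, edx                 ; clear RDX: DIV divides RDX:RAX
    div     rcx                      ; RAX = quotient, RDX = remainder
    add     dl, '0'                  ; digit to ASCII
    dec     rsi
    mov     [rsi], dl                ; store digits right to left
    test    rax, rax
    jnz     .digits

    test    r8, r8
    jz      .print
    dec     rsi
    mov     byte [rsi], '-'          ; sign goes in front of the digits

.print:
    lea     rdx, [buf + 32]
    sub     rdx, rsi                 ; length = end of buffer - first character
    mov     rax, 1                   ; sys_write
    mov     rdi, 1                   ; stdout
    syscall

    mov     rax, 60                  ; sys_exit
    xor     edi, edi
    syscall
```

Assembled and linked like hello.asm, this prints -12345. Changing the dq value to 0 exercises the zero case.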

Phase 3: Sub-Project C - Bubble Sort (Days 4-6)

Step 1: Define the array

Use dq (define quadword) for 64-bit integers:

array: dq 64, 34, 25, 12, 22, 11, 90
count: equ ($ - array) / 8  ; Number of elements

Step 2: Implement the outer loop

Use a register for the pass counter (n-1 down to 1):

  • Initialize counter
  • At loop end: decrement, jump if not zero

Step 3: Implement the inner loop

For each pass, compare adjacent elements:

  • Load array[j] into one register
  • Load array[j+1] into another
  • Compare using CMP instruction
  • Jump to swap or no-swap based on flags

Step 4: Swap elements

If array[j] > array[j+1], swap them:

  • Can use XCHG with memory
  • Or use three MOV instructions with a temp register

Step 5: Early termination optimization

Track whether any swaps occurred in a pass:

  • If no swaps, array is sorted
  • Can exit early
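
The five steps above can be sketched as a single program. The register assignments (R12 for i, R13 for j, R14 for the swapped flag) are one possible choice, and reporting the smallest element through the exit status is only a shortcut to make the result visible without an integer printer.

```nasm
; bsort.asm - sketch: bubble sort with early exit; exit status = smallest element
section .data
    array:  dq 64, 34, 25, 12, 22, 11, 90
    count:  equ ($ - array) / 8      ; number of elements

section .text
    global _start

_start:
    mov     r12, count - 1           ; i = n-1 comparisons in the first pass
.outer:
    cmp     r12, 0
    jle     .done                    ; nothing left to compare
    xor     r13d, r13d               ; j = 0
    xor     r14d, r14d               ; swapped = 0
.inner:
    mov     rax, [array + r13*8]     ; array[j]
    mov     rbx, [array + r13*8 + 8] ; array[j+1]
    cmp     rax, rbx
    jle     .no_swap                 ; already in order (signed compare)
    mov     [array + r13*8], rbx     ; swap the pair through the two registers
    mov     [array + r13*8 + 8], rax
    mov     r14d, 1                  ; remember that this pass changed something
.no_swap:
    inc     r13
    cmp     r13, r12
    jl      .inner                   ; continue while j < i
    test    r14, r14
    jz      .done                    ; no swaps in this pass: already sorted
    dec     r12
    jmp     .outer
.done:
    mov     rdi, [array]             ; smallest element after sorting
    mov     rax, 60                  ; sys_exit
    syscall
```

After running ./bsort, echo $? should print 11 for this data. Printing the whole array with your integer printer is the natural next step.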

Phase 4: Sub-Project D - Mini Shell (Days 7-14)

Step 1: Print the prompt

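Reuse your write syscall pattern. Here prompt_len is an equ constant you define for the prompt's length; leave the terminating 0 out of the count so the null byte is not written to the terminal: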

mov rax, 1
mov rdi, 1
mov rsi, prompt
mov rdx, prompt_len
syscall

Step 2: Read user input

Use the read syscall:

  • RAX = 0 (sys_read)
  • RDI = 0 (stdin)
  • RSI = buffer address
  • RDX = buffer size

Returns the number of bytes read in RAX: 0 means end of input (Ctrl+D), and a negative value is an error code (-errno).
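
A small standalone sketch of this step follows: it reads one line, strips the newline if there is one, and echoes the text back. The 255-byte read limit (one less than the buffer, to leave room for a terminator) is a choice made here, not a requirement.

```nasm
; readline.asm - sketch: read one line, strip the newline, echo it back
section .bss
    input_buf: resb 256

section .text
    global _start

_start:
    xor     eax, eax                 ; sys_read = 0
    xor     edi, edi                 ; fd 0 = stdin
    mov     rsi, input_buf
    mov     edx, 255                 ; leave one byte for the terminator
    syscall
    test    rax, rax
    jle     .exit                    ; 0 = end of input, negative = error

    mov     r12, rax                 ; keep the byte count in a safe register
    cmp     byte [input_buf + rax - 1], 10
    jne     .terminate               ; no newline (input was cut off or piped)
    dec     r12                      ; drop the newline from the length
.terminate:
    mov     byte [input_buf + r12], 0 ; null-terminate for later parsing

    mov     rax, 1                   ; sys_write
    mov     rdi, 1                   ; stdout
    mov     rsi, input_buf
    mov     rdx, r12                 ; length without the newline
    syscall
.exit:
    mov     rax, 60                  ; sys_exit
    xor     edi, edi
    syscall
```

Checking the last byte before overwriting it matters when input comes from a pipe or when the line is longer than the buffer, since neither case is guaranteed to end in a newline.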

Step 3: Parse the command line

Split input into argv array:

  1. Skip leading whitespace
  2. Mark start of argument
  3. Find end of argument (whitespace or null)
  4. Replace whitespace with null
  5. Store pointer in argv array
  6. Repeat until end of input
  7. Terminate argv with NULL pointer
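
The parsing steps can be sketched as the loop below. To stay self-contained it works on a hard-coded sample line instead of real input, treats only the space character as whitespace, and reports argc through the exit status; all three are simplifications made for this example.

```nasm
; parse.asm - sketch: split a line into argv in place; exit status = argc
section .data
    input_buf: db "ls  -l /tmp", 0   ; sample line, newline already stripped

section .bss
    argv_buf:  resq 32               ; up to 31 arguments + NULL terminator

section .text
    global _start

_start:
    mov     rsi, input_buf           ; cursor into the line
    xor     ecx, ecx                 ; argc = 0

.skip_ws:                            ; step 1: skip leading spaces
    mov     al, [rsi]
    cmp     al, ' '
    jne     .check_end
    inc     rsi
    jmp     .skip_ws
.check_end:
    test    al, al
    jz      .finish                  ; end of input
    cmp     ecx, 31
    jae     .finish                  ; argv_buf is full
    mov     [argv_buf + rcx*8], rsi  ; steps 2 and 5: record argument start
    inc     ecx
.scan:                               ; step 3: find the end of the argument
    mov     al, [rsi]
    test    al, al
    jz      .finish
    cmp     al, ' '
    je      .cut
    inc     rsi
    jmp     .scan
.cut:
    mov     byte [rsi], 0            ; step 4: replace the space with a null
    inc     rsi
    jmp     .skip_ws                 ; step 6: next argument
.finish:
    mov     qword [argv_buf + rcx*8], 0  ; step 7: NULL-terminate argv
    mov     edi, ecx                 ; argc as exit status
    mov     eax, 60                  ; sys_exit
    syscall
```

For the sample line, echo $? after running should print 3. In the shell, the same loop would run on input_buf in .bss right after the read syscall.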

Step 4: Handle the “exit” command

Compare first argument with “exit” string:

  • If match, call exit syscall
  • Otherwise, continue to execute

Step 5: Fork the process

Use fork syscall:

  • RAX = 57 (sys_fork)
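  • Returns 0 in the child, the child's PID in the parent, and a negative error code (-errno) on failure; the -1 convention belongs to the libc wrapper, not the raw syscall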

Step 6: Execute in child

Use execve syscall:

  • RAX = 59 (sys_execve)
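  • RDI = path to program (argv[0]). execve does not search PATH, so commands must be typed as full paths (for example /bin/ls) unless you add a PATH lookup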
  • RSI = argv array
  • RDX = envp (can be NULL or pointer to environment)

If execve returns, it failed!

Step 7: Wait in parent

Use wait4 syscall:

  • RAX = 61 (sys_wait4)
  • RDI = -1 (wait for any child)
  • RSI = pointer to status (or 0)
  • RDX = 0 (no options)
  • R10 = 0 (no rusage)
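
Steps 5 through 7 can be tried in isolation before wiring them into the shell. The sketch below hard-codes the command; the path /bin/echo and the argument text are assumptions about the system it runs on.

```nasm
; forkexec.asm - sketch: fork, run /bin/echo in the child, wait in the parent
section .data
    path:   db "/bin/echo", 0        ; assumed location of echo
    arg1:   db "hello from child", 0
    argv:   dq path, arg1, 0         ; NULL-terminated argument vector

section .text
    global _start

_start:
    mov     rax, 57                  ; sys_fork
    syscall
    test    rax, rax
    js      .fail                    ; negative = error code
    jz      .child                   ; zero = we are the child

    ; parent: RAX holds the child's PID
    mov     rax, 61                  ; sys_wait4
    mov     rdi, -1                  ; wait for any child
    xor     esi, esi                 ; no status pointer
    xor     edx, edx                 ; no options
    xor     r10d, r10d               ; no rusage
    syscall
    mov     rax, 60                  ; sys_exit
    xor     edi, edi
    syscall

.child:
    mov     rax, 59                  ; sys_execve
    mov     rdi, path                ; program to run
    mov     rsi, argv                ; argv[0] = path, argv[1] = arg1
    xor     edx, edx                 ; envp = NULL (empty environment)
    syscall
    ; only reached if execve failed
.fail:
    mov     rax, 60                  ; sys_exit
    mov     edi, 1                   ; non-zero status signals the failure
    syscall
```

In the shell itself, RDI would come from argv_buf[0] and RSI would point at argv_buf, and the parent would jump back to main_loop instead of exiting.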

12. Hints in Layers

Sub-Project A Hints

Hint 1: String definition

Use db (define byte) for strings. The number 10 is the ASCII code for newline:

msg: db "Hello", 10

Hint 2: Calculating length

The $ symbol means “current address”. So $ - msg is the length:

len: equ $ - msg

This is calculated at assembly time, not runtime.

Hint 3: The write syscall

Write returns the number of bytes written, or a negative error code (-errno) on failure; the -1 return belongs to the libc wrapper, not the raw syscall. You don’t need to check it for this simple program, but in real code you should.

Hint 4: Why XOR for zero?

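xor rdi, rdi is shorter than the 64-bit encoding of mov rdi, 0:

  • XOR: 3 bytes (48 31 ff)
  • MOV: 7 bytes (48 c7 c7 00 00 00 00)

Both set RDI to 0, but XOR is preferred. xor edi, edi is shorter still at 2 bytes (31 ff) and also zeroes all of RDI, because writing a 32-bit register clears the upper 32 bits.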

Sub-Project B Hints

Hint 1: The division trap

DIV divides RDX:RAX. If you don’t clear RDX first, you’ll divide a huge number or get a divide error. Always:

xor rdx, rdx
div rcx

Hint 2: Converting digit to ASCII

The characters ‘0’ through ‘9’ are consecutive in ASCII starting at 0x30:

add dl, '0'    ; or add dl, 0x30

Hint 3: Handling zero

Zero is a special case. If your loop tests the value before dividing, it produces no digits for 0, so check for it explicitly (or use a loop that always divides at least once):

test rax, rax
jz .is_zero

Hint 4: Using the stack for reversal

Push each digit as you extract it, count them. Then pop them in reverse:

push rdx      ; Save digit
inc r8        ; Count digits
; ... after loop
pop rax       ; Get digit (reversed order)
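
As a fuller illustration of the push/pop approach, the sketch below prints one unsigned value. The sample value, the buffer size, and the choice of R8 as the digit counter are assumptions made for this example.

```nasm
; pushdigits.asm - sketch: print an unsigned value, reversing digits via the stack
section .bss
    buf:    resb 24                  ; 20 digits at most, plus a newline

section .text
    global _start

_start:
    mov     rax, 2024                ; sample value (hypothetical)
    mov     rcx, 10                  ; divisor
    xor     r8d, r8d                 ; digit count
.extract:
    xor     edx, edx                 ; clear RDX before every DIV
    div     rcx
    push    rdx                      ; least-significant digit is pushed first
    inc     r8
    test    rax, rax
    jnz     .extract

    mov     rdi, buf                 ; write cursor
    mov     rdx, r8                  ; keep the digit count for the length
.emit:
    pop     rax                      ; most-significant digit comes off first
    add     al, '0'                  ; digit to ASCII
    mov     [rdi], al
    inc     rdi
    dec     r8
    jnz     .emit
    mov     byte [rdi], 10           ; trailing newline

    inc     rdx                      ; length = digits + newline
    mov     rsi, buf
    mov     rax, 1                   ; sys_write
    mov     rdi, 1                   ; stdout
    syscall

    mov     rax, 60                  ; sys_exit
    xor     edi, edi
    syscall
```

This prints 2024. Every push is matched by a pop, so RSP is back where it started before the syscalls run.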

Sub-Project C Hints

Hint 1: Array indexing

For 64-bit elements, multiply index by 8:

mov rax, [array + rcx*8]    ; Load array[rcx]

Hint 2: Comparing 64-bit values

CMP sets flags based on subtraction without storing result:

cmp rax, rbx    ; Sets flags based on RAX - RBX
jle .no_swap    ; Jump if RAX <= RBX (signed)

Hint 3: Swapping with XCHG

XCHG can swap a register with memory:

xchg rax, [array + rcx*8]    ; Swap RAX with array[rcx]

Warning: XCHG with memory has an implicit LOCK prefix (slower).

Hint 4: Loop counter strategy

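Use R12-R15 for loop counters. They’re callee-saved, so they survive calls to functions that follow the calling convention. In pure assembly any register works, with one catch: the syscall instruction overwrites RCX and R11, so don’t keep a counter in either one across a syscall.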
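
As an illustration, the loop below keeps its counter in R12 across a write syscall. A counter in RCX would not survive, because syscall uses RCX and R11 to save the return address and flags. The message text and iteration count are arbitrary.

```nasm
; count3.asm - sketch: a loop counter that survives the syscall instruction
section .data
    msg:  db "tick", 10
    len:  equ $ - msg

section .text
    global _start

_start:
    mov     r12, 3                   ; loop counter, untouched by syscall
.again:
    mov     rax, 1                   ; sys_write
    mov     rdi, 1                   ; stdout
    mov     rsi, msg
    mov     rdx, len
    syscall                          ; overwrites RAX, RCX, and R11
    dec     r12
    jnz     .again                   ; repeat until the counter reaches zero

    mov     rax, 60                  ; sys_exit
    xor     edi, edi
    syscall
```

The argument registers are reloaded on every iteration because RAX comes back holding the return value.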

Sub-Project D Hints

Hint 1: Reading until newline

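In a terminal, the read syscall returns when the user presses Enter, and the newline is included in the buffer. Replace it with a null byte, after checking that RAX is positive (0 means end of input, negative means error):

mov byte [input_buf + rax - 1], 0  ; RAX = bytes read, must be > 0

Hint 2: Fork return value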

After fork:

  • In parent: RAX = child’s PID (positive)
  • In child: RAX = 0
  • On error: RAX = negative error code (-errno)

mov rax, 57    ; sys_fork
syscall
test rax, rax
js .error      ; Negative = error
jz .child      ; Zero = child process
; ... parent code

Hint 3: Why execve doesn't return

On success, execve replaces your process image entirely. The only way it returns is on error. If execve returns, call exit!

Hint 4: Simple string comparison

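For the “exit” check, compare byte by byte or use CMPSB. Here exit_str is a label you define yourself as db "exit", 0:

mov rsi, input_buf
mov rdi, exit_str
mov rcx, 5         ; "exit\0" is 5 bytes
repe cmpsb         ; Compare while bytes match and RCX > 0
jne .not_exit      ; ZF clear = a byte differed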
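
The byte-by-byte alternative could look like the sketch below, which needs no hard-coded length. It compares two fixed strings and reports the result through the exit status; in the shell, RSI would point at the first parsed argument instead. The label names are placeholders.

```nasm
; streq.asm - sketch: byte-by-byte compare; exit status 0 if the strings match
section .data
    input_buf: db "exit", 0          ; stand-in for the parsed first argument
    exit_str:  db "exit", 0

section .text
    global _start

_start:
    mov     rsi, input_buf
    mov     rdi, exit_str
.next:
    mov     al, [rsi]
    cmp     al, [rdi]
    jne     .differ                  ; mismatch, including different lengths
    test    al, al
    jz      .same                    ; both strings ended at the same position
    inc     rsi
    inc     rdi
    jmp     .next
.same:
    xor     edi, edi                 ; status 0: strings are equal
    jmp     .quit
.differ:
    mov     edi, 1                   ; status 1: strings differ
.quit:
    mov     rax, 60                  ; sys_exit
    syscall
```

Changing input_buf to another word such as "exits" or "ex" makes echo $? print 1 instead of 0.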

Common Pitfalls and Debugging

Pitfall 1: Segmentation Fault on Startup

Symptom: Program crashes immediately without printing anything.

Cause: Usually wrong section declaration or missing _start label.

Solution:

# Verify entry point
objdump -f hello | grep start

# Check sections
objdump -h hello

# Verify _start is global
nm hello | grep start

Pitfall 2: Garbage Output Instead of Text

Symptom: Program prints garbage characters.

Cause: Wrong address or length passed to write syscall.

Solution:

# Debug with GDB
gdb ./hello
(gdb) break _start
(gdb) run
(gdb) stepi             # Repeat until the next instruction is syscall
(gdb) info registers
# Check RSI points to your string (x/s $rsi)
# Check RDX is the correct length

Pitfall 3: Division Error (SIGFPE)

Symptom: “Floating point exception” (despite no floats).

Cause: Dividing by zero, or a quotient too large to fit in RAX, which is what happens when RDX still holds leftover data before DIV.

Solution:

; ALWAYS clear RDX before unsigned division
xor rdx, rdx
div rbx        ; Divides RDX:RAX by RBX

Pitfall 4: Infinite Loop

Symptom: Program hangs, uses 100% CPU.

Cause: Loop condition never becomes false.

Solution:

# Attach GDB to running process
gdb -p $(pgrep your_program)
(gdb) bt        # Where is it?
(gdb) info registers  # What are the counter values?

Pitfall 5: Fork Creates Zombie Processes

Symptom: ps aux | grep defunct shows zombie processes.

Cause: Parent not calling wait after fork.

Solution: Always wait for children:

; In parent after fork
mov rax, 61    ; sys_wait4
mov rdi, -1    ; Wait for any child
xor rsi, rsi   ; No status pointer
xor rdx, rdx   ; No options
xor r10, r10   ; No rusage
syscall

Pitfall 6: Stack Alignment Issues

Symptom: Crash when calling other functions, especially those using SSE.

Cause: The System V calling convention requires RSP to be 16-byte aligned at the point of each CALL instruction.

Solution:

; Ensure alignment before calls
and rsp, -16   ; Align to 16 bytes (clears low 4 bits)

Extensions and Challenges

Beginner Extensions

  • Add color output using ANSI escape codes
  • Print numbers in hexadecimal format
  • Add a simple “cat” command to your shell

Intermediate Extensions

  • Implement bubble sort for strings (strcmp in assembly)
  • Add command-line argument parsing to your programs
  • Implement a simple “echo” builtin in your shell
  • Add input/output redirection (> and <) to your shell

Advanced Extensions

  • Implement quicksort instead of bubble sort
  • Add pipes (|) to your shell
  • Support background processes (&) in your shell
  • Implement a simple text editor (read file, display, edit, save)

Expert Challenge

  • Port all programs to ARM64 assembly (use aarch64-linux-gnu-as)
  • Implement signal handling in your shell (Ctrl+C should not exit)
  • Add job control (fg, bg, jobs commands)

Resources

Essential Reading

Topic               | Book                                            | Chapter
--------------------|-------------------------------------------------|-------------
x86-64 Assembly     | “Computer Systems: A Programmer’s Perspective”  | Chapter 3
System Calls        | “The Linux Programming Interface”               | Chapter 3
Calling Conventions | “Computer Systems: A Programmer’s Perspective”  | Chapter 3.7
Practical Assembly  | “The Art of 64-Bit Assembly, Volume 1”          | All chapters

Reference Documentation

Tools

Tool    | Purpose        | Command
--------|----------------|----------------------------------
nasm    | Assembler      | nasm -f elf64 file.asm -o file.o
ld      | Linker         | ld file.o -o file
objdump | Disassembler   | objdump -d file
gdb     | Debugger       | gdb ./file
strace  | Syscall tracer | strace ./file
nm      | Symbol table   | nm file
readelf | ELF info       | readelf -a file

Useful GDB Commands for Assembly

(gdb) layout asm          # Show assembly view
(gdb) layout regs         # Show registers
(gdb) stepi               # Step one instruction
(gdb) nexti               # Step over calls
(gdb) x/10i $rip          # Examine 10 instructions at RIP
(gdb) x/8xg $rsp          # Examine 8 quadwords at RSP
(gdb) info registers      # Show all registers
(gdb) p/x $rax            # Print RAX in hex

Self-Assessment Checklist

Understanding

  • I can explain what happens during a syscall instruction
  • I understand the difference between caller-saved and callee-saved registers
  • I can read and write x86-64 addressing modes
  • I understand how the stack grows and is used for local variables
  • I can explain how fork/exec creates new processes

Implementation

  • My Hello World program runs and prints correctly
  • My integer printer handles positive, negative, and zero
  • My bubble sort correctly sorts an array
  • My shell can run basic commands like ls, echo, pwd
  • I can assemble, link, and debug my programs

Growth

  • I can read disassembled C code and understand it
  • I can trace through assembly code by hand
  • I know when assembly is appropriate vs high-level languages
  • I’m comfortable using GDB to debug at the instruction level

Learning Milestones

Milestone             | Indicator
----------------------|---------------------------------------------------
Hello World runs      | You understand syscalls and basic assembly syntax
Integer printer works | You can implement algorithms in assembly
Bubble sort completes | You understand loops, arrays, and memory access
Shell runs commands   | You understand process creation and the Unix model

What’s Next?

After completing this project, you’re ready for:

  1. Project 7: Memory Visualizer & Debugger - Use ptrace to inspect running processes
  2. Project 8: x86-64 Disassembler - Parse the machine code you’ve been writing
  3. Project 11: Bare-Metal Programming - Write assembly that runs without an OS

This guide was expanded from CPU_ISA_ARCHITECTURE_PROJECTS.md. For the complete learning path, see the project index.