LEARN MODERN GC DEEP DIVE

Learn Modern Garbage Collection: From Reference Counting to ZGC/Shenandoah

Goal: Deeply understand the engineering marvels behind modern memory management. You will progress from simple reference counting to building the components of ultra-low latency collectors like ZGC and Shenandoah, mastering concepts like colored pointers, load barriers, and concurrent compaction. By the end, you’ll know exactly how runtimes manage billions of objects with sub-millisecond pauses.


Why Garbage Collection Matters

In the early days of programming, memory management was a manual, error-prone struggle. One forgotten free() led to a leak; one premature free() led to a crash. Garbage Collection (GC) was the industry’s answer to “Memory Safety at Scale.”

Modern GCs are no longer just “background cleaners.” They are sophisticated, multi-threaded systems that:

  • Predict the future: Using generational hypotheses to optimize for short-lived objects.
  • Rewrite the present: Moving objects in memory while the application is still running (Concurrent Compaction).
  • Abuse Hardware: Using spare pointer bits (pointer tagging), virtual-memory tricks, and cheap compiler-inserted barriers to track memory state with very low (though never zero) overhead.

Understanding GC is the difference between a developer who wonders why their app “stutters” and an engineer who can tune the JVM or Go runtime for 99.99th percentile latency.


Core Concept Analysis

1. The Tri-Color Marking Algorithm

The foundation of almost all modern concurrent collectors. It visualizes the GC “wave” as it moves through the object graph.

[ White ] -> [ Grey ] -> [ Black ]

1. White: Unvisited objects (candidates for deletion).
2. Grey:  Visited, but their children haven't been scanned.
3. Black: Visited, and children are also visited (safe).

GC Goal: Move all reachable objects to Black. Anything left White is garbage.
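
To make the three colors concrete, here is a minimal sketch of a tri-color marking loop in C. The `struct object` layout, the fixed-size grey worklist, and the names `shade` and `tricolor_mark` are all assumptions made for illustration, and the sketch runs with the application stopped, so it ignores the barriers a concurrent collector would need.

```c
#include <assert.h>
#include <stddef.h>

enum color { WHITE, GREY, BLACK };

#define MAX_CHILDREN 2
#define MAX_OBJECTS 16   /* hypothetical upper bound on live objects */

struct object {
    enum color color;
    struct object *children[MAX_CHILDREN];   /* NULL means "no edge" */
};

/* The grey set: objects that were visited but whose children are unscanned. */
static struct object *grey_stack[MAX_OBJECTS];
static size_t grey_top = 0;

/* White -> Grey on first visit. Grey and Black objects are left alone,
   so every object enters the worklist at most once. */
static void shade(struct object *obj) {
    if (obj != NULL && obj->color == WHITE) {
        assert(grey_top < MAX_OBJECTS);
        obj->color = GREY;
        grey_stack[grey_top++] = obj;
    }
}

/* Marks everything reachable from the roots and returns how many
   objects ended up Black. Anything still White afterwards is garbage. */
size_t tricolor_mark(struct object **roots, size_t nroots) {
    size_t blackened = 0;
    for (size_t i = 0; i < nroots; i++)
        shade(roots[i]);
    while (grey_top > 0) {
        struct object *obj = grey_stack[--grey_top];
        for (size_t c = 0; c < MAX_CHILDREN; c++)
            shade(obj->children[c]);
        obj->color = BLACK;   /* Grey -> Black: all children are shaded */
        blackened++;
    }
    return blackened;
}
```

The loop ends when the grey set is empty, which is exactly the point where no Black object can still point at a White one.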

2. The Memory Layout: Regions vs. Monolithic

Older GCs viewed memory as one giant block. Modern GCs (G1, ZGC) break it into “Regions.”

Monolithic Heap:
[ SSSSSSSSSSSS HHHHHHHHHHHHHHHHHH ] 
(Stack)      (Heap - hard to defrag)

Region-based Heap (G1/ZGC):
[ R1:Young ] [ R2:Old   ] [ R3:Free  ]
[ R4:Young ] [ R5:Humong] [ R6:Old   ]
(Specific regions can be collected on their own, without processing the whole heap)
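
One way to see why regions help: the collector can rank them and only touch the ones that are mostly garbage. The sketch below is a heavily simplified, hypothetical version of that "most garbage first" selection. The `struct region` fields and `choose_collection_set` are invented for illustration; a real collector such as G1 also weighs pause-time predictions and treats young regions specially.

```c
#include <assert.h>
#include <stddef.h>

enum region_kind { REGION_FREE, REGION_YOUNG, REGION_OLD, REGION_HUMONGOUS };

struct region {
    enum region_kind kind;
    size_t live_bytes;   /* assumed to come from the last marking cycle */
};

/* Picks up to max_picks Young/Old regions with the least live data
   (cheapest to evacuate, most garbage reclaimed). Writes the chosen
   region indices to out and returns how many were chosen. */
size_t choose_collection_set(const struct region *heap, size_t nregions,
                             size_t *out, size_t max_picks) {
    assert(heap != NULL && out != NULL);
    size_t picked = 0;
    while (picked < max_picks) {
        size_t best = nregions;   /* sentinel: no candidate found yet */
        for (size_t i = 0; i < nregions; i++) {
            if (heap[i].kind != REGION_YOUNG && heap[i].kind != REGION_OLD)
                continue;         /* skip free and humongous regions */
            int already_picked = 0;
            for (size_t j = 0; j < picked; j++)
                if (out[j] == i)
                    already_picked = 1;
            if (already_picked)
                continue;
            if (best == nregions || heap[i].live_bytes < heap[best].live_bytes)
                best = i;
        }
        if (best == nregions)
            break;                /* ran out of candidates */
        out[picked++] = best;
    }
    return picked;
}
```

With the six-region layout drawn above, a budget of two regions would select the two with the fewest live bytes and leave the rest of the heap untouched.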

3. Colored Pointers & Load Barriers

This is the “Magic” of ZGC. Instead of looking at the object to see its state, we look at the pointer itself.

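ZGC Pointer Layout (64-bit, classic non-generational ZGC):
| 18 unused bits | 0 | 1 | 0 | 0 | [ 42 bits of Address ] |
                   ^   ^   ^   ^
                   |   |   |   +-- Marked0 Bit
                   |   |   +------ Marked1 Bit
                   |   +---------- Remapped Bit
                   +-------------- Finalizable Bit
(Only one of Marked0, Marked1, and Remapped is the "good" color at any given time.)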

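Load Barrier Logic (pseudocode for what the JIT emits after a reference load):
Object obj = field.get();      // <-- Trigger Load Barrier
if (ptr_is_bad(obj)) {         // Fast path: just a bitmask check on the pointer
    obj = fix_pointer(obj);    // Slow path: Remap or Mark, then heal the field
}
return obj;                    // The application only ever sees a good pointer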
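
The same idea can be sketched in plain C by treating a pointer as a 64-bit integer. This is a toy model under stated assumptions: the bit positions (address in bits 0-41, color bits in 42-45), the tiny `forwarding_table`, and every function name here are illustrative and not taken from the JVM. A real collector would also heal the field with a compare-and-swap, since other threads may be loading it at the same time.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: address in bits 0-41, one color bit each in bits 42-45. */
#define ADDRESS_MASK    ((UINT64_C(1) << 42) - 1)
#define BIT_MARKED0     (UINT64_C(1) << 42)
#define BIT_MARKED1     (UINT64_C(1) << 43)
#define BIT_REMAPPED    (UINT64_C(1) << 44)
#define BIT_FINALIZABLE (UINT64_C(1) << 45)
#define COLOR_MASK (BIT_MARKED0 | BIT_MARKED1 | BIT_REMAPPED | BIT_FINALIZABLE)

typedef uint64_t colored_ptr;

/* The color that counts as "good" right now; the GC flips it per phase. */
static uint64_t good_color = BIT_REMAPPED;

/* Hypothetical forwarding table: old address -> new address. */
#define MAX_FORWARDINGS 4
static struct { uint64_t from, to; } forwarding_table[MAX_FORWARDINGS];
static int forwarding_count = 0;

void add_forwarding(uint64_t from, uint64_t to) {
    assert(forwarding_count < MAX_FORWARDINGS);
    forwarding_table[forwarding_count].from = from;
    forwarding_table[forwarding_count].to = to;
    forwarding_count++;
}

/* Fast path: one mask-and-test. A null pointer has no color bits, so it is good. */
static int ptr_is_bad(colored_ptr p) {
    return (p & (COLOR_MASK & ~good_color)) != 0;
}

/* Slow path: follow the forwarding table if the object moved, then recolor. */
static colored_ptr fix_pointer(colored_ptr p) {
    uint64_t addr = p & ADDRESS_MASK;
    for (int i = 0; i < forwarding_count; i++)
        if (forwarding_table[i].from == addr)
            addr = forwarding_table[i].to;
    return addr | good_color;
}

/* Runs on every reference load from the heap. */
colored_ptr load_barrier(colored_ptr *field) {
    colored_ptr p = *field;
    if (ptr_is_bad(p)) {
        p = fix_pointer(p);
        *field = p;   /* self-heal: the next load of this field takes the fast path */
    }
    return p;
}
```

Because the slow path writes the fixed pointer back, each stale field pays the cost once; after that, loads of the same field pass the mask check immediately.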

Concept Summary Table

| Concept Cluster | What You Need to Internalize |
|-----------------|------------------------------|
| Reachability | If you can’t find it from a “Root” (stack/statics), it’s dead. |
| Fragmentation | Memory isn’t just “full”; it gets “holey.” Compaction is the cure. |
| Stop-The-World | The “pause” where the app stops so the GC can move things safely. |
| Concurrency | The GC runs while your code runs. This requires Read/Write barriers. |
| Generational | “Most objects die young.” Young Gen is fast; Old Gen is thorough. |

Deep Dive Reading by Concept

Foundations

| Concept | Book & Chapter |
|---------|----------------|
| Mark-and-Sweep | “The Garbage Collection Handbook” by Richard Jones, Ch. 2: “Mark-sweep Garbage Collection” |
| Copying GC | “The Garbage Collection Handbook” by Richard Jones, Ch. 4: “Copying Garbage Collection” |

Modern Mechanics

| Concept | Book & Chapter |
|---------|----------------|
| G1 Internals | “The Garbage Collection Handbook”, Ch. 12: “Garbage-First Garbage Collection” |
| ZGC/Shenandoah | “Modern Operating Systems” by Tanenbaum, Ch. 3 (Memory Mgmt principles) + ZGC Wiki/RFCs |
| Write Barriers | “The Garbage Collection Handbook”, Ch. 11: “Barriers” |

Essential Reading Order

  1. The Basics (Week 1): The Garbage Collection Handbook Ch 1-3. Understand why we need GC.
  2. The Modern Shift (Week 2): The Garbage Collection Handbook Ch 11-12. Learn about G1 and Barriers.

Project 1: The Manual Reference Counter

  • File: LEARN_MODERN_GC.md
  • Main Programming Language: C
  • Alternative Programming Languages: C++, Rust (raw pointers)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Memory Management / Pointers
  • Software or Tool: malloc/free
  • Main Book: “The C Programming Language” by K&R

What you’ll build: A smart-pointer-like system in C where you wrap allocations in a struct that tracks how many “owners” it has. When the count hits zero, it automatically calls free().

Why it teaches GC: This is the “Grandfather” of GC (used in Python and Swift). You’ll learn the primary struggle: Circular Dependencies. You’ll see why reference counting alone isn’t enough when Object A points to B, and B points to A.
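
The circular-dependency problem is easy to reproduce. The sketch below uses a hypothetical `struct node` with a plain (non-atomic) count and a global `live_nodes` counter so the leak can be observed; none of these names come from a real library.

```c
#include <assert.h>
#include <stdlib.h>

static int live_nodes = 0;   /* how many nodes are currently allocated */

struct node {
    int refcount;
    struct node *next;       /* an owning reference to another node */
};

struct node *node_new(void) {
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->refcount = 1;         /* the creator holds the first reference */
    n->next = NULL;
    live_nodes++;
    return n;
}

void node_release(struct node *n) {
    if (n == NULL)
        return;
    if (--n->refcount == 0) {
        node_release(n->next);   /* drop the reference this node owned */
        free(n);
        live_nodes--;
    }
}

void node_set_next(struct node *n, struct node *target) {
    if (target != NULL)
        target->refcount++;      /* n now co-owns target */
    node_release(n->next);
    n->next = target;
}

/* Builds A <-> B, then drops both external references. Neither count
   reaches zero, so nothing is freed. The pointer is returned only so a
   test or debugger can inspect the leak; a real program would have lost it. */
struct node *leak_a_cycle(void) {
    struct node *a = node_new();
    struct node *b = node_new();
    node_set_next(a, b);         /* b: count 2 */
    node_set_next(b, a);         /* a: count 2 */
    node_release(a);             /* a: count 1, still held by b */
    node_release(b);             /* b: count 1, still held by a */
    return a;
}
```

After `leak_a_cycle()` returns, `live_nodes` is still 2 even though the program no longer owns either node. Only a tracing pass from the roots, or manually breaking the cycle, can reclaim them.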

Core challenges you’ll face:

  • Tracking increments/decrements → maps to Reference Counting overhead
  • Handling circular refs (The Leak) → maps to Why we need Mark-and-Sweep
  • Thread safety → maps to Atomic increments vs. Performance

Key Concepts:

  • Reference Counting: “The Garbage Collection Handbook” Ch. 5 - Richard Jones
  • Atomic Operations: “C Programming: A Modern Approach” - K.N. King

Real World Outcome: You’ll have a library ref.h that allows code like this:

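REF(Person) p = ref_new(Person); // Count 1 (the creator owns the first reference)
{
   REF(Person) alias = p;
   ref_inc(alias); // Count 2
   // ... use alias ...
   ref_dec(alias); // Count 1 (alias gives up its reference before leaving scope)
}
// ... later ...
ref_dec(p); // Count 0 -> memory freed!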
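
A usage like the one above could be backed by something along these lines. This is one possible sketch, not the definitive `ref.h`. It assumes that `ref_new` returns an object whose count already includes its creator (count 1), that the count lives in a hidden header just before the payload, and that `ref_dec` reports whether it freed the object. The helpers `ref_alloc`, `ref_hdr`, and `ref_count` are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hidden header placed in front of every payload. The union with
   max_align_t keeps the payload aligned for any type. */
typedef union {
    atomic_size_t count;
    max_align_t align_;
} ref_header;

#define REF(T) T *
#define ref_new(T) ((T *)ref_alloc(sizeof(T)))

static void *ref_alloc(size_t size) {
    ref_header *h = calloc(1, sizeof *h + size);
    if (h == NULL)
        return NULL;
    atomic_init(&h->count, 1);   /* assumption: the creator owns reference #1 */
    return h + 1;                /* hand out the payload, hide the header */
}

static ref_header *ref_hdr(void *p) {
    return (ref_header *)p - 1;
}

static size_t ref_count(void *p) {
    return atomic_load(&ref_hdr(p)->count);
}

static void ref_inc(void *p) {
    atomic_fetch_add_explicit(&ref_hdr(p)->count, 1, memory_order_relaxed);
}

/* Returns 1 if this call dropped the last reference and freed the object. */
static int ref_dec(void *p) {
    size_t old = atomic_fetch_sub_explicit(&ref_hdr(p)->count, 1,
                                           memory_order_acq_rel);
    assert(old > 0);             /* catches a decrement on a dead object */
    if (old == 1) {
        free(ref_hdr(p));
        return 1;
    }
    return 0;
}
```

The atomic operations are what make the counter safe to share between threads, and they are also where the "Atomic increments vs. Performance" trade-off from the challenge list shows up: every copy of a reference now costs an atomic read-modify-write.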

Project 2: The “Stop-The-World” Mark-and-Sweep

  • File: LEARN_MODERN_GC.md
  • Main Programming Language: C++
  • Alternative Programming Languages: Go, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Data Structures / Graph Traversal
  • Software or Tool: Custom VM/Heap allocator
  • Main Book: “The Garbage Collection Handbook” by Richard Jones

What you’ll build: A tiny virtual machine with a fixed heap. When the heap is full, the VM “stops,” scans all global and stack variables, marks everything reachable, and sweeps (frees) the rest.

Why it teaches GC: This is the core of Java’s early GCs. You’ll learn how to “crawl” an object graph and the necessity of the “Stop-The-World” pause.

Core challenges you’ll face:

  • Root Scanning → How do you find the “start” of the graph?
  • DFS vs BFS for marking → maps to Stack overflow risks during GC
  • Bitmaps vs. Object Headers → Where do you store the “Marked” bit?
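
As a starting point for these challenges, here is a minimal sketch of a fixed-heap allocator with a stop-the-world mark-and-sweep. It makes several simplifying assumptions that the real project would replace: objects are fixed-size with two reference fields, the roots are an explicit `roots` array standing in for stack and globals, the mark bit lives in the object header, and marking uses an explicit stack rather than recursion so a deep graph cannot overflow the C call stack. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_OBJECTS 8
#define MAX_ROOTS 4

struct obj {
    bool in_use;
    bool marked;                 /* mark bit stored in the object header */
    struct obj *left, *right;
};

static struct obj heap[HEAP_OBJECTS];
static struct obj *roots[MAX_ROOTS];   /* stand-in for stack and globals */
static size_t nroots = 0;

static void mark_from(struct obj *start) {
    struct obj *stack[HEAP_OBJECTS];   /* each object is pushed at most once */
    size_t top = 0;
    if (start == NULL || start->marked)
        return;
    start->marked = true;
    stack[top++] = start;
    while (top > 0) {
        struct obj *o = stack[--top];
        struct obj *kids[2] = { o->left, o->right };
        for (int i = 0; i < 2; i++) {
            if (kids[i] != NULL && !kids[i]->marked) {
                kids[i]->marked = true;
                assert(top < HEAP_OBJECTS);
                stack[top++] = kids[i];
            }
        }
    }
}

/* Mark from the roots, then sweep the whole heap. Returns objects freed. */
size_t gc_collect(void) {
    size_t swept = 0;
    for (size_t i = 0; i < nroots; i++)
        mark_from(roots[i]);
    for (size_t i = 0; i < HEAP_OBJECTS; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = false;
            heap[i].left = heap[i].right = NULL;
            swept++;
        }
        heap[i].marked = false;  /* reset for the next cycle */
    }
    return swept;
}

/* Allocates a slot; if the heap is full, runs one GC and retries once.
   Note: a new object is only safe once something rooted points to it. */
struct obj *gc_alloc(void) {
    for (int attempt = 0; attempt < 2; attempt++) {
        for (size_t i = 0; i < HEAP_OBJECTS; i++) {
            if (!heap[i].in_use) {
                heap[i].in_use = true;
                heap[i].left = heap[i].right = NULL;
                return &heap[i];
            }
        }
        if (attempt == 0)
            gc_collect();
    }
    return NULL;                 /* still full after GC: genuinely out of memory */
}
```

Storing the mark bit in the header keeps the sketch short; moving it to a side bitmap would be a natural next experiment for the third challenge.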

Real World Outcome: A console program that prints:

Allocation failed! Starting GC...
[GC] Roots: 4
[GC] Marked: 152 objects
[GC] Swept: 45 objects (12 KB freed)
Allocation resumed.