MODERN GARBAGE COLLECTORS DEEP DIVE
For decades, Garbage Collection (GC) was seen as a necessary evil. It provided memory safety and developer productivity but at the cost of unpredictable Stop-the-World (STW) pauses. These pauses could freeze a high-frequency trading system or a real-time multiplayer game for seconds, making managed languages like Java or Go unsuitable for low-latency tasks.
Sprint: Modern Garbage Collection - Deep Dive Mastery
Goal: Deeply understand the engineering marvels behind modern, high-performance garbage collectors like ZGC, Shenandoah, and G1. You will transition from basic âstop-the-worldâ concepts to mastering concurrent compaction, colored pointers, and the complex barrier systems that enable sub-millisecond pauses on multi-terabyte heaps. By the end, youâll be able to explain exactly how JVM threads interact with the GC at the assembly level and why certain architectures are chosen for specific workloads.
Why Modern Garbage Collection Matters
For decades, Garbage Collection (GC) was seen as a ânecessary evil.â It provided memory safety and developer productivity but at the cost of unpredictable âStop-the-Worldâ (STW) pauses. These pauses could freeze a high-frequency trading system or a real-time multiplayer game for seconds, making managed languages like Java or Go âunsuitableâ for low-latency tasks.
In the last decade, weâve entered the era of Concurrent Compaction. Modern GCs like ZGC and Shenandoah can mark, move, and update references to objects while the application is still running.
Consider the scale:
- ZGC can manage a 16TB heap with pause times consistently under 1ms.
- The JVM handles trillions of allocations per second in massive distributed systems.
- CVEs and Security: Manual memory management errors (use-after-free, double-free) account for ~70% of security vulnerabilities. GC eliminates these classes of bugs entirely.
Understanding GC is not just about âtuning flagsâ; itâs about understanding how the hardware, the operating system, and the language runtime cooperate to simulate âinfinite memory.â
The Memory Hierarchy: Where GC Lives
Garbage collection is fundamentally a cache-locality and memory-bandwidth problem. To understand why modern GCs are designed the way they are, you must understand where the data actually lives.
âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
â CPU â
â âââââââââââ â
â â Registersâ â Fastest (1 cycle), smallest (64 bytes) â
â ââââââŹâââââ â
â â â
â ââââââ´âââââ â
â â L1 Cacheâ â Very fast (4 cycles), small (64 KB) â
â ââââââŹâââââ â
â â â
â ââââââ´âââââ â
â â L2 Cacheâ â Fast (10 cycles), medium (256 KB) â
â ââââââŹâââââ â
â â â
â ââââââ´âââââ â
â â L3 Cacheâ â Moderate (40 cycles), larger (8 MB) â
â ââââââŹâââââ â
âââââââââźââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
â
âââââââââ´ââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
â RAM â Slow (100+ cycles), large (16+ GB) â
âââââââââ´ââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
When a GC âscansâ the heap, itâs pulling data from RAM into the CPU caches. If the GC isnât careful, it can âpolluteâ the cache, kicking out your applicationâs useful data and causing a massive performance drop. This is why âConcurrentâ collectors are so revolutionaryâthey attempt to do this work while the CPU is otherwise idle or on spare cores.
Core Concept Analysis
1. The Search for Life: Tracing vs. Reference Counting
There are two primary ways to find âgarbage.â
Reference Counting keeps a tally. When the count hits zero, the object is dead. Pros: Immediate reclamation. Cons: Cannot handle circular references and has high overhead on every pointer update.
Tracing starts from âRootsâ (Stack, Statics) and follows every pointer to find live objects. Anything not reached is garbage.
ROOTS (Stack, Globals)
â
âź
[Object A] ââââş [Object B]
â â
âź âź
[Object C] [Object D] (Unreachable)
â˛
â
[Object E] (Unreachable)
RESULT: D and E are collected, even if they point to each other.

2. The Tri-Color Marking Algorithm
To mark objects concurrently (while the app is running), we use a state machine.
- â WHITE: Unvisited. Potential garbage.
- â GREY: Visited, but its children havenât been scanned yet (the âFrontierâ).
- â BLACK: Visited, and all reachable children are also visited. Live.
The Concurrency Nightmare: If an application thread (the Mutator) moves a WHITE object from a GREY parent to a BLACK parent, the GC will never see that object again. It will stay WHITE and be deletedâeven though itâs live!
Step 1: GC is at GREY Node A. Node C is WHITE.
[BLACK A] --- [GREY B] --- [WHITE C]
Step 2: Mutator moves C to point from A.
[BLACK A] --+ [GREY B] [WHITE C]
| ^
+----------------+
Step 3: GC finishes B, doesn't see C.
C is collected. SYSTEM CRASH.

3. Barriers: The Traffic Cops
To prevent the âLost Objectâ problem, we use Barriers. These are tiny snippets of code injected by the JIT compiler into your application.
- Write Barrier (G1/Go): âIâm changing a reference! Let me tell the GC so it can re-scan this area.â
- Load Barrier (ZGC): âIâm reading a pointer! Let me check if the GC moved the object or if it needs marking.â
// Original C# / Java Code
myObj.field = otherObj;
// Compiled Assembly with Write Barrier (Simplified)
MOV [RAX+8], RBX ; Perform the write
CALL gc_write_barrier ; Tell the GC: "Hey, something changed!"
4. Compaction and The âMovingâ Problem
Marking is only half the battle. If you just delete objects, your memory becomes âSwiss Cheeseâ (Fragmentation). You eventually run out of space for large objects even if you have enough total memory.
Compaction moves objects together to create big blocks of free space.
[Obj A][Free][Obj B][Free][Obj C] <-- Fragmented
â â â
âź âź âź
[Obj A][Obj B][Obj C][ Free Space ] <-- Compacted

The Challenge: When you move Obj B, every pointer in the entire system that pointed to 0x1234 must now point to 0x5678.
5. Colored Pointers (ZGCâs Innovation)
Traditional GCs store metadata (like âis this object marked?â) in the Object Header. ZGC stores it in the Pointer itself.
64-bit Pointer Layout:
| 18 bits | 1 bit | 1 bit | 1 bit | 1 bit | 42 bits |
| Unused | Final | Remap | Mark1 | Mark0 | Address |

By looking at the pointer, the CPU can tell instantly via a Load Barrier if the address is stale without ever touching the actual memory of the object. This is what enables 1ms pauses on 16TB heaps.
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Tracing Basics | Finding life from roots. The heap is a directed graph. |
| Tri-Color Marking | The formal logic for concurrent discovery. |
| Barriers | How the compiler helps the GC. The bridge between the app and the collector. |
| Compaction | Solving fragmentation by moving objects. The hardest part of GC. |
| Generational GC | The âWeak Generational Hypothesisâ: most objects die young. |
| Region-based Heap | Breaking the heap into manageable chunks (G1, ZGC) instead of two giant pools. |
| Concurrency Invariants | SATB (Snapshot-at-the-Beginning) vs. Incremental Update. |
Deep Dive Reading by Concept
This section maps each concept to specific book chapters. Read these before or alongside the projects.
Foundations & Algorithms
| Concept | Book & Chapter |
|---|---|
| Basic Algorithms (Mark-Sweep, Copying) | âThe Garbage Collection Handbookâ (Jones et al.) â Ch. 2, 3, 4 |
| Reference Counting | âThe Garbage Collection Handbookâ â Ch. 5 |
| Allocation Strategies | âThe Garbage Collection Handbookâ â Ch. 7 |
Modern Collector Architectures
| Concept | Book & Chapter |
|---|---|
| Generational GC | âThe Garbage Collection Handbookâ â Ch. 9 |
| G1 (Garbage First) Internals | âJava Performanceâ (Scott Oaks) â Ch. 6 & âThe Garbage Collection Handbookâ â Ch. 16 |
| ZGC & Shenandoah (Concurrent) | âThe Garbage Collection Handbookâ â Ch. 17: âConcurrent and Real-time collectionâ |
| Barriers in Depth | âThe Garbage Collection Handbookâ â Ch. 11: âBarriersâ |
Runtime & Hardware Interaction
| Concept | Book & Chapter |
|---|---|
| Stack Scanning & Safepoints | âThe Garbage Collection Handbookâ â Ch. 11.2 |
| Hardware Cache Effects | âThe Garbage Collection Handbookâ â Ch. 12 |
| JIT Compilation & Barriers | âComputer Systems: A Programmerâs Perspectiveâ â Ch. 3 (Understanding the generated assembly) |
Project List
Projects are ordered from fundamental understanding to advanced implementations.
Project 1: The âSimpletonâ Mark-Sweep (Tracing Basics)
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Difficulty: Level 3: Advanced
- Knowledge Area: Memory Management
- Main Book: âThe Garbage Collection Handbookâ Ch. 2
What youâll build: A minimal runtime where you allocate objects from a pre-allocated âHeapâ array. When memory runs out, your gc() function triggers, pauses everything, marks reachable objects, and sweeps the rest.
Real World Outcome
You will build a custom memory management system from scratch. When your application runs out of memory, instead of crashing with a âSegmentation Faultâ or âOut of Memoryâ error, you will see your GC wake up, scan the stack, identify live objects, and reclaim the âholesâ in the heap.
Execution Trace:
$ gcc simple_gc.c -o simple_gc && ./simple_gc
[INIT] Virtual Heap initialized: 1024 bytes at 0x7fff5fbff000
[ALLOC] ObjA (64 bytes) -> Address: 0x7fff5fbff010
[ALLOC] ObjB (128 bytes) -> Address: 0x7fff5fbff050
[ALLOC] ObjC (512 bytes) -> Address: 0x7fff5fbff0d0
[ALLOC] ObjD (512 bytes) -> ERROR: Out of Memory!
[GC] --- PHASE: MARK ---
[GC] Scanning Root: stack_var_a (points to 0x7fff5fbff010) -> Marked ObjA
[GC] Scanning Root: stack_var_b (points to 0x7fff5fbff0d0) -> Marked ObjC
[GC] --- PHASE: SWEEP ---
[GC] Reclaiming unreachable object at 0x7fff5fbff050 (128 bytes)
[GC] COMPLETED. Reclaimed 128 bytes.
[ALLOC] Retrying ObjD (512 bytes) -> STILL ERROR: Not enough contiguous space.
# You've just discovered FRAGMENTATION!
The Core Question Youâre Answering
âHow do we identify âtrashâ without knowing what trash is?â
Before you write any code, sit with this question. Most developers think GC searches for dead objects. It doesnât. Dead objects are âinvisibleâ to the GC. A GC searches for life. Anything it doesnât find is, by definition, trash. This project teaches you the reachability algorithm that powers almost every modern language.
Concepts You Must Understand First
Stop and research these before coding:
- The Reachability Graph
- What is a âRootâ? (Global variables, variables on the current stack).
- Why is the heap a Directed Graph?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 2.1
- Finding the Stack Start/End
- How do you know where the C stack begins and where it currently is?
- Why do we need to scan the stack to find pointers to the heap?
- Book Reference: âComputer Systems: A Programmerâs Perspectiveâ Ch. 3.7
- Object Headers (Metadata)
- Why does every object need a hidden âprefixâ (header) to store its size and its âMark Bitâ?
- How do you jump from a pointer back to its header in memory?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 11.2
- Padding & Alignment
- Why must objects be aligned (usually to 8 bytes) for the CPU to read them efficiently?
- Book Reference: âWrite Great Code, Volume 1â Ch. 3
Questions to Guide Your Design
- Allocating Memory
- How will you manage a big block of memory (
char heap[1024])? - Will you use a âFree Listâ to keep track of available holes?
- How will you manage a big block of memory (
- Identifying Pointers
- If you see a random 64-bit number on the stack, how do you know if itâs a pointer to your heap or just a large integer? (This is why GCs are often âConservativeâ or âPreciseâ).
- The Sweep Phase
- Once youâve marked all live objects, how do you âsweepâ the dead ones?
- Do you merge adjacent free blocks (Coalescing)?
Thinking Exercise
Trace a Reference Cycle
Before coding, trace this on paper:
struct Node {
struct Node* next;
int data;
};
void main() {
struct Node* a = gc_alloc(sizeof(struct Node)); // Root A
struct Node* b = gc_alloc(sizeof(struct Node)); // Not a root
struct Node* c = gc_alloc(sizeof(struct Node)); // Not a root
a->next = b;
b->next = c;
c->next = b; // CYCLE! B points to C, C points to B
a = NULL; // Root A is now gone
gc();
}
Questions while tracing:
- Even though B and C point to each other, are they reachable from the Roots?
- Why does simple Reference Counting fail here while Mark-Sweep succeeds?
- Draw the graph before and after
a = NULL.
The Interview Questions Theyâll Ask
- âWhat is the difference between Mark-Sweep and Reference Counting?â
- âWhy is a Mark-Sweep collector called âStop-the-Worldâ?â
- âWhat is a âRootâ in GC terms, and where do they typically live?â
- âHow do you handle circular references in Mark-Sweep?â
- âWhat is fragmentation, and how does the Sweep phase contribute to it?â
Hints in Layers
Hint 1: The Virtual Heap
Donât use malloc. Create a static char heap[1024*1024] and a char* next_free_ptr. This is called a âBump Allocator.â
Hint 2: The Header struct Define a header that prefixes every object:
typedef struct {
size_t size;
int marked; // 0 = dead, 1 = live
} Header;
Hint 3: Marking from the Stack
To simulate roots, create a global array of pointers: void* roots[100]. In a real GC, youâd scan the assembly stack, but for Project 1, manually adding/removing pointers from roots is a great start.
Hint 4: Recursive Marking
Your mark(void* ptr) function should:
- Find the header (ptr - sizeof(Header)).
- If already marked, return (prevent infinite loops in cycles!).
- Mark it.
- Scan the objectâs body for other pointers and call
mark()on them.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Mark-Sweep Algorithm | âThe Garbage Collection Handbookâ | Ch. 2 |
| Heap Allocation Strategies | âThe Garbage Collection Handbookâ | Ch. 7 |
| Finding Roots on the Stack | âThe Garbage Collection Handbookâ | Ch. 11.2 |
| Stack Layout in C | âComputer Systems: A Programmerâs Perspectiveâ | Ch. 3.7 |
Project 2: The âNomadâ Copying GC (Moving Objects)
- Main Programming Language: C
- Alternative Programming Languages: C++
- Difficulty: Level 3: Advanced
- Knowledge Area: Pointers & Relocation
- Main Book: âThe Garbage Collection Handbookâ Ch. 4
What youâll build: A âSemi-Spaceâ collector. You split your heap into two halves: FromSpace and ToSpace. When FromSpace is full, you copy all live objects into ToSpace and update every single pointer in your application to the new addresses.
Real World Outcome
You will create a memory manager that completely eliminates external fragmentation. Every time the GC runs, you will witness your objects being physically moved from one memory region to another and packed tightly. Youâll see that after collection, you have one giant, contiguous block of free space, allowing for incredibly fast âBump Allocation.â
Execution Trace:
$ ./copying_gc
[ALLOC] ObjA (100 bytes) at 0x1000 (FromSpace)
[ALLOC] ObjB (100 bytes) at 0x1064 (FromSpace)
[DELETE] ObjA reference removed.
[ALLOC] ObjC (900 bytes) -> FromSpace Full!
[GC] --- PHASE: EVACUATE ---
[GC] ObjB is LIVE. Moving 0x1064 -> 0x2000 (ToSpace)
[GC] Writing Forwarding Address 0x2000 into old header at 0x1064
[GC] --- PHASE: PATCH ---
[GC] Root 'ptr_b' was 0x1064, now updated to 0x2000
[GC] --- PHASE: FLIP ---
[GC] ToSpace is now the active FromSpace.
[INFO] Recovered 100 bytes of fragmented space.
[ALLOC] ObjC (900 bytes) -> SUCCESS at 0x2064
The Core Question Youâre Answering
âHow do we move data while keeping the application from crashing?â
Moving objects is the most dangerous thing a GC can do. If you move an object but forget to update just one pointer that points to it, your program will access garbage memory and crash with a Segfault. This project teaches you the logic of Forwarding Pointers and the Cheney Scan algorithm.
Concepts You Must Understand First
- Semi-Space Layout
- Why do we only use 50% of the memory at any given time?
- What are the âFromSpaceâ and âToSpaceâ?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 4.1
- Forwarding Addresses
- When you move an object, how do you leave a âForwarding Addressâ in the old location so other pointers can find it?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 4.2
- Breadth-First vs. Depth-First Scanning
- What is the Cheney Scan, and why is it more memory-efficient than recursive marking?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 4.3
Questions to Guide Your Design
- The Move Logic
- When you copy an object to
ToSpace, what happens to the internal pointers it contains? (They still point toFromSpace!). - How do you âfixâ those pointers after the move?
- When you copy an object to
- Object Identification
- How can you tell if an object in
FromSpacehas already been moved? (Check for a âForwarding Tagâ in the header).
- How can you tell if an object in
- Allocation Performance
- Why is allocation in a Copying GC much faster than in a Mark-Sweep GC? (Hint: No free-list searching).
Thinking Exercise
The Relocation Race
Imagine you have a chain: Root -> Obj A -> Obj B.
- You copy
Obj Ato the new space. Rootnow points toNew A.New Astill points to theOld Bin the old space!- What happens if you delete the old space before you find
Old B?
Trace this process on paper:
- Draw the two spaces.
- Show the state of pointers after
Amoves but beforeBmoves. - Identify the âScan Pointerâ and âFree Pointerâ in Cheneyâs algorithm.
The Interview Questions Theyâll Ask
- âWhy does a Copying GC require twice as much memory as a Mark-Sweep GC?â
- âWhat is a âForwarding Pointerâ?â
- âExplain the Cheney Scan algorithm. Is it BFS or DFS?â
- âWhat is the primary advantage of a Copying GC over Mark-Sweep?â
- âHow does a Copying GC improve CPU cache locality?â
Hints in Layers
Hint 1: Two Heaps
Define two global arrays: char space1[1024] and char space2[1024]. Use a pointer char* current_space to track which one is active.
Hint 2: The Forwarding Header Add a field to your header:
typedef struct {
size_t size;
void* forwarding_address; // NULL if not moved
} Header;
Hint 3: The Copy Function
The copy(void* old_ptr) function should:
- Check if
header->forwarding_addressis set. - If yes, return it.
- If no,
memcpythe object toToSpace, setforwarding_address, and return the new address.
Hint 4: The Cheney Scan
Maintain two pointers in ToSpace: scan and free.
freeis where you put new copies.scanis where you are currently looking for pointers to update.- When
scan == free, the GC is finished.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Copying GC Fundamentals | âThe Garbage Collection Handbookâ | Ch. 4 |
| Allocation in Copying GCs | âThe Garbage Collection Handbookâ | Ch. 7.3 |
| The Cheney Algorithm | âThe Garbage Collection Handbookâ | Ch. 4.3 |
Project 3: The Generational Gap (Optimizing for Youth)
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Difficulty: Level 4: Expert
- Knowledge Area: Generational Hypothesis
- Main Book: âThe Garbage Collection Handbookâ Ch. 9
What youâll build: Enhance Project 1 by splitting the heap into a âYoung Genâ and an âOld Gen.â Youâll implement a âpromotionâ policy: objects that survive 3 minor collections get moved (promoted) to the Old Gen.
Real World Outcome
You will implement the same optimization that makes the JVM and V8 (Chrome) so fast. You will observe how âMinor GCsâ can reclaim huge amounts of memory in microseconds by ignoring the vast majority of the heap (the Old Generation). Youâll also implement a Write Barrier to handle the âOld-to-Youngâ pointer problem.
Execution Trace:
$ ./gen_gc
[ALLOC] 1000 short-lived objects in YoungGen (Nursery).
[ALLOC] 1 long-lived Cache object.
[GC] --- MINOR GC (YoungGen Only) ---
[GC] Scanned 1.2MB in 0.05ms.
[GC] 999 objects collected. 1 object promoted (Age 1).
[INFO] OldGen usage: 0.1%. No Major GC needed.
# Result: Your application continues with almost zero interruption.
The Core Question Youâre Answering
âWhy scan the whole desert if the water is only in the oasis?â
The Weak Generational Hypothesis states that âMost objects die young.â If you focus your energy on the âNurseryâ (where new objects are born), you can reclaim most memory very quickly without looking at long-lived objects (like global configuration or caches).
Concepts You Must Understand First
- The Weak Generational Hypothesis
- Why do objects usually die quickly?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 9.1
- Remembered Sets & Card Tables
- The âOld-to-Youngâ problem: If an object in the Old Gen points to an object in the Young Gen, how do you find it during a Minor GC without scanning the entire Old Gen?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 9.4
- Promotion (Tenuring) Logic
- How do you track the âAgeâ of an object? (Usually a few bits in the header).
- Book Reference: âThe Garbage Collection Handbookâ Ch. 9.2
Questions to Guide Your Design
- Heap Partitioning
- How much of your total memory should be Young Gen vs. Old Gen? (Common ratio is 1:2 or 1:3).
- Promotion Threshold
- How many survival cycles are enough to prove an object is âOldâ?
- The Write Barrier
- Every time an application thread writes a pointer, you must check: âDid an Old object just start pointing to a Young object?â If yes, you must record this in your Remembered Set.
Thinking Exercise
The Nursery Trap
Imagine your web server handles a request. It creates 1,000 small temporary objects (strings, JSON nodes).
- The request finishes. All 1,000 objects are now dead.
- A Minor GC runs.
- How many objects does the GC have to scan if it only looks at the Young Gen?
- How many would it have to scan if there was no Young Gen and it had to scan the whole 1GB heap?
Questions:
- Does the GC work harder when there is a lot of garbage or when there is a lot of live data?
The Interview Questions Theyâll Ask
- âWhat is the Weak Generational Hypothesis?â
- âWhat is a âMinor GCâ vs. a âMajor GCâ (or Full GC)?â
- âWhat is a âRemembered Setâ and why is it necessary?â
- âExplain âCard Tableâ marking. How does it speed up Minor GCs?â
- âWhy canât we just use a Copying GC for everything?â (Hint: Memory overhead of 2x for large heaps).
Hints in Layers
Hint 1: Two Heaps (Again) Use your Copying GC (Project 2) for the Young Gen (itâs great for short-lived objects) and your Mark-Sweep (Project 1) for the Old Gen.
Hint 2: The Age Counter Update your header to include 2 bits for age:
typedef struct {
unsigned int age : 2;
unsigned int marked : 1;
size_t size;
} Header;
Hint 3: The Promotion Trigger
When the Copying GC is about to move an object to ToSpace, check header->age. If age == 3, call move_to_old_gen(ptr) instead of moving it to ToSpace.
Hint 4: The Card Table
Create a byte array char card_table[HEAP_SIZE / 512].
Every time a pointer is written: card_table[address / 512] = 1.
During Minor GC, only scan the 512-byte blocks of the Old Gen that have a âdirtyâ card.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Generational GC Theory | âThe Garbage Collection Handbookâ | Ch. 9 |
| Card Tables & Barriers | âThe Garbage Collection Handbookâ | Ch. 9.4 |
| Tenuring Strategies | âThe Garbage Collection Handbookâ | Ch. 9.2 |
| G1âs Generational approach | âJava Performanceâ | Ch. 6 |
Project 4: Tri-Color Visualizer (Concurrent Marking)
- Main Programming Language: JavaScript (React/SVG or D3.js)
- Alternative Programming Languages: Python (Pygame), Rust (Wasm + Leptos/Yew)
- Difficulty: Level 2: Intermediate
- Knowledge Area: Algorithm Visualization
- Main Book: âThe Garbage Collection Handbookâ Ch. 13
What youâll build: An interactive web app where users can create âObject Nodesâ and âReference Edges.â You can then step through the Tri-Color Marking algorithm (White, Grey, Black) and simulate a âMutatorâ (application thread) changing pointers while the GC is running.
Real World Outcome
You will create a tool that makes the abstract concept of concurrent marking visible and intuitive. You will be able to play with the algorithm, intentionally trying to âbreakâ the GC by moving pointers during the marking phase, and then seeing how a Write Barrier catches and fixes your attempt.
Visual Outcome:
- Canvas Interface: A drag-and-drop graph editor.
- Color Coding: Nodes flash different colors as the GC âFrontierâ (Grey) moves through the graph.
- Violation Alert: If you successfully move a White node to a Black parent while the GC isnât looking, the screen flashes Red with an âOBJECT LEAK DETECTEDâ warning.
- Barrier Toggle: A switch to enable/disable the âIncremental Updateâ barrier.
Interaction Flow:
- The Setup: Create Node A (Root), B, and C. A points to B. B points to C.
- The Run: Start the GC. A turns Black. B turns Grey. C is White.
- The Sabotage: The user (Mutator) moves the pointer: A now points to C, and B points to NULL.
- The Result: The GC finishes B (now points to nothing). Since A is already Black, itâs not scanned again.
- The Bug: C stays White and the visualizer flashes âERROR: LIVE OBJECT COLLECTED!â
- The Fix: Enable âWrite Barrierâ mode. Watch C turn Grey automatically when the pointer is moved.
The Core Question Youâre Answering
âHow do we mark objects while they are still being moved around by the application?â
This is the central challenge of Concurrent Marking. If the world doesnât stop, the graph you are scanning is constantly changing. You are essentially painting a moving car. This project teaches you the Tri-Color Invariants that keep the car from crashing.
Concepts You Must Understand First
- Tri-Color Marking States
- White: Unvisited.
- Grey: Frontier (visited, but children are not).
- Black: Safe (visited and children are visited).
- Book Reference: âThe Garbage Collection Handbookâ Ch. 13.1
- The Lost Object Problem
- Under what conditions can a live object be missed? (Pointer from Black to White + No path from Grey to White).
- Book Reference: âThe Garbage Collection Handbookâ Ch. 13.2
- Write Barrier Strategies
- Incremental Update (G1): If you store a pointer, turn the destination Grey.
- SATB (Snapshot-At-The-Beginning): If you overwrite a pointer, turn the old value Grey.
- Book Reference: âThe Garbage Collection Handbookâ Ch. 13.3
Questions to Guide Your Design
- The State Machine
- How will you manage the transition of nodes between colors?
- Mutator Interference
- How will you allow the user to modify the graph while the âGC processâ is stepping through?
- Invariant Checking
- Can you write a function that detects a violation of the âNo Black-to-White pointersâ rule in real-time?
Thinking Exercise
The Stealth Move
Draw this graph: A (Black) -> B (Grey) -> C (White).
- If you move the link from
B -> CtoA -> C, and then deleteB -> C. - At this exact moment, is C reachable from the roots? (Yes, through A).
- Does the GC know about the path through A? (No, A is already finished).
- What must the âWrite Barrierâ do to save C?
The Interview Questions Theyâll Ask
- âWhat are the three colors in Tri-Color marking, and what does each represent?â
- âExplain the âWhite-to-Blackâ violation.â
- âWhat is the difference between an âIncremental Updateâ barrier and a âSnapshot-at-the-Beginningâ barrier?â
- âWhy is concurrent marking harder than Stop-the-World marking?â
Hints in Layers
Hint 1: Graph Representation
Use a simple adjacency list or an array of objects: { id: 1, color: 'white', refs: [2, 3] }.
Hint 2: The Step-by-Step Loop Implement the GC as a loop that:
- Picks a âGreyâ node.
- Turns all its âWhiteâ children to âGreyâ.
- Turns the node itself to âBlackâ.
Hint 3: Simulating the Mutator Add a button âBreak Link and Reconnect.â When clicked, it should perform the âStealth Moveâ described in the thinking exercise.
Hint 4: Visualizing the Barrier
When the barrier is ON, any operation that changes node.refs should immediately check if itâs creating a Black-to-White link and change the destination node to Grey.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Tri-Color Marking Logic | âThe Garbage Collection Handbookâ | Ch. 13.1 |
| Barrier Mechanics | âThe Garbage Collection Handbookâ | Ch. 13.3 |
| G1 SATB Logic | âJava Performanceâ | Ch. 6 |
Project 5: The Barrier Benchmarker (JIT & Assembly)
- Main Programming Language: C++ / Inline Assembly
- Alternative Programming Languages: Zig, Rust
- Difficulty: Level 4: Expert
- Knowledge Area: Systems Performance
- Main Book: âThe Garbage Collection Handbookâ Ch. 15
What youâll build: A performance suite that measures the nanosecond overhead of different GC barriers. Youâll use inline assembly to simulate exactly what a JIT compiler (like HotSpot) does.
Real World Outcome
You will produce a benchmark report that quantifies the âGC Tax.â Youâll see exactly how many CPU cycles are wasted on memory safety, which explains why C++ can sometimes be faster than Java/Go even if the GC is âperfect.â
Example Benchmark Output:
$ ./barrier_bench --iterations=100000000
Testing: Pointer Assignment (obj.field = val)
--------------------------------------------------
1. RAW (No GC): 0.42 ns/op
2. G1 Barrier (Write): 0.85 ns/op (+102% overhead)
3. ZGC Barrier (Load): 1.12 ns/op (+166% overhead)
[ANALYSIS]
- The G1 Write Barrier adds ~3 x86 instructions (CMP, JNE, STR).
- The ZGC Load Barrier adds ~5 x86 instructions including a bitwise AND.
- Conclusion: High-frequency trading apps should avoid pointer-heavy structures in managed languages.
The Core Question Youâre Answering
âWhat is the true price of low-latency memory management?â
You pay for sub-millisecond pauses by making every pointer operation in your program slightly slower. This project quantifies that price at the assembly level. Youâll learn why ZGC uses a âfast pathâ and a âslow path.â
Concepts You Must Understand First
- CPU Pipelines & Branch Prediction
- Why is a conditional branch (
if (gc_is_active)) nearly free if itâs usually false?
- Why is a conditional branch (
- Inline Assembly
- How do you write
asmblocks in C++/Rust to ensure the compiler doesnât optimize away your barrier?
- How do you write
- Load vs. Write Barriers
- Why are Load Barriers (ZGC) typically more expensive than Write Barriers (G1)? (Hint: Apps usually read pointers 10x more than they write them).
Questions to Guide Your Design
- The Benchmark Loop
- How do you ensure you are measuring the barrier and not just the overhead of the loop itself?
- Simulating the GC State
- How will you toggle the âGC is activeâ flag so you can measure both the âFast Pathâ (GC idle) and âSlow Pathâ (GC marking/relocating)?
- Instruction Counting
- Can you use a tool like
objdumpto verify exactly what instructions your barrier is adding?
- Can you use a tool like
Thinking Exercise
The Hidden Tax
Consider this simple loop:
for (int i = 0; i < 1000000; i++) {
node = node->next; // READ pointer
node->value = i; // WRITE value
}
If you are using ZGC, there is a Load Barrier on node->next.
If you are using G1, there is a Write Barrier on node->value (if value was a pointer).
- Which one is executed in every iteration?
- If the barrier takes 2 nanoseconds, how much longer does the loop take in total?
The Interview Questions Theyâll Ask
- âWhat is a âFast Pathâ in a GC barrier?â
- âHow does the CPUâs branch predictor help minimize GC overhead?â
- âWhy did ZGC choose a Load Barrier instead of a Write Barrier?â
- âWhat is the performance impact of âDirtying a Cardâ in G1?â
Hints in Layers
Hint 1: Simple Barrier Implementation A G1 Write Barrier looks something like:
void write_barrier(void** slot, void* new_val) {
*slot = new_val;
if (global_gc_phase == MARKING) {
record_dirty_card(slot);
}
}
Hint 2: Using __builtin_expect
Use __builtin_expect (in GCC/Clang) to tell the compiler that the GC is usually NOT active. This helps the branch predictor.
Hint 3: High-Resolution Timers
Use std::chrono::high_resolution_clock or the RDTSC instruction (x86) for nanosecond precision.
Hint 4: Assembly Verification
Compile with -S and look at the assembly. Ensure your barrier hasnât been âhoistedâ out of the loop by the optimizer.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Barrier Implementation | âThe Garbage Collection Handbookâ | Ch. 11.4 |
| Performance of Barriers | âThe Garbage Collection Handbookâ | Ch. 15 |
| Assembly & CPU Pipelines | âComputer Systems: A Programmerâs Perspectiveâ | Ch. 3 & 5 |
| JIT Compiler Optimization | âJava Performanceâ | Ch. 4 |
Project 6: G1 Region Simulator (Garbage-First Logic)
- Main Programming Language: Java
- Alternative Programming Languages: Python, Go
- Difficulty: Level 3: Advanced
- Knowledge Area: Heuristics & Scheduling
- Main Book: âThe Garbage Collection Handbookâ Ch. 16
What youâll build: A simulation of G1âs region-based heap. Instead of one giant heap, you have 100 small âRegions.â Youâll implement the logic that selects which regions to collect based on their Garbage Density and the userâs Pause Time Goal.
Real World Outcome
You will understand why G1 is called âGarbage First.â You will build a simulator that demonstrates the âmixed collectionâ strategy. Youâll see how the algorithm balances reclaiming space with meeting a strict pause-time target by dynamically choosing which âRegionsâ to clean.
Simulator Output:
$ java G1Simulator --pause-goal=10ms --heap-size=1GB
[STATS] Heap Usage: 75%
[ANALYSIS] Calculating 'Garbage Density' per Region...
- Region #12: 95% Garbage (Score: 0.95)
- Region #45: 88% Garbage (Score: 0.88)
- Region #08: 12% Garbage (Score: 0.12)
- Region #99: 100% Garbage (Score: 1.00)
[PLANNING] Build Collection Set (CSet):
1. Adding 3 Young Regions (Mandatory). Est Pause: 4.5ms
2. Adding Region #99. Est Pause: 1.2ms (Cumulative: 5.7ms)
3. Adding Region #12. Est Pause: 1.8ms (Cumulative: 7.5ms)
4. Adding Region #45. Est Pause: 2.1ms (Cumulative: 9.6ms)
5. Pause Goal (10ms) reached. Stopping CSet construction.
[GC] PHASE: EVACUATION
[GC] Done. Pause Time: 9.82ms. Reclaimed 142MB.
# Result: You just successfully traded completeness for latency!
The Core Question Youâre Answering
âHow do we collect enough memory to stay alive without stopping the world for too long?â
GC is an optimization problem. G1âs breakthrough was realizing that you donât have to collect the whole Old Generation at once. You can pick and choose the regions that are mostly garbage and leave the others for later. This project teaches you the Garbage-First Heuristic.
Concepts You Must Understand First
- Region-based Heap Layout
- What are Young, Old, and Humongous regions?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 16.1
- The CSet (Collection Set)
- How does G1 build the list of regions to be collected in the next cycle?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 16.3
- Pause Time Prediction
- How does G1 use historical data to guess how long it will take to evacuate a region?
- Book Reference: âJava Performanceâ Ch. 6
Questions to Guide Your Design
- Measuring Density
- How will you track how much âLive Dataâ is in each region without doing a full GC? (Hint: Remembered Sets).
- Scheduling
- If your pause goal is 5ms, and scanning the Young regions takes 4ms, how many Old regions can you afford to add to the CSet?
- Mixed Collections
- When should G1 start collecting Old regions? (The
InitiatingHeapOccupancyPercentlogic).
- When should G1 start collecting Old regions? (The
Thinking Exercise
The High-Value Target
Imagine you have two regions:
- Region A: 1MB of data, 90% is garbage. (100KB live).
- Region B: 1MB of data, 10% is garbage. (900KB live). If the cost of collecting 100KB is 1ms:
- How long does Region A take to collect?
- How long does Region B take to collect?
- Which one gives you more free space for the same âpause time costâ?
The Interview Questions Theyâll Ask
- âWhy is G1 called âGarbage-Firstâ?â
- âWhat is a âMixed Collectionâ in G1?â
- âHow does G1 handle objects that are larger than a single region? (Humongous Objects).â
- âWhat happens if G1 cannot reclaim memory fast enough to meet allocation demand? (Full GC).â
Hints in Layers
Hint 1: The Region Object
Define a class Region that tracks totalSize, liveDataSize, and type (EDEN, SURVIVOR, OLD).
Hint 2: The Simulator Loop Every âtickâ of your simulator, allocate some objects into EDEN regions. When all EDEN regions are full, trigger a âYoung GC.â
Hint 3: Calculating Value
The âValueâ of a region is (totalSize - liveDataSize) / estimatedCollectionTime. Sort your regions by this value and pick from the top.
Hint 4: Historical Averages Keep an array of the last 5 collection times. Use the average to predict the next one. This is exactly how the real G1 works.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| G1 Architecture | âThe Garbage Collection Handbookâ | Ch. 16 |
| G1 Tuning & Heuristics | âJava Performanceâ | Ch. 6 |
| Pause Time Goals | âThe Garbage Collection Handbookâ | Ch. 16.2 |
Project 7: Colored Pointer Sandbox (ZGC Style)
- Main Programming Language: C
- Alternative Programming Languages: C++
- Difficulty: Level 5: Master
- Knowledge Area: Virtual Memory & Bit Manipulation
- Main Book: OpenJDK ZGC Wiki (Main - ZGC)
What youâll build: A memory manager that uses Colored Pointers. You will use the upper bits of a 64-bit address (Marked0, Marked1, Remapped) to store object state. Youâll implement a load_barrier() that checks these bits and âhealsâ the pointer if itâs in the wrong state.
Real World Outcome
You will implement the âSecret Sauceâ of ZGC. Youâll have a system where you can mark 1 million objects as âstaleâ simply by changing a single global variable, and the application will âself-healâ those pointers one-by-one as it accesses them. You will use mmap to create the âMulti-Mappingâ trick required for colored pointers to work.
Execution Trace:
$ ./zgc_sandbox
[INIT] Heap mapped at 3 virtual locations:
- Marked0: 0x000100000000
- Marked1: 0x000200000000
- Remap: 0x000400000000
[GC] Transitioning to MARKED0 phase.
[APP] Reading pointer P1 (Value: 0x0001000000005000)
[BARRIER] Pointer color matches global mask. Fast path (0.2ns).
[GC] Transitioning to REMAPPED phase (Compaction started).
[APP] Reading pointer P1 (Value: 0x0001000000005000)
[BARRIER] STALE DETECTED! (Expected REMAPPED bit).
[BARRIER] Entering SLOW PATH...
[BARRIER] Relocation check for 0x5000: Object moved to 0x9000.
[BARRIER] HEALING pointer...
[APP] P1 updated to 0x0004000000009000. Read successful.
# Result: The app never pauses for compaction; it fixes its own memory as it goes!
The Core Question Youâre Answering
âCan we know an objectâs state without ever looking at the object?â
In traditional GCs, to see if an object is âMarked,â you have to load the objectâs header from RAM into the CPU cache. If you have millions of objects, this causes massive cache misses. ZGC stores the state in the Pointer. This project teaches you how to use the âMetadata Bitsâ of a 64-bit address.
Concepts You Must Understand First
- Virtual Address Space
- On a 64-bit system, we only use ~48 bits for actual addresses. What can we do with the other 16 bits?
- Multi-Mapping (The trick)
- How can we make
0x0001....5000and0x0004....5000point to the same physical memory? (Usingmmapto map the same file multiple times). - Book Reference: âThe Linux Programming Interfaceâ Ch. 49 (Memory Mapping)
- How can we make
- Load Barriers
- Why must the barrier check the pointer before the application uses it?
Questions to Guide Your Design
- The Bitmask
- Which bits will you use for
Marked0,Marked1, andRemapped?
- Which bits will you use for
- The Global Color
- How will the
load_barrierfunction know which bit is the âcorrectâ one for the current GC phase?
- How will the
- The Slow Path
- What does the barrier do when it detects a âBad Colorâ? (It must check if the object has been moved and find the new address).
Thinking Exercise
The Magic Mirror
Imagine you have a physical book. You put three different mirrors in front of it.
- Mirror 1 shows the book in Red.
- Mirror 2 shows the book in Blue.
- Mirror 3 shows the book in Green. No matter which mirror you look through, you see the same book. In ZGC:
- Address
0x0001...is the âMarked0â mirror. - Address
0x0002...is the âMarked1â mirror. - Address
0x0004...is the âRemappedâ mirror. Why does this make bit-toggling fast?
The Interview Questions Theyâll Ask
- âWhat are Colored Pointers, and what problem do they solve?â
- âExplain âMulti-mappingâ in ZGC. Why is it necessary?â
- âWhy does ZGC only work on 64-bit architectures?â
- âWhat is the âSelf-Healingâ property of ZGC pointers?â
Hints in Layers
Hint 1: The Address Layout
Use bits 42, 43, and 44 for your colors.
#define ZGC_METADATA_MASK 0x0007000000000000
Hint 2: mmap Multi-mapping
On Linux, you can use memfd_create to create a file in RAM, then mmap it three times into different virtual address ranges.
Hint 3: The Barrier Macro
#define LOAD_BARRIER(ptr) \
(((ptr) & global_good_mask) ? (ptr) : slow_path_heal(ptr))
Hint 4: The Relocation Table
When you âmoveâ an object in your sandbox, store the mapping in a hash map: old_addr -> new_addr. The slow_path_heal function uses this to update the stale pointer.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| ZGC Principles | OpenJDK Wiki | âMain - ZGCâ |
| Memory Mapping (mmap) | âThe Linux Programming Interfaceâ | Ch. 49 |
| Bit Manipulation | âHackerâs Delightâ | Ch. 2 |
| Load Barriers | âThe Garbage Collection Handbookâ | Ch. 11.4 |
Project 8: The Forwarding Table (Shenandoah-style)
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Difficulty: Level 4: Expert
- Knowledge Area: Concurrency & Atomic Operations
- Main Book: âThe Garbage Collection Handbookâ Ch. 17
What youâll build: A simulator for Shenandoahâs Concurrent Evacuation. When an object is moved, instead of updating all pointers immediately, you store the new address in a âForwarding Pointerâ (Brooks Pointer) located right before the objectâs header.
Real World Outcome
You will implement a system that can move objects while multiple application threads are reading and writing to them. You will see how Brooks Pointers and Atomic CAS operations ensure that every thread always sees the most up-to-date version of an object, even in the middle of a relocation.
Thread Simulation Log:
$ ./shenandoah_sim --threads=4
[GC] START Relocation: ObjA (0x1000) -> Target (0x2000)
[MUTATOR-T1] READ ObjA (0x1000)...
[BARRIER] Indirection: 0x1000.brooks_ptr -> 0x1000. Self-access.
[GC] COPYING DATA: [0x1000...0x1040] -> [0x2000...0x2040]
[MUTATOR-T2] WRITE ObjA (0x1000).val = 42...
[BARRIER] Indirection: 0x1000.brooks_ptr -> 0x1000. Writing to Old Copy.
[GC] ATOMIC SWAP: 0x1000.brooks_ptr = 0x2000. (SUCCESS)
[MUTATOR-T3] READ ObjA (0x1000)...
[BARRIER] Indirection: 0x1000.brooks_ptr -> 0x2000. Redirecting to New Copy.
[MUTATOR-T4] WRITE ObjA (0x1000).val = 99...
[BARRIER] Indirection: 0x1000.brooks_ptr -> 0x2000. Redirecting to New Copy.
# Result: Consistency maintained across all threads during physical data movement.
The Core Question Youâre Answering
âHow do we handle the split-second when an object exists in two places at once?â
During concurrent evacuation, an object is copied from the âFromâ space to the âToâ space. For a brief moment, both copies exist. If Thread A writes to the old copy and Thread B reads from the new copy, your data is corrupted. This project teaches you how Atomic Compare-And-Swap (CAS) operations and Brooks Pointers ensure consistency.
Concepts You Must Understand First
- Brooks Pointers
- Why put the forwarding address inside the object (or just before it) instead of in a separate table?
- Book Reference: âThe Garbage Collection Handbookâ Ch. 17.6
- Atomic CAS (Compare-And-Swap)
- How do you move an object and update its forwarding pointer in a way that no other thread can âsneak inâ a write to the old version?
- Book Reference: âComputer Systems: A Programmerâs Perspectiveâ Ch. 12.7
- Read and Write Barriers in Shenandoah
- Why does Shenandoah need a barrier for every access, not just marking?
Questions to Guide Your Design
- The Object Layout
- How will you structure your memory so that every object has a 64-bit âForwarding Pointerâ prefixed to it?
- The âTo-Spaceâ Invariant
- How do you guarantee that once an object is moved, all future writes go to the new copy?
- Race Conditions
- What happens if two GC threads try to evacuate the same object at the same time?
Thinking Exercise
The Double-Write Trap
- GC thread starts copying
Obj AtoNew A. - Mutator thread writes
x = 10toObj A. - GC thread finishes copying and sets the Brooks Pointer.
- What happened to the value
x = 10? Was it copied? (No, the write happened after the copy started).- How does the âWrite Barrierâ prevent this data loss?
The Interview Questions Theyâll Ask
- âWhat is a Brooks Pointer, and how does it differ from a standard forwarding pointer?â
- âExplain the âLoad Reference Barrierâ in Shenandoah.â
- âHow does Shenandoah use CAS to win the race against Mutator threads?â
- âCompare ZGCâs Colored Pointers to Shenandoahâs Forwarding Pointers. Which one has more memory overhead?â
Hints in Layers
Hint 1: The Indirection Header
Every object should look like this in memory:
[ Forwarding Pointer (8 bytes) ][ Header ][ Object Data ]
By default, the Forwarding Pointer points to itself.
Hint 2: The Read Barrier
Any time you access an object obj, you should actually access obj->forwarding_pointer.
Hint 3: The Atomic Evacuation When moving:
- Copy the object to the new space.
- Use
atomic_compare_exchangeto update the old copyâs forwarding pointer fromselftonew_address. - If it fails, another thread already moved it! Abandon your copy and use the other one.
Hint 4: Thread Simulation
Use pthreads (C) or std::thread (C++) to simulate concurrent access. Use a sleep or yield to increase the chance of race conditions.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Shenandoah & Concurrent Compaction | âThe Garbage Collection Handbookâ | Ch. 17.6 |
| Atomic Operations | âComputer Systems: A Programmerâs Perspectiveâ | Ch. 12 |
| Lock-free Algorithms | âThe Art of Multiprocessor Programmingâ | Ch. 3 |
Project 9: The âMutatorâ Stress Test (Race Conditions)
- Main Programming Language: Go
- Alternative Programming Languages: Java, C++
- Difficulty: Level 3: Advanced
- Knowledge Area: Concurrency Testing
- Main Book: âThe Garbage Collection Handbookâ Ch. 13
What youâll build: A program that creates a massive, complex object graph (e.g., a multi-dimensional linked list) and then spawns multiple threads that do nothing but randomly swap references. Youâll run your custom GCs against this âMutatorâ and identify data corruption bugs.
Real World Outcome
You will develop a âChaos Monkeyâ for memory managers. It will prove whether your GC is actually âThread Safeâ or if it contains subtle race conditions that only appear under high load. You will learn how to build âShadow Mapsâ to verify heap integrity after a concurrent run.
Stress Test Output:
$ ./mutator_stress --threads=16 --gc=my_concurrent_gc
[MUTATOR] Spawning 16 threads...
[MUTATOR] Performing 1,000,000 reference swaps per second.
[GC] Triggering concurrent marking cycle...
[GC] Triggering concurrent relocation cycle...
[VALIDATOR] Scanning heap for dangling pointers...
[VALIDATOR] ERROR: Found dangling pointer at 0x8804.
[VALIDATOR] Pointer at 0x8804 expected ObjX, but found 0xDEADBEEF (Garbage).
[SUMMARY] GC Bug Found! Race condition between Barrier and Relocator.
# You've just found a bug that would have taken 6 months to find in production.
The Core Question Youâre Answering
âHow do we prove that a concurrent algorithm is correct?â
Concurrency is famously difficult to reason about. A âMutator Stress Testâ is the industry standard for verifying that a GCâs barriers and invariants actually work in the âreal worldâ of multi-core processors.
Concepts You Must Understand First
- Race Conditions
- What happens when a GC scan and a Mutator write happen at the exact same nanosecond?
- Graph Validation
- How can you verify that every object that should be live is still in the heap? (Usually by keeping a âShadow Mapâ or doing a slow STW scan for comparison).
- Thread Synchronization
- Using Mutexes vs. Lock-free code in the Mutator.
Questions to Guide Your Design
- The Graph Structure
- How can you build a graph that is complex enough to trigger bugs but simple enough to validate?
- Workload Simulation
- Should the Mutators swap pointers quickly (test barriers) or allocate/deallocate quickly (test fragmentation)?
- Bug Localization
- When a bug is found, how can you âreplayâ the sequence of events to find the cause?
Thinking Exercise
The Vanishing Node
Imagine a thread is moving Node X from Parent A to Parent B.
- GC marks Parent A (Black).
- GC hasnât reached Parent B yet (White).
- Thread moves X to Parent B.
- Thread removes X from Parent A.
- Is there any point where X is not pointed to by either A or B? (No).
- Will the GC find X? (Not unless thereâs a barrier!).
The Interview Questions Theyâll Ask
- âWhat is a âMutatorâ in GC terminology?â
- âHow do you test for race conditions in a non-deterministic system?â
- âWhat is the âLost Objectâ problem, and how can a stress test find it?â
- âWhy is it important to test GC with multiple threads even if the GC itself is single-threaded?â
Project 10: SATB (Snapshot-At-The-Beginning)
- Main Programming Language: Java
- Alternative Programming Languages: C#, C++
- Difficulty: Level 4: Expert
- Knowledge Area: Consistency Models
- Main Book: âThe Garbage Collection Handbookâ Ch. 16
What youâll build: Implement a marking algorithm that uses SATB. When marking starts, you âlogicallyâ snapshot the entire object graph. Any references deleted after that point are still considered live for this cycle.
Real World Outcome
You will implement the marking logic used by G1 and Shenandoah. Youâll see how SATB simplifies concurrent marking by treating any object that was live at the start of the cycle as âuntouchableâ until the cycle ends. You will observe how âFloating Garbageâ is created as a trade-off for algorithm simplicity and correctness.
Execution Trace:
$ ./satb_demo
[GC] Cycle 12 START. Snapshot taken.
[MUTATOR] Operation: objA.field = NULL (Old value was ObjB).
[BARRIER] SATB PRE-WRITE BARRIER: Catching 'ObjB'.
[BARRIER] ObjB added to Marking Stack.
[GC] PHASE: CONCURRENT MARKING
[GC] Scanning Marking Stack... Found ObjB. Marking LIVE.
[GC] Cycle FINISHED.
[INFO] ObjB is marked LIVE even though nothing points to it.
# You've just seen 'Floating Garbage'âmemory that will be reclaimed in the NEXT cycle.
The Core Question Youâre Answering
âHow do we create a consistent view of a world that is constantly changing?â
SATB is like taking a photo of a crowd. Even if people move after the photo is taken, the photo remains the same. This project teaches you the trade-off between GC Precision (reclaiming memory immediately) and GC Complexity (making the marking process easier to reason about).
Concepts You Must Understand First
- SATB Invariant
- If an object was reachable at the start of GC, it stays reachable until the end of GC.
- Book Reference: âThe Garbage Collection Handbookâ Ch. 13.3
- The âDeletionâ Barrier
- Why do we only care when a pointer is overwritten or deleted?
- Floating Garbage
- Why does SATB result in some dead objects being kept alive longer than necessary?
Questions to Guide Your Design
- The Write Barrier
- How will you intercept the âOld Valueâ of a pointer before it is replaced?
- The Marking Stack
- How do you store the objects caught by the SATB barrier so the GC can scan them later?
- Cycle Initialization
- How do you âmarkâ the start of a cycle so the barriers know when to activate?
Thinking Exercise
The Ghost in the Machine
If you have a pointer A -> B.
- GC starts.
- Thread changes it to
A -> C. - In SATB,
Bis âcaughtâ by the barrier and marked live. - Is
Bactually dead? (Yes, nothing points to it). - Why is it better to keep
Balive (Floating Garbage) than to risk accidentally deleting a live object?
The Interview Questions Theyâll Ask
- âWhat does SATB stand for?â
- âWhat is âFloating Garbageâ and why is it an acceptable trade-off?â
- âHow does a SATB write barrier differ from an Incremental Update barrier?â
- âWhy does G1 use SATB during its concurrent marking phase?â
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| SATB Theory | âThe Garbage Collection Handbookâ | Ch. 13.3 |
| G1 SATB Implementation | âJava Performanceâ | Ch. 6 |
| Snapshot consistency | âPrinciples of Concurrent Programmingâ | Ch. 4 |
Project 11: Epsilon GC (The âNo-Opâ Reference)
- Main Programming Language: Java / JVM Flags
- Alternative Programming Languages: Any GC-based language (e.g., Go with
GOGC=off) - Difficulty: Level 1: Beginner
- Knowledge Area: Benchmarking & Performance Baselines
- Main Book: OpenJDK JEP 318
What youâll build: A suite of performance tests using the Epsilon GC (the GC that handles allocation but never reclaims memory). Youâll use this to find the âBaseline Performanceâ of your application and calculate the exact âGC Taxâ paid by G1 or ZGC.
Real World Outcome
You will determine if your applicationâs performance problems are caused by the GC or by your own code. Youâll also learn to calculate the âMemory Leak Rateâ of a system by watching how fast an Epsilon-backed JVM crashes with an OutOfMemoryError (OOM).
Benchmark Output:
$ java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx128m -jar my-app.jar
[ALLOC] Total bytes allocated: 0 MB
[ALLOC] Total bytes allocated: 50 MB
[ALLOC] Total bytes allocated: 100 MB
[ERROR] java.lang.OutOfMemoryError: Java heap space
[ANALYSIS]
- Time to crash: 42 seconds.
- Total throughput: 15,000 req/sec.
- Comparison: G1 GC gives 12,000 req/sec.
- Result: You are paying a 20% "GC Tax" for memory safety.
The Core Question Youâre Answering
âWhat would my app look like if memory were infinite?â
Epsilon GC is the control group in your experiments. It answers: âHow fast can my code run if it never has to wait for a GC barrier or a pause?â Itâs also a powerful tool for short-lived tasks (like a CLI tool that runs for 1 second) where you donât care about collecting memory.
Concepts You Must Understand First
- GC Overhead vs. Application Latency
- How much CPU time is spent on your business logic vs. managing memory?
- Short-lived Runtimes
- Why would you ever want a GC that doesnât collect? (Hint: Serverless/Lambda functions).
- Allocation Baselines
- How many gigabytes of memory does your app allocate per minute?
The Interview Questions Theyâll Ask
- âWhat is Epsilon GC and what is its purpose?â
- âWhy is Epsilon useful for performance benchmarking?â
- âCan you use Epsilon in production? If so, when?â
- âHow does Epsilon help you identify memory leaks?â
Hints in Layers
Hint 1: Setup
Use -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC on any OpenJDK 11+ JVM.
Hint 2: The Benchmark Run a CPU-intensive task with G1, then with ZGC, then with Epsilon. Compare the throughput (ops/sec).
Hint 3: Calculating the Tax
GC Tax = (Epsilon Throughput - G1 Throughput) / Epsilon Throughput.
Project 12: The âOracleâ GC Log Parser
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust
- Difficulty: Level 2: Intermediate
- Knowledge Area: Log Analysis & Observability
- Main Book: âJava Performanceâ by Charlie Hunt (Ch. 6: Garbage Collection)
What youâll build: A tool that parses raw -Xlog:gc* output and produces a visual report showing the âPause Timeline,â âAllocation Rate,â and âPromotion Rate.â
Real World Outcome
You will be able to debug production performance issues by reading the âheartbeatâ of the JVM. Youâll identify why a system is slowing down (e.g., âPremature Promotionâ) and which GC flags need tuning. Youâll build a CLI tool that turns cryptic logs into actionable architectural advice.
Tool Output:
$ python gc_oracle.py --file=gc.log
--------------------------------------------------
JVM GC ANALYSIS REPORT
--------------------------------------------------
Total Runtime: 2h 14m
Max Pause: 145ms (G1 Evacuation Pause)
Avg Pause: 12ms
DIAGNOSTIC ALERTS:
[WARNING] High Promotion Rate (15MB/s).
=> Survivor space is too small. Objects are moving to OldGen too early.
=> Suggestion: Increase SurvivorRatio or increase YoungGen size.
[CRITICAL] System.gc() called 4 times.
=> Manual GC calls are killing your performance.
=> Suggestion: Run with -XX:+DisableExplicitGC.
[INFO] Allocation Rate: 250MB/s.
=> This is a very "chatty" application. Check for object pooling opportunities.
--------------------------------------------------
The Core Question Youâre Answering
âWhat is actually happening inside the black box?â
Modern GCs produce incredibly detailed logs, but they are hard for humans to read. This project teaches you the lifecycle of a GC cycle (Marking, Evacuation, Cleanup) and how to identify the âCritical Failuresâ (Full GC, Humongous Allocation).
Concepts You Must Understand First
- GC Log Formats
- What is Unified Logging (
-Xlog:gc)?
- What is Unified Logging (
- Phases of Collection
- What is the difference between a âConcurrent Markâ phase (doesnât stop the app) and a âPause Youngâ phase (stops the app)?
- Survivor Spaces & Aging
- How can you tell from the logs if objects are being promoted to the Old Gen too quickly?
Questions to Guide Your Design
- The Regex Challenge
- How will you parse timestamps and memory sizes (e.g.,
1024K -> 512K(2048K))?
- How will you parse timestamps and memory sizes (e.g.,
- Timeline Visualization
- How can you plot pauses on a timeline to see if they correlate with application latency?
- Automatic Diagnostics
- Can you write rules that detect common problems like âAllocation Failureâ or âMetadata GC Thresholdâ?
The Interview Questions Theyâll Ask
- âHow do you identify a memory leak from a GC log?â
- âWhat is the difference between a âPause Young (Normal)â and a âPause Young (Prepare for Mixed)â in G1 logs?â
- âHow can you tell if the heap is too small by looking at GC logs?â
- âWhat is a âFull GCâ and why is it bad in G1/ZGC?â
Hints in Layers
Hint 1: Sample Logs
Generate logs by running a simple Java program with -Xlog:gc:file=gc.log.
Hint 2: Parsing Memory
Use a regular expression to capture the before/after/total heap sizes: (\d+)[KMG]->(\d+)[KMG]\((\d+)[KMG]\).
Hint 3: Calculating Allocation Rate The difference between the âEndâ of one GC and the âStartâ of the next GC is the amount of memory allocated by the application during that interval.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| GC Logging & Tuning | âJava Performanceâ | Ch. 6 |
| Unified Logging Format | OpenJDK Documentation | âJEP 158â |
| Troubleshooting GC | âThe Well-Grounded Java Developerâ | Ch. 5 |
Final Overall Project: The âEternal Runtimeâ (Masterpiece)
What youâll build: A custom Virtual Machine (VM) that combines everything youâve learned. It will support a Region-based Heap with Concurrent Marking, ZGC-style Colored Pointers, and Concurrent Compaction.
Real World Outcome
This is the ultimate proof of mastery. You will have built a memory manager from scratch that rivals the core logic of modern high-performance runtimes. You will be able to run a simple bytecode language in your VM with sub-millisecond pauses on a heap that you are constantly compacting concurrently.
Final System Output:
$ ./eternal_vm --script=benchmark.evm --heap=4GB
[VM] Initializing Region-based Heap...
[VM] Loading Bytecode...
[VM] --- RUNTIME ACTIVE ---
[GC] Concurrent Mark Phase Started...
[GC] Concurrent Evacuation Started...
[VM] [T1] Executing Op: LOAD_PTR (Address: 0x000400000100)
[BARRIER] Healing stale pointer! New Address: 0x000400000900
[GC] Cycle Completed. Total Pause: 0.45ms.
[VM] benchmark.evm finished in 12.4s.
[STATS] Total Memory Reclaimed: 14.2 GB.
# CONGRATULATIONS: You have mastered Modern Garbage Collection.
The Interview Questions Theyâll Ask
- âIf you had to design a GC for a 100TB heap today, which architecture would you choose and why?â
- âWhat is the single most difficult part of implementing a concurrent GC?â
- âHow do you balance the trade-off between throughput and latency in a memory manager?â
Books for Final Mastery
| Topic | Book | Chapter |
|---|---|---|
| Complete GC Implementation | âThe Garbage Collection Handbookâ | Ch. 1-17 (The whole thing) |
| VM Architecture | âCrafting Interpretersâ | Ch. 26: Garbage Collection |
| High-Performance Systems | âSystems Performanceâ (Brendan Gregg) | Ch. 2 & 12 |