← Back to all projects

LEARN JVM INTERNALS BUILD FROM SCRATCH

Learn JVM Internals: Build Your Own Java Virtual Machine in C

Goal: Deeply understand the Java Virtual Machine by building one from scratch in C—from parsing class files to executing bytecode, implementing garbage collection, and understanding every internal mechanism.


Why the JVM Exists

Before the JVM (1995), software development faced fundamental problems:

The Problems the JVM Solves

  1. Platform Dependency: C/C++ programs compiled for Windows won’t run on Linux
    • Recompile for every platform
    • Platform-specific bugs
    • Different compilers, different behaviors
  2. Memory Unsafety: Manual memory management in C/C++
    • Buffer overflows
    • Use-after-free bugs
    • Memory leaks
    • Security vulnerabilities
  3. No Runtime Safety: C/C++ trusts the programmer completely
    • Array bounds unchecked
    • Null pointer dereferences
    • Type confusion

The JVM’s Solution

┌─────────────────────────────────────────────────────────────┐
│                    Java Source Code                         │
│                      (.java files)                          │
└─────────────────────────┬───────────────────────────────────┘
                          │ javac (compiler)
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                       Bytecode                              │
│                    (.class files)                           │
│              Platform-independent!                          │
└─────────────────────────┬───────────────────────────────────┘
                          │
         ┌────────────────┼────────────────┐
         ▼                ▼                ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  JVM for    │  │  JVM for    │  │  JVM for    │
│   Windows   │  │   Linux     │  │   macOS     │
└─────────────┘  └─────────────┘  └─────────────┘
         │                │                │
         ▼                ▼                ▼
    Windows x86      Linux x86       macOS ARM
    Machine Code     Machine Code    Machine Code

Write Once, Run Anywhere — The JVM abstracts the hardware.


JVM Architecture: The Complete Picture

┌──────────────────────────────────────────────────────────────────────────┐
│                           CLASS LOADER SUBSYSTEM                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                   │
│  │   Loading    │→ │   Linking    │→ │ Initialization│                   │
│  │              │  │ (Verify,     │  │              │                   │
│  │ Bootstrap    │  │  Prepare,    │  │ <clinit>    │                   │
│  │ Extension    │  │  Resolve)    │  │              │                   │
│  │ Application  │  │              │  │              │                   │
│  └──────────────┘  └──────────────┘  └──────────────┘                   │
└──────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                          RUNTIME DATA AREAS                              │
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                         Method Area                              │    │
│  │  (Class metadata, static variables, constant pool, method code) │    │
│  │                     [Shared by all threads]                      │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                            Heap                                  │    │
│  │              (All objects and arrays live here)                  │    │
│  │                     [Shared by all threads]                      │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │    │
│  │  │ Young Gen   │  │  Old Gen    │  │  Metaspace  │              │    │
│  │  │ Eden|S0|S1  │  │             │  │             │              │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘              │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│  ┌───────────────┐ ┌───────────────┐ ┌───────────────┐                  │
│  │   Thread 1    │ │   Thread 2    │ │   Thread N    │                  │
│  │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │                  │
│  │ │ PC Register│ │ │ │ PC Register│ │ │ │ PC Register│ │                  │
│  │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │                  │
│  │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │                  │
│  │ │   Stack   │ │ │ │   Stack   │ │ │ │   Stack   │ │                  │
│  │ │  ┌─────┐  │ │ │ │  ┌─────┐  │ │ │ │  ┌─────┐  │ │                  │
│  │ │  │Frame│  │ │ │ │  │Frame│  │ │ │ │  │Frame│  │ │                  │
│  │ │  ├─────┤  │ │ │ │  ├─────┤  │ │ │ │  ├─────┤  │ │                  │
│  │ │  │Frame│  │ │ │ │  │Frame│  │ │ │ │  │Frame│  │ │                  │
│  │ │  └─────┘  │ │ │ │  └─────┘  │ │ │ │  └─────┘  │ │                  │
│  │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │                  │
│  │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │                  │
│  │ │Native Stk │ │ │ │Native Stk │ │ │ │Native Stk │ │                  │
│  │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │                  │
│  └───────────────┘ └───────────────┘ └───────────────┘                  │
│        [Per-thread]       [Per-thread]       [Per-thread]               │
└──────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                          EXECUTION ENGINE                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                   │
│  │  Interpreter │  │ JIT Compiler │  │   Garbage    │                   │
│  │              │  │              │  │  Collector   │                   │
│  │ Executes     │  │ Compiles hot │  │              │                   │
│  │ bytecode     │  │ code to      │  │ Reclaims     │                   │
│  │ instruction  │  │ native       │  │ unused       │                   │
│  │ by           │  │ machine code │  │ memory       │                   │
│  │ instruction  │  │              │  │              │                   │
│  └──────────────┘  └──────────────┘  └──────────────┘                   │
└──────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                     NATIVE METHOD INTERFACE (JNI)                        │
│                   (Bridge to native C/C++ libraries)                     │
└──────────────────────────────────────────────────────────────────────────┘

Core Concepts: How the JVM Works Behind the Scenes

Stack-Based vs Register-Based VMs

The JVM is a stack-based virtual machine (unlike x86/ARM which are register-based):

Register-based (x86):              Stack-based (JVM):
ADD R1, R2, R3                     ILOAD_1      ; push local[1]
(R1 = R2 + R3)                     ILOAD_2      ; push local[2]
                                   IADD         ; pop 2, push sum
                                   ISTORE_3     ; pop to local[3]

Why stack-based?
- Simpler to generate code for
- No register allocation needed
- Easier to verify
- More compact bytecode

The Operand Stack and Local Variables

Each method call creates a stack frame:

┌─────────────────────────────────────┐
│            Stack Frame              │
├─────────────────────────────────────┤
│  Local Variables                    │
│  ┌─────┬─────┬─────┬─────┬─────┐   │
│  │  0  │  1  │  2  │  3  │ ... │   │
│  │this │arg1 │arg2 │var1 │     │   │
│  └─────┴─────┴─────┴─────┴─────┘   │
├─────────────────────────────────────┤
│  Operand Stack                      │
│  ┌─────┐                            │
│  │  5  │ ← top                      │
│  ├─────┤                            │
│  │  3  │                            │
│  ├─────┤                            │
│  │ ... │                            │
│  └─────┘                            │
├─────────────────────────────────────┤
│  Return Address                     │
│  Pointer to previous frame          │
│  Reference to constant pool         │
└─────────────────────────────────────┘

Bytecode Execution Example

// Java code
public static int add(int a, int b) {
    return a + b;
}
// Bytecode (javap -c output)
public static int add(int, int);
  Code:
     0: iload_0        // Push int from local[0] (a)
     1: iload_1        // Push int from local[1] (b)
     2: iadd           // Pop 2 ints, push sum
     3: ireturn        // Return int from top of stack

Execution trace:
┌─────────────────────────────────────────────────────────────┐
│ Step │ Instruction │ Stack (before) │ Stack (after)        │
├──────┼─────────────┼────────────────┼──────────────────────┤
│  0   │ iload_0     │ []             │ [5]           (a=5)  │
│  1   │ iload_1     │ [5]            │ [5, 3]        (b=3)  │
│  2   │ iadd        │ [5, 3]         │ [8]           (5+3)  │
│  3   │ ireturn     │ [8]            │ []            (ret 8)│
└─────────────────────────────────────────────────────────────┘

The Class File Format

Every .class file has this structure (magic number 0xCAFEBABE):

ClassFile {
    u4             magic;                    // 0xCAFEBABE
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[count-1];   // The "symbol table"
    u2             access_flags;             // public, final, etc.
    u2             this_class;               // Index into const pool
    u2             super_class;              // Index into const pool
    u2             interfaces_count;
    u2             interfaces[count];
    u2             fields_count;
    field_info     fields[count];
    u2             methods_count;
    method_info    methods[count];
    u2             attributes_count;
    attribute_info attributes[count];
}

Object Layout in Memory

┌──────────────────────────────────────┐
│         Object Header                │
│  ┌────────────────────────────────┐  │
│  │ Mark Word (64 bits)            │  │  ← GC state, hash, lock info
│  ├────────────────────────────────┤  │
│  │ Class Pointer (32/64 bits)     │  │  ← Pointer to class metadata
│  ├────────────────────────────────┤  │
│  │ Array Length (if array)        │  │
│  └────────────────────────────────┘  │
├──────────────────────────────────────┤
│         Instance Fields              │
│  ┌────────────────────────────────┐  │
│  │ field1                         │  │
│  ├────────────────────────────────┤  │
│  │ field2                         │  │
│  ├────────────────────────────────┤  │
│  │ ...                            │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘

Project List

Projects are ordered to build your JVM incrementally, from simple concepts to a working implementation.


Project 1: Build a Stack-Based Calculator VM

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Virtual Machines / Stack Machines
  • Software or Tool: GCC, GDB
  • Main Book: “Write Your Own Virtual Machine” by Justin Meiners

What you’ll build: A simple stack-based calculator that executes bytecode operations (PUSH, ADD, SUB, MUL, DIV, PRINT). This teaches the fundamental execution model of the JVM without Java complexity.

Why it teaches JVM: The JVM is fundamentally a stack machine. By building a simpler stack VM first, you’ll understand how operand stacks work, how instructions manipulate the stack, and how an instruction dispatch loop works.

Core challenges you’ll face:

  • Stack operations → maps to JVM operand stack
  • Instruction decoding → maps to bytecode interpretation
  • Dispatch loop → maps to JVM execution engine
  • Operand encoding → maps to bytecode format

Key Concepts:

  • Stack Machines: “Structure and Interpretation of Computer Programs” - Section 5.2
  • Virtual Machine Basics: Write Your Own Virtual Machine
  • Instruction Dispatch: “Crafting Interpreters” by Robert Nystrom - Chapter 15

Difficulty: Intermediate Time estimate: Weekend Prerequisites: C programming, basic data structures

Real world outcome:

// calculator_vm.c

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define STACK_SIZE 256

typedef enum {
    OP_PUSH,    // Push constant onto stack
    OP_ADD,     // Pop 2, push sum
    OP_SUB,     // Pop 2, push difference
    OP_MUL,     // Pop 2, push product
    OP_DIV,     // Pop 2, push quotient
    OP_PRINT,   // Pop and print
    OP_HALT     // Stop execution
} OpCode;

typedef struct {
    int32_t stack[STACK_SIZE];
    int32_t sp;  // Stack pointer
    uint8_t *code;
    size_t pc;   // Program counter
} VM;

void vm_run(VM *vm);

int main() {
    // Bytecode: PUSH 5, PUSH 3, ADD, PUSH 2, MUL, PRINT, HALT
    // Computes: (5 + 3) * 2 = 16
    uint8_t program[] = {
        OP_PUSH, 0, 0, 0, 5,   // PUSH 5 (32-bit little endian)
        OP_PUSH, 0, 0, 0, 3,   // PUSH 3
        OP_ADD,                 // 5 + 3 = 8
        OP_PUSH, 0, 0, 0, 2,   // PUSH 2
        OP_MUL,                 // 8 * 2 = 16
        OP_PRINT,               // Print 16
        OP_HALT                 // Stop
    };

    VM vm = {.sp = 0, .code = program, .pc = 0};
    vm_run(&vm);
    return 0;
}
$ gcc -o calc_vm calculator_vm.c
$ ./calc_vm
Executing bytecode...
  PUSH 5        Stack: [5]
  PUSH 3        Stack: [5, 3]
  ADD           Stack: [8]
  PUSH 2        Stack: [8, 2]
  MUL           Stack: [16]
  PRINT         Output: 16
  HALT          Execution complete.

Implementation Hints:

The core execution loop:

void vm_run(VM *vm) {
    while (1) {
        uint8_t opcode = vm->code[vm->pc++];

        switch (opcode) {
            case OP_PUSH: {
                // Read 32-bit constant from bytecode
                int32_t value = read_int32(vm);
                vm->stack[vm->sp++] = value;
                break;
            }
            case OP_ADD: {
                int32_t b = vm->stack[--vm->sp];
                int32_t a = vm->stack[--vm->sp];
                vm->stack[vm->sp++] = a + b;
                break;
            }
            case OP_PRINT: {
                int32_t value = vm->stack[--vm->sp];
                printf("%d\n", value);
                break;
            }
            case OP_HALT:
                return;
            // ... other cases
        }
    }
}

Questions to explore:

  1. What happens if you pop from an empty stack?
  2. How would you add local variables?
  3. How would you add function calls?
  4. What’s the performance impact of the switch dispatch?

Learning milestones:

  1. Basic ops work → You understand stack manipulation
  2. You handle underflow → You understand safety
  3. You can trace execution → You understand debugging VMs
  4. You want more features → You’re ready for JVM complexity

Project 2: Parse Java Class Files

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Python (for prototyping)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Binary Parsing / File Formats
  • Software or Tool: GCC, hexdump, javap
  • Main Book: “The Java Virtual Machine Specification” (Chapter 4)

What you’ll build: A class file parser that reads .class files and displays their contents: constant pool, methods, fields, and bytecode. This is the foundation of any JVM.

Why it teaches JVM: Before executing bytecode, you must load it. The class file format is the JVM’s “assembly language.” Understanding it deeply is essential for everything that follows.

Core challenges you’ll face:

  • Big-endian byte order → maps to network byte order
  • Constant pool parsing → maps to symbol resolution
  • Variable-length structures → maps to sequential parsing
  • Attribute handling → maps to Code attribute for bytecode

Key Concepts:

  • Class File Format: JVM Spec Chapter 4
  • Constant Pool: “Inside the Java Virtual Machine” by Bill Venners - Chapter 6
  • Binary Parsing: “Computer Systems: A Programmer’s Perspective” - Chapter 7

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 1 completed, understanding of binary formats

Real world outcome:

# Compile a simple Java file
$ cat Hello.java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, JVM!");
    }
}
$ javac Hello.java

# Parse with your tool
$ ./classparser Hello.class
=== Class File: Hello.class ===
Magic: 0xCAFEBABE ✓
Version: 61.0 (Java 17)
Constant Pool (22 entries):
  #1  = Methodref       #2.#3
  #2  = Class           #4
  #3  = NameAndType     #5:#6
  #4  = Utf8            "java/lang/Object"
  #5  = Utf8            "<init>"
  #6  = Utf8            "()V"
  #7  = Fieldref        #8.#9
  #8  = Class           #10
  #9  = NameAndType     #11:#12
  #10 = Utf8            "java/lang/System"
  #11 = Utf8            "out"
  #12 = Utf8            "Ljava/io/PrintStream;"
  #13 = String          #14
  #14 = Utf8            "Hello, JVM!"
  ...

Access Flags: ACC_PUBLIC ACC_SUPER
This Class: #17 (Hello)
Super Class: #2 (java/lang/Object)
Interfaces: (none)
Fields: (none)

Methods (2):
  Method: <init>()V
    Flags: ACC_PUBLIC
    Code:
      max_stack=1, max_locals=1
      0: aload_0
      1: invokespecial #1  // Object.<init>
      4: return

  Method: main([Ljava/lang/String;)V
    Flags: ACC_PUBLIC ACC_STATIC
    Code:
      max_stack=2, max_locals=1
      0: getstatic #7      // System.out
      3: ldc #13           // "Hello, JVM!"
      5: invokevirtual #15 // PrintStream.println
      8: return

Implementation Hints:

Class file reading structure:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <arpa/inet.h>  // For ntohl, ntohs (byte order)

typedef struct {
    uint8_t *data;
    size_t size;
    size_t pos;
} ClassReader;

uint8_t read_u1(ClassReader *r) {
    return r->data[r->pos++];
}

uint16_t read_u2(ClassReader *r) {
    uint16_t value = (r->data[r->pos] << 8) | r->data[r->pos + 1];
    r->pos += 2;
    return value;
}

uint32_t read_u4(ClassReader *r) {
    uint32_t value = (r->data[r->pos] << 24) |
                     (r->data[r->pos + 1] << 16) |
                     (r->data[r->pos + 2] << 8) |
                     r->data[r->pos + 3];
    r->pos += 4;
    return value;
}

// Constant pool entry types
#define CONSTANT_Utf8               1
#define CONSTANT_Integer            3
#define CONSTANT_Float              4
#define CONSTANT_Long               5
#define CONSTANT_Double             6
#define CONSTANT_Class              7
#define CONSTANT_String             8
#define CONSTANT_Fieldref           9
#define CONSTANT_Methodref          10
#define CONSTANT_InterfaceMethodref 11
#define CONSTANT_NameAndType        12
#define CONSTANT_MethodHandle       15
#define CONSTANT_MethodType         16
#define CONSTANT_InvokeDynamic      18

Parsing the constant pool:

typedef struct {
    uint8_t tag;
    union {
        struct { uint16_t length; char *bytes; } utf8;
        struct { uint16_t class_index; uint16_t name_type_index; } methodref;
        struct { uint16_t name_index; } class_info;
        struct { uint16_t string_index; } string_info;
        int32_t integer;
        float float_val;
        // ... other types
    } info;
} ConstantPoolEntry;

void parse_constant_pool(ClassReader *r, ClassFile *cf) {
    cf->constant_pool_count = read_u2(r);
    cf->constant_pool = malloc(sizeof(ConstantPoolEntry) * cf->constant_pool_count);

    // Note: Indices start at 1, not 0!
    for (int i = 1; i < cf->constant_pool_count; i++) {
        uint8_t tag = read_u1(r);
        cf->constant_pool[i].tag = tag;

        switch (tag) {
            case CONSTANT_Utf8: {
                uint16_t length = read_u2(r);
                cf->constant_pool[i].info.utf8.length = length;
                cf->constant_pool[i].info.utf8.bytes = malloc(length + 1);
                memcpy(cf->constant_pool[i].info.utf8.bytes, r->data + r->pos, length);
                cf->constant_pool[i].info.utf8.bytes[length] = '\0';
                r->pos += length;
                break;
            }
            case CONSTANT_Methodref:
            case CONSTANT_Fieldref:
            case CONSTANT_InterfaceMethodref: {
                cf->constant_pool[i].info.methodref.class_index = read_u2(r);
                cf->constant_pool[i].info.methodref.name_type_index = read_u2(r);
                break;
            }
            // ... handle other constant types
        }
    }
}

Questions to explore:

  1. Why does the constant pool start at index 1?
  2. Why do Long and Double take two slots?
  3. How do you resolve symbolic references?
  4. What’s the Code attribute structure?

Learning milestones:

  1. Magic number reads correctly → You understand endianness
  2. Constant pool parses → You understand the symbol table
  3. Methods parse with bytecode → You can see instructions
  4. Output matches javap → Your parser is correct

Project 3: Bytecode Disassembler

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Bytecode / Instruction Sets
  • Software or Tool: GCC, javap (for comparison)
  • Main Book: “The Java Virtual Machine Specification” (Chapter 6)

What you’ll build: A disassembler that decodes JVM bytecode into human-readable mnemonics, showing operands and constant pool references.

Why it teaches JVM: To execute bytecode, you must understand every instruction. Building a disassembler forces you to study the instruction set, operand encoding, and how instructions reference the constant pool.

Core challenges you’ll face:

  • Instruction encoding → maps to variable-length opcodes
  • Operand types → maps to immediate, constant pool index, branch offset
  • Wide instructions → maps to 16-bit local variable indices
  • Tableswitch/lookupswitch → maps to complex multi-byte encoding

Key Concepts:

  • JVM Instruction Set: JVM Spec Chapter 6
  • Bytecode Reference: “Inside the Java Virtual Machine” by Bill Venners - Chapter 10

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 2 completed

Real world outcome:

$ ./jdisasm Hello.class

public class Hello
  minor version: 0
  major version: 61
  flags: ACC_PUBLIC, ACC_SUPER

Constant pool:
   #1 = Methodref          #2.#3          // java/lang/Object."<init>":()V
   #2 = Class              #4             // java/lang/Object
   ...

{
  public Hello();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 1: 0

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #7   // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #13  // String Hello, JVM!
         5: invokevirtual #15  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 3: 0
        line 4: 8
}

Implementation Hints:

Instruction table structure:

typedef enum {
    // Constants
    OP_NOP         = 0x00,
    OP_ACONST_NULL = 0x01,
    OP_ICONST_M1   = 0x02,
    OP_ICONST_0    = 0x03,
    OP_ICONST_1    = 0x04,
    OP_ICONST_2    = 0x05,
    OP_ICONST_3    = 0x06,
    OP_ICONST_4    = 0x07,
    OP_ICONST_5    = 0x08,
    OP_BIPUSH      = 0x10,  // Push byte
    OP_SIPUSH      = 0x11,  // Push short
    OP_LDC         = 0x12,  // Push from constant pool

    // Loads
    OP_ILOAD       = 0x15,
    OP_ILOAD_0     = 0x1A,
    OP_ILOAD_1     = 0x1B,
    OP_ILOAD_2     = 0x1C,
    OP_ILOAD_3     = 0x1D,
    OP_ALOAD       = 0x19,
    OP_ALOAD_0     = 0x2A,

    // Stores
    OP_ISTORE      = 0x36,
    OP_ISTORE_0    = 0x3B,
    OP_ASTORE      = 0x3A,

    // Arithmetic
    OP_IADD        = 0x60,
    OP_ISUB        = 0x64,
    OP_IMUL        = 0x68,
    OP_IDIV        = 0x6C,

    // Control flow
    OP_IFEQ        = 0x99,
    OP_IFNE        = 0x9A,
    OP_IF_ICMPEQ   = 0x9F,
    OP_GOTO        = 0xA7,

    // Method invocation
    OP_INVOKEVIRTUAL   = 0xB6,
    OP_INVOKESPECIAL   = 0xB7,
    OP_INVOKESTATIC    = 0xB8,
    OP_INVOKEINTERFACE = 0xB9,

    // Return
    OP_RETURN      = 0xB1,
    OP_IRETURN     = 0xAC,
    OP_ARETURN     = 0xB0,

    // Objects
    OP_NEW         = 0xBB,
    OP_GETFIELD    = 0xB4,
    OP_PUTFIELD    = 0xB5,
    OP_GETSTATIC   = 0xB2,
    OP_PUTSTATIC   = 0xB3,

    // ... ~200 total opcodes
} OpCode;

typedef struct {
    const char *mnemonic;
    int operand_bytes;  // 0, 1, 2, or special (-1 for variable)
    const char *format; // For printing
} InstructionInfo;

InstructionInfo instructions[256] = {
    [OP_NOP]           = {"nop",           0, ""},
    [OP_ICONST_0]      = {"iconst_0",      0, ""},
    [OP_BIPUSH]        = {"bipush",        1, "%d"},
    [OP_SIPUSH]        = {"sipush",        2, "%d"},
    [OP_LDC]           = {"ldc",           1, "#%d"},
    [OP_ILOAD]         = {"iload",         1, "%d"},
    [OP_ILOAD_0]       = {"iload_0",       0, ""},
    [OP_INVOKEVIRTUAL] = {"invokevirtual", 2, "#%d"},
    [OP_GOTO]          = {"goto",          2, "%+d"},  // Signed offset
    // ...
};

Disassemble a method:

void disassemble_code(uint8_t *code, size_t length, ConstantPool *cp) {
    size_t pc = 0;
    while (pc < length) {
        printf("%4zu: ", pc);
        uint8_t opcode = code[pc++];
        InstructionInfo *info = &instructions[opcode];

        printf("%-15s", info->mnemonic);

        switch (info->operand_bytes) {
            case 0:
                break;
            case 1: {
                int8_t operand = (int8_t)code[pc++];
                printf(info->format, operand);
                break;
            }
            case 2: {
                int16_t operand = (code[pc] << 8) | code[pc + 1];
                pc += 2;
                printf(info->format, operand);
                // If constant pool ref, print resolved name
                if (info->format[0] == '#') {
                    printf("  // %s", resolve_cp_entry(cp, operand));
                }
                break;
            }
            // Handle tableswitch, lookupswitch specially
        }
        printf("\n");
    }
}

Questions to explore:

  1. How are branch offsets encoded (signed)?
  2. What makes tableswitch and lookupswitch different?
  3. How do you handle the wide prefix instruction?
  4. How do you pretty-print constant pool references?

Learning milestones:

  1. Basic instructions decode → You understand encoding
  2. Constant pool refs resolve → You understand symbols
  3. Control flow shows targets → You understand branches
  4. Output matches javap → Your disassembler is complete

Project 4: Basic Bytecode Interpreter

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Interpreters / Execution Engines
  • Software or Tool: GCC, GDB, Valgrind
  • Main Book: “The Java Virtual Machine Specification” (Chapter 2)

What you’ll build: A working bytecode interpreter that can execute simple Java methods: arithmetic, local variables, control flow, and method returns.

Why it teaches JVM: This is the heart of the JVM. Your interpreter will actually run Java bytecode, making you understand every instruction’s semantics, the operand stack, local variables, and program counter management.

Core challenges you’ll face:

  • Stack frame management → maps to method invocation
  • Type handling → maps to int, long, float, double, reference
  • Control flow → maps to branches, jumps, loops
  • Return value passing → maps to caller’s operand stack

Key Concepts:

  • JVM Execution Model: “Inside the Java Virtual Machine” by Bill Venners
  • Instruction Semantics: JVM Spec Chapter 6
  • Interpreter Design: “Crafting Interpreters” by Robert Nystrom

Difficulty: Expert Time estimate: 3-4 weeks Prerequisites: Projects 1-3 completed

Real world outcome:

// Test.java
public class Test {
    public static int factorial(int n) {
        if (n <= 1) return 1;
        return n * factorial(n - 1);
    }

    public static void main(String[] args) {
        int result = factorial(5);
        // We'll print this via native method later
    }
}
$ javac Test.java
$ ./minijvm Test
[MiniJVM] Loading class: Test
[MiniJVM] Resolving class: java/lang/Object
[MiniJVM] Executing: Test.main([Ljava/lang/String;)V

[DEBUG] Frame: main
  0: iconst_5             stack: [5]
  1: invokestatic #2      // Test.factorial(I)I

  [DEBUG] Frame: factorial (n=5)
    0: iload_0             stack: [5]
    1: iconst_1            stack: [5, 1]
    2: if_icmpgt 7         branch taken (5 > 1)
    7: iload_0             stack: [5]
    8: iload_0             stack: [5, 5]
    9: iconst_1            stack: [5, 5, 1]
   10: isub                stack: [5, 4]
   11: invokestatic #2     // Test.factorial(I)I

    [DEBUG] Frame: factorial (n=4)
      ... (recursive calls)

    [DEBUG] Frame: factorial (n=1)
      0: iload_0           stack: [1]
      1: iconst_1          stack: [1, 1]
      2: if_icmpgt 7       branch NOT taken
      5: iconst_1          stack: [1]
      6: ireturn           returning 1

  ... (returns bubble up: 1 * 2 * 3 * 4 * 5 = 120)

[MiniJVM] Result in local[1]: 120
[MiniJVM] Execution complete.

Implementation Hints:

Core VM structures:

// Value can hold any JVM type
typedef union {
    int32_t i;      // int
    int64_t j;      // long
    float f;        // float
    double d;       // double
    void *a;        // reference (pointer to object)
} Value;

// Stack frame for each method call
typedef struct Frame {
    Value *locals;           // Local variable array
    Value *operand_stack;    // Operand stack
    int sp;                  // Stack pointer
    uint8_t *code;           // Bytecode
    size_t pc;               // Program counter
    ConstantPool *cp;        // Constant pool reference
    struct Frame *prev;      // Link to caller's frame
} Frame;

// The VM state
typedef struct {
    Frame *current_frame;
    ClassFile **loaded_classes;
    int class_count;
    // Heap would go here
} VM;

Main execution loop:

Value vm_execute(VM *vm) {
    Frame *frame = vm->current_frame;

    while (1) {
        uint8_t opcode = frame->code[frame->pc++];

        switch (opcode) {
            // Push constants
            case OP_ICONST_0:
            case OP_ICONST_1:
            case OP_ICONST_2:
            case OP_ICONST_3:
            case OP_ICONST_4:
            case OP_ICONST_5:
                frame->operand_stack[frame->sp++].i = opcode - OP_ICONST_0;
                break;

            case OP_BIPUSH: {
                int8_t value = (int8_t)frame->code[frame->pc++];
                frame->operand_stack[frame->sp++].i = value;
                break;
            }

            // Load from locals
            case OP_ILOAD: {
                uint8_t index = frame->code[frame->pc++];
                frame->operand_stack[frame->sp++] = frame->locals[index];
                break;
            }

            case OP_ILOAD_0:
            case OP_ILOAD_1:
            case OP_ILOAD_2:
            case OP_ILOAD_3:
                frame->operand_stack[frame->sp++] = frame->locals[opcode - OP_ILOAD_0];
                break;

            // Arithmetic
            case OP_IADD: {
                int32_t b = frame->operand_stack[--frame->sp].i;
                int32_t a = frame->operand_stack[--frame->sp].i;
                frame->operand_stack[frame->sp++].i = a + b;
                break;
            }

            case OP_IMUL: {
                int32_t b = frame->operand_stack[--frame->sp].i;
                int32_t a = frame->operand_stack[--frame->sp].i;
                frame->operand_stack[frame->sp++].i = a * b;
                break;
            }

            // Control flow
            case OP_IF_ICMPGT: {
                int16_t offset = (frame->code[frame->pc] << 8) | frame->code[frame->pc + 1];
                frame->pc += 2;
                int32_t b = frame->operand_stack[--frame->sp].i;
                int32_t a = frame->operand_stack[--frame->sp].i;
                if (a > b) {
                    frame->pc = frame->pc - 3 + offset;  // -3 because we already advanced
                }
                break;
            }

            case OP_GOTO: {
                int16_t offset = (frame->code[frame->pc] << 8) | frame->code[frame->pc + 1];
                frame->pc = frame->pc - 1 + offset;
                break;
            }

            // Method invocation
            case OP_INVOKESTATIC: {
                uint16_t index = (frame->code[frame->pc] << 8) | frame->code[frame->pc + 1];
                frame->pc += 2;

                // Resolve method from constant pool
                MethodInfo *method = resolve_method(vm, frame->cp, index);

                // Create new frame
                Frame *new_frame = create_frame(method);
                new_frame->prev = frame;

                // Pop arguments from caller's stack to callee's locals
                int arg_count = count_args(method->descriptor);
                for (int i = arg_count - 1; i >= 0; i--) {
                    new_frame->locals[i] = frame->operand_stack[--frame->sp];
                }

                // Switch to new frame
                frame = new_frame;
                vm->current_frame = frame;
                break;
            }

            // Return
            case OP_IRETURN: {
                Value result = frame->operand_stack[--frame->sp];
                Frame *caller = frame->prev;
                free_frame(frame);

                if (caller == NULL) {
                    return result;  // Return from main
                }

                // Push result onto caller's stack
                caller->operand_stack[caller->sp++] = result;
                frame = caller;
                vm->current_frame = frame;
                break;
            }

            case OP_RETURN: {
                Frame *caller = frame->prev;
                free_frame(frame);

                if (caller == NULL) {
                    Value none = {0};
                    return none;
                }

                frame = caller;
                vm->current_frame = frame;
                break;
            }

            default:
                fprintf(stderr, "Unknown opcode: 0x%02X\n", opcode);
                exit(1);
        }
    }
}

Questions to explore:

  1. How do you handle long and double (2 stack slots)?
  2. How do you handle instance methods (this pointer)?
  3. What about exceptions?
  4. How do you handle method overloading?

Learning milestones:

  1. Arithmetic works → You understand the operand stack
  2. Recursion works → You understand frames and invocation
  3. Control flow works → You understand branch instructions
  4. Complex programs run → Your interpreter is functional

Project 5: Object System and Heap

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Memory Management / Object Layout
  • Software or Tool: GCC, Valgrind, GDB
  • Main Book: “The Garbage Collection Handbook” by Jones, Hosking, Moss

What you’ll build: An object allocation system with heap management, object headers, field access, and the new instruction implementation.

Why it teaches JVM: Objects are the heart of Java. Understanding how objects are laid out in memory, how fields are accessed, and how inheritance affects layout is essential for a working JVM.

Core challenges you’ll face:

  • Object header design → maps to class pointer, GC bits, hash
  • Field layout → maps to alignment, inheritance
  • Heap allocation → maps to bump pointer or free list
  • Reference semantics → maps to pointers vs copies

Key Concepts:

  • Object Layout: “Java Performance” by Scott Oaks - Chapter 3
  • Memory Allocation: “The Garbage Collection Handbook” - Chapter 7
  • JVM Object Model: “Inside the Java Virtual Machine” - Chapter 5

Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Project 4 completed

Real world outcome:

// Point.java
public class Point {
    int x;
    int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int distanceSquared() {
        return x * x + y * y;
    }
}

// Main.java
public class Main {
    public static void main(String[] args) {
        Point p = new Point(3, 4);
        int d = p.distanceSquared();
        // d should be 25
    }
}
$ ./minijvm Main
[HEAP] Allocating Point: 24 bytes at 0x10000
  Header: class=Point, hash=0, gc_mark=0
  Fields: x=0, y=0

[EXEC] Main.main
  0: iconst_3
  1: iconst_4
  2: invokespecial #2    // Point.<init>(II)V
     [Setting p.x = 3]
     [Setting p.y = 4]
  5: astore_1            // Store ref in local[1]

[HEAP] Object at 0x10000:
  class: Point
  x: 3
  y: 4

  6: aload_1             // Push ref
  7: invokevirtual #3    // Point.distanceSquared()I

[EXEC] Point.distanceSquared (this=0x10000)
  0: aload_0             // Push 'this'
  1: getfield #1         // x = 3
  4: aload_0
  5: getfield #1         // x = 3
  8: imul                // 9
  9: aload_0
 10: getfield #2         // y = 4
 13: aload_0
 14: getfield #2         // y = 4
 17: imul                // 16
 18: iadd                // 25
 19: ireturn

[RESULT] distanceSquared returned: 25

Implementation Hints:

Object header and layout:

// Object header (every object starts with this)
typedef struct {
    uint32_t hash_and_gc;    // Hash code (24 bits) + GC flags (8 bits)
    struct ClassInfo *klass; // Pointer to class metadata
} ObjectHeader;

// Macro to get the header from an object pointer
#define OBJ_HEADER(obj) ((ObjectHeader*)(obj))
#define OBJ_CLASS(obj)  (OBJ_HEADER(obj)->klass)

// A Point object in memory:
// +0: ObjectHeader (16 bytes on 64-bit)
// +16: x (4 bytes, int)
// +20: y (4 bytes, int)
// Total: 24 bytes

// Class metadata
typedef struct ClassInfo {
    char *name;
    struct ClassInfo *super;
    FieldInfo *fields;
    int field_count;
    int instance_size;       // Size of an instance in bytes
    MethodInfo *methods;
    int method_count;
    ConstantPool *cp;
} ClassInfo;

// Field metadata
typedef struct {
    char *name;
    char *descriptor;        // "I" for int, "Ljava/lang/String;" for String
    int offset;              // Byte offset from object start
    int size;                // Size in bytes
} FieldInfo;

Heap and allocation:

typedef struct {
    uint8_t *start;
    uint8_t *end;
    uint8_t *free;           // Next free address (bump pointer)
} Heap;

Heap heap;

void heap_init(size_t size) {
    heap.start = malloc(size);
    heap.end = heap.start + size;
    heap.free = heap.start;
}

void* heap_alloc(ClassInfo *klass) {
    size_t size = sizeof(ObjectHeader) + klass->instance_size;

    // Align to 8 bytes
    size = (size + 7) & ~7;

    if (heap.free + size > heap.end) {
        // Trigger GC (we'll implement this later)
        gc_collect();
        if (heap.free + size > heap.end) {
            fprintf(stderr, "OutOfMemoryError\n");
            exit(1);
        }
    }

    void *obj = heap.free;
    heap.free += size;

    // Initialize header
    ObjectHeader *header = (ObjectHeader*)obj;
    header->hash_and_gc = 0;
    header->klass = klass;

    // Zero-initialize fields
    memset((uint8_t*)obj + sizeof(ObjectHeader), 0, klass->instance_size);

    return obj;
}

Field access instructions:

case OP_NEW: {
    uint16_t index = read_u2_from_code(frame);
    ClassInfo *klass = resolve_class(vm, frame->cp, index);
    void *obj = heap_alloc(klass);
    frame->operand_stack[frame->sp++].a = obj;
    break;
}

case OP_GETFIELD: {
    uint16_t index = read_u2_from_code(frame);
    FieldInfo *field = resolve_field(vm, frame->cp, index);

    void *obj = frame->operand_stack[--frame->sp].a;
    if (obj == NULL) {
        throw_null_pointer_exception(vm);
    }

    uint8_t *field_ptr = (uint8_t*)obj + sizeof(ObjectHeader) + field->offset;

    // Push field value based on type
    switch (field->descriptor[0]) {
        case 'I':  // int
            frame->operand_stack[frame->sp++].i = *(int32_t*)field_ptr;
            break;
        case 'J':  // long
            frame->operand_stack[frame->sp++].j = *(int64_t*)field_ptr;
            break;
        case 'L':  // reference
        case '[':  // array
            frame->operand_stack[frame->sp++].a = *(void**)field_ptr;
            break;
        // ... other types
    }
    break;
}

case OP_PUTFIELD: {
    uint16_t index = read_u2_from_code(frame);
    FieldInfo *field = resolve_field(vm, frame->cp, index);

    Value value = frame->operand_stack[--frame->sp];
    void *obj = frame->operand_stack[--frame->sp].a;

    if (obj == NULL) {
        throw_null_pointer_exception(vm);
    }

    uint8_t *field_ptr = (uint8_t*)obj + sizeof(ObjectHeader) + field->offset;

    switch (field->descriptor[0]) {
        case 'I':
            *(int32_t*)field_ptr = value.i;
            break;
        case 'L':
        case '[':
            *(void**)field_ptr = value.a;
            break;
        // ... other types
    }
    break;
}

Questions to explore:

  1. How do you handle inheritance and field offset calculation?
  2. How do you implement arrays?
  3. What about String objects (which are special)?
  4. How do you handle null references?

Learning milestones:

  1. new allocates objects → You understand heap allocation
  2. Field access works → You understand object layout
  3. Constructors work → You understand initialization
  4. Methods receive ‘this’ → You understand instance methods

Project 6: Mark-and-Sweep Garbage Collector

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 5: Master
  • Knowledge Area: Garbage Collection / Memory Management
  • Software or Tool: GCC, Valgrind, AddressSanitizer
  • Main Book: “The Garbage Collection Handbook” by Jones, Hosking, Moss

What you’ll build: A mark-and-sweep garbage collector that automatically reclaims unreachable objects, finding roots from the stack and static fields.

Why it teaches JVM: Garbage collection is what makes Java memory-safe. Implementing it yourself teaches you how the JVM tracks live objects, traverses the object graph, and reclaims memory without programmer intervention.

Core challenges you’ll face:

  • Root finding → maps to scanning stacks and static fields
  • Object graph traversal → maps to marking reachable objects
  • Sweeping → maps to reclaiming unmarked memory
  • Fragmentation → maps to why compacting GC exists

Key Concepts:

Difficulty: Master Time estimate: 3-4 weeks Prerequisites: Projects 4-5 completed

Real world outcome:

// GCTest.java
public class GCTest {
    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            Object garbage = new Object();
            // 'garbage' becomes unreachable each iteration
        }
        // All 1000 objects should be collected
    }
}
$ ./minijvm -verbose:gc GCTest
[GC] Heap: 1MB, used: 0KB
[EXEC] Starting main...

[ALLOC] Object #1 at 0x10000 (32 bytes)
[ALLOC] Object #2 at 0x10020 (32 bytes)
...
[ALLOC] Object #100 at 0x10C60 (32 bytes)

[GC] Heap usage: 3200 bytes (0.3%)
[GC] Triggering collection...
[GC] Phase 1: MARK
  Scanning roots...
    - Thread stack: main (1 frames)
    - Static fields: 0 references
  Marking reachable objects...
    - Marked 1 object (current loop variable)
  Total marked: 1

[GC] Phase 2: SWEEP
  Scanning heap from 0x10000 to 0x10C80...
    - 0x10000: NOT marked, freeing 32 bytes
    - 0x10020: NOT marked, freeing 32 bytes
    ...
    - 0x10C60: MARKED, keeping

  Freed: 99 objects, 3168 bytes
  Live: 1 object, 32 bytes

[GC] Collection complete: 3168 bytes reclaimed
[GC] Heap: used 32 bytes (0.003%)

... (loop continues, GC triggers periodically)

[GC] Final stats:
  Total allocations: 1000 objects
  Total collected: 1000 objects
  Peak heap usage: 32KB
  GC cycles: 10

Implementation Hints:

GC data structures:

// Add GC mark bit to object header
typedef struct {
    uint32_t hash    : 24;
    uint32_t gc_mark : 1;   // 0 = white (unmarked), 1 = black (marked)
    uint32_t gc_age  : 7;   // For generational GC later
    struct ClassInfo *klass;
} ObjectHeader;

// Free list for swept objects
typedef struct FreeBlock {
    size_t size;
    struct FreeBlock *next;
} FreeBlock;

typedef struct {
    uint8_t *start;
    uint8_t *end;
    uint8_t *free;          // For bump allocation
    FreeBlock *free_list;   // For free-list allocation after GC
    size_t bytes_allocated;
    size_t gc_threshold;    // Trigger GC when exceeded
} Heap;

Mark phase:

void gc_mark(VM *vm) {
    // Reset all marks
    gc_clear_marks(vm);

    // Find roots and mark from them
    gc_mark_roots(vm);
}

void gc_clear_marks(VM *vm) {
    // Iterate through all allocated objects
    uint8_t *ptr = vm->heap.start;
    while (ptr < vm->heap.free) {
        ObjectHeader *header = (ObjectHeader*)ptr;
        header->gc_mark = 0;  // Clear mark
        ptr += get_object_size(header);
    }
}

void gc_mark_roots(VM *vm) {
    // 1. Scan thread stacks
    for (Frame *frame = vm->current_frame; frame != NULL; frame = frame->prev) {
        // Mark references in local variables
        for (int i = 0; i < frame->locals_count; i++) {
            if (is_reference_type(frame->local_types[i])) {
                gc_mark_object(frame->locals[i].a);
            }
        }

        // Mark references on operand stack
        for (int i = 0; i < frame->sp; i++) {
            if (is_reference_type(frame->stack_types[i])) {
                gc_mark_object(frame->operand_stack[i].a);
            }
        }
    }

    // 2. Scan static fields
    for (int i = 0; i < vm->class_count; i++) {
        ClassInfo *klass = vm->loaded_classes[i];
        for (int j = 0; j < klass->static_field_count; j++) {
            if (is_reference_type(klass->static_fields[j].descriptor)) {
                gc_mark_object(klass->static_values[j].a);
            }
        }
    }
}

void gc_mark_object(void *obj) {
    if (obj == NULL) return;

    ObjectHeader *header = OBJ_HEADER(obj);
    if (header->gc_mark) return;  // Already marked

    header->gc_mark = 1;  // Mark this object

    // Recursively mark referenced objects
    ClassInfo *klass = header->klass;
    for (int i = 0; i < klass->field_count; i++) {
        FieldInfo *field = &klass->fields[i];
        if (is_reference_type(field->descriptor)) {
            uint8_t *field_ptr = (uint8_t*)obj + sizeof(ObjectHeader) + field->offset;
            void *ref = *(void**)field_ptr;
            gc_mark_object(ref);  // Recursive mark
        }
    }

    // If array of references, mark each element
    if (klass->is_array && is_reference_type(klass->component_type)) {
        int length = get_array_length(obj);
        void **elements = get_array_data(obj);
        for (int i = 0; i < length; i++) {
            gc_mark_object(elements[i]);
        }
    }
}

Sweep phase:

void gc_sweep(VM *vm) {
    size_t freed = 0;
    vm->heap.free_list = NULL;

    uint8_t *ptr = vm->heap.start;
    while (ptr < vm->heap.free) {
        ObjectHeader *header = (ObjectHeader*)ptr;
        size_t obj_size = get_object_size(header);

        if (!header->gc_mark) {
            // Object is garbage, add to free list
            FreeBlock *block = (FreeBlock*)ptr;
            block->size = obj_size;
            block->next = vm->heap.free_list;
            vm->heap.free_list = block;
            freed += obj_size;

            printf("[GC] Freed object at %p (%zu bytes)\n", ptr, obj_size);
        }

        ptr += obj_size;
    }

    vm->heap.bytes_allocated -= freed;
    printf("[GC] Sweep complete: %zu bytes freed\n", freed);
}

Allocation with free list:

void* heap_alloc(VM *vm, ClassInfo *klass) {
    size_t size = sizeof(ObjectHeader) + klass->instance_size;
    size = ALIGN(size, 8);

    // Check if GC needed
    if (vm->heap.bytes_allocated + size > vm->heap.gc_threshold) {
        gc_collect(vm);
    }

    // Try free list first
    FreeBlock **prev = &vm->heap.free_list;
    for (FreeBlock *block = vm->heap.free_list; block; block = block->next) {
        if (block->size >= size) {
            // Found a block, remove from free list
            *prev = block->next;
            void *obj = (void*)block;

            // Initialize object
            ObjectHeader *header = (ObjectHeader*)obj;
            header->gc_mark = 0;
            header->klass = klass;
            memset((uint8_t*)obj + sizeof(ObjectHeader), 0, klass->instance_size);

            vm->heap.bytes_allocated += size;
            return obj;
        }
        prev = &block->next;
    }

    // Fall back to bump allocation
    if (vm->heap.free + size > vm->heap.end) {
        gc_collect(vm);
        if (vm->heap.free + size > vm->heap.end) {
            fprintf(stderr, "OutOfMemoryError\n");
            exit(1);
        }
    }

    void *obj = vm->heap.free;
    vm->heap.free += size;
    vm->heap.bytes_allocated += size;

    // Initialize
    ObjectHeader *header = (ObjectHeader*)obj;
    header->gc_mark = 0;
    header->klass = klass;
    memset((uint8_t*)obj + sizeof(ObjectHeader), 0, klass->instance_size);

    return obj;
}

Questions to explore:

  1. How do you avoid stack overflow during recursive marking?
  2. What about circular references?
  3. How do you handle arrays?
  4. What’s the stop-the-world problem?

Learning milestones:

  1. Marking finds all live objects → You understand reachability
  2. Sweeping reclaims dead objects → You understand collection
  3. Program runs without leaks → GC is correct
  4. GC doesn’t crash → Root finding is complete

Project 7: Exception Handling

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Exception Handling / Stack Unwinding
  • Software or Tool: GCC, GDB
  • Main Book: “The Java Virtual Machine Specification” (Chapter 2.10)

What you’ll build: An exception handling system with try-catch-finally, exception tables, and stack unwinding.

Why it teaches JVM: Java’s exception handling is integral to the language. Understanding exception tables, handler searching, and stack unwinding reveals how the JVM maintains control flow safety.

Core challenges you’ll face:

  • Exception table parsing → maps to Code attribute structure
  • Handler matching → maps to type checking and inheritance
  • Stack unwinding → maps to frame cleanup
  • Finally blocks → maps to special bytecode patterns

Key Concepts:

  • Exception Tables: JVM Spec - Exceptions
  • athrow Instruction: JVM Spec Chapter 6

Difficulty: Expert Time estimate: 2 weeks Prerequisites: Projects 4-6 completed

Real world outcome:

public class ExceptionTest {
    public static void main(String[] args) {
        try {
            int result = divide(10, 0);
        } catch (ArithmeticException e) {
            System.out.println("Caught: division by zero");
        }
    }

    public static int divide(int a, int b) {
        if (b == 0) {
            throw new ArithmeticException("/ by zero");
        }
        return a / b;
    }
}
$ ./minijvm ExceptionTest
[EXEC] main
  0: bipush 10
  1: iconst_0
  2: invokestatic divide

[EXEC] divide (a=10, b=0)
  0: iload_1          // b
  1: ifeq 12          // if b == 0, goto 12
  12: new #2          // ArithmeticException
  15: dup
  16: ldc #3          // "/ by zero"
  18: invokespecial #4 // ArithmeticException.<init>(String)
  21: athrow           // THROW!

[EXCEPTION] ArithmeticException thrown at divide:21
[EXCEPTION] Searching for handler in divide...
  No handler found, unwinding frame.
[EXCEPTION] Searching for handler in main...
  Exception table:
    from=0, to=5, target=8, type=ArithmeticException
  Handler found at main:8 for ArithmeticException

[EXEC] main (exception handler)
  8: astore_1         // Store exception in local[1]
  9: getstatic #5     // System.out
  12: ldc #6          // "Caught: division by zero"
  14: invokevirtual #7 // println

Caught: division by zero

Implementation Hints:

Exception table structure:

typedef struct {
    uint16_t start_pc;    // Try block start
    uint16_t end_pc;      // Try block end (exclusive)
    uint16_t handler_pc;  // Catch block start
    uint16_t catch_type;  // 0 = catch all, else index of class in constant pool
} ExceptionHandler;

typedef struct {
    // ... other method info
    uint8_t *code;
    int code_length;
    ExceptionHandler *exception_table;
    int exception_table_length;
} MethodInfo;

Exception throwing:

void throw_exception(VM *vm, void *exception_obj) {
    ClassInfo *exception_class = OBJ_CLASS(exception_obj);

    while (vm->current_frame != NULL) {
        Frame *frame = vm->current_frame;
        MethodInfo *method = frame->method;

        // Search exception table for matching handler
        for (int i = 0; i < method->exception_table_length; i++) {
            ExceptionHandler *handler = &method->exception_table[i];

            // Check if PC is in the try block
            if (frame->pc >= handler->start_pc && frame->pc < handler->end_pc) {

                // Check if exception type matches
                if (handler->catch_type == 0 ||
                    is_instance_of(exception_class, resolve_class(vm, frame->cp, handler->catch_type))) {

                    // Found handler! Jump to it
                    frame->pc = handler->handler_pc;
                    frame->sp = 0;  // Clear operand stack
                    frame->operand_stack[frame->sp++].a = exception_obj;  // Push exception
                    return;  // Continue execution from handler
                }
            }
        }

        // No handler in this frame, unwind
        Frame *prev = frame->prev;
        free_frame(frame);
        vm->current_frame = prev;
    }

    // No handler found anywhere - uncaught exception
    fprintf(stderr, "Exception in thread \"main\" %s\n", exception_class->name);
    print_stack_trace(exception_obj);
    exit(1);
}

athrow instruction:

case OP_ATHROW: {
    void *exception = frame->operand_stack[--frame->sp].a;
    if (exception == NULL) {
        throw_null_pointer_exception(vm);
    } else {
        throw_exception(vm, exception);
    }
    break;
}

Questions to explore:

  1. How does finally work (it’s complex!)?
  2. What about nested try-catch?
  3. How do you implement the stack trace?
  4. What about checked vs unchecked exceptions?

Learning milestones:

  1. Simple throw/catch works → You understand exception tables
  2. Stack unwinding works → You understand frame cleanup
  3. Type matching works → You understand handler search
  4. Finally works → You understand complex control flow

Project 8: Method Dispatch (Virtual, Interface, Special)

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: OOP / Virtual Tables / Polymorphism
  • Software or Tool: GCC
  • Main Book: “The Java Virtual Machine Specification” (Chapter 5.4)

What you’ll build: Complete method dispatch supporting invokevirtual, invokeinterface, invokespecial, and invokestatic with proper virtual table (vtable) implementation.

Why it teaches JVM: Method dispatch is the core of object-oriented execution. Understanding vtables, interface dispatch, and special invocations reveals how polymorphism actually works at the machine level.

Core challenges you’ll face:

  • Virtual table construction → maps to method inheritance
  • Interface dispatch → maps to itable or linear search
  • Special invocation → maps to constructors and private methods
  • Method resolution → maps to constant pool resolution

Key Concepts:

  • Method Invocation: JVM Spec Chapter 5.4
  • Virtual Tables: “Inside the Java Virtual Machine” - Chapter 8
  • Interface Dispatch: Various papers on itable implementation

Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Projects 4-5 completed

Real world outcome:

interface Shape {
    int area();
}

class Rectangle implements Shape {
    int width, height;
    public Rectangle(int w, int h) { width = w; height = h; }
    public int area() { return width * height; }
}

class Circle implements Shape {
    int radius;
    public Circle(int r) { radius = r; }
    public int area() { return (int)(3.14159 * radius * radius); }
}

public class DispatchTest {
    public static void main(String[] args) {
        Shape s1 = new Rectangle(3, 4);
        Shape s2 = new Circle(5);

        int a1 = s1.area();  // Should call Rectangle.area()
        int a2 = s2.area();  // Should call Circle.area()
    }
}
$ ./minijvm DispatchTest
[CLASS] Loading Rectangle
  vtable[0] = Object.hashCode
  vtable[1] = Object.equals
  vtable[2] = Object.toString
  vtable[3] = Rectangle.area        ← overrides/implements

[CLASS] Loading Circle
  vtable[0] = Object.hashCode
  vtable[1] = Object.equals
  vtable[2] = Object.toString
  vtable[3] = Circle.area           ← overrides/implements

[EXEC] main
  0: new Rectangle
  ...
  10: astore_1        // s1 = Rectangle@0x10000

  11: new Circle
  ...
  21: astore_2        // s2 = Circle@0x10100

  22: aload_1         // Push s1 (Rectangle)
  23: invokeinterface Shape.area()I
      [DISPATCH] Receiver type: Rectangle
      [DISPATCH] Looking for: area()I
      [DISPATCH] Found at vtable[3]: Rectangle.area

  28: aload_2         // Push s2 (Circle)
  29: invokeinterface Shape.area()I
      [DISPATCH] Receiver type: Circle
      [DISPATCH] Looking for: area()I
      [DISPATCH] Found at vtable[3]: Circle.area

[RESULT] a1 = 12 (Rectangle 3x4)
[RESULT] a2 = 78 (Circle r=5, ~3.14*25)

Implementation Hints:

Virtual table structure:

typedef struct {
    char *name;
    char *descriptor;
    ClassInfo *declaring_class;
    uint8_t *code;
    int max_stack;
    int max_locals;
    // ... other method info
} MethodInfo;

typedef struct ClassInfo {
    char *name;
    struct ClassInfo *super;
    struct ClassInfo **interfaces;
    int interface_count;

    FieldInfo *fields;
    int field_count;
    int instance_size;

    MethodInfo **vtable;       // Virtual method table
    int vtable_size;

    MethodInfo **methods;      // All methods (including static)
    int method_count;
} ClassInfo;

Building the vtable:

void build_vtable(ClassInfo *klass) {
    ClassInfo *super = klass->super;

    if (super == NULL) {
        // java.lang.Object - base case
        klass->vtable_size = 0;
        klass->vtable = NULL;
    } else {
        // Start with parent's vtable
        klass->vtable_size = super->vtable_size;
        klass->vtable = malloc(sizeof(MethodInfo*) * (super->vtable_size + klass->method_count));
        memcpy(klass->vtable, super->vtable, sizeof(MethodInfo*) * super->vtable_size);
    }

    // Process each method
    for (int i = 0; i < klass->method_count; i++) {
        MethodInfo *method = klass->methods[i];

        if (method->access_flags & ACC_STATIC) continue;
        if (strcmp(method->name, "<init>") == 0) continue;

        // Check if overriding a parent method
        int vtable_index = -1;
        for (int j = 0; j < klass->vtable_size; j++) {
            if (strcmp(klass->vtable[j]->name, method->name) == 0 &&
                strcmp(klass->vtable[j]->descriptor, method->descriptor) == 0) {
                vtable_index = j;
                break;
            }
        }

        if (vtable_index >= 0) {
            // Override existing slot
            klass->vtable[vtable_index] = method;
            method->vtable_index = vtable_index;
        } else {
            // New method, add to vtable
            method->vtable_index = klass->vtable_size;
            klass->vtable[klass->vtable_size++] = method;
        }
    }
}

Method dispatch:

case OP_INVOKEVIRTUAL: {
    uint16_t index = read_u2(frame);
    MethodRef *ref = resolve_method_ref(vm, frame->cp, index);

    // Get receiver object (on stack below arguments)
    int arg_count = count_args(ref->descriptor);
    void *receiver = frame->operand_stack[frame->sp - arg_count - 1].a;

    if (receiver == NULL) {
        throw_null_pointer_exception(vm);
        break;
    }

    // Virtual dispatch: look up in receiver's vtable
    ClassInfo *actual_class = OBJ_CLASS(receiver);
    MethodInfo *method = actual_class->vtable[ref->method->vtable_index];

    invoke_method(vm, method, frame, arg_count + 1);  // +1 for 'this'
    break;
}

case OP_INVOKEINTERFACE: {
    uint16_t index = read_u2(frame);
    uint8_t count = read_u1(frame);  // Argument count (for verification)
    read_u1(frame);  // Must be zero

    MethodRef *ref = resolve_method_ref(vm, frame->cp, index);

    void *receiver = frame->operand_stack[frame->sp - count].a;
    if (receiver == NULL) {
        throw_null_pointer_exception(vm);
        break;
    }

    // Interface dispatch: search for method
    ClassInfo *actual_class = OBJ_CLASS(receiver);
    MethodInfo *method = find_interface_method(actual_class, ref->name, ref->descriptor);

    invoke_method(vm, method, frame, count);
    break;
}

case OP_INVOKESPECIAL: {
    // For constructors, private methods, super calls
    // NO virtual dispatch - use resolved method directly
    uint16_t index = read_u2(frame);
    MethodRef *ref = resolve_method_ref(vm, frame->cp, index);

    int arg_count = count_args(ref->descriptor);
    invoke_method(vm, ref->method, frame, arg_count + 1);
    break;
}

Questions to explore:

  1. Why is invokespecial needed (vs invokevirtual)?
  2. How do you handle interface with multiple inheritance?
  3. What about default methods (Java 8+)?
  4. How do you optimize interface dispatch?

Learning milestones:

  1. Virtual dispatch works → You understand vtables
  2. Interface dispatch works → You understand itable/search
  3. Super calls work → You understand invokespecial
  4. Polymorphism works correctly → Dispatch is correct

Project Comparison Table

Project Difficulty Time Depth of Understanding Fun Factor
1. Stack Calculator VM Intermediate Weekend ⭐⭐⭐ ⭐⭐⭐⭐
2. Class File Parser Advanced 1-2 weeks ⭐⭐⭐⭐ ⭐⭐⭐
3. Bytecode Disassembler Advanced 1-2 weeks ⭐⭐⭐⭐ ⭐⭐⭐⭐
4. Bytecode Interpreter Expert 3-4 weeks ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐
5. Object System & Heap Expert 2-3 weeks ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐
6. Garbage Collector Master 3-4 weeks ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐
7. Exception Handling Expert 2 weeks ⭐⭐⭐⭐ ⭐⭐⭐
8. Method Dispatch Expert 2-3 weeks ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐

Recommendation

Learning Path for Maximum Understanding:

  1. Start with Project 1 (Stack Calculator) — Build intuition for stack-based execution
  2. Then Project 2 (Class Parser) — Understand the bytecode format
  3. Then Project 3 (Disassembler) — Learn every instruction
  4. Then Project 4 (Interpreter) — Make bytecode actually execute
  5. Then Project 5 (Objects) — Add object-oriented features
  6. Then Project 6 (GC) — Add automatic memory management
  7. Then Projects 7-8 — Complete the picture

By Project 6, you’ll have a working JVM that can run real Java programs!


Final Overall Project: Complete Mini-JVM

  • File: LEARN_JVM_INTERNALS_BUILD_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: Virtual Machines / Language Runtime
  • Software or Tool: GCC, GDB, Valgrind
  • Main Book: “The Java Virtual Machine Specification”

What you’ll build: A complete mini-JVM that can run non-trivial Java programs:

  • Full class file parsing
  • Complete bytecode interpreter (~200 instructions)
  • Object system with inheritance
  • Mark-sweep garbage collector
  • Exception handling
  • Virtual and interface method dispatch
  • Basic class loading
  • Native method support (System.out.println)
  • Arrays (primitive and reference)
  • Strings

Real world outcome:

$ ./minijvm examples/Fibonacci.class
Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55

$ ./minijvm examples/BinaryTree.class
Building tree...
Inserting: 5, 3, 7, 1, 4, 6, 8
In-order traversal: 1 3 4 5 6 7 8
Tree height: 3

$ ./minijvm examples/LinkedList.class
Adding elements...
List: [1, 2, 3, 4, 5]
Removing 3...
List: [1, 2, 4, 5]

Resources Summary

Essential Books

  • “The Java Virtual Machine Specification” — The authoritative reference (free online)
  • “Inside the Java Virtual Machine” by Bill Venners — Excellent conceptual explanations
  • “The Garbage Collection Handbook” by Jones, Hosking, Moss — The definitive GC reference

Online Tutorials

GitHub Projects to Study

  • jvm.c — JVM in C
  • tojvm — Minimal JVM implementation

Official Documentation


Summary

# Project Main Language
1 Stack-Based Calculator VM C
2 Parse Java Class Files C
3 Bytecode Disassembler C
4 Basic Bytecode Interpreter C
5 Object System and Heap C
6 Mark-and-Sweep Garbage Collector C
7 Exception Handling C
8 Method Dispatch C
Final Complete Mini-JVM C

Sources