
LEARN V8 JAVASCRIPT ENGINE DEEP DIVE

Learn V8 & Build a JavaScript Engine From Scratch

Goal: Deeply understand V8’s internals—parsing, bytecode compilation, interpretation, JIT compilation, garbage collection, and object representation. Then build your own JavaScript engine in C from scratch to truly master these concepts.


Why V8 Matters

V8 is arguably one of the most sophisticated pieces of software ever written. It powers:

  • Chrome (billions of users)
  • Node.js (server-side JavaScript)
  • Deno (modern JavaScript runtime)
  • Electron (desktop apps like VS Code, Slack)

Understanding V8 teaches you:

  1. Compiler construction: Lexing, parsing, AST generation
  2. Interpreter design: Bytecode and virtual machines
  3. JIT compilation: Generating machine code at runtime
  4. Garbage collection: Automatic memory management
  5. Runtime optimization: Hidden classes, inline caching
  6. Low-level systems: Memory layout, CPU caches, assembly

V8 Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                              V8 Engine                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   JavaScript Source Code                                                 │
│         │                                                                │
│         ▼                                                                │
│   ┌─────────────┐                                                        │
│   │   Parser    │ ─────────────────────────────────────────────────────┐ │
│   │  (Scanner + │         Produces AST                                 │ │
│   │   Parser)   │                                                      │ │
│   └──────┬──────┘                                                      │ │
│          │                                                             │ │
│          ▼                                                             │ │
│   ┌─────────────┐                                                      │ │
│   │  Ignition   │ ◄── Bytecode Compiler                                │ │
│   │ (Interpreter)│         │                                           │ │
│   └──────┬──────┘          │                                           │ │
│          │                 │ Collects type feedback                    │ │
│          │                 │ (profiling information)                   │ │
│          ▼                 ▼                                           │ │
│   ┌─────────────┐   ┌─────────────┐                                    │ │
│   │  Sparkplug  │   │   Type      │                                    │ │
│   │  (Baseline  │   │  Feedback   │                                    │ │
│   │   Compiler) │   │   Vector    │                                    │ │
│   └──────┬──────┘   └──────┬──────┘                                    │ │
│          │                 │                                           │ │
│          ▼                 ▼                                           │ │
│   ┌─────────────┐   ┌─────────────┐                                    │ │
│   │   Maglev    │◄──│   "Hot"     │                                    │ │
│   │  (Mid-tier  │   │   Code      │                                    │ │
│   │   Compiler) │   │   Detected  │                                    │ │
│   └──────┬──────┘   └─────────────┘                                    │ │
│          │                                                             │ │
│          ▼                                                             │ │
│   ┌─────────────┐                                                      │ │
│   │  TurboFan   │ ◄── Optimizing JIT Compiler                          │ │
│   │   (JIT)     │         │                                            │ │
│   └──────┬──────┘         │                                            │ │
│          │                │ Speculative optimizations                  │ │
│          │                │ based on type feedback                     │ │
│          ▼                ▼                                            │ │
│   ┌─────────────────────────────────────────────────────────────────┐  │ │
│   │                    Machine Code                                  │  │ │
│   │              (x64, ARM64, etc.)                                  │  │ │
│   └─────────────────────────────────────────────────────────────────┘  │ │
│                          │                                             │ │
│                          │ If speculation fails                        │ │
│                          │ (type mismatch, etc.)                       │ │
│                          ▼                                             │ │
│                   ┌─────────────┐                                      │ │
│                   │Deoptimize   │ ──► Back to Ignition bytecode        │ │
│                   └─────────────┘                                      │ │
│                                                                          │
├─────────────────────────────────────────────────────────────────────────┤
│                          Memory Management                               │
│  ┌────────────────────────────────────────────────────────────────────┐ │
│  │                    Orinoco Garbage Collector                        │ │
│  │  ┌──────────────────┐    ┌──────────────────────────────────────┐  │ │
│  │  │  Young Generation │    │         Old Generation               │  │ │
│  │  │  ┌──────┬───────┐ │    │  ┌────────────────────────────────┐ │  │ │
│  │  │  │ From │  To   │ │    │  │ Mark-Sweep-Compact             │ │  │ │
│  │  │  │Space │ Space │ │    │  │ (Concurrent/Parallel)          │ │  │ │
│  │  │  └──────┴───────┘ │    │  └────────────────────────────────┘ │  │ │
│  │  │  Scavenger (fast) │    │                                      │  │ │
│  │  └──────────────────┘    └──────────────────────────────────────┘  │ │
│  └────────────────────────────────────────────────────────────────────┘ │
│                                                                          │
├─────────────────────────────────────────────────────────────────────────┤
│                          Object Representation                           │
│  ┌────────────────────────────────────────────────────────────────────┐ │
│  │  Hidden Classes (Maps/Shapes)          Inline Caching              │ │
│  │  ┌─────────────┐                       ┌─────────────────────────┐ │ │
│  │  │ Object {    │──► Map ──────────────►│ Monomorphic IC          │ │ │
│  │  │   x: 1,     │    (shape descriptor) │ (single type → fast)    │ │ │
│  │  │   y: 2      │                       │                         │ │ │
│  │  │ }           │                       │ Polymorphic IC          │ │ │
│  │  └─────────────┘                       │ (few types → ok)        │ │ │
│  │                                        │                         │ │ │
│  │  Property Storage:                     │ Megamorphic IC          │ │ │
│  │  [in-object] → [out-of-object array]   │ (many types → slow)     │ │ │
│  └────────────────────────────────────────┴─────────────────────────┘ │ │
└─────────────────────────────────────────────────────────────────────────┘

The Compilation Pipeline Explained

1. Parsing

Source code → Tokens → Abstract Syntax Tree (AST)

function add(a, b) { return a + b; }

Becomes tokens: FUNCTION, IDENT(add), LPAREN, IDENT(a), COMMA, IDENT(b), RPAREN, LBRACE, RETURN, IDENT(a), PLUS, IDENT(b), SEMICOLON, RBRACE

Then AST:

FunctionDeclaration
├── name: "add"
├── params: [a, b]
└── body: ReturnStatement
         └── BinaryExpression
             ├── left: Identifier(a)
             ├── op: +
             └── right: Identifier(b)

2. Ignition (Bytecode)

AST → Bytecode (compact, portable instructions)

Ldar a1        ; Load argument 1 into accumulator
Add a2         ; Add argument 2 to accumulator
Return         ; Return accumulator

3. TurboFan (JIT)

Bytecode + Type Feedback → Optimized Machine Code

; If we know a and b are integers:
mov eax, [rbp+16]   ; Load a
add eax, [rbp+24]   ; Add b
ret                  ; Return

Project List

We’ll build progressively from simple components to a complete JavaScript engine.


Project 1: JavaScript Lexer (Scanner/Tokenizer)

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Lexical Analysis / Text Processing
  • Software or Tool: Custom implementation
  • Main Book: “Crafting Interpreters” by Robert Nystrom

What you’ll build: A lexer that tokenizes JavaScript source code, handling identifiers, keywords, numbers (including floats and hex), strings (with escape sequences), operators, and all JavaScript punctuation.

Why it teaches V8: Every JavaScript engine starts with a lexer. V8’s scanner is highly optimized, but understanding the fundamentals—character classification, lookahead, token representation—is essential before moving to parsing.

Core challenges you’ll face:

  • Recognizing keywords vs identifiers → maps to reserved word detection
  • Handling string escape sequences → maps to \n, \t, \u0000, etc.
  • Number literal parsing → maps to integers, floats, hex, binary, octal
  • Handling regex literals → maps to context-sensitive lexing

Key Concepts:

  • Lexical Grammar: ECMAScript Lexical Grammar
  • Scanner Implementation: “Crafting Interpreters” Chapter 4
  • Finite Automata: Understanding state machines for tokenization

Difficulty: Intermediate. Time estimate: 1 week. Prerequisites: Basic C programming; understanding of character encoding (ASCII, UTF-8).

Real world outcome:

// Your lexer in action:
$ ./jslex "let x = 42 + 'hello';"

Tokens:
  [LET]        "let"       line 1, col 1
  [IDENT]      "x"         line 1, col 5
  [ASSIGN]     "="         line 1, col 7
  [NUMBER]     "42"        line 1, col 9
  [PLUS]       "+"         line 1, col 12
  [STRING]     "'hello'"   line 1, col 14
  [SEMICOLON]  ";"         line 1, col 22
  [EOF]                    line 1, col 23

$ ./jslex "function foo(a, b) { return a + b; }"

Tokens:
  [FUNCTION]   "function"  line 1, col 1
  [IDENT]      "foo"       line 1, col 10
  [LPAREN]     "("         line 1, col 13
  [IDENT]      "a"         line 1, col 14
  [COMMA]      ","         line 1, col 15
  [IDENT]      "b"         line 1, col 17
  [RPAREN]     ")"         line 1, col 18
  ...

Implementation Hints:

Token structure:

typedef enum {
    // Literals
    TOKEN_NUMBER,
    TOKEN_STRING,
    TOKEN_IDENT,
    TOKEN_REGEX,

    // Keywords
    TOKEN_LET, TOKEN_CONST, TOKEN_VAR,
    TOKEN_FUNCTION, TOKEN_RETURN, TOKEN_IF, TOKEN_ELSE,
    TOKEN_FOR, TOKEN_WHILE, TOKEN_DO, TOKEN_BREAK, TOKEN_CONTINUE,
    TOKEN_TRUE, TOKEN_FALSE, TOKEN_NULL, TOKEN_UNDEFINED,
    TOKEN_CLASS, TOKEN_NEW, TOKEN_THIS, TOKEN_SUPER,
    TOKEN_IMPORT, TOKEN_EXPORT, TOKEN_DEFAULT,
    TOKEN_TRY, TOKEN_CATCH, TOKEN_FINALLY, TOKEN_THROW,
    TOKEN_ASYNC, TOKEN_AWAIT, TOKEN_YIELD,

    // Operators
    TOKEN_PLUS, TOKEN_MINUS, TOKEN_STAR, TOKEN_SLASH, TOKEN_PERCENT,
    TOKEN_PLUS_PLUS, TOKEN_MINUS_MINUS,
    TOKEN_EQ, TOKEN_EQ_EQ, TOKEN_EQ_EQ_EQ,
    TOKEN_BANG, TOKEN_BANG_EQ, TOKEN_BANG_EQ_EQ,
    TOKEN_LT, TOKEN_LT_EQ, TOKEN_GT, TOKEN_GT_EQ,
    TOKEN_AND_AND, TOKEN_OR_OR, TOKEN_QUESTION, TOKEN_COLON,
    TOKEN_PLUS_EQ, TOKEN_MINUS_EQ, TOKEN_STAR_EQ, TOKEN_SLASH_EQ,  // Compound assignment
    TOKEN_ARROW,  // =>

    // Punctuation
    TOKEN_LPAREN, TOKEN_RPAREN,
    TOKEN_LBRACE, TOKEN_RBRACE,
    TOKEN_LBRACKET, TOKEN_RBRACKET,
    TOKEN_COMMA, TOKEN_SEMICOLON, TOKEN_DOT,

    TOKEN_EOF,
    TOKEN_ERROR
} TokenType;

typedef struct {
    TokenType type;
    const char* start;  // Pointer into source
    int length;
    int line;
    int column;

    // For number literals
    double number_value;
} Token;

typedef struct {
    const char* source;
    const char* current;
    const char* token_start;
    int line;
    int column;
} Lexer;

Core lexer logic:

Token lexer_next_token(Lexer* lexer) {
    skip_whitespace_and_comments(lexer);

    lexer->token_start = lexer->current;

    if (is_at_end(lexer)) {
        return make_token(lexer, TOKEN_EOF);
    }

    char c = advance(lexer);

    // Identifiers and keywords
    if (is_alpha(c) || c == '_' || c == '$') {
        return scan_identifier(lexer);
    }

    // Numbers
    if (is_digit(c)) {
        return scan_number(lexer);
    }

    // Strings
    if (c == '"' || c == '\'' || c == '`') {
        return scan_string(lexer, c);
    }

    // Operators and punctuation
    switch (c) {
        case '(': return make_token(lexer, TOKEN_LPAREN);
        case ')': return make_token(lexer, TOKEN_RPAREN);
        case '{': return make_token(lexer, TOKEN_LBRACE);
        case '}': return make_token(lexer, TOKEN_RBRACE);
        case '+':
            if (match(lexer, '+')) return make_token(lexer, TOKEN_PLUS_PLUS);
            if (match(lexer, '=')) return make_token(lexer, TOKEN_PLUS_EQ);
            return make_token(lexer, TOKEN_PLUS);
        case '=':
            if (match(lexer, '=')) {
                if (match(lexer, '=')) return make_token(lexer, TOKEN_EQ_EQ_EQ);
                return make_token(lexer, TOKEN_EQ_EQ);
            }
            if (match(lexer, '>')) return make_token(lexer, TOKEN_ARROW);
            return make_token(lexer, TOKEN_EQ);
        // ... more cases
    }

    return error_token(lexer, "Unexpected character");
}
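
The helper functions called above (advance, peek, match, is_at_end, make_token) are small; one possible implementation, assuming the Lexer struct defined earlier. skip_whitespace_and_comments would similarly loop over spaces, // and /* */ comments, bumping lexer->line and resetting lexer->column on newlines:

static bool is_at_end(Lexer* lexer) {
    return *lexer->current == '\0';
}

static char advance(Lexer* lexer) {
    lexer->column++;
    return *lexer->current++;
}

static char peek(Lexer* lexer) {
    return *lexer->current;
}

// Consume the next character only if it matches `expected`
static bool match(Lexer* lexer, char expected) {
    if (is_at_end(lexer) || *lexer->current != expected) return false;
    lexer->current++;
    lexer->column++;
    return true;
}

static Token make_token(Lexer* lexer, TokenType type) {
    Token token;
    token.type = type;
    token.start = lexer->token_start;
    token.length = (int)(lexer->current - lexer->token_start);
    token.line = lexer->line;
    token.column = lexer->column - token.length;  // Column where the token began
    return token;
}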

Token scan_identifier(Lexer* lexer) {
    while (is_alpha_numeric(peek(lexer)) || peek(lexer) == '_' || peek(lexer) == '$') {
        advance(lexer);
    }

    // Check if it's a keyword
    TokenType type = check_keyword(lexer);
    return make_token(lexer, type);
}

TokenType check_keyword(Lexer* lexer) {
    int length = lexer->current - lexer->token_start;

    // Use a trie or hash table for efficiency
    // Simple version: check each keyword
    switch (lexer->token_start[0]) {
        case 'f':
            if (length == 8 && memcmp(lexer->token_start, "function", 8) == 0)
                return TOKEN_FUNCTION;
            if (length == 5 && memcmp(lexer->token_start, "false", 5) == 0)
                return TOKEN_FALSE;
            if (length == 3 && memcmp(lexer->token_start, "for", 3) == 0)
                return TOKEN_FOR;
            break;
        case 'l':
            if (length == 3 && memcmp(lexer->token_start, "let", 3) == 0)
                return TOKEN_LET;
            break;
        // ... more keywords
    }

    return TOKEN_IDENT;
}
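
scan_string is called from lexer_next_token but not shown; a minimal sketch that recognizes escape sequences without decoding them (the token keeps the raw source slice, so turning \n or \u0041 into real characters can be deferred until the literal's value is needed; a full implementation would also handle ${} substitutions inside backtick template literals):

// `quote` is the opening quote character: ' " or `
Token scan_string(Lexer* lexer, char quote) {
    while (!is_at_end(lexer) && peek(lexer) != quote) {
        char c = advance(lexer);
        if (c == '\\' && !is_at_end(lexer)) {
            advance(lexer);            // Skip the escaped character: \n, \t, \', \\ ...
        } else if (c == '\n') {
            lexer->line++;             // Raw newlines are only legal in template literals
            lexer->column = 1;
        }
    }

    if (is_at_end(lexer)) {
        return error_token(lexer, "Unterminated string");
    }

    advance(lexer);                    // Consume the closing quote
    return make_token(lexer, TOKEN_STRING);
}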

Learning milestones:

  1. You tokenize basic expressions → You understand character-by-character processing
  2. You handle all number formats → You understand JavaScript’s numeric literals
  3. You distinguish keywords from identifiers → You understand reserved words
  4. You handle template literals → You understand context-sensitive lexing

Project 2: JavaScript Parser (AST Generator)

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Parsing / Syntax Analysis / Grammar
  • Software or Tool: Custom implementation
  • Main Book: “Crafting Interpreters” by Robert Nystrom

What you’ll build: A recursive descent parser that produces an Abstract Syntax Tree (AST) from JavaScript tokens, handling expressions (with proper precedence), statements, functions, and classes.

Why it teaches V8: V8’s parser is a hand-written recursive descent parser (not generated). Understanding operator precedence, expression parsing, and statement parsing is fundamental to any language implementation.

Core challenges you’ll face:

  • Operator precedence → maps to Pratt parsing or precedence climbing
  • Left recursion elimination → maps to expression parsing challenges
  • Statement vs expression ambiguity → maps to JavaScript’s grammar quirks
  • Automatic semicolon insertion → maps to JavaScript’s ASI rules

Key Concepts:

  • ECMAScript Grammar: ECMA-262 Syntax
  • Pratt Parsing: “Crafting Interpreters” Chapter 17
  • Recursive Descent: “Crafting Interpreters” Chapters 6, 8

Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Project 1 completed; understanding of context-free grammars.

Real world outcome:

// Your parser in action:
$ ./jsparse "let x = 1 + 2 * 3;"

AST:
Program
└── VariableDeclaration (let)
    └── VariableDeclarator
        ├── id: Identifier "x"
        └── init: BinaryExpression (+)
            ├── left: Literal 1
            └── right: BinaryExpression (*)
                ├── left: Literal 2
                └── right: Literal 3

$ ./jsparse "function add(a, b) { return a + b; }"

AST:
Program
└── FunctionDeclaration
    ├── id: Identifier "add"
    ├── params: [Identifier "a", Identifier "b"]
    └── body: BlockStatement
        └── ReturnStatement
            └── argument: BinaryExpression (+)
                ├── left: Identifier "a"
                └── right: Identifier "b"

Implementation Hints:

AST node structures:

typedef enum {
    NODE_PROGRAM,
    NODE_LITERAL,
    NODE_IDENTIFIER,
    NODE_BINARY_EXPR,
    NODE_UNARY_EXPR,
    NODE_CALL_EXPR,
    NODE_MEMBER_EXPR,
    NODE_ARRAY_EXPR,
    NODE_OBJECT_EXPR,
    NODE_FUNCTION_DECL,
    NODE_FUNCTION_EXPR,
    NODE_ARROW_FUNCTION,
    NODE_VAR_DECL,
    NODE_BLOCK_STMT,
    NODE_IF_STMT,
    NODE_WHILE_STMT,
    NODE_FOR_STMT,
    NODE_RETURN_STMT,
    NODE_EXPR_STMT,
    // ... more
} NodeType;

typedef struct ASTNode {
    NodeType type;
    int line;
    int column;

    union {
        // Literal
        struct {
            enum { LIT_NUMBER, LIT_STRING, LIT_BOOL, LIT_NULL, LIT_UNDEFINED } kind;
            union {
                double number;
                char* string;
                bool boolean;
            };
        } literal;

        // Identifier
        struct {
            char* name;
        } identifier;

        // Binary expression
        struct {
            TokenType op;
            struct ASTNode* left;
            struct ASTNode* right;
        } binary;

        // Function declaration
        struct {
            char* name;
            struct ASTNode** params;
            int param_count;
            struct ASTNode* body;
        } function;

        // ... more node types
    };
} ASTNode;

Pratt parser for expressions (handles precedence elegantly):

typedef enum {
    PREC_NONE,
    PREC_ASSIGNMENT,  // =
    PREC_OR,          // ||
    PREC_AND,         // &&
    PREC_EQUALITY,    // == !=
    PREC_COMPARISON,  // < > <= >=
    PREC_TERM,        // + -
    PREC_FACTOR,      // * /
    PREC_UNARY,       // ! -
    PREC_CALL,        // . ()
    PREC_PRIMARY
} Precedence;

typedef ASTNode* (*ParseFn)(Parser* parser, bool can_assign);
typedef ASTNode* (*InfixParseFn)(Parser* parser, ASTNode* left, bool can_assign);

typedef struct {
    ParseFn prefix;
    InfixParseFn infix;
    Precedence precedence;
} ParseRule;

// The heart of Pratt parsing
ASTNode* parse_precedence(Parser* parser, Precedence precedence) {
    advance(parser);
    ParseFn prefix_rule = get_rule(parser->previous.type)->prefix;

    if (prefix_rule == NULL) {
        error(parser, "Expect expression.");
        return NULL;
    }

    bool can_assign = precedence <= PREC_ASSIGNMENT;
    ASTNode* left = prefix_rule(parser, can_assign);

    while (precedence <= get_rule(parser->current.type)->precedence) {
        advance(parser);
        InfixParseFn infix_rule = get_rule(parser->previous.type)->infix;
        left = infix_rule(parser, left, can_assign);
    }

    return left;
}

// Parse rules table
ParseRule rules[] = {
    [TOKEN_LPAREN]    = {grouping,  call,    PREC_CALL},
    [TOKEN_MINUS]     = {unary,     binary,  PREC_TERM},
    [TOKEN_PLUS]      = {NULL,      binary,  PREC_TERM},
    [TOKEN_SLASH]     = {NULL,      binary,  PREC_FACTOR},
    [TOKEN_STAR]      = {NULL,      binary,  PREC_FACTOR},
    [TOKEN_NUMBER]    = {number,    NULL,    PREC_NONE},
    [TOKEN_STRING]    = {string,    NULL,    PREC_NONE},
    [TOKEN_IDENT]     = {identifier,NULL,    PREC_NONE},
    [TOKEN_EQ_EQ]     = {NULL,      binary,  PREC_EQUALITY},
    [TOKEN_EQ_EQ_EQ]  = {NULL,      binary,  PREC_EQUALITY},
    [TOKEN_LT]        = {NULL,      binary,  PREC_COMPARISON},
    [TOKEN_GT]        = {NULL,      binary,  PREC_COMPARISON},
    [TOKEN_AND_AND]   = {NULL,      and_,    PREC_AND},
    [TOKEN_OR_OR]     = {NULL,      or_,     PREC_OR},
    // ... more rules
};

ASTNode* binary(Parser* parser, ASTNode* left, bool can_assign) {
    TokenType op = parser->previous.type;
    ParseRule* rule = get_rule(op);

    // Parse right side with higher precedence (left-associative)
    ASTNode* right = parse_precedence(parser, (Precedence)(rule->precedence + 1));

    ASTNode* node = make_node(NODE_BINARY_EXPR);
    node->binary.op = op;
    node->binary.left = left;
    node->binary.right = right;
    return node;
}
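
The rules table refers to prefix functions such as number, identifier, and grouping, and something has to kick off parsing at the lowest precedence. A minimal sketch of those pieces; copy_string (duplicate a source slice into a heap string) and consume (require a specific token or report an error) are assumed helpers:

ASTNode* expression(Parser* parser) {
    return parse_precedence(parser, PREC_ASSIGNMENT);
}

ASTNode* number(Parser* parser, bool can_assign) {
    ASTNode* node = make_node(NODE_LITERAL);
    node->literal.kind = LIT_NUMBER;
    node->literal.number = parser->previous.number_value;
    return node;
}

ASTNode* identifier(Parser* parser, bool can_assign) {
    ASTNode* node = make_node(NODE_IDENTIFIER);
    node->identifier.name = copy_string(parser->previous.start,
                                        parser->previous.length);
    return node;
}

ASTNode* grouping(Parser* parser, bool can_assign) {
    ASTNode* node = expression(parser);   // Parse the parenthesized expression
    consume(parser, TOKEN_RPAREN, "Expect ')' after expression.");
    return node;
}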

Learning milestones:

  1. You parse expressions with correct precedence → You understand Pratt parsing
  2. You parse statements → You understand statement grammar
  3. You handle functions and closures → You understand scope
  4. You produce a correct AST → You can build on this for compilation

Project 3: AST Interpreter (Tree-Walking)

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Interpretation / Semantics / Runtime
  • Software or Tool: Custom implementation
  • Main Book: “Crafting Interpreters” by Robert Nystrom

What you’ll build: An interpreter that walks the AST and executes JavaScript code directly, implementing variables, functions, closures, objects, and basic built-ins.

Why it teaches V8: V8 itself has never walked an AST to execute code (its original Full-codegen baseline compiled straight to machine code), but many early JavaScript engines did. Understanding how to execute an AST teaches you JavaScript semantics before you optimize them.

Core challenges you’ll face:

  • Value representation → maps to tagged unions, NaN-boxing
  • Environment/scope chains → maps to lexical scoping, closures
  • Function calls → maps to call stack, arguments
  • Object property access → maps to prototype chain

Key Concepts:

  • JavaScript Semantics: ECMAScript Specification
  • Closures: Understanding lexical environments
  • Tree-Walking Interpretation: “Crafting Interpreters” Part II

Difficulty: Intermediate. Time estimate: 2 weeks. Prerequisites: Projects 1-2 completed; understanding of JavaScript semantics.

Real world outcome:

$ ./jsinterp

MiniJS> let x = 10;
undefined

MiniJS> let add = function(a, b) { return a + b; };
undefined

MiniJS> add(x, 5);
15

MiniJS> let counter = function() {
...>     let count = 0;
...>     return function() {
...>         count = count + 1;
...>         return count;
...>     };
...> };
undefined

MiniJS> let c = counter();
undefined

MiniJS> c();
1

MiniJS> c();
2

MiniJS> c();
3

MiniJS> let obj = { name: "Alice", greet: function() { return "Hello, " + this.name; } };
undefined

MiniJS> obj.greet();
"Hello, Alice"

Implementation Hints:

JavaScript value representation:

typedef enum {
    VAL_UNDEFINED,
    VAL_NULL,
    VAL_BOOL,
    VAL_NUMBER,
    VAL_STRING,
    VAL_OBJECT,
    VAL_FUNCTION,
    VAL_ARRAY
} ValueType;

typedef struct Value {
    ValueType type;
    union {
        bool boolean;
        double number;
        char* string;
        struct Object* object;
        struct Function* function;
        struct Array* array;
    };
} Value;

// For efficiency, you can use NaN-boxing (fits in 8 bytes):
// 64-bit double has lots of unused NaN bit patterns
// Use those patterns to encode pointers and other types
typedef uint64_t NanBoxedValue;

#define QNAN     ((uint64_t)0x7ffc000000000000)
#define SIGN_BIT ((uint64_t)0x8000000000000000)

#define TAG_NIL   1  // 01
#define TAG_FALSE 2  // 10
#define TAG_TRUE  3  // 11

#define IS_NUMBER(v)  (((v) & QNAN) != QNAN)
#define IS_NIL(v)     ((v) == (QNAN | TAG_NIL))
#define IS_BOOL(v)    (((v) | 1) == (QNAN | TAG_TRUE))
#define IS_OBJ(v)     (((v) & (QNAN | SIGN_BIT)) == (QNAN | SIGN_BIT))

Environment for scoping:

typedef struct Environment {
    struct Environment* enclosing;  // Parent scope
    HashTable variables;            // Variable name → Value
} Environment;

Environment* environment_new(Environment* enclosing) {
    Environment* env = malloc(sizeof(Environment));
    env->enclosing = enclosing;
    hash_table_init(&env->variables);
    return env;
}

void environment_define(Environment* env, const char* name, Value value) {
    hash_table_set(&env->variables, name, value);
}

Value environment_get(Environment* env, const char* name) {
    Value value;
    if (hash_table_get(&env->variables, name, &value)) {
        return value;
    }
    if (env->enclosing != NULL) {
        return environment_get(env->enclosing, name);
    }
    // Error: undefined variable
    runtime_error("Undefined variable '%s'", name);
    return (Value){ .type = VAL_UNDEFINED };  // Unreachable if runtime_error aborts
}

AST evaluation:

Value evaluate(Interpreter* interp, ASTNode* node) {
    switch (node->type) {
        case NODE_LITERAL:
            return evaluate_literal(node);

        case NODE_IDENTIFIER:
            return environment_get(interp->env, node->identifier.name);

        case NODE_BINARY_EXPR: {
            Value left = evaluate(interp, node->binary.left);
            Value right = evaluate(interp, node->binary.right);
            return evaluate_binary(node->binary.op, left, right);
        }

        case NODE_CALL_EXPR: {
            Value callee = evaluate(interp, node->call.callee);
            if (callee.type != VAL_FUNCTION) {
                runtime_error("Can only call functions");
            }

            // Evaluate arguments
            Value* args = malloc(sizeof(Value) * node->call.arg_count);
            for (int i = 0; i < node->call.arg_count; i++) {
                args[i] = evaluate(interp, node->call.arguments[i]);
            }

            return call_function(interp, callee.function, args, node->call.arg_count);
        }

        case NODE_FUNCTION_EXPR: {
            // Create function with closure over current environment
            Function* fn = malloc(sizeof(Function));
            fn->params = node->function.params;
            fn->param_count = node->function.param_count;
            fn->body = node->function.body;
            fn->closure = interp->env;  // Capture environment!

            Value val = { .type = VAL_FUNCTION, .function = fn };
            return val;
        }

        case NODE_VAR_DECL: {
            Value value = { .type = VAL_UNDEFINED };
            if (node->var_decl.init != NULL) {
                value = evaluate(interp, node->var_decl.init);
            }
            environment_define(interp->env, node->var_decl.name, value);
            return (Value){ .type = VAL_UNDEFINED };
        }

        // ... more cases
    }
}

Value call_function(Interpreter* interp, Function* fn, Value* args, int arg_count) {
    // Create new environment for function body
    // Parent is the CLOSURE environment, not the call site!
    Environment* fn_env = environment_new(fn->closure);

    // Bind parameters
    for (int i = 0; i < fn->param_count; i++) {
        Value arg = (i < arg_count) ? args[i] : (Value){ .type = VAL_UNDEFINED };
        environment_define(fn_env, fn->params[i]->identifier.name, arg);
    }

    // Execute body
    Environment* previous = interp->env;
    interp->env = fn_env;

    Value result = execute_block(interp, fn->body);

    interp->env = previous;
    return result;
}
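
evaluate_binary, referenced in the NODE_BINARY_EXPR case, is where JavaScript's coercion rules start to show up. A minimal sketch for numbers plus '+' string concatenation; to_number, to_string, and string_concat are assumed helpers implementing (simplified) ToNumber/ToString:

Value evaluate_binary(TokenType op, Value left, Value right) {
    // '+' concatenates if either operand is a string (simplified ToPrimitive rules)
    if (op == TOKEN_PLUS &&
        (left.type == VAL_STRING || right.type == VAL_STRING)) {
        Value v = { .type = VAL_STRING };
        v.string = string_concat(to_string(left), to_string(right));
        return v;
    }

    // Everything else coerces both operands to number: "5" * 2 == 10
    double a = to_number(left);
    double b = to_number(right);

    switch (op) {
        case TOKEN_PLUS:  return (Value){ .type = VAL_NUMBER, .number = a + b };
        case TOKEN_MINUS: return (Value){ .type = VAL_NUMBER, .number = a - b };
        case TOKEN_STAR:  return (Value){ .type = VAL_NUMBER, .number = a * b };
        case TOKEN_SLASH: return (Value){ .type = VAL_NUMBER, .number = a / b };
        case TOKEN_LT:    return (Value){ .type = VAL_BOOL,   .boolean = a < b };
        case TOKEN_GT:    return (Value){ .type = VAL_BOOL,   .boolean = a > b };
        default:
            runtime_error("Unsupported binary operator");
            return (Value){ .type = VAL_UNDEFINED };
    }
}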

Learning milestones:

  1. You evaluate expressions → You understand JavaScript operators
  2. You implement variables and scope → You understand environments
  3. You implement closures → You understand lexical scoping
  4. You implement objects → You understand JavaScript’s object model

Project 4: Bytecode Compiler

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Compilation / Code Generation / Bytecode
  • Software or Tool: Custom implementation
  • Main Book: “Crafting Interpreters” by Robert Nystrom

What you’ll build: A compiler that translates JavaScript AST to bytecode—a compact, linear sequence of instructions that a virtual machine can execute efficiently.

Why it teaches V8: V8’s Ignition is a bytecode interpreter. Bytecode is smaller than AST, faster to interpret, and easier to optimize. This is the foundation of modern JavaScript engines.

Core challenges you’ll face:

  • Instruction set design → maps to what operations do you need?
  • Register vs stack machine → maps to Ignition’s register-based design
  • Constant pool → maps to storing literals efficiently
  • Control flow → maps to jumps, loops, conditionals

Key Concepts:

  • V8 Bytecode: Understanding V8’s Bytecode
  • Bytecode Design: “Crafting Interpreters” Chapter 14
  • Stack vs Register Machines: Comparing approaches

Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Projects 1-3 completed; understanding of assembly concepts.

Real world outcome:

$ ./jscompile "function add(a, b) { return a + b; }"

Bytecode for function 'add':
  Constants: []
  Parameters: 2 (a, b)
  Registers: 3

  0000: Ldar r1          ; Load register 1 (param 'a') into accumulator
  0002: Add r2           ; Add register 2 (param 'b') to accumulator
  0004: Return           ; Return accumulator

$ ./jscompile "let x = 1; if (x > 0) { x = x + 1; } else { x = x - 1; }"

Bytecode for <script>:
  Constants: [1, 0, 1, 1]
  Locals: 1 (x)
  Registers: 2

  0000: LdaConstant [0]       ; Load 1
  0002: Star r0               ; Store in r0 (x)
  0004: Ldar r0               ; Load x
  0006: LdaConstant [1]       ; Load 0
  0008: TestGreaterThan r0    ; Compare x > 0
  0010: JumpIfFalse [22]      ; Jump to else if false
  0012: Ldar r0               ; Load x
  0014: AddSmi [2]            ; Add constant 1
  0016: Star r0               ; Store back to x
  0018: Jump [28]             ; Jump past else
  0022: Ldar r0               ; Load x (else branch)
  0024: SubSmi [3]            ; Subtract constant 1
  0026: Star r0               ; Store back to x
  0028: LdaUndefined          ; Result of if statement
  0030: Return

Implementation Hints:

Bytecode instruction set:

typedef enum {
    // Load/Store
    OP_LDA_CONSTANT,     // Load constant into accumulator
    OP_LDA_UNDEFINED,    // Load undefined
    OP_LDA_NULL,         // Load null
    OP_LDA_TRUE,         // Load true
    OP_LDA_FALSE,        // Load false
    OP_LDA_ZERO,         // Load 0
    OP_LDA_SMI,          // Load small integer
    OP_LDAR,             // Load register into accumulator
    OP_STAR,             // Store accumulator in register

    // Arithmetic
    OP_ADD,              // Add register to accumulator
    OP_SUB,              // Subtract register from accumulator
    OP_MUL,              // Multiply
    OP_DIV,              // Divide
    OP_MOD,              // Modulo
    OP_ADD_SMI,          // Add small integer to accumulator
    OP_SUB_SMI,          // Subtract small integer

    // Comparison
    OP_TEST_EQUAL,
    OP_TEST_STRICT_EQUAL,
    OP_TEST_LESS_THAN,
    OP_TEST_GREATER_THAN,

    // Unary
    OP_NEGATE,
    OP_NOT,
    OP_TYPEOF,

    // Control flow
    OP_JUMP,             // Unconditional jump
    OP_JUMP_IF_FALSE,    // Jump if accumulator is falsy
    OP_JUMP_IF_TRUE,     // Jump if accumulator is truthy
    OP_LOOP,             // Backward jump (for loops)

    // Functions
    OP_CALL,             // Call function
    OP_RETURN,           // Return from function

    // Variables
    OP_LDA_GLOBAL,       // Load global variable
    OP_STA_GLOBAL,       // Store global variable
    OP_LDA_NAMED,        // Load named property
    OP_STA_NAMED,        // Store named property

    // Objects
    OP_CREATE_OBJECT,
    OP_CREATE_ARRAY,
    OP_GET_PROPERTY,
    OP_SET_PROPERTY,

    // Closures
    OP_CREATE_CLOSURE,
    OP_LDA_CONTEXT,      // Load from closure context
    OP_STA_CONTEXT,      // Store to closure context
} OpCode;

typedef struct {
    uint8_t* code;
    int count;
    int capacity;

    // Constant pool
    Value* constants;
    int constant_count;
    int constant_capacity;
} Chunk;

Compiler structure:

typedef struct {
    Token current;
    Token previous;
    Lexer* lexer;

    Chunk* chunk;

    // Local variables
    struct {
        Token name;
        int depth;  // Scope depth
        bool is_captured;  // For closures
    } locals[256];
    int local_count;
    int scope_depth;

    // For closures
    struct {
        uint8_t index;
        bool is_local;
    } upvalues[256];
    int upvalue_count;

} Compiler;

void emit_byte(Compiler* c, uint8_t byte) {
    chunk_write(c->chunk, byte);
}

void emit_bytes(Compiler* c, uint8_t b1, uint8_t b2) {
    emit_byte(c, b1);
    emit_byte(c, b2);
}

int emit_jump(Compiler* c, uint8_t instruction) {
    emit_byte(c, instruction);
    emit_byte(c, 0xff);  // Placeholder
    emit_byte(c, 0xff);
    return c->chunk->count - 2;  // Return offset to patch
}

void patch_jump(Compiler* c, int offset) {
    int jump = c->chunk->count - offset - 2;
    c->chunk->code[offset] = (jump >> 8) & 0xff;
    c->chunk->code[offset + 1] = jump & 0xff;
}

Compiling control flow:

void compile_if_statement(Compiler* c, ASTNode* node) {
    // Compile condition; in this accumulator design the result lives in the
    // accumulator, so (unlike a stack VM) there is nothing to pop after the test
    compile_expression(c, node->if_stmt.condition);

    // Jump to else if the accumulator is falsy
    int then_jump = emit_jump(c, OP_JUMP_IF_FALSE);

    // Compile then branch
    compile_statement(c, node->if_stmt.then_branch);

    // Jump over else
    int else_jump = emit_jump(c, OP_JUMP);

    // Patch the then jump to here
    patch_jump(c, then_jump);

    // Compile else branch if present
    if (node->if_stmt.else_branch != NULL) {
        compile_statement(c, node->if_stmt.else_branch);
    }

    // Patch the else jump
    patch_jump(c, else_jump);
}

void compile_while_statement(Compiler* c, ASTNode* node) {
    int loop_start = c->chunk->count;

    // Compile condition (result in the accumulator)
    compile_expression(c, node->while_stmt.condition);

    // Exit if falsy
    int exit_jump = emit_jump(c, OP_JUMP_IF_FALSE);

    // Compile body
    compile_statement(c, node->while_stmt.body);

    // Loop back to re-evaluate the condition
    emit_loop(c, loop_start);

    // Patch exit
    patch_jump(c, exit_jump);
}
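
emit_loop, used above, is just a backward jump whose distance is known immediately (no patching needed). Below it is one way to compile a binary expression into this accumulator-plus-registers form; allocate_register and free_register are assumed helpers that hand out scratch registers. The sketch is deliberately naive: it always spills both operands, whereas V8 avoids the reload when an operand already lives in a register:

void emit_loop(Compiler* c, int loop_start) {
    emit_byte(c, OP_LOOP);
    // Distance back to the loop start, including the two operand bytes
    int offset = c->chunk->count - loop_start + 2;
    emit_byte(c, (offset >> 8) & 0xff);
    emit_byte(c, offset & 0xff);
}

void compile_binary_expression(Compiler* c, ASTNode* node) {
    compile_expression(c, node->binary.left);      // left  -> accumulator
    uint8_t left_reg = allocate_register(c);
    emit_bytes(c, OP_STAR, left_reg);              // left  -> register

    compile_expression(c, node->binary.right);     // right -> accumulator
    uint8_t right_reg = allocate_register(c);
    emit_bytes(c, OP_STAR, right_reg);             // right -> register

    emit_bytes(c, OP_LDAR, left_reg);              // left back into accumulator

    switch (node->binary.op) {
        case TOKEN_PLUS:  emit_bytes(c, OP_ADD, right_reg); break;  // acc = acc + reg
        case TOKEN_MINUS: emit_bytes(c, OP_SUB, right_reg); break;  // acc = acc - reg
        case TOKEN_STAR:  emit_bytes(c, OP_MUL, right_reg); break;
        case TOKEN_SLASH: emit_bytes(c, OP_DIV, right_reg); break;
        default: break;
    }

    free_register(c, right_reg);
    free_register(c, left_reg);
}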

Learning milestones:

  1. You design an instruction set → You understand bytecode encoding
  2. You compile expressions → You understand stack-based evaluation
  3. You compile control flow → You understand jumps and patching
  4. You compile functions → You understand call frames

Project 5: Bytecode Virtual Machine

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Virtual Machines / Interpretation / Performance
  • Software or Tool: Custom implementation
  • Main Book: “Crafting Interpreters” by Robert Nystrom

What you’ll build: A bytecode interpreter (virtual machine) that executes the bytecode from Project 4, using a register-based or stack-based design with a dispatch loop.

Why it teaches V8: V8’s Ignition is a register-based bytecode interpreter. Understanding the fetch-decode-execute loop, dispatch techniques, and stack management is essential for understanding how JavaScript actually runs.

Core challenges you’ll face:

  • Dispatch loop optimization → maps to computed goto, direct threading
  • Call stack management → maps to call frames, return addresses
  • Value representation → maps to efficient tagging
  • Performance → maps to why bytecode is faster than AST walking

Key Concepts:

  • Ignition Design: Firing up Ignition
  • Dispatch Techniques: Direct threading, computed goto
  • VM Implementation: “Crafting Interpreters” Chapter 15

Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Projects 1-4 completed; understanding of call stacks.

Real world outcome:

$ ./jsvm bytecode.bin

MiniJS VM v0.1
Executing 156 bytes of bytecode...

Result: 42

Execution stats:
  Instructions executed: 1,234
  Execution time: 0.5ms
  Instructions/sec: 2,468,000

$ ./jsvm --trace bytecode.bin

0000: LdaConstant [0]        ; acc = 10
0002: Star r0                ; r0 = acc (10)
0004: LdaConstant [1]        ; acc = 5
0006: Add r0                 ; acc = acc + r0 (15)
0008: Star r1                ; r1 = acc (15)
0010: Return                 ; return acc

Implementation Hints:

VM structure:

#define STACK_MAX 256
#define FRAMES_MAX 64

typedef struct {
    Closure* closure;
    uint8_t* ip;           // Instruction pointer
    Value* slots;          // First slot for this frame on stack
} CallFrame;

typedef struct {
    CallFrame frames[FRAMES_MAX];
    int frame_count;

    Value stack[STACK_MAX];
    Value* stack_top;

    // Registers (V8-style, or use stack)
    Value accumulator;

    // Globals
    HashTable globals;

    // GC roots
    Object* objects;  // Linked list of all objects
} VM;

Main dispatch loop:

// Using computed goto for speed (GCC/Clang extension)
static void* dispatch_table[] = {
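    // NOTE: entries must appear in exactly the same order as the OpCode enum,
    // because DISPATCH() uses the raw opcode byte as an index into this table.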
    &&op_lda_constant,
    &&op_lda_undefined,
    &&op_star,
    &&op_ldar,
    &&op_add,
    &&op_sub,
    &&op_jump,
    &&op_jump_if_false,
    &&op_call,
    &&op_return,
    // ... more
};

#define DISPATCH() goto *dispatch_table[*ip++]
#define READ_BYTE() (*ip++)
#define READ_SHORT() (ip += 2, (uint16_t)((ip[-2] << 8) | ip[-1]))
#define READ_CONSTANT() (frame->closure->function->chunk.constants[READ_BYTE()])

InterpretResult vm_run(VM* vm) {
    CallFrame* frame = &vm->frames[vm->frame_count - 1];
    uint8_t* ip = frame->ip;
    Value* slots = frame->slots;
    Value acc = vm->accumulator;

    DISPATCH();

op_lda_constant:
    acc = READ_CONSTANT();
    DISPATCH();

op_lda_undefined:
    acc = (Value){ .type = VAL_UNDEFINED };
    DISPATCH();

op_star: {
    uint8_t reg = READ_BYTE();
    slots[reg] = acc;
    DISPATCH();
}

op_ldar: {
    uint8_t reg = READ_BYTE();
    acc = slots[reg];
    DISPATCH();
}

op_add: {
    uint8_t reg = READ_BYTE();
    if (IS_NUMBER(acc) && IS_NUMBER(slots[reg])) {
        acc = NUMBER_VAL(AS_NUMBER(acc) + AS_NUMBER(slots[reg]));
    } else {
        // Handle string concatenation, type coercion
        acc = runtime_add(vm, acc, slots[reg]);
    }
    DISPATCH();
}

op_jump: {
    uint16_t offset = READ_SHORT();
    ip += offset;
    DISPATCH();
}

op_jump_if_false: {
    uint16_t offset = READ_SHORT();
    if (is_falsy(acc)) {
        ip += offset;
    }
    DISPATCH();
}

op_call: {
    uint8_t arg_count = READ_BYTE();
    Value callee = slots[READ_BYTE()];  // Function to call

    if (!IS_CLOSURE(callee)) {
        runtime_error("Can only call functions");
        return INTERPRET_RUNTIME_ERROR;
    }

    // Save current state
    frame->ip = ip;
    vm->accumulator = acc;

    // Set up new frame
    Closure* closure = AS_CLOSURE(callee);
    CallFrame* new_frame = &vm->frames[vm->frame_count++];
    new_frame->closure = closure;
    new_frame->ip = closure->function->chunk.code;
    new_frame->slots = vm->stack_top - arg_count - 1;

    frame = new_frame;
    ip = frame->ip;
    slots = frame->slots;
    DISPATCH();
}

op_return: {
    Value result = acc;
    vm->frame_count--;

    if (vm->frame_count == 0) {
        vm->accumulator = result;
        return INTERPRET_OK;
    }

    vm->stack_top = frame->slots;
    frame = &vm->frames[vm->frame_count - 1];
    ip = frame->ip;
    slots = frame->slots;
    acc = result;
    DISPATCH();
}
}
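
is_falsy and runtime_add, used by the handlers above, carry JavaScript's truthiness and '+' semantics. A minimal sketch using the tagged-struct Value from Project 3 (a NaN-boxed build would test bit patterns instead); to_number, to_string, and string_concat_value are assumed helpers:

// JavaScript's falsy values: undefined, null, false, 0, NaN, and ""
static bool is_falsy(Value v) {
    switch (v.type) {
        case VAL_UNDEFINED:
        case VAL_NULL:   return true;
        case VAL_BOOL:   return !v.boolean;
        case VAL_NUMBER: return v.number == 0 || v.number != v.number;  // 0 or NaN
        case VAL_STRING: return v.string[0] == '\0';
        default:         return false;  // Objects, arrays, functions are truthy
    }
}

// Slow path for OP_ADD when the operands are not both numbers
static Value runtime_add(VM* vm, Value a, Value b) {
    if (a.type == VAL_STRING || b.type == VAL_STRING) {
        return string_concat_value(vm, to_string(vm, a), to_string(vm, b));
    }
    return NUMBER_VAL(to_number(a) + to_number(b));
}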

Learning milestones:

  1. You implement the dispatch loop → You understand bytecode interpretation
  2. You handle function calls → You understand call frames
  3. You optimize dispatch → You understand computed goto
  4. You measure performance → You understand why bytecode is fast

Project 6: Mark-Sweep Garbage Collector

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Memory Management / GC Algorithms
  • Software or Tool: Custom implementation
  • Main Book: “Crafting Interpreters” by Robert Nystrom (Chapter 26)

What you’ll build: A tracing garbage collector using mark-sweep algorithm that automatically frees unreachable objects, integrated with your JavaScript VM.

Why it teaches V8: V8’s Orinoco garbage collector is sophisticated (generational, concurrent, parallel), but the fundamentals start with mark-sweep. Understanding GC is essential for understanding JavaScript performance characteristics.

Core challenges you’ll face:

  • Finding GC roots → maps to stack, globals, registers
  • Tracing object graph → maps to marking reachable objects
  • Handling cycles → maps to how tracing GC collects cycles that reference counting cannot
  • Triggering GC → maps to when to collect?

Key Concepts:

Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: Projects 1-5 completed; understanding of memory allocation.

Real world outcome:

$ ./jsvm --gc-trace test.js

[GC] Allocated: String "hello" (32 bytes)
[GC] Allocated: Object (64 bytes)
[GC] Allocated: Array (128 bytes)
[GC] Allocated: Function (48 bytes)
[GC] Heap size: 272 bytes, threshold: 256 bytes
[GC] Starting collection...
[GC] Mark phase: 4 roots
[GC]   Marked: Object @ 0x1000
[GC]   Marked: String "hello" @ 0x1040
[GC] Sweep phase:
[GC]   Freed: Array @ 0x1080 (128 bytes) - unreachable
[GC] Collection complete: 128 bytes freed, 144 bytes remaining
[GC] New threshold: 288 bytes

Result: { name: "hello" }
Total allocations: 5
Total collections: 1
Peak heap: 272 bytes

Implementation Hints:

Object header with GC mark bit:

typedef struct Object {
    struct Object* next;  // Intrusive linked list of all objects
    bool is_marked;       // GC mark bit
    ObjectType type;
} Object;

typedef struct {
    Object obj;  // Must be first!
    char* chars;
    int length;
    uint32_t hash;
} ObjString;

typedef struct {
    Object obj;
    int count;
    int capacity;
    Value* values;
} ObjArray;

typedef struct {
    Object obj;
    HashTable properties;
    ObjShape* shape;  // Hidden class
} ObjObject;

Mark-sweep implementation:

void gc_collect(VM* vm) {
    size_t before = vm->bytes_allocated;

    // Mark phase
    gc_mark_roots(vm);
    gc_trace_references(vm);

    // Sweep phase
    gc_sweep(vm);

    // Adjust threshold
    vm->next_gc = vm->bytes_allocated * 2;

    printf("[GC] Collected %zu bytes (from %zu to %zu) next at %zu\n",
           before - vm->bytes_allocated, before,
           vm->bytes_allocated, vm->next_gc);
}

void gc_mark_roots(VM* vm) {
    // Mark stack
    for (Value* slot = vm->stack; slot < vm->stack_top; slot++) {
        gc_mark_value(vm, *slot);
    }

    // Mark call frames (closures)
    for (int i = 0; i < vm->frame_count; i++) {
        gc_mark_object(vm, (Object*)vm->frames[i].closure);
    }

    // Mark globals
    gc_mark_table(vm, &vm->globals);

    // Mark accumulator
    gc_mark_value(vm, vm->accumulator);
}

void gc_mark_value(VM* vm, Value value) {
    if (IS_OBJ(value)) {
        gc_mark_object(vm, AS_OBJ(value));
    }
}

void gc_mark_object(VM* vm, Object* object) {
    if (object == NULL) return;
    if (object->is_marked) return;  // Already marked

    object->is_marked = true;

    // Add to gray list (work list) for tracing
    if (vm->gray_capacity < vm->gray_count + 1) {
        vm->gray_capacity = vm->gray_capacity < 8 ? 8 : vm->gray_capacity * 2;
        vm->gray_stack = realloc(vm->gray_stack,
                                  sizeof(Object*) * vm->gray_capacity);
    }
    vm->gray_stack[vm->gray_count++] = object;
}

void gc_trace_references(VM* vm) {
    while (vm->gray_count > 0) {
        Object* object = vm->gray_stack[--vm->gray_count];
        gc_blacken_object(vm, object);
    }
}

void gc_blacken_object(VM* vm, Object* object) {
    switch (object->type) {
        case OBJ_STRING:
            // Strings have no references
            break;

        case OBJ_ARRAY: {
            ObjArray* array = (ObjArray*)object;
            for (int i = 0; i < array->count; i++) {
                gc_mark_value(vm, array->values[i]);
            }
            break;
        }

        case OBJ_OBJECT: {
            ObjObject* obj = (ObjObject*)object;
            gc_mark_table(vm, &obj->properties);
            gc_mark_object(vm, (Object*)obj->shape);
            break;
        }

        case OBJ_CLOSURE: {
            ObjClosure* closure = (ObjClosure*)object;
            gc_mark_object(vm, (Object*)closure->function);
            for (int i = 0; i < closure->upvalue_count; i++) {
                gc_mark_object(vm, (Object*)closure->upvalues[i]);
            }
            break;
        }
    }
}

void gc_sweep(VM* vm) {
    Object* previous = NULL;
    Object* object = vm->objects;

    while (object != NULL) {
        if (object->is_marked) {
            // Survived - unmark for next cycle
            object->is_marked = false;
            previous = object;
            object = object->next;
        } else {
            // Unreachable - free it
            Object* unreached = object;
            object = object->next;

            if (previous != NULL) {
                previous->next = object;
            } else {
                vm->objects = object;
            }

            gc_free_object(vm, unreached);
        }
    }
}
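
The piece that ties allocation to collection is the allocator itself: every allocation is counted, and once bytes_allocated crosses next_gc a collection runs (gc_collect above then doubles the threshold). A minimal sketch; gc_free_object would decrement bytes_allocated symmetrically:

Object* allocate_object(VM* vm, size_t size, ObjectType type) {
    // Collect before allocating once we've crossed the threshold
    // (a "stress GC" debug mode collects on every allocation to flush out bugs)
    if (vm->bytes_allocated + size > vm->next_gc) {
        gc_collect(vm);
    }

    Object* object = malloc(size);
    object->type = type;
    object->is_marked = false;

    // Thread onto the all-objects list so the sweeper can find it
    object->next = vm->objects;
    vm->objects = object;

    vm->bytes_allocated += size;
    return object;
}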

Learning milestones:

  1. You identify GC roots → You understand what keeps objects alive
  2. You implement marking → You understand graph traversal
  3. You implement sweeping → You understand memory reclamation
  4. You tune GC timing → You understand throughput vs latency

Project 7: Hidden Classes (Object Shapes)

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Optimization / Object Representation
  • Software or Tool: Custom implementation
  • Main Book: V8 Blog articles

What you’ll build: A hidden class (shape/map) system that tracks object layouts, enabling fast property access through offset-based lookups instead of hash table lookups.

Why it teaches V8: Hidden classes are one of V8’s key optimizations. They make JavaScript object property access nearly as fast as C struct field access. This is the foundation of inline caching.

Core challenges you’ll face:

  • Tracking object shape → maps to which properties in which order
  • Shape transitions → maps to adding a property creates a new shape
  • Property offset lookup → maps to knowing where a property lives
  • Sharing shapes → maps to objects with same structure share shapes

Key Concepts:

Difficulty: Expert. Time estimate: 2-3 weeks. Prerequisites: Projects 1-6 completed; deep understanding of object models.

Real world outcome:

$ ./jsvm --trace-shapes

let p1 = {};        // Shape: <root>
                    // p1.shape = Shape@0x1000 (properties: [])

p1.x = 1;           // Shape transition: <root> -> {x}
                    // p1.shape = Shape@0x1040 (properties: [x@offset=0])

p1.y = 2;           // Shape transition: {x} -> {x, y}
                    // p1.shape = Shape@0x1080 (properties: [x@0, y@1])

let p2 = {};        // Shape: <root>
p2.x = 10;          // Shape transition: <root> -> {x}
                    // p2.shape = Shape@0x1040 (SAME as p1 after x!)

p2.y = 20;          // Shape transition: {x} -> {x, y}
                    // p2.shape = Shape@0x1080 (SAME as p1!)

// p1 and p2 share shapes! Fast property access via offset.

Shape Statistics:
  Total shapes created: 3
  Shape transitions: 4 (2 reused existing shapes)
  Property lookups: 8 (all via cached offset)

Implementation Hints:

Shape (hidden class) structure:

typedef struct PropertyDescriptor {
    ObjString* name;
    int offset;        // Offset in object's property storage
    uint8_t attributes; // Writable, enumerable, configurable
} PropertyDescriptor;

typedef struct Shape {
    Object obj;  // GC header

    struct Shape* parent;          // Previous shape in chain
    PropertyDescriptor* own_property;  // The property this shape adds

    // Transitions: when adding a new property, which shape do we go to?
    HashTable transitions;  // property_name -> Shape*

    int property_count;     // Total properties in this shape chain
    int in_object_count;    // How many fit in-object
} Shape;

typedef struct {
    Object obj;

    Shape* shape;           // Hidden class
    Value* in_object;       // In-object properties (fast, fixed size)
    Value* out_of_line;     // Overflow properties (slower, growable)
    int out_of_line_count;
} ObjJSObject;

Shape transitions:

// Get or create shape for adding property
Shape* shape_add_property(VM* vm, Shape* current, ObjString* name) {
    // Check if transition already exists
    Value existing;
    if (hash_table_get(&current->transitions, name, &existing)) {
        return AS_SHAPE(existing);
    }

    // Create new shape
    Shape* new_shape = allocate_shape(vm);
    new_shape->parent = current;
    new_shape->own_property = allocate_property_descriptor(vm);
    new_shape->own_property->name = name;
    new_shape->own_property->offset = current->property_count;
    new_shape->property_count = current->property_count + 1;

    // Cache the transition
    hash_table_set(&current->transitions, name, SHAPE_VAL(new_shape));

    return new_shape;
}

// Fast property access using shape
Value object_get_property(ObjJSObject* obj, ObjString* name) {
    // Walk shape chain to find property
    Shape* shape = obj->shape;
    while (shape != NULL) {
        if (shape->own_property != NULL &&
            shape->own_property->name == name) {
            int offset = shape->own_property->offset;
            if (offset < obj->shape->in_object_count) {
                return obj->in_object[offset];
            } else {
                return obj->out_of_line[offset - obj->shape->in_object_count];
            }
        }
        shape = shape->parent;
    }
    return (Value){ .type = VAL_UNDEFINED };
}

// Even faster: cache the offset for monomorphic access
typedef struct {
    Shape* cached_shape;
    int cached_offset;
} PropertyCache;

Value object_get_property_cached(ObjJSObject* obj, ObjString* name,
                                  PropertyCache* cache) {
    if (obj->shape == cache->cached_shape) {
        // Cache hit! Direct offset access
        return obj->in_object[cache->cached_offset];
    }

    // Cache miss: do lookup and update cache
    int offset = shape_find_property(obj->shape, name);
    if (offset >= 0) {
        cache->cached_shape = obj->shape;
        cache->cached_offset = offset;
        return obj->in_object[offset];
    }
    return (Value){ .type = VAL_UNDEFINED };
}
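
The write side is where transitions actually happen: storing a property either reuses an existing slot or moves the object to the next shape and writes into the newly assigned slot. A minimal sketch; shape_find_property is the same lookup used above, and grow_out_of_line_storage is an assumed helper that enlarges the overflow array:

void object_set_property(VM* vm, ObjJSObject* obj, ObjString* name, Value value) {
    // Existing property: write to its known slot
    int offset = shape_find_property(obj->shape, name);
    if (offset >= 0) {
        if (offset < obj->shape->in_object_count) {
            obj->in_object[offset] = value;
        } else {
            obj->out_of_line[offset - obj->shape->in_object_count] = value;
        }
        return;
    }

    // New property: transition to the next shape. Objects that add the same
    // properties in the same order end up sharing this shape.
    Shape* new_shape = shape_add_property(vm, obj->shape, name);
    obj->shape = new_shape;

    offset = new_shape->own_property->offset;
    if (offset < new_shape->in_object_count) {
        obj->in_object[offset] = value;
    } else {
        grow_out_of_line_storage(obj, offset);  // Assumed helper
        obj->out_of_line[offset - new_shape->in_object_count] = value;
        obj->out_of_line_count++;
    }
}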

Learning milestones:

  1. You implement shape chains → You understand property ordering matters
  2. You implement transitions → You understand shape sharing
  3. You optimize property access → You understand offset caching
  4. You measure speedup → You understand why this matters

Project 8: Inline Caching (IC)

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Optimization / Polymorphism / Speculation
  • Software or Tool: Custom implementation
  • Main Book: V8 Blog articles

What you’ll build: An inline caching system that remembers the shapes seen at each property access site, enabling fast path execution when shapes are stable.

Why it teaches V8: Inline caching is how V8 makes JavaScript competitive with statically-typed languages. Combined with hidden classes, it enables speculative optimizations that assume types won’t change.

Core challenges you’ll face:

  • Per-site caching → maps to each access point has its own cache
  • IC states (mono/poly/mega) → maps to how many shapes have we seen?
  • Feedback vectors → maps to storing type information for JIT
  • Cache invalidation → maps to when shapes change

Key Concepts:

Difficulty: Expert. Time estimate: 2-3 weeks. Prerequisites: Project 7 completed; understanding of speculation.

Real world outcome:

$ ./jsvm --trace-ic

function getX(obj) { return obj.x; }

p1 = {x: 1};
p2 = {x: 2};
p3 = {y: 3};

getX(p1);  // IC @ offset 5: UNINITIALIZED -> MONOMORPHIC (Shape@0x1040)
getX(p1);  // IC @ offset 5: HIT (cached shape match)
getX(p2);  // IC @ offset 5: HIT (same shape as p1!)
getX(p3);  // IC @ offset 5: MONOMORPHIC -> POLYMORPHIC
           //   Adding Shape@0x1080 (has 'y' not 'x')
           //   Result: undefined (no 'x' property)

// After many iterations with different shapes:
getX({a:1, x:5});  // IC @ offset 5: POLYMORPHIC -> MEGAMORPHIC
                   //   Too many shapes, fall back to generic lookup

IC Statistics:
  Total IC sites: 1
  Monomorphic hits: 2
  Polymorphic hits: 1
  Megamorphic lookups: 1
  Cache hit rate: 75%

Implementation Hints:

Inline cache structure:

typedef enum {
    IC_UNINITIALIZED,
    IC_MONOMORPHIC,    // Single shape (fast!)
    IC_POLYMORPHIC,    // 2-4 shapes (still ok)
    IC_MEGAMORPHIC     // Too many shapes (slow)
} ICState;

#define IC_POLY_MAX 4

typedef struct {
    ICState state;
    union {
        // Monomorphic: single shape + offset
        struct {
            Shape* shape;
            int offset;
        } mono;

        // Polymorphic: array of (shape, offset) pairs
        struct {
            Shape* shapes[IC_POLY_MAX];
            int offsets[IC_POLY_MAX];
            int count;
        } poly;
    };
} InlineCache;

// Feedback vector: one IC per bytecode site
typedef struct {
    InlineCache* caches;
    int count;
} FeedbackVector;

Property access with IC:

Value ic_get_property(VM* vm, ObjJSObject* obj, ObjString* name,
                       InlineCache* ic) {
    switch (ic->state) {
        case IC_UNINITIALIZED: {
            // First access: initialize to monomorphic
            int offset = shape_find_property(obj->shape, name);
            if (offset >= 0) {
                ic->state = IC_MONOMORPHIC;
                ic->mono.shape = obj->shape;
                ic->mono.offset = offset;
                return obj->in_object[offset];
            }
            // Property doesn't exist
            ic->state = IC_MEGAMORPHIC;  // Give up on caching
            return (Value){ .type = VAL_UNDEFINED };
        }

        case IC_MONOMORPHIC: {
            if (obj->shape == ic->mono.shape) {
                // Cache hit! Super fast path
                return obj->in_object[ic->mono.offset];
            }
            // Shape mismatch: transition to polymorphic
            ic_transition_to_poly(ic, obj->shape, name);
            return ic_get_property(vm, obj, name, ic);  // Retry
        }

        case IC_POLYMORPHIC: {
            // Check each cached shape
            for (int i = 0; i < ic->poly.count; i++) {
                if (obj->shape == ic->poly.shapes[i]) {
                    return obj->in_object[ic->poly.offsets[i]];
                }
            }

            // Not in cache: add it or go megamorphic
            if (ic->poly.count < IC_POLY_MAX) {
                int offset = shape_find_property(obj->shape, name);
                if (offset >= 0) {
                    ic->poly.shapes[ic->poly.count] = obj->shape;
                    ic->poly.offsets[ic->poly.count] = offset;
                    ic->poly.count++;
                    return obj->in_object[offset];
                }
            }

            // Too many shapes
            ic->state = IC_MEGAMORPHIC;
            return generic_get_property(obj, name);
        }

        case IC_MEGAMORPHIC: {
            // Give up on caching, use generic lookup
            return generic_get_property(obj, name);
        }
    }
    return VAL_UNDEFINED;  // not reached: every IC state returns above
}
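
The ic_transition_to_poly helper used above is not shown; a minimal sketch, assuming the union layout from InlineCache and relying on the IC_POLYMORPHIC retry path to cache the new shape:

void ic_transition_to_poly(InlineCache* ic, Shape* new_shape, ObjString* name) {
    // Read the monomorphic entry before the union is reused
    Shape* old_shape = ic->mono.shape;
    int old_offset = ic->mono.offset;

    ic->state = IC_POLYMORPHIC;
    ic->poly.shapes[0] = old_shape;
    ic->poly.offsets[0] = old_offset;
    ic->poly.count = 1;

    // new_shape/name are looked up and cached by the IC_POLYMORPHIC
    // retry in ic_get_property, so they are not added here
    (void)new_shape;
    (void)name;
}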

// In bytecode handler:
case OP_GET_NAMED: {
    uint8_t name_idx = READ_BYTE();
    uint8_t ic_slot = READ_BYTE();

    ObjString* name = AS_STRING(READ_CONSTANT_AT(name_idx));
    InlineCache* ic = &frame->feedback_vector->caches[ic_slot];

    if (!IS_OBJECT(acc)) {
        runtime_error("Cannot read property of non-object");
    }

    acc = ic_get_property(vm, AS_OBJECT(acc), name, ic);
    DISPATCH();
}

Learning milestones:

  1. You implement monomorphic IC → You understand fast-path speculation
  2. You handle polymorphism → You understand type variability
  3. You track IC statistics → You understand optimization feedback
  4. You use IC for JIT → You understand the connection to TurboFan

Project 9: Generational Garbage Collector

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Memory Management / GC Optimization
  • Software or Tool: Custom implementation
  • Main Book: “The Garbage Collection Handbook” by Jones, Hosking, Moss

What you’ll build: A generational garbage collector with a young generation (using semi-space copying) and old generation (using mark-sweep), mimicking V8’s Orinoco.

Why it teaches V8: V8’s generational hypothesis—“most objects die young”—enables fast collection of short-lived objects. Understanding this is key to writing memory-efficient JavaScript.

Core challenges you’ll face:

  • Semi-space copying → maps to Cheney’s algorithm for young gen
  • Promotion policy → maps to when to move objects to old gen
  • Write barriers → maps to tracking old-to-young pointers
  • Remembered sets → maps to roots into young generation

Key Concepts:

Difficulty: Expert Time estimate: 3-4 weeks Prerequisites: Project 6 completed. Deep GC knowledge.

Real world outcome:

$ ./jsvm --gc-stats

[GC Config]
  Young generation: 4 MB (2 MB from-space, 2 MB to-space)
  Old generation: 64 MB
  Promotion threshold: 2 collections

[Running script...]

[Minor GC #1] Scavenge
  Young gen before: 1.8 MB
  Copied to to-space: 0.3 MB
  Promoted to old gen: 0.0 MB
  Freed: 1.5 MB
  Time: 0.5 ms

[Minor GC #2] Scavenge
  Young gen before: 1.9 MB
  Copied to to-space: 0.2 MB
  Promoted to old gen: 0.1 MB (survived 2 collections)
  Freed: 1.6 MB
  Time: 0.4 ms

[Minor GC #3-50] ... (fast scavenges)

[Major GC #1] Mark-Sweep-Compact
  Old gen before: 32 MB
  Marked: 28 MB
  Swept: 4 MB
  Compacted: 2 MB
  Time: 15 ms

Summary:
  Minor collections: 50
  Major collections: 1
  Total pause time: 40 ms
  Average minor pause: 0.5 ms
  Max pause: 15 ms (major GC)

Implementation Hints:

Heap structure:

typedef struct {
    // Young generation (semi-space)
    uint8_t* young_from_space;
    uint8_t* young_to_space;
    uint8_t* young_alloc_ptr;
    uint8_t* young_end;
    size_t young_size;

    // Old generation
    uint8_t* old_space;
    uint8_t* old_alloc_ptr;
    uint8_t* old_end;
    size_t old_size;

    // Remembered set: old objects pointing to young objects
    Object** remembered_set;
    int remembered_set_count;
    int remembered_set_capacity;

    // Object tracking
    Object* all_objects;

    // Stats
    size_t bytes_allocated;
    size_t minor_gc_count;
    size_t major_gc_count;
} Heap;
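
Allocation fast path (a sketch; assumes young_end always marks the end of the semi-space currently used for allocation, and old_space_alloc is a hypothetical fallback for objects that do not fit even after a scavenge):

void* heap_alloc_young(VM* vm, size_t size) {
    Heap* heap = &vm->heap;

    if (heap->young_alloc_ptr + size > heap->young_end) {
        // Young space full: scavenge, then retry once
        gc_scavenge(vm);
        if (heap->young_alloc_ptr + size > heap->young_end) {
            // Still does not fit (e.g. a large object): hypothetical
            // direct allocation in the old generation
            return old_space_alloc(heap, size);
        }
    }

    // Bump allocation: the common case is just a pointer increment
    void* ptr = heap->young_alloc_ptr;
    heap->young_alloc_ptr += size;
    heap->bytes_allocated += size;
    return ptr;
}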

Scavenger (young generation collector):

void gc_scavenge(VM* vm) {
    Heap* heap = &vm->heap;

    // Swap from-space and to-space
    uint8_t* temp = heap->young_from_space;
    heap->young_from_space = heap->young_to_space;
    heap->young_to_space = temp;

    // Reset allocation to the (new) to-space; young_end should also be
    // moved to the end of that space so bump allocation stays in bounds
    heap->young_alloc_ptr = heap->young_to_space;

    // Copy roots
    gc_scavenge_roots(vm);

    // Process remembered set (old → young pointers)
    for (int i = 0; i < heap->remembered_set_count; i++) {
        Object* old_obj = heap->remembered_set[i];
        gc_scavenge_object_fields(vm, old_obj);
    }
    // Note: containers whose fields still point into the young generation
    // after copying should be re-added (survivors stay young until promoted)
    heap->remembered_set_count = 0;

    // Cheney's algorithm: process copied objects
    uint8_t* scan = heap->young_to_space;
    while (scan < heap->young_alloc_ptr) {
        Object* obj = (Object*)scan;
        gc_scavenge_object_fields(vm, obj);
        scan += object_size(obj);
    }

    heap->minor_gc_count++;
}

Object* gc_copy_object(VM* vm, Object* obj) {
    Heap* heap = &vm->heap;

    // Already copied? (forwarding pointer)
    if (obj->forwarding != NULL) {
        return obj->forwarding;
    }

    // Check if should promote to old generation
    obj->age++;
    if (obj->age >= PROMOTION_THRESHOLD) {
        return gc_promote_to_old(vm, obj);
    }

    // Copy to to-space
    size_t size = object_size(obj);
    Object* copy = (Object*)heap->young_alloc_ptr;
    memcpy(copy, obj, size);
    heap->young_alloc_ptr += size;

    // Leave forwarding pointer
    obj->forwarding = copy;
    copy->forwarding = NULL;

    return copy;
}
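
The gc_scavenge_object_fields visitor called from gc_scavenge depends on your object layout; a minimal sketch, assuming hypothetical object_field_count/object_field accessors and that young_size covers both semi-spaces (2 MB each in the config above):

void gc_scavenge_object_fields(VM* vm, Object* obj) {
    Heap* heap = &vm->heap;

    // Bounds of the space being evacuated (from-space after the swap)
    uint8_t* from_start = heap->young_from_space;
    uint8_t* from_end   = from_start + heap->young_size / 2;

    for (int i = 0; i < object_field_count(obj); i++) {
        Object** slot = object_field(obj, i);
        uint8_t* p = (uint8_t*)*slot;
        if (*slot != NULL && p >= from_start && p < from_end) {
            // Pointee lives in the evacuated space: copy it (or follow
            // its forwarding pointer) and update the slot in place
            *slot = gc_copy_object(vm, *slot);
        }
    }
}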

Write barrier:

// Must be called whenever we write a pointer from old to young
void gc_write_barrier(VM* vm, Object* container, Object* value) {
    Heap* heap = &vm->heap;

    // Is container in old gen and value in young gen?
    if (is_in_old_gen(heap, container) && is_in_young_gen(heap, value)) {
        // Add to remembered set
        if (heap->remembered_set_count >= heap->remembered_set_capacity) {
            heap->remembered_set_capacity *= 2;
            heap->remembered_set = realloc(heap->remembered_set,
                sizeof(Object*) * heap->remembered_set_capacity);
        }
        heap->remembered_set[heap->remembered_set_count++] = container;
    }
}

// Use in property assignment:
void object_set_property(VM* vm, ObjJSObject* obj, ObjString* name, Value value) {
    // ... set property ...

    // Write barrier for GC
    if (IS_OBJ(value)) {
        gc_write_barrier(vm, (Object*)obj, AS_OBJ(value));
    }
}
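
The two predicates used by the barrier are plain address-range checks over the regions in Heap (a sketch; again assumes young_size covers both semi-spaces):

#include <stdbool.h>

bool is_in_young_gen(Heap* heap, Object* obj) {
    uint8_t* p = (uint8_t*)obj;
    size_t half = heap->young_size / 2;
    return (p >= heap->young_from_space && p < heap->young_from_space + half) ||
           (p >= heap->young_to_space   && p < heap->young_to_space   + half);
}

bool is_in_old_gen(Heap* heap, Object* obj) {
    uint8_t* p = (uint8_t*)obj;
    return p >= heap->old_space && p < heap->old_space + heap->old_size;
}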

Learning milestones:

  1. You implement semi-space copying → You understand Cheney’s algorithm
  2. You implement promotion → You understand the generational hypothesis
  3. You implement write barriers → You understand cross-generation pointers
  4. You tune thresholds → You understand GC performance tradeoffs

Project 10: Basic JIT Compiler

  • File: LEARN_V8_JAVASCRIPT_ENGINE_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 5: Master
  • Knowledge Area: JIT Compilation / Machine Code / x86-64
  • Software or Tool: Custom implementation
  • Main Book: “Engineering a Compiler” by Cooper & Torczon

What you’ll build: A simple JIT compiler that compiles hot bytecode functions to native x86-64 machine code, dramatically speeding up execution.

Why it teaches V8: JIT compilation is what makes V8 fast. TurboFan compiles JavaScript to native code based on type feedback. This project teaches you the fundamentals of runtime code generation.

Core challenges you’ll face:

  • Generating machine code → maps to x86-64 encoding
  • Register allocation → maps to mapping virtual to physical registers
  • Calling conventions → maps to how to call C functions
  • Code patching → maps to deoptimization support

Key Concepts:

Difficulty: Master Time estimate: 1-2 months Prerequisites: All previous projects. x86-64 assembly knowledge. Understanding of calling conventions.

Real world outcome:

$ ./jsvm --jit --trace-jit

function add(a, b) { return a + b; }

[JIT] Function 'add' marked hot (1000 calls)
[JIT] Compiling 'add' to native code...
[JIT] Type feedback: a=Number, b=Number
[JIT] Generated code (24 bytes):
  0x7f8b4c000000:  push rbp
  0x7f8b4c000001:  mov rbp, rsp
  0x7f8b4c000004:  movsd xmm0, [rdi+8]     ; Load a
  0x7f8b4c000009:  addsd xmm0, [rsi+8]     ; Add b
  0x7f8b4c00000e:  movsd [rax], xmm0       ; Store result
  0x7f8b4c000013:  pop rbp
  0x7f8b4c000014:  ret
[JIT] Compilation complete, installed at 0x7f8b4c000000

[Benchmark]
  Interpreted: 100,000 calls in 150ms (666,666 calls/sec)
  JIT compiled: 100,000 calls in 2ms (50,000,000 calls/sec)
  Speedup: 75x

Implementation Hints:

Code buffer for JIT:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

typedef struct {
    uint8_t* code;
    size_t size;
    size_t capacity;
} JITBuffer;

JITBuffer* jit_buffer_new(size_t initial_size) {
    JITBuffer* buf = malloc(sizeof(JITBuffer));

    // Allocate executable memory
    buf->code = mmap(NULL, initial_size,
                     PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    buf->size = 0;
    buf->capacity = initial_size;

    return buf;
}

void jit_emit(JITBuffer* buf, uint8_t byte) {
    // (a real implementation would check capacity and grow the buffer here)
    buf->code[buf->size++] = byte;
}

void jit_emit_bytes(JITBuffer* buf, const uint8_t* bytes, size_t count) {
    memcpy(buf->code + buf->size, bytes, count);
    buf->size += count;
}
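
If you prefer not to keep the buffer writable and executable at the same time (W^X), map it PROT_READ | PROT_WRITE in jit_buffer_new instead and flip the protection once emission is done; a sketch matching the mmap/mprotect milestone below:

#include <stdio.h>   // perror
#include <stdlib.h>  // abort

void jit_buffer_finalize(JITBuffer* buf) {
    // Call once after the last jit_emit and before executing the code
    if (mprotect(buf->code, buf->capacity, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect");
        abort();
    }
}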

x86-64 code generation helpers:

// REX prefixes for 64-bit operations
#define REX_W  0x48  // 64-bit operand size

// ModR/M byte helpers
#define MODRM(mod, reg, rm) (((mod) << 6) | ((reg) << 3) | (rm))
#define MODRM_REG(reg, rm) MODRM(3, reg, rm)
#define MODRM_DISP8(reg, rm) MODRM(1, reg, rm)

// Register encoding
enum {
    RAX = 0, RCX = 1, RDX = 2, RBX = 3,
    RSP = 4, RBP = 5, RSI = 6, RDI = 7,
    R8 = 8, R9 = 9, R10 = 10, R11 = 11,
    R12 = 12, R13 = 13, R14 = 14, R15 = 15
};

// Emit: mov reg, imm64
void jit_emit_mov_imm64(JITBuffer* buf, int reg, uint64_t imm) {
    if (reg >= 8) {
        jit_emit(buf, REX_W | 0x01);  // REX.WB
        reg -= 8;
    } else {
        jit_emit(buf, REX_W);
    }
    jit_emit(buf, 0xB8 + reg);  // MOV r64, imm64
    jit_emit_bytes(buf, (uint8_t*)&imm, 8);
}

// Emit: add rax, src  (src must be rax..rdi; r8-r15 would also need REX.R)
void jit_emit_add_reg(JITBuffer* buf, int src) {
    jit_emit(buf, REX_W);
    jit_emit(buf, 0x01);  // ADD r/m64, r64 (reg field = source, r/m = destination)
    jit_emit(buf, MODRM_REG(src, RAX));
}

// Emit: ret
void jit_emit_ret(JITBuffer* buf) {
    jit_emit(buf, 0xC3);
}

// Emit: call addr
void jit_emit_call(JITBuffer* buf, void* target) {
    // mov rax, target
    jit_emit_mov_imm64(buf, RAX, (uint64_t)target);
    // call rax
    jit_emit(buf, 0xFF);
    jit_emit(buf, MODRM_REG(2, RAX));
}
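
Quick smoke test for the emitters (a sketch): generate `mov rax, 42; ret` and call it through a function pointer — the System V ABI returns integers in rax, so the call yields 42:

typedef uint64_t (*Const42Fn)(void);

uint64_t jit_smoke_test(void) {
    JITBuffer* buf = jit_buffer_new(64);
    jit_emit_mov_imm64(buf, RAX, 42);   // 48 B8 2A 00 ... : mov rax, 42
    jit_emit_ret(buf);                  // C3              : ret
    Const42Fn fn = (Const42Fn)buf->code;
    return fn();                        // 42
}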

Simple JIT compiler:

typedef void (*JITFunction)(VM* vm, Value* args, Value* result);

JITFunction jit_compile(VM* vm, ObjFunction* function) {
    JITBuffer* buf = jit_buffer_new(4096);
    Chunk* chunk = &function->chunk;

    // Prologue
    jit_emit(buf, 0x55);                    // push rbp
    jit_emit(buf, REX_W);
    jit_emit(buf, 0x89);
    jit_emit(buf, MODRM_REG(RSP, RBP));     // mov rbp, rsp

    // Arguments: rdi=VM*, rsi=args, rdx=result

    // Compile bytecode to native
    int ip = 0;
    while (ip < chunk->count) {
        uint8_t op = chunk->code[ip++];

        switch (op) {
            case OP_LDA_CONSTANT: {
                uint8_t idx = chunk->code[ip++];
                Value constant = chunk->constants[idx];

                // mov rax, constant_value
                jit_emit_mov_imm64(buf, RAX, value_to_bits(constant));
                break;
            }

            case OP_ADD: {
                uint8_t reg = chunk->code[ip++];

                // Check types at runtime (or use feedback for speculation)
                // For now, assume numbers:

                // Load operands to XMM registers
                // addsd xmm0, xmm1
                // Store result

                // Or call runtime helper:
                // jit_emit_call(buf, runtime_add);
                break;
            }

            case OP_RETURN: {
                // Store result
                // mov [rdx], rax  (result pointer in rdx)
                jit_emit(buf, REX_W);
                jit_emit(buf, 0x89);
                jit_emit(buf, MODRM(0, RAX, RDX));

                // Epilogue
                jit_emit(buf, 0x5D);  // pop rbp
                jit_emit_ret(buf);
                break;
            }

            default:
                // Unsupported opcode: bail out so the caller can keep
                // running this function in the interpreter
                return NULL;
        }
    }

    return (JITFunction)buf->code;
}
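
Tiering decision (a sketch; call_count and jit_code are assumed fields on ObjFunction, interpret_function is a hypothetical interpreter entry point, and 1000 matches the hot threshold shown in the trace above):

#define JIT_HOT_THRESHOLD 1000

void vm_call_function(VM* vm, ObjFunction* function, Value* args, Value* result) {
    if (function->jit_code == NULL &&
        ++function->call_count == JIT_HOT_THRESHOLD) {
        // Attempted once; jit_compile may return NULL if it bailed out
        function->jit_code = jit_compile(vm, function);
    }

    if (function->jit_code != NULL) {
        function->jit_code(vm, args, result);           // native fast path
    } else {
        interpret_function(vm, function, args, result); // bytecode path
    }
}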

Learning milestones:

  1. You generate executable code → You understand mmap/mprotect
  2. You encode x86-64 instructions → You understand machine code
  3. You handle calling conventions → You understand ABI
  4. You achieve significant speedup → You understand why JIT matters

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. JavaScript Lexer | Intermediate | 1 week | ⭐⭐⭐ | ⭐⭐⭐ |
| 2. JavaScript Parser | Advanced | 2-3 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 3. AST Interpreter | Intermediate | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 4. Bytecode Compiler | Advanced | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 5. Bytecode VM | Advanced | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 6. Mark-Sweep GC | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 7. Hidden Classes | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 8. Inline Caching | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 9. Generational GC | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 10. JIT Compiler | Master | 1-2 months | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

To understand JavaScript execution:

  1. Projects 1-3 (Lexer → Parser → Interpreter) - See JavaScript run
  2. Projects 4-5 (Bytecode compiler + VM) - Understand Ignition

To understand V8’s optimizations:

  1. Project 7 (Hidden Classes) - The key insight
  2. Project 8 (Inline Caching) - Why property access is fast
  3. Project 10 (JIT) - Why JavaScript is fast

To understand memory management:

  1. Project 6 (Mark-Sweep) - Basic GC
  2. Project 9 (Generational GC) - V8’s Orinoco

Quick path (focused on the core):

  1. Project 1 (1 week) - Tokenization
  2. Project 2 (2 weeks) - Parsing with Pratt
  3. Project 5 (2 weeks) - Bytecode VM
  4. Project 7 (2 weeks) - Hidden classes

Final Capstone: Complete JavaScript Engine

Integrate all projects into a complete JavaScript engine that can:

  1. Parse JavaScript source code
  2. Compile to bytecode
  3. Execute in a VM with GC
  4. Optimize hot functions with JIT
  5. Use hidden classes and inline caching

Real world outcome:

$ ./minijs test.js

MiniJS v0.1 - A Learning JavaScript Engine

Parsing...
Compiling to bytecode...
Executing...

[Hot function detected: 'fibonacci' (1000 calls)]
[JIT compiling 'fibonacci'...]
[Compilation complete: 156 bytes of machine code]

Result: 6765

Performance:
  Interpreted calls: 999
  JIT compiled calls: 10,000
  Total execution time: 5ms
  GC collections: 3 (minor), 0 (major)

Time estimate: 4-6 months for the complete engine


Essential Resources

Books

  • “Crafting Interpreters” by Robert Nystrom - The best starting point (free online)
  • “Engineering a Compiler” by Cooper & Torczon - Compiler theory
  • “The Garbage Collection Handbook” by Jones, Hosking, Moss - Definitive GC reference
  • “Modern Compiler Implementation in C” by Andrew Appel - Academic but thorough

V8-Specific

Tutorials

References


Summary

| # | Project | Main Programming Language |
|---|---|---|
| 1 | JavaScript Lexer | C |
| 2 | JavaScript Parser | C |
| 3 | AST Interpreter | C |
| 4 | Bytecode Compiler | C |
| 5 | Bytecode Virtual Machine | C |
| 6 | Mark-Sweep Garbage Collector | C |
| 7 | Hidden Classes (Object Shapes) | C |
| 8 | Inline Caching | C |
| 9 | Generational Garbage Collector | C |
| 10 | Basic JIT Compiler | C |
| Final | Complete JavaScript Engine | C |

Building a JavaScript engine is one of the most rewarding projects in computer science. Every piece you build teaches you something fundamental about how modern software works. Enjoy the journey! 🚀