P08: Simple EVM Implementation

P08: Simple EVM Implementation

Project Overview

Attribute Value
Main Language Rust
Alternative Languages Go, C++, TypeScript
Difficulty Master
Coolness Level Level 5: Industry-Defining Achievement
Business Potential Career Catalyst (Protocol Team Credentials)
Knowledge Area Virtual Machines / Ethereum
Main Book โ€œMastering Ethereumโ€ by Andreas M. Antonopoulos & Gavin Wood

Learning Objectives

By completing this project, you will:

  1. Understand the Ethereum Virtual Machine architecture including its stack-based execution model, memory layout, and storage semantics
  2. Master bytecode interpretation implementing opcode decoding, execution, and control flow handling
  3. Implement a complete gas metering system understanding why gas exists and how it prevents denial-of-service attacks
  4. Build persistent smart contract storage with 256-bit key-value semantics matching Ethereumโ€™s SSTORE/SLOAD
  5. Debug and trace smart contract execution at the opcode level, a critical skill for security auditing and optimization
  6. Connect execution to consensus understanding how the EVM fits into Ethereumโ€™s state transition function

Deep Theoretical Foundation

What is the Ethereum Virtual Machine?

The Ethereum Virtual Machine (EVM) is the computation engine at the heart of Ethereum. Every smart contract youโ€™ve ever interacted withโ€”from Uniswap swaps to NFT mints to DAO votesโ€”executes as EVM bytecode. Understanding the EVM means understanding how Ethereum actually works.

Unlike Bitcoinโ€™s limited Script language, the EVM is Turing-complete: it can compute anything computable (given enough gas). This makes Ethereum a โ€œworld computerโ€โ€”a globally shared state machine where anyone can deploy arbitrary programs.

Why Build an EVM?

Building an EVM from scratch teaches you:

  • How smart contracts actually execute (not just what Solidity compiles to)
  • Why gas exists and how itโ€™s calculated (essential for optimization)
  • How storage works (critical for security auditing)
  • How to debug at the opcode level (invaluable for complex issues)
  • The exact semantics that determine whether a transaction succeeds or reverts

This knowledge separates smart contract developers who guess from those who know.

The EVM Architecture

Stack-Based Execution

The EVM is a stack machine, not a register machine like x86 or ARM. Every operation works with a stack:

Stack-Based (EVM):           Register-Based (x86):

PUSH 5                       mov eax, 5
PUSH 3                       add eax, 3
ADD                          ; result in eax
; result on stack top

Stack properties:

  • Maximum depth: 1024 items
  • Each item: 256 bits (32 bytes)
  • Most operations pop operands from stack and push results
  • No direct access to stack items below the top (except DUP and SWAP)

Why stack-based?

  1. Simpler verification: Stack machines are easier to analyze for correctness
  2. Compact bytecode: No need to encode register names
  3. Deterministic: Same inputs always produce same stack states

Memory Model

The EVM has three distinct data areas:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                         EVM DATA AREAS                       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                              โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚                      STACK                            โ”‚  โ”‚
โ”‚  โ”‚  โ€ข 256-bit words, max 1024 items                     โ”‚  โ”‚
โ”‚  โ”‚  โ€ข LIFO (Last In, First Out)                         โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Volatile (cleared after execution)                โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Zero gas for basic push/pop                       โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                                                              โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚                      MEMORY                           โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Byte-addressable, linear                          โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Starts at zero, expands as needed                 โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Volatile (cleared after execution)                โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Expansion costs gas (quadratic)                   โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                                                              โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚                      STORAGE                          โ”‚  โ”‚
โ”‚  โ”‚  โ€ข 256-bit key โ†’ 256-bit value mapping               โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Persistent (survives transactions)                โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Per-contract (isolated between accounts)          โ”‚  โ”‚
โ”‚  โ”‚  โ€ข Very expensive (20,000 gas for new slot)          โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                                                              โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Stack: Fast, temporary workspace for computation. Think of it as CPU registers.

Memory: Byte-addressable scratch space. Grows as needed during execution but is wiped between calls. Used for:

  • Building return data
  • Passing parameters to internal functions
  • Temporary arrays and structs

Storage: Persistent key-value store. This is the contractโ€™s โ€œhard drive.โ€ Storage is what makes contracts statefulโ€”your token balance, your NFT ownership, the current DEX price all live in storage.

Gas System

Gas is the EVMโ€™s execution metering system. Every operation costs gas:

Operation Gas Cost Rationale
ADD, SUB, XOR 3 Simple arithmetic
MUL, DIV 5 More complex math
SHA3 (KECCAK256) 30 + 6/word Cryptographic, scales with size
SLOAD 2100 (cold) / 100 (warm) Disk I/O equivalent
SSTORE 20000 (new) / 5000 (update) Persistent write
CREATE 32000 Contract deployment

Why gas exists:

  1. Denial of service prevention: Without gas, anyone could run infinite loops
  2. Resource pricing: Gas aligns incentivesโ€”heavy computation costs more
  3. Miner/validator compensation: Gas fees pay for execution
  4. Predictable limits: Gas limits bound maximum execution time

Gas mechanics:

Transaction gas limit: 21000 + execution gas
Block gas limit: ~30 million (Ethereum mainnet)

If execution runs out of gas:
  - All state changes revert
  - Gas is NOT refunded (still paid to validators)
  - Transaction marked as failed

Program Counter and Control Flow

The EVM reads bytecode sequentially from position 0. The program counter (PC) tracks the current instruction.

Linear execution:

PC=0: PUSH1 0x05  โ†’ stack: [5]        โ†’ PC=2
PC=2: PUSH1 0x03  โ†’ stack: [5, 3]     โ†’ PC=4
PC=4: ADD         โ†’ stack: [8]        โ†’ PC=5
PC=5: STOP        โ†’ halt

Control flow instructions:

  • JUMP: Set PC to value from stack (must land on JUMPDEST)
  • JUMPI: Conditional jump (if stack[1] != 0, jump to stack[0])
  • JUMPDEST: Valid jump target marker

Critical security detail: All jumps must land on a JUMPDEST opcode. This prevents jumping into the middle of a PUSH instruction (which could be exploited to create unintended code).

Understanding Opcodes

EVM opcodes are single bytes (0x00 to 0xFF). Here are the core categories:

Arithmetic Operations (0x01-0x0B)

Opcode Name Stack Description
0x01 ADD [a, b] โ†’ [a+b] Addition modulo 2^256
0x02 MUL [a, b] โ†’ [a*b] Multiplication modulo 2^256
0x03 SUB [a, b] โ†’ [a-b] Subtraction modulo 2^256
0x04 DIV [a, b] โ†’ [a/b] Integer division (0 if b=0)
0x05 SDIV [a, b] โ†’ [a/b] Signed division
0x06 MOD [a, b] โ†’ [a%b] Modulo (0 if b=0)
0x07 SMOD [a, b] โ†’ [a%b] Signed modulo
0x08 ADDMOD [a, b, N] โ†’ [(a+b)%N] Addition then modulo
0x09 MULMOD [a, b, N] โ†’ [(a*b)%N] Multiplication then modulo
0x0A EXP [a, b] โ†’ [a^b] Exponentiation
0x0B SIGNEXTEND [b, x] โ†’ [y] Sign extend x from (b+1) bytes

Comparison Operations (0x10-0x1D)

Opcode Name Stack Description
0x10 LT [a, b] โ†’ [1 if a<b else 0] Less than
0x11 GT [a, b] โ†’ [1 if a>b else 0] Greater than
0x12 SLT [a, b] โ†’ [1 if a<b else 0] Signed less than
0x13 SGT [a, b] โ†’ [1 if a>b else 0] Signed greater than
0x14 EQ [a, b] โ†’ [1 if a==b else 0] Equality
0x15 ISZERO [a] โ†’ [1 if a==0 else 0] Is zero

Bitwise Operations (0x16-0x1D)

Opcode Name Stack Description
0x16 AND [a, b] โ†’ [a & b] Bitwise AND
0x17 OR [a, b] โ†’ [a | b] Bitwise OR
0x18 XOR [a, b] โ†’ [a ^ b] Bitwise XOR
0x19 NOT [a] โ†’ [~a] Bitwise NOT
0x1A BYTE [i, x] โ†’ [byte] Get byte i of x (0 = most significant)
0x1B SHL [shift, val] โ†’ [val ยซย shift] Shift left
0x1C SHR [shift, val] โ†’ [valย ยป shift] Logical shift right
0x1D SAR [shift, val] โ†’ [valย ยป shift] Arithmetic shift right

Stack Operations (0x50-0x5B, 0x80-0x8F, 0x90-0x9F)

Opcode Name Description
0x50 POP Remove top stack item
0x51 MLOAD Load 32 bytes from memory
0x52 MSTORE Store 32 bytes to memory
0x53 MSTORE8 Store 1 byte to memory
0x54 SLOAD Load from storage
0x55 SSTORE Store to storage
0x56 JUMP Jump to address
0x57 JUMPI Conditional jump
0x58 PC Get program counter
0x59 MSIZE Get memory size
0x5A GAS Get remaining gas
0x5B JUMPDEST Mark valid jump destination
0x60-0x7F PUSH1-PUSH32 Push 1-32 bytes onto stack
0x80-0x8F DUP1-DUP16 Duplicate stack item
0x90-0x9F SWAP1-SWAP16 Swap stack items

Memory Expansion and Gas

Memory starts empty and expands as you access higher addresses:

Memory expansion cost = 3 * words + words^2 / 512

where words = ceil(highest_accessed_byte / 32)

Example:

  • Access byte 0: 0 words โ†’ 0 cost (first 32 bytes are free-ish)
  • Access byte 1000: 32 words โ†’ 3*32 + 32^2/512 = 96 + 2 = 98 gas
  • Access byte 10000: 313 words โ†’ 3*313 + 313^2/512 = 939 + 191 = 1130 gas

The quadratic term prevents unbounded memory expansion attacks.

Storage: The Persistent State

Storage is a mapping from 256-bit keys to 256-bit values. Each contract has its own isolated storage.

Contract Storage:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  slot 0: 0x000...000 โ†’ 0x000...100  (maybe: token supply)   โ”‚
โ”‚  slot 1: 0x000...001 โ†’ 0x123...456  (maybe: owner address)  โ”‚
โ”‚  ...                                                         โ”‚
โ”‚  slot N: mapping hash โ†’ value       (dynamic data)          โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Solidity storage layout:

contract Example {
    uint256 public totalSupply;  // slot 0
    address public owner;        // slot 1
    mapping(address => uint256) public balances;  // slot 2 (base)

    // For mappings, actual slot = keccak256(key . base_slot)
    // balances[0xABC] is at keccak256(abi.encode(0xABC, 2))
}

Storage gas costs (EIP-2200):

  • SLOAD cold: 2100 gas (first access in transaction)
  • SLOAD warm: 100 gas (already accessed)
  • SSTORE from 0 to non-0: 20000 gas
  • SSTORE from non-0 to non-0: 5000 gas
  • SSTORE from non-0 to 0: 5000 gas - 15000 refund

The State Transition Function

The EVM is the core of Ethereumโ€™s state transition function:

ฯƒ' = ฮฅ(ฯƒ, T)

where:
  ฯƒ  = current world state (all account balances, storage, code)
  T  = transaction
  ฮฅ  = state transition function (includes EVM)
  ฯƒ' = new world state

Every block applies transactions sequentially, each running the EVM to compute state changes:

Block Processing:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  State ฯƒโ‚€   โ”‚ โ”€โ”€โ–บ Transaction 1 โ”€โ”€โ–บ ฯƒโ‚ โ”€โ”€โ–บ Transaction 2 โ”€โ”€โ–บ ฯƒโ‚‚ โ”€โ”€โ–บ ... โ”€โ”€โ–บ ฯƒโ‚™
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
      โ”‚
      โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  State ฯƒโ‚™   โ”‚ (new state root in block header)
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Complete Project Specification

Functional Requirements

  1. Bytecode Execution
    • Load and execute arbitrary EVM bytecode
    • Support all PUSH1-PUSH32 instructions
    • Implement arithmetic operations (ADD, MUL, SUB, DIV, MOD, EXP)
    • Implement comparison operations (LT, GT, EQ, ISZERO)
    • Implement bitwise operations (AND, OR, XOR, NOT, SHL, SHR)
  2. Stack Management
    • 256-bit stack items
    • Maximum depth of 1024
    • DUP1-DUP16 operations
    • SWAP1-SWAP16 operations
    • Proper stack underflow/overflow errors
  3. Memory Operations
    • Dynamic memory expansion
    • MLOAD, MSTORE, MSTORE8 operations
    • Proper gas calculation for expansion
    • MSIZE operation
  4. Storage Operations
    • 256-bit key to 256-bit value mapping
    • SLOAD and SSTORE operations
    • Persistent state between calls
    • Gas cost differentiation (cold/warm access)
  5. Control Flow
    • JUMP and JUMPI instructions
    • JUMPDEST validation
    • Proper PC tracking
    • STOP, RETURN, and REVERT
  6. Gas Metering
    • Accurate gas tracking per opcode
    • Out-of-gas detection and handling
    • Gas refunds for SSTORE (zero-out)
    • Report remaining gas
  7. Execution Environment
    • Call data (CALLDATALOAD, CALLDATASIZE, CALLDATACOPY)
    • Return data handling
    • Execution trace output

Command-Line Interface

# Execute bytecode from hex string
$ evm run --bytecode "6005600301" --gas 100000
Stack: [0x08]
Gas used: 9
Gas remaining: 99991

# Execute bytecode from file
$ evm run --file contract.bin --gas 100000 --trace
PC=0000: PUSH1 0x05           stack=[5]                gas=99997
PC=0002: PUSH1 0x03           stack=[5, 3]             gas=99994
PC=0004: ADD                  stack=[8]                gas=99991
PC=0005: STOP                 stack=[8]                gas=99991

# Execute with call data
$ evm run --bytecode "600035" --calldata "0x1234567890" --gas 100000
Stack: [0x1234567890000000000000000000000000000000000000000000000000000000]

# Disassemble bytecode
$ evm disasm --bytecode "6005600301"
0000: PUSH1 0x05
0002: PUSH1 0x03
0004: ADD
0005: STOP

# Show storage state
$ evm run --bytecode "60016000555460005260206000f3" --gas 100000 --show-storage
Storage:
  0x00: 0x01
Return data: 0x0000000000000000000000000000000000000000000000000000000000000001

Solution Architecture

Module Structure

src/
โ”œโ”€โ”€ main.rs                 # CLI entry point
โ”œโ”€โ”€ lib.rs                  # Public API
โ”œโ”€โ”€ evm/
โ”‚   โ”œโ”€โ”€ mod.rs              # EVM coordinator
โ”‚   โ”œโ”€โ”€ interpreter.rs      # Main execution loop
โ”‚   โ”œโ”€โ”€ opcodes.rs          # Opcode definitions and dispatch
โ”‚   โ””โ”€โ”€ gas.rs              # Gas cost calculations
โ”œโ”€โ”€ state/
โ”‚   โ”œโ”€โ”€ mod.rs              # State management
โ”‚   โ”œโ”€โ”€ stack.rs            # 256-bit stack implementation
โ”‚   โ”œโ”€โ”€ memory.rs           # Byte-addressable memory
โ”‚   โ””โ”€โ”€ storage.rs          # Persistent key-value storage
โ”œโ”€โ”€ types/
โ”‚   โ”œโ”€โ”€ mod.rs              # Core types
โ”‚   โ”œโ”€โ”€ u256.rs             # 256-bit unsigned integer
โ”‚   โ””โ”€โ”€ address.rs          # Ethereum address type
โ”œโ”€โ”€ context/
โ”‚   โ”œโ”€โ”€ mod.rs              # Execution context
โ”‚   โ”œโ”€โ”€ call_context.rs     # Call frame information
โ”‚   โ””โ”€โ”€ block_context.rs    # Block information
โ”œโ”€โ”€ trace/
โ”‚   โ”œโ”€โ”€ mod.rs              # Execution tracing
โ”‚   โ”œโ”€โ”€ step.rs             # Single step representation
โ”‚   โ””โ”€โ”€ formatter.rs        # Trace output formatting
โ””โ”€โ”€ tests/
    โ”œโ”€โ”€ arithmetic.rs       # Arithmetic opcode tests
    โ”œโ”€โ”€ stack.rs            # Stack operation tests
    โ”œโ”€โ”€ memory.rs           # Memory operation tests
    โ”œโ”€โ”€ storage.rs          # Storage tests
    โ”œโ”€โ”€ control_flow.rs     # Jump tests
    โ””โ”€โ”€ integration.rs      # Full contract tests

Core Data Structures

use primitive_types::U256;  // Or implement your own

/// Maximum stack depth
const MAX_STACK_DEPTH: usize = 1024;

/// The EVM stack - 256-bit items, max 1024 depth
pub struct Stack {
    data: Vec<U256>,
}

impl Stack {
    pub fn new() -> Self {
        Self { data: Vec::with_capacity(16) }
    }

    pub fn push(&mut self, value: U256) -> Result<(), EvmError> {
        if self.data.len() >= MAX_STACK_DEPTH {
            return Err(EvmError::StackOverflow);
        }
        self.data.push(value);
        Ok(())
    }

    pub fn pop(&mut self) -> Result<U256, EvmError> {
        self.data.pop().ok_or(EvmError::StackUnderflow)
    }

    pub fn peek(&self, depth: usize) -> Result<U256, EvmError> {
        let idx = self.data.len().checked_sub(depth + 1)
            .ok_or(EvmError::StackUnderflow)?;
        Ok(self.data[idx])
    }

    pub fn swap(&mut self, depth: usize) -> Result<(), EvmError> {
        let len = self.data.len();
        if depth >= len {
            return Err(EvmError::StackUnderflow);
        }
        self.data.swap(len - 1, len - 1 - depth);
        Ok(())
    }

    pub fn dup(&mut self, depth: usize) -> Result<(), EvmError> {
        let value = self.peek(depth - 1)?;
        self.push(value)
    }
}

/// EVM memory - byte addressable, dynamically expanding
pub struct Memory {
    data: Vec<u8>,
}

impl Memory {
    pub fn new() -> Self {
        Self { data: Vec::new() }
    }

    /// Expand memory to fit the given offset + size
    fn expand(&mut self, offset: usize, size: usize) {
        let required = offset.saturating_add(size);
        if required > self.data.len() {
            // Expand to next 32-byte boundary
            let new_size = (required + 31) / 32 * 32;
            self.data.resize(new_size, 0);
        }
    }

    /// Calculate gas cost for memory expansion
    pub fn expansion_cost(&self, offset: usize, size: usize) -> u64 {
        if size == 0 {
            return 0;
        }

        let new_words = (offset + size + 31) / 32;
        let current_words = self.data.len() / 32;

        if new_words <= current_words {
            return 0;
        }

        let new_cost = 3 * new_words as u64 + (new_words as u64).pow(2) / 512;
        let current_cost = 3 * current_words as u64 + (current_words as u64).pow(2) / 512;

        new_cost - current_cost
    }

    pub fn load(&mut self, offset: usize) -> U256 {
        self.expand(offset, 32);
        let bytes: [u8; 32] = self.data[offset..offset + 32]
            .try_into()
            .unwrap();
        U256::from_big_endian(&bytes)
    }

    pub fn store(&mut self, offset: usize, value: U256) {
        self.expand(offset, 32);
        let mut bytes = [0u8; 32];
        value.to_big_endian(&mut bytes);
        self.data[offset..offset + 32].copy_from_slice(&bytes);
    }

    pub fn store8(&mut self, offset: usize, value: u8) {
        self.expand(offset, 1);
        self.data[offset] = value;
    }

    pub fn size(&self) -> usize {
        self.data.len()
    }
}

/// Contract storage - 256-bit key to 256-bit value
pub struct Storage {
    data: HashMap<U256, U256>,
    original: HashMap<U256, U256>,  // For gas refund calculation
    warm_slots: HashSet<U256>,      // For EIP-2929 warm/cold access
}

impl Storage {
    pub fn new() -> Self {
        Self {
            data: HashMap::new(),
            original: HashMap::new(),
            warm_slots: HashSet::new(),
        }
    }

    pub fn load(&mut self, key: U256) -> (U256, bool) {
        let is_cold = !self.warm_slots.contains(&key);
        self.warm_slots.insert(key);
        let value = self.data.get(&key).copied().unwrap_or(U256::zero());
        (value, is_cold)
    }

    pub fn store(&mut self, key: U256, value: U256) -> StoreCost {
        let current = self.data.get(&key).copied().unwrap_or(U256::zero());
        let original = self.original.get(&key).copied().unwrap_or(U256::zero());

        // Track original value for refund calculations
        if !self.original.contains_key(&key) {
            self.original.insert(key, current);
        }

        self.data.insert(key, value);
        self.warm_slots.insert(key);

        StoreCost::calculate(original, current, value)
    }
}

/// Execution context for a single call
pub struct ExecutionContext {
    pub code: Vec<u8>,
    pub pc: usize,
    pub stack: Stack,
    pub memory: Memory,
    pub storage: Storage,
    pub gas_remaining: u64,
    pub gas_refund: i64,
    pub calldata: Vec<u8>,
    pub return_data: Vec<u8>,
    pub stopped: bool,
    pub reverted: bool,
}

/// Opcode enumeration
#[derive(Debug, Clone, Copy)]
pub enum Opcode {
    Stop = 0x00,
    Add = 0x01,
    Mul = 0x02,
    Sub = 0x03,
    Div = 0x04,
    SDiv = 0x05,
    Mod = 0x06,
    SMod = 0x07,
    AddMod = 0x08,
    MulMod = 0x09,
    Exp = 0x0a,
    SignExtend = 0x0b,
    Lt = 0x10,
    Gt = 0x11,
    SLt = 0x12,
    SGt = 0x13,
    Eq = 0x14,
    IsZero = 0x15,
    And = 0x16,
    Or = 0x17,
    Xor = 0x18,
    Not = 0x19,
    Byte = 0x1a,
    Shl = 0x1b,
    Shr = 0x1c,
    Sar = 0x1d,
    Keccak256 = 0x20,
    Pop = 0x50,
    MLoad = 0x51,
    MStore = 0x52,
    MStore8 = 0x53,
    SLoad = 0x54,
    SStore = 0x55,
    Jump = 0x56,
    JumpI = 0x57,
    Pc = 0x58,
    MSize = 0x59,
    Gas = 0x5a,
    JumpDest = 0x5b,
    Push1 = 0x60,
    // ... PUSH2-PUSH32
    Push32 = 0x7f,
    Dup1 = 0x80,
    // ... DUP2-DUP16
    Dup16 = 0x8f,
    Swap1 = 0x90,
    // ... SWAP2-SWAP16
    Swap16 = 0x9f,
    Return = 0xf3,
    Revert = 0xfd,
    Invalid = 0xfe,
}

/// Error types
#[derive(Debug, Clone)]
pub enum EvmError {
    StackOverflow,
    StackUnderflow,
    OutOfGas,
    InvalidJump,
    InvalidOpcode(u8),
    WriteProtection,
    OutOfBounds,
}

/// Execution result
pub struct ExecutionResult {
    pub success: bool,
    pub return_data: Vec<u8>,
    pub gas_used: u64,
    pub gas_refund: i64,
    pub logs: Vec<Log>,
}

Key Algorithms

Main Execution Loop

function execute(context: ExecutionContext) -> ExecutionResult:
    // Pre-compute valid JUMPDEST locations
    jumpdests = analyze_jumpdests(context.code)

    while context.pc < context.code.len() and not context.stopped:
        opcode = context.code[context.pc]
        gas_cost = calculate_gas_cost(opcode, context)

        if context.gas_remaining < gas_cost:
            return Error(OutOfGas)

        context.gas_remaining -= gas_cost

        result = execute_opcode(opcode, context, jumpdests)

        if result.is_error():
            return result

        // PUSH instructions advance PC by immediate size + 1
        // JUMP/JUMPI set PC directly
        // Other instructions advance PC by 1

    return ExecutionResult {
        success: not context.reverted,
        return_data: context.return_data,
        gas_used: initial_gas - context.gas_remaining,
        gas_refund: context.gas_refund,
    }

JUMPDEST Analysis

function analyze_jumpdests(code: bytes) -> Set<usize>:
    jumpdests = {}
    pc = 0

    while pc < code.len():
        opcode = code[pc]

        if opcode == JUMPDEST (0x5B):
            jumpdests.add(pc)
            pc += 1
        else if opcode >= PUSH1 and opcode <= PUSH32:
            // Skip over the immediate data
            push_size = opcode - PUSH1 + 1
            pc += 1 + push_size
        else:
            pc += 1

    return jumpdests

Gas Calculation

function calculate_gas_cost(opcode: u8, context: ExecutionContext) -> u64:
    base_cost = match opcode:
        STOP, RETURN, REVERT => 0
        ADD, SUB, NOT, LT, GT, ... => 3
        MUL, DIV, MOD, ... => 5
        EXP => 10 + 50 * (byte_size_of_exponent)
        SHA3 => 30 + 6 * (word_count)
        SLOAD => if cold 2100 else 100
        SSTORE => compute_sstore_cost(...)
        MLOAD, MSTORE, MSTORE8 => 3
        JUMP => 8
        JUMPI => 10
        ...

    // Add memory expansion cost if applicable
    if opcode in [MLOAD, MSTORE, MSTORE8, SHA3, ...]:
        memory_cost = context.memory.expansion_cost(offset, size)
        base_cost += memory_cost

    return base_cost

Signed Integer Handling

function to_signed(x: U256) -> I256:
    // 256-bit two's complement
    if x >= 2^255:
        return x - 2^256
    else:
        return x

function from_signed(x: I256) -> U256:
    if x < 0:
        return x + 2^256
    else:
        return x

function sdiv(a: U256, b: U256) -> U256:
    if b == 0:
        return 0

    a_signed = to_signed(a)
    b_signed = to_signed(b)

    // Special case: -2^255 / -1 = -2^255 (overflow)
    if a_signed == -2^255 and b_signed == -1:
        return a  // -2^255 as U256

    result = a_signed / b_signed
    return from_signed(result)

Phased Implementation Guide

Phase 1: Core Types and Stack

Goal: Implement 256-bit integers and the stack machine.

Tasks:

  1. Choose or implement a U256 type (recommend primitive-types crate for learning)
  2. Implement the Stack struct with push, pop, peek
  3. Implement DUP1-DUP16 operations
  4. Implement SWAP1-SWAP16 operations
  5. Add proper overflow/underflow error handling

Validation:

let mut stack = Stack::new();
stack.push(U256::from(5))?;
stack.push(U256::from(3))?;
stack.dup(1)?;  // Duplicate second item
assert_eq!(stack.pop()?, U256::from(5));
assert_eq!(stack.pop()?, U256::from(3));
assert_eq!(stack.pop()?, U256::from(5));

Hints if stuck:

  • Use Vec<U256> for the stack data
  • Stack grows upward: index 0 is bottom, last index is top
  • DUP1 duplicates top item (depth 0), DUP2 duplicates second item (depth 1)
  • SWAP1 swaps top with second, SWAP2 swaps top with third

Phase 2: Arithmetic Opcodes

Goal: Implement basic arithmetic operations.

Tasks:

  1. Create opcode dispatcher structure
  2. Implement ADD, SUB, MUL, DIV, MOD
  3. Implement SDIV, SMOD (signed variants)
  4. Implement ADDMOD, MULMOD
  5. Implement EXP (with proper gas calculation)

Validation:

// ADD: 5 + 3 = 8
let result = execute_bytecode("6005600301")?;  // PUSH1 5, PUSH1 3, ADD
assert_eq!(result.stack[0], U256::from(8));

// SUB: 5 - 3 = 2
let result = execute_bytecode("6003600503")?;  // PUSH1 3, PUSH1 5, SUB
assert_eq!(result.stack[0], U256::from(2));

// MUL: 5 * 3 = 15
let result = execute_bytecode("6005600302")?;
assert_eq!(result.stack[0], U256::from(15));

// DIV by zero = 0
let result = execute_bytecode("6000600304")?;  // PUSH1 0, PUSH1 3, DIV
assert_eq!(result.stack[0], U256::from(0));

Hints if stuck:

  • All arithmetic is modulo 2^256 (use wrapping operations)
  • Division by zero returns 0 (not an error in EVM)
  • For signed operations, interpret as twoโ€™s complement
  • Stack order matters: a b OP means a OP b where a is deeper

Phase 3: Comparison and Bitwise

Goal: Implement comparison and bitwise operations.

Tasks:

  1. Implement LT, GT, SLT, SGT, EQ, ISZERO
  2. Implement AND, OR, XOR, NOT
  3. Implement BYTE (extract single byte)
  4. Implement SHL, SHR, SAR (shifts)

Validation:

// LT: 3 < 5 = 1
let result = execute_bytecode("6005600310")?;  // PUSH1 5, PUSH1 3, LT
assert_eq!(result.stack[0], U256::from(1));

// ISZERO: iszero(0) = 1
let result = execute_bytecode("600015")?;
assert_eq!(result.stack[0], U256::from(1));

// SHL: 1 << 4 = 16
let result = execute_bytecode("600160041b")?;
assert_eq!(result.stack[0], U256::from(16));

Hints if stuck:

  • Comparison returns 1 (true) or 0 (false)
  • BYTE: index 0 is the most significant byte
  • SHL/SHR shift amounts >= 256 result in 0
  • SAR preserves the sign bit (arithmetic shift)

Phase 4: Memory Operations

Goal: Implement the memory subsystem.

Tasks:

  1. Implement Memory struct with dynamic expansion
  2. Implement MLOAD (32-byte load)
  3. Implement MSTORE (32-byte store)
  4. Implement MSTORE8 (single byte store)
  5. Implement MSIZE (get memory size)
  6. Calculate memory expansion gas correctly

Validation:

// Store and load
// PUSH1 0x42, PUSH1 0x00, MSTORE, PUSH1 0x00, MLOAD
let result = execute_bytecode("6042600052600051")?;
assert_eq!(result.stack[0], U256::from(0x42));

// MSIZE after storing at offset 0
let result = execute_bytecode("604260005259")?;
assert_eq!(result.stack[0], U256::from(32));  // Expanded to 32 bytes

Hints if stuck:

  • Memory is byte-addressable but MLOAD/MSTORE work with 32-byte words
  • Memory always expands to 32-byte boundaries
  • MSTORE is big-endian: valueโ€™s MSB goes to lowest address
  • Track memory expansion for gas calculation

Phase 5: Storage Operations

Goal: Implement persistent storage.

Tasks:

  1. Implement Storage struct with HashMap
  2. Implement SLOAD (read from storage)
  3. Implement SSTORE (write to storage)
  4. Implement warm/cold access tracking (EIP-2929)
  5. Calculate storage gas costs correctly
  6. Track gas refunds for clearing storage

Validation:

// Store 1 at slot 0, then load it back
// PUSH1 0x01, PUSH1 0x00, SSTORE, PUSH1 0x00, SLOAD
let result = execute_bytecode("6001600055600054")?;
assert_eq!(result.stack[0], U256::from(1));

// Storage persists after execution
assert_eq!(result.storage.get(&U256::zero()), Some(&U256::from(1)));

Hints if stuck:

  • Non-existent slots return 0
  • First access (cold) costs more than subsequent (warm)
  • Clearing a slot (setting to 0) may give gas refund
  • Storage is per-contract (isolated between addresses)

Phase 6: Control Flow

Goal: Implement jumps and control flow.

Tasks:

  1. Pre-analyze code to find valid JUMPDEST locations
  2. Implement JUMP (unconditional)
  3. Implement JUMPI (conditional)
  4. Implement JUMPDEST (no-op, just marks valid target)
  5. Implement STOP, RETURN, REVERT
  6. Implement PC (get program counter)

Validation:

// Simple loop: push 3, jump to JUMPDEST at 0x05, increment
// 0x00: PUSH1 3
// 0x02: PUSH1 5
// 0x04: JUMP
// 0x05: JUMPDEST
// 0x06: PUSH1 1
// 0x08: ADD
// 0x09: STOP
let code = "60036005565b60010100";
let result = execute_bytecode(code)?;
assert_eq!(result.stack[0], U256::from(4));  // 3 + 1

// Invalid jump should fail
let code = "6000600456";  // Jump to 0x04 (not a JUMPDEST)
let result = execute_bytecode(code);
assert!(result.is_err());

Hints if stuck:

  • Must pre-scan code to find all JUMPDEST locations
  • Cannot jump into PUSH data (even if byte value is 0x5B)
  • JUMPI only jumps if condition is non-zero
  • RETURN and REVERT halt execution with return data

Phase 7: Call Data Operations

Goal: Handle input data to the EVM.

Tasks:

  1. Add calldata field to execution context
  2. Implement CALLDATALOAD (read 32 bytes from calldata)
  3. Implement CALLDATASIZE (get calldata length)
  4. Implement CALLDATACOPY (copy calldata to memory)

Validation:

// CALLDATALOAD at offset 0
let calldata = hex::decode("1234567890abcdef")?;
let result = execute_with_calldata("600035", calldata)?;  // PUSH1 0, CALLDATALOAD
// Result should be 0x1234567890abcdef padded to 32 bytes

Hints if stuck:

  • CALLDATALOAD pads with zeros if reading past end
  • CALLDATACOPY reads (offset, size) from stack
  • Function selectors are first 4 bytes of calldata

Phase 8: Gas Metering

Goal: Accurate gas accounting.

Tasks:

  1. Define gas costs for all opcodes
  2. Track gas consumption during execution
  3. Handle out-of-gas errors (revert all changes)
  4. Implement dynamic gas costs (EXP, SHA3, memory)
  5. Track gas refunds (SSTORE cleanup)
  6. Implement GAS opcode (remaining gas)

Validation:

// ADD costs 3 gas
let result = execute_with_gas("6001600101", 10)?;
assert_eq!(result.gas_used, 9);  // 2x PUSH1 (3 each) + ADD (3)

// Out of gas
let result = execute_with_gas("6001600101", 5);
assert!(matches!(result, Err(EvmError::OutOfGas)));

Hints if stuck:

  • Check gas BEFORE executing each opcode
  • Memory expansion gas is calculated incrementally
  • EXP gas depends on byte size of exponent
  • Gas refunds are capped at 20% of total gas used (post-London)

Phase 9: Execution Tracing

Goal: Debug-quality execution traces.

Tasks:

  1. Create trace data structure
  2. Capture state after each opcode
  3. Format output like standard EVM debuggers
  4. Add verbose CLI flag

Validation:

$ evm run --bytecode "6005600301" --trace
PC=0000: PUSH1 0x05           stack=[0x05]             gas=99997
PC=0002: PUSH1 0x03           stack=[0x05, 0x03]       gas=99994
PC=0004: ADD                  stack=[0x08]             gas=99991
PC=0005: STOP                 stack=[0x08]             gas=99991

Hints if stuck:

  • Capture state AFTER each instruction
  • Include PC, opcode name, stack state, memory size, gas
  • Consider JSON output for tooling integration

Testing Strategy

Unit Tests

#[test]
fn test_stack_push_pop() {
    let mut stack = Stack::new();
    stack.push(U256::from(42)).unwrap();
    assert_eq!(stack.pop().unwrap(), U256::from(42));
}

#[test]
fn test_stack_overflow() {
    let mut stack = Stack::new();
    for i in 0..1024 {
        stack.push(U256::from(i)).unwrap();
    }
    assert!(stack.push(U256::from(0)).is_err());
}

#[test]
fn test_memory_expansion() {
    let mut mem = Memory::new();
    mem.store(0, U256::from(42));
    assert_eq!(mem.size(), 32);
    mem.store(100, U256::from(42));
    assert_eq!(mem.size(), 160);  // Rounds up to 32-byte boundary
}

#[test]
fn test_signed_division() {
    // -4 / 2 = -2
    let neg_four = U256::MAX - U256::from(3);  // Two's complement of -4
    let result = sdiv(neg_four, U256::from(2));
    let neg_two = U256::MAX - U256::from(1);   // Two's complement of -2
    assert_eq!(result, neg_two);
}

Opcode Tests

#[test]
fn test_add_basic() {
    let result = execute("6005600301").unwrap();
    assert_eq!(result.stack[0], U256::from(8));
}

#[test]
fn test_add_overflow() {
    // MAX + 1 = 0 (wraps around)
    let code = format!(
        "7f{}600101",  // PUSH32 MAX, PUSH1 1, ADD
        "ff".repeat(32)
    );
    let result = execute(&code).unwrap();
    assert_eq!(result.stack[0], U256::zero());
}

#[test]
fn test_jump_valid() {
    // PUSH1 4, JUMP, INVALID, JUMPDEST, STOP
    let result = execute("6004565bfe5b00").unwrap();
    assert!(result.success);
}

#[test]
fn test_jump_invalid() {
    // PUSH1 3, JUMP (3 is not a JUMPDEST)
    let result = execute("60035600");
    assert!(matches!(result.err(), Some(EvmError::InvalidJump)));
}

Integration Tests

#[test]
fn test_simple_storage_contract() {
    // Contract: store input at slot 0, return storage[0]
    // CALLDATALOAD 0, SSTORE 0, SLOAD 0, MSTORE 0, RETURN 32
    let code = "600035600055600054600052602060000f3";
    let calldata = vec![0u8; 31].into_iter().chain([0x42]).collect();

    let result = execute_with_calldata(code, calldata).unwrap();
    assert_eq!(result.return_data, [0u8; 31].into_iter().chain([0x42]).collect::<Vec<_>>());
}

#[test]
fn test_fibonacci() {
    // Compute 10th Fibonacci number
    // This tests loops, conditionals, stack manipulation
    let code = include_str!("fixtures/fibonacci.hex");
    let result = execute(code).unwrap();
    assert_eq!(result.stack[0], U256::from(55));
}

Gas Tests

#[test]
fn test_gas_simple_operations() {
    // 3 PUSH1 + ADD = 3 + 3 + 3 = 9 gas
    let result = execute_with_gas("6001600201", 100).unwrap();
    assert_eq!(result.gas_used, 9);
}

#[test]
fn test_gas_memory_expansion() {
    // MSTORE at offset 1000 should cost base + expansion
    let result = execute_with_gas("60426103e852", 10000).unwrap();
    let expected_expansion = 3 * 32 + 32 * 32 / 512;  // words = ceil(1032/32) = 33
    assert!(result.gas_used > 3 + 3);  // More than just PUSH costs
}

#[test]
fn test_out_of_gas() {
    let result = execute_with_gas("6001600201", 5);
    assert!(matches!(result.err(), Some(EvmError::OutOfGas)));
}

Common Pitfalls & Debugging

Pitfall 1: Stack Order Confusion

Problem: EVM stack order is counterintuitive for binary operations.

Symptom: SUB and DIV produce wrong results.

Solution:

// For "a - b", a is pushed first, then b
// Stack: [a, b] (b on top)
// SUB pops b, pops a, pushes a-b

fn sub(stack: &mut Stack) -> Result<(), EvmError> {
    let b = stack.pop()?;  // b is on top
    let a = stack.pop()?;  // a is below
    stack.push(a.wrapping_sub(b))
}

Mental model: Think โ€œthe operation is performed on stack items in order they appear in documentation.โ€

Pitfall 2: PUSH Immediate Parsing

Problem: PUSH instructions have variable-length immediates.

Symptom: Code after PUSH executes incorrectly.

Solution:

fn execute_push(ctx: &mut Context, n: usize) -> Result<(), EvmError> {
    // n is 1-32 for PUSH1-PUSH32
    let mut bytes = [0u8; 32];

    // Read n bytes from code, right-aligned in 32-byte array
    let start = 32 - n;
    for i in 0..n {
        let pc = ctx.pc + 1 + i;
        if pc < ctx.code.len() {
            bytes[start + i] = ctx.code[pc];
        }
    }

    ctx.stack.push(U256::from_big_endian(&bytes))?;
    ctx.pc += n + 1;  // Skip opcode + n bytes
    Ok(())
}

Pitfall 3: JUMPDEST Validation

Problem: Jumping into PUSH data that happens to be 0x5B.

Symptom: Arbitrary code execution or wrong jumps.

Solution:

fn analyze_jumpdests(code: &[u8]) -> HashSet<usize> {
    let mut valid = HashSet::new();
    let mut pc = 0;

    while pc < code.len() {
        let opcode = code[pc];

        if opcode == 0x5B {  // JUMPDEST
            valid.insert(pc);
            pc += 1;
        } else if opcode >= 0x60 && opcode <= 0x7F {  // PUSH1-PUSH32
            let push_size = (opcode - 0x60 + 1) as usize;
            pc += 1 + push_size;  // Skip over immediate data
        } else {
            pc += 1;
        }
    }

    valid
}

fn jump(ctx: &mut Context, jumpdests: &HashSet<usize>) -> Result<(), EvmError> {
    let dest = ctx.stack.pop()?.as_usize();
    if !jumpdests.contains(&dest) {
        return Err(EvmError::InvalidJump);
    }
    ctx.pc = dest;
    Ok(())
}

Pitfall 4: Memory Word Alignment

Problem: EVM memory is byte-addressable but MLOAD/MSTORE work with 32-byte words.

Symptom: Values stored incorrectly or shifted.

Solution:

fn mstore(memory: &mut Memory, offset: usize, value: U256) {
    let mut bytes = [0u8; 32];
    value.to_big_endian(&mut bytes);  // Big-endian!

    memory.expand(offset, 32);
    memory.data[offset..offset + 32].copy_from_slice(&bytes);
}

fn mload(memory: &mut Memory, offset: usize) -> U256 {
    memory.expand(offset, 32);

    let bytes: [u8; 32] = memory.data[offset..offset + 32]
        .try_into()
        .unwrap();
    U256::from_big_endian(&bytes)  // Big-endian!
}

Key insight: EVM uses big-endian byte order for memory and storage.

Pitfall 5: Gas Refund Complexity

Problem: SSTORE gas refunds have complex rules.

Symptom: Gas calculations donโ€™t match other implementations.

Solution: Follow EIP-2200 precisely:

fn sstore_gas(original: U256, current: U256, new: U256) -> (u64, i64) {
    // Returns (gas_cost, gas_refund)

    if current == new {
        // No-op: only warm access cost
        return (100, 0);
    }

    if original == current {
        if original.is_zero() {
            // 0 -> non-zero: new storage
            return (20000, 0);
        }
        if new.is_zero() {
            // non-zero -> 0: clearing
            return (5000, 15000);
        }
        // non-zero -> different non-zero
        return (5000, 0);
    }

    // Current != original (already modified in this transaction)
    let mut refund = 0i64;

    if !original.is_zero() {
        if current.is_zero() {
            // Undo previous clearing
            refund -= 15000;
        } else if new.is_zero() {
            // Clear dirty slot
            refund += 15000;
        }
    }

    if original == new {
        // Reset to original value
        if original.is_zero() {
            refund += 20000 - 100;  // Refund the 0->non-zero cost
        } else {
            refund += 5000 - 100;
        }
    }

    (100, refund)  // Warm access cost
}

Extensions and Challenges

Challenge 1: LOG Operations

Implement LOG0-LOG4 for event emission. Store logs with topics and data.

struct Log {
    address: Address,
    topics: Vec<U256>,  // 0-4 topics
    data: Vec<u8>,
}

fn log_n(ctx: &mut Context, n: usize) -> Result<(), EvmError> {
    let offset = ctx.stack.pop()?.as_usize();
    let size = ctx.stack.pop()?.as_usize();
    let mut topics = Vec::with_capacity(n);
    for _ in 0..n {
        topics.push(ctx.stack.pop()?);
    }

    let data = ctx.memory.read_slice(offset, size);
    ctx.logs.push(Log {
        address: ctx.address,
        topics,
        data,
    });
    Ok(())
}

Challenge 2: CREATE and CREATE2

Implement contract deployment:

fn create(ctx: &mut Context) -> Result<(), EvmError> {
    let value = ctx.stack.pop()?;
    let offset = ctx.stack.pop()?.as_usize();
    let size = ctx.stack.pop()?.as_usize();

    let init_code = ctx.memory.read_slice(offset, size);

    // New address = keccak256(rlp([sender, nonce]))[12..]
    let new_address = compute_create_address(ctx.address, ctx.nonce);

    // Execute init_code to get runtime code
    let runtime_code = execute_init_code(init_code, value)?;

    // Store the contract
    ctx.world_state.set_code(new_address, runtime_code);
    ctx.stack.push(new_address.into())
}

Challenge 3: CALL, STATICCALL, DELEGATECALL

Implement external calls with proper context handling:

fn call(ctx: &mut Context) -> Result<(), EvmError> {
    let gas = ctx.stack.pop()?;
    let to = ctx.stack.pop()?;
    let value = ctx.stack.pop()?;
    let args_offset = ctx.stack.pop()?.as_usize();
    let args_size = ctx.stack.pop()?.as_usize();
    let ret_offset = ctx.stack.pop()?.as_usize();
    let ret_size = ctx.stack.pop()?.as_usize();

    let calldata = ctx.memory.read_slice(args_offset, args_size);

    // Create new execution context
    let mut call_ctx = ExecutionContext {
        code: ctx.world_state.get_code(to.into()),
        storage: ctx.world_state.get_storage(to.into()),
        calldata,
        value,
        caller: ctx.address,
        // ...
    };

    let result = execute(&mut call_ctx)?;

    // Copy return data to memory
    ctx.memory.write_slice(ret_offset, &result.return_data[..ret_size.min(result.return_data.len())]);

    ctx.stack.push(if result.success { U256::one() } else { U256::zero() })
}

Challenge 4: Precompiled Contracts

Implement Ethereum precompiles:

  • 0x01: ECRECOVER (signature recovery)
  • 0x02: SHA256
  • 0x03: RIPEMD160
  • 0x04: Identity (data copy)
  • 0x05: MODEXP (modular exponentiation)
  • 0x06-0x08: Elliptic curve operations

Challenge 5: EVM Fuzzer

Build a fuzzer that generates random bytecode and compares execution results against a reference implementation (like revm or geth):

fn fuzz_evm() {
    let mut rng = rand::thread_rng();

    loop {
        let bytecode = generate_random_bytecode(&mut rng);

        let our_result = our_evm::execute(&bytecode, GAS_LIMIT);
        let reference_result = revm::execute(&bytecode, GAS_LIMIT);

        assert_eq!(our_result.stack, reference_result.stack);
        assert_eq!(our_result.memory, reference_result.memory);
        assert_eq!(our_result.gas_used, reference_result.gas_used);
    }
}

Real-World Connections

Ethereum Clients

Production Ethereum clients (Geth, Nethermind, Reth, Erigon) all implement the EVM. Your implementation follows the same specification:

  • go-ethereum (Geth): The reference implementation in Go
  • Nethermind: .NET implementation with focus on performance
  • Reth: New Rust implementation (excellent code to study)
  • Erigon: Fork of Geth optimized for archive nodes

Smart Contract Debugging

Tools like Tenderly, Foundryโ€™s debugger, and Hardhatโ€™s console use EVM tracing:

# Foundry trace example
$ forge test -vvvv
[PASS] testTransfer()
  [5432] Token::transfer(0x123..., 100)
    โ”œโ”€ [2100] SLOAD slot 0x00...01
    โ”œโ”€ [5000] SSTORE slot 0x00...01
    โ””โ”€ emit Transfer(from: 0xabc, to: 0x123, value: 100)

Understanding EVM execution lets you debug at this level.

Security Auditing

Auditors trace execution to find vulnerabilities:

// Vulnerable to reentrancy
function withdraw() external {
    uint bal = balances[msg.sender];
    (bool success,) = msg.sender.call{value: bal}("");  // EVM: CALL
    balances[msg.sender] = 0;  // EVM: SSTORE
}

The EVM executes CALL before SSTORE. If the recipient is a contract, it can call back before the balance is zeroed.

Gas Optimization

Solidity developers optimize gas by understanding EVM costs:

// Expensive: many SLOADs
for (uint i = 0; i < array.length; i++) {  // SLOAD every iteration
    total += array[i];
}

// Cheaper: cache length
uint len = array.length;  // Single SLOAD
for (uint i = 0; i < len; i++) {
    total += array[i];
}

Layer 2 Execution

Optimistic rollups (Arbitrum, Optimism) and zkEVMs (zkSync, Scroll) execute EVM bytecode off-chain:

  • Optimistic: Run EVM and post results; fraud proofs re-execute disputed transactions
  • zkEVM: Generate zero-knowledge proofs of correct EVM execution

Understanding the EVM is prerequisite to understanding L2 scaling.


Resources

Primary References

  1. Ethereum Yellow Paper: ethereum.github.io/yellowpaper - Formal EVM specification
  2. evm.codes: evm.codes - Interactive opcode reference
  3. โ€œMastering Ethereumโ€ Chapter 13: EVM deep dive

Code References

  1. revm: github.com/bluealloy/revm - Fast Rust EVM (study this!)
  2. py-evm: github.com/ethereum/py-evm - Readable Python implementation
  3. evmone: github.com/ethereum/evmone - Optimized C++ implementation

Test Suites

  1. Ethereum Tests: github.com/ethereum/tests - Official test vectors
  2. EVM Test: Use these to validate your implementation against the spec

Supplementary Reading

  1. EIPs affecting EVM:
    • EIP-2929: Gas cost changes for state access
    • EIP-3529: Reduction in gas refunds
    • EIP-3541: Reject contracts starting with 0xEF
  2. โ€œEVM Deep Divesโ€ series by noxx: Practical EVM internals

Self-Assessment Checklist

Before moving to the next project, verify:

  • I can explain why the EVM is stack-based rather than register-based
  • I understand the difference between memory (volatile) and storage (persistent)
  • I can trace through bytecode execution manually on paper
  • My implementation correctly handles stack overflow/underflow
  • All arithmetic operations handle 256-bit wraparound correctly
  • JUMP validation prevents jumping into PUSH immediate data
  • Gas metering matches expected values for all opcodes
  • Memory expansion costs are calculated correctly
  • Storage warm/cold access is tracked properly
  • I can explain how the EVM fits into Ethereumโ€™s state transition function

Whatโ€™s Next?

With the EVM mastered, you now understand how smart contracts actually execute. In Project 9: ERC-20 Token from Scratch, youโ€™ll apply this knowledge by implementing the most fundamental token standardโ€”understanding exactly what bytecode gets generated and why tokens work the way they do.

Your EVM implementation can also serve as the execution engine for:

  • A custom blockchain (combine with Project 4)
  • A smart contract debugger
  • A security analysis tool (Project 14)
  • An L2 fraud prover (Project 13)

The EVM is the computational heart of Ethereum. Now you own that knowledge.