P08: Simple EVM Implementation
P08: Simple EVM Implementation
Project Overview
| Attribute | Value |
|---|---|
| Main Language | Rust |
| Alternative Languages | Go, C++, TypeScript |
| Difficulty | Master |
| Coolness Level | Level 5: Industry-Defining Achievement |
| Business Potential | Career Catalyst (Protocol Team Credentials) |
| Knowledge Area | Virtual Machines / Ethereum |
| Main Book | โMastering Ethereumโ by Andreas M. Antonopoulos & Gavin Wood |
Learning Objectives
By completing this project, you will:
- Understand the Ethereum Virtual Machine architecture including its stack-based execution model, memory layout, and storage semantics
- Master bytecode interpretation implementing opcode decoding, execution, and control flow handling
- Implement a complete gas metering system understanding why gas exists and how it prevents denial-of-service attacks
- Build persistent smart contract storage with 256-bit key-value semantics matching Ethereumโs SSTORE/SLOAD
- Debug and trace smart contract execution at the opcode level, a critical skill for security auditing and optimization
- Connect execution to consensus understanding how the EVM fits into Ethereumโs state transition function
Deep Theoretical Foundation
What is the Ethereum Virtual Machine?
The Ethereum Virtual Machine (EVM) is the computation engine at the heart of Ethereum. Every smart contract youโve ever interacted withโfrom Uniswap swaps to NFT mints to DAO votesโexecutes as EVM bytecode. Understanding the EVM means understanding how Ethereum actually works.
Unlike Bitcoinโs limited Script language, the EVM is Turing-complete: it can compute anything computable (given enough gas). This makes Ethereum a โworld computerโโa globally shared state machine where anyone can deploy arbitrary programs.
Why Build an EVM?
Building an EVM from scratch teaches you:
- How smart contracts actually execute (not just what Solidity compiles to)
- Why gas exists and how itโs calculated (essential for optimization)
- How storage works (critical for security auditing)
- How to debug at the opcode level (invaluable for complex issues)
- The exact semantics that determine whether a transaction succeeds or reverts
This knowledge separates smart contract developers who guess from those who know.
The EVM Architecture
Stack-Based Execution
The EVM is a stack machine, not a register machine like x86 or ARM. Every operation works with a stack:
Stack-Based (EVM): Register-Based (x86):
PUSH 5 mov eax, 5
PUSH 3 add eax, 3
ADD ; result in eax
; result on stack top
Stack properties:
- Maximum depth: 1024 items
- Each item: 256 bits (32 bytes)
- Most operations pop operands from stack and push results
- No direct access to stack items below the top (except DUP and SWAP)
Why stack-based?
- Simpler verification: Stack machines are easier to analyze for correctness
- Compact bytecode: No need to encode register names
- Deterministic: Same inputs always produce same stack states
Memory Model
The EVM has three distinct data areas:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ EVM DATA AREAS โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ STACK โ โ
โ โ โข 256-bit words, max 1024 items โ โ
โ โ โข LIFO (Last In, First Out) โ โ
โ โ โข Volatile (cleared after execution) โ โ
โ โ โข Zero gas for basic push/pop โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ MEMORY โ โ
โ โ โข Byte-addressable, linear โ โ
โ โ โข Starts at zero, expands as needed โ โ
โ โ โข Volatile (cleared after execution) โ โ
โ โ โข Expansion costs gas (quadratic) โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ STORAGE โ โ
โ โ โข 256-bit key โ 256-bit value mapping โ โ
โ โ โข Persistent (survives transactions) โ โ
โ โ โข Per-contract (isolated between accounts) โ โ
โ โ โข Very expensive (20,000 gas for new slot) โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Stack: Fast, temporary workspace for computation. Think of it as CPU registers.
Memory: Byte-addressable scratch space. Grows as needed during execution but is wiped between calls. Used for:
- Building return data
- Passing parameters to internal functions
- Temporary arrays and structs
Storage: Persistent key-value store. This is the contractโs โhard drive.โ Storage is what makes contracts statefulโyour token balance, your NFT ownership, the current DEX price all live in storage.
Gas System
Gas is the EVMโs execution metering system. Every operation costs gas:
| Operation | Gas Cost | Rationale |
|---|---|---|
| ADD, SUB, XOR | 3 | Simple arithmetic |
| MUL, DIV | 5 | More complex math |
| SHA3 (KECCAK256) | 30 + 6/word | Cryptographic, scales with size |
| SLOAD | 2100 (cold) / 100 (warm) | Disk I/O equivalent |
| SSTORE | 20000 (new) / 5000 (update) | Persistent write |
| CREATE | 32000 | Contract deployment |
Why gas exists:
- Denial of service prevention: Without gas, anyone could run infinite loops
- Resource pricing: Gas aligns incentivesโheavy computation costs more
- Miner/validator compensation: Gas fees pay for execution
- Predictable limits: Gas limits bound maximum execution time
Gas mechanics:
Transaction gas limit: 21000 + execution gas
Block gas limit: ~30 million (Ethereum mainnet)
If execution runs out of gas:
- All state changes revert
- Gas is NOT refunded (still paid to validators)
- Transaction marked as failed
Program Counter and Control Flow
The EVM reads bytecode sequentially from position 0. The program counter (PC) tracks the current instruction.
Linear execution:
PC=0: PUSH1 0x05 โ stack: [5] โ PC=2
PC=2: PUSH1 0x03 โ stack: [5, 3] โ PC=4
PC=4: ADD โ stack: [8] โ PC=5
PC=5: STOP โ halt
Control flow instructions:
JUMP: Set PC to value from stack (must land on JUMPDEST)JUMPI: Conditional jump (if stack[1] != 0, jump to stack[0])JUMPDEST: Valid jump target marker
Critical security detail: All jumps must land on a JUMPDEST opcode. This prevents jumping into the middle of a PUSH instruction (which could be exploited to create unintended code).
Understanding Opcodes
EVM opcodes are single bytes (0x00 to 0xFF). Here are the core categories:
Arithmetic Operations (0x01-0x0B)
| Opcode | Name | Stack | Description |
|---|---|---|---|
| 0x01 | ADD | [a, b] โ [a+b] | Addition modulo 2^256 |
| 0x02 | MUL | [a, b] โ [a*b] | Multiplication modulo 2^256 |
| 0x03 | SUB | [a, b] โ [a-b] | Subtraction modulo 2^256 |
| 0x04 | DIV | [a, b] โ [a/b] | Integer division (0 if b=0) |
| 0x05 | SDIV | [a, b] โ [a/b] | Signed division |
| 0x06 | MOD | [a, b] โ [a%b] | Modulo (0 if b=0) |
| 0x07 | SMOD | [a, b] โ [a%b] | Signed modulo |
| 0x08 | ADDMOD | [a, b, N] โ [(a+b)%N] | Addition then modulo |
| 0x09 | MULMOD | [a, b, N] โ [(a*b)%N] | Multiplication then modulo |
| 0x0A | EXP | [a, b] โ [a^b] | Exponentiation |
| 0x0B | SIGNEXTEND | [b, x] โ [y] | Sign extend x from (b+1) bytes |
Comparison Operations (0x10-0x1D)
| Opcode | Name | Stack | Description |
|---|---|---|---|
| 0x10 | LT | [a, b] โ [1 if a<b else 0] | Less than |
| 0x11 | GT | [a, b] โ [1 if a>b else 0] | Greater than |
| 0x12 | SLT | [a, b] โ [1 if a<b else 0] | Signed less than |
| 0x13 | SGT | [a, b] โ [1 if a>b else 0] | Signed greater than |
| 0x14 | EQ | [a, b] โ [1 if a==b else 0] | Equality |
| 0x15 | ISZERO | [a] โ [1 if a==0 else 0] | Is zero |
Bitwise Operations (0x16-0x1D)
| Opcode | Name | Stack | Description |
|---|---|---|---|
| 0x16 | AND | [a, b] โ [a & b] | Bitwise AND |
| 0x17 | OR | [a, b] โ [a | b] | Bitwise OR |
| 0x18 | XOR | [a, b] โ [a ^ b] | Bitwise XOR |
| 0x19 | NOT | [a] โ [~a] | Bitwise NOT |
| 0x1A | BYTE | [i, x] โ [byte] | Get byte i of x (0 = most significant) |
| 0x1B | SHL | [shift, val] โ [val ยซย shift] | Shift left |
| 0x1C | SHR | [shift, val] โ [valย ยป shift] | Logical shift right |
| 0x1D | SAR | [shift, val] โ [valย ยป shift] | Arithmetic shift right |
Stack Operations (0x50-0x5B, 0x80-0x8F, 0x90-0x9F)
| Opcode | Name | Description |
|---|---|---|
| 0x50 | POP | Remove top stack item |
| 0x51 | MLOAD | Load 32 bytes from memory |
| 0x52 | MSTORE | Store 32 bytes to memory |
| 0x53 | MSTORE8 | Store 1 byte to memory |
| 0x54 | SLOAD | Load from storage |
| 0x55 | SSTORE | Store to storage |
| 0x56 | JUMP | Jump to address |
| 0x57 | JUMPI | Conditional jump |
| 0x58 | PC | Get program counter |
| 0x59 | MSIZE | Get memory size |
| 0x5A | GAS | Get remaining gas |
| 0x5B | JUMPDEST | Mark valid jump destination |
| 0x60-0x7F | PUSH1-PUSH32 | Push 1-32 bytes onto stack |
| 0x80-0x8F | DUP1-DUP16 | Duplicate stack item |
| 0x90-0x9F | SWAP1-SWAP16 | Swap stack items |
Memory Expansion and Gas
Memory starts empty and expands as you access higher addresses:
Memory expansion cost = 3 * words + words^2 / 512
where words = ceil(highest_accessed_byte / 32)
Example:
- Access byte 0: 0 words โ 0 cost (first 32 bytes are free-ish)
- Access byte 1000: 32 words โ 3*32 + 32^2/512 = 96 + 2 = 98 gas
- Access byte 10000: 313 words โ 3*313 + 313^2/512 = 939 + 191 = 1130 gas
The quadratic term prevents unbounded memory expansion attacks.
Storage: The Persistent State
Storage is a mapping from 256-bit keys to 256-bit values. Each contract has its own isolated storage.
Contract Storage:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ slot 0: 0x000...000 โ 0x000...100 (maybe: token supply) โ
โ slot 1: 0x000...001 โ 0x123...456 (maybe: owner address) โ
โ ... โ
โ slot N: mapping hash โ value (dynamic data) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Solidity storage layout:
contract Example {
uint256 public totalSupply; // slot 0
address public owner; // slot 1
mapping(address => uint256) public balances; // slot 2 (base)
// For mappings, actual slot = keccak256(key . base_slot)
// balances[0xABC] is at keccak256(abi.encode(0xABC, 2))
}
Storage gas costs (EIP-2200):
- SLOAD cold: 2100 gas (first access in transaction)
- SLOAD warm: 100 gas (already accessed)
- SSTORE from 0 to non-0: 20000 gas
- SSTORE from non-0 to non-0: 5000 gas
- SSTORE from non-0 to 0: 5000 gas - 15000 refund
The State Transition Function
The EVM is the core of Ethereumโs state transition function:
ฯ' = ฮฅ(ฯ, T)
where:
ฯ = current world state (all account balances, storage, code)
T = transaction
ฮฅ = state transition function (includes EVM)
ฯ' = new world state
Every block applies transactions sequentially, each running the EVM to compute state changes:
Block Processing:
โโโโโโโโโโโโโโโ
โ State ฯโ โ โโโบ Transaction 1 โโโบ ฯโ โโโบ Transaction 2 โโโบ ฯโ โโโบ ... โโโบ ฯโ
โโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโ
โ State ฯโ โ (new state root in block header)
โโโโโโโโโโโโโโโ
Complete Project Specification
Functional Requirements
- Bytecode Execution
- Load and execute arbitrary EVM bytecode
- Support all PUSH1-PUSH32 instructions
- Implement arithmetic operations (ADD, MUL, SUB, DIV, MOD, EXP)
- Implement comparison operations (LT, GT, EQ, ISZERO)
- Implement bitwise operations (AND, OR, XOR, NOT, SHL, SHR)
- Stack Management
- 256-bit stack items
- Maximum depth of 1024
- DUP1-DUP16 operations
- SWAP1-SWAP16 operations
- Proper stack underflow/overflow errors
- Memory Operations
- Dynamic memory expansion
- MLOAD, MSTORE, MSTORE8 operations
- Proper gas calculation for expansion
- MSIZE operation
- Storage Operations
- 256-bit key to 256-bit value mapping
- SLOAD and SSTORE operations
- Persistent state between calls
- Gas cost differentiation (cold/warm access)
- Control Flow
- JUMP and JUMPI instructions
- JUMPDEST validation
- Proper PC tracking
- STOP, RETURN, and REVERT
- Gas Metering
- Accurate gas tracking per opcode
- Out-of-gas detection and handling
- Gas refunds for SSTORE (zero-out)
- Report remaining gas
- Execution Environment
- Call data (CALLDATALOAD, CALLDATASIZE, CALLDATACOPY)
- Return data handling
- Execution trace output
Command-Line Interface
# Execute bytecode from hex string
$ evm run --bytecode "6005600301" --gas 100000
Stack: [0x08]
Gas used: 9
Gas remaining: 99991
# Execute bytecode from file
$ evm run --file contract.bin --gas 100000 --trace
PC=0000: PUSH1 0x05 stack=[5] gas=99997
PC=0002: PUSH1 0x03 stack=[5, 3] gas=99994
PC=0004: ADD stack=[8] gas=99991
PC=0005: STOP stack=[8] gas=99991
# Execute with call data
$ evm run --bytecode "600035" --calldata "0x1234567890" --gas 100000
Stack: [0x1234567890000000000000000000000000000000000000000000000000000000]
# Disassemble bytecode
$ evm disasm --bytecode "6005600301"
0000: PUSH1 0x05
0002: PUSH1 0x03
0004: ADD
0005: STOP
# Show storage state
$ evm run --bytecode "60016000555460005260206000f3" --gas 100000 --show-storage
Storage:
0x00: 0x01
Return data: 0x0000000000000000000000000000000000000000000000000000000000000001
Solution Architecture
Module Structure
src/
โโโ main.rs # CLI entry point
โโโ lib.rs # Public API
โโโ evm/
โ โโโ mod.rs # EVM coordinator
โ โโโ interpreter.rs # Main execution loop
โ โโโ opcodes.rs # Opcode definitions and dispatch
โ โโโ gas.rs # Gas cost calculations
โโโ state/
โ โโโ mod.rs # State management
โ โโโ stack.rs # 256-bit stack implementation
โ โโโ memory.rs # Byte-addressable memory
โ โโโ storage.rs # Persistent key-value storage
โโโ types/
โ โโโ mod.rs # Core types
โ โโโ u256.rs # 256-bit unsigned integer
โ โโโ address.rs # Ethereum address type
โโโ context/
โ โโโ mod.rs # Execution context
โ โโโ call_context.rs # Call frame information
โ โโโ block_context.rs # Block information
โโโ trace/
โ โโโ mod.rs # Execution tracing
โ โโโ step.rs # Single step representation
โ โโโ formatter.rs # Trace output formatting
โโโ tests/
โโโ arithmetic.rs # Arithmetic opcode tests
โโโ stack.rs # Stack operation tests
โโโ memory.rs # Memory operation tests
โโโ storage.rs # Storage tests
โโโ control_flow.rs # Jump tests
โโโ integration.rs # Full contract tests
Core Data Structures
use primitive_types::U256; // Or implement your own
/// Maximum stack depth
const MAX_STACK_DEPTH: usize = 1024;
/// The EVM stack - 256-bit items, max 1024 depth
pub struct Stack {
data: Vec<U256>,
}
impl Stack {
pub fn new() -> Self {
Self { data: Vec::with_capacity(16) }
}
pub fn push(&mut self, value: U256) -> Result<(), EvmError> {
if self.data.len() >= MAX_STACK_DEPTH {
return Err(EvmError::StackOverflow);
}
self.data.push(value);
Ok(())
}
pub fn pop(&mut self) -> Result<U256, EvmError> {
self.data.pop().ok_or(EvmError::StackUnderflow)
}
pub fn peek(&self, depth: usize) -> Result<U256, EvmError> {
let idx = self.data.len().checked_sub(depth + 1)
.ok_or(EvmError::StackUnderflow)?;
Ok(self.data[idx])
}
pub fn swap(&mut self, depth: usize) -> Result<(), EvmError> {
let len = self.data.len();
if depth >= len {
return Err(EvmError::StackUnderflow);
}
self.data.swap(len - 1, len - 1 - depth);
Ok(())
}
pub fn dup(&mut self, depth: usize) -> Result<(), EvmError> {
let value = self.peek(depth - 1)?;
self.push(value)
}
}
/// EVM memory - byte addressable, dynamically expanding
pub struct Memory {
data: Vec<u8>,
}
impl Memory {
pub fn new() -> Self {
Self { data: Vec::new() }
}
/// Expand memory to fit the given offset + size
fn expand(&mut self, offset: usize, size: usize) {
let required = offset.saturating_add(size);
if required > self.data.len() {
// Expand to next 32-byte boundary
let new_size = (required + 31) / 32 * 32;
self.data.resize(new_size, 0);
}
}
/// Calculate gas cost for memory expansion
pub fn expansion_cost(&self, offset: usize, size: usize) -> u64 {
if size == 0 {
return 0;
}
let new_words = (offset + size + 31) / 32;
let current_words = self.data.len() / 32;
if new_words <= current_words {
return 0;
}
let new_cost = 3 * new_words as u64 + (new_words as u64).pow(2) / 512;
let current_cost = 3 * current_words as u64 + (current_words as u64).pow(2) / 512;
new_cost - current_cost
}
pub fn load(&mut self, offset: usize) -> U256 {
self.expand(offset, 32);
let bytes: [u8; 32] = self.data[offset..offset + 32]
.try_into()
.unwrap();
U256::from_big_endian(&bytes)
}
pub fn store(&mut self, offset: usize, value: U256) {
self.expand(offset, 32);
let mut bytes = [0u8; 32];
value.to_big_endian(&mut bytes);
self.data[offset..offset + 32].copy_from_slice(&bytes);
}
pub fn store8(&mut self, offset: usize, value: u8) {
self.expand(offset, 1);
self.data[offset] = value;
}
pub fn size(&self) -> usize {
self.data.len()
}
}
/// Contract storage - 256-bit key to 256-bit value
pub struct Storage {
data: HashMap<U256, U256>,
original: HashMap<U256, U256>, // For gas refund calculation
warm_slots: HashSet<U256>, // For EIP-2929 warm/cold access
}
impl Storage {
pub fn new() -> Self {
Self {
data: HashMap::new(),
original: HashMap::new(),
warm_slots: HashSet::new(),
}
}
pub fn load(&mut self, key: U256) -> (U256, bool) {
let is_cold = !self.warm_slots.contains(&key);
self.warm_slots.insert(key);
let value = self.data.get(&key).copied().unwrap_or(U256::zero());
(value, is_cold)
}
pub fn store(&mut self, key: U256, value: U256) -> StoreCost {
let current = self.data.get(&key).copied().unwrap_or(U256::zero());
let original = self.original.get(&key).copied().unwrap_or(U256::zero());
// Track original value for refund calculations
if !self.original.contains_key(&key) {
self.original.insert(key, current);
}
self.data.insert(key, value);
self.warm_slots.insert(key);
StoreCost::calculate(original, current, value)
}
}
/// Execution context for a single call
pub struct ExecutionContext {
pub code: Vec<u8>,
pub pc: usize,
pub stack: Stack,
pub memory: Memory,
pub storage: Storage,
pub gas_remaining: u64,
pub gas_refund: i64,
pub calldata: Vec<u8>,
pub return_data: Vec<u8>,
pub stopped: bool,
pub reverted: bool,
}
/// Opcode enumeration
#[derive(Debug, Clone, Copy)]
pub enum Opcode {
Stop = 0x00,
Add = 0x01,
Mul = 0x02,
Sub = 0x03,
Div = 0x04,
SDiv = 0x05,
Mod = 0x06,
SMod = 0x07,
AddMod = 0x08,
MulMod = 0x09,
Exp = 0x0a,
SignExtend = 0x0b,
Lt = 0x10,
Gt = 0x11,
SLt = 0x12,
SGt = 0x13,
Eq = 0x14,
IsZero = 0x15,
And = 0x16,
Or = 0x17,
Xor = 0x18,
Not = 0x19,
Byte = 0x1a,
Shl = 0x1b,
Shr = 0x1c,
Sar = 0x1d,
Keccak256 = 0x20,
Pop = 0x50,
MLoad = 0x51,
MStore = 0x52,
MStore8 = 0x53,
SLoad = 0x54,
SStore = 0x55,
Jump = 0x56,
JumpI = 0x57,
Pc = 0x58,
MSize = 0x59,
Gas = 0x5a,
JumpDest = 0x5b,
Push1 = 0x60,
// ... PUSH2-PUSH32
Push32 = 0x7f,
Dup1 = 0x80,
// ... DUP2-DUP16
Dup16 = 0x8f,
Swap1 = 0x90,
// ... SWAP2-SWAP16
Swap16 = 0x9f,
Return = 0xf3,
Revert = 0xfd,
Invalid = 0xfe,
}
/// Error types
#[derive(Debug, Clone)]
pub enum EvmError {
StackOverflow,
StackUnderflow,
OutOfGas,
InvalidJump,
InvalidOpcode(u8),
WriteProtection,
OutOfBounds,
}
/// Execution result
pub struct ExecutionResult {
pub success: bool,
pub return_data: Vec<u8>,
pub gas_used: u64,
pub gas_refund: i64,
pub logs: Vec<Log>,
}
Key Algorithms
Main Execution Loop
function execute(context: ExecutionContext) -> ExecutionResult:
// Pre-compute valid JUMPDEST locations
jumpdests = analyze_jumpdests(context.code)
while context.pc < context.code.len() and not context.stopped:
opcode = context.code[context.pc]
gas_cost = calculate_gas_cost(opcode, context)
if context.gas_remaining < gas_cost:
return Error(OutOfGas)
context.gas_remaining -= gas_cost
result = execute_opcode(opcode, context, jumpdests)
if result.is_error():
return result
// PUSH instructions advance PC by immediate size + 1
// JUMP/JUMPI set PC directly
// Other instructions advance PC by 1
return ExecutionResult {
success: not context.reverted,
return_data: context.return_data,
gas_used: initial_gas - context.gas_remaining,
gas_refund: context.gas_refund,
}
JUMPDEST Analysis
function analyze_jumpdests(code: bytes) -> Set<usize>:
jumpdests = {}
pc = 0
while pc < code.len():
opcode = code[pc]
if opcode == JUMPDEST (0x5B):
jumpdests.add(pc)
pc += 1
else if opcode >= PUSH1 and opcode <= PUSH32:
// Skip over the immediate data
push_size = opcode - PUSH1 + 1
pc += 1 + push_size
else:
pc += 1
return jumpdests
Gas Calculation
function calculate_gas_cost(opcode: u8, context: ExecutionContext) -> u64:
base_cost = match opcode:
STOP, RETURN, REVERT => 0
ADD, SUB, NOT, LT, GT, ... => 3
MUL, DIV, MOD, ... => 5
EXP => 10 + 50 * (byte_size_of_exponent)
SHA3 => 30 + 6 * (word_count)
SLOAD => if cold 2100 else 100
SSTORE => compute_sstore_cost(...)
MLOAD, MSTORE, MSTORE8 => 3
JUMP => 8
JUMPI => 10
...
// Add memory expansion cost if applicable
if opcode in [MLOAD, MSTORE, MSTORE8, SHA3, ...]:
memory_cost = context.memory.expansion_cost(offset, size)
base_cost += memory_cost
return base_cost
Signed Integer Handling
function to_signed(x: U256) -> I256:
// 256-bit two's complement
if x >= 2^255:
return x - 2^256
else:
return x
function from_signed(x: I256) -> U256:
if x < 0:
return x + 2^256
else:
return x
function sdiv(a: U256, b: U256) -> U256:
if b == 0:
return 0
a_signed = to_signed(a)
b_signed = to_signed(b)
// Special case: -2^255 / -1 = -2^255 (overflow)
if a_signed == -2^255 and b_signed == -1:
return a // -2^255 as U256
result = a_signed / b_signed
return from_signed(result)
Phased Implementation Guide
Phase 1: Core Types and Stack
Goal: Implement 256-bit integers and the stack machine.
Tasks:
- Choose or implement a U256 type (recommend
primitive-typescrate for learning) - Implement the Stack struct with push, pop, peek
- Implement DUP1-DUP16 operations
- Implement SWAP1-SWAP16 operations
- Add proper overflow/underflow error handling
Validation:
let mut stack = Stack::new();
stack.push(U256::from(5))?;
stack.push(U256::from(3))?;
stack.dup(1)?; // Duplicate second item
assert_eq!(stack.pop()?, U256::from(5));
assert_eq!(stack.pop()?, U256::from(3));
assert_eq!(stack.pop()?, U256::from(5));
Hints if stuck:
- Use
Vec<U256>for the stack data - Stack grows upward: index 0 is bottom, last index is top
- DUP1 duplicates top item (depth 0), DUP2 duplicates second item (depth 1)
- SWAP1 swaps top with second, SWAP2 swaps top with third
Phase 2: Arithmetic Opcodes
Goal: Implement basic arithmetic operations.
Tasks:
- Create opcode dispatcher structure
- Implement ADD, SUB, MUL, DIV, MOD
- Implement SDIV, SMOD (signed variants)
- Implement ADDMOD, MULMOD
- Implement EXP (with proper gas calculation)
Validation:
// ADD: 5 + 3 = 8
let result = execute_bytecode("6005600301")?; // PUSH1 5, PUSH1 3, ADD
assert_eq!(result.stack[0], U256::from(8));
// SUB: 5 - 3 = 2
let result = execute_bytecode("6003600503")?; // PUSH1 3, PUSH1 5, SUB
assert_eq!(result.stack[0], U256::from(2));
// MUL: 5 * 3 = 15
let result = execute_bytecode("6005600302")?;
assert_eq!(result.stack[0], U256::from(15));
// DIV by zero = 0
let result = execute_bytecode("6000600304")?; // PUSH1 0, PUSH1 3, DIV
assert_eq!(result.stack[0], U256::from(0));
Hints if stuck:
- All arithmetic is modulo 2^256 (use wrapping operations)
- Division by zero returns 0 (not an error in EVM)
- For signed operations, interpret as twoโs complement
- Stack order matters:
a b OPmeansa OP bwhereais deeper
Phase 3: Comparison and Bitwise
Goal: Implement comparison and bitwise operations.
Tasks:
- Implement LT, GT, SLT, SGT, EQ, ISZERO
- Implement AND, OR, XOR, NOT
- Implement BYTE (extract single byte)
- Implement SHL, SHR, SAR (shifts)
Validation:
// LT: 3 < 5 = 1
let result = execute_bytecode("6005600310")?; // PUSH1 5, PUSH1 3, LT
assert_eq!(result.stack[0], U256::from(1));
// ISZERO: iszero(0) = 1
let result = execute_bytecode("600015")?;
assert_eq!(result.stack[0], U256::from(1));
// SHL: 1 << 4 = 16
let result = execute_bytecode("600160041b")?;
assert_eq!(result.stack[0], U256::from(16));
Hints if stuck:
- Comparison returns 1 (true) or 0 (false)
- BYTE: index 0 is the most significant byte
- SHL/SHR shift amounts >= 256 result in 0
- SAR preserves the sign bit (arithmetic shift)
Phase 4: Memory Operations
Goal: Implement the memory subsystem.
Tasks:
- Implement Memory struct with dynamic expansion
- Implement MLOAD (32-byte load)
- Implement MSTORE (32-byte store)
- Implement MSTORE8 (single byte store)
- Implement MSIZE (get memory size)
- Calculate memory expansion gas correctly
Validation:
// Store and load
// PUSH1 0x42, PUSH1 0x00, MSTORE, PUSH1 0x00, MLOAD
let result = execute_bytecode("6042600052600051")?;
assert_eq!(result.stack[0], U256::from(0x42));
// MSIZE after storing at offset 0
let result = execute_bytecode("604260005259")?;
assert_eq!(result.stack[0], U256::from(32)); // Expanded to 32 bytes
Hints if stuck:
- Memory is byte-addressable but MLOAD/MSTORE work with 32-byte words
- Memory always expands to 32-byte boundaries
- MSTORE is big-endian: valueโs MSB goes to lowest address
- Track memory expansion for gas calculation
Phase 5: Storage Operations
Goal: Implement persistent storage.
Tasks:
- Implement Storage struct with HashMap
- Implement SLOAD (read from storage)
- Implement SSTORE (write to storage)
- Implement warm/cold access tracking (EIP-2929)
- Calculate storage gas costs correctly
- Track gas refunds for clearing storage
Validation:
// Store 1 at slot 0, then load it back
// PUSH1 0x01, PUSH1 0x00, SSTORE, PUSH1 0x00, SLOAD
let result = execute_bytecode("6001600055600054")?;
assert_eq!(result.stack[0], U256::from(1));
// Storage persists after execution
assert_eq!(result.storage.get(&U256::zero()), Some(&U256::from(1)));
Hints if stuck:
- Non-existent slots return 0
- First access (cold) costs more than subsequent (warm)
- Clearing a slot (setting to 0) may give gas refund
- Storage is per-contract (isolated between addresses)
Phase 6: Control Flow
Goal: Implement jumps and control flow.
Tasks:
- Pre-analyze code to find valid JUMPDEST locations
- Implement JUMP (unconditional)
- Implement JUMPI (conditional)
- Implement JUMPDEST (no-op, just marks valid target)
- Implement STOP, RETURN, REVERT
- Implement PC (get program counter)
Validation:
// Simple loop: push 3, jump to JUMPDEST at 0x05, increment
// 0x00: PUSH1 3
// 0x02: PUSH1 5
// 0x04: JUMP
// 0x05: JUMPDEST
// 0x06: PUSH1 1
// 0x08: ADD
// 0x09: STOP
let code = "60036005565b60010100";
let result = execute_bytecode(code)?;
assert_eq!(result.stack[0], U256::from(4)); // 3 + 1
// Invalid jump should fail
let code = "6000600456"; // Jump to 0x04 (not a JUMPDEST)
let result = execute_bytecode(code);
assert!(result.is_err());
Hints if stuck:
- Must pre-scan code to find all JUMPDEST locations
- Cannot jump into PUSH data (even if byte value is 0x5B)
- JUMPI only jumps if condition is non-zero
- RETURN and REVERT halt execution with return data
Phase 7: Call Data Operations
Goal: Handle input data to the EVM.
Tasks:
- Add calldata field to execution context
- Implement CALLDATALOAD (read 32 bytes from calldata)
- Implement CALLDATASIZE (get calldata length)
- Implement CALLDATACOPY (copy calldata to memory)
Validation:
// CALLDATALOAD at offset 0
let calldata = hex::decode("1234567890abcdef")?;
let result = execute_with_calldata("600035", calldata)?; // PUSH1 0, CALLDATALOAD
// Result should be 0x1234567890abcdef padded to 32 bytes
Hints if stuck:
- CALLDATALOAD pads with zeros if reading past end
- CALLDATACOPY reads (offset, size) from stack
- Function selectors are first 4 bytes of calldata
Phase 8: Gas Metering
Goal: Accurate gas accounting.
Tasks:
- Define gas costs for all opcodes
- Track gas consumption during execution
- Handle out-of-gas errors (revert all changes)
- Implement dynamic gas costs (EXP, SHA3, memory)
- Track gas refunds (SSTORE cleanup)
- Implement GAS opcode (remaining gas)
Validation:
// ADD costs 3 gas
let result = execute_with_gas("6001600101", 10)?;
assert_eq!(result.gas_used, 9); // 2x PUSH1 (3 each) + ADD (3)
// Out of gas
let result = execute_with_gas("6001600101", 5);
assert!(matches!(result, Err(EvmError::OutOfGas)));
Hints if stuck:
- Check gas BEFORE executing each opcode
- Memory expansion gas is calculated incrementally
- EXP gas depends on byte size of exponent
- Gas refunds are capped at 20% of total gas used (post-London)
Phase 9: Execution Tracing
Goal: Debug-quality execution traces.
Tasks:
- Create trace data structure
- Capture state after each opcode
- Format output like standard EVM debuggers
- Add verbose CLI flag
Validation:
$ evm run --bytecode "6005600301" --trace
PC=0000: PUSH1 0x05 stack=[0x05] gas=99997
PC=0002: PUSH1 0x03 stack=[0x05, 0x03] gas=99994
PC=0004: ADD stack=[0x08] gas=99991
PC=0005: STOP stack=[0x08] gas=99991
Hints if stuck:
- Capture state AFTER each instruction
- Include PC, opcode name, stack state, memory size, gas
- Consider JSON output for tooling integration
Testing Strategy
Unit Tests
#[test]
fn test_stack_push_pop() {
let mut stack = Stack::new();
stack.push(U256::from(42)).unwrap();
assert_eq!(stack.pop().unwrap(), U256::from(42));
}
#[test]
fn test_stack_overflow() {
let mut stack = Stack::new();
for i in 0..1024 {
stack.push(U256::from(i)).unwrap();
}
assert!(stack.push(U256::from(0)).is_err());
}
#[test]
fn test_memory_expansion() {
let mut mem = Memory::new();
mem.store(0, U256::from(42));
assert_eq!(mem.size(), 32);
mem.store(100, U256::from(42));
assert_eq!(mem.size(), 160); // Rounds up to 32-byte boundary
}
#[test]
fn test_signed_division() {
// -4 / 2 = -2
let neg_four = U256::MAX - U256::from(3); // Two's complement of -4
let result = sdiv(neg_four, U256::from(2));
let neg_two = U256::MAX - U256::from(1); // Two's complement of -2
assert_eq!(result, neg_two);
}
Opcode Tests
#[test]
fn test_add_basic() {
let result = execute("6005600301").unwrap();
assert_eq!(result.stack[0], U256::from(8));
}
#[test]
fn test_add_overflow() {
// MAX + 1 = 0 (wraps around)
let code = format!(
"7f{}600101", // PUSH32 MAX, PUSH1 1, ADD
"ff".repeat(32)
);
let result = execute(&code).unwrap();
assert_eq!(result.stack[0], U256::zero());
}
#[test]
fn test_jump_valid() {
// PUSH1 4, JUMP, INVALID, JUMPDEST, STOP
let result = execute("6004565bfe5b00").unwrap();
assert!(result.success);
}
#[test]
fn test_jump_invalid() {
// PUSH1 3, JUMP (3 is not a JUMPDEST)
let result = execute("60035600");
assert!(matches!(result.err(), Some(EvmError::InvalidJump)));
}
Integration Tests
#[test]
fn test_simple_storage_contract() {
// Contract: store input at slot 0, return storage[0]
// CALLDATALOAD 0, SSTORE 0, SLOAD 0, MSTORE 0, RETURN 32
let code = "600035600055600054600052602060000f3";
let calldata = vec![0u8; 31].into_iter().chain([0x42]).collect();
let result = execute_with_calldata(code, calldata).unwrap();
assert_eq!(result.return_data, [0u8; 31].into_iter().chain([0x42]).collect::<Vec<_>>());
}
#[test]
fn test_fibonacci() {
// Compute 10th Fibonacci number
// This tests loops, conditionals, stack manipulation
let code = include_str!("fixtures/fibonacci.hex");
let result = execute(code).unwrap();
assert_eq!(result.stack[0], U256::from(55));
}
Gas Tests
#[test]
fn test_gas_simple_operations() {
// 3 PUSH1 + ADD = 3 + 3 + 3 = 9 gas
let result = execute_with_gas("6001600201", 100).unwrap();
assert_eq!(result.gas_used, 9);
}
#[test]
fn test_gas_memory_expansion() {
// MSTORE at offset 1000 should cost base + expansion
let result = execute_with_gas("60426103e852", 10000).unwrap();
let expected_expansion = 3 * 32 + 32 * 32 / 512; // words = ceil(1032/32) = 33
assert!(result.gas_used > 3 + 3); // More than just PUSH costs
}
#[test]
fn test_out_of_gas() {
let result = execute_with_gas("6001600201", 5);
assert!(matches!(result.err(), Some(EvmError::OutOfGas)));
}
Common Pitfalls & Debugging
Pitfall 1: Stack Order Confusion
Problem: EVM stack order is counterintuitive for binary operations.
Symptom: SUB and DIV produce wrong results.
Solution:
// For "a - b", a is pushed first, then b
// Stack: [a, b] (b on top)
// SUB pops b, pops a, pushes a-b
fn sub(stack: &mut Stack) -> Result<(), EvmError> {
let b = stack.pop()?; // b is on top
let a = stack.pop()?; // a is below
stack.push(a.wrapping_sub(b))
}
Mental model: Think โthe operation is performed on stack items in order they appear in documentation.โ
Pitfall 2: PUSH Immediate Parsing
Problem: PUSH instructions have variable-length immediates.
Symptom: Code after PUSH executes incorrectly.
Solution:
fn execute_push(ctx: &mut Context, n: usize) -> Result<(), EvmError> {
// n is 1-32 for PUSH1-PUSH32
let mut bytes = [0u8; 32];
// Read n bytes from code, right-aligned in 32-byte array
let start = 32 - n;
for i in 0..n {
let pc = ctx.pc + 1 + i;
if pc < ctx.code.len() {
bytes[start + i] = ctx.code[pc];
}
}
ctx.stack.push(U256::from_big_endian(&bytes))?;
ctx.pc += n + 1; // Skip opcode + n bytes
Ok(())
}
Pitfall 3: JUMPDEST Validation
Problem: Jumping into PUSH data that happens to be 0x5B.
Symptom: Arbitrary code execution or wrong jumps.
Solution:
fn analyze_jumpdests(code: &[u8]) -> HashSet<usize> {
let mut valid = HashSet::new();
let mut pc = 0;
while pc < code.len() {
let opcode = code[pc];
if opcode == 0x5B { // JUMPDEST
valid.insert(pc);
pc += 1;
} else if opcode >= 0x60 && opcode <= 0x7F { // PUSH1-PUSH32
let push_size = (opcode - 0x60 + 1) as usize;
pc += 1 + push_size; // Skip over immediate data
} else {
pc += 1;
}
}
valid
}
fn jump(ctx: &mut Context, jumpdests: &HashSet<usize>) -> Result<(), EvmError> {
let dest = ctx.stack.pop()?.as_usize();
if !jumpdests.contains(&dest) {
return Err(EvmError::InvalidJump);
}
ctx.pc = dest;
Ok(())
}
Pitfall 4: Memory Word Alignment
Problem: EVM memory is byte-addressable but MLOAD/MSTORE work with 32-byte words.
Symptom: Values stored incorrectly or shifted.
Solution:
fn mstore(memory: &mut Memory, offset: usize, value: U256) {
let mut bytes = [0u8; 32];
value.to_big_endian(&mut bytes); // Big-endian!
memory.expand(offset, 32);
memory.data[offset..offset + 32].copy_from_slice(&bytes);
}
fn mload(memory: &mut Memory, offset: usize) -> U256 {
memory.expand(offset, 32);
let bytes: [u8; 32] = memory.data[offset..offset + 32]
.try_into()
.unwrap();
U256::from_big_endian(&bytes) // Big-endian!
}
Key insight: EVM uses big-endian byte order for memory and storage.
Pitfall 5: Gas Refund Complexity
Problem: SSTORE gas refunds have complex rules.
Symptom: Gas calculations donโt match other implementations.
Solution: Follow EIP-2200 precisely:
fn sstore_gas(original: U256, current: U256, new: U256) -> (u64, i64) {
// Returns (gas_cost, gas_refund)
if current == new {
// No-op: only warm access cost
return (100, 0);
}
if original == current {
if original.is_zero() {
// 0 -> non-zero: new storage
return (20000, 0);
}
if new.is_zero() {
// non-zero -> 0: clearing
return (5000, 15000);
}
// non-zero -> different non-zero
return (5000, 0);
}
// Current != original (already modified in this transaction)
let mut refund = 0i64;
if !original.is_zero() {
if current.is_zero() {
// Undo previous clearing
refund -= 15000;
} else if new.is_zero() {
// Clear dirty slot
refund += 15000;
}
}
if original == new {
// Reset to original value
if original.is_zero() {
refund += 20000 - 100; // Refund the 0->non-zero cost
} else {
refund += 5000 - 100;
}
}
(100, refund) // Warm access cost
}
Extensions and Challenges
Challenge 1: LOG Operations
Implement LOG0-LOG4 for event emission. Store logs with topics and data.
struct Log {
address: Address,
topics: Vec<U256>, // 0-4 topics
data: Vec<u8>,
}
fn log_n(ctx: &mut Context, n: usize) -> Result<(), EvmError> {
let offset = ctx.stack.pop()?.as_usize();
let size = ctx.stack.pop()?.as_usize();
let mut topics = Vec::with_capacity(n);
for _ in 0..n {
topics.push(ctx.stack.pop()?);
}
let data = ctx.memory.read_slice(offset, size);
ctx.logs.push(Log {
address: ctx.address,
topics,
data,
});
Ok(())
}
Challenge 2: CREATE and CREATE2
Implement contract deployment:
fn create(ctx: &mut Context) -> Result<(), EvmError> {
let value = ctx.stack.pop()?;
let offset = ctx.stack.pop()?.as_usize();
let size = ctx.stack.pop()?.as_usize();
let init_code = ctx.memory.read_slice(offset, size);
// New address = keccak256(rlp([sender, nonce]))[12..]
let new_address = compute_create_address(ctx.address, ctx.nonce);
// Execute init_code to get runtime code
let runtime_code = execute_init_code(init_code, value)?;
// Store the contract
ctx.world_state.set_code(new_address, runtime_code);
ctx.stack.push(new_address.into())
}
Challenge 3: CALL, STATICCALL, DELEGATECALL
Implement external calls with proper context handling:
fn call(ctx: &mut Context) -> Result<(), EvmError> {
let gas = ctx.stack.pop()?;
let to = ctx.stack.pop()?;
let value = ctx.stack.pop()?;
let args_offset = ctx.stack.pop()?.as_usize();
let args_size = ctx.stack.pop()?.as_usize();
let ret_offset = ctx.stack.pop()?.as_usize();
let ret_size = ctx.stack.pop()?.as_usize();
let calldata = ctx.memory.read_slice(args_offset, args_size);
// Create new execution context
let mut call_ctx = ExecutionContext {
code: ctx.world_state.get_code(to.into()),
storage: ctx.world_state.get_storage(to.into()),
calldata,
value,
caller: ctx.address,
// ...
};
let result = execute(&mut call_ctx)?;
// Copy return data to memory
ctx.memory.write_slice(ret_offset, &result.return_data[..ret_size.min(result.return_data.len())]);
ctx.stack.push(if result.success { U256::one() } else { U256::zero() })
}
Challenge 4: Precompiled Contracts
Implement Ethereum precompiles:
- 0x01: ECRECOVER (signature recovery)
- 0x02: SHA256
- 0x03: RIPEMD160
- 0x04: Identity (data copy)
- 0x05: MODEXP (modular exponentiation)
- 0x06-0x08: Elliptic curve operations
Challenge 5: EVM Fuzzer
Build a fuzzer that generates random bytecode and compares execution results against a reference implementation (like revm or geth):
fn fuzz_evm() {
let mut rng = rand::thread_rng();
loop {
let bytecode = generate_random_bytecode(&mut rng);
let our_result = our_evm::execute(&bytecode, GAS_LIMIT);
let reference_result = revm::execute(&bytecode, GAS_LIMIT);
assert_eq!(our_result.stack, reference_result.stack);
assert_eq!(our_result.memory, reference_result.memory);
assert_eq!(our_result.gas_used, reference_result.gas_used);
}
}
Real-World Connections
Ethereum Clients
Production Ethereum clients (Geth, Nethermind, Reth, Erigon) all implement the EVM. Your implementation follows the same specification:
- go-ethereum (Geth): The reference implementation in Go
- Nethermind: .NET implementation with focus on performance
- Reth: New Rust implementation (excellent code to study)
- Erigon: Fork of Geth optimized for archive nodes
Smart Contract Debugging
Tools like Tenderly, Foundryโs debugger, and Hardhatโs console use EVM tracing:
# Foundry trace example
$ forge test -vvvv
[PASS] testTransfer()
[5432] Token::transfer(0x123..., 100)
โโ [2100] SLOAD slot 0x00...01
โโ [5000] SSTORE slot 0x00...01
โโ emit Transfer(from: 0xabc, to: 0x123, value: 100)
Understanding EVM execution lets you debug at this level.
Security Auditing
Auditors trace execution to find vulnerabilities:
// Vulnerable to reentrancy
function withdraw() external {
uint bal = balances[msg.sender];
(bool success,) = msg.sender.call{value: bal}(""); // EVM: CALL
balances[msg.sender] = 0; // EVM: SSTORE
}
The EVM executes CALL before SSTORE. If the recipient is a contract, it can call back before the balance is zeroed.
Gas Optimization
Solidity developers optimize gas by understanding EVM costs:
// Expensive: many SLOADs
for (uint i = 0; i < array.length; i++) { // SLOAD every iteration
total += array[i];
}
// Cheaper: cache length
uint len = array.length; // Single SLOAD
for (uint i = 0; i < len; i++) {
total += array[i];
}
Layer 2 Execution
Optimistic rollups (Arbitrum, Optimism) and zkEVMs (zkSync, Scroll) execute EVM bytecode off-chain:
- Optimistic: Run EVM and post results; fraud proofs re-execute disputed transactions
- zkEVM: Generate zero-knowledge proofs of correct EVM execution
Understanding the EVM is prerequisite to understanding L2 scaling.
Resources
Primary References
- Ethereum Yellow Paper: ethereum.github.io/yellowpaper - Formal EVM specification
- evm.codes: evm.codes - Interactive opcode reference
- โMastering Ethereumโ Chapter 13: EVM deep dive
Code References
- revm: github.com/bluealloy/revm - Fast Rust EVM (study this!)
- py-evm: github.com/ethereum/py-evm - Readable Python implementation
- evmone: github.com/ethereum/evmone - Optimized C++ implementation
Test Suites
- Ethereum Tests: github.com/ethereum/tests - Official test vectors
- EVM Test: Use these to validate your implementation against the spec
Supplementary Reading
- EIPs affecting EVM:
- EIP-2929: Gas cost changes for state access
- EIP-3529: Reduction in gas refunds
- EIP-3541: Reject contracts starting with 0xEF
- โEVM Deep Divesโ series by noxx: Practical EVM internals
Self-Assessment Checklist
Before moving to the next project, verify:
- I can explain why the EVM is stack-based rather than register-based
- I understand the difference between memory (volatile) and storage (persistent)
- I can trace through bytecode execution manually on paper
- My implementation correctly handles stack overflow/underflow
- All arithmetic operations handle 256-bit wraparound correctly
- JUMP validation prevents jumping into PUSH immediate data
- Gas metering matches expected values for all opcodes
- Memory expansion costs are calculated correctly
- Storage warm/cold access is tracked properly
- I can explain how the EVM fits into Ethereumโs state transition function
Whatโs Next?
With the EVM mastered, you now understand how smart contracts actually execute. In Project 9: ERC-20 Token from Scratch, youโll apply this knowledge by implementing the most fundamental token standardโunderstanding exactly what bytecode gets generated and why tokens work the way they do.
Your EVM implementation can also serve as the execution engine for:
- A custom blockchain (combine with Project 4)
- A smart contract debugger
- A security analysis tool (Project 14)
- An L2 fraud prover (Project 13)
The EVM is the computational heart of Ethereum. Now you own that knowledge.