Understanding Bitcoin, Blockchain & Ethereum Through Building
Goal: Deeply understand how cryptocurrency systems work at every level—from the cryptographic primitives that secure transactions, to the consensus mechanisms that keep networks honest, to the virtual machines that execute smart contracts. By building these systems yourself, you’ll gain the knowledge to read any blockchain’s source code, audit smart contracts, and architect new decentralized systems.
Why Blockchain Technology Matters
In 2008, a pseudonymous developer known as Satoshi Nakamoto published a 9-page whitepaper that solved a problem cryptographers had struggled with for decades: how to create digital money that can’t be double-spent, without trusting a central authority.
That solution—Bitcoin—launched an entirely new field of computer science. Today:
- Bitcoin processes roughly 427,000 transactions per day as of December 2024, with more than 1.28 billion cumulative transactions since inception, secured by more computing power than the world’s top 500 supercomputers combined
- Ethereum hosts over $166 billion in DeFi total value locked (TVL) plus $45 billion in Layer 2 TVL, running unstoppable applications with no central server
- Layer 2 rollups (Arbitrum, Optimism, Base, zkSync) process thousands of transactions per second while inheriting Ethereum’s security
- Many major banks and tech companies now run blockchain research teams, and some projections suggest Ethereum’s TVL could grow 10x by 2026
Understanding blockchain isn’t just about cryptocurrency—it’s about understanding a new paradigm for building trustless, decentralized systems. The concepts you’ll learn (cryptographic commitments, distributed consensus, state machines, game-theoretic security) apply far beyond crypto.
The Mental Model: What Makes Blockchain Work
Before diving into projects, you need to understand why blockchains work. Here’s the core insight:
Traditional Database Blockchain
┌─────────────────────┐ ┌─────────────────────────────────────┐
│ │ │ ┌──────┐ ┌──────┐ ┌──────┐ │
│ Central Server │ │ │Node 1│ │Node 2│ │Node 3│ ... │
│ (Single source │ │ └──┬───┘ └──┬───┘ └──┬───┘ │
│ of truth) │ │ │ │ │ │
│ │ │ └─────────┴─────────┘ │
└─────────┬───────────┘ │ Consensus │
│ │ (All nodes agree on │
"Trust me" │ the same state) │
│ └─────────────────────────────────────┘
▼ │
Users must trust ▼
the operator "Trust the math"
(Cryptographic proofs make
cheating impossible/expensive)
The key innovation is replacing trust in institutions with trust in mathematics:
- Cryptographic hashes make tampering detectable (change one bit → completely different hash)
- Digital signatures prove ownership without revealing secrets
- Proof of Work/Stake makes attacks economically irrational
- Merkle trees enable efficient verification without downloading everything
- Consensus protocols ensure all honest nodes see the same history
Cryptographic Hash Functions: The Foundation of Blockchain
Every blockchain relies on cryptographic hash functions. A hash function takes any input and produces a fixed-size output (the “hash” or “digest”):
Input: "Hello, World!"
↓ SHA-256
Output: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
Input: "Hello, World." (just one character different!)
↓ SHA-256
Output: f8c3bf62a9aa3e6fc1619c250e48ade01a8e0a892e2e69e9a5e3f8a2f5e21c8a
(completely different!)
Properties That Make Hash Functions Useful
┌─────────────────────────────────────────────────────────────────────────────┐
│ HASH FUNCTION PROPERTIES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. DETERMINISTIC │
│ Same input → Always same output │
│ "hello" → abc123... (every single time) │
│ │
│ 2. ONE-WAY (Preimage Resistance) │
│ Input → Hash ✓ EASY (microseconds) │
│ Hash → Input ✗ INFEASIBLE (brute force would outlast the universe) │
│ │
│ 3. COLLISION RESISTANT │
│ Finding two different inputs with same hash is infeasible │
│ Best generic attack ≈ 2^128 hash evaluations for SHA-256 (birthday bound) │
│ │
│ 4. AVALANCHE EFFECT │
│ Tiny change in input → Completely different output │
│ "hello" → abc123... │
│ "hellp" → xyz789... (no similarity!) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
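The properties in the box can be checked directly with Python's standard `hashlib` module. This is a small sketch rather than a rigorous test; the helper names `sha256_hex` and `differing_bits` are made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the SHA-256 digests of a and b differ."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# Deterministic: hashing the same input twice gives the same digest.
assert sha256_hex(b"hello") == sha256_hex(b"hello")

# Fixed size: the digest is 256 bits (64 hex characters) for any input length.
print(len(sha256_hex(b"")), len(sha256_hex(b"x" * 10_000)))  # 64 64

# Avalanche: a one-character change flips roughly half of the 256 output bits.
print(differing_bits(b"hello", b"hellp"))
```

The last number should land somewhere near 128, which is what you would expect if every output bit flipped independently with probability one half.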
How Blocks Chain Together via Hashes
THE BLOCKCHAIN DATA STRUCTURE
Block 0 (Genesis) Block 1 Block 2
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Prev Hash: 0000 │ ┌───▶│ Prev Hash: a7f3 │ ┌───▶│ Prev Hash: 8d2e │
├──────────────────┤ │ ├──────────────────┤ │ ├──────────────────┤
│ Timestamp │ │ │ Timestamp │ │ │ Timestamp │
│ Nonce │ │ │ Nonce │ │ │ Nonce │
│ Merkle Root │ │ │ Merkle Root │ │ │ Merkle Root │
├──────────────────┤ │ ├──────────────────┤ │ ├──────────────────┤
│ Transactions │ │ │ Transactions │ │ │ Transactions │
│ - Tx0 │ │ │ - Tx0 │ │ │ - Tx0 │
│ - Tx1 │ │ │ - Tx1 │ │ │ - Tx1 │
│ - ... │ │ │ - ... │ │ │ - ... │
└────────┬─────────┘ │ └────────┬─────────┘ │ └──────────────────┘
│ │ │ │
└─ Hash ──────┘ └─ Hash ──────┘
= a7f3... = 8d2e...
WHY THIS IS TAMPER-PROOF:
─────────────────────────
If you try to change a transaction in Block 1:
1. Block 1's hash changes (avalanche effect)
2. Block 2's "Prev Hash" no longer matches
3. Block 2's hash changes
4. ... all subsequent blocks become invalid
To tamper, you'd need to recalculate ALL subsequent blocks
faster than the honest network adds new ones.
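The tamper-detection steps above can be sketched in a few lines of Python. The block layout (a dict with `index`, `data`, `prev_hash`, `hash`) and the function names are assumptions chosen for this illustration, not Bitcoin's real block format.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents (everything except its own stored hash)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def build_chain(payloads: list) -> list:
    """Build a toy chain where each block stores the previous block's hash."""
    chain, prev = [], "0" * 64
    for i, data in enumerate(payloads):
        block = {"index": i, "data": data, "prev_hash": prev}
        block["hash"] = block_hash(block)
        chain.append(block)
        prev = block["hash"]
    return chain

def first_invalid_block(chain: list):
    """Return the index of the first block that fails validation, or None."""
    prev = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev or block["hash"] != block_hash(block):
            return block["index"]
        prev = block["hash"]
    return None

chain = build_chain(["genesis", "Alice pays Bob 10", "Bob pays Carol 5"])
print(first_invalid_block(chain))          # None: the untouched chain validates

chain[1]["data"] = "Alice pays Bob 1000"   # tamper with Block 1
print(first_invalid_block(chain))          # 1: stored hash no longer matches contents

chain[1]["hash"] = block_hash(chain[1])    # attacker recomputes Block 1's hash
print(first_invalid_block(chain))          # 2: Block 2's prev_hash no longer matches
```

Recomputing one hash only pushes the break one block further along, which is why an attacker has to redo every later block as well.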
Merkle Trees: Efficient Transaction Verification
A Merkle tree allows you to prove a transaction is in a block without downloading the entire block:
MERKLE TREE STRUCTURE
┌─────────────────┐
│ Merkle Root │ ← Stored in block header
│ H(AB + CD) │ (just 32 bytes!)
└────────┬────────┘
│
┌─────────────────┴─────────────────┐
│ │
┌─────┴─────┐ ┌─────┴─────┐
│ H(A+B) │ │ H(C+D) │
│ Internal │ │ Internal │
└─────┬─────┘ └─────┬─────┘
│ │
┌───────┴───────┐ ┌───────┴───────┐
│ │ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ H(A) │ │ H(B) │ │ H(C) │ │ H(D) │
│ Leaf A │ │ Leaf B │ │ Leaf C │ │ Leaf D │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
│ │ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ Tx A │ │ Tx B │ │ Tx C │ │ Tx D │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
MERKLE PROOF EXAMPLE: Prove Tx B is in the tree
─────────────────────────────────────────────────
You need only 2 hashes (marked with ★):
1. H(A) ★ (to compute H(A+B))
2. H(C+D) ★ (to compute the root)
┌─────────────────┐
│ Merkle Root │ ← Compute and compare to block header
│ H(AB + CD) │
└────────┬────────┘
│
┌────────┴────────┐
│ │
┌───┴───┐ ┌───┴───┐
│H(A+B) │ │H(C+D) │ ★ Given
└───┬───┘ └───────┘
│
┌───┴───┐
│ │
★ ● ← Your transaction (Tx B)
H(A) H(B) You compute H(B) yourself
Verification: O(log n) hashes instead of O(n)
- 1 million transactions: only ~20 hashes needed!
- Light clients can verify without full blockchain
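Here is a minimal sketch of building a Merkle root, producing an inclusion proof, and verifying it. It makes simplifying assumptions: it uses single SHA-256 and plain concatenation, whereas Bitcoin uses double SHA-256 with its own byte ordering; it expects a non-empty leaf list; and the function names are invented for this example. Odd levels duplicate the last hash, as Bitcoin does.

```python
import hashlib
import math

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves: list, index: int):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:          # odd count: duplicate the last hash
            level.append(level[-1])
        sibling = index ^ 1              # the neighbour in the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path from the leaf up to the root using only the proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

txs = [b"Tx A", b"Tx B", b"Tx C", b"Tx D"]
root, proof = merkle_root_and_proof(txs, index=1)   # prove Tx B
print(len(proof))                                   # 2 hashes: H(A) and H(C+D)
print(verify_proof(b"Tx B", proof, root))           # True
print(verify_proof(b"Tx X", proof, root))           # False

# Proof size grows with the tree height, not the number of transactions.
print(math.ceil(math.log2(1_000_000)))              # 20
```

The verifier never sees Tx A, Tx C, or Tx D, only two 32-byte hashes, which is what lets a light client check inclusion against a block header.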
Elliptic Curve Cryptography: How Digital Signatures Work
Bitcoin and Ethereum use the secp256k1 elliptic curve for digital signatures. Here’s why this matters:
THE SECP256K1 CURVE: y² = x³ + 7
│
│ ....
│ ... ...
│ .. ..
│ . .
│ . Point .
────────┼───●──────────────────────────
│ . G (Generator) .
│ . .
│ .. ..
│ ... ...
│ ....
│
HOW KEY GENERATION WORKS:
─────────────────────────
1. Pick a random number between 1 and n − 1 (about 256 bits): your PRIVATE KEY (d)
d = 0x1234567890abcdef... (keep this SECRET!)
2. Multiply Generator Point G by your private key:
PUBLIC KEY = d × G = P (a point on the curve)
3. The magic: Computing d × G is easy (milliseconds)
Reversing P → d is INFEASIBLE (billions of years)
This is the "Elliptic Curve Discrete Logarithm Problem" (ECDLP)
Security: 2^128 operations to break = heat death of universe
DIGITAL SIGNATURE (ECDSA):
──────────────────────────
Signing a Transaction:
┌─────────────────────────────────────────────────────────┐
│ │
│ 1. Hash the message: z = SHA256(transaction_data) │
│ │
│ 2. Pick random nonce: k (MUST be unique per signature!)│
│ │
│ 3. Calculate R = k × G (a curve point) │
│ r = R.x mod n (x-coordinate of R) │
│ │
│ 4. Calculate s = k⁻¹(z + r × private_key) mod n │
│ │
│ 5. Signature = (r, s) │
│ │
└─────────────────────────────────────────────────────────┘
Verifying a Signature (anyone can do this!):
┌─────────────────────────────────────────────────────────┐
│ │
│ Given: message, signature (r, s), public key P │
│ │
│ 1. Hash the message: z = SHA256(message) │
│ │
│ 2. Calculate: u₁ = z × s⁻¹ mod n │
│ u₂ = r × s⁻¹ mod n │
│ │
│ 3. Calculate point: R' = u₁ × G + u₂ × P │
│ │
│ 4. Signature valid if: R'.x mod n == r │
│ │
└─────────────────────────────────────────────────────────┘
WHY THIS IS SECURE:
───────────────────
- Only the private key holder can create valid signatures
- Anyone can verify with just the public key
- The signature is unique to that exact message
- Change one bit of the message → signature becomes invalid
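The signing and verification steps above translate almost line for line into Python. Treat the following as an educational sketch only: it is not constant-time, does no key or signature serialization, and hashes the message with single SHA-256 as in the boxes above. The curve constants are the published secp256k1 parameters; the function names are assumptions, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
import hashlib
import secrets

# secp256k1 domain parameters (curve: y^2 = x^3 + 7 over the integers mod P)
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                        # a + (-a) = infinity
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P   # tangent line
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord line
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)

def scalar_mult(k, point):
    """Compute k * point with double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def sign(private_key, message):
    z = int.from_bytes(hashlib.sha256(message).digest(), "big")
    while True:
        k = secrets.randbelow(N - 1) + 1                   # fresh secret nonce
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * private_key) % N
        if r and s:
            return r, s

def verify(public_key, message, signature):
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    z = int.from_bytes(hashlib.sha256(message).digest(), "big")
    s_inv = pow(s, -1, N)
    point = point_add(scalar_mult(z * s_inv % N, G),
                      scalar_mult(r * s_inv % N, public_key))
    return point is not None and point[0] % N == r

private_key = secrets.randbelow(N - 1) + 1
public_key = scalar_mult(private_key, G)
sig = sign(private_key, b"Alice pays Bob 10 BTC")
print(verify(public_key, b"Alice pays Bob 10 BTC", sig))    # True
print(verify(public_key, b"Alice pays Bob 1000 BTC", sig))  # False
```

Note that the nonce `k` is drawn fresh for every signature. Reusing it across two messages lets anyone solve the two `s` equations for the private key.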
Bitcoin’s UTXO Model vs Ethereum’s Account Model
These are the two fundamental ways blockchains track “who owns what”:
BITCOIN: UTXO MODEL
════════════════════
Think of it like CASH - physical bills you receive and spend:
You have these UTXOs (Unspent Transaction Outputs):
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ UTXO #1 │ │ UTXO #2 │ │ UTXO #3 │
│ 0.5 BTC │ │ 0.3 BTC │ │ 0.2 BTC │
│ From: tx_abc │ │ From: tx_def │ │ From: tx_ghi │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Total: 1.0 BTC (but stored as 3 separate "bills")
To send 0.6 BTC to Alice:
─────────────────────────
INPUTS (consumed) OUTPUTS (created)
┌─────────────────────────┐ ┌─────────────────────────┐
│ UTXO #1: 0.5 BTC ───────┼───▶│ Alice: 0.6 BTC (new) │
│ UTXO #2: 0.3 BTC ───────┼───▶│ Change: 0.199 BTC (new)│
└─────────────────────────┘ │ Fee: 0.001 BTC │
0.8 BTC consumed └─────────────────────────┘
0.8 BTC distributed
After: You own 2 UTXOs (the new 0.199 BTC change output and the untouched UTXO #3)
═══════════════════════════════════════════════════════════════
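The spend above can be modelled with a set of unspent outputs. This is a sketch under loud assumptions: ownership is a plain string instead of a script and signature check, amounts are integer satoshis, and the `UTXO` class and `spend` function are invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UTXO:
    txid: str
    index: int
    owner: str
    amount: int  # satoshis (1 BTC = 100,000,000 satoshis)

def spend(utxo_set: set, inputs: list, sender: str, recipient: str,
          amount: int, fee: int, new_txid: str) -> set:
    """Consume whole inputs and create a payment output plus a change output."""
    for utxo in inputs:
        if utxo not in utxo_set:
            raise ValueError(f"{utxo.txid}:{utxo.index} is missing or already spent")
        if utxo.owner != sender:
            raise ValueError("sender does not own this input")
    change = sum(u.amount for u in inputs) - amount - fee
    if change < 0:
        raise ValueError("inputs do not cover amount plus fee")
    outputs = {UTXO(new_txid, 0, recipient, amount)}
    if change > 0:
        outputs.add(UTXO(new_txid, 1, sender, change))
    return (utxo_set - set(inputs)) | outputs   # the fee is simply not re-created

u1 = UTXO("tx_abc", 0, "you", 50_000_000)   # 0.5 BTC
u2 = UTXO("tx_def", 0, "you", 30_000_000)   # 0.3 BTC
u3 = UTXO("tx_ghi", 0, "you", 20_000_000)   # 0.2 BTC
utxos = spend({u1, u2, u3}, [u1, u2], "you", "alice",
              amount=60_000_000, fee=100_000, new_txid="tx_new")

mine = sorted(u.amount for u in utxos if u.owner == "you")
print(mine)   # [19900000, 20000000]: the change output plus UTXO #3
```

Trying to spend `u1` a second time raises `ValueError`, because it is no longer in the set. That membership test is the whole double-spend check in this model.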
ETHEREUM: ACCOUNT MODEL
════════════════════════
Think of it like a BANK ACCOUNT - one balance that gets updated:
Global State (simplified):
┌───────────────────────────────────────────────────────────┐
│ Account │ Balance │ Nonce │ Storage │
├─────────────────────────────┼──────────┼───────┼─────────┤
│ 0xAlice... │ 5.0 ETH │ 12 │ - │
│ 0xBob... │ 3.2 ETH │ 45 │ - │
│ 0xUniswap... (contract) │ 1000 ETH │ 1 │ {...} │
└───────────────────────────────────────────────────────────┘
To send 2.0 ETH from Alice to Bob:
──────────────────────────────────
Before: Alice = 5.0 ETH, Bob = 3.2 ETH, Alice.nonce = 12
After: Alice = 2.9 ETH, Bob = 5.2 ETH, Alice.nonce = 13
▲ ▲
│ │
(2.0 sent + 0.1 fee) (prevents replay attacks)
═══════════════════════════════════════════════════════════════
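The same transfer in the account model is a pair of balance updates guarded by a nonce check. Again this is a simplified sketch: the fee is a flat number rather than gas used times gas price, there are no signatures, and `Account` and `transfer` are names invented for this example.

```python
from dataclasses import dataclass

WEI_PER_ETH = 10**18   # balances are integers (wei) to avoid float rounding

@dataclass
class Account:
    balance: int   # wei
    nonce: int = 0

def transfer(state: dict, sender: str, recipient: str,
             value: int, fee: int, tx_nonce: int) -> None:
    """Apply one transfer in place, rejecting replays and overdrafts."""
    account = state[sender]
    if tx_nonce != account.nonce:
        raise ValueError("bad nonce: transaction is a replay or out of order")
    if account.balance < value + fee:
        raise ValueError("insufficient balance")
    account.balance -= value + fee
    account.nonce += 1
    state.setdefault(recipient, Account(0)).balance += value

state = {"alice": Account(5 * WEI_PER_ETH, nonce=12),
         "bob": Account(32 * WEI_PER_ETH // 10, nonce=45)}
transfer(state, "alice", "bob", value=2 * WEI_PER_ETH,
         fee=WEI_PER_ETH // 10, tx_nonce=12)
print(state["alice"].balance / WEI_PER_ETH, state["alice"].nonce)   # 2.9 13
print(state["bob"].balance / WEI_PER_ETH)                           # 5.2
```

Submitting the same transaction again with `tx_nonce=12` now fails, because Alice's account nonce has moved on to 13.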
COMPARISON
══════════
┌─────────────────────┬──────────────────────┬────────────────────────┐
│ Property │ UTXO (Bitcoin) │ Account (Ethereum) │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Privacy │ Better (new address │ Worse (single address │
│ │ for each UTXO) │ easy to track) │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Parallelism │ Excellent (UTXOs are │ Limited (account state │
│ │ independent) │ is sequential) │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Smart Contracts │ Limited (Script is │ Excellent (Turing- │
│ │ not Turing-complete)│ complete EVM) │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Complexity │ Higher (manage many │ Lower (simple balance) │
│ │ UTXOs) │ │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Double-Spend Check │ Check if UTXO exists │ Check nonce sequence │
│ │ and unspent │ │
└─────────────────────┴──────────────────────┴────────────────────────┘
Proof of Work: Making Cheating Expensive
Proof of Work is the original consensus mechanism that made Bitcoin possible:
HOW PROOF OF WORK OPERATES
The Mining Puzzle:
──────────────────
Find a nonce such that:
SHA256(SHA256(block_header)) < TARGET
where the nonce is one of the fields inside block_header (Bitcoin hashes the header twice).
TARGET is adjusted so this takes ~10 minutes on average
for the entire network combined.
Example:
────────
Block header data: "prev_hash=abc123, merkle_root=def456, timestamp=..."
Miner tries:
nonce=0: SHA256(...) = f8a3b2c1... ❌ Too high!
nonce=1: SHA256(...) = e9d4c5a6... ❌ Too high!
nonce=2: SHA256(...) = d7e8f901... ❌ Too high!
... millions of attempts ...
nonce=8372631: SHA256(...) = 0000000000abc... ✓ Below target!
This took MILLIONS of hash attempts (expensive!), and real Bitcoin blocks take vastly more
But verifying is just ONE hash (cheap!)
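The puzzle can be sketched with a numeric target instead of counting leading zeros. Assumptions to note: the header is an arbitrary byte string rather than Bitcoin's 80-byte header, the digest is read as a big-endian integer (Bitcoin reads it little-endian), and the target is deliberately easy so the loop finishes quickly.

```python
import hashlib

def pow_hash(header: bytes, nonce: int) -> int:
    """Double SHA-256 of header plus a 4-byte nonce, read as an integer."""
    data = header + nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")

def mine(header: bytes, target: int, max_nonce: int = 2**32):
    """Try nonces in order; return the first whose hash is below the target."""
    for nonce in range(max_nonce):
        if pow_hash(header, nonce) < target:
            return nonce
    return None   # nonce space exhausted; a real miner would change the header

header = b"prev_hash=abc123, merkle_root=def456, timestamp=0"
target = 2**256 // 4096     # about 1 in 4096 hashes qualifies (toy difficulty)
nonce = mine(header, target)
print(nonce)                              # typically a few thousand attempts
print(pow_hash(header, nonce) < target)   # True: verification is a single check
```

Halving `target` doubles the expected number of attempts, while the cost of the final verification line stays constant.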
The Difficulty Target:
──────────────────────
┌─────────────────────────────────────────────────────────────────┐
│ │
│ TARGET (a lower target means more leading zeros are required) │
│ │
│ Difficulty 1: 00000000ffff0000... (easiest) │
│ Difficulty 16: 000000000ffff000... │
│ Difficulty 256: 0000000000ffff00... │
│ Bitcoin 2024: 00000000000000000000... (very hard!) │
│ │
│ Adjustment: Every 2016 blocks (~2 weeks) │
│ - If blocks came too fast → increase difficulty │
│ - If blocks came too slow → decrease difficulty │
│ - Goal: maintain ~10 minute average block time │
│ │
└─────────────────────────────────────────────────────────────────┘
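The adjustment rule amounts to one line of integer arithmetic. The sketch below follows the rule described in the box and adds Bitcoin's limit of a factor of 4 per adjustment; it ignores details of the real implementation such as the compact "bits" encoding. The function names are assumptions.

```python
TARGET_TIMESPAN = 2016 * 10 * 60      # two weeks, in seconds
MAX_TARGET = 0xFFFF * 2**208          # the difficulty-1 target

def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the target by how long the last 2016 blocks actually took."""
    # Clamp so one adjustment changes difficulty by at most a factor of 4.
    clamped = max(TARGET_TIMESPAN // 4, min(actual_timespan, TARGET_TIMESPAN * 4))
    return min(old_target * clamped // TARGET_TIMESPAN, MAX_TARGET)

def difficulty(target: int) -> float:
    """Difficulty is the ratio of the easiest target to the current one."""
    return MAX_TARGET / target

old = MAX_TARGET // 1000                       # difficulty 1000
fast = retarget(old, TARGET_TIMESPAN // 2)     # blocks arrived twice as fast
slow = retarget(old, TARGET_TIMESPAN * 2)      # blocks arrived twice as slowly
print(round(difficulty(fast)), round(difficulty(slow)))   # 2000 500
```

Blocks that arrive too fast shrink the target (higher difficulty), and blocks that arrive too slowly grow it, which pulls the average back toward 10 minutes.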
Why This Creates Consensus:
───────────────────────────
┌─────────────────────────────────────────────────────────────────┐
│ │
│ 1. Miners compete to find valid blocks │
│ │
│ 2. First valid block gets broadcast to network │
│ │
│ 3. Other miners verify (cheap!) and accept │
│ │
│ 4. Miners start building on the chain with the most cumulative work │
│ │
│ 5. Block reward (currently 3.125 BTC) incentivizes honesty │
│ │
│ │
│ Fork Resolution: MOST-WORK CHAIN WINS (usually the longest) │
│ │
│ ┌────┐ ┌────┐ ┌────┐ │
│ │ B1 │────▶│ B2 │────▶│ B3 │──┬──▶ ✓ This chain wins │
│ └────┘ └────┘ └────┘ │ (more work) │
│ │ │ │
│ ▼ │ │
│ ┌────┐ │ │
│ │ B3'│───┘ ✗ Orphaned │
│ └────┘ (less work) │
│ │
└─────────────────────────────────────────────────────────────────┘
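Fork choice compares accumulated work rather than block count. In this sketch each chain is represented simply as a list of per-block targets, which is an assumption made for brevity; the work estimate `2**256 // (target + 1)` is the expected number of hashes needed to meet a target.

```python
def block_work(target: int) -> int:
    """Expected number of hashes needed to find a block at this target."""
    return 2**256 // (target + 1)

def best_chain(chains: list) -> list:
    """Pick the chain with the most cumulative work, not the most blocks."""
    return max(chains, key=lambda chain: sum(block_work(t) for t in chain))

easy, hard = 2**240, 2**236          # a lower target means a harder block
chain_a = [hard, hard, hard]         # 3 blocks mined at high difficulty
chain_b = [easy, easy, easy, easy]   # 4 blocks mined at low difficulty
print(best_chain([chain_a, chain_b]) is chain_a)   # True: fewer blocks, more work
```

When every block has the same difficulty, the most-work chain and the longest chain coincide, which is why the rule is often described as the longest chain winning.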
51% Attack Economics:
─────────────────────
To rewrite history, an attacker needs >50% of network hash rate:
- Current Bitcoin network: ~500 EH/s (500 quintillion hashes/sec)
- Cost of equipment: ~$10 billion
- Electricity: ~$20 million per day
- And you'd destroy the value of what you're trying to steal!
The attack is possible but economically irrational.
The Ethereum Virtual Machine: A World Computer
Ethereum extends Bitcoin’s vision by adding a Turing-complete virtual machine:
EVM ARCHITECTURE
┌─────────────────────────────────────────────────────────────────┐
│ ETHEREUM VIRTUAL MACHINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ STACK │ │
│ │ ┌────┬────┬────┬────┬────┬────┬─────────────────────┐ │ │
│ │ │ 32 │ 31 │ 30 │ 29 │ 28 │ .. │ 0 │ │ │
│ │ │byte│byte│byte│byte│byte│ │ (top) │ │ │
│ │ └────┴────┴────┴────┴────┴────┴─────────────────────┘ │ │
│ │ Max depth: 1024 items, each 256-bits (32 bytes) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ MEMORY │ │
│ │ ┌────┬────┬────┬────┬────┬────┬────┬────┬────────────┐ │ │
│ │ │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ .. │ n │ ∞ │ │ │
│ │ └────┴────┴────┴────┴────┴────┴────┴────┴────────────┘ │ │
│ │ Linear byte array, volatile (cleared after execution) │ │
│ │ Cost: grows quadratically (more memory = more gas) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ STORAGE │ │
│ │ ┌──────────────────────┬───────────────────────────┐ │ │
│ │ │ Key (256-bit) │ Value (256-bit) │ │ │
│ │ ├──────────────────────┼───────────────────────────┤ │ │
│ │ │ 0x0000...0000 │ contract_owner_address │ │ │
│ │ │ 0x0000...0001 │ total_supply │ │ │
│ │ │ keccak(user, slot) │ user_balance │ │ │
│ │ └──────────────────────┴───────────────────────────┘ │ │
│ │ Key-value store, PERSISTENT across transactions │ │
│ │ Most expensive operation! (20,000 gas to write) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
EVM EXECUTION MODEL:
────────────────────
Program Counter ──▶ ┌──────────────────────────────────────────┐
│ 0x60 0x80 0x60 0x40 0x52 0x34 ... │
│ PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE │
└──────────────────────────────────────────┘
▲
│
Each byte is an OPCODE or data
COMMON OPCODES:
───────────────
┌───────────┬────────┬────────────────────────────────────────────┐
│ Opcode │ Gas │ Description │
├───────────┼────────┼────────────────────────────────────────────┤
│ ADD │ 3 │ Pop 2 values, push their sum │
│ MUL │ 5 │ Pop 2 values, push their product │
│ SUB │ 3 │ Pop 2 values, push their difference │
│ DIV │ 5 │ Pop 2 values, push their quotient │
├───────────┼────────┼────────────────────────────────────────────┤
│ PUSH1 │ 3 │ Push 1 byte onto stack │
│ PUSH32 │ 3 │ Push 32 bytes onto stack │
│ POP │ 2 │ Remove top stack item │
│ DUP1 │ 3 │ Duplicate top stack item │
│ SWAP1 │ 3 │ Swap top 2 stack items │
├───────────┼────────┼────────────────────────────────────────────┤
│ MLOAD │ 3 │ Load word from memory │
│ MSTORE │ 3 │ Store word in memory │
│ SLOAD │ 100 │ Load word from storage (cold access 2100) │
│ SSTORE │ 20000 │ Store word in storage (most expensive!) │
├───────────┼────────┼────────────────────────────────────────────┤
│ JUMP │ 8 │ Jump to code location │
│ JUMPI │ 10 │ Conditional jump │
│ CALL │ 100+ │ Call another contract │
│ RETURN │ 0 │ End execution, return data │
└───────────┴────────┴────────────────────────────────────────────┘
EXAMPLE: Simple Addition in Bytecode
────────────────────────────────────
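Solidity: function add(uint a, uint b) public pure returns (uint) { return a + b; }
Compiles to something like (simplified: the real output also has a function dispatcher, and arguments start after the 4-byte selector):
PUSH1 0x00 ; Push 0 (calldata offset of 'a')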
CALLDATALOAD ; Load 'a' from calldata
PUSH1 0x20 ; Push 32 (offset for 'b')
CALLDATALOAD ; Load 'b' from calldata
ADD ; a + b
PUSH1 0x00 ; Push return offset
MSTORE ; Store result in memory
PUSH1 0x20 ; Push return size (32 bytes)
PUSH1 0x00 ; Push return offset
RETURN ; Return the result
Stack trace:
[] ; Start empty
[0x00] ; PUSH1 0x00
[a] ; CALLDATALOAD (replaces 0x00 with value at offset 0)
[a, 0x20] ; PUSH1 0x20
[a, b] ; CALLDATALOAD (loads value at offset 32)
[a+b] ; ADD
...
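To see the stack trace above run, here is a toy interpreter for just the five opcodes that listing uses. It is a sketch, not an EVM: gas accounting covers only the static costs from the opcode table (memory expansion is ignored), unknown opcodes simply raise `KeyError`, and, as in the listing, calldata has no 4-byte function selector.

```python
WORD = 2**256

# Static gas costs taken from the opcode table (memory expansion ignored).
GAS = {0x01: 3, 0x35: 3, 0x52: 3, 0x60: 3, 0xF3: 0}

def run(code: bytes, calldata: bytes):
    """Execute a tiny EVM subset: ADD, CALLDATALOAD, MSTORE, PUSH1, RETURN."""
    stack, memory, pc, gas_used = [], bytearray(), 0, 0
    while pc < len(code):
        op = code[pc]
        gas_used += GAS[op]
        pc += 1
        if op == 0x60:                                   # PUSH1: next byte is data
            stack.append(code[pc])
            pc += 1
        elif op == 0x35:                                 # CALLDATALOAD
            offset = stack.pop()
            chunk = calldata[offset:offset + 32].ljust(32, b"\x00")
            stack.append(int.from_bytes(chunk, "big"))
        elif op == 0x01:                                 # ADD (wraps at 2^256)
            stack.append((stack.pop() + stack.pop()) % WORD)
        elif op == 0x52:                                 # MSTORE
            offset, value = stack.pop(), stack.pop()
            if len(memory) < offset + 32:
                memory.extend(bytes(offset + 32 - len(memory)))
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        elif op == 0xF3:                                 # RETURN
            offset, size = stack.pop(), stack.pop()
            return bytes(memory[offset:offset + size]), gas_used
        if len(stack) > 1024:
            raise OverflowError("stack limit exceeded")
    return b"", gas_used

# PUSH1 00, CALLDATALOAD, PUSH1 20, CALLDATALOAD, ADD,
# PUSH1 00, MSTORE, PUSH1 20, PUSH1 00, RETURN
code = bytes.fromhex("6000356020350160005260206000f3")
calldata = (7).to_bytes(32, "big") + (35).to_bytes(32, "big")
result, gas = run(code, calldata)
print(int.from_bytes(result, "big"), gas)   # 42 27
```

The 27 gas is five PUSH1s, two CALLDATALOADs, one ADD, and one MSTORE at 3 gas each, plus RETURN at 0. A real EVM would charge a little more for expanding memory.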
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Hash functions | SHA-256 is deterministic, one-way, collision-resistant. Blocks chain via hashes. Changing one bit invalidates everything after. |
| Merkle trees | Binary tree of hashes. Proves inclusion in O(log n). Enables light clients and efficient verification. |
| Elliptic curves (secp256k1) | Private key × Generator Point = Public key. Easy to compute, impossible to reverse. Foundation of digital signatures. |
| ECDSA signatures | Prove ownership without revealing private key. Unique per message. Anyone can verify. |
| UTXO vs Account model | Bitcoin tracks unspent outputs (like cash). Ethereum tracks balances (like bank accounts). Trade-offs in privacy, parallelism, and expressiveness. |
| Proof of Work | Find nonce where hash < target. Expensive to create, cheap to verify. Longest chain wins. 51% attack is economically irrational. |
| The EVM | Stack-based VM with 256-bit words. Stack (1024 deep), Memory (volatile), Storage (persistent). Gas measures computation. Opcodes are single bytes. |
| Consensus | Agreement without central authority. PoW uses energy, PoS uses stake. Both make attacks expensive. |
Deep Dive Reading by Concept
This section maps each concept to specific book chapters for deeper understanding. Read these before or alongside the projects to build strong mental models.
Cryptographic Foundations
| Concept | Book & Chapter |
|---|---|
| Hash function internals | Serious Cryptography, 2nd Edition by Jean-Philippe Aumasson — Ch. 6: “Hash Functions” |
| Merkle trees and proofs | Mastering Bitcoin, 3rd Edition by Andreas Antonopoulos — Ch. 11: “The Blockchain” |
| Elliptic curve math | Programming Bitcoin by Jimmy Song — Ch. 2-3: “Elliptic Curves” and “Elliptic Curve Cryptography” |
| ECDSA signatures | Programming Bitcoin by Jimmy Song — Ch. 3: “Elliptic Curve Cryptography” and Ch. 4: “Serialization” |
| Cryptographic primitives overview | Practical Cryptography for Developers (online) by Svetlin Nakov |
Bitcoin Internals
| Concept | Book & Chapter |
|---|---|
| Transaction structure | Mastering Bitcoin, 3rd Edition — Ch. 6: “Transactions” |
| UTXO model deep dive | Programming Bitcoin by Jimmy Song — Ch. 5-7: Transactions and Script |
| Bitcoin Script opcodes | Programming Bitcoin by Jimmy Song — Ch. 6: “Script” |
| Block structure | Mastering Bitcoin, 3rd Edition — Ch. 11: “The Blockchain” |
| Proof of Work mining | Programming Bitcoin by Jimmy Song — Ch. 9: “Blocks” |
Ethereum & Smart Contracts
| Concept | Book & Chapter |
|---|---|
| EVM architecture | Mastering Ethereum by Antonopoulos & Wood — Ch. 13: “The Ethereum Virtual Machine” |
| Smart contract basics | Mastering Ethereum — Ch. 7: “Smart Contracts and Solidity” |
| Gas and execution model | Mastering Ethereum — Ch. 13 (Gas section) |
| Account model | Mastering Ethereum — Ch. 4: “Cryptography” and Ch. 5: “Wallets” |
Distributed Systems & Consensus
| Concept | Book & Chapter |
|---|---|
| Byzantine Fault Tolerance | Designing Data-Intensive Applications by Martin Kleppmann — Ch. 8-9: “The Trouble with Distributed Systems” and “Consistency and Consensus” |
| P2P networking | Computer Networks by Tanenbaum & Wetherall — Ch. 5: “Network Layer” |
| Consensus algorithms | Designing Data-Intensive Applications — Ch. 9: “Consistency and Consensus” |
| Proof of Stake | Ethereum Casper papers and Vitalik’s blog posts |
Building Virtual Machines
| Concept | Book & Chapter |
|---|---|
| Stack machine architecture | Crafting Interpreters by Robert Nystrom — Part III: “A Bytecode Virtual Machine” |
| Bytecode design | Crafting Interpreters — Ch. 14-15: “Chunks of Bytecode” and “A Virtual Machine” |
| Compiler construction | Writing a C Compiler by Nora Sandler — Full book |
Essential Reading Order
For maximum comprehension, read in this order:
- Foundation (Week 1):
- Programming Bitcoin Ch. 1-4 (field math, curves, serialization)
- Mastering Bitcoin Ch. 6 (transactions)
- Bitcoin Deep Dive (Week 2):
- Programming Bitcoin Ch. 5-9 (transactions, script, blocks)
- Mastering Bitcoin Ch. 11 (blockchain)
- Ethereum (Week 3):
- Mastering Ethereum Ch. 4-7 (crypto, wallets, contracts)
- Mastering Ethereum Ch. 13 (EVM)
- Distributed Systems (Week 4):
- Designing Data-Intensive Applications Ch. 8-9
Prerequisites & Background Knowledge
Essential Prerequisites (Must Have)
Before starting these projects, you should have:
Programming Skills:
- Proficiency in at least one programming language (Python, C, JavaScript, or Rust preferred)
- Understanding of data structures (arrays, hash tables, trees)
- Basic algorithm analysis (Big O notation)
- Experience with command-line tools and Git
Cryptography Fundamentals:
- What is a cryptographic hash function? (Can you explain SHA-256?)
- What makes a hash function “cryptographically secure”?
- What is public-key cryptography? (Asymmetric encryption)
- What is a digital signature and why can’t it be forged?
Computer Science Fundamentals:
- How does TCP/IP networking work? (Client-server model, sockets)
- What is a state machine?
- What is serialization/deserialization?
- How do distributed systems differ from single-machine programs?
Helpful But Not Required
You’ll learn these during the projects, but having them helps:
- Number theory (modular arithmetic, prime fields) - Project 1 teaches this
- Distributed systems theory (CAP theorem, Byzantine Generals) - Projects 2 & 4 cover this
- Compiler theory (parsing, ASTs, bytecode) - Projects 3 & 5 teach this
- Game theory (incentives, Nash equilibrium) - Emerges naturally in consensus projects
Self-Assessment Questions
Can you answer these? If yes, you’re ready:
- Cryptography:
- Why can’t you reverse a SHA-256 hash?
- How does signing with a private key let others verify with the public key?
- What happens if two different inputs produce the same hash?
- Programming:
- How would you represent a graph in memory?
- What’s the difference between passing by value vs. by reference?
- How do you debug a segfault or null pointer exception?
- Networking:
- What’s the difference between TCP and UDP?
- How do two programs on different computers communicate?
- What does “peer-to-peer” mean?
If you struggled with any of these, spend a day reviewing:
- Cryptography basics: “Serious Cryptography, 2nd Edition” Ch. 1-3
- Networking: “Computer Networks” Ch. 1
- Data structures: “Algorithms, Fourth Edition” Ch. 1-3
Development Environment Setup
Required Tools:
- Python 3.9+ (for Projects 1, 2, 5) - or your preferred language
- C compiler (gcc or clang for Project 2 if using C)
- Git for version control
- Text editor or IDE (VS Code, Vim, PyCharm - whatever you’re comfortable with)
Recommended Tools:
- Bitcoin Core (for testing against real Bitcoin network)
    # Start Bitcoin Core in regtest mode for safe testing
    bitcoind -regtest -daemon
- Geth (Ethereum client for testing EVM projects)
    # Run a local Ethereum development chain
    geth --dev --http
- Block explorer access (blockchain.com, etherscan.io) to inspect real transactions
- Wireshark or tcpdump for inspecting network packets (Project 2)
- Hexdump tools (xxd, hexyl) for debugging binary formats
Optional but Useful:
- Docker to run multiple blockchain nodes easily
- Postman or curl for testing APIs
- Jupyter notebooks for experimenting with crypto math (Project 1)
Time Investment Expectations
Realistic time estimates per project:
| Project | Beginner | Intermediate | Advanced |
|---|---|---|---|
| Project 1: Bitcoin from Scratch | 6-8 weeks | 3-4 weeks | 2-3 weeks |
| Project 2: Minimal Blockchain | 2 weeks | 1 week | 2-3 days |
| Project 3: EVM from Scratch | 4-6 weeks | 2-3 weeks | 1-2 weeks |
| Project 4: Proof-of-Stake | 3-4 weeks | 2 weeks | 1 week |
| Project 5: Smart Contract Compiler | 3-4 weeks | 2 weeks | 1 week |
| Project 6: Layer-2 Rollup | 6-8 weeks | 4 weeks | 2-3 weeks |
Total learning journey: 6-12 months if doing all projects (working 10-15 hours/week)
Important: These are NOT tutorial projects. You will get stuck. You will need to read documentation, whitepapers, and books. You will debug for hours. This is how deep learning happens.
Important Reality Check
What These Projects Are NOT:
- ❌ Copy-paste tutorials with step-by-step instructions
- ❌ “Build a blockchain in 100 lines” toy examples
- ❌ Get-rich-quick crypto trading guides
- ❌ Production-ready code you should deploy with real money
What These Projects ARE:
- ✅ Deep dives into how blockchain systems actually work
- ✅ Implementations from first principles with working code
- ✅ Educational exercises that force you to confront hard problems
- ✅ Skills that transfer to auditing smart contracts, building dApps, or understanding any blockchain
Warning: Do NOT use code from these projects with real cryptocurrency. These are educational implementations. Production systems require security audits, extensive testing, and expert review.
Quick Start: Your First 48 Hours
Feeling overwhelmed? Start here.
If you’re new to blockchain and don’t know where to begin, follow this 48-hour crash course:
Hour 0-4: Understand the Core Insight
Read these (in order):
- Bitcoin Whitepaper - 9 pages, focus on sections 1-5
- “Why Blockchain Technology Matters” section above (reread it slowly)
- Visualize: Draw the “Traditional Database vs Blockchain” diagram on paper
Experiment:
# Install Python and try this:
import hashlib
# See how hash functions work
data = "Hello, Bitcoin!"
hash1 = hashlib.sha256(data.encode()).hexdigest()
print(f"SHA-256({data}) = {hash1}")
# Change ONE character
data2 = "Hello, Bitcoin?"
hash2 = hashlib.sha256(data2.encode()).hexdigest()
print(f"SHA-256({data2}) = {hash2}")
# Notice: Hashes are completely different!
Goal: Understand why cryptographic hashes make blockchains tamper-proof.
Hour 5-12: Build Your First Tiny Blockchain
Do this mini-project:
Create a simple Python script with 3 blocks that chain together:
# mini_blockchain.py
import hashlib
import json
import time

class Block:
    def __init__(self, index, data, previous_hash):
        self.index = index
        self.timestamp = time.time()
        self.data = data
        self.previous_hash = previous_hash
        self.hash = self.calculate_hash()

    def calculate_hash(self):
        # Hash every field except the stored hash itself
        block_string = json.dumps({
            "index": self.index,
            "timestamp": self.timestamp,
            "data": self.data,
            "previous_hash": self.previous_hash
        }, sort_keys=True)
        return hashlib.sha256(block_string.encode()).hexdigest()

# Create the genesis block, then two blocks that each point at their predecessor
genesis = Block(0, "Genesis Block", "0")
block1 = Block(1, "Alice sends Bob 10 BTC", genesis.hash)
block2 = Block(2, "Bob sends Charlie 5 BTC", block1.hash)
print(f"Block 0: {genesis.hash}")
print(f"Block 1: {block1.hash} (prev: {block1.previous_hash})")
print(f"Block 2: {block2.hash} (prev: {block2.previous_hash})")

# Try tampering!
block1.data = "Alice sends Bob 1000 BTC"  # Fraud attempt!
block1.hash = block1.calculate_hash()  # Recompute: the stored hash is not updated automatically
print("\n❌ After tampering Block 1:")
print(f"Block 1 hash: {block1.hash}")
print(f"Block 2 expects previous: {block2.previous_hash}")
print(f"Does Block 2 still validate? {block1.hash == block2.previous_hash}")
Run it and experiment:
- What happens when you change block 1’s data?
- Why does block 2 break when you tamper with block 1?
- This is the “chain” in blockchain!
Hour 13-24: Understand Proof of Work
Add mining to your mini blockchain:
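Add a mine_block method to Block that keeps incrementing a nonce until the hash starts with 0000. The nonce must be part of the hashed data, otherwise the hash never changes and the loop never ends:
    def mine_block(self, difficulty=4):
        # Prerequisite changes elsewhere in Block: set self.nonce = 0 in __init__
        # before the first calculate_hash() call, and add "nonce": self.nonce to
        # the dict that calculate_hash() serializes.
        target = "0" * difficulty
        self.nonce = 0
        self.hash = self.calculate_hash()
        while not self.hash.startswith(target):
            self.nonce += 1
            self.hash = self.calculate_hash()
        print(f"Block mined! Nonce: {self.nonce}, Hash: {self.hash}")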
Run it and observe:
- How long does it take to mine a block?
- What happens if you increase difficulty to 5 zeros? 6 zeros?
- This is proof-of-work! It makes tampering expensive.
Hour 25-36: Learn Cryptographic Signatures
Read and implement:
- Read “Programming Bitcoin” Chapter 3 (Elliptic Curve Cryptography) - first 20 pages
- Use Python’s ecdsa library to sign and verify messages:
from ecdsa import SigningKey, SECP256k1, BadSignatureError

# Alice generates a keypair
private_key = SigningKey.generate(curve=SECP256k1)
public_key = private_key.verifying_key

# Alice signs a message
message = b"I, Alice, send Bob 10 BTC"
signature = private_key.sign(message)

# Anyone can verify with Alice's public key
try:
    public_key.verify(signature, message)
    print("✓ Signature valid!")
except BadSignatureError:
    print("❌ Invalid signature")

# Try to forge (will fail!)
fake_message = b"I, Alice, send Bob 1000 BTC"
try:
    public_key.verify(signature, fake_message)
    print("Forgery accepted (this should never happen)")
except BadSignatureError:
    print("❌ Cannot forge signature!")
Goal: Understand how digital signatures prove ownership.
Hour 37-48: Read Real Bitcoin Code
Don’t implement, just READ:
- Go to Bitcoin Core source code
- Read these files (just read; you don’t need to understand everything):
  - src/primitives/block.h - See how blocks are structured
  - src/primitives/transaction.h - See transaction format
  - src/hash.h - See how hashing is implemented
Then look at a real transaction:
- Visit blockchain.com
- Click any recent block
- Click a transaction
- Look at the “raw transaction” hex
Ask yourself:
- Can I see the inputs and outputs?
- Where are the signatures?
- How much data is actually in a transaction?
After 48 Hours
You should now understand:
- ✅ Why blockchains are tamper-proof (hash chains)
- ✅ Why mining makes attacks expensive (proof-of-work)
- ✅ How ownership is proven (digital signatures)
- ✅ What a real blockchain looks like (Bitcoin exploration)
You’re ready to start Project 2: “Build a Minimal Blockchain in a Weekend”
Recommended Learning Paths
Different backgrounds benefit from different project orders. Choose your path:
Path 1: “The Academic” (Theory-First Approach)
Best for: Computer science students, math background, theory lovers
Order:
- Start: Read all concept sections above thoroughly
- Project 2: Build a Minimal Blockchain (get the mental model)
- Project 4: Implement Proof-of-Stake (understand consensus theory)
- Project 1: Build Bitcoin from Scratch (apply all the theory)
- Project 3: Build the EVM (understand state machines)
- Project 5: Smart Contract Compiler (compilers & languages)
- Project 6: Layer-2 Rollup (advanced cryptography)
Why this order: You build theoretical understanding before diving into Bitcoin’s complexity.
Path 2: “The Practitioner” (Build-First Approach)
Best for: Professional developers, learn-by-doing types
Order:
- Start: Quick Start guide (48 hours)
- Project 2: Build a Minimal Blockchain (see it work immediately)
- Project 1: Build Bitcoin from Scratch (the real deal)
- Project 3: Build the EVM (different paradigm)
- Project 5: Smart Contract Compiler (compiler theory)
- Project 4: Proof-of-Stake (modern consensus)
- Project 6: Layer-2 Rollup (cutting edge)
Why this order: You get working code fast, then deepen understanding.
Path 3: “The Bitcoin Maximalist”
Best for: Those specifically interested in Bitcoin, security-focused
Order:
- Start: Read Bitcoin Whitepaper 3 times
- Project 1: Build Bitcoin from Scratch (the only true blockchain!)
- Project 2: Build a Minimal Blockchain (understand Bitcoin’s innovations)
- Project 4: Proof-of-Stake (to understand why Bitcoin doesn’t use it)
- Project 6: Layer-2 Rollup (Bitcoin’s Lightning Network uses similar ideas)
- Skip Projects 3 & 5 (or do them to understand “the competition”)
Why this order: Deep Bitcoin focus, with understanding of alternatives.
Path 4: “The Ethereum Developer”
Best for: dApp developers, smart contract auditors
Order:
- Start: Quick Start guide + read Ethereum sections above
- Project 3: Build the EVM (understand what your Solidity code runs on)
- Project 5: Smart Contract Compiler (understand how code becomes bytecode)
- Project 1: Build Bitcoin (understand why Ethereum differs)
- Project 4: Proof-of-Stake (Ethereum 2.0’s consensus)
- Project 6: Layer-2 Rollup (Arbitrum, Optimism, zkSync)
- Optional: Project 2 (minimal blockchain for comparison)
Why this order: Focused on understanding Ethereum’s entire stack.
Path 5: “The Researcher” (Consensus & Distributed Systems Focus)
Best for: Those interested in distributed systems, consensus algorithms
Order:
- Start: Read “Designing Data-Intensive Applications” Ch. 8-9 first
- Project 4: Proof-of-Stake (Byzantine Fault Tolerance)
- Project 2: Minimal Blockchain (distributed state machines)
- Project 1: Bitcoin (Nakamoto Consensus)
- Project 6: Layer-2 Rollup (optimistic vs zero-knowledge proofs)
- Projects 3 & 5: If interested in execution environments
Why this order: Focuses on the distributed systems theory that makes blockchains work.
Path 6: “The Time-Constrained” (Fastest Path to Understanding)
Best for: Busy professionals, want core insights fast
Order:
- Week 1: Quick Start (48 hours) + Project 2 (weekend)
- Week 2-4: Project 1 (Bitcoin, focus on Ch. 1-5 of “Programming Bitcoin”)
- Week 5-6: Project 3 (EVM, just get it working, don’t optimize)
- Stop here - you understand 80% of blockchain concepts
Why this order: Maximum learning per hour invested.
Project 1: Build Bitcoin From Scratch
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Programming Language: Python
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 5: Master
- Knowledge Area: Blockchain / Cryptography
- Software or Tool: Bitcoin
- Main Book: “Programming Bitcoin” by Jimmy Song
What you’ll build: A complete Bitcoin library implementing elliptic curve cryptography, transactions, blocks, script parsing, and network communication—all from first principles in Python.
Why it teaches blockchain: This forces you to understand why Bitcoin works, not just that it works. You can’t fake your way through implementing ECDSA signatures or parsing transaction scripts. Every line of code confronts you with a design decision Satoshi made.
Core challenges you’ll face:
- Finite field arithmetic → Maps to understanding why Bitcoin uses secp256k1 curve
- Elliptic curve point operations → Maps to how public keys derive from private keys
- Transaction serialization → Maps to how data is encoded on-chain
- Script interpreter → Maps to Bitcoin’s programmability model
- Merkle tree construction → Maps to how blocks efficiently prove transaction inclusion
- Block header hashing → Maps to proof-of-work mining
Resources for key challenges:
- “Programming Bitcoin” by Jimmy Song - THE definitive hands-on guide; each chapter builds the library piece by piece with exercises
- Bitcoin Whitepaper - Read alongside implementation to see theory meet practice
Key Concepts:
- Elliptic Curve Cryptography: “Programming Bitcoin” Ch. 2-3 - Jimmy Song
- Transaction Structure: “Mastering Bitcoin” Ch. 6 - Andreas Antonopoulos
- Script Opcodes: “Programming Bitcoin” Ch. 6 - Jimmy Song
- Merkle Trees: “Mastering Bitcoin” Ch. 11 - Andreas Antonopoulos
- Proof of Work: “Programming Bitcoin” Ch. 9 - Jimmy Song
Difficulty: Advanced
Time estimate: 1 month+
Prerequisites: Python proficiency, basic number theory helps
Learning milestones:
- Generate valid Bitcoin addresses - You understand public key cryptography
- Parse and create transactions - You understand the UTXO model
- Validate a real block - You understand proof-of-work and Merkle proofs
- Interpret Script opcodes - You understand Bitcoin’s programmability
Real World Outcome
When you complete this project, you’ll have a fully functional Bitcoin library that you wrote from scratch. Here’s exactly what you’ll be able to do:
1. Generate Real Bitcoin Addresses
$ python bitcoin_cli.py generate-wallet
Private Key (WIF): 5HueCGU8rMjxEXxiPuD5BDku4MkFqeZyd4dZ1jvhTVqvbTLvyTJ
Public Key (compressed): 02d0de0aaeaefad02b8bdc8a01a1b8b11c696bd3d66a2c5f10780d95b7df42645c
Bitcoin Address (P2PKH): 1GAehh7TsJAHuUAeKZcXf5CnwuGuGgyX2S
This is a validly encoded Bitcoin mainnet address.
Do NOT send funds to this example: its private key is printed above, so anyone can spend from it.
2. Parse and Decode Real Transactions from the Blockchain
$ python bitcoin_cli.py decode-tx 0100000001c997a5e56e104102fa209c6a852dd90660a20b2d9c352423edce25857fcd3704000000...
TRANSACTION DECODED:
════════════════════════════════════════════════════════
Version: 1
Locktime: 0
INPUTS (1):
[0] Previous TX: 0437cd7f8525cede2e4a0b40b1f6e3c7...
Output Index: 0
ScriptSig: 47304402204e45e16932b8af514961...
Sequence: 0xffffffff
OUTPUTS (2):
[0] Value: 0.10000000 BTC (10,000,000 satoshis)
ScriptPubKey: OP_DUP OP_HASH160 <pubkeyhash> OP_EQUALVERIFY OP_CHECKSIG
Type: P2PKH (Pay to Public Key Hash)
Address: 1runeksijzfVxyrpiyCY2LCBvYsSi1Ai6
[1] Value: 0.08950000 BTC (Change)
ScriptPubKey: OP_DUP OP_HASH160 <pubkeyhash> OP_EQUALVERIFY OP_CHECKSIG
Address: 1QJtPTVJjkqVALLzCLF9kCLJYN4C4GzK2c
Transaction ID: e4c226432e...
Size: 225 bytes
════════════════════════════════════════════════════════
3. Create and Sign Your Own Transactions
$ python bitcoin_cli.py create-tx \
--from-utxo "tx_id:0" \
--to "1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2:0.001" \
--change "1MyAddress..." \
--private-key "your_wif_key"
SIGNED TRANSACTION:
Raw Hex: 0100000001eccf7e3034189b851985d871f91384b8ee357cd47c3024736e...
Breakdown:
- Input signed with ECDSA (your implementation!)
- Signature verified: ✓ VALID
- Ready to broadcast to network
You can paste this hex into any block explorer's "broadcast" feature
4. Validate Real Blocks from the Bitcoin Network
$ python bitcoin_cli.py validate-block 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
BLOCK 0 (Genesis Block) VALIDATION:
════════════════════════════════════════════════════════
Header Hash: 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
^^^^^^^^ (Leading zeros = Proof of Work!)
Version: 1
Previous Block: 0000000000000000000000000000000000000000000000000000000000000000
Merkle Root: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
Timestamp: 2009-01-03 18:15:05 UTC
Difficulty Bits: 0x1d00ffff
Nonce: 2083236893
VALIDATION RESULTS:
✓ Block hash below target (PoW valid)
✓ Merkle root matches transactions
✓ Coinbase transaction present
✓ Block structure valid
"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"
- Satoshi's message embedded in the coinbase!
════════════════════════════════════════════════════════
5. Execute Bitcoin Script Programs
$ python bitcoin_cli.py run-script "OP_2 OP_3 OP_ADD OP_5 OP_EQUAL"
SCRIPT EXECUTION TRACE:
════════════════════════════════════════════════════════
Step 1: OP_2 Stack: [2]
Step 2: OP_3 Stack: [2, 3]
Step 3: OP_ADD Stack: [5] (popped 2 and 3, pushed 5)
Step 4: OP_5 Stack: [5, 5]
Step 5: OP_EQUAL Stack: [1] (1 = TRUE)
RESULT: ✓ Script executed successfully (stack top is truthy)
════════════════════════════════════════════════════════
The Core Question You’re Answering
“How does Bitcoin actually work at the byte level? How does a private key become an address? How does a transaction prove ownership without revealing the private key?”
Before you write any code, sit with these questions. Most developers know “Bitcoin uses cryptography” but can’t explain how a 256-bit random number becomes a valid Bitcoin address, or why you can prove you own coins without ever revealing your private key.
The answer involves:
- Finite field arithmetic (modular math in a prime field)
- Elliptic curve point multiplication (how a number becomes a curve point)
- Chained hash functions (SHA-256, then RIPEMD-160: how addresses are derived)
- Digital signature math (how ECDSA proves knowledge without revelation)
Concepts You Must Understand First
Stop and research these before coding:
- Finite Fields (Modular Arithmetic)
- What does “mod p” mean and why is it useful for cryptography?
- What is a “field” in abstract algebra? Why does Bitcoin use prime fields?
- How do you compute modular inverses? (Extended Euclidean Algorithm)
- Why does Fermat’s Little Theorem give us a^(p-1) ≡ 1 (mod p)?
- Book Reference: “Programming Bitcoin” Ch. 1 - Jimmy Song
- Elliptic Curves over Finite Fields
- What is an elliptic curve equation? (y² = x³ + ax + b)
- What does “point addition” mean geometrically?
- What is the “point at infinity” and why do we need it?
- What makes secp256k1 special? (y² = x³ + 7)
- Book Reference: “Programming Bitcoin” Ch. 2-3 - Jimmy Song
- Digital Signatures (ECDSA)
- Why can signing prove ownership without revealing the private key?
- What is a “nonce” and why MUST it never be reused? (Sony PlayStation 3 hack!)
- What do r and s in a signature represent?
- How does verification work using only the public key?
- Book Reference: “Programming Bitcoin” Ch. 4 - Jimmy Song
- Transaction Structure (UTXOs)
- What is a UTXO (Unspent Transaction Output)?
- Why does Bitcoin use inputs/outputs rather than account balances?
- What is a “locking script” (scriptPubKey) vs “unlocking script” (scriptSig)?
- How does OP_CHECKSIG actually verify a signature?
- Book Reference: “Mastering Bitcoin” Ch. 6 - Andreas Antonopoulos
- Bitcoin Script
- What is a stack-based language?
- Why did Satoshi make Script intentionally NOT Turing-complete?
- What are the most important opcodes: OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG?
- How does P2PKH (Pay-to-Public-Key-Hash) work?
- Book Reference: “Programming Bitcoin” Ch. 6 - Jimmy Song
- Block Structure and Merkle Trees
- What is in a block header? (version, prev_hash, merkle_root, timestamp, bits, nonce)
- How does the Merkle root commit to all transactions?
- What is the “difficulty target” and how is it encoded in 4 bytes?
- Why does finding a valid nonce take billions of attempts?
- Book Reference: “Programming Bitcoin” Ch. 9 - Jimmy Song; “Mastering Bitcoin” Ch. 11 - Antonopoulos
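The modular-inverse question in the finite fields item can be made concrete with a short sketch. It compares the extended Euclidean algorithm with the Fermat shortcut on a small prime. The function names and the prime 223 are illustrative choices for this example, not part of any library.

```python
def egcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y


def inverse_euclid(a, p):
    """Modular inverse via the extended Euclidean algorithm."""
    g, x, _ = egcd(a % p, p)
    if g != 1:
        raise ValueError("a has no inverse modulo p")
    return x % p


def inverse_fermat(a, p):
    """Modular inverse via Fermat's Little Theorem; p must be prime."""
    return pow(a, p - 2, p)


p = 223  # small prime chosen for illustration
for a in (2, 17, 175, 222):
    inv = inverse_fermat(a, p)
    assert inv == inverse_euclid(a, p)  # both methods agree
    assert (a * inv) % p == 1           # a * a^(-1) == 1 in the field
```

The Fermat version is a one-liner because Python's three-argument pow does fast modular exponentiation, which is why it is a common choice for a first FieldElement implementation.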
Questions to Guide Your Design
Before implementing, think through these:
- Finite Field Class
- How will you represent a finite field element?
- How do you ensure all operations stay within the field (mod p)?
- What’s the most efficient way to compute modular exponentiation?
- How will you handle division (modular inverse)?
- Elliptic Curve Point Class
- How do you represent the “point at infinity”?
- How do you handle the special case where two points have the same x-coordinate?
- How do you efficiently compute k × G for large k? (Hint: double-and-add)
- How will you distinguish between compressed and uncompressed public keys?
- Transaction Serialization
- Why does Bitcoin use little-endian for some fields and big-endian for others?
- How do you parse variable-length integers (varints)?
- What is the “signature hash” and why is it different from the transaction hash?
- How do you handle witness data (SegWit)?
- Script Interpreter
- How will you represent opcodes?
- How do you handle conditional operators (OP_IF, OP_ELSE)?
- What should happen when a script fails?
- How do you handle OP_CHECKMULTISIG’s famous off-by-one bug?
- Block Validation
- How do you check if a block hash meets the difficulty target?
- How do you verify the Merkle root matches the transactions?
- What order should you validate things in for efficiency?
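For the varint question under Transaction Serialization, here is a minimal sketch of Bitcoin's variable-length integer format: a single byte for values below 0xfd, otherwise a marker byte followed by a 2-, 4-, or 8-byte little-endian integer. The function names read_varint and encode_varint are assumptions made for this example.

```python
from io import BytesIO


def read_varint(stream):
    """Read one variable-length integer from a binary stream."""
    prefix = stream.read(1)[0]
    if prefix == 0xFD:
        return int.from_bytes(stream.read(2), "little")
    if prefix == 0xFE:
        return int.from_bytes(stream.read(4), "little")
    if prefix == 0xFF:
        return int.from_bytes(stream.read(8), "little")
    return prefix  # values below 0xfd are stored directly in one byte


def encode_varint(n):
    """Encode an integer using the shortest varint form."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    if n <= 0xFFFFFFFFFFFFFFFF:
        return b"\xff" + n.to_bytes(8, "little")
    raise ValueError("integer too large for a varint")


# Round-trip a few boundary values
for value in (0, 1, 252, 253, 65535, 65536, 2**32, 2**64 - 1):
    assert read_varint(BytesIO(encode_varint(value))) == value
```

Reading from a stream rather than slicing a bytes object keeps the parser position implicit, which tends to simplify the transaction parser that calls it.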
Thinking Exercise
Before coding, work through this on paper:
Exercise 1: Trace Key Derivation
Private Key (random 256-bit number):
k = 0x1 (simplest example)
Generator Point G on secp256k1:
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
Public Key P = k × G = 1 × G = G (when k=1, P equals G)
Now trace what happens:
1. Serialize P as compressed (33 bytes) or uncompressed (65 bytes)
2. SHA256(serialized_P) → 32 bytes
3. RIPEMD160(sha256_result) → 20 bytes (the "hash160")
4. Prepend version byte (0x00 for mainnet)
5. SHA256(SHA256(version + hash160)) → take first 4 bytes as checksum
6. Base58Check encode (version + hash160 + checksum)
7. Result: Bitcoin address!
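Steps 4 to 6 of the trace can be sketched in a few lines once you have a hash160. The HASH160 constant below is assumed to be the result of steps 1 to 3 for the compressed form of G; treat it as a placeholder and substitute the value your own code produces. The helper names are hypothetical.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Assumption: hash160 of the compressed public key for k = 1.
HASH160 = bytes.fromhex("751e76e8199196d454941c45d1b3a323f1433bd6")


def hash256(data):
    """Double SHA-256, used for the Base58Check checksum."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def base58_encode(data):
    """Encode bytes as Base58; each leading zero byte becomes a '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def base58_decode(text):
    """Inverse of base58_encode, handy for checking your work."""
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def base58check_encode(version, payload):
    """Steps 4-6: prepend version, append 4-byte checksum, Base58 encode."""
    body = version + payload
    return base58_encode(body + hash256(body)[:4])


address = base58check_encode(b"\x00", HASH160)
print(address)
```

The mainnet version byte 0x00 is what makes every P2PKH address start with the character 1.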
Exercise 2: Trace a Transaction
Given these UTXOs you control:
- UTXO 1: 0.5 BTC from tx_aaa, output 0
- UTXO 2: 0.3 BTC from tx_bbb, output 1
You want to send 0.6 BTC to address 1Bob...
Fee: 0.001 BTC
Work out:
1. Which UTXOs do you need to spend?
2. What is the change amount?
3. What does the raw transaction look like (draw the structure)?
4. What exactly gets signed? (The signature hash)
5. Where does the signature go in the final transaction?
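Once you have worked questions 1 and 2 by hand, a small script can check the arithmetic. This is a sketch using a naive largest-first selection, which is only one of many possible coin-selection strategies; amounts are kept as integer satoshis to avoid floating-point rounding.

```python
SATS_PER_BTC = 100_000_000

# (txid, output index, value in satoshis) from the exercise
utxos = [("tx_aaa", 0, 50_000_000), ("tx_bbb", 1, 30_000_000)]
send_amount = 60_000_000  # 0.6 BTC
fee = 100_000             # 0.001 BTC

selected, total = [], 0
for utxo in sorted(utxos, key=lambda u: u[2], reverse=True):
    if total >= send_amount + fee:
        break
    selected.append(utxo)
    total += utxo[2]

if total < send_amount + fee:
    raise ValueError("insufficient funds")

# Whatever is not sent or paid as fee must come back as a change output,
# otherwise the miner collects it.
change = total - send_amount - fee
print(f"inputs: {len(selected)}, change: {change / SATS_PER_BTC:.8f} BTC")
```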
The Interview Questions They’ll Ask
Prepare to answer these:
- “Walk me through what happens when you send a Bitcoin transaction.”
- “How does ECDSA prove you own a private key without revealing it?”
- “What is a UTXO and why did Satoshi choose this model over account balances?”
- “What makes Bitcoin Script different from a Turing-complete language?”
- “How does proof-of-work prevent double-spending?”
- “What would happen if someone reused a nonce in two different signatures?”
- “How does a Merkle tree allow light clients to verify transactions?”
- “What’s the difference between a transaction ID and the data that gets signed?”
- “Why does Bitcoin have both P2PKH and P2SH? What problem does P2SH solve?”
- “How is the difficulty target encoded in 4 bytes?”
Hints in Layers
Hint 1: Start with Finite Fields Build and test your finite field arithmetic first:
class FieldElement:
    def __init__(self, num, prime):
        self.num = num % prime
        self.prime = prime

    def __add__(self, other):
        return FieldElement((self.num + other.num) % self.prime, self.prime)

    def __pow__(self, exponent):
        # Fermat's Little Theorem for negative exponents
        n = exponent % (self.prime - 1)
        return FieldElement(pow(self.num, n, self.prime), self.prime)
Test: Verify that a * a^(-1) = 1 for various values.
Hint 2: Point Addition Has Edge Cases The formula for adding two points depends on whether:
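- Points are the same (point doubling, which uses the tangent line instead of a chord)
- Points have the same x-coordinate but opposite y-coordinates (the line is vertical, so the result is the point at infinity)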
- One point is the point at infinity
Draw the geometric picture before coding!
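The edge cases can be sketched on a toy curve before you move to secp256k1. The example below uses y² = x³ + 7 over the small prime field F_223, an assumption chosen so the numbers stay readable, and represents the point at infinity as None.

```python
P_MOD = 223   # toy prime; secp256k1 uses a 256-bit prime instead
A, B = 0, 7   # curve: y^2 = x^3 + A*x + B
INF = None    # the point at infinity (identity element)


def on_curve(pt):
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x ** 3 + A * x + B)) % P_MOD == 0


def inv(n):
    """Modular inverse via Fermat's Little Theorem."""
    return pow(n % P_MOD, P_MOD - 2, P_MOD)


def add(p1, p2):
    # Case 1: one operand is the identity
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    # Case 2: vertical line (P + (-P), or doubling a point with y == 0)
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p1 == p2:
        # Case 3: doubling, slope of the tangent
        s = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD
    else:
        # Case 4: distinct points, slope of the chord
        s = (y2 - y1) * inv(x2 - x1) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


P, Q = (192, 105), (17, 56)
assert on_curve(P) and on_curve(Q)
assert add(P, Q) == add(Q, P)                  # commutative
assert add(P, INF) == P                        # identity
assert add(P, (192, P_MOD - 105)) is INF       # P + (-P)
assert on_curve(add(P, P))                     # doubling stays on the curve
```

Checking the vertical-line case before the doubling case avoids a division by zero when a point with y = 0 is doubled.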
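Hint 3: Signature Hash Is Not Transaction Hash When signing, you don’t sign the serialized transaction as-is. You sign the double-SHA-256 hash of a modified copy in which the scriptSig of the input being signed is replaced with the previous output’s scriptPubKey, the other inputs’ scriptSigs are emptied, and the 4-byte sighash type is appended. This tripped up many early implementers.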
Hint 4: Test Against Real Data The book “Programming Bitcoin” includes test vectors from the real Bitcoin network. Use them! If your transaction parser can’t decode real transactions, something is wrong.
Hint 5: Use the Debug Flag Add verbose logging to your Script interpreter:
OP_DUP Stack: [pubkey] → [pubkey, pubkey]
OP_HASH160 Stack: [pubkey, pubkey] → [pubkey, hash160(pubkey)]
This makes debugging script execution much easier.
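A verbose trace like this can be produced by a very small interpreter. The sketch below supports only a handful of opcodes and stores stack items as Python integers, which is a simplification: real Script stack items are byte strings. The function name run_script and its verbose flag are assumptions for this example.

```python
def run_script(script, verbose=True):
    """Execute a space-separated script and report whether it succeeds."""
    stack = []
    for step, op in enumerate(script.split(), start=1):
        before = list(stack)
        if op.startswith("OP_") and op[3:].isdigit():
            stack.append(int(op[3:]))      # small-integer push opcodes
        elif op == "OP_DUP":
            stack.append(stack[-1])
        elif op == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
        else:
            raise ValueError(f"unsupported opcode: {op}")
        if verbose:
            print(f"Step {step}: {op:<10} Stack: {before} → {stack}")
    # A script succeeds when the stack is non-empty and its top is truthy
    return bool(stack) and stack[-1] != 0


ok = run_script("OP_2 OP_3 OP_ADD OP_5 OP_EQUAL")
print("RESULT:", "valid" if ok else "invalid")
```

Capturing the stack before each opcode is what makes the before and after columns cheap to print.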
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Complete Bitcoin implementation | Programming Bitcoin by Jimmy Song | Full book (follow along!) |
| Bitcoin transaction concepts | Mastering Bitcoin, 3rd Edition by Andreas Antonopoulos | Ch. 6: “Transactions” |
| Block structure and mining | Mastering Bitcoin, 3rd Edition | Ch. 10-11: “Mining” and “The Blockchain” |
| Elliptic curve cryptography math | Serious Cryptography, 2nd Edition by Jean-Philippe Aumasson | Ch. 11-12: “Public-Key Cryptography” |
| Hash function properties | Serious Cryptography, 2nd Edition | Ch. 6: “Hash Functions” |
| Bitcoin whitepaper context | The Book of Satoshi by Phil Champagne | Historical context and design decisions |
| Number theory foundations | An Introduction to Mathematical Cryptography by Hoffstein, Pipher, Silverman | Ch. 1-2: Modular arithmetic |
Common Pitfalls & Debugging
Building Bitcoin from scratch is complex. Here are the most common issues and how to solve them:
Problem 1: “My elliptic curve point addition doesn’t match reference implementations”
- Why: You’re likely not handling the “point at infinity” correctly, or missing modular inverse calculations
- Fix: The point at infinity (O) is the identity element. A + O = A. Also ensure you’re using Fermat’s Little Theorem for modular inverses: a^(-1) ≡ a^(p-2) (mod p)
- Quick test:
# Point addition should be commutative
assert P + Q == Q + P
# Identity element test
assert P + O == P
Problem 2: “My signatures verify correctly but Bitcoin Core rejects them”
- Why: Bitcoin uses DER encoding for signatures, and Bitcoin Core’s standardness policy requires a low S value (a rule first proposed in BIP 62)
- Fix: Ensure the s-value satisfies s <= curve_order / 2. If not, replace it with curve_order - s
- Quick test: Verify against test vectors from the Bitcoin test suite
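The low-S fix is a one-line normalization. The sketch below assumes you already have the integer s from your signing code; N is the order of the secp256k1 group, and normalize_s is a hypothetical helper name.

```python
# Order of the secp256k1 group (number of points generated by G)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def normalize_s(s):
    """Return the low-S form of a signature's s value."""
    if not 0 < s < N:
        raise ValueError("s must be in the range 1..N-1")
    # (r, s) and (r, N - s) both verify, so pick the smaller of the two
    return N - s if s > N // 2 else s


assert normalize_s(1) == 1
assert normalize_s(N - 1) == 1
assert normalize_s(N // 2) == N // 2
```

Apply the normalization before DER-encoding the signature, since the encoding depends on the final value of s.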
Problem 3: “Transaction hashing gives wrong hash”
- Why: Bitcoin uses double SHA-256 and a specific serialization order
- Fix: Hash = SHA256(SHA256(data)). Ensure fields are serialized in exact order: version, inputs, outputs, locktime. Block explorers and bitcoin-cli display the txid with its bytes reversed, so reverse your digest before comparing
- Debug command:
# Compare your serialization with real tx
bitcoin-cli getrawtransaction <txid> | xxd -r -p | xxd
Problem 4: “Script execution fails on OP_CHECKSIG”
- Why: The signature hash (sighash) calculation is intricate—you must remove scriptSig, replace with scriptPubKey, and append sighash type
- Fix: Follow the exact sighash algorithm from Bitcoin Wiki. The devil is in serialization details.
- Quick test:
# Test with a known good transaction
# from block 170 (the first person-to-person Bitcoin transaction)
test_tx_hash = "f4184fc596403b9d638783cf57adfe4c75c605f6356fbc91338530e9831e9e16"
Problem 5: “Merkle root doesn’t match block’s merkle_root field”
- Why: Merkle tree construction requires double SHA-256 at each level, and if odd number of nodes, duplicate the last one
- Fix: Implementation:
while len(hashes) > 1:
    if len(hashes) % 2 == 1:
        hashes.append(hashes[-1])  # Duplicate last hash
    hashes = [hash256(h1 + h2) for h1, h2 in zip(hashes[::2], hashes[1::2])]
- Quick test: Validate against genesis block (known merkle root)
Problem 6: “Block validation passes but hash doesn’t have enough leading zeros”
- Why: Confusing bits/target with difficulty. Bitcoin uses a compact “bits” representation
- Fix: Convert bits to target:
def bits_to_target(bits):
    exponent = bits >> 24
    coefficient = bits & 0xffffff
    return coefficient * (256 ** (exponent - 3))
- Quick test: Genesis block has bits=0x1d00ffff, which should yield a target with 8 leading zero hex digits (4 zero bytes) when written as a 32-byte value
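To see the target comparison end to end, the sketch below rebuilds the genesis block header from the field values shown in the validation output earlier in this project and checks its hash against the target. The variable names are illustrative; the 80-byte layout (version, previous hash, Merkle root, time, bits, nonce, with integers little-endian and hashes byte-reversed) is the standard header serialization.

```python
import hashlib
import struct
from datetime import datetime, timezone


def hash256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def bits_to_target(bits):
    exponent = bits >> 24
    coefficient = bits & 0xFFFFFF
    return coefficient * 256 ** (exponent - 3)


version = 1
prev_block = bytes(32)  # genesis has no parent
merkle_root = bytes.fromhex(
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
)
timestamp = int(datetime(2009, 1, 3, 18, 15, 5, tzinfo=timezone.utc).timestamp())
bits = 0x1D00FFFF
nonce = 2083236893

# Hashes are displayed byte-reversed, so flip them back for serialization
header = (
    struct.pack("<L", version)
    + prev_block[::-1]
    + merkle_root[::-1]
    + struct.pack("<LLL", timestamp, bits, nonce)
)
assert len(header) == 80

digest = hash256(header)
block_hash = digest[::-1].hex()          # display form
target = bits_to_target(bits)

print("hash:  ", block_hash)
print("target:", f"{target:064x}")
print("PoW valid:", int.from_bytes(digest, "little") < target)
```

Interpreting the raw digest as a little-endian integer is equivalent to reading the displayed hash as a big-endian hex number, which is why the leading zeros in the display form indicate a small value.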
Problem 7: “Getting ‘invalid signature’ errors but math seems correct”
- Why: The nonce (k value) in ECDSA must be secret, unpredictable, and unique per signature. Never reuse it!
- Fix: Use RFC 6979 deterministic k-generation (derived from the message hash and the private key)
- Security warning: Reusing k reveals your private key! (This is how the Sony PS3 signing key was recovered)
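The algebra behind that warning fits in a few lines. In ECDSA, s = k⁻¹(z + r·d) mod n, so two signatures sharing the same k (and therefore the same r) give two equations in two unknowns. The sketch below demonstrates only the algebra: r is an arbitrary stand-in for the x-coordinate of k·G, and all numeric values are made up for illustration.

```python
# Order of the secp256k1 group
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def inv(x):
    """Modular inverse mod N (N is prime, so Fermat applies)."""
    return pow(x % N, N - 2, N)


def sign_s(d, z, k, r):
    """The s half of an ECDSA signature: s = k^-1 * (z + r*d) mod N."""
    return inv(k) * (z + r * d) % N


d = 0x1E99423A4ED27608A15A2616A2B0E9E52CED330AC530EDCC32C8FFC6A526AEDD  # victim key
k = 0x00C0FFEE00C0FFEE00C0FFEE00C0FFEE   # the nonce that gets reused
r = 0x5555AAAA5555AAAA5555AAAA5555AAAA   # stand-in for (k*G).x mod N
z1 = 0x1111111111111111                  # hash of message 1
z2 = 0x2222222222222222                  # hash of message 2

s1 = sign_s(d, z1, k, r)
s2 = sign_s(d, z2, k, r)

# Attacker sees only (r, s1, z1) and (r, s2, z2):
k_recovered = (z1 - z2) * inv(s1 - s2) % N
d_recovered = (s1 * k_recovered - z1) * inv(r) % N

assert k_recovered == k
assert d_recovered == d
```

Subtracting the two signature equations eliminates the private key, which is why a single repeated nonce is enough to recover it.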
Problem 8: “Python is too slow for mining”
- Why: Mining requires millions of hash operations. Pure Python is ~100x slower than C
- Fix: Either (1) lower difficulty for testing, or (2) Use hashlib (C implementation), or (3) Rewrite mining in C/Rust
- Alternative: Focus on validation, not mining. Use testnet blocks for validation tests.
Debugging Strategy:
- Start with test vectors: Bitcoin has extensive test data. Validate each component against known inputs/outputs
- Compare byte-by-byte: When your output differs, hexdump both and find first differing byte
- Use Bitcoin Core as oracle: You can use bitcoin-cli to verify your results
- Build incrementally: Don’t try to validate a full block on day one. Start with: hash → signature → transaction → block
Recommended debugging tools:
- bitcoin-cli - Query real blockchain data
- xxd or hexyl - Hex dump to compare binary data
- python -m pdb - Step through your code
- bitcoin.stackexchange.com - Ask specific technical questions
Project 2: Build a Minimal Blockchain in a Weekend
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Programming Language: C
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Blockchain / Consensus
- Software or Tool: Blockchain concepts
- Main Book: “Mastering Bitcoin” by Andreas Antonopoulos
What you’ll build: A simple blockchain with proof-of-work consensus, transaction pool, and P2P gossip—the essential skeleton that makes all blockchains tick.
Why it teaches blockchain: Before diving into Bitcoin/Ethereum complexity, you need the core mental model: blocks chain together via hashes, nodes gossip transactions, and PoW makes forgery expensive. This project isolates those fundamentals.
Core challenges you’ll face:
- Chain integrity via hashes → Maps to why tampering is detectable
- Difficulty adjustment → Maps to why block times stay consistent
- Fork resolution → Maps to why “longest chain wins”
- Transaction validation → Maps to preventing double-spends
- Gossip protocol → Maps to how decentralization works
Key Concepts:
- Hash Functions: “Serious Cryptography, 2nd Edition” Ch. 6 - Jean-Philippe Aumasson
- Distributed Consensus: “Designing Data-Intensive Applications” Ch. 8-9 - Martin Kleppmann
- P2P Networking: “Computer Networks” Ch. 5 - Tanenbaum & Wetherall
Difficulty: Intermediate
Time estimate: Weekend to 1 week
Prerequisites: Any programming language, basic networking concepts
Learning milestones:
- Single node mines blocks - You understand PoW mechanics
- Two nodes sync chains - You understand gossip and fork resolution
- Transactions propagate and confirm - You understand the mempool and block inclusion
Real World Outcome
When you complete this project, you’ll have a working distributed blockchain running across multiple terminals (or machines). Here’s exactly what you’ll see:
1. Start Your First Node (Terminal 1)
$ ./blockchain-node --port 3000 --mine
╔═══════════════════════════════════════════════════════════════════╗
║ MINIMAL BLOCKCHAIN NODE v1.0 ║
║ Listening on port 3000 ║
╚═══════════════════════════════════════════════════════════════════╝
[2024-12-22 14:30:01] Genesis block created
Hash: 0000a1b2c3d4e5f6...
Difficulty: 4 (4 leading zeros required)
[2024-12-22 14:30:01] Starting mining thread...
[2024-12-22 14:30:01] Mining block 1...
[2024-12-22 14:30:03] Nonce attempt: 1000000
[2024-12-22 14:30:05] Nonce attempt: 2000000
[2024-12-22 14:30:08] ✓ BLOCK MINED!
Block #1
Hash: 0000f8e7d6c5b4a3...
Nonce: 2847291
Transactions: 1 (coinbase only)
Mining took: 7.2 seconds
[2024-12-22 14:30:08] Mining block 2...
2. Start a Second Node and Watch Synchronization (Terminal 2)
$ ./blockchain-node --port 3001 --peer localhost:3000
╔═══════════════════════════════════════════════════════════════════╗
║ MINIMAL BLOCKCHAIN NODE v1.0 ║
║ Listening on port 3001 ║
╚═══════════════════════════════════════════════════════════════════╝
[2024-12-22 14:31:00] Connecting to peer: localhost:3000
[2024-12-22 14:31:00] → Sending: CHAIN_REQUEST
[2024-12-22 14:31:00] ← Received: CHAIN_RESPONSE (3 blocks)
[2024-12-22 14:31:00] ✓ Synchronized with peer
Local chain: 3 blocks
Peer chain: 3 blocks
[2024-12-22 14:31:00] CURRENT CHAIN STATE:
┌─────────────────────────────────────────────────────────────────┐
│ Block 0 (Genesis) │
│ Hash: 0000a1b2c3d4e5f6789... │
│ Prev: 0000000000000000000... │
│ Txns: 0 │
├─────────────────────────────────────────────────────────────────┤
│ Block 1 │
│ Hash: 0000f8e7d6c5b4a3210... │
│ Prev: 0000a1b2c3d4e5f6789... │
│ Txns: 1 │
├─────────────────────────────────────────────────────────────────┤
│ Block 2 │
│ Hash: 00003c4d5e6f7a8b9c0... │
│ Prev: 0000f8e7d6c5b4a3210... │
│ Txns: 1 │
└─────────────────────────────────────────────────────────────────┘
3. Submit a Transaction and Watch It Propagate
$ ./blockchain-cli --node localhost:3001 send --from alice --to bob --amount 50
TRANSACTION SUBMITTED:
════════════════════════════════════════════════════════════════════
TX ID: tx_7f8a9b0c1d2e3f4a5b6c...
From: alice
To: bob
Amount: 50 coins
[2024-12-22 14:32:00] → Broadcasting to 2 connected peers...
[2024-12-22 14:32:00] ✓ Peer localhost:3000 acknowledged
[2024-12-22 14:32:00] ✓ Peer localhost:3002 acknowledged
Status: IN MEMPOOL (waiting for block inclusion)
════════════════════════════════════════════════════════════════════
# On Node 1 (Terminal 1), you'll see:
[2024-12-22 14:32:00] ← Received TX: tx_7f8a9b0c... (alice → bob: 50)
[2024-12-22 14:32:00] ✓ TX validated and added to mempool
[2024-12-22 14:32:00] Mempool size: 1 transaction
# When the block is mined:
[2024-12-22 14:32:15] ✓ BLOCK MINED!
Block #4
Hash: 0000abc123def456...
Transactions: 2 (1 coinbase + 1 user tx)
Including: tx_7f8a9b0c... (alice → bob: 50)
# On all nodes:
[2024-12-22 14:32:15] ← Received BLOCK: #4 from peer
[2024-12-22 14:32:15] ✓ Block validated and added to chain
[2024-12-22 14:32:15] TX tx_7f8a9b0c... now has 1 confirmation
4. Watch a Fork Happen and Resolve
# Start two miners, disconnect them, let one mine 2 blocks and the other 3,
# then reconnect and watch the chain with less work get abandoned:
[2024-12-22 14:35:00] ⚠ FORK DETECTED!
Local chain: blocks 4a → 5a (total work: 12847291)
Remote chain: blocks 4b → 5b → 6b (total work: 19283746)
[2024-12-22 14:35:00] Remote chain has more work. REORGANIZING...
[2024-12-22 14:35:00] ✗ Orphaning block 5a (hash: 0000xyz...)
[2024-12-22 14:35:00] ✗ Orphaning block 4a (hash: 0000uvw...)
[2024-12-22 14:35:00] ✓ Adopting block 4b (hash: 0000rst...)
[2024-12-22 14:35:00] ✓ Adopting block 5b (hash: 0000opq...)
[2024-12-22 14:35:00] ✓ Adopting block 6b (hash: 0000lmn...)
[2024-12-22 14:35:00] Reorganization complete. Chain tip is now block 6b.
[2024-12-22 14:35:00] Returning 3 transactions from orphaned blocks to mempool.
5. Query Blockchain State
$ ./blockchain-cli --node localhost:3000 status
BLOCKCHAIN STATUS
════════════════════════════════════════════════════════════════════
Chain height: 12 blocks
Total difficulty: 48 (4 zeros × 12 blocks)
Chain work: 142,847,291 total hash attempts
Connected peers: 3
- localhost:3001 (height: 12)
- localhost:3002 (height: 12)
- 192.168.1.50:3000 (height: 12)
Mempool: 2 pending transactions
- tx_abc123... (alice → carol: 25)
- tx_def456... (bob → dave: 10)
ACCOUNT BALANCES:
alice: 425 coins (from mining + transfers)
bob: 50 coins
carol: 0 coins (25 pending)
miner1: 600 coins (block rewards)
════════════════════════════════════════════════════════════════════
The Core Question You’re Answering
“How do distributed nodes agree on a single version of truth without a central authority? Why can’t someone just create a fake blockchain?”
This is the fundamental question that Bitcoin solved. Before you write any code, understand:
- Why does hashing create “chains”? (Each block commits to all previous blocks)
- Why does Proof of Work create “irreversibility”? (Rewriting requires redoing all work)
- Why does “longest chain wins” create consensus? (Honest majority outpaces attackers)
- Why does gossip create “decentralization”? (No single point of failure)
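The first of these questions is easy to demonstrate in code. The following is a Python sketch (used here for brevity, even though the project itself targets C) of a hash-linked list in which editing one block is detected. The block layout and function names are assumptions for illustration.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64


def block_hash(block):
    """Hash the fields that define a block (everything except its own hash)."""
    fields = {k: block[k] for k in ("index", "prev_hash", "data")}
    return hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()


def build_chain(items):
    chain, prev = [], GENESIS_PREV
    for i, data in enumerate(items):
        block = {"index": i, "prev_hash": prev, "data": data}
        block["hash"] = block_hash(block)
        chain.append(block)
        prev = block["hash"]
    return chain


def first_invalid(chain):
    """Return the index of the first block that fails validation, or None."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev or block["hash"] != block_hash(block):
            return block["index"]
        prev = block["hash"]
    return None


chain = build_chain(["genesis", "alice pays bob 50", "bob pays carol 10"])
assert first_invalid(chain) is None

# Tamper with block 1: its stored hash no longer matches its contents
chain[1]["data"] = "alice pays mallory 50"
assert first_invalid(chain) == 1

# Recompute block 1's hash: now block 2's prev_hash is stale instead
chain[1]["hash"] = block_hash(chain[1])
assert first_invalid(chain) == 2
```

An attacker therefore has to recompute every later block as well, and proof-of-work is what makes that recomputation expensive.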
Concepts You Must Understand First
Stop and research these before coding:
- Cryptographic Hash Functions
- What makes SHA-256 “secure”? (Preimage resistance, collision resistance)
- Why does changing one bit change the entire hash? (Avalanche effect)
- How do you verify data integrity with a hash?
- Book Reference: “Serious Cryptography, 2nd Edition” Ch. 6 - Jean-Philippe Aumasson
- Linked Data Structures via Hashes
- How does including the previous hash in each block create a “chain”?
- Why can’t you modify a block in the middle without invalidating everything after?
- What is a “hash pointer” and how is it different from a regular pointer?
- Book Reference: “Mastering Bitcoin” Ch. 11 - Andreas Antonopoulos
- Proof of Work
- What is a “target” and what does it mean for a hash to be “below” it?
- Why is finding a valid nonce hard but verifying it easy?
- How does difficulty adjustment maintain consistent block times?
- What is a “nonce” and why does incrementing it change the hash?
- Book Reference: “Programming Bitcoin” Ch. 9 - Jimmy Song
- Consensus and Fork Resolution
- What happens when two miners find valid blocks at the same time?
- Why does “longest chain” (or “most work”) win?
- What is “reorganization” and when does it happen?
- What is the difference between a soft fork and a hard fork?
- Book Reference: “Designing Data-Intensive Applications” Ch. 9 - Martin Kleppmann
- P2P Networking and Gossip
- How do nodes discover each other without a central server?
- What is a “gossip protocol” and why is it resilient?
- How do you prevent message loops in a mesh network?
- What is the difference between push and pull gossip?
- Book Reference: “Computer Networks” Ch. 5 - Tanenbaum & Wetherall
- Transaction Pools (Mempools)
- What is a mempool and why is it needed?
- How do you validate a transaction before including it?
- How do miners choose which transactions to include?
- What happens to transactions in orphaned blocks?
- Book Reference: “Mastering Bitcoin” Ch. 6 and 10 - Antonopoulos
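For the message-loop question under P2P Networking and Gossip, one common approach is for each node to remember the IDs of messages it has already seen and drop repeats. The Python sketch below simulates this in a single process, with direct method calls standing in for sockets; the Node class is hypothetical.

```python
import hashlib


class Node:
    def __init__(self, name):
        self.name = name
        self.peers = []
        self.seen = set()      # IDs of messages already processed
        self.deliveries = 0    # how many distinct messages were accepted

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def receive(self, payload, sender=None):
        msg_id = hashlib.sha256(payload).hexdigest()
        if msg_id in self.seen:
            return             # duplicate: drop it, which breaks the loop
        self.seen.add(msg_id)
        self.deliveries += 1
        for peer in self.peers:
            if peer is not sender:
                peer.receive(payload, sender=self)


# A fully connected triangle contains a cycle: a - b - c - a
a, b, c = Node("a"), Node("b"), Node("c")
a.connect(b)
b.connect(c)
c.connect(a)

a.receive(b"alice -> bob: 50")
assert [n.deliveries for n in (a, b, c)] == [1, 1, 1]
```

Without the seen set, the message would circulate around the triangle indefinitely. Using the hash of the payload as the ID means nodes do not need to coordinate on sequence numbers.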
Questions to Guide Your Design
Before implementing, think through these:
- Block Structure
- What fields must a block have? (index, timestamp, transactions, prev_hash, nonce, hash)
- How will you serialize a block for hashing?
- How will you store blocks? (In-memory array? File? Database?)
- How will you handle the genesis block (no previous hash)?
- Proof of Work Mining
- How will you represent the difficulty target?
- How will you increment the nonce?
- Should mining run in a separate thread?
- How do you stop mining when you receive a valid block from a peer?
- Networking
- What message types do you need? (NEW_BLOCK, NEW_TX, GET_CHAIN, CHAIN_RESPONSE, etc.)
- How will you serialize messages? (JSON? Binary? Custom?)
- How will you handle partial reads from sockets?
- How will you prevent infinite message forwarding?
- Transaction Validation
- How will you track account balances? (UTXO vs account model)
- How will you prevent double-spending?
- What happens if a transaction references a balance from an unconfirmed transaction?
- How will you handle coinbase (block reward) transactions?
- Fork Resolution
- How will you compare two chains to decide which is “better”?
- What do you do when you receive a block that doesn’t extend your chain?
- How will you request missing blocks from peers?
- How will you return orphaned transactions to the mempool?
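The fork-resolution questions can be explored with a small sketch. It assumes each block records its difficulty as a count of leading hex zeros, estimates work as 16^difficulty expected hash attempts per block, and treats blocks as plain dictionaries. All of these are simplifications chosen for this Python example rather than part of the C project's required design.

```python
def chain_work(chain):
    """Expected number of hash attempts needed to produce this chain."""
    return sum(16 ** block["difficulty"] for block in chain)


def reorganize(local, remote):
    """Return (best_chain, transactions_to_return_to_mempool)."""
    if chain_work(remote) <= chain_work(local):
        return local, []       # keep what we have on ties or less work
    # Find the first height at which the two chains differ
    fork = 0
    while (fork < min(len(local), len(remote))
           and local[fork]["hash"] == remote[fork]["hash"]):
        fork += 1
    confirmed = {tx for block in remote[fork:] for tx in block["txs"]}
    # Orphaned transactions go back to the mempool unless the new chain
    # already includes them
    returned = [tx for block in local[fork:] for tx in block["txs"]
                if tx not in confirmed]
    return remote, returned


def blk(h, txs):
    return {"hash": h, "difficulty": 4, "txs": txs}


genesis = blk("g", [])
local = [genesis, blk("4a", ["tx1", "tx2"]), blk("5a", ["tx3"])]
remote = [genesis, blk("4b", ["tx1"]), blk("5b", []), blk("6b", [])]

best, returned = reorganize(local, remote)
assert best is remote
assert returned == ["tx2", "tx3"]
```

Comparing total work rather than block count matters once difficulty can change, because a shorter chain of harder blocks can represent more work than a longer chain of easy ones.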
Thinking Exercise
Before coding, trace this scenario on paper:
Scenario: Three nodes, one malicious
Time T0: All nodes have chain: [Block 0] → [Block 1] → [Block 2]
Block 2 contains: alice → bob: 50 coins
Time T1: Node A (malicious) disconnects from network
Node A starts mining an alternate Block 2':
Block 2': alice → mallory: 50 coins (double-spend attempt!)
Time T2: Node A mines Block 2' and Block 3' (2 blocks deep)
Meanwhile, honest network mines only Block 3
Time T3: Node A reconnects with chain: [B0] → [B1] → [B2'] → [B3']
Honest nodes have: [B0] → [B1] → [B2] → [B3]
Questions:
1. Which chain "wins"? Why?
2. What happens to the transaction "alice → bob: 50"?
3. What would Mallory need to accomplish this attack?
4. How does this relate to "6 confirmations" recommendation?
Draw the fork diagram and trace the resolution!
The Interview Questions They’ll Ask
Prepare to answer these:
- “What prevents someone from rewriting blockchain history?”
- “Why does Proof of Work use so much energy? Is there an alternative?”
- “What happens when two miners find a block at the same time?”
- “How does a new node synchronize with the network?”
- “What is a 51% attack and is it actually feasible?”
- “Why do Bitcoin transactions need multiple confirmations?”
- “How does gossip protocol handle network partitions?”
- “What’s the difference between your minimal blockchain and Bitcoin?”
- “How would you add transaction fees to your implementation?”
- “What are the trade-offs between short and long block times?”
Hints in Layers
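Hint 1: Start with a Single-Node Chain
Get this working first before any networking:
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Block {
    uint32_t index;
    uint32_t timestamp;
    char prev_hash[65]; // SHA-256 hex string (64 chars + NUL)
    char hash[65];
    uint32_t nonce;
    char data[1024]; // Simplified: just a string
} Block;

// sha256() is a placeholder for your SHA-256 library; it is assumed here
// to return a pointer to a 64-character hex string
char* calculate_hash(const Block* block) {
    char buffer[2048];
    // snprintf cannot overrun the buffer; %u matches the unsigned fields
    snprintf(buffer, sizeof(buffer), "%u%u%s%u%s",
             (unsigned)block->index, (unsigned)block->timestamp,
             block->prev_hash, (unsigned)block->nonce, block->data);
    return sha256(buffer); // You'll need a SHA-256 library
}

// Valid when the first `difficulty` hex digits are all '0'
bool is_valid_hash(const char* hash, int difficulty) {
    for (int i = 0; i < difficulty; i++) {
        if (hash[i] != '0') return false;
    }
    return true;
}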
Hint 2: Mining Loop Pattern
void mine_block(Block* block, int difficulty) {
    block->nonce = 0;
    while (true) {
        char* hash = calculate_hash(block);
        if (is_valid_hash(hash, difficulty)) {
            strcpy(block->hash, hash);
            return;
        }
        block->nonce++;
        if (block->nonce % 100000 == 0) {
            printf("Nonce: %u, Hash: %.16s...\n", (unsigned)block->nonce, hash);
        }
        // If your sha256() allocates its result, free it here on every iteration
    }
}
Hint 3: Simple TCP Message Protocol
Message format:
[4 bytes: message type] [4 bytes: payload length] [N bytes: payload]
Message types:
0x01: NEW_BLOCK
0x02: NEW_TRANSACTION
0x03: REQUEST_CHAIN
0x04: CHAIN_RESPONSE
0x05: PEER_ANNOUNCE
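The framing above can be sketched in C as follows. This is a minimal illustration, not a fixed wire format: it assumes both header fields travel in big-endian (network) order, caps payloads at a hypothetical `MAX_PAYLOAD`, and the names `write_header` and `try_parse_frame` are made up for this example. The parser works on whatever bytes have accumulated so far, so a partial read from the socket simply reports that more bytes are needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 8u
#define MAX_PAYLOAD (1u << 20) /* hypothetical 1 MiB cap to reject oversized frames */

/* Store a 32-bit value in big-endian order without relying on htonl(). */
static void put_u32_be(uint8_t *out, uint32_t v) {
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

static uint32_t get_u32_be(const uint8_t *in) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Write the 8-byte header: [message type][payload length]. */
void write_header(uint8_t out[HEADER_SIZE], uint32_t type, uint32_t payload_len) {
    put_u32_be(out, type);
    put_u32_be(out + 4, payload_len);
}

/*
 * Inspect `len` bytes accumulated from recv().
 * Returns  1 and fills the outputs if a complete frame is present,
 *          0 if more bytes are needed (partial read),
 *         -1 if the declared length exceeds MAX_PAYLOAD.
 */
int try_parse_frame(const uint8_t *buf, size_t len, uint32_t *type,
                    const uint8_t **payload, uint32_t *payload_len) {
    assert(buf != NULL && type != NULL && payload != NULL && payload_len != NULL);
    if (len < HEADER_SIZE) return 0;
    uint32_t n = get_u32_be(buf + 4);
    if (n > MAX_PAYLOAD) return -1;
    if (len < (size_t)HEADER_SIZE + n) return 0;
    *type = get_u32_be(buf);
    *payload = buf + HEADER_SIZE;
    *payload_len = n;
    return 1;
}
```

A receive loop would append each `recv()` result to a per-peer buffer, call `try_parse_frame` until it returns 0, and drop the peer when it returns -1.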
Hint 4: Chain Comparison
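// Assumes Block gains a `uint32_t difficulty` field recording how many
// leading zero hex digits were required when the block was mined.
int compare_chains(const Block* chain_a, int len_a, const Block* chain_b, int len_b) {
    // Expected work for one block is 16^difficulty hash attempts.
    // Nonces do not measure work: a lucky miner can succeed at nonce 0.
    // The shift is only valid for difficulty < 16.
    uint64_t work_a = 0, work_b = 0;
    for (int i = 0; i < len_a; i++) work_a += 1ULL << (4 * chain_a[i].difficulty);
    for (int i = 0; i < len_b; i++) work_b += 1ULL << (4 * chain_b[i].difficulty);
    return (work_a > work_b) ? 1 : (work_a < work_b) ? -1 : 0;
}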
Hint 5: Use select() for Multi-Connection Handling
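// Rebuild the set before every select() call, because select() overwrites it
fd_set read_fds;
FD_ZERO(&read_fds);
FD_SET(server_socket, &read_fds);
for (int i = 0; i < num_peers; i++) {
    FD_SET(peer_sockets[i], &read_fds);
}
// max_fd is the highest descriptor in the set; timeout is a struct timeval
select(max_fd + 1, &read_fds, NULL, NULL, &timeout);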
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Hash function fundamentals | Serious Cryptography, 2nd Edition by Jean-Philippe Aumasson | Ch. 6: “Hash Functions” |
| Blockchain structure | Mastering Bitcoin, 3rd Edition by Andreas Antonopoulos | Ch. 11: “The Blockchain” |
| Proof of Work mining | Programming Bitcoin by Jimmy Song | Ch. 9: “Blocks” |
| Distributed consensus theory | Designing Data-Intensive Applications by Martin Kleppmann | Ch. 8-9: “The Trouble with Distributed Systems” and “Consistency and Consensus” |
| Network programming in C | The Linux Programming Interface by Michael Kerrisk | Ch. 56-61: “Sockets” |
| P2P protocols | Computer Networks by Tanenbaum & Wetherall | Ch. 5: “Network Layer” |
| Bitcoin P2P specifics | Mastering Bitcoin, 3rd Edition | Ch. 8: “The Bitcoin Network” |
Common Pitfalls & Debugging
Problem 1: “Nodes connect but chains don’t sync”
- Why: You’re likely sending chain data but not validating before accepting
- Fix: Before adding received blocks: (1) Verify hash matches target, (2) Check previous_hash exists in your chain, (3) Validate all transactions
- Quick test:
// Validation checklist
bool validate_block(Block *block) {
    if (!hash_meets_target(block->hash, current_difficulty)) return false;
    if (!find_block_by_hash(block->prev_hash)) return false;
    return true;
}
Problem 2: “Fork resolution chooses wrong chain”
- Why: “Longest chain wins” means chain with most cumulative work, not most blocks
- Fix: Track cumulative difficulty, not just block count
- Quick test:
// Wrong:
return longest_chain_by_count();
// Right:
return chain_with_most_work(); // Sum of all difficulties
Problem 3: “Mining never finds a valid hash”
- Why: Off-by-one error in difficulty check, or endianness issues
- Fix: Target is a large number. Block hash must be numerically less than target
- Debug:
printf("Block hash: %s\n", hash_to_hex(block.hash));
printf("Target: %s\n", target_to_hex(current_target));
// Hash should be smaller (more leading zeros)
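The numeric comparison can be made concrete with raw digests. The sketch below assumes hashes and targets are 32-byte arrays stored big-endian, so that byte-wise comparison matches numeric comparison; the function names are illustrative. Bitcoin accepts a hash that is at or below the target and interprets the digest bytes in the opposite order, which is one source of the endianness issue mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the target for `zero_bits` required leading zero bits:
 * 2^(256 - zero_bits) - 1, stored as 32 big-endian bytes. */
void target_from_zero_bits(unsigned zero_bits, uint8_t target[32]) {
    assert(zero_bits <= 256);
    unsigned full = zero_bits / 8; /* whole zero bytes */
    unsigned rest = zero_bits % 8; /* extra zero bits in the next byte */
    memset(target, 0xFF, 32);
    memset(target, 0x00, full);
    if (rest != 0) target[full] = (uint8_t)(0xFF >> rest);
}

/* For equal-length big-endian byte strings, memcmp order is numeric order. */
int hash_meets_target(const uint8_t hash[32], const uint8_t target[32]) {
    return memcmp(hash, target, 32) <= 0;
}
```

Working in bits rather than hex digits also gives finer difficulty steps: each extra zero bit doubles the expected work, while each extra hex digit multiplies it by 16.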
Problem 4: “Nodes see different transaction states”
- Why: Race condition—transaction removed from mempool before all nodes see it
- Fix: Keep transactions in mempool until N confirmations (usually 6)
- Quick test: Send same transaction from 2 nodes simultaneously, ensure both accept it
Problem 5: “P2P gossip creates message storms”
- Why: Forwarding messages to sender, or no duplicate detection
- Fix:
// Track seen messages
HashSet *seen_messages = hashset_new();

void on_message(Message *msg, Peer *from_peer) {
    if (hashset_contains(seen_messages, msg->id)) return; // Already seen
    hashset_add(seen_messages, msg->id);
    broadcast_to_peers_except(msg, from_peer); // Don't send back to sender
}
Problem 6: “Memory leak when nodes disconnect/reconnect”
- Why: Not freeing peer connection structures
- Fix: Use valgrind to find leaks:
valgrind --leak-check=full ./blockchain-node
# Fix all "definitely lost" blocks before continuing
Problem 7: “select() or epoll_wait() returns but no data to read”
- Why: Peer disconnected (EOF), or spurious wakeup
- Fix:
ssize_t bytes = recv(socket, buffer, sizeof(buffer), 0);
if (bytes == 0) {
    // Connection closed gracefully
    remove_peer(socket);
} else if (bytes < 0) {
    // On a non-blocking socket, EAGAIN/EWOULDBLOCK is a spurious wakeup: try again later
    if (errno != EAGAIN && errno != EWOULDBLOCK) {
        perror("recv");
        remove_peer(socket);
    }
}
Problem 8: “Difficulty adjustment causes sudden jumps”
- Why: Adjusting too frequently or using wrong formula
- Fix: Bitcoin adjusts every 2016 blocks. New target = old target × (actual time / expected time)
- Quick test: If blocks came too fast, difficulty should increase (target decreases)
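The retarget formula can be sketched as below. This is a toy model under stated assumptions: the target is a 64-bit number rather than Bitcoin's 256-bit one, `MAX_TARGET` is a hypothetical easiest target, and the function name is illustrative. The clamp of the measured timespan to a factor of 4 in either direction follows Bitcoin's rule and is what prevents sudden jumps.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TARGET (UINT64_MAX >> 8) /* hypothetical easiest allowed target */

/* new_target = old_target * actual / expected, with the timespan clamped
 * to [expected / 4, expected * 4]. A smaller target means higher difficulty. */
uint64_t retarget(uint64_t old_target, uint64_t actual_secs, uint64_t expected_secs) {
    assert(expected_secs > 0 && expected_secs < (1ULL << 30));
    assert(old_target <= MAX_TARGET);

    uint64_t lo = expected_secs / 4;
    uint64_t hi = expected_secs * 4;
    if (actual_secs < lo) actual_secs = lo;
    if (actual_secs > hi) actual_secs = hi;

    /* Split old_target = q * expected + r so the product cannot overflow. */
    uint64_t q = old_target / expected_secs;
    uint64_t r = old_target % expected_secs;
    uint64_t new_target = q * actual_secs + (r * actual_secs) / expected_secs;

    if (new_target > MAX_TARGET) new_target = MAX_TARGET;
    if (new_target == 0) new_target = 1;
    return new_target;
}
```

With Bitcoin's parameters the expected timespan is 2016 blocks × 600 seconds = 1,209,600 seconds. If the last period took half that long, the target halves and difficulty doubles.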
Debugging Strategy:
- Single node first: Get mining working before adding P2P
- Two nodes: Test sync, then fork resolution
- Network partition test: Disconnect nodes, mine on both, reconnect, verify the chain with the most cumulative work wins
- Print everything: In development, log every message sent/received
- Wireshark: Capture and inspect actual P2P traffic
Common gotchas:
- Endianness: Network byte order (big-endian) vs. host order
- Buffer overflows: Always check message size before parsing
- Race conditions: Use locks when multiple threads access shared state
Project 3: Build the Ethereum Virtual Machine (EVM) From Scratch
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Main Programming Language: Rust
- Alternative Programming Languages: Go, TypeScript, Python
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: Level 1: The “Resume Gold”
- Difficulty: Level 4: Expert (The Systems Architect)
- Knowledge Area: Virtual Machines, Blockchain
- Software or Tool: EVM, Ethereum, Bytecode
- Main Book: “Crafting Interpreters” by Robert Nystrom
What you’ll build: A stack-based virtual machine that executes Ethereum bytecode—implementing opcodes like PUSH, ADD, SSTORE, CALL, and the gas metering system.
Why it teaches Ethereum: The EVM is Ethereum’s brain. Smart contracts compile to bytecode that the EVM executes. Building it yourself reveals why gas exists, how storage works, why reentrancy attacks happen, and what makes smart contracts “smart.”
Core challenges you’ll face:
- Stack machine architecture → Maps to how computation is expressed
- 256-bit word operations → Maps to why Solidity uses uint256
- Gas accounting → Maps to preventing infinite loops and spam
- Storage (SLOAD/SSTORE) → Maps to contract state persistence
- CALL semantics → Maps to contract-to-contract interaction
- Memory vs storage vs stack → Maps to Solidity optimization
Resources for key challenges:
- EVM From Scratch Course by W1nt3r.eth - 116 progressive tests to pass
- EVM From Scratch Book - Jupyter notebooks building step-by-step
- “Mastering Ethereum” Ch. 13 - Authoritative EVM reference
Key Concepts:
- Stack Machines: “Computer Systems: A Programmer’s Perspective” Ch. 3 - Bryant & O’Hallaron
- Virtual Machine Design: “Crafting Interpreters” - Robert Nystrom (free online)
- EVM Opcodes: Ethereum Yellow Paper (formal specification)
Difficulty: Advanced Time estimate: 2-4 weeks Prerequisites: Understanding of stack-based computation, any systems language
Learning milestones:
- Arithmetic opcodes work - You understand the stack model
- Control flow (JUMP/JUMPI) works - You understand how loops and conditionals compile
- Storage operations work - You understand contract state
- CALL works - You understand contract interaction and reentrancy
Real World Outcome
When you complete this project, you’ll have a working EVM that can execute real Solidity bytecode. Here’s exactly what you’ll be able to do:
1. Execute and Trace Simple Bytecode
$ ./evm-cli run --bytecode "6005600401" --trace
EVM EXECUTION TRACE
════════════════════════════════════════════════════════════════════
Bytecode: 60 05 60 04 01
│ │ │ │ │
│ │ │ │ └─ ADD (0x01)
│ │ │ └──── 0x04 (data for PUSH1)
│ │ └─────── PUSH1 (0x60)
│ └────────── 0x05 (data for PUSH1)
└───────────── PUSH1 (0x60)
Step 1: PUSH1 0x05
PC: 0 → 2
Gas: 3 consumed (2999997 remaining)
Stack: [] → [5]
Step 2: PUSH1 0x04
PC: 2 → 4
Gas: 3 consumed (2999994 remaining)
Stack: [5] → [5, 4]
Step 3: ADD
PC: 4 → 5
Gas: 3 consumed (2999991 remaining)
Stack: [5, 4] → [9]
EXECUTION COMPLETE
════════════════════════════════════════════════════════════════════
Result: 0x09 (9 in decimal)
Gas used: 9
Status: SUCCESS
════════════════════════════════════════════════════════════════════
2. Execute Real Compiled Solidity Contracts
# First, compile a simple Solidity contract:
# contract Counter {
# uint256 count;
# function increment() public { count += 1; }
# function get() public view returns (uint256) { return count; }
# }
$ ./evm-cli deploy --bytecode "608060405234801561001057600080fd5b50610..."
CONTRACT DEPLOYED
════════════════════════════════════════════════════════════════════
Contract Address: 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4
Bytecode Size: 245 bytes
Gas Used: 127,543
════════════════════════════════════════════════════════════════════
# 0xd09de08a is the increment() function selector
$ ./evm-cli call --to 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4 \
--data "0xd09de08a" \
--trace
FUNCTION CALL: increment()
════════════════════════════════════════════════════════════════════
Step 1: PUSH1 0x80 Stack: [128]
Step 2: PUSH1 0x40 Stack: [128, 64]
Step 3: MSTORE Stack: [] Memory[64] = 128
...
Step 45: SLOAD Stack: [0] (load count from slot 0)
Step 46: PUSH1 0x01 Stack: [0, 1]
Step 47: ADD Stack: [1]
Step 48: PUSH1 0x00 Stack: [1, 0]
Step 49: SSTORE Stack: [] Storage[0] = 1
...
Step 62: RETURN
EXECUTION COMPLETE
════════════════════════════════════════════════════════════════════
Storage Changes:
Slot 0x00: 0x00 → 0x01 (count incremented!)
Gas Used: 43,291
Status: SUCCESS
════════════════════════════════════════════════════════════════════
$ ./evm-cli call --to 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4 \
--data "0x6d4ce63c" # get() function selector
Return Value: 0x0000000000000000000000000000000000000000000000000000000000000001
Decoded: uint256 = 1
3. Debug Reentrancy Vulnerabilities
# Deploy vulnerable contract and attacker contract, then trace the attack:
$ ./evm-cli call --to 0xVulnerable... --data "0x..." --trace
REENTRANCY ATTACK DETECTED
════════════════════════════════════════════════════════════════════
Call Depth: 0 → Vulnerable.withdraw()
Step 15: SLOAD balance[attacker] = 1 ETH
Step 20: CALL → Attacker.fallback() with 1 ETH
Call Depth: 1 → Attacker.fallback()
Step 5: CALL → Vulnerable.withdraw() [REENTRANT!]
Call Depth: 2 → Vulnerable.withdraw()
Step 15: SLOAD balance[attacker] = 1 ETH (NOT YET UPDATED!)
Step 20: CALL → Attacker.fallback() with 1 ETH
Call Depth: 3 → Attacker.fallback()
... (continues until gas exhausted or balance drained)
⚠ VULNERABILITY: State update (SSTORE) happens AFTER external call (CALL)
⚠ FIX: Use checks-effects-interactions pattern
════════════════════════════════════════════════════════════════════
4. Inspect Memory and Storage State
$ ./evm-cli debug --to 0x... --data "0x..."
EVM DEBUGGER
════════════════════════════════════════════════════════════════════
(evm) step
PC: 0x0A | Opcode: MSTORE | Gas: 2999991
(evm) stack
Stack (3 items):
[0] 0x0000...0080 (128)
[1] 0x0000...0040 (64)
[2] 0x0000...0001 (1)
(evm) memory 0 128
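Memory (128 bytes):
0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80
0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Free memory pointer: 32-byte word at 0x40-0x5F = 0x80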
(evm) storage
Storage (2 slots):
Slot 0x00: 0x0000...0005 (count = 5)
Slot 0x01: 0x0000...owner_address... (owner)
(evm) continue
════════════════════════════════════════════════════════════════════
5. Run the EVM From Scratch Test Suite
$ cargo test
running 116 tests
test evm::tests::test_stop ... ok
test evm::tests::test_add ... ok
test evm::tests::test_mul ... ok
test evm::tests::test_sub ... ok
test evm::tests::test_div ... ok
test evm::tests::test_sdiv ... ok
test evm::tests::test_mod ... ok
...
test evm::tests::test_push1 ... ok
test evm::tests::test_push32 ... ok
test evm::tests::test_dup1 ... ok
...
test evm::tests::test_mstore ... ok
test evm::tests::test_mload ... ok
test evm::tests::test_sstore ... ok
test evm::tests::test_sload ... ok
...
test evm::tests::test_jump ... ok
test evm::tests::test_jumpi ... ok
test evm::tests::test_call ... ok
test evm::tests::test_delegatecall ... ok
test evm::tests::test_create ... ok
test evm::tests::test_selfdestruct ... ok
test result: ok. 116 passed; 0 failed
The Core Question You’re Answering
“How does a blockchain execute code? What actually happens when you call a smart contract function, and why do some operations cost more gas than others?”
Before writing code, understand that the EVM is fundamentally just a stack machine with three areas of data access:
- Stack: Fast, temporary, cheap (1024 elements max, 256-bit words)
- Memory: Volatile byte array, grows during execution, quadratic cost
- Storage: Persistent key-value store, survives transactions, very expensive
Many smart contract vulnerabilities (reentrancy, integer overflow, access control) trace back to how the EVM executes bytecode. Understanding the EVM means understanding why contracts behave (and misbehave) the way they do.
Concepts You Must Understand First
Stop and research these before coding:
- Stack Machine Architecture
- What is a stack and why is LIFO important?
- How do you express a + b * c using only stack operations?
- What is “reverse Polish notation”?
- How is a stack machine different from a register machine?
- Book Reference: “Crafting Interpreters” Ch. 15 - Robert Nystrom
- 256-bit Integer Arithmetic
- Why did Ethereum choose 256-bit words? (Matches cryptographic primitives)
- How do you represent negative numbers? (Two’s complement)
- What is “signed” vs “unsigned” division in the EVM?
- How do you handle overflow? (EVM wraps around!)
- Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 2 - Bryant & O’Hallaron
- Bytecode and Opcodes
- What is an “opcode” and why is it one byte?
- How does the program counter (PC) advance?
- What is the difference between PUSH1 and PUSH32?
- Why are some opcodes followed by immediate data?
- Reference: evm.codes - Interactive opcode reference
- EVM Memory Model
- What is the difference between stack, memory, and storage?
- Why does memory cost grow quadratically?
- What is the “free memory pointer” at address 0x40?
- How are dynamic arrays stored in memory vs storage?
- Book Reference: “Mastering Ethereum” Ch. 13 - Antonopoulos & Wood
- Gas and Execution Limits
- Why does each opcode have a gas cost?
- What is “gas limit” vs “gas price”?
- How does gas prevent infinite loops?
- Why does SSTORE cost so much more than ADD?
- Reference: Ethereum Yellow Paper Appendix G
- Call Semantics (CALL, DELEGATECALL, STATICCALL)
- What is a “message call” and how does it create a new execution context?
- How does DELEGATECALL preserve msg.sender and storage context?
- What happens to gas during a call?
- What is the “call depth limit” and why does it exist?
- Book Reference: “Mastering Ethereum” Ch. 13 - Antonopoulos & Wood
Questions to Guide Your Design
Before implementing, think through these:
- Data Representation
- How will you represent 256-bit integers? (BigInt library? Fixed array?)
- How will you handle the stack? (Vec/Array with push/pop?)
- How will you store memory? (Byte array that grows as needed?)
- How will you store storage? (HashMap of slot → value?)
- Opcode Dispatch
- How will you map opcode bytes to handler functions?
- How will you handle PUSHn opcodes (which read n bytes of immediate data)?
- How will you implement DUPn and SWAPn (parameterized by n)?
- Should you use a switch statement, function table, or trait objects?
- Execution Context
- What state do you need to track? (PC, stack, memory, storage, gas, etc.)
- How will you handle nested calls? (New context per call?)
- How will you pass msg.sender, msg.value, calldata?
- How will you handle return data from calls?
- Gas Accounting
- Where in your code will you deduct gas?
- How will you handle “out of gas” mid-execution?
- How will you calculate dynamic gas costs (memory expansion)?
- How will you handle gas refunds (SSTORE clearing)?
- Control Flow
- How will you validate JUMP destinations (must be JUMPDEST)?
- How will you handle STOP, RETURN, REVERT, INVALID?
- How will you implement conditionals (JUMPI)?
- How will you detect infinite loops (gas exhaustion)?
Thinking Exercise
Before coding, trace this bytecode by hand:
Bytecode: 60 03 60 05 01 60 02 02 60 00 52 60 20 60 00 f3
Disassembly:
00: PUSH1 0x03 Push 3 onto stack
02: PUSH1 0x05 Push 5 onto stack
04: ADD Pop 2, push sum (3+5=8)
05: PUSH1 0x02 Push 2 onto stack
07: MUL Pop 2, push product (8*2=16)
08: PUSH1 0x00 Push 0 onto stack
0A: MSTORE Store 16 at memory[0:32]
0B: PUSH1 0x20 Push 32 onto stack
0D: PUSH1 0x00 Push 0 onto stack
0F: RETURN Return memory[0:32]
Trace:
Step 1: PUSH1 0x03 Stack: [3] Memory: empty
Step 2: PUSH1 0x05 Stack: [3, 5] Memory: empty
Step 3: ADD Stack: [8] Memory: empty
Step 4: PUSH1 0x02 Stack: [8, 2] Memory: empty
Step 5: MUL Stack: [16] Memory: empty
Step 6: PUSH1 0x00 Stack: [16, 0] Memory: empty
Step 7: MSTORE Stack: [] Memory[0:32] = 0x...0010 (16)
Step 8: PUSH1 0x20 Stack: [32] Memory[0:32] = 0x...0010
Step 9: PUSH1 0x00 Stack: [32, 0] Memory[0:32] = 0x...0010
Step 10: RETURN Return 32 bytes from memory offset 0
Result: 0x0000000000000000000000000000000000000000000000000000000000000010
= 16 in decimal
Question: What is the total gas cost of this execution?
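- PUSH1: 3 gas × 6 = 18 gas
- ADD: 3 gas
- MUL: 5 gas
- MSTORE: 3 gas + 3 gas memory expansion (first 32-byte word)
- RETURN: 0 gas (the returned range is already allocated)
- Total: 32 gas, excluding the 21,000 gas base cost of a transaction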
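A hand trace is easier to trust once it has been run. The following is a minimal sketch in C, like the earlier hints in this guide, and not a conforming EVM: it uses 64-bit words instead of 256-bit ones, supports only the opcodes that appear in this exercise, and the names `MiniEvm` and `mini_evm_run` are illustrative. It counts six PUSH1 instructions and one word of memory expansion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_MAX 1024
#define MEM_MAX   1024 /* hypothetical memory cap for this sketch */

typedef struct {
    uint64_t stack[STACK_MAX]; /* simplification: 64-bit words, not 256-bit */
    size_t   sp;               /* number of items on the stack */
    uint8_t  mem[MEM_MAX];
    uint64_t mem_words;        /* current memory size in 32-byte words */
    uint64_t gas_used;
    size_t   ret_off, ret_len; /* filled in by RETURN */
} MiniEvm;

static uint64_t mem_cost(uint64_t words) { return 3 * words + words * words / 512; }

/* Charge for growing memory to cover [off, off + len); returns 0 on success. */
static int touch(MiniEvm *vm, uint64_t off, uint64_t len) {
    if (len == 0) return 0;
    if (off > MEM_MAX || len > MEM_MAX - off) return -1;
    uint64_t words = (off + len + 31) / 32;
    if (words > vm->mem_words) {
        vm->gas_used += mem_cost(words) - mem_cost(vm->mem_words);
        vm->mem_words = words;
    }
    return 0;
}

/* Returns 0 on STOP, RETURN, or end of code, and -1 on any error. */
int mini_evm_run(MiniEvm *vm, const uint8_t *code, size_t n) {
    assert(vm != NULL && code != NULL);
    memset(vm, 0, sizeof *vm);
    size_t pc = 0;
    while (pc < n) {
        uint8_t op = code[pc];
        switch (op) {
        case 0x00: /* STOP */
            return 0;
        case 0x01: case 0x02: { /* ADD, MUL: wrap on overflow */
            if (vm->sp < 2) return -1;
            uint64_t a = vm->stack[--vm->sp];
            uint64_t b = vm->stack[--vm->sp];
            vm->stack[vm->sp++] = (op == 0x01) ? a + b : a * b;
            vm->gas_used += (op == 0x01) ? 3 : 5;
            pc += 1;
            break;
        }
        case 0x52: { /* MSTORE: pops offset (top of stack), then value */
            if (vm->sp < 2) return -1;
            uint64_t off = vm->stack[--vm->sp];
            uint64_t val = vm->stack[--vm->sp];
            vm->gas_used += 3;
            if (touch(vm, off, 32) != 0) return -1;
            memset(vm->mem + off, 0, 24); /* high bytes of the 32-byte word */
            for (int i = 0; i < 8; i++)
                vm->mem[off + 24 + i] = (uint8_t)(val >> (56 - 8 * i));
            pc += 1;
            break;
        }
        case 0x60: /* PUSH1: a missing immediate byte reads as zero */
            if (vm->sp >= STACK_MAX) return -1;
            vm->stack[vm->sp++] = (pc + 1 < n) ? code[pc + 1] : 0;
            vm->gas_used += 3;
            pc += 2;
            break;
        case 0xf3: { /* RETURN: pops offset (top of stack), then length */
            if (vm->sp < 2) return -1;
            uint64_t off = vm->stack[--vm->sp];
            uint64_t len = vm->stack[--vm->sp];
            if (touch(vm, off, len) != 0) return -1;
            vm->ret_off = (size_t)off;
            vm->ret_len = (size_t)len;
            return 0;
        }
        default:
            return -1; /* opcode not supported by this sketch */
        }
    }
    return 0;
}
```

Run on the exercise bytecode, this leaves 0x10 in the last byte of the returned 32-byte word and reports 32 gas used, not counting the 21,000 gas base cost that a real transaction pays.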
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is the EVM and why is it stack-based?”
- “Explain the difference between memory, storage, and the stack.”
- “Why does SSTORE cost 20,000 gas while ADD costs only 3?”
- “What is a reentrancy attack and how does it exploit CALL semantics?”
- “How does DELEGATECALL differ from CALL? When would you use each?”
- “What happens when a contract runs out of gas mid-execution?”
- “How does the EVM prevent infinite loops?”
- “What is a JUMPDEST and why is it required?”
- “How are function selectors (4-byte signatures) used in the EVM?”
- “What is the ‘free memory pointer’ and where is it stored?”
- “Why do we need STATICCALL? What security guarantee does it provide?”
- “How would you implement your own ERC-20 token at the bytecode level?”
Hints in Layers
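Hint 1: Start with Stack Operations
Get PUSH, POP, DUP, SWAP working first:
// Assumption: U256 comes from the `primitive-types` crate; any 256-bit integer library works
use primitive_types::U256;

struct EVM {
    stack: Vec<U256>,
    pc: usize,
    code: Vec<u8>,
    gas: u64,
}

impl EVM {
    fn execute(&mut self) -> Result<Vec<u8>, &'static str> {
        while self.pc < self.code.len() {
            let opcode = self.code[self.pc];
            match opcode {
                0x00 => break, // STOP
                0x01 => self.op_add()?,
                0x60..=0x7f => self.op_push((opcode - 0x5f) as usize)?, // PUSH1..PUSH32
                // ...
                _ => return Err("Invalid opcode"),
            }
        }
        Ok(vec![])
    }

    // Deduct gas, failing cleanly instead of underflowing
    fn use_gas(&mut self, cost: u64) -> Result<(), &'static str> {
        self.gas = self.gas.checked_sub(cost).ok_or("Out of gas")?;
        Ok(())
    }

    fn op_add(&mut self) -> Result<(), &'static str> {
        self.use_gas(3)?;
        let a = self.stack.pop().ok_or("Stack underflow")?;
        let b = self.stack.pop().ok_or("Stack underflow")?;
        self.stack.push(a.overflowing_add(b).0); // EVM wraps on overflow!
        self.pc += 1;
        Ok(())
    }

    fn op_push(&mut self, n: usize) -> Result<(), &'static str> {
        self.use_gas(3)?;
        if self.stack.len() >= 1024 {
            return Err("Stack overflow");
        }
        // Immediate bytes that run past the end of the code read as zero
        let mut bytes = [0u8; 32];
        let start = self.pc + 1;
        let end = (start + n).min(self.code.len());
        bytes[32 - n..32 - n + (end - start)].copy_from_slice(&self.code[start..end]);
        self.stack.push(U256::from_big_endian(&bytes));
        self.pc += 1 + n;
        Ok(())
    }
}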
Hint 2: Memory is a Growing Byte Array
struct Memory {
    data: Vec<u8>,
}

impl Memory {
    // Grow memory to cover [offset, offset + size) and return the expansion gas
    fn expand_to(&mut self, offset: usize, size: usize) -> u64 {
        if size == 0 {
            return 0; // Zero-length accesses never expand memory
        }
        let needed = offset + size;
        if needed > self.data.len() {
            let old_words = (self.data.len() / 32) as u64;
            let new_words = ((needed + 31) / 32) as u64;
            // Memory always grows in whole 32-byte words
            self.data.resize((new_words * 32) as usize, 0);
            // Memory expansion cost is quadratic!
            let old_cost = old_words * 3 + (old_words * old_words) / 512;
            let new_cost = new_words * 3 + (new_words * new_words) / 512;
            return new_cost - old_cost;
        }
        0
    }
}
Hint 3: Storage is Just a HashMap
use std::collections::HashMap;

struct Storage {
    slots: HashMap<U256, U256>,
}

impl Storage {
    fn sload(&self, slot: &U256) -> U256 {
        self.slots.get(slot).cloned().unwrap_or(U256::zero())
    }

    // Simplified flat costs. Current mainnet rules (EIP-2200, EIP-2929) also
    // distinguish warm from cold slots and slots already modified in the transaction.
    fn sstore(&mut self, slot: U256, value: U256) -> u64 {
        let old = self.sload(&slot);
        let gas = if old.is_zero() && !value.is_zero() {
            20000 // Setting non-zero from zero
        } else if !old.is_zero() && value.is_zero() {
            5000 // Clearing (plus refund)
        } else {
            5000 // Modifying
        };
        self.slots.insert(slot, value);
        gas
    }
}
Hint 4: Use the evm-from-scratch Test Suite
Clone https://github.com/w1nt3r-eth/evm-from-scratch and run tests as you go. Each test is one opcode, which suits incremental development.
Hint 5: Handle CALL Last
CALL is the most complex opcode because it creates a new execution context. Get everything else working first, then tackle CALL, DELEGATECALL, and STATICCALL.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Stack machine fundamentals | Crafting Interpreters by Robert Nystrom | Ch. 14-15: “Chunks of Bytecode” and “A Virtual Machine” |
| EVM specification | Mastering Ethereum by Antonopoulos & Wood | Ch. 13: “The Ethereum Virtual Machine” |
| Binary representation | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron | Ch. 2: “Representing Information” |
| VM dispatch techniques | Virtual Machines: Versatile Platforms by Iain D. Craig | Ch. 2-3: “Interpreters” |
| Smart contract security | Mastering Ethereum | Ch. 9: “Smart Contract Security” |
| Solidity internals | Ethereum Smart Contract Development by Mayukh Mukhopadhyay | Ch. 6-7: “Smart Contract Internals” |
| Formal EVM specification | Ethereum Yellow Paper | Appendix H: “Virtual Machine Specification” |
Common Pitfalls & Debugging
Problem 1: “Stack underflow on seemingly valid bytecode”
- Why: You’re not handling stack depth requirements correctly. Some opcodes pop more items than they push
- Fix: Before each opcode, validate: stack.len() >= required_depth
- Quick test:
// SWAP2 requires at least 3 items
let opcode = 0x91; // SWAP2
if stack.len() < 3 {
    return Err("Stack underflow");
}
Problem 2: “Gas calculation doesn’t match real EVM”
- Why: Some opcodes have dynamic gas costs (memory expansion, SSTORE refunds)
- Fix: Memory expansion gas: new_mem_cost = (new_size^2 / 512) + (3 * new_size), where new_size is measured in 32-byte words
- Quick test: Run against evm.codes test vectors and compare gas used
Problem 3: “SSTORE/SLOAD work but values don’t persist between calls”
- Why: Storage must be external to the EVM execution context
- Fix:
struct Account {
    storage: HashMap<U256, U256>, // Persistent key-value store
    balance: U256,
    nonce: u64,
}

// EVM only holds a *reference* to storage
struct EVM<'a> {
    storage: &'a mut HashMap<U256, U256>, // Reference to account storage
    // ...
}
Problem 4: “256-bit arithmetic overflows or wraps incorrectly”
- Why: EVM uses wrapping arithmetic (modulo 2^256), not overflow panics
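- Fix: Use wrapping arithmetic instead of checked or panicking operations. Assuming the primitive-types crate, that is .overflowing_add(..).0, .overflowing_mul(..).0, etc.; other libraries expose .wrapping_add() and .wrapping_mul() directly
- Quick test:
// MAX_U256 + 1 should wrap to 0
assert_eq!(U256::MAX.overflowing_add(U256::from(1)).0, U256::zero());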
Problem 5: “CALL/DELEGATECALL creates infinite recursion”
- Why: No call depth limit or not passing gas correctly
- Fix: EVM limits call depth to 1024. Also: callee gets at most 63/64 of remaining gas
- Quick test:
const MAX_CALL_DEPTH: usize = 1024;

fn call(&mut self, call_depth: usize) -> Result<(), &'static str> {
    if call_depth >= MAX_CALL_DEPTH {
        return Err("Call depth exceeded");
    }
    // EIP-150: the caller keeps floor(gas / 64), the callee gets the rest at most
    let callee_gas = self.gas - self.gas / 64;
    // Execute with reduced gas...
    Ok(())
}
Problem 6: “CREATE opcode fails with ‘out of gas’ but plenty remains”
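- Why: CREATE charges extra gas for code deployment (200 gas per byte of deployed code) on top of its 32,000 gas base cost
- Fix: Total cost = 32000 + init_code_gas + (deployed_code.len() * 200)
- Quick test: Deploy a 100-byte contract, verify that 32000 + 20000 gas is charged in addition to the gas the init code itself consumes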
Problem 7: “JUMPI doesn’t jump even when condition is true”
- Why: EVM treats ANY non-zero value as true, but destination must be a JUMPDEST (0x5B)
- Fix:
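// Assumes U256 from the primitive-types crate (is_zero, as_usize)
fn op_jumpi(&mut self) -> Result<(), &'static str> {
    let dest = self.stack.pop().ok_or("Stack underflow")?;
    let condition = self.stack.pop().ok_or("Stack underflow")?;
    if !condition.is_zero() {
        // Destination must be inside the code and hold a JUMPDEST (0x5B).
        // A full implementation also rejects 0x5B bytes that sit inside PUSH data.
        if dest >= U256::from(self.code.len()) || self.code[dest.as_usize()] != 0x5B {
            return Err("Invalid jump destination");
        }
        self.pc = dest.as_usize();
    } else {
        self.pc += 1;
    }
    Ok(())
}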
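Checking the destination byte alone is not quite enough, because a 0x5B byte can also appear as immediate data of a PUSH instruction, where it is not a valid jump target. A common approach is a single pass over the code before execution that marks real JUMPDEST positions. A minimal sketch in C, with illustrative function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mark every code offset that holds a real JUMPDEST opcode.
 * `valid` must have room for `n` entries. */
void analyze_jumpdests(const uint8_t *code, size_t n, bool *valid) {
    assert(code != NULL && valid != NULL);
    memset(valid, 0, n * sizeof *valid);
    size_t pc = 0;
    while (pc < n) {
        uint8_t op = code[pc];
        if (op == 0x5B) {
            valid[pc] = true;
        }
        if (op >= 0x60 && op <= 0x7F) {
            /* PUSH1..PUSH32: skip the 1..32 immediate bytes */
            pc += (size_t)(op - 0x60) + 1;
        }
        pc += 1;
    }
}

/* A jump is valid only if it lands in bounds on a marked offset. */
bool is_valid_jump(const bool *valid, size_t n, uint64_t dest) {
    return dest < n && valid[dest];
}
```

For the code `60 5B 5B 00` (PUSH1 0x5B, JUMPDEST, STOP), offset 1 is push data and is rejected, while offset 2 is accepted.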
Problem 8: “Memory expansion costs explode unexpectedly”
- Why: Memory cost is quadratic, not linear
- Fix: Cost for expanding from old_size to new_size is:
fn memory_cost(size_in_words: u64) -> u64 {
    (size_in_words * size_in_words) / 512 + (3 * size_in_words)
}
let expansion_cost = memory_cost(new_size) - memory_cost(old_size);
- Why quadratic? Prevents spam attacks. Accessing 1MB of memory should be expensive!
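A few concrete numbers show where the quadratic term starts to dominate. The sketch below is in C with illustrative names; `memory_cost` takes a size in 32-byte words and `expansion_cost` takes sizes in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Total cost of having `words` 32-byte words of memory allocated. */
uint64_t memory_cost(uint64_t words) {
    return 3 * words + (words * words) / 512;
}

/* Gas charged when memory grows from old_bytes to new_bytes (0 if it does not grow). */
uint64_t expansion_cost(uint64_t old_bytes, uint64_t new_bytes) {
    assert(old_bytes <= UINT64_MAX - 31 && new_bytes <= UINT64_MAX - 31);
    uint64_t old_words = (old_bytes + 31) / 32;
    uint64_t new_words = (new_bytes + 31) / 32;
    if (new_words <= old_words) return 0;
    return memory_cost(new_words) - memory_cost(old_words);
}
```

Under this formula one word costs 3 gas, 1 KiB (32 words) costs 98 gas, and 1 MiB (32,768 words) costs 2,195,456 gas, of which 2,097,152 comes from the quadratic term.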
Debugging Strategy:
- Use evm.codes as reference: Every opcode has gas cost, stack changes, and examples
- Test one opcode at a time: The evm-from-scratch repo has isolated tests per opcode
- Compare traces: Run your EVM and geth in debug mode, compare execution traces
- Fuzz testing: Generate random valid bytecode and compare results with reference EVM
Essential debugging tools:
- evm.codes - Interactive opcode playground
- Remix IDE - Compile Solidity and inspect bytecode
- etherscan.io - View real smart contract bytecode
- Foundry’s forge debug - Step through transactions
Test suite progression:
- Stack operations (PUSH, POP, DUP, SWAP)
- Arithmetic (ADD, MUL, DIV, MOD, SDIV, SMOD, ADDMOD, MULMOD, EXP)
- Comparison & bitwise (LT, GT, EQ, ISZERO, AND, OR, XOR, NOT, SHL, SHR)
- Memory (MLOAD, MSTORE, MSTORE8, MSIZE)
- Storage (SLOAD, SSTORE)
- Flow control (JUMP, JUMPI, PC, GAS)
- Block info (BLOCKHASH, COINBASE, TIMESTAMP, NUMBER, DIFFICULTY, GASLIMIT)
- Account (BALANCE, CALLER, CALLVALUE)
- Call operations (CALL, DELEGATECALL, STATICCALL, CREATE, CREATE2)
- Logging (LOG0-LOG4)
Project 4: Implement a Proof-of-Stake Consensus
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Programming Language: C
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: Level 5: The “Industry Disruptor”
- Difficulty: Level 5: Master
- Knowledge Area: Distributed Consensus / Game Theory
- Software or Tool: Consensus Algorithms
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A consensus mechanism where validators stake tokens and are selected to propose blocks based on stake weight, with slashing for misbehavior.
Why it teaches consensus: Proof-of-Stake is how modern chains (Ethereum 2.0, Solana, Cardano) work. Building it teaches you the game theory: why validators behave honestly, what happens during forks, and how finality differs from PoW.
Core challenges you’ll face:
- Validator selection → Maps to randomness and stake weighting
- Block proposal and attestation → Maps to committee-based consensus
- Slashing conditions → Maps to punishing equivocation
- Finality gadgets → Maps to when transactions become irreversible
- Nothing-at-stake problem → Maps to PoS vs PoW tradeoffs
Key Concepts:
- Byzantine Fault Tolerance: “Designing Data-Intensive Applications” Ch. 8 - Martin Kleppmann
- Consensus Algorithms: Ethereum’s Casper FFG paper
- Game Theory: “Mastering Ethereum” Ch. 14 - Antonopoulos & Wood
Difficulty: Advanced Time estimate: 2-4 weeks Prerequisites: Understanding of distributed systems, cryptography basics
Learning milestones:
- Validators stake and get selected - You understand stake-weighted randomness
- Blocks reach finality - You understand supermajority attestation
- Slashing works - You understand the economic security model
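The first two milestones can be sketched in a few lines of C. This is one possible approach under stated assumptions: the `seed` is taken to come from a randomness source all nodes agree on (for example a hash of the previous block and the slot number), the `Validator` struct and function names are illustrative, and the small bias from the modulo operation is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t index;
    uint64_t stake;
    int      active; /* slashed or exited validators are skipped */
} Validator;

/* Pick the validator whose cumulative stake interval contains
 * (seed mod total active stake). Returns the array position, or -1
 * if no active validator has stake. */
int select_proposer(const Validator *vals, size_t n, uint64_t seed) {
    assert(vals != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (vals[i].active) total += vals[i].stake;
    }
    if (total == 0) return -1;

    uint64_t point = seed % total;
    uint64_t cumulative = 0;
    for (size_t i = 0; i < n; i++) {
        if (!vals[i].active) continue;
        cumulative += vals[i].stake;
        if (point < cumulative) return (int)i;
    }
    return -1; /* not reached when total > 0 */
}

/* Supermajority in integer arithmetic: attesting / total >= 2/3. */
int has_supermajority(uint64_t attesting_stake, uint64_t total_stake) {
    return total_stake > 0 && attesting_stake * 3 >= total_stake * 2;
}
```

A validator holding 60% of the stake owns 60% of the number line and is selected about 60% of the time. Comparing `attesting * 3` against `total * 2` avoids floating-point rounding at the two-thirds threshold.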
Real World Outcome
When you complete this project, you’ll have a functioning Proof-of-Stake consensus network that you can run locally. Here’s exactly what you’ll see:
1. Validators Stake and Join the Network
$ ./pos_node validator --stake 32000
VALIDATOR NODE STARTING...
════════════════════════════════════════════════════════════
Validator Address: 0x742d35Cc6634C0532925a3b844Bc9e7595f0bEb7
Staked Amount: 32,000 tokens
Validator Index: 5
Status: ACTIVE
Effective Balance: 32,000 tokens (max)
Waiting for block proposal assignment...
════════════════════════════════════════════════════════════
2. Block Proposal and Attestation
$ ./pos_node view-epoch
EPOCH 42 STATUS:
════════════════════════════════════════════════════════════
Total Validators: 128
Total Stake: 4,096,000 tokens
Participation Rate: 97.3%
SLOT 340 (Current):
Proposer: Validator #23 (0x8ab...)
Block Hash: 0x9f2e3d...
Attestations: 124/128 (96.9%)
Status: ✓ FINALIZED (>66.67% attestations)
SLOT 341 (Next):
Assigned Proposer: Validator #67 (YOU!)
Expected time: 8 seconds
════════════════════════════════════════════════════════════
[12:34:56] YOUR TURN! Proposing block for slot 341...
[12:34:56] Including 47 pending transactions
[12:34:56] Block 0x7a3c... proposed successfully
[12:34:57] Attestation from Validator #5: ✓
[12:34:57] Attestation from Validator #12: ✓
[12:34:58] Attestation from Validator #89: ✓
[12:35:02] Attestations received: 122/128 (95.3%)
[12:35:02] Block FINALIZED ✓
[12:35:02] Reward earned: +0.025 tokens
3. Slashing Detection and Execution
$ ./pos_node simulate-attack double-vote --validator 42
ATTACK SIMULATION: Double Vote by Validator #42
════════════════════════════════════════════════════════════
[Network] Validator #42 broadcasting CONFLICTING votes:
Vote 1: Slot 450 → Block 0xabc123...
Vote 2: Slot 450 → Block 0xdef456... (DIFFERENT!)
[Detector] Slashing condition detected!
Offense: DOUBLE_VOTE
Evidence:
- Both votes from same validator (pubkey match)
- Same slot number (450)
- Different block hashes
- Both signatures valid
[Consensus] Slashing proposal submitted by Validator #17
[Consensus] 87/128 validators confirmed evidence
[Execution] SLASHING VALIDATOR #42
- Stake burned: 1,000 tokens (3.125% of total)
- Remaining stake: 31,000 tokens
- Status: EJECTED from validator set
- Whistleblower reward (Validator #17): +10 tokens
[Network] Validator #42 removed from active set
════════════════════════════════════════════════════════════
4. Fork Choice and Reorganization
$ ./pos_node view-fork-choice
FORK CHOICE RULE (LMD GHOST):
════════════════════════════════════════════════════════════
            ┌─ Block 453a (20 votes)
Block 452 ──┤
            └─ Block 453b (108 votes) ← CANONICAL
               │
               └─ Block 454 (122 votes)
HEAD: Block 454 (0x8f3e...)
Justification: Epoch 92 checkpoint (>66.67% attested)
Finalization: Epoch 91 checkpoint (IRREVERSIBLE)
Orphaned blocks: 1 (Block 453a - insufficient attestations)
════════════════════════════════════════════════════════════
5. Economic Security Metrics
$ ./pos_node security-analysis
NETWORK SECURITY ANALYSIS:
════════════════════════════════════════════════════════════
Total Staked: 4,096,000 tokens
Network Value: $204,800,000 (at $50/token)
ATTACK COST ANALYSIS:
─────────────────────
To attack (33.4% stake needed): 1,368,064 tokens
Cost to acquire: ~$68,403,200
Slashing penalty if caught: -$68,403,200
Expected outcome: ECONOMIC LOSS (protocol defends)
To finalize invalid block (66.7% needed): 2,732,032 tokens
Cost: ~$136,601,600
Slashing penalty: -$136,601,600
Conclusion: Attack is economically irrational
Current security level: ✓ STRONG
Validator decentralization: 128 unique validators
Largest validator: 0.78% of stake (low centralization risk)
════════════════════════════════════════════════════════════
The Core Question You’re Answering
“Why should validators behave honestly in Proof-of-Stake? What prevents them from validating multiple conflicting chains (the ‘nothing-at-stake’ problem)?”
Before you write any code, sit with this question. In Proof-of-Work, miners can’t work on multiple chains simultaneously (they must choose where to spend their hash power). But in naive Proof-of-Stake, a signature costs nothing to produce, so validators can sign multiple conflicting blocks at zero cost.
The answer involves:
- Economic security (slashing makes misbehavior expensive)
- Game-theoretic incentives (rewards for honesty > costs of attacking)
- Verifiable evidence (cryptographic proofs of equivocation)
- Social consensus (weak subjectivity checkpoints)
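The economic-security argument can be made concrete with a little arithmetic. The sketch below reproduces the attack-cost figures from the sample security analysis (4,096,000 tokens staked at $50 each). The function names and the per-mille encoding of the threshold are illustrative assumptions, not part of any real client.

```rust
/// Tokens an attacker must control to hold `threshold_permille` / 1000
/// of the total stake (for example 334 = 33.4%).
/// Integer math avoids floating-point drift.
fn stake_needed(total_stake: u64, threshold_permille: u64) -> u64 {
    total_stake * threshold_permille / 1000
}

/// Dollar cost of acquiring `tokens` at a fixed price. This ignores
/// slippage, which would only make a real attack more expensive.
fn acquisition_cost_usd(tokens: u64, price_usd: u64) -> u64 {
    tokens * price_usd
}
```

Calling `stake_needed(4_096_000, 334)` gives the 1,368,064 tokens shown in the analysis. If slashing can burn the whole attacking stake, the attacker's downside is at least the acquisition cost.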
Concepts You Must Understand First
Stop and research these before coding:
- Byzantine Fault Tolerance (BFT)
- What does “Byzantine” mean in distributed systems?
- Why is the 2/3 threshold (66.67%) important in BFT consensus?
- What’s the difference between safety (never finalize conflicting blocks) and liveness (eventually finalize blocks)?
- How does BFT differ from Nakamoto consensus (longest chain)?
- Book Reference: “Designing Data-Intensive Applications” Ch. 8-9 - Martin Kleppmann
- Stake-Weighted Selection
- How do you fairly select a validator when they have different stake amounts?
- What is a Verifiable Random Function (VRF)?
- Why can’t you use simple `rand() % num_validators`?
- How does Ethereum 2.0’s RANDAO provide randomness?
- Book Reference: Ethereum 2.0 Spec (Beacon Chain documentation)
- Slashing Conditions
- What constitutes “equivocation” (provably malicious behavior)?
- Why are double votes and surround votes slashable?
- How much should validators be slashed? (Too little = ineffective, too much = discourages participation)
- How do you prove a validator misbehaved without trusting a single reporter?
- Book Reference: Ethereum’s Casper FFG paper
- Finality Gadgets
- What does “finalized” mean? How is it different from “confirmed”?
- What is a “checkpoint” and why do we finalize in epochs, not per block?
- How does Casper FFG achieve finality on top of the LMD GHOST fork choice?
- What happens during a finality reversion (catastrophic but possible)?
- Book Reference: “Mastering Ethereum” Ch. 14 - Antonopoulos & Wood
- Nothing-at-Stake Problem
- Why is it “free” to vote on multiple chains in naive PoS?
- How does slashing solve this?
- What are “weak subjectivity” checkpoints?
- Why can’t you trustlessly sync from genesis in PoS (unlike PoW)?
- Book Reference: Vitalik Buterin’s “A Proof of Stake Design Philosophy”
- Long-Range Attacks
- What is a long-range attack (rewriting ancient history)?
- Why can’t this happen in PoW (too much computational cost)?
- How do checkpoints prevent long-range attacks?
- What is “social consensus” and why is it needed for very old reorgs?
- Book Reference: “Designing Data-Intensive Applications” Ch. 9 - Martin Kleppmann
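The two slashing conditions named above (double votes and surround votes) can be written as pure predicates over a pair of votes. This is a minimal sketch: the `Vote` struct and its field names are assumptions, the block hash is shortened to a `u64`, and signature verification (which must happen first in a real system) is left out.

```rust
/// A simplified attestation vote. Field names are illustrative; a real
/// vote also carries a BLS signature that must be verified first.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vote {
    validator: u32,
    source_epoch: u64,
    target_epoch: u64,
    target_root: u64, // stand-in for a 32-byte block hash
}

/// Double vote: two distinct votes by one validator for the same target epoch.
fn is_double_vote(a: &Vote, b: &Vote) -> bool {
    a.validator == b.validator && a.target_epoch == b.target_epoch && a != b
}

/// Surround vote: one vote's source..target span strictly contains the other's.
fn is_surround_vote(a: &Vote, b: &Vote) -> bool {
    let surrounds = |x: &Vote, y: &Vote| {
        x.source_epoch < y.source_epoch && y.target_epoch < x.target_epoch
    };
    a.validator == b.validator && (surrounds(a, b) || surrounds(b, a))
}

/// Either condition is enough evidence to slash.
fn is_slashable(a: &Vote, b: &Vote) -> bool {
    is_double_vote(a, b) || is_surround_vote(a, b)
}
```

Because both predicates only read two signed messages, any node can check the evidence independently, which is why no single reporter has to be trusted.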
Questions to Guide Your Design
Before implementing, think through these:
- Validator Registration
- How do validators join? Do they lock tokens in a smart contract?
- What’s the minimum stake required? (Ethereum uses 32 ETH)
- How long does it take to activate? (Prevents griefing by rapid join/leave)
- How do validators exit? (Immediate exit enables long-range attacks!)
- Block Proposal Selection
- How often should each validator propose? (Every N slots based on stake?)
- Should selection be deterministic or random?
- How far in advance do validators know they’re assigned?
- What if the selected validator is offline?
- Attestation Aggregation
- Do you collect attestations one-by-one or batch them?
- How do you efficiently verify 100+ BLS signatures?
- What’s the deadline for attestations? (Too short = missed votes, too long = slow finality)
- Fork Choice Rule
- When there are competing chains, which is canonical?
- LMD GHOST: Starting from the last justified checkpoint, follow the subtree with the most attestation weight, counting only each validator’s latest message
- How do you handle ties?
- Slashing Implementation
- Who detects slashable offenses? (Any node can!)
- How do you reward the whistleblower?
- Should slashing be gradual (correlated failures) or fixed?
- Can you slash multiple times for the same offense?
- Epoch Boundaries
- How long is an epoch? (Ethereum: 32 slots = 6.4 minutes)
- Why checkpoint finality per epoch instead of per block?
- What happens to validator set changes during an epoch?
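To make the fork choice question concrete, here is a small sketch of a GHOST-style head selection over latest votes. It is a simplification under stated assumptions: blocks live in a slice and refer to parents by index, each entry in `latest_votes` is a (block index, stake) pair representing one validator's most recent vote, and ties go to the lower index (a real client would typically break ties by block hash).

```rust
/// A block in the fork tree; `parent` is an index into the block list.
struct Block {
    name: &'static str,
    parent: Option<usize>,
}

/// True if `ancestor` is `node` itself or lies on its parent chain.
fn is_ancestor_or_self(blocks: &[Block], ancestor: usize, mut node: usize) -> bool {
    loop {
        if node == ancestor {
            return true;
        }
        match blocks[node].parent {
            Some(p) => node = p,
            None => return false,
        }
    }
}

/// Weight of `block` = total stake whose latest vote is for `block`
/// or for any of its descendants.
fn subtree_weight(blocks: &[Block], latest_votes: &[(usize, u64)], block: usize) -> u64 {
    latest_votes
        .iter()
        .filter(|(voted, _)| is_ancestor_or_self(blocks, block, *voted))
        .map(|(_, stake)| *stake)
        .sum()
}

/// Walk down from `root`, always descending into the heaviest child.
fn ghost_head(blocks: &[Block], latest_votes: &[(usize, u64)], root: usize) -> usize {
    let mut head = root;
    loop {
        let best = (0..blocks.len())
            .filter(|&i| blocks[i].parent == Some(head))
            .max_by_key(|&i| (subtree_weight(blocks, latest_votes, i), std::cmp::Reverse(i)));
        match best {
            Some(child) => head = child,
            None => return head,
        }
    }
}
```

Note that votes for a descendant count toward every ancestor, so a fork can win even when few validators voted for its first block directly.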
Thinking Exercise
Before coding, work through this scenario on paper:
Exercise 1: Trace Validator Selection
Network state:
Validator A: 32,000 tokens staked (31.25% of total)
Validator B: 64,000 tokens staked (62.5%)
Validator C: 6,400 tokens staked (6.25%)
Total: 102,400 tokens
Epoch 10 random seed: 0x8f3a... (from RANDAO)
For slot 320:
1. How do you select the proposer fairly?
(Hint: Hash(seed + slot_number) mod total_stake, then find which validator's range it falls in)
2. What is the probability each validator is selected?
(Should match their stake weight!)
3. If Validator B is selected, what attestations must they collect?
(All other validators attest to their proposed block)
4. What percentage of stake must attest for finality?
(>66.67%, so at least 68,267 tokens worth of attestations)
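One way to check your paper answer for Exercise 1 is the sketch below. It assumes `random_value` has already been derived from hashing the epoch seed with the slot number (the hashing step is not shown), and the helper `exhaustive_counts` is a hypothetical test aid that sweeps every possible point instead of sampling randomly.

```rust
/// Pick the validator whose cumulative-stake range contains
/// `random_value % total_stake`.
fn select_proposer(stakes: &[u64], random_value: u64) -> Option<usize> {
    let total: u64 = stakes.iter().sum();
    if total == 0 {
        return None; // no stake, so nobody can propose
    }
    let point = random_value % total;
    let mut cumulative = 0u64;
    for (index, stake) in stakes.iter().enumerate() {
        cumulative += stake;
        if point < cumulative {
            return Some(index);
        }
    }
    None // not reached when total > 0
}

/// Count how often each validator wins across every possible point.
/// With a uniform random value, these counts are the selection weights.
fn exhaustive_counts(stakes: &[u64]) -> Vec<u64> {
    let total: u64 = stakes.iter().sum();
    let mut counts = vec![0u64; stakes.len()];
    for point in 0..total {
        if let Some(i) = select_proposer(stakes, point) {
            counts[i] += 1;
        }
    }
    counts
}
```

For stakes of 32,000, 64,000, and 6,400 the ranges are [0, 32000), [32000, 96000), and [96000, 102400), so each validator's share of points equals its share of stake.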
Exercise 2: Detect a Slashable Offense
Validator D broadcasts two attestations:
Attestation 1:
Source checkpoint: Epoch 15
Target checkpoint: Epoch 20
Block hash: 0xabc123...
Signature: valid
Attestation 2:
Source checkpoint: Epoch 15
Target checkpoint: Epoch 20
Block hash: 0xdef456... (DIFFERENT!)
Signature: valid
Questions:
1. Is this slashable? Why?
2. What evidence do you need to prove it?
3. How much should Validator D be slashed?
4. Can Validator D claim "my node was hacked"? (Doesn't matter - strict liability!)
Exercise 3: Trace Finality
Epoch 25 ends with these attestations:
Block 800: 65,000 tokens attested (63.5%) ← Not finalized
Block 801: 70,000 tokens attested (68.4%) ← Finalized!
Block 802: 45,000 tokens attested (44%)
Epoch 26:
Block 803 builds on block 801
Block 804 builds on block 801
Block 805 builds on block 804
What is the finalized chain at the end of Epoch 26?
Can block 800 ever become part of the canonical chain? (No - block 801 is finalized)
The Interview Questions They’ll Ask
Prepare to answer these:
- “Explain the nothing-at-stake problem in Proof-of-Stake. How does Ethereum solve it?”
- “What’s the difference between justification and finalization in Casper FFG?”
- “Why does Proof-of-Stake use a 2/3 threshold instead of simple majority (50%+1)?”
- “Walk me through what happens when a validator double-votes.”
- “How does weak subjectivity differ from objective finality in Proof-of-Work?”
- “What is a long-range attack and why can’t it happen in Proof-of-Work?”
- “How does validator selection work without being predictable or gameable?”
- “What happens during a finality reversion (both justified checkpoints conflict)?”
- “Why is Proof-of-Stake considered more energy-efficient than Proof-of-Work?”
- “How does slashing rate scale with the number of validators misbehaving simultaneously?”
Hints in Layers
Hint 1: Start with a Simple Stake Registry
Before consensus, build the staking mechanism:
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_VALIDATORS 1024  // capacity is an arbitrary choice

typedef struct {
    uint8_t pubkey[48];          // BLS public key
    uint64_t stake;              // Amount staked (in tokens)
    uint64_t activation_epoch;   // When validator activates
    bool slashed;                // Has been slashed?
} Validator;
Validator validators[MAX_VALIDATORS];
int validator_count = 0;
uint64_t total_stake = 0;
uint64_t current_epoch = 0;      // advanced elsewhere by the slot clock

// Returns the new validator's index, or -1 if the registry is full.
int register_validator(const uint8_t *pubkey, uint64_t stake) {
    if (validator_count >= MAX_VALIDATORS) return -1;
    Validator *v = &validators[validator_count];
    memcpy(v->pubkey, pubkey, 48);
    v->stake = stake;
    v->activation_epoch = current_epoch + 2;  // 2-epoch delay
    v->slashed = false;
    total_stake += stake;
    return validator_count++;
}
Test: Ensure stake accounting is always correct (sum of individual stakes == total).
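Hint 2: Implement Stake-Weighted Random Selection
Use the “weighted sampling” technique:
// Returns the selected validator's index, or -1 if nobody has stake.
int select_proposer(uint64_t slot, const uint8_t *random_seed) {
    if (total_stake == 0) return -1;
    // Deterministic but unpredictable selection
    uint8_t hash_input[40];
    memcpy(hash_input, random_seed, 32);
    memcpy(hash_input + 32, &slot, 8);  // host byte order; fix one if nodes differ
    uint8_t hash[32];
    sha256(hash_input, 40, hash);
    // Copy instead of casting: hash[] may not be 8-byte aligned
    uint64_t random_value;
    memcpy(&random_value, hash, sizeof random_value);
    random_value %= total_stake;
    // Find which validator's range this falls into
    uint64_t cumulative = 0;
    for (int i = 0; i < validator_count; i++) {
        cumulative += validators[i].stake;
        if (random_value < cumulative) {
            return i;  // Validator i is selected!
        }
    }
    return -1;  // Only reached if total_stake is out of sync with the stakes
}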
Hint 3: Slashing Detection Uses Signature Comparison
Two votes conflict if they come from the same validator, target the same epoch, and name different blocks. Verify both signatures before treating them as evidence:
bool is_slashable_double_vote(const Attestation *att1, const Attestation *att2) {
    // Same validator?
    if (memcmp(att1->pubkey, att2->pubkey, 48) != 0) return false;
    // Same target epoch but different blocks?
    if (att1->target_epoch == att2->target_epoch &&
        memcmp(att1->block_hash, att2->block_hash, 32) != 0) {
        return true;  // SLASHABLE!
    }
    return false;
}
Hint 4: Finality Requires Checkpointing
Don’t try to finalize every block. Use epoch boundaries:
typedef struct {
    uint64_t epoch;
    uint8_t block_hash[32];
    uint64_t total_attesting_stake;
    bool justified;  // >2/3 of stake voted for this checkpoint
    bool finalized;  // Justified, and the next checkpoint is justified too
} Checkpoint;

// Call at each epoch boundary with the previous epoch's checkpoint.
void check_finality(Checkpoint *checkpoint, Checkpoint *previous) {
    if (checkpoint->total_attesting_stake * 3 > total_stake * 2) {
        checkpoint->justified = true;
        // Two consecutive justified checkpoints finalize the earlier one
        if (previous->justified && previous->epoch + 1 == checkpoint->epoch) {
            previous->finalized = true;
        }
    }
}
Hint 5: Test Byzantine Scenarios
Your protocol must handle malicious validators:
// Simulate a Byzantine validator. create_fake_attestation() and
// broadcast_attestation() are helpers you write; rand() needs <stdlib.h>.
void simulate_byzantine_validator(int validator_id) {
    // Vote on a wrong block about half the time
    if (rand() % 2 == 0) {
        Attestation att = create_fake_attestation(validator_id);
        broadcast_attestation(&att);
    }
}
As long as Byzantine validators hold less than 1/3 of the stake, the protocol should stay safe (no conflicting finalized blocks) and live (it keeps finalizing).
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Byzantine Fault Tolerance fundamentals | Designing Data-Intensive Applications, 2nd Edition by Martin Kleppmann | Ch. 8: “The Trouble with Distributed Systems”; Ch. 9: “Consistency and Consensus” |
| Proof-of-Stake mechanisms | Mastering Ethereum by Andreas Antonopoulos & Gavin Wood | Ch. 14: “Consensus” (covers Casper FFG/CBC) |
| Game theory and economic security | Mastering Ethereum | Ch. 11: “Oracles” and Ch. 14: “Consensus” |
| Distributed consensus algorithms | Designing Data-Intensive Applications, 2nd Edition | Ch. 9: “Consistency and Consensus” (Paxos, Raft, Byzantine consensus) |
| Cryptographic primitives (BLS signatures) | Serious Cryptography, 2nd Edition by Jean-Philippe Aumasson | Ch. 11-12: “Public-Key Cryptography” |
| Ethereum 2.0 Beacon Chain | Upgrading Ethereum by Ben Edgington | Full book (covers Casper FFG, LMD GHOST, slashing) |
Essential Papers:
- Casper the Friendly Finality Gadget by Vitalik Buterin & Virgil Griffith (2017)
- Combining GHOST and Casper (Ethereum 2.0 specification)
- A Proof of Stake Design Philosophy by Vitalik Buterin
Common Pitfalls & Debugging
Problem 1: “Validator selection is predictable/biased”
- Why: Using weak randomness or not properly weighting by stake
- Fix: Use VRF (Verifiable Random Function) for unpredictable but verifiable selection. Weight probability by stake: P(validator) = stake / total_stake
- Quick test: Run 1000 selections, verify distribution matches stake weights
Problem 2: “Nothing-at-stake: validators vote on multiple forks”
- Why: No penalty for voting on conflicting blocks
- Fix: Implement slashing conditions:
// Slashable offense 1: Double voting (two different votes for the same height)
if (vote1.height == vote2.height && memcmp(vote1.hash, vote2.hash, 32) != 0) {
    slash(validator);
}
// Slashable offense 2: Surround voting (vote1's source→target span surrounds vote2's)
if (vote1.source < vote2.source && vote1.target > vote2.target) {
    slash(validator);
}
Problem 3: “Finality never achieved”
- Why: Not tracking supermajority correctly. Finality requires >2/3 of stake to attest
- Fix:
uint64_t total_attesting_stake = sum_attestations(block);
if (total_attesting_stake * 3 > total_staked * 2) {
    mark_finalized(block);  // >66.67% voted
}
Problem 4: “Long-range attack: attacker rewrites ancient history”
- Why: Old validators can re-stake on alternate chain
- Fix: Implement weak subjectivity checkpoints. Nodes won’t reorg past N blocks (~1 day worth). Require social consensus for deep reorgs.
Problem 5: “Validators join/leave causing stake accounting bugs”
- Why: Not handling validator set changes atomically
- Fix: Use epochs. Changes take effect only at epoch boundaries, never mid-epoch.
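A minimal sketch of the epoch-boundary rule follows. It assumes a 32-slot epoch (Ethereum's value) and a hypothetical `Registry` type that queues joins and exits and applies them only on the first slot of an epoch; none of these names come from a real client.

```rust
const SLOTS_PER_EPOCH: u64 = 32; // Ethereum's value; an assumption for this sketch

/// A validator-set change waiting for the next epoch boundary.
enum Change {
    Join { id: u32, stake: u64 },
    Exit { id: u32 },
}

struct Registry {
    active: Vec<(u32, u64)>, // (validator id, stake)
    pending: Vec<Change>,
}

impl Registry {
    fn total_stake(&self) -> u64 {
        self.active.iter().map(|(_, stake)| *stake).sum()
    }

    /// Queue a change; the active set is untouched until the boundary.
    fn request(&mut self, change: Change) {
        self.pending.push(change);
    }

    /// Call once per slot. Pending changes apply only at the first slot
    /// of an epoch, so the set is stable while votes are being counted.
    fn on_slot(&mut self, slot: u64) {
        if slot % SLOTS_PER_EPOCH != 0 {
            return;
        }
        for change in self.pending.drain(..) {
            match change {
                Change::Join { id, stake } => self.active.push((id, stake)),
                Change::Exit { id } => self.active.retain(|(v, _)| *v != id),
            }
        }
    }
}
```

Deriving `total_stake` from the active set, instead of tracking it in a separate counter, removes one common source of accounting drift.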
Debugging tips:
- Simulate Byzantine validators (randomly vote incorrectly) - protocol should still finalize
- Test with just under 1/3 malicious stake (the most the protocol can tolerate)
- Verify slashing removes stake and prevents future participation
Project 5: Build a Simple Smart Contract Compiler
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Main Programming Language: Rust
- Alternative Programming Languages: Go, TypeScript, C
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: Level 1: The “Resume Gold”
- Difficulty: Level 5: Master (The First-Principles Wizard)
- Knowledge Area: Compilers, Blockchain
- Software or Tool: Solidity, EVM, LLVM
- Main Book: “Writing a C Compiler” by Nora Sandler
What you’ll build: A compiler that takes a tiny Solidity-like language and outputs EVM bytecode—covering parsing, type checking, and code generation.
Why it teaches smart contracts: You’ll understand why Solidity has its quirks, how high-level code becomes opcodes, what the ABI actually is, and why certain patterns are gas-expensive.
Core challenges you’ll face:
- Lexing and parsing → Maps to contract syntax
- Type system → Maps to Solidity’s types (address, uint256, etc.)
- Storage layout → Maps to how state variables map to slots
- Function dispatch → Maps to the 4-byte function selector
- ABI encoding → Maps to how calldata is structured
Resources for key challenges:
- “Writing a C Compiler” by Nora Sandler - Apply patterns to a different target
- “Crafting Interpreters” by Robert Nystrom - Parsing and bytecode fundamentals
Key Concepts:
- Compiler Construction: “Writing a C Compiler” - Nora Sandler
- ABI Specification: Solidity documentation
- Storage Layout: “Mastering Ethereum” Ch. 13 - Antonopoulos
Difficulty: Advanced
Time estimate: 1 month+
Prerequisites: Built an interpreter before, understand EVM basics
Learning milestones:
- Compile arithmetic expressions - You understand stack-based code generation
- Compile storage variables - You understand SLOAD/SSTORE targeting
- Compile functions with ABI - You understand Ethereum’s calling convention
Real World Outcome
When you complete this project, you’ll have written your own Solidity-like compiler that generates real EVM bytecode:
1. Compile a Simple Contract
$ cat SimpleStorage.mini
contract SimpleStorage {
uint256 value;
function set(uint256 _value) public {
value = _value;
}
function get() public view returns (uint256) {
return value;
}
}
$ ./minisolc compile SimpleStorage.mini
COMPILATION SUCCESSFUL
════════════════════════════════════════════════════════════
Contract: SimpleStorage
Functions: 2 (set, get)
State variables: 1 (value at slot 0)
Bytecode (runtime): 0x608060405234801561001057600080fd5b50600436106100365760003560e01c806360fe47b11461003b5780636d4ce63c14610057575b600080fd5b61005560048036038101906100509190...
Function selectors:
- set(uint256): 0x60fe47b1
- get(): 0x6d4ce63c
Gas estimates:
- Deployment: 127,345 gas
- set(uint256): 43,324 gas
- get(): 2,429 gas (view, free externally)
════════════════════════════════════════════════════════════
2. Deploy and Call Your Compiled Contract
$ ./minisolc deploy SimpleStorage.mini --network local
Deploying to local EVM (from Project 3)...
Contract deployed at: 0x5FbDB2315678afecb367f032d93F642f64180aa3
Deployment cost: 127,345 gas
$ ./minisolc call 0x5FbDB...0aa3 "set(uint256)" 42
Transaction sent: 0x7a3c...
Gas used: 43,324
Storage updated: slot 0 = 0x000000000000000000000000000000000000000000000000000000000000002a
$ ./minisolc call 0x5FbDB...0aa3 "get()"
Return value: 42 (uint256)
3. View Generated Assembly
$ ./minisolc compile SimpleStorage.mini --output asm
FUNCTION: set(uint256)
════════════════════════════════════════════════════════════
JUMPDEST ; Function entry point
PUSH1 0x04 ; Calldata offset
CALLDATALOAD ; Load first argument
PUSH1 0x00 ; Storage slot 0
SSTORE ; Store to state
STOP
FUNCTION: get()
════════════════════════════════════════════════════════════
JUMPDEST ; Function entry point
PUSH1 0x00 ; Storage slot 0
SLOAD ; Load from state
PUSH1 0x00 ; Memory offset
MSTORE ; Store to memory
PUSH1 0x20 ; Return 32 bytes
PUSH1 0x00 ; From offset 0
RETURN ; Return value
════════════════════════════════════════════════════════════
Your high-level `value = _value` became just 4 opcodes!
The Core Question You’re Answering
“How does high-level code like `value = _value;` become low-level EVM opcodes? Why does Solidity have gas costs and weird limitations?”
Understanding a compiler forces you to see that every line of Solidity has computational cost. value = _value is an SSTORE (about 20,000 gas when it sets a previously zero slot). Loops are JUMPs. Internal function calls are JUMPs too, while calls to other contracts use CALL, STATICCALL, or DELEGATECALL. Nothing is magic: it all compiles down to stack manipulations and state reads/writes.
Concepts You Must Understand First
Stop and research these before coding:
- Compiler Pipeline Stages
- What’s the difference between lexing, parsing, semantic analysis, and code generation?
- Why separate concerns? (Modularity, testing, optimization opportunities)
- Book Reference: “Writing a C Compiler” by Nora Sandler - Full book
- Stack-Based Code Generation
- How does an expression tree become a sequence of stack operations?
- Why does `a + b * c` compile to `PUSH a, PUSH b, PUSH c, MUL, ADD`?
- Book Reference: “Crafting Interpreters” Ch. 15-17 - Robert Nystrom
- EVM Storage Model
- What’s the difference between storage (persistent), memory (temporary), and stack (working)?
- How are storage slots allocated for state variables?
- Book Reference: “Mastering Ethereum” Ch. 13 - Antonopoulos & Wood
- ABI Encoding
- How are function calls encoded in calldata?
- What’s the function selector? (First 4 bytes of keccak256(signature))
- Book Reference: Ethereum ABI Specification (official docs)
- Type Systems
- Why does Solidity have `uint8` through `uint256`?
- What’s the difference between value types (uint, address) and reference types (arrays, structs)?
- Book Reference: “Writing a C Compiler” Ch. 11 - Nora Sandler
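The stack-based code generation question can be explored with a toy compiler and a toy stack machine. This is a sketch under simplifying assumptions: operands are `u64` instead of 256-bit words, `Op::Push` stands in for the whole PUSH1..PUSH32 family, and the `run` function exists only to check the generated code.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Op {
    Push(u64), // stand-in for PUSH1..PUSH32 with a small operand
    Add,
    Mul,
}

enum Expr {
    Num(u64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Post-order traversal: emit the left operand, then the right operand,
/// then the operator that consumes both.
fn compile(expr: &Expr, out: &mut Vec<Op>) {
    match expr {
        Expr::Num(n) => out.push(Op::Push(*n)),
        Expr::Add(l, r) => {
            compile(l, out);
            compile(r, out);
            out.push(Op::Add);
        }
        Expr::Mul(l, r) => {
            compile(l, out);
            compile(r, out);
            out.push(Op::Mul);
        }
    }
}

/// Tiny stack machine; returns None on stack underflow.
fn run(code: &[Op]) -> Option<u64> {
    let mut stack: Vec<u64> = Vec::new();
    for op in code {
        match *op {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a.wrapping_add(b));
            }
            Op::Mul => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a.wrapping_mul(b));
            }
        }
    }
    stack.pop()
}
```

Operator precedence is decided by the parser when it shapes the tree. The code generator never looks at precedence; it only walks the tree bottom-up.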
Questions to Guide Your Design
- Language Subset: Which features do you support? (Start: arithmetic, storage variables, functions. Skip: inheritance, modifiers, events)
- Type Safety: How do you enforce that `uint8 + uint256` returns `uint256`?
- Storage Layout: How do you assign slots to state variables deterministically?
- Function Dispatch: How does calldata route to the right function?
- Optimization: Do you implement constant folding? Dead code elimination?
The Interview Questions They’ll Ask
- “Walk me through how Solidity compiles a function call.”
- “Why does SSTORE cost 20,000 gas but SLOAD only 2,100?”
- “What is the ABI and why does Ethereum need it?”
- “How does the EVM know which function to call in a contract?”
- “Explain the difference between memory and storage in Solidity.”
- “Why can’t you return a dynamically-sized array from a function in older Solidity versions?”
- “What optimizations does the Solidity compiler perform?”
Hints in Layers
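Hint 1: Parse to an AST First
Don’t generate bytecode directly from text. Build an Abstract Syntax Tree:
// Rust has no built-in 256-bit integer; a 32-byte array stands in for one here.
type U256 = [u8; 32];
enum BinOp { Add, Mul }
enum Expr {
    Literal(U256),
    Variable(String),
    BinaryOp { op: BinOp, left: Box<Expr>, right: Box<Expr> },
}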
Hint 2: Stack-Based Codegen Uses Post-Order Traversal
To compile a + b:
fn compile_expr(expr: &Expr) -> Vec<Opcode> {
    match expr {
        Expr::Literal(n) => vec![PUSH32(*n)],
        Expr::Variable(name) => {
            let slot = get_slot(name);
            vec![PUSH1(slot), SLOAD]
        },
        Expr::BinaryOp { op, left, right } => {
            let mut code = compile_expr(left);   // Push left
            code.extend(compile_expr(right));    // Push right
            code.push(match op {
                BinOp::Add => ADD,
                BinOp::Mul => MUL,
            });
            code
        }
    }
}
Hint 3: Function Dispatch Table
Generate a dispatcher that routes based on function selector:
PUSH1 0x00 ; Get calldata
CALLDATALOAD
PUSH1 0xE0
SHR ; First 4 bytes (selector)
DUP1
PUSH4 0x60fe47b1 ; set(uint256) selector
EQ
PUSH2 set_function
JUMPI ; If match, jump to set_function
DUP1
PUSH4 0x6d4ce63c ; get() selector
EQ
PUSH2 get_function
JUMPI
PUSH1 0x00
DUP1
REVERT ; Unknown function: revert with empty data (REVERT pops offset and size)
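Before emitting the dispatcher as bytecode, it can help to model the same routing logic in the compiler's host language. The sketch below is such a model, assuming the two selectors from the SimpleStorage example; the `Function` enum and `dispatch` name are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum Function {
    Set,
    Get,
}

const SET_SELECTOR: [u8; 4] = [0x60, 0xfe, 0x47, 0xb1]; // set(uint256)
const GET_SELECTOR: [u8; 4] = [0x6d, 0x4c, 0xe6, 0x3c]; // get()

/// Route calldata to a function by its first four bytes.
/// Returns None in the cases where the bytecode dispatcher would revert:
/// calldata shorter than four bytes, or an unknown selector.
fn dispatch(calldata: &[u8]) -> Option<Function> {
    let head = calldata.get(..4)?;
    let selector = [head[0], head[1], head[2], head[3]];
    match selector {
        SET_SELECTOR => Some(Function::Set),
        GET_SELECTOR => Some(Function::Get),
        _ => None,
    }
}
```

A model like this doubles as a test oracle: feed the same calldata to your EVM running the generated dispatcher and check that both pick the same function.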
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Complete compiler implementation | Writing a C Compiler by Nora Sandler | Full book (compilers from scratch) |
| Bytecode generation | Crafting Interpreters by Robert Nystrom | Ch. 14-24: “Bytecode VM” |
| EVM storage and memory | Mastering Ethereum by Antonopoulos & Wood | Ch. 13: “The EVM” |
| ABI specification | Ethereum documentation | ABI Spec |
| Type systems and checking | Writing a C Compiler | Ch. 11: “Type Checking” |
Common Pitfalls & Debugging
Problem 1: “Parser accepts invalid syntax”
- Why: Grammar is too permissive or doesn’t enforce precedence
- Fix: Use a parser generator (for example, an LALR(1) generator) or hand-write a recursive descent parser with proper precedence handling
- Quick test:
// Should fail:
function () { return; } // Missing function name
uint x = ; // Missing expression
Problem 2: “Type checker mishandles uint8 + uint256”
- Why: Not enforcing type compatibility or implicit conversions
- Fix: Solidity implicitly converts the smaller type to the larger one. Implement type widening:
use std::cmp::max;

fn check_binary_op(left_type: Type, right_type: Type) -> Type {
    match (left_type, right_type) {
        (Type::Uint(a), Type::Uint(b)) => Type::Uint(max(a, b)), // Widen to larger
        _ => panic!("Type mismatch"),
    }
}
Problem 3: “Storage variables overwrite each other”
- Why: Not assigning unique storage slots
- Fix: Sequential allocation. First variable at slot 0, second at slot 1, etc. For structs, allocate contiguous slots.
- Debug: Print storage layout during compilation
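Sequential allocation fits in a few lines. The sketch below assumes one full 32-byte slot per variable in declaration order and deliberately ignores Solidity's packing of small types into a shared slot; `assign_slots` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Assign one storage slot per state variable, in declaration order.
/// Duplicate names are rejected so two variables can never share a slot.
fn assign_slots(declared: &[&str]) -> Result<HashMap<String, u64>, String> {
    let mut layout = HashMap::new();
    for (slot, name) in declared.iter().enumerate() {
        if layout.insert(name.to_string(), slot as u64).is_some() {
            return Err(format!("duplicate state variable: {}", name));
        }
    }
    Ok(layout)
}
```

Because the slot depends only on declaration order, recompiling the same source always yields the same layout, which is what makes deployed storage readable by later versions of the contract.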
Problem 4: “Generated bytecode is huge (way more gas than solc)”
- Why: Not optimizing. Naive codegen generates redundant PUSHes and DUPs
- Fix: Implement peephole optimizations:
PUSH1 0x00 PUSH1 0x00 ADD ← Optimize to just PUSH1 0x00
DUP1 POP ← Remove entirely (no-op)
Problem 5: “Function calls fail with ‘invalid function selector’”
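- Why: The function selector is keccak256(signature)[:4], and the signature must be canonical (no spaces, no parameter names, uint256 rather than uint)
- Fix:
// For function "transfer(address,uint256)"
let signature = "transfer(address,uint256)";
let hash = keccak256(signature.as_bytes()); // keccak256 comes from a crate or your own implementation
let selector = &hash[0..4];
// Bytecode: check if calldata[0:4] == selector, then jump to function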
Problem 6: “ABI encoding/decoding doesn’t match Solidity”
- Why: Padding and offset rules are intricate
- Fix: Follow ABI spec exactly:
- Static types (uint, address): left-padded with zeros to 32 bytes (fixed-size bytesN values are right-padded instead)
- Dynamic types (string, bytes): offset pointer + length + data
- Test: Compare your encoding with `abi.encode()` output from Solidity
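For static arguments, ABI encoding is mostly a matter of getting the padding direction right. The sketch below encodes unsigned integers and addresses as 32-byte words and builds calldata for a call. It assumes `u128` as a stand-in for `uint256` and covers static types only; the function names are illustrative.

```rust
/// ABI-encode an unsigned integer as a 32-byte big-endian word.
/// u128 stands in for uint256; the high 16 bytes stay zero (left padding).
fn encode_uint(value: u128) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[16..].copy_from_slice(&value.to_be_bytes());
    word
}

/// ABI-encode a 20-byte address: left-padded with 12 zero bytes.
fn encode_address(addr: [u8; 20]) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[12..].copy_from_slice(&addr);
    word
}

/// Calldata = 4-byte selector followed by the encoded static arguments.
fn encode_call(selector: [u8; 4], args: &[[u8; 32]]) -> Vec<u8> {
    let mut data = selector.to_vec();
    for arg in args {
        data.extend_from_slice(arg);
    }
    data
}
```

Encoding `set(42)` this way yields 36 bytes: the selector `0x60fe47b1` followed by a word whose last byte is `0x2a`, matching the storage value shown in the deployment example.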
Problem 7: “Constructor doesn’t run when deploying contract”
- Why: Constructor code must be part of the init bytecode, NOT runtime bytecode
- Fix: Compiler outputs two bytecode blobs:
- Init code: Runs constructor, returns runtime code
- Runtime code: The actual contract code
Init bytecode structure: [constructor logic] [CODECOPY runtime_code] [RETURN]
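The init/runtime split can be sketched as a function that wraps runtime code in a minimal loader. This version assumes there is no constructor logic and that the runtime code is shorter than 256 bytes, so every operand fits in a PUSH1; `build_init_code` is a hypothetical helper, and a real compiler would emit PUSH2 or wider for larger contracts.

```rust
/// Wrap `runtime` in minimal init code that copies it to memory and returns it.
/// Returns None if the runtime code does not fit a one-byte length.
fn build_init_code(runtime: &[u8]) -> Option<Vec<u8>> {
    const PREFIX_LEN: u8 = 11; // byte length of the loader below
    if runtime.len() > 255 {
        return None;
    }
    let len = runtime.len() as u8;
    let mut code = vec![
        0x60, len,        // PUSH1 len   (size, kept on the stack for RETURN)
        0x80,             // DUP1        (size for CODECOPY)
        0x60, PREFIX_LEN, // PUSH1 11    (offset of the runtime code in this blob)
        0x60, 0x00,       // PUSH1 0     (destination offset in memory)
        0x39,             // CODECOPY    mem[0..len] = code[11..11+len]
        0x60, 0x00,       // PUSH1 0     (memory offset to return from)
        0xf3,             // RETURN      mem[0..len] becomes the deployed code
    ];
    code.extend_from_slice(runtime);
    Some(code)
}
```

Constructor logic, when present, goes before the loader, and `PREFIX_LEN` must grow by the same number of bytes so CODECOPY still points at the runtime code.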
Debugging strategy:
- Compare your compiler output with `solc --asm` output
- Test each language feature in isolation (arithmetic, then storage, then functions)
- Use your own EVM (Project 3) to trace execution and verify correctness
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Bitcoin From Scratch | Advanced | 1 month+ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Minimal Blockchain | Intermediate | Weekend | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| EVM From Scratch | Advanced | 2-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Proof-of-Stake | Advanced | 2-4 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Smart Contract Compiler | Advanced | 1 month+ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
Recommendation
Start with Project 2 (Minimal Blockchain in a Weekend), then branch based on your interest:
               ┌─────────────────────────────────────┐
               │    Project 2: Minimal Blockchain    │
               │  (Start here - core mental model)   │
               └──────────────┬──────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Bitcoin Path    │  │ Ethereum Path   │  │ Consensus Path  │
│ Project 1       │  │ Project 3 → 5   │  │ Project 4       │
│ (Cryptography   │  │ (Smart contract │  │ (Distributed    │
│  deep dive)     │  │  execution)     │  │  systems)       │
└─────────────────┘  └─────────────────┘  └─────────────────┘
- If you want to understand cryptography and Bitcoin specifically → Project 1 with Jimmy Song’s book
- If you want to understand smart contracts and Ethereum → Project 3 (EVM) then Project 5 (Compiler)
- If you want to understand distributed consensus → Project 4 (PoS)
Project 6: Build a Full Layer-2 Rollup
What you’ll build: A complete Layer-2 scaling solution—a rollup that batches transactions off-chain, posts compressed data to a simulated L1, and allows users to withdraw via fraud proofs or validity proofs.
Why it teaches everything: This project synthesizes all the concepts: you need cryptography (for signatures and commitments), the EVM (to execute rollup transactions), consensus (for sequencer selection), and compilers (to generate proof circuits). It’s how Optimism, Arbitrum, and zkSync actually work.
Core challenges you’ll face:
- Transaction batching → Compress and post to L1
- State commitments → Merkle roots of rollup state
- Fraud proofs (Optimistic) → Challenge invalid state transitions
- Validity proofs (ZK) → Prove execution correctness cryptographically
- Bridge contracts → Deposit/withdraw between L1 and L2
- Sequencer economics → Who orders transactions and why
Resources for key challenges:
- Optimism’s cannon and fault proof specs
- Vitalik’s Incomplete Guide to Rollups
- “Mastering Ethereum” for bridge contract patterns
Key Concepts:
- Rollup Architecture: Vitalik’s rollup blog posts
- Fraud Proofs: Optimism documentation
- ZK Proofs: “Proofs, Arguments, and Zero-Knowledge” by Justin Thaler
- Bridge Security: “Mastering Ethereum” - Antonopoulos
Difficulty: Expert
Time estimate: 2-3 months
Prerequisites: Completed Projects 1-3, understand both Bitcoin and Ethereum
Learning milestones:
- Batch transactions and post to L1 - You understand data availability
- Execute and commit state - You understand rollup state machines
- Fraud/validity proofs work - You understand the security model
- Bridge deposits and withdrawals - You understand L1↔L2 interop
Real World Outcome
When you complete this project, you’ll have built a complete Layer-2 rollup system—this is how Optimism, Arbitrum, and zkSync scale Ethereum:
1. Deposit from L1 to L2
$ ./rollup-cli deposit --amount 10 --token ETH --to 0xYourL2Address
L1 DEPOSIT TRANSACTION:
════════════════════════════════════════════════════════════
L1 Bridge Contract: 0x1234...abcd
Tokens Locked: 10 ETH
Destination (L2): 0xYourL2Address
L1 Block: 18,245,672
L1 Tx Hash: 0x7f3a...
Waiting for L2 sequencer to process deposit...
[3 seconds later]
L2 DEPOSIT CONFIRMED:
L2 Balance updated: 0xYourL2Address = 10 ETH
L2 Block: 1,824,332
════════════════════════════════════════════════════════════
2. Execute Transactions on L2 (Cheap!)
$ ./rollup-cli transfer --to 0xAlice --amount 1 --fee 0.0001
L2 TRANSACTION:
════════════════════════════════════════════════════════════
From: 0xYour...
To: 0xAlice...
Amount: 1 ETH
L2 Gas Price: 0.0001 gwei (a tiny fraction of L1 gas prices!)
L2 Gas Used: 21,000
Total Cost: ~$0.00002 (vs $2 on L1)
Status: ✓ CONFIRMED in L2 block 1,824,335
L2 Tx Hash: 0x9a2c...
════════════════════════════════════════════════════════════
$ ./rollup-cli balance 0xAlice
L2 Balance: 1.0 ETH
3. Sequencer Batches and Posts to L1
$ ./rollup-sequencer view-batch 1824
BATCH #1824 (L2 Blocks 1,824,300 - 1,824,400)
════════════════════════════════════════════════════════════
L2 Transactions: 4,872 txs
Compressed Data Size: 48 KB (from 1.2 MB uncompressed)
Compression Ratio: 25x
State Root (old): 0x7a3f...
State Root (new): 0x9c2e...
POSTING TO L1...
L1 Gas Cost: 1,200,000 gas (~$40)
Cost per L2 tx: $0.008 (shared across 4,872 txs!)
L1 Batch Transaction: 0xb3f7...
Challenge Period: 7 days (until block 18,296,000)
════════════════════════════════════════════════════════════
4. Fraud Proof Challenge (Optimistic Rollup)
$ ./rollup-verifier detect-fraud --batch 1824
FRAUD DETECTED IN BATCH #1824!
════════════════════════════════════════════════════════════
Claimed State Root: 0x9c2e...
Actual State Root: 0x8d1f... (MISMATCH!)
Transaction causing fraud: Tx #2,341 in batch
Claimed: Transfer 5 ETH from 0xBob to 0xEve
Problem: 0xBob only has 2 ETH (insufficient balance!)
SUBMITTING FRAUD PROOF TO L1...
════════════════════════════════════════════════════════════
L1 FRAUD PROOF VERIFICATION:
1. Loading batch from L1 calldata ✓
2. Re-executing Tx #2,341 on-chain ✓
3. Computing correct state root ✓
4. Comparing: 0x8d1f... ≠ 0x9c2e... ✓ FRAUD CONFIRMED
SLASHING SEQUENCER:
- Sequencer bond: 1000 ETH
- Slashed: 100 ETH (10%)
- Challenger reward: 10 ETH
- Batch REVERTED
- New challenge period started
All withdrawals from this batch are now invalid!
════════════════════════════════════════════════════════════
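The core of a fraud proof is re-executing the disputed batch and comparing the result with the sequencer's claim. The sketch below models that check off-chain. It makes several assumptions: the state is a simple balance map, invalid transactions are skipped, and `state_root` uses the standard library's `DefaultHasher` purely as a stand-in for a Merkle root over a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

type State = BTreeMap<String, u64>;

struct Transfer {
    from: String,
    to: String,
    amount: u64,
}

/// Stand-in commitment (NOT cryptographically secure). BTreeMap iterates
/// in key order, so equal states always hash to the same value.
fn state_root(state: &State) -> u64 {
    let mut hasher = DefaultHasher::new();
    state.hash(&mut hasher);
    hasher.finish()
}

/// The state transition function that sequencer and verifier must share.
fn apply(state: &mut State, tx: &Transfer) -> Result<(), String> {
    let balance = state.get(&tx.from).copied().unwrap_or(0);
    if balance < tx.amount {
        return Err(format!("{} has {} but tried to send {}", tx.from, balance, tx.amount));
    }
    state.insert(tx.from.clone(), balance - tx.amount);
    *state.entry(tx.to.clone()).or_insert(0) += tx.amount;
    Ok(())
}

/// Re-execute the batch from the pre-state. The claim is fraudulent if the
/// recomputed root differs from the root the sequencer posted.
fn is_fraudulent(pre: &State, batch: &[Transfer], claimed_root: u64) -> bool {
    let mut state = pre.clone();
    for tx in batch {
        // An invalid transaction is skipped and leaves the state unchanged.
        let _ = apply(&mut state, tx);
    }
    state_root(&state) != claimed_root
}
```

In the scenario above, Bob holds 2 ETH and the batch claims he sent 5, so honest re-execution rejects the transfer and produces a different root than the one the sequencer posted.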
5. Withdraw from L2 Back to L1
$ ./rollup-cli withdraw --amount 5 --to 0xYourL1Address
L2 WITHDRAWAL INITIATED:
════════════════════════════════════════════════════════════
Amount: 5 ETH
L2 Balance after: 4 ETH
L2 Block: 1,825,100
Merkle Proof Generated:
Proof Hash: 0x3f7a...
State Root: 0x5c2e...
Waiting for batch to be posted to L1...
[10 minutes later]
BATCH POSTED TO L1 (Batch #1825)
Challenge period: 7 days (Optimistic Rollup)
You can finalize withdrawal after: 2025-01-05 14:30:00 UTC
════════════════════════════════════════════════════════════
[7 days later]
$ ./rollup-cli finalize-withdrawal --proof-id 0x3f7a...
FINALIZING WITHDRAWAL ON L1:
════════════════════════════════════════════════════════════
Verifying Merkle proof against L1 state root... ✓
Checking no fraud proofs were submitted... ✓
Checking withdrawal not already processed... ✓
L1 Transaction: Transferring 5 ETH to 0xYourL1Address
L1 Gas Cost: 150,000 gas (~$5)
✓ WITHDRAWAL COMPLETE
L1 Balance: 5 ETH received
════════════════════════════════════════════════════════════
The Core Question You’re Answering
“How can Ethereum scale to thousands of transactions per second while maintaining security? What’s the trade-off between Optimistic and ZK rollups?”
Rollups move computation off-chain but keep data on-chain. This is the key insight: you don’t need L1 to execute every transaction, just to store the data so anyone can verify. Optimistic rollups assume posted state roots are valid unless someone challenges them within the dispute window, so they need at least one honest verifier watching. ZK rollups prove correctness cryptographically, so a batch is final as soon as L1 verifies its proof, with no challenge period.
Concepts You Must Understand First
Stop and research these before coding:
- Data Availability
- Why must transaction data be posted to L1 even if execution is off-chain?
- What happens if sequencer withholds data?
- Book Reference: Vitalik’s “Incomplete Guide to Rollups”
- State Commitments (Merkle Roots)
- How does a 32-byte hash commit to the entire L2 state?
- What’s in a Merkle proof and why is it logarithmic in size?
- Book Reference: “Mastering Ethereum” Ch. 11 - Antonopoulos
- Fraud Proofs (Optimistic)
- How do you prove a state transition was invalid?
- Why do you need to re-execute transactions on L1?
- Why a 7-day challenge period?
- Book Reference: Optimism documentation; Arbitrum Nitro specs
- Validity Proofs (ZK-SNARKs)
- How does a ZK proof prove “I executed 1000 transactions correctly” without the verifier re-executing the transactions?
- What are circuits and why are they hard to write?
- Book Reference: “Proofs, Arguments, and Zero-Knowledge” by Justin Thaler
- Bridge Security
- How do deposits from L1→L2 work?
- How do withdrawals ensure you can only take what you own?
- What’s a “forced transaction” for censorship resistance?
- Book Reference: “Mastering Ethereum” bridge patterns
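The State Commitments questions above (how a 32-byte hash commits to the whole state, and why a proof is logarithmic in size) can be explored with a small sketch. This is a toy binary Merkle tree, not Ethereum's Merkle Patricia Trie. It assumes SHA-256 in place of Keccak-256 (which is not in the Python standard library), duplicates the last node on odd-sized levels, and uses hypothetical helper names (`merkle_root`, `merkle_proof`, `verify_proof`). A production tree would also separate leaf and internal-node hashing.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 stands in for Keccak-256 (an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def _parent_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs of an even-length level into the level above."""
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash leaves pairwise until a single 32-byte root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = _parent_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Collect sibling hashes from the leaf up to the root.

    Each entry is (sibling_hash, sibling_is_on_the_right).
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1  # flip the lowest bit to find the pair partner
        proof.append((level[sibling], sibling > index))
        level = _parent_level(level)
        index //= 2
    return proof


def verify_proof(root: bytes, leaf: bytes, proof: list[tuple[bytes, bool]]) -> bool:
    """Recompute the path to the root using only the leaf and its siblings."""
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root
```

With 1,000 leaves the proof holds 10 sibling hashes (320 bytes), since the tree is about log2(1000) levels deep, while the root stays 32 bytes regardless of how many accounts it commits to.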
The Interview Questions They’ll Ask
- “Explain the difference between Optimistic and ZK rollups. What are the trade-offs?”
- “Why do Optimistic rollups have a 7-day withdrawal delay?”
- “What is data availability and why is it critical for rollup security?”
- “How does a fraud proof work? Walk me through the on-chain verification.”
- “What happens if a rollup sequencer goes offline or becomes malicious?”
- “How do rollups achieve 10-100x lower fees than L1?”
- “What is a validity proof and how does it differ from a fraud proof?”
- “Explain sequencer centralization risks and mitigation strategies.”
Hints in Layers
Hint 1: Start with L1 Bridge Contract
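contract L1Bridge {
    // Cumulative ETH deposited per L1 sender
    mapping(address => uint256) public deposits;
    bytes32 public latestStateRoot;
    uint256 public challengePeriodEnd;

    // Declared so the emit in deposit() compiles
    event Deposit(address indexed from, address indexed l2Recipient, uint256 amount);

    function deposit(address l2Recipient) external payable {
        require(msg.value > 0, "Nothing to deposit");
        deposits[msg.sender] += msg.value;
        emit Deposit(msg.sender, l2Recipient, msg.value);
        // L2 sequencer watches for Deposit events and credits l2Recipient on L2
    }
}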
Hint 2: Batch Compression Matters
Instead of storing full transactions, store deltas:
Full tx: [from, to, value, signature, nonce] = 200+ bytes
Compressed: [from_idx, to_idx, value_delta] = 12 bytes
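One way to arrive at the 12-byte figure is three fixed-width 32-bit fields. The sketch below is illustrative only: the layout, the `AccountRegistry` class, and the function names are assumptions, not a format used by any production rollup. A 32-bit value field also caps the amount at about 4.29 billion units, so the unit (for example gwei) would have to be chosen accordingly.

```python
import struct

# Hypothetical layout: from_idx, to_idx, value_delta as big-endian uint32 (4 + 4 + 4 bytes).
COMPRESSED_TX = struct.Struct(">III")


class AccountRegistry:
    """Assigns each address a small integer index the first time it is seen."""

    def __init__(self) -> None:
        self._index: dict[str, int] = {}

    def index_of(self, address: str) -> int:
        # setdefault returns the existing index, or stores the next free one.
        return self._index.setdefault(address, len(self._index))


def compress_transfer(from_idx: int, to_idx: int, value_delta: int) -> bytes:
    """Pack a transfer into 12 bytes; raises struct.error if a field exceeds 32 bits."""
    return COMPRESSED_TX.pack(from_idx, to_idx, value_delta)


def decompress_transfer(blob: bytes) -> tuple[int, int, int]:
    """Inverse of compress_transfer."""
    return COMPRESSED_TX.unpack(blob)
```

The saving comes from replacing 20-byte addresses with indices into a registry that both the sequencer and verifiers can rebuild from earlier batches. The signature is omitted here, which is only safe if validity is established some other way (for example, by a fraud or validity proof).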
Hint 3: Fraud Proof Requires On-Chain Execution
L1 contract must be able to execute a single L2 transaction:
function verifyFraudProof(
    bytes32 preStateRoot,
    bytes calldata txData,
    bytes32 claimedPostStateRoot
) external {
    bytes32 actualRoot = executeTransaction(preStateRoot, txData);
    require(actualRoot != claimedPostStateRoot, "No fraud");
    slashSequencer();
}
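Before writing the on-chain version, it can help to model the same check off-chain. The sketch below is a simplified stand-in: the state is a plain dictionary of balances passed in full (a real contract only sees the pre-state root plus a Merkle witness), the "state root" is a SHA-256 hash of the serialized balances rather than a trie root, and all function names are hypothetical.

```python
import hashlib
import json


def state_root(balances: dict[str, int]) -> str:
    """Toy commitment: hash of the canonically serialized balances (not a Merkle trie)."""
    encoded = json.dumps(balances, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def execute_transaction(balances: dict[str, int], tx: dict) -> dict[str, int]:
    """Apply one transfer and return the new state.

    Invalid transfers (non-positive amount or insufficient funds) leave the state unchanged.
    """
    new_state = dict(balances)
    sender, recipient, amount = tx["from"], tx["to"], tx["amount"]
    if amount > 0 and new_state.get(sender, 0) >= amount:
        new_state[sender] -= amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def verify_fraud_proof(pre_state: dict[str, int], tx: dict, claimed_post_root: str) -> bool:
    """Return True when re-execution disagrees with the sequencer's claimed root."""
    actual_root = state_root(execute_transaction(pre_state, tx))
    return actual_root != claimed_post_root
```

For example, if the pre-state is `{"alice": 10, "bob": 0}` and the transaction moves 4 from alice to bob, a sequencer claiming the root of `{"alice": 10, "bob": 4}` (crediting bob without debiting alice) is caught, while the honest root of `{"alice": 6, "bob": 4}` is not challengeable.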
Books That Will Help
| Topic | Book/Resource | Chapter/Section |
|---|---|---|
| Rollup fundamentals | Vitalik’s “Incomplete Guide to Rollups” | Full post |
| Merkle proofs and commitments | Mastering Ethereum by Antonopoulos & Wood | Ch. 11 |
| Optimistic rollup design | Optimism Documentation | Fault Proofs spec |
| ZK-SNARK theory | Proofs, Arguments, and Zero-Knowledge by Justin Thaler | Ch. 1-3, 10-12 |
| Bridge security patterns | Mastering Ethereum | Ch. 7 (Smart Contracts) |
| Data availability | Ethereum Research posts | ethereum.org/roadmap/scaling |
Common Pitfalls & Debugging
Problem 1: “Bridge allows withdrawing more than deposited”
- Why: Not tracking L2 balances correctly, or missing replay protection
- Fix: Bridge contract must:
- Lock tokens on L1 when depositing
- Verify Merkle proof of L2 balance before allowing withdrawal
- Mark withdrawal as processed to prevent replay
- Test: Try withdrawing same amount twice—second should fail
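The three bridge requirements above can be modeled in a few lines before committing them to Solidity. This is a minimal sketch under stated assumptions: `ToyBridge` and its methods are hypothetical, and Merkle proof verification is replaced by a lookup in a dictionary of withdrawals that are assumed to be already proven against a finalized state root.

```python
class ToyBridge:
    """In-memory model of an L1 bridge with replay protection."""

    def __init__(self) -> None:
        self.locked = 0  # funds locked on L1 by deposits
        # withdrawal_id -> (recipient, amount); stands in for Merkle proof verification
        self.proven_withdrawals: dict[str, tuple[str, int]] = {}
        self.processed: set[str] = set()

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.locked += amount

    def finalize_state(self, withdrawals: dict[str, tuple[str, int]]) -> None:
        """Record withdrawals contained in a finalized L2 state root."""
        self.proven_withdrawals.update(withdrawals)

    def withdraw(self, withdrawal_id: str, recipient: str, amount: int) -> int:
        if self.proven_withdrawals.get(withdrawal_id) != (recipient, amount):
            raise ValueError("withdrawal not proven in finalized L2 state")
        if withdrawal_id in self.processed:
            raise ValueError("withdrawal already processed")
        if amount > self.locked:
            raise ValueError("bridge holds insufficient funds")
        self.processed.add(withdrawal_id)  # mark as processed before paying out
        self.locked -= amount
        return amount
```

Marking the withdrawal as processed before releasing funds mirrors the checks-effects-interactions ordering that protects the Solidity version against reentrancy.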
Problem 2: “Fraud proof window expires too quickly”
- Why: Challenge period too short for verifiers to check
- Fix: Optimistic rollups need ~7 days (Optimism and Arbitrum use challenge windows of about a week). This gives anyone time to submit a fraud proof if the sequencer cheats, even if L1 is congested or being censored for a while
- Security: A shorter window weakens security, because only fast, always-online verifiers can respond in time
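The finalization rule itself is simple arithmetic on the batch timestamp. The sketch below assumes a fixed 7-day period and hypothetical function names. The posting time of 2024-12-29 14:30 UTC is an assumption chosen so the result matches the "finalize after 2025-01-05 14:30:00 UTC" line in the sample withdrawal output.

```python
from datetime import datetime, timedelta, timezone

CHALLENGE_PERIOD = timedelta(days=7)  # assumed fixed window


def finalizable_at(batch_posted_at: datetime) -> datetime:
    """Earliest time a withdrawal in this batch can be finalized on L1."""
    return batch_posted_at + CHALLENGE_PERIOD


def can_finalize(batch_posted_at: datetime, now: datetime, fraud_proven: bool) -> bool:
    """A withdrawal finalizes only after the window closes with no successful fraud proof."""
    return not fraud_proven and now >= finalizable_at(batch_posted_at)


posted = datetime(2024, 12, 29, 14, 30, tzinfo=timezone.utc)
deadline = finalizable_at(posted)
```

On-chain, the same comparison would use `block.timestamp`, and a successful fraud proof would invalidate the batch rather than merely set a flag.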
Problem 3: “Data availability attack: sequencer withholds batch data”
- Why: Posting state root without posting transaction data
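- Fix: MUST post the full transaction data to L1 (as calldata or blobs). Posting only a hash of the data does not make it available, because nobody can rebuild the state from a hash. Users can’t exit if they can’t reconstruct state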
- Rule: Data availability is more important than computation verification!
Problem 4: “Fraud proof is invalid but gets accepted”
- Why: Not verifying the proof correctly on-chain
- Fix: L1 contract must:
    function challengeStateRoot(
        bytes32 oldRoot,
        bytes32 newRoot,
        bytes calldata txData,
        bytes32[] calldata merkleProof
    ) external {
        // 1. Verify old state via Merkle proof
        require(verifyMerkleProof(oldRoot, merkleProof), "Invalid old state");
        // 2. Re-execute transaction on-chain
        bytes32 computedNewRoot = executeTx(oldRoot, txData);
        // 3. Compare with claimed new root
        require(computedNewRoot != newRoot, "State root is valid");
        // 4. Slash sequencer, reward challenger
        slashSequencer();
        rewardChallenger(msg.sender);
    }
Problem 5: “ZK proof generation takes forever”
- Why: ZK proofs are computationally intensive (proving ~1000 EVM opcodes can take minutes)
- Fix: Use proof recursion/aggregation. Prove 1000 txs → combine 10 proofs → combine 10 meta-proofs. Final proof is constant size.
- Alternative: Start with optimistic rollup (simpler), add ZK later
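The shape of the aggregation tree can be sketched without any real cryptography. In the toy below, a SHA-256 hash stands in for "a proof that verifies its children"; this is purely structural and proves nothing. The function name `aggregate` and the fan-in of 10 are assumptions taken from the numbers in the fix above.

```python
import hashlib


def aggregate(proofs: list[bytes], fan_in: int = 10) -> tuple[bytes, int]:
    """Combine proofs in groups of `fan_in` until one remains.

    Returns (final_proof, rounds). Hashing is only a placeholder for a
    recursive proof that attests to the validity of each child proof.
    """
    if not proofs:
        raise ValueError("at least one proof is required")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    rounds = 0
    while len(proofs) > 1:
        proofs = [
            hashlib.sha256(b"".join(proofs[i:i + fan_in])).digest()
            for i in range(0, len(proofs), fan_in)
        ]
        rounds += 1
    return proofs[0], rounds


# 100 base proofs, each imagined to cover 1000 transactions.
base_proofs = [hashlib.sha256(f"batch-{i}".encode()).digest() for i in range(100)]
final_proof, rounds = aggregate(base_proofs)
```

Here 100 base proofs collapse to 10 and then to 1 in two rounds, and the final artifact is 32 bytes no matter how many base proofs went in. That is the property recursion buys: L1 verifies one small proof, while the base proofs can be generated in parallel.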
Problem 6: “Sequencer censorship: can’t get transactions included”
- Why: Centralized sequencer ignores certain users
- Fix: Implement forced inclusion: users can submit tx directly to L1 contract, sequencer MUST include it within N blocks or get slashed
- Decentralization: Use sequencer rotation or shared sequencing (multiple sequencers)
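The forced-inclusion rule can be prototyped as a queue keyed by the L1 block in which each transaction was submitted. This is a sketch with hypothetical names (`ForcedInclusionQueue`, `deadline_blocks`). The value of N is a design parameter the source leaves open, and slashing is reduced to reporting which transactions are overdue.

```python
from dataclasses import dataclass, field


@dataclass
class ForcedInclusionQueue:
    deadline_blocks: int  # the "N blocks" window; its value is an assumption
    pending: dict[str, int] = field(default_factory=dict)  # tx_hash -> L1 block submitted

    def force_include(self, tx_hash: str, l1_block: int) -> None:
        """User submits a transaction directly to the L1 contract."""
        self.pending.setdefault(tx_hash, l1_block)  # resubmitting does not reset the clock

    def mark_included(self, tx_hash: str) -> None:
        """Sequencer included the transaction in an L2 batch."""
        self.pending.pop(tx_hash, None)

    def overdue(self, current_l1_block: int) -> list[str]:
        """Transactions the sequencer failed to include within the deadline."""
        return [
            tx_hash
            for tx_hash, submitted in self.pending.items()
            if current_l1_block > submitted + self.deadline_blocks
        ]
```

Any non-empty result from `overdue` is the condition under which the L1 contract would slash the sequencer or let users bypass it entirely.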
Problem 7: “Gas costs explode when posting batches to L1”
- Why: Not compressing transaction data
- Fix: Use calldata compression:
- Omit fields that can be derived or defaulted (e.g., recover the sender address from the v, r, s signature instead of posting it)
- Use custom encoding (not full RLP)
- Batch similar transactions together
- Benchmark: Optimism achieves ~10x compression
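To get a feel for why batching similar transactions helps, you can run a general-purpose compressor over an encoded batch. The sketch assumes a hypothetical fixed-width encoding (three big-endian uint32 fields per transfer) and uses `zlib` from the standard library. Real rollups use their own encodings and compressors, so the ratio you observe here says nothing about production numbers.

```python
import struct
import zlib

TX_FORMAT = struct.Struct(">III")  # hypothetical: from_idx, to_idx, value


def encode_batch(transfers: list[tuple[int, int, int]]) -> bytes:
    """Concatenate fixed-width records with no per-record framing."""
    return b"".join(TX_FORMAT.pack(*t) for t in transfers)


def compress_batch(transfers: list[tuple[int, int, int]]) -> bytes:
    """Compress the whole batch at once so repeated patterns across transactions are shared."""
    return zlib.compress(encode_batch(transfers), level=9)


def decompress_batch(blob: bytes) -> list[tuple[int, int, int]]:
    """Inverse of compress_batch."""
    raw = zlib.decompress(blob)
    return [TX_FORMAT.unpack_from(raw, offset) for offset in range(0, len(raw), TX_FORMAT.size)]


# 500 transfers among 20 accounts, all moving the same amount.
transfers = [(i % 20, (i + 1) % 20, 1000) for i in range(500)]
raw_size = len(encode_batch(transfers))
compressed_size = len(compress_batch(transfers))
```

Comparing `raw_size` with `compressed_size` for batches of different sizes and different levels of similarity is a quick way to decide how large batches should be before posting to L1.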
Problem 8: “Exit from L2 to L1 fails during network congestion”
- Why: Relying on sequencer to process exit
- Fix: Implement emergency escape hatch: users can always exit directly via L1 by providing Merkle proof of their L2 balance
- Code:
    function emergencyWithdraw(
        uint256 amount,
        bytes32[] calldata proof
    ) external {
        bytes32 leaf = keccak256(abi.encodePacked(msg.sender, amount));
        require(verifyMerkleProof(latestStateRoot, leaf, proof), "Invalid proof");
        require(!isWithdrawn[leaf], "Already withdrawn");
        isWithdrawn[leaf] = true;
        token.transfer(msg.sender, amount);
    }
Architecture decision tree:
Optimistic vs ZK Rollup?
- Optimistic: Easier to build, EVM-compatible, 7-day withdrawal delay
- ZK: Harder to build, requires circuits, no challenge period (withdrawals finalize once the validity proof is verified on L1)
Start with: Optimistic (get it working), then explore ZK proofs
Debugging strategy:
- Test L1-L2 deposit flow first (simpler)
- Test L2 execution in isolation (use your EVM from Project 3)
- Test state commitment posting
- Test withdrawal flow (hardest - involves proofs)
- Test fraud/validity proof verification last
Essential tools:
- Hardhat/Foundry for L1 contract development
- Your EVM (Project 3) for L2 execution
- Circom/ZoKrates for ZK circuits (if doing ZK rollup)
Summary
This learning path covers the full stack, from cryptographic primitives to distributed consensus to smart contract execution. Here are all 6 projects:
| # | Project Name | Main Language | Difficulty | Time Estimate | Key Focus |
|---|---|---|---|---|---|
| 1 | Build Bitcoin From Scratch | Python | Master | 1 month+ | Cryptography, UTXO model, Proof-of-Work |
| 2 | Build a Minimal Blockchain in a Weekend | Python/Rust | Intermediate | Weekend | Core blockchain data structure |
| 3 | Build the Ethereum Virtual Machine (EVM) From Scratch | Rust | Master | 2-4 weeks | Stack machine, opcodes, gas metering |
| 4 | Implement a Proof-of-Stake Consensus | C | Master | 2-4 weeks | BFT consensus, game theory, slashing |
| 5 | Build a Simple Smart Contract Compiler | Rust | Master | 1 month+ | Compilers, code generation, ABI |
| 6 | Build a Full Layer-2 Rollup | Multiple | Expert | 2-3 months | Scaling, fraud proofs, bridges |
Recommended Learning Paths
Choose your path based on what interests you most:
Path 1: Bitcoin & Cryptography Deep Dive
For: Those fascinated by cryptographic systems and decentralized money
Sequence:
- Project 2 (Weekend) - Get the core blockchain mental model
- Project 1 (1 month) - Deep dive into Bitcoin’s cryptography
- Project 4 (2-4 weeks) - Understand modern consensus (PoS)
Expected outcome: You’ll understand Bitcoin at the implementation level, know elliptic curve cryptography intimately, and grasp why Proof-of-Stake is different.
Path 2: Ethereum & Smart Contracts
For: Those building decentralized applications or auditing smart contracts
Sequence:
- Project 2 (Weekend) - Understand blockchain basics
- Project 3 (2-4 weeks) - Build the EVM to understand execution
- Project 5 (1 month) - Build a compiler to understand gas costs
- Project 6 (2-3 months) - Understand scaling with rollups
Expected outcome: You’ll understand every opcode in the EVM, why Solidity has its quirks, how gas is metered, and how Layer-2 scaling works.
Path 3: Distributed Systems & Consensus
For: Those interested in distributed algorithms and system design
Sequence:
- Project 2 (Weekend) - Blockchain as a distributed data structure
- Project 4 (2-4 weeks) - Byzantine Fault Tolerant consensus
- Project 6 (2-3 months) - Rollups as distributed systems
Expected outcome: You’ll understand Byzantine Fault Tolerance, economic security models, and how to build systems that work despite malicious actors.
Path 4: Full-Stack Blockchain Engineer (Complete Path)
For: Those who want comprehensive understanding of all blockchain layers
Sequence:
- Project 2 (Weekend) - Foundation
- Project 1 (1 month) - Cryptography layer
- Project 3 (2-4 weeks) - Execution layer
- Project 4 (2-4 weeks) - Consensus layer
- Project 5 (1 month) - Developer tools layer
- Project 6 (2-3 months) - Scaling layer
Total time: 5-7 months of dedicated learning
Expected outcome: You’ll understand blockchain systems from first principles, and be able to read any blockchain’s source code, audit smart contracts, design consensus mechanisms, and architect scaling solutions.
Expected Outcomes After Completing These Projects
After working through all 6 projects, you will be able to:
- Read and understand any blockchain’s source code
- Bitcoin Core, Geth (Ethereum), Solana runtime, etc.
- Trace how a transaction flows from submission to finality
- Audit smart contracts for security vulnerabilities
- Understand gas optimization techniques
- Identify reentrancy, integer overflow, and other common bugs
- Read EVM bytecode and assembly
- Design consensus mechanisms
- Understand the trade-offs between PoW, PoS, and BFT consensus
- Design economic incentives to align validators
- Implement slashing conditions
- Build blockchain infrastructure
- Write indexers, block explorers, or analytics tools
- Implement custom opcodes or precompiles
- Build development tools (debuggers, profilers)
- Architect scaling solutions
- Design Layer-2 rollups or sidechains
- Understand data availability sampling
- Implement bridge contracts securely
- Answer technical interview questions confidently
- Explain cryptographic primitives (ECC, hash functions, Merkle trees)
- Discuss consensus trade-offs
- Compare blockchain architectures (UTXO vs Account model)
- Contribute to open-source blockchain projects
- Ethereum clients (Geth, Reth, Nethermind)
- Layer-2 solutions (Optimism, Arbitrum, zkSync)
- Bitcoin Core or Lightning Network
Difficulty Progression
The projects are designed with increasing complexity:
Difficulty Curve:
Easy        Project 2 (Minimal Blockchain)
  │                       │
  │                       ▼
  │        [Core mental model established]
  │                       │
Medium ───────────────────┘
  │
  │       Project 3 (EVM)          Project 4 (PoS)
  │              │                        │
  │              ▼                        ▼
  │      [Execution layer]        [Consensus layer]
  │              │                        │
Advanced ────────┴────────────────────────┘
  │
  │     Project 1 (Bitcoin)     Project 5 (Compiler)
  │              │                        │
  │              ▼                        ▼
  │        [Cryptography]         [Developer tools]
  │              │                        │
Expert ──────────┴────────────────────────┘
  │
  ▼
Project 6 (Rollup)
[Full synthesis]
Start with Project 2 to build intuition, then choose your path based on interests.
Time Investment Guide
If you have 1 weekend:
- Complete Project 2 (Minimal Blockchain)
- Outcome: Core mental model of how blockchains work
If you have 1 month:
- Weekend: Project 2
- Week 1-2: Project 3 (EVM) or Project 1 (Bitcoin)
- Week 3-4: Deepen chosen project or start second project
- Outcome: Deep understanding of either execution (Ethereum) or cryptography (Bitcoin)
If you have 3 months:
- Month 1: Projects 2 + 3 (Blockchain basics + EVM)
- Month 2: Project 1 (Bitcoin) or Project 4 (PoS)
- Month 3: Project 5 (Compiler) or Project 6 (Rollup)
- Outcome: Comprehensive blockchain developer skills
If you have 6+ months:
- Complete all 6 projects in sequence
- Contribute to open-source projects between projects
- Build your own blockchain or dApp as a capstone
- Outcome: Expert-level blockchain engineering skills
Interview Preparation Map
Each project maps directly to common interview topics:
| Interview Topic | Covered in Project | Key Questions |
|---|---|---|
| Cryptography | Project 1 | ECDSA, hash functions, Merkle trees |
| Consensus Algorithms | Projects 1, 4 | PoW vs PoS, Byzantine Fault Tolerance, finality |
| Smart Contracts | Projects 3, 5 | EVM execution, gas optimization, security |
| Scaling Solutions | Project 6 | Layer-2 rollups, data availability, bridges |
| Blockchain Architecture | Project 2 | UTXO vs Account model, immutability, forks |
| Distributed Systems | Projects 4, 6 | Fault tolerance, consensus, state machines |
After completing these projects, you’ll confidently answer questions like:
- “Walk me through what happens when you send a Bitcoin transaction.”
- “Explain how the EVM executes a smart contract.”
- “What’s the difference between Optimistic and ZK rollups?”
- “How does proof-of-stake achieve finality?”
- “Why does the EVM charge gas for every operation?”
Next Steps After Completion
Once you’ve finished these projects:
- Contribute to Open Source
- Ethereum clients: Geth, Reth, Nethermind
- Bitcoin: Bitcoin Core, Lightning Network
- Layer-2: Optimism, Arbitrum, zkSync
- Build Your Own Project
- Novel consensus mechanism
- Domain-specific blockchain
- DeFi protocol or DAO
- Developer tooling
- Specialize Further
- ZK cryptography (SNARKs, STARKs)
- MEV (Maximal Extractable Value)
- Blockchain security auditing
- Protocol research
- Apply Your Skills
- Blockchain engineer at Web3 company
- Smart contract auditor
- Protocol researcher
- Developer relations / education
Additional Resources
Beyond the books referenced in each project, explore:
Blogs & Research:
- Vitalik Buterin’s blog (vitalik.eth.limo)
- A16z Crypto Research
- Trail of Bits blockchain security research
Communities:
- Ethereum Research Forum (ethresear.ch)
- Bitcoin Stack Exchange
- /r/cryptography, /r/ethereum, /r/bitcoin
Courses (After Projects):
- Stanford CS 251: Cryptocurrencies and Blockchain Technologies
- Berkeley CS 294-144: Blockchain and Cryptocurrencies
- MIT 15.S12: Blockchain and Money
Practice:
- Ethernaut (smart contract CTF)
- Capture the Ether
- Damn Vulnerable DeFi
Sources
- GitHub: Blockchain Development Resources - Comprehensive resource collection
- Bitcoin Whitepaper - Original Satoshi paper
- Programming Bitcoin by Jimmy Song - O’Reilly
- Mastering Bitcoin GitHub - Free 3rd edition
- EVM From Scratch - W1nt3r.eth’s course
- EVM From Scratch Book - Jupyter notebooks
- Mastering Ethereum Ch. 13 - EVM - GitBook
- Ethereum.org EVM Docs - Official documentation