BLOCKCHAIN BITCOIN ETHEREUM LEARNING PROJECTS
Understanding Bitcoin, Blockchain & Ethereum Through Building
Goal: Deeply understand how cryptocurrency systems work at every level—from the cryptographic primitives that secure transactions, to the consensus mechanisms that keep networks honest, to the virtual machines that execute smart contracts. By building these systems yourself, you’ll gain the knowledge to read any blockchain’s source code, audit smart contracts, and architect new decentralized systems.
Why Blockchain Technology Matters
In 2008, an anonymous developer named Satoshi Nakamoto published a 9-page whitepaper that solved a problem cryptographers had struggled with for decades: how to create digital money that can’t be double-spent, without trusting a central authority.
That solution—Bitcoin—launched an entirely new field of computer science. Today:
- Bitcoin processes ~$30 billion in daily transactions, secured by more computing power than the world’s top 500 supercomputers combined
- Ethereum hosts over $50 billion in smart contract value, running unstoppable applications with no central server
- Layer 2 rollups process thousands of transactions per second while inheriting Bitcoin/Ethereum security
- Every major bank and tech company now has blockchain research teams
Understanding blockchain isn’t just about cryptocurrency—it’s about understanding a new paradigm for building trustless, decentralized systems. The concepts you’ll learn (cryptographic commitments, distributed consensus, state machines, game-theoretic security) apply far beyond crypto.
The Mental Model: What Makes Blockchain Work
Before diving into projects, you need to understand why blockchains work. Here’s the core insight:
Traditional Database Blockchain
┌─────────────────────┐ ┌─────────────────────────────────────┐
│ │ │ ┌──────┐ ┌──────┐ ┌──────┐ │
│ Central Server │ │ │Node 1│ │Node 2│ │Node 3│ ... │
│ (Single source │ │ └──┬───┘ └──┬───┘ └──┬───┘ │
│ of truth) │ │ │ │ │ │
│ │ │ └─────────┴─────────┘ │
└─────────┬───────────┘ │ Consensus │
│ │ (All nodes agree on │
"Trust me" │ the same state) │
│ └─────────────────────────────────────┘
▼ │
Users must trust ▼
the operator "Trust the math"
(Cryptographic proofs make
cheating impossible/expensive)
The key innovation is replacing trust in institutions with trust in mathematics:
- Cryptographic hashes make tampering detectable (change one bit → completely different hash)
- Digital signatures prove ownership without revealing secrets
- Proof of Work/Stake makes attacks economically irrational
- Merkle trees enable efficient verification without downloading everything
- Consensus protocols ensure all honest nodes see the same history
Cryptographic Hash Functions: The Foundation of Blockchain
Every blockchain relies on cryptographic hash functions. A hash function takes any input and produces a fixed-size output (the “hash” or “digest”):
Input: "Hello, World!"
↓ SHA-256
Output: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
Input: "Hello, World." (just one character different!)
↓ SHA-256
Output: f8c3bf62a9aa3e6fc1619c250e48ade01a8e0a892e2e69e9a5e3f8a2f5e21c8a
(completely different!)
Properties That Make Hash Functions Useful
┌─────────────────────────────────────────────────────────────────────────────┐
│ HASH FUNCTION PROPERTIES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. DETERMINISTIC │
│ Same input → Always same output │
│ "hello" → abc123... (every single time) │
│ │
│ 2. ONE-WAY (Preimage Resistance) │
│ Input → Hash ✓ EASY (microseconds) │
│ Hash → Input ✗ IMPOSSIBLE (longer than the universe) │
│ │
│ 3. COLLISION RESISTANT │
│ Finding two different inputs with same hash is infeasible │
│ P(collision) ≈ 1 in 2^128 for SHA-256 │
│ │
│ 4. AVALANCHE EFFECT │
│ Tiny change in input → Completely different output │
│ "hello" → abc123... │
│ "hellp" → xyz789... (no similarity!) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
How Blocks Chain Together via Hashes
THE BLOCKCHAIN DATA STRUCTURE
Block 0 (Genesis) Block 1 Block 2
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Prev Hash: 0000 │ ┌───▶│ Prev Hash: a7f3 │ ┌───▶│ Prev Hash: 8d2e │
├──────────────────┤ │ ├──────────────────┤ │ ├──────────────────┤
│ Timestamp │ │ │ Timestamp │ │ │ Timestamp │
│ Nonce │ │ │ Nonce │ │ │ Nonce │
│ Merkle Root │ │ │ Merkle Root │ │ │ Merkle Root │
├──────────────────┤ │ ├──────────────────┤ │ ├──────────────────┤
│ Transactions │ │ │ Transactions │ │ │ Transactions │
│ - Tx0 │ │ │ - Tx0 │ │ │ - Tx0 │
│ - Tx1 │ │ │ - Tx1 │ │ │ - Tx1 │
│ - ... │ │ │ - ... │ │ │ - ... │
└────────┬─────────┘ │ └────────┬─────────┘ │ └──────────────────┘
│ │ │ │
└─ Hash ──────┘ └─ Hash ──────┘
= a7f3... = 8d2e...
WHY THIS IS TAMPER-PROOF:
─────────────────────────
If you try to change a transaction in Block 1:
1. Block 1's hash changes (avalanche effect)
2. Block 2's "Prev Hash" no longer matches
3. Block 2's hash changes
4. ... all subsequent blocks become invalid
To tamper, you'd need to recalculate ALL subsequent blocks
faster than the honest network adds new ones.
Merkle Trees: Efficient Transaction Verification
A Merkle tree allows you to prove a transaction is in a block without downloading the entire block:
MERKLE TREE STRUCTURE
┌─────────────────┐
│ Merkle Root │ ← Stored in block header
│ H(AB + CD) │ (just 32 bytes!)
└────────┬────────┘
│
┌─────────────────┴─────────────────┐
│ │
┌─────┴─────┐ ┌─────┴─────┐
│ H(A+B) │ │ H(C+D) │
│ Internal │ │ Internal │
└─────┬─────┘ └─────┬─────┘
│ │
┌───────┴───────┐ ┌───────┴───────┐
│ │ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ H(A) │ │ H(B) │ │ H(C) │ │ H(D) │
│ Leaf A │ │ Leaf B │ │ Leaf C │ │ Leaf D │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
│ │ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ Tx A │ │ Tx B │ │ Tx C │ │ Tx D │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
MERKLE PROOF EXAMPLE: Prove Tx B is in the tree
─────────────────────────────────────────────────
You need only 2 hashes (marked with ★):
1. H(A) ★ (to compute H(A+B))
2. H(C+D) ★ (to compute the root)
┌─────────────────┐
│ Merkle Root │ ← Compute and compare to block header
│ H(AB + CD) │
└────────┬────────┘
│
┌────────┴────────┐
│ │
┌───┴───┐ ┌───┴───┐
│H(A+B) │ │H(C+D) │ ★ Given
└───┬───┘ └───────┘
│
┌───┴───┐
│ │
★ ● ← Your transaction (Tx B)
H(A) H(B) You compute H(B) yourself
Verification: O(log n) hashes instead of O(n)
- 1 million transactions: only ~20 hashes needed!
- Light clients can verify without full blockchain
Elliptic Curve Cryptography: How Digital Signatures Work
Bitcoin and Ethereum use the secp256k1 elliptic curve for digital signatures. Here’s why this matters:
THE SECP256K1 CURVE: y² = x³ + 7
│
│ ....
│ ... ...
│ .. ..
│ . .
│ . Point .
────────┼───●──────────────────────────
│ . G (Generator) .
│ . .
│ .. ..
│ ... ...
│ ....
│
HOW KEY GENERATION WORKS:
─────────────────────────
1. Pick a random 256-bit number: your PRIVATE KEY (k)
k = 0x1234567890abcdef... (keep this SECRET!)
2. Multiply Generator Point G by your private key:
PUBLIC KEY = k × G = P (a point on the curve)
3. The magic: Computing k × G is easy (milliseconds)
Reversing P → k is IMPOSSIBLE (billions of years)
This is the "Elliptic Curve Discrete Logarithm Problem" (ECDLP)
Security: 2^128 operations to break = heat death of universe
DIGITAL SIGNATURE (ECDSA):
──────────────────────────
Signing a Transaction:
┌─────────────────────────────────────────────────────────┐
│ │
│ 1. Hash the message: z = SHA256(transaction_data) │
│ │
│ 2. Pick random nonce: k (MUST be unique per signature!)│
│ │
│ 3. Calculate R = k × G (a curve point) │
│ r = R.x mod n (x-coordinate of R) │
│ │
│ 4. Calculate s = k⁻¹(z + r × private_key) mod n │
│ │
│ 5. Signature = (r, s) │
│ │
└─────────────────────────────────────────────────────────┘
Verifying a Signature (anyone can do this!):
┌─────────────────────────────────────────────────────────┐
│ │
│ Given: message, signature (r, s), public key P │
│ │
│ 1. Hash the message: z = SHA256(message) │
│ │
│ 2. Calculate: u₁ = z × s⁻¹ mod n │
│ u₂ = r × s⁻¹ mod n │
│ │
│ 3. Calculate point: R' = u₁ × G + u₂ × P │
│ │
│ 4. Signature valid if: R'.x mod n == r │
│ │
└─────────────────────────────────────────────────────────┘
WHY THIS IS SECURE:
───────────────────
- Only the private key holder can create valid signatures
- Anyone can verify with just the public key
- The signature is unique to that exact message
- Change one bit of the message → signature becomes invalid
Bitcoin’s UTXO Model vs Ethereum’s Account Model
These are the two fundamental ways blockchains track “who owns what”:
BITCOIN: UTXO MODEL
════════════════════
Think of it like CASH - physical bills you receive and spend:
You have these UTXOs (Unspent Transaction Outputs):
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ UTXO #1 │ │ UTXO #2 │ │ UTXO #3 │
│ 0.5 BTC │ │ 0.3 BTC │ │ 0.2 BTC │
│ From: tx_abc │ │ From: tx_def │ │ From: tx_ghi │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Total: 1.0 BTC (but stored as 3 separate "bills")
To send 0.6 BTC to Alice:
─────────────────────────
INPUTS (consumed) OUTPUTS (created)
┌─────────────────────────┐ ┌─────────────────────────┐
│ UTXO #1: 0.5 BTC ───────┼───▶│ Alice: 0.6 BTC (new) │
│ UTXO #2: 0.3 BTC ───────┼───▶│ Change: 0.199 BTC (new)│
└─────────────────────────┘ │ Fee: 0.001 BTC │
0.8 BTC consumed └─────────────────────────┘
0.8 BTC distributed
After: You own 1 UTXO (the 0.199 BTC change + UTXO #3)
═══════════════════════════════════════════════════════════════
ETHEREUM: ACCOUNT MODEL
════════════════════════
Think of it like a BANK ACCOUNT - one balance that gets updated:
Global State (simplified):
┌───────────────────────────────────────────────────────────┐
│ Account │ Balance │ Nonce │ Storage │
├─────────────────────────────┼──────────┼───────┼─────────┤
│ 0xAlice... │ 5.0 ETH │ 12 │ - │
│ 0xBob... │ 3.2 ETH │ 45 │ - │
│ 0xUniswap... (contract) │ 1000 ETH │ 1 │ {...} │
└───────────────────────────────────────────────────────────┘
To send 2.0 ETH from Alice to Bob:
──────────────────────────────────
Before: Alice = 5.0 ETH, Bob = 3.2 ETH, Alice.nonce = 12
After: Alice = 2.9 ETH, Bob = 5.2 ETH, Alice.nonce = 13
▲ ▲
│ │
(2.0 sent + 0.1 fee) (prevents replay attacks)
═══════════════════════════════════════════════════════════════
COMPARISON
══════════
┌─────────────────────┬──────────────────────┬────────────────────────┐
│ Property │ UTXO (Bitcoin) │ Account (Ethereum) │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Privacy │ Better (new address │ Worse (single address │
│ │ for each UTXO) │ easy to track) │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Parallelism │ Excellent (UTXOs are │ Limited (account state │
│ │ independent) │ is sequential) │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Smart Contracts │ Limited (Script is │ Excellent (Turing- │
│ │ not Turing-complete)│ complete EVM) │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Complexity │ Higher (manage many │ Lower (simple balance) │
│ │ UTXOs) │ │
├─────────────────────┼──────────────────────┼────────────────────────┤
│ Double-Spend Check │ Check if UTXO exists │ Check nonce sequence │
│ │ and unspent │ │
└─────────────────────┴──────────────────────┴────────────────────────┘
Proof of Work: Making Cheating Expensive
Proof of Work is the original consensus mechanism that made Bitcoin possible:
HOW PROOF OF WORK OPERATES
The Mining Puzzle:
──────────────────
Find a nonce such that:
SHA256(block_header + nonce) < TARGET
Where TARGET is adjusted so this takes ~10 minutes on average
for the entire network combined.
Example:
────────
Block header data: "prev_hash=abc123, merkle_root=def456, timestamp=..."
Miner tries:
nonce=0: SHA256(...) = f8a3b2c1... ❌ Too high!
nonce=1: SHA256(...) = e9d4c5a6... ❌ Too high!
nonce=2: SHA256(...) = d7e8f901... ❌ Too high!
... millions of attempts ...
nonce=8372631: SHA256(...) = 0000000000abc... ✓ Below target!
This took BILLIONS of hash operations (expensive!)
But verifying is just ONE hash (cheap!)
The Difficulty Target:
──────────────────────
┌─────────────────────────────────────────────────────────────────┐
│ │
│ TARGET (determines how many leading zeros required) │
│ │
│ Difficulty 1: 00000000ffffffff... (easiest) │
│ Difficulty 10: 000000000fffffff... │
│ Difficulty 100: 0000000000ffffff... │
│ Bitcoin 2024: 00000000000000000000... (very hard!) │
│ │
│ Adjustment: Every 2016 blocks (~2 weeks) │
│ - If blocks came too fast → increase difficulty │
│ - If blocks came too slow → decrease difficulty │
│ - Goal: maintain ~10 minute average block time │
│ │
└─────────────────────────────────────────────────────────────────┘
Why This Creates Consensus:
───────────────────────────
┌─────────────────────────────────────────────────────────────────┐
│ │
│ 1. Miners compete to find valid blocks │
│ │
│ 2. First valid block gets broadcast to network │
│ │
│ 3. Other miners verify (cheap!) and accept │
│ │
│ 4. Miners start building on the new longest chain │
│ │
│ 5. Block reward (currently 3.125 BTC) incentivizes honesty │
│ │
│ │
│ Fork Resolution: LONGEST CHAIN WINS │
│ │
│ ┌────┐ ┌────┐ ┌────┐ │
│ │ B1 │────▶│ B2 │────▶│ B3 │──┬──▶ ✓ This chain wins │
│ └────┘ └────┘ └────┘ │ (more work) │
│ │ │ │
│ ▼ │ │
│ ┌────┐ │ │
│ │ B3'│───┘ ✗ Orphaned │
│ └────┘ (less work) │
│ │
└─────────────────────────────────────────────────────────────────┘
51% Attack Economics:
─────────────────────
To rewrite history, an attacker needs >50% of network hash rate:
- Current Bitcoin network: ~500 EH/s (500 quintillion hashes/sec)
- Cost of equipment: ~$10 billion
- Electricity: ~$20 million per day
- And you'd destroy the value of what you're trying to steal!
The attack is possible but economically irrational.
The Ethereum Virtual Machine: A World Computer
Ethereum extends Bitcoin’s vision by adding a Turing-complete virtual machine:
EVM ARCHITECTURE
┌─────────────────────────────────────────────────────────────────┐
│ ETHEREUM VIRTUAL MACHINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ STACK │ │
│ │ ┌────┬────┬────┬────┬────┬────┬─────────────────────┐ │ │
│ │ │ 32 │ 31 │ 30 │ 29 │ 28 │ .. │ 0 │ │ │
│ │ │byte│byte│byte│byte│byte│ │ (top) │ │ │
│ │ └────┴────┴────┴────┴────┴────┴─────────────────────┘ │ │
│ │ Max depth: 1024 items, each 256-bits (32 bytes) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ MEMORY │ │
│ │ ┌────┬────┬────┬────┬────┬────┬────┬────┬────────────┐ │ │
│ │ │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ .. │ n │ ∞ │ │ │
│ │ └────┴────┴────┴────┴────┴────┴────┴────┴────────────┘ │ │
│ │ Linear byte array, volatile (cleared after execution) │ │
│ │ Cost: grows quadratically (more memory = more gas) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ STORAGE │ │
│ │ ┌──────────────────────┬───────────────────────────┐ │ │
│ │ │ Key (256-bit) │ Value (256-bit) │ │ │
│ │ ├──────────────────────┼───────────────────────────┤ │ │
│ │ │ 0x0000...0000 │ contract_owner_address │ │ │
│ │ │ 0x0000...0001 │ total_supply │ │ │
│ │ │ keccak(user, slot) │ user_balance │ │ │
│ │ └──────────────────────┴───────────────────────────┘ │ │
│ │ Key-value store, PERSISTENT across transactions │ │
│ │ Most expensive operation! (20,000 gas to write) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
EVM EXECUTION MODEL:
────────────────────
Program Counter ──▶ ┌──────────────────────────────────────────┐
│ 0x60 0x80 0x60 0x40 0x52 0x34 ... │
│ PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE │
└──────────────────────────────────────────┘
▲
│
Each byte is an OPCODE or data
COMMON OPCODES:
───────────────
┌───────────┬────────┬────────────────────────────────────────────┐
│ Opcode │ Gas │ Description │
├───────────┼────────┼────────────────────────────────────────────┤
│ ADD │ 3 │ Pop 2 values, push their sum │
│ MUL │ 5 │ Pop 2 values, push their product │
│ SUB │ 3 │ Pop 2 values, push their difference │
│ DIV │ 5 │ Pop 2 values, push their quotient │
├───────────┼────────┼────────────────────────────────────────────┤
│ PUSH1 │ 3 │ Push 1 byte onto stack │
│ PUSH32 │ 3 │ Push 32 bytes onto stack │
│ POP │ 2 │ Remove top stack item │
│ DUP1 │ 3 │ Duplicate top stack item │
│ SWAP1 │ 3 │ Swap top 2 stack items │
├───────────┼────────┼────────────────────────────────────────────┤
│ MLOAD │ 3 │ Load word from memory │
│ MSTORE │ 3 │ Store word in memory │
│ SLOAD │ 100 │ Load word from storage (cold access 2100) │
│ SSTORE │ 20000 │ Store word in storage (most expensive!) │
├───────────┼────────┼────────────────────────────────────────────┤
│ JUMP │ 8 │ Jump to code location │
│ JUMPI │ 10 │ Conditional jump │
│ CALL │ 100+ │ Call another contract │
│ RETURN │ 0 │ End execution, return data │
└───────────┴────────┴────────────────────────────────────────────┘
EXAMPLE: Simple Addition in Bytecode
────────────────────────────────────
Solidity: function add(uint a, uint b) returns (uint) { return a + b; }
Compiles to something like:
PUSH1 0x00 ; Push 0 (result location)
CALLDATALOAD ; Load 'a' from calldata
PUSH1 0x20 ; Push 32 (offset for 'b')
CALLDATALOAD ; Load 'b' from calldata
ADD ; a + b
PUSH1 0x00 ; Push return offset
MSTORE ; Store result in memory
PUSH1 0x20 ; Push return size (32 bytes)
PUSH1 0x00 ; Push return offset
RETURN ; Return the result
Stack trace:
[] ; Start empty
[0x00] ; PUSH1 0x00
[a] ; CALLDATALOAD (replaces 0x00 with value at offset 0)
[a, 0x20] ; PUSH1 0x20
[a, b] ; CALLDATALOAD (loads value at offset 32)
[a+b] ; ADD
...
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Hash functions | SHA-256 is deterministic, one-way, collision-resistant. Blocks chain via hashes. Changing one bit invalidates everything after. |
| Merkle trees | Binary tree of hashes. Proves inclusion in O(log n). Enables light clients and efficient verification. |
| Elliptic curves (secp256k1) | Private key × Generator Point = Public key. Easy to compute, impossible to reverse. Foundation of digital signatures. |
| ECDSA signatures | Prove ownership without revealing private key. Unique per message. Anyone can verify. |
| UTXO vs Account model | Bitcoin tracks unspent outputs (like cash). Ethereum tracks balances (like bank accounts). Trade-offs in privacy, parallelism, and expressiveness. |
| Proof of Work | Find nonce where hash < target. Expensive to create, cheap to verify. Longest chain wins. 51% attack is economically irrational. |
| The EVM | Stack-based VM with 256-bit words. Stack (1024 deep), Memory (volatile), Storage (persistent). Gas measures computation. Opcodes are single bytes. |
| Consensus | Agreement without central authority. PoW uses energy, PoS uses stake. Both make attacks expensive. |
Deep Dive Reading by Concept
This section maps each concept to specific book chapters for deeper understanding. Read these before or alongside the projects to build strong mental models.
Cryptographic Foundations
| Concept | Book & Chapter |
|---|---|
| Hash function internals | Serious Cryptography, 2nd Edition by Jean-Philippe Aumasson — Ch. 6: “Hash Functions” |
| Merkle trees and proofs | Mastering Bitcoin, 3rd Edition by Andreas Antonopoulos — Ch. 11: “The Blockchain” |
| Elliptic curve math | Programming Bitcoin by Jimmy Song — Ch. 2-3: “Elliptic Curves” and “Elliptic Curve Cryptography” |
| ECDSA signatures | Programming Bitcoin by Jimmy Song — Ch. 4: “Serialization” and Ch. 5: “Transactions” |
| Cryptographic primitives overview | Practical Cryptography for Developers (online) by Svetlin Nakov |
Bitcoin Internals
| Concept | Book & Chapter |
|---|---|
| Transaction structure | Mastering Bitcoin, 3rd Edition — Ch. 6: “Transactions” |
| UTXO model deep dive | Programming Bitcoin by Jimmy Song — Ch. 5-7: Transactions and Script |
| Bitcoin Script opcodes | Programming Bitcoin by Jimmy Song — Ch. 6: “Script” |
| Block structure | Mastering Bitcoin, 3rd Edition — Ch. 11: “The Blockchain” |
| Proof of Work mining | Programming Bitcoin by Jimmy Song — Ch. 9: “Blocks” |
Ethereum & Smart Contracts
| Concept | Book & Chapter |
|---|---|
| EVM architecture | Mastering Ethereum by Antonopoulos & Wood — Ch. 13: “The Ethereum Virtual Machine” |
| Smart contract basics | Mastering Ethereum — Ch. 7: “Smart Contracts and Solidity” |
| Gas and execution model | Mastering Ethereum — Ch. 13 (Gas section) |
| Account model | Mastering Ethereum — Ch. 4: “Cryptography” and Ch. 5: “Wallets” |
Distributed Systems & Consensus
| Concept | Book & Chapter |
|---|---|
| Byzantine Fault Tolerance | Designing Data-Intensive Applications by Martin Kleppmann — Ch. 8-9: “Distributed Systems Trouble” and “Consistency and Consensus” |
| P2P networking | Computer Networks by Tanenbaum & Wetherall — Ch. 5: “Network Layer” |
| Consensus algorithms | Designing Data-Intensive Applications — Ch. 9: “Consistency and Consensus” |
| Proof of Stake | Ethereum Casper papers and Vitalik’s blog posts |
Building Virtual Machines
| Concept | Book & Chapter |
|---|---|
| Stack machine architecture | Crafting Interpreters by Robert Nystrom — Part III: “A Bytecode Virtual Machine” |
| Bytecode design | Crafting Interpreters — Ch. 14-15: “Chunks of Bytecode” and “A Virtual Machine” |
| Compiler construction | Writing a C Compiler by Nora Sandler — Full book |
Essential Reading Order
For maximum comprehension, read in this order:
- Foundation (Week 1):
- Programming Bitcoin Ch. 1-4 (field math, curves, serialization)
- Mastering Bitcoin Ch. 6 (transactions)
- Bitcoin Deep Dive (Week 2):
- Programming Bitcoin Ch. 5-9 (transactions, script, blocks)
- Mastering Bitcoin Ch. 11 (blockchain)
- Ethereum (Week 3):
- Mastering Ethereum Ch. 4-7 (crypto, wallets, contracts)
- Mastering Ethereum Ch. 13 (EVM)
- Distributed Systems (Week 4):
- Designing Data-Intensive Applications Ch. 8-9
Project 1: Build Bitcoin From Scratch
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Programming Language: Python
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 5: Master
- Knowledge Area: Blockchain / Cryptography
- Software or Tool: Bitcoin
- Main Book: “Programming Bitcoin” by Jimmy Song
What you’ll build: A complete Bitcoin library implementing elliptic curve cryptography, transactions, blocks, script parsing, and network communication—all from first principles in Python.
Why it teaches blockchain: This forces you to understand why Bitcoin works, not just that it works. You can’t fake your way through implementing ECDSA signatures or parsing transaction scripts. Every line of code confronts you with a design decision Satoshi made.
Core challenges you’ll face:
- Finite field arithmetic → Maps to understanding why Bitcoin uses secp256k1 curve
- Elliptic curve point operations → Maps to how public keys derive from private keys
- Transaction serialization → Maps to how data is encoded on-chain
- Script interpreter → Maps to Bitcoin’s programmability model
- Merkle tree construction → Maps to how blocks efficiently prove transaction inclusion
- Block header hashing → Maps to proof-of-work mining
Resources for key challenges:
- “Programming Bitcoin” by Jimmy Song - THE definitive hands-on guide; each chapter builds the library piece by piece with exercises
- Bitcoin Whitepaper - Read alongside implementation to see theory meet practice
Key Concepts:
- Elliptic Curve Cryptography: “Programming Bitcoin” Ch. 2-3 - Jimmy Song
- Transaction Structure: “Mastering Bitcoin” Ch. 6 - Andreas Antonopoulos
- Script Opcodes: “Programming Bitcoin” Ch. 6 - Jimmy Song
- Merkle Trees: “Mastering Bitcoin” Ch. 11 - Andreas Antonopoulos
- Proof of Work: “Programming Bitcoin” Ch. 9 - Jimmy Song
Difficulty: Advanced Time estimate: 1 month+ Prerequisites: Python proficiency, basic number theory helps
Learning milestones:
- Generate valid Bitcoin addresses - You understand public key cryptography
- Parse and create transactions - You understand the UTXO model
- Validate a real block - You understand proof-of-work and Merkle proofs
- Interpret Script opcodes - You understand Bitcoin’s programmability
Real World Outcome
When you complete this project, you’ll have a fully functional Bitcoin library that you wrote from scratch. Here’s exactly what you’ll be able to do:
1. Generate Real Bitcoin Addresses
$ python bitcoin_cli.py generate-wallet
Private Key (WIF): 5HueCGU8rMjxEXxiPuD5BDku4MkFqeZyd4dZ1jvhTVqvbTLvyTJ
Public Key (compressed): 02d0de0aaeaefad02b8bdc8a01a1b8b11c696bd3d66a2c5f10780d95b7df42645c
Bitcoin Address (P2PKH): 1GAehh7TsJAHuUAeKZcXf5CnwuGuGgyX2S
This is a REAL Bitcoin address on mainnet!
Send 0.00001 BTC to it to verify (you can recover with the private key above)
2. Parse and Decode Real Transactions from the Blockchain
$ python bitcoin_cli.py decode-tx 0100000001c997a5e56e104102fa209c6a852dd90660a20b2d9c352423edce25857fcd3704000000...
TRANSACTION DECODED:
════════════════════════════════════════════════════════
Version: 1
Locktime: 0
INPUTS (1):
[0] Previous TX: 0437cd7f8525cede2e4a0b40b1f6e3c7...
Output Index: 0
ScriptSig: 47304402204e45e16932b8af514961...
Sequence: 0xffffffff
OUTPUTS (2):
[0] Value: 0.10000000 BTC (10,000,000 satoshis)
ScriptPubKey: OP_DUP OP_HASH160 <pubkeyhash> OP_EQUALVERIFY OP_CHECKSIG
Type: P2PKH (Pay to Public Key Hash)
Address: 1runeksijzfVxyrpiyCY2LCBvYsSi1Ai6
[1] Value: 0.08950000 BTC (Change)
ScriptPubKey: OP_DUP OP_HASH160 <pubkeyhash> OP_EQUALVERIFY OP_CHECKSIG
Address: 1QJtPTVJjkqVALLzCLF9kCLJYN4C4GzK2c
Transaction ID: e4c226432e...
Size: 225 bytes
════════════════════════════════════════════════════════
3. Create and Sign Your Own Transactions
$ python bitcoin_cli.py create-tx \
--from-utxo "tx_id:0" \
--to "1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2:0.001" \
--change "1MyAddress..." \
--private-key "your_wif_key"
SIGNED TRANSACTION:
Raw Hex: 0100000001eccf7e3034189b851985d871f91384b8ee357cd47c3024736e...
Breakdown:
- Input signed with ECDSA (your implementation!)
- Signature verified: ✓ VALID
- Ready to broadcast to network
You can paste this hex into any block explorer's "broadcast" feature
4. Validate Real Blocks from the Bitcoin Network
$ python bitcoin_cli.py validate-block 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
BLOCK 0 (Genesis Block) VALIDATION:
════════════════════════════════════════════════════════
Header Hash: 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
^^^^^^^^ (Leading zeros = Proof of Work!)
Version: 1
Previous Block: 0000000000000000000000000000000000000000000000000000000000000000
Merkle Root: 4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b
Timestamp: 2009-01-03 18:15:05 UTC
Difficulty Bits: 0x1d00ffff
Nonce: 2083236893
VALIDATION RESULTS:
✓ Block hash below target (PoW valid)
✓ Merkle root matches transactions
✓ Coinbase transaction present
✓ Block structure valid
"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"
- Satoshi's message embedded in the coinbase!
════════════════════════════════════════════════════════
5. Execute Bitcoin Script Programs
$ python bitcoin_cli.py run-script "OP_2 OP_3 OP_ADD OP_5 OP_EQUAL"
SCRIPT EXECUTION TRACE:
════════════════════════════════════════════════════════
Step 1: OP_2 Stack: [2]
Step 2: OP_3 Stack: [2, 3]
Step 3: OP_ADD Stack: [5] (popped 2 and 3, pushed 5)
Step 4: OP_5 Stack: [5, 5]
Step 5: OP_EQUAL Stack: [1] (1 = TRUE)
RESULT: ✓ Script executed successfully (stack top is truthy)
════════════════════════════════════════════════════════
The Core Question You’re Answering
“How does Bitcoin actually work at the byte level? How does a private key become an address? How does a transaction prove ownership without revealing the private key?”
Before you write any code, sit with these questions. Most developers know “Bitcoin uses cryptography” but can’t explain how a 256-bit random number becomes a valid Bitcoin address, or why you can prove you own coins without ever revealing your private key.
The answer involves:
- Finite field arithmetic (modular math in a prime field)
- Elliptic curve point multiplication (how a number becomes a curve point)
- Cryptographic hash chains (how addresses are derived)
- Digital signature math (how ECDSA proves knowledge without revelation)
Concepts You Must Understand First
Stop and research these before coding:
- Finite Fields (Modular Arithmetic)
- What does “mod p” mean and why is it useful for cryptography?
- What is a “field” in abstract algebra? Why does Bitcoin use prime fields?
- How do you compute modular inverses? (Extended Euclidean Algorithm)
- Why does Fermat’s Little Theorem give us a^(p-1) ≡ 1 (mod p)?
- Book Reference: “Programming Bitcoin” Ch. 1 - Jimmy Song
- Elliptic Curves over Finite Fields
- What is an elliptic curve equation? (y² = x³ + ax + b)
- What does “point addition” mean geometrically?
- What is the “point at infinity” and why do we need it?
- What makes secp256k1 special? (y² = x³ + 7)
- Book Reference: “Programming Bitcoin” Ch. 2-3 - Jimmy Song
- Digital Signatures (ECDSA)
- Why can signing prove ownership without revealing the private key?
- What is a “nonce” and why MUST it never be reused? (Sony PlayStation 3 hack!)
- What do r and s in a signature represent?
- How does verification work using only the public key?
- Book Reference: “Understanding and Using C Pointers” Ch. 1-2 for pointer intuition; “Programming Bitcoin” Ch. 4 - Jimmy Song for signatures
- Transaction Structure (UTXOs)
- What is a UTXO (Unspent Transaction Output)?
- Why does Bitcoin use inputs/outputs rather than account balances?
- What is a “locking script” (scriptPubKey) vs “unlocking script” (scriptSig)?
- How does OP_CHECKSIG actually verify a signature?
- Book Reference: “Mastering Bitcoin” Ch. 6 - Andreas Antonopoulos
- Bitcoin Script
- What is a stack-based language?
- Why did Satoshi make Script intentionally NOT Turing-complete?
- What are the most important opcodes: OP_DUP, OP_HASH160, OP_EQUALVERIFY, OP_CHECKSIG?
- How does P2PKH (Pay-to-Public-Key-Hash) work?
- Book Reference: “Programming Bitcoin” Ch. 6 - Jimmy Song
- Block Structure and Merkle Trees
- What is in a block header? (version, prev_hash, merkle_root, timestamp, bits, nonce)
- How does the Merkle root commit to all transactions?
- What is the “difficulty target” and how is it encoded in 4 bytes?
- Why does finding a valid nonce take billions of attempts?
- Book Reference: “Programming Bitcoin” Ch. 9 - Jimmy Song; “Mastering Bitcoin” Ch. 11 - Antonopoulos
Questions to Guide Your Design
Before implementing, think through these:
- Finite Field Class
- How will you represent a finite field element?
- How do you ensure all operations stay within the field (mod p)?
- What’s the most efficient way to compute modular exponentiation?
- How will you handle division (modular inverse)?
- Elliptic Curve Point Class
- How do you represent the “point at infinity”?
- How do you handle the special case where two points have the same x-coordinate?
- How do you efficiently compute k × G for large k? (Hint: double-and-add)
- How will you distinguish between compressed and uncompressed public keys?
- Transaction Serialization
- Why does Bitcoin use little-endian for some fields and big-endian for others?
- How do you parse variable-length integers (varints)?
- What is the “signature hash” and why is it different from the transaction hash?
- How do you handle witness data (SegWit)?
- Script Interpreter
- How will you represent opcodes?
- How do you handle conditional operators (OP_IF, OP_ELSE)?
- What should happen when a script fails?
- How do you handle OP_CHECKMULTISIG’s famous off-by-one bug?
- Block Validation
- How do you check if a block hash meets the difficulty target?
- How do you verify the Merkle root matches the transactions?
- What order should you validate things in for efficiency?
Thinking Exercise
Before coding, work through this on paper:
Exercise 1: Trace Key Derivation
Private Key (random 256-bit number):
k = 0x1 (simplest example)
Generator Point G on secp256k1:
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
Public Key P = k × G = 1 × G = G (when k=1, P equals G)
Now trace what happens:
1. Serialize P as compressed (33 bytes) or uncompressed (65 bytes)
2. SHA256(serialized_P) → 32 bytes
3. RIPEMD160(sha256_result) → 20 bytes (the "hash160")
4. Prepend version byte (0x00 for mainnet)
5. SHA256(SHA256(version + hash160)) → take first 4 bytes as checksum
6. Base58Check encode (version + hash160 + checksum)
7. Result: Bitcoin address!
Exercise 2: Trace a Transaction
Given these UTXOs you control:
- UTXO 1: 0.5 BTC from tx_aaa, output 0
- UTXO 2: 0.3 BTC from tx_bbb, output 1
You want to send 0.6 BTC to address 1Bob...
Fee: 0.001 BTC
Work out:
1. Which UTXOs do you need to spend?
2. What is the change amount?
3. What does the raw transaction look like (draw the structure)?
4. What exactly gets signed? (The signature hash)
5. Where does the signature go in the final transaction?
The Interview Questions They’ll Ask
Prepare to answer these:
- “Walk me through what happens when you send a Bitcoin transaction.”
- “How does ECDSA prove you own a private key without revealing it?”
- “What is a UTXO and why did Satoshi choose this model over account balances?”
- “What makes Bitcoin Script different from a Turing-complete language?”
- “How does proof-of-work prevent double-spending?”
- “What would happen if someone reused a nonce in two different signatures?”
- “How does a Merkle tree allow light clients to verify transactions?”
- “What’s the difference between a transaction ID and the data that gets signed?”
- “Why does Bitcoin have both P2PKH and P2SH? What problem does P2SH solve?”
- “How is the difficulty target encoded in 4 bytes?”
Hints in Layers
Hint 1: Start with Finite Fields Build and test your finite field arithmetic first:
class FieldElement:
def __init__(self, num, prime):
self.num = num % prime
self.prime = prime
def __add__(self, other):
return FieldElement((self.num + other.num) % self.prime, self.prime)
def __pow__(self, exponent):
# Fermat's Little Theorem for negative exponents
n = exponent % (self.prime - 1)
return FieldElement(pow(self.num, n, self.prime), self.prime)
Test: Verify that a * a^(-1) = 1 for various values.
Hint 2: Point Addition Has Edge Cases The formula for adding two points depends on whether:
- Points are the same (point doubling)
- Points have the same x-coordinate (result is infinity)
- One point is the point at infinity
Draw the geometric picture before coding!
Hint 3: Signature Hash Is Not Transaction Hash When signing, you don’t sign the transaction—you sign a modified version where the input’s scriptSig is replaced with the previous output’s scriptPubKey. This tripped up many early implementers.
Hint 4: Test Against Real Data The book “Programming Bitcoin” includes test vectors from the real Bitcoin network. Use them! If your transaction parser can’t decode real transactions, something is wrong.
Hint 5: Use the Debug Flag Add verbose logging to your Script interpreter:
OP_DUP Stack: [pubkey] → [pubkey, pubkey]
OP_HASH160 Stack: [pubkey, pubkey] → [pubkey, hash160(pubkey)]
This makes debugging script execution much easier.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Complete Bitcoin implementation | Programming Bitcoin by Jimmy Song | Full book (follow along!) |
| Bitcoin transaction concepts | Mastering Bitcoin, 3rd Edition by Andreas Antonopoulos | Ch. 6: “Transactions” |
| Block structure and mining | Mastering Bitcoin, 3rd Edition | Ch. 10-11: “Mining” and “The Blockchain” |
| Elliptic curve cryptography math | Serious Cryptography, 2nd Edition by Jean-Philippe Aumasson | Ch. 11-12: “Public-Key Cryptography” |
| Hash function properties | Serious Cryptography, 2nd Edition | Ch. 6: “Hash Functions” |
| Bitcoin whitepaper context | The Book of Satoshi by Phil Champagne | Historical context and design decisions |
| Number theory foundations | An Introduction to Mathematical Cryptography by Hoffstein, Pipher, Silverman | Ch. 1-2: Modular arithmetic |
Project 2: Build a Minimal Blockchain in a Weekend
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Programming Language: C
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Blockchain / Consensus
- Software or Tool: Blockchain concepts
- Main Book: “Mastering Bitcoin” by Andreas Antonopoulos
What you’ll build: A simple blockchain with proof-of-work consensus, transaction pool, and P2P gossip—the essential skeleton that makes all blockchains tick.
Why it teaches blockchain: Before diving into Bitcoin/Ethereum complexity, you need the core mental model: blocks chain together via hashes, nodes gossip transactions, and PoW makes forgery expensive. This project isolates those fundamentals.
Core challenges you’ll face:
- Chain integrity via hashes → Maps to why tampering is detectable
- Difficulty adjustment → Maps to why block times stay consistent
- Fork resolution → Maps to why “longest chain wins”
- Transaction validation → Maps to preventing double-spends
- Gossip protocol → Maps to how decentralization works
Key Concepts:
- Hash Functions: “Serious Cryptography, 2nd Edition” Ch. 6 - Jean-Philippe Aumasson
- Distributed Consensus: “Designing Data-Intensive Applications” Ch. 8-9 - Martin Kleppmann
- P2P Networking: “Computer Networks” Ch. 5 - Tanenbaum & Wetherall
Difficulty: Intermediate Time estimate: Weekend to 1 week Prerequisites: Any programming language, basic networking concepts
Learning milestones:
- Single node mines blocks - You understand PoW mechanics
- Two nodes sync chains - You understand gossip and fork resolution
- Transactions propagate and confirm - You understand the mempool and block inclusion
Real World Outcome
When you complete this project, you’ll have a working distributed blockchain running across multiple terminals (or machines). Here’s exactly what you’ll see:
1. Start Your First Node (Terminal 1)
$ ./blockchain-node --port 3000 --mine
╔═══════════════════════════════════════════════════════════════════╗
║ MINIMAL BLOCKCHAIN NODE v1.0 ║
║ Listening on port 3000 ║
╚═══════════════════════════════════════════════════════════════════╝
[2024-12-22 14:30:01] Genesis block created
Hash: 0000a1b2c3d4e5f6...
Difficulty: 4 (4 leading zeros required)
[2024-12-22 14:30:01] Starting mining thread...
[2024-12-22 14:30:01] Mining block 1...
[2024-12-22 14:30:03] Nonce attempt: 1000000
[2024-12-22 14:30:05] Nonce attempt: 2000000
[2024-12-22 14:30:08] ✓ BLOCK MINED!
Block #1
Hash: 0000f8e7d6c5b4a3...
Nonce: 2847291
Transactions: 1 (coinbase only)
Mining took: 7.2 seconds
[2024-12-22 14:30:08] Mining block 2...
2. Start a Second Node and Watch Synchronization (Terminal 2)
$ ./blockchain-node --port 3001 --peer localhost:3000
╔═══════════════════════════════════════════════════════════════════╗
║ MINIMAL BLOCKCHAIN NODE v1.0 ║
║ Listening on port 3001 ║
╚═══════════════════════════════════════════════════════════════════╝
[2024-12-22 14:31:00] Connecting to peer: localhost:3000
[2024-12-22 14:31:00] ← Received: CHAIN_REQUEST
[2024-12-22 14:31:00] → Sending: CHAIN_RESPONSE (3 blocks)
[2024-12-22 14:31:00] ✓ Synchronized with peer
Local chain: 3 blocks
Peer chain: 3 blocks
[2024-12-22 14:31:00] CURRENT CHAIN STATE:
┌─────────────────────────────────────────────────────────────────┐
│ Block 0 (Genesis) │
│ Hash: 0000a1b2c3d4e5f6789... │
│ Prev: 0000000000000000000... │
│ Txns: 0 │
├─────────────────────────────────────────────────────────────────┤
│ Block 1 │
│ Hash: 0000f8e7d6c5b4a3210... │
│ Prev: 0000a1b2c3d4e5f6789... │
│ Txns: 1 │
├─────────────────────────────────────────────────────────────────┤
│ Block 2 │
│ Hash: 00003c4d5e6f7a8b9c0... │
│ Prev: 0000f8e7d6c5b4a3210... │
│ Txns: 1 │
└─────────────────────────────────────────────────────────────────┘
3. Submit a Transaction and Watch It Propagate
$ ./blockchain-cli --node localhost:3001 send --from alice --to bob --amount 50
TRANSACTION SUBMITTED:
════════════════════════════════════════════════════════════════════
TX ID: tx_7f8a9b0c1d2e3f4a5b6c...
From: alice
To: bob
Amount: 50 coins
[2024-12-22 14:32:00] → Broadcasting to 2 connected peers...
[2024-12-22 14:32:00] ✓ Peer localhost:3000 acknowledged
[2024-12-22 14:32:00] ✓ Peer localhost:3002 acknowledged
Status: IN MEMPOOL (waiting for block inclusion)
════════════════════════════════════════════════════════════════════
# On Node 1 (Terminal 1), you'll see:
[2024-12-22 14:32:00] ← Received TX: tx_7f8a9b0c... (alice → bob: 50)
[2024-12-22 14:32:00] ✓ TX validated and added to mempool
[2024-12-22 14:32:00] Mempool size: 1 transaction
# When the block is mined:
[2024-12-22 14:32:15] ✓ BLOCK MINED!
Block #4
Hash: 0000abc123def456...
Transactions: 2 (1 coinbase + 1 user tx)
Including: tx_7f8a9b0c... (alice → bob: 50)
# On all nodes:
[2024-12-22 14:32:15] ← Received BLOCK: #4 from peer
[2024-12-22 14:32:15] ✓ Block validated and added to chain
[2024-12-22 14:32:15] TX tx_7f8a9b0c... now has 1 confirmation
4. Watch a Fork Happen and Resolve
# Start two miners simultaneously, disconnect them, let them each mine 2 blocks,
# then reconnect and watch the shorter chain get abandoned:
[2024-12-22 14:35:00] ⚠ FORK DETECTED!
Local chain: blocks 4a → 5a (total work: 12847291)
Remote chain: blocks 4b → 5b → 6b (total work: 19283746)
[2024-12-22 14:35:00] Remote chain has more work. REORGANIZING...
[2024-12-22 14:35:00] ✗ Orphaning block 5a (hash: 0000xyz...)
[2024-12-22 14:35:00] ✗ Orphaning block 4a (hash: 0000uvw...)
[2024-12-22 14:35:00] ✓ Adopting block 4b (hash: 0000rst...)
[2024-12-22 14:35:00] ✓ Adopting block 5b (hash: 0000opq...)
[2024-12-22 14:35:00] ✓ Adopting block 6b (hash: 0000lmn...)
[2024-12-22 14:35:00] Reorganization complete. Chain tip is now block 6b.
[2024-12-22 14:35:00] Returning 3 transactions from orphaned blocks to mempool.
5. Query Blockchain State
$ ./blockchain-cli --node localhost:3000 status
BLOCKCHAIN STATUS
════════════════════════════════════════════════════════════════════
Chain height: 12 blocks
Total difficulty: 48 (4 zeros × 12 blocks)
Chain work: 142,847,291 total hash attempts
Connected peers: 3
- localhost:3001 (height: 12)
- localhost:3002 (height: 12)
- 192.168.1.50:3000 (height: 12)
Mempool: 2 pending transactions
- tx_abc123... (alice → carol: 25)
- tx_def456... (bob → dave: 10)
ACCOUNT BALANCES:
alice: 425 coins (from mining + transfers)
bob: 50 coins
carol: 0 coins (25 pending)
miner1: 600 coins (block rewards)
════════════════════════════════════════════════════════════════════
The Core Question You’re Answering
“How do distributed nodes agree on a single version of truth without a central authority? Why can’t someone just create a fake blockchain?”
This is the fundamental question that Bitcoin solved. Before you write any code, understand:
- Why does hashing create “chains”? (Each block commits to all previous blocks)
- Why does Proof of Work create “irreversibility”? (Rewriting requires redoing all work)
- Why does “longest chain wins” create consensus? (Honest majority outpaces attackers)
- Why does gossip create “decentralization”? (No single point of failure)
Concepts You Must Understand First
Stop and research these before coding:
- Cryptographic Hash Functions
- What makes SHA-256 “secure”? (Preimage resistance, collision resistance)
- Why does changing one bit change the entire hash? (Avalanche effect)
- How do you verify data integrity with a hash?
- Book Reference: “Serious Cryptography, 2nd Edition” Ch. 6 - Jean-Philippe Aumasson
- Linked Data Structures via Hashes
- How does including the previous hash in each block create a “chain”?
- Why can’t you modify a block in the middle without invalidating everything after?
- What is a “hash pointer” and how is it different from a regular pointer?
- Book Reference: “Mastering Bitcoin” Ch. 11 - Andreas Antonopoulos
- Proof of Work
- What is a “target” and what does it mean for a hash to be “below” it?
- Why is finding a valid nonce hard but verifying it easy?
- How does difficulty adjustment maintain consistent block times?
- What is a “nonce” and why does incrementing it change the hash?
- Book Reference: “Programming Bitcoin” Ch. 9 - Jimmy Song
- Consensus and Fork Resolution
- What happens when two miners find valid blocks at the same time?
- Why does “longest chain” (or “most work”) win?
- What is “reorganization” and when does it happen?
- What is the difference between a soft fork and a hard fork?
- Book Reference: “Designing Data-Intensive Applications” Ch. 9 - Martin Kleppmann
- P2P Networking and Gossip
- How do nodes discover each other without a central server?
- What is a “gossip protocol” and why is it resilient?
- How do you prevent message loops in a mesh network?
- What is the difference between push and pull gossip?
- Book Reference: “Computer Networks” Ch. 5 - Tanenbaum & Wetherall
- Transaction Pools (Mempools)
- What is a mempool and why is it needed?
- How do you validate a transaction before including it?
- How do miners choose which transactions to include?
- What happens to transactions in orphaned blocks?
- Book Reference: “Mastering Bitcoin” Ch. 6 and 10 - Antonopoulos
Questions to Guide Your Design
Before implementing, think through these:
- Block Structure
- What fields must a block have? (index, timestamp, transactions, prev_hash, nonce, hash)
- How will you serialize a block for hashing?
- How will you store blocks? (In-memory array? File? Database?)
- How will you handle the genesis block (no previous hash)?
- Proof of Work Mining
- How will you represent the difficulty target?
- How will you increment the nonce?
- Should mining run in a separate thread?
- How do you stop mining when you receive a valid block from a peer?
- Networking
- What message types do you need? (NEW_BLOCK, NEW_TX, GET_CHAIN, CHAIN_RESPONSE, etc.)
- How will you serialize messages? (JSON? Binary? Custom?)
- How will you handle partial reads from sockets?
- How will you prevent infinite message forwarding?
- Transaction Validation
- How will you track account balances? (UTXO vs account model)
- How will you prevent double-spending?
- What happens if a transaction references a balance from an unconfirmed transaction?
- How will you handle coinbase (block reward) transactions?
- Fork Resolution
- How will you compare two chains to decide which is “better”?
- What do you do when you receive a block that doesn’t extend your chain?
- How will you request missing blocks from peers?
- How will you return orphaned transactions to the mempool?
Thinking Exercise
Before coding, trace this scenario on paper:
Scenario: Three nodes, one malicious
Time T0: All nodes have chain: [Block 0] → [Block 1] → [Block 2]
Block 2 contains: alice → bob: 50 coins
Time T1: Node A (malicious) disconnects from network
Node A starts mining an alternate Block 2':
Block 2': alice → mallory: 50 coins (double-spend attempt!)
Time T2: Node A mines Block 2' and Block 3' (2 blocks deep)
Meanwhile, honest network mines only Block 3
Time T3: Node A reconnects with chain: [B0] → [B1] → [B2'] → [B3']
Honest nodes have: [B0] → [B1] → [B2] → [B3]
Questions:
1. Which chain "wins"? Why?
2. What happens to the transaction "alice → bob: 50"?
3. What would Mallory need to accomplish this attack?
4. How does this relate to "6 confirmations" recommendation?
Draw the fork diagram and trace the resolution!
The Interview Questions They’ll Ask
Prepare to answer these:
- “What prevents someone from rewriting blockchain history?”
- “Why does Proof of Work use so much energy? Is there an alternative?”
- “What happens when two miners find a block at the same time?”
- “How does a new node synchronize with the network?”
- “What is a 51% attack and is it actually feasible?”
- “Why do Bitcoin transactions need multiple confirmations?”
- “How does gossip protocol handle network partitions?”
- “What’s the difference between your minimal blockchain and Bitcoin?”
- “How would you add transaction fees to your implementation?”
- “What are the trade-offs between short and long block times?”
Hints in Layers
Hint 1: Start with a Single-Node Chain Get this working first before any networking:
typedef struct Block {
uint32_t index;
uint32_t timestamp;
char prev_hash[65]; // SHA-256 hex string
char hash[65];
uint32_t nonce;
char data[1024]; // Simplified: just a string
} Block;
char* calculate_hash(Block* block) {
char buffer[2048];
sprintf(buffer, "%d%d%s%d%s",
block->index, block->timestamp,
block->prev_hash, block->nonce, block->data);
return sha256(buffer); // You'll need a SHA-256 library
}
bool is_valid_hash(char* hash, int difficulty) {
for (int i = 0; i < difficulty; i++) {
if (hash[i] != '0') return false;
}
return true;
}
Hint 2: Mining Loop Pattern
void mine_block(Block* block, int difficulty) {
block->nonce = 0;
while (true) {
char* hash = calculate_hash(block);
if (is_valid_hash(hash, difficulty)) {
strcpy(block->hash, hash);
return;
}
block->nonce++;
if (block->nonce % 100000 == 0) {
printf("Nonce: %d, Hash: %.16s...\n", block->nonce, hash);
}
}
}
Hint 3: Simple TCP Message Protocol
Message format:
[4 bytes: message type] [4 bytes: payload length] [N bytes: payload]
Message types:
0x01: NEW_BLOCK
0x02: NEW_TRANSACTION
0x03: REQUEST_CHAIN
0x04: CHAIN_RESPONSE
0x05: PEER_ANNOUNCE
Hint 4: Chain Comparison
int compare_chains(Block* chain_a, int len_a, Block* chain_b, int len_b) {
// Sum of all nonces represents total "work"
uint64_t work_a = 0, work_b = 0;
for (int i = 0; i < len_a; i++) work_a += chain_a[i].nonce;
for (int i = 0; i < len_b; i++) work_b += chain_b[i].nonce;
return (work_a > work_b) ? 1 : (work_a < work_b) ? -1 : 0;
}
Hint 5: Use select() for Multi-Connection Handling
fd_set read_fds;
FD_ZERO(&read_fds);
FD_SET(server_socket, &read_fds);
for (int i = 0; i < num_peers; i++) {
FD_SET(peer_sockets[i], &read_fds);
}
select(max_fd + 1, &read_fds, NULL, NULL, &timeout);
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Hash function fundamentals | Serious Cryptography, 2nd Edition by Jean-Philippe Aumasson | Ch. 6: “Hash Functions” |
| Blockchain structure | Mastering Bitcoin, 3rd Edition by Andreas Antonopoulos | Ch. 11: “The Blockchain” |
| Proof of Work mining | Programming Bitcoin by Jimmy Song | Ch. 9: “Blocks” |
| Distributed consensus theory | Designing Data-Intensive Applications by Martin Kleppmann | Ch. 8-9: “Trouble with Distributed Systems” |
| Network programming in C | The Linux Programming Interface by Michael Kerrisk | Ch. 56-61: “Sockets” |
| P2P protocols | Computer Networks by Tanenbaum & Wetherall | Ch. 5: “Network Layer” |
| Bitcoin P2P specifics | Mastering Bitcoin, 3rd Edition | Ch. 8: “The Bitcoin Network” |
Project 3: Build the Ethereum Virtual Machine (EVM) From Scratch
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Main Programming Language: Rust
- Alternative Programming Languages: Go, TypeScript, Python
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: Level 1: The “Resume Gold”
- Difficulty: Level 4: Expert (The Systems Architect)
- Knowledge Area: Virtual Machines, Blockchain
- Software or Tool: EVM, Ethereum, Bytecode
- Main Book: “Crafting Interpreters” by Robert Nystrom
What you’ll build: A stack-based virtual machine that executes Ethereum bytecode—implementing opcodes like PUSH, ADD, SSTORE, CALL, and the gas metering system.
Why it teaches Ethereum: The EVM is Ethereum’s brain. Smart contracts compile to bytecode that the EVM executes. Building it yourself reveals why gas exists, how storage works, why reentrancy attacks happen, and what makes smart contracts “smart.”
Core challenges you’ll face:
- Stack machine architecture → Maps to how computation is expressed
- 256-bit word operations → Maps to why Solidity uses uint256
- Gas accounting → Maps to preventing infinite loops and spam
- Storage (SLOAD/SSTORE) → Maps to contract state persistence
- CALL semantics → Maps to contract-to-contract interaction
- Memory vs storage vs stack → Maps to Solidity optimization
Resources for key challenges:
- EVM From Scratch Course by W1nt3r.eth - 116 progressive tests to pass
- EVM From Scratch Book - Jupyter notebooks building step-by-step
- “Mastering Ethereum” Ch. 13 - Authoritative EVM reference
Key Concepts:
- Stack Machines: “Computer Systems: A Programmer’s Perspective” Ch. 3 - Bryant & O’Hallaron
- Virtual Machine Design: “Crafting Interpreters” - Robert Nystrom (free online)
- EVM Opcodes: Ethereum Yellow Paper (formal specification)
Difficulty: Advanced Time estimate: 2-4 weeks Prerequisites: Understanding of stack-based computation, any systems language
Learning milestones:
- Arithmetic opcodes work - You understand the stack model
- Control flow (JUMP/JUMPI) works - You understand how loops and conditionals compile
- Storage operations work - You understand contract state
- CALL works - You understand contract interaction and reentrancy
Real World Outcome
When you complete this project, you’ll have a working EVM that can execute real Solidity bytecode. Here’s exactly what you’ll be able to do:
1. Execute and Trace Simple Bytecode
$ ./evm-cli run --bytecode "6005600401" --trace
EVM EXECUTION TRACE
════════════════════════════════════════════════════════════════════
Bytecode: 60 05 60 04 01
│ │ │ │ │
│ │ │ │ └─ ADD (0x01)
│ │ │ └──── 0x04 (data for PUSH1)
│ │ └─────── PUSH1 (0x60)
│ └────────── 0x05 (data for PUSH1)
└───────────── PUSH1 (0x60)
Step 1: PUSH1 0x05
PC: 0 → 2
Gas: 3 consumed (2999997 remaining)
Stack: [] → [5]
Step 2: PUSH1 0x04
PC: 2 → 4
Gas: 3 consumed (2999994 remaining)
Stack: [5] → [5, 4]
Step 3: ADD
PC: 4 → 5
Gas: 3 consumed (2999991 remaining)
Stack: [5, 4] → [9]
EXECUTION COMPLETE
════════════════════════════════════════════════════════════════════
Result: 0x09 (9 in decimal)
Gas used: 9
Status: SUCCESS
════════════════════════════════════════════════════════════════════
2. Execute Real Compiled Solidity Contracts
# First, compile a simple Solidity contract:
# contract Counter {
# uint256 count;
# function increment() public { count += 1; }
# function get() public view returns (uint256) { return count; }
# }
$ ./evm-cli deploy --bytecode "608060405234801561001057600080fd5b50610..."
CONTRACT DEPLOYED
════════════════════════════════════════════════════════════════════
Contract Address: 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4
Bytecode Size: 245 bytes
Gas Used: 127,543
════════════════════════════════════════════════════════════════════
$ ./evm-cli call --to 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4 \
--data "0xd09de08a" \ # increment() function selector
--trace
FUNCTION CALL: increment()
════════════════════════════════════════════════════════════════════
Step 1: PUSH1 0x80 Stack: [128]
Step 2: PUSH1 0x40 Stack: [128, 64]
Step 3: MSTORE Stack: [] Memory[64] = 128
...
Step 45: SLOAD Stack: [0] (load count from slot 0)
Step 46: PUSH1 0x01 Stack: [0, 1]
Step 47: ADD Stack: [1]
Step 48: PUSH1 0x00 Stack: [1, 0]
Step 49: SSTORE Stack: [] Storage[0] = 1
...
Step 62: RETURN
EXECUTION COMPLETE
════════════════════════════════════════════════════════════════════
Storage Changes:
Slot 0x00: 0x00 → 0x01 (count incremented!)
Gas Used: 43,291
Status: SUCCESS
════════════════════════════════════════════════════════════════════
$ ./evm-cli call --to 0x5B38Da6a701c568545dCfcB03FcB875f56beddC4 \
--data "0x6d4ce63c" # get() function selector
Return Value: 0x0000000000000000000000000000000000000000000000000000000000000001
Decoded: uint256 = 1
3. Debug Reentrancy Vulnerabilities
# Deploy vulnerable contract and attacker contract, then trace the attack:
$ ./evm-cli call --to 0xVulnerable... --data "0x..." --trace
REENTRANCY ATTACK DETECTED
════════════════════════════════════════════════════════════════════
Call Depth: 0 → Vulnerable.withdraw()
Step 15: SLOAD balance[attacker] = 1 ETH
Step 20: CALL → Attacker.fallback() with 1 ETH
Call Depth: 1 → Attacker.fallback()
Step 5: CALL → Vulnerable.withdraw() [REENTRANT!]
Call Depth: 2 → Vulnerable.withdraw()
Step 15: SLOAD balance[attacker] = 1 ETH (NOT YET UPDATED!)
Step 20: CALL → Attacker.fallback() with 1 ETH
Call Depth: 3 → Attacker.fallback()
... (continues until gas exhausted or balance drained)
⚠ VULNERABILITY: State update (SSTORE) happens AFTER external call (CALL)
⚠ FIX: Use checks-effects-interactions pattern
════════════════════════════════════════════════════════════════════
4. Inspect Memory and Storage State
$ ./evm-cli debug --to 0x... --data "0x..."
EVM DEBUGGER
════════════════════════════════════════════════════════════════════
(evm) step
PC: 0x0A | Opcode: MSTORE | Gas: 2999991
(evm) stack
Stack (3 items):
[0] 0x0000...0080 (128)
[1] 0x0000...0040 (64)
[2] 0x0000...0001 (1)
(evm) memory 0 128
Memory (128 bytes):
0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80
^^
Free memory pointer
(evm) storage
Storage (2 slots):
Slot 0x00: 0x0000...0005 (count = 5)
Slot 0x01: 0x0000...owner_address... (owner)
(evm) continue
════════════════════════════════════════════════════════════════════
5. Run the EVM From Scratch Test Suite
$ cargo test
running 116 tests
test evm::tests::test_stop ... ok
test evm::tests::test_add ... ok
test evm::tests::test_mul ... ok
test evm::tests::test_sub ... ok
test evm::tests::test_div ... ok
test evm::tests::test_sdiv ... ok
test evm::tests::test_mod ... ok
...
test evm::tests::test_push1 ... ok
test evm::tests::test_push32 ... ok
test evm::tests::test_dup1 ... ok
...
test evm::tests::test_mstore ... ok
test evm::tests::test_mload ... ok
test evm::tests::test_sstore ... ok
test evm::tests::test_sload ... ok
...
test evm::tests::test_jump ... ok
test evm::tests::test_jumpi ... ok
test evm::tests::test_call ... ok
test evm::tests::test_delegatecall ... ok
test evm::tests::test_create ... ok
test evm::tests::test_selfdestruct ... ok
test result: ok. 116 passed; 0 failed
The Core Question You’re Answering
“How does a blockchain execute code? What actually happens when you call a smart contract function, and why do some operations cost more gas than others?”
Before writing code, understand that the EVM is fundamentally just a stack machine with three areas of data access:
- Stack: Fast, temporary, cheap (1024 elements max, 256-bit words)
- Memory: Volatile byte array, grows during execution, quadratic cost
- Storage: Persistent key-value store, survives transactions, very expensive
Every smart contract vulnerability (reentrancy, integer overflow, access control) has its roots in how the EVM executes bytecode. Understanding the EVM means understanding why contracts behave (and misbehave) the way they do.
Concepts You Must Understand First
Stop and research these before coding:
- Stack Machine Architecture
- What is a stack and why is LIFO important?
- How do you express
a + b * cusing only stack operations? - What is “reverse Polish notation”?
- How is a stack machine different from a register machine?
- Book Reference: “Crafting Interpreters” Ch. 15 - Robert Nystrom
- 256-bit Integer Arithmetic
- Why did Ethereum choose 256-bit words? (Matches cryptographic primitives)
- How do you represent negative numbers? (Two’s complement)
- What is “signed” vs “unsigned” division in the EVM?
- How do you handle overflow? (EVM wraps around!)
- Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 2 - Bryant & O’Hallaron
- Bytecode and Opcodes
- What is an “opcode” and why is it one byte?
- How does the program counter (PC) advance?
- What is the difference between PUSH1 and PUSH32?
- Why are some opcodes followed by immediate data?
- Reference: evm.codes - Interactive opcode reference
- EVM Memory Model
- What is the difference between stack, memory, and storage?
- Why does memory cost grow quadratically?
- What is the “free memory pointer” at address 0x40?
- How are dynamic arrays stored in memory vs storage?
- Book Reference: “Mastering Ethereum” Ch. 13 - Antonopoulos & Wood
- Gas and Execution Limits
- Why does each opcode have a gas cost?
- What is “gas limit” vs “gas price”?
- How does gas prevent infinite loops?
- Why does SSTORE cost so much more than ADD?
- Reference: Ethereum Yellow Paper Appendix G
- Call Semantics (CALL, DELEGATECALL, STATICCALL)
- What is a “message call” and how does it create a new execution context?
- How does DELEGATECALL preserve msg.sender and storage context?
- What happens to gas during a call?
- What is the “call depth limit” and why does it exist?
- Book Reference: “Mastering Ethereum” Ch. 13 - Antonopoulos & Wood
Questions to Guide Your Design
Before implementing, think through these:
- Data Representation
- How will you represent 256-bit integers? (BigInt library? Fixed array?)
- How will you handle the stack? (Vec/Array with push/pop?)
- How will you store memory? (Byte array that grows as needed?)
- How will you store storage? (HashMap of slot → value?)
- Opcode Dispatch
- How will you map opcode bytes to handler functions?
- How will you handle PUSHn opcodes (which read n bytes of immediate data)?
- How will you implement DUPn and SWAPn (parameterized by n)?
- Should you use a switch statement, function table, or trait objects?
- Execution Context
- What state do you need to track? (PC, stack, memory, storage, gas, etc.)
- How will you handle nested calls? (New context per call?)
- How will you pass msg.sender, msg.value, calldata?
- How will you handle return data from calls?
- Gas Accounting
- Where in your code will you deduct gas?
- How will you handle “out of gas” mid-execution?
- How will you calculate dynamic gas costs (memory expansion)?
- How will you handle gas refunds (SSTORE clearing)?
- Control Flow
- How will you validate JUMP destinations (must be JUMPDEST)?
- How will you handle STOP, RETURN, REVERT, INVALID?
- How will you implement conditionals (JUMPI)?
- How will you detect infinite loops (gas exhaustion)?
Thinking Exercise
Before coding, trace this bytecode by hand:
Bytecode: 60 03 60 05 01 60 02 02 60 00 52 60 20 60 00 f3
Disassembly:
00: PUSH1 0x03 Push 3 onto stack
02: PUSH1 0x05 Push 5 onto stack
04: ADD Pop 2, push sum (3+5=8)
05: PUSH1 0x02 Push 2 onto stack
07: MUL Pop 2, push product (8*2=16)
08: PUSH1 0x00 Push 0 onto stack
0A: MSTORE Store 16 at memory[0:32]
0B: PUSH1 0x20 Push 32 onto stack
0D: PUSH1 0x00 Push 0 onto stack
0F: RETURN Return memory[0:32]
Trace:
Step 1: PUSH1 0x03 Stack: [3] Memory: empty
Step 2: PUSH1 0x05 Stack: [3, 5] Memory: empty
Step 3: ADD Stack: [8] Memory: empty
Step 4: PUSH1 0x02 Stack: [8, 2] Memory: empty
Step 5: MUL Stack: [16] Memory: empty
Step 6: PUSH1 0x00 Stack: [16, 0] Memory: empty
Step 7: MSTORE Stack: [] Memory[0:32] = 0x...0010 (16)
Step 8: PUSH1 0x20 Stack: [32] Memory[0:32] = 0x...0010
Step 9: PUSH1 0x00 Stack: [32, 0] Memory[0:32] = 0x...0010
Step 10: RETURN Return 32 bytes from memory offset 0
Result: 0x0000000000000000000000000000000000000000000000000000000000000010
= 16 in decimal
Question: What is the total gas cost of this execution?
- PUSH1: 3 gas × 5 = 15 gas
- ADD: 3 gas
- MUL: 5 gas
- MSTORE: 3 gas + (memory expansion cost)
- RETURN: 0 gas
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is the EVM and why is it stack-based?”
- “Explain the difference between memory, storage, and the stack.”
- “Why does SSTORE cost 20,000 gas while ADD costs only 3?”
- “What is a reentrancy attack and how does it exploit CALL semantics?”
- “How does DELEGATECALL differ from CALL? When would you use each?”
- “What happens when a contract runs out of gas mid-execution?”
- “How does the EVM prevent infinite loops?”
- “What is a JUMPDEST and why is it required?”
- “How are function selectors (4-byte signatures) used in the EVM?”
- “What is the ‘free memory pointer’ and where is it stored?”
- “Why do we need STATICCALL? What security guarantee does it provide?”
- “How would you implement your own ERC-20 token at the bytecode level?”
Hints in Layers
Hint 1: Start with Stack Operations Get PUSH, POP, DUP, SWAP working first:
struct EVM {
stack: Vec<U256>, // Use a 256-bit integer library
pc: usize,
code: Vec<u8>,
gas: u64,
}
impl EVM {
fn execute(&mut self) -> Result<Vec<u8>, &'static str> {
while self.pc < self.code.len() {
let opcode = self.code[self.pc];
match opcode {
0x00 => break, // STOP
0x01 => self.op_add()?,
0x60 => self.op_push(1)?, // PUSH1
0x61 => self.op_push(2)?, // PUSH2
// ...
_ => return Err("Invalid opcode"),
}
}
Ok(vec![])
}
fn op_add(&mut self) -> Result<(), &'static str> {
let a = self.stack.pop().ok_or("Stack underflow")?;
let b = self.stack.pop().ok_or("Stack underflow")?;
self.stack.push(a.wrapping_add(b)); // EVM wraps on overflow!
self.gas -= 3;
self.pc += 1;
Ok(())
}
fn op_push(&mut self, n: usize) -> Result<(), &'static str> {
let value = &self.code[self.pc + 1..self.pc + 1 + n];
self.stack.push(U256::from_big_endian(value));
self.gas -= 3;
self.pc += 1 + n;
Ok(())
}
}
Hint 2: Memory is a Growing Byte Array
struct Memory {
data: Vec<u8>,
}
impl Memory {
fn expand_to(&mut self, offset: usize, size: usize) -> u64 {
let needed = offset + size;
if needed > self.data.len() {
let old_words = (self.data.len() + 31) / 32;
self.data.resize(needed, 0);
let new_words = (self.data.len() + 31) / 32;
// Memory expansion cost is quadratic!
let old_cost = old_words * 3 + (old_words * old_words) / 512;
let new_cost = new_words * 3 + (new_words * new_words) / 512;
return (new_cost - old_cost) as u64;
}
0
}
}
Hint 3: Storage is Just a HashMap
use std::collections::HashMap;
struct Storage {
slots: HashMap<U256, U256>,
}
impl Storage {
fn sload(&self, slot: &U256) -> U256 {
self.slots.get(slot).cloned().unwrap_or(U256::zero())
}
fn sstore(&mut self, slot: U256, value: U256) -> u64 {
let old = self.sload(&slot);
let gas = if old.is_zero() && !value.is_zero() {
20000 // Setting non-zero from zero
} else if !old.is_zero() && value.is_zero() {
5000 // Clearing (plus refund)
} else {
5000 // Modifying
};
self.slots.insert(slot, value);
gas
}
}
Hint 4: Use the evm-from-scratch Test Suite Clone https://github.com/w1nt3r-eth/evm-from-scratch and run tests as you go. Each test is one opcode—perfect for incremental development.
Hint 5: Handle CALL Last CALL is the most complex opcode because it creates a new execution context. Get everything else working first, then tackle CALL, DELEGATECALL, and STATICCALL.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Stack machine fundamentals | Crafting Interpreters by Robert Nystrom | Ch. 14-15: “Chunks of Bytecode” and “A Virtual Machine” |
| EVM specification | Mastering Ethereum by Antonopoulos & Wood | Ch. 13: “The Ethereum Virtual Machine” |
| Binary representation | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron | Ch. 2: “Representing Information” |
| VM dispatch techniques | Virtual Machines: Versatile Platforms by Iain D. Craig | Ch. 2-3: “Interpreters” |
| Smart contract security | Mastering Ethereum | Ch. 9: “Smart Contract Security” |
| Solidity internals | Ethereum Smart Contract Development by Mayukh Mukhopadhyay | Ch. 6-7: “Smart Contract Internals” |
| Formal EVM specification | Ethereum Yellow Paper | Appendix H: “Virtual Machine Specification” |
Project 4: Implement a Proof-of-Stake Consensus
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Programming Language: C
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 5. The “Industry Disruptor”
- Difficulty: Level 5: Master
- Knowledge Area: Distributed Consensus / Game Theory
- Software or Tool: Consensus Algorithms
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A consensus mechanism where validators stake tokens and are selected to propose blocks based on stake weight, with slashing for misbehavior.
Why it teaches consensus: Proof-of-Stake is how modern chains (Ethereum 2.0, Solana, Cardano) work. Building it teaches you the game theory: why validators behave honestly, what happens during forks, and how finality differs from PoW.
Core challenges you’ll face:
- Validator selection → Maps to randomness and stake weighting
- Block proposal and attestation → Maps to committee-based consensus
- Slashing conditions → Maps to punishing equivocation
- Finality gadgets → Maps to when transactions become irreversible
- Nothing-at-stake problem → Maps to PoS vs PoW tradeoffs
Key Concepts:
- Byzantine Fault Tolerance: “Designing Data-Intensive Applications” Ch. 8 - Martin Kleppmann
- Consensus Algorithms: Ethereum’s Casper FFG paper
- Game Theory: “Mastering Ethereum” Ch. 14 - Antonopoulos & Wood
Difficulty: Advanced Time estimate: 2-4 weeks Prerequisites: Understanding of distributed systems, cryptography basics
Real world outcome: You’ll run a network where validators stake tokens, get selected to propose blocks, attest to others’ blocks, and get slashed if they try to cheat. You can simulate attacks and watch the protocol defend against them.
Learning milestones:
- Validators stake and get selected - You understand stake-weighted randomness
- Blocks reach finality - You understand supermajority attestation
- Slashing works - You understand the economic security model
Project 5: Build a Simple Smart Contract Compiler
- File: BLOCKCHAIN_BITCOIN_ETHEREUM_LEARNING_PROJECTS.md
- Main Programming Language: Rust
- Alternative Programming Languages: Go, TypeScript, C
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: Level 1: The “Resume Gold”
- Difficulty: Level 5: Master (The First-Principles Wizard)
- Knowledge Area: Compilers, Blockchain
- Software or Tool: Solidity, EVM, LLVM
- Main Book: “Writing a C Compiler” by Nora Sandler
What you’ll build: A compiler that takes a tiny Solidity-like language and outputs EVM bytecode—covering parsing, type checking, and code generation.
Why it teaches smart contracts: You’ll understand why Solidity has its quirks, how high-level code becomes opcodes, what the ABI actually is, and why certain patterns are gas-expensive.
Core challenges you’ll face:
- Lexing and parsing → Maps to contract syntax
- Type system → Maps to Solidity’s types (address, uint256, etc.)
- Storage layout → Maps to how state variables map to slots
- Function dispatch → Maps to the 4-byte function selector
- ABI encoding → Maps to how calldata is structured
Resources for key challenges:
- “Writing a C Compiler” by Nora Sandler - Apply patterns to a different target
- “Crafting Interpreters” by Robert Nystrom - Parsing and bytecode fundamentals
Key Concepts:
- Compiler Construction: “Writing a C Compiler” - Nora Sandler
- ABI Specification: Solidity documentation
- Storage Layout: “Mastering Ethereum” Ch. 13 - Antonopoulos
Difficulty: Advanced Time estimate: 1 month+ Prerequisites: Built an interpreter before, understand EVM basics
Real world outcome: Write a contract in your mini-language, compile it, deploy the bytecode to your EVM (from Project 3), and call functions on it. See your high-level code execute as stack operations.
Learning milestones:
- Compile arithmetic expressions - You understand stack-based code generation
- Compile storage variables - You understand SLOAD/SSTORE targeting
- Compile functions with ABI - You understand Ethereum’s calling convention
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Bitcoin From Scratch | Advanced | 1 month+ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Minimal Blockchain | Intermediate | Weekend | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| EVM From Scratch | Advanced | 2-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Proof-of-Stake | Advanced | 2-4 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Smart Contract Compiler | Advanced | 1 month+ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
Recommendation
Start with Project 2 (Minimal Blockchain in a Weekend), then branch based on your interest:
┌─────────────────────────────────────┐
│ Project 2: Minimal Blockchain │
│ (Start here - core mental model) │
└──────────────┬──────────────────────┘
│
┌────────────────────┼────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Bitcoin Path │ │ Ethereum Path │ │ Consensus Path │
│ Project 1 │ │ Project 3 → 5 │ │ Project 4 │
│ (Cryptography │ │ (Smart contract │ │ (Distributed │
│ deep dive) │ │ execution) │ │ systems) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
- If you want to understand cryptography and Bitcoin specifically → Project 1 with Jimmy Song’s book
- If you want to understand smart contracts and Ethereum → Project 3 (EVM) then Project 5 (Compiler)
- If you want to understand distributed consensus → Project 4 (PoS)
Final Overall Project: Build a Full Layer-2 Rollup
What you’ll build: A complete Layer-2 scaling solution—a rollup that batches transactions off-chain, posts compressed data to a simulated L1, and allows users to withdraw via fraud proofs or validity proofs.
Why it teaches everything: This project synthesizes all the concepts: you need cryptography (for signatures and commitments), the EVM (to execute rollup transactions), consensus (for sequencer selection), and compilers (to generate proof circuits). It’s how Optimism, Arbitrum, and zkSync actually work.
Core challenges you’ll face:
- Transaction batching → Compress and post to L1
- State commitments → Merkle roots of rollup state
- Fraud proofs (Optimistic) → Challenge invalid state transitions
- Validity proofs (ZK) → Prove execution correctness cryptographically
- Bridge contracts → Deposit/withdraw between L1 and L2
- Sequencer economics → Who orders transactions and why
Resources for key challenges:
- Optimism’s cannon and fault proof specs
- Vitalik’s Incomplete Guide to Rollups
- “Mastering Ethereum” for bridge contract patterns
Key Concepts:
- Rollup Architecture: Vitalik’s rollup blog posts
- Fraud Proofs: Optimism documentation
- ZK Proofs: “Proofs, Arguments, and Zero-Knowledge” by Justin Thaler
- Bridge Security: “Mastering Ethereum” - Antonopoulos
Difficulty: Expert Time estimate: 2-3 months Prerequisites: Completed Projects 1-3, understand both Bitcoin and Ethereum
Real world outcome: You’ll deploy a rollup where users can deposit tokens from your L1, transact cheaply on your L2, and withdraw back to L1 after a challenge period (optimistic) or immediately (ZK). You can process thousands of transactions per L1 block.
Learning milestones:
- Batch transactions and post to L1 - You understand data availability
- Execute and commit state - You understand rollup state machines
- Fraud/validity proofs work - You understand the security model
- Bridge deposits and withdrawals - You understand L1↔L2 interop
Sources
- roadmap.sh/blockchain - Blockchain developer roadmap
- GitHub: Blockchain Development Resources - Comprehensive resource collection
- Bitcoin Whitepaper - Original Satoshi paper
- Programming Bitcoin by Jimmy Song - O’Reilly
- Mastering Bitcoin GitHub - Free 3rd edition
- EVM From Scratch - W1nt3r.eth’s course
- EVM From Scratch Book - Jupyter notebooks
- Mastering Ethereum Ch. 13 - EVM - GitBook
- Ethereum.org EVM Docs - Official documentation