P06: HD Wallet (BIP-32/BIP-39) Implementation

Project Overview

Attribute               Value
Main Language           Rust
Alternative Languages   Go, Python, TypeScript
Difficulty              Expert
Coolness Level          Level 4: Hardcore Tech Flex
Business Potential      Resume Gold (Educational/Personal Brand)
Knowledge Area          Cryptography / Key Management
Main Book               “Mastering Bitcoin” by Andreas M. Antonopoulos

Learning Objectives

By completing this project, you will:

  1. Master mnemonic seed phrase generation, understanding how entropy is encoded into memorable words (BIP-39)
  2. Implement hierarchical key derivation using HMAC-SHA512 to derive a practically unlimited tree of child keys (2^32 indices per parent) from a single master seed (BIP-32)
  3. Understand the security tradeoffs between hardened and normal derivation paths
  4. Build a multi-currency wallet supporting Bitcoin, Ethereum, and other chains from a single recovery phrase
  5. Parse and validate derivation paths, implementing the m/44'/60'/0'/0/0 notation used by modern wallets

Deep Theoretical Foundation

The Problem: Key Management at Scale

Before hierarchical deterministic (HD) wallets, cryptocurrency users faced a nightmare:

  • Random key generation: Each new address required a new random private key
  • Backup complexity: Users needed to back up every single key separately
  • No organization: Keys were just a bag of random numbers with no structure
  • Recovery nightmare: Lose one backup file, lose those coins forever

Imagine managing 100 different addresses, each with its own private key, each needing secure backup. This was the reality of Bitcoin in its early days.

The Solution: Deterministic Hierarchy

HD wallets solve this elegantly:

  1. Single seed: All keys derive from one master secret
  2. Deterministic: The same seed always produces the same keys in the same order
  3. Hierarchical: Keys are organized in a tree structure with meaningful paths
  4. Human-readable backup: The root entropy is encoded as 12-24 words from a standard word list

With an HD wallet, you back up 12 or 24 words once, and you can recover every address you’ll ever create.

BIP-39: Mnemonic Seed Phrases

BIP-39 defines how to convert entropy into memorable words and back into a binary seed.

The Word List

BIP-39 specifies a list of 2048 words (2^11 = 2048, so each word encodes 11 bits). The English word list includes words like:

abandon, ability, able, about, above, absent, absorb, abstract, absurd, abuse...

Why 2048 words?

  • Powers of 2 enable clean bit-to-word mapping
  • 2048 words is enough for variety while remaining memorable
  • In the English list, each word is uniquely identified by its first 4 letters

From Entropy to Mnemonic

The process:

1. Generate entropy (128-256 bits of random data, in multiples of 32 bits)
2. Calculate checksum: SHA256(entropy)[first N bits]
   where N = entropy_bits / 32
3. Append checksum to entropy
4. Split into 11-bit groups
5. Each group indexes into the word list

Example for 128-bit entropy:

Entropy:         128 bits (16 bytes)
Checksum:        128/32 = 4 bits (first 4 bits of SHA256)
Total:           132 bits
Groups:          132/11 = 12 words

For 256-bit entropy:

Entropy:         256 bits (32 bytes)
Checksum:        256/32 = 8 bits
Total:           264 bits
Groups:          264/11 = 24 words
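The same arithmetic covers all five entropy sizes that BIP-39 allows. A minimal std-only Rust sketch; `mnemonic_params` is a hypothetical helper name, not part of any library:

```rust
/// Returns (checksum_bits, word_count) for a BIP-39 entropy length in bits,
/// or None if the length is not one of the allowed sizes.
fn mnemonic_params(entropy_bits: usize) -> Option<(usize, usize)> {
    // BIP-39 allows 128..=256 bits in steps of 32
    if !(128..=256).contains(&entropy_bits) || entropy_bits % 32 != 0 {
        return None;
    }
    let checksum_bits = entropy_bits / 32; // CS = ENT / 32
    let word_count = (entropy_bits + checksum_bits) / 11; // MS = (ENT + CS) / 11
    Some((checksum_bits, word_count))
}
```

Since ENT + ENT/32 = 33 × (ENT/32), the total is always divisible by 11, giving 12, 15, 18, 21, or 24 words.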

Visual Representation

                    Entropy Generation
                           |
              +------------+------------+
              |       128-256 bits      |
              |    (random bytes)       |
              +------------+------------+
                           |
                           v
              +------------------------+
              |   SHA256(entropy)      |
              |   Take first N bits    |
              |   (checksum)           |
              +------------------------+
                           |
              +------------+------------+
              | entropy || checksum    |
              | 132-264 bits total     |
              +------------+------------+
                           |
        +---------+---------+---------+---------+
        | 11 bits | 11 bits | 11 bits | ...     |
        +----+----+----+----+----+----+---------+
             |         |         |
             v         v         v
        +--------+ +--------+ +--------+
        | word 1 | | word 2 | | word 3 | ...
        +--------+ +--------+ +--------+

From Mnemonic to Seed

The mnemonic is not the seed directly. It’s converted using PBKDF2:

seed = PBKDF2(
    password = mnemonic sentence (words separated by spaces, NFKD-normalized),
    salt = "mnemonic" + optional_passphrase (NFKD-normalized),
    iterations = 2048,
    key_length = 64 bytes (512 bits),
    hash_function = HMAC-SHA512
)

Why PBKDF2?

  • Adds computational cost to brute-force attacks
  • Allows an optional passphrase (the “25th word”)
  • Produces a 512-bit seed for BIP-32

The optional passphrase:

  • Acts as a second factor for wallet recovery
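  • Different passphrases produce completely different wallets, and every passphrase is accepted, so a mistyped passphrase silently opens an empty wallet instead of raising an error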
  • Plausible deniability: same mnemonic with different passphrases = different wallets

BIP-32: Hierarchical Deterministic Wallets

BIP-32 defines how to derive a tree of keys from a single seed.

Master Key Generation

1. Take the 512-bit seed from BIP-39
2. Calculate: I = HMAC-SHA512(key="Bitcoin seed", data=seed)
3. Split I into two 256-bit halves:
   - IL (left 32 bytes) = master private key
   - IR (right 32 bytes) = master chain code

The chain code is crucial: it adds 256 bits of extra entropy to every derivation step, so knowing a key (private or public) without its chain code is not enough to derive that key’s children.
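Splitting the HMAC output and applying the validity rule is mechanical. This sketch assumes the 64-byte HMAC-SHA512 output has already been computed by your own code; `split_master` is a hypothetical name. Because both arrays are big-endian and the same length, Rust's lexicographic array comparison is also a numeric comparison against n:

```rust
/// secp256k1 group order n, big-endian.
const SECP256K1_N: [u8; 32] = [
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE,
    0xBA, 0xAE, 0xDC, 0xE6, 0xAF, 0x48, 0xA0, 0x3B,
    0xBF, 0xD2, 0x5E, 0x8C, 0xD0, 0x36, 0x41, 0x41,
];

/// Splits I = HMAC-SHA512("Bitcoin seed", seed) into (master key, chain code).
/// Returns None if IL is 0 or >= n, where BIP-32 declares the master key invalid.
fn split_master(i: &[u8; 64]) -> Option<([u8; 32], [u8; 32])> {
    let mut il = [0u8; 32];
    let mut ir = [0u8; 32];
    il.copy_from_slice(&i[..32]); // IL: master private key
    ir.copy_from_slice(&i[32..]); // IR: master chain code
    // Equal-length big-endian arrays compare in numeric order
    if il == [0u8; 32] || il >= SECP256K1_N {
        return None;
    }
    Some((il, ir))
}
```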

Extended Keys

An “extended key” combines:

  • The key itself (32 bytes private or 33 bytes public)
  • The chain code (32 bytes)
  • Metadata (depth, parent fingerprint, child index)

This is what you serialize as xprv... or xpub... strings. An xpub can be shared for watch-only use; an xprv must stay secret, since it controls every key below it.

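Extended Key Serialization (78 bytes, before Base58Check):
+---------+--------+-------------+---------+------------+-----------+
| Version | Depth  | Parent      | Child   | Chain code | Key data  |
| 4 bytes | 1 byte | fingerprint | index   | 32 bytes   | 33 bytes  |
|         |        | 4 bytes     | 4 bytes |            |           |
+---------+--------+-------------+---------+------------+-----------+
Key data: 0x00 || 32-byte private key (xprv/tprv), or 33-byte compressed public key (xpub/tpub)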

Version codes:
  0x0488ADE4 = xprv (mainnet private)
  0x0488B21E = xpub (mainnet public)
  0x04358394 = tprv (testnet private)
  0x043587CF = tpub (testnet public)
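Packing the 78 bytes is a fixed-offset copy. A sketch with a hypothetical `serialize_payload` function; Base58Check encoding of the result is a separate step:

```rust
/// Builds the 78-byte extended-key payload that Base58Check then encodes.
/// Field order follows BIP-32; multi-byte integers are big-endian.
fn serialize_payload(
    version: u32,
    depth: u8,
    parent_fingerprint: [u8; 4],
    child_index: u32,
    chain_code: &[u8; 32],
    key_data: &[u8; 33], // 0x00 || private key, or compressed public key
) -> [u8; 78] {
    let mut out = [0u8; 78];
    out[0..4].copy_from_slice(&version.to_be_bytes());
    out[4] = depth;
    out[5..9].copy_from_slice(&parent_fingerprint);
    out[9..13].copy_from_slice(&child_index.to_be_bytes());
    out[13..45].copy_from_slice(chain_code);
    out[45..78].copy_from_slice(key_data);
    out
}
```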

Child Key Derivation Function (CKD)

This is the heart of BIP-32. Given a parent key and an index, derive a child key.

For private parent to private child:

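Input: parent_private_key (k), parent_chain_code (c), index (i)
Notation: ser32(i) = 4-byte big-endian i, ser256(k) = 32-byte big-endian k,
          serP(P) = 33-byte compressed point, || = concatenation

If i >= 2^31 (hardened):
    I = HMAC-SHA512(key=c, data=0x00 || ser256(k) || ser32(i))
Else (normal):
    I = HMAC-SHA512(key=c, data=serP(point(k)) || ser32(i))

IL, IR = split I into 256-bit halves
child_key = (parse256(IL) + k) mod n     // n = secp256k1 order
child_chain_code = IR
If parse256(IL) >= n or child_key == 0: invalid, proceed with i + 1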

Output: (child_key, child_chain_code)
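Only the HMAC message differs between the two branches. A sketch of just that step, assuming the parent's compressed public key has already been computed with your secp256k1 code (`ckd_priv_data` is a hypothetical name):

```rust
const HARDENED_OFFSET: u32 = 0x8000_0000;

/// Builds the 37-byte HMAC-SHA512 message for CKDpriv (BIP-32).
/// `parent_pubkey` is assumed to be serP(point(k)), computed elsewhere.
fn ckd_priv_data(parent_privkey: &[u8; 32], parent_pubkey: &[u8; 33], index: u32) -> Vec<u8> {
    let mut data = Vec::with_capacity(37);
    if index >= HARDENED_OFFSET {
        // Hardened: 0x00 || ser256(k) || ser32(i)
        data.push(0x00);
        data.extend_from_slice(parent_privkey);
    } else {
        // Normal: serP(point(k)) || ser32(i)
        data.extend_from_slice(parent_pubkey);
    }
    data.extend_from_slice(&index.to_be_bytes()); // ser32(i) is big-endian
    data
}
```

Both branches yield 37 bytes; the leading 0x00 pads the 32-byte private key to the same length as a compressed public key, whose first byte is always 0x02 or 0x03.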

For public parent to public child (normal derivation only):

Input: parent_public_key (K), parent_chain_code (c), index (i)

If i >= 2^31:
    FAIL  // Cannot derive hardened from public
Else:
    I = HMAC-SHA512(key=c, data=serP(K) || ser32(i))

IL, IR = split I into 256-bit halves
child_key = point(parse256(IL)) + K     // Point addition
child_chain_code = IR
If parse256(IL) >= n or child_key is the point at infinity: invalid, proceed with i + 1

Output: (child_key, child_chain_code)

Hardened vs Normal Derivation

This is one of the most important security concepts in HD wallets.

Normal derivation (index 0 to 2^31-1):

- Uses the parent's compressed public key in the HMAC input
- An extended public key (xpub) can derive public children
- If a child private key leaks along with the parent public key + chain code (the parent xpub),
  the parent private key, and with it every sibling private key, can be computed!

Hardened derivation (index 2^31 to 2^32-1, shown as i' or iH):

- Uses the parent private key in the HMAC input
- An xpub CANNOT derive hardened children
- Even if a child private key and the parent xpub both leak, the parent key and siblings stay protected

Visual comparison:

Normal Derivation (i < 2^31):

  Parent Private  ──────────────────▶  Child Private
        │                                   │
        │ point()                           │ point()
        ▼                                   ▼
  Parent Public  ───────────────────▶  Child Public
        ▲                                   ▲
        │                                   │
   (can derive children from xpub)    (computable from parent xpub)


Hardened Derivation (i >= 2^31):

  Parent Private  ──────────────────▶  Child Private
        │                                   │
        │ point()                           │ point()
        ▼                                   ▼
  Parent Public         X               Child Public
        ▲               │                   ▲
        │           (blocked)               │
   (cannot derive hardened children)   (needs parent private key)

Why does this matter?

Scenario: You run an e-commerce site and want to generate fresh Bitcoin addresses for each customer. You share your xpub with your server so it can derive new addresses without ever knowing private keys.

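  • If you use normal derivation and an attacker steals both (1) your xpub and (2) any one child private key derived from it, they can compute the private key behind that xpub and therefore every key below it. If that xpub is the master xpub, that means all funds!
  • If you use hardened derivation down to the account level (so the shared xpub sits at m/44'/0'/0'), the same leak exposes only that one account: the hardened steps above it stop the attacker from reaching the master key or your other accounts.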

The standard practice: hardened derivation for purpose, coin type, and account; normal derivation for change and address index (so you can share account xpubs).
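The attack in the first bullet follows directly from the CKDpriv formula. For a normal index i, every input to the HMAC is contained in the xpub (K_par and c_par), so an attacker holding a leaked child key k_i can recompute I_L and solve for the parent key:

$$
\begin{aligned}
I &= \text{HMAC-SHA512}\big(c_{\mathrm{par}},\ \mathrm{ser}_P(K_{\mathrm{par}}) \,\|\, \mathrm{ser}_{32}(i)\big), \qquad I_L = \text{first 32 bytes of } I \\
k_i &= \big(\mathrm{parse}_{256}(I_L) + k_{\mathrm{par}}\big) \bmod n \\
\Longrightarrow \quad k_{\mathrm{par}} &= \big(k_i - \mathrm{parse}_{256}(I_L)\big) \bmod n
\end{aligned}
$$

With a hardened index the HMAC input contains k_par itself, so I_L cannot be recomputed from public data and the subtraction is impossible.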

BIP-44: Multi-Account Hierarchy

BIP-44 standardizes the derivation path structure:

m / purpose' / coin_type' / account' / change / address_index

Each level:

Level           Hardened?  Purpose
purpose'        Yes        Always 44' for BIP-44
coin_type'      Yes        Cryptocurrency identifier (0'=BTC, 60'=ETH, 2'=LTC)
account'        Yes        User's separate accounts (0', 1', 2', ...)
change          No         0 = external (receiving), 1 = internal (change)
address_index   No         Sequential address number (0, 1, 2, ...)

Common paths:

Bitcoin mainnet:       m/44'/0'/0'/0/0
Ethereum mainnet:      m/44'/60'/0'/0/0
Bitcoin testnet:       m/44'/1'/0'/0/0
Bitcoin first change:  m/44'/0'/0'/1/0
Second account ETH:    m/44'/60'/1'/0/0
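Going the other way, from raw indices to the printed notation, is a handy debugging aid when comparing against other wallets. A sketch with hypothetical helper names (`bip44_indices`, `format_path`); callers are assumed to pass values below 2^31:

```rust
const HARDENED_OFFSET: u32 = 0x8000_0000;

/// The five BIP-44 levels as raw child indices (hardened levels get + 2^31).
/// Assumes every argument is < 2^31; larger values would overflow.
fn bip44_indices(coin_type: u32, account: u32, change: u32, address_index: u32) -> [u32; 5] {
    [
        44 + HARDENED_OFFSET,
        coin_type + HARDENED_OFFSET,
        account + HARDENED_OFFSET,
        change,
        address_index,
    ]
}

/// Formats raw indices as a path string, writing indices >= 2^31 as i'.
fn format_path(indices: &[u32]) -> String {
    let mut s = String::from("m");
    for &idx in indices {
        if idx >= HARDENED_OFFSET {
            s.push_str(&format!("/{}'", idx - HARDENED_OFFSET));
        } else {
            s.push_str(&format!("/{}", idx));
        }
    }
    s
}
```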

Path notation:

  • m = master key (from seed)
  • / = derivation step
  • ' (or h / H) = hardened derivation (index + 2^31)
  • Number = index
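A strict parser for this notation might look like the following sketch. It is not taken from any particular library: it accepts ', h, or H as the hardened marker, requires the leading m, rejects numbers that do not fit in 31 bits, and uses String errors for brevity:

```rust
const HARDENED_OFFSET: u32 = 0x8000_0000;

/// Parses a path such as "m/44'/60'/0'/0/0" into raw child indices.
fn parse_path(path: &str) -> Result<Vec<u32>, String> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err(format!("path must start with 'm': {path:?}"));
    }
    parts
        .map(|part| -> Result<u32, String> {
            // Strip exactly one hardened marker
            let (digits, hardened) = match part.strip_suffix(|c: char| c == '\'' || c == 'h' || c == 'H') {
                Some(rest) => (rest, true),
                None => (part, false),
            };
            // Reject empty components and signs such as "+5"
            if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
                return Err(format!("bad component {part:?}"));
            }
            let index: u32 = digits.parse().map_err(|_| format!("index too large: {part:?}"))?;
            // The printed number must fit in 31 bits before the offset is added
            if index >= HARDENED_OFFSET {
                return Err(format!("index must be < 2^31: {part:?}"));
            }
            Ok(if hardened { index + HARDENED_OFFSET } else { index })
        })
        .collect()
}
```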

Key Recovery: The Full Circle

When you enter your 12 or 24 words into a new wallet:

1. Validate words exist in BIP-39 word list
2. Convert words back to entropy + checksum bits
3. Verify checksum matches
4. Apply PBKDF2 with mnemonic and passphrase
5. Use BIP-32 to derive master key
6. Apply BIP-44 paths to regenerate all addresses
7. Scan the blockchain for transactions at those addresses, stopping after a run of unused addresses (BIP-44 sets the gap limit at 20)

This is why the order of words matters, why typos fail validation, and why the same 24 words always recover the same wallet.
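Steps 2 and 3 amount to running the encoding backwards. This sketch (hypothetical name `indices_to_entropy`) assumes the words have already been looked up as 11-bit indices and leaves the SHA-256 comparison to the caller:

```rust
/// Reverses the BIP-39 word-index encoding: returns (entropy, checksum_bits).
/// The caller must still compare `checksum_bits` with the top ENT/32 bits of SHA256(entropy).
fn indices_to_entropy(indices: &[u16]) -> Option<(Vec<u8>, u8)> {
    if !matches!(indices.len(), 12 | 15 | 18 | 21 | 24) || indices.iter().any(|&i| i >= 2048) {
        return None;
    }
    let total_bits = indices.len() * 11;
    let cs_bits = total_bits / 33; // ENT + ENT/32 = 33 * (ENT/32)
    let ent_bits = total_bits - cs_bits;
    // Unpack each index into 11 bits, most significant first
    let bits: Vec<u8> = indices
        .iter()
        .flat_map(|&i| (0..11).rev().map(move |b| ((i >> b) & 1) as u8))
        .collect();
    // Repack the entropy part into bytes (ENT is always a multiple of 8)
    let entropy: Vec<u8> = bits[..ent_bits]
        .chunks(8)
        .map(|byte| byte.iter().fold(0u8, |acc, &b| (acc << 1) | b))
        .collect();
    let checksum = bits[ent_bits..].iter().fold(0u8, |acc, &b| (acc << 1) | b);
    Some((entropy, checksum))
}
```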

Security Considerations

Entropy Quality

The security of your entire wallet depends on entropy quality:

  • Use a cryptographically secure random number generator (CSPRNG)
  • Never use predictable sources (dates, names, keyboard patterns)
  • 128 bits = ~2^128 attempts to brute force (sufficient)
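  • 256 bits = larger safety margin (roughly 128-bit security even against Grover-style quantum search; it does not protect the secp256k1 keys themselves from Shor’s algorithm)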

Passphrase Considerations

The optional passphrase:

  • Advantage: Second factor, plausible deniability
  • Risk: Forget it = lose everything (not recoverable from mnemonic alone)
  • Risk: Weak passphrase = brute-forceable with known mnemonic
  • Best practice: Either use a strong passphrase or none at all

Extended Public Key Exposure

Sharing an xpub:

  • Reveals all past and future addresses derived from it
  • Privacy concern: anyone with xpub can track your balance
  • Combined with a leaked child private key = compromise of the private key behind that xpub (normal derivation)
  • Safe for: Watch-only wallets, address generation servers (with hardened account)

Complete Project Specification

Functional Requirements

  1. Mnemonic Generation (BIP-39)
    • Generate cryptographically random entropy (128, 160, 192, 224, or 256 bits)
    • Calculate and append checksum
    • Convert to mnemonic word sequence
    • Support English word list (optional: other languages)
  2. Seed Derivation (BIP-39)
    • Implement PBKDF2-HMAC-SHA512
    • Support optional passphrase
    • Produce 512-bit seed
  3. Master Key Generation (BIP-32)
    • Derive master private key and chain code from seed
    • Validate key is within secp256k1 group order
  4. Child Key Derivation (BIP-32)
    • Implement CKDpriv (private parent to private child)
    • Implement CKDpub (public parent to public child)
    • Support both normal and hardened derivation
    • Handle edge cases (IL >= n, result = 0)
  5. Path Parsing (BIP-44)
    • Parse derivation paths like m/44'/60'/0'/0/0
    • Apply sequential derivations
    • Validate path format
  6. Address Generation
    • Derive public key from private key (secp256k1)
    • Generate Bitcoin addresses (P2PKH, P2WPKH)
    • Generate Ethereum addresses (keccak256)
  7. Serialization
    • Encode extended keys as Base58Check (xprv, xpub)
    • Decode and validate extended key strings

Command-Line Interface

# Generate new mnemonic (output below shows the all-zero test entropy and an empty passphrase; real runs are random)
$ hdwallet generate --words 12
Mnemonic: abandon abandon abandon abandon abandon abandon abandon abandon
          abandon abandon abandon about
Seed: 5eb00bbddcf069084889a8ab9155568165f5c453ccb85e70811aaed6f6da5fc19a5ac40b389cd370d086206dec8aa6c43daea6690f20ad3d8d48b2d2ce9e38e4

# Derive keys from mnemonic
$ hdwallet derive --mnemonic "abandon..." --path "m/44'/60'/0'/0/0"
Path: m/44'/60'/0'/0/0
Private Key: 0x...
Public Key: 0x...
Ethereum Address: 0x...

# Generate Bitcoin addresses
$ hdwallet derive --mnemonic "abandon..." --path "m/44'/0'/0'/0" --count 5
Address 0: 1...
Address 1: 1...
Address 2: 1...
Address 3: 1...
Address 4: 1...

# Export extended public key
$ hdwallet xpub --mnemonic "abandon..." --path "m/44'/0'/0'"
xpub: xpub6BosfCnifzxcFwrSzQiqu2DBVTshkCXacvNsWGYJVVhhawA7d4R5WSWGFNbi8Aw6ZRc1brxMyWMzG3DSSSSoekkudhUd9yLb6qx39T9nMdj

# Recover wallet
$ hdwallet recover --mnemonic "abandon..." --passphrase "optional"
Master Private Key: xprv...
Master Public Key: xpub...

Solution Architecture

Module Structure

src/
├── main.rs               # CLI entry point
├── lib.rs                # Public API
├── mnemonic/
│   ├── mod.rs            # Mnemonic coordination
│   ├── wordlist.rs       # BIP-39 word list (English)
│   ├── entropy.rs        # Entropy generation and checksum
│   └── encoding.rs       # Words <-> bits conversion
├── seed/
│   ├── mod.rs            # Seed derivation
│   └── pbkdf2.rs         # PBKDF2-HMAC-SHA512 implementation
├── bip32/
│   ├── mod.rs            # BIP-32 coordination
│   ├── master.rs         # Master key generation
│   ├── derivation.rs     # Child key derivation (CKD)
│   ├── extended_key.rs   # Extended key structure
│   └── serialization.rs  # Base58Check encoding
├── bip44/
│   ├── mod.rs            # BIP-44 path parsing
│   └── path.rs           # Path structure and validation
├── address/
│   ├── mod.rs            # Address generation
│   ├── bitcoin.rs        # Bitcoin address formats
│   └── ethereum.rs       # Ethereum address format
├── crypto/
│   ├── mod.rs            # Cryptographic primitives
│   ├── hmac.rs           # HMAC-SHA512
│   ├── sha256.rs         # SHA-256 (from P01)
│   ├── sha512.rs         # SHA-512
│   └── secp256k1.rs      # Elliptic curve operations (from P02)
└── tests/
    ├── bip39_vectors.rs  # Official BIP-39 test vectors
    ├── bip32_vectors.rs  # Official BIP-32 test vectors
    └── bip44_tests.rs    # Path derivation tests

Core Data Structures

/// A BIP-39 mnemonic phrase
pub struct Mnemonic {
    words: Vec<String>,
    language: Language,
}

/// The derived seed from a mnemonic
pub struct Seed {
    bytes: [u8; 64],  // 512 bits
}

/// An extended key (private or public)
pub struct ExtendedKey {
    /// Network version (mainnet/testnet, private/public)
    version: [u8; 4],
    /// How many derivations from master (0 for master)
    depth: u8,
    /// First 4 bytes of parent's key identifier
    parent_fingerprint: [u8; 4],
    /// Which child this is (0 for master)
    child_index: u32,
    /// Chain code for child derivation
    chain_code: [u8; 32],
    /// The key data (33 bytes: 0x00 + privkey OR compressed pubkey)
    key_data: [u8; 33],
}

/// A derivation path component
pub enum PathComponent {
    /// Normal derivation (0 to 2^31 - 1)
    Normal(u32),
    /// Hardened derivation (shown as i' or iH)
    Hardened(u32),
}

/// A complete derivation path
pub struct DerivationPath {
    /// Path components after 'm'
    components: Vec<PathComponent>,
}

/// Network configuration
pub enum Network {
    Bitcoin(BitcoinNetwork),
    Ethereum,
    Litecoin(LitecoinNetwork),
}

pub enum BitcoinNetwork {
    Mainnet,
    Testnet,
}

Key Algorithms

Mnemonic to Seed (PBKDF2)

function mnemonic_to_seed(mnemonic: string, passphrase: string) -> [u8; 64]:
    password = normalize_nfkd(mnemonic)  // Unicode normalization
    salt = "mnemonic" + normalize_nfkd(passphrase)

    // PBKDF2 with HMAC-SHA512
    derived_key = empty
    block_count = 1  // For 64 bytes output with SHA512

    for block in 1..block_count+1:
        u = hmac_sha512(password, salt || big_endian_u32(block))
        result = u

        for iteration in 2..2048+1:
            u = hmac_sha512(password, u)
            result = xor(result, u)

        derived_key.append(result)

    return derived_key[0..64]

Master Key Derivation

function master_key_from_seed(seed: bytes) -> ExtendedKey:  // BIP-32 accepts 16-64 byte seeds; BIP-39 produces 64
    // Use "Bitcoin seed" regardless of target cryptocurrency
    I = hmac_sha512(key="Bitcoin seed", data=seed)

    IL = I[0..32]   // Master secret key
    IR = I[32..64]  // Master chain code

    // Validate: key must be valid secp256k1 scalar
    if IL >= SECP256K1_ORDER or IL == 0:
        raise "Invalid master key"

    return ExtendedKey {
        version: MAINNET_PRIVATE,  // 0x0488ADE4
        depth: 0,
        parent_fingerprint: [0; 4],
        child_index: 0,
        chain_code: IR,
        key_data: [0x00] + IL,  // 0x00 prefix for private keys
    }

Child Key Derivation (Private)

function derive_child_private(
    parent: ExtendedKey,
    index: u32,
) -> ExtendedKey:
    assert(parent.is_private())

    if index >= HARDENED_OFFSET:  // 0x80000000
        // Hardened: use private key in HMAC input
        data = [0x00] + parent.private_key() + big_endian_u32(index)
    else:
        // Normal: use public key in HMAC input
        data = parent.public_key() + big_endian_u32(index)

    I = hmac_sha512(key=parent.chain_code, data=data)
    IL = I[0..32]
    IR = I[32..64]

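    // Check IL before using it (failure probability is below 1 in 2^127)
    if parse_256(IL) >= SECP256K1_ORDER:
        return derive_child_private(parent, index + 1)  // proceed with next index

    // Child key = (IL + parent_key) mod n
    child_key = (parse_256(IL) + parent.private_key_scalar()) mod SECP256K1_ORDER
    if child_key == 0:
        return derive_child_private(parent, index + 1)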

    return ExtendedKey {
        version: parent.version,
        depth: parent.depth + 1,
        parent_fingerprint: parent.fingerprint(),
        child_index: index,
        chain_code: IR,
        key_data: [0x00] + child_key.to_bytes(),
    }

Child Key Derivation (Public)

function derive_child_public(
    parent: ExtendedKey,
    index: u32,
) -> ExtendedKey:
    assert(parent.is_public())
    assert(index < HARDENED_OFFSET)  // Cannot derive hardened from public

    data = parent.public_key() + big_endian_u32(index)

    I = hmac_sha512(key=parent.chain_code, data=data)
    IL = I[0..32]
    IR = I[32..64]

    if parse_256(IL) >= SECP256K1_ORDER:
        return derive_child_public(parent, index + 1)  // proceed with next index

    // Child pubkey = point(IL) + parent_pubkey
    point_IL = scalar_mult(IL, G)  // IL * generator
    child_pubkey = point_add(point_IL, parent.public_key_point())
    if child_pubkey.is_infinity():
        return derive_child_public(parent, index + 1)

    return ExtendedKey {
        version: parent.version,
        depth: parent.depth + 1,
        parent_fingerprint: parent.fingerprint(),
        child_index: index,
        chain_code: IR,
        key_data: child_pubkey.compress(),  // 33 bytes
    }

Phased Implementation Guide

Phase 1: BIP-39 Word List and Entropy

Goal: Generate valid mnemonic phrases.

Tasks:

  1. Include the BIP-39 English word list (2048 words)
  2. Generate cryptographically random entropy (16-32 bytes)
  3. Calculate SHA-256 checksum
  4. Append checksum bits to entropy
  5. Split into 11-bit groups and map to words

Validation:

// Known test vector from BIP-39
let entropy = hex!("00000000000000000000000000000000");
let mnemonic = generate_mnemonic(&entropy);
assert_eq!(
    mnemonic.words.join(" "),
    "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about"
);

Hints if stuck:

  • Checksum is first entropy_bits / 32 bits of SHA256(entropy)
  • Use bit manipulation to extract 11-bit groups
  • Word index = 11-bit value as unsigned integer

Phase 2: PBKDF2-HMAC-SHA512

Goal: Convert mnemonic to binary seed.

Tasks:

  1. Implement HMAC-SHA512 (if not using library)
  2. Implement PBKDF2 with 2048 iterations
  3. Apply Unicode NFKD normalization to inputs
  4. Generate 64-byte seed from mnemonic + passphrase

Validation:

let mnemonic = "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about";
let passphrase = "";
let seed = mnemonic_to_seed(mnemonic, passphrase);
assert_eq!(
    hex::encode(&seed),
    "5eb00bbddcf069084889a8ab9155568165f5c453ccb85e70811aaed6f6da5fc19a5ac40b389cd370d086206dec8aa6c43daea6690f20ad3d8d48b2d2ce9e38e4"
);

Hints if stuck:

  • PBKDF2: F(Password, Salt, c, i) = U_1 XOR U_2 XOR ... XOR U_c
  • Where U_1 = PRF(Password, Salt || INT(i)) and U_j = PRF(Password, U_{j-1})
  • Salt is literally "mnemonic" + passphrase string
  • Consider using the unicode-normalization crate for NFKD
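The hints map onto a short generic function. So that the sketch runs without a crypto crate, it takes the PRF as a parameter (an assumption for illustration, not how any particular library structures it); pass your HMAC-SHA512, keyed with the password, to get real BIP-39 seeds. The names `pbkdf2_block` and `bip39_seed` are hypothetical:

```rust
/// One PBKDF2 output block (RFC 8018): T_i = U_1 ^ U_2 ^ ... ^ U_c,
/// where U_1 = PRF(P, S || INT(i)) and U_j = PRF(P, U_{j-1}).
/// `prf(key, data)` is assumed to be HMAC-SHA512.
fn pbkdf2_block<F>(prf: F, password: &[u8], salt: &[u8], iterations: u32, block: u32) -> [u8; 64]
where
    F: Fn(&[u8], &[u8]) -> [u8; 64],
{
    let mut msg = salt.to_vec();
    msg.extend_from_slice(&block.to_be_bytes()); // INT(i): 4-byte big-endian
    let mut u = prf(password, &msg);
    let mut result = u;
    for _ in 1..iterations {
        u = prf(password, &u);
        for (r, x) in result.iter_mut().zip(u.iter()) {
            *r ^= *x;
        }
    }
    result
}

/// BIP-39 seed: one 64-byte block, 2048 iterations, salt = "mnemonic" + passphrase.
/// Inputs are assumed to be NFKD-normalized already.
fn bip39_seed<F: Fn(&[u8], &[u8]) -> [u8; 64]>(prf: F, mnemonic: &str, passphrase: &str) -> [u8; 64] {
    let salt = format!("mnemonic{passphrase}");
    pbkdf2_block(prf, mnemonic.as_bytes(), salt.as_bytes(), 2048, 1)
}
```

Because BIP-39 asks for exactly 64 bytes and SHA-512 outputs 64 bytes, only block 1 is ever computed.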

Phase 3: Master Key Derivation

Goal: Generate master extended key from seed.

Tasks:

  1. Apply HMAC-SHA512 with key “Bitcoin seed”
  2. Split result into private key (IL) and chain code (IR)
  3. Validate private key is valid secp256k1 scalar
  4. Create ExtendedKey structure

Validation:

let seed = hex!("000102030405060708090a0b0c0d0e0f");  // BIP-32 Test Vector 1 seed (16 bytes)
let master = master_key_from_seed(&seed);
assert_eq!(
    hex::encode(master.private_key()),
    "e8f32e723decf4051aefac8e2c93c9c5b214313817cdb01a1494b917c8436b35"
);
assert_eq!(
    hex::encode(&master.chain_code),
    "873dff81c02f525623fd1fe5167eac3a55a049de3d314bb42ee227ffed37d508"
);

Hints if stuck:

  • “Bitcoin seed” is a UTF-8 string used as HMAC key
  • Private key must be < secp256k1 order (n)
  • If invalid, specification says “master key is invalid” (regenerate entropy)

Phase 4: Child Key Derivation (Normal)

Goal: Derive child keys using normal derivation.

Tasks:

  1. Implement CKDpriv for private parent to private child
  2. Serialize parent public key (compressed, 33 bytes)
  3. Construct HMAC input: compressed_pubkey || index (33 + 4 = 37 bytes)
  4. Calculate child key as (IL + parent) mod n
  5. Handle edge cases (IL >= n, result = 0)

Validation:

let parent = master_key;
let child_0 = derive_child(&parent, 0);
// Test against BIP-32 test vectors

Hints if stuck:

  • Index is 4-byte big-endian
  • Use secp256k1 point multiplication from Project 2
  • Child private key = (IL + parent private key) mod curve order

Phase 5: Child Key Derivation (Hardened)

Goal: Implement hardened derivation.

Tasks:

  1. Detect hardened index (>= 0x80000000)
  2. For hardened: HMAC input is 0x00 || private_key || index (1 + 32 + 4 = 37 bytes)
  3. Same arithmetic as normal derivation
  4. Record the parent fingerprint in the child exactly as for normal children

Validation:

// Derive m/0' (first hardened child)
let child_0h = derive_child(&master, 0x80000000);
// Compare against test vectors

Hints if stuck:

  • Hardened derivation REQUIRES parent private key
  • The 0x00 byte prefix distinguishes from public key (which starts 0x02 or 0x03)
  • Cannot derive hardened children from extended public key

Phase 6: Path Parsing and Multi-Level Derivation

Goal: Parse and apply derivation paths.

Tasks:

  1. Parse path strings like "m/44'/60'/0'/0/0"
  2. Handle both ' and H (or h) notation for hardened
  3. Apply sequential derivations
  4. Validate path format and indices

Validation:

let path = DerivationPath::parse("m/44'/60'/0'/0/0")?;
let eth_address_0 = derive_path(&master, &path);
// Should match known Ethereum address for test mnemonic

Hints if stuck:

  • Split on /; the first element must be “m” (reject paths without it)
  • Strip exactly one trailing ', h, or H to detect hardened
  • Reject numbers >= 2^31, then index for hardened = parsed_number + 0x80000000

Phase 7: Extended Key Serialization

Goal: Encode/decode xprv and xpub strings.

Tasks:

  1. Construct 78-byte payload (version, depth, fingerprint, index, chain code, key)
  2. Apply Base58Check encoding (double SHA256 checksum)
  3. Implement decoding with checksum validation
  4. Support different version bytes (mainnet/testnet, private/public)

Validation:

let xprv_string = master.to_string();
assert!(xprv_string.starts_with("xprv"));
let decoded = ExtendedKey::from_string(&xprv_string)?;
assert_eq!(decoded, master);

Hints if stuck:

  • Base58 alphabet: “123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz”
  • Checksum: first 4 bytes of SHA256(SHA256(payload))
  • Version bytes for mainnet private: 0x0488ADE4
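A dependency-free Base58 encoder (without the checksum) is short enough to write by hand. This sketch uses repeated division by 58; for Base58Check, append the first 4 bytes of SHA256(SHA256(payload)) from your own SHA-256 before calling it. `base58_encode` is a hypothetical name:

```rust
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Plain Base58 encoding of a big-endian byte string (no checksum).
fn base58_encode(input: &[u8]) -> String {
    // Each leading zero byte is encoded as a literal '1'
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    // Base-58 digits of the remaining number, least significant first
    let mut digits: Vec<u8> = Vec::new();
    for &byte in &input[zeros..] {
        // Multiply the running number by 256 and add `byte`
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}
```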

Phase 8: Address Generation

Goal: Generate cryptocurrency addresses from keys.

Tasks:

  1. Derive public key from private key (secp256k1)
  2. Bitcoin P2PKH: Base58Check(0x00 version byte || RIPEMD160(SHA256(compressed pubkey)))
  3. Bitcoin P2WPKH: Bech32 encoding of witness program
  4. Ethereum: Last 20 bytes of Keccak256(uncompressed pubkey without 0x04 prefix)

Validation:

// Using test mnemonic "abandon abandon ... about"
let btc_path = DerivationPath::parse("m/44'/0'/0'/0/0")?;
let btc_key = derive_path(&master, &btc_path);
let btc_address = bitcoin_address(&btc_key.public_key());
// Check against known address

let eth_path = DerivationPath::parse("m/44'/60'/0'/0/0")?;
let eth_key = derive_path(&master, &eth_path);
let eth_address = ethereum_address(&eth_key.public_key());
assert_eq!(eth_address, "0x9858EfFD232B4033E47d90003D41EC34EcaEda94");

Hints if stuck:

  • Bitcoin uses compressed public keys (33 bytes)
  • Ethereum uses uncompressed without the 0x04 prefix (64 bytes)
  • Keccak256 is NOT SHA3-256 (different padding)

Testing Strategy

Unit Tests

#[test]
fn test_checksum_calculation() {
    let entropy = [0u8; 16];  // All zeros
    let checksum = calculate_checksum(&entropy);
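    // For 128-bit entropy the checksum is the first 4 bits of SHA256(entropy)
    // (assuming calculate_checksum returns them right-aligned in an integer).
    // SHA256 of 16 zero bytes starts with 0x3, so the checksum is 0b0011,
    // which is why the 12th word of the all-zero mnemonic is "about" (index 3).
    assert_eq!(checksum, 0b0011);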
}

#[test]
fn test_11bit_extraction() {
    let bits = [0b11111111, 0b11111111];
    let word_index = extract_11bits(&bits, 0);
    assert_eq!(word_index, 0b11111111111);  // 2047
}

#[test]
fn test_hardened_index() {
    assert!(!is_hardened(0));
    assert!(!is_hardened(0x7FFFFFFF));
    assert!(is_hardened(0x80000000));
    assert!(is_hardened(0xFFFFFFFF));
}

BIP-39 Official Test Vectors

#[test]
fn test_bip39_vector_1() {
    // Test vector from BIP-39 specification
    let entropy = hex!("00000000000000000000000000000000");
    let mnemonic = generate_mnemonic(&entropy);
    let expected = "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about";
    assert_eq!(mnemonic.to_string(), expected);

    let seed = mnemonic_to_seed(&mnemonic.to_string(), "TREZOR");
    let expected_seed = hex!("c55257c360c07c72029aebc1b53c05ed0362ada38ead3e3e9efa3708e53495531f09a6987599d18264c1e1c92f2cf141630c7a3c4ab7c81b2f001698e7463b04");
    assert_eq!(seed, expected_seed);
}

#[test]
fn test_bip39_vector_japanese() {
    // Japanese word list test
    let entropy = hex!("00000000000000000000000000000000");
    let mnemonic = generate_mnemonic_language(&entropy, Language::Japanese);
    // ... validate Japanese words
}

BIP-32 Official Test Vectors

#[test]
fn test_bip32_vector_1() -> Result<(), Box<dyn std::error::Error>> {
    // Test Vector 1 from BIP-32
    let seed = hex!("000102030405060708090a0b0c0d0e0f");

    // Chain m
    let m = master_key_from_seed(&seed);
    assert_eq!(m.to_xpub_string(), "xpub661MyMwAqRbcFtXgS5sYJABqqG9YLmC4Q1Rdap9gSE8NqtwybGhePY2gZ29ESFjqJoCu1Rupje8YtGqsefD265TMg7usUDFdp6W1EGMcet8");
    assert_eq!(m.to_xprv_string(), "xprv9s21ZrQH143K3GJpoapnV8SFfuZcESnPVTaH9d1a2Ks1NxKU1LTDhP1uqPRimb2ZxhSuwz4dPWJn4HMxqbVUmKRMF1ixQ7KneSZZS3E7DxC");

    // Chain m/0'
    let m_0h = derive_path(&m, &parse_path("m/0'")?);
    assert_eq!(m_0h.to_xpub_string(), "xpub68Gmy5EdvgibQVfPdqkBBCHxA5htiqg55crXYuXoQRKfDBFA1WEjWgP6LHhwBZeNK1VTsfTFUHCdrfp1bgwQ9xv5ski8PX9rL2dZXvgGDnw");

    // Chain m/0'/1
    let _m_0h_1 = derive_path(&m, &parse_path("m/0'/1")?);
    // ... continue for full test vector
    Ok(())
}

Address Generation Tests

#[test]
fn test_bitcoin_address_generation() -> Result<(), Box<dyn std::error::Error>> {
    let mnemonic = "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about";
    let master = master_key_from_mnemonic(mnemonic, "")?;

    // First Bitcoin receiving address
    let key = derive_path(&master, &parse_path("m/44'/0'/0'/0/0")?);
    let address = bitcoin_p2pkh_address(&key.public_key(), BitcoinNetwork::Mainnet);

    // Expected address for this test vector (cross-check with an independent BIP-44 implementation)
    assert_eq!(address, "1HZwkjkeaoZfTSaJxDw6aKkxp45agDiEzN");
    Ok(())
}

#[test]
fn test_ethereum_address_generation() -> Result<(), Box<dyn std::error::Error>> {
    let mnemonic = "abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about";
    let master = master_key_from_mnemonic(mnemonic, "")?;

    let key = derive_path(&master, &parse_path("m/44'/60'/0'/0/0")?);
    let address = ethereum_address(&key.public_key());

    assert_eq!(address.to_lowercase(), "0x9858effd232b4033e47d90003d41ec34ecaeda94");
    Ok(())
}

Cross-Implementation Testing

#[test]
fn test_compatibility_with_trezor() {
    // Generate addresses and compare with Trezor's expected outputs
    // These are published test vectors
}

#[test]
fn test_compatibility_with_ledger() {
    // Same addresses should be generated as Ledger hardware wallets
}

#[test]
fn test_metamask_compatibility() {
    // MetaMask uses specific derivation paths for Ethereum
    // Verify our addresses match
}

Common Pitfalls & Debugging

Pitfall 1: Bit Manipulation Errors in Mnemonic Generation

Problem: Incorrectly extracting 11-bit groups from the entropy+checksum.

Symptom: Wrong words generated, checksum validation fails.

Solution:

fn extract_11bits(bytes: &[u8], bit_offset: usize) -> u16 {
    // An 11-bit group starting anywhere in a byte spans at most 3 bytes
    let byte_offset = bit_offset / 8;
    let bit_in_byte = bit_offset % 8;

    // Load up to 3 bytes into a 24-bit window:
    // byte 0 -> bits 23..16, byte 1 -> bits 15..8, byte 2 -> bits 7..0
    let mut value: u32 = 0;
    for i in 0..3 {
        if byte_offset + i < bytes.len() {
            value |= (bytes[byte_offset + i] as u32) << (16 - 8 * i);
        }
    }

    // The group occupies bits (23 - bit_in_byte) down to (13 - bit_in_byte):
    // shift it to the bottom and keep 11 bits
    ((value >> (13 - bit_in_byte)) & 0x7FF) as u16
}

Pitfall 2: Unicode Normalization

Problem: Mnemonic words must be NFKD normalized before PBKDF2.

Symptom: Seed differs from test vectors, especially with non-ASCII passphrases.

Solution:

use unicode_normalization::UnicodeNormalization;

fn mnemonic_to_seed(mnemonic: &str, passphrase: &str) -> [u8; 64] {
    let normalized_mnemonic = mnemonic.nfkd().collect::<String>();
    let normalized_passphrase = passphrase.nfkd().collect::<String>();
    let salt = format!("mnemonic{}", normalized_passphrase);

    pbkdf2_hmac_sha512(&normalized_mnemonic, &salt, 2048)
}

Pitfall 3: Big-Endian vs Little-Endian

Problem: Indices and version bytes must be big-endian.

Symptom: Wrong child keys, invalid serialization.

Solution:

// Child index to bytes
fn index_to_bytes(index: u32) -> [u8; 4] {
    index.to_be_bytes()  // Big-endian!
}

// Reading version from serialized xpub/xprv
fn read_version(bytes: &[u8]) -> u32 {
    u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

Pitfall 4: Hardened Index Representation

Problem: Confusing the display notation (44') with the actual index value (44 + 2^31 = 0x8000002C).

Symptom: Wrong derivation paths, incompatible with other wallets.

Solution:

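const HARDENED_OFFSET: u32 = 0x8000_0000;

fn parse_path_component(s: &str) -> Result<u32, String> {
    // Accept ' or h/H as the hardened marker; strip exactly one
    let (num_str, is_hardened) = match s.strip_suffix(|c: char| c == '\'' || c == 'h' || c == 'H') {
        Some(rest) => (rest, true),
        None => (s, false),
    };
    let index: u32 = num_str.parse().map_err(|e| format!("invalid index {s:?}: {e}"))?;

    // Both forms take a 31-bit number; a larger value would overflow
    // (or silently turn hardened) once the offset is added
    if index >= HARDENED_OFFSET {
        return Err(format!("index {index} out of range (must be < 2^31)"));
    }

    Ok(if is_hardened { index + HARDENED_OFFSET } else { index })
}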
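Applying the component rule to a whole path is mostly splitting on "/". This self-contained sketch parses a full path into raw indices and formats them back for a round-trip check. `parse_path` and `format_path` are illustrative names, and the sketch accepts only paths rooted at a private "m".

```rust
const HARDENED: u32 = 0x8000_0000;

/// Parses "m/44'/60'/0'/0/0" into raw child indices (hardened ones have bit 31 set).
fn parse_path(path: &str) -> Result<Vec<u32>, String> {
    let mut parts = path.split('/');
    if parts.next() != Some("m") {
        return Err(format!("path must start with \"m\": {path:?}"));
    }
    parts
        .map(|part| -> Result<u32, String> {
            // Exactly one trailing ', h, or H marks a hardened component
            let (digits, hardened) =
                match part.strip_suffix(|c: char| c == '\'' || c == 'h' || c == 'H') {
                    Some(rest) => (rest, true),
                    None => (part, false),
                };
            let index: u32 = digits
                .parse()
                .map_err(|e| format!("bad component {part:?}: {e}"))?;
            if index >= HARDENED {
                return Err(format!("component {part:?} is out of range"));
            }
            Ok(if hardened { index | HARDENED } else { index })
        })
        .collect()
}

/// Formats raw indices back into the ' notation.
fn format_path(indices: &[u32]) -> String {
    let mut s = String::from("m");
    for &i in indices {
        if i >= HARDENED {
            s += &format!("/{}'", i - HARDENED);
        } else {
            s += &format!("/{i}");
        }
    }
    s
}
```

For example, parse_path("m/44'/60'/0'/0/0") yields [0x8000002C, 0x8000003C, 0x80000000, 0, 0].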

Pitfall 5: Key Validity Edge Cases

Problem: parse256(IL) might be >= the curve order n, or the child key (IL + k_par) mod n might be zero.

Symptom: Invalid keys, crashes, or security vulnerabilities.

Solution:

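// ExtendedKey, U256, SECP256K1_ORDER, add_mod_n and Error are placeholders
fn derive_child(parent: &ExtendedKey, index: u32) -> Result<ExtendedKey> {
    // Hardened: I = HMAC-SHA512(c_par, 0x00 || ser256(k_par) || ser32(i))
    // Normal:   I = HMAC-SHA512(c_par, serP(point(k_par)) || ser32(i))
    let i = hmac_sha512(/* ... */);
    let (il, ir) = i.split_at(32);

    // parse256(IL) as a raw 256-bit integer; a constructor that silently
    // reduces mod n would make the next check dead code
    let il_num = U256::from_be_slice(il);

    // Check: IL must be < n
    if il_num >= SECP256K1_ORDER {
        // BIP-32: this key is invalid; proceed with the next index.
        // Report it so the caller moves to index + 1 explicitly: recursing here
        // returns a key for a different index than requested, and index + 1
        // can cross the hardened boundary or overflow at u32::MAX
        return Err(Error::InvalidChild(index));
    }

    // k_i = (IL + k_par) mod n; use a modular add, since the plain sum
    // of two 256-bit values can exceed 2^256
    let child_key = add_mod_n(il_num, parent.private_key());

    // Check: result must not be zero
    if child_key.is_zero() {
        return Err(Error::InvalidChild(index));
    }

    // IR becomes the child's chain code
    Ok(/* construct child from child_key, ir, index */)
}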
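The range and zero checks are easy to get subtly wrong with a general bignum type. As a stdlib-only sketch (not constant-time, so suitable for learning and test vectors only), 32-byte big-endian arrays can be compared directly because lexicographic byte order equals numeric order, and the modular addition needs at most one subtraction of n. SECP256K1_N is the secp256k1 group order; `add_mod_n` and `child_private_key` are illustrative names.

```rust
/// secp256k1 group order n, big-endian.
const SECP256K1_N: [u8; 32] = [
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE,
    0xBA, 0xAE, 0xDC, 0xE6, 0xAF, 0x48, 0xA0, 0x3B,
    0xBF, 0xD2, 0x5E, 0x8C, 0xD0, 0x36, 0x41, 0x41,
];

/// (a + b) mod n for big-endian 256-bit values that are already < n.
fn add_mod_n(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut sum = [0u8; 32];
    let mut carry = 0u16;
    for i in (0..32).rev() {
        let s = a[i] as u16 + b[i] as u16 + carry;
        sum[i] = s as u8;
        carry = s >> 8;
    }
    // a + b < 2n, so subtracting n once is enough; a carry out of the top
    // byte cancels against the final borrow of this subtraction
    if carry == 1 || sum >= SECP256K1_N {
        let mut borrow = 0i16;
        for i in (0..32).rev() {
            let d = sum[i] as i16 - SECP256K1_N[i] as i16 - borrow;
            sum[i] = d.rem_euclid(256) as u8;
            borrow = (d < 0) as i16;
        }
    }
    sum
}

/// BIP-32 child private key, or None when this index must be skipped.
fn child_private_key(il: &[u8; 32], parent_key: &[u8; 32]) -> Option<[u8; 32]> {
    if *il >= SECP256K1_N {
        return None; // parse256(IL) >= n
    }
    let k = add_mod_n(il, parent_key);
    if k == [0u8; 32] {
        return None; // k_i == 0
    }
    Some(k)
}
```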

Pitfall 6: Fingerprint Calculation

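Problem: The parent fingerprint is the first 4 bytes of HASH160 (RIPEMD-160 of SHA-256) of the parent’s compressed 33-byte public key, even when serializing an xprv. The master key uses 0x00000000.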

Symptom: Extended key strings don’t match test vectors.

Solution:

fn fingerprint(public_key: &PublicKey) -> [u8; 4] {
    let compressed = public_key.to_compressed_bytes();  // 33 bytes
    let hash160 = ripemd160(&sha256(&compressed));
    [hash160[0], hash160[1], hash160[2], hash160[3]]
}

Extensions and Challenges

Challenge 1: Implement BIP-85

BIP-85 defines “Deterministic Entropy From BIP32 Keychains” - deriving child entropies that can generate entirely new mnemonics.

Application: Create child wallets for different purposes, each with its own 12- or 24-word backup, all derivable from a single master mnemonic.
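A starting point is the derivation path BIP-85 assigns to child mnemonics. This sketch is based on my reading of the spec and should be checked against its test vectors. It assumes the layout m/83696968'/39'/{language}'/{words}'/{index}', with language 0 for English; 83696968 is ASCII "SEED" read as decimal. The child entropy is then the leading bytes of HMAC-SHA512(key = "bip-entropy-from-k", msg = derived private key), which needs a crypto library and is not shown. The function names are illustrative.

```rust
const BIP85_PURPOSE: u32 = 83_696_968; // "SEED" as ASCII decimal codes 83 69 69 68
const APP_BIP39: u32 = 39;

/// Derivation path for a child BIP-39 mnemonic (language 0 = English).
fn bip85_mnemonic_path(language: u32, words: u32, index: u32) -> Result<String, String> {
    if ![12, 18, 24].contains(&words) {
        return Err(format!("unsupported word count {words}"));
    }
    if index >= 0x8000_0000 {
        return Err(format!("index {index} does not fit a hardened component"));
    }
    Ok(format!("m/{BIP85_PURPOSE}'/{APP_BIP39}'/{language}'/{words}'/{index}'"))
}

/// How many leading bytes of the 64-byte HMAC output become BIP-39 entropy.
fn bip85_entropy_len(words: u32) -> usize {
    (words as usize / 3) * 4 // 12 -> 16, 18 -> 24, 24 -> 32
}
```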

Challenge 2: Multi-Signature Path Support (BIP-48)

Implement BIP-48 derivation paths for multi-signature wallets:

m/48'/coin_type'/account'/script_type'/change/address_index

script_type: 1' = P2SH-P2WSH, 2' = P2WSH
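Encoding the script type as an enum keeps the magic numbers in one place. A minimal sketch, assuming a wallet that builds paths as strings; `ScriptType` and `bip48_path` are illustrative names. Every cosigner must use the same script type, or their xpubs describe different wallets.

```rust
/// BIP-48 script_type' values.
#[derive(Clone, Copy)]
enum ScriptType {
    /// 1': nested SegWit multisig
    P2shP2wsh = 1,
    /// 2': native SegWit multisig
    P2wsh = 2,
}

/// m/48'/coin_type'/account'/script_type'/change/address_index
fn bip48_path(coin_type: u32, account: u32, script: ScriptType, change: u32, index: u32) -> String {
    format!("m/48'/{coin_type}'/{account}'/{}'/{change}/{index}", script as u32)
}
```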

Challenge 3: Implement BIP-84 (Native SegWit)

Add support for native SegWit addresses (bc1…) using:

Path: m/84'/coin_type'/account'/change/address_index
Address: Bech32 encoding (BIP-173) of the version-0 witness program, which is HASH160 of the compressed public key
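The Bech32 step is small enough to write from the spec. This stdlib-only sketch ports the BIP-173 reference checksum and 8-to-5-bit regrouping and encodes a version-0 witness program. It assumes the caller already computed the 20-byte HASH160, since hashing needs a crypto crate. The function names are illustrative; version 1+ programs would need Bech32m (BIP-350) instead.

```rust
/// Bech32 alphabet from BIP-173.
const CHARSET: &[u8; 32] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";

/// BIP-173 checksum polynomial over 5-bit values.
fn polymod(values: &[u8]) -> u32 {
    const GEN: [u32; 5] = [0x3b6a_57b2, 0x2650_8e6d, 0x1ea1_19fa, 0x3d42_33dd, 0x2a14_62b3];
    let mut chk: u32 = 1;
    for &v in values {
        let top = chk >> 25;
        chk = ((chk & 0x1ff_ffff) << 5) ^ v as u32;
        for (i, g) in GEN.iter().enumerate() {
            if (top >> i) & 1 == 1 {
                chk ^= g;
            }
        }
    }
    chk
}

/// High bits of each HRP char, a zero separator, then the low bits.
fn hrp_expand(hrp: &str) -> Vec<u8> {
    let mut v: Vec<u8> = hrp.bytes().map(|c| c >> 5).collect();
    v.push(0);
    v.extend(hrp.bytes().map(|c| c & 31));
    v
}

/// Regroups 8-bit bytes into 5-bit groups, zero-padding the tail.
fn to_5bit(data: &[u8]) -> Vec<u8> {
    let (mut acc, mut bits, mut out) = (0u32, 0u32, Vec::new());
    for &b in data {
        acc = ((acc << 8) | b as u32) & 0xFFF; // at most 12 pending bits
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(((acc >> bits) & 31) as u8);
        }
    }
    if bits > 0 {
        out.push(((acc << (5 - bits)) & 31) as u8);
    }
    out
}

/// Encodes a version-0 witness program (20-byte P2WPKH or 32-byte P2WSH).
fn segwit_v0_address(hrp: &str, program: &[u8]) -> String {
    let mut data = vec![0u8]; // witness version 0
    data.extend(to_5bit(program));
    let mut values = hrp_expand(hrp);
    values.extend(&data);
    values.extend([0u8; 6]);
    let checksum = polymod(&values) ^ 1; // Bech32 constant for version 0
    let mut out = format!("{hrp}1");
    out.extend(data.iter().map(|&d| CHARSET[d as usize] as char));
    out.extend((0..6).map(|i| CHARSET[((checksum >> (5 * (5 - i))) & 31) as usize] as char));
    out
}
```

The BIP-173 example program 751e76e8199196d454941c45d1b3a323f1433bd6 should encode to bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4 on mainnet.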

Challenge 4: Shamir’s Secret Sharing (SLIP-39)

Implement SLIP-39, which splits a seed into multiple shares where k-of-n shares are required to recover:

5 shares, any 3 required = a 3-of-5 "Shamir Backup" (SLIP-39 allows at most 16 shares per group)
Protects against both the loss of up to 2 shares AND the theft of any 2
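The k-of-n property comes from polynomial interpolation. Below is a toy Shamir split over GF(2^8) with the AES reduction polynomial x^8 + x^4 + x^3 + x + 1: one random polynomial of degree k-1 per secret byte, with the secret as the constant term, recovered by Lagrange interpolation at x = 0. This shows the core idea only and is not SLIP-39, which adds its own share format, word encoding, checksums and groups. The caller-supplied random byte source is an assumption and must be a CSPRNG in real use.

```rust
/// Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B).
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut p = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            p ^= a;
        }
        let hi = a & 0x80;
        a <<= 1;
        if hi != 0 {
            a ^= 0x1B;
        }
        b >>= 1;
    }
    p
}

/// a^254 = a^-1 for nonzero a (the multiplicative group has order 255).
fn gf_inv(a: u8) -> u8 {
    (0..254).fold(1u8, |r, _| gf_mul(r, a))
}

/// Evaluates a polynomial (constant term first) at x with Horner's rule.
fn eval(coeffs: &[u8], x: u8) -> u8 {
    coeffs.iter().rev().fold(0, |acc, &c| gf_mul(acc, x) ^ c)
}

/// Splits `secret` into n shares at x = 1..=n; any k of them recover it.
fn split(secret: &[u8], k: usize, n: u8, mut rand_byte: impl FnMut() -> u8) -> Vec<(u8, Vec<u8>)> {
    let mut shares: Vec<(u8, Vec<u8>)> = (1..=n).map(|x| (x, Vec::new())).collect();
    for &s in secret {
        let mut coeffs = vec![s];
        coeffs.extend((1..k).map(|_| rand_byte()));
        for (x, ys) in shares.iter_mut() {
            ys.push(eval(&coeffs, *x));
        }
    }
    shares
}

/// Lagrange interpolation at x = 0 (subtraction is XOR in GF(2^8)).
fn combine(shares: &[(u8, Vec<u8>)]) -> Vec<u8> {
    (0..shares[0].1.len())
        .map(|pos| {
            let mut s = 0u8;
            for (i, (xi, yi)) in shares.iter().enumerate() {
                let mut basis = 1u8;
                for (j, (xj, _)) in shares.iter().enumerate() {
                    if i != j {
                        basis = gf_mul(basis, gf_mul(*xj, gf_inv(xj ^ xi)));
                    }
                }
                s ^= gf_mul(yi[pos], basis);
            }
            s
        })
        .collect()
}
```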

Challenge 5: Air-Gapped Signing Workflow

Build a complete offline signing workflow:

  1. Watch-only wallet (xpub) generates unsigned transactions
  2. QR code transfer to air-gapped device
  3. Offline device signs with private keys
  4. QR code transfer of signed transaction back
  5. Online device broadcasts

Challenge 6: Hardware Wallet Simulation

Implement the core of a hardware wallet:

  • Secure key storage (simulated secure element)
  • Transaction parsing and display
  • User confirmation before signing
  • PIN protection
  • Plausible deniability passphrases

Real-World Connections

Hardware Wallets (Ledger, Trezor)

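Mainstream hardware wallets such as these implement BIP-32/39/44:

  • The device generates 24 words during setup (or you enter them when restoring)
  • The device derives the master seed and keeps it in protected storage (on Ledger, a secure element)
  • Each cryptocurrency uses its registered SLIP-44 coin type at the coin_type’ level of the BIP-44 path
  • The device shows addresses and signs transactions without exporting private keys
  • Your 24 words can be restored in any compatible wallet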
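The coin type is what separates chains that share one seed. A minimal sketch with a handful of SLIP-44 registrations (the symbol strings and function names are illustrative; a real wallet would carry the full registry):

```rust
/// A few SLIP-44 registered coin types (the second BIP-44 path level).
fn coin_type(symbol: &str) -> Option<u32> {
    match symbol {
        "BTC" => Some(0),
        "TESTNET" => Some(1), // shared by all testnets
        "LTC" => Some(2),
        "DOGE" => Some(3),
        "ETH" => Some(60),
        _ => None,
    }
}

/// BIP-44: m / purpose' / coin_type' / account' / change / address_index
fn bip44_path(symbol: &str, account: u32, change: u32, index: u32) -> Option<String> {
    let coin = coin_type(symbol)?;
    Some(format!("m/44'/{coin}'/{account}'/{change}/{index}"))
}
```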

MetaMask and Browser Wallets

MetaMask uses:

  • BIP-39 for the 12-word seed phrase
  • BIP-44 path m/44'/60'/0'/0/0 for the first Ethereum account
  • Additional accounts increment the final address_index (m/44'/60'/0'/0/1, m/44'/60'/0'/0/2, ...), not the account’ level
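Which level a wallet increments decides whether the same seed shows the same accounts elsewhere. This sketch contrasts MetaMask's layout with an account-level layout that some other wallets use (Ledger Live's default Ethereum scheme, for example). The function names are illustrative. The two layouts agree only on the first account.

```rust
/// MetaMask-style accounts: only the final address_index changes.
fn metamask_account_paths(count: u32) -> Vec<String> {
    (0..count).map(|i| format!("m/44'/60'/0'/0/{i}")).collect()
}

/// Account-level layout: the hardened account' component changes instead.
fn account_level_paths(count: u32) -> Vec<String> {
    (0..count).map(|i| format!("m/44'/60'/{i}'/0/0")).collect()
}
```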

Exchange Hot/Cold Wallet Architecture

Exchanges use HD wallets to separate address generation from signing:

  1. Online (hot) server: holds only the xpub and generates deposit addresses
  2. Cold storage: xprv kept offline in an HSM or behind multi-sig
  3. Customer deposits go to unique xpub-derived addresses, so a compromised online server can watch funds but not spend them
  4. Withdrawals require signatures from cold storage
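Because an xpub can only derive non-hardened children, the online server's index space below the xpub is limited to 0..2^31. The sketch below shows a hypothetical mapping from customer ids to relative deposit paths. The function name and the one-address-per-customer scheme are assumptions for illustration.

```rust
const HARDENED_OFFSET: u32 = 0x8000_0000;

/// Relative path (below the exchange's account xpub) for a customer's
/// deposit address: external chain 0, one address index per customer.
fn deposit_path(customer_id: u32) -> Result<String, String> {
    if customer_id >= HARDENED_OFFSET {
        // Hardened indices need the private key, which the online server lacks
        return Err(format!("customer id {customer_id} would need a hardened index"));
    }
    Ok(format!("0/{customer_id}"))
}
```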

Recovery Services

When you “restore” a wallet:

  1. Words are validated against the word list
  2. Checksum is verified
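  3. PBKDF2 generates the seed (from the words plus any passphrase)
  4. Standard paths are scanned for transaction history, stopping after a run of unused addresses (BIP-44 suggests a gap limit of 20)
  5. All your used addresses are rediscovered

This is why:

  • Word order matters
  • A wrong word usually fails the checksum (a random substitution still passes with probability about 1/16 for 12 words, 1/256 for 24)
  • The same words (and passphrase) always recover the same addresses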
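The scanning step is a simple loop with a counter. A minimal sketch of gap-limit discovery on one chain, where `is_used` is a hypothetical callback standing in for a blockchain or indexer lookup:

```rust
/// Scans one chain (e.g. external addresses) until `gap_limit`
/// consecutive unused indices are seen; returns the used indices.
/// `is_used` is a stand-in for a blockchain lookup.
fn scan_chain(gap_limit: u32, mut is_used: impl FnMut(u32) -> bool) -> Vec<u32> {
    let mut used = Vec::new();
    let mut unused_run = 0;
    let mut index = 0u32;
    while unused_run < gap_limit {
        if is_used(index) {
            used.push(index);
            unused_run = 0; // any hit resets the gap counter
        } else {
            unused_run += 1;
        }
        index += 1;
    }
    used
}
```

The second test case illustrates the design trade-off: funds sent to an address beyond the gap (index 30 after a lone index 0) are not discovered unless the user raises the limit.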

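The 2015 Bitstamp Hack

In January 2015, attackers who had compromised Bitstamp staff through social engineering gained access to the exchange’s hot wallet keys. Those keys could spend every coin in the hot wallet immediately, and the attackers drained it.

With a hardened HD wallet architecture:

  • A server holding one hardened child key would expose only that branch
  • Sibling branches and the parent would remain secure, because a hardened child key cannot be used to recover its parent or siblings
  • The blast radius of any single key compromise would be limited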

Resources

Primary References

  1. BIP-32: Hierarchical Deterministic Wallets - Official specification
  2. BIP-39: Mnemonic code for generating deterministic keys - Mnemonic standard
  3. BIP-44: Multi-Account Hierarchy - Path conventions
  4. “Mastering Bitcoin” Chapter 5 - HD Wallets explained accessibly

Code References

  1. bitcoinjs-lib (GitHub) - JavaScript reference
  2. python-mnemonic (GitHub) - Trezor’s Python implementation
  3. rust-bip39 (GitHub) - Rust implementation

Test Vectors

  1. BIP-39 test vectors - official vectors linked from the BIP-39 specification
  2. BIP-32 test vectors - included in the BIP-32 specification
  3. Ian Coleman’s BIP39 Tool - interactive testing

Supplementary Reading

  1. SLIP-0010: Generalization of BIP-32 derivation to other curves (NIST P-256, Ed25519)
  2. BIP-84: Native SegWit derivation paths
  3. SLIP-39: Shamir’s Secret Sharing for recovery

Self-Assessment Checklist

Before moving to the next project, verify:

  • I can explain why HD wallets need only one backup for an effectively unlimited number of addresses
  • I understand the entropy -> checksum -> words conversion
  • I can implement PBKDF2 and explain why BIP-39 uses 2048 iterations
  • I understand the security difference between hardened and normal derivation
  • I can parse and apply a BIP-44 derivation path
  • My implementation matches official test vectors for BIP-32 and BIP-39
  • I can explain why leaking one normally derived child private key, together with the parent xpub, exposes the parent private key and therefore every sibling
  • I understand why “Bitcoin seed” is used even for Ethereum keys
  • I can generate valid Bitcoin and Ethereum addresses from a mnemonic

Conceptual Questions

  1. Why can’t you derive hardened children from an extended public key?
  2. What happens if you use the wrong passphrase with the correct mnemonic?
  3. Why is the chain code necessary? What would happen without it?
  4. How does the 24th word encode the checksum?
  5. Why do exchanges share xpubs with their hot wallet servers?

What’s Next?

With your HD wallet implementation complete, you now understand the key management system used by every modern cryptocurrency wallet. You can derive a practically unlimited number of addresses (2^31 normal and 2^31 hardened children per key) from a single backup, understand the security implications of different derivation paths, and appreciate why hardware wallets are designed the way they are.

In Project 7: Bitcoin Transaction Parser, you’ll learn how these keys are actually used - decoding raw Bitcoin transactions to understand inputs, outputs, scripts, and the SegWit data structure. You’ll see exactly what gets signed when you “send” Bitcoin.