P07: Bitcoin Transaction Parser

Project Overview

Attribute              Value
Main Language          Rust
Alternative Languages  C, Go, Python
Difficulty             Advanced
Coolness Level         Level 4: Hardcore Tech Flex
Business Potential     Resume Gold (Educational/Personal Brand)
Knowledge Area         Protocols / Bitcoin
Main Book              “Mastering Bitcoin” by Andreas M. Antonopoulos

Learning Objectives

By completing this project, you will:

  1. Master Bitcoin’s binary transaction format: understand exactly how transactions are serialized and why each field exists
  2. Implement variable-length encoding: learn the CompactSize integers (varints) used throughout Bitcoin
  3. Parse and interpret Bitcoin Script: the stack-based programming language that defines spending conditions
  4. Understand SegWit (Segregated Witness): how it changed the transaction format to fix malleability and increase throughput
  5. Compute Transaction IDs (TXIDs): understand which parts of a transaction are hashed and why

Deep Theoretical Foundation

Why Transaction Parsing Matters

Every Bitcoin ever transferred exists as a transaction in the blockchain. When you “send” Bitcoin, you’re really creating a cryptographically signed message that says: “I’m spending outputs from previous transactions and creating new outputs that can only be spent by whoever knows the private key for address X.”

But what does this message actually look like? It’s not JSON. It’s not XML. It’s a carefully designed binary format, optimized for size and unambiguous parsing. Understanding this format means understanding Bitcoin itself at the deepest level.

When you run a Bitcoin node, every single transaction that arrives over the network must be:

  1. Parsed - Decoded from raw bytes into structured data
  2. Validated - Checked for correctness (proper structure, valid signatures, unspent inputs)
  3. Executed - Script evaluation to verify spending conditions

This project focuses on step 1: taking raw hex-encoded transactions and extracting every piece of information from them.

The UTXO Model

Before diving into transactions, you must understand Bitcoin’s data model. Unlike Ethereum’s account model (where you have a “balance”), Bitcoin uses Unspent Transaction Outputs (UTXOs).

Think of UTXOs as individual bills in your wallet:

  • You don’t have “$150” - you have a $100 bill, a $20 bill, and three $10 bills
  • To pay someone $75, you hand over the $100 bill and receive $25 in change
  • The $100 bill is “spent” (destroyed), and two new “bills” are created: $75 for them, $25 for you

In Bitcoin:

  • Inputs reference previous transaction outputs being spent
  • Outputs create new spendable amounts locked to new addresses
  • The sum of inputs must equal or exceed the sum of outputs (the difference is the miner’s fee)
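The fee rule in the last bullet can be sketched in a few lines. This is only an illustration: the function name `compute_fee` is hypothetical, and the input values are assumed to have been looked up already, because a raw transaction does not carry the amounts of the outputs it spends.

```rust
/// Fee in satoshis: total input value minus total output value.
/// Returns `None` if a sum overflows or the outputs exceed the inputs.
/// (`compute_fee` is a hypothetical helper name, not part of any library.)
pub fn compute_fee(input_values: &[u64], output_values: &[u64]) -> Option<u64> {
    let total_in = input_values
        .iter()
        .try_fold(0u64, |acc, value| acc.checked_add(*value))?;
    let total_out = output_values
        .iter()
        .try_fold(0u64, |acc, value| acc.checked_add(*value))?;
    // checked_sub rejects transactions that would create money
    total_in.checked_sub(total_out)
}
```

Spending one 10,000 sat output to create outputs of 7,500 and 2,400 sat leaves a 100 sat fee, so `compute_fee(&[10_000], &[7_500, 2_400])` yields `Some(100)`.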

Anatomy of a Bitcoin Transaction

A Bitcoin transaction consists of:

+----------------+
| Version (4)    |  - Transaction format version (1 or 2)
+----------------+
| Flag (2)*      |  - SegWit marker and flag (optional)
+----------------+
| Input Count    |  - Number of inputs (CompactSize)
+----------------+
| Inputs [...]   |  - Variable-length list of inputs
+----------------+
| Output Count   |  - Number of outputs (CompactSize)
+----------------+
| Outputs [...]  |  - Variable-length list of outputs
+----------------+
| Witness*       |  - Witness data for each input (if SegWit)
+----------------+
| Locktime (4)   |  - Block/time before which tx can't be mined
+----------------+

* Only present in SegWit transactions
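The locktime field holds either a block height or a Unix timestamp, and the two are told apart by a threshold of 500,000,000. A minimal sketch of that interpretation follows; the `LockTime` enum and `interpret_locktime` function are hypothetical names chosen for this example.

```rust
/// Values below this threshold are block heights; values at or above it
/// are Unix timestamps.
pub const LOCKTIME_THRESHOLD: u32 = 500_000_000;

/// Hypothetical helper type for displaying a locktime.
#[derive(Debug, PartialEq)]
pub enum LockTime {
    /// Locktime 0: the transaction can be mined immediately.
    NoLock,
    /// Not valid until the chain reaches this block height.
    BlockHeight(u32),
    /// Not valid until this Unix time (seconds since 1970-01-01).
    UnixTime(u32),
}

pub fn interpret_locktime(raw: u32) -> LockTime {
    match raw {
        0 => LockTime::NoLock,
        height if height < LOCKTIME_THRESHOLD => LockTime::BlockHeight(height),
        timestamp => LockTime::UnixTime(timestamp),
    }
}
```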

CompactSize Integers (VarInt)

Bitcoin uses a variable-length integer encoding called CompactSize (or VarInt) throughout its protocol. This saves space by using fewer bytes for small numbers:

Value Range             Encoding
0 - 0xFC (252)          1 byte: value
0xFD - 0xFFFF           3 bytes: 0xFD + 2 bytes (little-endian)
0x10000 - 0xFFFFFFFF    5 bytes: 0xFE + 4 bytes (little-endian)
0x100000000+            9 bytes: 0xFF + 8 bytes (little-endian)

Examples:

0x05           -> 5 (1 byte)
0xFD 0x00 0x01 -> 256 (3 bytes: 0xFD + 0x0100 little-endian)
0xFE 0x00 0x00 0x01 0x00 -> 65536 (5 bytes: 0xFE + 0x00010000 little-endian)

This encoding matters because most transactions have only 1-3 inputs and 1-3 outputs. With a fixed 4-byte count, every transaction would spend “01 00 00 00” on a number that fits in a single byte.
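As a rough sketch of the encoding rules above, the following pair of functions decodes and encodes CompactSize values using only the standard library. The `Option` return type and the name `write_compact_size` are choices made for this example; a real parser would probably return a richer error type.

```rust
/// Decode a CompactSize integer from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends before the encoding is complete.
pub fn read_compact_size(bytes: &[u8]) -> Option<(u64, usize)> {
    let first = *bytes.first()?;
    match first {
        0x00..=0xFC => Some((first as u64, 1)),
        0xFD => {
            let b = bytes.get(1..3)?;
            Some((u16::from_le_bytes([b[0], b[1]]) as u64, 3))
        }
        0xFE => {
            let b = bytes.get(1..5)?;
            Some((u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as u64, 5))
        }
        0xFF => {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(bytes.get(1..9)?);
            Some((u64::from_le_bytes(raw), 9))
        }
    }
}

/// Encode `value` using the shortest CompactSize form.
pub fn write_compact_size(value: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    if value <= 0xFC {
        out.push(value as u8);
    } else if value <= 0xFFFF {
        out.push(0xFD);
        out.extend_from_slice(&(value as u16).to_le_bytes());
    } else if value <= 0xFFFF_FFFF {
        out.push(0xFE);
        out.extend_from_slice(&(value as u32).to_le_bytes());
    } else {
        out.push(0xFF);
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}
```

The decoder accepts non-minimal encodings such as `FD 05 00`. Whether to reject those is a design decision for your parser; this sketch stays permissive.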

Transaction Inputs

Each input in a transaction has the following structure:

+--------------------+
| Previous TXID (32) |  - Hash of transaction containing the output being spent
+--------------------+
| Previous Vout (4)  |  - Index of the output within that transaction
+--------------------+
| Script Length      |  - CompactSize: length of scriptSig
+--------------------+
| ScriptSig (var)    |  - Unlocking script (signature and pubkey)
+--------------------+
| Sequence (4)       |  - Originally for transaction replacement
+--------------------+

Understanding Each Field:

  1. Previous TXID (32 bytes): The SHA256d hash of a previous transaction. Note: This is stored in internal byte order (reversed from the display format you see in block explorers).

  2. Previous Vout (4 bytes, little-endian): Which output of that transaction are we spending? First output is 0.

  3. ScriptSig (variable): The “unlocking” script that proves we have the right to spend this output. For P2PKH, this typically contains a signature and public key.

  4. Sequence (4 bytes, little-endian): Originally designed for transaction replacement but now used for:

    • Signaling RBF (Replace-By-Fee) if < 0xFFFFFFFE
    • Enabling relative timelocks (BIP 68)
    • 0xFFFFFFFF on every input disables locktime checking for the transaction
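The three uses of the sequence field can be decoded with a few bit tests. The sketch below follows the BIP 68 layout (bit 31 disables the relative lock, bit 22 selects time-based locks in units of 512 seconds, the low 16 bits hold the value) and the BIP 125 opt-in threshold. The type and function names are hypothetical.

```rust
const SEQUENCE_FINAL: u32 = 0xFFFF_FFFF;
const DISABLE_FLAG: u32 = 1 << 31; // BIP 68: relative lock disabled
const TYPE_FLAG: u32 = 1 << 22; // BIP 68: set = time-based, clear = block-based
const VALUE_MASK: u32 = 0x0000_FFFF; // BIP 68: low 16 bits hold the lock value

/// Hypothetical display type for a BIP 68 relative timelock.
#[derive(Debug, PartialEq)]
pub enum RelativeLock {
    Disabled,
    Blocks(u16),
    Seconds(u32),
}

/// BIP 125: an input opts in to replacement if its sequence is below 0xFFFFFFFE.
pub fn signals_rbf(sequence: u32) -> bool {
    sequence < 0xFFFF_FFFE
}

/// True if this input does not force locktime to be enforced.
pub fn is_final(sequence: u32) -> bool {
    sequence == SEQUENCE_FINAL
}

/// BIP 68 only applies to transactions with version 2 or higher.
pub fn relative_lock(tx_version: u32, sequence: u32) -> RelativeLock {
    if tx_version < 2 || (sequence & DISABLE_FLAG) != 0 {
        return RelativeLock::Disabled;
    }
    let value = sequence & VALUE_MASK;
    if (sequence & TYPE_FLAG) != 0 {
        RelativeLock::Seconds(value * 512)
    } else {
        RelativeLock::Blocks(value as u16)
    }
}
```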

Transaction Outputs

Each output has this structure:

+--------------------+
| Value (8)          |  - Amount in satoshis (little-endian)
+--------------------+
| Script Length      |  - CompactSize: length of scriptPubKey
+--------------------+
| ScriptPubKey (var) |  - Locking script (spending conditions)
+--------------------+

Understanding Each Field:

  1. Value (8 bytes, little-endian): Amount in satoshis (1 BTC = 100,000,000 satoshis). Max value: 21,000,000 BTC = 2.1 x 10^15 satoshis, which fits in 51 bits.

  2. ScriptPubKey (variable): The “locking” script that defines conditions for spending this output. Different address types have different standard scripts:

    • P2PKH (Pay-to-Public-Key-Hash): OP_DUP OP_HASH160 <20-byte-hash> OP_EQUALVERIFY OP_CHECKSIG
    • P2SH (Pay-to-Script-Hash): OP_HASH160 <20-byte-hash> OP_EQUAL
    • P2WPKH (Native SegWit): OP_0 <20-byte-hash>
    • P2WSH (SegWit Script Hash): OP_0 <32-byte-hash>
    • P2TR (Taproot): OP_1 <32-byte-key>
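Because each standard template has a fixed length and fixed opcode positions, a scriptPubKey can be classified by matching byte patterns. The following is a simplified sketch: `ScriptKind` is a hypothetical stand-in for a fuller script-type enum, and it does not attempt multisig or bare public key outputs.

```rust
/// Simplified, hypothetical classification of a scriptPubKey.
#[derive(Debug, PartialEq)]
pub enum ScriptKind {
    P2pkh,
    P2sh,
    P2wpkh,
    P2wsh,
    P2tr,
    OpReturn,
    Unknown,
}

pub fn classify_script_pubkey(script: &[u8]) -> ScriptKind {
    match script {
        // OP_DUP OP_HASH160 <push 20> ... OP_EQUALVERIFY OP_CHECKSIG
        [0x76, 0xa9, 0x14, .., 0x88, 0xac] if script.len() == 25 => ScriptKind::P2pkh,
        // OP_HASH160 <push 20> ... OP_EQUAL
        [0xa9, 0x14, .., 0x87] if script.len() == 23 => ScriptKind::P2sh,
        // OP_0 <push 20>
        [0x00, 0x14, ..] if script.len() == 22 => ScriptKind::P2wpkh,
        // OP_0 <push 32>
        [0x00, 0x20, ..] if script.len() == 34 => ScriptKind::P2wsh,
        // OP_1 <push 32>
        [0x51, 0x20, ..] if script.len() == 34 => ScriptKind::P2tr,
        // OP_RETURN followed by anything
        [0x6a, ..] => ScriptKind::OpReturn,
        _ => ScriptKind::Unknown,
    }
}
```

Checking the exact length matters: a 22-byte script starting with `00 14` is P2WPKH, but a longer script with the same prefix is not.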

Bitcoin Script: A Stack-Based Language

Bitcoin Script is a simple, stack-based, Turing-incomplete language. Understanding it is essential for parsing transactions correctly.

Execution Model:

  1. Start with empty stack
  2. Execute scriptSig (unlocking script) - pushes data onto stack
  3. Execute scriptPubKey (locking script) - manipulates stack
  4. Transaction is valid if stack is non-empty and top element is true

Common Opcodes:

Opcode            Hex        Description
OP_0              0x00       Push empty byte array (false)
OP_PUSHDATA1      0x4c       Next byte is length, then push that many bytes
OP_PUSHDATA2      0x4d       Next 2 bytes are length, then push that many bytes
OP_1 to OP_16     0x51-0x60  Push 1-16 onto stack
OP_DUP            0x76       Duplicate top stack item
OP_HASH160        0xa9       SHA256 then RIPEMD160 of top item
OP_EQUAL          0x87       Pop two items, push 1 if equal, else 0
OP_EQUALVERIFY    0x88       OP_EQUAL then OP_VERIFY
OP_CHECKSIG       0xac       Verify signature against pubkey
OP_CHECKMULTISIG  0xae       M-of-N signature verification
OP_RETURN         0x6a       Marks output as provably unspendable (data embedding)

Data Push Opcodes (0x01 - 0x4b): When the opcode is between 0x01 and 0x4b (1-75), it means “push the next N bytes onto the stack.”

Example: P2PKH ScriptPubKey

76 a9 14 89abcdef... 88 ac
^  ^  ^  ^          ^  ^
|  |  |  |          |  OP_CHECKSIG
|  |  |  |          OP_EQUALVERIFY
|  |  |  20 bytes (pubkey hash)
|  |  Push next 20 bytes
|  OP_HASH160
OP_DUP

SegWit: Segregated Witness

Segregated Witness (BIP 141/143/144) was one of Bitcoin’s most significant protocol upgrades. It separates (“segregates”) the signature (“witness”) data from the main transaction structure.

Why SegWit?

  1. Transaction Malleability Fix: Before SegWit, anyone could modify the scriptSig (specifically, the encoding of signatures) without invalidating the signature itself. This changed the TXID (which hashes the entire transaction including scriptSig), breaking chains of unconfirmed transactions.

  2. Block Space Efficiency: Each witness byte counts as 1 weight unit while every other byte counts as 4, so witness data is effectively discounted to 1/4 and more transactions fit in a block.

  3. Enables Lightning Network: Fixed malleability allows off-chain payment channels to work safely.
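The witness discount can be made concrete with the BIP 141 weight formula: weight = stripped size x 3 + total size, and virtual size is weight divided by 4, rounded up. A small sketch, with hypothetical function names and illustrative byte counts:

```rust
/// BIP 141 transaction weight.
/// `stripped_size` is the serialized size without marker, flag, and witness;
/// `total_size` is the full serialized size.
pub fn tx_weight(stripped_size: usize, total_size: usize) -> usize {
    stripped_size * 3 + total_size
}

/// Virtual size in vbytes: weight / 4, rounded up.
pub fn tx_vsize(weight: usize) -> usize {
    (weight + 3) / 4
}
```

For a legacy transaction both sizes are equal, so a 200-byte transaction weighs 800 units (200 vbytes). If 100 of those 200 bytes were witness data instead, the weight would be 100 x 3 + 200 = 500 units (125 vbytes).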

SegWit Transaction Format:

+----------------+
| Version (4)    |
+----------------+
| Marker (0x00)  |  <- SegWit marker
+----------------+
| Flag (0x01)    |  <- SegWit flag
+----------------+
| Input Count    |
+----------------+
| Inputs [...]   |  <- scriptSig is empty for native SegWit
+----------------+
| Output Count   |
+----------------+
| Outputs [...]  |
+----------------+
| Witness [...]  |  <- Witness data for each input
+----------------+
| Locktime (4)   |
+----------------+

Witness Data Structure:

For each input, witness data is:

+--------------------+
| Stack Item Count   |  - CompactSize: number of items
+--------------------+
| Item 1 Length      |  - CompactSize
+--------------------+
| Item 1 Data        |
+--------------------+
| Item 2 Length      |
+--------------------+
| Item 2 Data        |
+--------------------+
| ...                |
+--------------------+

For P2WPKH, witness typically contains:

  1. Signature (71-73 bytes, DER encoded + sighash type)
  2. Public key (33 bytes compressed)
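Reading one input's witness stack follows directly from the layout above: a CompactSize item count, then a CompactSize length and the data for each item. The sketch below uses a shared offset `pos` so it can be called once per input. Both function names are hypothetical, and errors are collapsed into `None` for brevity.

```rust
/// Read a CompactSize integer at `*pos`, advancing `*pos` past it.
fn read_compact_size(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let first = *bytes.get(*pos)?;
    let width = match first {
        0xFD => 2,
        0xFE => 4,
        0xFF => 8,
        small => {
            *pos += 1;
            return Some(small as u64);
        }
    };
    let raw = bytes.get(*pos + 1..*pos + 1 + width)?;
    // Zero-extend the little-endian payload to 8 bytes
    let mut buf = [0u8; 8];
    buf[..width].copy_from_slice(raw);
    *pos += 1 + width;
    Some(u64::from_le_bytes(buf))
}

/// Read the witness stack for a single input, advancing `*pos`.
/// An item count of zero yields an empty stack.
pub fn read_witness_stack(bytes: &[u8], pos: &mut usize) -> Option<Vec<Vec<u8>>> {
    let count = read_compact_size(bytes, pos)?;
    let mut items = Vec::new();
    for _ in 0..count {
        let len = read_compact_size(bytes, pos)? as usize;
        let end = (*pos).checked_add(len)?;
        // `get` returns None instead of panicking on truncated input
        items.push(bytes.get(*pos..end)?.to_vec());
        *pos = end;
    }
    Some(items)
}
```

Calling `read_witness_stack` in a loop, once per input and in input order, consumes the whole witness section and leaves `pos` at the locktime.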

TXID vs WTXID

TXID (Transaction ID):

  • SHA256d of transaction without witness data
  • For legacy transactions: hash of entire transaction
  • For SegWit transactions: hash of version + inputs + outputs + locktime (no marker, flag, or witness)

WTXID (Witness Transaction ID):

  • SHA256d of entire transaction including witness
  • Used in witness commitment (stored in coinbase transaction)

Why Two IDs?

The TXID remains stable regardless of witness data, solving malleability. When you reference a previous output to spend, you use the TXID. The WTXID is used to commit to the complete transaction data in blocks.

Computing TXID for SegWit Transaction:

TXID = SHA256(SHA256(
    version (4 bytes) ||
    input_count (varint) ||
    inputs (including scriptSig; empty for native SegWit, non-empty for P2SH-wrapped SegWit) ||
    output_count (varint) ||
    outputs ||
    locktime (4 bytes)
))

Note: This is exactly the “stripped” transaction - what you’d get if you removed marker, flag, and witness.
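One way to obtain the stripped serialization is to walk the inputs and outputs of the raw bytes and then copy everything except the marker, flag, and witness. The sketch below does that and leaves the hashing to the caller (SHA256d over the returned bytes gives the TXID). The helper names are hypothetical, and it assumes the locktime occupies the final 4 bytes without validating the witness section in between.

```rust
/// Read a CompactSize integer at `*pos`, advancing `*pos` past it.
fn read_compact_size(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let first = *bytes.get(*pos)?;
    let width = match first {
        0xFD => 2,
        0xFE => 4,
        0xFF => 8,
        small => {
            *pos += 1;
            return Some(small as u64);
        }
    };
    let raw = bytes.get(*pos + 1..*pos + 1 + width)?;
    let mut buf = [0u8; 8];
    buf[..width].copy_from_slice(raw);
    *pos += 1 + width;
    Some(u64::from_le_bytes(buf))
}

/// Advance `*pos` by `n` bytes, failing if that runs past the end.
fn skip(bytes: &[u8], pos: &mut usize, n: usize) -> Option<()> {
    let end = (*pos).checked_add(n)?;
    if end > bytes.len() {
        return None;
    }
    *pos = end;
    Some(())
}

/// Return the serialization that is hashed for the TXID.
/// Legacy transactions are returned unchanged.
pub fn strip_witness(raw: &[u8]) -> Option<Vec<u8>> {
    let is_segwit = raw.len() >= 6 && raw[4] == 0x00 && raw[5] == 0x01;
    if !is_segwit {
        return Some(raw.to_vec());
    }
    let mut pos = 6; // version (4) + marker (1) + flag (1)
    let input_count = read_compact_size(raw, &mut pos)?;
    for _ in 0..input_count {
        skip(raw, &mut pos, 36)?; // previous TXID + vout
        let script_len = read_compact_size(raw, &mut pos)? as usize;
        skip(raw, &mut pos, script_len)?; // scriptSig
        skip(raw, &mut pos, 4)?; // sequence
    }
    let output_count = read_compact_size(raw, &mut pos)?;
    for _ in 0..output_count {
        skip(raw, &mut pos, 8)?; // value
        let script_len = read_compact_size(raw, &mut pos)? as usize;
        skip(raw, &mut pos, script_len)?; // scriptPubKey
    }
    // Witness data spans raw[pos..locktime_start]; it is dropped.
    let locktime_start = raw.len().checked_sub(4)?;
    if locktime_start < pos {
        return None;
    }
    let mut out = Vec::with_capacity(raw.len() - 2);
    out.extend_from_slice(&raw[..4]); // version
    out.extend_from_slice(&raw[6..pos]); // inputs and outputs with their counts
    out.extend_from_slice(&raw[locktime_start..]); // locktime
    Some(out)
}
```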

DER Signature Encoding

Bitcoin signatures use DER (Distinguished Encoding Rules) format, a subset of ASN.1:

30 <total_length>
   02 <r_length> <r_value>
   02 <s_length> <s_value>
<sighash_type>

Structure:

  • 0x30: SEQUENCE tag
  • Total length: Length of everything after this byte (excluding sighash)
  • 0x02: INTEGER tag for r
  • r_length: Length of r (usually 32-33 bytes)
  • r_value: The r value (may have leading 0x00 if high bit is set)
  • 0x02: INTEGER tag for s
  • s_length: Length of s (usually 32 bytes after low-S normalization)
  • s_value: The s value
  • sighash_type: 0x01 = SIGHASH_ALL (most common)

Example:

30 44 02 20 7a8f... 02 20 3b4c... 01
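The structure above translates into a short parser that reads every length from the data rather than assuming a fixed size. This is a structural sketch only: the names are hypothetical, it assumes single-byte DER lengths (which holds for secp256k1 signatures), and it does not enforce the stricter canonical-encoding rules a node applies.

```rust
/// Hypothetical container for the parsed pieces of a signature.
#[derive(Debug, PartialEq)]
pub struct ParsedSignature {
    pub r: Vec<u8>,
    pub s: Vec<u8>,
    pub sighash_byte: u8,
}

/// Read one DER INTEGER (tag 0x02, one length byte, value) at `*pos`.
fn read_der_integer(bytes: &[u8], pos: &mut usize) -> Option<Vec<u8>> {
    if *bytes.get(*pos)? != 0x02 {
        return None;
    }
    let len = *bytes.get(*pos + 1)? as usize;
    if len == 0 {
        return None;
    }
    let start = *pos + 2;
    let value = bytes.get(start..start + len)?;
    *pos = start + len;
    Some(value.to_vec())
}

/// Parse `<DER signature> <sighash byte>` as found in scriptSig or witness.
pub fn parse_der_signature(sig: &[u8]) -> Option<ParsedSignature> {
    // The sighash type is appended after the DER structure
    let (&sighash_byte, der) = sig.split_last()?;
    if *der.first()? != 0x30 {
        return None; // not a SEQUENCE
    }
    let total_len = *der.get(1)? as usize;
    if total_len != der.len() - 2 {
        return None; // declared length must cover exactly the rest
    }
    let mut pos = 2;
    let r = read_der_integer(der, &mut pos)?;
    let s = read_der_integer(der, &mut pos)?;
    if pos != der.len() {
        return None; // trailing bytes inside the SEQUENCE
    }
    Some(ParsedSignature { r, s, sighash_byte })
}
```

The returned `r` and `s` keep any leading 0x00 padding byte; strip it if you need the bare 32-byte big-endian integers.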

Sighash Types

The sighash type (last byte of signature) determines which parts of the transaction the signature commits to:

Type                  Value  Description
SIGHASH_ALL           0x01   Signs all inputs and outputs (default)
SIGHASH_NONE          0x02   Signs inputs, but not outputs
SIGHASH_SINGLE        0x03   Signs inputs and only the output at same index
SIGHASH_ANYONECANPAY  0x80   Can be combined with above; only signs own input
Complete Project Specification

Functional Requirements

  1. Raw Transaction Parsing
    • Accept hex-encoded raw transactions
    • Detect and parse both legacy and SegWit formats
    • Extract all fields: version, inputs, outputs, locktime
    • Parse witness data for SegWit transactions
  2. Script Analysis
    • Decode scriptSig and scriptPubKey
    • Identify script type (P2PKH, P2SH, P2WPKH, P2WSH, P2TR, OP_RETURN)
    • Disassemble opcodes to human-readable format
    • Extract embedded data (addresses, public keys, hashes)
  3. Signature Parsing
    • Parse DER-encoded signatures
    • Extract r and s values
    • Identify sighash type
    • Validate signature structure
  4. Transaction ID Computation
    • Compute TXID (hash of the serialization without marker, flag, and witness)
    • Compute WTXID (includes witness)
    • Display in correct byte order (display vs internal)
  5. Human-Readable Output
    • Pretty-print transaction structure
    • Show amounts in BTC and satoshis
    • Convert addresses to standard formats
    • Calculate and display transaction fee (if inputs known)

Non-Functional Requirements

  • Correctness: Parse all valid Bitcoin transactions correctly
  • Robustness: Handle malformed transactions gracefully with clear errors
  • Performance: Parse at least 1,000 transactions per second
  • Compatibility: Match output of bitcoin-cli decoderawtransaction

Command-Line Interface

# Parse a transaction from hex
$ btc-tx-parser 0100000001abcd...
Transaction ID: 7a3f...
Version: 1
Inputs: 1
  [0] Previous TX: abcd1234...
      Previous Vout: 0
      ScriptSig: 47304402...
      Sequence: 0xffffffff
Outputs: 2
  [0] Value: 0.5 BTC (50000000 sat)
      ScriptPubKey: OP_DUP OP_HASH160 89ab... OP_EQUALVERIFY OP_CHECKSIG
      Type: P2PKH
      Address: 1A1zP1...
  [1] Value: 0.00009 BTC (9000 sat)
      ScriptPubKey: OP_RETURN 48656c6c6f
      Type: OP_RETURN
      Data: "Hello"
Locktime: 0

# Parse from file
$ btc-tx-parser --file transaction.hex

# Output as JSON
$ btc-tx-parser --json 0100000001abcd...

# Verbose mode (show raw bytes alongside parsed)
$ btc-tx-parser --verbose 0100000001abcd...

# Compute TXID/WTXID only
$ btc-tx-parser --txid 0100000001abcd...
TXID: 7a3f5c...
WTXID: 8b4e6d...

# Disassemble script only
$ btc-tx-parser --script "76a91489abcdef...88ac"
OP_DUP OP_HASH160 89abcdef... OP_EQUALVERIFY OP_CHECKSIG

Solution Architecture

Module Structure

src/
+-- main.rs              # CLI entry point
+-- lib.rs               # Public API
+-- parser/
|   +-- mod.rs           # Transaction parser coordinator
|   +-- varint.rs        # CompactSize integer parsing
|   +-- input.rs         # Transaction input parsing
|   +-- output.rs        # Transaction output parsing
|   +-- witness.rs       # Witness data parsing
+-- script/
|   +-- mod.rs           # Script disassembler
|   +-- opcodes.rs       # Opcode definitions
|   +-- types.rs         # Script type detection
|   +-- address.rs       # Address extraction/encoding
+-- signature/
|   +-- mod.rs           # Signature handling
|   +-- der.rs           # DER encoding parser
|   +-- sighash.rs       # Sighash type definitions
+-- hash/
|   +-- mod.rs           # Hashing utilities
|   +-- txid.rs          # TXID/WTXID computation
+-- display/
|   +-- mod.rs           # Output formatting
|   +-- json.rs          # JSON serialization
|   +-- pretty.rs        # Human-readable output
+-- tests/
    +-- vectors.rs       # Known transaction test vectors
    +-- segwit_tests.rs  # SegWit-specific tests
    +-- script_tests.rs  # Script parsing tests

Core Data Structures

/// A parsed Bitcoin transaction
pub struct Transaction {
    pub version: u32,
    pub inputs: Vec<TxInput>,
    pub outputs: Vec<TxOutput>,
    pub witness: Option<Vec<Witness>>,
    pub locktime: u32,
}

/// A transaction input
pub struct TxInput {
    pub prev_txid: [u8; 32],
    pub prev_vout: u32,
    pub script_sig: Script,
    pub sequence: u32,
}

/// A transaction output
pub struct TxOutput {
    pub value: u64,  // in satoshis
    pub script_pubkey: Script,
}

/// Witness data for one input
pub struct Witness {
    pub items: Vec<Vec<u8>>,
}

/// A Bitcoin script (raw bytes + disassembly)
pub struct Script {
    pub raw: Vec<u8>,
    pub ops: Vec<ScriptOp>,
    pub script_type: ScriptType,
}

/// A single script operation
pub enum ScriptOp {
    /// Push data onto stack
    Push(Vec<u8>),
    /// Named opcode
    Op(Opcode),
}

/// Standard script types
pub enum ScriptType {
    P2PKH,
    P2SH,
    P2WPKH,
    P2WSH,
    P2TR,
    OpReturn(Vec<u8>),
    Multisig { m: u8, n: u8 },
    NonStandard,
}

/// A parsed DER signature
pub struct Signature {
    pub r: Vec<u8>,
    pub s: Vec<u8>,
    pub sighash: SighashType,
}

/// Sighash types
pub enum SighashType {
    All,
    None,
    Single,
    AllAnyoneCanPay,
    NoneAnyoneCanPay,
    SingleAnyoneCanPay,
}

Key Algorithms

CompactSize Parsing

function read_compact_size(bytes, offset) -> (value, new_offset):
    first_byte = bytes[offset]

    if first_byte < 0xFD:
        return (first_byte, offset + 1)

    else if first_byte == 0xFD:
        value = read_u16_le(bytes, offset + 1)
        return (value, offset + 3)

    else if first_byte == 0xFE:
        value = read_u32_le(bytes, offset + 1)
        return (value, offset + 5)

    else:  // 0xFF
        value = read_u64_le(bytes, offset + 1)
        return (value, offset + 9)

SegWit Detection

function is_segwit(bytes) -> bool:
    // SegWit transactions have marker=0x00 and flag=0x01 after version
    if len(bytes) < 6:
        return false

    // Read version (4 bytes)
    // Check bytes 4 and 5
    marker = bytes[4]
    flag = bytes[5]

    return marker == 0x00 and flag == 0x01

TXID Computation

function compute_txid(tx: Transaction) -> [u8; 32]:
    // Serialize without witness data
    serialized = serialize_no_witness(tx)
    // Double SHA256
    hash1 = sha256(serialized)
    hash2 = sha256(hash1)
    // Returned in internal byte order; reverse the bytes for display
    return hash2

Script Disassembly

function disassemble_script(script: bytes) -> Vec<ScriptOp>:
    ops = []
    i = 0

    while i < len(script):
        opcode = script[i]

        if opcode == 0x00:
            ops.push(Op(OP_0))
            i += 1

        else if opcode >= 0x01 and opcode <= 0x4b:
            // Direct push
            length = opcode
            data = script[i+1 : i+1+length]
            ops.push(Push(data))
            i += 1 + length

        else if opcode == 0x4c:  // OP_PUSHDATA1
            length = script[i+1]
            data = script[i+2 : i+2+length]
            ops.push(Push(data))
            i += 2 + length

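        else if opcode == 0x4d:  // OP_PUSHDATA2
            length = read_u16_le(script, i+1)
            data = script[i+3 : i+3+length]
            ops.push(Push(data))
            i += 3 + length

        else if opcode == 0x4e:  // OP_PUSHDATA4
            length = read_u32_le(script, i+1)
            data = script[i+5 : i+5+length]
            ops.push(Push(data))
            i += 5 + length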

        else:
            ops.push(Op(OPCODE_MAP[opcode]))
            i += 1

    return ops
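A Rust rendering of the disassembly loop might look like the sketch below. It is an assumption-laden illustration rather than a reference implementation: it returns a plain string instead of `Vec<ScriptOp>`, names only a handful of opcodes (everything else is printed with a made-up `OP_UNKNOWN_0x..` label), and reports truncated pushes as errors instead of panicking.

```rust
/// Name a non-push opcode. Only a few opcodes are listed; the
/// `OP_UNKNOWN_0x..` label is a placeholder invented for this sketch.
fn opcode_name(op: u8) -> String {
    match op {
        0x00 => "OP_0".to_string(),
        0x51..=0x60 => format!("OP_{}", op - 0x50),
        0x6a => "OP_RETURN".to_string(),
        0x76 => "OP_DUP".to_string(),
        0x87 => "OP_EQUAL".to_string(),
        0x88 => "OP_EQUALVERIFY".to_string(),
        0xa9 => "OP_HASH160".to_string(),
        0xac => "OP_CHECKSIG".to_string(),
        0xae => "OP_CHECKMULTISIG".to_string(),
        other => format!("OP_UNKNOWN_0x{:02x}", other),
    }
}

fn to_hex(data: &[u8]) -> String {
    data.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Disassemble a script into space-separated opcodes and hex pushes.
pub fn disassemble(script: &[u8]) -> Result<String, String> {
    let mut parts: Vec<String> = Vec::new();
    let mut i = 0;
    while i < script.len() {
        let op = script[i];
        i += 1;
        // How many bytes after the opcode encode the push length
        let len_bytes = match op {
            0x01..=0x4b => 0, // the opcode itself is the length
            0x4c => 1,        // OP_PUSHDATA1
            0x4d => 2,        // OP_PUSHDATA2
            0x4e => 4,        // OP_PUSHDATA4
            _ => {
                parts.push(opcode_name(op));
                continue;
            }
        };
        let len = if len_bytes == 0 {
            op as usize
        } else {
            let raw = script.get(i..i + len_bytes).ok_or("truncated push length")?;
            i += len_bytes;
            // Little-endian: fold from the most significant byte down
            raw.iter().rev().fold(0usize, |acc, b| (acc << 8) | (*b as usize))
        };
        let end = i.checked_add(len).ok_or("push length overflow")?;
        let data = script.get(i..end).ok_or("truncated push data")?;
        parts.push(to_hex(data));
        i = end;
    }
    Ok(parts.join(" "))
}
```

Using `slice::get` for every read means a malformed script produces an `Err` rather than an out-of-bounds panic, which matters once the parser is fed untrusted input.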

Phased Implementation Guide

Phase 1: Hex Parsing and Basic Structure

Goal: Read hex-encoded transactions and extract the version and locktime.

Tasks:

  1. Create hex-to-bytes conversion function
  2. Parse version (first 4 bytes, little-endian)
  3. Parse locktime (last 4 bytes, little-endian)
  4. Detect SegWit marker/flag

Validation:

let tx = parse_transaction("01000000...00000000");
assert_eq!(tx.version, 1);
assert_eq!(tx.locktime, 0);

Hints if stuck:

  • Version is little-endian: bytes 0x01 0x00 0x00 0x00 = version 1
  • Locktime is always the last 4 bytes
  • For SegWit detection, check if bytes[4..6] == [0x00, 0x01]
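A possible starting point for this phase, using only the standard library, is sketched below. The function names are hypothetical, and the 10-byte minimum is simply the size of version + input count + output count + locktime.

```rust
/// Convert a hex string to bytes. Accepts upper and lower case;
/// returns `None` on odd length or any non-hex character.
pub fn hex_to_bytes(hex: &str) -> Option<Vec<u8>> {
    let digits = hex.trim().as_bytes();
    if digits.len() % 2 != 0 {
        return None;
    }
    digits
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

/// Read a little-endian u32 at `offset`.
pub fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let b = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Extract (version, locktime) from the first and last 4 bytes.
pub fn version_and_locktime(raw: &[u8]) -> Option<(u32, u32)> {
    if raw.len() < 10 {
        return None; // too short to hold even an empty transaction
    }
    let version = read_u32_le(raw, 0)?;
    let locktime = read_u32_le(raw, raw.len() - 4)?;
    Some((version, locktime))
}
```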

Phase 2: CompactSize Integers

Goal: Implement variable-length integer parsing.

Tasks:

  1. Implement read_compact_size function
  2. Handle all four encoding cases (1, 3, 5, 9 bytes)
  3. Return both the value and bytes consumed
  4. Add comprehensive tests

Validation:

assert_eq!(read_compact_size(&[0x05]), (5, 1));
assert_eq!(read_compact_size(&[0xFD, 0x00, 0x01]), (256, 3));
assert_eq!(read_compact_size(&[0xFE, 0x00, 0x00, 0x01, 0x00]), (65536, 5));

Hints if stuck:

  • First check if first byte < 0xFD for single-byte case
  • Remember: values are little-endian
  • Return (value, number_of_bytes_consumed) for cursor tracking

Phase 3: Input Parsing

Goal: Parse transaction inputs.

Tasks:

  1. Read input count using CompactSize
  2. For each input:
    • Read previous TXID (32 bytes, reversed for display)
    • Read previous vout (4 bytes, little-endian)
    • Read scriptSig length and data
    • Read sequence (4 bytes, little-endian)
  3. Store in TxInput struct

Validation:

// Parse a known transaction and verify input fields
let tx = parse_transaction("01000000017b...");
assert_eq!(tx.inputs.len(), 1);
assert_eq!(tx.inputs[0].prev_vout, 0);

Hints if stuck:

  • TXID bytes are stored in “internal” order but displayed reversed
  • ScriptSig can be empty (especially for SegWit inputs)
  • Use a cursor/offset pattern to track position in byte stream
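The cursor pattern from the last hint could be sketched as a small struct that owns the current offset, with one method per primitive read. `ByteCursor`, `RawInput`, and the function names below are hypothetical; the script is kept as raw bytes here rather than a parsed `Script`.

```rust
/// Tracks the read position in a byte slice (hypothetical helper).
pub struct ByteCursor<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> ByteCursor<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        ByteCursor { bytes, pos: 0 }
    }

    /// Take the next `n` bytes, or `None` if fewer remain.
    pub fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.bytes.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }

    pub fn u32_le(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    pub fn compact_size(&mut self) -> Option<u64> {
        let first = self.take(1)?[0];
        let width = match first {
            0xFD => 2,
            0xFE => 4,
            0xFF => 8,
            small => return Some(small as u64),
        };
        let mut buf = [0u8; 8];
        buf[..width].copy_from_slice(self.take(width)?);
        Some(u64::from_le_bytes(buf))
    }
}

/// One input with its scriptSig left as raw bytes.
#[derive(Debug, PartialEq)]
pub struct RawInput {
    pub prev_txid: [u8; 32], // internal byte order
    pub prev_vout: u32,
    pub script_sig: Vec<u8>,
    pub sequence: u32,
}

pub fn read_input(cur: &mut ByteCursor) -> Option<RawInput> {
    let mut prev_txid = [0u8; 32];
    prev_txid.copy_from_slice(cur.take(32)?);
    let prev_vout = cur.u32_le()?;
    let script_len = cur.compact_size()? as usize;
    let script_sig = cur.take(script_len)?.to_vec();
    let sequence = cur.u32_le()?;
    Some(RawInput { prev_txid, prev_vout, script_sig, sequence })
}

/// Hex string in display order (bytes reversed from internal order).
pub fn txid_display_hex(internal: &[u8; 32]) -> String {
    internal.iter().rev().map(|b| format!("{:02x}", b)).collect()
}
```

Because every read goes through `take`, a truncated transaction surfaces as `None` at the exact field where the data ran out.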

Phase 4: Output Parsing

Goal: Parse transaction outputs.

Tasks:

  1. Read output count using CompactSize
  2. For each output:
    • Read value (8 bytes, little-endian)
    • Read scriptPubKey length and data
  3. Display value in both satoshis and BTC

Validation:

let tx = parse_transaction("...");
assert_eq!(tx.outputs.len(), 2);
assert_eq!(tx.outputs[0].value, 50_000_000); // 0.5 BTC

Hints if stuck:

  • Value is u64, not u32 (8 bytes)
  • 1 BTC = 100,000,000 satoshis
  • Max value is ~21 million BTC = 2.1 quadrillion satoshis
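For the display task, integer arithmetic is enough to show BTC amounts, and it sidesteps floating-point rounding surprises. The sketch below trims trailing zeros so that 50,000,000 sat prints as 0.5; both function names are hypothetical.

```rust
pub const SATS_PER_BTC: u64 = 100_000_000;

/// Format satoshis as a decimal BTC string without using floats.
pub fn format_btc(sats: u64) -> String {
    let whole = sats / SATS_PER_BTC;
    // Always 8 digits, then drop trailing zeros
    let frac = format!("{:08}", sats % SATS_PER_BTC);
    let trimmed = frac.trim_end_matches('0');
    if trimmed.is_empty() {
        format!("{}", whole)
    } else {
        format!("{}.{}", whole, trimmed)
    }
}

/// Combined display of both units.
pub fn format_amount(sats: u64) -> String {
    format!("{} BTC ({} sat)", format_btc(sats), sats)
}
```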

Phase 5: Script Disassembly

Goal: Decode Bitcoin Script to human-readable format.

Tasks:

  1. Define opcode constants (OP_DUP, OP_HASH160, etc.)
  2. Implement direct push opcodes (0x01-0x4b)
  3. Implement OP_PUSHDATA1 and OP_PUSHDATA2
  4. Handle remaining opcodes
  5. Format output as assembly-style string

Validation:

let script = hex::decode("76a91489abcdef...88ac")?;
let disasm = disassemble(&script);
assert_eq!(disasm, "OP_DUP OP_HASH160 <20-bytes> OP_EQUALVERIFY OP_CHECKSIG");

Hints if stuck:

  • Opcodes 0x01-0x4b are “push next N bytes”
  • Use a match/switch on opcode value
  • Format pushed data as hex for now, later convert to addresses

Phase 6: Script Type Detection

Goal: Identify standard script patterns.

Tasks:

  1. Detect P2PKH: OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
  2. Detect P2SH: OP_HASH160 <20> OP_EQUAL
  3. Detect P2WPKH: OP_0 <20>
  4. Detect P2WSH: OP_0 <32>
  5. Detect P2TR: OP_1 <32>
  6. Detect OP_RETURN: OP_RETURN <data>
  7. Extract pubkey hash or script hash

Validation:

let script = hex::decode("0014a91b...1234")?;
assert_eq!(detect_script_type(&script), ScriptType::P2WPKH);

Hints if stuck:

  • Check script length and byte patterns
  • P2WPKH is exactly 22 bytes: 0x00 0x14 + 20 bytes
  • P2WSH is exactly 34 bytes: 0x00 0x20 + 32 bytes
  • OP_RETURN starts with 0x6a

Phase 7: Witness Data Parsing

Goal: Parse SegWit witness data.

Tasks:

  1. Skip to witness section after outputs
  2. Read witness data for each input:
    • Read stack item count (CompactSize)
    • For each item: read length and data
  3. Link witness to corresponding input

Validation:

let tx = parse_segwit_transaction("02000000000101...");
assert!(tx.witness.is_some());
assert_eq!(tx.witness.unwrap()[0].items.len(), 2);

Hints if stuck:

  • Witness comes after all outputs, before locktime
  • Each input has its own witness (even if empty)
  • Non-witness inputs have an empty witness stack: a single 0x00 item count and no items

Phase 8: Signature Parsing (DER)

Goal: Extract signature components from DER encoding.

Tasks:

  1. Parse DER sequence tag (0x30)
  2. Parse r value (integer tag 0x02 + length + data)
  3. Parse s value (integer tag 0x02 + length + data)
  4. Extract sighash type (last byte)
  5. Handle leading zeros and high-bit padding

Validation:

let sig = hex::decode("304402207a8f...01")?;
let parsed = parse_signature(&sig)?;
assert_eq!(parsed.sighash, SighashType::All);
assert_eq!(parsed.r.len(), 32);

Hints if stuck:

  • Leading 0x00 byte on r/s if high bit is set (prevents negative interpretation)
  • s should be “low-S” normalized (at most n/2, where n is the curve order)
  • Sighash byte is appended after DER structure

Phase 9: TXID Computation

Goal: Compute transaction IDs correctly.

Tasks:

  1. Implement SHA256 (or use sha2 crate)
  2. Implement double-SHA256 (SHA256d)
  3. For legacy: hash entire transaction
  4. For SegWit: hash stripped transaction (no witness)
  5. Implement WTXID (hash with witness)
  6. Handle byte order for display

Validation:

// Compare against known TXID
let tx = parse_transaction("01000000017b1eabe...");
let mut txid = compute_txid(&tx);
txid.reverse(); // internal byte order -> display byte order
assert_eq!(hex::encode(txid), "a1b2c3d4...");

Hints if stuck:

  • TXID is displayed in reversed byte order (legacy display convention)
  • For SegWit stripped transaction: version + inputs + outputs + locktime (no marker/flag/witness)
  • Use sha2 crate for production; implement from scratch for learning

Phase 10: Address Encoding

Goal: Convert script hashes to standard address formats.

Tasks:

  1. Implement Base58Check for legacy addresses (P2PKH, P2SH)
  2. Implement Bech32 for SegWit addresses (P2WPKH, P2WSH)
  3. Implement Bech32m for Taproot addresses (P2TR)
  4. Handle mainnet vs testnet prefixes

Validation:

// P2PKH
assert_eq!(
    pubkey_hash_to_address(&hash, Network::Mainnet),
    "1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2"
);

// P2WPKH
assert_eq!(
    witness_pubkey_to_address(&hash, Network::Mainnet),
    "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"
);

Hints if stuck:

  • Base58Check = Base58(payload + checksum), where payload = version byte + hash and checksum = first 4 bytes of SHA256d(payload)
  • Bech32 uses 5-bit character encoding with specific checksum
  • Prefix: mainnet P2PKH=0x00, P2SH=0x05; testnet P2PKH=0x6f, P2SH=0xc4
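Bech32 needs no hash function, so it can be sketched with the standard library alone. The code below follows the BIP 173 checksum (constant 1) for witness version 0 and the BIP 350 Bech32m constant for versions 1 and up. The function names are this example's own, and the human-readable part is assumed to be lowercase ASCII such as "bc" or "tb".

```rust
const CHARSET: &[u8; 32] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";
const GENERATORS: [u32; 5] = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3];
const BECH32_CONST: u32 = 1;
const BECH32M_CONST: u32 = 0x2bc8_30a3;

/// BCH checksum over 5-bit values, as specified in BIP 173.
fn polymod(values: &[u8]) -> u32 {
    let mut chk: u32 = 1;
    for &v in values {
        let top = chk >> 25;
        chk = ((chk & 0x01ff_ffff) << 5) ^ (v as u32);
        for (i, g) in GENERATORS.iter().enumerate() {
            if (top >> i) & 1 == 1 {
                chk ^= *g;
            }
        }
    }
    chk
}

/// Regroup 8-bit bytes into 5-bit values, zero-padding the final group.
fn to_base32(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let (mut acc, mut bits) = (0u32, 0u32);
    for &byte in data {
        acc = (acc << 8) | (byte as u32);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(((acc >> bits) & 31) as u8);
        }
    }
    if bits > 0 {
        out.push(((acc << (5 - bits)) & 31) as u8);
    }
    out
}

/// Encode a witness program as a SegWit address.
pub fn encode_segwit_address(hrp: &str, witness_version: u8, program: &[u8]) -> Option<String> {
    if witness_version > 16 || program.len() < 2 || program.len() > 40 {
        return None;
    }
    if witness_version == 0 && program.len() != 20 && program.len() != 32 {
        return None;
    }
    let mut data = vec![witness_version];
    data.extend(to_base32(program));

    // Checksum input: expanded HRP, data, then six zero placeholders
    let mut values: Vec<u8> = hrp.bytes().map(|c| c >> 5).collect();
    values.push(0);
    values.extend(hrp.bytes().map(|c| c & 31));
    values.extend_from_slice(&data);
    values.extend_from_slice(&[0u8; 6]);
    let constant = if witness_version == 0 { BECH32_CONST } else { BECH32M_CONST };
    let checksum = polymod(&values) ^ constant;
    for i in 0..6 {
        data.push(((checksum >> (5 * (5 - i))) & 31) as u8);
    }

    let mut address = format!("{}1", hrp);
    address.extend(data.iter().map(|&d| CHARSET[d as usize] as char));
    Some(address)
}
```

Encoding the 20-byte hash 751e76e8199196d454941c45d1b3a323f1433bd6 with prefix "bc" and witness version 0 should reproduce the P2WPKH address used in the validation snippet above.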

Phase 11: Pretty Printing and JSON Output

Goal: Create user-friendly output formats.

Tasks:

  1. Format transaction as hierarchical text
  2. Implement JSON serialization
  3. Add verbose mode showing raw bytes alongside parsed values
  4. Calculate fee (if previous outputs are known)

Validation:

$ btc-tx-parser --json <tx>
{
  "txid": "abc123...",
  "version": 2,
  "inputs": [...],
  ...
}

Hints if stuck:

  • Use serde for JSON serialization
  • Show both hex and interpreted values in verbose mode
  • Fee = sum(inputs) - sum(outputs); requires looking up previous transactions

Testing Strategy

Unit Tests

#[test]
fn test_compact_size_single_byte() {
    assert_eq!(read_compact_size(&[0x00]), (0, 1));
    assert_eq!(read_compact_size(&[0xFC]), (252, 1));
}

#[test]
fn test_compact_size_three_byte() {
    assert_eq!(read_compact_size(&[0xFD, 0xFD, 0x00]), (253, 3));
    assert_eq!(read_compact_size(&[0xFD, 0xFF, 0xFF]), (65535, 3));
}

#[test]
fn test_script_type_detection() {
    // P2PKH
    let p2pkh = hex::decode("76a91489abcdefabbaabbaabbaabbaabbaabbaabbaabba88ac").unwrap();
    assert_eq!(detect_script_type(&p2pkh), ScriptType::P2PKH);

    // P2WPKH
    let p2wpkh = hex::decode("0014751e76e8199196d454941c45d1b3a323f1433bd6").unwrap();
    assert_eq!(detect_script_type(&p2wpkh), ScriptType::P2WPKH);
}

Known Transaction Test Vectors

#[test]
fn test_genesis_coinbase() {
    // Satoshi's genesis block coinbase
    let hex = "01000000010000000000000000000000000000000000000000000000000000000000000000ffffffff4d04ffff001d0104455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73ffffffff0100f2052a01000000434104678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5fac00000000";
    let tx = parse_transaction(hex).unwrap();

    assert_eq!(tx.version, 1);
    assert_eq!(tx.inputs.len(), 1);
    assert_eq!(tx.outputs.len(), 1);
    assert_eq!(tx.outputs[0].value, 5_000_000_000); // 50 BTC
}

#[test]
fn test_segwit_transaction() {
    // Known SegWit transaction
    let hex = "02000000000101...";
    let tx = parse_transaction(hex).unwrap();

    assert!(tx.witness.is_some());
    // Verify TXID and WTXID
}

#[test]
fn test_multisig_transaction() {
    // 2-of-3 multisig
    let hex = "0100000001...";
    let tx = parse_transaction(hex).unwrap();

    // Check multisig script detection
}

Compatibility Tests

#[test]
fn test_bitcoin_core_compatibility() {
    // Parse transactions and compare output to bitcoin-cli decoderawtransaction
    let transactions = load_test_vectors("testdata/transactions.json");

    for (hex, expected) in transactions {
        let parsed = parse_transaction(&hex).unwrap();
        let json = serde_json::to_value(&parsed).unwrap();
        assert_json_eq!(json, expected);
    }
}

Fuzz Testing

#[test]
fn fuzz_transaction_parser() {
    // Ensure parser doesn't crash on arbitrary input
    for _ in 0..10000 {
        let random_bytes = generate_random_bytes(1000);
        let _ = parse_transaction(&hex::encode(random_bytes));
        // Should not panic, may return error
    }
}

Common Pitfalls and Debugging

Pitfall 1: Byte Order Confusion

Problem: Bitcoin uses little-endian for most values, but TXIDs are displayed in reversed order.

Symptom: TXID doesn’t match block explorer.

Solution:

// Internal byte order (as stored in transactions and produced by SHA256d)
let internal_txid: [u8; 32] = compute_txid(&tx);

// Display order (byte-reversed, as shown by block explorers)
let display_txid: Vec<u8> = internal_txid.iter().rev().copied().collect();
println!("TXID: {}", hex::encode(display_txid));

Pitfall 2: Off-by-One in Script Parsing

Problem: Mishandling direct push opcodes (0x01-0x4b).

Symptom: Script parsing fails or consumes wrong number of bytes.

Solution:

// The opcode itself IS the length, not length + 1
if opcode >= 0x01 && opcode <= 0x4b {
    let length = opcode as usize;  // Not opcode + 1
    let data = &script[pos + 1..pos + 1 + length];
    pos += 1 + length;
}

Pitfall 3: SegWit Detection False Positives

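Problem: Checking only one of the two bytes, or checking the wrong offset, can make a legacy transaction look like SegWit. A valid legacy transaction never has 0x00 at byte 4, because that would be an input count of zero.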

Symptom: Parsing fails on certain legacy transactions.

Solution:

fn is_segwit(bytes: &[u8]) -> bool {
    // Check marker and flag
    if bytes.len() >= 6 && bytes[4] == 0x00 && bytes[5] == 0x01 {
        // Additional check: input count should be valid CompactSize
        // Legacy tx with 0 inputs would be invalid anyway
        return true;
    }
    false
}

Pitfall 4: Empty Witness vs No Witness

Problem: Confusing SegWit transactions with empty witness data vs legacy transactions.

Symptom: Parser crashes or misinterprets witness.

Solution:

// A native SegWit input carries its signature and pubkey in the witness
// An empty witness stack (count 0) is different from a missing witness section
if is_segwit {
    for _ in 0..num_inputs {
        let (stack_items, used) = read_compact_size(&bytes[pos..])?;
        pos += used;
        // stack_items == 0 means empty witness for this input
        // (happens for non-SegWit inputs in mixed transactions)
        // Otherwise, read `stack_items` length-prefixed items here
    }
}

Pitfall 5: DER Signature Length Variability

Problem: Signatures can be 70-73 bytes due to leading zeros.

Symptom: Fixed-size parsing fails.

Solution:

fn parse_der_signature(bytes: &[u8]) -> Result<Signature> {
    // Read lengths from structure, don't assume fixed size
    let total_len = bytes[1] as usize;
    let r_len = bytes[3] as usize;
    let r = &bytes[4..4 + r_len];
    let s_offset = 4 + r_len;
    let s_len = bytes[s_offset + 1] as usize;
    let s = &bytes[s_offset + 2..s_offset + 2 + s_len];
    // ...
}

Pitfall 6: CompactSize Maximum Value

Problem: Not handling 8-byte (0xFF prefix) CompactSize values.

Symptom: Parser fails on transactions with very long scripts.

Solution:

fn read_compact_size(bytes: &[u8]) -> Result<(u64, usize)> {
    match bytes[0] {
        0x00..=0xFC => Ok((bytes[0] as u64, 1)),
        0xFD => Ok((read_u16_le(&bytes[1..3]) as u64, 3)),
        0xFE => Ok((read_u32_le(&bytes[1..5]) as u64, 5)),
        0xFF => Ok((read_u64_le(&bytes[1..9]), 9)),  // Don't forget this case!
    }
}

Extensions and Challenges

Challenge 1: Transaction Builder

Build the reverse: construct valid raw transactions programmatically. Create inputs, outputs, and serialize to hex.

Challenge 2: Signature Verification

Implement ECDSA signature verification (using your implementation from P02). Verify that signatures in parsed transactions are valid.

Challenge 3: Fee Estimation

Build a fee estimator that:

  • Calculates virtual bytes (vbytes) for weight calculation
  • Estimates fee based on transaction size
  • Suggests optimal fee for current mempool conditions

Challenge 4: Script Interpreter

Implement a Bitcoin Script interpreter that actually executes scripts. Verify that scriptSig + scriptPubKey evaluates to true for each input.

Challenge 5: PSBT Parser

Extend your parser to handle Partially Signed Bitcoin Transactions (BIP 174). These are used for multi-party signing workflows.

Challenge 6: Taproot Script Paths

Parse and display Taproot (P2TR) transactions including:

  • Key path spends
  • Script path spends with Merkle proofs
  • Control blocks and tapleaf scripts

Real-World Connections

Blockchain Analysis

Chain analysis companies (Chainalysis, Elliptic) parse every Bitcoin transaction to track fund flows, identify patterns, and detect illicit activity. Your parser is the first step in building such tools.

Wallet Development

Every Bitcoin wallet must parse transactions to:

  • Display transaction history
  • Calculate balances
  • Construct new transactions
  • Verify incoming payments

Node Implementation

Bitcoin Core’s parsing logic is critical infrastructure. Every transaction that propagates across the network is parsed independently by each node that receives it, thousands of times in total. Bugs in parsing code have led to network-wide issues in the past.

Educational Tools

Block explorers like Blockstream.info, mempool.space, and Blockchain.com all parse transactions to display human-readable information. Your tool could become the core of a personal block explorer.

Security Research

Understanding transaction format is essential for:

  • Analyzing malformed transaction attacks
  • Identifying signature grinding attacks
  • Detecting unusual script patterns
  • Forensic analysis of theft/hacks

Resources

Primary References

  1. “Mastering Bitcoin” Chapters 5-6 - Detailed transaction and script explanation
  2. Bitcoin Developer Reference: Transactions
  3. BIP 141 (SegWit): Segregated Witness
  4. BIP 143: Transaction Signature Verification for SegWit
  5. BIP 341 (Taproot): Taproot: SegWit version 1 spending rules

Code References

  1. Bitcoin Core: primitives/transaction.h
  2. rust-bitcoin: Transaction struct
  3. btcd (Go): wire/msgtx.go

Tools

  1. bitcoin-cli decoderawtransaction: Compare your output
  2. mempool.space: Visualize transaction structure
  3. learnmeabitcoin.com: Interactive transaction explorer
  4. Bitcoin Script IDE: ide.bitauth.com

Test Data

  1. Bitcoin Testnet: Use testnet transactions for safe testing
  2. Regtest: Create your own transactions for controlled testing
  3. Mainnet Samples: Famous transactions (pizza transaction, Satoshi’s transfers)

Self-Assessment Checklist

Before moving to the next project, verify:

  • I can parse both legacy and SegWit transactions correctly
  • I understand why CompactSize encoding saves space
  • I can disassemble any Bitcoin Script to opcodes
  • I can identify the script type from a scriptPubKey
  • I understand the difference between TXID and WTXID
  • I can explain why SegWit fixes transaction malleability
  • I can parse DER-encoded signatures and extract r, s values
  • I can compute the correct TXID for any transaction
  • I can convert pubkey hashes to addresses (Base58Check and Bech32)
  • My parser matches bitcoin-cli output for test transactions

What’s Next?

With transaction parsing mastered, you now understand how Bitcoin encodes and validates transfers at the byte level. In Project 8: Simple EVM Implementation, you’ll move from Bitcoin to Ethereum and build a minimal Ethereum Virtual Machine, learning how smart contracts execute bytecode to transform blockchain state.