P07: Bitcoin Transaction Parser

Project Overview

Attribute	Value
Main Language	Rust
Alternative Languages	C, Go, Python
Difficulty	Advanced
Coolness Level	Level 4: Hardcore Tech Flex
Business Potential	Resume Gold (Educational/Personal Brand)
Knowledge Area	Protocols / Bitcoin
Main Book	“Mastering Bitcoin” by Andreas M. Antonopoulos

Learning Objectives

By completing this project, you will:

Master Bitcoin’s binary transaction format - understand exactly how transactions are serialized and why each field exists
Implement variable-length encoding - learn the CompactSize integers (varints) used throughout Bitcoin
Parse and interpret Bitcoin Script - the stack-based programming language that defines spending conditions
Understand SegWit (Segregated Witness) - how it changed the transaction format to fix malleability and increase throughput
Compute Transaction IDs (TXIDs) - understand which parts of a transaction are hashed and why

Deep Theoretical Foundation

Why Transaction Parsing Matters

Every bitcoin ever transferred is recorded as a transaction on the blockchain. When you “send” bitcoin, you’re really creating a cryptographically signed message that says: “I’m spending outputs from previous transactions and creating new outputs that can only be spent by whoever knows the private key for address X.”

But what does this message actually look like? It’s not JSON. It’s not XML. It’s a carefully designed binary format, optimized for size and unambiguous parsing. Understanding this format means understanding Bitcoin itself at the deepest level.

When you run a Bitcoin node, every single transaction that arrives over the network must be:

Parsed - Decoded from raw bytes into structured data
Validated - Checked for correctness (proper structure, valid signatures, unspent inputs)
Executed - Script evaluation to verify spending conditions

This project focuses on step 1: taking raw hex-encoded transactions and extracting every piece of information from them.

The UTXO Model

Before diving into transactions, you must understand Bitcoin’s data model. Unlike Ethereum’s account model (where you have a “balance”), Bitcoin uses Unspent Transaction Outputs (UTXOs).

Think of UTXOs as individual bills in your wallet:

You don’t have “$150” - you have a $100 bill, a $20 bill, and three $10 bills
To pay someone $75, you hand over the $100 bill and receive $25 in change
The $100 bill is “spent” (destroyed), and two new “bills” are created: $75 for them, $25 for you

In Bitcoin:

Inputs reference previous transaction outputs being spent
Outputs create new spendable amounts locked to new addresses
The sum of inputs must equal or exceed the sum of outputs (the difference is the miner’s fee)
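The fee rule above can be sketched in a few lines. This is a minimal illustration, not part of the project API: `transaction_fee` is a hypothetical helper, and it assumes the caller has already looked up the value of each spent output, since inputs do not carry their own amounts.

```rust
/// Fee in satoshis: sum of spent input values minus sum of new output values.
/// Returns None if the outputs exceed the inputs (an invalid transaction)
/// or if either sum overflows a u64.
pub fn transaction_fee(input_values: &[u64], output_values: &[u64]) -> Option<u64> {
    let total_in = input_values
        .iter()
        .try_fold(0u64, |acc, &v| acc.checked_add(v))?;
    let total_out = output_values
        .iter()
        .try_fold(0u64, |acc, &v| acc.checked_add(v))?;
    total_in.checked_sub(total_out)
}
```

Spending a 10,000 sat output to create outputs of 7,500 and 2,000 sat leaves 500 sat for the miner.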

Anatomy of a Bitcoin Transaction

A Bitcoin transaction consists of:

+----------------+
| Version (4)    |  - Transaction format version (1 or 2)
+----------------+
| Flag (2)*      |  - SegWit marker and flag (optional)
+----------------+
| Input Count    |  - Number of inputs (CompactSize)
+----------------+
| Inputs [...]   |  - Variable-length list of inputs
+----------------+
| Output Count   |  - Number of outputs (CompactSize)
+----------------+
| Outputs [...]  |  - Variable-length list of outputs
+----------------+
| Witness*       |  - Witness data for each input (if SegWit)
+----------------+
| Locktime (4)   |  - Block/time before which tx can't be mined
+----------------+

* Only present in SegWit transactions

CompactSize Integers (VarInt)

Bitcoin uses a variable-length integer encoding called CompactSize (or VarInt) throughout its protocol. This saves space by using fewer bytes for small numbers:

Value Range	Encoding
0 - 0xFC (252)	1 byte: value
0xFD - 0xFFFF	3 bytes: 0xFD + 2 bytes (little-endian)
0x10000 - 0xFFFFFFFF	5 bytes: 0xFE + 4 bytes (little-endian)
0x100000000+	9 bytes: 0xFF + 8 bytes (little-endian)

Examples:

0x05           -> 5 (1 byte)
0xFD 0x00 0x01 -> 256 (3 bytes: 0xFD + 0x0100 little-endian)
0xFE 0x00 0x00 0x01 0x00 -> 65536 (5 bytes: 0xFE + 0x00010000 little-endian)

This encoding matters because most transactions have only 1-3 inputs and 1-3 outputs. With a fixed 4-byte count, every transaction would spend bytes such as “01 00 00 00” just to say “one input.”
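The encoding table can be turned into a small encoder, which is also useful later when re-serializing a transaction for hashing. The function name `write_compact_size` is an assumption for this sketch; it always picks the shortest form for the value.

```rust
/// Encode `n` as a CompactSize integer using the shortest form.
pub fn write_compact_size(n: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    if n < 0xFD {
        // Small values are stored directly in one byte.
        out.push(n as u8);
    } else if n <= 0xFFFF {
        out.push(0xFD);
        out.extend_from_slice(&(n as u16).to_le_bytes());
    } else if n <= 0xFFFF_FFFF {
        out.push(0xFE);
        out.extend_from_slice(&(n as u32).to_le_bytes());
    } else {
        out.push(0xFF);
        out.extend_from_slice(&n.to_le_bytes());
    }
    out
}
```

Note that 253 (0xFD) already needs the 3-byte form, because the byte 0xFD is reserved as a prefix.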

Transaction Inputs

Each input in a transaction has the following structure:

+--------------------+
| Previous TXID (32) |  - Hash of transaction containing the output being spent
+--------------------+
| Previous Vout (4)  |  - Index of the output within that transaction
+--------------------+
| Script Length      |  - CompactSize: length of scriptSig
+--------------------+
| ScriptSig (var)    |  - Unlocking script (signature and pubkey)
+--------------------+
| Sequence (4)       |  - Originally for transaction replacement
+--------------------+

Understanding Each Field:

Previous TXID (32 bytes): The SHA256d hash of a previous transaction. Note: This is stored in internal byte order (reversed from the display format you see in block explorers).
Previous Vout (4 bytes, little-endian): Which output of that transaction are we spending? First output is 0.
ScriptSig (variable): The “unlocking” script that proves we have the right to spend this output. For P2PKH, this typically contains a signature and public key.
Sequence (4 bytes, little-endian): Originally designed for transaction replacement but now used for:
- Signaling RBF (Replace-By-Fee, BIP 125) if < 0xFFFFFFFE
- Enabling relative timelocks (BIP 68) in version 2+ transactions when bit 31 is clear
- Disabling locktime checking when every input is set to 0xFFFFFFFF
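The sequence rules can be decoded with a little bit arithmetic. The sketch below follows the BIP 68 layout (bit 31 disables the relative lock, bit 22 selects time-based units of 512 seconds, the low 16 bits hold the value). The names `RelativeLock`, `signals_rbf`, and `relative_lock` are assumptions for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum RelativeLock {
    /// No BIP 68 relative timelock applies to this input.
    Disabled,
    /// The spent output must be buried by this many blocks.
    Blocks(u16),
    /// The spent output must have aged this many seconds.
    Seconds(u32),
}

/// BIP 125: any sequence below 0xFFFFFFFE signals replaceability.
pub fn signals_rbf(sequence: u32) -> bool {
    sequence < 0xFFFF_FFFE
}

/// BIP 68 interpretation of a sequence number.
pub fn relative_lock(tx_version: u32, sequence: u32) -> RelativeLock {
    const DISABLE_FLAG: u32 = 1 << 31;
    const TYPE_FLAG: u32 = 1 << 22;
    const VALUE_MASK: u32 = 0x0000_FFFF;

    // Relative locks only apply to version 2+ transactions with bit 31 clear.
    if tx_version < 2 || (sequence & DISABLE_FLAG) != 0 {
        return RelativeLock::Disabled;
    }
    let value = sequence & VALUE_MASK;
    if (sequence & TYPE_FLAG) != 0 {
        // Time-based locks are counted in units of 512 seconds.
        RelativeLock::Seconds(value * 512)
    } else {
        RelativeLock::Blocks(value as u16)
    }
}
```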

Transaction Outputs

Each output has this structure:

+--------------------+
| Value (8)          |  - Amount in satoshis (little-endian)
+--------------------+
| Script Length      |  - CompactSize: length of scriptPubKey
+--------------------+
| ScriptPubKey (var) |  - Locking script (spending conditions)
+--------------------+

Understanding Each Field:

Value (8 bytes, little-endian): Amount in satoshis (1 BTC = 100,000,000 satoshis). Max value: 21,000,000 BTC = 2.1 x 10^15 satoshis, which fits in 51 bits.
ScriptPubKey (variable): The “locking” script that defines conditions for spending this output. Different address types have different standard scripts:
- P2PKH (Pay-to-Public-Key-Hash): OP_DUP OP_HASH160 <20-byte-hash> OP_EQUALVERIFY OP_CHECKSIG
- P2SH (Pay-to-Script-Hash): OP_HASH160 <20-byte-hash> OP_EQUAL
- P2WPKH (Native SegWit): OP_0 <20-byte-hash>
- P2WSH (SegWit Script Hash): OP_0 <32-byte-hash>
- P2TR (Taproot): OP_1 <32-byte-key>

Bitcoin Script: A Stack-Based Language

Bitcoin Script is a simple, stack-based, Turing-incomplete language. Understanding it is essential for parsing transactions correctly.

Execution Model:

Start with empty stack
Execute scriptSig (unlocking script) - pushes data onto stack
Execute scriptPubKey (locking script) - manipulates stack
Transaction is valid if stack is non-empty and top element is true

Common Opcodes:

Opcode	Hex	Description
OP_0	0x00	Push empty byte array (false)
OP_PUSHDATA1	0x4c	Next byte is length, then push that many bytes
OP_PUSHDATA2	0x4d	Next 2 bytes are length, then push that many bytes
OP_PUSHDATA4	0x4e	Next 4 bytes are length, then push that many bytes
OP_1 to OP_16	0x51-0x60	Push 1-16 onto stack
OP_DUP	0x76	Duplicate top stack item
OP_HASH160	0xa9	SHA256 then RIPEMD160 of top item
OP_EQUAL	0x87	Pop two items, push 1 if equal, else 0
OP_EQUALVERIFY	0x88	OP_EQUAL then OP_VERIFY
OP_CHECKSIG	0xac	Verify signature against pubkey
OP_CHECKMULTISIG	0xae	M-of-N signature verification
OP_RETURN	0x6a	Marks output as provably unspendable (data embedding)

Data Push Opcodes (0x01 - 0x4b): When the opcode is between 0x01 and 0x4b (1-75), it means “push the next N bytes onto the stack.”

Example: P2PKH ScriptPubKey

76 a9 14 89abcdef... 88 ac
^  ^  ^  ^          ^  ^
|  |  |  |          |  OP_CHECKSIG
|  |  |  |          OP_EQUALVERIFY
|  |  |  20 bytes (pubkey hash)
|  |  Push next 20 bytes
|  OP_HASH160
OP_DUP

SegWit: Segregated Witness

Segregated Witness (BIP 141/143/144) was Bitcoin’s most significant upgrade. It separates (“segregates”) the signature (“witness”) data from the main transaction structure.

Why SegWit?

Transaction Malleability Fix: Before SegWit, anyone could modify the scriptSig (specifically, the encoding of signatures) without invalidating the signature itself. This changed the TXID (which hashes the entire transaction including scriptSig), breaking chains of unconfirmed transactions.
Block Space Efficiency: Witness data counts at 1/4 weight, effectively increasing block capacity.
Enables Lightning Network: Fixed malleability allows off-chain payment channels to work safely.
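The 1/4 weight rule comes from the BIP 141 weight formula, which can be sketched as follows. The function and parameter names are assumptions: `stripped_size` is the byte length of the serialization without marker, flag, and witness, and `total_size` is the byte length of the full serialization.

```rust
/// BIP 141 weight: non-witness bytes count four times, witness bytes once.
/// Equivalent to stripped_size * 4 + (total_size - stripped_size).
pub fn tx_weight(stripped_size: usize, total_size: usize) -> usize {
    stripped_size * 3 + total_size
}

/// Virtual size in vbytes: weight divided by four, rounded up.
pub fn tx_vsize(stripped_size: usize, total_size: usize) -> usize {
    (tx_weight(stripped_size, total_size) + 3) / 4
}
```

For a legacy transaction both sizes are equal, so the virtual size is simply the byte length.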

SegWit Transaction Format:

+----------------+
| Version (4)    |
+----------------+
| Marker (0x00)  |  <- SegWit marker
+----------------+
| Flag (0x01)    |  <- SegWit flag
+----------------+
| Input Count    |
+----------------+
| Inputs [...]   |  <- scriptSig is empty for native SegWit
+----------------+
| Output Count   |
+----------------+
| Outputs [...]  |
+----------------+
| Witness [...]  |  <- Witness data for each input
+----------------+
| Locktime (4)   |
+----------------+

Witness Data Structure:

For each input, witness data is:

+--------------------+
| Stack Item Count   |  - CompactSize: number of items
+--------------------+
| Item 1 Length      |  - CompactSize
+--------------------+
| Item 1 Data        |
+--------------------+
| Item 2 Length      |
+--------------------+
| Item 2 Data        |
+--------------------+
| ...                |
+--------------------+

For P2WPKH, witness typically contains:

Signature (71-73 bytes, DER encoded + sighash type)
Public key (33 bytes compressed)

TXID vs WTXID

TXID (Transaction ID):

SHA256d of transaction without witness data
For legacy transactions: hash of entire transaction
For SegWit transactions: hash of version + inputs + outputs + locktime (no marker, flag, or witness)

WTXID (Witness Transaction ID):

SHA256d of entire transaction including witness
Used in witness commitment (stored in coinbase transaction)

Why Two IDs?

The TXID remains stable regardless of witness data, solving malleability. When you reference a previous output to spend, you use the TXID. The WTXID is used to commit to the complete transaction data in blocks.

Computing TXID for SegWit Transaction:

TXID = SHA256(SHA256(
    version (4 bytes) ||
    input_count (varint) ||
    inputs (including scriptSig; empty for native SegWit) ||
    output_count (varint) ||
    outputs ||
    locktime (4 bytes)
))

Note: This is exactly the “stripped” transaction - what you’d get if you removed marker, flag, and witness.
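As a rough sketch of the stripped serialization that gets hashed, the code below writes the fields in order and never emits the marker, flag, or witness. It uses simplified stand-ins for the project's structs (scripts are plain byte vectors), and `push_compact_size` is a hypothetical helper. Hashing the result twice with SHA256 is left to a hashing library.

```rust
pub struct TxInput {
    pub prev_txid: [u8; 32], // internal byte order
    pub prev_vout: u32,
    pub script_sig: Vec<u8>,
    pub sequence: u32,
}

pub struct TxOutput {
    pub value: u64,
    pub script_pubkey: Vec<u8>,
}

pub struct Transaction {
    pub version: u32,
    pub inputs: Vec<TxInput>,
    pub outputs: Vec<TxOutput>,
    pub locktime: u32,
}

/// Append `n` as a CompactSize integer.
fn push_compact_size(out: &mut Vec<u8>, n: u64) {
    match n {
        0..=0xFC => out.push(n as u8),
        0xFD..=0xFFFF => {
            out.push(0xFD);
            out.extend_from_slice(&(n as u16).to_le_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(0xFE);
            out.extend_from_slice(&(n as u32).to_le_bytes());
        }
        _ => {
            out.push(0xFF);
            out.extend_from_slice(&n.to_le_bytes());
        }
    }
}

/// Serialize version, inputs, outputs, and locktime only.
pub fn serialize_no_witness(tx: &Transaction) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&tx.version.to_le_bytes());
    push_compact_size(&mut out, tx.inputs.len() as u64);
    for input in &tx.inputs {
        out.extend_from_slice(&input.prev_txid);
        out.extend_from_slice(&input.prev_vout.to_le_bytes());
        // The scriptSig is always included, even when it is empty.
        push_compact_size(&mut out, input.script_sig.len() as u64);
        out.extend_from_slice(&input.script_sig);
        out.extend_from_slice(&input.sequence.to_le_bytes());
    }
    push_compact_size(&mut out, tx.outputs.len() as u64);
    for output in &tx.outputs {
        out.extend_from_slice(&output.value.to_le_bytes());
        push_compact_size(&mut out, output.script_pubkey.len() as u64);
        out.extend_from_slice(&output.script_pubkey);
    }
    out.extend_from_slice(&tx.locktime.to_le_bytes());
    out
}
```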

DER Signature Encoding

Bitcoin signatures use DER (Distinguished Encoding Rules) format, a subset of ASN.1:

0x30 <total_length>
0x02 <r_length> <r_value>
0x02 <s_length> <s_value>
<sighash_type>

Structure:

0x30: SEQUENCE tag
Total length: Length of everything after this byte (excluding sighash)
0x02: INTEGER tag for r
r_length: Length of r (usually 32-33 bytes)
r_value: The r value (may have leading 0x00 if high bit is set)
0x02: INTEGER tag for s
s_length: Length of s (usually 32 bytes after low-S normalization)
s_value: The s value
sighash_type: 0x01 = SIGHASH_ALL (most common)

Example:

30 44 02 20 7a8f... 02 20 3b4c... 01
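A structural parser for this layout might look like the sketch below. It assumes short-form DER lengths (one length byte, which covers normal Bitcoin signatures) and only checks structure, not strict-DER or low-S rules. `DerSignature`, `read_der_integer`, and `parse_der_signature` are names chosen for this example.

```rust
#[derive(Debug, PartialEq)]
pub struct DerSignature {
    pub r: Vec<u8>,
    pub s: Vec<u8>,
    pub sighash: u8,
}

/// Read one DER INTEGER (tag 0x02, length, value) starting at `pos`.
/// Returns the value bytes and the position just past them.
fn read_der_integer(bytes: &[u8], pos: usize) -> Option<(&[u8], usize)> {
    if *bytes.get(pos)? != 0x02 {
        return None;
    }
    let len = *bytes.get(pos + 1)? as usize;
    let start = pos + 2;
    let value = bytes.get(start..start + len)?;
    Some((value, start + len))
}

pub fn parse_der_signature(bytes: &[u8]) -> Option<DerSignature> {
    // The last byte is the sighash type; everything before it is DER.
    let (&sighash, der) = bytes.split_last()?;
    if *der.first()? != 0x30 {
        return None;
    }
    // The declared length covers everything after the two header bytes.
    let total_len = *der.get(1)? as usize;
    if total_len != der.len().checked_sub(2)? {
        return None;
    }
    let (r, next) = read_der_integer(der, 2)?;
    let (s, end) = read_der_integer(der, next)?;
    // Reject trailing garbage inside the DER structure.
    if end != der.len() {
        return None;
    }
    Some(DerSignature { r: r.to_vec(), s: s.to_vec(), sighash })
}
```

Every length is read from the structure itself, so a 32-byte and a 33-byte r value are handled by the same code path.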

Sighash Types

The sighash type (last byte of signature) determines which parts of the transaction the signature commits to:

Type	Value	Description
SIGHASH_ALL	0x01	Signs all inputs and outputs (default)
SIGHASH_NONE	0x02	Signs inputs, but not outputs
SIGHASH_SINGLE	0x03	Signs inputs and only the output at same index
SIGHASH_ANYONECANPAY	0x80	Flag OR-ed with one of the above (0x81, 0x82, 0x83); signs only its own input
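Decoding the sighash byte is a matter of splitting off the 0x80 flag. This sketch accepts only the six standard combinations and returns `None` for anything else; `SighashBase` and `decode_sighash` are hypothetical names, and a full parser may want to keep the raw byte for non-standard values.

```rust
#[derive(Debug, PartialEq)]
pub enum SighashBase {
    All,
    None,
    Single,
}

/// Split a sighash byte into its base type and the ANYONECANPAY flag.
pub fn decode_sighash(byte: u8) -> Option<(SighashBase, bool)> {
    let anyone_can_pay = (byte & 0x80) != 0;
    let base = match byte & 0x7F {
        0x01 => SighashBase::All,
        0x02 => SighashBase::None,
        0x03 => SighashBase::Single,
        // Anything else is not one of the standard types.
        _ => return Option::None,
    };
    Some((base, anyone_can_pay))
}
```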

Complete Project Specification

Functional Requirements

Raw Transaction Parsing
- Accept hex-encoded raw transactions
- Detect and parse both legacy and SegWit formats
- Extract all fields: version, inputs, outputs, locktime
- Parse witness data for SegWit transactions
Script Analysis
- Decode scriptSig and scriptPubKey
- Identify script type (P2PKH, P2SH, P2WPKH, P2WSH, P2TR, OP_RETURN)
- Disassemble opcodes to human-readable format
- Extract embedded data (addresses, public keys, hashes)
Signature Parsing
- Parse DER-encoded signatures
- Extract r and s values
- Identify sighash type
- Validate signature structure
Transaction ID Computation
- Compute TXID (double SHA256 of the serialization without witness data)
- Compute WTXID (includes witness)
- Display in correct byte order (display vs internal)
Human-Readable Output
- Pretty-print transaction structure
- Show amounts in BTC and satoshis
- Convert addresses to standard formats
- Calculate and display transaction fee (if inputs known)

Non-Functional Requirements

Correctness: Parse all valid Bitcoin transactions correctly
Robustness: Handle malformed transactions gracefully with clear errors
Performance: Parse at least 1,000 transactions per second
Compatibility: Match output of bitcoin-cli decoderawtransaction

Command-Line Interface

# Parse a transaction from hex
$ btc-tx-parser 0100000001abcd...
Transaction ID: 7a3f...
Version: 1
Inputs: 1
  [0] Previous TX: abcd1234...
      Previous Vout: 0
      ScriptSig: 47304402...
      Sequence: 0xffffffff
Outputs: 2
  [0] Value: 0.5 BTC (50000000 sat)
      ScriptPubKey: OP_DUP OP_HASH160 89ab... OP_EQUALVERIFY OP_CHECKSIG
      Type: P2PKH
      Address: 1A1zP1...
  [1] Value: 0.00009 BTC (9000 sat)
      ScriptPubKey: OP_RETURN 48656c6c6f
      Type: OP_RETURN
      Data: "Hello"
Locktime: 0

# Parse from file
$ btc-tx-parser --file transaction.hex

# Output as JSON
$ btc-tx-parser --json 0100000001abcd...

# Verbose mode (show raw bytes alongside parsed)
$ btc-tx-parser --verbose 0100000001abcd...

# Compute TXID/WTXID only
$ btc-tx-parser --txid 0100000001abcd...
TXID: 7a3f5c...
WTXID: 8b4e6d...

# Disassemble script only
$ btc-tx-parser --script "76a91489abcdef...88ac"
OP_DUP OP_HASH160 89abcdef... OP_EQUALVERIFY OP_CHECKSIG

Solution Architecture

Module Structure

src/
+-- main.rs              # CLI entry point
+-- lib.rs               # Public API
+-- parser/
|   +-- mod.rs           # Transaction parser coordinator
|   +-- varint.rs        # CompactSize integer parsing
|   +-- input.rs         # Transaction input parsing
|   +-- output.rs        # Transaction output parsing
|   +-- witness.rs       # Witness data parsing
+-- script/
|   +-- mod.rs           # Script disassembler
|   +-- opcodes.rs       # Opcode definitions
|   +-- types.rs         # Script type detection
|   +-- address.rs       # Address extraction/encoding
+-- signature/
|   +-- mod.rs           # Signature handling
|   +-- der.rs           # DER encoding parser
|   +-- sighash.rs       # Sighash type definitions
+-- hash/
|   +-- mod.rs           # Hashing utilities
|   +-- txid.rs          # TXID/WTXID computation
+-- display/
|   +-- mod.rs           # Output formatting
|   +-- json.rs          # JSON serialization
|   +-- pretty.rs        # Human-readable output
+-- tests/
    +-- vectors.rs       # Known transaction test vectors
    +-- segwit_tests.rs  # SegWit-specific tests
    +-- script_tests.rs  # Script parsing tests

Core Data Structures

/// A parsed Bitcoin transaction
pub struct Transaction {
    pub version: u32,
    pub inputs: Vec<TxInput>,
    pub outputs: Vec<TxOutput>,
    pub witness: Option<Vec<Witness>>,
    pub locktime: u32,
}

/// A transaction input
pub struct TxInput {
    pub prev_txid: [u8; 32],
    pub prev_vout: u32,
    pub script_sig: Script,
    pub sequence: u32,
}

/// A transaction output
pub struct TxOutput {
    pub value: u64,  // in satoshis
    pub script_pubkey: Script,
}

/// Witness data for one input
pub struct Witness {
    pub items: Vec<Vec<u8>>,
}

/// A Bitcoin script (raw bytes + disassembly)
pub struct Script {
    pub raw: Vec<u8>,
    pub ops: Vec<ScriptOp>,
    pub script_type: ScriptType,
}

/// A single script operation
pub enum ScriptOp {
    /// Push data onto stack
    Push(Vec<u8>),
    /// Named opcode
    Op(Opcode),
}

/// Standard script types
pub enum ScriptType {
    P2PKH,
    P2SH,
    P2WPKH,
    P2WSH,
    P2TR,
    OpReturn(Vec<u8>),
    Multisig { m: u8, n: u8 },
    NonStandard,
}

/// A parsed DER signature
pub struct Signature {
    pub r: Vec<u8>,
    pub s: Vec<u8>,
    pub sighash: SighashType,
}

/// Sighash types
pub enum SighashType {
    All,
    None,
    Single,
    AllAnyoneCanPay,
    NoneAnyoneCanPay,
    SingleAnyoneCanPay,
}

Key Algorithms

CompactSize Parsing

function read_compact_size(bytes, offset) -> (value, new_offset):
    first_byte = bytes[offset]

    if first_byte < 0xFD:
        return (first_byte, offset + 1)

    else if first_byte == 0xFD:
        value = read_u16_le(bytes, offset + 1)
        return (value, offset + 3)

    else if first_byte == 0xFE:
        value = read_u32_le(bytes, offset + 1)
        return (value, offset + 5)

    else:  // 0xFF
        value = read_u64_le(bytes, offset + 1)
        return (value, offset + 9)

SegWit Detection

function is_segwit(bytes) -> bool:
    // SegWit transactions have marker=0x00 and flag=0x01 after version
    if len(bytes) < 6:
        return false

    // Read version (4 bytes)
    // Check bytes 4 and 5
    marker = bytes[4]
    flag = bytes[5]

    return marker == 0x00 and flag == 0x01

TXID Computation

function compute_txid(tx: Transaction) -> [u8; 32]:
    // Serialize without witness data
    serialized = serialize_no_witness(tx)
    // Double SHA256
    hash1 = sha256(serialized)
    hash2 = sha256(hash1)
    // hash2 is in internal byte order; reverse it only for display
    return hash2

Script Disassembly

function disassemble_script(script: bytes) -> Vec<ScriptOp>:
    ops = []
    i = 0

    while i < len(script):
        opcode = script[i]

        if opcode == 0x00:
            ops.push(Op(OP_0))
            i += 1

        else if opcode >= 0x01 and opcode <= 0x4b:
            // Direct push
            length = opcode
            data = script[i+1 : i+1+length]
            ops.push(Push(data))
            i += 1 + length

        else if opcode == 0x4c:  // OP_PUSHDATA1
            length = script[i+1]
            data = script[i+2 : i+2+length]
            ops.push(Push(data))
            i += 2 + length

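        else if opcode == 0x4d:  // OP_PUSHDATA2
            length = read_u16_le(script, i+1)
            data = script[i+3 : i+3+length]
            ops.push(Push(data))
            i += 3 + length

        else if opcode == 0x4e:  // OP_PUSHDATA4
            length = read_u32_le(script, i+1)
            data = script[i+5 : i+5+length]
            ops.push(Push(data))
            i += 5 + length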

        else:
            ops.push(Op(OPCODE_MAP[opcode]))
            i += 1

    return ops
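One way to express the same loop in Rust with bounds checking is sketched below. It is a simplified variant: named opcodes are kept as raw bytes (`ScriptOp::Op(u8)`) rather than a full `Opcode` enum, and `TruncatedPush` is a hypothetical error type reporting the offset of a push that runs past the end of the script.

```rust
#[derive(Debug, PartialEq)]
pub enum ScriptOp {
    /// Data pushed onto the stack.
    Push(Vec<u8>),
    /// Any non-push opcode, kept as its raw byte.
    Op(u8),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TruncatedPush {
    pub offset: usize,
}

pub fn disassemble(script: &[u8]) -> Result<Vec<ScriptOp>, TruncatedPush> {
    let mut ops = Vec::new();
    let mut i = 0;
    while i < script.len() {
        let opcode = script[i];
        // Number of bytes after the opcode that encode the push length.
        let len_bytes = match opcode {
            0x01..=0x4b => 0, // the opcode itself is the length
            0x4c => 1,        // OP_PUSHDATA1
            0x4d => 2,        // OP_PUSHDATA2
            0x4e => 4,        // OP_PUSHDATA4
            _ => {
                ops.push(ScriptOp::Op(opcode));
                i += 1;
                continue;
            }
        };
        let truncated = TruncatedPush { offset: i };
        let len_field = script.get(i + 1..i + 1 + len_bytes).ok_or(truncated)?;
        let length = if len_bytes == 0 {
            opcode as usize
        } else {
            // Little-endian length: fold from the most significant byte down.
            len_field.iter().rev().fold(0usize, |acc, &b| (acc << 8) | b as usize)
        };
        let start = i + 1 + len_bytes;
        let end = start.checked_add(length).ok_or(truncated)?;
        let data = script.get(start..end).ok_or(truncated)?;
        ops.push(ScriptOp::Push(data.to_vec()));
        i = end;
    }
    Ok(ops)
}
```

Returning an error instead of slicing blindly matters in practice: scriptSigs and OP_RETURN outputs on the real chain do contain truncated pushes.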

Phased Implementation Guide

Phase 1: Hex Parsing and Basic Structure

Goal: Read hex-encoded transactions and extract the version and locktime.

Tasks:

Create hex-to-bytes conversion function
Parse version (first 4 bytes, little-endian)
Parse locktime (last 4 bytes, little-endian)
Detect SegWit marker/flag

Validation:

let tx = parse_transaction("01000000...00000000");
assert_eq!(tx.version, 1);
assert_eq!(tx.locktime, 0);

Hints if stuck:

Version is little-endian: bytes 0x01 0x00 0x00 0x00 = version 1
Locktime is always the last 4 bytes
For SegWit detection, check if bytes[4..6] == [0x00, 0x01]
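A possible starting point for this phase, using only the standard library, is sketched here. The helper names (`decode_hex`, `version_and_locktime`, `has_segwit_marker`) are assumptions, and the hex string in the usage note is a made-up 12-byte fragment rather than a real transaction.

```rust
/// Decode a hex string into bytes; None on odd length or non-hex characters.
pub fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Read the version (first 4 bytes) and locktime (last 4 bytes), both little-endian.
pub fn version_and_locktime(raw: &[u8]) -> Option<(u32, u32)> {
    if raw.len() < 8 {
        return None;
    }
    let n = raw.len();
    let version = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let locktime = u32::from_le_bytes([raw[n - 4], raw[n - 3], raw[n - 2], raw[n - 1]]);
    Some((version, locktime))
}

/// True if the two bytes after the version are the SegWit marker and flag.
pub fn has_segwit_marker(raw: &[u8]) -> bool {
    raw.len() >= 6 && raw[4] == 0x00 && raw[5] == 0x01
}
```

For the fragment `020000000001aabb11000000`, this yields version 2, locktime 17, and a positive SegWit marker check.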

Phase 2: CompactSize Integers

Goal: Implement variable-length integer parsing.

Tasks:

Implement read_compact_size function
Handle all four encoding cases (1, 3, 5, 9 bytes)
Return both the value and bytes consumed
Add comprehensive tests

Validation:

assert_eq!(read_compact_size(&[0x05]), (5, 1));
assert_eq!(read_compact_size(&[0xFD, 0x00, 0x01]), (256, 3));
assert_eq!(read_compact_size(&[0xFE, 0x00, 0x00, 0x01, 0x00]), (65536, 5));

Hints if stuck:

First check if first byte < 0xFD for single-byte case
Remember: values are little-endian
Return (value, number_of_bytes_consumed) for cursor tracking

Phase 3: Input Parsing

Goal: Parse transaction inputs.

Tasks:

Read input count using CompactSize
For each input:
- Read previous TXID (32 bytes, reversed for display)
- Read previous vout (4 bytes, little-endian)
- Read scriptSig length and data
- Read sequence (4 bytes, little-endian)
Store in TxInput struct

Validation:

// Parse a known transaction and verify input fields
let tx = parse_transaction("01000000017b...");
assert_eq!(tx.inputs.len(), 1);
assert_eq!(tx.inputs[0].prev_vout, 0);

Hints if stuck:

TXID bytes are stored in “internal” order but displayed reversed
ScriptSig can be empty (especially for SegWit inputs)
Use a cursor/offset pattern to track position in byte stream
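The cursor pattern mentioned in the hints could look like the sketch below. `Cursor` here is a hypothetical hand-rolled type (not `std::io::Cursor`), `TxInput` is simplified to hold the scriptSig as raw bytes, and every read returns `None` instead of panicking when the input runs out.

```rust
use std::convert::TryFrom;

pub struct Cursor<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Cursor { bytes, pos: 0 }
    }

    /// Take the next `n` bytes, or None if fewer remain.
    pub fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.bytes.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }

    pub fn u32_le(&mut self) -> Option<u32> {
        let b = self.take(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    pub fn compact_size(&mut self) -> Option<u64> {
        let first = self.take(1)?[0];
        let width = match first {
            0xFD => 2,
            0xFE => 4,
            0xFF => 8,
            n => return Some(n as u64),
        };
        // Copy the little-endian payload into the low bytes of a u64.
        let mut buf = [0u8; 8];
        buf[..width].copy_from_slice(self.take(width)?);
        Some(u64::from_le_bytes(buf))
    }
}

pub struct TxInput {
    pub prev_txid: [u8; 32], // internal byte order; reverse for display
    pub prev_vout: u32,
    pub script_sig: Vec<u8>,
    pub sequence: u32,
}

pub fn parse_input(cur: &mut Cursor<'_>) -> Option<TxInput> {
    let mut prev_txid = [0u8; 32];
    prev_txid.copy_from_slice(cur.take(32)?);
    let prev_vout = cur.u32_le()?;
    let script_len = usize::try_from(cur.compact_size()?).ok()?;
    let script_sig = cur.take(script_len)?.to_vec();
    let sequence = cur.u32_le()?;
    Some(TxInput { prev_txid, prev_vout, script_sig, sequence })
}
```

Because `take` checks the remaining length before slicing, a bogus scriptSig length fails cleanly rather than reading past the buffer.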

Phase 4: Output Parsing

Goal: Parse transaction outputs.

Tasks:

Read output count using CompactSize
For each output:
- Read value (8 bytes, little-endian)
- Read scriptPubKey length and data
Display value in both satoshis and BTC

Validation:

let tx = parse_transaction("...");
assert_eq!(tx.outputs.len(), 2);
assert_eq!(tx.outputs[0].value, 50_000_000); // 0.5 BTC

Hints if stuck:

Value is u64, not u32 (8 bytes)
1 BTC = 100,000,000 satoshis
Max value is ~21 million BTC = 2.1 quadrillion satoshis
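For display, integer arithmetic avoids the rounding surprises of floating point. The helper below is a small sketch (the name `format_btc` is an assumption) that always prints eight decimal places.

```rust
/// Format a satoshi amount as a BTC string with eight decimal places.
pub fn format_btc(sats: u64) -> String {
    const SATS_PER_BTC: u64 = 100_000_000;
    // Whole coins, then the remainder zero-padded to eight digits.
    format!("{}.{:08}", sats / SATS_PER_BTC, sats % SATS_PER_BTC)
}
```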

Phase 5: Script Disassembly

Goal: Decode Bitcoin Script to human-readable format.

Tasks:

Define opcode constants (OP_DUP, OP_HASH160, etc.)
Implement direct push opcodes (0x01-0x4b)
Implement OP_PUSHDATA1, OP_PUSHDATA2, and OP_PUSHDATA4
Handle remaining opcodes
Format output as assembly-style string

Validation:

let script = hex::decode("76a91489abcdef...88ac")?;
let disasm = disassemble(&script);
assert_eq!(disasm, "OP_DUP OP_HASH160 <20-bytes> OP_EQUALVERIFY OP_CHECKSIG");

Hints if stuck:

Opcodes 0x01-0x4b are “push next N bytes”
Use a match/switch on opcode value
Format pushed data as hex for now, later convert to addresses

Phase 6: Script Type Detection

Goal: Identify standard script patterns.

Tasks:

Detect P2PKH: OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
Detect P2SH: OP_HASH160 <20> OP_EQUAL
Detect P2WPKH: OP_0 <20>
Detect P2WSH: OP_0 <32>
Detect P2TR: OP_1 <32>
Detect OP_RETURN: OP_RETURN <data>
Extract pubkey hash or script hash

Validation:

let script = hex::decode("0014a91b...1234")?;
assert_eq!(detect_type(&script), ScriptType::P2WPKH);

Hints if stuck:

Check script length and byte patterns
P2WPKH is exactly 22 bytes: 0x00 0x14 + 20 bytes
P2WSH is exactly 34 bytes: 0x00 0x20 + 32 bytes
OP_RETURN starts with 0x6a
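The byte patterns listed above map naturally onto Rust slice patterns. This is a minimal sketch working directly on raw script bytes; the `ScriptType` enum is trimmed (no multisig, no payload in `OpReturn`) to keep the example short.

```rust
#[derive(Debug, PartialEq)]
pub enum ScriptType {
    P2PKH,
    P2SH,
    P2WPKH,
    P2WSH,
    P2TR,
    OpReturn,
    NonStandard,
}

/// Classify a scriptPubKey by its exact byte pattern.
pub fn detect_script_type(script: &[u8]) -> ScriptType {
    match script {
        // OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
        [0x76, 0xa9, 0x14, hash @ .., 0x88, 0xac] if hash.len() == 20 => ScriptType::P2PKH,
        // OP_HASH160 <20> OP_EQUAL
        [0xa9, 0x14, hash @ .., 0x87] if hash.len() == 20 => ScriptType::P2SH,
        // OP_0 <20>
        [0x00, 0x14, program @ ..] if program.len() == 20 => ScriptType::P2WPKH,
        // OP_0 <32>
        [0x00, 0x20, program @ ..] if program.len() == 32 => ScriptType::P2WSH,
        // OP_1 <32>
        [0x51, 0x20, key @ ..] if key.len() == 32 => ScriptType::P2TR,
        [0x6a, ..] => ScriptType::OpReturn,
        _ => ScriptType::NonStandard,
    }
}
```

Matching on exact lengths keeps near-misses, such as an OP_0 push of 19 bytes, in the `NonStandard` bucket.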

Phase 7: Witness Data Parsing

Goal: Parse SegWit witness data.

Tasks:

Skip to witness section after outputs
Read witness data for each input:
- Read stack item count (CompactSize)
- For each item: read length and data
Link witness to corresponding input

Validation:

let tx = parse_segwit_transaction("02000000000101...");
assert!(tx.witness.is_some());
assert_eq!(tx.witness.unwrap()[0].items.len(), 2);

Hints if stuck:

Witness comes after all outputs, before locktime
Each input has its own witness (even if empty)
Non-witness inputs have an empty witness stack: a single 0x00 item count with no items (not one empty item)
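The witness section can be read with two nested loops, one per input and one per stack item. In this sketch, `parse_witnesses` is a hypothetical function that takes the bytes starting right after the outputs and returns the stacks plus the number of bytes consumed, so the caller knows where the locktime begins. An input without witness data shows up as an item count of 0.

```rust
/// Read a CompactSize at `*pos`, advancing `*pos` past it.
fn read_compact_size(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let first = *bytes.get(*pos)?;
    *pos += 1;
    let width = match first {
        0xFD => 2,
        0xFE => 4,
        0xFF => 8,
        n => return Some(n as u64),
    };
    let raw = bytes.get(*pos..*pos + width)?;
    *pos += width;
    let mut buf = [0u8; 8];
    buf[..width].copy_from_slice(raw);
    Some(u64::from_le_bytes(buf))
}

/// The witness stack for one input.
pub type WitnessStack = Vec<Vec<u8>>;

/// Parse one witness stack per input; returns the stacks and bytes consumed.
pub fn parse_witnesses(bytes: &[u8], num_inputs: usize) -> Option<(Vec<WitnessStack>, usize)> {
    let mut pos = 0;
    let mut stacks = Vec::new();
    for _ in 0..num_inputs {
        let item_count = read_compact_size(bytes, &mut pos)?;
        // No preallocation from the untrusted count; items are pushed as read.
        let mut items = Vec::new();
        for _ in 0..item_count {
            let len = read_compact_size(bytes, &mut pos)? as usize;
            let end = pos.checked_add(len)?;
            items.push(bytes.get(pos..end)?.to_vec());
            pos = end;
        }
        stacks.push(items);
    }
    Some((stacks, pos))
}
```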

Phase 8: Signature Parsing (DER)

Goal: Extract signature components from DER encoding.

Tasks:

Parse DER sequence tag (0x30)
Parse r value (integer tag 0x02 + length + data)
Parse s value (integer tag 0x02 + length + data)
Extract sighash type (last byte)
Handle leading zeros and high-bit padding

Validation:

let sig = hex::decode("304402207a8f...01")?;
let parsed = parse_signature(&sig)?;
assert_eq!(parsed.sighash, SighashType::All);
assert_eq!(parsed.r.len(), 32);

Hints if stuck:

Leading 0x00 byte on r/s if high bit is set (prevents negative interpretation)
s should be “low-S” normalized (less than n/2)
Sighash byte is appended after DER structure

Phase 9: TXID Computation

Goal: Compute transaction IDs correctly.

Tasks:

Implement SHA256 (or use sha2 crate)
Implement double-SHA256 (SHA256d)
For legacy: hash entire transaction
For SegWit: hash stripped transaction (no witness)
Implement WTXID (hash with witness)
Handle byte order for display

Validation:

// Compare against known TXID
let tx = parse_transaction("01000000017b1eabe...");
let mut txid = compute_txid(&tx);
txid.reverse(); // internal -> display byte order (reverses in place)
assert_eq!(hex::encode(txid), "a1b2c3d4...");

Hints if stuck:

TXID is displayed in reversed byte order (legacy display convention)
For SegWit stripped transaction: version + inputs + outputs + locktime (no marker/flag/witness)
Use sha2 crate for production; implement from scratch for learning

Phase 10: Address Encoding

Goal: Convert script hashes to standard address formats.

Tasks:

Implement Base58Check for legacy addresses (P2PKH, P2SH)
Implement Bech32 for SegWit addresses (P2WPKH, P2WSH)
Implement Bech32m for Taproot addresses (P2TR)
Handle mainnet vs testnet prefixes

Validation:

// P2PKH
assert_eq!(
    pubkey_hash_to_address(&hash, Network::Mainnet),
    "1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2"
);

// P2WPKH
assert_eq!(
    witness_pubkey_to_address(&hash, Network::Mainnet),
    "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"
);

Hints if stuck:

Base58Check = Base58(version + payload + checksum); checksum = first 4 bytes of SHA256d(version + payload)
Bech32 uses 5-bit character encoding with specific checksum
Prefix: mainnet P2PKH=0x00, P2SH=0x05; testnet P2PKH=0x6f, P2SH=0xc4
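The Base58 half of Base58Check can be sketched without any dependencies. The function below (the name `base58_encode` is an assumption) converts bytes to base 58 by repeated multiply-and-carry. It deliberately leaves out the checksum step, which needs SHA256d from a hashing library: the caller would append the 4 checksum bytes to the version and payload before encoding.

```rust
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Encode bytes as Base58 (Bitcoin alphabet), without any checksum.
pub fn base58_encode(input: &[u8]) -> String {
    // Each leading zero byte is encoded as a literal '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();

    // Base-58 digits of the remaining bytes, least significant first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in &input[zeros..] {
        // digits = digits * 256 + byte
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }

    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}
```

The leading-zero rule is why mainnet P2PKH addresses start with `1`: their version byte is 0x00.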

Phase 11: Pretty Printing and JSON Output

Goal: Create user-friendly output formats.

Tasks:

Format transaction as hierarchical text
Implement JSON serialization
Add verbose mode showing raw bytes alongside parsed values
Calculate fee (if previous outputs are known)

Validation:

$ btc-tx-parser --json <tx>
{
  "txid": "abc123...",
  "version": 2,
  "inputs": [...],
  ...
}

Hints if stuck:

Use serde for JSON serialization
Show both hex and interpreted values in verbose mode
Fee = sum(inputs) - sum(outputs); requires looking up previous transactions

Testing Strategy

Unit Tests

#[test]
fn test_compact_size_single_byte() {
    assert_eq!(read_compact_size(&[0x00]), (0, 1));
    assert_eq!(read_compact_size(&[0xFC]), (252, 1));
}

#[test]
fn test_compact_size_three_byte() {
    assert_eq!(read_compact_size(&[0xFD, 0xFD, 0x00]), (253, 3));
    assert_eq!(read_compact_size(&[0xFD, 0xFF, 0xFF]), (65535, 3));
}

#[test]
fn test_script_type_detection() {
    // P2PKH
    let p2pkh = hex::decode("76a91489abcdefabbaabbaabbaabbaabbaabbaabbaabba88ac").unwrap();
    assert_eq!(detect_script_type(&p2pkh), ScriptType::P2PKH);

    // P2WPKH
    let p2wpkh = hex::decode("0014751e76e8199196d454941c45d1b3a323f1433bd6").unwrap();
    assert_eq!(detect_script_type(&p2wpkh), ScriptType::P2WPKH);
}

Known Transaction Test Vectors

#[test]
fn test_genesis_coinbase() {
    // Satoshi's genesis block coinbase
    let hex = "01000000010000000000000000000000000000000000000000000000000000000000000000ffffffff4d04ffff001d0104455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73ffffffff0100f2052a0100000043410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac00000000";
    let tx = parse_transaction(hex).unwrap();

    assert_eq!(tx.version, 1);
    assert_eq!(tx.inputs.len(), 1);
    assert_eq!(tx.outputs.len(), 1);
    assert_eq!(tx.outputs[0].value, 5_000_000_000); // 50 BTC
}

#[test]
fn test_segwit_transaction() {
    // Known SegWit transaction
    let hex = "02000000000101...";
    let tx = parse_transaction(hex).unwrap();

    assert!(tx.witness.is_some());
    // Verify TXID and WTXID
}

#[test]
fn test_multisig_transaction() {
    // 2-of-3 multisig
    let hex = "0100000001...";
    let tx = parse_transaction(hex).unwrap();

    // Check multisig script detection
}

Compatibility Tests

#[test]
fn test_bitcoin_core_compatibility() {
    // Parse transactions and compare output to bitcoin-cli decoderawtransaction
    let transactions = load_test_vectors("testdata/transactions.json");

    for (hex, expected) in transactions {
        let parsed = parse_transaction(&hex).unwrap();
        let json = serde_json::to_value(&parsed).unwrap();
        assert_json_eq!(json, expected);
    }
}

Fuzz Testing

#[test]
fn fuzz_transaction_parser() {
    // Ensure parser doesn't crash on arbitrary input
    for _ in 0..10000 {
        let random_bytes = generate_random_bytes(1000);
        let _ = parse_transaction(&hex::encode(random_bytes));
        // Should not panic, may return error
    }
}

Common Pitfalls and Debugging

Pitfall 1: Byte Order Confusion

Problem: Bitcoin uses little-endian for most values, but TXIDs are displayed in reversed order.

Symptom: TXID doesn’t match block explorer.

Solution:

// Internal byte order (as stored in transaction)
let internal_txid = [0x01, 0x02, 0x03, ..., 0x20];

// Display order (reversed for human display)
let display_txid: Vec<u8> = internal_txid.iter().rev().copied().collect();
println!("TXID: {}", hex::encode(display_txid));

Pitfall 2: Off-by-One in Script Parsing

Problem: Mishandling direct push opcodes (0x01-0x4b).

Symptom: Script parsing fails or consumes wrong number of bytes.

Solution:

// The opcode itself IS the length, not length + 1
if opcode >= 0x01 && opcode <= 0x4b {
    let length = opcode as usize;  // Not opcode + 1
    let data = &script[pos + 1..pos + 1 + length];
    pos += 1 + length;
}

Pitfall 3: SegWit Detection False Positives

Problem: Some legacy transactions might have bytes that look like SegWit marker.

Symptom: Parsing fails on certain legacy transactions.

Solution:

fn is_segwit(bytes: &[u8]) -> bool {
    // Check marker and flag
    if bytes.len() >= 6 && bytes[4] == 0x00 && bytes[5] == 0x01 {
        // Additional check: input count should be valid CompactSize
        // Legacy tx with 0 inputs would be invalid anyway
        return true;
    }
    false
}

Pitfall 4: Empty Witness vs No Witness

Problem: Confusing SegWit transactions with empty witness data vs legacy transactions.

Symptom: Parser crashes or misinterprets witness.

Solution:

// Even native SegWit must have witness data (signature + pubkey)
// Empty witness [] is different from missing witness
if is_segwit {
    for _ in 0..num_inputs {
        let (stack_items, used) = read_compact_size(&bytes[pos..])?;
        pos += used;
        // stack_items == 0 means empty witness for this input
        // (happens for non-SegWit inputs in mixed transactions)
    }
}

Pitfall 5: DER Signature Length Variability

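Problem: Signatures vary in length (typically 70-72 bytes of DER plus one sighash byte) because r and s are variable-length integers that gain a leading 0x00 when their high bit is set.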

Symptom: Fixed-size parsing fails.

Solution:

fn parse_der_signature(bytes: &[u8]) -> Result<Signature> {
    // Read lengths from structure, don't assume fixed size
    let total_len = bytes[1] as usize;
    let r_len = bytes[3] as usize;
    let r = &bytes[4..4 + r_len];
    let s_offset = 4 + r_len;
    let s_len = bytes[s_offset + 1] as usize;
    let s = &bytes[s_offset + 2..s_offset + 2 + s_len];
    // ...
}

Pitfall 6: CompactSize Maximum Value

Problem: Not handling 8-byte (0xFF prefix) CompactSize values.

Symptom: Parser fails on transactions with very long scripts.

Solution:

fn read_compact_size(bytes: &[u8]) -> Result<(u64, usize)> {
    match bytes[0] {
        0x00..=0xFC => Ok((bytes[0] as u64, 1)),
        0xFD => Ok((read_u16_le(&bytes[1..3]) as u64, 3)),
        0xFE => Ok((read_u32_le(&bytes[1..5]) as u64, 5)),
        0xFF => Ok((read_u64_le(&bytes[1..9]), 9)),  // Don't forget this case!
    }
}

Extensions and Challenges

Challenge 1: Transaction Builder

Build the reverse: construct valid raw transactions programmatically. Create inputs, outputs, and serialize to hex.

Challenge 2: Signature Verification

Implement ECDSA signature verification (using your implementation from P02). Verify that signatures in parsed transactions are valid.

Challenge 3: Fee Estimation

Build a fee estimator that:

Calculates virtual bytes (vbytes) for weight calculation
Estimates fee based on transaction size
Suggests optimal fee for current mempool conditions

Challenge 4: Script Interpreter

Implement a Bitcoin Script interpreter that actually executes scripts. Verify that scriptSig + scriptPubKey evaluates to true for each input.

Challenge 5: PSBT Parser

Extend your parser to handle Partially Signed Bitcoin Transactions (BIP 174). These are used for multi-party signing workflows.

Challenge 6: Taproot Script Paths

Parse and display Taproot (P2TR) transactions including:

Key path spends
Script path spends with Merkle proofs
Control blocks and tapleaf scripts

Real-World Connections

Blockchain Analysis

Chain analysis companies (Chainalysis, Elliptic) parse every Bitcoin transaction to track fund flows, identify patterns, and detect illicit activity. Your parser is the first step in building such tools.

Wallet Development

Every Bitcoin wallet must parse transactions to:

Display transaction history
Calculate balances
Construct new transactions
Verify incoming payments

Node Implementation

Bitcoin Core’s parsing logic is critical infrastructure. Every transaction that propagates across the network is parsed thousands of times. Bugs in parsing code have led to network-wide issues in the past.

Educational Tools

Block explorers like Blockstream.info, mempool.space, and Blockchain.com all parse transactions to display human-readable information. Your tool could become the core of a personal block explorer.

Security Research

Understanding transaction format is essential for:

Analyzing malformed transaction attacks
Identifying signature grinding attacks
Detecting unusual script patterns
Forensic analysis of theft/hacks

Resources

Primary References

“Mastering Bitcoin” Chapters 5-6 - Detailed transaction and script explanation
Bitcoin Developer Reference: Transactions
BIP 141 (SegWit): Segregated Witness
BIP 143: Transaction Signature Verification for SegWit
BIP 341 (Taproot): Taproot: SegWit version 1 spending rules

Code References

Bitcoin Core: primitives/transaction.h
rust-bitcoin: Transaction struct
btcd (Go): wire/msgtx.go

Tools

bitcoin-cli decoderawtransaction: Compare your output
mempool.space: Visualize transaction structure
learnmeabitcoin.com: Interactive transaction explorer
Bitcoin Script IDE: ide.bitauth.com

Test Data

Bitcoin Testnet: Use testnet transactions for safe testing
Regtest: Create your own transactions for controlled testing
Mainnet Samples: Famous transactions (pizza transaction, Satoshi’s transfers)

Self-Assessment Checklist

Before moving to the next project, verify:

What’s Next?

With transaction parsing mastered, you now understand how Bitcoin encodes and validates transfers at the byte level. In Project 8: Simple EVM Implementation, you’ll move from Bitcoin to Ethereum and build a minimal Ethereum Virtual Machine, learning how smart contracts execute bytecode to transform blockchain state.