P07: Bitcoin Transaction Parser
Project Overview
| Attribute | Value |
|---|---|
| Main Language | Rust |
| Alternative Languages | C, Go, Python |
| Difficulty | Advanced |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | Resume Gold (Educational/Personal Brand) |
| Knowledge Area | Protocols / Bitcoin |
| Main Book | "Mastering Bitcoin" by Andreas M. Antonopoulos |
Learning Objectives
By completing this project, you will:
- Master Bitcoin's binary transaction format: understand exactly how transactions are serialized and why each field exists
- Implement variable-length encoding: learn the CompactSize integers (varints) used throughout Bitcoin
- Parse and interpret Bitcoin Script: the stack-based programming language that defines spending conditions
- Understand SegWit (Segregated Witness): how it changed the transaction format to fix malleability and increase throughput
- Compute Transaction IDs (TXIDs): understand which parts of a transaction are hashed and why
Deep Theoretical Foundation
Why Transaction Parsing Matters
Every Bitcoin ever transferred exists as a transaction in the blockchain. When you "send" Bitcoin, you're really creating a cryptographically signed message that says: "I'm spending outputs from previous transactions and creating new outputs that can only be spent by whoever knows the private key for address X."
But what does this message actually look like? It's not JSON. It's not XML. It's a carefully designed binary format, optimized for size and unambiguous parsing. Understanding this format means understanding Bitcoin itself at the deepest level.
When you run a Bitcoin node, every single transaction that arrives over the network must be:
- Parsed - Decoded from raw bytes into structured data
- Validated - Checked for correctness (proper structure, valid signatures, unspent inputs)
- Executed - Script evaluation to verify spending conditions
This project focuses on step 1: taking raw hex-encoded transactions and extracting every piece of information from them.
The UTXO Model
Before diving into transactions, you must understand Bitcoin's data model. Unlike Ethereum's account model (where you have a "balance"), Bitcoin uses Unspent Transaction Outputs (UTXOs).
Think of UTXOs as individual bills in your wallet:
- You don't have "$150" - you have a $100 bill, a $20 bill, and three $10 bills
- To pay someone $75, you hand over the $100 bill and receive $25 in change
- The $100 bill is "spent" (destroyed), and two new "bills" are created: $75 for them, $25 for you
In Bitcoin:
- Inputs reference previous transaction outputs being spent
- Outputs create new spendable amounts locked to new addresses
- The sum of inputs must equal or exceed the sum of outputs (the difference is the minerâs fee)
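The fee rule above can be sketched in a few lines. This is a minimal illustration, assuming the values of the spent outputs have already been looked up; `fee_sats` is a hypothetical helper name, not part of any library.

```rust
/// Fee in satoshis: sum(inputs) - sum(outputs).
/// Returns None if a sum overflows u64 or the outputs exceed the inputs,
/// both of which indicate an invalid transaction.
fn fee_sats(input_values: &[u64], output_values: &[u64]) -> Option<u64> {
    let total_in = input_values
        .iter()
        .try_fold(0u64, |acc, &v| acc.checked_add(v))?;
    let total_out = output_values
        .iter()
        .try_fold(0u64, |acc, &v| acc.checked_add(v))?;
    total_in.checked_sub(total_out)
}
```

Checked arithmetic is used so that a malformed transaction produces `None` instead of a silently wrapped value.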
Anatomy of a Bitcoin Transaction
A Bitcoin transaction consists of:
+----------------+
| Version (4) | - Transaction format version (1 or 2)
+----------------+
| Flag (2)* | - SegWit marker and flag (optional)
+----------------+
| Input Count | - Number of inputs (CompactSize)
+----------------+
| Inputs [...] | - Variable-length list of inputs
+----------------+
| Output Count | - Number of outputs (CompactSize)
+----------------+
| Outputs [...] | - Variable-length list of outputs
+----------------+
| Witness* | - Witness data for each input (if SegWit)
+----------------+
| Locktime (4) | - Block/time before which tx can't be mined
+----------------+
* Only present in SegWit transactions
CompactSize Integers (VarInt)
Bitcoin uses a variable-length integer encoding called CompactSize (or VarInt) throughout its protocol. This saves space by using fewer bytes for small numbers:
| Value Range | Encoding |
|---|---|
| 0 - 0xFC (252) | 1 byte: value |
| 0xFD - 0xFFFF | 3 bytes: 0xFD + 2 bytes (little-endian) |
| 0x10000 - 0xFFFFFFFF | 5 bytes: 0xFE + 4 bytes (little-endian) |
| 0x100000000+ | 9 bytes: 0xFF + 8 bytes (little-endian) |
Examples:
0x05 -> 5 (1 byte)
0xFD 0x00 0x01 -> 256 (3 bytes: 0xFD + 0x0100 little-endian)
0xFE 0x00 0x00 0x01 0x00 -> 65536 (5 bytes: 0xFE + 0x00010000 little-endian)
This encoding is crucial because most transactions have 1-3 inputs and 1-3 outputs. Without CompactSize, we'd waste bytes specifying "number of inputs: 00 00 00 01" for every transaction.
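The encoding table translates fairly directly into code. The sketch below is one possible shape, with bounds checks on short input; the function names and the `Option` return type are choices made for this illustration.

```rust
use std::convert::TryInto;

/// Decode a CompactSize at the start of `bytes`.
/// Returns (value, bytes_consumed), or None if the input is too short.
fn read_compact_size(bytes: &[u8]) -> Option<(u64, usize)> {
    let first = *bytes.first()?;
    match first {
        0x00..=0xFC => Some((u64::from(first), 1)),
        0xFD => {
            let b: [u8; 2] = bytes.get(1..3)?.try_into().ok()?;
            Some((u64::from(u16::from_le_bytes(b)), 3))
        }
        0xFE => {
            let b: [u8; 4] = bytes.get(1..5)?.try_into().ok()?;
            Some((u64::from(u32::from_le_bytes(b)), 5))
        }
        0xFF => {
            let b: [u8; 8] = bytes.get(1..9)?.try_into().ok()?;
            Some((u64::from_le_bytes(b), 9))
        }
    }
}

/// Encode `n` using the shortest form from the table.
fn write_compact_size(n: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    match n {
        0..=0xFC => out.push(n as u8),
        0xFD..=0xFFFF => {
            out.push(0xFD);
            out.extend_from_slice(&(n as u16).to_le_bytes());
        }
        0x1_0000..=0xFFFF_FFFF => {
            out.push(0xFE);
            out.extend_from_slice(&(n as u32).to_le_bytes());
        }
        _ => {
            out.push(0xFF);
            out.extend_from_slice(&n.to_le_bytes());
        }
    }
    out
}
```

This decoder accepts non-canonical encodings such as `FD 05 00`; a stricter parser would reject values that fit in a shorter form.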
Transaction Inputs
Each input in a transaction has the following structure:
+--------------------+
| Previous TXID (32) | - Hash of transaction containing the output being spent
+--------------------+
| Previous Vout (4) | - Index of the output within that transaction
+--------------------+
| Script Length | - CompactSize: length of scriptSig
+--------------------+
| ScriptSig (var) | - Unlocking script (signature and pubkey)
+--------------------+
| Sequence (4) | - Originally for transaction replacement
+--------------------+
Understanding Each Field:
- Previous TXID (32 bytes): The SHA256d hash of a previous transaction. Note: this is stored in internal byte order (reversed from the display format you see in block explorers).
- Previous Vout (4 bytes, little-endian): The index of the output being spent within that transaction. The first output is 0.
- ScriptSig (variable): The "unlocking" script that proves we have the right to spend this output. For P2PKH, this typically contains a signature and public key.
- Sequence (4 bytes, little-endian): Originally designed for transaction replacement, but now used for:
  - Signaling RBF (Replace-By-Fee) if < 0xFFFFFFFE
  - Enabling relative timelocks (BIP 68)
  - Disabling locktime checking when every input is set to 0xFFFFFFFF
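The sequence rules can be made concrete with a small decoder. This is a sketch based on the BIP 68 bit layout (bit 31 disables the relative lock, bit 22 selects 512-second units, the low 16 bits hold the value); the type and function names are made up for this example.

```rust
#[derive(Debug, PartialEq)]
enum RelativeLock {
    Disabled,
    Blocks(u32),
    Seconds(u32),
}

const SEQUENCE_FINAL: u32 = 0xFFFF_FFFF;
const DISABLE_FLAG: u32 = 1 << 31; // BIP 68: relative lock-time not enforced
const TYPE_FLAG: u32 = 1 << 22; // BIP 68: value is in units of 512 seconds
const VALUE_MASK: u32 = 0x0000_FFFF;

/// Opt-in Replace-By-Fee signal for a single input.
fn signals_rbf(sequence: u32) -> bool {
    sequence < 0xFFFF_FFFE
}

/// An input is "final" when its sequence is the maximum value.
fn is_final(sequence: u32) -> bool {
    sequence == SEQUENCE_FINAL
}

/// BIP 68 interpretation; it only applies to transactions with version >= 2.
fn relative_lock(tx_version: u32, sequence: u32) -> RelativeLock {
    if tx_version < 2 || (sequence & DISABLE_FLAG) != 0 {
        return RelativeLock::Disabled;
    }
    let value = sequence & VALUE_MASK;
    if (sequence & TYPE_FLAG) != 0 {
        RelativeLock::Seconds(value * 512)
    } else {
        RelativeLock::Blocks(value)
    }
}
```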
Transaction Outputs
Each output has this structure:
+--------------------+
| Value (8) | - Amount in satoshis (little-endian)
+--------------------+
| Script Length | - CompactSize: length of scriptPubKey
+--------------------+
| ScriptPubKey (var) | - Locking script (spending conditions)
+--------------------+
Understanding Each Field:
- Value (8 bytes, little-endian): Amount in satoshis (1 BTC = 100,000,000 satoshis). Max value: 21,000,000 BTC = 2.1 x 10^15 satoshis, which fits in 51 bits.
- ScriptPubKey (variable): The "locking" script that defines conditions for spending this output. Different address types have different standard scripts:
  - P2PKH (Pay-to-Public-Key-Hash): OP_DUP OP_HASH160 <20-byte-hash> OP_EQUALVERIFY OP_CHECKSIG
  - P2SH (Pay-to-Script-Hash): OP_HASH160 <20-byte-hash> OP_EQUAL
  - P2WPKH (Native SegWit): OP_0 <20-byte-hash>
  - P2WSH (SegWit Script Hash): OP_0 <32-byte-hash>
  - P2TR (Taproot): OP_1 <32-byte-key>
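Because each standard script has a fixed byte layout, the templates listed above can be recognized with slice patterns. The following is a minimal sketch; the `Kind` enum is a simplified stand-in invented for this example, and anything that does not match exactly is treated as non-standard.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    P2pkh,
    P2sh,
    P2wpkh,
    P2wsh,
    P2tr,
    OpReturn,
    NonStandard,
}

/// Classify a scriptPubKey by its exact byte template.
fn classify(spk: &[u8]) -> Kind {
    match spk {
        // OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
        [0x76, 0xa9, 0x14, hash @ .., 0x88, 0xac] if hash.len() == 20 => Kind::P2pkh,
        // OP_HASH160 <20> OP_EQUAL
        [0xa9, 0x14, hash @ .., 0x87] if hash.len() == 20 => Kind::P2sh,
        // OP_0 <20>
        [0x00, 0x14, prog @ ..] if prog.len() == 20 => Kind::P2wpkh,
        // OP_0 <32>
        [0x00, 0x20, prog @ ..] if prog.len() == 32 => Kind::P2wsh,
        // OP_1 <32>
        [0x51, 0x20, key @ ..] if key.len() == 32 => Kind::P2tr,
        // OP_RETURN followed by anything
        [0x6a, ..] => Kind::OpReturn,
        _ => Kind::NonStandard,
    }
}
```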
Bitcoin Script: A Stack-Based Language
Bitcoin Script is a simple, stack-based, Turing-incomplete language. Understanding it is essential for parsing transactions correctly.
Execution Model:
- Start with empty stack
- Execute scriptSig (unlocking script) - pushes data onto stack
- Execute scriptPubKey (locking script) - manipulates stack
- Transaction is valid if stack is non-empty and top element is true
Common Opcodes:
| Opcode | Hex | Description |
|---|---|---|
| OP_0 | 0x00 | Push empty byte array (false) |
| OP_PUSHDATA1 | 0x4c | Next byte is length, then push that many bytes |
| OP_PUSHDATA2 | 0x4d | Next 2 bytes are length, then push that many bytes |
| OP_1 to OP_16 | 0x51-0x60 | Push 1-16 onto stack |
| OP_DUP | 0x76 | Duplicate top stack item |
| OP_HASH160 | 0xa9 | SHA256 then RIPEMD160 of top item |
| OP_EQUAL | 0x87 | Pop two items, push 1 if equal, else 0 |
| OP_EQUALVERIFY | 0x88 | OP_EQUAL then OP_VERIFY |
| OP_CHECKSIG | 0xac | Verify signature against pubkey |
| OP_CHECKMULTISIG | 0xae | M-of-N signature verification |
| OP_RETURN | 0x6a | Marks output as provably unspendable (data embedding) |
Data Push Opcodes (0x01 - 0x4b): When the opcode is between 0x01 and 0x4b (1-75), it means "push the next N bytes onto the stack."
Example: P2PKH ScriptPubKey
76 a9 14 89abcdef... 88 ac
^  ^  ^  ^           ^  ^
|  |  |  |           |  OP_CHECKSIG
|  |  |  |           OP_EQUALVERIFY
|  |  |  20 bytes (pubkey hash)
|  |  Push next 20 bytes
|  OP_HASH160
OP_DUP
SegWit: Segregated Witness
Segregated Witness (BIP 141/143/144) was one of Bitcoin's most significant upgrades. It separates ("segregates") the signature ("witness") data from the main transaction structure.
Why SegWit?
- Transaction Malleability Fix: Before SegWit, anyone could modify the scriptSig (specifically, the encoding of signatures) without invalidating the signature itself. This changed the TXID (which hashes the entire transaction including scriptSig), breaking chains of unconfirmed transactions.
- Block Space Efficiency: Witness data counts at 1/4 weight, effectively increasing block capacity.
- Enables Lightning Network: Fixed malleability allows off-chain payment channels to work safely.
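The "1/4 weight" rule comes from the BIP 141 weight formula: every byte of the stripped serialization counts four weight units and every marker, flag, or witness byte counts one. A small sketch, with hypothetical function names, assuming both serialized sizes are already known:

```rust
/// BIP 141 weight in weight units.
/// `stripped_size`: bytes of the serialization without marker, flag, and witness.
/// `total_size`: bytes of the full serialization (equal to `stripped_size` for legacy).
fn weight(stripped_size: u64, total_size: u64) -> u64 {
    stripped_size * 3 + total_size
}

/// Virtual size in vbytes: weight divided by 4, rounded up.
fn vsize(weight: u64) -> u64 {
    (weight + 3) / 4
}
```

For a legacy transaction the two sizes are equal, so the virtual size is just the byte size; for a SegWit transaction the witness bytes are discounted.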
SegWit Transaction Format:
+----------------+
| Version (4) |
+----------------+
| Marker (0x00) | <- SegWit marker
+----------------+
| Flag (0x01) | <- SegWit flag
+----------------+
| Input Count |
+----------------+
| Inputs [...] | <- scriptSig is empty for native SegWit
+----------------+
| Output Count |
+----------------+
| Outputs [...] |
+----------------+
| Witness [...] | <- Witness data for each input
+----------------+
| Locktime (4) |
+----------------+
Witness Data Structure:
For each input, witness data is:
+--------------------+
| Stack Item Count | - CompactSize: number of items
+--------------------+
| Item 1 Length | - CompactSize
+--------------------+
| Item 1 Data |
+--------------------+
| Item 2 Length |
+--------------------+
| Item 2 Data |
+--------------------+
| ... |
+--------------------+
For P2WPKH, witness typically contains:
- Signature (71-73 bytes, DER encoded + sighash type)
- Public key (33 bytes compressed)
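The witness layout above is a list of length-prefixed stacks, one per input. The sketch below reads them from a byte slice that starts at the first witness byte; the helper names and the nested `Vec` representation are assumptions made for this example.

```rust
use std::convert::TryFrom;

/// Decode a CompactSize; returns (value, bytes_consumed).
fn read_compact_size(bytes: &[u8]) -> Option<(u64, usize)> {
    let width = match *bytes.first()? {
        n @ 0..=0xFC => return Some((u64::from(n), 1)),
        0xFD => 2,
        0xFE => 4,
        _ => 8,
    };
    // Zero-extend the little-endian payload to 8 bytes.
    let mut buf = [0u8; 8];
    buf[..width].copy_from_slice(bytes.get(1..1 + width)?);
    Some((u64::from_le_bytes(buf), 1 + width))
}

/// Read one witness stack per input.
/// Returns the stacks and the number of bytes consumed, or None on truncated input.
fn read_witnesses(bytes: &[u8], input_count: usize) -> Option<(Vec<Vec<Vec<u8>>>, usize)> {
    let mut pos = 0usize;
    let mut stacks = Vec::new();
    for _ in 0..input_count {
        let (item_count, used) = read_compact_size(bytes.get(pos..)?)?;
        pos += used;
        let mut items = Vec::new();
        for _ in 0..item_count {
            let (len, used) = read_compact_size(bytes.get(pos..)?)?;
            pos += used;
            let end = pos.checked_add(usize::try_from(len).ok()?)?;
            items.push(bytes.get(pos..end)?.to_vec());
            pos = end;
        }
        stacks.push(items);
    }
    Some((stacks, pos))
}
```

An item count of zero yields an empty stack for that input, which is how inputs without witness data appear inside a SegWit transaction.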
TXID vs WTXID
TXID (Transaction ID):
- SHA256d of transaction without witness data
- For legacy transactions: hash of entire transaction
- For SegWit transactions: hash of version + inputs + outputs + locktime (no marker, flag, or witness)
WTXID (Witness Transaction ID):
- SHA256d of entire transaction including witness
- Used in witness commitment (stored in coinbase transaction)
Why Two IDs?
The TXID remains stable regardless of witness data, solving malleability. When you reference a previous output to spend, you use the TXID. The WTXID is used to commit to the complete transaction data in blocks.
Computing TXID for SegWit Transaction:
TXID = SHA256(SHA256(
version (4 bytes) ||
input_count (varint) ||
inputs (including scriptSig, which is empty for native SegWit inputs) ||
output_count (varint) ||
outputs ||
locktime (4 bytes)
))
Note: This is exactly the "stripped" transaction - what you'd get if you removed marker, flag, and witness.
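Producing the stripped serialization from raw bytes only requires skipping over the inputs and outputs to find where the witness begins. The following is a sketch under two assumptions: the locktime is the final four bytes, and the witness section itself is not validated. `txid_preimage` and the helpers are hypothetical names.

```rust
use std::convert::TryFrom;

fn read_compact_size(bytes: &[u8]) -> Option<(u64, usize)> {
    let width = match *bytes.first()? {
        n @ 0..=0xFC => return Some((u64::from(n), 1)),
        0xFD => 2,
        0xFE => 4,
        _ => 8,
    };
    let mut buf = [0u8; 8];
    buf[..width].copy_from_slice(bytes.get(1..1 + width)?);
    Some((u64::from_le_bytes(buf), 1 + width))
}

/// Advance `pos` by `n` bytes, failing if that runs past the end.
fn skip(bytes: &[u8], pos: &mut usize, n: usize) -> Option<()> {
    let end = pos.checked_add(n)?;
    if end > bytes.len() {
        return None;
    }
    *pos = end;
    Some(())
}

/// Skip a CompactSize length followed by that many bytes (a script).
fn skip_var_bytes(bytes: &[u8], pos: &mut usize) -> Option<()> {
    let (len, used) = read_compact_size(bytes.get(*pos..)?)?;
    skip(bytes, pos, used)?;
    skip(bytes, pos, usize::try_from(len).ok()?)
}

/// Bytes that are hashed for the TXID: version, inputs, outputs, locktime.
fn txid_preimage(raw: &[u8]) -> Option<Vec<u8>> {
    let is_segwit = raw.len() >= 6 && raw[4] == 0x00 && raw[5] == 0x01;
    if !is_segwit {
        return Some(raw.to_vec()); // legacy: hash the whole transaction
    }
    let body_start = 6; // after version, marker, flag
    let mut pos = body_start;
    let (n_in, used) = read_compact_size(raw.get(pos..)?)?;
    skip(raw, &mut pos, used)?;
    for _ in 0..n_in {
        skip(raw, &mut pos, 36)?; // previous txid + vout
        skip_var_bytes(raw, &mut pos)?; // scriptSig
        skip(raw, &mut pos, 4)?; // sequence
    }
    let (n_out, used) = read_compact_size(raw.get(pos..)?)?;
    skip(raw, &mut pos, used)?;
    for _ in 0..n_out {
        skip(raw, &mut pos, 8)?; // value
        skip_var_bytes(raw, &mut pos)?; // scriptPubKey
    }
    if raw.len() < pos + 4 {
        return None; // no room left for the locktime
    }
    let mut out = Vec::with_capacity(raw.len());
    out.extend_from_slice(&raw[..4]);
    out.extend_from_slice(&raw[body_start..pos]);
    out.extend_from_slice(&raw[raw.len() - 4..]);
    Some(out)
}
```

Hashing the returned bytes twice with SHA-256 gives the TXID in internal byte order; hashing the full `raw` bytes the same way gives the WTXID.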
DER Signature Encoding
Bitcoin signatures use DER (Distinguished Encoding Rules) format, a subset of ASN.1:
30 <total_length>
02 <r_length> <r_value>
02 <s_length> <s_value>
<sighash_type>
Structure:
- 0x30: SEQUENCE tag
- Total length: Length of everything after this byte (excluding sighash)
- 0x02: INTEGER tag for r
- r_length: Length of r (usually 32-33 bytes)
- r_value: The r value (may have leading 0x00 if high bit is set)
- 0x02: INTEGER tag for s
- s_length: Length of s (usually 32 bytes after low-S normalization)
- s_value: The s value
- sighash_type: 0x01 = SIGHASH_ALL (most common)
Example:
30 44 02 20 7a8f... 02 20 3b4c... 01
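A length-driven parser for this layout might look like the sketch below. It checks tags and lengths but does not enforce the full strict-DER rules of BIP 66 (for example, minimal integer encoding); `DerSig` and both function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct DerSig {
    r: Vec<u8>,
    s: Vec<u8>,
    sighash: u8,
}

/// Read one DER INTEGER (tag 0x02, one-byte length, value).
/// Returns the value bytes and the remaining input.
fn read_integer(bytes: &[u8]) -> Option<(&[u8], &[u8])> {
    if *bytes.first()? != 0x02 {
        return None;
    }
    let len = *bytes.get(1)? as usize;
    let value = bytes.get(2..2 + len)?;
    if value.is_empty() {
        return None;
    }
    Some((value, &bytes[2 + len..]))
}

/// Parse `30 <len> 02 <r_len> <r> 02 <s_len> <s> <sighash>`.
fn parse_der_sig(bytes: &[u8]) -> Option<DerSig> {
    // The sighash byte is appended after the DER structure.
    let (&sighash, der) = bytes.split_last()?;
    if *der.first()? != 0x30 {
        return None;
    }
    // The SEQUENCE length must cover exactly the rest of the DER body.
    if *der.get(1)? as usize != der.len() - 2 {
        return None;
    }
    let (r, rest) = read_integer(&der[2..])?;
    let (s, rest) = read_integer(rest)?;
    if !rest.is_empty() {
        return None;
    }
    Some(DerSig { r: r.to_vec(), s: s.to_vec(), sighash })
}
```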
Sighash Types
The sighash type (last byte of signature) determines which parts of the transaction the signature commits to:
| Type | Value | Description |
|---|---|---|
| SIGHASH_ALL | 0x01 | Signs all inputs and outputs (default) |
| SIGHASH_NONE | 0x02 | Signs inputs, but not outputs |
| SIGHASH_SINGLE | 0x03 | Signs inputs and only the output at same index |
| SIGHASH_ANYONECANPAY | 0x80 | Can be combined with above; only signs own input |
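Since SIGHASH_ANYONECANPAY is a flag bit combined with one of the three base types, decoding the byte is a mask and a match. This sketch accepts only the six standard combinations (0x01, 0x02, 0x03, 0x81, 0x82, 0x83) and returns the raw byte as an error otherwise; the names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum Base {
    All,
    None,
    Single,
}

/// Split a sighash byte into (base type, anyone_can_pay).
fn decode_sighash(byte: u8) -> Result<(Base, bool), u8> {
    let anyone_can_pay = (byte & 0x80) != 0;
    let base = match byte & 0x7f {
        0x01 => Base::All,
        0x02 => Base::None,
        0x03 => Base::Single,
        _ => return Err(byte),
    };
    Ok((base, anyone_can_pay))
}
```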
Complete Project Specification
Functional Requirements
- Raw Transaction Parsing
- Accept hex-encoded raw transactions
- Detect and parse both legacy and SegWit formats
- Extract all fields: version, inputs, outputs, locktime
- Parse witness data for SegWit transactions
- Script Analysis
- Decode scriptSig and scriptPubKey
- Identify script type (P2PKH, P2SH, P2WPKH, P2WSH, P2TR, OP_RETURN)
- Disassemble opcodes to human-readable format
- Extract embedded data (addresses, public keys, hashes)
- Signature Parsing
- Parse DER-encoded signatures
- Extract r and s values
- Identify sighash type
- Validate signature structure
- Transaction ID Computation
- Compute TXID (hash of the serialization without witness data)
- Compute WTXID (includes witness)
- Display in correct byte order (display vs internal)
- Human-Readable Output
- Pretty-print transaction structure
- Show amounts in BTC and satoshis
- Convert addresses to standard formats
- Calculate and display transaction fee (if inputs known)
Non-Functional Requirements
- Correctness: Parse all valid Bitcoin transactions correctly
- Robustness: Handle malformed transactions gracefully with clear errors
- Performance: Parse 1000 transactions per second
- Compatibility: Match output of bitcoin-cli decoderawtransaction
Command-Line Interface
# Parse a transaction from hex
$ btc-tx-parser 0100000001abcd...
Transaction ID: 7a3f...
Version: 1
Inputs: 1
[0] Previous TX: abcd1234...
Previous Vout: 0
ScriptSig: 47304402...
Sequence: 0xffffffff
Outputs: 2
[0] Value: 0.5 BTC (50000000 sat)
ScriptPubKey: OP_DUP OP_HASH160 89ab... OP_EQUALVERIFY OP_CHECKSIG
Type: P2PKH
Address: 1A1zP1...
[1] Value: 0.00009 BTC (9000 sat)
ScriptPubKey: OP_RETURN 48656c6c6f
Type: OP_RETURN
Data: "Hello"
Locktime: 0
# Parse from file
$ btc-tx-parser --file transaction.hex
# Output as JSON
$ btc-tx-parser --json 0100000001abcd...
# Verbose mode (show raw bytes alongside parsed)
$ btc-tx-parser --verbose 0100000001abcd...
# Compute TXID/WTXID only
$ btc-tx-parser --txid 0100000001abcd...
TXID: 7a3f5c...
WTXID: 8b4e6d...
# Disassemble script only
$ btc-tx-parser --script "76a91489abcdef...88ac"
OP_DUP OP_HASH160 89abcdef... OP_EQUALVERIFY OP_CHECKSIG
Solution Architecture
Module Structure
src/
+-- main.rs # CLI entry point
+-- lib.rs # Public API
+-- parser/
| +-- mod.rs # Transaction parser coordinator
| +-- varint.rs # CompactSize integer parsing
| +-- input.rs # Transaction input parsing
| +-- output.rs # Transaction output parsing
| +-- witness.rs # Witness data parsing
+-- script/
| +-- mod.rs # Script disassembler
| +-- opcodes.rs # Opcode definitions
| +-- types.rs # Script type detection
| +-- address.rs # Address extraction/encoding
+-- signature/
| +-- mod.rs # Signature handling
| +-- der.rs # DER encoding parser
| +-- sighash.rs # Sighash type definitions
+-- hash/
| +-- mod.rs # Hashing utilities
| +-- txid.rs # TXID/WTXID computation
+-- display/
| +-- mod.rs # Output formatting
| +-- json.rs # JSON serialization
| +-- pretty.rs # Human-readable output
+-- tests/
+-- vectors.rs # Known transaction test vectors
+-- segwit_tests.rs # SegWit-specific tests
+-- script_tests.rs # Script parsing tests
Core Data Structures
/// A parsed Bitcoin transaction
pub struct Transaction {
pub version: u32,
pub inputs: Vec<TxInput>,
pub outputs: Vec<TxOutput>,
pub witness: Option<Vec<Witness>>,
pub locktime: u32,
}
/// A transaction input
pub struct TxInput {
pub prev_txid: [u8; 32],
pub prev_vout: u32,
pub script_sig: Script,
pub sequence: u32,
}
/// A transaction output
pub struct TxOutput {
pub value: u64, // in satoshis
pub script_pubkey: Script,
}
/// Witness data for one input
pub struct Witness {
pub items: Vec<Vec<u8>>,
}
/// A Bitcoin script (raw bytes + disassembly)
pub struct Script {
pub raw: Vec<u8>,
pub ops: Vec<ScriptOp>,
pub script_type: ScriptType,
}
/// A single script operation
pub enum ScriptOp {
/// Push data onto stack
Push(Vec<u8>),
/// Named opcode
Op(Opcode),
}
/// Standard script types
pub enum ScriptType {
P2PKH,
P2SH,
P2WPKH,
P2WSH,
P2TR,
OpReturn(Vec<u8>),
Multisig { m: u8, n: u8 },
NonStandard,
}
/// A parsed DER signature
pub struct Signature {
pub r: Vec<u8>,
pub s: Vec<u8>,
pub sighash: SighashType,
}
/// Sighash types
pub enum SighashType {
All,
None,
Single,
AllAnyoneCanPay,
NoneAnyoneCanPay,
SingleAnyoneCanPay,
}
Key Algorithms
CompactSize Parsing
function read_compact_size(bytes, offset) -> (value, new_offset):
first_byte = bytes[offset]
if first_byte < 0xFD:
return (first_byte, offset + 1)
else if first_byte == 0xFD:
value = read_u16_le(bytes, offset + 1)
return (value, offset + 3)
else if first_byte == 0xFE:
value = read_u32_le(bytes, offset + 1)
return (value, offset + 5)
else: // 0xFF
value = read_u64_le(bytes, offset + 1)
return (value, offset + 9)
SegWit Detection
function is_segwit(bytes) -> bool:
// SegWit transactions have marker=0x00 and flag=0x01 after version
if len(bytes) < 6:
return false
// Read version (4 bytes)
// Check bytes 4 and 5
marker = bytes[4]
flag = bytes[5]
return marker == 0x00 and flag == 0x01
TXID Computation
function compute_txid(tx: Transaction) -> [u8; 32]:
    // Serialize without witness data
    serialized = serialize_no_witness(tx)
    // Double SHA256
    hash1 = sha256(serialized)
    hash2 = sha256(hash1)
    // Returned in internal byte order; reverse the bytes only for display
    return hash2
Script Disassembly
function disassemble_script(script: bytes) -> Vec<ScriptOp>:
    // A real parser must also check that every slice stays inside the script
    ops = []
    i = 0
    while i < len(script):
        opcode = script[i]
        if opcode == 0x00:
            ops.push(Op(OP_0))
            i += 1
        else if opcode >= 0x01 and opcode <= 0x4b:
            // Direct push: the opcode itself is the length
            length = opcode
            data = script[i+1 : i+1+length]
            ops.push(Push(data))
            i += 1 + length
        else if opcode == 0x4c: // OP_PUSHDATA1
            length = script[i+1]
            data = script[i+2 : i+2+length]
            ops.push(Push(data))
            i += 2 + length
        else if opcode == 0x4d: // OP_PUSHDATA2
            length = read_u16_le(script, i+1)
            data = script[i+3 : i+3+length]
            ops.push(Push(data))
            i += 3 + length
        else if opcode == 0x4e: // OP_PUSHDATA4
            length = read_u32_le(script, i+1)
            data = script[i+5 : i+5+length]
            ops.push(Push(data))
            i += 5 + length
        else:
            ops.push(Op(OPCODE_MAP[opcode]))
            i += 1
    return ops
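The disassembly pseudocode can be turned into Rust along these lines. It is a sketch that returns `None` when a push runs past the end of the script, names only a handful of opcodes, and uses a simplified `ScriptOp` that stores the raw opcode byte; all of these are choices made for this example.

```rust
#[derive(Debug, PartialEq)]
enum ScriptOp {
    Push(Vec<u8>),
    Op(u8),
}

fn disassemble(script: &[u8]) -> Option<Vec<ScriptOp>> {
    let mut ops = Vec::new();
    let mut i = 0usize;
    while i < script.len() {
        let opcode = script[i];
        i += 1;
        let len = match opcode {
            // Direct push: the opcode is the length.
            0x01..=0x4b => opcode as usize,
            // OP_PUSHDATA1/2/4: the next 1, 2, or 4 bytes hold the length.
            0x4c..=0x4e => {
                let width = 1usize << (opcode - 0x4c);
                let mut buf = [0u8; 8];
                buf[..width].copy_from_slice(script.get(i..i + width)?);
                i += width;
                u64::from_le_bytes(buf) as usize
            }
            _ => {
                ops.push(ScriptOp::Op(opcode));
                continue;
            }
        };
        let end = i.checked_add(len)?;
        ops.push(ScriptOp::Push(script.get(i..end)?.to_vec()));
        i = end;
    }
    Some(ops)
}

fn opcode_name(op: u8) -> String {
    match op {
        0x00 => "OP_0".to_string(),
        0x51..=0x60 => format!("OP_{}", op - 0x50),
        0x6a => "OP_RETURN".to_string(),
        0x76 => "OP_DUP".to_string(),
        0x87 => "OP_EQUAL".to_string(),
        0x88 => "OP_EQUALVERIFY".to_string(),
        0xa9 => "OP_HASH160".to_string(),
        0xac => "OP_CHECKSIG".to_string(),
        0xae => "OP_CHECKMULTISIG".to_string(),
        other => format!("OP_UNKNOWN_0x{:02x}", other),
    }
}

/// Render ops as a space-separated assembly string, pushes as hex.
fn to_asm(ops: &[ScriptOp]) -> String {
    ops.iter()
        .map(|op| match op {
            ScriptOp::Push(data) => data.iter().map(|b| format!("{:02x}", b)).collect::<String>(),
            ScriptOp::Op(code) => opcode_name(*code),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```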
Phased Implementation Guide
Phase 1: Hex Parsing and Basic Structure
Goal: Read hex-encoded transactions and extract the version and locktime.
Tasks:
- Create hex-to-bytes conversion function
- Parse version (first 4 bytes, little-endian)
- Parse locktime (last 4 bytes, little-endian)
- Detect SegWit marker/flag
Validation:
let tx = parse_transaction("01000000...00000000");
assert_eq!(tx.version, 1);
assert_eq!(tx.locktime, 0);
Hints if stuck:
- Version is little-endian: bytes 0x01 0x00 0x00 0x00 = version 1
- Locktime is always the last 4 bytes
- For SegWit detection, check if bytes[4..6] == [0x00, 0x01]
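A possible starting point for this phase is sketched below: a strict hex decoder plus a helper that reads the two fixed-position fields. Both function names are placeholders, and the 8-byte minimum is only there to keep the two fields from overlapping.

```rust
use std::convert::TryInto;

/// Decode a hex string; rejects odd lengths and non-hex characters.
fn hex_to_bytes(hex: &str) -> Option<Vec<u8>> {
    if hex.len() % 2 != 0 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).ok())
        .collect()
}

/// Version is the first 4 bytes, locktime the last 4, both little-endian.
fn version_and_locktime(raw: &[u8]) -> Option<(u32, u32)> {
    if raw.len() < 8 {
        return None;
    }
    let version = u32::from_le_bytes(raw[..4].try_into().ok()?);
    let locktime = u32::from_le_bytes(raw[raw.len() - 4..].try_into().ok()?);
    Some((version, locktime))
}
```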
Phase 2: CompactSize Integers
Goal: Implement variable-length integer parsing.
Tasks:
- Implement read_compact_size function
- Handle all four encoding cases (1, 3, 5, 9 bytes)
- Return both the value and bytes consumed
- Add comprehensive tests
Validation:
assert_eq!(read_compact_size(&[0x05]), (5, 1));
assert_eq!(read_compact_size(&[0xFD, 0x00, 0x01]), (256, 3));
assert_eq!(read_compact_size(&[0xFE, 0x00, 0x00, 0x01, 0x00]), (65536, 5));
Hints if stuck:
- First check if first byte < 0xFD for single-byte case
- Remember: values are little-endian
- Return (value, number_of_bytes_consumed) for cursor tracking
Phase 3: Input Parsing
Goal: Parse transaction inputs.
Tasks:
- Read input count using CompactSize
- For each input:
- Read previous TXID (32 bytes, reversed for display)
- Read previous vout (4 bytes, little-endian)
- Read scriptSig length and data
- Read sequence (4 bytes, little-endian)
- Store in TxInput struct
Validation:
// Parse a known transaction and verify input fields
let tx = parse_transaction("01000000017b...");
assert_eq!(tx.inputs.len(), 1);
assert_eq!(tx.inputs[0].prev_vout, 0);
Hints if stuck:
- TXID bytes are stored in "internal" order but displayed reversed
- ScriptSig can be empty (especially for SegWit inputs)
- Use a cursor/offset pattern to track position in byte stream
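The cursor pattern from the last hint could look like the following sketch. `Cursor`, its methods, and the simplified `TxInput` (which stores the scriptSig as raw bytes) are all assumptions for illustration; every read is bounds-checked and returns `None` on truncated input.

```rust
use std::convert::{TryFrom, TryInto};

struct Cursor<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    /// Return the next `n` bytes and advance, or None if too few remain.
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.bytes.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }

    fn u32_le(&mut self) -> Option<u32> {
        Some(u32::from_le_bytes(self.take(4)?.try_into().ok()?))
    }

    fn compact_size(&mut self) -> Option<u64> {
        let first = self.take(1)?[0];
        let width = match first {
            0..=0xFC => return Some(u64::from(first)),
            0xFD => 2,
            0xFE => 4,
            _ => 8,
        };
        let mut buf = [0u8; 8];
        buf[..width].copy_from_slice(self.take(width)?);
        Some(u64::from_le_bytes(buf))
    }
}

#[derive(Debug, PartialEq)]
struct TxInput {
    prev_txid: [u8; 32], // internal byte order
    prev_vout: u32,
    script_sig: Vec<u8>,
    sequence: u32,
}

fn read_input(c: &mut Cursor<'_>) -> Option<TxInput> {
    let prev_txid: [u8; 32] = c.take(32)?.try_into().ok()?;
    let prev_vout = c.u32_le()?;
    let script_len = usize::try_from(c.compact_size()?).ok()?;
    let script_sig = c.take(script_len)?.to_vec();
    let sequence = c.u32_le()?;
    Some(TxInput { prev_txid, prev_vout, script_sig, sequence })
}

/// Hex string in display order (reversed from internal order).
fn display_txid(internal: &[u8; 32]) -> String {
    internal.iter().rev().map(|b| format!("{:02x}", b)).collect()
}
```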
Phase 4: Output Parsing
Goal: Parse transaction outputs.
Tasks:
- Read output count using CompactSize
- For each output:
- Read value (8 bytes, little-endian)
- Read scriptPubKey length and data
- Display value in both satoshis and BTC
Validation:
let tx = parse_transaction("...");
assert_eq!(tx.outputs.len(), 2);
assert_eq!(tx.outputs[0].value, 50_000_000); // 0.5 BTC
Hints if stuck:
- Value is u64, not u32 (8 bytes)
- 1 BTC = 100,000,000 satoshis
- Max value is ~21 million BTC = 2.1 quadrillion satoshis
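Amounts can be formatted with integer arithmetic alone, which avoids floating-point rounding. A small sketch with hypothetical names; it always prints eight decimal places rather than trimming trailing zeros.

```rust
const SATS_PER_BTC: u64 = 100_000_000;
/// 21 million BTC expressed in satoshis.
const MAX_MONEY: u64 = 21_000_000 * SATS_PER_BTC;

/// Render an amount as "<btc> BTC (<sats> sat)" with eight decimal places.
fn format_amount(sats: u64) -> String {
    format!(
        "{}.{:08} BTC ({} sat)",
        sats / SATS_PER_BTC,
        sats % SATS_PER_BTC,
        sats
    )
}

/// True if the value does not exceed the total supply cap.
fn is_plausible_amount(sats: u64) -> bool {
    sats <= MAX_MONEY
}
```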
Phase 5: Script Disassembly
Goal: Decode Bitcoin Script to human-readable format.
Tasks:
- Define opcode constants (OP_DUP, OP_HASH160, etc.)
- Implement direct push opcodes (0x01-0x4b)
- Implement OP_PUSHDATA1 and OP_PUSHDATA2
- Handle remaining opcodes
- Format output as assembly-style string
Validation:
let script = hex::decode("76a91489abcdef...88ac")?;
let disasm = disassemble(&script);
assert_eq!(disasm, "OP_DUP OP_HASH160 <20-bytes> OP_EQUALVERIFY OP_CHECKSIG");
Hints if stuck:
- Opcodes 0x01-0x4b are "push next N bytes"
- Use a match/switch on opcode value
- Format pushed data as hex for now, later convert to addresses
Phase 6: Script Type Detection
Goal: Identify standard script patterns.
Tasks:
- Detect P2PKH: OP_DUP OP_HASH160 <20> OP_EQUALVERIFY OP_CHECKSIG
- Detect P2SH: OP_HASH160 <20> OP_EQUAL
- Detect P2WPKH: OP_0 <20>
- Detect P2WSH: OP_0 <32>
- Detect P2TR: OP_1 <32>
- Detect OP_RETURN: OP_RETURN <data>
- Extract pubkey hash or script hash
Validation:
let script = hex::decode("0014a91b...1234")?;
assert_eq!(detect_type(&script), ScriptType::P2WPKH);
Hints if stuck:
- Check script length and byte patterns
- P2WPKH is exactly 22 bytes: 0x00 0x14 + 20 bytes
- P2WSH is exactly 34 bytes: 0x00 0x20 + 32 bytes
- OP_RETURN starts with 0x6a
Phase 7: Witness Data Parsing
Goal: Parse SegWit witness data.
Tasks:
- Skip to witness section after outputs
- Read witness data for each input:
- Read stack item count (CompactSize)
- For each item: read length and data
- Link witness to corresponding input
Validation:
let tx = parse_segwit_transaction("02000000000101...");
assert!(tx.witness.is_some());
assert_eq!(tx.witness.unwrap()[0].items.len(), 2);
Hints if stuck:
- Witness comes after all outputs, before locktime
- Each input has its own witness (even if empty)
- Non-witness inputs in a SegWit transaction have an empty witness stack: a single 0x00 item count and no items
Phase 8: Signature Parsing (DER)
Goal: Extract signature components from DER encoding.
Tasks:
- Parse DER sequence tag (0x30)
- Parse r value (integer tag 0x02 + length + data)
- Parse s value (integer tag 0x02 + length + data)
- Extract sighash type (last byte)
- Handle leading zeros and high-bit padding
Validation:
let sig = hex::decode("304402207a8f...01")?;
let parsed = parse_signature(&sig)?;
assert_eq!(parsed.sighash, SighashType::All);
assert_eq!(parsed.r.len(), 32);
Hints if stuck:
- Leading 0x00 byte on r/s if high bit is set (prevents negative interpretation)
- s should be "low-S" normalized (at most n/2, where n is the curve order)
- Sighash byte is appended after DER structure
Phase 9: TXID Computation
Goal: Compute transaction IDs correctly.
Tasks:
- Implement SHA256 (or use sha2 crate)
- Implement double-SHA256 (SHA256d)
- For legacy: hash entire transaction
- For SegWit: hash stripped transaction (no witness)
- Implement WTXID (hash with witness)
- Handle byte order for display
Validation:
// Compare against known TXID
let tx = parse_transaction("01000000017b1eabe...");
let mut txid = compute_txid(&tx);
txid.reverse(); // internal byte order -> display order
assert_eq!(hex::encode(txid), "a1b2c3d4...");
Hints if stuck:
- TXID is displayed in reversed byte order (legacy display convention)
- For SegWit stripped transaction: version + inputs + outputs + locktime (no marker/flag/witness)
- Use sha2 crate for production; implement from scratch for learning
Phase 10: Address Encoding
Goal: Convert script hashes to standard address formats.
Tasks:
- Implement Base58Check for legacy addresses (P2PKH, P2SH)
- Implement Bech32 for SegWit addresses (P2WPKH, P2WSH)
- Implement Bech32m for Taproot addresses (P2TR)
- Handle mainnet vs testnet prefixes
Validation:
// P2PKH
assert_eq!(
pubkey_hash_to_address(&hash, Network::Mainnet),
"1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2"
);
// P2WPKH
assert_eq!(
witness_pubkey_to_address(&hash, Network::Mainnet),
"bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"
);
Hints if stuck:
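- Base58Check = Base58(version prefix + payload + checksum); checksum = first 4 bytes of SHA256d(version prefix + payload)
- Bech32 regroups the data into 5-bit values, maps them to a 32-character alphabet, and appends a 6-character checksum; Bech32m (BIP 350) differs only in the checksum constant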
- Prefix: mainnet P2PKH=0x00, P2SH=0x05; testnet P2PKH=0x6f, P2SH=0xc4
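Bech32 needs no hash function, so it can be sketched with the standard library alone. The code below follows the checksum algorithm and constants published in BIP 173 and BIP 350; the function names are chosen for this example, and it only encodes (no decoding or input validation).

```rust
const CHARSET: &[u8; 32] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";
const GENERATORS: [u32; 5] = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3];

/// BCH checksum state over a sequence of 5-bit values.
fn polymod(values: &[u8]) -> u32 {
    let mut chk: u32 = 1;
    for &v in values {
        let top = chk >> 25;
        chk = ((chk & 0x01ff_ffff) << 5) ^ u32::from(v);
        for (i, g) in GENERATORS.iter().enumerate() {
            if (top >> i) & 1 == 1 {
                chk ^= *g;
            }
        }
    }
    chk
}

/// Regroup 8-bit bytes into 5-bit values, zero-padding the last group.
fn to_5bit(data: &[u8]) -> Vec<u8> {
    let (mut acc, mut bits, mut out) = (0u32, 0u32, Vec::new());
    for &b in data {
        acc = ((acc << 8) | u32::from(b)) & 0x0fff;
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(((acc >> bits) & 31) as u8);
        }
    }
    if bits > 0 {
        out.push(((acc << (5 - bits)) & 31) as u8);
    }
    out
}

/// Encode a SegWit address: Bech32 for witness version 0, Bech32m otherwise.
fn segwit_address(hrp: &str, witness_version: u8, program: &[u8]) -> String {
    let mut data = vec![witness_version];
    data.extend(to_5bit(program));
    // Checksum input: expanded HRP, data, and six zero placeholders.
    let mut values: Vec<u8> = hrp.bytes().map(|c| c >> 5).collect();
    values.push(0);
    values.extend(hrp.bytes().map(|c| c & 31));
    values.extend_from_slice(&data);
    values.extend_from_slice(&[0u8; 6]);
    let constant: u32 = if witness_version == 0 { 1 } else { 0x2bc8_30a3 };
    let pm = polymod(&values) ^ constant;
    data.extend((0..6u32).map(|i| ((pm >> (5 * (5 - i))) & 31) as u8));
    let body: String = data.iter().map(|&d| CHARSET[d as usize] as char).collect();
    format!("{}1{}", hrp, body)
}
```

Passing `"bc"` as the human-readable part gives mainnet addresses and `"tb"` gives testnet addresses.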
Phase 11: Pretty Printing and JSON Output
Goal: Create user-friendly output formats.
Tasks:
- Format transaction as hierarchical text
- Implement JSON serialization
- Add verbose mode showing raw bytes alongside parsed values
- Calculate fee (if previous outputs are known)
Validation:
$ btc-tx-parser --json <tx>
{
"txid": "abc123...",
"version": 2,
"inputs": [...],
...
}
Hints if stuck:
- Use serde for JSON serialization
- Show both hex and interpreted values in verbose mode
- Fee = sum(inputs) - sum(outputs); requires looking up previous transactions
Testing Strategy
Unit Tests
#[test]
fn test_compact_size_single_byte() {
assert_eq!(read_compact_size(&[0x00]), (0, 1));
assert_eq!(read_compact_size(&[0xFC]), (252, 1));
}
#[test]
fn test_compact_size_three_byte() {
assert_eq!(read_compact_size(&[0xFD, 0xFD, 0x00]), (253, 3));
assert_eq!(read_compact_size(&[0xFD, 0xFF, 0xFF]), (65535, 3));
}
#[test]
fn test_script_type_detection() {
// P2PKH
let p2pkh = hex::decode("76a91489abcdefabbaabbaabbaabbaabbaabbaabbaabba88ac").unwrap();
assert_eq!(detect_script_type(&p2pkh), ScriptType::P2PKH);
// P2WPKH
let p2wpkh = hex::decode("0014751e76e8199196d454941c45d1b3a323f1433bd6").unwrap();
assert_eq!(detect_script_type(&p2wpkh), ScriptType::P2WPKH);
}
Known Transaction Test Vectors
#[test]
fn test_genesis_coinbase() {
// Satoshi's genesis block coinbase
let hex = "01000000010000000000000000000000000000000000000000000000000000000000000000ffffffff4d04ffff001d0104455468652054696d65732030332f4a616e2f32303039204368616e63656c6c6f72206f6e206272696e6b206f66207365636f6e64206261696c6f757420666f722062616e6b73ffffffff0100f2052a01000000434104678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5fac00000000";
let tx = parse_transaction(hex).unwrap();
assert_eq!(tx.version, 1);
assert_eq!(tx.inputs.len(), 1);
assert_eq!(tx.outputs.len(), 1);
assert_eq!(tx.outputs[0].value, 5_000_000_000); // 50 BTC
}
#[test]
fn test_segwit_transaction() {
// Known SegWit transaction
let hex = "02000000000101...";
let tx = parse_transaction(hex).unwrap();
assert!(tx.witness.is_some());
// Verify TXID and WTXID
}
#[test]
fn test_multisig_transaction() {
// 2-of-3 multisig
let hex = "0100000001...";
let tx = parse_transaction(hex).unwrap();
// Check multisig script detection
}
Compatibility Tests
#[test]
fn test_bitcoin_core_compatibility() {
// Parse transactions and compare output to bitcoin-cli decoderawtransaction
let transactions = load_test_vectors("testdata/transactions.json");
for (hex, expected) in transactions {
let parsed = parse_transaction(&hex).unwrap();
let json = serde_json::to_value(&parsed).unwrap();
assert_json_eq!(json, expected);
}
}
Fuzz Testing
#[test]
fn fuzz_transaction_parser() {
// Ensure parser doesn't crash on arbitrary input
for _ in 0..10000 {
let random_bytes = generate_random_bytes(1000);
let _ = parse_transaction(&hex::encode(random_bytes));
// Should not panic, may return error
}
}
Common Pitfalls and Debugging
Pitfall 1: Byte Order Confusion
Problem: Bitcoin uses little-endian for most values, but TXIDs are displayed in reversed order.
Symptom: TXID doesn't match block explorer.
Solution:
// Internal byte order (as stored in transaction): 0x01, 0x02, ..., 0x20
let internal_txid: [u8; 32] = std::array::from_fn(|i| i as u8 + 1);
// Display order (reversed for human display)
let display_txid: Vec<u8> = internal_txid.iter().rev().copied().collect();
println!("TXID: {}", hex::encode(display_txid));
Pitfall 2: Off-by-One in Script Parsing
Problem: Mishandling direct push opcodes (0x01-0x4b).
Symptom: Script parsing fails or consumes wrong number of bytes.
Solution:
// The opcode itself IS the length, not length + 1
if opcode >= 0x01 && opcode <= 0x4b {
let length = opcode as usize; // Not opcode + 1
let data = &script[pos + 1..pos + 1 + length];
pos += 1 + length;
}
Pitfall 3: SegWit Detection False Positives
Problem: In a legacy transaction, byte 4 is the start of the input count, so a 0x00 there looks the same as the SegWit marker.
Symptom: The parser takes the wrong branch and misreads the input count.
Solution:
fn is_segwit(bytes: &[u8]) -> bool {
    // A valid transaction has at least one input, so a legacy transaction
    // never has 0x00 at byte 4. Marker 0x00 followed by flag 0x01 therefore
    // identifies the SegWit serialization unambiguously.
    bytes.len() >= 6 && bytes[4] == 0x00 && bytes[5] == 0x01
}
Pitfall 4: Empty Witness vs No Witness
Problem: Confusing SegWit transactions with empty witness data vs legacy transactions.
Symptom: Parser crashes or misinterprets witness.
Solution:
// Even native SegWit must have witness data (signature + pubkey)
// Empty witness [] is different from missing witness
if is_segwit {
    for _ in 0..num_inputs {
        let (stack_items, used) = read_compact_size(&bytes[pos..])?;
        pos += used;
        // stack_items == 0 means empty witness for this input
        // (happens for non-SegWit inputs in mixed transactions)
        // Otherwise read `stack_items` length-prefixed items here
    }
}
Pitfall 5: DER Signature Length Variability
Problem: Signatures can be 70-73 bytes due to leading zeros.
Symptom: Fixed-size parsing fails.
Solution:
fn parse_der_signature(bytes: &[u8]) -> Result<Signature> {
// Read lengths from structure, don't assume fixed size
let total_len = bytes[1] as usize;
let r_len = bytes[3] as usize;
let r = &bytes[4..4 + r_len];
let s_offset = 4 + r_len;
let s_len = bytes[s_offset + 1] as usize;
let s = &bytes[s_offset + 2..s_offset + 2 + s_len];
// ...
}
Pitfall 6: CompactSize Maximum Value
Problem: Not handling 8-byte (0xFF prefix) CompactSize values.
Symptom: Parser fails on transactions with very long scripts.
Solution:
fn read_compact_size(bytes: &[u8]) -> Result<(u64, usize)> {
match bytes[0] {
0x00..=0xFC => Ok((bytes[0] as u64, 1)),
0xFD => Ok((read_u16_le(&bytes[1..3]) as u64, 3)),
0xFE => Ok((read_u32_le(&bytes[1..5]) as u64, 5)),
0xFF => Ok((read_u64_le(&bytes[1..9]), 9)), // Don't forget this case!
}
}
Extensions and Challenges
Challenge 1: Transaction Builder
Build the reverse: construct valid raw transactions programmatically. Create inputs, outputs, and serialize to hex.
Challenge 2: Signature Verification
Implement ECDSA signature verification (using your implementation from P02). Verify that signatures in parsed transactions are valid.
Challenge 3: Fee Estimation
Build a fee estimator that:
- Calculates virtual bytes (vbytes) for weight calculation
- Estimates fee based on transaction size
- Suggests optimal fee for current mempool conditions
Challenge 4: Script Interpreter
Implement a Bitcoin Script interpreter that actually executes scripts. Verify that scriptSig + scriptPubKey evaluates to true for each input.
Challenge 5: PSBT Parser
Extend your parser to handle Partially Signed Bitcoin Transactions (BIP 174). These are used for multi-party signing workflows.
Challenge 6: Taproot Script Paths
Parse and display Taproot (P2TR) transactions including:
- Key path spends
- Script path spends with Merkle proofs
- Control blocks and tapleaf scripts
Real-World Connections
Blockchain Analysis
Chain analysis companies (Chainalysis, Elliptic) parse every Bitcoin transaction to track fund flows, identify patterns, and detect illicit activity. Your parser is the first step in building such tools.
Wallet Development
Every Bitcoin wallet must parse transactions to:
- Display transaction history
- Calculate balances
- Construct new transactions
- Verify incoming payments
Node Implementation
Bitcoin Core's parsing logic is critical infrastructure. Every transaction that propagates across the network is parsed thousands of times. Bugs in parsing code have led to network-wide issues in the past.
Educational Tools
Block explorers like Blockstream.info, mempool.space, and Blockchain.com all parse transactions to display human-readable information. Your tool could become the core of a personal block explorer.
Security Research
Understanding transaction format is essential for:
- Analyzing malformed transaction attacks
- Identifying signature grinding attacks
- Detecting unusual script patterns
- Forensic analysis of theft/hacks
Resources
Primary References
- "Mastering Bitcoin" Chapters 5-6 - Detailed transaction and script explanation
- Bitcoin Developer Reference: Transactions
- BIP 141 (SegWit): Segregated Witness
- BIP 143: Transaction Signature Verification for SegWit
- BIP 341 (Taproot): Taproot: SegWit version 1 spending rules
Code References
- Bitcoin Core: primitives/transaction.h
- rust-bitcoin: Transaction struct
- btcd (Go): wire/msgtx.go
Tools
- bitcoin-cli decoderawtransaction: Compare your output
- mempool.space: Visualize transaction structure
- learnmeabitcoin.com: Interactive transaction explorer
- Bitcoin Script IDE: ide.bitauth.com
Test Data
- Bitcoin Testnet: Use testnet transactions for safe testing
- Regtest: Create your own transactions for controlled testing
- Mainnet Samples: Famous transactions (pizza transaction, Satoshi's transfers)
Self-Assessment Checklist
Before moving to the next project, verify:
- I can parse both legacy and SegWit transactions correctly
- I understand why CompactSize encoding saves space
- I can disassemble any Bitcoin Script to opcodes
- I can identify the script type from a scriptPubKey
- I understand the difference between TXID and WTXID
- I can explain why SegWit fixes transaction malleability
- I can parse DER-encoded signatures and extract r, s values
- I can compute the correct TXID for any transaction
- I can convert pubkey hashes to addresses (Base58Check and Bech32)
- My parser matches bitcoin-cli output for test transactions
What's Next?
With transaction parsing mastered, you now understand how Bitcoin encodes and validates transfers at the byte level. In Project 8: Simple EVM Implementation, you'll move from Bitcoin to Ethereum and build a minimal Ethereum Virtual Machine, learning how smart contracts execute bytecode to transform blockchain state.