P01: Cryptographic Hash Function Visualizer

P01: Cryptographic Hash Function Visualizer

Project Overview

Attribute Value
Main Language Rust
Alternative Languages C, Go, Python
Difficulty Intermediate
Coolness Level Level 3: Genuinely Clever
Business Potential Resume Gold (Educational/Personal Brand)
Knowledge Area Cryptography / Hash Functions
Main Book “Serious Cryptography, 2nd Edition” by Jean-Philippe Aumasson

Learning Objectives

By completing this project, you will:

  1. Understand cryptographic hash function properties at a mathematical level—preimage resistance, second preimage resistance, and collision resistance
  2. Master bit manipulation operations including rotations, shifts, and XOR in the context of cryptographic primitives
  3. Implement the SHA-256 algorithm from the official NIST specification without using any cryptographic libraries
  4. Visualize the avalanche effect to develop intuition about why hash functions are one-way
  5. Connect hashing to blockchain understanding why hash functions are the foundation of Bitcoin, Merkle trees, and proof-of-work

Deep Theoretical Foundation

Why Hash Functions Matter in Web3

Every transaction you’ve ever seen on a blockchain, every block header, every Merkle tree, every proof-of-work puzzle—they all depend on cryptographic hash functions. When you hear someone say “the hash of this block is 0x1a2b3c…”, they’re referring to the output of a function that takes arbitrary data and produces a fixed-size “fingerprint.”

But what makes a hash function cryptographic? Why can’t you reverse it? And why does changing a single bit in the input completely scramble the output?

The Three Properties of Cryptographic Hash Functions

A cryptographic hash function H must satisfy three properties:

1. Preimage Resistance (One-Way Property)

Given a hash output h, it should be computationally infeasible to find any input m such that H(m) = h.

Why this matters for Web3: When you see a Bitcoin address, it’s actually a hash of a public key. Even though the address is public, you can’t derive the public key from it (without additional information). This provides a layer of privacy and security.

Quantitative meaning: For SHA-256, finding a preimage requires approximately 2^256 operations—more than the number of atoms in the observable universe.

2. Second Preimage Resistance

Given an input m1, it should be computationally infeasible to find a different input m2 (where m2 ≠ m1) such that H(m1) = H(m2).

Why this matters for Web3: If I sign a transaction to send 1 ETH to Alice, an attacker shouldn’t be able to find a different message that hashes to the same value. Otherwise, they could substitute my signed transaction with a malicious one.

3. Collision Resistance

It should be computationally infeasible to find any two different inputs m1 and m2 such that H(m1) = H(m2).

Why this matters for Web3: Merkle trees depend on collision resistance. If an attacker could find collisions, they could potentially substitute transactions in a block without changing the Merkle root.

Note: Due to the birthday paradox, collision resistance is weaker than preimage resistance. For a 256-bit hash, finding a collision requires approximately 2^128 operations.

The SHA-256 Algorithm Deep Dive

SHA-256 (Secure Hash Algorithm, 256-bit) was designed by the NSA and published by NIST in 2001. It’s part of the SHA-2 family and is the hash function used in Bitcoin.

High-Level Structure

Input Message → Padding → Block Processing → Final Hash
                              ↓
                    [For each 512-bit block]
                              ↓
                    Message Schedule (W)
                              ↓
                    64 Compression Rounds
                              ↓
                    Update Hash State (H)

Step 1: Message Padding

The input message must be padded to a multiple of 512 bits:

  1. Append a single ‘1’ bit
  2. Append ‘0’ bits until the message is 448 mod 512 bits
  3. Append the original message length as a 64-bit big-endian integer

Example:

Message: "Hello" (5 bytes = 40 bits)

Binary: 01001000 01100101 01101100 01101100 01101111

Padded:
01001000 01100101 01101100 01101100 01101111  ← "Hello"
10000000 00000000 00000000 00000000 ...       ← '1' bit + zeros
00000000 00000000 00000000 00101000           ← Length = 40 (0x28)

Step 2: Initial Hash Values

SHA-256 starts with 8 initial hash values (H0-H7), which are the first 32 bits of the fractional parts of the square roots of the first 8 primes:

H0 = 0x6a09e667  (from √2)
H1 = 0xbb67ae85  (from √3)
H2 = 0x3c6ef372  (from √5)
H3 = 0xa54ff53a  (from √7)
H4 = 0x510e527f  (from √11)
H5 = 0x9b05688c  (from √13)
H6 = 0x1f83d9ab  (from √17)
H7 = 0x5be0cd19  (from √19)

Why these constants? They’re “nothing up my sleeve” numbers—derived from mathematical constants in a way that proves the designers didn’t hide backdoors.

Step 3: Round Constants

There are 64 round constants (K0-K63), derived from the first 32 bits of the fractional parts of the cube roots of the first 64 primes:

K[0..7] = 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
          0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5, ...

Step 4: Message Schedule

Each 512-bit block is expanded into 64 32-bit words (W0-W63):

W[i] = M[i]                                          for i ∈ [0, 15]
W[i] = σ1(W[i-2]) + W[i-7] + σ0(W[i-15]) + W[i-16]   for i ∈ [16, 63]

where:
σ0(x) = ROTR(x, 7) XOR ROTR(x, 18) XOR SHR(x, 3)
σ1(x) = ROTR(x, 17) XOR ROTR(x, 19) XOR SHR(x, 10)

ROTR = Right rotation (bits wrap around) SHR = Right shift (bits fall off)

Step 5: Compression Function (64 Rounds)

For each round i from 0 to 63:

T1 = h + Σ1(e) + Ch(e,f,g) + K[i] + W[i]
T2 = Σ0(a) + Maj(a,b,c)

h = g
g = f
f = e
e = d + T1
d = c
c = b
b = a
a = T1 + T2

where:
Σ0(x) = ROTR(x, 2) XOR ROTR(x, 13) XOR ROTR(x, 22)
Σ1(x) = ROTR(x, 6) XOR ROTR(x, 11) XOR ROTR(x, 25)
Ch(x,y,z) = (x AND y) XOR ((NOT x) AND z)       // "Choose"
Maj(x,y,z) = (x AND y) XOR (x AND z) XOR (y AND z)  // "Majority"

Understanding the functions:

  • Ch (Choose): If bit x is 1, choose bit from y; otherwise choose from z
  • Maj (Majority): Output is 1 if at least 2 of the 3 bits are 1
  • Σ (Sigma): Creates diffusion by rotating and XORing

Step 6: Update Hash State

After processing all 64 rounds, add the compressed values to the running hash:

H0 = H0 + a
H1 = H1 + b
...
H7 = H7 + h

(All additions are modulo 2^32)

The Avalanche Effect

The avalanche effect is the property that a small change in input (even 1 bit) produces a drastically different output. For a good hash function, changing 1 input bit should flip approximately 50% of output bits.

Example:

SHA256("Hello") = 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
SHA256("hello") = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                  Every character is different!

Why does this happen? The combination of XOR, rotations, and the non-linear Ch/Maj functions ensures that changing one bit affects multiple words in the message schedule, which then propagates through all 64 rounds. After just a few rounds, the effect has spread to all 256 bits.

Security Considerations

Why SHA-256 is secure:

  1. 64 rounds provide enough mixing that no known shortcut attacks work
  2. The message schedule ensures all input bits affect all output bits
  3. Non-linear functions (Ch, Maj) prevent algebraic attacks
  4. Rotations and XORs prevent differential cryptanalysis

What would break SHA-256:

  1. Finding a shortcut to compute preimages in less than 2^256 operations
  2. Finding collisions in less than 2^128 operations
  3. Discovering mathematical weaknesses in the compression function

As of 2024, no practical attacks exist against SHA-256.


Complete Project Specification

Functional Requirements

  1. Core Hashing
    • Accept arbitrary input (string or file)
    • Implement SHA-256 algorithm from scratch (no crypto libraries)
    • Produce correct 256-bit (64 hex character) output
    • Match test vectors from NIST
  2. Visualization
    • Display message padding step
    • Show message schedule (W array) expansion
    • Print each compression round with working variables (a-h)
    • Highlight which bits changed between rounds
  3. Avalanche Demonstration
    • Compare hashes of similar inputs
    • Calculate and display the percentage of bits that differ
    • Visualize bit differences with color coding
  4. Educational Mode
    • Step-by-step explanation of each operation
    • Pause between rounds for examination
    • Export intermediate states to file

Non-Functional Requirements

  • Performance: Hash 1MB file in under 1 second
  • Correctness: Pass all NIST test vectors
  • Usability: Clear CLI interface with help
  • Portability: Compile on Linux, macOS, Windows

Command-Line Interface

# Basic hash
$ sha256viz "Hello World"
Hash: a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e

# Verbose mode with visualization
$ sha256viz --verbose "Hello"
Input: "Hello" (5 bytes)
Padded: 48656c6c6f80000000...0028 (64 bytes)

Message Schedule:
W[0]  = 48656c6c
W[1]  = 6f800000
...

Compression Rounds:
Round 0:  A=5d6aeb04 B=6a09e667 C=bb67ae85 D=3c6ef372 E=58cb02c1 F=510e527f G=9b05688c H=1f83d9ab
Round 1:  A=2a0c8e42 ...
...

Final Hash: 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

# Avalanche test
$ sha256viz --avalanche "Hello" "hello"
"Hello" → 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
"hello" → 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Bit difference: 128/256 (50.0%)

# File hashing
$ sha256viz --file document.pdf
Hash: 3a7bd3e2360a3d...

Solution Architecture

Module Structure

src/
├── main.rs           # CLI entry point
├── lib.rs            # Public API
├── sha256/
│   ├── mod.rs        # SHA-256 implementation
│   ├── constants.rs  # K and H initial values
│   ├── padding.rs    # Message padding logic
│   ├── schedule.rs   # Message schedule expansion
│   ├── compress.rs   # Compression function
│   └── functions.rs  # Ch, Maj, Σ0, Σ1, σ0, σ1
├── visualizer/
│   ├── mod.rs        # Visualization coordinator
│   ├── formatter.rs  # Output formatting
│   └── colors.rs     # Terminal colors for bit diffs
└── tests/
    ├── nist_vectors.rs    # Official test vectors
    └── avalanche_tests.rs # Avalanche property tests

Core Data Structures

/// A 32-bit word used throughout SHA-256
pub type Word = u32;

/// The 8 working variables (a-h) in the compression function
pub struct WorkingVars {
    pub a: Word,
    pub b: Word,
    pub c: Word,
    pub d: Word,
    pub e: Word,
    pub f: Word,
    pub g: Word,
    pub h: Word,
}

/// The 64-word message schedule
pub struct MessageSchedule {
    pub words: [Word; 64],
}

/// Complete state for one block's processing
pub struct BlockState {
    pub schedule: MessageSchedule,
    pub rounds: Vec<WorkingVars>,  // State after each round
}

/// Configuration for visualization
pub struct VisConfig {
    pub show_padding: bool,
    pub show_schedule: bool,
    pub show_rounds: bool,
    pub pause_between_rounds: bool,
    pub color_output: bool,
}

Key Algorithms

Padding Algorithm

function pad_message(message: bytes) -> bytes:
    original_length = len(message) * 8  // in bits

    // Append 0x80 (binary: 10000000)
    message.append(0x80)

    // Append zeros until length ≡ 448 (mod 512)
    while (len(message) * 8) % 512 != 448:
        message.append(0x00)

    // Append original length as 64-bit big-endian
    for i in 7..0:
        message.append((original_length >> (i * 8)) & 0xFF)

    return message

Compression Round

function compression_round(state: WorkingVars, k: Word, w: Word) -> WorkingVars:
    T1 = state.h + big_sigma1(state.e) + ch(state.e, state.f, state.g) + k + w
    T2 = big_sigma0(state.a) + maj(state.a, state.b, state.c)

    return WorkingVars {
        a: T1 + T2,
        b: state.a,
        c: state.b,
        d: state.c,
        e: state.d + T1,
        f: state.e,
        g: state.f,
        h: state.g,
    }

Visualization Strategy

  1. Padding Visualization: Show original bytes in one color, padding in another
  2. Schedule Visualization: Display W[0..15] (direct from message) differently from W[16..63] (computed)
  3. Round Visualization: Highlight which bits changed from previous round
  4. Avalanche Visualization: Side-by-side comparison with differing bits highlighted

Phased Implementation Guide

Phase 1: Core Types and Constants

Goal: Set up project structure and define constants.

Tasks:

  1. Create new Rust project with Cargo
  2. Define the 64 round constants (K array)
  3. Define the 8 initial hash values (H array)
  4. Implement Word type alias and basic bit operations

Validation:

assert_eq!(K[0], 0x428a2f98);
assert_eq!(H[0], 0x6a09e667);

Hints if stuck:

  • The constants are well-documented in FIPS 180-4
  • Use const arrays for compile-time initialization
  • Consider creating a separate constants.rs file

Phase 2: Bit Operations

Goal: Implement the primitive operations used in SHA-256.

Tasks:

  1. Implement right rotation: rotr(x, n)
  2. Implement right shift: shr(x, n)
  3. Implement Ch: ch(x, y, z)
  4. Implement Maj: maj(x, y, z)
  5. Implement σ0, σ1, Σ0, Σ1

Validation:

assert_eq!(rotr(0x80000000, 1), 0x40000000);
assert_eq!(ch(0xFF00FF00, 0x12345678, 0x87654321), 0x12654378);
assert_eq!(sigma0(0x12345678), <expected_value>);

Hints if stuck:

  • ROTR in Rust: (x >> n) | (x << (32 - n))
  • Ch can be simplified: (x & y) ^ (!x & z)
  • Test each function independently before combining

Phase 3: Message Padding

Goal: Correctly pad messages according to the specification.

Tasks:

  1. Implement padding for short messages (<55 bytes)
  2. Handle messages that require two blocks after padding
  3. Test with known inputs

Validation:

let padded = pad("abc".as_bytes());
// Should be 64 bytes, ending with 0x00...018 (length = 24 bits)
assert_eq!(padded.len(), 64);
assert_eq!(padded[3], 0x80);  // First padding byte
assert_eq!(padded[63], 0x18);  // Length byte

Hints if stuck:

  • Remember: length is in BITS, not bytes
  • The ‘1’ bit is the high bit of 0x80 (10000000 binary)
  • Use big-endian byte order for the length

Phase 4: Message Schedule

Goal: Expand each 512-bit block into 64 32-bit words.

Tasks:

  1. Parse first 16 words directly from the block
  2. Compute words 16-63 using σ0, σ1, and previous words
  3. Visualize the schedule expansion

Validation:

// For "abc" padded, first few words:
// W[0] = 0x61626380 ("abc" + 0x80)
// W[1] = 0x00000000
// ...

Hints if stuck:

  • Words are big-endian (high byte first)
  • W[16] = σ1(W[14]) + W[9] + σ0(W[1]) + W[0]
  • All arithmetic is modulo 2^32 (Rust’s wrapping_add)

Phase 5: Compression Function

Goal: Implement the 64-round compression function.

Tasks:

  1. Initialize working variables from current hash state
  2. Apply 64 rounds using K, W, and the mixing functions
  3. Add compressed values back to hash state
  4. Capture and store intermediate states for visualization

Validation:

// After processing "abc":
// H0 = 0xba7816bf
// H1 = 0x8f01cfea
// ...

Hints if stuck:

  • The round function slides values: h→g→f→e, d→c→b→a
  • ‘e’ gets the old ‘d’ plus T1
  • ‘a’ gets T1 + T2
  • Use wrapping_add for all additions

Phase 6: Complete Hash and Visualization

Goal: Produce correct hashes and visualize the process.

Tasks:

  1. Combine all phases into complete SHA-256 function
  2. Add command-line interface
  3. Implement verbose output with round-by-round display
  4. Add color coding for bit changes

Validation:

# NIST test vectors:
$ sha256viz ""
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

$ sha256viz "abc"
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

$ sha256viz "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq"
248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1

Hints if stuck:

  • For multiple blocks, chain the hash state
  • The final hash is H0   H1     H7 in big-endian
  • Use ANSI escape codes for terminal colors

Phase 7: Avalanche Effect Visualization

Goal: Demonstrate and visualize the avalanche effect.

Tasks:

  1. Compare two hashes bit by bit
  2. Count and display differing bits
  3. Visual representation showing which bits differ

Validation:

$ sha256viz --avalanche "a" "b"
# Should show ~50% bit difference

Testing Strategy

Unit Tests

#[test]
fn test_rotr() {
    assert_eq!(rotr(0x80000000, 1), 0x40000000);
    assert_eq!(rotr(0x00000001, 1), 0x80000000);
}

#[test]
fn test_padding_short_message() {
    let padded = pad(b"abc");
    assert_eq!(padded.len(), 64);
}

#[test]
fn test_schedule_expansion() {
    let schedule = expand_schedule(&padded_block);
    assert_eq!(schedule[0], expected_w0);
}

Integration Tests (NIST Test Vectors)

#[test]
fn test_nist_empty_string() {
    let hash = sha256(b"");
    assert_eq!(
        hex::encode(hash),
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    );
}

#[test]
fn test_nist_abc() {
    let hash = sha256(b"abc");
    assert_eq!(
        hex::encode(hash),
        "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    );
}

#[test]
fn test_nist_two_blocks() {
    let hash = sha256(b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq");
    assert_eq!(
        hex::encode(hash),
        "248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1"
    );
}

Property-Based Tests

#[test]
fn test_avalanche_property() {
    for _ in 0..100 {
        let input: Vec<u8> = random_bytes(100);
        let mut modified = input.clone();
        modified[random_index()] ^= 1;  // Flip one bit

        let hash1 = sha256(&input);
        let hash2 = sha256(&modified);

        let different_bits = count_different_bits(&hash1, &hash2);
        // Should be close to 128 (50% of 256 bits)
        assert!(different_bits > 100 && different_bits < 156);
    }
}

Common Pitfalls & Debugging

Pitfall 1: Endianness Confusion

Problem: SHA-256 uses big-endian for words, but your CPU might be little-endian.

Symptom: Hash output is completely wrong, but internal operations seem correct.

Solution:

// Reading a word from bytes (big-endian)
fn read_word(bytes: &[u8]) -> u32 {
    ((bytes[0] as u32) << 24) |
    ((bytes[1] as u32) << 16) |
    ((bytes[2] as u32) << 8)  |
    (bytes[3] as u32)
}

Pitfall 2: Length Encoding

Problem: The message length must be encoded as bits, not bytes, in big-endian format.

Symptom: Works for short messages, fails for longer ones.

Solution:

let bit_length: u64 = (original_bytes.len() as u64) * 8;
// Append as 8 bytes, big-endian
for i in (0..8).rev() {
    padded.push(((bit_length >> (i * 8)) & 0xFF) as u8);
}

Pitfall 3: Integer Overflow

Problem: SHA-256 uses modulo 2^32 arithmetic.

Symptom: Unexpected values after additions.

Solution: Use wrapping_add in Rust:

let result = a.wrapping_add(b).wrapping_add(c);

Pitfall 4: Message Schedule Indices

Problem: Off-by-one errors in the W array computation.

Symptom: Wrong hash for multi-block messages.

Solution: Double-check the formula:

W[i] = σ1(W[i-2]) + W[i-7] + σ0(W[i-15]) + W[i-16]

Note: It’s i-2, i-7, i-15, i-16 (not i-1, i-6, etc.)

Pitfall 5: Rotation vs Shift

Problem: Confusing right rotation (ROTR) with right shift (SHR).

Symptom: σ0 and σ1 produce wrong values.

Solution:

// ROTR: bits wrap around
fn rotr(x: u32, n: u32) -> u32 {
    (x >> n) | (x << (32 - n))
}

// SHR: bits fall off (just >>)
fn shr(x: u32, n: u32) -> u32 {
    x >> n
}

Extensions and Challenges

Challenge 1: Multi-Threaded Hashing

Parallelize hashing of large files by processing independent blocks on different threads. Note: The final hash still requires sequential combination.

Challenge 2: Hardware Acceleration

Use SIMD instructions (AVX2/SSE4) to compute multiple message schedules or rounds in parallel. This is how production implementations achieve high throughput.

Challenge 3: Implement SHA-512

Apply the same concepts to SHA-512, which uses 64-bit words and 80 rounds. Compare performance and security properties.

Challenge 4: Hash Collision Finder

Implement a birthday attack to find collisions in a truncated version of SHA-256 (e.g., find two inputs that produce the same first 32 bits). This demonstrates why full 256 bits are needed.

Challenge 5: Mining Simulator

Use your SHA-256 implementation to build a simple proof-of-work miner. Find nonces that produce hashes below a target difficulty.


Real-World Connections

Bitcoin Mining

Bitcoin miners perform SHA-256(SHA-256(block_header)) billions of times per second to find a hash below the target difficulty. Your implementation helps understand exactly what miners are computing.

Git Commits

Git uses SHA-1 (similar structure to SHA-256) to identify commits. Understanding hash functions helps understand why git can detect tampering.

Password Storage

Secure password storage uses hash functions (often with additional techniques like salting and key stretching). Understanding the one-way property explains why passwords can be verified but not recovered.

File Integrity

When you download software and verify its SHA-256 checksum, you’re confirming the exact same hash function produced the same output—proving the file wasn’t modified.


Resources

Primary References

  1. FIPS 180-4: NIST SHA Standard - The official specification
  2. “Serious Cryptography” Chapter 6 - Excellent practical explanation
  3. SHA-256 Animation: sha256algorithm.com - Visual step-through

Code References

  1. Bitcoin Core SHA-256: GitHub - Production implementation
  2. RustCrypto SHA-256: GitHub - Idiomatic Rust implementation

Supplementary Reading

  1. “Applied Cryptography” by Bruce Schneier - Classic cryptography reference
  2. Merkle-Damgård Construction: The theoretical framework behind SHA-256

Self-Assessment Checklist

Before moving to the next project, verify:

  • I can explain why SHA-256 produces a fixed-size output from variable input
  • I understand why hash functions are one-way (preimage resistance)
  • I can manually trace through one round of the compression function
  • My implementation passes all NIST test vectors
  • I can explain the avalanche effect and demonstrate it
  • I understand why Bitcoin uses SHA-256 for proof-of-work
  • I can identify which parts of SHA-256 provide diffusion vs confusion
  • I understand the role of the initial hash values and round constants

What’s Next?

With SHA-256 mastered, you now understand the cryptographic foundation of Web3. In Project 2: ECDSA Digital Signature Implementation, you’ll build on this by implementing the digital signature algorithm used to authorize every Bitcoin and Ethereum transaction. You’ll see how hash functions combine with elliptic curve cryptography to prove ownership.