P15: Decentralized Storage Client (IPFS-like)
Project Overview
| Attribute | Value |
|---|---|
| Main Language | Go |
| Alternative Languages | Rust, TypeScript |
| Difficulty | Expert |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | The "Open Core" Infrastructure |
| Knowledge Area | Distributed Systems / Storage |
| Software or Tool | Content-Addressed Storage |
| Main Book | "Designing Data-Intensive Applications" by Martin Kleppmann |
| Time Estimate | 3-4 weeks |
| Prerequisites | Projects 3 & 4, P2P networking experience |
Learning Objectives
By completing this project, you will:
- Master content-addressed storage: understand how cryptographic hashes replace location-based addressing and enable data integrity verification without trusting the source
- Implement the Kademlia DHT algorithm, including the XOR distance metric, k-buckets, and iterative routing that finds content in O(log N) hops
- Build Merkle DAG structures: split large files into chunks, link them with cryptographic hashes, and enable efficient partial downloads and deduplication
- Understand peer-to-peer networking fundamentals, including peer discovery, NAT traversal, connection management, and multiaddr formatting
- Design and implement block exchange protocols that handle concurrent downloads, peer prioritization, and verification of received data
- Explore content persistence and garbage collection: pinning strategies, provider record management, and data availability guarantees
- Connect theory to Web3 infrastructure: understand why IPFS is the data layer for NFT metadata, dApp frontends, and decentralized applications
- Apply distributed systems principles, including eventual consistency, Byzantine fault tolerance concepts, and censorship resistance
Deep Theoretical Foundation
The Problem: Location vs. Content
The internet we use today is fundamentally location-based. When you request https://example.com/file.pdf, you're saying "give me whatever is at this location." This has several problems:
- Single point of failure: If the server goes down, the file is unavailable
- Trust required: You trust that the server gives you the correct file
- Duplication inefficiency: The same file hosted on 100 servers has 100 different URLs
- Censorship vulnerability: Block the server, block the content
Content-addressed storage flips this model: "Give me the file with this hash, from anyone who has it."
Location-Based (HTTP):              Content-Based (IPFS):

      Request                             Request
         |                                   |
         v                                   v
  +---------------+                 +-------------------+
  |   Server A    |                 |     ANY Peer      |
  |  example.com  |                 |    that has it    |
  +-------+-------+                 +---------+---------+
          |                                   |
          v                                   v
       file.pdf                            file.pdf
     (trust it?)                      (verify hash = CID)
Why This Matters for Web3
Blockchains are not designed for large file storage:
- A typical Ethereum transaction carries only a small amount of data (on the order of a kilobyte)
- Storing 1 MB on-chain can cost hundreds of dollars in gas
- Blockchain state must be replicated by every node
So where do NFT images live? Where do dApp frontends live? Where does the actual data of Web3 live?
IPFS and similar systems provide the data layer. When an NFT points to ipfs://QmXyz..., that's a content-addressed reference that:
- Works regardless of which server hosts the file
- Can be verified by hashing the received data
- Enables censorship resistance (content persists if ANY node has it)
The Architecture of Content-Addressed Storage
+------------------------------------------------------------------------------+
|                        IPFS-LIKE SYSTEM ARCHITECTURE                         |
+------------------------------------------------------------------------------+
|                                                                              |
|  APPLICATION LAYER                                                           |
|    CLI (ipfs add)  |  API (/api/v0)  |  HTTP Gateway  |  Name Resolution     |
|                                                          (IPNS / DNSLink)    |
|------------------------------------------------------------------------------|
|  DATA LAYER                                                                  |
|    MERKLE DAG              BLOCK STORE               PINNING                 |
|    Root -> Chunks          CID -> Bytes              CID -> PinStatus        |
|                            Local storage             Garbage collection,     |
|                            (LevelDB, FS)             reference counting      |
|------------------------------------------------------------------------------|
|  NETWORK LAYER                                                               |
|    KADEMLIA DHT            BITSWAP (Exchange)        PEER MANAGER            |
|    k-buckets               Want / Have /             Connection pooling      |
|    XOR distance            Block messages            NAT traversal           |
|    Provider records                                  Bootstrap nodes         |
|------------------------------------------------------------------------------|
|  TRANSPORT LAYER                                                             |
|    TCP  |  QUIC  |  WebRTC  |  WebSocket                                     |
|    Multiaddr: /ip4/192.168.1.1/tcp/4001/p2p/12D3KooW...                      |
|                                                                              |
+------------------------------------------------------------------------------+
The Core Question You're Answering
"How can we store and retrieve data in a decentralized network where content is addressed by its cryptographic hash, not by server location?"
This fundamentally changes data storage:
- Location independence: "what" the data is (hash) matters, not "where" it lives (server)
- Verifiability: You can prove data integrity by hashing it
- Deduplication: Identical content always has the same CID
- Censorship resistance: No single party controls the data
Understanding content-addressed storage means understanding why IPFS is the data layer for Web3: blockchains store small state, IPFS stores large files.
Concepts You Must Understand First
Stop and research these before coding:
1. Content Addressing vs. Location Addressing
Traditional HTTP addressing identifies content by WHERE it lives:
https://example.com/files/document.pdf
        ^           ^
        |           |
      Server     Path on server
Content addressing identifies content by WHAT it is:
QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
^
+-- Hash of the content itself
Key insight: The same content always produces the same hash. If two people add the same file, they get the same CID. This enables automatic deduplication and verification.
What happens when content changes? New content = new hash = new CID. There's no "updating" a CID; you create a new one. This is immutability by design.
Questions to test your understanding:
- Why does content addressing enable deduplication?
- What happens if I change one byte in a file? (Entire CID changes)
- How does this relate to Git commits? (Git uses content-addressing too!)
- Why is immutability important for NFT metadata?
Book Reference: "Designing Data-Intensive Applications" Ch. 6 - Kleppmann
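To see the key insight in code, here is a minimal Go sketch (standard library only) showing that identical content always yields the same digest, while a one-byte change yields a completely different one:

package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	a := []byte("Hello, IPFS!\n")
	b := []byte("Hello, IPFS!\n") // same bytes, added "by someone else"
	c := []byte("Hello, IPFS?\n") // one byte changed

	// Identical content -> identical digest (the basis of deduplication).
	fmt.Printf("a: %x\n", sha256.Sum256(a))
	fmt.Printf("b: %x\n", sha256.Sum256(b))

	// A single-byte change -> completely different digest (new CID).
	fmt.Printf("c: %x\n", sha256.Sum256(c))
}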
2. Cryptographic Hashing and CIDs
A Content Identifier (CID) is more than just a hash. It's a self-describing structure:
CID Structure:

+--------------------------------------------------------------------------+
|                         Content Identifier (CID)                         |
+-----------+---------------+----------------------------------------------+
|  Version  |     Codec     |                  Multihash                   |
|  (v0/v1)  |   (dag-pb,    |   Hash Func   |  Length  |  Hash Bytes       |
|           |   raw, etc.)  |               |          |                   |
|   0x01    |     0x70      |     0x12      |   0x20   |  sha256(content)  |
+-----------+---------------+---------------+----------+-------------------+
Example breakdown of: bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3okuez3...
- Version: 1 (CIDv1)
- Codec: dag-pb (0x70) - protobuf-encoded DAG
- Multihash function: sha2-256 (0x12)
- Hash length: 32 bytes (0x20)
- Hash bytes: [actual 32-byte SHA-256 hash]
Why "multihash"? Hash algorithm agility. If SHA-256 is broken, the network can migrate to SHA-3 without changing the CID format. The hash function is embedded in the CID itself.
CIDv0 vs CIDv1:
- CIDv0: Legacy format, starts with Qm..., always uses dag-pb + SHA-256
- CIDv1: Modern format, self-describing, supports multiple codecs and hash functions
Questions to test your understanding:
- What is the advantage of embedding the hash function in the identifier?
- Why might you choose the raw codec vs the dag-pb codec?
- How do you upgrade hash algorithms without breaking old content?
Project Reference: You should have built Project 1 (SHA-256 Visualizer) and Project 3 (Merkle Tree Library)
Web Reference: IPFS CID Specification
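If you want to poke at CIDs before writing your own encoder, a short sketch using the go-cid and go-multihash libraries (the same ones referenced in the hints later in this project) builds a CIDv1 and reads its fields back out:

package main

import (
	"fmt"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	// Build a CIDv1 for some raw bytes, then inspect its components.
	sum, err := mh.Sum([]byte("Hello, IPFS!\n"), mh.SHA2_256, -1) // -1 = default digest length
	if err != nil {
		panic(err)
	}
	c := cid.NewCidV1(cid.Raw, sum)

	fmt.Println("CID:      ", c.String()) // base32 by default for v1
	p := c.Prefix()                       // Prefix bundles version, codec, and hash parameters
	fmt.Println("Version:  ", p.Version)  // 1
	fmt.Printf("Codec:     0x%x\n", p.Codec)  // 0x55 = raw
	fmt.Printf("Hash func: 0x%x\n", p.MhType) // 0x12 = sha2-256
	fmt.Println("Hash len: ", p.MhLength)     // 32
}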
3. Merkle DAGs (Directed Acyclic Graphs)
For large files, we don't hash the entire file as one blob. We split it into chunks and link them with a Merkle DAG:
Merkle DAG for a 2.4 MB File (10 chunks):

                 +--------------------------------------+
                 |              Root Node               |
                 |  CID: QmYwAPJzv5CZsnA625s3Xf2nem...  |
                 |  Type: UnixFS File                   |
                 |  Size: 2,516,582 bytes               |
                 +------------------+-------------------+
                                    |
      +---------+---------+---------+--------+---------+---------+
      |         |         |                  |         |         |
      v         v         v                  v         v         v
 +---------+---------+---------+    ...   +---------+---------+---------+
 | Chunk 0 | Chunk 1 | Chunk 2 |          | Chunk 7 | Chunk 8 | Chunk 9 |
 | 256 KB  | 256 KB  | 256 KB  |          | 256 KB  | 256 KB  | 128 KB  |
 | QmHash0 | QmHash1 | QmHash2 |          | QmHash7 | QmHash8 | QmHash9 |
 +---------+---------+---------+          +---------+---------+---------+
Root node contains:
{
"type": "file",
"links": [
{"name": "chunk-0", "cid": "QmHash0...", "size": 262144},
{"name": "chunk-1", "cid": "QmHash1...", "size": 262144},
...
{"name": "chunk-9", "cid": "QmHash9...", "size": 131072}
]
}
CID(file) = hash(root_node_serialized)
Why DAGs instead of simple Merkle trees?
- DAGs allow sharing: if two files have identical chunks, they reference the same CID
- DAGs support directories: a directory node links to file nodes
- DAGs enable incremental updates: change one chunk, only that chunkโs ancestors change
The relationship to Git: Git commits form a DAG. Each commit points to parent commits and a tree. The tree points to blobs (file contents). Sound familiar?
Questions to test your understanding:
- If you change one byte in chunk 5, which CIDs change? (Chunk 5, root node)
- How does chunking enable parallel downloads?
- Why is 256 KB a common chunk size? (Balance between overhead and parallelism)
- How would you represent a directory with 10,000 files?
Book Reference: "Mastering Bitcoin" Ch. 9 (Merkle Trees) - Antonopoulos
Project Reference: Project 3 (Merkle Tree Library)
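As a sketch of how a client walks such a structure, the fragment below collects every CID reachable from a root by depth-first traversal. DAGNode, Link, getBlock, and decodeNode are hypothetical helpers in the spirit of the hints later in this project; real IPFS uses the go-merkledag packages instead:

// collectBlocks walks a Merkle DAG depth-first and returns every CID
// reachable from root, including root itself. getBlock fetches serialized
// node bytes (blockstore or network); decodeNode parses them (a leaf chunk
// decodes to a node with no links in this sketch).
func collectBlocks(root cid.Cid) ([]cid.Cid, error) {
	var all []cid.Cid
	seen := map[cid.Cid]bool{}

	var walk func(c cid.Cid) error
	walk = func(c cid.Cid) error {
		if seen[c] {
			return nil // shared chunks are visited only once (deduplication)
		}
		seen[c] = true
		all = append(all, c)

		raw, err := getBlock(c)
		if err != nil {
			return err
		}
		node, err := decodeNode(raw)
		if err != nil {
			return err
		}
		for _, link := range node.Links {
			if err := walk(link.Cid); err != nil {
				return err
			}
		}
		return nil
	}

	if err := walk(root); err != nil {
		return nil, err
	}
	return all, nil
}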
4. Kademlia DHT (Distributed Hash Table)
The DHT is how we find who has the content. Kademlia is the specific DHT algorithm used by IPFS.
Core concepts:
- Every peer has a 256-bit Peer ID (hash of their public key)
- Every CID maps to a 256-bit key
- Distance is measured by XOR: distance(a, b) = a XOR b
XOR Distance Metric Visualization:

Node A ID: 1010 0000 0000 0000 ...
Node B ID: 1010 0001 0000 0000 ...
           ------------------------
XOR:       0000 0001 0000 0000 ...
                   ^
                   +-- First differing bit is deep into the ID
                       = small XOR distance (A and B are close)

Node A ID: 1010 0000 0000 0000 ...
Node C ID: 0000 0000 0000 0000 ...
           ------------------------
XOR:       1010 0000 0000 0000 ...
           ^
           +-- The very first bit differs
               = huge XOR distance (A and C are far apart)
Key insight: XOR distance has these properties:
- d(a,a) = 0 (identity)
- d(a,b) = d(b,a) (symmetric)
- d(a,c) <= d(a,b) + d(b,c) (triangle inequality)
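A minimal Go sketch of the metric itself, reusing the simplified 8-bit IDs from Exercise 3 later in this project (real peer IDs are 256 bits, i.e. 32-byte slices):

package main

import (
	"bytes"
	"fmt"
)

// xorDistance returns the XOR of two equal-length IDs. Interpreted as a
// big-endian integer, this byte slice is the Kademlia distance.
func xorDistance(a, b []byte) []byte {
	d := make([]byte, len(a))
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

func main() {
	a := []byte{0b11110000}
	b := []byte{0b11110001}
	k := []byte{0b11110010}

	da := xorDistance(a, k)
	db := xorDistance(b, k)
	fmt.Printf("d(A,K) = %d, d(B,K) = %d\n", da[0], db[0]) // 2 and 3

	// Comparing distances = comparing the byte slices big-endian.
	fmt.Println("A closer than B:", bytes.Compare(da, db) < 0) // true
}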
K-Buckets:

K-Bucket Structure (k=3 for simplicity):

+----------------------------------------------------------------------+
| Local Node ID: 1010 0000 0000 0000 0000 0000 ... (256 bits)          |
+----------------------------------------------------------------------+
|                                                                      |
| Bucket 255: Peers whose ID differs first in bit 255 (very close)     |
|   Peer1: 1010 0000 0000 0000 ... 0001   | Last seen: 2s ago          |
|   Peer2: 1010 0000 0000 0000 ... 0010   | Last seen: 5s ago          |
|   [can hold up to k=3 peers]                                         |
|                                                                      |
| Bucket 254: Peers whose ID differs first in bit 254                  |
|   Peer3: 1010 0000 0000 0000 ... 0100   | Last seen: 1s ago          |
|                                                                      |
| ...                                                                  |
|                                                                      |
| Bucket 0: Peers whose ID differs in bit 0 (very far)                 |
|   Peer99:  0...  (starts with 0, we start with 1)                    |
|   Peer100: 0...                                                      |
|   Peer101: 0...                                                      |
+----------------------------------------------------------------------+
Rule: Each bucket stores up to k peers at that distance.
Closer buckets have fewer potential peers but we know them better.
Farther buckets have many potential peers but we only need a sample.
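To place a peer into a bucket you only need the position of the first differing bit; here is a sketch under the numbering convention used in the diagram above (bucket 0 = differs in the leading bit, bucket 255 = differs only in the last bit):

import "math/bits"

// bucketIndex returns the k-bucket a peer belongs in, counting the first
// differing bit from the most significant end. Returns -1 for our own ID
// (a node does not store itself in its routing table).
func bucketIndex(localID, peerID []byte) int {
	for i := range localID {
		x := localID[i] ^ peerID[i]
		if x != 0 {
			// Position of the first differing bit, counted from the MSB.
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return -1
}

With 32-byte (256-bit) IDs this yields indices 0 through 255, one per bucket.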
Iterative Lookup Process:

Finding content CID: QmYwAPJzv5CZsnA625s3Xf2nem...
Key (256-bit hash of CID): 0x3a7f2b1c...

Step 1: Local node picks the k closest peers from its routing table

   Local                                   Target Key
     |                                         |
     v                                         v
   +---+       Query        +---+            +---+
   | A |------------------->| B |            | ? |
   +---+                    +-+-+            +---+
                              |
                              | "Here are peers closer to the key..."
                              v
Step 2: Query the returned peers; they return even closer peers

   +---+                    +---+    Query    +---+
   | A |                    | B |------------>| C |
   +---+                    +---+             +-+-+
                                                |
                                                | "I have providers!"
                                                v
Step 3: Eventually reach the nodes storing provider records

   Provider Record: "Peer X at /ip4/1.2.3.4/tcp/4001 has QmYwAP..."

Total hops: O(log N) where N = network size
For 1 million nodes: ~20 hops
Questions to test your understanding:
- Why XOR instead of numerical difference?
- What's the purpose of keeping k peers per bucket instead of just 1?
- How does Kademlia handle nodes going offline?
- What prevents an attacker from controlling which nodes store a provider record?
Book Reference: "Designing Data-Intensive Applications" Ch. 6 - Kleppmann
Web Reference: Kademlia DHT Paper
5. Peer-to-Peer Networking
Peer Discovery:
- Bootstrap nodes: Well-known nodes that help new nodes join
- mDNS: Local network discovery
- DHT: Find new peers through routing table refresh
Connection Management:
Multiaddr Format:

/ip4/192.168.1.1/tcp/4001/p2p/12D3KooWRq4tRk8jF3d9VqMn5WVzQv9V3GC9p7uXh3nE8X1YZ2AB
 ^       ^        ^    ^        ^
 |       |        |    |        +-- Peer ID (hash of public key)
 |       |        |    +----------- Port number
 |       |        +---------------- Protocol (TCP)
 |       +------------------------- IP address
 +--------------------------------- IP version

Other multiaddr examples:
/ip6/::1/tcp/4001/p2p/12D3KooW... (IPv6)
/dns4/node.example.com/tcp/443/wss/p2p/... (WebSocket over HTTPS)
/ip4/192.168.1.1/udp/4001/quic/p2p/... (QUIC transport)
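If you use the go-multiaddr library (one option; you could also parse the strings yourself), dissecting a multiaddr looks roughly like this:

package main

import (
	"fmt"

	ma "github.com/multiformats/go-multiaddr"
)

func main() {
	addr, err := ma.NewMultiaddr("/ip4/192.168.1.1/tcp/4001")
	if err != nil {
		panic(err)
	}

	// Pull individual components out by protocol code.
	ip, _ := addr.ValueForProtocol(ma.P_IP4)   // "192.168.1.1"
	port, _ := addr.ValueForProtocol(ma.P_TCP) // "4001"
	fmt.Println("ip:", ip, "port:", port)

	// Or walk every protocol segment in order.
	for _, p := range addr.Protocols() {
		fmt.Println("segment:", p.Name)
	}
}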
NAT Traversal:
- Relay servers: Third party relays connections
- Hole punching: Direct connection through coordinated timing
- AutoNAT: Detect if you're behind NAT
Book Reference: "Computer Networks, Fifth Edition" Ch. 2 - Tanenbaum
6. BitSwap Protocol
BitSwap is IPFS's block exchange protocol: how peers request and send blocks to each other.
BitSwap Message Flow:

  Node A (wants blocks)                       Node B (has blocks)
        |                                           |
        |  WANT_HAVE [CID1, CID2, CID3]             |
        |------------------------------------------>|
        |                                           |
        |  HAVE [CID1, CID3]                        |
        |<------------------------------------------|
        |                                           |
        |  WANT_BLOCK [CID1]                        |
        |------------------------------------------>|
        |                                           |
        |  BLOCK [CID1, data...]                    |
        |<------------------------------------------|
        |                                           |
        |  (verify: hash(data) == CID1)             |
        |                                           |

Message Types:
- WANT_HAVE: "Do you have these blocks?"
- HAVE: "Yes, I have these blocks"
- WANT_BLOCK: "Please send me this block"
- BLOCK: "Here's the block data"
- CANCEL: "Never mind, I got it elsewhere"
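The real BitSwap wire format is protobuf-based; for your own simplified exchange protocol, plain Go types along these lines are enough (names are illustrative, not the IPFS wire format):

// MessageType enumerates the simplified exchange messages described above.
type MessageType int

const (
	MsgWantHave  MessageType = iota // "Do you have these blocks?"
	MsgHave                         // "Yes, I have these blocks"
	MsgWantBlock                    // "Please send me this block"
	MsgBlock                        // "Here's the block data"
	MsgCancel                       // "Never mind, I got it elsewhere"
)

// Message is one exchange message; Blocks is only set for MsgBlock.
type Message struct {
	Type   MessageType
	CIDs   []cid.Cid // for WANT_HAVE / HAVE / WANT_BLOCK / CANCEL
	Blocks [][]byte  // raw block bytes, for BLOCK
}

// verifyBlock checks that received bytes really are the block we asked for,
// using a computeCID helper like the one in Hint 1 later in this project
// (this sketch assumes raw-codec CIDs).
func verifyBlock(want cid.Cid, data []byte) error {
	got, err := computeCID(data)
	if err != nil {
		return err
	}
	if !got.Equals(want) {
		return fmt.Errorf("block verification failed: got %s, want %s", got, want)
	}
	return nil
}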
Incentive Considerations:
- BitSwap tracks per-peer "ledgers" (how much data has been exchanged in each direction)
- Preferential treatment for peers who reciprocate
- Similar to BitTorrent's tit-for-tat, but more flexible
Web Reference: BitSwap Specification
7. Data Persistence and Garbage Collection
Content in IPFS is ephemeral by default: unless a block is explicitly pinned, it is eligible for garbage collection.
Pinning Hierarchy:

+----------------------------------------------------------------------+
|                         PINNING STRATEGIES                           |
+----------------------------------------------------------------------+
|                                                                      |
| Direct Pin: Pin only the root CID                                    |
|   Pinned: Root  -->  Not pinned: Chunk1, Chunk2, Chunk3              |
|   (WARNING: Children can be garbage collected!)                      |
|                                                                      |
| Recursive Pin: Pin the root and all descendants (DEFAULT)            |
|   Pinned: Root  -->  Pinned: Chunk1, Chunk2, Chunk3                  |
|   (All blocks preserved)                                             |
|                                                                      |
| Indirect Pin: Pinned because an ancestor is recursively pinned       |
|   If RootA pins Chunk1, and RootB also uses Chunk1:                  |
|   Chunk1 is safe even if RootB is unpinned                           |
|                                                                      |
+----------------------------------------------------------------------+
Garbage Collection Process:
1. Mark all pinned CIDs and their descendants
2. Mark all blocks in active transfer
3. Delete unmarked blocks
4. Update blockstore metrics
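A sketch of that mark-and-sweep pass, assuming a hypothetical Blockstore with AllKeys/Delete, a list of recursively pinned roots, and a DAG-traversal helper like collectBlocks from earlier in this section:

// garbageCollect deletes every block that is not reachable from a pin.
// A real implementation must also protect blocks in active transfers.
func garbageCollect(bs Blockstore, pinned []cid.Cid) error {
	// Mark: everything reachable from a pinned root survives.
	marked := map[cid.Cid]bool{}
	for _, root := range pinned {
		reachable, err := collectBlocks(root)
		if err != nil {
			return err
		}
		for _, c := range reachable {
			marked[c] = true
		}
	}

	// Sweep: delete everything else.
	keys, err := bs.AllKeys()
	if err != nil {
		return err
	}
	for _, c := range keys {
		if !marked[c] {
			if err := bs.Delete(c); err != nil {
				return err
			}
		}
	}
	return nil
}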
Provider Record Lifecycle:
- When you add content, you announce to DHT
- Provider records expire after 24 hours
- You must re-announce periodically to stay discoverable
- If all providers stop announcing, content becomes unfindable
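Because records expire, a node must re-announce on a timer. A minimal sketch (dht.Provide and the 12-hour interval are assumptions; the interval just needs to comfortably beat the 24-hour TTL):

import (
	"context"
	"log"
	"time"
)

// reprovideLoop periodically re-announces every CID we can serve so our
// provider records in the DHT never expire. provided() returns the CIDs
// we currently store; ctx cancellation stops the loop.
func reprovideLoop(ctx context.Context, dht *DHT, provided func() []cid.Cid) {
	ticker := time.NewTicker(12 * time.Hour) // re-announce well before the 24h TTL
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, c := range provided() {
				if err := dht.Provide(ctx, c); err != nil {
					log.Printf("reprovide %s failed: %v", c, err)
				}
			}
		}
	}
}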
Questions to Guide Your Design
Before implementing, think through these:
1. Content Addressing
- How do you compute a CID from file bytes?
- What hash function to use? (SHA-256 for compatibility)
- How do you encode the CID? (base58, base32, or base64?)
- How do you handle CID v0 vs v1?
2. Chunking Strategy
- What chunk size? (IPFS uses 256 KB default)
- Fixed-size chunks or content-defined chunking?
- How do you rebuild the file from chunks?
- How do you handle small files (< chunk size)?
3. Merkle DAG Construction
- How do you represent the DAG in memory and on disk?
- What format for DAG nodes? (protobuf, JSON, CBOR?)
- How do you link chunks to the root node?
- How do you traverse the DAG to fetch all blocks?
4. DHT Implementation
- How many k-buckets? (IPFS uses 256 for 256-bit hashes)
- Whatโs the k-value (peers per bucket)? (IPFS uses k=20)
- How do you bootstrap (initial peer discovery)?
- How often do you refresh routing tables?
- How do you handle peers going offline?
5. Provider Records
- When a node has content, how does it announce to the DHT?
- How many DHT nodes should store the provider record? (IPFS uses ~20)
- How long are provider records valid? (IPFS uses 24 hours, with refresh)
- How do you clean up stale records?
6. Block Exchange
- How do you request blocks from peers (BitSwap vs simple request/response)?
- Do you implement incentives to prevent freeloading?
- How do you handle peers that have incomplete data?
- Parallel downloads from multiple peers or sequential?
Thinking Exercises
Exercise 1: Trace Content Retrieval
Before coding, trace this scenario step by step:
1. Alice adds file.pdf to IPFS on Node A
2. CID computed: QmAbc123...
3. Node A announces to DHT: "I have QmAbc123"
4. Bob (on Node B) tries to get QmAbc123
5. Node B queries DHT: "Who has QmAbc123?"
6. DHT returns: "Node A has it"
7. Node B connects to Node A
8. Node B requests blocks from Node A
9. Node B verifies each block hash
10. Node B reconstructs file.pdf
Questions while tracing:
- At step 3, which DHT nodes store the provider record? (20 closest to QmAbc123's hash)
- At step 5, how many hops in the DHT? (O(log N) where N = network size)
- At step 6, what if Node A is offline? (Query returns other providers, if any)
- At step 9, what if a block's hash doesn't match? (Reject, try different peer)
- What if Charlie on Node C also fetches the file? (Now 3 providers: A, B, C)
Exercise 2: Design CID Computation
Given a file, compute its CID:
File: "Hello, IPFS!\n" (13 bytes)
Steps:
1. Hash the content: SHA-256("Hello, IPFS!\n") = d, a 32-byte digest
2. Create multihash: <hash-function-code><hash-length><hash-bytes>
- SHA-256 code: 0x12
- Length: 32 (0x20)
- Result: 0x1220 followed by d
3. Create CIDv1: <version><codec><multihash>
- Version: 0x01
- Codec: raw (0x55)
- Result: 0x01551220 followed by d
4. Encode with a multibase prefix: CIDv1 is usually base32-encoded (prefix 'b'), which for raw + SHA-256 yields a CID starting with bafk...
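To verify the byte layout end to end, here is a small standard-library-only sketch that assembles the CID bytes by hand and applies the base32 multibase step manually (a learning aid, not how you'd do it with the go-cid library):

package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

func main() {
	data := []byte("Hello, IPFS!\n")
	digest := sha256.Sum256(data)

	// multihash = <hash-func-code><digest-length><digest>
	multihash := append([]byte{0x12, 0x20}, digest[:]...)

	// CIDv1 = <version><codec><multihash>; 0x55 = raw codec
	cidBytes := append([]byte{0x01, 0x55}, multihash...)

	// Multibase base32 (lowercase, unpadded), prefixed with 'b'.
	enc := base32.StdEncoding.WithPadding(base32.NoPadding)
	cidStr := "b" + strings.ToLower(enc.EncodeToString(cidBytes))

	fmt.Println(cidStr) // a CIDv1 for raw data, starting with "bafk..."
}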
Exercise 3: XOR Distance Calculation
Calculate XOR distances:
Node A: 0b11110000 (simplified to 8 bits)
Node B: 0b11110001
Node C: 0b00001111
Key K: 0b11110010
Distance(A, K) = 0b11110000 XOR 0b11110010 = 0b00000010 = 2
Distance(B, K) = 0b11110001 XOR 0b11110010 = 0b00000011 = 3
Distance(C, K) = 0b00001111 XOR 0b11110010 = 0b11111101 = 253
Closest to K: Node A (distance 2)
Question: Which k-bucket would Node A put Node B in? (Bucket for bit position where they first differ)
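A quick Go check of the arithmetic above:

package main

import "fmt"

func main() {
	a, b, c, k := byte(0b11110000), byte(0b11110001), byte(0b00001111), byte(0b11110010)
	fmt.Println(a^k, b^k, c^k) // 2 3 253
}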
Real World Outcome
When your IPFS-like decentralized storage client is complete, you'll have built a peer-to-peer content distribution system that demonstrates how Web3 handles data storage without centralized servers. Here's what you'll see:
# Terminal 1 - Start first node (Bootstrap node)
$ ./ipfs-lite daemon --port 4001
[Node A] Peer ID: 12D3KooWRq4tRk8jF3d9VqMn5WVzQv9V3GC9p7uXh3nE8X1YZ2AB
[Node A] Listening on /ip4/127.0.0.1/tcp/4001
[Node A] Listening on /ip6/::1/tcp/4001
[Node A] DHT server mode enabled
[Node A] Local storage: ~/.ipfs-lite/blocks/
[Node A] Ready to accept connections
# Terminal 2 - Start second node and connect to bootstrap
$ ./ipfs-lite daemon --port 4002 --bootstrap /ip4/127.0.0.1/tcp/4001/p2p/12D3Koo...
[Node B] Peer ID: 12D3KooWPq8mTUv2oX4cD5nN9sVzQy8R4GC0p7yWh3nA8Z1YZ3CD
[Node B] Connecting to bootstrap node...
[Node B] Connected to 12D3KooWRq4tRk8j... ✓
[Node B] DHT synchronized with 1 peers
[Node B] Ready
# Terminal 3 - Add a file (Node A)
$ ./ipfs-lite add ~/Documents/whitepaper.pdf --node http://localhost:4001
Adding file: whitepaper.pdf (2.4 MB)
[*] Computing content hash...
[*] File size exceeds chunk size, splitting into chunks...
Chunking strategy:
Chunk size: 256 KB
Total chunks: 10
DAG structure: Merkle tree with 10 leaf nodes
Chunk 0: QmHash0... (262144 bytes) ✓
Chunk 1: QmHash1... (262144 bytes) ✓
Chunk 2: QmHash2... (262144 bytes) ✓
...
Chunk 9: QmHash9... (131072 bytes) ✓
Building Merkle DAG...
Root node links to 10 chunks
Computing CID (Content Identifier)...
CIDv1: base58-btc
Codec: dag-pb
Multihash: sha2-256
+---------------------------------------------+
| File added successfully! |
| CID: QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79o |
+---------------------------------------------+
File structure:
+---------------------------------------------+
| Root: QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz7 |
| +-- Chunk-0: QmHash0... (256 KB) |
| +-- Chunk-1: QmHash1... (256 KB) |
| +-- Chunk-2: QmHash2... (256 KB) |
| +-- ... |
| +-- Chunk-9: QmHash9... (128 KB) |
+---------------------------------------------+
[Node A] Announcing to DHT...
[DHT] Publishing provider record to 20 closest peers
[DHT] Provider record stored on peers: 12D3KooWPq8m..., 12D3KooWXy9z...
[Node A] File is now available on the network!
# Terminal 4 - Retrieve file from different node (Node B)
$ ./ipfs-lite get QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG --node http://localhost:4002
Retrieving: QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
[*] Querying DHT for providers...
[DHT] Finding peers closest to QmYwAP...
[DHT] XOR distance calculation:
Peer 12D3KooWRq4tRk8j... distance: 0x3f2a1b...
Peer 12D3KooWXy9z... distance: 0x7c4d2e...
[DHT] Querying 20 closest peers...
[DHT] Found 2 providers:
✓ 12D3KooWRq4tRk8j... (Node A) - RTT: 2ms
✓ 12D3KooWZa5x... (Node C) - RTT: 45ms
[*] Fetching Merkle DAG root...
[Node A] Sending root node (1.2 KB)
[*] Parsing DAG structure...
Total size: 2.4 MB
Chunks needed: 10
[*] Downloading chunks in parallel (max 5 concurrent)...
[#####-----] Chunk 0/10 - downloading from 12D3KooWRq4t... (256 KB)
[#####-----] Chunk 1/10 - downloading from 12D3KooWRq4t... (256 KB)
[#####-----] Chunk 2/10 - downloading from 12D3KooWZa5x... (256 KB)
[#####-----] Chunk 3/10 - downloading from 12D3KooWRq4t... (256 KB)
[#####-----] Chunk 4/10 - downloading from 12D3KooWZa5x... (256 KB)
...
[##########] All chunks downloaded (2.4 MB)
[*] Verifying chunk hashes...
Chunk 0: Hash matches QmHash0... ✓
Chunk 1: Hash matches QmHash1... ✓
...
All chunks verified ✓
[*] Reconstructing file...
+---------------------------------------------+
| File retrieved: whitepaper.pdf (2.4 MB) |
| Downloaded in 0.8 seconds (3.0 MB/s) |
| SHA-256 matches original ✓                  |
+---------------------------------------------+
# Terminal 5 - Pin the file (keep it permanently)
$ ./ipfs-lite pin add QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
[Node B] Pinning CID: QmYwAP...
[Node B] Pinned content will not be garbage collected
[Node B] Announcing as provider to DHT...
[DHT] Now 3 peers provide this content
Pinned: QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
# Check network statistics
$ ./ipfs-lite stats
+============================================================+
| IPFS-Lite Network Statistics |
+============================================================+
| Local Peer ID: 12D3KooWPq8mTUv2oX4cD5nN9sVzQy8R4GC0 |
| |
| DHT Information: |
| Connected peers: 47 |
| Routing table size: 256 k-buckets |
| K-value: 20 |
| Closest peers: 20 |
| |
| Storage: |
| Blocks stored: 423 |
| Total size: 156 MB |
| Pinned blocks: 89 |
| Pinned size: 34 MB |
| |
| Bandwidth (last hour): |
| Downloaded: 245 MB |
| Uploaded: 89 MB |
| Peers served: 12 |
| |
| Provider Records: |
| Content we provide: 89 CIDs |
| DHT announcements: 2,140 |
+============================================================+
# Test content addressing (same content = same CID)
$ ./ipfs-lite add file1.txt
CID: QmXa5b...
$ ./ipfs-lite add file2.txt # Same content as file1.txt
CID: QmXa5b... # Same CID! Content-addressed
# Test DHT lookup manually
$ ./ipfs-lite dht findprovs QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
Searching DHT for providers of: QmYwAP...
DHT Lookup Process:
1. Hash CID to 256-bit key: 0x3a7f2b1c...
2. Find k-closest peers to key using XOR metric
3. Iteratively query closer and closer peers (Kademlia algorithm)
Iteration 1: Querying 3 bootstrap peers
Found 15 peers closer to key
Iteration 2: Querying 15 peers
Found 8 peers even closer
Iteration 3: Querying 8 peers
Found 3 provider records ✓
Providers:
✓ 12D3KooWRq4tRk8j... (Node A) - /ip4/127.0.0.1/tcp/4001
✓ 12D3KooWPq8mTUv2... (Node B) - /ip4/127.0.0.1/tcp/4002
✓ 12D3KooWZa5x... (Node C) - /ip4/192.168.1.50/tcp/4001
Lookup completed in 3 hops (< 0.1s)
# Simulate node going offline (test resilience)
$ ./ipfs-lite daemon --port 4003 --bootstrap /ip4/127.0.0.1/tcp/4001
[Node D] Connected to network
[Node D] Fetching QmYwAP...
# Even if Node A goes offline...
[Node A] Shutting down...
# Node D can still fetch from Node B or C
[Node D] Node A unreachable, trying alternative provider
[Node D] Downloading from Node B... ✓
[Node D] File retrieved successfully
Content remains available as long as ONE peer has it!
# View DAG structure visually
$ ./ipfs-lite dag inspect QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
Merkle DAG Structure:
================================================================================
Root: QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
Type: UnixFS File
Size: 2,516,582 bytes (2.4 MB)
Links (10):
+-- [0] QmHash0... -> Chunk (262144 bytes)
+-- [1] QmHash1... -> Chunk (262144 bytes)
+-- [2] QmHash2... -> Chunk (262144 bytes)
+-- [3] QmHash3... -> Chunk (262144 bytes)
+-- [4] QmHash4... -> Chunk (262144 bytes)
+-- [5] QmHash5... -> Chunk (262144 bytes)
+-- [6] QmHash6... -> Chunk (262144 bytes)
+-- [7] QmHash7... -> Chunk (262144 bytes)
+-- [8] QmHash8... -> Chunk (262144 bytes)
+-- [9] QmHash9... -> Chunk (131072 bytes)
CID Format:
Version: 1
Codec: dag-pb (0x70)
Multihash: sha2-256
Base Encoding: base58-btc
Verification:
✓ All chunk hashes valid
✓ DAG structure consistent
✓ Total size matches sum of chunks
================================================================================
You've built a censorship-resistant, content-addressed storage network where data is identified by its hash (not location), distributed across peers, and verifiable without trusting any single party. This is how NFT metadata, dApp frontends, and Web3 data live beyond centralized servers.
Implementation Phases
Phase 1: Content Addressing Foundation
Goal: Compute CIDs from file bytes.
Tasks:
- Implement SHA-256 hashing
- Create multihash encoding (function code + length + hash)
- Build CIDv1 structure (version + codec + multihash)
- Implement base58 encoding for human-readable CIDs
Validation:
data := []byte("Hello, IPFS!")
cid := ComputeCID(data)
fmt.Println(cid.String()) // Should match real IPFS CID
Hints if stuck:
- Start with the go-cid and go-multihash libraries to understand the format
- Then implement from scratch for learning
- Verify against real IPFS: echo "Hello, IPFS!" | ipfs add --only-hash
Phase 2: File Chunking
Goal: Split large files into fixed-size chunks.
Tasks:
- Implement 256 KB chunk splitting
- Handle files smaller than chunk size
- Handle files with non-aligned sizes
- Compute CID for each chunk
Validation:
chunks := ChunkFile(largeFileBytes, 256*1024)
for i, chunk := range chunks {
fmt.Printf("Chunk %d: %s (%d bytes)\n", i, ComputeCID(chunk), len(chunk))
}
Hints if stuck:
- Last chunk will usually be smaller
- Consider edge case: empty file
- Consider edge case: file exactly chunk size
Phase 3: Merkle DAG Construction
Goal: Link chunks into a root node.
Tasks:
- Define DAGNode and Link structures
- Build links from chunks
- Serialize root node (protobuf or CBOR)
- Compute root CID
Validation:
root := BuildDAG(chunks)
fmt.Println("Root CID:", root.CID)
fmt.Println("Links:", len(root.Links))
Hints if stuck:
- Study the UnixFS protobuf format
- Root node contains links, not raw data
- CID of root = hash of serialized root node
Phase 4: Local Block Store
Goal: Persist blocks to disk.
Tasks:
- Design key-value store interface
- Implement file-system-based storage (CID -> bytes)
- Add block retrieval
- Implement simple garbage collection
Validation:
$ ls ~/.ipfs-lite/blocks/
QmHash0... QmHash1... QmHash2... root...
Hints if stuck:
- Shard directories by first 2 chars of CID (prevents too many files in one dir)
- Consider using LevelDB or BadgerDB for production
- GC: mark pinned, sweep unpinned
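A sketch of the sharded on-disk layout (the path scheme is an assumption; real IPFS's flatfs datastore shards differently, but the idea is the same):

import (
	"os"
	"path/filepath"
)

// blockPath maps a CID to a file path, sharding by the first two
// characters of its string form so no single directory grows too large.
func blockPath(root string, c cid.Cid) string {
	s := c.String()
	return filepath.Join(root, s[:2], s)
}

// putBlock writes a block to disk under its CID.
func putBlock(root string, c cid.Cid, data []byte) error {
	p := blockPath(root, c)
	if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
		return err
	}
	return os.WriteFile(p, data, 0o644)
}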
Phase 5: DHT Routing Table
Goal: Implement k-bucket structure.
Tasks:
- Generate peer ID from keypair
- Implement XOR distance calculation
- Build k-bucket array (256 buckets, k=20)
- Add peer insertion/lookup logic
Validation:
rt := NewRoutingTable(myPeerID, 20) // k = 20
rt.AddPeer(peer1)
rt.AddPeer(peer2)
closest := rt.FindClosest(targetKey, 3)
Hints if stuck:
- Distance is calculated by XORing two 256-bit values
- Find which bucket by finding first differing bit position
- Evict stale peers when bucket is full
Phase 6: DHT Operations
Goal: Implement FindNode and FindProviders.
Tasks:
- Implement iterative FindNode (find peers closest to key)
- Implement provider record storage
- Implement FindProviders query
- Add provider announcement
Validation:
providers := dht.FindProviders(cid)
for _, p := range providers {
fmt.Println("Provider:", p.ID, p.Addrs)
}
Hints if stuck:
- Alpha=3: query 3 peers at a time
- Terminate when no closer peers found
- Provider records have TTL (24 hours)
Phase 7: P2P Networking
Goal: Establish peer connections.
Tasks:
- Set up TCP listener
- Implement connection handling
- Add bootstrap node connection
- Handle multiaddr parsing
Validation:
$ ./ipfs-lite daemon
Listening on /ip4/0.0.0.0/tcp/4001/p2p/12D3KooW...
Connected peers: 5
Hints if stuck:
- Use libp2p for production quality
- For learning, raw TCP + length-prefixed messages works
- Handle connection failures gracefully
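For the raw-TCP route, a common framing is a 4-byte big-endian length followed by the payload; a sketch:

import (
	"encoding/binary"
	"fmt"
	"io"
)

// writeMsg frames a message as <uint32 big-endian length><payload>.
func writeMsg(w io.Writer, payload []byte) error {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readMsg reads one length-prefixed message, rejecting absurd sizes.
func readMsg(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(hdr[:])
	if n > 4*1024*1024 { // sanity cap: a block should never exceed a few MB
		return nil, fmt.Errorf("message too large: %d bytes", n)
	}
	buf := make([]byte, n)
	_, err := io.ReadFull(r, buf)
	return buf, err
}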
Phase 8: Block Exchange
Goal: Request and serve blocks between peers.
Tasks:
- Implement WANT/HAVE/BLOCK messages
- Add block request handling
- Implement parallel downloads
- Add block verification on receipt
Validation:
block := peer.RequestBlock(cid)
if ComputeCID(block) != cid {
return errors.New("block verification failed")
}
Hints if stuck:
- Verify every received block by hashing
- Download multiple blocks in parallel
- Retry with different peer on failure
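A sketch of bounded parallel fetching with verification (requestBlock is a hypothetical "ask some peer for this block" call; computeCID is the helper from Hint 1 below):

// fetchChunks downloads every chunk CID concurrently (at most maxParallel
// at a time), verifies each block against its CID, and returns the blocks
// in their original order.
func fetchChunks(cids []cid.Cid, maxParallel int) ([][]byte, error) {
	results := make([][]byte, len(cids))
	errs := make(chan error, len(cids))
	sem := make(chan struct{}, maxParallel)
	var wg sync.WaitGroup

	for i, c := range cids {
		wg.Add(1)
		go func(i int, c cid.Cid) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a download slot
			defer func() { <-sem }() // release it

			data, err := requestBlock(c)
			if err != nil {
				errs <- err
				return
			}
			got, err := computeCID(data)
			if err != nil || !got.Equals(c) {
				errs <- fmt.Errorf("verification failed for %s", c)
				return
			}
			results[i] = data
		}(i, c)
	}
	wg.Wait()
	close(errs)
	if err := <-errs; err != nil { // report the first error, if any
		return nil, err
	}
	return results, nil
}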
Phase 9: Full Integration
Goal: Complete add/get workflow.
Tasks:
- Wire up all components
- Implement CLI commands
- Add pinning support
- Test with multiple nodes
Validation:
# Node 1
$ ./ipfs-lite add file.pdf
CID: Qm...
# Node 2 (different machine)
$ ./ipfs-lite get Qm...
Downloaded: file.pdf
The Interview Questions Theyโll Ask
Prepare to answer these:
1. โExplain how IPFS is different from HTTP.โ
Expected answer: HTTP is location-based: URLs point to specific servers. IPFS is content-based: CIDs identify data by its hash, not where it lives. This enables verification (you can check the hash), deduplication (same content = same CID), and decentralization (fetch from anyone who has it).
2. โWhat happens if I change one byte in a file on IPFS?โ
Expected answer: The entire file gets a new CID because the hash changes. The old CID still points to the old content (immutability). There's no "updating" a CID; you create a new one. This is similar to Git commits.
3. โHow does the DHT find content in a large network?โ
Expected answer: Kademlia uses XOR distance between peer IDs and content keys. It iteratively queries peers closer and closer to the target key. Each hop cuts the search space roughly in half, achieving O(log N) lookup complexity. For a million-node network, this means about 20 hops.
4. โWhat if no peer has the content Iโm looking for?โ
Expected answer: The DHT query returns empty; no providers are found. The content is effectively lost unless someone who has it comes back online or re-adds it. This is why pinning and redundancy are important.
5. โHow does IPFS prevent Sybil attacks on the DHT?โ
Expected answer: Peer IDs are hashes of public keys. While creating many IDs is cheap, they're randomly distributed in the keyspace. An attacker can't control which k-buckets they land in or which content they're "close" to. Additionally, provider records are stored on the k closest peers, requiring the attacker to control many IDs near each target.
6. โWhatโs the difference between IPFS and BitTorrent?โ
Expected answer:
- BitTorrent uses per-file swarms with .torrent files. IPFS has a global DHT where any CID can be looked up.
- BitTorrent is mainly for large file sharing. IPFS handles any data size and supports directories.
- IPFS integrates content addressing at the protocol level. BitTorrent uses info hashes, but content is still discovered via trackers or its separate DHT.
- IPFS supports Merkle DAGs for incremental updates. BitTorrent treats files as opaque chunks.
7. โWhy do NFTs use IPFS for metadata and images?โ
Expected answer: On-chain storage is extremely expensive (hundreds of dollars per KB). IPFS provides:
- Decentralized storage that can't easily be censored
- Content addressing that guarantees the metadata/image matches what the NFT references
- Permanence as long as any node pins the content
- Verification without trusting the hosting server
8. โWhatโs pinning and why is it needed?โ
Expected answer: Pinning tells your node to keep content forever and not garbage collect it. Without pinning, unpopular content that no one is actively requesting will eventually be removed to free space. Pinning services (like Pinata, Infura) provide redundant storage so content survives even if the original uploader goes offline.
Hints in Layers
Hint 1: Start with Simple Content Addressing
Compute CID from bytes:
import (
"crypto/sha256"
"github.com/multiformats/go-multihash"
"github.com/ipfs/go-cid"
)
func computeCID(data []byte) (cid.Cid, error) {
// 1. Hash the data
hash := sha256.Sum256(data)
// 2. Create multihash
mh, err := multihash.Encode(hash[:], multihash.SHA2_256)
if err != nil {
return cid.Cid{}, err
}
// 3. Create CID v1
c := cid.NewCidV1(cid.Raw, mh)
return c, nil
}
Hint 2: Chunking Large Files
const ChunkSize = 256 * 1024 // 256 KB
func chunkFile(data []byte) [][]byte {
var chunks [][]byte
for i := 0; i < len(data); i += ChunkSize {
end := i + ChunkSize
if end > len(data) {
end = len(data)
}
chunks = append(chunks, data[i:end])
}
return chunks
}
Hint 3: Build Merkle DAG
type DAGNode struct {
Data []byte
Links []Link
}
type Link struct {
Name string
Cid cid.Cid
Size uint64
}
func buildDAG(chunks [][]byte) (cid.Cid, error) {
// Create leaf nodes for each chunk
var links []Link
for i, chunk := range chunks {
chunkCID, _ := computeCID(chunk)
links = append(links, Link{
Name: fmt.Sprintf("chunk-%d", i),
Cid: chunkCID,
Size: uint64(len(chunk)),
})
// Store chunk in local blockstore
storeBlock(chunkCID, chunk)
}
// Create root node with links
rootNode := DAGNode{Links: links}
rootBytes := encodeNode(rootNode) // Protobuf or CBOR
rootCID, _ := computeCID(rootBytes)
storeBlock(rootCID, rootBytes)
return rootCID, nil
}
Hint 4: Simple DHT (Kademlia basics)
type RoutingTable struct {
localID PeerID
buckets [256]*KBucket // One bucket per bit in ID
}
type KBucket struct {
peers []PeerInfo
k int // Usually 20
}
func (rt *RoutingTable) FindClosest(key []byte, count int) []PeerInfo {
// Compute XOR distance between key and each peer ID
// Return 'count' closest peers
}
func (rt *RoutingTable) FindProviders(cid cid.Cid) ([]PeerInfo, error) {
key := cid.Hash()
// Iterative lookup
visited := make(map[string]bool)
toQuery := rt.FindClosest(key, 3) // Start with 3 closest
for len(toQuery) > 0 {
peer := toQuery[0]
toQuery = toQuery[1:]
if visited[peer.ID] {
continue
}
visited[peer.ID] = true
// Ask peer for providers
providers := peer.FindProviders(cid)
if len(providers) > 0 {
return providers, nil
}
// Ask peer for closer peers
closerPeers := peer.FindCloserPeers(key)
toQuery = append(toQuery, closerPeers...)
}
return nil, errors.New("no providers found")
}
Hint 5: Use Existing IPFS Libraries
Don't build from scratch (unless you want the deepest learning):
- go-ipfs: Full IPFS implementation (study this!)
- go-cid: CID handling
- go-multihash: Multihash encoding
- libp2p: P2P networking layer
- go-datastore: Key-value storage
Hint 6: Test with Real IPFS
Verify your CIDs match real IPFS:
# Add file to your implementation
$ ./ipfs-lite add test.txt
CID: QmXa5b...
# Add same file to real IPFS
$ ipfs add test.txt
QmXa5b... # Should be SAME CID!
# Fetch from real IPFS gateway
$ curl https://ipfs.io/ipfs/QmXa5b...
# Should return your file
Testing Strategy
Unit Tests
func TestComputeCID_Empty(t *testing.T) {
cid := ComputeCID([]byte{})
// Empty data should still produce valid CID
assert.NotEmpty(t, cid.String())
}
func TestComputeCID_Deterministic(t *testing.T) {
data := []byte("Hello, IPFS!")
cid1 := ComputeCID(data)
cid2 := ComputeCID(data)
assert.Equal(t, cid1, cid2)
}
func TestChunking_SmallFile(t *testing.T) {
data := []byte("small")
chunks := ChunkFile(data)
assert.Equal(t, 1, len(chunks))
}
func TestChunking_ExactMultiple(t *testing.T) {
data := make([]byte, 256*1024*3) // Exactly 3 chunks
chunks := ChunkFile(data)
assert.Equal(t, 3, len(chunks))
}
func TestDAG_Reconstruction(t *testing.T) {
original := randomBytes(1024 * 1024) // 1 MB
// Add to system
rootCID := Add(original)
// Retrieve
retrieved := Get(rootCID)
assert.Equal(t, original, retrieved)
}
DHT Tests
func TestXORDistance(t *testing.T) {
a := []byte{0xFF, 0x00}
b := []byte{0xF0, 0x00}
dist := XORDistance(a, b)
expected := []byte{0x0F, 0x00}
assert.Equal(t, expected, dist)
}
func TestKBucket_Insertion(t *testing.T) {
rt := NewRoutingTable(myID, 20)
// Insert 25 peers
for i := 0; i < 25; i++ {
rt.AddPeer(randomPeer())
}
// Bucket should have at most k=20 peers
bucket := rt.GetBucket(0)
assert.LessOrEqual(t, len(bucket.peers), 20)
}
func TestFindClosest(t *testing.T) {
rt := NewRoutingTable(myID, 20)
// Add known peers
rt.AddPeer(peer1)
rt.AddPeer(peer2)
rt.AddPeer(peer3)
closest := rt.FindClosest(targetKey, 2)
// Verify returned peers are actually closest
dist0 := XORDistance(closest[0].ID, targetKey)
dist1 := XORDistance(closest[1].ID, targetKey)
assert.True(t, bytes.Compare(dist0, dist1) <= 0)
}
Integration Tests
func TestEndToEnd_TwoNodes(t *testing.T) {
// Start two nodes
node1 := StartNode(4001, "")           // port, no bootstrap
node2 := StartNode(4002, node1.Addr()) // bootstrap via node1
// Add file on node1
data := []byte("test data for e2e")
cid := node1.Add(data)
// Retrieve on node2
retrieved := node2.Get(cid)
assert.Equal(t, data, retrieved)
// Verify node2 is now a provider
providers := node1.DHT.FindProviders(cid)
assert.Contains(t, providers, node2.ID)
}
func TestResilience_NodeOffline(t *testing.T) {
node1 := StartNode(4001, "")
node2 := StartNode(4002, node1.Addr())
node3 := StartNode(4003, node1.Addr())
// Add on node1
cid := node1.Add(testData)
// Fetch on node2 (becomes provider)
node2.Get(cid)
// Stop node1
node1.Stop()
// Node3 should still be able to get from node2
retrieved := node3.Get(cid)
assert.Equal(t, testData, retrieved)
}
Learning Milestones
Track your progress:
- Content addressing works - You compute CIDs that match real IPFS
- Large files chunk correctly - Files split and reassemble properly
- Merkle DAG builds correctly - Root CID represents entire file structure
- DHT finds providers - Iterative lookup locates content
- Multi-peer download works - Parallel fetching from multiple sources
- Pinning preserves content - GC doesnโt remove pinned data
- Node resilience proven - Content survives node failures
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Content addressing | "Designing Data-Intensive Applications" | Ch. 6: Partitioning & Ch. 11: Stream Processing |
| Merkle DAGs | "Mastering Bitcoin" | Ch. 9: The Blockchain |
| Distributed hash tables | "Designing Data-Intensive Applications" | Ch. 6: Partitioning (DHT section) |
| P2P networking | "Computer Networks, Fifth Edition" | Ch. 2: Application Layer (P2P section) |
| Cryptographic hashing | "Serious Cryptography, 2nd Edition" | Ch. 6: Hash Functions |
| Distributed systems | "Designing Data-Intensive Applications" | Ch. 5: Replication & Ch. 6: Partitioning |
| Consensus and fault tolerance | "Designing Data-Intensive Applications" | Ch. 8: Trouble with Distributed Systems |
Additional Resources
Official Documentation
- IPFS Documentation - Comprehensive guide to IPFS concepts
- IPFS Specifications - Protocol specifications
- Content Addressing - CID deep dive
- libp2p Documentation - P2P networking stack
Academic Papers
- Kademlia: A Peer-to-peer Information System Based on the XOR Metric - Original DHT paper
- IPFS - Content Addressed, Versioned, P2P File System - IPFS whitepaper
Implementation References
- go-ipfs source code - Reference implementation
- BitSwap Specification - Block exchange protocol
- libp2p specs - Networking layer specs
Community Resources
- How IPFS DHT Works - Visual DHT explanation
- ProtoSchool - Interactive IPFS tutorials
- IPFS Camp - Workshop materials
What Youโll Have Built
By completing this project, you will have created:
- A content-addressed storage system that identifies data by cryptographic hash
- A chunking and DAG engine that handles files of any size efficiently
- A Kademlia DHT implementation with k-buckets and iterative routing
- A peer-to-peer network with connection management and block exchange
- A complete CLI tool for adding, getting, and pinning content
More importantly, you'll deeply understand:
- Why Web3 needs a separate data layer from blockchains
- How content addressing enables trustless verification
- How DHTs scale to millions of nodes
- Why IPFS is the backbone of NFT metadata, dApp frontends, and decentralized storage
This knowledge directly applies to:
- Building dApps that store data off-chain
- Understanding how Filecoin incentivizes IPFS storage
- Designing censorship-resistant systems
- Working with any content-addressed system (Git, Nix, etc.)