
LEARN CRDT DEEP DIVE PROJECTS


Learn CRDTs: From Zero to CRDT Master

Goal: Deeply understand Conflict-Free Replicated Data Types (CRDTs)—the mathematical foundation of modern collaborative software. You will move from basic counters to complex sequence CRDTs used in text editors, learning how to achieve Strong Eventual Consistency (SEC) without central servers or locking. By the end, you’ll be able to build “local-first” applications that work offline and sync seamlessly across the globe.


Why CRDTs Matter

In the early days of computing, data lived in one place. If two people wanted to edit the same file, they took turns. As the internet grew, we moved to “Strong Consistency” models where a central server (like a database) acted as the ultimate arbiter of truth. But this model breaks at scale: it’s slow, it requires you to be online, and it fails during network partitions.

CRDTs represent a paradigm shift. Formally defined in 2011, they allow data to be updated concurrently on different machines without any coordination. When these machines eventually talk to each other, the CRDTs merge the changes deterministically. There are no “merge conflicts” in the traditional sense; the data structure itself is designed to resolve them.

What understanding this unlocks:

  • Local-First Software: Apps that feel instantaneous because they don’t wait for a server.
  • P2P Collaboration: Real-time editing (like Google Docs) without a central authority.
  • Partition Tolerance: Systems that keep working even when the internet is cut.
  • Distributed Databases: Systems like Riak or CosmosDB use CRDTs to scale globally.

Core Concept Analysis

1. The Anatomy of a CRDT

A CRDT is essentially a data structure plus a merge function. For state-based CRDTs (CvRDTs), the set of possible states must form a Join-Semilattice, and the merge function must compute the join (least upper bound) of two states.

    Replica A (State S1)          Replica B (State S2)
           |                             |
     Local Update (f)              Local Update (g)
           |                             |
        State S1'                     State S2'
           \                           /
            \       NETWORK SYNC      /
             \_______________________/
                        |
                 MERGE(S1', S2')
                        |
             Converged State (S_final)

2. The Mathematical “Triple Threat”

To guarantee that every replica reaches the same state regardless of the order of messages, the merge function ($\sqcup$) must satisfy:

  1. Commutativity: $A \sqcup B = B \sqcup A$ (Order doesn’t matter).
  2. Associativity: $(A \sqcup B) \sqcup C = A \sqcup (B \sqcup C)$ (Grouping doesn’t matter).
  3. Idempotence: $A \sqcup A = A$ (Duplicates don’t matter).
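The three laws can be checked mechanically for a candidate merge function. The snippet below is a minimal sketch, assuming the state is a dictionary of per-node integers merged with an entry-wise max; the names `merge`, `samples`, and `check_laws` are illustrative and not taken from any library.

```python
from itertools import product

def merge(a: dict, b: dict) -> dict:
    """Entry-wise max over the union of keys (a candidate join)."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

# A handful of sample states; any dicts of non-negative ints would do.
samples = [{}, {"A": 1}, {"A": 2, "B": 1}, {"B": 3}, {"A": 1, "C": 4}]

def check_laws(states) -> bool:
    for a, b, c in product(states, repeat=3):
        assert merge(a, b) == merge(b, a)                      # commutativity
        assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associativity
    for a in states:
        assert merge(a, a) == a                                # idempotence
    return True

print(check_laws(samples))  # True
```

Passing on a few samples is evidence, not a proof, but a failing case is a quick way to find out that a merge function cannot be a valid join.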

3. State-Based vs. Operation-Based

  • CvRDT (Convergent): You send the whole state. Easier to reason about, handles message loss/duplicates naturally, but can be bandwidth-heavy.
  • CmRDT (Commutative): You send the operation (e.g., “add 5”). Requires a delivery layer that guarantees “Exactly Once” and “Causal Order.”
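The difference shows up as soon as the network delivers a message twice. The following sketch contrasts the two styles under that one failure; `apply_op` and `merge_state` are hypothetical helper names, and the operation format is an assumption made for illustration.

```python
def apply_op(total: int, op: tuple) -> int:
    """Op-based: the message is the operation itself."""
    kind, amount = op
    return total + amount if kind == "add" else total

def merge_state(a: dict, b: dict) -> dict:
    """State-based: the message is the sender's whole state."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

op = ("add", 5)
once = apply_op(0, op)
twice = apply_op(once, op)      # the same op delivered a second time
print(once, twice)              # 5 10

remote = {"A": 5}
s1 = merge_state({}, remote)
s2 = merge_state(s1, remote)    # the same state delivered a second time
print(s1 == s2)                 # True
```

This is why the op-based style pushes the deduplication work into the delivery layer, while the state-based style pays for its robustness in message size.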

4. Vector Clocks: Tracking History

To know if two events are concurrent or if one happened before another, we use Vector Clocks.

Replica A: [A:1, B:0, C:0] --(Sync)--> Replica B: [A:1, B:1, C:0]
      |                                      |
   Update                                 Update
      |                                      |
Replica A: [A:2, B:0, C:0]             Replica B: [A:1, B:2, C:0]
      \                                     /
       \__________ CONCURRENT _____________/

Concept Summary Table

| Concept Cluster | What You Need to Internalize |
|:----------------|:-----------------------------|
| Join-Semilattice | The mathematical structure of state-based CRDTs. States only move “up” in a partial order. |
| Causality | Knowing if event A “happened before” event B. Critical for removing elements correctly. |
| Tombstones | How we “delete” data in a system where we can’t actually throw anything away (yet). |
| Idempotence | Why receiving the same packet twice shouldn’t break your data. |
| Sequence CRDTs | The “Final Boss” of CRDTs. Handling indices in a list that everyone is changing at once. |

Deep Dive Reading by Concept

Distributed Systems Foundations

| Concept | Book & Chapter |
|:--------|:---------------|
| Eventual Consistency | “Designing Data-Intensive Applications” by Martin Kleppmann, Ch. 5: “Replication” |
| Vector Clocks | “Distributed Systems: Principles and Paradigms” by Tanenbaum, Ch. 6: “Synchronization” |

CRDT Specifics

| Concept | Paper/Resource |
|:--------|:---------------|
| The Seminal Paper | “A comprehensive study of Convergent and Commutative Replicated Data Types” by Marc Shapiro et al. |
| Local-First | “Local-first software: You own your data, in spite of the cloud” by Ink & Switch (Blog/Paper) |

Project List

Projects are ordered from fundamental mathematical building blocks to complex sequence implementations.


Project 1: The G-Counter (Grow-only Counter)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Go, Rust, JavaScript
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Distributed Systems / Data Structures
  • Software or Tool: Distributed Analytics (e.g., page view counter)
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A distributed counter where multiple independent nodes can increment the value. When nodes sync, the counter always reaches the correct global total without a central leader.

Why it teaches CRDTs: It introduces the concept of a “state-based” CRDT and the “Max” merge function. You’ll learn why a simple total = total + 1 fails in a distributed environment and how to track per-node state.

Core challenges you’ll face:

  • Partial Updates → How to represent increments so they can be merged multiple times without double-counting.
  • Node Identity → Why each node needs a unique ID in the system.
  • Idempotent Merging → Ensuring that syncing twice with the same data doesn’t change the result.

Key Concepts:

  • State-based CRDTs: Shapiro et al., “A comprehensive study…” Section 2.1
  • Join-Semilattice: “CRDT Tutorial” - Code & Cluster

  • Difficulty: Beginner
  • Time estimate: 2-4 hours
  • Prerequisites: Basic knowledge of dictionaries/maps and JSON.


Real World Outcome

You’ll have a script that simulates three nodes (A, B, C). You can increment them independently and then merge them to see the consistent total.

Example Output:

$ python g_counter.py
Node A increments -> State: {'A': 1, 'B': 0, 'C': 0}
Node B increments twice -> State: {'A': 0, 'B': 2, 'C': 0}
--- Syncing A and B ---
Merged State: {'A': 1, 'B': 2, 'C': 0} | Global Count: 3
Node C increments -> State: {'A': 0, 'B': 0, 'C': 1}
--- Syncing all ---
Final State: {'A': 1, 'B': 2, 'C': 1} | Global Count: 4

The Core Question You’re Answering

“If we both count ‘1’, and then we share our totals, how do I know if that’s the same ‘1’ or two different ‘1’s?”

Before you write any code, sit with this question. In a centralized system, the server just adds. In a distributed system, you need to know who did the adding to avoid “over-counting” during sync.


Concepts You Must Understand First

Stop and research these before coding:

  1. Idempotence
    • If I call merge(stateA, stateB) twice, should the result change?
    • Why is total += 1 not idempotent?
    • Book Reference: “Designing Data-Intensive Applications” Ch. 5
  2. Vector-like Storage
    • How can we store counts for each individual node separately?
    • What happens when a new node joins the cluster?

Questions to Guide Your Design

Before implementing, think through these:

  1. The Merge Function
    • If Node A has {A: 5} and Node B has {A: 3}, what should the merged value for ‘A’ be? Why?
    • Does it matter if I merge A into B or B into A?
  2. The Query Function
    • How do you calculate the “Total” from the internal dictionary?

Thinking Exercise

The Over-counting Trap

Imagine two nodes, A and B, both start at 0.

  1. A increments to 1.
  2. B increments to 1.
  3. They sync. The total should be 2.

Now imagine they sync again. If your merge logic is total = A.total + B.total, what happens on the second sync?

Questions while tracing:

  • How does storing {'A': 1, 'B': 1} solve this?
  • Why does max(A_val, B_val) protect you from network duplicates?
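The trap can be traced in a few lines. This is a minimal sketch of the exercise above, with `naive_sync` and `crdt_sync` as illustrative names: the first merges bare totals by addition, the second merges per-node maps with max.

```python
def naive_sync(local_total: int, remote_total: int) -> int:
    """Broken merge: adds the remote total every time it arrives."""
    return local_total + remote_total

def crdt_sync(local: dict, remote: dict) -> dict:
    """Per-node merge: keeps the largest count seen for each node."""
    return {k: max(local.get(k, 0), remote.get(k, 0))
            for k in local.keys() | remote.keys()}

# Naive totals: A and B each incremented once.
a_total, b_total = 1, 1
a_total = naive_sync(a_total, b_total)   # 2, looks right
a_total = naive_sync(a_total, b_total)   # 3, B's increment counted twice
print(a_total)                           # 3

# Per-node maps: the second sync changes nothing.
a_state, b_state = {"A": 1}, {"B": 1}
a_state = crdt_sync(a_state, b_state)
a_state = crdt_sync(a_state, b_state)
print(sum(a_state.values()))             # 2
```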

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is a Join-Semilattice in the context of CRDTs?”
  2. “Why does every node in a G-Counter need a unique ID?”
  3. “Is a G-Counter useful for a ‘Number of users online’ feature? Why/Why not?”
  4. “What happens to the state size as the number of nodes increases?”
  5. “Can a G-Counter handle decrements?”

Hints in Layers

Hint 1 (The Internal State): Don’t store a single integer. Store a map where keys are node IDs and values are integers.

Hint 2 (Increments): When Node A wants to increment, it only updates its own entry in its local map: my_state['A'] += 1.

Hint 3 (Merging): To merge two maps, create a new map. For every key that exists in either map, the value in the new map should be the maximum of the values found in the two source maps.

Hint 4 (Summing): The user-facing “Value” of the counter is simply the sum of all values in the map.


Books That Will Help

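| Topic | Book | Chapter |
|:------|:-----|:--------|
| Replication Basics | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 5 |
| CRDT Formalisms | “A comprehensive study of Convergent and Commutative Replicated Data Types” by Shapiro et al. | Section 3.1 |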

Implementation Hints

  1. State Initialization: Every node starts with an empty map or a map initialized with 0 for its own ID.
  2. The Max Property: The core of the merge is new_state[id] = max(local_state.get(id, 0), remote_state.get(id, 0)). This ensures that even if you receive an old message, you don’t “go backwards” in time.
  3. Partial Order: Recognize that {'A': 2} is “greater than” {'A': 1}. This is the partial order that makes the semilattice work.
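Put together, the hints above could look like the following. This is one possible sketch rather than a reference solution; the class name `GCounter` and the fixed three-node cluster are assumptions chosen to mirror the example output earlier in this project.

```python
class GCounter:
    """State-based grow-only counter: one non-negative entry per node."""

    def __init__(self, node_id: str, nodes=("A", "B", "C")):
        self.node_id = node_id
        self.state = {n: 0 for n in nodes}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.state[self.node_id] += amount   # only touch our own entry

    def merge(self, other: "GCounter") -> None:
        # Entry-wise max, so stale or duplicate states never move us backwards.
        for k in self.state.keys() | other.state.keys():
            self.state[k] = max(self.state.get(k, 0), other.state.get(k, 0))

    def value(self) -> int:
        return sum(self.state.values())


a, b, c = GCounter("A"), GCounter("B"), GCounter("C")
a.increment()
b.increment()
b.increment()
a.merge(b)
print(a.state, a.value())   # {'A': 1, 'B': 2, 'C': 0} 3
c.increment()
a.merge(c)
a.merge(c)                  # syncing twice changes nothing
print(a.state, a.value())   # {'A': 1, 'B': 2, 'C': 1} 4
```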

Learning Milestones

  1. Local Increment works → You understand per-node state.
  2. Merge handles concurrent increments → You’ve achieved eventual consistency.
  3. Double-sync is idempotent → You understand the “Max” property’s importance.

Project 2: The PN-Counter (Positive-Negative Counter)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Rust, Go, TypeScript
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Distributed Systems
  • Software or Tool: Distributed Inventory or Vote Counter
  • Main Book: “A comprehensive study of Convergent and Commutative Replicated Data Types” by Marc Shapiro

What you’ll build: A counter that supports both increments and decrements (e.g., a “Likes” counter where you can un-like).

Why it teaches CRDTs: It teaches “Composition.” You’ll learn that most complex CRDTs are just combinations of simpler ones. It also explains why you can’t just “decrement” a G-Counter without breaking the mathematical properties.

Core challenges you’ll face:

  • Non-Monotonicity → How to handle “going down” when the underlying math only allows “going up.”
  • Composition → Managing two internal G-Counters as a single unit.

  • Difficulty: Beginner
  • Time estimate: 2 hours
  • Prerequisites: Project 1 (G-Counter).


Real World Outcome

A counter where you can call inc() and dec() across nodes and still get the right sum.

Example Output:

$ python pn_counter.py
Node A: inc, inc, dec -> State: {P: {'A': 2}, N: {'A': 1}} | Val: 1
Node B: inc, inc -> State: {P: {'B': 2}, N: {'B': 0}} | Val: 2
--- Sync A and B ---
Final Val: 3 (4 total increments, 1 total decrement)

The Core Question You’re Answering

“If a CRDT can only grow (monotonicity), how do we represent a value that goes down?”

Think about how accounting works. Accountants don’t erase numbers; they add a “debit” entry.


Concepts You Must Understand First

  1. Composition of Semilattices
    • If A and B are Join-Semilattices, is (A, B) a Join-Semilattice?
    • Book Reference: “A comprehensive study of Convergent and Commutative Replicated Data Types” by Shapiro et al.
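One way to see the composition is to merge a pair of G-Counters component by component. The sketch below assumes that approach; `PNCounter`, `merge_g`, and the field names `p` and `n` are illustrative choices, not a prescribed design.

```python
def merge_g(a: dict, b: dict) -> dict:
    """G-Counter join: entry-wise max."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

class PNCounter:
    """Two grow-only maps: P counts increments, N counts decrements."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.p = {}
        self.n = {}

    def inc(self) -> None:
        self.p[self.node_id] = self.p.get(self.node_id, 0) + 1

    def dec(self) -> None:
        # A decrement is recorded as growth of N, never as shrinkage of P.
        self.n[self.node_id] = self.n.get(self.node_id, 0) + 1

    def merge(self, other: "PNCounter") -> None:
        # The pair (P, N) is merged component-wise.
        self.p = merge_g(self.p, other.p)
        self.n = merge_g(self.n, other.n)

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())


a, b = PNCounter("A"), PNCounter("B")
a.inc(); a.inc(); a.dec()
b.inc(); b.inc()
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 3 3
```

Both halves only ever grow, so the pair inherits the semilattice properties of its parts even though the visible value can go down.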

Project 3: The G-Set (Grow-only Set)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: JavaScript/Node.js
  • Alternative Programming Languages: Python, C++
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Data Structures
  • Software or Tool: Shared Tag Cloud / Event Log
  • Main Book: “Introduction to Reliable and Secure Distributed Programming” by Christian Cachin

What you’ll build: A set where you can only add items. Once an item is in, it stays in forever.

Why it teaches CRDTs: It’s the set version of a G-Counter. It teaches you that “Set Union” is the perfect merge function (it’s associative, commutative, and idempotent by nature).
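As a rough illustration (written in Python to match the other examples here, although this project lists JavaScript), a G-Set needs little more than set union. The class name `GSet` is an assumption.

```python
class GSet:
    """Grow-only set: add and merge, no remove."""

    def __init__(self, items=()):
        self.items = set(items)

    def add(self, item) -> None:
        self.items.add(item)

    def merge(self, other: "GSet") -> None:
        self.items |= other.items   # set union is the join

    def contains(self, item) -> bool:
        return item in self.items


a, b = GSet(["rust"]), GSet(["go"])
a.add("crdt")
a.merge(b)
b.merge(a)
b.merge(a)                      # a repeated merge has no effect
print(sorted(a.items) == sorted(b.items))  # True
```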


Project 4: The 2P-Set (Two-Phase Set)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: Python, Java
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Distributed Storage
  • Software or Tool: User Blacklist / One-time voucher system
  • Main Book: “Distributed Systems” by Maarten van Steen

What you’ll build: A set that allows additions AND removals. However, there’s a catch: once you remove an item, it can never be added again.

Why it teaches CRDTs: It introduces Tombstones. You’ll learn that “deleting” in a distributed system usually means “marking as deleted” in a separate internal data structure.

Core challenges you’ll face:

  • The “Add-Remove-Add” problem → Why the second “Add” fails in a 2P-Set.
  • Tombstone accumulation → Realizing that “deleted” items still take up memory.
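Both challenges are visible in a small sketch. The version below assumes the usual construction from two grow-only sets, one for additions and one for tombstones; `TwoPSet` and its field names are illustrative, and Python is used for consistency with the other examples.

```python
class TwoPSet:
    """Two-phase set: an add set plus a tombstone set, both grow-only."""

    def __init__(self):
        self.added = set()
        self.removed = set()   # tombstones, never cleared

    def add(self, item) -> None:
        self.added.add(item)

    def remove(self, item) -> None:
        if item in self.added:          # only remove what we have seen
            self.removed.add(item)

    def contains(self, item) -> bool:
        return item in self.added and item not in self.removed

    def merge(self, other: "TwoPSet") -> None:
        self.added |= other.added
        self.removed |= other.removed


s = TwoPSet()
s.add("voucher-1")
s.remove("voucher-1")
s.add("voucher-1")                 # the second add has no visible effect
print(s.contains("voucher-1"))     # False
print(len(s.removed))              # 1, the tombstone still occupies memory
```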

Project 5: LWW-Element-Set (Last-Write-Wins Set)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Python
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Distributed Systems
  • Software or Tool: Shared Shopping Cart / Config Manager
  • Main Book: “Distributed Systems: Principles and Paradigms” by Andrew S. Tanenbaum

What you’ll build: A set where items can be added and removed multiple times. Conflicts are resolved using timestamps: the most recent operation wins.

Why it teaches CRDTs: It introduces the “Last-Write-Wins” resolution policy. You’ll learn how to use metadata (timestamps) to decide between conflicting operations (Add vs. Remove) on the same element.

Core challenges you’ll face:

  • Clock Skew → What happens if Node A’s clock is 5 minutes ahead of Node B?
  • Tie-breaking → What if two nodes perform opposing operations at the exact same millisecond?

Real World Outcome

A set that feels like a normal set, but handles concurrent edits by picking the “latest” one.

Example Output:

Node A: add("pizza", timestamp: 100)
Node B: remove("pizza", timestamp: 110)
--- Sync ---
Result: "pizza" is NOT in the set (Remove won because it happened later).

The Core Question You’re Answering

“In a world without a single source of time, who are we to say what happened ‘last’?”

This project forces you to confront the reality of physical clocks in distributed systems.
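A compact way to model this is to keep the latest add timestamp and the latest remove timestamp per element. The sketch below assumes integer timestamps supplied by the caller and an "add wins on a tie" bias; both are assumptions, and the opposite bias is equally valid as long as every replica uses the same one.

```python
class LWWSet:
    """Last-write-wins element set with caller-supplied timestamps."""

    def __init__(self):
        self.adds = {}      # element -> latest add timestamp
        self.removes = {}   # element -> latest remove timestamp

    def add(self, item, ts: int) -> None:
        self.adds[item] = max(ts, self.adds.get(item, ts))

    def remove(self, item, ts: int) -> None:
        self.removes[item] = max(ts, self.removes.get(item, ts))

    def contains(self, item) -> bool:
        if item not in self.adds:
            return False
        # Assumed tie-break: on equal timestamps the add wins.
        return item not in self.removes or self.adds[item] >= self.removes[item]

    def merge(self, other: "LWWSet") -> None:
        for item, ts in other.adds.items():
            self.add(item, ts)
        for item, ts in other.removes.items():
            self.remove(item, ts)


a, b = LWWSet(), LWWSet()
a.add("pizza", 100)
b.remove("pizza", 110)
a.merge(b)
b.merge(a)
print(a.contains("pizza"), b.contains("pizza"))  # False False
```

If Node A's clock ran ahead and stamped the add with 200 instead of 100, the add would win even though it happened first in real time, which is the clock-skew hazard this project asks you to confront.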


Project 6: The OR-Set (Observed-Remove Set)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Distributed Algorithms
  • Software or Tool: Collaborative Todo List
  • Main Book: “A comprehensive study of Convergent and Commutative Replicated Data Types” by Marc Shapiro et al.

What you’ll build: A robust set where an element is in the set if and only if there is an “Add” operation that has not been “overshadowed” by a causally subsequent “Remove” operation.

Why it teaches CRDTs: This is the “Gold Standard” of sets. It teaches you how to use Unique Tags for every operation to distinguish between different “versions” of the same element. It solves the Add-Remove conflict without relying on fragile timestamps.

Core challenges you’ll face:

  • Causal Relationships → Ensuring a “Remove” only affects “Adds” that it has actually seen.
  • Metadata Management → Handling the list of tags for each element.
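The tagging idea can be sketched as follows. This is a simplified state-based variant that keeps removed tags as tombstones; the class name `ORSet`, the use of `uuid4` for tags, and the field layout are assumptions for illustration (and Python is used here although the project lists TypeScript).

```python
import uuid

class ORSet:
    """Observed-remove set: each add gets a unique tag; remove kills seen tags."""

    def __init__(self):
        self.adds = {}            # element -> set of unique tags
        self.tombstones = set()   # tags that some replica has removed

    def add(self, item) -> None:
        self.adds.setdefault(item, set()).add(uuid.uuid4().hex)

    def remove(self, item) -> None:
        # Only the tags this replica has actually observed are tombstoned.
        self.tombstones |= self.adds.get(item, set())

    def contains(self, item) -> bool:
        return bool(self.adds.get(item, set()) - self.tombstones)

    def merge(self, other: "ORSet") -> None:
        for item, tags in other.adds.items():
            self.adds.setdefault(item, set()).update(tags)
        self.tombstones |= other.tombstones


a, b = ORSet(), ORSet()
a.add("milk")
b.merge(a)
b.remove("milk")      # B removes the add it has seen
a.add("milk")         # concurrently, A adds "milk" again with a fresh tag
a.merge(b)
b.merge(a)
print(a.contains("milk"), b.contains("milk"))  # True True
```

The concurrent add survives because the remove never saw its tag, which gives add-wins behaviour without any timestamps.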

Project 7: Vector Clock Implementation

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Distributed Logical Time
  • Software or Tool: Versioning system / Causality tracker
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A logical clock that can tell you if two events are concurrent, or if one happened before the other.

Why it teaches CRDTs: Vector clocks are the “heartbeat” of advanced CRDTs (like the OR-Set or Sequence CRDTs). You cannot build reliable distributed systems without understanding how to track “Happened-Before” relationships.


Real World Outcome

A comparison tool that takes two vector clocks and returns: BEFORE, AFTER, CONCURRENT, or EQUAL.

Example Output:

$ ./vclock compare [A:1, B:0] [A:1, B:1]
Result: BEFORE
$ ./vclock compare [A:2, B:0] [A:1, B:1]
Result: CONCURRENT (Split brain!)
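The comparison itself is short. Here is a minimal sketch in Python (the project lists Go as its main language), with vector clocks modelled as dictionaries and missing entries treated as zero; the function names `tick`, `merge`, and `compare` are illustrative.

```python
def tick(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's entry advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def merge(a: dict, b: dict) -> dict:
    """Entry-wise max, used when a message is received."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def compare(a: dict, b: dict) -> str:
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "EQUAL"
    if a_le_b:
        return "BEFORE"
    if b_le_a:
        return "AFTER"
    return "CONCURRENT"

print(compare({"A": 1, "B": 0}, {"A": 1, "B": 1}))  # BEFORE
print(compare({"A": 2, "B": 0}, {"A": 1, "B": 1}))  # CONCURRENT
```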

Project 9: The LWW-Register (Last-Write-Wins Register)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, TypeScript
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Distributed State
  • Software or Tool: Shared User Profile Settings
  • Main Book: “Distributed Systems” by Maarten van Steen

What you’ll build: A register that holds a single value (string or number). When two nodes update it concurrently, the one with the higher timestamp wins.

Why it teaches CRDTs: It’s the simplest way to handle generic data updates. It teaches the “Register” concept and the inherent dangers of wall-clock time in distributed systems.
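A register of this kind can be sketched as an immutable value tagged with a timestamp. The version below assumes ties are broken by node ID so that every replica picks the same winner; `LWWRegister` and its field names are illustrative, and Python is used for consistency with the other examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LWWRegister:
    value: object
    timestamp: int
    node_id: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Compare (timestamp, node_id) so equal timestamps resolve deterministically.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


a = LWWRegister("dark", 100, "A")
b = LWWRegister("light", 100, "B")      # same timestamp, different node
print(a.merge(b).value, b.merge(a).value)  # light light

late_but_skewed = LWWRegister("old setting", 500, "C")   # C's clock runs ahead
fresh = LWWRegister("new setting", 120, "A")
print(fresh.merge(late_but_skewed).value)  # old setting
```

The second example is the danger the project description mentions: the register trusts the timestamp, not the real order of events.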


Project 10: Multi-Value Register (MV-Register)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: Go, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Conflict Resolution
  • Software or Tool: Distributed Configuration Store (like Riak’s core)
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A register that, instead of overwriting concurrent changes, preserves both. If two nodes update the register without knowing about each other’s changes, the register will return a set of values (a “conflict”) that must be resolved later.

Why it teaches CRDTs: It introduces the concept of Causal Context. You’ll learn how to use version vectors to detect if one write “supersedes” another or if they are “sibling” writes that need manual merging.


Real World Outcome

A register where “winning” isn’t automatic if the writes were concurrent.

Example Output:

Node A: set("Blue")
Node B: set("Red") (Concurrent)
--- Sync ---
Result: ["Blue", "Red"] // Both values preserved!
Node A: set("Purple", context: ["Blue", "Red"])
--- Sync ---
Result: ["Purple"] // Conflict resolved by Node A
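The behaviour in the example output can be reproduced with version vectors attached to each stored value. The following is a simplified sketch, assuming each write's causal context is the merge of all siblings currently visible at the writer; `MVRegister`, `dominates`, and `vv_merge` are hypothetical names, and Python is used although the project lists Rust.

```python
def vv_merge(a: dict, b: dict) -> dict:
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def dominates(a: dict, b: dict) -> bool:
    """True if version vector a is strictly newer than b."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) >= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) > b.get(k, 0) for k in keys))

class MVRegister:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.siblings = []   # list of (value, version_vector)

    def set(self, value) -> None:
        ctx = {}
        for _, vv in self.siblings:          # context = everything we have seen
            ctx = vv_merge(ctx, vv)
        ctx[self.node_id] = ctx.get(self.node_id, 0) + 1
        self.siblings = [(value, ctx)]       # supersedes all visible siblings

    def merge(self, other: "MVRegister") -> None:
        combined = self.siblings + [s for s in other.siblings
                                    if s not in self.siblings]
        # Drop any write that another write causally supersedes.
        self.siblings = [(v, vv) for v, vv in combined
                         if not any(dominates(o, vv) for _, o in combined)]

    def values(self) -> list:
        return sorted(v for v, _ in self.siblings)


a, b = MVRegister("A"), MVRegister("B")
a.set("Blue")
b.set("Red")
a.merge(b); b.merge(a)
print(a.values())      # ['Blue', 'Red']
a.set("Purple")        # written with both siblings in its context
b.merge(a)
print(b.values())      # ['Purple']
```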

Project 11: Recursive Map CRDT

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Rust, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Composite Data Structures
  • Software or Tool: Collaborative JSON-like Store
  • Main Book: “A comprehensive study of Convergent and Commutative Replicated Data Types” by Marc Shapiro et al.

What you’ll build: A map (dictionary) where each key maps to another CRDT (a counter, a set, or even another map).

Why it teaches CRDTs: This is how real applications are built. You’ll learn how to nest CRDTs and propagate merge calls down the tree. It teaches you that “The CRDT of a Map of CRDTs is a CRDT.”
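Propagating merge calls down the tree can be sketched with a single recursive function. The example assumes a deliberately small vocabulary of leaf types (integers merged by max, sets merged by union) and does not handle key removal; the function name `merge` and the sample document layout are illustrative.

```python
def merge(a, b):
    """Recursively merge two nested structures of maps, sets, and ints."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for k in a.keys() | b.keys():
            if k in a and k in b:
                out[k] = merge(a[k], b[k])     # both sides have it: recurse
            else:
                out[k] = a[k] if k in a else b[k]
        return out
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        return a | b                            # G-Set leaf
    if isinstance(a, int) and isinstance(b, int):
        return max(a, b)                        # per-node counter entry
    raise TypeError(f"cannot merge {type(a).__name__} with {type(b).__name__}")


doc_a = {"votes": {"A": 2}, "tags": {"crdt"},
         "profile": {"visits": {"A": 1}}}
doc_b = {"votes": {"B": 1}, "tags": {"sync"},
         "profile": {"visits": {"A": 3, "B": 1}}}

merged = merge(doc_a, doc_b)
print(merged["votes"] == {"A": 2, "B": 1})               # True
print(merged["profile"]["visits"] == {"A": 3, "B": 1})   # True
```

Because every leaf merge is commutative, associative, and idempotent, the map built from them is too.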


Project 13: RGA (Replicated Growable Array)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: Rust, Python
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Collaborative Text Editing
  • Software or Tool: Minimalist Google Docs Clone
  • Main Book: “Replicated abstract data types: Building blocks for collaborative applications” by Roh et al. (Paper, introduces RGA)

What you’ll build: A list structure that allows multiple people to insert characters at any position. Even if they insert at the “same” spot concurrently, the result will be identical for everyone.

Why it teaches CRDTs: This is the “Aha!” moment for collaborative text. You’ll learn that array indices (0, 1, 2) stop being reliable addresses once several replicas edit concurrently, and that you must instead use Immutable Unique IDs for every single character.

Core challenges you’ll face:

  • Shifting Indices → If I insert at index 5, but you just deleted index 2, my index 5 is now wrong. How to fix?
  • Interleaving → Ensuring that “Hello” + “World” doesn’t become “HWeolrldlo”.

Real World Outcome

A command-line text buffer where two processes can “type” simultaneously and sync to see the same string.

Example Output:

Node A: Insert 'H' at START
Node B: Insert 'i' after 'H'
--- Sync ---
Buffer: "Hi"
Node A: Insert '!' after 'i'
Node B: Insert '?' after 'i'
--- Sync ---
Buffer: "Hi!?" (Deterministic order)

The Core Question You’re Answering

“In a shared string, what is the ‘address’ of a character if indices keep changing?”

This project teaches you to think in terms of Causal Pointers (e.g., “Insert ‘B’ after the character with ID {Node:A, Counter:15}”).
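The causal-pointer idea can be sketched in well under a hundred lines. The code below is a simplified, operation-based take on RGA that assumes causal delivery (an element's parent always arrives before it) and an arbitrary but fixed tie-break for concurrent siblings; the names `RGA`, `precedes`, and the op tuple layout are illustrative, not taken from the paper.

```python
def precedes(a, b) -> bool:
    """Should element id `a` sit to the left of concurrent sibling `b`?

    Assumed rule: higher counter first; on a tie, lower node ID first.
    """
    return a[0] > b[0] or (a[0] == b[0] and a[1] < b[1])

class RGA:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.clock = 0
        self.elems = []   # each entry: [elem_id, char, deleted]

    def _index_of(self, elem_id) -> int:
        for i, entry in enumerate(self.elems):
            if entry[0] == elem_id:
                return i
        raise KeyError(elem_id)

    def insert_after(self, parent_id, char):
        """Local insert; returns the op to broadcast. parent_id=None means START."""
        self.clock += 1
        op = ("ins", (self.clock, self.node_id), parent_id, char)
        self.apply(op)
        return op

    def delete(self, elem_id):
        op = ("del", elem_id)
        self.apply(op)
        return op

    def apply(self, op) -> None:
        if op[0] == "del":
            self.elems[self._index_of(op[1])][2] = True   # tombstone, keep the slot
            return
        _, elem_id, parent_id, char = op
        self.clock = max(self.clock, elem_id[0])          # Lamport-style clock
        if any(entry[0] == elem_id for entry in self.elems):
            return                                        # duplicate delivery
        i = 0 if parent_id is None else self._index_of(parent_id) + 1
        # Skip siblings (and their subtrees) that win the tie-break.
        while i < len(self.elems) and precedes(self.elems[i][0], elem_id):
            i += 1
        self.elems.insert(i, [elem_id, char, False])

    def text(self) -> str:
        return "".join(char for _, char, deleted in self.elems if not deleted)


a, b = RGA("A"), RGA("B")
op_h = a.insert_after(None, "H"); b.apply(op_h)
op_i = b.insert_after(op_h[1], "i"); a.apply(op_i)
op_bang = a.insert_after(op_i[1], "!")     # concurrent with the next line
op_q = b.insert_after(op_i[1], "?")
a.apply(op_q); b.apply(op_bang)
print(a.text(), b.text())  # Hi!? Hi!?
```

A different tie-break would give "Hi?!" instead; what matters is that every replica applies the same rule.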


Project 14: LSEQ (Logarithmic Sequence)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: C++, TypeScript
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Algorithms / Indexing
  • Software or Tool: High-performance Text CRDT
  • Main Book: “LSEQ: an Adaptive Structure for Sequences in Distributed Collaborative Editing” by Brice Nédelec et al.

What you’ll build: A sequence CRDT that gives every element a dense, variable-length position identifier, an idea closely related to “Fractional Indexing” (assigning numbers between 0 and 1), so that a new identifier can always be allocated between any two existing ones.

Why it teaches CRDTs: It addresses the metadata bloat problem. You’ll learn how to represent positions as paths in a tree to keep identifiers as short as possible while allowing arbitrarily many insertions between any two points.
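To see why identifier length matters, it helps to start with the naive version. The sketch below is plain fractional indexing with exact fractions, not LSEQ itself: it shows that a position can always be found between two neighbours, and how quickly identifiers grow when inserts keep landing in the same spot. The helper name `between` is an assumption.

```python
from fractions import Fraction

START, END = Fraction(0), Fraction(1)

def between(left: Fraction, right: Fraction) -> Fraction:
    """Pick a position strictly between two existing positions (the midpoint)."""
    return (left + right) / 2

pos_h = between(START, END)      # 1/2
pos_i = between(pos_h, END)      # 3/4
pos_e = between(pos_h, pos_i)    # 5/8, inserted between 'H' and 'i'

doc = sorted([(pos_h, "H"), (pos_i, "i"), (pos_e, "e")])
print("".join(ch for _, ch in doc))   # Hei

# Repeatedly inserting at the front halves the gap every time.
p = pos_h
for _ in range(10):
    p = between(START, p)
print(p.denominator)                  # 2048
```

Each front insert adds a bit to the identifier here. LSEQ's allocation strategy is aimed at keeping that growth much slower across typical editing patterns.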


Project 15: YATA (Yet Another Transformation Approach)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: JavaScript
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Distributed Logic
  • Software or Tool: The core logic of Yjs
  • Main Book: “Near Real-Time Peer-to-Peer Shared Editing on Extensible Data Types” by Nicolaescu et al.

What you’ll build: An optimized sequence CRDT that uses a doubly-linked list of operations.

Why it teaches CRDTs: This is the algorithm used by Yjs, a widely used CRDT library known for its performance. You’ll learn how to optimize insertions and deletions by keeping a flat list of operations and using a specific conflict resolution rule based on origin and left/right neighbors.


Project 17: P2P Gossip Protocol Simulation

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Networking
  • Software or Tool: Distributed Cluster Membership / CRDT Sync
  • Main Book: “Distributed Systems: Principles and Paradigms” by Tanenbaum

What you’ll build: A system where nodes randomly pick neighbors to share their CRDT state with, eventually propagating an update to the whole network.

Why it teaches CRDTs: CRDTs don’t sync themselves. You’ll learn how to build a decentralized “anti-entropy” mechanism that ensures eventual delivery without a central server.
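A push-style gossip round can be simulated in a few lines. This sketch assumes each node holds a G-Counter-style map and, once per round, pushes its state to one randomly chosen peer; the function name `gossip_until_converged`, the seeded random generator, and the eight-node cluster are all illustrative (and Python is used although the project lists Go).

```python
import random

def merge(a: dict, b: dict) -> dict:
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def gossip_until_converged(states: list, rng: random.Random,
                           max_rounds: int = 1000) -> int:
    """Each round, every node pushes its state to one random peer.

    Returns the number of rounds it took for all states to become equal.
    """
    n = len(states)
    for round_no in range(1, max_rounds + 1):
        for i in range(n):
            j = rng.choice([k for k in range(n) if k != i])
            states[j] = merge(states[j], states[i])
        if all(s == states[0] for s in states):
            return round_no
    raise RuntimeError("did not converge within max_rounds")


rng = random.Random(42)
states = [{f"n{i}": i + 1} for i in range(8)]   # node i starts with count i+1
rounds = gossip_until_converged(states, rng)
print(sum(states[0].values()))   # 36
```

Because the merge is idempotent and commutative, it does not matter which peer is picked or how often the same state is received; only the number of rounds changes.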


Project 18: Tombstone Garbage Collection

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: Go, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Memory Management
  • Software or Tool: Production-ready CRDT Engine
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A mechanism to safely “forget” deleted items (tombstones) once you are sure all nodes have seen the deletion.

Why it teaches CRDTs: This is the “hidden cost” of CRDTs. You’ll learn about Stability Conditions and how to use vector clocks to determine when a piece of metadata is no longer needed by any node in the cluster.
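One common way to express a stability condition is as the entry-wise minimum of what every replica has acknowledged. The sketch below assumes each tombstone is stamped with the (node, counter) of the delete that created it and that cluster membership is fixed and known; `stable_frontier`, `collect`, and the data layout are hypothetical (and Python is used although the project lists Rust).

```python
def stable_frontier(acked: dict) -> dict:
    """Entry-wise minimum over every replica's acknowledged vector clock."""
    nodes = set().union(*(vc.keys() for vc in acked.values()))
    return {n: min(vc.get(n, 0) for vc in acked.values()) for n in nodes}

def collect(tombstones: dict, acked: dict) -> dict:
    """Return only the tombstones that some replica may not have seen yet."""
    frontier = stable_frontier(acked)
    return {item: (node, ctr) for item, (node, ctr) in tombstones.items()
            if ctr > frontier.get(node, 0)}


# What each replica is known to have received, per originating node.
acked = {
    "A": {"A": 5, "B": 3},
    "B": {"A": 4, "B": 3},
    "C": {"A": 4, "B": 2},
}
# item -> (node that deleted it, that node's counter at the delete)
tombstones = {"x": ("A", 4), "y": ("B", 3), "z": ("A", 5)}

print(stable_frontier(acked) == {"A": 4, "B": 2})   # True
print(sorted(collect(tombstones, acked)))           # ['y', 'z']
```

Only "x" is safe to forget: every replica has seen A's event 4, while C has not yet seen B's event 3 and neither B nor C has seen A's event 5.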


Project 19: Local-First Hybrid (SQLite + CRDT)

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Persistence / Databases
  • Software or Tool: Offline-capable Mobile/Web App
  • Main Book: “Local-first software” by Ink & Switch

What you’ll build: A system where data is stored in a local SQLite database for speed and offline access, but synced via CRDTs when the network is available.

Why it teaches CRDTs: You’ll learn how to map SQL tables to CRDT structures and how to handle the “Object-Relational Mapping” for distributed types. Local-first apps such as Linear or Reflect pair a local store with background sync in a broadly similar way.
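The simplest mapping treats each row as a last-write-wins register. The sketch below uses Python's built-in `sqlite3` module with in-memory databases standing in for two devices; the `kv` table, its columns, and the helpers `open_store`, `apply_write`, and `sync` are assumptions made for illustration, and Python is used although the project lists TypeScript.

```python
import sqlite3

def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT, "
               "ts INTEGER, node TEXT)")
    return db

def apply_write(db, key, value, ts, node) -> None:
    """Keep the incoming row only if (ts, node) beats what is stored."""
    row = db.execute("SELECT ts, node FROM kv WHERE key = ?", (key,)).fetchone()
    if row is None or (ts, node) > (row[0], row[1]):
        db.execute("INSERT OR REPLACE INTO kv (key, value, ts, node) "
                   "VALUES (?, ?, ?, ?)", (key, value, ts, node))
        db.commit()

def sync(src, dst) -> None:
    """Replay every row of src into dst through the LWW rule."""
    rows = src.execute("SELECT key, value, ts, node FROM kv").fetchall()
    for key, value, ts, node in rows:
        apply_write(dst, key, value, ts, node)

def read(db, key):
    row = db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return None if row is None else row[0]


phone, laptop = open_store(), open_store()
apply_write(phone, "title", "Draft", 10, "phone")      # edited offline
apply_write(laptop, "title", "Final", 12, "laptop")    # edited offline, later
sync(phone, laptop)
sync(laptop, phone)
print(read(phone, "title"), read(laptop, "title"))     # Final Final
```

Row-level LWW silently drops the losing edit, so richer column types (counters, sets, text) would need their own merge rules stored alongside the data.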


Project 20: Final Overall Project: P2P Collaborative Code Editor

  • File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
  • Main Programming Language: TypeScript (React/Vue + Monaco Editor)
  • Alternative Programming Languages: Rust (compiled to WebAssembly), JavaScript
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: Full-stack Distributed Systems
  • Software or Tool: A competitor to VS Code Live Share (but P2P)
  • Main Book: “A Conflict-Free Replicated JSON Datatype” by Martin Kleppmann

What you’ll build: A fully functional browser-based code editor where multiple users can join a session (via WebRTC), edit code together, chat in a sidebar, and see each other’s cursors—all without a central database server.

Why it teaches CRDTs: This applies everything.

  • RGA/YATA for the code buffer.
  • LWW-Registers for cursor positions.
  • OR-Sets for the user list.
  • WebRTC for P2P transport.
  • Gossip for state discovery.

Project Comparison Table

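| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|:--------|:-----------|:-----|:-----------------------|:-----------|
| 1. G-Counter | Level 1 | 2h | Basic Merging | ⭐⭐ |
| 6. OR-Set | Level 3 | 1w | Causality & Tags | ⭐⭐⭐⭐ |
| 13. RGA | Level 4 | 2w | Sequences & IDs | ⭐⭐⭐⭐⭐ |
| 16. JSON CRDT | Level 5 | 1m | Complex Documents | ⭐⭐⭐⭐⭐ |
| 20. P2P Editor | Level 5 | 2m+ | Total Mastery | ⭐⭐⭐⭐⭐ |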

Recommendation

If you are new to distributed systems: Start with Project 1 (G-Counter) and Project 3 (G-Set). They are the “Hello World” of the CRDT world.

If you want to build a real app: Jump to Project 6 (OR-Set). This is the minimum requirement for a functional shared list (like a shopping list or todo app).

If you want to master text collaboration: Focus on Project 13 (RGA). It will change the way you think about arrays and indices forever.


Summary

This learning path covers Conflict-Free Replicated Data Types (CRDTs) through 20 hands-on projects. Here’s the complete list:

| # | Project Name | Main Language | Difficulty | Time Estimate |
|:--|:-------------|:--------------|:-----------|:--------------|
| 1 | G-Counter | Python | Level 1 | 2-4 hours |
| 2 | PN-Counter | Python | Level 1 | 2 hours |
| 3 | G-Set | JavaScript | Level 1 | 2 hours |
| 4 | 2P-Set | JavaScript | Level 2 | 4 hours |
| 5 | LWW-Element-Set | Python | Level 2 | 1 day |
| 6 | OR-Set | TypeScript | Level 3 | 1 week |
| 7 | Vector Clocks | Go | Level 2 | 2 days |
| 8 | Op-based Counter | Python | Level 3 | 3 days |
| 9 | LWW-Register | Go | Level 1 | 2 hours |
| 10 | MV-Register | Rust | Level 3 | 1 week |
| 11 | Recursive Map | TypeScript | Level 3 | 1 week |
| 12 | Delta-State Optimizer | Rust | Level 4 | 1-2 weeks |
| 13 | RGA (Text Sequence) | JavaScript | Level 4 | 2 weeks |
| 14 | LSEQ (Indexing) | Rust | Level 4 | 2 weeks |
| 15 | YATA (Yjs Core) | JavaScript | Level 4 | 2 weeks |
| 16 | JSON CRDT | TypeScript | Level 5 | 1 month |
| 17 | Gossip Protocol | Go | Level 3 | 1 week |
| 18 | Tombstone GC | Rust | Level 4 | 2 weeks |
| 19 | SQLite Hybrid | TypeScript | Level 4 | 2 weeks |
| 20 | P2P Code Editor | TypeScript | Level 5 | 2 months+ |

Expected Outcomes

After completing these projects, you will:

  • Understand the mathematical properties (Commutativity, Associativity, Idempotence) required for eventual consistency.
  • Be able to implement both state-based and operation-based synchronization.
  • Master the use of Vector Clocks and causal metadata to resolve conflicts.
  • Know how to build a collaborative text editor from scratch without using third-party libraries.
  • Be capable of designing “Local-First” applications that are resilient to network partitions.

You’ll have built 20 working projects that demonstrate deep understanding of distributed systems from first principles.