Learn CRDTs: From Zero to CRDT Master

Goal: Deeply understand Conflict-Free Replicated Data Types (CRDTs)—the mathematical foundation of modern collaborative software. You will move from basic counters to complex sequence CRDTs used in text editors, learning how to achieve Strong Eventual Consistency (SEC) without central servers or locking. By the end, you’ll be able to build “local-first” applications that work offline and sync seamlessly across the globe.

Why CRDTs Matter

In the early days of computing, data lived in one place. If two people wanted to edit the same file, they took turns. As the internet grew, we moved to “Strong Consistency” models where a central server (like a database) acted as the ultimate arbiter of truth. But this model has costs that grow with scale and distance: every write waits on a round trip to the server, clients must be online to make changes, and writes become unavailable during network partitions.

CRDTs represent a paradigm shift. Formally defined in 2011, they allow data to be updated concurrently on different machines without any coordination. When these machines eventually talk to each other, the CRDTs merge the changes deterministically. There are no “merge conflicts” in the traditional sense; the data structure itself is designed to resolve them.

What understanding this unlocks:

Local-First Software: Apps that feel instantaneous because they don’t wait for a server.
P2P Collaboration: Real-time editing (like Google Docs) without a central authority.
Partition Tolerance: Systems that keep working even when the internet is cut.
Distributed Databases: Systems like Riak or CosmosDB use CRDTs to scale globally.

Core Concept Analysis

1. The Anatomy of a CRDT

A CRDT is essentially a data structure plus a merge function. For state-based CRDTs (CvRDTs), the set of possible states must form a Join-Semilattice, and the merge function must compute the join (least upper bound) of two states.

    Replica A (State S1)          Replica B (State S2)
           |                             |
     Local Update (f)              Local Update (g)
           |                             |
        State S1'                     State S2'
           \                           /
            \       NETWORK SYNC      /
             \_______________________/
                        |
                 MERGE(S1', S2')
                        |
             Converged State (S_final)

2. The Mathematical “Triple Threat”

To guarantee that every replica reaches the same state regardless of the order of messages, the merge function ($\sqcup$) must satisfy:

Commutativity: $A \sqcup B = B \sqcup A$ (Order doesn’t matter).
Associativity: $(A \sqcup B) \sqcup C = A \sqcup (B \sqcup C)$ (Grouping doesn’t matter).
Idempotence: $A \sqcup A = A$ (Duplicates don’t matter).
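
The three laws can be checked mechanically for a candidate merge. The following is a minimal sketch, assuming states are per-node count dictionaries and the merge is a key-wise maximum; the names merge, a, b, and c are illustrative and not taken from any library.

```python
def merge(x: dict, y: dict) -> dict:
    """Key-wise maximum of two per-node count maps (one possible join)."""
    return {k: max(x.get(k, 0), y.get(k, 0)) for k in x.keys() | y.keys()}


a = {"A": 2, "B": 0}
b = {"A": 1, "B": 3}
c = {"C": 5}

# Each law from the list above, checked on concrete states.
commutative = merge(a, b) == merge(b, a)
associative = merge(merge(a, b), c) == merge(a, merge(b, c))
idempotent = merge(a, a) == a

print(commutative, associative, idempotent)  # True True True
```

Checking a few sample states does not prove the laws in general, but it is a cheap way to catch a merge function that violates them.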

3. State-Based vs. Operation-Based

CvRDT (Convergent): You send the whole state. Easier to reason about, handles message loss/duplicates naturally, but can be bandwidth-heavy.
CmRDT (Commutative): You send the operation (e.g., “add 5”). Requires a delivery layer that guarantees “Exactly Once” and “Causal Order.”
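
To see why an operation-based design leans on its delivery layer, here is a small sketch of an op-based counter. The OpCounter class and the (node, sequence) op IDs are assumptions made for illustration; a real system would typically push deduplication into the transport rather than the data type.

```python
class OpCounter:
    """Op-based counter sketch: replicas exchange operations, not state."""

    def __init__(self):
        self.value = 0
        self.seen = set()  # IDs of ops already applied (assumed dedup scheme)

    def apply(self, op_id, amount):
        # Addition commutes, so delivery order does not matter for a counter,
        # but it is not idempotent: without this check a redelivered op
        # would be counted twice.
        if op_id in self.seen:
            return
        self.seen.add(op_id)
        self.value += amount


replica = OpCounter()
replica.apply(("A", 1), 5)
replica.apply(("B", 1), 2)
replica.apply(("A", 1), 5)  # duplicate delivery, ignored
print(replica.value)  # 7
```

A counter only needs exactly-once delivery. Types whose operations do not all commute (for example, remove-after-add on a set) also need causal ordering.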

4. Vector Clocks: Tracking History

To know if two events are concurrent or if one happened before another, we use Vector Clocks.

Replica A: [A:1, B:0, C:0] --(Sync)--> Replica B: [A:1, B:1, C:0]
      |                                      |
   Update                                 Update
      |                                      |
Replica A: [A:2, B:0, C:0]             Replica B: [A:1, B:2, C:0]
      \                                     /
       \__________ CONCURRENT _____________/

Concept Summary Table

Concept Cluster	What You Need to Internalize
Join-Semilattice	The mathematical structure of state-based CRDTs. States only move “up” in a partial order.
Causality	Knowing if event A “happened before” event B. Critical for removing elements correctly.
Tombstones	How we “delete” data in a system where we can’t actually throw anything away (yet).
Idempotence	Why receiving the same packet twice shouldn’t break your data.
Sequence CRDTs	The “Final Boss” of CRDTs. Handling indices in a list that everyone is changing at once.

Deep Dive Reading by Concept

Distributed Systems Foundations

CRDT Specifics

Project List

Projects are ordered from fundamental mathematical building blocks to complex sequence implementations.

Project 1: The G-Counter (Grow-only Counter)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Python
Alternative Programming Languages: Go, Rust, JavaScript
Coolness Level: Level 2: Practical but Forgettable
Business Potential: 1. The “Resume Gold”
Difficulty: Level 1: Beginner
Knowledge Area: Distributed Systems / Data Structures
Software or Tool: Distributed Analytics (e.g., page view counter)
Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A distributed counter where multiple independent nodes can increment the value. When nodes sync, the counter always reaches the correct global total without a central leader.

Why it teaches CRDTs: It introduces the concept of a “state-based” CRDT and the “Max” merge function. You’ll learn why a simple total = total + 1 fails in a distributed environment and how to track per-node state.

Core challenges you’ll face:

Partial Updates → How to represent increments so they can be merged multiple times without double-counting.
Node Identity → Why each node needs a unique ID in the system.
Idempotent Merging → Ensuring that syncing twice with the same data doesn’t change the result.

Key Concepts:

State-based CRDTs: Shapiro et al., “A comprehensive study…” Section 2.1
Join-Semilattice: “CRDT Tutorial” - Code & Cluster

Difficulty: Beginner
Time estimate: 2-4 hours
Prerequisites: Basic knowledge of dictionaries/maps and JSON.

Real World Outcome

You’ll have a script that simulates three nodes (A, B, C). You can increment them independently and then merge them to see the consistent total.

Example Output:

$ python g_counter.py
Node A increments -> State: {'A': 1, 'B': 0, 'C': 0}
Node B increments twice -> State: {'A': 0, 'B': 2, 'C': 0}
--- Syncing A and B ---
Merged State: {'A': 1, 'B': 2, 'C': 0} | Global Count: 3
Node C increments -> State: {'A': 0, 'B': 0, 'C': 1}
--- Syncing all ---
Final State: {'A': 1, 'B': 2, 'C': 1} | Global Count: 4

The Core Question You’re Answering

“If we both count ‘1’, and then we share our totals, how do I know if that’s the same ‘1’ or two different ‘1’s?”

Before you write any code, sit with this question. In a centralized system, the server just adds. In a distributed system, you need to know who did the adding to avoid “over-counting” during sync.

Concepts You Must Understand First

Stop and research these before coding:

Idempotence
- If I call merge(stateA, stateB) twice, should the result change?
- Why is total += 1 not idempotent?
- Book Reference: “Designing Data-Intensive Applications” Ch. 5
Vector-like Storage
- How can we store counts for each individual node separately?
- What happens when a new node joins the cluster?

Questions to Guide Your Design

Before implementing, think through these:

The Merge Function
- If Node A has {A: 5} and Node B has {A: 3}, what should the merged value for ‘A’ be? Why?
- Does it matter if I merge A into B or B into A?
The Query Function
- How do you calculate the “Total” from the internal dictionary?

Thinking Exercise

The Over-counting Trap

Imagine two nodes, A and B, both start at 0.

A increments to 1.
B increments to 1.
They sync. The total should be 2.

Now imagine they sync again. If your merge logic is total = A.total + B.total, what happens on the second sync?

Questions while tracing:

How does storing {'A': 1, 'B': 1} solve this?
Why does max(A_val, B_val) protect you from network duplicates?

The Interview Questions They’ll Ask

Prepare to answer these:

“What is a Join-Semilattice in the context of CRDTs?”
“Why does every node in a G-Counter need a unique ID?”
“Is a G-Counter useful for a ‘Number of users online’ feature? Why/Why not?”
“What happens to the state size as the number of nodes increases?”
“Can a G-Counter handle decrements?”

Hints in Layers

Hint 1 (The Internal State): Don’t store a single integer. Store a map where keys are node IDs and values are integers.

Hint 2 (Increments): When Node A wants to increment, it only updates its own entry in its local map: my_state['A'] += 1.

Hint 3 (Merging): To merge two maps, create a new map. For every key that exists in either map, the value in the new map should be the maximum of the values found in the two source maps.

Hint 4 (Summing): The user-facing “Value” of the counter is simply the sum of all values in the map.

Books That Will Help

Topic	Book	Chapter
Replication Basics	“Designing Data-Intensive Applications” by Martin Kleppmann	Ch. 5
CRDT Formalisms	“A comprehensive study of Convergent and Commutative Replicated Data Types” by Shapiro et al.	Section 3.1

Implementation Hints

State Initialization: Every node starts with an empty map or a map initialized with 0 for its own ID.
The Max Property: The core of the merge is new_state[id] = max(local_state.get(id, 0), remote_state.get(id, 0)). This ensures that even if you receive an old message, you don’t “go backwards” in time.
Partial Order: Recognize that {'A': 2} is “greater than” {'A': 1}. This is the partial order that makes the semilattice work.
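
The hints above can be put together into a small class. This is one possible sketch rather than a reference implementation; the GCounter name and its method names are assumptions chosen for this example.

```python
class GCounter:
    """State-based grow-only counter: one monotonically increasing slot per node."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {node_id: 0}

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        # A node only ever writes to its own slot.
        self.counts[self.node_id] += amount

    def merge(self, other):
        # Key-wise max: re-merging old or duplicate state changes nothing.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


a, b, c = GCounter("A"), GCounter("B"), GCounter("C")
a.increment()
b.increment(2)
a.merge(b)
after_ab = a.value()   # 3
c.increment()
a.merge(c)
a.merge(c)             # syncing twice is harmless
print(after_ab, a.value())  # 3 4
```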

Learning Milestones

Local Increment works → You understand per-node state.
Merge handles concurrent increments → You’ve achieved eventual consistency.
Double-sync is idempotent → You understand the “Max” property’s importance.

Project 2: The PN-Counter (Positive-Negative Counter)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Python
Alternative Programming Languages: Rust, Go, TypeScript
Coolness Level: Level 2: Practical but Forgettable
Business Potential: 3. The “Service & Support” Model
Difficulty: Level 1: Beginner
Knowledge Area: Distributed Systems
Software or Tool: Distributed Inventory or Vote Counter
Main Book: “A comprehensive study of Convergent and Commutative Replicated Data Types” by Marc Shapiro

What you’ll build: A counter that supports both increments and decrements (e.g., a “Likes” counter where you can un-like).

Why it teaches CRDTs: It teaches “Composition.” You’ll learn that most complex CRDTs are just combinations of simpler ones. It also explains why you can’t just “decrement” a G-Counter without breaking the mathematical properties.

Core challenges you’ll face:

Non-Monotonicity → How to handle “going down” when the underlying math only allows “going up.”
Composition → Managing two internal G-Counters as a single unit.

Difficulty: Beginner
Time estimate: 2 hours
Prerequisites: Project 1 (G-Counter).

Real World Outcome

A counter where you can call inc() and dec() across nodes and still get the right sum.

Example Output:

$ python pn_counter.py
Node A: inc, inc, dec -> State: {P: {'A': 2}, N: {'A': 1}} | Val: 1
Node B: inc, inc -> State: {P: {'B': 2}, N: {'B': 0}} | Val: 2
--- Sync A and B ---
Final Val: 3 (4 total increments, 1 total decrement)

The Core Question You’re Answering

“If a CRDT can only grow (monotonicity), how do we represent a value that goes down?”

Think about how accounting works. They don’t erase numbers; they add a “debit” entry.

Concepts You Must Understand First

Composition of Semilattices
- If A and B are Join-Semilattices, is (A, B) a Join-Semilattice?
- Book Reference: “A comprehensive study of Convergent and Commutative Replicated Data Types” by Shapiro et al.
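
As a hedged illustration of composition, the sketch below pairs two grow-only maps, P for increments and N for decrements, and merges each one independently. The PNCounter class and its field names are assumptions for this example.

```python
class PNCounter:
    """Two grow-only maps: P counts increments, N counts decrements."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.p = {}
        self.n = {}

    def inc(self):
        self.p[self.node_id] = self.p.get(self.node_id, 0) + 1

    def dec(self):
        # A decrement is recorded as growth in N, never as shrinkage in P.
        self.n[self.node_id] = self.n.get(self.node_id, 0) + 1

    @staticmethod
    def _join(x, y):
        return {k: max(x.get(k, 0), y.get(k, 0)) for k in x.keys() | y.keys()}

    def merge(self, other):
        # The pair of semilattices is merged component by component.
        self.p = self._join(self.p, other.p)
        self.n = self._join(self.n, other.n)

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())


a, b = PNCounter("A"), PNCounter("B")
a.inc(); a.inc(); a.dec()
b.inc(); b.inc()
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 3 3
```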

Project 3: The G-Set (Grow-only Set)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: JavaScript/Node.js
Alternative Programming Languages: Python, C++
Coolness Level: Level 2: Practical but Forgettable
Business Potential: 1. The “Resume Gold”
Difficulty: Level 1: Beginner
Knowledge Area: Data Structures
Software or Tool: Shared Tag Cloud / Event Log
Main Book: “Introduction to Reliable and Secure Distributed Programming” by Christian Cachin

What you’ll build: A set where you can only add items. Once an item is in, it stays in forever.

Why it teaches CRDTs: It’s the set version of a G-Counter. It teaches you that “Set Union” is the perfect merge function (it’s associative, commutative, and idempotent by nature).
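
A G-Set needs very little code. In this sketch (function and variable names are illustrative) the state is an immutable set and the merge is plain union:

```python
def merge(a: frozenset, b: frozenset) -> frozenset:
    """Set union is the join for a grow-only set."""
    return a | b


tags_a = frozenset({"crdt", "sync"})
tags_b = frozenset({"sync", "offline"})

merged = merge(tags_a, tags_b)
print(sorted(merged))  # ['crdt', 'offline', 'sync']
```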

Project 4: The 2P-Set (Two-Phase Set)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: JavaScript
Alternative Programming Languages: Python, Java
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 2: Intermediate
Knowledge Area: Distributed Storage
Software or Tool: User Blacklist / One-time voucher system
Main Book: “Distributed Systems” by Maarten van Steen

What you’ll build: A set that allows additions AND removals. However, there’s a catch: once you remove an item, it can never be added again.

Why it teaches CRDTs: It introduces Tombstones. You’ll learn that “deleting” in a distributed system usually means “marking as deleted” in a separate internal data structure.

Core challenges you’ll face:

The “Add-Remove-Add” problem → Why the second “Add” fails in a 2P-Set.
Tombstone accumulation → Realizing that “deleted” items still take up memory.
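
Both challenges show up in a few lines of code. The sketch below assumes a 2P-Set built from two grow-only sets; the class and attribute names are made up for this example.

```python
class TwoPSet:
    """Two grow-only sets: everything ever added, and tombstones for removals."""

    def __init__(self):
        self.added = set()
        self.removed = set()  # tombstones; never shrinks

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only items this replica has seen added can be removed.
        if item in self.added:
            self.removed.add(item)

    def contains(self, item):
        return item in self.added and item not in self.removed

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed


s = TwoPSet()
s.add("voucher-42")
s.remove("voucher-42")
s.add("voucher-42")              # the second add is recorded but has no effect
print(s.contains("voucher-42"))  # False
print(len(s.removed))            # 1
```

The tombstone in removed is what blocks the re-add, and it is also the memory that is never reclaimed.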

Project 5: LWW-Element-Set (Last-Write-Wins Set)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Python
Alternative Programming Languages: Rust, Go
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 2: Intermediate
Knowledge Area: Distributed Systems
Software or Tool: Shared Shopping Cart / Config Manager
Main Book: “Distributed Systems: Principles and Paradigms” by Andrew S. Tanenbaum

What you’ll build: A set where items can be added and removed multiple times. Conflicts are resolved using timestamps: the most recent operation wins.

Why it teaches CRDTs: It introduces the “Last-Write-Wins” resolution policy. You’ll learn how to use metadata (timestamps) to decide between conflicting operations (Add vs. Remove) on the same element.

Core challenges you’ll face:

Clock Skew → What happens if Node A’s clock is 5 minutes ahead of Node B?
Tie-breaking → What if two nodes perform opposing operations at the exact same millisecond?

Real World Outcome

A set that feels like a normal set, but handles concurrent edits by picking the “latest” one.

Example Output:

Node A: add("pizza", timestamp: 100)
Node B: remove("pizza", timestamp: 110)
--- Sync ---
Result: "pizza" is NOT in the set (Remove won because it happened later).
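
The example above could be reproduced with a sketch like the following. It assumes integer timestamps supplied by the caller and a remove-wins bias when an add and a remove carry the same timestamp; both are choices made for this illustration, and the LWWSet name is hypothetical.

```python
class LWWSet:
    """Keeps the latest add and latest remove timestamp per element."""

    def __init__(self):
        self.adds = {}
        self.removes = {}

    def add(self, element, ts):
        self.adds[element] = max(self.adds.get(element, ts), ts)

    def remove(self, element, ts):
        self.removes[element] = max(self.removes.get(element, ts), ts)

    def contains(self, element):
        # Assumption: on a timestamp tie the remove wins (strict '>').
        if element not in self.adds:
            return False
        return self.adds[element] > self.removes.get(element, float("-inf"))

    def merge(self, other):
        for element, ts in other.adds.items():
            self.add(element, ts)
        for element, ts in other.removes.items():
            self.remove(element, ts)


a, b = LWWSet(), LWWSet()
a.add("pizza", 100)
b.remove("pizza", 110)
a.merge(b)
b.merge(a)
after_sync = (a.contains("pizza"), b.contains("pizza"))  # (False, False)
a.add("pizza", 120)  # unlike a 2P-Set, a later add brings the element back
print(after_sync, a.contains("pizza"))
```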

The Core Question You’re Answering

“In a world without a single source of time, who are we to say what happened ‘last’?”

This project forces you to confront the reality of physical clocks in distributed systems.

Project 6: The OR-Set (Observed-Remove Set)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: TypeScript
Alternative Programming Languages: Rust, Go
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 4. The “Open Core” Infrastructure
Difficulty: Level 3: Advanced
Knowledge Area: Distributed Algorithms
Software or Tool: Collaborative Todo List
Main Book: “A comprehensive study of Convergent and Commutative Replicated Data Types” by Marc Shapiro et al.

What you’ll build: A robust set where an element is in the set if and only if there is an “Add” operation that has not been “overshadowed” by a causally subsequent “Remove” operation.

Why it teaches CRDTs: This is the “Gold Standard” of sets. It teaches you how to use Unique Tags for every operation to distinguish between different “versions” of the same element. It solves the Add-Remove conflict without relying on fragile timestamps.

Core challenges you’ll face:

Causal Relationships → Ensuring a “Remove” only affects “Adds” that it has actually seen.
Metadata Management → Handling the list of tags for each element.
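
A state-based sketch of the idea, assuming random UUIDs as the unique tags and a tombstone set of removed tags (one of several known OR-Set designs; the ORSet class here is illustrative):

```python
import uuid


class ORSet:
    """Each add gets a fresh tag; a remove only cancels tags it has observed."""

    def __init__(self):
        self.adds = {}        # element -> set of unique add tags
        self.removed = set()  # tags that have been observed and removed

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Tombstone only the tags this replica currently knows about.
        self.removed |= self.adds.get(element, set())

    def contains(self, element):
        return bool(self.adds.get(element, set()) - self.removed)

    def merge(self, other):
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed


a, b = ORSet(), ORSet()
a.add("milk")
b.merge(a)        # B observes A's add
b.remove("milk")  # B removes the tag it has seen
a.add("milk")     # concurrent re-add on A creates a new, unseen tag
a.merge(b)
b.merge(a)
print(a.contains("milk"), b.contains("milk"))  # True True
```

The concurrent add survives because the remove never saw its tag, which is the "add wins" behavior usually associated with OR-Sets.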

Project 7: Vector Clock Implementation

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Go
Alternative Programming Languages: Rust, Python
Coolness Level: Level 3: Genuinely Clever
Business Potential: 1. The “Resume Gold”
Difficulty: Level 2: Intermediate
Knowledge Area: Distributed Logical Time
Software or Tool: Versioning system / Causality tracker
Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A logical clock that can tell you if two events are concurrent, or if one happened before the other.

Why it teaches CRDTs: Vector clocks are the “heartbeat” of advanced CRDTs (like the OR-Set or Sequence CRDTs). You cannot build reliable distributed systems without understanding how to track “Happened-Before” relationships.

Real World Outcome

A comparison tool that takes two vector clocks and returns: BEFORE, AFTER, CONCURRENT, or EQUAL.

Example Output:

$ ./vclock compare [A:1, B:0] [A:1, B:1]
Result: BEFORE
$ ./vclock compare [A:2, B:0] [A:1, B:1]
Result: CONCURRENT (neither clock dominates the other)
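
The comparison logic behind that output can be sketched in Python as follows (the project itself targets Go). Clocks are assumed to be dictionaries from node ID to counter, with missing entries treated as zero; the compare function name is illustrative.

```python
def compare(a: dict, b: dict) -> str:
    """Classify the relationship of clock a to clock b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "EQUAL"
    if a_le_b:
        return "BEFORE"
    if b_le_a:
        return "AFTER"
    # Each clock is ahead on at least one node: neither happened first.
    return "CONCURRENT"


print(compare({"A": 1, "B": 0}, {"A": 1, "B": 1}))  # BEFORE
print(compare({"A": 2, "B": 0}, {"A": 1, "B": 1}))  # CONCURRENT
```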

Project 9: The LWW-Register (Last-Write-Wins Register)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Go
Alternative Programming Languages: Rust, TypeScript
Coolness Level: Level 2: Practical but Forgettable
Business Potential: 3. The “Service & Support” Model
Difficulty: Level 1: Beginner
Knowledge Area: Distributed State
Software or Tool: Shared User Profile Settings
Main Book: “Distributed Systems” by Maarten van Steen

What you’ll build: A register that holds a single value (string or number). When two nodes update it concurrently, the one with the higher timestamp wins.

Why it teaches CRDTs: It’s the simplest way to handle generic data updates. It teaches the “Register” concept and the inherent dangers of wall-clock time in distributed systems.
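
A compact sketch of the register, written in Python for consistency with the other examples. It assumes caller-supplied integer timestamps and uses the node ID as a tie-breaker; the LWWRegister class and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    value: object
    timestamp: int
    node_id: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Highest (timestamp, node_id) pair wins, so ties resolve
        # the same way on every replica.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


a = LWWRegister("dark", 100, "A")
b = LWWRegister("light", 100, "B")
print(a.merge(b).value, b.merge(a).value)  # light light

# The danger of wall clocks: a write made later in real time, but stamped
# by a node whose clock runs behind, silently loses.
skewed = LWWRegister("newest-in-reality", 50, "C")
print(a.merge(skewed).value)  # dark
```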

Project 10: Multi-Value Register (MV-Register)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Rust
Alternative Programming Languages: Go, Python
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 4. The “Open Core” Infrastructure
Difficulty: Level 3: Advanced
Knowledge Area: Conflict Resolution
Software or Tool: Distributed Configuration Store (like Riak’s core)
Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A register that, instead of overwriting concurrent changes, preserves both. If two nodes update the register without knowing about each other’s changes, the register will return a set of values (a “conflict”) that must be resolved later.

Why it teaches CRDTs: It introduces the concept of Causal Context. You’ll learn how to use version vectors to detect if one write “supersedes” another or if they are “sibling” writes that need manual merging.

Real World Outcome

A register where “winning” isn’t automatic if the writes were concurrent.

Example Output:

Node A: set("Blue")
Node B: set("Red") (Concurrent)
--- Sync ---
Result: ["Blue", "Red"] // Both values preserved!
Node A: set("Purple", context: ["Blue", "Red"])
--- Sync ---
Result: ["Purple"] // Conflict resolved by Node A
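
One way to get this behavior is to tag every write with a version vector and keep only writes that no other write dominates. The sketch below is in Python for consistency with the other examples, and it assumes the causal context of a write is everything the writing replica currently holds; MVRegister and dominates are illustrative names.

```python
def dominates(a: dict, b: dict) -> bool:
    """True if clock a is >= b on every node and strictly greater on one."""
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) >= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) > b.get(n, 0) for n in nodes))


class MVRegister:
    def __init__(self, node_id):
        self.node_id = node_id
        self.siblings = []  # list of (value, clock) pairs

    def set(self, value):
        # The new clock covers every sibling seen locally, plus one local tick.
        clock = {}
        for _, seen in self.siblings:
            for node, count in seen.items():
                clock[node] = max(clock.get(node, 0), count)
        clock[self.node_id] = clock.get(self.node_id, 0) + 1
        self.siblings = [(value, clock)]

    def merge(self, other):
        combined = self.siblings + [s for s in other.siblings
                                    if s not in self.siblings]
        # Keep only writes that no other write supersedes.
        self.siblings = [(v, c) for v, c in combined
                         if not any(dominates(c2, c) for _, c2 in combined)]

    def values(self):
        return sorted(v for v, _ in self.siblings)


a, b = MVRegister("A"), MVRegister("B")
a.set("Blue")
b.set("Red")            # concurrent with A's write
a.merge(b)
b.merge(a)
conflict = a.values()   # ['Blue', 'Red']
a.set("Purple")         # written with both siblings in context
b.merge(a)
print(conflict, b.values())  # ['Blue', 'Red'] ['Purple']
```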

Project 11: Recursive Map CRDT

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: TypeScript
Alternative Programming Languages: Rust, Python
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 2. The “Micro-SaaS / Pro Tool”
Difficulty: Level 3: Advanced
Knowledge Area: Composite Data Structures
Software or Tool: Collaborative JSON-like Store
Main Book: “A comprehensive study of Convergent and Commutative Replicated Data Types” by Marc Shapiro et al.

What you’ll build: A map (dictionary) where each key maps to another CRDT (a counter, a set, or even another map).

Why it teaches CRDTs: This is how real applications are built. You’ll learn how to nest CRDTs and propagate merge calls down the tree. It teaches you that “The CRDT of a Map of CRDTs is a CRDT.”
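
The "propagate merge down the tree" idea can be sketched with a single recursive function. This simplified Python version assumes only grow-only leaves (integers merged by max, sets merged by union) and no key removal; supporting deletes would require extra metadata such as tags or tombstones.

```python
def merge(a, b):
    """Recursively merge nested maps whose leaves are grow-only values."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for key in a.keys() | b.keys():
            if key in a and key in b:
                out[key] = merge(a[key], b[key])  # delegate to the child CRDT
            else:
                out[key] = a[key] if key in a else b[key]
        return out
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        return a | b          # G-Set leaf
    if isinstance(a, int) and isinstance(b, int):
        return max(a, b)      # per-node counter slot
    raise TypeError("mismatched CRDT types under the same key")


doc_a = {"likes": {"A": 2, "B": 0}, "tags": {"crdt"}}
doc_b = {"likes": {"A": 1, "B": 3}, "tags": {"sync"}, "views": {"B": 4}}

merged = merge(doc_a, doc_b)
print(merged["likes"], sorted(merged["tags"]), merged["views"])
```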

Project 13: RGA (Replicated Growable Array)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: JavaScript
Alternative Programming Languages: Rust, Python
Coolness Level: Level 5: Pure Magic (Super Cool)
Business Potential: 5. The “Industry Disruptor”
Difficulty: Level 4: Expert
Knowledge Area: Collaborative Text Editing
Software or Tool: Minimalist Google Docs Clone
Main Book: “Replicated growable array” by Roh et al. (Paper)

What you’ll build: A list structure that allows multiple people to insert characters at any position. Even if they insert at the “same” spot concurrently, the result will be identical for everyone.

Why it teaches CRDTs: This is the “Aha!” moment for collaborative text. You’ll learn that array indices (0, 1, 2) are not stable addresses when several replicas edit concurrently, so every single character must instead carry an Immutable Unique ID.

Core challenges you’ll face:

Shifting Indices → If I insert at index 5, but you just deleted index 2, my index 5 is now wrong. How to fix?
Interleaving → Ensuring that “Hello” + “World” doesn’t become “HWeolrldlo”.

Real World Outcome

A command-line text buffer where two processes can “type” simultaneously and sync to see the same string.

Example Output:

Node A: Insert 'H' at START
Node B: Insert 'i' after 'H'
--- Sync ---
Buffer: "Hi"
Node A: Insert '!' after 'i'
Node B: Insert '?' after 'i'
--- Sync ---
Buffer: "Hi!?" (Deterministic order)

The Core Question You’re Answering

“In a shared string, what is the ‘address’ of a character if indices keep changing?”

This project teaches you to think in terms of Causal Pointers (e.g., “Insert ‘B’ after the character with ID {Node:A, Counter:15}”).
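
A heavily simplified sketch of this addressing scheme is shown below, in Python for consistency with the other examples. It assumes operations are delivered in causal order, uses (counter, node) pairs as IDs, and places the larger ID first when two inserts target the same position; the RGA class here is illustrative and omits most of what the paper describes.

```python
START = (0, "")  # ID of the sentinel that marks the beginning of the text


class RGA:
    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0
        self.items = [[START, None, False]]  # [id, value, deleted]

    def _index_of(self, item_id):
        for i, item in enumerate(self.items):
            if item[0] == item_id:
                return i
        raise KeyError(item_id)

    def insert_after(self, ref_id, value):
        """Local insert; returns the operation to send to other replicas."""
        self.clock += 1
        op = ("ins", (self.clock, self.node_id), ref_id, value)
        self.apply(op)
        return op

    def delete(self, item_id):
        op = ("del", item_id)
        self.apply(op)
        return op

    def apply(self, op):
        if op[0] == "del":
            self.items[self._index_of(op[1])][2] = True  # tombstone
            return
        _, new_id, ref_id, value = op
        if any(item[0] == new_id for item in self.items):
            return  # already applied
        self.clock = max(self.clock, new_id[0])
        i = self._index_of(ref_id) + 1
        # Skip elements with larger IDs so that concurrent inserts at the
        # same spot end up in the same order on every replica.
        while i < len(self.items) and self.items[i][0] > new_id:
            i += 1
        self.items.insert(i, [new_id, value, False])

    def text(self):
        return "".join(v for _, v, dead in self.items[1:] if not dead)


a, b = RGA("A"), RGA("B")
op_h = a.insert_after(START, "H"); b.apply(op_h)
op_i = b.insert_after(op_h[1], "i"); a.apply(op_i)
op_bang = a.insert_after(op_i[1], "!")   # concurrent with the next line
op_q = b.insert_after(op_i[1], "?")
a.apply(op_q); b.apply(op_bang)
print(a.text(), b.text())
```

With this particular tie-break both replicas converge on "Hi?!". Which of the two concurrent characters comes first depends on the ID ordering rule; what matters is that every replica applies the same rule.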

Project 14: LSEQ (Logarithmic Sequence)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Rust
Alternative Programming Languages: C++, TypeScript
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 5. The “Industry Disruptor”
Difficulty: Level 4: Expert
Knowledge Area: Algorithms / Indexing
Software or Tool: High-performance Text CRDT
Main Book: “LSEQ: an Adaptive Structure for Optimistic Editing with Large Number of Collaborators” by Brice Nédelec

What you’ll build: A sequence CRDT that uses “Fractional Indexing” (assigning numbers between 0 and 1) to position elements.

Why it teaches CRDTs: It addresses the metadata bloat problem. You’ll learn how to represent positions as paths in a tree to keep identifiers as short as possible while allowing infinite insertions between any two points.
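
To make the metadata problem concrete, here is a sketch of naive fractional indexing in Python, using exact midpoints. This is deliberately not LSEQ's allocation strategy; it is the simple baseline whose identifier growth LSEQ is designed to reduce. The between function and the (position, node_id) keys are assumptions for this example.

```python
from fractions import Fraction


def between(left: Fraction, right: Fraction) -> Fraction:
    """Pick a position strictly between two existing positions."""
    if not left < right:
        raise ValueError("left must be smaller than right")
    return (left + right) / 2


START, END = Fraction(0), Fraction(1)
doc = {}  # (position, node_id) -> character; node_id breaks position ties

p_a = between(START, END)   # 1/2
doc[(p_a, "A")] = "a"
p_c = between(p_a, END)     # 3/4
doc[(p_c, "A")] = "c"
p_b = between(p_a, p_c)     # 5/8, lands between 'a' and 'c'
doc[(p_b, "B")] = "b"

text = "".join(doc[key] for key in sorted(doc))

# Always inserting at the front halves the position every time, so the
# identifier needs one more bit per insert.
pos = END
for _ in range(10):
    pos = between(START, pos)

print(text, pos)  # abc 1/1024
```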

Project 15: YATA (Yet Another Transformation Approach)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: JavaScript
Alternative Programming Languages: Rust
Coolness Level: Level 5: Pure Magic (Super Cool)
Business Potential: 5. The “Industry Disruptor”
Difficulty: Level 4: Expert
Knowledge Area: Distributed Logic
Software or Tool: The core logic of Yjs
Main Book: “Near Real-Time Peer-to-Peer Shared Editing on Extensible Data Types” by Nicolaescu et al.

What you’ll build: An optimized sequence CRDT that uses a doubly-linked list of operations.

Why it teaches CRDTs: This is the algorithm used by Yjs, one of the fastest CRDT libraries in the world. You’ll learn how to optimize insertions and deletions by keeping a flat list of operations and using a specific conflict resolution rule based on origin and left/right neighbors.

Project 17: P2P Gossip Protocol Simulation

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Go
Alternative Programming Languages: Rust, Python
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 4. The “Open Core” Infrastructure
Difficulty: Level 3: Advanced
Knowledge Area: Networking
Software or Tool: Distributed Cluster Membership / CRDT Sync
Main Book: “Distributed Systems: Principles and Paradigms” by Tanenbaum

What you’ll build: A system where nodes randomly pick neighbors to share their CRDT state with, eventually propagating an update to the whole network.

Why it teaches CRDTs: CRDTs don’t sync themselves. You’ll learn how to build a decentralized “anti-entropy” mechanism that ensures eventual delivery without a central server.
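
A toy simulation of push-style gossip over G-Counter state, written in Python for consistency with the other examples. The node names, the seed, and the round limit are arbitrary choices for this sketch.

```python
import random


def merge(a: dict, b: dict) -> dict:
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def gossip(states: dict, rng: random.Random, max_rounds: int = 1000) -> int:
    """Each round, every node pushes its state to one random peer.

    Returns the number of rounds needed until all states are identical.
    """
    nodes = list(states)
    for round_no in range(1, max_rounds + 1):
        for sender in nodes:
            peer = rng.choice([n for n in nodes if n != sender])
            states[peer] = merge(states[peer], states[sender])
        if all(states[n] == states[nodes[0]] for n in nodes):
            return round_no
    raise RuntimeError("did not converge within max_rounds")


rng = random.Random(42)
# Five nodes, each starting with only its own local count.
states = {f"n{i}": {f"n{i}": i + 1} for i in range(5)}
rounds = gossip(states, rng)
totals = {sum(state.values()) for state in states.values()}
print(rounds, totals)
```

Because the merge is idempotent and commutative, it does not matter how often or in what order peers contact each other; repeated random exchanges are enough for every node to reach the same total.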

Project 18: Tombstone Garbage Collection

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: Rust
Alternative Programming Languages: Go, C++
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 3. The “Service & Support” Model
Difficulty: Level 4: Expert
Knowledge Area: Memory Management
Software or Tool: Production-ready CRDT Engine
Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A mechanism to safely “forget” deleted items (tombstones) once you are sure all nodes have seen the deletion.

Why it teaches CRDTs: This is the “hidden cost” of CRDTs. You’ll learn about Stability Conditions and how to use vector clocks to determine when a piece of metadata is no longer needed by any node in the cluster.
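
One possible stability check is sketched below in Python. It assumes a fixed, known set of replicas, that each deletion is identified by a (node, counter) dot, and that every replica tracks the latest vector clock acknowledged by each peer; all names are illustrative.

```python
def stable_frontier(acked: dict) -> dict:
    """Per-node minimum across every replica's acknowledged vector clock."""
    nodes = set().union(*(clock.keys() for clock in acked.values()))
    return {n: min(clock.get(n, 0) for clock in acked.values()) for n in nodes}


def collect(tombstones: set, acked: dict) -> set:
    """Return the tombstones that must still be kept."""
    frontier = stable_frontier(acked)
    # A tombstone is safe to drop only once every replica has seen its dot.
    return {(node, counter) for node, counter in tombstones
            if counter > frontier.get(node, 0)}


tombstones = {("A", 3), ("B", 7)}
acked = {
    "A": {"A": 5, "B": 7},
    "B": {"A": 4, "B": 9},
    "C": {"A": 3, "B": 6},  # C has not yet seen B's 7th event
}

print(stable_frontier(acked) == {"A": 3, "B": 6})  # True
print(collect(tombstones, acked))                  # {('B', 7)}
```

A single slow or offline replica holds the frontier back for everyone, which is why membership changes make garbage collection much harder in practice.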

Project 19: Local-First Hybrid (SQLite + CRDT)

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: TypeScript
Alternative Programming Languages: Rust, Go
Coolness Level: Level 5: Pure Magic (Super Cool)
Business Potential: 5. The “Industry Disruptor”
Difficulty: Level 4: Expert
Knowledge Area: Persistence / Databases
Software or Tool: Offline-capable Mobile/Web App
Main Book: “Local-first software” by Ink & Switch

What you’ll build: A system where data is stored in a local SQLite database for speed and offline access, but synced via CRDTs when the network is available.

Why it teaches CRDTs: You’ll learn how to map SQL tables to CRDT structures and how to handle the “Object-Relational Mapping” for distributed types. Apps such as Linear or Reflect are often cited as examples of this local-database-plus-sync approach.
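
As a small illustration of mapping a table to a CRDT, the sketch below treats each row of a key-value table as a last-write-wins register. It uses Python's built-in sqlite3 module for brevity; the kv table, its columns, and the function names are hypothetical.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT, ts INTEGER, node TEXT)"
    )
    return db


def lww_write(db, key, value, ts, node):
    """Apply a local or remote write; the highest (ts, node) pair wins."""
    row = db.execute("SELECT ts, node FROM kv WHERE key = ?", (key,)).fetchone()
    if row is None or (ts, node) > (row[0], row[1]):
        db.execute(
            "INSERT OR REPLACE INTO kv (key, value, ts, node) VALUES (?, ?, ?, ?)",
            (key, value, ts, node),
        )


def sync(src, dst):
    """Push every row from src into dst through the same merge rule."""
    rows = src.execute("SELECT key, value, ts, node FROM kv").fetchall()
    for key, value, ts, node in rows:
        lww_write(dst, key, value, ts, node)


def read(db, key):
    row = db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


phone, laptop = open_store(), open_store()
lww_write(phone, "title", "Draft", 1, "phone")     # offline edit
lww_write(laptop, "title", "Final", 2, "laptop")   # concurrent edit elsewhere
sync(phone, laptop)
sync(laptop, phone)
print(read(phone, "title"), read(laptop, "title"))  # Final Final
```

Routing local writes and remote rows through the same function keeps the merge rule in one place, at the cost of inheriting the clock-skew problems of any last-write-wins scheme.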

Project 20: Final Overall Project: P2P Collaborative Code Editor

File: LEARN_CRDT_DEEP_DIVE_PROJECTS.md
Main Programming Language: TypeScript (React/Vue + Monaco Editor)
Alternative Programming Languages: Rust (compiled to WebAssembly), JavaScript
Coolness Level: Level 5: Pure Magic (Super Cool)
Business Potential: 5. The “Industry Disruptor”
Difficulty: Level 5: Master
Knowledge Area: Full-stack Distributed Systems
Software or Tool: A competitor to VS Code Live Share (but P2P)
Main Book: “A Conflict-Free Replicated JSON Datatype” by Martin Kleppmann

What you’ll build: A fully functional browser-based code editor where multiple users can join a session (via WebRTC), edit code together, chat in a sidebar, and see each other’s cursors—all without a central database server.

Why it teaches CRDTs: This applies everything.

RGA/YATA for the code buffer.
LWW-Registers for cursor positions.
OR-Sets for the user list.
WebRTC for P2P transport.
Gossip for state discovery.

Project Comparison Table

Project	Difficulty	Time	Depth of Understanding	Fun Factor
1. G-Counter	Level 1	2-4h	Basic Merging	⭐⭐
6. OR-Set	Level 3	1w	Causality & Tags	⭐⭐⭐⭐
13. RGA	Level 4	2w	Sequences & IDs	⭐⭐⭐⭐⭐
16. JSON CRDT	Level 5	1m	Complex Documents	⭐⭐⭐⭐⭐
20. P2P Editor	Level 5	2m+	Total Mastery	⭐⭐⭐⭐⭐

Recommendation

If you are new to distributed systems: Start with Project 1 (G-Counter) and Project 3 (G-Set). They are the “Hello World” of the CRDT world.

If you want to build a real app: Jump to Project 6 (OR-Set). This is the minimum requirement for a functional shared list (like a shopping list or todo app).

If you want to master text collaboration: Focus on Project 13 (RGA). It will change the way you think about arrays and indices forever.

Summary

This learning path covers Conflict-Free Replicated Data Types (CRDTs) through 20 hands-on projects. Here’s the complete list:

#	Project Name	Main Language	Difficulty	Time Estimate
1	G-Counter	Python	Level 1	2-4 hours
2	PN-Counter	Python	Level 1	2 hours
3	G-Set	JavaScript	Level 1	2 hours
4	2P-Set	JavaScript	Level 2	4 hours
5	LWW-Element-Set	Python	Level 2	1 day
6	OR-Set	TypeScript	Level 3	1 week
7	Vector Clocks	Go	Level 2	2 days
8	Op-based Counter	Python	Level 3	3 days
9	LWW-Register	Go	Level 1	2 hours
10	MV-Register	Rust	Level 3	1 week
11	Recursive Map	TypeScript	Level 3	1 week
12	Delta-State Optimizer	Rust	Level 4	1-2 weeks
13	RGA (Text Sequence)	JavaScript	Level 4	2 weeks
14	LSEQ (Indexing)	Rust	Level 4	2 weeks
15	YATA (Yjs Core)	JavaScript	Level 4	2 weeks
16	JSON CRDT	TypeScript	Level 5	1 month
17	Gossip Protocol	Go	Level 3	1 week
18	Tombstone GC	Rust	Level 4	2 weeks
19	SQLite Hybrid	TypeScript	Level 4	2 weeks
20	P2P Code Editor	TypeScript	Level 5	2 months+

Expected Outcomes

After completing these projects, you will:

Understand the mathematical properties (Commutativity, Associativity, Idempotence) required for eventual consistency.
Be able to implement both state-based and operation-based synchronization.
Master the use of Vector Clocks and causal metadata to resolve conflicts.
Know how to build a collaborative text editor from scratch without using third-party libraries.
Be capable of designing “Local-First” applications that are resilient to network partitions.

You’ll have built 20 working projects that demonstrate deep understanding of distributed systems from first principles.