Project 8: Build a Lock-Free Data Structure (Atomics)
Implement a lock-free Treiber stack with atomic pointers, memory ordering, and safe reclamation.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Master |
| Time Estimate | 3-4 weeks |
| Main Programming Language | Rust |
| Alternative Programming Languages | C++ (std::atomic) |
| Coolness Level | Extremely High |
| Business Potential | High |
| Prerequisites | Concurrency, atomics, unsafe Rust |
| Key Topics | Atomics, memory ordering, ABA, memory reclamation |
1. Learning Objectives
By completing this project, you will:
- Implement a lock-free stack using atomic compare-and-swap.
- Explain memory ordering and why it matters for correctness.
- Mitigate the ABA problem with hazard pointers or epochs.
- Benchmark lock-free vs mutex-based stacks.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Atomics and Memory Ordering
Fundamentals
Atomics provide low-level synchronization without locks. Memory ordering controls how reads and writes can be reordered by the compiler and CPU. The common orderings are Relaxed, Acquire, Release, and SeqCst. Lock-free data structures rely on correct ordering to ensure visibility across threads.
Deep Dive into the concept
Atomics are the foundation of lock-free programming. A lock-free stack uses an atomic pointer for the head. When pushing, you load the head, set the new node’s next pointer to the old head, and then atomically compare-and-swap the head to the new node. If another thread changed the head in the meantime, you retry. This algorithm depends on memory ordering. Using Acquire/Release ensures that writes to the node are visible to other threads once the head pointer is published. SeqCst is the strongest ordering and is easiest to reason about but slower.
Memory ordering is a contract between threads. Release on a store means all previous writes in that thread are visible to any thread that later performs an Acquire load of the same atomic. This is exactly what you need for a stack push: when the head pointer is published with Release, any consumer that loads the head with Acquire will see a fully initialized node. If you use Relaxed, there is no such guarantee and a consumer might see a partially initialized node.
The challenge is that memory ordering is subtle and error-prone. Many lock-free algorithms can be made correct with SeqCst everywhere, but performance might suffer. In this project, you start with SeqCst, then carefully downgrade to Acquire/Release and prove correctness. This exercise teaches how to reason about visibility and ordering, which is essential for safe concurrency.
How this fit on projects
This concept is foundational to §4.4 algorithm design and §6 testing. It connects to Project 4 (mutex-based queue) as a contrast.
Definitions & key terms
- Atomic: Operation that is indivisible across threads.
- Acquire: Prevents reordering of subsequent loads/stores before the acquire.
- Release: Prevents reordering of previous loads/stores after the release.
- SeqCst: Sequential consistency, the strongest ordering.
Mental model diagram (ASCII)
Thread A: write data -> Release store head
Thread B: Acquire load head -> read data
How it works (step-by-step)
- Allocate node and initialize data.
- Load head with
Acquire. - Set node.next = head.
- CAS head with
Release. - Retry on failure.
Minimal concrete example
head.compare_exchange(old, new, Ordering::Release, Ordering::Relaxed)
Common misconceptions
- “Relaxed is always fine.” It can break visibility guarantees.
- “SeqCst is always required.” It’s safer but sometimes unnecessary.
Check-your-understanding questions
- Why does a stack push require
Release? - What does
Acquireguarantee on pop? - Why is
Relaxeddangerous here?
Check-your-understanding answers
- It publishes the node’s initialized data.
- It ensures the node’s data is visible.
- Without ordering, another thread might see uninitialized data.
Real-world applications
- Lock-free queues in runtimes.
- Shared work-stealing deques.
Where you’ll apply it
- This project: §4.4 algorithm, §6.2 tests.
- Also used in: Project 4: Thread-Safe Queue.
References
- “Rust Atomics and Locks”.
- “The Art of Multiprocessor Programming”.
Key insights
Memory ordering is the correctness contract of lock-free algorithms.
Summary
Atomics plus correct ordering allow safe, fast synchronization without locks.
Homework/Exercises to practice the concept
- Implement a spinlock using atomics.
- Replace SeqCst with Acquire/Release and explain why it’s safe.
Solutions to the homework/exercises
- Use
compare_exchangeon a bool flag. - Show that writes are released before publication and acquired before use.
2.2 The ABA Problem and Reclamation
Fundamentals
The ABA problem occurs when a pointer value changes from A to B and back to A, fooling compare-and-swap into thinking nothing changed. In lock-free stacks, this can lead to use-after-free if a node is popped, freed, and then reallocated. Safe reclamation strategies include hazard pointers and epoch-based reclamation.
Deep Dive into the concept
In a Treiber stack, you load the head pointer, then attempt to CAS it to head.next. If another thread pops the head and then pushes a new node that happens to reuse the same address, the head pointer value appears unchanged (A -> B -> A). The CAS succeeds, but the node you thought you were popping might be a different allocation. This is the ABA problem. It is subtle and can lead to memory corruption if you free nodes immediately.
To mitigate ABA, you need a reclamation strategy. Hazard pointers are one approach: each thread publishes which nodes it might access, and nodes are only freed when no thread hazards them. Epoch-based reclamation is another: memory is freed only after all threads have advanced past the epoch in which the node was removed. Rust’s crossbeam_epoch provides a safe implementation of epoch-based reclamation.
For this project, you can implement a simplified reclamation scheme: use crossbeam_epoch to manage nodes, or implement a simple deferred free list with a global epoch counter. The key is to show that you cannot just Box::from_raw and drop nodes immediately after popping in a multi-threaded context.
How this fit on projects
This concept is required in §3.2 (requirements) and §7.1 (pitfalls). It also connects to Project 2 (refcounting) because both manage memory lifetime under shared access.
Definitions & key terms
- ABA problem: A value changes and changes back, fooling CAS.
- Hazard pointer: Thread-local pointer protecting nodes.
- Epoch: Logical time used for safe reclamation.
Mental model diagram (ASCII)
T1: load head = A
T2: pop A, free, alloc new at A
T1: CAS succeeds -> stale
How it works (step-by-step)
- Thread loads head (A).
- Another thread removes and frees A.
- Allocation returns A again.
- CAS sees A and succeeds incorrectly.
Minimal concrete example
// ABA scenario in comments; use hazard pointers to avoid
Common misconceptions
- “CAS prevents ABA.” It only checks equality, not history.
- “ABA is rare.” It is possible under allocator reuse.
Check-your-understanding questions
- Why is ABA dangerous in a lock-free stack?
- How do hazard pointers solve it?
- Why is immediate free unsafe?
Check-your-understanding answers
- CAS can succeed on a stale pointer.
- Hazard pointers prevent freeing nodes still in use.
- Another thread may still access the node.
Real-world applications
- Lock-free stacks and queues.
- Memory allocators.
Where you’ll apply it
- This project: §5.10 Phase 2, §7 pitfalls.
- Also used in: Project 2: Rc.
References
- “The Art of Multiprocessor Programming”.
- Crossbeam epoch docs.
Key insights
ABA is a memory reclamation problem, not just a CAS problem.
Summary
Lock-free algorithms require safe memory reclamation to avoid ABA bugs.
Homework/Exercises to practice the concept
- Read about hazard pointers and summarize the algorithm.
- Use
crossbeam_epochin a toy example.
Solutions to the homework/exercises
- Hazard pointers publish nodes in use, delaying frees.
- Use
pin()andguard.defer_destroy.
2.3 Treiber Stack Algorithm
Fundamentals
Treiber’s stack is a classic lock-free stack algorithm using an atomic head pointer and compare-and-swap. It is simple and demonstrates core lock-free patterns.
Deep Dive into the concept
Treiber’s algorithm uses a singly linked list. Push: allocate node, set its next to current head, then CAS head to new node. Pop: load head, if null return empty; otherwise set head to head.next via CAS; if CAS succeeds, return node. This algorithm is lock-free because at least one thread makes progress on each CAS attempt. However, it is not wait-free: a thread may retry indefinitely under contention.
The correctness of Treiber’s stack depends on memory ordering and safe reclamation. The push must publish the node after it is initialized. The pop must ensure it reads a consistent node. In Rust, you implement this with AtomicPtr<Node<T>>. You must carefully manage ownership: nodes are allocated on the heap, and reclamation is deferred using hazard pointers or epochs.
How this fit on projects
This concept is the algorithmic core of §4.4 and §5.10. It connects to Project 4 (bounded queue) as a contrast.
Definitions & key terms
- Treiber stack: Lock-free stack using CAS.
- CAS loop: Loop that retries until CAS succeeds.
Mental model diagram (ASCII)
Head -> Node1 -> Node2
Push: new -> old head -> CAS
Pop: read head -> CAS to next
How it works (step-by-step)
- Load head.
- If null, return empty.
- Read next pointer.
- CAS head to next.
- On success, return node.
Minimal concrete example
loop {
let head = self.head.load(Ordering::Acquire);
if head.is_null() { return None; }
let next = unsafe { (*head).next };
if self.head.compare_exchange(head, next, Ordering::AcqRel, Ordering::Relaxed).is_ok() {
return Some(head);
}
}
Common misconceptions
- “Lock-free means faster.” Under contention, lock-free can be slower.
- “Treiber stack is safe without reclamation.” It is not.
Check-your-understanding questions
- Why is Treiber stack lock-free but not wait-free?
- What happens on CAS failure?
Check-your-understanding answers
- Some thread makes progress, but a single thread may starve.
- The thread retries with the new head value.
Real-world applications
- Work-stealing deques.
- Memory allocators.
Where you’ll apply it
- This project: §4.4, §5.10 Phase 2.
- Also used in: Project 4: Thread-Safe Queue.
References
- Treiber (1986) paper.
Key insights
CAS loops are the core of lock-free data structures.
Summary
Treiber’s stack is the simplest lock-free algorithm but still demands careful memory management.
Homework/Exercises to practice the concept
- Implement a single-threaded Treiber stack first.
- Add atomics and test with two threads.
Solutions to the homework/exercises
- Use a
Vecor linked list to simulate stack behavior. - Add
AtomicPtrand CAS loop.
3. Project Specification
3.1 What You Will Build
A crate lockfree_stack implementing Treiber’s lock-free stack with safe memory reclamation and benchmarks.
3.2 Functional Requirements
pushandpopare lock-free.- Safe memory reclamation (hazard pointers or epoch).
- Stress tests with many threads.
- CLI demo shows throughput and correctness.
3.3 Non-Functional Requirements
- Safety: No use-after-free.
- Performance: Competitive with mutex stack.
3.4 Example Usage / Output
$ cargo run --example lockfree_demo
threads=32 ops=1_000_000
push/pop ok
lost: 0
throughput: 8.2M ops/sec
exit code: 0
3.5 Data Formats / Schemas / Protocols
Node { data: T, next: *mut Node<T> }.
3.6 Edge Cases
- Pop on empty stack.
- High contention stress.
- ABA risk under reuse.
3.7 Real World Outcome
Deterministic demo and failure case.
3.7.1 How to Run (Copy/Paste)
cargo run --example lockfree_demo
3.7.2 Golden Path Demo (Deterministic)
- Fixed thread count and op count.
3.7.3 CLI Transcript (Success)
$ cargo run --example lockfree_demo -- --threads 4 --ops 10000
push/pop ok
lost: 0
exit code: 0
3.7.4 Failure Demo (Reclamation Off)
$ cargo run --example lockfree_demo -- --no-reclaim
error: unsafe mode without reclamation enabled
exit code: 2
4. Solution Architecture
4.1 High-Level Design
AtomicPtr<Head>
|
Node { data, next }
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
LockFreeStack |
API | push/pop |
Node |
link node | raw pointers |
| Reclaimer | safe memory | epoch or hazard |
4.4 Data Structures (No Full Code)
struct Node<T> { data: T, next: *mut Node<T> }
4.4 Algorithm Overview
Key Algorithm: CAS Loop
- Load head.
- Set next.
- CAS head.
- Retry on failure.
Complexity Analysis:
- Time: O(1) average per op
- Space: O(n)
5. Implementation Guide
5.1 Development Environment Setup
cargo new lockfree_stack
cd lockfree_stack
cargo add crossbeam-epoch
5.2 Project Structure
lockfree_stack/
├── src/
│ ├── lib.rs
│ └── stack.rs
├── examples/
│ └── lockfree_demo.rs
└── benches/
└── stack_bench.rs
5.3 The Core Question You’re Answering
“How can a stack be safe and correct without using locks?”
5.4 Concepts You Must Understand First
- Atomics and memory ordering.
- ABA problem and reclamation.
- CAS loops.
5.5 Questions to Guide Your Design
- What ordering is required for push/pop?
- How will you reclaim nodes safely?
- How will you test correctness under concurrency?
5.6 Thinking Exercise
Write Treiber’s pseudocode and annotate ordering.
5.7 The Interview Questions They’ll Ask
- “What is the ABA problem?”
- “Why is memory ordering important?”
- “What does lock-free mean?”
5.8 Hints in Layers
Hint 1: Start with SeqCst ordering.
Hint 2: Use crossbeam_epoch for reclamation.
Hint 3: Benchmark against Mutex<Vec<T>>.
5.9 Books That Will Help
| Topic | Book | Chapter | |—|—|—| | Atomics | Rust Atomics and Locks | Ch. 2-4 | | Lock-free | Art of Multiprocessor Programming | Stack chapter |
5.10 Implementation Phases
Phase 1: Single-Threaded Stack (3-4 days)
- Implement basic stack without atomics.
Phase 2: Lock-Free Core (1-2 weeks)
- Implement
AtomicPtr+ CAS loop.
Phase 3: Reclamation + Benchmarks (1 week)
- Add hazard/epoch reclamation, run benchmarks.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale | |—|—|—|—| | Reclamation | hazard vs epoch | epoch | available crate | | Ordering | SeqCst vs AcqRel | start SeqCst | simplify reasoning |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples | |—|—|—| | Unit Tests | basic push/pop | single-threaded | | Stress Tests | concurrency | many threads | | Reclamation Tests | ABA | reuse nodes |
6.2 Critical Test Cases
- Push/pop order correct under concurrency.
- No lost nodes under stress.
- Reclamation does not free live nodes.
6.3 Test Data
threads=4
ops=10000
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution | |—|—|—| | Wrong ordering | data races | use Acquire/Release | | Freeing too early | crashes | use epoch reclamation | | Forgetting retry | lost nodes | CAS loop |
7.2 Debugging Strategies
- Use
loomorcrossbeamtests. - Add debug counters.
7.3 Performance Traps
- Excessive contention on a single head pointer.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add
is_emptyandlen(approx).
8.2 Intermediate Extensions
- Implement a lock-free queue.
8.3 Advanced Extensions
- Add elimination backoff stack.
9. Real-World Connections
9.1 Industry Applications
- Memory allocators.
- Runtime schedulers.
9.2 Related Open Source Projects
crossbeam.
9.3 Interview Relevance
- Explain lock-free vs wait-free.
10. Resources
10.1 Essential Reading
- Rust Atomics and Locks.
- Art of Multiprocessor Programming.
10.2 Video Resources
- Talks on lock-free programming.
10.3 Tools & Documentation
crossbeam-epochdocs.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain Acquire/Release ordering.
- I can explain ABA and reclamation.
11.2 Implementation
- Stack passes stress tests.
- Reclamation is safe.
11.3 Growth
- I can compare lock-free and lock-based designs.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Lock-free push/pop with correct ordering.
- CLI demo with deterministic output and failure case.
Full Completion:
- Safe reclamation strategy implemented.
Excellence (Going Above & Beyond):
- Advanced elimination/backoff strategies.