Project 8: Build a Lock-Free Data Structure (Atomics)

Implement a lock-free Treiber stack with atomic pointers, memory ordering, and safe reclamation.

Quick Reference

Attribute	Value
Difficulty	Master
Time Estimate	3-4 weeks
Main Programming Language	Rust
Alternative Programming Languages	C++ (std::atomic)
Coolness Level	Extremely High
Business Potential	High
Prerequisites	Concurrency, atomics, unsafe Rust
Key Topics	Atomics, memory ordering, ABA, memory reclamation

1. Learning Objectives

By completing this project, you will:

Implement a lock-free stack using atomic compare-and-swap.
Explain memory ordering and why it matters for correctness.
Mitigate the ABA problem with hazard pointers or epochs.
Benchmark lock-free vs mutex-based stacks.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Atomics and Memory Ordering

Fundamentals

Atomics provide low-level synchronization without locks. Memory ordering controls how reads and writes can be reordered by the compiler and CPU. The common orderings are Relaxed, Acquire, Release, and SeqCst. Lock-free data structures rely on correct ordering to ensure visibility across threads.

Deep Dive into the concept

Atomics are the foundation of lock-free programming. A lock-free stack uses an atomic pointer for the head. When pushing, you load the head, set the new node’s next pointer to the old head, and then atomically compare-and-swap the head to the new node. If another thread changed the head in the meantime, you retry. This algorithm depends on memory ordering. Using Acquire/Release ensures that writes to the node are visible to other threads once the head pointer is published. SeqCst is the strongest ordering and is easiest to reason about but slower.

Memory ordering is a contract between threads. Release on a store means all previous writes in that thread are visible to any thread that later performs an Acquire load of the same atomic. This is exactly what you need for a stack push: when the head pointer is published with Release, any consumer that loads the head with Acquire will see a fully initialized node. If you use Relaxed, there is no such guarantee and a consumer might see a partially initialized node.

The challenge is that memory ordering is subtle and error-prone. Many lock-free algorithms can be made correct with SeqCst everywhere, but performance might suffer. In this project, you start with SeqCst, then carefully downgrade to Acquire/Release and prove correctness. This exercise teaches how to reason about visibility and ordering, which is essential for safe concurrency.

How this fit on projects

This concept is foundational to §4.4 algorithm design and §6 testing. It connects to Project 4 (mutex-based queue) as a contrast.

Definitions & key terms

Atomic: Operation that is indivisible across threads.
Acquire: Prevents reordering of subsequent loads/stores before the acquire.
Release: Prevents reordering of previous loads/stores after the release.
SeqCst: Sequential consistency, the strongest ordering.

Mental model diagram (ASCII)

Thread A: write data -> Release store head
Thread B: Acquire load head -> read data

How it works (step-by-step)

Allocate node and initialize data.
Load head with Acquire.
Set node.next = head.
CAS head with Release.
Retry on failure.

Minimal concrete example

head.compare_exchange(old, new, Ordering::Release, Ordering::Relaxed)

Common misconceptions

“Relaxed is always fine.” It can break visibility guarantees.
“SeqCst is always required.” It’s safer but sometimes unnecessary.

Check-your-understanding questions

Why does a stack push require Release?
What does Acquire guarantee on pop?
Why is Relaxed dangerous here?

Check-your-understanding answers

It publishes the node’s initialized data.
It ensures the node’s data is visible.
Without ordering, another thread might see uninitialized data.

Real-world applications

Lock-free queues in runtimes.
Shared work-stealing deques.

Where you’ll apply it

This project: §4.4 algorithm, §6.2 tests.
Also used in: Project 4: Thread-Safe Queue.

References

“Rust Atomics and Locks”.
“The Art of Multiprocessor Programming”.

Key insights

Memory ordering is the correctness contract of lock-free algorithms.

Summary

Atomics plus correct ordering allow safe, fast synchronization without locks.

Homework/Exercises to practice the concept

Implement a spinlock using atomics.
Replace SeqCst with Acquire/Release and explain why it’s safe.

Solutions to the homework/exercises

Use compare_exchange on a bool flag.
Show that writes are released before publication and acquired before use.

2.2 The ABA Problem and Reclamation

Fundamentals

The ABA problem occurs when a pointer value changes from A to B and back to A, fooling compare-and-swap into thinking nothing changed. In lock-free stacks, this can lead to use-after-free if a node is popped, freed, and then reallocated. Safe reclamation strategies include hazard pointers and epoch-based reclamation.

Deep Dive into the concept

In a Treiber stack, you load the head pointer, then attempt to CAS it to head.next. If another thread pops the head and then pushes a new node that happens to reuse the same address, the head pointer value appears unchanged (A -> B -> A). The CAS succeeds, but the node you thought you were popping might be a different allocation. This is the ABA problem. It is subtle and can lead to memory corruption if you free nodes immediately.

To mitigate ABA, you need a reclamation strategy. Hazard pointers are one approach: each thread publishes which nodes it might access, and nodes are only freed when no thread hazards them. Epoch-based reclamation is another: memory is freed only after all threads have advanced past the epoch in which the node was removed. Rust’s crossbeam_epoch provides a safe implementation of epoch-based reclamation.

For this project, you can implement a simplified reclamation scheme: use crossbeam_epoch to manage nodes, or implement a simple deferred free list with a global epoch counter. The key is to show that you cannot just Box::from_raw and drop nodes immediately after popping in a multi-threaded context.

How this fit on projects

This concept is required in §3.2 (requirements) and §7.1 (pitfalls). It also connects to Project 2 (refcounting) because both manage memory lifetime under shared access.

Definitions & key terms

ABA problem: A value changes and changes back, fooling CAS.
Hazard pointer: Thread-local pointer protecting nodes.
Epoch: Logical time used for safe reclamation.

Mental model diagram (ASCII)

T1: load head = A
T2: pop A, free, alloc new at A
T1: CAS succeeds -> stale

How it works (step-by-step)

Thread loads head (A).
Another thread removes and frees A.
Allocation returns A again.
CAS sees A and succeeds incorrectly.

Minimal concrete example

// ABA scenario in comments; use hazard pointers to avoid

Common misconceptions

“CAS prevents ABA.” It only checks equality, not history.
“ABA is rare.” It is possible under allocator reuse.

Check-your-understanding questions

Why is ABA dangerous in a lock-free stack?
How do hazard pointers solve it?
Why is immediate free unsafe?

Check-your-understanding answers

CAS can succeed on a stale pointer.
Hazard pointers prevent freeing nodes still in use.
Another thread may still access the node.

Real-world applications

Lock-free stacks and queues.
Memory allocators.

Where you’ll apply it

This project: §5.10 Phase 2, §7 pitfalls.
Also used in: Project 2: Rc.

References

“The Art of Multiprocessor Programming”.
Crossbeam epoch docs.

Key insights

ABA is a memory reclamation problem, not just a CAS problem.

Summary

Lock-free algorithms require safe memory reclamation to avoid ABA bugs.

Homework/Exercises to practice the concept

Read about hazard pointers and summarize the algorithm.
Use crossbeam_epoch in a toy example.

Solutions to the homework/exercises

Hazard pointers publish nodes in use, delaying frees.
Use pin() and guard.defer_destroy.

2.3 Treiber Stack Algorithm

Fundamentals

Treiber’s stack is a classic lock-free stack algorithm using an atomic head pointer and compare-and-swap. It is simple and demonstrates core lock-free patterns.

Deep Dive into the concept

Treiber’s algorithm uses a singly linked list. Push: allocate node, set its next to current head, then CAS head to new node. Pop: load head, if null return empty; otherwise set head to head.next via CAS; if CAS succeeds, return node. This algorithm is lock-free because at least one thread makes progress on each CAS attempt. However, it is not wait-free: a thread may retry indefinitely under contention.

The correctness of Treiber’s stack depends on memory ordering and safe reclamation. The push must publish the node after it is initialized. The pop must ensure it reads a consistent node. In Rust, you implement this with AtomicPtr<Node<T>>. You must carefully manage ownership: nodes are allocated on the heap, and reclamation is deferred using hazard pointers or epochs.

How this fit on projects

This concept is the algorithmic core of §4.4 and §5.10. It connects to Project 4 (bounded queue) as a contrast.

Definitions & key terms

Treiber stack: Lock-free stack using CAS.
CAS loop: Loop that retries until CAS succeeds.

Mental model diagram (ASCII)

Head -> Node1 -> Node2
Push: new -> old head -> CAS
Pop: read head -> CAS to next

How it works (step-by-step)

Load head.
If null, return empty.
Read next pointer.
CAS head to next.
On success, return node.

Minimal concrete example

loop {
    let head = self.head.load(Ordering::Acquire);
    if head.is_null() { return None; }
    let next = unsafe { (*head).next };
    if self.head.compare_exchange(head, next, Ordering::AcqRel, Ordering::Relaxed).is_ok() {
        return Some(head);
    }
}

Common misconceptions

“Lock-free means faster.” Under contention, lock-free can be slower.
“Treiber stack is safe without reclamation.” It is not.

Check-your-understanding questions

Why is Treiber stack lock-free but not wait-free?
What happens on CAS failure?

Check-your-understanding answers

Some thread makes progress, but a single thread may starve.
The thread retries with the new head value.

Real-world applications

Work-stealing deques.
Memory allocators.

Where you’ll apply it

This project: §4.4, §5.10 Phase 2.
Also used in: Project 4: Thread-Safe Queue.

References

Treiber (1986) paper.

Key insights

CAS loops are the core of lock-free data structures.

Summary

Treiber’s stack is the simplest lock-free algorithm but still demands careful memory management.

Homework/Exercises to practice the concept

Implement a single-threaded Treiber stack first.
Add atomics and test with two threads.

Solutions to the homework/exercises

Use a Vec or linked list to simulate stack behavior.
Add AtomicPtr and CAS loop.

3. Project Specification

3.1 What You Will Build

A crate lockfree_stack implementing Treiber’s lock-free stack with safe memory reclamation and benchmarks.

3.2 Functional Requirements

push and pop are lock-free.
Safe memory reclamation (hazard pointers or epoch).
Stress tests with many threads.
CLI demo shows throughput and correctness.

3.3 Non-Functional Requirements

Safety: No use-after-free.
Performance: Competitive with mutex stack.

3.4 Example Usage / Output

$ cargo run --example lockfree_demo
threads=32 ops=1_000_000
push/pop ok
lost: 0
throughput: 8.2M ops/sec
exit code: 0

3.5 Data Formats / Schemas / Protocols

Node { data: T, next: *mut Node<T> }.

3.6 Edge Cases

Pop on empty stack.
High contention stress.
ABA risk under reuse.

3.7 Real World Outcome

Deterministic demo and failure case.

3.7.1 How to Run (Copy/Paste)

cargo run --example lockfree_demo

3.7.2 Golden Path Demo (Deterministic)

Fixed thread count and op count.

3.7.3 CLI Transcript (Success)

$ cargo run --example lockfree_demo -- --threads 4 --ops 10000
push/pop ok
lost: 0
exit code: 0

3.7.4 Failure Demo (Reclamation Off)

$ cargo run --example lockfree_demo -- --no-reclaim
error: unsafe mode without reclamation enabled
exit code: 2

4. Solution Architecture

4.1 High-Level Design

AtomicPtr<Head>
  |
 Node { data, next }

4.2 Key Components

Component	Responsibility	Key Decisions
`LockFreeStack`	API	push/pop
`Node`	link node	raw pointers
Reclaimer	safe memory	epoch or hazard

4.4 Data Structures (No Full Code)

struct Node<T> { data: T, next: *mut Node<T> }

4.4 Algorithm Overview

Key Algorithm: CAS Loop

Load head.
Set next.
CAS head.
Retry on failure.

Complexity Analysis:

Time: O(1) average per op
Space: O(n)

5. Implementation Guide

5.1 Development Environment Setup

cargo new lockfree_stack
cd lockfree_stack
cargo add crossbeam-epoch

5.2 Project Structure

lockfree_stack/
├── src/
│   ├── lib.rs
│   └── stack.rs
├── examples/
│   └── lockfree_demo.rs
└── benches/
    └── stack_bench.rs

5.3 The Core Question You’re Answering

“How can a stack be safe and correct without using locks?”

5.4 Concepts You Must Understand First

Atomics and memory ordering.
ABA problem and reclamation.
CAS loops.

5.5 Questions to Guide Your Design

What ordering is required for push/pop?
How will you reclaim nodes safely?
How will you test correctness under concurrency?

5.6 Thinking Exercise

Write Treiber’s pseudocode and annotate ordering.

5.7 The Interview Questions They’ll Ask

“What is the ABA problem?”
“Why is memory ordering important?”
“What does lock-free mean?”

5.8 Hints in Layers

Hint 1: Start with SeqCst ordering. Hint 2: Use crossbeam_epoch for reclamation. Hint 3: Benchmark against Mutex<Vec<T>>.

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Single-Threaded Stack (3-4 days)

Implement basic stack without atomics.

Phase 2: Lock-Free Core (1-2 weeks)

Implement AtomicPtr + CAS loop.

Phase 3: Reclamation + Benchmarks (1 week)

Add hazard/epoch reclamation, run benchmarks.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Push/pop order correct under concurrency.
No lost nodes under stress.
Reclamation does not free live nodes.

6.3 Test Data

threads=4
ops=10000

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Use loom or crossbeam tests.
Add debug counters.

7.3 Performance Traps

Excessive contention on a single head pointer.

8. Extensions & Challenges

8.1 Beginner Extensions

Add is_empty and len (approx).

8.2 Intermediate Extensions

Implement a lock-free queue.

8.3 Advanced Extensions

Add elimination backoff stack.

9. Real-World Connections

9.1 Industry Applications

Memory allocators.
Runtime schedulers.

crossbeam.

9.3 Interview Relevance

Explain lock-free vs wait-free.

10. Resources

10.1 Essential Reading

Rust Atomics and Locks.
Art of Multiprocessor Programming.

10.2 Video Resources

Talks on lock-free programming.

10.3 Tools & Documentation

crossbeam-epoch docs.

11. Self-Assessment Checklist

11.1 Understanding

I can explain Acquire/Release ordering.
I can explain ABA and reclamation.

11.2 Implementation

Stack passes stress tests.
Reclamation is safe.

11.3 Growth

I can compare lock-free and lock-based designs.

12. Submission / Completion Criteria

Minimum Viable Completion:

Lock-free push/pop with correct ordering.
CLI demo with deterministic output and failure case.

Full Completion:

Safe reclamation strategy implemented.

Excellence (Going Above & Beyond):

Advanced elimination/backoff strategies.