Project 7: Lock-Free Concurrent Queue
Implement a multi-producer, multi-consumer lock-free queue using atomic operations.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C++ or Rust |
| Alternative Programming Languages | C, Go (with atomics) |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | High-performance systems |
| Prerequisites | Concurrency, atomics, memory model basics |
| Key Topics | CAS, memory ordering, ABA, hazard pointers |
1. Learning Objectives
By completing this project, you will:
- Implement a lock-free MPMC queue.
- Use compare-and-swap correctly with memory orderings.
- Explain the ABA problem and mitigation strategies.
- Identify false sharing and align data to cache lines.
- Measure throughput under contention.
- Compare lock-free vs lock-based designs.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Atomic Operations and Memory Ordering
Description / Expanded Explanation
Atomic operations ensure updates appear indivisible. Memory ordering controls visibility of writes across threads. Incorrect ordering can yield subtle bugs.
Definitions & Key Terms
- atomic -> indivisible operation
- memory order -> rules for visibility (relaxed, acquire, release, seq_cst)
- fence -> memory barrier
- data race -> unsynchronized access
Mental Model Diagram (ASCII)
Thread A: write data -> release store
Thread B: acquire load -> read data
How It Works (Step-by-Step)
- Producer writes node data.
- Producer publishes pointer with release semantics.
- Consumer reads pointer with acquire semantics.
- Consumer sees node data consistently.
Minimal Concrete Example
head.store(newNode, std::memory_order_release);   // publish: prior writes to *newNode become visible
Node* n = head.load(std::memory_order_acquire);   // observe: pairs with the release store
Common Misconceptions
- “Atomics automatically synchronize everything” -> ordering must be chosen.
- “Relaxed is always safe” -> it can break visibility guarantees.
Check-Your-Understanding Questions
- What does acquire guarantee that relaxed does not?
- When is seq_cst required?
- How can reordering break a queue?
Where You’ll Apply It
- See 4.4 for CAS loops.
- See 5.10 Phase 2 for implementation.
- Also used in: P02 Load Balancer
2.2 Compare-And-Swap (CAS)
Description / Expanded Explanation
CAS updates a value only if it matches an expected value. It is the core primitive for lock-free data structures.
Definitions & Key Terms
- CAS -> compare and swap
- expected -> value you think is in memory
- retry loop -> repeat CAS until success
Mental Model Diagram (ASCII)
if (*ptr == expected) *ptr = desired
How It Works (Step-by-Step)
- Read pointer.
- Compute next pointer.
- Attempt CAS; if fails, retry.
Minimal Concrete Example
while (!head.compare_exchange_weak(expected, desired)) { }  // note: "new" is a C++ keyword and cannot name a variable
Common Misconceptions
- “CAS always succeeds quickly” -> under contention it can spin.
- “CAS replaces locks entirely” -> still needs safe memory reclamation.
Check-Your-Understanding Questions
- Why use compare_exchange_weak in a loop?
- What happens when two threads CAS the same pointer?
- When is CAS insufficient by itself?
Where You’ll Apply It
- See 4.4 algorithm overview.
- See 5.10 Phase 2 for enqueue/dequeue loops.
- Also used in: P01 Memory Allocator
2.3 ABA Problem and Reclamation
Description / Expanded Explanation
ABA occurs when a pointer changes from A to B and back to A. CAS can incorrectly succeed. Safe memory reclamation techniques like hazard pointers or epoch-based reclamation prevent use-after-free.
Definitions & Key Terms
- ABA -> pointer changes and returns to same value
- hazard pointer -> thread-protected reference
- epoch -> global phase tracking
- reclamation -> safe memory free
Mental Model Diagram (ASCII)
A -> B -> A (CAS sees A and thinks nothing changed)
How It Works (Step-by-Step)
- Thread reads pointer A.
- Another thread removes A, frees it, allocates new node at same address A.
- CAS sees A and succeeds incorrectly.
- Hazard pointers prevent freeing while another thread may use it.
Minimal Concrete Example
hazard[tid].store(node); // announce use: the reclaimer must not free this node
Common Misconceptions
- “ABA is rare” -> it can happen under heavy churn.
- “GC solves everything” -> unmanaged languages need explicit reclamation.
Check-Your-Understanding Questions
- Why does ABA break CAS correctness?
- What is the trade-off between hazard pointers and epochs?
- How can tagged pointers help?
Where You’ll Apply It
- See 3.2 for requirements (memory safety).
- See 5.10 Phase 3 for reclamation.
- Also used in: P03 Persistent Key-Value Store
2.4 Cache Lines and False Sharing
Description / Expanded Explanation
If two hot fields share the same cache line, threads invalidate each other constantly. This can destroy throughput even if correctness is fine.
Definitions & Key Terms
- cache line -> smallest unit of cache coherence (usually 64 bytes)
- false sharing -> unrelated data on same cache line
- padding -> extra bytes to separate fields
Mental Model Diagram (ASCII)
[head|tail] in same cache line -> contention
How It Works (Step-by-Step)
- Producer updates tail pointer.
- Consumer updates head pointer.
- If same cache line, invalidations cause slowdown.
- Add padding to separate them.
Minimal Concrete Example
struct alignas(64) PaddedPtr { std::atomic<Node*> ptr; };
Common Misconceptions
- “Only correctness matters” -> performance can collapse without padding.
- “Cache lines are always 64 bytes” -> verify target architecture.
Check-Your-Understanding Questions
- What is false sharing?
- Why does padding improve throughput?
- How can you detect false sharing experimentally?
Where You’ll Apply It
- See 4.3 data structures.
- See 7.3 performance traps.
- Also used in: P02 Load Balancer
2.5 Michael-Scott Queue Algorithm
Description / Expanded Explanation
The Michael-Scott (MS) queue is a classic lock-free MPMC queue. It uses a dummy node and CAS on head/tail to ensure progress for multiple producers and consumers.
Definitions & Key Terms
- dummy node -> sentinel at head
- enqueue -> append at tail
- dequeue -> remove from head
- lock-free -> some thread always makes progress
Mental Model Diagram (ASCII)
head -> [dummy] -> [n1] -> [n2] <- tail
How It Works (Step-by-Step)
- Enqueue: CAS tail->next from null to new node, then CAS tail to new node.
- Dequeue: read the value from head->next, then CAS head to head->next; the old dummy is retired.
- If tail lags, help advance it.
Minimal Concrete Example
enqueue: if tail->next == null -> CAS(tail->next, null, node); then CAS(tail, node)
Common Misconceptions
- “Lock-free means wait-free” -> lock-free allows starvation.
- “Dummy node is optional” -> it simplifies edge cases.
Check-Your-Understanding Questions
- Why is a dummy node used?
- What does it mean to “help” advance the tail?
- Can dequeue return null when queue is not empty?
Where You’ll Apply It
- See 4.4 for algorithm details.
- See 5.10 Phase 2 for implementation.
- Also used in: P01 Memory Allocator
3. Project Specification
3.1 What You Will Build
A lock-free queue supporting multiple producers and consumers with safe memory reclamation and performance metrics.
3.2 Functional Requirements
- enqueue and dequeue operations are lock-free.
- MPMC support for concurrent threads.
- Memory reclamation via hazard pointers or epochs.
- Correctness under contention and stress tests.
3.3 Non-Functional Requirements
- Performance: better throughput than mutex queue under contention.
- Reliability: no crashes or memory leaks.
3.4 Example Usage / Output
$ lfqueue-bench --threads 4 --ops 1000000
throughput=12.4M ops/sec
3.5 Data Formats / Schemas / Protocols
Queue node:
struct Node { T value; std::atomic<Node*> next; };
3.6 Edge Cases
- Dequeue from empty queue.
- ABA under heavy churn.
- Tail pointer lagging behind.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./lfqueue-bench --threads 4 --ops 100000
3.7.2 Golden Path Demo (Deterministic)
Use fixed thread scheduling seed and deterministic op sequence.
3.7.3 CLI Transcript (Success)
$ ./lfqueue-bench --threads 2 --ops 10000
throughput=1.2M ops/sec
$ echo $?
0
3.7.4 CLI Transcript (Failure)
$ ./lfqueue-bench --threads 0
error: threads must be >= 1
$ echo $?
1
3.7.5 Exit Codes
| Code | Meaning |
|---|---|
| 0 | success |
| 1 | invalid arguments |
| 2 | detected ABA or corruption |
4. Solution Architecture
4.1 High-Level Design
producers -> enqueue -> queue -> dequeue -> consumers
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Queue core | CAS loops | MS queue algorithm |
| Reclaimer | safe memory free | hazard pointers |
| Benchmarks | throughput tests | fixed workloads |
4.3 Data Structures (No Full Code)
struct alignas(64) PaddedAtomic { std::atomic<Node*> ptr; };
4.4 Algorithm Overview
- Enqueue with CAS on tail and tail->next.
- Dequeue with CAS on head.
Complexity Analysis
- Time: O(1) expected
- Space: O(n) nodes
5. Implementation Guide
5.1 Development Environment Setup
make
5.2 Project Structure
project-root/
├── src/queue/
├── src/reclaim/
├── src/bench/
└── tests/
5.3 The Core Question You’re Answering
“How can multiple threads share a queue without locks and still be correct?”
5.4 Concepts You Must Understand First
- Atomics and memory ordering
- CAS loops
- ABA problem
- False sharing
5.5 Questions to Guide Your Design
- Which memory order is required for enqueue and dequeue?
- How will you safely free nodes?
- How do you measure throughput?
5.6 Thinking Exercise
Simulate two producers enqueueing while a consumer dequeues. Draw the head and tail pointers after each step.
5.7 The Interview Questions They’ll Ask
- What is lock-free vs wait-free?
- Explain the ABA problem.
- Why is memory order important?
5.8 Hints in Layers
- Hint 1: Implement single-producer single-consumer first.
- Hint 2: Add CAS for multiple producers.
- Hint 3: Add hazard pointers for reclamation.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Atomics | C++ Concurrency in Action | Ch. 5 |
| Lock-free | Rust Atomics and Locks | Ch. 3 |
5.10 Implementation Phases
Phase 1: Foundation (4-6 days)
Single-threaded queue, then SPSC.
Phase 2: Core Functionality (7-10 days)
MPMC with CAS.
Phase 3: Polish and Edge Cases (4-6 days)
Hazard pointers, benchmarks, padding.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Reclamation | hazard pointers vs epochs | hazard pointers | simpler to reason about |
| Memory order | seq_cst vs acquire/release | acquire/release | faster and still correct |
| Queue algorithm | MS queue | MS queue | proven correctness |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | enqueue/dequeue correctness | basic FIFO |
| Stress Tests | concurrent ops | 8 threads |
| ABA Tests | forced reuse | custom allocator |
6.2 Critical Test Cases
- FIFO ordering under concurrency.
- Dequeue on empty returns null.
- Hazard pointers prevent use-after-free.
6.3 Test Data
ops: enqueue 1..1M, dequeue 1..1M
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong ordering | sporadic failures | fix acquire/release |
| ABA bugs | rare crashes | add hazard pointers |
| False sharing | low throughput | add padding |
7.2 Debugging Strategies
- Use thread sanitizer (TSan).
- Add validation checks under debug builds.
7.3 Performance Traps
- Heavy contention without backoff.
- Frequent allocations without pooling.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add SPSC queue for comparison.
- Add simple benchmark output.
8.2 Intermediate Extensions
- Add MPMC with bounded capacity.
- Add backoff strategy.
8.3 Advanced Extensions
- Implement wait-free queue.
- Add lock-free stack and compare.
9. Real-World Connections
9.1 Industry Applications
- High-performance messaging systems.
- Task schedulers and work queues.
9.2 Related Open Source Projects
- moodycamel concurrent queue
- folly
9.3 Interview Relevance
- Shows mastery of low-level concurrency and memory models.
10. Resources
10.1 Essential Reading
- C++ Concurrency in Action (Chapter 5)
- Rust Atomics and Locks (Chapters 1-3)
10.2 Video Resources
- Lock-free data structure lectures
10.3 Tools & Documentation
- ThreadSanitizer, perf
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain acquire vs release ordering.
- I can explain ABA and hazard pointers.
- I can explain why false sharing matters.
11.2 Implementation
- Queue is correct under stress tests.
- Memory reclamation is safe.
- Benchmarks show expected throughput.
11.3 Growth
- I can compare lock-free vs lock-based queues.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Correct enqueue/dequeue for single thread.
- Basic CAS-based MPMC implementation.
Full Completion:
- Safe memory reclamation.
- Benchmarks showing improvement over mutex queue.
Excellence (Going Above & Beyond):
- Wait-free variant.
- Formal proof or model checking.