Project 7: Lock-Free Concurrent Queue

Implement a multi-producer, multi-consumer lock-free queue using atomic operations.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C++ or Rust |
| Alternative Programming Languages | C, Go (with atomics) |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | High-performance systems |
| Prerequisites | Concurrency, atomics, memory model basics |
| Key Topics | CAS, memory ordering, ABA, hazard pointers |

1. Learning Objectives

By completing this project, you will:

  1. Implement a lock-free MPMC queue.
  2. Use compare-and-swap correctly with memory orderings.
  3. Explain the ABA problem and mitigation strategies.
  4. Identify false sharing and align data to cache lines.
  5. Measure throughput under contention.
  6. Compare lock-free vs lock-based designs.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Atomic Operations and Memory Ordering

Description / Expanded Explanation

Atomic operations ensure updates appear indivisible. Memory ordering controls visibility of writes across threads. Incorrect ordering can yield subtle bugs.

Definitions & Key Terms
  • atomic -> indivisible operation
  • memory order -> rules for visibility (relaxed, acquire, release)
  • fence -> memory barrier
  • data race -> unsynchronized access
Mental Model Diagram (ASCII)
Thread A: write data -> release store
Thread B: acquire load -> read data
Mental Model Diagram (Image)

Memory Ordering

How It Works (Step-by-Step)
  1. Producer writes node data.
  2. Producer publishes pointer with release semantics.
  3. Consumer reads pointer with acquire semantics.
  4. Consumer sees node data consistently.
Minimal Concrete Example
head.store(newNode, std::memory_order_release);
Node* n = head.load(std::memory_order_acquire);
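
The two lines above can be expanded into a minimal publication sketch (illustrative only; `Node`, `producer`, and `consumer` are names invented for this example):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Node { int value; };

std::atomic<Node*> head{nullptr};

void producer() {
    Node* n = new Node;
    n->value = 42;                                // 1. write node data (plain store)
    head.store(n, std::memory_order_release);     // 2. publish with release
}

void consumer() {
    Node* n;
    // 3. spin until the pointer is visible, reading with acquire
    while ((n = head.load(std::memory_order_acquire)) == nullptr) {}
    // 4. acquire pairs with release: the data written before the
    //    store is now guaranteed visible to this thread
    assert(n->value == 42);
}
```

The release store pairs with the acquire load, which is exactly what step 4 relies on: once the consumer observes the pointer, it also observes every write the producer made before publishing it.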
Common Misconceptions
  • “Atomics automatically synchronize everything” -> ordering must be chosen.
  • “Relaxed is always safe” -> it can break visibility guarantees.
Check-Your-Understanding Questions
  1. What does acquire guarantee that relaxed does not?
  2. When is seq_cst required?
  3. How can reordering break a queue?
Where You’ll Apply It
  • See 4.4 for CAS loops.
  • See 5.10 Phase 2 for implementation.
  • Also used in: P02 Load Balancer

2.2 Compare-And-Swap (CAS)

Description / Expanded Explanation

CAS updates a value only if it matches an expected value. It is the core primitive for lock-free data structures.

Definitions & Key Terms
  • CAS -> compare and swap
  • expected -> value you think is in memory
  • retry loop -> repeat CAS until success
Mental Model Diagram (ASCII)
if (*ptr == expected) *ptr = desired
Mental Model Diagram (Image)

CAS

How It Works (Step-by-Step)
  1. Read pointer.
  2. Compute next pointer.
  3. Attempt CAS; if fails, retry.
Minimal Concrete Example
while (!head.compare_exchange_weak(old, desired)) { }  // 'old' is reloaded on each failure
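
The one-liner can be fleshed out into a complete retry loop, sketched here as a lock-free stack push (a stack rather than a queue, but the CAS idiom is identical):

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> top{nullptr};

void push(int v) {
    Node* n = new Node{v, nullptr};
    n->next = top.load(std::memory_order_relaxed);   // 1. read current top
    // 2-3. attempt CAS; on failure, compare_exchange_weak writes the
    // observed value back into n->next, so the loop body stays empty
    while (!top.compare_exchange_weak(n->next, n,
                                      std::memory_order_release,
                                      std::memory_order_relaxed)) {}
}
```

Note that `compare_exchange_weak` refreshes its first argument (the expected value) automatically on failure, which is why no explicit reload is needed before the retry.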
Common Misconceptions
  • “CAS always succeeds quickly” -> under contention it can spin.
  • “CAS replaces locks entirely” -> still needs safe memory reclamation.
Check-Your-Understanding Questions
  1. Why use compare_exchange_weak in a loop?
  2. What happens when two threads CAS the same pointer?
  3. When is CAS insufficient by itself?
Where You’ll Apply It
  • See 4.4 algorithm overview.
  • See 5.10 Phase 2 for enqueue/dequeue loops.
  • Also used in: P01 Memory Allocator

2.3 ABA Problem and Reclamation

Description / Expanded Explanation

ABA occurs when a pointer changes from A to B and back to A. CAS can incorrectly succeed. Safe memory reclamation techniques like hazard pointers or epoch-based reclamation prevent use-after-free.

Definitions & Key Terms
  • ABA -> pointer changes and returns to same value
  • hazard pointer -> thread-protected reference
  • epoch -> global phase tracking
  • reclamation -> safe memory free
Mental Model Diagram (ASCII)
A -> B -> A  (CAS sees A and thinks nothing changed)
Mental Model Diagram (Image)

ABA

How It Works (Step-by-Step)
  1. Thread reads pointer A.
  2. Another thread removes A, frees it, allocates new node at same address A.
  3. CAS sees A and succeeds incorrectly.
  4. Hazard pointers prevent freeing while another thread may use it.
Minimal Concrete Example
hazard[tid] = node; // publish to block reclamation; re-check the pointer before use
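
The one-liner hides a crucial re-validation step. A sketch of the acquire protocol (hypothetical names; `hazard`, `protect_head`, and `MAX_THREADS` are invented for this example, not from a real library):

```cpp
#include <atomic>
#include <cassert>

constexpr int MAX_THREADS = 8;

// One published hazard slot per thread; reclaimers scan this array
// and must not free any node that appears in it.
std::atomic<void*> hazard[MAX_THREADS];

struct Node { int value; std::atomic<Node*> next; };
std::atomic<Node*> head{nullptr};

// Safely acquire a reference to head: publish it as hazardous, then
// re-check that head has not changed. Without the re-check, the node
// could be retired between the load and the publication.
Node* protect_head(int tid) {
    Node* n;
    do {
        n = head.load(std::memory_order_acquire);
        hazard[tid].store(n, std::memory_order_seq_cst);
    } while (n != head.load(std::memory_order_acquire));
    return n;   // safe to dereference until hazard[tid] is cleared
}
```

After the thread is done with the node it clears its slot (`hazard[tid].store(nullptr)`), allowing reclaimers to free it.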
Common Misconceptions
  • “ABA is rare” -> it can happen under heavy churn.
  • “GC solves everything” -> unmanaged languages need explicit reclamation.
Check-Your-Understanding Questions
  1. Why does ABA break CAS correctness?
  2. What is the trade-off between hazard pointers and epochs?
  3. How can tagged pointers help?
Where You’ll Apply It
  • See 5.10 Phase 3 for hazard pointers.
  • See 6.1 ABA tests with a custom allocator.

2.4 Cache Lines and False Sharing

Description / Expanded Explanation

If two hot fields share the same cache line, threads invalidate each other constantly. This can destroy throughput even if correctness is fine.

Definitions & Key Terms
  • cache line -> smallest unit of cache coherence (usually 64 bytes)
  • false sharing -> unrelated data on same cache line
  • padding -> extra bytes to separate fields
Mental Model Diagram (ASCII)
[head|tail] in same cache line -> contention
Mental Model Diagram (Image)

False Sharing

How It Works (Step-by-Step)
  1. Producer updates tail pointer.
  2. Consumer updates head pointer.
  3. If same cache line, invalidations cause slowdown.
  4. Add padding to separate them.
Minimal Concrete Example
struct alignas(64) PaddedPtr { std::atomic<Node*> ptr; };
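
Applied to this queue, head and tail each get their own line. A sketch, assuming 64-byte lines (C++17's std::hardware_destructive_interference_size can replace the hard-coded 64 where supported):

```cpp
#include <atomic>
#include <cassert>

struct Node;

// head and tail each occupy their own 64-byte cache line, so producer
// updates to tail do not invalidate the line the consumer is reading
// head from, and vice versa.
struct Queue {
    alignas(64) std::atomic<Node*> head{nullptr};   // touched by consumers
    alignas(64) std::atomic<Node*> tail{nullptr};   // touched by producers
};
```

With both members aligned to 64 bytes, `sizeof(Queue)` is 128: the two pointers can never land on the same line.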
Common Misconceptions
  • “Only correctness matters” -> performance can collapse without padding.
  • “Cache lines are always 64 bytes” -> verify target architecture.
Check-Your-Understanding Questions
  1. What is false sharing?
  2. Why does padding improve throughput?
  3. How can you detect false sharing experimentally?
Where You’ll Apply It
  • See 4.3 data structures.
  • See 7.3 performance traps.
  • Also used in: P02 Load Balancer

2.5 Michael-Scott Queue Algorithm

Description / Expanded Explanation

The Michael-Scott (MS) queue is a classic lock-free MPMC queue. It uses a dummy node and CAS on head/tail to ensure progress for multiple producers and consumers.

Definitions & Key Terms
  • dummy node -> sentinel at head
  • enqueue -> append at tail
  • dequeue -> remove from head
  • lock-free -> some thread always makes progress
Mental Model Diagram (ASCII)
head -> [dummy] -> [n1] -> [n2] <- tail
Mental Model Diagram (Image)

MS Queue

How It Works (Step-by-Step)
  1. Enqueue: CAS tail->next from null to the new node, then CAS tail to the new node.
  2. Dequeue: read the value from head->next, then CAS head to head->next; the old dummy is retired.
  3. If tail lags behind, any thread helps advance it.
Minimal Concrete Example
enqueue: if tail.next==null -> CAS(tail.next,new)
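
The enqueue steps can be sketched as a full loop (a minimal illustration with reclamation omitted, not a complete MS queue):

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    std::atomic<Node*> next{nullptr};
};

std::atomic<Node*> head;   // points at the dummy node
std::atomic<Node*> tail;

void init() {
    Node* dummy = new Node{};   // sentinel simplifies empty-queue cases
    head.store(dummy);
    tail.store(dummy);
}

void enqueue(int v) {
    Node* n = new Node{v};
    for (;;) {
        Node* last = tail.load(std::memory_order_acquire);
        Node* next = last->next.load(std::memory_order_acquire);
        if (next == nullptr) {
            // 1. link the new node after the current last node
            if (last->next.compare_exchange_weak(next, n,
                                                 std::memory_order_release)) {
                // 2. swing tail forward; failure is fine, someone helped
                tail.compare_exchange_weak(last, n, std::memory_order_release);
                return;
            }
        } else {
            // 3. tail lags: help advance it, then retry
            tail.compare_exchange_weak(last, next, std::memory_order_release);
        }
    }
}
```

The second CAS is allowed to fail because another thread may already have helped advance the tail, which is exactly the "helping" behavior step 3 describes.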
Common Misconceptions
  • “Lock-free means wait-free” -> lock-free allows starvation.
  • “Dummy node is optional” -> it simplifies edge cases.
Check-Your-Understanding Questions
  1. Why is a dummy node used?
  2. What does it mean to “help” advance the tail?
  3. Can dequeue return null when queue is not empty?
Where You’ll Apply It
  • See 4.4 for algorithm details.
  • See 5.10 Phase 2 for implementation.
  • Also used in: P01 Memory Allocator

3. Project Specification

3.1 What You Will Build

A lock-free queue supporting multiple producers and consumers with safe memory reclamation and performance metrics.

3.2 Functional Requirements

  1. enqueue and dequeue operations are lock-free.
  2. MPMC support for concurrent threads.
  3. Memory reclamation via hazard pointers or epochs.
  4. Correctness under contention and stress tests.

3.3 Non-Functional Requirements

  • Performance: better throughput than mutex queue under contention.
  • Reliability: no crashes or memory leaks.

3.4 Example Usage / Output

$ lfqueue-bench --threads 4 --ops 1000000
throughput=12.4M ops/sec

3.5 Data Formats / Schemas / Protocols

Queue node:

struct Node { T value; std::atomic<Node*> next; };

3.6 Edge Cases

  • Dequeue from empty queue.
  • ABA under heavy churn.
  • Tail pointer lagging behind.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
./lfqueue-bench --threads 4 --ops 100000

3.7.2 Golden Path Demo (Deterministic)

Use a fixed random seed and a deterministic operation sequence so runs are reproducible.

3.7.3 CLI Transcript (Success)

$ ./lfqueue-bench --threads 2 --ops 10000
throughput=1.2M ops/sec
$ echo $?
0

3.7.4 CLI Transcript (Failure)

$ ./lfqueue-bench --threads 0
error: threads must be >= 1
$ echo $?
1

3.7.5 Exit Codes

  • 0 success
  • 1 invalid arguments
  • 2 detected ABA or corruption

4. Solution Architecture

4.1 High-Level Design

producers -> enqueue -> queue -> dequeue -> consumers

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Queue core | CAS loops | MS queue algorithm |
| Reclaimer | safe memory free | hazard pointers |
| Benchmarks | throughput tests | fixed workloads |

4.3 Data Structures (No Full Code)

struct alignas(64) PaddedAtomic { std::atomic<Node*> ptr; };

4.4 Algorithm Overview

  • Enqueue with CAS on tail and tail->next.
  • Dequeue with CAS on head.
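
The dequeue side can be sketched under the same assumptions as the enqueue loop (global atomic head/tail, a dummy node at head, reclamation of the retired dummy omitted):

```cpp
#include <atomic>
#include <cassert>
#include <optional>

struct Node {
    int value;
    std::atomic<Node*> next{nullptr};
};

std::atomic<Node*> head;   // points at the dummy node
std::atomic<Node*> tail;

std::optional<int> dequeue() {
    for (;;) {
        Node* first = head.load(std::memory_order_acquire);   // current dummy
        Node* last  = tail.load(std::memory_order_acquire);
        Node* next  = first->next.load(std::memory_order_acquire);
        if (next == nullptr) return std::nullopt;             // queue empty
        if (first == last) {
            // tail lags at the dummy: help advance it before moving head
            tail.compare_exchange_weak(last, next, std::memory_order_release);
            continue;
        }
        int v = next->value;                                  // read before CAS
        if (head.compare_exchange_weak(first, next,
                                       std::memory_order_release)) {
            // `first` (the old dummy) must go through safe reclamation
            return v;   // `next` becomes the new dummy
        }
    }
}
```

Reading the value before the CAS matters: after the CAS succeeds, another thread may already be dequeuing past the node.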

Complexity Analysis

  • Time: O(1) expected
  • Space: O(n) nodes

5. Implementation Guide

5.1 Development Environment Setup

make

5.2 Project Structure

project-root/
├── src/queue/
├── src/reclaim/
├── src/bench/
└── tests/

5.3 The Core Question You’re Answering

“How can multiple threads share a queue without locks and still be correct?”

5.4 Concepts You Must Understand First

  1. Atomics and memory ordering
  2. CAS loops
  3. ABA problem
  4. False sharing

5.5 Questions to Guide Your Design

  1. Which memory order is required for enqueue and dequeue?
  2. How will you safely free nodes?
  3. How do you measure throughput?

5.6 Thinking Exercise

Simulate two producers enqueueing while a consumer dequeues. Draw the head and tail pointers after each step.

5.7 The Interview Questions They’ll Ask

  1. What is lock-free vs wait-free?
  2. Explain the ABA problem.
  3. Why is memory order important?

5.8 Hints in Layers

Hint 1: Implement single-producer single-consumer first.
Hint 2: Add CAS for multiple producers.
Hint 3: Add hazard pointers for reclamation.
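
The Hint 1 starting point can be as small as a bounded ring buffer; a sketch (the class name and fixed-capacity convention are illustrative):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Bounded SPSC ring buffer: the single producer owns `tail` and the
// single consumer owns `head`, so no CAS is needed at all --
// acquire/release ordering on the two indices is enough.
template <typename T, size_t N>
class SpscQueue {
    T buf[N];
    alignas(64) std::atomic<size_t> head{0};   // consumer's index
    alignas(64) std::atomic<size_t> tail{0};   // producer's index
public:
    bool push(const T& v) {
        size_t t = tail.load(std::memory_order_relaxed);
        if ((t + 1) % N == head.load(std::memory_order_acquire))
            return false;                       // full (one slot kept empty)
        buf[t] = v;
        tail.store((t + 1) % N, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;                       // empty
        out = buf[h];
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```

Because no CAS is involved, this makes a clean baseline: once it passes under TSan, you know your acquire/release reasoning is sound before moving on to multi-producer CAS loops.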

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Atomics | C++ Concurrency in Action | Ch. 5 |
| Lock-free | Rust Atomics and Locks | Ch. 3 |

5.10 Implementation Phases

Phase 1: Foundation (4-6 days)

Single-threaded queue, then SPSC.

Phase 2: Core Functionality (7-10 days)

MPMC with CAS.

Phase 3: Polish and Edge Cases (4-6 days)

Hazard pointers, benchmarks, padding.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Reclamation | hazard vs epoch | hazard | simple to reason about |
| Memory order | seq_cst vs acquire/release | acquire/release | faster and correct |
| Queue algorithm | MS queue | MS queue | proven correctness |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit Tests | enqueue/dequeue correctness | basic FIFO |
| Stress Tests | concurrent ops | 8 threads |
| ABA Tests | forced reuse | custom allocator |

6.2 Critical Test Cases

  1. FIFO ordering under concurrency.
  2. Dequeue on empty returns null.
  3. Hazard pointers prevent use-after-free.
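
Test case 1 (FIFO ordering under concurrency) can be checked with per-producer sequence numbers. A harness sketch, using a mutex-guarded std::queue as a stand-in until the lock-free queue is ready (swap in your queue behind the same push/pop interface):

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Stand-in queue; replace with the lock-free implementation under test.
struct TestQueue {
    std::mutex m;
    std::queue<std::pair<int, int>> q;
    void push(std::pair<int, int> v) { std::lock_guard<std::mutex> g(m); q.push(v); }
    bool pop(std::pair<int, int>& out) {
        std::lock_guard<std::mutex> g(m);
        if (q.empty()) return false;
        out = q.front(); q.pop();
        return true;
    }
};

// Producers tag items with (producer id, sequence number); the consumer
// checks that each producer's items come out in increasing order.
int run_stress(int producers, int ops_each) {
    TestQueue q;
    std::atomic<int> popped{0};
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p)
        threads.emplace_back([&, p] {
            for (int i = 0; i < ops_each; ++i) q.push({p, i});
        });
    std::vector<int> last_seen(producers, -1);
    std::thread consumer([&] {
        std::pair<int, int> v;
        while (popped.load() < producers * ops_each)
            if (q.pop(v)) {
                assert(v.second > last_seen[v.first]);   // per-producer FIFO
                last_seen[v.first] = v.second;
                popped.fetch_add(1);
            }
    });
    for (auto& t : threads) t.join();
    consumer.join();
    return popped.load();
}
```

Per-producer ordering (rather than global ordering) is the right property to assert: with multiple producers, interleaving between producers is legitimately nondeterministic.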

6.3 Test Data

ops: enqueue 1..1M, dequeue 1..1M

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Wrong ordering | sporadic failures | fix acquire/release |
| ABA bugs | rare crashes | add hazard pointers |
| False sharing | low throughput | add padding |

7.2 Debugging Strategies

  • Use thread sanitizer (TSan).
  • Add validation checks under debug builds.

7.3 Performance Traps

  • Heavy contention without backoff.
  • Frequent allocations without pooling.
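
A common mitigation for the first trap is exponential backoff inside the CAS loop. A sketch, shown on a small atomic-max helper rather than the queue itself to keep it self-contained:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Exponential backoff for a contended CAS loop: after each failed CAS,
// spin a doubling number of pauses before retrying, which reduces
// coherence traffic on the contended cache line.
inline void fetch_max(std::atomic<int>& target, int v) {
    int cur = target.load(std::memory_order_relaxed);
    int spins = 1;
    while (cur < v &&
           !target.compare_exchange_weak(cur, v, std::memory_order_relaxed)) {
        for (int i = 0; i < spins; ++i)
            std::this_thread::yield();   // or a CPU pause instruction
        if (spins < 1024) spins *= 2;    // cap the backoff
    }
}
```

The same pattern drops into the queue's enqueue/dequeue retry loops: back off only on CAS failure, never on the fast path.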

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add SPSC queue for comparison.
  • Add simple benchmark output.

8.2 Intermediate Extensions

  • Add MPMC with bounded capacity.
  • Add backoff strategy.

8.3 Advanced Extensions

  • Implement wait-free queue.
  • Add lock-free stack and compare.

9. Real-World Connections

9.1 Industry Applications

  • High-performance messaging systems.
  • Task schedulers and work queues.

9.2 Notable Implementations

  • moodycamel concurrent queue
  • folly (Facebook's open-source C++ library)

9.3 Interview Relevance

  • Shows mastery of low-level concurrency and memory models.

10. Resources

10.1 Essential Reading

  • C++ Concurrency in Action (Chapter 5)
  • Rust Atomics and Locks (Chapters 1-3)

10.2 Video Resources

  • Lock-free data structure lectures

10.3 Tools & Documentation

  • ThreadSanitizer, perf

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain acquire vs release ordering.
  • I can explain ABA and hazard pointers.
  • I can explain why false sharing matters.

11.2 Implementation

  • Queue is correct under stress tests.
  • Memory reclamation is safe.
  • Benchmarks show expected throughput.

11.3 Growth

  • I can compare lock-free vs lock-based queues.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Correct enqueue/dequeue for single thread.
  • Basic CAS-based MPMC implementation.

Full Completion:

  • Safe memory reclamation.
  • Benchmarks showing improvement over mutex queue.

Excellence (Going Above & Beyond):

  • Wait-free variant.
  • Formal proof or model checking.