Project 7: Lock-Free Concurrent Queue

Implement a multi-producer, multi-consumer lock-free queue using atomic operations.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C++ or Rust |
| Alternative Programming Languages | C, Go (with atomics) |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | High-performance systems |
| Prerequisites | Concurrency, atomics, memory model basics |
| Key Topics | CAS, memory ordering, ABA, hazard pointers |

1. Learning Objectives

By completing this project, you will:

  1. Implement a lock-free MPMC queue.
  2. Use compare-and-swap correctly with memory orderings.
  3. Explain the ABA problem and mitigation strategies.
  4. Identify false sharing and align data to cache lines.
  5. Measure throughput under contention.
  6. Compare lock-free vs lock-based designs.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Atomic Operations and Memory Ordering

Description / Expanded Explanation

Atomic operations ensure updates appear indivisible. Memory ordering controls visibility of writes across threads. Incorrect ordering can yield subtle bugs.

Definitions & Key Terms
  • atomic -> indivisible operation
  • memory order -> rules for visibility (relaxed, acquire, release)
  • fence -> memory barrier
  • data race -> unsynchronized access
Mental Model Diagram (ASCII)
Thread A: write data -> release store
Thread B: acquire load -> read data
Mental Model Diagram (Image)

Memory Ordering

How It Works (Step-by-Step)
  1. Producer writes node data.
  2. Producer publishes pointer with release semantics.
  3. Consumer reads pointer with acquire semantics.
  4. Consumer sees node data consistently.
Minimal Concrete Example
head.store(newNode, std::memory_order_release);
Node* n = head.load(std::memory_order_acquire);
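
The two lines above can be expanded into a minimal publication sketch (illustrative only; `Node`, `producer`, and `consumer` are names invented for this example):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Node { int value; };

std::atomic<Node*> head{nullptr};

void producer() {
    Node* n = new Node;
    n->value = 42;                                // 1. write node data (plain store)
    head.store(n, std::memory_order_release);     // 2. publish with release
}

void consumer() {
    Node* n;
    // 3. spin until the pointer is visible, reading with acquire
    while ((n = head.load(std::memory_order_acquire)) == nullptr) {}
    // 4. acquire pairs with release: the data written before the
    //    store is now guaranteed visible to this thread
    assert(n->value == 42);
}
```

The release store pairs with the acquire load, which is exactly what step 4 relies on: once the consumer observes the pointer, it also observes every write the producer made before publishing it.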
Common Misconceptions
  • “Atomics automatically synchronize everything” -> ordering must be chosen.
  • “Relaxed is always safe” -> it can break visibility guarantees.
Check-Your-Understanding Questions
  1. What does acquire guarantee that relaxed does not?
  2. When is seq_cst required?
  3. How can reordering break a queue?
Where You’ll Apply It
  • See 4.4 for CAS loops.
  • See 5.10 Phase 2 for implementation.
  • Also used in: P02 Load Balancer

2.2 Compare-And-Swap (CAS)

Description / Expanded Explanation

CAS updates a value only if it matches an expected value. It is the core primitive for lock-free data structures.

Definitions & Key Terms
  • CAS -> compare and swap
  • expected -> value you think is in memory
  • retry loop -> repeat CAS until success
Mental Model Diagram (ASCII)
if (*ptr == expected) *ptr = desired
Mental Model Diagram (Image)

CAS

How It Works (Step-by-Step)
  1. Read pointer.
  2. Compute next pointer.
  3. Attempt CAS; if fails, retry.
Minimal Concrete Example
while (!head.compare_exchange_weak(old, desired)) { }  // 'old' is reloaded on each failure
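
The one-liner can be fleshed out into a complete retry loop, sketched here as a lock-free stack push (a stack rather than a queue, but the CAS idiom is identical):

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> top{nullptr};

void push(int v) {
    Node* n = new Node{v, nullptr};
    n->next = top.load(std::memory_order_relaxed);   // 1. read current top
    // 2-3. attempt CAS; on failure, compare_exchange_weak writes the
    // observed value back into n->next, so the loop body stays empty
    while (!top.compare_exchange_weak(n->next, n,
                                      std::memory_order_release,
                                      std::memory_order_relaxed)) {}
}
```

Note that `compare_exchange_weak` refreshes its first argument (the expected value) automatically on failure, which is why no explicit reload is needed before the retry.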
Common Misconceptions
  • “CAS always succeeds quickly” -> under contention it can spin.
  • “CAS replaces locks entirely” -> still needs safe memory reclamation.
Check-Your-Understanding Questions
  1. Why use compare_exchange_weak in a loop?
  2. What happens when two threads CAS the same pointer?
  3. When is CAS insufficient by itself?
Where You’ll Apply It
  • See 4.4 algorithm overview.
  • See 5.10 Phase 2 for enqueue/dequeue loops.
  • Also used in: P01 Memory Allocator

2.3 ABA Problem and Reclamation

Description / Expanded Explanation

ABA occurs when a pointer changes from A to B and back to A. CAS can incorrectly succeed. Safe memory reclamation techniques like hazard pointers or epoch-based reclamation prevent use-after-free.

Definitions & Key Terms
  • ABA -> pointer changes and returns to same value
  • hazard pointer -> thread-protected reference
  • epoch -> global phase tracking
  • reclamation -> safe memory free
Mental Model Diagram (ASCII)
A -> B -> A  (CAS sees A and thinks nothing changed)
Mental Model Diagram (Image)

ABA

How It Works (Step-by-Step)
  1. Thread reads pointer A.
  2. Another thread removes A, frees it, allocates new node at same address A.
  3. CAS sees A and succeeds incorrectly.
  4. Hazard pointers prevent freeing while another thread may use it.
Minimal Concrete Example
hazard[tid] = node; // publish to block reclamation; re-check the pointer before use
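
The one-liner hides a crucial re-validation step. A sketch of the acquire protocol (hypothetical names; `hazard`, `protect_head`, and `MAX_THREADS` are invented for this example, not from a real library):

```cpp
#include <atomic>
#include <cassert>

constexpr int MAX_THREADS = 8;

// One published hazard slot per thread; reclaimers scan this array
// and must not free any node that appears in it.
std::atomic<void*> hazard[MAX_THREADS];

struct Node { int value; std::atomic<Node*> next; };
std::atomic<Node*> head{nullptr};

// Safely acquire a reference to head: publish it as hazardous, then
// re-check that head has not changed. Without the re-check, the node
// could be retired between the load and the publication.
Node* protect_head(int tid) {
    Node* n;
    do {
        n = head.load(std::memory_order_acquire);
        hazard[tid].store(n, std::memory_order_seq_cst);
    } while (n != head.load(std::memory_order_acquire));
    return n;   // safe to dereference until hazard[tid] is cleared
}
```

After the thread is done with the node it clears its slot (`hazard[tid].store(nullptr)`), allowing reclaimers to free it.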
Common Misconceptions
  • “ABA is rare” -> it can happen under heavy churn.
  • “GC solves everything” -> unmanaged languages need explicit reclamation.
Check-Your-Understanding Questions
  1. Why does ABA break CAS correctness?
  2. What is the trade-off between hazard pointers and epochs?
  3. How can tagged pointers help?
Where You’ll Apply It
  • See 5.10 Phase 3 for hazard pointers.
  • See 6.1 ABA tests with a custom allocator.

2.4 Cache Lines and False Sharing

Description / Expanded Explanation

If two hot fields share the same cache line, threads invalidate each other constantly. This can destroy throughput even if correctness is fine.

Definitions & Key Terms
  • cache line -> smallest unit of cache coherence (usually 64 bytes)
  • false sharing -> unrelated data on same cache line
  • padding -> extra bytes to separate fields
Mental Model Diagram (ASCII)
[head|tail] in same cache line -> contention
Mental Model Diagram (Image)

False Sharing

How It Works (Step-by-Step)
  1. Producer updates tail pointer.
  2. Consumer updates head pointer.
  3. If same cache line, invalidations cause slowdown.
  4. Add padding to separate them.
Minimal Concrete Example
struct alignas(64) PaddedPtr { std::atomic<Node*> ptr; };
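
Applied to this queue, head and tail each get their own line. A sketch, assuming 64-byte lines (C++17's std::hardware_destructive_interference_size can replace the hard-coded 64 where supported):

```cpp
#include <atomic>
#include <cassert>

struct Node;

// head and tail each occupy their own 64-byte cache line, so producer
// updates to tail do not invalidate the line the consumer is reading
// head from, and vice versa.
struct Queue {
    alignas(64) std::atomic<Node*> head{nullptr};   // touched by consumers
    alignas(64) std::atomic<Node*> tail{nullptr};   // touched by producers
};
```

With both members aligned to 64 bytes, `sizeof(Queue)` is 128: the two pointers can never land on the same line.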
Common Misconceptions
  • “Only correctness matters” -> performance can collapse without padding.
  • “Cache lines are always 64 bytes” -> verify target architecture.
Check-Your-Understanding Questions
  1. What is false sharing?
  2. Why does padding improve throughput?
  3. How can you detect false sharing experimentally?
Where You’ll Apply It
  • See 4.3 data structures.
  • See 7.3 performance traps.
  • Also used in: P02 Load Balancer

2.5 Michael-Scott Queue Algorithm

Description / Expanded Explanation

The Michael-Scott (MS) queue is a classic lock-free MPMC queue. It uses a dummy node and CAS on head/tail to ensure progress for multiple producers and consumers.

Definitions & Key Terms
  • dummy node -> sentinel at head
  • enqueue -> append at tail
  • dequeue -> remove from head
  • lock-free -> some thread always makes progress
Mental Model Diagram (ASCII)
head -> [dummy] -> [n1] -> [n2] <- tail
Mental Model Diagram (Image)

MS Queue

How It Works (Step-by-Step)
  1. Enqueue: CAS tail->next from null to the new node, then CAS tail to the new node.
  2. Dequeue: read the value from head->next, then CAS head to head->next; the old dummy is retired.
  3. If tail lags behind, any thread helps advance it.
Minimal Concrete Example
enqueue: if tail.next==null -> CAS(tail.next,new)
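
The enqueue steps can be sketched as a full loop (a minimal illustration with reclamation omitted, not a complete MS queue):

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    std::atomic<Node*> next{nullptr};
};

std::atomic<Node*> head;   // points at the dummy node
std::atomic<Node*> tail;

void init() {
    Node* dummy = new Node{};   // sentinel simplifies empty-queue cases
    head.store(dummy);
    tail.store(dummy);
}

void enqueue(int v) {
    Node* n = new Node{v};
    for (;;) {
        Node* last = tail.load(std::memory_order_acquire);
        Node* next = last->next.load(std::memory_order_acquire);
        if (next == nullptr) {
            // 1. link the new node after the current last node
            if (last->next.compare_exchange_weak(next, n,
                                                 std::memory_order_release)) {
                // 2. swing tail forward; failure is fine, someone helped
                tail.compare_exchange_weak(last, n, std::memory_order_release);
                return;
            }
        } else {
            // 3. tail lags: help advance it, then retry
            tail.compare_exchange_weak(last, next, std::memory_order_release);
        }
    }
}
```

The second CAS is allowed to fail because another thread may already have helped advance the tail, which is exactly the "helping" behavior step 3 describes.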
Common Misconceptions
  • “Lock-free means wait-free” -> lock-free allows starvation.
  • “Dummy node is optional” -> it simplifies edge cases.
Check-Your-Understanding Questions
  1. Why is a dummy node used?
  2. What does it mean to “help” advance the tail?
  3. Can dequeue return null when queue is not empty?
Where You’ll Apply It
  • See 4.4 for algorithm details.
  • See 5.10 Phase 2 for implementation.
  • Also used in: P01 Memory Allocator

3. Project Specification

3.1 What You Will Build

A lock-free queue supporting multiple producers and consumers with safe memory reclamation and performance metrics.

3.2 Functional Requirements

  1. enqueue and dequeue operations are lock-free.
  2. MPMC support for concurrent threads.
  3. Memory reclamation via hazard pointers or epochs.
  4. Correctness under contention and stress tests.

3.3 Non-Functional Requirements

  • Performance: better throughput than mutex queue under contention.
  • Reliability: no crashes or memory leaks.

3.4 Example Usage / Output

$ lfqueue-bench --threads 4 --ops 1000000
throughput=12.4M ops/sec

3.5 Data Formats / Schemas / Protocols

Queue node:

struct Node { T value; std::atomic<Node*> next; };

3.6 Edge Cases

  • Dequeue from empty queue.
  • ABA under heavy churn.
  • Tail pointer lagging behind.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
./lfqueue-bench --threads 4 --ops 100000

3.7.2 Golden Path Demo (Deterministic)

Use a fixed random seed and a deterministic operation sequence so runs are reproducible.

3.7.3 CLI Transcript (Success)

$ ./lfqueue-bench --threads 2 --ops 10000
throughput=1.2M ops/sec
$ echo $?
0

3.7.4 CLI Transcript (Failure)

$ ./lfqueue-bench --threads 0
error: threads must be >= 1
$ echo $?
1

3.7.5 Exit Codes

  • 0 success
  • 1 invalid arguments
  • 2 detected ABA or corruption

4. Solution Architecture

4.1 High-Level Design

producers -> enqueue -> queue -> dequeue -> consumers

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Queue core | CAS loops | MS queue algorithm |
| Reclaimer | safe memory free | hazard pointers |
| Benchmarks | throughput tests | fixed workloads |

4.3 Data Structures (No Full Code)

struct alignas(64) PaddedAtomic { std::atomic<Node*> ptr; };

4.4 Algorithm Overview

  • Enqueue with CAS on tail and tail->next.
  • Dequeue with CAS on head.
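
The dequeue side can be sketched under the same assumptions as the enqueue loop (global atomic head/tail, a dummy node at head, reclamation of the retired dummy omitted):

```cpp
#include <atomic>
#include <cassert>
#include <optional>

struct Node {
    int value;
    std::atomic<Node*> next{nullptr};
};

std::atomic<Node*> head;   // points at the dummy node
std::atomic<Node*> tail;

std::optional<int> dequeue() {
    for (;;) {
        Node* first = head.load(std::memory_order_acquire);   // current dummy
        Node* last  = tail.load(std::memory_order_acquire);
        Node* next  = first->next.load(std::memory_order_acquire);
        if (next == nullptr) return std::nullopt;             // queue empty
        if (first == last) {
            // tail lags at the dummy: help advance it before moving head
            tail.compare_exchange_weak(last, next, std::memory_order_release);
            continue;
        }
        int v = next->value;                                  // read before CAS
        if (head.compare_exchange_weak(first, next,
                                       std::memory_order_release)) {
            // `first` (the old dummy) must go through safe reclamation
            return v;   // `next` becomes the new dummy
        }
    }
}
```

Reading the value before the CAS matters: after the CAS succeeds, another thread may already be dequeuing past the node.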

Complexity Analysis

  • Time: O(1) expected
  • Space: O(n) nodes

5. Implementation Guide

5.1 Development Environment Setup

make

5.2 Project Structure

project-root/
├── src/queue/
├── src/reclaim/
├── src/bench/
└── tests/

5.3 The Core Question You’re Answering

“How can multiple threads share a queue without locks and still be correct?”

5.4 Concepts You Must Understand First

  1. Atomics and memory ordering
  2. CAS loops
  3. ABA problem
  4. False sharing

5.5 Questions to Guide Your Design

  1. Which memory order is required for enqueue and dequeue?
  2. How will you safely free nodes?
  3. How do you measure throughput?

5.6 Thinking Exercise

Simulate two producers enqueueing while a consumer dequeues. Draw the head and tail pointers after each step.

5.7 The Interview Questions They’ll Ask

  1. What is lock-free vs wait-free?
  2. Explain the ABA problem.
  3. Why is memory order important?

5.8 Hints in Layers

Hint 1: Implement single-producer single-consumer first.
Hint 2: Add CAS for multiple producers.
Hint 3: Add hazard pointers for reclamation.
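
The Hint 1 starting point can be as small as a bounded ring buffer; a sketch (the class name and fixed-capacity convention are illustrative):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Bounded SPSC ring buffer: the single producer owns `tail` and the
// single consumer owns `head`, so no CAS is needed at all --
// acquire/release ordering on the two indices is enough.
template <typename T, size_t N>
class SpscQueue {
    T buf[N];
    alignas(64) std::atomic<size_t> head{0};   // consumer's index
    alignas(64) std::atomic<size_t> tail{0};   // producer's index
public:
    bool push(const T& v) {
        size_t t = tail.load(std::memory_order_relaxed);
        if ((t + 1) % N == head.load(std::memory_order_acquire))
            return false;                       // full (one slot kept empty)
        buf[t] = v;
        tail.store((t + 1) % N, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire))
            return false;                       // empty
        out = buf[h];
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```

Because no CAS is involved, this makes a clean baseline: once it passes under TSan, you know your acquire/release reasoning is sound before moving on to multi-producer CAS loops.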

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Atomics | C++ Concurrency in Action | Ch. 5 |
| Lock-free | Rust Atomics and Locks | Ch. 3 |

5.10 Implementation Phases

Phase 1: Foundation (4-6 days)

Single-threaded queue, then SPSC.

Phase 2: Core Functionality (7-10 days)

MPMC with CAS.

Phase 3: Polish and Edge Cases (4-6 days)

Hazard pointers, benchmarks, padding.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Reclamation | hazard vs epoch | hazard | simple to reason about |
| Memory order | seq_cst vs acquire/release | acquire/release | faster and correct |
| Queue algorithm | MS queue | MS queue | proven correctness |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit Tests | enqueue/dequeue correctness | basic FIFO |
| Stress Tests | concurrent ops | 8 threads |
| ABA Tests | forced reuse | custom allocator |

6.2 Critical Test Cases

  1. FIFO ordering under concurrency.
  2. Dequeue on empty returns null.
  3. Hazard pointers prevent use-after-free.
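
Test case 1 (FIFO ordering under concurrency) can be checked with per-producer sequence numbers. A harness sketch, using a mutex-guarded std::queue as a stand-in until the lock-free queue is ready (swap in your queue behind the same push/pop interface):

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Stand-in queue; replace with the lock-free implementation under test.
struct TestQueue {
    std::mutex m;
    std::queue<std::pair<int, int>> q;
    void push(std::pair<int, int> v) { std::lock_guard<std::mutex> g(m); q.push(v); }
    bool pop(std::pair<int, int>& out) {
        std::lock_guard<std::mutex> g(m);
        if (q.empty()) return false;
        out = q.front(); q.pop();
        return true;
    }
};

// Producers tag items with (producer id, sequence number); the consumer
// checks that each producer's items come out in increasing order.
int run_stress(int producers, int ops_each) {
    TestQueue q;
    std::atomic<int> popped{0};
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p)
        threads.emplace_back([&, p] {
            for (int i = 0; i < ops_each; ++i) q.push({p, i});
        });
    std::vector<int> last_seen(producers, -1);
    std::thread consumer([&] {
        std::pair<int, int> v;
        while (popped.load() < producers * ops_each)
            if (q.pop(v)) {
                assert(v.second > last_seen[v.first]);   // per-producer FIFO
                last_seen[v.first] = v.second;
                popped.fetch_add(1);
            }
    });
    for (auto& t : threads) t.join();
    consumer.join();
    return popped.load();
}
```

Per-producer ordering (rather than global ordering) is the right property to assert: with multiple producers, interleaving between producers is legitimately nondeterministic.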

6.3 Test Data

ops: enqueue 1..1M, dequeue 1..1M

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Wrong ordering | sporadic failures | fix acquire/release |
| ABA bugs | rare crashes | add hazard pointers |
| False sharing | low throughput | add padding |

7.2 Debugging Strategies

  • Use thread sanitizer (TSan).
  • Add validation checks under debug builds.

7.3 Performance Traps

  • Heavy contention without backoff.
  • Frequent allocations without pooling.
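
A common mitigation for the first trap is exponential backoff inside the CAS loop. A sketch, shown on a small atomic-max helper rather than the queue itself to keep it self-contained:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Exponential backoff for a contended CAS loop: after each failed CAS,
// spin a doubling number of pauses before retrying, which reduces
// coherence traffic on the contended cache line.
inline void fetch_max(std::atomic<int>& target, int v) {
    int cur = target.load(std::memory_order_relaxed);
    int spins = 1;
    while (cur < v &&
           !target.compare_exchange_weak(cur, v, std::memory_order_relaxed)) {
        for (int i = 0; i < spins; ++i)
            std::this_thread::yield();   // or a CPU pause instruction
        if (spins < 1024) spins *= 2;    // cap the backoff
    }
}
```

The same pattern drops into the queue's enqueue/dequeue retry loops: back off only on CAS failure, never on the fast path.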

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add SPSC queue for comparison.
  • Add simple benchmark output.

8.2 Intermediate Extensions

  • Add MPMC with bounded capacity.
  • Add backoff strategy.

8.3 Advanced Extensions

  • Implement wait-free queue.
  • Add lock-free stack and compare.

9. Real-World Connections

9.1 Industry Applications

  • High-performance messaging systems.
  • Task schedulers and work queues.

9.2 Notable Implementations

  • moodycamel concurrent queue
  • folly (Facebook's open-source C++ library)

9.3 Interview Relevance

  • Shows mastery of low-level concurrency and memory models.

10. Resources

10.1 Essential Reading

  • C++ Concurrency in Action (Chapter 5)
  • Rust Atomics and Locks (Chapters 1-3)

10.2 Video Resources

  • Lock-free data structure lectures

10.3 Tools & Documentation

  • ThreadSanitizer, perf

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain acquire vs release ordering.
  • I can explain ABA and hazard pointers.
  • I can explain why false sharing matters.

11.2 Implementation

  • Queue is correct under stress tests.
  • Memory reclamation is safe.
  • Benchmarks show expected throughput.

11.3 Growth

  • I can compare lock-free vs lock-based queues.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Correct enqueue/dequeue for single thread.
  • Basic CAS-based MPMC implementation.

Full Completion:

  • Safe memory reclamation.
  • Benchmarks showing improvement over mutex queue.

Excellence (Going Above & Beyond):

  • Wait-free variant.
  • Formal proof or model checking.