Project 4: Memory-Mapped Ring Buffer IPC
Build a shared-memory ring buffer for zero-copy IPC between processes.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Language | C (Alternatives: Rust) |
| Prerequisites | Pointers, basic atomics concepts, mmap basics |
| Key Topics | mmap, cache coherency, ring buffers |
1. Learning Objectives
By completing this project, you will:
- Create a shared memory region backed by a file with mmap().
- Implement a lock-free or low-lock ring buffer.
- Handle memory ordering and cache visibility across processes.
- Recover safely from producer or consumer crashes.
2. Theoretical Foundation
2.1 Core Concepts
- Memory-mapped files: Map a file into virtual memory for direct access.
- Cache coherence: Writes become visible at different times across cores.
- Ring buffer design: Fixed-size circular queue using head/tail indices.
- Memory ordering: Atomics ensure correct visibility across processes.
2.2 Why This Matters
Shared memory IPC avoids kernel copying and can be orders of magnitude faster than sockets or pipes. It also exposes low-level realities like page alignment and cache effects.
2.3 Historical Context / Background
High-frequency trading and telemetry systems popularized shared-memory ring buffers for low latency. The LMAX Disruptor formalized many design principles.
2.4 Common Misconceptions
- “mmap() means data is instantly on disk.” Not without msync(); the kernel writes dirty pages back on its own schedule.
- “An atomic increment is enough.” You must also order the payload writes before the index update, as sketched below.
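A minimal sketch of that ordering rule in C11, assuming a shared _Atomic uint64_t write index; the names publish, slot, and tail are illustrative, not part of the spec:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: slot points into the shared mapping, tail is the
   shared write index. The payload must be visible before the index moves. */
static void publish(char *slot, _Atomic uint64_t *tail, uint64_t next,
                    const void *payload, size_t len)
{
    memcpy(slot, payload, len);                    /* 1. write the data      */
    atomic_store_explicit(tail, next,              /* 2. then publish the    */
                          memory_order_release);   /*    index with release  */
}
```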
3. Project Specification
3.1 What You Will Build
A producer process writes log entries into a shared memory ring buffer. A consumer process reads and prints entries in real time, without using sockets or pipes.
3.2 Functional Requirements
- Shared memory setup: Create and map a file in both processes.
- Ring buffer: Fixed-size buffer with head/tail indices.
- Synchronization: Atomics or minimal locking for safe access.
- Crash recovery: Detect stale indices and recover.
3.3 Non-Functional Requirements
- Performance: at least 100k messages/sec between two processes on one machine.
- Reliability: No torn reads or corrupted messages.
- Usability: CLI for producer/consumer mode.
3.4 Example Usage / Output
$ ./shm-ring --mode=producer --file=/tmp/ring.bin
$ ./shm-ring --mode=consumer --file=/tmp/ring.bin
[ring] msg_id=1023 payload="worker started"
3.5 Real World Outcome
You run the producer and consumer in separate terminals. The consumer prints messages with minimal latency. Example output:
$ ./shm-ring --mode=consumer --file=/tmp/ring.bin
[ring] 2025-01-10T12:10:11Z id=1241 payload="request 9f2 latency=3ms"
4. Solution Architecture
4.1 High-Level Design
```
┌──────────────┐        ┌──────────────────┐        ┌──────────────┐
│   Producer   │  mmap  │   Ring Buffer    │  mmap  │   Consumer   │
│   process    │◀──────▶│  (shared file)   │◀──────▶│   process    │
└──────────────┘        └──────────────────┘        └──────────────┘
```

4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| SharedHeader | head/tail indices | use atomics |
| RingBuffer | message storage | fixed-size slots |
| Recovery | detect stale state | magic/version fields |
4.3 Data Structures
```c
#include <stdatomic.h>
#include <stdint.h>

struct ring_header {
    _Atomic uint64_t head;   /* consumer's read index               */
    _Atomic uint64_t tail;   /* producer's write index              */
    uint64_t capacity;       /* number of message slots             */
    uint64_t magic;          /* identifies a valid, current mapping */
};

struct ring_msg {
    uint64_t id;
    uint32_t len;            /* bytes of payload actually used */
    char payload[256];
};
```
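One way the two structs above might be combined into a single mapped region (a sketch; ring_shared, the flexible array member, and ring_bytes() are assumptions, not part of the specification):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overall layout: the header followed by `capacity` slots. */
struct ring_shared {
    struct ring_header hdr;
    struct ring_msg    slots[];   /* flexible array member */
};

/* Total size to pass to ftruncate() and mmap(). */
static inline size_t ring_bytes(uint64_t capacity)
{
    return sizeof(struct ring_shared) + capacity * sizeof(struct ring_msg);
}
```

Keeping capacity a power of two lets slot indices be computed with a mask instead of a modulo.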
4.4 Algorithm Overview
Key Algorithm: single-producer, single-consumer
- The producer reads head and tail, checks that the ring is not full, and computes the next slot from tail.
- It writes the payload into that slot, then atomically advances tail with release ordering.
- The consumer reads tail with acquire ordering; if the ring is not empty, it reads the payload at head and then advances head.
A C sketch of these steps appears after the complexity analysis below.
Complexity Analysis:
- Time: O(1) per message
- Space: O(N) for buffer slots
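A sketch of these steps with C11 atomics, assuming the structs from section 4.3, a slots array laid out as above, and a power-of-two capacity (function names are illustrative):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Producer side: returns false when the ring is full. */
bool ring_enqueue(struct ring_header *h, struct ring_msg *slots,
                  const struct ring_msg *m)
{
    uint64_t tail = atomic_load_explicit(&h->tail, memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&h->head, memory_order_acquire);
    if (tail - head == h->capacity)          /* full: would overwrite unread data */
        return false;
    slots[tail & (h->capacity - 1)] = *m;    /* write the payload first           */
    atomic_store_explicit(&h->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool ring_dequeue(struct ring_header *h, struct ring_msg *slots,
                  struct ring_msg *out)
{
    uint64_t head = atomic_load_explicit(&h->head, memory_order_relaxed);
    uint64_t tail = atomic_load_explicit(&h->tail, memory_order_acquire);
    if (head == tail)                        /* empty */
        return false;
    *out = slots[head & (h->capacity - 1)];  /* read the payload first */
    atomic_store_explicit(&h->head, head + 1, memory_order_release);
    return true;
}
```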
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install build-essential
5.2 Project Structure
```
shm-ring/
├── src/
│   ├── main.c
│   ├── ring.c
│   └── ring.h
├── tests/
│   └── test_throughput.sh
├── Makefile
└── README.md
```

5.3 The Core Question You’re Answering
“How do two processes share memory safely without the kernel mediating every access?”
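A minimal sketch of the mechanism behind that question: both processes map the same file with MAP_SHARED, so one process's stores become the other's loads without a system call per message (error handling is trimmed; the helper name is illustrative):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map (and, if needed, create and size) a shared file of `bytes` bytes. */
void *map_shared_file(const char *path, size_t bytes)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) < 0) {   /* size the backing file */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping outlives the fd */
    return p == MAP_FAILED ? NULL : p;
}
```

The other process would call the same helper on the same path and cast the returned pointer to the shared header type.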
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Memory mappings
  - mmap() flags: MAP_SHARED, PROT_READ | PROT_WRITE
  - Book Reference: “TLPI” Ch. 49
- Memory ordering
  - Acquire/release semantics for head/tail updates
  - Book Reference: “Rust Atomics and Locks” Ch. 3
- Ring buffers
  - Wrap-around arithmetic and capacity
  - Reference: LMAX Disruptor white paper
5.5 Questions to Guide Your Design
Before implementing, think through these:
- How do you avoid overwriting unread data?
- What happens if producer crashes mid-write?
- How do you detect buffer corruption?
- How do you align data to cache lines?
5.6 Thinking Exercise
Draw the Ring
Sketch a buffer of size 8. Move head/tail indices through a series of writes and reads. Note when the buffer is full.
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “What is mmap() and how does it differ from read()?”
- “How do you prevent data races in shared memory?”
- “Why is cache line alignment important?”
5.8 Hints in Layers
Hint 1: Start single-producer/single-consumer. Avoid multi-producer complexity at first.
Hint 2: Use a magic header. Store a magic number and version to detect stale mappings.
Hint 3: Consider padding. Separate head and tail into different cache lines to avoid false sharing (a sketch follows these hints).
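One way Hint 3 might look for the header from section 4.3 (a sketch; the 64-byte line size is a common but not universal assumption, so verify it on your target CPU):

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical padded header: the producer-written and consumer-written
   indices sit on separate cache lines so their updates do not false-share. */
struct ring_header_padded {
    alignas(64) _Atomic uint64_t tail;   /* written by the producer */
    alignas(64) _Atomic uint64_t head;   /* written by the consumer */
    alignas(64) uint64_t capacity;       /* read-only after init    */
                uint64_t magic;
};
```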
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Memory mappings | “The Linux Programming Interface” | Ch. 49 |
| Virtual memory | “Computer Systems: A Programmer’s Perspective” | Ch. 9 |
| Memory ordering | “Rust Atomics and Locks” | Ch. 3 |
5.10 Implementation Phases
Phase 1: Foundation (3-4 days)
Goals:
- File creation and mmap()
- Basic shared struct
Tasks:
- Map a file and write/read a struct.
- Validate mappings across two processes.
Checkpoint: Both processes see the same data.
Phase 2: Core Functionality (5-7 days)
Goals:
- Ring buffer logic
- Atomic head/tail
Tasks:
- Implement enqueue/dequeue.
- Add acquire/release semantics.
Checkpoint: Producer/consumer run without corruption.
Phase 3: Polish & Edge Cases (3-5 days)
Goals:
- Crash recovery
- Throughput metrics
Tasks:
- Add recovery logic for when the indices are invalid (a validation sketch follows this phase).
- Measure throughput under load.
Checkpoint: Recovers after producer restart.
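A sketch of that validation step, assuming the magic field from section 4.3 and a power-of-two capacity (RING_MAGIC and the exact checks are illustrative; a failed check should lead to reinitializing the region):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_MAGIC 0x52494E4742554621ULL   /* hypothetical tag value */

/* Returns true if the mapped header looks usable; otherwise the caller
   should reinitialize the region rather than trust stale indices. */
bool ring_validate(struct ring_header *h)
{
    if (h->magic != RING_MAGIC || h->capacity == 0 ||
        (h->capacity & (h->capacity - 1)) != 0)     /* must be a power of two */
        return false;
    uint64_t head = atomic_load(&h->head);
    uint64_t tail = atomic_load(&h->tail);
    /* tail may never lag head, and the gap may never exceed capacity. */
    return tail >= head && tail - head <= h->capacity;
}
```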
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Synchronization | locks vs atomics | atomics | lower latency |
| Buffer layout | variable vs fixed | fixed-size slots | simple indexing |
| Persistence | msync() vs none | optional | performance vs durability |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Ring arithmetic | wrap-around tests |
| Integration Tests | Producer/consumer | cross-process test |
| Stress Tests | Throughput | 100k msgs/sec |
6.2 Critical Test Cases
- Wrap-around: head/tail crossing the end of the buffer (a small check is sketched after this list).
- Crash recovery: kill producer mid-write.
- High throughput: sustained writes for 60s.
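A small self-contained check of the wrap-around arithmetic, assuming the monotonic-index, power-of-two-mask scheme sketched in section 4.4:

```c
#include <assert.h>
#include <stdint.h>

int main(void)
{
    const uint64_t cap = 8;                 /* power of two */
    uint64_t head = 0, tail = 0;

    tail += cap;                            /* produce cap messages      */
    assert(tail - head == cap);             /* ring reports full         */
    assert((tail & (cap - 1)) == 0);        /* next slot wraps back to 0 */

    head += 3;                              /* consume three messages    */
    assert(tail - head == cap - 3);         /* occupancy after the wrap  */
    return 0;
}
```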
6.3 Test Data
msg_id=1 payload="hello"
msg_id=2 payload="world"
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Missing memory barriers | corrupted reads | use acquire/release |
| Overwrite unread data | lost messages | check full condition |
| False sharing | low throughput | pad header fields |
7.2 Debugging Strategies
- Use perf to inspect cache misses.
- Add sequence numbers to detect dropped messages.
7.3 Performance Traps
Calling msync() on every write destroys performance. Batch it, or skip it entirely when durability is not required; a sketch of batching follows.
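A sketch of the batching idea, assuming an optional durability mode (the interval of 1024 messages is an arbitrary placeholder):

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Illustrative batching: flush dirty pages every SYNC_INTERVAL messages
   instead of after every enqueue. */
#define SYNC_INTERVAL 1024

static void maybe_sync(void *region, size_t bytes, uint64_t msgs_written)
{
    if (msgs_written % SYNC_INTERVAL == 0)
        msync(region, bytes, MS_ASYNC);   /* request writeback without blocking */
}
```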
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a CLI flag for buffer size.
- Support variable-length payloads with fixed max.
8.2 Intermediate Extensions
- Add multi-consumer support with sequence tracking.
- Add shared memory stats page.
8.3 Advanced Extensions
- Implement multi-producer support using CAS loops (see the sketch after this list).
- Support huge pages for performance.
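A sketch of how the multi-producer extension might claim slots with a CAS loop (per-slot ready flags or sequence numbers would still be needed before the consumer may read; this shows only the claim step, and the names are illustrative):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Each producer claims a unique index; the winner of the CAS owns slot
   (claimed & (capacity - 1)) and fills it before marking it ready. */
uint64_t claim_slot(_Atomic uint64_t *next_tail)
{
    uint64_t t = atomic_load_explicit(next_tail, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               next_tail, &t, t + 1,
               memory_order_acq_rel, memory_order_relaxed))
        ;  /* on failure, t is reloaded with the current value */
    return t;
}
```

An atomic fetch-and-add could claim indices just as well; the CAS form is shown because it extends naturally to checking fullness against the consumer's head before committing the claim.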
9. Real-World Connections
9.1 Industry Applications
- Telemetry pipelines in HFT and observability.
- Shared memory queues in database engines.
9.2 Related Open Source Projects
- LMAX Disruptor: https://github.com/LMAX-Exchange/disruptor - Ring buffer design
- systemd-journal: https://github.com/systemd/systemd - Uses mmap for log buffers
9.3 Interview Relevance
- Demonstrates understanding of memory mapping and atomics.
- Shows ability to reason about cache effects.
10. Resources
10.1 Essential Reading
- “The Linux Programming Interface” by Michael Kerrisk - Ch. 49
- “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Ch. 9
10.2 Video Resources
- Memory mapping deep dives - YouTube
- Lock-free data structure talks
10.3 Tools & Documentation
- man 2 mmap: Memory mapping
- man 2 msync: Sync to disk
10.4 Related Projects in This Series
- Project 1 teaches inode and FD discipline.
- Project 6 can optionally reuse shared memory for checksums.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain how mmap() shares memory across processes.
- I can describe memory ordering needs.
- I can reason about ring buffer wrap-around.
11.2 Implementation
- Producer/consumer run without corruption.
- Throughput meets target.
- Crash recovery works.
11.3 Growth
- I can explain this project in an interview.
- I documented performance trade-offs.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Producer and consumer exchange messages via mmap.
- No message corruption under normal load.
Full Completion:
- Crash recovery and throughput metrics.
- Basic memory ordering correctness.
Excellence (Going Above & Beyond):
- Multi-producer support with correctness tests.
- Optional durability with batched msync().
This guide was generated from SPRINT_5_SYSTEMS_INTEGRATION_PROJECTS.md. For the complete learning path, see the parent directory.