Project 4: Memory-Mapped Ring Buffer IPC

Build a shared-memory ring buffer for zero-copy IPC between processes.

Quick Reference

Attribute       Value
---------       -----
Difficulty      Advanced
Time Estimate   2-3 weeks
Language        C (Alternatives: Rust)
Prerequisites   Pointers, basic atomics concepts, mmap basics
Key Topics      mmap, cache coherence, ring buffers

1. Learning Objectives

By completing this project, you will:

  1. Create a shared memory region backed by a file with mmap().
  2. Implement a lock-free or low-lock ring buffer.
  3. Handle memory ordering and cache visibility across processes.
  4. Recover safely from producer or consumer crashes.

2. Theoretical Foundation

2.1 Core Concepts

  • Memory-mapped files: Map a file into virtual memory for direct access.
  • Cache coherence: a write on one core may become visible to other cores only after a delay, so visibility must never be assumed.
  • Ring buffer design: Fixed-size circular queue using head/tail indices.
  • Memory ordering: acquire/release atomics guarantee that payload writes become visible before the index update that publishes them.

2.2 Why This Matters

Shared memory IPC avoids kernel copying and can be orders of magnitude faster than sockets or pipes. It also exposes low-level realities like page alignment and cache effects.

2.3 Historical Context / Background

High-frequency trading and telemetry systems popularized shared-memory ring buffers for low latency. The LMAX Disruptor formalized many design principles.

2.4 Common Misconceptions

  • “mmap() means data is instantly on disk.” It does not; the mapping is shared between processes immediately, but data reaches the file only after msync() or normal kernel writeback (see the sketch below).
  • “Atomic increment is enough.” You must also order the payload writes before the index update, e.g. with a release store.
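A minimal sketch of the distinction, assuming an existing MAP_SHARED mapping of len bytes at base:

/* Stores into a MAP_SHARED mapping are visible to the other process as soon
 * as they land in memory; msync() only controls when they reach the file. */
if (msync(base, len, MS_SYNC) == -1) {   /* block until written back to disk */
    perror("msync");
}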

3. Project Specification

3.1 What You Will Build

A producer process writes log entries into a shared memory ring buffer. A consumer process reads and prints entries in real time, without using sockets or pipes.

3.2 Functional Requirements

  1. Shared memory setup: Create and map a file in both processes.
  2. Ring buffer: Fixed-size buffer with head/tail indices.
  3. Synchronization: Atomics or minimal locking for safe access.
  4. Crash recovery: Detect stale indices and recover.

3.3 Non-Functional Requirements

  • Performance: sustain at least 100k messages/sec between producer and consumer on one machine.
  • Reliability: No torn reads or corrupted messages.
  • Usability: CLI for producer/consumer mode.

3.4 Example Usage / Output

$ ./shm-ring --mode=producer --file=/tmp/ring.bin
$ ./shm-ring --mode=consumer --file=/tmp/ring.bin
[ring] msg_id=1023 payload="worker started"

3.5 Real World Outcome

You run the producer and consumer in separate terminals. The consumer prints messages with minimal latency. Example output:

$ ./shm-ring --mode=consumer --file=/tmp/ring.bin
[ring] 2025-01-10T12:10:11Z id=1241 payload="request 9f2 latency=3ms"

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐   write   ┌──────────────┐   read    ┌──────────────┐
│   Producer   │──────────▶│ Ring Buffer  │──────────▶│   Consumer   │
└──────────────┘   (mmap)  └──────────────┘   (mmap)  └──────────────┘

Memory-Mapped Ring Buffer Architecture

4.2 Key Components

Component      Responsibility        Key Decisions
---------      --------------        -------------
SharedHeader   head/tail indices     use atomics
RingBuffer     message storage       fixed-size slots
Recovery       detect stale state    magic/version fields

4.3 Data Structures

#include <stdint.h>
#include <stdatomic.h>

/* Shared control block at the start of the mapping. */
struct ring_header {
    _Atomic uint64_t head;   /* next slot the consumer will read */
    _Atomic uint64_t tail;   /* next slot the producer will write */
    uint64_t capacity;       /* number of slots */
    uint64_t magic;          /* layout/version marker used for recovery */
};

/* Fixed-size message slot stored after the header. */
struct ring_msg {
    uint64_t id;             /* monotonically increasing sequence number */
    uint32_t len;            /* bytes of payload actually used */
    char payload[256];
};
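One way to lay the header and slots out in a single mapping; the ring wrapper struct and RING_SLOTS constant below are illustrative, not part of the spec:

#define RING_SLOTS 1024                    /* illustrative fixed capacity */

/* File layout: header first, message slots immediately after. */
struct ring {
    struct ring_header hdr;
    struct ring_msg    slots[RING_SLOTS];
};

/* Size the backing file to hold the whole structure, e.g.
 * ftruncate(fd, sizeof(struct ring)) before mmap()'ing it. */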

4.4 Algorithm Overview

Key Algorithm: single-producer, single-consumer

  1. The producer loads head (acquire) and tail; if tail - head equals capacity, the buffer is full and it must wait or drop.
  2. Otherwise it writes the payload into slot tail % capacity, then advances tail with a release store to publish the message.
  3. The consumer loads tail (acquire); if head equals tail the buffer is empty. Otherwise it reads the payload at head % capacity, then advances head with a release store to free the slot (see the sketch after the complexity analysis).

Complexity Analysis:

  • Time: O(1) per message
  • Space: O(N) for buffer slots
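A minimal SPSC sketch under these assumptions: one producer, one consumer, monotonically increasing indices, and the illustrative struct ring layout from section 4.3 (ring_push/ring_pop are illustrative names):

#include <stdatomic.h>
#include <stdbool.h>

/* Producer side: returns false if the buffer is full. */
bool ring_push(struct ring *r, const struct ring_msg *m) {
    uint64_t tail = atomic_load_explicit(&r->hdr.tail, memory_order_relaxed);
    uint64_t head = atomic_load_explicit(&r->hdr.head, memory_order_acquire);
    if (tail - head == r->hdr.capacity)               /* full */
        return false;
    r->slots[tail % r->hdr.capacity] = *m;            /* write payload first */
    atomic_store_explicit(&r->hdr.tail, tail + 1,     /* then publish it */
                          memory_order_release);
    return true;
}

/* Consumer side: returns false if the buffer is empty. */
bool ring_pop(struct ring *r, struct ring_msg *out) {
    uint64_t head = atomic_load_explicit(&r->hdr.head, memory_order_relaxed);
    uint64_t tail = atomic_load_explicit(&r->hdr.tail, memory_order_acquire);
    if (head == tail)                                 /* empty */
        return false;
    *out = r->slots[head % r->hdr.capacity];          /* read payload first */
    atomic_store_explicit(&r->hdr.head, head + 1,     /* then free the slot */
                          memory_order_release);
    return true;
}

Because each index has exactly one writer, plain release stores suffice and no compare-and-swap is needed.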

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

shm-ring/
├── src/
│   ├── main.c
│   ├── ring.c
│   └── ring.h
├── tests/
│   └── test_throughput.sh
├── Makefile
└── README.md

Ring Buffer Project Structure

5.3 The Core Question You’re Answering

“How do two processes share memory safely without the kernel mediating every access?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Memory mappings
    • mmap() protections and flags: PROT_READ|PROT_WRITE, MAP_SHARED (see the sketch after this list)
    • Book Reference: “TLPI” Ch. 49
  2. Memory ordering
    • Acquire/release semantics for head/tail updates
    • Book Reference: “Rust Atomics and Locks” Ch. 3
  3. Ring buffers
    • Wrap-around arithmetic and capacity
    • Reference: LMAX Disruptor white paper
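A minimal setup sketch (a fragment from main(), assuming the illustrative struct ring from section 4.3 and the file path taken from the CLI):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int fd = open("/tmp/ring.bin", O_RDWR | O_CREAT, 0600);
if (fd == -1) { perror("open"); return 1; }
if (ftruncate(fd, sizeof(struct ring)) == -1) { perror("ftruncate"); return 1; }

struct ring *r = mmap(NULL, sizeof(struct ring),
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (r == MAP_FAILED) { perror("mmap"); return 1; }
/* Both producer and consumer map the same file; MAP_SHARED makes
 * their stores visible to each other through the shared page cache. */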

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. How do you avoid overwriting unread data?
  2. What happens if producer crashes mid-write?
  3. How do you detect buffer corruption?
  4. How do you align data to cache lines?

5.6 Thinking Exercise

Draw the Ring

Sketch a buffer of size 8. Move head/tail indices through a series of writes and reads. Note when the buffer is full.
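One possible trace for a capacity-8 buffer, using monotonically increasing indices (slot = index % 8):

Operation    head  tail  count (tail - head)
initial         0     0  0  (empty)
3 writes        0     3  3
2 reads         2     3  1
7 writes        2    10  8  (full: tail - head == capacity)
1 write         2    10  rejected while full
1 read          3    10  7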

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is mmap() and how does it differ from read()?”
  2. “How do you prevent data races in shared memory?”
  3. “Why is cache line alignment important?”

5.8 Hints in Layers

Hint 1: Start single-producer/single-consumer. Avoid multi-producer complexity at first.

Hint 2: Use a magic header. Store a magic number and version to detect stale mappings.

Hint 3: Consider padding. Keep head and tail on different cache lines to avoid false sharing (see the sketch below).
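A padded header sketch; the 64-byte cache-line size is an assumption (check with getconf LEVEL1_DCACHE_LINESIZE on Linux):

#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Variant of ring_header that keeps head and tail on separate cache lines,
 * so producer stores to tail do not invalidate the consumer's cached head. */
struct ring_header_padded {
    alignas(64) _Atomic uint64_t head;
    alignas(64) _Atomic uint64_t tail;
    alignas(64) uint64_t capacity;
    uint64_t magic;
};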

5.9 Books That Will Help

Topic             Book                                              Chapter
-----             ----                                              -------
Memory mappings   “The Linux Programming Interface”                 Ch. 49
Virtual memory    “Computer Systems: A Programmer’s Perspective”    Ch. 9
Memory ordering   “Rust Atomics and Locks”                          Ch. 3

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

Goals:

  • File creation and mmap()
  • Basic shared struct

Tasks:

  1. Map a file and write/read a struct.
  2. Validate mappings across two processes.

Checkpoint: Both processes see the same data.

Phase 2: Core Functionality (5-7 days)

Goals:

  • Ring buffer logic
  • Atomic head/tail

Tasks:

  1. Implement enqueue/dequeue.
  2. Add acquire/release semantics.

Checkpoint: Producer/consumer run without corruption.

Phase 3: Polish & Edge Cases (3-5 days)

Goals:

  • Crash recovery
  • Throughput metrics

Tasks:

  1. Add recovery logic for invalid or stale indices (a sketch follows the checkpoint below).
  2. Measure throughput under load.

Checkpoint: Recovers after producer restart.
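A recovery sketch run when a process attaches; RING_MAGIC and the reset policy are illustrative choices, not requirements:

#define RING_MAGIC 0x52494E473176ULL   /* arbitrary marker, e.g. "RING1v" */

/* Validates a mapping and resets it if it looks stale or corrupted.
 * Also initializes a freshly created, zero-filled file (magic == 0). */
void ring_validate(struct ring *r, uint64_t capacity) {
    uint64_t head = atomic_load(&r->hdr.head);
    uint64_t tail = atomic_load(&r->hdr.tail);
    int stale = r->hdr.magic != RING_MAGIC
             || r->hdr.capacity != capacity
             || tail - head > capacity;        /* impossible state */
    if (stale) {
        /* Simplest policy: drop whatever was in flight and start clean. */
        atomic_store(&r->hdr.head, 0);
        atomic_store(&r->hdr.tail, 0);
        r->hdr.capacity = capacity;
        r->hdr.magic = RING_MAGIC;
    }
}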

5.11 Key Implementation Decisions

Decision          Options             Recommendation     Rationale
--------          -------             --------------     ---------
Synchronization   locks vs atomics    atomics            lower latency
Buffer layout     variable vs fixed   fixed-size slots   simple indexing
Persistence       msync() vs none     optional           performance vs durability

6. Testing Strategy

6.1 Test Categories

Category            Purpose             Examples
--------            -------             --------
Unit Tests          Ring arithmetic     wrap-around tests
Integration Tests   Producer/consumer   cross-process test
Stress Tests        Throughput          100k msgs/sec

6.2 Critical Test Cases

  1. Wrap-around: head/tail crossing the end of the buffer (see the test sketch after this list).
  2. Crash recovery: kill producer mid-write.
  3. High throughput: sustained writes for 60s.
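A minimal wrap-around check, assuming the illustrative ring_push/ring_pop from section 4.4:

#include <assert.h>

/* Push and pop enough messages that the indices pass the end of the buffer
 * several times, and verify that ids come back in order. */
void test_wraparound(struct ring *r) {
    struct ring_msg in = {0}, out;
    for (uint64_t i = 0; i < r->hdr.capacity * 4; i++) {
        in.id = i;
        assert(ring_push(r, &in));
        assert(ring_pop(r, &out));
        assert(out.id == i);
    }
}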

6.3 Test Data

msg_id=1 payload="hello"
msg_id=2 payload="world"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                   Symptom           Solution
-------                   -------           --------
Missing memory barriers   corrupted reads   use acquire/release
Overwriting unread data   lost messages     check the full condition
False sharing             low throughput    pad header fields

7.2 Debugging Strategies

  • Use perf to inspect cache misses (example command below).
  • Add sequence numbers to detect dropped messages.
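For example (event names vary by CPU and kernel; these are common ones):

$ perf stat -e cache-misses,cache-references ./shm-ring --mode=producer --file=/tmp/ring.bin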

7.3 Performance Traps

Calling msync() after every write destroys performance. Batch syncs, or skip them entirely if durability is not required.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a CLI flag for buffer size.
  • Support variable-length payloads with fixed max.

8.2 Intermediate Extensions

  • Add multi-consumer support with sequence tracking.
  • Add shared memory stats page.

8.3 Advanced Extensions

  • Implement multi-producer support using CAS loops (see the sketch below).
  • Support huge pages for performance.
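A hedged sketch of the claim step for multiple producers; this only reserves an index, and publication still needs per-slot sequence fields (as in the Disruptor). claim_slot is an illustrative name:

/* Each producer atomically claims the next write index with a CAS loop. */
uint64_t claim_slot(struct ring *r) {
    uint64_t tail = atomic_load_explicit(&r->hdr.tail, memory_order_relaxed);
    for (;;) {
        uint64_t head = atomic_load_explicit(&r->hdr.head, memory_order_acquire);
        if (tail - head == r->hdr.capacity) {
            tail = atomic_load_explicit(&r->hdr.tail, memory_order_relaxed);
            continue;                     /* full: refresh and retry (or back off) */
        }
        if (atomic_compare_exchange_weak_explicit(
                &r->hdr.tail, &tail, tail + 1,
                memory_order_acq_rel, memory_order_relaxed))
            return tail;                  /* we own slot tail % capacity */
        /* On CAS failure, `tail` is reloaded with the current value; retry. */
    }
}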

9. Real-World Connections

9.1 Industry Applications

  • Telemetry pipelines in HFT and observability.
  • Shared memory queues in database engines.

9.2 Open Source Examples

  • LMAX Disruptor: https://github.com/LMAX-Exchange/disruptor - Ring buffer design
  • systemd-journal: https://github.com/systemd/systemd - Uses mmap for log buffers

9.3 Interview Relevance

  • Demonstrates understanding of memory mapping and atomics.
  • Shows ability to reason about cache effects.

10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” by Michael Kerrisk - Ch. 49
  • “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Ch. 9

10.2 Video Resources

  • Memory mapping deep dives - YouTube
  • Lock-free data structure talks

10.3 Tools & Documentation

  • man 2 mmap: Memory mapping
  • man 2 msync: Sync to disk
  • Project 1 teaches inode and FD discipline.
  • Project 6 can optionally reuse shared memory for checksums.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how mmap() shares memory across processes.
  • I can describe memory ordering needs.
  • I can reason about ring buffer wrap-around.

11.2 Implementation

  • Producer/consumer run without corruption.
  • Throughput meets target.
  • Crash recovery works.

11.3 Growth

  • I can explain this project in an interview.
  • I documented performance trade-offs.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Producer and consumer exchange messages via mmap.
  • No message corruption under normal load.

Full Completion:

  • Crash recovery and throughput metrics.
  • Basic memory ordering correctness.

Excellence (Going Above & Beyond):

  • Multi-producer support with correctness tests.
  • Optional durability with batched msync().

This guide was generated from SPRINT_5_SYSTEMS_INTEGRATION_PROJECTS.md. For the complete learning path, see the parent directory.