Project 8: Cache Coherence Simulator

Build a simulator that models MESI/MOESI coherence states and prints state transitions for a sequence of reads and writes.

Quick Reference

Attribute	Value
Difficulty	Level 4: Expert
Time Estimate	2-3 weeks
Main Programming Language	C++ (Alternatives: Python, Rust)
Alternative Programming Languages	Python, Rust
Coolness Level	Level 5: Pure Magic
Business Potential	1. The “Resume Gold”
Prerequisites	Cache hierarchy basics, state machines, concurrency concepts
Key Topics	MESI/MOESI, coherence transactions, bus vs directory

1. Learning Objectives

By completing this project, you will:

Explain MESI/MOESI states and transitions.
Build a simulator that models cache line state changes.
Compare snooping bus vs directory-based coherence.
Visualize coherence events step-by-step.
Validate your simulator against textbook examples.
Use deterministic input sequences for reproducible demos.

2. All Theory Needed (Per-Concept Breakdown)

2.1 MESI/MOESI State Machine

Fundamentals

MESI and MOESI are coherence protocols that define how caches maintain a consistent view of memory. Each cache line can be in states like Modified, Exclusive, Shared, Invalid, and (in MOESI) Owned. Reads and writes trigger transitions between these states. The protocol ensures that at most one cache has a Modified line and that writes invalidate other copies. A simulator models these transitions, making the hidden coherence behavior explicit and deterministic.

Deep Dive into the Concept

In a multiprocessor system, each core has its own cache. When multiple caches contain the same memory line, coherence must ensure that all cores see the most recent value. MESI achieves this with four states:

Modified (M): The line is dirty and owned exclusively by this cache. Memory is stale.
Exclusive (E): The line is clean and only in this cache. Memory is up-to-date.
Shared (S): The line is clean and may exist in multiple caches.
Invalid (I): The line is not valid in this cache.

MOESI adds Owned (O), which allows a dirty line to be shared: the owner supplies data to others without writing back immediately. This can reduce writebacks and improve performance in some patterns.

Transitions are driven by events: local reads, local writes, and bus transactions. For example, if a core in I state reads a line, it issues a BusRd. If no other cache has the line, it transitions to E; if another cache has it, both move to S. If a core in S wants to write, it must issue a BusRdX to invalidate other copies and move to M. These rules can be encoded in a transition table.

The simulator must model these transitions deterministically. It should also model the responses from other caches: when one cache issues a BusRd, others with the line in M or O must supply data, possibly changing their state. The simulator therefore needs a global controller (bus or directory) to coordinate events. The design can be simplified by focusing on a single cache line and a small number of cores, which is sufficient to demonstrate the protocol.

An important educational aspect is to show how false sharing arises: two cores alternately writing to different words on the same line will force repeated invalidations. The simulator can output a trace showing these invalidations, making the coherence cost tangible. You can also include a “latency” or “cost” counter to show how coherence traffic grows.

How this fits on projects

The simulator’s core logic is the MESI/MOESI state machine. Every event in the input sequence triggers a state transition that your simulator must implement correctly.

Definitions & Key Terms

Modified (M) -> Dirty line owned by one cache.
Exclusive (E) -> Clean line owned by one cache.
Shared (S) -> Clean line in multiple caches.
Invalid (I) -> Line not valid.
Owned (O) -> Dirty shared line with a designated owner.

Mental Model Diagram (ASCII)

MESI State Transitions (simplified)
I --(BusRd, no sharers)--> E
I --(BusRd, sharers)-----> S
S --(Write)--------------> M (via BusRdX)
E --(Write)--------------> M
M --(Read by other)------> S (or O in MOESI)

How It Works (Step-by-Step)

Start with all caches in I state.
Apply a read/write event for a core.
Emit a bus or directory transaction if needed.
Update all affected caches’ states.
Record the transition and continue.

Invariants: At most one cache in M state; shared lines must be clean.

Failure modes: Missing transition cases, incorrect state updates on bus events.

Minimal Concrete Example

Sequence: R1 W1 R2 W2
Step 1: R1 -> Line state: E
Step 2: W1 -> Line state: M
Step 3: R2 -> P1:M -> S, P2:S
Step 4: W2 -> P1:I, P2:M

Common Misconceptions

“MESI is just a diagram” -> It is a concrete protocol with strict invariants.
“Only the requester changes state” -> other caches also transition.
“Owned state is optional” -> it affects writeback behavior and sharing.

Check-Your-Understanding Questions

Why can only one cache hold a line in M state?
What happens when a cache in S state wants to write?
How does MOESI reduce writebacks?

Check-Your-Understanding Answers

Because multiple dirty copies would violate coherence.
It must invalidate other copies and transition to M.
The O state allows a dirty shared line without immediate writeback.

Real-World Applications

CPU coherence verification and microarchitecture education.
Debugging false sharing issues in multi-core programs.

Where You’ll Apply It

In this project: Sec. 3.1, Sec. 4.4, Sec. 5.10.
Also used in: P04-false-sharing-detector.

References

“Computer Architecture” (Hennessy & Patterson) – Ch. 5
“Digital Design and Computer Architecture” (Harris & Harris) – Ch. 8

Key Insights

Coherence is a state machine; once you model the transitions, the “mystery” disappears.

Summary

MESI/MOESI define the states and transitions that guarantee coherence. A simulator that applies these rules to a sequence of events makes hardware behavior explicit and testable.

Homework/Exercises to Practice the Concept

Trace the states for sequence: R1 R2 W1 R2.
Add the O state and show how it changes the trace.
Model a false-sharing pattern and count invalidations.

Solutions to the Homework/Exercises

R1->E, R2->S, W1->M (R2 invalidated), R2->S.
O allows a dirty shared line after a read by another core.
Alternating writes cause invalidation on every step.

2.2 Bus vs Directory Coherence Models

Fundamentals

Coherence protocols rely on a mechanism to notify other caches of reads and writes. A snooping bus broadcasts transactions to all caches, while a directory-based protocol keeps a directory of which caches hold each line and sends targeted messages. Snooping is simpler but does not scale to many cores; directory protocols scale better but require more metadata. Your simulator should support both models to show the difference in message flow and scalability.

Deep Dive into the Concept

In a bus-based snooping system, all caches observe all coherence transactions. When a core issues a BusRd or BusRdX, every cache snoops the bus and responds if it has the line. This is simple and fast for small systems, but the bus becomes a bottleneck as core counts increase. Snooping also forces all caches to process every transaction, even if they do not have the line.

Directory-based protocols introduce a directory structure that tracks which caches hold each line. When a core requests a line, it queries the directory. The directory then sends invalidation messages only to caches that have a copy. This reduces broadcast traffic and scales to many cores. The trade-off is additional storage (the directory bits) and latency for directory lookups. Many modern multi-socket systems use directory-based coherence across sockets and snooping within a socket.

For your simulator, you can implement both models at a high level. In snooping mode, every transaction is broadcast and every cache updates its state based on snoop rules. In directory mode, you maintain a map from line to sharers and issue targeted messages. The core state transitions (MESI) are the same, but the message flow and counts differ. You can add counters that show how many messages were sent in each mode for the same access sequence, illustrating scalability differences.

The directory model also introduces new states: for example, the directory can be in “uncached” or “shared” state and track an owner. While you do not need a full-blown directory protocol, modeling the sharer set and owner is sufficient for educational purposes. The key insight is that coherence is not only about cache states but also about how those states are coordinated across the system.

How this fits on projects

This concept shapes the simulator’s coherence engine. You will implement both snooping and directory message flows and compare their traffic.

Definitions & Key Terms

Snooping bus -> Broadcast coherence transactions to all caches.
Directory -> Metadata tracking which caches hold a line.
Sharer set -> The set of caches with a copy of a line.
Invalidation -> Message telling a cache to drop a line.

Mental Model Diagram (ASCII)

Snooping:
Core0 -> BusRdX -> all caches snoop

Directory:
Core0 -> Directory -> invalidate only sharers

How It Works (Step-by-Step)

On a read/write, the core sends a request.
In snooping, all caches observe and respond.
In directory, only sharers are contacted.
Update cache states and directory metadata.

Invariants: Directory accurately tracks sharers; snooping sees all transactions.

Failure modes: stale directory entries, incorrect sharer updates.

Minimal Concrete Example

Directory sharers for line X: {Core1, Core3}
Core0 writes -> Directory sends invalidates to Core1, Core3

Common Misconceptions

“Directory is always faster” -> It adds lookup latency.
“Snooping doesn’t scale at all” -> It scales within sockets.
“Directory means no broadcasts” -> Some systems still broadcast within a node.

Check-Your-Understanding Questions

Why do directory protocols scale better than snooping?
What metadata does a directory need to store?
When might a snooping bus still be preferred?

Check-Your-Understanding Answers

They avoid broadcasting to all caches, reducing traffic.
The sharer set and possibly an owner state.
In small systems where latency matters more than scalability.

Real-World Applications

Coherence across multi-socket servers.
Verification of cache coherence logic in CPU design.

Where You’ll Apply It

In this project: Sec. 4.2, Sec. 4.4.
Also used in: P04-false-sharing-detector.

References

“Computer Architecture” – Ch. 5
“Inside the Machine” (Stokes) – Ch. 6

Key Insights

Coherence protocols are as much about communication topology as they are about state machines.

Summary

Snooping and directory protocols implement the same MESI rules but differ in how they communicate. Modeling both reveals why small systems use snooping while large systems rely on directories.

Homework/Exercises to Practice the Concept

Count messages for a sequence under snooping vs directory.
Simulate a directory with 4 cores and track sharer sets.
Add a counter for invalidation messages and compare.

Solutions to the Homework/Exercises

Snooping sends broadcasts; directory sends targeted messages.
Sharer sets update on each read/write.
Directory reduces invalidation count for sparse sharing.

3. Project Specification

3.1 What You Will Build

A simulator mesi_sim that models cache coherence for a single cache line across multiple cores. It accepts a sequence of read/write events and prints state transitions for MESI or MOESI, optionally comparing snooping vs directory models.

Included: MESI/MOESI state machine, trace output, message counters. Excluded: full memory hierarchy, multi-line simulation.

3.2 Functional Requirements

Parse input sequences like R1 W1 R2 W2.
Simulate MESI or MOESI transitions.
Support snooping bus and directory modes.
Print per-step state transitions.
Report message counts for each mode.
Provide deterministic output with fixed input sequences.

3.3 Non-Functional Requirements

Performance: handle thousands of events quickly.
Reliability: correct transitions for all event types.
Usability: clear, readable output.

3.4 Example Usage / Output

$ ./mesi_sim --sequence "R1 W1 R2 W2" --mode=snoop
Step 1: R1 -> P1:E
Step 2: W1 -> P1:M
Step 3: R2 -> P1:S, P2:S
Step 4: W2 -> P1:I, P2:M

3.5 Data Formats / Schemas / Protocols

JSON output:

{
  "mode": "snoop",
  "steps": [
    {"event": "R1", "states": ["E","I"]},
    {"event": "W1", "states": ["M","I"]}
  ],
  "messages": 6
}

3.6 Edge Cases

Invalid event sequence (unknown core ID).
Unsupported state transition.
Mixed MOESI and MESI output.

3.7 Real World Outcome

You can step through coherence behavior deterministically, explain why invalidations happen, and compare message counts between snooping and directory coherence.

3.7.1 How to Run (Copy/Paste)

cc -O2 -o mesi_sim src/main.cpp
./mesi_sim --sequence "R1 W1 R2 W2" --mode=snoop

3.7.2 Golden Path Demo (Deterministic)

$ ./mesi_sim --sequence "R1 W1 R2 W2" --mode=directory
Step 1: R1 -> P1:E
Step 2: W1 -> P1:M
Step 3: R2 -> P1:S, P2:S
Step 4: W2 -> P1:I, P2:M
Messages: 5

3.7.3 If CLI: Exact Terminal Transcript

$ ./mesi_sim --sequence "R1 R2 W1" --mode=snoop
Step 1: R1 -> P1:E
Step 2: R2 -> P1:S, P2:S
Step 3: W1 -> P1:M, P2:I

3.7.4 Failure Demo (Deterministic)

$ ./mesi_sim --sequence "R9"
ERROR: core id 9 out of range (max=4)
EXIT: 1

Exit Codes:

0 success
1 invalid input
2 transition error

4. Solution Architecture

4.1 High-Level Design

+--------------+   +--------------+   +------------------+
| Parser       |-->| Coherence FSM|-->| Trace Renderer   |
+--------------+   +--------------+   +------------------+

4.2 Key Components

4.3 Data Structures (No Full Code)

enum State { I, S, E, M, O };
struct LineState { State per_core[MAX_CORES]; };

4.4 Algorithm Overview

Key Algorithm: Event Processing

Parse event (R/W + core ID).
Determine required bus/directory transaction.
Update states for all caches.
Record transition and message count.

Complexity Analysis:

Time: O(cores) per event
Space: O(cores)

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install -y build-essential

5.2 Project Structure

mesi_sim/
|-- src/
|   |-- main.cpp
|   |-- fsm.cpp
|   |-- parser.cpp
|   +-- renderer.cpp
|-- tests/
|   +-- sequences.txt
|-- Makefile
+-- README.md

5.3 The Core Question You’re Answering

“How do coherence protocols actually move cache lines between cores?”

5.4 Concepts You Must Understand First

MESI/MOESI state machine.
Bus vs directory coherence.
Event-driven simulation.

5.5 Questions to Guide Your Design

Will you model one line or multiple lines?
How will you encode transitions (table vs code)?
How will you validate correctness?

5.6 Thinking Exercise

Simulate two cores alternately writing the same line. How many invalidations occur?

5.7 The Interview Questions They’ll Ask

“Explain MESI in your own words.”
“What problem does MOESI solve?”
“Why do directory protocols scale better?”

5.8 Hints in Layers

Hint 1: Start with MESI and add MOESI later.

Hint 2: Use a transition table keyed by (state, event).

Hint 3: Add message counters for each event.

Hint 4: Validate with textbook examples.

5.9 Books That Will Help

| Topic | Book | Chapter | |—|—|—| | Coherence protocols | “Computer Architecture” | Ch. 5 | | State machines | “Digital Design and Computer Architecture” | Ch. 8 |

5.10 Implementation Phases

Phase 1: MESI FSM (1 week)

Goals: implement MESI transitions.

Tasks:

Build transition table.
Parse input sequences.

Checkpoint: matches textbook traces.

Phase 2: Bus/Directory Models (1 week)

Goals: add message flow models.

Tasks:

Implement snooping mode.
Implement directory mode.

Checkpoint: message counts differ as expected.

Phase 3: Visualization + JSON (3-4 days)

Goals: clear trace output.

Tasks:

Add step-by-step renderer.
Add JSON output.

Checkpoint: deterministic output for fixed sequences.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Sequence R1 W1 R2 W2 matches expected trace.
Directory mode sends fewer messages than snoop.
Invalid core ID exits with code 1.

6.3 Test Data

R1 W1 R2 W2
R1 R2 W1

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Compare traces with textbook examples.
Add assertions for invariants (single M state).

7.3 Performance Traps

None significant; simulator is small.

8. Extensions & Challenges

8.1 Beginner Extensions

Add colored output for states.
Add step-by-step interactive mode.

8.2 Intermediate Extensions

Support multiple cache lines.
Add latency/cost model.

8.3 Advanced Extensions

Model writeback buffers and non-blocking caches.

9. Real-World Connections

9.1 Industry Applications

Microarchitecture education and verification.
Performance debugging of false sharing.

gem5 – full system simulator.

9.3 Interview Relevance

Coherence protocol explanation is a classic hardware interview topic.

10. Resources

10.1 Essential Reading

“Computer Architecture” – Ch. 5
“Digital Design and Computer Architecture” – Ch. 8

10.2 Video Resources

Coherence protocol lectures.

10.3 Tools & Documentation

gem5 – for deeper simulation study.

11. Self-Assessment Checklist

11.1 Understanding

I can explain MESI and MOESI transitions.
I can explain bus vs directory coherence.

11.2 Implementation

Simulator matches textbook traces.
Output is deterministic.

11.3 Growth

I can explain coherence in interviews.

12. Submission / Completion Criteria

Minimum Viable Completion:

MESI simulation with correct transitions.
Text-based trace output.

Full Completion:

Directory and snooping modes with message counts.
Deterministic output for fixed sequences.

Excellence (Going Above & Beyond):

MOESI support and multi-line simulation.