Project 9: Cache Lab++ -- Cache Simulator + Locality Visualizer

Build a set-associative cache simulator and an ASCII locality visualizer that reveals hit/miss patterns, enabling you to optimize code through informed data layout decisions.

Quick Reference

  Attribute         Value
  ---------         -----
  Difficulty        Advanced
  Time Estimate     2-3 weeks
  Language          C (Alternatives: Rust, Zig, C++)
  Prerequisites     Projects 2 and 8 recommended
  Key Topics        Cache organization, locality, miss classification, replacement policies
  CS:APP Chapters   6 (primary), 5 (secondary)

Table of Contents

  1. Learning Objectives
  2. Deep Theoretical Foundation
  3. Project Specification
  4. Solution Architecture
  5. Implementation Guide
  6. Testing Strategy
  7. Common Pitfalls
  8. Extensions
  9. Real-World Connections
  10. Resources
  11. Self-Assessment Checklist

1. Learning Objectives

By completing this project, you will:

  1. Understand cache organization: Master direct-mapped, set-associative, and fully-associative cache structures
  2. Implement address decomposition: Parse addresses into tag, index, and offset components
  3. Classify cache misses: Distinguish compulsory, capacity, and conflict misses through simulation
  4. Apply replacement policies: Implement LRU, FIFO, and random replacement strategies
  5. Visualize memory access patterns: Create ASCII visualizations showing cache behavior over time
  6. Optimize for locality: Design data layouts and access patterns that exploit cache behavior
  7. Measure and analyze: Use your simulator to quantify cache performance and validate optimizations

2. Deep Theoretical Foundation

2.1 The Memory Hierarchy: Why Caches Exist

Modern processors operate orders of magnitude faster than main memory. Without caches, the CPU would spend most of its time waiting for data:

MEMORY HIERARCHY OVERVIEW

  Component        Size           Access Time    Bandwidth
  ---------        ----           -----------    ---------
  CPU Registers    ~1 KB          ~0.25 ns       100s of GB/s
  L1 Cache         32-64 KB       ~1 ns          ~500 GB/s
  L2 Cache         256 KB-1 MB    ~3-4 ns        ~200 GB/s
  L3 Cache         8-64 MB        ~10-20 ns      ~100 GB/s
  Main Memory      16-256 GB      ~50-100 ns     ~50 GB/s
  SSD              256 GB-4 TB    ~100 us        ~5 GB/s
  HDD              1-20 TB        ~10 ms         ~200 MB/s

Plotted as speed against size, each level sits below and to the right of the
one above it: registers are the fastest and smallest, main memory the largest
and slowest.

Key insight: smaller = faster (physics + economics).

Why this hierarchy works: Programs exhibit locality of reference:

  • Temporal locality: Recently accessed items are likely to be accessed again soon
  • Spatial locality: Items near recently accessed items are likely to be accessed soon

Caches exploit both: they keep recently-used data (temporal) in cache lines that hold multiple adjacent bytes (spatial).
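
A minimal C sketch makes both patterns concrete (the function name sum_array is
illustrative, not part of the project's required interface):

    #include <stddef.h>

    /* One loop that exhibits both kinds of locality:
     *  - `sum` is reused on every iteration            -> temporal locality
     *  - A[i], A[i+1], ... share cache lines           -> spatial locality
     */
    double sum_array(const double *A, size_t n) {
        double sum = 0.0;            /* hot value, reused constantly */
        for (size_t i = 0; i < n; i++)
            sum += A[i];             /* sequential walk through memory */
        return sum;
    }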

2.2 Cache Organization Fundamentals

A cache is organized as a collection of sets, each containing one or more lines (also called blocks):

CACHE ORGANIZATION ANATOMY

Cache parameters:

  S = number of sets                    (S = 2^s, where s = index bits)
  E = lines per set (associativity)     (E = 1: direct-mapped)
  B = bytes per line (block size)       (B = 2^b, where b = offset bits)

  Total cache size: C = S × E × B bytes

Example: 4 sets, 2 lines/set, 4 bytes/line (a 32-byte, 2-way set-associative cache):

  Set 0 | Line 0: [V][Tag: 0x1A][B0][B1][B2][B3]
        | Line 1: [V][Tag: 0x2F][B0][B1][B2][B3]
  Set 1 | Line 0: [V][Tag: 0x05][B0][B1][B2][B3]
        | Line 1: [ ][Tag: ---][--][--][--][--]  (invalid)
  Set 2 | Line 0: [V][Tag: 0x12][B0][B1][B2][B3]
        | Line 1: [V][Tag: 0x77][B0][B1][B2][B3]
  Set 3 | Line 0: [V][Tag: 0x00][B0][B1][B2][B3]
        | Line 1: [V][Tag: 0x33][B0][B1][B2][B3]

Each line contains:
  • Valid bit (V): is this line holding valid data?
  • Tag: high bits of the address identifying which block is stored
  • Data block: B bytes of actual cached data
  • (Optional) Dirty bit: has this block been modified? (for write-back)
  • (Optional) LRU bits: track access recency for replacement
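
In the simulator these parameters map naturally onto a few structs. The sketch
below is illustrative (type and field names such as cache_line_t are not a
required layout); a trace-driven simulator only needs the metadata, never the
cached bytes themselves:

    #include <stdbool.h>
    #include <stdint.h>

    /* One cache line: valid bit, tag, and replacement bookkeeping. */
    typedef struct {
        bool     valid;       /* does this line hold a real block?           */
        uint64_t tag;         /* identifies which block is stored            */
        uint64_t last_used;   /* access timestamp, used for LRU replacement  */
        bool     dirty;       /* optional: modified since load (write-back)  */
    } cache_line_t;

    typedef struct {
        cache_line_t *lines;  /* E lines per set */
    } cache_set_t;

    typedef struct {
        int S, E, B;          /* sets, lines per set, bytes per block (C = S*E*B) */
        int s, b;             /* index bits (S = 2^s) and offset bits (B = 2^b)   */
        cache_set_t *sets;    /* array of S sets */
    } cache_t;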

2.3 Cache Types by Associativity

ASSOCIATIVITY SPECTRUM

DIRECT-MAPPED (E = 1)
Each memory block maps to exactly ONE cache line.

  Pros: simple, fast lookup (no searching)
  Cons: high conflict miss rate

  Memory:  [A][B][C][D][E][F][G][H][I][J][K][L]...
  Cache:   [0]   [1]   [2]   [3]     ← A, E, and I all compete for line 0!

SET-ASSOCIATIVE (1 < E < C/B)
Each block maps to one SET and can go in any line within that set.

  Pros: reduces conflict misses; reasonable lookup speed
  Cons: more complex; requires a replacement policy

  2-way set-associative example:
  Set 0:  [_][_]   ← A and E can BOTH live here now
  Set 1:  [_][_]   ← B and F can both live here

  Common configurations: 2-way, 4-way, 8-way

FULLY-ASSOCIATIVE (E = C/B, S = 1)
Any block can go in ANY cache line: one giant set.

  Pros: no conflict misses (only compulsory and capacity)
  Cons: slow lookup (must search all lines); complex hardware

  Cache:   [_][_][_][_][_][_][_][_]   ← any block can go anywhere

  Used mainly for TLBs and very small caches.
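
Whatever the associativity, a lookup reduces to scanning the E lines of the
chosen set for a valid line with a matching tag. A minimal sketch, reusing the
illustrative types from the struct sketch in 2.2 (for E = 1 the loop runs once;
for a fully-associative cache the single set holds every line):

    /* Returns the index of the matching line within the set, or -1 on a miss. */
    int set_lookup(cache_set_t *set, int E, uint64_t tag) {
        for (int i = 0; i < E; i++)
            if (set->lines[i].valid && set->lines[i].tag == tag)
                return i;     /* hit */
        return -1;            /* miss */
    }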

2.4 Address Decomposition: Tag, Index, Offset

Every memory address is decomposed into three fields that determine cache behavior:

ADDRESS DECOMPOSITION

For a cache with S = 2^s sets and B = 2^b bytes per block, an m-bit address
splits into three fields:

  [    Tag (t bits)    |    Index (s bits)    |    Offset (b bits)    ]

  where t = m - s - b

EXAMPLE: 32-bit address, 16 sets, 64 bytes/block

  S = 16 = 2^4  →  s = 4 index bits
  B = 64 = 2^6  →  b = 6 offset bits
  t = 32 - 4 - 6 = 22 tag bits

  Address 0x12345678:
  Binary: 0001 0010 0011 0100 0101 0110 0111 1000

    Tag    (22 bits): 0001001000110100010101  = 0x048D15
    Index  ( 4 bits): 1001                    = 9
    Offset ( 6 bits): 111000                  = 56

  Interpretation:
  • Look in set 9
  • Compare tag 0x048D15 against all lines in set 9
  • If a tag matches AND the valid bit is set: HIT, return the byte at offset 56
  • If no line matches: MISS, fetch the block and store it with this tag

SPECIAL CASES:

  Direct-mapped (E=1, S=C/B):
  • Maximum number of index bits
  • Each block maps to exactly one place

  Fully-associative (S=1):
  • No index bits (s = 0)
  • The tag is almost the entire address
  • The tag must be compared against ALL lines
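
A possible C implementation of this decomposition, as a sketch (the struct name
addr_fields_t is illustrative):

    #include <stdint.h>

    typedef struct { uint64_t tag, index, offset; } addr_fields_t;

    /* Split an address into (tag, set index, block offset) for a cache with
     * s index bits and b offset bits. */
    addr_fields_t decompose(uint64_t addr, int s, int b) {
        addr_fields_t f;
        f.offset = addr & ((1ULL << b) - 1);         /* low b bits            */
        f.index  = (addr >> b) & ((1ULL << s) - 1);  /* next s bits           */
        f.tag    = addr >> (s + b);                  /* everything above them */
        return f;
    }

    /* decompose(0x12345678, 4, 6) yields tag 0x048D15, index 9, offset 56,
     * matching the worked example above. */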

2.5 The Cache Line (Block) Concept

A cache line is the fundamental unit of transfer between cache and memory:

CACHE LINE CONCEPT

When the CPU requests 1 byte, the cache fetches an ENTIRE BLOCK.

Example: the CPU requests address 0x1004. Memory holds bytes A..H at addresses
0x1000..0x1007, so the whole 8-byte block is brought into one cache line:

  Cache line:  [V=1][Tag][A][B][C][D][E][F][G][H]    (E, at 0x1004, is the requested byte)

WHY BLOCKS?
  1. DRAM is optimized for sequential access (row buffers)
  2. Exploits SPATIAL LOCALITY: if you access E, you will probably access
     F, G, H soon, and they are already cached now
  3. Amortizes the cost of the miss over multiple bytes

BLOCK SIZE TRADEOFFS:
  Larger blocks:
  + Better exploitation of spatial locality
  + Fewer tags (less overhead)
  - Longer miss penalty (more bytes to transfer)
  - Fewer blocks for a given cache size → more conflict misses
  - May fetch unused data (if locality is poor)

  Typical sizes: 32, 64, or 128 bytes
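
For the simulator, the practical consequence is that accesses are tracked at
block granularity. Two equivalent ways to identify the block an address falls
in, assuming B = 2^b bytes per block (a sketch):

    #include <stdint.h>

    uint64_t block_number(uint64_t addr, int b) { return addr >> b; }
    uint64_t block_base(uint64_t addr, int b)   { return addr & ~((1ULL << b) - 1); }

    /* With b = 3 (8-byte blocks), every address 0x1000..0x1007 has block base
     * 0x1000, so a request for 0x1004 brings in all eight bytes. */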

2.6 Miss Types: The Three Cs

Understanding WHY a miss occurred is crucial for optimization:

THE THREE Cs OF CACHE MISSES

1. COMPULSORY MISSES (cold misses)
   First access to a block that has never been in the cache.

   Cause: the block simply hasn't been loaded yet
   Fix:   cannot be eliminated (data must be loaded at least once);
          can be reduced with prefetching

   Example:
     for (i = 0; i < N; i++)
         sum += A[i];  // first access to each cache line is compulsory

2. CAPACITY MISSES
   The working set is larger than the cache.

   Cause: too much data for the cache to hold
   Fix:   reduce the working set size (blocking/tiling),
          or get a bigger cache (hardware solution)

   Example (cache holds 4 blocks; working set = 8 blocks, A[0..7]):
     for (pass = 0; pass < 100; pass++)
         for (i = 0; i < 8; i++)
             sum += A[i];  // every pass evicts earlier blocks

   Access pattern: A0,A1,A2,A3,A4,A5,A6,A7, A0(miss!),A1(miss!)...

3. CONFLICT MISSES
   A block is evicted because other blocks map to the same set.
   (The miss would NOT have occurred in a fully-associative cache of the same size.)

   Cause: limited associativity plus an unfortunate address mapping
   Fix:   increase associativity; change the data layout; add padding

   Example (direct-mapped, 4 sets; A[0] and A[4] map to the same set):
     for (i = 0; i < 100; i++) {
         sum += A[0];   // miss, load A[0]
         sum += A[4];   // miss, evicts A[0], load A[4]
         sum += A[0];   // miss! A[0] was evicted by A[4]
         sum += A[4];   // miss! A[4] was evicted by A[0]
     }
     // Every access is a miss. This is "cache thrashing".

IDENTIFYING THE MISS TYPE IN SIMULATION:

  if the block has never been seen before:
      → COMPULSORY
  else if the access would hit in a fully-associative cache of the same size:
      → CONFLICT (evicted due to a set collision)
  else:
      → CAPACITY (evicted because the cache as a whole is full)
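
One way to realize this decision procedure is to keep, alongside the real
simulated cache, a record of every block ever touched plus a shadow
fully-associative LRU cache of the same total capacity, and update both on
every access. The sketch below shows only the final classification step; the
helper structures named in the comment are illustrative, not a required API:

    #include <stdbool.h>

    /* Assumed bookkeeping (illustrative):
     *   seen_before   - was this block number ever referenced before?
     *   shadow_fa_hit - would a fully-associative LRU cache of equal size hit? */
    typedef enum { HIT, MISS_COMPULSORY, MISS_CONFLICT, MISS_CAPACITY } access_kind_t;

    access_kind_t classify(bool real_cache_hit, bool seen_before, bool shadow_fa_hit) {
        if (real_cache_hit) return HIT;
        if (!seen_before)   return MISS_COMPULSORY;  /* first touch ever          */
        if (shadow_fa_hit)  return MISS_CONFLICT;    /* only the set mapping hurt */
        return MISS_CAPACITY;                        /* cache is simply too small */
    }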

2.7 Replacement Policies

When a set is full and we need to load a new block, which victim do we evict?

REPLACEMENT POLICIES

1. LRU (Least Recently Used)
   Evict the block that hasn't been accessed for the longest time.

   Rationale: temporal locality; recently used items are likely to be used again
   Implementation: track access order per set

   Example (2-way set, LRU policy), access sequence A, B, A, C:
     After A:  [A*, _ ]          (* = most recently used)
     After B:  [A , B*]
     After A:  [A*, B ]          (A touched again, now most recent)
     After C:  [A , C*]          (B was least recently used, so B is evicted)

   Pros: often the best hit rate; matches the locality assumption
   Cons: hardware complexity at high associativity;
         O(E) state per set; must be updated on every access

2. FIFO (First In, First Out)
   Evict the block that was loaded earliest.

   Rationale: simple approximation; older data is assumed less likely to be needed
   Implementation: a circular "next victim" pointer per set

   Example (2-way set, FIFO policy), access sequence A, B, A, C:
     After A:  [A, _]   pointer → slot 1
     After B:  [A, B]   pointer → slot 0
     After A:  [A, B]   (hit; pointer unchanged)
     After C:  [C, B]   evict A (loaded first), pointer → slot 1

   Note: A was evicted even though it was just accessed; FIFO ignores recency.
   FIFO can also exhibit Belady's anomaly, where adding capacity sometimes
   increases the number of misses.

   Pros: simple to implement; O(1) state per set
   Cons: ignores recency; can have a poor hit rate

3. RANDOM
   Pick a random victim from the set.

   Rationale: unpredictable; avoids pathological worst cases
   Implementation: a random number generator

   Pros: simplest hardware; no per-line state needed; avoids worst cases
   Cons: unpredictable; may evict hot blocks

4. LFU (Least Frequently Used)
   Evict the block with the fewest accesses.

   Pros: good for some access patterns
   Cons: doesn't adapt to changing access patterns;
         blocks that were hot early on are never evicted

COMPARISON (typical hit rates):

  LRU ≥ Pseudo-LRU > FIFO > Random

In practice, modern CPUs use approximations such as tree-based Pseudo-LRU
for hardware efficiency.
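
In the simulator, LRU is easiest to model with a per-line timestamp rather
than real hardware LRU state: stamp a line with a global access counter on
every hit, and on a miss evict the valid line with the smallest stamp. A
sketch of victim selection, reusing the last_used field from the struct sketch
in 2.2:

    /* Returns the index of the line to replace within a set of E lines. */
    int choose_victim_lru(cache_set_t *set, int E) {
        int victim = 0;
        for (int i = 0; i < E; i++) {
            if (!set->lines[i].valid)
                return i;                                   /* free line: no eviction */
            if (set->lines[i].last_used < set->lines[victim].last_used)
                victim = i;                                 /* older stamp = better victim */
        }
        return victim;
    }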

2.8 Write Policies

What happens when the CPU writes to a cached block?

WRITE POLICIES

TWO INDEPENDENT DECISIONS:

1. WRITE HIT: what happens when we write to a block already in the cache?

   WRITE-THROUGH: write to the cache AND to memory immediately.
     + Memory is always consistent
     + Simple
     - High memory traffic
     - Slow (must wait for memory)

   WRITE-BACK: write to the cache only, mark the block "dirty", and write it
   to memory only when the block is evicted.
     + Fewer memory writes
     + Better performance
     - More complex (must track dirty bits)
     - Memory is inconsistent until eviction

2. WRITE MISS: what happens when we write to a block NOT in the cache?

   WRITE-ALLOCATE: load the block into the cache, then write to the cache.
     + Good if the program writes and then reads the same block
     + Exploits spatial locality
     - Extra memory read on a miss

   NO-WRITE-ALLOCATE: write directly to memory; don't load the block.
     + Saves cache space for reads
     - No locality benefit for later accesses to that block

COMMON COMBINATIONS:

  Write-through + no-write-allocate
  • Writes always go to memory; reads populate the cache
  • Used in some L1 caches

  Write-back + write-allocate (most common today)
  • Writes stay in the cache until eviction
  • Exploits temporal locality on writes
  • Used in most L2/L3 caches

FOR THIS PROJECT:
  We focus on READ behavior. Write policies are a natural extension.
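
Should your simulator later grow write support, write-back + write-allocate
mostly amounts to tracking the dirty bit from 2.2. A rough sketch under that
assumption (lookup_or_allocate is a hypothetical helper, not a defined API):

    /* Hypothetical helper: returns the line for addr, loading it on a miss
     * (write-allocate). */
    cache_line_t *lookup_or_allocate(cache_t *c, uint64_t addr);

    void handle_store(cache_t *c, uint64_t addr) {
        cache_line_t *line = lookup_or_allocate(c, addr);
        line->dirty = true;           /* the write goes to the cache only */
    }

    void on_evict(cache_line_t *line, unsigned long *mem_writes) {
        if (line->valid && line->dirty)
            (*mem_writes)++;          /* write-back happens only at eviction */
    }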

2.9 Spatial and Temporal Locality

These are the principles that make caching effective:

LOCALITY PRINCIPLES

TEMPORAL LOCALITY
"If you accessed it recently, you'll probably access it again soon."

  Good temporal locality (reuse):
    sum = 0;
    for (i = 0; i < 1000; i++)
        sum += A[i % 10];          // the same 10 elements are reused

  Poor temporal locality (no reuse):
    for (i = 0; i < N; i++)
        process(A[i]);             // each element is processed exactly once

  Access pattern of the first loop: 0,1,2,...,9,0,1,2,...,9,...
  Each of the 10 elements is accessed 100 times, so ~99% of accesses hit in cache.

SPATIAL LOCALITY
"If you accessed location X, you'll probably access nearby locations."

  Good spatial locality (sequential access):
    for (i = 0; i < N; i++)
        sum += A[i];               // access pattern: 0,1,2,3,4,...
                                   // each cache line is fully used

  Poor spatial locality (strided access):
    for (i = 0; i < N; i += 64)
        sum += A[i];               // access pattern: 0,64,128,192,...
                                   // each cache line contributes 1 useful byte

QUANTIFYING LOCALITY:

  Miss rate = misses / total accesses

  For sequential byte-by-byte access with B-byte blocks:
    miss rate ≈ 1/B    (one miss per block, followed by B-1 hits)

  For stride-K access with B-byte blocks:
    if K < B: miss rate ≈ K/B    (B/K accesses share each block's single miss)
    if K ≥ B: miss rate ≈ 100%   (every access lands in a new block: no spatial locality)

MATRIX ACCESS PATTERNS:

  Row-major storage: A[i][j] lives at element offset i*N + j from the base.

  Row-wise access (good):      for (i...) for (j...)  A[i][j]
  Column-wise access (bad):    for (j...) for (i...)  A[i][j]

  Memory layout (4×4 example): [0,0][0,1][0,2][0,3][1,0][1,1][1,2][1,3]...
  Row-wise access order:          1    2    3    4    5    6    7    8
  Column-wise access order:       1    5    9   13    2    6   10   14   (strided!)

  Column-wise traversal of a row-major matrix is a stride of N elements.
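
The same contrast in C, for a row-major N × N matrix (the function names are
illustrative); feeding both versions to your simulator should show the
stride-N effect directly:

    /* Good spatial locality: walks memory in storage order. */
    double sum_rowwise(const double *A, int N) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += A[i * N + j];
        return s;
    }

    /* Poor spatial locality: same arithmetic, but a stride of N elements. */
    double sum_colwise(const double *A, int N) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += A[i * N + j];
        return s;
    }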

2.10 Blocking (Tiling) for Cache Optimization

The key optimization technique for improving cache performance:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                     BLOCKING / TILING TECHNIQUE                             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                             โ”‚
โ”‚   PROBLEM: Matrix multiply naive version has poor locality                 โ”‚
โ”‚                                                                             โ”‚
โ”‚   // C = A ร— B  (N ร— N matrices)                                           โ”‚
โ”‚   for (i = 0; i < N; i++)                                                  โ”‚
โ”‚       for (j = 0; j < N; j++)                                              โ”‚
โ”‚           for (k = 0; k < N; k++)                                          โ”‚
โ”‚               C[i][j] += A[i][k] * B[k][j];                                โ”‚
โ”‚                                                                             โ”‚
โ”‚   Access analysis (row-major storage):                                      โ”‚
โ”‚   โ€ข A[i][k]: Row-wise access (good)                                        โ”‚
โ”‚   โ€ข B[k][j]: Column-wise access (BAD - stride N!)                          โ”‚
โ”‚   โ€ข C[i][j]: Repeated access (good temporal locality)                      โ”‚
โ”‚                                                                             โ”‚
โ”‚   For large N, B causes cache thrashing.                                   โ”‚
โ”‚                                                                             โ”‚
โ”‚   โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€  โ”‚
โ”‚                                                                             โ”‚
โ”‚   SOLUTION: Process in cache-sized blocks                                   โ”‚
โ”‚                                                                             โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”      โ”‚
โ”‚   โ”‚                                                                 โ”‚      โ”‚
โ”‚   โ”‚    A                    B                    C                  โ”‚      โ”‚
โ”‚   โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”      โ”‚      โ”‚
โ”‚   โ”‚   โ”‚โ–ˆโ–ˆโ–ˆโ”‚   โ”‚   โ”‚        โ”‚โ–ˆโ–ˆโ–ˆโ”‚   โ”‚   โ”‚        โ”‚โ–ˆโ–ˆโ–ˆโ”‚   โ”‚   โ”‚      โ”‚      โ”‚
โ”‚   โ”‚   โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ค   ร—    โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ค   =    โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ค      โ”‚      โ”‚
โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚        โ”‚   โ”‚   โ”‚   โ”‚        โ”‚   โ”‚   โ”‚   โ”‚      โ”‚      โ”‚
โ”‚   โ”‚   โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ค        โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ค        โ”œโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”ค      โ”‚      โ”‚
โ”‚   โ”‚   โ”‚   โ”‚   โ”‚   โ”‚        โ”‚   โ”‚   โ”‚   โ”‚        โ”‚   โ”‚   โ”‚   โ”‚      โ”‚      โ”‚
โ”‚   โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜      โ”‚      โ”‚
โ”‚   โ”‚                                                                 โ”‚      โ”‚
โ”‚   โ”‚   Process one block at a time:                                 โ”‚      โ”‚
โ”‚   โ”‚   C[block] += A[row_block] ร— B[col_block]                      โ”‚      โ”‚
โ”‚   โ”‚                                                                 โ”‚      โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜      โ”‚
โ”‚                                                                             โ”‚
โ”‚   BLOCKED MATRIX MULTIPLY:                                                  โ”‚
โ”‚                                                                             โ”‚
โ”‚   // Block size B chosen so 3 blocks fit in cache                          โ”‚
โ”‚   for (ii = 0; ii < N; ii += BLOCK)                                        โ”‚
โ”‚       for (jj = 0; jj < N; jj += BLOCK)                                    โ”‚
โ”‚           for (kk = 0; kk < N; kk += BLOCK)                                โ”‚
โ”‚               // Mini matrix multiply on blocks                             โ”‚
โ”‚               for (i = ii; i < min(ii+BLOCK, N); i++)                      โ”‚
โ”‚                   for (j = jj; j < min(jj+BLOCK, N); j++)                  โ”‚
โ”‚                       for (k = kk; k < min(kk+BLOCK, N); k++)              โ”‚
โ”‚                           C[i][j] += A[i][k] * B[k][j];                    โ”‚
โ”‚                                                                             โ”‚
โ”‚   CHOOSING BLOCK SIZE:                                                      โ”‚
โ”‚   โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€                                                      โ”‚
โ”‚   โ€ข Need 3 blocks in cache: A_block, B_block, C_block                      โ”‚
โ”‚   โ€ข Each block is BLOCK ร— BLOCK elements                                   โ”‚
โ”‚   โ€ข Total: 3 ร— BLOCKยฒ ร— sizeof(element) < Cache Size                       โ”‚
โ”‚   โ€ข BLOCK โ‰ˆ sqrt(CacheSize / (3 ร— sizeof(element)))                        โ”‚
โ”‚                                                                             โ”‚
โ”‚   Example: 32KB L1 cache, double (8 bytes)                                 โ”‚
โ”‚   BLOCK โ‰ˆ sqrt(32768 / 24) โ‰ˆ 36                                            โ”‚
โ”‚   Use BLOCK = 32 (power of 2 for alignment)                                โ”‚
โ”‚                                                                             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
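
To make the pseudocode above concrete, here is a minimal compilable sketch of the blocked multiply for row-major double matrices. BLOCK = 32 follows the sizing rule above for a 32 KB L1 cache; the function name and the constant are illustrative, not prescriptive.

/* Blocked (tiled) matrix multiply: C += A * B for N x N row-major doubles.
 * BLOCK = 32 follows the sizing rule above (3 * BLOCK^2 * 8 bytes <= 32 KB). */
#define BLOCK 32

static int min_int(int a, int b) { return a < b ? a : b; }

void matmul_blocked(int N, const double *A, const double *B, double *C) {
    for (int ii = 0; ii < N; ii += BLOCK)
        for (int jj = 0; jj < N; jj += BLOCK)
            for (int kk = 0; kk < N; kk += BLOCK)
                /* Mini matrix multiply on one BLOCK x BLOCK tile */
                for (int i = ii; i < min_int(ii + BLOCK, N); i++)
                    for (int j = jj; j < min_int(jj + BLOCK, N); j++)
                        for (int k = kk; k < min_int(kk + BLOCK, N); k++)
                            C[i*N + j] += A[i*N + k] * B[k*N + j];
}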

2.11 Cache-Oblivious Algorithms

An advanced concept: algorithms that are efficient without knowing cache size:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                      CACHE-OBLIVIOUS ALGORITHMS                             โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                             โ”‚
โ”‚   CACHE-AWARE (blocking):                                                   โ”‚
โ”‚   โ€ข You choose block size based on known cache size                        โ”‚
โ”‚   โ€ข Good: Optimal for that cache                                           โ”‚
โ”‚   โ€ข Bad: Need to retune for different caches/hierarchy levels              โ”‚
โ”‚                                                                             โ”‚
โ”‚   CACHE-OBLIVIOUS:                                                          โ”‚
โ”‚   โ€ข Algorithm is designed to work well for ANY cache size                  โ”‚
โ”‚   โ€ข Uses recursive divide-and-conquer                                      โ”‚
โ”‚   โ€ข Automatically adapts to all levels of hierarchy                        โ”‚
โ”‚                                                                             โ”‚
โ”‚   โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€  โ”‚
โ”‚                                                                             โ”‚
โ”‚   EXAMPLE: Cache-oblivious matrix multiply                                  โ”‚
โ”‚                                                                             โ”‚
โ”‚   void matmul(A, B, C, n) {                                                โ”‚
โ”‚       if (n <= BASE_CASE) {                                                โ”‚
โ”‚           // Direct multiply (fits in registers)                           โ”‚
โ”‚           base_case_multiply(A, B, C);                                     โ”‚
โ”‚       } else {                                                             โ”‚
โ”‚           // Divide each matrix into quadrants                             โ”‚
โ”‚           //  โ”Œโ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”                                                  โ”‚
โ”‚           //  โ”‚ A11โ”‚ A12โ”‚                                                  โ”‚
โ”‚           //  โ”œโ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”ค                                                  โ”‚
โ”‚           //  โ”‚ A21โ”‚ A22โ”‚                                                  โ”‚
โ”‚           //  โ””โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”˜                                                  โ”‚
โ”‚           //                                                               โ”‚
โ”‚           // C11 = A11ร—B11 + A12ร—B21                                      โ”‚
โ”‚           // C12 = A11ร—B12 + A12ร—B22                                      โ”‚
โ”‚           // C21 = A21ร—B11 + A22ร—B21                                      โ”‚
โ”‚           // C22 = A21ร—B12 + A22ร—B22                                      โ”‚
โ”‚                                                                             โ”‚
โ”‚           // 8 recursive multiplies, 4 additions                           โ”‚
โ”‚           matmul(A11, B11, T1, n/2);                                       โ”‚
โ”‚           matmul(A12, B21, T2, n/2);                                       โ”‚
โ”‚           add(T1, T2, C11, n/2);                                           โ”‚
โ”‚           // ... etc.                                                      โ”‚
โ”‚       }                                                                     โ”‚
โ”‚   }                                                                         โ”‚
โ”‚                                                                             โ”‚
โ”‚   WHY IT WORKS:                                                             โ”‚
โ”‚   โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€                                                             โ”‚
โ”‚   At some recursion level, subproblems fit in cache.                       โ”‚
โ”‚   From then on, all accesses are cache hits.                               โ”‚
โ”‚   This happens automatically for any cache size!                           โ”‚
โ”‚                                                                             โ”‚
โ”‚   THEORETICAL GUARANTEE:                                                    โ”‚
โ”‚   Cache misses = O(Nยณ / (B ร— sqrt(M)))                                     โ”‚
โ”‚   Where: N = matrix size, B = block size, M = cache size                   โ”‚
โ”‚   This is optimal for any M and B!                                         โ”‚
โ”‚                                                                             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
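
As a sketch only, the recursion above can be written in C without the temporaries T1/T2 by accumulating directly into C (C += A*B). It assumes n is a power of two and that all three matrices are row-major with the same leading dimension ld; the base-case cutoff of 32 stands in for BASE_CASE.

/* Cache-oblivious multiply sketch: C += A * B on n x n submatrices,
 * row-major, leading dimension ld. Assumes n is a power of two. */
void matmul_rec(const double *A, const double *B, double *C, int n, int ld) {
    if (n <= 32) {                       /* base case: tile small enough */
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    C[i*ld + j] += A[i*ld + k] * B[k*ld + j];
        return;
    }
    int h = n / 2;                       /* split each matrix into quadrants */
    const double *A11 = A,        *A12 = A + h,
                 *A21 = A + h*ld, *A22 = A + h*ld + h;
    const double *B11 = B,        *B12 = B + h,
                 *B21 = B + h*ld, *B22 = B + h*ld + h;
    double       *C11 = C,        *C12 = C + h,
                 *C21 = C + h*ld, *C22 = C + h*ld + h;

    /* 8 recursive multiplies, accumulating into the C quadrants */
    matmul_rec(A11, B11, C11, h, ld);  matmul_rec(A12, B21, C11, h, ld);
    matmul_rec(A11, B12, C12, h, ld);  matmul_rec(A12, B22, C12, h, ld);
    matmul_rec(A21, B11, C21, h, ld);  matmul_rec(A22, B21, C21, h, ld);
    matmul_rec(A21, B12, C22, h, ld);  matmul_rec(A22, B22, C22, h, ld);
}

/* Top-level call for an N x N multiply: matmul_rec(A, B, C, N, N); */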

3. Project Specification

3.1 What You Will Build

A two-part system:

  1. Cache Simulator: A configurable set-associative cache simulator that processes memory access traces and reports hit/miss statistics with miss classification.

  2. Locality Visualizer: An ASCII-art visualization tool that shows cache behavior over time, making hit/miss patterns visible and intuitive.

3.2 Functional Requirements

Cache Simulator (csim):

  1. Configuration (-s, -E, -b flags):
    • Accept cache parameters: s (index bits), E (associativity), b (block bits)
    • Support cache sizes from tiny (s=1, E=1, b=2) to realistic (s=10, E=8, b=6)
  2. Trace Processing:
    • Read Valgrind-format memory traces:
      I 0400d7d4,8      (instruction fetch - optional to handle)
       L 7ff0005c8,8    (data load)
       S 7ff0005d0,4    (data store)
       M 0421c7f0,4     (modify = load then store)
      
    • Process each access and update cache state
  3. Statistics Reporting:
    • Count: hits, misses, evictions
    • Classify misses: compulsory, capacity, conflict
    • Support verbose mode showing each access result
  4. Replacement Policies:
    • Implement LRU (required)
    • FIFO and Random (stretch goals)

Locality Visualizer (lvis):

  1. Visual Output:
    • Display cache state as ASCII art after each access (or at intervals)
    • Show which sets are being accessed
    • Highlight hits (green/+) vs misses (red/-)
  2. Pattern Recognition:
    • Identify and annotate access patterns (sequential, strided, random)
    • Calculate and display running miss rate
  3. Code Path Mode:
    • Accept annotated traces that include source location
    • Group statistics by code region

3.3 Non-Functional Requirements

  • Correctness: Match reference simulator output exactly
  • Performance: Handle traces with millions of accesses in reasonable time
  • Clarity: Output must be educational, not just numerical
  • Modularity: Clean separation between simulator core and visualization

3.4 Example Usage and Output

Basic Simulation:

$ ./csim -s 4 -E 2 -b 4 -t traces/yi.trace
hits:4 misses:5 evictions:3

$ ./csim -v -s 4 -E 2 -b 4 -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss eviction
L 210,1 miss eviction
M 12,1 miss eviction hit
hits:4 misses:5 evictions:3

Miss Classification:

$ ./csim -s 4 -E 2 -b 4 -t traces/yi.trace --classify
Cache Configuration:
  Sets: 16 (s=4)
  Lines per set: 2 (E=2)
  Block size: 16 bytes (b=4)
  Total size: 512 bytes

Results:
  Hits:       4  (44.4%)
  Misses:     5  (55.6%)
    Compulsory: 4  (80.0% of misses)
    Capacity:   0  (0.0% of misses)
    Conflict:   1  (20.0% of misses)
  Evictions:  3

Locality Visualizer Output:

$ ./lvis -s 2 -E 2 -b 2 -t traces/simple.trace

=== CACHE LOCALITY VISUALIZER ===
Cache: 4 sets ร— 2 ways ร— 4 bytes = 32 bytes

Access #1: L 0x00 [MISS - Compulsory]
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Set 0: [โ–ˆโ–ˆโ–ˆโ–ˆ tag=0x00] [        ] โ”‚  โ† ACCESSED (miss)
โ”‚ Set 1: [             ] [        ] โ”‚
โ”‚ Set 2: [             ] [        ] โ”‚
โ”‚ Set 3: [             ] [        ] โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Running: hits=0 misses=1 (0.0% hit rate)

Access #2: L 0x04 [MISS - Compulsory]
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Set 0: [โ–ˆโ–ˆโ–ˆโ–ˆ tag=0x00] [        ] โ”‚
โ”‚ Set 1: [โ–ˆโ–ˆโ–ˆโ–ˆ tag=0x00] [        ] โ”‚  โ† ACCESSED (miss)
โ”‚ Set 2: [             ] [        ] โ”‚
โ”‚ Set 3: [             ] [        ] โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Running: hits=0 misses=2 (0.0% hit rate)

Access #3: L 0x00 [HIT]
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Set 0: [โ–ˆโ–ˆโ–ˆโ–ˆ tag=0x00] [        ] โ”‚  โ† ACCESSED (HIT!)
โ”‚ Set 1: [โ–ˆโ–ˆโ–ˆโ–ˆ tag=0x00] [        ] โ”‚
โ”‚ Set 2: [             ] [        ] โ”‚
โ”‚ Set 3: [             ] [        ] โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Running: hits=1 misses=2 (33.3% hit rate)

Access #4: L 0x10 [MISS - Conflict]
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Set 0: [โ–ˆโ–ˆโ–ˆโ–ˆ tag=0x01] [โ–ˆโ–ˆโ–ˆโ–ˆ 0x00]โ”‚  โ† ACCESSED (miss, evicted old)
โ”‚ Set 1: [โ–ˆโ–ˆโ–ˆโ–ˆ tag=0x00] [        ] โ”‚
โ”‚ Set 2: [             ] [        ] โ”‚
โ”‚ Set 3: [             ] [        ] โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Running: hits=1 misses=3 (25.0% hit rate)

=== PATTERN ANALYSIS ===
Stride detected: 4 bytes (matches block size)
Spatial locality: GOOD (sequential within blocks)
Temporal locality: MODERATE (some reuse observed)
Conflict misses: 1 (33% of misses) - consider increasing associativity

4. Solution Architecture

4.1 High-Level Design

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                         SYSTEM ARCHITECTURE                                 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                             โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”       โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”‚
โ”‚   โ”‚             โ”‚       โ”‚              CACHE SIMULATOR                โ”‚    โ”‚
โ”‚   โ”‚   Trace     โ”‚       โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”โ”‚    โ”‚
โ”‚   โ”‚   Parser    โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”‚  โ”‚         Cache Controller               โ”‚โ”‚    โ”‚
โ”‚   โ”‚             โ”‚       โ”‚  โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚โ”‚    โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜       โ”‚  โ”‚  โ”‚ Address   โ”‚  โ”‚ Cache State       โ”‚  โ”‚โ”‚    โ”‚
โ”‚         โ–ฒ               โ”‚  โ”‚  โ”‚ Decomposerโ”‚  โ”‚ (sets, lines)     โ”‚  โ”‚โ”‚    โ”‚
โ”‚         โ”‚               โ”‚  โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚โ”‚    โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”       โ”‚  โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚โ”‚    โ”‚
โ”‚   โ”‚   Trace     โ”‚       โ”‚  โ”‚  โ”‚ Hit/Miss  โ”‚  โ”‚ Replacement       โ”‚  โ”‚โ”‚    โ”‚
โ”‚   โ”‚   Files     โ”‚       โ”‚  โ”‚  โ”‚ Logic     โ”‚  โ”‚ Policy (LRU/FIFO) โ”‚  โ”‚โ”‚    โ”‚
โ”‚   โ”‚ (valgrind)  โ”‚       โ”‚  โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚โ”‚    โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜       โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜โ”‚    โ”‚
โ”‚                         โ”‚                      โ”‚                       โ”‚    โ”‚
โ”‚                         โ”‚                      โ–ผ                       โ”‚    โ”‚
โ”‚                         โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”โ”‚    โ”‚
โ”‚                         โ”‚  โ”‚          Statistics Collector           โ”‚โ”‚    โ”‚
โ”‚                         โ”‚  โ”‚  hits, misses, evictions, miss types    โ”‚โ”‚    โ”‚
โ”‚                         โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜โ”‚    โ”‚
โ”‚                         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ”‚
โ”‚                                              โ”‚                              โ”‚
โ”‚                         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”        โ”‚
โ”‚                         โ–ผ                                         โ–ผ        โ”‚
โ”‚               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚               โ”‚  Text Reporter  โ”‚                     โ”‚   ASCII         โ”‚  โ”‚
โ”‚               โ”‚  (stats output) โ”‚                     โ”‚   Visualizer    โ”‚  โ”‚
โ”‚               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                                                                             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

4.2 Key Data Structures

/* Cache line structure */
typedef struct {
    int valid;           /* Valid bit */
    unsigned long tag;   /* Tag bits */
    int lru_counter;     /* For LRU replacement (higher = more recent) */
    int load_time;       /* For FIFO replacement */
    int dirty;           /* Dirty bit (for write-back, extension) */
} cache_line_t;

/* Cache set structure */
typedef struct {
    cache_line_t *lines; /* Array of E lines */
    int num_lines;       /* Associativity (E) */
} cache_set_t;

/* Cache structure */
typedef struct {
    cache_set_t *sets;   /* Array of S sets */
    int num_sets;        /* S = 2^s */
    int lines_per_set;   /* E */
    int block_size;      /* B = 2^b */

    /* Bit counts for address decomposition */
    int s;               /* Index bits */
    int b;               /* Offset bits */
    int t;               /* Tag bits = m - s - b */

    /* Statistics */
    int hits;
    int misses;
    int evictions;

    /* Miss classification */
    int compulsory_misses;
    int capacity_misses;
    int conflict_misses;

    /* Global LRU counter */
    int lru_clock;

    /* For miss classification: track all unique blocks ever seen */
    /* (Using a set/hash table in real implementation) */
} cache_t;

/* Memory access structure */
typedef struct {
    char op;             /* 'L' (load), 'S' (store), 'M' (modify) */
    unsigned long addr;  /* Memory address */
    int size;            /* Size of access in bytes */
} mem_access_t;

/* Access result for visualization */
typedef struct {
    int hit;             /* 1 if hit, 0 if miss */
    int eviction;        /* 1 if eviction occurred */
    int set_index;       /* Which set was accessed */
    int line_index;      /* Which line in set */
    unsigned long tag;   /* Tag of accessed block */
    char miss_type;      /* 'C' = compulsory, 'A' = capacity, 'O' = conflict */
} access_result_t;

4.3 Module Structure

cache-simulator/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ main.c              # Entry point, CLI parsing
โ”‚   โ”œโ”€โ”€ cache.c             # Cache simulation logic
โ”‚   โ”œโ”€โ”€ cache.h             # Cache data structures and API
โ”‚   โ”œโ”€โ”€ trace.c             # Trace file parsing
โ”‚   โ”œโ”€โ”€ trace.h             # Trace structures
โ”‚   โ”œโ”€โ”€ lru.c               # LRU replacement policy
โ”‚   โ”œโ”€โ”€ replacement.h       # Replacement policy interface
โ”‚   โ”œโ”€โ”€ classify.c          # Miss classification logic
โ”‚   โ”œโ”€โ”€ stats.c             # Statistics collection
โ”‚   โ”œโ”€โ”€ visualizer.c        # ASCII visualization
โ”‚   โ””โ”€โ”€ visualizer.h        # Visualizer API
โ”œโ”€โ”€ tests/
โ”‚   โ”œโ”€โ”€ traces/             # Test trace files
โ”‚   โ”‚   โ”œโ”€โ”€ yi.trace
โ”‚   โ”‚   โ”œโ”€โ”€ dave.trace
โ”‚   โ”‚   โ””โ”€โ”€ trans.trace
โ”‚   โ”œโ”€โ”€ test_cache.c        # Unit tests for cache logic
โ”‚   โ”œโ”€โ”€ test_lru.c          # Unit tests for LRU
โ”‚   โ””โ”€โ”€ expected/           # Expected outputs
โ”œโ”€โ”€ tools/
โ”‚   โ”œโ”€โ”€ gen_trace.py        # Generate custom traces
โ”‚   โ””โ”€โ”€ analyze_pattern.py  # Pattern analysis helper
โ”œโ”€โ”€ Makefile
โ””โ”€โ”€ README.md

4.4 Core Algorithm: Cache Access

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                      CACHE ACCESS ALGORITHM                                 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                             โ”‚
โ”‚   access_cache(cache, address):                                             โ”‚
โ”‚   โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€                                             โ”‚
โ”‚                                                                             โ”‚
โ”‚   1. DECOMPOSE ADDRESS                                                      โ”‚
โ”‚      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                           โ”‚
โ”‚      โ”‚ tag = address >> (s + b)                โ”‚                           โ”‚
โ”‚      โ”‚ set_index = (address >> b) & ((1 << s) - 1)                         โ”‚
โ”‚      โ”‚ offset = address & ((1 << b) - 1)       โ”‚                           โ”‚
โ”‚      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                           โ”‚
โ”‚                                                                             โ”‚
โ”‚   2. SEARCH SET FOR MATCHING TAG                                            โ”‚
โ”‚      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                           โ”‚
โ”‚      โ”‚ set = cache.sets[set_index]             โ”‚                           โ”‚
โ”‚      โ”‚ for each line in set:                   โ”‚                           โ”‚
โ”‚      โ”‚     if line.valid AND line.tag == tag:  โ”‚                           โ”‚
โ”‚      โ”‚         โ†’ HIT! Update LRU, return       โ”‚                           โ”‚
โ”‚      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                           โ”‚
โ”‚                                                                             โ”‚
โ”‚   3. CACHE MISS - NEED TO LOAD BLOCK                                        โ”‚
โ”‚      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                           โ”‚
โ”‚      โ”‚ // First check for empty line           โ”‚                           โ”‚
โ”‚      โ”‚ for each line in set:                   โ”‚                           โ”‚
โ”‚      โ”‚     if NOT line.valid:                  โ”‚                           โ”‚
โ”‚      โ”‚         โ†’ Use this empty line           โ”‚                           โ”‚
โ”‚      โ”‚         โ†’ No eviction needed            โ”‚                           โ”‚
โ”‚      โ”‚         break                           โ”‚                           โ”‚
โ”‚      โ”‚                                         โ”‚                           โ”‚
โ”‚      โ”‚ // No empty line - must evict           โ”‚                           โ”‚
โ”‚      โ”‚ if all lines valid:                     โ”‚                           โ”‚
โ”‚      โ”‚     victim = select_victim(set)  // LRU โ”‚                           โ”‚
โ”‚      โ”‚     eviction = true                     โ”‚                           โ”‚
โ”‚      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                           โ”‚
โ”‚                                                                             โ”‚
โ”‚   4. INSTALL NEW BLOCK                                                      โ”‚
โ”‚      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                           โ”‚
โ”‚      โ”‚ line.valid = true                       โ”‚                           โ”‚
โ”‚      โ”‚ line.tag = tag                          โ”‚                           โ”‚
โ”‚      โ”‚ line.lru_counter = cache.lru_clock++   โ”‚                           โ”‚
โ”‚      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                           โ”‚
โ”‚                                                                             โ”‚
โ”‚   5. UPDATE STATISTICS                                                      โ”‚
โ”‚      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                           โ”‚
โ”‚      โ”‚ if hit: cache.hits++                    โ”‚                           โ”‚
โ”‚      โ”‚ else:   cache.misses++                  โ”‚                           โ”‚
โ”‚      โ”‚ if eviction: cache.evictions++          โ”‚                           โ”‚
โ”‚      โ”‚ classify_miss(...)                      โ”‚                           โ”‚
โ”‚      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                           โ”‚
โ”‚                                                                             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

5. Implementation Guide

5.1 Development Environment Setup

# Required tools
sudo apt-get install gcc gdb valgrind build-essential

# Optional: valgrind for generating traces
# (Traces are provided, but you can generate your own)

# Create project structure
mkdir -p cache-simulator/{src,tests/traces,tools}
cd cache-simulator

5.2 Project Structure

cache-simulator/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ main.c
โ”‚   โ”œโ”€โ”€ cache.c / cache.h
โ”‚   โ”œโ”€โ”€ trace.c / trace.h
โ”‚   โ”œโ”€โ”€ visualizer.c / visualizer.h
โ”‚   โ””โ”€โ”€ stats.c / stats.h
โ”œโ”€โ”€ tests/
โ”‚   โ”œโ”€โ”€ traces/
โ”‚   โ””โ”€โ”€ test_*.c
โ”œโ”€โ”€ Makefile
โ””โ”€โ”€ README.md

5.3 Implementation Phases

Phase 1: Address Decomposition (Days 1-2)

Goals:

  • Implement address parsing into tag, index, offset
  • Validate decomposition with known examples

Tasks:

  1. Create cache.h with basic structures
  2. Implement decompose_address(addr, s, b, &tag, &index, &offset)
  3. Write unit tests with hand-calculated examples
  4. Test edge cases (addresses at block boundaries)

Test Cases:

// Cache: s=4, b=4 (16 sets, 16-byte blocks)
// Address: 0x12345678

// Expected decomposition:
// offset = 0x8 (4 bits)
// index = 0x7 (4 bits)
// tag = 0x123456 (24 bits)

void test_decompose() {
    int s = 4, b = 4;
    unsigned long addr = 0x12345678;
    unsigned long tag, index, offset;

    decompose_address(addr, s, b, &tag, &index, &offset);

    assert(offset == 0x8);
    assert(index == 0x7);
    assert(tag == 0x123456);
    printf("Address decomposition: PASSED\n");
}
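
A minimal decompose_address that satisfies the test above, using the mask-and-shift formulas from Section 4.4 (one possible sketch):

void decompose_address(unsigned long addr, int s, int b,
                       unsigned long *tag, unsigned long *index,
                       unsigned long *offset) {
    *offset = addr & ((1UL << b) - 1);            /* low b bits            */
    *index  = (addr >> b) & ((1UL << s) - 1);     /* next s bits           */
    *tag    = addr >> (s + b);                    /* everything above them */
}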

Checkpoint: Address decomposition passes all unit tests.

Phase 2: Cache Data Structures (Days 3-4)

Goals:

  • Implement cache initialization
  • Create set and line data structures
  • Implement cache state printing (for debugging)

Tasks:

  1. Implement cache_t* cache_init(int s, int E, int b)
  2. Implement cache_free(cache_t *cache)
  3. Implement cache_print_state(cache_t *cache) for debugging
  4. Validate memory allocation/deallocation

Implementation:

cache_t* cache_init(int s, int E, int b) {
    cache_t *cache = malloc(sizeof(cache_t));

    cache->s = s;
    cache->b = b;
    cache->num_sets = 1 << s;        // 2^s
    cache->lines_per_set = E;
    cache->block_size = 1 << b;      // 2^b
    cache->t = 64 - s - b;           // Assuming 64-bit addresses

    // Allocate sets
    cache->sets = malloc(cache->num_sets * sizeof(cache_set_t));
    for (int i = 0; i < cache->num_sets; i++) {
        cache->sets[i].lines = calloc(E, sizeof(cache_line_t));
        cache->sets[i].num_lines = E;
    }

    // Initialize statistics (including the miss-classification counters)
    cache->hits = cache->misses = cache->evictions = 0;
    cache->compulsory_misses = cache->capacity_misses = cache->conflict_misses = 0;
    cache->lru_clock = 0;

    return cache;
}
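
The matching cleanup routine from Task 2 is symmetric with the allocation above; a sketch:

void cache_free(cache_t *cache) {
    if (cache == NULL) return;
    for (int i = 0; i < cache->num_sets; i++)
        free(cache->sets[i].lines);   /* free each set's line array */
    free(cache->sets);
    free(cache);
}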

Checkpoint: Cache initializes correctly; memory tools show no leaks.

Phase 3: Cache Access Logic (Days 5-8)

Goals:

  • Implement cache lookup
  • Implement cache update on miss
  • Track hits, misses, evictions

Tasks:

  1. Implement access_result_t cache_access(cache_t *cache, unsigned long addr)
  2. Implement hit detection (tag match + valid)
  3. Implement miss handling (find empty line or evict)
  4. Update statistics on each access
  5. Test with simple traces

Core Implementation:

access_result_t cache_access(cache_t *cache, unsigned long addr) {
    access_result_t result = {0};

    unsigned long tag, set_index, offset;
    decompose_address(addr, cache->s, cache->b, &tag, &set_index, &offset);

    cache_set_t *set = &cache->sets[set_index];
    result.set_index = set_index;
    result.tag = tag;

    // Check for hit
    for (int i = 0; i < set->num_lines; i++) {
        if (set->lines[i].valid && set->lines[i].tag == tag) {
            // HIT
            result.hit = 1;
            result.line_index = i;
            cache->hits++;
            set->lines[i].lru_counter = cache->lru_clock++;
            return result;
        }
    }

    // MISS
    cache->misses++;
    result.hit = 0;

    // Find empty line or evict
    int target_line = -1;
    for (int i = 0; i < set->num_lines; i++) {
        if (!set->lines[i].valid) {
            target_line = i;
            break;
        }
    }

    if (target_line == -1) {
        // All lines full - evict LRU
        target_line = find_lru_victim(set);
        result.eviction = 1;
        cache->evictions++;
    }

    // Install new block
    set->lines[target_line].valid = 1;
    set->lines[target_line].tag = tag;
    set->lines[target_line].lru_counter = cache->lru_clock++;
    result.line_index = target_line;

    return result;
}

Checkpoint: Simple hand-traced examples produce correct hit/miss counts.

Phase 4: LRU Replacement (Days 9-10)

Goals:

  • Implement correct LRU replacement
  • Validate against known LRU behavior

Tasks:

  1. Implement find_lru_victim(cache_set_t *set)
  2. Ensure LRU counters update on every access (hit or miss)
  3. Test LRU ordering with specific sequences

LRU Test Case:

// 2-way set associative, accessing same set
// Sequence: A, B, A, C
// Expected: A, B both cached; access A updates recency;
//           C evicts B (not A, because A more recent)

void test_lru() {
    cache_t *cache = cache_init(1, 2, 4);  // 2 sets, 2-way, 16B blocks

    // All map to set 0
    unsigned long A = 0x00;  // tag 0
    unsigned long B = 0x20;  // tag 1 (same set, different tag)
    unsigned long C = 0x40;  // tag 2 (same set)

    cache_access(cache, A);  // Miss, load A
    cache_access(cache, B);  // Miss, load B. Set: [A, B]
    cache_access(cache, A);  // Hit on A. A now more recent

    access_result_t r = cache_access(cache, C);  // Miss, evict B (LRU)

    assert(r.eviction == 1);
    // Verify B was evicted, not A
    assert(cache->sets[0].lines[0].tag == 0 ||
           cache->sets[0].lines[1].tag == 0);  // A still present
    assert(cache->sets[0].lines[0].tag == 2 ||
           cache->sets[0].lines[1].tag == 2);  // C now present

    printf("LRU replacement: PASSED\n");
    cache_free(cache);
}
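
The victim-selection helper called by cache_access in Phase 3 can be as simple as a scan for the smallest lru_counter (recall: higher counter = more recently used). A sketch:

int find_lru_victim(cache_set_t *set) {
    int victim = 0;
    for (int i = 1; i < set->num_lines; i++) {
        if (set->lines[i].lru_counter < set->lines[victim].lru_counter)
            victim = i;               /* smaller counter = older access */
    }
    return victim;
}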

Checkpoint: LRU test cases pass; replacement order is correct.

Phase 5: Trace Parser (Days 11-12)

Goals:

  • Parse Valgrind trace format
  • Handle all operation types (L, S, M)

Tasks:

  1. Implement trace_open(filename)
  2. Implement trace_next_access(trace, &access)
  3. Handle 'M' as two accesses (load then store)
  4. Validate against provided traces

Trace Format:

I 0400d7d4,8      <- Instruction fetch (ignore or optional)
 L 7ff0005c8,8    <- Data load (note leading space)
 S 7ff0005d0,4    <- Data store
 M 0421c7f0,4     <- Modify (load + store)

Implementation:

int trace_parse_line(char *line, mem_access_t *access) {
    if (line[0] == 'I') {
        return 0;  // Skip instruction fetches
    }

    char op;
    unsigned long addr;
    int size;

    if (sscanf(line, " %c %lx,%d", &op, &addr, &size) == 3) {
        access->op = op;
        access->addr = addr;
        access->size = size;
        return 1;
    }
    return 0;
}
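
Putting the parser together with cache_access from Phase 3, a driver loop might look like the sketch below (run_trace is a hypothetical name; requires <stdio.h>). The key detail is Task 3: an 'M' line is replayed as a load followed by a store to the same address, so the second access always hits.

void run_trace(cache_t *cache, const char *path) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL) { perror(path); return; }

    char line[256];
    mem_access_t access;
    while (fgets(line, sizeof(line), fp)) {
        if (!trace_parse_line(line, &access))
            continue;                          /* skip 'I' lines and blanks */
        cache_access(cache, access.addr);      /* load, or the load half of 'M' */
        if (access.op == 'M')
            cache_access(cache, access.addr);  /* store half of 'M' -- hits */
    }
    fclose(fp);
}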

Checkpoint: Trace parser correctly reads provided test traces.

Phase 6: Statistics and Verbose Mode (Days 13-14)

Goals:

  • Output matches reference simulator format
  • Verbose mode shows per-access results

Tasks:

  1. Implement summary output: hits:X misses:Y evictions:Z
  2. Implement verbose output for each access
  3. Compare against reference output
  4. Fix any discrepancies

Verbose Output Format:

void print_access_verbose(mem_access_t *access, access_result_t *result) {
    printf("%c %lx,%d ", access->op, access->addr, access->size);

    if (result->hit) {
        printf("hit ");
    } else {
        printf("miss ");
        if (result->eviction) {
            printf("eviction ");
        }
    }
    printf("\n");
}
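
The one-line summary from Task 1 pairs with the verbose printer above; a sketch:

void print_summary(const cache_t *cache) {
    printf("hits:%d misses:%d evictions:%d\n",
           cache->hits, cache->misses, cache->evictions);
}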

Checkpoint: Output matches reference for all provided traces.

Phase 7: Miss Classification (Days 15-17)

Goals:

  • Classify each miss as compulsory, capacity, or conflict
  • Track block history for classification

Tasks:

  1. Track all unique blocks ever accessed (for compulsory detection)
  2. Simulate a fully-associative cache of same size (for conflict detection)
  3. Classify: compulsory if first access; conflict if FA would hit; else capacity

Implementation Strategy:

// Maintain a fully-associative "oracle" cache for classification
typedef struct {
    unsigned long *blocks;  // Array of block addresses
    int capacity;           // Max blocks (same as main cache)
    int count;              // Current count
    int lru_clock;
} oracle_cache_t;

char classify_miss(cache_t *cache, unsigned long block_addr,
                   oracle_cache_t *oracle, hash_set_t *seen_blocks) {

    // Check if block was ever seen before
    if (!hash_set_contains(seen_blocks, block_addr)) {
        hash_set_add(seen_blocks, block_addr);
        return 'C';  // Compulsory
    }

    // Check if fully-associative cache would have hit
    if (oracle_access(oracle, block_addr)) {
        return 'O';  // Conflict (would have hit with full associativity)
    }

    return 'A';  // Capacity (even FA cache missed)
}
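
oracle_access above is only named, not shown. One simple way to realize it with exactly the fields of oracle_cache_t is to keep blocks ordered most-recently-used first, so LRU eviction just drops the last entry (the lru_clock field is then unused). A sketch; requires <string.h>, and linear search is accepted here to keep it short.

int oracle_access(oracle_cache_t *o, unsigned long block_addr) {
    /* Search; on a hit, move the block to the front (most recent). */
    for (int i = 0; i < o->count; i++) {
        if (o->blocks[i] == block_addr) {
            memmove(&o->blocks[1], &o->blocks[0], i * sizeof(o->blocks[0]));
            o->blocks[0] = block_addr;
            return 1;                                   /* hit */
        }
    }
    /* Miss: insert at front; if full, the last (LRU) entry falls off. */
    int keep = (o->count < o->capacity) ? o->count : o->capacity - 1;
    memmove(&o->blocks[1], &o->blocks[0], keep * sizeof(o->blocks[0]));
    o->blocks[0] = block_addr;
    if (o->count < o->capacity)
        o->count++;
    return 0;                                           /* miss */
}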

Checkpoint: Miss classification matches expected distribution for test traces.

Phase 8: ASCII Visualizer (Days 18-21)

Goals:

  • Display cache state visually
  • Show access patterns and miss types

Tasks:

  1. Design ASCII cache representation
  2. Implement visualize_state(cache, access, result)
  3. Add pattern detection (stride, sequential)
  4. Implement running statistics display

Visualization Implementation:

void visualize_state(cache_t *cache, int access_num,
                     mem_access_t *access, access_result_t *result) {
    printf("\nAccess #%d: %c 0x%lx [%s%s]\n",
           access_num, access->op, access->addr,
           result->hit ? "HIT" : "MISS",
           result->hit ? "" : get_miss_type_str(result->miss_type));

    printf("โ”Œ");
    for (int i = 0; i < 40; i++) printf("โ”€");
    printf("โ”\n");

    for (int s = 0; s < cache->num_sets; s++) {
        printf("โ”‚ Set %d: ", s);
        for (int l = 0; l < cache->lines_per_set; l++) {
            cache_line_t *line = &cache->sets[s].lines[l];
            if (line->valid) {
                printf("[%04lx] ", line->tag & 0xFFFF);
            } else {
                printf("[    ] ");
            }
        }
        if (s == result->set_index) {
            printf(" โ† %s", result->hit ? "HIT" : "accessed");
        }
        printf("\n");
    }

    printf("โ””");
    for (int i = 0; i < 40; i++) printf("โ”€");
    printf("โ”˜\n");

    float hit_rate = (float)cache->hits / (cache->hits + cache->misses) * 100;
    printf("Running: hits=%d misses=%d (%.1f%% hit rate)\n",
           cache->hits, cache->misses, hit_rate);
}

Checkpoint: Visualizer shows correct cache state and highlights accessed sets.


6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Test individual components Address decomposition, LRU logic
Trace Tests Validate against known traces Match reference simulator output
Edge Cases Handle unusual inputs Empty trace, single access, full cache
Stress Tests Performance and correctness at scale Million-access traces

6.2 Test Traces

Create traces for specific scenarios:

trace1_sequential.trace - Sequential access, good spatial locality:

 L 0,4
 L 4,4
 L 8,4
 L c,4
 L 10,4
 L 14,4

trace2_strided.trace - Strided access, poor spatial locality:

 L 0,4
 L 40,4
 L 80,4
 L c0,4
 L 100,4

trace3_conflict.trace - Designed to cause conflict misses:

# For direct-mapped cache with 4 sets (s=2), 16-byte blocks (b=4)
# These all map to set 0
 L 0,4       # Set 0, tag 0
 L 40,4      # Set 0, tag 1 (evicts previous)
 L 0,4       # Set 0, tag 0 (conflict miss!)
 L 40,4      # Set 0, tag 1 (conflict miss!)

trace4_reuse.trace - Temporal locality test:

 L 0,4
 L 10,4
 L 0,4       # Should be hit
 L 10,4      # Should be hit
 L 0,4       # Should be hit

6.3 Reference Validator

#!/bin/bash
# validate.sh - Compare against reference simulator

TRACES="yi dave trans simple"
CONFIGS="1-1-1 4-2-4 2-4-3"  # s-E-b configurations

for trace in $TRACES; do
    for config in $CONFIGS; do
        s=$(echo $config | cut -d- -f1)
        E=$(echo $config | cut -d- -f2)
        b=$(echo $config | cut -d- -f3)

        expected=$(./csim-ref -s $s -E $E -b $b -t traces/${trace}.trace)
        actual=$(./csim -s $s -E $E -b $b -t traces/${trace}.trace)

        if [ "$expected" == "$actual" ]; then
            echo "PASS: $trace with s=$s E=$E b=$b"
        else
            echo "FAIL: $trace with s=$s E=$E b=$b"
            echo "  Expected: $expected"
            echo "  Actual:   $actual"
        fi
    done
done

6.4 Critical Test Cases

  1. Empty cache: The first access to any block is always a miss
  2. LRU correctness: Verify least-recently-used block is evicted
  3. Set isolation: Accesses to different sets don't affect each other
  4. Block boundary: Adjacent addresses in same block share cache line
  5. Tag uniqueness: Same index, different tags are different blocks
  6. Modify operation: 'M' is a load followed by a store, so it yields at most one miss but up to two hits

7. Common Pitfalls

7.1 Implementation Mistakes

Pitfall Symptom Solution
Off-by-one in bit extraction Wrong set/tag Use (1 << n) - 1 for n-bit masks
Forgetting to update LRU on hit Wrong eviction Always update LRU counter on any access
Not handling 'M' operations Miscount hits 'M' = load + store; can hit twice
Signed vs unsigned addresses Negative set indices Use unsigned long for addresses
Wrong mask calculation Tag/index overlap Verify: tag_bits + index_bits + offset_bits = address_bits
Evicting wrong line Incorrect cache state Verify LRU logic with small examples

7.2 Debugging Strategies

// Debug mode: print every decomposition
#ifdef DEBUG
    printf("addr=0x%lx -> tag=0x%lx set=%lu offset=%lu\n",
           addr, tag, set_index, offset);
#endif

// Sanity checks
assert(set_index < cache->num_sets);
assert(offset < cache->block_size);
assert(tag < (1UL << cache->t));

7.3 Performance Traps

  • Large traces: Donโ€™t store all accesses in memory; process streaming
  • Hash table for seen blocks: Donโ€™t use linear search for millions of blocks
  • Visualization overhead: Only visualize in debug mode or with sampling

8. Extensions

8.1 Beginner Extensions

  • FIFO replacement policy: Simpler than LRU
  • Random replacement: Even simpler, good baseline
  • Write operations: Track stores separately from loads
  • Command-line help: Proper usage message with -h

8.2 Intermediate Extensions

  • Multi-level cache: Simulate L1 + L2; misses in L1 go to L2
  • Inclusive vs exclusive: Different multi-level policies
  • Write-back simulation: Track dirty bits, count writebacks
  • Trace generation: Create traces from actual program runs
  • Block-level statistics: Per-block hit/miss counts

8.3 Advanced Extensions

  • Prefetching simulation: Next-line, stride, or correlation prefetch
  • TLB simulation: Combine with page table simulation
  • Cache coherence: Simulate MESI protocol for multi-core
  • Real code optimization: Use simulator to guide actual code changes
  • GUI visualizer: Web-based or ncurses interface

8.4 Multi-Level Cache Example

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                      MULTI-LEVEL CACHE SIMULATION                           โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                             โ”‚
โ”‚   Typical modern hierarchy:                                                 โ”‚
โ”‚                                                                             โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                                                          โ”‚
โ”‚   โ”‚     CPU     โ”‚                                                          โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜                                                          โ”‚
โ”‚          โ–ผ                                                                  โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  ~1ns, 32KB, 8-way                                       โ”‚
โ”‚   โ”‚  L1 Cache   โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                               โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜                          โ”‚ L1 miss                       โ”‚
โ”‚          โ–ผ                                 โ–ผ                                โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  ~4ns, 256KB, 8-way                                      โ”‚
โ”‚   โ”‚  L2 Cache   โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                               โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜                          โ”‚ L2 miss                       โ”‚
โ”‚          โ–ผ                                 โ–ผ                                โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  ~20ns, 8MB, 16-way                                      โ”‚
โ”‚   โ”‚  L3 Cache   โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                               โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜                          โ”‚ L3 miss                       โ”‚
โ”‚          โ–ผ                                 โ–ผ                                โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  ~100ns                                                  โ”‚
โ”‚   โ”‚    DRAM     โ”‚                                                          โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                                                          โ”‚
โ”‚                                                                             โ”‚
โ”‚   Simulation approach:                                                      โ”‚
โ”‚   โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€                                                        โ”‚
โ”‚   1. Try L1: if hit, done                                                  โ”‚
โ”‚   2. If L1 miss, try L2: if hit, install in L1                            โ”‚
โ”‚   3. If L2 miss, try L3: if hit, install in L1 and L2                     โ”‚
โ”‚   4. If L3 miss, fetch from memory, install everywhere                     โ”‚
โ”‚                                                                             โ”‚
โ”‚   // Pseudocode                                                             โ”‚
โ”‚   result = access_cache(L1, addr);                                         โ”‚
โ”‚   if (!result.hit) {                                                       โ”‚
โ”‚       result = access_cache(L2, addr);                                     โ”‚
โ”‚       if (!result.hit) {                                                   โ”‚
โ”‚           result = access_cache(L3, addr);                                 โ”‚
โ”‚           if (!result.hit) {                                               โ”‚
โ”‚               // Fetch from memory                                         โ”‚
โ”‚           }                                                                โ”‚
โ”‚           install_block(L2, addr);                                         โ”‚
โ”‚       }                                                                    โ”‚
โ”‚       install_block(L1, addr);                                             โ”‚
โ”‚   }                                                                        โ”‚
โ”‚                                                                             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
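
A two-level lookup can be layered directly on the Phase 3 cache_access, which already installs the block on a miss, so "install in L1" happens as a side effect of the call. mlc_stats_t and multilevel_access are hypothetical names for this sketch:

typedef struct {
    long l1_hits, l2_hits, mem_fetches;
} mlc_stats_t;

void multilevel_access(cache_t *L1, cache_t *L2,
                       unsigned long addr, mlc_stats_t *st) {
    access_result_t r1 = cache_access(L1, addr);   /* try L1 (installs on miss) */
    if (r1.hit) { st->l1_hits++; return; }

    access_result_t r2 = cache_access(L2, addr);   /* L1 missed: try L2 */
    if (r2.hit)
        st->l2_hits++;
    else
        st->mem_fetches++;                          /* L2 missed: goes to DRAM */
}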

9. Real-World Connections

9.1 Performance Engineering

Your cache simulator skills directly apply to:

  • Database systems: Query planners consider cache behavior
  • Game engines: Data-oriented design for cache efficiency
  • Scientific computing: Blocking for matrix operations
  • Embedded systems: Understanding tight memory constraints

9.2 Tools That Use These Concepts

Tool Purpose Cache Relevance
Valgrind/Cachegrind Cache profiling Uses simulation like yours
perf Hardware counters Measures real cache events
Intel VTune Performance analysis Cache miss attribution
Compiler (-O3) Optimization Loop blocking, prefetching

9.3 Interview Applications

Common questions your project prepares you for:

  1. "Why is row-major traversal faster than column-major for matrices?"
  2. "How would you optimize a function with poor cache behavior?"
  3. "Explain the tradeoffs in cache associativity."
  4. "What's the difference between L1 and L3 cache?"
  5. "How do you measure cache performance?"

9.4 Modern CPU Cache Parameters

For reference, typical modern CPU cache configurations:

Level Size Associativity Block Size Latency
L1D 32-48 KB 8-12 way 64 bytes 4-5 cycles
L1I 32 KB 8 way 64 bytes 4-5 cycles
L2 256-512 KB 4-8 way 64 bytes 12-14 cycles
L3 8-64 MB 12-16 way 64 bytes 40-75 cycles

10. Resources

10.1 Essential Reading

  • CS:APP Chapter 6: โ€œThe Memory Hierarchyโ€ - Core material
  • CS:APP Chapter 5: โ€œOptimizing Program Performanceโ€ - Measurement discipline
  • โ€œWhat Every Programmer Should Know About Memoryโ€ by Ulrich Drepper - Deep dive

10.2 Tools

  • Valgrind/Cachegrind: valgrind --tool=cachegrind ./program
  • perf: perf stat -e cache-references,cache-misses ./program
  • PAPI: Performance API for hardware counters

10.3 Trace Generation

# Generate trace with Valgrind lackey
valgrind --tool=lackey --trace-mem=yes ./your_program 2>&1 | \
    grep -E '^ [LSM]' > trace.trace
  • Previous: P8 (Performance Clinic) for measurement methodology
  • Next: P13 (VM Map Visualizer) extends memory hierarchy understanding
  • Related: P14 (Malloc) uses locality principles

10.5 Online Resources


11. Self-Assessment Checklist

Understanding

  • I can explain address decomposition (tag/index/offset) without notes
  • I can draw a set-associative cache and trace an access through it
  • I understand why spatial locality works (blocks, not bytes)
  • I can classify a miss as compulsory, capacity, or conflict
  • I can explain LRU replacement step-by-step
  • I understand why blocking improves matrix multiply performance

Implementation

  • Simulator correctly handles all trace operations (L, S, M)
  • Hit/miss/eviction counts match reference for all test traces
  • LRU replacement is implemented correctly
  • Address decomposition is verified with manual calculations
  • Visualizer shows clear cache state representation

Application

  • I can predict relative cache performance of two code variants
  • I used the simulator to measure a real optimizationโ€™s effect
  • I can explain a miss-rate reduction in terms of locality principles
  • I can choose appropriate cache parameters for a simulation

Growth

  • I understand how to extend to multi-level caches
  • I know how real tools (Cachegrind, perf) relate to my simulator
  • I can apply cache knowledge to optimize real programs

12. Completion Criteria

Minimum Viable Completion

  • Direct-mapped and set-associative simulation works
  • Address decomposition is correct
  • Hit/miss/eviction counting matches reference
  • Processes provided trace files
  • Basic text output works

Full Completion

  • LRU replacement is correct
  • All trace formats handled (L, S, M)
  • Miss classification (compulsory/capacity/conflict)
  • Verbose mode shows per-access details
  • ASCII visualizer displays cache state
  • All reference traces match expected output

Excellence (Going Above & Beyond)

  • Multi-level cache simulation
  • Multiple replacement policies (LRU, FIFO, Random)
  • Prefetching simulation
  • Real program optimization demonstrated with simulator
  • Interactive visualization or web interface

13. Real World Outcome

When you complete this project, you will have a fully functional cache simulator and locality visualizer. Here is exactly what running your tools will look like:

Cache Simulator Basic Usage

$ ./csim -s 4 -E 2 -b 4 -t traces/yi.trace
hits:4 misses:5 evictions:3

Verbose Mode with Full Trace

$ ./csim -v -s 4 -E 2 -b 4 -t traces/yi.trace
Cache Configuration:
  Sets (S):           16 (s=4 bits)
  Lines per set (E):  2 (2-way set-associative)
  Block size (B):     16 bytes (b=4 bits)
  Cache size:         512 bytes

Processing trace: traces/yi.trace
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
 L 10,1       set=1   tag=0x000001   miss
 M 20,1       set=2   tag=0x000002   miss hit
 L 22,1       set=2   tag=0x000002   hit
 S 18,1       set=1   tag=0x000001   hit
 L 110,1      set=1   tag=0x000011   miss eviction
 L 210,1      set=1   tag=0x000021   miss eviction
 M 12,1       set=1   tag=0x000001   miss eviction hit
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

Summary:
  hits:4 misses:5 evictions:3
  Hit rate: 44.4%
  Miss rate: 55.6%

Miss Classification Analysis

$ ./csim --classify -s 4 -E 2 -b 4 -t traces/matrix_multiply.trace
Cache Configuration:
  Sets: 16 (s=4)
  Lines per set: 2 (E=2)
  Block size: 16 bytes (b=4)
  Total size: 512 bytes

Access Summary:
  Total accesses:     10,000
  Hits:               7,234  (72.3%)
  Misses:             2,766  (27.7%)

Miss Classification:
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚ Compulsory:  1,024  (37.0% of misses) โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ     โ”‚
  โ”‚ Capacity:      892  (32.3% of misses) โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ      โ”‚
  โ”‚ Conflict:      850  (30.7% of misses) โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ       โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
  Evictions: 1,742

Recommendations:
  - High conflict misses: Consider 4-way associativity
  - Capacity misses indicate working set > cache size
  - Use blocking with block size ~16 elements

Locality Visualizer Output

$ ./lvis -s 2 -E 2 -b 2 -t traces/simple.trace --animate

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘             CACHE LOCALITY VISUALIZER                         โ•‘
โ•‘       Cache: 4 sets ร— 2 ways ร— 4 bytes = 32 bytes            โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Access #1: L 0x00 (4 bytes)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Set 0: [โ–ˆโ–ˆโ–ˆโ–ˆ t=0x00] [        ]            โ”‚ โ† MISS (Compulsory)
โ”‚ Set 1: [            ] [        ]           โ”‚
โ”‚ Set 2: [            ] [        ]           โ”‚
โ”‚ Set 3: [            ] [        ]           โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Stats: hits=0  misses=1  rate=0.0%

Access #2: L 0x04 (4 bytes)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Set 0: [โ–ˆโ–ˆโ–ˆโ–ˆ t=0x00] [        ]            โ”‚
โ”‚ Set 1: [โ–ˆโ–ˆโ–ˆโ–ˆ t=0x00] [        ]            โ”‚ โ† MISS (Compulsory)
โ”‚ Set 2: [            ] [        ]           โ”‚
โ”‚ Set 3: [            ] [        ]           โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Stats: hits=0  misses=2  rate=0.0%

Access #3: L 0x00 (4 bytes)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Set 0: [โ–ˆโ–ˆโ–ˆโ–ˆ t=0x00] [        ]            โ”‚ โ† HIT!
โ”‚ Set 1: [โ–ˆโ–ˆโ–ˆโ–ˆ t=0x00] [        ]            โ”‚
โ”‚ Set 2: [            ] [        ]           โ”‚
โ”‚ Set 3: [            ] [        ]           โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Stats: hits=1  misses=2  rate=33.3%

Access #4: L 0x10 (4 bytes)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Set 0: [โ–ˆโ–ˆโ–ˆโ–ˆ t=0x01] [โ–ˆโ–ˆโ–ˆโ–ˆ t=0x00]         โ”‚ โ† MISS (loaded new tag)
โ”‚ Set 1: [โ–ˆโ–ˆโ–ˆโ–ˆ t=0x00] [        ]            โ”‚
โ”‚ Set 2: [            ] [        ]           โ”‚
โ”‚ Set 3: [            ] [        ]           โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
Stats: hits=1  misses=3  rate=25.0%

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
                    PATTERN ANALYSIS
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
Detected stride: 4 bytes
Spatial locality: GOOD (sequential within cache lines)
Temporal locality: MODERATE (50% reuse of cached data)
Recommendation: Current cache configuration is adequate
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Matrix Multiply Optimization Analysis

$ ./csim --compare -s 5 -E 4 -b 6 \
    -t traces/matmul_naive.trace \
    -t traces/matmul_blocked.trace

Comparing cache behavior between two traces:
Cache: 32 sets, 4-way, 64-byte blocks (8 KB)

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    COMPARISON RESULTS                         โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                  โ”‚ matmul_naive       โ”‚ matmul_blocked       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Total accesses   โ”‚ 2,097,152          โ”‚ 2,097,152            โ”‚
โ”‚ Hits             โ”‚ 524,288 (25.0%)    โ”‚ 1,835,008 (87.5%)    โ”‚
โ”‚ Misses           โ”‚ 1,572,864 (75.0%)  โ”‚ 262,144 (12.5%)      โ”‚
โ”‚ Evictions        โ”‚ 1,572,832          โ”‚ 262,112              โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Est. cycles      โ”‚ 157,810,688        โ”‚ 28,311,552           โ”‚
โ”‚ Speedup          โ”‚ 1.0x (baseline)    โ”‚ 5.6x faster          โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Analysis: Blocking reduced misses by 83.3%
          Estimated 5.6x performance improvement
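The blocked trace corresponds to loop tiling. The sketch below shows the transformation in outline, assuming square N×N matrices of doubles in row-major order and a tile size B chosen so that three B×B tiles fit in the cache; with B = 16 and 8-byte doubles, 3 × 16 × 16 × 8 = 6 KB, which fits in the 8 KB cache simulated above. This is an illustrative sketch, not the exact code behind the traces.

```c
/* Sketch: naive vs. blocked (tiled) matrix multiply, C = A * Bm.
 * Assumes square n x n matrices of doubles in row-major order and that the
 * caller has zeroed C; B is the tile size you would tune for your cache. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

void matmul_naive(int n, const double *A, const double *Bm, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * Bm[k * n + j];  /* column walk of Bm misses often */
}

void matmul_blocked(int n, int B, const double *A, const double *Bm, double *C) {
    for (int ii = 0; ii < n; ii += B)
        for (int kk = 0; kk < n; kk += B)
            for (int jj = 0; jj < n; jj += B)
                /* work on three B x B tiles that all fit in the cache at once */
                for (int i = ii; i < MIN(ii + B, n); i++)
                    for (int k = kk; k < MIN(kk + B, n); k++) {
                        double a = A[i * n + k];
                        for (int j = jj; j < MIN(jj + B, n); j++)
                            C[i * n + j] += a * Bm[k * n + j];
                    }
}
```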

14. The Core Question Youโ€™re Answering

โ€œHow does the memory hierarchy create the illusion of a large, fast memory, and how can I write code that exploits this illusion?โ€

This project forces you to deeply understand that CPU performance is dominated by memory access patterns. A cache miss can cost 100+ CPU cycles while a hit costs only 1-4 cycles. By building a simulator, you internalize how your data layout and access patterns directly translate to cache behavior - knowledge that separates average programmers from performance-conscious systems engineers.
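A quick way to quantify that claim is the average memory access time, AMAT = hit time + miss rate × miss penalty. The numbers below are illustrative assumptions (a 2-cycle hit and a 100-cycle miss penalty), not measurements of any particular CPU:

```c
/* Sketch: average memory access time (AMAT) under assumed latencies. */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

/* amat(2.0, 0.02, 100.0) =  4 cycles: a 2% miss rate barely hurts.
 * amat(2.0, 0.25, 100.0) = 27 cycles: a 25% miss rate dominates the runtime. */
```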


15. Concepts You Must Understand First

Before starting this project, ensure you have a solid grasp of these foundational concepts:

| Concept | Where to Learn | Why It Matters |
|---|---|---|
| Binary/hexadecimal representation | CS:APP 2.1 | Address decomposition requires bit manipulation |
| Bitwise operations (AND, OR, shifts) | CS:APP 2.1.4 | Extracting tag/index/offset from addresses |
| Memory addressing | CS:APP 6.1 | Understanding how addresses map to physical memory |
| Locality of reference | CS:APP 6.2 | Why caches work: temporal and spatial locality |
| C pointers and dynamic allocation | CS:APP 2.1, K&R Ch. 5 | Building cache data structures |
| Structs and arrays | K&R Ch. 6 | Representing cache organization |
| File I/O in C | K&R Ch. 7 | Parsing trace files |
| Command-line argument parsing | K&R 5.10 | Handling -s, -E, -b flags |

16. Questions to Guide Your Design

Work through these questions before writing any code:

  1. Address Decomposition: Given a 64-bit address, s=4, and b=6, write out the formula to extract the tag, set index, and block offset. Try it with address 0x7FF000A3C8 (a worked C sketch appears after this list).

  2. Data Structure Choice: How will you represent a cache line? A set? The entire cache? Why might a 2D array work for some configurations but not others?

  3. LRU Tracking: How will you track access recency for LRU replacement? Counter per line? Linked list? Whatโ€™s the time complexity of each approach?

  4. Miss Classification: To classify a miss as conflict vs capacity, you need to know if a fully-associative cache of the same size would have hit. How will you simulate this efficiently?

  5. Modify Operations: A trace line โ€œM 7ff000a3c8,4โ€ represents a load followed by a store. How do you handle this? Can both the load and store be hits?

  6. Block Boundaries: If an 8-byte load starts at byte offset 60 of a 64-byte block, it spills into the next block. How will you handle accesses that cross a block boundary? Do you need to track byte-level access or just block-level?
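For question 1, here is a minimal sketch of the decomposition in C. The struct and function names are illustrative, not a required interface; note the 1UL in the masks, which keeps the shifts in 64-bit arithmetic.

```c
/* Sketch: splitting a 64-bit address into tag, set index, and block offset.
 * s = number of set-index bits, b = number of block-offset bits. */
#include <stdio.h>

typedef struct {
    unsigned long tag, set, offset;
} addr_parts_t;

addr_parts_t split_address(unsigned long addr, int s, int b) {
    addr_parts_t p;
    p.offset = addr & ((1UL << b) - 1);          /* low b bits             */
    p.set    = (addr >> b) & ((1UL << s) - 1);   /* next s bits            */
    p.tag    = addr >> (s + b);                  /* everything above them  */
    return p;
}

int main(void) {
    /* The address from question 1, with s = 4 and b = 6 */
    addr_parts_t p = split_address(0x7FF000A3C8UL, 4, 6);
    printf("tag=0x%lx set=%lu offset=%lu\n", p.tag, p.set, p.offset);
    /* prints: tag=0x1ffc0028 set=15 offset=8 */
    return 0;
}
```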


17. Thinking Exercise

Before coding, hand-trace this sequence through a cache with s=1 (2 sets), E=2 (2-way), b=2 (4-byte blocks):

L 0x00    # Load from address 0
L 0x04    # Load from address 4
L 0x08    # Load from address 8
L 0x00    # Load from address 0 again
L 0x10    # Load from address 16
L 0x00    # Load from address 0 again

For each access, determine:

  • Which set is accessed?
  • Is it a hit or miss?
  • If miss, is it compulsory, capacity, or conflict?
  • Which line is evicted (if any)?
  • What is the final cache state?

Show your work before looking at the solution. Draw the cache state after each access.

Solution

Address decomposition for this cache:

  • Block offset: bits [1:0] (2 bits)
  • Set index: bit [2] (1 bit, so 2 sets)
  • Tag: bits [63:3]

| Access | Address | Set | Tag | Result | Cache State After |
|---|---|---|---|---|---|
| L 0x00 | 0b000 | 0 | 0 | MISS (compulsory) | Set0: [tag=0, _] |
| L 0x04 | 0b100 | 1 | 0 | MISS (compulsory) | Set0: [tag=0, _] Set1: [tag=0, _] |
| L 0x08 | 0b1000 | 0 | 1 | MISS (compulsory) | Set0: [tag=0, tag=1] Set1: [tag=0, _] |
| L 0x00 | 0b000 | 0 | 0 | HIT | Set0: [tag=0, tag=1] |
| L 0x10 | 0b10000 | 0 | 2 | MISS (compulsory) | Set0: [tag=0, tag=2] (evicted tag=1) |
| L 0x00 | 0b000 | 0 | 0 | HIT | Set0: [tag=0, tag=2] |

Note that the miss on 0x10 is compulsory, not capacity: it is the first access ever to that block. It still forces an eviction because set 0 is full, and LRU selects tag=1 (the least recently used line) as the victim.

Final: hits=2, misses=4, evictions=1
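If you want to check the Set and Tag columns mechanically, a tiny throwaway program (assuming the same s=1, b=2 geometry) does the decomposition for each address in the trace:

```c
/* Sketch: verify the set/tag columns above for s = 1, b = 2. */
#include <stdio.h>

int main(void) {
    unsigned long trace[] = {0x00, 0x04, 0x08, 0x00, 0x10, 0x00};
    for (int i = 0; i < 6; i++) {
        unsigned long a = trace[i];
        unsigned long set = (a >> 2) & 0x1;   /* 1 set-index bit  */
        unsigned long tag = a >> 3;           /* everything above */
        printf("L 0x%02lx -> set %lu, tag %lu\n", a, set, tag);
    }
    return 0;
}
```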


18. The Interview Questions Theyโ€™ll Ask

After completing this project, you should be able to confidently answer these questions:

  1. โ€œExplain the difference between direct-mapped, set-associative, and fully-associative caches. What are the tradeoffs?โ€
    • Expect follow-ups about hardware cost, hit latency, and miss rate
  2. โ€œWhy does row-major traversal of a matrix outperform column-major traversal in C?โ€
    • Must connect to cache line size and spatial locality (see the traversal sketch after this list)
  3. โ€œWhat is cache thrashing and how would you detect and fix it in code?โ€
    • Discuss conflict misses, power-of-2 strides, and padding
  4. โ€œWalk me through exactly what happens when the CPU accesses memory at address X.โ€
    • Cover TLB, address decomposition, tag comparison, miss handling
  5. โ€œHow does blocking (loop tiling) improve matrix multiplication performance?โ€
    • Quantify it: three B×B tiles (one from each matrix) must fit in the cache at once; derive the block size B from the cache capacity
  6. โ€œYou have a program with 90% cache miss rate. How do you diagnose and fix it?โ€
    • Discuss Cachegrind, perf, miss classification, data layout changes
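For question 2, the contrast is easiest to see side by side. A minimal sketch (the array size N and the function names are illustrative):

```c
/* Sketch: the two traversal orders behind interview question 2.
 * Row-major order touches consecutive bytes, so each 64-byte line serves
 * several accesses; column-major order jumps N*sizeof(double) bytes per
 * access and wastes most of every line it loads. */
#define N 1024

double a[N][N];

double sum_row_major(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];          /* stride 8 bytes: good spatial locality */
    return s;
}

double sum_col_major(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];          /* stride 8*N bytes: a new line almost every access */
    return s;
}
```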

19. Hints in Layers

Use these hints progressively if you get stuck. Try to solve problems yourself before advancing to the next hint level.

Hint Layer 1: Getting Started

  • Start with address decomposition. Write a function that takes an address and prints tag/index/offset. Test it extensively before moving on.
  • For cache state, a 2D array cache_line_t cache[S][E] works well for small caches.
  • Parse traces with fscanf(fp, " %c %lx,%d", &op, &addr, &size).
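Putting those three hints together, a starting skeleton might look like the sketch below. The names and the fixed S and E are illustrative; a real simulator would size the cache from the -s and -E flags and allocate it dynamically.

```c
/* Sketch: cache state plus a trace-parsing loop, combining the hints above. */
#include <stdio.h>

typedef struct {
    int valid;
    unsigned long tag;
    unsigned long lru_counter;   /* larger value = more recently used */
} cache_line_t;

enum { S = 16, E = 2 };          /* 16 sets, 2-way (illustrative values) */
static cache_line_t cache[S][E];

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s tracefile\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        return 1;
    }

    char op;
    unsigned long addr;
    int size;
    while (fscanf(fp, " %c %lx,%d", &op, &addr, &size) == 3) {
        if (op == 'I')
            continue;                               /* ignore instruction fetches */
        printf("%c %#lx,%d\n", op, addr, size);     /* stub: call your access logic here */
    }
    fclose(fp);
    (void)cache;                                    /* unused in this stub */
    return 0;
}
```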

Hint Layer 2: Core Logic

  • Hit detection: For each line in set[index], check valid && (tag == line.tag).
  • For LRU, a simple counter works: increment global counter on each access, store it in the accessed line. Victim is the line with smallest counter.
  • Handle ‘M’ as two separate accesses: a load (which may hit or miss) followed by a store (which always hits, because the load has just brought the block into the cache). A sketch of the per-access logic follows this list.
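A minimal sketch of that per-access logic; it re-declares a small cache so it stands alone, and the geometry constants are again illustrative:

```c
/* Sketch: one cache access, following the hints above. ticks is a global
 * counter bumped on every access, so the line with the smallest lru_counter
 * in a set is the least recently used. */
typedef struct {
    int valid;
    unsigned long tag;
    unsigned long lru_counter;
} cache_line_t;

enum { s_bits = 4, b_bits = 5, S = 1 << s_bits, E = 2 };  /* 16 sets, 2-way, 32-byte blocks */

static cache_line_t cache[S][E];
static unsigned long ticks;
static long hits, misses, evictions;

void access_cache(unsigned long addr) {
    unsigned long set = (addr >> b_bits) & ((1UL << s_bits) - 1);
    unsigned long tag = addr >> (s_bits + b_bits);
    cache_line_t *lines = cache[set];
    ticks++;

    for (int i = 0; i < E; i++) {                    /* 1. hit? */
        if (lines[i].valid && lines[i].tag == tag) {
            hits++;
            lines[i].lru_counter = ticks;            /* update recency on hits too */
            return;
        }
    }

    misses++;
    int victim = 0;                                  /* 2. prefer an empty line... */
    for (int i = 0; i < E; i++) {
        if (!lines[i].valid) { victim = i; break; }
        if (lines[i].lru_counter < lines[victim].lru_counter)
            victim = i;                              /* ...otherwise evict the LRU line */
    }
    if (lines[victim].valid)
        evictions++;
    lines[victim].valid = 1;
    lines[victim].tag = tag;
    lines[victim].lru_counter = ticks;
}

/* An 'M' trace line is simply two calls to access_cache(addr): the second
 * (the store) always hits because the first (the load) installed the block. */
```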

Hint Layer 3: Miss Classification

  • Maintain a hash set of all block addresses ever seen. First access to a block is compulsory.
  • Simulate a fully-associative cache of the same total size in parallel. If the FA cache hits but your cache misses, itโ€™s a conflict miss.
  • Everything else is a capacity miss.
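One possible shape for that bookkeeping is sketched below. It is deliberately simple (linear searches, fixed-size arrays); CAP stands for the total number of lines in your real cache, so the fully-associative shadow cache has the same capacity. Call shadow_access on every access, and interpret its return value only when your real cache missed.

```c
/* Sketch: 3C miss classification via a "seen blocks" list plus a
 * fully-associative shadow cache of the same total capacity. */
#include <stdbool.h>

enum { CAP = 32, MAX_BLOCKS = 1 << 16 };     /* illustrative sizes */

static unsigned long seen[MAX_BLOCKS];       /* every block address ever touched */
static int nseen;

static unsigned long fa_tag[CAP];            /* fully-associative shadow cache */
static unsigned long fa_last_use[CAP];
static bool fa_valid[CAP];
static unsigned long ticks;

typedef enum { MISS_COMPULSORY, MISS_CONFLICT, MISS_CAPACITY } miss_kind_t;

/* Call on every access with block = addr >> b. The result only matters
 * when the real cache missed on this access. */
miss_kind_t shadow_access(unsigned long block) {
    ticks++;

    bool first_touch = true;                 /* compulsory = never seen before */
    for (int i = 0; i < nseen; i++)
        if (seen[i] == block) { first_touch = false; break; }
    if (first_touch && nseen < MAX_BLOCKS)
        seen[nseen++] = block;

    bool fa_hit = false;                     /* probe and update the shadow cache (LRU) */
    int victim = 0;
    for (int i = 0; i < CAP; i++) {
        if (fa_valid[i] && fa_tag[i] == block) {
            fa_hit = true;
            fa_last_use[i] = ticks;
            break;
        }
        if (!fa_valid[i])
            victim = i;
        else if (fa_valid[victim] && fa_last_use[i] < fa_last_use[victim])
            victim = i;
    }
    if (!fa_hit) {
        fa_valid[victim] = true;
        fa_tag[victim] = block;
        fa_last_use[victim] = ticks;
    }

    if (first_touch) return MISS_COMPULSORY;
    if (fa_hit)      return MISS_CONFLICT;   /* same-size FA cache would have hit */
    return MISS_CAPACITY;
}
```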

Hint Layer 4: Common Bugs

  • Off-by-one in bit masks: use (1UL << n) - 1, not (1 << n) - 1 (the UL matters for 64-bit; see the sketch after this list).
  • Forgetting to update LRU counter on hits (not just misses).
  • Using int instead of unsigned long for addresses (signed arithmetic bugs).
  • Not handling the case where a set has empty lines (no eviction needed).
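To see why the first point matters, here is a tiny demonstration (the 40-bit mask width is just an example):

```c
/* Sketch: why 1UL matters. With a plain int literal the shift happens in
 * 32-bit int; shifting by 32 or more bits is undefined behavior, and
 * 1 << 31 already overflows a signed int. */
#include <stdio.h>

int main(void) {
    int n = 40;                            /* e.g., a 40-bit mask */
    unsigned long good = (1UL << n) - 1;   /* 0xffffffffff */
    printf("mask = %#lx\n", good);
    /* unsigned long bad = (1 << n) - 1;   <-- undefined behavior: 1 is an int */
    return 0;
}
```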

20. Books That Will Help

| Topic | Book | Specific Chapters |
|---|---|---|
| Cache organization fundamentals | CS:APP (3rd ed.) | Chapter 6.1-6.4 |
| Locality and optimization | CS:APP (3rd ed.) | Chapter 6.5-6.6 |
| Performance measurement | CS:APP (3rd ed.) | Chapter 5.1-5.7 |
| Memory hierarchy design | Patterson & Hennessy: Computer Architecture | Chapter 5 |
| Advanced caching | Ulrich Drepper: “What Every Programmer Should Know About Memory” | Sections 3.1-3.3 |
| C bit manipulation | K&R: The C Programming Language | Chapter 2.9 |
| File I/O in C | K&R: The C Programming Language | Chapter 7.5-7.7 |
| Cache-oblivious algorithms | Introduction to Algorithms (CLRS) | Chapter 27 |

This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.