Project 21: 8-bit RISC CPU (Final Project)
Design a small CPU with a minimal ISA, control FSM, and memory interface.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Master |
| Time Estimate | 4-8 weeks |
| Main Programming Language | Verilog |
| Alternative Programming Languages | VHDL, SystemVerilog |
| Coolness Level | Very High |
| Business Potential | Medium |
| Prerequisites | Datapaths, FSMs, Memory, ALU design |
| Key Topics | Fetch/Decode/Execute, Instruction encoding, Control signals |
1. Learning Objectives
- Design a minimal instruction set
- Build a datapath + control FSM
- Run assembly programs in simulation
2. All Theory Needed (Per-Concept Breakdown)
CPU Fetch/Decode/Execute Cycle
Description/Expanded Explanation of the concept
A CPU repeatedly fetches an instruction, decodes it, and executes the required datapath operations. Control signals orchestrate data movement. A clear micro-architecture makes debugging possible.
Definitions & Key Terms
- Fetch -> read instruction from memory
- Decode -> interpret opcode and operands
- Execute -> perform operation in ALU or memory
Mental Model Diagram (ASCII)
PC -> IMEM -> DECODE -> EXECUTE -> WRITEBACK

How It Works (Step-by-Step)
- Use PC to fetch instruction.
- Decode opcode and operand fields.
- Select ALU op or memory op.
- Write result to register or memory.
- Update PC (sequential or branch).
Minimal Concrete Example
// Fetch: latch the instruction at the current PC, then advance the PC.
always @(posedge clk) begin
    instr <= imem[pc];
    pc    <= pc_next;
end
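The example above leaves `pc_next` undefined. A minimal sketch of how it might be computed for the "sequential or branch" step follows; the module name `next_pc` and the inputs `branch_taken` and `branch_target` are assumptions for illustration, not part of the project specification.

```verilog
// Hypothetical next-PC logic: names and widths are assumptions.
module next_pc (
    input  wire [7:0] pc,             // current program counter
    input  wire       branch_taken,   // asserted by control for a taken branch or jump
    input  wire [7:0] branch_target,  // target address from the decoded instruction
    output wire [7:0] pc_next
);
    // Sequential flow adds 1 (wrapping at 8 bits); a taken branch overrides it.
    assign pc_next = branch_taken ? branch_target : pc + 8'd1;
endmodule
```

Keeping this mux combinational means the PC register itself stays a plain clocked register, which is easy to follow in a waveform.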
Common Misconceptions
- “You need pipelines first.” -> Start with a simple multi-cycle CPU.
- “Control is obvious.” -> It must be designed and tested carefully.
Check-Your-Understanding Questions
- What signals are needed for a LOAD instruction?
- How does a branch update PC?
- Why separate datapath and control?
Where You’ll Apply It
- This project: used in Section 3.2 and Section 4
- Also used in: P05-8-bit-alu-arithmetic-logic-unit.md (ALU foundation)
Datapaths and Flag Generation
Description/Expanded Explanation of the concept
A datapath is the collection of registers, muxes, and an ALU that moves and transforms data. Flags (zero, carry, negative) are simple but essential signals that let control logic make decisions.
Definitions & Key Terms
- Datapath -> the data-processing portion of a design
- Control -> logic that selects operations and routes data
- Flags -> status bits derived from results
Mental Model Diagram (ASCII)
[Regs] -> [ALU] -> [Regs]
           | |
           Z C   (flags)

How It Works (Step-by-Step)
- Registers feed operands into the ALU.
- Control selects the ALU operation.
- Result and flags are computed in the same cycle.
- Flags are stored or forwarded for decision-making.
Minimal Concrete Example
assign z = (y == 0);
assign n = y[7];
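The two assignments above can be placed in context with a small ALU sketch that also produces the carry flag. The op encoding, the port names, and the choice to report a borrow on the carry output for subtraction are all assumptions; adapt them to your own ISA.

```verilog
// Hypothetical 8-bit ALU sketch: op encoding and port names are assumptions.
module alu8 (
    input  wire [7:0] a,
    input  wire [7:0] b,
    input  wire [1:0] op,   // 00 add, 01 sub, 10 and, 11 or
    output reg  [7:0] y,
    output reg        c,    // carry out of ADD; for SUB, 1 means a borrow occurred
    output wire       z,
    output wire       n
);
    always @(*) begin
        c = 1'b0;                                   // logic ops leave carry clear
        case (op)
            2'b00:   {c, y} = {1'b0, a} + {1'b0, b}; // 9-bit add keeps the carry
            2'b01:   {c, y} = {1'b0, a} - {1'b0, b}; // bit 8 is set when a < b
            2'b10:   y = a & b;
            default: y = a | b;
        endcase
    end
    assign z = (y == 8'h00);   // zero flag: all result bits are 0
    assign n = y[7];           // negative flag: MSB of the result
endmodule

// Tiny demo that prints the flags for two vectors.
module alu8_tb;
    reg  [7:0] a, b;
    reg  [1:0] op;
    wire [7:0] y;
    wire       c, z, n;
    alu8 dut (.a(a), .b(b), .op(op), .y(y), .c(c), .z(z), .n(n));
    initial begin
        a = 8'hFF; b = 8'h01; op = 2'b00; #1;
        $display("FF+01: y=%h c=%b z=%b n=%b", y, c, z, n);
        a = 8'h80; b = 8'h00; op = 2'b11; #1;
        $display("80|00: y=%h c=%b z=%b n=%b", y, c, z, n);
        $finish;
    end
endmodule
```

The first vector overflows to zero, so it should set both the carry and the zero flag; the second should set only the negative flag.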
Common Misconceptions
- “Flags are only for CPUs.” -> Many control systems use them too.
- “Flags don’t need testing.” -> They are part of the contract.
Check-Your-Understanding Questions
- When should the zero flag assert?
- How do you compute a negative flag for unsigned data?
- Why separate datapath and control?
Where You’ll Apply It
- This project: used in Section 3.2 and Section 4
- Also used in: P19-calculator-with-7-segment-display.md, Final CPU
Memory Inference and RAM Behavior
Description/Expanded Explanation of the concept
FPGAs contain block RAMs. You can infer them by writing the right Verilog pattern. Synchronous read and write behavior is common, and read-during-write behavior must be defined.
Definitions & Key Terms
- Block RAM -> dedicated on-chip memory
- Synchronous read -> data available after a clock edge
- Read-during-write -> behavior when reading and writing same address
Mental Model Diagram (ASCII)
addr -> [RAM] -> dout
          ^
       din (we)

How It Works (Step-by-Step)
- Define a memory array in Verilog.
- Use a clocked block for writes.
- Decide and document read behavior.
- Test read-during-write cases.
Minimal Concrete Example
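reg [7:0] mem [0:7];          // 8 words x 8 bits, so addr is 3 bits wide
always @(posedge clk) begin
    if (we) mem[addr] <= din;
    dout <= mem[addr];        // registered read: a same-address write returns the old word
end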
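To make the "test read-during-write cases" step concrete, the sketch below wraps the RAM pattern above in a module and checks that a simultaneous read and write to one address returns the old data in simulation. The module names `ram8x8` and `ram8x8_tb` are assumptions, and a synthesis tool may still infer a different read-during-write mode for your target device, so treat this as a simulation check only.

```verilog
// RAM pattern from the example above, wrapped in a module (name is an assumption).
module ram8x8 (
    input  wire       clk,
    input  wire       we,
    input  wire [2:0] addr,
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg [7:0] mem [0:7];
    always @(posedge clk) begin
        if (we) mem[addr] <= din;
        dout <= mem[addr];      // nonblocking: samples the old contents (read-first)
    end
endmodule

module ram8x8_tb;
    reg        clk = 1'b0, we = 1'b0;
    reg  [2:0] addr = 3'd0;
    reg  [7:0] din  = 8'h00;
    wire [7:0] dout;

    ram8x8 dut (.clk(clk), .we(we), .addr(addr), .din(din), .dout(dout));
    always #5 clk = ~clk;

    initial begin
        // Drive inputs on falling edges so they are stable at each rising edge.
        @(negedge clk); we = 1'b1; addr = 3'd3; din = 8'h11;  // write 0x11 to address 3
        @(negedge clk); din = 8'h22;                          // overwrite while reading it
        @(negedge clk); we = 1'b0;
        if (dout !== 8'h11) $display("FAIL: expected old data 11, got %h", dout);
        else                $display("PASS: read-during-write returned old data");
        @(negedge clk);                                       // plain read one cycle later
        if (dout !== 8'h22) $display("FAIL: expected 22, got %h", dout);
        else                $display("PASS: later read returned new data");
        $finish;
    end
endmodule
```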
Common Misconceptions
- “Reads are always combinational.” -> Many FPGA RAMs are synchronous.
- “Read-during-write is obvious.” -> Tools may infer different modes.
Check-Your-Understanding Questions
- What is the read latency of your RAM?
- What happens when read and write share the same address?
- How do you initialize memory for simulation?
Where You’ll Apply It
- This project: used in Section 3.2 and Section 4
- Also used in: Final CPU
Verification with Testbenches and Waveforms
Description/Expanded Explanation of the concept
Testbenches are simulation-only modules that apply stimulus and check outputs. Waveforms (VCD) are the hardware engineer’s microscope; they reveal timing, glitches, and ordering problems. A good testbench is deterministic and covers edge cases.
Definitions & Key Terms
- Testbench -> a non-synthesizable module that drives a DUT
- VCD -> Value Change Dump waveform file
- Deterministic test -> same inputs produce same outputs every run
Mental Model Diagram (ASCII)
[Testbench] -> [DUT] -> [VCD] -> [GTKWave]

How It Works (Step-by-Step)
- Initialize inputs to known values.
- Apply stimulus over time.
- Dump waveforms and check outputs.
- Add assertions or PASS/FAIL messages.
Minimal Concrete Example
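initial begin
    $dumpfile("wave.vcd");   // waveform file to open in GTKWave
    $dumpvars(0, tb);        // dump every signal under module tb
    a = 0; b = 1; #10;       // apply one stimulus vector, then wait 10 time units
    $finish;
end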
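The last step in the list, adding PASS/FAIL messages, can be sketched as a self-checking testbench. The 4-bit adder `add4` is a stand-in DUT invented for this example; the point is the pattern of looping over every input pair and comparing against an expected value computed in the testbench.

```verilog
// Stand-in DUT for the example (hypothetical, not part of the CPU).
module add4 (
    input  wire [3:0] a,
    input  wire [3:0] b,
    output wire [4:0] sum
);
    assign sum = a + b;   // 5-bit result keeps the carry
endmodule

module add4_tb;
    reg  [3:0] a, b;
    wire [4:0] sum;
    integer    i, j, errors;

    add4 dut (.a(a), .b(b), .sum(sum));

    initial begin
        errors = 0;
        // Exhaustive sweep: 16 x 16 = 256 input pairs, same order every run.
        for (i = 0; i < 16; i = i + 1)
            for (j = 0; j < 16; j = j + 1) begin
                a = i; b = j; #1;
                if (sum !== i + j) begin
                    errors = errors + 1;
                    $display("FAIL: %0d + %0d gave %0d", i, j, sum);
                end
            end
        if (errors == 0) $display("PASS: all 256 vectors matched");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Because the stimulus comes from fixed loops rather than random values, every run applies the same vectors in the same order.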
Common Misconceptions
- “If it simulates once, it’s correct.” -> Cover all relevant cases.
- “Waveforms are optional.” -> They are often the only way to debug timing.
Check-Your-Understanding Questions
- Why keep testbench and DUT separate?
- What is the purpose of $dumpvars?
- How do you make a testbench deterministic?
Where You’ll Apply It
- This project: used throughout Section 6 (testing)
- Also used in: all other projects in this folder
3. Project Specification
3.1 What You Will Build
A complete 8-bit CPU with registers, ALU, PC, ROM, RAM, and control FSM.
3.2 Functional Requirements
- Requirement 1: Fetch instructions from ROM
- Requirement 2: Decode opcodes and drive control signals
- Requirement 3: Execute ALU and memory operations
3.3 Non-Functional Requirements
- Performance: Meets timing at the target clock frequency, with stable behavior on all external interfaces.
- Reliability: Deterministic outputs on all defined inputs.
- Usability: Clear ports and documented behavior.
3.4 Example Usage / Output
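Assemble a program, run the simulation, and compare the trace with the expected transcript. Section 3.7.1 gives the command and Section 3.7.3 the transcript.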
3.5 Data Formats / Schemas / Protocols
Appendix A.1 gives an example instruction format and opcode map.
3.6 Edge Cases
- Branch to invalid address
- Unknown opcode
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
python asm.py blink.asm > prog.hex && vvp cpu_tb
3.7.2 Golden Path Demo (Deterministic)
Run the demo command above with the provided testbench and confirm the outputs match the golden transcript.
3.7.3 CLI Transcript
PC=0000 INSTR=LOAD R0,#0xFF
PC=0002 INSTR=OUT R0
LED_PORT=FF
3.7.4 Failure Demo (Expected)
# Example failure case
ERROR: Output mismatch at vector 3
Expected: 0x0A, Got: 0x0B
EXIT CODE: 1
Notes:
- Exit code 0 indicates all tests passed
- Exit code 1 indicates a test failure
4. Solution Architecture
4.1 High-Level Design
[inputs] -> [core logic] -> [outputs]

4.2 Key Components
| Component | Responsibility |
|---|---|
| pc | Program counter with next-PC logic |
| imem | Instruction ROM |
| regfile | General-purpose registers |
| alu | Arithmetic and logic |
| control_fsm | Generates control signals |
4.3 Data Structures (No Full Code)
// Example signals (adapt to your design)
reg [7:0] state_reg;
reg [7:0] data_reg;
4.4 Algorithm Overview
Key Algorithm: Core control flow
- Initialize state/reset conditions.
- Apply inputs and compute outputs.
- Update state on clock edges (if sequential).
Complexity Analysis:
- Time: O(1) per cycle
- Space: O(N) for registers and logic
5. Implementation Guide
5.1 Development Environment Setup
iverilog -V
# -V prints the Icarus Verilog version (lowercase -v only enables verbose output)
# Ensure GTKWave is installed for waveform viewing
5.2 Project Structure
project-root/
|-- src/
|   |-- top.v
|   |-- core.v
|-- tb/
|   |-- tb.v
|-- Makefile
|-- README.md

5.3 The Core Question You’re Answering
“How does a CPU fetch, decode, and execute instructions, cycle after cycle?”
5.4 Concepts You Must Understand First
- Datapaths
- FSMs
- Memory
- ALU design
5.5 Questions to Guide Your Design
- Single-cycle or multi-cycle?
- How wide are opcodes and registers?
- How will you test correctness (golden model)?
5.6 Thinking Exercise
Draw the datapath and list control signals for LOAD and ADD.
5.7 The Interview Questions They’ll Ask
- Why separate datapath and control?
- What is a minimal ISA and why?
5.8 Hints in Layers
- Start with a 3-4 instruction ISA.
- Get fetch and PC update working first.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| CPU design | Digital Design and Computer Architecture | Ch. 6-7 |
| ISA concepts | Computer Organization and Design (RISC-V) | Ch. 2 |
5.10 Implementation Phases
Phase 1: Foundation
Goals:
- Establish core module structure
- Implement minimal behavior
Tasks:
- Scaffold module ports and internal signals
- Write a minimal testbench that compiles
Checkpoint: Simulation runs without errors
Phase 2: Core Functionality
Goals:
- Implement full logic
- Verify edge cases
Tasks:
- Complete core logic
- Add directed tests for edge cases
Checkpoint: All tests pass and waveforms match expectations
Phase 3: Polish & Edge Cases
Goals:
- Improve readability
- Document behavior
Tasks:
- Add comments and README notes
- Expand tests for unusual inputs
Checkpoint: Design is deterministic and documented
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Reset strategy | Sync / Async | Sync | Simpler timing closure |
| Test coverage | Directed / Exhaustive | Exhaustive for small logic | Prevents missed cases |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test core logic | Small vectors |
| Integration Tests | Test modules together | Full system |
| Edge Case Tests | Boundary conditions | Max/min values |
6.2 Critical Test Cases
- Test 1: Run a 3-instruction program and verify registers
- Test 2: Branch and jump correctness
6.3 Test Data
Use deterministic vectors and document expected outputs.
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Bad decode | Wrong control signals | Probe opcode and control lines |
7.2 Debugging Strategies
- Inspect waveforms at key internal signals
- Add temporary debug outputs to verify state
- Reduce testcases to the smallest failing case
7.3 Performance Traps
- Overly wide counters or long combinational paths lower the maximum clock frequency
8. Extensions & Challenges
8.1 Beginner Extensions
- Add parameterization for widths
- Add optional features (enable, reset)
8.2 Intermediate Extensions
- Add configuration registers
- Build a simple driver or demo program
8.3 Advanced Extensions
- Integrate with another project in this series
- Implement a hardware demo on FPGA
9. Real-World Connections
9.1 Industry Applications
- Digital control systems and embedded peripherals
- FPGA prototyping and validation
9.2 Related Open Source Projects
- Yosys / nextpnr toolchain for open-source FPGA flow
- Example HDL projects in the FPGA community
9.3 Interview Relevance
- Demonstrates RTL thinking and verification skills
10. Resources
10.1 Essential Reading
- Digital Design and Computer Architecture - Focus on Ch. 6-7
- Computer Organization and Design (RISC-V) - Focus on Ch. 2
10.2 Video Resources
- Search for project-specific HDL walkthroughs and waveforms
10.3 Tools & Documentation
- Icarus Verilog
- GTKWave
10.4 Related Projects in This Series
- See adjacent projects in VERILOG_FROM_ZERO_PROJECTS/
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the core concept without notes
- I can predict waveform behavior for basic inputs
11.2 Implementation
- All functional requirements are met
- All tests pass
- Edge cases are documented
11.3 Growth
- I can explain this project in an interview
- I documented at least one lesson learned
12. Submission / Completion Criteria
Minimum Viable Completion:
- Functional requirements implemented
- Testbench passes
- Waveforms inspected
Full Completion:
- All minimum criteria plus
- Edge cases covered and documented
Excellence (Going Above & Beyond):
- Hardware demo on FPGA
- Clear write-up of lessons learned
Appendix A: Deep Dive Walkthrough
A.1 ISA Contract (Example)
- Instruction format: [7:6]=opcode, [5:3]=rd, [2:0]=rs/imm
- Opcodes:
  - 00: ADD (rd = rd + rs)
  - 01: LOADI (rd = imm)
  - 10: JMP (pc = imm)
  - 11: OUT (io = rd)
A.2 Datapath Signals
- pc, pc_next
- ir (instruction register)
- regfile[0:7]
- alu_y, zero
A.3 Multi-Cycle Control (Suggested)
- FETCH: ir <= imem[pc], pc <= pc + 1
- DECODE: compute control signals
- EXEC: update regs or pc
A.4 Deterministic Test Program
0: LOADI R0, 1
1: OUT R0
2: ADD R0, R0
3: JMP 1
Expected OUT: 1, 2, 4, 8, ..., 128. The next ADD overflows 8 bits, so R0 wraps to 0 and every later OUT is 0.
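One way to tie A.1 through A.4 together is the sketch below: a three-state CPU for the example ISA, with the test program hard-coded and a testbench that checks the first eight OUT values. It is an illustration under stated assumptions, not a reference solution. The hex encodings (0x41, 0xC0, 0x00, 0x81) are derived by hand from the A.1 field layout; the 3-bit `pc`, the `io_valid` strobe, and the module names are assumptions.

```verilog
// Sketch of the A.1 ISA as a three-state multi-cycle CPU (names are assumptions).
module tiny_cpu (
    input  wire       clk,
    input  wire       rst,        // synchronous, active high
    output reg  [7:0] io_out,     // value written by OUT
    output reg        io_valid    // one-cycle pulse when io_out updates
);
    localparam FETCH = 2'd0, DECODE = 2'd1, EXEC = 2'd2;
    localparam OP_ADD = 2'b00, OP_LOADI = 2'b01, OP_JMP = 2'b10, OP_OUT = 2'b11;

    reg [1:0] state;
    reg [2:0] pc;                 // 3 bits: the JMP target is the 3-bit imm field
    reg [7:0] ir;
    reg [7:0] regfile [0:7];
    reg [7:0] imem [0:7];

    wire [1:0] opcode = ir[7:6];
    wire [2:0] rd     = ir[5:3];
    wire [2:0] rs_imm = ir[2:0];

    integer i;
    initial begin
        for (i = 0; i < 8; i = i + 1) imem[i] = 8'h00;
        imem[0] = 8'h41;   // 01_000_001  LOADI R0, 1
        imem[1] = 8'hC0;   // 11_000_000  OUT R0
        imem[2] = 8'h00;   // 00_000_000  ADD R0, R0
        imem[3] = 8'h81;   // 10_000_001  JMP 1
    end

    always @(posedge clk) begin
        io_valid <= 1'b0;                       // default; OUT overrides below
        if (rst) begin
            state <= FETCH;
            pc    <= 3'd0;
        end else begin
            case (state)
                FETCH: begin
                    ir    <= imem[pc];
                    pc    <= pc + 3'd1;
                    state <= DECODE;
                end
                DECODE: state <= EXEC;          // a larger design registers control signals here
                EXEC: begin
                    case (opcode)
                        OP_ADD:   regfile[rd] <= regfile[rd] + regfile[rs_imm];
                        OP_LOADI: regfile[rd] <= {5'b0, rs_imm};
                        OP_JMP:   pc <= rs_imm;
                        OP_OUT:   begin io_out <= regfile[rd]; io_valid <= 1'b1; end
                    endcase
                    state <= FETCH;
                end
                default: state <= FETCH;        // recover from an illegal state
            endcase
        end
    end
endmodule

module tiny_cpu_tb;
    reg        clk = 1'b0, rst = 1'b1;
    wire [7:0] io_out;
    wire       io_valid;
    reg  [7:0] expected = 8'd1;
    integer    seen = 0;

    tiny_cpu dut (.clk(clk), .rst(rst), .io_out(io_out), .io_valid(io_valid));
    always #5 clk = ~clk;
    initial #12 rst = 1'b0;                     // release reset between clock edges

    always @(posedge clk) if (io_valid) begin
        if (io_out !== expected) begin
            $display("FAIL: expected %0d, got %0d", expected, io_out);
            $finish;
        end
        expected = expected << 1;               // next power of two
        seen = seen + 1;
        if (seen == 8) begin
            $display("PASS: saw 1, 2, 4, ..., 128");
            $finish;
        end
    end
endmodule
```

The testbench stops after 128 on purpose, because the following ADD wraps R0 to 0. Extending the check past that point is a useful exercise for the overflow case.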
A.5 Debugging Tip
If the PC is wrong, probe pc, pc_next, and state in the waveform for every cycle.
13. Deep Dive Appendix
13.1 Timing and Resource Budget
- A multi-cycle CPU spreads work across cycles: fetch, decode, execute, writeback.
- The critical path is often the ALU + register writeback mux.
- If memory is synchronous, account for read latency in the control FSM.
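The last point can be sketched as a fetch loop with one extra state that waits for a registered ROM output. Only the fetch half of the FSM is shown; the state names, `imem_q`, and the placeholder ROM contents are assumptions, and a full design would continue from `FETCH_DATA` into decode and execute instead of looping.

```verilog
// Sketch only: state names, imem_q, and the ROM contents are assumptions.
module fetch_sync (
    input  wire       clk,
    input  wire       rst,      // synchronous, active high
    output reg  [7:0] ir,
    output reg  [2:0] pc
);
    localparam FETCH_ADDR = 1'b0, FETCH_DATA = 1'b1;

    reg       state;
    reg [7:0] imem [0:7];
    reg [7:0] imem_q;           // registered ROM output, valid one edge after pc settles

    integer i;
    initial for (i = 0; i < 8; i = i + 1) imem[i] = 8'hA0 + i;  // placeholder contents

    // Synchronous read port, the style that block RAM inference usually expects.
    always @(posedge clk) imem_q <= imem[pc];

    always @(posedge clk) begin
        if (rst) begin
            state <= FETCH_ADDR;
            pc    <= 3'd0;
        end else begin
            case (state)
                // pc is stable here; the ROM captures imem[pc] on this edge.
                FETCH_ADDR: state <= FETCH_DATA;
                // imem_q now holds the word for the address presented last cycle.
                FETCH_DATA: begin
                    ir    <= imem_q;
                    pc    <= pc + 3'd1;
                    state <= FETCH_ADDR;
                end
            endcase
        end
    end
endmodule
```

The extra state costs one cycle per instruction. Incrementing the PC only in `FETCH_DATA` keeps the address stable while the ROM output register is loading.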
13.2 Waveform Interpretation Guide
- Track pc, instr, state, regfile_we, and alu_result together.
- Confirm that writes occur only in the correct state.
- Use a trace log in the testbench to compare expected vs actual execution.
Example trace:
PC=0000 FETCH
PC=0000 DECODE instr=0xA1
PC=0000 EXEC ALU=0x05
PC=0000 WB R1=0x05
13.3 Hardware Bring-Up Notes
- Start with a tiny ROM program (blink LED) to validate fetch/decode.
- Provide a single-step clock for slow, observable execution.
- Expose debug outputs (PC, state) on LEDs or a UART debug port.
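For the single-step clock, one common approach is to keep the CPU on the real clock and gate its state updates with a one-cycle enable derived from a push button. The sketch below assumes the button is already debounced; `step_pulse`, `step_btn`, and `step_en` are names invented for illustration.

```verilog
// Hypothetical single-step helper: one clk-wide pulse per button press.
module step_pulse (
    input  wire clk,
    input  wire step_btn,   // assumed already debounced, asynchronous to clk
    output wire step_en     // high for exactly one clk cycle per rising edge
);
    reg [1:0] sync = 2'b00; // two-flop synchronizer for the async button
    reg       prev = 1'b0;  // previous synchronized value

    always @(posedge clk) begin
        sync <= {sync[0], step_btn};
        prev <= sync[1];
    end

    // Rising-edge detect on the synchronized signal.
    assign step_en = sync[1] & ~prev;
endmodule
```

In the CPU, wrapping the FSM's clocked block in `if (step_en)` would then advance one state per press, which avoids generating a second clock from logic.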
13.4 Alternate Implementations and Trade-offs
- Hardwired control: faster, simpler for small ISAs.
- Microcoded control: easier to extend, slower but flexible.
- Single-cycle CPU: simpler control, but slower max clock.
13.5 Additional Exercises
- Add interrupt support with a vector table.
- Add stack and CALL/RET instructions.
- Add memory-mapped IO and a simple UART peripheral.