Project 5: 8-bit ALU (Arithmetic Logic Unit)
Create an ALU with arithmetic, logic, and flag outputs.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1 week |
| Main Programming Language | Verilog |
| Alternative Programming Languages | VHDL, SystemVerilog |
| Coolness Level | High |
| Business Potential | Medium |
| Prerequisites | Adders, Two’s complement, Muxes |
| Key Topics | Opcode decoding, Flags, Datapath |
1. Learning Objectives
- Design an opcode table for ALU operations
- Generate flags (Z, C, N)
- Write a testbench with edge cases
2. All Theory Needed (Per-Concept Breakdown)
Adders and Two’s Complement Arithmetic
Description/Expanded Explanation of the concept
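Adders are the core of arithmetic hardware. Two’s complement lets the same adder perform subtraction: a - b is computed as a + ~b + 1, that is, invert B and add one. In a ripple-carry adder each bit must wait for the carry from the bit below it, so carry propagation sets the adder delay and is usually the main timing cost.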
Definitions & Key Terms
- Full adder -> adds a, b, and carry-in
- Two’s complement -> negate by invert + 1
- Carry propagation -> carry ripples through bit positions
Mental Model Diagram (ASCII)
[A0+B0] -> c1 -> [A1+B1] -> c2 -> [A2+B2] -> c3

How It Works (Step-by-Step)
- Build a 1-bit full adder.
- Chain full adders for N-bit addition.
- For subtraction, invert B and add 1.
- Verify carry and overflow behavior.
Minimal Concrete Example
assign {c_out, sum} = a + b + c_in;
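The steps in "How It Works" can also be sketched structurally. This is a minimal sketch rather than a reference design: the module names `full_adder`, `addsub8`, and `addsub8_tb`, the `sub` control input, and the overflow output `v` are assumptions introduced here for illustration. It chains eight full adders and reuses them for subtraction by XOR-ing B with `sub` and feeding `sub` in as the first carry.

```verilog
// 1-bit full adder: sum and carry-out from a, b, and carry-in.
module full_adder (
    input  wire a, b, c_in,
    output wire sum, c_out
);
    assign sum   = a ^ b ^ c_in;
    assign c_out = (a & b) | (c_in & (a ^ b));
endmodule

// 8-bit ripple-carry adder/subtractor (names are illustrative).
// sub = 0: y = a + b;  sub = 1: y = a + ~b + 1 = a - b.
module addsub8 (
    input  wire [7:0] a, b,
    input  wire       sub,
    output wire [7:0] y,
    output wire       c_out,  // raw carry-out of the top full adder
    output wire       v       // signed overflow
);
    wire [7:0] b_eff = b ^ {8{sub}};  // invert B when subtracting
    wire [8:0] c;                     // c[i] is the carry into bit i
    assign c[0] = sub;                // the "+1" of two's complement

    genvar i;
    generate
        for (i = 0; i < 8; i = i + 1) begin : g_fa
            full_adder fa (.a(a[i]), .b(b_eff[i]), .c_in(c[i]),
                           .sum(y[i]), .c_out(c[i+1]));
        end
    endgenerate

    assign c_out = c[8];
    assign v     = c[8] ^ c[7];       // carry into the MSB differs from carry out
endmodule

// Small driver that prints a few results for inspection.
module addsub8_tb;
    reg  [7:0] a, b;
    reg        sub;
    wire [7:0] y;
    wire       c_out, v;

    addsub8 dut (.a(a), .b(b), .sub(sub), .y(y), .c_out(c_out), .v(v));

    initial begin
        a = 8'h12; b = 8'h34; sub = 0; #1;
        $display("0x12 + 0x34 = 0x%h c=%b v=%b", y, c_out, v);
        a = 8'h7F; b = 8'h01; sub = 0; #1;
        $display("0x7F + 0x01 = 0x%h c=%b v=%b", y, c_out, v);
        a = 8'h00; b = 8'h01; sub = 1; #1;
        $display("0x00 - 0x01 = 0x%h c=%b v=%b", y, c_out, v);
        $finish;
    end
endmodule
```

Assuming the block is saved as `addsub8.v` (a hypothetical file name), it can be compiled with `iverilog -o addsub8_tb addsub8.v` and run with `vvp addsub8_tb`. Note that `c_out` here is the raw adder carry, so for subtraction it is 1 when no borrow occurs.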
Common Misconceptions
- “Carry and overflow are the same.” -> They differ for signed numbers.
- “Ripple carry is always fine.” -> It can be too slow for wide operands or high clock rates.
Check-Your-Understanding Questions
- How do you compute a - b using an adder?
- Why is carry a timing bottleneck?
- What is the difference between carry and overflow?
Where You’ll Apply It
- This project: used in Section 3.2 and Section 4 (architecture)
- Also used in: P19-calculator-with-7-segment-display.md
Datapaths and Flag Generation
Description/Expanded Explanation of the concept
A datapath is the collection of registers, muxes, and an ALU that moves and transforms data. Flags (zero, carry, negative) are simple but essential signals that let control logic make decisions.
Definitions & Key Terms
- Datapath -> the data-processing portion of a design
- Control -> logic that selects operations and routes data
- Flags -> status bits derived from results
Mental Model Diagram (ASCII)
[Regs] -> [ALU] -> [Regs]
           |  |
           Z  C flags

How It Works (Step-by-Step)
- Registers feed operands into the ALU.
- Control selects the ALU operation.
- Result and flags are computed in the same cycle.
- Flags are stored or forwarded for decision-making.
Minimal Concrete Example
assign z = (y == 0);
assign n = y[7];
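To show how the two assigns above sit next to a carry flag, here is a small sketch of a flag generator. The module name `flag_gen` matches the component table in Section 4.2, but its ports (`sum9`, `y`, `z`, `n`, `c`) are assumptions made for this example, as is the choice to feed it a 9-bit sum.

```verilog
// Flag generator sketch: derives Z, N, C from a 9-bit add result.
// Port names are illustrative; adapt them to your ALU.
module flag_gen (
    input  wire [8:0] sum9,  // {carry_out, result}
    output wire [7:0] y,
    output wire       z, n, c
);
    assign y = sum9[7:0];
    assign z = (y == 8'h00);  // zero: all result bits are 0
    assign n = y[7];          // negative: MSB of the result
    assign c = sum9[8];       // carry: ninth bit of the sum
endmodule

module flag_gen_tb;
    reg  [7:0] a, b;
    wire [8:0] sum9 = {1'b0, a} + {1'b0, b};  // widen so the carry is kept
    wire [7:0] y;
    wire       z, n, c;

    flag_gen dut (.sum9(sum9), .y(y), .z(z), .n(n), .c(c));

    initial begin
        a = 8'hFF; b = 8'h01; #1;
        $display("FF + 01 -> y=%h Z=%b C=%b N=%b", y, z, c, n);
        a = 8'h7F; b = 8'h01; #1;
        $display("7F + 01 -> y=%h Z=%b C=%b N=%b", y, z, c, n);
        $finish;
    end
endmodule
```

Keeping the flags as pure functions of the result makes them easy to check on their own, independent of which operation produced `y`.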
Common Misconceptions
- “Flags are only for CPUs.” -> Many control systems use them too.
- “Flags don’t need testing.” -> They are part of the contract.
Check-Your-Understanding Questions
- When should the zero flag assert?
- How do you compute a negative flag for unsigned data?
- Why separate datapath and control?
Where You’ll Apply It
- This project: used in Section 3.2 and Section 4
- Also used in: P19-calculator-with-7-segment-display.md, Final CPU
Verification with Testbenches and Waveforms
Description/Expanded Explanation of the concept
Testbenches are simulation-only modules that apply stimulus and check outputs. Waveforms (VCD) are the hardware engineer’s microscope; they reveal timing, glitches, and ordering problems. A good testbench is deterministic and covers edge cases.
Definitions & Key Terms
- Testbench -> a non-synthesizable module that drives a DUT
- VCD -> Value Change Dump waveform file
- Deterministic test -> same inputs produce same outputs every run
Mental Model Diagram (ASCII)
[Testbench] -> [DUT] -> [VCD] -> [GTKWave]

How It Works (Step-by-Step)
- Initialize inputs to known values.
- Apply stimulus over time.
- Dump waveforms and check outputs.
- Add assertions or PASS/FAIL messages.
Minimal Concrete Example
initial begin
$dumpfile("wave.vcd");
$dumpvars(0, tb);
a = 0; b = 1; #10;
$finish;
end
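The snippet above only applies stimulus. A self-checking variant might look like the sketch below, which compares each output against an expected value and prints PASS or FAIL. The DUT `adder8`, the task name `check`, and the chosen vectors are assumptions for illustration; swap in your ALU and its expected results.

```verilog
`timescale 1ns/1ps

// Tiny DUT so the testbench is self-contained; replace with your ALU.
module adder8 (
    input  wire [7:0] a, b,
    output wire [7:0] sum,
    output wire       c_out
);
    assign {c_out, sum} = a + b;
endmodule

module tb;
    reg  [7:0] a, b;
    wire [7:0] sum;
    wire       c_out;
    integer    errors;

    adder8 dut (.a(a), .b(b), .sum(sum), .c_out(c_out));

    // Apply one vector, let outputs settle, compare against the expected value.
    task check(input [7:0] ta, input [7:0] tb_in, input [8:0] expected);
        begin
            a = ta; b = tb_in; #10;
            if ({c_out, sum} !== expected) begin
                errors = errors + 1;
                $display("FAIL: %h + %h -> %h (expected %h)",
                         ta, tb_in, {c_out, sum}, expected);
            end
        end
    endtask

    initial begin
        $dumpfile("wave.vcd");
        $dumpvars(0, tb);
        errors = 0;
        check(8'h00, 8'h01, 9'h001);
        check(8'hFF, 8'h01, 9'h100);  // wraps to 00 with carry
        check(8'h12, 8'h34, 9'h046);
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

The `!==` comparison is used so that an unknown (`x`) output counts as a mismatch instead of silently passing.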
Common Misconceptions
- “If it simulates once, it’s correct.” -> Cover all relevant cases.
- “Waveforms are optional.” -> They are often the only way to debug timing.
Check-Your-Understanding Questions
- Why keep testbench and DUT separate?
- What is the purpose of $dumpvars?
- How do you make a testbench deterministic?
Where You’ll Apply It
- This project: used throughout Section 6 (testing)
- Also used in: all other projects in this folder
3. Project Specification
3.1 What You Will Build
An 8-bit ALU supporting add/sub/and/or/xor/shift and flags.
3.2 Functional Requirements
- Requirement 1: Support at least 6 ALU operations
- Requirement 2: Generate zero, carry, and negative flags
- Requirement 3: Provide deterministic simulation outputs
3.3 Non-Functional Requirements
- Performance: Stable operation at the target clock and interfaces.
- Reliability: Deterministic outputs on all defined inputs.
- Usability: Clear ports and documented behavior.
3.4 Example Usage / Output
See the CLI transcript in Section 3.7.3 for representative output.
3.5 Data Formats / Schemas / Protocols
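Inputs: a[7:0], b[7:0], op[2:0]. Outputs: y[7:0] plus the flags z, n, c, v. See Appendix A.1 for the signal map and the suggested opcode encoding.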
3.6 Edge Cases
- 0x00 + 0x00
- 0xFF + 0x01
- 0x00 - 0x01
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
iverilog -o alu_tb src/top.v src/core.v tb/tb.v  # paths follow Section 5.2; adjust to your layout
vvp alu_tb
3.7.2 Golden Path Demo (Deterministic)
Run the demo command above with the provided testbench and confirm the outputs match the golden transcript.
3.7.3 CLI Transcript
OP=ADD 0x12 + 0x34 = 0x46 Z=0 C=0 N=0
3.7.4 Failure Demo (Expected)
# Example failure case
ERROR: Output mismatch at vector 3
Expected: 0x0A, Got: 0x0B
EXIT CODE: 1
Notes:
- Exit code 0 indicates all tests passed
- Exit code 1 indicates a test failure
4. Solution Architecture
4.1 High-Level Design
[inputs] -> [core logic] -> [outputs]

4.2 Key Components
| Component | Responsibility |
|---|---|
| alu_core | Arithmetic and logic operations |
| flag_gen | Zero/carry/negative flag logic |
4.3 Data Structures (No Full Code)
// Example signals (adapt to your design)
wire [7:0] a, b;  // operands
wire [2:0] op;    // opcode
reg  [7:0] y;     // result selected by the opcode
4.4 Algorithm Overview
Key Algorithm: Core control flow
- Initialize state/reset conditions.
- Apply inputs and compute outputs.
- Update state on clock edges (if sequential).
Complexity Analysis:
- Time: O(1) per cycle
- Space: O(N) for registers and logic
5. Implementation Guide
5.1 Development Environment Setup
iverilog -V
# Ensure GTKWave is installed for waveform viewing
5.2 Project Structure
project-root/
|-- src/
|   |-- top.v
|   |-- core.v
|-- tb/
|   |-- tb.v
|-- Makefile
|-- README.md

5.3 The Core Question You’re Answering
“How do real CPUs choose arithmetic operations and generate flags?”
5.4 Concepts You Must Understand First
- Adders
- Two’s complement
- Muxes
5.5 Questions to Guide Your Design
- How many opcode bits do you need?
- Will shifts be logical or arithmetic?
5.6 Thinking Exercise
Define a 3-bit opcode table for 8 operations.
5.7 The Interview Questions They’ll Ask
- What is the zero flag?
- Carry vs overflow: difference?
- How do you implement subtraction?
5.8 Hints in Layers
- Implement add/sub first, then bitwise ops.
- Use a case statement to select output.
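The hints above can be sketched as a single combinational module. This is one possible shape, not the expected solution: the opcode encodings follow the suggested map in Appendix A.1, while the module name `alu8`, the one-bit shift amount, the shifted-out bit going to `c`, and clearing `c` for logic operations are all assumptions made here.

```verilog
// Combinational 8-bit ALU sketch with a case-based output select.
module alu8 (
    input  wire [7:0] a, b,
    input  wire [2:0] op,
    output reg  [7:0] y,
    output reg        c,
    output wire       z, n
);
    localparam OP_ADD = 3'b000, OP_SUB = 3'b001, OP_AND = 3'b010,
               OP_OR  = 3'b011, OP_XOR = 3'b100, OP_SLT = 3'b101,
               OP_SHL = 3'b110, OP_SHR = 3'b111;

    always @* begin
        c = 1'b0;  // assumption: operations without a carry clear C
        case (op)
            OP_ADD: {c, y} = a + b;
            // Concatenations keep ~b at 8 bits, so c is the raw carry-out.
            OP_SUB: {c, y} = {1'b0, a} + {1'b0, ~b} + 9'd1;
            OP_AND: y = a & b;
            OP_OR:  y = a | b;
            OP_XOR: y = a ^ b;
            OP_SLT: y = ($signed(a) < $signed(b)) ? 8'd1 : 8'd0;
            OP_SHL: {c, y} = {a, 1'b0};  // shift left by 1, MSB goes to c
            OP_SHR: {y, c} = {1'b0, a};  // shift right by 1, LSB goes to c
            default: y = 8'h00;
        endcase
    end

    assign z = (y == 8'h00);
    assign n = y[7];
endmodule

// Sweeps all eight opcodes for one operand pair and prints the results.
module alu8_tb;
    reg  [7:0] a, b;
    reg  [2:0] op;
    wire [7:0] y;
    wire       c, z, n;
    integer    k;

    alu8 dut (.a(a), .b(b), .op(op), .y(y), .c(c), .z(z), .n(n));

    initial begin
        a = 8'hF0; b = 8'h30;
        for (k = 0; k < 8; k = k + 1) begin
            op = k[2:0]; #1;
            $display("op=%b a=%h b=%h -> y=%h Z=%b C=%b N=%b",
                     op, a, b, y, z, c, n);
        end
        $finish;
    end
endmodule
```

Assigning a default to `c` at the top of the block and writing `y` in every branch keeps the logic purely combinational and avoids inferred latches.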
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| ALU design | Digital Design and Computer Architecture | Ch. 5 |
| Two’s complement | Code (Petzold) | Ch. 12 |
5.10 Implementation Phases
Phase 1: Foundation
Goals:
- Establish core module structure
- Implement minimal behavior
Tasks:
- Scaffold module ports and internal signals
- Write a minimal testbench that compiles
Checkpoint: Simulation runs without errors
Phase 2: Core Functionality
Goals:
- Implement full logic
- Verify edge cases
Tasks:
- Complete core logic
- Add directed tests for edge cases
Checkpoint: All tests pass and waveforms match expectations
Phase 3: Polish & Edge Cases
Goals:
- Improve readability
- Document behavior
Tasks:
- Add comments and README notes
- Expand tests for unusual inputs
Checkpoint: Design is deterministic and documented
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Reset strategy | Sync / Async | Sync | Simpler timing closure |
| Test coverage | Directed / Exhaustive | Exhaustive for small logic | Prevents missed cases |
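For two 8-bit operands, exhaustive coverage of one operation is only 65,536 vectors, which simulates quickly. The sketch below shows the idea for ADD. The DUT `adder8` and the testbench name `exhaustive_tb` are placeholders chosen for this example; replace the DUT and the expected-value expression with those of your ALU.

```verilog
// Placeholder DUT; replace with your ALU.
module adder8 (
    input  wire [7:0] a, b,
    output wire [7:0] sum,
    output wire       c_out
);
    assign {c_out, sum} = a + b;
endmodule

module exhaustive_tb;
    reg  [7:0] a, b;
    wire [7:0] sum;
    wire       c_out;
    reg  [8:0] expected;
    integer    i, j, errors;

    adder8 dut (.a(a), .b(b), .sum(sum), .c_out(c_out));

    initial begin
        errors = 0;
        for (i = 0; i < 256; i = i + 1) begin
            for (j = 0; j < 256; j = j + 1) begin
                a = i[7:0];
                b = j[7:0];
                expected = i + j;  // reference value; at most 510, fits in 9 bits
                #1;
                if ({c_out, sum} !== expected) begin
                    errors = errors + 1;
                    if (errors <= 10)  // cap the log output
                        $display("FAIL: %h + %h -> %h (expected %h)",
                                 a, b, {c_out, sum}, expected);
                end
            end
        end
        if (errors == 0) $display("PASS: all 65536 vectors");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```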
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test core logic | Small vectors |
| Integration Tests | Test modules together | Full system |
| Edge Case Tests | Boundary conditions | Max/min values |
6.2 Critical Test Cases
- Test 1: Edge cases for overflow and underflow
- Test 2: Random vectors with expected results
6.3 Test Data
Use deterministic vectors and document expected outputs.
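Random vectors stay deterministic when the seed is fixed. A minimal sketch, assuming a placeholder DUT `adder8`, a seed value of 42, and 100 vectors (all arbitrary choices made for this example), could look like this:

```verilog
// Placeholder DUT; replace with your ALU.
module adder8 (
    input  wire [7:0] a, b,
    output wire [7:0] sum,
    output wire       c_out
);
    assign {c_out, sum} = a + b;
endmodule

module random_tb;
    reg  [7:0] a, b;
    wire [7:0] sum;
    wire       c_out;
    reg  [8:0] expected;
    integer    seed, k, errors;

    adder8 dut (.a(a), .b(b), .sum(sum), .c_out(c_out));

    initial begin
        seed   = 42;  // fixed seed: the same sequence on every run
        errors = 0;
        for (k = 0; k < 100; k = k + 1) begin
            a = $random(seed);  // low 8 bits of the 32-bit value are kept
            b = $random(seed);
            expected = {1'b0, a} + {1'b0, b};
            #1;
            if ({c_out, sum} !== expected) begin
                errors = errors + 1;
                $display("FAIL: %h + %h -> %h (expected %h)",
                         a, b, {c_out, sum}, expected);
            end
        end
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Printing the operands on failure makes it straightforward to turn a failing random vector into a permanent directed test.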
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong carry flag | Flag incorrect on subtraction | Define flag behavior and test |
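The carry pitfall usually comes from two valid but opposite conventions for subtraction. The sketch below computes both from the same adder so they can be compared in simulation. The signal names `raw`, `carry`, and `borrow` are illustrative, and which convention your ALU uses is a design decision to document.

```verilog
// Compares the two common carry-flag conventions for a - b.
module sub_flags_tb;
    reg  [7:0] a, b;
    // a + ~b + 1, widened to 9 bits so the carry-out is visible.
    wire [8:0] raw    = {1'b0, a} + {1'b0, ~b} + 9'd1;
    wire       carry  = raw[8];   // "no borrow" convention: 1 when a >= b (unsigned)
    wire       borrow = ~raw[8];  // "borrow" convention:    1 when a <  b (unsigned)

    initial begin
        a = 8'h00; b = 8'h01; #1;
        $display("%h - %h = %h carry=%b borrow=%b", a, b, raw[7:0], carry, borrow);
        a = 8'h80; b = 8'h01; #1;
        $display("%h - %h = %h carry=%b borrow=%b", a, b, raw[7:0], carry, borrow);
        a = 8'h05; b = 8'h05; #1;
        $display("%h - %h = %h carry=%b borrow=%b", a, b, raw[7:0], carry, borrow);
        $finish;
    end
endmodule
```

Whichever convention you pick, state it next to the opcode table and assert it in the testbench, since downstream compare logic depends on it.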
7.2 Debugging Strategies
- Inspect waveforms at key internal signals
- Add temporary debug outputs to verify state
- Reduce testcases to the smallest failing case
7.3 Performance Traps
- Overly wide counters or combinational paths can reduce max clock
8. Extensions & Challenges
8.1 Beginner Extensions
- Add parameterization for widths
- Add optional features (enable, reset)
8.2 Intermediate Extensions
- Add configuration registers
- Build a simple driver or demo program
8.3 Advanced Extensions
- Integrate with another project in this series
- Implement a hardware demo on FPGA
9. Real-World Connections
9.1 Industry Applications
- Digital control systems and embedded peripherals
- FPGA prototyping and validation
9.2 Related Open Source Projects
- Yosys / nextpnr toolchain for open-source FPGA flow
- Example HDL projects in the FPGA community
9.3 Interview Relevance
- Demonstrates RTL thinking and verification skills
10. Resources
10.1 Essential Reading
- Digital Design and Computer Architecture - Focus on Ch. 5
- Code (Petzold) - Focus on Ch. 12
10.2 Video Resources
- Search for project-specific HDL walkthroughs and waveforms
10.3 Tools & Documentation
- Icarus Verilog
- GTKWave
10.4 Related Projects in This Series
- See adjacent projects in VERILOG_FROM_ZERO_PROJECTS/
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the core concept without notes
- I can predict waveform behavior for basic inputs
11.2 Implementation
- All functional requirements are met
- All tests pass
- Edge cases are documented
11.3 Growth
- I can explain this project in an interview
- I documented at least one lesson learned
12. Submission / Completion Criteria
Minimum Viable Completion:
- Functional requirements implemented
- Testbench passes
- Waveforms inspected
Full Completion:
- All minimum criteria plus
- Edge cases covered and documented
Excellence (Going Above & Beyond):
- Hardware demo on FPGA
- Clear write-up of lessons learned
Appendix A: Deep Dive Walkthrough
A.1 Signal Map and Opcode Contract
- Inputs: a[7:0], b[7:0], op[2:0]
- Outputs: y[7:0], z (zero), n (negative), c (carry), v (overflow)
Suggested opcode map (document in your RTL and tests):
- 000: ADD
- 001: SUB
- 010: AND
- 011: OR
- 100: XOR
- 101: SLT (signed)
- 110: SHL (logical left)
- 111: SHR (logical right)
A.2 Flag Definitions
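- Z: y == 0
- N: y[7] == 1 (signed negative)
- C: carry-out from add/sub. For SUB, state explicitly whether C is the raw carry-out of a + ~b + 1 or its inverse (borrow); the reference vectors in A.3 use the raw carry-out, so C=0 means a borrow occurred.
- V: signed overflow for add/sub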
A.3 Reference Vectors (ADD/SUB)
| op | a | b | y | c | v | z |
|---|---|---|---|---|---|---|
| ADD | 7F | 01 | 80 | 0 | 1 | 0 |
| ADD | FF | 01 | 00 | 1 | 0 | 1 |
| SUB | 00 | 01 | FF | 0 | 0 | 0 |
| SUB | 80 | 01 | 7F | 1 | 1 | 0 |
A.4 Dataflow Sketch
+-------------------+
A ------>| |----> Y
B ------>| ALU core + flags |
OP ----->| |
+-------------------+

A.5 Verification Checklist
- Verify each opcode with at least 3 vectors (normal, boundary, extreme).
- Check flags independently of y (especially V and C).
- Confirm shift operations fill with zeros (logical).
13. Deep Dive Appendix
13.1 Timing and Resource Budget
- The ALU critical path typically goes through op decode -> mux -> adder/logic -> flags.
- For FPGA, add/sub uses the carry chain; logic ops map to LUTs. The mux depth often sets the clock limit.
- If you plan to clock the ALU, reserve timing slack for the longest opcode (usually ADD/SUB).
13.2 Waveform Interpretation Guide
- Sweep opcode and verify that result and flags (Z, C, N, V) match the expected math.
- Check flag timing: flags should update in the same cycle as the result.
Examples:
A=0xF0, B=0x30, ADD (unsigned) -> R=0x20, C=1
A=0x80, B=0x80, ADD (signed) -> R=0x00, V=1
13.3 Hardware Bring-Up Notes
- Use DIP switches for A and B, another bank for opcode.
- Display result on LEDs or a 7-seg bank via a simple hex display module.
- If your board has few switches, build a small input latch controlled by a button.
13.4 Alternate Implementations and Trade-offs
- Unified adder: implement ADD/SUB by conditionally inverting B and adding carry-in.
- Dedicated logic block: keep AND/OR/XOR in a separate module and mux at the end.
- Pipelined ALU: add a register stage after the ALU for higher clock rates.
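As an illustration of the pipelined option, the sketch below registers the ALU result and flags on the clock edge. The module name `alu_reg_stage`, the `_comb`/`_q` signal naming, and the synchronous active-high reset are assumptions for this example; the combinational ALU itself is left out and represented by the `_comb` inputs.

```verilog
// Register stage placed after a combinational ALU (names are illustrative).
module alu_reg_stage (
    input  wire       clk, rst,
    input  wire [7:0] y_comb,
    input  wire       z_comb, c_comb, n_comb,
    output reg  [7:0] y_q,
    output reg        z_q, c_q, n_q
);
    always @(posedge clk) begin
        if (rst) begin           // synchronous reset
            y_q <= 8'h00;
            z_q <= 1'b0;
            c_q <= 1'b0;
            n_q <= 1'b0;
        end else begin           // capture result and flags together
            y_q <= y_comb;
            z_q <= z_comb;
            c_q <= c_comb;
            n_q <= n_comb;
        end
    end
endmodule

module alu_reg_stage_tb;
    reg        clk = 0, rst = 1;
    reg  [7:0] y_comb = 8'h46;
    reg        z_comb = 0, c_comb = 0, n_comb = 0;
    wire [7:0] y_q;
    wire       z_q, c_q, n_q;

    alu_reg_stage dut (.clk(clk), .rst(rst), .y_comb(y_comb),
                       .z_comb(z_comb), .c_comb(c_comb), .n_comb(n_comb),
                       .y_q(y_q), .z_q(z_q), .c_q(c_q), .n_q(n_q));

    always #5 clk = ~clk;

    initial begin
        @(posedge clk); #1 rst = 0;  // release reset after the first edge
        @(posedge clk); #1;          // next edge captures the ALU outputs
        $display("y_q=%h z=%b c=%b n=%b", y_q, z_q, c_q, n_q);
        $finish;
    end
endmodule
```

Registering result and flags in the same stage keeps them aligned, at the cost of one cycle of latency.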
13.5 Additional Exercises
- Add shift and rotate operations with a barrel shifter.
- Implement compare operations (EQ, LT, GE) and expose a compare flag bus.
- Add saturating arithmetic for signal-processing style behavior.