Project 3: Stack Machine

Build a stack-based calculator/VM that parses expressions and executes bytecode instructions.

Quick Reference

Attribute Value
Difficulty Intermediate
Time Estimate 1 week
Language C (Alternatives: Rust, Python)
Prerequisites Projects 1-2, basic parsing
Key Topics LIFO semantics, expression evaluation, VM design

1. Learning Objectives

By completing this project, you will:

  1. Implement a stack data structure with push/pop/peek.
  2. Parse infix expressions and convert to postfix (RPN).
  3. Build a simple bytecode VM that executes stack ops.
  4. Explain how stacks support function calls and recursion.

2. Theoretical Foundation

2.1 Core Concepts

  • LIFO discipline: Last-in, first-out order models nested computation.
  • RPN evaluation: Postfix notation eliminates precedence rules during execution.
  • Call stacks: Function calls push frames; returns pop them.
  • Virtual machines: Instruction loops interpret bytecode.

2.2 Why This Matters

Stacks are everywhere: compilers, interpreters, parsing, and backtracking. Understanding a stack machine explains how many languages actually execute code.

2.3 Historical Context / Background

Stack machines power early calculators (HP RPN) and virtual machines like the JVM and WebAssembly (stack-based execution models).

2.4 Common Misconceptions

  • “Stack = call stack only”: Many systems use explicit stacks for algorithms.
  • “RPN is obscure”: It is an execution-friendly form that simplifies evaluation.

3. Project Specification

3.1 What You Will Build

  • A stack library (array-based or linked).
  • A parser that converts infix to postfix.
  • A bytecode VM that executes stack instructions.

3.2 Functional Requirements

  1. Parse arithmetic expressions with +, -, *, /, parentheses.
  2. Convert to postfix using the shunting-yard algorithm.
  3. Execute postfix or bytecode with a stack.
  4. Provide a REPL that accepts expressions.

3.3 Non-Functional Requirements

  • Performance: Expressions of 1,000 tokens evaluate fast.
  • Reliability: Detect division by zero and malformed input.
  • Usability: Errors include token index and reason.

3.4 Example Usage / Output

$ ./stack_vm
> (2 + 3) * 4
Postfix: 2 3 + 4 *
Result: 20

3.5 Real World Outcome

The console shows your expression, prints its postfix form, then steps through stack operations and returns a final result. With a “trace” flag, it prints each instruction and the current stack snapshot.


4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌────────────────┐     ┌──────────────┐
│ Parser       │────▶│ Postfix/Bytecode│────▶│ Stack VM      │
└──────────────┘     └────────────────┘     └──────────────┘

4.2 Key Components

Component Responsibility Key Decisions
Tokenizer Break input into tokens Handle numbers and operators
Parser Shunting-yard to postfix Operator precedence table
VM Executes stack ops Iterative loop vs recursion

4.3 Data Structures

typedef struct {
    int* data;
    size_t size;
    size_t capacity;
} Stack;

typedef enum {
    OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_DIV
} OpCode;

4.4 Algorithm Overview

Key Algorithm: Shunting-Yard

  1. Read tokens left to right.
  2. Push operators to a stack based on precedence.
  3. Emit operands immediately.
  4. Pop remaining operators at the end.

Complexity Analysis:

  • Time: O(n) tokens
  • Space: O(n) operators

5. Implementation Guide

5.1 Development Environment Setup

clang -Wall -Wextra -g -o stack_vm src/*.c

5.2 Project Structure

project-root/
├── src/
│   ├── stack.c
│   ├── parser.c
│   ├── vm.c
│   └── main.c
├── tests/
│   └── test_parser.c
└── Makefile

5.3 The Core Question You’re Answering

“How does a simple stack turn text expressions into actual computation?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Operator precedence and associativity
  2. Tokenization
  3. Stack API invariants

5.5 Questions to Guide Your Design

  1. How will you represent numbers in the bytecode?
  2. Will you support unary minus?
  3. How will you report parse errors with location?

5.6 Thinking Exercise

Manually convert 3 + 4 * 2 / (1 - 5) to postfix and evaluate it step by step.

5.7 The Interview Questions They’ll Ask

  1. “How does the shunting-yard algorithm work?”
  2. “Why is postfix easier to evaluate than infix?”
  3. “How does a call stack relate to a stack data structure?”

5.8 Hints in Layers

Hint 1: Start with postfix evaluation only Once postfix works, add infix parsing.

Hint 2: Add tracing Print the stack after each opcode for debugging.

5.9 Books That Will Help

Topic Book Chapter
Stacks CLRS Ch. 10
Parsing Crafting Interpreters Ch. 5-7

5.10 Implementation Phases

Phase 1: Foundation (2 days)

Goals:

  • Stack implementation
  • Postfix evaluator

Tasks:

  1. Implement push/pop/peek.
  2. Evaluate postfix arrays.

Checkpoint: 2 3 + returns 5.

Phase 2: Core Functionality (3 days)

Goals:

  • Infix parser
  • Operator precedence

Tasks:

  1. Implement tokenizer.
  2. Implement shunting-yard.

Checkpoint: (2+3)*4 yields postfix 2 3 + 4 *.

Phase 3: Polish & Edge Cases (2 days)

Goals:

  • Error handling and tracing
  • REPL

Tasks:

  1. Add error messages with token index.
  2. Add --trace output.

Checkpoint: Invalid expressions show clear errors.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Stack backing Array, linked list Array Cache-friendly, simple
VM design Direct eval, bytecode Bytecode Extensible for new ops

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Stack and tokenizer push/pop, tokens
Integration Tests Parser + VM expression cases
Edge Case Tests Malformed input missing paren

6.2 Critical Test Cases

  1. Operator precedence: 2 + 3 * 4 -> 14.
  2. Parentheses: (2 + 3) * 4 -> 20.
  3. Divide by zero detection.

6.3 Test Data

Expressions: 1+2, 3*(4-5), 10/2+7

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Precedence errors Wrong result Check operator table
Stack underflow Crash during eval Validate before pop
Tokenization bugs Missing numbers Properly parse multi-digit

7.2 Debugging Strategies

  • Print token stream and postfix output.
  • Run with --trace to see stack changes.

7.3 Performance Traps

  • Rebuilding tokens per character can be slow; use a single pass.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add exponentiation operator.
  • Add variables with a simple map.

8.2 Intermediate Extensions

  • Implement function calls like sin(3).
  • Add bytecode disassembler.

8.3 Advanced Extensions

  • Add control flow (if, while) to the VM.
  • Implement a stack-frame model for functions.

9. Real-World Connections

9.1 Industry Applications

  • Compilers: parse expressions into stacks.
  • Scripting engines: bytecode VMs.
  • Lua: stack-based VM execution.
  • Wasm: stack-machine semantics.

9.3 Interview Relevance

  • Expression evaluation problems are common in interviews.

10. Resources

10.1 Essential Reading

  • Crafting Interpreters by Bob Nystrom - parsing basics
  • CLRS - stacks and queues

10.2 Video Resources

  • “Shunting-Yard Algorithm” - compilers lectures

10.3 Tools & Documentation

  • UBSan: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

11. Self-Assessment Checklist

11.1 Understanding

  • I can convert infix to postfix.
  • I can explain how stack evaluation works.
  • I can describe call stack similarities.

11.2 Implementation

  • REPL works with clear errors.
  • Tests pass for precedence and parentheses.

11.3 Growth

  • I can add a new opcode without breaking the VM.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Stack implementation and postfix evaluator.
  • Infix parser with precedence.

Full Completion:

  • Bytecode VM and REPL.

Excellence (Going Above & Beyond):

  • Variables, functions, and control flow.

This guide was generated from DATA_STRUCTURES_FROM_FIRST_PRINCIPLES.md. For the complete learning path, see the parent directory README.