Project 1: Command-Line Calculator with Expression Parser

Build a calculator that parses and evaluates mathematical expressions with operator precedence, variables, and history tracking.

Quick Reference

  • Difficulty: Beginner
  • Time estimate: 1-2 weeks
  • Language: C++
  • Prerequisites: Basic programming concepts, understanding of arithmetic
  • Key topics: String parsing, recursive descent, tokenization, std::map, exception handling

1. Learning Objectives

After completing this project, you will:

  • Understand how to tokenize input strings into meaningful units (lexical analysis)
  • Master recursive descent parsing, the technique behind most hand-written parsers
  • Learn to implement operator precedence without ambiguity
  • Use STL containers (std::map, std::vector) effectively
  • Handle errors gracefully with C++ exceptions
  • Build a complete, interactive command-line application

2. Theoretical Foundation

2.1 Core Concepts

Tokenization (Lexical Analysis)

Before you can evaluate an expression like 3 + 4 * 2, you must break it into tokens:

  • 3 (NUMBER)
  • + (PLUS)
  • 4 (NUMBER)
  • * (MULT)
  • 2 (NUMBER)

This process transforms a stream of characters into a stream of meaningful symbols. Each token has a type and optionally a value.

Recursive Descent Parsing

Parsing is the process of understanding the structure of input according to a grammar. Recursive descent is an intuitive technique where each grammar rule becomes a function that calls other grammar functions.

For mathematical expressions, the grammar handles precedence:

expression := term (('+' | '-') term)*
term       := factor (('*' | '/') factor)*
factor     := NUMBER | VARIABLE | '(' expression ')'

This grammar ensures multiplication and division (in term) bind tighter than addition and subtraction (in expression).

Operator Precedence

Why does 3 + 4 * 2 equal 11, not 14? Because multiplication has higher precedence than addition. The expression parses as 3 + (4 * 2), not (3 + 4) * 2.

Recursive descent naturally handles precedence: lower-precedence operators are parsed at higher levels in the call tree.

Parsing: 3 + 4 * 2

parseExpression()
  parseTerm() -> returns 3
  sees '+', consumes it
  parseTerm()
    parseFactor() -> returns 4
    sees '*', consumes it
    parseFactor() -> returns 2
    computes 4 * 2 = 8
    returns 8
  computes 3 + 8 = 11
  returns 11

Symbol Tables

Variables like x = 10 require storing name-value mappings. A symbol table (implemented with std::map or std::unordered_map) provides O(log n) or average O(1) lookup, respectively.
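
As a minimal sketch (the setVariable/getVariable helper names are illustrative, not part of the project spec):

#include <map>
#include <stdexcept>
#include <string>

std::map<std::string, double> variables;   // the symbol table

void setVariable(const std::string& name, double value) {
    variables[name] = value;               // inserts or overwrites
}

double getVariable(const std::string& name) {
    auto it = variables.find(name);        // find() does not insert on a miss
    if (it == variables.end())
        throw std::runtime_error("Undefined variable '" + name + "'");
    return it->second;
}

Note that lookups use find() rather than operator[]: operator[] silently inserts a default value of 0 for unknown names, which would hide the "undefined variable" error.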

2.2 Why This Matters

Expression parsing is the foundation of:

  • Compilers and interpreters: Every programming language starts with parsing
  • Configuration files: YAML, JSON, TOML parsers use similar techniques
  • Query languages: SQL, GraphQL, regex engines
  • Calculators and spreadsheets: Excel formulas are parsed expressions
  • Domain-specific languages (DSLs): Custom mini-languages in applications

Understanding parsing makes you a better programmer because you understand how code transforms from text to execution.

2.3 Historical Context

Recursive descent parsing dates to the 1960s. Niklaus Wirth, creator of Pascal, popularized it in his book “Algorithms + Data Structures = Programs” (1976). Today, most hand-written parsers use recursive descent because it’s intuitive and maps directly to the grammar.

Parser generators like YACC (1975) and Bison automate parser creation, but understanding manual parsing remains essential for:

  • Simple languages (where generators are overkill)
  • Error recovery (hand-tuned error messages)
  • Performance-critical parsers

2.4 Common Misconceptions

Misconception 1: “I need a parser generator for any parsing” False. For simple grammars (like expressions), hand-written recursive descent is cleaner and more maintainable.

Misconception 2: “Operator precedence requires complex algorithms” False. The grammar structure handles precedence automatically. Each precedence level becomes one parsing function.

Misconception 3: “Tokenization and parsing are the same” False. Tokenization (lexing) breaks input into tokens. Parsing understands the structure of those tokens according to grammar rules.


3. Project Specification

3.1 What You Will Build

A command-line calculator that:

  1. Reads expressions from standard input
  2. Tokenizes input into meaningful symbols
  3. Parses expressions respecting operator precedence
  4. Evaluates expressions and prints results
  5. Supports variable assignment and retrieval
  6. Maintains calculation history
  7. Handles errors gracefully (division by zero, undefined variables, syntax errors)

3.2 Functional Requirements

ID Requirement
F1 Parse and evaluate arithmetic expressions with +, -, *, /
F2 Respect standard operator precedence (multiplication/division before addition/subtraction)
F3 Support parentheses for grouping: (1 + 2) * 3
F4 Support variable assignment: x = 10
F5 Support variable usage in expressions: x * 2 + 5
F6 Maintain history of calculations
F7 Display history with history command
F8 Handle decimal numbers (floating-point)
F9 Provide clear error messages for invalid input
F10 Exit cleanly with quit or exit command

3.3 Non-Functional Requirements

ID Requirement
N1 Response time under 10ms for typical expressions
N2 Handle expressions up to 1000 characters
N3 Support at least 100 stored variables
N4 Maintain history of at least 1000 calculations
N5 No memory leaks (verified with Valgrind)

3.4 Example Usage / Output

$ ./calculator
> 3 + 4 * 2
Result: 11

> x = 10
x = 10

> x * 2 + 5
Result: 25

> (1 + 2) * (3 + 4)
Result: 21

> 5 / 0
Error: Division by zero

> y + 1
Error: Undefined variable 'y'

> history
1: 3 + 4 * 2 = 11
2: x = 10
3: x * 2 + 5 = 25
4: (1 + 2) * (3 + 4) = 21

> 2.5 * 4
Result: 10

> (3 +
Error: Unexpected end of expression, expected ')'

> quit
Goodbye!

3.5 Real World Outcome

After completing this project, you will have:

  • A working calculator you can use daily
  • Understanding of how programming languages are parsed
  • Experience with C++ classes, STL containers, and exceptions
  • A foundation for building more complex parsers (JSON, config files, DSLs)

4. Solution Architecture

4.1 High-Level Design

                         +-------------------+
User Input  -----------> |    Tokenizer      |
                         +-------------------+
                                  |
                           Vector<Token>
                                  |
                                  v
                         +-------------------+
                         |     Parser        |
                         +-------------------+
                                  |
                              AST/Value
                                  |
                                  v
                         +-------------------+
                         |   Evaluator       | <---> Symbol Table
                         +-------------------+
                                  |
                               Result
                                  |
                                  v
                         +-------------------+
                         |     Output        | ---> History
                         +-------------------+

4.2 Key Components

Token

struct Token {
    enum Type { NUMBER, PLUS, MINUS, MULT, DIV, LPAREN, RPAREN,
                VARIABLE, ASSIGN, END };
    Type type;
    double value;      // For NUMBER tokens
    std::string name;  // For VARIABLE tokens
};

Tokenizer

  • Converts input string to vector of tokens
  • Skips whitespace
  • Recognizes numbers (including decimals), operators, parentheses, identifiers

Parser

  • Implements recursive descent
  • Three main functions: parseExpression(), parseTerm(), parseFactor()
  • Returns computed value (or could return AST for more complex interpreters)

Calculator

  • Orchestrates tokenizer and parser
  • Manages symbol table (variable storage)
  • Maintains history
  • Handles REPL (Read-Eval-Print Loop)

4.3 Data Structures

Structure Purpose C++ Type
Token stream Hold tokenized input std::vector<Token>
Symbol table Store variables std::map<std::string, double>
History Track past calculations std::vector<std::pair<std::string, double>>

4.4 Algorithm Overview

Tokenization Algorithm

1. Start at position 0
2. Skip whitespace
3. If end of input, return END token
4. If digit, consume all digits (and decimal point), return NUMBER
5. If letter, consume all alphanumeric chars, return VARIABLE
6. If operator (+,-,*,/), return corresponding token
7. If parenthesis, return LPAREN or RPAREN
8. If '=', return ASSIGN
9. Otherwise, error: unknown character
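
A minimal C++ sketch of these steps, assuming the Token struct from section 4.2 and the Tokenizer class sketched in Hint 2 of section 5.7 (needs <cctype> and <stdexcept>):

Token Tokenizer::nextToken() {
    skipWhitespace();
    if (pos >= input.size())
        return Token{Token::END, 0.0, ""};

    char c = peek();
    if (std::isdigit(static_cast<unsigned char>(c)))
        return parseNumber();        // consumes digits and an optional '.'
    if (std::isalpha(static_cast<unsigned char>(c)))
        return parseIdentifier();    // consumes alphanumeric characters

    advance();                       // consume the single-character token
    switch (c) {
        case '+': return Token{Token::PLUS,   0.0, ""};
        case '-': return Token{Token::MINUS,  0.0, ""};
        case '*': return Token{Token::MULT,   0.0, ""};
        case '/': return Token{Token::DIV,    0.0, ""};
        case '(': return Token{Token::LPAREN, 0.0, ""};
        case ')': return Token{Token::RPAREN, 0.0, ""};
        case '=': return Token{Token::ASSIGN, 0.0, ""};
        default:
            throw std::runtime_error(std::string("Unknown character '") + c + "'");
    }
}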

Recursive Descent Parsing

parseExpression():
    left = parseTerm()
    while current token is + or -:
        op = current token
        advance()
        right = parseTerm()
        left = apply op to (left, right)
    return left

parseTerm():
    left = parseFactor()
    while current token is * or /:
        op = current token
        advance()
        right = parseFactor()
        left = apply op to (left, right)
    return left

parseFactor():
    if current is NUMBER:
        return its value
    if current is VARIABLE:
        return symbol_table[name]
    if current is LPAREN:
        advance()
        result = parseExpression()
        expect RPAREN
        return result
    error: unexpected token
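
A minimal C++ sketch of these three functions, assuming the Token struct from section 4.2 and the Parser class sketched in Hint 3 of section 5.7 (error messages are illustrative; the division-by-zero check matches the behavior shown in section 3.4):

double Parser::parseExpression() {
    double left = parseTerm();
    while (current().type == Token::PLUS || current().type == Token::MINUS) {
        Token::Type op = current().type;
        advance();
        double right = parseTerm();
        left = (op == Token::PLUS) ? left + right : left - right;
    }
    return left;
}

double Parser::parseTerm() {
    double left = parseFactor();
    while (current().type == Token::MULT || current().type == Token::DIV) {
        Token::Type op = current().type;
        advance();
        double right = parseFactor();
        if (op == Token::DIV && right == 0.0)
            throw std::runtime_error("Division by zero");
        left = (op == Token::MULT) ? left * right : left / right;
    }
    return left;
}

double Parser::parseFactor() {
    if (current().type == Token::NUMBER) {
        double v = current().value;
        advance();
        return v;
    }
    if (current().type == Token::VARIABLE) {
        auto it = variables.find(current().name);
        if (it == variables.end())
            throw std::runtime_error("Undefined variable '" + current().name + "'");
        advance();
        return it->second;
    }
    if (current().type == Token::LPAREN) {
        advance();
        double v = parseExpression();
        if (current().type != Token::RPAREN)
            throw std::runtime_error("Unexpected end of expression, expected ')'");
        advance();
        return v;
    }
    throw std::runtime_error("Unexpected token");
}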

5. Implementation Guide

5.1 Development Environment Setup

# Create project directory
mkdir calculator && cd calculator

# Create source files
touch main.cpp token.hpp tokenizer.hpp tokenizer.cpp parser.hpp parser.cpp

# Compile (C++17 for std::optional, string_view)
g++ -std=c++17 -Wall -Wextra -o calculator main.cpp tokenizer.cpp parser.cpp

# Or with CMake
cat > CMakeLists.txt << 'EOF'
cmake_minimum_required(VERSION 3.10)
project(calculator)
set(CMAKE_CXX_STANDARD 17)
add_executable(calculator main.cpp tokenizer.cpp parser.cpp)
EOF
mkdir build && cd build && cmake .. && make

5.2 Project Structure

calculator/
├── CMakeLists.txt
├── include/
│   ├── token.hpp       # Token struct definition
│   ├── tokenizer.hpp   # Tokenizer class declaration
│   └── parser.hpp      # Parser class declaration
├── src/
│   ├── main.cpp        # REPL and main function
│   ├── tokenizer.cpp   # Tokenizer implementation
│   └── parser.cpp      # Parser implementation
└── tests/
    └── test_calculator.cpp  # Unit tests

5.3 The Core Question You’re Answering

“How do I transform a string of characters into a computed numerical result while respecting mathematical rules?”

This question decomposes into:

  1. How do I recognize meaningful units (tokens) in text?
  2. How do I understand the structure of those tokens (parsing)?
  3. How do I handle operator precedence correctly?
  4. How do I evaluate the parsed structure to get a result?

5.4 Concepts You Must Understand First

Before starting, verify you can answer:

Question Book Reference
What is the difference between std::string and const char*? C++ Primer Plus Ch. 4
How do you use std::map to store and retrieve values? C++ Primer Plus Ch. 16
What is a recursive function and when does it terminate? C++ Primer Plus Ch. 7
How do exceptions work in C++ (try, catch, throw)? C++ Primer Plus Ch. 15
What is the difference between ++i (prefix) and i++ (postfix)? C++ Primer Plus Ch. 5

5.5 Questions to Guide Your Design

Tokenizer Design

  • How will you handle multi-character numbers like 123.45?
  • How will you distinguish between - as subtraction vs negative number?
  • What happens when you encounter an unknown character?

Parser Design

  • How will you track the “current” token?
  • What should parseFactor() do when it sees a VARIABLE token?
  • How will you handle the case of missing closing parenthesis?

Error Handling

  • What types of errors are possible? (Lexical, syntactic, semantic, runtime)
  • How will you report line/column numbers for errors?
  • Should you continue parsing after an error or stop immediately?

State Management

  • How will you store variables between expressions?
  • How will you format history entries?
  • When should history be saved? (Every expression? Only successful ones?)

5.6 Thinking Exercise

Before writing code, trace through this expression by hand:

Input: "2 + 3 * 4 - 1"

Step 1: Tokenize
Tokens: [NUMBER(2), PLUS, NUMBER(3), MULT, NUMBER(4), MINUS, NUMBER(1), END]

Step 2: Parse (trace the recursive calls)
parseExpression():
  parseTerm():
    parseFactor() -> 2
    no * or /, return 2
  left = 2
  see PLUS
  parseTerm():
    parseFactor() -> 3
    see MULT
    parseFactor() -> 4
    compute 3 * 4 = 12
    no more * or /, return 12
  left = 2 + 12 = 14
  see MINUS
  parseTerm():
    parseFactor() -> 1
    no * or /, return 1
  left = 14 - 1 = 13
  no more + or -, return 13

Result: 13

Now trace "(2 + 3) * 4" and verify you get 20.

5.7 Hints in Layers

Hint 1: Starting Point (Conceptual) Begin with the tokenizer. Write a class that converts a string to a vector of tokens. Test it independently before moving to parsing.

Hint 2: Next Level (More Specific) For the tokenizer, use a single nextToken() method that returns one token and advances position. Use peek() to look at the current character without consuming it.

class Tokenizer {
    std::string input;
    size_t pos = 0;

    char peek() { return pos < input.size() ? input[pos] : '\0'; }
    char advance() { return input[pos++]; }
    void skipWhitespace();
    Token parseNumber();
    Token parseIdentifier();
public:
    Tokenizer(const std::string& input) : input(input) {}
    Token nextToken();
    std::vector<Token> tokenize();
};
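
The parseNumber() and parseIdentifier() helpers declared above are not shown; a minimal sketch (assumes <cctype> and <string>; std::stod handles the decimal point):

Token Tokenizer::parseNumber() {
    std::string text;
    while (std::isdigit(static_cast<unsigned char>(peek())) || peek() == '.')
        text += advance();
    return Token{Token::NUMBER, std::stod(text), ""};   // "3.14" -> 3.14
}

Token Tokenizer::parseIdentifier() {
    std::string name;
    while (std::isalnum(static_cast<unsigned char>(peek())))
        name += advance();
    return Token{Token::VARIABLE, 0.0, name};
}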

Hint 3: Technical Details (Approach) For the parser, store all tokens in a vector and use an index to track position:

class Parser {
    std::vector<Token> tokens;
    size_t pos = 0;
    std::map<std::string, double>& variables;

    Token& current() { return tokens[pos]; }
    void advance() { pos++; }
    bool match(Token::Type type);

    double parseExpression();
    double parseTerm();
    double parseFactor();
public:
    Parser(const std::vector<Token>& tokens,
           std::map<std::string, double>& vars)
        : tokens(tokens), variables(vars) {}

    double parse();
};
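
The match() helper declared above is also left to you; one common sketch consumes the current token only when it has the expected type:

bool Parser::match(Token::Type type) {
    if (current().type == type) {
        advance();       // consume it
        return true;
    }
    return false;        // leave the token in place
}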

Hint 4: Implementation Details Handle assignment as a special case in the REPL, not in the parser:

// In main REPL loop
std::string line;
while (std::getline(std::cin, line)) {
    // Check for assignment: "x = expr"
    auto eqPos = line.find('=');
    if (eqPos != std::string::npos) {
        std::string varName = trim(line.substr(0, eqPos));
        std::string expr = line.substr(eqPos + 1);
        double value = evaluate(expr);
        variables[varName] = value;
        std::cout << varName << " = " << value << "\n";
    } else {
        double result = evaluate(line);
        std::cout << "Result: " << result << "\n";
        history.push_back({line, result});
    }
}
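
The loop above assumes variables, history, evaluate(), and trim() are defined elsewhere. trim() is not a standard library function; a minimal sketch:

// Strip leading/trailing whitespace from a variable name like " x ".
std::string trim(const std::string& s) {
    auto first = s.find_first_not_of(" \t\r\n");
    auto last  = s.find_last_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    return s.substr(first, last - first + 1);
}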

5.8 The Interview Questions They’ll Ask

  1. “What is the difference between a lexer and a parser?”
    • Lexer (tokenizer) converts characters to tokens (lexical analysis)
    • Parser understands structure of tokens according to grammar (syntactic analysis)
  2. “How does your parser handle operator precedence?”
    • Grammar structure: lower precedence operators at higher levels
    • parseExpression handles +/-, calls parseTerm
    • parseTerm handles * and /, calls parseFactor
    • This ensures * and / are evaluated before + and -
  3. “What happens if the user enters an invalid expression?”
    • Tokenizer throws on unknown characters
    • Parser throws on unexpected tokens (e.g., two operators in a row)
    • Division by zero caught at evaluation time
    • All errors caught with try/catch in REPL
  4. “Could you extend this to support functions like sin(x)?”
    • Add FUNCTION token type in tokenizer
    • Extend parseFactor to recognize function calls
    • parseFactor would call parseExpression for the argument
    • Store functions in a std::map<std::string, std::function<double(double)>>
  5. “How would you add support for unary minus like -5 or -(3+2)?”
    • Handle in parseFactor: if current is MINUS, consume it and negate result
    • Must distinguish unary minus (at the start or after an operator) from binary minus (between operands); see the sketch below
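
A minimal sketch of that unary-minus handling, assuming the parseFactor() structure from section 4.4: add these lines at the top of parseFactor(), before the existing NUMBER/VARIABLE/LPAREN cases.

if (current().type == Token::MINUS) {   // unary minus: -5, -(3+2)
    advance();
    return -parseFactor();              // recursion also handles --5
}

Binary minus is unaffected: it is only consumed in parseExpression(), between two already-parsed terms, which is exactly the distinction the question asks about.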

5.9 Books That Will Help

Topic Book Chapter
String Streams C++ Primer Plus Chapter 17
Recursive Descent Parsing Compilers: Principles and Practice Chapter 4
STL Maps The C++ Programming Language Chapter 31
Exception Handling C++ Primer Plus Chapter 15
Grammar and Languages Engineering a Compiler Chapter 3

5.10 Implementation Phases

Phase 1: Token Infrastructure (Day 1-2)

  • Define Token struct with all types
  • Write basic tokenizer that handles numbers and operators
  • Test: tokenize “3 + 4” returns [NUMBER(3), PLUS, NUMBER(4), END]

Phase 2: Basic Parser (Day 3-4)

  • Implement parseFactor (numbers only)
  • Implement parseTerm (multiplication and division)
  • Implement parseExpression (addition and subtraction)
  • Test: “3 + 4 * 2” returns 11

Phase 3: Parentheses (Day 5)

  • Extend parseFactor to handle LPAREN
  • Add error checking for missing RPAREN
  • Test: “(3 + 4) * 2” returns 14

Phase 4: Variables (Day 6-7)

  • Add VARIABLE token type
  • Extend tokenizer to recognize identifiers
  • Add symbol table (std::map)
  • Handle assignment in REPL
  • Test: “x = 5” then “x * 2” returns 10

Phase 5: Polish (Day 8-10)

  • Add history command
  • Improve error messages
  • Handle edge cases (empty input, very long numbers)
  • Add decimal number support
  • Write comprehensive tests

5.11 Key Implementation Decisions

  • Token storage: one token at a time vs. a vector of tokens. Recommendation: vector (simpler parser).
  • Parser output: computed value vs. AST. Recommendation: value (sufficient for this project).
  • Number type: int vs. double. Recommendation: double (more flexible).
  • Variable lookup: std::map vs. std::unordered_map. Recommendation: std::map (ordered iteration).
  • Error handling: return codes vs. exceptions. Recommendation: exceptions (cleaner).

6. Testing Strategy

6.1 Unit Tests

#include <cassert>
#include <map>
#include <string>
#include "tokenizer.hpp"
#include "parser.hpp"

// Test tokenizer
void testTokenizer() {
    Tokenizer t("3 + 4.5");
    auto tokens = t.tokenize();
    assert(tokens[0].type == Token::NUMBER && tokens[0].value == 3);
    assert(tokens[1].type == Token::PLUS);
    assert(tokens[2].type == Token::NUMBER && tokens[2].value == 4.5);
    assert(tokens[3].type == Token::END);
}

// Test parser
void testParser() {
    std::map<std::string, double> vars;
    assert(evaluate("3 + 4 * 2", vars) == 11);
    assert(evaluate("(3 + 4) * 2", vars) == 14);
    assert(evaluate("10 / 2 - 3", vars) == 2);
}

// Test variables
void testVariables() {
    std::map<std::string, double> vars;
    vars["x"] = 10;
    assert(evaluate("x + 5", vars) == 15);
    assert(evaluate("x * x", vars) == 100);
}
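
These tests call an evaluate() helper that the guide does not define; a minimal sketch that wires the Tokenizer and Parser sketches from section 5.7 together:

double evaluate(const std::string& expr,
                std::map<std::string, double>& vars) {
    Tokenizer tokenizer(expr);                          // lex the input
    std::vector<Token> tokens = tokenizer.tokenize();
    Parser parser(tokens, vars);                        // parse and evaluate
    return parser.parse();
}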

6.2 Edge Cases to Test

Input Expected
"" Error or 0
42 42
3.14159 3.14159
1 + 2 + 3 + 4 10
((((1)))) 1
1 / 0 Error
x (undefined) Error
1 + Error
* 5 Error
1 2 3 Error

7. Common Pitfalls & Debugging

  • Wrong precedence: 3 + 4 * 2 gives 14. Root cause: +/- parsed at the same level as */. Fix: split parsing into parseExpression and parseTerm.
  • Infinite loop: the program hangs. Root cause: the parser never advances past the current token. Fix: ensure every parsing path calls advance().
  • Missing token: "Unexpected end" errors on valid input. Root cause: the tokenizer stops too early. Fix: check the loop condition in the tokenizer.
  • Variable not found: undefined variables silently evaluate to 0. Root cause: map[] inserts a default value on lookup. Fix: use map.find() or map.at() (with try/catch).
  • Number parsing wrong: 3.5 becomes 3. Root cause: the decimal point is not handled. Fix: keep consuming digits after '.'.

Debugging Tips

  1. Print tokens before parsing to verify tokenizer works
  2. Add tracing to parser functions: print which function was entered and the current token (see the sketch after this list)
  3. Test small first: verify 5 works before 3 + 4 * 2
  4. Use assertions liberally during development
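
For tip 2, one line at the top of each parse function is usually enough; a minimal sketch (needs <iostream>; the int cast just makes the enum printable):

// At the top of parseExpression(), parseTerm(), and parseFactor():
std::cerr << "enter parseTerm, pos=" << pos
          << ", token type=" << static_cast<int>(current().type) << "\n";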

8. Extensions & Challenges

Once the basic calculator works, try these extensions:

Extension Difficulty Concepts Learned
Unary minus: -5, -(3+2) Easy Token context
Power operator: 2^3 Medium Right associativity
Functions: sin(x), sqrt(x) Medium Function call parsing
Comparison: x > 5 Medium Boolean results
Multiple statements: x=5; y=x*2 Medium Statement separation
Comments: 5 + 3 // add Easy Tokenizer modification
Error recovery Hard Parser error handling
Build AST instead of evaluating Hard Tree data structures
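
For example, the power-operator row above needs right associativity, which the loop-based functions cannot give you. A hedged sketch, assuming a new Token::POW type for '^' and a parsePower() level inserted between parseTerm() and parseFactor() (both are illustrative names, not part of the base project):

// Needs #include <cmath> for std::pow.
// 2 ^ 3 ^ 2 should parse as 2 ^ (3 ^ 2) = 512, so the function recurses on
// its right-hand side instead of looping the way parseTerm() does.
double Parser::parsePower() {
    double base = parseFactor();
    if (current().type == Token::POW) {       // POW: assumed new token for '^'
        advance();
        return std::pow(base, parsePower());  // recursion gives right associativity
    }
    return base;
}
// parseTerm() would then call parsePower() wherever it currently calls parseFactor().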

9. Real-World Connections

How Production Systems Solve This

System Approach
Python (CPython) Generated parser (LL(1) via pgen historically; PEG since Python 3.9)
JavaScript (V8) Recursive descent with operator precedence climbing
GCC Hand-written recursive descent (older releases used a Bison-generated parser)
Excel Custom expression parser with function support
Calculator apps Similar recursive descent

Industry Relevance

  • Every IDE has an expression evaluator for debuggers
  • Build systems (CMake, Make) parse configuration languages
  • Query builders parse SQL-like expressions
  • Template engines parse interpolation syntax

10. Resources

Primary References

  • Stroustrup, B. “The C++ Programming Language, 4th Edition” - Chapter 10 (Expressions)
  • Prata, S. “C++ Primer Plus, 6th Edition” - Chapters 4, 7, 15, 16
  • Parr, T. “Language Implementation Patterns” - Chapters 2-4

11. Self-Assessment Checklist

Before considering this project complete, verify:

  • Can tokenize: 3.14 + x * (y - 2)
  • Respects precedence: 3 + 4 * 2 = 11
  • Handles parentheses: (3 + 4) * 2 = 14
  • Variables work: x = 10 then x * 2 = 20
  • History command shows past calculations
  • Division by zero gives clear error
  • Undefined variable gives clear error
  • Malformed expression gives clear error
  • No memory leaks (run with Valgrind; a typical command is shown after this list)
  • Code compiles with -Wall -Wextra without warnings
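
For the Valgrind item, a typical invocation (assuming Valgrind is installed; compile with -g for readable reports):

# Typical leak check for the Valgrind item above
valgrind --leak-check=full ./calculator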

12. Submission / Completion Criteria

This project is complete when:

  1. All functional requirements (F1-F10) are implemented
  2. All example expressions from section 3.4 work correctly
  3. Edge cases are handled gracefully with clear error messages
  4. Code is well-organized into separate header/source files
  5. At least 10 unit tests pass
  6. No memory leaks detected by Valgrind
  7. You can explain how precedence is implemented in your parser
  8. You can extend it to add one new operator (e.g., power ^)

Deliverables:

  • Source code with clear comments
  • README with build instructions
  • Test file demonstrating functionality
  • Brief writeup explaining your design decisions

This project establishes the foundation for language processing. The patterns you learn here—tokenization, recursive descent, symbol tables—appear in every compiler, interpreter, and configuration parser. Master this, and you’ve taken the first step toward understanding how programming languages work.