Project 1: Command-Line Calculator with Expression Parser

Build a calculator that parses and evaluates mathematical expressions with operator precedence, variables, and history tracking.

Quick Reference

  • Difficulty: Beginner
  • Time estimate: 1-2 weeks
  • Language: C++
  • Prerequisites: Basic programming concepts, understanding of arithmetic
  • Key topics: String parsing, recursive descent, tokenization, std::map, exception handling

1. Learning Objectives

After completing this project, you will:

  • Understand how to tokenize input strings into meaningful units (lexical analysis)
  • Master recursive descent parsing, the technique behind most hand-written parsers
  • Learn to implement operator precedence without ambiguity
  • Use STL containers (std::map, std::vector) effectively
  • Handle errors gracefully with C++ exceptions
  • Build a complete, interactive command-line application

2. Theoretical Foundation

2.1 Core Concepts

Tokenization (Lexical Analysis)

Before you can evaluate an expression like 3 + 4 * 2, you must break it into tokens:

  • 3 (NUMBER)
  • + (PLUS)
  • 4 (NUMBER)
  • * (MULT)
  • 2 (NUMBER)

This process transforms a stream of characters into a stream of meaningful symbols. Each token has a type and optionally a value.

Recursive Descent Parsing

Parsing is the process of understanding the structure of input according to a grammar. Recursive descent is an intuitive technique where each grammar rule becomes a function that calls other grammar functions.

For mathematical expressions, the grammar handles precedence:

expression := term (('+' | '-') term)*
term       := factor (('*' | '/') factor)*
factor     := NUMBER | VARIABLE | '(' expression ')'

This grammar ensures multiplication and division (in term) bind tighter than addition and subtraction (in expression).

Operator Precedence

Why does 3 + 4 * 2 equal 11, not 14? Because multiplication has higher precedence than addition. The expression parses as 3 + (4 * 2), not (3 + 4) * 2.

Recursive descent naturally handles precedence: lower-precedence operators are parsed at higher levels in the call tree.

Parsing: 3 + 4 * 2

parseExpression()
  parseTerm() -> returns 3
  sees '+', consumes it
  parseTerm()
    parseFactor() -> returns 4
    sees '*', consumes it
    parseFactor() -> returns 2
    computes 4 * 2 = 8
    returns 8
  computes 3 + 8 = 11
  returns 11

Symbol Tables

Variables like x = 10 require storing name-value mappings. A symbol table (implemented with std::map or std::unordered_map) provides O(log n) or average O(1) lookup, respectively.
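
As a minimal sketch (the setVariable/getVariable helper names are illustrative, not part of the project spec):

#include <map>
#include <stdexcept>
#include <string>

std::map<std::string, double> variables;   // the symbol table

void setVariable(const std::string& name, double value) {
    variables[name] = value;               // inserts or overwrites
}

double getVariable(const std::string& name) {
    auto it = variables.find(name);        // find() does not insert on a miss
    if (it == variables.end())
        throw std::runtime_error("Undefined variable '" + name + "'");
    return it->second;
}

Note that lookups use find() rather than operator[]: operator[] silently inserts a default value of 0 for unknown names, which would hide the "undefined variable" error.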

2.2 Why This Matters

Expression parsing is the foundation of:

  • Compilers and interpreters: Every programming language starts with parsing
  • Configuration files: YAML, JSON, TOML parsers use similar techniques
  • Query languages: SQL, GraphQL, regex engines
  • Calculators and spreadsheets: Excel formulas are parsed expressions
  • Domain-specific languages (DSLs): Custom mini-languages in applications

Understanding parsing makes you a better programmer because you understand how code transforms from text to execution.

2.3 Historical Context

Recursive descent parsing dates to the 1960s. Niklaus Wirth, creator of Pascal, popularized it in his book “Algorithms + Data Structures = Programs” (1976). Today, most hand-written parsers use recursive descent because it’s intuitive and maps directly to the grammar.

Parser generators like YACC (1975) and Bison automate parser creation, but understanding manual parsing remains essential for:

  • Simple languages (where generators are overkill)
  • Error recovery (hand-tuned error messages)
  • Performance-critical parsers

2.4 Common Misconceptions

Misconception 1: “I need a parser generator for any parsing” False. For simple grammars (like expressions), hand-written recursive descent is cleaner and more maintainable.

Misconception 2: “Operator precedence requires complex algorithms” False. The grammar structure handles precedence automatically. Each precedence level becomes one parsing function.

Misconception 3: “Tokenization and parsing are the same” False. Tokenization (lexing) breaks input into tokens. Parsing understands the structure of those tokens according to grammar rules.


3. Project Specification

3.1 What You Will Build

A command-line calculator that:

  1. Reads expressions from standard input
  2. Tokenizes input into meaningful symbols
  3. Parses expressions respecting operator precedence
  4. Evaluates expressions and prints results
  5. Supports variable assignment and retrieval
  6. Maintains calculation history
  7. Handles errors gracefully (division by zero, undefined variables, syntax errors)

3.2 Functional Requirements

ID Requirement
F1 Parse and evaluate arithmetic expressions with +, -, *, /
F2 Respect standard operator precedence (multiplication/division before addition/subtraction)
F3 Support parentheses for grouping: (1 + 2) * 3
F4 Support variable assignment: x = 10
F5 Support variable usage in expressions: x * 2 + 5
F6 Maintain history of calculations
F7 Display history with history command
F8 Handle decimal numbers (floating-point)
F9 Provide clear error messages for invalid input
F10 Exit cleanly with quit or exit command

3.3 Non-Functional Requirements

ID Requirement
N1 Response time under 10ms for typical expressions
N2 Handle expressions up to 1000 characters
N3 Support at least 100 stored variables
N4 Maintain history of at least 1000 calculations
N5 No memory leaks (verified with Valgrind)

3.4 Example Usage / Output

$ ./calculator
> 3 + 4 * 2
Result: 11

> x = 10
x = 10

> x * 2 + 5
Result: 25

> (1 + 2) * (3 + 4)
Result: 21

> 5 / 0
Error: Division by zero

> y + 1
Error: Undefined variable 'y'

> history
1: 3 + 4 * 2 = 11
2: x = 10
3: x * 2 + 5 = 25
4: (1 + 2) * (3 + 4) = 21

> 2.5 * 4
Result: 10

> (3 +
Error: Unexpected end of expression, expected ')'

> quit
Goodbye!

3.5 Real World Outcome

After completing this project, you will have:

  • A working calculator you can use daily
  • Understanding of how programming languages are parsed
  • Experience with C++ classes, STL containers, and exceptions
  • A foundation for building more complex parsers (JSON, config files, DSLs)

4. Solution Architecture

4.1 High-Level Design

                         +-------------------+
User Input  -----------> |    Tokenizer      |
                         +-------------------+
                                  |
                           Vector<Token>
                                  |
                                  v
                         +-------------------+
                         |     Parser        |
                         +-------------------+
                                  |
                              AST/Value
                                  |
                                  v
                         +-------------------+
                         |   Evaluator       | <---> Symbol Table
                         +-------------------+
                                  |
                               Result
                                  |
                                  v
                         +-------------------+
                         |     Output        | ---> History
                         +-------------------+

4.2 Key Components

Token

struct Token {
    enum Type { NUMBER, PLUS, MINUS, MULT, DIV, LPAREN, RPAREN,
                VARIABLE, ASSIGN, END };
    Type type;
    double value;      // For NUMBER tokens
    std::string name;  // For VARIABLE tokens
};

Tokenizer

  • Converts input string to vector of tokens
  • Skips whitespace
  • Recognizes numbers (including decimals), operators, parentheses, identifiers

Parser

  • Implements recursive descent
  • Three main functions: parseExpression(), parseTerm(), parseFactor()
  • Returns computed value (or could return AST for more complex interpreters)

Calculator

  • Orchestrates tokenizer and parser
  • Manages symbol table (variable storage)
  • Maintains history
  • Handles REPL (Read-Eval-Print Loop)

4.3 Data Structures

Structure Purpose C++ Type
Token stream Hold tokenized input std::vector<Token>
Symbol table Store variables std::map<std::string, double>
History Track past calculations std::vector<std::pair<std::string, double>>

4.4 Algorithm Overview

Tokenization Algorithm

1. Start at position 0
2. Skip whitespace
3. If end of input, return END token
4. If digit, consume all digits (and decimal point), return NUMBER
5. If letter, consume all alphanumeric chars, return VARIABLE
6. If operator (+,-,*,/), return corresponding token
7. If parenthesis, return LPAREN or RPAREN
8. If '=', return ASSIGN
9. Otherwise, error: unknown character
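
A minimal C++ sketch of these steps, assuming the Token struct from section 4.2 and the Tokenizer class sketched in Hint 2 of section 5.7 (needs <cctype> and <stdexcept>):

Token Tokenizer::nextToken() {
    skipWhitespace();
    if (pos >= input.size())
        return Token{Token::END, 0.0, ""};

    char c = peek();
    if (std::isdigit(static_cast<unsigned char>(c)))
        return parseNumber();        // consumes digits and an optional '.'
    if (std::isalpha(static_cast<unsigned char>(c)))
        return parseIdentifier();    // consumes alphanumeric characters

    advance();                       // consume the single-character token
    switch (c) {
        case '+': return Token{Token::PLUS,   0.0, ""};
        case '-': return Token{Token::MINUS,  0.0, ""};
        case '*': return Token{Token::MULT,   0.0, ""};
        case '/': return Token{Token::DIV,    0.0, ""};
        case '(': return Token{Token::LPAREN, 0.0, ""};
        case ')': return Token{Token::RPAREN, 0.0, ""};
        case '=': return Token{Token::ASSIGN, 0.0, ""};
        default:
            throw std::runtime_error(std::string("Unknown character '") + c + "'");
    }
}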

Recursive Descent Parsing

parseExpression():
    left = parseTerm()
    while current token is + or -:
        op = current token
        advance()
        right = parseTerm()
        left = apply op to (left, right)
    return left

parseTerm():
    left = parseFactor()
    while current token is * or /:
        op = current token
        advance()
        right = parseFactor()
        left = apply op to (left, right)
    return left

parseFactor():
    if current is NUMBER:
        return its value
    if current is VARIABLE:
        return symbol_table[name]
    if current is LPAREN:
        advance()
        result = parseExpression()
        expect RPAREN
        return result
    error: unexpected token
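
A minimal C++ sketch of these three functions, assuming the Token struct from section 4.2 and the Parser class sketched in Hint 3 of section 5.7 (error messages are illustrative; the division-by-zero check matches the behavior shown in section 3.4):

double Parser::parseExpression() {
    double left = parseTerm();
    while (current().type == Token::PLUS || current().type == Token::MINUS) {
        Token::Type op = current().type;
        advance();
        double right = parseTerm();
        left = (op == Token::PLUS) ? left + right : left - right;
    }
    return left;
}

double Parser::parseTerm() {
    double left = parseFactor();
    while (current().type == Token::MULT || current().type == Token::DIV) {
        Token::Type op = current().type;
        advance();
        double right = parseFactor();
        if (op == Token::DIV && right == 0.0)
            throw std::runtime_error("Division by zero");
        left = (op == Token::MULT) ? left * right : left / right;
    }
    return left;
}

double Parser::parseFactor() {
    if (current().type == Token::NUMBER) {
        double v = current().value;
        advance();
        return v;
    }
    if (current().type == Token::VARIABLE) {
        auto it = variables.find(current().name);
        if (it == variables.end())
            throw std::runtime_error("Undefined variable '" + current().name + "'");
        advance();
        return it->second;
    }
    if (current().type == Token::LPAREN) {
        advance();
        double v = parseExpression();
        if (current().type != Token::RPAREN)
            throw std::runtime_error("Unexpected end of expression, expected ')'");
        advance();
        return v;
    }
    throw std::runtime_error("Unexpected token");
}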

5. Implementation Guide

5.1 Development Environment Setup

# Create project directory
mkdir calculator && cd calculator

# Create source files
touch main.cpp token.hpp tokenizer.hpp tokenizer.cpp parser.hpp parser.cpp

# Compile (C++17 for std::optional, string_view)
g++ -std=c++17 -Wall -Wextra -o calculator main.cpp tokenizer.cpp parser.cpp

# Or with CMake
cat > CMakeLists.txt << 'EOF'
cmake_minimum_required(VERSION 3.10)
project(calculator)
set(CMAKE_CXX_STANDARD 17)
add_executable(calculator main.cpp tokenizer.cpp parser.cpp)
EOF
mkdir build && cd build && cmake .. && make

5.2 Project Structure

calculator/
├── CMakeLists.txt
├── include/
│   ├── token.hpp       # Token struct definition
│   ├── tokenizer.hpp   # Tokenizer class declaration
│   └── parser.hpp      # Parser class declaration
├── src/
│   ├── main.cpp        # REPL and main function
│   ├── tokenizer.cpp   # Tokenizer implementation
│   └── parser.cpp      # Parser implementation
└── tests/
    └── test_calculator.cpp  # Unit tests

5.3 The Core Question You’re Answering

“How do I transform a string of characters into a computed numerical result while respecting mathematical rules?”

This question decomposes into:

  1. How do I recognize meaningful units (tokens) in text?
  2. How do I understand the structure of those tokens (parsing)?
  3. How do I handle operator precedence correctly?
  4. How do I evaluate the parsed structure to get a result?

5.4 Concepts You Must Understand First

Before starting, verify you can answer:

Question Book Reference
What is the difference between std::string and const char*? C++ Primer Plus Ch. 4
How do you use std::map to store and retrieve values? C++ Primer Plus Ch. 16
What is a recursive function and when does it terminate? C++ Primer Plus Ch. 7
How do exceptions work in C++ (try, catch, throw)? C++ Primer Plus Ch. 15
What is the difference between ++i (prefix) and i++ (postfix)? C++ Primer Plus Ch. 5

5.5 Questions to Guide Your Design

Tokenizer Design

  • How will you handle multi-character numbers like 123.45?
  • How will you distinguish between - as subtraction vs negative number?
  • What happens when you encounter an unknown character?

Parser Design

  • How will you track the “current” token?
  • What should parseFactor() do when it sees a VARIABLE token?
  • How will you handle the case of missing closing parenthesis?

Error Handling

  • What types of errors are possible? (Lexical, syntactic, semantic, runtime)
  • How will you report line/column numbers for errors?
  • Should you continue parsing after an error or stop immediately?

State Management

  • How will you store variables between expressions?
  • How will you format history entries?
  • When should history be saved? (Every expression? Only successful ones?)

5.6 Thinking Exercise

Before writing code, trace through this expression by hand:

Input: "2 + 3 * 4 - 1"

Step 1: Tokenize
Tokens: [NUMBER(2), PLUS, NUMBER(3), MULT, NUMBER(4), MINUS, NUMBER(1), END]

Step 2: Parse (trace the recursive calls)
parseExpression():
  parseTerm():
    parseFactor() -> 2
    no * or /, return 2
  left = 2
  see PLUS
  parseTerm():
    parseFactor() -> 3
    see MULT
    parseFactor() -> 4
    compute 3 * 4 = 12
    no more * or /, return 12
  left = 2 + 12 = 14
  see MINUS
  parseTerm():
    parseFactor() -> 1
    no * or /, return 1
  left = 14 - 1 = 13
  no more + or -, return 13

Result: 13

Now trace "(2 + 3) * 4" and verify you get 20.

5.7 Hints in Layers

Hint 1: Starting Point (Conceptual) Begin with the tokenizer. Write a class that converts a string to a vector of tokens. Test it independently before moving to parsing.

Hint 2: Next Level (More Specific) For the tokenizer, use a single nextToken() method that returns one token and advances position. Use peek() to look at the current character without consuming it.

class Tokenizer {
    std::string input;
    size_t pos = 0;

    char peek() { return pos < input.size() ? input[pos] : '\0'; }
    char advance() { return input[pos++]; }
    void skipWhitespace();
    Token parseNumber();
    Token parseIdentifier();
public:
    Tokenizer(const std::string& input) : input(input) {}
    Token nextToken();
    std::vector<Token> tokenize();
};
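
The parseNumber() and parseIdentifier() helpers declared above are not shown; a minimal sketch (assumes <cctype> and <string>; std::stod handles the decimal point):

Token Tokenizer::parseNumber() {
    std::string text;
    while (std::isdigit(static_cast<unsigned char>(peek())) || peek() == '.')
        text += advance();
    return Token{Token::NUMBER, std::stod(text), ""};   // "3.14" -> 3.14
}

Token Tokenizer::parseIdentifier() {
    std::string name;
    while (std::isalnum(static_cast<unsigned char>(peek())))
        name += advance();
    return Token{Token::VARIABLE, 0.0, name};
}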

Hint 3: Technical Details (Approach) For the parser, store all tokens in a vector and use an index to track position:

class Parser {
    std::vector<Token> tokens;
    size_t pos = 0;
    std::map<std::string, double>& variables;

    Token& current() { return tokens[pos]; }
    void advance() { pos++; }
    bool match(Token::Type type);

    double parseExpression();
    double parseTerm();
    double parseFactor();
public:
    Parser(const std::vector<Token>& tokens,
           std::map<std::string, double>& vars)
        : tokens(tokens), variables(vars) {}

    double parse();
};
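
The match() helper declared above is also left to you; one common sketch consumes the current token only when it has the expected type:

bool Parser::match(Token::Type type) {
    if (current().type == type) {
        advance();       // consume it
        return true;
    }
    return false;        // leave the token in place
}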

Hint 4: Implementation Details Handle assignment as a special case in the REPL, not in the parser:

// In main REPL loop
std::string line;
while (std::getline(std::cin, line)) {
    // Check for assignment: "x = expr"
    auto eqPos = line.find('=');
    if (eqPos != std::string::npos) {
        std::string varName = trim(line.substr(0, eqPos));
        std::string expr = line.substr(eqPos + 1);
        double value = evaluate(expr);
        variables[varName] = value;
        std::cout << varName << " = " << value << "\n";
    } else {
        double result = evaluate(line);
        std::cout << "Result: " << result << "\n";
        history.push_back({line, result});
    }
}
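
The loop above assumes variables, history, evaluate(), and trim() are defined elsewhere. trim() is not a standard library function; a minimal sketch:

// Strip leading/trailing whitespace from a variable name like " x ".
std::string trim(const std::string& s) {
    auto first = s.find_first_not_of(" \t\r\n");
    auto last  = s.find_last_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    return s.substr(first, last - first + 1);
}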

5.8 The Interview Questions They’ll Ask

  1. “What is the difference between a lexer and a parser?”
    • Lexer (tokenizer) converts characters to tokens (lexical analysis)
    • Parser understands structure of tokens according to grammar (syntactic analysis)
  2. “How does your parser handle operator precedence?”
    • Grammar structure: lower precedence operators at higher levels
    • parseExpression handles +/-, calls parseTerm
    • parseTerm handles * and /, calls parseFactor
    • This ensures * and / are evaluated before + and -
  3. “What happens if the user enters an invalid expression?”
    • Tokenizer throws on unknown characters
    • Parser throws on unexpected tokens (e.g., two operators in a row)
    • Division by zero caught at evaluation time
    • All errors caught with try/catch in REPL
  4. “Could you extend this to support functions like sin(x)?”
    • Add FUNCTION token type in tokenizer
    • Extend parseFactor to recognize function calls
    • parseFactor would call parseExpression for the argument
    • Store functions in a std::map<std::string, std::function<double(double)>>
  5. “How would you add support for unary minus like -5 or -(3+2)?”
    • Handle in parseFactor: if current is MINUS, consume it and negate result
    • Must distinguish unary minus (at the start or after an operator) from binary minus (between operands); see the sketch below
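
A minimal sketch of that unary-minus handling, assuming the parseFactor() structure from section 4.4: add these lines at the top of parseFactor(), before the existing NUMBER/VARIABLE/LPAREN cases.

if (current().type == Token::MINUS) {   // unary minus: -5, -(3+2)
    advance();
    return -parseFactor();              // recursion also handles --5
}

Binary minus is unaffected: it is only consumed in parseExpression(), between two already-parsed terms, which is exactly the distinction the question asks about.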

5.9 Books That Will Help

Topic Book Chapter
String Streams C++ Primer Plus Chapter 17
Recursive Descent Parsing Compilers: Principles and Practice Chapter 4
STL Maps The C++ Programming Language Chapter 31
Exception Handling C++ Primer Plus Chapter 15
Grammar and Languages Engineering a Compiler Chapter 3

5.10 Implementation Phases

Phase 1: Token Infrastructure (Day 1-2)

  • Define Token struct with all types
  • Write basic tokenizer that handles numbers and operators
  • Test: tokenize “3 + 4” returns [NUMBER(3), PLUS, NUMBER(4), END]

Phase 2: Basic Parser (Day 3-4)

  • Implement parseFactor (numbers only)
  • Implement parseTerm (multiplication and division)
  • Implement parseExpression (addition and subtraction)
  • Test: “3 + 4 * 2” returns 11

Phase 3: Parentheses (Day 5)

  • Extend parseFactor to handle LPAREN
  • Add error checking for missing RPAREN
  • Test: “(3 + 4) * 2” returns 14

Phase 4: Variables (Day 6-7)

  • Add VARIABLE token type
  • Extend tokenizer to recognize identifiers
  • Add symbol table (std::map)
  • Handle assignment in REPL
  • Test: “x = 5” then “x * 2” returns 10

Phase 5: Polish (Day 8-10)

  • Add history command
  • Improve error messages
  • Handle edge cases (empty input, very long numbers)
  • Add decimal number support
  • Write comprehensive tests

5.11 Key Implementation Decisions

  • Token storage: one token at a time vs. a vector of tokens. Recommendation: vector (simpler parser).
  • Parser output: computed value vs. AST. Recommendation: value (sufficient for this project).
  • Number type: int vs. double. Recommendation: double (more flexible).
  • Variable lookup: std::map vs. std::unordered_map. Recommendation: std::map (ordered iteration).
  • Error handling: return codes vs. exceptions. Recommendation: exceptions (cleaner).

6. Testing Strategy

6.1 Unit Tests

#include <cassert>
#include <map>
#include <string>
#include "tokenizer.hpp"
#include "parser.hpp"

// Test tokenizer
void testTokenizer() {
    Tokenizer t("3 + 4.5");
    auto tokens = t.tokenize();
    assert(tokens[0].type == Token::NUMBER && tokens[0].value == 3);
    assert(tokens[1].type == Token::PLUS);
    assert(tokens[2].type == Token::NUMBER && tokens[2].value == 4.5);
    assert(tokens[3].type == Token::END);
}

// Test parser
void testParser() {
    std::map<std::string, double> vars;
    assert(evaluate("3 + 4 * 2", vars) == 11);
    assert(evaluate("(3 + 4) * 2", vars) == 14);
    assert(evaluate("10 / 2 - 3", vars) == 2);
}

// Test variables
void testVariables() {
    std::map<std::string, double> vars;
    vars["x"] = 10;
    assert(evaluate("x + 5", vars) == 15);
    assert(evaluate("x * x", vars) == 100);
}
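
These tests call an evaluate() helper that the guide does not define; a minimal sketch that wires the Tokenizer and Parser sketches from section 5.7 together:

double evaluate(const std::string& expr,
                std::map<std::string, double>& vars) {
    Tokenizer tokenizer(expr);                          // lex the input
    std::vector<Token> tokens = tokenizer.tokenize();
    Parser parser(tokens, vars);                        // parse and evaluate
    return parser.parse();
}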

6.2 Edge Cases to Test

Input Expected
"" Error or 0
42 42
3.14159 3.14159
1 + 2 + 3 + 4 10
((((1)))) 1
1 / 0 Error
x (undefined) Error
1 + Error
* 5 Error
1 2 3 Error

7. Common Pitfalls & Debugging

  • Wrong precedence: 3 + 4 * 2 gives 14. Root cause: +/- parsed at the same level as */. Fix: split parsing into parseExpression and parseTerm.
  • Infinite loop: the program hangs. Root cause: the parser never advances past the current token. Fix: ensure every parsing path calls advance().
  • Missing token: "Unexpected end" errors on valid input. Root cause: the tokenizer stops too early. Fix: check the loop condition in the tokenizer.
  • Variable not found: undefined variables silently evaluate to 0. Root cause: map[] inserts a default value on lookup. Fix: use map.find() or map.at() (with try/catch).
  • Number parsing wrong: 3.5 becomes 3. Root cause: the decimal point is not handled. Fix: keep consuming digits after '.'.

Debugging Tips

  1. Print tokens before parsing to verify tokenizer works
  2. Add tracing to parser functions: print which function was entered and the current token (see the sketch after this list)
  3. Test small first: verify 5 works before 3 + 4 * 2
  4. Use assertions liberally during development
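
For tip 2, one line at the top of each parse function is usually enough; a minimal sketch (needs <iostream>; the int cast just makes the enum printable):

// At the top of parseExpression(), parseTerm(), and parseFactor():
std::cerr << "enter parseTerm, pos=" << pos
          << ", token type=" << static_cast<int>(current().type) << "\n";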

8. Extensions & Challenges

Once the basic calculator works, try these extensions:

Extension Difficulty Concepts Learned
Unary minus: -5, -(3+2) Easy Token context
Power operator: 2^3 Medium Right associativity
Functions: sin(x), sqrt(x) Medium Function call parsing
Comparison: x > 5 Medium Boolean results
Multiple statements: x=5; y=x*2 Medium Statement separation
Comments: 5 + 3 // add Easy Tokenizer modification
Error recovery Hard Parser error handling
Build AST instead of evaluating Hard Tree data structures
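
For example, the power-operator row above needs right associativity, which the loop-based functions cannot give you. A hedged sketch, assuming a new Token::POW type for '^' and a parsePower() level inserted between parseTerm() and parseFactor() (both are illustrative names, not part of the base project):

// Needs #include <cmath> for std::pow.
// 2 ^ 3 ^ 2 should parse as 2 ^ (3 ^ 2) = 512, so the function recurses on
// its right-hand side instead of looping the way parseTerm() does.
double Parser::parsePower() {
    double base = parseFactor();
    if (current().type == Token::POW) {       // POW: assumed new token for '^'
        advance();
        return std::pow(base, parsePower());  // recursion gives right associativity
    }
    return base;
}
// parseTerm() would then call parsePower() wherever it currently calls parseFactor().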

9. Real-World Connections

How Production Systems Solve This

System Approach
Python (CPython) Generated parser (LL(1) via pgen historically; PEG since Python 3.9)
JavaScript (V8) Recursive descent with operator precedence climbing
GCC Hand-written recursive descent (older releases used a Bison-generated parser)
Excel Custom expression parser with function support
Calculator apps Similar recursive descent

Industry Relevance

  • Every IDE has an expression evaluator for debuggers
  • Build systems (CMake, Make) parse configuration languages
  • Query builders parse SQL-like expressions
  • Template engines parse interpolation syntax

10. Resources

Primary References

  • Stroustrup, B. “The C++ Programming Language, 4th Edition” - Chapter 10 (Expressions)
  • Prata, S. “C++ Primer Plus, 6th Edition” - Chapters 4, 7, 15, 16
  • Parr, T. “Language Implementation Patterns” - Chapters 2-4

11. Self-Assessment Checklist

Before considering this project complete, verify:

  • Can tokenize: 3.14 + x * (y - 2)
  • Respects precedence: 3 + 4 * 2 = 11
  • Handles parentheses: (3 + 4) * 2 = 14
  • Variables work: x = 10 then x * 2 = 20
  • History command shows past calculations
  • Division by zero gives clear error
  • Undefined variable gives clear error
  • Malformed expression gives clear error
  • No memory leaks (run with Valgrind; a typical command is shown after this list)
  • Code compiles with -Wall -Wextra without warnings
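
For the Valgrind item, a typical invocation (assuming Valgrind is installed; compile with -g for readable reports):

# Typical leak check for the Valgrind item above
valgrind --leak-check=full ./calculator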

12. Submission / Completion Criteria

This project is complete when:

  1. All functional requirements (F1-F10) are implemented
  2. All example expressions from section 3.4 work correctly
  3. Edge cases are handled gracefully with clear error messages
  4. Code is well-organized into separate header/source files
  5. At least 10 unit tests pass
  6. No memory leaks detected by Valgrind
  7. You can explain how precedence is implemented in your parser
  8. You can extend it to add one new operator (e.g., power ^)

Deliverables:

  • Source code with clear comments
  • README with build instructions
  • Test file demonstrating functionality
  • Brief writeup explaining your design decisions

This project establishes the foundation for language processing. The patterns you learn here—tokenization, recursive descent, symbol tables—appear in every compiler, interpreter, and configuration parser. Master this, and you’ve taken the first step toward understanding how programming languages work.