Project 1: Command-Line Calculator with Expression Parser
Build a calculator that parses and evaluates mathematical expressions with operator precedence, variables, and history tracking.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Beginner |
| Time Estimate | 1-2 weeks |
| Language | C++ |
| Prerequisites | Basic programming concepts, understanding of arithmetic |
| Key Topics | String parsing, recursive descent, tokenization, std::map, exception handling |
1. Learning Objectives
After completing this project, you will:
- Understand how to tokenize input strings into meaningful units (lexical analysis)
- Master recursive descent parsing, the foundation of all language processing
- Learn to implement operator precedence without ambiguity
- Use STL containers (std::map, std::vector) effectively
- Handle errors gracefully with C++ exceptions
- Build a complete, interactive command-line application
2. Theoretical Foundation
2.1 Core Concepts
Tokenization (Lexical Analysis)
Before you can evaluate an expression like 3 + 4 * 2, you must break it into tokens:
3(NUMBER)+(PLUS)4(NUMBER)*(MULT)2(NUMBER)
This process transforms a stream of characters into a stream of meaningful symbols. Each token has a type and optionally a value.
Recursive Descent Parsing
Parsing is the process of understanding the structure of input according to a grammar. Recursive descent is an intuitive technique where each grammar rule becomes a function that calls other grammar functions.
For mathematical expressions, the grammar handles precedence:
expression := term (('+' | '-') term)*
term := factor (('*' | '/') factor)*
factor := NUMBER | VARIABLE | '(' expression ')'
This grammar ensures multiplication and division (in term) bind tighter than addition and subtraction (in expression).
Operator Precedence
Why does 3 + 4 * 2 equal 11, not 14? Because multiplication has higher precedence than addition. The expression parses as 3 + (4 * 2), not (3 + 4) * 2.
Recursive descent naturally handles precedence: lower-precedence operators are parsed at higher levels in the call tree.
Parsing: 3 + 4 * 2
parseExpression()
parseTerm() -> returns 3
sees '+', consumes it
parseTerm()
parseFactor() -> returns 4
sees '*', consumes it
parseFactor() -> returns 2
computes 4 * 2 = 8
returns 8
computes 3 + 8 = 11
returns 11
Symbol Tables
Variables like x = 10 require storing name-value mappings. A symbol table (implemented with std::map or std::unordered_map) provides O(log n) or O(1) lookup.
2.2 Why This Matters
Expression parsing is the foundation of:
- Compilers and interpreters: Every programming language starts with parsing
- Configuration files: YAML, JSON, TOML parsers use similar techniques
- Query languages: SQL, GraphQL, regex engines
- Calculators and spreadsheets: Excel formulas are parsed expressions
- Domain-specific languages (DSLs): Custom mini-languages in applications
Understanding parsing makes you a better programmer because you understand how code transforms from text to execution.
2.3 Historical Context
Recursive descent parsing dates to the 1960s. Niklaus Wirth, creator of Pascal, popularized it in his book “Algorithms + Data Structures = Programs” (1976). Today, most hand-written parsers use recursive descent because it’s intuitive and maps directly to the grammar.
Parser generators like YACC (1975) and Bison automate parser creation, but understanding manual parsing remains essential for:
- Simple languages (where generators are overkill)
- Error recovery (hand-tuned error messages)
- Performance-critical parsers
2.4 Common Misconceptions
Misconception 1: “I need a parser generator for any parsing” False. For simple grammars (like expressions), hand-written recursive descent is cleaner and more maintainable.
Misconception 2: “Operator precedence requires complex algorithms” False. The grammar structure handles precedence automatically. Each precedence level becomes one parsing function.
Misconception 3: “Tokenization and parsing are the same” False. Tokenization (lexing) breaks input into tokens. Parsing understands the structure of those tokens according to grammar rules.
3. Project Specification
3.1 What You Will Build
A command-line calculator that:
- Reads expressions from standard input
- Tokenizes input into meaningful symbols
- Parses expressions respecting operator precedence
- Evaluates expressions and prints results
- Supports variable assignment and retrieval
- Maintains calculation history
- Handles errors gracefully (division by zero, undefined variables, syntax errors)
3.2 Functional Requirements
| ID | Requirement |
|---|---|
| F1 | Parse and evaluate arithmetic expressions with +, -, *, / |
| F2 | Respect standard operator precedence (multiplication/division before addition/subtraction) |
| F3 | Support parentheses for grouping: (1 + 2) * 3 |
| F4 | Support variable assignment: x = 10 |
| F5 | Support variable usage in expressions: x * 2 + 5 |
| F6 | Maintain history of calculations |
| F7 | Display history with history command |
| F8 | Handle decimal numbers (floating-point) |
| F9 | Provide clear error messages for invalid input |
| F10 | Exit cleanly with quit or exit command |
3.3 Non-Functional Requirements
| ID | Requirement |
|---|---|
| N1 | Response time under 10ms for typical expressions |
| N2 | Handle expressions up to 1000 characters |
| N3 | Support at least 100 stored variables |
| N4 | Maintain history of at least 1000 calculations |
| N5 | No memory leaks (verified with Valgrind) |
3.4 Example Usage / Output
$ ./calculator
> 3 + 4 * 2
Result: 11
> x = 10
x = 10
> x * 2 + 5
Result: 25
> (1 + 2) * (3 + 4)
Result: 21
> 5 / 0
Error: Division by zero
> y + 1
Error: Undefined variable 'y'
> history
1: 3 + 4 * 2 = 11
2: x = 10
3: x * 2 + 5 = 25
4: (1 + 2) * (3 + 4) = 21
> 2.5 * 4
Result: 10
> (3 +
Error: Unexpected end of expression, expected ')'
> quit
Goodbye!
3.5 Real World Outcome
After completing this project, you will have:
- A working calculator you can use daily
- Understanding of how programming languages are parsed
- Experience with C++ classes, STL containers, and exceptions
- A foundation for building more complex parsers (JSON, config files, DSLs)
4. Solution Architecture
4.1 High-Level Design
+-------------------+
User Input -----------> | Tokenizer |
+-------------------+
|
Vector<Token>
|
v
+-------------------+
| Parser |
+-------------------+
|
AST/Value
|
v
+-------------------+
| Evaluator | <---> Symbol Table
+-------------------+
|
Result
|
v
+-------------------+
| Output | ---> History
+-------------------+
4.2 Key Components
Token
struct Token {
enum Type { NUMBER, PLUS, MINUS, MULT, DIV, LPAREN, RPAREN,
VARIABLE, ASSIGN, END };
Type type;
double value; // For NUMBER tokens
std::string name; // For VARIABLE tokens
};
Tokenizer
- Converts input string to vector of tokens
- Skips whitespace
- Recognizes numbers (including decimals), operators, parentheses, identifiers
Parser
- Implements recursive descent
- Three main functions:
parseExpression(),parseTerm(),parseFactor() - Returns computed value (or could return AST for more complex interpreters)
Calculator
- Orchestrates tokenizer and parser
- Manages symbol table (variable storage)
- Maintains history
- Handles REPL (Read-Eval-Print Loop)
4.3 Data Structures
| Structure | Purpose | C++ Type |
|---|---|---|
| Token stream | Hold tokenized input | std::vector<Token> |
| Symbol table | Store variables | std::map<std::string, double> |
| History | Track past calculations | std::vector<std::pair<std::string, double>> |
4.4 Algorithm Overview
Tokenization Algorithm
1. Start at position 0
2. Skip whitespace
3. If end of input, return END token
4. If digit, consume all digits (and decimal point), return NUMBER
5. If letter, consume all alphanumeric chars, return VARIABLE
6. If operator (+,-,*,/), return corresponding token
7. If parenthesis, return LPAREN or RPAREN
8. If '=', return ASSIGN
9. Otherwise, error: unknown character
Recursive Descent Parsing
parseExpression():
left = parseTerm()
while current token is + or -:
op = current token
advance()
right = parseTerm()
left = apply op to (left, right)
return left
parseTerm():
left = parseFactor()
while current token is * or /:
op = current token
advance()
right = parseFactor()
left = apply op to (left, right)
return left
parseFactor():
if current is NUMBER:
return its value
if current is VARIABLE:
return symbol_table[name]
if current is LPAREN:
advance()
result = parseExpression()
expect RPAREN
return result
error: unexpected token
5. Implementation Guide
5.1 Development Environment Setup
# Create project directory
mkdir calculator && cd calculator
# Create source files
touch main.cpp token.hpp tokenizer.hpp tokenizer.cpp parser.hpp parser.cpp
# Compile (C++17 for std::optional, string_view)
g++ -std=c++17 -Wall -Wextra -o calculator main.cpp tokenizer.cpp parser.cpp
# Or with CMake
cat > CMakeLists.txt << 'EOF'
cmake_minimum_required(VERSION 3.10)
project(calculator)
set(CMAKE_CXX_STANDARD 17)
add_executable(calculator main.cpp tokenizer.cpp parser.cpp)
EOF
mkdir build && cd build && cmake .. && make
5.2 Project Structure
calculator/
├── CMakeLists.txt
├── include/
│ ├── token.hpp # Token struct definition
│ ├── tokenizer.hpp # Tokenizer class declaration
│ └── parser.hpp # Parser class declaration
├── src/
│ ├── main.cpp # REPL and main function
│ ├── tokenizer.cpp # Tokenizer implementation
│ └── parser.cpp # Parser implementation
└── tests/
└── test_calculator.cpp # Unit tests
5.3 The Core Question You’re Answering
“How do I transform a string of characters into a computed numerical result while respecting mathematical rules?”
This question decomposes into:
- How do I recognize meaningful units (tokens) in text?
- How do I understand the structure of those tokens (parsing)?
- How do I handle operator precedence correctly?
- How do I evaluate the parsed structure to get a result?
5.4 Concepts You Must Understand First
Before starting, verify you can answer:
| Question | Book Reference |
|---|---|
What is the difference between std::string and const char*? |
C++ Primer Plus Ch. 4 |
How do you use std::map to store and retrieve values? |
C++ Primer Plus Ch. 16 |
| What is a recursive function and when does it terminate? | C++ Primer Plus Ch. 7 |
How do exceptions work in C++ (try, catch, throw)? |
C++ Primer Plus Ch. 15 |
What is the difference between ++i (prefix) and i++ (postfix)? |
C++ Primer Plus Ch. 5 |
5.5 Questions to Guide Your Design
Tokenizer Design
- How will you handle multi-character numbers like
123.45? - How will you distinguish between
-as subtraction vs negative number? - What happens when you encounter an unknown character?
Parser Design
- How will you track the “current” token?
- What should
parseFactor()do when it sees a VARIABLE token? - How will you handle the case of missing closing parenthesis?
Error Handling
- What types of errors are possible? (Lexical, syntactic, semantic, runtime)
- How will you report line/column numbers for errors?
- Should you continue parsing after an error or stop immediately?
State Management
- How will you store variables between expressions?
- How will you format history entries?
- When should history be saved? (Every expression? Only successful ones?)
5.6 Thinking Exercise
Before writing code, trace through this expression by hand:
Input: "2 + 3 * 4 - 1"
Step 1: Tokenize
Tokens: [NUMBER(2), PLUS, NUMBER(3), MULT, NUMBER(4), MINUS, NUMBER(1), END]
Step 2: Parse (trace the recursive calls)
parseExpression():
parseTerm():
parseFactor() -> 2
no * or /, return 2
left = 2
see PLUS
parseTerm():
parseFactor() -> 3
see MULT
parseFactor() -> 4
compute 3 * 4 = 12
no more * or /, return 12
left = 2 + 12 = 14
see MINUS
parseTerm():
parseFactor() -> 1
no * or /, return 1
left = 14 - 1 = 13
no more + or -, return 13
Result: 13
Now trace "(2 + 3) * 4" and verify you get 20.
5.7 Hints in Layers
Hint 1: Starting Point (Conceptual) Begin with the tokenizer. Write a class that converts a string to a vector of tokens. Test it independently before moving to parsing.
Hint 2: Next Level (More Specific)
For the tokenizer, use a single nextToken() method that returns one token and advances position. Use peek() to look at the current character without consuming it.
class Tokenizer {
std::string input;
size_t pos = 0;
char peek() { return pos < input.size() ? input[pos] : '\0'; }
char advance() { return input[pos++]; }
void skipWhitespace();
Token parseNumber();
Token parseIdentifier();
public:
Tokenizer(const std::string& input) : input(input) {}
Token nextToken();
std::vector<Token> tokenize();
};
Hint 3: Technical Details (Approach) For the parser, store all tokens in a vector and use an index to track position:
class Parser {
std::vector<Token> tokens;
size_t pos = 0;
std::map<std::string, double>& variables;
Token& current() { return tokens[pos]; }
void advance() { pos++; }
bool match(Token::Type type);
double parseExpression();
double parseTerm();
double parseFactor();
public:
Parser(const std::vector<Token>& tokens,
std::map<std::string, double>& vars)
: tokens(tokens), variables(vars) {}
double parse();
};
Hint 4: Implementation Details Handle assignment as a special case in the REPL, not in the parser:
// In main REPL loop
std::string line;
while (std::getline(std::cin, line)) {
// Check for assignment: "x = expr"
auto eqPos = line.find('=');
if (eqPos != std::string::npos) {
std::string varName = trim(line.substr(0, eqPos));
std::string expr = line.substr(eqPos + 1);
double value = evaluate(expr);
variables[varName] = value;
std::cout << varName << " = " << value << "\n";
} else {
double result = evaluate(line);
std::cout << "Result: " << result << "\n";
history.push_back({line, result});
}
}
5.8 The Interview Questions They’ll Ask
- “What is the difference between a lexer and a parser?”
- Lexer (tokenizer) converts characters to tokens (lexical analysis)
- Parser understands structure of tokens according to grammar (syntactic analysis)
- “How does your parser handle operator precedence?”
- Grammar structure: lower precedence operators at higher levels
parseExpressionhandles +/-, callsparseTermparseTermhandles *//, callsparseFactor- This ensures */ evaluated before +/-
- “What happens if the user enters an invalid expression?”
- Tokenizer throws on unknown characters
- Parser throws on unexpected tokens (e.g., two operators in a row)
- Division by zero caught at evaluation time
- All errors caught with try/catch in REPL
- “Could you extend this to support functions like sin(x)?”
- Add FUNCTION token type in tokenizer
- Extend parseFactor to recognize function calls
parseFactorwould callparseExpressionfor the argument- Store functions in a map<string, function<double(double)»
- “How would you add support for unary minus like -5 or -(3+2)?”
- Handle in
parseFactor: if current is MINUS, consume it and negate result - Must distinguish unary (at start or after operator) from binary (between numbers)
- Handle in
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| String Streams | C++ Primer Plus | Chapter 17 |
| Recursive Descent Parsing | Compilers: Principles and Practice | Chapter 4 |
| STL Maps | The C++ Programming Language | Chapter 31 |
| Exception Handling | C++ Primer Plus | Chapter 15 |
| Grammar and Languages | Engineering a Compiler | Chapter 3 |
5.10 Implementation Phases
Phase 1: Token Infrastructure (Day 1-2)
- Define Token struct with all types
- Write basic tokenizer that handles numbers and operators
- Test: tokenize “3 + 4” returns [NUMBER(3), PLUS, NUMBER(4), END]
Phase 2: Basic Parser (Day 3-4)
- Implement parseFactor (numbers only)
- Implement parseTerm (multiplication and division)
- Implement parseExpression (addition and subtraction)
- Test: “3 + 4 * 2” returns 11
Phase 3: Parentheses (Day 5)
- Extend parseFactor to handle LPAREN
- Add error checking for missing RPAREN
- Test: “(3 + 4) * 2” returns 14
Phase 4: Variables (Day 6-7)
- Add VARIABLE token type
- Extend tokenizer to recognize identifiers
- Add symbol table (std::map)
- Handle assignment in REPL
- Test: “x = 5” then “x * 2” returns 10
Phase 5: Polish (Day 8-10)
- Add history command
- Improve error messages
- Handle edge cases (empty input, very long numbers)
- Add decimal number support
- Write comprehensive tests
5.11 Key Implementation Decisions
| Decision | Options | Recommendation |
|---|---|---|
| Token storage | One at a time vs vector | Vector (simpler parser) |
| Parser output | Value vs AST | Value (for this project) |
| Number type | int vs double | double (more flexible) |
| Variable lookup | map vs unordered_map | map (ordered iteration) |
| Error handling | Return codes vs exceptions | Exceptions (cleaner) |
6. Testing Strategy
6.1 Unit Tests
// Test tokenizer
void testTokenizer() {
Tokenizer t("3 + 4.5");
auto tokens = t.tokenize();
assert(tokens[0].type == Token::NUMBER && tokens[0].value == 3);
assert(tokens[1].type == Token::PLUS);
assert(tokens[2].type == Token::NUMBER && tokens[2].value == 4.5);
assert(tokens[3].type == Token::END);
}
// Test parser
void testParser() {
std::map<std::string, double> vars;
assert(evaluate("3 + 4 * 2", vars) == 11);
assert(evaluate("(3 + 4) * 2", vars) == 14);
assert(evaluate("10 / 2 - 3", vars) == 2);
}
// Test variables
void testVariables() {
std::map<std::string, double> vars;
vars["x"] = 10;
assert(evaluate("x + 5", vars) == 15);
assert(evaluate("x * x", vars) == 100);
}
6.2 Edge Cases to Test
| Input | Expected |
|---|---|
"" |
Error or 0 |
42 |
42 |
3.14159 |
3.14159 |
1 + 2 + 3 + 4 |
10 |
((((1)))) |
1 |
1 / 0 |
Error |
x (undefined) |
Error |
1 + |
Error |
* 5 |
Error |
1 2 3 |
Error |
7. Common Pitfalls & Debugging
| Problem | Symptom | Root Cause | Fix |
|---|---|---|---|
| Wrong precedence | 3 + 4 * 2 = 14 |
Parsing +/- same level as */ | Separate into parseExpression and parseTerm |
| Infinite loop | Program hangs | Parser not advancing token | Ensure every path calls advance() |
| Missing token | “Unexpected end” | Tokenizer ends too early | Check loop condition in tokenizer |
| Variable not found | Runtime error | Using map[] for lookup |
Use map.find() or map.at() with try/catch |
| Number parsing wrong | 3.5 becomes 3 |
Not handling decimal point | Continue parsing digits after . |
Debugging Tips
- Print tokens before parsing to verify tokenizer works
- Add tracing to parser functions: print which function and current token
- Test small first: verify
5works before3 + 4 * 2 - Use assertions liberally during development
8. Extensions & Challenges
Once the basic calculator works, try these extensions:
| Extension | Difficulty | Concepts Learned |
|---|---|---|
Unary minus: -5, -(3+2) |
Easy | Token context |
Power operator: 2^3 |
Medium | Right associativity |
Functions: sin(x), sqrt(x) |
Medium | Function call parsing |
Comparison: x > 5 |
Medium | Boolean results |
Multiple statements: x=5; y=x*2 |
Medium | Statement separation |
Comments: 5 + 3 // add |
Easy | Tokenizer modification |
| Error recovery | Hard | Parser error handling |
| Build AST instead of evaluating | Hard | Tree data structures |
9. Real-World Connections
How Production Systems Solve This
| System | Approach |
|---|---|
| Python | Hand-written recursive descent parser |
| JavaScript (V8) | Recursive descent with operator precedence climbing |
| GCC | LALR parser (generated by Bison) |
| Excel | Custom expression parser with function support |
| Calculator apps | Similar recursive descent |
Industry Relevance
- Every IDE has an expression evaluator for debuggers
- Build systems (CMake, Make) parse configuration languages
- Query builders parse SQL-like expressions
- Template engines parse interpolation syntax
10. Resources
Primary References
- Stroustrup, B. “The C++ Programming Language, 4th Edition” - Chapter 10 (Expressions)
- Prata, S. “C++ Primer Plus, 6th Edition” - Chapters 4, 7, 15, 16
- Parr, T. “Language Implementation Patterns” - Chapters 2-4
Online Resources
11. Self-Assessment Checklist
Before considering this project complete, verify:
- Can tokenize:
3.14 + x * (y - 2) - Respects precedence:
3 + 4 * 2 = 11 - Handles parentheses:
(3 + 4) * 2 = 14 - Variables work:
x = 10thenx * 2 = 20 - History command shows past calculations
- Division by zero gives clear error
- Undefined variable gives clear error
- Malformed expression gives clear error
- No memory leaks (run with Valgrind)
- Code compiles with
-Wall -Wextrawithout warnings
12. Submission / Completion Criteria
This project is complete when:
- All functional requirements (F1-F10) are implemented
- All example expressions from section 3.4 work correctly
- Edge cases are handled gracefully with clear error messages
- Code is well-organized into separate header/source files
- At least 10 unit tests pass
- No memory leaks detected by Valgrind
- You can explain how precedence is implemented in your parser
- You can extend it to add one new operator (e.g., power
^)
Deliverables:
- Source code with clear comments
- README with build instructions
- Test file demonstrating functionality
- Brief writeup explaining your design decisions
This project establishes the foundation for language processing. The patterns you learn here—tokenization, recursive descent, symbol tables—appear in every compiler, interpreter, and configuration parser. Master this, and you’ve taken the first step toward understanding how programming languages work.