Project 1: Scientific Calculator from Scratch
A command-line calculator that parses mathematical expressions like
3 + 4 * (2 - 1) ^ 2and evaluates them correctly, handling operator precedence, parentheses, and mathematical functions (sin, cos, log, exp, sqrt).
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Beginner (The Tinkerer) |
| Main Programming Language | Python |
| Alternative Programming Languages | C, JavaScript, Rust |
| Coolness Level | Level 2: Practical but Forgettable |
| Business Potential | 1. The “Resume Gold” (Educational/Personal Brand) |
| Knowledge Area | Expression Parsing / Numerical Computing |
| Software or Tool | Calculator Engine |
| Main Book | “C Programming: A Modern Approach” by K. N. King (Chapter 7: Basic Types) |
1. Learning Objectives
By completing this project, you will:
- Translate math definitions into deterministic implementation steps.
- Build validation checks that make correctness observable.
- Diagnose numerical, logical, and data-shape failures early.
- Explain tradeoffs in interviews using evidence from your own build.
2. All Theory Needed (Per-Concept Breakdown)
This project applies the following theory clusters:
- Symbolic-to-numeric translation (expressions, data shapes, invariants)
- Stability constraints (precision, scaling, stopping criteria)
- Optimization or inference logic (depending on project objective)
- Evaluation discipline (error analysis, test coverage, reproducibility)
Concept A: Mathematical Representation Discipline
Fundamentals A math expression is not executable until you define representation, ordering, and domain constraints. The same equation can be represented as a token stream, tree, matrix pipeline, or probability graph. Choosing representation determines what bugs you can catch early.
Deep Dive into the concept Most project failures begin before algorithm selection: they start with ambiguous representation. If your parser cannot distinguish unary minus from subtraction, your calculator fails. If your matrix dimensions are implicit rather than validated, your linear algebra pipeline fails silently. If your probabilistic assumptions (independence, stationarity, or class priors) are not explicit, your inference can look accurate on one split and collapse on another. The core implementation move is to treat representation as a contract. Define each object with shape, domain, and semantic intent. Then enforce invariants at boundaries: input parser, preprocessing, training loop, evaluation stage. This makes debugging local instead of global.
How this fits this project You will encode each operation with explicit contracts and invariant checks.
Definitions & key terms
- Invariant: Property that must hold before and after each operation.
- Shape contract: Expected dimensional structure of vectors/matrices/tensors.
- Domain constraint: Allowed value range (for example log input > 0).
Mental model diagram
User Input -> Representation Layer -> Validated Operation -> Observable Output
(tokens/shapes) (invariants pass) (tests/plots/logs)
How it works
- Parse/ingest data into typed structures.
- Validate shape/domain invariants.
- Execute operation.
- Compare observed output with expected behavior.
- Record failure signature if mismatch appears.
Minimal concrete example
PSEUDOCODE
read expression
tokenize with precedence rules
if token sequence invalid -> return syntax error
evaluate tree
if domain violation -> return bounded diagnostic
print value and confidence check
Common misconceptions
- “If it runs once, representation is correct.” -> false.
- “Type checks are enough without shape checks.” -> false.
Check-your-understanding questions
- Which invariant catches division-by-zero earliest?
- Why does shape validation belong at boundaries rather than only in core logic?
- Predict failure if tokenization ignores unary minus.
Check-your-understanding answers
- Domain check on denominator before operation execution.
- Boundary validation keeps errors local and diagnostic.
- Expressions like
-2^2get misinterpreted and produce wrong precedence behavior.
Real-world applications Feature preprocessing, model-serving input validation, and experiment-tracking schema enforcement.
Where you’ll apply it This project and every downstream project in the sprint.
References
- CSAPP (Bryant & O’Hallaron), floating-point chapter
- Math for Programmers (Paul Orland), representation-oriented chapters
Key insight Correct representation reduces the complexity of every later decision.
Summary Stable ML math implementations start with explicit contracts, not implicit assumptions.
Homework/Exercises
- Write five invariants for your project.
- Build a failing test input for each invariant.
Solutions
- Include at least one shape, one domain, one convergence, one reproducibility, and one output-range invariant.
- Each failing input should trigger exactly one diagnostic to keep root-cause analysis clean.
3. Build Blueprint
- Scope the smallest end-to-end slice that produces visible output.
- Add deterministic tests and edge-case probes.
- Layer complexity only after baseline behavior is stable.
- Add metrics logging before optimization.
- Run failure drills: perturb inputs, scale values, and check stability.
4. Real-World Outcome (Target)
$ ./calculator
> 3 + 4 * 2
11
> (3 + 4) * 2
14
> sqrt(16) + log(exp(5))
9.0
> sin(3.14159/2)
0.9999999999
> 2^10
1024
Implementation Hints:
The key insight is that mathematical expressions have a grammar. The Shunting Yard algorithm (by Dijkstra) converts infix notation to postfix (Reverse Polish Notation), which is trivial to evaluate with a stack. For functions like sin, cos, treat them as unary operators with highest precedence.
For the math itself:
- Exponentiation:
a^bmeans “multiply a by itself b times” - Logarithm:
log_b(x) = ymeans “b raised to y equals x” (inverse of exponentiation) - Trigonometry: Implement using Taylor series:
sin(x) = x - x³/3! + x⁵/5! - ...
Learning milestones:
- Basic arithmetic works with correct precedence → You understand PEMDAS deeply
- Parentheses and nested expressions work → You understand expression trees
- Transcendental functions (sin, log, exp) work → You understand these fundamental relationships
5. Core Design Notes from Main Guide
Core Question
“What does it really mean for a computer to ‘understand’ mathematics?”
When you type 3 + 4 * 2 into any calculator, it returns 11, not 14. But how does the machine know that multiplication comes before addition? How does it know that sin(3.14159/2) should return approximately 1? This project forces you to confront a deep question: mathematical notation is a human invention with implicit rules that we’ve internalized since childhood. A computer has no such intuition—you must teach it every rule explicitly. By building a calculator from scratch, you discover that mathematics is not about numbers but about structure—and that structure can be represented as a tree.
Concepts You Must Understand First
Stop and research these before coding:
- Operator Precedence (PEMDAS/BODMAS)
- Why does multiplication happen before addition?
- What happens when operators have equal precedence (like
+and-)? - How do parentheses override the natural order?
- Book Reference: “C Programming: A Modern Approach” Chapter 4 - K. N. King
- Expression Trees (Abstract Syntax Trees)
- How can an equation be represented as a tree structure?
- What does “parsing” mean and why is it necessary?
- How do you traverse a tree to evaluate an expression?
- Book Reference: “Compilers: Principles, Techniques, and Tools” Chapter 2 - Aho et al.
- The Shunting Yard Algorithm
- How does Dijkstra’s algorithm convert infix to postfix notation?
- What is a stack and why is it essential here?
- How do you handle both left-associative and right-associative operators?
- Book Reference: “Algorithms” Chapter 4.3 - Sedgewick & Wayne
- Transcendental Functions and Their Domains
- Why can’t you take the logarithm of a negative number (in the reals)?
- What is the unit circle and how does it define sine and cosine?
- Why is
log(exp(x)) = xbutexp(log(x)) = xonly forx > 0? - Book Reference: “Calculus: Early Transcendentals” Chapter 1 - James Stewart
- Floating-Point Representation
- Why does
0.1 + 0.2not equal0.3exactly? - What are precision limits and how do they affect calculations?
- When should you worry about floating-point errors?
- Book Reference: “Computer Systems: A Programmer’s Perspective” Chapter 2.4 - Bryant & O’Hallaron
- Why does
- Taylor Series Expansions
- How can you compute
sin(x)from just addition and multiplication? - What is convergence and how many terms do you need?
- Why do Taylor series work for some functions but not others?
- Book Reference: “Calculus” Chapter 11 - James Stewart
- How can you compute
Questions to Guide Your Design
Before implementing, think through these:
- How will you represent mathematical expressions internally? As strings? As trees? As a list of tokens?
- What happens when the user enters invalid input like
3 + + 5orsin()? - How will you distinguish between the subtraction operator
-and a negative number? - Should your calculator support variables like
x = 5thenx + 3? - How will you handle functions that take multiple arguments, like
max(3, 5)? - What precision should your calculator use? How will you display results?
Thinking Exercise
Hand-trace the Shunting Yard algorithm before coding:
Take the expression: 3 + 4 * 2 ^ 2 - 1
Using a piece of paper, maintain two data structures:
- Output queue (will hold the result in postfix notation)
- Operator stack (temporary holding for operators)
Rules to apply:
- Numbers go directly to output queue
- Operators go to stack, BUT first pop higher-precedence operators from stack to output
^(exponentiation) is right-associative; others are left-associative- At the end, pop all remaining operators to output
After hand-tracing, your output queue should contain: 3 4 2 2 ^ * + 1 -
Now evaluate this postfix expression using a single stack:
- Numbers push to stack
- Operators pop two numbers, compute, push result
Verify you get 18 (since 2^2 = 4, then 4*4 = 16, then 3+16 = 19, then 19-1 = 18).
Did you catch all the steps? This is exactly why you need to implement and test carefully.
Interview Questions
- “Explain how you would parse and evaluate the expression
2 + 3 * 4without usingeval()or a library.”- Expected: Describe tokenization, operator precedence, and either recursive descent parsing or Shunting Yard algorithm.
- “What data structure would you use to represent a mathematical expression, and why?”
- Expected: Expression tree (AST), because it naturally represents the hierarchical structure and makes evaluation recursive and clean.
- “How would you add support for user-defined functions like
f(x) = x^2?”- Expected: Store function definitions in a symbol table, substitute values when function is called.
- “What’s the difference between a syntax error and a semantic error in expression parsing?”
- Expected: Syntax error = malformed expression (
3 + + 5); semantic error = valid syntax but meaningless (sqrt(-1)in reals).
- Expected: Syntax error = malformed expression (
- “How would you implement implicit multiplication, like
2(3+4)instead of2*(3+4)?”- Expected: In tokenizer, insert implicit
*when a number is followed by(or a function name.
- Expected: In tokenizer, insert implicit
- “What are the trade-offs between recursive descent parsing and the Shunting Yard algorithm?”
- Expected: Recursive descent is more flexible and handles complex grammars; Shunting Yard is simpler for arithmetic expressions and easily handles precedence.
- “How would you test a calculator to ensure it handles edge cases correctly?”
- Expected: Test negative numbers, division by zero, very large/small numbers, deeply nested parentheses, and operator associativity.
Hints in Layers (Treat as pseudocode guidance)
Hint 1: Start with a simple grammar
Begin by only supporting +, -, *, / on integers with no parentheses. Get this working perfectly before adding complexity. Tokenize first (split "3+4" into ["3", "+", "4"]), then parse.
Hint 2: Use two stacks for simple evaluation For basic expressions, you can use the “two-stack algorithm”: one stack for numbers, one for operators. When you see a higher-precedence operator, push it. When you see a lower-precedence operator, pop and evaluate first.
Hint 3: For parentheses, use recursion or markers
When you encounter (, you can either: (a) recursively parse the sub-expression until ), or (b) push a marker onto the operator stack and pop until you hit it.
Hint 4: Functions are just unary operators with highest precedence
Treat sin, cos, log as operators that take one argument. When you see sin, push it to the operator stack. When you see the closing ) of its argument, pop and evaluate.
Hint 5: For the Shunting Yard algorithm, precedence table is key
Create a dictionary: {'(': 0, '+': 1, '-': 1, '*': 2, '/': 2, '^': 3}. Right-associative operators pop only when incoming operator has strictly lower precedence; left-associative pop when lower or equal.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Operator precedence in programming | “C Programming: A Modern Approach” | Chapter 4 - K. N. King |
| Parsing algorithms | “Compilers: Principles, Techniques, and Tools” | Chapter 4 - Aho, Sethi, Ullman |
| Shunting Yard and stacks | “Algorithms” | Chapter 4.3 - Sedgewick & Wayne |
| Floating-point representation | “Computer Systems: A Programmer’s Perspective” | Chapter 2.4 - Bryant & O’Hallaron |
| Mathematical functions and domains | “Calculus: Early Transcendentals” | Chapter 1 - James Stewart |
| Taylor series for computing functions | “Calculus” | Chapter 11 - James Stewart |
| Expression evaluation in Lisp | “Structure and Interpretation of Computer Programs” | Chapter 1 - Abelson & Sussman |
6. Validation, Pitfalls, and Completion
Common Pitfalls and Debugging
Problem 1: “Outputs drift after a few iterations”
- Why: Hidden numerical instability (unscaled features, aggressive step size, or repeated subtraction of nearly equal values).
- Fix: Normalize inputs, reduce step size, and track relative error rather than only absolute error.
- Quick test: Run the same task with two scales of input (for example x and 10x) and compare normalized error curves.
Problem 2: “Results are inconsistent across runs”
- Why: Random seeds, data split randomness, or non-deterministic ordering are uncontrolled.
- Fix: Set seeds, log configuration, and store split indices and hyperparameters with each run.
- Quick test: Re-run three times with the same seed and confirm metrics remain inside a tight tolerance band.
Problem 3: “The project works on the demo case but fails on edge cases”
- Why: Tests only cover happy-path inputs.
- Fix: Add adversarial inputs (empty values, extreme ranges, near-singular matrices, rare classes).
- Quick test: Build an edge-case test matrix and ensure every scenario reports expected behavior.
Definition of Done
- Core functionality works on reference inputs
- Edge cases are tested and documented
- Results are reproducible (seeded and versioned configuration)
- Performance or convergence behavior is measured and explained
- A short retrospective explains what failed first and how you fixed it
7. Extension Ideas
- Add a stress-test mode with adversarial inputs.
- Add a short benchmark report (runtime + memory + error trend).
- Add a reproducibility bundle (seed, config, and fixed test corpus).
8. Why This Project Matters
You cannot build a calculator without understanding the order of operations (PEMDAS), how functions transform inputs to outputs, and the relationship between exponents and logarithms. Implementing log(exp(x)) = x forces you to understand these as inverse operations.
This project is valuable because it creates observable evidence of mathematical reasoning under real implementation constraints.