Project 14: Script Interpreter
Implement a scripting interpreter with control flow, functions, and positional parameters.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert (The Systems Architect) |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go, OCaml |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 1. The “Resume Gold” (Educational/Personal Brand) |
| Prerequisites | AST parsing, exit status semantics, variable scoping |
| Key Topics | compound commands, functions, positional params |
1. Learning Objectives
By completing this project, you will:
- Explain and implement compound commands in the context of a shell.
- Build a working script interpreter that matches the project specification.
- Design tests that validate correctness and edge cases.
- Document design decisions, trade-offs, and limitations.
2. All Theory Needed (Per-Concept Breakdown)
Shell Scripting, Compound Commands, and Control Flow
Fundamentals
Shell scripting extends command execution into a full programming language. Compound commands like if, while, for, and case are parsed into AST nodes and executed based on exit status truthiness. Variables, positional parameters, and functions provide structure and reuse. A script interpreter must therefore evaluate command lists, manage scopes, and propagate exit codes correctly. This is where shell parsing meets language execution semantics.
Deep Dive into the concept
Shell control flow is unusual because it treats commands as boolean expressions: if cmd; then ... runs the then branch only if cmd exits with 0. This means your interpreter must execute command lists for their side effects and then inspect $? to decide control flow. In while loops, the condition is a command list that is executed before each iteration. In until, the loop continues until the command succeeds. This model is consistent but can be surprising if you are used to languages where conditions are explicit expressions.
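The condition-as-command model can be sketched as a tiny tree walk. This is an illustrative toy, not the project's required API: leaf nodes carry a fixed status standing in for actually running a command list, and all names here (NodeKind, exec_node) are assumptions for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy evaluator: a NODE_STATUS leaf "runs" a command list and yields
 * its exit status; a NODE_IF node branches on that status. */
typedef enum { NODE_STATUS, NODE_IF } NodeKind;

typedef struct Node Node;
struct Node {
    NodeKind kind;
    int status;                      /* NODE_STATUS: pretend exit status */
    Node *cond, *then_b, *else_b;    /* NODE_IF children */
};

int exec_node(Node *n) {
    switch (n->kind) {
    case NODE_STATUS:
        return n->status;            /* stand-in for running commands */
    case NODE_IF:
        /* Shell truthiness: exit status 0 selects the then branch. */
        if (exec_node(n->cond) == 0)
            return exec_node(n->then_b);
        return n->else_b ? exec_node(n->else_b) : 0;
    }
    return 0;
}
```

Note how an `if` with a failing condition and no `else` yields status 0, which mirrors real shell behavior.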
Compound commands have their own syntax and parsing rules. An if block includes then, optional elif and else, and ends with fi. A while loop requires do/done delimiters, and a for loop iterates over a list that can be produced by expansions. case statements match patterns against a word. Your parser must create distinct AST nodes for these constructs, because each has different execution rules and scoping behaviors. For example, a case node needs a list of pattern-action pairs; a for node needs a variable name and a list of words or a C-style iteration expression depending on your grammar.
Functions introduce scoping and positional parameters. When a function is called, $1, $2, and $@ should refer to the function’s arguments, not the script’s arguments. This requires a scope stack for positional parameters and local variables. When the function returns (via return or by reaching the end), the previous positional parameters must be restored. Many shells also allow local variables within functions; if you implement this, it should push a new variable scope on entry and pop it on exit.
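The scope stack for positional parameters can be sketched as a small frame stack. All names here (ParamFrame, push_params) are illustrative, and the fixed-size array is a simplification; a real shell would grow dynamically.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch: each function call pushes a frame so $1..$n and $# refer to
 * the call's arguments; returning pops it, restoring the caller's view. */
#define MAX_FRAMES 64

typedef struct {
    char **argv;   /* $1..$n live in argv[0..n-1] */
    int argc;      /* $# */
} ParamFrame;

static ParamFrame frames[MAX_FRAMES];
static int frame_top = 0;            /* frames[frame_top-1] is current */

int push_params(char **argv, int argc) {
    if (frame_top >= MAX_FRAMES) return -1;   /* recursion too deep */
    frames[frame_top].argv = argv;
    frames[frame_top].argc = argc;
    frame_top++;
    return 0;
}

void pop_params(void) {
    if (frame_top > 0) frame_top--;
}

/* Lookup for $N (1-based); NULL when unset, like an empty expansion. */
const char *get_positional(int n) {
    if (frame_top == 0) return NULL;
    ParamFrame *f = &frames[frame_top - 1];
    return (n >= 1 && n <= f->argc) ? f->argv[n - 1] : NULL;
}
```

The key property to preserve: after `pop_params()`, lookups see exactly what the caller saw before the call.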
Error handling in scripts is subtle. Options like set -e change the behavior so that the script exits when a command fails, but there are exceptions (e.g., inside if conditions). You may choose to support a subset of these rules initially, but you must be explicit about what you do. Even without set -e, you must ensure return and exit propagate status correctly and unwind execution stacks as needed.
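One common way to make return and exit unwind correctly is to thread a control-flow flag alongside the status through the executor. This is a sketch under assumed names (Flow, ExecResult); here, leaf results stand in for executing real list items.

```c
#include <assert.h>

/* Sketch: `return` stops the current function body, `exit` stops the
 * whole script; ordinary commands leave the flow untouched. */
typedef enum { FLOW_NORMAL, FLOW_RETURN, FLOW_EXIT } Flow;

typedef struct { Flow flow; int status; } ExecResult;

/* A command-list walk that stops early when a child result unwinds. */
ExecResult exec_list_items(const ExecResult *items, int n) {
    ExecResult r = { FLOW_NORMAL, 0 };
    for (int i = 0; i < n; i++) {
        r = items[i];                 /* stand-in for executing item i */
        if (r.flow != FLOW_NORMAL)
            break;                    /* return/exit unwinds the list */
    }
    return r;
}

/* Function-call boundary: `return` is absorbed here; `exit` keeps
 * propagating upward until the script itself terminates. */
ExecResult exec_function_body(ExecResult body) {
    if (body.flow == FLOW_RETURN)
        return (ExecResult){ FLOW_NORMAL, body.status };
    return body;
}
```

The same pattern extends to loops, where a `break`/`continue` flag would be absorbed at the loop boundary instead.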
Script execution differs from interactive execution in terms of prompts, error handling, and environment. Scripts often run non-interactively and should exit on syntax errors rather than recover. When a script starts, it should process the shebang line (#!) if invoked directly. If your shell is the interpreter, you need to parse the script file line by line, handle multi-line constructs, and build a consistent AST. This is where your parser and executor must be robust enough for large inputs rather than single-line commands.
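Loading a script starts with the shebang line. A minimal sketch of skipping it before handing the rest of the file to the parser (the function name is illustrative):

```c
#include <assert.h>
#include <string.h>

/* If the buffer starts with "#!", return a pointer to the character
 * after the first newline; otherwise return the buffer unchanged. */
const char *skip_shebang(const char *src) {
    if (src[0] == '#' && src[1] == '!') {
        const char *nl = strchr(src, '\n');
        return nl ? nl + 1 : src + strlen(src);  /* no newline: empty script */
    }
    return src;
}
```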
How this fits on projects
Scripting is the capstone for parsing and execution; it ties together the shell grammar, variable system, and exit status model.
Definitions & key terms
- Compound command: Multi-token control-flow construct (if, while, case).
- Positional parameters: $1, $2, etc., representing arguments.
- Return status: Exit code set by return or by the last command.
- Shebang: Script interpreter directive at file start.
Mental model diagram
AST -> evaluate node -> run commands -> check $?
if/while/case nodes manage control flow
How it works (step-by-step)
- Parse script into an AST with compound nodes.
- Execute nodes recursively, using $? to guide decisions.
- Manage function call frames and positional parameters.
- Handle return and exit to unwind execution.
- Report errors and set script exit status.
Minimal concrete example
if false; then echo ok; else echo fail; fi
# prints "fail" because false exits non-zero
Common misconceptions
- “Conditions are boolean expressions” -> they are command lists.
- “Functions share the same $1” -> functions have their own params.
- “Script errors behave like interactive errors” -> scripts usually exit.
Check-your-understanding questions
- Why is if false; then ... valid in shell?
- What does $1 refer to inside a function?
- How should return 2 affect $??
Check-your-understanding answers
- Because the condition is a command; false sets the exit status.
- The function’s first argument, not the script’s.
- It sets the function’s return status to 2.
Real-world applications
- Shell scripts in CI/CD pipelines.
- System initialization scripts.
- Automation and deployment tooling.
Where you’ll apply it
- In this project: see §4.1 High-Level Design and §5.10 Phase 2.
- Also used in: None
References
- POSIX Shell Command Language (compound commands).
- “Shell Programming in Unix, Linux, and OS X”.
Key insights
Shell scripting uses commands as truth values and requires disciplined scope handling.
Summary
Implementing scripts turns a command runner into a language runtime, requiring correct parsing, scoping, and exit-status semantics.
Homework/Exercises to practice the concept
- Implement if and while nodes in a small AST evaluator.
- Add function calls with their own positional parameters.
- Parse and execute a script file with multiple blocks.
Solutions to the homework/exercises
- Evaluate condition commands and branch based on $?.
- Push a parameter frame on function entry, pop on return.
- Parse the file into a list node and execute sequentially.
Parsing, Precedence, and AST Construction
Fundamentals
Parsing transforms a linear stream of tokens into a structured representation that reflects operator precedence and command grouping. Shell syntax is deceptively complex: pipelines bind tighter than logical operators, and lists separated by ; or & are lower precedence than && and ||. The parser must also handle compound commands like if, while, and subshells, and integrate redirections that can appear anywhere in a simple command. A well-designed Abstract Syntax Tree (AST) makes execution straightforward because each node type encodes a specific execution strategy.
Deep Dive into the concept
Shell parsing is closer to a compiler front-end than to a simple expression parser. A robust approach is recursive descent with functions that mirror the grammar: parse_list handles sequences, parse_and_or handles && and ||, parse_pipeline handles |, and parse_command handles simple commands and compound constructs. This structure enforces precedence naturally: parse_pipeline is called inside parse_and_or, so pipelines bind tighter than logical operators. Associativity matters too; pipelines and logical operators are typically left-associative, meaning a | b | c becomes a pipeline node with three children rather than nested pipelines that reverse order.
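The precedence-by-call-structure idea can be shown with a toy parser over single-character tokens ('a'..'z' for commands, '|' for pipe, '&' standing in for &&). All names and the string rendering of the tree are illustrative, not the project's grammar; the point is that parse_and_or calls parse_pipeline for each operand, so pipelines group first and both operators fold left-associatively.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char *p;   /* cursor into the token string */

typedef struct Node {
    char op;            /* 'w' word, '|' pipeline, '&' and-or */
    char word;
    struct Node *l, *r;
} Node;

static Node *mk(char op, char word, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    n->op = op; n->word = word; n->l = l; n->r = r;
    return n;
}

static Node *parse_word(void) { return mk('w', *p++, NULL, NULL); }

static Node *parse_pipeline(void) {
    Node *n = parse_word();
    while (*p == '|') { p++; n = mk('|', 0, n, parse_word()); }
    return n;                        /* left-associative fold */
}

static Node *parse_and_or(void) {
    Node *n = parse_pipeline();      /* pipeline binds tighter */
    while (*p == '&') { p++; n = mk('&', 0, n, parse_pipeline()); }
    return n;
}

/* Render the tree with explicit parentheses to make grouping visible. */
static void render(Node *n, char *buf) {
    if (n->op == 'w') { strncat(buf, &n->word, 1); return; }
    strcat(buf, "(");
    render(n->l, buf);
    strncat(buf, &n->op, 1);
    render(n->r, buf);
    strcat(buf, ")");
}

char *parse_and_render(const char *input) {
    static char buf[128];
    buf[0] = 0;
    p = input;
    render(parse_and_or(), buf);
    return buf;
}
```

The sketch leaks nodes deliberately to stay short; a real parser needs ownership and error paths.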
Simple commands are particularly tricky because redirections can be interleaved with words: cmd arg1 > out arg2 < in is legal and should yield a command node with arguments [cmd, arg1, arg2] plus a list of redirections. The parser must therefore accept redirection tokens at any point in a simple command, collect them, and keep the argument order intact. If your parser instead treats redirection as terminating a command, you will mis-handle valid inputs.
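A sketch of that simple-command loop, with illustrative token and struct names: redirection tokens are accepted at any position and collected separately, while word order is preserved in argv.

```c
#include <assert.h>
#include <string.h>

typedef enum { T_WORD, T_REDIR_OUT, T_REDIR_IN, T_END } TokType;

typedef struct { TokType type; const char *text; } Token;

typedef struct {
    const char *argv[16]; int argc;                    /* words, in order */
    TokType redir_kinds[8];
    const char *redir_targets[8]; int nredir;          /* redirections */
} SimpleCmd;

/* Consume tokens until T_END; returns the number of tokens consumed.
 * (Fixed-size arrays and no overflow checks: sketch only.) */
int parse_simple(const Token *toks, SimpleCmd *cmd) {
    int i = 0;
    cmd->argc = 0; cmd->nredir = 0;
    for (; toks[i].type != T_END; i++) {
        if (toks[i].type == T_WORD) {
            cmd->argv[cmd->argc++] = toks[i].text;
        } else {
            /* redirection operator, then its target word */
            cmd->redir_kinds[cmd->nredir] = toks[i].type;
            cmd->redir_targets[cmd->nredir++] = toks[++i].text;
        }
    }
    return i;
}
```

Running it on the tokens of `cmd arg1 > out arg2 < in` yields argv [cmd, arg1, arg2] plus two redirections, exactly as the text describes.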
Compound commands add complexity. if, while, for, and case introduce nested lists of commands, often with keywords that also appear as regular words in other contexts. The parser must decide when a token like do or then is a keyword versus just a word. Most shells restrict keyword recognition to specific grammar positions, which means the parser’s current nonterminal determines whether a token is treated as a keyword. This is a subtle but essential rule.
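A first approximation of position-dependent keyword recognition, as a sketch: treat a word as a keyword only when the parser is in "command position". The real POSIX rule is finer-grained (e.g. which keyword is expected depends on the current nonterminal), so this is a simplification with illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Keywords like `then` or `do` only act as keywords at the start of a
 * command; elsewhere (e.g. `echo then`) they are ordinary words. */
bool is_keyword_here(const char *word, bool command_position) {
    static const char *kw[] = { "if", "then", "elif", "else", "fi",
                                "while", "until", "do", "done",
                                "for", "case", "esac" };
    if (!command_position)
        return false;
    for (size_t i = 0; i < sizeof kw / sizeof *kw; i++)
        if (strcmp(word, kw[i]) == 0)
            return true;
    return false;
}
```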
Error handling is a first-class concern. An interactive shell should distinguish incomplete input (missing fi, unmatched )) from invalid input. Incomplete input should trigger a continuation prompt; invalid input should report a syntax error and recover to the next command boundary. This requires the parser to track expected tokens and provide meaningful diagnostics. A good parser reports the unexpected token and the context (e.g., “unexpected | after &&”). This is not just UX; it helps you debug your own grammar.
The AST is more than a tree; it is the interface between parsing and execution. Each node type (SimpleCommand, Pipeline, AndOr, List, Subshell, If, While) should hold only what execution needs: arguments, redirections, children, and control flags. If the parser builds a clean AST, the executor can be a straightforward tree walk with predictable behavior. If the AST is messy or ambiguous, execution becomes a tangle of special cases.
How this fits on projects
Parsing underpins all control flow and pipeline semantics. Without a correct AST, execution will be wrong even if your runtime is perfect.
Definitions & key terms
- AST: Abstract Syntax Tree representing command structure.
- Precedence: Rules that determine grouping of operators.
- Associativity: Direction of grouping for operators of equal precedence.
- Nonterminal: A grammar production like pipeline or and_or.
Mental model diagram
Tokens --> Parser --> AST

  a | b && c
      |
      v
    AndOr
    /    \
Pipeline  c
  /  \
 a    b
How it works (step-by-step)
- Consume tokens with a recursive descent parser.
- Build nodes for pipelines, logical operators, and lists.
- Attach redirections to simple commands as a list.
- Produce AST root for execution phase.
- On error, report unexpected token and recover if interactive.
Minimal concrete example
Input: a | b && c
AST: AndOr(Pipeline(a,b), c)
Common misconceptions
- “Redirections only come at the end” -> they can appear anywhere.
- “| and && have the same precedence” -> pipeline binds tighter.
- “Keywords are always keywords” -> context decides keyword meaning.
Check-your-understanding questions
- Why must a | b && c parse as (a | b) && c?
- Where should redirections be stored in the AST?
- How should the parser react to an unfinished if block?
Check-your-understanding answers
- Because pipeline has higher precedence than &&.
- Attached to the simple command node as a list.
- Mark input incomplete and request continuation.
Real-world applications
- Shells and REPLs.
- Configuration languages with command syntax.
- DSLs that combine operators and commands.
Where you’ll apply it
- In this project: see §4 Solution Architecture and §5.10 Phase 2.
- Also used in: P03 Shell Parser (AST Builder), P15 POSIX-Compliant Shell, P16 Structured Data Shell (Nushell-Inspired)
References
- POSIX Shell Command Language (grammar).
- “Language Implementation Patterns” (recursive descent).
Key insights
The parser shapes execution; a correct AST makes execution trivial.
Summary
Parsing is the structural heart of a shell: it enforces precedence, groups commands, and defines control flow for the executor.
Homework/Exercises to practice the concept
- Write a tiny parser for pipelines and &&/|| and dump the AST.
- Add support for redirections in simple commands.
- Build a syntax error reporter that shows the unexpected token.
Solutions to the homework/exercises
- Use recursive descent with parse_pipeline inside parse_and_or.
- Allow redirection tokens anywhere in the simple-command loop.
- Track the last token and expected set, print a clear message.
Exit Status, Wait Semantics, and Truthiness
Fundamentals
Shells treat exit status as the primary truth value of command execution. An exit status is an integer reported when a process terminates; by convention, 0 means success and non-zero means failure. The shell reads this status via waitpid() and exposes it as $?. This is not just metadata: control flow operators (&&, ||, if, while) all use exit status to decide what to do next. A correct shell must distinguish normal exit from signal termination, map errors to standard codes like 126 and 127, and propagate the right status through pipelines and compound commands. This concept links process control with scripting semantics and is essential for correctness.
Deep Dive into the concept
When a child exits, the kernel encodes the reason in a status word returned by waitpid(). The shell must interpret this correctly: WIFEXITED(status) indicates normal completion and WEXITSTATUS(status) yields the actual exit code (0–255). WIFSIGNALED(status) indicates termination by a signal, and WTERMSIG(status) gives the signal number. Many shells represent signal termination as 128 + signal, so a process killed by SIGINT (2) yields status 130. This is not just a convention; scripts rely on these distinctions. For example, build scripts may treat 127 as “command not found” and fail fast, while they may retry on other codes.
Exit status also interacts with pipelines. POSIX allows the status of a pipeline to be either the status of the last command or a composite status, and some shells provide options (like pipefail) to change this behavior. If you are building a minimal shell, you should document your choice and implement it consistently. For job control, foreground pipelines typically update $? when the last process in the pipeline completes, while background pipelines update job tables but do not immediately change $?.
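The policy choice can be isolated in one small function, sketched here with illustrative names: given the collected exit statuses of every command in a pipeline, return either the last command's status (the common default) or, with a pipefail-style option, the status of the rightmost failing command.

```c
#include <assert.h>
#include <stdbool.h>

int pipeline_status(const int *statuses, int n, bool pipefail) {
    int result = statuses[n - 1];        /* default: last command */
    if (pipefail) {
        for (int i = 0; i < n; i++)
            if (statuses[i] != 0)
                result = statuses[i];    /* rightmost failure wins */
    }
    return result;                       /* all zero: stays 0 */
}
```

Whichever policy you pick, document it and apply it uniformly so scripts behave predictably.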
The concept of “truthiness” in shell scripting is inverted relative to many languages: success is 0 (true), and failure is non-zero (false). This can surprise programmers coming from C-like languages where 0 is false. In shell, if cmd; then ... fi runs the then branch if cmd exits with 0. Control operators && and || are just syntactic sugar over this truth model: a && b runs b only if a succeeded, and a || b runs b only if a failed. A shell that gets this wrong will make scripts behave unpredictably.
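The short-circuit rule for && and || follows directly from this truth model, sketched below as a toy tree walk (leaf nodes carry a fixed status standing in for running a pipeline; names are illustrative).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { L_STATUS, L_AND, L_OR } LKind;
typedef struct LNode { LKind kind; int status; struct LNode *l, *r; } LNode;

int exec_andor(LNode *n) {
    switch (n->kind) {
    case L_STATUS:
        return n->status;
    case L_AND: {
        int s = exec_andor(n->l);
        return s == 0 ? exec_andor(n->r) : s;   /* run r only on success */
    }
    case L_OR: {
        int s = exec_andor(n->l);
        return s != 0 ? exec_andor(n->r) : s;   /* run r only on failure */
    }
    }
    return 0;
}
```

This is why `false && echo ok` prints nothing: the left side yields non-zero, so the right side is never executed.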
Exit status also carries subtle edge cases. For instance, built-ins must set $? explicitly; functions return their last command’s exit status unless overridden by return. If the shell itself exits due to a fatal error, it should return a meaningful exit status to its parent (often 2 for syntax errors in POSIX). If a child is stopped (SIGTSTP), the status indicates stoppage rather than exit; a job control shell must track that and resume it later.
How this fits on projects
You will use exit status to decide control flow, report failures, and mirror real shell behavior in every project that executes commands.
Definitions & key terms
- Exit status: Integer code set by a process at termination.
- WIFEXITED/WEXITSTATUS: Macros to inspect normal exit codes.
- WIFSIGNALED/WTERMSIG: Macros to inspect signal termination.
- Truthiness: In shell, 0 is true and non-zero is false.
Mental model diagram
child exits -> kernel encodes status -> waitpid() -> shell updates $?
                         |
         +---------------+---------------+
         |                               |
    normal? -> exit code       signal? -> 128 + signal
How it works (step-by-step)
- Parent calls waitpid() for a child PID.
- Inspect WIFEXITED vs WIFSIGNALED.
- Map signal termination to 128 + signal.
- Store the code in $? and in job metadata.
- For pipelines, choose a consistent policy (last command or pipefail).
Minimal concrete example
int status, shell_status = 0;
waitpid(pid, &status, 0);
if (WIFEXITED(status)) {
    shell_status = WEXITSTATUS(status);
} else if (WIFSIGNALED(status)) {
    shell_status = 128 + WTERMSIG(status);
}
Common misconceptions
- “Non-zero is true” -> in shell, non-zero is false.
- “Signal termination has no exit code” -> shells map signals to 128+N.
- “Pipelines always return last status” -> depends on policy.
Check-your-understanding questions
- Why does false && echo ok not print anything?
- What should $? be when a process is killed by SIGKILL (9)?
- How does a shell decide the status of a pipeline?
Check-your-understanding answers
- Because false exits non-zero, so && short-circuits.
- 128 + 9 = 137 in most shells.
- By policy: often the last command, optionally pipefail.
Real-world applications
- Shell scripting control flow.
- CI pipelines that interpret exit codes.
- Service supervisors that restart failed programs.
Where you’ll apply it
- In this project: see §3.2 Functional Requirements and §6 Testing Strategy.
- Also used in: P01 Minimal Command Executor, P04 Built-in Commands Engine
References
- POSIX Shell Command Language (exit status rules).
- “Advanced Programming in the UNIX Environment” (waitpid).
Key insights
Exit status is the shell’s core truth value, not a minor detail.
Summary
Correct exit status handling ties process control to scripting semantics and determines whether higher-level logic behaves correctly.
Homework/Exercises to practice the concept
- Write a program that runs false and prints its exit status.
- Send SIGINT to a child and print the resulting $? equivalent.
- Build a tiny pipeline and decide how you will compute its status.
Solutions to the homework/exercises
- Use fork/exec and waitpid to capture the status.
- Use kill(child, SIGINT) and map to 128 + signal.
- Record the last child’s status, or implement pipefail.
3. Project Specification
3.1 What You Will Build
A script executor that supports if, while, for, case, and functions.
Included:
- Core feature set described above
- Deterministic CLI behavior and exit codes
Excluded:
- Full POSIX grammar; focus on core control flow instead.
3.2 Functional Requirements
- Requirement 1: Parse and execute compound commands.
- Requirement 2: Implement if, while, for, and case semantics.
- Requirement 3: Support shell functions with positional parameters.
- Requirement 4: Handle return and exit with correct status.
- Requirement 5: Run scripts with shebang lines.
3.3 Non-Functional Requirements
- Performance: Interactive latency under 50ms for typical inputs; pipeline setup should scale linearly.
- Reliability: No crashes on malformed input; errors reported clearly with non-zero status.
- Usability: Clear prompts, deterministic behavior, and predictable error messages.
3.4 Example Usage / Output
$ cat demo.sh
#!/path/to/mysh
count=0
while [ $count -lt 3 ]; do
echo "count=$count"
count=$((count+1))
done
$ ./mysh demo.sh
count=0
count=1
count=2
3.5 Data Formats / Schemas / Protocols
- AST nodes for compound commands and functions.
3.6 Edge Cases
- Nested loops
- Function recursion
- Exit in pipeline
3.7 Real World Outcome
This is the exact behavior you should be able to demonstrate.
3.7.1 How to Run (Copy/Paste)
- make
- ./mysh demo.sh
3.7.2 Golden Path Demo (Deterministic)
$ cat demo.sh
#!/path/to/mysh
count=0
while [ $count -lt 3 ]; do
echo "count=$count"
count=$((count+1))
done
$ ./mysh demo.sh
count=0
count=1
count=2
3.7.3 Failure Demo (Deterministic)
$ ./mysh
mysh> not_a_command
mysh> echo $?
127
4. Solution Architecture
4.1 High-Level Design
[Input] -> [Parser/Lexer] -> [Core Engine] -> [Executor/Output]
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Script Parser | Extend grammar with compound nodes | Keyword-aware parsing. |
| Executor | Evaluate control-flow nodes | Uses $? semantics. |
| Scope Stack | Manages function variables and params | Push/pop on call. |
4.3 Data Structures (No Full Code)
struct Function { char *name; struct ASTNode *body; };
4.4 Algorithm Overview
Key Algorithm: AST Walk
- Evaluate nodes recursively
- propagate status
Complexity Analysis:
- Time: O(n) per script run
- Space: O(n) per script run
5. Implementation Guide
5.1 Development Environment Setup
# install dependencies (if any)
# build
make
5.2 Project Structure
project-root/
├── src/
│ ├── main.c
│ ├── lexer.c
│ └── executor.c
├── tests/
│ └── test_basic.sh
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
How does a shell turn commands into a real programming language?
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Compound commands
- Exit status truthiness
- Function scope
5.5 Questions to Guide Your Design
5.6 Thinking Exercise
The “Truth is Exit Code” Problem
Why is if false; then ... valid in shell but not in C?
5.7 The Interview Questions They’ll Ask
5.8 Hints in Layers
Hint 1: Extend grammar
Add AST nodes for If, While, For, Case, Function.
Hint 2: Use command lists as conditions
Execute and inspect $?.
Hint 3: Maintain a scope stack
Push a scope for functions, pop on return.
Hint 4: Implement return and exit
Return sets exit status and unwinds function.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Parsing control flow | “Language Implementation Patterns” | Ch. 8 |
| Shell scripting | “Shell Programming in Unix, Linux and OS X” | Ch. 5-8 |
| POSIX rules | POSIX Shell Command Language | Compound commands |
5.10 Implementation Phases
Phase 1: Foundation (2-3 days)
Goals:
- Define data structures and interfaces
- Build a minimal end-to-end demo
Tasks:
- Implement the core data structures
- Build a tiny CLI or harness for manual tests
Checkpoint: A demo command runs end-to-end with clear logging.
Phase 2: Core Functionality (1 week)
Goals:
- Implement full feature set
- Validate with unit tests
Tasks:
- Implement core requirements
- Add error handling and edge cases
Checkpoint: All functional requirements pass basic tests.
Phase 3: Polish & Edge Cases (2-4 days)
Goals:
- Harden for weird inputs
- Improve UX and documentation
Tasks:
- Add edge-case tests
- Document design decisions
Checkpoint: Deterministic golden demo and clean error output.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing depth | Minimal vs full | Incremental | Start small, expand safely |
| Error policy | Silent vs verbose | Verbose | Debuggability for learners |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test individual components | Tokenizer, matcher, env builder |
| Integration Tests | Test component interactions | Full command lines |
| Edge Case Tests | Handle boundary conditions | Empty input, bad args |
6.2 Critical Test Cases
- Golden Path: Run the canonical demo and verify output.
- Failure Path: Provide invalid input and confirm error status.
- Stress Path: Run repeated commands to detect leaks or state corruption.
6.3 Test Data
input: echo hello
output: hello
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Misordered redirection | Output goes to wrong place | Apply redirections left-to-right |
| Leaked file descriptors | Commands hang waiting for EOF | Close unused fds in parent/child |
| Incorrect exit status | &&/|| behave wrong | Use waitpid macros correctly |
7.2 Debugging Strategies
- Trace syscalls: Use strace/dtruss to verify fork/exec/dup2 order.
- Log state transitions: Print parser states and job table changes in debug mode.
- Compare with dash: Run the same input in a reference shell.
7.3 Performance Traps
- Avoid O(n^2) behavior in hot paths like line editing.
- Minimize allocations inside the REPL loop.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a help built-in with usage docs.
- Add colored prompt themes.
8.2 Intermediate Extensions
- Add a simple profiling mode for command timing.
- Implement a which built-in using PATH lookup.
8.3 Advanced Extensions
- Add programmable completion or plugin system.
- Add a scriptable test harness with golden outputs.
9. Real-World Connections
9.1 Industry Applications
- Build systems: shells orchestrate compilation and test pipelines.
- DevOps automation: scripts manage deployments and infrastructure.
9.2 Related Open Source Projects
- bash: The most common interactive shell.
- dash: Minimal POSIX shell often used as /bin/sh.
- zsh: Feature-rich interactive shell.
9.3 Interview Relevance
- Process creation and lifecycle questions.
- Parsing and system programming design trade-offs.
10. Resources
10.1 Essential Reading
- “Language Implementation Patterns” by Terence Parr - focus on the chapters relevant to this project.
- “Advanced Programming in the UNIX Environment” - process control and pipes.
10.2 Video Resources
- Unix process model lectures (any OS course).
- Compiler front-end videos for lexing/parsing projects.
10.3 Tools & Documentation
- strace/dtruss: inspect syscalls.
- man pages: fork, execve, waitpid, pipe, dup2.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the core concept without notes.
- I can trace a command through my subsystem.
- I understand at least one key design trade-off.
11.2 Implementation
- All functional requirements are met.
- All critical tests pass.
- Edge cases are handled cleanly.
11.3 Growth
- I documented lessons learned.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Core feature works for the golden demo.
- Errors are handled with non-zero status.
- Code is readable and buildable.
Full Completion:
- All functional requirements met.
- Tests cover edge cases and failures.
Excellence (Going Above & Beyond):
- Performance benchmarks and clear documentation.
- Behavior compared against a reference shell.