Project 3: Build Your Own Shell
Build a minimal interactive shell that runs programs, handles built-ins, and wires pipelines.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1 week |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | See REFERENCE.md (Level 3) |
| Business Potential | See REFERENCE.md (Level 2) |
| Prerequisites | fork/exec basics, file descriptors, signals |
| Key Topics | process control, pipelines, job control |
1. Learning Objectives
By completing this project, you will:
- Implement the fork-exec-wait loop reliably.
- Build pipelines by connecting file descriptors.
- Handle signals so the shell survives Ctrl+C.
- Explain why built-ins must run inside the shell.
2. All Theory Needed (Per-Concept Breakdown)
Process Model and Pipeline Control
Fundamentals A shell is a process that spawns other processes. It reads a command line, determines whether the command is built-in or external, and then uses fork and exec to launch programs. Pipelines are created by connecting stdout of one process to stdin of another using pipes, which are kernel-managed byte streams. Signals add control flow: the terminal sends SIGINT to the foreground process group, and the shell must decide whether to forward or ignore it. Understanding the process model and pipeline setup is essential because a shell is the simplest place where these OS primitives are used together.
Deep Dive The core shell loop is simple: read input, parse, decide, execute. The complexity lies in process control and file descriptor manipulation. The shell itself is a long-lived process. When it launches a command, it uses fork to create a child process that inherits the parent’s environment, file descriptors, and working directory. The child then replaces itself with the target program via exec. The parent process waits for the child to exit, collects its exit status, and returns to the prompt. This pattern is the foundation of Unix program execution.
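A minimal C sketch of that fork/exec/wait pattern for a single external command; the function name, error handling, and 127 exit code convention are illustrative choices, not requirements:

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one external command and wait for it.
   argv must be NULL-terminated, e.g. {"ls", "-l", NULL}. */
int run_external(char *argv[]) {
    pid_t pid = fork();
    if (pid < 0) {                  /* fork failed */
        perror("fork");
        return -1;
    }
    if (pid == 0) {                 /* child: become the target program */
        execvp(argv[0], argv);
        perror("execvp");           /* reached only if exec failed */
        _exit(127);
    }
    int status = 0;                 /* parent: wait and collect the exit status */
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```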
Pipelines extend the pattern. A pipeline requires a pipe for each connection between commands. Each pipe has a read end and a write end. To connect two commands, the shell forks a child for the producer, duplicates the write end of the pipe to stdout, and closes unused file descriptors. It then forks a child for the consumer, duplicates the read end of the pipe to stdin, and closes unused file descriptors. The parent closes both ends and waits. The order and timing of these closes matters: leaving an unused write end open can cause the reader to block forever waiting for EOF.
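A sketch of the two-command case in C, assuming each command is already a NULL-terminated argv array; the function name is illustrative:

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "left | right" and wait for both children.
   Both argv arrays must be NULL-terminated. */
void run_pipeline2(char *left[], char *right[]) {
    int fd[2];
    if (pipe(fd) < 0) { perror("pipe"); return; }

    pid_t a = fork();
    if (a == 0) {                   /* producer: stdout -> pipe write end */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(left[0], left);
        _exit(127);
    }
    pid_t b = fork();
    if (b == 0) {                   /* consumer: stdin <- pipe read end */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
        close(fd[1]);
        execvp(right[0], right);
        _exit(127);
    }
    close(fd[0]);                   /* the parent must close both ends, or the */
    close(fd[1]);                   /* consumer never sees EOF */
    waitpid(a, NULL, 0);
    waitpid(b, NULL, 0);
}
```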
Built-ins are a conceptual twist. Commands like cd and exit must affect the shell itself, not a child. If the shell executed them in a child process, the child would change its directory or exit, but the parent shell would remain unchanged. This is why shells classify commands into built-ins and external programs. This is not a special kernel feature; it is a policy decision enforced by the shell.
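A built-in dispatcher might look like the following sketch; `try_builtin` and its return convention are assumptions for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if argv[0] was handled as a built-in, 0 otherwise.
   Runs in the shell process itself, never in a forked child. */
int try_builtin(char *argv[]) {
    if (argv[0] == NULL) return 1;              /* empty line: nothing to run */
    if (strcmp(argv[0], "cd") == 0) {
        const char *dir = argv[1] ? argv[1] : getenv("HOME");
        if (dir == NULL || chdir(dir) != 0)     /* changes the shell's own cwd */
            fprintf(stderr, "cd: cannot change to %s\n", dir ? dir : "(unset HOME)");
        return 1;
    }
    if (strcmp(argv[0], "exit") == 0) {
        exit(argv[1] ? atoi(argv[1]) : 0);      /* terminates the shell itself */
    }
    return 0;                                   /* not a built-in: fork/exec it */
}
```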
Signals and process groups introduce additional constraints. The terminal sends signals to the foreground process group, not to a single process. When the shell starts a pipeline, it should put all pipeline processes into the same process group and set that group as the foreground. The shell should also ignore or handle SIGINT so that Ctrl+C terminates the pipeline but not the shell. Failure to manage process groups results in the shell killing itself or leaving orphaned processes in the background.
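One possible arrangement of the signal and process-group calls, sketched as three helpers; the names and call sites are assumptions:

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shell setup, before the main loop: Ctrl+C must not kill the shell, and
   ignoring SIGTTOU is a common precaution when calling tcsetpgrp. */
void shell_signal_setup(void) {
    signal(SIGINT, SIG_IGN);
    signal(SIGTTOU, SIG_IGN);
}

/* In each pipeline child, right after fork and before exec:
   join the pipeline's process group and restore default SIGINT. */
void child_setup(pid_t pgid) {
    setpgid(0, pgid);               /* pgid == 0 means "start a new group" */
    signal(SIGINT, SIG_DFL);
}

/* In the parent, after forking the whole pipeline: hand the terminal to
   that group, wait for every member, then take the terminal back. */
void foreground_wait(pid_t pgid) {
    tcsetpgrp(STDIN_FILENO, pgid);
    while (waitpid(-pgid, NULL, 0) > 0) {}   /* wait for the whole group */
    tcsetpgrp(STDIN_FILENO, getpgrp());      /* the shell reclaims the terminal */
}
```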
Parsing is its own topic, but for a minimal shell you can treat it as tokenization with a few operators (pipes and redirection). The important insight is that parsing determines execution: a line with pipes implies multiple processes and pipes; a line with redirection implies adjusting file descriptors before exec. The shell is therefore a compiler of sorts: it translates a string into a process tree and file descriptor wiring plan.
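For the minimal syntax in this project, tokenization can be a whitespace split. This sketch assumes operators like `|` and `>` are space-separated and does not handle quoting:

```c
#include <string.h>

/* Split `line` in place into at most `max` tokens, separated by whitespace.
   Operators like "|" and ">" must be space-separated for this to work.
   The token array is NULL-terminated; returns the token count. */
int tokenize(char *line, char *tokens[], int max) {
    int n = 0;
    char *save = NULL;
    for (char *t = strtok_r(line, " \t\n", &save);
         t != NULL && n < max - 1;
         t = strtok_r(NULL, " \t\n", &save)) {
        tokens[n++] = t;
    }
    tokens[n] = NULL;
    return n;
}
```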
How this fits in the project You will apply this concept in §3.1 for pipeline requirements, in §4.2 for component design, and in §5.10 for the implementation phases. It also supports P02-syscall-tracer.md by providing realistic workloads to trace.
Definitions & key terms
- Fork: Create a child process that inherits parent state.
- Exec: Replace the current process with a new program.
- Pipe: Kernel buffer connecting output of one process to input of another.
- Process group: A set of processes that receive terminal signals together.
- Built-in: Command executed by the shell itself.
Mental model diagram
Shell
|
+-- fork -> child A -> exec cmd1 -> stdout -> pipe -> child B -> exec cmd2
|
+-- wait for children
How it works
- Read and parse a command line.
- If built-in, execute in shell process.
- If external, fork one child per command.
- Connect pipes and redirections.
- Wait for all children and return to prompt.
Minimal concrete example
Pipeline plan (conceptual):
cmd1 | cmd2
- create pipe
- child A: stdout -> pipe write
- child B: stdin -> pipe read
Common misconceptions
- “exec creates a new PID.” It reuses the current PID.
- “Pipes are files on disk.” They are kernel buffers in memory.
- “Signals go to one process only.” Terminals signal process groups.
Check-your-understanding questions
- Why does `cd` need to run in the parent process?
- What happens if a pipe write end is left open?
- Why does the shell need to manage process groups?
- What is the difference between fork and exec?
Check-your-understanding answers
- The working directory must change in the shell itself.
- The reader never sees EOF and may block forever.
- To ensure Ctrl+C affects the pipeline, not the shell.
- Fork duplicates process state; exec replaces it.
Real-world applications
- Shell scripting and automation.
- Building process supervisors and job control systems.
- Debugging pipeline stalls in production systems.
Where you’ll apply it
- See §3.1 What You Will Build and §4.1 High-Level Design.
- Also used in: P02-syscall-tracer.md
References
- “The Linux Programming Interface” - process control chapters
- signal(7) man page: https://man7.org/linux/man-pages/man7/signal.7.html
Key insights A shell is a process compiler for pipelines.
Summary If you can implement fork/exec, pipes, and signal handling, you can build a minimal shell.
Homework/Exercises to practice the concept
- Draw a process tree for a pipeline of three commands.
- Explain why leaving file descriptors open can deadlock a pipeline.
Solutions to the homework/exercises
- One parent (shell) with three children in a single process group.
- Open write ends prevent EOF, so readers wait forever.
3. Project Specification
3.1 What You Will Build
A shell named myshell that supports running external commands, built-ins (cd, exit), pipes, and output redirection. It does not need scripting, globbing, or job control beyond foreground processes.
3.2 Functional Requirements
- Built-ins: `cd` and `exit` handled by the shell.
- External commands: Run arbitrary commands via fork/exec.
- Pipelines: Support `cmd1 | cmd2` and `cmd1 | cmd2 | cmd3`.
- Redirection: Support `>` for stdout redirection.
3.3 Non-Functional Requirements
- Performance: Commands should run with minimal overhead.
- Reliability: No zombie processes after command completion.
- Usability: Clear prompt and error messages.
3.4 Example Usage / Output
myshell> ls -l | grep .c
main.c
myshell> echo hello > out.txt
myshell> cat out.txt
hello
3.5 Data Formats / Schemas / Protocols
- Command line is tokenized by whitespace.
- Operators: `|` and `>`.
- Exit status: integer exit code from the child process.
3.6 Edge Cases
- Empty input line.
- Unknown command.
- Pipe with missing command on either side.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
- Build in `project-root`.
- Run `./myshell` from a terminal.
3.7.2 Golden Path Demo (Deterministic)
Use fixed commands that do not depend on time or randomness.
3.7.3 If CLI: Exact terminal transcript
$ ./myshell
myshell> pwd
/home/user/projects
myshell> echo hello | tr a-z A-Z
HELLO
myshell> exit
# exit code: 0
Failure demo (deterministic):
$ ./myshell
myshell> does-not-exist
error: command not found
# exit code: 127
Exit codes:
- 0 success
- 127 command not found
4. Solution Architecture
4.1 High-Level Design
Input -> Parser -> Execution plan -> fork/exec -> wait -> prompt
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Parser | Tokenize commands and operators | Minimal syntax only |
| Executor | Spawn processes and manage pipes | One process per command |
| Built-ins | Execute in shell process | Only cd/exit |
4.3 Data Structures (No Full Code)
- Token list: sequence of words and operators.
- Command node: argv list + redirection info.
- Pipeline: list of command nodes.
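The descriptions above are all you strictly need; if it helps, one possible C declaration looks like this sketch (names are illustrative):

```c
/* One command in a pipeline: its argv plus optional output redirection. */
struct command {
    char **argv;            /* NULL-terminated argument vector */
    char  *out_file;        /* target of ">" redirection, or NULL */
};

/* A parsed line: one or more commands joined by "|". */
struct pipeline {
    struct command *cmds;   /* array of `count` commands */
    int             count;  /* number of commands, >= 1 */
};
```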
4.4 Algorithm Overview
Key Algorithm: pipeline execution
- Create pipes for N-1 connections.
- Fork N children and wire FDs.
- Close unused FDs in parent.
- Wait for all children.
Complexity Analysis:
- Time: O(n) per command line.
- Space: O(n) for tokens and pipeline metadata.
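An iterative C sketch of that algorithm, assuming the `struct pipeline` from the Data Structures sketch above; per-command redirection and error handling are abbreviated:

```c
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Execute p->count commands connected by p->count - 1 pipes.
   Uses the struct pipeline / struct command sketched under Data Structures. */
void run_pipeline(struct pipeline *p) {
    int prev_read = -1;                 /* read end left over from the previous stage */
    for (int i = 0; i < p->count; i++) {
        int fd[2] = { -1, -1 };
        if (i < p->count - 1 && pipe(fd) < 0) { perror("pipe"); return; }

        if (fork() == 0) {              /* child for stage i */
            if (prev_read != -1) dup2(prev_read, STDIN_FILENO);
            if (fd[1] != -1)     dup2(fd[1], STDOUT_FILENO);
            if (prev_read != -1) close(prev_read);
            if (fd[0] != -1) { close(fd[0]); close(fd[1]); }
            execvp(p->cmds[i].argv[0], p->cmds[i].argv);
            _exit(127);
        }
        if (prev_read != -1) close(prev_read);  /* parent drops FDs it no longer needs */
        if (fd[1] != -1) close(fd[1]);
        prev_read = fd[0];                      /* the next stage reads from here */
    }
    while (wait(NULL) > 0) {}                   /* reap every child in the pipeline */
}
```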
5. Implementation Guide
5.1 Development Environment Setup
# Install a C compiler and make (Debian/Ubuntu example; adapt to your distro)
sudo apt-get install -y build-essential
5.2 Project Structure
project-root/
├── src/
│ ├── shell.c
│ ├── parser.c
│ └── exec.c
├── tests/
│ └── shell_tests.sh
└── README.md
5.3 The Core Question You’re Answering
“What actually happens when I type a command?”
5.4 Concepts You Must Understand First
- fork/exec/wait
- Why fork and exec are separate steps.
- Book Reference: “The Linux Programming Interface” - Ch. 24-27
- File descriptors
- How redirection rewires stdin/stdout (a sketch follows this list).
- Book Reference: “The Linux Programming Interface” - Ch. 4-5
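A stdout-redirection helper, as promised above, might look like this sketch; the function name is an assumption, and 0644 is just one reasonable permission choice:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* In the child, after fork and before exec: send stdout to `path`.
   0644 gives rw-r--r--; an alternative is 0666 and letting the umask apply. */
int redirect_stdout(const char *path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror(path); return -1; }
    dup2(fd, STDOUT_FILENO);        /* stdout now refers to the file */
    close(fd);                      /* the original descriptor is no longer needed */
    return 0;
}
```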
5.5 Questions to Guide Your Design
- How will you handle multiple pipes in one line?
- How will you prevent zombie processes?
5.6 Thinking Exercise
Pipeline Wiring
Draw file descriptor wiring for a | b | c.
5.7 The Interview Questions They’ll Ask
- “Why must `cd` be a built-in?”
- “How does the shell implement pipes?”
- “What is a zombie process?”
- “What is the difference between stdin and stdout?”
5.8 Hints in Layers
Hint 1: Single Command First
Get ls working without pipes.
Hint 2: Add Redirection
Support > by rewiring stdout before exec.
Hint 3: Add Pipes
Create pipes and fork one process per command.
Hint 4: Debugging
Use the Project 2 tracer to confirm syscalls.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Process control | “The Linux Programming Interface” | Ch. 24-27 |
| Signals | “Advanced Programming in the UNIX Environment” | Ch. 10 |
5.10 Implementation Phases
Phase 1: Foundation (2 days)
Goals:
- Parse a line and run a single command.
Tasks:
- Build a token parser.
- Execute a single external command.
Checkpoint: myshell> ls works.
Phase 2: Core Functionality (3 days)
Goals:
- Add redirection and pipes.
Tasks:
- Implement stdout redirection.
- Implement pipelines.
Checkpoint: ls | grep txt works.
Phase 3: Polish & Edge Cases (2 days)
Goals:
- Handle errors and cleanup.
Tasks:
- Print helpful error messages.
- Ensure no zombies.
Checkpoint: ps shows no defunct children.
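If you also want a defensive check, a non-blocking reap before each prompt catches anything a bug might have left behind; this is a sketch and is not strictly required in a foreground-only shell that waits for every child:

```c
#include <sys/wait.h>

/* Optional defensive reap, called just before printing each prompt.
   WNOHANG means it never blocks; it only collects children that have
   already exited, so nothing lingers as <defunct>. */
void reap_zombies(void) {
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* already-exited child collected; nothing else to do */
    }
}
```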
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parser complexity | Minimal vs full shell grammar | Minimal | Keep scope focused |
| Pipe handling | Recursive vs iterative | Iterative | Easier to debug |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Parser behavior | Tokenize pipes |
| Integration Tests | Command execution | echo hello |
| Edge Case Tests | Empty input | Blank line |
6.2 Critical Test Cases
- Simple command: `pwd` prints the working directory.
- Pipeline: `echo a | tr a-z A-Z` prints `A`.
- Redirection: `echo hi > out.txt` creates the file.
6.3 Test Data
Commands: pwd, echo, ls
Expected: correct outputs and exit codes
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Not waiting | Zombies appear | Wait for all children |
| FD leak | Pipeline hangs | Close unused pipe ends |
| Built-in forked | cd has no effect | Run built-ins in parent |
7.2 Debugging Strategies
- Trace syscalls: use the Project 2 tracer to verify fork/exec.
- Print FD table: log which FDs are open per child.
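For the FD-table strategy, a Linux-only helper along these lines can be dropped into a child just before exec (`/proc/self/fd` is Linux-specific; the function name is an assumption):

```c
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

/* Linux-only debugging helper: print this process's open descriptors by
   listing /proc/self/fd. The directory stream itself briefly appears as
   one extra fd in the output. */
void dump_fds(const char *label) {
    fprintf(stderr, "[%s pid=%d] open fds:", label, (int)getpid());
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL) { perror("opendir"); return; }
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] != '.')
            fprintf(stderr, " %s", e->d_name);
    }
    closedir(d);
    fprintf(stderr, "\n");
}
```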
7.3 Performance Traps
Pipes themselves are cheap; the cost is forking one process per command, so keep pipelines short.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a `history` command.
- Add a `pwd` built-in.
8.2 Intermediate Extensions
- Add `>>` append redirection.
- Add `<` input redirection.
8.3 Advanced Extensions
- Add background jobs with `&`.
- Add job control for `fg`/`bg`.
9. Real-World Connections
9.1 Industry Applications
- Shells: bash, zsh, fish.
- Job control: process supervisors and schedulers.
9.2 Related Open Source Projects
- bash: https://www.gnu.org/software/bash/ - reference shell.
- dash: http://gondor.apana.org.au/~herbert/dash/ - minimal shell.
9.3 Interview Relevance
Pipes, fork/exec, and signal handling are standard OS interview topics.
10. Resources
10.1 Essential Reading
- “The Linux Programming Interface” - process control chapters
- “Advanced Programming in the UNIX Environment” - signals
10.2 Video Resources
- “Build a shell” lectures (search title)
10.3 Tools & Documentation
- man7: https://man7.org/linux/man-pages/man2/fork.2.html
- POSIX: https://pubs.opengroup.org/onlinepubs/9699919799/
10.4 Related Projects in This Series
- P02-syscall-tracer.md - trace this shell's fork/exec/pipe syscalls (see §5.8 Hint 4).
11. Self-Assessment Checklist
11.1 Understanding
- I can explain fork vs exec
- I can explain why built-ins run in the parent
- I understand how pipes connect processes
11.2 Implementation
- All functional requirements are met
- No zombie processes remain
- Pipelines behave deterministically
11.3 Growth
- I can explain this shell in an interview
- I documented my design decisions
- I can propose an extension
12. Submission / Completion Criteria
Minimum Viable Completion:
- Single commands run correctly
- Built-ins work in the parent
- Basic pipelines function
Full Completion:
- All minimum criteria plus:
- Redirection works reliably
- Error messages are clear
Excellence (Going Above & Beyond):
- Job control and background processes
- Script execution support