Project 3: Build Your Own Shell

Build a minimal interactive shell that runs programs, handles built-ins, and wires pipelines.

Quick Reference

Difficulty: Intermediate
Time Estimate: 1 week
Main Programming Language: C
Alternative Programming Languages: Rust, Go
Coolness Level: See REFERENCE.md (Level 3)
Business Potential: See REFERENCE.md (Level 2)
Prerequisites: fork/exec basics, file descriptors, signals
Key Topics: process control, pipelines, job control

1. Learning Objectives

By completing this project, you will:

  1. Implement the fork-exec-wait loop reliably.
  2. Build pipelines by connecting file descriptors.
  3. Handle signals so the shell survives Ctrl+C.
  4. Explain why built-ins must run inside the shell.

2. All Theory Needed (Per-Concept Breakdown)

Process Model and Pipeline Control

Fundamentals A shell is a process that spawns other processes. It reads a command line, determines whether the command is built-in or external, and then uses fork and exec to launch programs. Pipelines are created by connecting stdout of one process to stdin of another using pipes, which are kernel-managed byte streams. Signals add control flow: the terminal sends SIGINT to the foreground process group, and the shell must decide whether to forward or ignore it. Understanding the process model and pipeline setup is essential because a shell is the simplest place where these OS primitives are used together.

Deep Dive The core shell loop is simple: read input, parse, decide, execute. The complexity lies in process control and file descriptor manipulation. The shell itself is a long-lived process. When it launches a command, it uses fork to create a child process that inherits the parent’s environment, file descriptors, and working directory. The child then replaces itself with the target program via exec. The parent process waits for the child to exit, collects its exit status, and returns to the prompt. This pattern is the foundation of Unix program execution.

Pipelines extend the pattern. A pipeline requires a pipe for each connection between commands. Each pipe has a read end and a write end. To connect two commands, the shell forks a child for the producer, duplicates the write end of the pipe to stdout, and closes unused file descriptors. It then forks a child for the consumer, duplicates the read end of the pipe to stdin, and closes unused file descriptors. The parent closes both ends and waits. The order and timing of these closes matters: leaving an unused write end open can cause the reader to block forever waiting for EOF.

Built-ins are a conceptual twist. Commands like cd and exit must affect the shell itself, not a child. If the shell executed them in a child process, the child would change its directory or exit, but the parent shell would remain unchanged. This is why shells classify commands into built-ins and external programs. This is not a special kernel feature; it is a policy decision enforced by the shell.

Signals and process groups introduce additional constraints. The terminal sends signals to the foreground process group, not to a single process. When the shell starts a pipeline, it should put all pipeline processes into the same process group and set that group as the foreground. The shell should also ignore or handle SIGINT so that Ctrl+C terminates the pipeline but not the shell. Failure to manage process groups results in the shell killing itself or leaving orphaned processes in the background.

Parsing is its own topic, but for a minimal shell you can treat it as tokenization with a few operators (pipes and redirection). The important insight is that parsing determines execution: a line with pipes implies multiple processes and pipes; a line with redirection implies adjusting file descriptors before exec. The shell is therefore a compiler of sorts: it translates a string into a process tree and file descriptor wiring plan.

How this fits into the project You will apply this concept in §3.1 for pipeline requirements, in §4.2 for component design, and in §5.10 for implementation phases. It also supports P02-syscall-tracer.md by providing realistic workloads to trace.

Definitions & key terms

  • Fork: Create a child process that inherits parent state.
  • Exec: Replace the current process with a new program.
  • Pipe: Kernel buffer connecting output of one process to input of another.
  • Process group: A set of processes that receive terminal signals together.
  • Built-in: Command executed by the shell itself.

Mental model diagram

Shell
  |
  +-- fork -> child A -> exec cmd1 -> stdout -> pipe -> child B -> exec cmd2
  |
  +-- wait for children

How it works

  1. Read and parse a command line.
  2. If built-in, execute in shell process.
  3. If external, fork one child per command.
  4. Connect pipes and redirections.
  5. Wait for all children and return to prompt.

Minimal concrete example

Pipeline plan (conceptual):
cmd1 | cmd2
- create pipe
- child A: stdout -> pipe write
- child B: stdin  -> pipe read

Common misconceptions

  • “exec creates a new PID.” It reuses the current PID.
  • “Pipes are files on disk.” They are kernel buffers in memory.
  • “Signals go to one process only.” Terminals signal process groups.

Check-your-understanding questions

  1. Why does cd need to run in the parent process?
  2. What happens if a pipe write end is left open?
  3. Why does the shell need to manage process groups?
  4. What is the difference between fork and exec?

Check-your-understanding answers

  1. The working directory must change in the shell itself.
  2. The reader never sees EOF and may block forever.
  3. To ensure Ctrl+C affects the pipeline, not the shell.
  4. Fork duplicates process state; exec replaces it.

Real-world applications

  • Shell scripting and automation.
  • Building process supervisors and job control systems.
  • Debugging pipeline stalls in production systems.

Where you’ll apply it

Pipeline requirements (§3.1), component design (§4.2), and implementation phases (§5.10).

References

  • “The Linux Programming Interface” - process control chapters
  • signal(7) man page: https://man7.org/linux/man-pages/man7/signal.7.html

Key insights A shell is a process compiler for pipelines.

Summary If you can implement fork/exec, pipes, and signal handling, you can build a minimal shell.

Homework/Exercises to practice the concept

  1. Draw a process tree for a pipeline of three commands.
  2. Explain why leaving file descriptors open can deadlock a pipeline.

Solutions to the homework/exercises

  1. One parent (shell) with three children in a single process group.
  2. Open write ends prevent EOF, so readers wait forever.

3. Project Specification

3.1 What You Will Build

A shell named myshell that supports running external commands, built-ins (cd, exit), pipes, and output redirection. It does not need scripting, globbing, or job control beyond foreground processes.

3.2 Functional Requirements

  1. Built-ins: cd and exit handled by the shell.
  2. External commands: Run arbitrary commands via fork/exec.
  3. Pipelines: Support cmd1 | cmd2 and cmd1 | cmd2 | cmd3.
  4. Redirection: Support > for stdout redirection.

3.3 Non-Functional Requirements

  • Performance: Commands should run with minimal overhead.
  • Reliability: No zombie processes after command completion.
  • Usability: Clear prompt and error messages.

3.4 Example Usage / Output

myshell> ls | grep .c
main.c
myshell> echo hello > out.txt
myshell> cat out.txt
hello

3.5 Data Formats / Schemas / Protocols

  • Command line is tokenized by whitespace.
  • Operators: | and >.
  • Exit status: integer code from child process.

3.6 Edge Cases

  • Empty input line.
  • Unknown command.
  • Pipe with missing command on either side.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

  • Build in project-root, e.g. cc src/*.c -o myshell (or make, if you add a Makefile).
  • Run ./myshell from a terminal.

3.7.2 Golden Path Demo (Deterministic)

Use fixed commands that do not depend on time or randomness.

3.7.3 If CLI: Exact terminal transcript

$ ./myshell
myshell> pwd
/home/user/projects
myshell> echo hello | tr a-z A-Z
HELLO
myshell> exit
# exit code: 0

Failure demo (deterministic):

$ ./myshell
myshell> does-not-exist
error: command not found
# exit code: 127

Exit codes:

  • 0 success
  • 127 command not found

4. Solution Architecture

4.1 High-Level Design

Input -> Parser -> Execution plan -> fork/exec -> wait -> prompt

4.2 Key Components

  • Parser: tokenizes commands and operators. Key decision: minimal syntax only.
  • Executor: spawns processes and manages pipes. Key decision: one process per command.
  • Built-ins: execute in the shell process. Key decision: only cd/exit.

4.3 Data Structures (No Full Code)

  • Token list: sequence of words and operators.
  • Command node: argv list + redirection info.
  • Pipeline: list of command nodes.

4.4 Algorithm Overview

Key Algorithm: pipeline execution

  1. Create pipes for N-1 connections.
  2. Fork N children and wire FDs.
  3. Close unused FDs in parent.
  4. Wait for all children.

Complexity Analysis:

  • Time: O(n) per command line.
  • Space: O(n) for tokens and pipeline metadata.

5. Implementation Guide

5.1 Development Environment Setup

# Install a C compiler and make, e.g. on Debian/Ubuntu:
sudo apt-get install build-essential

5.2 Project Structure

project-root/
├── src/
│   ├── shell.c
│   ├── parser.c
│   └── exec.c
├── tests/
│   └── shell_tests.sh
└── README.md

5.3 The Core Question You’re Answering

“What actually happens when I type a command?”

5.4 Concepts You Must Understand First

  1. fork/exec/wait
    • Why fork and exec are separate steps.
    • Book Reference: “The Linux Programming Interface” - Ch. 24-27
  2. File descriptors
    • How redirection rewires stdin/stdout.
    • Book Reference: “The Linux Programming Interface” - Ch. 4-5

5.5 Questions to Guide Your Design

  1. How will you handle multiple pipes in one line?
  2. How will you prevent zombie processes?

5.6 Thinking Exercise

Pipeline Wiring

Draw file descriptor wiring for a | b | c.

5.7 The Interview Questions They’ll Ask

  1. “Why must cd be a built-in?”
  2. “How does the shell implement pipes?”
  3. “What is a zombie process?”
  4. “What is the difference between stdin and stdout?”

5.8 Hints in Layers

Hint 1: Single Command First Get ls working without pipes.

Hint 2: Add Redirection Support > by rewiring stdout before exec.

Hint 3: Add Pipes Create pipes and fork one process per command.

Hint 4: Debugging Use Project 2 tracer to confirm syscalls.

5.9 Books That Will Help

  • Process control: “The Linux Programming Interface”, Ch. 24-27
  • Signals: “Advanced Programming in the UNIX Environment”, Ch. 10

5.10 Implementation Phases

Phase 1: Foundation (2 days)

Goals:

  • Parse a line and run a single command.

Tasks:

  1. Build a token parser.
  2. Execute a single external command.

Checkpoint: myshell> ls works.

Phase 2: Core Functionality (3 days)

Goals:

  • Add redirection and pipes.

Tasks:

  1. Implement stdout redirection.
  2. Implement pipelines.

Checkpoint: ls | grep txt works.

Phase 3: Polish & Edge Cases (2 days)

Goals:

  • Handle errors and cleanup.

Tasks:

  1. Print helpful error messages.
  2. Ensure no zombies.

Checkpoint: ps shows no defunct children.

5.11 Key Implementation Decisions

  • Parser complexity: minimal vs. full shell grammar. Recommendation: minimal, to keep scope focused.
  • Pipe handling: recursive vs. iterative. Recommendation: iterative, since it is easier to debug.

6. Testing Strategy

6.1 Test Categories

  • Unit Tests: parser behavior (e.g., tokenizing pipes).
  • Integration Tests: command execution (e.g., echo hello).
  • Edge Case Tests: empty input (e.g., a blank line).

6.2 Critical Test Cases

  1. Simple command: pwd prints working directory.
  2. Pipeline: echo a | tr a-z A-Z prints A.
  3. Redirection: echo hi > out.txt creates file.
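
These cases could be scripted along the lines below. This is only a sketch of a possible tests/shell_tests.sh: the SHELL_UNDER_TEST variable, the check helper, and the PASS/FAIL format are all assumptions, with /bin/sh used as a stand-in until myshell builds. A minimal myshell also prints its prompt to stdout, so the `tail -n 1` filter is only a rough way to grab the last output line:

```shell
#!/bin/sh
# Sketch of tests/shell_tests.sh. SH defaults to /bin/sh so the harness is
# runnable on its own; set SHELL_UNDER_TEST=./myshell once that builds.
SH="${SHELL_UNDER_TEST:-/bin/sh}"
fail=0

check() {  # check <description> <expected last line> <input lines, \n-separated>
    desc="$1"; expected="$2"; script="$3"
    # printf %b expands the \n escapes, so one argument can hold several lines
    actual=$(printf '%b\n' "$script" | "$SH" 2>/dev/null | tail -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "PASS: $desc"
    else
        echo "FAIL: $desc (expected '$expected', got '$actual')"
        fail=1
    fi
}

check "simple command" "hello" 'echo hello'
check "pipeline"       "HELLO" 'echo hello | tr a-z A-Z'
check "redirection"    "hi"    'echo hi > out.txt\ncat out.txt'
echo "failures: $fail"
```

Feeding commands on stdin keeps the tests deterministic and avoids any interactive terminal handling.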

6.3 Test Data

Commands: pwd, echo, ls
Expected: correct outputs and exit codes

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

  • Not waiting for children. Symptom: zombies appear. Fix: wait for all children.
  • FD leak. Symptom: pipeline hangs. Fix: close unused pipe ends.
  • Built-in run in a child. Symptom: cd has no effect. Fix: run built-ins in the parent.

7.2 Debugging Strategies

  • Trace syscalls: use Project 2 tracer to verify fork/exec.
  • Print FD table: log which FDs are open per child.

7.3 Performance Traps

Pipes themselves are fast, but every pipeline stage costs a fork and an exec, so very long pipelines pay noticeable process-startup overhead; keep pipelines small.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add history command.
  • Add pwd built-in.

8.2 Intermediate Extensions

  • Add >> append redirection.
  • Add input redirection <.

8.3 Advanced Extensions

  • Add background jobs with &.
  • Add job control for fg/bg.

9. Real-World Connections

9.1 Industry Applications

  • Shells: bash, zsh, fish.
  • Job control: process supervisors and schedulers.
  • bash: https://www.gnu.org/software/bash/ - reference shell.
  • dash: http://gondor.apana.org.au/~herbert/dash/ - minimal shell.

9.2 Interview Relevance

Pipes, fork/exec, and signal handling are standard OS interview topics.


10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” - process control chapters
  • “Advanced Programming in the UNIX Environment” - signals

10.2 Video Resources

  • “Build a shell” lectures (search title)

10.3 Tools & Documentation

  • man7: https://man7.org/linux/man-pages/man2/fork.2.html
  • POSIX: https://pubs.opengroup.org/onlinepubs/9699919799/

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain fork vs exec
  • I can explain why built-ins run in the parent
  • I understand how pipes connect processes

11.2 Implementation

  • All functional requirements are met
  • No zombie processes remain
  • Pipelines behave deterministically

11.3 Growth

  • I can explain this shell in an interview
  • I documented my design decisions
  • I can propose an extension

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Single commands run correctly
  • Built-ins work in the parent
  • Basic pipelines function

Full Completion:

  • All minimum criteria plus:
  • Redirection works reliably
  • Error messages are clear

Excellence (Going Above & Beyond):

  • Job control and background processes
  • Script execution support