Project 3: Build Your Own Shell

Build a minimal interactive shell that runs programs, handles built-ins, and wires pipelines.

Quick Reference

Difficulty: Intermediate
Time Estimate: 1 week
Main Programming Language: C
Alternative Programming Languages: Rust, Go
Coolness Level: See REFERENCE.md (Level 3)
Business Potential: See REFERENCE.md (Level 2)
Prerequisites: fork/exec basics, file descriptors, signals
Key Topics: process control, pipelines, job control

1. Learning Objectives

By completing this project, you will:

  1. Implement the fork-exec-wait loop reliably.
  2. Build pipelines by connecting file descriptors.
  3. Handle signals so the shell survives Ctrl+C.
  4. Explain why built-ins must run inside the shell.

2. All Theory Needed (Per-Concept Breakdown)

Process Model and Pipeline Control

Fundamentals A shell is a process that spawns other processes. It reads a command line, determines whether the command is built-in or external, and then uses fork and exec to launch programs. Pipelines are created by connecting stdout of one process to stdin of another using pipes, which are kernel-managed byte streams. Signals add control flow: the terminal sends SIGINT to the foreground process group, and the shell must decide whether to forward or ignore it. Understanding the process model and pipeline setup is essential because a shell is the simplest place where these OS primitives are used together.

Deep Dive The core shell loop is simple: read input, parse, decide, execute. The complexity lies in process control and file descriptor manipulation. The shell itself is a long-lived process. When it launches a command, it uses fork to create a child process that inherits the parent’s environment, file descriptors, and working directory. The child then replaces itself with the target program via exec. The parent process waits for the child to exit, collects its exit status, and returns to the prompt. This pattern is the foundation of Unix program execution.

Pipelines extend the pattern. A pipeline requires a pipe for each connection between commands. Each pipe has a read end and a write end. To connect two commands, the shell forks a child for the producer, duplicates the write end of the pipe to stdout, and closes unused file descriptors. It then forks a child for the consumer, duplicates the read end of the pipe to stdin, and closes unused file descriptors. The parent closes both ends and waits. The order and timing of these closes matters: leaving an unused write end open can cause the reader to block forever waiting for EOF.

Built-ins are a conceptual twist. Commands like cd and exit must affect the shell itself, not a child. If the shell executed them in a child process, the child would change its directory or exit, but the parent shell would remain unchanged. This is why shells classify commands into built-ins and external programs. This is not a special kernel feature; it is a policy decision enforced by the shell.

Signals and process groups introduce additional constraints. The terminal sends signals to the foreground process group, not to a single process. When the shell starts a pipeline, it should put all pipeline processes into the same process group and set that group as the foreground. The shell should also ignore or handle SIGINT so that Ctrl+C terminates the pipeline but not the shell. Failure to manage process groups results in the shell killing itself or leaving orphaned processes in the background.

Parsing is its own topic, but for a minimal shell you can treat it as tokenization with a few operators (pipes and redirection). The important insight is that parsing determines execution: a line with pipes implies multiple processes and pipes; a line with redirection implies adjusting file descriptors before exec. The shell is therefore a compiler of sorts: it translates a string into a process tree and file descriptor wiring plan.

How this fits into the project You will apply this concept in §3.1 for pipeline requirements, in §4.2 for component design, and in §5.10 for implementation phases. It also supports P02-syscall-tracer.md by providing realistic workloads to trace.

Definitions & key terms

  • Fork: Create a child process that inherits parent state.
  • Exec: Replace the current process with a new program.
  • Pipe: Kernel buffer connecting output of one process to input of another.
  • Process group: A set of processes that receive terminal signals together.
  • Built-in: Command executed by the shell itself.

Mental model diagram

Shell
  |
  +-- fork -> child A -> exec cmd1 -> stdout -> pipe -> child B -> exec cmd2
  |
  +-- wait for children

How it works

  1. Read and parse a command line.
  2. If built-in, execute in shell process.
  3. If external, fork one child per command.
  4. Connect pipes and redirections.
  5. Wait for all children and return to prompt.

Minimal concrete example

Pipeline plan (conceptual):
cmd1 | cmd2
- create pipe
- child A: stdout -> pipe write
- child B: stdin  -> pipe read

Common misconceptions

  • “exec creates a new PID.” It reuses the current PID.
  • “Pipes are files on disk.” They are kernel buffers in memory.
  • “Signals go to one process only.” Terminals signal process groups.

Check-your-understanding questions

  1. Why does cd need to run in the parent process?
  2. What happens if a pipe write end is left open?
  3. Why does the shell need to manage process groups?
  4. What is the difference between fork and exec?

Check-your-understanding answers

  1. The working directory must change in the shell itself.
  2. The reader never sees EOF and may block forever.
  3. To ensure Ctrl+C affects the pipeline, not the shell.
  4. Fork duplicates process state; exec replaces it.

Real-world applications

  • Shell scripting and automation.
  • Building process supervisors and job control systems.
  • Debugging pipeline stalls in production systems.

Where you’ll apply it

Pipeline requirements (§3.1), component design (§4.2), and implementation phases (§5.10).

References

  • “The Linux Programming Interface” - process control chapters
  • signal(7) man page: https://man7.org/linux/man-pages/man7/signal.7.html

Key insights A shell is a process compiler for pipelines.

Summary If you can implement fork/exec, pipes, and signal handling, you can build a minimal shell.

Homework/Exercises to practice the concept

  1. Draw a process tree for a pipeline of three commands.
  2. Explain why leaving file descriptors open can deadlock a pipeline.

Solutions to the homework/exercises

  1. One parent (shell) with three children in a single process group.
  2. Open write ends prevent EOF, so readers wait forever.

3. Project Specification

3.1 What You Will Build

A shell named myshell that supports running external commands, built-ins (cd, exit), pipes, and output redirection. It does not need scripting, globbing, or job control beyond foreground processes.

3.2 Functional Requirements

  1. Built-ins: cd and exit handled by the shell.
  2. External commands: Run arbitrary commands via fork/exec.
  3. Pipelines: Support cmd1 | cmd2 and cmd1 | cmd2 | cmd3.
  4. Redirection: Support > for stdout redirection.

3.3 Non-Functional Requirements

  • Performance: Commands should run with minimal overhead.
  • Reliability: No zombie processes after command completion.
  • Usability: Clear prompt and error messages.

3.4 Example Usage / Output

myshell> ls | grep .c
main.c
myshell> echo hello > out.txt
myshell> cat out.txt
hello

3.5 Data Formats / Schemas / Protocols

  • Command line is tokenized by whitespace.
  • Operators: | and >.
  • Exit status: integer code from child process.

3.6 Edge Cases

  • Empty input line.
  • Unknown command.
  • Pipe with missing command on either side.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

  • Build in project-root, e.g. cc src/*.c -o myshell (or make, if you add a Makefile).
  • Run ./myshell from a terminal.

3.7.2 Golden Path Demo (Deterministic)

Use fixed commands that do not depend on time or randomness.

3.7.3 If CLI: Exact terminal transcript

$ ./myshell
myshell> pwd
/home/user/projects
myshell> echo hello | tr a-z A-Z
HELLO
myshell> exit
# exit code: 0

Failure demo (deterministic):

$ ./myshell
myshell> does-not-exist
error: command not found
# exit code: 127

Exit codes:

  • 0 success
  • 127 command not found

4. Solution Architecture

4.1 High-Level Design

Input -> Parser -> Execution plan -> fork/exec -> wait -> prompt

4.2 Key Components

  • Parser: tokenizes commands and operators. Key decision: minimal syntax only.
  • Executor: spawns processes and manages pipes. Key decision: one process per command.
  • Built-ins: execute in the shell process. Key decision: only cd/exit.

4.3 Data Structures (No Full Code)

  • Token list: sequence of words and operators.
  • Command node: argv list + redirection info.
  • Pipeline: list of command nodes.

4.4 Algorithm Overview

Key Algorithm: pipeline execution

  1. Create pipes for N-1 connections.
  2. Fork N children and wire FDs.
  3. Close unused FDs in parent.
  4. Wait for all children.

Complexity Analysis:

  • Time: O(n) per command line.
  • Space: O(n) for tokens and pipeline metadata.

5. Implementation Guide

5.1 Development Environment Setup

# Install a C compiler and make, e.g. on Debian/Ubuntu:
sudo apt-get install build-essential

5.2 Project Structure

project-root/
├── src/
│   ├── shell.c
│   ├── parser.c
│   └── exec.c
├── tests/
│   └── shell_tests.sh
└── README.md

5.3 The Core Question You’re Answering

“What actually happens when I type a command?”

5.4 Concepts You Must Understand First

  1. fork/exec/wait
    • Why fork and exec are separate steps.
    • Book Reference: “The Linux Programming Interface” - Ch. 24-27
  2. File descriptors
    • How redirection rewires stdin/stdout.
    • Book Reference: “The Linux Programming Interface” - Ch. 4-5

5.5 Questions to Guide Your Design

  1. How will you handle multiple pipes in one line?
  2. How will you prevent zombie processes?

5.6 Thinking Exercise

Pipeline Wiring

Draw file descriptor wiring for a | b | c.

5.7 The Interview Questions They’ll Ask

  1. “Why must cd be a built-in?”
  2. “How does the shell implement pipes?”
  3. “What is a zombie process?”
  4. “What is the difference between stdin and stdout?”

5.8 Hints in Layers

Hint 1: Single Command First Get ls working without pipes.

Hint 2: Add Redirection Support > by rewiring stdout before exec.

Hint 3: Add Pipes Create pipes and fork one process per command.

Hint 4: Debugging Use Project 2 tracer to confirm syscalls.

5.9 Books That Will Help

  • Process control: “The Linux Programming Interface”, Ch. 24-27
  • Signals: “Advanced Programming in the UNIX Environment”, Ch. 10

5.10 Implementation Phases

Phase 1: Foundation (2 days)

Goals:

  • Parse a line and run a single command.

Tasks:

  1. Build a token parser.
  2. Execute a single external command.

Checkpoint: myshell> ls works.

Phase 2: Core Functionality (3 days)

Goals:

  • Add redirection and pipes.

Tasks:

  1. Implement stdout redirection.
  2. Implement pipelines.

Checkpoint: ls | grep txt works.

Phase 3: Polish & Edge Cases (2 days)

Goals:

  • Handle errors and cleanup.

Tasks:

  1. Print helpful error messages.
  2. Ensure no zombies.

Checkpoint: ps shows no defunct children.

5.11 Key Implementation Decisions

  • Parser complexity: minimal vs. full shell grammar. Recommendation: minimal, to keep scope focused.
  • Pipe handling: recursive vs. iterative. Recommendation: iterative, since it is easier to debug.

6. Testing Strategy

6.1 Test Categories

  • Unit Tests: parser behavior (e.g., tokenizing pipes).
  • Integration Tests: command execution (e.g., echo hello).
  • Edge Case Tests: empty input (e.g., a blank line).

6.2 Critical Test Cases

  1. Simple command: pwd prints working directory.
  2. Pipeline: echo a | tr a-z A-Z prints A.
  3. Redirection: echo hi > out.txt creates file.
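
These cases could be scripted along the lines below. This is only a sketch of a possible tests/shell_tests.sh: the SHELL_UNDER_TEST variable, the check helper, and the PASS/FAIL format are all assumptions, with /bin/sh used as a stand-in until myshell builds. A minimal myshell also prints its prompt to stdout, so the `tail -n 1` filter is only a rough way to grab the last output line:

```shell
#!/bin/sh
# Sketch of tests/shell_tests.sh. SH defaults to /bin/sh so the harness is
# runnable on its own; set SHELL_UNDER_TEST=./myshell once that builds.
SH="${SHELL_UNDER_TEST:-/bin/sh}"
fail=0

check() {  # check <description> <expected last line> <input lines, \n-separated>
    desc="$1"; expected="$2"; script="$3"
    # printf %b expands the \n escapes, so one argument can hold several lines
    actual=$(printf '%b\n' "$script" | "$SH" 2>/dev/null | tail -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "PASS: $desc"
    else
        echo "FAIL: $desc (expected '$expected', got '$actual')"
        fail=1
    fi
}

check "simple command" "hello" 'echo hello'
check "pipeline"       "HELLO" 'echo hello | tr a-z A-Z'
check "redirection"    "hi"    'echo hi > out.txt\ncat out.txt'
echo "failures: $fail"
```

Feeding commands on stdin keeps the tests deterministic and avoids any interactive terminal handling.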

6.3 Test Data

Commands: pwd, echo, ls
Expected: correct outputs and exit codes

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

  • Not waiting for children. Symptom: zombies appear. Fix: wait for all children.
  • FD leak. Symptom: pipeline hangs. Fix: close unused pipe ends.
  • Built-in run in a child. Symptom: cd has no effect. Fix: run built-ins in the parent.

7.2 Debugging Strategies

  • Trace syscalls: use Project 2 tracer to verify fork/exec.
  • Print FD table: log which FDs are open per child.

7.3 Performance Traps

Pipes themselves are fast, but every pipeline stage costs a fork and an exec, so very long pipelines pay noticeable process-startup overhead; keep pipelines small.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add history command.
  • Add pwd built-in.

8.2 Intermediate Extensions

  • Add >> append redirection.
  • Add input redirection <.

8.3 Advanced Extensions

  • Add background jobs with &.
  • Add job control for fg/bg.

9. Real-World Connections

9.1 Industry Applications

  • Shells: bash, zsh, fish.
  • Job control: process supervisors and schedulers.
  • bash: https://www.gnu.org/software/bash/ - reference shell.
  • dash: http://gondor.apana.org.au/~herbert/dash/ - minimal shell.

9.2 Interview Relevance

Pipes, fork/exec, and signal handling are standard OS interview topics.


10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” - process control chapters
  • “Advanced Programming in the UNIX Environment” - signals

10.2 Video Resources

  • “Build a shell” lectures (search title)

10.3 Tools & Documentation

  • man7: https://man7.org/linux/man-pages/man2/fork.2.html
  • POSIX: https://pubs.opengroup.org/onlinepubs/9699919799/

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain fork vs exec
  • I can explain why built-ins run in the parent
  • I understand how pipes connect processes

11.2 Implementation

  • All functional requirements are met
  • No zombie processes remain
  • Pipelines behave deterministically

11.3 Growth

  • I can explain this shell in an interview
  • I documented my design decisions
  • I can propose an extension

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Single commands run correctly
  • Built-ins work in the parent
  • Basic pipelines function

Full Completion:

  • All minimum criteria plus:
  • Redirection works reliably
  • Error messages are clear

Excellence (Going Above & Beyond):

  • Job control and background processes
  • Script execution support