Project 7: Unix Shell from Scratch

Implement a functional shell with parsing, pipes, redirection, and job control.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Advanced |
| Time Estimate | 18-28 hours |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Very High |
| Business Potential | Medium (developer tooling) |
| Prerequisites | fork/exec, signals, pipes |
| Key Topics | pipelines, redirection, job control, parsing |

1. Learning Objectives

By completing this project, you will:

  1. Parse command lines with quotes and escapes.
  2. Implement pipelines and redirection via pipes and dup2.
  3. Manage foreground and background jobs with process groups.
  4. Handle SIGINT/SIGTSTP correctly for child processes.

2. All Theory Needed (Per-Concept Breakdown)

Pipelines, Redirection, and Job Control

Fundamentals

A shell is a process manager and a file descriptor router. It reads a line, parses it into commands, and uses fork and exec to run them. Pipes connect stdout of one process to stdin of another. Redirection replaces stdin/stdout/stderr with files. Job control adds process groups and signals so the shell can manage foreground and background jobs. The shell itself must not die when the user presses Ctrl+C. The terminal delivers SIGINT to the foreground process group, so the shell ignores the signal itself and makes sure the active job, not the shell, is in the foreground. These responsibilities require a precise understanding of file descriptors, process groups, and signal delivery.

Deep Dive into the concept

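Pipelines are built by creating a pipe for each adjacent pair of commands. A pipe yields two file descriptors: a read end and a write end. The shell forks one child per command, and each child uses dup2 to map its stdin/stdout to the appropriate pipe ends, then closes the original pipe descriptors before calling exec. The parent also closes every pipe FD once the children are forked. A reader sees EOF only when all copies of the write end are closed, so a single leaked descriptor can make a pipeline hang. The last command writes to stdout by default unless redirected. This pattern must scale to pipelines of arbitrary length.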
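The wiring described above can be sketched as a loop that creates one pipe per link and remembers the previous read end. This is a minimal sketch, not a definitive implementation: the function name run_pipeline is hypothetical, the exit code 127 for a failed exec follows the common shell convention rather than this project's exit-code table, and descriptors are not cleaned up if fork fails midway.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run n commands as a pipeline; cmds[i] is a NULL-terminated argv.
 * Returns the exit status of the last command, or -1 on failure.
 * Assumes the caller has no other unreaped children. */
int run_pipeline(char **cmds[], int n)
{
    int prev_read = -1;              /* read end of the previous pipe, if any */
    pid_t last_pid = -1;
    int status = 0, last_status = -1;

    for (int i = 0; i < n; i++) {
        int p[2] = { -1, -1 };
        if (i < n - 1 && pipe(p) < 0) { perror("pipe"); return -1; }

        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return -1; }
        if (pid == 0) {
            /* Child: stdin from the previous pipe, stdout to the next one. */
            if (prev_read != -1) { dup2(prev_read, STDIN_FILENO); close(prev_read); }
            if (p[1] != -1) { dup2(p[1], STDOUT_FILENO); close(p[1]); close(p[0]); }
            execvp(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);
            _exit(127);              /* assumed convention for exec failure */
        }
        /* Parent: close the ends it no longer needs so readers can see EOF. */
        if (prev_read != -1) close(prev_read);
        if (p[1] != -1) close(p[1]);
        prev_read = p[0];
        last_pid = pid;
    }

    pid_t w;
    while ((w = wait(&status)) > 0)
        if (w == last_pid && WIFEXITED(status))
            last_status = WEXITSTATUS(status);
    return last_status;
}
```

Creating each pipe just before the fork that needs it keeps at most two pipe descriptors open in the parent at any time, which makes leaks easier to spot than allocating all N-1 pipes up front.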

Redirection is a special case of file descriptor manipulation. To redirect output, the shell opens a file and uses dup2 to replace stdout (FD 1). For input, it replaces stdin (FD 0). The shell must handle errors such as missing files or permission denied and must produce error messages similar to the system shell. Redirection must apply only to the target command and not leak into the shell itself.
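As an illustration of the open-then-dup2 pattern, here is a minimal sketch of a helper meant to be called in the child after fork and before exec, so the shell's own descriptors are never touched. The name redirect_fd and the 0644 file mode are assumptions for this example; the error message format mirrors the transcript style used elsewhere in this guide.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Point target_fd (0, 1, or 2 in a real shell) at path.
 * Suggested flags:
 *   <   O_RDONLY
 *   >   O_WRONLY | O_CREAT | O_TRUNC
 *   >>  O_WRONLY | O_CREAT | O_APPEND
 * Returns 0 on success, -1 after printing a shell-style error. */
int redirect_fd(int target_fd, const char *path, int flags)
{
    int fd = open(path, flags, 0644);    /* mode is used only with O_CREAT */
    if (fd < 0) {
        fprintf(stderr, "myshell: %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (fd != target_fd) {
        if (dup2(fd, target_fd) < 0) {
            perror("dup2");
            close(fd);
            return -1;
        }
        close(fd);                       /* keep only the duplicated copy */
    }
    return 0;
}
```

On failure the child would typically call _exit with a nonzero status instead of running the command, which is what produces the "No such file or directory" behavior for a missing input file.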

Job control adds another layer. The shell creates a new process group for each pipeline, then makes that group the terminal's foreground process group with tcsetpgrp. When the user presses Ctrl+C, the terminal sends SIGINT to the foreground process group, not to the shell. After the job completes or stops, the shell calls tcsetpgrp again to take the terminal back. Background jobs run in their own process group but do not own the terminal, so they do not receive keyboard-generated signals. The shell tracks jobs and can bring one to the foreground (fg) or resume a stopped one in the background (bg). Implementing job control requires careful use of setpgid, tcsetpgrp, and signal dispositions.
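The foreground handoff can be sketched for a single command as follows. This is a simplified sketch under stated assumptions: the names set_foreground and run_foreground are hypothetical, the terminal is assumed to be on stdin, and a real shell would apply the same steps to every process in a pipeline using the first child's PID as the group ID. setpgid is called in both parent and child because either may run first after fork.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Make pgid the terminal's foreground group. SIGTTOU is ignored around the
 * call because tcsetpgrp() from a background group would otherwise stop us.
 * Errors are ignored in this sketch. */
static void set_foreground(pid_t pgid)
{
    if (!isatty(STDIN_FILENO)) return;   /* no terminal, nothing to hand over */
    void (*old)(int) = signal(SIGTTOU, SIG_IGN);
    (void)tcsetpgrp(STDIN_FILENO, pgid);
    signal(SIGTTOU, old);
}

/* Run argv as a foreground job in its own process group.
 * Returns the exit code, or 128 + signal number if killed or stopped;
 * *stopped is set to 1 when the job was suspended (e.g. Ctrl+Z). */
int run_foreground(char *const argv[], int *stopped)
{
    *stopped = 0;
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return -1; }
    if (pid == 0) {
        setpgid(0, 0);                   /* child becomes its own group leader */
        set_foreground(getpid());
        signal(SIGINT, SIG_DFL);         /* undo whatever the shell ignores */
        signal(SIGTSTP, SIG_DFL);
        execvp(argv[0], argv);
        perror(argv[0]);
        _exit(127);                      /* assumed convention for exec failure */
    }
    setpgid(pid, pid);                   /* same call in the parent avoids a race */
    set_foreground(pid);

    int status = 0;
    waitpid(pid, &status, WUNTRACED);    /* also return when the job stops */
    set_foreground(getpgrp());           /* shell takes the terminal back */

    if (WIFSTOPPED(status)) { *stopped = 1; return 128 + WSTOPSIG(status); }
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return WEXITSTATUS(status);
}
```

A background job would skip both set_foreground calls and the waitpid, and would instead be recorded in the job table.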

Parsing is not trivial. You must handle quoted strings, escaped characters, and operators like |, <, >, >>, and &. A clean approach is to tokenize the input, then build a pipeline structure. Each command is a list of arguments plus I/O redirection metadata. A minimal shell can skip complex features like globbing or variable expansion, but it must still handle quotes correctly to be usable.
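A tokenizer along these lines could look like the sketch below. The struct token type and the tokenize function are hypothetical names. The quoting rules are deliberately simplified: a backslash escapes any following character both outside quotes and inside double quotes, whereas real shells restrict escapes inside double quotes. The is_op flag records whether a token was an unquoted operator, so a quoted '|' stays an ordinary word.

```c
#include <stdlib.h>
#include <string.h>

struct token {
    char *text;   /* heap-allocated; the caller frees it */
    int is_op;    /* 1 for | < > >> &, 0 for a word */
};

/* Free the tokens produced so far and report failure. */
static int fail(struct token *out, int n)
{
    while (n > 0) free(out[--n].text);
    return -1;
}

/* Split s into at most max tokens. Returns the token count, or -1 on an
 * unterminated quote, too many tokens, or allocation failure. */
int tokenize(const char *s, struct token *out, int max)
{
    int n = 0;
    for (;;) {
        while (*s == ' ' || *s == '\t' || *s == '\n') s++;
        if (*s == '\0') return n;
        if (n == max) return fail(out, n);

        char *buf = malloc(strlen(s) + 1);  /* a token never exceeds the rest */
        if (!buf) return fail(out, n);
        size_t len = 0;

        if (strchr("|<>&", *s)) {
            buf[len++] = *s++;
            if (buf[0] == '>' && *s == '>') buf[len++] = *s++;   /* >> */
            out[n].is_op = 1;
        } else {
            char quote = 0;                 /* 0, '\'' or '"' */
            while (*s) {
                if (quote) {
                    if (*s == quote) { quote = 0; s++; }
                    else if (quote == '"' && *s == '\\' && s[1]) { buf[len++] = s[1]; s += 2; }
                    else buf[len++] = *s++;
                } else if (*s == '\'' || *s == '"') {
                    quote = *s++;
                } else if (*s == '\\' && s[1]) {
                    buf[len++] = s[1]; s += 2;
                } else if (strchr(" \t\n|<>&", *s)) {
                    break;                  /* unquoted delimiter ends the word */
                } else {
                    buf[len++] = *s++;
                }
            }
            if (quote) { free(buf); return fail(out, n); }
            out[n].is_op = 0;
        }
        buf[len] = '\0';
        out[n++].text = buf;
    }
}
```

For example, the line `echo "hello world" 'a|b' | wc -l` yields the words echo, hello world, a|b, the operator |, and the words wc and -l.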

This project ties together many OS concepts: file descriptors, process creation, signals, and synchronization. It also produces a usable tool you can compare to /bin/sh for behavior. If you can build a correct shell, you can reason about process control in any Unix system.

How this fits into the project

This concept is central to Section 3.2, Section 3.7, and Section 5.10 Phases 2-3. It builds directly on Project 6.

Definitions & key terms

  • Pipeline: chain of processes connected by pipes.
  • Redirection: replacing stdin/stdout/stderr with files.
  • Process group: set of processes that receive terminal signals together.
  • Foreground job: process group that owns the terminal.

Mental model diagram (ASCII)

cmd1 | cmd2 | cmd3
  |      |      |
 pipe   pipe   stdout

How it works (step-by-step)

  1. Parse input into commands and operators.
  2. Create pipes for pipeline segments.
  3. Fork child processes.
  4. In each child, dup2 FDs and exec.
  5. Parent closes pipes and waits or tracks job.

Minimal concrete example

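int p[2];
pipe(p);
if (fork() == 0) { dup2(p[1], 1); close(p[0]); close(p[1]); execvp(cmd1[0], cmd1); _exit(127); }
if (fork() == 0) { dup2(p[0], 0); close(p[0]); close(p[1]); execvp(cmd2[0], cmd2); _exit(127); }
close(p[0]); close(p[1]);    /* parent: close both ends so cmd2 sees EOF */
while (wait(NULL) > 0) { }   /* reap both children */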

Common misconceptions

  • “The shell handles Ctrl+C”: the terminal sends SIGINT to the foreground group.
  • “Pipes close automatically”: if the parent keeps them open, children may hang.

Check-your-understanding questions

  1. Why must the shell be a process group leader?
  2. Why must the parent close pipe FDs?
  3. Why is cd a built-in?

Check-your-understanding answers

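  1. Being in its own process group lets the shell hand the terminal to a job’s group with tcsetpgrp and reclaim it afterwards, so terminal signals reach only the foreground job.
  2. A reader sees EOF only after every copy of the write end is closed; a copy left open in the parent keeps the reader blocked.
  3. chdir affects only the calling process, so cd run in a child would leave the shell’s own working directory unchanged.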
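To make the last answer concrete, here is a minimal sketch of cd as a built-in that runs inside the shell process without forking. The function name builtin_cd and the fallback to HOME when no argument is given are assumptions for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Change the shell's own working directory. argv[0] is "cd".
 * Returns 0 on success, 1 on failure. */
int builtin_cd(char *const argv[])
{
    const char *dir = argv[1] ? argv[1] : getenv("HOME");
    if (!dir) {
        fprintf(stderr, "myshell: cd: HOME not set\n");
        return 1;
    }
    if (chdir(dir) != 0) {
        fprintf(stderr, "myshell: cd: %s: %s\n", dir, strerror(errno));
        return 1;
    }
    return 0;
}
```

The executor would check for built-in names before forking and call this function directly, so the change of directory persists for later commands.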

Real-world applications

  • Shells, job control, and scripting engines.

Where you’ll apply it

  • This project: Section 3.2, Section 3.7, Section 5.10 Phase 3.
  • Also used in: Project 6, Project 12.

References

  • “APUE” Ch. 3, 9
  • “TLPI” Ch. 20-22, 44

Key insights

A shell is a file-descriptor routing engine with process control policies.

Summary

Building a shell forces you to master process creation, pipes, and signals.

Homework/Exercises to practice the concept

  1. Implement | for two commands and then generalize.
  2. Add output redirection with > and >>.
  3. Implement jobs and fg built-ins.

Solutions to the homework/exercises

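  1. Call pipe(), fork twice, dup2 the pipe ends in the children, and close both ends in the parent.
  2. Open the file with O_WRONLY | O_CREAT plus O_TRUNC (for >) or O_APPEND (for >>), then dup2 it onto STDOUT_FILENO.
  3. Keep a job list that records each job’s process group ID and state.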
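For the third exercise, a job list could be sketched as a singly linked list keyed by process group ID. All names here (struct job, job_add, job_find_pgid, job_remove, job_format) are hypothetical, and the 128-byte command buffer is an arbitrary choice for the example. The output format follows the jobs transcript shown in this guide.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

enum job_state { JOB_RUNNING, JOB_STOPPED, JOB_DONE };

struct job {
    int id;                 /* number shown in brackets by jobs */
    pid_t pgid;             /* process group of the whole pipeline */
    enum job_state state;
    char cmdline[128];      /* original command text, truncated if longer */
    struct job *next;
};

static struct job *job_list;

/* Append a running job; IDs grow from 1. Returns NULL on allocation failure. */
struct job *job_add(pid_t pgid, const char *cmdline)
{
    struct job *j = calloc(1, sizeof *j);
    if (!j) return NULL;
    struct job **tail = &job_list;
    int id = 1;
    for (; *tail; tail = &(*tail)->next)
        id = (*tail)->id + 1;
    j->id = id;
    j->pgid = pgid;
    j->state = JOB_RUNNING;
    strncpy(j->cmdline, cmdline, sizeof j->cmdline - 1);  /* calloc left the NUL */
    *tail = j;
    return j;
}

struct job *job_find_pgid(pid_t pgid)
{
    for (struct job *j = job_list; j; j = j->next)
        if (j->pgid == pgid) return j;
    return NULL;
}

void job_remove(struct job *dead)
{
    for (struct job **pp = &job_list; *pp; pp = &(*pp)->next)
        if (*pp == dead) { *pp = dead->next; free(dead); return; }
}

/* Write a line such as "[1] Running sleep 10" into buf. */
void job_format(const struct job *j, char *buf, size_t n)
{
    static const char *names[] = { "Running", "Stopped", "Done" };
    snprintf(buf, n, "[%d] %s %s", j->id, names[j->state], j->cmdline);
}
```

The shell would update state from waitpid results (for example JOB_STOPPED when WIFSTOPPED is true) and remove a job once it has been reported as done.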

3. Project Specification

3.1 What You Will Build

A shell that supports command execution, pipelines, redirection, and job control, with built-ins for cd, jobs, fg, and exit.

3.2 Functional Requirements

  1. Execute external commands with correct exit status.
  2. Support pipelines of arbitrary length.
  3. Support <, >, and >> redirection.
  4. Support background jobs (&) and job control.
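Requirement 1 depends on decoding the status word returned by waitpid. The sketch below shows one way to do it; the names exit_code_from_status and run_command are hypothetical, and the values 128 + signal number and 127 follow the convention used by common shells rather than anything mandated by this project.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a wait status to a shell-style exit code. */
int exit_code_from_status(int status)
{
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);  /* assumed convention */
    return -1;   /* stopped or continued: the process has not finished */
}

/* Run one external command and return its exit code, or -1 on error. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);          /* assumed convention: command could not be run */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return exit_code_from_status(status);
}
```

Storing this value after each foreground command is what lets the shell report a correct status to the user or to a later feature such as `$?`.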

3.3 Non-Functional Requirements

  • Performance: handle pipelines with up to 10 commands.
  • Reliability: no shell crash on Ctrl+C.
  • Usability: prompt and basic error messages.

3.4 Example Usage / Output

myshell> ls -la | grep .c | wc -l
5
myshell> sleep 10 &
[1] 4321
myshell> jobs
[1] Running sleep 10

3.5 Data Formats / Schemas / Protocols

  • Internal pipeline structure with argv arrays and redirection flags.

3.6 Edge Cases

  • Quoted strings with spaces.
  • Redirecting both stdin and stdout.
  • Background pipeline with multiple processes.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

./myshell

3.7.2 Golden Path Demo (Deterministic)

  • Run fixed commands; no randomness involved.

3.7.3 If CLI: exact terminal transcript

$ ./myshell
myshell> echo hello
hello
myshell> printf "b\na\n" | sort
a
b
myshell> sleep 1 &
[1] 5000
myshell> jobs
[1] Running sleep 1

Failure demo (deterministic):

myshell> cat < missing.txt
myshell: missing.txt: No such file or directory

Exit codes:

  • 0 success
  • 2 parse error
  • 3 exec error

4. Solution Architecture

4.1 High-Level Design

Parser -> Pipeline builder -> Executor -> Job table

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Tokenizer | split input | handle quotes/escapes |
| Parser | build pipeline | represent argv + redirects |
| Executor | fork/exec + pipes | close FDs properly |
| Job control | process groups | tcsetpgrp + signals |

4.3 Data Structures (No Full Code)

struct command {
    char *argv[64];
    char *infile;
    char *outfile;
    int append;
};
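To show how a token stream might be turned into an array of these structures, here is a minimal sketch. It repeats the struct so the block compiles on its own. The function name parse_pipeline is hypothetical, and the sketch assumes operators arrive as plain strings, so it cannot tell a quoted "|" from a real one and does not check that a redirection target is a word.

```c
#include <string.h>

#define MAX_ARGS 64

struct command {
    char *argv[MAX_ARGS];
    char *infile;
    char *outfile;
    int append;
};

/* Fill cmds[] from tok[0..ntok-1]. Sets *background when the line ends
 * in "&". Returns the number of commands, or -1 on a syntax error. */
int parse_pipeline(char *tok[], int ntok, struct command cmds[],
                   int max_cmds, int *background)
{
    int n = 0, argc = 0;
    if (ntok == 0 || max_cmds < 1) return -1;
    memset(&cmds[0], 0, sizeof cmds[0]);
    *background = 0;

    for (int i = 0; i < ntok; i++) {
        struct command *c = &cmds[n];
        if (strcmp(tok[i], "|") == 0) {
            /* A pipe needs a command on both sides. */
            if (argc == 0 || i + 1 == ntok || n + 1 == max_cmds) return -1;
            c->argv[argc] = NULL;
            n++;
            argc = 0;
            memset(&cmds[n], 0, sizeof cmds[n]);
        } else if (strcmp(tok[i], "<") == 0 || strcmp(tok[i], ">") == 0 ||
                   strcmp(tok[i], ">>") == 0) {
            if (i + 1 == ntok) return -1;        /* missing file name */
            if (tok[i][0] == '<') {
                c->infile = tok[i + 1];
            } else {
                c->outfile = tok[i + 1];
                c->append = (tok[i][1] == '>');
            }
            i++;                                 /* skip the file name */
        } else if (strcmp(tok[i], "&") == 0) {
            if (i + 1 != ntok) return -1;        /* & only at end of line */
            *background = 1;
        } else {
            if (argc == MAX_ARGS - 1) return -1; /* keep room for NULL */
            c->argv[argc++] = tok[i];
        }
    }
    if (argc == 0) return -1;                    /* empty final command */
    cmds[n].argv[argc] = NULL;
    return n + 1;
}
```

The command structures only borrow pointers from the token array, so the tokens must stay alive until the pipeline has been executed.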

4.4 Algorithm Overview

Key Algorithm: Pipeline execution

  1. Create pipes for N-1 links.
  2. Fork N children.
  3. In each child, dup2 pipe ends.
  4. Exec command.

Complexity Analysis:

  • Time: O(n) processes
  • Space: O(n) for pipes

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

project-root/
|-- myshell.c
|-- parser.c
|-- job.c
`-- Makefile

5.3 The Core Question You’re Answering

“How do pipes, fork/exec, and signals combine to implement the Unix command model?”

5.4 Concepts You Must Understand First

  1. File descriptor duplication with dup2.
  2. Process groups and terminal control.
  3. Signal handling (SIGINT, SIGTSTP).

5.5 Questions to Guide Your Design

  1. How will you parse quotes and escapes?
  2. When should the parent wait vs return to prompt?
  3. How will you keep job state consistent?

5.6 Thinking Exercise

Explain the FD wiring for grep foo < in.txt | sort | uniq -c > out.txt.

5.7 The Interview Questions They’ll Ask

  1. Why must the shell close pipe FDs?
  2. Why is cd built-in?

5.8 Hints in Layers

Hint 1: Single command execution.

Hint 2: Two-command pipeline.

Hint 3: Full job control.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Pipes/signals | TLPI | 20-22, 44 |
| Shell design | APUE | 3, 9 |

5.10 Implementation Phases

Phase 1: Parser + single exec (4-6 hours)

Goals: run commands and handle errors.

Phase 2: Pipelines + redirection (6-8 hours)

Goals: pipe chaining and file redirects.

Phase 3: Job control (6-8 hours)

Goals: background jobs, terminal handoff, and correct signal handling.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Parser | handwritten vs parser generator | handwritten | simpler scope |
| Job tracking | array vs list | list | dynamic jobs |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit | tokenizer tests | quoted strings |
| Integration | pipelines | echo a \| wc -c |
| Signal | job control | Ctrl+C on sleep |

6.2 Critical Test Cases

  1. Pipeline of 3 commands.
  2. Redirection with missing input file.
  3. Background job that finishes quickly.

6.3 Test Data

Command: printf "c\nb\na\n" | sort
Expected: a, b, c (one per line)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Pipes hang | process never exits | close unused FDs |
| Shell killed by Ctrl+C | exit on SIGINT | ignore SIGINT in shell |
| Wrong parsing | broken argv | add tokenizer tests |

7.2 Debugging Strategies

  • Use strace -f to see pipe/dup2/exec calls.
  • Print the parsed AST before execution.

7.3 Performance Traps

  • Forking a child for built-ins such as cd or jobs; run them directly in the shell process instead.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add pwd built-in.

8.2 Intermediate Extensions

  • Implement 2> stderr redirection.

8.3 Advanced Extensions

  • Add basic job persistence across shell sessions.

9. Real-World Connections

9.1 Industry Applications

  • Shells and automation systems.

9.2 Related Open Source Projects

  • bash and dash source code.

9.3 Interview Relevance

  • Pipes, redirection, job control questions.

10. Resources

10.1 Essential Reading

  • TLPI Ch. 20-22, 44

10.2 Video Resources

  • Shell implementation lectures

10.3 Tools & Documentation

  • man pipe, man dup2, man tcsetpgrp

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain job control signals.
  • I can explain pipeline FD wiring.

11.2 Implementation

  • Shell handles pipelines and redirection.

11.3 Growth

  • I can compare my shell to /bin/sh.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Execute commands and pipelines.

Full Completion:

  • Job control with background/foreground.

Excellence (Going Above & Beyond):

  • Advanced parsing and redirection features.