Project 4: Complete Shell Implementation
Build a functional UNIX shell with command execution, pipelines, I/O redirection, background jobs, and basic job control (Ctrl-C, Ctrl-Z).
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4 - Expert |
| Time Estimate | 40-80 hours |
| Language | C (primary), Rust with unsafe blocks (alternative) |
| Prerequisites | Projects 1-3, process control, signals |
| Key Topics | fork/exec/wait, Pipelines, Signals, Job Control, Terminal I/O |
1. Learning Objectives
After completing this project, you will:
- Master the fork/exec/wait pattern for process creation and management
- Understand process groups, sessions, and the controlling terminal
- Implement pipeline construction with proper pipe and file descriptor management
- Handle signals correctly in a parent process (shell) vs child processes
- Implement I/O redirection using dup2() and file descriptor manipulation
- Build a job table for tracking background and stopped processes
- Understand how Ctrl-C, Ctrl-Z, and job control actually work
- Parse command lines including quotes, escapes, and special characters
- Appreciate the complexity hidden behind every shell command you type
2. Theoretical Foundation
2.1 Core Concepts
The fork/exec/wait Pattern
UNIX doesn’t have a “create process” call. Instead, a process clones itself (fork) and optionally replaces its code (exec). The parent waits for the child (wait).
Shell Process (pid 100)
│
│ User types: "ls -la"
│
│ fork()
│
├───────────────────────────────────────────────────────┐
│ │
│ Parent (pid 100) Child (pid 101)
│ │
│ fork() returns 101 fork() returns 0
│ │
│ wait(&status) exec("ls", "ls", "-la")
│ (blocks until child exits) │
│ Code REPLACED with ls
│ │
│ ls runs, prints output
│ │
│ exit(0)
│ │
│ wait() returns 101 (zombie reaped)
│ status = 0 (success)
│
│ Shell prints next prompt
▼
Why this design?
- Separation of concerns: Creating a process and running a program are independent
- Setup between fork and exec: Child can redirect I/O, change directory, set environment
- Copy-on-write optimization: Fork is fast because pages aren’t copied until modified
Process Groups and Sessions
Terminal Session (sid 100, controlling tty: /dev/pts/0)
│
├── Shell (pid 100, pgid 100) ────────────────── Foreground group
│ │
│ └── Foreground job: "vim file.txt"
│ └── vim (pid 101, pgid 101) ─────────── This is foreground now
│ Shell gave it the terminal
│
├── Background job 1: "make &"
│ └── make (pid 200, pgid 200) ─────────────── Background group
│ └── gcc (pid 201, pgid 200) All in same process group
│ └── gcc (pid 202, pgid 200)
│
└── Background job 2: "sleep 100 &"
└── sleep (pid 300, pgid 300) ────────────── Another background group
Key Concepts:
┌──────────────────────────────────────────────────────────────────────┐
│ Session: Collection of process groups, one controlling tty │
│ Process Group: Group of related processes (e.g., pipeline) │
│ Foreground: The process group receiving terminal signals │
│ Background: Process groups not receiving terminal signals │
│ Controlling TTY: The terminal that sends signals (Ctrl-C, Ctrl-Z) │
└──────────────────────────────────────────────────────────────────────┘
Pipeline Construction
For ls | sort | head:
Shell (pid 100, pgid 100)
│
│ Create pipe1: [read_fd=3, write_fd=4]
│ Create pipe2: [read_fd=5, write_fd=6]
│
│ fork() × 3
│
├─────────────────────────────────────────────────────────────┐
│ │
│ Child 1 (pid 101) All children set pgid to 101 │
│ ┌──────────────────┐ (first child's pid) │
│ │ setpgid(0, 0) │ Creates new process group │
│ │ │ │
│ │ close(3) [pipe1 read - not needed] │
│ │ close(5) [pipe2 read - not needed] │
│ │ close(6) [pipe2 write - not needed] │
│ │ │ │
│ │ dup2(4, STDOUT) │ ← stdout → pipe1 write │
│ │ close(4) │ │
│ │ │ │
│ │ exec("ls") │ │
│ └──────────────────┘ │
│ │
│ Child 2 (pid 102) │
│ ┌──────────────────┐ │
│ │ setpgid(0, 101) │ ← Join process group 101 │
│ │ │ │
│ │ close(4) [pipe1 write - not needed] │
│ │ close(5) [pipe2 read - not needed] │
│ │ │ │
│ │ dup2(3, STDIN) │ ← stdin ← pipe1 read │
│ │ dup2(6, STDOUT) │ ← stdout → pipe2 write │
│ │ close(3) │ │
│ │ close(6) │ │
│ │ │ │
│ │ exec("sort") │ │
│ └──────────────────┘ │
│ │
│ Child 3 (pid 103) │
│ ┌──────────────────┐ │
│ │ setpgid(0, 101) │ ← Join process group 101 │
│ │ │ │
│ │ close(3) [pipe1 read - not needed] │
│ │ close(4) [pipe1 write - not needed] │
│ │ close(6) [pipe2 write - not needed] │
│ │ │ │
│ │ dup2(5, STDIN) │ ← stdin ← pipe2 read │
│ │ close(5) │ │
│ │ │ │
│ │ exec("head") │ │
│ └──────────────────┘ │
│
│ Parent (shell):
│ ├── Close ALL pipe fds: 3, 4, 5, 6
│ ├── tcsetpgrp(tty_fd, 101) ← Give terminal to job
│ ├── waitpid(-101, ...) ← Wait for all in group
│ ├── tcsetpgrp(tty_fd, 100) ← Take terminal back
│ └── Print next prompt
Critical rule: Every process must close all pipe ends it doesn’t use. If the shell forgets to close pipe1’s write end, sort will never see EOF and will hang forever.
Signal Handling for Shells
Signal Behavior in Shell vs Child Processes
┌───────────────────────────────────────────────────────────────────────┐
│ │
│ Shell (parent): Child (before exec): │
│ ┌────────────────────────────┐ ┌────────────────────────────────┐ │
│ │ SIGINT → SIG_IGN (ignore) │ │ SIGINT → SIG_DFL (reset) │ │
│ │ SIGTSTP → SIG_IGN (ignore) │ │ SIGTSTP → SIG_DFL (reset) │ │
│ │ SIGTTIN → SIG_IGN (ignore) │ │ SIGTTIN → SIG_DFL (reset) │ │
│ │ SIGTTOU → SIG_IGN (ignore) │ │ SIGTTOU → SIG_DFL (reset) │ │
│ │ SIGCHLD → handler │ │ SIGCHLD → SIG_DFL │ │
│ │ (reap zombies) │ │ │ │
│ └────────────────────────────┘ └────────────────────────────────┘ │
│ │
│ Why? │
│ - Shell ignores Ctrl-C so it doesn't die when user presses Ctrl-C │
│ - Child resets to default so it DOES die on Ctrl-C │
│ - Terminal sends signal to foreground PROCESS GROUP │
│ - Shell is in its own group, so it's protected │
│ │
└───────────────────────────────────────────────────────────────────────┘
When user presses Ctrl-C during "sleep 100":
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ Terminal Driver │
│ │ │
│ │ Ctrl-C detected │
│ │ │
│ ▼ │
│ Send SIGINT to foreground process group │
│ │ │
│ │ Foreground group is 101 (sleep's group) │
│ │ │
│ ├───────────────────────────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ Shell (pgid 100) sleep (pgid 101) │
│ NOT in group 101 IN group 101 │
│ Does NOT receive signal RECEIVES signal │
│ Continues running Terminates │
│ │
│ Shell then: │
│ - SIGCHLD received (child died) │
│ - waitpid() returns │
│ - Take back terminal │
│ - Print prompt │
│ │
└─────────────────────────────────────────────────────────────────────┘
I/O Redirection
Command: cat < input.txt > output.txt 2>&1
Before exec("cat"):
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ Original: After redirection: │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ 0 → stdin │ │ 0 → input.txt │ open("input.txt", O_RDONLY)│
│ │ 1 → stdout │ │ 1 → output.txt │ open("output.txt", O_WRONLY)│
│ │ 2 → stderr │ │ 2 → output.txt │ dup2(1, 2) │
│ └─────────────┘ └─────────────┘ │
│ │
│ Steps: │
│ 1. fd = open("input.txt", O_RDONLY) // Returns e.g., 3 │
│ 2. dup2(fd, 0) // Now 0 → input.txt │
│ 3. close(fd) // Clean up fd 3 │
│ 4. fd = open("output.txt", O_WRONLY|O_CREAT|O_TRUNC, 0644) │
│ 5. dup2(fd, 1) // Now 1 → output.txt │
│ 6. close(fd) │
│ 7. dup2(1, 2) // Now 2 → output.txt │
│ 8. exec("cat") // Cat uses redirected fds │
│ │
└─────────────────────────────────────────────────────────────────────┘
2.2 Why This Matters
The shell is the textbook example of UNIX process control. Building one teaches:
- How programs are actually executed
- Why fork and exec are separate
- How pipes connect processes
- Why Ctrl-C kills the right process
- How background jobs work
- The complexity hidden behind simple commands
Career relevance:
- Every systems programming interview asks about fork/exec
- Understanding the shell helps debug any process issue
- Container runtimes (Docker) use these same primitives
- Init systems (systemd) are essentially specialized shells
2.3 Historical Context
The shell dates to the earliest UNIX. Thompson shell (1971), Bourne shell (1979), C shell, Korn shell, and Bash (1989) all built on the same fundamental model.
The fork-exec split was chosen because it enables:
- Redirection setup between fork and exec
- Pipelines as a composition of simple operations
- Clean separation of process creation and program execution
This design has proven so powerful that it’s essentially unchanged in 50+ years.
2.4 Common Misconceptions
Misconception 1: “The shell runs commands”
- Reality: The shell creates processes that run commands. The shell just coordinates.
Misconception 2: “Ctrl-C kills the shell”
- Reality: Ctrl-C sends SIGINT to the foreground process group. The shell ignores SIGINT.
Misconception 3: “Pipes connect commands”
- Reality: Pipes connect file descriptors. The shell sets up the connections before exec.
Misconception 4: “Background jobs run in the background”
- Reality: They run in a different process group that doesn’t receive terminal signals.
Misconception 5: “exec() creates a new process”
- Reality: exec() replaces the current process’s code. No new process is created.
3. Project Specification
3.1 What You Will Build
A functional UNIX shell called mysh that:
- Executes commands (ls, cat, gcc, etc.)
-
Supports pipelines (cmd1 cmd2 cmd3) - Supports I/O redirection (<, >, », 2>&1)
- Supports background jobs (cmd &)
- Implements job control (Ctrl-C, Ctrl-Z, fg, bg, jobs)
- Has basic built-in commands (cd, exit, jobs, fg, bg)
3.2 Functional Requirements
- Command execution: Parse and execute commands with arguments
-
Pipelines: Connect multiple commands with pipes ( ) - Input redirection:
cmd < filereads stdin from file - Output redirection:
cmd > fileandcmd >> file - Error redirection:
cmd 2> fileandcmd 2>&1 - Background execution:
cmd &runs in background - Job table: Track running and stopped jobs
- Job control: Ctrl-C interrupts, Ctrl-Z stops foreground job
- fg command: Bring job to foreground
- bg command: Continue stopped job in background
- jobs command: List current jobs
- cd command: Change directory
- exit command: Exit shell
3.3 Non-Functional Requirements
- No zombies: All child processes must be reaped
- Signal safety: Proper signal handling, no race conditions
- Resource cleanup: All file descriptors properly closed
- Error messages: Meaningful messages for all errors
- Graceful degradation: Handle errors without crashing
3.4 Example Usage / Output
# 1. Start your shell
$ ./mysh
mysh>
# 2. Basic command execution
mysh> echo hello world
hello world
mysh> ls -la
total 48
drwxr-xr-x 5 user user 4096 Mar 15 10:00 .
...
# 3. Pipelines
mysh> cat /etc/passwd | grep root | cut -d: -f1
root
mysh> ls -la | sort -k5 -n | tail -5
-rw-r--r-- 1 user user 1024 Mar 14 09:00 small.txt
-rw-r--r-- 1 user user 4096 Mar 15 10:00 medium.txt
...
# 4. I/O Redirection
mysh> echo "hello" > output.txt
mysh> cat < output.txt
hello
mysh> ls nosuchfile 2>&1 | grep -i "no such"
ls: cannot access 'nosuchfile': No such file or directory
# 5. Background jobs
mysh> sleep 10 &
[1] 12345
mysh> sleep 20 &
[2] 12346
mysh> jobs
[1] Running sleep 10
[2] Running sleep 20
# 6. Job control
mysh> sleep 100
^Z
[1]+ Stopped sleep 100
mysh> bg
[1]+ sleep 100 &
mysh> fg
sleep 100
^C
mysh>
# 7. Built-in commands
mysh> cd /tmp
mysh> pwd
/tmp
mysh> cd
mysh> pwd
/home/user
# 8. Exit
mysh> exit
$
3.5 Real World Outcome
What success looks like:
- Commands run correctly: ls, cat, grep all work as expected
- Pipelines work: Data flows through multi-stage pipelines
- Redirection works: Files are read and written correctly
- No hangs: Pipelines complete, Ctrl-C works
- No zombies:
ps aux | grep defunctshows nothing - Job control: Can stop, resume, and manage background jobs
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────────┐
│ mysh │
│ │
│ ┌────────────────┐ │
│ │ Initialize │ ← Set up signal handlers, terminal control │
│ └───────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ MAIN LOOP (REPL) │ │
│ │ │ │
│ │ ┌────────────────┐ │ │
│ │ │ Print prompt │ │ │
│ │ └───────┬────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────┐ │ │
│ │ │ Read line │ ← fgets or readline │ │
│ │ └───────┬────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────┐ │ │
│ │ │ Parse line │ ← Tokenize, handle quotes, find |, <, > │ │
│ │ └───────┬────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ Built-in cmd? │──Yes───▶│ Execute built-in (cd, exit) │ │ │
│ │ └───────┬────────┘ └─────────────────────────────┘ │ │
│ │ │ No │ │
│ │ ▼ │ │
│ │ ┌────────────────┐ │ │
│ │ │ Create pipes │ ← For each | in command │ │
│ │ └───────┬────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ For each command in pipeline: │ │ │
│ │ │ │ │ │
│ │ │ fork() ────────────────────────────────────────┐ │ │ │
│ │ │ │ │ │ │
│ │ │ Parent: Child: │ │ │ │
│ │ │ - Track child pid - setpgid() │ │ │ │
│ │ │ - Continue loop - Set up pipes│ │ │ │
│ │ │ - Set up redir│ │ │ │
│ │ │ - Reset signals│ │ │ │
│ │ │ - exec() │ │ │ │
│ │ │ ▼ │ │ │
│ │ │ (runs command) │ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────┐ │ │
│ │ │ Close pipes │ ← Parent must close all pipe fds │ │
│ │ └───────┬────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────┐ │ │
│ │ │ Foreground? │ │ │
│ │ └───────┬────────┘ │ │
│ │ │ │ │
│ │ ┌───────┴────────┐ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ [Foreground] [Background] │ │
│ │ tcsetpgrp() Add to job table │ │
│ │ wait for job Print [N] PID │ │
│ │ tcsetpgrp() │ │
│ │ │ │
│ │ Loop back to prompt │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Purpose | Key Functions |
|---|---|---|
| Parser | Tokenize input, handle quotes/escapes | Custom lexer |
| Pipeline Builder | Create pipes, set up connections | pipe(), dup2() |
| Executor | Fork and exec commands | fork(), execvp() |
| Job Manager | Track jobs, handle status changes | waitpid(), job table |
| Signal Handler | Handle SIGCHLD, ignore SIGINT in shell | sigaction() |
| Terminal Controller | Manage foreground group | tcsetpgrp() |
| Built-in Executor | Handle cd, exit, jobs, fg, bg | Internal functions |
4.3 Data Structures
// Represents a single command
typedef struct command {
char **argv; // NULL-terminated argument array
char *input_file; // < redirection (NULL if none)
char *output_file; // > or >> redirection
int append_output; // 1 for >>, 0 for >
char *error_file; // 2> redirection
int stderr_to_stdout; // 1 for 2>&1
} command_t;
// Represents a pipeline
typedef struct pipeline {
command_t *commands; // Array of commands
int num_commands; // Number of commands
int background; // 1 if ends with &
} pipeline_t;
// Job status
typedef enum {
JOB_RUNNING,
JOB_STOPPED,
JOB_DONE
} job_status_t;
// Job table entry
typedef struct job {
int job_id; // Job number [1], [2], etc.
pid_t pgid; // Process group ID
job_status_t status; // Running, Stopped, Done
char *cmdline; // Original command string
int notified; // 1 if user has been notified of Done
struct job *next; // Linked list
} job_t;
// Global shell state
typedef struct shell {
pid_t shell_pgid; // Shell's process group
int terminal_fd; // File descriptor for terminal
int interactive; // 1 if interactive mode
job_t *jobs; // Job list head
int next_job_id; // Next job number to assign
} shell_t;
4.4 Algorithm Overview
FUNCTION execute_pipeline(pipeline, shell):
// Step 1: Create all pipes needed
pipes = array of (num_commands - 1) pipe pairs
FOR i = 0 TO num_commands - 2:
pipe(pipes[i])
// Step 2: Fork each command
pgid = 0 // Will be set to first child's PID
FOR i = 0 TO num_commands - 1:
pid = fork()
IF pid == 0: // Child
// Set process group
IF pgid == 0:
setpgid(0, 0) // Become group leader
ELSE:
setpgid(0, pgid)
// Set up pipes
IF i > 0: // Not first command
dup2(pipes[i-1][READ], STDIN)
IF i < num_commands - 1: // Not last command
dup2(pipes[i][WRITE], STDOUT)
// Close all pipe ends
FOR each pipe in pipes:
close(pipe[READ])
close(pipe[WRITE])
// Set up redirections
setup_redirections(commands[i])
// Reset signals to default
reset_signals()
// Execute
execvp(commands[i].argv[0], commands[i].argv)
perror("exec failed")
exit(127)
ELSE: // Parent
IF i == 0:
pgid = pid
setpgid(pid, pgid) // Also set from parent (race condition fix)
// Step 3: Parent closes all pipes
FOR each pipe in pipes:
close(pipe[READ])
close(pipe[WRITE])
// Step 4: Handle foreground vs background
IF NOT pipeline.background:
// Foreground job
tcsetpgrp(shell.terminal_fd, pgid)
wait_for_job(pgid)
tcsetpgrp(shell.terminal_fd, shell.shell_pgid)
ELSE:
// Background job
job = create_job(pgid, cmdline)
add_job(shell, job)
printf("[%d] %d\n", job.job_id, pgid)
FUNCTION wait_for_job(pgid):
// Wait for all processes in the group
WHILE TRUE:
pid = waitpid(-pgid, &status, WUNTRACED)
IF pid <= 0:
BREAK
IF WIFSTOPPED(status):
// Job was stopped (Ctrl-Z)
job = find_job_by_pgid(pgid)
job.status = JOB_STOPPED
printf("\n[%d]+ Stopped %s\n", job.job_id, job.cmdline)
BREAK
IF WIFEXITED(status) OR WIFSIGNALED(status):
// Process exited
decrement_process_count(pgid)
IF all_processes_done(pgid):
BREAK
5. Implementation Guide
5.1 Development Environment Setup
# Create project directory
$ mkdir -p ~/projects/mysh
$ cd ~/projects/mysh
# Install readline for nicer input handling (optional)
$ sudo apt install libreadline-dev # Debian/Ubuntu
# Create initial files
$ touch mysh.c Makefile
# Test terminal capabilities
$ echo $TERM
xterm-256color
$ tty
/dev/pts/0
5.2 Project Structure
mysh/
├── mysh.c # Main shell code
├── parser.c # Command parsing
├── parser.h
├── executor.c # Command execution
├── executor.h
├── jobs.c # Job management
├── jobs.h
├── signals.c # Signal handling
├── signals.h
├── builtins.c # Built-in commands
├── builtins.h
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
“How does the shell coordinate multiple processes, connect their inputs and outputs, and manage them as jobs while the user interacts with the terminal?”
The shell is the conductor of an orchestra of processes. Understanding this teaches you everything about UNIX process management that you’ll need for any systems programming.
5.4 Concepts You Must Understand First
Stop and research these before coding:
- fork/exec/wait Pattern
- Why are these separate system calls?
- What does each exec variant do (execl, execv, execvp)?
- Book Reference: “APUE” Ch. 8
- Process Groups and Sessions
- What’s a process group? Why do they exist?
- What’s a session? What’s a controlling terminal?
- Book Reference: “APUE” Ch. 9
- File Descriptor Manipulation
- dup(), dup2(), and their role in redirection
- How pipes connect processes
- Book Reference: “APUE” Ch. 3.12, 15.2
- Signal Handling for Shells
- SIGCHLD—child process status changed
- SIGINT, SIGTSTP—keyboard interrupts
- When should the shell ignore vs handle these?
- Book Reference: “APUE” Ch. 10
5.5 Questions to Guide Your Design
Before implementing, think through these:
- Process Structure
- Does each pipeline stage fork from the shell, or from the previous stage?
- Who creates the pipes? (Answer: shell, before forking)
- Terminal Control
- Which process should receive Ctrl-C?
- How do you give the terminal to a foreground job?
- What happens if a background job tries to read from terminal?
- Job Tracking
- How to track running/stopped jobs?
- When to reap zombies vs wait for foreground?
- How to know when all pipeline processes have exited?
5.6 Thinking Exercise
Trace Pipeline Execution
For command ls | sort | head, trace process creation:
Shell (pid 100, pgid 100, sid 100)
│
├─ fork() ─────────────────────────────────────────────────────┐
│ │
│ v
│ ┌────────────────────┐
│ 1. Create pipe1: [read_fd, write_fd] │ Child1 (pid 101) │
│ 2. fork() for ls │ - setpgid(0, 101) │
│ 3. In child: dup2(pipe1_write, STDOUT) │ - close pipe read │
│ close unused pipe ends │ - exec("ls") │
│ └────────────────────┘
│
├─ fork() ─────────────────────────────────────────────────────┐
│ v
│ ┌────────────────────┐
│ 4. Create pipe2: [read_fd, write_fd] │ Child2 (pid 102) │
│ 5. fork() for sort │ - setpgid(0, 101) │
│ 6. In child: dup2(pipe1_read, STDIN) │ (same group!) │
│ dup2(pipe2_write, STDOUT) │ - exec("sort") │
│ close unused ends └────────────────────┘
│
├─ fork() ─────────────────────────────────────────────────────┐
│ v
│ ┌────────────────────┐
│ 7. fork() for head │ Child3 (pid 103) │
│ 8. In child: dup2(pipe2_read, STDIN) │ - setpgid(0, 101) │
│ close unused ends │ - exec("head") │
│ exec("head") └────────────────────┘
│
│ 9. Shell: close all pipe ends
│ 10. Shell: tcsetpgrp(tty_fd, 101) // Give terminal to job
│ 11. Shell: waitpid(-101, ...) for all children
│ 12. Shell: tcsetpgrp(tty_fd, 100) // Take back terminal
Questions to answer:
- Why must all children be in the same process group? (So Ctrl-C kills them all)
- What happens if shell forgets to close pipe write end? (sort never sees EOF, hangs)
- Why does shell wait AFTER giving terminal to job? (To not race with children)
5.7 Hints in Layers
Hint 1: Start Simple Begin with just executing single commands (fork, exec, wait). No pipes, no redirection. Get this working first.
// Simplest shell
while (1) {
char line[1024];
printf("mysh> ");
fgets(line, sizeof(line), stdin);
// Parse line into argv
pid_t pid = fork();
if (pid == 0) {
execvp(argv[0], argv);
perror("exec");
exit(127);
}
waitpid(pid, NULL, 0);
}
Hint 2: Add Redirection Before exec(), use dup2() to redirect stdin/stdout to files. Remember to close the original file descriptor.
// In child, before exec:
if (input_file) {
int fd = open(input_file, O_RDONLY);
dup2(fd, STDIN_FILENO);
close(fd);
}
if (output_file) {
int fd = open(output_file, O_WRONLY|O_CREAT|O_TRUNC, 0644);
dup2(fd, STDOUT_FILENO);
close(fd);
}
Hint 3: Pipelines
// For each pair of adjacent commands:
int pipefd[2];
pipe(pipefd);
// Left command gets pipefd[1] as stdout
// Right command gets pipefd[0] as stdin
// PARENT MUST CLOSE BOTH ENDS
Hint 4: Job Control Signals In the shell (parent):
- Ignore SIGINT, SIGTSTP, SIGTTIN, SIGTTOU
- Handle SIGCHLD to track job state changes
In child before exec:
- Reset all signals to SIG_DFL
// In shell initialization:
signal(SIGINT, SIG_IGN);
signal(SIGTSTP, SIG_IGN);
signal(SIGTTIN, SIG_IGN);
signal(SIGTTOU, SIG_IGN);
signal(SIGCHLD, sigchld_handler);
// In child, before exec:
signal(SIGINT, SIG_DFL);
signal(SIGTSTP, SIG_DFL);
// etc.
5.8 The Interview Questions They’ll Ask
- “Explain how a pipeline like
cat file | grep pattern | wc -lis set up.”- Shell creates 2 pipes
- Forks 3 children, all in same process group
- Each child redirects stdin/stdout appropriately
- Shell closes all pipe ends, waits for group
- “What happens when you press Ctrl-C while a command is running?”
- Terminal driver generates SIGINT
- Sends to foreground process group
- Shell is in different group, so not affected
- Command receives signal and terminates (usually)
- “How do you prevent zombie processes in a shell?”
- Handle SIGCHLD with handler that calls waitpid()
- Use WNOHANG to not block
- Loop to handle multiple children
- “What’s the difference between a foreground and background job?”
- Foreground: has the terminal, receives keyboard signals
- Background: doesn’t have terminal, can’t read from tty
- tcsetpgrp() controls which group is foreground
- “How would you implement command substitution $(cmd)?”
- Create pipe
- Fork, redirect child stdout to pipe write end
- Parent reads from pipe read end into string
- Wait for child
- Substitute string into command line
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Process control | “APUE” by Stevens | Ch. 8 |
| Process relationships | “APUE” by Stevens | Ch. 9 |
| Signals | “APUE” by Stevens | Ch. 10 |
| Job control | “The Linux Programming Interface” | Ch. 34 |
| Terminal I/O | “APUE” by Stevens | Ch. 18 |
5.10 Implementation Phases
Phase 1: Basic Execution (4-6 hours)
- Read line, parse into argv
- Fork, exec, wait
- Handle basic errors
Phase 2: Built-in Commands (2-3 hours)
- Implement cd, exit
- Check for built-ins before forking
Phase 3: I/O Redirection (4-6 hours)
- Parse <, >, »
- Implement dup2() redirections
- Handle 2>, 2>&1
Phase 4: Pipelines (6-8 hours)
-
Parse separators - Create pipes
- Fork multiple children
- Handle pipe fd cleanup
Phase 5: Background Jobs (4-6 hours)
- Detect & suffix
- Don’t wait for background jobs
- Implement basic job table
Phase 6: Job Control (8-12 hours)
- Set up process groups
- Handle SIGCHLD
- Implement jobs, fg, bg commands
- Handle Ctrl-C, Ctrl-Z
- Terminal control with tcsetpgrp()
Phase 7: Polish (4-6 hours)
- Better parsing (quotes, escapes)
- Prompt customization
- History (optional, use readline)
- Error messages
5.11 Key Implementation Decisions
| Decision | Trade-offs |
|---|---|
| Parse before fork vs in child | Before: can check errors early. In child: simpler code |
| Linked list vs array for jobs | Linked list: easy insert/delete. Array: simpler indexing |
| setpgid in parent AND child | Both needed due to race condition (which runs first?) |
| Readline vs fgets | Readline: history, editing. fgets: simpler, no dependency |
| Separate files vs monolithic | Separate: cleaner. Monolithic: easier to start |
6. Testing Strategy
6.1 Unit Tests
| Test | Input | Expected Result |
|---|---|---|
| Parse simple | “ls -l” | argv = [“ls”, “-l”, NULL] |
| Parse redirect | “cat < f” | input_file = “f” |
| Parse pipe | “a | b” | 2 commands |
| Parse background | “cmd &” | background = 1 |
6.2 Integration Tests
# Test basic execution
$ echo "echo hello" | ./mysh
hello
# Test pipelines
$ echo "echo abc | cat | cat" | ./mysh
abc
# Test redirection
$ echo "echo test > /tmp/out" | ./mysh
$ cat /tmp/out
test
# Test job control (interactive)
$ ./mysh
mysh> sleep 10 &
[1] 12345
mysh> jobs
[1] Running sleep 10
mysh> fg
sleep 10
^C
mysh>
6.3 Edge Cases to Test
| Case | Test | Expected Behavior |
|---|---|---|
| Empty command | Press enter | Print new prompt |
| Command not found | “nosuchcmd” | Error: command not found |
| Permission denied | Execute non-executable | Error message |
| Pipe without command | “ls |” | Syntax error |
| Background empty | “&” | Syntax error or ignore |
| Ctrl-C with no foreground | Press Ctrl-C | Print new prompt |
| cd with no args | “cd” | Go to $HOME |
| cd to nonexistent | “cd /nosuch” | Error message |
| Very long pipeline | “cat|cat|cat|…” | Should work |
6.4 Verification Commands
# Check for zombie processes
$ ps aux | grep -E '(defunct|Z)'
# Should show nothing from your shell
# Check file descriptor leaks
$ ls -l /proc/$(pgrep mysh)/fd | wc -l
# Should be stable over time
# Test with strace
$ strace -f ./mysh
# Verify fork/exec/wait patterns
# Memory check
$ valgrind --leak-check=full ./mysh
# Run some commands, then exit
7. Common Pitfalls & Debugging
Problem 1: “Pipeline hangs forever”
- Why: Forgot to close write end of pipe; reader never sees EOF
- Fix: Close ALL unused pipe ends in ALL processes (parent and children)
Problem 2: “Ctrl-C kills the shell”
- Why: Shell didn’t ignore SIGINT
- Fix: Shell must
signal(SIGINT, SIG_IGN)and reset in children
Problem 3: “Background job immediately stopped”
- Why: Background job tried to read from terminal (SIGTTIN)
- Fix: Redirect stdin from /dev/null for background jobs
Problem 4: “Zombie processes accumulating”
- Why: Not handling SIGCHLD properly
- Fix: Install SIGCHLD handler that calls
while (waitpid(-1, NULL, WNOHANG) > 0);
Problem 5: “fg command doesn’t work”
- Why: Not calling tcsetpgrp() to give terminal to job
- Fix:
tcsetpgrp(shell_terminal, job_pgid)before waiting
Problem 6: “Commands work from shell but not from mysh”
- Why: Not using execvp() (which searches PATH)
- Fix: Use execvp() instead of execv()
8. Extensions & Challenges
8.1 Easy Extensions
| Extension | Description | Learning |
|---|---|---|
| History | Use readline or implement own | File I/O, data structures |
| Aliases | command aliases | Hash tables |
| Environment | export, unset | environ array |
| Prompt | PS1 variable | String formatting |
8.2 Advanced Challenges
| Challenge | Description | Learning |
|---|---|---|
| Command substitution | $(cmd) or cmd |
Nested execution |
| Subshells | (cmd1; cmd2) | Process isolation |
| Control flow | if/then/else, while | Parsing, state machine |
| Functions | Function definition | Symbol table |
| Tab completion | Complete commands/files | Terminal I/O, directory ops |
| Scripting | Run shell scripts | File reading, interpreter |
8.3 Research Topics
- How does bash implement its parser?
- What is the POSIX shell specification?
- How does zsh achieve such fast completion?
- What is fish shell’s approach to job control?
9. Real-World Connections
9.1 Production Systems Using This
| System | How It Uses Process Control | Notable Feature |
|---|---|---|
| bash | Full POSIX shell | Feature-complete |
| zsh | Extended shell | Powerful completion |
| fish | Modern shell | User-friendly defaults |
| init/systemd | Service management | Process supervision |
| Docker | Container execution | Namespace isolation |
| sshd | Remote execution | Pseudo-terminal handling |
9.2 How the Pros Do It
Bash:
- Uses yacc for parsing
- Complex job control state machine
- Careful signal handling
systemd:
- Uses cgroups for process tracking
- Socket activation
- Dependency-based startup
9.3 Reading the Source
# Bash source
$ git clone https://git.savannah.gnu.org/git/bash.git
$ less bash/execute_cmd.c # Command execution
$ less bash/jobs.c # Job control
# Simple shell example
$ git clone https://github.com/brenns10/lsh
$ less lsh/src/main.c
# Plan 9 rc shell (simpler)
$ git clone https://github.com/rakitzis/rc
$ less rc/exec.c
10. Resources
10.1 Man Pages
$ man 2 fork # Create child process
$ man 3 exec # Execute program
$ man 2 wait # Wait for child
$ man 2 pipe # Create pipe
$ man 2 dup2 # Duplicate file descriptor
$ man 2 setpgid # Set process group
$ man 3 tcsetpgrp # Set foreground group
$ man 7 signal # Signal overview
10.2 Online Resources
- Writing a Shell in C - Stephen Brennan
- The Linux Programming Interface - Chapter 34
- POSIX Shell Specification
10.3 Book Chapters
| Book | Chapters | Topics Covered |
|---|---|---|
| “APUE” by Stevens | Ch. 8, 9, 10 | Processes, Groups, Signals |
| “TLPI” by Kerrisk | Ch. 24-28, 34 | Processes, Job Control |
| “The Design of the UNIX Operating System” | Ch. 7 | Process Control |
11. Self-Assessment Checklist
Before considering this project complete, verify:
- I can explain the fork/exec/wait pattern
- I understand why fork and exec are separate
- Pipelines with 3+ stages work correctly
- Ctrl-C kills the foreground job, not the shell
- Ctrl-Z stops the foreground job
- Background jobs run independently
- jobs/fg/bg commands work
- No zombie processes accumulate
- All pipe file descriptors are closed properly
- I can answer all the interview questions
- valgrind shows no memory leaks
12. Submission / Completion Criteria
Your project is complete when:
- Functionality
- Single commands execute correctly
- Pipelines with multiple stages work
- I/O redirection works (< > » 2>)
- Background jobs work (&)
- Job control works (Ctrl-C, Ctrl-Z, fg, bg, jobs)
- Quality
- No compiler warnings
- No valgrind errors
- No zombie processes
- Handles errors gracefully
- Testing
- Can run complex pipelines
- Job control is reliable
- Edge cases handled
- Understanding
- Can explain every aspect of process control
- Can debug process-related issues
- Can extend the shell with new features
Next Steps
After completing this project, you’ve mastered UNIX process control. Consider:
- Project 5: Process Monitor - Apply process knowledge to /proc
- Project 9: Daemon Service - Long-running process management
- Project 15: Terminal Emulator - The other side of terminal I/O
The shell is arguably the most important systems programming project. Everything else builds on these concepts.