Project 17: Final Project - Complete UNIX System Shell

Build a fully-featured UNIX shell combining everything you’ve learned: process control, job control, signals, pipes, redirection, environment variables, command history, tab completion, and scripting support. This is the ultimate integration challenge that tests mastery of every UNIX systems programming concept.


Quick Reference

Attribute          Value
Difficulty         Level 5 - Master
Time               4-6 Weeks
Language           C (alt: Rust)
Prerequisites      ALL previous projects, especially Projects 4, 6, 9, 15
Key Topics         Process control, job control, signals, pipes, redirection, terminal I/O, parsing, scripting
APUE Chapters      All chapters (1-21)
Coolness Level     5 - Pure Magic
Portfolio Value    Resume Gold - Ultimate Systems Programming Credential

1. Learning Objectives

By completing this project, you will:

  1. Master process lifecycle management: Create, track, and control multiple child processes with proper resource cleanup
  2. Implement full job control: Background processes, foreground/background switching, job suspension/resumption, process groups, and controlling terminals
  3. Handle signals correctly: Signal-safe programming, signal blocking during critical sections, propagating signals to process groups
  4. Build a complete lexer/parser: Tokenize and parse shell syntax including quotes, escapes, pipes, and control operators
  5. Implement I/O redirection: stdin/stdout/stderr redirection, file descriptors, here-documents
  6. Create interactive features: Line editing, command history, tab completion, prompt customization
  7. Support shell scripting: Control structures (if/for/while), functions, variables, and script execution
  8. Integrate all UNIX systems concepts: This project demonstrates mastery of every topic in Stevens’ APUE

2. Theoretical Foundation

2.1 Core Concepts

What Is a Shell?

A shell is a command interpreter that sits between the user and the operating system kernel. It reads commands, parses them, and executes them by creating new processes.
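
At its core that loop is only a handful of system calls. A minimal sketch in C (whitespace splitting only; no quoting, pipes, or job control):

/* Minimal command loop: read a line, split it on whitespace, fork and
 * exec, wait.  A sketch only; no quoting, pipes, or job control. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char line[1024];

    for (;;) {
        fputs("mysh$ ", stdout);
        fflush(stdout);
        if (!fgets(line, sizeof(line), stdin))
            break;                          /* EOF (Ctrl-D) ends the shell */

        char *argv[64];
        int argc = 0;
        for (char *tok = strtok(line, " \t\n"); tok && argc < 63;
             tok = strtok(NULL, " \t\n"))
            argv[argc++] = tok;
        argv[argc] = NULL;
        if (argc == 0)
            continue;

        pid_t pid = fork();
        if (pid == 0) {                     /* child: become the command */
            execvp(argv[0], argv);
            perror(argv[0]);
            _exit(127);                     /* exec failed: command not found */
        } else if (pid > 0) {
            waitpid(pid, NULL, 0);          /* parent: wait for it to finish */
        } else {
            perror("fork");
        }
    }
    return 0;
}

Everything the rest of this project adds (parsing, expansion, job control, interactivity) grows out of this skeleton.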

The Shell's Role in UNIX

┌─────────────────────────────────────────────────────────────────────┐
│                            USER                                      │
│                              │                                       │
│                              ▼                                       │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                         SHELL                                │    │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐│    │
│  │  │  Lexer   │►│  Parser  │►│ Executor │►│ Job Controller   ││    │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────────────┘│    │
│  │       │            │            │               │            │    │
│  │       │            │            │               ▼            │    │
│  │       │            │            │    ┌──────────────────────┐│    │
│  │       │            │            │    │   Process Table      ││    │
│  │       │            │            │    │   (jobs, pids)       ││    │
│  │       │            │            │    └──────────────────────┘│    │
│  └───────┼────────────┼────────────┼───────────────────────────┘    │
│          │            │            │                                 │
│          │  System    │            │                                 │
│          │  Calls     ▼            ▼                                 │
│  ┌───────┴────────────────────────────────────────────────────┐     │
│  │                         KERNEL                              │     │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐│     │
│  │  │ Process  │ │   File   │ │  Signal  │ │  Terminal        ││     │
│  │  │ Mgmt     │ │ System   │ │ Handling │ │  Driver          ││     │
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────────────┘│     │
│  └─────────────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────────┘

Process Groups and Sessions

Job control requires understanding process groups and sessions:

Sessions, Process Groups, and Jobs

┌─────────────────────────────────────────────────────────────────────┐
│                        SESSION (SID = 1000)                          │
│                    Controlling Terminal: /dev/pts/0                  │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  FOREGROUND PROCESS GROUP (PGID = 1000)                        │ │
│  │  ┌─────────────────────────────────────────────────────────┐   │ │
│  │  │ Shell (PID=1000, Session Leader)                        │   │ │
│  │  │ Receives SIGINT, SIGTSTP from terminal                  │   │ │
│  │  └─────────────────────────────────────────────────────────┘   │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  BACKGROUND JOB [1] (PGID = 1050)                              │ │
│  │  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐          │ │
│  │  │ cmd1 (1050) │ ─►│ cmd2 (1051) │ ─►│ cmd3 (1052) │          │ │
│  │  │ (group ldr) │   │             │   │             │          │ │
│  │  └─────────────┘   └─────────────┘   └─────────────┘          │ │
│  │  No terminal input; SIGTTIN if tries to read                   │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  BACKGROUND JOB [2] (PGID = 1060)                              │ │
│  │  ┌─────────────────────────────────────────────────────────┐   │ │
│  │  │ long_running_process (PID=1060)                         │   │ │
│  │  └─────────────────────────────────────────────────────────┘   │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  When shell does: fg %1                                              │
│  1. PGID 1050 becomes foreground process group                      │
│  2. Shell sets tcsetpgrp(fd, 1050)                                  │
│  3. Shell sends SIGCONT to PGID 1050                                │
│  4. Shell waits for job to complete or stop                         │
└─────────────────────────────────────────────────────────────────────┘
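
The four fg steps in the diagram map almost one-to-one onto system calls. A minimal sketch, assuming shell_terminal, shell_pgid, and shell_tmodes are globals saved at shell startup:

/* Sketch of `fg`: hand the terminal to the job's process group, wake it,
 * wait until it exits or stops, then reclaim the terminal.
 * shell_terminal, shell_pgid and shell_tmodes are assumed globals. */
#include <signal.h>
#include <sys/wait.h>
#include <termios.h>
#include <unistd.h>

extern int shell_terminal;
extern pid_t shell_pgid;
extern struct termios shell_tmodes;

void fg_job(pid_t job_pgid) {
    int status;

    tcsetpgrp(shell_terminal, job_pgid);      /* steps 1-2: job goes foreground */
    kill(-job_pgid, SIGCONT);                 /* step 3: resume it if stopped */

    /* step 4: wait until the group's processes exit, or one stops */
    while (waitpid(-job_pgid, &status, WUNTRACED) > 0)
        if (WIFSTOPPED(status))
            break;

    tcsetpgrp(shell_terminal, shell_pgid);    /* shell reclaims the terminal */
    tcsetattr(shell_terminal, TCSADRAIN, &shell_tmodes);
}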

Signal Handling in a Shell

A shell must handle many signals carefully:

Signal Handling Requirements

┌──────────────┬────────────────────────────────────────────────────────┐
│   Signal     │   Shell's Response                                     │
├──────────────┼────────────────────────────────────────────────────────┤
│   SIGINT     │   If in prompt: ignore (don't exit shell)             │
│   (Ctrl-C)   │   During command: let foreground job handle it        │
├──────────────┼────────────────────────────────────────────────────────┤
│   SIGTSTP    │   If in prompt: ignore                                 │
│   (Ctrl-Z)   │   During command: let foreground job stop              │
├──────────────┼────────────────────────────────────────────────────────┤
│   SIGCHLD    │   Always: reap zombies, update job table              │
│              │   Check if background jobs completed                   │
├──────────────┼────────────────────────────────────────────────────────┤
│   SIGQUIT    │   Ignore at prompt, let foreground job handle         │
│   (Ctrl-\)   │                                                        │
├──────────────┼────────────────────────────────────────────────────────┤
│   SIGTTIN    │   Background job tried to read terminal               │
│   SIGTTOU    │   Background job tried to write terminal (if set)     │
├──────────────┼────────────────────────────────────────────────────────┤
│   SIGCONT    │   Resume stopped jobs                                  │
└──────────────┴────────────────────────────────────────────────────────┘

Signal Flow:

   User presses Ctrl-C
          │
          ▼
   Terminal driver generates SIGINT
          │
          ▼
   ┌────────────────────────────────────────┐
   │  SIGINT sent to FOREGROUND             │
   │  PROCESS GROUP                         │
   │                                        │
   │  If foreground = shell: shell ignores  │
   │  If foreground = job: job receives it  │
   └────────────────────────────────────────┘
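
In practice this means the shell ignores the interactive signals for itself, while every forked child restores the default dispositions before exec; otherwise commands inherit SIG_IGN and stop reacting to Ctrl-C. A sketch of both halves:

/* Signal dispositions: the interactive shell ignores job-control signals;
 * each child resets them to the defaults right after fork(), before exec,
 * so the command it runs reacts to Ctrl-C / Ctrl-Z normally. */
#include <signal.h>

void shell_ignore_job_signals(void) {
    signal(SIGINT,  SIG_IGN);   /* the shell handles or ignores these itself */
    signal(SIGQUIT, SIG_IGN);
    signal(SIGTSTP, SIG_IGN);
    signal(SIGTTIN, SIG_IGN);
    signal(SIGTTOU, SIG_IGN);
}

void child_restore_default_signals(void) {
    /* called in the child between fork() and execvp() */
    signal(SIGINT,  SIG_DFL);
    signal(SIGQUIT, SIG_DFL);
    signal(SIGTSTP, SIG_DFL);
    signal(SIGTTIN, SIG_DFL);
    signal(SIGTTOU, SIG_DFL);
    signal(SIGCHLD, SIG_DFL);
}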

Pipes and Redirection

Pipes connect processes, redirection connects processes to files:

Pipe and Redirection Internals

Command: cat file.txt | grep pattern | wc -l > out.txt

Step 1: Create pipes
   pipe1[0,1]    pipe2[0,1]
      ▼             ▼
   [r1|w1]       [r2|w2]

Step 2: Fork and connect

   Parent (shell)
       │
       ├──fork()──► Child 1 (cat)
       │            dup2(open("file.txt"), STDIN)
       │            dup2(w1, STDOUT)
       │            close(r1, w1, r2, w2)
       │            exec("cat")
       │
       ├──fork()──► Child 2 (grep)
       │            dup2(r1, STDIN)
       │            dup2(w2, STDOUT)
       │            close(r1, w1, r2, w2)
       │            exec("grep", "pattern")
       │
       └──fork()──► Child 3 (wc)
                    dup2(r2, STDIN)
                    dup2(open("out.txt"), STDOUT)
                    close(r1, w1, r2, w2)
                    exec("wc", "-l")

Final state:
┌─────────────────────────────────────────────────────────────────┐
│                                                                  │
│  file.txt ──► [cat] ──pipe1──► [grep] ──pipe2──► [wc] ──► out.txt│
│                                                                  │
│               Process Group: All in same PGID                    │
└─────────────────────────────────────────────────────────────────┘
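
Written out in C for just two stages, the wiring above comes down to a strict dup2/close discipline. A sketch of roughly cat file.txt | wc -l, with error handling and process-group setup omitted:

/* Two-stage pipeline sketch: roughly `cat file.txt | wc -l`. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int pfd[2];
    if (pipe(pfd) < 0) { perror("pipe"); return 1; }

    pid_t p1 = fork();
    if (p1 == 0) {                            /* child 1: cat file.txt */
        int in = open("file.txt", O_RDONLY);
        if (in < 0) { perror("file.txt"); _exit(1); }
        dup2(in, STDIN_FILENO);               /* stdin  <- file.txt       */
        dup2(pfd[1], STDOUT_FILENO);          /* stdout -> pipe write end */
        close(in); close(pfd[0]); close(pfd[1]);
        execlp("cat", "cat", (char *)NULL);
        _exit(127);
    }

    pid_t p2 = fork();
    if (p2 == 0) {                            /* child 2: wc -l */
        dup2(pfd[0], STDIN_FILENO);           /* stdin <- pipe read end */
        close(pfd[0]); close(pfd[1]);
        execlp("wc", "wc", "-l", (char *)NULL);
        _exit(127);
    }

    /* The parent must close both ends, or wc never sees EOF. */
    close(pfd[0]); close(pfd[1]);
    waitpid(p1, NULL, 0);
    waitpid(p2, NULL, 0);
    return 0;
}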

Shell Parsing Stages

Parsing a Command Line

Input: VAR=hello echo "$VAR" | grep 'h' > out.txt &

Stage 1: LEXER (Tokenization)
┌─────────────────────────────────────────────────────────────────┐
│ Tokens:                                                          │
│   [ASSIGNMENT: VAR=hello]                                       │
│   [WORD: echo]                                                  │
│   [DQUOTE_STRING: "$VAR"]  → Will expand $VAR later             │
│   [PIPE: |]                                                     │
│   [WORD: grep]                                                  │
│   [SQUOTE_STRING: 'h']     → No expansion in single quotes      │
│   [REDIRECT_OUT: >]                                             │
│   [WORD: out.txt]                                               │
│   [BACKGROUND: &]                                               │
└─────────────────────────────────────────────────────────────────┘

Stage 2: PARSER (Build AST)
┌─────────────────────────────────────────────────────────────────┐
│                    BackgroundCommand                             │
│                           │                                      │
│                      Pipeline                                    │
│                     /         \                                  │
│            SimpleCommand    SimpleCommand                        │
│           /     |     \         |      \                        │
│    Assign   Args  Redirs    Args    Redirs                      │
│    VAR=hello  │     │         │        │                        │
│           [echo,$VAR] []  [grep,h] [>out.txt]                   │
└─────────────────────────────────────────────────────────────────┘

Stage 3: EXPANSION
┌─────────────────────────────────────────────────────────────────┐
│ • Variable expansion: "$VAR" → "hello"                          │
│ • Command substitution: $(cmd) → output of cmd                  │
│ • Glob expansion: *.c → file1.c file2.c                        │
│ • Tilde expansion: ~ → /home/user                               │
│ • Arithmetic expansion: $((1+2)) → 3                            │
└─────────────────────────────────────────────────────────────────┘

Stage 4: EXECUTION
┌─────────────────────────────────────────────────────────────────┐
│ 1. Create new process group for pipeline                        │
│ 2. Create pipe between commands                                 │
│ 3. Fork first child (echo):                                     │
│    - Set process group                                          │
│    - Set VAR=hello in environment                               │
│    - Connect stdout to pipe write end                           │
│    - exec("echo", ["echo", "hello"])                            │
│ 4. Fork second child (grep):                                    │
│    - Set same process group                                     │
│    - Connect stdin to pipe read end                             │
│    - Open out.txt, dup2 to stdout                               │
│    - exec("grep", ["grep", "h"])                                │
│ 5. Put in background (don't wait)                               │
│ 6. Print: [1] 12345                                             │
│ 7. Return to prompt                                             │
└─────────────────────────────────────────────────────────────────┘
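
Expansion (Stage 3) is where many subtle bugs hide. A minimal sketch of plain $NAME expansion only, looking names up with getenv() and ignoring ${...} forms, quoting context, and command substitution:

/* Minimal $NAME expansion sketch: copy `src`, replacing $NAME with
 * getenv("NAME") (empty string if unset).  Caller frees the result.
 * Real shells must also honor quoting, ${...}, $(...), $?, $$, etc. */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

char *expand_vars(const char *src) {
    size_t cap = strlen(src) * 2 + 64, len = 0;
    char *out = malloc(cap);

    while (*src) {
        if (*src == '$' && (isalpha((unsigned char)src[1]) || src[1] == '_')) {
            const char *start = ++src;
            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
            char name[128];
            size_t n = (size_t)(src - start);
            if (n >= sizeof(name)) n = sizeof(name) - 1;
            memcpy(name, start, n);
            name[n] = '\0';

            const char *val = getenv(name);
            if (!val) val = "";                 /* unset -> empty string */
            while (len + strlen(val) + 1 >= cap)
                out = realloc(out, cap *= 2);
            strcpy(out + len, val);
            len += strlen(val);
        } else {
            if (len + 2 >= cap)
                out = realloc(out, cap *= 2);
            out[len++] = *src++;
        }
    }
    out[len] = '\0';
    return out;
}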

2.2 Why This Matters

The Ultimate Test: Building a shell requires mastery of:

  • Process management (fork, exec, wait, process groups)
  • Signal handling (all signals relevant to job control)
  • File I/O (redirection, pipes, file descriptors)
  • Terminal I/O (raw mode, line discipline, PTY)
  • IPC (pipes as primary mechanism)
  • Parsing (lexer, parser, AST)
  • Memory management (no leaks in long-running shell)

Historical Significance:

  • The Thompson shell (1971) was the first UNIX shell
  • The Bourne shell (sh, 1979) established the scripting language
  • C shell (csh, 1978) added job control
  • Bash (1989) combined features and became the standard
  • Zsh, Fish, and others continue innovation

Career Impact:

  • Deep understanding of shells makes you an exceptional debugger
  • Shell internals knowledge is tested at Google, Meta, and systems companies
  • Building a shell demonstrates complete UNIX mastery
  • Every sysadmin tool, CI/CD system, and deployment script runs in a shell

2.3 Historical Context

Shell Evolution Timeline

1971: Thompson Shell (Ken Thompson)
      └── First UNIX shell
      └── Simple command execution
      └── Redirection with > and <
      └── No variables or scripting

1975: PWB Shell (John Mashey)
      └── Added shell variables
      └── if/then/else control flow
      └── Shell scripts possible

1978: C Shell - csh (Bill Joy)
      └── C-like syntax
      └── Job control (fg, bg, jobs)
      └── Command history
      └── Aliases

1979: Bourne Shell - sh (Stephen Bourne)
      └── Later basis for the POSIX sh standard
      └── Powerful scripting
      └── Functions
      └── Here documents
      └── Command substitution

1983: Korn Shell - ksh (David Korn)
      └── Bourne compatible
      └── C shell features
      └── Arrays, associative arrays
      └── Better performance

1989: Bash (Brian Fox, GNU)
      └── Bourne Again Shell
      └── GNU's free replacement
      └── Programmable completion
      └── Most popular shell today

1990: Zsh (Paul Falstad)
      └── Bourne + ksh + csh + more
      └── Spelling correction
      └── Themes, plugins (oh-my-zsh)

2005: Fish (Axel Liljencrantz)
      └── User-friendly defaults
      └── Web-based configuration
      └── Syntax highlighting

2.4 Common Misconceptions

Misconception 1: “The shell is just a program that runs other programs”

  • Reality: The shell manages process groups, sessions, job control, signal handling, environment inheritance, and much more. It’s a complex process supervisor.

Misconception 2: “Pipes are simple - just connect stdout to stdin”

  • Reality: Pipes require careful file descriptor management. You must close unused ends in both parent and children, handle SIGPIPE, and set up process groups correctly.

Misconception 3: “Signal handling in a shell is straightforward”

  • Reality: You must block signals at precise moments, handle race conditions with SIGCHLD, and carefully manage which process group receives terminal signals.

Misconception 4: “Parsing shell commands is like parsing any language”

  • Reality: Shell parsing is context-dependent (quotes change meaning, word splitting, glob expansion). It’s one of the most complex parsers you can build.

Misconception 5: “Job control is just running things in the background”

  • Reality: Job control involves process groups, the controlling terminal, SIGTSTP/SIGCONT, foreground/background switching, and careful state management.

3. Project Specification

3.1 What You Will Build

A fully-featured UNIX shell named mysh that:

  1. Executes commands with PATH lookup
  2. Implements pipes for arbitrary pipeline length
  3. Supports redirection (>, >>, <, 2>, 2>&1, here-documents)
  4. Provides job control (fg, bg, jobs, Ctrl-Z, &)
  5. Handles signals correctly (SIGINT, SIGTSTP, SIGCHLD, etc.)
  6. Supports builtins (cd, exit, export, source, alias, etc.)
  7. Provides history with expansion (!!, !n, !string)
  8. Implements tab completion for commands, files, and variables
  9. Parses shell scripts with control structures (if, for, while)
  10. Expands variables ($VAR, ${VAR:-default}, $?, $$, etc.)

3.2 Functional Requirements

Basic Command Execution:

  1. Execute commands with PATH search
  2. Handle command not found gracefully
  3. Pass arguments correctly
  4. Return exit status ($?)

Pipes and Redirection:

  1. Support pipes (cmd1 | cmd2 | cmd3)
  2. Support input redirection (<)
  3. Support output redirection (>, >>)
  4. Support stderr redirection (2>, 2>>)
  5. Support descriptor duplication (2>&1)
  6. Support here-documents (<<EOF)

Job Control:

  1. Run commands in background (&)
  2. List jobs (jobs)
  3. Bring job to foreground (fg %n)
  4. Continue job in background (bg %n)
  5. Handle Ctrl-Z (SIGTSTP)
  6. Handle Ctrl-C (SIGINT) properly
  7. Report job completion

Environment:

  1. Support environment variables ($VAR)
  2. Export variables (export VAR=value)
  3. Unset variables (unset VAR)
  4. Local assignment (VAR=value cmd)

Builtins:

  1. cd (with ~, -, CDPATH)
  2. pwd
  3. exit [n]
  4. echo
  5. type (builtin/external)
  6. source / . (execute script)
  7. alias / unalias
  8. history
  9. set / unset

Interactive Features:

  1. Command history (up/down arrows)
  2. History expansion (!!, !n, !?string)
  3. Line editing (arrows, backspace, delete)
  4. Tab completion (files, commands)
  5. Prompt customization ($PS1)

Scripting:

  1. Execute shell scripts
  2. if/then/elif/else/fi
  3. for/do/done
  4. while/until/do/done
  5. case/esac
  6. Functions
  7. Command substitution $(cmd)
  8. Exit on error (set -e)
  9. Glob expansion (*, ?, [])

3.3 Non-Functional Requirements

  1. Reliability:
    • No memory leaks (long-running shell stability)
    • Proper zombie reaping
    • Graceful handling of all edge cases
  2. Performance:
    • Fast command startup
    • Efficient history search
    • Responsive completion
  3. POSIX Compatibility:
    • Basic sh scripts should run
    • Standard builtins behave correctly
  4. Usability:
    • Clear error messages
    • Intuitive completion
    • Reasonable defaults

3.4 Example Usage / Output

# 1. Start your shell
$ ./mysh
Welcome to mysh v1.0
mysh$

# 2. Basic commands with pipes and redirection
mysh$ ls -la | grep "\.c$" | wc -l
15
mysh$ cat /etc/passwd | sort | head -5 > sorted_users.txt
mysh$ grep root < /etc/passwd
root:x:0:0:root:/root:/bin/bash

# 3. Environment variables
mysh$ export MYVAR="Hello World"
mysh$ echo $MYVAR
Hello World
mysh$ env | grep MYVAR
MYVAR=Hello World

# 4. Job control
mysh$ sleep 100 &
[1] 12345
mysh$ sleep 200 &
[2] 12346
mysh$ jobs
[1]  Running    sleep 100 &
[2]  Running    sleep 200 &
mysh$ fg 1
sleep 100
^Z
[1]+ Stopped    sleep 100
mysh$ bg 1
[1]+ sleep 100 &
mysh$ kill %1
[1]  Terminated  sleep 100

# 5. Command history
mysh$ history
   1  ls -la | grep "\.c$" | wc -l
   2  cat /etc/passwd | sort | head -5 > sorted_users.txt
   3  export MYVAR="Hello World"
   4  echo $MYVAR
mysh$ !3
export MYVAR="Hello World"
mysh$ !!
echo $MYVAR
Hello World

# 6. Tab completion
mysh$ cd /usr/lo<TAB>
mysh$ cd /usr/local/
mysh$ ls /usr/local/bi<TAB>
mysh$ ls /usr/local/bin/

# 7. Aliases
mysh$ alias ll="ls -la"
mysh$ ll
total 48
drwxr-xr-x  5 user user 4096 Mar 15 10:00 .
...

# 8. Scripting
mysh$ cat > myscript.sh << 'EOF'
#!/usr/bin/mysh
for i in 1 2 3; do
    echo "Count: $i"
done
EOF
mysh$ chmod +x myscript.sh
mysh$ ./myscript.sh
Count: 1
Count: 2
Count: 3

# 9. Control structures (interactive)
mysh$ if [ -f /etc/passwd ]; then echo "exists"; else echo "no"; fi
exists

mysh$ for f in *.c; do echo "File: $f"; done
File: main.c
File: parser.c
File: executor.c

# 10. Signal handling
mysh$ trap "echo Caught SIGINT" INT
mysh$ ^C
Caught SIGINT
mysh$

# 11. Builtin commands
mysh$ cd /tmp
mysh$ pwd
/tmp
mysh$ type ls
ls is /bin/ls
mysh$ type cd
cd is a shell builtin

# 12. Complex pipelines
mysh$ find . -name "*.c" -print0 | xargs -0 grep -l "main" | head -5
./src/main.c
./tests/test_main.c

# 13. Error handling
mysh$ nonexistent_command
mysh: nonexistent_command: command not found
mysh$ echo $?
127

# 14. Descriptor redirection
mysh$ cmd_that_fails 2>&1 | tee error.log
Error: something went wrong
mysh$ cat error.log
Error: something went wrong

3.5 Real World Outcome

When this project is complete, you will have:

  1. A usable interactive shell that you could actually use daily
  2. A script interpreter capable of running basic shell scripts
  3. Deep understanding of every concept in Stevens’ APUE
  4. Interview-ready knowledge of UNIX systems programming
  5. A portfolio piece demonstrating mastery-level systems programming

You can verify success by:

  • Using your shell as your daily shell for a week
  • Running bash scripts and checking compatibility
  • Running the bash test suite on compatible features
  • Comparing strace output with bash for same commands

4. Solution Architecture

4.1 High-Level Design

                        Shell Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                           USER INPUT                                 │
│                               │                                      │
│                               ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │                      INPUT LAYER                                 ││
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────┐ ││
│  │  │   Readline  │  │   History   │  │     Tab Completion       │ ││
│  │  │   (editing) │  │   Manager   │  │     Engine               │ ││
│  │  └─────────────┘  └─────────────┘  └──────────────────────────┘ ││
│  └────────────────────────────┬────────────────────────────────────┘│
│                               │                                      │
│                               ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │                      PARSING LAYER                               ││
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────┐ ││
│  │  │   Lexer     │►│   Parser    │►│   AST Builder             │ ││
│  │  │ (tokenize)  │  │  (grammar)  │  │   (nodes)                │ ││
│  │  └─────────────┘  └─────────────┘  └──────────────────────────┘ ││
│  └────────────────────────────┬────────────────────────────────────┘│
│                               │                                      │
│                               ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │                      EXPANSION LAYER                             ││
│  │  ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌──────────────────┐ ││
│  │  │ Variable  │ │ Command   │ │   Glob    │ │   Tilde/Brace    │ ││
│  │  │ Expansion │ │ Substit.  │ │ Expansion │ │   Expansion      │ ││
│  │  └───────────┘ └───────────┘ └───────────┘ └──────────────────┘ ││
│  └────────────────────────────┬────────────────────────────────────┘│
│                               │                                      │
│                               ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │                      EXECUTION LAYER                             ││
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────┐ ││
│  │  │  Builtins   │  │  Executor   │  │   Redirector             │ ││
│  │  │  Handler    │  │ (fork/exec) │  │   (dup2, pipes)          │ ││
│  │  └─────────────┘  └─────────────┘  └──────────────────────────┘ ││
│  └────────────────────────────┬────────────────────────────────────┘│
│                               │                                      │
│                               ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │                      JOB CONTROL LAYER                           ││
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────┐ ││
│  │  │   Job       │  │  Signal     │  │   Process Group          │ ││
│  │  │   Table     │  │  Manager    │  │   Manager                │ ││
│  │  └─────────────┘  └─────────────┘  └──────────────────────────┘ ││
│  └─────────────────────────────────────────────────────────────────┘│
│                               │                                      │
│                               ▼                                      │
│                           KERNEL                                     │
└─────────────────────────────────────────────────────────────────────┘

4.2 Key Components

1. Input Layer

  • readline.c: Line editing (arrows, backspace, home/end)
  • history.c: Command history storage, navigation, expansion
  • complete.c: Tab completion for files, commands, variables

2. Parsing Layer

  • lexer.c: Tokenize input into words, operators, strings
  • parser.c: Build AST from tokens according to grammar
  • ast.c: AST node types and manipulation

3. Expansion Layer

  • expand.c: Variable, command, and glob expansion
  • glob.c: Pattern matching (* ? [])
  • quote.c: Quote removal and escape handling

4. Execution Layer

  • builtins.c: cd, exit, export, alias, etc.
  • execute.c: Fork, exec, pipe setup
  • redirect.c: File descriptor redirection

5. Job Control Layer

  • jobs.c: Job table management
  • signals.c: Signal handling for job control
  • pgroup.c: Process group management

4.3 Data Structures

/* Token types */
typedef enum {
    TOK_WORD,           /* Regular word */
    TOK_ASSIGNMENT,     /* VAR=value */
    TOK_PIPE,           /* | */
    TOK_AND,            /* && */
    TOK_OR,             /* || */
    TOK_SEMICOLON,      /* ; */
    TOK_NEWLINE,        /* \n */
    TOK_BACKGROUND,     /* & */
    TOK_REDIRECT_IN,    /* < */
    TOK_REDIRECT_OUT,   /* > */
    TOK_REDIRECT_APPEND,/* >> */
    TOK_HEREDOC,        /* << */
    TOK_LPAREN,         /* ( */
    TOK_RPAREN,         /* ) */
    TOK_IF,             /* if */
    TOK_THEN,           /* then */
    TOK_ELSE,           /* else */
    TOK_ELIF,           /* elif */
    TOK_FI,             /* fi */
    TOK_FOR,            /* for */
    TOK_WHILE,          /* while */
    TOK_DO,             /* do */
    TOK_DONE,           /* done */
    TOK_CASE,           /* case */
    TOK_ESAC,           /* esac */
    TOK_EOF,            /* end of input */
} token_type_t;

typedef struct {
    token_type_t type;
    char *value;        /* Token text */
    int line;           /* Source line */
    int column;         /* Source column */
} token_t;

/* AST node types */
typedef enum {
    AST_COMMAND,        /* Simple command */
    AST_PIPELINE,       /* cmd1 | cmd2 */
    AST_AND_OR,         /* cmd1 && cmd2, cmd1 || cmd2 */
    AST_SEQUENCE,       /* cmd1 ; cmd2 */
    AST_BACKGROUND,     /* cmd & */
    AST_SUBSHELL,       /* (cmd) */
    AST_IF,             /* if/then/else/fi */
    AST_FOR,            /* for/do/done */
    AST_WHILE,          /* while/do/done */
    AST_CASE,           /* case/esac */
    AST_FUNCTION,       /* function definition */
} ast_type_t;

/* Redirection */
typedef struct redirect {
    int fd;             /* File descriptor to redirect */
    int flags;          /* O_RDONLY, O_WRONLY, etc. */
    char *filename;     /* Target file, or NULL for fd dup */
    int target_fd;      /* For fd duplication (2>&1) */
    char *heredoc;      /* Content for here-document */
    struct redirect *next;
} redirect_t;

/* Simple command */
typedef struct {
    char **argv;        /* Command and arguments */
    int argc;           /* Argument count */
    char **assignments; /* VAR=value prefixes */
    int assign_count;
    redirect_t *redirects;
} simple_cmd_t;

/* AST node */
typedef struct ast_node {
    ast_type_t type;
    union {
        simple_cmd_t *command;
        struct {
            struct ast_node **commands;
            int count;
        } pipeline;
        struct {
            struct ast_node *left;
            struct ast_node *right;
            int is_and;  /* && vs || */
        } and_or;
        struct {
            struct ast_node *condition;
            struct ast_node *then_part;
            struct ast_node *else_part;
        } if_clause;
        struct {
            char *var;
            char **words;
            struct ast_node *body;
        } for_clause;
        struct {
            struct ast_node *condition;
            struct ast_node *body;
            int is_until;
        } while_clause;
    } u;
    int background;     /* Run in background */
} ast_node_t;

/* Job states */
typedef enum {
    JOB_RUNNING,
    JOB_STOPPED,
    JOB_DONE,
    JOB_TERMINATED,
} job_state_t;

/* Process in a job */
typedef struct process {
    pid_t pid;          /* Process ID */
    char *cmd;          /* Command string (for display) */
    int status;         /* Exit status or signal */
    int completed;      /* Has exited */
    int stopped;        /* Has stopped */
    struct process *next;
} process_t;

/* Job (pipeline of processes) */
typedef struct job {
    int id;             /* Job number [1], [2], etc. */
    pid_t pgid;         /* Process group ID */
    char *command;      /* Full command string */
    process_t *procs;   /* List of processes in pipeline */
    job_state_t state;
    int notified;       /* User has been notified of state change */
    int foreground;     /* Is this the foreground job */
    struct termios tmodes; /* Terminal modes for stopped job */
    struct job *next;
} job_t;

/* Shell state */
typedef struct {
    job_t *jobs;        /* Job list */
    int next_job_id;    /* Next job number */
    char **history;     /* Command history */
    int history_count;
    int history_capacity;
    int history_index;  /* Current position for navigation */
    char *prompt;       /* PS1 */
    char *cwd;          /* Current working directory */
    int last_status;    /* $? */
    pid_t shell_pgid;   /* Shell's process group */
    struct termios shell_tmodes; /* Shell's terminal modes */
    int terminal;       /* Terminal file descriptor */
    int interactive;    /* Is interactive shell */
    /* Variables */
    char **env_vars;    /* Environment variables */
    char **local_vars;  /* Local shell variables */
    /* Aliases */
    char **alias_names;
    char **alias_values;
    int alias_count;
    /* Functions */
    struct {
        char *name;
        ast_node_t *body;
    } *functions;
    int function_count;
} shell_t;
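
Execution then becomes a recursive walk over this tree. A sketch of the top-level dispatch, assuming per-node helpers such as execute_command() and execute_pipeline() (not shown) that each return the exit status used for $?:

/* Top-level AST dispatch sketch.  execute_command(), execute_pipeline()
 * and friends are assumed helpers living elsewhere in execute.c. */
int execute_node(shell_t *shell, ast_node_t *node) {
    if (!node)
        return 0;

    switch (node->type) {
    case AST_COMMAND:
        return execute_command(shell, node->u.command, node->background);
    case AST_PIPELINE:
        return execute_pipeline(shell, node, node->background);
    case AST_AND_OR: {
        int status = execute_node(shell, node->u.and_or.left);
        /* && runs the right side only on success, || only on failure */
        if ((node->u.and_or.is_and && status == 0) ||
            (!node->u.and_or.is_and && status != 0))
            status = execute_node(shell, node->u.and_or.right);
        return status;
    }
    case AST_IF:
        if (execute_node(shell, node->u.if_clause.condition) == 0)
            return execute_node(shell, node->u.if_clause.then_part);
        return execute_node(shell, node->u.if_clause.else_part);
    case AST_WHILE: {
        int status = 0;
        /* while loops run while the condition succeeds; until inverts it */
        while ((execute_node(shell, node->u.while_clause.condition) == 0)
               != node->u.while_clause.is_until)
            status = execute_node(shell, node->u.while_clause.body);
        return status;
    }
    default:
        return 0;   /* remaining node types are handled along the same lines */
    }
}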

4.4 Algorithm Overview

Main REPL Loop:

function main_loop(shell):
    while true:
        display_prompt()
        line = read_line_with_editing()

        if line is empty or comment:
            continue

        add_to_history(line)

        tokens = lexer_tokenize(line)
        if syntax_error:
            print_error()
            continue

        ast = parser_parse(tokens)
        if parse_error:
            print_error()
            continue

        ast = expand_all(ast)

        execute(shell, ast)

        check_background_jobs(shell)

Execute Pipeline:

function execute_pipeline(shell, pipeline):
    # Create process group
    pgid = 0

    # Create pipes
    pipes = create_pipes(len(pipeline) - 1)

    for i, cmd in enumerate(pipeline):
        pid = fork()

        if pid == 0:  # Child
            # Set process group
            setpgid(0, pgid if pgid else getpid())

            # Set up redirections and pipes
            if i > 0:
                dup2(pipes[i-1][0], STDIN)
            if i < len(pipeline) - 1:
                dup2(pipes[i][1], STDOUT)

            close_all_pipes(pipes)
            setup_redirections(cmd.redirects)

            # Handle local assignments
            for assignment in cmd.assignments:
                putenv(assignment)

            # Execute
            if is_builtin(cmd.argv[0]):
                exit(run_builtin(cmd))
            else:
                execvp(cmd.argv[0], cmd.argv)
                exit(127)  # Not found

        else:  # Parent
            if pgid == 0:
                pgid = pid
            setpgid(pid, pgid)
            add_process_to_job(job, pid, cmd)

    close_all_pipes(pipes)

    if not background:
        put_job_in_foreground(shell, job)
        wait_for_job(shell, job)
    else:
        put_job_in_background(shell, job)
        print_job_started(job)

Job Control - Foreground:

function put_job_in_foreground(shell, job):
    # Give terminal to job's process group
    tcsetpgrp(shell.terminal, job.pgid)

    # Send SIGCONT if stopped
    if job.state == STOPPED:
        kill(-job.pgid, SIGCONT)
        job.state = RUNNING

    # Wait for job
    wait_for_job(shell, job)

    # Take back terminal control
    tcsetpgrp(shell.terminal, shell.shell_pgid)

    # Restore shell's terminal modes
    tcsetattr(shell.terminal, TCSADRAIN, shell.shell_tmodes)

5. Implementation Guide

5.1 Development Environment Setup

# Required packages (Ubuntu/Debian)
sudo apt-get install build-essential libreadline-dev

# macOS (readline from Homebrew for better features)
brew install readline
export CFLAGS="-I/usr/local/opt/readline/include"
export LDFLAGS="-L/usr/local/opt/readline/lib"

# Create project structure
mkdir -p mysh/src mysh/include mysh/tests
cd mysh

# Initialize basic Makefile
cat > Makefile << 'EOF'
CC = gcc
CFLAGS = -Wall -Wextra -g -Iinclude
LDFLAGS = -lreadline

SRC = $(wildcard src/*.c)
OBJ = $(SRC:.c=.o)

mysh: $(OBJ)
	$(CC) -o $@ $^ $(LDFLAGS)

clean:
	rm -f $(OBJ) mysh

test: mysh
	./tests/run_tests.sh

.PHONY: clean test
EOF

5.2 Project Structure

mysh/
├── include/
│   ├── mysh.h              # Main header, shell state
│   ├── lexer.h             # Lexer interface
│   ├── parser.h            # Parser interface
│   ├── ast.h               # AST node definitions
│   ├── expand.h            # Expansion functions
│   ├── execute.h           # Execution functions
│   ├── builtins.h          # Builtin commands
│   ├── jobs.h              # Job control
│   ├── signals.h           # Signal handling
│   ├── history.h           # Command history
│   └── complete.h          # Tab completion
├── src/
│   ├── main.c              # Entry point, REPL
│   ├── shell.c             # Shell initialization, state
│   ├── lexer.c             # Tokenizer
│   ├── parser.c            # Parser
│   ├── ast.c               # AST operations
│   ├── expand.c            # Variable, command, glob expansion
│   ├── execute.c           # Fork, exec, pipes
│   ├── redirect.c          # File descriptor redirection
│   ├── builtins.c          # cd, exit, export, etc.
│   ├── jobs.c              # Job table, job control
│   ├── signals.c           # Signal handlers
│   ├── history.c           # History management
│   ├── complete.c          # Tab completion
│   └── utils.c             # Utility functions
├── tests/
│   ├── run_tests.sh        # Test runner
│   ├── test_lexer.c        # Lexer unit tests
│   ├── test_parser.c       # Parser unit tests
│   ├── test_basic.sh       # Basic command tests
│   ├── test_pipes.sh       # Pipe tests
│   ├── test_redirect.sh    # Redirection tests
│   ├── test_jobs.sh        # Job control tests
│   └── test_scripts.sh     # Script execution tests
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How do you build a complete, production-quality UNIX shell that handles all the edge cases real users encounter?”

This is the ultimate integration challenge. Every concept from this sprint comes together:

  • Processes, process groups, sessions (Projects 4-5)
  • Signals, async-safety (Project 6)
  • Threading concepts (informing concurrent job design) (Projects 7-8)
  • Daemon-like background processing (Project 9)
  • File watching for scripts (Project 10)
  • I/O multiplexing for input (Project 11)
  • Protocol parsing (lexer/parser) (Project 13)
  • IPC for pipes (Project 14)
  • Terminal handling (Project 15)
  • Persistence for history (concepts from Project 16)

5.4 Concepts You Must Understand First

All previous projects provide the foundation:

  1. From Project 4 (Shell) - Basic fork/exec/wait, pipes, redirection
  2. From Project 6 (Signals) - Signal handling, job control signals (SIGCHLD, SIGTSTP, SIGCONT)
  3. From Project 9 (Daemon) - Session management, controlling terminal
  4. From Project 15 (Terminal) - Raw mode for editing, line discipline, PTY concepts

Additional concepts for the complete shell:

  • Readline-style line editing - Arrow keys, history navigation, in-line editing
  • History expansion - !, !!, !n, !?string patterns
  • Variable expansion - $VAR, ${VAR:-default}, $?, $$
  • Control structures - if/then/else, for, while, case parsing and execution

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Parser Architecture
    • Use a hand-written lexer or lex/flex?
    • Recursive descent parser or other approach?
    • How to handle nested quotes and escapes?
    • How to represent the AST for different constructs?
  2. Completion Engine
    • How to enumerate files, commands, variables?
    • How to handle ambiguous completions?
    • How to integrate with readline or your own line editor?
  3. Script Execution
    • Same parser for interactive and script mode?
    • How to handle syntax errors gracefully?
    • How to implement loops and conditionals?
  4. Job Control Edge Cases
    • What happens if fg on a completed job?
    • How to handle orphaned process groups?
    • What if user closes terminal while jobs running?

5.6 Thinking Exercise

Trace the Full Lifecycle of a Complex Command:

Input: VAR=hello echo "$VAR" | grep 'h' > out.txt &

1. LEXER:
   Input buffer: "VAR=hello echo \"$VAR\" | grep 'h' > out.txt &"

   Tokenization:
   ┌─────────────────────────────────────────────────────────┐
   │ Position 0-9:   "VAR=hello" → TOK_ASSIGNMENT            │
   │ Position 10-13: "echo"      → TOK_WORD                  │
   │ Position 14-20: "\"$VAR\""  → TOK_WORD (with quotes)    │
   │ Position 22:    "|"         → TOK_PIPE                  │
   │ Position 24-27: "grep"      → TOK_WORD                  │
   │ Position 29-31: "'h'"       → TOK_WORD (single quoted)  │
   │ Position 33:    ">"         → TOK_REDIRECT_OUT          │
   │ Position 35-41: "out.txt"   → TOK_WORD                  │
   │ Position 43:    "&"         → TOK_BACKGROUND            │
   └─────────────────────────────────────────────────────────┘

2. PARSER (Build AST):
   ┌─────────────────────────────────────────────────────────┐
   │                    BackgroundCommand                     │
   │                           │                              │
   │                      Pipeline                            │
   │                     /         \                          │
   │            SimpleCommand    SimpleCommand                │
   │           /     |     \         |      \                │
   │    Assign   Args  Redirs    Args    Redirs              │
   │    VAR=hello  │     │         │        │                │
   │           [echo,$VAR] []  [grep,h] [>out.txt]           │
   └─────────────────────────────────────────────────────────┘

3. EXPANSION:
   ┌─────────────────────────────────────────────────────────┐
   │ • Locate $VAR in "\"$VAR\""                             │
   │ • Look up VAR in environment → not found (it's a       │
   │   prefix assignment, only for this command)             │
   │ • Expand to "" or "hello" depending on interpretation   │
   │ • Single-quoted 'h' → literal "h" (no expansion)        │
   │ • No globs to expand                                    │
   │ • Result: [echo, hello] | [grep, h] > out.txt           │
   └─────────────────────────────────────────────────────────┘

4. EXECUTION:
   ┌─────────────────────────────────────────────────────────┐
   │ Step 1: Allocate new job, pgid = 0 (will be first PID) │
   │                                                         │
   │ Step 2: Create pipe                                     │
   │         pipe_fds = [3, 4]  (read, write)               │
   │                                                         │
   │ Step 3: Fork child 1 (echo)                             │
   │         Child:                                          │
   │           setpgid(0, 0)  → becomes PGID leader          │
   │           putenv("VAR=hello")                           │
   │           dup2(4, STDOUT)  → stdout to pipe write       │
   │           close(3), close(4)                            │
   │           execvp("echo", ["echo", "hello"])             │
   │         Parent:                                         │
   │           pgid = child1_pid                             │
   │           setpgid(child1_pid, child1_pid)               │
   │           record process in job                         │
   │                                                         │
   │ Step 4: Fork child 2 (grep)                             │
   │         Child:                                          │
   │           setpgid(0, pgid)  → join process group        │
   │           dup2(3, STDIN)   → stdin from pipe read       │
   │           fd = open("out.txt", O_WRONLY|O_CREAT|O_TRUNC)│
   │           dup2(fd, STDOUT) → stdout to file             │
   │           close(3), close(4), close(fd)                 │
   │           execvp("grep", ["grep", "h"])                 │
   │         Parent:                                         │
   │           setpgid(child2_pid, pgid)                     │
   │           record process in job                         │
   │                                                         │
   │ Step 5: Close pipes in parent                           │
   │         close(3), close(4)                              │
   │                                                         │
   │ Step 6: Background job → don't wait                     │
   │         Add to job table                                │
   │         Print: [1] 12345                                │
   │                                                         │
   │ Step 7: Return to prompt                                │
   └─────────────────────────────────────────────────────────┘

5. OUTPUT:
   [1] 12345
   mysh$

   (Eventually, when job completes:)
   [1]+  Done                    VAR=hello echo "$VAR" | grep 'h' > out.txt

5.7 Hints in Layers

Hint 1: Build Incrementally

Start with what you built in Project 4. Add features one at a time. Test thoroughly after each addition.

Suggested order:

  1. Basic command execution (already done in Project 4)
  2. Proper job control (process groups, SIGCHLD handling)
  3. Background jobs (&)
  4. Job builtins (jobs, fg, bg)
  5. Signal handling (SIGINT, SIGTSTP in foreground)
  6. History (basic up/down)
  7. More builtins (export, cd, alias)
  8. Improved lexer (quotes, escapes)
  9. Variable expansion
  10. Control structures
  11. Tab completion
  12. Scripting

Hint 2: Separate Concerns

lexer.c   → Tokenize input (state machine for quotes/escapes)
parser.c  → Build AST (recursive descent)
expand.c  → Variable/glob/command expansion
execute.c → Fork/exec/pipe orchestration
builtins.c → cd, export, alias, etc.
jobs.c    → Job table management
signals.c → SIGCHLD, SIGINT, SIGTSTP handlers
history.c → Command history and expansion
complete.c → Tab completion
main.c    → REPL loop

Hint 3: Signal Handling Setup

/* Set up signal handlers for the shell */
void setup_signals(shell_t *shell) {
    struct sigaction sa;

    /* Ignore job-control signals in the shell itself */
    signal(SIGTSTP, SIG_IGN);  /* Don't stop the shell */
    signal(SIGTTIN, SIG_IGN);  /* Don't stop on bg read */
    signal(SIGTTOU, SIG_IGN);  /* Don't stop on bg write */

    /* Handle SIGCHLD to reap children */
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* no SA_NOCLDSTOP: we also want SIGCHLD when children stop */
    sigaction(SIGCHLD, &sa, NULL);

    /* Handle SIGINT - just interrupt current line */
    sa.sa_handler = sigint_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);

    /* Put shell in its own process group */
    shell->shell_pgid = getpid();
    setpgid(shell->shell_pgid, shell->shell_pgid);

    /* Take control of terminal */
    tcsetpgrp(shell->terminal, shell->shell_pgid);

    /* Save terminal attributes */
    tcgetattr(shell->terminal, &shell->shell_tmodes);
}

/* SIGCHLD handler - mark children as reaped */
volatile sig_atomic_t got_sigchld = 0;

void sigchld_handler(int sig) {
    (void)sig;
    got_sigchld = 1;
}

/* Called from main loop to actually reap children */
void reap_children(shell_t *shell) {
    pid_t pid;
    int status;

    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        update_job_status(shell, pid, status);
    }
}
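
The sigint_handler referenced above can stay just as small. A sketch, where sigint_received is an assumed companion flag checked alongside got_sigchld at the top of the REPL (write() needs <unistd.h>):

/* Companion sketch for the handlers above.  Both flags are only *set*
 * inside handlers and acted on from the main loop, which keeps the
 * handlers async-signal-safe. */
volatile sig_atomic_t sigint_received = 0;

void sigint_handler(int sig) {
    (void)sig;
    sigint_received = 1;
    write(STDOUT_FILENO, "\n", 1);   /* async-signal-safe newline */
}

/* Called at the top of the REPL, before printing the next prompt.
 * Because SIGINT is installed without SA_RESTART, an interrupted read
 * returns EINTR and the loop naturally starts a fresh input line. */
void handle_pending_signals(shell_t *shell) {
    if (got_sigchld) {
        got_sigchld = 0;
        reap_children(shell);        /* update job table, notify the user */
    }
    if (sigint_received) {
        sigint_received = 0;         /* discard any half-typed line */
    }
}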

Hint 4: Job Table Management

/* Add a new job */
job_t *add_job(shell_t *shell, pid_t pgid, const char *cmd) {
    job_t *job = calloc(1, sizeof(job_t));
    job->id = shell->next_job_id++;
    job->pgid = pgid;
    job->command = strdup(cmd);
    job->state = JOB_RUNNING;
    job->foreground = 0;

    /* Add to list */
    job->next = shell->jobs;
    shell->jobs = job;

    return job;
}

/* Update job status after waitpid */
void update_job_status(shell_t *shell, pid_t pid, int status) {
    job_t *job = find_job_by_pid(shell, pid);
    if (!job) return;

    process_t *proc = find_process(job, pid);
    if (!proc) return;

    if (WIFSTOPPED(status)) {
        proc->stopped = 1;
        job->state = JOB_STOPPED;
    } else {
        proc->completed = 1;
        proc->status = status;

        /* Check if all processes in job are done */
        if (all_processes_done(job)) {
            job->state = WIFEXITED(status) ? JOB_DONE : JOB_TERMINATED;
        }
    }
}

/* Foreground a job */
void foreground_job(shell_t *shell, job_t *job) {
    /* Give terminal to job */
    tcsetpgrp(shell->terminal, job->pgid);

    /* Continue if stopped */
    if (job->state == JOB_STOPPED) {
        tcsetattr(shell->terminal, TCSADRAIN, &job->tmodes);
        kill(-job->pgid, SIGCONT);
    }

    job->foreground = 1;
    job->state = JOB_RUNNING;

    /* Wait for job */
    wait_for_job(shell, job);

    /* Restore terminal to shell */
    tcsetpgrp(shell->terminal, shell->shell_pgid);
    tcgetattr(shell->terminal, &job->tmodes);  /* Save job's modes */
    tcsetattr(shell->terminal, TCSADRAIN, &shell->shell_tmodes);
}
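
wait_for_job(), used above but not shown, is essentially a waitpid() loop over the job's process group. A sketch that relies on update_job_status() and simplifies by treating the first stopped process as stopping the whole job:

/* Sketch of wait_for_job(): block until every process in the job has
 * exited, or the job has stopped.  Simplified: one stopped process is
 * treated as stopping the whole job. */
void wait_for_job(shell_t *shell, job_t *job) {
    int status = 0;
    pid_t pid;

    while ((pid = waitpid(-job->pgid, &status, WUNTRACED)) > 0) {
        update_job_status(shell, pid, status);
        if (job->state != JOB_RUNNING)      /* stopped, done, or terminated */
            break;
    }

    if (WIFEXITED(status))
        shell->last_status = WEXITSTATUS(status);      /* becomes $? */
    else if (WIFSIGNALED(status))
        shell->last_status = 128 + WTERMSIG(status);   /* bash convention */
}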

Hint 5: Lexer State Machine

typedef enum {
    LEX_NORMAL,
    LEX_WORD,
    LEX_DQUOTE,      /* Inside "..." */
    LEX_SQUOTE,      /* Inside '...' */
    LEX_ESCAPE,      /* After \ */
    LEX_COMMENT,     /* After # */
} lexer_state_t;

token_t *lexer_next_token(lexer_t *lex) {
    char c;
    token_t *tok = token_new();

    skip_whitespace(lex);

    c = peek(lex);
    if (c == '\0' || c == EOF) {
        tok->type = TOK_EOF;
        return tok;
    }

    /* Check for operators first */
    if (is_operator_start(c)) {
        return scan_operator(lex);
    }

    /* Scan a word (potentially with quotes) */
    while ((c = peek(lex)) != '\0') {
        if (lex->state == LEX_NORMAL) {
            if (isspace(c) || is_metachar(c)) {
                break;  /* End of word */
            } else if (c == '"') {
                lex->state = LEX_DQUOTE;
                advance(lex);
            } else if (c == '\'') {
                lex->state = LEX_SQUOTE;
                advance(lex);
            } else if (c == '\\') {
                lex->state = LEX_ESCAPE;
                advance(lex);
            } else {
                append_char(tok, c);
                advance(lex);
            }
        } else if (lex->state == LEX_DQUOTE) {
            if (c == '"') {
                lex->state = LEX_NORMAL;
                advance(lex);
            } else if (c == '\\') {
                advance(lex);
                c = peek(lex);
                if (c == '"' || c == '\\' || c == '$' || c == '`') {
                    append_char(tok, c);
                    advance(lex);
                } else {
                    append_char(tok, '\\');
                }
            } else {
                append_char(tok, c);
                advance(lex);
            }
        } else if (lex->state == LEX_SQUOTE) {
            if (c == '\'') {
                lex->state = LEX_NORMAL;
                advance(lex);
            } else {
                append_char(tok, c);
                advance(lex);
            }
        } else if (lex->state == LEX_ESCAPE) {
            append_char(tok, c);
            advance(lex);
            lex->state = LEX_NORMAL;
        }
    }

    tok->type = TOK_WORD;
    return tok;
}
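
The is_operator_start() and scan_operator() helpers referenced above never need more than one character of lookahead. A sketch covering the two-character operators, using the same assumed peek()/advance()/token_new() primitives as the word scanner:

int is_operator_start(char c) {
    return c == '|' || c == '&' || c == ';' || c == '<' ||
           c == '>' || c == '(' || c == ')' || c == '\n';
}

token_t *scan_operator(lexer_t *lex) {
    token_t *tok = token_new();
    char c = peek(lex);
    advance(lex);

    switch (c) {
    case '|':
        if (peek(lex) == '|') { advance(lex); tok->type = TOK_OR; }
        else                  { tok->type = TOK_PIPE; }
        break;
    case '&':
        if (peek(lex) == '&') { advance(lex); tok->type = TOK_AND; }
        else                  { tok->type = TOK_BACKGROUND; }
        break;
    case '>':
        if (peek(lex) == '>') { advance(lex); tok->type = TOK_REDIRECT_APPEND; }
        else                  { tok->type = TOK_REDIRECT_OUT; }
        break;
    case '<':
        if (peek(lex) == '<') { advance(lex); tok->type = TOK_HEREDOC; }
        else                  { tok->type = TOK_REDIRECT_IN; }
        break;
    case ';':  tok->type = TOK_SEMICOLON; break;
    case '(':  tok->type = TOK_LPAREN;    break;
    case ')':  tok->type = TOK_RPAREN;    break;
    case '\n': tok->type = TOK_NEWLINE;   break;
    }
    return tok;
}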

5.8 The Interview Questions They’ll Ask

  1. “Explain how job control works in a UNIX shell.”
    • Process groups: all processes in a pipeline share a PGID
    • Foreground process group: receives terminal signals
    • tcsetpgrp() to change foreground group
    • SIGTSTP to stop, SIGCONT to resume
    • Shell maintains job table with state
  2. “What signals does a shell need to handle and how?”
    • SIGCHLD: reap children, update job status
    • SIGINT: ignore at prompt, let foreground job handle
    • SIGTSTP: ignore in shell, let foreground job handle
    • SIGTTIN/SIGTTOU: background jobs stopped if accessing terminal
    • Block signals during critical sections
  3. “How do you prevent race conditions when handling SIGCHLD?”
    • Use sig_atomic_t for flag
    • Block SIGCHLD during job table updates
    • Use SA_RESTART to handle interrupted syscalls
    • Reap in main loop, not in handler
  4. “How does pipe setup work for a multi-stage pipeline?”
    • Create n-1 pipes for n commands
    • Fork each child, redirect stdin/stdout appropriately
    • Close all pipe ends in each process (including parent)
    • Put all children in same process group
  5. “What’s the difference between fork+exec and system()?”
    • fork+exec: full control over child setup
    • system(): runs through shell, security risks, less control
    • fork+exec allows redirections, environment setup, process group
  6. “How do you implement command substitution $(…)?” (see the sketch after this list)
    • Create pipe, fork
    • Child: redirect stdout to pipe write, exec the command
    • Parent: read pipe until EOF, collect output
    • Substitute output into command line
  7. “What happens when a background job tries to read from the terminal?”
    • Kernel sends SIGTTIN to the process group
    • Default action: stop the job
    • User must fg to allow reading
  8. “How do you implement here-documents?”
    • Collect lines until delimiter
    • Create pipe or temp file
    • Redirect stdin of command to pipe/file
    • Optionally expand variables (based on quoting)
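
A sketch of the approach from question 6: run the command with stdout captured through a pipe, read until EOF, and strip trailing newlines the way shells do. command_substitute() is a hypothetical helper; the caller frees the result:

/* Command substitution sketch: capture a command's stdout via a pipe. */
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

char *command_substitute(char *const argv[]) {
    int pfd[2];
    if (pipe(pfd) < 0)
        return NULL;

    pid_t pid = fork();
    if (pid == 0) {                       /* child: stdout -> pipe write end */
        dup2(pfd[1], STDOUT_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(pfd[1]);                        /* parent keeps only the read end */

    size_t cap = 256, len = 0;
    char *out = malloc(cap);
    ssize_t n;
    while ((n = read(pfd[0], out + len, cap - len - 1)) > 0) {
        len += (size_t)n;
        if (cap - len < 64)
            out = realloc(out, cap *= 2);
    }
    close(pfd[0]);
    waitpid(pid, NULL, 0);

    while (len > 0 && out[len - 1] == '\n')   /* strip trailing newlines */
        len--;
    out[len] = '\0';
    return out;
}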

5.9 Books That Will Help

Topic                                   Book                                    Chapter
Process control                         “APUE” by Stevens                       Ch. 8
Process groups, sessions                “APUE” by Stevens                       Ch. 9
Signals                                 “APUE” by Stevens                       Ch. 10
Terminal I/O                            “APUE” by Stevens                       Ch. 18
Pseudo terminals                        “APUE” by Stevens                       Ch. 19
Complete shell implementation           “APUE” by Stevens                       All chapters combined
Parsing techniques                      “Compilers” by Aho et al.               Ch. 2-4
Shell scripting (understand features)   “Classic Shell Scripting” by Robbins    All
Real shell source                       Bash/Zsh source code                    Study it!

5.10 Implementation Phases

Phase 1: Core Shell Loop (Week 1)

  • Basic read-eval-print loop
  • Simple command execution with fork/exec
  • Basic argument parsing
  • Exit builtin
  • Test: Run single commands like ls, pwd, echo hello

Phase 2: Pipes and Redirection (Week 1-2)

  • Implement pipe parsing and execution
  • Implement >, <, >> redirection
  • Implement 2>, 2>&1
  • Test: ls | grep foo, cat < file.txt, cmd > out.txt

Phase 3: Job Control (Week 2-3)

  • Process groups for pipelines
  • Job table data structure
  • SIGCHLD handling and zombie reaping
  • Background jobs (&)
  • jobs, fg, bg builtins
  • SIGINT/SIGTSTP handling
  • Test: Background jobs, Ctrl-Z, fg/bg

Phase 4: Environment and Builtins (Week 3)

  • cd, pwd, export, unset builtins
  • Environment variable storage
  • Variable expansion $VAR
  • Local assignments VAR=value cmd
  • Test: export PATH, cd ~, echo $HOME

Phase 5: Interactive Features (Week 4)

  • Command history
  • History navigation (up/down arrows)
  • History expansion (!!, !n)
  • Basic line editing
  • Test: Arrow keys work, !! runs last command

Phase 6: Advanced Parsing (Week 4-5)

  • Proper quote handling
  • Escape sequences
  • Command substitution $(…)
  • Here-documents <<
  • Test: echo "hello $USER", echo 'literal $'

Phase 7: Control Structures (Week 5-6)

  • if/then/else/fi
  • for/do/done
  • while/until/do/done
  • Functions
  • Test: Run shell scripts with loops

Phase 8: Completion and Polish (Week 6)

  • Tab completion
  • Aliases
  • Glob expansion
  • Error messages
  • Edge case handling
  • Test: Comprehensive test suite

5.11 Key Implementation Decisions

Decision 1: Readline vs Custom Line Editing

  • Readline: Feature-rich, well-tested, but adds dependency
  • Custom: Educational, no dependency, more control
  • Recommendation: Use readline initially, understand internals later

Decision 2: Parser Approach

  • Recursive descent: Simple, good for shell grammar
  • Parser generator (yacc/bison): More formal, harder to debug
  • Recommendation: Hand-written recursive descent for learning

Decision 3: AST vs Direct Execution

  • AST: Cleaner, easier to add features, required for scripts
  • Direct: Simpler for basic shell
  • Recommendation: Build AST from the start

Decision 4: Job Table Structure

  • Array: Simple, limited jobs
  • Linked list: Dynamic, efficient for sparse job numbers
  • Recommendation: Linked list for flexibility

Decision 5: Signal Handler Complexity

  • Minimal (set flag): Safe, simple
  • Do work in handler: Risky, race-prone
  • Recommendation: Set flags in handler, do work in main loop

6. Testing Strategy

6.1 Unit Tests

/* test_lexer.c */
void test_simple_word(void) {
    lexer_t *lex = lexer_new("hello");
    token_t *tok = lexer_next(lex);
    assert(tok->type == TOK_WORD);
    assert(strcmp(tok->value, "hello") == 0);
    token_free(tok);
    lexer_free(lex);
    printf("test_simple_word: PASSED\n");
}

void test_quoted_string(void) {
    lexer_t *lex = lexer_new("\"hello world\"");
    token_t *tok = lexer_next(lex);
    assert(tok->type == TOK_WORD);
    assert(strcmp(tok->value, "hello world") == 0);
    token_free(tok);
    lexer_free(lex);
    printf("test_quoted_string: PASSED\n");
}

void test_pipe_operator(void) {
    lexer_t *lex = lexer_new("foo | bar");
    token_t *tok1 = lexer_next(lex);
    assert(tok1->type == TOK_WORD);
    token_t *tok2 = lexer_next(lex);
    assert(tok2->type == TOK_PIPE);
    token_t *tok3 = lexer_next(lex);
    assert(tok3->type == TOK_WORD);
    /* ... cleanup ... */
    printf("test_pipe_operator: PASSED\n");
}

/* test_parser.c */
void test_simple_command(void) {
    ast_node_t *ast = parse("echo hello world");
    assert(ast->type == AST_COMMAND);
    assert(ast->u.command->argc == 3);
    ast_free(ast);
    printf("test_simple_command: PASSED\n");
}

void test_pipeline(void) {
    ast_node_t *ast = parse("ls | grep foo | wc");
    assert(ast->type == AST_PIPELINE);
    assert(ast->u.pipeline.count == 3);
    ast_free(ast);
    printf("test_pipeline: PASSED\n");
}

6.2 Integration Tests

#!/bin/bash
# tests/test_basic.sh

MYSH=./mysh
FAILED=0

test_cmd() {
    expected="$1"
    shift
    result=$(echo "$@" | $MYSH 2>&1)
    if [ "$result" != "$expected" ]; then
        echo "FAIL: $@ => got '$result', expected '$expected'"
        FAILED=$((FAILED + 1))
    else
        echo "PASS: $@"
    fi
}

# Basic commands
test_cmd "hello" "echo hello"
test_cmd "" "true"

# Pipes
test_cmd "2" "echo -e 'a\nb' | wc -l"

# Redirections
echo "hello" | $MYSH "cat > /tmp/test.txt"
test_cmd "hello" "cat /tmp/test.txt"
rm /tmp/test.txt

# Variables
test_cmd "/tmp" "export TESTVAR=/tmp; echo \$TESTVAR"

echo "Failed: $FAILED tests"
exit $FAILED

6.3 Edge Cases to Test

  1. Empty input - Shell should prompt again
  2. Only whitespace - Should not crash
  3. Unclosed quotes - Should prompt for continuation or error
  4. Very long command line - Test buffer limits
  5. Very long pipeline - 100 commands piped
  6. Circular aliases - Should detect and prevent
  7. Non-existent command - Proper error message
  8. Permission denied - Proper error message
  9. Interrupted read - SIGINT during readline
  10. Background job exit - Notification printed
  11. Ctrl-Z in foreground - Job stops correctly
  12. Multiple fg/bg - Switching works
  13. Nested quotes - echo "it's \"quoted\""
  14. Empty heredoc - cat << EOF\nEOF
  15. Script syntax error - Proper error reporting

6.4 Verification Commands

# Trace the relevant system calls (on Linux, fork/pipe usually appear as
# clone/pipe2, and tcsetpgrp is an ioctl under the hood)
strace -f -e trace=clone,fork,execve,pipe,pipe2,dup2,close,setpgid,ioctl ./mysh

# Check for memory leaks (run commands, then exit)
valgrind --leak-check=full ./mysh

# Compare with bash
diff <(echo "echo hello | cat" | ./mysh) <(echo "echo hello | cat" | bash)

# Run POSIX test suite (if available)
./run-posix-tests.sh

# Test signal handling
./mysh &
MYSH_PID=$!
kill -TSTP $MYSH_PID  # Should have no effect
kill -INT $MYSH_PID   # Should print new prompt
kill $MYSH_PID

# Test job control
echo "sleep 100 &; jobs; fg %1" | ./mysh

7. Common Pitfalls & Debugging

Problem 1: “Pipes work for simple commands but fail with builtins”

  • Why: Builtins execute in the shell process, can’t be piped normally
  • Fix: In a pipeline, fork even for builtins (i.e. run them in a subshell); their side effects then stay in the child, which matches bash's default behavior
  • Test: cd /tmp | cat should leave the parent shell's working directory unchanged

Problem 2: “History expansion breaks quoted strings”

  • Why: Expanding ! before parsing quotes
  • Fix: Handle quotes in lexer before history expansion, or escape ! in quotes
  • Test: echo "test!test" should not expand

Problem 3: “Background jobs print output randomly”

  • Why: Background job writes to terminal while user typing
  • Fix: The complete solution is per-job PTY isolation; the simple option is to accept the interleaving, as bash does
  • Test: cat /dev/urandom & then type

Problem 4: “Script works in bash but not your shell”

  • Why: Missing feature or different behavior
  • Fix: Study POSIX specification, implement correctly
  • Test: Run bash test suites

Problem 5: “Zombie processes accumulate”

  • Why: Not calling waitpid() for all children
  • Fix: Handle SIGCHLD, call waitpid(-1, …, WNOHANG) in a loop
  • Test: ps aux | grep defunct after many background jobs

Problem 6: “Shell hangs after forking”

  • Why: Forgot to close pipe ends in parent
  • Fix: Close all pipe FDs in the parent after forking all children (see the sketch below)
  • Test: Run cat | cat | cat - should work without hanging
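
A stripped-down two-stage pipeline (cat | cat) showing why those closes matter: the downstream cat only sees EOF once every copy of the write end is closed, including the shell's own. Error handling is omitted for brevity:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];
    if (pipe(fd) < 0) { perror("pipe"); exit(1); }

    pid_t p1 = fork();
    if (p1 == 0) {                       /* first cat: stdout -> pipe */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]); close(fd[1]);
        execlp("cat", "cat", (char *)NULL);
        _exit(127);
    }
    pid_t p2 = fork();
    if (p2 == 0) {                       /* second cat: stdin <- pipe */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]); close(fd[1]);
        execlp("cat", "cat", (char *)NULL);
        _exit(127);
    }
    close(fd[0]);                        /* without these two closes, the */
    close(fd[1]);                        /* second cat never sees EOF     */
    waitpid(p1, NULL, 0);
    waitpid(p2, NULL, 0);
    return 0;
}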

Problem 7: “Ctrl-C kills the shell”

  • Why: Shell not ignoring SIGINT when it should
  • Fix: Ignore SIGINT while reading input and only handle it while waiting for a foreground job (see the sketch below)
  • Test: Press Ctrl-C at empty prompt
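
A minimal sketch of that fix, ignoring process groups for the moment: the shell ignores SIGINT, and each child restores the default disposition so Ctrl-C still kills the foreground command. The hard-coded sleep 5 just stands in for whatever command was parsed:

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char line[256];
    signal(SIGINT, SIG_IGN);             /* Ctrl-C at the prompt: ignored */

    for (;;) {
        printf("mysh> ");
        fflush(stdout);
        if (!fgets(line, sizeof(line), stdin))
            break;                       /* Ctrl-D exits */
        pid_t pid = fork();
        if (pid == 0) {
            signal(SIGINT, SIG_DFL);     /* child dies on Ctrl-C again */
            execlp("sleep", "sleep", "5", (char *)NULL);
            _exit(127);
        }
        waitpid(pid, NULL, 0);           /* Ctrl-C kills sleep, not mysh */
    }
    return 0;
}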

Problem 8: “fg command does nothing”

  • Why: Not calling tcsetpgrp() to give terminal to job
  • Fix: Call tcsetpgrp(fd, job->pgid) before waiting (see the sketch below)
  • Test: sleep 100 & followed by fg
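
A sketch of the core of fg, assuming each job records its process group id (the job->pgid above) and the shell knows its own; a real implementation loops until every process in the pipeline has exited or stopped and updates the job table accordingly:

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

void foreground_job(pid_t job_pgid, pid_t shell_pgid) {
    signal(SIGTTOU, SIG_IGN);               /* so reclaiming the terminal
                                               doesn't stop the shell */
    tcsetpgrp(STDIN_FILENO, job_pgid);      /* job now owns the terminal */
    kill(-job_pgid, SIGCONT);               /* wake it if it was stopped */

    int status;
    waitpid(-job_pgid, &status, WUNTRACED); /* returns on exit OR on stop */

    tcsetpgrp(STDIN_FILENO, shell_pgid);    /* shell reclaims the terminal */
    if (WIFSTOPPED(status))
        printf("\n[stopped] pgid %d\n", (int)job_pgid);
}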

8. Extensions & Challenges

8.1 Easy Extensions

  1. Prompt customization ($PS1) - Support escape sequences like \u, \h, \w
  2. Command timing - time cmd reports real/user/sys time
  3. Directory stack - pushd, popd, dirs
  4. Improved cd - CDPATH support, cd - for previous dir

8.2 Advanced Challenges

  1. Full POSIX compliance - Pass POSIX shell test suite
  2. Co-processes - Bidirectional pipe to subprocess
  3. Async event loop - Use epoll/kqueue for I/O
  4. Plugin system - Load custom builtins from shared libraries
  5. Remote shell - Work over SSH/PTY
  6. Syntax highlighting - Color commands as user types

8.3 Research Topics

  1. Bash source code - How do they handle all edge cases?
  2. Zsh architecture - What makes it extensible?
  3. Fish design - How do they achieve user-friendliness?
  4. Oil shell - Modern shell language design
  5. NuShell - Structured data in shell

9. Real-World Connections

9.1 Production Systems Using This

  1. Bash - Default shell on most Linux systems, macOS (pre-Catalina)
  2. Zsh - Default on macOS (Catalina+), popular among developers
  3. Dash - Minimal POSIX shell, used for /bin/sh on Debian/Ubuntu
  4. BusyBox ash - Embedded systems shell
  5. Fish - User-friendly interactive shell
  6. tcsh - BSD default, C shell descendant

9.2 How the Pros Do It

Bash architecture:

  • 100,000+ lines of C
  • Yacc-based parser
  • Extensive POSIX compliance
  • Decades of bug fixes

Zsh features:

  • Loadable modules
  • Sophisticated completion system
  • Theming (oh-my-zsh)
  • Advanced globbing

Fish innovations:

  • Web-based configuration
  • Autosuggestions
  • Syntax highlighting by default
  • Simplified scripting language

9.3 Reading the Source

  • Bash: https://git.savannah.gnu.org/cgit/bash.git
  • Zsh: https://github.com/zsh-users/zsh
  • Fish: https://github.com/fish-shell/fish-shell
  • Dash: https://git.kernel.org/pub/scm/utils/dash/dash.git

Start with:

  • parse.y / parser.c for grammar
  • execute.c / exec.c for execution
  • jobs.c for job control
  • builtins/ for builtin commands

10. Resources

10.1 Man Pages

Essential reading:

  • man 2 fork - Process creation
  • man 2 execve - Program execution
  • man 2 wait, man 2 waitpid - Wait for children
  • man 2 setpgid, man 2 getpgid - Process groups
  • man 2 setsid - Create session
  • man 3 tcsetpgrp - Set foreground process group
  • man 2 pipe - Create pipe
  • man 2 dup2 - Duplicate file descriptor
  • man 2 sigaction - Signal handling
  • man 7 signal - Signal overview
  • man 3 readline - GNU readline library
  • man 3 glob - Pathname pattern matching

10.2 Online Resources

10.3 Book Chapters

Topic Book Chapter
Process control “APUE” by Stevens Ch. 8
Process relationships “APUE” by Stevens Ch. 9
Signals “APUE” by Stevens Ch. 10
Terminal I/O “APUE” by Stevens Ch. 18
Pseudo terminals “APUE” by Stevens Ch. 19
All combined “APUE” by Stevens Entire book
Parsing “Compilers” (Dragon Book) Ch. 2-4
Shell scripting “Classic Shell Scripting” All

11. Self-Assessment Checklist

Before considering this project complete, verify:

  • I can explain the difference between process groups and sessions
  • I understand how the controlling terminal works
  • I can explain what happens when the user presses Ctrl-C
  • I understand why pipes require careful FD management
  • I can explain how job control signals work
  • My shell properly reaps all child processes
  • Background jobs work correctly
  • fg/bg commands work correctly
  • Ctrl-Z stops foreground jobs
  • Signal handling doesn’t have race conditions
  • History navigation works
  • Tab completion works
  • Basic scripts run correctly
  • Variable expansion works
  • Quote handling is correct
  • I can answer all the interview questions
  • My code compiles without warnings
  • valgrind reports zero memory leaks
  • My shell is usable for daily work

12. Submission / Completion Criteria

This project is complete when:

  1. Core Functionality:
    • Basic command execution works
    • Pipes work for arbitrary pipeline length
    • All redirection types work (>, >>, <, 2>, 2>&1)
    • Background jobs (&) work
    • Job control works (jobs, fg, bg)
    • Environment variables work
  2. Interactive Features:
    • Command history with up/down arrows
    • History expansion (!!, !n)
    • Tab completion for files
    • Prompt displays correctly
  3. Builtins:
    • cd (with ~, -)
    • exit
    • export/unset
    • pwd
    • type
    • alias
  4. Signal Handling:
    • Ctrl-C doesn’t kill shell
    • Ctrl-Z stops foreground jobs
    • SIGCHLD properly handled
    • No zombie processes
  5. Scripting (Basic):
    • Script files execute
    • if/then/fi works
    • for loop works
    • Variable expansion in scripts
  6. Quality:
    • All tests pass
    • Zero valgrind errors
    • Can be used as daily shell

Connection to All Previous Projects

This final project integrates knowledge from every previous project:

Previous Project Concepts Used Here
Project 1-3 (File I/O) Redirections, file operations
Project 4 (Basic Shell) Foundation for this project
Project 5 (Process Monitor) Understanding /proc, process info
Project 6 (Signals) All signal handling in shell
Project 7-8 (Threading) Concurrency concepts for job design
Project 9 (Daemon) Session leader, detaching, setsid
Project 10 (File Watcher) Script file monitoring concepts
Project 11 (epoll) Could be used for input handling
Project 12-13 (Network) Protocol parsing informs lexer/parser
Project 14 (IPC) Pipes are primary IPC in shell
Project 15 (Terminal) PTY concepts, termios, raw mode
Project 16 (Database) Could persist history, aliases

Final Thoughts

Building a complete shell is the ultimate test of UNIX systems programming knowledge. When you can build a shell from scratch:

  1. You understand processes - Creation, execution, lifecycle, groups, sessions
  2. You understand signals - Delivery, handling, blocking, race conditions
  3. You understand I/O - File descriptors, redirection, pipes, terminals
  4. You understand parsing - Lexers, parsers, ASTs, state machines
  5. You understand job control - Foreground, background, suspension, terminal ownership

This knowledge applies everywhere in systems programming:

  • Writing servers (fork, signals, I/O)
  • Building containers (namespaces build on process isolation)
  • Debugging production issues (strace, signal handling)
  • Performance optimization (understanding process overhead)

You’ve now completed the Advanced UNIX Programming sprint. You can:

  • Read and understand the source code of bash, zsh, or fish
  • Debug complex process and signal issues
  • Design and implement sophisticated system software
  • Ace systems programming interviews

Welcome to the ranks of engineers who truly understand UNIX.


The UNIX Workstation: Final Integration

As the ultimate capstone, consider combining your shell with other projects:

┌─────────────────────────────────────────────────────────────────────┐
│                    YOUR UNIX WORKSTATION                             │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Terminal Emulator (Project 15)                              │    │
│  │  ┌─────────────────────────────────────────────────────────┐│    │
│  │  │  Your Shell (Project 17)                                 ││    │
│  │  │  ┌─────────────────────────────────────────────────────┐││    │
│  │  │  │  Running: Your HTTP Server (Project 13)             │││    │
│  │  │  │           Your File Watcher (Project 10)            │││    │
│  │  │  │           Your IPC Hub (Project 14)                 │││    │
│  │  │  └─────────────────────────────────────────────────────┘││    │
│  │  │                                                          ││    │
│  │  │  History stored in: Your Database (Project 16)          ││    │
│  │  │  Status monitored by: Your Process Monitor (Project 5)  ││    │
│  │  └─────────────────────────────────────────────────────────┘│    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  All code written by you, from first principles.                     │
└─────────────────────────────────────────────────────────────────────┘

You’ve built a complete UNIX ecosystem. Congratulations.