Project 15: Mini Shell Implementation

Build a simplified shell that supports parsing, pipelines, redirection, and builtins.

Quick Reference

Attribute Value
Difficulty Level 5: Master
Time Estimate 4-6 weeks
Language Bash (Alternatives: C, Python)
Prerequisites Deep understanding of parsing, processes, file descriptors
Key Topics lexing, parsing, fork/exec, pipes, redirection, builtins

1. Learning Objectives

By completing this project, you will:

  1. Implement a lexer and parser for a shell-like grammar.
  2. Execute commands via fork/exec with proper argv handling.
  3. Implement pipelines and redirection with file descriptors.
  4. Add builtins like cd, exit, export.
  5. Handle signals and job control basics.

2. Theoretical Foundation

2.1 Core Concepts

  • Lexing and parsing: Tokenizing input into commands and operators.
  • Process creation: fork/exec and process hierarchies.
  • Pipes and redirection: file descriptor wiring.
  • Builtins vs external commands: run in parent vs child.
  • Signal handling: Ctrl+C and process interruption.

2.2 Why This Matters

This is the capstone that makes every previous shell concept concrete. You will internalize how commands become processes and how the shell orchestrates execution.

2.3 Historical Context / Background

Shells like sh, bash, and zsh are decades old, but their fundamentals remain the same: parse, expand, execute. You’re rebuilding those foundations.

2.4 Common Misconceptions

  • “Pipelines are magic.” They are just file descriptor plumbing.
  • “Builtins are just commands.” They must run in the parent to affect state.

3. Project Specification

3.1 What You Will Build

A minish program that reads commands, parses tokens, supports pipelines and redirections, and executes commands with a small set of builtins.

3.2 Functional Requirements

  1. Parsing: Tokenize words, pipes, redirections.
  2. Execution: Run external commands with argv.
  3. Pipelines: Support cmd1 | cmd2.
  4. Redirection: >, >>, <.
  5. Builtins: cd, exit, export.
  6. Signals: Ctrl+C interrupt.

3.3 Non-Functional Requirements

  • Correctness: match shell behavior for core features.
  • Reliability: handle errors gracefully.
  • Clarity: clean code and clear structure.

3.4 Example Usage / Output

$ minish
minish$ echo hello | tr a-z A-Z > out.txt
minish$ cat out.txt
HELLO

3.5 Real World Outcome

You can explain exactly how a shell interprets and executes commands and have a working implementation demonstrating it.


4. Solution Architecture

4.1 High-Level Design

input -> lexer -> parser -> executor -> child processes
                       |                |
                       |                +-> pipes/redirection
                       +-> builtins

Project 15: Mini Shell Implementation high-level design diagram

4.2 Key Components

Component Responsibility Key Decisions
Lexer Tokenize input handle quotes
Parser Build command AST pipeline nodes
Executor Run commands fork/exec
Builtins Run in parent modify env
Signal handler Ctrl+C behavior parent vs child

4.3 Data Structures

# pseudo-structure
Command { argv[], redirects[], pipe_to }

4.4 Algorithm Overview

Key Algorithm: Pipeline Execution

  1. Create pipes for N commands.
  2. Fork children and connect stdin/stdout.
  3. Execute each command.
  4. Close pipes in parent.
  5. Wait for children.

Complexity Analysis:

  • Time: O(n) for n commands
  • Space: O(n) for pipes

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install gcc make

5.2 Project Structure

minish/
|-- minish
|-- lib/
|   |-- lexer.sh
|   |-- parser.sh
|   |-- exec.sh
|   `-- builtins.sh

Project 15: Mini Shell Implementation project structure diagram

5.3 The Core Question You Are Answering

“How does a shell turn text into processes and connect them together?”

5.4 Concepts You Must Understand First

  1. File descriptors and redirection
  2. fork/exec model
  3. Lexing/parsing basics

5.5 Questions to Guide Your Design

  • How will you represent pipelines internally?
  • How do you detect builtins and run them in parent?
  • How will you handle quoting and spaces?

5.6 Thinking Exercise

Manually parse: cat file | grep error > out.txt. Draw the pipeline and file descriptors.

5.7 The Interview Questions They Will Ask

  1. Why must cd be a builtin?
  2. How do pipelines work under the hood?
  3. How do you implement redirection safely?

5.8 Hints in Layers

Hint 1: Start with executing a single command without pipes.

Hint 2: Add redirection parsing and file descriptor setup.

Hint 3: Add pipelines with two commands.

Hint 4: Add builtins and signal handling.

5.9 Books That Will Help

Topic Book Chapter
Shell internals “Advanced Programming in the UNIX Environment” Ch. 8-10
Processes “The Linux Programming Interface” Ch. 27

5.10 Implementation Phases

Phase 1: Lexer and Executor (1-2 weeks)

Goals:

  • Tokenize input and run single commands.

Tasks:

  1. Implement lexer with quotes.
  2. Run commands via exec.

Checkpoint: ls and echo work.

Phase 2: Redirection and Pipes (1-2 weeks)

Goals:

  • Support I/O redirection and pipelines.

Tasks:

  1. Parse >, <, >>.
  2. Implement two-command pipelines.

Checkpoint: echo hi | wc -c works.

Phase 3: Builtins and Signals (1 week)

Goals:

  • Builtins and Ctrl+C handling.

Tasks:

  1. Implement cd, exit, export.
  2. Handle SIGINT for children.

Checkpoint: Ctrl+C stops a running command.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Parsing recursive descent vs shunting simple recursive descent clarity
Builtins execute in child vs parent parent state changes
Redirection open files in parent vs child child minimal side effects

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit lexer/parser tokenization cases
Integration pipeline pipe output correctness
Edge Cases quoting spaces and quotes

6.2 Critical Test Cases

  1. echo "a b" preserves space.
  2. yes | head -n 1 terminates correctly.
  3. cd /tmp changes working dir.

6.3 Test Data

fixtures/lexer_cases.txt

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Not closing pipes commands hang close unused fds
Builtins in child cd no effect run in parent
Naive splitting quotes break implement lexer

7.2 Debugging Strategies

  • Print tokens and AST in debug mode.
  • Use strace to inspect syscalls.

7.3 Performance Traps

Spawning too many processes in pipelines can be costly; keep minimal overhead.


8. Extensions and Challenges

8.1 Beginner Extensions

  • Add history.
  • Add pwd builtin.

8.2 Intermediate Extensions

  • Support && and ||.
  • Add background & jobs.

8.3 Advanced Extensions

  • Implement job control (fg, bg).
  • Add command substitution.

9. Real-World Connections

9.1 Industry Applications

  • Foundation for building custom shells or REPLs.
  • Understanding process orchestration in containers.
  • dash: lightweight POSIX shell.
  • busybox sh: embedded shell.

9.3 Interview Relevance

  • Demonstrates system-level understanding.
  • Shows parsing + process control skills.

10. Resources

10.1 Essential Reading

  • POSIX Shell Command Language
  • Bash manual: expansions and redirection

10.2 Video Resources

  • “How a Shell Works” (YouTube)

10.3 Tools and Documentation

  • strace, lsof, man execve
  • Project 7: CLI Parser
  • Project 8: Process Supervisor

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how pipes connect processes.
  • I can explain why builtins must run in parent.

11.2 Implementation

  • Redirection works correctly.
  • Pipelines execute without hangs.

11.3 Growth

  • I can extend the shell with job control.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Execute commands + basic redirection

Full Completion:

  • Pipelines + builtins + signals

Excellence (Going Above & Beyond):

  • Job control + command substitution