Project 5: A Simple Shell

Build a small shell that runs programs, supports pipes, and handles I/O redirection.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 2-4 weeks
Language C (Alternatives: Rust, Go)
Prerequisites fork/exec/wait, file descriptors, parsing
Key Topics Process control, pipes, redirection, parsing

1. Learning Objectives

By completing this project, you will:

  1. Parse a command line into tokens and pipelines.
  2. Implement fork/exec/wait for external commands.
  3. Build pipe chains with pipe() and dup2().
  4. Implement redirection with > and <.
  5. Add minimal built-ins like cd and exit.

2. Theoretical Foundation

2.1 Core Concepts

  • Process creation: fork() creates a new process; exec() replaces it.
  • Pipes: A pipe is a pair of file descriptors; one writes, one reads.
  • Redirection: dup2() reassigns stdin/stdout to a file descriptor.
  • Built-ins: Commands like cd must run in the parent process.

2.2 Why This Matters

The shell is the orchestrator of Unix. Building one forces you to integrate parsing, process control, and I/O wiring into a single system.

2.3 Historical Context / Background

The Unix shell introduced the idea of process pipelines and scripting. Modern shells are advanced, but the core model is still fork/exec/pipe.

2.4 Common Misconceptions

  • cd is just another command”: It must change the current process, not a child.
  • “Pipes are magic”: They are just file descriptors wired with dup2().

3. Project Specification

3.1 What You Will Build

A command-line shell my_shell that reads input, executes external commands, supports a single pipeline (A | B), and basic redirection (< and >). Include built-ins cd and exit.

3.2 Functional Requirements

  1. Execute simple commands with arguments.
  2. Support | to connect two commands.
  3. Support < and > for input and output redirection.
  4. Handle cd and exit as built-ins.

3.3 Non-Functional Requirements

  • Reliability: Close unused pipe ends to avoid deadlocks.
  • Usability: Provide a basic prompt and handle empty input.
  • Correctness: Return proper exit codes from child processes.

3.4 Example Usage / Output

$ ./my_shell
> ls -l | grep .c > output.txt
> cat output.txt
-rw-r--r-- 1 user user 5000 my_wc.c
-rw-r--r-- 1 user user 3500 my_ls.c
> exit

3.5 Real World Outcome

When you run ./my_shell, you see a prompt. You can type ls | grep .md and the output is filtered through your pipeline. If you type echo "hello" > out.txt, a new file appears, and cat out.txt shows the text. This is a working mini-shell that behaves like a simplified bash.


4. Solution Architecture

4.1 High-Level Design

read line -> tokenize -> parse pipeline/redirection -> fork/exec -> wait

4.2 Key Components

Component Responsibility Key Decisions
Tokenizer Split input into tokens Whitespace-based, handle quotes later
Parser Identify pipes and redirection Support one pipe initially
Executor Fork and exec commands Use execvp
Built-ins cd, exit Run in parent

4.3 Data Structures

typedef struct {
    char *argv[64];
    char *input_file;
    char *output_file;
} Command;

4.4 Algorithm Overview

Key Algorithm: Two-command pipeline

  1. Parse A | B into two Command structs.
  2. Create a pipe.
  3. Fork child 1 for A, redirect stdout to pipe write end.
  4. Fork child 2 for B, redirect stdin to pipe read end.
  5. Close pipe ends in parent, wait for both.

Complexity Analysis:

  • Time: O(n) for parsing, O(1) processes per command
  • Space: O(n) for tokens

5. Implementation Guide

5.1 Development Environment Setup

cc -Wall -Wextra -O2 -o my_shell my_shell.c

5.2 Project Structure

my_shell/
├── src/
│   └── my_shell.c
├── tests/
│   └── test_shell.sh
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How does a shell connect independent programs into a single pipeline?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. fork/exec/wait
    • What happens to memory after fork()?
    • Why does exec() not return on success?
  2. File descriptor duplication
    • How does dup2() replace stdin/stdout?
    • Why must you close unused pipe ends?
  3. Parsing basics
    • How do you split a line into tokens?
    • What happens if there are multiple spaces?

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. How will you represent a command and its arguments?
  2. What happens if the user presses Enter on an empty line?
  3. How will you handle quotes or escaped spaces later?

5.6 Thinking Exercise

Trace a Pipeline by Hand

For ls | grep .c, draw the file descriptor table for both child processes after dup2().

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Why must cd be a built-in?”
  2. “How does a pipe connect stdout to stdin?”
  3. “What happens if you forget to close the write end of a pipe?”

5.8 Hints in Layers

Hint 1: Start with single command execution Ignore pipes and redirection until fork/exec works.

Hint 2: Add output redirection Use open() and dup2() for >.

Hint 3: Add a simple pipe Support exactly one | and two commands.

5.9 Books That Will Help

Topic Book Chapter
Process control “The Linux Programming Interface” Ch. 24-27
Pipes and redirection “The Linux Programming Interface” Ch. 44
Shell design “Advanced Programming in the UNIX Environment” Ch. 8

5.10 Implementation Phases

Phase 1: Foundation (4-6 days)

Goals:

  • Execute a simple command

Tasks:

  1. Read line with getline().
  2. Tokenize and execvp().

Checkpoint: ./my_shell can run ls.

Phase 2: Core Functionality (6-10 days)

Goals:

  • Add redirection and built-ins

Tasks:

  1. Implement > and < with dup2().
  2. Add cd and exit.

Checkpoint: cd changes directories, redirection works.

Phase 3: Pipes and Polish (6-10 days)

Goals:

  • Add | pipelines
  • Improve error handling

Tasks:

  1. Parse a single pipe.
  2. Wire two children with pipe().

Checkpoint: ls | grep .c works reliably.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Parsing Simple whitespace split vs full quotes Simple split Keep scope limited
Pipe support Single pipe vs multiple Single pipe Learn core mechanism first

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Tokenization and parsing Input strings
Integration Tests End-to-end shell runs Scripted commands
Edge Case Tests Empty input, invalid commands Error handling

6.2 Critical Test Cases

  1. Invalid command: Prints error and continues.
  2. Redirection to file: Output file contains expected data.
  3. Pipeline: echo hi | tr a-z A-Z prints HI.

6.3 Test Data

ls
pwd
ls | grep .md

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Not closing pipe ends Commands hang Close unused pipe fds
Running cd in child Directory unchanged Run built-ins in parent
Parsing errors Wrong arguments Debug tokenization

7.2 Debugging Strategies

  • Use strace or dtruss to inspect syscalls.
  • Print tokens and parsed commands for debugging.

7.3 Performance Traps

Forking for built-ins wastes resources; keep them in the parent process.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add pwd as a built-in.
  • Support multiple spaces and tabs robustly.

8.2 Intermediate Extensions

  • Support multiple pipes (A | B | C).
  • Add background execution (&).

8.3 Advanced Extensions

  • Implement job control (fg/bg).
  • Add command history and line editing.

9. Real-World Connections

9.1 Industry Applications

  • DevOps tooling: Many deployment scripts rely on shell primitives.
  • Build systems: Build pipelines are chains of commands.
  • bash and dash: Production shell implementations.

9.3 Interview Relevance

Shell internals test deep understanding of processes, pipes, and file descriptors.


10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” by Michael Kerrisk - Ch. 24-27, 44
  • “Advanced Programming in the UNIX Environment” by Stevens & Rago - Ch. 8

10.2 Video Resources

  • OS process control lectures (any systems programming course)

10.3 Tools & Documentation

  • man 2 fork: Process creation
  • man 2 execve: Program execution
  • man 2 pipe: Pipe creation
  • my_cat: Reinforces file descriptor I/O.
  • Linux From Scratch: Uses your shell as a core tool.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain why cd must be a built-in.
  • I can diagram a pipe with file descriptors.
  • I understand how exec replaces the process.

11.2 Implementation

  • Simple commands run correctly.
  • Redirection and pipes work.
  • Errors are handled gracefully.

11.3 Growth

  • I can add multiple pipes.
  • I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Runs single commands.
  • Implements exit and cd.

Full Completion:

  • Supports one pipe and redirection.
  • Handles invalid commands without crashing.

Excellence (Going Above & Beyond):

  • Supports pipelines of arbitrary length.
  • Adds job control and history.

This guide was generated from LEARN_GNU_TOOLS_DEEP_DIVE.md. For the complete learning path, see the parent directory.