Project 5: A Simple Shell

Build a small shell that runs programs, supports pipes, and handles I/O redirection.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	2-4 weeks
Language	C (Alternatives: Rust, Go)
Prerequisites	fork/exec/wait, file descriptors, parsing
Key Topics	Process control, pipes, redirection, parsing

1. Learning Objectives

By completing this project, you will:

Parse a command line into tokens and pipelines.
Implement fork/exec/wait for external commands.
Build pipe chains with pipe() and dup2().
Implement redirection with > and <.
Add minimal built-ins like cd and exit.

2. Theoretical Foundation

2.1 Core Concepts

Process creation: fork() creates a new process; exec() replaces it.
Pipes: A pipe is a pair of file descriptors; one writes, one reads.
Redirection: dup2() reassigns stdin/stdout to a file descriptor.
Built-ins: Commands like cd must run in the parent process.

2.2 Why This Matters

The shell is the orchestrator of Unix. Building one forces you to integrate parsing, process control, and I/O wiring into a single system.

2.3 Historical Context / Background

The Unix shell introduced the idea of process pipelines and scripting. Modern shells are advanced, but the core model is still fork/exec/pipe.

2.4 Common Misconceptions

“cd is just another command”: It must change the current process, not a child.
“Pipes are magic”: They are just file descriptors wired with dup2().

3. Project Specification

3.1 What You Will Build

A command-line shell my_shell that reads input, executes external commands, supports a single pipeline (A | B), and basic redirection (< and >). Include built-ins cd and exit.

3.2 Functional Requirements

Execute simple commands with arguments.
Support | to connect two commands.
Support < and > for input and output redirection.
Handle cd and exit as built-ins.

3.3 Non-Functional Requirements

Reliability: Close unused pipe ends to avoid deadlocks.
Usability: Provide a basic prompt and handle empty input.
Correctness: Return proper exit codes from child processes.

3.4 Example Usage / Output

$ ./my_shell
> ls -l | grep .c > output.txt
> cat output.txt
-rw-r--r-- 1 user user 5000 my_wc.c
-rw-r--r-- 1 user user 3500 my_ls.c
> exit

3.5 Real World Outcome

When you run ./my_shell, you see a prompt. You can type ls | grep .md and the output is filtered through your pipeline. If you type echo "hello" > out.txt, a new file appears, and cat out.txt shows the text. This is a working mini-shell that behaves like a simplified bash.

4. Solution Architecture

4.1 High-Level Design

read line -> tokenize -> parse pipeline/redirection -> fork/exec -> wait

4.2 Key Components

Component	Responsibility	Key Decisions
Tokenizer	Split input into tokens	Whitespace-based, handle quotes later
Parser	Identify pipes and redirection	Support one pipe initially
Executor	Fork and exec commands	Use `execvp`
Built-ins	`cd`, `exit`	Run in parent

4.3 Data Structures

typedef struct {
    char *argv[64];
    char *input_file;
    char *output_file;
} Command;

4.4 Algorithm Overview

Key Algorithm: Two-command pipeline

Parse A | B into two Command structs.
Create a pipe.
Fork child 1 for A, redirect stdout to pipe write end.
Fork child 2 for B, redirect stdin to pipe read end.
Close pipe ends in parent, wait for both.

Complexity Analysis:

Time: O(n) for parsing, O(1) processes per command
Space: O(n) for tokens

5. Implementation Guide

5.1 Development Environment Setup

cc -Wall -Wextra -O2 -o my_shell my_shell.c

5.2 Project Structure

my_shell/
├── src/
│   └── my_shell.c
├── tests/
│   └── test_shell.sh
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How does a shell connect independent programs into a single pipeline?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

fork/exec/wait
- What happens to memory after fork()?
- Why does exec() not return on success?
File descriptor duplication
- How does dup2() replace stdin/stdout?
- Why must you close unused pipe ends?
Parsing basics
- How do you split a line into tokens?
- What happens if there are multiple spaces?

5.5 Questions to Guide Your Design

Before implementing, think through these:

How will you represent a command and its arguments?
What happens if the user presses Enter on an empty line?
How will you handle quotes or escaped spaces later?

5.6 Thinking Exercise

Trace a Pipeline by Hand

For ls | grep .c, draw the file descriptor table for both child processes after dup2().

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

“Why must cd be a built-in?”
“How does a pipe connect stdout to stdin?”
“What happens if you forget to close the write end of a pipe?”

5.8 Hints in Layers

Hint 1: Start with single command execution Ignore pipes and redirection until fork/exec works.

Hint 2: Add output redirection Use open() and dup2() for >.

Hint 3: Add a simple pipe Support exactly one | and two commands.

5.9 Books That Will Help

Topic	Book	Chapter
Process control	“The Linux Programming Interface”	Ch. 24-27
Pipes and redirection	“The Linux Programming Interface”	Ch. 44
Shell design	“Advanced Programming in the UNIX Environment”	Ch. 8

5.10 Implementation Phases

Phase 1: Foundation (4-6 days)

Goals:

Execute a simple command

Tasks:

Read line with getline().
Tokenize and execvp().

Checkpoint: ./my_shell can run ls.

Phase 2: Core Functionality (6-10 days)

Goals:

Add redirection and built-ins

Tasks:

Implement > and < with dup2().
Add cd and exit.

Checkpoint: cd changes directories, redirection works.

Phase 3: Pipes and Polish (6-10 days)

Goals:

Add | pipelines
Improve error handling

Tasks:

Parse a single pipe.
Wire two children with pipe().

Checkpoint: ls | grep .c works reliably.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Parsing	Simple whitespace split vs full quotes	Simple split	Keep scope limited
Pipe support	Single pipe vs multiple	Single pipe	Learn core mechanism first

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Tokenization and parsing	Input strings
Integration Tests	End-to-end shell runs	Scripted commands
Edge Case Tests	Empty input, invalid commands	Error handling

6.2 Critical Test Cases

Invalid command: Prints error and continues.
Redirection to file: Output file contains expected data.
Pipeline: echo hi | tr a-z A-Z prints HI.

6.3 Test Data

ls
pwd
ls | grep .md

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Not closing pipe ends	Commands hang	Close unused pipe fds
Running `cd` in child	Directory unchanged	Run built-ins in parent
Parsing errors	Wrong arguments	Debug tokenization

7.2 Debugging Strategies

Use strace or dtruss to inspect syscalls.
Print tokens and parsed commands for debugging.

7.3 Performance Traps

Forking for built-ins wastes resources; keep them in the parent process.

8. Extensions & Challenges

8.1 Beginner Extensions

Add pwd as a built-in.
Support multiple spaces and tabs robustly.

8.2 Intermediate Extensions

Support multiple pipes (A | B | C).
Add background execution (&).

8.3 Advanced Extensions

Implement job control (fg/bg).
Add command history and line editing.

9. Real-World Connections

9.1 Industry Applications

DevOps tooling: Many deployment scripts rely on shell primitives.
Build systems: Build pipelines are chains of commands.

bash and dash: Production shell implementations.

9.3 Interview Relevance

Shell internals test deep understanding of processes, pipes, and file descriptors.

10. Resources

10.1 Essential Reading

“The Linux Programming Interface” by Michael Kerrisk - Ch. 24-27, 44
“Advanced Programming in the UNIX Environment” by Stevens & Rago - Ch. 8

10.2 Video Resources

OS process control lectures (any systems programming course)

10.3 Tools & Documentation

man 2 fork: Process creation
man 2 execve: Program execution
man 2 pipe: Pipe creation

my_cat: Reinforces file descriptor I/O.
Linux From Scratch: Uses your shell as a core tool.

11. Self-Assessment Checklist

11.1 Understanding

I can explain why cd must be a built-in.
I can diagram a pipe with file descriptors.
I understand how exec replaces the process.

11.2 Implementation

Simple commands run correctly.
Redirection and pipes work.
Errors are handled gracefully.

11.3 Growth

I can add multiple pipes.
I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

Runs single commands.
Implements exit and cd.

Full Completion:

Supports one pipe and redirection.
Handles invalid commands without crashing.

Excellence (Going Above & Beyond):

Supports pipelines of arbitrary length.
Adds job control and history.

This guide was generated from LEARN_GNU_TOOLS_DEEP_DIVE.md. For the complete learning path, see the parent directory.

Project 5: A Simple Shell

Quick Reference

1. Learning Objectives

2. Theoretical Foundation

2.1 Core Concepts

2.2 Why This Matters

2.3 Historical Context / Background

2.4 Common Misconceptions

3. Project Specification

3.1 What You Will Build

3.2 Functional Requirements

3.3 Non-Functional Requirements

3.4 Example Usage / Output

3.5 Real World Outcome

4. Solution Architecture

4.1 High-Level Design

4.2 Key Components

4.3 Data Structures

4.4 Algorithm Overview

5. Implementation Guide

5.1 Development Environment Setup

5.2 Project Structure

5.3 The Core Question You’re Answering

5.4 Concepts You Must Understand First

5.5 Questions to Guide Your Design

5.6 Thinking Exercise

Trace a Pipeline by Hand

5.7 The Interview Questions They’ll Ask

5.8 Hints in Layers

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Foundation (4-6 days)

Phase 2: Core Functionality (6-10 days)

Phase 3: Pipes and Polish (6-10 days)

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

6.3 Test Data

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

7.3 Performance Traps

8. Extensions & Challenges

8.1 Beginner Extensions

8.2 Intermediate Extensions

8.3 Advanced Extensions

9. Real-World Connections

9.1 Industry Applications

9.2 Related Open Source Projects

9.3 Interview Relevance

10. Resources

10.1 Essential Reading

10.2 Video Resources

10.3 Tools & Documentation

10.4 Related Projects in This Series

11. Self-Assessment Checklist

11.1 Understanding

11.2 Implementation

11.3 Growth

12. Submission / Completion Criteria