Project 5: A Simple Shell
Build a small shell that runs programs, supports pipes, and handles I/O redirection.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-4 weeks |
| Language | C (Alternatives: Rust, Go) |
| Prerequisites | fork/exec/wait, file descriptors, parsing |
| Key Topics | Process control, pipes, redirection, parsing |
1. Learning Objectives
By completing this project, you will:
- Parse a command line into tokens and pipelines.
- Implement
fork/exec/waitfor external commands. - Build pipe chains with
pipe()anddup2(). - Implement redirection with
>and<. - Add minimal built-ins like
cdandexit.
2. Theoretical Foundation
2.1 Core Concepts
- Process creation:
fork()creates a new process;exec()replaces it. - Pipes: A pipe is a pair of file descriptors; one writes, one reads.
- Redirection:
dup2()reassigns stdin/stdout to a file descriptor. - Built-ins: Commands like
cdmust run in the parent process.
2.2 Why This Matters
The shell is the orchestrator of Unix. Building one forces you to integrate parsing, process control, and I/O wiring into a single system.
2.3 Historical Context / Background
The Unix shell introduced the idea of process pipelines and scripting. Modern shells are advanced, but the core model is still fork/exec/pipe.
2.4 Common Misconceptions
- “
cdis just another command”: It must change the current process, not a child. - “Pipes are magic”: They are just file descriptors wired with
dup2().
3. Project Specification
3.1 What You Will Build
A command-line shell my_shell that reads input, executes external commands, supports a single pipeline (A | B), and basic redirection (< and >). Include built-ins cd and exit.
3.2 Functional Requirements
- Execute simple commands with arguments.
- Support
|to connect two commands. - Support
<and>for input and output redirection. - Handle
cdandexitas built-ins.
3.3 Non-Functional Requirements
- Reliability: Close unused pipe ends to avoid deadlocks.
- Usability: Provide a basic prompt and handle empty input.
- Correctness: Return proper exit codes from child processes.
3.4 Example Usage / Output
$ ./my_shell
> ls -l | grep .c > output.txt
> cat output.txt
-rw-r--r-- 1 user user 5000 my_wc.c
-rw-r--r-- 1 user user 3500 my_ls.c
> exit
3.5 Real World Outcome
When you run ./my_shell, you see a prompt. You can type ls | grep .md and the output is filtered through your pipeline. If you type echo "hello" > out.txt, a new file appears, and cat out.txt shows the text. This is a working mini-shell that behaves like a simplified bash.
4. Solution Architecture
4.1 High-Level Design
read line -> tokenize -> parse pipeline/redirection -> fork/exec -> wait
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Tokenizer | Split input into tokens | Whitespace-based, handle quotes later |
| Parser | Identify pipes and redirection | Support one pipe initially |
| Executor | Fork and exec commands | Use execvp |
| Built-ins | cd, exit |
Run in parent |
4.3 Data Structures
typedef struct {
char *argv[64];
char *input_file;
char *output_file;
} Command;
4.4 Algorithm Overview
Key Algorithm: Two-command pipeline
- Parse
A | Binto twoCommandstructs. - Create a pipe.
- Fork child 1 for A, redirect stdout to pipe write end.
- Fork child 2 for B, redirect stdin to pipe read end.
- Close pipe ends in parent, wait for both.
Complexity Analysis:
- Time: O(n) for parsing, O(1) processes per command
- Space: O(n) for tokens
5. Implementation Guide
5.1 Development Environment Setup
cc -Wall -Wextra -O2 -o my_shell my_shell.c
5.2 Project Structure
my_shell/
├── src/
│ └── my_shell.c
├── tests/
│ └── test_shell.sh
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
“How does a shell connect independent programs into a single pipeline?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- fork/exec/wait
- What happens to memory after
fork()? - Why does
exec()not return on success?
- What happens to memory after
- File descriptor duplication
- How does
dup2()replace stdin/stdout? - Why must you close unused pipe ends?
- How does
- Parsing basics
- How do you split a line into tokens?
- What happens if there are multiple spaces?
5.5 Questions to Guide Your Design
Before implementing, think through these:
- How will you represent a command and its arguments?
- What happens if the user presses Enter on an empty line?
- How will you handle quotes or escaped spaces later?
5.6 Thinking Exercise
Trace a Pipeline by Hand
For ls | grep .c, draw the file descriptor table for both child processes after dup2().
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “Why must
cdbe a built-in?” - “How does a pipe connect stdout to stdin?”
- “What happens if you forget to close the write end of a pipe?”
5.8 Hints in Layers
Hint 1: Start with single command execution
Ignore pipes and redirection until fork/exec works.
Hint 2: Add output redirection
Use open() and dup2() for >.
Hint 3: Add a simple pipe
Support exactly one | and two commands.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Process control | “The Linux Programming Interface” | Ch. 24-27 |
| Pipes and redirection | “The Linux Programming Interface” | Ch. 44 |
| Shell design | “Advanced Programming in the UNIX Environment” | Ch. 8 |
5.10 Implementation Phases
Phase 1: Foundation (4-6 days)
Goals:
- Execute a simple command
Tasks:
- Read line with
getline(). - Tokenize and
execvp().
Checkpoint: ./my_shell can run ls.
Phase 2: Core Functionality (6-10 days)
Goals:
- Add redirection and built-ins
Tasks:
- Implement
>and<withdup2(). - Add
cdandexit.
Checkpoint: cd changes directories, redirection works.
Phase 3: Pipes and Polish (6-10 days)
Goals:
- Add
|pipelines - Improve error handling
Tasks:
- Parse a single pipe.
- Wire two children with
pipe().
Checkpoint: ls | grep .c works reliably.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing | Simple whitespace split vs full quotes | Simple split | Keep scope limited |
| Pipe support | Single pipe vs multiple | Single pipe | Learn core mechanism first |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Tokenization and parsing | Input strings |
| Integration Tests | End-to-end shell runs | Scripted commands |
| Edge Case Tests | Empty input, invalid commands | Error handling |
6.2 Critical Test Cases
- Invalid command: Prints error and continues.
- Redirection to file: Output file contains expected data.
- Pipeline:
echo hi | tr a-z A-ZprintsHI.
6.3 Test Data
ls
pwd
ls | grep .md
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Not closing pipe ends | Commands hang | Close unused pipe fds |
Running cd in child |
Directory unchanged | Run built-ins in parent |
| Parsing errors | Wrong arguments | Debug tokenization |
7.2 Debugging Strategies
- Use
straceordtrussto inspect syscalls. - Print tokens and parsed commands for debugging.
7.3 Performance Traps
Forking for built-ins wastes resources; keep them in the parent process.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add
pwdas a built-in. - Support multiple spaces and tabs robustly.
8.2 Intermediate Extensions
- Support multiple pipes (
A | B | C). - Add background execution (
&).
8.3 Advanced Extensions
- Implement job control (fg/bg).
- Add command history and line editing.
9. Real-World Connections
9.1 Industry Applications
- DevOps tooling: Many deployment scripts rely on shell primitives.
- Build systems: Build pipelines are chains of commands.
9.2 Related Open Source Projects
- bash and dash: Production shell implementations.
9.3 Interview Relevance
Shell internals test deep understanding of processes, pipes, and file descriptors.
10. Resources
10.1 Essential Reading
- “The Linux Programming Interface” by Michael Kerrisk - Ch. 24-27, 44
- “Advanced Programming in the UNIX Environment” by Stevens & Rago - Ch. 8
10.2 Video Resources
- OS process control lectures (any systems programming course)
10.3 Tools & Documentation
man 2 fork: Process creationman 2 execve: Program executionman 2 pipe: Pipe creation
10.4 Related Projects in This Series
- my_cat: Reinforces file descriptor I/O.
- Linux From Scratch: Uses your shell as a core tool.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain why
cdmust be a built-in. - I can diagram a pipe with file descriptors.
- I understand how
execreplaces the process.
11.2 Implementation
- Simple commands run correctly.
- Redirection and pipes work.
- Errors are handled gracefully.
11.3 Growth
- I can add multiple pipes.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Runs single commands.
- Implements
exitandcd.
Full Completion:
- Supports one pipe and redirection.
- Handles invalid commands without crashing.
Excellence (Going Above & Beyond):
- Supports pipelines of arbitrary length.
- Adds job control and history.
This guide was generated from LEARN_GNU_TOOLS_DEEP_DIVE.md. For the complete learning path, see the parent directory.