Project 11: Line Editor (Mini-Readline)
Build a mini-readline line editor with cursor movement and basic editing.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert (The Systems Architect) |
| Time Estimate | 2 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Level 5: Pure Magic (Super Cool) |
| Business Potential | 1. The “Resume Gold” (Educational/Personal Brand) |
| Prerequisites | termios, escape sequences, buffers |
| Key Topics | raw mode, cursor control, rendering |
1. Learning Objectives
By completing this project, you will:
- Explain and implement raw mode in the context of a shell.
- Build a working line editor (mini-readline) that matches the project specification.
- Design tests that validate correctness and edge cases.
- Document design decisions, trade-offs, and limitations.
2. All Theory Needed (Per-Concept Breakdown)
Line Editing, Termios, and Interactive Input
Fundamentals
Interactive shells do more than read lines; they allow users to edit input with arrow keys, delete characters, and navigate history. This requires switching the terminal into raw or non-canonical mode using termios, reading input byte-by-byte, and interpreting escape sequences. A line editor maintains a buffer, a cursor position, and a render strategy so that the screen stays consistent with the buffer. Without proper line editing, a shell feels primitive and frustrating to use.
Deep Dive into the concept
Terminals typically operate in canonical mode by default: the kernel buffers input until the user presses Enter, and it handles editing locally. For a custom line editor, the shell must disable canonical mode and echoing, then implement editing itself. Using tcgetattr() and tcsetattr(), you can toggle flags like ICANON and ECHO. In raw or “cbreak” mode, every keypress is delivered to the program immediately, allowing the shell to implement custom behavior. This also means the shell must handle Ctrl+C, Ctrl+D, and other control keys explicitly.
Arrow keys and other special keys arrive as escape sequences, typically beginning with . For example, the left arrow often sends [D. A line editor must parse these sequences and translate them into cursor movement. This is usually implemented as a small state machine: when is seen, the editor reads subsequent bytes to identify the full sequence. The complexity grows as you support more keys (Home, End, Delete, Alt combinations).
Rendering is another challenge. The shell must redraw the current line after any edit without leaving artifacts. A common approach is to print `
` to return to the beginning of the line, print the prompt and the buffer, then clear to end-of-line with [K, and finally reposition the cursor if it is not at the end. This must work even when the line wraps across terminal rows. Handling multi-line input and dynamic terminal resizing (SIGWINCH) are advanced topics but valuable for a robust editor.
History integration ties into editing. When the user presses the up arrow, the editor must replace the current buffer with a previous entry while preserving any partially typed text (often called the “history scratch buffer”). A good editor supports incremental search (Ctrl+R), but even basic navigation requires careful buffer management. This is why many shells use libraries like GNU Readline; in this project you will implement a minimal version to understand the mechanics.
Finally, input editing must be resilient. If the user sends EOF (Ctrl+D) on an empty line, the shell typically exits. If Ctrl+D is pressed with non-empty input, many shells delete the character under the cursor. These rules should be documented and tested so your editor feels consistent with user expectations.
How this fits on projects Line editing is the foundation of the interactive experience and integrates with history and completion systems.
Definitions & key terms
- Canonical mode: Terminal mode where input is line-buffered by the kernel.
- Raw mode: Terminal mode where input is delivered immediately.
- Escape sequence: Multi-byte sequence for special keys.
- Cursor: Position in the input buffer.
Mental model diagram
[terminal bytes] -> [editor state machine] -> [buffer + cursor] -> [render]
How it works (step-by-step)
- Switch terminal to raw/cbreak mode.
- Read bytes one at a time.
- Decode printable chars vs control/escape sequences.
- Update buffer and cursor accordingly.
- Re-render the line and restore cursor position.
Minimal concrete example
Input: a b <left> X
Buffer: a X b
Cursor: between X and space
Common misconceptions
- “Arrow keys are single bytes” -> they are escape sequences.
- “Rendering is just printing the buffer” -> you must clear and reposition.
- “Ctrl+D always exits” -> only when buffer is empty.
Check-your-understanding questions
- Why must canonical mode be disabled for custom line editing?
- How do you detect an arrow key press?
- What is the purpose of clearing to end-of-line?
Check-your-understanding answers
- Canonical mode buffers input and handles editing in the kernel.
- Parse escape sequences that begin with
. - To remove leftover characters from previous renders.
Real-world applications
- Shells and REPLs.
- Text-mode interfaces and TUIs.
- CLI tools with interactive prompts.
Where you’ll apply it
- In this project: see §4.1 High-Level Design and §5.10 Phase 2.
- Also used in: P12 History System, P13 Tab Completion Engine
References
- GNU Readline documentation.
- POSIX termios interfaces.
Key insights Line editing is a mini text editor running inside a terminal stream.
Summary A usable shell requires raw-mode input, escape-sequence parsing, and careful re-rendering to keep the display consistent.
Homework/Exercises to practice the concept
- Build a raw-mode reader that echoes characters manually.
- Add left/right arrow support.
- Implement delete and backspace with correct cursor motion.
Solutions to the homework/exercises
- Use
termiosto disable ICANON and ECHO, then print bytes yourself. - Detect
[sequences and update the cursor. - Modify the buffer and re-render from the prompt.
Signals, Handlers, and Asynchronous Events
Fundamentals Signals are asynchronous notifications delivered to processes to indicate events like interrupts (Ctrl+C), terminal stop (Ctrl+Z), or child termination (SIGCHLD). A shell must carefully manage signal handling so that interactive control works as expected. Typically, the parent shell ignores certain signals (SIGINT, SIGTSTP) so it doesn’t die when the user presses Ctrl+C, while child processes restore default handlers so they can be interrupted. Correct signal behavior is essential for a usable shell.
Deep Dive into the concept
Signals are delivered by the kernel and can interrupt normal control flow. In a shell, this is both powerful and dangerous. For interactive use, the shell should ignore SIGINT and SIGQUIT while idle so that Ctrl+C does not kill the shell itself. However, when running a foreground job, the terminal driver delivers SIGINT to the foreground process group, which should be the job, not the shell. This is why the shell must set process groups and manage terminal control. The shell should also handle SIGCHLD to detect when background jobs exit or stop. A SIGCHLD handler can reap children with waitpid(-1, WNOHANG) and update job state; however, signal handlers must be async-signal-safe, so any complex work should be deferred to the main loop using a self-pipe or flag.
Signal dispositions are inherited across fork() and exec(). If the parent ignores SIGINT, the child would also ignore it unless it resets to default. The correct pattern is: in the parent, ignore SIGINT/SIGTSTP; after fork(), in the child, restore default handlers before execve(). This ensures that interactive interrupts apply to the child program. Similarly, if the shell installs a SIGCHLD handler, the child should either reset it or ensure it does not interfere with program behavior.
Another subtlety is the signal mask. Signals can be blocked temporarily using sigprocmask. Shells often block SIGCHLD during critical sections (like updating the job table) to avoid race conditions. Without this, you can miss a child exit and leave a zombie or display a stale job status. The right approach is to block SIGCHLD, update shared data structures, then unblock and handle pending signals. This is a common pattern in robust shells.
Signal handling must also consider system calls that are interrupted. read() on stdin may return EINTR when a signal arrives. An interactive shell should treat this as a normal condition, maybe redisplay the prompt or re-read input. Using sigaction with SA_RESTART can reduce interruptions for some syscalls, but you must understand which calls will restart and which will not.
Finally, signal behavior influences scripts. In a script, a SIGINT should generally terminate the script unless handled. Some shells provide trap to install user handlers. If you plan to implement trap, you need to record handlers in shell state and run them when signals are received. Even without full trap support, you should at least ensure that signals terminate child processes as expected.
How this fits on projects Signals interact with job control, pipelines, and the line editor. They are central to responsive interactive behavior.
Definitions & key terms
- Signal: Asynchronous notification delivered to a process.
- SIGINT: Interrupt (Ctrl+C).
- SIGTSTP: Terminal stop (Ctrl+Z).
- SIGCHLD: Child process state change.
- Signal disposition: How a process handles a signal (default, ignore, handler).
Mental model diagram
Ctrl+C -> terminal -> SIGINT -> foreground process group
How it works (step-by-step)
- Parent shell sets signal dispositions (ignore SIGINT/SIGTSTP).
- Before exec, child restores default handlers.
- Shell installs SIGCHLD handler to reap background jobs.
- Use
sigprocmaskto block SIGCHLD while updating jobs. - Handle EINTR or use SA_RESTART as appropriate.
Minimal concrete example
struct sigaction sa = {0};
sa.sa_handler = SIG_IGN;
sigaction(SIGINT, &sa, NULL);
Common misconceptions
- “Signals are synchronous” -> they interrupt at arbitrary times.
- “Ignore in parent means ignore everywhere” -> children inherit unless reset.
- “SIGCHLD handler can do anything” -> only async-signal-safe actions are allowed.
Check-your-understanding questions
- Why must children reset SIGINT to default before exec?
- What is the purpose of SIGCHLD in a shell?
- Why block SIGCHLD while updating the job table?
Check-your-understanding answers
- So user interrupts affect child programs rather than the shell.
- To detect child exits and avoid zombies.
- To prevent race conditions between handler and main loop.
Real-world applications
- Interactive shells and REPLs.
- Process supervisors reacting to child exits.
- Terminal-based applications handling Ctrl+C.
Where you’ll apply it
- In this project: see §4.1 High-Level Design and §6 Testing Strategy.
- Also used in: P08 Signal Handler, P09 Job Control System
References
- “Advanced Programming in the UNIX Environment” (signals).
- POSIX signal semantics.
Key insights Signals are asynchronous; robust shells tame them with careful masking and reset rules.
Summary Correct signal handling separates a responsive interactive shell from an unusable one and prevents zombies or unkillable processes.
Homework/Exercises to practice the concept
- Install a SIGINT handler that prints a message in the parent.
- Fork a child and ensure Ctrl+C terminates the child, not the parent.
- Implement a SIGCHLD handler that reaps children.
Solutions to the homework/exercises
- Use
sigactionwithSIG_IGNor a custom handler. - Reset the handler to default in the child before exec.
- Call
waitpid(-1, &status, WNOHANG)in the handler.
3. Project Specification
3.1 What You Will Build
A line editor that supports insertion, deletion, and arrow-key navigation.
Included:
- Core feature set described above
- Deterministic CLI behavior and exit codes
Excluded:
- Full readline features optional.
3.2 Functional Requirements
- Requirement 1: Switch terminal to raw/cbreak mode.
- Requirement 2: Support left/right movement and backspace/delete.
- Requirement 3: Maintain an input buffer and cursor position.
- Requirement 4: Render the prompt + buffer and restore cursor.
- Requirement 5: Handle Ctrl+C, Ctrl+D, and Ctrl+L behavior.
3.3 Non-Functional Requirements
- Performance: Interactive latency under 50ms for typical inputs; pipeline setup should scale linearly.
- Reliability: No crashes on malformed input; errors reported clearly with non-zero status.
- Usability: Clear prompts, deterministic behavior, and predictable error messages.
3.4 Example Usage / Output
$ ./mysh
mysh> git status
# press up arrow
mysh> git status
# press left, backspace
mysh> git staus
# type 't'
mysh> git status
3.5 Data Formats / Schemas / Protocols
- Input buffer + cursor index.
3.6 Edge Cases
- Line wrapping
- Ctrl+D on empty line
- SIGWINCH resize
3.7 Real World Outcome
This is the exact behavior you should be able to demonstrate.
3.7.1 How to Run (Copy/Paste)
- make
- ./mysh
3.7.2 Golden Path Demo (Deterministic)
$ ./mysh
mysh> git status
# press up arrow
mysh> git status
# press left, backspace
mysh> git staus
# type 't'
mysh> git status
3.7.3 Failure Demo (Deterministic)
$ ./mysh
mysh> not_a_command
mysh> echo $?
127
4. Solution Architecture
4.1 High-Level Design
[Input] -> [Parser/Lexer] -> [Core Engine] -> [Executor/Output]
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Input Reader | Reads bytes and decodes escape sequences | Small state machine. |
| Buffer Manager | Insert/delete operations | Keeps cursor index in range. |
| Renderer | Redraw prompt and buffer | Clears line and positions cursor. |
4.4 Data Structures (No Full Code)
struct LineBuffer { char *buf; size_t len; size_t cursor; };
4.4 Algorithm Overview
Key Algorithm: Render Loop
- Write prompt
- Write buffer
- Move cursor
Complexity Analysis:
- Time: O(n) per keypress
- Space: O(n) per keypress
5. Implementation Guide
5.1 Development Environment Setup
# install dependencies (if any)
# build
make
5.2 Project Structure
project-root/
├── src/
│ ├── main.c
│ ├── lexer.c
│ └── executor.c
├── tests/
│ └── test_basic.sh
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
How does a shell let users edit input like a text editor?
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Terminal modes
- Escape sequences
- Screen redraw
5.5 Questions to Guide Your Design
5.6 Thinking Exercise
The “Cursor vs Buffer” Problem
How do you handle inserting a character in the middle of the line?
5.7 The Interview Questions They’ll Ask
5.8 Hints in Layers
Hint 1: Use termios to disable ICANON and ECHO Save and restore terminal settings.
Hint 2: Keep a cursor index Modify buffer at cursor, then redraw.
Hint 3: Use carriage return + clear to end
\r and \x1b[K help redraw lines.
Hint 4: Handle multi-byte escape sequences
Arrow keys start with \x1b.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Terminal I/O | “The Linux Programming Interface” | Ch. 62 |
| Readline concepts | GNU Readline Manual | Key bindings |
5.10 Implementation Phases
Phase 1: Foundation (2-3 days)
Goals:
- Define data structures and interfaces
- Build a minimal end-to-end demo
Tasks:
- Implement the core data structures
- Build a tiny CLI or harness for manual tests
Checkpoint: A demo command runs end-to-end with clear logging.
Phase 2: Core Functionality (1 week)
Goals:
- Implement full feature set
- Validate with unit tests
Tasks:
- Implement core requirements
- Add error handling and edge cases
Checkpoint: All functional requirements pass basic tests.
Phase 3: Polish & Edge Cases (2-4 days)
Goals:
- Harden for weird inputs
- Improve UX and documentation
Tasks:
- Add edge-case tests
- Document design decisions
Checkpoint: Deterministic golden demo and clean error output.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing depth | Minimal vs full | Incremental | Start small, expand safely |
| Error policy | Silent vs verbose | Verbose | Debuggability for learners |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test individual components | Tokenizer, matcher, env builder |
| Integration Tests | Test component interactions | Full command lines |
| Edge Case Tests | Handle boundary conditions | Empty input, bad args |
6.2 Critical Test Cases
- Golden Path: Run the canonical demo and verify output.
- Failure Path: Provide invalid input and confirm error status.
- Stress Path: Run repeated commands to detect leaks or state corruption.
6.3 Test Data
input: echo hello
output: hello
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Misordered redirection | Output goes to wrong place | Apply redirections left-to-right |
| Leaked file descriptors | Commands hang waiting for EOF | Close unused fds in parent/child |
| Incorrect exit status | &&/|| behave wrong |
Use waitpid macros correctly |
7.2 Debugging Strategies
- Trace syscalls: Use
strace/dtrussto verify fork/exec/dup2 order. - Log state transitions: Print parser states and job table changes in debug mode.
- Compare with
dash: Run the same input in a reference shell.
7.3 Performance Traps
- Avoid O(n^2) behavior in hot paths like line editing.
- Minimize allocations inside the REPL loop.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a
helpbuilt-in with usage docs. - Add colored prompt themes.
8.2 Intermediate Extensions
- Add a simple profiling mode for command timing.
- Implement a
whichbuilt-in using PATH lookup.
8.3 Advanced Extensions
- Add programmable completion or plugin system.
- Add a scriptable test harness with golden outputs.
9. Real-World Connections
9.1 Industry Applications
- Build systems: shells orchestrate compilation and test pipelines.
- DevOps automation: scripts manage deployments and infrastructure.
9.2 Related Open Source Projects
- bash: The most common interactive shell.
- dash: Minimal POSIX shell often used as /bin/sh.
- zsh: Feature-rich interactive shell.
9.3 Interview Relevance
- Process creation and lifecycle questions.
- Parsing and system programming design trade-offs.
10. Resources
10.1 Essential Reading
- “The Linux Programming Interface” by Michael Kerrisk - focus on the chapters relevant to this project.
- “Advanced Programming in the UNIX Environment” - process control and pipes.
10.2 Video Resources
- Unix process model lectures (any OS course).
- Compiler front-end videos for lexing/parsing projects.
10.3 Tools & Documentation
- strace/dtruss: inspect syscalls.
- man pages:
fork,execve,waitpid,pipe,dup2.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the core concept without notes.
- I can trace a command through my subsystem.
- I understand at least one key design trade-off.
11.2 Implementation
- All functional requirements are met.
- All critical tests pass.
- Edge cases are handled cleanly.
11.3 Growth
- I documented lessons learned.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Core feature works for the golden demo.
- Errors are handled with non-zero status.
- Code is readable and buildable.
Full Completion:
- All functional requirements met.
- Tests cover edge cases and failures.
Excellence (Going Above & Beyond):
- Performance benchmarks and clear documentation.
- Behavior compared against a reference shell.