Project 6: Mini Text Editor (Capstone)

Build a terminal-based text editor with file loading, editing, saving, and undo/redo while managing memory safely.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 20-40 hours
Main Programming Language C (Alternatives: C++, Rust)
Alternative Programming Languages C++, Rust, Zig
Coolness Level Very High
Business Potential Medium (editor tooling, embedded UIs)
Prerequisites C pointers, dynamic memory, basic terminal usage
Key Topics Terminal raw mode, text buffers, undo/redo, file I/O

1. Learning Objectives

By completing this project, you will:

  1. Implement a terminal UI loop with raw input handling.
  2. Choose and implement a text buffer data structure.
  3. Manage memory for long-running interactive programs.
  4. Build an undo/redo system with clear ownership rules.
  5. Design a practical CLI editor with save/load and status display.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Terminal Raw Mode and Input Processing

Fundamentals

Terminal input is normally line-buffered: you only receive input after the user presses Enter. A text editor needs raw mode to receive each keystroke immediately. On Unix, raw mode is configured with termios by disabling canonical mode and echo. You must also handle special keys (arrows, backspace, delete) which arrive as escape sequences. Understanding raw mode and input decoding is essential because the editor’s UI depends on real-time keystrokes, not line-based input.

Deep Dive into the Concept

The terminal is a device with a driver that processes input before your program sees it. In canonical mode, the driver buffers input until newline and performs editing (backspace, kill-line). For an editor, you want full control, so you must disable canonical mode and echo using tcgetattr, tcsetattr, and manipulating flags like ICANON and ECHO. You often also disable ISIG so that Ctrl-C is delivered as a byte instead of a signal, depending on your design. Raw mode is not a single flag; it is a configuration of multiple flags in c_lflag, c_iflag, and c_oflag.

Once in raw mode, keys are delivered immediately. Regular ASCII characters are single bytes. Special keys are sequences beginning with ESC (0x1b) followed by bracketed codes (e.g., arrow up is ESC [ A). You must parse these sequences, often with a small state machine. A robust parser reads bytes, recognizes prefixes, and decodes them into internal key codes like KEY_UP, KEY_LEFT, etc. You must also handle partial sequences if input arrives slowly.

The editor must control terminal output as well. You should hide the cursor while redrawing and then restore it. ANSI escape codes allow you to clear the screen, move the cursor, and set colors. A standard approach is to render the entire screen each frame using a single write, which minimizes flicker. You can also use a “dirty region” model for efficiency, but full redraw is simpler and good enough for this project.

A critical safety issue is restoring terminal state on exit. If your program crashes or exits without restoring canonical mode, the user’s terminal becomes unusable. The editor must always restore the original termios settings in an atexit handler and in signal handlers (SIGINT, SIGTERM). This is a correctness requirement, not just a nicety.

Finally, input latency matters. You can use select or poll with a timeout to avoid blocking forever and to enable periodic UI refresh. For a minimal editor, blocking reads are acceptable, but a smoother editor uses non-blocking reads. Either way, you must ensure your input loop is predictable and does not consume 100% CPU. That is why careful handling of read and timeouts is part of the design.

How this fits on projects

Raw mode drives the UI loop in Section 5.2 and the TUI layout in Section 3.7. It also informs pitfalls in Section 7.

Definitions & key terms

  • Canonical mode: Line-buffered input mode.
  • Raw mode: Immediate input, no processing by terminal driver.
  • Escape sequence: Multi-byte sequence for special keys.
  • Termios: POSIX API for terminal attributes.

Mental model diagram (ASCII)

User key -> terminal driver -> raw bytes -> editor parser -> action

How it works (step-by-step, with invariants and failure modes)

  1. Save original termios settings.
  2. Disable canonical and echo flags.
  3. Read bytes and decode keys.
  4. Render screen and move cursor.
  5. Restore settings on exit.

Invariant: Terminal settings must be restored on exit.

Failure modes: Stuck terminal, mis-decoded keys, flicker.

Minimal concrete example

struct termios orig, raw;
tcgetattr(STDIN_FILENO, &orig);
raw = orig;
raw.c_lflag &= ~(ECHO | ICANON);
tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw);

Common misconceptions

  • “Raw mode is one flag.” -> It is several flags.
  • “Arrow keys are single bytes.” -> They are escape sequences.
  • “You can ignore terminal restore.” -> It breaks the user’s terminal.

Check-your-understanding questions

  1. Why disable ICANON?
  2. What is ESC [ A?
  3. Why is atexit important for a TUI?

Check-your-understanding answers

  1. To receive input immediately without Enter.
  2. Arrow up key escape sequence.
  3. To restore terminal settings even on normal exit.

Real-world applications

  • Terminal UIs (editors, dashboards).
  • Remote CLI tools that require interactive input.

Where you’ll apply it

References

  • “Advanced Programming in the UNIX Environment” (termios)
  • man 3 termios

Key insights

Raw mode turns the terminal into a stream of bytes you must interpret.

Summary

You understand how to enable raw mode and decode keystrokes for an editor.

Homework/Exercises to practice the concept

  1. Write a program that prints the hex value of each key pressed.
  2. Decode arrow keys and print their names.

Solutions to the homework/exercises

  1. Use read in raw mode and print 0x%02x.
  2. Detect ESC [ A/B/C/D sequences.

2.2 Text Buffer Data Structures (Gap Buffer Focus)

Fundamentals

A text editor needs a data structure that allows efficient insertions and deletions near the cursor. A gap buffer stores the text in a single array with a “gap” at the cursor position. Insertions write into the gap; deletions expand it. Moving the cursor moves the gap by shifting text. This provides good performance for typical editing patterns and is simpler than ropes or piece tables. Understanding the gap buffer is central to building a functional editor.

Deep Dive into the Concept

The gap buffer is a classic editor data structure. It maintains a contiguous array with two indices: gap_start and gap_end. The characters before gap_start are the text before the cursor; the characters after gap_end are the text after the cursor. The gap represents unused capacity where new characters can be inserted. When you insert, you write into gap_start and increment it. When you delete, you decrement gap_start or increment gap_end depending on backspace vs delete.

The key operation is moving the cursor. If the cursor moves left, you shift the character before the gap into the gap’s end, effectively moving the gap left. If the cursor moves right, you shift the character after the gap into the gap’s start. This costs O(k) for moving k positions. For typical editing (local edits), the gap buffer is efficient. For large cursor jumps, you pay a cost, but it is still acceptable for a small editor.

Gap buffers require a strategy for resizing when the gap is full. A common approach is to allocate a larger array (e.g., double the size), then copy text before and after the gap into the new array, creating a larger gap in the middle. This is similar to dynamic array growth. The resizing must preserve invariants: gap_start <= gap_end, and the array length equals text length + gap size. If resizing fails, the editor must handle the error gracefully.

Representing the cursor is straightforward: the cursor index equals gap_start. The text length is capacity - gap_size. Converting this into lines for display requires scanning the buffer and counting newlines. This is O(n) per frame; for a small editor, this is fine. For larger editors, you would maintain a line index, but that is beyond scope.

Gap buffers are not the only choice. Ropes are better for huge files, and piece tables are popular in modern editors. But a gap buffer is the best choice for this project because it is simple, teaches pointer arithmetic, and is efficient enough for medium-sized files. The goal is not to build Vim, but to understand how text editing works under the hood.

Finally, the gap buffer interacts with undo/redo. Each edit operation can be represented as an insertion or deletion at a cursor position. You can store these operations as commands and replay them. The gap buffer provides a stable, contiguous representation that makes these operations easy to implement.

How this fits on projects

Gap buffer design drives Section 3.2 requirements and the architecture in Section 4.2. It also informs performance discussion in Section 7.3.

Definitions & key terms

  • Gap buffer: Array with a gap at the cursor.
  • Gap start/end: Indices that bound the gap.
  • Capacity: Total array size.
  • Text length: Capacity minus gap size.

Mental model diagram (ASCII)

[hello][ GAP ][world]
 gap_start^   ^gap_end
cursor at gap_start

How it works (step-by-step, with invariants and failure modes)

  1. Insert: write at gap_start, increment.
  2. Delete: move gap_start backward, expand gap.
  3. Move cursor: shift characters across gap.
  4. Resize: allocate new buffer, copy text, expand gap.

Invariant: gap_start <= gap_end and text length = capacity - gap size.

Failure modes: Gap overflow, incorrect cursor shifts, resizing bugs.

Minimal concrete example

typedef struct {
    char *buf;
    size_t gap_start;
    size_t gap_end;
    size_t cap;
} gapbuf;

Common misconceptions

  • “Moving cursor is free.” -> It requires shifting bytes.
  • “Gap buffers are slow.” -> They are fast for local edits.
  • “Resizing can ignore existing content.” -> It must preserve text.

Check-your-understanding questions

  1. Why does insertion cost O(1) near the cursor?
  2. What happens when the gap is full?
  3. How do you compute text length?

Check-your-understanding answers

  1. It writes directly into the gap.
  2. You must resize and create a new gap.
  3. cap - (gap_end - gap_start).

Real-world applications

  • Classic editors (e.g., Emacs uses a gap buffer).
  • Text processing tools with local edits.

Where you’ll apply it

References

  • “The Craft of Text Editing” (gap buffer theory)
  • “Advanced Programming in the UNIX Environment” (I/O basics)

Key insights

The gap buffer turns edits into O(1) operations near the cursor.

Summary

You can now implement a gap buffer and explain its trade-offs.

Homework/Exercises to practice the concept

  1. Simulate inserting characters into a gap buffer by hand.
  2. Implement a cursor move and show how the gap shifts.

Solutions to the homework/exercises

  1. Insert into gap_start and increment it.
  2. Move one char from before gap to after gap.

2.3 Undo/Redo and Ownership Rules

Fundamentals

Undo/redo is a history of edits. Each edit must be recorded with enough information to reverse it. In C, that means explicitly allocating memory for undo records and defining ownership rules for that memory. For example, an insert operation can be undone by deleting the inserted text, but you must store the text and its position. Undo/redo systems teach disciplined memory management because every operation adds to history and must be freed eventually.

Deep Dive into the Concept

An undo system can be modeled as a stack of commands. Each command contains the operation type (insert/delete), the position, and the text involved. When you undo, you pop from the undo stack and apply the inverse operation, then push the command onto the redo stack. When you perform a new edit after undoing, the redo stack must be cleared and freed, because the history diverges. This is where ownership rules matter: the redo stack owns its command data, and when you discard it, you must free all its buffers.

In a C editor, command data often includes variable-length strings (the inserted or deleted text). You must allocate copies for these strings because the editor buffer will change after the edit. If you store pointers into the gap buffer, those pointers will be invalid after edits. Therefore each command should own a heap-allocated copy of the text. This can be small (single character) or large (paste). The system should be designed to handle both.

Memory pressure can become an issue. A long editing session could build a huge undo stack. You should implement a limit (e.g., maximum number of commands or maximum total bytes). When the limit is exceeded, discard the oldest commands and free their memory. This is a practical design rule used by real editors.

The undo system must interact with the buffer correctly. When you undo an insertion, you delete at the stored position. But if the cursor moved, the position might not match the current buffer. To make undo deterministic, record positions in terms of buffer indices, and apply them regardless of the current cursor. This assumes that the buffer hasn’t changed in ways that invalidate the position. For a simple editor, this is acceptable; for advanced editors, a more complex model is needed (e.g., operational transforms), but that is beyond scope.

Finally, a well-designed undo system is a strong test of memory safety. It forces you to allocate, copy, and free many small buffers, and it punishes leaks. Running the editor under Valgrind or ASan should show whether you have cleaned up correctly. This is why the undo/redo system is the capstone of memory management in this sprint.

How this fits on projects

Undo/redo informs Section 3.2 requirements, Section 6 testing, and the memory pitfall discussion in Section 7.

Definitions & key terms

  • Undo stack: Stack of commands to reverse operations.
  • Redo stack: Stack of commands to reapply undone operations.
  • Command: Operation record with text and position.
  • Ownership: Who frees the command’s memory.

Mental model diagram (ASCII)

[undo stack] <-> [redo stack]
 insert -> push undo
 undo   -> pop undo, push redo
 new edit -> clear redo

How it works (step-by-step, with invariants and failure modes)

  1. Apply edit, push command to undo stack.
  2. Undo: pop command, apply inverse, push to redo.
  3. Redo: pop redo, apply, push to undo.
  4. New edit clears redo stack.

Invariant: Commands own their text buffers.

Failure modes: Leaks, dangling pointers, incorrect positions.

Minimal concrete example

typedef struct {
    int type; // INSERT or DELETE
    size_t pos;
    char *text;
    size_t len;
} cmd;

Common misconceptions

  • “Undo can use pointers into the buffer.” -> The buffer changes.
  • “Redo stack can be kept forever.” -> It must be cleared after new edits.
  • “Memory usage is negligible.” -> It grows with history size.

Check-your-understanding questions

  1. Why must undo commands copy text data?
  2. What happens when you edit after undo?
  3. Why do you need a history limit?

Check-your-understanding answers

  1. The buffer mutates, making pointers invalid.
  2. Redo history becomes invalid and must be cleared.
  3. To prevent unbounded memory growth.

Real-world applications

  • Text editors, IDEs, document editors.
  • Command history in interactive tools.

Where you’ll apply it

References

  • “Designing Interfaces” (undo patterns)
  • “The Craft of Text Editing” (command history)

Key insights

Undo/redo is a memory ownership system disguised as a feature.

Summary

You understand how to implement undo/redo safely with explicit ownership.

Homework/Exercises to practice the concept

  1. Implement a stack of commands and push/pop operations.
  2. Simulate undo and redo with a sequence of edits.

Solutions to the homework/exercises

  1. Use a linked list or dynamic array of cmd.
  2. Apply inverse operations and verify buffer contents.

3. Project Specification

3.1 What You Will Build

A terminal-based text editor that:

  • Opens a file (optional), displays it, and edits in place.
  • Supports insert, delete, save, and quit.
  • Supports undo/redo with explicit history.
  • Displays a status bar and cursor position.

3.2 Functional Requirements

  1. File open/save: Load and save text files.
  2. Cursor movement: Arrow keys, page up/down.
  3. Editing: Insert characters, backspace, delete.
  4. Undo/redo: Ctrl-Z / Ctrl-Y.
  5. Status bar: Filename, line count, modified flag.

3.3 Non-Functional Requirements

  • Performance: Responsive for files up to 100k lines.
  • Reliability: No leaks or crashes under long sessions.
  • Usability: Clear keybindings and status messages.

3.4 Example Usage / Output

$ ./mini_editor notes.txt
[STATUS] notes.txt | 12 lines | UTF-8 | Ctrl-S save | Ctrl-Q quit

3.5 Data Formats / Schemas / Protocols

Gap buffer structure:

typedef struct {
    char *buf;
    size_t gap_start;
    size_t gap_end;
    size_t cap;
} gapbuf;

3.6 Edge Cases

  • Empty file or new file.
  • Large paste operations.
  • Undo after long session.
  • Saving to unwritable path.

3.7 Real World Outcome

The editor should render a TUI, accept key input, and save files reliably.

3.7.1 How to Run (Copy/Paste)

make
./mini_editor notes.txt

3.7.2 Golden Path Demo (Deterministic)

Open a file with fixed content and perform a known edit sequence (use a script or expect-style input):

$ ./mini_editor fixtures/hello.txt
# Press: i Hello<Esc>:wq (or Ctrl-S, Ctrl-Q)
[STATUS] hello.txt | 1 line | saved
Exit code: 0

3.7.3 If TUI: ASCII Layout + Interactions

+--------------------------------------------------+
| mini_editor - notes.txt                          |
|--------------------------------------------------|
| 1| Hello world                                   |
| 2|                                                |
| 3|                                                |
|                                                  |
|                                                  |
|--------------------------------------------------|
| STATUS: notes.txt | 2 lines | UTF-8 | *modified  |
| Keys: Ctrl-S save | Ctrl-Q quit | Ctrl-Z undo    |
+--------------------------------------------------+

Key interactions:

  • Arrow keys move cursor.
  • Ctrl-S saves.
  • Ctrl-Q quits (prompts if modified).
  • Ctrl-Z undo, Ctrl-Y redo.

4. Solution Architecture

4.1 High-Level Design

[Input loop] -> [Key parser] -> [Editor state] -> [Renderer]
                 |                     |
                 v                     v
            [Gap buffer]         [Undo/Redo]

4.2 Key Components

Component Responsibility Key Decisions
Input loop read keys and decode raw mode + escape parsing
Gap buffer store text with fast edits gap size strategy
Renderer draw screen and status full redraw per frame
Undo/redo store edit commands command stack

4.3 Data Structures (No Full Code)

typedef struct {
    gapbuf buf;
    size_t cursor;
    char *filename;
    int dirty;
} editor_state;

4.4 Algorithm Overview

Key Algorithm: Insert Character

  1. Ensure gap has space (resize if needed).
  2. Write char at gap_start.
  3. Increment gap_start and cursor.
  4. Record undo command.

Complexity Analysis:

  • Insert near cursor: O(1) amortized.
  • Cursor move: O(k) for k positions.

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

mini_editor/
|-- src/
|   |-- main.c
|   |-- editor.c
|   |-- gapbuf.c
|   |-- undo.c
|   \-- term.c
|-- include/
|   \-- editor.h
|-- fixtures/
|   \-- hello.txt
|-- tests/
|   \-- test_gapbuf.c
\-- Makefile

5.3 The Core Question You’re Answering

“Can I build a real interactive system in C without leaking or corrupting memory?”

5.4 Concepts You Must Understand First

  1. Raw mode and input parsing (see Section 2.1).
  2. Gap buffer design (see Section 2.2).
  3. Undo/redo ownership (see Section 2.3).

5.5 Questions to Guide Your Design

  1. How will you represent cursor position relative to the gap?
  2. How will you decode escape sequences reliably?
  3. How will you cap undo history memory?

5.6 Thinking Exercise

If the cursor is in the middle of a line, how does the gap move when you press the left arrow 5 times? Draw the buffer before and after.

5.7 The Interview Questions They’ll Ask

  1. What data structure would you use for an editor buffer and why?
  2. How do you implement undo/redo safely?
  3. How do you handle raw terminal input?

5.8 Hints in Layers

Hint 1: Start by printing keys in raw mode. Hint 2: Implement a minimal gap buffer that supports insert/delete. Hint 3: Add rendering and a status bar. Hint 4: Add undo/redo and file save.

5.9 Books That Will Help

| Topic | Book | Chapter | |——-|——|———| | Terminal I/O | APUE | Ch. 18 | | Data structures | The Craft of Text Editing | gap buffer | | Memory bugs | Expert C Programming | Ch. 2-3 |

5.10 Implementation Phases

Phase 1: Terminal + Rendering (5-8 hours)

Goals: raw mode input and screen redraw. Checkpoint: keys move cursor on screen.

Phase 2: Gap Buffer + Editing (6-10 hours)

Goals: insert/delete, file load/save. Checkpoint: edits persist and save correctly.

Phase 3: Undo/Redo + Polish (6-10 hours)

Goals: command history, status bar, prompts. Checkpoint: undo/redo works without leaks.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Buffer type | gap vs rope | gap buffer | simpler and fast enough | | Redraw strategy | full vs incremental | full redraw | easier correctness | | Undo limit | unlimited vs capped | capped | avoid memory bloat |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |———-|———|———-| | Unit Tests | gap buffer correctness | insert/delete, move gap | | Integration Tests | editor flow | open, edit, save | | Edge Case Tests | large edits | paste 10k chars |

6.2 Critical Test Cases

  1. Insert and delete at beginning/end of file.
  2. Undo/redo after multi-line edits.
  3. Save and reopen file matches content.

6.3 Test Data

fixtures/hello.txt (1 line)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |———|———|———-| | Terminal stuck raw | shell unusable | restore settings on exit | | Cursor out of bounds | crashes on navigation | clamp cursor | | Undo leaks | memory growth | free old commands |

7.2 Debugging Strategies

  • Run under Valgrind after long edit sessions.
  • Log key events to a debug file.

7.3 Performance Traps

Rebuilding line layout on every keystroke can be expensive; acceptable for small editors but note the trade-off.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add search (Ctrl-F).
  • Add line numbers.

8.2 Intermediate Extensions

  • Implement a piece table.
  • Add syntax highlighting.

8.3 Advanced Extensions

  • Add multi-file tabs.
  • Add macros and scripting.

9. Real-World Connections

9.1 Industry Applications

  • Terminal editors and embedded device UIs.
  • CLI tooling with interactive interfaces.
  • kilo (small C editor)
  • nano (terminal editor)

9.3 Interview Relevance

  • Shows ability to build interactive systems and manage memory.

10. Resources

10.1 Essential Reading

  • “Advanced Programming in the UNIX Environment” (termios)
  • “The Craft of Text Editing” (editor data structures)

10.2 Video Resources

  • Lectures on terminal I/O and text editor internals.

10.3 Tools & Documentation

  • man 3 termios, man 2 read, man 3 tcsetattr

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain raw mode and escape sequences.
  • I can explain how a gap buffer works.
  • I can describe undo/redo ownership rules.

11.2 Implementation

  • Editor runs without leaking memory.
  • Undo/redo works reliably.
  • Saving preserves file contents.

11.3 Growth

  • I can explain why this editor is safe compared to naive C code.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Editor opens and saves files, supports basic editing.
  • Raw mode is restored on exit.

Full Completion:

  • Undo/redo implemented and memory-safe.
  • Status bar and key hints.

Excellence (Going Above & Beyond):

  • Search, syntax highlighting, or alternative buffer structure.