Project 2: Text Editor Buffer (Gap Buffer Implementation)
Project 2: Text Editor Buffer (Gap Buffer Implementation)
Sprint: 2 - Data & Invariants Difficulty: Advanced Time Estimate: 1-2 weeks Prerequisites: Project 1 (or equivalent dynamic array understanding)
Overview
What youโll build: The core buffer data structure used by real text editorsโa gap buffer that efficiently handles insertions and deletions at the cursor position.
Why it teaches Data & Invariants: A gap buffer has one of the most elegant invariant sets in computer science:
[text before cursor][.....gap.....][text after cursor]
The โgapโ is uninitialized memory that moves with the cursor. You must track:
- Where the gap starts
- Where the gap ends
- Whatโs โreal textโ vs โgarbage in the gapโ
- When the buffer needs to grow
One wrong index and youโre reading garbage or corrupting memory.
The Core Question Youโre Answering:
โHow do you make text insertion O(1) when arrays require O(n) shifting, without using a linked list of characters?โ
Learning Objectives
By the end of this project, you will be able to:
- Implement a gap buffer with correct invariants
- Handle memory movement using memmove (not memcpy) for overlapping regions
- Represent โabsenceโ safely (the gap contains garbage)
- Design struct layout for performance
- Debug buffer corruption using visualization
- Understand aliasing dangers when pointers overlap
Theoretical Foundation
The Gap Buffer Concept
Text editing has a locality propertyโ99% of edits happen at or near the cursor position. A gap buffer exploits this:
Gap Buffer Structure:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ "Hello World" with cursor after "Hello " โ
โ โ
โ In memory: โ
โ [H][e][l][l][o][ ][ GAP ][W][o][r][l][d] โ
โ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 โ
โ ^ ^ โ
โ gap_start gap_end โ
โ โ
โ The cursor is ALWAYS at gap_start โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
The Invariants
INVARIANT 1: gap_start <= gap_end
(The gap cannot have negative size)
INVARIANT 2: gap_end <= total_size
(The gap end cannot exceed buffer size)
INVARIANT 3: buffer[0..gap_start-1] is valid text
(Text before cursor is meaningful)
INVARIANT 4: buffer[gap_start..gap_end-1] is GARBAGE
(NEVER read this memory!)
INVARIANT 5: buffer[gap_end..total_size-1] is valid text
(Text after cursor is meaningful)
INVARIANT 6: cursor_position == gap_start
(Cursor is always at start of gap)
Operations
Insert Character:
Before: [H][e][l][l][o][ ][GAP GAP GAP][W][o][r][l][d]
^gap_start
Insert 'X':
[H][e][l][l][o][ ][X][GAP GAP][W][o][r][l][d]
^gap_start (incremented)
Time complexity: O(1) - just write to gap_start and increment
Delete Character (Backspace):
Before: [H][e][l][l][o][ ][GAP GAP GAP][W][o][r][l][d]
^gap_start
Backspace:
[H][e][l][l][o][GAP GAP GAP GAP][W][o][r][l][d]
^gap_start (decremented)
Time complexity: O(1) - just decrement gap_start
Move Cursor Right:
Before: [H][e][l][l][o][ ][GAP GAP GAP][W][o][r][l][d]
^gap_start ^gap_end
Move right 1:
[H][e][l][l][o][ ][W][GAP GAP GAP][o][r][l][d]
^gap_start ^gap_end
What happened:
1. Copy buffer[gap_end] to buffer[gap_start]
2. Increment both gap_start and gap_end
Time complexity: O(1) for single move, O(n) for jump to end
memmove vs memcpy
// DANGEROUS: memcpy doesn't handle overlapping regions
memcpy(dst, src, n); // If dst and src overlap, UNDEFINED!
// SAFE: memmove handles overlapping regions
memmove(dst, src, n); // Works even if regions overlap
// Gap buffer often has overlapping regions when moving gap
// ALWAYS use memmove for gap operations
Project Specification
Core API
// Create and destroy
GapBuffer* gap_create(size_t initial_capacity);
void gap_destroy(GapBuffer* buf);
// Basic operations
void gap_insert_char(GapBuffer* buf, char c);
void gap_delete_backward(GapBuffer* buf); // Backspace
void gap_delete_forward(GapBuffer* buf); // Delete
// Cursor movement
void gap_move_left(GapBuffer* buf);
void gap_move_right(GapBuffer* buf);
void gap_move_to(GapBuffer* buf, size_t position);
size_t gap_cursor_position(GapBuffer* buf);
// Content access
size_t gap_length(GapBuffer* buf); // Text length (excludes gap)
char gap_char_at(GapBuffer* buf, size_t index);
void gap_get_text(GapBuffer* buf, char* out, size_t max_len);
// File I/O
void gap_load_file(GapBuffer* buf, const char* filename);
void gap_save_file(GapBuffer* buf, const char* filename);
// Debug
void gap_debug_print(GapBuffer* buf);
bool gap_check_invariants(GapBuffer* buf);
Expected Output
$ ./gapedit myfile.txt
GapEdit v1.0 - Gap Buffer Text Editor
Loaded: myfile.txt (1,247 bytes)
Buffer allocated: 2048 bytes (gap: 801 bytes)
----------------------------------------
Hello, World!
This is a test file.
โ
----------------------------------------
[Ln 2, Col 1] | GAP: 801 bytes | BUF: 2048 bytes | Ctrl+D=debug, Ctrl+Q=quit
Debug Mode (Ctrl+D):
========== GAP BUFFER DEBUG VIEW ==========
Total buffer size: 2048
Gap start: 41
Gap end: 842
Gap size: 801 bytes
Text before gap: 41 bytes
Text after gap: 205 bytes
Memory layout (visual):
[Hello, World!\nThis is a test file.\n|........gap.........|remaining text]
^gap_start ^gap_end
Invariants check:
โ gap_start <= gap_end
โ gap_end <= total_size
โ buffer != NULL
โ gap is at cursor position
==========================================
Solution Architecture
Data Structure
typedef struct {
char *buffer; // The actual text storage
size_t gap_start; // Start of gap (cursor position)
size_t gap_end; // End of gap (one past gap)
size_t total_size; // Total allocated size
} GapBuffer;
Memory Layout
GapBuffer struct:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ buffer โ points to heap-allocated memory โ
โ gap_start = 6 โ
โ gap_end = 11 โ
โ total_size = 16 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
Heap memory:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ[H][e][l][l][o][ ][?][?][?][?][?][W][o][r][l][d] โ
โ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 โ
โ โฒ โฒ โ
โ gap_start gap_end โ
โ โโโ GARBAGE โโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Implementation Guide
Phase 1: Basic Structure (1 hour)
typedef struct {
char *buffer;
size_t gap_start;
size_t gap_end;
size_t total_size;
} GapBuffer;
GapBuffer* gap_create(size_t initial_capacity) {
GapBuffer *buf = malloc(sizeof(GapBuffer));
buf->buffer = malloc(initial_capacity);
buf->gap_start = 0;
buf->gap_end = initial_capacity;
buf->total_size = initial_capacity;
return buf;
}
void gap_destroy(GapBuffer* buf) {
free(buf->buffer);
free(buf);
}
size_t gap_length(GapBuffer* buf) {
// Text length = total - gap size
return buf->total_size - (buf->gap_end - buf->gap_start);
}
Phase 2: Insert and Delete (2 hours)
void gap_insert_char(GapBuffer* buf, char c) {
// Check if gap is full
if (buf->gap_start == buf->gap_end) {
gap_expand(buf); // Grow the buffer
}
buf->buffer[buf->gap_start] = c;
buf->gap_start++;
}
void gap_delete_backward(GapBuffer* buf) { // Backspace
if (buf->gap_start > 0) {
buf->gap_start--;
}
}
void gap_delete_forward(GapBuffer* buf) { // Delete
if (buf->gap_end < buf->total_size) {
buf->gap_end++;
}
}
static void gap_expand(GapBuffer* buf) {
size_t new_size = buf->total_size * 2;
char *new_buffer = malloc(new_size);
// Copy text before gap
memcpy(new_buffer, buf->buffer, buf->gap_start);
// Copy text after gap to end of new buffer
size_t after_gap_size = buf->total_size - buf->gap_end;
memcpy(new_buffer + new_size - after_gap_size,
buf->buffer + buf->gap_end,
after_gap_size);
// Update gap_end to new position
buf->gap_end = new_size - after_gap_size;
buf->total_size = new_size;
free(buf->buffer);
buf->buffer = new_buffer;
}
Phase 3: Cursor Movement (2 hours)
void gap_move_right(GapBuffer* buf) {
if (buf->gap_end >= buf->total_size) return;
// Move one character from after gap to before gap
buf->buffer[buf->gap_start] = buf->buffer[buf->gap_end];
buf->gap_start++;
buf->gap_end++;
}
void gap_move_left(GapBuffer* buf) {
if (buf->gap_start == 0) return;
// Move one character from before gap to after gap
buf->gap_start--;
buf->gap_end--;
buf->buffer[buf->gap_end] = buf->buffer[buf->gap_start];
}
void gap_move_to(GapBuffer* buf, size_t position) {
// Clamp to valid range
size_t text_length = gap_length(buf);
if (position > text_length) position = text_length;
// Move gap to position
while (buf->gap_start < position) {
gap_move_right(buf);
}
while (buf->gap_start > position) {
gap_move_left(buf);
}
}
Phase 4: Content Access (1 hour)
char gap_char_at(GapBuffer* buf, size_t index) {
size_t text_length = gap_length(buf);
assert(index < text_length);
if (index < buf->gap_start) {
return buf->buffer[index];
} else {
// Skip over the gap
return buf->buffer[buf->gap_end + (index - buf->gap_start)];
}
}
void gap_get_text(GapBuffer* buf, char* out, size_t max_len) {
size_t text_len = gap_length(buf);
size_t copy_len = (text_len < max_len - 1) ? text_len : max_len - 1;
size_t out_idx = 0;
// Copy before gap
size_t before_gap = (buf->gap_start < copy_len) ? buf->gap_start : copy_len;
memcpy(out, buf->buffer, before_gap);
out_idx = before_gap;
// Copy after gap
if (out_idx < copy_len) {
size_t remaining = copy_len - out_idx;
memcpy(out + out_idx, buf->buffer + buf->gap_end, remaining);
out_idx += remaining;
}
out[out_idx] = '\0';
}
Phase 5: Invariant Checking (1 hour)
bool gap_check_invariants(GapBuffer* buf) {
bool valid = true;
if (buf->gap_start > buf->gap_end) {
fprintf(stderr, "INVARIANT VIOLATION: gap_start > gap_end\n");
valid = false;
}
if (buf->gap_end > buf->total_size) {
fprintf(stderr, "INVARIANT VIOLATION: gap_end > total_size\n");
valid = false;
}
if (buf->buffer == NULL && buf->total_size != 0) {
fprintf(stderr, "INVARIANT VIOLATION: NULL buffer with non-zero size\n");
valid = false;
}
return valid;
}
void gap_debug_print(GapBuffer* buf) {
printf("=== Gap Buffer Debug ===\n");
printf("Total size: %zu\n", buf->total_size);
printf("Gap start: %zu\n", buf->gap_start);
printf("Gap end: %zu\n", buf->gap_end);
printf("Gap size: %zu\n", buf->gap_end - buf->gap_start);
printf("Text length: %zu\n", gap_length(buf));
printf("Content: [");
for (size_t i = 0; i < buf->total_size; i++) {
if (i == buf->gap_start) printf("|");
if (i >= buf->gap_start && i < buf->gap_end) {
printf(".");
} else {
printf("%c", buf->buffer[i]);
}
if (i == buf->gap_end - 1) printf("|");
}
printf("]\n");
printf("Invariants: %s\n", gap_check_invariants(buf) ? "OK" : "VIOLATED");
}
Common Pitfalls
Pitfall 1: Using memcpy for Overlapping Regions
// WRONG: memcpy is undefined for overlapping regions
void gap_move_to_slow(GapBuffer* buf, size_t pos) {
// This might overlap!
memcpy(buf->buffer + pos, buf->buffer + buf->gap_start, size);
}
// CORRECT: memmove handles overlapping
void gap_move_to_slow(GapBuffer* buf, size_t pos) {
memmove(buf->buffer + pos, buf->buffer + buf->gap_start, size);
}
Pitfall 2: Reading From the Gap
// WRONG: Reading garbage
for (size_t i = 0; i < buf->total_size; i++) {
printf("%c", buf->buffer[i]); // Includes gap garbage!
}
// CORRECT: Skip the gap
for (size_t i = 0; i < buf->gap_start; i++) {
printf("%c", buf->buffer[i]);
}
for (size_t i = buf->gap_end; i < buf->total_size; i++) {
printf("%c", buf->buffer[i]);
}
Pitfall 3: Off-by-One in Gap Boundaries
// WRONG: Incorrect gap_end update
buf->gap_end = new_size - after_gap_size - 1; // Off by one!
// CORRECT: gap_end points one past the gap
buf->gap_end = new_size - after_gap_size;
Interview Preparation
Common Questions
- โExplain how a gap buffer works in 30 seconds.โ
- Keep a โgapโ of unused space at cursor position
- Insertions fill the gap: O(1)
- Cursor movement shifts text across gap: O(n) worst case
- Most edits are local, so usually O(1)
- โWhatโs the time complexity of inserting at the cursor in a gap buffer?โ
- O(1) if gap has space
- O(n) if buffer must grow (amortized O(1))
- โWhy canโt you use memcpy to move the gap?โ
- Source and destination may overlap
- memcpy has undefined behavior for overlapping regions
- Must use memmove
- โWhen would a gap buffer be worse than a linked list?โ
- Frequent jumps to distant positions (jump to end in 1MB file)
- Multiple cursor support (each needs its own gap)
Self-Assessment Checklist
Understanding
- Can explain gap buffer invariants
- Can trace insert/delete operations by hand
- Understand memmove vs memcpy difference
- Know when gap movement is O(1) vs O(n)
Implementation
- Insert/delete work correctly
- Cursor movement works correctly
- Buffer grows when needed
- No reading from gap
- Valgrind clean
Resources
Books
| Topic | Book | Chapter |
|---|---|---|
| Gap Buffer Algorithm | โThe Craft of Text Editingโ by Craig Finseth | Chapter 6 (free online) |
| Memory Movement | โC Programming: A Modern Approachโ by K.N. King | Chapter 17.7 |
| Struct Layout | โComputer Systems: A Programmerโs Perspectiveโ | Chapter 3.9 |
Next Project: P03: Memory Arena Allocator - Build your own memory allocator with clear ownership semantics.