Project 2: Text Editor Buffer (Gap Buffer Implementation)

Sprint: 2 - Data & Invariants Difficulty: Advanced Time Estimate: 1-2 weeks Prerequisites: Project 1 (or equivalent dynamic array understanding)

Overview

What you’ll build: The core buffer data structure used by real text editors—a gap buffer that efficiently handles insertions and deletions at the cursor position.

Why it teaches Data & Invariants: A gap buffer has one of the most elegant invariant sets in computer science:

[text before cursor][.....gap.....][text after cursor]

The “gap” is uninitialized memory that moves with the cursor. You must track:

Where the gap starts
Where the gap ends
What’s “real text” vs “garbage in the gap”
When the buffer needs to grow

One wrong index and you’re reading garbage or corrupting memory.

The Core Question You’re Answering:

“How do you make text insertion O(1) when arrays require O(n) shifting, without using a linked list of characters?”

Learning Objectives

By the end of this project, you will be able to:

Implement a gap buffer with correct invariants
Handle memory movement using memmove (not memcpy) for overlapping regions
Represent “absence” safely (the gap contains garbage)
Design struct layout for performance
Debug buffer corruption using visualization
Understand aliasing dangers when pointers overlap

Theoretical Foundation

The Gap Buffer Concept

Text editing has a locality property—99% of edits happen at or near the cursor position. A gap buffer exploits this:

Gap Buffer Structure:
┌──────────────────────────────────────────────────────────────┐
│  "Hello World" with cursor after "Hello "                    │
│                                                              │
│  In memory:                                                  │
│  [H][e][l][l][o][ ][   GAP   ][W][o][r][l][d]               │
│   0  1  2  3  4  5  6 7 8 9 10 11 12 13 14 15               │
│                     ^         ^                              │
│                 gap_start   gap_end                          │
│                                                              │
│  The cursor is ALWAYS at gap_start                          │
└──────────────────────────────────────────────────────────────┘

The Invariants

INVARIANT 1: gap_start <= gap_end
  (The gap cannot have negative size)

INVARIANT 2: gap_end <= total_size
  (The gap end cannot exceed buffer size)

INVARIANT 3: buffer[0..gap_start-1] is valid text
  (Text before cursor is meaningful)

INVARIANT 4: buffer[gap_start..gap_end-1] is GARBAGE
  (NEVER read this memory!)

INVARIANT 5: buffer[gap_end..total_size-1] is valid text
  (Text after cursor is meaningful)

INVARIANT 6: cursor_position == gap_start
  (Cursor is always at start of gap)

Operations

Insert Character:

Before: [H][e][l][l][o][ ][GAP GAP GAP][W][o][r][l][d]
                         ^gap_start

Insert 'X':
        [H][e][l][l][o][ ][X][GAP GAP][W][o][r][l][d]
                            ^gap_start (incremented)

Time complexity: O(1) - just write to gap_start and increment

Delete Character (Backspace):

Before: [H][e][l][l][o][ ][GAP GAP GAP][W][o][r][l][d]
                         ^gap_start

Backspace:
        [H][e][l][l][o][GAP GAP GAP GAP][W][o][r][l][d]
                       ^gap_start (decremented)

Time complexity: O(1) - just decrement gap_start

Move Cursor Right:

Before: [H][e][l][l][o][ ][GAP GAP GAP][W][o][r][l][d]
                         ^gap_start    ^gap_end

Move right 1:
        [H][e][l][l][o][ ][W][GAP GAP GAP][o][r][l][d]
                            ^gap_start    ^gap_end

What happened:
  1. Copy buffer[gap_end] to buffer[gap_start]
  2. Increment both gap_start and gap_end

Time complexity: O(1) for single move, O(n) for jump to end

memmove vs memcpy

// DANGEROUS: memcpy doesn't handle overlapping regions
memcpy(dst, src, n);  // If dst and src overlap, UNDEFINED!

// SAFE: memmove handles overlapping regions
memmove(dst, src, n);  // Works even if regions overlap

// Gap buffer often has overlapping regions when moving gap
// ALWAYS use memmove for gap operations

Project Specification

Core API

// Create and destroy
GapBuffer* gap_create(size_t initial_capacity);
void gap_destroy(GapBuffer* buf);

// Basic operations
void gap_insert_char(GapBuffer* buf, char c);
void gap_delete_backward(GapBuffer* buf);  // Backspace
void gap_delete_forward(GapBuffer* buf);   // Delete

// Cursor movement
void gap_move_left(GapBuffer* buf);
void gap_move_right(GapBuffer* buf);
void gap_move_to(GapBuffer* buf, size_t position);
size_t gap_cursor_position(GapBuffer* buf);

// Content access
size_t gap_length(GapBuffer* buf);  // Text length (excludes gap)
char gap_char_at(GapBuffer* buf, size_t index);
void gap_get_text(GapBuffer* buf, char* out, size_t max_len);

// File I/O
void gap_load_file(GapBuffer* buf, const char* filename);
void gap_save_file(GapBuffer* buf, const char* filename);

// Debug
void gap_debug_print(GapBuffer* buf);
bool gap_check_invariants(GapBuffer* buf);

Expected Output

$ ./gapedit myfile.txt
GapEdit v1.0 - Gap Buffer Text Editor
Loaded: myfile.txt (1,247 bytes)
Buffer allocated: 2048 bytes (gap: 801 bytes)
----------------------------------------
Hello, World!
This is a test file.
█
----------------------------------------
[Ln 2, Col 1] | GAP: 801 bytes | BUF: 2048 bytes | Ctrl+D=debug, Ctrl+Q=quit

Debug Mode (Ctrl+D):

========== GAP BUFFER DEBUG VIEW ==========
Total buffer size: 2048
Gap start: 41
Gap end: 842
Gap size: 801 bytes
Text before gap: 41 bytes
Text after gap: 205 bytes

Memory layout (visual):
[Hello, World!\nThis is a test file.\n|........gap.........|remaining text]
                                     ^gap_start          ^gap_end

Invariants check:
✓ gap_start <= gap_end
✓ gap_end <= total_size
✓ buffer != NULL
✓ gap is at cursor position
==========================================

Solution Architecture

Data Structure

typedef struct {
    char *buffer;      // The actual text storage
    size_t gap_start;  // Start of gap (cursor position)
    size_t gap_end;    // End of gap (one past gap)
    size_t total_size; // Total allocated size
} GapBuffer;

Memory Layout

GapBuffer struct:
┌─────────────────────────────────────────────────────────────┐
│ buffer → points to heap-allocated memory                    │
│ gap_start = 6                                               │
│ gap_end = 11                                                │
│ total_size = 16                                             │
└─────────────────────────────────────────────────────────────┘
         │
         ▼
Heap memory:
┌─────────────────────────────────────────────────────────────┐
│[H][e][l][l][o][ ][?][?][?][?][?][W][o][r][l][d]             │
│ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15             │
│              ▲                  ▲                           │
│          gap_start          gap_end                         │
│              └── GARBAGE ──────┘                            │
└─────────────────────────────────────────────────────────────┘

Implementation Guide

Phase 1: Basic Structure (1 hour)

typedef struct {
    char *buffer;
    size_t gap_start;
    size_t gap_end;
    size_t total_size;
} GapBuffer;

GapBuffer* gap_create(size_t initial_capacity) {
    GapBuffer *buf = malloc(sizeof(GapBuffer));
    buf->buffer = malloc(initial_capacity);
    buf->gap_start = 0;
    buf->gap_end = initial_capacity;
    buf->total_size = initial_capacity;
    return buf;
}

void gap_destroy(GapBuffer* buf) {
    free(buf->buffer);
    free(buf);
}

size_t gap_length(GapBuffer* buf) {
    // Text length = total - gap size
    return buf->total_size - (buf->gap_end - buf->gap_start);
}

Phase 2: Insert and Delete (2 hours)

void gap_insert_char(GapBuffer* buf, char c) {
    // Check if gap is full
    if (buf->gap_start == buf->gap_end) {
        gap_expand(buf);  // Grow the buffer
    }

    buf->buffer[buf->gap_start] = c;
    buf->gap_start++;
}

void gap_delete_backward(GapBuffer* buf) {  // Backspace
    if (buf->gap_start > 0) {
        buf->gap_start--;
    }
}

void gap_delete_forward(GapBuffer* buf) {  // Delete
    if (buf->gap_end < buf->total_size) {
        buf->gap_end++;
    }
}

static void gap_expand(GapBuffer* buf) {
    size_t new_size = buf->total_size * 2;
    char *new_buffer = malloc(new_size);

    // Copy text before gap
    memcpy(new_buffer, buf->buffer, buf->gap_start);

    // Copy text after gap to end of new buffer
    size_t after_gap_size = buf->total_size - buf->gap_end;
    memcpy(new_buffer + new_size - after_gap_size,
           buf->buffer + buf->gap_end,
           after_gap_size);

    // Update gap_end to new position
    buf->gap_end = new_size - after_gap_size;
    buf->total_size = new_size;

    free(buf->buffer);
    buf->buffer = new_buffer;
}

Phase 3: Cursor Movement (2 hours)

void gap_move_right(GapBuffer* buf) {
    if (buf->gap_end >= buf->total_size) return;

    // Move one character from after gap to before gap
    buf->buffer[buf->gap_start] = buf->buffer[buf->gap_end];
    buf->gap_start++;
    buf->gap_end++;
}

void gap_move_left(GapBuffer* buf) {
    if (buf->gap_start == 0) return;

    // Move one character from before gap to after gap
    buf->gap_start--;
    buf->gap_end--;
    buf->buffer[buf->gap_end] = buf->buffer[buf->gap_start];
}

void gap_move_to(GapBuffer* buf, size_t position) {
    // Clamp to valid range
    size_t text_length = gap_length(buf);
    if (position > text_length) position = text_length;

    // Move gap to position
    while (buf->gap_start < position) {
        gap_move_right(buf);
    }
    while (buf->gap_start > position) {
        gap_move_left(buf);
    }
}

Phase 4: Content Access (1 hour)

char gap_char_at(GapBuffer* buf, size_t index) {
    size_t text_length = gap_length(buf);
    assert(index < text_length);

    if (index < buf->gap_start) {
        return buf->buffer[index];
    } else {
        // Skip over the gap
        return buf->buffer[buf->gap_end + (index - buf->gap_start)];
    }
}

void gap_get_text(GapBuffer* buf, char* out, size_t max_len) {
    size_t text_len = gap_length(buf);
    size_t copy_len = (text_len < max_len - 1) ? text_len : max_len - 1;

    size_t out_idx = 0;

    // Copy before gap
    size_t before_gap = (buf->gap_start < copy_len) ? buf->gap_start : copy_len;
    memcpy(out, buf->buffer, before_gap);
    out_idx = before_gap;

    // Copy after gap
    if (out_idx < copy_len) {
        size_t remaining = copy_len - out_idx;
        memcpy(out + out_idx, buf->buffer + buf->gap_end, remaining);
        out_idx += remaining;
    }

    out[out_idx] = '\0';
}

Phase 5: Invariant Checking (1 hour)

bool gap_check_invariants(GapBuffer* buf) {
    bool valid = true;

    if (buf->gap_start > buf->gap_end) {
        fprintf(stderr, "INVARIANT VIOLATION: gap_start > gap_end\n");
        valid = false;
    }

    if (buf->gap_end > buf->total_size) {
        fprintf(stderr, "INVARIANT VIOLATION: gap_end > total_size\n");
        valid = false;
    }

    if (buf->buffer == NULL && buf->total_size != 0) {
        fprintf(stderr, "INVARIANT VIOLATION: NULL buffer with non-zero size\n");
        valid = false;
    }

    return valid;
}

void gap_debug_print(GapBuffer* buf) {
    printf("=== Gap Buffer Debug ===\n");
    printf("Total size: %zu\n", buf->total_size);
    printf("Gap start: %zu\n", buf->gap_start);
    printf("Gap end: %zu\n", buf->gap_end);
    printf("Gap size: %zu\n", buf->gap_end - buf->gap_start);
    printf("Text length: %zu\n", gap_length(buf));

    printf("Content: [");
    for (size_t i = 0; i < buf->total_size; i++) {
        if (i == buf->gap_start) printf("|");
        if (i >= buf->gap_start && i < buf->gap_end) {
            printf(".");
        } else {
            printf("%c", buf->buffer[i]);
        }
        if (i == buf->gap_end - 1) printf("|");
    }
    printf("]\n");

    printf("Invariants: %s\n", gap_check_invariants(buf) ? "OK" : "VIOLATED");
}

Common Pitfalls

Pitfall 1: Using memcpy for Overlapping Regions

// WRONG: memcpy is undefined for overlapping regions
void gap_move_to_slow(GapBuffer* buf, size_t pos) {
    // This might overlap!
    memcpy(buf->buffer + pos, buf->buffer + buf->gap_start, size);
}

// CORRECT: memmove handles overlapping
void gap_move_to_slow(GapBuffer* buf, size_t pos) {
    memmove(buf->buffer + pos, buf->buffer + buf->gap_start, size);
}

Pitfall 2: Reading From the Gap

// WRONG: Reading garbage
for (size_t i = 0; i < buf->total_size; i++) {
    printf("%c", buf->buffer[i]);  // Includes gap garbage!
}

// CORRECT: Skip the gap
for (size_t i = 0; i < buf->gap_start; i++) {
    printf("%c", buf->buffer[i]);
}
for (size_t i = buf->gap_end; i < buf->total_size; i++) {
    printf("%c", buf->buffer[i]);
}

Pitfall 3: Off-by-One in Gap Boundaries

// WRONG: Incorrect gap_end update
buf->gap_end = new_size - after_gap_size - 1;  // Off by one!

// CORRECT: gap_end points one past the gap
buf->gap_end = new_size - after_gap_size;

Interview Preparation

Common Questions

“Explain how a gap buffer works in 30 seconds.”
- Keep a “gap” of unused space at cursor position
- Insertions fill the gap: O(1)
- Cursor movement shifts text across gap: O(n) worst case
- Most edits are local, so usually O(1)
“What’s the time complexity of inserting at the cursor in a gap buffer?”
- O(1) if gap has space
- O(n) if buffer must grow (amortized O(1))
“Why can’t you use memcpy to move the gap?”
- Source and destination may overlap
- memcpy has undefined behavior for overlapping regions
- Must use memmove
“When would a gap buffer be worse than a linked list?”
- Frequent jumps to distant positions (jump to end in 1MB file)
- Multiple cursor support (each needs its own gap)

Self-Assessment Checklist

Understanding

Can explain gap buffer invariants
Can trace insert/delete operations by hand
Understand memmove vs memcpy difference
Know when gap movement is O(1) vs O(n)

Implementation

Resources

Books

Topic	Book	Chapter
Gap Buffer Algorithm	“The Craft of Text Editing” by Craig Finseth	Chapter 6 (free online)
Memory Movement	“C Programming: A Modern Approach” by K.N. King	Chapter 17.7
Struct Layout	“Computer Systems: A Programmer’s Perspective”	Chapter 3.9

Next Project: P03: Memory Arena Allocator - Build your own memory allocator with clear ownership semantics.