Project 14: Str Module - Low-Level String Operations

Build a string module with position-based indexing (1-based, with negative indices from end) that operates on substrings without allocation.

Quick Reference

Attribute	Value
Difficulty	Level 3 - Advanced
Time Estimate	4-5 days
Language	C
Prerequisites	Mem module, Assert module, Understanding of C strings
Key Topics	String processing, Memory efficiency, Position semantics, Zero-copy operations

1. Learning Objectives

By completing this project, you will:

Design safer string interfaces - Create APIs that prevent common C string errors
Implement position-based indexing - Use 1-based indices with negative indices from end
Master zero-copy operations - Return pointers into existing strings without allocation
Handle string views - Understand the difference between owning and borrowing strings
Bridge to C strings - Interoperate with standard null-terminated strings

Why Str Matters:

C strings are notoriously error-prone:

Buffer overflows from strcpy, sprintf
Off-by-one errors with 0-based indexing
O(n) length calculation with strlen
No substring support without copying
Null terminator confusion

Hanson’s Str module addresses these problems by:

Using 1-based, end-inclusive positions (like human counting)
Supporting negative indices (like Python)
Tracking length explicitly
Returning views (pointers) instead of copies
Clear ownership semantics

Real-World Applications:

Parsers: Efficient substring extraction during lexing/parsing
Protocol handlers: Extracting fields from packet payloads
Text editors: Manipulating ranges of text efficiently
Log analysis: Processing log entries without copying

2. Theoretical Foundation

2.1 Core Concepts

The Problems with C Strings

C String Problems
--------------------------------------------------------------------------------

Problem 1: Off-by-one errors with 0-based indexing

    char s[] = "Hello";  // indices 0,1,2,3,4
    s[5] = '!';          // Bug! Off the end
    char c = s[5];       // What's at index 5? The null terminator!

    // Which index is 'o'?
    // Answer: 4 (not 5, because we count from 0)
    // This catches EVERYONE sometimes

Problem 2: strlen is O(n)

    char *s = malloc(1000000);
    // ... fill with data ...
    int len = strlen(s);  // Walks ALL MILLION BYTES

    // In a loop?
    for (int i = 0; i < strlen(s); i++)  // O(n^2) disaster!
        process(s[i]);

Problem 3: No substring without copying

    char s[] = "Hello, World!";
    // I want just "World" - how?

    char sub[6];
    strncpy(sub, s + 7, 5);  // Copy 5 chars
    sub[5] = '\0';           // Don't forget null terminator!
    // Memory allocated, copy made, error-prone

Problem 4: Buffer overflows

    char buf[10];
    strcpy(buf, "This string is way too long");  // BOOM
    // No bounds checking, memory corruption

Problem 5: Null in data

    char data[] = {0x48, 0x00, 0x65, 0x6c};  // Binary data with null
    int len = strlen(data);  // Returns 1, not 4!
    // C strings can't contain null bytes

Position-Based Indexing

Hanson’s Str uses a different indexing scheme:

Str Position Semantics
--------------------------------------------------------------------------------

String: "Hello"
         H e l l o
Position: 1 2 3 4 5    (positive: count from start, 1-based)
         -5-4-3-2-1    (negative: count from end)
          0            (position 0 = after last character)

Key insight: Positions are between characters, not on them

          ↓1  ↓2  ↓3  ↓4  ↓5  ↓0 (end)
          | H | e | l | l | o |
          ↑-5 ↑-4 ↑-3 ↑-2 ↑-1 ↑

Str_sub(s, 1, 5)  → "Hello"  (positions 1 to 5)
Str_sub(s, 2, 4)  → "ell"    (positions 2 to 4)
Str_sub(s, -3, 0) → "llo"    (position -3 to end)
Str_sub(s, 1, -1) → "Hell"   (position 1 to second-to-last)

Why 1-based?
- Matches human counting: "first character", "fifth character"
- Position 0 has special meaning (end of string)
- Negative indices work more intuitively

Why negative indices?
- Access from end without knowing length
- s[-1] = last character (like Python)
- s[-2] = second to last
- Common operations become simpler

Comparison:
    C (0-based):           Str (1-based):
    s[0] = first char      Str_get(s, 1) = first char
    s[n-1] = last char     Str_get(s, -1) = last char
    s[n] = null term       position 0 = end marker

Zero-Copy Substring Operations

Zero-Copy Substring
--------------------------------------------------------------------------------

Traditional substring (copies):

    char original[] = "Hello, World!";
    char *sub = substring(original, 7, 5);
    // sub is NEWLY ALLOCATED, points to copy of "World"

    ┌─────────────────────┐
    │ H e l l o ,   W o r │ l d ! \0    original
    └─────────────────────┘

    ┌─────────────┐
    │ W o r l d \0│                      sub (NEW MEMORY)
    └─────────────┘

Str module (zero-copy):

    char original[] = "Hello, World!";
    int len;
    char *sub = Str_sub(original, 8, 12, &len);
    // sub points INTO original, len = 5

    ┌─────────────────────┐
    │ H e l l o ,   W o r │ l d ! \0    original
    └─────────────────────┘
                    ↑
                    sub (points here, len=5)

Benefits:
- No allocation needed
- No copying
- Faster
- Less memory

Gotchas:
- sub is NOT null-terminated
- sub is only valid while original exists
- Must track length separately
- Can't modify without affecting original

2.2 Why This Matters

Understanding string manipulation is fundamental because:

Security: Buffer overflows are still a top vulnerability class
Performance: String operations are often on the critical path
Correctness: Off-by-one errors plague every codebase
Memory: Unnecessary copies waste RAM and cache

Industry Impact:

Redis’s SDS (Simple Dynamic Strings) uses similar principles
Rust’s string slices are the same concept
Go’s strings track length separately from data
Every modern language has moved away from null-terminated strings

2.3 Historical Context

C strings have been problematic since the beginning:

1970s: C inherits null-terminated strings from B and BCPL
1988: Morris Worm exploits buffer overflow in fingerd
1990s: Constant stream of buffer overflow vulnerabilities
2000s: Languages like Python, Java, Go abandon null-termination
2014: Heartbleed (OpenSSL) shows C string handling still dangerous
Today: Rust’s borrow checker + string slices provide safety

Hanson’s Contribution: The Str module (1996) anticipated modern string slice/view concepts by decades. The position semantics and zero-copy approach influenced later designs.

2.4 Common Misconceptions

Misconception 1: “Negative indices are confusing”

Reality: Once learned, they’re more intuitive for end-relative access. Python programmers use them constantly.

Misconception 2: “Zero-copy is dangerous because of lifetime issues”

Reality: The danger exists with any pointer in C. Clear documentation and discipline solve this.

Misconception 3: “1-based indexing is old-fashioned”

Reality: Many modern languages (Lua, Julia, MATLAB) use 1-based indexing. It matches human counting.

Misconception 4: “Str is just a wrapper around char*“

Reality: Str provides a fundamentally different abstraction - length-tracked, position-semantics strings.

3. Project Specification

3.1 What You Will Build

A complete Str module implementing Hanson’s CII interface:

// str.h - The Interface

#ifndef STR_INCLUDED
#define STR_INCLUDED

#include <stddef.h>

/* Substring extraction - ZERO COPY
 * Returns pointer into s, sets *len to substring length
 * i, j are positions (1-based, negative from end, 0 = end)
 */
extern char *Str_sub  (const char *s, int i, int j, int *len);
extern char *Str_dup  (const char *s, int i, int j, int alloc);

/* Character operations */
extern int   Str_chr  (const char *s, int i, int j, int c);
extern int   Str_rchr (const char *s, int i, int j, int c);
extern int   Str_upto (const char *s, int i, int j, const char *set);
extern int   Str_rupto(const char *s, int i, int j, const char *set);
extern int   Str_find (const char *s, int i, int j, const char *str);
extern int   Str_rfind(const char *s, int i, int j, const char *str);

/* Comparison */
extern int   Str_cmp  (const char *s1, int i1, int j1,
                       const char *s2, int i2, int j2);

/* Concatenation */
extern char *Str_cat  (const char *s1, int i1, int j1,
                       const char *s2, int i2, int j2);
extern char *Str_catv (const char *s, int i, int j, ...);

/* Mapping */
extern char *Str_map  (const char *s, int i, int j,
                       const char *from, const char *to);

/* String length */
extern int   Str_len  (const char *s, int i, int j);

/* Position conversion helpers */
extern int   Str_pos  (const char *s, int i);  /* Convert to index */

#undef T
#endif

3.2 Functional Requirements

Position semantics: 1-based, negative from end, 0 = end
Str_sub(s, i, j, &len): Return pointer to substring, set len
Str_dup(s, i, j, alloc): Return allocated copy (alloc = include null terminator)
Str_chr/rchr: Find character in range (forward/reverse)
Str_upto/rupto: Find first char from set in range
Str_find/rfind: Find substring in range
Str_cmp: Compare substrings
Str_cat/catv: Concatenate substrings (allocates result)
Str_map: Character mapping (like tr command)
Str_len: Return length of substring

3.3 Non-Functional Requirements

Zero-copy where possible: Str_sub returns pointer, not copy
Explicit length: Never rely on null termination in internal operations
Bounds checking: Assert on invalid positions
Memory safety: Use Mem module for allocations
Interoperability: Work with standard null-terminated C strings

3.4 Example Usage / Output

$ ./str_test

# Test 1: Position-based indexing (1-based!)
s = "Hello, World!"
Str_sub(s, 1, 5, &len) → "Hello" (len=5)     # positions 1-5
Str_sub(s, 8, 0, &len) → "World!" (len=6)    # position 8 to end
Str_sub(s, -6, 0, &len) → "World!" (len=6)   # -6 = 6th from end

# Test 2: Negative indices
Str_sub(s, -1, -1, &len) → "!" (len=1)       # last character
Str_sub(s, 1, -2, &len) → "Hello, World" (len=12)  # all but last

# Test 3: No allocation - returns view
int len;
char *sub = Str_sub(s, 1, 5, &len);
# sub points into s, no malloc!
# len = 5
# IMPORTANT: sub is NOT null-terminated!

# Test 4: Character operations
Str_chr(s, 1, 0, 'o') → 5      # first 'o' at position 5
Str_rchr(s, 1, 0, 'o') → 9     # last 'o' at position 9
Str_chr(s, 1, 0, 'z') → 0      # 'z' not found (returns 0)

# Test 5: Substring search
Str_find(s, 1, 0, "World") → 8  # "World" starts at position 8
Str_find(s, 1, 0, "world") → 0  # case-sensitive, not found

# Test 6: Comparison
Str_cmp(s, 1, 5, t, 1, 5) → 0   # compare "Hello" == "Hello"
Str_cmp("abc", 1, 0, "abd", 1, 0) → -1  # "abc" < "abd"

# Test 7: Concatenation (allocates!)
char *result = Str_cat(s1, 1, 0, s2, 1, 0);
# result = s1 + s2, newly allocated
# caller must free

# Test 8: Character mapping
char *mapped = Str_map("Hello", 1, 0, "aeiou", "AEIOU");
# result = "HEllO" (vowels uppercased)

# Test 9: Allocated copy with null terminator
char *copy = Str_dup(s, 1, 5, 1);  # alloc=1 means null-terminate
# copy = "Hello\0" (6 bytes allocated)
# Can be used with printf, etc.

4. Solution Architecture

4.1 High-Level Design

Str Module Architecture
--------------------------------------------------------------------------------

                    ┌─────────────────────────────────────┐
                    │           str.h (Interface)           │
                    │                                       │
                    │  Position-based API:                  │
                    │  - Positions are 1-based             │
                    │  - Negative = from end               │
                    │  - 0 = after last char               │
                    │                                       │
                    │  Str_sub, Str_dup, Str_chr, etc.     │
                    └─────────────────┬─────────────────────┘
                                      │
                    ┌─────────────────┴─────────────────────┐
                    │          str.c (Implementation)        │
                    │                                       │
                    │  Core helper: convert(s, i, j)        │
                    │  - Converts positions to indices      │
                    │  - Handles negative values            │
                    │  - Validates bounds                   │
                    │                                       │
                    │  Internal:                            │
                    │  - Works with (pointer, length)       │
                    │  - No null-termination assumptions    │
                    │                                       │
                    └───────────────────────────────────────┘
                                      │
                    ┌─────────────────┴─────────────────────┐
                    │              Dependencies             │
                    │                                       │
                    │  mem.h ─────► ALLOC for Str_dup, etc  │
                    │  assert.h ──► position validation     │
                    └───────────────────────────────────────┘

4.2 Key Components

Position Conversion:

// Convert position i to 0-based index into string of length n
// Positions: 1 = first char, -1 = last char, 0 = past end
static int convert(int n, int i) {
    if (i == 0)
        return n;
    if (i < 0)
        return n + i + 1;  // -1 → n, -2 → n-1, etc.
    return i;               // Positive: i → i (1-based to 1-based)
}

// For a string "Hello" (n=5):
// convert(5, 1)  → 1  (first char)
// convert(5, 5)  → 5  (last char)
// convert(5, 0)  → 5  (past end, same as length)
// convert(5, -1) → 5  (last char)
// convert(5, -5) → 1  (first char)

4.3 Data Structures

Unlike most CII modules, Str doesn’t define a new opaque type. It operates on const char* strings directly, with lengths passed explicitly or derived.

The key insight is representing substrings as (pointer, length) pairs internally:

// Internal representation during processing
struct str_view {
    const char *data;  // Points into source string
    int len;           // Length of substring
};

// Example: substring "llo" of "Hello"
// data = original + 2   (points to 'l')
// len = 3

// This is NOT exposed to users - just used internally
// Users work with positions, not indices

4.4 Algorithm Overview

Position-to-Index Conversion:

convert(length, position):
    if position == 0:
        return length        # Past end
    if position < 0:
        return length + position + 1
    return position          # 1-based, keep as-is

Then adjust for 0-based C indexing:
    index = convert(length, position) - 1

Substring Extraction:

char *Str_sub(const char *s, int i, int j, int *len) {
    int n = strlen(s);

    // Convert positions to 1-based values
    i = convert(n, i);
    j = convert(n, j);

    // Validate
    assert(1 <= i && i <= n + 1);
    assert(1 <= j && j <= n + 1);
    assert(i <= j + 1);  // Empty string OK when i == j + 1

    // Set length
    if (len)
        *len = j - i + 1;

    // Return pointer to start (convert to 0-based index)
    return (char *)s + i - 1;
}

Character Search:

int Str_chr(const char *s, int i, int j, int c) {
    int n = strlen(s);
    i = convert(n, i);
    j = convert(n, j);

    assert(1 <= i && i <= n + 1);
    assert(1 <= j && j <= n + 1);
    assert(i <= j + 1);

    // Search from position i to j
    for (int k = i; k <= j; k++) {
        if (s[k-1] == c)  // k-1 for 0-based index
            return k;      // Return 1-based position
    }
    return 0;  // Not found
}

5. Implementation Guide

5.1 Development Environment Setup

# Required tools
gcc --version          # GCC with C11 support
make --version
valgrind --version

# Project structure
C_INTERFACES_AND_IMPLEMENTATIONS_MASTERY/
├── include/
│   ├── str.h
│   ├── mem.h
│   └── assert.h
├── src/
│   ├── str.c
│   ├── mem.c
│   └── assert.c
├── tests/
│   └── str_test.c
└── Makefile

# Compiler flags
CFLAGS = -Wall -Wextra -Wpedantic -std=c11 -g

5.2 Project Structure

str.h          Interface (public API)
str.c          Implementation
str_test.c     Comprehensive test suite

5.3 The Core Question You’re Answering

“How do you design a string interface that prevents the off-by-one and buffer overflow errors that plague C string code?”

The answer involves:

1-based positions that match human counting
Explicit length tracking
Zero-copy where possible
Clear assertions on invalid access

5.4 Concepts You Must Understand First

Before implementing, ensure you can answer:

C string basics: What is strlen actually doing? Why is it O(n)?
Pointer arithmetic: What does s + 5 mean when s is char*?
0-based vs 1-based: Why do most languages use 0-based indexing?
String views: What’s the difference between char* and a copied string?

5.5 Questions to Guide Your Design

Validation: What happens if positions are out of range?
Null termination: Does Str guarantee null-terminated results?
Mutability: Can Str operations modify the source string?
Integration: How do you convert between Str and char*?
Empty strings: What do positions mean for an empty string?

5.6 Thinking Exercise

Trace through these operations on “Hello, World!” (13 chars):

String: H e l l o ,   W o r l  d  !
Index:  0 1 2 3 4 5 6 7 8 9 10 11 12  (0-based C index)
Pos:    1 2 3 4 5 6 7 8 9 10 11 12 13 (1-based position)
NegPos:-13-12... -6-5-4-3-2-1          (negative position)

Convert these positions to 0-based indices:

Position 1 → index ?
Position 8 → index ?
Position -1 → index ?
Position -6 → index ?
Position 0 → past end

Click to see solution

``` Position 1 → index 0 (H) Position 8 → index 7 (W) Position -1 → index 12 (!) Position -6 → index 7 (W) Position 0 → index 13 (past end, used as endpoint) Formula: - Positive i: index = i - 1 - Negative i: index = length + i - Zero: index = length For "Hello, World!" (n=13): - convert(13, 1) = 1, then index = 0 - convert(13, 8) = 8, then index = 7 - convert(13, -1) = 13 + (-1) + 1 = 13, then index = 12 - convert(13, -6) = 13 + (-6) + 1 = 8, then index = 7 - convert(13, 0) = 13, then index = 13 (past end) ```

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual Direction):

The key function is position conversion. Get this right first:

// Hanson's convert function (from CII)
// Returns 1-based position given possibly negative input
static int convert(int n, int i) {
    // n = string length
    // i = position (1-based, or negative, or 0)

    if (i == 0)
        return n + 1;  // Position 0 means "past end"
    else if (i < 0)
        return n + 1 + i;  // -1 → n, -2 → n-1, etc.
    else
        return i;  // Positive: keep as-is
}

Hint 2 - Next Level (More Specific Guidance):

Most functions follow this pattern:

int Str_something(const char *s, int i, int j, ...) {
    int n = strlen(s);  // Get length once

    // Convert positions
    i = convert(n, i);
    j = convert(n, j);

    // Validate: positions must be valid for string
    assert(1 <= i && i <= n + 1);
    assert(1 <= j && j <= n + 1);

    // For substring operations: i must not exceed j+1
    assert(i <= j + 1);  // i == j+1 means empty substring

    // Now work with 0-based indices: s[i-1] to s[j-1]
    // ...
}

Hint 3 - Technical Details (Approach/Pseudocode):

For Str_sub (zero-copy substring):

char *Str_sub(const char *s, int i, int j, int *len) {
    int n = strlen(s);
    i = convert(n, i);
    j = convert(n, j);

    assert(s);
    assert(1 <= i && i <= n + 1);
    assert(1 <= j && j <= n + 1);
    assert(i <= j + 1);

    if (len)
        *len = j - i + 1;  // Could be 0 if i > j

    // Return pointer into original string
    // i is 1-based, so s + i - 1 gives 0-based start
    return (char *)(s + i - 1);
}

For Str_dup (allocated copy):

char *Str_dup(const char *s, int i, int j, int alloc) {
    int len;
    char *sub = Str_sub(s, i, j, &len);

    // Allocate: len bytes + 1 for null if alloc is true
    char *copy = ALLOC(len + (alloc ? 1 : 0));
    memcpy(copy, sub, len);
    if (alloc)
        copy[len] = '\0';

    return copy;
}

Hint 4 - Tools/Debugging (Verification Methods):

Test edge cases carefully:

// Empty string
char *empty = "";
assert(Str_len(empty, 1, 0) == 0);  // Position 1 to end

// Single character
char *one = "X";
assert(Str_chr(one, 1, 0, 'X') == 1);   // Found at position 1
assert(Str_chr(one, 1, 0, 'Y') == 0);   // Not found

// Position 0 as endpoint
assert(Str_len("Hello", 1, 0) == 5);    // All chars
assert(Str_len("Hello", -1, 0) == 1);   // Just last char

// Negative positions
assert(Str_chr("Hello", -3, 0, 'l') == 4);  // Search last 3

5.8 The Interview Questions They’ll Ask

“Design a string library that prevents buffer overflows. What’s your approach?”

Expected Strong Answer: I’d track length separately from data (like Pascal strings, Rust’s &str, Go’s strings). Bounds checking on every access via assertions. Substring operations return views (pointer + length) rather than copies, eliminating allocation. Position-based API with validation prevents off-by-one errors. For output to C functions needing null-termination, provide explicit to_cstr() that allocates and null-terminates.
“Implement substring search. What algorithm would you use and why?”

Expected Strong Answer: For simple cases, naive O(nm) search is fine and has good cache behavior. For larger texts or repeated searches, I’d use Boyer-Moore (O(n/m) best case by skipping) or KMP (O(n+m) guaranteed). In practice, the naive algorithm with SIMD (like glibc’s memmem) is often fastest for typical string lengths. The choice depends on: pattern length, alphabet size, and whether pattern is reused.
“Compare string views (Rust/C++) to null-terminated strings. Trade-offs?”

Expected Strong Answer: String views store (pointer, length), null-terminated strings rely on sentinel. Views: O(1) length, can contain nulls, substring is free (just adjust pointer/length), but need explicit lifetime management. Null-terminated: compatible with C, self-describing size (sort of), but O(n) strlen, can’t contain nulls, substring requires copy or mutation. Modern languages overwhelmingly prefer views.
“How would you implement Unicode support in a string library?”

Expected Strong Answer: First, decide: UTF-8, UTF-16, or UTF-32 internally? UTF-8 is space-efficient and C-compatible but variable-width. Key insight: indices can mean bytes, code points, or grapheme clusters - each gives different results for position 5 in “cafe\u0301” (cafe with combining accent). I’d provide byte-indexed operations (fast) and optional grapheme-aware iteration. Rust does this with String (UTF-8 bytes) and .chars() iterator.
“What’s the complexity of your substring operation? Does it allocate?”

Expected Strong Answer: Str_sub is O(1) - just pointer arithmetic and validation. It does NOT allocate - returns a pointer into the original string. The caller receives (pointer, length) and must understand the returned string is not null-terminated and only valid while the source string exists. For a null-terminated copy, use Str_dup which is O(n) for the copy. This zero-copy approach is critical for parser performance.

5.9 Books That Will Help

Topic	Book	Chapter
Str implementation	“C Interfaces and Implementations” by Hanson	Ch. 15
String algorithms	“Algorithms in C” by Sedgewick	String processing
Safe string handling	“Secure Coding in C and C++” by Seacord	Ch. 2
Modern string design	“Programming Rust” by Blandy & Orendorff	String chapters
Efficient string operations	“The Practice of Programming” by Kernighan & Pike	Ch. 9

5.10 Implementation Phases

Phase 1: Position Conversion (Day 1)

Implement convert() helper
Test with positive, negative, and zero positions
Verify edge cases (empty string, single char)

Phase 2: Core Operations (Day 2)

Implement Str_sub (zero-copy substring)
Implement Str_dup (allocated copy)
Implement Str_len
Test thoroughly

Phase 3: Search Operations (Day 3)

Implement Str_chr (forward search for char)
Implement Str_rchr (reverse search)
Implement Str_find (substring search)
Implement Str_rfind

Phase 4: Advanced Operations (Day 4)

Implement Str_upto (search for char set)
Implement Str_cmp (substring comparison)
Implement Str_cat (concatenation)
Implement Str_map (character mapping)

Phase 5: Integration & Polish (Day 5)

Integration tests
Memory leak testing
Edge case coverage
Documentation

5.11 Key Implementation Decisions

Position Conversion: Use Hanson’s convert() formula exactly - it’s been tested for decades
Zero Return for Not Found: Str_chr returns 0 when not found (since 0 is never a valid position)
Empty Substrings: Allow i > j to represent empty substring (length 0)
Const Correctness: Functions take const char* but Str_sub returns non-const (C limitation)
Allocation Policy: Str_sub never allocates; Str_dup, Str_cat always allocate

6. Testing Strategy

Unit Tests

void test_position_conversion(void) {
    // Test convert helper indirectly through Str_sub
    int len;

    // Positive positions
    char *s = "Hello";
    assert(Str_sub(s, 1, 1, &len) == s);  // First char
    assert(len == 1);

    assert(Str_sub(s, 1, 5, &len) == s);  // Whole string
    assert(len == 5);

    // Negative positions
    assert(Str_sub(s, -1, -1, &len) == s + 4);  // Last char
    assert(len == 1);

    assert(Str_sub(s, -5, -1, &len) == s);  // Whole string via negatives
    assert(len == 5);

    // Mixed
    assert(Str_sub(s, 1, -1, &len) == s);  // Start to last
    assert(len == 5);

    // Position 0 as end
    assert(Str_sub(s, 1, 0, &len) == s);
    assert(len == 5);

    printf("test_position_conversion: PASSED\n");
}

void test_chr_operations(void) {
    char *s = "Hello, World!";

    // Forward search
    assert(Str_chr(s, 1, 0, 'H') == 1);
    assert(Str_chr(s, 1, 0, 'o') == 5);   // First 'o'
    assert(Str_chr(s, 1, 0, 'z') == 0);   // Not found
    assert(Str_chr(s, 6, 0, 'o') == 9);   // 'o' in "World"

    // Reverse search
    assert(Str_rchr(s, 1, 0, 'o') == 9);  // Last 'o'
    assert(Str_rchr(s, 1, 5, 'o') == 5);  // Last 'o' in "Hello"

    printf("test_chr_operations: PASSED\n");
}

void test_substring_search(void) {
    char *s = "Hello, World!";

    assert(Str_find(s, 1, 0, "Hello") == 1);
    assert(Str_find(s, 1, 0, "World") == 8);
    assert(Str_find(s, 1, 0, "world") == 0);  // Case-sensitive
    assert(Str_find(s, 1, 0, "") == 1);       // Empty matches at start

    assert(Str_rfind(s, 1, 0, "o") == 9);     // Last 'o'

    printf("test_substring_search: PASSED\n");
}

Edge Cases

void test_edge_cases(void) {
    int len;

    // Empty string
    char *empty = "";
    assert(Str_len(empty, 1, 0) == 0);
    assert(Str_chr(empty, 1, 0, 'x') == 0);

    // Single character
    char *one = "X";
    assert(Str_len(one, 1, 0) == 1);
    assert(Str_chr(one, 1, 0, 'X') == 1);
    assert(Str_sub(one, 1, 1, &len) == one);
    assert(len == 1);

    // Empty substring (i > j after conversion)
    char *s = "Hello";
    assert(Str_len(s, 3, 2) == 0);  // Position 3 to 2 = empty

    // Boundary positions
    assert(Str_sub(s, 5, 5, &len) == s + 4);  // Just 'o'
    assert(len == 1);

    printf("test_edge_cases: PASSED\n");
}

Allocation Tests

void test_allocating_functions(void) {
    char *s = "Hello, World!";

    // Str_dup with null terminator
    char *copy = Str_dup(s, 1, 5, 1);  // "Hello\0"
    assert(strcmp(copy, "Hello") == 0);
    assert(strlen(copy) == 5);
    FREE(copy);

    // Str_cat
    char *cat = Str_cat(s, 1, 5, s, 8, 0);  // "Hello" + "World!"
    assert(strcmp(cat, "HelloWorld!") == 0);
    FREE(cat);

    // Str_map
    char *mapped = Str_map("Hello", 1, 0, "aeiou", "AEIOU");
    assert(strcmp(mapped, "HEllO") == 0);
    FREE(mapped);

    printf("test_allocating_functions: PASSED\n");
}

7. Common Pitfalls & Debugging

Problem 1: “Str_sub returns wrong pointer”

Symptom: Characters at wrong position
Why: Off-by-one in position-to-index conversion
Fix: Remember: position 1 = index 0, so s + i - 1
Test: Verify Str_sub(s, 1, 1, &len) returns pointer to first char

Problem 2: “Negative indices don’t work correctly”

Symptom: Position -1 gives wrong character
Why: Wrong formula for negative conversion
Fix: For length n and position i < 0: convert to n + i + 1 (1-based result)
Test: For “Hello”, position -1 should give ‘o’ (index 4)

Problem 3: “Str_find doesn’t find substring at start”

Symptom: Searching for “Hello” in “Hello World” returns 0
Why: Returning 0 instead of position 1
Fix: Found at start means position 1, not 0. Return 0 only for not-found.
Test: Str_find("abc", 1, 0, "a") should return 1

Problem 4: “Crash when using Str_sub result with printf”

Symptom: Garbage output or crash
Why: Str_sub returns non-null-terminated pointer
Fix: Use %.*s format or Str_dup with alloc=1
Test: printf("%.*s\n", len, Str_sub(s, i, j, &len))

Problem 5: “Position 0 doesn’t work as expected”

Symptom: Str_sub(s, 1, 0, &len) gives wrong length
Why: Position 0 should mean “end of string” (n+1), not index 0
Fix: In convert(), handle i==0 specially: return n+1
Test: For “Hello”, Str_sub(s, 1, 0, &len) should have len=5

8. Extensions & Challenges

8.1 Case-Insensitive Operations

Add Str_ichr, Str_ifind, Str_icmp variants that ignore case.

8.2 Pattern Matching

Add Str_match with glob patterns: * matches any sequence, ? matches any character.

8.3 Trim Operations

Add Str_ltrim, Str_rtrim, Str_trim that return positions of first/last non-whitespace.

8.4 Split/Join

Add Str_split(s, delim) returning array of (position, length) pairs.

8.5 UTF-8 Awareness

Add Str_uchr that finds Unicode code points, not just bytes.

9. Real-World Connections

Redis Simple Dynamic Strings (SDS)

// Redis uses similar principles
struct sdshdr {
    int len;        // Length of string
    int free;       // Free bytes in buffer
    char buf[];     // The data
};

// O(1) length, explicit tracking
// Hanson's Str predates SDS by ~10 years

Rust String Slices

// Rust's &str is exactly Str's philosophy
let s = "Hello, World!";
let hello: &str = &s[0..5];  // Zero-copy slice
// hello points into s, doesn't own memory
// Same concept as Str_sub returning pointer + length

Go Strings

// Go strings internally are (pointer, length)
type stringHeader struct {
    Data uintptr
    Len  int
}
// Substrings share backing array
// Same zero-copy principle

Protocol Parsers

// Parsing HTTP headers
// "Content-Length: 42\r\n"
int pos = Str_chr(line, 1, 0, ':');
char *name = Str_sub(line, 1, pos - 1, &name_len);
char *value = Str_sub(line, pos + 2, -3, &val_len);  // Skip ": " and "\r\n"
// Zero-copy extraction of fields

10. Resources

Primary References

C Interfaces and Implementations by David Hanson, Chapter 15
- Official CII source: https://github.com/drh/cii
Secure Coding in C and C++ by Robert Seacord
- Chapter on string handling security

Online Resources

Redis SDS: https://redis.io/docs/reference/internals/sds/
Rust String handling: https://doc.rust-lang.org/book/ch08-02-strings.html
C string vulnerabilities: https://cwe.mitre.org/data/definitions/120.html

11. Self-Assessment Checklist

Before considering this project complete, verify:

12. Submission / Completion Criteria

Your Str module implementation is complete when:

All tests pass: Position semantics, edge cases, allocations
Memory-safe: No leaks (Str_dup/cat return allocated memory)
Documented: Header explains position semantics clearly
Follows CII conventions: Uses Mem module, assertions
Zero-copy verification: Str_sub demonstrably doesn’t allocate

Deliverables:

str.h - Interface with position semantics documentation
str.c - Implementation
str_test.c - Comprehensive test suite
Makefile - Build configuration
Brief writeup explaining: (1) position conversion formula, (2) zero-copy approach, (3) how you handle C string interop