Project 14: Str Module - Low-Level String Operations
Build a string module with position-based indexing (1-based, with negative indices from end) that operates on substrings without allocation.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3 - Advanced |
| Time Estimate | 4-5 days |
| Language | C |
| Prerequisites | Mem module, Assert module, Understanding of C strings |
| Key Topics | String processing, Memory efficiency, Position semantics, Zero-copy operations |
1. Learning Objectives
By completing this project, you will:
- Design safer string interfaces - Create APIs that prevent common C string errors
- Implement position-based indexing - Use 1-based indices with negative indices from end
- Master zero-copy operations - Return pointers into existing strings without allocation
- Handle string views - Understand the difference between owning and borrowing strings
- Bridge to C strings - Interoperate with standard null-terminated strings
Why Str Matters:
C strings are notoriously error-prone:
- Buffer overflows from
strcpy,sprintf - Off-by-one errors with 0-based indexing
- O(n) length calculation with
strlen - No substring support without copying
- Null terminator confusion
Hanson’s Str module addresses these problems by:
- Using 1-based, end-inclusive positions (like human counting)
- Supporting negative indices (like Python)
- Tracking length explicitly
- Returning views (pointers) instead of copies
- Clear ownership semantics
Real-World Applications:
- Parsers: Efficient substring extraction during lexing/parsing
- Protocol handlers: Extracting fields from packet payloads
- Text editors: Manipulating ranges of text efficiently
- Log analysis: Processing log entries without copying
2. Theoretical Foundation
2.1 Core Concepts
The Problems with C Strings
C String Problems
--------------------------------------------------------------------------------
Problem 1: Off-by-one errors with 0-based indexing
char s[] = "Hello"; // indices 0,1,2,3,4
s[5] = '!'; // Bug! Off the end
char c = s[5]; // What's at index 5? The null terminator!
// Which index is 'o'?
// Answer: 4 (not 5, because we count from 0)
// This catches EVERYONE sometimes
Problem 2: strlen is O(n)
char *s = malloc(1000000);
// ... fill with data ...
int len = strlen(s); // Walks ALL MILLION BYTES
// In a loop?
for (int i = 0; i < strlen(s); i++) // O(n^2) disaster!
process(s[i]);
Problem 3: No substring without copying
char s[] = "Hello, World!";
// I want just "World" - how?
char sub[6];
strncpy(sub, s + 7, 5); // Copy 5 chars
sub[5] = '\0'; // Don't forget null terminator!
// Memory allocated, copy made, error-prone
Problem 4: Buffer overflows
char buf[10];
strcpy(buf, "This string is way too long"); // BOOM
// No bounds checking, memory corruption
Problem 5: Null in data
char data[] = {0x48, 0x00, 0x65, 0x6c}; // Binary data with null
int len = strlen(data); // Returns 1, not 4!
// C strings can't contain null bytes
Position-Based Indexing
Hanson’s Str uses a different indexing scheme:
Str Position Semantics
--------------------------------------------------------------------------------
String: "Hello"
H e l l o
Position: 1 2 3 4 5 (positive: count from start, 1-based)
-5-4-3-2-1 (negative: count from end)
0 (position 0 = after last character)
Key insight: Positions are between characters, not on them
↓1 ↓2 ↓3 ↓4 ↓5 ↓0 (end)
| H | e | l | l | o |
↑-5 ↑-4 ↑-3 ↑-2 ↑-1 ↑
Str_sub(s, 1, 5) → "Hello" (positions 1 to 5)
Str_sub(s, 2, 4) → "ell" (positions 2 to 4)
Str_sub(s, -3, 0) → "llo" (position -3 to end)
Str_sub(s, 1, -1) → "Hell" (position 1 to second-to-last)
Why 1-based?
- Matches human counting: "first character", "fifth character"
- Position 0 has special meaning (end of string)
- Negative indices work more intuitively
Why negative indices?
- Access from end without knowing length
- s[-1] = last character (like Python)
- s[-2] = second to last
- Common operations become simpler
Comparison:
C (0-based): Str (1-based):
s[0] = first char Str_get(s, 1) = first char
s[n-1] = last char Str_get(s, -1) = last char
s[n] = null term position 0 = end marker
Zero-Copy Substring Operations
Zero-Copy Substring
--------------------------------------------------------------------------------
Traditional substring (copies):
char original[] = "Hello, World!";
char *sub = substring(original, 7, 5);
// sub is NEWLY ALLOCATED, points to copy of "World"
┌─────────────────────┐
│ H e l l o , W o r │ l d ! \0 original
└─────────────────────┘
┌─────────────┐
│ W o r l d \0│ sub (NEW MEMORY)
└─────────────┘
Str module (zero-copy):
char original[] = "Hello, World!";
int len;
char *sub = Str_sub(original, 8, 12, &len);
// sub points INTO original, len = 5
┌─────────────────────┐
│ H e l l o , W o r │ l d ! \0 original
└─────────────────────┘
↑
sub (points here, len=5)
Benefits:
- No allocation needed
- No copying
- Faster
- Less memory
Gotchas:
- sub is NOT null-terminated
- sub is only valid while original exists
- Must track length separately
- Can't modify without affecting original
2.2 Why This Matters
Understanding string manipulation is fundamental because:
- Security: Buffer overflows are still a top vulnerability class
- Performance: String operations are often on the critical path
- Correctness: Off-by-one errors plague every codebase
- Memory: Unnecessary copies waste RAM and cache
Industry Impact:
- Redis’s SDS (Simple Dynamic Strings) uses similar principles
- Rust’s string slices are the same concept
- Go’s strings track length separately from data
- Every modern language has moved away from null-terminated strings
2.3 Historical Context
C strings have been problematic since the beginning:
- 1970s: C inherits null-terminated strings from B and BCPL
- 1988: Morris Worm exploits buffer overflow in fingerd
- 1990s: Constant stream of buffer overflow vulnerabilities
- 2000s: Languages like Python, Java, Go abandon null-termination
- 2014: Heartbleed (OpenSSL) shows C string handling still dangerous
- Today: Rust’s borrow checker + string slices provide safety
Hanson’s Contribution: The Str module (1996) anticipated modern string slice/view concepts by decades. The position semantics and zero-copy approach influenced later designs.
2.4 Common Misconceptions
Misconception 1: “Negative indices are confusing”
- Reality: Once learned, they’re more intuitive for end-relative access. Python programmers use them constantly.
Misconception 2: “Zero-copy is dangerous because of lifetime issues”
- Reality: The danger exists with any pointer in C. Clear documentation and discipline solve this.
Misconception 3: “1-based indexing is old-fashioned”
- Reality: Many modern languages (Lua, Julia, MATLAB) use 1-based indexing. It matches human counting.
Misconception 4: “Str is just a wrapper around char*“
- Reality: Str provides a fundamentally different abstraction - length-tracked, position-semantics strings.
3. Project Specification
3.1 What You Will Build
A complete Str module implementing Hanson’s CII interface:
// str.h - The Interface
#ifndef STR_INCLUDED
#define STR_INCLUDED
#include <stddef.h>
/* Substring extraction - ZERO COPY
* Returns pointer into s, sets *len to substring length
* i, j are positions (1-based, negative from end, 0 = end)
*/
extern char *Str_sub (const char *s, int i, int j, int *len);
extern char *Str_dup (const char *s, int i, int j, int alloc);
/* Character operations */
extern int Str_chr (const char *s, int i, int j, int c);
extern int Str_rchr (const char *s, int i, int j, int c);
extern int Str_upto (const char *s, int i, int j, const char *set);
extern int Str_rupto(const char *s, int i, int j, const char *set);
extern int Str_find (const char *s, int i, int j, const char *str);
extern int Str_rfind(const char *s, int i, int j, const char *str);
/* Comparison */
extern int Str_cmp (const char *s1, int i1, int j1,
const char *s2, int i2, int j2);
/* Concatenation */
extern char *Str_cat (const char *s1, int i1, int j1,
const char *s2, int i2, int j2);
extern char *Str_catv (const char *s, int i, int j, ...);
/* Mapping */
extern char *Str_map (const char *s, int i, int j,
const char *from, const char *to);
/* String length */
extern int Str_len (const char *s, int i, int j);
/* Position conversion helpers */
extern int Str_pos (const char *s, int i); /* Convert to index */
#undef T
#endif
3.2 Functional Requirements
- Position semantics: 1-based, negative from end, 0 = end
- Str_sub(s, i, j, &len): Return pointer to substring, set len
- Str_dup(s, i, j, alloc): Return allocated copy (alloc = include null terminator)
- Str_chr/rchr: Find character in range (forward/reverse)
- Str_upto/rupto: Find first char from set in range
- Str_find/rfind: Find substring in range
- Str_cmp: Compare substrings
- Str_cat/catv: Concatenate substrings (allocates result)
- Str_map: Character mapping (like tr command)
- Str_len: Return length of substring
3.3 Non-Functional Requirements
- Zero-copy where possible: Str_sub returns pointer, not copy
- Explicit length: Never rely on null termination in internal operations
- Bounds checking: Assert on invalid positions
- Memory safety: Use Mem module for allocations
- Interoperability: Work with standard null-terminated C strings
3.4 Example Usage / Output
$ ./str_test
# Test 1: Position-based indexing (1-based!)
s = "Hello, World!"
Str_sub(s, 1, 5, &len) → "Hello" (len=5) # positions 1-5
Str_sub(s, 8, 0, &len) → "World!" (len=6) # position 8 to end
Str_sub(s, -6, 0, &len) → "World!" (len=6) # -6 = 6th from end
# Test 2: Negative indices
Str_sub(s, -1, -1, &len) → "!" (len=1) # last character
Str_sub(s, 1, -2, &len) → "Hello, World" (len=12) # all but last
# Test 3: No allocation - returns view
int len;
char *sub = Str_sub(s, 1, 5, &len);
# sub points into s, no malloc!
# len = 5
# IMPORTANT: sub is NOT null-terminated!
# Test 4: Character operations
Str_chr(s, 1, 0, 'o') → 5 # first 'o' at position 5
Str_rchr(s, 1, 0, 'o') → 9 # last 'o' at position 9
Str_chr(s, 1, 0, 'z') → 0 # 'z' not found (returns 0)
# Test 5: Substring search
Str_find(s, 1, 0, "World") → 8 # "World" starts at position 8
Str_find(s, 1, 0, "world") → 0 # case-sensitive, not found
# Test 6: Comparison
Str_cmp(s, 1, 5, t, 1, 5) → 0 # compare "Hello" == "Hello"
Str_cmp("abc", 1, 0, "abd", 1, 0) → -1 # "abc" < "abd"
# Test 7: Concatenation (allocates!)
char *result = Str_cat(s1, 1, 0, s2, 1, 0);
# result = s1 + s2, newly allocated
# caller must free
# Test 8: Character mapping
char *mapped = Str_map("Hello", 1, 0, "aeiou", "AEIOU");
# result = "HEllO" (vowels uppercased)
# Test 9: Allocated copy with null terminator
char *copy = Str_dup(s, 1, 5, 1); # alloc=1 means null-terminate
# copy = "Hello\0" (6 bytes allocated)
# Can be used with printf, etc.
4. Solution Architecture
4.1 High-Level Design
Str Module Architecture
--------------------------------------------------------------------------------
┌─────────────────────────────────────┐
│ str.h (Interface) │
│ │
│ Position-based API: │
│ - Positions are 1-based │
│ - Negative = from end │
│ - 0 = after last char │
│ │
│ Str_sub, Str_dup, Str_chr, etc. │
└─────────────────┬─────────────────────┘
│
┌─────────────────┴─────────────────────┐
│ str.c (Implementation) │
│ │
│ Core helper: convert(s, i, j) │
│ - Converts positions to indices │
│ - Handles negative values │
│ - Validates bounds │
│ │
│ Internal: │
│ - Works with (pointer, length) │
│ - No null-termination assumptions │
│ │
└───────────────────────────────────────┘
│
┌─────────────────┴─────────────────────┐
│ Dependencies │
│ │
│ mem.h ─────► ALLOC for Str_dup, etc │
│ assert.h ──► position validation │
└───────────────────────────────────────┘
4.2 Key Components
Position Conversion:
// Convert position i to 0-based index into string of length n
// Positions: 1 = first char, -1 = last char, 0 = past end
static int convert(int n, int i) {
if (i == 0)
return n;
if (i < 0)
return n + i + 1; // -1 → n, -2 → n-1, etc.
return i; // Positive: i → i (1-based to 1-based)
}
// For a string "Hello" (n=5):
// convert(5, 1) → 1 (first char)
// convert(5, 5) → 5 (last char)
// convert(5, 0) → 5 (past end, same as length)
// convert(5, -1) → 5 (last char)
// convert(5, -5) → 1 (first char)
4.3 Data Structures
Unlike most CII modules, Str doesn’t define a new opaque type. It operates on const char* strings directly, with lengths passed explicitly or derived.
The key insight is representing substrings as (pointer, length) pairs internally:
// Internal representation during processing
struct str_view {
const char *data; // Points into source string
int len; // Length of substring
};
// Example: substring "llo" of "Hello"
// data = original + 2 (points to 'l')
// len = 3
// This is NOT exposed to users - just used internally
// Users work with positions, not indices
4.4 Algorithm Overview
Position-to-Index Conversion:
convert(length, position):
if position == 0:
return length # Past end
if position < 0:
return length + position + 1
return position # 1-based, keep as-is
Then adjust for 0-based C indexing:
index = convert(length, position) - 1
Substring Extraction:
char *Str_sub(const char *s, int i, int j, int *len) {
int n = strlen(s);
// Convert positions to 1-based values
i = convert(n, i);
j = convert(n, j);
// Validate
assert(1 <= i && i <= n + 1);
assert(1 <= j && j <= n + 1);
assert(i <= j + 1); // Empty string OK when i == j + 1
// Set length
if (len)
*len = j - i + 1;
// Return pointer to start (convert to 0-based index)
return (char *)s + i - 1;
}
Character Search:
int Str_chr(const char *s, int i, int j, int c) {
int n = strlen(s);
i = convert(n, i);
j = convert(n, j);
assert(1 <= i && i <= n + 1);
assert(1 <= j && j <= n + 1);
assert(i <= j + 1);
// Search from position i to j
for (int k = i; k <= j; k++) {
if (s[k-1] == c) // k-1 for 0-based index
return k; // Return 1-based position
}
return 0; // Not found
}
5. Implementation Guide
5.1 Development Environment Setup
# Required tools
gcc --version # GCC with C11 support
make --version
valgrind --version
# Project structure
C_INTERFACES_AND_IMPLEMENTATIONS_MASTERY/
├── include/
│ ├── str.h
│ ├── mem.h
│ └── assert.h
├── src/
│ ├── str.c
│ ├── mem.c
│ └── assert.c
├── tests/
│ └── str_test.c
└── Makefile
# Compiler flags
CFLAGS = -Wall -Wextra -Wpedantic -std=c11 -g
5.2 Project Structure
str.h Interface (public API)
str.c Implementation
str_test.c Comprehensive test suite
5.3 The Core Question You’re Answering
“How do you design a string interface that prevents the off-by-one and buffer overflow errors that plague C string code?”
The answer involves:
- 1-based positions that match human counting
- Explicit length tracking
- Zero-copy where possible
- Clear assertions on invalid access
5.4 Concepts You Must Understand First
Before implementing, ensure you can answer:
- C string basics: What is
strlenactually doing? Why is it O(n)? - Pointer arithmetic: What does
s + 5mean when s ischar*? - 0-based vs 1-based: Why do most languages use 0-based indexing?
- String views: What’s the difference between
char*and a copied string?
5.5 Questions to Guide Your Design
- Validation: What happens if positions are out of range?
- Null termination: Does Str guarantee null-terminated results?
- Mutability: Can Str operations modify the source string?
- Integration: How do you convert between Str and char*?
- Empty strings: What do positions mean for an empty string?
5.6 Thinking Exercise
Trace through these operations on “Hello, World!” (13 chars):
String: H e l l o , W o r l d !
Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 (0-based C index)
Pos: 1 2 3 4 5 6 7 8 9 10 11 12 13 (1-based position)
NegPos:-13-12... -6-5-4-3-2-1 (negative position)
Convert these positions to 0-based indices:
- Position 1 → index ?
- Position 8 → index ?
- Position -1 → index ?
- Position -6 → index ?
- Position 0 → past end
Click to see solution
``` Position 1 → index 0 (H) Position 8 → index 7 (W) Position -1 → index 12 (!) Position -6 → index 7 (W) Position 0 → index 13 (past end, used as endpoint) Formula: - Positive i: index = i - 1 - Negative i: index = length + i - Zero: index = length For "Hello, World!" (n=13): - convert(13, 1) = 1, then index = 0 - convert(13, 8) = 8, then index = 7 - convert(13, -1) = 13 + (-1) + 1 = 13, then index = 12 - convert(13, -6) = 13 + (-6) + 1 = 8, then index = 7 - convert(13, 0) = 13, then index = 13 (past end) ```5.7 Hints in Layers
Hint 1 - Starting Point (Conceptual Direction):
The key function is position conversion. Get this right first:
// Hanson's convert function (from CII)
// Returns 1-based position given possibly negative input
static int convert(int n, int i) {
// n = string length
// i = position (1-based, or negative, or 0)
if (i == 0)
return n + 1; // Position 0 means "past end"
else if (i < 0)
return n + 1 + i; // -1 → n, -2 → n-1, etc.
else
return i; // Positive: keep as-is
}
Hint 2 - Next Level (More Specific Guidance):
Most functions follow this pattern:
int Str_something(const char *s, int i, int j, ...) {
int n = strlen(s); // Get length once
// Convert positions
i = convert(n, i);
j = convert(n, j);
// Validate: positions must be valid for string
assert(1 <= i && i <= n + 1);
assert(1 <= j && j <= n + 1);
// For substring operations: i must not exceed j+1
assert(i <= j + 1); // i == j+1 means empty substring
// Now work with 0-based indices: s[i-1] to s[j-1]
// ...
}
Hint 3 - Technical Details (Approach/Pseudocode):
For Str_sub (zero-copy substring):
char *Str_sub(const char *s, int i, int j, int *len) {
int n = strlen(s);
i = convert(n, i);
j = convert(n, j);
assert(s);
assert(1 <= i && i <= n + 1);
assert(1 <= j && j <= n + 1);
assert(i <= j + 1);
if (len)
*len = j - i + 1; // Could be 0 if i > j
// Return pointer into original string
// i is 1-based, so s + i - 1 gives 0-based start
return (char *)(s + i - 1);
}
For Str_dup (allocated copy):
char *Str_dup(const char *s, int i, int j, int alloc) {
int len;
char *sub = Str_sub(s, i, j, &len);
// Allocate: len bytes + 1 for null if alloc is true
char *copy = ALLOC(len + (alloc ? 1 : 0));
memcpy(copy, sub, len);
if (alloc)
copy[len] = '\0';
return copy;
}
Hint 4 - Tools/Debugging (Verification Methods):
Test edge cases carefully:
// Empty string
char *empty = "";
assert(Str_len(empty, 1, 0) == 0); // Position 1 to end
// Single character
char *one = "X";
assert(Str_chr(one, 1, 0, 'X') == 1); // Found at position 1
assert(Str_chr(one, 1, 0, 'Y') == 0); // Not found
// Position 0 as endpoint
assert(Str_len("Hello", 1, 0) == 5); // All chars
assert(Str_len("Hello", -1, 0) == 1); // Just last char
// Negative positions
assert(Str_chr("Hello", -3, 0, 'l') == 4); // Search last 3
5.8 The Interview Questions They’ll Ask
-
“Design a string library that prevents buffer overflows. What’s your approach?”
Expected Strong Answer: I’d track length separately from data (like Pascal strings, Rust’s &str, Go’s strings). Bounds checking on every access via assertions. Substring operations return views (pointer + length) rather than copies, eliminating allocation. Position-based API with validation prevents off-by-one errors. For output to C functions needing null-termination, provide explicit
to_cstr()that allocates and null-terminates. -
“Implement substring search. What algorithm would you use and why?”
Expected Strong Answer: For simple cases, naive O(nm) search is fine and has good cache behavior. For larger texts or repeated searches, I’d use Boyer-Moore (O(n/m) best case by skipping) or KMP (O(n+m) guaranteed). In practice, the naive algorithm with SIMD (like glibc’s memmem) is often fastest for typical string lengths. The choice depends on: pattern length, alphabet size, and whether pattern is reused.
-
“Compare string views (Rust/C++) to null-terminated strings. Trade-offs?”
Expected Strong Answer: String views store (pointer, length), null-terminated strings rely on sentinel. Views: O(1) length, can contain nulls, substring is free (just adjust pointer/length), but need explicit lifetime management. Null-terminated: compatible with C, self-describing size (sort of), but O(n) strlen, can’t contain nulls, substring requires copy or mutation. Modern languages overwhelmingly prefer views.
-
“How would you implement Unicode support in a string library?”
Expected Strong Answer: First, decide: UTF-8, UTF-16, or UTF-32 internally? UTF-8 is space-efficient and C-compatible but variable-width. Key insight: indices can mean bytes, code points, or grapheme clusters - each gives different results for position 5 in “cafe\u0301” (cafe with combining accent). I’d provide byte-indexed operations (fast) and optional grapheme-aware iteration. Rust does this with String (UTF-8 bytes) and .chars() iterator.
-
“What’s the complexity of your substring operation? Does it allocate?”
Expected Strong Answer: Str_sub is O(1) - just pointer arithmetic and validation. It does NOT allocate - returns a pointer into the original string. The caller receives (pointer, length) and must understand the returned string is not null-terminated and only valid while the source string exists. For a null-terminated copy, use Str_dup which is O(n) for the copy. This zero-copy approach is critical for parser performance.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Str implementation | “C Interfaces and Implementations” by Hanson | Ch. 15 |
| String algorithms | “Algorithms in C” by Sedgewick | String processing |
| Safe string handling | “Secure Coding in C and C++” by Seacord | Ch. 2 |
| Modern string design | “Programming Rust” by Blandy & Orendorff | String chapters |
| Efficient string operations | “The Practice of Programming” by Kernighan & Pike | Ch. 9 |
5.10 Implementation Phases
Phase 1: Position Conversion (Day 1)
- Implement convert() helper
- Test with positive, negative, and zero positions
- Verify edge cases (empty string, single char)
Phase 2: Core Operations (Day 2)
- Implement Str_sub (zero-copy substring)
- Implement Str_dup (allocated copy)
- Implement Str_len
- Test thoroughly
Phase 3: Search Operations (Day 3)
- Implement Str_chr (forward search for char)
- Implement Str_rchr (reverse search)
- Implement Str_find (substring search)
- Implement Str_rfind
Phase 4: Advanced Operations (Day 4)
- Implement Str_upto (search for char set)
- Implement Str_cmp (substring comparison)
- Implement Str_cat (concatenation)
- Implement Str_map (character mapping)
Phase 5: Integration & Polish (Day 5)
- Integration tests
- Memory leak testing
- Edge case coverage
- Documentation
5.11 Key Implementation Decisions
-
Position Conversion: Use Hanson’s convert() formula exactly - it’s been tested for decades
-
Zero Return for Not Found: Str_chr returns 0 when not found (since 0 is never a valid position)
-
Empty Substrings: Allow i > j to represent empty substring (length 0)
-
Const Correctness: Functions take
const char*but Str_sub returns non-const (C limitation) -
Allocation Policy: Str_sub never allocates; Str_dup, Str_cat always allocate
6. Testing Strategy
Unit Tests
void test_position_conversion(void) {
// Test convert helper indirectly through Str_sub
int len;
// Positive positions
char *s = "Hello";
assert(Str_sub(s, 1, 1, &len) == s); // First char
assert(len == 1);
assert(Str_sub(s, 1, 5, &len) == s); // Whole string
assert(len == 5);
// Negative positions
assert(Str_sub(s, -1, -1, &len) == s + 4); // Last char
assert(len == 1);
assert(Str_sub(s, -5, -1, &len) == s); // Whole string via negatives
assert(len == 5);
// Mixed
assert(Str_sub(s, 1, -1, &len) == s); // Start to last
assert(len == 5);
// Position 0 as end
assert(Str_sub(s, 1, 0, &len) == s);
assert(len == 5);
printf("test_position_conversion: PASSED\n");
}
void test_chr_operations(void) {
char *s = "Hello, World!";
// Forward search
assert(Str_chr(s, 1, 0, 'H') == 1);
assert(Str_chr(s, 1, 0, 'o') == 5); // First 'o'
assert(Str_chr(s, 1, 0, 'z') == 0); // Not found
assert(Str_chr(s, 6, 0, 'o') == 9); // 'o' in "World"
// Reverse search
assert(Str_rchr(s, 1, 0, 'o') == 9); // Last 'o'
assert(Str_rchr(s, 1, 5, 'o') == 5); // Last 'o' in "Hello"
printf("test_chr_operations: PASSED\n");
}
void test_substring_search(void) {
char *s = "Hello, World!";
assert(Str_find(s, 1, 0, "Hello") == 1);
assert(Str_find(s, 1, 0, "World") == 8);
assert(Str_find(s, 1, 0, "world") == 0); // Case-sensitive
assert(Str_find(s, 1, 0, "") == 1); // Empty matches at start
assert(Str_rfind(s, 1, 0, "o") == 9); // Last 'o'
printf("test_substring_search: PASSED\n");
}
Edge Cases
void test_edge_cases(void) {
int len;
// Empty string
char *empty = "";
assert(Str_len(empty, 1, 0) == 0);
assert(Str_chr(empty, 1, 0, 'x') == 0);
// Single character
char *one = "X";
assert(Str_len(one, 1, 0) == 1);
assert(Str_chr(one, 1, 0, 'X') == 1);
assert(Str_sub(one, 1, 1, &len) == one);
assert(len == 1);
// Empty substring (i > j after conversion)
char *s = "Hello";
assert(Str_len(s, 3, 2) == 0); // Position 3 to 2 = empty
// Boundary positions
assert(Str_sub(s, 5, 5, &len) == s + 4); // Just 'o'
assert(len == 1);
printf("test_edge_cases: PASSED\n");
}
Allocation Tests
void test_allocating_functions(void) {
char *s = "Hello, World!";
// Str_dup with null terminator
char *copy = Str_dup(s, 1, 5, 1); // "Hello\0"
assert(strcmp(copy, "Hello") == 0);
assert(strlen(copy) == 5);
FREE(copy);
// Str_cat
char *cat = Str_cat(s, 1, 5, s, 8, 0); // "Hello" + "World!"
assert(strcmp(cat, "HelloWorld!") == 0);
FREE(cat);
// Str_map
char *mapped = Str_map("Hello", 1, 0, "aeiou", "AEIOU");
assert(strcmp(mapped, "HEllO") == 0);
FREE(mapped);
printf("test_allocating_functions: PASSED\n");
}
7. Common Pitfalls & Debugging
Problem 1: “Str_sub returns wrong pointer”
- Symptom: Characters at wrong position
- Why: Off-by-one in position-to-index conversion
- Fix: Remember: position 1 = index 0, so
s + i - 1 - Test: Verify
Str_sub(s, 1, 1, &len)returns pointer to first char
Problem 2: “Negative indices don’t work correctly”
- Symptom: Position -1 gives wrong character
- Why: Wrong formula for negative conversion
- Fix: For length n and position i < 0: convert to
n + i + 1(1-based result) - Test: For “Hello”, position -1 should give ‘o’ (index 4)
Problem 3: “Str_find doesn’t find substring at start”
- Symptom: Searching for “Hello” in “Hello World” returns 0
- Why: Returning 0 instead of position 1
- Fix: Found at start means position 1, not 0. Return 0 only for not-found.
- Test:
Str_find("abc", 1, 0, "a")should return 1
Problem 4: “Crash when using Str_sub result with printf”
- Symptom: Garbage output or crash
- Why: Str_sub returns non-null-terminated pointer
- Fix: Use
%.*sformat or Str_dup with alloc=1 - Test:
printf("%.*s\n", len, Str_sub(s, i, j, &len))
Problem 5: “Position 0 doesn’t work as expected”
- Symptom:
Str_sub(s, 1, 0, &len)gives wrong length - Why: Position 0 should mean “end of string” (n+1), not index 0
- Fix: In convert(), handle i==0 specially: return n+1
- Test: For “Hello”,
Str_sub(s, 1, 0, &len)should have len=5
8. Extensions & Challenges
8.1 Case-Insensitive Operations
Add Str_ichr, Str_ifind, Str_icmp variants that ignore case.
8.2 Pattern Matching
Add Str_match with glob patterns: * matches any sequence, ? matches any character.
8.3 Trim Operations
Add Str_ltrim, Str_rtrim, Str_trim that return positions of first/last non-whitespace.
8.4 Split/Join
Add Str_split(s, delim) returning array of (position, length) pairs.
8.5 UTF-8 Awareness
Add Str_uchr that finds Unicode code points, not just bytes.
9. Real-World Connections
Redis Simple Dynamic Strings (SDS)
// Redis uses similar principles
struct sdshdr {
int len; // Length of string
int free; // Free bytes in buffer
char buf[]; // The data
};
// O(1) length, explicit tracking
// Hanson's Str predates SDS by ~10 years
Rust String Slices
// Rust's &str is exactly Str's philosophy
let s = "Hello, World!";
let hello: &str = &s[0..5]; // Zero-copy slice
// hello points into s, doesn't own memory
// Same concept as Str_sub returning pointer + length
Go Strings
// Go strings internally are (pointer, length)
type stringHeader struct {
Data uintptr
Len int
}
// Substrings share backing array
// Same zero-copy principle
Protocol Parsers
// Parsing HTTP headers
// "Content-Length: 42\r\n"
int pos = Str_chr(line, 1, 0, ':');
char *name = Str_sub(line, 1, pos - 1, &name_len);
char *value = Str_sub(line, pos + 2, -3, &val_len); // Skip ": " and "\r\n"
// Zero-copy extraction of fields
10. Resources
Primary References
- C Interfaces and Implementations by David Hanson, Chapter 15
- Official CII source: https://github.com/drh/cii
- Secure Coding in C and C++ by Robert Seacord
- Chapter on string handling security
Online Resources
- Redis SDS: https://redis.io/docs/reference/internals/sds/
- Rust String handling: https://doc.rust-lang.org/book/ch08-02-strings.html
- C string vulnerabilities: https://cwe.mitre.org/data/definitions/120.html
11. Self-Assessment Checklist
Before considering this project complete, verify:
- Position conversion handles positive, negative, and zero correctly
Str_subreturns pointer into original (no allocation)Str_dupallocates and optionally null-terminatesStr_chrfinds first occurrence, returns position (not index)Str_rchrfinds last occurrenceStr_findfinds substring, returns starting positionStr_cmpcompares substrings correctlyStr_catconcatenates and allocates resultStr_mapperforms character translation- Empty strings handled correctly
- Negative indices work as documented
- Position 0 means “end of string”
- No memory leaks
- Assertions catch invalid positions
12. Submission / Completion Criteria
Your Str module implementation is complete when:
- All tests pass: Position semantics, edge cases, allocations
- Memory-safe: No leaks (Str_dup/cat return allocated memory)
- Documented: Header explains position semantics clearly
- Follows CII conventions: Uses Mem module, assertions
- Zero-copy verification: Str_sub demonstrably doesn’t allocate
Deliverables:
str.h- Interface with position semantics documentationstr.c- Implementationstr_test.c- Comprehensive test suiteMakefile- Build configuration- Brief writeup explaining: (1) position conversion formula, (2) zero-copy approach, (3) how you handle C string interop