Project 13: Safe String Library
Build a bounds-checked string library that prevents buffer overflows, one of the most frequently exploited vulnerability classes in the history of C programming.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (strict C11/C17) |
| Difficulty | Level 4 (Expert) |
| Time | 2 Weeks |
| Key Concepts | Buffer safety, bounds checking, secure coding |
| Prerequisites | P01-P12, pointer arithmetic mastery |
| Portfolio Value | High - demonstrates security awareness |
Learning Objectives
By completing this project, you will:
- Understand buffer overflow vulnerabilities: Know exactly how strcpy, sprintf, and gets lead to exploits and why they remain dangerous despite being well known
- Master bounded string operations: Implement strlcpy/strlcat semantics that guarantee null termination and prevent buffer overflows
- Design length-prefixed strings: Understand why modern languages abandoned null-terminated strings and implement a safer alternative
- Apply CERT C Secure Coding guidelines: Follow industry standards for secure C programming (SEI CERT C Coding Standard)
- Implement capacity tracking: Build strings that know their own size and refuse dangerous operations
- Create defensive APIs: Design function interfaces that make misuse difficult or impossible
- Recognize exploit patterns: Identify code vulnerable to format string attacks, integer overflow in length calculations, and off-by-one errors
Theoretical Foundation
Why C Strings Are Dangerous
C inherited null-terminated strings from its predecessors, a decision that has caused billions of dollars in security damage. The core problems are:
THE NULL-TERMINATED STRING PROBLEM
═══════════════════════════════════════════════════════════════
char buffer[10];
strcpy(buffer, user_input); // DISASTER WAITING TO HAPPEN
If user_input is "Hello":
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ H │ e │ l │ l │ o │\0 │ ? │ ? │ ? │ ? │ ← OK, fits
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
If user_input is "ThisIsWayTooLong":
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐───┬───┬───┬───┬───┬───┬───┐
│ T │ h │ i │ s │ I │ s │ W │ a │ y │ T │ o │ o │ L │ o │ n │ g │\0 │
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘───┴───┴───┴───┴───┴───┴───┘
buffer boundary ───────────────┘ │
└── OVERWRITES STACK!
What gets overwritten?
┌─────────────────────────────────────────────────────────────────┐
│ LOW ADDRESS HIGH ADDRESS │
│ ┌──────────┬──────────┬──────────┬──────────┬──────────────┐ │
│ │ buffer │ padding │saved EBP │ ret addr │ arguments │ │
│ │ [10] │ │ │ │ │ │
│ └──────────┴──────────┴──────────┴──────────┴──────────────┘ │
│ ▲ │ │ │
│ │ │ └── Attacker controls │
│ Overflow starts here ───────┘ where execution │
│ returns! │
└─────────────────────────────────────────────────────────────────┘
The Dangerous Standard Library Functions
┌────────────────────────────────────────────────────────────────────────────┐
│ DANGEROUS C STRING FUNCTIONS │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ NEVER USE THESE: │
│ ───────────────── │
│ gets() │ Cannot be used safely - REMOVED in C11 │
│ strcpy() │ No bounds checking, overflows whenever src is longer than dst │
│ strcat() │ No bounds checking, concatenation overflow │
│ sprintf() │ No size limit, format string + overflow vulnerabilities │
│ vsprintf() │ Same as sprintf with va_list │
│ scanf("%s") │ No field width limit, classic overflow │
│ │
│ BETTER BUT STILL PROBLEMATIC: │
│ ──────────────────────────── │
│ strncpy() │ May not null-terminate! Wastes time zeroing. │
│ strncat() │ Size parameter is confusing (remaining space, not total) │
│ snprintf() │ Safe if used correctly, but return value often ignored │
│ │
│ THE BSD SOLUTION (recommended): │
│ ──────────────────────────────── │
│ strlcpy() │ Always null-terminates, returns total length needed │
│ strlcat() │ Always null-terminates, returns total length needed │
│ │
│ THE MICROSOFT SOLUTION: │
│ ───────────────────────── │
│ strcpy_s() │ Returns error code, invokes handler on overflow │
│ strcat_s() │ Part of C11 Annex K (optional) │
│ │
└────────────────────────────────────────────────────────────────────────────┘
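The remark that snprintf is only safe when its return value is checked can be made concrete. The following is a minimal sketch, assuming a hypothetical helper named `format_user` (it is not part of the project API): a negative return is treated as an error, and a return value greater than or equal to the buffer size is treated as truncation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "name:id" into dst.
 * Returns true only if the whole result fit, terminator included. */
bool format_user(char *dst, size_t dstsize, const char *name, int id) {
    int n = snprintf(dst, dstsize, "%s:%d", name, id);
    if (n < 0) {
        return false;   /* encoding or output error */
    }
    /* snprintf returns the length it wanted to write, excluding '\0'. */
    return (size_t)n < dstsize;
}
```

Callers that ignore the boolean get a silently truncated (but still terminated) string, which is exactly the failure mode the table warns about.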
The strncpy Problem
Many developers think strncpy is safe. It is not:
┌────────────────────────────────────────────────────────────────────────────┐
│ WHY strncpy IS NOT THE ANSWER │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ Problem 1: No null termination guarantee │
│ ────────────────────────────────────────── │
│ │
│ char dest[5]; │
│ strncpy(dest, "Hello World", 5); │
│ │
│ Result: │
│ ┌───┬───┬───┬───┬───┐ │
│ │ H │ e │ l │ l │ o │ ← NO NULL TERMINATOR! │
│ └───┴───┴───┴───┴───┘ │
│ │
│ strlen(dest) = UNDEFINED (reads past buffer until random \0) │
│ printf("%s", dest) = UNDEFINED (prints garbage) │
│ │
│ Problem 2: Wasteful padding │
│ ─────────────────────────── │
│ │
│ char dest[1000]; │
│ strncpy(dest, "Hi", 1000); │
│ │
│ Result: strncpy writes "Hi" followed by 998 null bytes │
│ This was designed for fixed-width database fields, not security! │
│ │
│ The Fix (manual, error-prone): │
│ ─────────────────────────────── │
│ │
│ strncpy(dest, src, sizeof(dest) - 1); │
│ dest[sizeof(dest) - 1] = '\0'; // Must remember this! │
│ │
└────────────────────────────────────────────────────────────────────────────┘
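Both strncpy problems can be observed directly. The sketch below assumes two hypothetical helpers: `is_terminated` checks whether a buffer contains a terminator at all, and `copy_terminated` wraps the manual fix shown above while also reporting truncation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf contains a '\0' somewhere in its first size bytes. */
bool is_terminated(const char *buf, size_t size) {
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical wrapper: strncpy plus the manual termination fix.
 * Returns true if src fit completely, false if it was truncated
 * or if there was no room for even a terminator. */
bool copy_terminated(char *dst, size_t dstsize, const char *src) {
    if (dstsize == 0) {
        return false;
    }
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';       /* the step that is easy to forget */
    return strlen(src) < dstsize;
}
```

Running `is_terminated` on a 5-byte buffer after `strncpy(buf, "Hello World", 5)` returns false, which is Problem 1 in a form a unit test can assert.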
The strlcpy/strlcat Solution (BSD)
OpenBSD introduced strlcpy and strlcat in 1998. They are safer by design:
┌────────────────────────────────────────────────────────────────────────────┐
│ strlcpy/strlcat SEMANTICS │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ size_t strlcpy(char *dst, const char *src, size_t dstsize); │
│ size_t strlcat(char *dst, const char *src, size_t dstsize); │
│ │
│ GUARANTEES: │
│ ─────────── │
│ 1. dst is ALWAYS null-terminated (if dstsize > 0) │
│ 2. Never writes more than dstsize bytes total │
│ 3. Returns strlen(src) for strlcpy, initial strlen(dst)+strlen(src) for strlcat │
│ 4. If return value >= dstsize, truncation occurred │
│ │
│ TRUNCATION DETECTION: │
│ ───────────────────── │
│ │
│ char buf[10]; │
│ if (strlcpy(buf, "Hello World", sizeof(buf)) >= sizeof(buf)) { │
│ // Truncation! "Hello Wor\0" stored, needed 12 bytes │
│ handle_error(); │
│ } │
│ │
│ CONCATENATION: │
│ ────────────── │
│ │
│ char path[PATH_MAX]; │
│ if (strlcpy(path, dir, sizeof(path)) >= sizeof(path)) │
│ goto toolong; │
│ if (strlcat(path, "/", sizeof(path)) >= sizeof(path)) │
│ goto toolong; │
│ if (strlcat(path, file, sizeof(path)) >= sizeof(path)) │
│ goto toolong; │
│ │
└────────────────────────────────────────────────────────────────────────────┘
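Because the return value is the length the full result needs, it can drive a retry with a correctly sized buffer. The sketch below is one way to use that property; `demo_strlcpy` is a stand-in with strlcpy semantics for platforms whose libc lacks the function, and `copy_grow` is a hypothetical helper, not part of the project API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in with strlcpy semantics (assumed name, for portability). */
size_t demo_strlcpy(char *dst, const char *src, size_t dstsize) {
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Try a fixed buffer first; on truncation, allocate exactly enough.
 * Returns fixedbuf, a malloc'd copy (caller frees), or NULL on failure. */
char *copy_grow(char *fixedbuf, size_t fixedsize, const char *src) {
    size_t need = demo_strlcpy(fixedbuf, src, fixedsize);
    if (need < fixedsize) {
        return fixedbuf;            /* fit, no truncation */
    }
    char *heap = malloc(need + 1);  /* need + 1 cannot wrap for a real string */
    if (heap == NULL) {
        return NULL;
    }
    demo_strlcpy(heap, src, need + 1);
    return heap;
}
```

The caller frees the result only when it differs from the fixed buffer it passed in.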
Length-Prefixed Strings
An alternative approach used by Pascal, BASIC, and modern languages:
┌────────────────────────────────────────────────────────────────────────────┐
│ LENGTH-PREFIXED STRINGS │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ C String (null-terminated): │
│ ┌───┬───┬───┬───┬───┬───┐ │
│ │ H │ e │ l │ l │ o │\0 │ │
│ └───┴───┴───┴───┴───┴───┘ │
│ Problems: strlen is O(n), no embedded nulls, overflow prone │
│ │
│ Length-prefixed (Pascal style, 1 byte length): │
│ ┌───┬───┬───┬───┬───┬───┐ │
│ │ 5 │ H │ e │ l │ l │ o │ │
│ └───┴───┴───┴───┴───┴───┘ │
│ Problems: Max length 255, not usable with C string functions as-is │
│ │
│ Capacity-tracked (your implementation): │
│ ┌──────────┬──────────┬───────────────────────────────────┐ │
│ │ capacity │ length │ data │ │
│ │ size_t │ size_t │ char[capacity] │ │
│ └──────────┴──────────┴───────────────────────────────────┘ │
│ │
│ typedef struct { │
│ size_t capacity; // Total allocated size (including \0) │
│ size_t length; // Current string length (not including \0) │
│ char data[]; // Flexible array member │
│ } SafeString; │
│ │
│ ADVANTAGES: │
│ ─────────── │
│ • strlen is O(1) │
│ • Every write can be checked against the known capacity │
│ • Can contain embedded null bytes (with length-based APIs) │
│ • Clear memory ownership semantics │
│ │
│ DISADVANTAGES: │
│ ───────────── │
│ • Extra memory overhead (2 * sizeof(size_t)) │
│ • Not compatible with C string APIs without extraction │
│ • Requires heap allocation for most operations │
│ │
└────────────────────────────────────────────────────────────────────────────┘
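The capacity-tracked layout can be sketched as a single allocation holding the header and the characters. `DemoString`, `demo_new`, and `demo_len` are hypothetical names used only for illustration; the real SafeString adds a growth policy on top of this.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo type mirroring the layout above. */
typedef struct {
    size_t capacity;   /* bytes available in data, including '\0' */
    size_t length;     /* bytes in use, excluding '\0' */
    char   data[];     /* flexible array member (C99 and later) */
} DemoString;

/* Allocate the header and the character storage in one block. */
DemoString *demo_new(const char *text) {
    size_t len = strlen(text);
    if (len > SIZE_MAX - sizeof(DemoString) - 1) {
        return NULL;   /* header + len + 1 would wrap */
    }
    DemoString *s = malloc(sizeof(DemoString) + len + 1);
    if (s == NULL) {
        return NULL;
    }
    s->capacity = len + 1;
    s->length = len;
    memcpy(s->data, text, len + 1);
    return s;
}

/* O(1): reads the stored length instead of scanning for '\0'. */
size_t demo_len(const DemoString *s) {
    return s->length;
}
```

A single `free(s)` releases both the header and the data, which is the main attraction of the flexible array member over a separate data pointer.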
Integer Overflow in Length Calculations
A subtle but critical vulnerability:
┌────────────────────────────────────────────────────────────────────────────┐
│ INTEGER OVERFLOW VULNERABILITIES │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ VULNERABLE CODE: │
│ ───────────────── │
│ │
│ void concat(char *dst, size_t dstsize, │
│ const char *s1, const char *s2) { │
│ size_t len1 = strlen(s1); │
│ size_t len2 = strlen(s2); │
│ size_t total = len1 + len2 + 1; // OVERFLOW POSSIBLE! │
│ │
│ if (total <= dstsize) { // Comparison passes if overflow! │
│ strcpy(dst, s1); │
│ strcat(dst, s2); │
│ } │
│ } │
│ │
│ Attack: s1 is SIZE_MAX-10 chars, s2 is 20 chars │
│ len1 + len2 + 1 wraps to small positive number │
│ Check passes, strcpy overflows │
│ │
│ SAFE VERSION: │
│ ───────────── │
│ │
│ bool safe_add(size_t a, size_t b, size_t *result) { │
│ if (a > SIZE_MAX - b) return false; // Would overflow │
│ *result = a + b; │
│ return true; │
│ } │
│ │
│ void concat_safe(char *dst, size_t dstsize, │
│ const char *s1, const char *s2) { │
│ size_t len1 = strlen(s1); │
│ size_t len2 = strlen(s2); │
│ size_t total; │
│ │
│ if (!safe_add(len1, len2, &total)) return; // Overflow │
│ if (!safe_add(total, 1, &total)) return; // Overflow │
│ if (total > dstsize) return; // Doesn't fit │
│ │
│ memcpy(dst, s1, len1); │
│ memcpy(dst + len1, s2, len2 + 1); │
│ } │
│ │
└────────────────────────────────────────────────────────────────────────────┘
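The wraparound in the attack scenario can be reproduced with plain numbers, without allocating huge strings. This sketch uses two hypothetical functions: `naive_total` performs the unchecked sum, and `checked_total` folds the two safe_add steps into one call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked: size_t arithmetic wraps modulo SIZE_MAX + 1. */
size_t naive_total(size_t len1, size_t len2) {
    return len1 + len2 + 1;
}

/* Checked: reports failure instead of wrapping. */
bool checked_total(size_t len1, size_t len2, size_t *total) {
    if (len1 > SIZE_MAX - len2) {
        return false;              /* len1 + len2 would wrap */
    }
    size_t sum = len1 + len2;
    if (sum == SIZE_MAX) {
        return false;              /* no room left for the terminator */
    }
    *total = sum + 1;
    return true;
}
```

With the values from the attack description, `naive_total(SIZE_MAX - 10, 20)` yields 10, small enough to pass almost any `total <= dstsize` check, while `checked_total` rejects the same inputs.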
Format String Vulnerabilities
Related to string handling, format strings are another attack vector:
┌────────────────────────────────────────────────────────────────────────────┐
│ FORMAT STRING ATTACKS │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ VULNERABLE: │
│ ─────────── │
│ printf(user_input); // User controls format string! │
│ │
│ ATTACKS: │
│ ──────── │
│ Input: "%s%s%s%s%s" → Crashes (reads from stack) │
│ Input: "%x%x%x%x" → Leaks stack data │
│ Input: "%n" → Writes to memory! (number of chars written) │
│ │
│ How %n works: │
│ int count; │
│ printf("Hello%n", &count); // count = 5 │
│ │
│ Attack: Control stack to make %n write to chosen address │
│ │
│ SAFE: │
│ ───── │
│ printf("%s", user_input); // User input is data, not format │
│ fputs(user_input, stdout); // No format interpretation │
│ │
│ RULE: Never pass user input as format string │
│ │
└────────────────────────────────────────────────────────────────────────────┘
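The rule is easiest to follow when the format string is fixed inside a wrapper and callers can only supply data. A minimal sketch, assuming a hypothetical helper `format_log_line` (the "[log] " prefix is an illustration, not a project requirement):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: msg is always treated as data, never as a format.
 * Returns what snprintf returns: the untruncated length, or negative on error. */
int format_log_line(char *dst, size_t dstsize, const char *msg) {
    /* The format is a string literal, so any %n or %s inside msg is inert. */
    return snprintf(dst, dstsize, "[log] %s", msg);
}
```

Passing `"%s%n%x"` as `msg` simply copies those six characters into the output. Compiling with GCC or Clang and `-Wformat -Wformat-security` also flags calls such as `printf(user_input)`.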
Project Specification
What You Will Build
A complete safe string library with two components:
- BSD-style bounded functions: strlcpy, strlcat implementations that work on char arrays
- Capacity-tracked string type: A SafeString struct with safe operations
Functional Requirements
Bounded String Functions (strlcpy/strlcat compatible):
// Copy src to dst, guaranteeing null-termination
// Returns strlen(src), so truncation if return >= dstsize
size_t safe_strcpy(char *dst, const char *src, size_t dstsize);
// Append src to dst, guaranteeing null-termination
// Returns strlen(dst) + strlen(src), truncation if return >= dstsize
size_t safe_strcat(char *dst, const char *src, size_t dstsize);
// Bounded sprintf, always null-terminates
// Returns number of chars that would have been written
int safe_sprintf(char *dst, size_t dstsize, const char *fmt, ...);
// Safe substring extraction
size_t safe_substr(char *dst, size_t dstsize,
const char *src, size_t start, size_t len);
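The specification leaves the return value and out-of-range behaviour of safe_substr open. The sketch below shows one possible reading, and every choice in it is an assumption: the return value is the substring length after clamping to the source, a return value greater than or equal to dstsize signals truncation, and a start position past the end yields an empty string.

```c
#include <stddef.h>
#include <string.h>

/* One possible reading of safe_substr (assumed semantics, see above). */
size_t safe_substr(char *dst, size_t dstsize,
                   const char *src, size_t start, size_t len) {
    size_t srclen = strlen(src);
    /* Clamp without ever computing start + len, which could wrap. */
    size_t avail = start < srclen ? srclen - start : 0;
    size_t want = len < avail ? len : avail;

    if (dstsize > 0) {
        size_t n = want < dstsize - 1 ? want : dstsize - 1;
        if (n > 0) {
            memcpy(dst, src + start, n);
        }
        dst[n] = '\0';
    }
    return want;   /* >= dstsize means the result was truncated */
}
```

Mirroring the strlcpy convention keeps truncation detection uniform across the bounded functions.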
Capacity-Tracked SafeString Type:
// Opaque handle for type safety
typedef struct SafeString SafeString;
// Creation and destruction
SafeString *ss_create(size_t initial_capacity);
SafeString *ss_from_cstr(const char *cstr);
SafeString *ss_clone(const SafeString *s);
void ss_free(SafeString *s);
// Length and capacity queries (O(1))
size_t ss_len(const SafeString *s);
size_t ss_capacity(const SafeString *s);
bool ss_is_empty(const SafeString *s);
// C string access (read-only, null-terminated)
const char *ss_cstr(const SafeString *s);
// Modification operations (return false on allocation failure)
bool ss_set(SafeString *s, const char *cstr);
bool ss_append(SafeString *s, const char *cstr);
bool ss_append_char(SafeString *s, char c);
bool ss_append_fmt(SafeString *s, const char *fmt, ...);
bool ss_insert(SafeString *s, size_t pos, const char *cstr);
bool ss_erase(SafeString *s, size_t start, size_t len);
// In-place transformations
void ss_clear(SafeString *s);
void ss_trim(SafeString *s);
void ss_to_upper(SafeString *s);
void ss_to_lower(SafeString *s);
// Capacity management
bool ss_reserve(SafeString *s, size_t min_capacity);
void ss_shrink_to_fit(SafeString *s);
// Comparison
int ss_compare(const SafeString *a, const SafeString *b);
bool ss_equals(const SafeString *a, const SafeString *b);
bool ss_starts_with(const SafeString *s, const char *prefix);
bool ss_ends_with(const SafeString *s, const char *suffix);
// Search
size_t ss_find(const SafeString *s, const char *needle); // SIZE_MAX if not found
size_t ss_rfind(const SafeString *s, const char *needle);
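ss_rfind is declared here without a stated algorithm. A minimal sketch of a reverse search over an explicit length follows; an ss_rfind implementation could call it with the string's data and length. `demo_rfind` and `DEMO_NPOS` are hypothetical names, and returning the haystack length for an empty needle is an assumed convention.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_NPOS ((size_t)-1)   /* same value as SIZE_MAX */

/* Index of the last occurrence of needle in the first hay_len bytes of hay,
 * or DEMO_NPOS. Uses explicit lengths, so it never reads past hay_len. */
size_t demo_rfind(const char *hay, size_t hay_len, const char *needle) {
    size_t n = strlen(needle);
    if (n > hay_len) {
        return DEMO_NPOS;
    }
    /* Start at the last position where needle could fit and walk left.
     * The "i-- > 0" form avoids wrapping an unsigned index below zero. */
    for (size_t i = hay_len - n + 1; i-- > 0; ) {
        if (memcmp(hay + i, needle, n) == 0) {
            return i;
        }
    }
    return DEMO_NPOS;
}
```

The loop shape matters: a naive `for (i = last; i >= 0; i--)` never terminates with size_t, which is itself a classic bug in string code.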
Non-Functional Requirements
- No buffer overflows: All operations must be bounds-checked
- Always null-terminated: String data must always end with '\0'
- Overflow checking: Length calculations must check for integer overflow
- No undefined behavior: Even with malicious inputs
- Thread safety: Individual strings are not thread-safe, but no global state
- Valgrind clean: No memory leaks, no invalid accesses
- CERT C compliant: Follow SEI CERT C Coding Standard
Example Usage and Output
#include "safestring.h"
#include <stdio.h>
int main(void) {
// Bounded array operations
char path[20];
size_t needed;
needed = safe_strcpy(path, "/home/user", sizeof(path));
printf("After copy: '%s' (needed %zu bytes)\n", path, needed);
// Output: After copy: '/home/user' (needed 10 bytes)
needed = safe_strcat(path, "/documents/important/file.txt", sizeof(path));
printf("After cat: '%s' (needed %zu bytes)\n", path, needed);
// Output: After cat: '/home/user/document' (needed 39 bytes)
// Note: truncated! needed >= sizeof(path)
if (needed >= sizeof(path)) {
printf("WARNING: Path truncated!\n");
}
// SafeString operations
SafeString *s = ss_from_cstr("Hello");
printf("Initial: '%s' (len=%zu, cap=%zu)\n",
ss_cstr(s), ss_len(s), ss_capacity(s));
// Output: Initial: 'Hello' (len=5, cap=16)
ss_append(s, ", World!");
printf("After append: '%s'\n", ss_cstr(s));
// Output: After append: 'Hello, World!'
ss_to_upper(s);
printf("Upper: '%s'\n", ss_cstr(s));
// Output: Upper: 'HELLO, WORLD!'
// Safe formatting
char buffer[32];
int ret = safe_sprintf(buffer, sizeof(buffer),
"User: %s, ID: %d",
"Administrator", 12345);
printf("Formatted: '%s' (would need %d)\n", buffer, ret);
// Output: Formatted: 'User: Administrator, ID: 12345' (would need 30)
ss_free(s);
return 0;
}
Real-World Outcome
When you complete this project, you will have a library that prevents this:
$ # Compile vulnerable program
$ gcc -o vuln vuln.c
$ # Normal usage
$ ./vuln "Hello"
Copied: Hello
$ # Stack smashing attack (without safe string library)
$ ./vuln $(python -c "print('A'*200)")
*** stack smashing detected ***
Aborted (core dumped)
$ # With your safe string library
$ ./safe_vuln $(python -c "print('A'*200)")
Copied: AAAAAAAAAAAAAAAA (truncated)
Warning: Input too long, truncated from 200 to 16 bytes
Solution Architecture
High-Level Design
┌─────────────────────────────────────────────────────────────────────────────┐
│ SAFE STRING LIBRARY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ PUBLIC API LAYER │ │
│ │ ┌─────────────────────┐ ┌──────────────────────────────────────┐ │ │
│ │ │ Bounded Functions │ │ SafeString Type │ │ │
│ │ │ │ │ │ │ │
│ │ │ safe_strcpy() │ │ ss_create(), ss_free() │ │ │
│ │ │ safe_strcat() │ │ ss_append(), ss_insert() │ │ │
│ │ │ safe_sprintf() │ │ ss_find(), ss_compare() │ │ │
│ │ │ safe_substr() │ │ ss_to_upper(), ss_trim() │ │ │
│ │ └─────────────────────┘ └──────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CORE SAFETY LAYER │ │
│ │ │ │
│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │
│ │ │ Bounds Checking │ │ Overflow Detection │ │ │
│ │ │ │ │ │ │ │
│ │ │ check_bounds() │ │ safe_add() │ │ │
│ │ │ check_overlap() │ │ safe_mul() │ │ │
│ │ │ validate_ptr() │ │ check_size() │ │ │
│ │ └─────────────────────┘ └─────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ MEMORY MANAGEMENT LAYER │ │
│ │ │ │
│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │
│ │ │ SafeString Storage │ │ Growth Strategy │ │ │
│ │ │ │ │ │ │ │
│ │ │ struct { │ │ grow_capacity() │ │ │
│ │ │ size_t capacity; │ │ shrink_capacity() │ │ │
│ │ │ size_t length; │ │ align_capacity() │ │ │
│ │ │ char data[]; │ │ │ │ │
│ │ │ } │ │ │ │ │
│ │ └─────────────────────┘ └─────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Module Structure
safe-string-library/
├── include/
│ └── safestring.h # Public API header
├── src/
│ ├── bounded.c # safe_strcpy, safe_strcat, etc.
│ ├── safestring.c # SafeString type implementation
│ ├── safety.c # Bounds checking, overflow detection
│ ├── safety.h # Internal safety utilities
│ └── internal.h # Internal macros and definitions
├── tests/
│ ├── test_bounded.c # Bounded function tests
│ ├── test_safestring.c # SafeString tests
│ ├── test_overflow.c # Integer overflow tests
│ ├── test_fuzzing.c # Fuzz testing harness
│ └── run_tests.sh # Test runner
├── examples/
│ ├── path_builder.c # Building file paths safely
│ ├── log_formatter.c # Safe logging with formatting
│ └── config_parser.c # Parsing config files safely
├── Makefile
├── CMakeLists.txt
└── README.md
Key Data Structures
/* Internal SafeString structure (not exposed in header) */
struct SafeString {
size_t capacity; /* Total allocated bytes including null */
size_t length; /* Current string length excluding null */
char data[]; /* Flexible array member for string data */
};
/* Constants for growth strategy */
#define SS_MIN_CAPACITY 16
#define SS_GROWTH_FACTOR 2
#define SS_MAX_CAPACITY (SIZE_MAX / 2) /* Prevent overflow */
/* Error sentinel for find operations */
#define SS_NPOS SIZE_MAX
/* Internal result type for operations that can fail */
typedef enum {
SS_OK = 0,
SS_ERROR_NULL_PTR,
SS_ERROR_OVERFLOW,
SS_ERROR_ALLOC,
SS_ERROR_BOUNDS
} ss_result_t;
Memory Layout
┌────────────────────────────────────────────────────────────────────────────┐
│ SafeString MEMORY LAYOUT │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ ss_from_cstr("Hello") │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ capacity │ length │ H │ e │ l │ l │ o │\0 │ │ │ │ │ │ │
│ │ 16 │ 5 │ │ │ │ │ │ │ │ │ │ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │← 8 bytes→│← 8 bytes→│←────────── 16 bytes ─────────────────────→│ │
│ │
│ Total malloc'd: 8 + 8 + 16 = 32 bytes (assuming 8-byte size_t) │
│ Usable capacity: 16 bytes (including null) │
│ Max string length: 15 characters │
│ │
│ After ss_append(s, " World!"): │
│ length now 12, capacity still 16 │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ capacity │ length │ H │ e │ l │ l │ o │ │ W │ o │ r │ l │ d │ │
│ │ 16 │ 12 │ │ │ │ │ │ │ │ │ │ │ ! │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ After ss_append(s, "!!!!"): │
│ Needs 17 bytes, capacity grows to 32 │
│ │
└────────────────────────────────────────────────────────────────────────────┘
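The numbers in the layout above follow from two small calculations: rounding the requested capacity up to a power of two, and adding the header size. A sketch under the assumption of a 16-byte minimum (mirroring SS_MIN_CAPACITY); `demo_round_capacity` and `demo_alloc_size` are hypothetical names, and both return 0 to signal that the result is not representable.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MIN_CAPACITY 16u   /* mirrors SS_MIN_CAPACITY */

/* Smallest power-of-two capacity >= requested, or 0 if none fits in size_t. */
size_t demo_round_capacity(size_t requested) {
    size_t cap = DEMO_MIN_CAPACITY;
    while (cap < requested) {
        if (cap > SIZE_MAX / 2) {
            return 0;           /* doubling would wrap */
        }
        cap *= 2;
    }
    return cap;
}

/* Bytes to request from malloc for a header plus cap data bytes (0 on wrap). */
size_t demo_alloc_size(size_t header, size_t cap) {
    return cap > SIZE_MAX - header ? 0 : header + cap;
}
```

"Hello" needs 6 bytes and rounds to 16. Appending until 17 bytes are needed rounds to 32. A 16-byte header plus 16 data bytes gives the 32-byte allocation shown in the diagram.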
Implementation Guide
Phase 1: Core Safety Functions (Days 1-2)
Goals:
- Implement integer overflow checking
- Create bounds validation functions
- Set up testing infrastructure
Tasks:
- Implement overflow-safe arithmetic:
/* safety.c */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h> /* SIZE_MAX */
/* Returns false if a + b would overflow size_t */
bool safe_size_add(size_t a, size_t b, size_t *result) {
if (a > SIZE_MAX - b) {
return false;
}
*result = a + b;
return true;
}
/* Returns false if a * b would overflow size_t */
bool safe_size_mul(size_t a, size_t b, size_t *result) {
if (a == 0 || b == 0) {
*result = 0;
return true;
}
if (a > SIZE_MAX / b) {
return false;
}
*result = a * b;
return true;
}
- Create validation macros:
/* internal.h */
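/* __builtin_expect is a GCC/Clang extension; fall back to a plain test elsewhere */
#if defined(__GNUC__) || defined(__clang__)
#define SS_UNLIKELY(x) __builtin_expect(!!(x), 0)
#define SS_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define SS_UNLIKELY(x) (!!(x))
#define SS_LIKELY(x) (!!(x))
#endif
/* Both macros return from the enclosing function, so use them only in
   functions whose return type is ss_result_t */
#define SS_VALIDATE_PTR(ptr) \
    do { \
        if (SS_UNLIKELY((ptr) == NULL)) { \
            return SS_ERROR_NULL_PTR; \
        } \
    } while (0)
/* idx == len is accepted: it is a valid insertion position (one past the end).
   For element access, check idx >= len instead */
#define SS_VALIDATE_BOUNDS(idx, len) \
    do { \
        if (SS_UNLIKELY((idx) > (len))) { \
            return SS_ERROR_BOUNDS; \
        } \
    } while (0)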
Checkpoint: All safety utilities pass unit tests with edge cases.
Phase 2: Bounded String Functions (Days 3-5)
Goals:
- Implement strlcpy semantics
- Implement strlcat semantics
- Add safe sprintf
Tasks:
- Implement safe_strcpy (strlcpy semantics):
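/* bounded.c */
#include <stddef.h>
#include <string.h>
size_t safe_strcpy(char *dst, const char *src, size_t dstsize) {
    const char *s = src;
    size_t n = dstsize;
    if (src == NULL) {
        /* Policy choice (not part of strlcpy): treat NULL as an empty string */
        if (dst != NULL && dstsize > 0) {
            *dst = '\0';
        }
        return 0;
    }
    if (dst == NULL || n == 0) {
        /* Nowhere to write, not even a null terminator */
        return strlen(src);
    }
    /* Copy as many bytes as will fit */
    while (n > 1 && *s != '\0') {
        *dst++ = *s++;
        n--;
    }
    /* Always null-terminate (dstsize > 0) */
    *dst = '\0';
    /* Return total length of src */
    while (*s != '\0') {
        s++;
    }
    return (size_t)(s - src);
}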
- Implement safe_strcat (strlcat semantics):
size_t safe_strcat(char *dst, const char *src, size_t dstsize) {
    char *d = dst;
    const char *s;
    size_t n = dstsize;
    size_t dlen;
    /* Policy choice (not part of strlcat): treat a NULL source as empty */
    if (src == NULL) {
        src = "";
    }
    if (dst == NULL) {
        /* No destination to scan or write: report the source length only */
        return strlen(src);
    }
    s = src;
    /* Find end of dst within dstsize bytes */
    while (n > 0 && *d != '\0') {
        d++;
        n--;
    }
    dlen = (size_t)(d - dst);
    if (n == 0) {
        /* dst was not null-terminated within dstsize */
        return dlen + strlen(src);
    }
    /* Copy src, leaving room for null */
    while (*s != '\0') {
        if (n > 1) {
            *d++ = *s;
            n--;
        }
        s++;
    }
    *d = '\0';
    return dlen + (size_t)(s - src);
}
- Implement safe_sprintf:
#include <stdarg.h>
#include <stdio.h>
int safe_sprintf(char *dst, size_t dstsize, const char *fmt, ...) {
va_list ap;
int ret;
if (dstsize == 0) {
/* Can't write anything, but count what would be written */
va_start(ap, fmt);
ret = vsnprintf(NULL, 0, fmt, ap);
va_end(ap);
return ret;
}
va_start(ap, fmt);
ret = vsnprintf(dst, dstsize, fmt, ap);
va_end(ap);
/* vsnprintf guarantees null-termination if dstsize > 0 */
return ret;
}
Checkpoint: Bounded functions match BSD strlcpy/strlcat behavior in all tests.
Phase 3: SafeString Core (Days 6-9)
Goals:
- Implement creation/destruction
- Implement basic operations
- Handle memory growth
Tasks:
- Core creation functions:
/* safestring.c */
#include <stdlib.h>
#include <string.h>
static size_t align_capacity(size_t requested) {
/* Round up to power of 2, minimum SS_MIN_CAPACITY */
if (requested < SS_MIN_CAPACITY) {
return SS_MIN_CAPACITY;
}
size_t cap = SS_MIN_CAPACITY;
while (cap < requested && cap < SS_MAX_CAPACITY) {
cap *= 2;
}
return cap;
}
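SafeString *ss_create(size_t initial_capacity) {
    /* align_capacity already raises small requests to SS_MIN_CAPACITY */
    size_t cap = align_capacity(initial_capacity);
    /* align_capacity stops doubling near SS_MAX_CAPACITY, so it can return
       less than was asked for; refuse rather than under-allocate */
    if (cap < initial_capacity) {
        return NULL;
    }
    /* Overflow check for total allocation */
    size_t total;
    if (!safe_size_add(sizeof(SafeString), cap, &total)) {
        return NULL;
    }
    SafeString *s = malloc(total);
    if (s == NULL) {
        return NULL;
    }
    s->capacity = cap;
    s->length = 0;
    s->data[0] = '\0';
    return s;
}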
SafeString *ss_from_cstr(const char *cstr) {
if (cstr == NULL) {
return ss_create(0);
}
size_t len = strlen(cstr);
size_t needed;
if (!safe_size_add(len, 1, &needed)) {
return NULL;
}
SafeString *s = ss_create(needed);
if (s == NULL) {
return NULL;
}
memcpy(s->data, cstr, len + 1);
s->length = len;
return s;
}
void ss_free(SafeString *s) {
free(s); /* free(NULL) is safe */
}
- Growth strategy:
static bool ss_grow(SafeString **sp, size_t min_capacity) {
SafeString *s = *sp;
if (s->capacity >= min_capacity) {
return true; /* Already big enough */
}
/* Calculate new capacity (double until big enough) */
size_t new_cap = s->capacity;
while (new_cap < min_capacity && new_cap < SS_MAX_CAPACITY) {
new_cap *= SS_GROWTH_FACTOR;
}
if (new_cap < min_capacity) {
return false; /* Would exceed maximum */
}
/* Reallocate */
size_t total;
if (!safe_size_add(sizeof(SafeString), new_cap, &total)) {
return false;
}
SafeString *new_s = realloc(s, total);
if (new_s == NULL) {
return false;
}
new_s->capacity = new_cap;
*sp = new_s;
return true;
}
- Append operation:
bool ss_append(SafeString *s, const char *cstr) {
if (s == NULL || cstr == NULL) {
return false;
}
size_t add_len = strlen(cstr);
size_t new_len;
if (!safe_size_add(s->length, add_len, &new_len)) {
return false;
}
size_t needed;
if (!safe_size_add(new_len, 1, &needed)) {
return false;
}
/* We can't use ss_grow with a pointer to s from outside... */
/* This is a design consideration - see Architecture note */
if (s->capacity < needed) {
return false; /* Would need reallocation - not possible with current ptr */
}
memcpy(s->data + s->length, cstr, add_len + 1);
s->length = new_len;
return true;
}
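Note: The append above cannot grow the string. Because the header and the data share one allocation, realloc may move the whole object, and a function that receives only a SafeString * has no way to update the caller's pointer. For production, either return the (possibly new) pointer, take a SafeString ** (double indirection), or keep the character data in a separate allocation behind a stable handle.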
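The double-indirection option can be sketched as follows. `DemoStr`, `demo_create`, and `demo_append` are hypothetical stand-ins, not the project's final API, and the SIZE_MAX / 4 cap is an assumed limit chosen so that doubling and adding the header can never wrap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_CAPACITY (SIZE_MAX / 4)   /* assumed limit, keeps math wrap-free */

typedef struct {
    size_t capacity;   /* data bytes, including '\0' */
    size_t length;     /* bytes in use, excluding '\0' */
    char   data[];
} DemoStr;             /* hypothetical stand-in for SafeString */

DemoStr *demo_create(size_t capacity) {
    if (capacity == 0 || capacity > DEMO_MAX_CAPACITY) {
        return NULL;
    }
    DemoStr *s = malloc(sizeof(DemoStr) + capacity);
    if (s != NULL) {
        s->capacity = capacity;
        s->length = 0;
        s->data[0] = '\0';
    }
    return s;
}

/* Takes DemoStr ** so the caller's pointer can be updated after realloc. */
bool demo_append(DemoStr **sp, const char *cstr) {
    if (sp == NULL || *sp == NULL || cstr == NULL) {
        return false;
    }
    DemoStr *s = *sp;
    size_t add = strlen(cstr);
    if (add >= DEMO_MAX_CAPACITY - s->length) {
        return false;                       /* length + add + 1 exceeds the cap */
    }
    size_t needed = s->length + add + 1;
    if (needed > s->capacity) {
        size_t new_cap = s->capacity;
        while (new_cap < needed) {
            new_cap *= 2;                   /* stays below SIZE_MAX / 2 */
        }
        DemoStr *grown = realloc(s, sizeof(DemoStr) + new_cap);
        if (grown == NULL) {
            return false;                   /* *sp is untouched and still valid */
        }
        grown->capacity = new_cap;
        *sp = s = grown;
    }
    memcpy(s->data + s->length, cstr, add + 1);
    s->length += add;
    return true;
}
```

Callers write `demo_append(&s, "text")` and must not keep copies of the old pointer. The sketch also assumes `cstr` does not point into the string's own buffer, since realloc would invalidate it.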
Checkpoint: SafeString creation, access, and basic modifications work correctly.
Phase 4: Advanced Operations (Days 10-12)
Goals:
- Implement search operations
- Implement transformations
- Add formatting support
Tasks:
- Search operations:
size_t ss_find(const SafeString *s, const char *needle) {
if (s == NULL || needle == NULL) {
return SS_NPOS;
}
const char *found = strstr(s->data, needle);
if (found == NULL) {
return SS_NPOS;
}
return (size_t)(found - s->data);
}
bool ss_starts_with(const SafeString *s, const char *prefix) {
if (s == NULL || prefix == NULL) {
return false;
}
size_t prefix_len = strlen(prefix);
if (prefix_len > s->length) {
return false;
}
return memcmp(s->data, prefix, prefix_len) == 0;
}
bool ss_ends_with(const SafeString *s, const char *suffix) {
if (s == NULL || suffix == NULL) {
return false;
}
size_t suffix_len = strlen(suffix);
if (suffix_len > s->length) {
return false;
}
return memcmp(s->data + s->length - suffix_len, suffix, suffix_len) == 0;
}
- Transformations:
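#include <ctype.h> /* isspace, used by ss_trim */
void ss_to_upper(SafeString *s) {
    if (s == NULL) return;
    for (size_t i = 0; i < s->length; i++) {
        /* ASCII-only and locale-independent by design */
        if (s->data[i] >= 'a' && s->data[i] <= 'z') {
            s->data[i] = (char)(s->data[i] - 'a' + 'A');
        }
    }
}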
void ss_trim(SafeString *s) {
if (s == NULL || s->length == 0) return;
/* Find first non-whitespace */
size_t start = 0;
while (start < s->length && isspace((unsigned char)s->data[start])) {
start++;
}
/* Find last non-whitespace */
size_t end = s->length;
while (end > start && isspace((unsigned char)s->data[end - 1])) {
end--;
}
/* Shift and truncate */
size_t new_len = end - start;
if (start > 0) {
memmove(s->data, s->data + start, new_len);
}
s->data[new_len] = '\0';
s->length = new_len;
}
Checkpoint: All operations work correctly with comprehensive test cases.
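Phase 4 lists formatting support as a goal, but no code is given for it. One common approach is a two-pass vsnprintf: measure first, then allocate and write. The sketch below assumes a hypothetical helper `demo_format_alloc` that returns a freshly allocated string; an ss_append_fmt implementation could apply the same measure-then-write pattern to the string's own buffer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd formatted string, or NULL on error. */
char *demo_format_alloc(const char *fmt, ...) {
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                      /* a va_list can be consumed only once */

    int n = vsnprintf(NULL, 0, fmt, ap);   /* pass 1: measure */
    va_end(ap);
    if (n < 0) {
        va_end(ap2);
        return NULL;
    }

    char *buf = malloc((size_t)n + 1);
    if (buf != NULL) {
        vsnprintf(buf, (size_t)n + 1, fmt, ap2);   /* pass 2: write */
    }
    va_end(ap2);
    return buf;
}
```

The va_copy is the step most often missed: reusing `ap` for the second call is undefined behaviour even though it appears to work on some platforms.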
Phase 5: Testing and Hardening (Days 13-14)
Goals:
- Comprehensive test coverage
- Edge case handling
- Security review
Tasks:
- Edge case tests:
/* test_bounded.c */
#include <assert.h>
#include <string.h>
#include "safestring.h"
void test_strcpy_empty_dest(void) {
char dest[1] = "";
size_t ret = safe_strcpy(dest, "Hello", 1);
assert(dest[0] == '\0'); /* Must be null-terminated */
assert(ret == 5); /* Returns strlen of src */
}
void test_strcpy_zero_size(void) {
char dest[10] = "original";
size_t ret = safe_strcpy(dest, "new", 0);
assert(strcmp(dest, "original") == 0); /* Unchanged */
assert(ret == 3); /* Still returns strlen(src) */
}
void test_strcat_full_buffer(void) {
char dest[10] = "Hello";
size_t ret = safe_strcat(dest, "World!", sizeof(dest));
assert(strlen(dest) == 9); /* Max that fits */
assert(ret == 11); /* Truncation detected */
assert(dest[9] == '\0'); /* Null-terminated */
}
- Fuzz testing:
/* test_fuzzing.c */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "safestring.h"
void fuzz_safe_strcpy(int iterations) {
    srand((unsigned)time(NULL));
    for (int i = 0; i < iterations; i++) {
        /* Random destination size (0 to 1000) */
        size_t dstsize = (size_t)(rand() % 1001);
        char *dst = malloc(dstsize > 0 ? dstsize : 1);
        /* Random source length (0 to 2000) */
        size_t srclen = (size_t)(rand() % 2001);
        char *src = malloc(srclen + 1);
        /* A failed allocation would invalidate the test, so stop here */
        assert(dst != NULL && src != NULL);
        memset(src, 'A', srclen);
        src[srclen] = '\0';
        /* Call function */
        size_t ret = safe_strcpy(dst, src, dstsize);
        /* Verify invariants */
        assert(ret == srclen); /* Returns strlen(src) */
        if (dstsize > 0) {
            /* Terminator sits at srclen if it fit, else at the last byte */
            assert(dst[dstsize > srclen ? srclen : dstsize - 1] == '\0');
        }
        free(dst);
        free(src);
    }
    printf("Fuzz test passed: %d iterations\n", iterations);
}
Testing Strategy
Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Basic Functionality | Core operations work | Copy, append, find |
| Boundary Conditions | Edge cases handled | Empty strings, size 0 |
| Overflow Protection | Integer overflows caught | SIZE_MAX inputs |
| Memory Safety | No buffer overflows | Truncation, null-term |
| Null Safety | NULL inputs handled | NULL src, NULL dst |
Critical Test Cases
/* Must-pass tests for security certification */
/* 1. Buffer overflow prevention */
void test_no_overflow_on_long_input(void) {
char small[8];
safe_strcpy(small, "This is a very long string", sizeof(small));
assert(strlen(small) == 7); /* Truncated to fit */
assert(small[7] == '\0'); /* Null-terminated */
}
/* 2. Integer overflow in length calculation */
void test_length_overflow_detected(void) {
    /* Real strings cannot approach SIZE_MAX bytes, so exercise the
       overflow-checked arithmetic the library relies on directly */
    size_t out = 0;
    assert(!safe_size_add(SIZE_MAX - 10, 20, &out));
    assert(!safe_size_mul(SIZE_MAX / 2 + 1, 2, &out));
    assert(safe_size_add(SIZE_MAX - 1, 1, &out) && out == SIZE_MAX);
}
/* 3. NULL pointer safety */
void test_null_inputs_safe(void) {
    char buf[10] = "test";
    /* Decide on a NULL policy (for example: return 0, or leave buf unchanged)
       and assert it here; at minimum these calls must not crash */
    (void)safe_strcpy(buf, NULL, sizeof(buf));
    (void)safe_strcat(NULL, "test", 10);
}
/* 4. Zero-size buffer */
void test_zero_size_buffer(void) {
char *nullbuf = NULL;
size_t ret = safe_strcpy(nullbuf, "test", 0);
assert(ret == 4); /* Still returns strlen(src) */
/* No crash, no write */
}
Common Pitfalls & Debugging
Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Off-by-one in null terminator | Unterminated strings | Always reserve n+1 for n chars |
| Forgetting to check dstsize == 0 | Crash, or dstsize - 1 wraps to SIZE_MAX | Handle 0 as special case first |
| Using strlen on unterminated string | Crash or garbage | Use strnlen or known length |
| Integer overflow in size calc | Allocate too little | Check with safe_add before math |
| Assigning realloc's result directly to the original pointer | Memory leak if realloc returns NULL | Store the result in a temporary and check it before overwriting |
Debugging Strategies
Use AddressSanitizer:
gcc -fsanitize=address -g -o test test.c safestring.c
./test
Valgrind for memory issues:
valgrind --leak-check=full --track-origins=yes ./test
Custom debug builds:
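#ifdef DEBUG
/* Avoids the GNU ##__VA_ARGS__ extension, so it compiles as strict C11 */
#define SS_DEBUG_PRINT(...) \
    do { \
        fputs("[SS DEBUG] ", stderr); \
        fprintf(stderr, __VA_ARGS__); \
        fputc('\n', stderr); \
    } while (0)
#else
#define SS_DEBUG_PRINT(...) ((void)0)
#endif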
Extensions & Challenges
Beginner Extensions
- Add ss_split() to split string into array
- Add ss_join() to join array with delimiter
- Add ss_replace() for find-and-replace
- Add Unicode support (UTF-8 validation)
Intermediate Extensions
- Implement rope data structure for large strings
- Add copy-on-write optimization
- Create small-string optimization (SSO)
- Add regex matching with safe bounds
Advanced Extensions
- Thread-safe variant with atomic operations
- Memory pool allocator for SafeString
- Zero-copy slicing and views
- Compile-time bounds checking with static analysis
Real-World Connections
Famous Buffer Overflow Vulnerabilities
Morris Worm (1988): Exploited buffer overflow in fingerd via gets()
Code Red (2001): IIS buffer overflow in URL handling
Heartbleed (2014): OpenSSL buffer over-read exposing memory
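Your library targets the unbounded-copy pattern behind the first two. Heartbleed is an over-read rather than a string overflow, but it teaches the same rule your API enforces: a length taken from untrusted input must be checked against the real buffer size before any copy.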
Industry Standards
CERT C Secure Coding Standard:
- STR31-C: Guarantee that storage for strings has sufficient space for data and null terminator
- STR32-C: Do not pass a non-null-terminated string to a library function that expects a string
MISRA C:
- Bounded string operations required for safety-critical systems
Resources
Essential Reading
| Topic | Source | Chapter/Section |
|---|---|---|
| String vulnerabilities | “Effective C” by Seacord | Chapter 7: Strings |
| Secure coding | SEI CERT C Coding Standard | STR section |
| strlcpy design | “strlcpy and strlcat” Miller & de Raadt | OpenBSD paper |
| Buffer overflow exploits | “Hacking: The Art of Exploitation” by Erickson | Chapter 2 |
Related Projects
- Previous: P12 (Bug Catalog) - Understanding C pitfalls
- Next: P14 (Memory Debugger) - Runtime safety checking
- Related: P16 (Portable Code Checker) - Static analysis
Self-Assessment Checklist
Understanding Verification
- I can explain exactly how strcpy leads to buffer overflows
- I understand the difference between strncpy and strlcpy semantics
- I can identify integer overflow vulnerabilities in string code
- I know why format strings are dangerous with user input
Implementation Verification
- safe_strcpy always null-terminates (even when truncating)
- safe_strcat correctly handles already-full destination
- Integer overflow is caught before any allocation
- NULL inputs do not cause crashes
Security Verification
- No buffer overflows under any input
- No integer overflows in size calculations
- No undefined behavior with edge cases
- Valgrind reports zero errors
Submission / Completion Criteria
Minimum Viable Completion:
- safe_strcpy and safe_strcat working correctly
- Basic SafeString creation and access
- No buffer overflows in any test case
Full Completion:
- All bounded functions implemented
- Complete SafeString API
- Comprehensive test suite (100+ tests)
- Fuzz testing passes
- Valgrind clean
Excellence:
- Small-string optimization implemented
- Thread-safe variant
- Integration with static analyzer
- Benchmarks showing performance vs std functions
The Core Question You’re Answering
“Why are C strings the source of so many security vulnerabilities, and how can we design string-handling APIs that make buffer overflows impossible?”
This project teaches you that security is not an afterthought but a design consideration. By building safe string functions from scratch, you understand exactly what makes code vulnerable and how to write APIs that prevent misuse.
This guide was expanded from EXPERT_C_PROGRAMMING_DEEP_DIVE.md. For the complete learning path, see the project index.