Project 13: Safe String Library

Build a bounds-checked string library that prevents buffer overflows, one of the most exploited vulnerability classes in the history of C programming.


Quick Reference

Attribute        Value
Language         C (strict C11/C17)
Difficulty       Level 4 (Expert)
Time             2 Weeks
Key Concepts     Buffer safety, bounds checking, secure coding
Prerequisites    P01-P12, pointer arithmetic mastery
Portfolio Value  High - demonstrates security awareness

Learning Objectives

By completing this project, you will:

  1. Understand buffer overflow vulnerabilities: Know exactly how strcpy, sprintf, and gets lead to exploits and why they remain dangerous despite being well-known

  2. Master bounded string operations: Implement strlcpy/strlcat semantics that guarantee null termination and prevent buffer overflows

  3. Design length-prefixed strings: Understand why modern languages abandoned null-terminated strings and implement a safer alternative

  4. Apply CERT C Secure Coding guidelines: Follow industry standards for secure C programming (SEI CERT C Coding Standard)

  5. Implement capacity tracking: Build strings that know their own size and refuse dangerous operations

  6. Create defensive APIs: Design function interfaces that make misuse difficult or impossible

  7. Recognize exploit patterns: Identify code vulnerable to format string attacks, integer overflow in length calculations, and off-by-one errors


Theoretical Foundation

Why C Strings Are Dangerous

C inherited null-terminated strings from its predecessors, a decision that has caused billions of dollars in security damage. The core problems are:

                    THE NULL-TERMINATED STRING PROBLEM
    ═══════════════════════════════════════════════════════════════

    char buffer[10];
    strcpy(buffer, user_input);  // DISASTER WAITING TO HAPPEN

    If user_input is "Hello":
    ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
    │ H │ e │ l │ l │ o │\0 │ ? │ ? │ ? │ ? │  ← OK, fits
    └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘

    If user_input is "ThisIsWayTooLong":
    ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐───┬───┬───┬───┬───┬───┬───┐
    │ T │ h │ i │ s │ I │ s │ W │ a │ y │ T │ o │ o │ L │ o │ n │ g │\0 │
    └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘───┴───┴───┴───┴───┴───┴───┘
          buffer boundary ───────────────┘     │
                                               └── OVERWRITES STACK!

    What gets overwritten?
    ┌─────────────────────────────────────────────────────────────────┐
    │  LOW ADDRESS                                      HIGH ADDRESS  │
    │  ┌──────────┬──────────┬──────────┬──────────┬──────────────┐   │
    │  │  buffer  │ padding  │saved EBP │ ret addr │  arguments   │   │
    │  │  [10]    │          │          │          │              │   │
    │  └──────────┴──────────┴──────────┴──────────┴──────────────┘   │
    │       ▲                      │          │                       │
    │       │                      │          └── Attacker controls   │
    │  Overflow starts here ───────┘              where execution     │
    │                                              returns!           │
    └─────────────────────────────────────────────────────────────────┘

The Dangerous Standard Library Functions

    ┌────────────────────────────────────────────────────────────────────────────┐
    │                    DANGEROUS C STRING FUNCTIONS                            │
    ├────────────────────────────────────────────────────────────────────────────┤
    │                                                                            │
    │  NEVER USE THESE:                                                          │
    │  ─────────────────                                                         │
    │  gets()        │ Cannot be used safely - REMOVED in C11                    │
    │  strcpy()      │ No bounds checking, overflows if src exceeds dst          │
    │  strcat()      │ No bounds checking, concatenation overflow                │
    │  sprintf()     │ No size limit, format string + overflow vulnerabilities   │
    │  vsprintf()    │ Same as sprintf with va_list                              │
    │  scanf("%s")   │ No field width limit, classic overflow                    │
    │                                                                            │
    │  BETTER BUT STILL PROBLEMATIC:                                             │
    │  ────────────────────────────                                              │
    │  strncpy()     │ May not null-terminate! Wastes time zeroing.              │
    │  strncat()     │ Size parameter is confusing (remaining space, not total)  │
    │  snprintf()    │ Safe if used correctly, but return value often ignored    │
    │                                                                            │
    │  THE BSD SOLUTION (recommended):                                           │
    │  ────────────────────────────────                                          │
    │  strlcpy()     │ Always null-terminates, returns total length needed       │
    │  strlcat()     │ Always null-terminates, returns total length needed       │
    │                                                                            │
    │  THE MICROSOFT SOLUTION:                                                   │
    │  ─────────────────────────                                                 │
    │  strcpy_s()    │ Returns error code, invokes handler on overflow           │
    │  strcat_s()    │ Part of C11 Annex K (optional)                            │
    │                                                                            │
    └────────────────────────────────────────────────────────────────────────────┘
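As a small illustration of the `scanf("%s")` row above, the sketch below reads one word with an explicit field width so the destination cannot be overrun. The helper name `read_word` and the constant `WORD_BUF_SIZE` are invented for this example; `sscanf` is used instead of `scanf` only so the input is easy to control.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical buffer size for this sketch. */
enum { WORD_BUF_SIZE = 10 };

/* Reads one whitespace-delimited word from input into word.
 * The width in "%9s" must be WORD_BUF_SIZE - 1: the width counts
 * characters stored, and sscanf then appends the terminating '\0'.
 * Returns true if a word was read. */
bool read_word(const char *input, char word[WORD_BUF_SIZE]) {
    return sscanf(input, "%9s", word) == 1;
}
```

With the input "ThisIsWayTooLong", only the first nine characters are stored, followed by the terminator. Note that the width is a literal inside the format string, so it has to be kept in sync with the buffer size by hand, which is one reason the bounded functions later in this project take the size as an argument.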

The strncpy Problem

Many developers think strncpy is safe. It is not:

    ┌────────────────────────────────────────────────────────────────────────────┐
    │                    WHY strncpy IS NOT THE ANSWER                           │
    ├────────────────────────────────────────────────────────────────────────────┤
    │                                                                            │
    │  Problem 1: No null termination guarantee                                  │
    │  ──────────────────────────────────────────                                │
    │                                                                            │
    │  char dest[5];                                                             │
    │  strncpy(dest, "Hello World", 5);                                          │
    │                                                                            │
    │  Result:                                                                   │
    │  ┌───┬───┬───┬───┬───┐                                                     │
    │  │ H │ e │ l │ l │ o │  ← NO NULL TERMINATOR!                              │
    │  └───┴───┴───┴───┴───┘                                                     │
    │                                                                            │
    │  strlen(dest) = UNDEFINED (reads past buffer until random \0)              │
    │  printf("%s", dest) = UNDEFINED (prints garbage)                           │
    │                                                                            │
    │  Problem 2: Wasteful padding                                               │
    │  ───────────────────────────                                               │
    │                                                                            │
    │  char dest[1000];                                                          │
    │  strncpy(dest, "Hi", 1000);                                                │
    │                                                                            │
    │  Result: strncpy writes "Hi" plus 998 null bytes (1000 total)              │
    │  This was designed for fixed-width database fields, not security!          │
    │                                                                            │
    │  The Fix (manual, error-prone):                                            │
    │  ───────────────────────────────                                           │
    │                                                                            │
    │  strncpy(dest, src, sizeof(dest) - 1);                                     │
    │  dest[sizeof(dest) - 1] = '\0';  // Must remember this!                    │
    │                                                                            │
    └────────────────────────────────────────────────────────────────────────────┘
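Both strncpy behaviors can be checked in a few lines. The following is a minimal sketch; the helper names `has_terminator` and `strncpy_terminated` are invented here and are not part of any standard library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if any of the first n bytes of buf is '\0'. */
bool has_terminator(const char *buf, size_t n) {
    return memchr(buf, '\0', n) != NULL;
}

/* The manual fix wrapped in a helper (hypothetical name):
 * copies at most dstsize - 1 bytes and always terminates dst. */
void strncpy_terminated(char *dst, const char *src, size_t dstsize) {
    if (dstsize == 0) {
        return;                       /* no room even for '\0' */
    }
    strncpy(dst, src, dstsize - 1);   /* may leave dst unterminated... */
    dst[dstsize - 1] = '\0';          /* ...so terminate explicitly */
}
```

Calling `strncpy(dest, "Hello World", 5)` on a 5-byte buffer and then `has_terminator(dest, 5)` returns false, while `strncpy_terminated` stores "Hell" plus the terminator. The helper still pays the zero-padding cost and gives the caller no way to detect truncation, which is what motivates the strlcpy-style return value.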

The strlcpy/strlcat Solution (BSD)

OpenBSD introduced strlcpy and strlcat in 1998. They are safer by design:

    ┌────────────────────────────────────────────────────────────────────────────┐
    │                    strlcpy/strlcat SEMANTICS                               │
    ├────────────────────────────────────────────────────────────────────────────┤
    │                                                                            │
    │  size_t strlcpy(char *dst, const char *src, size_t dstsize);               │
    │  size_t strlcat(char *dst, const char *src, size_t dstsize);               │
    │                                                                            │
    │  GUARANTEES:                                                               │
    │  ───────────                                                               │
    │  1. dst is ALWAYS null-terminated (if dstsize > 0)                         │
    │  2. Never writes more than dstsize bytes total                             │
    │  3. Returns strlen(src) for strlcpy, strlen(dst)+strlen(src) for strlcat   │
    │  4. If return value >= dstsize, truncation occurred                        │
    │                                                                            │
    │  TRUNCATION DETECTION:                                                     │
    │  ─────────────────────                                                     │
    │                                                                            │
    │  char buf[10];                                                             │
    │  if (strlcpy(buf, "Hello World", sizeof(buf)) >= sizeof(buf)) {            │
    │      // Truncation! "Hello Wor\0" stored, needed 12 bytes                  │
    │      handle_error();                                                       │
    │  }                                                                         │
    │                                                                            │
    │  CONCATENATION:                                                            │
    │  ──────────────                                                            │
    │                                                                            │
    │  char path[PATH_MAX];                                                      │
    │  if (strlcpy(path, dir, sizeof(path)) >= sizeof(path))                     │
    │      goto toolong;                                                         │
    │  if (strlcat(path, "/", sizeof(path)) >= sizeof(path))                     │
    │      goto toolong;                                                         │
    │  if (strlcat(path, file, sizeof(path)) >= sizeof(path))                    │
    │      goto toolong;                                                         │
    │                                                                            │
    └────────────────────────────────────────────────────────────────────────────┘
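The guarantees listed above can be sketched in portable C. This is an illustrative implementation, not the OpenBSD source; the names `sketch_strlcpy` and `sketch_strlcat` are invented to avoid clashing with a system-provided strlcpy.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst, writing at most dstsize bytes including '\0'.
 * Returns strlen(src); a result >= dstsize means truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t dstsize) {
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = (srclen < dstsize - 1) ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Appends src to dst within a buffer of dstsize bytes.
 * Returns initial length of dst plus strlen(src). */
size_t sketch_strlcat(char *dst, const char *src, size_t dstsize) {
    /* Search for dst's terminator only inside the buffer. */
    const char *end = memchr(dst, '\0', dstsize);
    size_t dstlen = end ? (size_t)(end - dst) : dstsize;
    size_t srclen = strlen(src);

    if (dstlen == dstsize) {
        /* dst is not terminated within dstsize: leave it untouched. */
        return dstsize + srclen;
    }
    size_t room = dstsize - dstlen - 1;          /* bytes left before '\0' */
    size_t n = (srclen < room) ? srclen : room;
    memcpy(dst + dstlen, src, n);
    dst[dstlen + n] = '\0';
    return dstlen + srclen;
}
```

Both functions still require `src` to be a valid null-terminated string, since they call `strlen(src)`; the bound protects the destination only.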

Length-Prefixed Strings

An alternative approach used by Pascal, BASIC, and modern languages:

    ┌────────────────────────────────────────────────────────────────────────────┐
    │                    LENGTH-PREFIXED STRINGS                                 │
    ├────────────────────────────────────────────────────────────────────────────┤
    │                                                                            │
    │  C String (null-terminated):                                               │
    │  ┌───┬───┬───┬───┬───┬───┐                                                 │
    │  │ H │ e │ l │ l │ o │\0 │                                                 │
    │  └───┴───┴───┴───┴───┴───┘                                                 │
    │  Problems: strlen is O(n), no embedded nulls, overflow prone               │
    │                                                                            │
    │  Length-prefixed (Pascal style, 1 byte length):                            │
    │  ┌───┬───┬───┬───┬───┬───┐                                                 │
    │  │ 5 │ H │ e │ l │ l │ o │                                                 │
    │  └───┴───┴───┴───┴───┴───┘                                                 │
    │  Problems: Max length 255 (the length must fit in one byte)                │
    │                                                                            │
    │  Capacity-tracked (your implementation):                                   │
    │  ┌──────────┬──────────┬───────────────────────────────────┐               │
    │  │ capacity │  length  │             data                  │               │
    │  │  size_t  │  size_t  │         char[capacity]            │               │
    │  └──────────┴──────────┴───────────────────────────────────┘               │
    │                                                                            │
    │  typedef struct {                                                          │
    │      size_t capacity;  // Total allocated size (including \0)              │
    │      size_t length;    // Current string length (not including \0)         │
    │      char   data[];    // Flexible array member                            │
    │  } SafeString;                                                             │
    │                                                                            │
    │  ADVANTAGES:                                                               │
    │  ───────────                                                               │
    │  • strlen is O(1)                                                          │
    │  • Every write can be bounds-checked (capacity is known)                   │
    │  • Can contain embedded null bytes                                         │
    │  • Clear memory ownership semantics                                        │
    │                                                                            │
    │  DISADVANTAGES:                                                            │
    │  ─────────────                                                             │
    │  • Extra memory overhead (2 * sizeof(size_t))                              │
    │  • Not compatible with C string APIs without extraction                    │
    │  • Requires heap allocation for most operations                            │
    │                                                                            │
    └────────────────────────────────────────────────────────────────────────────┘
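To make the capacity-tracked layout concrete, here is a minimal sketch of allocating such a struct and appending raw bytes to it. The function names `sketch_alloc` and `sketch_append_bytes` are hypothetical, and this version refuses an append that does not fit rather than growing, purely to show the capacity check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t capacity;  /* total bytes in data, including the '\0' slot */
    size_t length;    /* bytes in use, excluding the '\0' */
    char   data[];    /* flexible array member */
} SafeString;

/* Allocates header and data in one block; NULL on failure. */
SafeString *sketch_alloc(size_t capacity) {
    /* Reject zero capacity and sizes where header + data would wrap. */
    if (capacity == 0 || capacity > SIZE_MAX - sizeof(SafeString)) {
        return NULL;
    }
    SafeString *s = malloc(sizeof(SafeString) + capacity);
    if (s == NULL) {
        return NULL;
    }
    s->capacity = capacity;
    s->length = 0;
    s->data[0] = '\0';
    return s;
}

/* Appends n raw bytes (which may include '\0'); returns false,
 * leaving s unchanged, if they do not fit. */
bool sketch_append_bytes(SafeString *s, const char *bytes, size_t n) {
    size_t room = s->capacity - s->length - 1;  /* length < capacity always */
    if (n > room) {
        return false;
    }
    memcpy(s->data + s->length, bytes, n);
    s->length += n;
    s->data[s->length] = '\0';
    return true;
}
```

Because the length is stored rather than discovered by scanning, embedded null bytes are counted like any other byte; only code that later treats `data` as a plain C string would stop at the first one.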

Integer Overflow in Length Calculations

A subtle but critical vulnerability:

    ┌────────────────────────────────────────────────────────────────────────────┐
    │                    INTEGER OVERFLOW VULNERABILITIES                        │
    ├────────────────────────────────────────────────────────────────────────────┤
    │                                                                            │
    │  VULNERABLE CODE:                                                          │
    │  ─────────────────                                                         │
    │                                                                            │
    │  void concat(char *dst, size_t dstsize,                                    │
    │              const char *s1, const char *s2) {                             │
    │      size_t len1 = strlen(s1);                                             │
    │      size_t len2 = strlen(s2);                                             │
    │      size_t total = len1 + len2 + 1;  // OVERFLOW POSSIBLE!                │
    │                                                                            │
    │      if (total <= dstsize) {  // Comparison passes if overflow!            │
    │          strcpy(dst, s1);                                                  │
    │          strcat(dst, s2);                                                  │
    │      }                                                                     │
    │  }                                                                         │
    │                                                                            │
    │  Attack: s1 is SIZE_MAX-10 chars, s2 is 20 chars                           │
    │          len1 + len2 + 1 wraps to small positive number                    │
    │          Check passes, strcpy overflows                                    │
    │                                                                            │
    │  SAFE VERSION:                                                             │
    │  ─────────────                                                             │
    │                                                                            │
    │  bool safe_add(size_t a, size_t b, size_t *result) {                       │
    │      if (a > SIZE_MAX - b) return false;  // Would overflow                │
    │      *result = a + b;                                                      │
    │      return true;                                                          │
    │  }                                                                         │
    │                                                                            │
    │  void concat_safe(char *dst, size_t dstsize,                               │
    │                   const char *s1, const char *s2) {                        │
    │      size_t len1 = strlen(s1);                                             │
    │      size_t len2 = strlen(s2);                                             │
    │      size_t total;                                                         │
    │                                                                            │
    │      if (!safe_add(len1, len2, &total)) return;  // Overflow               │
    │      if (!safe_add(total, 1, &total)) return;    // Overflow               │
    │      if (total > dstsize) return;                // Doesn't fit            │
    │                                                                            │
    │      memcpy(dst, s1, len1);                                                │
    │      memcpy(dst + len1, s2, len2 + 1);                                     │
    │  }                                                                         │
    │                                                                            │
    └────────────────────────────────────────────────────────────────────────────┘
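The wraparound can be reproduced with plain arithmetic, without building huge strings. The sketch below reuses `safe_add` from the diagram; `unchecked_total` and `checked_total` are hypothetical helpers that isolate the length calculation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Overflow-checked addition, as in the safe version above. */
bool safe_add(size_t a, size_t b, size_t *result) {
    if (a > SIZE_MAX - b) {
        return false;             /* a + b would wrap */
    }
    *result = a + b;
    return true;
}

/* The vulnerable calculation: unsigned arithmetic wraps silently. */
size_t unchecked_total(size_t len1, size_t len2) {
    return len1 + len2 + 1;
}

/* The checked calculation: fails instead of wrapping. */
bool checked_total(size_t len1, size_t len2, size_t *total) {
    return safe_add(len1, len2, total) && safe_add(*total, 1, total);
}
```

For the lengths in the attack description, `unchecked_total(SIZE_MAX - 10, 20)` wraps to 10, which would pass a `total <= dstsize` check for any buffer of 10 bytes or more, while `checked_total` reports failure. In practice the same bug tends to appear with narrower length types or with lengths taken from untrusted headers, where the wrapping values are easy to reach.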

Format String Vulnerabilities

Related to string handling, format strings are another attack vector:

    ┌────────────────────────────────────────────────────────────────────────────┐
    │                    FORMAT STRING ATTACKS                                   │
    ├────────────────────────────────────────────────────────────────────────────┤
    │                                                                            │
    │  VULNERABLE:                                                               │
    │  ───────────                                                               │
    │  printf(user_input);  // User controls format string!                      │
    │                                                                            │
    │  ATTACKS:                                                                  │
    │  ────────                                                                  │
    │  Input: "%s%s%s%s%s"   → Crashes (reads from stack)                        │
    │  Input: "%x%x%x%x"    → Leaks stack data                                   │
    │  Input: "%n"          → Writes to memory! (number of chars written)        │
    │                                                                            │
    │  How %n works:                                                             │
    │  int count;                                                                │
    │  printf("Hello%n", &count);  // count = 5                                  │
    │                                                                            │
    │  Attack: Control stack to make %n write to chosen address                  │
    │                                                                            │
    │  SAFE:                                                                     │
    │  ─────                                                                     │
    │  printf("%s", user_input);  // User input is data, not format              │
    │  fputs(user_input, stdout); // No format interpretation                    │
    │                                                                            │
    │  RULE: Never pass user input as format string                              │
    │                                                                            │
    └────────────────────────────────────────────────────────────────────────────┘
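The rule can be wrapped in a helper so untrusted text never reaches the format parameter. This is a small sketch; the name `format_untrusted` is invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies untrusted text into out as data. The format string is the
 * constant "%s", so conversion specifiers inside the input are never
 * interpreted. Returns what snprintf returns: the length the full
 * output would have had, or a negative value on error. */
int format_untrusted(char *out, size_t outsize, const char *untrusted) {
    return snprintf(out, outsize, "%s", untrusted);
}
```

Passing the input "%x%x%n" stores exactly those six characters in `out`. GCC and Clang can also flag non-literal format strings at compile time with `-Wformat -Wformat-security`, which is worth enabling alongside a helper like this.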

Project Specification

What You Will Build

A complete safe string library with two components:

  1. BSD-style bounded functions: strlcpy, strlcat implementations that work on char arrays
  2. Capacity-tracked string type: A SafeString struct with safe operations

Functional Requirements

Bounded String Functions (strlcpy/strlcat compatible):

// Copy src to dst, guaranteeing null-termination (when dstsize > 0)
// Returns strlen(src); truncation occurred if the return value >= dstsize
size_t safe_strcpy(char *dst, const char *src, size_t dstsize);

// Append src to dst, guaranteeing null-termination
// Returns strlen(dst) + strlen(src), using dst's length before the call;
// truncation occurred if the return value >= dstsize
size_t safe_strcat(char *dst, const char *src, size_t dstsize);

// Bounded sprintf, always null-terminates
// Returns number of chars that would have been written
int safe_sprintf(char *dst, size_t dstsize, const char *fmt, ...);

// Safe substring extraction
size_t safe_substr(char *dst, size_t dstsize,
                   const char *src, size_t start, size_t len);
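One possible way to implement the `safe_sprintf` prototype is as a thin wrapper over the standard `vsnprintf`. The sketch below is an assumption about the intended behavior, not a reference solution; in particular, returning -1 for a NULL pointer or a zero-sized buffer is a choice made here that the specification does not state.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Bounded formatting: writes at most dstsize bytes including '\0'.
 * Returns the length the full output would have had, or a negative
 * value on error (ASSUMPTION: invalid arguments also return -1). */
int safe_sprintf(char *dst, size_t dstsize, const char *fmt, ...) {
    if (dst == NULL || fmt == NULL || dstsize == 0) {
        return -1;
    }
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(dst, dstsize, fmt, ap);  /* always terminates when dstsize > 0 */
    va_end(ap);
    if (n < 0) {
        dst[0] = '\0';                         /* encoding error: leave a valid empty string */
    }
    return n;
}
```

A caller detects truncation the same way as with the copy functions: a non-negative result that is greater than or equal to `dstsize` means the output did not fit.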

Capacity-Tracked SafeString Type:

// Opaque handle for type safety
typedef struct SafeString SafeString;

// Creation and destruction
SafeString *ss_create(size_t initial_capacity);
SafeString *ss_from_cstr(const char *cstr);
SafeString *ss_clone(const SafeString *s);
void ss_free(SafeString *s);

// Length and capacity queries (O(1))
size_t ss_len(const SafeString *s);
size_t ss_capacity(const SafeString *s);
bool ss_is_empty(const SafeString *s);

// C string access (read-only, null-terminated)
const char *ss_cstr(const SafeString *s);

// Modification operations (return false on allocation failure)
bool ss_set(SafeString *s, const char *cstr);
bool ss_append(SafeString *s, const char *cstr);
bool ss_append_char(SafeString *s, char c);
bool ss_append_fmt(SafeString *s, const char *fmt, ...);
bool ss_insert(SafeString *s, size_t pos, const char *cstr);
bool ss_erase(SafeString *s, size_t start, size_t len);

// In-place transformations
void ss_clear(SafeString *s);
void ss_trim(SafeString *s);
void ss_to_upper(SafeString *s);
void ss_to_lower(SafeString *s);

// Capacity management
bool ss_reserve(SafeString *s, size_t min_capacity);
void ss_shrink_to_fit(SafeString *s);

// Comparison
int ss_compare(const SafeString *a, const SafeString *b);
bool ss_equals(const SafeString *a, const SafeString *b);
bool ss_starts_with(const SafeString *s, const char *prefix);
bool ss_ends_with(const SafeString *s, const char *suffix);

// Search
size_t ss_find(const SafeString *s, const char *needle);  // SIZE_MAX if not found
size_t ss_rfind(const SafeString *s, const char *needle);
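One design point these prototypes leave open is how `ss_append` can grow storage when it only receives a `SafeString *`. If the struct ends in a flexible array member, `realloc` may move the whole object and invalidate the caller's handle. Two common ways around this are passing `SafeString **` to mutating functions, or keeping the character buffer in a separate allocation. The sketch below assumes the second option; the type `SketchString` and the `sk_` functions are hypothetical and deliberately differ from the flexible-array layout used elsewhere in this project.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: buffer held by pointer so the handle never moves. */
typedef struct SketchString {
    size_t capacity;   /* bytes allocated for data, including '\0' */
    size_t length;     /* string length, excluding '\0' */
    char  *data;
} SketchString;

SketchString *sk_from_cstr(const char *cstr) {
    if (cstr == NULL) return NULL;
    size_t len = strlen(cstr);
    if (len == SIZE_MAX) return NULL;          /* len + 1 would wrap */
    SketchString *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->capacity = (len + 1 < 16) ? 16 : len + 1;
    s->data = malloc(s->capacity);
    if (s->data == NULL) { free(s); return NULL; }
    memcpy(s->data, cstr, len + 1);
    s->length = len;
    return s;
}

/* Appends cstr, doubling capacity as needed. Returns false and leaves
 * s unchanged on overflow or allocation failure. This sketch does not
 * handle cstr pointing into s->data. */
bool sk_append(SketchString *s, const char *cstr) {
    if (s == NULL || cstr == NULL) return false;
    size_t add = strlen(cstr);
    if (add > SIZE_MAX - s->length - 1) return false;   /* would wrap */
    size_t needed = s->length + add + 1;
    if (needed > s->capacity) {
        size_t newcap = s->capacity;
        while (newcap < needed) {
            if (newcap > SIZE_MAX / 2) { newcap = needed; break; }
            newcap *= 2;
        }
        char *p = realloc(s->data, newcap);
        if (p == NULL) return false;
        s->data = p;
        s->capacity = newcap;
    }
    memcpy(s->data + s->length, cstr, add + 1);  /* includes '\0' */
    s->length += add;
    return true;
}

void sk_free(SketchString *s) {
    if (s != NULL) { free(s->data); free(s); }
}
```

The trade-off is one extra allocation and pointer indirection per string in exchange for a handle that stays valid across growth. Whichever option you choose, the self-append case (`ss_append(s, ss_cstr(s))`) needs explicit handling.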

Non-Functional Requirements

  1. No buffer overflows: All operations must be bounds-checked
  2. Always null-terminated: String data must always end with '\0'
  3. Overflow checking: Length calculations must check for integer overflow
  4. No undefined behavior: Even with malicious inputs
  5. Thread safety: Individual SafeString objects need not be thread-safe, but the library must not use global state
  6. Valgrind clean: No memory leaks, no invalid accesses
  7. CERT C compliant: Follow SEI CERT C Coding Standard

Example Usage and Output

#include "safestring.h"
#include <stdio.h>

int main(void) {
    // Bounded array operations
    char path[20];
    size_t needed;

    needed = safe_strcpy(path, "/home/user", sizeof(path));
    printf("After copy: '%s' (needed %zu bytes)\n", path, needed);
    // Output: After copy: '/home/user' (needed 10 bytes)

    needed = safe_strcat(path, "/documents/important/file.txt", sizeof(path));
    printf("After cat: '%s' (needed %zu bytes)\n", path, needed);
    // Output: After cat: '/home/user/document' (needed 39 bytes)
    // Note: truncated! needed >= sizeof(path)

    if (needed >= sizeof(path)) {
        printf("WARNING: Path truncated!\n");
    }

    // SafeString operations
    SafeString *s = ss_from_cstr("Hello");
    printf("Initial: '%s' (len=%zu, cap=%zu)\n",
           ss_cstr(s), ss_len(s), ss_capacity(s));
    // Output: Initial: 'Hello' (len=5, cap=16)

    ss_append(s, ", World!");
    printf("After append: '%s'\n", ss_cstr(s));
    // Output: After append: 'Hello, World!'

    ss_to_upper(s);
    printf("Upper: '%s'\n", ss_cstr(s));
    // Output: Upper: 'HELLO, WORLD!'

    // Safe formatting
    char buffer[32];
    int ret = safe_sprintf(buffer, sizeof(buffer),
                          "User: %s, ID: %d",
                          "Administrator", 12345);
    printf("Formatted: '%s' (would need %d)\n", buffer, ret);
    // Output: Formatted: 'User: Administrator, ID: 12345' (would need 30)

    ss_free(s);
    return 0;
}

Real-World Outcome

When you complete this project, you will have a library that prevents this:

$ # Compile vulnerable program
$ gcc -o vuln vuln.c

$ # Normal usage
$ ./vuln "Hello"
Copied: Hello

$ # Stack smashing attack (without safe string library)
$ ./vuln $(python -c "print('A'*200)")
*** stack smashing detected ***
Aborted (core dumped)

$ # With your safe string library
$ ./safe_vuln $(python -c "print('A'*200)")
Copied: AAAAAAAAAAAAAAAA (truncated)
Warning: Input too long, truncated from 200 to 16 bytes

Solution Architecture

High-Level Design

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                        SAFE STRING LIBRARY                                  │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  ┌─────────────────────────────────────────────────────────────────────┐    │
    │  │                     PUBLIC API LAYER                                │    │
    │  │  ┌─────────────────────┐  ┌──────────────────────────────────────┐  │    │
    │  │  │ Bounded Functions   │  │        SafeString Type               │  │    │
    │  │  │                     │  │                                      │  │    │
    │  │  │ safe_strcpy()       │  │ ss_create(), ss_free()               │  │    │
    │  │  │ safe_strcat()       │  │ ss_append(), ss_insert()             │  │    │
    │  │  │ safe_sprintf()      │  │ ss_find(), ss_compare()              │  │    │
    │  │  │ safe_substr()       │  │ ss_to_upper(), ss_trim()             │  │    │
    │  │  └─────────────────────┘  └──────────────────────────────────────┘  │    │
    │  └─────────────────────────────────────────────────────────────────────┘    │
    │                               │                                             │
    │                               ▼                                             │
    │  ┌─────────────────────────────────────────────────────────────────────┐    │
    │  │                     CORE SAFETY LAYER                               │    │
    │  │                                                                     │    │
    │  │  ┌─────────────────────┐  ┌─────────────────────┐                   │    │
    │  │  │ Bounds Checking     │  │ Overflow Detection  │                   │    │
    │  │  │                     │  │                     │                   │    │
    │  │  │ check_bounds()      │  │ safe_add()          │                   │    │
    │  │  │ check_overlap()     │  │ safe_mul()          │                   │    │
    │  │  │ validate_ptr()      │  │ check_size()        │                   │    │
    │  │  └─────────────────────┘  └─────────────────────┘                   │    │
    │  └─────────────────────────────────────────────────────────────────────┘    │
    │                               │                                             │
    │                               ▼                                             │
    │  ┌─────────────────────────────────────────────────────────────────────┐    │
    │  │                     MEMORY MANAGEMENT LAYER                         │    │
    │  │                                                                     │    │
    │  │  ┌─────────────────────┐  ┌─────────────────────┐                   │    │
    │  │  │ SafeString Storage  │  │ Growth Strategy     │                   │    │
    │  │  │                     │  │                     │                   │    │
    │  │  │ struct {            │  │ grow_capacity()     │                   │    │
    │  │  │   size_t capacity;  │  │ shrink_capacity()   │                   │    │
    │  │  │   size_t length;    │  │ align_capacity()    │                   │    │
    │  │  │   char data[];      │  │                     │                   │    │
    │  │  │ }                   │  │                     │                   │    │
    │  │  └─────────────────────┘  └─────────────────────┘                   │    │
    │  └─────────────────────────────────────────────────────────────────────┘    │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘

Module Structure

safe-string-library/
├── include/
│   └── safestring.h       # Public API header
├── src/
│   ├── bounded.c          # safe_strcpy, safe_strcat, etc.
│   ├── safestring.c       # SafeString type implementation
│   ├── safety.c           # Bounds checking, overflow detection
│   ├── safety.h           # Internal safety utilities
│   └── internal.h         # Internal macros and definitions
├── tests/
│   ├── test_bounded.c     # Bounded function tests
│   ├── test_safestring.c  # SafeString tests
│   ├── test_overflow.c    # Integer overflow tests
│   ├── test_fuzzing.c     # Fuzz testing harness
│   └── run_tests.sh       # Test runner
├── examples/
│   ├── path_builder.c     # Building file paths safely
│   ├── log_formatter.c    # Safe logging with formatting
│   └── config_parser.c    # Parsing config files safely
├── Makefile
├── CMakeLists.txt
└── README.md

Key Data Structures

/* Internal SafeString structure (not exposed in header) */
struct SafeString {
    size_t capacity;    /* Total allocated bytes including null */
    size_t length;      /* Current string length excluding null */
    char data[];        /* Flexible array member for string data */
};

/* Constants for growth strategy */
#define SS_MIN_CAPACITY     16
#define SS_GROWTH_FACTOR    2
#define SS_MAX_CAPACITY     (SIZE_MAX / 2)  /* Prevent overflow */

/* Error sentinel for find operations */
#define SS_NPOS SIZE_MAX

/* Internal result type for operations that can fail */
typedef enum {
    SS_OK = 0,
    SS_ERROR_NULL_PTR,
    SS_ERROR_OVERFLOW,
    SS_ERROR_ALLOC,
    SS_ERROR_BOUNDS
} ss_result_t;

Memory Layout

    ┌────────────────────────────────────────────────────────────────────────────┐
    │                    SafeString MEMORY LAYOUT                                │
    ├────────────────────────────────────────────────────────────────────────────┤
    │                                                                            │
    │  ss_from_cstr("Hello")                                                     │
    │                                                                            │
    │  ┌────────────────────────────────────────────────────────────────────┐    │
    │  │ capacity │  length  │ H │ e │ l │ l │ o │\0 │   │   │   │   │     │    │
    │  │    16    │    5     │   │   │   │   │   │   │   │   │   │   │     │    │
    │  └────────────────────────────────────────────────────────────────────┘    │
    │  │← 8 bytes→│← 8 bytes→│←────────── 16 bytes ─────────────────────→│       │
    │                                                                            │
    │  Total malloc'd: 8 + 8 + 16 = 32 bytes                                     │
    │  Usable capacity: 16 bytes (including null)                                │
    │  Max string length: 15 characters                                          │
    │                                                                            │
    │  After ss_append(s, " World!"):                                            │
    │  length now 12, capacity still 16                                          │
    │                                                                            │
    │  ┌────────────────────────────────────────────────────────────────────┐    │
    │  │ capacity │  length  │ H │ e │ l │ l │ o │   │ W │ o │ r │ l │ d   │    │
    │  │    16    │    12    │   │   │   │   │   │   │   │   │   │   │  !  │    │
    │  └────────────────────────────────────────────────────────────────────┘    │
    │                                                                            │
    │  After ss_append(s, "!!!!"):                                               │
    │  Needs 17 bytes, capacity grows to 32                                      │
    │                                                                            │
    └────────────────────────────────────────────────────────────────────────────┘
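The byte counts in the diagram can be checked with sizeof. The following is a minimal sketch, assuming the struct from Key Data Structures; the helper name ss_alloc_size is hypothetical and not part of the library's API. The "8 + 8" header in the diagram holds on typical 64-bit platforms, where size_t is 8 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct shown under Key Data Structures */
struct SafeString {
    size_t capacity;
    size_t length;
    char data[];
};

/* Hypothetical helper: bytes to request from malloc for a given capacity.
   Returns false if header + capacity would overflow size_t. */
static bool ss_alloc_size(size_t capacity, size_t *total) {
    /* sizeof covers only the two size_t fields; data[] adds nothing here */
    size_t header = sizeof(struct SafeString);
    if (capacity > SIZE_MAX - header) {
        return false;
    }
    *total = header + capacity;
    return true;
}
```

With size_t at 8 bytes, ss_alloc_size(16, &total) yields 32, matching the "Total malloc'd" line in the diagram.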

Implementation Guide

Phase 1: Core Safety Functions (Days 1-2)

Goals:

  • Implement integer overflow checking
  • Create bounds validation functions
  • Set up testing infrastructure

Tasks:

  1. Implement overflow-safe arithmetic:
/* safety.c */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>  /* SIZE_MAX lives here, not in <limits.h> */

/* Returns false if a + b would overflow size_t */
bool safe_size_add(size_t a, size_t b, size_t *result) {
    if (a > SIZE_MAX - b) {
        return false;
    }
    *result = a + b;
    return true;
}

/* Returns false if a * b would overflow size_t */
bool safe_size_mul(size_t a, size_t b, size_t *result) {
    if (a == 0 || b == 0) {
        *result = 0;
        return true;
    }
    if (a > SIZE_MAX / b) {
        return false;
    }
    *result = a * b;
    return true;
}
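  2. Create validation macros:
/* internal.h */
/* __builtin_expect is a GCC/Clang extension; fall back to a plain test elsewhere */
#if defined(__GNUC__) || defined(__clang__)
#define SS_UNLIKELY(x) __builtin_expect(!!(x), 0)
#define SS_LIKELY(x)   __builtin_expect(!!(x), 1)
#else
#define SS_UNLIKELY(x) (!!(x))
#define SS_LIKELY(x)   (!!(x))
#endif

/* Both macros return early, so use them only in functions returning ss_result_t */
#define SS_VALIDATE_PTR(ptr) \
    do { \
        if (SS_UNLIKELY((ptr) == NULL)) { \
            return SS_ERROR_NULL_PTR; \
        } \
    } while (0)

/* idx == len is accepted (a valid insertion point, and the terminator's position);
   element access that must stay inside the string should test idx >= len instead */
#define SS_VALIDATE_BOUNDS(idx, len) \
    do { \
        if (SS_UNLIKELY((idx) > (len))) { \
            return SS_ERROR_BOUNDS; \
        } \
    } while (0)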

Checkpoint: All safety utilities pass unit tests with edge cases.
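As an illustration of what "edge cases" might mean for this checkpoint, here is a small sketch that exercises the two helpers at the boundaries of size_t. The helpers are copied from above so the block compiles on its own; the test function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copies of the Phase 1 helpers so the sketch is self-contained */
static bool safe_size_add(size_t a, size_t b, size_t *result) {
    if (a > SIZE_MAX - b) {
        return false;
    }
    *result = a + b;
    return true;
}

static bool safe_size_mul(size_t a, size_t b, size_t *result) {
    if (a == 0 || b == 0) {
        *result = 0;
        return true;
    }
    if (a > SIZE_MAX / b) {
        return false;
    }
    *result = a * b;
    return true;
}

/* Hypothetical test: probes the values right at and just past SIZE_MAX */
static void test_safety_edge_cases(void) {
    size_t r = 0;

    /* Addition: the largest sum that fits, then one past it */
    assert(safe_size_add(0, 0, &r) && r == 0);
    assert(safe_size_add(SIZE_MAX - 1, 1, &r) && r == SIZE_MAX);
    assert(safe_size_add(SIZE_MAX, 0, &r) && r == SIZE_MAX);
    assert(!safe_size_add(SIZE_MAX, 1, &r));
    assert(!safe_size_add(1, SIZE_MAX, &r));

    /* Multiplication: zero operands, identity, and the first overflow */
    assert(safe_size_mul(0, SIZE_MAX, &r) && r == 0);
    assert(safe_size_mul(SIZE_MAX, 1, &r) && r == SIZE_MAX);
    assert(!safe_size_mul(SIZE_MAX, 2, &r));
    assert(!safe_size_mul(SIZE_MAX / 2 + 1, 2, &r));
}
```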

Phase 2: Bounded String Functions (Days 3-5)

Goals:

  • Implement strlcpy semantics
  • Implement strlcat semantics
  • Add safe sprintf

Tasks:

  1. Implement safe_strcpy (strlcpy semantics):
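/* bounded.c */
#include <stddef.h>
#include <string.h>

size_t safe_strcpy(char *dst, const char *src, size_t dstsize) {
    /* NULL policy (a design choice): NULL src copies nothing and returns 0 */
    if (src == NULL) {
        return 0;
    }

    const char *s = src;
    size_t n = dstsize;

    if (dst == NULL || n == 0) {
        /* No space for even the null terminator */
        return strlen(src);
    }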

    /* Copy as many bytes as will fit */
    while (n > 1 && *s != '\0') {
        *dst++ = *s++;
        n--;
    }

    /* Always null-terminate (dstsize > 0) */
    *dst = '\0';

    /* Return total length of src */
    while (*s != '\0') {
        s++;
    }

    return (size_t)(s - src);
}
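  2. Implement safe_strcat (strlcat semantics):
size_t safe_strcat(char *dst, const char *src, size_t dstsize) {
    /* NULL policy (a design choice): NULL src appends nothing,
       NULL dst is treated like dstsize == 0 */
    if (src == NULL) {
        src = "";
    }
    if (dst == NULL) {
        return strlen(src);
    }

    char *d = dst;
    const char *s = src;
    size_t n = dstsize;
    size_t dlen;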

    /* Find end of dst within dstsize bytes */
    while (n > 0 && *d != '\0') {
        d++;
        n--;
    }
    dlen = (size_t)(d - dst);

    if (n == 0) {
        /* dst was not null-terminated within dstsize */
        return dlen + strlen(src);
    }

    /* Copy src, leaving room for null */
    while (*s != '\0') {
        if (n > 1) {
            *d++ = *s;
            n--;
        }
        s++;
    }
    *d = '\0';

    return dlen + (size_t)(s - src);
}
  3. Implement safe_sprintf:
#include <stdarg.h>
#include <stdio.h>

int safe_sprintf(char *dst, size_t dstsize, const char *fmt, ...) {
    va_list ap;
    int ret;

    if (dstsize == 0) {
        /* Can't write anything, but count what would be written */
        va_start(ap, fmt);
        ret = vsnprintf(NULL, 0, fmt, ap);
        va_end(ap);
        return ret;
    }

    va_start(ap, fmt);
    ret = vsnprintf(dst, dstsize, fmt, ap);
    va_end(ap);

    /* vsnprintf guarantees null-termination if dstsize > 0 */
    return ret;
}

Checkpoint: Bounded functions match BSD strlcpy/strlcat behavior in all tests.
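The return values of these functions are what make truncation detectable: a result greater than or equal to the buffer size means the output did not fit. The sketch below shows one way a caller might use that. The helpers build_path and format_fits are hypothetical, and the two compact copy/concat functions are stand-ins with strlcpy/strlcat semantics so the block compiles alone.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Compact stand-in with strlcpy semantics */
static size_t safe_strcpy(char *dst, const char *src, size_t dstsize) {
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Compact stand-in with strlcat semantics */
static size_t safe_strcat(char *dst, const char *src, size_t dstsize) {
    size_t dlen = 0;
    while (dlen < dstsize && dst[dlen] != '\0') {
        dlen++;
    }
    if (dlen == dstsize) {
        return dlen + strlen(src);  /* dst was not terminated within dstsize */
    }
    return dlen + safe_strcpy(dst + dlen, src, dstsize - dlen);
}

/* Hypothetical helper: joins dir and name with '/', failing on truncation */
static bool build_path(char *out, size_t outsize, const char *dir, const char *name) {
    if (safe_strcpy(out, dir, outsize) >= outsize) return false;
    if (safe_strcat(out, "/", outsize) >= outsize) return false;
    if (safe_strcat(out, name, outsize) >= outsize) return false;
    return true;
}

/* Hypothetical helper: true only if the formatted output fit completely */
static bool format_fits(char *dst, size_t dstsize, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int ret = vsnprintf(dst, dstsize, fmt, ap);
    va_end(ap);

    if (ret < 0) {
        return false;               /* encoding or output error */
    }
    return (size_t)ret < dstsize;   /* ret >= dstsize means truncated */
}
```

Even when these helpers report failure, the destination is still null-terminated (for a nonzero size), so the caller can log the partial result safely before rejecting it.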

Phase 3: SafeString Core (Days 6-9)

Goals:

  • Implement creation/destruction
  • Implement basic operations
  • Handle memory growth

Tasks:

  1. Core creation functions:
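/* safestring.c */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include "safestring.h"  /* Public API */
#include "safety.h"      /* safe_size_add */
#include "internal.h"    /* Assumed to hold SS_MIN_CAPACITY and related constants */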

static size_t align_capacity(size_t requested) {
    /* Round up to power of 2, minimum SS_MIN_CAPACITY */
    if (requested < SS_MIN_CAPACITY) {
        return SS_MIN_CAPACITY;
    }

    size_t cap = SS_MIN_CAPACITY;
    while (cap < requested && cap < SS_MAX_CAPACITY) {
        cap *= 2;
    }
    return cap;
}

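SafeString *ss_create(size_t initial_capacity) {
    size_t cap = align_capacity(initial_capacity > 0 ? initial_capacity : SS_MIN_CAPACITY);

    /* align_capacity stops doubling at SS_MAX_CAPACITY, so it can return less
       than requested; reject that instead of under-allocating */
    if (cap < initial_capacity) {
        return NULL;
    }

    /* Overflow check for total allocation */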
    size_t total;
    if (!safe_size_add(sizeof(SafeString), cap, &total)) {
        return NULL;
    }

    SafeString *s = malloc(total);
    if (s == NULL) {
        return NULL;
    }

    s->capacity = cap;
    s->length = 0;
    s->data[0] = '\0';

    return s;
}

SafeString *ss_from_cstr(const char *cstr) {
    if (cstr == NULL) {
        return ss_create(0);
    }

    size_t len = strlen(cstr);
    size_t needed;
    if (!safe_size_add(len, 1, &needed)) {
        return NULL;
    }

    SafeString *s = ss_create(needed);
    if (s == NULL) {
        return NULL;
    }

    memcpy(s->data, cstr, len + 1);
    s->length = len;

    return s;
}

void ss_free(SafeString *s) {
    free(s);  /* free(NULL) is safe */
}
  2. Growth strategy:
static bool ss_grow(SafeString **sp, size_t min_capacity) {
    SafeString *s = *sp;

    if (s->capacity >= min_capacity) {
        return true;  /* Already big enough */
    }

    /* Calculate new capacity (double until big enough) */
    size_t new_cap = s->capacity;
    while (new_cap < min_capacity && new_cap < SS_MAX_CAPACITY) {
        new_cap *= SS_GROWTH_FACTOR;
    }

    if (new_cap < min_capacity) {
        return false;  /* Would exceed maximum */
    }

    /* Reallocate */
    size_t total;
    if (!safe_size_add(sizeof(SafeString), new_cap, &total)) {
        return false;
    }

    SafeString *new_s = realloc(s, total);
    if (new_s == NULL) {
        return false;
    }

    new_s->capacity = new_cap;
    *sp = new_s;
    return true;
}
  3. Append operation:
bool ss_append(SafeString *s, const char *cstr) {
    if (s == NULL || cstr == NULL) {
        return false;
    }

    size_t add_len = strlen(cstr);
    size_t new_len;
    if (!safe_size_add(s->length, add_len, &new_len)) {
        return false;
    }

    size_t needed;
    if (!safe_size_add(new_len, 1, &needed)) {
        return false;
    }

    /* We can't use ss_grow with a pointer to s from outside... */
    /* This is a design consideration - see Architecture note */
    if (s->capacity < needed) {
        return false;  /* Would need reallocation - not possible with current ptr */
    }

    memcpy(s->data + s->length, cstr, add_len + 1);
    s->length = new_len;

    return true;
}

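Note: ss_append as written cannot grow the string. realloc may move the block, and because the function receives the pointer by value, the caller would be left holding a stale pointer. It therefore fails whenever the current capacity is insufficient. For production, use a different approach: either return the (possibly new) pointer or take a SafeString ** (double indirection), as ss_grow does.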
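One way the double-indirection option could look is sketched below. This is not the guide's implementation: ss_append_grow and demo_create are hypothetical names, the constructor is pared down, and the growth loop is inlined rather than calling ss_grow, so that the block compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct SafeString {
    size_t capacity;   /* bytes available in data, including the terminator */
    size_t length;     /* current length, excluding the terminator */
    char data[];
} SafeString;

/* Minimal constructor for this sketch only */
static SafeString *demo_create(size_t capacity) {
    if (capacity == 0 || capacity > SIZE_MAX / 2) return NULL;
    SafeString *s = malloc(sizeof *s + capacity);
    if (s == NULL) return NULL;
    s->capacity = capacity;
    s->length = 0;
    s->data[0] = '\0';
    return s;
}

/* Hypothetical variant of ss_append: takes SafeString ** so it can store the
   pointer realloc returns. On failure *sp is left unchanged and still valid.
   cstr must not point into the string being appended to. */
static bool ss_append_grow(SafeString **sp, const char *cstr) {
    if (sp == NULL || *sp == NULL || cstr == NULL) return false;

    SafeString *s = *sp;
    size_t add_len = strlen(cstr);

    /* Keep length + add_len + 1 at or below SIZE_MAX / 2 (no wraparound) */
    if (add_len >= SIZE_MAX / 2 - s->length) return false;
    size_t needed = s->length + add_len + 1;

    if (needed > s->capacity) {
        size_t new_cap = s->capacity;
        while (new_cap < needed) {
            new_cap *= 2;   /* safe: new_cap < needed <= SIZE_MAX / 2 */
        }
        if (new_cap > SIZE_MAX - sizeof *s) return false;

        SafeString *grown = realloc(s, sizeof *s + new_cap);
        if (grown == NULL) return false;   /* old block is untouched */
        grown->capacity = new_cap;
        *sp = s = grown;                   /* publish the new address */
    }

    memcpy(s->data + s->length, cstr, add_len + 1);  /* copies the terminator too */
    s->length += add_len;
    return true;
}
```

A caller would write ss_append_grow(&s, " World!") and keep using s afterwards. Starting from capacity 16, appending "Hello", " World!" and "!!!!" in this sketch reproduces the 16 to 32 growth shown in the Memory Layout diagram.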

Checkpoint: SafeString creation, access, and basic modifications work correctly.

Phase 4: Advanced Operations (Days 10-12)

Goals:

  • Implement search operations
  • Implement transformations
  • Add formatting support

Tasks:

  1. Search operations:
size_t ss_find(const SafeString *s, const char *needle) {
    if (s == NULL || needle == NULL) {
        return SS_NPOS;
    }

    const char *found = strstr(s->data, needle);
    if (found == NULL) {
        return SS_NPOS;
    }

    return (size_t)(found - s->data);
}

bool ss_starts_with(const SafeString *s, const char *prefix) {
    if (s == NULL || prefix == NULL) {
        return false;
    }

    size_t prefix_len = strlen(prefix);
    if (prefix_len > s->length) {
        return false;
    }

    return memcmp(s->data, prefix, prefix_len) == 0;
}

bool ss_ends_with(const SafeString *s, const char *suffix) {
    if (s == NULL || suffix == NULL) {
        return false;
    }

    size_t suffix_len = strlen(suffix);
    if (suffix_len > s->length) {
        return false;
    }

    return memcmp(s->data + s->length - suffix_len, suffix, suffix_len) == 0;
}
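  2. Transformations:
#include <ctype.h>  /* isspace, used by ss_trim */

void ss_to_upper(SafeString *s) {
    if (s == NULL) return;

    /* ASCII-only: relies on 'a'..'z' being contiguous */
    for (size_t i = 0; i < s->length; i++) {
        if (s->data[i] >= 'a' && s->data[i] <= 'z') {
            s->data[i] = (char)(s->data[i] - ('a' - 'A'));
        }
    }
}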

void ss_trim(SafeString *s) {
    if (s == NULL || s->length == 0) return;

    /* Find first non-whitespace */
    size_t start = 0;
    while (start < s->length && isspace((unsigned char)s->data[start])) {
        start++;
    }

    /* Find last non-whitespace */
    size_t end = s->length;
    while (end > start && isspace((unsigned char)s->data[end - 1])) {
        end--;
    }

    /* Shift and truncate */
    size_t new_len = end - start;
    if (start > 0) {
        memmove(s->data, s->data + start, new_len);
    }
    s->data[new_len] = '\0';
    s->length = new_len;
}

Checkpoint: All operations work correctly with comprehensive test cases.
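Note that ss_find above relies on strstr, which stops at the first null byte and ignores s->length. If the library should honor the stored length instead, a length-bounded search is one option. The sketch below is an assumption about how that could be done, not the guide's method; bounded_find is a hypothetical helper, and SS_NPOS is redefined locally so the block compiles alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SS_NPOS SIZE_MAX   /* same sentinel as in Key Data Structures */

/* Hypothetical helper: finds needle in hay using explicit lengths only,
   so embedded '\0' bytes do not end the search early. */
static size_t bounded_find(const char *hay, size_t hay_len,
                           const char *needle, size_t needle_len) {
    if (hay == NULL || needle == NULL) return SS_NPOS;
    if (needle_len == 0) return 0;            /* empty needle matches at 0, like strstr */
    if (needle_len > hay_len) return SS_NPOS;

    /* Last start index at which the needle still fits inside hay */
    size_t last = hay_len - needle_len;
    for (size_t i = 0; i <= last; i++) {
        if (hay[i] == needle[0] &&
            memcmp(hay + i, needle, needle_len) == 0) {
            return i;
        }
    }
    return SS_NPOS;
}
```

An ss_find built on this would pass s->data and s->length as the haystack and strlen(needle) as the needle length.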

Phase 5: Testing and Hardening (Days 13-14)

Goals:

  • Comprehensive test coverage
  • Edge case handling
  • Security review

Tasks:

  1. Edge case tests:
/* test_bounded.c */
#include <assert.h>
#include <string.h>
#include "safestring.h"
void test_strcpy_empty_dest(void) {
    char dest[1] = "";
    size_t ret = safe_strcpy(dest, "Hello", 1);
    assert(dest[0] == '\0');  /* Must be null-terminated */
    assert(ret == 5);         /* Returns strlen of src */
}

void test_strcpy_zero_size(void) {
    char dest[10] = "original";
    size_t ret = safe_strcpy(dest, "new", 0);
    assert(strcmp(dest, "original") == 0);  /* Unchanged */
    assert(ret == 3);  /* Still returns strlen(src) */
}

void test_strcat_full_buffer(void) {
    char dest[10] = "Hello";
    size_t ret = safe_strcat(dest, "World!", sizeof(dest));
    assert(strlen(dest) == 9);  /* Max that fits */
    assert(ret == 11);           /* Truncation detected */
    assert(dest[9] == '\0');     /* Null-terminated */
}
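  2. Fuzz testing:
/* test_fuzzing.c */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "safestring.h"

void fuzz_safe_strcpy(int iterations) {
    /* Print the seed so a failing run can be reproduced */
    unsigned seed = (unsigned)time(NULL);
    srand(seed);
    printf("Fuzz seed: %u\n", seed);

    for (int i = 0; i < iterations; i++) {
        /* Random destination size (0 to 1000) */
        size_t dstsize = (size_t)(rand() % 1001);
        char *dst = malloc(dstsize > 0 ? dstsize : 1);

        /* Random source length (0 to 2000) */
        size_t srclen = (size_t)(rand() % 2001);
        char *src = malloc(srclen + 1);
        assert(dst != NULL && src != NULL);
        memset(src, 'A', srclen);
        src[srclen] = '\0';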

        /* Call function */
        size_t ret = safe_strcpy(dst, src, dstsize);

        /* Verify invariants */
        assert(ret == srclen);  /* Returns strlen(src) */
        if (dstsize > 0) {
            assert(dst[dstsize > srclen ? srclen : dstsize - 1] == '\0');
        }

        free(dst);
        free(src);
    }
    printf("Fuzz test passed: %d iterations\n", iterations);
}
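The random loop above only varies sizes. A coverage-guided fuzzer can also vary content. As a rough sketch, the same invariants could be wrapped in a libFuzzer entry point (LLVMFuzzerTestOneInput is libFuzzer's standard hook, typically built with clang -fsanitize=fuzzer,address). The input encoding here, with the first byte selecting the destination size, is an arbitrary choice for illustration, and the safe_strcpy shown is a stand-in to be replaced by the library's version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in with strlcpy semantics; swap in the library's safe_strcpy */
static size_t safe_strcpy(char *dst, const char *src, size_t dstsize) {
    size_t srclen = strlen(src);
    if (dstsize > 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* libFuzzer entry point: called repeatedly with mutated inputs */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1) {
        return 0;
    }

    size_t dstsize = data[0];   /* first byte picks a size from 0 to 255 */
    size_t payload = size - 1;

    /* Exact-size allocations let AddressSanitizer catch a one-byte overrun */
    char *src = malloc(payload + 1);
    char *dst = malloc(dstsize > 0 ? dstsize : 1);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return 0;
    }
    memcpy(src, data + 1, payload);
    src[payload] = '\0';

    size_t expected = strlen(src);   /* payload may contain earlier '\0' bytes */
    size_t ret = safe_strcpy(dst, src, dstsize);

    /* Same invariants as the random test, plus a content check */
    assert(ret == expected);
    if (dstsize > 0) {
        size_t copied = expected < dstsize - 1 ? expected : dstsize - 1;
        assert(dst[copied] == '\0');
        assert(memcmp(dst, src, copied) == 0);
    }

    free(src);
    free(dst);
    return 0;
}
```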

Testing Strategy

Test Categories

| Category | Purpose | Examples |
|---|---|---|
| Basic Functionality | Core operations work | Copy, append, find |
| Boundary Conditions | Edge cases handled | Empty strings, size 0 |
| Overflow Protection | Integer overflows caught | SIZE_MAX inputs |
| Memory Safety | No buffer overflows | Truncation, null-term |
| Null Safety | NULL inputs handled | NULL src, NULL dst |

Critical Test Cases

/* Must-pass tests for security certification */

/* 1. Buffer overflow prevention */
void test_no_overflow_on_long_input(void) {
    char small[8];
    safe_strcpy(small, "This is a very long string", sizeof(small));
    assert(strlen(small) == 7);  /* Truncated to fit */
    assert(small[7] == '\0');    /* Null-terminated */
}

/* 2. Integer overflow in length calculation */
void test_length_overflow_detected(void) {
    /* Simulate concatenating strings that would overflow size_t */
    SafeString *s = ss_create(10);
    /* This test requires mocking or specific setup */
    /* The library should reject operations that would overflow */
    ss_free(s);  /* Do not leak the fixture */
}

/* 3. NULL pointer safety */
void test_null_inputs_safe(void) {
    char buf[10] = "test";
    size_t ret = safe_strcpy(buf, NULL, sizeof(buf));
    /* Define behavior: either return 0 or leave buf unchanged */
    (void)ret;

    /* Reaching the end of this function without crashing is the test */
    (void)safe_strcat(NULL, "test", 10);
}

/* 4. Zero-size buffer */
void test_zero_size_buffer(void) {
    char *nullbuf = NULL;
    size_t ret = safe_strcpy(nullbuf, "test", 0);
    assert(ret == 4);  /* Still returns strlen(src) */
    /* No crash, no write */
}
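Test 2 above notes that it needs "mocking or specific setup", since a string with a length near SIZE_MAX cannot actually be allocated. One possible setup, sketched here under that assumption, is to pull the size computation from ss_append into a helper and test it with plain numbers. The helper name append_size_needed is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the Phase 1 helper so the sketch is self-contained */
static bool safe_size_add(size_t a, size_t b, size_t *result) {
    if (a > SIZE_MAX - b) {
        return false;
    }
    *result = a + b;
    return true;
}

/* Hypothetical helper: the size arithmetic from ss_append in isolation,
   so it can be fed lengths that no real allocation could reach. */
static bool append_size_needed(size_t cur_len, size_t add_len, size_t *needed) {
    size_t new_len;
    if (!safe_size_add(cur_len, add_len, &new_len)) {
        return false;
    }
    return safe_size_add(new_len, 1, needed);   /* room for the terminator */
}

static void test_length_overflow_arithmetic(void) {
    size_t needed = 0;

    /* Ordinary case: 5 + 7 characters plus the terminator */
    assert(append_size_needed(5, 7, &needed) && needed == 13);

    /* The sum of the lengths itself overflows */
    assert(!append_size_needed(SIZE_MAX, 1, &needed));
    assert(!append_size_needed(SIZE_MAX / 2 + 1, SIZE_MAX / 2 + 1, &needed));

    /* The sum fits, but adding the terminator does not */
    assert(!append_size_needed(SIZE_MAX - 1, 1, &needed));
}
```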

Common Pitfalls & Debugging

Frequent Mistakes

| Pitfall | Symptom | Solution |
|---|---|---|
| Off-by-one in null terminator | Unterminated strings | Always reserve n+1 bytes for n chars |
| Forgetting to check dstsize == 0 | Crash or size wraparound (dstsize - 1 becomes SIZE_MAX) | Handle 0 as a special case first |
| Using strlen on unterminated string | Crash or garbage | Use strnlen or a known length |
| Integer overflow in size calc | Allocate too little | Check with safe_size_add before doing the math |
| Assigning realloc's result straight to the original pointer | Memory leak if realloc returns NULL | Store the result in a temporary and check it before overwriting |
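Two of these pitfalls have short, mechanical fixes. The sketch below shows one way to write them. Both helper names are hypothetical; bounded_len uses memchr because strnlen is a POSIX function rather than part of standard C11.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grows *buf to new_size bytes.
   On failure *buf is untouched, so the caller can still use and free it. */
static bool grow_buffer(char **buf, size_t new_size) {
    if (buf == NULL || new_size == 0) {
        return false;
    }
    /* Writing *buf = realloc(*buf, ...) would lose the old block on failure */
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL) {
        return false;
    }
    *buf = tmp;
    return true;
}

/* Hypothetical helper: length of s, examining at most max bytes.
   Returns max if no terminator is found in that range. */
static size_t bounded_len(const char *s, size_t max) {
    const char *end = memchr(s, '\0', max);
    return end != NULL ? (size_t)(end - s) : max;
}
```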

Debugging Strategies

Use AddressSanitizer:

gcc -fsanitize=address -g -o test test.c safestring.c
./test

Valgrind for memory issues:

valgrind --leak-check=full --track-origins=yes ./test

Custom debug builds:

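#ifdef DEBUG
/* Strict C11: ##__VA_ARGS__ is a GNU extension, so the format string
   travels inside __VA_ARGS__ instead */
#define SS_DEBUG_PRINT(...) \
    do { \
        fprintf(stderr, "[SS DEBUG] "); \
        fprintf(stderr, __VA_ARGS__); \
        fputc('\n', stderr); \
    } while (0)
#else
#define SS_DEBUG_PRINT(...) ((void)0)
#endif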

Extensions & Challenges

Beginner Extensions

  • Add ss_split() to split string into array
  • Add ss_join() to join array with delimiter
  • Add ss_replace() for find-and-replace
  • Add Unicode support (UTF-8 validation)

Intermediate Extensions

  • Implement rope data structure for large strings
  • Add copy-on-write optimization
  • Create small-string optimization (SSO)
  • Add regex matching with safe bounds

Advanced Extensions

  • Thread-safe variant with atomic operations
  • Memory pool allocator for SafeString
  • Zero-copy slicing and views
  • Compile-time bounds checking with static analysis

Real-World Connections

Famous Buffer Overflow Vulnerabilities

Morris Worm (1988): Exploited buffer overflow in fingerd via gets()

Code Red (2001): IIS buffer overflow in URL handling

Heartbleed (2014): OpenSSL buffer over-read exposing memory

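Each of these came down to trusting a length that was never checked against the buffer. Bounded copies and length-tracked strings, as built in this project, target exactly that pattern.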

Industry Standards

CERT C Secure Coding Standard:

  • STR31-C: Guarantee that storage for strings has sufficient space for data and null terminator
  • STR32-C: Do not pass a non-null-terminated string to a library function that expects a string
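To make these two rules concrete: strncpy writes no terminator when the source is at least as long as the size it is given, which is the usual route to an STR32-C violation. A minimal sketch of a compliant copy and of a termination check follows. Both helper names are hypothetical and do not come from the CERT standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strncpy plus explicit termination.
   Copies at most dstsize - 1 characters, then always writes the '\0'
   that strncpy would omit when src is too long (STR31-C, STR32-C). */
static void copy_terminated(char *dst, const char *src, size_t dstsize) {
    if (dst == NULL || src == NULL || dstsize == 0) {
        return;
    }
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
}

/* Hypothetical helper: true if buf holds a '\0' within its first size bytes,
   i.e. it is safe to hand to functions that expect a string. */
static bool is_terminated(const char *buf, size_t size) {
    return buf != NULL && memchr(buf, '\0', size) != NULL;
}
```

Unlike safe_strcpy, this sketch does not report truncation, which is one reason the guide prefers strlcpy semantics.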

MISRA C:

  • Bounded string operations required for safety-critical systems

Resources

Essential Reading

| Topic | Source | Chapter/Section |
|---|---|---|
| String vulnerabilities | “Effective C” by Seacord | Chapter 7: Strings |
| Secure coding | SEI CERT C Coding Standard | STR section |
| strlcpy design | “strlcpy and strlcat” by Miller & de Raadt | OpenBSD paper |
| Buffer overflow exploits | “Hacking: The Art of Exploitation” | Chapter 2 |

  • Previous: P12 (Bug Catalog) - Understanding C pitfalls
  • Next: P14 (Memory Debugger) - Runtime safety checking
  • Related: P16 (Portable Code Checker) - Static analysis

Self-Assessment Checklist

Understanding Verification

  • I can explain exactly how strcpy leads to buffer overflows
  • I understand the difference between strncpy and strlcpy semantics
  • I can identify integer overflow vulnerabilities in string code
  • I know why format strings are dangerous with user input

Implementation Verification

  • safe_strcpy always null-terminates (even when truncating)
  • safe_strcat correctly handles already-full destination
  • Integer overflow is caught before any allocation
  • NULL inputs do not cause crashes

Security Verification

  • No buffer overflows under any input
  • No integer overflows in size calculations
  • No undefined behavior with edge cases
  • Valgrind reports zero errors

Submission / Completion Criteria

Minimum Viable Completion:

  • safe_strcpy and safe_strcat working correctly
  • Basic SafeString creation and access
  • No buffer overflows in any test case

Full Completion:

  • All bounded functions implemented
  • Complete SafeString API
  • Comprehensive test suite (100+ tests)
  • Fuzz testing passes
  • Valgrind clean

Excellence:

  • Small-string optimization implemented
  • Thread-safe variant
  • Integration with static analyzer
  • Benchmarks comparing performance against the standard library functions

The Core Question You’re Answering

“Why are C strings the source of so many security vulnerabilities, and how can we design string-handling APIs that make buffer overflows impossible?”

This project teaches you that security is not an afterthought but a design consideration. By building safe string functions from scratch, you understand exactly what makes code vulnerable and how to write APIs that prevent misuse.


This guide was expanded from EXPERT_C_PROGRAMMING_DEEP_DIVE.md. For the complete learning path, see the project index.