Project 2: Safe String Library

The Core Question: “Why is strcpy considered dangerous, and what would a safe version look like?”

Project Overview

Attribute        Value
Difficulty       Intermediate
Time Estimate    Weekend (8-16 hours)
Language         C
Prerequisites    Project 1 or equivalent comfort with addresses
Main Book        “The C Programming Language” by Kernighan & Ritchie

Learning Objectives

By completing this project, you will:

  1. Understand C string representation - Why “hello” takes 6 bytes, not 5
  2. Master buffer overflow mechanics - See exactly how strcpy corrupts memory
  3. Implement bounds checking - Build safe alternatives to dangerous functions
  4. Practice defensive programming - Always validate before operating
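  5. Develop security intuition - Understand why memory safety bugs dominate security reports (Microsoft and the Chromium project have each reported that roughly 70% of their serious security bugs are memory safety issues)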

Theoretical Foundation

What IS a C String?

A C string is NOT a first-class type. It’s simply:

  • A sequence of char bytes
  • Terminated by a null byte ('\0' = 0x00)
  • Stored in contiguous memory

char str[] = "hello";

// Memory layout (6 bytes total!):
// Index:  0    1    2    3    4    5
// Value: 'h'  'e'  'l'  'l'  'o'  '\0'
// Hex:   0x68 0x65 0x6C 0x6C 0x6F 0x00

Critical Insight: strlen("hello") returns 5, but sizeof("hello") returns 6. The null terminator is essential but not counted by strlen.
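To see the terminator for yourself, a small helper can print every byte of the array. This is an illustrative sketch; dump_bytes is a hypothetical name, not a library function.

```c
#include <stdio.h>

// Hypothetical helper: print each byte of an object as two hex digits,
// so the otherwise invisible '\0' shows up as 00.
void dump_bytes(const void *obj, size_t n) {
    const unsigned char *bytes = obj;
    for (size_t i = 0; i < n; i++) {
        printf("%02X ", bytes[i]);
    }
    printf("\n");
}
```

With char str[] = "hello";, calling dump_bytes(str, sizeof str) prints 68 65 6C 6C 6F 00: six bytes, the last one being the terminator that strlen does not count.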

Why C Strings Are Dangerous

C strings have three fundamental problems:

Problem 1: No Length Information

void dangerous_function(char *str) {
    // str is just an address
    // We have NO IDEA how long the string is
    // We HOPE there's a '\0' somewhere
}

Problem 2: No Bounds Checking

char dest[10];
const char *src = "This string is way too long for dest!";
strcpy(dest, src);  // Copies until '\0' found
                    // Writes 38 bytes (37 chars + '\0') into a 10-byte buffer!

Problem 3: The Destination Size is Unknown

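// strcpy has NO WAY to know how big dest is. A simplified version
// (renamed so it doesn't clash with the real strcpy in <string.h>):
char *simple_strcpy(char *dest, const char *src) {
    char *start = dest;    // strcpy returns the original dest pointer
    // Only knows dest's address, not its size!
    while (*src) {
        *dest++ = *src++;  // Just keeps writing...
    }
    *dest = '\0';
    return start;
}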

Buffer Overflow: Step by Step

When you overflow a stack buffer, you corrupt adjacent memory:

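#include <string.h>

void vulnerable(void) {
    char buffer[10];       // 10 bytes
    int authenticated = 0; // 4 bytes, ASSUMED to sit right after buffer

    // Simplified stack layout (real order and padding are compiler-dependent):
    // [buffer: 10 bytes][authenticated: 4 bytes][saved rbp][return addr]

    strcpy(buffer, "AAAAAAAAAABBBB");  // 14 chars + '\0' = 15 bytes!
    // buffer:        "AAAAAAAAAA" (10 bytes)
    // authenticated: "BBBB"       (overwritten: now 0x42424242)
    // '\0':          lands one byte further, past authenticated
    // Now authenticated != 0, so any `if (authenticated)` check passes!
}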

The String Functions Hall of Shame

Function       Problem                  Safer Alternative
gets()         No length limit at all   fgets() (gets() itself was removed in C11)
strcpy()       No dest size check       strncpy(), strlcpy()
strcat()       No dest size check       strncat(), strlcat()
sprintf()      No buffer size limit     snprintf()
scanf("%s")    No length limit          scanf("%9s"), with a field width one less than the buffer size
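The standard replacement for gets() is fgets(), which does take the buffer size, but it keeps the trailing newline and takes its size as an int. A minimal wrapper, sketched under those assumptions (read_line is a hypothetical name):

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

// Hypothetical wrapper: read one line into buf, at most size-1 chars,
// always NUL-terminated, with any trailing newline removed.
// Returns buf, or NULL on EOF, read error, or an unusable size.
char *read_line(char *buf, size_t size, FILE *in) {
    if (buf == NULL || in == NULL || size == 0 || size > INT_MAX) {
        return NULL;
    }
    if (fgets(buf, (int)size, in) == NULL) {
        return NULL;
    }
    buf[strcspn(buf, "\n")] = '\0';  // strip the newline, if fgets kept one
    return buf;
}
```

If a line is longer than size-1 characters, the remainder stays in the stream and the next call returns it, so callers that care about truncation still need to check.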

The strncpy Problem

strncpy is often recommended but has its own issues:

char dest[10];
strncpy(dest, "hello world", sizeof(dest));
// dest is now: "hello worl" (NO null terminator!)
// Using dest as a string is undefined behavior!

// strncpy also pads with zeros, wasting cycles:
strncpy(dest, "hi", sizeof(dest));
// dest: 'h', 'i', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0'
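If you are stuck with strncpy, the usual workaround is to copy one byte less than the buffer size and write the terminator yourself. A sketch (copy_bounded is a hypothetical name); note that it still zero-pads and still gives the caller no way to detect truncation:

```c
#include <string.h>

// Hypothetical helper: strncpy with a guaranteed terminator.
void copy_bounded(char *dst, const char *src, size_t size) {
    if (size == 0) {
        return;                    // no room even for '\0'
    }
    strncpy(dst, src, size - 1);   // may leave dst unterminated...
    dst[size - 1] = '\0';          // ...so terminate it explicitly
}
```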

The strlcpy Solution (BSD)

strlcpy does what you actually want:

// Prototype, as on OpenBSD:
size_t strlcpy(char *dst, const char *src, size_t size);
// 1. Always null-terminates (if size > 0)
// 2. Returns strlen(src), so you can detect truncation
// 3. Copies at most size-1 characters

char dest[10];
size_t len = strlcpy(dest, "hello world", sizeof(dest));
// dest: "hello wor\0" (properly terminated!)
// len: 11 (original length - so we know truncation happened)

if (len >= sizeof(dest)) {
    printf("Warning: string was truncated!\n");
}
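strlcpy is not part of ISO C, and glibc only gained it in version 2.38, so it may be missing on older Linux systems. Where it is unavailable, standard snprintf offers the same two guarantees: the output is always terminated when size > 0, and the return value is the length it wanted to write. A sketch (copy_checked is a hypothetical name):

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: copy src into dst via snprintf.
// Returns true if src fit completely, false if it was truncated
// (or snprintf reported an error).
bool copy_checked(char *dst, size_t size, const char *src) {
    int n = snprintf(dst, size, "%s", src);
    return n >= 0 && (size_t)n < size;
}
```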

Project Specification

What You’re Building

A bounds-checked string library with these functions:

// Safe string length (with maximum)
ssize_t safe_strlen(const char *s, size_t max_len);

// Safe string copy (always null-terminates)
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);

// Safe string concatenation
size_t safe_strcat(char *dst, const char *src, size_t dst_size);

// Safe substring extraction
ssize_t safe_substr(char *dst, const char *src, size_t start, size_t len, size_t dst_size);

API Design Principles

  1. Always take destination size - Every function that writes must know the limit
  2. Always null-terminate - If dst_size > 0, result is always a valid string
  3. Return useful information - Indicate success, truncation, or error
  4. Validate inputs - Check for NULL pointers, zero sizes

Function Specifications

safe_strlen

// Returns: length of string, or -1 if no null found within max_len
// Note: Prevents reading past buffer bounds
ssize_t safe_strlen(const char *s, size_t max_len);

safe_strcpy

// Returns: length that WOULD have been copied (like snprintf)
// - If return value >= dst_size, truncation occurred
// - Always null-terminates if dst_size > 0
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);

safe_strcat

// Returns: total length that would result (existing + added)
// - If return value >= dst_size, truncation occurred
// - Always null-terminates if dst_size > 0
size_t safe_strcat(char *dst, const char *src, size_t dst_size);
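safe_substr is declared in the API but has no specification here. One possible contract, offered as an assumption rather than the project's required behavior: copy up to len characters of src starting at index start, always terminate dst, return the length of the requested substring (clamped to the end of src) so that a result >= dst_size signals truncation, and return -1 for NULL inputs, a zero dst_size, or a start past the end of src. ssize_t comes from the POSIX header <sys/types.h>, as in the project header.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

// ASSUMED contract (not specified by the project): copy up to len chars of
// src, starting at index start, into dst; always NUL-terminate dst.
// Returns the length of the requested substring (clamped to the end of
// src), so a result >= dst_size means truncation; -1 on invalid input.
ssize_t safe_substr(char *dst, const char *src, size_t start, size_t len,
                    size_t dst_size) {
    if (dst == NULL || dst_size == 0) {
        return -1;
    }
    dst[0] = '\0';                 // dst is a valid string even on error
    if (src == NULL) {
        return -1;
    }

    // Advance to start without ever stepping past the terminator.
    for (size_t i = 0; i < start; i++) {
        if (src[i] == '\0') {
            return -1;             // start lies beyond the end of src
        }
    }
    const char *p = src + start;

    // Measure the substring, stopping at len chars or at the terminator.
    size_t sub_len = 0;
    while (sub_len < len && p[sub_len] != '\0') {
        sub_len++;
    }

    size_t copy_len = (sub_len < dst_size - 1) ? sub_len : dst_size - 1;
    memcpy(dst, p, copy_len);
    dst[copy_len] = '\0';
    return (ssize_t)sub_len;
}
```

Allowing start == strlen(src) (yielding an empty string) while rejecting anything beyond it mirrors how many substring APIs treat the end position as valid.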

Solution Architecture

Module Design

safe_string/
├── safe_string.h       # Public API
├── safe_string.c       # Implementation
├── test_safe_string.c  # Unit tests
├── demo_overflow.c     # Overflow demonstrations
├── benchmark.c         # Performance comparison
└── Makefile

Header File Structure

#ifndef SAFE_STRING_H
#define SAFE_STRING_H

#include <stddef.h>
#include <sys/types.h>

// Error codes
#define SAFE_STR_OK          0
#define SAFE_STR_TRUNCATED   1
#define SAFE_STR_NULL_INPUT  -1
#define SAFE_STR_NO_TERMINATOR -2

// Core functions
ssize_t safe_strlen(const char *s, size_t max_len);
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);
size_t safe_strcat(char *dst, const char *src, size_t dst_size);
ssize_t safe_substr(char *dst, const char *src, size_t start, size_t len, size_t dst_size);

// Utility functions
int safe_strcmp(const char *s1, const char *s2, size_t max_len);
char *safe_strdup(const char *s, size_t max_len);

#endif
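The header also declares two utility functions that the guide never specifies. A sketch under assumed semantics: safe_strcmp behaves like strncmp but tolerates NULL (NULL sorts before any string), and safe_strdup behaves like POSIX strndup (copy at most max_len characters into a new, always-terminated heap buffer that the caller frees).

```c
#include <stdlib.h>
#include <string.h>

// ASSUMED semantics: compare at most max_len bytes, like strncmp;
// NULL sorts before any string, and two NULLs compare equal.
int safe_strcmp(const char *s1, const char *s2, size_t max_len) {
    if (s1 == NULL || s2 == NULL) {
        return (s1 == s2) ? 0 : (s1 == NULL ? -1 : 1);
    }
    for (size_t i = 0; i < max_len; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2) {
            return (c1 < c2) ? -1 : 1;
        }
        if (c1 == '\0') {
            return 0;              // both strings ended together
        }
    }
    return 0;                      // first max_len bytes are equal
}

// ASSUMED semantics: copy at most max_len chars into a new heap buffer,
// always NUL-terminated. Returns NULL on NULL input or allocation failure.
char *safe_strdup(const char *s, size_t max_len) {
    if (s == NULL) {
        return NULL;
    }
    size_t len = 0;
    while (len < max_len && s[len] != '\0') {
        len++;                     // never reads past max_len bytes
    }
    char *copy = malloc(len + 1);  // +1 for the terminator
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```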

Implementation Guide

Phase 1: safe_strlen (1-2 hours)

Goal: Count characters without reading past a maximum.

ssize_t safe_strlen(const char *s, size_t max_len) {
    if (s == NULL) {
        return -1;  // Invalid input
    }

    for (size_t i = 0; i < max_len; i++) {
        if (s[i] == '\0') {
            return (ssize_t)i;
        }
    }

    return -1;  // No null terminator found within max_len
}

Test Cases:

assert(safe_strlen("hello", 100) == 5);
assert(safe_strlen("hello", 5) == -1);   // '\0' is at index 5, outside max_len
assert(safe_strlen("hello", 6) == 5);    // Just enough: max_len covers the '\0'
assert(safe_strlen("", 100) == 0);       // Empty string
assert(safe_strlen(NULL, 100) == -1);    // NULL input

Phase 2: safe_strcpy (2-3 hours)

Goal: Copy with guaranteed null-termination.

size_t safe_strcpy(char *dst, const char *src, size_t dst_size) {
    if (dst == NULL || dst_size == 0) {
        return 0;
    }

    if (src == NULL) {
        dst[0] = '\0';
        return 0;
    }

    size_t src_len = 0;
    while (src[src_len] != '\0') {
        src_len++;
    }

    // Copy at most dst_size - 1 characters
    size_t copy_len = (src_len < dst_size - 1) ? src_len : dst_size - 1;

    for (size_t i = 0; i < copy_len; i++) {
        dst[i] = src[i];
    }
    dst[copy_len] = '\0';  // Always null-terminate

    return src_len;  // Return original length for truncation detection
}

Key Insight: By returning src_len, callers can detect truncation:

char buf[10];
if (safe_strcpy(buf, input, sizeof(buf)) >= sizeof(buf)) {
    printf("Warning: input was truncated\n");
}

Phase 3: safe_strcat (2-3 hours)

Goal: Concatenate without overflowing destination.

size_t safe_strcat(char *dst, const char *src, size_t dst_size) {
    if (dst == NULL || dst_size == 0) {
        return 0;
    }

    // Find current length of dst (within dst_size)
    size_t dst_len = 0;
    while (dst_len < dst_size && dst[dst_len] != '\0') {
        dst_len++;
    }

    if (dst_len >= dst_size) {
        // dst is not properly terminated within dst_size
        // Force termination and return error
        dst[dst_size - 1] = '\0';
        return dst_size;  // Indicates error
    }

    if (src == NULL) {
        return dst_len;
    }

    // Calculate source length
    size_t src_len = 0;
    while (src[src_len] != '\0') {
        src_len++;
    }

    // Space available for copying (including null terminator)
    size_t space_left = dst_size - dst_len - 1;

    // Copy as much as fits
    size_t copy_len = (src_len < space_left) ? src_len : space_left;

    for (size_t i = 0; i < copy_len; i++) {
        dst[dst_len + i] = src[i];
    }
    dst[dst_len + copy_len] = '\0';

    return dst_len + src_len;  // What the total WOULD be
}

Phase 4: Overflow Demonstration (1-2 hours)

Create a program that shows what happens with unsafe functions:

// demo_overflow.c
#include <stdio.h>
#include <string.h>

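void demonstrate_overflow(void) {
    char buffer[10];
    unsigned int canary = 0xDEADBEEF;  // unsigned: 0xDEADBEEF doesn't fit in int

    printf("BEFORE overflow:\n");
    printf("  buffer address: %p\n", (void*)buffer);
    printf("  canary address: %p\n", (void*)&canary);
    printf("  canary value:   0x%08X\n", canary);

    // Writes 17 bytes (16 chars + '\0') into a 10-byte buffer: undefined
    // behavior. Whether it hits canary depends on the stack layout the
    // compiler chose (compare the addresses above; try -fno-stack-protector).
    strcpy(buffer, "AAAABBBBCCCCDDDD");

    printf("\nAFTER overflow:\n");
    printf("  canary value:   0x%08X\n", canary);
    if (canary != 0xDEADBEEF) {
        printf("  canary is now CORRUPTED!\n");
    }
}

int main(void) {
    demonstrate_overflow();
    return 0;
}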

Compile with AddressSanitizer to see the detection:

$ clang -g -fsanitize=address demo_overflow.c -o demo
$ ./demo
# AddressSanitizer aborts with a stack-buffer-overflow report pointing at the strcpy call

Testing Strategy

Test Suite Structure

// test_safe_string.c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include "safe_string.h"

void test_safe_strlen(void) {
    // Normal cases
    assert(safe_strlen("hello", 100) == 5);
    assert(safe_strlen("", 100) == 0);

    // Edge cases
    assert(safe_strlen("hello", 5) == -1);  // '\0' at index 5 lies outside max_len
    assert(safe_strlen("hello", 6) == 5);   // max_len just covers the '\0'
    assert(safe_strlen(NULL, 100) == -1);

    printf("safe_strlen: all tests passed\n");
}

void test_safe_strcpy(void) {
    char buf[10];

    // Normal copy
    assert(safe_strcpy(buf, "hello", sizeof(buf)) == 5);
    assert(strcmp(buf, "hello") == 0);

    // Truncation
    assert(safe_strcpy(buf, "this is too long", sizeof(buf)) == 16);
    assert(strlen(buf) == 9);  // Truncated to 9 chars + '\0'
    assert(buf[9] == '\0');    // Still null-terminated!

    // Empty string
    assert(safe_strcpy(buf, "", sizeof(buf)) == 0);
    assert(buf[0] == '\0');

    // NULL handling
    assert(safe_strcpy(buf, NULL, sizeof(buf)) == 0);
    assert(buf[0] == '\0');  // NULL src yields an empty string

    printf("safe_strcpy: all tests passed\n");
}

void test_safe_strcat(void) {
    char buf[15];

    // Normal concatenation
    safe_strcpy(buf, "Hello", sizeof(buf));
    assert(safe_strcat(buf, " World", sizeof(buf)) == 11);
    assert(strcmp(buf, "Hello World") == 0);

    // Truncation during concat
    safe_strcpy(buf, "Hello", sizeof(buf));
    size_t result = safe_strcat(buf, " World and more", sizeof(buf));
    assert(result == 20);                  // Would have been 20 chars
    assert(strlen(buf) == 14);             // Actually 14 chars
    assert(buf[14] == '\0');               // Still terminated

    printf("safe_strcat: all tests passed\n");
}

int main(void) {
    test_safe_strlen();
    test_safe_strcpy();
    test_safe_strcat();
    return 0;
}

Comparison with Standard Library

void compare_with_standard(void) {
    char safe_buf[10], unsafe_buf[10];

    printf("=== SAFE vs UNSAFE ===\n\n");

    // Test 1: Overflow scenario
    const char *too_long = "This is way too long for 10 bytes";

    // Unsafe version (DON'T DO THIS IN PRODUCTION)
    // strcpy(unsafe_buf, too_long);  // Would overflow!

    // Safe version
    size_t result = safe_strcpy(safe_buf, too_long, sizeof(safe_buf));
    printf("safe_strcpy returned: %zu\n", result);
    printf("String length: %zu\n", strlen(safe_buf));
    printf("Buffer content: '%s'\n", safe_buf);

    if (result >= sizeof(safe_buf)) {
        printf("NOTICE: String was truncated (would have been %zu chars)\n", result);
    }
}

Common Pitfalls and Debugging Tips

Pitfall 1: Off-by-One in Size Calculations

// WRONG: forgot to account for null terminator
char buf[10];
for (int i = 0; i < 10; i++) {  // Should be 9!
    buf[i] = 'A';
}
buf[10] = '\0';  // Out of bounds!

// CORRECT
for (int i = 0; i < 9; i++) {
    buf[i] = 'A';
}
buf[9] = '\0';

Pitfall 2: Assuming strlen() Includes Null Terminator

// WRONG
char *copy = malloc(strlen(src));  // Need +1 for '\0'!
strcpy(copy, src);

// CORRECT
char *copy = malloc(strlen(src) + 1);  // +1 for '\0'
if (copy != NULL) {                    // malloc can fail
    strcpy(copy, src);
}

Pitfall 3: Not Checking Return Values

// WRONG: ignoring truncation
char buf[10];
safe_strcpy(buf, user_input, sizeof(buf));
// What if user_input was 100 chars? It got truncated!

// CORRECT
if (safe_strcpy(buf, user_input, sizeof(buf)) >= sizeof(buf)) {
    fprintf(stderr, "Error: input too long\n");
    return -1;
}

Debugging with Valgrind

$ valgrind --leak-check=full ./test_safe_string

Check for:

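  • Invalid reads (reading past a heap buffer)
  • Invalid writes (heap buffer overflows)
  • Memory leaks

Memcheck does not bounds-check stack arrays, so an overflow like the one in demo_overflow.c usually goes unreported; use AddressSanitizer for those.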

Extensions and Challenges

Challenge 1: safe_sprintf

Implement a safe version of sprintf:

int safe_sprintf(char *buf, size_t buf_size, const char *fmt, ...);
// Returns: number of chars that WOULD have been written
// Always null-terminates if buf_size > 0
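One way to meet this spec is to delegate to the standard vsnprintf, which already bounds the write and returns the would-be length. A sketch; the -1 return for bad arguments is an assumed convention, not part of the challenge text:

```c
#include <stdarg.h>
#include <stdio.h>

// Sketch built on the standard vsnprintf, which limits the write to
// buf_size bytes and NUL-terminates whenever buf_size > 0.
// Returns the length the full output would have had, or a negative value
// on error (bad arguments here, or an encoding error from vsnprintf).
int safe_sprintf(char *buf, size_t buf_size, const char *fmt, ...) {
    if (fmt == NULL || (buf == NULL && buf_size > 0)) {
        return -1;                 // ASSUMED error convention
    }
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, buf_size, fmt, args);
    va_end(args);
    if (n < 0 && buf_size > 0) {
        buf[0] = '\0';             // leave a valid (empty) string on error
    }
    return n;
}
```

Callers detect truncation with n < 0 || (size_t)n >= buf_size, and safe_sprintf(NULL, 0, fmt, ...) measures the required length without writing anything, because vsnprintf accepts a NULL buffer when the size is 0.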

Challenge 2: String Builder

Create a dynamic string builder that grows as needed:

typedef struct {
    char *data;
    size_t length;
    size_t capacity;
} StringBuilder;

void sb_init(StringBuilder *sb);
void sb_append(StringBuilder *sb, const char *str);
char *sb_to_string(StringBuilder *sb);  // Returns owned copy
void sb_free(StringBuilder *sb);
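A sketch of these four functions, keeping the signatures above. The doubling growth policy and the 16-byte starting capacity are arbitrary choices of this sketch; because sb_append returns void, it can only leave the builder unchanged on allocation failure (a real version might return an error code instead).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;       // heap buffer, NUL-terminated once non-NULL
    size_t length;    // characters stored, not counting '\0'
    size_t capacity;  // bytes allocated for data
} StringBuilder;

void sb_init(StringBuilder *sb) {
    sb->data = NULL;
    sb->length = 0;
    sb->capacity = 0;
}

// Appends str, doubling capacity as needed. On size overflow or allocation
// failure the builder is left unchanged.
void sb_append(StringBuilder *sb, const char *str) {
    if (sb == NULL || str == NULL) {
        return;
    }
    size_t add = strlen(str);
    if (add > SIZE_MAX - sb->length - 1) {
        return;                                   // size would overflow
    }
    size_t needed = sb->length + add + 1;         // +1 for '\0'
    if (needed > sb->capacity) {
        size_t new_cap = sb->capacity ? sb->capacity : 16;
        while (new_cap < needed) {
            new_cap = (new_cap > SIZE_MAX / 2) ? needed : new_cap * 2;
        }
        char *grown = realloc(sb->data, new_cap);
        if (grown == NULL) {
            return;                               // old buffer still valid
        }
        sb->data = grown;
        sb->capacity = new_cap;
    }
    memcpy(sb->data + sb->length, str, add + 1);  // copies the '\0' too
    sb->length += add;
}

// Returns a malloc'd copy the caller must free, or NULL on failure.
char *sb_to_string(StringBuilder *sb) {
    char *copy = malloc(sb->length + 1);
    if (copy == NULL) {
        return NULL;
    }
    if (sb->length > 0) {
        memcpy(copy, sb->data, sb->length);
    }
    copy[sb->length] = '\0';
    return copy;
}

void sb_free(StringBuilder *sb) {
    free(sb->data);
    sb_init(sb);  // back to the empty state, safe to reuse
}
```

Doubling keeps the total copying cost of n appends proportional to the final length, which is why it is a common default for growable buffers.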

Challenge 3: Unicode-Aware strlen

// Count Unicode code points, not bytes
size_t utf8_strlen(const char *s);

// Example: "café" is 4 code points but 5 bytes (é = 2 bytes in UTF-8)
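In UTF-8, every byte of a multi-byte sequence after the first has the bit pattern 10xxxxxx, so counting the bytes that do not match that pattern counts code points. A sketch that assumes the input is already valid UTF-8 (it does no validation). The "café" example also assumes é is the precomposed U+00E9; written as e plus a combining accent it would be 5 code points.

```c
#include <stddef.h>

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// ASSUMES s is valid UTF-8; malformed input is not detected.
size_t utf8_strlen(const char *s) {
    if (s == NULL) {
        return 0;
    }
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {   // not a continuation byte
            count++;
        }
    }
    return count;
}
```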

Challenge 4: Fuzz Testing

Write a fuzzer that generates random inputs:

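#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include "safe_string.h"

// Test helper: malloc'd string of len random printable ASCII chars.
static char *generate_random_string(size_t len) {
    char *s = malloc(len + 1);
    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        s[i] = (char)(' ' + rand() % 95);  // 0x20..0x7E, never '\0'
    }
    s[len] = '\0';
    return s;
}

void fuzz_safe_strcpy(int iterations) {
    for (int i = 0; i < iterations; i++) {
        size_t src_len = (size_t)(rand() % 1000);
        size_t dst_size = (size_t)(rand() % 100 + 1);
        char *src = generate_random_string(src_len);
        char *dst = malloc(dst_size);
        assert(dst != NULL);

        size_t ret = safe_strcpy(dst, src, dst_size);
        // Verify invariants:
        size_t out_len = strlen(dst);
        assert(ret == src_len);                  // reports full source length
        assert(out_len < dst_size);              // terminated inside dst
        assert(out_len == (src_len < dst_size ? src_len : dst_size - 1));
        assert(memcmp(dst, src, out_len) == 0);  // copied prefix is exact

        free(src);
        free(dst);
    }
}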

Real-World Connections

Connection 1: CVE Database

Search for “buffer overflow” on CVE databases—you’ll find thousands of vulnerabilities that would have been prevented by bounds checking.

Connection 2: Modern Language Design

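Languages like Rust rule out these bugs in safe code, through a mix of compile-time and runtime checks:

  • Rust strings and slices know their length
  • Out-of-bounds indexing is caught by a runtime bounds check, which panics instead of silently corrupting memory
  • The borrow checker prevents a different class of memory errors at compile time (use-after-free, dangling references)
  • Rust’s String type is always valid UTF-8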

Connection 3: Production Libraries

Study how production code handles strings:

  • OpenBSD’s strlcpy/strlcat (your inspiration)
  • SQLite’s string handling (extremely careful)
  • Linux kernel’s string functions

Interview Questions You Can Now Answer

  1. “What’s wrong with strcpy and how would you fix it?”
    • strcpy doesn’t know destination size
    • Fix: Pass destination size, copy at most size-1 chars, always null-terminate
  2. “What is a buffer overflow? How does it lead to code execution?”
    • Writing past buffer boundaries corrupts adjacent memory
    • Can overwrite return addresses to redirect execution
  3. “What’s the difference between strncpy and strlcpy?”
    • strncpy: May not null-terminate, pads with zeros
    • strlcpy: Always null-terminates, returns source length
  4. “Why does "hello" take 6 bytes of memory?”
    • 5 characters + 1 null terminator
  5. “How would you implement strlen without using any library functions?”
    • Loop until you find ‘\0’, count iterations
  6. “What happens if you pass a non-null-terminated string to printf("%s", ...)?”
    • Undefined behavior: printf keeps reading past the buffer until it happens to hit a 0 byte, which can leak adjacent memory or crash the program

Resources

Books

  • “The C Programming Language” by K&R - Chapter 5 (Pointers and Arrays)
  • “Effective C” by Robert Seacord - Chapter 7 (Strings)
  • “Secure Coding in C and C++” by Robert Seacord - Chapter 2 (Strings)

Tools

  • AddressSanitizer (-fsanitize=address)
  • Valgrind
  • Coverity (static analysis)

Self-Assessment Checklist

Before moving to the next project, you should be able to:

  • Explain why "hello" is 6 bytes
  • Describe exactly why strcpy(small_buffer, huge_string) corrupts memory
  • Implement safe_strcpy from scratch
  • Detect buffer overflows using AddressSanitizer
  • Explain the difference between strncpy and strlcpy
  • Design a function API that makes buffer overflows hard to commit by accident

Final Milestone: You instinctively check buffer sizes before any string operation.