Project 2: Safe String Library

The Core Question: “Why is strcpy considered dangerous, and what would a safe version look like?”

Project Overview

Attribute	Value
Difficulty	Intermediate
Time Estimate	Weekend (8-16 hours)
Language	C
Prerequisites	Project 1 or equivalent comfort with addresses
Main Book	“The C Programming Language” by Kernighan & Ritchie

Learning Objectives

By completing this project, you will:

Understand C string representation - Why “hello” takes 6 bytes, not 5
Master buffer overflow mechanics - See exactly how strcpy corrupts memory
Implement bounds checking - Build safe alternatives to dangerous functions
Practice defensive programming - Always validate before operating
Develop security intuition - Understand why Microsoft and the Chromium project have each reported that roughly 70% of their serious security bugs are memory safety issues

Theoretical Foundation

What IS a C String?

A C string is NOT a first-class type. It’s simply:

A sequence of char bytes
Terminated by a null byte ('\0' = 0x00)
Stored in contiguous memory

char str[] = "hello";

// Memory layout (6 bytes total!):
// Index:  0    1    2    3    4    5
// Value: 'h'  'e'  'l'  'l'  'o'  '\0'
// Hex:   0x68 0x65 0x6C 0x6C 0x6F 0x00

Critical Insight: strlen("hello") returns 5, but sizeof("hello") evaluates to 6. The null terminator occupies a byte of storage, but strlen does not count it.
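The distinction can be sketched in a few lines. The helper names storage_bytes and literal_bytes are hypothetical and exist only for this illustration; they are not part of the library you will build.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store s, terminator included. */
size_t storage_bytes(const char *s) {
    return strlen(s) + 1;
}

/* sizeof applied to a string literal counts the terminator as well. */
size_t literal_bytes(void) {
    return sizeof("hello");  /* five characters plus '\0' */
}
```

Whenever you allocate or size a buffer for a string, the number you need is the storage_bytes value, not the strlen value.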

Why C Strings Are Dangerous

C strings have three fundamental problems:

Problem 1: No Length Information

void dangerous_function(char *str) {
    // str is just an address
    // We have NO IDEA how long the string is
    // We HOPE there's a '\0' somewhere
}
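One way to see the lost size information is to compare sizeof inside and outside a function. This is a small sketch; both function names are made up for the illustration.

```c
#include <stddef.h>

/* Inside the callee, str is only a pointer: sizeof reports the size of
   the pointer itself, not of the array the caller passed in. */
size_t size_seen_by_callee(char *str) {
    return sizeof(str);
}

/* In the caller the array type is still visible, so sizeof reports
   the full 32 bytes. */
size_t size_seen_by_caller(void) {
    char buf[32] = "hi";
    return sizeof(buf);
}
```

This is why every safe function in this project takes the destination size as an explicit parameter: the callee cannot recover it from the pointer.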

Problem 2: No Bounds Checking

char dest[10];
char *src = "This string is way too long for dest!";
strcpy(dest, src);  // Copies until '\0' found
                    // Writes 38 bytes (37 chars + '\0') into a 10-byte buffer!

Problem 3: The Destination Size is Unknown

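// strcpy has NO WAY to know how big dest is
// (simplified sketch of the standard function)
char *strcpy(char *dest, const char *src) {
    // Only knows dest's address, not its size!
    char *start = dest;
    while (*src) {
        *dest++ = *src++;  // Just keeps writing...
    }
    *dest = '\0';
    return start;  // The real strcpy returns the original dest pointer
}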

Buffer Overflow: Step by Step

When you overflow a stack buffer, you corrupt adjacent memory:

void vulnerable() {
    char buffer[10];      // 10 bytes
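    int authenticated = 0; // 4 bytes; assume the compiler places it right after buffer

    // Illustrative stack layout (real layouts vary by compiler, flags, and padding):
    // [buffer: 10 bytes][authenticated: 4 bytes][saved rbp][return addr]

    strcpy(buffer, "AAAAAAAAAABBBB");  // 14 chars + '\0' = 15 bytes!
    // buffer:        "AAAAAAAAAA" (10 bytes)
    // authenticated: "BBBB"       (overwritten!)
    // The trailing '\0' lands one byte further still
    // Now authenticated != 0, so any check passes!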
}

The String Functions Hall of Shame

Function	Problem	Safer Alternative
`gets()`	No length limit at all	`fgets()` (`gets()` itself was removed in C11)
`strcpy()`	No dest size check	`strncpy()`, `strlcpy()`
`strcat()`	No dest size check	`strncat()`, `strlcat()`
`sprintf()`	No buffer size limit	`snprintf()`
`scanf("%s")`	No length limit	`scanf("%9s")` with width
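As a rough sketch of how two of the alternatives in the table are typically used, the helpers below wrap fgets and snprintf. The names read_line and format_label and their return conventions are assumptions made for this example only.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf, dropping the trailing newline if present.
   Returns 0 on success, -1 on EOF, read error, or bad arguments. */
int read_line(FILE *in, char *buf, size_t size) {
    if (in == NULL || buf == NULL || size == 0 || size > INT_MAX) {
        return -1;
    }
    if (fgets(buf, (int)size, in) == NULL) {
        return -1;
    }
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Formats "name-id" into buf. Returns 0 if the text fit, -1 if snprintf
   failed or the output was truncated (buf is still terminated when
   size > 0). */
int format_label(char *buf, size_t size, const char *name, int id) {
    int n = snprintf(buf, size, "%s-%d", name, id);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Both standard functions take the buffer size, and snprintf reports the length it wanted to write, which is the same truncation signal your own functions will return.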

The strncpy Problem

strncpy is often recommended but has its own issues:

char dest[10];
strncpy(dest, "hello world", sizeof(dest));
// dest is now: "hello worl" (NO null terminator!)
// Using dest as a string is undefined behavior!

// strncpy also pads with zeros, wasting cycles:
strncpy(dest, "hi", sizeof(dest));
// dest: 'h', 'i', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0'
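If you are stuck with strncpy, the usual workaround is to copy one byte less than the buffer holds and terminate by hand. A minimal sketch, with terminated_strncpy as a hypothetical wrapper name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: strncpy plus the manual termination it omits. */
void terminated_strncpy(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || src == NULL || dst_size == 0) {
        return;
    }
    strncpy(dst, src, dst_size - 1);  /* leave room for the terminator */
    dst[dst_size - 1] = '\0';         /* always terminate */
}
```

This fixes the missing terminator but still pads with zeros and gives the caller no way to detect truncation.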

The strlcpy Solution (BSD)

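strlcpy does what you actually want. It originated in OpenBSD and is not part of ISO C, so it may be missing on some platforms (glibc only added it in version 2.38):

// 1. Always null-terminates (if size > 0)
// 2. Returns length of src (so you can detect truncation)
// 3. Copies at most size-1 characters
size_t strlcpy(char *dst, const char *src, size_t size);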

char dest[10];
size_t len = strlcpy(dest, "hello world", sizeof(dest));
// dest: "hello wor\0" (properly terminated!)
// len: 11 (original length - so we know truncation happened)

if (len >= sizeof(dest)) {
    printf("Warning: string was truncated!\n");
}

Project Specification

What You’re Building

A bounds-checked string library with these functions:

// Safe string length (with maximum)
ssize_t safe_strlen(const char *s, size_t max_len);

// Safe string copy (always null-terminates)
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);

// Safe string concatenation
size_t safe_strcat(char *dst, const char *src, size_t dst_size);

// Safe substring extraction
ssize_t safe_substr(char *dst, const char *src, size_t start, size_t len, size_t dst_size);

API Design Principles

Always take destination size - Every function that writes must know the limit
Always null-terminate - If dst_size > 0, result is always a valid string
Return useful information - Indicate success, truncation, or error
Validate inputs - Check for NULL pointers, zero sizes

Function Specifications

`safe_strlen`

// Returns: length of string, or -1 if s is NULL or no null terminator is found within max_len
// Note: Prevents reading past buffer bounds
ssize_t safe_strlen(const char *s, size_t max_len);

`safe_strcpy`

// Returns: length that WOULD have been copied (like snprintf)
// - If return value >= dst_size, truncation occurred
// - Always null-terminates if dst_size > 0
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);

`safe_strcat`

// Returns: total length that would result (existing + added)
// - If return value >= dst_size, truncation occurred
// - Always null-terminates if dst_size > 0
size_t safe_strcat(char *dst, const char *src, size_t dst_size);

Solution Architecture

Module Design

safe_string/
├── safe_string.h       # Public API
├── safe_string.c       # Implementation
├── test_safe_string.c  # Unit tests
├── demo_overflow.c     # Overflow demonstrations
├── benchmark.c         # Performance comparison
└── Makefile

Header File Structure

#ifndef SAFE_STRING_H
#define SAFE_STRING_H

#include <stddef.h>
#include <sys/types.h>

// Error codes
#define SAFE_STR_OK          0
#define SAFE_STR_TRUNCATED   1
#define SAFE_STR_NULL_INPUT    (-1)
#define SAFE_STR_NO_TERMINATOR (-2)

// Core functions
ssize_t safe_strlen(const char *s, size_t max_len);
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);
size_t safe_strcat(char *dst, const char *src, size_t dst_size);
ssize_t safe_substr(char *dst, const char *src, size_t start, size_t len, size_t dst_size);

// Utility functions
int safe_strcmp(const char *s1, const char *s2, size_t max_len);
char *safe_strdup(const char *s, size_t max_len);

#endif
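The header declares safe_strcmp and safe_strdup, but this guide does not specify them. The sketch below shows one plausible interpretation. The semantics are assumptions: comparison stops after max_len characters, NULL sorts before any string, and safe_strdup copies at most max_len characters into a freshly allocated, terminated buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed contract: like strncmp, but tolerates NULL arguments. */
int safe_strcmp(const char *s1, const char *s2, size_t max_len) {
    if (s1 == NULL || s2 == NULL) {
        return (s1 != NULL) - (s2 != NULL);  /* NULL sorts first */
    }
    for (size_t i = 0; i < max_len; i++) {
        unsigned char a = (unsigned char)s1[i];
        unsigned char b = (unsigned char)s2[i];
        if (a != b) {
            return (a > b) - (a < b);
        }
        if (a == '\0') {
            return 0;  /* both strings ended together */
        }
    }
    return 0;  /* equal within the first max_len characters */
}

/* Assumed contract: duplicates at most max_len characters of s.
   Returns a malloc'd string the caller must free, or NULL on failure. */
char *safe_strdup(const char *s, size_t max_len) {
    if (s == NULL) {
        return NULL;
    }
    size_t len = 0;
    while (len < max_len && s[len] != '\0') {
        len++;
    }
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

Neither function reads more than max_len bytes from its inputs, which matches the bounded-read approach of safe_strlen.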

Implementation Guide

Phase 1: safe_strlen (1-2 hours)

Goal: Count characters without reading past a maximum.

ssize_t safe_strlen(const char *s, size_t max_len) {
    if (s == NULL) {
        return -1;  // Invalid input
    }

    for (size_t i = 0; i < max_len; i++) {
        if (s[i] == '\0') {
            return (ssize_t)i;
        }
    }

    return -1;  // No null terminator found within max_len
}

Test Cases:

assert(safe_strlen("hello", 100) == 5);
assert(safe_strlen("hello", 5) == -1);   // '\0' sits at index 5, outside the 5 bytes scanned
assert(safe_strlen("hello", 6) == 5);    // Just enough
assert(safe_strlen("", 100) == 0);       // Empty string
assert(safe_strlen(NULL, 100) == -1);    // NULL input

Phase 2: safe_strcpy (2-3 hours)

Goal: Copy with guaranteed null-termination.

size_t safe_strcpy(char *dst, const char *src, size_t dst_size) {
    if (dst == NULL || dst_size == 0) {
        return 0;
    }

    if (src == NULL) {
        dst[0] = '\0';
        return 0;
    }

    size_t src_len = 0;
    while (src[src_len] != '\0') {
        src_len++;
    }

    // Copy at most dst_size - 1 characters
    size_t copy_len = (src_len < dst_size - 1) ? src_len : dst_size - 1;

    for (size_t i = 0; i < copy_len; i++) {
        dst[i] = src[i];
    }
    dst[copy_len] = '\0';  // Always null-terminate

    return src_len;  // Return original length for truncation detection
}

Key Insight: By returning src_len, callers can detect truncation:

char buf[10];
if (safe_strcpy(buf, input, sizeof(buf)) >= sizeof(buf)) {
    printf("Warning: input was truncated\n");
}

Phase 3: safe_strcat (2-3 hours)

Goal: Concatenate without overflowing destination.

size_t safe_strcat(char *dst, const char *src, size_t dst_size) {
    if (dst == NULL || dst_size == 0) {
        return 0;
    }

    // Find current length of dst (within dst_size)
    size_t dst_len = 0;
    while (dst_len < dst_size && dst[dst_len] != '\0') {
        dst_len++;
    }

    if (dst_len >= dst_size) {
        // dst is not properly terminated within dst_size
        // Force termination and return error
        dst[dst_size - 1] = '\0';
        return dst_size;  // Indicates error
    }

    if (src == NULL) {
        return dst_len;
    }

    // Calculate source length
    size_t src_len = 0;
    while (src[src_len] != '\0') {
        src_len++;
    }

    // Space available for characters (one byte is reserved for the null terminator)
    size_t space_left = dst_size - dst_len - 1;

    // Copy as much as fits
    size_t copy_len = (src_len < space_left) ? src_len : space_left;

    for (size_t i = 0; i < copy_len; i++) {
        dst[dst_len + i] = src[i];
    }
    dst[dst_len + copy_len] = '\0';

    return dst_len + src_len;  // What the total WOULD be
}
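safe_substr appears in the API list but has no specification or implementation phase of its own. One possible sketch follows. The contract here is an assumption: it returns the number of characters written (not counting the terminator), returns -1 for invalid arguments or when start lies beyond the end of src, and silently truncates to fit dst.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

ssize_t safe_substr(char *dst, const char *src, size_t start, size_t len,
                    size_t dst_size) {
    if (dst == NULL || dst_size == 0) {
        return -1;
    }
    dst[0] = '\0';  /* dst is a valid empty string on every error path */
    if (src == NULL) {
        return -1;
    }

    /* Walk to start without stepping over the terminator. */
    size_t i = 0;
    while (i < start && src[i] != '\0') {
        i++;
    }
    if (i < start) {
        return -1;  /* start lies beyond the end of src */
    }

    /* Copy until len chars, the end of src, or the end of dst. */
    size_t copied = 0;
    while (copied < len && copied < dst_size - 1 &&
           src[start + copied] != '\0') {
        dst[copied] = src[start + copied];
        copied++;
    }
    dst[copied] = '\0';
    return (ssize_t)copied;
}
```

A caller that needs to detect truncation can compare the return value against the requested len. You may prefer a contract that reports truncation directly, in the style of safe_strcpy.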

Phase 4: Overflow Demonstration (1-2 hours)

Create a program that shows what happens with unsafe functions:

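// demo_overflow.c
#include <stdio.h>
#include <string.h>

void demonstrate_overflow(void) {
    char buffer[10];
    unsigned int canary = 0xDEADBEEFu;  // unsigned: the value does not fit in int

    printf("BEFORE overflow:\n");
    printf("  buffer address: %p\n", (void*)buffer);
    printf("  canary address: %p\n", (void*)&canary);
    printf("  canary value:   0x%08X\n", canary);

    // Writes 17 bytes into a 10-byte buffer: undefined behavior.
    // Whether canary is hit depends on how the compiler lays out the stack.
    strcpy(buffer, "AAAABBBBCCCCDDDD");

    printf("\nAFTER overflow:\n");
    printf("  canary value:   0x%08X\n", canary);
    if (canary != 0xDEADBEEFu) {
        printf("  canary is now CORRUPTED!\n");
    } else {
        printf("  canary survived (try -O0 or a different compiler)\n");
    }
}

int main(void) {
    demonstrate_overflow();
    return 0;
}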

Compile with AddressSanitizer to see the detection:

$ clang -g -fsanitize=address demo_overflow.c -o demo
$ ./demo
# AddressSanitizer aborts at the strcpy call and reports a stack-buffer-overflow,
# so the AFTER lines never print

Testing Strategy

Test Suite Structure

// test_safe_string.c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include "safe_string.h"

void test_safe_strlen(void) {
    // Normal cases
    assert(safe_strlen("hello", 100) == 5);
    assert(safe_strlen("", 100) == 0);

    // Edge cases
    assert(safe_strlen("hello", 5) == -1);  // Limit reached before the '\0' at index 5
    assert(safe_strlen("hello", 6) == 5);   // Limit just covers the '\0'
    assert(safe_strlen(NULL, 100) == -1);

    printf("safe_strlen: all tests passed\n");
}

void test_safe_strcpy(void) {
    char buf[10];

    // Normal copy
    assert(safe_strcpy(buf, "hello", sizeof(buf)) == 5);
    assert(strcmp(buf, "hello") == 0);

    // Truncation
    assert(safe_strcpy(buf, "this is too long", sizeof(buf)) == 16);
    assert(strlen(buf) == 9);  // Truncated to 9 chars + '\0'
    assert(buf[9] == '\0');    // Still null-terminated!

    // Empty string
    assert(safe_strcpy(buf, "", sizeof(buf)) == 0);
    assert(buf[0] == '\0');

    // NULL handling
    assert(safe_strcpy(buf, NULL, sizeof(buf)) == 0);

    printf("safe_strcpy: all tests passed\n");
}

void test_safe_strcat(void) {
    char buf[15];

    // Normal concatenation
    safe_strcpy(buf, "Hello", sizeof(buf));
    assert(safe_strcat(buf, " World", sizeof(buf)) == 11);
    assert(strcmp(buf, "Hello World") == 0);

    // Truncation during concat
    safe_strcpy(buf, "Hello", sizeof(buf));
    size_t result = safe_strcat(buf, " World and more", sizeof(buf));
    assert(result == 20);                  // Would have been 20 chars
    assert(strlen(buf) == 14);             // Actually 14 chars
    assert(buf[14] == '\0');               // Still terminated

    printf("safe_strcat: all tests passed\n");
}

int main(void) {
    test_safe_strlen();
    test_safe_strcpy();
    test_safe_strcat();
    return 0;
}

Comparison with Standard Library

void compare_with_standard(void) {
    char safe_buf[10], unsafe_buf[10];

    printf("=== SAFE vs UNSAFE ===\n\n");

    // Test 1: Overflow scenario
    const char *too_long = "This is way too long for 10 bytes";

    // Unsafe version (DON'T DO THIS IN PRODUCTION)
    // strcpy(unsafe_buf, too_long);  // Would overflow!

    // Safe version
    size_t result = safe_strcpy(safe_buf, too_long, sizeof(safe_buf));
    printf("safe_strcpy returned: %zu\n", result);
    printf("String length: %zu\n", strlen(safe_buf));
    printf("Buffer content: '%s'\n", safe_buf);

    if (result >= sizeof(safe_buf)) {
        printf("NOTICE: String was truncated (would have been %zu chars)\n", result);
    }
}

Common Pitfalls and Debugging Tips

Pitfall 1: Off-by-One in Size Calculations

// WRONG: forgot to account for null terminator
char buf[10];
for (int i = 0; i < 10; i++) {  // Should be 9!
    buf[i] = 'A';
}
buf[10] = '\0';  // Out of bounds!

// CORRECT
for (int i = 0; i < 9; i++) {
    buf[i] = 'A';
}
buf[9] = '\0';

Pitfall 2: Assuming strlen() Includes Null Terminator

// WRONG
char *copy = malloc(strlen(src));  // Need +1 for '\0'!
strcpy(copy, src);

// CORRECT
char *copy = malloc(strlen(src) + 1);
if (copy != NULL) {  // malloc can fail
    strcpy(copy, src);
}

Pitfall 3: Not Checking Return Values

// WRONG: ignoring truncation
char buf[10];
safe_strcpy(buf, user_input, sizeof(buf));
// What if user_input was 100 chars? It got truncated!

// CORRECT
if (safe_strcpy(buf, user_input, sizeof(buf)) >= sizeof(buf)) {
    fprintf(stderr, "Error: input too long\n");
    return -1;
}

Debugging with Valgrind

$ valgrind --leak-check=full ./test_safe_string

Check for:

Invalid reads (reading past buffer)
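Invalid writes (heap buffer overflows; Memcheck generally cannot detect overruns of stack or global arrays, so use AddressSanitizer for those)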
Memory leaks

Extensions and Challenges

Challenge 1: safe_sprintf

Implement a safe version of sprintf:

int safe_sprintf(char *buf, size_t buf_size, const char *fmt, ...);
// Returns: number of chars that WOULD have been written
// Always null-terminates if buf_size > 0
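One possible starting point is a thin wrapper around the standard vsnprintf, which already bounds the write and reports the untruncated length. The NULL handling and the -1 error return are assumptions of this sketch, not part of the stated contract.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

int safe_sprintf(char *buf, size_t buf_size, const char *fmt, ...) {
    if (buf == NULL || buf_size == 0 || fmt == NULL) {
        return -1;  /* assumed error convention */
    }

    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, buf_size, fmt, args);  /* never writes past buf_size */
    va_end(args);

    if (n < 0) {
        buf[0] = '\0';  /* encoding error: leave a valid empty string */
    }
    return n;  /* n >= buf_size means the output was truncated */
}
```

The challenge becomes more instructive if you then replace vsnprintf with your own handling of a few format specifiers such as %s and %d.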

Challenge 2: String Builder

Create a dynamic string builder that grows as needed:

typedef struct {
    char *data;
    size_t length;
    size_t capacity;
} StringBuilder;

void sb_init(StringBuilder *sb);
void sb_append(StringBuilder *sb, const char *str);
char *sb_to_string(StringBuilder *sb);  // Returns owned copy
void sb_free(StringBuilder *sb);
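A minimal sketch of these four functions, assuming a doubling growth policy and an initial capacity of 16 bytes (both arbitrary choices). Because sb_append returns void, this version leaves the builder unchanged when allocation fails; it also does not guard against capacity overflow or against appending a pointer into the builder's own buffer.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;
    size_t length;
    size_t capacity;
} StringBuilder;

void sb_init(StringBuilder *sb) {
    sb->data = NULL;
    sb->length = 0;
    sb->capacity = 0;
}

void sb_append(StringBuilder *sb, const char *str) {
    if (sb == NULL || str == NULL) {
        return;
    }
    size_t add = strlen(str);
    size_t needed = sb->length + add + 1;  /* +1 for the terminator */
    if (needed > sb->capacity) {
        size_t new_cap = sb->capacity ? sb->capacity : 16;
        while (new_cap < needed) {
            new_cap *= 2;
        }
        char *grown = realloc(sb->data, new_cap);
        if (grown == NULL) {
            return;  /* allocation failed: builder is unchanged */
        }
        sb->data = grown;
        sb->capacity = new_cap;
    }
    memcpy(sb->data + sb->length, str, add + 1);  /* copies the '\0' too */
    sb->length += add;
}

/* Returns a malloc'd copy the caller must free, or NULL on failure. */
char *sb_to_string(StringBuilder *sb) {
    char *copy = malloc(sb->length + 1);
    if (copy == NULL) {
        return NULL;
    }
    if (sb->length > 0) {
        memcpy(copy, sb->data, sb->length);
    }
    copy[sb->length] = '\0';
    return copy;
}

void sb_free(StringBuilder *sb) {
    free(sb->data);
    sb_init(sb);  /* safe to reuse or free again */
}
```

A natural refinement is to make sb_append return a status code so callers can react to allocation failure.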

Challenge 3: Unicode-Aware strlen

// Count Unicode code points, not bytes
size_t utf8_strlen(const char *s);

// Example: "café" is 4 code points but 5 bytes (é = 2 bytes in UTF-8)
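A common way to count code points is to skip UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx. The sketch below assumes the input is valid UTF-8 and treats NULL as an empty string; it does no validation.

```c
#include <stddef.h>

size_t utf8_strlen(const char *s) {
    size_t count = 0;
    if (s == NULL) {
        return 0;  /* assumed: NULL counts as empty */
    }
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        /* Count every byte that is NOT a continuation byte (10xxxxxx). */
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

Handling malformed sequences (stray continuation bytes, truncated multi-byte characters) is the harder part of this challenge and is left open here.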

Challenge 4: Fuzz Testing

Write a fuzzer that generates random inputs:

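void fuzz_safe_strcpy(int iterations) {
    for (int i = 0; i < iterations; i++) {
        size_t src_len = rand() % 1000;
        size_t dst_size = rand() % 100 + 1;

        // generate_random_string is a helper you write yourself: it should
        // return a malloc'd, null-terminated string of exactly src_len chars
        char *src = generate_random_string(src_len);
        char *dst = malloc(dst_size);
        assert(src != NULL && dst != NULL);

        size_t ret = safe_strcpy(dst, src, dst_size);

        // Verify invariants:
        assert(ret == src_len);                       // Reports the full source length
        assert(memchr(dst, '\0', dst_size) != NULL);  // Terminated inside the buffer
        assert(strncmp(dst, src, strlen(dst)) == 0);  // Result is a prefix of src

        free(src);
        free(dst);
    }
}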
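The fuzzer relies on a generate_random_string helper that this guide does not define. A possible sketch, assuming it should return a heap-allocated string of exactly len non-zero bytes:

```c
#include <stdlib.h>

/* Hypothetical helper for the fuzzer: returns a malloc'd string of
   exactly len random bytes in the range 1..255, or NULL on failure. */
char *generate_random_string(size_t len) {
    char *s = malloc(len + 1);
    if (s == NULL) {
        return NULL;
    }
    unsigned char *bytes = (unsigned char *)s;
    for (size_t i = 0; i < len; i++) {
        bytes[i] = (unsigned char)(1 + rand() % 255);  /* never '\0' */
    }
    bytes[len] = '\0';
    return s;
}
```

Excluding the zero byte keeps strlen(src) equal to len, so the fuzzer can reason about the expected return value. Call srand with a logged seed if you want failures to be reproducible.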

Real-World Connections

Connection 1: CVE Database

Search for “buffer overflow” in the CVE database and you’ll find thousands of vulnerabilities, most of which bounds checking would have prevented.

Connection 2: Modern Language Design

Languages like Rust rule out these issues by design:

Rust strings know their length
Safe Rust bounds-checks slice and string indexing, so an out-of-range access panics instead of corrupting memory
Rust’s String type is always valid UTF-8

Connection 3: Production Libraries

Study how production code handles strings:

OpenBSD’s strlcpy/strlcat (your inspiration)
SQLite’s string handling (extremely careful)
Linux kernel’s string functions

Interview Questions You Can Now Answer

“What’s wrong with strcpy and how would you fix it?”
- strcpy doesn’t know destination size
- Fix: Pass destination size, copy at most size-1 chars, always null-terminate
“What is a buffer overflow? How does it lead to code execution?”
- Writing past buffer boundaries corrupts adjacent memory
- Can overwrite return addresses to redirect execution
“What’s the difference between strncpy and strlcpy?”
- strncpy: May not null-terminate, pads with zeros
- strlcpy: Always null-terminates, returns source length
“Why does "hello" take 6 bytes of memory?”
- 5 characters + 1 null terminator
“How would you implement strlen without using any library functions?”
- Loop until you find '\0', count iterations
“What happens if you pass a non-null-terminated string to printf("%s", ...)?”
- Undefined behavior: printf keeps reading past the end of the buffer until it happens to hit a '\0', which can leak adjacent memory or crash

Resources

Books

“The C Programming Language” by K&R - Chapter 5 (Pointers and Arrays)
“Effective C” by Robert Seacord - Chapter 7 (Strings)
“Secure Coding in C and C++” by Robert Seacord - Chapter 2 (Strings)

Tools

AddressSanitizer (-fsanitize=address)
Valgrind
Coverity (static analysis)

Self-Assessment Checklist

Before moving to the next project, you should be able to:

Explain why "hello" is 6 bytes
Describe exactly why strcpy(small_buffer, huge_string) corrupts memory
Implement safe_strcpy from scratch
Detect buffer overflows using AddressSanitizer
Explain the difference between strncpy and strlcpy
Design a function API that makes buffer overflows hard to cause by accident

Final Milestone: You instinctively check buffer sizes before any string operation.