Project 2: Safe String Library
The Core Question: “Why is strcpy considered dangerous, and what would a safe version look like?”
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | Weekend (8-16 hours) |
| Language | C |
| Prerequisites | Project 1 or equivalent comfort with addresses |
| Main Book | “The C Programming Language” by Kernighan & Ritchie |
Learning Objectives
By completing this project, you will:
- Understand C string representation - Why “hello” takes 6 bytes, not 5
- Master buffer overflow mechanics - See exactly how strcpy corrupts memory
- Implement bounds checking - Build safe alternatives to dangerous functions
- Practice defensive programming - Always validate before operating
- Develop security intuition - Understand why Microsoft and the Chromium project have each attributed roughly 70% of their security vulnerabilities to memory safety issues
Theoretical Foundation
What IS a C String?
A C string is NOT a first-class type. It’s simply:
- A sequence of char bytes
- Terminated by a null byte ('\0' = 0x00)
- Stored in contiguous memory
char str[] = "hello";
// Memory layout (6 bytes total!):
// Index: 0 1 2 3 4 5
// Value: 'h' 'e' 'l' 'l' 'o' '\0'
// Hex: 0x68 0x65 0x6C 0x6C 0x6F 0x00
Critical Insight: strlen("hello") returns 5, but sizeof("hello") returns 6. The null terminator is essential but not counted by strlen.
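The size difference is easy to check directly. The sketch below is only illustrative; the helper names string_storage_bytes and literal_size_hello are hypothetical and not part of the project API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store s, including the '\0'. */
size_t string_storage_bytes(const char *s) {
    return strlen(s) + 1;
}

/* sizeof applied to a string literal (or a char array) counts the
 * terminator. Applied to a char* it would only give the pointer size. */
size_t literal_size_hello(void) {
    return sizeof("hello");
}
```

Both functions return 6 for "hello", while strlen("hello") stays at 5. Whenever you allocate or copy a string, the number you need is the first one.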
Why C Strings Are Dangerous
C strings have three fundamental problems:
Problem 1: No Length Information
void dangerous_function(char *str) {
// str is just an address
// We have NO IDEA how long the string is
// We HOPE there's a '\0' somewhere
}
Problem 2: No Bounds Checking
char dest[10];
char *src = "This string is way too long for dest!";
strcpy(dest, src); // Copies until '\0' found
// Writes 38 bytes (37 characters + '\0') into a 10-byte buffer!
Problem 3: The Destination Size is Unknown
// strcpy has NO WAY to know how big dest is
char *strcpy(char *dest, const char *src) {
    // Only knows dest's address, not its size!
    char *start = dest; // Remember the start: strcpy returns the original dest
    while (*src) {
        *dest++ = *src++; // Just keeps writing...
    }
    *dest = '\0';
    return start;
}
Buffer Overflow: Step by Step
When you overflow a stack buffer, you corrupt adjacent memory:
void vulnerable() {
char buffer[10]; // 10 bytes
int authenticated = 0; // 4 bytes; assume the compiler places it right after buffer
// Illustrative stack layout (the real layout depends on compiler, flags, and ABI):
// [buffer: 10 bytes][authenticated: 4 bytes][saved rbp][return addr]
strcpy(buffer, "AAAAAAAAAABBBB"); // 15 bytes!
// buffer: "AAAAAAAAAA" (10 bytes)
// authenticated: "BBBB" (overwritten!)
// Now authenticated != 0, so any check passes!
}
The String Functions Hall of Shame
| Function | Problem | Safer Alternative |
|---|---|---|
| gets() | No length limit at all | fgets() (gets() was removed in C11) |
| strcpy() | No dest size check | strncpy(), strlcpy() |
| strcat() | No dest size check | strncat(), strlcat() |
| sprintf() | No buffer size limit | snprintf() |
| scanf("%s") | No length limit | scanf("%9s") with width |
The strncpy Problem
strncpy is often recommended but has its own issues:
char dest[10];
strncpy(dest, "hello world", sizeof(dest));
// dest is now: "hello worl" (NO null terminator!)
// Using dest as a string is undefined behavior!
// strncpy also pads with zeros, wasting cycles:
strncpy(dest, "hi", sizeof(dest));
// dest: 'h', 'i', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0'
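If you are stuck with strncpy, the usual workaround is to copy one byte less than the buffer size and terminate by hand. The sketch below shows that idiom; the wrapper name copy_terminated is hypothetical, and it assumes dst_size is at least 1 and both pointers are valid.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around strncpy that always terminates dst.
 * Returns 1 if src did not fit (truncation), 0 otherwise.
 * Assumes dst_size > 0 and non-NULL pointers. */
int copy_terminated(char *dst, const char *src, size_t dst_size) {
    strncpy(dst, src, dst_size - 1);  /* leave room for the terminator */
    dst[dst_size - 1] = '\0';         /* strncpy will not do this for us */
    return strlen(src) >= dst_size;
}
```

This fixes the missing terminator but keeps the zero padding, and the caller must remember the pattern at every call site, which is the motivation for a function that does it by construction.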
The strlcpy Solution (BSD)
strlcpy does what you actually want:
// strlcpy's contract (available on the BSDs and macOS, and in glibc since 2.38):
// 1. Always null-terminates (if size > 0)
// 2. Returns length of src (so you can detect truncation)
// 3. Copies at most size-1 characters
size_t strlcpy(char *dst, const char *src, size_t size);
char dest[10];
size_t len = strlcpy(dest, "hello world", sizeof(dest));
// dest: "hello wor\0" (properly terminated!)
// len: 11 (original length - so we know truncation happened)
if (len >= sizeof(dest)) {
printf("Warning: string was truncated!\n");
}
Project Specification
What You’re Building
A bounds-checked string library with these functions:
// Safe string length (with maximum)
ssize_t safe_strlen(const char *s, size_t max_len);
// Safe string copy (always null-terminates)
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);
// Safe string concatenation
size_t safe_strcat(char *dst, const char *src, size_t dst_size);
// Safe substring extraction
ssize_t safe_substr(char *dst, const char *src, size_t start, size_t len, size_t dst_size);
API Design Principles
- Always take destination size - Every function that writes must know the limit
- Always null-terminate - If dst_size > 0, result is always a valid string
- Return useful information - Indicate success, truncation, or error
- Validate inputs - Check for NULL pointers, zero sizes
Function Specifications
safe_strlen
// Returns: length of string, or -1 if no null found within max_len
// Note: Prevents reading past buffer bounds
ssize_t safe_strlen(const char *s, size_t max_len);
safe_strcpy
// Returns: length that WOULD have been copied (like snprintf)
// - If return value >= dst_size, truncation occurred
// - Always null-terminates if dst_size > 0
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);
safe_strcat
// Returns: total length that would result (existing + added)
// - If return value >= dst_size, truncation occurred
// - Always null-terminates if dst_size > 0
size_t safe_strcat(char *dst, const char *src, size_t dst_size);
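The list above does not specify safe_substr, so its contract is yours to design. One possible sketch follows. The semantics are assumptions, not part of the original specification: it returns the number of characters actually copied, returns -1 for NULL inputs, a zero-sized destination, or a start offset beyond the end of src, and silently clamps len to what is available and what fits.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Sketch of safe_substr under assumed semantics (see the text above). */
ssize_t safe_substr(char *dst, const char *src, size_t start,
                    size_t len, size_t dst_size) {
    if (dst == NULL || dst_size == 0) {
        return -1;
    }
    dst[0] = '\0';                   /* dst is a valid string on every path */
    if (src == NULL) {
        return -1;
    }
    size_t src_len = strlen(src);    /* assumes src is null-terminated */
    if (start > src_len) {
        return -1;                   /* start lies beyond the string */
    }
    size_t avail = src_len - start;  /* characters from start to the end */
    size_t copy_len = (len < avail) ? len : avail;
    if (copy_len > dst_size - 1) {
        copy_len = dst_size - 1;     /* clamp to the destination */
    }
    memcpy(dst, src + start, copy_len);
    dst[copy_len] = '\0';
    return (ssize_t)copy_len;
}
```

With this contract a caller detects a short result by comparing the return value with the requested len. A stricter design could return an error on any clamping instead.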
Solution Architecture
Module Design
safe_string/
├── safe_string.h # Public API
├── safe_string.c # Implementation
├── test_safe_string.c # Unit tests
├── demo_overflow.c # Overflow demonstrations
├── benchmark.c # Performance comparison
└── Makefile
Header File Structure
#ifndef SAFE_STRING_H
#define SAFE_STRING_H
#include <stddef.h>
#include <sys/types.h>
// Error codes
#define SAFE_STR_OK 0
#define SAFE_STR_TRUNCATED 1
#define SAFE_STR_NULL_INPUT -1
#define SAFE_STR_NO_TERMINATOR -2
// Core functions
ssize_t safe_strlen(const char *s, size_t max_len);
size_t safe_strcpy(char *dst, const char *src, size_t dst_size);
size_t safe_strcat(char *dst, const char *src, size_t dst_size);
ssize_t safe_substr(char *dst, const char *src, size_t start, size_t len, size_t dst_size);
// Utility functions
int safe_strcmp(const char *s1, const char *s2, size_t max_len);
char *safe_strdup(const char *s, size_t max_len);
#endif
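The two utility functions are declared in the header but not specified elsewhere, so the sketch below fills them in under assumed semantics: safe_strcmp compares at most max_len characters and treats NULL as sorting before any string, and safe_strdup returns a heap copy or NULL when the input is NULL, unterminated within max_len, or allocation fails.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed contract: compare at most max_len characters. */
int safe_strcmp(const char *s1, const char *s2, size_t max_len) {
    if (s1 == NULL || s2 == NULL) {
        /* Assumed ordering: NULL equals NULL and sorts before any string */
        return (s1 != NULL) - (s2 != NULL);
    }
    for (size_t i = 0; i < max_len; i++) {
        unsigned char a = (unsigned char)s1[i];
        unsigned char b = (unsigned char)s2[i];
        if (a != b) {
            return (a < b) ? -1 : 1;
        }
        if (a == '\0') {
            return 0;  /* both strings ended together */
        }
    }
    return 0;  /* equal within the first max_len characters */
}

/* Assumed contract: caller owns the result and must free() it. */
char *safe_strdup(const char *s, size_t max_len) {
    if (s == NULL) {
        return NULL;
    }
    size_t len = 0;
    while (len < max_len && s[len] != '\0') {
        len++;
    }
    if (len == max_len) {
        return NULL;  /* no terminator found within max_len */
    }
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, len + 1);  /* includes the terminator */
    return copy;
}
```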
Implementation Guide
Phase 1: safe_strlen (1-2 hours)
Goal: Count characters without reading past a maximum.
ssize_t safe_strlen(const char *s, size_t max_len) {
if (s == NULL) {
return -1; // Invalid input
}
for (size_t i = 0; i < max_len; i++) {
if (s[i] == '\0') {
return (ssize_t)i;
}
}
return -1; // No null terminator found within max_len
}
Test Cases:
assert(safe_strlen("hello", 100) == 5);
assert(safe_strlen("hello", 5) == -1); // '\0' sits at index 5, outside the 5 bytes scanned
assert(safe_strlen("hello", 6) == 5); // Just enough to reach the '\0'
assert(safe_strlen("", 100) == 0); // Empty string
assert(safe_strlen(NULL, 100) == -1); // NULL input
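The max_len parameter matters most when the bytes come from somewhere you do not control. As a hedged illustration, the sketch below validates a fixed-size field before treating it as a string. The struct record type and record_name_is_valid are hypothetical, and safe_strlen is repeated only so the example compiles on its own.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Same logic as safe_strlen above, repeated for a standalone example. */
static ssize_t safe_strlen(const char *s, size_t max_len) {
    if (s == NULL) {
        return -1;
    }
    for (size_t i = 0; i < max_len; i++) {
        if (s[i] == '\0') {
            return (ssize_t)i;
        }
    }
    return -1;
}

/* Hypothetical fixed-size record, e.g. read from a file or a socket. */
struct record {
    char name[8];  /* may or may not contain a '\0' */
};

/* Returns 1 if name holds a terminated string, 0 otherwise. */
int record_name_is_valid(const struct record *r) {
    return r != NULL && safe_strlen(r->name, sizeof r->name) >= 0;
}
```

Calling plain strlen on an unterminated name field would read past the end of the array, whereas this check never touches more than sizeof r->name bytes.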
Phase 2: safe_strcpy (2-3 hours)
Goal: Copy with guaranteed null-termination.
size_t safe_strcpy(char *dst, const char *src, size_t dst_size) {
if (dst == NULL || dst_size == 0) {
return 0;
}
if (src == NULL) {
dst[0] = '\0';
return 0;
}
size_t src_len = 0;
while (src[src_len] != '\0') {
src_len++;
}
// Copy at most dst_size - 1 characters
size_t copy_len = (src_len < dst_size - 1) ? src_len : dst_size - 1;
for (size_t i = 0; i < copy_len; i++) {
dst[i] = src[i];
}
dst[copy_len] = '\0'; // Always null-terminate
return src_len; // Return original length for truncation detection
}
Key Insight: By returning src_len, callers can detect truncation:
char buf[10];
if (safe_strcpy(buf, input, sizeof(buf)) >= sizeof(buf)) {
printf("Warning: input was truncated\n");
}
Phase 3: safe_strcat (2-3 hours)
Goal: Concatenate without overflowing destination.
size_t safe_strcat(char *dst, const char *src, size_t dst_size) {
if (dst == NULL || dst_size == 0) {
return 0;
}
// Find current length of dst (within dst_size)
size_t dst_len = 0;
while (dst_len < dst_size && dst[dst_len] != '\0') {
dst_len++;
}
if (dst_len >= dst_size) {
// dst is not properly terminated within dst_size
// Force termination and return error
dst[dst_size - 1] = '\0';
return dst_size; // Indicates error
}
if (src == NULL) {
return dst_len;
}
// Calculate source length
size_t src_len = 0;
while (src[src_len] != '\0') {
src_len++;
}
// Space available for copying (one byte is reserved for the null terminator)
size_t space_left = dst_size - dst_len - 1;
// Copy as much as fits
size_t copy_len = (src_len < space_left) ? src_len : space_left;
for (size_t i = 0; i < copy_len; i++) {
dst[dst_len + i] = src[i];
}
dst[dst_len + copy_len] = '\0';
return dst_len + src_len; // What the total WOULD be
}
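Because both functions report the length the result would have had, several calls can be chained with the same truncation test after each step. The sketch below builds "dir/name" that way. join_path is a hypothetical helper, and the two static functions are compact stand-ins that follow the same contract as the versions above, included only so the example compiles by itself.

```c
#include <stddef.h>
#include <string.h>

/* Compact stand-in for safe_strcpy (same return contract). */
static size_t safe_strcpy(char *dst, const char *src, size_t dst_size) {
    if (dst == NULL || dst_size == 0) return 0;
    if (src == NULL) { dst[0] = '\0'; return 0; }
    size_t src_len = strlen(src);
    size_t n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return src_len;
}

/* Compact stand-in for safe_strcat (same return contract). */
static size_t safe_strcat(char *dst, const char *src, size_t dst_size) {
    if (dst == NULL || dst_size == 0) return 0;
    size_t dst_len = 0;
    while (dst_len < dst_size && dst[dst_len] != '\0') dst_len++;
    if (dst_len == dst_size) { dst[dst_size - 1] = '\0'; return dst_size; }
    if (src == NULL) return dst_len;
    size_t src_len = strlen(src);
    size_t room = dst_size - dst_len - 1;
    size_t n = (src_len < room) ? src_len : room;
    memcpy(dst + dst_len, src, n);
    dst[dst_len + n] = '\0';
    return dst_len + src_len;
}

/* Hypothetical helper: writes "dir/name" into out.
 * Returns 0 on success, -1 on bad input or if the result did not fit. */
int join_path(char *out, size_t out_size, const char *dir, const char *name) {
    if (out == NULL || dir == NULL || name == NULL) return -1;
    if (safe_strcpy(out, dir, out_size) >= out_size) return -1;
    if (safe_strcat(out, "/", out_size) >= out_size) return -1;
    if (safe_strcat(out, name, out_size) >= out_size) return -1;
    return 0;
}
```

Even on failure, out stays a terminated string inside its buffer, so the caller can log it safely before rejecting the input.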
Phase 4: Overflow Demonstration (1-2 hours)
Create a program that shows what happens with unsafe functions:
// demo_overflow.c
#include <stdio.h>
#include <string.h>
void demonstrate_overflow(void) {
    char buffer[10];
    unsigned int canary = 0xDEADBEEFu; // unsigned: 0xDEADBEEF does not fit in int
    printf("BEFORE overflow:\n");
    printf(" buffer address: %p\n", (void*)buffer);
    printf(" canary address: %p\n", (void*)&canary);
    printf(" canary value: 0x%08X\n", canary);
    // Undefined behavior: writes 17 bytes into a 10-byte buffer.
    // Whether canary is hit depends on where the compiler places it.
    strcpy(buffer, "AAAABBBBCCCCDDDD");
    printf("\nAFTER overflow:\n");
    printf(" canary value: 0x%08X\n", canary);
    if (canary != 0xDEADBEEFu) {
        printf(" canary is now CORRUPTED!\n");
    }
}
int main(void) {
    demonstrate_overflow();
    return 0;
}
Compile with AddressSanitizer to see the detection:
$ clang -fsanitize=address demo_overflow.c -o demo
$ ./demo
# AddressSanitizer will report the buffer overflow!
Testing Strategy
Test Suite Structure
// test_safe_string.c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include "safe_string.h"
void test_safe_strlen(void) {
// Normal cases
assert(safe_strlen("hello", 100) == 5);
assert(safe_strlen("", 100) == 0);
// Edge cases
assert(safe_strlen("hello", 5) == -1); // Limit reached before the '\0' at index 5
assert(safe_strlen("hello", 6) == 5); // Limit just covers the '\0'
assert(safe_strlen(NULL, 100) == -1);
printf("safe_strlen: all tests passed\n");
}
void test_safe_strcpy(void) {
char buf[10];
// Normal copy
assert(safe_strcpy(buf, "hello", sizeof(buf)) == 5);
assert(strcmp(buf, "hello") == 0);
// Truncation
assert(safe_strcpy(buf, "this is too long", sizeof(buf)) == 16);
assert(strlen(buf) == 9); // Truncated to 9 chars + '\0'
assert(buf[9] == '\0'); // Still null-terminated!
// Empty string
assert(safe_strcpy(buf, "", sizeof(buf)) == 0);
assert(buf[0] == '\0');
// NULL handling
assert(safe_strcpy(buf, NULL, sizeof(buf)) == 0);
printf("safe_strcpy: all tests passed\n");
}
void test_safe_strcat(void) {
char buf[15];
// Normal concatenation
safe_strcpy(buf, "Hello", sizeof(buf));
assert(safe_strcat(buf, " World", sizeof(buf)) == 11);
assert(strcmp(buf, "Hello World") == 0);
// Truncation during concat
safe_strcpy(buf, "Hello", sizeof(buf));
size_t result = safe_strcat(buf, " World and more", sizeof(buf));
assert(result == 20); // Would have been 20 chars
assert(strlen(buf) == 14); // Actually 14 chars
assert(buf[14] == '\0'); // Still terminated
printf("safe_strcat: all tests passed\n");
}
int main(void) {
    test_safe_strlen();
    test_safe_strcpy();
    test_safe_strcat();
    return 0;
}
Comparison with Standard Library
void compare_with_standard(void) {
char safe_buf[10], unsafe_buf[10];
printf("=== SAFE vs UNSAFE ===\n\n");
// Test 1: Overflow scenario
const char *too_long = "This is way too long for 10 bytes";
// Unsafe version (DON'T DO THIS IN PRODUCTION)
// strcpy(unsafe_buf, too_long); // Would overflow!
// Safe version
size_t result = safe_strcpy(safe_buf, too_long, sizeof(safe_buf));
printf("safe_strcpy returned: %zu\n", result);
printf("String length: %zu\n", strlen(safe_buf));
printf("Buffer content: '%s'\n", safe_buf);
if (result >= sizeof(safe_buf)) {
printf("NOTICE: String was truncated (would have been %zu chars)\n", result);
}
}
Common Pitfalls and Debugging Tips
Pitfall 1: Off-by-One in Size Calculations
// WRONG: forgot to account for null terminator
char buf[10];
for (int i = 0; i < 10; i++) { // Should be 9!
buf[i] = 'A';
}
buf[10] = '\0'; // Out of bounds!
// CORRECT
for (int i = 0; i < 9; i++) {
buf[i] = 'A';
}
buf[9] = '\0';
Pitfall 2: Assuming strlen() Includes Null Terminator
// WRONG
char *copy = malloc(strlen(src)); // Need +1 for '\0'!
strcpy(copy, src);
// CORRECT
char *copy = malloc(strlen(src) + 1);
strcpy(copy, src);
Pitfall 3: Not Checking Return Values
// WRONG: ignoring truncation
char buf[10];
safe_strcpy(buf, user_input, sizeof(buf));
// What if user_input was 100 chars? It got truncated!
// CORRECT
if (safe_strcpy(buf, user_input, sizeof(buf)) >= sizeof(buf)) {
fprintf(stderr, "Error: input too long\n");
return -1;
}
Debugging with Valgrind
$ valgrind --leak-check=full ./test_safe_string
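Check for:
- Invalid reads (reading past a heap buffer)
- Invalid writes (heap buffer overflow)
- Memory leaks
Note that Valgrind's Memcheck does not detect overruns of stack or global arrays; use AddressSanitizer for those.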
Extensions and Challenges
Challenge 1: safe_sprintf
Implement a safe version of sprintf:
int safe_sprintf(char *buf, size_t buf_size, const char *fmt, ...);
// Returns: number of chars that WOULD have been written
// Always null-terminates if buf_size > 0
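One way to approach this challenge is to wrap the standard vsnprintf, which already bounds its output and reports the length it would have written. The sketch below assumes that returning -1 for NULL arguments or a zero-sized buffer is acceptable; the original specification does not say how invalid input should be reported.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch of safe_sprintf built on vsnprintf.
 * Assumption: invalid arguments yield -1. */
int safe_sprintf(char *buf, size_t buf_size, const char *fmt, ...) {
    if (buf == NULL || buf_size == 0 || fmt == NULL) {
        return -1;
    }
    va_list args;
    va_start(args, fmt);
    /* vsnprintf writes at most buf_size - 1 characters plus '\0' */
    int needed = vsnprintf(buf, buf_size, fmt, args);
    va_end(args);
    if (needed < 0) {
        buf[0] = '\0';  /* encoding error: leave a valid empty string */
    }
    return needed;
}
```

Truncation is detected the same way as for safe_strcpy: a non-negative return value that is greater than or equal to buf_size means the output did not fit.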
Challenge 2: String Builder
Create a dynamic string builder that grows as needed:
typedef struct {
char *data;
size_t length;
size_t capacity;
} StringBuilder;
void sb_init(StringBuilder *sb);
void sb_append(StringBuilder *sb, const char *str);
char *sb_to_string(StringBuilder *sb); // Returns owned copy
void sb_free(StringBuilder *sb);
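A minimal sketch of this interface follows, to show one reasonable shape rather than a finished design. It assumes a doubling growth policy starting at 16 bytes, that sb_append silently leaves the builder unchanged if allocation fails (the void return type offers no way to report it), and that str never points into the builder's own buffer.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;       /* NULL until the first append */
    size_t length;    /* characters stored, excluding '\0' */
    size_t capacity;  /* bytes allocated */
} StringBuilder;

void sb_init(StringBuilder *sb) {
    sb->data = NULL;
    sb->length = 0;
    sb->capacity = 0;
}

void sb_append(StringBuilder *sb, const char *str) {
    if (str == NULL) {
        return;
    }
    size_t add = strlen(str);
    size_t needed = sb->length + add + 1;  /* +1 for the terminator */
    if (needed > sb->capacity) {
        size_t new_cap = sb->capacity ? sb->capacity : 16;
        while (new_cap < needed) {
            new_cap *= 2;  /* assumed policy; ignores size_t overflow */
        }
        char *p = realloc(sb->data, new_cap);
        if (p == NULL) {
            return;  /* allocation failed: builder is left unchanged */
        }
        sb->data = p;
        sb->capacity = new_cap;
    }
    memcpy(sb->data + sb->length, str, add + 1);  /* copies the '\0' too */
    sb->length += add;
}

char *sb_to_string(StringBuilder *sb) {
    char *copy = malloc(sb->length + 1);
    if (copy == NULL) {
        return NULL;
    }
    if (sb->length > 0) {
        memcpy(copy, sb->data, sb->length);
    }
    copy[sb->length] = '\0';
    return copy;  /* caller frees */
}

void sb_free(StringBuilder *sb) {
    free(sb->data);
    sb_init(sb);  /* safe to reuse or free again */
}
```

A production version would likely have sb_append return a status code so callers can react to allocation failure.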
Challenge 3: Unicode-Aware strlen
// Count Unicode code points, not bytes
size_t utf8_strlen(const char *s);
// Example: "café" is 4 code points but 5 bytes (é = 2 bytes in UTF-8)
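A common starting point is to count every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx). The sketch below does that. It assumes the input is valid UTF-8 and performs no validation, so malformed sequences produce a count without any error indication.

```c
#include <stddef.h>

/* Counts code points by skipping continuation bytes (10xxxxxx).
 * Assumes s is valid, null-terminated UTF-8; returns 0 for NULL. */
size_t utf8_strlen(const char *s) {
    size_t count = 0;
    if (s == NULL) {
        return 0;
    }
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;  /* lead byte or ASCII: starts a new code point */
        }
    }
    return count;
}
```

A natural next step for the challenge is rejecting invalid input such as overlong encodings or stray continuation bytes.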
Challenge 4: Fuzz Testing
Write a fuzzer that generates random inputs:
void fuzz_safe_strcpy(int iterations) {
for (int i = 0; i < iterations; i++) {
size_t src_len = rand() % 1000;
size_t dst_size = rand() % 100 + 1;
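char *src = generate_random_string(src_len); // Helper you supply: malloc'd string of src_len non-zero bytes
char *dst = malloc(dst_size);
size_t ret = safe_strcpy(dst, src, dst_size);
// Verify invariants:
assert(ret == strlen(src)); // Reports the full source length
assert(strlen(dst) < dst_size); // Terminated inside the buffer
assert(memcmp(dst, src, strlen(dst)) == 0); // Result is a prefix of src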
free(src);
free(dst);
}
}
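The fuzzer relies on generate_random_string, which is not defined anywhere in this project. One possible version is sketched below, assuming printable ASCII is enough for a first pass and that rand() has been seeded by the caller.

```c
#include <stdlib.h>

/* Hypothetical helper for the fuzzer: returns a heap-allocated string of
 * exactly len printable ASCII characters (0x20..0x7E), so it never
 * contains an embedded '\0'. Caller frees. Returns NULL if malloc fails. */
char *generate_random_string(size_t len) {
    char *s = malloc(len + 1);
    if (s == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        s[i] = (char)(' ' + rand() % 95);
    }
    s[len] = '\0';
    return s;
}
```

Running the fuzzer under AddressSanitizer makes it far more useful, since an out-of-bounds write then aborts immediately instead of depending on the assertions to notice it.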
Real-World Connections
Connection 1: CVE Database
Search for “buffer overflow” on CVE databases—you’ll find thousands of vulnerabilities that would have been prevented by bounds checking.
Connection 2: Modern Language Design
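Languages like Rust rule out these issues by design:
- Rust strings know their length
- Rust prevents buffer overflows with bounds-checked indexing: an out-of-range access panics instead of corrupting memory (the borrow checker separately rules out dangling references at compile time)
- Rust’s String type is always valid UTF-8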
Connection 3: Production Libraries
Study how production code handles strings:
- OpenBSD’s strlcpy/strlcat (your inspiration)
- SQLite’s string handling (extremely careful)
- Linux kernel’s string functions
Interview Questions You Can Now Answer
- “What’s wrong with strcpy and how would you fix it?”
  - strcpy doesn’t know destination size
  - Fix: Pass destination size, copy at most size-1 chars, always null-terminate
- “What is a buffer overflow? How does it lead to code execution?”
  - Writing past buffer boundaries corrupts adjacent memory
  - Can overwrite return addresses to redirect execution
- “What’s the difference between strncpy and strlcpy?”
  - strncpy: May not null-terminate, pads with zeros
  - strlcpy: Always null-terminates, returns source length
- “Why does "hello" take 6 bytes of memory?”
  - 5 characters + 1 null terminator
- “How would you implement strlen without using any library functions?”
  - Loop until you find '\0', count iterations
- “What happens if you pass a non-null-terminated string to printf("%s", ...)?”
  - Undefined behavior: printf keeps reading until it happens to find a '\0', possibly far past the end of the buffer
Resources
Books
- “The C Programming Language” by K&R - Chapter 5 (Pointers and Arrays)
- “Effective C” by Robert Seacord - Chapter 7 (Strings)
- “Secure Coding in C and C++” by Robert Seacord - Chapter 2 (Strings)
Online
- CWE-120: Buffer Copy without Checking Size
- CWE-134: Uncontrolled Format String
- OpenBSD strlcpy man page
Tools
- AddressSanitizer (-fsanitize=address)
- Valgrind
- Coverity (static analysis)
Self-Assessment Checklist
Before moving to the next project, you should be able to:
- Explain why "hello" is 6 bytes
- Describe exactly why strcpy(small_buffer, huge_string) corrupts memory
- Implement safe_strcpy from scratch
- Detect buffer overflows using AddressSanitizer
- Explain the difference between strncpy and strlcpy
- Design a function API that makes buffer overflows impossible
Final Milestone: You instinctively check buffer sizes before any string operation.