Project 7: String Library from Scratch

A complete string library with safe string functions, UTF-8 support, and bounds-checking interfaces that prevent buffer overflows.

Quick Reference

Attribute              Value
Primary Language       C
Alternative Languages  None
Difficulty             Level 3 - Advanced
Time Estimate          See main guide
Knowledge Area         String Handling, Security
Tooling                GCC, Valgrind, AddressSanitizer
Prerequisites          See main guide

What You Will Build

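A C library of bounds-checked string functions (safe_strcpy, safe_strcat, safe_snprintf) that always null-terminate their output and report truncation through a status code, plus UTF-8 routines that count and decode codepoints. A string_test driver exercises each feature from the command line.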

Why It Matters

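A C string carries no length: it is a run of bytes that ends at the first null byte, so functions such as strcpy will write past a destination that is too small. Writing bounds-checked replacements and a UTF-8 decoder by hand builds the secure-coding habits that real-world systems and tooling depend on.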

Core Challenges

  • Null terminator handling → Maps to understanding C strings
  • Buffer overflow prevention → Maps to secure coding
  • UTF-8 encoding → Maps to Unicode support
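
The first two challenges meet in a single function. Below is a minimal sketch of a bounded copy, not a definitive implementation. The names safe_strcpy, SUCCESS, and ERROR_BUFFER_TOO_SMALL come from the sample output in Real-World Outcome; the (dst, dst_size, src) parameter order, the str_status type, and ERROR_NULL_POINTER are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status type. Only SUCCESS and ERROR_BUFFER_TOO_SMALL
   are named in the sample output; the rest is assumed. */
typedef enum {
    SUCCESS = 0,
    ERROR_NULL_POINTER,
    ERROR_BUFFER_TOO_SMALL
} str_status;

/* Copy src into dst, writing at most dst_size bytes including the
   null terminator. On truncation the destination still holds a valid,
   terminated string. Assumes src and dst do not overlap. */
str_status safe_strcpy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL) {
        return ERROR_NULL_POINTER;
    }
    if (dst_size == 0) {
        return ERROR_BUFFER_TOO_SMALL;  /* no room even for the terminator */
    }

    size_t src_len = strlen(src);
    /* Reserve one byte for the terminator when the source does not fit. */
    size_t n = (src_len < dst_size) ? src_len : dst_size - 1;

    memcpy(dst, src, n);
    dst[n] = '\0';

    return (src_len < dst_size) ? SUCCESS : ERROR_BUFFER_TOO_SMALL;
}
```

Passing the buffer size rather than a maximum character count keeps the call site simple: sizeof buf is the right argument for an array, and the function alone decides how many bytes of payload fit.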

Key Concepts

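  • Before you code, map each part of the project to the concept it exercises: null-terminated C strings, buffer size versus string length, truncation reporting, and UTF-8 multi-byte encoding.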

Real-World Outcome

# 1. Basic safe operations
$ ./string_test safe_ops
Testing safe_strcpy:
  Source: "Hello, World!" (13 chars)
  Dest buffer: 10 bytes
  Result: ERROR_BUFFER_TOO_SMALL
  Dest contents: "Hello, Wo" (truncated with null terminator)

Testing safe_strcat:
  Dest: "Hello" (5 chars)
  Source: ", World!" (8 chars)
  Dest buffer: 15 bytes
  Result: SUCCESS
  Dest contents: "Hello, World!" (13 chars)

# 2. UTF-8 handling
$ ./string_test utf8 "Hello, 世界! 🌍"
Input string bytes: 19
ASCII character count: 19 (wrong!)
UTF-8 codepoint count: 12 (correct!)
Codepoints:
  H (U+0048) - 1 byte
  e (U+0065) - 1 byte
  l (U+006C) - 1 byte
  l (U+006C) - 1 byte
  o (U+006F) - 1 byte
  , (U+002C) - 1 byte
  (space) (U+0020) - 1 byte
  世 (U+4E16) - 3 bytes
  界 (U+754C) - 3 bytes
  ! (U+0021) - 1 byte
  (space) (U+0020) - 1 byte
  🌍 (U+1F30D) - 4 bytes

# 3. Overflow prevention demo
$ ./string_test overflow
Standard strcpy (DANGEROUS):
  Attempting to copy 100 bytes into 10-byte buffer...
  [AddressSanitizer would catch: stack-buffer-overflow]

Safe strcpy:
  Attempting to copy 100 bytes into 10-byte buffer...
  Result: ERROR_BUFFER_TOO_SMALL
  No overflow occurred. Buffer contains: "123456789" (truncated safely)

# 4. Format string safety
$ ./string_test format
safe_snprintf(buf, 10, "Value: %d", 12345)
Result: "Value: 12" (truncated, no overflow)
Return value: 12 (would need 12 chars for full output)
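
The byte and codepoint counts in the UTF-8 run above follow from how the lead byte of each sequence encodes its length. The sketch below shows one way to decode and count codepoints; the function names utf8_decode and utf8_count and their signatures are hypothetical, and the validation (overlong forms, surrogates, values above U+10FFFF) follows the UTF-8 definition rather than anything stated in this guide.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one codepoint from s, reading at most len bytes.
   Returns the number of bytes consumed (1 to 4), or 0 if the
   sequence is invalid or truncated. Hypothetical helper. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    if (s == NULL || out == NULL || len == 0) {
        return 0;
    }

    unsigned char b0 = s[0];
    size_t need;      /* total bytes in this sequence */
    uint32_t cp;      /* codepoint being assembled */
    uint32_t min;     /* smallest value legal for this length */

    if (b0 < 0x80)                { need = 1; cp = b0;        min = 0x0; }
    else if ((b0 & 0xE0) == 0xC0) { need = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { need = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { need = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;    /* stray continuation byte or invalid lead byte */

    if (need > len) {
        return 0;     /* sequence runs past the end of the input */
    }
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return 0; /* continuation bytes must look like 10xxxxxx */
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;     /* overlong form, out of range, or surrogate */
    }

    *out = cp;
    return need;
}

/* Count codepoints in a null-terminated string.
   Returns 0 on success, -1 on NULL arguments or invalid UTF-8. */
int utf8_count(const char *str, size_t *count)
{
    if (str == NULL || count == NULL) {
        return -1;
    }

    const unsigned char *p = (const unsigned char *)str;
    size_t remaining = strlen(str);
    size_t n = 0;

    while (remaining > 0) {
        uint32_t cp;
        size_t used = utf8_decode(p, remaining, &cp);
        if (used == 0) {
            return -1;
        }
        p += used;
        remaining -= used;
        n++;
    }

    *count = n;
    return 0;
}
```

For the sample input, strlen reports 19 because it counts bytes, while utf8_count reports 12 because the two CJK characters take 3 bytes each and the emoji takes 4.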

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.
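
Steps 2 and 3 could look like the following for concatenation: a small working core, then validation layered around it. This is a sketch under assumptions, not the guide's prescribed design. The name safe_strcat and the SUCCESS and ERROR_BUFFER_TOO_SMALL results appear in the sample output; the signature, the str_status type, ERROR_NULL_POINTER, and ERROR_NOT_TERMINATED are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status type; only SUCCESS and ERROR_BUFFER_TOO_SMALL
   are named in the sample output. */
typedef enum {
    SUCCESS = 0,
    ERROR_NULL_POINTER,
    ERROR_NOT_TERMINATED,
    ERROR_BUFFER_TOO_SMALL
} str_status;

/* Append src to the string in dst, where dst_size is the total size
   of the destination buffer in bytes. Assumes src and dst do not
   overlap. */
str_status safe_strcat(char *dst, size_t dst_size, const char *src)
{
    /* Validation: reject NULL pointers before touching memory. */
    if (dst == NULL || src == NULL) {
        return ERROR_NULL_POINTER;
    }

    /* Validation: find dst's terminator without reading past the
       buffer. strlen(dst) would overrun if dst is not terminated. */
    const char *end = memchr(dst, '\0', dst_size);
    if (end == NULL) {
        return ERROR_NOT_TERMINATED;
    }

    /* Core: copy as much of src as fits, always leaving room for
       the terminator. */
    size_t dst_len = (size_t)(end - dst);
    size_t room = dst_size - dst_len - 1;
    size_t src_len = strlen(src);
    size_t n = (src_len <= room) ? src_len : room;

    memcpy(dst + dst_len, src, n);
    dst[dst_len + n] = '\0';

    return (src_len <= room) ? SUCCESS : ERROR_BUFFER_TOO_SMALL;
}
```

Using memchr bounded by dst_size instead of strlen is the design choice worth noting: the function never trusts the destination to be a well-formed string.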

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs
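
For the format case, matching the sample output hinges on the return value: it reports the length the full output would need, so a caller can detect truncation by comparing it with the buffer size. A minimal sketch of a wrapper over the standard vsnprintf follows. The name safe_snprintf comes from the sample output; its signature, the -1 error convention, and the helper was_truncated are assumptions.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Let GCC and Clang check format strings against their arguments. */
#if defined(__GNUC__)
#define PRINTF_LIKE(f, a) __attribute__((format(printf, f, a)))
#else
#define PRINTF_LIKE(f, a)
#endif

int safe_snprintf(char *buf, size_t size, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

/* Format into buf, writing at most size bytes including the
   terminator. Returns the number of characters the complete output
   needs (excluding the terminator), or -1 on invalid arguments or a
   formatting error. */
int safe_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    if (buf == NULL || fmt == NULL || size == 0) {
        return -1;
    }

    va_list ap;
    va_start(ap, fmt);
    int needed = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (needed < 0) {
        buf[0] = '\0';  /* leave a valid empty string on error */
        return -1;
    }
    return needed;
}

/* Hypothetical helper: nonzero if the output did not fit in size bytes. */
int was_truncated(int needed, size_t size)
{
    return needed < 0 || (size_t)needed >= size;
}
```

With a 10-byte buffer and the format "Value: %d" applied to 12345, the wrapper returns 12 and the buffer holds the first 9 characters plus the terminator, as in the sample run.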

References

  • Main guide: PROFESSIONAL_C_PROGRAMMING_MASTERY.md
  • Effective C, 2nd Edition by Robert C. Seacord