Project 2: String Library

Implement a small, well-tested string library to master pointer arithmetic and C APIs.

Quick Reference

Attribute      | Value
Difficulty     | Intermediate
Time Estimate  | 1-2 weeks
Language       | C
Prerequisites  | Project 1, pointer basics
Key Topics     | C strings, bounds, APIs, tests

1. Learning Objectives

By completing this project, you will:

  1. Implement core C string functions safely.
  2. Design clean, documented C APIs.
  3. Write tests for edge cases and invalid inputs.
  4. Understand null-termination and buffer sizing deeply.

2. Theoretical Foundation

2.1 Core Concepts

  • Null-terminated strings: A string ends at the first \0 byte.
  • Pointer arithmetic: You walk characters by incrementing pointers.
  • Undefined behavior: Writing past the end of a buffer is undefined behavior in C; it can corrupt adjacent memory, crash the program, or open a security hole.
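
The first two concepts can be seen together in a minimal sketch of str_len. The choice to treat a NULL pointer as an empty string is an assumption made here for illustration, not something the specification requires.

```c
#include <stddef.h>

/* Sketch of str_len: counts the bytes before the first '\0'.
 * Assumption: s is NULL or points to a null-terminated string. */
size_t str_len(const char *s)
{
    if (s == NULL) {
        return 0;               /* assumed policy: NULL behaves like "" */
    }
    const char *p = s;
    while (*p != '\0') {
        p++;                    /* pointer arithmetic: advance one char */
    }
    return (size_t)(p - s);     /* pointer difference = character count */
}
```

Note that a string containing an embedded \0, such as "ab\0cd", has length 2 under this definition, because the string ends at the first \0 byte.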

2.2 Why This Matters

C strings are the source of many security bugs. Building your own library forces you to respect boundaries and to design APIs that prevent mistakes.

2.3 Historical Context / Background

The string functions in the original C standard library are minimal and unsafe by modern standards: strcpy and strcat, for example, take no destination size. A careful, well-tested string library is a rite of passage for C developers.

2.4 Common Misconceptions

  • “Strings know their length”: They do not; you must compute it.
  • “strcpy is safe”: It is not, unless you have already verified that the destination is large enough.

3. Project Specification

3.1 What You Will Build

A libstr library that includes safe versions of:

  • str_len, str_copy, str_ncopy
  • str_cmp, str_ncmp
  • str_find, str_rfind
  • str_split and str_join

3.2 Functional Requirements

  1. All functions must handle empty strings.
  2. Copy routines must respect buffer sizes.
  3. str_split must return heap-allocated tokens, and the library must provide a helper that frees them.
  4. Errors should be reported via return codes or NULL.

3.3 Non-Functional Requirements

  • Reliability: No buffer overflows.
  • Usability: Clear names and documented behavior.
  • Testability: Unit tests for all functions.

3.4 Example Usage / Output

char buf[32];
str_copy(buf, sizeof(buf), "hello");
// buf == "hello"

char **parts = str_split("a,b,c", ',');
// parts -> ["a", "b", "c", NULL]

3.5 Real World Outcome

You can include your library in small projects and avoid unsafe strcpy usage. A test suite proves your functions behave correctly for edge cases.


4. Solution Architecture

4.1 High-Level Design

libstr/ -> header + source -> unit tests -> example program

4.2 Key Components

Component | Responsibility       | Key Decisions
libstr.h  | Public API           | Clear naming and contracts
libstr.c  | Implementations      | Prefer size-aware routines
Tests     | Validate correctness | Include edge cases

4.3 Data Structures

typedef struct {
    char **items;
    size_t count;
} StrList;
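
A struct like this needs a matching free helper so that callers never have to guess how the memory was allocated. The sketch below assumes each item and the items array were allocated with malloc; the name str_list_free is hypothetical.

```c
#include <stdlib.h>

/* Repeated from the definition above so the sketch compiles on its own. */
typedef struct {
    char **items;
    size_t count;
} StrList;

/* Hypothetical free helper: releases every token, then the array itself.
 * Assumption: items[i] and items were all allocated with malloc/calloc. */
void str_list_free(StrList *list)
{
    if (list == NULL) {
        return;
    }
    for (size_t i = 0; i < list->count; i++) {
        free(list->items[i]);
    }
    free(list->items);
    list->items = NULL;         /* leave the struct in a reusable state */
    list->count = 0;
}
```

Resetting items and count makes an accidental second call harmless, since free(NULL) is a no-op.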

4.4 Algorithm Overview

Key Algorithm: Split

  1. Count delimiters to determine token count.
  2. Allocate array of pointers.
  3. Allocate and copy each token.

Complexity Analysis:

  • Time: O(n)
  • Space: O(n)
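
The three steps above can be sketched as follows. This version fills a StrList through an output parameter and returns a status code, which is one possible signature and an assumption of this sketch. It also assumes that adjacent delimiters produce empty tokens and that an empty input yields a single empty token.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;
    size_t count;
} StrList;

/* Sketch of str_split. Returns 0 on success, -1 on NULL input or
 * allocation failure. On failure, *out is left empty. */
int str_split(const char *s, char delim, StrList *out)
{
    if (s == NULL || out == NULL) {
        return -1;
    }
    out->items = NULL;
    out->count = 0;

    /* Step 1: n delimiters produce n + 1 tokens (some may be empty). */
    size_t tokens = 1;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == delim) {
            tokens++;
        }
    }

    /* Step 2: allocate the array of pointers. */
    char **items = calloc(tokens, sizeof *items);
    if (items == NULL) {
        return -1;
    }

    /* Step 3: allocate and copy each token. */
    const char *start = s;
    for (size_t i = 0; i < tokens; i++) {
        const char *end = start;
        while (*end != '\0' && *end != delim) {
            end++;
        }
        size_t len = (size_t)(end - start);
        items[i] = malloc(len + 1);
        if (items[i] == NULL) {
            for (size_t j = 0; j < i; j++) {
                free(items[j]);     /* undo partial work before failing */
            }
            free(items);
            return -1;
        }
        memcpy(items[i], start, len);
        items[i][len] = '\0';
        start = (*end == delim) ? end + 1 : end;
    }

    out->items = items;
    out->count = tokens;
    return 0;
}
```

With this policy, "a,b,,c" splits into four tokens: "a", "b", "", and "c". Both passes touch each input byte once, which matches the O(n) time bound.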

5. Implementation Guide

5.1 Development Environment Setup

cc -Wall -Wextra -O2 -g -Isrc -o test_strings tests/test_libstr.c src/libstr.c

5.2 Project Structure

libstr/
├── src/
│   ├── libstr.c
│   └── libstr.h
├── tests/
│   └── test_libstr.c
└── README.md

5.3 The Core Question You’re Answering

“How do I manipulate strings safely when the language gives me only raw pointers?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Null termination
    • Why does C use \0?
    • What happens if it is missing?
  2. Buffer sizes
    • How do you pass destination size into functions?
  3. Ownership
    • Who allocates and frees memory in str_split?

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Will your functions mimic standard library semantics or improve them?
  2. How will you report errors (NULL, -1, errno)?
  3. Which functions should be size-aware by default?

5.6 Thinking Exercise

Design a Safe Copy

What should str_copy(dest, size, src) return if src does not fit? How should it null-terminate?
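
One possible answer, offered as a sketch rather than the required design: copy as much as fits, always null-terminate when size is at least 1, and return -1 to signal truncation. The return values 0 and -1 are assumptions of this example.

```c
#include <stddef.h>

/* Sketch of a size-aware copy.
 * Returns 0 if all of src fit, -1 on NULL input, zero size, or truncation.
 * Whenever size > 0 and the pointers are valid, dest is null-terminated. */
int str_copy(char *dest, size_t size, const char *src)
{
    if (dest == NULL || src == NULL || size == 0) {
        return -1;
    }
    size_t i = 0;
    while (i + 1 < size && src[i] != '\0') {   /* keep one byte for '\0' */
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
    return (src[i] == '\0') ? 0 : -1;          /* leftover input = truncated */
}
```

Copying "hello" into a 6-byte buffer succeeds exactly; copying it into a 4-byte buffer leaves "hel" and returns -1. An alternative design is to leave dest empty on truncation so callers cannot mistake a partial copy for a full one.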

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Why is strcpy unsafe?”
  2. “How do you implement strlen without over-reading?”
  3. “How should a string split function manage memory?”
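
For the second question, one common answer is a bounded length function that never reads more than a caller-supplied maximum. The sketch below uses the hypothetical name str_nlen, which is not part of the function list in this project; it mirrors the idea behind POSIX strnlen.

```c
#include <stddef.h>

/* Sketch of a bounded length: inspects at most max bytes of s.
 * Returns max if no '\0' is found within that range.
 * Assumption: NULL is treated as an empty string. */
size_t str_nlen(const char *s, size_t max)
{
    if (s == NULL) {
        return 0;
    }
    size_t n = 0;
    while (n < max && s[n] != '\0') {   /* bound is checked before the read */
        n++;
    }
    return n;
}
```

A return value equal to max tells the caller that the buffer may not be terminated, which is exactly the case where a plain strlen would read out of bounds.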

5.8 Hints in Layers

Hint 1: Implement str_len first. Many other functions depend on length.

Hint 2: Add size-aware copies. Use str_ncopy and return a status.

Hint 3: Write tests for edge cases. Empty strings, long strings, and delimiters at edges.

5.9 Books That Will Help

Topic                | Book                           | Chapter
Strings and pointers | “The C Programming Language”   | Ch. 5
Safe APIs            | “Secure Coding in C and C++”   | Ch. 3

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Core string length and compare

Tasks:

  1. Implement str_len and str_cmp.

Checkpoint: Tests pass for basic comparisons.
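
A minimal sketch of str_cmp for this phase, assuming both arguments are non-NULL and following the strcmp convention of returning a negative, zero, or positive value:

```c
/* Sketch of str_cmp: walks both strings until they differ or a ends.
 * Assumption: a and b are valid, null-terminated strings (no NULL check). */
int str_cmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Compare as unsigned char so bytes above 127 order consistently. */
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

The loop does not need to test *b separately: if b ends first, *a == *b fails and the loop stops.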

Phase 2: Core Functionality (4-6 days)

Goals:

  • Copy and find functions

Tasks:

  1. Implement str_copy with size.
  2. Implement str_find and str_rfind.

Checkpoint: Copy functions prevent overflow.
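
The specification does not say whether str_find searches for a character or a substring. The sketch below assumes a single-character search that returns a pointer to the match or NULL, which is only one reasonable reading.

```c
#include <stddef.h>

/* Sketch: first occurrence of c in s, or NULL if absent.
 * Assumption: the terminator itself is never reported as a match. */
const char *str_find(const char *s, char c)
{
    if (s == NULL) {
        return NULL;
    }
    for (; *s != '\0'; s++) {
        if (*s == c) {
            return s;
        }
    }
    return NULL;
}

/* Sketch: last occurrence of c in s, found in a single forward pass. */
const char *str_rfind(const char *s, char c)
{
    const char *last = NULL;
    if (s == NULL) {
        return NULL;
    }
    for (; *s != '\0'; s++) {
        if (*s == c) {
            last = s;       /* remember the most recent match */
        }
    }
    return last;
}
```

Unlike strchr, this version returns NULL when asked to find '\0'; document whichever behavior you choose.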

Phase 3: Advanced Helpers (3-4 days)

Goals:

  • Split and join

Tasks:

  1. Implement str_split and free helper.
  2. Add str_join.

Checkpoint: Split/join round trips.
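
A sketch of str_join that supports the round-trip checkpoint. The signature is an assumption: it takes an array of strings, a count, and a separator character, and returns a malloc-allocated result that the caller must free.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of str_join. Returns a newly allocated string, or NULL on
 * invalid input or allocation failure. Joining zero items yields "". */
char *str_join(char *const *items, size_t count, char sep)
{
    if (items == NULL && count > 0) {
        return NULL;
    }

    /* First pass: total size = all lengths + separators + terminator. */
    size_t total = 1;
    for (size_t i = 0; i < count; i++) {
        if (items[i] == NULL) {
            return NULL;
        }
        total += strlen(items[i]);
    }
    if (count > 1) {
        total += count - 1;
    }

    char *out = malloc(total);
    if (out == NULL) {
        return NULL;
    }

    /* Second pass: copy each item, writing sep between neighbors. */
    char *p = out;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(items[i]);
        memcpy(p, items[i], len);
        p += len;
        if (i + 1 < count) {
            *p++ = sep;
        }
    }
    *p = '\0';
    return out;
}
```

The char *const * parameter type lets a char ** array, such as the items field of a StrList, be passed without a cast.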

5.11 Key Implementation Decisions

Decision        | Options           | Recommendation         | Rationale
Error signaling | NULL vs errno     | Return status + output | Explicit handling
Split API       | char ** vs struct | Struct                 | Encodes count
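
One way to make the "return status + output" recommendation concrete is a small status enum shared by every function. All names below (StrStatus, the STR_ constants, str_status_name) are hypothetical and chosen only for illustration.

```c
/* Hypothetical status codes returned by library functions. */
typedef enum {
    STR_OK = 0,
    STR_ERR_NULL,        /* a required pointer argument was NULL */
    STR_ERR_TRUNCATED,   /* destination too small, output was cut short */
    STR_ERR_NOMEM        /* allocation failed */
} StrStatus;

/* Maps a status to a short, static description for logs and test output. */
const char *str_status_name(StrStatus st)
{
    switch (st) {
    case STR_OK:            return "ok";
    case STR_ERR_NULL:      return "null argument";
    case STR_ERR_TRUNCATED: return "truncated";
    case STR_ERR_NOMEM:     return "out of memory";
    }
    return "unknown";
}
```

With this pattern, results travel through output parameters and the return value is reserved for the status, so a caller cannot confuse an error with a valid result.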

6. Testing Strategy

6.1 Test Categories

Category        | Purpose            | Examples
Unit Tests      | Validate functions | str_len("")
Edge Case Tests | Null, empty, long  | Truncation behavior
Fuzz Tests      | Random input       | Ensure no crashes

6.2 Critical Test Cases

  1. Empty string: Functions should return sensible values.
  2. Exact fit: Copy should succeed and null-terminate.
  3. Overflow: Copy should return an error and leave the destination null-terminated, never writing past the given size.
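
These three cases can be written as a small reusable check. The sketch below takes the copy function as a parameter and assumes it returns 0 on success and nonzero on error. ref_copy is a stand-in built on snprintf so the check has something to run against; replace it with your own copy routine.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed signature for a size-aware copy: 0 on success, nonzero on error. */
typedef int (*copy_fn)(char *dest, size_t size, const char *src);

/* Stand-in implementation, used only to exercise the check below. */
int ref_copy(char *dest, size_t size, const char *src)
{
    int n = snprintf(dest, size, "%s", src);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Runs the three critical cases and returns the number of failures. */
int check_copy(copy_fn copy)
{
    int failures = 0;
    char buf[6];

    /* Empty string: succeeds and yields "". */
    memset(buf, 'x', sizeof buf);
    if (copy(buf, sizeof buf, "") != 0 || buf[0] != '\0') {
        failures++;
    }

    /* Exact fit: 5 characters plus '\0' fill all 6 bytes. */
    memset(buf, 'x', sizeof buf);
    if (copy(buf, sizeof buf, "hello") != 0 || strcmp(buf, "hello") != 0) {
        failures++;
    }

    /* Overflow: must report an error and still terminate inside buf. */
    memset(buf, 'x', sizeof buf);
    if (copy(buf, sizeof buf, "hello!") == 0 ||
        memchr(buf, '\0', sizeof buf) == NULL) {
        failures++;
    }

    return failures;
}
```

Filling the buffer with 'x' before each case ensures a missing terminator is detected instead of being masked by leftover zero bytes.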

6.3 Test Data

""
"hello"
"a,b,,c"

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall               | Symptom           | Solution
Off-by-one in copy    | Missing null byte | Always reserve space for \0
Memory leaks in split | Lost tokens       | Free in a loop
Missing NULL checks   | Crashes           | Guard input pointers

7.2 Debugging Strategies

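  • Use valgrind to detect leaks and invalid heap reads and writes; building with -fsanitize=address (GCC and Clang) also catches stack buffer overflows.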
  • Add tests for every edge case you can imagine.

7.3 Performance Traps

Calling strlen in a loop condition rescans the string on every iteration, which can make the loop O(n^2). Compute the length once and reuse it.
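
A short illustration of caching the length, using a hypothetical helper count_char. The quadratic pattern appears only in the comment; the function itself scans the string a constant number of times.

```c
#include <stddef.h>
#include <string.h>

/* Pattern to avoid:
 *     for (size_t i = 0; i < strlen(s); i++) { ... }
 * strlen may be re-evaluated on every iteration, and each call is O(n). */

/* Counts occurrences of c in s, measuring the length only once.
 * Assumption: s is a valid, null-terminated string. */
size_t count_char(const char *s, char c)
{
    size_t len = strlen(s);     /* cached: one scan, reused by the loop */
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == c) {
            n++;
        }
    }
    return n;
}
```

An optimizing compiler can sometimes hoist the strlen call out of the loop on its own, but that is not guaranteed, especially when the loop body writes to memory.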


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add str_starts_with and str_ends_with.
  • Add str_trim for whitespace.
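
As a starting point for the first extension, here is a sketch of the two predicates. It leans on libc's strlen, strncmp, and memcmp for brevity; swapping in your own routines is part of the exercise. Returning false for NULL input is an assumed policy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if s begins with prefix. The empty prefix matches every string. */
bool str_starts_with(const char *s, const char *prefix)
{
    if (s == NULL || prefix == NULL) {
        return false;
    }
    /* strncmp stops at the end of s, so a short s cannot be over-read. */
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* True if s ends with suffix. The empty suffix matches every string. */
bool str_ends_with(const char *s, const char *suffix)
{
    if (s == NULL || suffix == NULL) {
        return false;
    }
    size_t slen = strlen(s);
    size_t xlen = strlen(suffix);
    if (xlen > slen) {
        return false;           /* suffix longer than s cannot match */
    }
    return memcmp(s + slen - xlen, suffix, xlen) == 0;
}
```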

8.2 Intermediate Extensions

  • Implement case-insensitive compare.
  • Add UTF-8 aware functions.
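
Two small sketches for these extensions, with hypothetical names. str_casecmp folds case with tolower, so in the default "C" locale it only handles ASCII letters. str_utf8_len counts code points by skipping continuation bytes and assumes the input is already valid UTF-8; it does no validation.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive compare: negative, zero, or positive like strcmp.
 * Assumption: a and b are valid, null-terminated strings. */
int str_casecmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        /* Cast to unsigned char: tolower is undefined for negative values. */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == '\0') {
            return ca - cb;
        }
    }
}

/* Counts UTF-8 code points: every byte that is not a continuation
 * byte (bit pattern 10xxxxxx) starts a new code point. */
size_t str_utf8_len(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For example, "h\xC3\xA9llo" occupies 6 bytes but contains 5 code points, because the two bytes C3 A9 encode a single character.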

8.3 Advanced Extensions

  • Provide allocator hooks for custom memory management.
  • Create a small string struct with length caching.
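
For the length-caching idea, a minimal sketch of a string struct that stores its length next to the data. The names Str, str_init, and str_release are hypothetical, and the struct owns a heap copy of the text.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;   /* heap buffer, always null-terminated */
    size_t len;    /* cached length, excluding the terminator */
} Str;

/* Initializes s with a private copy of src. Returns 0 on success,
 * -1 on NULL input or allocation failure. */
int str_init(Str *s, const char *src)
{
    if (s == NULL || src == NULL) {
        return -1;
    }
    size_t len = strlen(src);        /* measured once, then cached */
    char *data = malloc(len + 1);
    if (data == NULL) {
        return -1;
    }
    memcpy(data, src, len + 1);      /* copies the terminator too */
    s->data = data;
    s->len = len;
    return 0;
}

/* Frees the buffer and resets the struct so a second call is harmless. */
void str_release(Str *s)
{
    if (s == NULL) {
        return;
    }
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Once the length is cached, asking for it is O(1), at the cost of keeping len in sync with every operation that modifies data.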

9. Real-World Connections

9.1 Industry Applications

  • Security: Hardened string handling prevents exploits.
  • Systems libraries: Embedded systems often roll custom libs.

9.2 Related Open Source Projects

  • musl libc: Minimal C standard library implementation.

9.3 Interview Relevance

String manipulation and pointer reasoning are classic interview topics.


10. Resources

10.1 Essential Reading

  • “The C Programming Language” - Ch. 5
  • “Secure Coding in C and C++” - Ch. 3

10.2 Video Resources

  • C pointers and strings lectures (any reputable systems course)

10.3 Tools & Documentation

  • man 3 strlen and related string APIs

10.4 Related Projects

  • Dynamic Array: More memory ownership practice.
  • Hash Table: Uses string keys.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain null-termination clearly.
  • I understand why size-aware APIs matter.
  • I can reason about string ownership.

11.2 Implementation

  • All functions pass tests.
  • No memory leaks or overflows.
  • API is documented and consistent.

11.3 Growth

  • I can compare my behavior to libc functions.
  • I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Implement str_len, str_copy, str_cmp.
  • Tests for core functions.

Full Completion:

  • Includes str_split and str_join with tests.

Excellence (Going Above & Beyond):

  • UTF-8 aware helpers and fuzz tests.

This guide was generated from C_PROGRAMMING_COMPLETE_MASTERY.md. For the complete learning path, see the parent directory.