Project 13: Build a Safe String Library

Design a safer string API with explicit length and capacity.

Quick Reference

Attribute	Value
Difficulty	Level 4
Time Estimate	20-30h
Main Programming Language	C
Alternative Programming Languages	Rust, Zig
Coolness Level	See REFERENCE.md
Business Potential	See REFERENCE.md
Prerequisites	C strings, memory management
Key Topics	bounds checking, ownership

1. Learning Objectives

By completing this project, you will:

Build a working string library that tracks explicit length and capacity.
Explain the underlying C rules that make the behavior correct or surprising.
Validate outcomes with deterministic tests or outputs.
Document pitfalls and fixes for common mistakes.

2. All Theory Needed (Per-Concept Breakdown)

Storage Duration, Lifetime, and Layout

Fundamentals A C program's code and data occupy text, read-only data, initialized data, BSS, heap, or stack regions. Each object has a storage duration (static, thread, automatic, or allocated) and a lifetime (the period during which it is valid to access). Alignment rules insert padding into structs, which affects size and layout. These details explain why returning a pointer to a local variable is invalid and why a struct's size often exceeds the sum of its fields' sizes.

Deep Dive into the concept Process layout is consistent in practice on typical hosted platforms: text/rodata/data/bss, then a heap that usually grows upward and a stack that usually grows downward. The C standard itself does not mandate this arrangement. Lifetimes are strict: automatic objects end at scope exit, allocated objects end at free, and static objects persist for the program lifetime. Alignment and padding ensure the CPU can access data efficiently. Understanding layout turns memory bugs into predictable, diagnosable problems.
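The padding claim can be checked directly with sizeof and offsetof. The following is a minimal sketch; struct sample and the three helper functions are hypothetical names chosen for illustration, and the exact numbers they return depend on the platform ABI.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical record used only to illustrate padding. */
struct sample {
    char   tag;   /* 1 byte */
    size_t len;   /* typically 4 or 8 bytes, aligned to its own alignment */
    char   flag;  /* 1 byte, usually followed by tail padding */
};

/* Sum of the member sizes, ignoring any padding. */
size_t sample_field_bytes(void)
{
    return sizeof(char) + sizeof(size_t) + sizeof(char);
}

/* Bytes of padding the compiler inserted (interior plus tail). */
size_t sample_padding_bytes(void)
{
    return sizeof(struct sample) - sample_field_bytes();
}

/* Offset of len; it lands on a multiple of the alignment of size_t. */
size_t sample_len_offset(void)
{
    return offsetof(struct sample, len);
}
```

Printing these three values on two different targets is a quick way to see that layout is an ABI decision rather than something the source code alone determines.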

How this fits into the project This concept drives the core logic of this project and informs the design choices in Section 3.1 and Section 5.5.

Definitions & key terms

Core type rules: The language rules that define the meaning of expressions and declarations.
Constraint: A rule that, if violated, produces a diagnostic.

Mental model diagram

Input -> Rule Engine -> Deterministic Output

How it works (step-by-step)

Identify the relevant rule from the C standard or ABI.
Apply the rule to a concrete example.
Validate the outcome with a deterministic test or output.

Minimal concrete example

Example: apply the rule to one small input and show the resulting output line.
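One possible concrete instance of the lifetime rule: instead of returning a pointer to a local array, the function writes into a buffer the caller owns and reports whether the result fit. This is a sketch only; format_greeting and its return convention (0 on success, -1 on failure) are assumptions made for illustration.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */

/*
 * Invalid pattern, shown only as a comment:
 *     char *greet(void) { char buf[32]; ...; return buf; }
 * buf has automatic storage duration, so its lifetime ends when greet returns.
 */

/* Writes "hello, <name>" into dst. Returns 0 on success, or -1 on bad
 * arguments or when the text does not fit in cap bytes. */
int format_greeting(char *dst, size_t cap, const char *name)
{
    if (dst == NULL || name == NULL || cap == 0) {
        return -1;
    }
    int n = snprintf(dst, cap, "hello, %s", name);
    if (n < 0 || (size_t)n >= cap) {
        dst[0] = '\0';  /* leave a valid empty string rather than a partial one */
        return -1;
    }
    return 0;
}
```

Because the caller supplies both the storage and its capacity, the buffer's lifetime is visible at the call site and the callee never has to guess how much room it has.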

Common misconceptions

Assuming the rule is intuitive rather than specified.
Forgetting implicit adjustments or promotions.

Check-your-understanding questions

Which rule applies in this case, and why?
What would change if the type or qualifier changed?
Which output line proves the rule is applied correctly?

Check-your-understanding answers

The rule depends on the declarator, type, or ABI constraint.
Changing types often changes the conversion or calling rule.
The output lines that show size, address, or mapping provide proof.

Real-world applications

Safer APIs, correct diagnostics, and predictable binaries.

Where you’ll apply it See Section 3.1, Section 5.4, and Section 5.5 in this file. Also used in: P03-memory-layout-visualizer.md, P15-struct-packing-analyzer.md.

References

“Expert C Programming” and C standard references
“CS:APP” and ABI documentation

Key insights Rules beat intuition; write tools that make the rules visible.

Summary This concept explains why the output looks the way it does.

Homework/Exercises to practice the concept

Rewrite a rule in your own words and test it with two examples.
Predict output before running the tool.

Solutions to the homework/exercises

Compare your predictions to the golden transcript and adjust.

Undefined Behavior and Portability

Fundamentals Undefined behavior (UB) means the C standard imposes no requirements on what the program does. Compilers assume UB never happens and may optimize accordingly. Common UB sources include out-of-bounds access, use-after-free, signed overflow, and strict aliasing violations. Portability requires avoiding UB and minimizing implementation-defined assumptions.

Deep Dive into the concept UB is not “just a crash.” It can produce silent miscompilations that only appear at higher optimization levels. Strict aliasing rules allow compilers to assume that different types do not alias, enabling reordering that breaks type-punning hacks. Portable code makes assumptions explicit and uses fixed-width types and well-defined conversions.
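A common way to keep arithmetic out of undefined territory is to test for overflow before performing the operation. The sketch below uses hypothetical helper names (int_add_checked, size_add_checked); the second helper matters for capacity calculations, where unsigned wraparound is defined but still yields a wrong size.

```c
#include <limits.h>   /* INT_MAX, INT_MIN */
#include <stdbool.h>
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Adds a + b without ever evaluating an overflowing signed expression. */
bool int_add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;  /* the sum would overflow, so do not compute it */
    }
    *out = a + b;
    return true;
}

/* size_t wraps instead of invoking UB, but a wrapped capacity is still a bug. */
bool size_add_checked(size_t a, size_t b, size_t *out)
{
    if (a > SIZE_MAX - b) {
        return false;
    }
    *out = a + b;
    return true;
}
```

The checks are written so that the test expression itself cannot overflow, which is the part that is easy to get wrong.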

How this fits into the project This concept drives the core logic of this project and informs the design choices in Section 3.1 and Section 5.5.

Definitions & key terms

Core type rules: The language rules that define the meaning of expressions and declarations.
Constraint: A rule that, if violated, produces a diagnostic.

Mental model diagram

Input -> Rule Engine -> Deterministic Output

How it works (step-by-step)

Identify the relevant rule from the C standard or ABI.
Apply the rule to a concrete example.
Validate the outcome with a deterministic test or output.

Minimal concrete example

Example: apply the rule to one small input and show the resulting output line.
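As one illustration of replacing a type-punning hack with a well-defined conversion, the bytes of a float can be copied into an integer with memcpy instead of being read through a cast pointer. This is a sketch under stated assumptions: float_bits is a hypothetical name, and the code assumes float is 32 bits wide, which the static assertion enforces at compile time.

```c
#include <stdint.h>  /* uint32_t */
#include <string.h>  /* memcpy */

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Returns the object representation of f as an unsigned integer.
 * Reading *(uint32_t *)&f instead would violate strict aliasing. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* byte copy: no aliasing rule is broken */
    return u;
}
```

The value returned for a given float depends on the platform's floating-point format; on IEEE 754 targets, 1.0f maps to 0x3F800000.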

Common misconceptions

Assuming the rule is intuitive rather than specified.
Forgetting implicit adjustments or promotions.

Check-your-understanding questions

Which rule applies in this case, and why?
What would change if the type or qualifier changed?
Which output line proves the rule is applied correctly?

Check-your-understanding answers

The rule depends on the declarator, type, or ABI constraint.
Changing types often changes the conversion or calling rule.
The output lines that show size, address, or mapping provide proof.

Real-world applications

Safer APIs, correct diagnostics, and predictable binaries.

Where you’ll apply it See Section 3.1, Section 5.4, and Section 5.5 in this file. Also used in: P12-bug-catalog-language-features.md, P14-memory-debugger-mini-valgrind.md, P16-portable-code-checker.md.

References

“Expert C Programming” and C standard references
“CS:APP” and ABI documentation

Key insights Rules beat intuition; write tools that make the rules visible.

Summary This concept explains why the output looks the way it does.

Homework/Exercises to practice the concept

Rewrite a rule in your own words and test it with two examples.
Predict output before running the tool.

Solutions to the homework/exercises

Compare your predictions to the golden transcript and adjust.

3. Project Specification

3.1 What You Will Build

A library that prevents buffer overflows through explicit capacity checks.

Included: Core features required to demonstrate the concept. Excluded: Unicode normalization or internationalization.
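One possible shape for such a library is sketched below. The type struct sstr, the status enum, and the functions sstr_init and sstr_append are hypothetical names, not a prescribed API; the sketch also assumes the caller owns the buffer and that capacity counts the terminator.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen, memcpy */

/* Hypothetical string type: the caller owns the buffer, the struct tracks it. */
struct sstr {
    char  *data;  /* caller-provided storage */
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* total bytes available, including the terminator */
};

enum sstr_status { SSTR_OK = 0, SSTR_ERR_NULL, SSTR_ERR_CAPACITY };

/* Binds s to buf and makes it an empty, terminated string. */
enum sstr_status sstr_init(struct sstr *s, char *buf, size_t cap)
{
    if (s == NULL || buf == NULL) return SSTR_ERR_NULL;
    if (cap == 0) return SSTR_ERR_CAPACITY;
    s->data = buf;
    s->len = 0;
    s->cap = cap;
    buf[0] = '\0';
    return SSTR_OK;
}

/* Appends src only if it fits entirely; otherwise leaves s unchanged. */
enum sstr_status sstr_append(struct sstr *s, const char *src)
{
    if (s == NULL || s->data == NULL || src == NULL) return SSTR_ERR_NULL;
    size_t n = strlen(src);
    /* Need n bytes plus one terminator in the remaining cap - len bytes. */
    if (n >= s->cap - s->len) return SSTR_ERR_CAPACITY;
    memcpy(s->data + s->len, src, n + 1);  /* copies the terminator too */
    s->len += n;
    return SSTR_OK;
}
```

Rejecting an oversized append outright, rather than truncating, is a design choice made here for simplicity; a truncating variant is an equally valid policy as long as the caller can detect it.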

3.2 Functional Requirements

Core functionality: Provide the primary output described in the Real World Outcome.
Deterministic output: The same input produces the same output format.
Self-checks: Include at least one validation or sanity-check mode.

3.3 Non-Functional Requirements

Performance: Runs on a small example in under a second.
Reliability: Handles invalid input gracefully without crashing.
Usability: Output is readable and clearly labeled.

3.4 Example Usage / Output

$ ./safe_string_tests
[OK] create length=5 capacity=16

3.5 Data Formats / Schemas / Protocols

Input: API calls. Output: status codes and string values.

3.6 Edge Cases

Truncation behavior
Null pointer inputs
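Both edge cases can be handled in a single copy routine. The following sketch uses a hypothetical function, copy_truncating, and assumes a policy in which truncation is allowed but always reported and the destination is always terminated.

```c
#include <stdbool.h>
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen, memcpy */

/*
 * Copies src into dst, truncating if needed, and always terminates dst.
 * Returns the number of bytes copied (excluding the terminator) and sets
 * *truncated when src did not fit. A NULL dst or zero capacity copies nothing;
 * a NULL src produces an empty string.
 */
size_t copy_truncating(char *dst, size_t cap, const char *src, bool *truncated)
{
    if (truncated != NULL) *truncated = false;
    if (dst == NULL || cap == 0) return 0;
    if (src == NULL) {
        dst[0] = '\0';
        return 0;
    }

    size_t n = strlen(src);
    if (n >= cap) {
        n = cap - 1;  /* reserve one byte for the terminator */
        if (truncated != NULL) *truncated = true;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

Reporting truncation through an out-parameter keeps the return value usable as a length; returning a status code and writing the length through a pointer would work just as well.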

3.7 Real World Outcome

This section is your golden reference for correctness and determinism.

3.7.1 How to Run (Copy/Paste)

Build: make (or invoke cc directly)
Run (success): ./safe_string_tests
Run (failure): ./safe_string_tests --bad to confirm error handling
Working directory: project root

3.7.2 Golden Path Demo (Deterministic)

$ ./safe_string_tests
[OK] create length=5 capacity=16
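A transcript line like the one above stays deterministic if it is produced by a single formatting function with a fixed format string. The sketch below shows one way to do that; format_report_line is a hypothetical helper, and %zu is the standard conversion for size_t.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */

/* Formats one report line into dst. Returns 0 on success, or -1 on bad
 * arguments or when the line does not fit in cap bytes. */
int format_report_line(char *dst, size_t cap, const char *op,
                       size_t len, size_t capacity)
{
    if (dst == NULL || op == NULL || cap == 0) {
        return -1;
    }
    int n = snprintf(dst, cap, "[OK] %s length=%zu capacity=%zu",
                     op, len, capacity);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Calling it with "create", 5, and 16 produces the golden line, so the test binary and the expected transcript share one definition of the format.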

3.7.3 Failure Demo (Deterministic)

$ ./safe_string_tests --bad
error: test case not found
exit code: 2

3.7.4 Exit Codes

0 = success
1 = IO or missing file
2 = invalid arguments or parse failure
3 = unsupported platform/ABI (if applicable)
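The exit codes above can be given names so that the test runner never returns a bare number. This is a sketch only: the enum, the list of known test cases, and select_case are all hypothetical, and a real runner would go on to execute the selected case.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strcmp */

/* Named exit codes matching the table in this section. */
enum ss_exit {
    SS_EXIT_OK          = 0,
    SS_EXIT_IO          = 1,
    SS_EXIT_USAGE       = 2,
    SS_EXIT_UNSUPPORTED = 3
};

/* Hypothetical list of test-case names the runner understands. */
static const char *const known_cases[] = { "create", "append", "truncate" };

/* Maps an optional command-line argument to an exit code. */
enum ss_exit select_case(const char *arg)
{
    if (arg == NULL) {
        return SS_EXIT_OK;  /* no argument: run every case */
    }
    for (size_t i = 0; i < sizeof known_cases / sizeof known_cases[0]; i++) {
        if (strcmp(arg, known_cases[i]) == 0) {
            return SS_EXIT_OK;
        }
    }
    return SS_EXIT_USAGE;   /* unknown name such as "--bad" */
}
```

With this mapping, the failure demo's argument falls through to the usage error and yields exit code 2.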

4. Solution Architecture

4.1 High-Level Design

+-------------+     +-------------+     +-------------+
|  Input      |---->|  Core Logic |---->|  Output     |
+-------------+     +-------------+     +-------------+

4.2 Key Components

Component	Responsibility	Key Decisions
Parser/Reader	Load and validate input	Keep input handling strict and explicit
Core Engine	Execute the core logic	Separate calculation from formatting
Reporter	Format and print output	Make output deterministic and labeled

4.3 Data Structures (No Full Code)

Input model: parsed representation of the inputs.
Core model: data structures representing the concept rules.
Report model: formatted lines or records for output.

4.4 Algorithm Overview

Key Algorithm: Rule Engine

Parse and normalize input.
Apply the relevant C/ABI rules.
Emit a deterministic, labeled report.

Complexity Analysis:

Time: O(n) over input size
Space: O(n) for parsed representation

5. Implementation Guide

5.1 Development Environment Setup

- cc (GCC or Clang)
- make
- gdb or lldb

5.2 Project Structure

project-root/
+-- src/
|   +-- main.c
|   +-- parser.c
|   +-- report.c
+-- tests/
|   +-- test_cases.txt
+-- Makefile
+-- README.md

5.3 The Core Question You’re Answering

“How can I design string APIs that make unsafe operations hard to express?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

The primary rule set behind this project’s logic.
The data representation used by the project output.
The error cases that trigger invalid behavior.

5.5 Questions to Guide Your Design

How will you parse inputs without ambiguity?
What invariants must your core logic maintain?
How will you ensure deterministic output formatting?
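For the question about invariants, it can help to write them down as a predicate that tests call after every operation. The sketch below assumes a hypothetical struct sstr with data, len, and cap fields, where cap includes the terminator; your own type may choose different invariants.

```c
#include <stdbool.h>
#include <stddef.h>  /* size_t */

/* Hypothetical string type; cap counts the terminator byte. */
struct sstr {
    char  *data;
    size_t len;
    size_t cap;
};

/* Returns true when every invariant of the string holds. */
bool sstr_is_valid(const struct sstr *s)
{
    return s != NULL
        && s->data != NULL
        && s->cap > 0
        && s->len < s->cap             /* room for the terminator */
        && s->data[s->len] == '\0';    /* terminator sits exactly at len */
}
```

The conditions are ordered so that each one makes the next safe to evaluate: the terminator is read only after len is known to be inside the buffer.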

5.6 Thinking Exercise

Sketch the input and output for a minimal example. Then predict what the tool should print before you run it.

5.7 The Interview Questions They’ll Ask

What does this project reveal about bounds checking and ownership?
How do you validate that the output is correct?
Which C rule makes the most surprising outcome happen?
What is the most common mistake in this domain?
How would you explain this project to a teammate?

5.8 Hints in Layers

Hint 1: Start simple Handle a tiny input and one output line.

Hint 2: Add structure Build a parsed representation before formatting.

Hint 3: Validate output Compare against a golden transcript line by line.

Hint 4: Use tooling Inspect intermediate artifacts with compiler flags like -E, -S, or -c if applicable.
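The line-by-line comparison from Hint 3 can be sketched as a small helper that reports where two transcripts first diverge. The function name first_diff_line is hypothetical, and the sketch assumes both transcripts are already in memory as NUL-terminated strings with '\n' line endings.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strcspn, memcmp */

/*
 * Compares two transcripts line by line.
 * Returns 0 if they are identical, otherwise the 1-based number of the
 * first line that differs. A line break present on one side only counts
 * as a difference on the current line.
 */
size_t first_diff_line(const char *expected, const char *actual)
{
    size_t line = 1;
    while (*expected != '\0' || *actual != '\0') {
        size_t elen = strcspn(expected, "\n");  /* length of this line */
        size_t alen = strcspn(actual, "\n");
        if (elen != alen || memcmp(expected, actual, elen) != 0) {
            return line;
        }
        expected += elen;
        actual += alen;
        if ((*expected == '\n') != (*actual == '\n')) {
            return line;
        }
        if (*expected == '\n') {  /* both sides have a line break: skip it */
            expected++;
            actual++;
        }
        line++;
    }
    return 0;
}
```

In a test runner, a nonzero result gives the line number to print alongside the expected and actual text, which is usually enough to locate a formatting regression.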

5.9 Books That Will Help

Topic	Book	Chapter
Safer C	“Effective C”	Ch. 5
C strings	K&R	Ch. 5

5.10 Implementation Phases

Phase 1: Foundation (2-4h)

Goals: input parsing and minimal output
Tasks: build a minimal parser, print one line of output
Checkpoint: a single example produces the expected output

Phase 2: Core Functionality (4-10h)

Goals: full rule coverage and deterministic output
Tasks: add rule engine, add edge cases
Checkpoint: all golden cases match expected output

Phase 3: Polish & Edge Cases (2-6h)

Goals: robust error handling and clean output
Tasks: add error messages, format output
Checkpoint: failure demos return correct exit codes

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Parsing depth	Shallow vs full	Full within project scope	Avoid ambiguous results
Output format	Minimal vs labeled	Labeled	Easier verification
Error handling	Silent vs explicit	Explicit	Deterministic failure demos

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Test core logic	parser normalization, rule application
Integration Tests	Test full flow	input -> output transcript
Edge Case Tests	Boundary behavior	invalid input, large nesting

6.2 Critical Test Cases

Golden path: Valid input produces the exact expected output.
Invalid input: Returns exit code 2 with a clear error.
Edge input: Deep nesting or unusual qualifiers still parse correctly.

6.3 Test Data

- small_valid_case
- deep_nesting_case
- invalid_syntax_case

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Weak parsing	Output is inconsistent	Normalize tokens first
Missing rule	Output differs from expected	Add rule and regression test
Bad error handling	Crash on invalid input	Return explicit exit codes

7.2 Debugging Strategies

Compare outputs: Use a golden transcript for diffing.
Inspect pipeline: Use compiler flags to validate intermediate stages.

7.3 Performance Traps

For large inputs, avoid quadratic scans; keep parsing linear.

8. Extensions & Challenges

8.1 Beginner Extensions

Add a help flag that prints usage examples.
Add a version flag and build metadata.

8.2 Intermediate Extensions

Add a JSON output mode for tooling integration.
Add a mode that explains rules step-by-step.

8.3 Advanced Extensions

Add platform-specific ABI notes where relevant.
Add cross-compiler comparison outputs.

9. Real-World Connections

9.1 Industry Applications

Debugging: Translating compiler errors into clear explanations.
Tooling: Building diagnostics for build systems and CI.

9.2 Related Tools

cdecl: A classic declaration translation tool.
Compiler Explorer: Shows how code becomes assembly.

9.3 Interview Relevance

Declaration parsing, pointer reasoning, and ABI understanding are common systems interview topics.

10. Resources

10.1 Essential Reading

Safer C “Effective C” Ch. 5
C strings K&R Ch. 5

10.2 Video Resources

“C Declarations Explained” (conference talk)
“Linker Errors Demystified” (lecture)

10.3 Tools & Documentation

GCC/Clang documentation (compiler flags)
System V AMD64 ABI

11. Self-Assessment Checklist

11.1 Understanding

I can explain the core rules without notes
I can predict output before running the tool
I can explain why the failure demo fails

11.2 Implementation

All functional requirements are met
All test cases pass
Edge cases are handled

11.3 Growth

I can explain this project in an interview
I documented at least one lesson learned

12. Submission / Completion Criteria

Minimum Viable Completion:

Golden path output matches the example transcript
Failure demo produces correct exit code
README explains how to run

Full Completion:

All edge cases documented and tested
Deterministic output across runs

Excellence (Going Above & Beyond):

Additional output format (JSON or annotated)
Cross-compiler comparisons