Project 2: String Library
Implement a small, well-tested string library to master pointer arithmetic and C APIs.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Language | C |
| Prerequisites | Project 1, pointer basics |
| Key Topics | C strings, bounds, APIs, tests |
1. Learning Objectives
By completing this project, you will:
- Implement core C string functions safely.
- Design clean, documented C APIs.
- Write tests for edge cases and invalid inputs.
- Understand null-termination and buffer sizing deeply.
2. Theoretical Foundation
2.1 Core Concepts
- Null-terminated strings: A string ends at the first \0 byte.
- Pointer arithmetic: You walk characters by incrementing pointers.
- Undefined behavior: Writing past the end of a buffer is undefined behavior in C; it can silently corrupt memory, crash the program, or open a security hole.
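The first two concepts can be shown together in a few lines. This is a minimal sketch of a length function, assuming the name str_len from the project specification and assuming (my choice, not a requirement of the guide) that a NULL input reports length 0.

```c
#include <stddef.h>

/* Hypothetical str_len: counts the bytes before the first '\0'.
 * Assumes s is either NULL or a valid, null-terminated string. */
size_t str_len(const char *s)
{
    if (s == NULL) {
        return 0;               /* assumption: treat NULL as empty */
    }
    const char *p = s;
    while (*p != '\0') {
        p++;                    /* walk one character at a time */
    }
    return (size_t)(p - s);     /* pointer difference = bytes walked */
}
```

Note that the function stops at the first terminator, so any bytes after an embedded \0 are invisible to it.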
2.2 Why This Matters
C strings are the source of many security bugs. Building your own library forces you to respect boundaries and to design APIs that prevent mistakes.
2.3 Historical Context / Background
The C standard library's string functions are minimal and unsafe by modern standards: routines such as strcpy and strcat perform no bounds checking. Writing a careful, well-tested string library is a rite of passage for C developers.
2.4 Common Misconceptions
- “Strings know their length”: They do not; you must compute it.
- “strcpy is safe”: It is not unless you check sizes.
3. Project Specification
3.1 What You Will Build
A libstr library that includes safe versions of:
- str_len, str_copy, str_ncopy
- str_cmp, str_ncmp
- str_find, str_rfind
- str_split and str_join
3.2 Functional Requirements
- All functions must handle empty strings.
- Copy routines must respect buffer sizes.
- str_split must return heap-allocated tokens and come with a matching free helper.
- Errors should be reported via return codes or NULL.
3.3 Non-Functional Requirements
- Reliability: No buffer overflows.
- Usability: Clear names and documented behavior.
- Testability: Unit tests for all functions.
3.4 Example Usage / Output
char buf[32];
str_copy(buf, sizeof(buf), "hello");
// buf == "hello"
char **parts = str_split("a,b,c", ',');
// parts -> ["a", "b", "c", NULL]
3.5 Real World Outcome
You can include your library in small projects and avoid unsafe strcpy usage. A test suite proves your functions behave correctly for edge cases.
4. Solution Architecture
4.1 High-Level Design
libstr/ -> header + source -> unit tests -> example program
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| libstr.h | Public API | Clear naming and contracts |
| libstr.c | Implementations | Prefer size-aware routines |
| Tests | Validate correctness | Include edge cases |
4.3 Data Structures
typedef struct {
char **items;
size_t count;
} StrList;
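Because StrList owns both the pointer array and every token, it needs a matching cleanup routine. The sketch below assumes a helper named str_list_free (a hypothetical name; the guide only says a free helper must exist) and assumes each item was allocated with malloc.

```c
#include <stdlib.h>

typedef struct {
    char **items;
    size_t count;
} StrList;

/* Hypothetical helper: releases every token, then the pointer array.
 * Resets the struct so that a second call is harmless. */
void str_list_free(StrList *list)
{
    if (list == NULL) {
        return;
    }
    for (size_t i = 0; i < list->count; i++) {
        free(list->items[i]);   /* free(NULL) is a no-op, so gaps are fine */
    }
    free(list->items);
    list->items = NULL;
    list->count = 0;
}
```

Resetting items and count after freeing is a design choice that turns an accidental double free into a no-op.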
4.4 Algorithm Overview
Key Algorithm: Split
- Count delimiters to determine token count.
- Allocate array of pointers.
- Allocate and copy each token.
Complexity Analysis:
- Time: O(n)
- Space: O(n)
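The three steps above can be sketched as follows. The signature is an assumption: it fills a StrList through an out-parameter and returns 0 or -1, which is only one of the API shapes this guide discusses. The sketch also assumes that consecutive delimiters produce empty tokens and that an empty input yields a single empty token.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;
    size_t count;
} StrList;

/* Hypothetical signature: returns 0 on success, -1 on bad input or
 * allocation failure. On failure *out is left empty. */
int str_split(const char *s, char delim, StrList *out)
{
    if (out == NULL) {
        return -1;
    }
    out->items = NULL;
    out->count = 0;
    if (s == NULL) {
        return -1;
    }

    /* Step 1: count delimiters; n delimiters produce n + 1 tokens. */
    size_t count = 1;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == delim) {
            count++;
        }
    }

    /* Step 2: allocate the array of pointers. */
    char **items = calloc(count, sizeof *items);
    if (items == NULL) {
        return -1;
    }

    /* Step 3: allocate and copy each token. */
    const char *start = s;
    for (size_t i = 0; i < count; i++) {
        const char *end = start;
        while (*end != '\0' && *end != delim) {
            end++;
        }
        size_t len = (size_t)(end - start);
        items[i] = malloc(len + 1);
        if (items[i] == NULL) {          /* undo partial work on failure */
            for (size_t j = 0; j < i; j++) {
                free(items[j]);
            }
            free(items);
            return -1;
        }
        memcpy(items[i], start, len);
        items[i][len] = '\0';
        start = (*end != '\0') ? end + 1 : end;  /* skip the delimiter */
    }

    out->items = items;
    out->count = count;
    return 0;
}
```

The input is scanned twice (once to count, once to copy), which keeps the time linear, and the copies account for the O(n) space.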
5. Implementation Guide
5.1 Development Environment Setup
cc -Wall -Wextra -O2 -g -Isrc -o test_strings tests/test_libstr.c src/libstr.c
5.2 Project Structure
libstr/
├── src/
│ ├── libstr.c
│ └── libstr.h
├── tests/
│ └── test_libstr.c
└── README.md
5.3 The Core Question You’re Answering
“How do I manipulate strings safely when the language gives me only raw pointers?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Null termination
  - Why does C use \0?
  - What happens if it is missing?
- Buffer sizes
  - How do you pass destination size into functions?
- Ownership
  - Who allocates and frees memory in str_split?
5.5 Questions to Guide Your Design
Before implementing, think through these:
- Will your functions mimic standard library semantics or improve them?
- How will you report errors (NULL, -1, errno)?
- Which functions should be size-aware by default?
5.6 Thinking Exercise
Design a Safe Copy
What should str_copy(dest, size, src) return if src does not fit? How should it null-terminate?
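One possible answer to the exercise, offered as a sketch and not as the only valid design: truncate, always terminate, and report the truncation through the return value. The return codes 0 and -1 are assumptions of this example.

```c
#include <stddef.h>

/* Assumed contract:
 *   returns 0 if src fit entirely, -1 on truncation or invalid arguments;
 *   dest is always null-terminated when dest_size > 0. */
int str_copy(char *dest, size_t dest_size, const char *src)
{
    if (dest == NULL || dest_size == 0) {
        return -1;              /* nowhere to write, not even a terminator */
    }
    if (src == NULL) {
        dest[0] = '\0';
        return -1;
    }
    size_t i = 0;
    /* i + 1 < dest_size keeps one byte in reserve for the terminator. */
    while (i + 1 < dest_size && src[i] != '\0') {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
    return (src[i] == '\0') ? 0 : -1;   /* leftover source means truncation */
}
```

With a 4-byte destination and the source "hello", this version stores "hel" and returns -1, so the caller can tell the copy was incomplete while the buffer stays safe to print.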
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “Why is strcpy unsafe?”
- “How do you implement strlen without over-reading?”
- “How should a string split function manage memory?”
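For the over-reading question, a common answer is that a length function can only be safe if the caller supplies an upper bound. The sketch below assumes a hypothetical str_nlen modeled on POSIX strnlen; the name and the NULL handling are illustrative choices.

```c
#include <stddef.h>

/* Hypothetical str_nlen: reads at most max bytes of s.
 * Returns max if no terminator was found within the bound. */
size_t str_nlen(const char *s, size_t max)
{
    if (s == NULL) {
        return 0;
    }
    size_t n = 0;
    while (n < max && s[n] != '\0') {   /* check the bound before reading */
        n++;
    }
    return n;
}
```

A return value equal to max signals that the buffer may not be terminated, which the caller can treat as an error.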
5.8 Hints in Layers
Hint 1: Implement str_len first
Many other functions depend on length.
Hint 2: Add size-aware copies
Use str_ncopy and return a status.
Hint 3: Write tests for edge cases
Empty strings, long strings, and delimiters at edges.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Strings and pointers | “The C Programming Language” | Ch. 5 |
| Safe APIs | “Secure Coding in C and C++” | Ch. 3 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 days)
Goals:
- Core string length and compare
Tasks:
- Implement str_len and str_cmp.
Checkpoint: Tests pass for basic comparisons.
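As a starting point for the comparison task, here is a sketch of str_cmp that follows the sign convention of strcmp. The NULL ordering is an assumption of this example; a library could equally decide to reject NULL outright.

```c
#include <stddef.h>

/* Hypothetical str_cmp: negative, zero, or positive like strcmp.
 * Assumption: NULL sorts before any non-NULL string, and two NULLs
 * compare equal. */
int str_cmp(const char *a, const char *b)
{
    if (a == NULL || b == NULL) {
        return (a != NULL) - (b != NULL);
    }
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Compare as unsigned char so bytes >= 0x80 order consistently. */
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

The loop only needs to test *a for the terminator: if b ends first, the characters differ and the loop stops anyway.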
Phase 2: Core Functionality (4-6 days)
Goals:
- Copy and find functions
Tasks:
- Implement str_copy with size.
- Implement str_find and str_rfind.
Checkpoint: Copy functions prevent overflow.
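For the find tasks in this phase, the guide does not say whether the functions search for a character or a substring. The sketch below assumes the simpler single-character form, returning a pointer to the match or NULL, similar to strchr and strrchr except that the terminator itself is never matched.

```c
#include <stddef.h>

/* Assumed semantics: first occurrence of c in s, or NULL if absent. */
const char *str_find(const char *s, char c)
{
    if (s == NULL) {
        return NULL;
    }
    for (; *s != '\0'; s++) {
        if (*s == c) {
            return s;
        }
    }
    return NULL;
}

/* Assumed semantics: last occurrence of c in s, or NULL if absent. */
const char *str_rfind(const char *s, char c)
{
    if (s == NULL) {
        return NULL;
    }
    const char *last = NULL;
    for (; *s != '\0'; s++) {
        if (*s == c) {
            last = s;           /* remember the most recent match */
        }
    }
    return last;
}
```

Scanning forward and remembering the last hit avoids computing the length first, at the cost of always reading the whole string.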
Phase 3: Advanced Helpers (3-4 days)
Goals:
- Split and join
Tasks:
- Implement str_split and free helper.
- Add str_join.
Checkpoint: Split/join round trips.
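A join routine can be sketched as a two-pass function: measure first, then copy. The signature below is an assumption (an array of strings, a count, and a separator string), and the caller is assumed to own the returned buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature: joins count strings with sep between them.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
char *str_join(char *const *items, size_t count, const char *sep)
{
    if (items == NULL && count > 0) {
        return NULL;
    }
    if (sep == NULL) {
        sep = "";
    }

    /* Pass 1: compute the exact size, including the terminator. */
    size_t sep_len = strlen(sep);
    size_t total = 1;
    for (size_t i = 0; i < count; i++) {
        if (items[i] == NULL) {
            return NULL;
        }
        total += strlen(items[i]);
        if (i + 1 < count) {
            total += sep_len;
        }
    }

    char *out = malloc(total);
    if (out == NULL) {
        return NULL;
    }

    /* Pass 2: copy tokens and separators into place. */
    char *p = out;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(items[i]);
        memcpy(p, items[i], len);
        p += len;
        if (i + 1 < count) {
            memcpy(p, sep, sep_len);
            p += sep_len;
        }
    }
    *p = '\0';
    return out;
}
```

For the round-trip checkpoint, joining the tokens of "a,b,,c" with "," should reproduce the original string, provided the split keeps empty tokens.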
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Error signaling | NULL vs errno | Return status + output | Explicit handling |
| Split API | char ** vs struct | Struct | Encodes count |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate functions | str_len("") |
| Edge Case Tests | Null, empty, long | Truncation behavior |
| Fuzz Tests | Random input | Ensure no crashes |
6.2 Critical Test Cases
- Empty string: Functions should return sensible values.
- Exact fit: Copy should succeed and null-terminate.
- Overflow: Copy should return error and safe result.
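The three cases above might be written as a small test function like the one below. This is a sketch under assumptions: the CHECK macro and run_copy_tests are hypothetical names, and the local str_copy is only a stand-in (returning 0 on success and -1 on truncation) so the example compiles by itself.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the library's str_copy; replace with the real function.
 * Assumed contract: 0 on success, -1 on truncation or bad arguments,
 * destination always terminated when size > 0. */
static int str_copy(char *dest, size_t size, const char *src)
{
    if (dest == NULL || size == 0 || src == NULL) {
        return -1;
    }
    size_t len = strlen(src);
    size_t n = (len < size) ? len : size - 1;
    memcpy(dest, src, n);
    dest[n] = '\0';
    return (len < size) ? 0 : -1;
}

/* Minimal check macro: reports the failing expression and counts it. */
#define CHECK(expr)                                                   \
    do {                                                              \
        if (!(expr)) {                                                \
            fprintf(stderr, "FAIL %s:%d: %s\n", __FILE__, __LINE__,   \
                    #expr);                                           \
            failures++;                                               \
        }                                                             \
    } while (0)

/* Returns the number of failed checks. */
int run_copy_tests(void)
{
    int failures = 0;
    char buf[6];

    /* Empty string: succeeds and yields an empty destination. */
    CHECK(str_copy(buf, sizeof buf, "") == 0);
    CHECK(buf[0] == '\0');

    /* Exact fit: 5 characters plus the terminator in a 6-byte buffer. */
    CHECK(str_copy(buf, sizeof buf, "hello") == 0);
    CHECK(strcmp(buf, "hello") == 0);

    /* Overflow: an error is reported and the result is still terminated. */
    CHECK(str_copy(buf, sizeof buf, "hello!") == -1);
    CHECK(memchr(buf, '\0', sizeof buf) != NULL);

    return failures;
}
```

Counting failures without aborting lets one run report every broken case at once.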
6.3 Test Data
""
"hello"
"a,b,,c"
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Off-by-one in copy | Missing null byte | Always reserve space for \0 |
| Memory leaks in split | Lost tokens | Free in a loop |
| Missing NULL checks | Crashes | Guard input pointers |
7.2 Debugging Strategies
- Use valgrind to detect leaks and overflows.
- Add tests for every edge case you can imagine.
7.3 Performance Traps
Calling strlen in a loop condition can rescan the whole string on every iteration, turning an O(n) loop into O(n^2). Compute the length once and cache it.
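A short illustration of the trap, using a hypothetical count_char helper that is not part of the library specification:

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of c in s (s must be a valid string). */
size_t count_char(const char *s, char c)
{
    size_t hits = 0;

    /* Slow form, shown only as a comment:
     *   for (size_t i = 0; i < strlen(s); i++) { ... }
     * Each evaluation of the condition may walk the string again. */

    size_t len = strlen(s);             /* computed once and cached */
    for (size_t i = 0; i < len; i++) {
        if (s[i] == c) {
            hits++;
        }
    }
    return hits;
}
```

An optimizing compiler can sometimes hoist the call out of the loop, but that is not guaranteed, especially when the loop body writes to memory the string might alias.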
8. Extensions & Challenges
8.1 Beginner Extensions
- Add str_starts_with and str_ends_with.
- Add str_trim for whitespace.
8.2 Intermediate Extensions
- Implement case-insensitive compare.
- Add UTF-8 aware functions.
8.3 Advanced Extensions
- Provide allocator hooks for custom memory management.
- Create a small string struct with length caching.
9. Real-World Connections
9.1 Industry Applications
- Security: Hardened string handling prevents exploits.
- Systems libraries: Embedded systems often roll custom libs.
9.2 Related Open Source Projects
- musl libc: Minimal C standard library implementation.
9.3 Interview Relevance
String manipulation and pointer reasoning are classic interview topics.
10. Resources
10.1 Essential Reading
- “The C Programming Language” - Ch. 5
- “Secure Coding in C and C++” - Ch. 3
10.2 Video Resources
- C pointers and strings lectures (any reputable systems course)
10.3 Tools & Documentation
- man 3 strlen and related string APIs
10.4 Related Projects in This Series
- Dynamic Array: More memory ownership practice.
- Hash Table: Uses string keys.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain null-termination clearly.
- I understand why size-aware APIs matter.
- I can reason about string ownership.
11.2 Implementation
- All functions pass tests.
- No memory leaks or overflows.
- API is documented and consistent.
11.3 Growth
- I can compare my behavior to libc functions.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Implement str_len, str_copy, str_cmp.
- Tests for core functions.
Full Completion:
- Includes str_split and str_join with tests.
Excellence (Going Above & Beyond):
- UTF-8 aware helpers and fuzz tests.
This guide was generated from C_PROGRAMMING_COMPLETE_MASTERY.md. For the complete learning path, see the parent directory.