Project 12: Bug Catalog from “It’s Not a Bug, It’s a Language Feature”
Document undefined and unspecified behaviors with examples.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3 |
| Time Estimate | 10-15h |
| Main Programming Language | C |
| Alternative Programming Languages | None |
| Coolness Level | See REFERENCE.md |
| Business Potential | See REFERENCE.md |
| Prerequisites | Basic C semantics |
| Key Topics | undefined behavior |
1. Learning Objectives
By completing this project, you will:
- Build a working artifact that demonstrates the core concept.
- Explain the underlying C rules that make the behavior correct or surprising.
- Validate outcomes with deterministic tests or outputs.
- Document pitfalls and fixes for common mistakes.
2. All Theory Needed (Per-Concept Breakdown)
Undefined Behavior and Portability
Fundamentals Undefined behavior (UB) means the C standard provides no guarantees. Compilers assume UB never happens and may optimize accordingly. Common UB sources include out-of-bounds access, use-after-free, signed overflow, and strict aliasing violations. Portability requires avoiding UB and minimizing implementation-defined assumptions.
Deep Dive into the concept UB is not “just a crash.” It can produce silent miscompilations that only appear at higher optimization levels. Strict aliasing rules allow compilers to assume that different types do not alias, enabling reordering that breaks type-punning hacks. Portable code makes assumptions explicit and uses fixed-width types and well-defined conversions.
How this fits into the project This concept directly powers the core logic of this project and informs the design choices in Section 3.1 and Section 5.5.
Definitions & key terms
- Core type rules: The language rules that define the meaning of expressions and declarations.
- Constraint: A rule that, if violated, produces a diagnostic.
Mental model diagram
Input -> Rule Engine -> Deterministic Output
How it works (step-by-step)
- Identify the relevant rule from the C standard or ABI.
- Apply the rule to a concrete example.
- Validate the outcome with a deterministic test or output.
Minimal concrete example
Example: apply the rule to one small input and show the resulting output line.
Common misconceptions
- Assuming the rule is intuitive rather than specified.
- Forgetting implicit adjustments or promotions.
Check-your-understanding questions
- Which rule applies in this case, and why?
- What would change if the type or qualifier changed?
- Which output line proves the rule is applied correctly?
Check-your-understanding answers
- The rule depends on the declarator, type, or ABI constraint.
- Changing types often changes the conversion or calling rule.
- The output lines that show size, address, or mapping provide proof.
Real-world applications
- Safer APIs, correct diagnostics, and predictable binaries.
Where you’ll apply it See Section 3.1, Section 5.4, and Section 5.5 in this file. Also used in: P12-bug-catalog-language-features.md, P14-memory-debugger-mini-valgrind.md, P16-portable-code-checker.md.
References
- “Expert C Programming” and C standard references
- “CS:APP” and ABI documentation
Key insights Rules beat intuition; write tools that make the rules visible.
Summary This concept explains why the output looks the way it does.
Homework/Exercises to practice the concept
- Rewrite a rule in your own words and test it with two examples.
- Predict output before running the tool.
Solutions to the homework/exercises
- Compare your predictions to the golden transcript and adjust.
3. Project Specification
3.1 What You Will Build
A catalog of edge cases with explanations and mitigations.
Included: Core features required to demonstrate the concept. Excluded: Formal proof or compiler integration.
3.2 Functional Requirements
- Core functionality: Provide the primary output described in the Real World Outcome.
- Deterministic output: The same input produces the same output format.
- Self-checks: Include at least one validation or sanity-check mode.
3.3 Non-Functional Requirements
- Performance: Runs on a small example in under a second.
- Reliability: Handles invalid input gracefully without crashing.
- Usability: Output is readable and clearly labeled.
3.4 Example Usage / Output
$ ./bug_catalog
[UB-01] Signed overflow -> optimizer may remove checks
3.5 Data Formats / Schemas / Protocols
Input: none. Output: catalog report.
3.6 Edge Cases
- Signed overflow
- Type punning
- Unsequenced side effects
3.7 Real World Outcome
This section is your golden reference for correctness and determinism.
3.7.1 How to Run (Copy/Paste)
- Build: make (or cc with a minimal Makefile)
- Run (success): ./P12-bug-catalog-language-features with the example inputs
- Run (failure): use an invalid input to confirm error handling
- Working directory: project root
3.7.2 Golden Path Demo (Deterministic)
$ ./bug_catalog
[UB-01] Signed overflow -> optimizer may remove checks
3.7.3 Failure Demo (Deterministic)
$ ./bug_catalog --bad
error: unknown option
exit code: 2
3.7.4 Exit Codes
- 0 = success
- 1 = IO error or missing file
- 2 = invalid arguments or parse failure
- 3 = unsupported platform/ABI (if applicable)
4. Solution Architecture
4.1 High-Level Design
+-------------+ +-------------+ +-------------+
| Input |---->| Core Logic |---->| Output |
+-------------+ +-------------+ +-------------+
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Parser/Reader | Load and validate input | Keep input handling strict and explicit |
| Core Engine | Execute the core logic | Separate calculation from formatting |
| Reporter | Format and print output | Make output deterministic and labeled |
4.3 Data Structures (No Full Code)
- Input model: parsed representation of the inputs.
- Core model: data structures representing the concept rules.
- Report model: formatted lines or records for output.
4.4 Algorithm Overview
Key Algorithm: Rule Engine
- Parse and normalize input.
- Apply the relevant C/ABI rules.
- Emit a deterministic, labeled report.
Complexity Analysis:
- Time: O(n) over input size
- Space: O(n) for parsed representation
5. Implementation Guide
5.1 Development Environment Setup
- cc (GCC or Clang)
- make
- gdb or lldb
5.2 Project Structure
project-root/
+-- src/
| +-- main.c
| +-- parser.c
| +-- report.c
+-- tests/
| +-- test_cases.txt
+-- Makefile
+-- README.md
5.3 The Core Question You’re Answering
“Which C behaviors are undefined, and how do I avoid them?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- The primary rule set behind this project’s logic.
- The data representation used by the project output.
- The error cases that trigger invalid behavior.
5.5 Questions to Guide Your Design
- How will you parse inputs without ambiguity?
- What invariants must your core logic maintain?
- How will you ensure deterministic output formatting?
5.6 Thinking Exercise
Sketch the input and output for a minimal example. Then predict what the tool should print before you run it.
5.7 The Interview Questions They’ll Ask
- What does this project reveal about undefined behavior?
- How do you validate that the output is correct?
- Which C rule makes the most surprising outcome happen?
- What is the most common mistake in this domain?
- How would you explain this project to a teammate?
5.8 Hints in Layers
Hint 1: Start simple Handle a tiny input and one output line.
Hint 2: Add structure Build a parsed representation before formatting.
Hint 3: Validate output Compare against a golden transcript line by line.
Hint 4: Use tooling
Inspect intermediate artifacts with compiler flags like -E, -S, or -c if applicable.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Undefined behavior | “Effective C” | Ch. 2 |
| C gotchas | “Expert C Programming” | Ch. 1 |
5.10 Implementation Phases
Phase 1: Foundation (2-4h)
- Goals: input parsing and minimal output
- Tasks: build a minimal parser, print one line of output
- Checkpoint: a single example produces the expected output
Phase 2: Core Functionality (4-10h)
- Goals: full rule coverage and deterministic output
- Tasks: add rule engine, add edge cases
- Checkpoint: all golden cases match expected output
Phase 3: Polish & Edge Cases (2-6h)
- Goals: robust error handling and clean output
- Tasks: add error messages, format output
- Checkpoint: failure demos return correct exit codes
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing depth | Shallow vs full | Full within project scope | Avoid ambiguous results |
| Output format | Minimal vs labeled | Labeled | Easier verification |
| Error handling | Silent vs explicit | Explicit | Deterministic failure demos |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test core logic | parser normalization, rule application |
| Integration Tests | Test full flow | input -> output transcript |
| Edge Case Tests | Boundary behavior | invalid input, large nesting |
6.2 Critical Test Cases
- Golden path: Valid input produces the exact expected output.
- Invalid input: Returns exit code 2 with a clear error.
- Edge input: Deep nesting or unusual qualifiers still parse correctly.
6.3 Test Data
- small_valid_case
- deep_nesting_case
- invalid_syntax_case
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Weak parsing | Output is inconsistent | Normalize tokens first |
| Missing rule | Output differs from expected | Add rule and regression test |
| Bad error handling | Crash on invalid input | Return explicit exit codes |
7.2 Debugging Strategies
- Compare outputs: Use a golden transcript for diffing.
- Inspect pipeline: Use compiler flags to validate intermediate stages.
7.3 Performance Traps
For large inputs, avoid quadratic scans; keep parsing linear.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a help flag that prints usage examples.
- Add a version flag and build metadata.
8.2 Intermediate Extensions
- Add a JSON output mode for tooling integration.
- Add a mode that explains rules step-by-step.
8.3 Advanced Extensions
- Add platform-specific ABI notes where relevant.
- Add cross-compiler comparison outputs.
9. Real-World Connections
9.1 Industry Applications
- Debugging: Translating compiler errors into clear explanations.
- Tooling: Building diagnostics for build systems and CI.
9.2 Related Open Source Projects
- cdecl: A classic declaration translation tool.
- Compiler Explorer: Shows how code becomes assembly.
9.3 Interview Relevance
- Declaration parsing, pointer reasoning, and ABI understanding are common systems interview topics.
10. Resources
10.1 Essential Reading
- Undefined behavior: “Effective C”, Ch. 2
- C gotchas: “Expert C Programming”, Ch. 1
10.2 Video Resources
- “C Declarations Explained” (conference talk)
- “Linker Errors Demystified” (lecture)
10.3 Tools & Documentation
- GCC/Clang documentation (compiler flags)
- System V AMD64 ABI
10.4 Related Projects in This Series
- Previous: P12-bug-catalog-language-features.md
- Next: P16-portable-code-checker.md
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the core rules without notes
- I can predict output before running the tool
- I can explain why the failure demo fails
11.2 Implementation
- All functional requirements are met
- All test cases pass
- Edge cases are handled
11.3 Growth
- I can explain this project in an interview
- I documented at least one lesson learned
12. Submission / Completion Criteria
Minimum Viable Completion:
- Golden path output matches the example transcript
- Failure demo produces correct exit code
- README explains how to run
Full Completion:
- All edge cases documented and tested
- Deterministic output across runs
Excellence (Going Above & Beyond):
- Additional output format (JSON or annotated)
- Cross-compiler comparisons