Project 1: Compiler Behavior Laboratory

A reproducible test harness that reveals how C behavior categories and compiler optimizations change program outcomes.

Quick Reference

Attribute | Value
--------- | -----
Difficulty | Level 2 - Intermediate
Time Estimate | 4-8 hours
Main Programming Language | C
Alternative Programming Languages | None (this is about C compilers)
Coolness Level | Level 3 - Genuinely Clever
Business Potential | Level 1 - Resume Gold
Prerequisites | C basics, command line, build flags, basic debugging
Key Topics | Abstract machine, UB, optimization, toolchain, diagnostics

1. Learning Objectives

By completing this project, you will:

  1. Distinguish well-defined, implementation-defined, unspecified, and undefined behavior with concrete examples.
  2. Build a repeatable test harness that compares compiler outputs across versions and optimization levels.
  3. Explain the as-if rule and how observable behavior constrains optimization.
  4. Design experiments that prevent the compiler from optimizing away critical observations.
  5. Produce a documented report that can be used as a portability checklist for future C code.

2. All Theory Needed (Per-Concept Breakdown)

Concept 1: The C Abstract Machine and Behavior Categories

Fundamentals

The C standard does not describe a specific CPU or OS. Instead, it defines an abstract machine: a rules-based model that explains what C programs mean. Every C expression, object, and side effect is defined in terms of this abstract machine. The abstract machine is intentionally flexible so compilers can map it to many different real machines. This is why C has behavior categories. Well-defined behavior is portable; implementation-defined behavior is portable if you check the compiler’s documentation; unspecified behavior allows multiple outcomes; undefined behavior (UB) means the program violated a rule and the compiler has no obligations. Understanding these categories is the foundation for professional C because it teaches you what you can safely rely on and what you must avoid or document.

Deep Dive into the concept

The abstract machine sits between your C code and the physical machine. It defines sequence points, object lifetimes, representation, and how evaluation is sequenced. The compiler is free to transform your program as long as the observable behavior is consistent with what the abstract machine permits. That is why the abstract machine uses a precise vocabulary: “effective type,” “object representation,” “lvalue,” “trap representation,” and more. Each one corresponds to a rule that the compiler can use for optimization. If your program violates a rule, the compiler can assume that situation never occurs, and it can rewrite the program under that assumption. In practice, this means that UB can cause optimizations that delete checks, fold branches, or reorder memory operations. Those transformations are not “bugs” in the compiler; they are consequences of the abstract machine.

Behavior categories are the standard’s way of drawing lines around what is safe. Implementation-defined behavior is still part of the abstract machine, but the standard delegates the concrete choice to the implementation. For example, the size of int, the signedness of char, and the representation of long double are implementation-defined. The compiler must document them, which means you can depend on them if you restrict your portability target. Unspecified behavior is trickier: the compiler can choose from multiple valid outcomes, and it is not required to document the choice. A classic example is the order of evaluation of function arguments. If your program depends on that order, you have a bug that may only surface when you change compilers or flags.

Undefined behavior is the most dangerous category. The standard intentionally avoids defining behavior for certain cases because it would prevent optimizations or complicate implementations. Signed integer overflow, shifting by negative counts, out-of-bounds pointer arithmetic, and reading uninitialized values are all UB. Compilers use UB to justify aggressive optimizations: if UB cannot happen, branches checking for it may be removed. This is why a program might behave “correctly” at -O0 but break at -O3. It’s not that the optimization “introduced” the bug; rather, it revealed assumptions the program already violated. In this project, you will design tests that make these boundaries explicit and document how different compilers exploit them.

In practice, you will discover that behavior categories are not only theoretical labels but also decision points that shape how you write and review code. A professional workflow is to treat every unclear construct as a question: is this well-defined, implementation-defined, or unspecified? Then you record the answer, preferably with a citation to the compiler manual or the standard. This changes how you write tests: you do not just assert outputs, you also assert categories and the conditions that make them valid. For example, you might list assumptions about two’s complement, char signedness, or right-shift behavior. You can also build small diagnostics that encode those assumptions into compile-time checks (static_assert) or runtime probes, making the program self-documenting. Finally, you should observe how compiler flags like -fwrapv, -fno-strict-aliasing, and sanitizer builds effectively move code between categories by changing what the compiler assumes. This is why professional C codebases often centralize compiler flags and document which behaviors they rely on, rather than letting each file make implicit assumptions.

To operationalize this concept in a real codebase, create a short checklist of invariants and a set of micro-experiments. Start with a minimal, deterministic test that isolates one rule or behavior, then vary a single parameter at a time (inputs, flags, platform, or data layout) and record the outcome. Keep a table of assumptions and validate them with assertions or static checks so violations are caught early. Whenever the concept touches the compiler or OS, capture tool output such as assembly, warnings, or system call traces and attach it to your lab notes. Finally, define explicit failure modes: what does a violation look like at runtime, and how would you detect it in logs or tests? This turns abstract theory into repeatable engineering practice and makes results comparable across machines and compiler versions.

How this fits into the project

  • It defines the classification system for every test in your lab.
  • It determines which results are portable and which are compiler-specific.
  • It informs how you document outcomes and which warnings are fatal.

Definitions & key terms

  • Abstract machine: The formal model defined by the C standard that determines program meaning.
  • Well-defined behavior: Behavior specified by the standard, portable across conforming implementations.
  • Implementation-defined behavior: Behavior chosen by the implementation, documented in compiler/ABI docs.
  • Unspecified behavior: Multiple outcomes allowed, no documentation required.
  • Undefined behavior (UB): The standard imposes no requirements; anything may happen.
  • Trap representation: A bit pattern that does not represent a valid value for a type.

Mental model diagram (ASCII)

Source Code
   |
   v
[ C Abstract Machine ] --rules--> { defined | impl-defined | unspecified | UB }
   |
   v
Compiler transforms (as-if rule)
   |
   v
Machine code -> OS -> Hardware

How it works (step-by-step, with invariants and failure modes)

  1. The compiler parses your code and interprets it through the abstract machine’s rules.
  2. It classifies operations into behavior categories based on those rules.
  3. Optimizations assume that UB never occurs and that implementation-defined behavior matches documented choices.
  4. If your program violates a rule, the compiler may remove checks or reorder code.
  5. The resulting binary can show outcomes that are surprising but still valid under the abstract machine.

Invariant: The compiler must preserve observable behavior for well-defined programs. Failure mode: If UB occurs, the invariant no longer applies and optimizations may invalidate assumptions.

Minimal concrete example

#include <stdio.h>
#include <limits.h>

int main(void) {
    int x = INT_MAX;
    int y = x + 1; // signed overflow: UB
    printf("%d\n", y);
    return 0;
}

Common misconceptions

  • “Undefined behavior just means a crash.” → UB can also look correct or vary by optimization level.
  • “If it works on GCC, it’s portable.” → Implementation-defined behavior might differ elsewhere.
  • “Warnings don’t matter.” → Many UB cases appear only as warnings, not errors.

Check-your-understanding questions

  1. Why can a compiler remove an overflow check on signed integers?
  2. What is the difference between unspecified and implementation-defined behavior?
  3. Give an example of UB that does not immediately crash.
  4. Why is a trap representation relevant to floating-point or pointer types?
  5. How does the abstract machine enable portability across CPUs?

Check-your-understanding answers

  1. Because signed overflow is UB; the compiler can assume it never happens.
  2. Implementation-defined behavior must be documented; unspecified behavior need not be.
  3. Reading an uninitialized local variable often prints “garbage” but may appear consistent.
  4. Trap representations are invalid bit patterns; reading them can cause UB.
  5. The abstract machine defines semantics independent of hardware, letting compilers map to many CPUs.

Real-world applications

  • Writing portable libraries that must compile with GCC, Clang, and MSVC.
  • Debugging “release-only” bugs caused by UB optimizations.
  • Auditing security-sensitive code where UB can become exploitable.

Where you’ll apply it

Every test case in Section 3 is labeled with one of these categories, and the report in Section 3.5 groups results by category.

References

  • “Effective C, 2nd Edition” — Robert C. Seacord, Ch. 1-2
  • “C Programming: A Modern Approach” — K.N. King, Ch. 12-13
  • ISO C Standard (C23 draft): sections on behavior categories and constraints

Key insights

Professional C means treating the abstract machine as your contract and UB as a design failure.

Summary

The abstract machine defines what C means without committing to a specific CPU. Behavior categories are the standard’s mechanism for separating portable guarantees from compiler-specific choices and outright errors. Knowing these categories is essential for building robust, portable C programs and for interpreting the surprising behavior that emerges under optimization.

Homework/Exercises to practice the concept

  1. List five examples of undefined behavior and explain why each is UB.
  2. Find three implementation-defined behaviors on your compiler and document them.
  3. Write a small program that depends on unspecified evaluation order; then fix it.

Solutions to the homework/exercises

  1. Examples: signed overflow, shifting by an amount greater than or equal to the type’s width, out-of-bounds pointer arithmetic, reading uninitialized memory, modifying a string literal.
  2. Examples: sizeof(int) (commonly 4), whether plain char is signed, and the result of right-shifting a negative signed value.
  3. Split the expression into separate statements or use sequencing operators.

Concept 2: The As-If Rule, Optimization, and Observable Behavior

Fundamentals

The as-if rule says the compiler may transform your program in any way as long as the observable behavior is unchanged. Observable behavior includes I/O, volatile accesses, and interactions that the C standard defines as visible. Everything else is fair game. This is what allows the compiler to remove dead code, inline functions, reorder operations, and vectorize loops. It also explains why undefined behavior is so dangerous: if the compiler can prove a situation is impossible under the abstract machine, it can eliminate code paths that would otherwise handle it. Understanding the as-if rule helps you predict optimization effects and design experiments that the compiler cannot legally delete.

Deep Dive into the concept

Optimization is not a bag of tricks; it is a set of transformations justified by the abstract machine’s rules. The compiler builds an internal representation of your program and reasons about equivalence: can this expression be replaced with a simpler one without changing observable behavior? It assumes the program follows the rules of the language. For example, if it can prove that a pointer is never null, it can remove null checks and dereferences. This is legal because in a well-defined program, a null dereference cannot happen. The as-if rule is the formal license that allows these proofs to become transformations.

Observable behavior is intentionally narrow to give compilers maximum freedom. A sequence of pure arithmetic operations with no side effects can be reordered, combined, or removed entirely. But if those operations feed a printf, then the output becomes observable. The compiler must preserve the output as if the operations happened in the original order. The presence of volatile expands the observable set by forcing loads and stores to happen. This is why you use volatile in a compiler experiment: it pins the optimizer to the reality you want to observe.

Optimization levels are bundles of transformation passes. -O0 tries to preserve structure for debugging. -O1 and -O2 prioritize safe, common optimizations; -O3 may unroll loops, vectorize, and reorder more aggressively. Some optimizations are sensitive to UB: if a compiler can prove an expression never overflows or a pointer is never out of bounds, it will use that fact globally. This can lead to surprising behavior when the program violates the assumption. Your test harness must therefore both provoke UB and control the optimizer so you can see the difference. That means using volatile, isolating expressions in functions, and controlling inlining with attributes or -fno-inline when necessary.

Another subtlety is that the as-if rule applies after translation units are combined. Link-time optimization (LTO) gives the compiler visibility across files, enabling transformations that are impossible within a single translation unit. That means a program that appears safe in one file can be optimized away once the compiler sees the whole program. The lab should include tests that compare normal builds with LTO to show how visibility changes optimization.

To operationalize this concept, structure every experiment as a controlled comparison: the same source built at -O0 and -O3, with and without volatile pinning, and with and without LTO. Capture the assembly (-S) for each cell so you can point at the exact transformation, and record compiler versions and flags in your lab notes so results stay comparable across machines. Define explicit failure modes up front: what does a deleted check or reordered store look like in the output, and how would your harness detect it?

How this fits into the project

  • It explains why results differ between -O0 and -O3.
  • It guides the use of volatile and barriers to preserve behavior.
  • It informs the report format, which must capture compiler flags and optimization levels.

Definitions & key terms

  • As-if rule: Allows transformations that preserve observable behavior.
  • Observable behavior: Effects visible to the outside world (I/O, volatile, atomic operations).
  • Optimization pass: A compiler phase that transforms IR to improve performance.
  • LTO: Link-time optimization that enables whole-program analysis.
  • Volatile: Qualifier that forces loads/stores to be performed as written.

Mental model diagram (ASCII)

Source -> IR -> [Optimization Passes] -> IR' -> Codegen
                 |         |
                 |         +-- Assumes UB never occurs
                 +-- Must preserve observable behavior

How it works (step-by-step, with invariants and failure modes)

  1. The compiler builds IR and annotates side effects.
  2. It applies transformations that preserve observable behavior.
  3. It assumes UB does not occur and treats impossible paths as dead.
  4. It emits optimized machine code based on those assumptions.
  5. If UB occurs at runtime, the emitted code may not match your intuition.

Invariant: Observable behavior must match the abstract machine for defined programs. Failure mode: UB breaks the invariant and allows aggressive reordering or deletion.

Minimal concrete example

int foo(int *p) {
    if (p == 0) return 0;
    *p = 1;
    return 1;
}

When compiled at -O3 with whole-program visibility (for example under LTO), the compiler might prove that no caller passes a null pointer and remove the check entirely.

Common misconceptions

  • “Volatile makes code thread-safe.” → It only forces loads/stores to occur; it provides neither atomicity nor inter-thread ordering.
  • “-O0 means no optimization.” → Some optimizations still happen.
  • “If I see the check in source, it must execute.” → The compiler can delete it.

Check-your-understanding questions

  1. Why is volatile useful in a compiler behavior lab?
  2. What changes when you enable LTO?
  3. Which operations are considered observable?
  4. How can -fno-inline change your experiment results?
  5. Why does UB allow optimizations that appear to “break” code?

Check-your-understanding answers

  1. It forces actual loads/stores, preventing the optimizer from removing them.
  2. The compiler sees more code, enabling whole-program reasoning and more optimizations.
  3. I/O, volatile/atomic accesses, and other side effects visible outside the program.
  4. It prevents inlining, keeping function boundaries and call semantics intact.
  5. Because the compiler assumes UB never occurs, it can remove paths that would handle it.

Real-world applications

  • Diagnosing why a security check disappears in optimized builds.
  • Writing firmware where volatile-mapped registers must not be optimized away.
  • Understanding why a debug build works but a release build fails.

Where you’ll apply it

The harness design in Section 5 applies this directly: volatile pinning, -fno-inline, and the -O0/-O3 matrix all exist to control what the as-if rule lets the optimizer do.

References

  • “Engineering a Compiler” — Cooper & Torczon, optimization chapters
  • “Computer Systems: A Programmer’s Perspective” — Bryant & O’Hallaron, Ch. 5
  • GCC and Clang optimization manuals

Key insights

Optimizers are correct because they assume your program follows the rules you promised.

Summary

The as-if rule grants the compiler enormous freedom as long as observable behavior is preserved. That freedom is what makes modern compilers fast and also what makes UB so dangerous. In the lab you will design experiments that reveal these transformations and document how they vary across compilers and flags.

Homework/Exercises to practice the concept

  1. Compile a simple loop at -O0 and -O3, then compare assembly.
  2. Write a function with a null check and see if it disappears under LTO.
  3. Use volatile to pin a variable and observe how codegen changes.

Solutions to the homework/exercises

  1. You should see fewer loads/stores and more register usage at -O3.
  2. With LTO, the compiler may remove the check if it proves non-null callers.
  3. volatile forces loads/stores, increasing memory traffic and reducing reordering.

3. Project Specification

3.1 What You Will Build

A multi-compiler test harness that runs a curated set of C programs and produces a comparison report showing how behavior categories manifest across compilers, versions, flags, and optimization levels. The harness includes a runner script, a standardized output format, and documentation explaining each test case and its behavior category.

3.2 Functional Requirements

  1. Behavior Categorization: Each test case must be labeled as defined, implementation-defined, unspecified, or UB.
  2. Multi-Compiler Support: The harness must compile each test with GCC and Clang (MSVC optional).
  3. Optimization Matrix: Each test must run at least at -O0 and -O3.
  4. Result Capture: Outputs must be captured and normalized into a single report format.
  5. Version Logging: Compiler versions and flags must be included in the report header.

3.3 Non-Functional Requirements

  • Performance: The suite should complete in under 60 seconds for the default set.
  • Reliability: Tests must be deterministic when the behavior category is defined.
  • Usability: One command should generate the full report.

3.4 Example Usage / Output

$ ./run_all_tests.sh
=== COMPILER BEHAVIOR LAB REPORT ===
Compilers: gcc 14.2.0, clang 18.1.2
Flags: -std=c23 -O0, -O3

Test: signed_overflow (UB)
  gcc -O0: -2147483648
  gcc -O3: 2147483647
  clang -O0: -2147483648
  clang -O3: <crash>

Test: right_shift_negative (implementation-defined)
  gcc -O0: -1
  clang -O0: -1

Test: eval_order_args (unspecified)
  gcc -O0: 1 2
  clang -O0: 2 1

3.5 Data Formats / Schemas / Protocols

Report format (tabular text):

Test Name | Category | gcc -O0 | gcc -O3 | clang -O0 | clang -O3
--------- | -------- | ------- | ------- | --------- | ---------
...

3.6 Edge Cases

  • Tests optimized away due to inlining or constant folding.
  • UB tests that crash under one compiler but not another.
  • Implementation-defined results that change between compiler versions.

3.7 Real World Outcome

What you will see:

  1. A reproducible report comparing outputs across compilers and flags.
  2. A catalog of behavior categories with explanations.
  3. A test harness you can reuse to validate portability.

3.7.1 How to Run (Copy/Paste)

# From the project root
make clean
make all
./run_all_tests.sh > reports/behavior_report.txt

3.7.2 Golden Path Demo (Deterministic)

Run a known-defined behavior test (e.g., unsigned wraparound) and verify identical results across compilers.

3.7.3 If CLI: exact terminal transcript

$ ./run_all_tests.sh --tests unsigned_wraparound
=== COMPILER BEHAVIOR LAB REPORT ===
Test: unsigned_wraparound (defined)
  gcc -O0: 0
  gcc -O3: 0
  clang -O0: 0
  clang -O3: 0
Exit: 0

Failure demo (deterministic):

$ ./run_all_tests.sh --tests missing_file
ERROR: test file not found: tests/missing_file.c
Exit: 2

4. Solution Architecture

4.1 High-Level Design

+-------------------+
| tests/            |
|   *.c             |
+---------+---------+
          |
          v
+-------------------+     +------------------+
| build_matrix.sh   | --> | compiler outputs |
+---------+---------+     +------------------+
          |
          v
+-------------------+     +------------------+
| runner            | --> | normalized logs  |
+---------+---------+     +------------------+
          |
          v
+-------------------+
| report generator  |
+-------------------+

4.2 Key Components

Component | Responsibility | Key Decisions
--------- | -------------- | -------------
Test cases | Individual C programs per behavior category | Keep each test isolated and minimal
Build matrix | Compile tests across compilers/flags | Explicit matrix to make differences visible
Runner | Execute tests, capture output/exit | Normalize output for comparison
Reporter | Produce final table | Text format for portability

4.3 Data Structures (No Full Code)

struct test_case {
    const char *name;
    const char *category; // defined, impl-defined, unspecified, UB
    const char *source_path;
};

4.4 Algorithm Overview

Key Algorithm: Build-and-run matrix

  1. Enumerate test cases.
  2. For each compiler and optimization flag, compile each test.
  3. Run each binary and capture stdout/stderr/exit code.
  4. Normalize output and write report.

Complexity Analysis:

  • Time: O(T * C * F) where T=tests, C=compilers, F=flags
  • Space: O(T) for report aggregation

5. Implementation Guide

5.1 Development Environment Setup

# Required compilers
gcc --version
clang --version

# Build tools
make --version

5.2 Project Structure

compiler-behavior-lab/
├── tests/
│   ├── ub_signed_overflow.c
│   ├── impl_right_shift.c
│   └── unspecified_eval_order.c
├── scripts/
│   ├── build_matrix.sh
│   └── run_all_tests.sh
├── reports/
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“What exactly does my C code mean, and who decides?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Behavior categories and the abstract machine.
  2. Optimization levels and the as-if rule.
  3. How to prevent the compiler from optimizing away your experiment.

5.5 Questions to Guide Your Design

  1. How will you keep each test small and isolated?
  2. How will you normalize output for fair comparisons?
  3. How will you handle tests that crash or time out?

5.6 Thinking Exercise

Design a test that demonstrates unspecified evaluation order without using UB.

5.7 The Interview Questions They’ll Ask

  1. Why is signed overflow undefined in C?
  2. How would you demonstrate implementation-defined behavior to a teammate?
  3. What does the as-if rule guarantee?

5.8 Hints in Layers

  • Hint 1: Start with one test case and a single compiler/flag.
  • Hint 2: Add volatile to keep computations visible.
  • Hint 3: Separate compilation and execution so you can compare outputs.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Behavior categories | “Effective C” — Seacord | Ch. 1-2 |
| Optimization effects | “CS:APP” — Bryant | Ch. 5 |
| Compiler model | “Engineering a Compiler” — Cooper | Ch. 1-3 |

5.10 Implementation Phases

Phase 1: Foundation (2-3 hours)

  • Build one test and compile with two flags.
  • Script the build matrix.
  • Checkpoint: The same test runs across GCC/Clang with captured outputs.

Phase 2: Core Functionality (3-4 hours)

  • Add 10+ tests across categories.
  • Implement report normalization.
  • Checkpoint: Report lists all tests across compilers/flags.

Phase 3: Polish & Edge Cases (1-2 hours)

  • Add crash handling and exit codes.
  • Document each test with category reasoning.
  • Checkpoint: Report includes reasons and compiler versions.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Report format | CSV, Markdown, text table | Text table | Easy to diff and read |
| Test isolation | One file per test, combined file | One file per test | Avoid cross-test effects |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit tests | Validate runner scripts | Parsing output, exit codes |
| Integration tests | Build-and-run matrix | Full report generation |
| Edge case tests | Failures and crashes | Missing compiler, failing build |

6.2 Critical Test Cases

  1. Missing compiler: Should exit with code 3 and clear message.
  2. UB crash: Should capture exit code and continue other tests.
  3. Normalization: Output should strip timestamps or non-deterministic values.

6.3 Test Data

Test: signed_overflow
Expected: non-deterministic between -O0 and -O3

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Tests optimized away | Output is constant or missing | Use volatile or disable inlining |
| Mixing categories | Confusing classification | Add a note citing the standard rule |
| Non-deterministic tests | Inconsistent reports | Remove randomness or fix seeds |

7.2 Debugging Strategies

  • Use -S to inspect assembly for removed branches.
  • Add -fno-inline to keep function boundaries.

7.3 Performance Traps

Running too many tests with LTO can slow the suite; default to non-LTO and add LTO as a separate matrix row.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a summary for each behavior category.
  • Add colorized output for readability.

8.2 Intermediate Extensions

  • Add MSVC support using a Windows CI runner.
  • Compare -O2 and -Os in addition to -O0/-O3.

8.3 Advanced Extensions

  • Add LTO and PGO comparisons.
  • Generate HTML reports with links to the test source.

9. Real-World Connections

9.1 Industry Applications

  • Porting low-level C libraries across compilers and architectures.
  • Auditing embedded firmware where compiler flags change behavior.
  • LLVM test suite — extensive behavior and codegen tests.
  • GCC torture tests — compiler regression corpus.

9.2 Interview Relevance

  • UB and optimization questions are common in systems interviews.
  • Demonstrating the as-if rule shows a strong grasp of the language’s execution model.

10. Resources

10.1 Essential Reading

  • “Effective C, 2nd Edition” — Seacord (behavior categories)
  • “CS:APP” — Bryant & O’Hallaron (optimization and machine model)

10.2 Video Resources

  • Compiler Explorer walkthroughs (Godbolt demos)
  • Talks on undefined behavior and optimization

10.3 Tools & Documentation

  • GCC docs: optimization flags
  • Clang docs: undefined behavior sanitizer

11. Self-Assessment Checklist

11.1 Understanding

  • I can define all four behavior categories with examples.
  • I can explain how the as-if rule enables optimization.
  • I can explain why UB can remove checks.

11.2 Implementation

  • The harness builds on GCC and Clang.
  • The report is reproducible and deterministic for defined tests.
  • Exit codes are documented and consistent.

11.3 Growth

  • I can explain a real-world bug caused by UB.
  • I can use this lab as a portability checklist.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Behavior categories covered with 3+ tests each.
  • Report generated for GCC and Clang at -O0 and -O3.
  • Report includes compiler versions and flags.

Full Completion:

  • All minimum criteria plus:
  • LTO comparison row included.
  • Markdown or CSV export available.

Excellence (Going Above & Beyond):

  • Automated CI job that runs the lab on multiple OS targets.
  • Write-up explaining at least 5 surprising optimization outcomes.