Project 11: Preprocessor Output Analyzer

Build a tool that visualizes macro expansion, X-macros, token pasting, and stringification, revealing what the compiler really sees.


Quick Reference

Attribute        Value
Language         C
Difficulty       Level 3 (Advanced)
Time             1 Week
Chapters         Expert C Programming Ch. 7
Coolness         3/5 - Essential for macro debugging
Portfolio Value  Moderate - Shows metaprogramming knowledge

1. Learning Objectives

By completing this project, you will:

  1. Master macro expansion rules: Understand the exact order and rules by which macros are expanded, including rescanning and nested expansion
  2. Demystify token pasting (##): Know precisely when and how tokens are concatenated and why it sometimes fails
  3. Understand stringification (#): Convert macro arguments to string literals correctly
  4. Implement X-macro patterns: Use this powerful technique for maintaining parallel data structures
  5. Debug complex macro issues: Trace expansion step-by-step to find subtle bugs
  6. Compare manual analysis with gcc -E: Verify your understanding against actual preprocessor output
  7. Recognize variadic macro patterns: Handle __VA_ARGS__ and ##__VA_ARGS__ correctly
  8. Identify preprocessor pitfalls: Avoid common bugs like double evaluation and missing parentheses

2. Theoretical Foundation

2.1 Core Concepts

The C preprocessor is a token-based transformation stage that runs before the compiler proper sees your code. The diagram below shows where it sits in the compilation pipeline and what it does:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    THE PREPROCESSOR IN THE COMPILATION PIPELINE             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   source.c ──┬──▶ PREPROCESSOR ──▶ source.i ──▶ COMPILER ──▶ source.s     │
│              │                                                              │
│              │    What it does:                                             │
│              │    1. Process #include (file inclusion)                      │
│              │    2. Process #define (macro definition)                     │
│              │    3. Process #if/#ifdef (conditional compilation)           │
│              │    4. Expand macros (text substitution)                      │
│              │    5. Process # and ## operators                             │
│              │    6. Handle line continuations (\)                          │
│              │    7. Strip comments                                         │
│              │                                                              │
│   View output:                                                              │
│   $ gcc -E source.c -o source.i                                            │
│   $ clang -E source.c                                                       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Macro Expansion Order

The preprocessor follows specific rules for macro expansion:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         MACRO EXPANSION ALGORITHM                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  RULE 1: Argument prescan                                                   │
│  ─────────────────────────                                                  │
│  Before substitution, macro arguments are fully expanded UNLESS:            │
│  - The argument is operand of # (stringification)                           │
│  - The argument is operand of ## (token pasting)                            │
│                                                                             │
│  RULE 2: Argument substitution                                              │
│  ─────────────────────────────                                              │
│  After prescan, arguments replace parameters in the replacement text.       │
│                                                                             │
│  RULE 3: # and ## processing                                                │
│  ───────────────────────────                                                │
│  # converts argument to string literal (not pre-expanded!)                  │
│  ## concatenates adjacent tokens (not pre-expanded!)                        │
│                                                                             │
│  RULE 4: Rescan                                                             │
│  ──────────────                                                             │
│  The result is rescanned for more macro expansion.                          │
│  The original macro is "painted blue" to prevent infinite recursion.        │
│                                                                             │
│  EXAMPLE: Step-by-step expansion                                            │
│  ────────────────────────────────                                           │
│  #define DOUBLE(x) ((x) + (x))                                              │
│  #define SQUARE(x) ((x) * (x))                                              │
│  #define QUAD(x) DOUBLE(SQUARE(x))                                          │
│                                                                             │
│  QUAD(3)                                                                    │
│    ↓ Step 1: Identify macro QUAD, argument is "3"                           │
│  DOUBLE(SQUARE(3))                                                          │
│    ↓ Step 2: Prescan argument SQUARE(3)                                     │
│  DOUBLE(((3) * (3)))                                                        │
│    ↓ Step 3: Expand DOUBLE with expanded argument                           │
│  ((((3) * (3))) + (((3) * (3))))                                            │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Token Pasting (##)

Token pasting combines two tokens into one:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         TOKEN PASTING (##)                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  #define CONCAT(a, b) a ## b                                                │
│  #define MAKE_VAR(n) var_ ## n                                              │
│  #define MAKE_FUNC(name) int name ## _init(void)                            │
│                                                                             │
│  CONCAT(foo, bar)   →  foobar                                               │
│  MAKE_VAR(42)       →  var_42                                               │
│  MAKE_FUNC(logger)  →  int logger_init(void)                                │
│                                                                             │
│  CRITICAL: Arguments to ## are NOT pre-expanded!                            │
│  ──────────────────────────────────────────────                             │
│                                                                             │
│  #define A 100                                                              │
│  #define B 200                                                              │
│  #define PASTE(x, y) x ## y                                                 │
│                                                                             │
│  PASTE(A, B)                                                                │
│    ↓ Arguments NOT expanded before ##                                       │
│  AB  (NOT 100200!)                                                          │
│    ↓ Rescan: Is AB a macro? No.                                             │
│  AB  (final result - probably undefined identifier!)                        │
│                                                                             │
│  SOLUTION: Use indirection                                                  │
│  ──────────────────────────                                                 │
│  #define PASTE2(x, y) x ## y                                                │
│  #define PASTE(x, y) PASTE2(x, y)                                           │
│                                                                             │
│  PASTE(A, B)                                                                │
│    ↓ Arguments expanded before PASTE2 sees them                             │
│  PASTE2(100, 200)                                                           │
│    ↓ Now ## operates on 100 and 200                                         │
│  100200  (a valid number!)                                                  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Stringification (#)

Stringification converts a macro argument to a string literal:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         STRINGIFICATION (#)                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  #define STRINGIFY(x) #x                                                    │
│  #define TO_STRING(x) STRINGIFY(x)                                          │
│                                                                             │
│  STRINGIFY(hello world)   →  "hello world"                                  │
│  STRINGIFY(100 + 200)     →  "100 + 200"                                    │
│  STRINGIFY("already")     →  "\"already\""  (quotes escaped!)               │
│                                                                             │
│  SAME RULE: Arguments to # are NOT pre-expanded!                            │
│  ─────────────────────────────────────────────                              │
│                                                                             │
│  #define VERSION 2                                                          │
│  STRINGIFY(VERSION)       →  "VERSION"  (NOT "2"!)                          │
│                                                                             │
│  SOLUTION: Use indirection                                                  │
│  ──────────────────────────                                                 │
│  TO_STRING(VERSION)                                                         │
│    ↓ Argument expanded before STRINGIFY sees it                             │
│  STRINGIFY(2)                                                               │
│    ↓ # operates on "2"                                                      │
│  "2"                                                                        │
│                                                                             │
│  PRACTICAL EXAMPLE: Debug logging                                           │
│  ─────────────────────────────────                                          │
│  #define SHOW(expr) printf(#expr " = %d\n", (expr))                         │
│                                                                             │
│  int x = 5;                                                                 │
│  SHOW(x + 3);                                                               │
│    ↓ Expands to:                                                            │
│  printf("x + 3" " = %d\n", (x + 3));                                        │
│    ↓ String concatenation:                                                  │
│  printf("x + 3 = %d\n", (x + 3));                                           │
│    ↓ Output:                                                                │
│  x + 3 = 8                                                                  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

X-Macros Pattern

X-macros are a powerful metaprogramming technique:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         X-MACRO PATTERN                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  THE PROBLEM:                                                               │
│  You need multiple parallel definitions that must stay in sync:             │
│  - An enum for error codes                                                  │
│  - A string array for error messages                                        │
│  - A switch statement for handling                                          │
│                                                                             │
│  TRADITIONAL (error-prone):                                                 │
│  ──────────────────────────                                                 │
│  enum { ERR_NONE, ERR_FILE, ERR_MEM, ERR_NET };  // Add here...             │
│  const char *msgs[] = { "None", "File", "Memory", "Network" }; // ...and here│
│                                                                             │
│  X-MACRO SOLUTION:                                                          │
│  ─────────────────                                                          │
│                                                                             │
│  // Define the data ONCE:                                                   │
│  #define ERROR_TABLE(X) \                                                   │
│      X(ERR_NONE, "No error")       \                                        │
│      X(ERR_FILE, "File error")     \                                        │
│      X(ERR_MEM,  "Memory error")   \                                        │
│      X(ERR_NET,  "Network error")                                           │
│                                                                             │
│  // Generate enum:                                                          │
│  #define MAKE_ENUM(name, msg) name,                                         │
│  enum ErrorCode {                                                           │
│      ERROR_TABLE(MAKE_ENUM)                                                 │
│      ERR_COUNT  // Automatically gets correct count!                        │
│  };                                                                         │
│  #undef MAKE_ENUM                                                           │
│                                                                             │
│  // Generate string array:                                                  │
│  #define MAKE_STRING(name, msg) [name] = msg,                               │
│  const char *error_messages[ERR_COUNT] = {                                  │
│      ERROR_TABLE(MAKE_STRING)                                               │
│  };                                                                         │
│  #undef MAKE_STRING                                                         │
│                                                                             │
│  EXPANSION:                                                                 │
│  ──────────                                                                 │
│  enum ErrorCode {                                                           │
│      ERR_NONE, ERR_FILE, ERR_MEM, ERR_NET,                                  │
│      ERR_COUNT                                                              │
│  };                                                                         │
│  const char *error_messages[ERR_COUNT] = {                                  │
│      [ERR_NONE] = "No error",                                               │
│      [ERR_FILE] = "File error",                                             │
│      [ERR_MEM] = "Memory error",                                            │
│      [ERR_NET] = "Network error",                                           │
│  };                                                                         │
│                                                                             │
│  BENEFIT: Add new error? Change ONE place. Everything stays in sync!        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Variadic Macros

C99 introduced variadic macros with __VA_ARGS__:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         VARIADIC MACROS                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  BASIC SYNTAX:                                                              │
│  ─────────────                                                              │
│  #define LOG(fmt, ...) printf(fmt, __VA_ARGS__)                             │
│                                                                             │
│  LOG("x=%d y=%d", x, y);  →  printf("x=%d y=%d", x, y);                    │
│                                                                             │
│  THE TRAILING COMMA PROBLEM:                                                │
│  ────────────────────────────                                               │
│  LOG("hello");  →  printf("hello", );   // SYNTAX ERROR!                   │
│                                                                             │
│  SOLUTIONS:                                                                 │
│                                                                             │
│  1. GNU Extension: ##__VA_ARGS__ (removes comma if empty)                   │
│     #define LOG(fmt, ...) printf(fmt, ##__VA_ARGS__)                        │
│     LOG("hello");  →  printf("hello");  // OK!                              │
│                                                                             │
│  2. C23 Standard: __VA_OPT__(x) (includes x only if args present)           │
│     #define LOG(fmt, ...) printf(fmt __VA_OPT__(,) __VA_ARGS__)             │
│                                                                             │
│  3. Portable workaround: make fmt part of the variadic list                 │
│     #define LOG(...) printf(__VA_ARGS__)                                    │
│                                                                             │
│  COUNTING ARGUMENTS:                                                        │
│  ────────────────────                                                       │
│  #define COUNT_ARGS(...) COUNT_ARGS_IMPL(__VA_ARGS__, 5, 4, 3, 2, 1, 0)     │
│  #define COUNT_ARGS_IMPL(_1, _2, _3, _4, _5, N, ...) N                      │
│                                                                             │
│  COUNT_ARGS(a)           →  1                                               │
│  COUNT_ARGS(a, b)        →  2                                               │
│  COUNT_ARGS(a, b, c, d)  →  4                                               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

2.2 Why This Matters

Understanding the preprocessor deeply enables:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    WHY PREPROCESSOR KNOWLEDGE MATTERS                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. DEBUGGING MACRO ISSUES                                                  │
│     ─────────────────────────                                               │
│     - Complex macros can produce cryptic compiler errors                    │
│     - Error points to expanded code, not original macro                     │
│     - gcc -E shows what compiler actually sees                              │
│                                                                             │
│  2. METAPROGRAMMING / CODE GENERATION                                       │
│     ─────────────────────────────────                                       │
│     - Generate repetitive code automatically                                │
│     - X-macros keep parallel data structures in sync                        │
│     - Implement compile-time dispatch tables                                │
│                                                                             │
│  3. CROSS-PLATFORM COMPATIBILITY                                            │
│     ─────────────────────────────                                           │
│     - #ifdef for platform-specific code                                     │
│     - Feature detection macros                                              │
│     - API version handling                                                  │
│                                                                             │
│  4. DEBUG/RELEASE BUILDS                                                    │
│     ─────────────────────────                                               │
│     - assert() is a macro (compiled out when NDEBUG is defined)             │
│     - Logging macros with __FILE__, __LINE__                                │
│     - Performance-critical code paths                                       │
│                                                                             │
│  5. UNDERSTANDING LIBRARY CODE                                              │
│     ─────────────────────────────                                           │
│     - Linux kernel uses heavy macro magic                                   │
│     - GTK, Qt use macros for OOP patterns                                   │
│     - Reading glibc headers requires macro fluency                          │
│                                                                             │
│  REAL-WORLD EXAMPLES:                                                       │
│  ────────────────────                                                       │
│  - Linux kernel: container_of(), list_for_each()                            │
│  - SQLite: SQLITE_OMIT_* and SQLITE_ENABLE_* build options                  │
│  - CPython: Object type definitions via macros                              │
│  - Unity test framework: TEST_ASSERT macros                                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

2.3 Historical Context

┌─────────────────────────────────────────────────────────────────────────────┐
│                    WHY C HAS A PREPROCESSOR                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  HISTORICAL ORIGINS (1972):                                                 │
│  ──────────────────────────                                                 │
│  - C was designed for the PDP-11 with limited memory                        │
│  - No link-time optimization, no inline functions                           │
│  - Macros provided "zero-cost" abstractions                                 │
│  - Text substitution was simple to implement                                │
│                                                                             │
│  THE PREPROCESSOR IS A SEPARATE PHASE:                                      │
│  ─────────────────────────────────────                                      │
│  - The C standard specifies it, but it runs before parsing                  │
│  - Has no knowledge of C syntax or types                                    │
│  - Works purely on tokens, with no semantic understanding                   │
│  - This is why macro bugs are so insidious                                  │
│                                                                             │
│  EVOLUTION:                                                                 │
│  ──────────                                                                 │
│  Pre-K&R: Very primitive macros, no arguments                               │
│  K&R C:   Function-like macros added                                        │
│  C89:     # and ## operators standardized                                   │
│  C99:     Variadic macros (__VA_ARGS__)                                     │
│  C11:     _Generic (type-based dispatch, not preprocessor)                  │
│  C23:     __VA_OPT__ for better variadic handling                           │
│                                                                             │
│  WHY IT PERSISTS:                                                           │
│  ────────────────                                                           │
│  - Backward compatibility is paramount in C                                 │
│  - inline functions don't fully replace macros                              │
│  - Conditional compilation has no alternative                               │
│  - Code generation patterns are powerful                                    │
│                                                                             │
│  MODERN ALTERNATIVES:                                                       │
│  ────────────────────                                                       │
│  - C++: Templates, constexpr, inline                                        │
│  - Rust: macro_rules! (hygienic) and procedural macros                      │
│  - Zig: Comptime (compile-time evaluation)                                  │
│  - C stays with the preprocessor                                            │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

2.4 Common Misconceptions

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PREPROCESSOR MISCONCEPTIONS                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  MISCONCEPTION 1: "Macros are like functions"                               │
│  ─────────────────────────────────────────────                              │
│  WRONG: Macros are TEXT SUBSTITUTION.                                       │
│                                                                             │
│  #define SQUARE(x) x * x                                                    │
│  SQUARE(1+2)  →  1+2 * 1+2  →  1 + (2*1) + 2 = 5 (not 9!)                   │
│                                                                             │
│  FIX: Parenthesize everything                                               │
│  #define SQUARE(x) ((x) * (x))                                              │
│                                                                             │
│  ────────────────────────────────────────────────────────────────────────── │
│                                                                             │
│  MISCONCEPTION 2: "Macros evaluate arguments once"                          │
│  ─────────────────────────────────────────────────                          │
│  WRONG: Arguments are substituted textually, evaluated multiple times.      │
│                                                                             │
│  #define MAX(a, b) ((a) > (b) ? (a) : (b))                                  │
│  MAX(i++, j++)                                                              │
│  → ((i++) > (j++) ? (i++) : (j++))                                          │
│  // i or j incremented TWICE!                                               │
│                                                                             │
│  FIX: Use inline functions, or statement expressions (GNU extension)        │
│  #define MAX(a, b) ({ typeof(a) _a = (a); typeof(b) _b = (b); \             │
│                       _a > _b ? _a : _b; })                                 │
│                                                                             │
│  ────────────────────────────────────────────────────────────────────────── │
│                                                                             │
│  MISCONCEPTION 3: "# and ## expand their arguments"                         │
│  ──────────────────────────────────────────────────                         │
│  WRONG: # and ## see the UNEXPANDED argument.                               │
│                                                                             │
│  #define VERSION 3                                                          │
│  #define STR(x) #x                                                          │
│  STR(VERSION)  →  "VERSION" (not "3"!)                                      │
│                                                                             │
│  FIX: Use indirection pattern                                               │
│  #define STR(x) STR2(x)                                                     │
│  #define STR2(x) #x                                                         │
│  STR(VERSION)  →  STR2(3)  →  "3"                                           │
│                                                                             │
│  ────────────────────────────────────────────────────────────────────────── │
│                                                                             │
│  MISCONCEPTION 4: "Macros can be recursive"                                 │
│  ──────────────────────────────────────────                                 │
│  WRONG: A macro cannot expand itself (that would loop forever).             │
│                                                                             │
│  #define FOO (1 + FOO)                                                      │
│  FOO  →  (1 + FOO)   // FOO is "painted blue", not expanded again           │
│                                                                             │
│  This is intentional to prevent infinite expansion.                         │
│                                                                             │
│  ────────────────────────────────────────────────────────────────────────── │
│                                                                             │
│  MISCONCEPTION 5: "The preprocessor sees C syntax"                          │
│  ─────────────────────────────────────────────────                          │
│  WRONG: Preprocessor works on tokens, not syntax.                           │
│                                                                             │
│  #define BEGIN {                                                            │
│  #define END }                                                              │
│  if (x) BEGIN ... END   // Valid!                                           │
│                                                                             │
│  The preprocessor doesn't know { } are special in C.                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
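
The indirect stringification idiom and the self-reference rule from the box above can both be checked with a few lines of C. This is a sketch for illustration only: the helper names direct_is_name, indirect_is_value, and self_reference are hypothetical, and the variable FOO exists only so that the unexpanded FOO left behind by the macro refers to something.

```c
#include <string.h>

#define VERSION 3
#define STR2(x) #x
#define STR(x)  STR2(x)

/* STR2 applies # directly, so VERSION is not expanded first. */
static int direct_is_name(void)    { return strcmp(STR2(VERSION), "VERSION") == 0; }

/* STR expands VERSION to 3 before handing it to STR2. */
static int indirect_is_value(void) { return strcmp(STR(VERSION), "3") == 0; }

/* A real variable, declared before the macro of the same name. */
static int FOO = 10;
#define FOO (1 + FOO)

/* FOO expands once to (1 + FOO); the inner FOO is painted blue and
 * stays an ordinary identifier, so it names the variable above. */
static int self_reference(void)    { return FOO; }
```

Both string helpers return 1, and self_reference() returns 11: one expansion step, then the painted FOO is read as the variable.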

3. Project Specification

3.1 What You Will Build

A macro expansion visualizer tool that shows step-by-step how the preprocessor transforms macros:

$ ./preproc_analyzer macros.c --steps
================================================================================
                    PREPROCESSOR OUTPUT ANALYZER
================================================================================

File: macros.c

MACRO DEFINITIONS FOUND:
────────────────────────
1. DEBUG_LOG(fmt, ...) → do { fprintf(stderr, "[%s:%d] " fmt "\n", __FILE__, __LINE__, ##__VA_ARGS__); } while(0)
2. STRINGIFY(x)        → #x
3. XSTRINGIFY(x)       → STRINGIFY(x)
4. CONCAT(a, b)        → a ## b
5. XCONCAT(a, b)       → CONCAT(a, b)
6. VERSION             → 42

EXPANSION ANALYSIS:
────────────────────

Line 15: DEBUG_LOG("value = %d", x);
┌───────────────────────────────────────────────────────────────────────────┐
│ Step 1: Identify macro invocation                                         │
│   DEBUG_LOG("value = %d", x)                                              │
│   Arguments: fmt="value = %d", ...=x                                      │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 2: Substitute into replacement text                                  │
│   do { fprintf(stderr, "[%s:%d] " "value = %d" "\n",                     │
│                __FILE__, __LINE__, x); } while(0)                         │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 3: Expand predefined macros                                          │
│   __FILE__ → "macros.c"                                                   │
│   __LINE__ → 15                                                           │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 4: Final result                                                      │
│   do { fprintf(stderr, "[%s:%d] " "value = %d" "\n",                     │
│                "macros.c", 15, x); } while(0)                             │
└───────────────────────────────────────────────────────────────────────────┘

Line 22: STRINGIFY(VERSION)
┌───────────────────────────────────────────────────────────────────────────┐
│ Step 1: Identify macro invocation                                         │
│   STRINGIFY(VERSION)                                                      │
│   Argument x=VERSION (NOT expanded - operand of #)                        │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 2: Apply # operator                                                  │
│   #VERSION → "VERSION"                                                    │
├───────────────────────────────────────────────────────────────────────────┤
│ RESULT: "VERSION" (probably not what you wanted!)                         │
│ HINT: Use XSTRINGIFY(VERSION) for "42"                                    │
└───────────────────────────────────────────────────────────────────────────┘

Line 25: XSTRINGIFY(VERSION)
┌───────────────────────────────────────────────────────────────────────────┐
│ Step 1: Identify macro invocation                                         │
│   XSTRINGIFY(VERSION)                                                     │
│   Argument x=VERSION                                                      │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 2: Pre-expand argument (not operand of # or ##)                      │
│   VERSION → 42                                                            │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 3: Substitute expanded argument                                      │
│   STRINGIFY(42)                                                           │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 4: Rescan and expand STRINGIFY                                       │
│   "42"                                                                    │
└───────────────────────────────────────────────────────────────────────────┘

Line 30: CONCAT(foo, bar)
┌───────────────────────────────────────────────────────────────────────────┐
│ Step 1: Identify macro invocation                                         │
│   CONCAT(foo, bar)                                                        │
│   Arguments: a=foo, b=bar (NOT expanded - operands of ##)                 │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 2: Apply ## operator                                                 │
│   foo ## bar → foobar                                                     │
├───────────────────────────────────────────────────────────────────────────┤
│ Step 3: Rescan for macros                                                 │
│   Is 'foobar' a macro? NO                                                 │
├───────────────────────────────────────────────────────────────────────────┤
│ RESULT: foobar                                                            │
└───────────────────────────────────────────────────────────────────────────┘

================================================================================
VERIFICATION: Compare with gcc -E output
$ gcc -E -P macros.c   # -P drops linemarkers; macros.c's own code follows the expanded headers
================================================================================

3.2 Functional Requirements

  1. Macro Definition Parsing:
    • Extract all #define directives from source files
    • Handle object-like and function-like macros
    • Parse parameters including variadic ...
    • Handle multi-line macros (with \ continuation)
  2. Expansion Visualization:
    • Show step-by-step expansion process
    • Identify which expansion rules apply at each step
    • Highlight # and ## operations
    • Show prescan behavior for arguments
  3. X-Macro Support:
    • Recognize X-macro patterns
    • Show multiple expansion passes
    • Visualize table generation
  4. Predefined Macro Handling:
    • Substitute __FILE__, __LINE__, __DATE__, __TIME__
    • Note that __func__ (C99) is a predefined identifier supplied by the compiler, not a macro; report it as left unexpanded
    • Handle __VA_ARGS__
  5. Comparison Mode:
    • Run gcc -E on the file
    • Compare tool output with actual preprocessor
    • Highlight any differences
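
Multi-line macros become much easier to parse if backslash-newline pairs are removed before tokenizing, which is what the line-splicing phase of translation does. A minimal sketch, assuming the file has already been read into a writable NUL-terminated buffer; splice_lines is a hypothetical helper name, not part of the requirements above:

```c
#include <stddef.h>

/* Remove every backslash-newline pair in place.
 * Returns the number of pairs removed, which a caller can use to keep
 * reported line numbers in step with the original file. */
static size_t splice_lines(char *buf) {
    size_t removed = 0;
    char *dst = buf;

    for (const char *src = buf; *src != '\0'; ) {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;            /* drop the pair, joining the two lines */
            removed++;
        } else if (src[0] == '\\' && src[1] == '\r' && src[2] == '\n') {
            src += 3;            /* same, for CRLF line endings */
            removed++;
        } else {
            *dst++ = *src++;     /* ordinary character: keep it */
        }
    }
    *dst = '\0';
    return removed;
}
```

Splicing first means parse_define() only ever sees a directive that ends at a real newline. The trade-off is that token line numbers must be corrected afterwards, which is why the sketch returns the count.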

3.3 Non-Functional Requirements

  • Accuracy: Expansion must match gcc -E output token for token (whitespace and linemarker differences aside)
  • Clarity: Step-by-step output must be educational and readable
  • Performance: Handle files with hundreds of macros
  • Portability: Work on Linux and macOS

3.4 Example Usage / Output

Basic Usage

# Analyze a single file
$ ./preproc_analyzer source.c

# Show step-by-step expansion
$ ./preproc_analyzer source.c --steps

# Compare with gcc output
$ ./preproc_analyzer source.c --verify

# Expand specific macro
$ ./preproc_analyzer source.c --expand "DEBUG_LOG"

# Show X-macro expansion
$ ./preproc_analyzer source.c --xmacro "ERROR_TABLE"

# Interactive mode
$ ./preproc_analyzer -i
preproc> define MAX(a,b) ((a) > (b) ? (a) : (b))
preproc> expand MAX(x++, y)
Step 1: MAX(x++, y)
        Arguments: a=x++, b=y
Step 2: ((x++) > (y) ? (x++) : (y))
        WARNING: 'a' appears twice - side effects of x++ may occur twice!
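
The double-evaluation warning in the interactive session can be reproduced directly. The sketch below compares the MAX macro with a plain function; max_int, macro_case, and function_case are hypothetical names introduced for the comparison.

```c
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* A function evaluates each argument exactly once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }

/* Returns the final value of x after MAX(x++, y); stores the result in *out. */
static int macro_case(int x, int y, int *out) {
    *out = MAX(x++, y);      /* expands to ((x++) > (y) ? (x++) : (y)) */
    return x;
}

/* Same call shape, but through the function. */
static int function_case(int x, int y, int *out) {
    *out = max_int(x++, y);
    return x;
}
```

With x = 5 and y = 3, the macro version yields 6 and leaves x at 7, while the function version yields 5 and leaves x at 6. When x is not greater than y, the second x++ is never reached, so the bug only appears on one branch.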

X-Macro Visualization

$ ./preproc_analyzer xmacro_example.c --xmacro "COLORS"

================================================================================
X-MACRO ANALYSIS: COLORS
================================================================================

Definition:
────────────
#define COLORS(X) \
    X(RED,   0xFF0000) \
    X(GREEN, 0x00FF00) \
    X(BLUE,  0x0000FF)

Usage 1: Generate enum (line 15)
──────────────────────────────────
#define ENUM_ENTRY(name, val) name,
enum Color { COLORS(ENUM_ENTRY) };
#undef ENUM_ENTRY

Expansion:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Pass 1: COLORS(ENUM_ENTRY)                                                  │
│   X = ENUM_ENTRY                                                            │
│   Substitute X in each table row:                                           │
│     ENUM_ENTRY(RED, 0xFF0000)                                               │
│     ENUM_ENTRY(GREEN, 0x00FF00)                                             │
│     ENUM_ENTRY(BLUE, 0x0000FF)                                              │
├─────────────────────────────────────────────────────────────────────────────┤
│ Pass 2: Expand each ENUM_ENTRY                                              │
│     RED, GREEN, BLUE,                                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│ Final:                                                                      │
│   enum Color { RED, GREEN, BLUE, };                                         │
└─────────────────────────────────────────────────────────────────────────────┘

Usage 2: Generate array (line 20)
──────────────────────────────────
#define ARRAY_ENTRY(name, val) [name] = val,
uint32_t color_values[] = { COLORS(ARRAY_ENTRY) };
#undef ARRAY_ENTRY

Expansion:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Final:                                                                      │
│   uint32_t color_values[] = {                                               │
│       [RED] = 0xFF0000,                                                     │
│       [GREEN] = 0x00FF00,                                                   │
│       [BLUE] = 0x0000FF,                                                    │
│   };                                                                        │
└─────────────────────────────────────────────────────────────────────────────┘

BENEFIT: Adding a new color requires changing only the COLORS definition!
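
The two usages above, plus a third that the # operator makes possible, fit in one compilable sketch. COLOR_COUNT, NAME_ENTRY, color_names, and color_from_name are additions for illustration and do not appear in the analyzed file.

```c
#include <stdint.h>
#include <string.h>

#define COLORS(X) \
    X(RED,   0xFF0000) \
    X(GREEN, 0x00FF00) \
    X(BLUE,  0x0000FF)

/* Use 1: enum constants, plus a trailing count. */
#define ENUM_ENTRY(name, val) name,
enum Color { COLORS(ENUM_ENTRY) COLOR_COUNT };
#undef ENUM_ENTRY

/* Use 2: values indexed by the enum. */
#define ARRAY_ENTRY(name, val) [name] = val,
static const uint32_t color_values[] = { COLORS(ARRAY_ENTRY) };
#undef ARRAY_ENTRY

/* Use 3: printable names via the # operator. */
#define NAME_ENTRY(name, val) [name] = #name,
static const char *const color_names[] = { COLORS(NAME_ENTRY) };
#undef NAME_ENTRY

/* Look up a color by name; returns COLOR_COUNT if it is not in the table. */
static enum Color color_from_name(const char *s) {
    for (int i = 0; i < COLOR_COUNT; i++) {
        if (strcmp(color_names[i], s) == 0) {
            return (enum Color)i;
        }
    }
    return COLOR_COUNT;
}
```

All three tables stay in step because each is generated from the same COLORS list; the designated initializers ([name] = ...) keep the arrays correct even if the rows are reordered.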

Token Pasting Debugging

$ ./preproc_analyzer tokens.c --expand "MAKE_FUNC"

================================================================================
TOKEN PASTING ANALYSIS
================================================================================

Line 10: MAKE_FUNC(init)
────────────────────────

Definition: #define MAKE_FUNC(name) void name ## _handler(void)

Expansion:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Step 1: Identify tokens to paste                                            │
│   Left operand:  name (argument, value: init)                               │
│   ## operator                                                               │
│   Right operand: _handler (literal token)                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│ Step 2: Arguments for ## are NOT pre-expanded                               │
│   Using raw argument value: init                                            │
├─────────────────────────────────────────────────────────────────────────────┤
│ Step 3: Concatenate tokens                                                  │
│   init ## _handler → init_handler                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│ Step 4: Complete substitution                                               │
│   void init_handler(void)                                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│ Step 5: Rescan for macros                                                   │
│   Is 'init_handler' a macro? NO                                             │
│   Is 'void' a macro? NO                                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│ FINAL: void init_handler(void)                                              │
└─────────────────────────────────────────────────────────────────────────────┘
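
The same MAKE_FUNC definition can be exercised in real code. In this sketch, call_log, run_all, and the lowercase reset macro are hypothetical, added only to show that an operand of ## is pasted without being expanded first.

```c
#define MAKE_FUNC(name) void name ## _handler(void)

static int call_log;      /* hypothetical: records which handlers ran */

MAKE_FUNC(init)     { call_log += 1; }     /* defines init_handler()     */
MAKE_FUNC(shutdown) { call_log += 10; }    /* defines shutdown_handler() */

/* The argument is pasted unexpanded, so a macro with the same name as
 * the argument does not leak into the function name. */
#define reset 99
MAKE_FUNC(reset)    { call_log += 100; }   /* reset_handler, not 99_handler */

/* Calls every generated handler and returns the accumulated log value. */
static int run_all(void) {
    call_log = 0;
    init_handler();
    shutdown_handler();
    reset_handler();
    return call_log;
}
```

If MAKE_FUNC were wrapped in an indirection macro, reset would be expanded to 99 before pasting and the result would no longer be a valid function name.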

3.5 Real World Outcome

When complete, you will be able to:

  1. Debug any macro expansion issue by tracing step-by-step
  2. Understand Linux kernel macros like container_of, list_for_each
  3. Write correct X-macro patterns for code generation
  4. Avoid common pitfalls like double evaluation and missing parentheses
  5. Teach others how the preprocessor works

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PREPROCESSOR OUTPUT ANALYZER                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   source.c ─────▶ ┌──────────────┐                                         │
│                   │    LEXER     │                                          │
│                   │  (Tokenize)  │                                          │
│                   └──────┬───────┘                                          │
│                          │                                                  │
│                          ▼                                                  │
│   Macro Table ◀── ┌──────────────┐                                         │
│   ┌────────────┐  │   PARSER     │                                          │
│   │ MAX(a,b)   │  │ (#define)    │                                          │
│   │ DEBUG(...) │  └──────┬───────┘                                          │
│   │ VERSION=42 │         │                                                  │
│   └────────────┘         ▼                                                  │
│        │         ┌──────────────┐                                          │
│        │         │   EXPANDER   │                                          │
│        └────────▶│  (Step-by-   │                                          │
│                  │   step)      │                                          │
│                  └──────┬───────┘                                          │
│                         │                                                   │
│                         ▼                                                   │
│                  ┌──────────────┐      ┌──────────────┐                    │
│                  │   OUTPUT     │─────▶│   VERIFIER   │                    │
│                  │  FORMATTER   │      │  (gcc -E)    │                    │
│                  └──────────────┘      └──────────────┘                    │
│                         │                                                   │
│                         ▼                                                   │
│                  Step-by-step expansion report                              │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component         Responsibility                    Key Functions
Lexer             Tokenize source and macro bodies  tokenize(), next_token()
Macro Parser      Extract #define directives        parse_define(), parse_params()
Macro Table       Store macro definitions           macro_add(), macro_lookup()
Expander          Perform step-by-step expansion    expand(), prescan(), paste(), stringify()
Output Formatter  Generate readable reports         format_step(), format_comparison()
Verifier          Compare with gcc -E output        run_gcc(), compare_output()

4.3 Data Structures

/* Token representation */
typedef enum {
    TOK_IDENTIFIER,
    TOK_NUMBER,
    TOK_STRING,
    TOK_CHAR,
    TOK_PUNCT,       /* (, ), [, ], {, }, etc. */
    TOK_HASH,        /* # */
    TOK_HASHHASH,    /* ## */
    TOK_COMMA,
    TOK_ELLIPSIS,    /* ... */
    TOK_NEWLINE,
    TOK_WHITESPACE,
    TOK_EOF
} TokenType;

typedef struct {
    TokenType type;
    char *text;
    int line;
    int column;
} Token;

typedef struct {
    Token *tokens;
    size_t count;
    size_t capacity;
} TokenList;

/* Macro definition */
typedef struct {
    char *name;
    char **params;         /* NULL for object-like macros */
    int param_count;
    int is_variadic;       /* Has ... parameter */
    TokenList body;        /* Replacement token list */
    char *file;            /* Where defined */
    int line;              /* Line number */
    int is_predefined;     /* __FILE__, __LINE__, etc. */
} Macro;

typedef struct {
    Macro **macros;
    size_t count;
    size_t capacity;
} MacroTable;

/* Expansion step for visualization */
typedef enum {
    STEP_IDENTIFY,         /* Identify macro invocation */
    STEP_PRESCAN,          /* Pre-expand arguments */
    STEP_SUBSTITUTE,       /* Substitute parameters */
    STEP_STRINGIFY,        /* Apply # operator */
    STEP_PASTE,            /* Apply ## operator */
    STEP_RESCAN,           /* Rescan for more macros */
    STEP_FINAL             /* Final result */
} StepType;

typedef struct {
    StepType type;
    char *description;
    char *before;
    char *after;
    char *rule;            /* Which rule applies */
} ExpansionStep;

typedef struct {
    ExpansionStep *steps;
    size_t count;
    char *original;
    char *final;
    char **warnings;       /* Double evaluation, etc. */
    int warning_count;
} ExpansionTrace;
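
The hints later in this guide call token_list_append() without defining it. One possible shape is sketched below, assuming the list owns a private copy of each token's text; the trimmed TokenType, the int return value, and token_list_free are choices made for this sketch rather than requirements.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { TOK_IDENTIFIER, TOK_NUMBER, TOK_PUNCT } TokenType;  /* trimmed for the sketch */

typedef struct { TokenType type; char *text; int line; int column; } Token;
typedef struct { Token *tokens; size_t count; size_t capacity; } TokenList;

/* Append a copy of tok, including its text.
 * Returns 0 on success and -1 if an allocation fails. */
static int token_list_append(TokenList *list, Token tok) {
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 8;
        Token *grown = realloc(list->tokens, new_cap * sizeof *grown);
        if (!grown) return -1;
        list->tokens = grown;
        list->capacity = new_cap;
    }

    size_t n = strlen(tok.text) + 1;
    char *copy = malloc(n);
    if (!copy) return -1;
    memcpy(copy, tok.text, n);

    tok.text = copy;
    list->tokens[list->count++] = tok;
    return 0;
}

/* Release every token's text and the array itself. */
static void token_list_free(TokenList *list) {
    for (size_t i = 0; i < list->count; i++) {
        free(list->tokens[i].text);
    }
    free(list->tokens);
    list->tokens = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

Copying the text costs an allocation per token but lets the expander free intermediate lists without worrying about which list still refers to a given string. Sharing text pointers is the faster alternative if the lexer's buffer outlives every list.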

4.4 Algorithm Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                    MACRO EXPANSION ALGORITHM                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  painted = set()  // macros being expanded; shared by all recursive calls   │
│                                                                             │
│  expand(tokens, macro_table, trace):                                        │
│      result = empty_list                                                    │
│                                                                             │
│      for each token in tokens:                                              │
│          if token is identifier and token.text in macro_table:              │
│              macro = macro_table[token.text]                                │
│                                                                             │
│              if macro in painted:                                           │
│                  // Prevent infinite recursion                              │
│                  result.append(token)                                       │
│                  continue                                                   │
│                                                                             │
│              if macro is function-like:                                     │
│                  args = parse_arguments(tokens)                             │
│                                                                             │
│                  // Step: Prescan arguments (unless # or ## operand)        │
│                  expanded_args = []                                         │
│                  for i, arg in enumerate(args):                             │
│                      if not is_hash_operand(macro, i):                      │
│                          expanded_args[i] = expand(arg, macro_table)        │
│                      else:                                                  │
│                          expanded_args[i] = arg  // Keep raw                │
│                                                                             │
│                  // Step: Substitute parameters                             │
│                  substituted = substitute(macro.body, expanded_args)        │
│                                                                             │
│                  // Step: Apply # and ##                                    │
│                  processed = apply_hash_operators(substituted)              │
│                                                                             │
│              else:  // Object-like macro                                    │
│                  processed = macro.body                                     │
│                                                                             │
│              // Step: Rescan with macro painted                             │
│              painted.add(macro)                                             │
│              expanded = expand(processed, macro_table, trace)               │
│              painted.remove(macro)                                          │
│                                                                             │
│              result.extend(expanded)                                        │
│          else:                                                              │
│              result.append(token)                                           │
│                                                                             │
│      return result                                                          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
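
The painting step is the part of this algorithm that is easiest to get wrong, so it helps to see it in isolation. The sketch below is a deliberately reduced model, not the full algorithm: it handles object-like macros only, treats the input as words separated by single spaces, and stores the painted flag in the macro table so that every recursive call sees it. ObjMacro, find_macro, and expand_words are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *body;    /* replacement list, words separated by spaces */
    int painted;         /* nonzero while this macro is being expanded */
} ObjMacro;

static ObjMacro *find_macro(ObjMacro *table, size_t n, const char *word, size_t len) {
    for (size_t i = 0; i < n; i++) {
        if (strlen(table[i].name) == len && strncmp(table[i].name, word, len) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

/* Append the expansion of text to out, a NUL-terminated buffer of size cap.
 * The caller must start with out[0] == '\0'. */
static void expand_words(ObjMacro *table, size_t n, const char *text,
                         char *out, size_t cap) {
    for (const char *p = text; *p != '\0'; ) {
        if (*p == ' ') { p++; continue; }
        size_t len = strcspn(p, " ");
        ObjMacro *m = find_macro(table, n, p, len);

        if (m != NULL && !m->painted) {
            m->painted = 1;                          /* paint before rescanning */
            expand_words(table, n, m->body, out, cap);
            m->painted = 0;                          /* unpaint when done */
        } else {
            /* Painted macro or ordinary word: copy it through unchanged. */
            size_t used = strlen(out);
            snprintf(out + used, cap - used, "%s%.*s",
                     used ? " " : "", (int)len, p);
        }
        p += len;
    }
}
```

With A defined as "B + 1" and B defined as "A * 2", expanding "x = A ;" produces "x = A * 2 + 1 ;": the inner A is reached while A is still painted, so it is copied through instead of being expanded again.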

5. Implementation Guide

5.1 Development Environment Setup

# Required tools
sudo apt-get install gcc gdb build-essential

# For verification
gcc --version

# Create project structure
mkdir -p preproc_analyzer/{src,test,examples}
cd preproc_analyzer

# Test with gcc -E
echo '#define MAX(a,b) ((a)>(b)?(a):(b))
int x = MAX(3, 5);' | gcc -E -

# Should output (after a few "# ..." linemarker lines):
# int x = ((3)>(5)?(3):(5));

5.2 Project Structure

preproc_analyzer/
├── src/
│   ├── main.c              # Entry point, CLI parsing
│   ├── lexer.c             # Tokenization
│   ├── lexer.h
│   ├── macro.c             # Macro table management
│   ├── macro.h
│   ├── expand.c            # Expansion engine
│   ├── expand.h
│   ├── output.c            # Output formatting
│   ├── output.h
│   ├── verify.c            # gcc -E comparison
│   └── verify.h
├── test/
│   ├── test_lexer.c
│   ├── test_expand.c
│   └── test_cases/
│       ├── basic.c
│       ├── stringify.c
│       ├── paste.c
│       ├── variadic.c
│       └── xmacro.c
├── examples/
│   ├── debug_log.c         # Logging macro example
│   ├── error_table.c       # X-macro example
│   └── generic_max.c       # Type-generic MAX
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“What exactly does the preprocessor do to my code before the compiler sees it?”

The preprocessor is often treated as magic - developers write macros, and somehow they expand. This project forces you to understand every rule:

  • Why does STRINGIFY(VERSION) produce "VERSION" instead of "42"?
  • Why does token pasting sometimes create invalid tokens?
  • How does the “painted blue” rule prevent infinite recursion?
  • Why do X-macros work the way they do?

5.4 Concepts You Must Understand First

Before starting implementation, verify you understand:

  1. Preprocessing phases: Trigraphs, line splicing, tokenization, then macro expansion
  2. Token vs text: The preprocessor works on tokens, not raw text
  3. Prescan rule: Arguments are fully macro-expanded BEFORE substitution, except where the parameter is an operand of # or ##
  4. Rescan rule: Result is rescanned for more macros
  5. Painting: A macro being expanded is “painted blue” and won’t expand again

5.5 Questions to Guide Your Design

Work through these questions BEFORE writing code:

  1. Tokenization: How do you handle string literals that contain # characters?

  2. Multi-line macros: How do you detect and handle \ continuation?

  3. Variadic macros: How do you handle __VA_ARGS__ and the comma deletion with ##?

  4. Nested macros: How do you track which macros are currently being expanded?

  5. Argument counting: How do you match actual arguments to formal parameters?

  6. Token pasting: What if pasting produces an invalid token (like + ## -)?

  7. X-macro detection: How do you recognize the X-macro pattern in source?

  8. Verification: How do you normalize output for comparison with gcc -E?
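
For question 5, the core difficulty is that not every comma separates arguments. A minimal sketch of the counting rule, assuming the text between the call's outer parentheses has already been isolated; count_macro_args is a hypothetical helper working on raw characters rather than on the token list a real implementation would use:

```c
#include <stddef.h>

/* Count the arguments in the text between a macro call's outer parentheses.
 * Commas inside nested parentheses, string literals, or character
 * constants do not separate arguments. An empty string counts as one
 * (empty) argument; the caller decides whether that means zero. */
static size_t count_macro_args(const char *inside) {
    size_t args = 1;
    int depth = 0;
    char quote = '\0';                 /* '"' or '\'' while inside a literal */

    for (const char *p = inside; *p != '\0'; p++) {
        if (quote != '\0') {
            if (*p == '\\' && p[1] != '\0') {
                p++;                   /* skip the escaped character */
            } else if (*p == quote) {
                quote = '\0';          /* literal ends */
            }
        } else if (*p == '"' || *p == '\'') {
            quote = *p;
        } else if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            depth--;
        } else if (*p == ',' && depth == 0) {
            args++;
        }
    }
    return args;
}
```

Only parentheses protect a comma. Square brackets and braces do not, which is why a call such as F({1, 2}) is seen as two arguments by the preprocessor.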

5.6 Thinking Exercise

Before writing code, trace these expansions by hand:

Exercise 1: Stringification order

#define VERSION 3
#define STR(x) #x
#define XSTR(x) STR(x)

STR(VERSION)   // What is the result?
XSTR(VERSION)  // What is the result?

Exercise 2: Token pasting with macros

#define A 1
#define B 2
#define PASTE(x, y) x ## y
#define XPASTE(x, y) PASTE(x, y)

PASTE(A, B)   // What is the result?
XPASTE(A, B)  // What is the result?

Exercise 3: Recursive macro prevention

#define FOO (1 + BAR)
#define BAR (2 + FOO)

FOO  // Trace the expansion. What happens?

Exercise 4: X-macro expansion

#define FRUITS(X) X(APPLE) X(BANANA) X(CHERRY)
#define COUNT(x) + 1
#define NAME(x) #x,

int count = 0 FRUITS(COUNT);    // What is count?
char *names[] = { FRUITS(NAME) }; // What is names?

5.7 Hints in Layers

Hint 1: Starting with the Lexer

The lexer must handle preprocessor-specific tokens:

// Preprocessor tokens are different from C tokens!
// Must recognize:
// - ## (token pasting operator)
// - # (stringification in macro body)
// - ... (ellipsis for variadic)
// - Identifiers (including keywords as regular identifiers)

typedef enum {
    PP_TOK_IDENT,
    PP_TOK_NUMBER,
    PP_TOK_STRING,
    PP_TOK_CHAR,
    PP_TOK_PUNCT,
    PP_TOK_HASH,        // # alone
    PP_TOK_HASHHASH,    // ##
    PP_TOK_ELLIPSIS,
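    PP_TOK_SPACE,       // Whitespace matters for # and for #define NAME(
    PP_TOK_NEWLINE,
    PP_TOK_EOF
} PPTokenType;

Key insight: Whitespace matters in the preprocessor, but not around ##: a ## b and a##b paste identically. It matters in two places. The # operator collapses each run of whitespace between an argument's tokens to a single space, and in a #define a ( starts a parameter list only when it immediately follows the macro name.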
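
Where whitespace is and is not significant can be verified with the compiler itself. The macro names and whitespace_demo below are hypothetical, chosen only for this check.

```c
#include <string.h>

#define STR(x) #x
#define PASTE_TIGHT(a, b) a##b
#define PASTE_LOOSE(a, b) a  ##  b

#define OBJ (x) + 1      /* space before '(': object-like, body is "(x) + 1" */
#define FUN(x) (x) + 1   /* no space: function-like with parameter x */

/* Returns 1 if every whitespace rule behaves as described. */
static int whitespace_demo(void) {
    int x = 10;
    int ab = 7;
    int ok = 1;

    ok &= PASTE_TIGHT(a, b) == 7;                    /* both spell ab */
    ok &= PASTE_LOOSE(a, b) == 7;
    ok &= strcmp(STR(  p   +   q  ), "p + q") == 0;  /* runs collapse, ends trimmed */
    ok &= OBJ == 11;                                 /* (x) + 1 using the local x */
    ok &= FUN(2) == 3;                               /* (2) + 1 */
    return ok;
}
```

This is why the lexer keeps whitespace tokens at all: the stringifier and the #define parser need them, while the paste step simply discards them.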

Hint 2: Parsing #define Directives

Function-like macros need careful parsing:

// Parse: #define MAX(a, b) ((a) > (b) ? (a) : (b))
//        ^name  ^params   ^body

Macro *parse_define(TokenList *tokens) {
    Macro *m = calloc(1, sizeof(Macro));
    if (!m) return NULL;

    // Skip "#" and "define"
    expect(tokens, PP_TOK_HASH);
    expect(tokens, PP_TOK_IDENT);  // "define"

    // Get macro name
    Token name = expect(tokens, PP_TOK_IDENT);
    m->name = strdup(name.text);

    // Check for ( immediately after name (NO SPACE!)
    // MAX(a,b) is function-like
    // MAX (a,b) is object-like with body "(a,b)"
    Token next = peek(tokens);
    if (next.type == PP_TOK_PUNCT && next.text[0] == '('
        && next.column == name.column + (int)strlen(name.text)) {
        // Function-like macro
        parse_params(tokens, m);
    }

    // Rest is the body
    parse_body(tokens, m);

    return m;
}

Hint 3: The Expansion Core

The key insight is tracking the expansion state:

typedef struct {
    MacroTable *macros;
    Set *painted;        // Macros currently being expanded
    ExpansionTrace *trace;
    int trace_enabled;
} ExpandContext;

TokenList expand_tokens(ExpandContext *ctx, TokenList *input) {
    TokenList result = {0};

    for (size_t i = 0; i < input->count; i++) {
        Token tok = input->tokens[i];

        if (tok.type == PP_TOK_IDENT) {
            Macro *m = macro_lookup(ctx->macros, tok.text);

            if (m && !set_contains(ctx->painted, m->name)) {
                // Found unexpanded macro
                TokenList expanded;

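                if (m->params) {
                    // Function-like: parse arguments.
                    // parse_macro_args is assumed to return NULL when no '('
                    // follows the name; that is not an invocation.
                    TokenList *args = parse_macro_args(input, &i, m);
                    if (!args) {
                        token_list_append(&result, tok);
                        continue;
                    }
                    expanded = expand_function_macro(ctx, m, args);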
                } else {
                    // Object-like
                    expanded = expand_object_macro(ctx, m);
                }

                // Append expanded tokens
                for (size_t j = 0; j < expanded.count; j++) {
                    token_list_append(&result, expanded.tokens[j]);
                }
                continue;
            }
        }

        token_list_append(&result, tok);
    }

    return result;
}

Hint 4: Handling # and ##

The tricky part is knowing when arguments are pre-expanded:

int is_hash_or_paste_operand(Macro *m, int param_idx);

// args is an array of m->param_count token lists, one per argument
TokenList expand_function_macro(ExpandContext *ctx, Macro *m, TokenList *args) {
    // Step 1: Pre-expand arguments that are NOT operands of # or ##
    TokenList *expanded_args = malloc(m->param_count * sizeof(TokenList));

    for (int i = 0; i < m->param_count; i++) {
        if (is_hash_or_paste_operand(m, i)) {
            // Keep raw for # or ##
            expanded_args[i] = args[i];
        } else {
            // Pre-expand
            expanded_args[i] = expand_tokens(ctx, &args[i]);
        }
    }

    // Step 2: Substitute parameters in body
    TokenList substituted = substitute_params(m->body, m->params,
                                              expanded_args, args);

    // Step 3: Process # and ## operators
    TokenList processed = process_hash_ops(&substituted);

    // Step 4: Rescan with macro painted
    set_add(ctx->painted, m->name);
    TokenList result = expand_tokens(ctx, &processed);
    set_remove(ctx->painted, m->name);

    return result;
}

// Check if parameter i is operand of # or ##
int is_hash_or_paste_operand(Macro *m, int param_idx) {
    char *param_name = m->params[param_idx];
    Token *body = m->body.tokens;
    size_t count = m->body.count;

    for (size_t i = 0; i < count; i++) {
        if (body[i].type != PP_TOK_IDENT ||
            strcmp(body[i].text, param_name) != 0) {
            continue;
        }

        // Nearest non-whitespace token before the parameter: # or ##?
        size_t p = i;
        while (p > 0 && body[p-1].type == PP_TOK_SPACE) {
            p--;
        }
        if (p > 0 && (body[p-1].type == PP_TOK_HASH ||
                      body[p-1].type == PP_TOK_HASHHASH)) {
            return 1;
        }

        // Nearest non-whitespace token after the parameter: ##?
        size_t n = i + 1;
        while (n < count && body[n].type == PP_TOK_SPACE) {
            n++;
        }
        if (n < count && body[n].type == PP_TOK_HASHHASH) {
            return 1;
        }
    }
    return 0;
}

Hint 5: Stringification Implementation

The # operator converts tokens to a string literal:

Token stringify(TokenList *arg) {
    // Build string from tokens, handling special cases
    StringBuilder sb = {0};

    sb_append(&sb, "\"");

    for (size_t i = 0; i < arg->count; i++) {
        Token *t = &arg->tokens[i];

        // Collapse whitespace to single space
        if (t->type == PP_TOK_SPACE) {
            if (sb.len > 1 && sb.data[sb.len-1] != ' ') {
                sb_append(&sb, " ");
            }
            continue;
        }

        // Escape quotes and backslashes in strings/chars
        if (t->type == PP_TOK_STRING || t->type == PP_TOK_CHAR) {
            for (char *p = t->text; *p; p++) {
                if (*p == '"' || *p == '\\') {
                    sb_append_char(&sb, '\\');
                }
                sb_append_char(&sb, *p);
            }
        } else {
            sb_append(&sb, t->text);
        }
    }

    // Trim trailing space
    while (sb.len > 1 && sb.data[sb.len-1] == ' ') {
        sb.len--;
    }

    sb_append(&sb, "\"");

    return (Token){ .type = PP_TOK_STRING, .text = sb.data };
}

Hint 6: Token Pasting Implementation

The ## operator concatenates adjacent tokens:

TokenList process_paste(TokenList *input) {
    TokenList result = {0};

    for (size_t i = 0; i < input->count; i++) {
        Token *t = &input->tokens[i];

        if (t->type == PP_TOK_HASHHASH) {
            // Find tokens to paste
            // Remove whitespace before ##
            while (result.count > 0 &&
                   result.tokens[result.count-1].type == PP_TOK_SPACE) {
                result.count--;
            }

            // Skip whitespace after ##
            i++;
            while (i < input->count &&
                   input->tokens[i].type == PP_TOK_SPACE) {
                i++;
            }

            // ## needs an operand on both sides
            if (result.count == 0 || i >= input->count) {
                warning("'##' cannot appear at either end of a replacement list");
                break;
            }

            Token *left = &result.tokens[result.count - 1];
            Token *right = &input->tokens[i];

            // Concatenate token texts
            size_t len = strlen(left->text) + strlen(right->text) + 1;
            char *pasted = malloc(len);
            if (!pasted) {
                break;
            }
            snprintf(pasted, len, "%s%s", left->text, right->text);

            // Re-tokenize the result (might be invalid!)
            TokenList retok = tokenize_string(pasted);
            if (retok.count != 1) {
                // Pasting produced invalid or multiple tokens
                warning("Token pasting produced '%s' - may be invalid", pasted);
            }

            // Replace left token with pasted result
            if (retok.count > 0) {
                result.tokens[result.count - 1] = retok.tokens[0];
            }
        } else {
            token_list_append(&result, *t);
        }
    }

    return result;
}

5.8 The Interview Questions They’ll Ask

After completing this project, you’ll be ready for these questions:

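  1. “Explain the difference between #x and STRINGIFY(x), where STRINGIFY(x) is defined as #x.”
    • There is none: STRINGIFY(x) is just #x. The difference only appears with a second level, XSTRINGIFY(x), defined as STRINGIFY(x)
    • Direct use (STRINGIFY): the argument is an operand of #, so it is not expanded first
    • Indirect use (XSTRINGIFY): the argument is fully expanded before the inner STRINGIFY call sees it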
  2. “Why does MAX(i++, j) cause problems if MAX is a macro?”
    • Macros do text substitution, not value passing
    • i++ appears twice in the expansion, so i can be incremented twice
    • Fix: use an inline function, or a GNU statement expression that evaluates each argument once
  3. “What is the X-macro pattern and when would you use it?”
    • Define data once, generate multiple constructs
    • Perfect for enums with string names
    • Used in error handling, state machines, command tables
  4. “How does ##__VA_ARGS__ work?”
    • GNU extension for variadic macros
    • Deletes preceding comma if __VA_ARGS__ is empty
    • Standard alternative in C23: __VA_OPT__(,)
  5. “What does ‘painted blue’ mean in macro expansion?”
    • Prevents infinite recursion
    • A macro currently being expanded won’t expand again
    • Allows #define FOO FOO without hanging
  6. “How would you debug a complex macro expansion issue?”
    • Use gcc -E to see preprocessor output
    • Add step-by-step tracing
    • Break complex macros into smaller pieces
    • Use this tool!
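
For question 2, a small check makes the double evaluation concrete. This is a sketch for illustration only; max_int and check_double_evaluation are hypothetical names, not part of the project.

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Function version: each argument is evaluated exactly once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }

void check_double_evaluation(void) {
    int i = 5, j = 3;

    /* Expands to ((i++) > (j) ? (i++) : (j)):
       i++ runs in the comparison and again in the selected branch. */
    int m = MAX(i++, j);
    assert(m == 6 && i == 7);

    i = 5;
    m = max_int(i++, j);   /* i++ is evaluated once, before the call */
    assert(m == 5 && i == 6);
}
```

The macro result is 6 rather than 5 because the second i++ yields the already-incremented value, which is exactly the kind of surprise an interviewer is looking for.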

5.9 Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Preprocessor basics | Expert C Programming | Ch. 7 “The Preprocessor” |
| Macro techniques | C Interfaces and Implementations | Ch. 1 “Exceptions” |
| X-macros | 21st Century C | Ch. 10 “Better Structures” |
| Variadic macros | C: A Reference Manual | Ch. 3.3 “Macros” |
| Preprocessor specification | C Standard | Section 6.10 |

5.10 Implementation Phases

Phase 1: Basic Expansion (Days 1-2)

Goals:

  • Implement lexer for preprocessor tokens
  • Parse object-like macros
  • Basic expansion without # or ##

Test Cases:

#define VERSION 42
#define MESSAGE "Hello"
#define EMPTY

int v = VERSION;        // → int v = 42;
char *m = MESSAGE;      // → char *m = "Hello";
int e = EMPTY 5;        // → int e = 5;

Phase 2: Function-like Macros (Days 3-4)

Goals:

  • Parse function-like macros with parameters
  • Implement argument substitution
  • Handle variadic macros

Test Cases:

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define LOG(fmt, ...) printf(fmt, ##__VA_ARGS__)

int x = MAX(3, 5);      // → int x = ((3) > (5) ? (3) : (5));
LOG("hello");           // → printf("hello");
LOG("x=%d", x);         // → printf("x=%d", x);
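
The LOG test cases above depend on the GNU ##__VA_ARGS__ extension. Where that extension is not available, one common workaround is to fold the format string into the variadic part so the argument list is never empty. The sketch below assumes C99; FORMAT and check_portable_variadic are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats into a char array. buf must be an array, not a pointer,
   because sizeof(buf) is used as the buffer size. */
#define FORMAT(buf, ...) snprintf((buf), sizeof(buf), __VA_ARGS__)

void check_portable_variadic(void) {
    char buf[32];

    FORMAT(buf, "hello");      /* snprintf((buf), sizeof(buf), "hello") */
    assert(strcmp(buf, "hello") == 0);

    FORMAT(buf, "x=%d", 7);    /* snprintf((buf), sizeof(buf), "x=%d", 7) */
    assert(strcmp(buf, "x=7") == 0);
}
```

The trade-off is that the macro can no longer refer to the format string by name, so your analyzer should still support the ##__VA_ARGS__ form for code that uses it.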

Phase 3: # and ## Operators (Days 5-6)

Goals:

  • Implement stringification
  • Implement token pasting
  • Handle indirection patterns

Test Cases:

#define STR(x) #x
#define XSTR(x) STR(x)
#define PASTE(a, b) a ## b
#define XPASTE(a, b) PASTE(a, b)
#define VER 3

STR(VER)      // → "VER"
XSTR(VER)     // → "3"
PASTE(a, b)   // → ab
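
The test cases above define XPASTE but never exercise it. A compilable check along the following lines covers both indirection patterns. PREFIX, val3, PREFIXVER, and check_indirection are hypothetical names added only for this sketch.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x
#define XSTR(x) STR(x)
#define PASTE(a, b) a ## b
#define XPASTE(a, b) PASTE(a, b)
#define VER 3
#define PREFIX val        /* hypothetical macro, added for this demo */

int val3 = 30;            /* the name XPASTE(PREFIX, VER) produces */
int PREFIXVER = 99;       /* the name PASTE(PREFIX, VER) produces  */

void check_indirection(void) {
    assert(strcmp(STR(VER), "VER") == 0);   /* # sees the raw argument  */
    assert(strcmp(XSTR(VER), "3") == 0);    /* argument expanded first  */
    assert(PASTE(PREFIX, VER) == 99);       /* pastes PREFIX and VER    */
    assert(XPASTE(PREFIX, VER) == 30);      /* pastes val and 3         */
}
```

Because the pasted names must resolve to real variables, a wrong expansion shows up as a compile error rather than a silent mismatch.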

Phase 4: Visualization & X-Macros (Day 7)

Goals:

  • Step-by-step output formatting
  • X-macro pattern recognition
  • Verification against gcc -E

Test Cases:

#define COLORS(X) X(RED) X(GREEN) X(BLUE)
#define ENUM(x) x,
enum { COLORS(ENUM) };  // → enum { RED, GREEN, BLUE, };
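
The point of the X-macro pattern is that one list drives several generated constructs. A minimal sketch that extends the test case with a parallel string table might look like this; AS_ENUM, AS_STRING, COLOR_COUNT, color_names, and color_name are illustrative names, not part of the project specification.

```c
#include <assert.h>
#include <string.h>

#define COLORS(X) X(RED) X(GREEN) X(BLUE)

#define AS_ENUM(name)   name,
#define AS_STRING(name) #name,

/* COLOR_COUNT is a sentinel appended after the generated entries. */
enum Color { COLORS(AS_ENUM) COLOR_COUNT };

/* Generated from the same list, so it cannot drift out of sync. */
const char *const color_names[] = { COLORS(AS_STRING) };

const char *color_name(enum Color c) {
    return ((unsigned)c < (unsigned)COLOR_COUNT) ? color_names[c] : "?";
}

void check_xmacro(void) {
    assert(COLOR_COUNT == 3);
    assert(strcmp(color_name(RED), "RED") == 0);
    assert(strcmp(color_name(BLUE), "BLUE") == 0);
}
```

Adding a fourth color to COLORS updates the enum, the count, and the name table in one edit, which is the behavior your visualizer should make visible.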

5.11 Key Implementation Decisions

  1. Token vs String representation: Work with tokens for accuracy, convert to strings for display

  2. When to trace: Add hooks at each expansion step, controlled by --steps flag

  3. Error handling: Invalid paste results, mismatched argument counts, recursion detection

  4. Whitespace handling: Track whether each token was preceded by whitespace; stringification and readable output both depend on it

  5. Predefined macros: __FILE__, __LINE__ need context, may need to approximate


6. Testing Strategy

Test Categories

| Category | Purpose | Examples |
| --- | --- | --- |
| Unit Tests | Test individual components | Lexer tokenization, macro parsing |
| Expansion Tests | Verify correct expansion | Compare output with gcc -E |
| Edge Cases | Handle tricky situations | Empty arguments, nested macros |
| Visualization Tests | Check output formatting | Step-by-step display |

Critical Test Cases

// test_cases/basic.c - Object-like macros
#define PI 3.14159
#define EMPTY
#define MULTI_LINE one \
                   two \
                   three

// test_cases/stringify.c - Stringification
#define STR(x) #x
#define XSTR(x) STR(x)
#define VER 42
// Test: STR(VER) should give "VER"
// Test: XSTR(VER) should give "42"

// test_cases/paste.c - Token pasting
#define PASTE(a, b) a##b
#define XPASTE(a, b) PASTE(a, b)
#define A 1
#define B 2
// Test: PASTE(A, B) should give AB
// Test: XPASTE(A, B) should give 12

// test_cases/variadic.c - Variadic macros
#define LOG1(fmt, ...) printf(fmt, __VA_ARGS__)
#define LOG2(fmt, ...) printf(fmt, ##__VA_ARGS__)
// Test: LOG1("hi") should give printf("hi", )  (invalid!)
// Test: LOG2("hi") should give printf("hi")

// test_cases/xmacro.c - X-macro pattern
#define COLORS(X) X(RED, 0) X(GREEN, 1) X(BLUE, 2)
#define ENUM_GEN(name, val) name = val,
enum Color { COLORS(ENUM_GEN) };
// Should expand to: enum Color { RED = 0, GREEN = 1, BLUE = 2, };

// test_cases/recursion.c - Recursion prevention
#define FOO (1 + FOO)
// Test: FOO should give (1 + FOO), not infinite loop

// test_cases/nested.c - Nested expansion
#define A B
#define B C
#define C 42
// Test: A should give 42

Verification Script

#!/bin/bash
# verify.sh - Compare tool output with gcc -E

for testfile in test_cases/*.c; do
    echo "Testing $testfile..."

    # Get gcc output
    gcc -E "$testfile" 2>/dev/null | grep -v '^#' > /tmp/gcc_out.txt

    # Get tool output
    ./preproc_analyzer "$testfile" --raw > /tmp/tool_out.txt

    # Compare
    if diff -q /tmp/gcc_out.txt /tmp/tool_out.txt > /dev/null; then
        echo "  PASS"
    else
        echo "  FAIL - outputs differ:"
        diff /tmp/gcc_out.txt /tmp/tool_out.txt | head -20
    fi
done

7. Common Pitfalls & Debugging

Frequent Mistakes

| Pitfall | Symptom | Solution |
| --- | --- | --- |
| Forgetting parentheses | SQUARE(1+2) gives 5, not 9 | Parenthesize all parameters and the result |
| Double evaluation | Side effects happen twice | Use inline functions or statement expressions |
| # without indirection | STR(MACRO) gives “MACRO” | Use a two-level macro: #define XSTR(x) STR(x) |
| ## without indirection | Unexpanded tokens are pasted | Use a two-level macro so arguments expand first |
| Comma in argument | Argument is split in two | Wrap in parentheses: MACRO((a, b)) |
| Missing continuation | Macro ends unexpectedly | Check for \ at line ends |
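
The first pitfall is easy to reproduce. The sketch below assumes the unparenthesized definition SQUARE(x) x * x, which is what produces the value 5; SQUARE_BAD and check_parentheses are illustrative names.

```c
#include <assert.h>

#define SQUARE_BAD(x) x * x          /* no parentheses */
#define SQUARE(x)     ((x) * (x))    /* parameters and result parenthesized */

void check_parentheses(void) {
    assert(SQUARE_BAD(1 + 2) == 5);  /* 1 + 2 * 1 + 2 */
    assert(SQUARE(1 + 2) == 9);      /* ((1 + 2) * (1 + 2)) */

    /* The outer parentheses matter too: 10 / ((2) * (2)) */
    assert(10 / SQUARE(2) == 2);
}
```

A warning pass in your tool could flag any replacement list where a parameter appears next to an operator without enclosing parentheses.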

Debugging Strategies

For unexpected expansion:

# See what gcc produces
gcc -E source.c | grep -A5 'line_number'

# Use -dM to see all macro definitions
gcc -E -dM source.c

# Use -dD to see defines in context
gcc -E -dD source.c

For your tool:

# Enable verbose tracing
./preproc_analyzer source.c --trace

# Show each expansion step
./preproc_analyzer source.c --steps

# Expand single macro interactively
./preproc_analyzer -i

Common debugging patterns:

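// Debug: See what a macro call expands to.
// Two levels are required: with a single-level #x the argument is not
// expanded, and the output would just be "YOUR_MACRO(args)".
#define SHOW_EXPANSION(x) SHOW_EXPANSION2(x)
#define SHOW_EXPANSION2(x) #x
printf("Expands to: %s\n", SHOW_EXPANSION(YOUR_MACRO(args)));

// Debug: Check intermediate result
#define DEBUG_STRINGIFY(x) DEBUG_STRINGIFY2(x)
#define DEBUG_STRINGIFY2(x) #x
// Now DEBUG_STRINGIFY(MACRO) shows the expanded form as a string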

8. Extensions & Challenges

Beginner Extensions

  • Interactive mode: REPL for testing macro expansions
  • Colorized output: Highlight # and ## operators, macro names
  • Warning detection: Flag double evaluation, missing parentheses
  • Macro dependency graph: Show which macros use which

Intermediate Extensions

  • Include processing: Expand #include directives
  • Conditional compilation: Handle #if, #ifdef, #else
  • C++ support: Handle namespace, templates in headers
  • Web interface: Interactive macro expander in browser

Advanced Extensions

  • Full preprocessor: Complete preprocessing, not just macros
  • Macro debugger: Breakpoints on expansion, step-through
  • Performance analysis: Identify slow macro patterns
  • Macro refactoring: Suggest improvements to complex macros

9. Real-World Connections

Industry Applications

Linux Kernel:

// container_of - Get containing structure from member pointer
#define container_of(ptr, type, member) ({                      \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);        \
    (type *)( (char *)__mptr - offsetof(type,member) );})

// list_for_each - Iterate over a list
#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)
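
The kernel version of container_of relies on GNU extensions (typeof and statement expressions). To see the underlying pointer arithmetic in isolation, a simplified portable variant can be sketched as follows. container_of_simple, struct item, and check_container_of are hypothetical names, and this version drops the type check the kernel macro performs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified variant: subtract the member's offset from its address. */
#define container_of_simple(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_node { struct list_node *next; };

struct item {
    int value;
    struct list_node node;   /* embedded link */
};

void check_container_of(void) {
    struct item it = { 42, { NULL } };
    struct list_node *n = &it.node;   /* all a list walker would see */

    struct item *owner = container_of_simple(n, struct item, node);
    assert(owner == &it);
    assert(owner->value == 42);
}
```

Feeding both versions to your analyzer is a good stress test, since the kernel form spans multiple lines and nests parentheses and braces.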

Unity Test Framework:

#define TEST_ASSERT_EQUAL_INT(expected, actual) \
    UnityAssertEqualNumber((UNITY_INT)(expected), (UNITY_INT)(actual), \
                          __LINE__, UNITY_DISPLAY_STYLE_INT)

SQLite:

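// Opcode constants: in SQLite these are emitted by a build script rather than
// a hand-written X-macro, and the numbering varies by version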
#define OP_Goto            1
#define OP_Gosub           2
// ... hundreds more, all generated from table

Related Tools

  • GCC: The actual preprocessor implementation
  • cpp: Standalone C preprocessor
  • mcpp: Portable C preprocessor implementation
  • coan: C preprocessor analyzer and simplifier
  • unifdef: Remove conditional compilation

10. Resources

Essential Reading

  • C Standard Section 6.10: Official preprocessor specification
  • Expert C Programming Ch. 7: “The Preprocessor”
  • GCC Preprocessor Manual: Detailed implementation docs

Online Tools

  • Godbolt Compiler Explorer: See preprocessor output online
  • C Preprocessor Tricks: https://github.com/pfultz2/Cloak/wiki
  • Boost.PP: Advanced preprocessor metaprogramming (C++)

Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Preprocessor overview | Expert C Programming | Ch. 7 |
| Macro patterns | C Interfaces and Implementations | Ch. 1 |
| X-macros | 21st Century C | Ch. 10 |
| Formal specification | C: A Reference Manual | Ch. 3 |
| Advanced techniques | C Programming FAQs | Questions 10.* |

11. Self-Assessment Checklist

Understanding

  • I can explain why #x doesn’t expand x but XSTR(x) does
  • I understand the prescan/substitute/rescan expansion order
  • I know when to use the two-level indirection pattern
  • I can explain “painted blue” and why it’s needed
  • I understand the X-macro pattern and can implement one

Implementation

  • My lexer correctly handles all preprocessor tokens
  • My expander handles object-like macros correctly
  • My expander handles function-like macros with arguments
  • Stringification (#) works correctly
  • Token pasting (##) works correctly
  • Variadic macros with __VA_ARGS__ work
  • Output matches gcc -E for all test cases

Visualization

  • Step-by-step output is clear and educational
  • X-macro expansions are shown properly
  • Warnings are generated for common pitfalls

Growth

  • I can debug complex macro expansion issues
  • I can read and understand Linux kernel macros
  • I can write correct X-macro patterns
  • I know when to use macros vs inline functions

12. Submission / Completion Criteria

Minimum Viable Completion

  • Parses #define directives (object and function-like)
  • Expands basic macros correctly
  • Handles # stringification
  • Handles ## token pasting
  • Output matches gcc -E for basic cases

Full Completion

  • All macro types work correctly
  • Step-by-step visualization implemented
  • X-macro pattern recognized and visualized
  • Variadic macros handled (including ##__VA_ARGS__)
  • Verification mode compares with gcc -E
  • Comprehensive test suite passing

Excellence (Going Above & Beyond)

  • Interactive mode with REPL
  • Full conditional compilation support
  • Warning detection for common pitfalls
  • Web interface for macro exploration
  • Performance analysis for macro patterns
  • Documentation generator from macro comments

13. Thinking Exercise Answers

Exercise 1: Stringification order

STR(VERSION)   // → "VERSION" (# sees unexpanded VERSION)
XSTR(VERSION)  // → "3" (VERSION expanded to 3, then STR(3) → "3")

Exercise 2: Token pasting with macros

PASTE(A, B)    // → AB (A and B not expanded, just pasted)
XPASTE(A, B)   // → 12 (A→1, B→2 expanded first, then PASTE(1,2) → 12)

Exercise 3: Recursive macro prevention

FOO
→ (1 + BAR)        // FOO painted blue
→ (1 + (2 + FOO))  // BAR expanded, FOO painted, won't expand again
// Final: (1 + (2 + FOO))
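
This trace can be confirmed with a real compiler by stringifying the expanded result. The definitions of FOO and BAR below are assumptions inferred from the trace; SELF and check_painted_blue are hypothetical additions for this sketch.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x
#define XSTR(x) STR(x)

/* Assumed definitions, inferred from the trace above. */
#define FOO (1 + BAR)
#define BAR (2 + FOO)

/* Direct self-reference: expands once, then the inner name is left alone. */
#define SELF (1 + SELF)

void check_painted_blue(void) {
    assert(strcmp(XSTR(FOO), "(1 + (2 + FOO))") == 0);
    assert(strcmp(XSTR(BAR), "(2 + (1 + BAR))") == 0);
    assert(strcmp(XSTR(SELF), "(1 + SELF)") == 0);
}
```

Note that the name left unexpanded depends on where expansion started: beginning from BAR, it is BAR that ends up painted.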

Exercise 4: X-macro expansion

int count = 0 FRUITS(COUNT);
→ int count = 0 + 1 + 1 + 1;  // count = 3

char *names[] = { FRUITS(NAME) };
→ char *names[] = { "APPLE", "BANANA", "CHERRY", };

This project is part of the Expert C Programming Mastery series. For the complete learning path, see the project index.