Project 9: AoS to SoA Converter Tool

A command-line utility that reads a C header file, finds each struct definition marked for conversion (for example with a comment tag, a GCC/Clang __attribute__, or an MSVC __declspec), and generates a new header file containing the equivalent Struct-of-Arrays (SoA) layout and accessor functions.

Quick Reference

Attribute              Value
Primary Language       C
Alternative Languages  C++, Rust, Python (for parsing)
Difficulty             Level 3: Advanced
Time Estimate          2-3 weeks
Knowledge Area         Code Generation / Parsing / Data-Oriented Design
Tooling                A C parsing library (like libclang) is recommended.
Prerequisites          Project 5 (Image Processing Benchmark), strong C skills.

What You Will Build

The finished tool takes an input header, locates the tagged struct (the example under Real-World Outcome uses a // @soa_transform comment), and writes a new header that declares one array per field plus accessor functions such as get_pixel_r(soa_img, i).

Why It Matters

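Converting a data structure from Array-of-Structs (AoS) to SoA by hand is tedious and easy to get wrong, because every field access in the codebase has to change. Automating the conversion exercises parsing, AST traversal, and code generation, the same skills used in compilers and build tooling, and it reinforces why SoA layouts suit loops that touch only a few fields of each element.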

Core Challenges

  • Parsing C struct definitions → maps to using a library like libclang or writing a simple, brittle parser yourself
  • Transforming the parsed data → maps to converting the list of fields into a set of arrays
  • Generating the new C code → maps to printf-ing valid C code into a new .h file
  • Generating helper functions/macros → maps to creating an API to make the SoA layout usable (e.g., get_pixel_r(soa_img, i))
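
The "simple, brittle parser" option in the first challenge can be sketched for a single member declaration. This is a minimal illustration, not a real C parser: the names ParsedField and parse_member_line are hypothetical, and the sketch assumes single-token types with plain identifiers, so it deliberately rejects pointers, arrays, bit-fields, nested structs, and multi-word types such as unsigned int.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDENT 64

/* Hypothetical record for one parsed member: its type and its name. */
typedef struct ParsedField {
    char type[MAX_IDENT];
    char name[MAX_IDENT];
} ParsedField;

/* Copies one identifier from *pp into dst and advances *pp past it.
 * Returns the identifier length, or 0 if none was found or it was too long. */
static size_t read_ident(const char **pp, char *dst) {
    size_t len = 0;
    const char *p = *pp;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (len + 1 >= MAX_IDENT) return 0;
        dst[len++] = *p++;
    }
    dst[len] = '\0';
    *pp = p;
    return len;
}

/* Parses a line such as "float r, g, b, a;" into one ParsedField per
 * declarator. Returns the number of fields, or -1 on unsupported input. */
static int parse_member_line(const char *line, ParsedField *out, size_t cap) {
    /* Reject syntax this sketch does not understand. */
    if (strpbrk(line, "*[](){}:") != NULL) return -1;

    const char *p = line;
    char type[MAX_IDENT];
    while (isspace((unsigned char)*p)) p++;
    if (read_ident(&p, type) == 0) return -1;

    int count = 0;
    for (;;) {
        char name[MAX_IDENT];
        while (isspace((unsigned char)*p)) p++;
        if (read_ident(&p, name) == 0 || (size_t)count >= cap) return -1;
        strcpy(out[count].type, type);
        strcpy(out[count].name, name);
        count++;

        while (isspace((unsigned char)*p)) p++;
        if (*p == ',') { p++; continue; }  /* another declarator follows */
        if (*p == ';') return count;       /* end of the declaration */
        return -1;                         /* e.g. a multi-word type */
    }
}
```

Note that one declaration can yield several fields: float r, g, b, a; produces four entries that share a type. A libclang-based version would get this for free, since each field appears as its own node in the AST.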

Key Concepts

  • Abstract Syntax Trees (AST): The primary output of a parser like libclang.
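  • Code Generation: Emitting source text from a structured description, a core concept in compilers. Here, the output is C declarations produced from the parsed field list.
  • AoS vs. SoA: Array-of-Structs stores each whole record contiguously; Struct-of-Arrays stores each field in its own contiguous array. This layout change is the core performance concept being automated.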
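
To make the AoS vs. SoA distinction concrete, here is a small sketch of both layouts for a pixel-like record, with the same reduction written against each. The type and function names (PixelAoS, PixelSoA, sum_red_aos, sum_red_soa) are illustrative assumptions, not names the tool is required to produce.

```c
#include <stddef.h>

/* AoS: all fields of one pixel sit next to each other in memory. */
typedef struct PixelAoS {
    float r, g, b, a;
    int obj_id;
} PixelAoS;

/* SoA: each field lives in its own contiguous array of `count` elements. */
typedef struct PixelSoA {
    size_t count;
    float *r, *g, *b, *a;
    int *obj_id;
} PixelSoA;

/* Reads only `r`, but each step advances by sizeof(PixelAoS) bytes,
 * so the unused g, b, a, and obj_id fields share the fetched cache lines. */
static float sum_red_aos(const PixelAoS *px, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += px[i].r;
    }
    return sum;
}

/* Reads the same values from one dense float array. */
static float sum_red_soa(const PixelSoA *px) {
    float sum = 0.0f;
    for (size_t i = 0; i < px->count; i++) {
        sum += px->r[i];
    }
    return sum;
}
```

Both functions compute the same result; the difference is only in how much unrelated data is pulled through the cache along the way, which is what a benchmark of the two layouts would measure.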

Real-World Outcome

Input header, with the target struct tagged by a comment marker:

// @soa_transform
typedef struct Pixel {
    float r, g, b, a;
    int obj_id;
} Pixel;
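
The source shows only the input side. As a rough sketch, the generated header for this struct might look like the following. Everything here is an assumption about the output format: the Pixel_SoA name, the count member, the Pixel_SoA_init/Pixel_SoA_free helpers, and the accessor macro are hypothetical choices. Only the get_pixel_r(soa_img, i) accessor shape comes from the challenge list above.

```c
/* pixel_soa.h: hypothetical generated output for struct Pixel */
#ifndef PIXEL_SOA_H
#define PIXEL_SOA_H

#include <stddef.h>
#include <stdlib.h>

/* One array per original field, all of length `count`. */
typedef struct Pixel_SoA {
    size_t count;
    float *r;
    float *g;
    float *b;
    float *a;
    int *obj_id;
} Pixel_SoA;

/* Releases every field array and resets the struct to an empty state. */
static inline void Pixel_SoA_free(Pixel_SoA *s) {
    free(s->r);
    free(s->g);
    free(s->b);
    free(s->a);
    free(s->obj_id);
    s->r = s->g = s->b = s->a = NULL;
    s->obj_id = NULL;
    s->count = 0;
}

/* Allocates zeroed arrays for `count` elements (count > 0 assumed).
 * Returns 0 on success, -1 if any allocation fails. */
static inline int Pixel_SoA_init(Pixel_SoA *s, size_t count) {
    s->count = count;
    s->r = calloc(count, sizeof *s->r);
    s->g = calloc(count, sizeof *s->g);
    s->b = calloc(count, sizeof *s->b);
    s->a = calloc(count, sizeof *s->a);
    s->obj_id = calloc(count, sizeof *s->obj_id);
    if (!s->r || !s->g || !s->b || !s->a || !s->obj_id) {
        Pixel_SoA_free(s);
        return -1;
    }
    return 0;
}

/* Expands to a getter/setter pair for one field; no bounds checking. */
#define PIXEL_SOA_ACCESSORS(field, type)                                   \
    static inline type get_pixel_##field(const Pixel_SoA *s, size_t i) {  \
        return s->field[i];                                                \
    }                                                                      \
    static inline void set_pixel_##field(Pixel_SoA *s, size_t i, type v) { \
        s->field[i] = v;                                                   \
    }

PIXEL_SOA_ACCESSORS(r, float)
PIXEL_SOA_ACCESSORS(g, float)
PIXEL_SOA_ACCESSORS(b, float)
PIXEL_SOA_ACCESSORS(a, float)
PIXEL_SOA_ACCESSORS(obj_id, int)

#endif /* PIXEL_SOA_H */
```

A caller would initialize the container once, for example Pixel_SoA_init(&img, width * height), then use set_pixel_r(&img, i, value) and get_pixel_r(&img, i) in place of direct pixels[i].r access. Generating accessors rather than exposing the arrays directly keeps call sites stable if the layout changes later.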

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.
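
For step 2, one way to get a smallest working version is to skip parsing at first and drive the generator from a hard-coded field table. The sketch below assumes a hypothetical Field record and an emit_soa_struct function that writes the SoA typedef with fprintf; the _SoA suffix and the count member are naming assumptions, not requirements from the guide.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one struct member. */
typedef struct Field {
    const char *type;
    const char *name;
} Field;

/* Writes "typedef struct <name>_SoA { ... } <name>_SoA;" to `out`,
 * turning each field into a pointer to an array of that field's type.
 * Returns 0 on success, -1 if any write fails. */
static int emit_soa_struct(FILE *out, const char *struct_name,
                           const Field *fields, size_t n) {
    if (fprintf(out, "typedef struct %s_SoA {\n", struct_name) < 0) return -1;
    if (fprintf(out, "    size_t count;\n") < 0) return -1;
    for (size_t i = 0; i < n; i++) {
        if (fprintf(out, "    %s *%s;\n", fields[i].type, fields[i].name) < 0) {
            return -1;
        }
    }
    if (fprintf(out, "} %s_SoA;\n", struct_name) < 0) return -1;
    return 0;
}
```

Calling emit_soa_struct(stdout, "Pixel", fields, 5) with a table describing r, g, b, a, and obj_id is enough to check the output by eye before any parser exists. Checking every fprintf return value supports step 3, since a failed write can then be turned into a clear error and a non-zero exit code.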

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: LEARN_C_PERFORMANCE_DEEP_DIVE.md
  • “Language Implementation Patterns” by Terence Parr