Project 9: AoS to SoA Converter Tool
A command-line utility that reads a C header file, finds a specific struct definition (e.g., marked with a __declspec or __attribute__), and automatically generates a new header file containing the equivalent Struct-of-Arrays (SoA) layout and accessor functions.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | C |
| Alternative Languages | C++, Rust, Python (for parsing) |
| Difficulty | Level 3: Advanced |
| Time Estimate | 2-3 weeks |
| Knowledge Area | Code Generation / Parsing / Data-Oriented Design |
| Tooling | A C parsing library (like libclang) is recommended. |
| Prerequisites | Project 5 (Image Processing Benchmark), strong C skills. |
What You Will Build
A command-line utility that reads a C header file, finds a specific struct definition (e.g., marked with a __declspec, an __attribute__, or a marker comment such as // @soa_transform), and automatically generates a new header file containing the equivalent Struct-of-Arrays (SoA) layout and accessor functions.
Why It Matters
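Array-of-Structs (AoS) is the natural way to write C, but loops that touch only one or two fields often run faster on a Struct-of-Arrays (SoA) layout, where each field is stored contiguously and is friendlier to the cache and to SIMD. Converting a struct by hand is tedious and error-prone, so automating it is a practical exercise in parsing and code generation, two skills that recur in compilers and developer tooling.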
Core Challenges
- Parsing C struct definitions → maps to using a library like libclang or writing a simple, brittle parser yourself
- Transforming the parsed data → maps to converting the list of fields into a set of arrays
- Generating the new C code → maps to printf-ing valid C code into a new .h file
- Generating helper functions/macros → maps to creating an API to make the SoA layout usable (e.g., get_pixel_r(soa_img, i))
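The "simple, brittle parser" option from the first bullet can be sketched roughly as below. This is a minimal illustration, not a replacement for libclang: it assumes the text between the struct's braces has already been extracted, and it accepts only single-token types with plain identifiers (no pointers, arrays, qualifiers, or comments). The names ParsedField, parse_struct_body, MAX_FIELDS, and MAX_IDENT are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 32
#define MAX_IDENT  32

/* Hypothetical record for one parsed field. */
typedef struct {
    char type[MAX_IDENT];
    char name[MAX_IDENT];
} ParsedField;

/* Parse the text between a struct's braces, e.g. "float r, g; int id;".
   Returns the number of fields found, or -1 on unsupported syntax. */
int parse_struct_body(const char *body, ParsedField *out, int max_out)
{
    assert(body != NULL && out != NULL);
    const char *p = body;
    int n = 0;

    for (;;) {
        char type[MAX_IDENT];
        int used = 0;

        /* Read one identifier as the type; stop when none is left. */
        if (sscanf(p, " %31[A-Za-z0-9_]%n", type, &used) != 1)
            break;
        p += used;

        /* Read "name," or "name;" until the declaration ends. */
        for (;;) {
            char name[MAX_IDENT], sep;
            used = 0;
            if (sscanf(p, " %31[A-Za-z0-9_] %c%n", name, &sep, &used) != 2)
                return -1;
            if ((sep != ',' && sep != ';') || n == max_out)
                return -1;
            strcpy(out[n].type, type);
            strcpy(out[n].name, name);
            n++;
            p += used;
            if (sep == ';')
                break;
        }
    }

    /* Anything left besides whitespace is syntax this sketch cannot handle. */
    while (isspace((unsigned char)*p))
        p++;
    return *p == '\0' ? n : -1;
}
```

Calling parse_struct_body on the body of a struct that declares float r, g, b, a; and int obj_id; fills five ParsedField entries. A declaration such as float *p; is rejected with -1 rather than silently mis-parsed, which is about the best a parser this simple can offer.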
Key Concepts
- Abstract Syntax Trees (AST): The primary output of a parser like libclang.
- Code Generation: A core concept in compilers.
- AoS vs. SoA: The core performance concept being automated.
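To make the AoS vs. SoA distinction concrete, the sketch below runs the same loop over both layouts. The type and function names (PixelAoS, PixelSoA, sum_red_aos, sum_red_soa) are made up for illustration. Any speed difference depends on the hardware and the access pattern, so treat the comments as a description of the memory layout, not a performance guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* AoS: each element carries all five fields, so consecutive r values
   sit sizeof(PixelAoS) bytes apart in memory. */
typedef struct {
    float r, g, b, a;
    int obj_id;
} PixelAoS;

/* SoA: one array per field, so consecutive r values are adjacent. */
typedef struct {
    size_t count;
    const float *r, *g, *b, *a;
    const int *obj_id;
} PixelSoA;

/* Sum the red channel of an AoS image. */
float sum_red_aos(const PixelAoS *px, size_t n)
{
    assert(px != NULL || n == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += px[i].r;      /* strided read: steps over g, b, a, obj_id */
    return sum;
}

/* Sum the red channel of an SoA image. */
float sum_red_soa(const PixelSoA *px)
{
    assert(px != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < px->count; i++)
        sum += px->r[i];     /* contiguous read: only r data is touched */
    return sum;
}
```

Both functions compute the same result. The difference is how much unrelated data passes through the cache along the way.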
Real-World Outcome
// @soa_transform
typedef struct Pixel {
    float r, g, b, a;
    int obj_id;
} Pixel;
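For the Pixel input above, the generated header might look something like the following sketch. Only the get_pixel_r(soa_img, i) accessor shape comes from this guide. The Pixel_SoA name, the pixel_soa_init and pixel_soa_free helpers, and the set_ accessors are assumptions about what a usable API could include. Accessors for g, b, and a would follow the same pattern and are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generated counterpart of Pixel: one array per field. */
typedef struct Pixel_SoA {
    size_t count;
    float *r, *g, *b, *a;
    int *obj_id;
} Pixel_SoA;

/* Release all field arrays and reset the struct to an empty state. */
static inline void pixel_soa_free(Pixel_SoA *s)
{
    assert(s != NULL);
    free(s->r); free(s->g); free(s->b); free(s->a); free(s->obj_id);
    s->r = s->g = s->b = s->a = NULL;
    s->obj_id = NULL;
    s->count = 0;
}

/* Allocate zeroed storage for `count` pixels.
   Returns 0 on success, -1 if count is 0 or an allocation fails. */
static inline int pixel_soa_init(Pixel_SoA *s, size_t count)
{
    assert(s != NULL);
    s->count  = count;
    s->r      = calloc(count, sizeof *s->r);
    s->g      = calloc(count, sizeof *s->g);
    s->b      = calloc(count, sizeof *s->b);
    s->a      = calloc(count, sizeof *s->a);
    s->obj_id = calloc(count, sizeof *s->obj_id);
    if (count == 0 || !s->r || !s->g || !s->b || !s->a || !s->obj_id) {
        pixel_soa_free(s);
        return -1;
    }
    return 0;
}

/* Per-field accessors; the index is checked in debug builds only. */
static inline float get_pixel_r(const Pixel_SoA *s, size_t i)
{
    assert(i < s->count);
    return s->r[i];
}

static inline void set_pixel_r(Pixel_SoA *s, size_t i, float v)
{
    assert(i < s->count);
    s->r[i] = v;
}

static inline int get_pixel_obj_id(const Pixel_SoA *s, size_t i)
{
    assert(i < s->count);
    return s->obj_id[i];
}

static inline void set_pixel_obj_id(Pixel_SoA *s, size_t i, int v)
{
    assert(i < s->count);
    s->obj_id[i] = v;
}
```

A caller would declare a Pixel_SoA, call pixel_soa_init(&img, n), read and write through the accessors, and finish with pixel_soa_free(&img).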
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
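One way to approach the "smallest working version" step is to skip parsing at first and drive the generator from a hard-coded field list. The sketch below assumes scalar fields only. The Field type, the emit_soa_header function, and the naming scheme (a _SoA suffix and get_<struct>_<field> getters) are hypothetical choices, not something this guide prescribes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one struct field (scalar types only). */
typedef struct {
    const char *type;  /* e.g. "float" */
    const char *name;  /* e.g. "r" */
} Field;

/* Write an SoA struct and one getter per field to `out`.
   Returns 0 on success, -1 on an unusable name or a write error. */
int emit_soa_header(FILE *out, const char *struct_name,
                    const Field *fields, size_t n_fields)
{
    assert(out != NULL && struct_name != NULL && fields != NULL);

    /* Lower-case the struct name for accessor names: Pixel -> pixel. */
    char prefix[64];
    size_t len = strlen(struct_name);
    if (len == 0 || len >= sizeof prefix || n_fields == 0)
        return -1;
    for (size_t i = 0; i < len; i++)
        prefix[i] = (char)tolower((unsigned char)struct_name[i]);
    prefix[len] = '\0';

    /* The struct: a shared element count plus one pointer per field. */
    fprintf(out, "#include <stddef.h>\n\n");
    fprintf(out, "typedef struct %s_SoA {\n", struct_name);
    fprintf(out, "    size_t count;\n");
    for (size_t i = 0; i < n_fields; i++)
        fprintf(out, "    %s *%s;\n", fields[i].type, fields[i].name);
    fprintf(out, "} %s_SoA;\n\n", struct_name);

    /* One getter per field, e.g. get_pixel_r(s, i). */
    for (size_t i = 0; i < n_fields; i++)
        fprintf(out,
                "static inline %s get_%s_%s(const %s_SoA *s, size_t i)\n"
                "{\n    return s->%s[i];\n}\n\n",
                fields[i].type, prefix, fields[i].name,
                struct_name, fields[i].name);

    return ferror(out) ? -1 : 0;
}
```

Passing stdout as the first argument is a quick way to eyeball the generated text before wiring in a real parser and an output file.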
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide: LEARN_C_PERFORMANCE_DEEP_DIVE.md
- “Language Implementation Patterns” by Terence Parr