Project 15: Struct Packing Analyzer

Build a tool that analyzes struct padding and alignment, revealing hidden memory waste and optimization opportunities.


Quick Reference

Attribute Value
Language C
Difficulty Level 3 (Advanced)
Time 1 Week
Key Concepts Alignment, padding, cache optimization, data layout
Prerequisites P03 (Memory Layout), struct basics, pointer arithmetic
Portfolio Value High - demonstrates systems knowledge and optimization skills

1. Learning Objectives

By completing this project, you will:

  1. Master memory alignment rules: Understand why CPUs require data at aligned addresses and the performance penalties of misaligned access

  2. Calculate struct padding: Predict exactly how the compiler will lay out struct members, including all padding bytes

  3. Optimize data structures: Reorder struct members to minimize memory footprint while maintaining alignment

  4. Understand cache-line optimization: Design data structures that minimize cache misses and false sharing

  5. Parse C struct declarations: Build a parser that handles nested structs, arrays, and all basic types

  6. Apply the offsetof macro: Use and implement offsetof() to verify struct layouts

  7. Handle platform differences: Account for 32-bit vs 64-bit alignment requirements


2. Theoretical Foundation

2.1 Core Concepts

Natural Alignment

Every data type has a natural alignment requirement, typically equal to its size:

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                         NATURAL ALIGNMENT                                    │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  TYPE          SIZE    ALIGNMENT   VALID ADDRESSES                          │
    │  ──────────────────────────────────────────────────────────────────────────  │
    │  char          1       1           Any address (0, 1, 2, 3, 4, 5, 6, 7...)  │
    │  short         2       2           Even addresses (0, 2, 4, 6, 8...)        │
    │  int           4       4           Divisible by 4 (0, 4, 8, 12, 16...)      │
    │  float         4       4           Divisible by 4 (0, 4, 8, 12, 16...)      │
    │  double        8       8           Divisible by 8 (0, 8, 16, 24, 32...)     │
    │  long (64-bit) 8       8           Divisible by 8 (0, 8, 16, 24, 32...)     │
    │  pointer       4/8     4/8         Divisible by pointer size                │
    │                                                                             │
    │  RULE: Alignment = min(sizeof(type), MAX_ALIGNMENT)                         │
    │                                                                             │
    │  On most systems: MAX_ALIGNMENT = 8 (64-bit) or 4 (32-bit)                  │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘

Padding Bytes

The compiler inserts invisible padding to maintain alignment:

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                         PADDING VISUALIZATION                                │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  struct example {                                                           │
    │      char a;      // 1 byte                                                 │
    │      int b;       // 4 bytes, needs 4-byte alignment                        │
    │      char c;      // 1 byte                                                 │
    │  };                                                                         │
    │                                                                             │
    │  MEMORY LAYOUT:                                                             │
    │                                                                             │
    │  Offset: 0     1     2     3     4     5     6     7     8     9    10   11 │
    │          ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐│
    │          │  a  │ PAD │ PAD │ PAD │     b (4 bytes)     │  c  │ PAD │ PAD ││ │
    │          └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘│
    │          │<- char ->│<- 3 pad -->│<----- int -------->│<-ch->│<-3 pad->│    │
    │                                                                             │
    │  Total: 12 bytes (not 6!)                                                   │
    │  Data:   6 bytes                                                            │
    │  Padding: 6 bytes (50% waste!)                                              │
    │                                                                             │
    │  WHY THE TRAILING PADDING?                                                  │
    │  ─────────────────────────                                                  │
    │  In arrays: struct example arr[2];                                          │
    │  arr[1].b must be at an aligned address.                                    │
    │  Struct size is rounded to struct alignment.                                │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘

Struct Padding Visualization

Struct Alignment Rules

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                    STRUCT ALIGNMENT RULES                                    │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  RULE 1: Each member is placed at offset divisible by its alignment         │
    │                                                                             │
    │  RULE 2: Struct alignment = alignment of largest member                     │
    │                                                                             │
    │  RULE 3: Struct size is rounded up to struct alignment                      │
    │          (so arrays of structs work correctly)                              │
    │                                                                             │
    │  RULE 4: Padding is added BEFORE a member if needed for alignment           │
    │          Padding is added AFTER the last member if needed for struct size   │
    │                                                                             │
    │  ALGORITHM TO CALCULATE LAYOUT:                                             │
    │  ─────────────────────────────────                                          │
    │  current_offset = 0                                                         │
    │  struct_alignment = 1                                                       │
    │                                                                             │
    │  for each member:                                                           │
    │      member_alignment = alignment_of(member_type)                           │
    │      struct_alignment = max(struct_alignment, member_alignment)             │
    │                                                                             │
    │      // Add padding if needed                                               │
    │      if (current_offset % member_alignment != 0):                           │
    │          padding = member_alignment - (current_offset % member_alignment)   │
    │          current_offset += padding                                          │
    │                                                                             │
    │      member_offset = current_offset                                         │
    │      current_offset += sizeof(member_type)                                  │
    │                                                                             │
    │  // Add trailing padding                                                    │
    │  if (current_offset % struct_alignment != 0):                               │
    │      padding = struct_alignment - (current_offset % struct_alignment)       │
    │      current_offset += padding                                              │
    │                                                                             │
    │  struct_size = current_offset                                               │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘

Cache Line Optimization

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                    CACHE LINE CONSIDERATIONS                                 │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  CACHE LINE: Minimum unit of data transferred from RAM to CPU cache         │
    │              Typically 64 bytes on modern x86/ARM                           │
    │                                                                             │
    │  PROBLEM 1: Struct spanning cache lines                                     │
    │  ─────────────────────────────────────                                      │
    │                                                                             │
    │  struct big_struct {                                                        │
    │      char data[60];                                                         │
    │      int important_field;  // Might cross cache line boundary!              │
    │  };                                                                         │
    │                                                                             │
    │  Cache Line 1 (64 bytes)          Cache Line 2 (64 bytes)                   │
    │  ┌────────────────────────────────┬────────────────────────────────┐        │
    │  │ data[0..59]          │imp[0:3]│ imp[4:7] │ ...                  │        │
    │  └────────────────────────────────┴────────────────────────────────┘        │
    │                         ▲                                                   │
    │                         └── Reading important_field needs TWO cache loads!  │
    │                                                                             │
    │  PROBLEM 2: False sharing (multithreading)                                  │
    │  ──────────────────────────────────────────                                 │
    │                                                                             │
    │  struct counter {                                                           │
    │      int thread1_count;    // Thread 1 writes here                          │
    │      int thread2_count;    // Thread 2 writes here                          │
    │  };  // Both in same cache line = cache thrashing!                          │
    │                                                                             │
    │  SOLUTION: Align to cache line or add padding                               │
    │                                                                             │
    │  struct counter_fixed {                                                     │
    │      int thread1_count;                                                     │
    │      char _pad1[60];       // Force to different cache lines                │
    │      int thread2_count;                                                     │
    │      char _pad2[60];                                                        │
    │  };                                                                         │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘

Cache Line Considerations

__attribute__((packed))

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                    PACKED STRUCTS                                            │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  // Normal struct (with padding)                                            │
    │  struct normal {                                                            │
    │      char a;                                                                │
    │      int b;                                                                 │
    │  };  // sizeof = 8                                                          │
    │                                                                             │
    │  // Packed struct (no padding)                                              │
    │  struct __attribute__((packed)) packed {                                    │
    │      char a;                                                                │
    │      int b;                                                                 │
    │  };  // sizeof = 5                                                          │
    │                                                                             │
    │  NORMAL:          PACKED:                                                   │
    │  ┌───┬───┬───┬───┬───────────────┐    ┌───┬───────────────┐                 │
    │  │ a │PAD│PAD│PAD│       b       │    │ a │       b       │                 │
    │  └───┴───┴───┴───┴───────────────┘    └───┴───────────────┘                 │
    │  0   1   2   3   4               8    0   1               5                 │
    │                                                                             │
    │  WARNING: Accessing b in packed struct may be MISALIGNED!                   │
    │                                                                             │
    │  Performance implications:                                                  │
    │  • x86: Hardware handles misalignment (slower but works)                    │
    │  • ARM: May crash or require special instructions                           │
    │  • Network protocols often use packed structs (protocol spec alignment)     │
    │                                                                             │
    │  USE PACKED FOR:                                                            │
    │  • Network protocol headers                                                 │
    │  • File format structures                                                   │
    │  • Memory-mapped hardware registers                                         │
    │  • Binary file I/O                                                          │
    │                                                                             │
    │  AVOID PACKED FOR:                                                          │
    │  • General data structures                                                  │
    │  • Frequently accessed structs                                              │
    │  • Structs used in tight loops                                              │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘

2.2 Why This Matters

Memory Efficiency

Large applications with millions of struct instances can waste gigabytes on padding:

struct inefficient {    // Common in real codebases
    char type;          // 1 byte
    double value;       // 8 bytes (but at offset 8 due to padding!)
    char flags;         // 1 byte
    int count;          // 4 bytes (but at offset 20 due to padding!)
};  // sizeof = 24 bytes, but only 14 bytes of data!

struct efficient {      // Same data, better layout
    double value;       // 8 bytes at offset 0
    int count;          // 4 bytes at offset 8
    char type;          // 1 byte at offset 12
    char flags;         // 1 byte at offset 13
};  // sizeof = 16 bytes, saved 33%!

Cache Performance

Poor struct layout causes:

  • More cache misses (larger structs = fewer fit in cache)
  • More memory bandwidth used (loading padding bytes)
  • False sharing in multithreaded code

Network Protocols and Binary Formats

Understanding struct layout is essential for:

  • Parsing binary file formats
  • Network protocol implementation
  • Serialization/deserialization
  • Memory-mapped I/O

2.3 Historical Context

Why Alignment Matters to CPUs

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                    CPU MEMORY ACCESS                                         │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  CPUs access memory in fixed-size chunks (word size: 4 or 8 bytes)          │
    │                                                                             │
    │  ALIGNED ACCESS (one memory operation):                                     │
    │  ─────────────────────────────────────────                                  │
    │  Reading int at address 8:                                                  │
    │                                                                             │
    │  Memory: │        │        │        │        │                              │
    │  Addr:   0        4        8        12       16                             │
    │                            │████████│                                       │
    │                            ▲        ▼                                       │
    │                            └─ CPU reads one 4-byte word                     │
    │                                                                             │
    │  MISALIGNED ACCESS (two memory operations):                                 │
    │  ────────────────────────────────────────────                               │
    │  Reading int at address 6:                                                  │
    │                                                                             │
    │  Memory: │        │        │        │        │                              │
    │  Addr:   0        4        8        12       16                             │
    │                      │██████████████│                                       │
    │                      │ 4    │ 8     │                                       │
    │                      ▲              ▼                                       │
    │                      └─ CPU must read TWO words, then extract & combine!    │
    │                                                                             │
    │  PENALTY:                                                                   │
    │  • x86: 2-20x slower for misaligned access                                  │
    │  • Some ARM: SIGBUS crash!                                                  │
    │  • Some RISC: Trap to OS handler (very slow)                                │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘

Evolution of Alignment Requirements

  • 1970s (PDP-11, early C): 2-byte alignment was typical
  • 1980s (68000, VAX): 2 or 4-byte alignment
  • 1990s (x86 32-bit): 4-byte alignment common
  • 2000s+ (x86-64, ARM64): 8-byte alignment for 64-bit types
  • Modern SIMD: 16/32/64-byte alignment for vector operations

2.4 Common Misconceptions

Misconception 1: “Structs are laid out in memory exactly as written”

Reality: The compiler adds padding for alignment. sizeof(struct) >= sum of member sizes.

Misconception 2: “Packing saves memory without cost”

Reality: Packed structs cause misaligned access, which is slower (or crashes on some CPUs).

Misconception 3: “Alignment is just an optimization hint”

Reality: It is a hard requirement on many architectures. Misaligned access can crash.

Misconception 4: “Reordering members is purely about size”

Reality: Access patterns matter too. Keep frequently accessed members together for cache efficiency.


3. Project Specification

3.1 What You Will Build

A struct layout analyzer tool that:

  1. Parses C struct definitions from command line or file
  2. Calculates the exact memory layout including padding
  3. Shows a visual representation of the layout
  4. Suggests optimal member ordering to minimize size
  5. Reports statistics on memory waste
$ ./struct_analyzer "struct example { char a; int b; char c; double d; }"

╔════════════════════════════════════════════════════════════════════════════╗
║                         STRUCT LAYOUT ANALYSIS                              ║
║                         Platform: 64-bit (8-byte alignment)                 ║
╠════════════════════════════════════════════════════════════════════════════╣
║                                                                            ║
║  struct example {                                                          ║
║      char a;                                                               ║
║      int b;                                                                ║
║      char c;                                                               ║
║      double d;                                                             ║
║  };                                                                        ║
║                                                                            ║
║  ┌─────────────────────────────────────────────────────────────────────┐   ║
║  │ MEMORY LAYOUT                                                       │   ║
║  ├─────────────────────────────────────────────────────────────────────┤   ║
║  │                                                                     │   ║
║  │  Offset  Size  Member           Notes                               │   ║
║  │  ──────  ────  ──────           ─────                               │   ║
║  │     0      1   char a                                               │   ║
║  │     1      3   [padding]        align int b to 4-byte boundary      │   ║
║  │     4      4   int b                                                │   ║
║  │     8      1   char c                                               │   ║
║  │     9      7   [padding]        align double d to 8-byte boundary   │   ║
║  │    16      8   double d                                             │   ║
║  │  ──────                                                             │   ║
║  │    24          TOTAL SIZE                                           │   ║
║  │                                                                     │   ║
║  └─────────────────────────────────────────────────────────────────────┘   ║
║                                                                            ║
║  ┌─────────────────────────────────────────────────────────────────────┐   ║
║  │ VISUAL REPRESENTATION (each █ = 1 byte)                             │   ║
║  ├─────────────────────────────────────────────────────────────────────┤   ║
║  │                                                                     │   ║
║  │  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 │
║  │  ┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐│
║  │  │a │░░│░░│░░│  b (4)  │c │░░│░░│░░│░░│░░│░░│░░│    d (8)           ││   │
║  │  └──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘│
║  │  Legend: █ data  ░ padding                                          │   ║
║  │                                                                     │   ║
║  └─────────────────────────────────────────────────────────────────────┘   ║
║                                                                            ║
║  ┌─────────────────────────────────────────────────────────────────────┐   ║
║  │ STATISTICS                                                          │   ║
║  ├─────────────────────────────────────────────────────────────────────┤   ║
║  │                                                                     │   ║
║  │  Total Size:        24 bytes                                        │   ║
║  │  Data Bytes:        14 bytes                                        │   ║
║  │  Padding Bytes:     10 bytes (41.7% waste)                          │   ║
║  │  Struct Alignment:   8 bytes                                        │   ║
║  │                                                                     │   ║
║  └─────────────────────────────────────────────────────────────────────┘   ║
║                                                                            ║
╠════════════════════════════════════════════════════════════════════════════╣
║                         OPTIMIZED LAYOUT                                    ║
╠════════════════════════════════════════════════════════════════════════════╣
║                                                                            ║
║  struct example_optimized {                                                ║
║      double d;    // offset 0, size 8                                      ║
║      int b;       // offset 8, size 4                                      ║
║      char a;      // offset 12, size 1                                     ║
║      char c;      // offset 13, size 1                                     ║
║      // 2 bytes padding for struct alignment                               ║
║  };                                                                        ║
║                                                                            ║
║  Optimized Size: 16 bytes                                                  ║
║  Savings:        8 bytes (33.3% reduction)                                 ║
║                                                                            ║
╚════════════════════════════════════════════════════════════════════════════╝

3.2 Functional Requirements

Core Features

  1. Parse struct definitions: Handle standard C struct syntax
    • Basic types: char, short, int, long, float, double, pointers
    • Unsigned variants
    • Arrays within structs
    • Type qualifiers (const, volatile)
  2. Calculate layout:
    • Member offsets
    • Padding locations and sizes
    • Total struct size
    • Struct alignment
  3. Display results:
    • Tabular layout showing offset, size, member name
    • Visual byte-by-byte representation
    • Statistics (total size, data bytes, padding bytes, waste percentage)
  4. Suggest optimizations:
    • Optimal member ordering (largest to smallest alignment)
    • Show savings from reordering

Optional Features

  1. Platform modes:
    • 32-bit mode (4-byte pointer, 4-byte long)
    • 64-bit mode (8-byte pointer, 8-byte long)
    • Custom alignment rules
  2. Packed analysis:
    • Show what layout would be with attribute((packed))
    • Warn about misalignment issues
  3. Nested structs:
    • Handle struct-within-struct
    • Calculate nested alignment requirements

3.3 Non-Functional Requirements

  1. Accuracy: Layout calculations must match actual compiler output (verifiable with offsetof)
  2. Portability: Compile and run on Linux and macOS
  3. Usability: Clear, educational output that teaches alignment concepts
  4. Performance: Handle any reasonable struct definition quickly

3.4 Example Usage / Output

Basic Usage

# Command line input
$ ./struct_analyzer "struct point { int x; int y; }"

struct point:
  Offset 0: int x (4 bytes)
  Offset 4: int y (4 bytes)
  Total: 8 bytes (0% padding)

# From file
$ cat example.h
struct node {
    char tag;
    struct node *next;
    int value;
};

$ ./struct_analyzer -f example.h

struct node:
  Offset  0: char tag (1 byte)
  Offset  1: [3 bytes padding]
  Offset  4: struct node *next (8 bytes)   # On 64-bit
  ...

Comparing Platforms

$ ./struct_analyzer --32bit "struct foo { char a; long b; }"

[32-bit mode]
struct foo:
  Offset 0: char a (1 byte)
  Offset 1: [3 bytes padding]
  Offset 4: long b (4 bytes)
  Total: 8 bytes

$ ./struct_analyzer --64bit "struct foo { char a; long b; }"

[64-bit mode]
struct foo:
  Offset 0: char a (1 byte)
  Offset 1: [7 bytes padding]
  Offset 8: long b (8 bytes)
  Total: 16 bytes

3.5 Real World Outcome

When you complete this project:

  1. You can optimize data structures in production code, reducing memory footprint by 20-50% in many cases

  2. You understand binary compatibility when reading/writing binary files or network protocols

  3. You can debug mysterious crashes caused by misaligned access on strict-alignment platforms

  4. Interview preparation: “How would you optimize this struct?” becomes trivial


4. Solution Architecture

4.1 High-Level Design

    ┌─────────────────────────────────────────────────────────────────────────────┐
    │                    STRUCT PACKING ANALYZER ARCHITECTURE                      │
    ├─────────────────────────────────────────────────────────────────────────────┤
    │                                                                             │
    │  INPUT                                                                      │
    │  ─────                                                                      │
    │  "struct example { char a; int b; }"                                        │
    │                          │                                                  │
    │                          ▼                                                  │
    │  ┌─────────────────────────────────────────────────────────────────────┐    │
    │  │                         LEXER                                       │    │
    │  │  Tokenize input: STRUCT, IDENT, LBRACE, TYPE, IDENT, SEMI, ...     │    │
    │  └─────────────────────────────────────────────────────────────────────┘    │
    │                          │                                                  │
    │                          ▼                                                  │
    │  ┌─────────────────────────────────────────────────────────────────────┐    │
    │  │                        PARSER                                       │    │
    │  │  Build AST: StructDef { name, members[] }                          │    │
    │  │             MemberDef { type, name, array_size }                   │    │
    │  └─────────────────────────────────────────────────────────────────────┘    │
    │                          │                                                  │
    │                          ▼                                                  │
    │  ┌─────────────────────────────────────────────────────────────────────┐    │
    │  │                   LAYOUT CALCULATOR                                 │    │
    │  │  Apply alignment rules:                                            │    │
    │  │  - Calculate member offsets                                        │    │
    │  │  - Insert padding                                                  │    │
    │  │  - Calculate struct alignment                                      │    │
    │  │  - Calculate total size                                            │    │
    │  └─────────────────────────────────────────────────────────────────────┘    │
    │                          │                                                  │
    │                          ▼                                                  │
    │  ┌─────────────────────────────────────────────────────────────────────┐    │
    │  │                      OPTIMIZER                                      │    │
    │  │  Sort members by alignment (descending)                            │    │
    │  │  Recalculate layout                                                │    │
    │  │  Compare with original                                             │    │
    │  └─────────────────────────────────────────────────────────────────────┘    │
    │                          │                                                  │
    │                          ▼                                                  │
    │  ┌─────────────────────────────────────────────────────────────────────┐    │
    │  │                       REPORTER                                      │    │
    │  │  Format output:                                                    │    │
    │  │  - Text table of members                                           │    │
    │  │  - Visual byte layout                                              │    │
    │  │  - Statistics                                                      │    │
    │  │  - Optimization suggestions                                        │    │
    │  └─────────────────────────────────────────────────────────────────────┘    │
    │                          │                                                  │
    │                          ▼                                                  │
    │  OUTPUT (stdout)                                                            │
    │                                                                             │
    └─────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

1. Lexer (lexer.c, lexer.h)

  • Tokenizes struct definition string
  • Handles keywords (struct, char, int, etc.)
  • Handles identifiers, punctuation, numbers (for arrays)

2. Parser (parser.c, parser.h)

  • Builds abstract syntax tree (AST) from tokens
  • Validates syntax
  • Handles arrays (e.g., char name[32])

3. Type System (types.c, types.h)

  • Stores size and alignment for each type
  • Platform-specific values (32-bit vs 64-bit)
  • Handles pointers, arrays

4. Layout Calculator (layout.c, layout.h)

  • Core algorithm: iterate members, add padding
  • Returns list of layout entries (member or padding)

5. Optimizer (optimizer.c, optimizer.h)

  • Sorts members by alignment
  • Groups equal-alignment members
  • Calculates improved layout

6. Reporter (reporter.c, reporter.h)

  • Formats output
  • Visual diagram generation
  • Statistics calculation

4.3 Data Structures

/* Token types for lexer */
typedef enum {
    TOK_STRUCT,
    TOK_CHAR, TOK_SHORT, TOK_INT, TOK_LONG,
    TOK_FLOAT, TOK_DOUBLE,
    TOK_UNSIGNED, TOK_SIGNED, TOK_CONST, TOK_VOLATILE,
    TOK_STAR,       /* * (pointer) */
    TOK_IDENT,      /* identifier */
    TOK_NUMBER,     /* array size */
    TOK_LBRACE, TOK_RBRACE,
    TOK_LBRACKET, TOK_RBRACKET,
    TOK_SEMICOLON,
    TOK_EOF,
    TOK_ERROR
} TokenType;

typedef struct {
    TokenType type;
    char text[MAX_IDENT_LEN];
    int number;     /* for array sizes */
} Token;

/* Type information */
typedef struct {
    const char *name;
    size_t size;
    size_t alignment;
} TypeInfo;

/* Parsed member */
typedef struct {
    char name[MAX_IDENT_LEN];
    char type_name[MAX_IDENT_LEN];
    size_t size;
    size_t alignment;
    int pointer_depth;  /* 0 = value, 1 = *, 2 = **, etc. */
    size_t array_size;  /* 0 = not array, N = array of N elements */
} MemberDef;

/* Parsed struct */
typedef struct {
    char name[MAX_IDENT_LEN];
    MemberDef members[MAX_MEMBERS];
    int member_count;
} StructDef;

/* Layout entry (member or padding) */
typedef struct {
    enum { ENTRY_MEMBER, ENTRY_PADDING } type;
    size_t offset;
    size_t size;
    union {
        MemberDef *member;  /* if type == ENTRY_MEMBER */
        struct {            /* if type == ENTRY_PADDING */
            const char *reason;  /* "align X to N-byte boundary" */
        } padding;
    };
} LayoutEntry;

/* Complete layout */
typedef struct {
    LayoutEntry entries[MAX_ENTRIES];
    int entry_count;
    size_t total_size;
    size_t data_size;
    size_t padding_size;
    size_t struct_alignment;
} StructLayout;

/* Platform configuration */
typedef struct {
    int bits;           /* 32 or 64 */
    size_t pointer_size;
    size_t long_size;
    size_t max_alignment;
} Platform;

4.4 Algorithm Overview

Layout Calculation Algorithm

StructLayout calculate_layout(StructDef *def, Platform *platform) {
    StructLayout layout = {0};
    size_t current_offset = 0;
    size_t struct_align = 1;

    for (int i = 0; i < def->member_count; i++) {
        MemberDef *m = &def->members[i];

        /* Get member's alignment requirement */
        size_t member_align = get_alignment(m, platform);
        struct_align = max(struct_align, member_align);

        /* Add padding if needed */
        size_t padding = (member_align - (current_offset % member_align)) % member_align;
        if (padding > 0) {
            add_padding_entry(&layout, current_offset, padding,
                             m->name, member_align);
            current_offset += padding;
        }

        /* Add member entry */
        add_member_entry(&layout, current_offset, m);
        current_offset += get_size(m, platform);
    }

    /* Add trailing padding for struct alignment */
    size_t trailing = (struct_align - (current_offset % struct_align)) % struct_align;
    if (trailing > 0) {
        add_trailing_padding(&layout, current_offset, trailing, struct_align);
        current_offset += trailing;
    }

    layout.total_size = current_offset;
    layout.struct_alignment = struct_align;
    calculate_statistics(&layout);

    return layout;
}

Optimization Algorithm

StructDef optimize_struct(StructDef *original) {
    StructDef optimized = *original;

    /* Sort members by alignment (descending), then by size (descending) */
    qsort(optimized.members, optimized.member_count,
          sizeof(MemberDef), compare_by_alignment_desc);

    return optimized;
}

int compare_by_alignment_desc(const void *a, const void *b) {
    const MemberDef *ma = a;
    const MemberDef *mb = b;

    /* Primary: alignment descending */
    if (ma->alignment != mb->alignment) {
        return (mb->alignment > ma->alignment) ? 1 : -1;
    }

    /* Secondary: size descending (for cache locality) */
    if (ma->size != mb->size) {
        return (mb->size > ma->size) ? 1 : -1;
    }

    return 0;
}

5. Implementation Guide

5.1 Development Environment Setup

Required Tools

# Compiler
gcc --version    # or clang --version

# Build
make --version

# Verification tools
gcc -E -dM - </dev/null | grep -E "SIZEOF|ALIGN"

Recommended Compiler Flags

CC = gcc
CFLAGS = -Wall -Wextra -Wpedantic -std=c11 -g
CFLAGS += -fsanitize=address -fsanitize=undefined

5.2 Project Structure

struct-analyzer/
├── src/
│   ├── main.c              # Entry point, argument parsing
│   ├── lexer.c             # Tokenization
│   ├── lexer.h
│   ├── parser.c            # Parse tokens to AST
│   ├── parser.h
│   ├── types.c             # Type system (sizes, alignments)
│   ├── types.h
│   ├── layout.c            # Layout calculation
│   ├── layout.h
│   ├── optimizer.c         # Struct optimization
│   ├── optimizer.h
│   ├── reporter.c          # Output formatting
│   └── reporter.h
├── tests/
│   ├── test_lexer.c
│   ├── test_parser.c
│   ├── test_layout.c
│   ├── verify_layout.c     # Compare with actual offsetof()
│   └── run_tests.sh
├── examples/
│   ├── basic_types.txt
│   ├── nested_structs.txt
│   └── real_world.txt
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How does the compiler lay out struct members in memory, and how can we optimize for size?”

This project forces you to understand the exact rules compilers use for alignment and padding, then apply that knowledge to minimize memory waste.

5.4 Concepts You Must Understand First

Before implementing, verify you can answer:

  1. What is the alignment of an int? (Usually sizeof(int), but check your platform)

  2. Why does struct { char c; int i; } have size 8, not 5? (Padding before int for alignment, plus trailing padding for array usage)

  3. What determines struct alignment? (Largest member alignment)

  4. What is the offsetof macro? (offsetof(struct_type, member_name) returns byte offset)

  5. Why is trailing padding needed? (Arrays: arr[1] members must be aligned)

5.5 Questions to Guide Your Design

Type System

  • How will you handle unsigned vs signed types?
  • How will you handle pointers (platform-dependent size)?
  • How will you handle arrays within structs?
  • How will you handle nested structs?

Parser

  • Will you handle full C syntax or a simplified subset?
  • How will you report syntax errors?
  • How will you handle type qualifiers (const, volatile)?

Layout

  • How will you track padding locations for the report?
  • How will you handle platform differences (32 vs 64 bit)?
  • How will you verify your calculations are correct?

Output

  • What format is most educational?
  • How will you visualize the layout?
  • How will you handle very large structs in the visual display?

5.6 Thinking Exercise

Before coding, manually calculate the layout for:

struct test {
    char a;
    short b;
    char c;
    int d;
    char e;
};

Work through it step by step:

  1. char a: offset 0, size 1, alignment 1
  2. short b needs alignment 2. Offset 1 is not aligned. Add 1 byte padding. b at offset 2.
  3. char c: offset 4 (after b which is 2 bytes). Size 1.
  4. int d needs alignment 4. Offset 5 is not aligned. Add 3 bytes padding. d at offset 8.
  5. char e: offset 12. Size 1.
  6. Struct alignment = max(1, 2, 1, 4, 1) = 4
  7. Current size = 13. Round up to alignment 4 = 16 bytes total.

Layout:

Offset 0: a (1)
Offset 1: padding (1)
Offset 2: b (2)
Offset 4: c (1)
Offset 5: padding (3)
Offset 8: d (4)
Offset 12: e (1)
Offset 13: trailing padding (3)
Total: 16 bytes

Verify with code:

#include <stddef.h>
#include <stdio.h>

struct test { char a; short b; char c; int d; char e; };

int main(void) {
    printf("sizeof(struct test) = %zu\n", sizeof(struct test));
    printf("offsetof(a) = %zu\n", offsetof(struct test, a));
    printf("offsetof(b) = %zu\n", offsetof(struct test, b));
    printf("offsetof(c) = %zu\n", offsetof(struct test, c));
    printf("offsetof(d) = %zu\n", offsetof(struct test, d));
    printf("offsetof(e) = %zu\n", offsetof(struct test, e));
    return 0;
}

5.7 Hints in Layers

Hint 1: Start with the Type System

Begin by defining the type sizes and alignments. Create a lookup table:

typedef struct {
    const char *name;
    size_t size;
    size_t alignment;
} TypeInfo;

TypeInfo types_64bit[] = {
    {"char",     1, 1},
    {"short",    2, 2},
    {"int",      4, 4},
    {"long",     8, 8},
    {"float",    4, 4},
    {"double",   8, 8},
    {NULL, 0, 0}
};

TypeInfo *lookup_type(const char *name) {
    for (int i = 0; types_64bit[i].name; i++) {
        if (strcmp(types_64bit[i].name, name) == 0) {
            return &types_64bit[i];
        }
    }
    return NULL;
}

Test this first before building the parser.

Hint 2: Lexer Approach

A simple lexer can be built with a state machine:

Token next_token(const char **input) {
    Token tok = {0};

    /* Skip whitespace */
    while (isspace(**input)) (*input)++;

    if (**input == '\0') {
        tok.type = TOK_EOF;
        return tok;
    }

    /* Keywords and identifiers */
    if (isalpha(**input) || **input == '_') {
        int len = 0;
        while (isalnum(**input) || **input == '_') {
            tok.text[len++] = *(*input)++;
        }
        tok.text[len] = '\0';

        /* Check for keywords */
        if (strcmp(tok.text, "struct") == 0) tok.type = TOK_STRUCT;
        else if (strcmp(tok.text, "char") == 0) tok.type = TOK_CHAR;
        /* ... more keywords ... */
        else tok.type = TOK_IDENT;

        return tok;
    }

    /* Punctuation */
    switch (*(*input)++) {
        case '{': tok.type = TOK_LBRACE; break;
        case '}': tok.type = TOK_RBRACE; break;
        case ';': tok.type = TOK_SEMICOLON; break;
        case '*': tok.type = TOK_STAR; break;
        /* ... */
    }

    return tok;
}
Hint 3: Layout Algorithm in Detail

The key function:

size_t align_to(size_t offset, size_t alignment) {
    size_t remainder = offset % alignment;
    if (remainder == 0) return offset;
    return offset + (alignment - remainder);
}

void calculate_layout(StructDef *def, StructLayout *layout) {
    size_t offset = 0;
    size_t max_align = 1;

    for (int i = 0; i < def->member_count; i++) {
        MemberDef *m = &def->members[i];
        size_t align = m->alignment;

        /* Update max alignment */
        if (align > max_align) max_align = align;

        /* Add padding if needed */
        size_t aligned_offset = align_to(offset, align);
        if (aligned_offset > offset) {
            add_padding(layout, offset, aligned_offset - offset);
            offset = aligned_offset;
        }

        /* Add member */
        add_member(layout, offset, m);
        offset += m->size;
    }

    /* Trailing padding */
    size_t final_size = align_to(offset, max_align);
    if (final_size > offset) {
        add_trailing_padding(layout, offset, final_size - offset);
    }

    layout->total_size = final_size;
    layout->struct_alignment = max_align;
}
Hint 4: Verification Strategy

Generate C code to verify your calculations:

void generate_verification(StructDef *def, StructLayout *layout) {
    printf("#include <stddef.h>\n");
    printf("#include <stdio.h>\n");
    printf("#include <assert.h>\n\n");

    /* Print struct definition */
    printf("%s\n\n", def->original_text);

    printf("int main(void) {\n");
    printf("    assert(sizeof(%s) == %zu);\n",
           def->name, layout->total_size);

    for (int i = 0; i < layout->entry_count; i++) {
        if (layout->entries[i].type == ENTRY_MEMBER) {
            printf("    assert(offsetof(%s, %s) == %zu);\n",
                   def->name,
                   layout->entries[i].member->name,
                   layout->entries[i].offset);
        }
    }

    printf("    printf(\"All assertions passed!\\n\");\n");
    printf("    return 0;\n");
    printf("}\n");
}

Then compile and run the generated code to verify correctness!

5.8 The Interview Questions They’ll Ask

  1. “Why isn’t sizeof(struct { char c; int i; }) equal to 5?” Answer: The int must be aligned to a 4-byte boundary. The compiler adds 3 bytes of padding after the char. Plus trailing padding makes total 8 bytes.

  2. “How can you minimize struct size?” Answer: Order members from largest alignment to smallest. This minimizes internal padding. Example: put doubles first, then ints, then chars.

  3. “What is the alignment of a struct?” Answer: The largest alignment requirement among all members. This ensures arrays of the struct work correctly.

  4. “What does __attribute__((packed)) do?” Answer: Removes all padding, creating a compact layout. Warning: may cause misaligned access, which is slow or crashes on some architectures.

  5. “How does struct layout affect cache performance?” Answer: Larger structs = fewer fit in cache = more cache misses. Also, frequently accessed members should be together to benefit from spatial locality.

  6. “What is false sharing?” Answer: When different threads access different variables that happen to be in the same cache line, they invalidate each other’s cache. Solution: pad between frequently-written variables.

5.9 Books That Will Help

Topic Book Chapter
Data representation Write Great Code Vol 1 Ch. 4 “Floating-Point”, Ch. 5 “Character”
Memory organization Write Great Code Vol 1 Ch. 6 “Memory Organization and Access”
Composite data types Write Great Code Vol 1 Ch. 7 “Composite Data Types”
Data layout CS:APP 3rd Ed Ch. 3.9 “Heterogeneous Data Structures”
Memory hierarchy CS:APP 3rd Ed Ch. 6 “The Memory Hierarchy”
C type rules Expert C Programming Ch. 2-3

5.10 Implementation Phases

Phase 1: Type System and Manual Testing (Day 1)

  • Define type sizes and alignments
  • Write a test program that manually creates a StructDef
  • Calculate layout for that struct
  • Verify with offsetof()

Phase 2: Basic Lexer (Day 2)

  • Implement token recognition
  • Test with simple input
  • Handle keywords, identifiers, punctuation

Phase 3: Parser (Days 3-4)

  • Parse struct name { type member; ... }
  • Handle arrays type member[N]
  • Handle pointers type *member
  • Error handling for invalid syntax

Phase 4: Reporter (Day 5)

  • Format output as tables
  • Generate visual byte layout
  • Calculate statistics

Phase 5: Optimizer (Day 6)

  • Sort members by alignment
  • Calculate optimized layout
  • Show savings

Phase 6: Polish and Extensions (Day 7)

  • Handle platform modes (32/64 bit)
  • Add packed analysis
  • Comprehensive testing

5.11 Key Implementation Decisions

  1. How much C syntax to support?
    • Minimum: basic types, pointers, arrays
    • Nice to have: nested structs, unions, bitfields
    • Skip: function pointers, complex declarations
  2. How to handle unknown types?
    • Option A: Error out
    • Option B: Assume pointer-sized alignment (8 bytes on 64-bit)
    • Option C: Allow user-specified type info
  3. Output format?
    • Text-only (portable)
    • Color-coded (requires terminal support)
    • Machine-readable (JSON/XML) for scripting

6. Testing Strategy

Verification with offsetof()

The key to testing is comparing your calculations with actual compiler behavior:

/* verify_layout.c */
#include <stddef.h>
#include <stdio.h>
#include <assert.h>

/* Test struct */
struct test1 {
    char a;
    int b;
    char c;
    double d;
};

int main(void) {
    /* Verify against your tool's output */
    printf("struct test1:\n");
    printf("  sizeof = %zu\n", sizeof(struct test1));
    printf("  offsetof(a) = %zu\n", offsetof(struct test1, a));
    printf("  offsetof(b) = %zu\n", offsetof(struct test1, b));
    printf("  offsetof(c) = %zu\n", offsetof(struct test1, c));
    printf("  offsetof(d) = %zu\n", offsetof(struct test1, d));

    /* Assertions based on expected layout */
    assert(offsetof(struct test1, a) == 0);
    assert(offsetof(struct test1, b) == 4);
    assert(offsetof(struct test1, c) == 8);
    assert(offsetof(struct test1, d) == 16);
    assert(sizeof(struct test1) == 24);

    printf("All tests passed!\n");
    return 0;
}

Test Categories

Category Purpose Example
Basic types All C basic types work char, int, double, etc.
Alignment cases Padding calculated correctly char followed by int
Arrays Array size multiplied correctly char name[32]
Pointers Platform-dependent size char *ptr
No padding Optimal ordering detected int, int, char, char
Maximum padding Worst-case handled char, double, char, double
Empty struct Edge case struct empty {}
Single member Edge case struct single { int x; }

Critical Test Cases

# Test 1: No padding needed
./struct_analyzer "struct aligned { int a; int b; int c; }"
# Expected: 12 bytes, 0% padding

# Test 2: Maximum waste scenario
./struct_analyzer "struct wasteful { char a; double b; char c; double d; }"
# Expected: 32 bytes, ~53% padding

# Test 3: Optimal ordering
./struct_analyzer "struct optimal { double a; double b; int c; char d; char e; }"
# Expected: minimal padding

# Test 4: Arrays
./struct_analyzer "struct with_array { char name[32]; int id; }"

# Test 5: Pointers (verify platform-specific)
./struct_analyzer "struct with_ptr { char *data; int len; }"

7. Common Pitfalls & Debugging

Frequent Mistakes

Pitfall Symptom Solution
Forgetting trailing padding Struct size too small Round up to struct alignment at end
Wrong pointer size Offsets wrong on 64-bit Check platform pointer size
Array alignment Arrays with wrong offset Array alignment = element alignment
Off-by-one in modulo Occasional wrong padding Use (align - (offset % align)) % align
Not testing edge cases Fails on empty/single-member structs Test all degenerate cases

Debugging Strategies

1. Print intermediate calculations:

printf("Member %s: size=%zu, align=%zu, offset before=%zu, padding=%zu, offset after=%zu\n",
       member->name, size, align, offset_before, padding, offset_after);

2. Generate verification code:

./struct_analyzer --emit-verify "struct x { ... }" > verify.c
gcc -o verify verify.c && ./verify

3. Compare with pahole:

# pahole is a great tool for struct layout analysis
pahole --show_only_data_members ./your_binary

8. Extensions & Challenges

Beginner Extensions

  • Add color output for padding vs data
  • Support reading from stdin
  • Add --json output format

Intermediate Extensions

  • Handle nested structs
  • Support unions
  • Support typedef names
  • Add --packed mode to show packed layout
  • Cross-platform comparison (show 32-bit AND 64-bit side by side)

Advanced Extensions

  • Parse actual C header files (with preprocessor)
  • Handle bitfields
  • Cache line analysis (detect structs spanning cache lines)
  • False sharing detection (warn about multithreaded issues)
  • Integration with debug info (DWARF parsing)
  • GUI visualization

9. Real-World Connections

Network Protocols

Network headers are designed with explicit packing:

/* IP header - MUST match wire format */
struct __attribute__((packed)) ip_header {
    uint8_t  version_ihl;
    uint8_t  tos;
    uint16_t total_length;
    uint16_t identification;
    uint16_t flags_fragment;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t checksum;
    uint32_t source_ip;
    uint32_t dest_ip;
};

Database Records

Database systems carefully design record layouts:

/* Before optimization: 32 bytes */
struct record_v1 {
    char status;        /* 1 byte + 7 padding */
    double timestamp;   /* 8 bytes */
    char type;          /* 1 byte + 3 padding */
    int id;             /* 4 bytes */
    char flags;         /* 1 byte + 7 padding (trailing) */
};

/* After optimization: 24 bytes */
struct record_v2 {
    double timestamp;   /* 8 bytes */
    int id;             /* 4 bytes */
    char status;        /* 1 byte */
    char type;          /* 1 byte */
    char flags;         /* 1 byte + 1 padding */
};  /* 25% smaller! */

Game Development

Games often pack vertex data tightly for GPU efficiency:

struct vertex {
    float position[3];  /* 12 bytes */
    float normal[3];    /* 12 bytes */
    float uv[2];        /* 8 bytes */
};  /* 32 bytes - cache-line friendly */

10. Resources

Tools for Comparison

  • pahole: Linux tool for struct layout analysis (part of dwarves package)
  • Compiler Explorer (godbolt.org): See struct layout in assembly
  • offsetof() macro: Built-in verification

Documentation


11. Self-Assessment Checklist

Understanding Verification

  • I can manually calculate struct layout with padding
  • I know why alignment exists (CPU efficiency)
  • I understand trailing padding (array element alignment)
  • I can explain the difference between packed and normal structs
  • I know how to verify layouts with offsetof()

Implementation Verification

  • Tool parses basic struct definitions
  • Calculated sizes match sizeof()
  • Calculated offsets match offsetof()
  • Visual output shows padding clearly
  • Optimization suggestions reduce size

Quality Verification

  • All test cases pass
  • Handles edge cases (empty struct, single member)
  • Reasonable error messages for bad input
  • Works on both 32-bit and 64-bit modes (if implemented)

12. Submission / Completion Criteria

Minimum Viable Completion

  • Parses struct definitions with basic types
  • Calculates correct layout (verified with offsetof())
  • Shows offset table with padding marked
  • Reports total size and padding percentage

Full Completion

  • Handles arrays and pointers
  • Visual byte-by-byte layout display
  • Optimization suggestions
  • Platform mode selection (32/64 bit)
  • Comprehensive test suite

Excellence

  • Nested struct support
  • Union support
  • Generate verification C code
  • Integration with pahole comparison
  • JSON output for scripting
  • Performance benchmarking of packed vs unpacked

The Core Question You’re Answering

“How does the compiler lay out struct members in memory, and how can we optimize for size?”

By building this tool, you will understand exactly how compilers implement the C standard’s requirements for data alignment. You will be able to look at any struct and predict its size, identify wasted padding, and optimize for both memory efficiency and cache performance.

This knowledge is essential for:

  • Systems programming (kernel, drivers)
  • Embedded development (memory-constrained)
  • High-performance computing (cache optimization)
  • Network programming (protocol structures)
  • Database development (record layouts)

This guide was expanded from EXPERT_C_PROGRAMMING_DEEP_DIVE.md. For the complete learning path, see the project index.