Project 16: Portable Code Checker
Build a static analysis tool that detects non-portable C constructs, helping you write code that compiles and runs correctly across different platforms, compilers, and architectures.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C |
| Difficulty | Level 4 (Advanced) |
| Time | 2 Weeks |
| Key Concepts | Portability, undefined behavior, implementation-defined behavior, standards compliance |
| Prerequisites | P01-P15, understanding of C standards, familiarity with multiple platforms |
| Portfolio Value | High - demonstrates cross-platform expertise |
1. Learning Objectives
By completing this project, you will:
- Master the distinction between portable and non-portable constructs: Understand what the C standard guarantees versus what individual implementations may do differently
- Understand undefined vs implementation-defined behavior: Know when code “works by accident” versus when it’s guaranteed to work
- Learn data model differences: Comprehend ILP32, LP64, LLP64, and why sizeof(long) varies
- Recognize compiler extensions: Identify GCC, Clang, and MSVC-specific features that break portability
- Build a practical static analyzer: Create a tool that catches real portability bugs before they cause cross-platform failures
- Write truly portable C: Apply lessons learned to write code that works everywhere
- Understand alignment and representation: Know how struct padding, endianness, and type representations vary
2. Theoretical Foundation
2.1 Core Concepts
Undefined Behavior vs Implementation-Defined vs Unspecified
The C standard categorizes behavior into distinct categories:
+------------------------------------------------------------------------------+
| C STANDARD BEHAVIOR CATEGORIES |
+------------------------------------------------------------------------------+
| |
| DEFINED BEHAVIOR (Portable): |
| ----------------------------- |
| The standard specifies exactly what happens. |
| Example: printf("%d\n", 5); // Always prints "5" |
| |
| IMPLEMENTATION-DEFINED BEHAVIOR: |
| -------------------------------- |
| The compiler must document what happens, but it varies. |
| Example: sizeof(int) // 2, 4, or 8 depending on platform |
| Right shift of negative number: arithmetic or logical? |
| Representation of signed integers: two's complement? |
| |
| UNSPECIFIED BEHAVIOR: |
| --------------------- |
| The compiler chooses from options, but doesn't have to document. |
| Example: Order of function argument evaluation |
| foo(f(), g()); // Is f() or g() called first? |
| (foo(i++, i++) is worse: unsequenced writes to i are undefined) |
| |
| UNDEFINED BEHAVIOR (UB): |
| ------------------------ |
| Anything can happen. Compilers assume UB never occurs. |
| Example: Signed integer overflow |
| Dereferencing NULL |
| Out-of-bounds array access |
| Using uninitialized variables |
| |
| WHY THIS MATTERS FOR PORTABILITY: |
| Code relying on undefined or implementation-defined behavior may: |
| - Work on one platform, fail on another |
| - Work today, break with compiler updates |
| - Work in debug, fail in release (optimizations expose UB) |
| |
+------------------------------------------------------------------------------+
Data Models: Why long Is Not Always 32 Bits
+------------------------------------------------------------------------------+
| DATA MODELS ACROSS PLATFORMS |
+------------------------------------------------------------------------------+
| |
| MODEL char short int long long long pointer PLATFORM |
| ----------- ---- ----- ---- ---- --------- ------- -------- |
| ILP32 8 16 32 32 64 32 32-bit Unix |
| LP64 8 16 32 64 64 64 64-bit Unix/Mac |
| LLP64 8 16 32 32 64 64 64-bit Windows |
| ILP64 8 16 64 64 64 64 (rare) |
| |
| KEY INSIGHT: |
| - int is 32 bits on ILP32, LP64, and LLP64 (not guaranteed by the standard) |
| - long varies! 32 bits on 64-bit Windows, 64 bits on 64-bit Unix |
| - void* varies! 32 bits on 32-bit targets, 64 bits on 64-bit targets |
| |
| PORTABLE CODE USES: |
| - int32_t, int64_t, uint32_t etc. from <stdint.h> |
| - size_t for sizes and array indices |
| - ptrdiff_t for pointer differences |
| - intptr_t/uintptr_t for pointer-to-integer conversions |
| |
| COMMON BUG: |
| long file_size = ...; // 64 bits on LP64, only 32 bits on LLP64 |
| printf("%ld\n", file_size); // Format matches, but sizes over 2 GiB don't fit on LLP64 |
| |
+------------------------------------------------------------------------------+
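The table above can be turned into a small lookup that a checker might use to decide which platform-specific warnings apply. This is a minimal sketch, assuming 8-bit bytes; the `DataModel` enum and both function names are hypothetical and not part of the project specification.

```c
#include <stddef.h>

/* Hypothetical names for this sketch; not part of the project spec. */
typedef enum {
    MODEL_ILP32,
    MODEL_LP64,
    MODEL_LLP64,
    MODEL_ILP64,
    MODEL_UNKNOWN
} DataModel;

/* Classify a data model from the sizes (in bytes) of int, long, and void*.
 * Assumes 8-bit bytes, matching the bit widths in the table above. */
DataModel classify_data_model(size_t int_size, size_t long_size,
                              size_t ptr_size)
{
    if (int_size == 4 && long_size == 4 && ptr_size == 4) return MODEL_ILP32;
    if (int_size == 4 && long_size == 8 && ptr_size == 8) return MODEL_LP64;
    if (int_size == 4 && long_size == 4 && ptr_size == 8) return MODEL_LLP64;
    if (int_size == 8 && long_size == 8 && ptr_size == 8) return MODEL_ILP64;
    return MODEL_UNKNOWN;
}

/* Data model of the platform this file is compiled for. */
DataModel host_data_model(void)
{
    return classify_data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

Passing the sizes as parameters, rather than calling sizeof inside the classifier, lets the same function describe a target platform that differs from the host.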
Endianness: Byte Order Matters
+------------------------------------------------------------------------------+
| ENDIANNESS |
+------------------------------------------------------------------------------+
| |
| uint32_t value = 0x12345678; |
| |
| LITTLE-ENDIAN (x86, ARM default): |
| Memory: [78] [56] [34] [12] |
| Address: 0 1 2 3 |
| Least significant byte at lowest address |
| |
| BIG-ENDIAN (Network byte order, PowerPC, SPARC): |
| Memory: [12] [34] [56] [78] |
| Address: 0 1 2 3 |
| Most significant byte at lowest address |
| |
| NON-PORTABLE CODE: |
| ------------------ |
| uint32_t val = 0x12345678; |
| uint8_t *bytes = (uint8_t *)&val; |
| if (bytes[0] == 0x78) { /* Assumes little-endian! */ } |
| |
| FILE FORMAT CODE: |
| ----------------- |
| // Reading a 32-bit value from a file |
| fread(&value, 4, 1, file); // NON-PORTABLE! Depends on endianness |
| |
| PORTABLE ALTERNATIVE: |
| uint32_t read_le32(FILE *f) { |
| uint8_t bytes[4] = {0}; |
| fread(bytes, 1, 4, f); // check the return value in real code |
| return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) | |
| ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24); |
| } |
| |
+------------------------------------------------------------------------------+
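The byte-by-byte approach above works the same way on an in-memory buffer. The sketch below shows both directions; the function names are illustrative, not taken from any library. Each byte is widened to uint32_t before shifting, because a uint8_t would otherwise promote to signed int and a shift into the sign bit would be undefined.

```c
#include <stdint.h>

/* Decode a little-endian 32-bit value from four bytes. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode a big-endian (network order) 32-bit value from four bytes. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}

/* Encode a 32-bit value as four little-endian bytes. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}
```

Because these functions only use shifts and masks on values, they give the same result regardless of the host byte order.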
Alignment Requirements
+------------------------------------------------------------------------------+
| ALIGNMENT REQUIREMENTS |
+------------------------------------------------------------------------------+
| |
| Every type has an alignment requirement: |
| - char: 1 byte alignment |
| - short: typically 2 byte |
| - int: typically 4 byte |
| - long: 4 or 8 byte (platform-dependent) |
| - double: typically 8 byte |
| - pointers: 4 or 8 byte |
| |
| MISALIGNED ACCESS - WHAT HAPPENS: |
| x86: Works, but slower (hardware handles it) |
| ARM (older): SIGBUS crash! |
| ARM (newer): Works, but may be slow or require special instructions |
| SPARC: SIGBUS crash! |
| |
| NON-PORTABLE CODE: |
| char buffer[100]; |
| int *ip = (int *)(buffer + 3); // Potentially misaligned! |
| *ip = 42; // May crash on strict-alignment platforms |
| |
| PORTABLE ALTERNATIVE: |
| char buffer[100]; |
| int value = 42; |
| memcpy(buffer + 3, &value, sizeof(value)); // Always safe |
| |
+------------------------------------------------------------------------------+
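The memcpy idiom above is easy to wrap in helpers so that call sites never cast a byte pointer to a wider type. A minimal sketch follows, with hypothetical helper names; note that it copies the value in host byte order, so it addresses alignment only, not endianness.

```c
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from an address with no alignment guarantee. */
uint32_t load_u32_unaligned(const void *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);  /* byte-wise copy, valid at any address */
    return v;
}

/* Write a uint32_t to an address with no alignment guarantee. */
void store_u32_unaligned(void *dst, uint32_t v)
{
    memcpy(dst, &v, sizeof v);
}
```

Mainstream compilers typically reduce a fixed-size memcpy like this to a single load or store where the target allows unaligned access, so the portable form rarely costs anything.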
Compiler Extensions
+------------------------------------------------------------------------------+
| COMMON COMPILER EXTENSIONS |
+------------------------------------------------------------------------------+
| |
| GCC/CLANG EXTENSIONS: |
| --------------------- |
| __attribute__((packed)) // Non-standard (so is #pragma pack) |
| __attribute__((aligned(N))) // Non-standard; C11 offers _Alignas |
| __attribute__((constructor)) // Init function, no standard equivalent |
| typeof(expr) // C23 has typeof, older C doesn't |
| Statement expressions ({ ... }) |
| Nested functions // GCC only! |
| __builtin_* // Compiler builtins |
| Variable-length arrays in structs (flexible array members are standard) |
| |
| MSVC EXTENSIONS: |
| ---------------- |
| __declspec(dllexport/dllimport) |
| __forceinline |
| __assume(expr) |
| #pragma warning |
| __int64 (use int64_t instead) |
| |
| DETECTING COMPILER (test __clang__ first: Clang also defines __GNUC__): |
| #if defined(__clang__) |
| #elif defined(__GNUC__) |
| #elif defined(_MSC_VER) |
| #endif |
| |
+------------------------------------------------------------------------------+
2.2 Why Portability Matters
Cross-Platform Development Reality:
- Mobile apps often share C/C++ code between iOS and Android
- Servers run Linux, but developers use macOS or Windows
- Embedded systems use various microcontroller architectures
- Libraries must work across all platforms
Real-World Portability Disasters:
- The time_t Y2K38 Problem: A 32-bit time_t overflows in 2038. Code assuming sizeof(time_t) == 4 will break.
- The Windows Long Fiasco: Thousands of Unix programs assumed sizeof(long) == sizeof(void*). When ported to 64-bit Windows (LLP64), they crashed.
- ARM SIGBUS Crashes: x86-developed code with misaligned accesses ran fine on Intel but crashed on ARM mobile devices.
- Big-Endian Network Protocols: Code that reads network packets directly into structs works on big-endian machines but produces garbage on little-endian.
2.3 Historical Context: Why C Allows Non-Portable Constructs
C was designed as a “portable assembler” for Unix, initially targeting the PDP-11. The philosophy was:
- Trust the programmer - Don’t add runtime checks
- Keep the language simple - Implementation-defined behavior reduces compiler complexity
- Allow efficient code - Undefined behavior enables optimizations
- Support diverse hardware - Different machines have different natural word sizes
This design was brilliant for its era but means modern C programmers must understand what’s portable and what isn’t.
2.4 Common Misconceptions
Misconception 1: “It works, so it’s portable”
- Code may work by accident on your platform
- Compiler optimizations can expose latent bugs
- Different compilers make different choices
Misconception 2: “The compiler would warn me”
- Many non-portable constructs don’t trigger warnings
- -Wall doesn’t catch everything
- Different compilers warn about different things
Misconception 3: “I only target one platform”
- Requirements change
- Compiler updates can change behavior
- Future developers may need to port your code
Misconception 4: “Standard C is always portable”
- Implementation-defined behavior is standard but not portable
- Even standards-compliant code may not be portable
3. Project Specification
3.1 What You Will Build
A static analysis tool that scans C source code and reports non-portable constructs:
$ ./portcheck source.c
Analyzing: source.c
=== PORTABILITY ISSUES ===
source.c:15: WARNING [TYPE-SIZE] Assuming sizeof(int) == 4
int buffer[1024 / sizeof(int)];
Issue: sizeof(int) is implementation-defined (could be 2 on 16-bit systems)
Suggestion: Size the array in elements, or use a fixed-width type such as int32_t
source.c:23: WARNING [PTR-INT] Pointer-to-int cast may truncate on 64-bit
int addr = (int)ptr;
Issue: sizeof(int) < sizeof(void*) on LP64 and LLP64 systems
Suggestion: Use uintptr_t for pointer-to-integer conversions
source.c:45: WARNING [BIT-FIELD] Bit-field layout is implementation-defined
unsigned int flags : 3;
Issue: Bit-field layout varies between compilers and ABIs
Suggestion: Document expected behavior or avoid bit-fields in cross-platform ABIs
source.c:67: WARNING [EXTENSION] Non-standard extension: typeof()
typeof(x) y = x;
Issue: typeof is a GCC/Clang extension, not standard until C23
Suggestion: Use explicit type declaration or _Generic for C11
source.c:89: WARNING [ENDIAN] Potential endianness assumption
*(uint32_t *)buffer = value;
Issue: Type punning through pointer cast assumes specific byte order
Suggestion: Use memcpy() or explicit byte serialization
source.c:112: WARNING [ALIGN] Potentially misaligned pointer cast
int *ip = (int *)(char_buffer + 1);
Issue: Unaligned access causes SIGBUS on strict-alignment architectures
Suggestion: Use memcpy() for unaligned access
source.c:134: WARNING [OVERFLOW] Signed integer overflow is undefined behavior
if (x + y > MAX) // where x, y are signed
Issue: Compiler may assume overflow never happens and optimize unexpectedly
Suggestion: Check for overflow before it happens, or use unsigned types
source.c:156: WARNING [SHIFT] Shift amount may exceed type width
uint32_t mask = 1 << bit; // bit could be 32+
Issue: Shifting by >= type width is undefined behavior
Suggestion: Use (uint32_t)1 << (bit & 31) with bounds checking
source.c:178: WARNING [IMPL-DEF] Right-shift of signed value
int result = negative_val >> 4;
Issue: Right shift of negative values is implementation-defined
Suggestion: Use unsigned type or explicit sign handling
=== SUMMARY ===
Total issues: 9
Critical (will break): 3
Warning (may break): 4
Info (best practice): 2
Platforms affected:
- 64-bit Windows (LLP64): 2 issues
- 32-bit systems: 1 issue
- Big-endian systems: 1 issue
- Strict-alignment (ARM, SPARC): 1 issue
- Non-GCC compilers: 1 issue
3.2 Functional Requirements
Issue Categories to Detect:
typedef enum {
PORT_TYPE_SIZE, // Assumptions about type sizes
PORT_PTR_INT, // Pointer-integer conversions
PORT_BIT_FIELD, // Bit-field portability
PORT_EXTENSION, // Compiler-specific extensions
PORT_ENDIAN, // Endianness assumptions
PORT_ALIGN, // Alignment issues
PORT_OVERFLOW, // Integer overflow assumptions
PORT_SHIFT, // Shift operation issues
PORT_SIGN, // Signed/unsigned issues
PORT_IMPL_DEF, // Implementation-defined behavior
PORT_UB, // Undefined behavior
PORT_CHAR_SIGN, // char signedness assumptions
PORT_PADDING, // Struct padding assumptions
PORT_VARARGS, // Variadic function issues
PORT_PRINTF, // printf format string issues
} PortabilityCategory;
Severity Levels:
typedef enum {
SEVERITY_CRITICAL, // Will definitely break on some platforms
SEVERITY_WARNING, // May break depending on circumstances
SEVERITY_INFO, // Best practice, unlikely to break
} Severity;
3.3 Non-Functional Requirements
- Low false positives: Only report likely issues, not every cast
- Actionable output: Always suggest a portable alternative
- Platform targeting: Allow checking for specific platforms
- Integration friendly: Exit codes and machine-readable output
- Fast: Handle large codebases efficiently
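For the "integration friendly" requirement, one possible exit-code policy is sketched below: exit with 0 when nothing at or above the chosen severity threshold was found, and 1 otherwise. The policy and the function names are assumptions made for illustration; the Severity enum is repeated from section 3.2 so the block stands alone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same ordering as the Severity enum in 3.2: lower value = more severe. */
typedef enum {
    SEVERITY_CRITICAL,
    SEVERITY_WARNING,
    SEVERITY_INFO
} Severity;

/* True when `s` is at least as severe as `threshold`. */
bool severity_meets(Severity s, Severity threshold)
{
    return s <= threshold;
}

/* Hypothetical policy: 0 = clean, 1 = at least one reportable issue. */
int exit_code_for(const Severity *found, size_t count, Severity min_severity)
{
    for (size_t i = 0; i < count; i++) {
        if (severity_meets(found[i], min_severity)) return 1;
    }
    return 0;
}
```

A CI job could then fail the build on `--min-severity=warning` while still printing informational findings.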
3.4 Comprehensive Portability Issues to Detect
Category 1: Type Size Assumptions
// ISSUE: Assuming sizeof(int) == 4
int arr[1000 / sizeof(int)]; // Non-portable
int32_t arr[1000 / sizeof(int32_t)]; // Portable
// ISSUE: Assuming sizeof(long) == 4 or 8
long value = 0x12345678ABCDEF; // Does not fit when long is 32 bits
int64_t value = 0x12345678ABCDEF; // Portable
// ISSUE: Assuming sizeof(pointer) == sizeof(int)
void *ptr = malloc(100);
int handle = (int)ptr; // Truncates on 64-bit!
intptr_t handle = (intptr_t)ptr; // Portable
// ISSUE: Using int for array sizes
int size = strlen(str); // strlen returns size_t
size_t size = strlen(str); // Portable
// ISSUE: Assuming size_t width
printf("%d items\n", count); // count is size_t - wrong on 64-bit
printf("%zu items\n", count); // Portable (C99+)
Category 2: Pointer-Integer Conversions
// ISSUE: int to pointer (int cannot hold a full 64-bit address)
void *ptr = (void *)some_int;
// ISSUE: pointer to int (definitely truncates on 64-bit)
int addr = (int)some_ptr;
uintptr_t addr = (uintptr_t)some_ptr; // Portable
// ISSUE: Arithmetic on void pointers (extension)
void *p = ptr + 1; // GCC extension!
char *p = (char *)ptr + 1; // Portable
// ISSUE: Function pointer to data pointer
void (*func)() = ...;
void *data = (void *)func; // Not portable! Different sizes possible
Category 3: Endianness Issues
// ISSUE: Struct overlay on byte buffer
struct __attribute__((packed)) Header {
uint32_t magic;
uint16_t version;
};
struct Header *h = (struct Header *)buffer; // Endian-dependent!
// ISSUE: Direct byte access for multi-byte values
uint32_t val = 0x12345678;
uint8_t low_byte = *(uint8_t *)&val; // Different on BE vs LE
// ISSUE: Union type punning for byte access
union {
uint32_t i;
uint8_t bytes[4];
} u;
u.i = 0x12345678;
uint8_t first = u.bytes[0]; // 0x78 on LE, 0x12 on BE
// PORTABLE: Explicit byte extraction
uint8_t low_byte = val & 0xFF;
uint8_t high_byte = (val >> 24) & 0xFF;
Category 4: Alignment Issues
// ISSUE: Casting to stricter alignment
char buffer[100];
int *ip = (int *)(buffer + 1); // Misaligned!
*ip = 42; // SIGBUS on strict-alignment CPUs
// ISSUE: Packed struct members
struct __attribute__((packed)) Data {
char c;
int i; // Misaligned!
};
struct Data d;
int *ip = &d.i; // Pointer to misaligned int
// PORTABLE: Use memcpy for unaligned access
char buffer[100];
int value = 42;
memcpy(buffer + 1, &value, sizeof(value));
Category 5: Bit-Field Issues
// ISSUE: Bit-field layout is implementation-defined
struct Flags {
unsigned int a : 1;
unsigned int b : 3;
unsigned int c : 4;
}; // Order in memory varies by compiler!
// ISSUE: Bit-field signedness without explicit sign
struct Bits {
int x : 4; // Is x signed or unsigned? Implementation-defined!
};
// ISSUE: Bit-field crossing storage unit boundary
struct Wide {
unsigned int a : 20;
unsigned int b : 20; // May or may not cross int boundary
};
Category 6: Signed Integer Issues
// ISSUE: Signed integer overflow is UB
int result = INT_MAX + 1; // Undefined behavior!
// ISSUE: Signed right-shift is implementation-defined
int x = -8;
int y = x >> 1; // Could be -4 (arithmetic) or large positive (logical)
// ISSUE: char signedness varies
char c = 200;
if (c > 100) { ... } // May be false if char is signed!
// ISSUE: Comparing signed and unsigned
int i = -1;
unsigned u = 1;
if (i < u) { } // FALSE! -1 becomes UINT_MAX
// PORTABLE: State the signedness explicitly and keep the value in range
unsigned char c = 200; // signed char only holds -128..127 portably
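Because signed overflow is undefined, a check has to run before the addition rather than inspect its result. The helper below is a minimal sketch of that idea; the name `checked_add_int` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Add a and b without ever evaluating an overflowing expression.
 * Returns false and leaves *result untouched when the sum would not fit. */
bool checked_add_int(int a, int b, int *result)
{
    if (b > 0 && a > INT_MAX - b) return false;  /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false;  /* would go below INT_MIN */
    *result = a + b;
    return true;
}
```

Both comparisons stay in range: INT_MAX - b is only evaluated for positive b, and INT_MIN - b only for negative b.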
Category 7: Shift Operations
// ISSUE: Shifting by type width or more is UB
uint32_t x = 1 << 32; // UB!
uint32_t x = 1 << bit; // UB if bit >= 32
// ISSUE: Shifting negative values is UB
int x = -1 << 4; // Undefined behavior!
// ISSUE: Shift amount is negative is UB
int x = 1 << -1; // Undefined behavior!
// ISSUE: int promotion in shift
uint8_t x = 0xFF;
uint32_t y = x << 24; // x promotes to signed int; 0xFF << 24 overflows (UB)
uint32_t y = (uint32_t)x << 24; // Portable
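The two shift hazards above (an out-of-range shift count and promotion to signed int) can be avoided with small helpers like the following. This is a sketch with hypothetical names; returning 0 for an out-of-range bit is one possible policy, and a real tool might prefer to report an error instead.

```c
#include <stdint.h>

/* Return a mask with only `bit` set, or 0 when `bit` is out of range. */
uint32_t bit_mask32(unsigned bit)
{
    if (bit >= 32u) return 0u;      /* shifting by >= 32 would be undefined */
    return (uint32_t)1 << bit;      /* shift an unsigned 32-bit one */
}

/* Place an 8-bit value in the top byte of a 32-bit word. */
uint32_t to_top_byte(uint8_t x)
{
    return (uint32_t)x << 24;       /* widen first to avoid signed int math */
}
```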
Category 8: Compiler Extensions
// ISSUE: typeof (GCC/Clang, standard in C23)
typeof(x) y = x;
// ISSUE: Statement expressions (GCC)
int max = ({ int a = x; int b = y; a > b ? a : b; });
// ISSUE: Nested functions (GCC only)
void outer() {
void inner() { } // GCC extension!
}
// ISSUE: __attribute__ (GCC/Clang)
struct __attribute__((packed)) S { char c; int i; };
__attribute__((aligned(16))) int x;
__attribute__((constructor)) void init() { }
// NOTE: Designated initializers are standard C99, not an extension, but this
// mix of designated and positional elements is not valid in C89 or standard C++
int arr[] = { [0] = 1, 2, [5] = 3 };
// ISSUE: Variable-length arrays (optional in C11+)
void func(int n) {
int arr[n]; // VLA - optional feature
}
Category 9: Printf Format Specifiers
// ISSUE: Wrong format for type
size_t sz = 100;
printf("%d\n", sz); // WRONG on 64-bit
printf("%zu\n", sz); // Portable (C99+)
long l = 100;
printf("%d\n", l); // WRONG
printf("%ld\n", l); // Portable
int64_t i64 = 100;
printf("%lld\n", i64); // WRONG on some platforms
printf("%" PRId64 "\n", i64); // Portable (from <inttypes.h>)
void *p = &x;
printf("%x\n", p); // WRONG
printf("%p\n", p); // Portable
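The portable specifiers above can be combined in a single format string. The sketch below uses snprintf so the result can be inspected; `format_counts` is a hypothetical helper name.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format a size_t and an int64_t with specifiers that match on every
 * data model. Returns snprintf's result (characters that would be written). */
int format_counts(char *out, size_t cap, size_t count, int64_t id)
{
    /* PRId64 expands to the right length modifier for int64_t. */
    return snprintf(out, cap, "%zu items, id=%" PRId64, count, id);
}
```

The adjacent string literals are concatenated at compile time, so the format is an ordinary string by the time snprintf sees it.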
Category 10: Struct Padding and Layout
// ISSUE: Assuming no padding
struct Data {
char c;
int i;
};
assert(sizeof(struct Data) == 5); // WRONG! Usually 8
// ISSUE: Network/file I/O with structs
struct Packet {
uint32_t id;
uint16_t len;
};
fwrite(&pkt, sizeof(pkt), 1, file); // Includes padding!
// ISSUE: Offsetof assumptions
struct S { char c; int i; };
char *p = (char *)&s + 1;
int *ip = (int *)p; // May not point to s.i!
int *ip = (int *)((char *)&s + offsetof(struct S, i)); // Portable
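Writing the Packet struct field by field sidesteps padding and byte order at once. The sketch below assumes a 6-byte big-endian wire format, which is an illustrative choice rather than something this project defines; the function names are also hypothetical.

```c
#include <stdint.h>

struct Packet {
    uint32_t id;
    uint16_t len;
};

enum { PACKET_WIRE_SIZE = 6 };  /* 4 bytes of id + 2 bytes of len, no padding */

/* Serialize field by field in big-endian (network) order. */
void packet_encode(const struct Packet *pkt, uint8_t out[PACKET_WIRE_SIZE])
{
    out[0] = (uint8_t)(pkt->id >> 24);
    out[1] = (uint8_t)(pkt->id >> 16);
    out[2] = (uint8_t)(pkt->id >> 8);
    out[3] = (uint8_t)(pkt->id);
    out[4] = (uint8_t)(pkt->len >> 8);
    out[5] = (uint8_t)(pkt->len);
}

/* Rebuild the struct from its wire form. */
void packet_decode(const uint8_t in[PACKET_WIRE_SIZE], struct Packet *pkt)
{
    pkt->id = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
            | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    pkt->len = (uint16_t)(((unsigned)in[4] << 8) | in[5]);
}
```

The wire size is 6 bytes on every platform, while sizeof(struct Packet) is usually 8.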
3.5 Example Usage / Output
Command-Line Interface:
# Basic usage
$ ./portcheck source.c
# Check specific platforms
$ ./portcheck --platform=windows64 source.c
$ ./portcheck --platform=arm32 source.c
# Specify severity threshold
$ ./portcheck --min-severity=warning source.c
# Machine-readable output
$ ./portcheck --format=json source.c
$ ./portcheck --format=csv source.c
# Check entire project
$ ./portcheck --recursive src/
# Suppress specific warnings
$ ./portcheck --suppress=TYPE-SIZE,EXTENSION source.c
# Pedantic mode (warn about everything)
$ ./portcheck --pedantic source.c
# Show only specific categories
$ ./portcheck --only=endian,align source.c
JSON Output:
{
"file": "source.c",
"issues": [
{
"line": 23,
"column": 15,
"category": "PTR-INT",
"severity": "critical",
"message": "Pointer-to-int cast may truncate on 64-bit",
"code": "int addr = (int)ptr;",
"suggestion": "Use uintptr_t for pointer-to-integer conversions",
"platforms_affected": ["LP64", "LLP64"]
}
],
"summary": {
"total": 9,
"critical": 3,
"warning": 4,
"info": 2
}
}
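The "code" and "message" fields above carry source text, which may contain quotes and backslashes, so a JSON formatter needs an escaping step. The following is a minimal sketch with a hypothetical function name; it assumes the input is UTF-8 and passes bytes of 0x80 and above through unchanged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Write `src` to `out` as the body of a JSON string, escaping quotes,
 * backslashes, and control characters. Returns false if `out` is too small. */
bool json_escape(const char *src, char *out, size_t cap)
{
    static const char hex[] = "0123456789abcdef";
    size_t n = 0;

    if (cap == 0) return false;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char esc[6];
        size_t len;

        if (c == '"' || c == '\\') {
            esc[0] = '\\'; esc[1] = (char)c; len = 2;
        } else if (c == '\n') {
            esc[0] = '\\'; esc[1] = 'n'; len = 2;
        } else if (c == '\t') {
            esc[0] = '\\'; esc[1] = 't'; len = 2;
        } else if (c < 0x20) {
            /* Other control characters become \u00XX. */
            esc[0] = '\\'; esc[1] = 'u'; esc[2] = '0'; esc[3] = '0';
            esc[4] = hex[c >> 4]; esc[5] = hex[c & 0x0F];
            len = 6;
        } else {
            esc[0] = (char)c; len = 1;
        }

        if (n + len >= cap) return false;  /* keep room for the terminator */
        for (size_t i = 0; i < len; i++) out[n++] = esc[i];
    }
    out[n] = '\0';
    return true;
}
```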
3.6 Real World Outcome
After completing this project, you will be able to:
- Write portable C code: Automatically identify and fix portability issues
- Port existing codebases: Quickly find issues when moving code to new platforms
- Review others’ code: Catch portability bugs in code review
- Understand platform differences: Deep knowledge of why different platforms behave differently
- Debug cross-platform issues: Know what to look for when code works on one platform but not another
4. Solution Architecture
4.1 High-Level Design
+------------------------------------------------------------------------------+
| PORTABLE CODE CHECKER ARCHITECTURE |
+------------------------------------------------------------------------------+
| |
| +------------------------------------------------------------------------+ |
| | INPUT LAYER | |
| | | |
| | +----------------+ +------------------+ +------------------------+ | |
| | | Command Line | | Configuration | | File Discovery | | |
| | | Parser | | (.portcheck.yml) | | (recursive, globs) | | |
| | +----------------+ +------------------+ +------------------------+ | |
| +------------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------------+ |
| | LEXER/TOKENIZER | |
| | | |
| | source.c --> [Token Stream] | |
| | | |
| | Tokens: KEYWORD(int), IDENT(addr), OP(=), OP((), TYPE(int), OP())... | |
| +------------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------------+ |
| | SIMPLE PARSER (AST-lite) | |
| | | |
| | Not a full C parser - pattern matching on token streams | |
| | Tracks: declarations, casts, expressions, function calls | |
| +------------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------------+ |
| | ANALYSIS ENGINE | |
| | | |
| | +------------------+ +------------------+ +--------------------+ | |
| | | Type Size | | Pointer-Int | | Endian | | |
| | | Checker | | Checker | | Checker | | |
| | +------------------+ +------------------+ +--------------------+ | |
| | | |
| | +------------------+ +------------------+ +--------------------+ | |
| | | Alignment | | Bit-field | | Extension | | |
| | | Checker | | Checker | | Checker | | |
| | +------------------+ +------------------+ +--------------------+ | |
| | | |
| | +------------------+ +------------------+ +--------------------+ | |
| | | Shift | | Printf Format | | Overflow | | |
| | | Checker | | Checker | | Checker | | |
| | +------------------+ +------------------+ +--------------------+ | |
| +------------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------------+ |
| | ISSUE COLLECTOR | |
| | | |
| | Deduplication, sorting, severity classification | |
| +------------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------------+ |
| | OUTPUT FORMATTER | |
| | | |
| | +----------------+ +----------------+ +----------------+ | |
| | | Text/Console | | JSON | | CSV | | |
| | +----------------+ +----------------+ +----------------+ | |
| +------------------------------------------------------------------------+ |
| |
+------------------------------------------------------------------------------+
4.2 Key Components
1. Lexer/Tokenizer:
- Converts C source into token stream
- Handles preprocessor directives
- Tracks line/column for error reporting
2. Pattern Matcher:
- Recognizes problematic patterns without full parsing
- Uses heuristics for type inference
- Handles common idioms
3. Checkers:
- Each checker looks for specific issue categories
- Checkers are independent and can be enabled/disabled
- Each produces structured issue reports
4. Issue Database:
- Contains descriptions, suggestions, affected platforms
- Allows severity configuration
- Supports suppression
4.3 Data Structures
#include <stdbool.h>  /* bool */
#include <stddef.h>   /* size_t */

/* PortabilityCategory and Severity are the enums from section 3.2. */

/* Token representation */
typedef enum {
    TOK_KEYWORD,       /* int, char, long, sizeof, etc. */
    TOK_IDENTIFIER,    /* variable/function names */
    TOK_NUMBER,        /* numeric literals */
    TOK_STRING,        /* string literals */
    TOK_CHAR,          /* character literals */
    TOK_OPERATOR,      /* +, -, *, /, <<, >>, etc. */
    TOK_PUNCTUATION,   /* (, ), {, }, [, ], ;, etc. */
    TOK_PREPROCESSOR,  /* #include, #define, etc. */
    TOK_COMMENT,       /* line and block comments */
    TOK_ATTRIBUTE,     /* __attribute__ */
    TOK_EOF,
} TokenType;

typedef struct {
    TokenType type;
    char *text;
    size_t line;
    size_t column;
    char *file;
} Token;

/* Issue representation */
typedef struct {
    PortabilityCategory category;
    Severity severity;
    char *file;
    size_t line;
    size_t column;
    char *code_snippet;
    char *message;
    char *suggestion;
    char *platforms_affected[8];
    int num_platforms;
} PortabilityIssue;

/* Checker interface */
typedef struct {
    const char *name;
    PortabilityCategory category;
    void (*check)(Token *tokens, size_t count,
                  PortabilityIssue **issues, size_t *num_issues);
    bool enabled;
} Checker;

/* Configuration */
typedef struct {
    bool pedantic;
    Severity min_severity;
    char *target_platforms[8];
    int num_platforms;
    PortabilityCategory suppressed[32];
    int num_suppressed;
    char *format;  /* "text", "json", "csv" */
} Config;
4.4 Algorithm Overview: Pattern Matching
Rather than building a full C parser (complex!), we use pattern matching:
TYPE SIZE CHECK:
Pattern: sizeof ( <type> ) in arithmetic or comparison context
Example: 1024 / sizeof(int)
Action: Warn if type is not fixed-width
POINTER-INT CAST:
Pattern: ( int ) <pointer_expression> OR ( int* ) <integer>
Example: (int)ptr OR (int *)offset
Action: Warn about truncation
ENDIANNESS:
Pattern: * ( <integer_type> * ) <byte_buffer>
Example: *(uint32_t *)buffer
Action: Warn about byte-order assumption
ALIGNMENT:
Pattern: ( <type> * ) ( <expr> + <literal> )
Example: (int *)(buffer + 3)
Action: Warn if offset not aligned for target type
EXTENSION:
Pattern: typeof | __attribute__ | __builtin_ | etc.
Action: Warn about non-standard feature
SHIFT:
Pattern: <expr> << <literal> where literal >= type_width
Pattern: <expr> << <variable> (potential for overflow)
Action: Warn about undefined behavior
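As one concrete instance of the patterns listed above, the first SHIFT pattern can be matched directly on token text. This is a simplified sketch: it takes an array of token strings rather than the Token struct, assumes the left operand has `type_width` bits, and does not recognize suffixed literals such as 32u. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Scan token texts for `<< <literal>` where the literal is >= type_width.
 * Returns the index of the offending "<<" token, or -1 if none is found. */
long find_oversized_shift(const char *const *tokens, size_t count,
                          unsigned long type_width)
{
    for (size_t i = 0; i + 1 < count; i++) {
        if (strcmp(tokens[i], "<<") != 0) continue;

        char *end;
        unsigned long amount = strtoul(tokens[i + 1], &end, 0);
        /* A literal parses completely; identifiers leave `end` at the start. */
        int is_literal = (end != tokens[i + 1] && *end == '\0');

        if (is_literal && amount >= type_width) return (long)i;
    }
    return -1;
}
```

The second SHIFT pattern (a variable shift count) would need the declaration tracking discussed in the hints, since the token text alone says nothing about the variable's range.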
5. Implementation Guide
5.1 Development Environment Setup
Required:
# C compiler
gcc --version # or clang
# Build system
make --version
# Testing
diff --version # for output comparison tests
Recommended:
# For testing portability claims
# Cross-compilers or VMs for different platforms
arm-linux-gnueabi-gcc --version
x86_64-w64-mingw32-gcc --version
5.2 Project Structure
portable-checker/
+-- include/
| +-- portcheck.h # Public API
| +-- lexer.h # Tokenizer interface
| +-- checker.h # Checker interface
| +-- issue.h # Issue data structures
| +-- config.h # Configuration
+-- src/
| +-- main.c # Entry point
| +-- lexer.c # C tokenizer
| +-- parser.c # Simple pattern parser
| +-- checkers/
| | +-- type_size.c # Type size assumptions
| | +-- ptr_int.c # Pointer-integer conversions
| | +-- endian.c # Endianness issues
| | +-- align.c # Alignment issues
| | +-- bitfield.c # Bit-field issues
| | +-- extension.c # Compiler extensions
| | +-- shift.c # Shift operations
| | +-- overflow.c # Integer overflow
| | +-- printf.c # Printf format strings
| | +-- sign.c # Signed/unsigned issues
| +-- issue_db.c # Issue descriptions
| +-- output.c # Formatters
| +-- config.c # Configuration parsing
+-- tests/
| +-- test_files/ # C files with known issues
| | +-- type_size.c
| | +-- ptr_int.c
| | +-- endian.c
| | +-- ...
| +-- expected/ # Expected outputs
| +-- run_tests.sh
+-- Makefile
+-- README.md
5.3 The Core Question You’re Answering
“What assumptions in my C code will cause it to break on different platforms, and how can I automatically detect them?”
5.4 Concepts You Must Understand First
Before implementing, ensure you can answer:
- What are the sizes of int, long, size_t, and void* on ILP32, LP64, and LLP64?
  - Reference: C standard, platform documentation
- When is signed integer overflow undefined behavior?
  - Reference: C11 Standard 6.5/5
- What alignment requirements exist on different architectures?
  - Reference: Architecture ABI documentation
- How does bit-field layout differ between compilers?
  - Reference: Expert C Programming Ch. 8, compiler documentation
- Which GCC/Clang attributes have no standard equivalent?
  - Reference: GCC documentation
5.5 Questions to Guide Your Design
Lexer Design:
- How will you handle preprocessor directives? (Skip, track, expand?)
- How will you handle comments? (Preserve for context?)
- How will you track macro expansions?
Parser Design:
- Do you need a full AST or can you pattern-match on tokens?
- How will you handle nested expressions?
- How will you track type information without full semantic analysis?
Checker Design:
- Should checkers be independent or share state?
- How will you handle overlapping issues?
- How will you reduce false positives?
Output Design:
- How will you format multi-line code snippets?
- How will you handle suggestions that require context?
- How will you allow suppression?
5.6 Thinking Exercise
Before coding, manually analyze this file and list all portability issues:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char type;
    long value;
    unsigned int flags : 3;
    unsigned int status : 5;
} Record;

int hash_pointer(void *ptr) {
    return (int)ptr ^ ((int)ptr >> 16);
}

void read_header(char *buffer) {
    uint32_t *magic = (uint32_t *)buffer;
    if (*magic == 0x89504E47) { /* PNG magic */
        uint32_t *width = (uint32_t *)(buffer + 16);
        printf("Width: %d\n", *width);
    }
}

int safe_add(int a, int b) {
    if (a + b < a) return -1; /* Overflow check */
    return a + b;
}

void process_data(int n) {
    int arr[n]; /* VLA */
    typeof(arr) copy;
    for (int i = 0; i < n; i++) {
        arr[i] = i << 30; /* May overflow for i > 1 */
    }
}

int main() {
    char buffer[100];
    int *ip = (int *)(buffer + 3);
    *ip = 42;
    long size = sizeof(int) * 1000;
    printf("Size: %d\n", size);
    signed char c = 200;
    if (c > 100) {
        printf("Large\n");
    }
    return 0;
}
Expected issues to find:
- sizeof(long) varies (LP64 vs LLP64)
- Bit-field layout implementation-defined
- (int)ptr truncates on 64-bit
- Right-shift of int (may be signed)
- Endianness assumption in *magic
- Misaligned access (uint32_t *)(buffer + 16)
- Signed overflow in a + b < a check
- VLA optional in C11+
- typeof is extension
- Shift << 30 with loop variable may overflow
- Misaligned access (int *)(buffer + 3)
- printf("%d", size) wrong for long
- signed char c = 200 overflow/implementation-defined
5.7 Hints in Layers
Hint 1: Getting Started
Start with the lexer. You need reliable tokenization before you can match patterns. Focus on keywords, identifiers, operators, and literals.
Hint 2: First Checker
Implement the compiler extension checker first - it’s pattern-based on specific keywords (typeof, __attribute__, __builtin_*) and doesn’t require type analysis.
Hint 3: Type Tracking
For type-aware checkers (pointer-int casts), track recent declarations. When you see int *ptr, remember that ptr has pointer type. This enables detecting (int)ptr.
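The declaration tracking described in Hint 3 can start as a flat table of names. The sketch below is one possible shape, with hypothetical type and function names and fixed capacities chosen only for illustration; it ignores scopes except that later declarations shadow earlier ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_SYMBOLS = 256, MAX_NAME = 64 };

typedef struct {
    char name[MAX_NAME];
    bool is_pointer;
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    size_t count;
} SymbolTable;

/* Record a declaration such as `int *ptr`.
 * Returns false when the table is full or the name is too long. */
bool symtab_declare(SymbolTable *tab, const char *name, bool is_pointer)
{
    size_t len = strlen(name);
    if (tab->count >= MAX_SYMBOLS || len >= MAX_NAME) return false;
    memcpy(tab->entries[tab->count].name, name, len + 1);
    tab->entries[tab->count].is_pointer = is_pointer;
    tab->count++;
    return true;
}

/* Look up the most recent declaration of `name`, so that a later
 * redeclaration shadows an earlier one. Unknown names count as non-pointers. */
bool symtab_is_pointer(const SymbolTable *tab, const char *name)
{
    for (size_t i = tab->count; i > 0; i--) {
        if (strcmp(tab->entries[i - 1].name, name) == 0)
            return tab->entries[i - 1].is_pointer;
    }
    return false;
}
```

With this in place, a cast checker that sees `( int ) ptr` can ask `symtab_is_pointer(&tab, "ptr")` before warning.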
Hint 4: False Positive Reduction
Allow annotations in comments:
int addr = (int)ptr; /* PORTCHECK: suppress PTR-INT */
Parse comments and check for suppression markers.
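A suppression check along these lines could look like the sketch below. It assumes the marker text is exactly "PORTCHECK: suppress" and that several tags may be listed separated by commas or spaces; the comma-separated form and the function name are assumptions beyond the single-tag example above.

```c
#include <stdbool.h>
#include <string.h>

/* Return true when `comment` contains "PORTCHECK: suppress" followed by
 * a list that includes `tag` as a whole word. */
bool comment_suppresses(const char *comment, const char *tag)
{
    static const char marker[] = "PORTCHECK: suppress";
    size_t tag_len = strlen(tag);
    const char *p = strstr(comment, marker);

    if (p == NULL || tag_len == 0) return false;
    p += sizeof marker - 1;  /* continue after the marker text */

    while ((p = strstr(p, tag)) != NULL) {
        /* Whole-word match, so "ALIGN" does not match "ALIGNED". */
        char before = p[-1];
        char after = p[tag_len];
        bool left_ok = (before == ' ' || before == ',');
        bool right_ok = (after == ' ' || after == ',' ||
                         after == '*' || after == '\0');
        if (left_ok && right_ok) return true;
        p += tag_len;
    }
    return false;
}
```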
5.8 The Interview Questions They’ll Ask
- “What’s the difference between undefined behavior and implementation-defined behavior?”
  - UB: Standard imposes no requirements; compiler can do anything
  - Impl-defined: Compiler must document behavior, but it varies
  - Example: Signed overflow (UB) vs sizeof(long) (impl-defined)
- “Why is int x = -1; unsigned y = 1; if (x < y) false?”
  - Usual arithmetic conversions convert x to unsigned
  - -1 as unsigned is UINT_MAX
  - UINT_MAX > 1, so condition is false
- “How would you write truly portable serialization code?”
  - Don’t use struct overlays; serialize field by field
  - Use explicit byte-order functions (htonl, ntohl, or custom)
  - Use fixed-width types (uint32_t, not int)
  - Use memcpy for potentially unaligned access
- “What happens when you right-shift a negative number?”
  - Implementation-defined! Could be:
    - Arithmetic shift (sign extension) - most common
    - Logical shift (zero fill) - some platforms
- “Why might code work in debug but fail in release?”
  - Compiler optimizations exploit UB assumptions
  - Debug: UB may coincidentally work
  - Release: Optimizer removes “impossible” code paths
  - Example: Overflow check if (x + y < x) optimized away
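The serialization answer in the list above can be sketched with shifts and masks, which behave the same on every host byte order. The helper names put_u32_be and get_u32_be are hypothetical stand-ins for the "custom" byte-order functions the answer mentions:

```c
#include <stdint.h>

/* Write a 32-bit value in big-endian (network) order, one byte at a time.
 * No struct overlay and no pointer cast, so host endianness and alignment
 * do not matter. */
void put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Read the value back; each byte is widened before it is shifted. */
uint32_t get_u32_be(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Serializing a struct then means calling one such helper per field, in a documented order, rather than writing the struct's memory directly.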
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Portability fundamentals | Expert C Programming | Ch. 8: “Why Programmers Can’t Tell Halloween from Christmas Day” |
| Implementation-defined behavior | Effective C | Ch. 1: “Getting Started with C” |
| Data models | Write Great Code Vol 1 | Ch. 4: “Floating-Point Representation” (and Ch. 3 for integers) |
| Undefined behavior | C Traps and Pitfalls | Throughout |
| Standards compliance | ISO C Standard | Annex J (Portability issues) |
| Platform differences | CS:APP | Ch. 2: “Representing Information” |
5.10 Implementation Phases
Phase 1: Lexer and Infrastructure (Days 1-3)
Goals:
- Tokenize C source files
- Track file, line, column
- Handle basic preprocessor directives
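/* lexer.c skeleton */
#include <stdio.h>    /* FILE */
#include <stddef.h>   /* size_t */
typedef struct {
    FILE *file;
    char *filename;
    size_t line;
    size_t column;
    int current_char;   /* int rather than char so EOF is representable */
    int next_char;
} Lexer;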
Token *lexer_next_token(Lexer *lex);
void lexer_skip_whitespace(Lexer *lex);
void lexer_skip_comment(Lexer *lex);
Token *lexer_read_identifier(Lexer *lex);
Token *lexer_read_number(Lexer *lex);
Token *lexer_read_string(Lexer *lex);
Checkpoint: Can tokenize simple C files correctly.
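To make the line and column bookkeeping concrete, here is a minimal sketch that works on an in-memory string instead of the FILE-based Lexer above. The Cursor type and the cursor_* function names are assumptions for illustration. It only skips whitespace and comments and reads identifiers:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical in-memory cursor; the Lexer struct above reads from a FILE*. */
typedef struct {
    const char *src;   /* NUL-terminated source text */
    size_t pos;        /* index of the next unread character */
    size_t line;       /* 1-based line of src[pos] */
    size_t column;     /* 1-based column of src[pos] */
} Cursor;

void cursor_init(Cursor *c, const char *src)
{
    c->src = src;
    c->pos = 0;
    c->line = 1;
    c->column = 1;
}

/* Consume one character and keep line/column in sync; no-op at end of input. */
static void cursor_advance(Cursor *c)
{
    if (c->src[c->pos] == '\0')
        return;
    if (c->src[c->pos] == '\n') {
        c->line++;
        c->column = 1;
    } else {
        c->column++;
    }
    c->pos++;
}

/* Skip whitespace, block comments and line comments. */
void cursor_skip_trivia(Cursor *c)
{
    for (;;) {
        const char *p = c->src + c->pos;
        if (isspace((unsigned char)p[0])) {
            cursor_advance(c);
        } else if (p[0] == '/' && p[1] == '*') {
            cursor_advance(c);
            cursor_advance(c);
            while (c->src[c->pos] != '\0' &&
                   !(c->src[c->pos] == '*' && c->src[c->pos + 1] == '/'))
                cursor_advance(c);
            cursor_advance(c);   /* closing '*' */
            cursor_advance(c);   /* closing '/' */
        } else if (p[0] == '/' && p[1] == '/') {
            while (c->src[c->pos] != '\0' && c->src[c->pos] != '\n')
                cursor_advance(c);
        } else {
            return;
        }
    }
}

/* Copy the identifier at the cursor into out (truncating if needed) and
 * return the number of characters stored; 0 means no identifier here. */
size_t cursor_read_identifier(Cursor *c, char *out, size_t cap)
{
    size_t n = 0;
    unsigned char ch = (unsigned char)c->src[c->pos];
    if (!(isalpha(ch) || ch == '_'))
        return 0;
    while (isalnum(ch) || ch == '_') {
        if (n + 1 < cap)
            out[n++] = (char)ch;
        cursor_advance(c);
        ch = (unsigned char)c->src[c->pos];
    }
    if (cap > 0)
        out[n] = '\0';
    return n;
}
```

Routing every consumed character through one advance function is the design choice that matters here: positions stay correct even when comments span several lines.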
Phase 2: First Checkers (Days 4-6)
Goals:
- Implement extension checker (keyword matching)
- Implement pointer-int checker (cast pattern)
- Implement basic type size checker
/* checkers/extension.c */
#include <string.h>   /* strcmp */
static const char *const gcc_extensions[] = {
    "typeof", "__typeof__",
    "__attribute__",
    "__builtin_expect",
    "__builtin_return_address",
    "__extension__",
    "__asm__", "__volatile__",
    NULL
};
void check_extensions(Token *tokens, size_t count,
                      PortabilityIssue **issues, size_t *num_issues) {
    for (size_t i = 0; i < count; i++) {
        if (tokens[i].type == TOK_KEYWORD ||
            tokens[i].type == TOK_IDENTIFIER) {
            for (size_t j = 0; gcc_extensions[j] != NULL; j++) {
                if (strcmp(tokens[i].text, gcc_extensions[j]) == 0) {
                    add_issue(issues, num_issues,
                              PORT_EXTENSION, SEVERITY_WARNING,
                              tokens[i].file, tokens[i].line,
                              "Non-standard extension",
                              "Consider standard alternative or #ifdef");
                    break;   /* table entries are unique; stop at the first match */
                }
            }
        }
    }
}
Checkpoint: Detects obvious extensions and pointer casts.
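The table lists two builtins by name, but Hint 2 describes the whole __builtin_* family. One possible way to cover it is an exact-match list plus a prefix test. The function name is_extension_identifier and the decision to flag every __builtin_ identifier are assumptions of this sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `text` looks like a GNU-style extension keyword.
 * Flagging every identifier that starts with "__builtin_" is an assumption
 * of this sketch, not a rule taken from the checker above. */
bool is_extension_identifier(const char *text)
{
    static const char *const exact[] = {
        "typeof", "__typeof__", "__attribute__",
        "__extension__", "__asm__", "__volatile__", NULL
    };
    static const char builtin_prefix[] = "__builtin_";

    for (size_t i = 0; exact[i] != NULL; i++) {
        if (strcmp(text, exact[i]) == 0)
            return true;
    }
    /* Prefix match covers __builtin_expect, __builtin_popcount, and so on. */
    return strncmp(text, builtin_prefix, sizeof builtin_prefix - 1) == 0;
}
```

A prefix test keeps the table short, at the cost of also flagging builtins that a project may have wrapped behind a portable macro.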
Phase 3: Pattern Analysis (Days 7-9)
Goals:
- Track variable types through declarations
- Detect endianness issues (pointer type punning)
- Detect alignment issues
/* Simple type tracking */
#include <stdbool.h>   /* bool */
#include <stddef.h>    /* size_t */
typedef struct {
    char *name;
    bool is_pointer;
    bool is_signed;
    int pointer_depth;
    char *base_type;   /* "int", "char", "long", etc. */
} VariableType;
typedef struct {
    VariableType *vars;
    size_t count;
    size_t capacity;
} TypeEnvironment;
void track_declaration(TypeEnvironment *env, Token *tokens, size_t start);
VariableType *lookup_variable(TypeEnvironment *env, const char *name);
Checkpoint: Can identify variable types and detect more subtle issues.
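A shallow type environment can be as small as a growable array searched from the end. The sketch below uses simplified stand-ins (VarInfo, TypeEnv, env_add, env_lookup, is_ptr_to_narrow_int_cast) for the VariableType and TypeEnvironment declarations above. All of these names, and the short list of "narrow" cast targets, are assumptions:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for VariableType / TypeEnvironment. */
typedef struct {
    char name[32];
    char base_type[32];
    int pointer_depth;     /* 0 for non-pointers */
} VarInfo;

typedef struct {
    VarInfo *vars;
    size_t count;
    size_t capacity;
} TypeEnv;

/* Record a declaration; returns false if memory runs out. */
bool env_add(TypeEnv *env, const char *name, const char *base_type,
             int pointer_depth)
{
    if (env->count == env->capacity) {
        size_t new_cap = env->capacity ? env->capacity * 2 : 8;
        VarInfo *grown = realloc(env->vars, new_cap * sizeof *grown);
        if (grown == NULL)
            return false;
        env->vars = grown;
        env->capacity = new_cap;
    }
    VarInfo *v = &env->vars[env->count++];
    strncpy(v->name, name, sizeof v->name - 1);
    v->name[sizeof v->name - 1] = '\0';
    strncpy(v->base_type, base_type, sizeof v->base_type - 1);
    v->base_type[sizeof v->base_type - 1] = '\0';
    v->pointer_depth = pointer_depth;
    return true;
}

/* The latest declaration wins, which roughly approximates shadowing. */
const VarInfo *env_lookup(const TypeEnv *env, const char *name)
{
    for (size_t i = env->count; i > 0; i--) {
        if (strcmp(env->vars[i - 1].name, name) == 0)
            return &env->vars[i - 1];
    }
    return NULL;
}

/* True for a cast such as (int)ptr where ptr is a tracked pointer variable. */
bool is_ptr_to_narrow_int_cast(const TypeEnv *env, const char *cast_type,
                               const char *operand)
{
    const VarInfo *v = env_lookup(env, operand);
    if (v == NULL || v->pointer_depth == 0)
        return false;
    return strcmp(cast_type, "int") == 0 ||
           strcmp(cast_type, "unsigned") == 0 ||
           strcmp(cast_type, "long") == 0;
}

void env_free(TypeEnv *env)
{
    free(env->vars);
    env->vars = NULL;
    env->count = env->capacity = 0;
}
```

Unknown operands return false rather than a warning, which keeps false positives down while type tracking is still shallow.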
Phase 4: Advanced Checkers (Days 10-12)
Goals:
- Printf format checker
- Shift operation checker
- Signed/unsigned comparison checker
/* checkers/printf.c */
#include <inttypes.h>   /* PRId64 */
#include <stddef.h>     /* NULL */
typedef struct {
    const char *format;
    const char *correct_types[4];   /* Acceptable types, NULL-terminated */
} FormatSpec;
static FormatSpec specs[] = {
    { "%d",   {"int", "short", "char", NULL} },   /* short and char promote to int */
    { "%ld",  {"long", NULL} },
    { "%lld", {"long long", NULL} },   /* int64_t may be plain long (glibc on LP64); use PRId64 */
    { "%zu",  {"size_t", NULL} },
    { "%p",   {"void*", NULL} },
    { "%" PRId64, {"int64_t", NULL} },   /* expands to the host's spelling, e.g. "%ld" or "%lld" */
    { NULL, {NULL} }
};
void check_printf_format(Token *tokens, size_t count,
                         PortabilityIssue **issues, size_t *num_issues);
Checkpoint: Catches format string mismatches.
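Before a format can be compared with argument types, the conversion specifications have to be pulled out of the string literal. A minimal sketch of that step follows. The name extract_conversions and the normalized "%" + length modifier + conversion form are assumptions. A width or precision given as * consumes an extra int argument, which this sketch does not model:

```c
#include <stddef.h>
#include <string.h>

#define SPEC_MAX 8   /* '%' + up to 2 length-modifier chars + conversion + NUL */

/* Extract normalized conversion specs ("%d", "%ld", "%zu", ...) from a
 * printf format string. Flags, width and precision are skipped, and "%%"
 * is ignored because it consumes no argument.
 * Returns the number of specs stored in `specs`. */
size_t extract_conversions(const char *fmt, char specs[][SPEC_MAX],
                           size_t max_specs)
{
    size_t found = 0;
    const char *p = fmt;

    while ((p = strchr(p, '%')) != NULL) {
        p++;                                /* step past '%' */
        if (*p == '%') {                    /* literal percent sign */
            p++;
            continue;
        }
        p += strspn(p, "-+ #0");            /* flags */
        p += strspn(p, "0123456789*");      /* field width */
        if (*p == '.') {                    /* precision */
            p++;
            p += strspn(p, "0123456789*");
        }
        size_t mod_len = strspn(p, "hljztL");   /* length modifier */
        char conv = p[mod_len];
        if (conv == '\0')
            break;                          /* truncated spec at end of string */
        if (mod_len <= 2 && found < max_specs) {
            char *out = specs[found++];
            out[0] = '%';
            memcpy(out + 1, p, mod_len);
            out[1 + mod_len] = conv;
            out[2 + mod_len] = '\0';
        }
        p += mod_len + 1;
    }
    return found;
}
```

Each extracted spec can then be looked up in a table like specs[] and compared with the tracked type of the corresponding argument.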
Phase 5: Output and Polish (Days 13-14)
Goals:
- Multiple output formats (text, JSON, CSV)
- Configuration file support
- Suppression comments
- Testing and refinement
/* output.c */
void output_text(PortabilityIssue *issues, size_t count, FILE *out);
void output_json(PortabilityIssue *issues, size_t count, FILE *out);
void output_csv(PortabilityIssue *issues, size_t count, FILE *out);
/* config.c */
Config *config_from_args(int argc, char **argv);
Config *config_from_file(const char *filename);
void config_merge(Config *base, Config *override);
Checkpoint: Complete, usable tool.
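For the JSON format, the main trap is escaping: file names and messages can contain quotes, backslashes, or control characters. A minimal sketch, assuming hypothetical helpers json_escape and write_json_issue; the field names in the output are also assumptions, since the real PortabilityIssue layout may differ:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escape `src` for use inside a JSON string literal. Writes at most
 * cap - 1 characters plus a NUL into dst and returns the number written. */
size_t json_escape(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    for (; *src != '\0'; src++) {
        unsigned char ch = (unsigned char)*src;
        char buf[7];                    /* "\uXXXX" is 6 characters + NUL */
        int len;
        switch (ch) {
        case '"':  len = snprintf(buf, sizeof buf, "\\\""); break;
        case '\\': len = snprintf(buf, sizeof buf, "\\\\"); break;
        case '\n': len = snprintf(buf, sizeof buf, "\\n");  break;
        case '\r': len = snprintf(buf, sizeof buf, "\\r");  break;
        case '\t': len = snprintf(buf, sizeof buf, "\\t");  break;
        default:
            if (ch < 0x20)
                len = snprintf(buf, sizeof buf, "\\u%04x", (unsigned)ch);
            else
                len = snprintf(buf, sizeof buf, "%c", ch);
            break;
        }
        if (n + (size_t)len >= cap)
            break;                      /* stop before overflowing dst */
        memcpy(dst + n, buf, (size_t)len);
        n += (size_t)len;
    }
    dst[n] = '\0';
    return n;
}

/* Hypothetical one-issue writer; emits one JSON object per line. */
void write_json_issue(FILE *out, const char *file, unsigned long line,
                      const char *message)
{
    char f[256], m[512];
    json_escape(file, f, sizeof f);
    json_escape(message, m, sizeof m);
    fprintf(out, "{\"file\":\"%s\",\"line\":%lu,\"message\":\"%s\"}\n",
            f, line, m);
}
```

One object per line keeps the output easy to stream and to diff in the test suite.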
5.11 Key Implementation Decisions
Decision 1: Full parser vs pattern matching?
- Pattern matching is simpler and sufficient for most issues
- Trade-off: May have false positives/negatives vs full AST
- Recommendation: Start with patterns, add AST later if needed
Decision 2: How to handle macros?
- Option A: Expand macros (requires preprocessor integration)
- Option B: Analyze macro definitions separately
- Option C: Warn about macros that may hide issues
- Recommendation: Start with Option C, add expansion later
Decision 3: Type inference depth?
- Shallow: Only track explicit declarations in current function
- Deep: Track across function calls, includes, etc.
- Recommendation: Shallow for v1, document limitations
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| True Positives | Verify detection | Known non-portable code |
| True Negatives | Verify no false positives | Portable code patterns |
| Edge Cases | Boundary conditions | Empty files, huge files |
| Regression | Prevent reintroduction | Fixed bugs stay fixed |
6.2 Test Files
Create test files with known issues:
/* test_files/type_size.c */
// EXPECT: TYPE-SIZE at line 4
// EXPECT: TYPE-SIZE at line 7
void test_type_size() {
    int arr[1024 / sizeof(int)]; // Issue
    // This is fine:
    int32_t arr2[1024 / sizeof(int32_t)];
    long x = 0x123456789ABCDEF; // Issue: may overflow
}
/* test_files/ptr_int.c */
// EXPECT: PTR-INT at line 5
// EXPECT: PTR-INT at line 8
void test_ptr_int() {
    void *ptr = malloc(100);
    int addr = (int)ptr; // Issue
    // This is fine:
    uintptr_t addr2 = (uintptr_t)ptr;
    int *ip = (int *)0x1000; // Issue (usually)
}
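The EXPECT comments in these files can also be read by a small test harness, so expectations live next to the code they describe. A minimal sketch, assuming the "// EXPECT: <RULE> at line <N>" format shown above; the name parse_expect and the 31-character limit on rule IDs are assumptions:

```c
#include <stdbool.h>
#include <stdio.h>

#define RULE_MAX 32

/* Parse a line such as "// EXPECT: TYPE-SIZE at line 4".
 * On success stores the rule ID and line number and returns true. */
bool parse_expect(const char *text, char rule[RULE_MAX], int *line)
{
    /* %31s stops at whitespace, so it captures only the rule ID;
     * both fields must convert for the line to count as a match. */
    return sscanf(text, " // EXPECT: %31s at line %d", rule, line) == 2;
}
```

A harness built on this would collect the (rule, line) pairs from each test file and compare them with the issues the checker reports for that file.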
6.3 Running Tests
#!/bin/bash
# run_tests.sh
PORTCHECK=./portcheck
TESTDIR=tests/test_files
EXPECTED=tests/expected
PASS=0
FAIL=0
# Use a private temp file instead of a fixed /tmp path
actual=$(mktemp) || exit 1
trap 'rm -f "$actual"' EXIT
for test_file in "$TESTDIR"/*.c; do
    name=$(basename "$test_file" .c)
    expected="$EXPECTED/$name.txt"
    "$PORTCHECK" "$test_file" > "$actual" 2>&1
    if diff -q "$expected" "$actual" > /dev/null; then
        echo "PASS: $name"
        PASS=$((PASS + 1))
    else
        echo "FAIL: $name"
        echo "Expected:"
        cat "$expected"
        echo "Actual:"
        cat "$actual"
        FAIL=$((FAIL + 1))
    fi
done
echo "Results: $PASS passed, $FAIL failed"
# Exit statuses wrap at 256, so report any failure as 1
exit $((FAIL > 0))
6.4 Cross-Platform Verification
Verify that detected issues actually cause problems:
# Compile test programs on different platforms
# and verify they behave differently
# 64-bit Linux (LP64)
gcc -m64 test_ptr_int.c -o test_lp64
./test_lp64
# 32-bit (ILP32)
gcc -m32 test_ptr_int.c -o test_ilp32
./test_ilp32
# Compare outputs
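When comparing builds, it helps if each test binary can state which data model it was compiled for. A minimal sketch; the function names data_model_name and host_data_model are hypothetical, and the size checks assume 8-bit bytes:

```c
#include <stddef.h>

/* Name the data model implied by the sizes (in bytes) of int, long and
 * pointers. Anything outside the three common models is "other". */
const char *data_model_name(size_t int_size, size_t long_size, size_t ptr_size)
{
    if (int_size == 4 && long_size == 4 && ptr_size == 4)
        return "ILP32";
    if (int_size == 4 && long_size == 8 && ptr_size == 8)
        return "LP64";
    if (int_size == 4 && long_size == 4 && ptr_size == 8)
        return "LLP64";
    return "other";
}

/* The model of the platform this translation unit is compiled for. */
const char *host_data_model(void)
{
    return data_model_name(sizeof(int), sizeof(long), sizeof(void *));
}
```

Printing host_data_model() at the top of each test program's output makes it clear which build produced which result.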
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Over-reporting | Too many false positives | Add context analysis |
| Under-reporting | Missing obvious issues | Expand pattern set |
| Macro blindness | Issues hidden in macros | Warn about risky macros |
| Type confusion | Wrong type inference | Track declarations carefully |
| Comment parsing | Fails on weird comments | Handle edge cases |
7.2 Debugging the Checker
/* Add verbose mode for debugging */
#ifdef DEBUG
/* Two fprintf calls avoid ##__VA_ARGS__, which is a GNU extension */
#define DEBUG_LOG(...) \
    do { \
        fprintf(stderr, "[DEBUG] %s:%d: ", __FILE__, __LINE__); \
        fprintf(stderr, __VA_ARGS__); \
        fputc('\n', stderr); \
    } while (0)
#else
#define DEBUG_LOG(...) ((void)0)
#endif
void check_ptr_int(Token *tokens, size_t count, ...) {
    for (size_t i = 0; i < count; i++) {
        DEBUG_LOG("Token %zu: type=%d text='%s'",
                  i, (int)tokens[i].type, tokens[i].text);
        // ...
    }
}
7.3 Handling Edge Cases
/* Edge case: sizeof in macro */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))
/* Edge case: Cast in macro */
#define PTR_TO_INT(p) ((int)(uintptr_t)(p)) // Explicit, but still drops high bits where pointers are wider than int
/* Edge case: Conditional compilation */
#ifdef __LP64__
typedef long intptr;
#else
typedef int intptr;
#endif
/* Edge case: Attribute with fallback */
#ifdef __GNUC__
#define PACKED __attribute__((packed))
#else
#define PACKED
#endif
8. Extensions & Challenges
8.1 Beginner Extensions
- Add --help with detailed usage examples
- Add --version command
- Support reading from stdin (portcheck -)
- Add color output for terminals
8.2 Intermediate Extensions
- Add POSIX compliance checking (POSIX vs C standard)
- Add C11/C17/C23 feature detection (flag non-C89 features)
- Add severity levels configurable per-category
- Add fix suggestions that can be applied automatically
- Support checking header files specially
8.3 Advanced Extensions
- Add full preprocessor integration (expand macros)
- Add interprocedural analysis (track types across functions)
- Add configuration profiles for specific platforms
- Add integration with build systems (CMake, Meson)
- Add IDE integration (LSP server for real-time checking)
- Add machine learning for false positive reduction
8.4 Research Extensions
- Analyze real-world portability bugs from CVE database
- Build corpus of portable vs non-portable code patterns
- Compare with clang-tidy and other static analyzers
- Study compiler warning evolution across versions
9. Real-World Connections
9.1 Famous Portability Disasters
1. The 64-bit Windows Porting Crisis (2005-2010)
- Thousands of Unix programs assumed sizeof(long) == sizeof(void*)
- Windows chose LLP64 (long stays 32-bit), breaking this assumption
- Cost: Millions of developer-hours fixing long-to-pointer casts
2. The Y2K38 Problem (Ongoing)
- 32-bit time_t overflows on January 19, 2038
- Systems assuming sizeof(time_t) == 4 will break
- Many embedded systems still at risk
3. ARM Alignment Crashes (2010s)
- x86-developed code assumed unaligned access is fine
- Early ARM mobile devices crashed on unaligned access
- Android NDK code required extensive fixes
4. Heartbleed (2014)
- Not directly a portability bug, but shows importance of careful C
- Buffer over-read in OpenSSL
- Tools like portcheck catch related issues (unsafe casts, size assumptions)
9.2 Professional Tools Comparison
| Tool | Focus | Pros | Cons |
|---|---|---|---|
| Your tool | Portability | Focused, fast, educational | Limited scope |
| clang-tidy | General quality | Comprehensive, maintained | Complex, many rules |
| cppcheck | Bugs, style | Good portability checks | C++ focused |
| PVS-Studio | Deep analysis | Very thorough | Commercial, slow |
| Coverity | Enterprise | Industry standard | Very expensive |
9.3 How Companies Handle Portability
Linux Kernel:
- Extensive use of #ifdef for platform differences
- Custom types (u32, u64) for known sizes
- Strict coding style enforces portable patterns
SQLite:
- Single-file amalgamation reduces platform issues
- Extensive testing on all platforms
- Avoids GCC-specific features
curl:
- Builds on 90+ operating systems
- Extensive autoconf checks for platform features
- Conservative C (mostly C89 compatible)
10. Resources
10.1 C Standards
- C11 Standard (N1570 Draft) - Free draft
- C17 Standard - Official (paid)
- C23 Draft - Latest draft
10.2 Platform Documentation
- System V AMD64 ABI - Linux/Unix 64-bit
- ARM ABI - ARM architecture
- Microsoft x64 ABI - Windows 64-bit
10.3 Related Projects
- Include What You Use - Header analysis
- clang-tidy - General linter
- sparse - Linux kernel static analyzer
10.4 Online Tools
- Compiler Explorer - See how code compiles on different platforms
- cdecl.org - Decode C declarations
- cppreference - C/C++ reference
11. Self-Assessment Checklist
11.1 Understanding Verification
- I can explain the difference between undefined and implementation-defined behavior
- I know the sizes of basic types on ILP32, LP64, and LLP64
- I understand why endianness matters for file I/O and networking
- I can identify alignment requirements for different types
- I know which GCC attributes have no standard equivalent
11.2 Implementation Verification
- My lexer correctly tokenizes C source files
- My tool detects pointer-to-int casts that may truncate
- My tool detects sizeof assumptions on non-fixed types
- My tool identifies common compiler extensions
- My tool provides actionable suggestions for each issue
11.3 Quality Verification
- False positive rate is acceptable (< 10% on real code)
- Tool runs fast enough for large codebases (< 1 sec per 1000 lines)
- Output is clear and actionable
- Tool handles edge cases gracefully (empty files, binary files)
12. Submission / Completion Criteria
12.1 Minimum Viable Completion
- Lexer tokenizes C files correctly
- Detects at least 5 categories of portability issues
- Provides clear error messages with line numbers
- Handles command-line arguments (file input)
- Has at least 10 test cases
12.2 Full Completion
- All 10+ issue categories implemented
- Multiple output formats (text, JSON)
- Configuration file support
- Suppression comments supported
- Comprehensive test suite (50+ test cases)
- Documentation (README, man page)
12.3 Excellence
- False positive rate < 5% on real-world code
- Successfully finds real bugs in open-source projects
- Performance: > 10,000 lines/second
- IDE integration (basic LSP or editor plugin)
- Cross-platform testing verification
The Core Question You’re Answering
“What assumptions in C code cause it to work on one platform but fail on another, and how can we automatically detect these assumptions before they cause production failures?”
This project teaches you that portability isn’t about following rules blindly - it’s about understanding why the rules exist. When you know that sizeof(long) varies because of deliberate ABI choices made decades ago, and you understand the trade-offs those choices represent, you can write code that respects those differences while still being efficient and readable.
The checker you build is both a practical tool and a crystallization of platform-specific knowledge that would otherwise take years to accumulate through painful production bugs.
Navigation
Previous: P15: Struct Packing Analyzer - Understanding memory layout and alignment
Next: P17: Calling Convention Visualizer - How functions pass arguments
Related:
- P05: Type Promotion Tester - Related signed/unsigned issues
- P08: Multi-dimensional Array Navigator - Memory layout concepts
- P12: Bug Catalog - Language quirks and pitfalls
This guide was expanded from EXPERT_C_PROGRAMMING_DEEP_DIVE.md. For the complete learning path, see the project index.