Project 5: Cross-Platform Shared Library with C API

Build a shared library with a stable C ABI that compiles on Linux, macOS, and Windows, and can be called safely from Python via ctypes.

Quick Reference

Attribute | Value
Difficulty | Level 3: Advanced
Time Estimate | 2-3 weeks
Main Programming Language | C
Alternative Programming Languages | C++, Rust, Zig
Coolness Level | Level 3: Genuinely Clever
Business Potential | Level 3: Service & Support Model
Prerequisites | C ABI basics, build tools, basic FFI
Key Topics | C ABI, calling conventions, symbol export, versioning

1. Learning Objectives

By completing this project, you will:

  1. Design a stable C API that is safe for FFI.
  2. Export symbols correctly on Linux/macOS/Windows.
  3. Implement versioning and compatibility checks.
  4. Manage memory ownership across language boundaries.
  5. Build and test across platforms with deterministic examples.

2. All Theory Needed (Per-Concept Breakdown)

2.1 C ABI and Calling Conventions

Fundamentals The C ABI defines how functions are called at the binary level: which registers carry arguments, how the stack is used, and how return values are passed. It also defines data layout (struct padding, alignment) and name mangling rules (or the lack thereof). The reason C is the lingua franca of binary interfaces is that its ABI is widely standardized and stable across compilers on a given platform. A shared library that exposes a C ABI can be consumed by many languages via FFI.

Deep Dive into the concept Calling conventions vary by platform. On x86-64 Linux and macOS, the System V ABI passes the first six integer/pointer arguments in registers (RDI, RSI, RDX, RCX, R8, R9) and the first eight floating-point arguments in XMM0-XMM7. On Windows x64, the first four arguments go in RCX, RDX, R8, R9 (or XMM0-XMM3 for floating-point values), and the caller reserves 32 bytes of “shadow space” on the stack. If caller and callee disagree on the convention, the callee reads arguments from the wrong locations, which typically produces garbage results or a crash. A C compiler applies the correct convention automatically because it sees the prototype; an FFI layer has no prototype, so the binding declarations must match both the platform’s convention and the function’s real signature.

Data layout is equally important. Struct fields are padded so that each field is aligned to its required boundary. As a result, inserting, removing, or reordering a field can change both the struct’s size and the offsets of the fields after it. If your API exposes structs directly, you must treat their layout as part of the ABI. A safer approach is to use opaque pointers and accessor functions, keeping structs private. This is why many C APIs use handles like foo_t* and provide functions to create, query, and destroy them.
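A minimal sketch of the opaque-handle pattern in C, assuming a simple context that accumulates values for a running mean. The names stats_create, stats_push, and stats_mean_of are hypothetical; stats_ctx_t and stats_destroy follow the naming used in Section 2.4. The "header" and "implementation" halves are shown in one file only so the example compiles on its own.

```c
#include <math.h>    /* NAN */
#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* calloc, free */

/* ---- Public part (would go in the installed header) ----------------------
 * Callers only see an incomplete type: they can hold and pass the pointer,
 * but can never depend on the struct's size or field offsets. */
typedef struct stats_ctx stats_ctx_t;

stats_ctx_t* stats_create(void);                      /* NULL on allocation failure */
int          stats_push(stats_ctx_t* ctx, double x);  /* 0 on success, -1 on error */
double       stats_mean_of(const stats_ctx_t* ctx);   /* NaN if nothing was pushed */
void         stats_destroy(stats_ctx_t* ctx);         /* accepts NULL */

/* ---- Private part (would live only in the .c file) -----------------------
 * Fields can be added or reordered in a later release without breaking
 * callers, because no caller ever compiled against this layout. */
struct stats_ctx {
    double sum;
    size_t count;
};

stats_ctx_t* stats_create(void) {
    return calloc(1, sizeof(stats_ctx_t));  /* sum and count start at zero */
}

int stats_push(stats_ctx_t* ctx, double x) {
    if (ctx == NULL) return -1;
    ctx->sum += x;
    ctx->count += 1;
    return 0;
}

double stats_mean_of(const stats_ctx_t* ctx) {
    if (ctx == NULL || ctx->count == 0) return NAN;
    return ctx->sum / (double)ctx->count;
}

void stats_destroy(stats_ctx_t* ctx) {
    free(ctx);  /* free(NULL) is a no-op */
}
```

Because callers only ever hold a stats_ctx_t*, adding a field later (for example, a sum of squares for variance) changes nothing they compiled against, and the handle is pointer-sized on every platform.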

The ABI also defines type sizes. For example, long is 8 bytes on Linux x86-64 but 4 bytes on 64-bit Windows. If you expose long in your API, a binding that hard-codes a 64-bit integer is wrong on Windows, so you may break cross-platform compatibility. Prefer fixed-width types like int32_t and uint64_t. For FFI, use simple types: integers, floats, pointers, and function pointers. Avoid passing structs by value unless you fully control the bindings, because the rules for passing them in registers or on the stack differ between ABIs.
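A short sketch of how a library might pin down the types it exposes. The C11 _Static_assert checks encode assumptions about the platforms this hypothetical library targets, and mystats_count_above is an illustrative function, not part of the specified API.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t, NULL */
#include <stdint.h>  /* int64_t */

/* Build-time guards: if a target platform breaks an assumption the bindings
 * rely on, compilation fails instead of shipping a silently different ABI. */
_Static_assert(CHAR_BIT == 8, "API assumes 8-bit bytes");
_Static_assert(sizeof(double) == 8, "API assumes 64-bit double (ctypes.c_double)");
_Static_assert(sizeof(size_t) == sizeof(void*), "API assumes pointer-sized size_t");

/* Hypothetical API function. The return type is int64_t rather than long:
 * long is 64-bit on Linux x86-64 but 32-bit on 64-bit Windows, whereas
 * int64_t maps to ctypes.c_int64 everywhere. Returns -1 for a NULL array. */
int64_t mystats_count_above(const double* v, size_t n, double threshold) {
    if (v == NULL) return -1;
    int64_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold) count++;
    }
    return count;
}
```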

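Finally, name mangling matters. C++ compilers mangle names to encode overloads and namespaces, and the mangling scheme differs between compilers (GCC/Clang versus MSVC, for example), so a C-level FFI cannot reliably look those symbols up. Declaring a function extern "C" gives it C linkage, which disables mangling. If your library is implemented in C++, you must wrap the public API with extern "C" to keep the ABI stable.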

How this fits in this project Your library will expose a C API intended for Python ctypes. That means all exported functions must use C linkage, stable types, and a predictable calling convention.

Definitions & key terms

  • ABI -> Binary contract defining calling conventions and data layout.
  • Calling convention -> Rules for passing arguments and returns.
  • Opaque handle -> Pointer to an internal struct not exposed publicly.
  • Name mangling -> Compiler-generated symbol names for overloads.

Mental model diagram (ASCII)

Python ctypes -> C ABI -> shared library
   (ffi)        (stable)   (exported symbols)

How it works (step-by-step, with invariants and failure modes)

  1. API functions are declared with extern "C" (if C++).
  2. Functions use fixed-width types and pointers.
  3. Caller passes args using platform calling convention.
  4. Callee reads args and returns value safely.

Invariants: same calling convention on both sides; stable type sizes. Failure modes: wrong type sizes, mismatched calling convention.

Minimal concrete example

#include <stddef.h>  /* size_t */

#ifdef __cplusplus
extern "C" {
#endif

double stats_mean(const double* values, size_t n);

#ifdef __cplusplus
}
#endif

Common misconceptions

  • “long is portable.” -> It is not: it is 8 bytes on Linux x86-64 but 4 bytes on 64-bit Windows.
  • “FFI can call any C++ function.” -> C-level FFIs such as ctypes can only call extern "C" functions with simple types.

Check-your-understanding questions

  1. Why is C ABI more portable than C++ ABI?
  2. Why should you avoid passing structs by value across FFI?
  3. Why are fixed-width types preferred?

Check-your-understanding answers

  1. C ABI is standardized and has no name mangling.
  2. Struct layout can vary by compiler and platform.
  3. They have the same size on every platform that provides them, unlike int and long.

Real-world applications

  • Language bindings for Python, Rust, and Java.
  • System libraries like libsqlite3.

Where you’ll apply it

References

  • System V ABI documentation.
  • Microsoft x64 calling convention docs.

Key insights A portable library starts with a portable ABI: fixed types, stable calling conventions, and C linkage.

Summary The C ABI is the contract between your library and every foreign caller. If you keep it simple and explicit, your library becomes portable and reliable.

Homework/Exercises to practice the concept

  1. Write a small C library and call it from Python with ctypes.
  2. Replace long with int64_t and observe how bindings change.
  3. Write an extern "C" wrapper around a C++ function.

Solutions to the homework/exercises

  1. Export add(int, int) and call it with ctypes.CDLL.
  2. Update your ctypes bindings and rerun tests.
  3. Use extern "C" and verify symbol name with nm.

2.2 Symbol Export and Visibility Across Platforms

Fundamentals A shared library only exposes symbols that the loader can see. On Linux and macOS, you can control visibility with compiler attributes and link flags. On Windows, you must explicitly export symbols using __declspec(dllexport) or a .def file. If a symbol is not exported, consumers cannot call it, even if it exists in the binary.

Deep Dive into the concept On ELF platforms (Linux), every global symbol is exported by default unless you compile with -fvisibility=hidden. Best practice is to hide all symbols and explicitly mark the public API with __attribute__((visibility("default"))). This reduces symbol collisions and keeps internal functions from becoming part of your ABI. macOS (Mach-O) behaves the same way and honors the same flag and attribute, but the inspection tools differ: nm -gU lists exported symbols, otool -L lists dependent libraries, and install_name_tool rewrites install names. On Windows, export rules are strict: nothing is exported unless you mark it with __declspec(dllexport) (or list it in a .def file) when building the DLL. Consumers should declare the same functions with __declspec(dllimport); this is required for exported data and an optimization for functions. If you forget the export, the symbol will not appear in the DLL’s export table and GetProcAddress will fail.

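A cross-platform library often uses an API macro that expands to the appropriate attribute and is included in every public function declaration. On Windows the macro must switch between dllexport (while building the library) and dllimport (for consumers); here the build is assumed to define MYSTATS_BUILD, a name chosen for this example. For example:

#if defined(_WIN32)
  #if defined(MYSTATS_BUILD)
    #define API __declspec(dllexport)
  #else
    #define API __declspec(dllimport)
  #endif
#else
  #define API __attribute__((visibility("default")))
#endif

This lets the same header serve both the library build and its consumers on every platform.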

Visibility also affects performance. Fewer exported symbols mean smaller dynamic symbol tables and faster symbol resolution at load time. Hiding symbols also reduces accidental ABI coupling. For this project, exporting only a minimal API surface is essential to keep ABI stability manageable.

How this fits in this project You will create a cross-platform API macro and ensure that only the intended functions are exported. The Python bindings will depend on these exports.

Definitions & key terms

  • Symbol export -> Making a symbol visible to the dynamic loader.
  • Visibility -> ELF and Mach-O attribute controlling which symbols are exported.
  • Export table -> Windows DLL table of exported symbols.

Mental model diagram (ASCII)

[libmystats.so]
  - internal_helper (hidden)
  - API stats_mean (exported)

How it works (step-by-step, with invariants and failure modes)

  1. Define API macro based on platform.
  2. Annotate all public functions with API.
  3. Build library with -fvisibility=hidden on ELF.
  4. Verify exports with nm -D (Linux), nm -gU (macOS), or dumpbin /exports (Windows).

Invariants: public symbols must be exported; internal symbols hidden. Failure modes: missing exports, accidental ABI exposure.
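A sketch of a single library source file that combines the export macro with a hidden internal helper. MYSTATS_BUILD and mystats_internal_sum are names assumed for this example; stats_mean matches the declaration used in this section.

```c
#include <math.h>    /* NAN */
#include <stddef.h>  /* size_t */

/* Normally passed by the build system (-DMYSTATS_BUILD) only when compiling
 * the library; defined here so this single file plays the library's role. */
#define MYSTATS_BUILD 1

#if defined(_WIN32)
  #if defined(MYSTATS_BUILD)
    #define API __declspec(dllexport)   /* building the DLL */
  #else
    #define API __declspec(dllimport)   /* consuming the DLL */
  #endif
#else
  #define API __attribute__((visibility("default")))
#endif

/* Internal helper: external linkage (other .c files in the library may call
 * it) but no API annotation, so -fvisibility=hidden keeps it out of the
 * dynamic symbol table, and it never enters a DLL export table. */
double mystats_internal_sum(const double* v, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Public entry point: the only symbol this file deliberately exports. */
API double stats_mean(const double* v, size_t n) {
    if (v == NULL || n == 0) return NAN;
    return mystats_internal_sum(v, n) / (double)n;
}
```

Compiled on Linux with cc -shared -fPIC -fvisibility=hidden, running nm -D --defined-only on the result should list stats_mean but not mystats_internal_sum; without -fvisibility=hidden both would appear.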

Minimal concrete example

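#include <stddef.h>  /* size_t */

#if defined(_WIN32)
  #if defined(MYSTATS_BUILD)  /* assumed build flag, set only when building the DLL */
    #define API __declspec(dllexport)
  #else
    #define API __declspec(dllimport)
  #endif
#else
  #define API __attribute__((visibility("default")))
#endif

API double stats_mean(const double* v, size_t n);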

Common misconceptions

  • “Global symbols are always exported.” -> Not on Windows, and not if visibility is hidden.
  • “Exporting everything is harmless.” -> It bloats the ABI surface.

Check-your-understanding questions

  1. Why hide symbols by default on ELF?
  2. How do you verify exported symbols on Windows?
  3. What happens if you forget __declspec(dllexport)?

Check-your-understanding answers

  1. To avoid accidental ABI exposure and reduce symbol collisions.
  2. Use dumpbin /exports, or llvm-readobj --coff-exports with the LLVM tools.
  3. The symbol is missing from the export table, so consumers can neither link against it through the import library nor find it at runtime with GetProcAddress.

Real-world applications

  • Cross-platform SDKs and plugin APIs.

Where you’ll apply it

  • In this project: see Section 3.2 Functional Requirements and Section 5.2 Project Structure.
  • Also used in: P03-ld-preload-interceptor for symbol visibility awareness.

References

  • GCC visibility documentation.
  • Microsoft DLL export docs.

Key insights Exporting only what you intend is a security and stability feature, not just a build detail.

Summary Symbol export rules differ across platforms. A clean API macro and visibility discipline make your library portable and stable.

Homework/Exercises to practice the concept

  1. Build a library with -fvisibility=hidden and verify exports.
  2. Create a DLL with a .def file and compare exports.
  3. Test missing export behavior by calling an unexported function from Python.

Solutions to the homework/exercises

  1. Use nm -D libmystats.so.
  2. Use dumpbin /exports mystats.dll.
  3. ctypes will raise AttributeError for missing symbols.

2.3 Versioning, SONAME, and Compatibility Guarantees

Fundamentals Shared libraries evolve. Versioning is how you communicate compatibility. On Linux, SONAME is the ABI identity, and loaders use it to ensure the correct version is loaded. On macOS, install names and @rpath serve similar roles. On Windows, DLL versioning is less formal but still important. A stable library requires a versioning strategy that distinguishes compatible changes from breaking ones.

Deep Dive into the concept On Linux, a shared library file might be libmystats.so.1.2.3. Its SONAME is typically libmystats.so.1 and is embedded in the library. When a program links against the library, it records the SONAME in its DT_NEEDED list, and at runtime the loader searches for a file with exactly that name (normally a symlink to the real file). If you later ship a release whose SONAME is libmystats.so.2, existing programs still ask for libmystats.so.1, so they keep loading the old library (or fail to load if it is missing) instead of silently picking up an incompatible one. This is how the SONAME enforces ABI compatibility at runtime. Therefore, if you make a breaking ABI change, you must bump the SONAME.

On macOS, dynamic libraries have install names like @rpath/libmystats.dylib, set with the -install_name linker flag. The loader (dyld) resolves @rpath using the LC_RPATH entries recorded in the loading binary (added with -rpath). Versioning is managed via the compatibility_version and current_version fields, set with the -compatibility_version and -current_version linker flags; dyld refuses to load a library whose compatibility_version is lower than the one recorded when the client was linked. On Windows, versioning is usually handled through filename changes or embedded version resources, but the loader does not enforce compatibility automatically.

For a cross-platform library, you should adopt a semantic versioning policy at the API level and map breaking changes to SONAME or equivalent changes on each platform. That means: change SONAME when the ABI breaks; keep it the same for backward-compatible changes. Provide clear documentation of your version policy and include runtime checks (like mystats_version() returning a string or integer) so callers can verify compatibility.

How this fits in this project You will implement a versioning scheme (major/minor/patch), embed it in the build, and expose it through the API. On Linux, you will set the SONAME appropriately.

Definitions & key terms

  • SONAME -> Shared object name that defines ABI identity.
  • Compatibility version -> macOS notion of backward compatibility.
  • Semantic versioning -> Major/minor/patch version policy.

Mental model diagram (ASCII)

libmystats.so.1.2.3
  SONAME = libmystats.so.1
  ABI compatible with 1.x

How it works (step-by-step, with invariants and failure modes)

  1. Define API version constants in headers.
  2. Build library with SONAME set to major version.
  3. Expose runtime version query function.
  4. Increment SONAME on breaking ABI changes.

Invariants: major version change implies ABI break. Failure modes: mismatched SONAME causing runtime load errors.
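One possible way to implement the version steps above, as a sketch: the version numbers are placeholders, and mystats_version_number and mystats_is_compatible are hypothetical helpers alongside the mystats_version() function the specification requires. The compatibility rule mirrors the SONAME policy: same major version, and at least the minor version the caller expects.

```c
/* Version constants that would live in mystats.h; the values are placeholders. */
#define MYSTATS_VERSION_MAJOR 1
#define MYSTATS_VERSION_MINOR 2
#define MYSTATS_VERSION_PATCH 3

/* Two-level macro so the numbers are expanded before being stringified. */
#define MYSTATS_STR_(x) #x
#define MYSTATS_STR(x)  MYSTATS_STR_(x)
#define MYSTATS_VERSION_STRING             \
    MYSTATS_STR(MYSTATS_VERSION_MAJOR) "." \
    MYSTATS_STR(MYSTATS_VERSION_MINOR) "." \
    MYSTATS_STR(MYSTATS_VERSION_PATCH)

/* Runtime query: reports the version of the library actually loaded, which
 * may differ from the header the caller was compiled against. */
const char* mystats_version(void) {
    return MYSTATS_VERSION_STRING;
}

/* Hypothetical integer form (0xMMmmpp), easier to compare from bindings. */
unsigned mystats_version_number(void) {
    return (MYSTATS_VERSION_MAJOR << 16) | (MYSTATS_VERSION_MINOR << 8) | MYSTATS_VERSION_PATCH;
}

/* Hypothetical check: same major version, and the loaded library's minor
 * version is at least the one the caller needs. Returns 1 if compatible. */
int mystats_is_compatible(unsigned want_major, unsigned want_minor) {
    unsigned v = mystats_version_number();
    unsigned major = v >> 16;
    unsigned minor = (v >> 8) & 0xFFu;
    return major == want_major && minor >= want_minor;
}
```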

Minimal concrete example

cc -c -fPIC -o mystats.o mystats.c
cc -shared -Wl,-soname,libmystats.so.1 -o libmystats.so.1.0.0 mystats.o

Common misconceptions

  • “Minor changes never break ABI.” -> They can if you change struct layout.
  • “SONAME is optional.” -> It defines runtime compatibility.

Check-your-understanding questions

  1. Why is SONAME important for runtime compatibility?
  2. How does macOS express compatibility version?
  3. What should happen when you make an ABI-breaking change?

Check-your-understanding answers

  1. The loader uses it to decide which library satisfies DT_NEEDED.
  2. Via the compatibility_version field, set at link time with -compatibility_version and checked by dyld when the library is loaded.
  3. Bump the major version and SONAME.

Real-world applications

  • System libraries like libssl that evolve across major versions.

Where you’ll apply it

References

  • man ld.so, man ld.
  • Apple ld documentation for compatibility versions.

Key insights Versioning is a runtime compatibility contract, not just a semantic label.

Summary A cross-platform library must map its versioning policy to platform-specific mechanisms like SONAME and install names.

Homework/Exercises to practice the concept

  1. Build libfoo.so.1.0.0 with SONAME libfoo.so.1.
  2. Change the API and bump SONAME to libfoo.so.2.
  3. Use readelf -d to confirm DT_SONAME.

Solutions to the homework/exercises

  1. Use -Wl,-soname,libfoo.so.1.
  2. Change the SONAME and update symlinks.
  3. readelf -d libfoo.so.1.0.0 | grep SONAME.

2.4 Memory Ownership and FFI Safety

Fundamentals When a library is used across languages, memory ownership must be explicit. If a library allocates memory, it should also provide a function to free it. Mixing allocators across language boundaries can cause crashes because different runtimes may use different heaps. Therefore, FFI-safe libraries either (1) require the caller to allocate buffers, or (2) provide alloc/free functions that are always used symmetrically.

Deep Dive into the concept Memory ownership rules must be part of the API documentation. For example, if the library returns a string, is the caller supposed to free it? If so, with which function? If you return char* that was allocated with malloc, a Python caller using ctypes could call libc.free, but this is unsafe if the library uses a different allocator. The safer design is to provide mystats_free(void*) or to require the caller to pass a buffer for output.

Additionally, consider the lifetime of handles. If you create a stats_ctx_t*, you need a stats_destroy() to free it. Document whether functions are thread-safe and whether handles can be used concurrently. For FFI, keep APIs simple: functions should either be pure (no ownership transfer) or have clear create/destroy semantics.

This concept ties into ABI stability because ownership rules are part of the contract: changing them breaks existing callers even when no function signature changes. Therefore, treat an ownership change like any other ABI break and bump the major version.

How this fits in this project You will design your API to avoid returning allocated memory when possible. When you must allocate, you will provide matching free functions.

Definitions & key terms

  • Ownership -> Responsibility for freeing memory.
  • Allocator mismatch -> Crash due to freeing memory with a different heap.
  • FFI safety -> API design safe for foreign languages.

Mental model diagram (ASCII)

Caller allocates buffer -> library fills -> caller frees
or
Library allocates -> library provides free()

How it works (step-by-step, with invariants and failure modes)

  1. Define ownership rules in API docs.
  2. If library allocates, expose a free function.
  3. Ensure callers use the correct free function.

Invariants: every allocation has a matching free on the same allocator. Failure modes: double free, allocator mismatch.
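A sketch of the two ownership patterns from the diagram above, assuming a formatted-text result. stats_format_into and stats_format_alloc are hypothetical names; stats_free matches the free function named in this section. The caller-allocated variant follows the snprintf convention of returning the length it needed.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */
#include <stdlib.h>  /* malloc, free */

/* Pattern 1: caller allocates. Writes at most cap bytes (NUL-terminated when
 * cap > 0) and returns the length the full text needs, so the caller can retry
 * with a bigger buffer. buf may be NULL only when cap is 0 (a size query).
 * Returns a negative value on invalid input or formatting failure. */
int stats_format_into(double mean, char* buf, size_t cap) {
    if (buf == NULL && cap != 0) return -1;
    return snprintf(buf, cap, "mean=%.10g", mean);
}

/* Pattern 2: library allocates. The memory comes from this library's malloc,
 * so the caller must hand it back to stats_free, never to another allocator. */
char* stats_format_alloc(double mean) {
    int len = snprintf(NULL, 0, "mean=%.10g", mean);
    if (len < 0) return NULL;
    char* out = malloc((size_t)len + 1);
    if (out != NULL) snprintf(out, (size_t)len + 1, "mean=%.10g", mean);
    return out;
}

/* Releases memory returned by stats_format_alloc, using the same allocator. */
void stats_free(void* p) {
    free(p);
}
```

The caller-allocated form is usually easier to bind from Python, because ctypes.create_string_buffer can supply the buffer and Python's own memory management releases it.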

Minimal concrete example

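typedef struct stats stats_t;              /* opaque: defined only inside the library */
API char* stats_format(const stats_t* s);  /* result must be released with stats_free() */
API void stats_free(void* p);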

Common misconceptions

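  • “The caller can free with libc free.” -> Only if both sides share the same C runtime heap, which is not guaranteed (for example, a DLL statically linked against its own C runtime).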
  • “Ownership rules can be inferred.” -> They must be explicit.

Check-your-understanding questions

  1. Why is allocator mismatch dangerous?
  2. How do you design an API to avoid ownership confusion?
  3. Why are ownership rules part of ABI stability?

Check-your-understanding answers

  1. Different allocators manage different heaps; freeing with the wrong one corrupts memory.
  2. Use caller-allocated buffers or provide explicit free functions.
  3. Changing ownership semantics breaks existing callers.

Real-world applications

  • Cross-language bindings for C libraries like libgit2.

Where you’ll apply it

  • In this project: see Section 3.5 Data Formats and Section 5.11 Key Implementation Decisions.
  • Also used in: P01-plugin-audio-effects-processor for plugin ownership rules.

References

  • “C Interfaces and Implementations” (Hanson), interface boundaries.

Key insights FFI safety depends more on ownership rules than on syntax.

Summary If you make ownership explicit and consistent, your library becomes safe to use across languages.

Homework/Exercises to practice the concept

  1. Design an API that returns a string and provides a free function.
  2. Bind it with ctypes and ensure correct cleanup.
  3. Introduce an ownership bug and observe the crash.

Solutions to the homework/exercises

  1. Return char* and implement stats_free.
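  2. Declare lib.stats_format.restype = ctypes.c_void_p (a c_char_p restype would convert the result to bytes and lose the pointer), then call lib.stats_free(ptr) in Python.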
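  3. Call stats_free twice on the same pointer; glibc usually aborts with a double-free error. Freeing with libc.free may appear to work on Linux, because both sides often share glibc's malloc, which is why allocator mismatches tend to surface only on other platforms.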

3. Project Specification

3.1 What You Will Build

A small statistics library libmystats that:

  • Computes mean, median, and variance on arrays of doubles.
  • Exposes a stable C API for cross-language use.
  • Builds on Linux, macOS, and Windows.
  • Provides Python bindings via ctypes.

3.2 Functional Requirements

  1. C API: Fixed-width types, no exposed structs.
  2. Symbol export: Correct visibility on all platforms.
  3. Versioning: Expose mystats_version() and set SONAME/install name.
  4. FFI safety: Clear memory ownership rules.
  5. Bindings: Provide a minimal Python ctypes example.

3.3 Non-Functional Requirements

  • Portability: Must build with GCC/Clang/MSVC.
  • Reliability: Deterministic results for identical inputs.
  • Documentation: API and ownership rules clearly described.

3.4 Example Usage / Output

import ctypes
lib = ctypes.CDLL("./libmystats.so")
lib.mystats_mean.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.mystats_mean.restype = ctypes.c_double
print(lib.mystats_mean((ctypes.c_double*3)(1.0, 2.0, 3.0), 3))
# Output: 2.0

3.5 Data Formats / Schemas / Protocols

C API

API double mystats_mean(const double* v, size_t n);
API double mystats_variance(const double* v, size_t n);
API const char* mystats_version(void);

3.6 Edge Cases

  • n == 0 -> return NaN and set errno (readable from Python via ctypes.get_errno()).
  • Null pointer input.
  • Extremely large numbers (overflow handling).
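A sketch of how mystats_mean might handle the edge cases listed above. Two choices here are assumptions rather than part of the specification: invalid input sets errno to EINVAL (which the ctypes snippet in 3.7.5 can read with use_errno=True), and the mean is accumulated incrementally instead of as one large sum. That departs from the plain sum described in Section 4.4, trading a little rounding accuracy for protection against overflow.

```c
#include <errno.h>   /* errno, EINVAL */
#include <math.h>    /* NAN */
#include <stddef.h>  /* size_t */

/* Invalid input (NULL array or n == 0) returns NaN and sets errno = EINVAL. */
double mystats_mean(const double* v, size_t n) {
    if (v == NULL || n == 0) {
        errno = EINVAL;
        return NAN;
    }
    double m = 0.0;
    for (size_t k = 1; k <= n; k++) {
        /* Running mean: m_k = m_{k-1} + x_k/k - m_{k-1}/k. Dividing before
         * subtracting keeps intermediates on the order of the inputs instead
         * of n times larger, so {1e308, 1e308} does not overflow to inf.
         * The loop order is fixed, so identical inputs give identical output. */
        m += v[k - 1] / (double)k - m / (double)k;
    }
    return m;
}
```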

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

# Linux build
cc -shared -fPIC -fvisibility=hidden -Iinclude -o libmystats.so src/mystats.c -Wl,-soname,libmystats.so.1

# Run Python example
python3 examples/ctypes_demo.py

3.7.2 Golden Path Demo (Deterministic)

  • Input: [1.0, 2.0, 3.0]
  • Output: mean = 2.0, variance = 0.6666666667 (population variance: sum of squared deviations divided by n)

3.7.3 CLI Transcript (Success + Failure)

$ python3 examples/ctypes_demo.py
mean=2.0 variance=0.6666666667
[exit] code=0

$ python3 examples/ctypes_demo_empty.py
[error] mystats_mean: n==0
[exit] code=11

3.7.4 If CLI: Exit Codes

  • 0: success
  • 11: invalid input (n==0)

3.7.5 If Library: Usage Snippet and Errors

Install/Build

cc -shared -fPIC -fvisibility=hidden -Iinclude -o libmystats.so src/mystats.c -Wl,-soname,libmystats.so.1

Minimal usage

import ctypes
lib = ctypes.CDLL("./libmystats.so")
lib.mystats_mean.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.mystats_mean.restype = ctypes.c_double
result = lib.mystats_mean((ctypes.c_double*3)(1.0, 2.0, 3.0), 3)
print(result)  # 2.0

Expected return values/output

  • mystats_mean returns a double with the mean.
  • mystats_variance returns a double with the variance.

Error-handling usage snippet

import ctypes, math
lib = ctypes.CDLL("./libmystats.so", use_errno=True)
lib.mystats_mean.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.mystats_mean.restype = ctypes.c_double
ctypes.set_errno(0)  # clear ctypes' private errno copy before the call
res = lib.mystats_mean((ctypes.c_double*0)(), 0)
if math.isnan(res):
    print("error errno=", ctypes.get_errno())

4. Solution Architecture

4.1 High-Level Design

Python ctypes -> libmystats (C ABI) -> math routines

4.2 Key Components

Component | Responsibility | Key Decisions
API header | Stable function signatures | Fixed-width types
Core math | Mean/variance algorithms | Deterministic sum order
Export macro | Platform-specific visibility | Single API macro
Bindings | Python ctypes example | Minimal demo

4.3 Data Structures (No Full Code)

typedef struct {
    double mean;
    double variance;
} mystats_result_t; /* internal only */

4.4 Algorithm Overview

Key Algorithm: Mean and Variance

  1. Compute mean by summing and dividing by n.
  2. Compute variance by second pass over data.

Complexity Analysis:

  • Time: O(n)
  • Space: O(1)
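A sketch of the two-pass algorithm above, using the internal mystats_result_t from 4.3. mystats_variance matches the declaration in 3.5; mystats_describe is a hypothetical extra entry point showing how the internal struct can stay private while both results cross the ABI through plain double* out-parameters. The variance is the population variance (divide by n), which matches the 0.6666666667 in the golden-path demo.

```c
#include <math.h>    /* NAN */
#include <stddef.h>  /* size_t */

/* Internal result type from Section 4.3: never appears in the public header. */
typedef struct {
    double mean;
    double variance;
} mystats_result_t;

/* Two passes, both in index order, so identical inputs give identical results.
 * Time O(n), extra space O(1). Returns 0 on success, -1 on invalid input. */
static int mystats_compute(const double* v, size_t n, mystats_result_t* out) {
    if (v == NULL || n == 0 || out == NULL) return -1;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];       /* pass 1: mean */
    out->mean = sum / (double)n;
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {                  /* pass 2: squared deviations */
        double d = v[i] - out->mean;
        ss += d * d;
    }
    out->variance = ss / (double)n;                   /* population variance */
    return 0;
}

double mystats_variance(const double* v, size_t n) {
    mystats_result_t r;
    return mystats_compute(v, n, &r) == 0 ? r.variance : NAN;
}

/* Hypothetical combined entry point: both values leave through double*
 * out-parameters, so bindings never mirror the struct layout and the mean is
 * computed only once. Returns 0 on success, -1 on invalid input. */
int mystats_describe(const double* v, size_t n, double* out_mean, double* out_variance) {
    mystats_result_t r;
    if (out_mean == NULL || out_variance == NULL) return -1;
    if (mystats_compute(v, n, &r) != 0) {
        *out_mean = NAN;
        *out_variance = NAN;
        return -1;
    }
    *out_mean = r.mean;
    *out_variance = r.variance;
    return 0;
}
```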

5. Implementation Guide

5.1 Development Environment Setup

# Linux (Debian/Ubuntu)
sudo apt-get install build-essential

5.2 Project Structure

libmystats/
|-- include/
|   `-- mystats.h
|-- src/
|   `-- mystats.c
|-- examples/
|   `-- ctypes_demo.py
|-- CMakeLists.txt
`-- README.md

5.3 The Core Question You’re Answering

“How do I ship a binary library that remains ABI-stable across platforms?”

5.4 Concepts You Must Understand First

  1. C ABI and calling conventions.
  2. Symbol export and visibility.
  3. Versioning and SONAME/install names.
  4. Memory ownership rules.

5.5 Questions to Guide Your Design

  1. Which types are safe to expose across platforms?
  2. How will you ensure symbols are exported consistently?
  3. What is your versioning policy?

5.6 Thinking Exercise

Design a JSON parsing API that never exposes internal structs. What functions do you need?

5.7 The Interview Questions They’ll Ask

  1. “Why is C ABI the lingua franca for shared libraries?”
  2. “How do you export symbols on Windows vs Linux?”
  3. “How do you handle memory ownership across FFI?”

5.8 Hints in Layers

Hint 1: Use extern "C" and fixed-width types

Hint 2: Define an API macro for exports

Hint 3: Add mystats_version() early

5.9 Books That Will Help

Topic | Book | Chapter
Interfaces | C Interfaces and Implementations | Ch. 2
Linking | The Linux Programming Interface (TLPI) | Ch. 41

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

  • Define headers and export macro.
  • Implement mean/variance core.

Phase 2: Cross-Platform Build (5-7 days)

  • Add CMake and platform-specific flags.
  • Validate exports on each OS.

Phase 3: FFI Integration (3-4 days)

  • Add Python ctypes demo and tests.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Exposed types | Structs vs opaque | Opaque | ABI stability
Export method | Def file vs declspec | Macro | Portable
Versioning | Semver + SONAME | Use SONAME | Loader compatibility

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Math correctness | mean, variance
Integration Tests | Python FFI | ctypes demo
Edge Case Tests | Empty input | n==0

6.2 Critical Test Cases

  1. Mean of [1,2,3] is 2.0.
  2. Variance matches expected value.
  3. n==0 returns NaN and sets an error code.

6.3 Test Data

[1.0, 2.0, 3.0]
[5.0, 5.0, 5.0]

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Export missing | ctypes cannot find symbol | Fix API macro
Using long | Wrong results on Windows | Use int64_t
Ownership confusion | Crash in Python | Provide free functions

7.2 Debugging Strategies

  • Use nm -D or dumpbin to verify exports.
  • Add small C test program before Python bindings.

7.3 Performance Traps

  • Recomputing mean multiple times; cache if needed.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add mystats_stddev.
  • Add a C test harness.

8.2 Intermediate Extensions

  • Add SIMD optimization with fallback.
  • Provide a Rust binding.

8.3 Advanced Extensions

  • ABI compatibility test suite across versions.
  • Build and publish wheels for Python.

9. Real-World Connections

9.1 Industry Applications

  • Shared C libraries used by multiple languages.
  • SDKs for hardware devices and analytics.
  • libcurl: stable C API with bindings.
  • libsqlite3: portable database library.

9.3 Interview Relevance

  • Demonstrates cross-platform ABI design and FFI knowledge.

10. Resources

10.1 Essential Reading

  • “C Interfaces and Implementations” (Hanson).
  • Platform ABI docs (System V, Microsoft x64).

10.2 Video Resources

  • FFI and ABI talks.

10.3 Tools & Documentation

  • nm, objdump, dumpbin, otool.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the platform calling convention differences.
  • I can explain how symbol export works on each OS.
  • I can explain SONAME and versioning policy.

11.2 Implementation

  • Library builds on Linux/macOS/Windows.
  • Python ctypes demo works.
  • Ownership rules are documented and correct.

11.3 Growth

  • I can explain ABI design in an interview.
  • I documented at least one cross-platform build issue.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Stable C API with mean/variance and version function.
  • Linux build with correct exports.

Full Completion:

  • Builds on three platforms and works with Python ctypes.
  • SONAME/install name versioning documented.

Excellence (Going Above & Beyond):

  • ABI compatibility test suite across versions.
  • Published bindings for two languages.