Project 3: LD_PRELOAD Function Interceptor

Build a shared library that intercepts libc functions via LD_PRELOAD, logs calls safely, and forwards to the real implementations without recursion bugs.

Quick Reference

Attribute	Value
Difficulty	Level 3: Advanced
Time Estimate	Weekend to 1 week
Main Programming Language	C
Alternative Programming Languages	C++, Rust
Coolness Level	Level 4: Hardcore Tech Flex
Business Potential	Level 3: Service & Support Model
Prerequisites	Dynamic loading, basic threading, libc awareness
Key Topics	Symbol interposition, `RTLD_NEXT`, reentrancy, thread safety

1. Learning Objectives

By completing this project, you will:

Explain and exploit the dynamic loader’s symbol resolution order.
Implement safe function interception with LD_PRELOAD and RTLD_NEXT.
Avoid recursion and deadlocks in hook implementations.
Collect deterministic call metrics and emit logs safely.
Debug tricky loader behaviors and symbol collisions.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Symbol Resolution Order and Interposition

Fundamentals Dynamic linking resolves symbols by searching a chain of objects: the main executable, preloaded libraries, and DT_NEEDED dependencies. LD_PRELOAD lets you inject a library directly after the main executable, ahead of every DT_NEEDED dependency, including libc. If your preload library exports a symbol with the same name as a libc function, your version wins. This is called interposition. It is a powerful mechanism for debugging, profiling, or altering program behavior without modifying source code. However, it is also fragile because it relies on loader rules and symbol visibility.

Deep Dive into the concept The loader builds a global symbol scope for the process. When resolving a symbol, it searches in a defined order: first the main executable, then LD_PRELOAD libraries, then dependencies in the load order (breadth-first in many implementations). Interposition works because the first definition found is used to resolve relocations. This is why LD_PRELOAD can override malloc or open. But interposition does not apply to all symbols. Some references never reach the loader's global lookup: a statically linked binary has no dynamic symbols to interpose, and a library linked with -Wl,-Bsymbolic binds references to its own global symbols internally, so calls inside that library bypass your override.

There are subtleties with symbol versioning and visibility. If libc exports multiple versions of a symbol, the loader uses versioned symbols to resolve the exact one required. Your preload library must export a compatible version or the symbol may not be chosen. In practice, for common libc functions, exporting the unversioned symbol name often works because the loader is permissive, but this can vary. If the function you intercept is inlined, is reached through a different symbol (for example, open redirected to open64 when large-file support is enabled on a 32-bit platform), or is called from inside libc without going through an interposable symbol, you may not see the calls you expect. This is why interceptors often hook multiple symbol names (open, open64, __libc_open depending on platform).

Interposition affects the entire process, not just the target library. That means any code in the process that calls malloc will go through your interceptor, including other libraries and even your interceptor itself. This is the core challenge: you must handle reentrancy and avoid recursion. A typical pattern is to resolve the real function pointer once (using dlsym(RTLD_NEXT, "malloc")) and then use a thread-local guard to prevent recursion.

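On Linux, the loader restricts LD_PRELOAD in secure-execution mode (for example, when running setuid or setgid binaries): preload entries containing slashes are ignored, so LD_PRELOAD=./libhook.so has no effect. Because the library is never loaded in that case, it cannot report the problem itself; detection and reporting have to happen outside the target process, for example in a wrapper that launches the program. On macOS, the equivalent is DYLD_INSERT_LIBRARIES, and on Windows you need different injection techniques. For this project, focus on Linux and make the limitations explicit.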
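
Since a preload that the loader drops never runs, any warning about secure-execution mode has to be produced before the target starts. The following is a minimal sketch of a launcher-side check, assuming a hypothetical helper named target_is_setid that inspects the setuid and setgid mode bits. It is a heuristic only: file capabilities can also trigger secure-execution mode and are not detected here.

```c
#include <sys/stat.h>

/* Hypothetical launcher-side helper (not part of any library).
 * Returns 1 if `path` has the setuid or setgid bit set, 0 if it does not,
 * and -1 if the file cannot be inspected. */
int target_is_setid(const char* path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        return -1;                      /* missing file or no permission */
    }
    return (st.st_mode & (S_ISUID | S_ISGID)) ? 1 : 0;
}
```

A wrapper could call this on the target path and print its own warning before setting LD_PRELOAD and executing the program.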

How this fits in this project You will create a preload library that intercepts malloc, open, and connect. Understanding the loader’s symbol order is what makes this possible and explains why your interceptors fire.

Definitions & key terms

Interposition -> Replacing a symbol with another definition at runtime.
LD_PRELOAD -> Environment variable specifying libraries to load first.
Global scope -> Symbol namespace used for resolution.
-Bsymbolic -> Linker option that binds symbol references locally.

Mental model diagram (ASCII)

Resolution order:
[main exe] -> [LD_PRELOAD libs] -> [DT_NEEDED libs]
   ^              ^
   |              |
 symbols resolved here first

How it works (step-by-step, with invariants and failure modes)

Loader reads LD_PRELOAD and loads your interceptor library first.
Loader resolves symbols; your overrides are chosen before libc.
When target program calls malloc, your function is invoked.
Your interceptor calls the real malloc via RTLD_NEXT.

Invariants: your interceptor must provide a correct signature and must call the real function. Failure modes: recursion loops, incorrect signatures, or missing symbol versions.

Minimal concrete example

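#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void* (*real_malloc)(size_t) = NULL;

void* malloc(size_t size) {
    if (!real_malloc) {
        real_malloc = dlsym(RTLD_NEXT, "malloc");
    }
    void* p = real_malloc(size);
    /* Format into a stack buffer and use write(2): fprintf may call malloc. */
    char buf[64];
    int n = snprintf(buf, sizeof(buf), "malloc(%zu)=%p\n", size, p);
    if (n > 0) write(2, buf, (size_t)n);
    return p;
}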

Common misconceptions

“LD_PRELOAD always works.” -> It is ignored for setuid binaries.
“You can just call printf inside malloc.” -> That may call malloc again.
“Interposition affects only the target function.” -> It affects the entire process.

Check-your-understanding questions

Why does LD_PRELOAD allow your symbols to override libc?
What does -Bsymbolic do to interposition?
Why might open calls not be intercepted?

Check-your-understanding answers

Preloaded libraries are searched before other dependencies in symbol resolution.
It forces local binding in a library, preventing overrides from outside.
open might be inlined, redirected to open64, or implemented via other syscalls.

Real-world applications

Profiling allocators, tracing system calls, or enforcing policy.

Where you’ll apply it

In this project: see Section 3.2 Functional Requirements and Section 5.8 Hints in Layers.
Also used in: P01-plugin-audio-effects-processor and P06-minimal-dynamic-linker.

References

“The Linux Programming Interface” (Kerrisk), Ch. 42.
man ld.so.

Key insights Interposition works because the loader resolves symbols in a strict order and LD_PRELOAD places your library ahead of libc in that order.

Summary Symbol resolution order is the foundation of function interception. If you control the order, you can control behavior.

Homework/Exercises to practice the concept

Intercept puts and log its arguments.
Build a shared library with -Wl,-Bsymbolic and observe reduced interposition for calls inside it.
Intercept open and open64 and compare which gets called.

Solutions to the homework/exercises

Use LD_PRELOAD=./libhook.so and override puts.
Build a shared library with -Wl,-Bsymbolic and see that your interceptor no longer wins for calls the library makes to its own symbols.
Use strace to see actual syscalls and compare with your logs.

2.2 `RTLD_NEXT`, Reentrancy, and Safe Forwarding

Fundamentals When your interceptor overrides a function, you still need access to the real implementation. dlsym(RTLD_NEXT, "symbol") returns the next definition of a symbol in the loader’s search order, skipping your interceptor. This is the standard way to forward calls. However, your interceptor may itself call functions that use the intercepted symbol, causing recursion. Avoiding recursion requires reentrancy guards and careful use of low-level functions.

Deep Dive into the concept RTLD_NEXT is a special handle that tells dlsym to search for the next symbol definition after the current object. This is essential for interceptors. The typical pattern is to resolve the real function pointer once and store it in a static variable. But initialization itself can be reentrant: if the code that resolves the function (or logs) calls the intercepted function, you will recurse before the pointer is set. To avoid this, you can use a thread-local guard (e.g., __thread int in_hook) or resolve in a constructor function that runs before normal execution.

Logging is a common source of reentrancy. printf uses malloc internally, so logging from within malloc can re-enter your hook. For low-level hooks, use write(2, ...) with a preformatted buffer. For more complex logs, you can use syscall directly. Another approach is to detect recursion and bypass logging if already inside the hook.
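
As an illustration of the write-based approach, here is a minimal sketch of an allocation-free logger. The names fmt_u64 and hook_log_u64 are hypothetical, chosen for this example. The number is formatted by hand into a stack buffer, so no stdio function is involved.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Writes the decimal form of v into buf (no NUL terminator). Returns the
 * number of characters written, or 0 if cap is too small. */
size_t fmt_u64(char* buf, size_t cap, unsigned long long v) {
    char tmp[20];                       /* 2^64 - 1 has 20 decimal digits */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    if (n > cap) return 0;
    for (size_t i = 0; i < n; i++) buf[i] = tmp[n - 1 - i];  /* reverse */
    return n;
}

/* Emits "<label><value>\n" to stderr with a single write(2) call. */
void hook_log_u64(const char* label, unsigned long long value) {
    char line[96];
    size_t len = strlen(label);
    if (len > sizeof(line) - 21) len = sizeof(line) - 21;  /* keep room for 20 digits + '\n' */
    memcpy(line, label, len);
    len += fmt_u64(line + len, sizeof(line) - len - 1, value);
    line[len++] = '\n';
    ssize_t rc = write(2, line, len);
    (void)rc;                           /* a logger has nowhere to report failure */
}
```

For example, hook_log_u64("[hook] malloc bytes=", 4096) would emit one line on stderr. Building the whole line first and issuing one write keeps lines from different threads from interleaving mid-line in most cases.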

Thread safety is also critical. The interceptor will be called from multiple threads, so any counters or global state must be protected by atomic operations or thread-local storage. If you use a mutex, be careful not to call pthread_mutex_lock in a hook that intercepts pthread_mutex_lock or another function that uses it internally.

Finally, understand that dlsym itself can call functions that may be intercepted, depending on the loader implementation (on some glibc versions it can allocate memory, for example). Therefore, a safe design caches function pointers during library initialization using minimal dependencies. A constructor function (__attribute__((constructor))) can initialize function pointers before most intercepted calls occur, reducing recursion risk.
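
A constructor-based resolver might look like the sketch below. The names hook_init and hooks_ready are hypothetical. Note one assumption that should be verified on your system: constructors of other libraries may run before this one, so hooks should still cope with a pointer that is NULL when the first call arrives.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                     /* required for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <stddef.h>

static void* (*real_malloc)(size_t);
static int (*real_open)(const char*, int, ...);

/* Runs when the library is loaded, before main() starts. */
__attribute__((constructor))
static void hook_init(void) {
    real_malloc = (void* (*)(size_t))dlsym(RTLD_NEXT, "malloc");
    real_open = (int (*)(const char*, int, ...))dlsym(RTLD_NEXT, "open");
}

/* Hypothetical helper: returns 1 once both pointers have been resolved. */
int hooks_ready(void) {
    return real_malloc != NULL && real_open != NULL;
}
```

Resolving everything in one place also gives a single point at which to detect missing symbols.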

How this fits in this project You will use RTLD_NEXT to forward calls to the real malloc, open, and connect. You will implement recursion guards and safe logging to prevent infinite loops.

Definitions & key terms

RTLD_NEXT -> Special handle to find the next symbol definition.
Reentrancy -> A function being called again before the previous call returns.
Constructor -> Function executed when the library is loaded.
Thread-local storage (TLS) -> Per-thread state used to avoid global locks.

Mental model diagram (ASCII)

malloc() in app
  -> your malloc()
      -> dlsym(RTLD_NEXT, "malloc") -> real malloc()
      -> log via write()

How it works (step-by-step, with invariants and failure modes)

On first call, interceptor resolves real function with RTLD_NEXT.
Interceptor sets a TLS guard to prevent recursion.
Interceptor calls the real function.
Interceptor logs safely without calling intercepted functions.
Interceptor clears guard and returns result.

Invariants: real function pointer is valid; recursion guard prevents infinite loops. Failure modes: logging recursion, deadlocks, uninitialized function pointers.

Minimal concrete example

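#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <unistd.h>

static __thread int in_hook = 0;
static int (*real_open)(const char*, int, ...) = NULL;

int open(const char* path, int flags, ...) {
    mode_t mode = 0;
    if (flags & O_CREAT) {  /* mode is only passed with O_CREAT (O_TMPFILE omitted for brevity) */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    /* Resolve before the guard check so the nested path never calls NULL. */
    if (!real_open) real_open = dlsym(RTLD_NEXT, "open");
    if (in_hook) return real_open(path, flags, mode);
    in_hook = 1;
    int fd = real_open(path, flags, mode);
    write(2, "[hook] open\n", 12);
    in_hook = 0;
    return fd;
}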

Common misconceptions

“printf is safe inside hooks.” -> It can re-enter your hooks.
“dlsym is always safe.” -> It can call loader internals that allocate memory.
“Global locks are fine.” -> They can deadlock in unexpected contexts.

Check-your-understanding questions

Why do you need a recursion guard?
Why use TLS instead of a global flag?
What is the risk of calling malloc inside your malloc hook?

Check-your-understanding answers

Because the hook may trigger the same symbol internally.
Multiple threads could interfere with each other otherwise.
You will recurse indefinitely and likely crash.

Real-world applications

Allocator replacement and low-level profiling, for example preloading tcmalloc to interpose malloc.

Where you’ll apply it

In this project: see Section 5.8 Hints in Layers and Section 7.1 Frequent Mistakes.
Also used in: P04-hot-reload-dev-server for safe reload hooks.

References

man dlsym, man dlopen.
“The Linux Programming Interface” (Kerrisk), Ch. 42.

Key insights Interceptors must be written as if they are called from inside the loader itself: minimal dependencies, safe logging, and recursion guards.

Summary RTLD_NEXT gives you the real function, but safe forwarding requires reentrancy discipline. Treat hooks as low-level code.

Homework/Exercises to practice the concept

Implement a write-only logger for your hooks.
Add a TLS guard and verify recursion is prevented.
Add per-thread counters and print totals on exit.

Solutions to the homework/exercises

Use snprintf into a fixed buffer and call write(2, buf, len).
Use static __thread int in_hook.
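Use TLS counters, fold each thread's count into a global atomic total (a library destructor can read only its own thread's TLS), and print the total in a destructor.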

2.3 Thread Safety, Metrics, and Deterministic Logging

Fundamentals Interceptors often collect metrics like call counts and bytes allocated. These counters must be thread-safe because the intercepted functions are frequently used by multiple threads. Deterministic logging means the output is stable and predictable for testing, so you should control when and how logs are emitted, even in multi-threaded programs.

Deep Dive into the concept Thread safety can be achieved with atomics or thread-local storage. For global counters, use stdatomic.h or compiler built-ins to avoid locks. A lock inside malloc interception can cause deadlocks because memory allocation may be used inside the locking implementation. Thread-local counters avoid contention but require aggregation at exit. For deterministic output, you should avoid logging every call because the order can be nondeterministic across threads. Instead, log summary statistics at program exit or after a fixed number of calls.
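
The trade-off between atomics and thread-local counters can be sketched as follows. The names count_call, flush_thread_calls, and total_calls are hypothetical. The hot path touches only thread-local state, and each thread folds its pending count into a shared atomic total when it flushes.

```c
#include <stdatomic.h>
#include <stddef.h>

static _Atomic size_t g_calls;          /* process-wide total */
static __thread size_t t_calls;         /* this thread's pending count */

/* Hot path: no atomics and no locks. */
void count_call(void) {
    t_calls++;
}

/* Moves this thread's pending count into the global total. */
void flush_thread_calls(void) {
    atomic_fetch_add_explicit(&g_calls, t_calls, memory_order_relaxed);
    t_calls = 0;
}

/* Reads the total that has been flushed so far. */
size_t total_calls(void) {
    return atomic_load_explicit(&g_calls, memory_order_relaxed);
}
```

A thread would need to flush before it exits (or every fixed number of calls), because the exit-time reporter cannot read another thread's thread-local storage.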

Deterministic output also requires stable time references. If you include timestamps, you should use a mockable or fixed time source during tests. For this project, the simplest solution is to omit timestamps and log only counts and sizes. If you must include time, allow a HOOK_TIME_SEED to override it for tests.
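
One possible reading of the HOOK_TIME_SEED override is sketched below, assuming the variable holds a decimal integer that replaces the clock value entirely. The function name hook_now is hypothetical, and the exact semantics of the variable are an assumption of this example.

```c
#include <stdlib.h>
#include <time.h>

/* Returns the timestamp to put in log lines. If HOOK_TIME_SEED holds a
 * valid decimal integer, that fixed value is used instead of the clock. */
long long hook_now(void) {
    const char* s = getenv("HOOK_TIME_SEED");
    if (s != NULL && *s != '\0') {
        char* end = NULL;
        long long v = strtoll(s, &end, 10);
        if (end != s && *end == '\0') {
            return v;                   /* fixed value for reproducible tests */
        }
    }
    return (long long)time(NULL);       /* fall back to the real clock */
}
```

Running a test with HOOK_TIME_SEED=42 would then make every logged timestamp equal to 42, so expected output can be compared byte for byte.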

For hooks that intercept network or file I/O, you may want to include the arguments (e.g., path or address). Be careful with pointer lifetimes and avoid printf formatting that might allocate memory. Preformat into a static buffer or use minimal formatting. A common pattern is to log only in debug mode and keep production mode silent.
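
For a connect hook, the address argument can be rendered into a caller-supplied buffer while the pointer is still valid. The sketch below uses a hypothetical helper named format_ipv4 and handles only AF_INET. It relies on inet_ntop and snprintf writing into fixed buffers, on the assumption that these do not allocate for such simple conversions.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Formats an IPv4 socket address as "a.b.c.d:port" into out. Returns the
 * length written, or -1 for other address families or a too-small buffer. */
int format_ipv4(const struct sockaddr* sa, char* out, size_t cap) {
    if (sa == NULL || sa->sa_family != AF_INET) {
        return -1;
    }
    const struct sockaddr_in* in = (const struct sockaddr_in*)sa;
    char ip[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &in->sin_addr, ip, sizeof(ip)) == NULL) {
        return -1;
    }
    int n = snprintf(out, cap, "%s:%u", ip, (unsigned)ntohs(in->sin_port));
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The formatted text is copied out immediately, so the log line does not depend on the caller's sockaddr staying alive after the hook returns.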

Finally, remember that your interceptor is loaded into other programs, so its behavior must be conservative. It should never crash the host program. Defensive coding and minimal dependencies are crucial.

How this fits in this project You will count calls to malloc, open, and connect using atomics or TLS, and emit a deterministic summary at program exit.

Definitions & key terms

Atomic -> Operation that is safe across threads without locks.
TLS -> Thread-local storage.
Deterministic logging -> Output that is stable across runs.

Mental model diagram (ASCII)

Thread A -> malloc -> counter++
Thread B -> malloc -> counter++
On exit -> print total

How it works (step-by-step, with invariants and failure modes)

Interceptor increments atomic counters on each call.
Logs are buffered or deferred.
Destructor prints totals on program exit.

Invariants: counters remain consistent; logging does not allocate memory. Failure modes: data races or recursive logging.

Minimal concrete example

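#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static _Atomic size_t total_calls = 0;
static void* (*real_malloc)(size_t) = NULL;

void* malloc(size_t size) {
    if (!real_malloc) real_malloc = dlsym(RTLD_NEXT, "malloc");
    atomic_fetch_add(&total_calls, 1);
    return real_malloc(size);
}

__attribute__((destructor))
static void report(void) {
    char buf[128];
    /* Load the atomic into a plain value before passing it to a variadic function. */
    size_t calls = atomic_load(&total_calls);
    int n = snprintf(buf, sizeof(buf), "total_malloc_calls=%zu\n", calls);
    if (n > 0) write(2, buf, (size_t)n);
}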

Common misconceptions

“Locks are always safe.” -> Locks can re-enter intercepted functions.
“Per-call logging is fine.” -> It can be nondeterministic and slow.

Check-your-understanding questions

Why prefer atomics over mutexes in hooks?
How can you make logs deterministic in multi-threaded programs?
Why emit logs in a destructor?

Check-your-understanding answers

Locks can cause deadlocks and may allocate memory.
Emit summary logs at exit or in controlled intervals.
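The destructor runs once during normal process exit (after main returns or exit is called), so the summary is printed a single time and covers nearly all calls. It does not run if the process calls _exit or is killed by a signal.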

Real-world applications

Production profilers and tracing tools.

Where you’ll apply it

In this project: see Section 6.2 Critical Test Cases and Section 7.1 Frequent Mistakes.
Also used in: P02-library-dependency-visualizer for deterministic output strategy.

References

C11 stdatomic.h documentation.
“Advanced Programming in the UNIX Environment” (Stevens), threading chapters.

Key insights Deterministic metrics are better than verbose logs when you hook critical functions.

Summary Thread-safe counters and deferred logging keep your interceptor stable and testable.

Homework/Exercises to practice the concept

Implement TLS counters and aggregate in a destructor.
Add a quiet mode, controlled by an environment variable, that disables logging.
Compare performance with per-call logging vs summary logging.

Solutions to the homework/exercises

Use __thread counters, have each thread add its count to a global atomic before it exits, and print the sum in an exit handler.
Check an environment variable before logging.
Run the target program and measure runtime with time.

3. Project Specification

3.1 What You Will Build

A shared library libintercept.so that:

Overrides malloc, open, and connect.
Logs calls safely without recursion or deadlocks.
Tracks deterministic summary metrics.
Forwards calls to the real implementations using RTLD_NEXT.

3.2 Functional Requirements

Interpose functions: malloc, open, connect.
Forward correctly: Use RTLD_NEXT to call real functions.
Thread safety: Use atomics or TLS for counters.
Deterministic logging: Summary log at exit.
Failure handling: Detect missing symbols and disable hooks gracefully.
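
The failure-handling requirement could be approached as in the sketch below. The names resolve_or_disable and hooks_disabled are hypothetical. A failed lookup flips a process-wide flag instead of aborting the host program.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                     /* required for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int g_hooks_disabled;     /* 0 = active, 1 = stop logging and counting */

/* Looks up the next definition of `name` after this object. On failure,
 * disables the hooks rather than crashing the host program. */
void* resolve_or_disable(const char* name) {
    void* sym = dlsym(RTLD_NEXT, name);
    if (sym == NULL) {
        atomic_store(&g_hooks_disabled, 1);
    }
    return sym;
}

/* Hooks can check this flag and skip all bookkeeping when it is set. */
int hooks_disabled(void) {
    return atomic_load(&g_hooks_disabled);
}
```

A hook whose own real function could not be resolved has nothing to forward to. In that case, returning the function's documented error value is one conservative option.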

3.3 Non-Functional Requirements

Reliability: Never crash the host program.
Performance: Minimal overhead; avoid heavy logging.
Portability: Linux focus; document limitations.

3.4 Example Usage / Output

$ LD_PRELOAD=./libintercept.so /usr/bin/curl https://example.com
[hook] malloc calls=842 bytes=2.1MB
[hook] open calls=7
[hook] connect calls=3

3.5 Data Formats / Schemas / Protocols

Log format

[hook] malloc calls=<count> bytes=<bytes>

3.6 Edge Cases

Interceptor used on setuid binary (ignored).
Missing symbol version in libc.
open called via open64 or __open.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

gcc -shared -fPIC -o libintercept.so intercept.c -ldl
LD_PRELOAD=./libintercept.so /bin/ls

3.7.2 Golden Path Demo (Deterministic)

Run /bin/true with preload and verify summary counts are stable.

3.7.3 CLI Transcript (Success + Failure)

$ LD_PRELOAD=./libintercept.so /bin/ls /tmp
[hook] malloc calls=12 bytes=4096
[hook] open calls=2
[hook] connect calls=0
[exit] code=0

$ LD_PRELOAD=./libintercept.so /bin/su
[warning] secure-exec: LD_PRELOAD ignored
[exit] code=7

3.7.4 If CLI: Exit Codes

0: success
7: preload ignored or disabled
8: missing symbol

3.7.5 If Library: Usage Snippet and Errors

Install/Build

cc -shared -fPIC -o libintercept.so intercept.c -ldl

Minimal usage

LD_PRELOAD=./libintercept.so /bin/ls

Expected output

Summary lines printed to stderr on exit.

Error handling snippet

LD_PRELOAD=./libintercept.so /bin/su
# [warning] secure-exec: LD_PRELOAD ignored

4. Solution Architecture

4.1 High-Level Design

app -> loader -> libintercept.so
                | overrides malloc/open/connect
                | forwards to real libc via RTLD_NEXT

4.2 Key Components

Component	Responsibility	Key Decisions
Hook functions	Override libc symbols	Match exact signatures
Resolver	Find real functions	Cache `dlsym(RTLD_NEXT)`
Logger	Emit deterministic summary	Use `write`
Metrics	Thread-safe counters	Use atomics/TLS

4.3 Data Structures (No Full Code)

typedef struct {
    _Atomic size_t malloc_calls;
    _Atomic size_t malloc_bytes;
    _Atomic size_t open_calls;
    _Atomic size_t connect_calls;
} metrics_t;

4.4 Algorithm Overview

Key Algorithm: Hook Call Flow

Guard against recursion.
Resolve real function if needed.
Call real function.
Update counters.
Return result.

Complexity Analysis:

Time: O(1) per call.
Space: O(1).
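
The hook call flow above can be sketched for puts, the function suggested as a starting point in the hints. This is an illustration under stated assumptions, not a full implementation. The name puts_call_count is hypothetical, and resolution is done before the guard check so that the nested path always has a valid pointer.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                     /* required for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*puts_fn)(const char*);

static puts_fn real_puts;
static _Atomic size_t puts_calls;
static __thread int in_hook;

int puts(const char* s) {
    /* Resolve the real function if needed. */
    if (real_puts == NULL) real_puts = (puts_fn)dlsym(RTLD_NEXT, "puts");
    if (real_puts == NULL) return EOF;          /* missing symbol: fail, do not crash */

    /* Guard against recursion: nested calls are forwarded but not counted. */
    if (in_hook) return real_puts(s);
    in_hook = 1;

    int rc = real_puts(s);                      /* call the real function */
    atomic_fetch_add(&puts_calls, 1);           /* update counters */

    in_hook = 0;
    return rc;                                  /* return the real result */
}

/* Hypothetical accessor for the summary printed at exit. */
size_t puts_call_count(void) {
    return atomic_load(&puts_calls);
}
```

Each step is a constant amount of work, which matches the O(1) cost per call stated above. The unsynchronized write to real_puts follows the lazy pattern used elsewhere in this guide; resolving in a constructor avoids that race.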

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

intercept/
|-- src/
|   |-- intercept.c
|   `-- log.c
|-- Makefile
`-- README.md

5.3 The Core Question You’re Answering

“How does symbol resolution order allow runtime interception?”

5.4 Concepts You Must Understand First

Symbol resolution order and interposition.
RTLD_NEXT and recursion guards.
Thread-safe deterministic logging.

5.5 Questions to Guide Your Design

Which functions are safe to call inside hooks?
How will you detect if your hook is running recursively?
How will you handle missing symbol versions?

5.6 Thinking Exercise

If you intercept printf, list the functions it might call internally that could re-enter your interceptor. How would you avoid recursion?

5.7 The Interview Questions They’ll Ask

“Why does LD_PRELOAD override libc functions?”
“What is RTLD_NEXT and when do you use it?”
“What are the dangers of intercepting malloc?”

5.8 Hints in Layers

Hint 1: Start with puts

Hint 2: Use write for logging

Hint 3: Add TLS recursion guards

5.9 Books That Will Help

Topic	Book	Chapter
Dynamic loading	TLPI	Ch. 42
Threads	APUE	Ch. 11

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Interpose puts and log calls safely.

Phase 2: Core Functionality (2-3 days)

Add malloc, open, connect hooks with forwarding.

Phase 3: Polish & Edge Cases (1-2 days)

Add deterministic summary and secure-exec detection.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Logging	printf vs write	write	Avoid recursion
Counters	mutex vs atomics	atomics	Avoid deadlocks
Hook init	constructor vs lazy	constructor	Reduce recursion

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Hook logic	direct call to `malloc`
Integration Tests	Real binaries	`/bin/ls`, `curl`
Edge Case Tests	Secure-exec	`su`, `sudo`

6.2 Critical Test Cases

Intercept malloc without recursion crash.
Threaded program still logs correct totals.
LD_PRELOAD ignored for secure-exec -> warning printed.

6.3 Test Data

/sbin/ldconfig
/bin/ls

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Using `printf` in `malloc`	Crash or hang	Use `write`
Wrong signature	Corrupted stack	Match function prototype
Missing `-ldl`	Undefined `dlsym`	Link against `libdl` (required on glibc older than 2.34; harmless on newer versions)

7.2 Debugging Strategies

Run with LD_DEBUG=bindings to see symbol resolution.
Use strace to validate system calls.

7.3 Performance Traps

Logging per call can slow programs dramatically.

8. Extensions & Challenges

8.1 Beginner Extensions

Add hooks for read and write.
Add a quiet mode controlled by an environment variable.

8.2 Intermediate Extensions

Export metrics as JSON on exit.
Add per-thread statistics.

8.3 Advanced Extensions

Add symbol version support.
Support macOS DYLD_INSERT_LIBRARIES.

9. Real-World Connections

9.1 Industry Applications

Debugging production binaries without recompilation.
Profiling allocators and I/O.

9.2 Related Open Source Projects

libeatmydata: intercepts fsync.
jemalloc: allocator with interposition options.

9.3 Interview Relevance

Shows understanding of loader symbol resolution and runtime behavior.

10. Resources

10.1 Essential Reading

“The Linux Programming Interface” (Kerrisk), Ch. 42.
man ld.so, man dlsym.

10.2 Video Resources

Dynamic linker internals talks.

10.3 Tools & Documentation

LD_DEBUG=bindings for symbol resolution tracing.

11. Self-Assessment Checklist

11.1 Understanding

I can explain how LD_PRELOAD changes resolution order.
I can describe how RTLD_NEXT works.
I can explain why recursion guards are necessary.

11.2 Implementation

Hooks work on real binaries without crashes.
Summary logs are deterministic.
Thread-safe counters are correct.

11.3 Growth

I can explain symbol interposition in an interview.
I documented at least one tricky hook bug.

12. Submission / Completion Criteria

Minimum Viable Completion:

Intercept one libc function and forward correctly.
Log deterministic summary at exit.

Full Completion:

Intercept malloc, open, and connect safely.
Handles secure-exec and missing symbols.

Excellence (Going Above & Beyond):

Cross-platform injection or symbol version support.
JSON metrics exporter with deterministic output.
