Project 3: LD_PRELOAD Function Interceptor

Build a shared library that intercepts libc functions via LD_PRELOAD, logs calls safely, and forwards to the real implementations without recursion bugs.

Quick Reference

Attribute | Value
Difficulty | Level 3: Advanced
Time Estimate | Weekend to 1 week
Main Programming Language | C
Alternative Programming Languages | C++, Rust
Coolness Level | Level 4: Hardcore Tech Flex
Business Potential | Level 3: Service & Support Model
Prerequisites | Dynamic loading, basic threading, libc awareness
Key Topics | Symbol interposition, RTLD_NEXT, reentrancy, thread safety

1. Learning Objectives

By completing this project, you will:

  1. Explain and exploit the dynamic loader’s symbol resolution order.
  2. Implement safe function interception with LD_PRELOAD and RTLD_NEXT.
  3. Avoid recursion and deadlocks in hook implementations.
  4. Collect deterministic call metrics and emit logs safely.
  5. Debug tricky loader behaviors and symbol collisions.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Symbol Resolution Order and Interposition

Fundamentals Dynamic linking resolves symbols by searching a chain of objects: the main executable, preloaded libraries, and DT_NEEDED dependencies. LD_PRELOAD allows you to inject a library at the front of this chain. If your preload library exports a symbol with the same name as libc, your version wins. This is called interposition. It is a powerful mechanism for debugging, profiling, or altering program behavior without modifying source code. However, it is also fragile because it relies on loader rules and symbol visibility.

Deep Dive into the concept The loader builds a global symbol scope for the process. When resolving a symbol, it searches in a defined order: first the main executable, then LD_PRELOAD libraries, then dependencies in load order (breadth-first in many implementations). Interposition works because the first definition found is used to resolve relocations. This is why LD_PRELOAD can override malloc or open. But interposition does not reach every call. A statically linked binary performs no dynamic symbol lookup at all; a library linked with -Wl,-Bsymbolic binds references to its own symbols locally; and calls that libc makes to itself (for example, the open performed inside fopen) go through internal aliases rather than the exported symbol, so your hook never sees them.

There are subtleties with symbol versioning and visibility. If libc exports multiple versions of a symbol, the loader uses versioned symbols to resolve the exact one required. Your preload library must export a compatible version or the symbol may not be chosen. In practice, for common libc functions, exporting the unversioned symbol name often works because the loader is permissive, but this can vary. If the headers redirect a call to a different symbol (for example, open becomes open64 when a program is compiled with _FILE_OFFSET_BITS=64) or the compiler inlines the function, you may not see the calls you expect. This is why interceptors often hook multiple symbol names (open, open64, __libc_open depending on platform).
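Since the same libc entry point can be reached under more than one name, here is a hedged sketch of hooking both open and open64 (Linux and glibc assumed; open_calls is a hypothetical metric name). Each hook forwards to its own real definition through RTLD_NEXT and passes the optional mode argument along only when the flags require one. The #undef _FORTIFY_SOURCE line is there because fortified glibc headers can provide an inline open wrapper that clashes with a definition of the same name.

```c
/* Sketch (Linux/glibc assumed): count calls arriving as either open or open64
 * and forward each to its own real definition via RTLD_NEXT. */
#define _GNU_SOURCE
#undef _FORTIFY_SOURCE   /* fortified headers may supply an inline open() that clashes */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdatomic.h>

typedef int (*open_fn)(const char *, int, ...);

static _Atomic unsigned long open_calls = 0;   /* hypothetical metric shared by both names */

/* Same rule glibc applies: a mode argument is passed only with O_CREAT or O_TMPFILE. */
static int needs_mode(int flags) {
    return (flags & O_CREAT) != 0 || (flags & O_TMPFILE) == O_TMPFILE;
}

int open(const char *path, int flags, ...) {
    static open_fn real_open;
    mode_t mode = 0;
    if (needs_mode(flags)) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    if (!real_open) real_open = (open_fn)dlsym(RTLD_NEXT, "open");
    atomic_fetch_add_explicit(&open_calls, 1, memory_order_relaxed);
    return real_open(path, flags, mode);
}

int open64(const char *path, int flags, ...) {
    static open_fn real_open64;
    mode_t mode = 0;
    if (needs_mode(flags)) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }
    if (!real_open64) real_open64 = (open_fn)dlsym(RTLD_NEXT, "open64");
    atomic_fetch_add_explicit(&open_calls, 1, memory_order_relaxed);
    return real_open64(path, flags, mode);
}
```

Comparing a per-name breakdown against strace output is a quick way to see which name a given binary actually uses.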

Interposition affects the entire process, not just the target library. That means any code in the process that calls malloc will go through your interceptor, including other libraries and even your interceptor itself. This is the core challenge: you must handle reentrancy and avoid recursion. A typical pattern is to resolve the real function pointer once (using dlsym(RTLD_NEXT, "malloc")) and then use a thread-local guard to prevent recursion.

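On Linux, setuid and setgid binaries run in secure-execution mode, in which the loader ignores LD_PRELOAD entries containing a slash and only preloads set-user-ID libraries from the standard search directories, so in practice your interceptor is not loaded. Because the library never runs in that case, it cannot report the problem itself; the check has to happen outside it (for example, in a launcher that inspects the target before exec) so that your tool can explain why no output appeared. On macOS, the equivalent is DYLD_INSERT_LIBRARIES, and on Windows you need different injection techniques. For this project, focus on Linux and make the limitations explicit.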
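A minimal sketch of such a pre-exec check, assuming a hypothetical launcher that calls target_is_secure_exec on the program path before running it with LD_PRELOAD. It only inspects the set-user-ID and set-group-ID mode bits, so it is an approximation: file capabilities can also trigger secure-execution mode, and the caller's own IDs are not considered.

```c
#define _GNU_SOURCE
#include <sys/stat.h>

/* Returns 1 if the file has the set-user-ID or set-group-ID bit set,
 * 0 if it has neither, and -1 if stat() fails. File capabilities and the
 * caller's own IDs are not considered, so this is only an approximation. */
int target_is_secure_exec(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) return -1;
    return (st.st_mode & (S_ISUID | S_ISGID)) ? 1 : 0;
}
```

A launcher could print the "[warning] secure-exec" line and exit with code 7 from Section 3.7.4 when this returns 1.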

How this fits in this project You will create a preload library that intercepts malloc, open, and connect. Understanding the loader’s symbol order is what makes this possible and explains why your interceptors fire.

Definitions & key terms

  • Interposition -> Replacing a symbol with another definition at runtime.
  • LD_PRELOAD -> Environment variable specifying libraries to load first.
  • Global scope -> Symbol namespace used for resolution.
  • -Bsymbolic -> Linker option that binds symbol references locally.

Mental model diagram (ASCII)

Resolution order:
[main exe] -> [LD_PRELOAD libs] -> [DT_NEEDED libs]
   ^              ^
   |              |
 symbols resolved here first

How it works (step-by-step, with invariants and failure modes)

  1. Loader reads LD_PRELOAD and loads your interceptor library first.
  2. Loader resolves symbols; your overrides are chosen before libc.
  3. When target program calls malloc, your function is invoked.
  4. Your interceptor calls the real malloc via RTLD_NEXT.

Invariants: your interceptor must provide a correct signature and must call the real function. Failure modes: recursion loops, incorrect signatures, or missing symbol versions.

Minimal concrete example

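#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

static void* (*real_malloc)(size_t) = NULL;
static __thread int in_log = 0;   /* set while we are logging, to block recursion */

void* malloc(size_t size) {
    if (!real_malloc) {
        real_malloc = (void* (*)(size_t))dlsym(RTLD_NEXT, "malloc");
    }
    void* p = real_malloc(size);
    if (!in_log) {
        in_log = 1;
        /* snprintf into a stack buffer + write(2): fprintf may itself allocate */
        char buf[64];
        int n = snprintf(buf, sizeof buf, "malloc(%zu)=%p\n", size, p);
        if (n > 0) write(2, buf, (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1);
        in_log = 0;
    }
    return p;
}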

Common misconceptions

  • “LD_PRELOAD always works.” -> It is restricted for setuid/setgid binaries (secure-execution mode) and has no effect on statically linked binaries.
  • “You can just call printf inside malloc.” -> That may call malloc again.
  • “Interposition affects only the target function.” -> It affects the entire process.

Check-your-understanding questions

  1. Why does LD_PRELOAD allow your symbols to override libc?
  2. What does -Bsymbolic do to interposition?
  3. Why might open calls not be intercepted?

Check-your-understanding answers

  1. Preloaded libraries are searched before other dependencies in symbol resolution.
  2. It forces local binding in a library, preventing overrides from outside.
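  3. The headers may redirect open to open64, the program may call openat directly, or the call may come from inside libc (e.g. from fopen) through internal aliases that interposition cannot reach.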

Real-world applications

  • Profiling allocators, tracing system calls, or enforcing policy.

Where you’ll apply it

References

  • “The Linux Programming Interface” (Kerrisk), Ch. 42.
  • man ld.so.

Key insights Interposition works because the loader resolves symbols in a strict order and LD_PRELOAD moves your library to the front.

Summary Symbol resolution order is the foundation of function interception. If you control the order, you can control behavior.

Homework/Exercises to practice the concept

  1. Intercept puts and log its arguments.
  2. Build a shared library with -Wl,-Bsymbolic whose functions call each other, and observe that a preloaded override no longer catches those internal calls.
  3. Intercept open and open64 and compare which gets called.

Solutions to the homework/exercises

  1. Use LD_PRELOAD=./libhook.so and override puts.
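  2. Preload an override of one of the library's functions: calls from the main program are still intercepted, but calls from inside the -Bsymbolic library go straight to its own definition.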
  3. Use strace to see actual syscalls and compare with your logs.
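For the first exercise, a puts hook might look like the sketch below. puts_calls is a hypothetical counter (not thread-safe, which is acceptable for a single-threaded test program), and the argument is logged with write(2) so that logging cannot call back into puts.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int (*real_puts)(const char *) = NULL;
static unsigned long puts_calls = 0;   /* hypothetical counter; not thread-safe */

int puts(const char *s) {
    if (!real_puts) real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
    puts_calls++;
    /* Log to stderr with write(2) so the logging path never re-enters puts. */
    write(STDERR_FILENO, "[hook] puts: ", 13);
    write(STDERR_FILENO, s, strlen(s));
    write(STDERR_FILENO, "\n", 1);
    return real_puts(s);
}
```

Build it with gcc -shared -fPIC -o libhook.so hook.c -ldl and run a small program under LD_PRELOAD=./libhook.so. GCC commonly compiles printf("text\n") into a call to puts, so some printf calls will show up in this hook as well.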

2.2 RTLD_NEXT, Reentrancy, and Safe Forwarding

Fundamentals When your interceptor overrides a function, you still need access to the real implementation. dlsym(RTLD_NEXT, "symbol") returns the next definition of a symbol in the loader’s search order, skipping your interceptor. This is the standard way to forward calls. However, your interceptor may itself call functions that use the intercepted symbol, causing recursion. Avoiding recursion requires reentrancy guards and careful use of low-level functions.

Deep Dive into the concept RTLD_NEXT is a special handle that tells dlsym to search for the next symbol definition after the current object. This is essential for interceptors. The typical pattern is to resolve the real function pointer once and store it in a static variable. But initialization itself can be reentrant: if the code that resolves the function (or logs) calls the intercepted function, you will recurse before the pointer is set. To avoid this, you can use a thread-local guard (e.g., __thread int in_hook) or resolve in a constructor function that runs before normal execution.

Logging is a common source of reentrancy. printf and fprintf may allocate memory internally (for stream buffers or some conversions), so logging from within malloc can re-enter your hook. For low-level hooks, format into a stack buffer and emit it with write(2, ...). If you also intercept write, call syscall(SYS_write, ...) directly so that logging bypasses your own write hook. Another approach is to detect recursion and skip logging when already inside the hook.
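The write-only approach can be sketched as a stdio-free helper. hook_log_kv and fmt_ulong are hypothetical names; the sketch formats digits by hand so that no allocation or stdio locking is involved, and it preserves errno so the hook's caller still sees the real function's error code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Write the decimal digits of v into buf (no terminator); returns the digit count. */
static size_t fmt_ulong(char *buf, unsigned long v) {
    char tmp[24];
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++) buf[i] = tmp[n - 1 - i];   /* digits came out reversed */
    return n;
}

/* Emit "[hook] <name>=<value>\n" with a single write(2); names are capped at 64 bytes. */
void hook_log_kv(int fd, const char *name, unsigned long value) {
    char buf[128];                       /* at most 7 + 64 + 1 + 20 + 1 bytes */
    size_t name_len = strnlen(name, 64);
    size_t len = 0;
    memcpy(buf, "[hook] ", 7);
    len = 7;
    memcpy(buf + len, name, name_len);
    len += name_len;
    buf[len++] = '=';
    len += fmt_ulong(buf + len, value);
    buf[len++] = '\n';
    int saved_errno = errno;             /* do not disturb the caller's errno */
    ssize_t ignored = write(fd, buf, len);
    (void)ignored;
    errno = saved_errno;
}
```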

Thread safety is also critical. The interceptor will be called from multiple threads, so any counters or global state must be protected by atomic operations or thread-local storage. If you use a mutex, be careful not to call pthread_mutex_lock in a hook that intercepts pthread_mutex_lock or another function that uses it internally.

Finally, understand that dlsym itself can call functions that may be intercepted, depending on the loader implementation. Therefore, a safe design caches function pointers during library initialization using minimal dependencies. A constructor function (__attribute__((constructor))) can initialize function pointers before any intercepted calls occur, reducing recursion risk.
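A minimal sketch of the constructor approach, which also covers the "disable hooks gracefully" requirement from Section 3.2: every real pointer is resolved once at load time, and a hypothetical hooks_enabled flag stays 0 if any lookup fails. Constructors are not guaranteed to run before every intercepted call (libc initialization or earlier constructors may already call malloc), so a real hook should keep a lazy dlsym fallback as well.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

static void *(*real_malloc)(size_t);
static int (*real_open)(const char *, int, ...);
static int (*real_connect)(int, const struct sockaddr *, socklen_t);
static int hooks_enabled = 0;   /* hypothetical flag each hook checks before bookkeeping */

__attribute__((constructor))
static void init_hooks(void) {
    real_malloc  = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    real_open    = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    real_connect = (int (*)(int, const struct sockaddr *, socklen_t))dlsym(RTLD_NEXT, "connect");
    if (real_malloc && real_open && real_connect) {
        hooks_enabled = 1;
    } else {
        /* Report once with write(2); hooks then skip their own logic. */
        static const char msg[] = "[hook] missing symbol, hooks disabled\n";
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)ignored;
    }
}
```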

How this fits in this project You will use RTLD_NEXT to forward calls to the real malloc, open, and connect. You will implement recursion guards and safe logging to prevent infinite loops.

Definitions & key terms

  • RTLD_NEXT -> Special handle to find the next symbol definition.
  • Reentrancy -> A function being called again before the previous call returns.
  • Constructor -> Function executed when the library is loaded.
  • Thread-local storage (TLS) -> Per-thread state used to avoid global locks.

Mental model diagram (ASCII)

malloc() in app
  -> your malloc()
      -> dlsym(RTLD_NEXT, "malloc") -> real malloc()
      -> log via write()

How it works (step-by-step, with invariants and failure modes)

  1. On first call, interceptor resolves real function with RTLD_NEXT.
  2. Interceptor sets a TLS guard to prevent recursion.
  3. Interceptor calls the real function.
  4. Interceptor logs safely without calling intercepted functions.
  5. Interceptor clears guard and returns result.

Invariants: real function pointer is valid; recursion guard prevents infinite loops. Failure modes: logging recursion, deadlocks, uninitialized function pointers.

Minimal concrete example

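#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <unistd.h>

static __thread int in_hook = 0;
static int (*real_open)(const char*, int, ...) = NULL;
int open(const char* path, int flags, ...) {
    mode_t mode = 0;   /* the mode argument exists only with O_CREAT or O_TMPFILE */
    if ((flags & O_CREAT) || (flags & O_TMPFILE) == O_TMPFILE) {
        va_list ap; va_start(ap, flags); mode = (mode_t)va_arg(ap, int); va_end(ap);
    }
    if (!real_open) real_open = (int (*)(const char*, int, ...))dlsym(RTLD_NEXT, "open");
    if (in_hook) return real_open(path, flags, mode);   /* nested call: forward only */
    in_hook = 1;
    int fd = real_open(path, flags, mode);
    int saved_errno = errno;                  /* keep open's errno for the caller */
    write(2, "[hook] open\n", 12);
    errno = saved_errno;
    in_hook = 0;
    return fd;
}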

Common misconceptions

  • “printf is safe inside hooks.” -> It can re-enter your hooks.
  • “dlsym is always safe.” -> It can call loader internals that allocate memory.
  • “Global locks are fine.” -> They can deadlock in unexpected contexts.

Check-your-understanding questions

  1. Why do you need a recursion guard?
  2. Why use TLS instead of a global flag?
  3. What is the risk of calling malloc inside your malloc hook?

Check-your-understanding answers

  1. Because the hook may trigger the same symbol internally.
  2. Multiple threads could interfere with each other otherwise.
  3. You will recurse indefinitely and likely crash.

Real-world applications

  • Allocator replacements and heap profilers, such as tcmalloc from gperftools, which can replace malloc through symbol interposition.

Where you’ll apply it

  • In this project: see Section 5.8 Hints in Layers and Section 7.1 Frequent Mistakes.
  • Also used in: P04-hot-reload-dev-server for safe reload hooks.

References

  • man dlsym, man dlopen.
  • “The Linux Programming Interface” (Kerrisk), Ch. 42.

Key insights Interceptors must be written as if they are called from inside the loader itself: minimal dependencies, safe logging, and recursion guards.

Summary RTLD_NEXT gives you the real function, but safe forwarding requires reentrancy discipline. Treat hooks as low-level code.

Homework/Exercises to practice the concept

  1. Implement a write-only logger for your hooks.
  2. Add a TLS guard and verify recursion is prevented.
  3. Add per-thread counters and print totals on exit.

Solutions to the homework/exercises

  1. Use snprintf into a fixed buffer and call write(2, buf, len).
  2. Use static __thread int in_hook.
  3. Keep counters in TLS, fold each thread's count into a global atomic when the thread exits, and report the global total in a destructor.

2.3 Thread Safety, Metrics, and Deterministic Logging

Fundamentals Interceptors often collect metrics like call counts and bytes allocated. These counters must be thread-safe because the intercepted functions are frequently used by multiple threads. Deterministic logging means the output is stable and predictable for testing, so you should control when and how logs are emitted, even in multi-threaded programs.

Deep Dive into the concept Thread safety can be achieved with atomics or thread-local storage. For global counters, use stdatomic.h or compiler built-ins to avoid locks. A lock inside malloc interception can cause deadlocks because memory allocation may be used inside the locking implementation. Thread-local counters avoid contention but require aggregation at exit. For deterministic output, you should avoid logging every call because the order can be nondeterministic across threads. Instead, log summary statistics at program exit or after a fixed number of calls.

Deterministic output also requires stable time references. If you include timestamps, you should use a mockable or fixed time source during tests. For this project, the simplest solution is to omit timestamps and log only counts and sizes. If you must include time, allow a HOOK_TIME_SEED to override it for tests.
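A minimal sketch of the HOOK_TIME_SEED override, under the assumption (not stated above) that the variable holds a decimal nanosecond value and that every timestamp should simply equal it while it is set. hook_now_ns is a hypothetical helper; without the variable it falls back to CLOCK_MONOTONIC.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <time.h>

/* Returns HOOK_TIME_SEED (parsed as decimal) when set, otherwise monotonic nanoseconds. */
unsigned long long hook_now_ns(void) {
    const char *seed = getenv("HOOK_TIME_SEED");   /* assumed test-only override */
    if (seed && *seed) return strtoull(seed, NULL, 10);
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ULL + (unsigned long long)ts.tv_nsec;
}
```

Returning one fixed value keeps output byte-for-byte stable; if relative ordering matters in tests, seed plus a call counter would be a possible alternative.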

For hooks that intercept network or file I/O, you may want to include the arguments (e.g., path or address). Be careful with pointer lifetimes and avoid printf formatting that might allocate memory. Preformat into a static buffer or use minimal formatting. A common pattern is to log only in debug mode and keep production mode silent.
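Logging an argument such as a path can be sketched as follows: the line is built in a fixed stack buffer, long paths are truncated with "...", and output only happens when a hypothetical HOOK_DEBUG environment variable is set. hook_format_path, hook_log_path, and PATH_LOG_MAX are illustrative names, and the 96-byte cap is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PATH_LOG_MAX 96   /* longest path prefix that gets printed (arbitrary) */

/* Copy up to n bytes of src into buf at *len without exceeding limit. */
static void append(char *buf, size_t limit, size_t *len, const char *src, size_t n) {
    size_t room = limit - *len;          /* invariant: *len <= limit */
    if (n > room) n = room;
    memcpy(buf + *len, src, n);
    *len += n;
}

/* Format "[hook] <fn> path=<path>\n" into buf (cap >= 1); returns its length.
 * Long paths are cut at PATH_LOG_MAX bytes and marked with "...". */
size_t hook_format_path(char *buf, size_t cap, const char *fn, const char *path) {
    size_t len = 0, limit = cap - 1;     /* reserve the last byte for '\n' */
    if (!path) path = "(null)";
    size_t plen = strlen(path);
    append(buf, limit, &len, "[hook] ", 7);
    append(buf, limit, &len, fn, strlen(fn));
    append(buf, limit, &len, " path=", 6);
    if (plen > PATH_LOG_MAX) {
        append(buf, limit, &len, path, PATH_LOG_MAX);
        append(buf, limit, &len, "...", 3);
    } else {
        append(buf, limit, &len, path, plen);
    }
    buf[len++] = '\n';
    return len;
}

/* Log only in debug mode. The path is copied into the stack buffer right away,
 * so no reference to the caller's pointer outlives this call. */
void hook_log_path(const char *fn, const char *path) {
    if (!getenv("HOOK_DEBUG")) return;
    char buf[160];
    size_t n = hook_format_path(buf, sizeof buf, fn, path);
    ssize_t ignored = write(STDERR_FILENO, buf, n);
    (void)ignored;
}
```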

Finally, remember that your interceptor is loaded into other programs, so its behavior must be conservative. It should never crash the host program. Defensive coding and minimal dependencies are crucial.

How this fits in this project You will count calls to malloc, open, and connect using atomics or TLS, and emit a deterministic summary at program exit.

Definitions & key terms

  • Atomic -> Operation that is safe across threads without locks.
  • TLS -> Thread-local storage.
  • Deterministic logging -> Output that is stable across runs.

Mental model diagram (ASCII)

Thread A -> malloc -> counter++
Thread B -> malloc -> counter++
On exit -> print total

How it works (step-by-step, with invariants and failure modes)

  1. Interceptor increments atomic counters on each call.
  2. Logs are buffered or deferred.
  3. Destructor prints totals on program exit.

Invariants: counters remain consistent; logging does not allocate memory. Failure modes: data races or recursive logging.

Minimal concrete example

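#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static void* (*real_malloc)(size_t) = NULL;
static _Atomic size_t total_calls = 0;

void* malloc(size_t size) {
    if (!real_malloc) real_malloc = (void* (*)(size_t))dlsym(RTLD_NEXT, "malloc");
    /* relaxed is enough: we need an accurate count, not ordering with other data */
    atomic_fetch_add_explicit(&total_calls, 1, memory_order_relaxed);
    return real_malloc(size);
}

__attribute__((destructor))
static void report(void) {
    char buf[128];
    int n = snprintf(buf, sizeof(buf), "total_malloc_calls=%zu\n",
                     atomic_load(&total_calls));
    if (n > 0) write(2, buf, (size_t)n < sizeof(buf) ? (size_t)n : sizeof(buf) - 1);
}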

Common misconceptions

  • “Locks are always safe.” -> Locks can re-enter intercepted functions.
  • “Per-call logging is fine.” -> It can be nondeterministic and slow.

Check-your-understanding questions

  1. Why prefer atomics over mutexes in hooks?
  2. How can you make logs deterministic in multi-threaded programs?
  3. Why emit logs in a destructor?

Check-your-understanding answers

  1. Locks can cause deadlocks and may allocate memory.
  2. Emit summary logs at exit or in controlled intervals.
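  3. It runs once at exit, after most work has finished, so you print a single summary. Calls made later by other destructors or by threads still running are not included, so treat the totals as an end-of-run snapshot.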

Real-world applications

  • Production profilers and tracing tools.

Where you’ll apply it

  • In this project: see Section 6.2 Critical Test Cases and Section 7.1 Frequent Mistakes.
  • Also used in: P02-library-dependency-visualizer for deterministic output strategy.

References

  • C11 stdatomic.h documentation.
  • “Advanced Programming in the UNIX Environment” (Stevens), threading chapters.

Key insights Deterministic metrics are better than verbose logs when you hook critical functions.

Summary Thread-safe counters and deferred logging keep your interceptor stable and testable.

Homework/Exercises to practice the concept

  1. Implement TLS counters and aggregate in a destructor.
  2. Add a quiet mode, controlled by an environment variable, that disables logging.
  3. Compare performance with per-call logging vs summary logging.

Solutions to the homework/exercises

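  1. Use __thread counters, flush each one into a global atomic from a pthread key destructor when its thread exits, and sum in the library destructor (which can only read the calling thread's own TLS).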
  2. Check an environment variable before logging.
  3. Run the target program and measure runtime with time.
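Aggregating thread-local counters needs care: a library destructor runs in one thread and cannot read other threads' TLS, so each thread has to publish its count before it exits. A minimal sketch, assuming POSIX threads, uses a pthread key destructor for that; count_call, total_calls, and flush_thread are hypothetical names. Threads still running when the process exits are not counted.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static __thread unsigned long local_calls = 0;   /* hot path: no sharing, no atomics */
static __thread int registered = 0;
static _Atomic unsigned long global_calls = 0;
static pthread_key_t flush_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Runs in the exiting thread, while its TLS is still valid. */
static void flush_thread(void *unused) {
    (void)unused;
    atomic_fetch_add_explicit(&global_calls, local_calls, memory_order_relaxed);
    local_calls = 0;
}

static void make_key(void) { pthread_key_create(&flush_key, flush_thread); }

void count_call(void) {
    if (!registered) {
        registered = 1;
        pthread_once(&key_once, make_key);
        pthread_setspecific(flush_key, (void *)1);   /* non-NULL so the destructor runs */
    }
    local_calls++;
}

/* Exited threads have already flushed; add the calling thread's own count. */
unsigned long total_calls(void) {
    return atomic_load(&global_calls) + local_calls;
}

__attribute__((destructor))
static void report(void) {
    char buf[64];
    int n = snprintf(buf, sizeof buf, "[hook] calls=%lu\n", total_calls());
    if (n > 0) {
        ssize_t ignored = write(STDERR_FILENO, buf, (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1);
        (void)ignored;
    }
}
```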

3. Project Specification

3.1 What You Will Build

A shared library libintercept.so that:

  • Overrides malloc, open, and connect.
  • Logs calls safely without recursion or deadlocks.
  • Tracks deterministic summary metrics.
  • Forwards calls to the real implementations using RTLD_NEXT.

3.2 Functional Requirements

  1. Interpose functions: malloc, open, connect.
  2. Forward correctly: Use RTLD_NEXT to call real functions.
  3. Thread safety: Use atomics or TLS for counters.
  4. Deterministic logging: Summary log at exit.
  5. Failure handling: Detect missing symbols and disable hooks gracefully.

3.3 Non-Functional Requirements

  • Reliability: Never crash the host program.
  • Performance: Minimal overhead; avoid heavy logging.
  • Portability: Linux focus; document limitations.

3.4 Example Usage / Output

$ LD_PRELOAD=./libintercept.so /usr/bin/curl https://example.com
[hook] malloc calls=842 bytes=2.1MB
[hook] open calls=7
[hook] connect calls=3

3.5 Data Formats / Schemas / Protocols

Log format

[hook] malloc calls=<count> bytes=<bytes>
[hook] open calls=<count>
[hook] connect calls=<count>

3.6 Edge Cases

  • Interceptor used on setuid binary (ignored).
  • Missing symbol version in libc.
  • open called via open64 or __open.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

gcc -shared -fPIC -o libintercept.so intercept.c -ldl
LD_PRELOAD=./libintercept.so /bin/ls

3.7.2 Golden Path Demo (Deterministic)

  • Run /bin/true with preload and verify summary counts are stable.

3.7.3 CLI Transcript (Success + Failure)

$ LD_PRELOAD=./libintercept.so /bin/ls /tmp
[hook] malloc calls=12 bytes=4096
[hook] open calls=2
[hook] connect calls=0
[exit] code=0

$ LD_PRELOAD=./libintercept.so /bin/su
[warning] secure-exec: LD_PRELOAD ignored
[exit] code=7

3.7.4 If CLI: Exit Codes

  • 0: success
  • 7: preload ignored or disabled
  • 8: missing symbol

3.7.5 If Library: Usage Snippet and Errors

Install/Build

cc -shared -fPIC -o libintercept.so intercept.c -ldl

Minimal usage

LD_PRELOAD=./libintercept.so /bin/ls

Expected output

  • Summary lines printed to stderr on exit.

Error handling snippet

LD_PRELOAD=./libintercept.so /bin/su
# [warning] secure-exec: LD_PRELOAD ignored

4. Solution Architecture

4.1 High-Level Design

app -> loader -> libintercept.so
                | overrides malloc/open/connect
                | forwards to real libc via RTLD_NEXT

4.2 Key Components

Component | Responsibility | Key Decisions
Hook functions | Override libc symbols | Match exact signatures
Resolver | Find real functions | Cache dlsym(RTLD_NEXT)
Logger | Emit deterministic summary | Use write
Metrics | Thread-safe counters | Use atomics/TLS

4.3 Data Structures (No Full Code)

typedef struct {
    _Atomic size_t malloc_calls;
    _Atomic size_t malloc_bytes;
    _Atomic size_t open_calls;
    _Atomic size_t connect_calls;
} metrics_t;
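To connect this structure to the log format in Section 3.5, here is a sketch of a summary renderer. It assumes bytes are printed as a raw integer count (the "2.1MB" in Section 3.4 would need extra formatting), and metrics_format and metrics_report are hypothetical names. Formatting all three lines into one stack buffer and issuing a single write makes it less likely that they interleave with other output on stderr.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
    _Atomic size_t malloc_calls;
    _Atomic size_t malloc_bytes;
    _Atomic size_t open_calls;
    _Atomic size_t connect_calls;
} metrics_t;

/* Render the summary into buf; returns the byte count (truncated to cap - 1). */
size_t metrics_format(metrics_t *m, char *buf, size_t cap) {
    int n = snprintf(buf, cap,
                     "[hook] malloc calls=%zu bytes=%zu\n"
                     "[hook] open calls=%zu\n"
                     "[hook] connect calls=%zu\n",
                     atomic_load(&m->malloc_calls), atomic_load(&m->malloc_bytes),
                     atomic_load(&m->open_calls), atomic_load(&m->connect_calls));
    if (n < 0) return 0;
    return (size_t)n < cap ? (size_t)n : cap - 1;
}

/* Emit the whole summary with one write(2), e.g. from a destructor. */
void metrics_report(metrics_t *m, int fd) {
    char buf[256];
    size_t len = metrics_format(m, buf, sizeof buf);
    ssize_t ignored = write(fd, buf, len);
    (void)ignored;
}
```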

4.4 Algorithm Overview

Key Algorithm: Hook Call Flow

  1. Guard against recursion.
  2. Resolve real function if needed.
  3. Call real function.
  4. Update counters.
  5. Return result.

Complexity Analysis:

  • Time: O(1) per call.
  • Space: O(1).

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

intercept/
|-- src/
|   |-- intercept.c
|   `-- log.c
|-- Makefile
`-- README.md

5.3 The Core Question You’re Answering

“How does symbol resolution order allow runtime interception?”

5.4 Concepts You Must Understand First

  1. Symbol resolution order and interposition.
  2. RTLD_NEXT and recursion guards.
  3. Thread-safe deterministic logging.

5.5 Questions to Guide Your Design

  1. Which functions are safe to call inside hooks?
  2. How will you detect if your hook is running recursively?
  3. How will you handle missing symbol versions?

5.6 Thinking Exercise

If you intercept printf, list the functions it might call internally that could re-enter your interceptor. How would you avoid recursion?

5.7 The Interview Questions They’ll Ask

  1. “Why does LD_PRELOAD override libc functions?”
  2. “What is RTLD_NEXT and when do you use it?”
  3. “What are the dangers of intercepting malloc?”

5.8 Hints in Layers

Hint 1: Start with puts

Hint 2: Use write for logging

Hint 3: Add TLS recursion guards

5.9 Books That Will Help

Topic | Book | Chapter
Dynamic loading | TLPI | Ch. 42
Threads | APUE | Ch. 11

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

  • Interpose puts and log calls safely.

Phase 2: Core Functionality (2-3 days)

  • Add malloc, open, connect hooks with forwarding.

Phase 3: Polish & Edge Cases (1-2 days)

  • Add deterministic summary and secure-exec detection.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Logging | printf vs write | write | Avoid recursion
Counters | mutex vs atomics | atomics | Avoid deadlocks
Hook init | constructor vs lazy | constructor | Reduce recursion

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Hook logic | Direct call to malloc
Integration Tests | Real binaries | /bin/ls, curl
Edge Case Tests | Secure-exec | su, sudo

6.2 Critical Test Cases

  1. Intercept malloc without recursion crash.
  2. Threaded program still logs correct totals.
  3. LD_PRELOAD ignored for secure-exec -> warning printed.

6.3 Test Data

/sbin/ldconfig
/bin/ls

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Using printf in malloc | Crash or hang | Use write
Wrong signature | Corrupted stack | Match function prototype
Missing -ldl | Undefined reference to dlsym | Link with -ldl (required before glibc 2.34, harmless afterwards)

7.2 Debugging Strategies

  • Run with LD_DEBUG=bindings to see symbol resolution.
  • Use strace to validate system calls.

7.3 Performance Traps

  • Logging per call can slow programs dramatically.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add hooks for read and write.
  • Add a quiet mode controlled by an environment variable.

8.2 Intermediate Extensions

  • Export metrics as JSON on exit.
  • Add per-thread statistics.

8.3 Advanced Extensions

  • Add symbol version support.
  • Support macOS DYLD_INSERT_LIBRARIES.

9. Real-World Connections

9.1 Industry Applications

  • Debugging production binaries without recompilation.
  • Profiling allocators and I/O.
  • libeatmydata: intercepts fsync.
  • jemalloc: allocator with interposition options.

9.3 Interview Relevance

  • Shows understanding of loader symbol resolution and runtime behavior.

10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” (Kerrisk), Ch. 42.
  • man ld.so, man dlsym.

10.2 Video Resources

  • Dynamic linker internals talks.

10.3 Tools & Documentation

  • LD_DEBUG=bindings for symbol resolution tracing.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how LD_PRELOAD changes resolution order.
  • I can describe how RTLD_NEXT works.
  • I can explain why recursion guards are necessary.

11.2 Implementation

  • Hooks work on real binaries without crashes.
  • Summary logs are deterministic.
  • Thread-safe counters are correct.

11.3 Growth

  • I can explain symbol interposition in an interview.
  • I documented at least one tricky hook bug.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Intercept one libc function and forward correctly.
  • Log deterministic summary at exit.

Full Completion:

  • Intercept malloc, open, and connect safely.
  • Handles secure-exec and missing symbols.

Excellence (Going Above & Beyond):

  • Cross-platform injection or symbol version support.
  • JSON metrics exporter with deterministic output.