Project 3: LD_PRELOAD Function Interceptor

Build a shared library that intercepts libc calls via LD_PRELOAD and logs behavior.

Quick Reference

Attribute       Value
Difficulty      Advanced
Time Estimate   Weekend to 1 week
Language        C
Prerequisites   Function pointers, dlopen/dlsym basics
Key Topics      Symbol interposition, RTLD_NEXT, thread safety

1. Learning Objectives

By completing this project, you will:

  1. Explain symbol interposition and the loader search order.
  2. Implement function interceptors with LD_PRELOAD.
  3. Call the original function using dlsym(RTLD_NEXT, ...).
  4. Avoid recursion and maintain thread safety.

2. Theoretical Foundation

2.1 Core Concepts

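  • Symbol interposition: The dynamic loader binds each undefined symbol to the first definition it finds in its search order: the executable, then LD_PRELOAD libraries, then the remaining dependencies such as libc. A preloaded definition of malloc therefore wins over libc's.
  • RTLD_NEXT: A dlsym handle that finds the next definition of a symbol after the calling object in that search order. Your interceptor uses it to reach the real function.
  • Reentrancy pitfalls: An interceptor can re-enter itself through a function it calls. For example, a malloc wrapper that logs with printf recurses if printf allocates.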
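
To make the first two concepts concrete, here is a minimal sketch of an interposer for puts. It assumes glibc on Linux; the counter puts_calls is a hypothetical name added only so the effect can be observed, and the log text is illustrative.

```c
#define _GNU_SOURCE            /* must precede all includes to expose RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

static unsigned long puts_calls;   /* hypothetical counter, for illustration */

/* Same prototype as libc's puts, so callers bind to this definition first. */
int puts(const char *s)
{
    static int (*real_puts)(const char *);   /* cached "next" definition */

    if (real_puts == NULL)
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
    if (real_puts == NULL)
        return EOF;                           /* nothing to forward to */

    puts_calls++;
    static const char tag[] = "[intercept] puts\n";
    ssize_t w = write(STDERR_FILENO, tag, sizeof tag - 1);  /* no stdio here */
    (void)w;

    return real_puts(s);
}
```

Built with something like gcc -shared -fPIC -o libintercept.so intercept.c -ldl (the -ldl is only needed on older glibc versions), the library is activated by setting LD_PRELOAD=./libintercept.so in front of the target command.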

2.2 Why This Matters

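Many user-space tracing, profiling, and fault-injection tools work this way: they need no kernel module, no ptrace, and no changes to the target binary. It gives you visibility into the library calls of a dynamically linked program with very little code.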

2.3 Historical Context / Background

LD_PRELOAD is an environment variable read by the dynamic linker (ld.so) that lists shared objects to load before all others. It has long been used for debugging, testing, and sometimes exploitation.

2.4 Common Misconceptions

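  • “Interception is universal”: Statically linked binaries have no dynamic loader and ignore LD_PRELOAD, and the loader restricts it for setuid/setgid programs (secure-execution mode). Code that issues system calls directly, without going through a libc wrapper, also bypasses it.
  • “printf is safe”: Many libc functions internally call functions you intercept. A malloc wrapper that logs through printf can recurse, because stdio may allocate its buffers with malloc.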

3. Project Specification

3.1 What You Will Build

A shared library that intercepts malloc, free, and open, logs usage, and reports totals on exit.

3.2 Functional Requirements

  1. Implement wrappers for at least two libc functions.
  2. Use dlsym(RTLD_NEXT, ...) to call the original.
  3. Prevent recursion with a thread-local reentrancy guard, and log with write rather than stdio.
  4. Print summary statistics at program exit.

3.3 Non-Functional Requirements

  • Performance: Minimal overhead per call.
  • Reliability: Should not crash target programs.
  • Usability: Simple LD_PRELOAD=... usage.

3.4 Example Usage / Output

$ LD_PRELOAD=./libintercept.so /bin/ls
[intercept] open("/etc/ld.so.cache") = 3
[intercept] malloc(1024) = 0x7f8a...
[summary] allocs=120 bytes=98304

3.5 Real World Outcome

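You can trace real, dynamically linked programs without recompiling them. The output below is illustrative and assumes a connect wrapper in addition to the required ones: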

$ LD_PRELOAD=./libintercept.so /usr/bin/curl https://example.com
[intercept] connect(fd=5, addr=93.184.216.34:443)
[intercept] malloc(4096) = 0x7f8a...
[summary] allocs=847 bytes=2304000

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│ target app   │────▶│ libintercept.so │────▶│ libc real fn │
└──────────────┘     └─────────────────┘     └──────────────┘

4.2 Key Components

Component          Responsibility        Key Decisions
Interceptor funcs  Log and forward       Use RTLD_NEXT
Guard              Prevent recursion     Thread-local flag
Reporter           Print totals at exit  atexit hook

4.3 Data Structures

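#include <stddef.h>  /* size_t */

/* Totals updated by the malloc wrapper and printed at exit. */
typedef struct {
    size_t alloc_count;  /* number of successful malloc calls */
    size_t alloc_bytes;  /* sum of requested sizes, in bytes */
} alloc_stats_t;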
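
Because wrappers can run on several threads at once, plain size_t counters can lose updates. One possible variant, sketched below, uses C11 atomics and prints the summary from a GCC/Clang destructor. The names atomic_alloc_stats_t, g_stats, stats_record, and stats_report are hypothetical, and the destructor is an assumption chosen for brevity in place of an atexit hook.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Atomic variant of alloc_stats_t so concurrent wrappers do not lose updates. */
typedef struct {
    atomic_size_t alloc_count;
    atomic_size_t alloc_bytes;
} atomic_alloc_stats_t;

static atomic_alloc_stats_t g_stats;   /* static storage starts at zero */

/* Hypothetical helper: a malloc wrapper calls this after a successful call. */
static void stats_record(size_t size)
{
    atomic_fetch_add_explicit(&g_stats.alloc_count, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&g_stats.alloc_bytes, size, memory_order_relaxed);
}

/* Runs at normal process exit (or when the library is unloaded). */
__attribute__((destructor))
static void stats_report(void)
{
    char buf[96];
    /* Format into a stack buffer, then emit with a single write call. */
    int n = snprintf(buf, sizeof buf, "[summary] allocs=%zu bytes=%zu\n",
                     (size_t)atomic_load(&g_stats.alloc_count),
                     (size_t)atomic_load(&g_stats.alloc_bytes));
    if (n > 0) {
        size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
        ssize_t w = write(STDERR_FILENO, buf, len);
        (void)w;                       /* reporting is best effort */
    }
}
```

Relaxed ordering is enough here because the counters are independent totals that are only read once, at exit. Registering stats_report with atexit from a constructor would work as well.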

4.4 Algorithm Overview

Key Algorithm: Interpose and forward

  1. Resolve original symbol with dlsym(RTLD_NEXT, ...).
  2. Log parameters.
  3. Call original function.
  4. Update stats and return.

Complexity Analysis:

  • Time: O(1) per intercepted call.
  • Space: O(1) global state.
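
The four steps can be sketched for malloc as follows. This is a simplified illustration assuming glibc on Linux, not a hardened implementation: the counters are plain (non-atomic) statics, the log line carries no arguments, and the names in_hook, alloc_count, and alloc_bytes are hypothetical.

```c
#define _GNU_SOURCE            /* must precede all includes to expose RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>
#include <unistd.h>

static void *(*real_malloc)(size_t);     /* cached original */
static __thread int in_hook;             /* per-thread reentrancy guard */
static size_t alloc_count, alloc_bytes;  /* not atomic: sketch only */

void *malloc(size_t size)
{
    /* Re-entered from inside our own hook: forward without logging. */
    if (in_hook)
        return real_malloc ? real_malloc(size) : NULL;

    in_hook = 1;
    if (real_malloc == NULL)             /* step 1: resolve once, then cache */
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

    void *p = NULL;
    if (real_malloc != NULL) {
        static const char tag[] = "[intercept] malloc\n";
        ssize_t w = write(STDERR_FILENO, tag, sizeof tag - 1);  /* step 2 */
        (void)w;

        p = real_malloc(size);           /* step 3: call the original */
        if (p != NULL) {                 /* step 4: update stats */
            alloc_count++;
            alloc_bytes += size;
        }
    }
    in_hook = 0;
    return p;
}
```

The guard is set before dlsym runs, so an allocation made during symbol resolution returns NULL instead of recursing. Whether dlsym allocates at all depends on the libc version, and interposing calloc as well usually needs extra bootstrap handling.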

5. Implementation Guide

5.1 Development Environment Setup

gcc --version
man ld.so

5.2 Project Structure

project-root/
├── intercept.c
└── Makefile

5.3 The Core Question You’re Answering

“How does the dynamic loader choose which function implementation to call?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Symbol resolution order
    • Preload libraries vs default paths
  2. Function pointers
    • Matching signatures exactly
  3. Reentrancy hazards
    • Logging functions calling intercepted functions
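
As an illustration of why signatures must match exactly, open is variadic: the mode argument is present only when the flags can create a file. A wrapper has to read it conditionally, as in this sketch, which assumes glibc on Linux and uses a hypothetical counter open_calls.

```c
#define _GNU_SOURCE            /* must precede all includes to expose RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <sys/types.h>
#include <unistd.h>

static unsigned long open_calls;   /* hypothetical counter, for illustration */

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;

    /* Only read the third argument when the caller was required to pass it. */
    if ((flags & O_CREAT) != 0
#ifdef O_TMPFILE
        || (flags & O_TMPFILE) == O_TMPFILE
#endif
       ) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);   /* mode_t is promoted through "..." */
        va_end(ap);
    }

    if (real_open == NULL)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
    if (real_open == NULL) {
        errno = ENOSYS;                   /* nothing to forward to */
        return -1;
    }

    open_calls++;
    return real_open(path, flags, mode);
}
```

On glibc, open64 and openat are separate symbols, so programs that use those will not pass through this wrapper unless they are interposed too.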

5.5 Questions to Guide Your Design

  1. How will you avoid recursive calls while logging?
  2. Which functions are safe to call inside interceptors?
  3. How will you store state across threads?

5.6 Thinking Exercise

Design an interceptor for connect() that blocks connections to a specific IP.

5.7 The Interview Questions They’ll Ask

  1. What does LD_PRELOAD do?
  2. Why is RTLD_NEXT necessary?
  3. What breaks when you intercept malloc without care?

5.8 Hints in Layers

Hint 1: Minimal logging

  • Use write(STDERR_FILENO, ...) instead of printf; write is a thin syscall wrapper and does not allocate.
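
Logging a number without printf means formatting it by hand. The sketch below shows one way to do that with a stack buffer and a single write call; fmt_ulong and log_ulong are hypothetical helper names, not part of any library.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes v in decimal into buf and returns the digit
   count. buf must hold at least 20 bytes (enough for a 64-bit value). */
static size_t fmt_ulong(char *buf, unsigned long v)
{
    char tmp[20];
    size_t n = 0;

    do {                               /* least significant digit first */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    for (size_t i = 0; i < n; i++)     /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Hypothetical helper: emits "<prefix><value>\n" on stderr in one write. */
static void log_ulong(const char *prefix, unsigned long value)
{
    char line[128];
    size_t len = strlen(prefix);

    if (len > sizeof line - 21)        /* keep room for 20 digits and '\n' */
        len = sizeof line - 21;
    memcpy(line, prefix, len);
    len += fmt_ulong(line + len, value);
    line[len++] = '\n';

    ssize_t w = write(STDERR_FILENO, line, len);
    (void)w;                           /* logging is best effort */
}
```

A malloc wrapper could then call log_ulong("[intercept] malloc size=", size) without touching stdio or the heap.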

Hint 2: Guard

  • Use a thread-local guard variable (__thread in GCC/Clang, or _Thread_local in C11).

Hint 3: Resolve once

  • Cache the original function pointer.
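
Caching matters in multi-threaded targets too, where two threads may hit the wrapper before the pointer is resolved. A sketch of one approach for free, using a C11 atomic pointer, follows; it assumes glibc on Linux, and free_fn, get_real_free, and free_calls are hypothetical names.

```c
#define _GNU_SOURCE            /* must precede all includes to expose RTLD_NEXT */
#include <dlfcn.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*free_fn)(void *);

static _Atomic(free_fn) real_free;   /* cached original, NULL until resolved */
static atomic_ulong free_calls;      /* hypothetical counter */

/* Returns the cached pointer, resolving it on first use. Two threads may both
   call dlsym, but they store the same value, so the duplicate work is harmless. */
static free_fn get_real_free(void)
{
    free_fn fn = atomic_load_explicit(&real_free, memory_order_acquire);
    if (fn == NULL) {
        fn = (free_fn)dlsym(RTLD_NEXT, "free");
        atomic_store_explicit(&real_free, fn, memory_order_release);
    }
    return fn;
}

void free(void *ptr)
{
    free_fn fn = get_real_free();

    atomic_fetch_add_explicit(&free_calls, 1, memory_order_relaxed);
    if (fn != NULL)
        fn(ptr);                     /* forward to the real free */
}
```

After the first call, each free costs one atomic load plus the forwarded call, which keeps the per-call overhead small.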

5.9 Books That Will Help

Topic              Book                                Chapter
Interposition      TLPI                                Ch. 42
Function pointers  “C Programming: A Modern Approach”  Ch. 17
Loader behavior    Drepper PDF                         symbol lookup

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Goals:

  • Intercept a simple function like puts.

Tasks:

  1. Create shared library with -shared -fPIC.
  2. Override puts and forward to original.

Checkpoint: Running a program under LD_PRELOAD=./libintercept.so logs each puts call.

Phase 2: Core Functionality (2-3 days)

Goals:

  • Intercept malloc and track stats.

Tasks:

  1. Add guard to avoid recursion.
  2. Track total allocations.

Checkpoint: Summary printed at exit.

Phase 3: Polish & Edge Cases (2-3 days)

Goals:

  • Thread safety and stability.

Tasks:

  1. Add thread-local guard.
  2. Avoid unsafe logging.

Checkpoint: Works on multi-threaded apps.

5.11 Key Implementation Decisions

Decision        Options                 Recommendation  Rationale
Logging method  printf vs write         write           Avoid recursion
Guard type      global vs thread-local  thread-local    Multi-thread safety

6. Testing Strategy

6.1 Test Categories

Category      Purpose                Examples
Interception  Confirm override       LD_PRELOAD on /bin/ls
Forwarding    Ensure real fn called  Output still correct
Stability     Avoid recursion        No crashes on curl

6.2 Critical Test Cases

  1. Intercepted function logs and returns correct value.
  2. No infinite recursion when intercepting malloc.
  3. Works on real programs without crash.

6.3 Test Data

/bin/ls, /usr/bin/curl

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall            Symptom                            Solution
Wrong signature    Crash or garbage                   Match exact prototype
Using printf       Recursion                          Use write
Missing RTLD_NEXT  Wrapper resolves to itself, loops  Resolve with dlsym(RTLD_NEXT, ...)

7.2 Debugging Strategies

  • Use LD_DEBUG=libs,bindings to see loader decisions.
  • Test with a simple program before big apps.

7.3 Performance Traps

Intercepting hot functions adds overhead; keep logging minimal.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Intercept open and log file paths.

8.2 Intermediate Extensions

  • Build a per-thread allocation tracker.

8.3 Advanced Extensions

  • Implement a policy engine that blocks disallowed libc calls (programs that issue syscalls directly bypass it, so this is not a security boundary).

9. Real-World Connections

9.1 Industry Applications

  • Profilers: Track allocations and I/O patterns.
  • Security tooling: Enforce runtime policy without kernel hooks.
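
9.2 Related Open Source Projects

  • libeatmydata: Interposes fsync and related calls and turns them into no-ops for performance.
  • jemalloc: General-purpose allocator that can replace malloc in unmodified programs via LD_PRELOAD.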

9.3 Interview Relevance

  • Shows deep understanding of symbol resolution and runtime behavior.

10. Resources

10.1 Essential Reading

  • TLPI (The Linux Programming Interface), Ch. 42: advanced features of shared libraries.
  • Drepper, “How To Write Shared Libraries”: symbol lookup details.

10.2 Video Resources

  • Search: “LD_PRELOAD tutorial”.

10.3 Tools & Documentation

  • ld.so: man ld.so (LD_PRELOAD, LD_DEBUG)
  • dlopen/dlsym: man dlopen, man dlsym (RTLD_NEXT)

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain loader symbol order.
  • I can describe RTLD_NEXT behavior.

11.2 Implementation

  • Interceptors log and forward correctly.
  • No recursion or crashes on real programs.

11.3 Growth

  • I can design a custom tracing tool for my stack.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Intercept one libc function and log it.

Full Completion:

  • Track allocation stats and print a summary.

Excellence (Going Above & Beyond):

  • Add policy-based blocking or per-thread reporting.

This guide was generated from SHARED_LIBRARIES_LEARNING_PROJECTS.md. For the complete learning path, see the parent directory README.