LEARN C LINKING DEEP DIVE
Learn C Linking: From Object Files to Process Execution
Goal: To deeply understand the entire C build and execution pipeline after the compiler is done: linkers, loaders, object files, symbols, relocations, and the machinery that turns your code into a running process.
Why Learn This?
Most C programmers treat the linking and loading process as a black box. You type gcc main.c -o main, and it just works. But understanding this process is the key to mastering C and systems programming. It demystifies common errors, explains performance characteristics, and unlocks advanced techniques like plugin architectures and code patching.
After completing these projects, you will:
- Read and understand the structure of ELF/Mach-O executable files.
- Know exactly how a linker resolves symbols and why “undefined reference” errors happen.
- Understand the difference between static and dynamic linking and their real-world tradeoffs.
- Grasp how relocations work to allow code to run at any memory address.
- Comprehend the role of the OS loader and how a program is brought to life in memory.
Core Concept Analysis
The Post-Compilation Pipeline
┌──────────────────┐     ┌──────────────────┐
│      main.c      │     │     helper.c     │
└──────────────────┘     └──────────────────┘
         │                        │
         ▼ gcc -c                 ▼ gcc -c
┌──────────────────┐     ┌──────────────────┐
│      main.o      │     │     helper.o     │
│  (Object File)   │     │  (Object File)   │
└──────────────────┘     └──────────────────┘
         │                        │
         └────────────▼───────────┘
               ld (The Linker)
         ┌────────────▲───────────┐
         │            │           │
┌──────────────────┐  │  ┌──────────────────┐
│      libc.a      │  │  │     libc.so      │
│ (Static Library) │  │  │ (Shared Library) │
└──────────────────┘  │  └──────────────────┘
                      │
                      ▼
┌───────────────────────────────────────────┐
│          my_program (Executable)          │
└───────────────────────────────────────────┘
                      │
                      ▼ OS Executes
┌───────────────────────────────────────────┐
│     ld.so (The Dynamic Loader/Linker)     │
│   (Finds libc.so, performs relocations)   │
└───────────────────────────────────────────┘
                      │
                      ▼
┌───────────────────────────────────────────┐
│         Running Process in Memory         │
└───────────────────────────────────────────┘
Fundamental Concepts
- Object File Formats (ELF/Mach-O/PE): These are structured files containing the compiled code (.text section), initialized data (.data), symbol tables, relocation information, and other metadata needed by the linker.
- Symbols: A symbol is a name for a function or a variable. The linker’s main job is to connect every reference to a symbol with its definition.
- Strong Symbols: The primary definition of a function or initialized global variable. You can only have one per program.
- Weak Symbols: A secondary definition. If a strong symbol is present, the weak one is ignored. If not, the weak one is used. This is often used for providing default implementations.
- Relocations: An entry in an object file that tells the linker, “You don’t know the final memory address of printf yet, but when you do, patch that address into my call instruction here.”
- Static Linking: The linker copies all the necessary code from static libraries (.a files) directly into your final executable. The result is a larger, self-contained file.
- Dynamic Linking: The linker doesn’t copy the library code. Instead, it leaves a note in the executable (in ELF, a DT_NEEDED entry) saying, “At runtime, you will need libc.so.”
- Position-Independent Code (PIC): Code generated so that it can be loaded at any memory address without modifying its instructions; references to global data and functions go through tables that are filled in at load time. This is essential for shared libraries, as they can’t know where they’ll be loaded in each process.
- Global Offset Table (GOT) & Procedure Linkage Table (PLT): Mechanisms used in PIC to look up the addresses of global variables and functions at runtime.
- The Loader: When you run ./my_program, the kernel reads the executable’s program headers and maps its segments into memory. Because the program is dynamically linked, the kernel also maps the program interpreter named in its PT_INTERP header (on Linux, typically /lib64/ld-linux-x86-64.so.2 on x86-64 or /lib/ld-linux.so.2 on 32-bit x86) and hands control to it. This dynamic loader then finds all the required shared libraries (.so files), maps them into memory, performs the remaining relocations, and finally jumps to your program’s entry point, _start.
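To make the relocation idea concrete, here is a minimal sketch of the arithmetic a linker performs when it patches a call instruction. It assumes x86-64 and the PC-relative rule used by R_X86_64_PC32 and R_X86_64_PLT32, value = S + A - P, where S is the symbol’s final address, A is the addend stored in the relocation entry, and P is the address of the 4-byte field being patched. The function name apply_pc32 and the addresses in the usage are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Patch a 32-bit PC-relative field (x86-64 R_X86_64_PC32 / PLT32 style):
 *   value = S + A - P
 * S = final address of the target symbol
 * A = addend from the relocation entry (typically -4 for a call's rel32,
 *     because the CPU measures from the end of the 4-byte field)
 * P = address of the field being patched
 * Returns 0 on success, -1 if the result does not fit in 32 bits. */
int apply_pc32(unsigned char *field, uint64_t S, int64_t A, uint64_t P) {
    int64_t value = (int64_t)(S + (uint64_t)A - P);
    if (value < INT32_MIN || value > INT32_MAX) return -1; /* relocation overflow */
    int32_t v32 = (int32_t)value;
    memcpy(field, &v32, sizeof v32); /* assumes a little-endian host, like x86-64 */
    return 0;
}
```

For a call opcode (0xe8) at 0x1000 targeting a function at 0x2000, the rel32 field sits at P = 0x1001, the patched displacement is 0x2000 - 4 - 0x1001 = 0xffb, and the CPU adds it to the address of the next instruction (0x1005) to reach 0x2000. Running readelf -r on a real .o file lists the entries (offset, type, symbol, addend) that feed a computation like this.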
Project List
These projects are designed to be done in order, as they build upon each other to create a complete mental model of the linking and loading process.
Project 1: Build an ELF/Mach-O Inspector
- File: LEARN_C_LINKING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Python, Go, Rust
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Systems Programming / Binary Formats
- Software or Tool: A C compiler (GCC/Clang)
- Main Book: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron
What you’ll build: A command-line tool that reads an object file (.o) or executable and prints its essential metadata: the file headers, the list of sections (like .text, .data, .bss), and the symbols defined or required by the file. Think of it as a simplified readelf or objdump.
Why it teaches the fundamentals: You cannot understand linking without first understanding the data structure a linker operates on: the object file. This project forces you to confront the binary layout, byte by byte. You’ll stop seeing executables as opaque blobs and start seeing them as structured data.
Core challenges you’ll face:
- Parsing the main file header → maps to understanding the file’s architecture, type, and entry point
- Locating and reading the section header table → maps to learning how the file is divided into code, data, etc.
- Finding the symbol table and string table → maps to figuring out how symbol names are stored and referenced
- Handling different endianness and word sizes (32/64-bit) → maps to writing portable and robust parsing code
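For the endianness challenge, one possible approach (an assumption, not the only design) is to avoid casting raw bytes to structs and instead assemble each field byte by byte, picking the order from e_ident[EI_DATA] (1 = ELFDATA2LSB, little-endian; 2 = ELFDATA2MSB, big-endian). For word size, e_ident[EI_CLASS] (1 = ELFCLASS32, 2 = ELFCLASS64) tells you whether the Elf32_* or Elf64_* layouts apply. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Byte-order values found in e_ident[EI_DATA] (same numbers as <elf.h>). */
enum { DATA_LSB = 1, DATA_MSB = 2 };

/* Read a 16-bit field stored in the file's byte order. */
uint16_t rd16(const unsigned char *p, int data) {
    return data == DATA_MSB ? (uint16_t)(p[0] << 8 | p[1])
                            : (uint16_t)(p[1] << 8 | p[0]);
}

/* Read a 32-bit field stored in the file's byte order. */
uint32_t rd32(const unsigned char *p, int data) {
    if (data == DATA_MSB)
        return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
    return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 | (uint32_t)p[1] << 8 | p[0];
}

/* Read a 64-bit field (e.g. e_shoff in an ELF64 header) as two 32-bit halves. */
uint64_t rd64(const unsigned char *p, int data) {
    uint64_t first = rd32(p, data), second = rd32(p + 4, data);
    return data == DATA_MSB ? (first << 32 | second) : (second << 32 | first);
}
```

Because every read goes through these helpers, the same parser works on big-endian files from a little-endian machine and vice versa, at the cost of not being able to fread directly into the <elf.h> structs.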
Key Concepts:
- ELF File Format: man 5 elf on Linux is the canonical source.
- Struct-based Parsing: “The C Programming Language” (K&R) Ch. 6 on structures.
- File I/O: fopen, fread, fseek are your primary tools.
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Solid C programming skills, including pointers, structs, and file I/O.
Real world outcome: A tool that gives you insight into any compiled program on your system.
$ ./my_readelf my_program.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 ...
Class: ELF64
Type: REL (Relocatable file)
Machine: AMD x86-64
Section Headers:
[Nr] Name Type Address Offset Size
[ 1] .text PROGBITS 0000000000000000 00000040 0000005a
[ 2] .data PROGBITS 0000000000000000 0000009c 00000004
[ 3] .symtab SYMTAB 0000000000000000 00000a30 000001b0
Symbol Table '.symtab':
Num: Value Size Type Bind Name
8: 000000000000001a 42 FUNC GLOBAL my_function
9: 0000000000000000 4 OBJECT GLOBAL my_global_var
10: 0000000000000000 0 NOTYPE GLOBAL printf (UNDEFINED)
Implementation Hints:
- Include the Header: On Linux, start by including <elf.h>. This contains the Elf64_Ehdr, Elf64_Shdr, and Elf64_Sym structs you’ll need. On macOS, you’ll need <mach-o/loader.h>.
- Start with the Header: fread the Elf64_Ehdr from the beginning of the file. Check the e_ident magic number (0x7f 'E' 'L' 'F') to ensure it’s a valid ELF file, and check e_ident[EI_CLASS] and e_ident[EI_DATA] for the word size and byte order before trusting any other field.
- Find the Section Headers: The main header’s e_shoff field gives you the file offset to the section header table, and e_shnum tells you how many entries there are. Read this table into an array of Elf64_Shdr structs.
- Find the String Table: The e_shstrndx field gives the index of the section header string table (.shstrtab). Each section’s sh_name is a byte offset into this table, so you need it to get the names of all the other sections.
- Find Symbols: Loop through your array of section headers to find the one with type SHT_SYMTAB. This is your symbol table. Read it into an array of Elf64_Sym structs. The symbol names are stored in a separate string table (.strtab); the symbol table’s sh_link field holds that string table’s section index. A symbol whose st_shndx is SHN_UNDEF is undefined in this file.
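The hints above can be combined into a compact sketch. It assumes a 64-bit little-endian ELF file on a little-endian Linux host (no byte swapping), reads the whole file into memory instead of issuing separate fseek/fread calls per table, and casts offsets in the buffer directly to the <elf.h> structs, which is convenient but skips most of the bounds and alignment checks a robust tool needs. The function names are hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
unsigned char *read_file(const char *path, size_t *len) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    unsigned char *buf = NULL;
    long size = 0;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size);
        if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    if (buf) *len = (size_t)size;
    return buf;
}

/* Validate magic, class, and byte order before trusting any other field. */
int elf_is_valid64(const unsigned char *buf, size_t len) {
    return len >= sizeof(Elf64_Ehdr) && memcmp(buf, ELFMAG, SELFMAG) == 0 &&
           buf[EI_CLASS] == ELFCLASS64 && buf[EI_DATA] == ELFDATA2LSB;
}

/* Print every section name and every .symtab entry; returns the number
 * of symbols printed, or -1 if the file cannot be parsed. */
int elf_dump(const unsigned char *buf, size_t len) {
    if (!elf_is_valid64(buf, len)) return -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)buf;
    if (eh->e_shoff == 0 || eh->e_shstrndx >= eh->e_shnum ||
        eh->e_shoff + (Elf64_Off)eh->e_shnum * sizeof(Elf64_Shdr) > len)
        return -1;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(buf + eh->e_shoff);
    /* e_shstrndx selects the section that holds all section names. */
    const char *shstr = (const char *)buf + sh[eh->e_shstrndx].sh_offset;
    int nsyms = 0;
    for (int i = 0; i < eh->e_shnum; i++) {
        printf("[%2d] %-20s size=%#lx\n", i, shstr + sh[i].sh_name,
               (unsigned long)sh[i].sh_size);
        if (sh[i].sh_type != SHT_SYMTAB) continue;
        /* A symbol table's sh_link is the index of its string table. */
        const char *strtab = (const char *)buf + sh[sh[i].sh_link].sh_offset;
        const Elf64_Sym *syms = (const Elf64_Sym *)(buf + sh[i].sh_offset);
        size_t n = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < n; j++, nsyms++) {
            int bind = ELF64_ST_BIND(syms[j].st_info);
            printf("  %016lx %-6s %s%s\n", (unsigned long)syms[j].st_value,
                   bind == STB_GLOBAL ? "GLOBAL" : bind == STB_WEAK ? "WEAK" : "LOCAL",
                   strtab + syms[j].st_name,
                   syms[j].st_shndx == SHN_UNDEF ? " (UNDEFINED)" : "");
        }
    }
    return nsyms;
}
```

With a small driver that calls read_file(argv[1], &len) and then elf_dump(buf, len), the output for a .o file can be checked against readelf -S (sections) and readelf -s (symbols).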
Learning milestones:
- You can correctly parse and display the ELF header → You understand the basic blueprint of an executable.
- You can list all section names and their sizes → You see how code and data are organized.
- You can list all symbols → You understand the “table of contents” for the linker.
- Your tool can distinguish between defined and undefined symbols → You’ve identified the core work the linker needs to do.
Project 2: A Practical Study of Symbols and Linking
- File: LEARN_C_LINKING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: C Programming / Linker Theory
- Software or Tool: GCC/Clang, and your tool from Project 1.
- Main Book: “Linkers and Loaders” by John R. Levine
What you’ll build: Not a single tool, but a series of small, targeted C programs that demonstrate specific linker behaviors. You will create experiments to prove how symbol resolution, weak symbols, and static linking work.
Why it teaches symbols: This project moves from parsing the data structures to understanding the rules the linker applies to them. You’ll create scenarios that intentionally cause linker errors or trigger specific behaviors, forcing you to understand the “why”.
Core challenges you’ll face:
- Causing a “multiple definition” error → maps to understanding what a “strong” symbol is
- Using weak symbols to provide default implementations → maps to practical application of weak/strong symbol rules
- Investigating a static library (.a) file → maps to understanding that a static library is just an archive of .o files
- Seeing how the linker pulls only needed objects from a library → maps to understanding efficient linking
Key Concepts:
- Symbol Resolution Rules: “Computer Systems: A Programmer’s Perspective” Ch. 7
- Weak and Strong Symbols: The __attribute__((weak)) GCC extension.
- Static Libraries: The ar command (ar t my_library.a lists objects).
Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Project 1, basic command-line skills.
Real world outcome:
A deep, intuitive understanding of linker errors. You’ll never be confused by an “undefined reference” again. You’ll have a git repository with several small directories, each demonstrating a core linking concept with a README.md explaining the behavior.
Implementation Hints:
Experiment 1: Strong and Weak Symbols
- Create lib.c containing #include <stdio.h> and __attribute__((weak)) void func(void) { printf("Weak implementation\n"); }
- Create main.c: void func(void); int main(void) { func(); return 0; }
- Compile and link: gcc main.c lib.c. Running ./a.out should print “Weak implementation”.
- Now, create main_strong.c containing #include <stdio.h> and void func(void) { printf("Strong implementation\n"); } int main(void) { func(); return 0; }
- Link again: gcc main_strong.c lib.c. It should print “Strong implementation”. You’ve just seen the linker prefer the strong symbol.
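In practice the weak default usually lives inside a library that also calls it. The single-file sketch below assumes GCC or Clang on an ELF platform; greeting and print_greeting are hypothetical names. When nothing else in the link defines a strong greeting, calls resolve to the weak default; linking in an object that defines a strong const char *greeting(void) would be expected to replace it everywhere, including the call inside print_greeting.

```c
#include <stdio.h>

/* Weak default: used only if no strong definition of greeting()
 * appears anywhere else in the link. */
__attribute__((weak)) const char *greeting(void) {
    return "Weak implementation";
}

/* Library code calls greeting() without knowing which definition
 * the linker selected. */
void print_greeting(void) {
    printf("%s\n", greeting());
}
```

Running nm on the compiled object should show greeting with type W (a weak defined symbol), while print_greeting shows T.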
Experiment 2: Selective Linking from a Static Library
- Create lib_a.c (void a(){...}) and lib_b.c (void b(){...}).
- Compile them: gcc -c lib_a.c lib_b.c.
- Create a static library: ar rcs my_lib.a lib_a.o lib_b.o.
- Create main.c that only calls a().
- Link it: gcc main.c my_lib.a -o main_a.
- Use your ELF inspector (or nm or objdump) on main_a. You should see that the symbol a is included, but the symbol b is not. The linker was smart enough to only pull in the object file it needed.
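Since a static library is just an archive of .o files, a simplified ar t makes a good follow-up exercise. This sketch assumes the common System V/GNU ar layout: an 8-byte "!<arch>\n" magic, then for each member a 60-byte text header (name, date, uid, gid, mode, size, and a two-byte terminator) followed by the member’s bytes, padded to an even offset. It skips GNU’s special symbol-index and long-name members, prints long-name references such as /123 raw, and does not handle BSD-style names. ar_list is a hypothetical name.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* List the members of an ar archive held in memory, like `ar t`.
 * Returns the number of regular members printed, or -1 if malformed. */
int ar_list(const unsigned char *buf, size_t len) {
    if (len < 8 || memcmp(buf, "!<arch>\n", 8) != 0) return -1;
    size_t off = 8;
    int count = 0;
    while (off + 60 <= len) {
        const char *hdr = (const char *)buf + off;
        if (memcmp(hdr + 58, "`\n", 2) != 0) return -1; /* header terminator */
        char name[17], size_field[11];
        memcpy(name, hdr, 16);            /* bytes 0-15: name, space-padded */
        name[16] = '\0';
        memcpy(size_field, hdr + 48, 10); /* bytes 48-57: size in decimal */
        size_field[10] = '\0';
        size_t size = strtoul(size_field, NULL, 10);
        if (off + 60 + size > len) return -1; /* truncated member */

        for (int i = 15; i >= 0 && name[i] == ' '; i--) name[i] = '\0';
        /* GNU ar: "/" is the symbol index, "//" the long-name table,
         * "/SYM64/" a 64-bit index; "/123" refers to a long name. */
        int special = name[0] == '/' && !isdigit((unsigned char)name[1]);
        if (!special) {
            size_t n = strlen(name);
            if (n > 0 && name[n - 1] == '/') name[n - 1] = '\0'; /* GNU terminator */
            printf("%s\n", name);
            count++;
        }
        off += 60 + size;
        if (off % 2) off++; /* member data is padded to an even offset */
    }
    return count;
}
```

Run on my_lib.a (after reading it into memory), this should print lib_a.o and lib_b.o, matching ar t my_lib.a.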
Learning milestones:
- You can intentionally create and explain a “multiple definition” error → You understand strong symbols.
- You can override a weak function in a library with your own strong version → You understand the weak/strong dynamic.
- You can explain why the order of libraries on the command line matters → You understand the linker’s single-pass resolution process.
- You can predict which .o files will be extracted from a .a file → You understand how static libraries work.
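The single-pass rule behind the library-order milestone can be modeled in a few dozen lines. This is a toy simulation of the resolution algorithm described in CS:APP Ch. 7 (a set U of unresolved references and a set D of defined symbols; items are processed left to right, and an archive is rescanned until none of its members resolves anything still in U), not how GNU ld is implemented. struct obj, struct item, and toy_link are hypothetical, and the model ignores duplicate-definition errors.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 32
#define MAX_MEMBERS 16

/* A toy object file: up to 4 symbols it defines and 4 it references. */
struct obj { const char *name; const char *defs[4]; const char *refs[4]; };
/* One command-line item: a lone object (nobjs == 1) or an archive. */
struct item { int is_archive; const struct obj *objs; int nobjs; };
struct symset { const char *s[MAX_SYMS]; int n; };

static int has(const struct symset *set, const char *sym) {
    for (int i = 0; i < set->n; i++)
        if (strcmp(set->s[i], sym) == 0) return 1;
    return 0;
}

static void add(struct symset *set, const char *sym) {
    if (!has(set, sym) && set->n < MAX_SYMS) set->s[set->n++] = sym;
}

static void del(struct symset *set, const char *sym) {
    for (int i = 0; i < set->n; i++)
        if (strcmp(set->s[i], sym) == 0) { set->s[i] = set->s[--set->n]; return; }
}

/* Add an object to the output: its definitions move into D (and out of U),
 * and its references join U unless something already defined them. */
static void take(const struct obj *o, struct symset *U, struct symset *D) {
    for (int i = 0; i < 4 && o->defs[i]; i++) { add(D, o->defs[i]); del(U, o->defs[i]); }
    for (int i = 0; i < 4 && o->refs[i]; i++)
        if (!has(D, o->refs[i])) add(U, o->refs[i]);
}

/* Process items left to right, once. Returns the number of symbols still
 * undefined at the end (each one would be a link error), or -1. */
int toy_link(const struct item *items, int nitems) {
    struct symset U = {0}, D = {0};
    for (int i = 0; i < nitems; i++) {
        const struct item *it = &items[i];
        if (!it->is_archive) { take(&it->objs[0], &U, &D); continue; }
        if (it->nobjs > MAX_MEMBERS) return -1;
        int used[MAX_MEMBERS] = {0};
        int changed = 1;
        while (changed) { /* rescan until no member resolves anything in U */
            changed = 0;
            for (int m = 0; m < it->nobjs; m++) {
                if (used[m]) continue;
                for (int k = 0; k < 4 && it->objs[m].defs[k]; k++) {
                    if (!has(&U, it->objs[m].defs[k])) continue;
                    printf("  pulled %s from archive\n", it->objs[m].name);
                    take(&it->objs[m], &U, &D);
                    used[m] = changed = 1;
                    break;
                }
            }
        }
    }
    for (int i = 0; i < U.n; i++) printf("undefined reference to `%s'\n", U.s[i]);
    return U.n;
}
```

Placing main.o before the archive models gcc main.c my_lib.a, where only lib_a.o is pulled and nothing is left undefined. Reversing the order models gcc my_lib.a main.c: when the archive is scanned, U is still empty, so nothing is pulled, and a remains undefined after main.o is processed.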
Project 3: Relocation and the PLT/GOT
- File: LEARN_C_LINKING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: N/A
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Assembly / Dynamic Linking
- Software or Tool: A debugger (GDB/LLDB) and a disassembler (objdump).
- Main Book: “Linkers and Loaders” by John R. Levine
What you’ll build: Another lab-based project. You’ll write simple C code that calls a shared library function (like printf). You will then disassemble the executable and trace its execution in a debugger to see the Procedure Linkage Table (PLT) and Global Offset Table (GOT) in action.
Why it teaches relocations and PIC: This project unravels the “magic” of dynamic linking. You will see the exact machine code mechanism that allows an executable to call a library function whose address isn’t known until runtime. It connects the concepts of object files directly to CPU execution.
Core challenges you’ll face:
- Generating readable assembly → maps to using objdump -d or GDB’s disassemble command
- Finding the PLT and GOT sections → maps to using your inspector or readelf -S
- Stepping through the PLT indirection in a debugger → maps to seeing the lazy binding process happen live
- Understanding how the GOT entry is patched on the first call → maps to witnessing the dynamic loader’s work
Key Concepts:
- Procedure Linkage Table: Excellent explanation at technovelty.org
- Lazy Binding: “Computer Systems: A Programmer’s Perspective” Ch. 7.9
- x86 Assembly: A basic understanding of call, jmp, and memory addressing is needed.
Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Project 1, basic GDB skills (setting breakpoints, stepping instructions (si), examining memory (x)).
Real world outcome:
A profound “aha!” moment. You will be able to look at a disassembled binary, see a call printf@plt, and know exactly what series of jumps and memory lookups the CPU will perform to find and execute printf.
Implementation Hints:
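- Create the test case: simple_call.c containing #include <stdio.h> and int main(void) { printf("Hello, world!\n"); return 0; }.
- Compile it: gcc -O0 -fno-builtin -Wl,-z,lazy simple_call.c -o simple_call. The -O0 keeps the generated code simple enough to follow instruction by instruction. The -fno-builtin matters because GCC typically rewrites a printf of a constant string ending in \n into a call to puts, even at -O0, so without it you would find puts@plt instead of printf@plt. The -Wl,-z,lazy requests lazy binding, in case your toolchain links with -z now by default, which resolves every GOT entry at startup and hides the behavior this lab is about.
- Disassemble main: objdump -d simple_call | grep -A 5 "<main>:". You will see an instruction like call ... <printf@plt> (older binutils print it as callq). Note the address.
- Start GDB: gdb ./simple_call.
- Observe the PLT: Use disassemble printf@plt. You’ll see a jmp through an address stored in the GOT, followed by code to invoke the dynamic loader. On toolchains with Intel CET enabled, the stub may be split between .plt.sec and .plt, but the mechanism is the same.
- Set a breakpoint: break main. Run the program with run.
- Watch it happen: Use si (step instruction) to step up to and into the call to printf@plt. You will land in the PLT stub.
- Examine the GOT entry for printf before it is resolved: use x/gx on the GOT address that GDB shows in the comment next to the stub’s jmp. It will point back into the PLT.
- Step through the stub: You will see it jump into the dynamic loader (ld.so).
- Let the first call complete: set a breakpoint on the instruction right after the call in main (break *ADDRESS, taking the address from GDB’s disassemble main), then continue. The program will print “Hello, world!” and stop there instead of exiting.
- Examine the GOT entry again: Now it will contain the actual address of printf in libc.so. The next time you call printf, the PLT will jump directly there, skipping the loader.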
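As a mental model of what you just traced, lazy binding can be imitated in plain C with a function pointer standing in for the GOT slot. This is a toy simulation, not how ld.so works internally: real PLT stubs are a few machine instructions, and the real resolver (for example _dl_runtime_resolve on glibc) lives in the dynamic loader. All names here are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void (*greet_fn)(const char *);

static void real_greet(const char *who) { printf("hello, %s\n", who); }
static void lazy_resolver(const char *who);

/* "GOT slot": starts out pointing at the resolver path, like an
 * unresolved GOT entry that points back into the PLT. */
static greet_fn got_greet = lazy_resolver;
static int resolver_calls = 0;

/* Stand-in for the loader's symbol lookup across loaded libraries. */
static greet_fn lookup_symbol(const char *name) {
    return strcmp(name, "greet") == 0 ? real_greet : NULL;
}

/* "Resolver": runs only on the first call, patches the slot, then
 * completes the call that triggered it. */
static void lazy_resolver(const char *who) {
    resolver_calls++;
    greet_fn target = lookup_symbol("greet");
    if (!target) { /* ld.so would report a symbol lookup error here */
        fprintf(stderr, "symbol lookup error: greet\n");
        abort();
    }
    got_greet = target;
    target(who);
}

/* "PLT stub": every call site goes through here and jumps via the slot. */
void plt_greet(const char *who) { got_greet(who); }

int resolver_call_count(void) { return resolver_calls; }
```

Calling plt_greet twice runs the resolver only once; the second call goes straight through the patched slot, which is the same reason subsequent calls to printf skip the loader.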
Learning milestones:
- You can locate the PLT and GOT in an executable → You know where to look for dynamic linking machinery.
- You can explain what the first jmp in a PLT entry does → You understand the GOT lookup.
- You can trace a function call through the PLT to the dynamic loader → You have witnessed lazy binding.
- You can explain why subsequent calls to the same function are faster → You understand how the GOT is patched.
Project 4: The Dynamic Loader in Action (dlopen)
- File: LEARN_C_LINKING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Systems Programming / C
- Software or Tool: libdl library
- Main Book: “Advanced Programming in the UNIX Environment” by Stevens & Rago
What you’ll build: A simple plugin-based program. The main application will load shared libraries (.so files) from a plugins/ directory at runtime, look for a specific function within them (e.g., run_plugin), and execute it.
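Why it teaches loading: This project lets you play the role of the dynamic loader. You’ll call the dynamic loader’s programmatic interface (dlopen, dlsym, dlclose) explicitly in your own code, asking ld.so to do on demand what it otherwise does automatically at startup: map a library, resolve its symbols, and later unload it. It’s the key to building extensible applications.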
Core challenges you’ll face:
- Compiling a shared library → maps to using the -fPIC and -shared flags correctly
- Loading a library at runtime → maps to using dlopen and handling errors
- Finding a symbol in the loaded library → maps to using dlsym and casting the returned void* to a function pointer
- Managing library handles and memory → maps to understanding dlclose and its implications
Key Concepts:
- Dynamic Loading API: man 3 dlopen.
- Function Pointers: “The C Programming Language” (K&R) Ch. 5.11.
- PIC/Shared Libraries: GCC documentation on -fPIC.
Difficulty: Intermediate
Time estimate: Weekend
Prerequisites: Solid C skills, especially function pointers.
Real world outcome:
A working host program that can run new functionality by simply dropping new .so files into a directory, without recompiling the main application.
Implementation Hints:
- Create a Plugin Interface (plugin.h):
```c
#ifndef PLUGIN_H
#define PLUGIN_H
void run_plugin(void);
#endif
```
- Create two plugins, plugin1.c and plugin2.c:
```c
/* plugin1.c */
#include <stdio.h>
#include "plugin.h"
void run_plugin(void) { printf("Hello from Plugin 1!\n"); }
```
```c
/* plugin2.c */
#include <stdio.h>
#include "plugin.h"
void run_plugin(void) { printf("Greetings from Plugin 2!\n"); }
```
- Compile the plugins as shared libraries, creating the plugins/ directory first:
```sh
mkdir -p plugins
gcc -fPIC -shared plugin1.c -o plugins/plugin1.so
gcc -fPIC -shared plugin2.c -o plugins/plugin2.so
```
- Create the host application (main.c):
```c
#include <dlfcn.h>
#include <stdio.h>

// Function pointer type that matches our plugin's function signature
typedef void (*plugin_func_t)(void);

int main(void) {
    void *handle = dlopen("./plugins/plugin1.so", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Clear any existing errors
    dlerror();

    plugin_func_t run = (plugin_func_t)dlsym(handle, "run_plugin");
    const char *dlsym_error = dlerror();
    if (dlsym_error) {
        fprintf(stderr, "dlsym failed: %s\n", dlsym_error);
        dlclose(handle);
        return 1;
    }

    run(); // Execute the plugin's function
    dlclose(handle);
    return 0;
}
```
- Compile and run the host: gcc main.c -ldl -o host, then ./host. On glibc older than 2.34, -ldl is required because dlopen and friends live in libdl; newer glibc versions moved them into libc, and -ldl is then harmless.
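The host program in the hints above loads a single hard-coded plugin, while the project description calls for loading everything in plugins/. One way to do that is sketched below, assuming a POSIX system with opendir/readdir; run_all_plugins is a hypothetical name. It uses RTLD_NOW instead of RTLD_LAZY so a plugin with unresolved symbols fails at dlopen time rather than in the middle of a call, and it always builds a path containing a /, because dlopen only searches the library path for bare names.

```c
#include <dirent.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

typedef void (*plugin_func_t)(void);

/* Returns 1 if name ends with ".so". */
static int has_so_suffix(const char *name) {
    size_t n = strlen(name);
    return n > 3 && strcmp(name + n - 3, ".so") == 0;
}

/* Load every *.so in dir, call its run_plugin(), then unload it.
 * Returns the number of plugins run, or -1 if dir cannot be opened. */
int run_all_plugins(const char *dir) {
    DIR *d = opendir(dir);
    if (!d) return -1;
    int ran = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (!has_so_suffix(ent->d_name)) continue;
        char path[4096];
        /* A name containing '/' is treated by dlopen as a path, not searched for. */
        snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (!handle) {
            fprintf(stderr, "skip %s: %s\n", path, dlerror());
            continue;
        }
        dlerror(); /* clear stale errors before dlsym */
        plugin_func_t run = (plugin_func_t)dlsym(handle, "run_plugin");
        const char *err = dlerror();
        if (err) {
            fprintf(stderr, "skip %s: %s\n", path, err);
        } else {
            run();
            ran++;
        }
        dlclose(handle);
    }
    closedir(d);
    return ran;
}
```

A host main could simply call run_all_plugins("plugins"). Note that readdir returns entries in no particular order, so collect and sort the names first if plugin order matters.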
Learning milestones:
- You can successfully compile a .so file → You understand the flags required for shared objects.
- Your host program can load a .so file without crashing → You understand dlopen.
- You can successfully look up and call a function by name → You’ve mastered dlsym and function pointers.
- Your program can be extended by adding a new plugin file without recompiling the host → You understand the power of dynamic loading.
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| ELF/Mach-O Inspector | Level 2: Intermediate | 1-2 weeks | ★★★☆☆ | ★★★☆☆ |
| Symbol/Linking Study | Level 1: Beginner | Weekend | ★★★☆☆ | ★★☆☆☆ |
| PLT/GOT Relocation Lab | Level 3: Advanced | 1-2 weeks | ★★★★★ | ★★★★☆ |
| Dynamic Loader (dlopen) | Level 2: Intermediate | Weekend | ★★★★☆ | ★★★★☆ |
Recommendation
It is essential to do these projects in order. Start with Project 1: Build an ELF/Mach-O Inspector. This provides the fundamental knowledge of the data structures involved. Without it, the other projects will feel abstract and magical.
Once you have your inspector, proceed to the Symbol Study and the PLT/GOT Lab. These hands-on analysis projects will connect the file format knowledge to the actual behavior of the linker and loader. Finally, the dlopen project will let you apply this knowledge to a practical, real-world programming pattern.
This path will take you from theory (file formats) to observation (debugging) to application (plugins), giving you a robust and complete understanding of the C linking and loading process.
Summary
| Project | Main Programming Language |
|---|---|
| Build an ELF/Mach-O Inspector | C |
| A Practical Study of Symbols and Linking | C |
| Relocation and the PLT/GOT | C |
| The Dynamic Loader in Action (dlopen) | C |