LEARN C LINKING DEEP DIVE

Learn C Linking: From Object Files to Process Execution

Goal: To deeply understand the entire C build and execution pipeline after the compiler is done—linkers, loaders, object files, symbols, relocations, and the magic that turns your code into a running process.


Why Learn This?

Most C programmers treat the linking and loading process as a black box. You type gcc main.c -o main, and it just works. But understanding this process is the key to mastering C and systems programming. It demystifies common errors, explains performance characteristics, and unlocks advanced techniques like plugin architectures and code patching.

After completing these projects, you will:

  • Read and understand the structure of ELF/Mach-O executable files.
  • Know exactly how a linker resolves symbols and why “undefined reference” errors happen.
  • Understand the difference between static and dynamic linking and their real-world tradeoffs.
  • Grasp how relocations work to allow code to run at any memory address.
  • Comprehend the role of the OS loader and how a program is brought to life in memory.

Core Concept Analysis

The Post-Compilation Pipeline

┌──────────────────┐      ┌──────────────────┐
│    main.c        │      │   helper.c       │
└──────────────────┘      └──────────────────┘
         │                       │
         ▼  gcc -c               ▼ gcc -c
┌──────────────────┐      ┌──────────────────┐
│    main.o        │      │   helper.o       │
│ (Object File)    │      │ (Object File)    │
└──────────────────┘      └──────────────────┘
         │                       │
         └───────────▼───────────┘
               ld (The Linker)
         ┌───────────▲───────────┐
         │           │           │
┌──────────────────┐ │ ┌──────────────────┐
│   libc.a         │ │ │   libc.so        │
│ (Static Library) │ │ │(Shared Library)  │
└──────────────────┘ │ └──────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│             my_program (Executable)      │
└──────────────────────────────────────────┘
                     │
                     ▼ OS Executes
┌──────────────────────────────────────────┐
│      ld.so (The Dynamic Loader/Linker)   │
│   (Finds libc.so, performs relocations)  │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│         Running Process in Memory        │
└──────────────────────────────────────────┘

Fundamental Concepts

  1. Object File Formats (ELF/Mach-O/PE): These are structured files containing the compiled code (.text section), initialized data (.data), symbol tables, relocation information, and other metadata needed by the linker.

  2. Symbols: A symbol is a name for a function or a variable. The linker’s main job is to connect references to a symbol with its definition.
    • Strong Symbols: The primary definition of a function or initialized global variable. Each symbol name may have only one strong definition in the whole program; a second one produces a “multiple definition” error.
    • Weak Symbols: A secondary definition. If a strong symbol is present, the weak one is ignored. If not, the weak one is used. This is often used for providing default implementations.
  3. Relocations: An entry in an object file that tells the linker, “You don’t know the final memory address of printf yet, but when you do, patch that address into my call instruction here.”

  4. Static Linking: The linker copies all the necessary code from static libraries (.a files) directly into your final executable. The result is a large, self-contained file.

  5. Dynamic Linking: The linker doesn’t copy the library code. Instead, it leaves a note in the executable saying, “At runtime, you will need libc.so.”
    • Position-Independent Code (PIC): Code generated in a special way that can be loaded at any memory address without needing modification. This is essential for shared libraries, as they can’t know where they’ll be loaded in each process.
    • Global Offset Table (GOT) & Procedure Linkage Table (PLT): Mechanisms used in PIC to look up the addresses of global variables and functions at runtime.
  6. The Loader: When you run ./my_program, the kernel’s loader reads the executable, maps its loadable segments into memory, sees from the PT_INTERP program header that it’s dynamically linked, and hands control over to the program interpreter named there (for example /lib64/ld-linux-x86-64.so.2 on x86-64 Linux, or ld-linux.so.2 on 32-bit x86). This dynamic loader then finds all the required shared libraries (.so files), maps them into memory, performs final relocations, and finally jumps to your program’s entry point, the _start function.
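
The hand-off described in item 6 can be observed from inside a running process. The following is a minimal sketch, assuming Linux with a C library that provides getauxval (glibc 2.16 or later, or musl); the function names are hypothetical and chosen only for this illustration. It reads the auxiliary vector, the small table of values the kernel passes to the new process, which includes the program’s entry point and the address where the interpreter was mapped.

```c
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(); Linux-specific */

/* Address that receives control once loading is finished (normally _start). */
unsigned long program_entry_point(void)
{
    return getauxval(AT_ENTRY);
}

/* Base address where the kernel mapped the program interpreter (ld.so).
 * This is 0 for a statically linked executable, which has no interpreter. */
unsigned long interpreter_base(void)
{
    return getauxval(AT_BASE);
}

/* Print the hand-off data for the current process. */
void print_loader_handoff(void)
{
    printf("program headers at : %#lx (%lu entries)\n",
           getauxval(AT_PHDR), getauxval(AT_PHNUM));
    printf("interpreter base   : %#lx\n", interpreter_base());
    printf("entry point        : %#lx\n", program_entry_point());
}
```

Calling print_loader_handoff() from main in a dynamically linked build and again in a build made with gcc -static should show the interpreter base change from a real address to 0, which is one way to see that static executables skip the dynamic loader entirely.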

Project List

These projects are designed to be done in order, as they build upon each other to create a complete mental model of the linking and loading process.


Project 1: Build an ELF/Mach-O Inspector

  • File: LEARN_C_LINKING_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Python, Go, Rust
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Systems Programming / Binary Formats
  • Software or Tool: A C compiler (GCC/Clang)
  • Main Book: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron

What you’ll build: A command-line tool that reads an object file (.o) or executable and prints its essential metadata: the file headers, the list of sections (like .text, .data, .bss), and the symbols defined or required by the file. Think of it as a simplified readelf or objdump.

Why it teaches the fundamentals: You cannot understand linking without first understanding the data structure a linker operates on: the object file. This project forces you to confront the binary layout, byte by byte. You’ll stop seeing executables as opaque blobs and start seeing them as structured data.

Core challenges you’ll face:

  • Parsing the main file header → maps to understanding the file’s architecture, type, and entry point
  • Locating and reading the section header table → maps to learning how the file is divided into code, data, etc.
  • Finding the symbol table and string table → maps to figuring out how symbol names are stored and referenced
  • Handling different endianness and word sizes (32/64-bit) → maps to writing portable and robust parsing code
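
One possible way to handle the endianness challenge is to never cast file bytes directly to integers, and instead assemble each field byte by byte according to the encoding recorded in the file. The sketch below assumes that approach; read_uint and the TOY_ constants are hypothetical names for this example (the constants mirror the values ELF stores in e_ident[EI_DATA]: 1 for little-endian, 2 for big-endian).

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the ELF data-encoding values found in e_ident[EI_DATA]. */
enum { TOY_DATA_LSB = 1, TOY_DATA_MSB = 2 };

/* Decode an unsigned integer of `width` bytes (1..8) starting at `p`.
 * Works the same on any host, because it never relies on host byte order. */
uint64_t read_uint(const unsigned char *p, size_t width, int data_encoding)
{
    uint64_t value = 0;
    for (size_t i = 0; i < width; i++) {
        /* Always consume the most significant byte first. */
        size_t idx = (data_encoding == TOY_DATA_MSB) ? i : width - 1 - i;
        value = (value << 8) | p[idx];
    }
    return value;
}
```

With a helper like this, a 32-bit file and a 64-bit file differ only in the widths and offsets you pass in, so one parser can cover both word sizes at the cost of reading fields individually rather than with a single fread into a struct.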

Key Concepts:

  • ELF File Format: man 5 elf on Linux is the canonical source.
  • Struct-based Parsing: “The C Programming Language” (K&R) Ch. 6 on structures.
  • File I/O: fopen, fread, fseek are your primary tools.

  • Difficulty: Intermediate
  • Time estimate: 1-2 weeks
  • Prerequisites: Solid C programming skills, including pointers, structs, and file I/O.

Real world outcome: A tool that gives you insight into any compiled program on your system.

$ ./my_readelf my_program.o
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 ...
  Class:   ELF64
  Type:    REL (Relocatable file)
  Machine: AMD x86-64

Section Headers:
  [Nr] Name      Type      Address          Offset   Size
  [ 1] .text     PROGBITS  0000000000000000 00000040 0000005a
  [ 2] .data     PROGBITS  0000000000000000 0000009c 00000004
  [ 3] .symtab   SYMTAB    0000000000000000 00000a30 000001b0

Symbol Table '.symtab':
   Num:    Value          Size Type    Bind   Name
     8: 000000000000001a    42 FUNC    GLOBAL my_function
     9: 0000000000000000     4 OBJECT  GLOBAL my_global_var
    10: 0000000000000000     0 NOTYPE  GLOBAL printf      (UNDEFINED)

Implementation Hints:

  1. Include the Header: On Linux, start by including <elf.h>. This contains the Elf64_Ehdr, Elf64_Shdr, and Elf64_Sym structs you’ll need. On macOS, you’ll need <mach-o/loader.h>.
  2. Start with the Header: fread the Elf64_Ehdr from the beginning of the file. Check the e_ident magic number to ensure it’s a valid ELF file.
  3. Find the Section Headers: The main header’s e_shoff field gives you the file offset to the section header table. e_shnum tells you how many there are. Read this table into an array of Elf64_Shdr structs.
  4. Find the String Table: Section names are not stored in the section headers themselves. Each header’s sh_name field is a byte offset into the “section header string table” (.shstrtab), and the ELF header’s e_shstrndx field tells you which section header describes that table. Read it first; you need it to print the names of all the other sections.
  5. Find Symbols: Loop through your array of section headers to find the one with type SHT_SYMTAB. This is your symbol table. Read it into an array of Elf64_Sym structs (sh_size / sh_entsize entries). The names are stored in a separate string table (.strtab): the symbol table header’s sh_link field holds the index of that section, and each symbol’s st_name is an offset into it.
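
Steps 2 through 4 can be sketched as a single function. This is a minimal illustration rather than a full inspector: it assumes a 64-bit ELF file whose byte order matches the host, does only basic validation, and uses the hypothetical names read_at and list_sections.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read `size` bytes at file offset `off`; returns 0 on success. */
static int read_at(FILE *f, long off, void *buf, size_t size)
{
    return (fseek(f, off, SEEK_SET) == 0 && fread(buf, 1, size, f) == size) ? 0 : -1;
}

/* Print the name and size of every section in a 64-bit ELF file.
 * Returns the number of sections, or -1 on any error. */
int list_sections(const char *path)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strsec;
    char *names = NULL;
    int result = -1;

    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    /* Step 2: read the header and check magic and class. */
    if (read_at(f, 0, &eh, sizeof eh) != 0) goto done;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) goto done;
    if (eh.e_ident[EI_CLASS] != ELFCLASS64) goto done;
    if (eh.e_shnum == 0 || eh.e_shentsize != sizeof(Elf64_Shdr)) goto done;
    if (eh.e_shstrndx >= eh.e_shnum) goto done;

    /* Step 3: read the whole section header table. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (!sh || read_at(f, (long)eh.e_shoff, sh, (size_t)eh.e_shnum * sizeof *sh) != 0)
        goto done;

    /* Step 4: load the section header string table. */
    strsec = &sh[eh.e_shstrndx];
    names = malloc(strsec->sh_size + 1);
    if (!names || read_at(f, (long)strsec->sh_offset, names, strsec->sh_size) != 0)
        goto done;
    names[strsec->sh_size] = '\0';   /* guard against an unterminated table */

    for (int i = 0; i < eh.e_shnum; i++) {
        const char *name = sh[i].sh_name < strsec->sh_size ? names + sh[i].sh_name : "?";
        printf("[%2d] %-20s %8lu bytes\n", i, name, (unsigned long)sh[i].sh_size);
    }
    result = eh.e_shnum;

done:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

On Linux, list_sections("/proc/self/exe") is a convenient first test because the running program is itself an ELF file. Index 0 is the reserved null section, so it prints with an empty name.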

Learning milestones:

  1. You can correctly parse and display the ELF header → You understand the basic blueprint of an executable.
  2. You can list all section names and their sizes → You see how code and data are organized.
  3. You can list all symbols → You understand the “table of contents” for the linker.
  4. Your tool can distinguish between defined and undefined symbols → You’ve identified the core work the linker needs to do.
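
For the last milestone, the defined/undefined distinction comes down to one field: a symbol whose st_shndx is SHN_UNDEF has no defining section in this file, so the linker must find it elsewhere. A minimal sketch, using the real <elf.h> macros but a hypothetical enum and function name:

```c
#include <elf.h>

/* Hypothetical categories for this example. */
enum sym_status {
    SYM_UNDEFINED,       /* referenced here, defined somewhere else */
    SYM_DEFINED_LOCAL,   /* visible only inside this object file */
    SYM_DEFINED_WEAK,    /* may be overridden by a strong definition */
    SYM_DEFINED_GLOBAL   /* strong definition visible to other objects */
};

enum sym_status classify_symbol(const Elf64_Sym *sym)
{
    unsigned bind = ELF64_ST_BIND(sym->st_info);

    if (sym->st_shndx == SHN_UNDEF)
        return SYM_UNDEFINED;
    if (bind == STB_LOCAL)
        return SYM_DEFINED_LOCAL;
    if (bind == STB_WEAK)
        return SYM_DEFINED_WEAK;
    return SYM_DEFINED_GLOBAL;
}
```

Two details this sketch leaves out: entry 0 of every symbol table is a reserved null symbol that also has st_shndx equal to SHN_UNDEF, so a real tool would skip it, and uninitialized globals compiled with -fcommon use the special index SHN_COMMON, which this function would report as a global definition.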

Project 2: A Practical Study of Symbols and Linking

  • File: LEARN_C_LINKING_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: C Programming / Linker Theory
  • Software or Tool: GCC/Clang, and your tool from Project 1.
  • Main Book: “Linkers and Loaders” by John R. Levine

What you’ll build: Not a single tool, but a series of small, targeted C programs that demonstrate specific linker behaviors. You will create experiments to prove how symbol resolution, weak symbols, and static linking work.

Why it teaches symbols: This project moves from parsing the data structures to understanding the rules the linker applies to them. You’ll create scenarios that intentionally cause linker errors or trigger specific behaviors, forcing you to understand the “why”.

Core challenges you’ll face:

  • Causing a “multiple definition” error → maps to understanding what a “strong” symbol is
  • Using weak symbols to provide default implementations → maps to practical application of weak/strong symbol rules
  • Investigating a static library (.a) file → maps to understanding that a static library is just an archive of .o files
  • Seeing how the linker pulls only needed objects from a library → maps to understanding efficient linking

Key Concepts:

  • Symbol Resolution Rules: “Computer Systems: A Programmer’s Perspective” Ch. 7
  • Weak and Strong Symbols: The __attribute__((weak)) GCC extension.
  • Static Libraries: The ar command (ar t my_library.a lists objects).

  • Difficulty: Beginner
  • Time estimate: Weekend
  • Prerequisites: Project 1, basic command-line skills.

Real world outcome: A deep, intuitive understanding of linker errors. You’ll never be confused by an “undefined reference” again. You’ll have a git repository with several small directories, each demonstrating a core linking concept with a README.md explaining the behavior.

Implementation Hints:

Experiment 1: Strong and Weak Symbols

  1. Create lib.c: #include <stdio.h> __attribute__((weak)) void func(void) { printf("Weak implementation\n"); }
  2. Create main.c: void func(void); int main(void) { func(); return 0; }
  3. Compile, link, and run: gcc main.c lib.c && ./a.out. It should print “Weak implementation”.
  4. Now, create main_strong.c: #include <stdio.h> void func(void) { printf("Strong implementation\n"); } int main(void) { func(); return 0; }
  5. Link and run again: gcc main_strong.c lib.c && ./a.out. It should print “Strong implementation”. You’ve just seen the linker prefer the strong symbol.
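
Weak symbols come in two flavors worth trying side by side: a weak definition (a default that can be replaced) and a weak reference (a declaration that is allowed to stay unresolved). The sketch below assumes GCC or Clang targeting ELF on Linux; greeting, on_startup, and run_startup_hook are hypothetical names for this illustration.

```c
#include <stddef.h>

/* Weak definition: used only when no strong `greeting` is linked in. */
__attribute__((weak)) const char *greeting(void)
{
    return "weak default";
}

/* Weak reference: declared here but defined nowhere in this file.
 * If no other object defines it, the link still succeeds and the
 * function's address evaluates to NULL at runtime. */
__attribute__((weak)) void on_startup(void);

/* Call the optional hook if some object file provided one.
 * Returns 1 if the hook ran, 0 if it was absent. */
int run_startup_hook(void)
{
    if (on_startup != NULL) {
        on_startup();
        return 1;
    }
    return 0;
}
```

Linking this file alone, greeting() returns the default and run_startup_hook() returns 0. Adding a second file with ordinary (strong) definitions of greeting and on_startup should change both results without touching this code, which is the “default implementation” pattern from the experiment.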

Experiment 2: Selective Linking from a Static Library

  1. Create lib_a.c (void a(){...}), lib_b.c (void b(){...}).
  2. Compile them: gcc -c lib_a.c lib_b.c.
  3. Create a static library: ar rcs my_lib.a lib_a.o lib_b.o.
  4. Create main.c that only calls a().
  5. Link it: gcc main.c my_lib.a -o main_a.
  6. Use your ELF inspector (or nm or objdump) on main_a. You should see that the symbol a is included, but the symbol b is not. The linker was smart enough to only pull in the object file it needed.

Learning milestones:

  1. You can intentionally create and explain a “multiple definition” error → You understand strong symbols.
  2. You can override a weak function in a library with your own strong version → You understand the weak/strong dynamic.
  3. You can explain why the order of libraries on the command line matters → You understand the linker’s single-pass resolution process.
  4. You can predict which .o files will be extracted from a .a file → You understand how static libraries work.
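
Milestones 3 and 4 both follow from one rule: when the linker reaches an archive on the command line, it extracts only members that define a symbol which is undefined at that moment. The toy model below illustrates that rule in plain C. It is a deliberately simplified sketch, not how any real linker is implemented; all type and function names are hypothetical, and the fixed-size arrays only suit tiny experiments.

```c
#include <string.h>

#define MAX_NAMES 16

typedef struct {
    const char *file;
    const char *defines[4];   /* NULL-terminated list of defined symbols */
    const char *needs[4];     /* NULL-terminated list of referenced symbols */
} ToyObject;

typedef struct {
    const char *defined[MAX_NAMES];   int n_defined;
    const char *undefined[MAX_NAMES]; int n_undefined;
} ToyLink;

static int find(const char **list, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0) return i;
    return -1;
}

/* Add an object to the link: record its definitions, then its unmet needs. */
void toy_add_object(ToyLink *l, const ToyObject *o)
{
    for (int i = 0; i < 4 && o->defines[i]; i++) {
        int u = find(l->undefined, l->n_undefined, o->defines[i]);
        if (u >= 0)   /* this definition satisfies a pending reference */
            l->undefined[u] = l->undefined[--l->n_undefined];
        if (l->n_defined < MAX_NAMES)
            l->defined[l->n_defined++] = o->defines[i];
    }
    for (int i = 0; i < 4 && o->needs[i]; i++)
        if (find(l->defined, l->n_defined, o->needs[i]) < 0 &&
            find(l->undefined, l->n_undefined, o->needs[i]) < 0 &&
            l->n_undefined < MAX_NAMES)
            l->undefined[l->n_undefined++] = o->needs[i];
}

/* Process an archive at its command-line position: pull a member only if
 * it defines a currently undefined symbol; repeat until nothing changes.
 * Returns the number of members pulled in. */
int toy_scan_archive(ToyLink *l, const ToyObject *members, int n_members)
{
    int pulled[MAX_NAMES] = {0}, total = 0, changed = 1;
    while (changed) {
        changed = 0;
        for (int m = 0; m < n_members && m < MAX_NAMES; m++) {
            if (pulled[m]) continue;
            for (int i = 0; i < 4 && members[m].defines[i]; i++) {
                if (find(l->undefined, l->n_undefined, members[m].defines[i]) >= 0) {
                    toy_add_object(l, &members[m]);
                    pulled[m] = 1; total++; changed = 1;
                    break;
                }
            }
        }
    }
    return total;
}
```

Modeling gcc main.c my_lib.a means calling toy_add_object for main.o and then toy_scan_archive for the archive: only lib_a.o is pulled. Swapping the two calls models gcc my_lib.a main.c: the archive is scanned while nothing is undefined, nothing is pulled, and a is left unresolved afterwards.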

Project 3: Relocation and the PLT/GOT

  • File: LEARN_C_LINKING_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Assembly / Dynamic Linking
  • Software or Tool: A debugger (GDB/LLDB) and a disassembler (objdump).
  • Main Book: “Linkers and Loaders” by John R. Levine

What you’ll build: Another lab-based project. You’ll write simple C code that calls a shared library function (like printf). You will then disassemble the executable and trace its execution in a debugger to see the Procedure Linkage Table (PLT) and Global Offset Table (GOT) in action.

Why it teaches relocations and PIC: This project unravels the “magic” of dynamic linking. You will see the exact machine code mechanism that allows an executable to call a library function whose address isn’t known until runtime. It connects the concepts of object files directly to CPU execution.

Core challenges you’ll face:

  • Generating readable assembly → maps to using objdump -d or GDB’s disassemble command
  • Finding the PLT and GOT sections → maps to using your inspector or readelf -S
  • Stepping through the PLT indirection in a debugger → maps to seeing the lazy binding process happen live
  • Understanding how the GOT entry is patched on the first call → maps to witnessing the dynamic loader’s work

Key Concepts:

  • Procedure Linkage Table: Excellent explanation at technovelty.org
  • Lazy Binding: “Computer Systems: A Programmer’s Perspective” Ch. 7.9
  • x86 Assembly: A basic understanding of call, jmp, and memory addressing is needed.

  • Difficulty: Advanced
  • Time estimate: 1-2 weeks
  • Prerequisites: Project 1, basic GDB skills (setting breakpoints, stepping instructions (si), examining memory (x)).

Real world outcome: A profound “aha!” moment. You will be able to look at a disassembled binary, see a call printf@plt, and know exactly what series of jumps and memory lookups the CPU will perform to find and execute printf.

Implementation Hints:

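  1. Create the test case: simple_call.c containing #include <stdio.h> int main(void) { printf("Hello, world!\n"); return 0; }.
  2. Compile it: gcc -O0 -fno-builtin -no-pie -Wl,-z,lazy simple_call.c -o simple_call. The -O0 keeps the generated code simple to follow, and -fno-builtin stops GCC from rewriting this printf call into puts. Many current distributions default to position-independent executables with immediate binding, which resolves every GOT entry before main runs; -no-pie and -Wl,-z,lazy are there so that the lazy binding described below is actually observable. Depending on your toolchain, the PLT may also be split across .plt and .plt.sec sections.
  3. Disassemble main: objdump -d simple_call | grep -A 10 "<main>:". You will see an instruction like callq <printf@plt> (newer objdump versions print call instead of callq). Note the address.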
  4. Start GDB: gdb ./simple_call.
  5. Observe the PLT: Use disassemble 'printf@plt' (the quotes keep GDB from treating @ as an operator). You’ll see a jmp to an address stored in the GOT, followed by code to invoke the dynamic loader.
  6. Set a breakpoint: break main. Run the program.
  7. Watch it happen: Use si (step instruction) to step up to and into the callq <printf@plt> instruction. You will land in the PLT stub.
  8. Examine the GOT entry for printf before the call: It will point back into the PLT stub.
  9. Step through the stub: You will see it jump to the dynamic loader (ld.so).
  10. Continue execution: continue. The program will print “Hello, world!”.
  11. Examine the GOT entry again: Now, it will contain the actual address of printf in libc.so. The next time you call printf, the PLT will jump directly there, skipping the loader.
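
The behavior traced in steps 7 through 11 can be modeled in plain C with a function pointer that rewrites itself on first use. This is only an analogy, assuming nothing about the real PLT layout: got_slot stands in for the GOT entry, lazy_stub for the resolver path into the dynamic loader, and real_add for the library function. All names are hypothetical.

```c
typedef int (*add_fn)(int, int);

int resolver_calls = 0;                      /* how often the slow path ran */

/* Stands in for the real library function (e.g. printf in libc). */
int real_add(int a, int b) { return a + b; }

int lazy_stub(int a, int b);

/* The "GOT entry": before the first call it points at the resolver stub. */
add_fn got_slot = lazy_stub;

/* Stands in for the PLT's slow path: resolve the symbol, patch the slot,
 * then forward the original call to the real function. */
int lazy_stub(int a, int b)
{
    resolver_calls++;
    got_slot = real_add;
    return got_slot(a, b);
}

/* Stands in for "call add@plt": always an indirect jump through the slot. */
int call_add(int a, int b)
{
    return got_slot(a, b);
}
```

The first call_add goes through lazy_stub and bumps resolver_calls to 1; every later call jumps straight to real_add, just as the patched GOT entry lets later printf calls skip the loader.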

Learning milestones:

  1. You can locate the PLT and GOT in an executable → You know where to look for dynamic linking machinery.
  2. You can explain what the first jmp in a PLT entry does → You understand the GOT lookup.
  3. You can trace a function call through the PLT to the dynamic loader → You have witnessed lazy binding.
  4. You can explain why subsequent calls to the same function are faster → You understand how the GOT is patched.

Project 4: The Dynamic Loader in Action (dlopen)

  • File: LEARN_C_LINKING_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Systems Programming / C
  • Software or Tool: libdl library
  • Main Book: “Advanced Programming in the UNIX Environment” by Stevens & Rago

What you’ll build: A simple plugin-based program. The main application will load shared libraries (.so files) from a plugins/ directory at runtime, look for a specific function within them (e.g., run_plugin), and execute it.

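Why it teaches loading: This project lets you play the role of the dynamic loader. The dlopen, dlsym, and dlclose functions are ordinary library calls (not system calls) that ask the dynamic loader to do on request, at runtime, what ld.so does automatically at startup: find a shared object, map it into memory, relocate it, and look up its symbols. It’s the key to building extensible applications.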

Core challenges you’ll face:

  • Compiling a shared library → maps to using the -fPIC and -shared flags correctly
  • Loading a library at runtime → maps to using dlopen and handling errors
  • Finding a symbol in the loaded library → maps to using dlsym and casting the returned void* to a function pointer
  • Managing library handles and memory → maps to understanding dlclose and its implications

Key Concepts:

  • Dynamic Loading API: man 3 dlopen.
  • Function Pointers: “The C Programming Language” (K&R) Ch. 5.11.
  • PIC/Shared Libraries: GCC documentation on -fPIC.

  • Difficulty: Intermediate
  • Time estimate: Weekend
  • Prerequisites: Solid C skills, especially function pointers.

Real world outcome: A working host program that can run new functionality by simply dropping new .so files into a directory, without recompiling the main application.

Implementation Hints:

  1. Create a Plugin Interface (plugin.h):
    #ifndef PLUGIN_H
    #define PLUGIN_H
    void run_plugin(void);
    #endif
    
  2. Create two plugins:
    • plugin1.c: #include <stdio.h> void run_plugin() { printf("Hello from Plugin 1!\n"); }
    • plugin2.c: #include <stdio.h> void run_plugin() { printf("Greetings from Plugin 2!\n"); }
  3. Compile the plugins as shared libraries (creating the plugins/ directory first):
    mkdir -p plugins
    gcc -fPIC -shared plugin1.c -o plugins/plugin1.so
    gcc -fPIC -shared plugin2.c -o plugins/plugin2.so
    
  4. Create the host application (main.c):
    #include <dlfcn.h>
    #include <stdio.h>
    
    // Function pointer type that matches our plugin's function signature
    typedef void (*plugin_func_t)(void);
    
    int main(void) {
        // RTLD_LAZY defers resolving the plugin's own function calls until first use
        void* handle = dlopen("./plugins/plugin1.so", RTLD_LAZY);
        if (!handle) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }
    
        // Clear any existing errors
        dlerror();
    
        plugin_func_t run = (plugin_func_t)dlsym(handle, "run_plugin");
        const char* dlsym_error = dlerror();
        if (dlsym_error) {
            fprintf(stderr, "dlsym failed: %s\n", dlsym_error);
            dlclose(handle);
            return 1;
        }
    
        run(); // Execute the plugin's function
    
        dlclose(handle);
        return 0;
    }
    
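  5. Compile and Run the Host: gcc main.c -ldl -o host, then ./host. The -ldl flag is required on older glibc versions, where dlopen lives in a separate libdl; since glibc 2.34 these functions are part of libc itself, so the flag is harmless but no longer needed.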
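
The host above loads one hard-coded plugin. To get the “drop a file into plugins/” behavior, the host has to scan the directory itself. The following is a minimal sketch of one way to do that, assuming POSIX opendir/readdir and the run_plugin entry point from plugin.h; has_so_suffix and run_all_plugins are hypothetical helper names.

```c
#include <dirent.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

typedef void (*plugin_func_t)(void);

/* True if `name` ends in ".so". */
static int has_so_suffix(const char *name)
{
    size_t len = strlen(name);
    return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

/* Load every *.so file in `dir`, call its run_plugin(), and unload it.
 * Returns the number of plugins run, or -1 if `dir` cannot be opened. */
int run_all_plugins(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d) return -1;

    int ran = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (!has_so_suffix(entry->d_name)) continue;

        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof path) continue;   /* path too long */

        /* RTLD_NOW reports unresolved symbols at load time, not mid-run. */
        void *handle = dlopen(path, RTLD_NOW);
        if (!handle) {
            fprintf(stderr, "skipping %s: %s\n", path, dlerror());
            continue;
        }

        dlerror();   /* clear any stale error before dlsym */
        plugin_func_t run = (plugin_func_t)dlsym(handle, "run_plugin");
        const char *err = dlerror();
        if (err) {
            fprintf(stderr, "skipping %s: %s\n", path, err);
        } else if (run) {
            run();
            ran++;
        }
        dlclose(handle);
    }
    closedir(d);
    return ran;
}
```

A host main could simply call run_all_plugins("./plugins") and report the count. A file that fails to load or lacks run_plugin is skipped with a message instead of aborting, so one bad plugin does not take down the host. The order in which readdir returns entries is unspecified, so plugins may run in any order.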

Learning milestones:

  1. You can successfully compile a .so file → You understand the flags required for shared objects.
  2. Your host program can load a .so file without crashing → You understand dlopen.
  3. You can successfully look up and call a function by name → You’ve mastered dlsym and function pointers.
  4. Your program can be extended by adding a new plugin file without recompiling the host → You understand the power of dynamic loading.

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| ELF/Mach-O Inspector | Level 2: Intermediate | 1-2 weeks | ★★★☆☆ | ★★★☆☆ |
| Symbol/Linking Study | Level 1: Beginner | Weekend | ★★★☆☆ | ★★☆☆☆ |
| PLT/GOT Relocation Lab | Level 3: Advanced | 1-2 weeks | ★★★★★ | ★★★★☆ |
| Dynamic Loader (dlopen) | Level 2: Intermediate | Weekend | ★★★★☆ | ★★★★☆ |

Recommendation

It is essential to do these projects in order. Start with Project 1: Build an ELF/Mach-O Inspector. This provides the fundamental knowledge of the data structures involved. Without it, the other projects will feel abstract and magical.

Once you have your inspector, proceed to the Symbol Study and the PLT/GOT Lab. These hands-on analysis projects will connect the file format knowledge to the actual behavior of the linker and loader. Finally, the dlopen project will let you apply this knowledge to a practical, real-world programming pattern.

This path will take you from theory (file formats) to observation (debugging) to application (plugins), giving you a robust and complete understanding of the C linking and loading process.

Summary

| Project | Main Programming Language |
|---|---|
| Build an ELF/Mach-O Inspector | C |
| A Practical Study of Symbols and Linking | C |
| Relocation and the PLT/GOT | C |
| The Dynamic Loader in Action (dlopen) | C |