Project 2: Library Dependency Visualizer

Build a tool that parses binaries and renders their shared library dependency graph.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 1-2 weeks
Language C
Prerequisites Binary file I/O, basic ELF/Mach-O concepts
Key Topics DT_NEEDED, search paths, recursion, graph output

1. Learning Objectives

By completing this project, you will:

  1. Parse ELF (Linux) or Mach-O (macOS) headers.
  2. Extract dynamic dependencies (DT_NEEDED or LC_LOAD_DYLIB).
  3. Resolve library paths using real search rules.
  4. Render a dependency graph as SVG or ASCII.

2. Theoretical Foundation

2.1 Core Concepts

  • Dynamic section entries: DT_NEEDED records required shared libraries.
  • Search path resolution: rpath, runpath, and environment variables determine actual locations.
  • Dependency graphs: Libraries depend on other libraries transitively.

2.2 Why This Matters

Understanding dependencies explains many real-world loader failures and ABI mismatches. It also reveals why packaging issues happen.

2.3 Historical Context / Background

ldd is a thin wrapper around the dynamic loader. This project exposes the underlying data and logic that ldd hides.

2.4 Common Misconceptions

  • “ldd is magic”: It follows defined rules and metadata.
  • “Dependencies are flat”: They are recursive graphs.

3. Project Specification

3.1 What You Will Build

A CLI tool that reads a binary, finds all dependencies, and outputs a dependency graph as SVG (Graphviz) or ASCII.

3.2 Functional Requirements

  1. Parse a target binary and extract its direct dependencies.
  2. Resolve each dependency to a full path.
  3. Recursively parse dependencies without infinite loops.
  4. Emit output in a chosen format.

3.3 Non-Functional Requirements

  • Performance: Avoid re-parsing libraries (use a visited set).
  • Reliability: Handle missing libraries gracefully.
  • Usability: Provide a clean CLI interface.

3.4 Example Usage / Output

$ ./libviz /usr/bin/python3 --output deps.svg
[ok] wrote deps.svg

3.5 Real World Outcome

You open the SVG and see exactly why a binary depends on libc.so, libpthread.so, and libdl.so:

$ ./libviz /usr/bin/python3 --output deps.svg
[scan] /usr/bin/python3
[found] libpython3.11.so
[found] libc.so.6
[found] libdl.so.2
[ok] wrote deps.svg

4. Solution Architecture

4.1 High-Level Design

┌───────────────┐     ┌────────────────┐     ┌──────────────┐
│ input binary  │────▶│ dependency scan│────▶│ graph output │
└───────────────┘     └────────────────┘     └──────────────┘

4.2 Key Components

Component Responsibility Key Decisions
Binary parser Read headers and dynamic section ELF vs Mach-O path
Resolver Apply search rules Respect rpath/runpath
Graph builder Track edges and nodes Avoid duplicates

4.3 Data Structures

typedef struct {
    char *name;
    char *path;
} lib_node_t;

typedef struct {
    int from;
    int to;
} lib_edge_t;

4.4 Algorithm Overview

Key Algorithm: Dependency discovery

  1. Parse binary, extract dependency names.
  2. Resolve each name to a path.
  3. Recurse into each dependency if not visited.
  4. Emit graph edges.

Complexity Analysis:

  • Time: O(N + E) for nodes and edges.
  • Space: O(N) for visited set.

5. Implementation Guide

5.1 Development Environment Setup

gcc --version
readelf --version  # useful for validation

5.2 Project Structure

project-root/
├── src/
│   ├── main.c
│   ├── elf_parser.c
│   ├── resolver.c
│   └── graph.c
└── Makefile

5.3 The Core Question You’re Answering

“What exact chain of shared libraries does this program depend on, and why?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. ELF dynamic section
    • DT_NEEDED, DT_RPATH, DT_RUNPATH
  2. Search path algorithm
    • LD_LIBRARY_PATH and default paths
  3. Graph traversal
    • Avoid cycles with a visited set

5.5 Questions to Guide Your Design

  1. Which metadata fields are required to resolve paths?
  2. How will you handle duplicate dependencies?
  3. What should the tool do when a library is missing?

5.6 Thinking Exercise

How would the graph change if you set LD_LIBRARY_PATH to include a custom directory?

5.7 The Interview Questions They’ll Ask

  1. What is DT_NEEDED and where is it stored?
  2. What is the difference between rpath and runpath?
  3. How do you avoid infinite loops in dependency graphs?

5.8 Hints in Layers

Hint 1: Start small

  • Parse a single ELF header and list DT_NEEDED entries.

Hint 2: Use a visited set

  • Hash the resolved library path.

Hint 3: Graph output

  • Emit DOT format for Graphviz.

5.9 Books That Will Help

Topic Book Chapter
Linking mechanics “Linkers and Loaders” Ch. 10
ELF basics CSAPP Ch. 7
Binary formats “Practical Binary Analysis” Ch. 2-3

5.10 Implementation Phases

Phase 1: Foundation (3-4 days)

Goals:

  • Parse a binary and list dependencies.

Tasks:

  1. Read ELF headers.
  2. Extract DT_NEEDED names.

Checkpoint: Print direct dependencies for one binary.

Phase 2: Core Functionality (4-5 days)

Goals:

  • Resolve paths and recurse.

Tasks:

  1. Implement search path resolution.
  2. Build recursive traversal.

Checkpoint: Full dependency list without duplicates.

Phase 3: Polish & Edge Cases (3-4 days)

Goals:

  • Graph output and error handling.

Tasks:

  1. Emit DOT/SVG.
  2. Handle missing libraries.

Checkpoint: Graph renders correctly.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Output format DOT/SVG/ASCII DOT + Graphviz Easy visualization
Parsing approach Manual vs libelf Manual first Learning value

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Parsing Confirm DT_NEEDED readelf -d comparison
Resolution Validate paths Compare with ldd
Graph Verify edges DOT has correct node links

6.2 Critical Test Cases

  1. A simple binary with known dependencies.
  2. A binary with recursive dependencies.
  3. A missing library path scenario.

6.3 Test Data

/usr/bin/ls
/usr/bin/python3

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Incorrect offsets Garbage names Validate with readelf
Missing runpath Wrong path Implement runpath rules
Infinite recursion Hang Track visited nodes

7.2 Debugging Strategies

  • Compare output against ldd.
  • Dump parsed offsets while developing.

7.3 Performance Traps

Repeated parsing of the same library without caching.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add ASCII graph output for terminal use.

8.2 Intermediate Extensions

  • Support both ELF and Mach-O.

8.3 Advanced Extensions

  • Highlight version mismatches and missing symbols.

9. Real-World Connections

9.1 Industry Applications

  • Packaging: Track dependency closure for distribution.
  • Debugging: Diagnose “library not found” errors.
  • lddtree: Similar tool for dependency trees.
  • patchelf: Modifies RPATH and interpreter.

9.3 Interview Relevance

  • Demonstrates knowledge of ELF metadata and loader behavior.

10. Resources

10.1 Essential Reading

  • “Linkers and Loaders” - Dynamic linking chapter.
  • TLPI - Shared library fundamentals.

10.2 Video Resources

  • Search: “ELF DT_NEEDED tutorial”.

10.3 Tools & Documentation

  • readelf: Inspect dynamic section.
  • objdump: Disassembly and headers.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how the loader finds libraries.
  • I can read DT_NEEDED entries.

11.2 Implementation

  • My tool matches ldd output for basic binaries.
  • My graph renders without duplicates.

11.3 Growth

  • I can troubleshoot dependency issues faster.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Extract direct dependencies for a binary.

Full Completion:

  • Recursive graph with correct search paths.

Excellence (Going Above & Beyond):

  • Support multiple binary formats and version annotations.

This guide was generated from SHARED_LIBRARIES_LEARNING_PROJECTS.md. For the complete learning path, see the parent directory README.