Project 1: Memory Inspector Tool

The Core Question: “What IS memory? Where do my variables actually live, and how can I see them?”

Project Overview

Attribute       Value
Difficulty      Intermediate
Time Estimate   Weekend (8-16 hours)
Language        C
Prerequisites   Basic C syntax, compiling with gcc/clang
Main Book       “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron

Learning Objectives

By completing this project, you will:

  1. Understand memory as numbered bytes - Not abstract “variables” but actual addresses in memory
  2. Visualize stack vs heap - See how local variables and malloc’d memory occupy different regions
  3. Master pointer semantics - Know exactly what &x, *p, and p + 1 mean at the hardware level
  4. Use debuggers effectively - Verify your understanding with lldb/gdb
  5. Develop memory intuition - Instinctively think of memory as a big array of bytes

Theoretical Foundation

What Memory Actually Is

At the hardware level, your computer’s RAM is a giant array of bytes. Each byte has:

  • An address: A number from 0 to (RAM_SIZE - 1)
  • A value: An 8-bit number (0-255)

When you write int x = 42; in C, you’re saying (on most current platforms, where int is 4 bytes):

“Reserve 4 consecutive bytes somewhere, and store the binary representation of 42 in them.”

Memory Address    Contents (hex)    Contents (decimal)
0x7ffeefbff4ac    2A                42    <- x lives here (4 bytes: 2A 00 00 00)
0x7ffeefbff4ad    00                0
0x7ffeefbff4ae    00                0
0x7ffeefbff4af    00                0

The Address-of Operator (&)

The & operator returns the address where a variable is stored:

int x = 42;
int *p = &x;  // p contains the ADDRESS of x

// If x is at 0x7ffeefbff4ac:
// - x contains 42 (the VALUE)
// - &x returns 0x7ffeefbff4ac (the ADDRESS)
// - p contains 0x7ffeefbff4ac (same as &x)
// - *p returns 42 (the value AT that address)

The Process Memory Layout

When your program runs, the operating system creates a virtual address space:

High addresses (0xFFFFFFFF...)
┌────────────────────────────┐
│         Kernel Space       │  ← OS code (you can't touch this)
├────────────────────────────┤
│                            │
│          Stack             │  ← Local variables, return addresses
│            ↓               │    GROWS DOWNWARD
│                            │
│         (empty)            │
│                            │
│            ↑               │
│          Heap              │  ← malloc'd memory
│                            │    GROWS UPWARD
├────────────────────────────┤
│          BSS               │  ← Uninitialized globals
├────────────────────────────┤
│          Data              │  ← Initialized globals
├────────────────────────────┤
│          Text              │  ← Your compiled code (read-only)
└────────────────────────────┘
Low addresses (0x00000000...)

Stack vs Heap: The Key Distinction

Aspect             Stack                                  Heap
Allocation         Automatic (when function called)       Manual (malloc())
Deallocation       Automatic (when function returns)      Manual (free())
Speed              Very fast (just move stack pointer)    Slower (allocator overhead)
Size               Limited (~8MB default on Linux)        Limited by RAM
Growth             Downward (toward lower addresses)      Upward (toward higher addresses)
Typical addresses  High (0x7fff...)                       Lower (0x6000...)

Why Stack Grows Downward

When you call a function, a new “stack frame” is pushed:

void bar() {
    int z = 30;    // Lives at lower address than y
}

void foo() {
    int y = 20;    // Lives at lower address than x
    bar();
}

int main() {
    int x = 10;    // Lives at high address
    foo();
}

Stack during bar():
┌─────────────────────┐ High addresses
│   main's frame      │
│   int x = 10        │  ← 0x7ffeefbff4bc
├─────────────────────┤
│   foo's frame       │
│   int y = 20        │  ← 0x7ffeefbff49c
├─────────────────────┤
│   bar's frame       │
│   int z = 30        │  ← 0x7ffeefbff47c
└─────────────────────┘ Low addresses (stack grows down)

Pointer Arithmetic

C pointers are “type-aware”: arithmetic moves by the size of the pointed-to type:

int arr[3] = {10, 20, 30};
int *p = arr;

// If p points to address 0x1000:
// p + 0  →  0x1000  →  arr[0] = 10
// p + 1  →  0x1004  →  arr[1] = 20  (moved by 4 bytes = sizeof(int))
// p + 2  →  0x1008  →  arr[2] = 30

This is why char *p and int *p behave differently:

  • char *p: p + 1 moves by 1 byte
  • int *p: p + 1 moves by 4 bytes
  • double *p: p + 1 moves by 8 bytes

Endianness: How Multi-Byte Values Are Stored

On x86/x64 (little-endian), the least significant byte comes first:

int x = 0x12345678;

Memory layout (little-endian):
Address   Value
0x1000    0x78    ← Least significant byte first
0x1001    0x56
0x1002    0x34
0x1003    0x12    ← Most significant byte last

Project Specification

What You’re Building

A command-line tool that visualizes the memory layout of a C program, showing:

  1. Stack variables and their addresses
  2. Heap allocations and their addresses
  3. How addresses change during function calls
  4. Raw byte contents of variables
  5. (Optional) Memory corruption demonstrations

Core Features

Feature 1: Stack Variable Visualization

$ ./memory_inspector --stack
[STACK VARIABLES]
Variable 'x' (int):
  Address: 0x7ffeefbff4ac
  Value: 42
  Size: 4 bytes
  Raw bytes: 2a 00 00 00

Feature 2: Heap Allocation Visualization

$ ./memory_inspector --heap
[HEAP ALLOCATIONS]
Pointer 'p' points to:
  Address: 0x600000004000
  Value: 100
  Size: 4 bytes
  Location: HEAP

Feature 3: Stack Frame Inspection

$ ./memory_inspector --frames
[STACK FRAME VISUALIZATION]
Calling sequence: main() → foo() → bar()

In bar(): z at 0x7ffeefbff47c = 30
In foo(): y at 0x7ffeefbff49c = 20
In main(): x at 0x7ffeefbff4bc = 10

Notice: Addresses DECREASE as we go deeper!

Feature 4: Raw Byte Dump

$ ./memory_inspector --bytes
Integer: 0x12345678
Byte-by-byte (little-endian):
  Byte 0: 0x78 (least significant)
  Byte 1: 0x56
  Byte 2: 0x34
  Byte 3: 0x12 (most significant)

Solution Architecture

Module Design

memory_inspector/
├── main.c              # Entry point, argument parsing
├── stack_demo.c        # Stack visualization functions
├── heap_demo.c         # Heap allocation demonstrations
├── frame_demo.c        # Stack frame hierarchy
├── bytes_demo.c        # Raw byte inspection
├── utils.c             # Printing utilities
├── utils.h             # Shared declarations
└── Makefile

Key Data Structures

// For tracking memory regions
typedef enum {
    REGION_STACK,
    REGION_HEAP,
    REGION_BSS,
    REGION_DATA,
    REGION_TEXT,
    REGION_UNKNOWN
} MemoryRegion;

// For describing a variable's memory location
typedef struct {
    const char *name;
    void *address;
    size_t size;
    const char *type_name;
    MemoryRegion region;
} MemoryInfo;

Core Functions to Implement

// Determine which memory region an address belongs to
MemoryRegion classify_address(void *addr);

// Print variable information
void inspect_variable(const char *name, void *addr, size_t size, const char *type);

// Dump raw bytes of a variable
void dump_bytes(void *addr, size_t size);

// Demonstrate stack frame hierarchy
void demonstrate_stack_frames(void);

// Show heap allocation behavior
void demonstrate_heap(void);

Implementation Guide

Phase 1: Basic Address Printing (2-3 hours)

Goal: Print the address of a single variable.

Start with the simplest possible program:

#include <stdio.h>

int main(void) {
    int x = 42;
    printf("x is at address %p, value = %d\n", (void*)&x, x);
    return 0;
}

Checkpoint Questions:

  • What format does %p print in? (Hexadecimal)
  • Why cast to (void*)? (Portability: %p expects a void * argument)
  • Run it 3 times. Does the address change? (Usually yes, because of ASLR)

Extension: Add more variables and observe their relative positions:

int a = 1;
int b = 2;
int c = 3;
printf("a: %p, b: %p, c: %p\n", (void*)&a, (void*)&b, (void*)&c);
// Observe the relative order. The compiler chooses where locals sit inside
// one frame, so the addresses may or may not decrease from a to c.

Phase 2: Stack vs Heap Comparison (2-3 hours)

Goal: Show the difference between stack and heap addresses.

#include <stdio.h>
#include <stdlib.h>

void compare_stack_heap(void) {
    int stack_var = 100;
    int *heap_ptr = malloc(sizeof(int));
    if (heap_ptr == NULL) {
        return;  // allocation failed; nothing to compare
    }
    *heap_ptr = 200;

    printf("Stack variable at: %p\n", (void*)&stack_var);
    printf("Heap allocation at: %p\n", (void*)heap_ptr);

    // Notice: stack addresses are much higher
    // Stack: 0x7fff... (high addresses)
    // Heap:  0x6000... (lower addresses)

    free(heap_ptr);
}

Key Insight: You can often tell whether memory is stack or heap by looking at the address prefix:

  • Stack addresses typically start with 0x7ff... on x86-64 Linux/macOS (other platforms use different ranges)
  • Heap addresses typically start with 0x6... or lower

Phase 3: Function Call Stack Demonstration (2-3 hours)

Goal: Visualize how function calls create stack frames.

void bar(void) {
    int z = 30;
    printf("    In bar(): z at %p = %d\n", (void*)&z, z);
}

void foo(void) {
    int y = 20;
    printf("  In foo(): y at %p = %d\n", (void*)&y, y);
    bar();
    printf("  Back in foo()\n");
}

int main(void) {
    int x = 10;
    printf("In main(): x at %p = %d\n", (void*)&x, x);
    foo();
    printf("Back in main()\n");
    return 0;
}

Expected Output (exact addresses will differ on your machine):

In main(): x at 0x7ffeefbff4bc = 10
  In foo(): y at 0x7ffeefbff49c = 20
    In bar(): z at 0x7ffeefbff47c = 30
  Back in foo()
Back in main()

Calculate the frame size: 0x7ffeefbff4bc - 0x7ffeefbff49c = 32 bytes between main and foo.

Phase 4: Raw Byte Inspection (2-3 hours)

Goal: See exactly how multi-byte values are stored.

void dump_bytes(void *ptr, size_t size) {
    unsigned char *bytes = (unsigned char *)ptr;
    for (size_t i = 0; i < size; i++) {
        printf("  Byte %zu at %p: 0x%02x\n", i, (void*)(bytes + i), bytes[i]);
    }
}

int main(void) {
    int x = 0x12345678;
    printf("Integer 0x%08x at %p:\n", x, (void*)&x);
    dump_bytes(&x, sizeof(x));
    return 0;
}

Expected Output (on little-endian system):

Integer 0x12345678 at 0x7ffeefbff4ac:
  Byte 0 at 0x7ffeefbff4ac: 0x78
  Byte 1 at 0x7ffeefbff4ad: 0x56
  Byte 2 at 0x7ffeefbff4ae: 0x34
  Byte 3 at 0x7ffeefbff4af: 0x12

Phase 5: Struct Padding Demonstration (2-3 hours)

Goal: See how compilers add padding for alignment.

#include <stddef.h>  // offsetof
#include <stdio.h>

// Uses dump_bytes() from Phase 4
struct Padded {
    char a;     // 1 byte
    int b;      // 4 bytes
    char c;     // 1 byte
};

int main(void) {
    struct Padded p = {'A', 100, 'B'};

    printf("sizeof(struct Padded) = %zu\n", sizeof(struct Padded));
    printf("Expected without padding: %zu\n", sizeof(char) + sizeof(int) + sizeof(char));

    printf("\nField addresses:\n");
    printf("  a at offset %zu: %p\n", offsetof(struct Padded, a), (void*)&p.a);
    printf("  b at offset %zu: %p\n", offsetof(struct Padded, b), (void*)&p.b);
    printf("  c at offset %zu: %p\n", offsetof(struct Padded, c), (void*)&p.c);

    printf("\nRaw bytes:\n");
    dump_bytes(&p, sizeof(p));

    return 0;
}

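Expected Output (typical for x86-64; addresses omitted and annotations added for readability; padding bytes are unspecified and are often, but not always, zero):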

sizeof(struct Padded) = 12
Expected without padding: 6

Field addresses:
  a at offset 0
  b at offset 4
  c at offset 8

Raw bytes:
  Byte 0: 0x41 ('A')
  Byte 1: 0x00 (padding)
  Byte 2: 0x00 (padding)
  Byte 3: 0x00 (padding)
  Byte 4: 0x64 (100, least significant)
  Byte 5: 0x00
  Byte 6: 0x00
  Byte 7: 0x00
  Byte 8: 0x42 ('B')
  Byte 9: 0x00 (padding)
  Byte 10: 0x00 (padding)
  Byte 11: 0x00 (padding)

Testing Strategy

Test 1: Address Consistency

Run the program multiple times and verify:

  • Stack addresses change (ASLR)
  • Relative positions within a function remain consistent
  • Stack grows downward (addresses decrease with depth)

Test 2: Stack vs Heap Verification

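#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

void test_stack_vs_heap(void) {
    int stack_var;
    int *heap_ptr = malloc(sizeof(int));
    assert(heap_ptr != NULL);

    // On typical x86-64 Linux/macOS layouts the stack sits above the heap;
    // the C standard does not guarantee this ordering
    assert((uintptr_t)&stack_var > (uintptr_t)heap_ptr);

    free(heap_ptr);
    printf("Stack vs Heap test: PASS\n");
}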

Test 3: Endianness Verification

void test_endianness(void) {
    int x = 0x01;
    unsigned char *bytes = (unsigned char*)&x;

    if (bytes[0] == 0x01) {
        printf("System is little-endian (x86/x64)\n");
    } else {
        printf("System is big-endian\n");
    }
}

Test 4: Using lldb for Verification

$ clang -g memory_inspector.c -o memory_inspector
$ lldb ./memory_inspector
(lldb) breakpoint set --name main
(lldb) run
(lldb) frame variable         # Show local variables
(lldb) memory read &x         # Show raw bytes at x's address
(lldb) register read rsp      # Show stack pointer (x86-64; use sp on arm64)

Common Pitfalls and Debugging Tips

Pitfall 1: Forgetting (void*) Cast with %p

// WRONG - undefined behavior
printf("%p\n", &x);

// CORRECT
printf("%p\n", (void*)&x);

Pitfall 2: Confusing & and *

int x = 42;
int *p = &x;

// &x  = address of x     (a number like 0x7fff...)
// x   = value of x       (42)
// p   = address of x     (same as &x)
// *p  = value at address p (42, same as x)
// &p  = address of p     (different from &x!)

Pitfall 3: Returning Pointer to Local Variable

// WRONG - undefined behavior!
int* bad_function(void) {
    int local = 42;
    return &local;  // local dies when function returns!
}

// CORRECT - allocate on heap
int* good_function(void) {
    int *ptr = malloc(sizeof(int));
    if (ptr == NULL) {
        return NULL;  // caller must check for allocation failure
    }
    *ptr = 42;
    return ptr;  // caller must free
}

Debugging with AddressSanitizer

$ clang -fsanitize=address -g memory_inspector.c -o memory_inspector
$ ./memory_inspector

AddressSanitizer will catch:

  • Use-after-free
  • Buffer overflows
  • Stack use after return (on some toolchains only when run with ASAN_OPTIONS=detect_stack_use_after_return=1)

Extensions and Challenges

Challenge 1: Memory Region Classifier

Implement a function that determines which region an address belongs to:

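MemoryRegion classify_address(void *addr) {
    // Use heuristics based on address ranges
    // Stack: 0x7fff... range
    // Heap: 0x6... range
    // etc.
    (void)addr;             // placeholder until the heuristics are written
    return REGION_UNKNOWN;  // a non-void function must return a value
}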

Challenge 2: Pointer Validity Detector

Create a function that attempts to detect dangling pointers:

// Track allocations and frees
void* tracked_malloc(size_t size);
void tracked_free(void *ptr);
bool is_valid_pointer(void *ptr);

Challenge 3: Memory Layout Visualizer

Create an ASCII art visualization of the process memory:

=== MEMORY LAYOUT ===
0x7fff... [####----] Stack (4KB used, 8KB total)
          ...
0x6000... [##------] Heap (2KB used, 8KB total)
          ...
0x4000... [########] Code (read-only)

Challenge 4: ASLR Demonstration

Show how Address Space Layout Randomization works:

$ for i in {1..5}; do ./memory_inspector --stack-addr; done
# Show that addresses change each run

Real-World Connections

Connection 1: Debugger Internals

Debuggers like lldb and gdb use these same concepts to:

  • Display variable values
  • Show memory contents
  • Set breakpoints at specific addresses

Connection 2: Exploit Development

Understanding memory layout is essential for:

  • Buffer overflow exploitation
  • Return-oriented programming (ROP)
  • Understanding how ASLR protects against attacks

Connection 3: Performance Optimization

Memory layout affects:

  • Cache utilization (struct packing)
  • Memory bandwidth (alignment)
  • False sharing in multi-threaded code

Interview Questions You Can Now Answer

  1. “What is the difference between &x and x?”
    • &x is the address where x is stored; x is the value at that address
  2. “How can you tell if an address is on the stack or the heap?”
    • Stack addresses are typically much higher (0x7fff... range on 64-bit)
    • Heap addresses are lower (0x6... range)
  3. “What happens to a local variable when a function returns?”
    • Its stack frame is “popped”; the memory is still there but invalid
  4. “What is a pointer, really?”
    • A number that represents a memory address, with type information for arithmetic
  5. “Why does the stack grow downward on x86?”
    • Historical convention; allows stack and heap to grow toward each other
  6. “What is ASLR and why does it exist?”
    • Address Space Layout Randomization; prevents attackers from knowing where code/data is located

Resources

Books

  • Computer Systems: A Programmer’s Perspective - Ch. 2-3
  • Understanding and Using C Pointers by Richard Reese - Ch. 1-2
  • The Linux Programming Interface by Michael Kerrisk - Ch. 6

Tools

  • lldb or gdb - Debuggers
  • AddressSanitizer (-fsanitize=address)
  • objdump -d - Disassembly

Self-Assessment Checklist

Before moving to the next project, you should be able to:

  • Explain why &x and x are different
  • Predict whether a variable is on stack or heap by looking at its address
  • Draw a diagram of a stack frame with local variables
  • Explain why int *p and char *p behave differently with p + 1
  • Demonstrate struct padding with actual numbers
  • Use lldb to inspect memory at a given address
  • Explain what ASLR does and why it matters

Final Milestone: You instinctively think of memory as numbered bytes, not abstract “variables.”