Project 1: Memory Inspector Tool
The Core Question: "What IS memory? Where do my variables actually live, and how can I see them?"
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | Weekend (8-16 hours) |
| Language | C |
| Prerequisites | Basic C syntax, compiling with gcc/clang |
| Main Book | "Computer Systems: A Programmer's Perspective" by Bryant & O'Hallaron |
Learning Objectives
By completing this project, you will:
- Understand memory as numbered bytes - Not abstract "variables" but actual addresses in memory
- Visualize stack vs heap - See how local variables and malloc'd memory occupy different regions
- Master pointer semantics - Know exactly what &x, *p, and p + 1 mean at the hardware level
- Use debuggers effectively - Verify your understanding with lldb/gdb
- Develop memory intuition - Instinctively think of memory as a big array of bytes
Theoretical Foundation
What Memory Actually Is
At the hardware level, your computer's RAM is a giant array of bytes. Each byte has:
- An address: A number from 0 to (RAM_SIZE - 1)
- A value: An 8-bit number (0-255)
When you write int x = 42; in C, you're saying:
"Reserve 4 consecutive bytes somewhere (int is 4 bytes on mainstream platforms), and store the binary representation of 42 in them."
Memory Address Contents (hex) Contents (decimal)
0x7ffeefbff4ac 2A 42 <- x lives here (4 bytes: 2A 00 00 00)
0x7ffeefbff4ad 00 0
0x7ffeefbff4ae 00 0
0x7ffeefbff4af 00 0
The Address-of Operator (&)
The & operator returns the address where a variable is stored:
int x = 42;
int *p = &x; // p contains the ADDRESS of x
// If x is at 0x7ffeefbff4ac:
// - x contains 42 (the VALUE)
// - &x returns 0x7ffeefbff4ac (the ADDRESS)
// - p contains 0x7ffeefbff4ac (same as &x)
// - *p returns 42 (the value AT that address)
The Process Memory Layout
When your program runs, the operating system creates a virtual address space:
High addresses (0xFFFFFFFF...)
+------------------------------+
|         Kernel Space         |  <- OS code (you can't touch this)
+------------------------------+
|                              |
|            Stack             |  <- Local variables, return addresses
|              |               |  <- GROWS DOWNWARD
|              v               |
|                              |
|           (empty)            |
|                              |
|              ^               |
|              |               |
|             Heap             |  <- malloc'd memory, GROWS UPWARD
+------------------------------+
|             BSS              |  <- Uninitialized globals
+------------------------------+
|             Data             |  <- Initialized globals
+------------------------------+
|             Text             |  <- Your compiled code (read-only)
+------------------------------+
Low addresses (0x00000000...)
Stack vs Heap: The Key Distinction
| Aspect | Stack | Heap |
|---|---|---|
| Allocation | Automatic (when function called) | Manual (malloc()) |
| Deallocation | Automatic (when function returns) | Manual (free()) |
| Speed | Very fast (just move stack pointer) | Slower (allocator overhead) |
| Size | Limited (~8MB default on Linux) | Limited by available memory and address space |
| Growth | Downward (toward lower addresses) | Upward (toward higher addresses) |
| Typical addresses | High (0x7fff...) | Lower (e.g. 0x6000... on macOS, 0x55... on x86-64 Linux) |
Why Stack Grows Downward
When you call a function, a new "stack frame" is pushed at lower addresses than the caller's frame:
void bar() {
int z = 30; // Lives at lower address than y
}
void foo() {
int y = 20; // Lives at lower address than x
bar();
}
int main() {
int x = 10; // Lives at high address
foo();
}
Stack during bar():
+---------------------+  High addresses
|    main's frame     |
|    int x = 10       |  <- 0x7ffeefbff4bc
+---------------------+
|    foo's frame      |
|    int y = 20       |  <- 0x7ffeefbff49c
+---------------------+
|    bar's frame      |
|    int z = 30       |  <- 0x7ffeefbff47c
+---------------------+  Low addresses (stack grows down)
Pointer Arithmetic
C pointers are "type-aware": arithmetic moves by the size of the pointed-to type:
int arr[3] = {10, 20, 30};
int *p = arr;
// If p points to address 0x1000 (and sizeof(int) == 4):
// p + 0 -> 0x1000 -> arr[0] = 10
// p + 1 -> 0x1004 -> arr[1] = 20 (moved by 4 bytes = sizeof(int))
// p + 2 -> 0x1008 -> arr[2] = 30
This is why char *p and int *p behave differently:
- char *p: p + 1 moves by 1 byte
- int *p: p + 1 moves by 4 bytes (where sizeof(int) == 4)
- double *p: p + 1 moves by 8 bytes (where sizeof(double) == 8)
Endianness: How Multi-Byte Values Are Stored
On x86/x64 (little-endian), the least significant byte comes first:
int x = 0x12345678;
Memory layout (little-endian):
Address Value
0x1000 0x78 <- Least significant byte first
0x1001 0x56
0x1002 0x34
0x1003 0x12 <- Most significant byte last
Project Specification
What You're Building
A command-line tool that visualizes the memory layout of a C program, showing:
- Stack variables and their addresses
- Heap allocations and their addresses
- How addresses change during function calls
- Raw byte contents of variables
- (Optional) Memory corruption demonstrations
Core Features
Feature 1: Stack Variable Visualization
$ ./memory_inspector --stack
[STACK VARIABLES]
Variable 'x' (int):
Address: 0x7ffeefbff4ac
Value: 42
Size: 4 bytes
Raw bytes: 2a 00 00 00
Feature 2: Heap Allocation Visualization
$ ./memory_inspector --heap
[HEAP ALLOCATIONS]
Pointer 'p' points to:
Address: 0x600000004000
Value: 100
Size: 4 bytes
Location: HEAP
Feature 3: Stack Frame Inspection
$ ./memory_inspector --frames
[STACK FRAME VISUALIZATION]
Calling sequence: main() -> foo() -> bar()
In bar(): z at 0x7ffeefbff47c = 30
In foo(): y at 0x7ffeefbff49c = 20
In main(): x at 0x7ffeefbff4bc = 10
Notice: Addresses DECREASE as we go deeper!
Feature 4: Raw Byte Dump
$ ./memory_inspector --bytes
Integer: 0x12345678
Byte-by-byte (little-endian):
Byte 0: 0x78 (least significant)
Byte 1: 0x56
Byte 2: 0x34
Byte 3: 0x12 (most significant)
Solution Architecture
Module Design
memory_inspector/
├── main.c # Entry point, argument parsing
├── stack_demo.c # Stack visualization functions
├── heap_demo.c # Heap allocation demonstrations
├── frame_demo.c # Stack frame hierarchy
├── bytes_demo.c # Raw byte inspection
├── utils.c # Printing utilities
├── utils.h # Shared declarations
└── Makefile
Key Data Structures
// For tracking memory regions
typedef enum {
REGION_STACK,
REGION_HEAP,
REGION_BSS,
REGION_DATA,
REGION_TEXT,
REGION_UNKNOWN
} MemoryRegion;
// For describing a variable's memory location
typedef struct {
const char *name;
void *address;
size_t size;
const char *type_name;
MemoryRegion region;
} MemoryInfo;
Core Functions to Implement
// Determine which memory region an address belongs to
MemoryRegion classify_address(void *addr);
// Print variable information
void inspect_variable(const char *name, void *addr, size_t size, const char *type);
// Dump raw bytes of a variable
void dump_bytes(void *addr, size_t size);
// Demonstrate stack frame hierarchy
void demonstrate_stack_frames(void);
// Show heap allocation behavior
void demonstrate_heap(void);
Implementation Guide
Phase 1: Basic Address Printing (2-3 hours)
Goal: Print the address of a single variable.
Start with the simplest possible program:
#include <stdio.h>
int main(void) {
int x = 42;
printf("x is at address %p, value = %d\n", (void*)&x, x);
return 0;
}
Checkpoint Questions:
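- What format does %p print in? (Implementation-defined; on Linux and macOS it is hexadecimal, e.g. 0x7ffeefbff4ac)
- Why cast to (void*)? (%p expects a void * argument; passing another pointer type is technically undefined behavior)
- Run it 3 times. Does the address change? (Usually yes, due to ASLR)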
Extension: Add more variables and observe their relative positions:
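#include <stdio.h>
int main(void) {
int a = 1;
int b = 2;
int c = 3;
printf("a: %p, b: %p, c: %p\n", (void*)&a, (void*)&b, (void*)&c);
// Often the addresses decrease, but the compiler may order locals within one
// frame however it likes; nested function calls show stack growth reliably.
return 0;
}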
Phase 2: Stack vs Heap Comparison (2-3 hours)
Goal: Show the difference between stack and heap addresses.
#include <stdio.h>
#include <stdlib.h>
void compare_stack_heap(void) {
int stack_var = 100;
int *heap_ptr = malloc(sizeof(int));
if (heap_ptr == NULL) {
perror("malloc");
return;
}
*heap_ptr = 200;
printf("Stack variable at: %p\n", (void*)&stack_var);
printf("Heap allocation at: %p\n", (void*)heap_ptr);
// Notice: on x86-64 Linux and Intel macOS, stack addresses are much higher
// Stack: 0x7ff... (high addresses)
// Heap: 0x6000... on macOS, often 0x55.../0x56... on x86-64 Linux (lower addresses)
free(heap_ptr);
}
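Key Insight: You can often guess whether memory is on the stack or the heap by looking at the address prefix (a heuristic, not a guarantee):
- Stack addresses typically start with 0x7ff... on x86-64 Linux and Intel macOS (Apple Silicon Macs typically place the main thread's stack lower, around 0x16f...)
- Heap addresses typically start with 0x6... on macOS or 0x55.../0x56... on x86-64 Linux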
Phase 3: Function Call Stack Demonstration (2-3 hours)
Goal: Visualize how function calls create stack frames.
#include <stdio.h>
void bar(void) {
int z = 30;
printf(" In bar(): z at %p = %d\n", (void*)&z, z);
}
void foo(void) {
int y = 20;
printf(" In foo(): y at %p = %d\n", (void*)&y, y);
bar();
printf(" Back in foo()\n");
}
int main(void) {
int x = 10;
printf("In main(): x at %p = %d\n", (void*)&x, x);
foo();
printf("Back in main()\n");
return 0;
}
Expected Output (addresses differ per machine and, with ASLR, per run):
In main(): x at 0x7ffeefbff4bc = 10
In foo(): y at 0x7ffeefbff49c = 20
In bar(): z at 0x7ffeefbff47c = 30
Back in foo()
Back in main()
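Calculate the gap between frames: 0x7ffeefbff4bc - 0x7ffeefbff49c = 0x20 = 32 bytes between x in main and y in foo. That gap covers the return address, saved registers, locals, and alignment padding, so it approximates one frame's size rather than measuring it exactly.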
Phase 4: Raw Byte Inspection (2-3 hours)
Goal: See exactly how multi-byte values are stored.
#include <stdio.h>
void dump_bytes(void *ptr, size_t size) {
unsigned char *bytes = (unsigned char *)ptr; // view the object as raw bytes
for (size_t i = 0; i < size; i++) {
printf(" Byte %zu at %p: 0x%02x\n", i, (void*)(bytes + i), bytes[i]);
}
}
int main(void) {
int x = 0x12345678;
printf("Integer 0x%08x at %p:\n", (unsigned int)x, (void*)&x); // %x expects unsigned int
dump_bytes(&x, sizeof(x));
return 0;
}
Expected Output (on little-endian system):
Integer 0x12345678 at 0x7ffeefbff4ac:
Byte 0 at 0x7ffeefbff4ac: 0x78
Byte 1 at 0x7ffeefbff4ad: 0x56
Byte 2 at 0x7ffeefbff4ae: 0x34
Byte 3 at 0x7ffeefbff4af: 0x12
Phase 5: Struct Padding Demonstration (2-3 hours)
Goal: See how compilers add padding for alignment.
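#include <stddef.h> // offsetof
#include <stdio.h>
void dump_bytes(void *ptr, size_t size); // copy the definition from Phase 4 into this file
struct Padded {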
char a; // 1 byte
int b; // 4 bytes
char c; // 1 byte
};
int main(void) {
struct Padded p = {'A', 100, 'B'};
printf("sizeof(struct Padded) = %zu\n", sizeof(struct Padded));
printf("Expected without padding: %zu\n", sizeof(char) + sizeof(int) + sizeof(char));
printf("\nField addresses:\n");
printf(" a at offset %zu: %p\n", offsetof(struct Padded, a), (void*)&p.a);
printf(" b at offset %zu: %p\n", offsetof(struct Padded, b), (void*)&p.b);
printf(" c at offset %zu: %p\n", offsetof(struct Padded, c), (void*)&p.c);
printf("\nRaw bytes:\n");
dump_bytes(&p, sizeof(p));
return 0;
}
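Expected Output (typical x86-64; addresses omitted, annotations added):
sizeof(struct Padded) = 12
Expected without padding: 6
Field addresses:
a at offset 0
b at offset 4
c at offset 8
Raw bytes:
Byte 0: 0x41 ('A')
Byte 1: 0x00 (padding; contents unspecified, may be nonzero)
Byte 2: 0x00 (padding; contents unspecified, may be nonzero)
Byte 3: 0x00 (padding; contents unspecified, may be nonzero)
Byte 4: 0x64 (100, least significant)
Byte 5: 0x00
Byte 6: 0x00
Byte 7: 0x00
Byte 8: 0x42 ('B')
Byte 9: 0x00 (padding; contents unspecified, may be nonzero)
Byte 10: 0x00 (padding; contents unspecified, may be nonzero)
Byte 11: 0x00 (padding; contents unspecified, may be nonzero)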
Testing Strategy
Test 1: Address Consistency
Run the program multiple times and verify:
- Stack addresses change (ASLR)
- Relative positions within a function remain consistent
- Stack grows downward (addresses decrease with depth)
Test 2: Stack vs Heap Verification
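#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
void test_stack_vs_heap(void) {
int stack_var;
int *heap_ptr = malloc(sizeof(int));
assert(heap_ptr != NULL);
// Holds on x86-64 Linux and Intel macOS; a platform heuristic, not a C guarantee
// (it may fail elsewhere, e.g. on Apple Silicon macOS)
assert((uintptr_t)&stack_var > (uintptr_t)heap_ptr);
free(heap_ptr);
printf("Stack vs Heap test: PASS\n");
}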
Test 3: Endianness Verification
#include <stdio.h>
void test_endianness(void) {
int x = 0x01;
unsigned char *bytes = (unsigned char*)&x;
if (bytes[0] == 0x01) {
printf("System is little-endian (x86/x64)\n");
} else {
printf("System is big-endian\n");
}
}
Test 4: Using lldb for Verification
$ clang -g memory_inspector.c -o memory_inspector
$ lldb ./memory_inspector
(lldb) breakpoint set --name main
(lldb) run
(lldb) frame variable # Show local variables
(lldb) memory read &x # Show raw bytes at x's address
(lldb) register read rsp # Show stack pointer (x86-64; on arm64 use: register read sp)
Common Pitfalls and Debugging Tips
Pitfall 1: Forgetting (void*) Cast with %p
// WRONG - undefined behavior
printf("%p\n", &x);
// CORRECT
printf("%p\n", (void*)&x);
Pitfall 2: Confusing & and *
int x = 42;
int *p = &x;
// &x = address of x (a number like 0x7fff...)
// x = value of x (42)
// p = address of x (same as &x)
// *p = value at address p (42, same as x)
// &p = address of p (different from &x!)
Pitfall 3: Returning Pointer to Local Variable
// WRONG - undefined behavior!
int* bad_function(void) {
int local = 42;
return &local; // local dies when function returns!
}
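// CORRECT - allocate on heap (requires #include <stdlib.h>)
int* good_function(void) {
int *ptr = malloc(sizeof(int));
if (ptr == NULL) {
return NULL; // allocation failed; caller must check
}
*ptr = 42;
return ptr; // caller must free
}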
Debugging with AddressSanitizer
$ clang -fsanitize=address -g memory_inspector.c -o memory_inspector
$ ./memory_inspector
AddressSanitizer will catch:
- Use-after-free
- Buffer overflows
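- Stack use after return (depending on compiler version and platform, this may need ASAN_OPTIONS=detect_stack_use_after_return=1 at run time)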
Extensions and Challenges
Challenge 1: Memory Region Classifier
Implement a function that determines which region an address belongs to:
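MemoryRegion classify_address(void *addr) {
// Use heuristics based on address ranges, e.g.:
// Stack: 0x7fff... range
// Heap: 0x6... range (macOS) or 0x55.../0x56... (x86-64 Linux)
// etc.
(void)addr; // placeholder so the stub compiles
return REGION_UNKNOWN;
}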
Challenge 2: Pointer Validity Detector
Create a function that attempts to detect dangling pointers:
#include <stdbool.h> // bool
#include <stddef.h> // size_t
// Track allocations and frees
void* tracked_malloc(size_t size);
void tracked_free(void *ptr);
bool is_valid_pointer(void *ptr);
Challenge 3: Memory Layout Visualizer
Create an ASCII art visualization of the process memory:
=== MEMORY LAYOUT ===
0x7fff... [####----] Stack (4KB used, 8KB total)
...
0x6000... [##------] Heap (2KB used, 8KB total)
...
0x4000... [########] Code (read-only)
Challenge 4: ASLR Demonstration
Show how Address Space Layout Randomization works:
$ for i in {1..5}; do ./memory_inspector --stack-addr; done
# Show that addresses change each run
Real-World Connections
Connection 1: Debugger Internals
Debuggers like lldb and gdb use these same concepts to:
- Display variable values
- Show memory contents
- Set breakpoints at specific addresses
Connection 2: Exploit Development
Understanding memory layout is essential for:
- Buffer overflow exploitation
- Return-oriented programming (ROP)
- Understanding how ASLR protects against attacks
Connection 3: Performance Optimization
Memory layout affects:
- Cache utilization (struct packing)
- Memory bandwidth (alignment)
- False sharing in multi-threaded code
Interview Questions You Can Now Answer
- "What is the difference between &x and x?"
- &x is the address where x is stored; x is the value at that address
- "How can you tell if an address is on the stack or the heap?"
- Stack addresses are typically much higher (0x7fff... range on 64-bit Linux and Intel macOS)
- Heap addresses are typically lower (0x6... on macOS, 0x55.../0x56... on x86-64 Linux); this is a heuristic, not a guarantee
- "What happens to a local variable when a function returns?"
- Its stack frame is "popped": the bytes may still be there, but any access through an old pointer is undefined behavior
- "What is a pointer, really?"
- A number that represents a memory address, with type information for arithmetic
- "Why does the stack grow downward on x86?"
- x86's push and call instructions decrement the stack pointer, so downward growth is built into the hardware; the convention lets stack and heap grow toward each other from opposite ends of free memory
- "What is ASLR and why does it exist?"
- Address Space Layout Randomization; it places the stack, heap, and libraries at randomized addresses each run, so attackers cannot rely on knowing where code and data are located
Resources
Books
- Computer Systems: A Programmer's Perspective - Ch. 2-3
- Understanding and Using C Pointers by Richard Reese - Ch. 1-2
- The Linux Programming Interface by Michael Kerrisk - Ch. 6
Tools
- lldb or gdb - Debuggers
- AddressSanitizer (-fsanitize=address)
- objdump -d - Disassembly
Self-Assessment Checklist
Before moving to the next project, you should be able to:
- Explain why &x and x are different
- Predict whether a variable is on stack or heap by looking at its address
- Draw a diagram of a stack frame with local variables
- Explain why int *p and char *p behave differently with p + 1
- Demonstrate struct padding with actual numbers
- Use lldb to inspect memory at a given address
- Explain what ASLR does and why it matters
Final Milestone: You instinctively think of memory as numbered bytes, not abstract "variables."