Project 1: Memory Inspector Tool
Build a CLI that prints real addresses, labels memory regions, and explains a running C process by reading
/proc/self/maps.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 8-12 hours |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Zig, C++ |
| Coolness Level | High |
| Business Potential | Medium (debug tooling, observability) |
| Prerequisites | C pointers, basic Linux CLI, compiling with gcc/clang |
| Key Topics | Virtual memory, process layout, /proc, pointers, permissions, ASLR |
1. Learning Objectives
By completing this project, you will:
- Read and interpret a process memory map line-by-line.
- Explain stack vs heap addresses with concrete evidence.
- Prove where code, data, bss, and mapped regions live in memory.
- Build a deterministic report by parsing a fixture map file.
- Design a clean CLI output format suitable for debugging.
- Connect virtual memory concepts to crash behavior and security mitigations.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Virtual Memory and Process Layout
Fundamentals
Virtual memory is the abstraction that lets every process believe it owns a large, contiguous address space. In practice, your program sees virtual addresses, not physical ones. The OS and CPU translate those virtual addresses to physical RAM using page tables. This makes isolation, protection, and sharing possible. In a typical Unix process, distinct regions appear: the text (code) segment is read+execute, data and bss are read+write, the heap grows upward with dynamic allocations, and the stack grows downward with function calls. Although the exact addresses change between runs due to ASLR, the relative layout stays consistent. Understanding this layout is essential for a memory inspector because every address you print must be tied back to a region with permissions and a purpose.
Deep Dive into the Concept
A process starts as an executable file (ELF on Linux, Mach-O on macOS) with segments that the loader maps into virtual memory. The loader creates VMAs (virtual memory areas) that represent contiguous address ranges with specific permissions, like r-x for code and rw- for data. The CPU does not directly read or write physical memory. It operates on virtual addresses and consults a translation lookaside buffer (TLB) and page tables to resolve them to physical pages. Pages are typically 4 KiB, but larger page sizes exist. If a virtual page is not mapped or violates permissions, the CPU triggers a page fault that the OS handles. This is why invalid pointer dereferences produce SIGSEGV or SIGBUS.
Memory mapping is also demand-driven. When the loader maps a segment, it may not immediately allocate physical pages. Instead, it sets up page table entries, and physical pages are allocated on first access. This explains why a large allocation can appear “reserved” in virtual memory but not consume RAM until touched. It also explains copy-on-write behavior: after fork(), parent and child share physical pages marked read-only. Only when one process writes does the kernel create a private copy of the page. A memory inspector can’t see physical pages directly, but it can infer these behaviors by observing mappings, permissions, and the relative positions of regions.
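One way to observe lazy allocation from user space is to compare the resident page count of the process before and after touching a fresh anonymous mapping. The sketch below assumes Linux, where the second field of /proc/self/statm reports resident pages. The helper names resident_pages, reserve_anonymous, and touch_pages are illustrative and are not part of the project skeleton.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std=c11 */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Resident set size in pages, read from field 2 of /proc/self/statm; -1 on error. */
long resident_pages(void) {
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL) return -1;
    long total = 0, resident = -1;
    if (fscanf(f, "%ld %ld", &total, &resident) != 2) resident = -1;
    fclose(f);
    return resident;
}

/* Reserve len bytes of anonymous memory without touching it; NULL on failure. */
void *reserve_anonymous(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write one byte per page so the kernel has to back each page with RAM. */
void touch_pages(void *base, size_t len) {
    long page = sysconf(_SC_PAGESIZE);
    assert(base != NULL && page > 0);
    volatile unsigned char *p = base;
    for (size_t off = 0; off < len; off += (size_t)page) p[off] = 1;
}
```

In a small driver, read resident_pages() after reserve_anonymous() and again after touch_pages(). The second reading is expected to be higher by roughly the number of pages touched, although the exact figure depends on the kernel configuration.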
Another crucial detail is that the heap and stack are not truly single regions. The heap usually begins near the end of the data segment and grows upward through brk() or sbrk() for small allocations, while large allocations are fulfilled with mmap(), creating separate anonymous mappings. This is why your inspector might show multiple “heap-like” regions. On Linux, the main thread’s stack is a mapping that the kernel extends downward on demand, up to the stack size limit reported by ulimit -s, with a guard gap kept below it; the gap is inaccessible so that overflow triggers a crash rather than silently corrupting memory. This guard is not visible as a labeled region, but you will often see an unmapped gap below the stack mapping in /proc/self/maps.
Permissions are enforced at the page level. Code pages are r-x so that they are executable but not writable. Data pages are rw- so that they can be modified but not executed. These permissions enable NX (non-executable) and W^X policies. ASLR randomizes the base addresses of key regions (stack, heap, shared libraries) to make exploitation harder. Your tool must explain that the addresses change but the region labels and permissions remain constant. In a deterministic mode, your tool should operate on a fixture map file so that the output remains stable even with ASLR.
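The permission rules above can be checked mechanically once the four-character field from a maps line is decoded. The following is a minimal sketch; the perm_flags type and the function names decode_perms and violates_wx are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical decoded view of a maps permission field such as "r-xp". */
typedef struct {
    bool read, write, exec, shared;
} perm_flags;

/* Decode the 4-character permission field; returns false if it is malformed. */
bool decode_perms(const char *perms, perm_flags *out) {
    if (perms == NULL || out == NULL || strlen(perms) != 4) return false;
    if ((perms[0] != 'r' && perms[0] != '-') ||
        (perms[1] != 'w' && perms[1] != '-') ||
        (perms[2] != 'x' && perms[2] != '-') ||
        (perms[3] != 'p' && perms[3] != 's')) return false;
    out->read   = perms[0] == 'r';
    out->write  = perms[1] == 'w';
    out->exec   = perms[2] == 'x';
    out->shared = perms[3] == 's';
    return true;
}

/* A region breaks W^X when it is writable and executable at the same time. */
bool violates_wx(const perm_flags *f) {
    assert(f != NULL);
    return f->write && f->exec;
}
```

A region reported as rwxp would be flagged by violates_wx, while the usual r-xp text mapping and rw-p data mappings would not.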
Finally, virtual memory is the foundation for debugging. Tools like GDB, AddressSanitizer, and Valgrind observe your program by reading its memory layout, intercepting allocation functions, and tracing addresses. A memory inspector is a simplified, educational version of these tools. It is not just a “print addresses” program; it is an instrument that teaches how the OS frames your process. Once you can map addresses to regions, you can reason about why a pointer is invalid, why a function pointer lives in text, or why a heap allocation appears far from the heap label. This is the mental model that makes real-world debugging possible.
How this fits on projects
This concept powers the “segment classification” in Section 3.5 and the “region labeling” in Section 4.2. It also supports the deterministic fixture parsing in Section 3.7 and the output design in Section 5.2.
Definitions & key terms
- Virtual Address Space: The range of addresses a process can use.
- Page Table: Data structure mapping virtual pages to physical pages.
- VMA: Virtual Memory Area with permissions and backing.
- ASLR: Address Space Layout Randomization.
- NX: Non-executable memory policy.
Mental model diagram (ASCII)
Virtual Address Space (high -> low)
+-----------------------------+ High
| Stack (rw-) |
| grows down |
+-----------------------------+
| Mapped libs (r-x, rw-) |
+-----------------------------+
| Heap (rw-) |
| grows up |
+-----------------------------+
| BSS (rw-) |
| Data (rw-) |
| Text (r-x) |
+-----------------------------+ Low
How it works (step-by-step, with invariants and failure modes)
- Loader maps executable segments with fixed permissions.
- Kernel creates stack mapping with a guard page.
- Heap starts after data/bss; grows via brk() or mmap().
- Page table entries are created; physical pages allocated lazily.
- ASLR randomizes base addresses on each run.
Invariants: Code pages are not writable; data pages are not executable.
Failure modes: Access to unmapped or prohibited pages triggers SIGSEGV/SIGBUS.
Minimal concrete example
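#include <stdio.h>
#include <stdlib.h>

int global = 123;                 /* initialized global: data segment */

int main(void) {
    int local = 7;                /* automatic variable: stack */
    int *heap = malloc(16);       /* dynamic allocation: heap */
    /* Casting a function pointer to void* is a POSIX convention, not strict ISO C. */
    printf("text=%p data=%p stack=%p heap=%p\n",
           (void*)&main, (void*)&global, (void*)&local, (void*)heap);
    free(heap);
    return 0;
}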
Common misconceptions
- “Pointers are physical addresses.” -> They are virtual addresses.
- “The heap is a single block.” -> It may be multiple mappings.
- “Page faults mean failure.” -> Many are normal and lazy-alloc.
Check-your-understanding questions
- Why do stack addresses change across runs?
- Why is code typically r-x rather than rwx?
- Why might a heap allocation appear inside an mmap region?
Check-your-understanding answers
- ASLR randomizes the stack base address.
- Write+execute is dangerous; W^X prevents code injection.
- Large allocations are served with mmap() rather than brk().
Real-world applications
- Debugging crashes with core dumps.
- Memory-mapped databases and caches.
- Security hardening and exploit mitigation analysis.
Where you’ll apply it
- Section 3.5 Data Formats / Schemas / Protocols
- Section 4.1 High-Level Design
- Section 5.4 Concepts You Must Understand First
- Also used in: Project 4: Arena Allocator
References
- “Computer Systems: A Programmer’s Perspective” (Ch. 9)
- “Operating Systems: Three Easy Pieces” (Virtual Memory chapters)
- man 5 proc
Key insights
Virtual memory is the OS-owned map that makes every pointer either safe or fatal.
Summary
You learned how a process address space is structured, why permissions exist, and how ASLR changes concrete addresses without changing the layout’s structure.
Homework/Exercises to practice the concept
- Print addresses of globals, locals, and heap allocations three times.
- Annotate /proc/self/maps with region roles.
Solutions to the homework/exercises
- You should see stack and heap addresses shift on each run.
- Text is r-x; data, bss, heap, and stack are rw-; mapped libs are r-x or rw-.
2.2 Pointer Representation and Address Formatting
Fundamentals
C pointers are variables that store addresses. A pointer does not “know” the type of data it points to at runtime; the type exists for the compiler’s benefit so that it can scale pointer arithmetic and enforce basic type checks. When you print a pointer, you are printing a value in the process’s virtual address space. The correct format specifier in C is %p, and the value should be cast to void* to avoid undefined behavior. This is essential for a memory inspector, because printing addresses safely and consistently is the main visible output. Pointer arithmetic works in units of the pointed-to type, not bytes, which is why p + 1 means “next element,” not “next byte,” unless you cast to char*.
Deep Dive into the Concept
Pointers are integers under the hood, but not all integers are valid pointers. A pointer must reference a mapped, properly aligned address for the intended type. For example, an int* that is not aligned to a 4-byte boundary can cause misaligned access on some architectures. When you print an address, the C standard only guarantees that %p will print a representation of the pointer value. On most platforms, that representation is a hexadecimal address with a 0x prefix. Casting to void* is required because %p expects a void* argument, and passing a different pointer type is undefined behavior. In practice, compilers often allow it, but building a debugging tool means you should be strictly correct.
Pointer arithmetic is one of the biggest sources of confusion. p + 1 adds sizeof(*p) bytes, not 1 byte. This makes array traversal natural but can be surprising when you attempt to interpret raw memory. In a memory inspector, you often want byte-level operations, so you will cast to unsigned char* or uint8_t* to treat memory as a byte array. This is the same technique used by debuggers and hexdump tools. You should also understand that a pointer’s numeric value does not encode the region; you must compare it against known ranges from /proc/self/maps to label it. This means your tool should parse the memory map and then compare each address you print to those ranges.
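The difference between element-sized and byte-sized steps can be made concrete with two small helpers. This is a sketch for illustration; int_stride_bytes and peek_bytes are hypothetical names, not functions the project requires.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between p and p + 1 for an int pointer. */
size_t int_stride_bytes(const int *p) {
    assert(p != NULL);
    const unsigned char *a = (const unsigned char *)p;
    const unsigned char *b = (const unsigned char *)(p + 1);
    return (size_t)(b - a);
}

/* Copy the first n bytes of any object into out, viewing memory as raw bytes. */
void peek_bytes(const void *obj, size_t n, unsigned char *out) {
    assert(obj != NULL && out != NULL);
    const unsigned char *bytes = obj;   /* byte view: +1 now means one byte */
    for (size_t i = 0; i < n; i++) out[i] = bytes[i];
}
```

int_stride_bytes returns sizeof(int) because p + 1 is scaled by the pointed-to type, while peek_bytes walks the same memory one byte at a time through an unsigned char pointer.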
Another subtlety is that addresses can be printed in different bases or with different widths. For clarity, always print pointers in hexadecimal with a fixed width based on sizeof(void*). A 64-bit process has 8-byte pointers, so you can convert the pointer to uintptr_t and format it with "0x%016" PRIxPTR from <inttypes.h>; %016lx is only correct when unsigned long is as wide as a pointer, which holds on 64-bit Linux but not on every platform. %p remains the portable choice, though its exact output format is implementation-defined. For deterministic output, you should allow an option that prints offsets relative to a region base (e.g., “heap+0x40”) rather than absolute addresses, because ASLR makes addresses non-repeatable. This is the same strategy used by addr2line and many debuggers, which report addresses relative to a module base.
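Both output styles described here, fixed-width absolute addresses and region-relative offsets, can be sketched with snprintf. The function names format_addr and format_offset are assumptions made for this example; the "label+0xNN" shape follows the “heap+0x40” form mentioned above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Write addr as zero-padded hex sized to the pointer width
   (16 digits when pointers are 8 bytes). Returns the length written. */
int format_addr(char *buf, size_t cap, uintptr_t addr) {
    int digits = (int)(2 * sizeof(void *));
    return snprintf(buf, cap, "0x%0*" PRIxPTR, digits, addr);
}

/* Write "label+0xOFFSET" for an address inside the region that starts at base. */
int format_offset(char *buf, size_t cap, const char *label,
                  uintptr_t addr, uintptr_t base) {
    assert(label != NULL && addr >= base);
    return snprintf(buf, cap, "%s+0x%" PRIxPTR, label, addr - base);
}
```

Calling format_offset(buf, sizeof buf, "heap", 0x1040, 0x1000) writes heap+0x40 into buf, which stays the same across runs even when the heap base moves.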
Pointer provenance and lifetime matter for safety. A pointer that once referred to allocated memory becomes invalid after free(). The numeric address you recorded earlier does not change, but the memory is no longer owned by your program. Dereferencing the pointer is undefined behavior and can appear to “work” by accident. A memory inspector can demonstrate this by saving the address as a uintptr_t before free(), printing that saved value afterwards, and then explaining why the address is no longer safe. Print the saved integer rather than the stale pointer, because the C standard makes the pointer value itself indeterminate once the allocation has been freed. This concept helps learners understand why memory bugs are so dangerous: the address is not inherently safe; it is only safe within the lifetime rules enforced by the allocator and the C standard.
Finally, pointer formatting is a UX problem. Your output should teach, not just dump hex. Consider grouping addresses into sections (text, data, heap, stack) and showing the region label, permissions, and offset in the region. An address without context is meaningless; an address with region metadata becomes a learning tool. This is why the memory inspector must connect pointer representation to memory map parsing. You are building the bridge between “this is a number” and “this is a location with meaning.”
How this fits on projects
You will apply pointer formatting in Section 3.4 Example Usage and Section 3.7 Real World Outcome. You will apply pointer arithmetic in Section 5.2 Project Structure and Section 5.5 Questions to Guide Your Design.
Definitions & key terms
- Pointer: A variable that stores a memory address.
- Pointer arithmetic: Adding offsets scaled by sizeof(*p).
- Alignment: Address multiple required by a type.
- uintptr_t: Integer type large enough to store pointer values.
Mental model diagram (ASCII)
Address value --> [Region lookup] --> Label + Offset
0x7ffd_1234_abcd
| compare against ranges
v
[stack] + 0x1f3
How it works (step-by-step, with invariants and failure modes)
- Collect pointer values from globals, locals, heap, and functions.
- Format using %p or uintptr_t with hex.
- Compare address against map ranges to determine region.
- Print address, region, permissions, and offset.
Invariant: Use void* with %p and never assume addresses are stable.
Failure modes: Misusing %p or printing with the wrong type yields UB or garbage.
Minimal concrete example
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int x = 42;
    uintptr_t addr = (uintptr_t)&x;   /* integer view of the same address */
    printf("x at %p (0x%016" PRIxPTR ")\n", (void*)&x, addr);
    return 0;
}
Common misconceptions
- “p + 1 adds 1 byte.” -> It adds sizeof(*p) bytes.
- “Any integer can be a pointer.” -> Most integers are invalid pointers.
- “Printing %p is always consistent.” -> Only if you pass void*.
Check-your-understanding questions
- Why should %p receive a void*?
- How many bytes does p + 1 advance for int* on a 64-bit system?
- Why are offsets more stable than absolute addresses?
Check-your-understanding answers
- The C standard defines %p as expecting void*; anything else is UB.
- sizeof(int) bytes, typically 4 bytes.
- ASLR changes base addresses but offsets within a region stay constant.
Real-world applications
- Address-based debugging (core dumps, crash reports).
- Symbolication tools that resolve addresses to source lines.
- Observability tools that report memory addresses with context.
Where you’ll apply it
- Section 3.4 Example Usage / Output
- Section 3.7.3 CLI transcript
- Also used in: Project 3: Memory Leak Detector
References
- K&R “The C Programming Language” (Pointers chapter)
- “Understanding and Using C Pointers” by Richard Reese
- man 3 printf
Key insights
An address is only meaningful when you can name its region and lifetime.
Summary
You now know how to print, interpret, and reason about pointer values as real addresses inside a process.
Homework/Exercises to practice the concept
- Print the addresses of three different local variables and compare.
- Print the address of a function and confirm it lies in an r-x region.
Solutions to the homework/exercises
- The addresses should be close together on the stack.
- Function addresses resolve to the text region in /proc/self/maps.
2.3 /proc/self/maps Parsing and Region Classification
Fundamentals
On Linux, /proc/self/maps is a text file that lists every virtual memory region mapped into your process. Each line shows a start and end address, permissions, an offset, device numbers, an inode, and an optional path or label such as [heap] or [stack]. Your memory inspector will read this file, parse each line, and convert it into a structured representation. The key challenge is robust parsing: the file is formatted in columns but spacing can vary. By reading it safely and parsing into ranges, you can label any pointer by comparing its address to the map ranges. This is the heart of your tool.
Deep Dive into the Concept
Each line of /proc/self/maps looks like this:
00400000-00452000 r-xp 00000000 08:01 123456 /usr/bin/myprog
The first field is a range: start and end addresses in hex. Permissions follow (read, write, execute, private/shared). The offset refers to the file offset in the mapped file. The device and inode identify the file backing the mapping, and the final field (optional) is the pathname or a label. Special labels like [heap], [stack], [vdso], and [vvar] are used for anonymous or kernel-provided regions. Your parser should treat the line as a series of tokens: range, perms, offset, dev, inode, and then the remainder of the line as the path or label.
Robust parsing means avoiding fragile assumptions. The path might be missing, or it might contain spaces. The safest approach is to read the first five tokens using sscanf and then parse the rest as a raw string. Another approach is to use getline() and manual tokenization with strtok_r, but you must be careful not to lose the pathname. A good pattern is to locate the first space after the inode field and treat the rest as the path. Your tool can then classify regions based on path and permissions. For example, if the label is [heap], you know it is the primary heap region. If the path ends in .so, it is a shared library. If permissions are r-x and the path is your executable, it is a code region.
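The “fixed fields first, remainder as path” pattern can be sketched with the %n conversion of sscanf, which records how many characters the fixed fields consumed. The function name parse_maps_line is hypothetical, and the map_region layout is repeated here only so the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uintptr_t start, end;
    char perms[5];
    char path[256];
} map_region;

/* Parse one maps line. Returns 1 on success, 0 if the fixed fields are malformed. */
int parse_maps_line(const char *line, map_region *out) {
    assert(line != NULL && out != NULL);
    unsigned long long start, end, offset, inode;
    char dev[16];
    int consumed = 0;

    /* %n is not counted in the return value, so 6 means that every field
       before the path matched; consumed then marks where the path begins. */
    if (sscanf(line, "%llx-%llx %4s %llx %15s %llu%n",
               &start, &end, out->perms, &offset, dev, &inode, &consumed) != 6)
        return 0;
    if (end <= start) return 0;

    out->start = (uintptr_t)start;
    out->end = (uintptr_t)end;

    /* Everything after the inode, minus leading blanks, is the path or label.
       Spaces inside the path are kept; only the line ending is cut off. */
    const char *rest = line + consumed;
    while (*rest == ' ' || *rest == '\t') rest++;
    size_t len = strcspn(rest, "\r\n");
    if (len >= sizeof out->path) len = sizeof out->path - 1;
    memcpy(out->path, rest, len);
    out->path[len] = '\0';
    return 1;
}
```

With this shape, a line without a pathname yields an empty path string, and a path such as "/tmp/my file (deleted)" is preserved in full instead of being cut at the first space.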
Once parsed, you can create a structure like:
typedef struct {
    uintptr_t start, end;
    char perms[5];
    char path[256];
} map_region;
Then, given any pointer address, you can search for the region whose range contains it. A linear scan is sufficient for small numbers of regions (usually <200). For performance, you can sort and binary search, but that is optional. The important part is correct comparison: start <= addr < end. This avoids off-by-one errors.
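The linear scan with a half-open comparison might look like the sketch below. The function name find_region is an assumption, and the struct is repeated so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t start, end;
    char perms[5];
    char path[256];
} map_region;

/* Return the region containing addr, or NULL when addr is not mapped. */
const map_region *find_region(const map_region *regions, size_t count,
                              uintptr_t addr) {
    assert(regions != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Half-open interval: start is inside the region, end is not. */
        if (regions[i].start <= addr && addr < regions[i].end)
            return &regions[i];
    }
    return NULL;
}
```

Because end is exclusive, an address equal to the end of one region is attributed to the next region when the two are adjacent, and to no region otherwise.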
Classification logic is a learning feature, not just a parsing feature. You should group regions into categories: text, data, bss, heap, stack, mapped libs, anonymous mappings. Some regions (like vvar and vdso) are kernel-provided and worth calling out. Your tool should label these explicitly so the user can connect them to system behavior. For example, vdso is a small shared object the kernel maps to allow user-space to call certain kernel functions without a syscall.
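A first version of the classifier can be a short chain of string checks. The sketch below is one possible rule set, not a definitive one: the label strings, the "rodata" and "file" categories, and the exe_path parameter are assumptions made for illustration. It does not attempt to identify bss, which cannot be recognized from the path and permissions of a single line alone.

```c
#include <assert.h>
#include <string.h>

/* Return 1 when path names a shared object such as "libm.so" or "libc.so.6". */
static int is_shared_object(const char *path) {
    size_t len = strlen(path);
    if (len >= 3 && strcmp(path + len - 3, ".so") == 0) return 1;
    return strstr(path, ".so.") != NULL;
}

/* Map a region's permissions and path to a learner-facing label.
   exe_path is the path of the inspected executable (an assumed input). */
const char *classify_region(const char *perms, const char *path,
                            const char *exe_path) {
    assert(perms != NULL && path != NULL && exe_path != NULL);
    if (strcmp(path, "[heap]") == 0)  return "heap";
    if (strcmp(path, "[stack]") == 0) return "stack";
    if (strcmp(path, "[vdso]") == 0 || strcmp(path, "[vvar]") == 0)
        return "kernel";
    if (path[0] == '\0') return "anon";          /* no path: anonymous mapping */
    if (strcmp(path, exe_path) == 0) {
        if (strchr(perms, 'x') != NULL) return "text";
        if (strchr(perms, 'w') != NULL) return "data";
        return "rodata";                         /* read-only part of the executable */
    }
    if (is_shared_object(path)) return "lib";
    return "file";                               /* any other file-backed mapping */
}
```

Keeping the rules in one function makes it easy to turn them into an explicit table later and to unit-test each label against fixture lines.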
Determinism is a special requirement for this project. Because /proc/self/maps reflects the current system state, output can vary across machines and runs. To provide a stable golden path, your tool should accept an --input option that reads a fixture file instead of live /proc/self/maps. This allows unit tests and documentation examples to be stable. For example, you can ship a fixtures/maps.sample file and ensure the output exactly matches the example output in Section 3.7. This is how you satisfy the “deterministic golden path” requirement without disabling ASLR system-wide.
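Selecting between the fixture and the live map can be isolated in one small function so that the rest of the tool only ever sees a FILE pointer. This is a sketch under assumptions: open_maps_source, exit_code_for, and MEMVIZ_EXIT_INPUT are hypothetical names, the exit code 2 mirrors the transcript in Section 3.7.3, and sending the error to stderr is a choice made here rather than something the specification states.

```c
#include <assert.h>
#include <stdio.h>

#define MEMVIZ_EXIT_INPUT 2   /* exit code shown in the transcript for a missing file */

/* Open the fixture named by input_path, or the live map when it is NULL. */
FILE *open_maps_source(const char *input_path) {
    /* An empty string is treated as a caller bug rather than a missing file. */
    assert(input_path == NULL || input_path[0] != '\0');
    const char *path = input_path != NULL ? input_path : "/proc/self/maps";
    FILE *f = fopen(path, "r");
    if (f == NULL)
        fprintf(stderr, "[memviz] error: cannot open input file \"%s\"\n", path);
    return f;
}

/* Translate the result of open_maps_source into a process exit code. */
int exit_code_for(const FILE *source) {
    return source != NULL ? 0 : MEMVIZ_EXIT_INPUT;
}
```

The parser can then be tested against fixtures/maps.sample without any code path that depends on /proc being present.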
Finally, note that macOS does not have /proc/self/maps. If you want portability, you can add a stub mode that uses vmmap or mach_vm_region, but for this sprint, Linux is the target. Your CLI should be explicit about this. If run on macOS, print a clear error message and exit with a non-zero code. This is part of the “non-functional requirements” and “failure demo.”
How this fits on projects
Parsing maps is the core of Section 3.5 and the classification table in Section 4.2. The fixture mode is required for Section 3.7.2 and the testing strategy in Section 6.
Definitions & key terms
- /proc/self/maps: Kernel-provided file listing memory mappings.
- VMA: Virtual Memory Area (range + permissions).
- Private/Shared: Whether changes are shared with other processes.
- vdso/vvar: Kernel-provided virtual segments.
Mental model diagram (ASCII)
Line -> tokens -> struct -> region classification
"0040-0452 r-xp 0 08:01 123 /usr/bin/a"
| | | | | |
v v v v v v
start end perms ... path
How it works (step-by-step, with invariants and failure modes)
- Read each line with getline().
- Parse range and permissions into a struct.
- Capture optional path/label.
- Store regions in a vector/list.
- For each address, find the containing region.
Invariant: Address ranges are half-open: start <= addr < end.
Failure modes: Misparsing paths, off-by-one range checks, or buffer overflows.
Minimal concrete example
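char *line = NULL; size_t n = 0;
while (getline(&line, &n, f) != -1) {
    unsigned long start, end, offset;
    char perms[5], dev[12], path[256] = {0};
    unsigned long inode;
    /* The space before %255[^\n] skips the padding in front of the path. */
    int fields = sscanf(line, "%lx-%lx %4s %lx %11s %lu %255[^\n]",
                        &start, &end, perms, &offset, dev, &inode, path);
    if (fields >= 6) { /* store region; fields == 6 means there was no path */ }
}
free(line); /* getline() allocates the buffer; release it after the loop */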
Common misconceptions
- “The maps file is stable.” -> It changes as mappings are created/destroyed.
- “The path always exists.” -> Anonymous mappings may have no path.
- “Ranges are inclusive.” -> End is exclusive.
Check-your-understanding questions
- Why might a line have no path?
- What does the p in r-xp mean?
- Why must ranges be treated as half-open intervals?
Check-your-understanding answers
- Anonymous mappings or special kernel regions may lack paths.
- p means private (copy-on-write).
- Otherwise, adjacent ranges would overlap at boundaries.
Real-world applications
- Building process explorers and memory visualization tools.
- Debugging address-based crashes.
- Forensics and memory analysis.
Where you’ll apply it
- Section 3.5 Data Formats / Schemas / Protocols
- Section 4.2 Key Components
- Also used in: Project 3: Memory Leak Detector for stable reporting formats.
References
- man 5 proc
- “The Linux Programming Interface” (procfs chapters)
Key insights
A reliable parser turns raw kernel text into structured memory insight.
Summary
You can now parse /proc/self/maps and classify regions in a stable, testable way.
Homework/Exercises to practice the concept
- Parse a saved maps file and list only regions with r-x.
- Find the region containing a given address.
Solutions to the homework/exercises
- The output should include your executable text and shared libraries.
- Compare the address to each region’s start/end range.
3. Project Specification
3.1 What You Will Build
A command-line tool called memviz that:
- Prints key addresses (text, data, bss, heap, stack).
- Reads /proc/self/maps (or a fixture file) and labels regions.
- Outputs a summary table and a detailed listing.
- Explains permissions and region purposes.
Included: Linux parsing, deterministic fixture mode, region classification, formatted output. Excluded: macOS support, graphical UI, kernel-level memory analysis.
3.2 Functional Requirements
- Print core addresses: text (&main), data (&global), bss, heap, and stack.
- Parse maps: read /proc/self/maps or --input <file>.
- Classify regions: label stack, heap, text, data, bss, mapped libs, anonymous.
- Show permissions: display rwx and p/s flags.
- Offset mode: optional --offsets to show offsets within region bases.
- Deterministic mode: --input fixture yields stable output.
3.3 Non-Functional Requirements
- Performance: Parse maps in under 50 ms on typical systems.
- Reliability: Handle missing /proc gracefully with clear errors.
- Usability: Output should be human-readable, aligned, and annotated.
3.4 Example Usage / Output
$ ./memviz --summary
Text: 0x0000000000401140 (r-x)
Data: 0x0000000000602000 (rw-)
BSS: 0x0000000000603000 (rw-)
Heap: 0x0000000001c2b000 (rw-)
Stack: 0x00007ffffffde000 (rw-)
$ ./memviz --input fixtures/maps.sample --offsets
[heap] +0x0000 size=0x21000 perms=rw-p
[stack] +0x0000 size=0x21000 perms=rw-p
[text] +0x0000 size=0x52000 perms=r-xp /usr/bin/memviz
3.5 Data Formats / Schemas / Protocols
Input (maps line):
00400000-00452000 r-xp 00000000 08:01 123456 /usr/bin/memviz
Parsed Struct:
typedef struct {
    uintptr_t start, end;
    char perms[5];
    char path[256];
} map_region;
3.6 Edge Cases
- Maps line missing a pathname.
- Regions with spaces in path (e.g., deleted files).
- Running on non-Linux systems.
- Reading fixture file with trailing whitespace or empty lines.
- Addresses that fall into no known region.
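Two of these edge cases, trailing whitespace and empty lines in a fixture, can be handled before a line ever reaches the parser. The helpers below are a minimal sketch; rstrip and is_blank are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Remove trailing whitespace (spaces, tabs, CR, LF) in place; returns the new length. */
size_t rstrip(char *line) {
    assert(line != NULL);
    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';
    return len;
}

/* A line can be skipped when it contains nothing but whitespace. */
int is_blank(const char *line) {
    assert(line != NULL);
    while (*line != '\0') {
        if (!isspace((unsigned char)*line)) return 0;
        line++;
    }
    return 1;
}
```

A read loop can call is_blank first and continue to the next line, then rstrip the remainder so that Windows-style line endings in a fixture do not end up inside the path field.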
3.7 Real World Outcome
The outcome is a terminal tool that prints an annotated memory map and a summary of key addresses. You should be able to run it in deterministic mode and see the exact output below. It must include a failure path for missing files and a clear exit code.
3.7.1 How to Run (Copy/Paste)
make
./memviz --summary
./memviz --input fixtures/maps.sample --offsets
3.7.2 Golden Path Demo (Deterministic)
Use the provided fixtures/maps.sample file shipped with the project:
$ ./memviz --input fixtures/maps.sample --offsets
[memviz] reading fixtures/maps.sample
[text] +0x0000 size=0x00052000 perms=r-xp /usr/bin/memviz
[data] +0x0000 size=0x00001000 perms=rw-p /usr/bin/memviz
[bss] +0x1000 size=0x00002000 perms=rw-p /usr/bin/memviz
[heap] +0x0000 size=0x00021000 perms=rw-p [heap]
[stack] +0x0000 size=0x00021000 perms=rw-p [stack]
Exit code: 0
3.7.3 If CLI: Exact Terminal Transcript
$ ./memviz --summary
[memviz] pid=12345
Text: 0x0000000000401140 (r-x)
Data: 0x0000000000602000 (rw-)
BSS: 0x0000000000603000 (rw-)
Heap: 0x0000000001c2b000 (rw-)
Stack: 0x00007ffffffde000 (rw-)
Exit code: 0
$ ./memviz --input missing.maps
[memviz] error: cannot open input file "missing.maps"
Exit code: 2
4. Solution Architecture
4.1 High-Level Design
+---------+ +----------------+ +------------------+
| CLI | ---> | Map Parser | ---> | Region Classifier |
+---------+ +----------------+ +------------------+
| | |
| v v
| Region List Summary Renderer
+---------------------------------------->
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| CLI Parser | Parse flags like --summary, --input | Minimal flags, predictable defaults |
| Map Parser | Read and parse /proc/self/maps | getline + sscanf for robustness |
| Classifier | Label regions based on path/label | Explicit mapping table |
| Renderer | Print summary and detailed table | Fixed-width columns for readability |
4.3 Data Structures (No Full Code)
typedef struct {
    uintptr_t start, end;
    char perms[5];
    char path[256];
    char label[32];
} map_region;
4.4 Algorithm Overview
Key Algorithm: Region Lookup
- Parse maps into map_region list.
- For each target address, find region where start <= addr < end.
- Compute offset addr - start for offset mode.
- Render labeled output.
Complexity Analysis:
- Time: O(R + A*R) where R is number of regions, A is number of addresses.
- Space: O(R).
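If the A*R term ever matters, the regions can be sorted once and searched with binary search, which brings the cost to roughly O(R log R + A log R). The sketch below is an optional variant, not the required design; addr_range, sort_ranges, and find_range_sorted are hypothetical names, and it assumes the ranges do not overlap, which is the case for lines taken from a maps file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uintptr_t start, end;   /* half-open range [start, end) */
} addr_range;

/* qsort comparator: order ranges by start address. */
static int by_start(const void *a, const void *b) {
    const addr_range *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort once so that later lookups can use binary search. */
void sort_ranges(addr_range *ranges, size_t count) {
    assert(ranges != NULL || count == 0);
    if (count > 1) qsort(ranges, count, sizeof *ranges, by_start);
}

/* Binary search over sorted, non-overlapping ranges. Returns the index or -1. */
long find_range_sorted(const addr_range *ranges, size_t count, uintptr_t addr) {
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < ranges[mid].start)       hi = mid;       /* look left */
        else if (addr >= ranges[mid].end)   lo = mid + 1;   /* look right */
        else                                return (long)mid;
    }
    return -1;
}
```

For the region counts this project deals with, the linear scan is usually just as fast in practice, so this variant is mainly useful as an exercise.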
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install build-essential
# or on macOS with Linux VM
5.2 Project Structure
memviz/
|-- src/
| |-- main.c
| |-- maps.c
| |-- classify.c
| \-- render.c
|-- include/
| \-- memviz.h
|-- fixtures/
| \-- maps.sample
|-- tests/
| \-- test_maps.c
|-- Makefile
\-- README.md
5.3 The Core Question You’re Answering
“What IS memory? Where do my variables actually live in memory, and how can I prove it?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Virtual memory layout and permissions (see Section 2.1).
- Pointer formatting and arithmetic (see Section 2.2).
- /proc/self/maps parsing (see Section 2.3).
5.5 Questions to Guide Your Design
- How will you parse variable-width columns safely?
- What labels matter to a learner, and how will you derive them?
- How will you make output deterministic for tests?
- Should you print absolute addresses or offsets?
5.6 Thinking Exercise
Draw a mock /proc/self/maps file and annotate which lines correspond to code, heap, stack, and shared libraries. Then decide how your tool should label each.
5.7 The Interview Questions They’ll Ask
- How does ASLR affect debugging?
- What is the difference between r-x and rw- regions?
- Why do the heap and stack grow in opposite directions?
5.8 Hints in Layers
Hint 1: Start by printing a raw /proc/self/maps file.
Hint 2: Parse the range into start/end integers and print them.
Hint 3: Add labels for [heap] and [stack] first.
Hint 4: Add --input for deterministic fixtures.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Virtual memory | CSAPP | Ch. 9 |
| Process layout | OSTEP | VM chapters |
| Pointers | K&R | Ch. 5 |
5.10 Implementation Phases
Phase 1: Parser Foundation (2-3 hours)
Goals: parse maps into structs; print raw ranges.
Tasks: parse line tokens, store in list, print back.
Checkpoint: output matches original file lines.
Phase 2: Classification + Summary (3-4 hours)
Goals: label regions and print summary.
Tasks: implement label rules, summary for text/data/heap/stack.
Checkpoint: addresses show correct regions.
Phase 3: Deterministic Mode + Polish (2-3 hours)
Goals: add --input, format output, handle errors.
Tasks: implement fixture parsing, error handling, tests.
Checkpoint: output matches golden path exactly.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parsing strategy | sscanf vs tokenization | getline + sscanf | Simpler, robust enough |
| Output format | raw vs aligned table | aligned | Readable for humans |
| Determinism | disable ASLR vs fixture | fixture | Safe and portable |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Parser correctness | fixture parsing, range boundaries |
| Integration Tests | End-to-end output | memviz --input fixtures/maps.sample |
| Edge Case Tests | Robust parsing | empty lines, missing path |
6.2 Critical Test Cases
- Fixture file parses all regions correctly.
- An address exactly on a region boundary is classified correctly (start is inside the region, end is not).
- Missing input file yields exit code 2.
6.3 Test Data
fixtures/maps.sample
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Off-by-one in ranges | address classified wrong | use start <= addr < end |
| Misparsed path | labels missing | parse remainder after inode |
| Hardcoded /proc | tool fails in tests | add --input |
7.2 Debugging Strategies
- Print parsed tokens before classification.
- Compare your output with cat /proc/self/maps (cat prints its own mappings, so compare the format and labels, not the addresses).
- Use assert on range ordering.
7.3 Performance Traps
Parsing is small; performance issues usually come from overly complex regex or dynamic allocations inside loops.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add --json output.
- Add colorized output.
8.2 Intermediate Extensions
- Add --addr <hex> to classify an arbitrary address.
- Show region size in KB/MB.
8.3 Advanced Extensions
- Add symbol lookup with addr2line for text addresses.
- Visualize memory map as ASCII bar chart.
9. Real-World Connections
9.1 Industry Applications
- Debugging tools: crash triage uses memory maps to identify faults.
- Security: exploit mitigations rely on memory layout randomness.
9.2 Related Open Source Projects
- pmap: shows process memory mappings.
- procps: process tools that parse /proc.
9.3 Interview Relevance
- Explaining stack vs heap and ASLR is a common systems interview topic.
10. Resources
10.1 Essential Reading
- CSAPP Ch. 9 (Virtual Memory)
- OSTEP VM chapters
10.2 Video Resources
- “Virtual Memory” lectures from MIT 6.828 / 6.1810
10.3 Tools & Documentation
- man 5 proc for /proc/self/maps
- man 3 printf for %p
10.4 Related Projects in This Series
- Project 3: Memory Leak Detector
- Project 4: Arena Allocator
11. Self-Assessment Checklist
11.1 Understanding
- I can explain virtual memory and ASLR.
- I can parse /proc/self/maps into ranges.
- I can classify an address into the correct region.
11.2 Implementation
- Summary output matches golden path.
- Missing file errors are handled with exit code 2.
- Code is clean and documented.
11.3 Growth
- I can explain my output in a debugging session.
- I documented surprising results.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Prints addresses for text/data/bss/heap/stack.
- Parses /proc/self/maps and labels regions.
- Passes fixture test.
Full Completion:
- Includes --offsets and --input modes.
- Outputs aligned summary and full map.
Excellence (Going Above & Beyond):
- Adds JSON output and address lookup.
- Includes ASCII memory map visualization.