Learn Linux Crash Dump Analysis: From Core Dumps to Kernel Panics
Goal: Master the art of post-mortem debugging in Linux. Go from analyzing a simple segmentation fault to dissecting a full kernel panic, learning the tools and techniques to find the root cause of any crash.
Why Learn Crash Dump Analysis?
When a program disappears, leaving nothing but a Segmentation fault (core dumped) message, most developers are left guessing. Learning to analyze the “core dump” file is a superpower. It allows you to freeze time at the moment of the crash and inspect the entire state of your program: every variable, every thread, and the full call stack. This is the most direct way to solve the most mysterious bugs.
After completing these projects, you will:
- Confidently use GDB and other tools for post-mortem debugging.
- Understand the ELF format of core dump files.
- Diagnose segmentation faults, memory corruption, and stack overflows from a dump.
- Analyze crashes in complex multi-threaded applications.
- Understand the fundamentals of kernel crash dumps (kdump) and how to begin analyzing them.
- Build your own tools to automate crash analysis.
Core Concept Analysis
The Crash Analysis Landscape
┌───────────────────────────────────────────────────────────┐
│ RUNNING APPLICATION │
│ │
│ int main() {
│ char *p = NULL;
│ *p = 'a'; // CRASH! (Segmentation Fault)
│ return 0;
│ }
│
└───────────────────────────────────────────────────────────┘
│
▼ Signal (SIGSEGV)
┌───────────────────────────────────────────────────────────┐
│ LINUX KERNEL HANDLER │
│ │
│ Generates a snapshot of the process's memory state │
│ based on system settings (ulimit -c, core_pattern).
│
└───────────────────────────────────────────────────────────┘
│
▼ File is created
┌───────────────────────────────────────────────────────────┐
│ CORE DUMP FILE (e.g., core.1234) │
│ │
│ ELF format containing: │
│ • Register values (RIP, RSP, RAX, etc.) │
│ • The entire memory space (.text, .data, stack, heap) │
│ • Information about threads, signals, etc. │
│
└───────────────────────────────────────────────────────────┘
│
▼ Post-Mortem Debugging
┌───────────────────────────────────────────────────────────┐
│ ANALYSIS WITH GDB │
│ │
│ $ gdb ./my_app ./core.1234 │
│ (gdb) bt // See the call stack │
│ (gdb) frame 2 // Switch to a specific stack frame │
│ (gdb) p my_var // Print a variable's value │
│ (gdb) x/16wx $rsp // Examine memory on the stack │
│
└───────────────────────────────────────────────────────────┘
Key Concepts Explained
1. What is a Core Dump?
A core dump is a file containing a snapshot of a process’s memory and state at the moment it terminated unexpectedly (crashed). It’s primarily used for post-mortem debugging. On Linux, core dumps are typically in the ELF format.
2. Enabling Core Dumps
By default, many systems have core dump generation turned off. You can enable it for your current session:
# Allow unlimited size for core dump files
ulimit -c unlimited
The kernel’s behavior for naming and locating core dumps is controlled by /proc/sys/kernel/core_pattern.
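To see where dumps will land on your system, check core_pattern; the pattern below is only one example of how it can be set (changing it requires root):
# Where and how will core files be named on this machine?
cat /proc/sys/kernel/core_pattern
# Example only: write dumps into /tmp, named with the executable name (%e) and PID (%p)
echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern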
3. GDB for Post-Mortem Debugging
The GNU Debugger (GDB) is the primary tool for analyzing core dumps on Linux. The basic syntax is:
gdb <path_to_executable> <path_to_core_dump>
Essential GDB commands for core analysis:
- bt or backtrace: Show the call stack.
- frame <N>: Select stack frame N to inspect its context.
- info registers: Display the CPU registers at the time of the crash.
- p <variable>: Print the value of a variable.
- x/<format> <address>: Examine memory at a given address.
- info threads: List all threads and their states.
4. Kernel Crash Dumps (vmcore)
When the OS itself crashes (a “kernel panic”), it can also generate a dump file, often called vmcore. This is enabled via kdump, a mechanism that boots into a secondary, tiny Linux kernel to save the main memory of the panicked system. These are analyzed with specialized tools like crash.
Project List
The following 10 projects will guide you from analyzing your first simple crash to understanding the complex world of kernel panics.
Project 1: The First Crash
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: C
- Alternative Programming Languages: C++, Rust
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Debugging / System Configuration
- Software or Tool: GCC, GDB, Bash
- Main Book: “The C Programming Language” by Kernighan & Ritchie
What you’ll build: A C program designed to crash with a segmentation fault, and a script to configure your system, run the program, and confirm a core dump file was created.
Why it teaches crash analysis: You can’t analyze what you can’t create. This project forces you to understand the prerequisites for crash analysis: enabling core dumps and knowing where to find them.
Core challenges you’ll face:
- Writing code that reliably segfaults → maps to understanding invalid memory access (e.g., dereferencing NULL)
- Configuring ulimit → maps to shell resource limits
- Finding the generated core dump file → maps to understanding core_pattern
- Verifying the dump with file → maps to confirming the file is a valid ELF core dump
Key Concepts:
- Segmentation Fault: “Computer Systems: A Programmer’s Perspective” Ch. 9 - Bryant & O’Hallaron
- ulimit: man bash
- core_pattern: man core
Difficulty: Beginner Time estimate: Weekend Prerequisites: Basic C programming, Linux command line.
Real world outcome:
$ ./run_crash.sh
Enabling core dumps...
Running 'crashing_program'...
Segmentation fault (core dumped)
Verifying core dump...
core.crashing_program.1234: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from './crashing_program'
Success! Core dump generated.
Implementation Hints:
A simple crashing program in C:
// Don't just copy! Understand why this crashes.
int main() {
int *ptr = NULL; // ptr points to an invalid address
*ptr = 42; // Attempting to write to address 0 causes a SIGSEGV
return 0;
}
Your run_crash.sh script should (a minimal sketch follows this list):
- Print an informative message.
- Execute ulimit -c unlimited.
- Execute your compiled C program.
- Check if a file named core* (or similar, depending on your system) was created.
- Run the file command on the core dump to show its type.
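A minimal run_crash.sh sketch, assuming the compiled binary is named crashing_program and that your core_pattern drops core files into the current directory:
#!/bin/bash
# run_crash.sh - trigger a crash and confirm a core dump was written.
# Assumes ./crashing_program exists and core_pattern writes into the current directory.
set -u

echo "Enabling core dumps..."
ulimit -c unlimited

echo "Running 'crashing_program'..."
./crashing_program   # expected to die with SIGSEGV

echo "Verifying core dump..."
core_file=$(ls core* 2>/dev/null | head -n 1)
if [ -n "$core_file" ]; then
    file "$core_file"
    echo "Success! Core dump generated."
else
    echo "No core dump found. Check /proc/sys/kernel/core_pattern." >&2
    exit 1
fi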
Learning milestones:
- Program crashes as expected → You understand how to trigger a common bug.
- Core dump is generated → You can configure a system for post-mortem debugging.
- file command recognizes the dump → You know the core dump is a structured ELF file.
Project 2: The GDB Backtrace
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: C
- Alternative Programming Languages: GDB commands
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Debugging / GDB
- Software or Tool: GDB
- Main Book: “The Art of Debugging with GDB” by Matloff & Salzman
What you’ll build: Using the program and core dump from Project 1, you will load them into GDB and perform the most fundamental analysis: getting a stack backtrace to find the exact line of the crash.
Why it teaches crash analysis: The backtrace is the “map” that shows you how your program got to the crash. Learning to read it is the first and most important skill in post-mortem debugging.
Core challenges you’ll face:
- Loading the executable and core file into GDB → maps to the basic GDB command-line syntax
- Compiling with debug symbols (-g) → maps to understanding the difference between seeing memory addresses and meaningful function/line numbers
- Running the backtrace command → maps to the primary command for crash analysis
- Interpreting the output → maps to linking the stack frame to your source code
Key Concepts:
- Debug Symbols: “The Art of Debugging with GDB” Ch. 1
- Stack Frames: “Computer Systems: A Programmer’s Perspective” Ch. 3
- GDB Backtrace: GDB Documentation
Difficulty: Beginner Time estimate: Weekend Prerequisites: Project 1.
Real world outcome:
# First, compile WITH debug symbols
$ gcc -g -o crashing_program crashing_program.c
# Run it to get a core dump...
$ gdb crashing_program core.crashing_program.1234
(gdb) bt
#0 0x000055555555513d in main () at crashing_program.c:4
(gdb) quit
The output at crashing_program.c:4 is the “treasure”—the exact location of the bug. Without -g, you would only see a memory address like 0x000055555555513d.
Implementation Hints:
- Re-compile your program from Project 1, but this time add the -g flag to GCC. This tells the compiler to include debug information (symbols) in the executable.
- Generate a new core dump with this new executable.
- Launch GDB with gdb <executable> <core_file>.
- At the (gdb) prompt, type bt (or backtrace) and press Enter.
- Analyze the output. Frame #0 is the function where the crash occurred. Notice how GDB shows you the file name and line number.
- Try the same steps with a version compiled without -g. What’s the difference in the bt output? This will teach you why symbols are critical.
Learning milestones:
- GDB loads successfully → You understand the basic GDB workflow.
- Backtrace shows line numbers → You understand the importance of debug symbols.
- You can identify the crashing function and line → You have learned the single most valuable skill in crash analysis.
Project 3: The Memory Inspector
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: C
- Alternative Programming Languages: C++
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Debugging / Memory Analysis
- Software or Tool: GDB
- Main Book: “Understanding and Using C Pointers” by Richard M Reese
What you’ll build: A program that corrupts a data structure on the stack (e.g., a buffer overflow overwriting a local variable) before crashing. You will then use GDB on the core dump to inspect variables and raw memory to piece together why the crash happened.
Why it teaches crash analysis: A backtrace tells you where it crashed, but memory inspection tells you why. This project teaches you to be a detective, examining the “crime scene” (the memory dump) to understand the state of the program.
Core challenges you’ll face:
- Writing a controlled buffer overflow → maps to understanding stack layout and array bounds
- Using print (p) to inspect variables → maps to seeing the corrupted values
- Using examine (x) to view raw memory → maps to seeing how data is laid out on the stack
- Navigating stack frames (frame) → maps to inspecting variables in the calling function
Key Concepts:
- The Stack and Local Variables: “Dive Into Systems” Ch. 6 - Matthews, Newhall, Webb
- Buffer Overflow: “Hacking: The Art of Exploitation” Ch. 2 - Jon Erickson
- GDB Memory Examination: help x within GDB
Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Project 2, basic understanding of pointers and arrays.
Real world outcome: You’ll load a core dump and perform an investigation inside GDB.
$ gdb ./corrupter core.1234
(gdb) bt
#0 0x0000... in vulnerable_function (input=0x7ffc...) at corrupter.c:15
#1 0x0000... in main () at corrupter.c:22
# You notice the crash is in a function you didn't expect. Why?
(gdb) frame 1
#1 0x0000... in main () at corrupter.c:22
(gdb) p secret_code
$1 = 65 'A' # This should be the original value, but it's been overwritten!
# Let's look at the buffer that was passed to vulnerable_function
(gdb) p buffer
$2 = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA..."
# Let's examine memory around the local variables
(gdb) x/8wx &secret_code
0x7ffc12345678: 0x00004141 0x41414141 ... # See the 'A's (0x41) overflowing!
This investigation would prove the buffer overflow in vulnerable_function corrupted secret_code in main.
Implementation Hints:
Create a C program with a main function and a vulnerable_function (a sketch follows these hints):
- In main, declare a char buffer[128] and an int secret_code = 1337;.
- Fill the buffer with user input or a hardcoded long string (e.g., 200 ‘A’s).
- Call vulnerable_function(buffer).
- In vulnerable_function, declare a local buffer, e.g., char local_buf[64];.
- Use strcpy to copy the oversized input from main’s buffer into local_buf. This will overflow local_buf and corrupt other data on the stack.
- Add a line after the strcpy that will crash, perhaps by calling a function pointer that also got overwritten.
- Analyze the core dump. Use p secret_code in the main frame. Is it still 1337? Use x to look at the memory on the stack to see the overflow.
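A possible corrupter.c along those lines. The exact stack layout (and therefore exactly what the overflow clobbers) depends on your compiler and flags, so build with gcc -g -O0 -fno-stack-protector to keep the layout simple and the crash a plain SIGSEGV:
// corrupter.c - deliberately overflows a stack buffer. Do NOT reuse this pattern in real code.
#include <stdio.h>
#include <string.h>

void vulnerable_function(const char *input) {
    void (*done)(void) = NULL;       // local function pointer, likely near local_buf on the stack
    char local_buf[64];
    strcpy(local_buf, input);        // no bounds check: long input overflows local_buf
    if (done) done();                // if 'done' was overwritten with 0x4141..., this call faults;
                                     // otherwise the smashed return address crashes us on return
}

int main(void) {
    int secret_code = 1337;          // check from the core dump: (gdb) p secret_code
    char buffer[128];
    memset(buffer, 'A', sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';
    vulnerable_function(buffer);
    printf("secret_code = %d\n", secret_code);  // may already be corrupted by the overflow
    return 0;
}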
Learning milestones:
- You can inspect variable state → You can check program logic from a dump.
- You can inspect raw memory → You can see beyond what variables show you.
- You can link memory corruption to a crash → You have connected the “cause” (overflow) to the “effect” (crash).
Project 4: The Automated Crash Detective
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: Python
- Alternative Programming Languages: Bash scripting with GDB batch mode
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Automation / GDB Scripting
- Software or Tool: GDB Python API
- Main Book: “Black Hat Python” by Justin Seitz (for general automation mindset)
What you’ll build: A Python script that automates the initial triage of a core dump. The script will take an executable and a core file as input, and programmatically use GDB to extract the backtrace, register state, and the crashing signal, printing a concise summary report.
Why it teaches crash analysis: Real-world systems can have dozens of crashes a day. Manually running GDB for each one is not scalable. This project teaches you to automate the repetitive parts of analysis, a key skill for SREs and platform engineers.
Core challenges you’ll face:
- Running GDB in batch mode → maps to the --batch and --command flags
- Using GDB’s Python API → maps to scripting GDB for custom logic
- Parsing GDB’s output → maps to extracting useful information from text
- Handling different crash types gracefully → maps to making your script robust
Key Concepts:
- GDB Batch Mode: GDB Documentation
- GDB Python API: gdb.execute, gdb.parse_and_eval
- Subprocess Management: Python’s subprocess module
Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Project 2, basic Python scripting.
Real world outcome:
$ python3 auto_analyzer.py ./my_app core.1234
--- Crash Analysis Report ---
Executable: ./my_app
Core File: core.1234
Signal: SIGSEGV (Segmentation fault) at 0x0
Crashing IP (RIP): 0x55555555513d
--- Backtrace ---
#0 0x000055555555513d in main () at crashing_program.c:4
--- Registers ---
RAX: 0x0
RBX: 0x0
RCX: 0x7f...
...
Implementation Hints: You can approach this in two ways:
1. GDB Batch File:
- Create a file, commands.gdb, with GDB commands:
  set pagination off
  bt
  info registers
  quit
- Run GDB from your script: gdb -q --batch -x commands.gdb <executable> <core_file>
- Capture and parse the stdout.
2. GDB Python API (more powerful):
- Create a Python script, analyzer.py, to be loaded by GDB:
  # This script runs *inside* GDB
  import gdb
  gdb.execute("set pagination off")
  print("--- Backtrace ---")
  gdb.execute("bt")
  print("\n--- Registers ---")
  # You can also get registers programmatically here
  rip = gdb.parse_and_eval("$rip")
  print(f"RIP: {rip}")
  # ... and so on
- Run GDB from a wrapper script: gdb -q --batch -x analyzer.py <executable> <core_file>
The Python API is more robust as it gives you structured access to GDB’s internals, whereas parsing text output is brittle.
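For the batch-file route, a minimal wrapper sketch (the names auto_analyzer.py and commands.gdb match the files described above; parsing the captured text further is left to you):
#!/usr/bin/env python3
# auto_analyzer.py - run GDB in batch mode against an executable + core file
# and print a raw triage report.
import subprocess
import sys

def analyze(executable: str, core_file: str) -> str:
    # -q: quiet banner, --batch: exit after commands, -x: command file
    result = subprocess.run(
        ["gdb", "-q", "--batch", "-x", "commands.gdb", executable, core_file],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout

if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} <executable> <core_file>")
    print("--- Crash Analysis Report ---")
    print(analyze(sys.argv[1], sys.argv[2]))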
Learning milestones:
- Script successfully runs GDB → You can control GDB from code.
- Script extracts backtrace → You have automated the most basic triage step.
- Script generates a full report → You have built a genuinely useful analysis tool.
Project 5: Multi-threaded Mayhem
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: C
- Alternative Programming Languages: C++, Rust
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Concurrency / Advanced Debugging
- Software or Tool: GDB, pthreads
- Main Book: “The Linux Programming Interface” by Michael Kerrisk (for pthreads)
What you’ll build: A multi-threaded C program where one thread causes a crash (e.g., a data race leading to a NULL pointer dereference). You will analyze the core dump to inspect the state of all threads to understand the interaction that led to the crash.
Why it teaches crash analysis: Single-threaded crashes are simple. In the real world, crashes are often caused by complex interactions between threads. This project teaches you to analyze the entire process, not just the single thread that failed.
Core challenges you’ll face:
- Writing a multi-threaded program with a data race → maps to understanding shared memory and synchronization problems
- Using info threads in GDB → maps to seeing the state of all threads
- Switching between threads (thread <N>) → maps to examining the context of each thread
- Using thread apply all bt → maps to getting a backtrace for every single thread at once
Key Concepts:
- pthreads: “The Linux Programming Interface” Ch. 29-30
- Data Races: “Rust for Rustaceans” Ch. 6 - Jon Gjengset (explains the concept well, even for C)
- GDB Thread Debugging: GDB Documentation
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 3, understanding of threading concepts.
Real world outcome: In GDB, you’ll see a list of all threads and be able to investigate the “innocent” thread that was affected by the “guilty” one.
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f... (LWP 1234) 0x0000... in main () at mt_crash.c:30
2 Thread 0x7f... (LWP 1235) 0x0000... in writer_thread () at mt_crash.c:15
3 Thread 0x7f... (LWP 1236) 0x0000... in reader_thread () at mt_crash.c:25
# The crash happened in main (thread 1), but let's see what all threads were doing
(gdb) thread apply all bt
Thread 3 (Thread 0x7f... LWP 1236):
#0 reader_thread () at mt_crash.c:25
...
Thread 2 (Thread 0x7f... LWP 1235):
#0 writer_thread () at mt_crash.c:15
...
Thread 1 (Thread 0x7f... LWP 1234):
#0 0x0000... in main () at mt_crash.c:30 # CRASH HERE
...
# Let's switch to the writer thread and see what it did
(gdb) thread 2
(gdb) p local_ptr
$1 = (void *) 0x0 # This thread nulled the pointer that main later used!
Implementation Hints:
- Create a global pointer, e.g., char *g_data = "some data";.
- Create a “writer” thread that, after a sleep, sets g_data = NULL;.
- Create a “reader” thread that just loops and does nothing, to show its state.
- In the main thread, after starting the other threads, sleep for a shorter time than the writer thread, then try to dereference g_data in a loop.
- By tuning the sleep calls, you can create a race where main reads the pointer just after the writer thread has set it to NULL, causing a crash in main.
- In GDB, use info threads and thread apply all bt to see what every thread was doing. You’ll see the crash in main, but by examining the state of the writer_thread, you can deduce the root cause. A minimal pthread sketch follows this list.
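A possible mt_crash.c along those lines; the sleep values are arbitrary and may need tuning on your machine to reproduce the race (build with gcc -g -pthread):
// mt_crash.c - a contrived race: the writer nulls a shared pointer that main keeps dereferencing.
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

char *g_data = "some data";          // shared and unsynchronized, on purpose

void *writer_thread(void *arg) {
    sleep(2);                        // let main start reading first
    g_data = NULL;                   // the "guilty" write
    return NULL;
}

void *reader_thread(void *arg) {
    while (1) sleep(1);              // idle thread, just to have more state in the dump
    return NULL;
}

int main(void) {
    pthread_t writer, reader;
    pthread_create(&writer, NULL, writer_thread, NULL);
    pthread_create(&reader, NULL, reader_thread, NULL);

    sleep(1);                        // shorter than the writer's sleep
    while (1) {
        printf("first byte: %c\n", g_data[0]);  // crashes once g_data becomes NULL
        usleep(100000);
    }
    return 0;                        // never reached
}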
Learning milestones:
- You can view all threads in a crashed process → You understand the scope of a core dump.
- You can get a backtrace for every thread → You can see what every thread was doing at the moment of the crash.
- You can correlate the state of multiple threads to find a root cause → You are now a true concurrency detective.
Project 6: Deconstructing a Stripped Binary Crash
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: C
- Alternative Programming Languages: C++, Rust, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Reverse Engineering / Disassembly
- Software or Tool: GDB, strip, objdump
- Main Book: “Practical Binary Analysis” by Dennis Andriesse
What you’ll build: You will take a crashing program, create a “stripped” version (removing all debug symbols), and analyze the resulting core dump. You will learn techniques to manually map memory addresses back to functions using disassembly.
Why it teaches crash analysis: In the real world, you often have to debug crashes in production binaries that have been stripped of all symbols to reduce size. This project teaches you how to work backward from raw memory addresses to find the location of a bug.
Core challenges you’ll face:
- Stripping an executable (strip command) → maps to understanding what symbols are and why they are removed
- Getting a backtrace with only addresses → maps to the reality of production debugging
- Using disassemble in GDB → maps to viewing the raw assembly code at the crash site
- Using info files and objdump → maps to manually calculating function locations from the unstripped binary
Key Concepts:
- Symbol Tables: “Practical Binary Analysis” Ch. 2
- x86 Assembly: “Low-Level Programming” by Igor Zhirkov
- GDB Disassembly: help disassemble in GDB
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 2, basic knowledge of what assembly code is.
Real world outcome: You’ll be faced with a useless backtrace and learn how to make sense of it.
$ gdb ./crashing_program_stripped core.5678
(gdb) bt
#0 0x000055555555513d in ?? ()
#1 0x00007ffff7de8b25 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00005555555550a0 in ?? ()
# The address 0x55555555513d is useless. Let's look at the code there.
(gdb) disassemble 0x000055555555513d
Dump of assembler code for function main:
0x0000555555555129 <+0>: push %rbp
...
=> 0x000055555555513d <+20>: movl $0x2a,(%rax) # The crash is here!
...
End of assembler dump.
# By looking at the surrounding assembly, you can figure out what the C code was doing.
# You can also use objdump on the *unstripped* binary to find what function 0x55555555513d belongs to.
Implementation Hints:
- Compile your crashing program with -g. Keep this file (crashing_program_debug).
- Create a stripped copy: cp crashing_program_debug crashing_program_stripped, then strip crashing_program_stripped.
- Generate a core dump using the crashing_program_stripped executable.
- Load it in GDB: gdb ./crashing_program_stripped core.1234.
- Run bt. Observe the lack of symbols (??). Note the memory address of frame #0.
- Use the disassemble command on that address to see the assembly code.
- Separately, use objdump -d crashing_program_debug or gdb crashing_program_debug to find out what function that address corresponds to. This simulates having the original build handy (see the sketch after this list).
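For example, if bt reported the crash at 0x55555555513d and GDB’s info proc mappings (or info files) shows the executable loaded at base 0x555555554000, you can translate the runtime address into a file offset and resolve it against the unstripped copy. The addresses and output here are illustrative:
# PIE binaries: runtime address - load base = offset inside the file
# 0x55555555513d - 0x555555554000 = 0x113d
addr2line -f -e crashing_program_debug 0x113d          # prints the function and file:line
objdump -d crashing_program_debug | grep -A3 '113d:'   # or read the surrounding assembly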
Learning milestones:
- You see a useless backtrace → You understand the problem of stripped binaries.
- You can disassemble the crashing address → You can see the machine-level instructions causing the fault.
- You can manually map an address back to a function → You have learned the fundamental technique for debugging stripped binaries.
Project 7: The Minidump Parser
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: Python
- Alternative Programming Languages: C, Go, Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Binary Formats / File Parsing
- Software or Tool: Google Breakpad / Crashpad
- Main Book: “Practical Binary Analysis” by Dennis Andriesse (for the file parsing mindset)
What you’ll build: A Python tool that parses a Google Breakpad minidump file (.dmp) and prints basic information: the OS, architecture, crashing reason, and a list of modules loaded in the process.
Why it teaches crash analysis: Full core dumps are huge. Many real-world systems (like Chrome, Firefox, and services using Sentry) use smaller “minidumps”. Understanding this format shows you a more scalable, cross-platform approach to crash reporting.
Core challenges you’ll face:
- Generating a minidump → maps to integrating a crash reporting library like Breakpad/Crashpad
- Parsing the minidump header → maps to understanding the file’s magic number and structure
- Navigating the stream directory → maps to finding the different sections (thread list, module list, etc.)
- Parsing a specific stream (e.g., ModuleListStream) → maps to decoding a specific part of the crash data
Key Concepts:
- Minidump File Format: Microsoft Docs (Breakpad uses a superset of this)
- Breakpad/Crashpad: Google’s open-source crash reporting libraries.
- Binary Stream Parsing: Similar to parsing ELF or other binary formats.
Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Project 1, strong Python skills, experience with binary data (struct module).
Real world outcome:
$ python3 minidump_parser.py 1a2b3c4d-e5f6-....dmp
--- Minidump Analysis ---
File: 1a2b3c4d-e5f6-....dmp
Timestamp: 2025-12-20 14:30:00
Flags: MiniDumpWithThreadInfo | MiniDumpWithProcessThreadData
--- System Info ---
OS: Linux 6.1.0-13-amd64
CPU Arch: AMD64
--- Crash Info ---
Signal: SIGSEGV
Address: 0x0
--- Loaded Modules (15) ---
0x000055... - /usr/bin/crashing_program
0x00007f... - /lib/x86_64-linux-gnu/libc.so.6
0x00007f... - /lib/x86_64-linux-gnu/libpthread.so.0
...
Implementation Hints:
- First, you need a minidump. The easiest way is to find a project that uses Crashpad and make it crash, or attempt to integrate it yourself into a simple C++ program. This is a challenge in itself!
- Start by parsing the MINIDUMP_HEADER. It’s at the start of the file. Check the Signature (“MDMP”).
- The header gives you NumberOfStreams and StreamDirectoryRva (an offset). Go to that offset.
- The stream directory is an array of MINIDUMP_DIRECTORY structs. Each entry tells you the StreamType (e.g., ThreadListStream, ModuleListStream) and gives you a location (offset and size) for that data.
- Pick one stream to start, like ModuleListStream. Go to its location.
- The ModuleListStream starts with a number of modules, followed by an array of MINIDUMP_MODULE structs. Parse these to get the name and base address of each loaded library. A parsing sketch follows this list.
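A minimal sketch for the header and stream directory, based on the public MINIDUMP_HEADER and MINIDUMP_DIRECTORY layouts (the stream-type names shown are a small, hand-picked subset):
#!/usr/bin/env python3
# minidump_parser.py - parse just the header and stream directory of a minidump.
import struct
import sys
from datetime import datetime, timezone

# A few well-known stream types from MINIDUMP_STREAM_TYPE
STREAM_NAMES = {3: "ThreadListStream", 4: "ModuleListStream",
                5: "MemoryListStream", 6: "ExceptionStream", 7: "SystemInfoStream"}

def parse(path: str) -> None:
    with open(path, "rb") as f:
        data = f.read()

    # MINIDUMP_HEADER: Signature, Version, NumberOfStreams, StreamDirectoryRva,
    #                  CheckSum, TimeDateStamp (u32 each), Flags (u64) -> 32 bytes
    sig, version, n_streams, dir_rva, checksum, timestamp, flags = \
        struct.unpack_from("<IIIIIIQ", data, 0)
    if sig != 0x504D444D:            # b"MDMP" read as a little-endian u32
        sys.exit("Not a minidump: bad signature")

    print(f"File: {path}")
    print(f"Timestamp: {datetime.fromtimestamp(timestamp, tz=timezone.utc)}")
    print(f"Streams: {n_streams}, Flags: {flags:#x}")

    # Stream directory: NumberOfStreams entries of MINIDUMP_DIRECTORY
    # (StreamType, DataSize, Rva -> 12 bytes each), starting at StreamDirectoryRva.
    for i in range(n_streams):
        stype, size, rva = struct.unpack_from("<III", data, dir_rva + i * 12)
        name = STREAM_NAMES.get(stype, f"StreamType {stype}")
        print(f"  [{i:2}] {name:<20} size={size:<8} rva={rva:#x}")

if __name__ == "__main__":
    parse(sys.argv[1])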
Learning milestones:
- You can parse the minidump header → You understand the basic structure of the file.
- You can iterate the stream directory → You can find all the data sections in the file.
- You can extract the module list → You have successfully parsed a complete, non-trivial section of the dump.
- You appreciate higher-level tools → You understand the complexity that tools like minidump_stackwalk are hiding.
Project 8: Introduction to Kernel Panics
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: C (for the module), Bash
- Alternative Programming Languages: N/A
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Kernel Development / System Administration
- Software or Tool: QEMU/VirtualBox, kdump, a custom kernel module
- Main Book: “Linux Device Drivers, 3rd Edition” by Corbet, Rubini, and Kroah-Hartman
What you’ll build: A tiny, buggy Linux kernel module that triggers a kernel panic. You will configure a virtual machine with kdump to capture a vmcore (kernel crash dump) when the panic occurs.
Why it teaches crash analysis: This is the next level. User-space crashes are contained; kernel panics bring down the whole system. This project teaches you the mechanics of how the system saves itself for later analysis, the foundation of debugging the OS itself.
Core challenges you’ll face:
- Setting up a kernel development environment → maps to installing the kernel headers and build tools needed to compile modules
- Writing a loadable kernel module → maps to the basics of kernel programming
- Triggering a panic → maps to dereferencing a NULL pointer in kernel space
- Configuring kdump → maps to reserving memory and setting up the capture kernel
Key Concepts:
- Kernel Modules: “Linux Device Drivers” Ch. 2
- Kernel Panics: “Understanding the Linux Kernel” Ch. 4 - Bovet & Cesati
- kdump/kexec: Red Hat kdump documentation
Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Strong Linux admin skills, C programming, and a willingness to break and reboot VMs.
Real world outcome:
You will have a virtual machine that, upon loading your module, will panic and reboot into a capture kernel. After the reboot, you’ll find a vmcore file in /var/crash/.
# In your VM
$ sudo insmod buggy_module.ko
[ 123.456] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 123.457] #PF: supervisor write access in kernel mode
[ 123.458] PGD 0 P4D 0
[ 123.459] Oops: 0002 [#1] PREEMPT SMP
...
[ 123.465] Kernel panic - not syncing: Fatal exception
[ 123.468] ---[ end Kernel panic - not syncing: Fatal exception ]---
# The system will kexec into the dump-capture kernel and save the memory.
# After rebooting back to the normal kernel:
$ ls /var/crash/
127.0.0.1-2025-12-20-15:00:00/
$ ls /var/crash/127.0.0.1-2025-12-20-15:00:00/
vmcore vmcore-dmesg.txt
Implementation Hints:
- Use a VM! Do not do this on your host machine. QEMU or VirtualBox is perfect. Use snapshots liberally.
- Install kernel development tools on your VM (kernel-devel, gcc, make, etc.).
- Write a minimal kernel module Makefile.
- Write a minimal kernel module (buggy_module.c) whose init function dereferences a NULL pointer. That triggers an oops, and with panic_on_oops set (common on kdump-enabled systems) a full panic. A sketch follows this list.
- Follow a guide for your specific distro (e.g., RHEL/CentOS, Debian) to install and configure kdump. This usually involves editing /etc/default/grub to reserve memory (e.g., crashkernel=512M) and enabling the kdump service.
- Reboot to apply the kernel command line changes.
- Load your module with insmod. Brace for impact.
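A minimal buggy_module.c sketch under those assumptions. Only ever load it in a disposable VM, and remember that whether the oops escalates to a full panic depends on panic_on_oops:
// buggy_module.c - loads, then immediately dereferences NULL in kernel mode.
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kernel.h>

static int __init buggy_init(void)
{
    int *p = NULL;

    pr_alert("buggy_module: about to write through a NULL pointer\n");
    *p = 42;   /* NULL dereference -> oops (the compiler may emit a trap instead;
                  either way the kernel dies here, and panics if panic_on_oops=1) */
    return 0;  /* never reached */
}

static void __exit buggy_exit(void)
{
}

module_init(buggy_init);
module_exit(buggy_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Deliberately broken module for kdump practice");
The matching out-of-tree Makefile boils down to obj-m += buggy_module.o plus a target that runs make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules.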
Learning milestones:
- You can compile a kernel module → You understand the basics of the kernel build system.
- You can trigger a kernel panic → You understand how fragile kernel space is.
- kdump successfully captures a vmcore → You have configured a production-grade kernel crash capture system.
Project 9: Analyzing a Kernel Panic with crash
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: crash utility commands
- Alternative Programming Languages: N/A
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Kernel Internals / Advanced Debugging
- Software or Tool: crash utility
- Main Book: “Linux Kernel Development” by Robert Love
What you’ll build: Using the vmcore file from Project 8, you will use the crash utility to perform a basic analysis: get a backtrace of the panicked process, view the kernel log buffer, and inspect basic system state.
Why it teaches crash analysis: This is the final frontier of system-level crash analysis. The crash utility is GDB for the entire kernel. Learning its basics allows you to debug the OS itself, the kind of skill needed for kernel development, driver development, and high-end SRE work.
Core challenges you’ll face:
- Installing crash and the right debug symbols → maps to linking the vmcore to the exact kernel build
- Loading the vmcore into crash → maps to the basic crash workflow
- Running bt to see the kernel stack → maps to finding where in the kernel the panic occurred
- Using log to see dmesg output → maps to finding the kernel’s own error messages leading up to the panic
Key Concepts:
- crash utility: man crash
- System Map: File mapping kernel symbols to addresses.
- Kernel Data Structures: “Linux Kernel Development” Ch. 3-5
Difficulty: Expert Time estimate: 1-2 weeks Prerequisites: Project 8.
Real world outcome: You will interactively explore the state of the entire operating system at the moment of its death.
# You need the debug symbols for your kernel version
$ sudo crash /usr/lib/debug/lib/modules/6.1.0-13-amd64/vmlinux /var/crash/.../vmcore
crash> bt
PID: 1234 TASK: ffff88810a4d8000 CPU: 1 COMMAND: "insmod"
#0 [ffffc90000a77e30] machine_panic at ffffffff8100259b
#1 [ffffc90000a77e80] panic at ffffffff8106a3e8
#2 [ffffc90000a77f00] oops_end at ffffffff81c01b9a
#3 [ffffc90000a77f30] no_context at ffffffff8104d2ab
#4 [ffffc90000a77f80] __do_page_fault at ffffffff81c05d7a
#5 [ffffc90000a77ff0] do_page_fault at ffffffff81c0605e
#6 [ffffc90000a77ff0] page_fault at ffffffff82000b9e
#7 [ffffc90000a77ff8] buggy_module_init at ffffffffc06ef00a [buggy_module] <-- Your module!
#8 [ffffc90000a77ff8] do_one_initcall at ffffffff811020e0
...
crash> log
[ 123.456] BUG: kernel NULL pointer dereference, address: 0000000000000000
...
[ 123.465] Kernel panic - not syncing: Fatal exception
...
crash> quit
Implementation Hints:
- On the same VM where the panic occurred, install the crash utility.
- You also need the kernel debug symbols for the exact kernel version that panicked. The package is often named kernel-debuginfo or linux-image-amd64-dbgsym. The path to the uncompressed kernel image (vmlinux) is usually needed.
- Launch crash with the path to the vmlinux file and the vmcore file.
- Once at the crash> prompt, use bt to see the backtrace of the active task when the panic occurred. You should see the call chain leading from your module’s init function up to panic().
- Use log to dump the kernel message buffer. This often contains the most direct error message.
- Explore other commands like ps (to see running processes) and struct (to inspect kernel data structures).
Learning milestones:
- crash loads the vmcore successfully → You have a working kernel analysis environment.
- bt shows the panic stack → You can identify the crashing code path in the kernel.
- log shows the kernel messages → You can correlate the crash with the kernel’s own logs.
- You feel like a wizard → You have successfully debugged the entire operating system.
Project 10: Building a Centralized Crash Reporter
- File: LEARN_LINUX_CRASH_DUMP_ANALYSIS.md
- Main Programming Language: Python
- Alternative Programming Languages: Go, Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: System Design / Distributed Systems
- Software or Tool: A web framework (Flask/FastAPI), S3-compatible storage, GDB
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A “mini-Sentry.” This involves two parts: a client-side utility that sends crash dumps (user-space minidumps) to a server, and a server-side application that stores them, runs an automated analysis (like Project 4), and displays a simple dashboard of unique crashes.
Why it teaches crash analysis: This project elevates you from analyzing a single crash to managing crashes at scale. It forces you to think about symbolication, deduplication, and building a robust pipeline—the exact problems that real-world crash reporting services solve.
Core challenges you’ll face:
- Configuring core_pattern to pipe to your uploader → maps to programmatically capturing crashes system-wide
- Designing a server to receive uploads → maps to building a reliable web API
- Storing dumps and symbols → maps to blob storage and managing debug files
- Deduplicating crashes → maps to generating a stable “fingerprint” for a crash
- Automating analysis on the server → maps to running GDB in a containerized, secure environment
Key Concepts:
- core_pattern Piping: man core
- Symbolication: The process of resolving addresses to function/line numbers.
- System Design: “Designing Data-Intensive Applications” Ch. 1
- REST API Design: Best practices for file uploads.
Difficulty: Master Time estimate: 1 month+ Prerequisites: Projects 4 and 7, web API development experience.
Real world outcome: You will have a dashboard accessible in your browser showing a list of crashes from your systems.
--- Crash Dashboard ---
Crash Group 1 (5 occurrences):
SIGSEGV in main() at crashing_program.c:4
- Last seen: 2025-12-20 16:00:00
[ View Latest Report ]
Crash Group 2 (2 occurrences):
SIGABRT in raise() from libc.so.6
- Last seen: 2025-12-20 15:30:00
[ View Latest Report ]
Clicking “View Latest Report” would show the automated analysis output from your GDB script.
Implementation Hints:
- The Uploader: Modify /proc/sys/kernel/core_pattern to pipe the core dump to your script, e.g., |/usr/local/bin/crash_uploader %p %e. Your crash_uploader script will receive the core dump on stdin and should upload it (e.g., via a POST request) to your server. Use minidumps for this to keep uploads small.
- The Server API: Create a web server (e.g., in Flask) with an endpoint like /api/upload. It should accept the dump file and metadata (executable name, version). Store the dump in a blob store (like MinIO or AWS S3).
- The Analyzer Worker: When a new dump arrives, trigger a background job. This job needs the dump file and the corresponding executable with debug symbols. It can then run your automated GDB script from Project 4 and save the text report.
- Crash Deduplication: The key is to generate a stable fingerprint. A simple way is to use the top few frames of the stack trace (file names and line numbers). If a new crash has the same fingerprint as an existing group, increment the counter. Otherwise, create a new group. A fingerprinting sketch follows this list.
- The Dashboard: A simple web page that reads the grouped crash data from a database and displays it.
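A minimal fingerprinting sketch, assuming your analyzer from Project 4 already produced a GDB-style text backtrace (the frame format matched here is illustrative):
# fingerprint.py - derive a stable crash-group fingerprint from a GDB backtrace.
import hashlib
import re

# Matches GDB-style frames such as:
#   #0  0x000055555555513d in main () at crashing_program.c:4
FRAME_RE = re.compile(r"#\d+\s+(?:0x[0-9a-f]+\s+in\s+)?(\S+)\s+\(.*?\)(?:\s+at\s+(\S+))?")

def crash_fingerprint(backtrace: str, top_frames: int = 3) -> str:
    """Hash the function names (and file:line when available) of the top frames."""
    keys = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            func, location = m.group(1), m.group(2) or ""
            keys.append(f"{func}@{location}")
        if len(keys) == top_frames:
            break
    return hashlib.sha256("|".join(keys).encode()).hexdigest()[:16]

if __name__ == "__main__":
    bt = "#0  0x000055555555513d in main () at crashing_program.c:4\n"
    print(crash_fingerprint(bt))   # same top-of-stack -> same group id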
Learning milestones:
- Crashes are automatically uploaded → You have built a system-wide capture mechanism.
- Server processes and analyzes dumps → You have a working, automated analysis pipeline.
- Dashboard shows grouped crashes → You have successfully implemented crash deduplication.
- You have built a foundational piece of modern SRE/DevOps infrastructure.
Summary
| Project | Main Language | Difficulty |
|---|---|---|
| Project 1: The First Crash | C | Beginner |
| Project 2: The GDB Backtrace | C | Beginner |
| Project 3: The Memory Inspector | C | Intermediate |
| Project 4: The Automated Crash Detective | Python | Intermediate |
| Project 5: Multi-threaded Mayhem | C | Advanced |
| Project 6: Deconstructing a Stripped Binary Crash | C | Advanced |
| Project 7: The Minidump Parser | Python | Advanced |
| Project 8: Introduction to Kernel Panics | C | Expert |
| Project 9: Analyzing a Kernel Panic with crash | crash commands | Expert |
| Project 10: Building a Centralized Crash Reporter | Python | Master |