Learn C Secure Coding: From Vulnerable to Invincible
Goal: To deeply understand how common C vulnerabilities work—from stack smashing and integer overflows to format string bugs—by building, exploiting, and then fixing them. You will learn to think like an attacker to write C code that is resilient by default.
Why Learn Secure C Coding?
C gives you unparalleled power and performance, but it comes at a cost: there are no safety nets. A single mistake, such as a forgotten bounds check or an arithmetic miscalculation, can open a security hole that allows an attacker to take complete control of your program. Many of the most serious security vulnerabilities of the past 30 years have their roots in memory-unsafe C and C++ code.
After completing these projects, you will:
- Intuitively recognize dangerous coding patterns.
- Understand the mechanics of stack, heap, and format string exploits.
- Master the use of safe APIs and defensive programming techniques.
- Know how to compile your code with modern security mitigations.
- Write C code that is not just correct, but robust and secure.
Core Concept Analysis
1. The C Process Memory Layout
Understanding security starts with understanding memory. A running C program’s memory is typically divided into these segments:
High Addresses
+-------------------+
| Command-line |
| & Environment |
+-------------------+
| Stack | Grows down. Stores local variables, function
| (variables, | parameters, and return addresses.
| return addrs) |
+-------------------+
| |
| ... |
| |
+-------------------+
| Heap | Grows up. Dynamic memory allocated with malloc().
+-------------------+
| BSS | Uninitialized global/static variables.
+-------------------+
| Data Segment | Initialized global/static variables.
+-------------------+
| Text Segment | Read-only machine code of the program.
+-------------------+
Low Addresses
Most vulnerabilities are about an attacker gaining the ability to read or write to a memory location they shouldn’t, especially the stack and the heap.
2. The Stack Frame
When a function is called, a “stack frame” is pushed onto the stack.
(Higher addresses)
+-------------------------+
| Function Arguments |
+-------------------------+
| Return Address | <-- CRITICAL: Where to go back to after this function.
+-------------------------+
| Saved Frame Pointer |
+-------------------------+
| Local Variables |
| (e.g., char buffer[64]) |
+-------------------------+ <-- Stack Pointer (RSP)
(Stack grows down)
Stack Smashing is the act of writing past the end of a local variable (like buffer) to overwrite the Return Address. If an attacker can control that address, they control where the program executes next.
3. Integer Overflow
C’s integers have fixed sizes and can “wrap around.” This can have disastrous security consequences.
// Imagine size and count come from a user
// size = 1,073,741,824
// count = 4
// On a 32-bit system, size_t is 32 bits.
size_t bytes_to_alloc = size * count;
// The multiplication overflows!
// 1,073,741,824 * 4 = 4,294,967,296, which is 2^32.
// In a 32-bit unsigned integer, this wraps around to 0.
char* p = malloc(bytes_to_alloc); // p = malloc(0)!
// The program thinks it has a 4GB buffer, but it has a 0-byte buffer.
// The next `memcpy` will be a massive heap overflow.
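The wraparound above can be reproduced on any machine by doing the arithmetic in a fixed-width 32-bit type. The sketch below is illustrative only; the helper names `mul_u32_wrapping` and `mul_u32_would_wrap` are hypothetical and stand in for what a 32-bit `size_t` would do.

```c
#include <stdint.h>

/* Multiplies two 32-bit sizes the way a 32-bit size_t would: the exact
 * product is computed in 64 bits and then reduced modulo 2^32. */
uint32_t mul_u32_wrapping(uint32_t size, uint32_t count)
{
    return (uint32_t)((uint64_t)size * (uint64_t)count);
}

/* Returns 1 if size * count does not fit in 32 bits, 0 otherwise.
 * The division form avoids performing the overflowing multiply at all. */
int mul_u32_would_wrap(uint32_t size, uint32_t count)
{
    return size != 0 && count > UINT32_MAX / size;
}
```

With size = 1,073,741,824 and count = 4, the first helper returns 0 and the second reports that the product wraps, which is the check an allocation path needs before calling malloc.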
4. Format String Vulnerabilities
The printf family of functions is more powerful than it looks. The format string is an instruction manual for printf on how to interpret data from the stack.
printf("User count: %d\n", user_count); // Normal use
char user_input[100] = "%x.%x.%x.%x.%x.%x\n";
printf(user_input); // VULNERABLE!
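When printf sees the %x specifiers, it obediently fetches the next variadic argument for each one and prints it as hex. No such arguments were passed, so on 32-bit x86 it reads whatever words happen to sit on the stack (on x86-64 the first few values come from registers instead). The attacker provides the “instructions” (%x) and printf provides the data from your program’s stack, leading to information leaks. The %n specifier is even more dangerous: it writes the number of characters printed so far through a pointer taken from the argument list, so an attacker who can place an address there can write to memory of their choosing.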
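The fix for this bug class is to keep the format string constant and pass untrusted text as an argument. A minimal sketch follows, using a hypothetical helper name `format_log_line`; it formats into a buffer rather than printing so the result can be inspected.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats untrusted text into out using a constant format string.
 * Because user_text is only ever an argument to "%s", any '%' characters
 * inside it are copied literally and never interpreted as specifiers.
 * Returns snprintf's result: the length the full output would have. */
int format_log_line(char *out, size_t out_size, const char *user_text)
{
    return snprintf(out, out_size, "Log: %s", user_text);
}
```

Passing the attacker's string "%x.%x.%n" through this helper yields the literal text "Log: %x.%x.%n"; nothing is read from or written to the stack on the attacker's behalf.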
Project List
The 8 projects below are structured as a series of labs. You will first learn to spot a vulnerability, then exploit it, and finally, fix it.
Project 1: The Legacy API Pitfall Lab
- File: LEARN_C_SECURE_CODING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: C++ (has similar issues)
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Secure APIs / Bounds Checking
- Software or Tool: GCC/Clang, GDB
- Main Book: “Effective C” by Robert C. Seacord
What you’ll build: A small command-line program that asks for your name and a message, then prints a greeting. You will first build it using notoriously unsafe functions (gets, strcpy, sprintf) and learn to crash it. Then, you will fix it using their modern, safe counterparts.
Why it teaches Secure Coding: This is the most fundamental lesson in C security. It provides a hands-on demonstration of why entire classes of functions are forbidden in professional codebases and forces you to learn and use their safer replacements.
Core challenges you’ll face:
- Using gets() to read input → maps to learning why you should never, ever use it
- Crashing the program with a long input string → maps to performing a basic buffer overflow
- Replacing gets() with fgets() → maps to learning to specify buffer sizes
- Replacing strcpy() and sprintf() with strncpy()/strlcpy() and snprintf() → maps to defensive coding against overflows
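One caveat when swapping strcpy() for strncpy(): strncpy() does not write a terminating NUL when the source is at least as long as the size limit, and strlcpy() is not available in every C library. A small sketch of a copy routine that always terminates and reports truncation is shown here; `bounded_copy` is a hypothetical name, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst (capacity dst_size bytes), truncating if needed and
 * always NUL-terminating. Returns 1 if src was truncated, 0 if it fit,
 * and -1 if dst_size is 0 (not even the terminator can be stored). */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return -1;
    }
    size_t src_len = strlen(src);
    size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return src_len > n;
}
```

Returning a truncation flag lets the caller decide whether a shortened string is acceptable or should be treated as an error.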
Key Concepts:
- Unsafe C Library Functions: “SEI CERT C Coding Standard” - A free, comprehensive guide.
- Bounds Checking: “Effective C” Ch. 5 - Robert C. Seacord
Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic C programming.
Real world outcome: You will have two versions of a program.
Vulnerable version (./unsafe_greeter):
$ ./unsafe_greeter
What is your name? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
Fixed version (./safe_greeter):
$ ./safe_greeter
What is your name? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Hello, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA! Your message was processed.
# The program correctly truncates the input and does not crash.
Implementation Hints: Your vulnerable program structure will be simple.
// Not the real code, but the conceptual flow
#include <stdio.h>

void process_greeting(char* name) {
    char greeting[64];
    // This is the vulnerability. The fixed text needs 36 bytes plus the
    // terminating NUL, so any `name` longer than 27 chars overflows.
    sprintf(greeting, "Hello, %s! Your message was processed.", name);
    printf("%s\n", greeting);
}

int main(void) {
    char name[32];
    printf("What is your name? ");
    // This is the first vulnerability. No size check!
    // (gets() was removed in C11, so an older mode such as -std=c99 may be needed.)
    gets(name);
    process_greeting(name);
    return 0;
}
Your first task is to provide an input long enough to overflow name and then greeting.
Your second task is to rewrite this using fgets to read the name (it takes a size argument) and snprintf to write the greeting (it also takes a size argument).
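One way the fixed version could be structured is sketched below. The helper names `read_line` and `format_greeting` are hypothetical, and the greeting text is taken from the vulnerable listing; treat this as a starting point rather than the reference solution.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (capacity buf_size), strips the trailing newline,
 * and discards any excess input so it does not leak into the next read.
 * Returns 0 on success, -1 on EOF or read error. */
int read_line(FILE *in, char *buf, size_t buf_size)
{
    if (buf_size == 0 || fgets(buf, (int)buf_size, in) == NULL) {
        return -1;
    }
    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';
    } else {
        int c;
        while ((c = fgetc(in)) != '\n' && c != EOF) {
            /* drop the rest of an over-long line */
        }
    }
    return 0;
}

/* Writes the greeting into out. Returns 1 if the greeting had to be
 * truncated, 0 if it fit, and -1 if snprintf reported an error. */
int format_greeting(char *out, size_t out_size, const char *name)
{
    int needed = snprintf(out, out_size,
                          "Hello, %s! Your message was processed.", name);
    if (needed < 0) {
        return -1;
    }
    return (size_t)needed >= out_size;
}
```

A main() would call read_line(stdin, name, sizeof name) and then format_greeting(greeting, sizeof greeting, name). Note that with a 32-byte name and a 64-byte greeting, a full-length name still truncates the greeting itself, so checking the return value (or enlarging the greeting buffer) is part of the fix.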
Learning milestones:
- You successfully crash the gets version → You understand unbounded input is dangerous.
- You successfully crash the sprintf version → You see that danger can come from multiple sources.
- You fix the program with fgets and snprintf → You have learned the basic toolkit of safe string handling.
- You start instinctively reaching for snprintf over sprintf → The lesson is becoming second nature.
Project 2: Stack Smashing 101 (ret2win)
- File: LEARN_C_SECURE_CODING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Assembly (for understanding)
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Exploit Development / Stack Overflows
- Software or Tool: GCC/Clang, GDB, Python (for exploit scripting)
- Main Book: “Hacking: The Art of Exploitation” by Jon Erickson
What you’ll build: A vulnerable C program with a password check and a hidden win() function (named secret_function() in the listing below) that should never be called. Your goal is to write a Python script that crafts a special input string to overflow a buffer on the stack and overwrite the function’s return address, hijacking the program’s execution to call win().
Why it teaches Secure Coding: This is the classic “Aha!” moment for every security researcher. You will learn exactly how a buffer overflow translates into arbitrary code execution. By building the exploit, you will gain a visceral understanding of the stack’s structure and why protecting the return address is so critical.
Core challenges you’ll face:
- Disabling security mitigations for the lab → maps to using GCC flags like -fno-stack-protector and -z execstack
- Finding the buffer overflow → maps to identifying a vulnerable function like strcpy
- Calculating the offset to the return address → maps to using GDB and pattern strings to find the exact number of bytes
- Crafting the exploit payload → maps to combining padding with the address of win()
Key Concepts:
- Stack Frame Layout: “Hacking: The Art of Exploitation” Ch. 3
- Controlling EIP/RIP: Numerous CTF writeups on “ret2win”
- GDB with pwndbg/GEF: These GDB enhancers are essential for exploit dev.
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Project 1, basic GDB knowledge.
Real world outcome: A working exploit that calls a hidden function.
// The vulnerable C code
void secret_function() {
printf("Congratulations! You've found the secret!\n");
// In a real CTF, this would print a flag file.
}
void vulnerable_function() {
char buffer[64];
printf("Enter your password: ");
gets(buffer); // Vulnerable!
}
// You are not supposed to be able to call secret_function().
# Your exploit script in action
$ python exploit.py | ./vulnerable_program
Enter your password: Congratulations! You've found the secret!
Implementation Hints:
First, compile your C program with security turned off so you can learn the raw mechanics.
gcc -m32 -fno-stack-protector -z execstack -no-pie -o vuln vuln.c
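- -m32: 32-bit makes addresses easier to handle for beginners.
- -fno-stack-protector: Disables stack canaries.
- -z execstack: Marks the stack as executable. ret2win does not need this, because no injected code is run.
- -no-pie: Builds a non-position-independent executable, so the program’s own code (including secret_function) is always loaded at the same address. ASLR itself is a kernel feature and still randomizes the stack, heap, and shared libraries unless you disable it separately.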
Your exploit process:
- Run the program in GDB.
- Use a cyclic pattern (e.g., from pwntools in Python: cyclic(200)) as input.
- The program will crash. The instruction pointer (EIP/RIP) will be overwritten with part of your pattern (e.g., 0x6161616a).
- Use the pattern to find the exact offset from the start of your buffer to the return address.
- Find the address of secret_function() in GDB (p secret_function).
- Construct your final payload: offset_bytes + address_of_secret_function.
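The final step, padding followed by the target address, can be sketched in C to make the byte layout explicit. The function name `build_ret2win_payload` is hypothetical, and the offset and address you pass in must come from your own GDB session; any values shown in examples are placeholders.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills out with `offset` padding bytes followed by `target` encoded as a
 * 32-bit little-endian address (the byte order used by x86).
 * Returns the payload length, or 0 if out is too small. */
size_t build_ret2win_payload(unsigned char *out, size_t out_size,
                             size_t offset, uint32_t target)
{
    if (out_size < 4 || offset > out_size - 4) {
        return 0;
    }
    memset(out, 'A', offset);
    out[offset + 0] = (unsigned char)(target & 0xff);
    out[offset + 1] = (unsigned char)((target >> 8) & 0xff);
    out[offset + 2] = (unsigned char)((target >> 16) & 0xff);
    out[offset + 3] = (unsigned char)((target >> 24) & 0xff);
    return offset + 4;
}
```

Because gets() stops reading at a newline, a target address containing the byte 0x0a would cut the payload short; that is one reason lab binaries are usually built so the interesting function avoids such addresses.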
Learning milestones:
- You can crash the program and control the instruction pointer → You have achieved control over execution flow.
- You can reliably calculate the offset to the return address → You understand the stack layout.
- Your exploit successfully calls the win function → You have completed a full “ret2win” exploit.
- You understand what -fno-stack-protector and -no-pie do → You are learning about the security features you’ll need to defeat next.
Project 3: The Format String Bug Lab
- File: LEARN_C_SECURE_CODING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: N/A
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Exploit Development / Memory Disclosure / Arbitrary Write
- Software or Tool: GCC/Clang, GDB, Python
- Main Book: “The Shellcoder’s Handbook”
What you’ll build: A vulnerable program that takes user input and passes it directly to printf. This lab has three stages: (1) Use format specifiers like %x to leak data from the stack. (2) Use the %s specifier to read a secret string from a known address. (3) Use the %n specifier to overwrite a variable in memory and gain admin privileges.
Why it teaches Secure Coding: Format string bugs are a powerful and non-obvious vulnerability class. This project teaches you that even seemingly innocuous functions can be dangerous. It demonstrates how an attacker can achieve information leaks and arbitrary memory writes without a traditional buffer overflow.
Core challenges you’ll face:
- Leaking stack data with %x → maps to understanding how printf walks the stack
- Positioning an address on the stack for %s → maps to crafting a precise payload
- Using %n to write a value → maps to the concept of an arbitrary-write primitive
- Calculating offsets and padding for a precise write → maps to fine-grained payload construction
Key Concepts:
- Format String Exploitation: A classic paper on the topic is “Exploiting Format String Vulnerabilities” by scut / Team Teso.
- Variadic Functions: Understanding how functions like printf(const char*, ...) work.
Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Project 2 (Stack Smashing).
Real world outcome: A series of exploits for a single vulnerable program.
// The C code
int check_auth() {
    char password[64]; // Large enough to hold the lab's payloads
    int auth_flag = 0; // Should be 1 for admin
    char secret_message[] = "TheLaunchCodesAre0123";
    printf("Enter password: ");
    fgets(password, sizeof(password), stdin);
    printf("Log: ");
    printf(password); // Vulnerable!
    if (auth_flag) {
        printf("\nAccess Granted. %s\n", secret_message);
    } else {
        printf("\nAccess Denied.\n");
    }
    return auth_flag;
}
Exploit 1: Leak the stack
$ python3 -c 'print("%x."*10)' | ./vuln
Log: 80485d4.ffc875a0.0.804860b.f7e5b5c0.ffc875a0...
# You just leaked stack contents!
Exploit 2: Arbitrary Write with %n to get Admin
# A carefully crafted payload that puts the address of auth_flag on the stack
# and uses %n to write a non-zero value into it.
$ ./vuln
Enter password: <crafted_payload>
Log: <output>
Access Granted. TheLaunchCodesAre0123
Implementation Hints:
The key is that, on 32-bit x86 (build with -m32 as in Project 2), printf expects its arguments to be on the stack. When you provide specifiers like %d, %x, etc., it just consumes the next item from the stack. Your exploit payload will contain the addresses you want to write to, followed by the format specifiers that operate on them.
- %<number>$x: This lets you leak the Nth argument from the stack directly, instead of using many %x’s.
- %n: Writes the number of characters printed so far to an address pointed to by a stack value. Your payload for the arbitrary write will look something like [address_of_auth_flag]...%<number>c%<offset>$n. You print a certain number of characters and then use %n to write that number into your target address.
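To see what %n does before weaponizing it, it helps to use it once in the benign, intended way: with a constant format string and a pointer you chose yourself. The helper name `count_with_percent_n` below is hypothetical. Some C libraries restrict or disable %n, so this sketch assumes a libc (such as glibc) that accepts it when the format is a string literal.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats text into out and uses %n to record how many characters had been
 * produced at that point. The format string is a constant and the pointer
 * handed to %n is ours, which is what makes this use harmless. */
int count_with_percent_n(char *out, size_t out_size, const char *text)
{
    int written_so_far = -1;
    snprintf(out, out_size, "%s%n", text, &written_so_far);
    return written_so_far;
}
```

In the exploit, the same store happens, but the attacker controls both the count (through padding such as %<number>c) and the pointer (through the address placed in the payload).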
Learning milestones:
- You can reliably leak stack data and the secret message → You understand information disclosure.
- You can use %n to overwrite a variable → You have achieved an arbitrary-write primitive.
- You can construct a payload to gain admin privileges → You have completed a full format string exploit.
- You will never again write printf(user_input) → You have internalized this critical lesson.
Project 4: Integer Overflow to Heap Overflow
- File: LEARN_C_SECURE_CODING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: C++
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Integer Overflows / Heap Overflows
- Software or Tool: GCC/Clang, GDB
- Main Book: “Secure Coding in C and C++” by Robert C. Seacord
What you’ll build: A program that simulates processing a shopping cart. It reads a list of items, each with a quantity and price_per_item. It calculates the total size needed for the item names and allocates it on the heap. Your task is to provide input that causes the size calculation to overflow, leading to a small allocation followed by a large memcpy, resulting in a heap overflow.
Why it teaches Secure Coding: This demonstrates how a seemingly harmless math error can be a gateway to a massive memory corruption vulnerability. It connects two different vulnerability classes (integer overflow and heap overflow) and teaches you to be paranoid about any calculation involving user-provided data.
Core challenges you’ll face:
- Identifying the vulnerable calculation → maps to spotting count * size patterns
- Finding values that cause an integer overflow → maps to understanding the limits of int, unsigned int, and size_t
- Triggering the heap overflow → maps to causing a memcpy to write out of bounds
- Observing the crash in GDB → maps to seeing the heap corruption and corrupted malloc metadata
Key Concepts:
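- Integer Overflow Dangers: SEI CERT C INT30-C (ensure that unsigned integer operations do not wrap) covers the size_t multiplication used here; INT32-C (ensure that operations on signed integers do not result in overflow) covers the signed case.
- Heap Allocator Internals: Basic understanding that malloc stores metadata next to allocated chunks.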
Difficulty: Intermediate
Time estimate: Weekend
Prerequisites: Understanding of malloc and the heap.
Real world outcome: A program that crashes in a very specific way when given malicious input, which you can analyze in GDB.
// The vulnerable C logic
// item_count and item_name_length come from user input
size_t total_size = item_count * item_name_length;
// If item_count is large and item_name_length is large,
// total_size can wrap around to be a small number.
char* buffer = malloc(total_size);
// The loop then copies `item_count * item_name_length` bytes
// (the real amount) into the tiny `buffer`.
for (int i = 0; i < item_count; i++) {
memcpy(buffer + i * item_name_length, ...); // Heap overflow!
}
Implementation Hints:
Let’s say size_t is 32 bits (unsigned int). Its max value is 4,294,967,295.
You need to find two numbers, A and B, such that A * B is greater than UINT_MAX.
For example, if the user provides item_count = 1,073,741,824 and item_name_length = 4.
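The multiplication 1073741824 * 4 wraps around to 0, so malloc(0) is called. Depending on the implementation, malloc(0) returns either NULL or a pointer to a minimal chunk that must not be written to.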
The subsequent memcpy then tries to write 4GB of data into a 0-byte buffer, corrupting the heap.
Your task is to build a program with this flaw and then craft the input file that triggers it. The “fix” is to use safe integer arithmetic libraries or to check for overflow before the multiplication.
// The fix
if (item_name_length > 0 && item_count > SIZE_MAX / item_name_length) {
// Handle overflow error!
} else {
size_t total_size = item_count * item_name_length;
// ... proceed ...
}
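The check above can be folded into a single allocation helper so that no caller can forget it. This is a sketch under the assumption that zero-sized requests should also be rejected; the name `alloc_item_names` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for item_count names of item_name_length bytes each.
 * Returns NULL if either value is 0, if the product would wrap around,
 * or if the allocation itself fails. On success, *total_out (if not NULL)
 * receives the number of bytes allocated. */
char *alloc_item_names(size_t item_count, size_t item_name_length,
                       size_t *total_out)
{
    if (item_count == 0 || item_name_length == 0) {
        return NULL;
    }
    if (item_count > SIZE_MAX / item_name_length) {
        return NULL; /* item_count * item_name_length would wrap */
    }
    size_t total = item_count * item_name_length;
    char *buffer = malloc(total);
    if (buffer != NULL && total_out != NULL) {
        *total_out = total;
    }
    return buffer;
}
```

Another option worth knowing is calloc(item_count, item_name_length), which takes the two factors separately so the library can detect the overflow itself; modern implementations do this check, but verifying it on your target platform is a reasonable exercise.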
Learning milestones:
- You can reliably cause the integer overflow → You understand integer limits.
- You can trigger the heap overflow and crash the program → You have linked the integer math to memory corruption.
- You can inspect the corrupted heap in GDB → You can see the damage your overflow caused.
- You can implement a safe check to prevent the overflow → You know how to defend against this class of attack.
Project 5: Heap Exploitation Lab (Use-After-Free)
- File: LEARN_C_SECURE_CODING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: C++ (where it’s even more common with objects)
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Exploit Development / Heap Exploitation / Type Confusion
- Software or Tool: GCC/Clang, GDB
- Main Book: “The Shellcoder’s Handbook”
What you’ll build: A program that manages “notes” for a user. The user can create a note, delete a note, and print a note. The vulnerability is that after a note is deleted (free'd), the pointer to it is not cleared (it becomes a “dangling pointer”). Your goal is to exploit this to call an arbitrary function.
Why it teaches Secure Coding: Use-After-Free (UAF) is one of the most powerful and common vulnerability classes in modern software (especially browsers). This project teaches you the importance of pointer hygiene and object lifetimes. By creating a type confusion scenario, you will achieve a powerful exploit primitive: turning a data write into code execution.
Core challenges you’ll face:
- Creating the UAF vulnerability → maps to free(ptr) without ptr = NULL
- Understanding heap allocator behavior (tcache/fastbins) → maps to knowing that a subsequent malloc will return the same memory address
- Grooming the heap → maps to making allocations and deallocations to control what malloc returns
- Achieving type confusion → maps to allocating an object of a different type in the place of the freed one
- Hijacking control flow → maps to calling a function pointer from your fake object
Key Concepts:
- Use-After-Free: OWASP article on UAF.
- Heap Grooming/Feng Shui: “The Art of Software Security Assessment” by Dowd, McDonald, and Schuh.
- tcache and fastbins: Understanding the modern glibc heap implementation is key. Search for writeups on tcache poisoning.
Difficulty: Expert
Time estimate: 2-3 weeks
Prerequisites: Solid understanding of the heap, function pointers, and GDB.
Real world outcome: A working exploit that turns a simple note-taking program into a shell.
// Conceptual vulnerable flow
struct Note {
    void (*print_note)(const char *);     // first member
    char data[64];                        // second member
};
struct Action {
    void (*execute_action)(const char *); // overlaps Note.print_note
    char argument[64];                    // overlaps Note.data
};
// 1. Allocate a Note. Its print_note points to a safe function.
struct Note* note = malloc(sizeof(struct Note));
note->print_note = print_note_safely;
// 2. Free the note, but the `note` pointer still holds the old address (dangling).
free(note);
// 3. Allocate an Action. Both structs have the same size, so the allocator
//    is likely to return the SAME memory. We control the data written here.
struct Action* action = malloc(sizeof(struct Action));
action->execute_action = (void (*)(const char *))system; // Points to system()
// And we write "/bin/sh" into the part of the struct that overlaps `note->data`.
strcpy(action->argument, "/bin/sh");
// 4. The program later calls the print function on the dangling pointer.
note->print_note(note->data);
// This now executes `system("/bin/sh")` because the memory has been replaced.
Implementation Hints:
Your program should have an array of pointers to notes. The “delete” function should free the note but not clear the entry in the array. The “print” function will use the dangling pointer from the array. Your exploit will involve deleting a note, then creating a new object (of a different type) that gets allocated in the same spot, and then calling the “print” function on the old note index.
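For contrast, the defensive version of the same note table is sketched below: delete clears the slot, and every access checks it first. The names `create_note`, `delete_note`, `get_note`, and `MAX_NOTES` are hypothetical and chosen only for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NOTES 8

struct note {
    char data[64];
};

/* Table of live notes; a NULL entry means the slot is empty. */
static struct note *notes[MAX_NOTES];

/* Creates a note in an empty slot. Returns 0 on success, -1 on failure. */
int create_note(size_t index, const char *text)
{
    if (index >= MAX_NOTES || notes[index] != NULL) {
        return -1;
    }
    struct note *n = calloc(1, sizeof *n);
    if (n == NULL) {
        return -1;
    }
    snprintf(n->data, sizeof n->data, "%s", text);
    notes[index] = n;
    return 0;
}

/* Frees a note AND clears its slot, so no dangling pointer remains. */
int delete_note(size_t index)
{
    if (index >= MAX_NOTES || notes[index] == NULL) {
        return -1;
    }
    free(notes[index]);
    notes[index] = NULL; /* the step the vulnerable version omits */
    return 0;
}

/* Returns the note text, or NULL if the slot is empty or out of range. */
const char *get_note(size_t index)
{
    if (index >= MAX_NOTES || notes[index] == NULL) {
        return NULL;
    }
    return notes[index]->data;
}
```

Clearing the slot only helps when the table is the sole owner of the pointer; if copies of the pointer exist elsewhere, they need the same treatment or a different ownership scheme.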
Learning milestones:
- You can trigger a crash by accessing the dangling pointer → You have confirmed the UAF condition.
- You can reliably allocate a new object in the same memory location → You understand the basics of your heap allocator’s caching mechanism.
- You can overwrite a function pointer and call an existing function → You have achieved control flow hijack.
- You can call system("/bin/sh") and get a shell → You have completed a full, modern exploit chain.
Project 6: Building a “Safe String” Library
- File: LEARN_C_SECURE_CODING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Defensive Programming / API Design / Data Structures
- Software or Tool: GCC/Clang
- Main Book: “C Interfaces and Implementations” by David R. Hanson
What you’ll build: Your own simple, bounds-checked string library. You will create a safe_string_t structure that holds a char*, a length, and a capacity. You will then implement functions like s_create, s_destroy, s_append, s_copy, etc., that are impossible to use unsafely.
Why it teaches Secure Coding: This is a purely defensive project. It forces you to think about what a “safe” API looks like. Instead of just avoiding bad functions, you will be creating good ones. This teaches you to manage memory and string state explicitly, which is the key to eliminating entire classes of buffer overflow vulnerabilities.
Core challenges you’ll face:
- Designing the safe_string_t struct → maps to explicit state management
- Handling memory allocation and reallocation → maps to dynamically growing the string’s capacity
- Implementing bounds-checked operations → maps to writing the core logic that prevents overflows
- Designing an ergonomic API → maps to making your library easy and intuitive to use
Key Concepts:
- Opaque Pointers: Hiding implementation details from the user.
- Defensive Programming: “The Practice of Programming” by Kernighan and Pike.
- API Design: “API Design for C++” by Martin Reddy (principles are universal).
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Strong C skills with malloc/realloc/free.
Real world outcome:
A reusable library (safestring.h, safestring.c) that you can use to rewrite Project 1.
API Usage (safestring.h):
typedef struct safe_string_t* safe_string_handle;
safe_string_handle s_create(const char* initial_str);
void s_destroy(safe_string_handle ssh);
int s_append(safe_string_handle ssh, const char* to_append);
int s_copy(safe_string_handle dest, const safe_string_handle src);
const char* s_get_cstr(safe_string_handle ssh);
size_t s_get_length(safe_string_handle ssh);
Implementation (safestring.c):
// The hidden struct
struct safe_string_t {
char* buffer;
size_t length;
size_t capacity;
};
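int s_append(safe_string_handle ssh, const char* to_append) {
    if (ssh == NULL || to_append == NULL) {
        return -1; // Reject invalid arguments
    }
    size_t append_len = strlen(to_append);
    // This is the core defensive check
    if (ssh->length + append_len + 1 > ssh->capacity) {
        // Not enough space, reallocate or return an error
        // ... reallocation logic ...
    }
    // If we get here, it's safe to append. The length is tracked explicitly,
    // so copy to the known end (terminator included) instead of rescanning with strcat.
    memcpy(ssh->buffer + ssh->length, to_append, append_len + 1);
    ssh->length += append_len;
    return 0; // Success
}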
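One possible shape for the reallocation logic is sketched here as a self-contained file. The doubling growth policy and the -1 error code are assumptions made for illustration; the struct and function names follow the API listed above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct safe_string_t {
    char  *buffer;   /* always NUL-terminated */
    size_t length;   /* bytes in use, excluding the terminator */
    size_t capacity; /* bytes allocated, including the terminator */
};
typedef struct safe_string_t *safe_string_handle;

safe_string_handle s_create(const char *initial_str)
{
    if (initial_str == NULL) return NULL;
    safe_string_handle ssh = malloc(sizeof *ssh);
    if (ssh == NULL) return NULL;
    ssh->length = strlen(initial_str);
    ssh->capacity = ssh->length + 1;
    ssh->buffer = malloc(ssh->capacity);
    if (ssh->buffer == NULL) {
        free(ssh);
        return NULL;
    }
    memcpy(ssh->buffer, initial_str, ssh->capacity);
    return ssh;
}

void s_destroy(safe_string_handle ssh)
{
    if (ssh == NULL) return;
    free(ssh->buffer);
    free(ssh);
}

int s_append(safe_string_handle ssh, const char *to_append)
{
    if (ssh == NULL || to_append == NULL) return -1;
    size_t append_len = strlen(to_append);
    /* Reject sizes whose sum (plus terminator) would wrap size_t. */
    if (append_len > SIZE_MAX - ssh->length - 1) return -1;
    size_t needed = ssh->length + append_len + 1;
    if (needed > ssh->capacity) {
        /* Assumed policy: double the capacity, or use the exact size
         * if doubling is not enough or would itself wrap. */
        size_t new_cap = ssh->capacity <= SIZE_MAX / 2 ? ssh->capacity * 2
                                                       : needed;
        if (new_cap < needed) new_cap = needed;
        char *grown = realloc(ssh->buffer, new_cap);
        if (grown == NULL) return -1; /* old buffer is still valid */
        ssh->buffer = grown;
        ssh->capacity = new_cap;
    }
    memcpy(ssh->buffer + ssh->length, to_append, append_len + 1);
    ssh->length += append_len;
    return 0;
}
```

One limitation of this sketch: passing a pointer into the string's own buffer as to_append is unsafe if realloc moves the buffer. A complete library would either document that restriction or detect and handle the aliasing case.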
Learning milestones:
- You have a working create/destroy/append implementation → You can manage the string’s lifecycle.
- Your reallocation logic correctly handles growth → Your strings can be dynamic.
- All your functions correctly check bounds before writing → Your library is fundamentally safe.
- You use your library to build another program and find it easier and safer → You appreciate the value of good API design.
Project 7: Static Analysis Tool for Vulnerabilities
- File: LEARN_C_SECURE_CODING_DEEP_DIVE.md
- Main Programming Language: Python
- Alternative Programming Languages: C++ (using libclang), Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Static Analysis / Parsing / Tooling
- Software or Tool: Python re module or libclang bindings
- Main Book: “Language Implementation Patterns” by Terence Parr
What you’ll build: A command-line tool that scans a C source file and flags calls to dangerous, legacy functions like gets, strcpy, strcat, and sprintf (without a size-limiting format string).
Why it teaches Secure Coding: This project moves you from finding bugs manually to automating their discovery. It teaches you to think about security at scale. Writing a static analyzer, even a simple one, is a great introduction to the powerful field of program analysis and is what real-world companies use to keep their codebases safe.
Core challenges you’ll face:
- Reading and processing a source file → maps to basic file I/O
- Using regular expressions to find function calls → maps to the simple but brittle approach
- (Advanced) Using a C parser like libclang → maps to the robust approach using Abstract Syntax Trees (AST)
- Reporting findings with file names and line numbers → maps to making the tool useful
Key Concepts:
- Static Application Security Testing (SAST): The formal name for this type of tool.
- Regular Expressions: Essential for the simple version of this tool.
- Abstract Syntax Trees (AST): The output of a true compiler front-end, which provides a much more accurate way to analyze code.
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic Python or another scripting language.
Real world outcome: A tool that improves your workflow.
$ cat test.c
#include <stdio.h>
int main() {
char buf[10];
gets(buf); // Dangerous!
return 0;
}
$ ./c_linter test.c
[WARNING] test.c:4: Call to dangerous function 'gets'. Use 'fgets' instead.
Found 1 potential issue(s).
Implementation Hints:
Simple Regex Approach (in Python):
import re
import sys

DANGEROUS_FUNCTIONS = ["gets", "strcpy", "strcat", "sprintf"]

# A regex to find function calls. This is naive and will have false positives.
# It looks for a word from the list followed by an open parenthesis.
pattern = r'\b(' + '|'.join(DANGEROUS_FUNCTIONS) + r')\s*\('

issues = 0
with open(sys.argv[1]) as source:
    for line_num, line in enumerate(source, 1):
        for match in re.finditer(pattern, line):
            issues += 1
            print(f"[WARNING] {sys.argv[1]}:{line_num}: Call to dangerous function '{match.group(1)}'.")
print(f"Found {issues} potential issue(s).")
This approach is simple but can be fooled by comments, strings, etc.
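If you would rather prototype the matcher in C, the same naive rule (a listed name at a word boundary, optional whitespace, then an open parenthesis) can be written without a regex library. This is a hedged sketch; `find_dangerous_call` is a hypothetical helper and it shares every weakness of the regex version.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static const char *const DANGEROUS[] = { "gets", "strcpy", "strcat", "sprintf" };

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Returns the name of the first dangerous call found on this line,
 * or NULL if there is none. */
const char *find_dangerous_call(const char *line)
{
    for (const char *p = line; *p != '\0'; p++) {
        /* Word boundary: a match may not start in the middle of an
         * identifier, so "fgets(" does not count as "gets(". */
        if (p != line && is_ident_char(p[-1])) {
            continue;
        }
        for (size_t i = 0; i < sizeof DANGEROUS / sizeof DANGEROUS[0]; i++) {
            size_t len = strlen(DANGEROUS[i]);
            if (strncmp(p, DANGEROUS[i], len) != 0) {
                continue;
            }
            const char *q = p + len;
            while (*q == ' ' || *q == '\t') {
                q++; /* optional whitespace before the parenthesis */
            }
            if (*q == '(') {
                return DANGEROUS[i];
            }
        }
    }
    return NULL;
}
```

Feeding it the line `// never call gets() here` still produces a hit, which demonstrates the false-positive problem that motivates a real parser.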
Advanced libclang Approach:
This is much more complex but far more accurate.
- Install Python bindings for libclang.
- Use clang.cindex.Index.create() to create an index.
- Parse the source file into a “translation unit”: index.parse(filename).
- Write a recursive function that walks the Abstract Syntax Tree (AST) of the program.
- At each node in the tree, check if node.kind == CursorKind.CALL_EXPR.
- If it’s a function call, check if node.spelling is in your list of dangerous functions.
- If it is, report the finding using node.location.
Learning milestones:
- Your regex-based tool finds obvious bugs → You have a working v1.
- You find the limitations of the regex approach → You understand why real parsers are needed.
- (Advanced) You get a libclang version working → You have built a legitimate static analysis tool.
- You integrate the tool into your build process → You are automating security checks.
Project 8: The “Jailbreak” Sandbox Escape
- File: LEARN_C_SECURE_CODING_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: C++
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 5: Master
- Knowledge Area: Exploit Chaining / Sandboxing / Systems Programming
- Software or Tool: seccomp-bpf (Linux), ptrace
- Main Book: “Linux System Programming” by Robert Love
What you’ll build: A program that creates a very restrictive “sandbox” for a small piece of C code using seccomp-bpf, which filters the system calls the code is allowed to make (e.g., it can only use read, write, exit). The sandboxed code contains a hidden buffer overflow vulnerability. Your goal is to exploit the overflow to craft a ROP (Return-Oriented Programming) chain that opens a file and prints its contents (open, read, write), bypassing the sandbox policy.
Why it teaches Secure Coding: This project teaches both offense and defense at the highest level. You will learn how to build a sandbox—a key defense-in-depth mechanism. You will then learn how attackers bypass these defenses by chaining together small, allowed pieces of code (“gadgets”) to achieve a forbidden action. This is representative of how modern, complex exploits are built.
Core challenges you’ll face:
- Building a seccomp filter → maps to learning the BPF assembly-like language for kernel filters
- Finding ROP gadgets in the binary → maps to using tools like ROPgadget to find useful instruction sequences
- Crafting a ROP chain to make system calls → maps to setting up registers (RDI, RSI, RDX…) and calling the syscall instruction
- Chaining multiple syscalls (open -> read -> write) → maps to advanced exploit chaining
Key Concepts:
- Seccomp-bpf: A powerful Linux sandboxing mechanism. Search for tutorials on using libseccomp or raw BPF filters.
- Return-Oriented Programming (ROP): The definitive technique for bypassing non-executable memory (NX/DEP).
- syscall Calling Convention: Understanding how to set up registers for Linux system calls.
Difficulty: Master
Time estimate: 1 month+
Prerequisites: Project 2 (Stack Smashing), strong understanding of assembly and Linux syscalls.
Real world outcome: An exploit that makes a heavily restricted program read a secret file it shouldn’t be able to access.
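The Sandbox Policy: “You are only allowed to read, write, and exit. You cannot open files.” Keep in mind that seccomp checks every system call no matter which code issues it, so a ROP chain cannot get past a correct allowlist on its own. For the escape to work, the lab’s filter needs a deliberate gap, for example a denylist that blocks open but overlooks openat.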
The Vulnerable Code: Has a buffer overflow.
The Exploit: Overflows the buffer and writes a ROP chain to the stack.
- The chain first sets up the registers for open("flag.txt", O_RDONLY).
- It then calls a syscall gadget.
- It then takes the returned file descriptor and sets up registers for read(fd, buffer, size).
- It then calls syscall again.
- Finally, it sets up registers for write(1, buffer, size) and calls syscall.
Result:
$ ./sandboxed_program
Enter your data: <very_long_and_complex_rop_chain_payload>
FLAG{y0u_h4v3_3sc4p3d_th3_s4ndb0x}
Implementation Hints:
Use a tool like ROPgadget --binary ./program to find the building blocks for your chain. You’ll need gadgets like:
- pop rdi; ret (to control the first argument to a syscall)
- pop rsi; ret (to control the second argument)
- pop rdx; ret (to control the third argument)
- pop rax; ret (to control the syscall number)
- syscall; ret (to trigger the system call)
Your payload will be a long sequence of gadget addresses and the data values you want to pop into the registers.
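The layout of that sequence can be sketched as plain data. The struct `gadgets` and the function `append_syscall` are hypothetical names, the gadget addresses must come from your own binary, and the sketch assumes the x86-64 Linux convention (syscall number in RAX, arguments in RDI, RSI, RDX).

```c
#include <stddef.h>
#include <stdint.h>

/* Addresses of the gadgets listed above, as found in YOUR binary. */
struct gadgets {
    uint64_t pop_rax_ret;
    uint64_t pop_rdi_ret;
    uint64_t pop_rsi_ret;
    uint64_t pop_rdx_ret;
    uint64_t syscall_ret;
};

/* Appends the 9 stack words for one system call to chain, starting at
 * index `used`. Each pop gadget is followed by the value it will pop.
 * Returns the number of words written, or 0 if there is no room. */
size_t append_syscall(uint64_t *chain, size_t capacity, size_t used,
                      const struct gadgets *g, uint64_t number,
                      uint64_t arg1, uint64_t arg2, uint64_t arg3)
{
    if (used > capacity || capacity - used < 9) {
        return 0;
    }
    uint64_t *w = chain + used;
    w[0] = g->pop_rax_ret; w[1] = number; /* RAX = syscall number */
    w[2] = g->pop_rdi_ret; w[3] = arg1;   /* RDI = first argument */
    w[4] = g->pop_rsi_ret; w[5] = arg2;   /* RSI = second argument */
    w[6] = g->pop_rdx_ret; w[7] = arg3;   /* RDX = third argument */
    w[8] = g->syscall_ret;                /* trigger the call */
    return 9;
}
```

On x86-64 Linux the syscall numbers for read, write, and open are 0, 1, and 2. The read step needs the descriptor that open returned in RAX; since these gadgets cannot move RAX into RDI, a chain built this way has to predict the descriptor value, which you should confirm in a debugger rather than assume.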
Learning milestones:
- You successfully create a working seccomp sandbox → You can implement a key security defense.
- You find the gadgets needed for basic system calls → You understand ROP fundamentals.
- You can craft a ROP chain to perform a single, allowed syscall (like write) → You can control program execution via ROP.
- You successfully chain open-read-write to bypass the filter → You have built a modern, complex exploit and understand how to defeat sandboxes.
Summary
| Project | Main Language | Difficulty | Key Concept Taught |
|---|---|---|---|
| Legacy API Pitfall Lab | C | Beginner | Safe APIs vs. Unsafe APIs |
| Stack Smashing 101 | C | Intermediate | Stack Buffer Overflows, ret2win |
| Format String Bug Lab | C | Advanced | Information Leaks, Arbitrary Write |
| Integer Overflow to Heap Overflow | C | Intermediate | Integer Security |
| Heap Exploitation Lab (UAF) | C | Expert | Use-After-Free, Heap Exploitation |
| Building a “Safe String” Library | C | Intermediate | Defensive API Design |
| Static Analysis Tool | Python | Intermediate | Automated Vulnerability Discovery (SAST) |
| The “Jailbreak” Sandbox Escape | C | Master | ROP, Exploit Chaining, Sandboxing |
(Note: Projects 9 and 10 from the previous list could be adapted here as well, such as “Implementing Stack Canaries” and “The Secure Server” capstone project.)