Project 7: Static Analysis Tool for Vulnerabilities

A command-line tool that scans a C source file and flags calls to dangerous, legacy functions like gets, strcpy, strcat, and sprintf (without a size-limiting format string).

Quick Reference

Attribute              Value
Primary Language       Python
Alternative Languages  C++ (using libclang), Go
Difficulty             Level 2: Intermediate
Time Estimate          1-2 weeks
Knowledge Area         Static Analysis / Parsing / Tooling
Tooling                Python re module or libclang bindings
Prerequisites          Basic Python or another scripting language

What You Will Build

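The tool takes the path to a C source file and looks for calls to gets, strcpy, strcat, and sprintf (when its format string does not limit the output size). For each call it prints one warning with the file name, the line number, and a suggested safer replacement, then ends with a count of the issues found.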
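
One way to read "without a size-limiting format string" is: a sprintf call whose literal format contains a %s conversion with no precision, since a precision such as %.9s caps how many characters are written. The sketch below is a heuristic under that assumption. The names SPRINTF_RE, UNBOUNDED_S_RE, and sprintf_is_unbounded are hypothetical, and the check only handles a format string written as a literal on the same line as the call.

```python
import re

# Captures the literal format string of a sprintf call on a single line.
# Assumption: the destination argument contains no comma.
SPRINTF_RE = re.compile(r'\bsprintf\s*\(\s*[^,]+,\s*"((?:\\.|[^"\\])*)"')

# A %s conversion with optional flags and minimum width but no ".precision".
# A width such as %10s is a minimum, not a limit, so it still counts as unbounded.
UNBOUNDED_S_RE = re.compile(r"%[-+ #0]*\d*s")


def sprintf_is_unbounded(line):
    """Return True if the line has a sprintf call with an unbounded %s."""
    match = SPRINTF_RE.search(line)
    if match is None:
        return False  # no sprintf call with a literal format on this line
    fmt = match.group(1).replace("%%", "")  # drop escaped percent signs
    return UNBOUNDED_S_RE.search(fmt) is not None


print(sprintf_is_unbounded('sprintf(buf, "%s", name);'))    # unbounded %s
print(sprintf_is_unbounded('sprintf(buf, "%.9s", name);'))  # precision caps the copy
```

Formats passed through a variable, or conversions like %*s, fall outside this heuristic; a simpler first version could flag every sprintf call and refine later.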

Why It Matters

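Unbounded copies into fixed-size buffers are a classic source of buffer overflows in C, and gets is unsafe enough that it was removed from the language in C11. Writing a scanner for these calls introduces the ideas behind real static analysis tools: reading source code, recognizing risky patterns, and reporting findings a developer can act on.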

Core Challenges

  • Reading and processing a source file → maps to basic file I/O
  • Using regular expressions to find function calls → maps to the simple but brittle approach
  • (Advanced) Using a C parser like libclang → maps to the robust approach using Abstract Syntax Trees (AST)
  • Reporting findings with file names and line numbers → maps to making the tool useful
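
The first, second, and fourth challenges above can be sketched together in a few lines of Python. This is a minimal illustration of the regular-expression approach, not a complete tool. The names DANGEROUS, CALL_RE, scan_text, and scan_file are hypothetical, and sprintf is flagged unconditionally here as a simplification.

```python
import re
from pathlib import Path

# Function names taken from the project description.
DANGEROUS = ("gets", "strcpy", "strcat", "sprintf")

# \b keeps "fgets(" and "my_strcpy(" from matching; \s* allows "gets (buf)".
CALL_RE = re.compile(r"\b(" + "|".join(DANGEROUS) + r")\s*\(")


def scan_text(text, filename):
    """Return (filename, line number, function name) for each suspicious call."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            findings.append((filename, lineno, match.group(1)))
    return findings


def scan_file(path):
    """Read a source file and scan it; undecodable bytes are replaced, not fatal."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return scan_text(text, str(path))
```

Keeping scan_text separate from file I/O makes the matching logic easy to test with plain strings.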

Key Concepts

  • Static Application Security Testing (SAST): The formal name for this type of tool.
  • Regular Expressions: Essential for the simple version of this tool.
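  • Abstract Syntax Trees (AST): The tree representation of a program that a compiler front-end produces after parsing. Analyzing the AST is much more accurate than matching text, because comments, string literals, and identifiers that merely contain a function name are not mistaken for calls.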
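
To see why plain regular expressions are brittle, note that a naive pattern also matches function names inside comments and string literals. A middle ground between the regex version and a full AST is to blank out comments and literals before matching. The sketch below shows that idea; TOKEN_RE, mask_non_code, and find_calls are hypothetical names, and the approach still knows nothing about macros or the preprocessor.

```python
import re

# Matches the spans of C source that are not code: comments and literals.
TOKEN_RE = re.compile(
    r"""
      /\*.*?\*/              # block comment, possibly spanning lines
    | //[^\n]*               # line comment
    | "(?:\\.|[^"\\\n])*"    # string literal with escapes
    | '(?:\\.|[^'\\\n])*'    # character literal with escapes
    """,
    re.DOTALL | re.VERBOSE,
)

CALL_RE = re.compile(r"\b(gets|strcpy|strcat|sprintf)\s*\(")


def mask_non_code(source):
    """Replace comments and literals with spaces, keeping every newline."""
    def blank(match):
        # Newlines are preserved so line numbers still match the original file.
        return re.sub(r"[^\n]", " ", match.group(0))
    return TOKEN_RE.sub(blank, source)


def find_calls(source):
    """Return (line number, function name) pairs found outside comments/literals."""
    masked = mask_non_code(source)
    return [
        (lineno, match.group(1))
        for lineno, line in enumerate(masked.splitlines(), start=1)
        for match in CALL_RE.finditer(line)
    ]
```

A parser such as libclang goes further by identifying call expressions in the AST directly, which is the robust route once the regex version works.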

Real-World Outcome

$ cat test.c
#include <stdio.h>
int main() {
    char buf[10];
    gets(buf); // Dangerous!
    return 0;
}

$ ./c_linter test.c
[WARNING] test.c:4: Call to dangerous function 'gets'. Use 'fgets' instead.
Found 1 potential issue(s).
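
The report format shown above can be produced by a small formatting function. This is a sketch that assumes findings have already been collected; the Finding class and format_report function are hypothetical names, and only the gets to fgets suggestion comes from the example output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    filename: str     # path as given on the command line
    line: int         # 1-based line number of the call
    function: str     # dangerous function that was called
    replacement: str  # suggested safer alternative


def format_report(findings):
    """Render findings as warning lines followed by a summary count."""
    lines = [
        f"[WARNING] {f.filename}:{f.line}: "
        f"Call to dangerous function '{f.function}'. Use '{f.replacement}' instead."
        for f in findings
    ]
    lines.append(f"Found {len(findings)} potential issue(s).")
    return "\n".join(lines)


print(format_report([Finding("test.c", 4, "gets", "fgets")]))
```

Returning a string instead of printing inside the function keeps the output easy to compare against the expected text in a test.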

Implementation Guide

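  1. Reproduce the simplest happy-path scenario: scan the test.c example and flag its gets call.
  2. Build the smallest working version of the core feature: a line-by-line regular-expression search for the four function names.
  3. Add input validation and error handling for missing or unreadable input files.
  4. Add instrumentation/logging to confirm behavior, such as how many lines were scanned.
  5. Refactor into clean modules with tests, separating file reading, matching, and reporting.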

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs
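
The checklist items on invalid input and exit codes could be handled by an entry point along these lines. The exit-code convention used here (0 for a clean file, 1 when issues are found, 2 when the file cannot be read) is an assumption rather than something the project prescribes, and main is a hypothetical function name.

```python
import argparse
import re
import sys

CALL_RE = re.compile(r"\b(gets|strcpy|strcat|sprintf)\s*\(")


def main(argv=None):
    """Scan one file; return 0 if clean, 1 if issues were found, 2 on I/O error."""
    parser = argparse.ArgumentParser(
        prog="c_linter", description="Flag calls to dangerous C functions."
    )
    parser.add_argument("path", help="C source file to scan")
    args = parser.parse_args(argv)

    try:
        with open(args.path, encoding="utf-8", errors="replace") as handle:
            lines = handle.read().splitlines()
    except OSError as exc:
        # Errors go to stderr so they do not mix with the findings on stdout.
        print(f"c_linter: cannot read {args.path}: {exc}", file=sys.stderr)
        return 2

    count = 0
    for lineno, line in enumerate(lines, start=1):
        for match in CALL_RE.finditer(line):
            name = match.group(1)
            print(f"[WARNING] {args.path}:{lineno}: Call to dangerous function '{name}'.")
            count += 1
    print(f"Found {count} potential issue(s).")
    return 1 if count else 0
```

A script would typically finish with sys.exit(main()) so the shell sees the return value as the process exit code. Because the scan is a pure function of the file contents, repeated runs on the same file give the same output.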

References

  • Main guide: LEARN_C_SECURE_CODING_DEEP_DIVE.md
  • “Language Implementation Patterns” by Terence Parr