P14: Smart Contract Security Scanner

Project Overview

Attribute              Value
Main Language          Python
Alternative Languages  Rust, TypeScript
Difficulty             Expert
Coolness Level         Level 5: Production-Ready Security Tool
Business Potential     High (Security Auditing Services, SaaS Tool)
Knowledge Area         Security / Auditing
Main Book              "Mastering Ethereum" by Andreas M. Antonopoulos & Gavin Wood

Learning Objectives

By completing this project, you will:

  1. Master Solidity AST parsing understanding how compilers represent smart contract code as abstract syntax trees and how to traverse them programmatically
  2. Implement control flow analysis tracing all possible execution paths through a contract to detect state-dependent vulnerabilities
  3. Build pattern-matching engines that detect known vulnerability signatures while understanding why each pattern is dangerous
  4. Develop false positive reduction strategies using dataflow analysis, symbolic constraints, and heuristics to distinguish real bugs from benign patterns
  5. Understand smart contract security deeply from reentrancy and integer overflows to access control and oracle manipulation

Deep Theoretical Foundation

The $3 Billion Problem

Smart contract vulnerabilities have resulted in billions of dollars in losses:

Incident               Year  Loss   Vulnerability
The DAO                2016  $60M   Reentrancy
Parity Multisig        2017  $30M   Access Control
Parity Wallet Library  2017  $280M  Delegatecall + Access Control
bZx Flash Loan         2020  $8M    Oracle Manipulation
Cream Finance          2021  $130M  Flash Loan + Reentrancy
Ronin Bridge           2022  $625M  Private Key Compromise
Euler Finance          2023  $197M  Donation Attack

Unlike traditional software where bugs cause crashes, smart contract bugs cause permanent financial loss. There's no "undo" button, no customer support to call, no rollback. Code is law, and flawed law can be exploited.

Why Static Analysis Matters

Manual auditing cannot scale. A skilled auditor might review 200-500 lines of Solidity per hour. Complex DeFi protocols contain tens of thousands of lines. At the same time, new contracts are deployed every minute.

Static analysis tools serve as the first line of defense:

  1. Speed: Analyze thousands of contracts per hour
  2. Consistency: Never miss a known pattern due to fatigue
  3. Coverage: Check every function, every path, every state
  4. Documentation: Generate reports that guide manual review

But static analysis has fundamental limitations:

  1. False positives: Flagging safe code as vulnerable
  2. False negatives: Missing actual vulnerabilities
  3. Semantic blindness: Not understanding business logic
  4. Halting problem: Cannot determine all runtime behaviors

The art of building a security scanner is balancing coverage (catching real bugs) against precision (not crying wolf).

Understanding the Solidity Compilation Pipeline

Source Code (.sol)
        |
        v
    [Lexer/Parser]
        |
        v
Abstract Syntax Tree (AST)
        |
        v
   [Type Checker]
        |
        v
  Annotated AST
        |
        v
   [IR Generator]
        |
        v
 Yul Intermediate Representation
        |
        v
   [Optimizer]
        |
        v
   [Code Generator]
        |
        v
    EVM Bytecode

For static analysis, we primarily work with the AST and sometimes the Yul IR. The AST preserves the full semantic structure of the source code.

The Solidity AST Structure

When you compile Solidity with --ast-compact-json, you get a tree structure:

{
  "nodeType": "ContractDefinition",
  "name": "Vulnerable",
  "nodes": [
    {
      "nodeType": "FunctionDefinition",
      "name": "withdraw",
      "body": {
        "nodeType": "Block",
        "statements": [
          {
            "nodeType": "ExpressionStatement",
            "expression": {
              "nodeType": "FunctionCall",
              "expression": {
                "nodeType": "MemberAccess",
                "memberName": "call"
              }
            }
          }
        ]
      }
    }
  ]
}

Each node has:

  • nodeType: The grammatical category (ContractDefinition, FunctionDefinition, Assignment, etc.)
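  • src: Source location, encoded as start:length:fileIndex (byte offset, byte length, index of the source file)
  • id: Unique identifier for cross-references
  • Child nodes: Stored under type-specific keys (nodes, body, statements, expression, ...) rather than a single children field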
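As a minimal sketch of how these fields can be used, the snippet below walks an AST that has already been loaded into nested Python dicts and collects calls to a given member. The helper names iter_nodes and find_member_calls are illustrative, not part of solc, and the sample AST is the trimmed example above with made-up id values. Instead of hard-coding child keys, the walker treats any dict that carries a nodeType as a node.

```python
import json
from typing import Any, Dict, Iterator, List

# Trimmed AST in the shape shown above; real solc output has many more fields.
# The id values are invented for this example.
SAMPLE_AST = json.loads("""
{
  "nodeType": "ContractDefinition", "name": "Vulnerable", "id": 10,
  "nodes": [{
    "nodeType": "FunctionDefinition", "name": "withdraw", "id": 9,
    "body": {"nodeType": "Block", "id": 8, "statements": [{
      "nodeType": "ExpressionStatement", "id": 7,
      "expression": {"nodeType": "FunctionCall", "id": 6,
        "expression": {"nodeType": "MemberAccess", "id": 5, "memberName": "call"}}
    }]}
  }]
}
""")


def iter_nodes(value: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict that carries a nodeType, parents before children."""
    if isinstance(value, dict):
        if "nodeType" in value:
            yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def find_member_calls(ast: Dict[str, Any], member: str) -> List[Dict[str, Any]]:
    """Return FunctionCall nodes whose callee is a MemberAccess named `member`."""
    hits = []
    for node in iter_nodes(ast):
        callee = node.get("expression")
        if (node["nodeType"] == "FunctionCall"
                and isinstance(callee, dict)
                and callee.get("nodeType") == "MemberAccess"
                and callee.get("memberName") == member):
            hits.append(node)
    return hits


node_types = [n["nodeType"] for n in iter_nodes(SAMPLE_AST)]
call_ids = [n["id"] for n in find_member_calls(SAMPLE_AST, "call")]
```

Walking generically like this trades a little speed for robustness: the traversal keeps working when a compiler release adds a new child key.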

Control Flow Graphs (CFG)

A Control Flow Graph represents all possible execution paths through a function:

              [Entry]
                 |
                 v
         [if (balance > 0)]
            /         \
           T           F
          /             \
         v               v
    [call.value()]    [return]
         |
         v
  [balance = 0]
         |
         v
      [Exit]

Nodes represent basic blocks (sequences of statements with no branches). Edges represent control flow (function calls, if/else, loops, reverts).

CFGs enable:

  1. Path analysis: What states are possible at each point?
  2. Dominator analysis: What must execute before reaching a point?
  3. Liveness analysis: What variables are "live" (used later)?
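To make dominator analysis concrete, here is a small sketch that runs the standard iterative algorithm on the example CFG above. The block names and the dict-of-successors representation are assumptions made for this example, as is the return -> exit edge, which the diagram leaves implicit.

```python
from typing import Dict, List, Set

# Successor lists for the example CFG. The edge return -> exit is an
# assumption: the diagram does not draw it.
SUCCESSORS: Dict[str, List[str]] = {
    "entry": ["cond"],
    "cond": ["call", "return"],
    "call": ["update"],
    "update": ["exit"],
    "return": ["exit"],
    "exit": [],
}


def compute_dominators(succ: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iterate dom(n) = {n} | intersection of dom(p) over predecessors p."""
    preds: Dict[str, List[str]] = {n: [] for n in succ}
    for node, targets in succ.items():
        for target in targets:
            preds[target].append(node)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(succ) for n in succ}
    dom[entry] = {entry}

    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


DOM = compute_dominators(SUCCESSORS, "entry")
```

Here "call" dominates "update", so every path that resets the balance has already made the external call. That ordering is what a reentrancy detector looks for. "call" does not dominate "exit", because the false branch skips it.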

Dataflow Analysis

Dataflow analysis tracks how values flow through the program:

Reaching Definitions: Which assignments might reach a given use?

x = 1;           // Definition D1
if (condition) {
    x = 2;       // Definition D2
}
use(x);          // D1 and D2 both reach here
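The same example can be run through a small fixed-point solver. This is a sketch under simplifying assumptions: the snippet is split by hand into three blocks (B1, B2, B3), each block defines a variable at most once, and the names reaching_definitions, BLOCK_DEFS and PREDS are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Definitions generated by each block, as (definition_id, variable).
BLOCK_DEFS: Dict[str, List[Tuple[str, str]]] = {
    "B1": [("D1", "x")],   # x = 1;
    "B2": [("D2", "x")],   # x = 2;  (inside the if)
    "B3": [],              # use(x);
}
# B3 is reachable both directly from B1 (condition false) and through B2.
PREDS: Dict[str, List[str]] = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}


def reaching_definitions(
    block_defs: Dict[str, List[Tuple[str, str]]],
    preds: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Return, for each block, the definitions that may reach its entry."""
    var_of = {d: v for defs in block_defs.values() for d, v in defs}
    reach_in: Dict[str, Set[str]] = {b: set() for b in block_defs}
    reach_out: Dict[str, Set[str]] = {b: set() for b in block_defs}

    changed = True
    while changed:
        changed = False
        for block, defs in block_defs.items():
            incoming: Set[str] = set().union(*(reach_out[p] for p in preds[block]))
            redefined = {v for _, v in defs}
            # Kill incoming definitions of variables this block reassigns,
            # then add the block's own definitions.
            outgoing = {d for d in incoming if var_of[d] not in redefined}
            outgoing |= {d for d, _ in defs}
            if incoming != reach_in[block] or outgoing != reach_out[block]:
                reach_in[block], reach_out[block] = incoming, outgoing
                changed = True
    return reach_in


REACH_IN = reaching_definitions(BLOCK_DEFS, PREDS)
```

REACH_IN["B3"] contains both D1 and D2, matching the comment in the snippet above, while only D1 reaches B2.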

Taint Analysis: Which values are influenced by user input?

function transfer(address to, uint amount) {
    // 'to' and 'amount' are tainted (user-controlled)
    balances[msg.sender] -= amount;  // 'amount' flows into state
    balances[to] += amount;          // 'to' flows into state key
}

Available Expressions: What expressions have already been computed?

a = x + y;
b = x + y;  // 'x + y' is available, could be reused

The Vulnerability Taxonomy

Category 1: Reentrancy

The Bug: External calls transfer control to an untrusted contract, which can call back before state updates complete.

Classic Pattern:

function withdraw() public {
    uint amount = balances[msg.sender];
    (bool success, ) = msg.sender.call{value: amount}("");  // External call
    require(success);
    balances[msg.sender] = 0;  // State update AFTER call
}

The Attack:

contract Attacker {
    Vulnerable target;

    function attack() public payable {
        target.deposit{value: 1 ether}();
        target.withdraw();
    }

    receive() external payable {
        if (address(target).balance >= 1 ether) {
            target.withdraw();  // Re-enter before balance updated
        }
    }
}

Detection Pattern:

  1. Find external calls: call, send, transfer, or calls to external contracts
  2. Find state writes to the same storage slot
  3. Check if state write happens AFTER external call
  4. Account for the Checks-Effects-Interactions pattern
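The ordering check in steps 1 to 3 can be sketched on a deliberately simplified model: a function body flattened into a list of operations in execution order. The Op record and writes_after_external_call are hypothetical names for this example. A real detector would walk CFG paths instead of a flat list and would also recognize guards.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Op:
    """One step of a function body in execution order (simplified model)."""
    kind: str     # "read", "write", or "external_call"
    target: str   # storage variable name, or a description of the callee
    line: int


def writes_after_external_call(ops: List[Op]) -> List[Op]:
    """Flag writes to storage that was read before an external call and written after it."""
    read_before_call: Set[str] = set()
    seen_call = False
    flagged: List[Op] = []
    for op in ops:
        if op.kind == "external_call":
            seen_call = True
        elif op.kind == "read" and not seen_call:
            read_before_call.add(op.target)
        elif op.kind == "write" and seen_call and op.target in read_before_call:
            flagged.append(op)
    return flagged


# The classic pattern above: read balance, call out, then zero the balance.
VULNERABLE = [
    Op("read", "balances", 2),
    Op("external_call", "msg.sender.call", 3),
    Op("write", "balances", 5),
]
# Checks-Effects-Interactions: the write happens before the call.
SAFE = [
    Op("read", "balances", 2),
    Op("write", "balances", 3),
    Op("external_call", "msg.sender.call", 4),
]

vulnerable_lines = [op.line for op in writes_after_external_call(VULNERABLE)]
safe_lines = [op.line for op in writes_after_external_call(SAFE)]
```

Only the first sequence is flagged (at line 5); the reordered version passes because the state write precedes the call.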

Variations:

  • Cross-function reentrancy: Call from function A re-enters function B
  • Cross-contract reentrancy: Via shared state in another contract
  • Read-only reentrancy: Re-enter to read stale state, not modify it

Category 2: Integer Overflow/Underflow

The Bug: Arithmetic operations wrap around instead of reverting.

Pre-Solidity 0.8.0 Pattern:

function transfer(address to, uint256 amount) public {
    balances[msg.sender] -= amount;  // Underflows if amount > balance
    balances[to] += amount;          // Overflows if result > 2^256-1
}

Post-0.8.0: Solidity automatically checks for overflow/underflow. But unchecked blocks re-enable the vulnerability:

unchecked {
    balances[msg.sender] -= amount;  // Vulnerable again
}

Detection Pattern:

  1. Check compiler version (<0.8.0 is vulnerable by default)
  2. Find unchecked blocks
  3. Analyze arithmetic operations within unchecked contexts
  4. Check for safe-math library usage
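Steps 1 and 2 reduce to a small decision function. The sketch below assumes the version is available as a pragma or version string and that an earlier pass has already determined whether the operation sits inside an unchecked block; parse_version and arithmetic_is_checked are illustrative names. For a range pragma such as ">=0.7.0 <0.9.0" it reads the first version, which is the lower bound and therefore the conservative choice.

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first major.minor.patch triple from a pragma or version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def arithmetic_is_checked(pragma: str, inside_unchecked: bool) -> bool:
    """True if overflow/underflow reverts for an operation in this context."""
    version = parse_version(pragma)
    if version is None:
        # Unknown compiler version: stay conservative and assume wrapping arithmetic.
        return False
    return version >= (0, 8, 0) and not inside_unchecked


old_compiler = arithmetic_is_checked("pragma solidity ^0.7.6;", inside_unchecked=False)
new_compiler = arithmetic_is_checked("pragma solidity ^0.8.19;", inside_unchecked=False)
new_unchecked = arithmetic_is_checked("pragma solidity ^0.8.19;", inside_unchecked=True)
```

A detector would report arithmetic only where this returns False, and would then apply step 4 (safe-math usage) to lower the confidence or suppress the finding.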

Category 3: Access Control

The Bug: Critical functions lack proper authorization checks.

Vulnerable Pattern:

function setOwner(address newOwner) public {
    owner = newOwner;  // Anyone can call!
}

function mint(address to, uint256 amount) public {
    _mint(to, amount);  // Anyone can mint!
}

Detection Pattern:

  1. Identify sensitive operations: owner changes, minting, pausing, upgrading
  2. Check for onlyOwner, require(msg.sender == ...), or role-based checks
  3. Trace through modifiers to verify they contain actual checks
  4. Flag functions that modify critical state without authorization

Variations:

  • Missing zero-address check: Setting owner to 0x0
  • Front-running: Someone monitors the mempool and gets their own transaction included ahead of the legitimate one
  • Centralization risk: Owner has too much power

Category 4: Unchecked External Calls

The Bug: Ignoring the return value of a low-level call such as call, delegatecall, staticcall, or send.

Vulnerable Pattern:

function withdraw() public {
    payable(msg.sender).send(balance);  // Returns false on failure, not checked
    balance = 0;
}

The Risk: If send fails (e.g., recipient reverts or runs out of gas), the state still updates as if it succeeded.

Detection Pattern:

  1. Find all low-level calls: call, delegatecall, staticcall, send
  2. Check if return value is captured and checked
  3. For transfer, it reverts automatically (safe but inflexible)
  4. Flag unchecked calls as potential vulnerabilities
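Steps 1 and 2 can be sketched directly on the JSON AST. The sample below is hand-written rather than real compiler output, and it relies on two assumptions about the AST shape: a call whose result is discarded appears as a FunctionCall directly under an ExpressionStatement, and the {value: ...} syntax wraps the callee in a FunctionCallOptions node. Checking whether a captured result is later tested in a require is left out.

```python
from typing import Any, Dict, Iterator, List

LOW_LEVEL = {"call", "delegatecall", "staticcall", "send"}

# Hand-written sample in the assumed AST shape (most fields omitted).
SAMPLE_BLOCK: Dict[str, Any] = {
    "nodeType": "Block",
    "statements": [
        {   # payable(addr).send(amount);   result discarded
            "nodeType": "ExpressionStatement",
            "expression": {"nodeType": "FunctionCall", "expression": {
                "nodeType": "MemberAccess", "memberName": "send"}},
        },
        {   # (bool success, ) = addr.call{value: amount}("");   result captured
            "nodeType": "VariableDeclarationStatement",
            "initialValue": {"nodeType": "FunctionCall", "expression": {
                "nodeType": "FunctionCallOptions", "expression": {
                    "nodeType": "MemberAccess", "memberName": "call"}}},
        },
        {   # addr.call{value: amount}("");   result discarded
            "nodeType": "ExpressionStatement",
            "expression": {"nodeType": "FunctionCall", "expression": {
                "nodeType": "FunctionCallOptions", "expression": {
                    "nodeType": "MemberAccess", "memberName": "call"}}},
        },
    ],
}


def iter_nodes(value: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict that carries a nodeType, parents before children."""
    if isinstance(value, dict):
        if "nodeType" in value:
            yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def callee_member(call: Dict[str, Any]) -> str:
    """Member name being called, looking through a {value: ...} options wrapper."""
    callee = call.get("expression") or {}
    if callee.get("nodeType") == "FunctionCallOptions":
        callee = callee.get("expression") or {}
    if callee.get("nodeType") == "MemberAccess":
        return callee.get("memberName", "")
    return ""


def unchecked_low_level_calls(ast: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Low-level calls used as bare statements, so their return value is dropped."""
    hits = []
    for node in iter_nodes(ast):
        if node["nodeType"] != "ExpressionStatement":
            continue
        expr = node.get("expression") or {}
        if expr.get("nodeType") == "FunctionCall" and callee_member(expr) in LOW_LEVEL:
            hits.append(expr)
    return hits


flagged_members = [callee_member(c) for c in unchecked_low_level_calls(SAMPLE_BLOCK)]
```

The captured call in the middle statement is not flagged; only the two bare statements are.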

Category 5: Denial of Service

The Bug: Attackers can make the contract unusable.

Gas Limit Pattern:

function distributeRewards() public {
    for (uint i = 0; i < users.length; i++) {
        users[i].transfer(rewards[users[i]]);  // Unbounded loop
    }
}

If users.length grows large, the transaction exceeds the block gas limit.

Revert Pattern:

function claimFirst() public {
    (bool success, ) = winner.call{value: prize}("");
    require(success);  // If winner is a contract that reverts, no one can claim
}

Detection Pattern:

  1. Find unbounded loops over user-controlled arrays
  2. Identify external calls in loops
  3. Check for pull-over-push payment patterns
  4. Flag require on external call results

Category 6: Oracle Manipulation

The Bug: Relying on spot prices from DEXs that can be manipulated within a transaction.

Vulnerable Pattern:

function getPrice() public view returns (uint256) {
    return (reserve1 * 1e18) / reserve0;  // Spot price from AMM
}

function liquidate(address user) public {
    uint256 price = getPrice();
    if (getUserCollateralValue(user, price) < getUserDebt(user)) {
        // Liquidate...
    }
}

The Attack: Attacker uses a flash loan to manipulate the AMM reserves, changing the "price" within a single transaction.

Detection Pattern:

  1. Identify price oracle calls
  2. Check if using spot prices vs time-weighted average prices (TWAP)
  3. Flag direct Uniswap/Curve reserve reads without safeguards
  4. Check for flash loan protection
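A short worked example shows why spot prices are fragile. It assumes a constant-product pool (x * y = k, as in Uniswap v2-style AMMs) and ignores fees; the reserve numbers are invented for illustration.

```python
from typing import Tuple


def spot_price(reserve0: float, reserve1: float) -> float:
    """Price of token0 in units of token1, like getPrice() above without the 1e18 scaling."""
    return reserve1 / reserve0


def swap_token1_for_token0(
    reserve0: float, reserve1: float, amount_in: float
) -> Tuple[float, float]:
    """Constant-product swap (x * y = k), ignoring fees. Returns the new reserves."""
    k = reserve0 * reserve1
    new_reserve1 = reserve1 + amount_in
    new_reserve0 = k / new_reserve1
    return new_reserve0, new_reserve1


# Pool starts with 1,000 token0 and 2,000,000 token1.
price_before = spot_price(1_000.0, 2_000_000.0)

# The attacker flash-borrows 2,000,000 token1 and swaps it into the pool.
r0, r1 = swap_token1_for_token0(1_000.0, 2_000_000.0, 2_000_000.0)
price_after = spot_price(r0, r1)
```

Doubling one side of the pool quadruples the spot price (2,000 to 8,000 here), and any contract that reads getPrice() inside the same transaction sees the manipulated value. A time-weighted average over many blocks is much harder to move this way.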

Category 7: Front-Running and MEV

The Bug: Transaction ordering can be exploited by block producers (miners, or validators and block builders since the move to proof of stake) and other sophisticated actors.

Sandwich Attack Pattern:

// User submits: swap 100 ETH for DAI with 1% slippage
// Attacker front-runs: buy DAI, pushing price up
// User's trade executes at worse price
// Attacker back-runs: sell DAI for profit

Detection Pattern:

  1. Identify swaps with configurable slippage
  2. Check for commit-reveal patterns
  3. Flag time-sensitive operations without MEV protection
  4. Analyze for flashbots/private mempool compatibility

Category 8: Signature Replay

The Bug: Signed messages can be reused if not properly invalidated.

Vulnerable Pattern:

function executeWithSignature(
    address to,
    uint256 amount,
    bytes memory signature
) public {
    bytes32 hash = keccak256(abi.encodePacked(to, amount));
    require(recoverSigner(hash, signature) == owner);
    // Execute... but signature can be replayed!
}

Detection Pattern:

  1. Find signature verification (ecrecover, ECDSA.recover)
  2. Check if nonce is included in signed message
  3. Verify nonce is incremented after use
  4. Check for chain ID to prevent cross-chain replay

Complete Project Specification

Functional Requirements

  1. AST Parsing and Traversal
    • Parse Solidity AST JSON output
    • Build navigable tree structure
    • Support contract inheritance resolution
    • Handle multiple source files
  2. Control Flow Analysis
    • Build CFG for each function
    • Identify basic blocks and edges
    • Handle try/catch, if/else, loops
    • Model reverts and returns
  3. Vulnerability Detectors
    • Reentrancy (classic, cross-function, read-only)
    • Integer overflow/underflow (pre-0.8.0 and unchecked)
    • Access control issues
    • Unchecked external calls
    • Denial of service patterns
    • Timestamp dependence
    • tx.origin authentication
    • Uninitialized storage pointers
    • Delegatecall to untrusted contracts
  4. False Positive Reduction
    • Track SafeMath usage
    • Recognize reentrancy guards
    • Identify checks-effects-interactions pattern
    • Understand OpenZeppelin patterns
  5. Reporting
    • JSON output for tooling integration
    • Human-readable reports with source context
    • Severity classification (Critical, High, Medium, Low, Informational)
    • SARIF format for IDE integration

Non-Functional Requirements

  • Performance: Analyze 10,000+ line contracts in under 30 seconds
  • Accuracy: Less than 10% false positive rate on benchmark suite
  • Extensibility: Plugin architecture for new detectors
  • Maintainability: Clear separation between parsing, analysis, and reporting

Command-Line Interface

# Basic scan
$ solscan analyze contract.sol
[CRITICAL] Reentrancy in Vault.withdraw() at line 45
[HIGH] Unchecked call return in Treasury.send() at line 78
[MEDIUM] Missing zero-address check in Ownable.setOwner() at line 12

# Detailed JSON output
$ solscan analyze --format json contract.sol > report.json

# Scan compiled AST
$ solscan analyze --ast-json artifacts/contract.json

# Scan with specific detectors
$ solscan analyze --detectors reentrancy,overflow contract.sol

# Scan entire project
$ solscan analyze --recursive ./contracts/

# Generate SARIF for CI integration
$ solscan analyze --format sarif contract.sol > results.sarif
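One way to wire up this interface is with argparse from the standard library. This is a sketch, not the required design: it treats --ast-json and --recursive as boolean flags that change how the positional target is interpreted, which is one reading of the examples above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the `solscan analyze` command line described above."""
    parser = argparse.ArgumentParser(prog="solscan")
    subcommands = parser.add_subparsers(dest="command", required=True)

    analyze = subcommands.add_parser("analyze", help="scan a contract, AST file, or directory")
    analyze.add_argument("target", help="path to a .sol file, an AST JSON file, or a directory")
    analyze.add_argument("--format", choices=["text", "json", "sarif"], default="text",
                         help="output format (default: text)")
    analyze.add_argument("--ast-json", action="store_true",
                         help="treat target as a compiled AST JSON file")
    analyze.add_argument("--detectors", type=lambda value: value.split(","), default=None,
                         help="comma-separated detector names (default: all)")
    analyze.add_argument("--recursive", action="store_true",
                         help="scan every .sol file under target")
    return parser


args = build_parser().parse_args(
    ["analyze", "--detectors", "reentrancy,overflow", "contract.sol"]
)
ast_args = build_parser().parse_args(["analyze", "--ast-json", "artifacts/contract.json"])
```

In main.py the real entry point would call build_parser().parse_args() with no arguments so that sys.argv is used.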

Solution Architecture

Module Structure

src/
├── main.py                # CLI entry point
├── solscan/
│   ├── __init__.py
│   ├── parser/
│   │   ├── __init__.py
│   │   ├── ast_parser.py      # JSON AST parsing
│   │   ├── ast_nodes.py       # AST node types
│   │   ├── source_mapper.py   # Map AST to source locations
│   │   └── inheritance.py     # Resolve contract inheritance
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── cfg.py             # Control flow graph builder
│   │   ├── dataflow.py        # Dataflow analysis framework
│   │   ├── call_graph.py      # Inter-procedural call graph
│   │   ├── taint.py           # Taint tracking
│   │   └── state_tracker.py   # Storage slot analysis
│   ├── detectors/
│   │   ├── __init__.py
│   │   ├── base.py            # Detector interface
│   │   ├── reentrancy.py      # Reentrancy detector
│   │   ├── overflow.py        # Integer overflow detector
│   │   ├── access_control.py  # Access control detector
│   │   ├── unchecked_call.py  # Unchecked call detector
│   │   ├── dos.py             # Denial of service detector
│   │   ├── timestamp.py       # Timestamp dependence
│   │   ├── tx_origin.py       # tx.origin authentication
│   │   └── delegatecall.py    # Delegatecall vulnerabilities
│   ├── reporters/
│   │   ├── __init__.py
│   │   ├── text.py            # Human-readable output
│   │   ├── json_reporter.py   # JSON format
│   │   └── sarif.py           # SARIF for IDE integration
│   └── utils/
│       ├── __init__.py
│       ├── compiler.py        # Solidity compilation wrapper
│       └── patterns.py        # Common vulnerability patterns
├── tests/
│   ├── contracts/             # Test contracts with known vulnerabilities
│   ├── test_parser.py
│   ├── test_cfg.py
│   ├── test_detectors.py
│   └── test_benchmarks.py
└── benchmarks/
    ├── not_so_smart_contracts/  # Known vulnerable contracts
    └── false_positive_suite/    # Contracts that should NOT flag

Core Data Structures

from dataclasses import dataclass, field
from typing import List, Dict, Optional, Set
from enum import Enum

class NodeType(Enum):
    SOURCE_UNIT = "SourceUnit"  # root of each file's AST; checked in Phase 1 validation
    CONTRACT = "ContractDefinition"
    FUNCTION = "FunctionDefinition"
    MODIFIER = "ModifierDefinition"
    VARIABLE = "VariableDeclaration"
    BLOCK = "Block"
    IF_STATEMENT = "IfStatement"
    FOR_LOOP = "ForStatement"
    WHILE_LOOP = "WhileStatement"
    EXPRESSION = "ExpressionStatement"
    RETURN = "Return"
    REVERT = "RevertStatement"
    FUNCTION_CALL = "FunctionCall"
    MEMBER_ACCESS = "MemberAccess"
    ASSIGNMENT = "Assignment"
    BINARY_OP = "BinaryOperation"
    IDENTIFIER = "Identifier"
    # ... more node types


@dataclass
class SourceLocation:
    """Maps AST node to source code location."""
    file_path: str
    start: int
    length: int
    line: int
    column: int

    def get_source_snippet(self, source: str, context_lines: int = 2) -> str:
        """Extract source code around this location."""
        ...


@dataclass(eq=False, kw_only=True)
class ASTNode:
    """Base class for all AST nodes.

    eq=False keeps identity-based hashing, so nodes can be used as dict keys.
    kw_only=True (Python 3.10+) lets subclasses declare required fields even
    though this base class already has fields with defaults.
    """
    node_type: NodeType
    node_id: int
    src: SourceLocation
    parent: Optional['ASTNode'] = None
    children: List['ASTNode'] = field(default_factory=list)

    def get_children_of_type(self, node_type: NodeType) -> List['ASTNode']:
        """Recursively find all descendants of a given type."""
        ...

    def get_ancestor_of_type(self, node_type: NodeType) -> Optional['ASTNode']:
        """Find the nearest ancestor of a given type."""
        ...


@dataclass(eq=False, kw_only=True)
class ContractNode(ASTNode):
    """Represents a contract definition."""
    name: str
    base_contracts: List[str]
    is_abstract: bool
    is_interface: bool
    state_variables: List['VariableNode'] = field(default_factory=list)
    functions: List['FunctionNode'] = field(default_factory=list)
    modifiers: List['ModifierNode'] = field(default_factory=list)


@dataclass(eq=False, kw_only=True)
class FunctionNode(ASTNode):
    """Represents a function definition."""
    name: str
    visibility: str  # public, external, internal, private
    state_mutability: str  # pure, view, payable, nonpayable
    modifiers: List[str]
    parameters: List['VariableNode']
    return_parameters: List['VariableNode']
    body: Optional['BlockNode']


@dataclass(eq=False)  # identity hashing: blocks are used as dict keys and set members
class BasicBlock:
    """A sequence of statements with no branches."""
    id: int
    statements: List[ASTNode]
    predecessors: List['BasicBlock'] = field(default_factory=list)
    successors: List['BasicBlock'] = field(default_factory=list)


@dataclass
class ControlFlowGraph:
    """Control flow graph for a function."""
    function: FunctionNode
    entry: BasicBlock
    exit: BasicBlock
    blocks: List[BasicBlock]

    def get_paths(self) -> List[List[BasicBlock]]:
        """Enumerate all paths through the CFG (with loop unrolling limit)."""
        ...

    def dominators(self) -> Dict[BasicBlock, Set[BasicBlock]]:
        """Compute dominator sets for each block."""
        ...


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFORMATIONAL = "informational"


@dataclass
class Finding:
    """A detected vulnerability."""
    detector_name: str
    title: str
    description: str
    severity: Severity
    location: SourceLocation
    contract_name: str
    function_name: Optional[str]
    source_snippet: str
    recommendation: str
    references: List[str] = field(default_factory=list)
    confidence: str = "high"  # high, medium, low

Detector Interface

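from __future__ import annotations  # AnalysisContext is referenced before it is defined

from abc import ABC, abstractmethod
from typing import Dict, List, Optional

# Severity, Finding, ASTNode, FunctionNode and ControlFlowGraph are the data
# structures defined above. CFGBuilder, CallGraph, CallGraphBuilder and
# TaintAnalysis come from the analysis modules built in later phases.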

class BaseDetector(ABC):
    """Base class for all vulnerability detectors."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Unique detector identifier."""
        pass

    @property
    @abstractmethod
    def description(self) -> str:
        """What this detector finds."""
        pass

    @property
    @abstractmethod
    def severity(self) -> Severity:
        """Default severity of findings."""
        pass

    @abstractmethod
    def detect(self, context: AnalysisContext) -> List[Finding]:
        """
        Run detection and return findings.

        Args:
            context: Contains parsed AST, CFGs, call graph, etc.

        Returns:
            List of detected vulnerabilities.
        """
        pass


class AnalysisContext:
    """Shared context for all detectors."""

    def __init__(self, ast: ASTNode, source: str):
        self.ast = ast
        self.source = source
        self._cfgs: Dict[FunctionNode, ControlFlowGraph] = {}
        self._call_graph: Optional[CallGraph] = None
        self._taint_results: Dict[FunctionNode, TaintAnalysis] = {}

    def get_cfg(self, function: FunctionNode) -> ControlFlowGraph:
        """Get or build CFG for a function."""
        if function not in self._cfgs:
            self._cfgs[function] = CFGBuilder().build(function)
        return self._cfgs[function]

    def get_call_graph(self) -> CallGraph:
        """Get or build the inter-procedural call graph."""
        if self._call_graph is None:
            self._call_graph = CallGraphBuilder().build(self.ast)
        return self._call_graph

Phased Implementation Guide

Phase 1: AST Parsing

Goal: Parse Solidity compiler JSON output into navigable data structures.

Tasks:

  1. Use solc --ast-compact-json to generate AST
  2. Parse JSON into Python objects
  3. Build parent-child relationships
  4. Implement node type classification
  5. Handle source mapping

Validation:

ast = parse_solidity("contract.sol")
assert ast.node_type == NodeType.SOURCE_UNIT
assert len(ast.get_children_of_type(NodeType.CONTRACT)) >= 1

Hints if stuck:

  • The solc compiler can be installed and driven from Python via the py-solc-x package (imported as solcx)
  • Start with a simple contract: contract A { function foo() public {} }
  • Print the raw JSON to understand the structure
  • Node IDs are unique and can be used for cross-references
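For task 5 (source mapping), the src string has to be turned into a line and column. The sketch below assumes the start:length:fileIndex encoding with byte offsets, so it works on the source as bytes; parse_src, line_starts and offset_to_line_col are illustrative helper names.

```python
from bisect import bisect_right
from typing import List, Tuple


def parse_src(src: str) -> Tuple[int, int, int]:
    """Split a 'start:length:fileIndex' string into integers."""
    start, length, file_index = (int(part) for part in src.split(":"))
    return start, length, file_index


def line_starts(source: bytes) -> List[int]:
    """Byte offset at which each line begins."""
    starts = [0]
    for offset, byte in enumerate(source):
        if byte == 0x0A:  # "\n"
            starts.append(offset + 1)
    return starts


def offset_to_line_col(starts: List[int], offset: int) -> Tuple[int, int]:
    """Convert a byte offset into a 1-based (line, column) pair."""
    line_index = bisect_right(starts, offset) - 1
    return line_index + 1, offset - starts[line_index] + 1


SOURCE = b"contract A {\n    function foo() public {}\n}\n"
STARTS = line_starts(SOURCE)

# A src value pointing at the function definition in SOURCE.
start, length, _file_index = parse_src("17:24:0")
position = offset_to_line_col(STARTS, start)
snippet = SOURCE[start:start + length].decode("utf-8")
```

Columns computed this way count bytes, not characters, so they drift on lines that contain non-ASCII text before the node. Decode the line prefix first if exact character columns matter.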

Phase 2: AST Traversal and Queries

Goal: Build utilities for navigating and querying the AST.

Tasks:

  1. Implement visitor pattern for AST traversal
  2. Build queries: "find all function calls", "find all assignments"
  3. Resolve identifiers to their declarations
  4. Handle inheritance (base contract functions)

Validation:

functions = ast.query(NodeType.FUNCTION)
calls = ast.query(NodeType.FUNCTION_CALL)
assert all(isinstance(f, FunctionNode) for f in functions)

Hints if stuck:

  • Visitor pattern: define visit_FunctionDefinition(node), etc.
  • For inheritance, look at baseContracts in ContractDefinition
  • Use the referencedDeclaration field to resolve identifiers
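The visitor hint can be sketched as follows. For brevity it visits the raw JSON dicts rather than the typed node classes, and the class names ASTVisitor and FunctionCollector are illustrative; the dispatch mirrors the approach of Python's own ast.NodeVisitor.

```python
from typing import Any, Dict, List


class ASTVisitor:
    """Dispatch each node to visit_<nodeType>, falling back to generic_visit."""

    def visit(self, node: Dict[str, Any]) -> None:
        handler = getattr(self, f"visit_{node['nodeType']}", self.generic_visit)
        handler(node)

    def generic_visit(self, node: Dict[str, Any]) -> None:
        """Visit every child node stored in a dict-valued or list-valued field."""
        for value in node.values():
            candidates = value if isinstance(value, list) else [value]
            for child in candidates:
                if isinstance(child, dict) and "nodeType" in child:
                    self.visit(child)


class FunctionCollector(ASTVisitor):
    """Example query: collect the names of all function definitions."""

    def __init__(self) -> None:
        self.names: List[str] = []

    def visit_FunctionDefinition(self, node: Dict[str, Any]) -> None:
        self.names.append(node["name"])
        self.generic_visit(node)  # keep descending into the function body


# Hand-written sample with most fields omitted.
SAMPLE = {
    "nodeType": "SourceUnit",
    "nodes": [{
        "nodeType": "ContractDefinition", "name": "Vault",
        "nodes": [
            {"nodeType": "FunctionDefinition", "name": "deposit",
             "body": {"nodeType": "Block", "statements": []}},
            {"nodeType": "FunctionDefinition", "name": "withdraw",
             "body": {"nodeType": "Block", "statements": []}},
        ],
    }],
}

collector = FunctionCollector()
collector.visit(SAMPLE)
```

A subclass that overrides a visit_ method must call generic_visit itself if it wants the traversal to continue below that node, which is also how a detector can prune subtrees it does not care about.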

Phase 3: Control Flow Graph Construction

Goal: Build CFGs for each function.

Tasks:

  1. Identify basic block boundaries (branches, jumps)
  2. Create nodes for each basic block
  3. Add edges for control flow
  4. Handle try/catch, if/else, for/while, break/continue
  5. Model function returns and reverts

Validation:

cfg = build_cfg(function)
# Simple function should have: entry -> body -> exit
assert len(cfg.blocks) == 3
body_block = cfg.entry.successors[0]
assert body_block.successors == [cfg.exit]

Hints if stuck:

  • A new block starts after: if/else, loops, function calls
  • revert() and return terminate a block
  • Use a stack to track nested control structures
  • Start with straight-line code, then add branches
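Following the last hint, here is a sketch of a builder that handles straight-line code and if/else only. It works on a simplified statement model invented for this example (plain strings for simple statements, tuples for branches) instead of real AST nodes, and it omits loops, return and revert. The Block and CFG classes are illustrative and do not match the BasicBlock and ControlFlowGraph definitions exactly.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple, Union

# Simplified statement model (an assumption for this sketch): a string is a
# straight-line statement; ("if", condition, then_body, else_body) is a branch.
Stmt = Union[str, Tuple[str, str, list, list]]


@dataclass(eq=False)
class Block:
    label: str
    statements: List[str] = field(default_factory=list)
    successors: List["Block"] = field(default_factory=list)


class CFG:
    def __init__(self, body: Sequence[Stmt]) -> None:
        self.blocks: List[Block] = []
        self.entry = self._new("entry")
        self.exit = Block("exit")
        # Whatever is still open when the body ends falls through to exit.
        for open_block in self._build(body, [self.entry]):
            self._link(open_block, self.exit)
        self.blocks.append(self.exit)

    def _new(self, label: str) -> Block:
        block = Block(label)
        self.blocks.append(block)
        return block

    @staticmethod
    def _link(src: Block, dst: Block) -> None:
        if dst not in src.successors:
            src.successors.append(dst)

    def _build(self, body: Sequence[Stmt], frontier: List[Block]) -> List[Block]:
        """Add blocks for `body`; return the blocks whose successor is still open."""
        current = None
        for stmt in body:
            if current is None:
                current = self._new(f"b{len(self.blocks)}")
                for open_block in frontier:
                    self._link(open_block, current)
            if isinstance(stmt, str):
                current.statements.append(stmt)
                frontier = [current]
            else:
                _, condition, then_body, else_body = stmt
                current.statements.append(f"if ({condition})")
                frontier = (self._build(then_body, [current])
                            + self._build(else_body, [current]))
                current = None  # a branch ends the current basic block
        return frontier


straight = CFG(["a = 1", "b = 2"])
branch = CFG([("if", "x", ["doA()"], ["doB()"])])
```

The straight-line function yields three blocks (entry, body, exit) and the if/else function yields five (entry, condition, two branches, exit). An if with an empty else leaves the condition block itself open, so it links straight to whatever follows.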

Phase 4: First Detector - Unchecked Low-Level Calls

Goal: Implement a simple but useful detector.

Tasks:

  1. Find all call, delegatecall, staticcall, send expressions
  2. Check if the return value is captured
  3. Check if the captured value is used in a require/assert/if
  4. Report unchecked calls

Validation:

// Should flag:
payable(addr).send(amount);
addr.call{value: amount}("");

// Should NOT flag:
(bool success, ) = addr.call{value: amount}("");
require(success);

Hints if stuck:

  • Look for MemberAccess nodes with memberName of call, send, etc.
  • Check the enclosing statement: is the call a bare ExpressionStatement, or is its result captured by an Assignment or a VariableDeclarationStatement?
  • If the result is captured, trace the variable to see if it's checked

Phase 5: Reentrancy Detection

Goal: Detect classic reentrancy vulnerabilities.

Tasks:

  1. Identify external calls in each function
  2. Track state variable writes
  3. Determine if writes happen after calls (using CFG)
  4. Recognize reentrancy guards (mutex patterns)
  5. Handle modifiers that provide protection

Validation:

// Should flag:
function withdraw() public {
    msg.sender.call{value: balances[msg.sender]}("");
    balances[msg.sender] = 0;
}

// Should NOT flag:
function withdraw() public {
    uint amount = balances[msg.sender];
    balances[msg.sender] = 0;
    msg.sender.call{value: amount}("");
}

Hints if stuck:

  • Use the CFG to determine order of operations
  • External calls are: .call(), .send(), .transfer(), and calls to external contracts
  • State writes are assignments to state variables (look at the variable declaration's stateVariable field)
  • Common guard pattern: require(!locked); locked = true; ...; locked = false;

Phase 6: Integer Overflow Detection

Goal: Detect arithmetic operations that might overflow.

Tasks:

  1. Check Solidity version for default behavior
  2. Find unchecked blocks
  3. Identify arithmetic operations within unchecked contexts
  4. Recognize SafeMath usage (for older contracts)
  5. Track user-controlled inputs that flow into arithmetic

Validation:

// Should flag (if <0.8.0 or in unchecked):
balances[msg.sender] -= amount;

// Should NOT flag (0.8.0+ outside unchecked):
balances[msg.sender] -= amount;

Hints if stuck:

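  • The version constraint is in the AST's PragmaDirective node (its literals field); the exact compiler version used comes from solc itself
  • unchecked blocks have node type UncheckedBlock
  • Look for BinaryOperation with operators +, -, *, /, %, ** and for Assignment nodes with compound operators such as +=, -=, *=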
  • For SafeMath, check if arithmetic is wrapped in a function call like add, sub

Phase 7: Access Control Detection

Goal: Identify functions that modify sensitive state without authorization.

Tasks:

  1. Identify sensitive operations (owner changes, minting, pausing, upgrading, selfdestruct)
  2. Check for authorization modifiers
  3. Trace modifier implementations to verify they check
  4. Flag unprotected sensitive functions

Validation:

// Should flag:
function setOwner(address newOwner) public {
    owner = newOwner;
}

// Should NOT flag:
function setOwner(address newOwner) public onlyOwner {
    owner = newOwner;
}

Hints if stuck:

  • Sensitive patterns: owner = , _mint(, selfdestruct(, pause(
  • Modifiers are applied in the function's modifiers list
  • Trace into the modifier body to find require(msg.sender == owner) or similar
  • OpenZeppelin's Ownable uses the onlyOwner modifier

Phase 8: False Positive Reduction

Goal: Reduce noise by understanding context.

Tasks:

  1. Implement pattern recognition for safe patterns
  2. Track modifier effects (e.g., nonReentrant)
  3. Recognize standard library usage (OpenZeppelin)
  4. Implement confidence levels for findings

Validation:

// Should NOT flag (OpenZeppelin ReentrancyGuard):
function withdraw() public nonReentrant {
    msg.sender.call{value: balances[msg.sender]}("");
    balances[msg.sender] = 0;
}

Hints if stuck:

  • Build a list of known safe patterns
  • Check for imports from "@openzeppelin/"
  • Look for mutex variables (_status, locked)
  • If uncertain, report with lower confidence rather than suppressing

Phase 9: Reporting and Output Formats

Goal: Generate actionable reports.

Tasks:

  1. Implement text reporter with source context
  2. Implement JSON reporter for tooling
  3. Implement SARIF reporter for IDE integration
  4. Add severity classification
  5. Include remediation recommendations

Validation:

$ solscan analyze contract.sol
[CRITICAL] Reentrancy vulnerability
  Location: contract.sol:45:5
  Function: Vault.withdraw()

    44 |   function withdraw() public {
    45 |     msg.sender.call{value: balances[msg.sender]}("");
       |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    46 |     balances[msg.sender] = 0;

  Description: External call occurs before state update, allowing reentrancy.
  Recommendation: Update state before external call (Checks-Effects-Interactions pattern).
  References:
    - https://swcregistry.io/docs/SWC-107

Hints if stuck:

  • SARIF spec: https://sarifweb.azurewebsites.net/
  • Use the source location to extract context lines
  • Include links to SWC (Smart Contract Weakness Classification)
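A minimal SARIF 2.1.0 document needs only a version and a list of runs, each with a tool description and results. The sketch below uses plain dicts with hypothetical keys (detector, severity, description, file, line, column) in place of the Finding class. The mapping from five severities onto SARIF's levels (error, warning, note) is a choice made for this example.

```python
import json
from typing import Any, Dict, List

# SARIF result levels are "error", "warning", "note" and "none"; how the
# scanner's severities map onto them is a judgment call.
SARIF_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "informational": "note",
}


def to_sarif(findings: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["detector"],
            "level": SARIF_LEVEL[finding["severity"]],
            "message": {"text": finding["description"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file"]},
                    "region": {
                        "startLine": finding["line"],
                        "startColumn": finding["column"],
                    },
                },
            }],
        })
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "solscan"}}, "results": results}],
    }


EXAMPLE_FINDING = {
    "detector": "reentrancy",
    "severity": "critical",
    "description": "External call occurs before state update, allowing reentrancy.",
    "file": "contract.sol",
    "line": 45,
    "column": 5,
}
REPORT = to_sarif([EXAMPLE_FINDING])
REPORT_JSON = json.dumps(REPORT, indent=2)
```

A fuller reporter would also describe each detector under tool.driver.rules so that viewers can show help text and SWC links next to the result.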

Phase 10: Performance and Scalability

Goal: Handle large codebases efficiently.

Tasks:

  1. Cache AST parsing results
  2. Parallelize detector execution
  3. Implement incremental analysis (only re-analyze changed files)
  4. Add timeout protection for complex analysis
  5. Benchmark on real-world contracts

Validation:

$ time solscan analyze --recursive ./large-project/
# Should complete in under 60 seconds for 50,000 lines

Hints if stuck:

  • Use multiprocessing for parallel detector execution
  • Cache CFGs as they're expensive to build
  • Use file hashes to detect changes for incremental analysis
  • Set recursion limits for deeply nested structures
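The file-hash hint can be sketched in a few lines. To stay self-contained, the function below takes file contents as an in-memory mapping instead of reading from disk, and keeps the cache as a plain dict; files_to_reanalyze is an illustrative name, and persisting the cache between runs is left out.

```python
import hashlib
from typing import Dict, List


def content_hash(data: bytes) -> str:
    """Stable fingerprint of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def files_to_reanalyze(current: Dict[str, bytes], cache: Dict[str, str]) -> List[str]:
    """Return paths whose contents changed since the last run, updating the cache."""
    changed = []
    for path, data in sorted(current.items()):
        digest = content_hash(data)
        if cache.get(path) != digest:
            changed.append(path)
            cache[path] = digest
    return changed


cache: Dict[str, str] = {}
project = {
    "contracts/Vault.sol": b"contract Vault {}",
    "contracts/Token.sol": b"contract Token {}",
}

first_run = files_to_reanalyze(project, cache)    # nothing cached yet
second_run = files_to_reanalyze(project, cache)   # nothing changed

project["contracts/Vault.sol"] = b"contract Vault { uint256 x; }"
third_run = files_to_reanalyze(project, cache)    # only the edited file
```

Hashing only a file's own contents is not enough once imports are involved: if Token.sol changes, every file that imports it should be re-analyzed as well, so a real implementation would fold the hashes of dependencies into each file's key.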

Testing Strategy

Unit Tests

def test_ast_parsing_simple_contract():
    source = """
    contract Test {
        function foo() public pure returns (uint) {
            return 42;
        }
    }
    """
    ast = parse_solidity_source(source)
    contracts = ast.get_children_of_type(NodeType.CONTRACT)
    assert len(contracts) == 1
    assert contracts[0].name == "Test"


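def test_cfg_if_statement():
    # Wrapped in a contract: a free function cannot be declared public,
    # and doA/doB must exist for the source to compile.
    source = """
    contract Test {
        function doA() internal {}
        function doB() internal {}

        function test(bool x) public {
            if (x) {
                doA();
            } else {
                doB();
            }
        }
    }
    """
    cfg = build_cfg_from_source(source, "test")
    # Entry -> Condition -> (True: doA | False: doB) -> Exit
    assert len(cfg.blocks) == 5
    assert len(cfg.entry.successors) == 1  # Goes to condition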


def test_reentrancy_detector_positive():
    source = """
    contract Vulnerable {
        mapping(address => uint) balances;

        function withdraw() public {
            msg.sender.call{value: balances[msg.sender]}("");
            balances[msg.sender] = 0;
        }
    }
    """
    findings = run_detector(ReentrancyDetector(), source)
    assert len(findings) == 1
    assert findings[0].severity == Severity.CRITICAL


def test_reentrancy_detector_negative():
    source = """
    contract Safe {
        mapping(address => uint) balances;

        function withdraw() public {
            uint amount = balances[msg.sender];
            balances[msg.sender] = 0;
            msg.sender.call{value: amount}("");
        }
    }
    """
    findings = run_detector(ReentrancyDetector(), source)
    assert len(findings) == 0

Integration Tests (Real Vulnerabilities)

def test_the_dao_reentrancy():
    """Test detection on The DAO's actual vulnerable pattern."""
    source = load_contract("benchmarks/the_dao.sol")
    findings = analyze(source)
    reentrancy_findings = [f for f in findings if "reentrancy" in f.detector_name.lower()]
    assert len(reentrancy_findings) >= 1


def test_parity_multisig_access_control():
    """Test detection on Parity multisig vulnerability."""
    source = load_contract("benchmarks/parity_multisig.sol")
    findings = analyze(source)
    access_findings = [f for f in findings if "access" in f.detector_name.lower()]
    assert len(access_findings) >= 1

False Positive Tests

def test_no_false_positive_on_safe_patterns():
    """Verify we don't flag obviously safe code."""
    source = """
    contract Safe {
        mapping(address => uint) balances;
        bool locked;

        modifier nonReentrant() {
            require(!locked);
            locked = true;
            _;
            locked = false;
        }

        function withdraw() public nonReentrant {
            msg.sender.call{value: balances[msg.sender]}("");
            balances[msg.sender] = 0;
        }
    }
    """
    findings = analyze(source)
    critical_findings = [f for f in findings if f.severity == Severity.CRITICAL]
    assert len(critical_findings) == 0

Benchmark Tests

import time

def test_performance_large_contract():
    """Ensure analysis completes in reasonable time."""
    source = load_contract("benchmarks/large_defi_protocol.sol")
    start = time.perf_counter()  # Monotonic clock, unaffected by system clock changes
    findings = analyze(source)
    elapsed = time.perf_counter() - start
    assert elapsed < 30.0  # 30 seconds max

Common Pitfalls and Debugging

Pitfall 1: AST Version Differences

Problem: Solidity AST format changes between compiler versions.

Symptom: Parser crashes or misses nodes on certain contracts.

Solution:

from typing import List

def get_node_type(node: dict) -> str:
    # Compact AST stores the type under "nodeType"; the legacy AST used "name"
    return node.get("nodeType") or node.get("name")

def get_children(node: dict) -> List[dict]:
    # Different versions use different child keys
    children = []
    for key in ["nodes", "children", "statements", "body", "expression"]:
        if key in node:
            child = node[key]
            if isinstance(child, list):
                children.extend(child)
            elif isinstance(child, dict):
                # Skip None and scalar values; only nested nodes are children
                children.append(child)
    return children
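
A fixed key list like the one above still misses children stored under other fields (in the compact JSON AST, for example, `condition`, `trueBody`, `falseBody`, `arguments`, `leftHandSide`, `rightHandSide`). One possible alternative, sketched here under the assumption that every AST node is a dict carrying a `nodeType` key, is to walk every nested value generically. The name `iter_nodes` and the sample fragment are illustrative, not part of any library.

```python
from typing import Any, Dict, Iterator


def iter_nodes(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict that has a "nodeType" key, depth-first, pre-order."""
    if isinstance(node, dict):
        if "nodeType" in node:
            yield node
        # Recurse into every value so no child key has to be known in advance
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


# Hand-written fragment shaped like a compact-AST IfStatement (illustrative only)
sample = {
    "nodeType": "IfStatement",
    "condition": {"nodeType": "Identifier", "name": "flag"},
    "trueBody": {
        "nodeType": "Block",
        "statements": [
            {
                "nodeType": "ExpressionStatement",
                "expression": {"nodeType": "FunctionCall", "arguments": []},
            }
        ],
    },
    "falseBody": None,
}

node_types = [n["nodeType"] for n in iter_nodes(sample)]
```

The trade-off is that this visits metadata dicts as well, and very deep trees could hit Python's recursion limit, so an explicit stack may be preferable for large contracts.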

Pitfall 2: Inheritance Resolution

Problem: Functions in base contracts are not analyzed.

Symptom: Missing vulnerabilities in inherited functions.

Solution:

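from typing import List, Optional, Set

def get_all_functions(contract: ContractNode, seen: Optional[Set[str]] = None) -> List[FunctionNode]:
    # Track visited bases so a contract shared by several parents
    # (diamond inheritance) is collected once, not once per path
    seen = set() if seen is None else seen
    functions = list(contract.functions)
    for base_name in contract.base_contracts:
        if base_name in seen:
            continue
        seen.add(base_name)
        base = resolve_contract(base_name)
        if base:
            functions.extend(get_all_functions(base, seen))
    return functions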
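
When the raw compiler JSON AST is available, another route is to rely on the `linearizedBaseContracts` field that solc attaches to each `ContractDefinition` (a list of contract IDs, most derived first). The sketch below assumes that layout and makes a simplifying assumption of its own: functions are matched by name only, so overloads and special functions such as constructors are not handled. `effective_functions` and `contracts_by_id` are hypothetical names.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def effective_functions(contract: Node, contracts_by_id: Dict[int, Node]) -> List[Node]:
    """Collect functions in linearization order; the most derived definition wins."""
    seen_names = set()
    result = []
    for contract_id in contract["linearizedBaseContracts"]:
        definition = contracts_by_id[contract_id]
        for node in definition.get("nodes", []):
            if node.get("nodeType") != "FunctionDefinition":
                continue
            name = node.get("name", "")
            if name in seen_names:
                continue  # Already provided by a more derived contract (overridden)
            seen_names.add(name)
            result.append(node)
    return result


# Illustrative data: Child (id 2) overrides withdraw() from Base (id 1)
base = {
    "nodeType": "ContractDefinition", "id": 1, "name": "Base",
    "linearizedBaseContracts": [1],
    "nodes": [
        {"nodeType": "FunctionDefinition", "name": "withdraw", "scope": 1},
        {"nodeType": "FunctionDefinition", "name": "pause", "scope": 1},
    ],
}
child = {
    "nodeType": "ContractDefinition", "id": 2, "name": "Child",
    "linearizedBaseContracts": [2, 1],
    "nodes": [{"nodeType": "FunctionDefinition", "name": "withdraw", "scope": 2}],
}
resolved = effective_functions(child, {1: base, 2: child})
summary = [(f["name"], f["scope"]) for f in resolved]
```

Here `summary` lists `withdraw` from `Child` and `pause` from `Base`, which is closer to what actually executes than a flat union of all inherited functions.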

Pitfall 3: Modifier Analysis

Problem: Treating modifiers as opaque, missing their effects.

Symptom: False positives when modifiers provide protection.

Solution:

def modifier_has_requirement(modifier: ModifierNode) -> bool:
    """Check if modifier contains a require/assert/revert."""
    for node in modifier.body.get_all_descendants():
        if node.node_type in [NodeType.REQUIRE, NodeType.ASSERT, NodeType.REVERT]:
            return True
    return False

def modifier_is_reentrancy_guard(modifier: ModifierNode) -> bool:
    """Check for mutex pattern in modifier."""
    # Look for: locked = true; _; locked = false;
    ...

Pitfall 4: External vs Internal Calls

Problem: Flagging internal function calls as external calls.

Symptom: False positives for reentrancy on internal calls.

Solution:

def is_external_call(call: FunctionCallNode) -> bool:
    # Check if callee is external
    if call.expression.node_type == NodeType.MEMBER_ACCESS:
        member = call.expression
        if member.member_name in ["call", "delegatecall", "staticcall", "send", "transfer"]:
            return True
        # Check if target is an external contract
        target_type = get_type(member.expression)
        if is_external_contract_type(target_type):
            return True
    return False
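
A related detail when working on raw compiler output: in recent compiler versions, a call written with options, such as `msg.sender.call{value: amount}("")`, appears in the compact AST as a `FunctionCall` whose callee is a `FunctionCallOptions` node wrapping the `MemberAccess`. A check that only looks one level down will miss it. The sketch below assumes that node layout and operates on plain dicts; `callee_member` and `calls_external_member` are hypothetical helper names.

```python
from typing import Any, Dict, Optional

Node = Dict[str, Any]

# Same member names as the check above; note that "transfer" also matches
# token-style calls such as token.transfer(...), which are external as well.
EXTERNAL_MEMBERS = {"call", "delegatecall", "staticcall", "send", "transfer"}


def callee_member(call: Node) -> Optional[Node]:
    """Return the MemberAccess being called, unwrapping any `{...}` call options."""
    callee = call.get("expression") or {}
    while callee.get("nodeType") == "FunctionCallOptions":
        callee = callee.get("expression") or {}
    return callee if callee.get("nodeType") == "MemberAccess" else None


def calls_external_member(call: Node) -> bool:
    member = callee_member(call)
    return member is not None and member.get("memberName") in EXTERNAL_MEMBERS


# Illustrative node for: msg.sender.call{value: amount}("")
call_with_value = {
    "nodeType": "FunctionCall",
    "expression": {
        "nodeType": "FunctionCallOptions",
        "names": ["value"],
        "expression": {
            "nodeType": "MemberAccess",
            "memberName": "call",
            "expression": {"nodeType": "MemberAccess", "memberName": "sender"},
        },
    },
}
# Illustrative node for an internal call: _updateBalance()
internal_call = {
    "nodeType": "FunctionCall",
    "expression": {"nodeType": "Identifier", "name": "_updateBalance"},
}
```

`calls_external_member(call_with_value)` is true while `calls_external_member(internal_call)` is false. Calls on contract-typed variables would still need the type lookup shown above.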

Pitfall 5: State Variable vs Local Variable

Problem: Confusing local variables with state variables.

Symptom: False positives for state modifications.

Solution:

def is_state_variable(var: VariableNode) -> bool:
    # State variables are declared at contract level
    parent = var.get_ancestor_of_type(NodeType.CONTRACT)
    if parent and var in parent.state_variables:
        return True
    # Check the stateVariable field in newer AST
    return getattr(var, "stateVariable", False)

Pitfall 6: Path Explosion in CFG Analysis

Problem: Exponential paths through complex functions.

Symptom: Analysis hangs or runs out of memory.

Solution:

def analyze_paths(cfg: ControlFlowGraph, max_paths: int = 1000) -> List[Path]:
    paths = []
    worklist = [(cfg.entry, [])]

    while worklist and len(paths) < max_paths:
        block, path = worklist.pop()
        new_path = path + [block]

        if block == cfg.exit:
            paths.append(new_path)
        else:
            for succ in block.successors:
                if succ not in new_path:  # Skip blocks already on this path, including self-loops
                    worklist.append((succ, new_path))

    return paths
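
Capping the number of paths bounds the work but silently drops coverage. For properties that only need a per-block fact, a worklist fixpoint is a common alternative: it visits each block a bounded number of times regardless of how many paths exist. The sketch below is a minimal illustration under stated assumptions, not the project's CFG API: `Block` is a stand-in class, block names are unique, and each block holds either the external call or the state write (ordering inside a single block would need a separate check).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    name: str
    has_external_call: bool = False
    writes_state: bool = False
    successors: List["Block"] = field(default_factory=list)


def state_writes_after_call(entry: Block) -> List[str]:
    """Names of blocks that write state and are reachable after an external call."""
    # Fact per block: "an external call MAY have happened before entering it"
    call_seen_on_entry: Dict[str, bool] = {entry.name: False}
    blocks: Dict[str, Block] = {entry.name: entry}
    worklist = deque([entry])
    while worklist:
        block = worklist.popleft()
        fact_out = call_seen_on_entry[block.name] or block.has_external_call
        for succ in block.successors:
            blocks[succ.name] = succ
            old = call_seen_on_entry.get(succ.name)
            new = fact_out if old is None else (old or fact_out)
            if new != old:  # Facts only move from False to True, so this terminates
                call_seen_on_entry[succ.name] = new
                worklist.append(succ)
    return sorted(
        name for name, seen in call_seen_on_entry.items()
        if seen and blocks[name].writes_state
    )


# Illustrative CFG with a loop: entry -> head -> (body -> update -> head) | exit
entry = Block("entry")
head = Block("head")
body = Block("body", has_external_call=True)
update = Block("update", writes_state=True)
exit_block = Block("exit", writes_state=True)
entry.successors = [head]
head.successors = [body, exit_block]
body.successors = [update]
update.successors = [head]

flagged = state_writes_after_call(entry)
```

For this graph `flagged` contains both `exit` and `update`: the write in `exit` is only reachable after a call by going around the loop, a case that simple path enumeration without revisiting blocks does not cover.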

Extensions and Challenges

Challenge 1: Symbolic Execution

Implement symbolic execution to explore all possible input values:

  • Track symbolic values through operations
  • Build path constraints
  • Use Z3 SMT solver to find satisfying inputs
  • Detect more subtle vulnerabilities (e.g., integer overflow with specific inputs)

Challenge 2: Cross-Contract Analysis

Extend analysis to understand interactions between contracts:

  • Build inter-contract call graphs
  • Track state shared across contracts
  • Detect cross-contract reentrancy
  • Analyze upgradeable proxy patterns

Challenge 3: Machine Learning Classification

Train a model to classify vulnerability likelihood:

  • Extract features from AST nodes
  • Train on labeled vulnerability dataset
  • Combine with static analysis for confidence scoring
  • Reduce false positives through learned patterns

Challenge 4: Formal Verification Integration

Connect to formal verification tools:

  • Generate verification conditions
  • Interface with Certora, KEVM, or other provers
  • Provide counterexamples when verification fails
  • Combine lightweight analysis with heavyweight verification

Challenge 5: Real-Time Analysis

Build a Language Server Protocol (LSP) implementation:

  • Analyze as developers type
  • Show inline vulnerability warnings
  • Suggest automated fixes
  • Integrate with VS Code, IntelliJ, etc.

Challenge 6: Bytecode Analysis

Analyze compiled EVM bytecode directly:

  • Decompile bytecode to recover structure
  • Analyze contracts without source code
  • Detect malicious contract patterns
  • Compare deployed bytecode to audited source
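
As a starting point for this challenge, the sketch below shows a minimal linear-sweep disassembler. The key detail is that PUSH1 through PUSH32 (opcodes 0x60 to 0x7f) carry immediate data bytes that must be skipped, otherwise data is misread as instructions. Only a handful of opcodes relevant to security checks are named here; the table is deliberately incomplete, and `disassemble` is an illustrative name.

```python
from typing import Iterator, Tuple

# Small subset of EVM opcodes; anything else is reported as UNKNOWN
OPCODES = {
    0x00: "STOP", 0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0x5F: "PUSH0", 0xF1: "CALL", 0xF2: "CALLCODE",
    0xF3: "RETURN", 0xF4: "DELEGATECALL", 0xFA: "STATICCALL",
    0xFD: "REVERT", 0xFF: "SELFDESTRUCT",
}


def disassemble(code: bytes) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (offset, mnemonic, immediate_data) for each instruction."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            size = op - 0x5F  # PUSH1 carries 1 byte, PUSH32 carries 32
            yield pc, f"PUSH{size}", code[pc + 1:pc + 1 + size]
            pc += 1 + size
        else:
            yield pc, OPCODES.get(op, f"UNKNOWN_0x{op:02x}"), b""
            pc += 1


# PUSH1 01, PUSH1 00, SSTORE, PUSH2 f4f4, STOP
# The two 0xf4 bytes are PUSH2 data, not DELEGATECALL instructions.
code = bytes.fromhex("6001600055" "61f4f4" "00")
instructions = list(disassemble(code))
mnemonics = [name for _, name, _ in instructions]
uses_delegatecall = "DELEGATECALL" in mnemonics
```

A naive byte search for `0xf4` would report a delegatecall in this snippet, while the disassembler does not. Real deployed bytecode may also end with non-code metadata, which a linear sweep will decode as meaningless instructions.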

Real-World Connections

Professional Security Auditing

Security audit firms like Trail of Bits, OpenZeppelin, and Consensys Diligence use static analysis tools as the first step in their audit process. Tools flag potential issues, then auditors investigate manually.

Bug Bounty Hunting

Bug bounty hunters use automated tools to scan new contracts for known vulnerabilities. Finding a critical bug in a major DeFi protocol can earn $1M+ in bounty rewards.

CI/CD Integration

DeFi teams integrate static analysis into their deployment pipelines. No contract deploys without passing security checks. Tools like Slither and Mythril are commonly used.
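
In a pipeline, the gate usually comes down to the scanner's exit code. The sketch below shows one way a scanner could map findings to an exit status. The `Severity` ordering and the `exit_code_for` helper are assumptions made for illustration; the threshold would normally come from a command-line flag or config file.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def exit_code_for(severities: Iterable[Severity], fail_on: Severity = Severity.HIGH) -> int:
    """Return 1 (fail the build) if any finding meets or exceeds the threshold."""
    return 1 if any(severity >= fail_on for severity in severities) else 0


# Illustrative runs
blocked = exit_code_for([Severity.LOW, Severity.CRITICAL])
allowed = exit_code_for([Severity.LOW, Severity.MEDIUM])
strict = exit_code_for([Severity.LOW], fail_on=Severity.LOW)
```

A command-line entry point would pass this value to `sys.exit`, so the CI job fails whenever a finding at or above the chosen severity is present.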

Insurance and Risk Assessment

DeFi insurance protocols use automated analysis to assess contract risk. Higher-risk contracts pay higher premiums or are denied coverage.

Existing Tools in This Space

Your project will be similar to these production tools:

  • Slither (Trail of Bits): Python-based, uses AST analysis
  • Mythril (ConsenSys): Symbolic execution on bytecode
  • Securify (ETH Zurich): Pattern-based analysis
  • Solhint: Linting and style checking
  • Echidna: Fuzzing and property testing

Understanding how these tools work will help you build a better one.


Resources

Primary References

  1. “Mastering Ethereum” Chapter 9: Smart Contract Security - Essential security concepts
  2. SWC Registry: swcregistry.io - Standardized vulnerability classification
  3. Solidity Documentation: docs.soliditylang.org - AST and compiler details
  4. “Principles of Program Analysis” by Nielson et al. - Formal dataflow analysis

Security Learning Resources

  1. Damn Vulnerable DeFi: damnvulnerabledefi.xyz - CTF-style learning
  2. Ethernaut: ethernaut.openzeppelin.com - Smart contract hacking challenges
  3. Capture the Ether: capturetheether.com - More hacking challenges
  4. Secureum Bootcamp: In-depth security training

Code References

  1. Slither: github.com/crytic/slither - Reference implementation
  2. solc-typed-ast: github.com/ConsenSys/solc-typed-ast - TypeScript AST library
  3. py-solc-x: github.com/iamdefinitelyahuman/py-solc-x - Python Solidity compiler wrapper

Academic Papers

  1. “Making Smart Contracts Smarter” (Luu et al.) - Oyente, first smart contract analyzer
  2. “Securify: Practical Security Analysis of Smart Contracts” (Tsankov et al.)
  3. “Slither: A Static Analysis Framework for Smart Contracts” (Feist et al.)

Self-Assessment Checklist

Before moving to the next project, verify:

  • I can explain how Solidity compiles to AST and why AST analysis is useful
  • I understand control flow graphs and can manually construct one for a simple function
  • I can describe at least 5 common smart contract vulnerabilities and their root causes
  • My scanner correctly detects reentrancy in classic vulnerable patterns
  • My scanner does NOT flag safe patterns (checks-effects-interactions, reentrancy guards)
  • I understand why false positives are costly and how to reduce them
  • I can explain the difference between static analysis, symbolic execution, and fuzzing
  • My tool produces actionable reports with source context and remediation advice
  • I have tested on real vulnerable contracts and achieved acceptable accuracy
  • I understand the limitations of static analysis and when manual review is necessary

What’s Next?

With your security scanner complete, you now have deep insight into smart contract vulnerabilities and how to detect them automatically. In Project 15: Decentralized Storage Client, you’ll explore a different aspect of Web3 infrastructure - building an IPFS-like content-addressed storage system with DHT-based peer discovery and BitTorrent-style file transfer.

The security mindset you’ve developed will carry forward: for every protocol you build from here on, you’ll be asking “how could this be exploited?”