P14: Smart Contract Security Scanner

Project Overview

Attribute	Value
Main Language	Python
Alternative Languages	Rust, TypeScript
Difficulty	Expert
Coolness Level	Level 5: Production-Ready Security Tool
Business Potential	High (Security Auditing Services, SaaS Tool)
Knowledge Area	Security / Auditing
Main Book	“Mastering Ethereum” by Andreas M. Antonopoulos & Gavin Wood

Learning Objectives

By completing this project, you will:

Master Solidity AST parsing: understand how compilers represent smart contract code as abstract syntax trees and how to traverse them programmatically
Implement control flow analysis: trace all possible execution paths through a contract to detect state-dependent vulnerabilities
Build pattern-matching engines that detect known vulnerability signatures while understanding why each pattern is dangerous
Develop false positive reduction strategies: use dataflow analysis, symbolic constraints, and heuristics to distinguish real bugs from benign patterns
Understand smart contract security deeply: from reentrancy and integer overflows to access control and oracle manipulation

Deep Theoretical Foundation

The $3 Billion Problem

Smart contract vulnerabilities have resulted in billions of dollars in losses:

Incident	Year	Loss	Vulnerability
The DAO	2016	$60M	Reentrancy
Parity Multisig	2017	$30M	Access Control
Parity Wallet Library	2017	$280M	Delegatecall + Access Control
bZx Flash Loan	2020	$8M	Oracle Manipulation
Cream Finance	2021	$130M	Flash Loan + Price Manipulation
Ronin Bridge	2022	$625M	Private Key Compromise
Euler Finance	2023	$197M	Donation Attack

Unlike traditional software where bugs cause crashes, smart contract bugs cause permanent financial loss. There’s no “undo” button, no customer support to call, no rollback. Code is law, and flawed law can be exploited.

Why Static Analysis Matters

Manual auditing cannot scale. A skilled auditor might review 200-500 lines of Solidity per hour. Complex DeFi protocols contain tens of thousands of lines. At the same time, new contracts are deployed every minute.

Static analysis tools serve as the first line of defense:

Speed: Analyze thousands of contracts per hour
Consistency: Never miss a known pattern due to fatigue
Coverage: Check every function, every path, every state
Documentation: Generate reports that guide manual review

But static analysis has fundamental limitations:

False positives: Flagging safe code as vulnerable
False negatives: Missing actual vulnerabilities
Semantic blindness: Not understanding business logic
Undecidability: Like the halting problem, most runtime properties cannot be decided for every program, so any analyzer must approximate

The art of building a security scanner is balancing coverage (catching real bugs) against precision (not crying wolf).

Understanding the Solidity Compilation Pipeline

Source Code (.sol)
        |
        v
    [Lexer/Parser]
        |
        v
Abstract Syntax Tree (AST)
        |
        v
   [Type Checker]
        |
        v
  Annotated AST
        |
        v
   [IR Generator]
        |
        v
 Yul Intermediate Representation
        |
        v
   [Optimizer]
        |
        v
   [Code Generator]
        |
        v
    EVM Bytecode

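For static analysis, we primarily work with the AST and sometimes the Yul IR. The annotated AST preserves the full structure of the source code along with type information and resolved references. Note that the Yul stage applies only when compiling through the IR pipeline (--via-ir); the legacy code generator emits bytecode directly from the annotated AST.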

The Solidity AST Structure

When you compile Solidity with --ast-compact-json, you get a tree structure:

{
  "nodeType": "ContractDefinition",
  "name": "Vulnerable",
  "nodes": [
    {
      "nodeType": "FunctionDefinition",
      "name": "withdraw",
      "body": {
        "nodeType": "Block",
        "statements": [
          {
            "nodeType": "ExpressionStatement",
            "expression": {
              "nodeType": "FunctionCall",
              "expression": {
                "nodeType": "MemberAccess",
                "memberName": "call"
              }
            }
          }
        ]
      }
    }
  ]
}

Each node has:

nodeType: The grammatical category (ContractDefinition, FunctionDefinition, Assignment, etc.)
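src: Source location as start:length:fileIndex (byte offset, byte length, and index of the source file)
id: Unique identifier for cross-references
Type-specific child fields: Child nodes live under keys such as nodes, body, statements, and expression (the compact format has no generic children key)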
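The fields listed above can be explored with a few lines of Python. This is a minimal sketch, assuming the compact JSON AST has already been loaded into a dict and that src uses the start:length:fileIndex layout solc emits; parse_src and iter_nodes are hypothetical helper names, not part of any library.

```python
from typing import Any, Dict, Iterator, Tuple


def parse_src(src: str) -> Tuple[int, int, int]:
    """Split a 'start:length:fileIndex' string into three integers."""
    start, length, file_index = (int(part) for part in src.split(":"))
    return start, length, file_index


def iter_nodes(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every dict that carries a 'nodeType' key, depth first."""
    if isinstance(node, dict):
        if "nodeType" in node:
            yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


# Hand-written fragment shaped like compiler output (offsets are made up).
example = {
    "nodeType": "ContractDefinition",
    "src": "25:120:0",
    "nodes": [{"nodeType": "FunctionDefinition", "name": "withdraw", "src": "50:80:0"}],
}
node_types = [n["nodeType"] for n in iter_nodes(example)]
```

Scanning every field rather than a fixed list of keys avoids missing children that live under less common names such as condition or trueBody.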

Control Flow Graphs (CFG)

A Control Flow Graph represents all possible execution paths through a function:

              [Entry]
                 |
                 v
         [if (balance > 0)]
            /         \
           T           F
          /             \
         v               v
    [call.value()]    [return]
         |
         v
  [balance = 0]
         |
         v
      [Exit]

Nodes represent basic blocks (straight-line sequences of statements with one entry and one exit). Edges represent possible transfers of control between blocks (the branches of an if/else, loop back-edges, returns, and reverts).

CFGs enable:

Path analysis: What states are possible at each point?
Dominator analysis: What must execute before reaching a point?
Liveness analysis: What variables are “live” (used later)?
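Dominator analysis is the easiest of these to show in code. The sketch below uses the textbook iterative algorithm on a CFG given as a plain successor map; the block names mirror the diagram above, with the return block linked to the exit as a simplifying assumption. compute_dominators is an illustrative name.

```python
from typing import Dict, List, Set


def compute_dominators(successors: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iteratively compute the dominator set of every block in the map."""
    blocks = list(successors)
    predecessors: Dict[str, List[str]] = {b: [] for b in blocks}
    for block, succs in successors.items():
        for succ in succs:
            predecessors[succ].append(block)

    # Start from "every block dominates every block" and shrink to a fixed point.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for block in blocks:
            if block == entry:
                continue
            preds = predecessors[block]
            new = set.intersection(*(dom[p] for p in preds)) if preds else set()
            new = new | {block}
            if new != dom[block]:
                dom[block] = new
                changed = True
    return dom


cfg = {
    "entry": ["cond"],
    "cond": ["call", "return"],
    "call": ["write"],
    "write": ["exit"],
    "return": ["exit"],
    "exit": [],
}
dominators = compute_dominators(cfg, "entry")
```

Here the call block dominates the write block, which is exactly the fact a reentrancy detector needs: the balance update can only be reached through the external call.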

Dataflow Analysis

Dataflow analysis tracks how values flow through the program:

Reaching Definitions: Which assignments might reach a given use?

x = 1;           // Definition D1
if (condition) {
    x = 2;       // Definition D2
}
use(x);          // D1 and D2 both reach here
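The reaching-definitions example can be computed mechanically. The following is a small sketch of the standard fixed-point iteration, assuming each block is summarized as a list of (definition id, variable) pairs; the block names start, then, and use are made up for this illustration.

```python
from typing import Dict, List, Set, Tuple

# Each block lists the definitions it generates as (definition_id, variable).
Block = List[Tuple[str, str]]


def reaching_definitions(
    blocks: Dict[str, Block], successors: Dict[str, List[str]]
) -> Dict[str, Set[str]]:
    """Return, for each block, the definition ids that may reach its entry."""
    var_of = {d: v for defs in blocks.values() for d, v in defs}
    preds: Dict[str, List[str]] = {b: [] for b in blocks}
    for b, succs in successors.items():
        for s in succs:
            preds[s].append(b)

    def transfer(block: str, incoming: Set[str]) -> Set[str]:
        out = set(incoming)
        for def_id, var in blocks[block]:
            # A new definition kills earlier definitions of the same variable.
            out = {d for d in out if var_of[d] != var}
            out.add(def_id)
        return out

    reach_in: Dict[str, Set[str]] = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in: Set[str] = set()
            for p in preds[b]:
                new_in |= transfer(p, reach_in[p])
            if new_in != reach_in[b]:
                reach_in[b] = new_in
                changed = True
    return reach_in


blocks = {"start": [("D1", "x")], "then": [("D2", "x")], "use": []}
edges = {"start": ["then", "use"], "then": ["use"], "use": []}
result = reaching_definitions(blocks, edges)
```

The use block is reached by D1 along the fall-through edge and by D2 through the then block, matching the comment in the snippet above.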

Taint Analysis: Which values are influenced by user input?

function transfer(address to, uint amount) public {
    // 'to' and 'amount' are tainted (user-controlled)
    balances[msg.sender] -= amount;  // 'amount' flows into state
    balances[to] += amount;          // 'to' flows into state key
}

Available Expressions: What expressions have already been computed?

a = x + y;
b = x + y;  // 'x + y' is available, could be reused

The Vulnerability Taxonomy

Category 1: Reentrancy

The Bug: External calls transfer control to an untrusted contract, which can call back before state updates complete.

Classic Pattern:

function withdraw() public {
    uint amount = balances[msg.sender];
    (bool success, ) = msg.sender.call{value: amount}("");  // External call
    require(success);
    balances[msg.sender] = 0;  // State update AFTER call
}

The Attack:

contract Attacker {
    Vulnerable target;

    function attack() public payable {
        target.deposit{value: 1 ether}();
        target.withdraw();
    }

    receive() external payable {
        if (address(target).balance >= 1 ether) {
            target.withdraw();  // Re-enter before balance updated
        }
    }
}

Detection Pattern:

Find external calls: call, send, transfer, or calls to external contracts
Find state writes to the same storage slot
Check if state write happens AFTER external call
Account for the Checks-Effects-Interactions pattern
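The ordering check in this detection pattern can be sketched on a simplified CFG. The code below assumes each block has already been reduced to a list of ("call", target) and ("write", variable) operations; that summary format and the function name are illustrative assumptions. It over-approximates because it ignores which slot was read before the call and does not recognize guards.

```python
from typing import Dict, List, Set, Tuple

# An operation is ("call", target) or ("write", state_variable).
Op = Tuple[str, str]


def writes_after_external_call(
    block_ops: Dict[str, List[Op]], successors: Dict[str, List[str]]
) -> Set[str]:
    """Return state variables written on some path after an external call."""
    flagged: Set[str] = set()
    for block, ops in block_ops.items():
        for index, (kind, _) in enumerate(ops):
            if kind != "call":
                continue
            # Writes later in the same block...
            flagged |= {name for k, name in ops[index + 1:] if k == "write"}
            # ...and writes in any block reachable from this one.
            stack, seen = list(successors.get(block, [])), set()
            while stack:
                current = stack.pop()
                if current in seen:
                    continue
                seen.add(current)
                flagged |= {n for k, n in block_ops.get(current, []) if k == "write"}
                stack.extend(successors.get(current, []))
    return flagged


vulnerable = {"body": [("call", "msg.sender"), ("write", "balances")]}
safe = {"body": [("write", "balances"), ("call", "msg.sender")]}
no_edges: Dict[str, List[str]] = {"body": []}
```

A function that follows Checks-Effects-Interactions yields an empty set, which is how the last step of the pattern falls out of the same traversal.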

Variations:

Cross-function reentrancy: Call from function A re-enters function B
Cross-contract reentrancy: Via shared state in another contract
Read-only reentrancy: Re-enter to read stale state, not modify it

Category 2: Integer Overflow/Underflow

The Bug: Arithmetic operations wrap around instead of reverting.

Pre-Solidity 0.8.0 Pattern:

function transfer(address to, uint256 amount) public {
    balances[msg.sender] -= amount;  // Underflows if amount > balance
    balances[to] += amount;          // Overflows if result > 2^256-1
}

Post-0.8.0: Solidity automatically checks for overflow/underflow. But unchecked blocks re-enable the vulnerability:

unchecked {
    balances[msg.sender] -= amount;  // Vulnerable again
}

Detection Pattern:

Check compiler version (<0.8.0 is vulnerable by default)
Find unchecked blocks
Analyze arithmetic operations within unchecked contexts
Check for safe-math library usage
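A rough sketch of the second and third steps, working directly on compact AST dicts. It assumes the caller has already decided from the compiler version whether arithmetic is checked by default (the checked_by_default flag is a hypothetical parameter) and it does not try to recognize SafeMath wrappers.

```python
from typing import Any, Dict, List

ARITHMETIC_BINARY = {"+", "-", "*", "**"}
ARITHMETIC_ASSIGN = {"+=", "-=", "*="}
ARITHMETIC_UNARY = {"++", "--"}


def is_wrapping_candidate(node: Dict[str, Any]) -> bool:
    """True for AST nodes whose operator can wrap around."""
    kind, op = node.get("nodeType"), node.get("operator")
    return (
        (kind == "BinaryOperation" and op in ARITHMETIC_BINARY)
        or (kind == "Assignment" and op in ARITHMETIC_ASSIGN)
        or (kind == "UnaryOperation" and op in ARITHMETIC_UNARY)
    )


def find_unchecked_arithmetic(
    node: Any, checked_by_default: bool, in_unchecked: bool = False
) -> List[str]:
    """Collect 'src' strings of arithmetic that runs without overflow checks."""
    found: List[str] = []
    if isinstance(node, list):
        for item in node:
            found += find_unchecked_arithmetic(item, checked_by_default, in_unchecked)
    elif isinstance(node, dict):
        # Everything below an UncheckedBlock inherits the unchecked context.
        in_unchecked = in_unchecked or node.get("nodeType") == "UncheckedBlock"
        if is_wrapping_candidate(node) and (in_unchecked or not checked_by_default):
            found.append(node.get("src", "?"))
        for value in node.values():
            found += find_unchecked_arithmetic(value, checked_by_default, in_unchecked)
    return found
```

Note that a compound assignment such as balances[msg.sender] -= amount appears as an Assignment node with operator "-=", so a detector that only inspects BinaryOperation nodes would miss it.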

Category 3: Access Control

The Bug: Critical functions lack proper authorization checks.

Vulnerable Pattern:

function setOwner(address newOwner) public {
    owner = newOwner;  // Anyone can call!
}

function mint(address to, uint256 amount) public {
    _mint(to, amount);  // Anyone can mint!
}

Detection Pattern:

Identify sensitive operations: owner changes, minting, pausing, upgrading
Check for onlyOwner, require(msg.sender == ...), or role-based checks
Trace through modifiers to verify they contain actual checks
Flag functions that modify critical state without authorization
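One way to combine these steps is to summarize each function first and apply the rule to the summary. The sketch below is only an illustration: FunctionSummary, the list of sensitive variable names, and the list of authorization modifier names are all assumptions, and a real detector would trace modifier bodies rather than trust their names.

```python
from dataclasses import dataclass, field
from typing import List, Set

SENSITIVE_STATE = {"owner", "paused", "implementation"}  # assumed variable names
AUTH_MODIFIERS = {"onlyOwner", "onlyRole", "onlyAdmin"}  # assumed modifier names


@dataclass
class FunctionSummary:
    """Facts a detector would extract from one FunctionDefinition."""
    name: str
    visibility: str  # public, external, internal, private
    modifiers: List[str] = field(default_factory=list)
    state_writes: Set[str] = field(default_factory=set)
    checks_msg_sender: bool = False  # body compares msg.sender in a require/if


def is_unprotected(fn: FunctionSummary) -> bool:
    """True if an externally reachable function writes sensitive state unguarded."""
    reachable = fn.visibility in ("public", "external")
    touches_sensitive = bool(fn.state_writes & SENSITIVE_STATE)
    guarded = fn.checks_msg_sender or any(m in AUTH_MODIFIERS for m in fn.modifiers)
    return reachable and touches_sensitive and not guarded


set_owner = FunctionSummary("setOwner", "public", state_writes={"owner"})
guarded_set_owner = FunctionSummary("setOwner", "public", ["onlyOwner"], {"owner"})
```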

Variations:

Missing zero-address check: Setting owner to 0x0
Front-running: Someone monitors mempool and beats legitimate transaction
Centralization risk: Owner has too much power

Category 4: Unchecked External Calls

The Bug: Ignoring the boolean returned by call, delegatecall, staticcall, or send. (transfer reverts on failure instead of returning a value.)

Vulnerable Pattern:

function withdraw() public {
    payable(msg.sender).send(balance);  // Returns false on failure, not checked
    balance = 0;
}

The Risk: If send fails (e.g., recipient reverts or runs out of gas), the state still updates as if it succeeded.

Detection Pattern:

Find all low-level calls: call, delegatecall, staticcall, send
Check if return value is captured and checked
For transfer, it reverts automatically (safe but inflexible)
Flag unchecked calls as potential vulnerabilities
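The simplest case, a low-level call used as a bare statement, can be detected straight from the AST. This sketch assumes compact-JSON node shapes, including the FunctionCallOptions wrapper that the {value: ...} syntax introduces; the helper names are hypothetical. It matches on member names only, so a user-defined function that happens to be called send would also be reported.

```python
from typing import Any, Dict, Iterator, List, Optional

LOW_LEVEL = {"call", "delegatecall", "staticcall", "send"}


def low_level_member(call: Dict[str, Any]) -> Optional[str]:
    """Return the low-level member name a FunctionCall targets, if any."""
    callee = call.get("expression", {})
    # addr.call{value: v}("") wraps the MemberAccess in FunctionCallOptions.
    if callee.get("nodeType") == "FunctionCallOptions":
        callee = callee.get("expression", {})
    if callee.get("nodeType") == "MemberAccess" and callee.get("memberName") in LOW_LEVEL:
        return callee["memberName"]
    return None


def iter_statements(node: Any) -> Iterator[Dict[str, Any]]:
    """Yield every ExpressionStatement in the tree."""
    if isinstance(node, dict):
        if node.get("nodeType") == "ExpressionStatement":
            yield node
        for value in node.values():
            yield from iter_statements(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_statements(item)


def find_discarded_calls(tree: Any) -> List[str]:
    """Low-level calls used as bare statements, so their result is dropped."""
    hits: List[str] = []
    for stmt in iter_statements(tree):
        expr = stmt.get("expression", {})
        if expr.get("nodeType") == "FunctionCall":
            member = low_level_member(expr)
            if member:
                hits.append(member)
    return hits
```

Calls whose result is captured but never tested need the extra step of tracing the captured variable into a require, assert, or if condition.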

Category 5: Denial of Service

The Bug: Attackers can make the contract unusable.

Gas Limit Pattern:

function distributeRewards() public {
    for (uint i = 0; i < users.length; i++) {
        users[i].transfer(rewards[users[i]]);  // Unbounded loop
    }
}

If users.length grows large, the transaction exceeds the block gas limit.

Revert Pattern:

function claimFirst() public {
    (bool success, ) = winner.call{value: prize}("");
    require(success);  // If winner is a contract that reverts, no one can claim
}

Detection Pattern:

Find unbounded loops over user-controlled arrays
Identify external calls in loops
Check for pull-over-push payment patterns
Flag require on external call results

Category 6: Oracle Manipulation

The Bug: Relying on spot prices from DEXs that can be manipulated within a transaction.

Vulnerable Pattern:

function getPrice() public view returns (uint256) {
    return (reserve1 * 1e18) / reserve0;  // Spot price from AMM
}

function liquidate(address user) public {
    uint256 price = getPrice();
    if (getUserCollateralValue(user, price) < getUserDebt(user)) {
        // Liquidate...
    }
}

The Attack: Attacker uses a flash loan to manipulate the AMM reserves, changing the “price” within a single transaction.

Detection Pattern:

Identify price oracle calls
Check if using spot prices vs time-weighted average prices (TWAP)
Flag direct Uniswap/Curve reserve reads without safeguards
Check for flash loan protection

Category 7: Front-Running and MEV

The Bug: Transaction ordering can be exploited by block producers (miners before the Merge, validators and block builders since) or by searchers who pay for priority.

Sandwich Attack Pattern:

// User submits: swap 100 ETH for DAI with 1% slippage
// Attacker front-runs: buy DAI, pushing price up
// User's trade executes at worse price
// Attacker back-runs: sell DAI for profit

Detection Pattern:

Identify swaps with configurable slippage
Check for commit-reveal patterns
Flag time-sensitive operations without MEV protection
Analyze for flashbots/private mempool compatibility

Category 8: Signature Replay

The Bug: Signed messages can be reused if not properly invalidated.

Vulnerable Pattern:

function executeWithSignature(
    address to,
    uint256 amount,
    bytes memory signature
) public {
    bytes32 hash = keccak256(abi.encodePacked(to, amount));
    require(recoverSigner(hash, signature) == owner);
    // Execute... but signature can be replayed!
}

Detection Pattern:

Find signature verification (ecrecover, ECDSA.recover)
Check if nonce is included in signed message
Verify nonce is incremented after use
Check for chain ID to prevent cross-chain replay

Complete Project Specification

Functional Requirements

AST Parsing and Traversal
- Parse Solidity AST JSON output
- Build navigable tree structure
- Support contract inheritance resolution
- Handle multiple source files
Control Flow Analysis
- Build CFG for each function
- Identify basic blocks and edges
- Handle try/catch, if/else, loops
- Model reverts and returns
Vulnerability Detectors
- Reentrancy (classic, cross-function, read-only)
- Integer overflow/underflow (pre-0.8.0 and unchecked)
- Access control issues
- Unchecked external calls
- Denial of service patterns
- Timestamp dependence
- tx.origin authentication
- Uninitialized storage pointers
- Delegatecall to untrusted contracts
False Positive Reduction
- Track SafeMath usage
- Recognize reentrancy guards
- Identify checks-effects-interactions pattern
- Understand OpenZeppelin patterns
Reporting
- JSON output for tooling integration
- Human-readable reports with source context
- Severity classification (Critical, High, Medium, Low, Informational)
- SARIF format for IDE integration

Non-Functional Requirements

Performance: Analyze 10,000+ line contracts in under 30 seconds
Accuracy: Less than 10% false positive rate on benchmark suite
Extensibility: Plugin architecture for new detectors
Maintainability: Clear separation between parsing, analysis, and reporting

Command-Line Interface

# Basic scan
$ solscan analyze contract.sol
[CRITICAL] Reentrancy in Vault.withdraw() at line 45
[HIGH] Unchecked call return in Treasury.send() at line 78
[MEDIUM] Missing zero-address check in Ownable.setOwner() at line 12

# Detailed JSON output
$ solscan analyze --format json contract.sol > report.json

# Scan compiled AST
$ solscan analyze --ast-json artifacts/contract.json

# Scan with specific detectors
$ solscan analyze --detectors reentrancy,overflow contract.sol

# Scan entire project
$ solscan analyze --recursive ./contracts/

# Generate SARIF for CI integration
$ solscan analyze --format sarif contract.sol > results.sarif

Solution Architecture

Module Structure

src/
├── main.py                # CLI entry point
├── solscan/
│   ├── __init__.py
│   ├── parser/
│   │   ├── __init__.py
│   │   ├── ast_parser.py      # JSON AST parsing
│   │   ├── ast_nodes.py       # AST node types
│   │   ├── source_mapper.py   # Map AST to source locations
│   │   └── inheritance.py     # Resolve contract inheritance
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── cfg.py             # Control flow graph builder
│   │   ├── dataflow.py        # Dataflow analysis framework
│   │   ├── call_graph.py      # Inter-procedural call graph
│   │   ├── taint.py           # Taint tracking
│   │   └── state_tracker.py   # Storage slot analysis
│   ├── detectors/
│   │   ├── __init__.py
│   │   ├── base.py            # Detector interface
│   │   ├── reentrancy.py      # Reentrancy detector
│   │   ├── overflow.py        # Integer overflow detector
│   │   ├── access_control.py  # Access control detector
│   │   ├── unchecked_call.py  # Unchecked call detector
│   │   ├── dos.py             # Denial of service detector
│   │   ├── timestamp.py       # Timestamp dependence
│   │   ├── tx_origin.py       # tx.origin authentication
│   │   └── delegatecall.py    # Delegatecall vulnerabilities
│   ├── reporters/
│   │   ├── __init__.py
│   │   ├── text.py            # Human-readable output
│   │   ├── json_reporter.py   # JSON format
│   │   └── sarif.py           # SARIF for IDE integration
│   └── utils/
│       ├── __init__.py
│       ├── compiler.py        # Solidity compilation wrapper
│       └── patterns.py        # Common vulnerability patterns
├── tests/
│   ├── contracts/             # Test contracts with known vulnerabilities
│   ├── test_parser.py
│   ├── test_cfg.py
│   ├── test_detectors.py
│   └── test_benchmarks.py
└── benchmarks/
    ├── not_so_smart_contracts/  # Known vulnerable contracts
    └── false_positive_suite/    # Contracts that should NOT flag

Core Data Structures

from dataclasses import dataclass, field
from typing import List, Dict, Optional, Set
from enum import Enum

class NodeType(Enum):
    SOURCE_UNIT = "SourceUnit"  # root node of every compiled file
    CONTRACT = "ContractDefinition"
    FUNCTION = "FunctionDefinition"
    MODIFIER = "ModifierDefinition"
    VARIABLE = "VariableDeclaration"
    BLOCK = "Block"
    IF_STATEMENT = "IfStatement"
    FOR_LOOP = "ForStatement"
    WHILE_LOOP = "WhileStatement"
    EXPRESSION = "ExpressionStatement"
    RETURN = "Return"
    REVERT = "RevertStatement"
    FUNCTION_CALL = "FunctionCall"
    MEMBER_ACCESS = "MemberAccess"
    ASSIGNMENT = "Assignment"
    BINARY_OP = "BinaryOperation"
    IDENTIFIER = "Identifier"
    # ... more node types


@dataclass
class SourceLocation:
    """Maps AST node to source code location."""
    file_path: str
    start: int
    length: int
    line: int
    column: int

    def get_source_snippet(self, source: str, context_lines: int = 2) -> str:
        """Extract source code around this location."""
        ...


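@dataclass(eq=False, kw_only=True)  # eq=False keeps nodes hashable; kw_only (Python 3.10+) lets subclasses add required fields
class ASTNode: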
    """Base class for all AST nodes."""
    node_type: NodeType
    node_id: int
    src: SourceLocation
    parent: Optional['ASTNode'] = None
    children: List['ASTNode'] = field(default_factory=list)

    def get_children_of_type(self, node_type: NodeType) -> List['ASTNode']:
        """Recursively find all descendants of a given type."""
        ...

    def get_ancestor_of_type(self, node_type: NodeType) -> Optional['ASTNode']:
        """Find the nearest ancestor of a given type."""
        ...


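@dataclass(eq=False, kw_only=True)  # required fields after the base class defaults need kw_only (Python 3.10+)
class ContractNode(ASTNode):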
    """Represents a contract definition."""
    name: str
    base_contracts: List[str]
    is_abstract: bool
    is_interface: bool
    state_variables: List['VariableNode'] = field(default_factory=list)
    functions: List['FunctionNode'] = field(default_factory=list)
    modifiers: List['ModifierNode'] = field(default_factory=list)


@dataclass(eq=False, kw_only=True)  # eq=False keeps functions usable as dict keys (e.g. the CFG cache)
class FunctionNode(ASTNode):
    """Represents a function definition."""
    name: str
    visibility: str  # public, external, internal, private
    state_mutability: str  # pure, view, payable, nonpayable
    modifiers: List[str]
    parameters: List['VariableNode']
    return_parameters: List['VariableNode']
    body: Optional['BlockNode']


@dataclass(eq=False)  # identity-based hashing so blocks can be dict keys and set members
class BasicBlock:
    """A sequence of statements with no branches."""
    id: int
    statements: List[ASTNode]
    predecessors: List['BasicBlock'] = field(default_factory=list)
    successors: List['BasicBlock'] = field(default_factory=list)


@dataclass
class ControlFlowGraph:
    """Control flow graph for a function."""
    function: FunctionNode
    entry: BasicBlock
    exit: BasicBlock
    blocks: List[BasicBlock]

    def get_paths(self) -> List[List[BasicBlock]]:
        """Enumerate all paths through the CFG (with loop unrolling limit)."""
        ...

    def dominators(self) -> Dict[BasicBlock, Set[BasicBlock]]:
        """Compute dominator sets for each block."""
        ...


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFORMATIONAL = "informational"


@dataclass
class Finding:
    """A detected vulnerability."""
    detector_name: str
    title: str
    description: str
    severity: Severity
    location: SourceLocation
    contract_name: str
    function_name: Optional[str]
    source_snippet: str
    recommendation: str
    references: List[str] = field(default_factory=list)
    confidence: str = "high"  # high, medium, low

Detector Interface

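from __future__ import annotations  # AnalysisContext is referenced before it is defined

from abc import ABC, abstractmethod
from typing import Dict, List, Optional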

class BaseDetector(ABC):
    """Base class for all vulnerability detectors."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Unique detector identifier."""
        pass

    @property
    @abstractmethod
    def description(self) -> str:
        """What this detector finds."""
        pass

    @property
    @abstractmethod
    def severity(self) -> Severity:
        """Default severity of findings."""
        pass

    @abstractmethod
    def detect(self, context: AnalysisContext) -> List[Finding]:
        """
        Run detection and return findings.

        Args:
            context: Contains parsed AST, CFGs, call graph, etc.

        Returns:
            List of detected vulnerabilities.
        """
        pass


class AnalysisContext:
    """Shared context for all detectors."""

    def __init__(self, ast: ASTNode, source: str):
        self.ast = ast
        self.source = source
        self._cfgs: Dict[FunctionNode, ControlFlowGraph] = {}
        self._call_graph: Optional[CallGraph] = None
        self._taint_results: Dict[FunctionNode, TaintAnalysis] = {}

    def get_cfg(self, function: FunctionNode) -> ControlFlowGraph:
        """Get or build CFG for a function."""
        if function not in self._cfgs:
            self._cfgs[function] = CFGBuilder().build(function)
        return self._cfgs[function]

    def get_call_graph(self) -> CallGraph:
        """Get or build the inter-procedural call graph."""
        if self._call_graph is None:
            self._call_graph = CallGraphBuilder().build(self.ast)
        return self._call_graph

Phased Implementation Guide

Phase 1: AST Parsing

Goal: Parse Solidity compiler JSON output into navigable data structures.

Tasks:

Use solc --ast-compact-json to generate AST
Parse JSON into Python objects
Build parent-child relationships
Implement node type classification
Handle source mapping

Validation:

ast = parse_solidity("contract.sol")
assert ast.node_type == NodeType.SOURCE_UNIT
assert len(ast.get_children_of_type(NodeType.CONTRACT)) >= 1

Hints if stuck:

The solc compiler can be driven from Python via the py-solc-x package (imported as solcx)
Start with a simple contract: contract A { function foo() public {} }
Print the raw JSON to understand the structure
Node IDs are unique and can be used for cross-references
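For the source-mapping task, the start offset in each src field has to be turned into a line and column. The sketch below assumes offsets are byte offsets into the UTF-8 source, so it indexes the encoded bytes; LineIndex is a hypothetical helper name, and columns are counted in bytes rather than characters.

```python
import bisect
from typing import List, Tuple


class LineIndex:
    """Convert byte offsets from 'src' fields into 1-based line/column pairs."""

    def __init__(self, source: str) -> None:
        data = source.encode("utf-8")
        # Byte offset at which each line starts.
        self._starts: List[int] = [0]
        for offset, byte in enumerate(data):
            if byte == 0x0A:  # newline
                self._starts.append(offset + 1)

    def locate(self, offset: int) -> Tuple[int, int]:
        """Return (line, column) for a byte offset, both starting at 1."""
        line = bisect.bisect_right(self._starts, offset) - 1
        column = offset - self._starts[line]
        return line + 1, column + 1


index = LineIndex("contract A {\n    function foo() public {}\n}\n")
```

Building the line-start table once per file keeps each lookup logarithmic, which matters when a report resolves thousands of nodes.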

Phase 2: AST Traversal and Queries

Goal: Build utilities for navigating and querying the AST.

Tasks:

Implement visitor pattern for AST traversal
Build queries: “find all function calls”, “find all assignments”
Resolve identifiers to their declarations
Handle inheritance (base contract functions)

Validation:

functions = ast.query(NodeType.FUNCTION)
calls = ast.query(NodeType.FUNCTION_CALL)
assert all(isinstance(f, FunctionNode) for f in functions)

Hints if stuck:

Visitor pattern: define visit_FunctionDefinition(node), etc.
For inheritance, look at baseContracts in ContractDefinition
Use the referencedDeclaration field to resolve identifiers
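The visitor hint can be sketched as follows, operating on raw compact-AST dicts rather than the typed node classes. ASTVisitor and FunctionCollector are illustrative names, and the dispatch-by-method-name convention is borrowed from Python's own ast.NodeVisitor.

```python
from typing import Any, Dict, List


class ASTVisitor:
    """Dispatch on nodeType: call visit_<NodeType> if defined, else generic_visit."""

    def visit(self, node: Dict[str, Any]) -> None:
        handler = getattr(self, "visit_" + node["nodeType"], self.generic_visit)
        handler(node)

    def generic_visit(self, node: Dict[str, Any]) -> None:
        # Children are any dict values (or list items) that carry a nodeType.
        for value in node.values():
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict) and "nodeType" in child:
                    self.visit(child)


class FunctionCollector(ASTVisitor):
    """Example query: collect the names of all function definitions."""

    def __init__(self) -> None:
        self.names: List[str] = []

    def visit_FunctionDefinition(self, node: Dict[str, Any]) -> None:
        self.names.append(node.get("name", ""))
        self.generic_visit(node)  # keep descending into the body


tree = {"nodeType": "SourceUnit", "nodes": [
    {"nodeType": "ContractDefinition", "name": "A", "nodes": [
        {"nodeType": "FunctionDefinition", "name": "foo",
         "body": {"nodeType": "Block", "statements": []}},
        {"nodeType": "FunctionDefinition", "name": "bar"},
    ]},
]}
collector = FunctionCollector()
collector.visit(tree)
```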

Phase 3: Control Flow Graph Construction

Goal: Build CFGs for each function.

Tasks:

Identify basic block boundaries (branches, jumps)
Create nodes for each basic block
Add edges for control flow
Handle try/catch, if/else, for/while, break/continue
Model function returns and reverts

Validation:

cfg = build_cfg(function)
# Simple function should have: entry -> body -> exit
assert len(cfg.blocks) == 3
body_block = cfg.entry.successors[0]
assert body_block.successors == [cfg.exit]

Hints if stuck:

A new block starts after: if/else, loops, function calls
revert() and return terminate a block
Use a stack to track nested control structures
Start with straight-line code, then add branches
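As a starting point for the last hint, here is a deliberately small builder that handles straight-line code and if/else only. The statement encoding (plain strings, or an ("if", condition, then_body, else_body) tuple) and the class names are assumptions made for this sketch; loops, returns, and reverts are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# A statement is a plain string or ("if", condition, then_body, else_body).
Stmt = Union[str, Tuple]


@dataclass(eq=False)  # identity comparison avoids recursing through cyclic graphs
class Block:
    label: str
    statements: List[str] = field(default_factory=list)
    successors: List["Block"] = field(default_factory=list)


class MiniCFGBuilder:
    def __init__(self) -> None:
        self.blocks: List[Block] = []

    def _new(self, label: str) -> Block:
        block = Block(label)
        self.blocks.append(block)
        return block

    def build(self, body: List[Stmt]) -> Block:
        """Build the graph for one function body and return its entry block."""
        entry = self._new("entry")
        last = self._lower(body, entry)
        last.successors.append(self._new("exit"))
        return entry

    def _lower(self, body: List[Stmt], current: Block) -> Block:
        """Append statements to `current`, splitting at each if/else."""
        for stmt in body:
            if isinstance(stmt, str):
                current.statements.append(stmt)
                continue
            _, condition, then_body, else_body = stmt
            current.statements.append(f"if ({condition})")
            then_head, else_head = self._new("then"), self._new("else")
            current.successors += [then_head, else_head]
            join = Block("join")
            self._lower(then_body, then_head).successors.append(join)
            self._lower(else_body, else_head).successors.append(join)
            self.blocks.append(join)
            current = join  # code after the if/else continues in the join block
        return current


builder = MiniCFGBuilder()
entry = builder.build([("if", "x", ["doA()"], ["doB()"])])
labels = [block.label for block in builder.blocks]
```

This variant keeps the condition in the block that precedes the branch and adds an explicit join block; other designs give the condition its own block instead.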

Phase 4: First Detector - Unchecked Low-Level Calls

Goal: Implement a simple but useful detector.

Tasks:

Find all call, delegatecall, staticcall, send expressions
Check if the return value is captured
Check if the captured value is used in a require/assert/if
Report unchecked calls

Validation:

// Should flag:
payable(addr).send(amount);
addr.call{value: amount}("");

// Should NOT flag:
(bool success, ) = addr.call{value: amount}("");
require(success);

Hints if stuck:

Look for MemberAccess nodes with memberName of call, send, etc.
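Check the parent node: is the call the initial value of a VariableDeclarationStatement, the right-hand side of an Assignment, or a bare ExpressionStatement?
If the result is captured, trace the variable to see if it’s checked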

Phase 5: Reentrancy Detection

Goal: Detect classic reentrancy vulnerabilities.

Tasks:

Identify external calls in each function
Track state variable writes
Determine if writes happen after calls (using CFG)
Recognize reentrancy guards (mutex patterns)
Handle modifiers that provide protection

Validation:

// Should flag:
function withdraw() public {
    msg.sender.call{value: balances[msg.sender]}("");
    balances[msg.sender] = 0;
}

// Should NOT flag:
function withdraw() public {
    uint amount = balances[msg.sender];
    balances[msg.sender] = 0;
    msg.sender.call{value: amount}("");
}

Hints if stuck:

Use the CFG to determine order of operations
External calls are: .call(), .send(), .transfer(), and calls to external contracts
State writes are assignments to state variables (look at the variable declaration’s stateVariable field)
Common guard pattern: require(!locked); locked = true; ...; locked = false;

Phase 6: Integer Overflow Detection

Goal: Detect arithmetic operations that might overflow.

Tasks:

Check Solidity version for default behavior
Find unchecked blocks
Identify arithmetic operations within unchecked contexts
Recognize SafeMath usage (for older contracts)
Track user-controlled inputs that flow into arithmetic

Validation:

// Should flag (if <0.8.0 or in unchecked):
balances[msg.sender] -= amount;

// Should NOT flag (0.8.0+ outside unchecked):
balances[msg.sender] -= amount;

Hints if stuck:

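The version constraint appears in the AST as a PragmaDirective node (its literals field holds the tokens); the exact compiler version used has to come from your build configuration or compiler metadata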
unchecked blocks have node type UncheckedBlock
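Look for BinaryOperation with operators +, -, *, /, %, **, plus Assignment nodes with compound operators (+=, -=, *=) and UnaryOperation nodes (++, --); the -= in the validation example is an Assignment, not a BinaryOperation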
For SafeMath, check if arithmetic is wrapped in a function call like add, sub

Phase 7: Access Control Detection

Goal: Identify functions that modify sensitive state without authorization.

Tasks:

Identify sensitive operations (owner changes, minting, pausing, upgrading, selfdestruct)
Check for authorization modifiers
Trace modifier implementations to verify they check
Flag unprotected sensitive functions

Validation:

// Should flag:
function setOwner(address newOwner) public {
    owner = newOwner;
}

// Should NOT flag:
function setOwner(address newOwner) public onlyOwner {
    owner = newOwner;
}

Hints if stuck:

Sensitive patterns: owner = , _mint(, selfdestruct(, pause(
Modifiers are applied in the function’s modifiers list
Trace into the modifier body to find require(msg.sender == owner) or similar
OpenZeppelin’s Ownable uses onlyOwner modifier

Phase 8: False Positive Reduction

Goal: Reduce noise by understanding context.

Tasks:

Implement pattern recognition for safe patterns
Track modifier effects (e.g., nonReentrant)
Recognize standard library usage (OpenZeppelin)
Implement confidence levels for findings

Validation:

// Should NOT flag (OpenZeppelin ReentrancyGuard):
function withdraw() public nonReentrant {
    msg.sender.call{value: balances[msg.sender]}("");
    balances[msg.sender] = 0;
}

Hints if stuck:

Build a list of known safe patterns
Check for imports from “@openzeppelin/”
Look for mutex variables (_status, locked)
If uncertain, report with lower confidence rather than suppressing
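The last hint, lowering confidence instead of suppressing, might look like the sketch below. The modifier name nonReentrant comes from OpenZeppelin's ReentrancyGuard; the substring hints and the three-level result are assumptions for illustration. A modifier's name alone proves nothing, so a fuller implementation would also confirm the mutex pattern in the modifier body.

```python
from typing import Iterable

KNOWN_GUARD_MODIFIERS = {"nonReentrant"}  # OpenZeppelin ReentrancyGuard
SUSPECTED_GUARD_HINTS = ("lock", "guard", "mutex")  # assumed naming heuristics


def reentrancy_confidence(modifiers: Iterable[str]) -> str:
    """Return 'suppressed', 'low', or 'high' for a reentrancy candidate."""
    names = list(modifiers)
    if any(name in KNOWN_GUARD_MODIFIERS for name in names):
        return "suppressed"
    if any(hint in name.lower() for name in names for hint in SUSPECTED_GUARD_HINTS):
        return "low"  # looks like a guard, but report it so a human can confirm
    return "high"
```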

Phase 9: Reporting and Output Formats

Goal: Generate actionable reports.

Tasks:

Implement text reporter with source context
Implement JSON reporter for tooling
Implement SARIF reporter for IDE integration
Add severity classification
Include remediation recommendations

Validation:

$ solscan analyze contract.sol
[CRITICAL] Reentrancy vulnerability
  Location: contract.sol:45:5
  Function: Vault.withdraw()

    44 |   function withdraw() public {
    45 |     msg.sender.call{value: balances[msg.sender]}("");
       |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    46 |     balances[msg.sender] = 0;

  Description: External call occurs before state update, allowing reentrancy.
  Recommendation: Update state before external call (Checks-Effects-Interactions pattern).
  References:
    - https://swcregistry.io/docs/SWC-107

Hints if stuck:

SARIF spec: https://sarifweb.azurewebsites.net/
Use the source location to extract context lines
Include links to SWC (Smart Contract Weakness Classification)
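For the SARIF reporter, a minimal log needs little more than a version, a tool name, and a list of results. The sketch below takes findings as plain dicts with hypothetical keys (detector, severity, description, file, line, column) and maps the five severities onto SARIF's error, warning, and note levels; that mapping is a design choice, not something the format prescribes.

```python
import json
from typing import Any, Dict, List

SARIF_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "informational": "note",
}


def to_sarif(findings: List[Dict[str, Any]], tool_name: str = "solscan") -> Dict[str, Any]:
    """Build a minimal SARIF 2.1.0 log from a list of finding dicts."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["detector"],
            "level": SARIF_LEVEL[finding["severity"]],
            "message": {"text": finding["description"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file"]},
                    "region": {
                        "startLine": finding["line"],
                        "startColumn": finding["column"],
                    },
                }
            }],
        })
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


report = to_sarif([{
    "detector": "reentrancy",
    "severity": "critical",
    "description": "External call occurs before state update.",
    "file": "contract.sol",
    "line": 45,
    "column": 5,
}])
serialized = json.dumps(report, indent=2)
```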

Phase 10: Performance and Scalability

Goal: Handle large codebases efficiently.

Tasks:

Cache AST parsing results
Parallelize detector execution
Implement incremental analysis (only re-analyze changed files)
Add timeout protection for complex analysis
Benchmark on real-world contracts

Validation:

$ time solscan analyze --recursive ./large-project/
# Should complete in under 60 seconds for 50,000 lines

Hints if stuck:

Use multiprocessing for parallel detector execution
Cache CFGs as they’re expensive to build
Use file hashes to detect changes for incremental analysis
Set recursion limits for deeply nested structures
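The file-hash hint could be implemented along these lines. This is a sketch under simple assumptions: the cache is a JSON file mapping paths to SHA-256 digests, and a file is re-analyzed only when its own content changes. It does not account for changes in imported files, which a real incremental mode would also need to track.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths: List[Path], cache_file: Path) -> List[Path]:
    """Return files whose content differs from the cached digest, then update the cache."""
    previous: Dict[str, str] = {}
    if cache_file.exists():
        previous = json.loads(cache_file.read_text())
    current = {str(p): file_digest(p) for p in paths}
    cache_file.write_text(json.dumps(current, indent=2))
    return [p for p in paths if previous.get(str(p)) != current[str(p)]]
```

On the first run every file counts as changed; later runs return only the files whose bytes differ from the previous scan.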

Testing Strategy

Unit Tests

def test_ast_parsing_simple_contract():
    source = """
    contract Test {
        function foo() public pure returns (uint) {
            return 42;
        }
    }
    """
    ast = parse_solidity_source(source)
    contracts = ast.get_children_of_type(NodeType.CONTRACT)
    assert len(contracts) == 1
    assert contracts[0].name == "Test"


def test_cfg_if_statement():
    source = """
    function test(bool x) public {
        if (x) {
            doA();
        } else {
            doB();
        }
    }
    """
    cfg = build_cfg_from_source(source, "test")
    # Entry -> Condition -> (True: doA | False: doB) -> Exit
    assert len(cfg.blocks) == 5
    assert len(cfg.entry.successors) == 1  # Goes to condition


def test_reentrancy_detector_positive():
    source = """
    contract Vulnerable {
        mapping(address => uint) balances;

        function withdraw() public {
            msg.sender.call{value: balances[msg.sender]}("");
            balances[msg.sender] = 0;
        }
    }
    """
    findings = run_detector(ReentrancyDetector(), source)
    assert len(findings) == 1
    assert findings[0].severity == Severity.CRITICAL


def test_reentrancy_detector_negative():
    source = """
    contract Safe {
        mapping(address => uint) balances;

        function withdraw() public {
            uint amount = balances[msg.sender];
            balances[msg.sender] = 0;
            msg.sender.call{value: amount}("");
        }
    }
    """
    findings = run_detector(ReentrancyDetector(), source)
    assert len(findings) == 0

Integration Tests (Real Vulnerabilities)

def test_the_dao_reentrancy():
    """Test detection on The DAO's actual vulnerable pattern."""
    source = load_contract("benchmarks/the_dao.sol")
    findings = analyze(source)
    reentrancy_findings = [f for f in findings if "reentrancy" in f.detector_name.lower()]
    assert len(reentrancy_findings) >= 1


def test_parity_multisig_access_control():
    """Test detection on Parity multisig vulnerability."""
    source = load_contract("benchmarks/parity_multisig.sol")
    findings = analyze(source)
    access_findings = [f for f in findings if "access" in f.detector_name.lower()]
    assert len(access_findings) >= 1

False Positive Tests

def test_no_false_positive_on_safe_patterns():
    """Verify we don't flag obviously safe code."""
    source = """
    contract Safe {
        mapping(address => uint) balances;
        bool locked;

        modifier nonReentrant() {
            require(!locked);
            locked = true;
            _;
            locked = false;
        }

        function withdraw() public nonReentrant {
            msg.sender.call{value: balances[msg.sender]}("");
            balances[msg.sender] = 0;
        }
    }
    """
    findings = analyze(source)
    critical_findings = [f for f in findings if f.severity == Severity.CRITICAL]
    assert len(critical_findings) == 0

Benchmark Tests

import time

def test_performance_large_contract():
    """Ensure analysis completes in reasonable time."""
    source = load_contract("benchmarks/large_defi_protocol.sol")
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    findings = analyze(source)
    elapsed = time.perf_counter() - start
    assert elapsed < 30.0  # 30 seconds max

Common Pitfalls and Debugging

Pitfall 1: AST Version Differences

Problem: Solidity AST format changes between compiler versions.

Symptom: Parser crashes or misses nodes on certain contracts.

Solution:

def get_node_type(node: dict) -> str:
    # Handle both old and new AST formats
    return node.get("nodeType") or node.get("name")

def get_children(node: dict) -> List[dict]:
    # Different versions use different child keys; extend this list as you meet new node types
    children = []
    for key in ["nodes", "children", "statements", "body", "expression"]:
        child = node.get(key)
        if isinstance(child, list):
            # Skip null or scalar entries so callers only ever receive nodes
            children.extend(c for c in child if isinstance(c, dict))
        elif isinstance(child, dict):
            children.append(child)
    return children
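
To check that version-tolerant traversal behaves the same on both formats, the following minimal sketch walks two hand-written AST fragments. The dictionaries are illustrations, not real compiler output, and `node_kind`, `walk`, and `CHILD_KEYS` are hypothetical names chosen for this example.

```python
from typing import Iterator

# Keys that may hold child nodes; illustrative, not exhaustive.
CHILD_KEYS = ("nodes", "children", "statements", "body", "expression")


def node_kind(node: dict) -> str:
    """Return the node type from the compact ("nodeType") or legacy ("name") format."""
    return node.get("nodeType") or node.get("name") or "Unknown"


def walk(node: dict) -> Iterator[dict]:
    """Yield the node and all of its descendants in depth-first pre-order."""
    yield node
    for key in CHILD_KEYS:
        child = node.get(key)
        items = child if isinstance(child, list) else [child]
        for item in items:
            if isinstance(item, dict):
                yield from walk(item)


# Compact format: the type lives in "nodeType", children under varying keys.
compact_ast = {
    "nodeType": "SourceUnit",
    "nodes": [{
        "nodeType": "ContractDefinition", "name": "Safe",
        "nodes": [{
            "nodeType": "FunctionDefinition", "name": "withdraw",
            "body": {"nodeType": "Block", "statements": []},
        }],
    }],
}

# Legacy format: the type lives in "name", children always under "children".
legacy_ast = {
    "name": "SourceUnit",
    "children": [{
        "name": "ContractDefinition",
        "children": [{
            "name": "FunctionDefinition",
            "children": [{"name": "Block", "children": []}],
        }],
    }],
}

compact_kinds = [node_kind(n) for n in walk(compact_ast)]
legacy_kinds = [node_kind(n) for n in walk(legacy_ast)]
```

Both lists contain the same four node types in the same order, which is the property your detectors rely on: they should never need to know which compiler produced the tree.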

Pitfall 2: Inheritance Resolution

Problem: Functions in base contracts are not analyzed.

Symptom: Missing vulnerabilities in inherited functions.

Solution:

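def get_all_functions(contract: ContractNode, _seen: Optional[Set[str]] = None) -> List[FunctionNode]:
    """Collect functions from a contract and its bases, visiting each contract once."""
    seen = _seen if _seen is not None else set()
    if contract.name in seen:  # Diamond inheritance: this base was already visited
        return []              # (assumes ContractNode exposes a unique .name)
    seen.add(contract.name)
    functions = list(contract.functions)
    for base_name in contract.base_contracts:
        base = resolve_contract(base_name)
        if base:
            functions.extend(get_all_functions(base, seen))
    return functions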
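
Collecting inherited functions is only half of the problem: when a derived contract overrides a base function, only the most-derived definition is reachable through normal dispatch. The compact AST records a `linearizedBaseContracts` list on each contract definition, ordered from the contract itself to its most basic ancestor. The sketch below uses that order to pick the winning definition. The simplified dictionary shape (a `functions` list of names) is an assumption for illustration, and matching by name alone ignores overloads.

```python
from typing import Dict


def resolve_functions(contract_id: int, contracts: Dict[int, dict]) -> Dict[str, str]:
    """Map each function name to the contract that supplies its most-derived definition."""
    resolved: Dict[str, str] = {}
    # The linearization lists the contract itself first, then its bases,
    # so the first definition seen for a name is the one that wins.
    for cid in contracts[contract_id]["linearizedBaseContracts"]:
        contract = contracts[cid]
        for fn_name in contract["functions"]:
            resolved.setdefault(fn_name, contract["name"])
    return resolved


# Hypothetical, simplified contract records keyed by AST node id.
contracts = {
    1: {"name": "Ownable", "linearizedBaseContracts": [1],
        "functions": ["transferOwnership"]},
    2: {"name": "Pausable", "linearizedBaseContracts": [2, 1],
        "functions": ["pause"]},
    3: {"name": "Vault", "linearizedBaseContracts": [3, 2, 1],
        "functions": ["withdraw", "pause"]},
}

resolved = resolve_functions(3, contracts)
```

Here `pause` resolves to `Vault` rather than `Pausable`, so a detector should analyze the override and report the base version only if it is still reachable, for example through a `super` call.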

Pitfall 3: Modifier Analysis

Problem: Treating modifiers as opaque, missing their effects.

Symptom: False positives when modifiers provide protection.

Solution:

def modifier_has_requirement(modifier: ModifierNode) -> bool:
    """Check if modifier contains a require/assert/revert."""
    for node in modifier.body.get_all_descendants():
        if node.node_type in [NodeType.REQUIRE, NodeType.ASSERT, NodeType.REVERT]:
            return True
    return False

def modifier_is_reentrancy_guard(modifier: ModifierNode) -> bool:
    """Check for mutex pattern in modifier."""
    # Look for: locked = true; _; locked = false;
    ...
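
One way to approach the mutex check is sketched below. It works on a simplified, hypothetical statement list (dictionaries with a `kind` field) rather than on real AST nodes, so the statement shapes and the name `find_guard_variable` are assumptions. The idea is to split the modifier body at the `_;` placeholder and look for a variable that is checked and set before it and cleared after it.

```python
from typing import List, Optional


def find_guard_variable(statements: List[dict]) -> Optional[str]:
    """Return the lock variable if the statements form a mutex around `_;`, else None."""
    kinds = [s["kind"] for s in statements]
    if "placeholder" not in kinds:
        return None
    split = kinds.index("placeholder")
    before, after = statements[:split], statements[split + 1:]

    # require(!var) before the function body runs
    checked = {s["var"] for s in before if s["kind"] == "require_not"}
    # var = true before the function body runs
    locked = {s["var"] for s in before
              if s["kind"] == "assign" and s["value"] is True}
    # var = false after the function body returns
    unlocked = {s["var"] for s in after
                if s["kind"] == "assign" and s["value"] is False}

    candidates = checked & locked & unlocked
    return min(candidates) if candidates else None


# Simplified form of: require(!locked); locked = true; _; locked = false;
non_reentrant = [
    {"kind": "require_not", "var": "locked"},
    {"kind": "assign", "var": "locked", "value": True},
    {"kind": "placeholder"},
    {"kind": "assign", "var": "locked", "value": False},
]

# Simplified form of: require(msg.sender == owner); _;
only_owner = [
    {"kind": "require_eq", "left": "msg.sender", "right": "owner"},
    {"kind": "placeholder"},
]

guard = find_guard_variable(non_reentrant)
no_guard = find_guard_variable(only_owner)
```

This sketch does not verify that the check precedes the assignment, and it only recognizes boolean locks. Guards that use integer status values, as OpenZeppelin's ReentrancyGuard does, need an equivalent comparison-and-assignment pattern.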

Pitfall 4: External vs Internal Calls

Problem: Flagging internal function calls as external calls.

Symptom: False positives for reentrancy on internal calls.

Solution:

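def is_external_call(call: FunctionCallNode) -> bool:
    callee = call.expression
    # addr.call{value: x}("") wraps the member access in a call-options node
    # (Solidity >= 0.6.2); NodeType.FUNCTION_CALL_OPTIONS is assumed to exist in your enum
    while callee.node_type == NodeType.FUNCTION_CALL_OPTIONS:
        callee = callee.expression
    if callee.node_type == NodeType.MEMBER_ACCESS:
        # Low-level calls and ether transfers always leave the current contract
        if callee.member_name in ["call", "delegatecall", "staticcall", "send", "transfer"]:
            return True
        # High-level call on a contract-typed expression, e.g. token.balanceOf(...)
        target_type = get_type(callee.expression)
        if is_external_contract_type(target_type):
            return True
    return False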
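
A detail that is easy to miss when classifying calls: in the compact AST, `addr.call{value: x}("")` is a `FunctionCall` whose `expression` is a `FunctionCallOptions` node, and the `MemberAccess` sits one level further down. The sketch below shows the unwrapping on raw AST dictionaries. The sample nodes are trimmed by hand to the fields used here, and the helper names are hypothetical.

```python
from typing import Optional

LOW_LEVEL_MEMBERS = {"call", "delegatecall", "staticcall", "send", "transfer"}


def callee_member(call: dict) -> Optional[str]:
    """Return the member name being called, looking through call-option wrappers."""
    expr = call.get("expression") or {}
    while expr.get("nodeType") == "FunctionCallOptions":
        expr = expr.get("expression") or {}
    if expr.get("nodeType") == "MemberAccess":
        return expr.get("memberName")
    return None


def is_low_level_call(call: dict) -> bool:
    """True for calls such as addr.call(...), addr.send(...), addr.transfer(...)."""
    return callee_member(call) in LOW_LEVEL_MEMBERS


# msg.sender.call{value: amount}("")
call_with_options = {
    "nodeType": "FunctionCall",
    "expression": {
        "nodeType": "FunctionCallOptions",
        "names": ["value"],
        "expression": {"nodeType": "MemberAccess", "memberName": "call"},
    },
}

# msg.sender.send(amount)
plain_send = {
    "nodeType": "FunctionCall",
    "expression": {"nodeType": "MemberAccess", "memberName": "send"},
}

# _updateBalance(amount), an internal call through a plain identifier
internal_call = {
    "nodeType": "FunctionCall",
    "expression": {"nodeType": "Identifier", "name": "_updateBalance"},
}

results = [is_low_level_call(c) for c in (call_with_options, plain_send, internal_call)]
```

Matching on member names alone is a heuristic. A struct field or library function named `transfer` would also match, so a production detector should confirm the receiver's type before reporting.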

Pitfall 5: State Variable vs Local Variable

Problem: Confusing local variables with state variables.

Symptom: False positives for state modifications.

Solution:

def is_state_variable(var: VariableNode) -> bool:
    # State variables are declared at contract level
    parent = var.get_ancestor_of_type(NodeType.CONTRACT)
    if parent and var in parent.state_variables:
        return True
    # Check the stateVariable field in newer AST
    return getattr(var, "stateVariable", False)
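
Name-based checks break as soon as a local variable shadows a state variable. A more robust approach, sketched here on hand-trimmed AST dictionaries, resolves the assignment target through `referencedDeclaration` and compares it against the ids of declarations marked `stateVariable`. The node ids and helper names are made up for illustration, and struct member writes (`MemberAccess` on the left-hand side) are not handled.

```python
from typing import Set


def state_variable_ids(contract: dict) -> Set[int]:
    """Collect the AST ids of all state variable declarations in a contract."""
    return {
        node["id"]
        for node in contract.get("nodes", [])
        if node.get("nodeType") == "VariableDeclaration" and node.get("stateVariable")
    }


def writes_state(assignment: dict, state_ids: Set[int]) -> bool:
    """True if the assignment target resolves to a state variable declaration."""
    target = assignment["leftHandSide"]
    # balances[msg.sender] = 0 assigns through an IndexAccess; peel it off
    while target.get("nodeType") == "IndexAccess":
        target = target["baseExpression"]
    return (
        target.get("nodeType") == "Identifier"
        and target.get("referencedDeclaration") in state_ids
    )


contract = {
    "nodeType": "ContractDefinition",
    "nodes": [
        {"nodeType": "VariableDeclaration", "id": 3, "name": "balances",
         "stateVariable": True},
        {"nodeType": "FunctionDefinition", "id": 20, "name": "withdraw"},
    ],
}
state_ids = state_variable_ids(contract)

# balances[msg.sender] = 0;  (refers to declaration 3, the state variable)
state_write = {
    "nodeType": "Assignment",
    "leftHandSide": {
        "nodeType": "IndexAccess",
        "baseExpression": {"nodeType": "Identifier", "name": "balances",
                           "referencedDeclaration": 3},
    },
}

# balances = 0;  (refers to declaration 15, a local that shadows the state variable)
local_write = {
    "nodeType": "Assignment",
    "leftHandSide": {"nodeType": "Identifier", "name": "balances",
                     "referencedDeclaration": 15},
}
```

Both assignments mention `balances`, but only the first one resolves to a state variable, so only the first one matters for reentrancy ordering.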

Pitfall 6: Path Explosion in CFG Analysis

Problem: Exponential paths through complex functions.

Symptom: Analysis hangs or runs out of memory.

Solution:

def analyze_paths(cfg: ControlFlowGraph, max_paths: int = 1000) -> List[Path]:
    paths = []
    worklist = [(cfg.entry, [])]

    while worklist and len(paths) < max_paths:
        block, path = worklist.pop()
        new_path = path + [block]

        if block == cfg.exit:
            paths.append(new_path)
        else:
            for succ in block.successors:
                # Check new_path, not path, so a block that loops to itself is also skipped
                if succ not in new_path:  # Avoid infinite loops
                    worklist.append((succ, new_path))

    return paths
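
To see why the cap matters, consider a function with n sequential if/else statements: it has 2^n paths. The sketch below builds such a CFG as a plain adjacency dictionary and enumerates paths with a bound, returning a flag when it stopped early so callers can report reduced coverage instead of silently missing paths. The block names and the `diamond_chain` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def enumerate_paths(
    succ: Dict[str, List[str]], entry: str, exit_: str, max_paths: int
) -> Tuple[List[List[str]], bool]:
    """Return (paths, truncated) where truncated means unexplored work remained."""
    paths: List[List[str]] = []
    worklist = [(entry, [entry])]
    while worklist:
        if len(paths) >= max_paths:
            return paths, True
        block, path = worklist.pop()
        if block == exit_:
            paths.append(path)
            continue
        for nxt in succ.get(block, []):
            if nxt not in path:  # each block at most once per path
                worklist.append((nxt, path + [nxt]))
    return paths, False


def diamond_chain(n: int) -> Dict[str, List[str]]:
    """Build a CFG of n sequential if/else diamonds from b0 to b<n>."""
    succ: Dict[str, List[str]] = {}
    for i in range(n):
        succ[f"b{i}"] = [f"b{i}_then", f"b{i}_else"]
        succ[f"b{i}_then"] = [f"b{i + 1}"]
        succ[f"b{i}_else"] = [f"b{i + 1}"]
    return succ


cfg = diamond_chain(3)
all_paths, all_truncated = enumerate_paths(cfg, "b0", "b3", max_paths=1000)
some_paths, some_truncated = enumerate_paths(cfg, "b0", "b3", max_paths=5)
```

Three diamonds already yield eight paths; twenty would yield over a million. When the truncated flag is set, consider falling back to a dataflow analysis over blocks, which scales with the number of blocks rather than the number of paths.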

Extensions and Challenges

Challenge 1: Symbolic Execution

Implement symbolic execution to explore all possible input values:

Track symbolic values through operations
Build path constraints
Use Z3 SMT solver to find satisfying inputs
Detect more subtle vulnerabilities (e.g., integer overflow with specific inputs)
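
The core loop of the steps above (collect the conditions that must hold along a path, add the bug condition, ask for a satisfying input) can be previewed without a solver. The sketch below is a deliberately small stand-in: it uses exhaustive search over 8-bit values where a real implementation would hand the constraints to Z3, and it models a hypothetical function that reverts when `amount > 100` and otherwise computes `balance + amount` without overflow checks.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Env = Dict[str, int]
Constraint = Callable[[Env], bool]


def solve(constraints: List[Constraint], names: List[str], bits: int = 8) -> Optional[Env]:
    """Return the first assignment that satisfies every constraint, or None.

    Brute force over all values; a stand-in for an SMT solver, feasible
    only because the domain here is tiny.
    """
    for values in product(range(2 ** bits), repeat=len(names)):
        env = dict(zip(names, values))
        if all(check(env) for check in constraints):
            return env
    return None


# Path constraint: execution continued past `if (amount > 100) revert();`
path_constraints: List[Constraint] = [lambda e: e["amount"] <= 100]

# Bug condition: the 8-bit addition wraps around
overflow: Constraint = lambda e: e["balance"] + e["amount"] > 255

witness = solve(path_constraints + [overflow], ["amount", "balance"])

# Same path with an extra guard `require(balance <= 100)`: no input overflows
guarded = solve(
    path_constraints + [lambda e: e["balance"] <= 100, overflow],
    ["amount", "balance"],
)
```

The first query returns a concrete input that triggers the overflow, which is exactly the counterexample you would show in a finding. The second returns `None`, meaning the guard makes the overflow unreachable on that path. Real `uint256` arithmetic makes brute force impossible, which is why the challenge calls for an SMT solver.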

Challenge 2: Cross-Contract Analysis

Extend analysis to understand interactions between contracts:

Build inter-contract call graphs
Track state shared across contracts
Detect cross-contract reentrancy
Analyze upgradeable proxy patterns
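
As a starting point for the call graph, the sketch below records which contract calls which and reports every contract that can be re-entered through a chain of external calls, meaning it sits on a cycle. The contract names and edges are invented for illustration. In your scanner the edges would come from the external calls your detectors already identify.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_call_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build caller -> callees adjacency from (caller, callee) pairs."""
    graph: Dict[str, Set[str]] = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())
    return graph


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Return all contracts reachable from start via one or more calls."""
    seen: Set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


def contracts_in_cycles(graph: Dict[str, Set[str]]) -> List[str]:
    """Contracts that can be called back into through some chain of calls."""
    return sorted(c for c in graph if c in reachable(graph, c))


# Hypothetical protocol: Vault calls Token, Token notifies Hook, Hook calls Vault.
edges = [
    ("Vault", "Token"),
    ("Token", "Hook"),
    ("Hook", "Vault"),
    ("Vault", "Oracle"),
]
graph = build_call_graph(edges)
reentrant_candidates = contracts_in_cycles(graph)
```

A cycle is only a candidate. It becomes a finding when some function on the cycle reads or writes state that another function on the cycle updates after its external call.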

Challenge 3: Machine Learning Classification

Train a model to classify vulnerability likelihood:

Extract features from AST nodes
Train on labeled vulnerability dataset
Combine with static analysis for confidence scoring
Reduce false positives through learned patterns

Challenge 4: Formal Verification Integration

Connect to formal verification tools:

Generate verification conditions
Interface with Certora, KEVM, or other provers
Provide counterexamples when verification fails
Combine lightweight analysis with heavyweight verification

Challenge 5: Real-Time Analysis

Build a Language Server Protocol (LSP) implementation:

Analyze as developers type
Show inline vulnerability warnings
Suggest automated fixes
Integrate with VS Code, IntelliJ, etc.
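
Most of the LSP work is plumbing; the part specific to your scanner is turning a finding into a Diagnostic object. The sketch below shows that conversion. The `Finding` dataclass and the severity mapping are assumptions for illustration, while the output shape (`range`, `severity`, `source`, `code`, `message`, zero-based positions, severity 1 to 4 for Error to Hint) follows the LSP specification.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Finding:
    """Hypothetical scanner finding with 1-based source coordinates."""
    message: str
    severity: str
    line: int
    column: int
    length: int
    detector: str


# LSP DiagnosticSeverity: 1 = Error, 2 = Warning, 3 = Information, 4 = Hint
LSP_SEVERITY = {"critical": 1, "high": 1, "medium": 2, "low": 3, "info": 4}


def to_diagnostic(finding: Finding) -> Dict[str, Any]:
    """Convert a finding into an LSP Diagnostic dictionary."""
    line = finding.line - 1          # LSP lines are zero-based
    start_char = finding.column - 1  # LSP characters are zero-based
    return {
        "range": {
            "start": {"line": line, "character": start_char},
            "end": {"line": line, "character": start_char + finding.length},
        },
        "severity": LSP_SEVERITY.get(finding.severity, 2),
        "source": "contract-scanner",  # hypothetical tool name shown in the editor
        "code": finding.detector,
        "message": finding.message,
    }


diagnostic = to_diagnostic(
    Finding(
        message="State change after external call in withdraw()",
        severity="critical",
        line=12,
        column=9,
        length=11,
        detector="reentrancy",
    )
)
```

One caveat: LSP character offsets count UTF-16 code units by default, so sources containing non-ASCII characters need an offset conversion that this sketch omits.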

Challenge 6: Bytecode Analysis

Analyze compiled EVM bytecode directly:

Decompile bytecode to recover structure
Analyze contracts without source code
Detect malicious contract patterns
Compare deployed bytecode to audited source
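
For the last item, a byte-for-byte comparison usually fails even when the source is identical, because the Solidity compiler appends CBOR-encoded metadata to the runtime bytecode, and the final two bytes give that metadata's length as a big-endian integer. The sketch below strips the trailer before comparing. The bytecode and metadata payloads are placeholders, not real compiler output, and the helper names are hypothetical.

```python
def strip_metadata(runtime: bytes) -> bytes:
    """Remove the trailing CBOR metadata block, if the length field is plausible."""
    if len(runtime) < 2:
        return runtime
    cbor_len = int.from_bytes(runtime[-2:], "big")
    if cbor_len + 2 > len(runtime):
        return runtime  # length field does not fit; leave the code untouched
    return runtime[: len(runtime) - (cbor_len + 2)]


def same_code(deployed: bytes, rebuilt: bytes) -> bool:
    """Compare two runtime bytecodes while ignoring their metadata trailers."""
    return strip_metadata(deployed) == strip_metadata(rebuilt)


def with_fake_metadata(code: bytes, payload: bytes) -> bytes:
    """Append a placeholder metadata payload followed by its 2-byte length."""
    return code + payload + len(payload).to_bytes(2, "big")


code = bytes.fromhex("6080604052600080fd")  # placeholder runtime code
deployed = with_fake_metadata(code, b"\xa1" + b"\x01" * 40)
rebuilt = with_fake_metadata(code, b"\xa1" + b"\x02" * 40)
tampered = with_fake_metadata(code + b"\x00", b"\xa1" + b"\x01" * 40)

matches = same_code(deployed, rebuilt)
differs = same_code(deployed, tampered)
```

This is a heuristic: it trusts the length field without decoding the CBOR, and it does not account for immutable values or linked library addresses, which also differ between a local build and the deployed code.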

Real-World Connections

Professional Security Auditing

Security audit firms like Trail of Bits, OpenZeppelin, and Consensys Diligence use static analysis tools as the first step in their audit process. Tools flag potential issues, then auditors investigate manually.

Bug Bounty Hunting

Bug bounty hunters use automated tools to scan new contracts for known vulnerabilities. Finding a critical bug in a major DeFi protocol can earn $1M+ in bounty rewards.

CI/CD Integration

DeFi teams integrate static analysis into their deployment pipelines so that no contract deploys without passing security checks. Slither and Mythril are commonly used for this.

Insurance and Risk Assessment

DeFi insurance protocols use automated analysis to assess contract risk. Higher-risk contracts pay higher premiums or are denied coverage.

Existing Tools in This Space

Your project will be similar to these production tools:

Slither (Trail of Bits): Python-based; lifts the compiler's AST into its own intermediate representation (SlithIR) for analysis
Mythril (ConsenSys): Symbolic execution on bytecode
Securify (ETH Zurich): Pattern-based analysis
Solhint: Linting and style checking
Echidna: Fuzzing and property testing

Understanding how these tools work will help you build a better one.

Resources

Primary References

“Mastering Ethereum” Chapter 9: Smart Contract Security - Essential security concepts
SWC Registry: swcregistry.io - Standardized vulnerability classification
Solidity Documentation: docs.soliditylang.org - AST and compiler details
“Principles of Program Analysis” by Nielson et al. - Formal dataflow analysis

Security Learning Resources

Damn Vulnerable DeFi: damnvulnerabledefi.xyz - CTF-style learning
Ethernaut: ethernaut.openzeppelin.com - Smart contract hacking challenges
Capture the Ether: capturetheether.com - More hacking challenges
Secureum Bootcamp: In-depth security training

Code References

Slither: github.com/crytic/slither - Reference implementation
solc-typed-ast: github.com/ConsenSys/solc-typed-ast - TypeScript AST library
py-solc-x: github.com/iamdefinitelyahuman/py-solc-x - Python Solidity compiler wrapper

Academic Papers

“Making Smart Contracts Smarter” (Luu et al.) - Oyente, first smart contract analyzer
“Securify: Practical Security Analysis of Smart Contracts” (Tsankov et al.)
“Slither: A Static Analysis Framework for Smart Contracts” (Feist et al.)

Self-Assessment Checklist

Before moving to the next project, verify:

What’s Next?

With your security scanner complete, you now have deep insight into smart contract vulnerabilities and how to detect them automatically. In Project 15: Decentralized Storage Client, you’ll explore a different aspect of Web3 infrastructure: building an IPFS-like content-addressed storage system with DHT-based peer discovery and BitTorrent-style file transfer.

The security mindset you’ve developed will carry forward: for every protocol you build from here on, you’ll ask “how could this be exploited?”