P14: Smart Contract Security Scanner
Project Overview
| Attribute | Value |
|---|---|
| Main Language | Python |
| Alternative Languages | Rust, TypeScript |
| Difficulty | Expert |
| Coolness Level | Level 5: Production-Ready Security Tool |
| Business Potential | High (Security Auditing Services, SaaS Tool) |
| Knowledge Area | Security / Auditing |
| Main Book | "Mastering Ethereum" by Andreas M. Antonopoulos & Gavin Wood |
Learning Objectives
By completing this project, you will:
- Master Solidity AST parsing: understanding how compilers represent smart contract code as abstract syntax trees and how to traverse them programmatically
- Implement control flow analysis: tracing all possible execution paths through a contract to detect state-dependent vulnerabilities
- Build pattern-matching engines: detecting known vulnerability signatures while understanding why each pattern is dangerous
- Develop false positive reduction strategies: using dataflow analysis, symbolic constraints, and heuristics to distinguish real bugs from benign patterns
- Understand smart contract security deeply: from reentrancy and integer overflows to access control and oracle manipulation
Deep Theoretical Foundation
The $3 Billion Problem
Smart contract vulnerabilities have resulted in billions of dollars in losses:
| Incident | Year | Loss | Vulnerability |
|---|---|---|---|
| The DAO | 2016 | $60M | Reentrancy |
| Parity Multisig | 2017 | $30M | Access Control |
| Parity Wallet Library | 2017 | $280M | Delegatecall + Access Control |
| bZx Flash Loan | 2020 | $8M | Oracle Manipulation |
| Cream Finance | 2021 | $130M | Flash Loan + Reentrancy |
| Ronin Bridge | 2022 | $625M | Private Key Compromise |
| Euler Finance | 2023 | $197M | Donation Attack |
Unlike traditional software where bugs cause crashes, smart contract bugs cause permanent financial loss. There's no "undo" button, no customer support to call, no rollback. Code is law, and flawed law can be exploited.
Why Static Analysis Matters
Manual auditing cannot scale. A skilled auditor might review 200-500 lines of Solidity per hour. Complex DeFi protocols contain tens of thousands of lines. At the same time, new contracts are deployed every minute.
Static analysis tools serve as the first line of defense:
- Speed: Analyze thousands of contracts per hour
- Consistency: Never miss a known pattern due to fatigue
- Coverage: Check every function, every path, every state
- Documentation: Generate reports that guide manual review
But static analysis has fundamental limitations:
- False positives: Flagging safe code as vulnerable
- False negatives: Missing actual vulnerabilities
- Semantic blindness: Not understanding business logic
- Undecidability: Because of the halting problem, no analysis can determine every runtime behavior exactly, so tools must approximate
The art of building a security scanner is balancing coverage (catching real bugs) against precision (not crying wolf).
Understanding the Solidity Compilation Pipeline
Source Code (.sol)
|
v
[Lexer/Parser]
|
v
Abstract Syntax Tree (AST)
|
v
[Type Checker]
|
v
Annotated AST
|
v
[IR Generator]
|
v
Yul Intermediate Representation
|
v
[Optimizer]
|
v
[Code Generator]
|
v
EVM Bytecode
For static analysis, we primarily work with the AST and sometimes the Yul IR. The AST preserves the full semantic structure of the source code.
The Solidity AST Structure
When you compile Solidity with --ast-compact-json, you get a tree structure:
{
"nodeType": "ContractDefinition",
"name": "Vulnerable",
"nodes": [
{
"nodeType": "FunctionDefinition",
"name": "withdraw",
"body": {
"nodeType": "Block",
"statements": [
{
"nodeType": "ExpressionStatement",
"expression": {
"nodeType": "FunctionCall",
"expression": {
"nodeType": "MemberAccess",
"memberName": "call"
}
}
}
]
}
}
]
}
Each node has:
- nodeType: The grammatical category (ContractDefinition, FunctionDefinition, Assignment, etc.)
- src: Source location (start:length:fileIndex, where start and length are byte offsets into the source file)
- id: Unique identifier for cross-references
- Child fields: Type-specific keys that hold child nodes (for example nodes, body, statements, expression)
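Because child nodes sit under different keys depending on the node type, a generic walk that descends into every dict and list is often the easiest starting point. The sketch below is one way to do that on the raw JSON; `iter_nodes`, `find_nodes`, and `SAMPLE_AST` are illustrative names, and the sample is the simplified tree shown above rather than full compiler output.

```python
from typing import Any, Iterator, List

def iter_nodes(node: Any) -> Iterator[dict]:
    """Yield every dict that carries a 'nodeType' key, depth-first."""
    if isinstance(node, dict):
        if "nodeType" in node:
            yield node
        # Children live under type-specific keys, so descend into all values.
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)

def find_nodes(root: dict, node_type: str) -> List[dict]:
    """Collect all nodes of one grammatical category."""
    return [n for n in iter_nodes(root) if n["nodeType"] == node_type]

# The simplified example tree from this section (hand-written, not solc output).
SAMPLE_AST = {
    "nodeType": "ContractDefinition",
    "name": "Vulnerable",
    "nodes": [{
        "nodeType": "FunctionDefinition",
        "name": "withdraw",
        "body": {
            "nodeType": "Block",
            "statements": [{
                "nodeType": "ExpressionStatement",
                "expression": {
                    "nodeType": "FunctionCall",
                    "expression": {"nodeType": "MemberAccess", "memberName": "call"},
                },
            }],
        },
    }],
}

low_level = [n["memberName"] for n in find_nodes(SAMPLE_AST, "MemberAccess")]
print(low_level)
```

A generic walk like this trades precision for robustness: it does not know which key a child came from, which is usually fine for "find all X" queries but not for building a CFG.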
Control Flow Graphs (CFG)
A Control Flow Graph represents all possible execution paths through a function:
[Entry]
|
v
[if (balance > 0)]
/ \
T F
/ \
v v
[call.value()] [return]
|
v
[balance = 0]
|
v
[Exit]
Nodes represent basic blocks (sequences of statements with no branches). Edges represent control flow (function calls, if/else, loops, reverts).
CFGs enable:
- Path analysis: What states are possible at each point?
- Dominator analysis: What must execute before reaching a point?
- Liveness analysis: What variables are "live" (used later)?
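As an illustration of dominator analysis, here is a minimal sketch of the classic iterative algorithm applied to the example graph above. The block labels and the `compute_dominators` function are assumptions made for this example; a real implementation would operate on the project's own basic-block objects.

```python
from typing import Dict, List, Set

def compute_dominators(succ: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iterative dominator computation: dom(n) = {n} | intersection of dom(p) over preds p."""
    nodes = set(succ)
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for src, targets in succ.items():
        for dst in targets:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # unreachable block: leave its set untouched
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# The example CFG from this section, with hypothetical block labels.
EXAMPLE_CFG = {
    "entry": ["cond"],
    "cond": ["call", "return"],
    "call": ["update"],
    "update": ["exit"],
    "return": ["exit"],
    "exit": [],
}

dominators = compute_dominators(EXAMPLE_CFG, "entry")
print(sorted(dominators["update"]))
```

In this graph the external call dominates the balance update, meaning the state write can only be reached after the call has happened. That is the ordering a reentrancy detector looks for.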
Dataflow Analysis
Dataflow analysis tracks how values flow through the program:
Reaching Definitions: Which assignments might reach a given use?
x = 1; // Definition D1
if (condition) {
x = 2; // Definition D2
}
use(x); // D1 and D2 both reach here
Taint Analysis: Which values are influenced by user input?
function transfer(address to, uint amount) public {
// 'to' and 'amount' are tainted (user-controlled)
balances[msg.sender] -= amount; // 'amount' flows into state
balances[to] += amount; // 'to' flows into state key
}
Available Expressions: What expressions have already been computed?
a = x + y;
b = x + y; // 'x + y' is available, could be reused
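These analyses share one shape: per-block gen/kill sets iterated to a fixed point. The following is a small sketch of reaching definitions for the D1/D2 example above, assuming three hypothetical blocks (`B0` holds `x = 1`, `B1` holds `x = 2`, `B2` holds `use(x)`); the function name and data layout are illustrative only.

```python
from typing import Dict, List, Set, Tuple

def reaching_definitions(
    blocks: List[str],
    preds: Dict[str, List[str]],
    gen: Dict[str, Set[str]],
    kill: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Forward dataflow: IN[b] = union of OUT[p]; OUT[b] = gen[b] | (IN[b] - kill[b])."""
    in_sets: Dict[str, Set[str]] = {b: set() for b in blocks}
    out_sets: Dict[str, Set[str]] = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_sets[b] = set().union(*(out_sets[p] for p in preds[b]))
            new_out = gen[b] | (in_sets[b] - kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets

blocks = ["B0", "B1", "B2"]
preds = {"B0": [], "B1": ["B0"], "B2": ["B0", "B1"]}  # the 'if' may be skipped
gen = {"B0": {"D1"}, "B1": {"D2"}, "B2": set()}
kill = {"B0": {"D2"}, "B1": {"D1"}, "B2": set()}       # each definition of x kills the other

reach_in, reach_out = reaching_definitions(blocks, preds, gen, kill)
print(sorted(reach_in["B2"]))
```

Taint analysis can reuse the same loop by treating function parameters as the initial "definitions" and propagating a tainted flag instead of definition labels.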
The Vulnerability Taxonomy
Category 1: Reentrancy
The Bug: External calls transfer control to an untrusted contract, which can call back before state updates complete.
Classic Pattern:
function withdraw() public {
uint amount = balances[msg.sender];
(bool success, ) = msg.sender.call{value: amount}(""); // External call
require(success);
balances[msg.sender] = 0; // State update AFTER call
}
The Attack:
contract Attacker {
Vulnerable target;
function attack() public payable {
target.deposit{value: 1 ether}();
target.withdraw();
}
receive() external payable {
if (address(target).balance >= 1 ether) {
target.withdraw(); // Re-enter before balance updated
}
}
}
Detection Pattern:
- Find external calls: call, send, transfer, or calls to external contracts
- Find state writes to the same storage slot
- Check if state write happens AFTER external call
- Account for the Checks-Effects-Interactions pattern
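The ordering check at the heart of this pattern can be sketched on a flattened list of events for a single execution path. The `Event` type and `writes_after_external_call` function are assumptions for illustration; a real detector would derive the events from the CFG and check every path, not one hand-written sequence.

```python
from typing import List, NamedTuple, Tuple

class Event(NamedTuple):
    kind: str    # "read", "write", or "external_call"
    target: str  # state variable name, or a description of the callee
    line: int

def writes_after_external_call(events: List[Event]) -> List[Tuple[Event, Event]]:
    """Return (call, write) pairs where a state write follows the first external call."""
    findings = []
    first_call = None
    for ev in events:
        if ev.kind == "external_call" and first_call is None:
            first_call = ev
        elif ev.kind == "write" and first_call is not None:
            findings.append((first_call, ev))
    return findings

# The classic vulnerable withdraw(): read, call, then write.
vulnerable = [
    Event("read", "balances", 2),
    Event("external_call", "msg.sender.call", 3),
    Event("write", "balances", 5),
]
# Checks-Effects-Interactions: the write happens before the call.
safe = [
    Event("read", "balances", 2),
    Event("write", "balances", 3),
    Event("external_call", "msg.sender.call", 4),
]

print(writes_after_external_call(vulnerable))
print(writes_after_external_call(safe))
```

This sketch deliberately ignores reentrancy guards and which storage slot is written, so on its own it would over-report. Those refinements belong in the false positive reduction phase.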
Variations:
- Cross-function reentrancy: Call from function A re-enters function B
- Cross-contract reentrancy: Via shared state in another contract
- Read-only reentrancy: Re-enter to read stale state, not modify it
Category 2: Integer Overflow/Underflow
The Bug: Arithmetic operations wrap around instead of reverting.
Pre-Solidity 0.8.0 Pattern:
function transfer(address to, uint256 amount) public {
balances[msg.sender] -= amount; // Underflows if amount > balance
balances[to] += amount; // Overflows if result > 2^256-1
}
Post-0.8.0: Solidity automatically checks for overflow/underflow. But unchecked blocks re-enable the vulnerability:
unchecked {
balances[msg.sender] -= amount; // Vulnerable again
}
Detection Pattern:
- Check compiler version (<0.8.0 is vulnerable by default)
- Find unchecked blocks
- Analyze arithmetic operations within unchecked contexts
- Check for safe-math library usage
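The version check in the first step can be approximated from the source text alone. The sketch below takes the lowest version mentioned in the pragma as a conservative guess; the function names are hypothetical, and the heuristic is imperfect (a pragma that states only an upper bound, such as `<0.9.0`, would be misread), so the compiler version recorded by the build is the more reliable source when it is available.

```python
import re
from typing import Optional, Tuple

PRAGMA_RE = re.compile(r"pragma\s+solidity\s+([^;]+);")
VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")

def lowest_pragma_version(source: str) -> Optional[Tuple[int, int, int]]:
    """Return the smallest version number mentioned in the first solidity pragma."""
    match = PRAGMA_RE.search(source)
    if match is None:
        return None
    versions = [
        (int(major), int(minor), int(patch or 0))
        for major, minor, patch in VERSION_RE.findall(match.group(1))
    ]
    return min(versions) if versions else None

def arithmetic_checked_by_default(source: str) -> bool:
    """True only if every version allowed by the pragma appears to be >= 0.8.0."""
    version = lowest_pragma_version(source)
    return version is not None and version >= (0, 8, 0)

print(arithmetic_checked_by_default("pragma solidity ^0.7.6;"))
print(arithmetic_checked_by_default("pragma solidity >=0.6.0 <0.9.0;"))
print(arithmetic_checked_by_default("pragma solidity ^0.8.19;"))
```

Treating a missing pragma as "not checked" errs toward reporting, which fits a scanner that prefers a low-confidence finding over a silent miss.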
Category 3: Access Control
The Bug: Critical functions lack proper authorization checks.
Vulnerable Pattern:
function setOwner(address newOwner) public {
owner = newOwner; // Anyone can call!
}
function mint(address to, uint256 amount) public {
_mint(to, amount); // Anyone can mint!
}
Detection Pattern:
- Identify sensitive operations: owner changes, minting, pausing, upgrading
- Check for onlyOwner, require(msg.sender == ...), or role-based checks
- Trace through modifiers to verify they contain actual checks
- Flag functions that modify critical state without authorization
Variations:
- Missing zero-address check: Setting owner to 0x0
- Front-running: Someone monitors mempool and beats legitimate transaction
- Centralization risk: Owner has too much power
Category 4: Unchecked External Calls
The Bug: Ignoring the boolean return value of low-level calls such as call, send, or delegatecall.
Vulnerable Pattern:
function withdraw() public {
payable(msg.sender).send(balance); // Returns false on failure, not checked
balance = 0;
}
The Risk: If send fails (e.g., recipient reverts or runs out of gas), the state still updates as if it succeeded.
Detection Pattern:
- Find all low-level calls: call, delegatecall, staticcall, send
- Check if return value is captured and checked
- Treat transfer separately: it reverts automatically on failure (safe but inflexible, since it forwards only a 2300 gas stipend)
- Flag unchecked calls as potential vulnerabilities
Category 5: Denial of Service
The Bug: Attackers can make the contract unusable.
Gas Limit Pattern:
function distributeRewards() public {
for (uint i = 0; i < users.length; i++) {
users[i].transfer(rewards[users[i]]); // Unbounded loop
}
}
If users.length grows large, the transaction exceeds the block gas limit.
Revert Pattern:
function claimFirst() public {
(bool success, ) = winner.call{value: prize}("");
require(success); // If winner is a contract that reverts, no one can claim
}
Detection Pattern:
- Find unbounded loops over user-controlled arrays
- Identify external calls in loops
- Check for pull-over-push payment patterns
- Flag require on external call results
Category 6: Oracle Manipulation
The Bug: Relying on spot prices from DEXs that can be manipulated within a transaction.
Vulnerable Pattern:
function getPrice() public view returns (uint256) {
return (reserve1 * 1e18) / reserve0; // Spot price from AMM
}
function liquidate(address user) public {
uint256 price = getPrice();
if (getUserCollateralValue(user, price) < getUserDebt(user)) {
// Liquidate...
}
}
The Attack: Attacker uses a flash loan to manipulate the AMM reserves, changing the "price" within a single transaction.
Detection Pattern:
- Identify price oracle calls
- Check if using spot prices vs time-weighted average prices (TWAP)
- Flag direct Uniswap/Curve reserve reads without safeguards
- Check for flash loan protection
Category 7: Front-Running and MEV
The Bug: Transaction ordering can be exploited by block producers (miners or validators) and other sophisticated actors.
Sandwich Attack Pattern:
// User submits: swap 100 ETH for DAI with 1% slippage
// Attacker front-runs: buy DAI, pushing price up
// User's trade executes at worse price
// Attacker back-runs: sell DAI for profit
Detection Pattern:
- Identify swaps with configurable slippage
- Check for commit-reveal patterns
- Flag time-sensitive operations without MEV protection
- Analyze for flashbots/private mempool compatibility
Category 8: Signature Replay
The Bug: Signed messages can be reused if not properly invalidated.
Vulnerable Pattern:
function executeWithSignature(
address to,
uint256 amount,
bytes memory signature
) public {
bytes32 hash = keccak256(abi.encodePacked(to, amount));
require(recoverSigner(hash, signature) == owner);
// Execute... but signature can be replayed!
}
Detection Pattern:
- Find signature verification (ecrecover, ECDSA.recover)
- Check if nonce is included in signed message
- Verify nonce is incremented after use
- Check for chain ID to prevent cross-chain replay
Complete Project Specification
Functional Requirements
- AST Parsing and Traversal
  - Parse Solidity AST JSON output
  - Build navigable tree structure
  - Support contract inheritance resolution
  - Handle multiple source files
- Control Flow Analysis
  - Build CFG for each function
  - Identify basic blocks and edges
  - Handle try/catch, if/else, loops
  - Model reverts and returns
- Vulnerability Detectors
  - Reentrancy (classic, cross-function, read-only)
  - Integer overflow/underflow (pre-0.8.0 and unchecked)
  - Access control issues
  - Unchecked external calls
  - Denial of service patterns
  - Timestamp dependence
  - tx.origin authentication
  - Uninitialized storage pointers
  - Delegatecall to untrusted contracts
- False Positive Reduction
  - Track SafeMath usage
  - Recognize reentrancy guards
  - Identify checks-effects-interactions pattern
  - Understand OpenZeppelin patterns
- Reporting
  - JSON output for tooling integration
  - Human-readable reports with source context
  - Severity classification (Critical, High, Medium, Low, Informational)
  - SARIF format for IDE integration
Non-Functional Requirements
- Performance: Analyze 10,000+ line contracts in under 30 seconds
- Accuracy: Less than 10% false positive rate on benchmark suite
- Extensibility: Plugin architecture for new detectors
- Maintainability: Clear separation between parsing, analysis, and reporting
Command-Line Interface
# Basic scan
$ solscan analyze contract.sol
[CRITICAL] Reentrancy in Vault.withdraw() at line 45
[HIGH] Unchecked call return in Treasury.send() at line 78
[MEDIUM] Missing zero-address check in Ownable.setOwner() at line 12
# Detailed JSON output
$ solscan analyze --format json contract.sol > report.json
# Scan compiled AST
$ solscan analyze --ast-json artifacts/contract.json
# Scan with specific detectors
$ solscan analyze --detectors reentrancy,overflow contract.sol
# Scan entire project
$ solscan analyze --recursive ./contracts/
# Generate SARIF for CI integration
$ solscan analyze --format sarif contract.sol > results.sarif
Solution Architecture
Module Structure
src/
├── main.py                      # CLI entry point
├── solscan/
│   ├── __init__.py
│   ├── parser/
│   │   ├── __init__.py
│   │   ├── ast_parser.py        # JSON AST parsing
│   │   ├── ast_nodes.py         # AST node types
│   │   ├── source_mapper.py     # Map AST to source locations
│   │   └── inheritance.py       # Resolve contract inheritance
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── cfg.py               # Control flow graph builder
│   │   ├── dataflow.py          # Dataflow analysis framework
│   │   ├── call_graph.py        # Inter-procedural call graph
│   │   ├── taint.py             # Taint tracking
│   │   └── state_tracker.py     # Storage slot analysis
│   ├── detectors/
│   │   ├── __init__.py
│   │   ├── base.py              # Detector interface
│   │   ├── reentrancy.py        # Reentrancy detector
│   │   ├── overflow.py          # Integer overflow detector
│   │   ├── access_control.py    # Access control detector
│   │   ├── unchecked_call.py    # Unchecked call detector
│   │   ├── dos.py               # Denial of service detector
│   │   ├── timestamp.py         # Timestamp dependence
│   │   ├── tx_origin.py         # tx.origin authentication
│   │   └── delegatecall.py      # Delegatecall vulnerabilities
│   ├── reporters/
│   │   ├── __init__.py
│   │   ├── text.py              # Human-readable output
│   │   ├── json_reporter.py     # JSON format
│   │   └── sarif.py             # SARIF for IDE integration
│   └── utils/
│       ├── __init__.py
│       ├── compiler.py          # Solidity compilation wrapper
│       └── patterns.py          # Common vulnerability patterns
├── tests/
│   ├── contracts/               # Test contracts with known vulnerabilities
│   ├── test_parser.py
│   ├── test_cfg.py
│   ├── test_detectors.py
│   └── test_benchmarks.py
└── benchmarks/
    ├── not_so_smart_contracts/  # Known vulnerable contracts
    └── false_positive_suite/    # Contracts that should NOT flag
Core Data Structures
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Set
from enum import Enum


class NodeType(Enum):
    SOURCE_UNIT = "SourceUnit"
    CONTRACT = "ContractDefinition"
    FUNCTION = "FunctionDefinition"
    MODIFIER = "ModifierDefinition"
    VARIABLE = "VariableDeclaration"
    BLOCK = "Block"
    IF_STATEMENT = "IfStatement"
    FOR_LOOP = "ForStatement"
    WHILE_LOOP = "WhileStatement"
    EXPRESSION = "ExpressionStatement"
    RETURN = "Return"
    REVERT = "RevertStatement"
    FUNCTION_CALL = "FunctionCall"
    FUNCTION_CALL_OPTIONS = "FunctionCallOptions"
    MEMBER_ACCESS = "MemberAccess"
    ASSIGNMENT = "Assignment"
    BINARY_OP = "BinaryOperation"
    IDENTIFIER = "Identifier"
    # ... more node types


@dataclass
class SourceLocation:
    """Maps AST node to source code location."""
    file_path: str
    start: int
    length: int
    line: int
    column: int

    def get_source_snippet(self, source: str, context_lines: int = 2) -> str:
        """Extract source code around this location."""
        ...


# eq=False keeps identity-based equality and hashing, so nodes can be used as
# dict keys and cyclic parent/child links are never compared field by field.
@dataclass(eq=False)
class ASTNode:
    """Base class for all AST nodes."""
    node_type: NodeType
    node_id: int
    src: SourceLocation
    parent: Optional['ASTNode'] = None
    children: List['ASTNode'] = field(default_factory=list)

    def get_children_of_type(self, node_type: NodeType) -> List['ASTNode']:
        """Recursively find all descendants of a given type."""
        ...

    def get_ancestor_of_type(self, node_type: NodeType) -> Optional['ASTNode']:
        """Find the nearest ancestor of a given type."""
        ...


# Subclass fields need defaults because the base class already ends with
# defaulted fields (dataclasses reject non-default fields after default ones).
@dataclass(eq=False)
class ContractNode(ASTNode):
    """Represents a contract definition."""
    name: str = ""
    base_contracts: List[str] = field(default_factory=list)
    is_abstract: bool = False
    is_interface: bool = False
    state_variables: List['VariableNode'] = field(default_factory=list)
    functions: List['FunctionNode'] = field(default_factory=list)
    modifiers: List['ModifierNode'] = field(default_factory=list)


@dataclass(eq=False)
class FunctionNode(ASTNode):
    """Represents a function definition."""
    name: str = ""
    visibility: str = "public"             # public, external, internal, private
    state_mutability: str = "nonpayable"   # pure, view, payable, nonpayable
    modifiers: List[str] = field(default_factory=list)
    parameters: List['VariableNode'] = field(default_factory=list)
    return_parameters: List['VariableNode'] = field(default_factory=list)
    body: Optional['BlockNode'] = None


@dataclass(eq=False)
class BasicBlock:
    """A sequence of statements with no branches."""
    id: int
    statements: List[ASTNode]
    predecessors: List['BasicBlock'] = field(default_factory=list)
    successors: List['BasicBlock'] = field(default_factory=list)


@dataclass
class ControlFlowGraph:
    """Control flow graph for a function."""
    function: FunctionNode
    entry: BasicBlock
    exit: BasicBlock
    blocks: List[BasicBlock]

    def get_paths(self) -> List[List[BasicBlock]]:
        """Enumerate all paths through the CFG (with loop unrolling limit)."""
        ...

    def dominators(self) -> Dict[BasicBlock, Set[BasicBlock]]:
        """Compute dominator sets for each block."""
        ...


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFORMATIONAL = "informational"


@dataclass
class Finding:
    """A detected vulnerability."""
    detector_name: str
    title: str
    description: str
    severity: Severity
    location: SourceLocation
    contract_name: str
    function_name: Optional[str]
    source_snippet: str
    recommendation: str
    references: List[str] = field(default_factory=list)
    confidence: str = "high"  # high, medium, low
Detector Interface
from __future__ import annotations  # lets annotations name classes defined later

from abc import ABC, abstractmethod
from typing import Dict, List, Optional

# ASTNode, FunctionNode, ControlFlowGraph, Severity, and Finding are the core
# data structures above; CFGBuilder, CallGraph, CallGraphBuilder, and
# TaintAnalysis belong to the analysis modules (cfg.py, call_graph.py, taint.py).


class BaseDetector(ABC):
    """Base class for all vulnerability detectors."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Unique detector identifier."""
        pass

    @property
    @abstractmethod
    def description(self) -> str:
        """What this detector finds."""
        pass

    @property
    @abstractmethod
    def severity(self) -> Severity:
        """Default severity of findings."""
        pass

    @abstractmethod
    def detect(self, context: AnalysisContext) -> List[Finding]:
        """
        Run detection and return findings.

        Args:
            context: Contains parsed AST, CFGs, call graph, etc.

        Returns:
            List of detected vulnerabilities.
        """
        pass


class AnalysisContext:
    """Shared context for all detectors."""

    def __init__(self, ast: ASTNode, source: str):
        self.ast = ast
        self.source = source
        self._cfgs: Dict[FunctionNode, ControlFlowGraph] = {}
        self._call_graph: Optional[CallGraph] = None
        self._taint_results: Dict[FunctionNode, TaintAnalysis] = {}

    def get_cfg(self, function: FunctionNode) -> ControlFlowGraph:
        """Get or build CFG for a function."""
        if function not in self._cfgs:
            self._cfgs[function] = CFGBuilder().build(function)
        return self._cfgs[function]

    def get_call_graph(self) -> CallGraph:
        """Get or build the inter-procedural call graph."""
        if self._call_graph is None:
            self._call_graph = CallGraphBuilder().build(self.ast)
        return self._call_graph
Phased Implementation Guide
Phase 1: AST Parsing
Goal: Parse Solidity compiler JSON output into navigable data structures.
Tasks:
- Use solc --ast-compact-json to generate AST
- Parse JSON into Python objects
- Build parent-child relationships
- Implement node type classification
- Handle source mapping
Validation:
ast = parse_solidity("contract.sol")
assert ast.node_type == NodeType.SOURCE_UNIT
assert len(ast.get_children_of_type(NodeType.CONTRACT)) >= 1
Hints if stuck:
- The solc compiler is available from Python via the py-solc-x package (imported as solcx)
- Start with a simple contract: contract A { function foo() public {} }
- Print the raw JSON to understand the structure
- Node IDs are unique and can be used for cross-references
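For the source-mapping task, the `src` string has to be turned into a line and column before a finding can be shown to a user. A minimal sketch, assuming the source text is available as a Python string; `decode_src` and `offset_to_line_col` are hypothetical helper names, and the column is counted in bytes, which matches characters only for ASCII source.

```python
from typing import Tuple

def decode_src(src: str) -> Tuple[int, int, int]:
    """Split a 'start:length:fileIndex' string into three integers."""
    start, length, file_index = (int(part) for part in src.split(":"))
    return start, length, file_index

def offset_to_line_col(source: str, byte_offset: int) -> Tuple[int, int]:
    """Convert a byte offset into a 1-based (line, column) pair."""
    prefix = source.encode("utf-8")[:byte_offset]
    line = prefix.count(b"\n") + 1
    last_newline = prefix.rfind(b"\n")  # -1 when the offset is on the first line
    column = byte_offset - last_newline
    return line, column

source = "contract A {\n    function foo() public {}\n}\n"
start = source.index("function")          # stand-in for a value read from the AST
src_field = f"{start}:24:0"               # shaped like a compiler-emitted src string

offset, length, file_index = decode_src(src_field)
print(offset_to_line_col(source, offset))
```

Working in bytes rather than characters matters as soon as a contract contains non-ASCII comments or string literals, because the compiler reports byte offsets.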
Phase 2: AST Traversal and Queries
Goal: Build utilities for navigating and querying the AST.
Tasks:
- Implement visitor pattern for AST traversal
- Build queries: "find all function calls", "find all assignments"
- Resolve identifiers to their declarations
- Handle inheritance (base contract functions)
Validation:
functions = ast.query(NodeType.FUNCTION)
calls = ast.query(NodeType.FUNCTION_CALL)
assert all(isinstance(f, FunctionNode) for f in functions)
Hints if stuck:
- Visitor pattern: define visit_FunctionDefinition(node), etc.
- For inheritance, look at baseContracts in ContractDefinition
- Use the referencedDeclaration field to resolve identifiers
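The visitor hint can be made concrete with name-based dispatch over the raw JSON. This is a sketch under the assumption that the tree is still plain dicts and lists; `ASTVisitor`, `CallCollector`, and the tiny `TREE` are illustrative, not part of any library.

```python
from typing import Any, List

class ASTVisitor:
    """Dispatches to visit_<nodeType>(node) when such a method exists."""

    def visit(self, node: Any) -> None:
        if isinstance(node, dict):
            if "nodeType" in node:
                handler = getattr(self, "visit_" + node["nodeType"], None)
                if handler is not None:
                    handler(node)
            # Keep descending regardless of whether a handler ran.
            for value in node.values():
                self.visit(value)
        elif isinstance(node, list):
            for item in node:
                self.visit(item)

class CallCollector(ASTVisitor):
    """Example query: record function names and member-style calls."""

    def __init__(self) -> None:
        self.functions: List[str] = []
        self.member_calls: List[str] = []

    def visit_FunctionDefinition(self, node: dict) -> None:
        self.functions.append(node["name"])

    def visit_FunctionCall(self, node: dict) -> None:
        callee = node.get("expression", {})
        if callee.get("nodeType") == "MemberAccess":
            self.member_calls.append(callee["memberName"])

# Hand-written miniature tree, far smaller than real compiler output.
TREE = {
    "nodeType": "SourceUnit",
    "nodes": [{
        "nodeType": "ContractDefinition",
        "name": "Vault",
        "nodes": [{
            "nodeType": "FunctionDefinition",
            "name": "withdraw",
            "body": {"nodeType": "Block", "statements": [{
                "nodeType": "ExpressionStatement",
                "expression": {
                    "nodeType": "FunctionCall",
                    "expression": {"nodeType": "MemberAccess", "memberName": "send"},
                },
            }]},
        }],
    }],
}

collector = CallCollector()
collector.visit(TREE)
print(collector.functions, collector.member_calls)
```

Each new query becomes a small subclass, which keeps traversal logic in one place and fits the plugin-style detector architecture.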
Phase 3: Control Flow Graph Construction
Goal: Build CFGs for each function.
Tasks:
- Identify basic block boundaries (branches, jumps)
- Create nodes for each basic block
- Add edges for control flow
- Handle try/catch, if/else, for/while, break/continue
- Model function returns and reverts
Validation:
cfg = build_cfg(function)
# Simple function should have: entry -> body -> exit
assert len(cfg.blocks) == 3
assert cfg.entry.successors == [body_block]
Hints if stuck:
- A new block starts after: if/else, loops, function calls
- revert() and return terminate a block
- Use a stack to track nested control structures
- Start with straight-line code, then add branches
Phase 4: First Detector - Unchecked Low-Level Calls
Goal: Implement a simple but useful detector.
Tasks:
- Find all call, delegatecall, staticcall, send expressions
- Check if the return value is captured
- Check if the captured value is used in a require/assert/if
- Report unchecked calls
Validation:
// Should flag:
payable(addr).send(amount);
addr.call{value: amount}("");
// Should NOT flag:
(bool success, ) = addr.call{value: amount}("");
require(success);
Hints if stuck:
- Look for MemberAccess nodes with memberName of call, send, etc.
- Check the parent node: is it an Assignment, a VariableDeclarationStatement such as (bool success, ) = ..., or a bare ExpressionStatement?
- If the result is assigned, trace the variable to see if it's checked
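The simplest case, a low-level call used as a bare statement, can be sketched directly on the JSON. The node shapes below are hand-built and simplified; in particular, the assumption that a call with `{value: ...}` wraps its callee in a `FunctionCallOptions` node should be confirmed against the output of the compiler version in use.

```python
from typing import List, Optional

LOW_LEVEL = {"call", "delegatecall", "staticcall", "send"}

def callee_member(call: dict) -> Optional[str]:
    """Return the member name being called, looking through call options."""
    callee = call.get("expression", {})
    if callee.get("nodeType") == "FunctionCallOptions":  # e.g. addr.call{value: x}("")
        callee = callee.get("expression", {})
    if callee.get("nodeType") == "MemberAccess":
        return callee.get("memberName")
    return None

def unchecked_low_level_calls(statements: List[dict]) -> List[dict]:
    """Flag statements that make a low-level call and discard its result."""
    flagged = []
    for stmt in statements:
        if stmt.get("nodeType") != "ExpressionStatement":
            continue  # result is assigned or declared; needs dataflow to judge
        expr = stmt.get("expression", {})
        if expr.get("nodeType") == "FunctionCall" and callee_member(expr) in LOW_LEVEL:
            flagged.append(stmt)
    return flagged

def member(name: str) -> dict:
    return {"nodeType": "MemberAccess", "memberName": name}

statements = [
    # payable(addr).send(amount);
    {"nodeType": "ExpressionStatement", "id": 1,
     "expression": {"nodeType": "FunctionCall", "expression": member("send")}},
    # addr.call{value: amount}("");
    {"nodeType": "ExpressionStatement", "id": 2,
     "expression": {"nodeType": "FunctionCall",
                    "expression": {"nodeType": "FunctionCallOptions",
                                   "expression": member("call")}}},
    # (bool success, ) = addr.call{value: amount}("");
    {"nodeType": "VariableDeclarationStatement", "id": 3,
     "initialValue": {"nodeType": "FunctionCall", "expression": member("call")}},
]

print([s["id"] for s in unchecked_low_level_calls(statements)])
```

This covers only discarded results. A captured value that is never tested still needs the "trace the variable" step from the hints.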
Phase 5: Reentrancy Detection
Goal: Detect classic reentrancy vulnerabilities.
Tasks:
- Identify external calls in each function
- Track state variable writes
- Determine if writes happen after calls (using CFG)
- Recognize reentrancy guards (mutex patterns)
- Handle modifiers that provide protection
Validation:
// Should flag:
function withdraw() public {
msg.sender.call{value: balances[msg.sender]}("");
balances[msg.sender] = 0;
}
// Should NOT flag:
function withdraw() public {
uint amount = balances[msg.sender];
balances[msg.sender] = 0;
msg.sender.call{value: amount}("");
}
Hints if stuck:
- Use the CFG to determine order of operations
- External calls are: .call(), .send(), .transfer(), and calls to external contracts
- State writes are assignments to state variables (look at the variable declaration's stateVariable field)
- Common guard pattern: require(!locked); locked = true; ...; locked = false;
Phase 6: Integer Overflow Detection
Goal: Detect arithmetic operations that might overflow.
Tasks:
- Check Solidity version for default behavior
- Find unchecked blocks
- Identify arithmetic operations within unchecked contexts
- Recognize SafeMath usage (for older contracts)
- Track user-controlled inputs that flow into arithmetic
Validation:
// Should flag (if <0.8.0 or in unchecked):
balances[msg.sender] -= amount;
// Should NOT flag (0.8.0+ outside unchecked):
balances[msg.sender] -= amount;
Hints if stuck:
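- The version pragma appears as a PragmaDirective node in the AST; the compiler version can also be parsed separately
- unchecked blocks have node type UncheckedBlock
- Look for BinaryOperation with operators: +, -, *, /, %, ** (compound assignments such as -= appear as Assignment nodes with an operator field)
- For SafeMath, check if arithmetic is wrapped in a function call like add, sub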
Phase 7: Access Control Detection
Goal: Identify functions that modify sensitive state without authorization.
Tasks:
- Identify sensitive operations (owner changes, minting, pausing, upgrading, selfdestruct)
- Check for authorization modifiers
- Trace modifier implementations to verify they actually check the caller
- Flag unprotected sensitive functions
Validation:
// Should flag:
function setOwner(address newOwner) public {
owner = newOwner;
}
// Should NOT flag:
function setOwner(address newOwner) public onlyOwner {
owner = newOwner;
}
Hints if stuck:
- Sensitive patterns: owner =, _mint(, selfdestruct(, pause(
- Modifiers are applied in the function's modifiers list
- Trace into the modifier body to find require(msg.sender == owner) or similar
- OpenZeppelin's Ownable uses the onlyOwner modifier
Phase 8: False Positive Reduction
Goal: Reduce noise by understanding context.
Tasks:
- Implement pattern recognition for safe patterns
- Track modifier effects (e.g., nonReentrant)
- Recognize standard library usage (OpenZeppelin)
- Implement confidence levels for findings
Validation:
// Should NOT flag (OpenZeppelin ReentrancyGuard):
function withdraw() public nonReentrant {
msg.sender.call{value: balances[msg.sender]}("");
balances[msg.sender] = 0;
}
Hints if stuck:
- Build a list of known safe patterns
- Check for imports from "@openzeppelin/"
- Look for mutex variables (_status, locked)
- If uncertain, report with lower confidence rather than suppressing
Phase 9: Reporting and Output Formats
Goal: Generate actionable reports.
Tasks:
- Implement text reporter with source context
- Implement JSON reporter for tooling
- Implement SARIF reporter for IDE integration
- Add severity classification
- Include remediation recommendations
Validation:
$ solscan analyze contract.sol
[CRITICAL] Reentrancy vulnerability
Location: contract.sol:45:5
Function: Vault.withdraw()
44 | function withdraw() public {
45 | msg.sender.call{value: balances[msg.sender]}("");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
46 | balances[msg.sender] = 0;
Description: External call occurs before state update, allowing reentrancy.
Recommendation: Update state before external call (Checks-Effects-Interactions pattern).
References:
- https://swcregistry.io/docs/SWC-107
Hints if stuck:
- SARIF spec: https://sarifweb.azurewebsites.net/
- Use the source location to extract context lines
- Include links to SWC (Smart Contract Weakness Classification)
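A SARIF report is a JSON document with a fixed skeleton: a list of runs, each naming the tool and carrying its results. The sketch below builds a minimal one; the finding dictionary keys (`detector`, `severity`, `message`, `file`, `line`, `column`) and the severity-to-level mapping are assumptions made for this example, and the output should be validated against the SARIF 2.1.0 schema before relying on it in CI.

```python
import json
from typing import Dict, List

# SARIF result levels are "error", "warning", "note", or "none";
# how scanner severities map onto them is a project decision.
SEVERITY_TO_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "informational": "note",
}

def to_sarif(findings: List[Dict], tool_name: str = "solscan") -> Dict:
    """Build a minimal SARIF 2.1.0 log from a list of finding dictionaries."""
    results = []
    for f in findings:
        results.append({
            "ruleId": f["detector"],
            "level": SEVERITY_TO_LEVEL[f["severity"]],
            "message": {"text": f["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["file"]},
                    "region": {"startLine": f["line"], "startColumn": f["column"]},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }

report = to_sarif([{
    "detector": "reentrancy",
    "severity": "critical",
    "message": "External call occurs before state update, allowing reentrancy.",
    "file": "contract.sol",
    "line": 45,
    "column": 5,
}])
print(json.dumps(report, indent=2))
```

Keeping the reporter a pure function from findings to a dictionary makes it easy to unit test without touching the file system.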
Phase 10: Performance and Scalability
Goal: Handle large codebases efficiently.
Tasks:
- Cache AST parsing results
- Parallelize detector execution
- Implement incremental analysis (only re-analyze changed files)
- Add timeout protection for complex analysis
- Benchmark on real-world contracts
Validation:
$ time solscan analyze --recursive ./large-project/
# Should complete in under 60 seconds for 50,000 lines
Hints if stuck:
- Use multiprocessing for parallel detector execution
- Cache CFGs as they're expensive to build
- Use file hashes to detect changes for incremental analysis
- Set recursion limits for deeply nested structures
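The file-hash hint might look like the following sketch, which stores one SHA-256 digest per file in a JSON cache and reports the files whose content changed since the last run. The function names and the cache layout are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import List

def file_digest(path: Path) -> str:
    """Hash file contents so that edits, not timestamps, trigger re-analysis."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(paths: List[Path], cache_file: Path) -> List[Path]:
    """Return the files whose digest differs from the cached one, then update the cache."""
    old = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    new = {str(p): file_digest(p) for p in paths}
    changed = [p for p in paths if old.get(str(p)) != new[str(p)]]
    cache_file.write_text(json.dumps(new, indent=2))
    return changed

# Typical use (paths are examples):
#   sources = sorted(Path("contracts").rglob("*.sol"))
#   to_analyze = changed_files(sources, Path(".solscan-cache.json"))
```

A changed file can also invalidate results for every contract that imports or inherits from it, so a complete incremental mode would re-analyze dependents as well.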
Testing Strategy
Unit Tests
def test_ast_parsing_simple_contract():
    source = """
    contract Test {
        function foo() public pure returns (uint) {
            return 42;
        }
    }
    """
    ast = parse_solidity_source(source)
    contracts = ast.get_children_of_type(NodeType.CONTRACT)
    assert len(contracts) == 1
    assert contracts[0].name == "Test"


def test_cfg_if_statement():
    source = """
    function test(bool x) public {
        if (x) {
            doA();
        } else {
            doB();
        }
    }
    """
    cfg = build_cfg_from_source(source, "test")
    # Entry -> Condition -> (True: doA | False: doB) -> Exit
    assert len(cfg.blocks) == 5
    assert len(cfg.entry.successors) == 1  # Goes to condition


def test_reentrancy_detector_positive():
    source = """
    contract Vulnerable {
        mapping(address => uint) balances;
        function withdraw() public {
            msg.sender.call{value: balances[msg.sender]}("");
            balances[msg.sender] = 0;
        }
    }
    """
    findings = run_detector(ReentrancyDetector(), source)
    assert len(findings) == 1
    assert findings[0].severity == Severity.CRITICAL


def test_reentrancy_detector_negative():
    source = """
    contract Safe {
        mapping(address => uint) balances;
        function withdraw() public {
            uint amount = balances[msg.sender];
            balances[msg.sender] = 0;
            msg.sender.call{value: amount}("");
        }
    }
    """
    findings = run_detector(ReentrancyDetector(), source)
    assert len(findings) == 0
Integration Tests (Real Vulnerabilities)
def test_the_dao_reentrancy():
    """Test detection on The DAO's actual vulnerable pattern."""
    source = load_contract("benchmarks/the_dao.sol")
    findings = analyze(source)
    reentrancy_findings = [f for f in findings if "reentrancy" in f.detector_name.lower()]
    assert len(reentrancy_findings) >= 1


def test_parity_multisig_access_control():
    """Test detection on Parity multisig vulnerability."""
    source = load_contract("benchmarks/parity_multisig.sol")
    findings = analyze(source)
    access_findings = [f for f in findings if "access" in f.detector_name.lower()]
    assert len(access_findings) >= 1
False Positive Tests
def test_no_false_positive_on_safe_patterns():
    """Verify we don't flag obviously safe code."""
    source = """
    contract Safe {
        mapping(address => uint) balances;
        bool locked;
        modifier nonReentrant() {
            require(!locked);
            locked = true;
            _;
            locked = false;
        }
        function withdraw() public nonReentrant {
            msg.sender.call{value: balances[msg.sender]}("");
            balances[msg.sender] = 0;
        }
    }
    """
    findings = analyze(source)
    critical_findings = [f for f in findings if f.severity == Severity.CRITICAL]
    assert len(critical_findings) == 0
Benchmark Tests
import time


def test_performance_large_contract():
    """Ensure analysis completes in reasonable time."""
    source = load_contract("benchmarks/large_defi_protocol.sol")
    start = time.perf_counter()  # monotonic clock, unaffected by system time changes
    findings = analyze(source)
    elapsed = time.perf_counter() - start
    assert elapsed < 30.0  # 30 seconds max
Common Pitfalls and Debugging
Pitfall 1: AST Version Differences
Problem: Solidity AST format changes between compiler versions.
Symptom: Parser crashes or misses nodes on certain contracts.
Solution:
from typing import List


def get_node_type(node: dict) -> str:
    # Handle both old and new AST formats
    return node.get("nodeType") or node.get("name")


def get_children(node: dict) -> List[dict]:
    # Different versions use different child keys
    children = []
    for key in ["nodes", "children", "statements", "body", "expression"]:
        if key in node:
            child = node[key]
            if isinstance(child, list):
                children.extend(child)
            elif isinstance(child, dict):  # skip null or scalar values
                children.append(child)
    return children
Pitfall 2: Inheritance Resolution
Problem: Functions in base contracts are not analyzed.
Symptom: Missing vulnerabilities in inherited functions.
Solution:
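def get_all_functions(contract: ContractNode, _seen: Optional[Set[str]] = None) -> List[FunctionNode]:
    """Collect functions from a contract and all of its bases."""
    seen = _seen if _seen is not None else set()
    if contract.name in seen:  # diamond inheritance: visit each base once
        return []
    seen.add(contract.name)
    functions = list(contract.functions)
    for base_name in contract.base_contracts:
        base = resolve_contract(base_name)  # project helper: name -> ContractNode or None
        if base:
            functions.extend(get_all_functions(base, seen))
    return functions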
Pitfall 3: Modifier Analysis
Problem: Treating modifiers as opaque, missing their effects.
Symptom: False positives when modifiers provide protection.
Solution:
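def modifier_has_requirement(modifier: ModifierNode) -> bool:
    """Check if modifier contains a require/assert/revert."""
    for node in modifier.body.get_all_descendants():
        # `revert CustomError();` is its own statement type
        if node.node_type == NodeType.REVERT:
            return True
        # require(...), assert(...), and revert(...) are ordinary FunctionCall
        # nodes whose callee is an Identifier with that name
        if node.node_type == NodeType.FUNCTION_CALL:
            callee = node.expression
            if callee.node_type == NodeType.IDENTIFIER and callee.name in {"require", "assert", "revert"}:
                return True
    return False


def modifier_is_reentrancy_guard(modifier: ModifierNode) -> bool:
    """Check for mutex pattern in modifier."""
    # Look for: locked = true; _; locked = false;
    ...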
Pitfall 4: External vs Internal Calls
Problem: Flagging internal function calls as external calls.
Symptom: False positives for reentrancy on internal calls.
Solution:
def is_external_call(call: FunctionCallNode) -> bool:
    callee = call.expression
    # addr.call{value: v}("") wraps the MemberAccess in a FunctionCallOptions node
    if callee.node_type == NodeType.FUNCTION_CALL_OPTIONS:
        callee = callee.expression
    if callee.node_type == NodeType.MEMBER_ACCESS:
        # Low-level calls and ether transfers on an address
        if callee.member_name in ["call", "delegatecall", "staticcall", "send", "transfer"]:
            return True
        # Check if target is an external contract
        target_type = get_type(callee.expression)
        if is_external_contract_type(target_type):
            return True
    return False
Pitfall 5: State Variable vs Local Variable
Problem: Confusing local variables with state variables.
Symptom: False positives for state modifications.
Solution:
def is_state_variable(var: VariableNode) -> bool:
    # State variables are declared at contract level
    parent = var.get_ancestor_of_type(NodeType.CONTRACT)
    if parent and var in parent.state_variables:
        return True
    # Check the stateVariable field in newer AST
    return getattr(var, "stateVariable", False)
Pitfall 6: Path Explosion in CFG Analysis
Problem: Exponential paths through complex functions.
Symptom: Analysis hangs or runs out of memory.
Solution:
def analyze_paths(cfg: ControlFlowGraph, max_paths: int = 1000) -> List[Path]:
    """Enumerate entry-to-exit paths, stopping once max_paths have been collected."""
    paths = []
    worklist = [(cfg.entry, [])]
    while worklist and len(paths) < max_paths:
        block, path = worklist.pop()
        new_path = path + [block]
        if block == cfg.exit:
            paths.append(new_path)
        else:
            for succ in block.successors:
                # Skip blocks already on this path, including self-loops
                if succ not in new_path:
                    worklist.append((succ, new_path))
    return paths
Extensions and Challenges
Challenge 1: Symbolic Execution
Implement symbolic execution to explore all possible input values:
- Track symbolic values through operations
- Build path constraints
- Use Z3 SMT solver to find satisfying inputs
- Detect more subtle vulnerabilities (e.g., integer overflow with specific inputs)
Challenge 2: Cross-Contract Analysis
Extend analysis to understand interactions between contracts:
- Build inter-contract call graphs
- Track state shared across contracts
- Detect cross-contract reentrancy
- Analyze upgradeable proxy patterns
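A starting point for the first and third bullets can be sketched as follows. The input format is an assumption: each edge is a hypothetical `(caller, callee)` pair of contract names, which a real scanner would extract from the external calls it finds. The function names `build_call_graph` and `reentrant_contracts` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: one (caller, callee) pair per external call found by the scanner
CallEdge = Tuple[str, str]


def build_call_graph(edges: List[CallEdge]) -> Dict[str, Set[str]]:
    """Map each contract to the set of contracts it calls externally."""
    graph: Dict[str, Set[str]] = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())  # make sure leaf contracts appear as keys
    return graph


def reentrant_contracts(graph: Dict[str, Set[str]]) -> List[str]:
    """Return contracts that can be reached again through a chain of external calls."""
    result = []
    for start in graph:
        stack = list(graph[start])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node == start:
                result.append(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
    return sorted(result)


edges = [("Vault", "Token"), ("Token", "Hook"), ("Hook", "Vault"), ("Vault", "Oracle")]
graph = build_call_graph(edges)
```

A contract on a call cycle is only a candidate for cross-contract reentrancy. Confirming a finding still requires checking whether state is written after the outgoing call.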
Challenge 3: Machine Learning Classification
Train a model to classify vulnerability likelihood:
- Extract features from AST nodes
- Train on labeled vulnerability dataset
- Combine with static analysis for confidence scoring
- Reduce false positives through learned patterns
Challenge 4: Formal Verification Integration
Connect to formal verification tools:
- Generate verification conditions
- Interface with Certora, KEVM, or other provers
- Provide counterexamples when verification fails
- Combine lightweight analysis with heavyweight verification
Challenge 5: Real-Time Analysis
Build a Language Server Protocol (LSP) implementation:
- Analyze as developers type
- Show inline vulnerability warnings
- Suggest automated fixes
- Integrate with VS Code, IntelliJ, etc.
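Inline warnings in an LSP server are sent as diagnostics, so one early step is converting scanner findings into that shape. The sketch below assumes a hypothetical finding dictionary with `line` (1-based), `col_start`, `col_end`, `severity`, `detector`, and `message` keys; the `contract-scanner` source label is also made up. The numeric severities follow the protocol's `DiagnosticSeverity` values (1 = Error, 2 = Warning, 3 = Information, 4 = Hint).

```python
from typing import Any, Dict

# Scanner severity (assumed names) -> LSP DiagnosticSeverity
LSP_SEVERITY = {"high": 1, "medium": 2, "low": 3, "info": 4}


def finding_to_diagnostic(finding: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one scanner finding into an LSP Diagnostic dictionary."""
    line = finding["line"] - 1  # LSP positions are zero-based
    return {
        "range": {
            "start": {"line": line, "character": finding.get("col_start", 0)},
            "end": {"line": line, "character": finding.get("col_end", 0)},
        },
        # Unknown severities fall back to Warning
        "severity": LSP_SEVERITY.get(finding["severity"], 2),
        "source": "contract-scanner",
        "code": finding["detector"],
        "message": finding["message"],
    }


example = finding_to_diagnostic({
    "line": 42, "col_start": 8, "col_end": 30,
    "severity": "high", "detector": "reentrancy",
    "message": "State written after external call",
})
```

The server would collect these dictionaries per file and send them in a `textDocument/publishDiagnostics` notification.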
Challenge 6: Bytecode Analysis
Analyze compiled EVM bytecode directly:
- Decompile bytecode to recover structure
- Analyze contracts without source code
- Detect malicious contract patterns
- Compare deployed bytecode to audited source
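Before attempting decompilation, a linear disassembly pass is a reasonable first step. The sketch below walks raw EVM bytecode, skips the immediate data that follows `PUSH1` through `PUSH32` (opcodes `0x60` to `0x7f`), and reports a few opcodes that often deserve a closer look. The choice of which opcodes count as risky, and the names `iter_opcodes` and `find_risky_opcodes`, are assumptions for illustration.

```python
from typing import Iterator, List, Tuple

# Opcodes flagged for manual review in this sketch (an illustrative selection)
RISKY_OPCODES = {0xF2: "CALLCODE", 0xF4: "DELEGATECALL", 0xFF: "SELFDESTRUCT"}


def iter_opcodes(bytecode: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, opcode) pairs, skipping the data bytes of PUSH instructions."""
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        yield pc, op
        if 0x60 <= op <= 0x7F:
            pc += op - 0x5F  # PUSH1..PUSH32 carry 1..32 bytes of immediate data
        pc += 1


def find_risky_opcodes(hex_code: str) -> List[Tuple[int, str]]:
    """Return (offset, mnemonic) for each flagged opcode in a hex bytecode string."""
    if hex_code.startswith("0x"):
        hex_code = hex_code[2:]
    code = bytes.fromhex(hex_code)
    return [(pc, RISKY_OPCODES[op]) for pc, op in iter_opcodes(code) if op in RISKY_OPCODES]


# PUSH1 0xf4, PUSH1 0x00, SSTORE, DELEGATECALL
sample = "0x60f4600055f4"
```

In `sample`, the `f4` byte at offset 1 is push data and is skipped, while the one at offset 5 is a real `DELEGATECALL`. A linear sweep can still misread non-code regions, such as the metadata the Solidity compiler appends to the bytecode, so results near the end of the code should be treated with caution.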
Real-World Connections
Professional Security Auditing
Security audit firms like Trail of Bits, OpenZeppelin, and Consensys Diligence use static analysis tools as the first step in their audit process. Tools flag potential issues, then auditors investigate manually.
Bug Bounty Hunting
Bug bounty hunters use automated tools to scan new contracts for known vulnerabilities. Finding a critical bug in a major DeFi protocol can earn $1M+ in bounty rewards.
CI/CD Integration
DeFi teams integrate static analysis into their deployment pipelines. No contract deploys without passing security checks. Tools like Slither and Mythril are commonly used.
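A deployment gate of this kind usually comes down to mapping findings to a process exit code. The sketch below assumes a hypothetical report format, a list of dictionaries with a `severity` key, and made-up severity names. It is not the output format of Slither or Mythril.

```python
from typing import Dict, List

# Assumed severity scale, lowest to highest
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def blocking_findings(findings: List[Dict[str, str]], threshold: str = "high") -> List[Dict[str, str]]:
    """Return the findings at or above the threshold severity."""
    minimum = SEVERITY_RANK[threshold]
    # Unrecognized severities are ranked lowest and therefore never block
    return [f for f in findings if SEVERITY_RANK.get(f["severity"], 0) >= minimum]


def ci_exit_code(findings: List[Dict[str, str]], threshold: str = "high") -> int:
    """Return 1 to fail the pipeline job, 0 to let it continue."""
    return 1 if blocking_findings(findings, threshold) else 0


report = [
    {"detector": "reentrancy", "severity": "high"},
    {"detector": "naming-convention", "severity": "info"},
]
```

A pipeline step would load the scanner's report and pass `ci_exit_code(report)` to `sys.exit`, so that any finding at or above the threshold stops the deployment.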
Insurance and Risk Assessment
DeFi insurance protocols use automated analysis to assess contract risk. Higher-risk contracts pay higher premiums or are denied coverage.
Existing Tools in This Space
Your project will be similar to these production tools:
- Slither (Trail of Bits): Python-based, uses AST analysis
- Mythril (ConsenSys): Symbolic execution on bytecode
- Securify (ETH Zurich): Pattern-based analysis
- Solhint: Linting and style checking
- Echidna: Fuzzing and property testing
Understanding how these tools work will help you build a better one.
Resources
Primary References
- "Mastering Ethereum" Chapter 9: Smart Contract Security - Essential security concepts
- SWC Registry: swcregistry.io - Standardized vulnerability classification
- Solidity Documentation: docs.soliditylang.org - AST and compiler details
- "Principles of Program Analysis" by Nielson et al. - Formal dataflow analysis
Security Learning Resources
- Damn Vulnerable DeFi: damnvulnerabledefi.xyz - CTF-style learning
- Ethernaut: ethernaut.openzeppelin.com - Smart contract hacking challenges
- Capture the Ether: capturetheether.com - More hacking challenges
- Secureum Bootcamp: In-depth security training
Code References
- Slither: github.com/crytic/slither - Reference implementation
- solc-typed-ast: github.com/ConsenSys/solc-typed-ast - TypeScript AST library
- py-solc-x: github.com/iamdefinitelyahuman/py-solc-x - Python Solidity compiler wrapper
Academic Papers
- "Making Smart Contracts Smarter" (Luu et al.) - Oyente, first smart contract analyzer
- "Securify: Practical Security Analysis of Smart Contracts" (Tsankov et al.)
- "Slither: A Static Analysis Framework for Smart Contracts" (Feist et al.)
Self-Assessment Checklist
Before moving to the next project, verify:
- I can explain how Solidity compiles to AST and why AST analysis is useful
- I understand control flow graphs and can manually construct one for a simple function
- I can describe at least 5 common smart contract vulnerabilities and their root causes
- My scanner correctly detects reentrancy in classic vulnerable patterns
- My scanner does NOT flag safe patterns (checks-effects-interactions, reentrancy guards)
- I understand why false positives are costly and how to reduce them
- I can explain the difference between static analysis, symbolic execution, and fuzzing
- My tool produces actionable reports with source context and remediation advice
- I have tested on real vulnerable contracts and achieved acceptable accuracy
- I understand the limitations of static analysis and when manual review is necessary
What's Next?
With your security scanner complete, you now have deep insight into smart contract vulnerabilities and how to detect them automatically. In Project 15: Decentralized Storage Client, you'll explore a different aspect of Web3 infrastructure - building an IPFS-like content-addressed storage system with DHT-based peer discovery and BitTorrent-style file transfer.
The security mindset you've developed will carry forward: for every protocol you build from here on, you'll be asking "how could this be exploited?"