Project 2: BusyBox Script Compatibility Checker
Quick Reference
| Attribute | Details |
|---|---|
| Difficulty | Beginner |
| Time Estimate | Weekend (8-16 hours) |
| Primary Language | Shell (POSIX sh) |
| Alternative Language | Python |
| Knowledge Area | Shell Scripting / POSIX Compatibility |
| Software/Tools | BusyBox, ShellCheck, Alpine Linux, Docker |
| Main Book | “Effective Shell” by Dave Kerr |
| Prerequisites | Basic shell scripting, command-line familiarity |
Learning Objectives
By completing this project, you will:
- Understand the difference between POSIX shell and Bash and why it matters for portability
- Identify common bash-isms that break on BusyBox/ash shells
- Recognize GNU-specific command options that fail on BusyBox utilities
- Write portable shell scripts that work across Unix-like systems
- Build a static analysis tool that detects compatibility issues before deployment
- Master POSIX-compatible alternatives to common bash features
Theoretical Foundation
Core Concepts
1. What Is BusyBox?
BusyBox is a single executable (~1 MB) that provides stripped-down versions of approximately 400 Unix utilities. On Alpine Linux:
┌─────────────────────────────────────────────────────────────────────┐
│ BusyBox Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ /bin/busybox (single binary, ~1 MB) │
│ │ │
│ ├── ls → symlink to busybox │
│ ├── grep → symlink to busybox │
│ ├── awk → symlink to busybox │
│ ├── sed → symlink to busybox │
│ ├── sh (ash) → symlink to busybox │
│ ├── find → symlink to busybox │
│ └── ... 400+ more applets │
│ │
│ When you run "ls", BusyBox checks argv[0] and runs the │
│ appropriate applet with simplified functionality. │
│ │
└─────────────────────────────────────────────────────────────────────┘
Verification:
# On Alpine Linux
ls -la /bin/ls
# lrwxrwxrwx 1 root root 12 /bin/ls -> /bin/busybox
# List all BusyBox applets
busybox --list | wc -l
# Several hundred applets; the exact count depends on how BusyBox was built
2. POSIX Shell vs Bash
POSIX (Portable Operating System Interface) defines a standard for Unix-like operating systems. The POSIX shell specification describes a minimal, portable shell. Bash extends POSIX with many features:
┌─────────────────────────────────────────────────────────────────────┐
│ POSIX Shell vs Bash Features │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ POSIX Shell (sh, ash, dash) Bash Extensions │
│ ────────────────────────── ─────────────── │
│ [ test ] [[ extended test ]] │
│ $(command), $((arithmetic)) (( )) and let arithmetic │
│ $var, ${var}, ${var#prefix} ${var:0:5} substrings │
│ case/esac arrays: arr=(a b c) │
│ if/then/else/fi associative arrays │
│ for/while/until brace expansion: {1..10} │
│ functions: name() { ... } here-strings: <<< │
│ pipes and redirection process substitution: <() │
│ exit status ($?) regex: =~ │
│ local (not POSIX, but ash/dash support it) BASHPID, BASH_VERSION │
│ │
│ ✓ Works everywhere ✗ Bash-only features │
│ │
└─────────────────────────────────────────────────────────────────────┘
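Most entries in the right-hand column have a POSIX substitute. The sketch below shows one possible replacement for five of them. The variable names are illustrative, and because cut -c counts bytes on some implementations, the substring example assumes ASCII text.

```sh
#!/bin/sh
# POSIX stand-ins for a few bash-only constructs (illustrative values).

file="report.tar.gz"

# bash: ${file:0:6}  ->  POSIX: cut characters 1-6
first6=$(printf '%s\n' "$file" | cut -c1-6)

# bash: [[ $file =~ \.gz$ ]]  ->  POSIX: case with a glob pattern
case "$file" in
    *.gz) is_gz=yes ;;
    *)    is_gz=no ;;
esac

# bash: arr=(alpha beta gamma)  ->  POSIX: positional parameters
set -- alpha beta gamma
count=$#
last_item=""
for item in "$@"; do last_item=$item; done

# bash: wc -l <<< "$text"  ->  POSIX: printf into a pipe
text="one
two"
lines=$(printf '%s\n' "$text" | wc -l | tr -d ' ')   # tr strips padding some wc versions add

# bash: {1..3}  ->  POSIX: a counting while loop
seq_out=""
i=1
while [ "$i" -le 3 ]; do
    seq_out="$seq_out$i "
    i=$((i + 1))
done

printf '%s %s %s %s %s [%s]\n' "$first6" "$is_gz" "$count" "$last_item" "$lines" "$seq_out"
# prints: report yes 3 gamma 2 [1 2 3 ]
```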
3. GNU vs BusyBox Utilities
GNU Coreutils provides feature-rich implementations with many options. BusyBox provides minimal implementations:
┌─────────────────────────────────────────────────────────────────────┐
│ GNU Coreutils vs BusyBox Comparison │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Command GNU Options BusyBox Support │
│ ─────── ─────────── ─────────────── │
│ grep -P (Perl regex) ❌ Not supported │
│ -o (only matching) ✓ Supported │
│ --color ✓ Supported │
│ │
│ sed -i '' (BSD-style suffix) ❌ '' is read as the script │
│ -E (extended) ✓ Supported │
│ -z (null-delimited) ❌ Not supported │
│ │
│ find -printf ❌ Not supported │
│ -name ✓ Supported │
│ -exec ✓ Supported │
│ │
│ date -d "string" ⚠️ Different syntax │
│ +%format ✓ Supported │
│ │
│ xargs -r (no-run-if-empty) ⚠️ Different behavior │
│ -0 (null-delimited) ✓ Supported │
│ │
│ cp --parents ❌ Not supported │
│ -r (recursive) ✓ Supported │
│ │
│ stat -c (custom format) ⚠️ Limited format options │
│ │
└─────────────────────────────────────────────────────────────────────┘
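BusyBox is configured at build time, so the table is a guide rather than a guarantee. One quick way to check a particular system is to try each option and see whether it fails. The probe helper below is a hypothetical sketch: the options it tries come from the table, a non-zero exit is treated as "not supported" (a heuristic, since a command can fail for other reasons), and the results depend entirely on the local build.

```sh
#!/bin/sh
# probe LABEL CMD [ARGS...]: run CMD silently and report whether it succeeded.
probe() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf '%-14s supported\n' "$label"
    else
        printf '%-14s NOT supported\n' "$label"
    fi
}

# Options taken from the comparison table.
probe "grep -P"      sh -c "printf '123\n' | grep -qP '[0-9]'"
probe "find -printf" find . -maxdepth 0 -printf '%f\n'
probe "sed -z"       sh -c "printf 'a\n' | sed -z 's/a/b/'"
probe "timeout N"    timeout 1 true
```

Running the same script on a developer machine and inside an Alpine container gives a side-by-side view of which GNU options a script can rely on.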
Why This Matters
The Real-World Problem:
You write a Dockerfile that works perfectly on your Ubuntu development machine:
FROM alpine:latest
COPY build.sh /
RUN chmod +x /build.sh && /build.sh
But build.sh contains:
#!/bin/bash
arr=(one two three)
if [[ $DEBUG == "true" ]]; then
grep -P '\d+' logfile.txt
fi
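Result: The container build fails with cryptic errors. Because /bin/bash does not exist, the shell reports the script itself as missing:
/bin/sh: /build.sh: not found
Working around that with sh /build.sh only moves the failure to the array on line 2 (ash aborts at the first syntax error, so the [[ line is never reached), with an error like:
/build.sh: line 2: syntax error: unexpected "("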
The Cost:
- Production deployments fail
- CI/CD pipelines break
- Hours spent debugging “works on my machine” issues
- Larger images and a larger attack surface when full bash/GNU tools are installed just to work around the problem
Historical Context
Why does Alpine use BusyBox?
Alpine Linux was designed for embedded systems, routers, and containers where:
- Disk space is precious (an Alpine base image is roughly 5 MB, versus 70+ MB for a typical Ubuntu image)
- Attack surface must be minimal
- Simplicity aids security auditing
- Speed matters (fewer bytes to load)
BusyBox was originally written by Bruce Perens in the mid-1990s for the Debian installer, specifically to fit a bootable rescue and install system on a single floppy disk (1.44 MB). It became the standard for embedded Linux.
Why does this create problems?
Most developers learn shell scripting on systems with:
- Bash as the default interactive shell (Ubuntu, RHEL, and older macOS releases; macOS now defaults to zsh)
- GNU Coreutils with full feature sets
- Tutorials that assume these tools
This creates a “works on my machine” problem when scripts move to Alpine.
Common Misconceptions
Misconception 1: “sh is just a symlink to bash”
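On Ubuntu/Debian, /bin/sh is a symlink to dash (Debian Almquist Shell) by default, not bash. On Alpine, it's ash (the Almquist shell built into BusyBox). Neither is bash: dash rejects bash-isms outright, and BusyBox ash accepts only a few (such as a limited [[ ]]) when built with its bash-compatibility option, so scripts that rely on them are still not portable.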
# Check what sh actually is
ls -la /bin/sh
# Alpine: /bin/sh -> /bin/busybox
# Ubuntu: /bin/sh -> /bin/dash
Misconception 2: “If it works in bash, it’s portable”
Bash is one of the most feature-rich shells. Testing only in bash guarantees nothing about portability.
Misconception 3: “Just install bash on Alpine”
While possible (apk add bash), this:
- Increases image size by ~5 MB
- Adds potential security vulnerabilities
- Defeats the purpose of using Alpine
- Doesn’t fix GNU utility compatibility issues
Misconception 4: “ShellCheck catches everything”
ShellCheck is excellent but primarily focuses on bash correctness. While it can warn about bash-isms when targeting sh, it doesn’t know about BusyBox-specific limitations in utilities like grep, sed, or find.
Project Specification
What You Will Build
A command-line tool that analyzes shell scripts for Alpine Linux/BusyBox compatibility issues. The tool will:
- Detect bash-isms that fail on ash/POSIX shells
- Identify GNU-specific command options not supported by BusyBox
- Suggest POSIX-compatible alternatives
- Provide severity levels (error, warning, info)
- Support checking individual files or entire directories
- Output in human-readable or machine-parseable formats
Functional Requirements
- Shebang Detection
  - Warn on #!/bin/bash or #!/usr/bin/env bash
  - Suggest #!/bin/sh for portability
- Bash Syntax Detection
  - [[ ]] extended test syntax
  - Arrays: arr=(a b c), ${arr[@]}, ${arr[0]}
  - Associative arrays: declare -A
  - Brace expansion: {1..10}, {a,b,c}
  - Here-strings: <<<
  - Process substitution: <(), >()
  - $'...' ANSI-C quoting
  - == in test (should be =)
  - function name() (should be name())
  - source (should be .)
  - let and (( )) arithmetic
  - ${var:offset:length} substring
  - ${var/pattern/replacement} substitution
  - local -a, local -A typed local variables
  - read -a (read into array)
  - printf -v (assign to variable)
  - select loops
- GNU Utility Option Detection
  - grep -P (Perl regex)
  - grep --include, grep --exclude
  - sed -i'' and sed -i '' (empty backup suffix; the spaced BSD form breaks GNU and BusyBox sed)
  - sed -z (null-delimited)
  - find -printf
  - find -regex (with certain patterns)
  - cp --parents
  - date -d (parse date string)
  - stat -c (with unsupported format codes)
  - xargs -r (different behavior)
  - readlink -f (works but verify)
  - ls -G (hide group)
  - timeout command
  - realpath command
  - mktemp -t (template handling)
- Report Generation
  - Line number and column
  - Problematic code snippet
  - Severity level
  - Suggested fix
  - Reference documentation
Non-Functional Requirements
- Process files under 10 MB within 1 second
- Support UTF-8 encoded scripts
- Zero dependencies for shell implementation
- Python implementation should work with Python 3.7+ (dataclasses and re.Pattern are not available in 3.6)
- Exit code 0 if no errors, 1 if errors found, 2 if warnings only
Example Usage and Output
Input script (deploy.sh):
#!/bin/bash

# Configuration
declare -A config
config[host]="prod.example.com"
config[port]=22
# Arrays for servers
servers=(web1 web2 web3)

# Check if debug mode
if [[ $DEBUG == "true" ]]; then
set -x
fi

# Find and process logs
find /var/log -name "*.log" -printf "%f\n" | while read file; do
grep -P '\d{4}-\d{2}-\d{2}' "$file" > /tmp/dates.txt

# Get line count
count=$(wc -l <<< "$(cat /tmp/dates.txt)")
echo "Found $count date patterns in $file"
done

# In-place update
sed -i'' 's/old/new/g' /etc/config.conf

# Check status
for server in ${servers[@]}; do
timeout 5 nc -z "$server" 22
done
Tool output:
$ ./alpine-check.sh deploy.sh
=== Alpine/BusyBox Compatibility Report ===
File: deploy.sh
================================================================================
Line 1: #!/bin/bash
[ERROR] Bash shebang detected
Alpine uses ash (BusyBox shell). Bash is not installed by default.
FIX: Change to #!/bin/sh or install bash with: apk add bash
Line 4: declare -A config
[ERROR] Associative arrays require bash
BusyBox ash does not support associative arrays.
FIX: Use separate variables or a configuration file
Line 8: servers=(web1 web2 web3)
[ERROR] Array syntax not supported in ash
BusyBox ash does not support bash arrays.
FIX: Use positional parameters: set -- web1 web2 web3
Then access with: $1, $2, $3 or "$@"
Line 11: if [[ $DEBUG == "true" ]]; then
[ERROR] Extended test [[ ]] not supported
BusyBox ash uses [ ] (single brackets) only.
FIX: Use [ "$DEBUG" = "true" ]
Note: Use = not == for string comparison
Line 16: find /var/log -name "*.log" -printf "%f\n"
[ERROR] -printf option not supported by BusyBox find
BusyBox find lacks -printf formatting.
FIX: Use: find /var/log -name "*.log" -exec basename {} \;
Or: find /var/log -name "*.log" | xargs -n1 basename
Line 17: grep -P '\d{4}-\d{2}-\d{2}'
[ERROR] -P (Perl regex) not supported by BusyBox grep
BusyBox grep does not support Perl regular expressions.
FIX: Install GNU grep: apk add grep
Or use extended regex: grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}'
Line 20: count=$(wc -l <<< "$(cat /tmp/dates.txt)")
[ERROR] Here-string (<<<) not supported in ash
BusyBox ash does not support here-strings.
FIX: Use a pipe: count=$(cat /tmp/dates.txt | wc -l)
Or: count=$(wc -l < /tmp/dates.txt)
Line 25: sed -i'' 's/old/new/g' /etc/config.conf
[WARNING] Fragile empty suffix on sed -i
-i'' only works because the shell strips the empty quotes; the BSD/macOS form sed -i '' (with a space) makes GNU and BusyBox sed treat '' as the script.
FIX: Use: sed -i 's/old/new/g' /etc/config.conf
Line 28: for server in ${servers[@]}; do
[ERROR] Array expansion ${arr[@]} requires bash
BusyBox ash does not support arrays.
FIX: Use positional parameters with: for server in "$@"; do
Line 29: timeout 5 nc -z "$server" 22
[WARNING] timeout command may not be available
BusyBox includes timeout but behavior may differ.
Verify with: busybox timeout --help
================================================================================
Summary:
Errors: 8
Warnings: 2
Info: 0
This script will NOT work on Alpine Linux without modifications.
Run with --fix to see a corrected version.
Real World Outcome
After running your tool and fixing the issues, the corrected script:
#!/bin/sh
# Configuration (using separate variables instead of associative array)
config_host="prod.example.com"
config_port=22
# Servers using positional parameters
set -- web1 web2 web3
# Check if debug mode (POSIX-compatible test)
if [ "$DEBUG" = "true" ]; then
set -x
fi
# Find and process logs (without -printf)
find /var/log -name "*.log" | while IFS= read -r path; do
file=$(basename "$path")
grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$path" > /tmp/dates.txt
# Get line count (without here-string)
count=$(wc -l < /tmp/dates.txt)
echo "Found $count date patterns in $file"
done
# In-place update (BusyBox syntax)
sed -i 's/old/new/g' /etc/config.conf
# Check status (using positional parameters)
for server in "$@"; do
timeout 5 nc -z "$server" 22
done
Solution Architecture
High-Level Design
┌─────────────────────────────────────────────────────────────────────────────┐
│ BusyBox Script Compatibility Checker │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Input Handler │
│ • Parse command-line arguments (--json, --fix, --severity) │
│ • Read file(s) or stdin │
│ • Handle encoding (UTF-8) │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Pattern Database │
│ ┌─────────────────────┐ ┌─────────────────────┐ ┌──────────────────────┐ │
│ │ Bash-ism Rules │ │ GNU Option Rules │ │ Shebang Rules │ │
│ │ (syntax errors) │ │ (command opts) │ │ (interpreter) │ │
│ └─────────────────────┘ └─────────────────────┘ └──────────────────────┘ │
│ │
│ Each rule contains: │
│ • Pattern (regex or string match) │
│ • Severity (error, warning, info) │
│ • Description │
│ • Fix suggestion │
│ • Reference URL │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Line Scanner │
│ • For each line in file: │
│ - Skip comments (but check shebang on line 1) │
│ - Apply each rule pattern │
│ - Record matches with line/column info │
│ • Handle multi-line constructs │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Report Generator │
│ • Aggregate findings │
│ • Sort by severity/line number │
│ • Format output (human-readable, JSON, or fixed script) │
│ • Set exit code │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Output │
│ • stdout: Report or fixed script │
│ • stderr: Error messages │
│ • Exit code: 0 (ok), 1 (errors), 2 (warnings only) │
└─────────────────────────────────────────────────────────────────────────────┘
Key Components
1. Pattern Database Structure
Each compatibility rule is defined with:
Rule:
id: unique identifier (e.g., "BASH_ARRAY")
pattern: regex or literal string to match
severity: error | warning | info
message: human-readable description
fix: suggested replacement or workaround
reference: URL to documentation
context: where to check (shebang, line, command)
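In the shell version, one lightweight way to hold this structure is a text table with one rule per line. The sketch below assumes a hypothetical field order ID|SEVERITY|MESSAGE|PATTERN and a hypothetical apply_rules function; it omits the fix and reference fields for brevity. The pattern goes last because read assigns the rest of the line, including any | inside the regex, to the final variable.

```sh
#!/bin/sh
# Hypothetical rule table: ID|SEVERITY|MESSAGE|ERE pattern (pattern last).
RULES='BASH_DOUBLE_BRACKET|error|Extended test [[ ]] not supported|\[\[
BASH_HERE_STRING|error|Here-string (<<<) not supported|<<<
BASH_PROCESS_SUBST|error|Process substitution not supported|[<>]\(
GNU_GREP_PERL|error|grep -P not supported by BusyBox|grep([[:space:]]+-[a-zA-Z]+)*[[:space:]]+(-[a-zA-Z]*P|--perl-regexp)
BASH_SOURCE|warning|Use . instead of source|^[[:space:]]*source[[:space:]]'

# apply_rules LINE LINENO: print LINENO:SEVERITY:ID:MESSAGE for each matching rule.
apply_rules() {
    line=$1
    lineno=$2
    printf '%s\n' "$RULES" | while IFS='|' read -r id severity message pattern; do
        if printf '%s\n' "$line" | grep -qE -- "$pattern"; then
            printf '%s:%s:%s:%s\n' "$lineno" "$severity" "$id" "$message"
        fi
    done
}
```

Adding a rule then means adding a line to RULES rather than writing a new function, at the cost of every pattern needing to be expressible as a single ERE.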
2. Scanner Logic
For each file:
Read all lines
For line 1:
Check shebang rules
For each line:
Skip if starts with # (comment) and line > 1
For each bash-ism rule:
If pattern matches:
Record finding
For each command pattern:
If command with GNU option found:
Record finding
Return list of findings
Data Structures
Shell Implementation (using shell constructs):
# Findings stored as formatted strings
# Format: LINE:SEVERITY:RULE_ID:MESSAGE
# Example:
# "11:error:BASH_DOUBLE_BRACKET:Extended test [[ ]] not supported"
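Because each finding is a single colon-delimited line, the summary and exit code reduce to a read loop. The hypothetical summarize_findings helper below is one way to implement the exit-code rule from the non-functional requirements (1 if any error, 2 if only warnings, otherwise 0); it assumes the rule ID contains no colon, while the message may.

```sh
#!/bin/sh
# summarize_findings: read LINE:SEVERITY:RULE_ID:MESSAGE records on stdin,
# print the summary block, and return 0 (clean), 1 (errors) or 2 (warnings only).
summarize_findings() {
    errors=0 warnings=0 infos=0
    while IFS=: read -r lineno severity rule_id message; do
        case "$severity" in
            error)   errors=$((errors + 1)) ;;
            warning) warnings=$((warnings + 1)) ;;
            info)    infos=$((infos + 1)) ;;
        esac
    done
    printf 'Summary:\n  Errors: %d\n  Warnings: %d\n  Info: %d\n' "$errors" "$warnings" "$infos"
    if [ "$errors" -gt 0 ]; then return 1; fi
    if [ "$warnings" -gt 0 ]; then return 2; fi
    return 0
}
```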
Python Implementation:
from dataclasses import dataclass
import re
@dataclass
class Finding:
file_path: str
line_number: int
column: int
severity: str # "error", "warning", "info"
rule_id: str
message: str
snippet: str
fix: str
reference: str
@dataclass
class Rule:
id: str
pattern: re.Pattern
severity: str
message: str
fix: str
reference: str = ""
context: str = "line" # "shebang", "line", "command"
Algorithm Overview
Main Algorithm:
function check_file(path):
lines = read_file(path)
findings = []
# Check shebang
if lines[0] matches bash shebang:
findings.append(bash_shebang_finding)
# Check each line
for i, line in enumerate(lines):
# Skip comments (except shebang)
if line starts with "#" and i > 0:
continue
# Check bash-isms
for rule in bash_rules:
if rule.pattern matches line:
findings.append(create_finding(rule, line, i))
# Check GNU options
for cmd_rule in command_rules:
if cmd_rule.pattern matches line:
findings.append(create_finding(cmd_rule, line, i))
return findings
Implementation Guide
Development Environment Setup
Option 1: Shell-only development (recommended for learning)
# Alpine container for testing
docker run -it --rm alpine:latest sh
# Create work directory
mkdir -p /workspace
cd /workspace
# Verify BusyBox shell
echo $0 # Prints: sh
readlink -f /bin/sh # Prints: /bin/busybox
# Test a feature ash definitely lacks: arrays
arr=(one two)
# Result: syntax error: unexpected "(" - proves arrays don't work
Option 2: Python development
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# No external dependencies needed for basic version
# Optional: install pytest for testing
pip install pytest
Test Container Setup:
# Dockerfile.test
FROM alpine:latest
RUN apk add --no-cache coreutils grep sed findutils
# GNU versions now shadow the BusyBox applets of the same name;
# call BusyBox explicitly for comparison, e.g.: busybox grep, busybox sed
WORKDIR /workspace
COPY . .
Project Structure
Shell Implementation:
alpine-check/
├── alpine-check.sh # Main script
├── rules/
│ ├── bash-isms.sh # Bash syntax patterns
│ └── gnu-options.sh # GNU utility patterns
├── tests/
│ ├── test-samples/ # Sample scripts to check
│ │ ├── bash-only.sh # Script with bash-isms
│ │ ├── gnu-heavy.sh # Script with GNU options
│ │ └── posix-clean.sh # POSIX-compliant script
│ └── run-tests.sh # Test runner
└── README.md
Python Implementation:
alpine_check/
├── alpine_check/
│ ├── __init__.py
│ ├── main.py # CLI entry point
│ ├── scanner.py # File scanner
│ ├── rules.py # Rule definitions
│ ├── reporter.py # Output formatting
│ └── fixer.py # Auto-fix suggestions
├── tests/
│ ├── test_scanner.py
│ ├── test_rules.py
│ └── fixtures/ # Sample scripts
├── pyproject.toml
└── README.md
The Core Question You’re Answering
“How do I know if my shell script will work on Alpine Linux before I deploy it?”
This is the gap you’re filling. Currently, developers must:
- Write a script on their dev machine (Ubuntu, macOS)
- Deploy to Alpine container
- Watch it fail
- Debug cryptic syntax errors
- Search Stack Overflow
- Repeat
Your tool provides immediate feedback during development.
Concepts to Understand First
Before implementing, ensure you can answer:
- What is a shebang and how does it affect script execution?
  - The #! line tells the kernel which interpreter to run the script with
  - #!/bin/bash explicitly requires bash; the script fails to start if bash is missing
- What is the difference between [ and [[?
  - [ is a command (a shell built-in, usually also available as /usr/bin/[)
  - [[ is bash syntax (a reserved word), not a command
  - [[ allows ==, pattern matching, and &&/|| inside the brackets, and does not word-split unquoted variables
- Why do arrays fail in POSIX sh?
  - Arrays are a bash extension
  - POSIX only provides the positional parameters ($1, $2, ..., expanded together as "$@" or $*)
- What makes a regex “Perl-compatible”?
  - PCRE includes features like \d, \w, lookahead, etc.
  - BRE and ERE (POSIX) use bracket expressions such as [0-9] and [a-zA-Z_] instead
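The PCRE shorthand classes map mechanically onto POSIX bracket expressions, which is enough for a --fix mode to rewrite many grep -P calls as grep -E. The pcre_to_ere function below is a hypothetical sketch: it handles only \d, \D, \w, \W, \s and \S outside bracket expressions, and it does not attempt lookarounds, lazy quantifiers, or escaped backslashes such as \\d.

```sh
#!/bin/sh
# pcre_to_ere PATTERN: rewrite common PCRE shorthand classes as POSIX
# bracket expressions so the pattern can be used with grep -E.
pcre_to_ere() {
    printf '%s\n' "$1" | sed \
        -e 's/\\d/[0-9]/g' \
        -e 's/\\D/[^0-9]/g' \
        -e 's/\\w/[A-Za-z0-9_]/g' \
        -e 's/\\W/[^A-Za-z0-9_]/g' \
        -e 's/\\s/[[:space:]]/g' \
        -e 's/\\S/[^[:space:]]/g'
}
```

For example, pcre_to_ere '\d{4}-\d{2}' produces [0-9]{4}-[0-9]{2}, which matches the ERE suggested in the example report.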
Questions to Guide Your Design
Architecture Questions:
- Should the tool be a single file or modular?
- How do you handle multi-line constructs like here-documents?
- Should you parse the AST or use regex on raw lines?
Feature Questions:
- Should you detect issues inside strings? (echo "[[ test ]]" is fine)
- How do you handle sourced/included files?
- Should you support configuration files for custom rules?
Output Questions:
- What output formats are needed (text, JSON, SARIF)?
- How verbose should the default output be?
- Should the tool suggest fixes or just report issues?
Thinking Exercise
Before writing code, trace through this script manually:
#!/bin/bash
files=($(find . -name "*.txt"))
for f in "${files[@]}"; do
if [[ -f $f ]]; then
count=$(wc -l <<< "$(cat $f)")
echo "$f: $count lines"
fi
done
Exercise:
- List every line that would fail on Alpine
- For each issue, identify the exact error ash would produce
- Write a POSIX-compatible version
Expected Analysis:
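- Line 1: #!/bin/bash - fails, bash not installed
- Line 2: files=(...) - syntax error, arrays not supported
- Line 2: $(...) inside the array - would work if arrays worked (though, unquoted, it splits filenames containing spaces)
- Line 3: "${files[@]}" - array expansion, not supported
- Line 4: [[ -f $f ]] - extended test, not supported
- Line 5: <<< - here-string, not supported
Note that ash aborts at the first syntax error, so running the script with sh reports only the line 2 error; the later problems appear only after the earlier ones are fixed.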
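One possible POSIX answer to the exercise is sketched below. It is wrapped in a hypothetical count_txt_lines function so the search root is a parameter rather than a hard-coded "."; filenames containing newlines are not handled, and wc -l on the file counts a missing final newline slightly differently from the original here-string version.

```sh
#!/bin/sh
# count_txt_lines [DIR]: print "PATH: N lines" for every *.txt file under DIR.
count_txt_lines() {
    # Stream paths through a pipe instead of collecting them in an array.
    find "${1:-.}" -name "*.txt" | while IFS= read -r f; do
        if [ -f "$f" ]; then                         # [ ] instead of [[ ]]
            count=$(wc -l < "$f" | tr -d ' ')        # redirect instead of <<<
            echo "$f: $count lines"
        fi
    done
}
```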
Hints in Layers
Hint 1 - Starting Point: Begin with the simplest check: detecting bash shebangs. This is a literal string match:
check_shebang() {
line="$1"
case "$line" in
"#!/bin/bash"*|"#!/usr/bin/env bash"*)
echo "Bash shebang detected"
;;
esac
}
Hint 2 - Pattern Matching Approach:
For shell implementation, use case statements with glob patterns. For more complex patterns, use grep -E:
check_double_brackets() {
line="$1"
# Match [[ ... ]] anywhere in the line (this naive version also matches inside strings)
echo "$line" | grep -qE '\[\[.*\]\]' && echo "Found [[ ]]"
}
Hint 3 - Command Option Detection: Extract the command first, then check its options:
check_grep_options() {
line="$1"
# Check if line contains grep
case "$line" in
*grep*)
# Check for -P option
echo "$line" | grep -qE 'grep([[:space:]]+-[a-zA-Z]+)*[[:space:]]+(-[a-zA-Z]*P|--perl-regexp)' && \
echo "grep -P not supported"
;;
esac
}
Hint 4 - Complete Rule Structure: For the shell version, store rules as functions with embedded metadata:
# Rule: BASH_ARRAY
# Severity: error
# Pattern: name=(...)
check_bash_array() {
line="$1"
lineno="$2"
# Match: var=(...) and var+=(...), including var=($(cmd)), but not var=$(...)
if echo "$line" | grep -qE '^[[:space:]]*[a-zA-Z_][a-zA-Z0-9_]*\+?=\('; then
echo "$lineno:error:BASH_ARRAY:Array syntax not supported"
echo " FIX: Use positional parameters: set -- val1 val2 val3"
fi
}
Interview Questions This Project Prepares You For
- “What’s the difference between POSIX shell and Bash?”
  - Answer: Bash extends POSIX with arrays, [[ ]], process substitution, here-strings, etc. POSIX sh is the portable subset.
- “Why might a shell script work on Ubuntu but fail on Alpine?”
  - Answer: Alpine uses BusyBox ash (a POSIX-style shell) by default, not bash. Also, its utilities are BusyBox versions with fewer options.
- “How would you make a shell script portable across different Unix systems?”
  - Answer: Use #!/bin/sh, avoid bash-isms, use only POSIX-defined utility options, and test on multiple systems.
- “What’s the difference between grep -E and grep -P?”
  - Answer: -E uses Extended Regular Expressions (ERE, POSIX); -P uses Perl-Compatible Regular Expressions (PCRE). PCRE is not portable.
- “How do you handle arrays in POSIX sh?”
  - Answer: Use positional parameters (set -- val1 val2, then $1, "$@") or newline-separated strings.
- “What does BusyBox provide and why is it used?”
  - Answer: BusyBox is a single binary providing 400+ Unix utilities. It is used in embedded systems and Alpine to keep the system small.
- “How would you build a static analysis tool for shell scripts?”
  - Answer: Either parse the shell grammar into an AST (complex) or use pattern matching on lines (simpler but less accurate).
Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| POSIX Shell | “Effective Shell” by Dave Kerr | Chapters on portability |
| Shell Scripting | “Classic Shell Scripting” by Robbins & Beebe | Chapter 6: Variables, Making Decisions |
| Regular Expressions | “Mastering Regular Expressions” by Friedl | BRE vs ERE vs PCRE |
| Unix Philosophy | “The Art of Unix Programming” by Raymond | Chapters 7-8: Multiprogramming, Minilanguages |
| Alpine Linux | Alpine Wiki | Comparison with other distros |
Implementation Phases
Phase 1: Basic Shebang and Syntax Detection (2-3 hours)
Goals:
- Detect bash shebangs
- Detect [[ ]] extended tests
- Detect array declarations
Shell Implementation:
#!/bin/sh
# alpine-check.sh - Check scripts for Alpine compatibility
# Check a single line for issues
check_line() {
line="$1"
lineno="$2"
# Check for bash shebang (only line 1)
if [ "$lineno" -eq 1 ]; then
case "$line" in
"#!/bin/bash"*|"#!/usr/bin/env bash"*)
printf "Line %d: %s\n" "$lineno" "$line"
printf " [ERROR] Bash shebang - Alpine uses ash, not bash\n"
printf " FIX: Use #!/bin/sh or install: apk add bash\n\n"
;;
esac
fi
# Check for [[ ]]
if echo "$line" | grep -qE '\[\[.*\]\]'; then
printf "Line %d: %s\n" "$lineno" "$line"
printf " [ERROR] Extended test [[ ]] not supported in ash\n"
printf " FIX: Use [ ] with proper quoting\n\n"
fi
# Check for arrays
if echo "$line" | grep -qE '^[[:space:]]*[a-zA-Z_][a-zA-Z0-9_]*\+?=\('; then
printf "Line %d: %s\n" "$lineno" "$line"
printf " [ERROR] Array syntax not supported in ash\n"
printf " FIX: Use positional parameters: set -- val1 val2\n\n"
fi
}
# Main: read file and check each line
check_file() {
file="$1"
lineno=0
printf "=== Checking: %s ===\n\n" "$file"
while IFS= read -r line || [ -n "$line" ]; do
lineno=$((lineno + 1))
check_line "$line" "$lineno"
done < "$file"
}
# Entry point
for file in "$@"; do
if [ -f "$file" ]; then
check_file "$file"
else
printf "Error: %s not found\n" "$file" >&2
fi
done
Test:
# Create test script
cat > /tmp/test.sh << 'EOF'
#!/bin/bash
arr=(one two three)
if [[ $x == "test" ]]; then
echo "match"
fi
EOF
# Run checker
./alpine-check.sh /tmp/test.sh
Phase 2: Complete Bash-ism Detection (3-4 hours)
Goals:
- Add all bash-ism patterns
- Improve output formatting
- Add severity levels
Additional patterns to implement:
# Here-strings
check_here_string() {
if echo "$line" | grep -qE '<<<'; then
report_error "$lineno" "Here-string (<<<) not supported"
report_fix "Use echo and pipe: echo \"text\" | cmd"
fi
}
# Process substitution
check_process_substitution() {
if echo "$line" | grep -qE '<\(|>\('; then
report_error "$lineno" "Process substitution <() >() not supported"
report_fix "Use temporary files or named pipes"
fi
}
# Brace expansion
check_brace_expansion() {
if echo "$line" | grep -qE '\{(-?[0-9]+\.\.-?[0-9]+|[a-zA-Z]\.\.[a-zA-Z])\}|(^|[^$])\{[^{}[:space:]]*,[^{}[:space:]]*\}'; then
report_error "$lineno" "Brace expansion not supported"
report_fix "For numbers, use: seq 1 10"
report_fix "For letters, list explicitly: a b c"
fi
}
# == in test
check_double_equals() {
if echo "$line" | grep -qE '\[[[:space:]].*=='; then
report_warning "$lineno" "Use = not == for string comparison"
fi
}
# source vs .
check_source() {
if echo "$line" | grep -qE '^[[:space:]]*source[[:space:]]'; then
report_warning "$lineno" "'source' may not work; use '.' instead"
fi
}
# function keyword
check_function_keyword() {
if echo "$line" | grep -qE '^[[:space:]]*function[[:space:]]+[a-zA-Z_]'; then
report_warning "$lineno" "'function' keyword is not portable"
report_fix "Use: name() { ... }"
fi
}
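The rule functions above call report_error, report_warning and report_fix, which the guide does not define. A minimal sketch, assuming the helpers read the current source line from the same global line variable the rules use, and print in the layout of the example report:

```sh
#!/bin/sh
# Hypothetical reporting helpers for the rule functions.
# Counters only survive if the helpers run in the current shell,
# not inside $( ) or a pipeline.
ERRORS=0
WARNINGS=0

report_error() {
    # $1 = line number, $2 = message; $line is the current source line
    ERRORS=$((ERRORS + 1))
    printf 'Line %s: %s\n  [ERROR] %s\n' "$1" "${line-}" "$2"
}

report_warning() {
    WARNINGS=$((WARNINGS + 1))
    printf 'Line %s: %s\n  [WARNING] %s\n' "$1" "${line-}" "$2"
}

report_fix() {
    # $1 = suggestion; attaches to the finding printed just before it
    printf '  FIX: %s\n' "$1"
}
```

With these loaded, setting line and lineno and then calling, say, check_here_string prints one finding in the report format.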
Phase 3: GNU Utility Detection (2-3 hours)
Goals:
- Detect GNU-specific command options
- Provide BusyBox alternatives or apk install suggestions
Command patterns:
# grep -P (Perl regex)
check_grep_perl() {
if echo "$line" | grep -qE 'grep([[:space:]]+-[a-zA-Z]+)*[[:space:]]+(-[a-zA-Z]*P|--perl-regexp)'; then
report_error "$lineno" "grep -P not supported by BusyBox"
report_fix "Install GNU grep: apk add grep"
report_fix "Or use ERE: grep -E '[0-9]{4}'"
fi
}
# find -printf
check_find_printf() {
if echo "$line" | grep -qE 'find[[:space:]].*-printf'; then
report_error "$lineno" "find -printf not supported by BusyBox"
report_fix "Install GNU find: apk add findutils"
report_fix "Or use: find ... -exec basename {} \\;"
fi
}
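# sed -i with an empty quoted suffix
check_sed_inplace() {
if echo "$line" | grep -qE "sed[[:space:]]+-i([[:space:]]*''|[[:space:]]*\"\")"; then
report_warning "$lineno" "sed -i with an empty quoted suffix: -i'' is just -i, and the BSD form -i '' breaks GNU and BusyBox sed"
report_fix "Use: sed -i 's/old/new/' file"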
fi
}
# date -d
check_date_parse() {
if echo "$line" | grep -qE 'date[[:space:]]+(-d|--date)'; then
report_warning "$lineno" "date -d syntax differs in BusyBox"
report_fix "BusyBox uses: date -D FORMAT -d STRING"
report_fix "Or install coreutils: apk add coreutils"
fi
}
# cp --parents
check_cp_parents() {
if echo "$line" | grep -qE 'cp[[:space:]].*--parents'; then
report_error "$lineno" "cp --parents not supported by BusyBox"
report_fix "Install GNU coreutils: apk add coreutils"
fi
}
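Several fixes above suggest installing the GNU package, so a small lookup from command name to Alpine package keeps those messages consistent. The gnu_package_for function is hypothetical, and the package names reflect Alpine's usual naming (coreutils, findutils, gawk, and so on); confirm them with apk search before relying on them.

```sh
#!/bin/sh
# gnu_package_for CMD: print the Alpine package that provides the GNU
# version of CMD, or return 1 if the command is not in the table.
gnu_package_for() {
    case "$1" in
        grep)       echo grep ;;
        sed)        echo sed ;;
        find|xargs) echo findutils ;;
        awk)        echo gawk ;;
        cp|date|stat|ls|mktemp|readlink|realpath|timeout) echo coreutils ;;
        *)          return 1 ;;
    esac
}
```

A rule can then build its suggestion as report_fix "Install GNU $cmd: apk add $(gnu_package_for "$cmd")".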
Phase 4: Output Formatting and CLI (2-3 hours)
Goals:
- Parse command-line options
- Support JSON output
- Generate summary statistics
CLI options:
usage() {
cat << 'EOF'
Usage: alpine-check.sh [OPTIONS] FILE...
Options:
-h, --help Show this help message
-j, --json Output in JSON format
-q, --quiet Only show errors (no warnings/info)
-v, --verbose Show all findings including info
-f, --fix Show corrected script on stdout
--severity LVL Filter by severity (error, warning, info)
-- End of options
Examples:
alpine-check.sh script.sh
alpine-check.sh -j *.sh
alpine-check.sh --fix broken.sh > fixed.sh
EOF
}
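# Defaults (overridden by the options below)
OUTPUT_FORMAT="text"
VERBOSITY="normal"
MODE="report"
SEVERITY=""
# Parse options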
while [ $# -gt 0 ]; do
case "$1" in
-h|--help)
usage
exit 0
;;
-j|--json)
OUTPUT_FORMAT="json"
shift
;;
-q|--quiet)
VERBOSITY="quiet"
shift
;;
-v|--verbose)
VERBOSITY="verbose"
shift
;;
-f|--fix)
MODE="fix"
shift
;;
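--severity)
if [ $# -lt 2 ]; then
printf "Missing value for --severity\n" >&2
exit 3
fi
SEVERITY="$2"
shift 2
;;
--)
shift
break
;;
-*)
printf "Unknown option: %s\n" "$1" >&2
exit 3 # usage error: 0, 1 and 2 are reserved for report results
;;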
*)
break
;;
esac
done
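The --json flag only sets OUTPUT_FORMAT; the findings still have to be serialized. A minimal sketch, assuming one JSON object per finding per line (JSON Lines rather than a single JSON array) and two hypothetical helpers, json_escape and emit_json_finding. Only backslashes and double quotes are escaped, which is enough for single-line shell snippets but not for arbitrary control characters such as tabs.

```sh
#!/bin/sh
# json_escape STRING: escape backslashes and double quotes for a JSON string.
json_escape() {
    printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# emit_json_finding FILE LINE SEVERITY RULE_ID MESSAGE: print one JSON object.
emit_json_finding() {
    printf '{"file":"%s","line":%d,"severity":"%s","rule_id":"%s","message":"%s"}\n' \
        "$(json_escape "$1")" "$2" "$3" "$4" "$(json_escape "$5")"
}
```

Emitting one object per line keeps the shell side simple; a consumer that needs a single array can wrap the output, for example with jq -s.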
Phase 5: Testing and Edge Cases (2-3 hours)
Goals:
- Create comprehensive test suite
- Handle edge cases
- Ensure no false positives
Test cases to include:
# tests/test-samples/false_positives.sh
#!/bin/sh
# These should NOT trigger warnings
# String containing [[
echo "Use [[ for bash"
# Comment with array syntax
# arr=(this is a comment)
# Quoted grep -P
echo 'grep -P is not portable'
# Inside single quotes
sed 's/\[\[/test/'
# Heredoc content
cat << 'EOF'
arr=(inside heredoc)
[[ inside heredoc ]]
EOF
# tests/test-samples/all_bashisms.sh
#!/bin/bash
# Arrays
arr=(one two three)
declare -a indexed
declare -A assoc
arr+=("four")
echo "${arr[@]}"
echo "${arr[0]}"
echo "${#arr[@]}"
# Extended test
[[ $var == "test" ]]
[[ $var =~ ^[0-9]+$ ]]
[[ -f file && -r file ]]
# Here-strings
cat <<< "here string"
read var <<< "input"
# Process substitution
diff <(sort file1) <(sort file2)
tee >(cat > log)
# Brace expansion
echo {1..10}
echo {a,b,c}.txt
mkdir -p dir/{sub1,sub2}
# Substrings
echo "${var:0:5}"
echo "${var:5}"
echo "${var: -3}"
# Replacements
echo "${var//old/new}"
echo "${var/old/new}"
echo "${var/#prefix/}"
echo "${var/%suffix/}"
# source vs .
source script.sh
# function keyword
function myfunc() {
local -a arr
}
# Arithmetic
let x=1+2
(( x++ ))
# (Not here: $(( x + 1 )) is POSIX arithmetic expansion and must not be flagged)
# select loop
select opt in a b c; do
break
done
# local typed
local -i num
local -a arr
local -A assoc
# read options
read -a arr
read -t 5 var
read -p "prompt" var
# printf -v
printf -v var "value"
Testing Strategy
Unit Tests
For shell implementation (using shell functions):
#!/bin/sh
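# test_rules.sh
# Load check_line from the checker; its entry-point loop is a no-op when no file arguments are given
. ./alpine-check.sh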
test_check_double_brackets() {
result=$(check_line '[[ $x == "test" ]]' 1)
if echo "$result" | grep -q "ERROR"; then
echo "PASS: Detects [[ ]]"
else
echo "FAIL: Should detect [[ ]]"
return 1
fi
}
test_false_positive_string() {
result=$(check_line 'echo "Use [[ for bash"' 1)
if echo "$result" | grep -q "ERROR"; then
echo "FAIL: False positive in string"
return 1
else
echo "PASS: No false positive in string"
fi
}
test_array_detection() {
result=$(check_line 'arr=(one two three)' 1)
if echo "$result" | grep -q "Array"; then
echo "PASS: Detects array"
else
echo "FAIL: Should detect array"
return 1
fi
}
test_command_substitution_not_array() {
result=$(check_line 'var=$(command)' 1)
if echo "$result" | grep -q "Array"; then
echo "FAIL: False positive for command substitution"
return 1
else
echo "PASS: Command substitution not detected as array"
fi
}
# Run all tests
run_tests() {
tests=0
passed=0
for test in test_check_double_brackets \
test_false_positive_string \
test_array_detection \
test_command_substitution_not_array; do
tests=$((tests + 1))
if $test; then
passed=$((passed + 1))
fi
done
echo ""
echo "Results: $passed/$tests passed"
[ "$passed" -eq "$tests" ]
}
run_tests
Integration Tests
#!/bin/sh
# test_integration.sh
# Test 1: Script with known issues should report them
test_known_issues() {
cat > /tmp/known_issues.sh << 'EOF'
#!/bin/bash
arr=(one two three)
[[ $x == "test" ]]
grep -P '\d+' file
EOF
result=$(./alpine-check.sh /tmp/known_issues.sh)
# Should find 4 issues: shebang, array, [[]], grep -P
count=$(echo "$result" | grep -c '\[ERROR\]')
if [ "$count" -ge 4 ]; then
echo "PASS: Found expected errors"
else
echo "FAIL: Expected 4+ errors, found $count"
return 1
fi
}
# Test 2: Clean POSIX script should have no errors
test_clean_posix() {
cat > /tmp/clean_posix.sh << 'EOF'
#!/bin/sh
var="test"
if [ "$var" = "test" ]; then
echo "match"
fi
for x in one two three; do
echo "$x"
done
EOF
result=$(./alpine-check.sh /tmp/clean_posix.sh)
count=$(echo "$result" | grep -c '\[ERROR\]')
if [ "$count" -eq 0 ]; then
echo "PASS: No false positives on clean script"
else
echo "FAIL: False positives on clean script"
echo "$result"
return 1
fi
}
# Test 3: Exit codes
test_exit_codes() {
./alpine-check.sh /tmp/known_issues.sh > /dev/null 2>&1
if [ $? -eq 1 ]; then
echo "PASS: Exit code 1 for errors"
else
echo "FAIL: Wrong exit code for errors"
return 1
fi
./alpine-check.sh /tmp/clean_posix.sh > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "PASS: Exit code 0 for clean script"
else
echo "FAIL: Wrong exit code for clean script"
return 1
fi
}
# Run all integration tests
run_integration_tests() {
test_known_issues && test_clean_posix && test_exit_codes
}
run_integration_tests
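The integration tests write fixtures to fixed names such as /tmp/known_issues.sh. Two overlapping runs can collide on those names, and the files are never removed. One possible variation keeps fixtures in a private directory created with mktemp -d and deletes it from an EXIT trap. mktemp is not specified by POSIX, but BusyBox, GNU coreutils and the BSDs all provide it. FIXTURE_DIR is an illustrative name.

```shell
#!/bin/sh
# One private fixture directory per run, removed when the script exits
FIXTURE_DIR=$(mktemp -d) || exit 1
trap 'rm -rf "$FIXTURE_DIR"' EXIT
# Turn Ctrl-C / kill into a normal exit so the EXIT trap still runs
trap 'exit 130' INT
trap 'exit 143' TERM

cat > "$FIXTURE_DIR/known_issues.sh" << 'EOF'
#!/bin/bash
arr=(one two three)
[[ $x == "test" ]]
grep -P '\d+' file
EOF

# Tests then point the checker at "$FIXTURE_DIR/known_issues.sh"
# instead of /tmp/known_issues.sh.
```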
Real-World Validation
# Test against actual Alpine container
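test_on_alpine() {
    docker run --rm alpine:latest sh -c '
        # This should fail: BusyBox ash has no arrays
        arr=(one two three)
    ' 2>&1
    # Expected: syntax error
    # ([[ ]] is a weaker probe: BusyBox ash built with its
    # bash-compatibility option may accept it)
    docker run --rm alpine:latest sh -c '
        # This should work
        [ "test" = "test" ]
    '
    # Expected: no error
}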
Common Pitfalls and Debugging
Pitfall 1: Overly Aggressive Pattern Matching
Problem: Detecting [[ inside strings or comments.
Example:
echo "Use [[ for conditional" # This is fine!
# This comment [[ is also fine ]]
Solution: Implement context awareness:
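# Simple: skip comment-only lines ([[:space:]] is POSIX; \s is not)
is_comment() {
    printf '%s\n' "$1" | grep -qE '^[[:space:]]*#'
}
# Better: track quote state (complex). Walk the line one character at a
# time; a for loop over $(...) would drop spaces and glob-expand * and ?
in_quotes=0
rest=$line
while [ -n "$rest" ]; do
    char=${rest%"${rest#?}"}   # first character
    rest=${rest#?}             # remove it
    case "$char" in
        \")
            in_quotes=$((1 - in_quotes))
            ;;
    esac
done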
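The quote toggle above only follows double quotes. A fuller sketch tracks single quotes, backslash escapes and comments too. It reduces a line to the text the shell would parse as code, so rule patterns can run on the result. Quoted text becomes empty quotes, and an unquoted "#" at the start of a word ends the line. code_only is a hypothetical helper name. It assumes a single physical line, and it is slow because it loops per character in shell.

```shell
#!/bin/sh
# Sketch: strip quoted text and comments from one line of shell code.
code_only() {
    rest=$1
    out=""
    state=none   # none | single | double
    prev=" "     # the start of the line counts as a word boundary
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}   # first character
        rest=${rest#?}          # drop it
        case $state in
            single)
                # no escapes inside single quotes; only ' ends the string
                if [ "$c" = "'" ]; then state=none; out="$out'"; fi ;;
            double)
                if [ "$c" = '\' ]; then
                    rest=${rest#?}            # skip the escaped character
                elif [ "$c" = '"' ]; then
                    state=none; out="$out\""
                fi ;;
            none)
                case $c in
                    "'") state=single; out="$out'" ;;
                    '"') state=double; out="$out\"" ;;
                    '\') n=${rest%"${rest#?}"}; rest=${rest#?}; out="$out$c$n" ;;
                    '#') case $prev in [[:blank:]]) break ;; esac; out="$out$c" ;;
                    *) out="$out$c" ;;
                esac ;;
        esac
        prev=$c
    done
    printf '%s\n' "$out"
}
```

For example, code_only 'echo "Use [[ for bash" # [[ comment' yields 'echo "" ', so neither "[[" is seen by the rules.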
Pitfall 2: Missing Multi-line Constructs
Problem: Here-documents span multiple lines.
Example:
cat << 'EOF'
This is inside a heredoc
arr=(this shouldn't trigger)
[[ neither should this ]]
EOF
Solution: Track heredoc state:
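# $file is the script being checked
in_heredoc=0
heredoc_end=""
lineno=0
while IFS= read -r line; do
    lineno=$((lineno + 1))
    # Inside a heredoc: skip content, only look for the terminator
    # (leading blanks are ignored, approximating <<- tab stripping)
    if [ "$in_heredoc" -eq 1 ]; then
        trimmed=$(printf '%s\n' "$line" | sed 's/^[[:blank:]]*//')
        [ "$trimmed" = "$heredoc_end" ] && in_heredoc=0
        continue
    fi
    # Normal line checking (the line that opens a heredoc is checked too)
    check_line "$line" "$lineno"
    # Check for heredoc start: << or <<- plus an optionally quoted word,
    # excluding the <<< here-string
    case "$line" in
        *'<<<'*) ;;
        *'<<'*)
            heredoc_end=$(printf '%s\n' "$line" |
                sed -n "s/.*<<-\{0,1\}[[:space:]]*[\"']\{0,1\}\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p")
            [ -n "$heredoc_end" ] && in_heredoc=1
            ;;
    esac
done < "$file"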
Pitfall 3: Regex Escaping
Problem: Special regex characters in patterns.
Example:
# Need to match: ${var[0]}
# Wrong: grep '[0-9]'             # [0-9] is a bracket expression: any digit
# Right: grep '\[[0-9][0-9]*\]'   # escaped brackets around the digit class
Solution: Escape carefully or use fixed strings:
# Use grep -F for literal strings (fgrep is an obsolete alias)
printf '%s\n' "$line" | grep -qF '[[' && echo "found"
# Or properly escape regex
printf '%s\n' "$line" | grep -q '\[\[' && echo "found"
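A third option avoids regular expressions entirely. In a case pattern, quoted parts match literally, so "[[" or "${" need no escaping, and no grep process is started per line. The sketch below assumes plain substring matching is enough. has_literal is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: literal substring test using a case pattern.
# "$2" is quoted inside the pattern, so [, *, ? and $ match literally.
has_literal() {
    case $1 in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: has_literal "$line" '[[' && echo "found".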
Pitfall 4: Command Spanning Lines
Problem: Commands can span multiple lines with backslash.
Example:
find /var \
-name "*.log" \
-printf "%f\n"
Solution: Join continued lines:
join_continued_lines() {
    awk '{
        if (/\\$/) {
            sub(/\\$/, "")
            line = line $0
        } else {
            print line $0
            line = ""
        }
    }
    END { if (line != "") print line }   # file ended mid-continuation
    '
}
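The join above discards original line numbers, so a finding on a joined command cannot point back to the right line. A minimal variation prints the line where each logical command starts, followed by a tab. It assumes a trailing backslash always means continuation, and it does not detect backslashes inside single quotes or comments. join_with_linenos is a hypothetical name.

```shell
#!/bin/sh
# Sketch: join backslash-continued lines, emitting "STARTLINE<TAB>command".
join_with_linenos() {
    awk '
        start == "" { start = NR }                 # first line of a command
        /\\$/ { sub(/\\$/, ""); buf = buf $0; next }
        { print start "\t" buf $0; buf = ""; start = "" }
        END { if (start != "") print start "\t" buf }
    '
}
```

A caller can then split each record at the tab and pass the number to check_line.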
Debugging Techniques
1. Test patterns in isolation:
# Test if pattern matches
pattern='grep[[:space:]]+.*-P'   # [[:space:]] is POSIX; \s is not
test_line='grep -P pattern file'
echo "$test_line" | grep -qE "$pattern" && echo "match" || echo "no match"
2. Verbose mode:
# Add debug output
if [ "$DEBUG" = "1" ]; then
printf "DEBUG: Checking line %d: %s\n" "$lineno" "$line" >&2
fi
3. Compare with ShellCheck:
# ShellCheck can validate your findings
shellcheck -s sh script.sh
# Compare its output with yours
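Technique 1 scales better as a small table of expectations than as one-off commands. The sketch below records each pattern, a sample line and whether a match is expected, and counts mismatches. expect_match and the grep -P pattern are illustrative assumptions, not the tool's actual rules. Patterns use POSIX ERE so they behave the same under BusyBox grep.

```shell
#!/bin/sh
# Sketch: table-driven checks for detection patterns.
failures=0

# expect_match PATTERN LINE yes|no
expect_match() {
    pattern=$1 sample=$2 expected=$3
    if printf '%s\n' "$sample" | grep -qE "$pattern"; then
        actual=yes
    else
        actual=no
    fi
    if [ "$actual" != "$expected" ]; then
        printf 'MISMATCH: /%s/ on "%s": expected %s, got %s\n' \
            "$pattern" "$sample" "$expected" "$actual" >&2
        failures=$((failures + 1))
    fi
}

# -P anywhere in the option list, including bundled flags like -oP
grep_p='grep[[:space:]]+(.*[[:space:]])?-[A-Za-z]*P'
expect_match "$grep_p" 'grep -P pattern file' yes
expect_match "$grep_p" 'grep -i -oP pattern file' yes
expect_match "$grep_p" 'grep -E pattern file' no
expect_match "$grep_p" 'grep Pattern file' no
```

Ending the script with [ "$failures" -eq 0 ] turns the table into a pass/fail check.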
Extensions and Challenges
Extension 1: Auto-Fix Mode
Generate a corrected script with portable alternatives:
fix_line() {
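    line="$1"
    # Fix [[ to [ (naive: does not quote variables or handle =~)
    fixed=$(printf '%s\n' "$line" | sed 's/\[\[/[/g; s/\]\]/]/g; s/==/=/g')
    # Fix here-strings: "cmd <<< word" becomes "echo word | cmd"
    # (the command now runs in a pipeline, possibly in a subshell)
    fixed=$(printf '%s\n' "$fixed" | sed 's/^\([[:space:]]*\)\(.*[^<]\)<<<[[:space:]]*\(.*\)$/\1echo \3 | \2/')
    echo "$fixed"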
}
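Two more rewrites are mechanical enough for the same line-based approach as fix_line: source becomes ".", and the function keyword is dropped. The sketch assumes one command per line and a function header that ends with "{" on the same line. fix_more is a hypothetical name.

```shell
#!/bin/sh
# Sketch: source -> . and "function name [()] {" -> "name() {"
fix_more() {
    printf '%s\n' "$1" | sed \
        -e 's/^\([[:space:]]*\)source[[:space:]]\{1,\}/\1. /' \
        -e 's/^\([[:space:]]*\)function[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*[()]*[[:space:]]*{/\1\2() {/'
}
```

Requiring whitespace after "source" keeps names like sourced=1 untouched.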
Extension 2: Dockerfile Analysis
Detect compatibility issues in Dockerfiles:
check_dockerfile() {
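    file="$1"
    # Check base image (instructions are case-insensitive, so use -i)
    grep -qiE '^FROM[[:space:]]+alpine' "$file" || return
    # Check RUN commands for shell issues
    grep -iE '^RUN[[:space:]]' "$file" | while IFS= read -r line; do
        # Extract shell commands
        cmd=$(printf '%s\n' "$line" | sed 's/^[Rr][Uu][Nn][[:space:]]*//')
        # check_shell_command: your per-command rule checker (not shown)
        check_shell_command "$cmd"
    done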
}
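The FROM check above only matches image names that begin with alpine. It misses Alpine-based variants such as python:3.12-alpine and any FROM line with a --platform flag. Below is a broader heuristic sketch. It assumes an image counts as Alpine-based if it is named alpine or its tag contains "alpine", and it cannot resolve images given through build ARGs. uses_alpine is a hypothetical name.

```shell
#!/bin/sh
# Sketch: does any FROM line in the Dockerfile use an Alpine-based image?
uses_alpine() {
    # Print the image reference: first non-flag word after FROM
    awk 'toupper($1) == "FROM" {
             for (i = 2; i <= NF; i++) if ($i !~ /^--/) { print $i; break }
         }' "$1" |
        grep -qE '(^|/)alpine(:|@|$)|:[^/]*alpine'
}
```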
Extension 3: CI Integration
Create a GitHub Actions or GitLab CI configuration:
# .github/workflows/alpine-check.yml
name: Alpine Compatibility Check
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check shell scripts
        # -exec ... {} + makes find exit non-zero if any batch fails;
        # -exec ... \; would always let the step pass
        run: |
          find . -name "*.sh" -exec sh -c '
            for f in "$@"; do ./alpine-check.sh "$f" || exit 1; done
          ' sh {} +
      - name: Test on Alpine
        run: |
          docker run --rm -v "$PWD:/scripts" alpine:latest sh -c '
            for f in /scripts/*.sh; do
              sh -n "$f" || exit 1
            done
          '
Extension 4: Rule Configuration
Allow users to customize rules:
# ~/.alpine-check.conf or .alpine-check.yml
rules:
  BASH_DOUBLE_BRACKET:
    severity: error      # or warning, info, ignore
  SED_INPLACE:
    severity: warning
ignore_paths:
  - vendor/
  - node_modules/
ignore_patterns:
  - "*.bats"             # Skip bats test files
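Plain sh has no reliable YAML parser. A pure-shell tool may therefore prefer a flatter format. The sketch below assumes a hypothetical RULE=severity file, one rule per line, with # comments. Unknown or invalid severities fall back to a default. rule_severity and the file format are illustrative, not part of the original design.

```shell
#!/bin/sh
# Sketch: look up a rule's severity in a "RULE=severity" file.
# Usage: rule_severity RULE CONFIG_FILE DEFAULT
rule_severity() {
    rule=$1 conf=$2 default=$3
    sev=""
    if [ -r "$conf" ]; then
        sev=$(awk -F= -v r="$rule" '
            /^[ \t]*#/ || /^[ \t]*$/ { next }          # comments, blank lines
            {
                key = $1; gsub(/[ \t]/, "", key)
                val = $2; gsub(/[ \t]/, "", val)
                if (key == r) found = val               # last match wins
            }
            END { print found }' "$conf")
    fi
    case $sev in
        error|warning|info|ignore) printf '%s\n' "$sev" ;;
        *) printf '%s\n' "$default" ;;
    esac
}
```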
Challenge 1: Handle All Edge Cases
Make the tool handle:
- Nested quotes
- Escaped characters
- Variable expansion in strings
- Subshell contexts
Challenge 2: Python Implementation
Reimplement in Python with:
- Tokenization using shlex (a lexer, not a parser; true AST parsing needs a real shell grammar)
- Click CLI framework
- pytest test suite
- Type hints throughout
Challenge 3: Real-time Editor Integration
Create plugins for:
- VS Code (using Language Server Protocol)
- Vim (using ALE or similar)
- Emacs (using Flycheck)
Real-World Connections
Container Ecosystems
Why Docker loves Alpine:
- 5 MB base image vs 70+ MB for Ubuntu
- Faster pulls, less storage, smaller attack surface
- Your tool helps teams adopt Alpine confidently
Real usage:
- Official Docker images often have Alpine variants
- Kubernetes operators frequently use Alpine-based images
- Small images pull faster, which helps with serverless cold starts
Embedded Systems
BusyBox origins:
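- Created for the Debian installer, to fit a working system on a single floppy disk
- Now common in routers and IoT devices (Android ships toybox instead, though BusyBox is often installed on rooted devices)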
Your tool’s relevance:
- OpenWrt (router firmware) uses BusyBox
- Yocto/Buildroot embedded Linux uses BusyBox
- Scripts must be portable across these systems
CI/CD Pipelines
Common problem:
- Developers test on macOS/Ubuntu
- CI runs on Alpine containers
- Scripts break in CI but pass locally
Your tool as a solution:
- Run before committing
- Integrate into pre-commit hooks
- Catch issues before they reach CI
Security Auditing
Minimal attack surface:
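- Alpine’s small size means fewer installed packages that could carry vulnerabilities
- BusyBox’s smaller codebase reduces the attack surface, though it has had CVEs of its own
- Your tool helps keep images minimal: portable scripts don’t require installing bash or GNU coreutils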
Resources
Documentation
- POSIX Shell Command Language - The official standard
- BusyBox Applet Documentation - Command reference
- ShellCheck Wiki - Shell gotchas
- Alpine Linux Wiki - Alpine specifics
- Dash as /bin/sh - Ubuntu’s POSIX compliance
Tools
- ShellCheck - Linter for shell scripts
- checkbashisms - Debian’s bash-ism detector
- shfmt - Shell formatter and parser
- BATS - Bash Automated Testing System
Books
- “Effective Shell” by Dave Kerr - Modern shell practices
- “Classic Shell Scripting” by Robbins & Beebe - Portable scripting
- “The UNIX Programming Environment” by Kernighan & Pike - Unix fundamentals
Self-Assessment Checklist
Before considering this project complete, verify you can:
Core Functionality:
- Detect bash shebangs and suggest alternatives
- Identify [[ ]] extended test syntax
- Catch array declarations and expansions
- Flag here-strings (<<<)
- Detect process substitution (<(), >())
- Find brace expansion ({1..10})
- Identify GNU-specific grep, sed, find options
Code Quality:
- Tool runs on Alpine/BusyBox without modifications
- No false positives on clean POSIX scripts
- Handles edge cases (strings, comments, heredocs)
- Clear, actionable error messages
- Proper exit codes (0=ok, 1=errors, 2=warnings)
Understanding:
- Can explain difference between POSIX sh and bash
- Know which GNU options lack BusyBox support
- Understand why Alpine uses BusyBox
- Can write portable shell scripts yourself
Testing:
- Unit tests for each pattern
- Integration tests with real scripts
- Validated against actual Alpine container
Submission/Completion Criteria
Minimum Viable Product (MVP):
- Shell script that detects at least 10 common bash-isms
- Detects at least 5 GNU-specific command options
- Provides clear error messages with fix suggestions
- Correctly handles basic edge cases (comments, strings)
- Works when run on Alpine Linux itself
- Includes at least 10 test cases
Full Implementation:
- All 20+ bash-isms from the pattern list
- All 15+ GNU options from the pattern list
- JSON output format option
- Auto-fix suggestion mode
- Recursive directory scanning
- Configuration file support
- 50+ test cases with edge cases
- Documentation with examples
Stretch Goals:
- Python implementation with AST parsing
- VS Code extension
- GitHub Action integration
- Support for other minimal shells (dash, mksh)
- Performance optimized for large codebases
Complete Bash-ism Reference
For implementation, here is the comprehensive list of bash-isms to detect:
Syntax Features
| Feature | Example | POSIX Alternative |
|---|---|---|
| Extended test | [[ $a == $b ]] | [ "$a" = "$b" ] |
| Arrays | arr=(a b c) | set -- a b c |
| Associative arrays | declare -A map | Use files or separate variables |
| Here-strings | cat <<< "text" | echo "text" \| cat |
| Process substitution | diff <(cmd1) <(cmd2) | Use temporary files |
| Brace expansion | {1..10} | seq 1 10 (not POSIX, but BusyBox provides it) |
| Brace lists | {a,b,c}.txt | a.txt b.txt c.txt |
| $'...' quoting | $'\n' | nl=$(printf '\nx'); nl=${nl%x} (command substitution strips trailing newlines) |
| Regex match | [[ $s =~ regex ]] | echo "$s" \| grep -E regex |
| == in test | [ "$a" == "$b" ] | [ "$a" = "$b" ] |
| function keyword | function name() {} | name() {} |
| source | source file.sh | . file.sh |
| let | let x=1+2 | x=$((1+2)) |
| (( )) | (( x++ )) | x=$((x+1)) |
| Substring | ${var:0:5} | echo "$var" \| cut -c1-5 |
| Replacement | ${var//old/new} | echo "$var" \| sed 's/old/new/g' |
| Default value | ${var:-default} | Supported (POSIX) |
| local -a | local -a arr | local arr (no type) |
| read -a | read -a arr | Read into separate variables |
| read -p | read -p "prompt" var | printf "prompt"; read var |
| printf -v | printf -v var "%s" | var=$(printf "%s") |
| select | select opt in ... | Use case statement |
| coproc | coproc cmd | Use named pipes (mkfifo) |
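The Replacement row pipes through sed. That means "/", "&" and regex metacharacters in old or new need escaping. As an alternative sketch, ${var//old/new} can be emulated with POSIX parameter expansion alone, where the quoted "$old" always matches literally. replace_all is a hypothetical helper name. It returns the input unchanged when old is empty, which would otherwise loop forever.

```shell
#!/bin/sh
# Sketch: pure-POSIX equivalent of bash's ${str//old/new}
replace_all() {
    str=$1 old=$2 new=$3
    [ -n "$old" ] || { printf '%s\n' "$str"; return; }
    result=""
    while :; do
        case $str in
            *"$old"*)
                result=$result${str%%"$old"*}$new   # text before match + new
                str=${str#*"$old"}                  # continue after the match
                ;;
            *)
                break
                ;;
        esac
    done
    printf '%s\n' "$result$str"
}
```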
GNU Utility Options
| Command | GNU Option | BusyBox Support | Alternative |
|---|---|---|---|
| grep | -P (Perl regex) | Not supported | Use -E (ERE) |
| grep | --include | Not supported | Use find \| xargs grep |
| grep | --exclude | Not supported | Use find \| xargs grep |
| sed | -i'' | Different syntax | Use -i (no argument) |
| sed | -z | Not supported | Install GNU sed |
| find | -printf | Not supported | Use -exec with commands |
| find | -regex | Limited support | Use -name patterns |
| cp | --parents | Not supported | Install GNU coreutils |
| date | -d "string" | Different syntax | Use -D format -d string |
| stat | -c | Limited formats | Check available formats |
| xargs | -r | Different meaning | Test behavior |
| ls | -G | Not supported | Use awk to hide group |
| timeout | N/A | Included | Works but check options |
| realpath | N/A | May need install | Use readlink -f |
| mktemp | -t | Different handling | Test behavior |
| sort | -V | Not supported | Install GNU coreutils |
| head/tail | -c +N | May differ | Test behavior |
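Several rows above say "Test behavior". A script can run that test itself instead of trusting a static table. The sketch below runs the command once on a one-line input ("x") and treats any non-zero status as "unsupported". Probe arguments must therefore be chosen so that they succeed when the option works, for example a pattern that matches "x". supports is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: runtime probe for whether a command accepts given options.
supports() {
    printf 'x\n' | "$@" >/dev/null 2>&1
}

# Example: pick a grep flavor at runtime
if supports grep -P 'x'; then
    echo "grep -P available"
else
    echo "grep -P unavailable; falling back to grep -E"
fi
```

A failed probe cannot distinguish "option rejected" from "no match". This is why the probe input and arguments are kept trivial.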
This project bridges the gap between “works on my machine” and “works in production.” By understanding what makes Alpine different, you’ll write better, more portable shell scripts everywhere.