Project 2: BusyBox Script Compatibility Checker

Quick Reference

Attribute             Details
Difficulty            Beginner
Time Estimate         Weekend (8-16 hours)
Primary Language      Shell (POSIX sh)
Alternative Language  Python
Knowledge Area        Shell Scripting / POSIX Compatibility
Software/Tools        BusyBox, ShellCheck, Alpine Linux, Docker
Main Book             “Effective Shell” by Dave Kerr
Prerequisites         Basic shell scripting, command-line familiarity

Learning Objectives

By completing this project, you will:

  1. Understand the difference between POSIX shell and Bash and why it matters for portability
  2. Identify common bash-isms that break on BusyBox/ash shells
  3. Recognize GNU-specific command options that fail on BusyBox utilities
  4. Write portable shell scripts that work across all Unix-like systems
  5. Build a static analysis tool that detects compatibility issues before deployment
  6. Master POSIX-compatible alternatives to common bash features

Theoretical Foundation

Core Concepts

1. What Is BusyBox?

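BusyBox is a single executable (about 1 MB) that provides stripped-down versions of up to about 400 Unix utilities, called applets; the exact count depends on the build configuration. On Alpine Linux, most core commands are symlinks to this one binary: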

┌─────────────────────────────────────────────────────────────────────┐
│                        BusyBox Architecture                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   /bin/busybox (single binary, ~1 MB)                               │
│       │                                                             │
│       ├── ls → symlink to busybox                                   │
│       ├── grep → symlink to busybox                                 │
│       ├── awk → symlink to busybox                                  │
│       ├── sed → symlink to busybox                                  │
│       ├── sh (ash) → symlink to busybox                             │
│       ├── find → symlink to busybox                                 │
│       └── ... 400+ more applets                                     │
│                                                                     │
│   When you run "ls", BusyBox checks argv[0] and runs the            │
│   appropriate applet with simplified functionality.                  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Verification:

# On Alpine Linux
ls -la /bin/ls
# lrwxrwxrwx 1 root root 12 /bin/ls -> /bin/busybox

# List all BusyBox applets
busybox --list | wc -l
# Prints the applet count (several hundred; the exact number varies by build)
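The argv[0] dispatch described in the diagram can be illustrated with a small Python sketch. This is not BusyBox's actual code (BusyBox is written in C); the applet table and the two toy applets are hypothetical and exist only to show how one program can behave differently depending on the name it was invoked as.

```python
import os

# Hypothetical applets: each one is just a Python function in this sketch.
def applet_echo(args):
    return " ".join(args)

def applet_true(args):
    return ""

APPLETS = {"echo": applet_echo, "true": applet_true}

def dispatch(argv):
    """Pick an applet from the name the program was invoked as (argv[0])."""
    name = os.path.basename(argv[0])
    # "busybox echo hi" form: the applet name is the first argument instead.
    if name == "busybox" and len(argv) > 1:
        name, argv = argv[1], argv[1:]
    applet = APPLETS.get(name)
    if applet is None:
        raise KeyError(f"{name}: applet not found")
    return applet(argv[1:])
```

Calling `dispatch(["/bin/echo", "hi"])` and `dispatch(["/bin/busybox", "echo", "hi"])` reach the same function, which mirrors why a symlink named ls and the command busybox ls run the same applet.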

2. POSIX Shell vs Bash

POSIX (Portable Operating System Interface) defines a standard for Unix-like operating systems. The POSIX shell specification describes a minimal, portable shell. Bash extends POSIX with many features:

┌─────────────────────────────────────────────────────────────────────┐
│                   POSIX Shell vs Bash Features                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   POSIX Shell (sh, ash, dash)        Bash Extensions                │
│   ──────────────────────────         ───────────────                │
│   [ test ]                           [[ extended test ]]            │
│   $(command), $((arithmetic))        (( arithmetic )) command       │
│   $var, ${var}                       ${var:0:5} substrings          │
│   case/esac                          arrays: arr=(a b c)            │
│   if/then/else/fi                    associative arrays             │
│   for/while/until                    brace expansion: {1..10}       │
│   local (not POSIX; ash, dash)       here-strings: <<<              │
│   functions                          process substitution: <()      │
│   pipes and redirection              regex: =~                      │
│   exit status ($?)                   BASHPID, BASH_VERSION          │
│                                                                     │
│   ✓ Works everywhere                 ✗ Bash-only features           │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

3. GNU vs BusyBox Utilities

GNU Coreutils provides feature-rich implementations with many options. BusyBox provides minimal implementations:

┌─────────────────────────────────────────────────────────────────────┐
│                  GNU Coreutils vs BusyBox Comparison                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   Command    GNU Options           BusyBox Support                  │
│   ───────    ───────────           ───────────────                  │
│   grep       -P (Perl regex)       ❌ Not supported                  │
│              -o (only matching)    ✓ Supported                      │
│              --color               ✓ Supported                      │
│                                                                     │
│   sed        -i '' (BSD form)      ❌ Not supported                  │
│              -E (extended)         ✓ Supported                      │
│              -z (null-delimited)   ❌ Not supported                  │
│                                                                     │
│   find       -printf               ❌ Not supported                  │
│              -name                 ✓ Supported                      │
│              -exec                 ✓ Supported                      │
│                                                                     │
│   date       -d "string"           ⚠️ Different syntax               │
│              +%format              ✓ Supported                      │
│                                                                     │
│   xargs      -r (no-run-if-empty)  ⚠️ Different behavior             │
│              -0 (null-delimited)   ✓ Supported                      │
│                                                                     │
│   cp         --parents             ❌ Not supported                  │
│              -r (recursive)        ✓ Supported                      │
│                                                                     │
│   stat       -c (custom format)    ⚠️ Limited format options         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Why This Matters

The Real-World Problem:

You write a Dockerfile that works perfectly on your Ubuntu development machine:

FROM alpine:latest
COPY build.sh /
RUN chmod +x /build.sh && /build.sh

But build.sh contains:

#!/bin/bash
arr=(one two three)
if [[ $DEBUG == "true" ]]; then
    grep -P '\d+' logfile.txt
fi

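Result: The container build fails with cryptic errors. The missing interpreter stops the build first; the other two surface once the shebang is changed to #!/bin/sh and the script runs under a strict POSIX shell: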

  • /bin/bash: not found
  • syntax error: unexpected "(" (arrays)
  • [[: not found

The Cost:

  • Production deployments fail
  • CI/CD pipelines break
  • Hours spent debugging “works on my machine” issues
  • Security vulnerabilities from installing full bash/GNU tools unnecessarily

Historical Context

Why does Alpine use BusyBox?

Alpine Linux was designed for embedded systems, routers, and containers where:

  • Disk space is precious (5 MB base vs 70+ MB)
  • Attack surface must be minimal
  • Simplicity aids security auditing
  • Speed matters (fewer bytes to load)

BusyBox was created in 1996 by Bruce Perens for the Debian installer, specifically to fit on a single floppy disk (1.44 MB). It became the standard for embedded Linux.

Why does this create problems?

Most developers learn shell scripting on systems with:

  • Bash as the default interactive shell (Ubuntu, RHEL, and macOS before 10.15 Catalina, which switched to zsh)
  • GNU Coreutils with full feature sets
  • Tutorials that assume these tools

This creates a “works on my machine” problem when scripts move to Alpine.

Common Misconceptions

Misconception 1: “sh is just a symlink to bash”

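On Ubuntu/Debian, /bin/sh is a symlink to dash (Debian Almquist Shell), not bash. On Alpine, it is ash (the Almquist Shell built into BusyBox). dash rejects bash-isms outright, and ash accepts only a limited subset that depends on how BusyBox was built, so neither is a substitute for bash.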

# Check what sh actually is
ls -la /bin/sh
# Alpine: /bin/sh -> /bin/busybox
# Ubuntu: /bin/sh -> dash

Misconception 2: “If it works in bash, it’s portable”

Bash is one of the most feature-rich shells. Testing only in bash guarantees nothing about portability.

Misconception 3: “Just install bash on Alpine”

While possible (apk add bash), this:

  • Increases image size by ~5 MB
  • Adds potential security vulnerabilities
  • Defeats the purpose of using Alpine
  • Doesn’t fix GNU utility compatibility issues

Misconception 4: “ShellCheck catches everything”

ShellCheck is excellent but primarily focuses on bash correctness. While it can warn about bash-isms when targeting sh, it doesn’t know about BusyBox-specific limitations in utilities like grep, sed, or find.


Project Specification

What You Will Build

A command-line tool that analyzes shell scripts for Alpine Linux/BusyBox compatibility issues. The tool will:

  1. Detect bash-isms that fail on ash/POSIX shells
  2. Identify GNU-specific command options not supported by BusyBox
  3. Suggest POSIX-compatible alternatives
  4. Provide severity levels (error, warning, info)
  5. Support checking individual files or entire directories
  6. Output in human-readable or machine-parseable formats

Functional Requirements

  1. Shebang Detection
    • Warn on #!/bin/bash or #!/usr/bin/env bash
    • Suggest #!/bin/sh for portability
  2. Bash Syntax Detection
    • [[ ]] extended test syntax
    • Arrays: arr=(a b c), ${arr[@]}, ${arr[0]}
    • Associative arrays: declare -A
    • Brace expansion: {1..10}, {a,b,c}
    • Here-strings: <<<
    • Process substitution: <(), >()
    • $'...' ANSI-C quoting
    • == in test (should be =)
    • function name() (should be name())
    • source (should be .)
    • let and (( )) arithmetic
    • ${var:offset:length} substring
    • ${var/pattern/replacement} substitution
    • local -a, local -A typed local variables
    • read -a (read into array)
    • printf -v (assign to variable)
    • select loops
  3. GNU Utility Option Detection
    • grep -P (Perl regex)
    • grep --include, grep --exclude
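    • sed -i '' (BSD-style in-place edit with a separate empty suffix; note that sed -i'' without the space is identical to sed -i once the shell removes the empty quotes)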
    • sed -z (null-delimited)
    • find -printf
    • find -regex (with certain patterns)
    • cp --parents
    • date -d (parse date string)
    • stat -c (with unsupported format codes)
    • xargs -r (different behavior)
    • readlink -f (works but verify)
    • ls -G (hide group)
    • timeout command
    • realpath command
    • mktemp -t (template handling)
  4. Report Generation
    • Line number and column
    • Problematic code snippet
    • Severity level
    • Suggested fix
    • Reference documentation

Non-Functional Requirements

  • Process files under 10 MB within 1 second
  • Support UTF-8 encoded scripts
  • Zero dependencies for shell implementation
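  • Python implementation should work with Python 3.7+ (the dataclasses module used in the data structures was added in 3.7)
  • Exit code 0 if no errors or warnings are found, 1 if any errors are found, 2 if there are warnings but no errors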
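The exit-code requirement can be sketched as a small Python function. This is a minimal illustration, assuming findings have already been reduced to a list of severity strings; treating info-only results as exit code 0 is an assumption, since the requirement does not mention them.

```python
def exit_code(severities):
    """Map a list of finding severities to the process exit code."""
    if "error" in severities:
        return 1  # any error wins, even if warnings are also present
    if "warning" in severities:
        return 2  # warnings but no errors
    return 0      # nothing found, or info-level findings only (assumption)
```

A CLI entry point would typically end with `sys.exit(exit_code(...))` so that CI pipelines can branch on the result.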

Example Usage and Output

Input script (deploy.sh):

#!/bin/bash

# Configuration
declare -A config
config[host]="prod.example.com"
config[port]=22

# Arrays for servers
servers=(web1 web2 web3)

# Check if debug mode
if [[ $DEBUG == "true" ]]; then
    set -x
fi

# Find and process logs
find /var/log -name "*.log" -printf "%f\n" | while read file; do
    grep -P '\d{4}-\d{2}-\d{2}' "$file" > /tmp/dates.txt

    # Get line count
    count=$(wc -l <<< "$(cat /tmp/dates.txt)")
    echo "Found $count date patterns in $file"
done

# In-place update
sed -i'' 's/old/new/g' /etc/config.conf

# Check status
for server in ${servers[@]}; do
    timeout 5 nc -z "$server" 22
done

Tool output:

$ ./alpine-check.sh deploy.sh

=== Alpine/BusyBox Compatibility Report ===
File: deploy.sh
================================================================================

Line 1: #!/bin/bash
  [ERROR] Bash shebang detected
  Alpine uses ash (BusyBox shell). Bash is not installed by default.
  FIX: Change to #!/bin/sh or install bash with: apk add bash

Line 4: declare -A config
  [ERROR] Associative arrays require bash
  BusyBox ash does not support associative arrays.
  FIX: Use separate variables or a configuration file

Line 9: servers=(web1 web2 web3)
  [ERROR] Array syntax not supported in ash
  BusyBox ash does not support bash arrays.
  FIX: Use positional parameters: set -- web1 web2 web3
       Then access with: $1, $2, $3 or "$@"

Line 12: if [[ $DEBUG == "true" ]]; then
  [ERROR] Extended test [[ ]] not supported
  BusyBox ash uses [ ] (single brackets) only.
  FIX: Use [ "$DEBUG" = "true" ]
       Note: Use = not == for string comparison

Line 17: find /var/log -name "*.log" -printf "%f\n"
  [ERROR] -printf option not supported by BusyBox find
  BusyBox find lacks -printf formatting.
  FIX: Use: find /var/log -name "*.log" -exec basename {} \;
       Or: find /var/log -name "*.log" | xargs -n1 basename

Line 18: grep -P '\d{4}-\d{2}-\d{2}'
  [ERROR] -P (Perl regex) not supported by BusyBox grep
  BusyBox grep does not support Perl regular expressions.
  FIX: Install GNU grep: apk add grep
       Or use extended regex: grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}'

Line 21: count=$(wc -l <<< "$(cat /tmp/dates.txt)")
  [ERROR] Here-string (<<<) not supported in ash
  BusyBox ash does not support here-strings.
  FIX: Use a pipe: count=$(cat /tmp/dates.txt | wc -l)
       Or: count=$(wc -l < /tmp/dates.txt)

Line 26: sed -i'' 's/old/new/g' /etc/config.conf
  [WARNING] sed -i'' is misleading
  The shell removes the empty quotes, so sed receives plain -i, which BusyBox
  accepts. The BSD form with a space (sed -i '' ...) fails on BusyBox and GNU sed.
  FIX: Use: sed -i 's/old/new/g' /etc/config.conf

Line 29: for server in ${servers[@]}; do
  [ERROR] Array expansion ${arr[@]} requires bash
  BusyBox ash does not support arrays.
  FIX: Use positional parameters with: for server in "$@"; do

Line 30: timeout 5 nc -z "$server" 22
  [WARNING] timeout command may not be available
  BusyBox includes timeout but behavior may differ.
  Verify with: busybox timeout --help

================================================================================
Summary:
  Errors:   8
  Warnings: 2
  Info:     0

This script will NOT work on Alpine Linux without modifications.
Run with --fix to see a corrected version.

Real World Outcome

After running your tool and fixing the issues, the corrected script:

#!/bin/sh

# Configuration (using separate variables instead of associative array)
config_host="prod.example.com"
config_port=22

# Servers using positional parameters
set -- web1 web2 web3

# Check if debug mode (POSIX-compatible test)
if [ "$DEBUG" = "true" ]; then
    set -x
fi

# Find and process logs (without -printf)
find /var/log -name "*.log" | while IFS= read -r path; do
    # Keep the full path for grep so logs in subdirectories are still found
    file=$(basename "$path")
    grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$path" > /tmp/dates.txt

    # Get line count (without here-string)
    count=$(wc -l < /tmp/dates.txt)
    echo "Found $count date patterns in $file"
done

# In-place update (BusyBox syntax)
sed -i 's/old/new/g' /etc/config.conf

# Check status (using positional parameters)
for server in "$@"; do
    timeout 5 nc -z "$server" 22
done

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────────────────┐
│                     BusyBox Script Compatibility Checker                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Input Handler                                   │
│  • Parse command-line arguments (--json, --fix, --severity)                 │
│  • Read file(s) or stdin                                                    │
│  • Handle encoding (UTF-8)                                                  │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Pattern Database                                  │
│  ┌─────────────────────┐  ┌─────────────────────┐  ┌──────────────────────┐ │
│  │   Bash-ism Rules    │  │   GNU Option Rules  │  │   Shebang Rules      │ │
│  │   (syntax errors)   │  │   (command opts)    │  │   (interpreter)      │ │
│  └─────────────────────┘  └─────────────────────┘  └──────────────────────┘ │
│                                                                              │
│  Each rule contains:                                                         │
│  • Pattern (regex or string match)                                          │
│  • Severity (error, warning, info)                                          │
│  • Description                                                              │
│  • Fix suggestion                                                           │
│  • Reference URL                                                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Line Scanner                                    │
│  • For each line in file:                                                   │
│    - Skip comments (but check shebang on line 1)                            │
│    - Apply each rule pattern                                                │
│    - Record matches with line/column info                                   │
│  • Handle multi-line constructs                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Report Generator                                  │
│  • Aggregate findings                                                        │
│  • Sort by severity/line number                                             │
│  • Format output (human-readable, JSON, or fixed script)                    │
│  • Set exit code                                                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                Output                                        │
│  • stdout: Report or fixed script                                           │
│  • stderr: Error messages                                                   │
│  • Exit code: 0 (ok), 1 (errors), 2 (warnings only)                         │
└─────────────────────────────────────────────────────────────────────────────┘
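The Report Generator stage (aggregate, sort, format) could look roughly like the following Python sketch. The dictionary keys (`line`, `severity`) and the JSON layout are hypothetical choices made for illustration; the specification above does not fix a schema.

```python
import json

# Lower number sorts first, so errors lead the report.
SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}

def build_report(findings):
    """Sort findings by severity, then line number, and count each severity."""
    ordered = sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER[f["severity"]], f["line"]),
    )
    summary = {name: 0 for name in SEVERITY_ORDER}
    for finding in ordered:
        summary[finding["severity"]] += 1
    return {"findings": ordered, "summary": summary}

def to_json(findings):
    """Render the report as machine-parseable JSON."""
    return json.dumps(build_report(findings), indent=2)
```

Keeping `build_report` separate from `to_json` means a human-readable formatter can reuse the same sorted data without re-deriving the summary.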

Key Components

1. Pattern Database Structure

Each compatibility rule is defined with:

Rule:
  id: unique identifier (e.g., "BASH_ARRAY")
  pattern: regex or literal string to match
  severity: error | warning | info
  message: human-readable description
  fix: suggested replacement or workaround
  reference: URL to documentation
  context: where to check (shebang, line, command)
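One way to hold rules with these fields is as plain data that could later be loaded from a configuration file. The sketch below is only an illustration: the two rule entries, their regexes, and the helper names `rules_for` and `match_ids` are assumptions, not part of the specification.

```python
import re

# Hypothetical rule entries using the fields listed above.
RULES = [
    {
        "id": "BASH_SHEBANG",
        "pattern": r"^#!\s*(/bin/bash|/usr/bin/env\s+bash)\b",
        "severity": "error",
        "message": "Bash shebang detected",
        "fix": "Change to #!/bin/sh or install bash with: apk add bash",
        "reference": "",
        "context": "shebang",
    },
    {
        "id": "BASH_HERE_STRING",
        "pattern": r"<<<",
        "severity": "error",
        "message": "Here-string (<<<) not supported in ash",
        "fix": "Use a pipe or an input redirect",
        "reference": "",
        "context": "line",
    },
]

def rules_for(context):
    """Return (rule, compiled_regex) pairs that apply to the given context."""
    return [(r, re.compile(r["pattern"])) for r in RULES if r["context"] == context]

def match_ids(line, context):
    """Return the ids of all rules in this context that match the line."""
    return [rule["id"] for rule, regex in rules_for(context) if regex.search(line)]
```

Separating the `context` field from the pattern lets the scanner apply shebang rules to line 1 only, without special-casing them inside each regex.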

2. Scanner Logic

For each file:
  Read all lines

  For line 1:
    Check shebang rules

  For each line:
    Skip if starts with # (comment) and line > 1

    For each bash-ism rule:
      If pattern matches:
        Record finding

    For each command pattern:
      If command with GNU option found:
        Record finding

  Return list of findings

Data Structures

Shell Implementation (using shell constructs):

# Findings stored as formatted strings
# Format: LINE:SEVERITY:RULE_ID:MESSAGE

# Example:
# "12:error:BASH_DOUBLE_BRACKET:Extended test [[ ]] not supported"
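If another tool needs to consume the colon-separated records emitted by the shell version, they can be split back into fields. The Python sketch below assumes exactly the LINE:SEVERITY:RULE_ID:MESSAGE layout described above; the function name and the dictionary keys are hypothetical.

```python
def parse_finding(record):
    """Split one LINE:SEVERITY:RULE_ID:MESSAGE record into a dict."""
    # maxsplit=3 keeps any colons inside the message intact.
    line, severity, rule_id, message = record.split(":", 3)
    return {
        "line": int(line),
        "severity": severity,
        "rule_id": rule_id,
        "message": message,
    }
```

Putting the free-text message last is what makes this format safe to split: only the first three colons are treated as separators.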

Python Implementation:

import re
from dataclasses import dataclass

@dataclass
class Finding:
    file_path: str
    line_number: int
    column: int
    severity: str  # "error", "warning", "info"
    rule_id: str
    message: str
    snippet: str
    fix: str
    reference: str

@dataclass
class Rule:
    id: str
    pattern: re.Pattern
    severity: str
    message: str
    fix: str
    reference: str = ""
    context: str = "line"  # "shebang", "line", "command"

Algorithm Overview

Main Algorithm:

function check_file(path):
    lines = read_file(path)
    findings = []

    # Check shebang (guard against empty files)
    if lines is not empty and lines[0] matches bash shebang:
        findings.append(bash_shebang_finding)

    # Check each line (i is zero-based; findings report i + 1)
    for i, line in enumerate(lines):
        # Skip comments (except shebang)
        if line starts with "#" and i > 0:
            continue

        # Check bash-isms
        for rule in bash_rules:
            if rule.pattern matches line:
                findings.append(create_finding(rule, line, i + 1))

        # Check GNU options
        for cmd_rule in command_rules:
            if cmd_rule.pattern matches line:
                findings.append(create_finding(cmd_rule, line, i + 1))

    return findings
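A runnable Python rendering of this algorithm might look like the sketch below. It is deliberately small: the two bash-ism rules and their regexes are illustrative assumptions, findings are plain tuples rather than full Finding objects, and GNU option rules are omitted.

```python
import re

# Two illustrative rules; a real rule set would be much larger.
BASH_RULES = [
    ("BASH_DOUBLE_BRACKET", re.compile(r"\[\[.*\]\]"), "Extended test [[ ]] not supported"),
    ("BASH_ARRAY", re.compile(r"^\s*[A-Za-z_][A-Za-z0-9_]*=\("), "Array syntax not supported"),
]
SHEBANG = re.compile(r"^#!\s*(/bin/bash|/usr/bin/env\s+bash)\b")

def check_lines(lines):
    """Return (line_number, rule_id, message) tuples for a list of script lines."""
    findings = []
    if lines and SHEBANG.search(lines[0]):
        findings.append((1, "BASH_SHEBANG", "Bash shebang detected"))
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith("#"):
            continue  # comment lines, including the shebang handled above
        for rule_id, pattern, message in BASH_RULES:
            if pattern.search(line):
                findings.append((number, rule_id, message))
    return findings

def check_file(path):
    """Read a script as UTF-8 and check it line by line."""
    with open(path, encoding="utf-8") as handle:
        return check_lines(handle.read().splitlines())
```

Splitting `check_lines` from `check_file` keeps the matching logic testable without touching the filesystem.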

Implementation Guide

Development Environment Setup

Option 1: Shell-only development (recommended for learning)

# Alpine container for testing
docker run -it --rm alpine:latest sh

# Create work directory
mkdir -p /workspace
cd /workspace

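# Verify BusyBox shell
echo $0  # Shows: sh (on Alpine, sh is BusyBox ash)

# Test which features are missing
[[ "test" == "test" ]] && echo "bash" || echo "not bash"
# Result depends on the shell: dash reports "[[: not found", while BusyBox ash
# built with bash-compatibility options accepts a limited [[ ]]. Either way,
# [[ ]] is outside POSIX, so treat it as non-portable.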

Option 2: Python development

# Create virtual environment
python3 -m venv venv
. venv/bin/activate  # "." instead of "source" also works in POSIX sh

# No external dependencies needed for basic version
# Optional: install pytest for testing
pip install pytest

Test Container Setup:

# Dockerfile.test
FROM alpine:latest
RUN apk add --no-cache coreutils grep sed findutils
# GNU versions now come first on PATH; the BusyBox applets stay available
# as "busybox grep", "busybox sed", etc., so both can be compared
WORKDIR /workspace
COPY . .

Project Structure

Shell Implementation:

alpine-check/
├── alpine-check.sh          # Main script
├── rules/
│   ├── bash-isms.sh         # Bash syntax patterns
│   └── gnu-options.sh       # GNU utility patterns
├── tests/
│   ├── test-samples/        # Sample scripts to check
│   │   ├── bash-only.sh     # Script with bash-isms
│   │   ├── gnu-heavy.sh     # Script with GNU options
│   │   └── posix-clean.sh   # POSIX-compliant script
│   └── run-tests.sh         # Test runner
└── README.md

Python Implementation:

alpine_check/
├── alpine_check/
│   ├── __init__.py
│   ├── main.py              # CLI entry point
│   ├── scanner.py           # File scanner
│   ├── rules.py             # Rule definitions
│   ├── reporter.py          # Output formatting
│   └── fixer.py             # Auto-fix suggestions
├── tests/
│   ├── test_scanner.py
│   ├── test_rules.py
│   └── fixtures/            # Sample scripts
├── pyproject.toml
└── README.md

The Core Question You’re Answering

“How do I know if my shell script will work on Alpine Linux before I deploy it?”

This is the gap you’re filling. Currently, developers must:

  1. Write a script on their dev machine (Ubuntu, macOS)
  2. Deploy to Alpine container
  3. Watch it fail
  4. Debug cryptic syntax errors
  5. Search Stack Overflow
  6. Repeat

Your tool provides immediate feedback during development.

Concepts to Understand First

Before implementing, ensure you can answer:

  1. What is a shebang and how does it affect script execution?
    • The #! line tells the kernel which interpreter to use
    • #!/bin/bash explicitly requires bash; fails if bash is missing
  2. What is the difference between [ and [[?
    • [ is a command (actually /usr/bin/[ or built-in)
    • [[ is bash syntax, not a command
    • [[ allows ==, pattern matching, and &&/|| inside the brackets, and it does not word-split unquoted variables
  3. Why do arrays fail in POSIX sh?
    • Arrays are a bash extension
    • POSIX only defines $* and $@ for positional parameters
  4. What makes a regex “Perl-compatible”?
    • PCRE includes features like \d, \w, lookahead, etc.
    • BRE and ERE (POSIX) use bracket expressions instead: [0-9] for \d, [a-zA-Z0-9_] for \w, etc.
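The shorthand-to-bracket-expression mapping from the last item can drive an automatic fix suggestion for grep -P. The Python sketch below is a rough heuristic under stated assumptions: the function name is hypothetical, it only handles the three common shorthands, and it gives up on lookaround and non-greedy quantifiers, which have no ERE equivalent. It does not handle shorthands that already sit inside a bracket expression (such as [\d.]) or escaped backslashes.

```python
# PCRE shorthand classes and their POSIX bracket-expression equivalents.
PCRE_TO_ERE = {
    r"\d": "[0-9]",
    r"\w": "[A-Za-z0-9_]",
    r"\s": "[[:space:]]",
}

def suggest_ere(pcre):
    """Rewrite simple PCRE shorthands; return None if that is not enough."""
    # Lookaround groups and non-greedy quantifiers cannot be expressed in ERE.
    if any(token in pcre for token in ("(?", "*?", "+?")):
        return None
    for shorthand, bracket in PCRE_TO_ERE.items():
        pcre = pcre.replace(shorthand, bracket)
    return pcre
```

For the date pattern used elsewhere in this guide, `suggest_ere(r"\d{4}-\d{2}-\d{2}")` yields the same ERE that the sample report recommends for grep -E.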

Questions to Guide Your Design

Architecture Questions:

  • Should the tool be a single file or modular?
  • How do you handle multi-line constructs like here-documents?
  • Should you parse the AST or use regex on raw lines?

Feature Questions:

  • Should you detect issues inside strings? (echo "[[ test ]]" is fine)
  • How do you handle sourced/included files?
  • Should you support configuration files for custom rules?

Output Questions:

  • What output formats are needed (text, JSON, SARIF)?
  • How verbose should the default output be?
  • Should the tool suggest fixes or just report issues?
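For the question about issues inside strings, one inexpensive approach is to blank out quoted text before applying the rules, so that echo "[[ test ]]" no longer triggers the [[ ]] rule. The Python sketch below is an approximation, not a shell parser: the function name is hypothetical, and it also hides command substitutions nested inside double quotes (for example "$(wc -l <<< x)"), which is a real trade-off to weigh.

```python
import re

# A single-quoted string, or a double-quoted string with backslash escapes.
QUOTED = re.compile(r"'[^']*'|\"(?:\\.|[^\"\\])*\"")

def _blank(match):
    text = match.group(0)
    # Keep the quote characters, replace the contents with spaces.
    return text[0] + " " * (len(text) - 2) + text[-1]

def mask_strings(line):
    """Blank the contents of quoted strings while preserving column positions."""
    return QUOTED.sub(_blank, line)
```

Because the masked line has the same length as the original, column numbers reported from matches on the masked text still point at the right place in the source.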

Thinking Exercise

Before writing code, trace through this script manually:

#!/bin/bash
files=($(find . -name "*.txt"))
for f in "${files[@]}"; do
    if [[ -f $f ]]; then
        count=$(wc -l <<< "$(cat $f)")
        echo "$f: $count lines"
    fi
done

Exercise:

  1. List every line that would fail on Alpine
  2. For each issue, identify the exact error ash would produce
  3. Write a POSIX-compatible version

Expected Analysis:

  • Line 1: #!/bin/bash - fails, bash not installed
  • Line 2: files=(...) - syntax error, arrays not supported
  • Line 2: $(...) inside array - would work if arrays worked
  • Line 3: "${files[@]}" - array expansion, not supported
  • Line 4: [[ -f $f ]] - extended test, not supported
  • Line 5: <<< - here-string, not supported

Hints in Layers

Hint 1 - Starting Point: Begin with the simplest check: detecting bash shebangs. This is a literal string match:

check_shebang() {
    line="$1"
    case "$line" in
        "#!/bin/bash"*|"#!/usr/bin/env bash"*)
            echo "Bash shebang detected"
            ;;
    esac
}

Hint 2 - Pattern Matching Approach: For shell implementation, use case statements with glob patterns. For more complex patterns, use grep -E:

check_double_brackets() {
    line="$1"
    # Match [[ ... ]] anywhere in the line (this simple regex also matches inside quoted strings)
    echo "$line" | grep -qE '\[\[.*\]\]' && echo "Found [[ ]]"
}

Hint 3 - Command Option Detection: Extract the command first, then check its options:

check_grep_options() {
    line="$1"
    # Check if line contains grep
    case "$line" in
        *grep*)
            # Check for -P, alone or bundled with other short options (e.g. -oP)
            echo "$line" | grep -qE 'grep[[:space:]]+(-[[:alnum:]]+[[:space:]]+)*-[[:alnum:]]*P' && \
                echo "grep -P not supported"
            ;;
    esac
}
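In the Python implementation, the "extract the command first, then check its options" idea can lean on the standard-library shlex module for tokenizing instead of a single regex. The sketch below is a heuristic with a hypothetical function name: it does not understand pipelines written without spaces (such as `x;grep`), command substitutions like `$(grep ...)`, or option arguments that happen to start with a dash.

```python
import shlex

# Tokens that end the current simple command when surrounded by spaces.
COMMAND_SEPARATORS = {"|", ";", "&&", "||", "--"}

def grep_uses_perl_regex(line):
    """Return True if a grep invocation on this line passes -P or --perl-regexp."""
    try:
        tokens = shlex.split(line, comments=True)
    except ValueError:
        return False  # unbalanced quotes; leave the line to other checks
    for index, token in enumerate(tokens):
        if token != "grep":
            continue
        for option in tokens[index + 1:]:
            if option in COMMAND_SEPARATORS:
                break
            if option == "--perl-regexp":
                return True
            # Short options may be bundled, e.g. -oP
            if option.startswith("-") and not option.startswith("--") and "P" in option[1:]:
                return True
    return False
```

Tokenizing first avoids the classic false positive where a capital P inside a quoted search pattern is mistaken for an option.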

Hint 4 - Complete Rule Structure: For the shell version, store rules as functions with embedded metadata:

# Rule: BASH_ARRAY
# Severity: error
# Pattern: name=(...)
check_bash_array() {
    line="$1"
    lineno="$2"

    # Match: var=(...), including var=($(cmd)) and the empty array var=()
    if echo "$line" | grep -qE '[a-zA-Z_][a-zA-Z0-9_]*=\('; then
        echo "$lineno:error:BASH_ARRAY:Array syntax not supported"
        echo "  FIX: Use positional parameters: set -- val1 val2 val3"
    fi
}

Interview Questions This Project Prepares You For

  1. “What’s the difference between POSIX shell and Bash?”
    • Answer: Bash extends POSIX with arrays, [[ ]], process substitution, here-strings, etc. POSIX sh is the portable subset.
  2. “Why might a shell script work on Ubuntu but fail on Alpine?”
    • Answer: Alpine uses BusyBox ash (POSIX sh) by default, not bash. Also, utilities are BusyBox versions with fewer options.
  3. “How would you make a shell script portable across different Unix systems?”
    • Answer: Use #!/bin/sh, avoid bash-isms, use only POSIX-defined utility options, test on multiple systems.
  4. “What’s the difference between grep -E and grep -P?”
    • Answer: -E uses Extended Regular Expressions (ERE, POSIX), -P uses Perl-Compatible Regular Expressions (PCRE). PCRE is not portable.
  5. “How do you handle arrays in POSIX sh?”
    • Answer: Use positional parameters (set -- val1 val2, then $1, $@) or newline-separated strings.
  6. “What does BusyBox provide and why is it used?”
    • Answer: BusyBox is a single binary providing 400+ Unix utilities. Used in embedded systems and Alpine for minimal size.
  7. “How would you build a static analysis tool for shell scripts?”
    • Answer: Either parse the shell grammar into an AST (complex) or use pattern matching on lines (simpler but less accurate).

Books That Will Help

Topic                Book                                           Chapter/Section
POSIX Shell          “Effective Shell” by Dave Kerr                 Chapters on portability
Shell Scripting      “Classic Shell Scripting” by Robbins & Beebe   Chapter 6: Variables, Making Decisions
Regular Expressions  “Mastering Regular Expressions” by Friedl      BRE vs ERE vs PCRE
Unix Philosophy      “The Art of Unix Programming” by Raymond       Chapters 7-8: Multiprogramming, Minilanguages
Alpine Linux         Alpine Wiki                                    Comparison with other distros

Implementation Phases

Phase 1: Basic Shebang and Syntax Detection (2-3 hours)

Goals:

  • Detect bash shebangs
  • Detect [[ ]] extended tests
  • Detect array declarations

Shell Implementation:

#!/bin/sh
# alpine-check.sh - Check scripts for Alpine compatibility

# Check a single line for issues
check_line() {
    line="$1"
    lineno="$2"

    # Check for bash shebang (only line 1)
    if [ "$lineno" -eq 1 ]; then
        case "$line" in
            "#!/bin/bash"*|"#!/usr/bin/env bash"*)
                printf "Line %d: %s\n" "$lineno" "$line"
                printf "  [ERROR] Bash shebang - Alpine uses ash, not bash\n"
                printf "  FIX: Use #!/bin/sh or install: apk add bash\n\n"
                ;;
        esac
    fi

    # Check for [[ ]]
    if echo "$line" | grep -qE '\[\[.*\]\]'; then
        printf "Line %d: %s\n" "$lineno" "$line"
        printf "  [ERROR] Extended test [[ ]] not supported in ash\n"
        printf "  FIX: Use [ ] with proper quoting\n\n"
    fi

    # Check for arrays, including arr=($(cmd)) and the empty array arr=()
    if echo "$line" | grep -qE '^[[:space:]]*[a-zA-Z_][a-zA-Z0-9_]*=\('; then
        printf "Line %d: %s\n" "$lineno" "$line"
        printf "  [ERROR] Array syntax not supported in ash\n"
        printf "  FIX: Use positional parameters: set -- val1 val2\n\n"
    fi
}

# Main: read file and check each line
check_file() {
    file="$1"
    lineno=0

    printf "=== Checking: %s ===\n\n" "$file"

    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        check_line "$line" "$lineno"
    done < "$file"
}

# Entry point
for file in "$@"; do
    if [ -f "$file" ]; then
        check_file "$file"
    else
        printf "Error: %s not found\n" "$file" >&2
    fi
done

Test:

# Create test script
cat > /tmp/test.sh << 'EOF'
#!/bin/bash
arr=(one two three)
if [[ $x == "test" ]]; then
    echo "match"
fi
EOF

# Run checker
./alpine-check.sh /tmp/test.sh

Phase 2: Complete Bash-ism Detection (3-4 hours)

Goals:

  • Add all bash-ism patterns
  • Improve output formatting
  • Add severity levels

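Additional patterns to implement. These functions read $line and $lineno as set by the caller, and they assume report_error, report_warning, and report_fix helpers that you define yourself: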

# Here-strings
check_here_string() {
    if echo "$line" | grep -qE '<<<'; then
        report_error "$lineno" "Here-string (<<<) not supported"
        report_fix "Use echo and pipe: echo \"text\" | cmd"
    fi
}

# Process substitution
check_process_substitution() {
    if echo "$line" | grep -qE '<\(|>\('; then
        report_error "$lineno" "Process substitution <() >() not supported"
        report_fix "Use temporary files or named pipes"
    fi
}

# Brace expansion
check_brace_expansion() {
    if echo "$line" | grep -qE '\{[0-9]+\.\.[0-9]+\}|(^|[^$])\{[^{},[:space:]]+(,[^{},[:space:]]+)+\}'; then
        report_error "$lineno" "Brace expansion not supported"
        report_fix "For numbers, use: seq 1 10"
        report_fix "For letters, list explicitly: a b c"
    fi
}

# == in test
check_double_equals() {
    if echo "$line" | grep -qE '\[\s.*=='; then
        report_warning "$lineno" "Use = not == for string comparison"
    fi
}

# source vs .
check_source() {
    if echo "$line" | grep -qE '^\s*source\s'; then
        report_warning "$lineno" "'source' may not work; use '.' instead"
    fi
}

# function keyword
check_function_keyword() {
    if echo "$line" | grep -qE '^\s*function\s+[a-zA-Z_]'; then
        report_warning "$lineno" "'function' keyword is not portable"
        report_fix "Use: name() { ... }"
    fi
}
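
The check functions above call report_error, report_warning and report_fix, which are not defined anywhere in this guide. A minimal sketch of what they might look like follows. The output layout, the counter names and the report_summary helper are assumptions, not part of the original design.

```sh
#!/bin/sh
# Sketch of the report_* helpers used by the check functions.
# Counter names and output layout are assumptions.

errors=0
warnings=0

report_error() {
    # $1 = line number, $2 = message
    errors=$((errors + 1))
    printf 'Line %d: [ERROR] %s\n' "$1" "$2"
}

report_warning() {
    # $1 = line number, $2 = message
    warnings=$((warnings + 1))
    printf 'Line %d: [WARNING] %s\n' "$1" "$2"
}

report_fix() {
    # $1 = suggestion text
    printf '  FIX: %s\n' "$1"
}

report_summary() {
    printf '%d error(s), %d warning(s)\n' "$errors" "$warnings"
}
```

The counters only change when the helpers run in the current shell. A call wrapped in $(...), as in the unit tests later in this guide, runs in a subshell and leaves the parent's counts untouched.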

Phase 3: GNU Utility Detection (2-3 hours)

Goals:

  • Detect GNU-specific command options
  • Provide BusyBox alternatives or apk install suggestions

Command patterns:

# grep -P (Perl regex)
check_grep_perl() {
    if echo "$line" | grep -qE 'grep\s+(-[a-zA-Z]*P|-P|--perl-regexp)'; then
        report_error "$lineno" "grep -P not supported by BusyBox"
        report_fix "Install GNU grep: apk add grep"
        report_fix "Or use ERE: grep -E '[0-9]{4}'"
    fi
}

# find -printf
check_find_printf() {
    if echo "$line" | grep -qE 'find\s.*-printf'; then
        report_error "$lineno" "find -printf not supported by BusyBox"
        report_fix "Install GNU find: apk add findutils"
        report_fix "Or use: find ... -exec basename {} \\;"
    fi
}

# sed -i with an empty backup suffix (BSD/macOS form: sed -i '' ...)
check_sed_inplace() {
    if echo "$line" | grep -qE "sed[[:space:]]+-i[[:space:]]*(''|\"\")"; then
        report_warning "$lineno" "sed -i '' (BSD form) is not portable to BusyBox"
        report_fix "Use: sed -i 's/old/new/' file"
    fi
}

# date -d
check_date_parse() {
    if echo "$line" | grep -qE 'date\s+(-d|--date)'; then
        report_warning "$lineno" "date -d syntax differs in BusyBox"
        report_fix "BusyBox uses: date -D FORMAT -d STRING"
        report_fix "Or install coreutils: apk add coreutils"
    fi
}

# cp --parents
check_cp_parents() {
    if echo "$line" | grep -qE 'cp\s+.*--parents'; then
        report_error "$lineno" "cp --parents not supported by BusyBox"
        report_fix "Install GNU coreutils: apk add coreutils"
    fi
}
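
As the list of GNU options grows, one function per option becomes repetitive. The sketch below keeps the same checks in a small rule table instead. The "@" field separator, the function names and the selection of rules are illustrative assumptions; the patterns and messages are taken from the functions above.

```sh
#!/bin/sh
# Sketch: keep GNU-option rules in a table instead of one function per rule.
# Fields are severity@pattern@message; "@" is an arbitrary separator chosen
# because it does not appear in these patterns.

gnu_rules() {
    cat << 'EOF'
ERROR@grep[[:space:]]+(-[a-zA-Z]*P|--perl-regexp)@grep -P not supported by BusyBox
ERROR@find[[:space:]].*-printf@find -printf not supported by BusyBox
ERROR@cp[[:space:]].*--parents@cp --parents not supported by BusyBox
WARNING@date[[:space:]]+(-d|--date)@date -d syntax differs in BusyBox
EOF
}

check_gnu_options() {
    # $1 = line text, $2 = line number
    gnu_rules | while IFS='@' read -r severity pattern message; do
        if printf '%s\n' "$1" | grep -qE -- "$pattern"; then
            printf 'Line %d: [%s] %s\n' "$2" "$severity" "$message"
        fi
    done
}
```

For example, check_gnu_options 'grep -P x file' 7 prints one ERROR line for line 7. Because the loop sits on the right side of a pipe, it cannot update counters in the calling shell; count findings from the printed output instead.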

Phase 4: Output Formatting and CLI (2-3 hours)

Goals:

  • Parse command-line options
  • Support JSON output
  • Generate summary statistics

CLI options:

usage() {
    cat << 'EOF'
Usage: alpine-check.sh [OPTIONS] FILE...

Options:
  -h, --help      Show this help message
  -j, --json      Output in JSON format
  -q, --quiet     Only show errors (no warnings/info)
  -v, --verbose   Show all findings including info
  -f, --fix       Show corrected script on stdout
  --severity LVL  Filter by severity (error, warning, info)
  --              End of options

Examples:
  alpine-check.sh script.sh
  alpine-check.sh -j *.sh
  alpine-check.sh --fix broken.sh > fixed.sh
EOF
}

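# Defaults, so later code never reads an unset variable
OUTPUT_FORMAT="text"
VERBOSITY="normal"
MODE="check"
SEVERITY=""

# Parse options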
while [ $# -gt 0 ]; do
    case "$1" in
        -h|--help)
            usage
            exit 0
            ;;
        -j|--json)
            OUTPUT_FORMAT="json"
            shift
            ;;
        -q|--quiet)
            VERBOSITY="quiet"
            shift
            ;;
        -v|--verbose)
            VERBOSITY="verbose"
            shift
            ;;
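        -f|--fix)
            MODE="fix"
            shift
            ;;
        --severity)
            # Documented in usage(); requires a value
            [ $# -ge 2 ] || { printf "Missing value for --severity\n" >&2; exit 2; }
            SEVERITY="$2"
            shift 2
            ;;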
        --)
            shift
            break
            ;;
        -*)
            printf "Unknown option: %s\n" "$1" >&2
            exit 2
            ;;
        *)
            break
            ;;
    esac
done
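
The JSON output goal needs string escaping, which is easy to get wrong in shell. The following is a minimal sketch, assuming one JSON object per finding and hypothetical field names (file, line, severity, message). It escapes backslashes, double quotes and tabs only.

```sh
#!/bin/sh
# Sketch of JSON output for findings. Field names are assumptions.

json_escape() {
    # Escape backslashes first, then double quotes and tabs.
    # Other control characters are not handled in this sketch.
    tab=$(printf '\t')
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e "s/$tab/\\\\t/g"
}

json_finding() {
    # $1 = file, $2 = line number, $3 = severity, $4 = message
    printf '{"file":"%s","line":%d,"severity":"%s","message":"%s"}\n' \
        "$(json_escape "$1")" "$2" "$(json_escape "$3")" "$(json_escape "$4")"
}
```

Emitting one object per line keeps the writer simple; wrapping the objects in a JSON array requires tracking whether a comma is needed before each one.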

Phase 5: Testing and Edge Cases (2-3 hours)

Goals:

  • Create comprehensive test suite
  • Handle edge cases
  • Ensure no false positives

Test cases to include:

# test_samples/false_positives.sh
#!/bin/sh
# These should NOT trigger warnings

# String containing [[
echo "Use [[ for bash"

# Comment with array syntax
# arr=(this is a comment)

# Quoted grep -P
echo 'grep -P is not portable'

# Inside single quotes
sed 's/\[\[/test/'

# Heredoc content
cat << 'EOF'
arr=(inside heredoc)
[[ inside heredoc ]]
EOF

# test_samples/all_bashisms.sh
#!/bin/bash

# Arrays
arr=(one two three)
declare -a indexed
declare -A assoc
arr+=("four")
echo "${arr[@]}"
echo "${arr[0]}"
echo "${#arr[@]}"

# Extended test
[[ $var == "test" ]]
[[ $var =~ ^[0-9]+$ ]]
[[ -f file && -r file ]]

# Here-strings
cat <<< "here string"
read var <<< "input"

# Process substitution
diff <(sort file1) <(sort file2)
tee >(cat > log)

# Brace expansion
echo {1..10}
echo {a,b,c}.txt
mkdir -p dir/{sub1,sub2}

# Substrings
echo "${var:0:5}"
echo "${var:5}"
echo "${var: -3}"

# Replacements
echo "${var//old/new}"
echo "${var/old/new}"
echo "${var/#prefix/}"
echo "${var/%suffix/}"

# source vs .
source script.sh

# function keyword
function myfunc() {
    local -a arr
}

# Arithmetic
let x=1+2
(( x++ ))
# Note: $(( x + 1 )) is POSIX arithmetic expansion and must NOT be flagged

# select loop
select opt in a b c; do
    break
done

# local typed
local -i num
local -a arr
local -A assoc

# read options
read -a arr
read -t 5 var
read -p "prompt" var

# printf -v
printf -v var "value"

Testing Strategy

Unit Tests

For shell implementation (using shell functions):

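#!/bin/sh
# test_rules.sh
# Assumes check_line is defined in alpine-check.sh. Sourcing that file with
# no arguments defines the functions; its main loop then has nothing to do.
. ./alpine-check.sh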

test_check_double_brackets() {
    result=$(check_line '[[ $x == "test" ]]' 1)
    if echo "$result" | grep -q "ERROR"; then
        echo "PASS: Detects [[ ]]"
    else
        echo "FAIL: Should detect [[ ]]"
        return 1
    fi
}

test_false_positive_string() {
    result=$(check_line 'echo "Use [[ for bash"' 1)
    if echo "$result" | grep -q "ERROR"; then
        echo "FAIL: False positive in string"
        return 1
    else
        echo "PASS: No false positive in string"
    fi
}

test_array_detection() {
    result=$(check_line 'arr=(one two three)' 1)
    if echo "$result" | grep -q "Array"; then
        echo "PASS: Detects array"
    else
        echo "FAIL: Should detect array"
        return 1
    fi
}

test_command_substitution_not_array() {
    result=$(check_line 'var=$(command)' 1)
    if echo "$result" | grep -q "Array"; then
        echo "FAIL: False positive for command substitution"
        return 1
    else
        echo "PASS: Command substitution not detected as array"
    fi
}

# Run all tests
run_tests() {
    tests=0
    passed=0

    for test in test_check_double_brackets \
                test_false_positive_string \
                test_array_detection \
                test_command_substitution_not_array; do
        tests=$((tests + 1))
        if $test; then
            passed=$((passed + 1))
        fi
    done

    echo ""
    echo "Results: $passed/$tests passed"
    [ "$passed" -eq "$tests" ]
}

run_tests

Integration Tests

#!/bin/sh
# test_integration.sh

# Test 1: Script with known issues should report them
test_known_issues() {
    cat > /tmp/known_issues.sh << 'EOF'
#!/bin/bash
arr=(one two three)
[[ $x == "test" ]]
grep -P '\d+' file
EOF

    result=$(./alpine-check.sh /tmp/known_issues.sh)

    # Should find 4 issues: shebang, array, [[]], grep -P
    count=$(echo "$result" | grep -c '\[ERROR\]')
    if [ "$count" -ge 4 ]; then
        echo "PASS: Found expected errors"
    else
        echo "FAIL: Expected 4+ errors, found $count"
        return 1
    fi
}

# Test 2: Clean POSIX script should have no errors
test_clean_posix() {
    cat > /tmp/clean_posix.sh << 'EOF'
#!/bin/sh
var="test"
if [ "$var" = "test" ]; then
    echo "match"
fi
for x in one two three; do
    echo "$x"
done
EOF

    result=$(./alpine-check.sh /tmp/clean_posix.sh)

    count=$(echo "$result" | grep -c '\[ERROR\]')
    if [ "$count" -eq 0 ]; then
        echo "PASS: No false positives on clean script"
    else
        echo "FAIL: False positives on clean script"
        echo "$result"
        return 1
    fi
}

# Test 3: Exit codes
test_exit_codes() {
    ./alpine-check.sh /tmp/known_issues.sh > /dev/null 2>&1
    if [ $? -eq 1 ]; then
        echo "PASS: Exit code 1 for errors"
    else
        echo "FAIL: Wrong exit code for errors"
        return 1
    fi

    ./alpine-check.sh /tmp/clean_posix.sh > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        echo "PASS: Exit code 0 for clean script"
    else
        echo "FAIL: Wrong exit code for clean script"
        return 1
    fi
}

# Run all integration tests
run_integration_tests() {
    test_known_issues && test_clean_posix && test_exit_codes
}

run_integration_tests

Real-World Validation

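# Test against actual Alpine container
# Arrays are used as the failing case: some BusyBox builds accept [[ ]]
# through bash-compatibility options, so it is an unreliable canary.
test_on_alpine() {
    docker run --rm alpine:latest sh -c '
        # This should fail
        arr=(one two three)
    ' 2>&1
    # Expected: syntax error

    docker run --rm alpine:latest sh -c '
        # This should work
        [ "test" = "test" ]
    '
    # Expected: no error
}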

Common Pitfalls and Debugging

Pitfall 1: Overly Aggressive Pattern Matching

Problem: Detecting [[ inside strings or comments.

Example:

echo "Use [[ for conditional"  # This is fine!
# This comment [[ is also fine ]]

Solution: Implement context awareness:

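# Simple: skip lines that are clearly comments
is_comment() {
    printf '%s\n' "$1" | grep -qE '^[[:space:]]*#'
}

# Better: track quote state (complex)
# Peel off one character at a time. Looping over an unquoted $(...) would
# drop spaces and expand globs such as *.
in_quotes=0
rest=$line
while [ -n "$rest" ]; do
    char=${rest%"${rest#?}"}   # first character of $rest
    rest=${rest#?}             # everything after it
    case "$char" in
        \")
            in_quotes=$((1 - in_quotes))
            ;;
    esac
done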
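
A cheaper alternative to character-by-character tracking is to blank out quoted strings and comments before running any pattern. The sketch below does that with sed; strip_quoted is a hypothetical helper name. It assumes quotes are not nested or escaped, so a line such as echo "it's" will still confuse it.

```sh
#!/bin/sh
# Sketch: remove the contents of quoted strings and trailing comments so
# that patterns only see code. Does not handle escaped or nested quotes.

strip_quoted() {
    # $1 = line text
    printf '%s\n' "$1" | sed \
        -e "s/'[^']*'/''/g" \
        -e 's/"[^"]*"/""/g' \
        -e 's/[[:space:]]#.*$//' \
        -e 's/^#.*$//'
}
```

A checker could then call its patterns on the stripped text while still printing the original line in the report.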

Pitfall 2: Missing Multi-line Constructs

Problem: Here-documents span multiple lines.

Example:

cat << 'EOF'
This is inside a heredoc
arr=(this shouldn't trigger)
[[ neither should this ]]
EOF

Solution: Track heredoc state:

in_heredoc=0
heredoc_end=""
lineno=0

while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))

    # Inside a heredoc: skip content, and leave the heredoc at its terminator
    if [ "$in_heredoc" -eq 1 ]; then
        [ "$line" = "$heredoc_end" ] && in_heredoc=0
        continue
    fi

    # Check for heredoc start (<< EOF or << 'EOF'; <<- and "EOF" are not
    # handled here). The [^<] guard keeps here-strings (<<<) out.
    if echo "$line" | grep -qE "(^|[^<])<<[[:space:]]*'?[A-Za-z_]+'?[[:space:]]*\$"; then
        heredoc_end=$(echo "$line" | sed -E "s/.*<<[[:space:]]*'?([A-Za-z_]+)'?[[:space:]]*\$/\1/")
        in_heredoc=1
    fi

    # Normal line checking (the line that opens a heredoc is still shell code)
    check_line "$line" "$lineno"
done

Pitfall 3: Regex Escaping

Problem: Special regex characters in patterns.

Example:

# Need to match: ${var[0]}
# Wrong: grep '[0-9]'        # bracket expression: matches any single digit
# Right: grep '\[[0-9]*\]'   # literal [ and ] around an index

Solution: Escape carefully or use fixed strings:

# Use grep -F for literal strings
echo "$line" | grep -F '[[' && echo "found"

# Or properly escape regex
echo "$line" | grep '\[\[' && echo "found"

Pitfall 4: Command Spanning Lines

Problem: Commands can span multiple lines with backslash.

Example:

find /var \
    -name "*.log" \
    -printf "%f\n"

Solution: Join continued lines:

join_continued_lines() {
    awk '{
        if (/\\$/) {
            sub(/\\$/, "")
            line = line $0
        } else {
            print line $0
            line = ""
        }
    }
    END {
        # Flush a trailing continuation that never got its final line
        if (line != "") print line
    }'
}
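
Joining lines changes the line count, so findings reported on the joined text no longer point at the right place in the original file. The sketch below is one way around that: it prefixes each joined command with the line number where it started, separated by a tab. The function name join_with_lineno is hypothetical.

```sh
#!/bin/sh
# Sketch: join backslash-continued lines and keep the starting line number.
# Output format: <start line><TAB><joined text>

join_with_lineno() {
    awk '
        # Remember where the current logical line began
        { if (start == 0) start = NR }
        # Continuation: drop the trailing backslash and keep buffering
        /\\$/ { sub(/\\$/, ""); buf = buf $0; next }
        # Last physical line of the logical line: emit and reset
        { print start "\t" buf $0; buf = ""; start = 0 }
        END { if (buf != "") print start "\t" buf }
    '
}
```

The caller can split each output line on the first tab to recover the number and the text, for example with cut -f1 and cut -f2-.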

Debugging Techniques

1. Test patterns in isolation:

# Test if pattern matches
pattern='grep\s+.*-P'
test_line='grep -P pattern file'
echo "$test_line" | grep -qE "$pattern" && echo "match" || echo "no match"

2. Verbose mode:

# Add debug output
if [ "$DEBUG" = "1" ]; then
    printf "DEBUG: Checking line %d: %s\n" "$lineno" "$line" >&2
fi

3. Compare with ShellCheck:

# ShellCheck can validate your findings
shellcheck -s sh script.sh
# Compare its output with yours

Extensions and Challenges

Extension 1: Auto-Fix Mode

Generate a corrected script with portable alternatives:

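fix_line() {
    line="$1"

    # Fix [[ to [ and == to = (naive: also rewrites matches inside strings,
    # and leaves && / || inside the brackets for manual review)
    fixed=$(printf '%s\n' "$line" | sed 's/\[\[/[/g; s/\]\]/]/g; s/==/=/g')

    # Fix here-strings: "cmd <<< word" becomes "echo word | cmd"
    fixed=$(printf '%s\n' "$fixed" |
        sed 's/^\([[:space:]]*\)\(.*[^[:space:]]\)[[:space:]]*<<<[[:space:]]*\(.*\)$/\1echo \3 | \2/')

    printf '%s\n' "$fixed"
}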

Extension 2: Dockerfile Analysis

Detect compatibility issues in Dockerfiles:

check_dockerfile() {
    file="$1"

    # Only Alpine-based images are relevant; -q keeps the match off stdout
    grep -qE '^FROM[[:space:]]+alpine' "$file" || return 0

    # Check RUN commands for shell issues
    grep -E '^RUN[[:space:]]' "$file" | while IFS= read -r line; do
        # Extract shell commands
        cmd=$(printf '%s\n' "$line" | sed 's/^RUN[[:space:]]*//')
        check_shell_command "$cmd"
    done
}

Extension 3: CI Integration

Create GitHub Action or GitLab CI configuration:

# .github/workflows/alpine-check.yml
name: Alpine Compatibility Check

on: [push, pull_request]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check shell scripts
        run: |
          find . -name "*.sh" -exec ./alpine-check.sh {} \;

      - name: Test on Alpine
        run: |
          docker run --rm -v "$PWD":/scripts alpine:latest sh -c '
            for f in /scripts/*.sh; do
              sh -n "$f" || exit 1
            done
          '

Extension 4: Rule Configuration

Allow users to customize rules:

# ~/.alpine-check.conf or .alpine-check.yml
rules:
  BASH_DOUBLE_BRACKET:
    severity: error  # or warning, info, ignore
  SED_INPLACE:
    severity: warning

ignore_paths:
  - vendor/
  - node_modules/

ignore_patterns:
  - "*.bats"  # Skip bats test files

Challenge 1: Handle All Edge Cases

Make the tool handle:

  • Nested quotes
  • Escaped characters
  • Variable expansion in strings
  • Subshell contexts

Challenge 2: Python Implementation

Reimplement in Python with:

  • Tokenization with shlex (a lexer, not an AST parser; AST-level checks need a full shell parser such as bashlex)
  • Click CLI framework
  • pytest test suite
  • Type hints throughout

Challenge 3: Real-time Editor Integration

Create plugins for:

  • VS Code (using Language Server Protocol)
  • Vim (using ALE or similar)
  • Emacs (using Flycheck)

Real-World Connections

Container Ecosystems

Why Docker loves Alpine:

  • 5 MB base image vs 70+ MB for Ubuntu
  • Faster pulls, less storage, smaller attack surface
  • Your tool helps teams adopt Alpine confidently

Real usage:

  • Official Docker images often have Alpine variants
  • Kubernetes operators frequently use Alpine-based images
  • Serverless platforms use Alpine for cold-start optimization

Embedded Systems

BusyBox origins:

  • Created for the Debian installer so that it would fit on a floppy disk
  • Now standard in routers, IoT devices, and other embedded Linux systems (Android ships the similar Toybox instead)

Your tool’s relevance:

  • OpenWrt (router firmware) uses BusyBox
  • Yocto/Buildroot embedded Linux uses BusyBox
  • Scripts must be portable across these systems

CI/CD Pipelines

Common problem:

  • Developers test on macOS/Ubuntu
  • CI runs on Alpine containers
  • Scripts break in CI but pass locally

Your tool as a solution:

  • Run before committing
  • Integrate into pre-commit hooks
  • Catch issues before they reach CI
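
The pre-commit idea above can be sketched as a small hook script. This is an assumption-laden outline rather than a finished hook: CHECKER is a hypothetical variable pointing at the checker, and the function names are illustrative. The git options used (diff --cached --name-only --diff-filter=ACM) list added, copied and modified files in the index.

```sh
#!/bin/sh
# Sketch of a .git/hooks/pre-commit script. The CHECKER path is an assumption.
CHECKER=${CHECKER:-./alpine-check.sh}

staged_shell_files() {
    # Added, copied or modified files in the index whose names end in .sh
    git diff --cached --name-only --diff-filter=ACM | grep -E '\.sh$'
}

check_files() {
    # Reads file names from stdin, one per line.
    # Returns 1 if the checker fails on any of them.
    status=0
    while IFS= read -r f; do
        "$CHECKER" "$f" || status=1
    done
    return "$status"
}

# In the hook itself, something like:
#   staged_shell_files | check_files || { echo "alpine-check failed" >&2; exit 1; }
```

This relies on the checker exiting non-zero when it finds errors, which is the behavior the integration tests in this guide expect.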

Security Auditing

Minimal attack surface:

  • Alpine’s small size means fewer vulnerabilities
  • BusyBox’s smaller codebase leaves less code to audit than the GNU equivalents
  • Your tool helps maintain this by avoiding unnecessary packages

Resources

Books

  • “Effective Shell” by Dave Kerr - Modern shell practices
  • “Classic Shell Scripting” by Robbins & Beebe - Portable scripting
  • “The UNIX Programming Environment” by Kernighan & Pike - Unix fundamentals

Self-Assessment Checklist

Before considering this project complete, verify you can:

Core Functionality:

  • Detect bash shebangs and suggest alternatives
  • Identify [[ ]] extended test syntax
  • Catch array declarations and expansions
  • Flag here-strings (<<<)
  • Detect process substitution (<(), >())
  • Find brace expansion ({1..10})
  • Identify GNU-specific grep, sed, find options

Code Quality:

  • Tool runs on Alpine/BusyBox without modifications
  • No false positives on clean POSIX scripts
  • Handles edge cases (strings, comments, heredocs)
  • Clear, actionable error messages
  • Proper exit codes (0=ok, 1=errors, 2=warnings)

Understanding:

  • Can explain difference between POSIX sh and bash
  • Know which GNU options lack BusyBox support
  • Understand why Alpine uses BusyBox
  • Can write portable shell scripts yourself

Testing:

  • Unit tests for each pattern
  • Integration tests with real scripts
  • Validated against actual Alpine container

Submission/Completion Criteria

Minimum Viable Product (MVP):

  1. Shell script that detects at least 10 common bash-isms
  2. Detects at least 5 GNU-specific command options
  3. Provides clear error messages with fix suggestions
  4. Correctly handles basic edge cases (comments, strings)
  5. Works when run on Alpine Linux itself
  6. Includes at least 10 test cases

Full Implementation:

  1. All 20+ bash-isms from the pattern list
  2. All 15+ GNU options from the pattern list
  3. JSON output format option
  4. Auto-fix suggestion mode
  5. Recursive directory scanning
  6. Configuration file support
  7. 50+ test cases with edge cases
  8. Documentation with examples

Stretch Goals:

  1. Python implementation with AST parsing
  2. VS Code extension
  3. GitHub Action integration
  4. Support for other minimal shells (dash, mksh)
  5. Performance optimized for large codebases

Complete Bash-ism Reference

For implementation, here is the comprehensive list of bash-isms to detect:

Syntax Features

| Feature | Example | POSIX Alternative |
|---|---|---|
| Extended test | [[ $a == $b ]] | [ "$a" = "$b" ] |
| Arrays | arr=(a b c) | set -- a b c |
| Associative arrays | declare -A map | Use files or separate variables |
| Here-strings | cat <<< "text" | echo "text" \| cat |
| Process substitution | diff <(cmd1) <(cmd2) | Use temporary files |
| Brace expansion | {1..10} | seq 1 10 |
| Brace lists | {a,b,c}.txt | a.txt b.txt c.txt |
| $'...' quoting | $'\t' | $(printf '\t') (for \n, use a literal newline inside quotes, since command substitution strips trailing newlines) |
| Regex match | [[ $s =~ regex ]] | echo "$s" \| grep -E regex |
| == in test | [ "$a" == "$b" ] | [ "$a" = "$b" ] |
| function keyword | function name() {} | name() {} |
| source | source file.sh | . file.sh |
| let | let x=1+2 | x=$((1+2)) |
| (( )) | (( x++ )) | x=$((x+1)) |
| Substring | ${var:0:5} | echo "$var" \| cut -c1-5 |
| Replacement | ${var//old/new} | echo "$var" \| sed 's/old/new/g' |
| Default value | ${var:-default} | Supported (POSIX) |
| local -a | local -a arr | local arr (no type) |
| read -a | read -a arr | Read into separate variables |
| read -p | read -p "prompt" var | printf "prompt"; read var |
| printf -v | printf -v var "%s" | var=$(printf "%s") |
| select | select opt in ... | Print a menu, then use read and case |
| coproc | coproc cmd | Use named pipes |
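
A few rows of the table above, written out as runnable POSIX code. This is a sketch for experimentation; the function names are illustrative, and the sed-based replacement is only safe when the search and replacement text contain no sed metacharacters.

```sh
#!/bin/sh
# Sketch: POSIX stand-ins for some common bash features.

# Arrays -> positional parameters (set -- inside a function is local to it)
list_demo() {
    set -- one two three
    printf '%s items, second is %s\n' "$#" "$2"
}

# ${var:0:N} -> cut
first_chars() {
    # $1 = string, $2 = number of characters to keep
    printf '%s\n' "$1" | cut -c "1-$2"
}

# ${var//old/new} -> sed
replace_all() {
    # $1 = string, $2 = text to find, $3 = replacement
    printf '%s\n' "$1" | sed "s/$2/$3/g"
}

# [[ $s =~ regex ]] -> grep -E
matches() {
    # $1 = string, $2 = extended regular expression
    printf '%s\n' "$1" | grep -qE -- "$2"
}
```

Each pipe-based replacement starts extra processes, so in a tight loop the POSIX parameter expansions (${var#prefix}, ${var%suffix}) are worth trying first when they can express the same operation.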

GNU Utility Options

| Command | GNU Option | BusyBox Support | Alternative |
|---|---|---|---|
| grep | -P (Perl regex) | Not supported | Use -E (ERE) |
| grep | --include | Not supported | Use find \| xargs grep |
| grep | --exclude | Not supported | Use find \| xargs grep |
| sed | -i'' | Different syntax | Use -i (no argument) |
| sed | -z | Not supported | Install GNU sed |
| find | -printf | Not supported | Use -exec with commands |
| find | -regex | Limited support | Use -name patterns |
| cp | --parents | Not supported | Install GNU coreutils |
| date | -d "string" | Different syntax | Use -D format -d string |
| stat | -c | Limited formats | Check available formats |
| xargs | -r | Different meaning | Test behavior |
| ls | -G | Not supported | Use awk to hide group |
| timeout | N/A | Included | Works but check options |
| realpath | N/A | May need install | Use readlink -f |
| mktemp | -t | Different handling | Test behavior |
| sort | -V | Not supported | Install GNU coreutils |
| head/tail | -c +N | May differ | Test behavior |
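
When a script has to run on both GNU and BusyBox systems, it can probe for a feature at run time and fall back instead of assuming either one. The sketch below shows that pattern for two rows of the table; the function names are hypothetical.

```sh
#!/bin/sh
# Sketch: probe for an optional feature and fall back when it is missing.

resolve_path() {
    # Prefer realpath when installed, otherwise use readlink -f
    if command -v realpath >/dev/null 2>&1; then
        realpath "$1"
    else
        readlink -f "$1"
    fi
}

has_grep_perl() {
    # True only if this grep accepts -P and the pattern matches
    printf 'a1\n' | grep -P '\d' >/dev/null 2>&1
}

grep_digits() {
    # Reads stdin and prints the lines that contain a digit
    if has_grep_perl; then
        grep -P '\d'
    else
        grep -E '[0-9]'
    fi
}
```

Probing costs one extra process per call, so a longer script may prefer to run the probe once at startup and store the result in a variable.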

This project bridges the gap between “works on my machine” and “works in production.” By understanding what makes Alpine different, you’ll write better, more portable shell scripts everywhere.