Project 2: BusyBox Script Compatibility Checker

Quick Reference

Attribute	Details
Difficulty	Beginner
Time Estimate	Weekend (8-16 hours)
Primary Language	Shell (POSIX sh)
Alternative Language	Python
Knowledge Area	Shell Scripting / POSIX Compatibility
Software/Tools	BusyBox, ShellCheck, Alpine Linux, Docker
Main Book	“Effective Shell” by Dave Kerr
Prerequisites	Basic shell scripting, command-line familiarity

Learning Objectives

By completing this project, you will:

Understand the difference between POSIX shell and Bash and why it matters for portability
Identify common bash-isms that break on BusyBox/ash shells
Recognize GNU-specific command options that fail on BusyBox utilities
Write portable shell scripts that work across all Unix-like systems
Build a static analysis tool that detects compatibility issues before deployment
Master POSIX-compatible alternatives to common bash features

Theoretical Foundation

Core Concepts

1. What Is BusyBox?

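BusyBox is a single executable (~1 MB) that provides stripped-down versions of approximately 400 Unix utilities (the exact count depends on how the binary was built). On Alpine Linux, most core commands are symlinks to that one binary: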

┌─────────────────────────────────────────────────────────────────────┐
│                        BusyBox Architecture                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   /bin/busybox (single binary, ~1 MB)                               │
│       │                                                             │
│       ├── ls → symlink to busybox                                   │
│       ├── grep → symlink to busybox                                 │
│       ├── awk → symlink to busybox                                  │
│       ├── sed → symlink to busybox                                  │
│       ├── sh (ash) → symlink to busybox                             │
│       ├── find → symlink to busybox                                 │
│       └── ... 400+ more applets                                     │
│                                                                     │
│   When you run "ls", BusyBox checks argv[0] and runs the            │
│   appropriate applet with simplified functionality.                  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Verification:

# On Alpine Linux
ls -la /bin/ls
# lrwxrwxrwx 1 root root 12 /bin/ls -> /bin/busybox

# List all BusyBox applets
busybox --list | wc -l
# ~400 applets

2. POSIX Shell vs Bash

POSIX (Portable Operating System Interface) defines a standard for Unix-like operating systems. The POSIX shell specification describes a minimal, portable shell. Bash extends POSIX with many features:

┌─────────────────────────────────────────────────────────────────────┐
│                   POSIX Shell vs Bash Features                       │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   POSIX Shell (sh, ash, dash)        Bash Extensions                │
│   ──────────────────────────         ───────────────                │
│   [ test ]                           [[ extended test ]]            │
│   $(command), $((arithmetic))        (( arithmetic )), let          │
│   $var, ${var}                       ${var:0:5} substrings          │
│   case/esac                          arrays: arr=(a b c)            │
│   if/then/else/fi                    associative arrays             │
│   for/while/until                    brace expansion: {1..10}       │
│   local (ash/dash; not in POSIX)     here-strings: <<<              │
│   functions                          process substitution: <()      │
│   pipes and redirection              regex: =~                      │
│   exit status ($?)                   BASHPID, BASH_VERSION          │
│                                                                     │
│   ✓ Works everywhere                 ✗ Bash-only features           │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
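Several of the right-hand features have substitutes built only from POSIX constructs. The sketch below is illustrative rather than definitive; the helper names substr, count_lines, and same are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical helpers showing POSIX substitutes for common bash-isms.

# Substring: bash ${var:offset:length} -> cut -c with a 1-based range.
# Usage: substr STRING OFFSET LENGTH   (OFFSET is 0-based, as in bash)
substr() {
    start=$(( $2 + 1 ))
    end=$(( $2 + $3 ))
    printf '%s\n' "$1" | cut -c "${start}-${end}"
}

# Here-string: bash `wc -l <<< "$text"` -> printf piped into the command.
count_lines() {
    printf '%s\n' "$1" | wc -l | tr -d ' '
}

# String equality: bash [[ $a == "$b" ]] -> [ "$a" = "$b" ]
same() {
    [ "$1" = "$2" ]
}

substr "hello world" 0 5      # prints: hello
count_lines "one
two"                          # prints: 2
```

printf '%s\n' is used instead of echo because echo treats backslashes and a leading -n differently from shell to shell.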

3. GNU vs BusyBox Utilities

GNU Coreutils provides feature-rich implementations with many options. BusyBox provides minimal implementations:

┌─────────────────────────────────────────────────────────────────────┐
│                  GNU Coreutils vs BusyBox Comparison                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   Command    GNU Options           BusyBox Support                  │
│   ───────    ───────────           ───────────────                  │
│   grep       -P (Perl regex)       ❌ Not supported                  │
│              -o (only matching)    ✓ Supported                      │
│              --color               ✓ Supported                      │
│                                                                     │
│   sed        -i'' (no backup)      ❌ Syntax differs                 │
│              -E (extended)         ✓ Supported                      │
│              -z (null-delimited)   ❌ Not supported                  │
│                                                                     │
│   find       -printf               ❌ Not supported                  │
│              -name                 ✓ Supported                      │
│              -exec                 ✓ Supported                      │
│                                                                     │
│   date       -d "string"           ⚠️ Different syntax               │
│              +%format              ✓ Supported                      │
│                                                                     │
│   xargs      -r (no-run-if-empty)  ⚠️ Different behavior             │
│              -0 (null-delimited)   ✓ Supported                      │
│                                                                     │
│   cp         --parents             ❌ Not supported                  │
│              -r (recursive)        ✓ Supported                      │
│                                                                     │
│   stat       -c (custom format)    ⚠️ Limited format options         │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Why This Matters

The Real-World Problem:

You write a Dockerfile that works perfectly on your Ubuntu development machine:

FROM alpine:latest
COPY build.sh /
RUN chmod +x /build.sh && /build.sh

But build.sh contains:

#!/bin/bash
arr=(one two three)
if [[ $DEBUG == "true" ]]; then
    grep -P '\d+' logfile.txt
fi

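Result: The container build fails. The missing interpreter stops the script immediately; the other errors surface once the shebang is changed to /bin/sh. The messages below are illustrative and vary by shell and version (for example, dash reports the [[ error, while a BusyBox ash built with bash-compatibility extensions may accept simple [[ ]] tests):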

/bin/bash: not found
syntax error: unexpected "(" (arrays)
[[: not found

The Cost:

Production deployments fail
CI/CD pipelines break
Hours spent debugging “works on my machine” issues
A larger attack surface when full bash/GNU tools are installed unnecessarily

Historical Context

Why does Alpine use BusyBox?

Alpine Linux was designed for embedded systems, routers, and containers where:

Disk space is precious (a base image of roughly 5 MB, versus 70+ MB for a typical Ubuntu base image)
Attack surface must be minimal
Simplicity aids security auditing
Speed matters (fewer bytes to load)

BusyBox was created in 1996 by Bruce Perens for the Debian installer, specifically to fit on a single floppy disk (1.44 MB). It became the standard for embedded Linux.

Why does this create problems?

Most developers learn shell scripting on systems with:

Bash as the default interactive shell (Ubuntu, RHEL, and macOS before Catalina, which switched to zsh)
GNU Coreutils with full feature sets
Tutorials that assume these tools

This creates a “works on my machine” problem when scripts move to Alpine.

Common Misconceptions

Misconception 1: “sh is just a symlink to bash”

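On Ubuntu/Debian, /bin/sh is a symlink to dash (Debian Almquist Shell), not bash. On Alpine, it is ash (the Almquist Shell variant built into BusyBox). Neither supports the full set of bash-isms; BusyBox ash accepts a few bash extensions when built with compatibility options, but relying on them is not portable.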

# Check what sh actually is
ls -la /bin/sh
# Alpine: /bin/sh -> /bin/busybox
# Ubuntu: /bin/sh -> /bin/dash

Misconception 2: “If it works in bash, it’s portable”

Bash is one of the most feature-rich shells. Testing only in bash guarantees nothing about portability.

Misconception 3: “Just install bash on Alpine”

While possible (apk add bash), this:

Increases image size by ~5 MB
Adds potential security vulnerabilities
Defeats the purpose of using Alpine
Doesn’t fix GNU utility compatibility issues

Misconception 4: “ShellCheck catches everything”

ShellCheck is excellent, and when a script targets sh (through a #!/bin/sh shebang or --shell=sh) it warns about many bash-isms. It analyzes shell syntax, though, so it does not know about BusyBox-specific limitations in utilities such as grep, sed, or find.

Project Specification

What You Will Build

A command-line tool that analyzes shell scripts for Alpine Linux/BusyBox compatibility issues. The tool will:

Detect bash-isms that fail on ash/POSIX shells
Identify GNU-specific command options not supported by BusyBox
Suggest POSIX-compatible alternatives
Provide severity levels (error, warning, info)
Support checking individual files or entire directories
Output in human-readable or machine-parseable formats

Functional Requirements

Shebang Detection
- Flag #!/bin/bash or #!/usr/bin/env bash
- Suggest #!/bin/sh for portability
Bash Syntax Detection
- [[ ]] extended test syntax
- Arrays: arr=(a b c), ${arr[@]}, ${arr[0]}
- Associative arrays: declare -A
- Brace expansion: {1..10}, {a,b,c}
- Here-strings: <<<
- Process substitution: <(), >()
- $'...' ANSI-C quoting
- == in test (should be =)
- function name() (should be name())
- source (should be .)
- let and (( )) arithmetic
- ${var:offset:length} substring
- ${var/pattern/replacement} substitution
- local -a, local -A typed local variables
- read -a (read into array)
- printf -v (assign to variable)
- select loops
GNU Utility Option Detection
- grep -P (Perl regex)
- grep --include, grep --exclude
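- sed -i '' (in-place with an empty suffix; after quote removal -i'' is plain -i, while the BSD form with a separate '' argument fails on BusyBox)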
- sed -z (null-delimited)
- find -printf
- find -regex (with certain patterns)
- cp --parents
- date -d (parse date string)
- stat -c (with unsupported format codes)
- xargs -r (different behavior)
- readlink -f (works but verify)
- ls -G (hide group)
- timeout command
- realpath command
- mktemp -t (template handling)
Report Generation
- Line number and column
- Problematic code snippet
- Severity level
- Suggested fix
- Reference documentation

Non-Functional Requirements

Process files under 10 MB within 1 second
Support UTF-8 encoded scripts
Zero dependencies for shell implementation
Python implementation should work with Python 3.7+ (dataclasses and re.Pattern are not available in 3.6)
Exit code 0 if there are no errors or warnings, 1 if any errors are found, 2 if there are warnings but no errors
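The exit-code rule can be sketched as a small function. This is one possible reading of the requirement, and the name exit_code_for is hypothetical.

```shell
#!/bin/sh
# Map finding counts to the exit code described in the requirement.
# Usage: exit_code_for ERRORS WARNINGS   (hypothetical helper name)
exit_code_for() {
    errors=$1
    warnings=$2
    if [ "$errors" -gt 0 ]; then
        echo 1          # any error wins, even if warnings are present
    elif [ "$warnings" -gt 0 ]; then
        echo 2          # warnings only
    else
        echo 0          # clean, or info-level findings only
    fi
}

exit_code_for 8 2   # prints: 1
exit_code_for 0 2   # prints: 2
exit_code_for 0 0   # prints: 0
```

A caller would finish with something like exit "$(exit_code_for "$errors" "$warnings")", which lets CI pipelines treat warnings differently from hard failures.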

Example Usage and Output

Input script (deploy.sh):

#!/bin/bash

# Configuration
declare -A config
config[host]="prod.example.com"
config[port]=22

# Arrays for servers
servers=(web1 web2 web3)

# Check if debug mode
if [[ $DEBUG == "true" ]]; then
    set -x
fi

# Find and process logs
find /var/log -name "*.log" -printf "%f\n" | while read file; do
    grep -P '\d{4}-\d{2}-\d{2}' "$file" > /tmp/dates.txt

    # Get line count
    count=$(wc -l <<< "$(cat /tmp/dates.txt)")
    echo "Found $count date patterns in $file"
done

# In-place update
sed -i'' 's/old/new/g' /etc/config.conf

# Check status
for server in ${servers[@]}; do
    timeout 5 nc -z "$server" 22
done

Tool output:

$ ./alpine-check.sh deploy.sh

=== Alpine/BusyBox Compatibility Report ===
File: deploy.sh
================================================================================

Line 1: #!/bin/bash
  [ERROR] Bash shebang detected
  Alpine uses ash (BusyBox shell). Bash is not installed by default.
  FIX: Change to #!/bin/sh or install bash with: apk add bash

Line 4: declare -A config
  [ERROR] Associative arrays require bash
  BusyBox ash does not support associative arrays.
  FIX: Use separate variables or a configuration file

Line 9: servers=(web1 web2 web3)
  [ERROR] Array syntax not supported in ash
  BusyBox ash does not support bash arrays.
  FIX: Use positional parameters: set -- web1 web2 web3
       Then access with: $1, $2, $3 or "$@"

Line 12: if [[ $DEBUG == "true" ]]; then
  [ERROR] Extended test [[ ]] not supported
  BusyBox ash uses [ ] (single brackets) only.
  FIX: Use [ "$DEBUG" = "true" ]
       Note: Use = not == for string comparison

Line 17: find /var/log -name "*.log" -printf "%f\n"
  [ERROR] -printf option not supported by BusyBox find
  BusyBox find lacks -printf formatting.
  FIX: Use: find /var/log -name "*.log" -exec basename {} \;
       Or: find /var/log -name "*.log" | xargs -n1 basename

Line 18: grep -P '\d{4}-\d{2}-\d{2}'
  [ERROR] -P (Perl regex) not supported by BusyBox grep
  BusyBox grep does not support Perl regular expressions.
  FIX: Install GNU grep: apk add grep
       Or use extended regex: grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}'

Line 21: count=$(wc -l <<< "$(cat /tmp/dates.txt)")
  [ERROR] Here-string (<<<) not supported in ash
  BusyBox ash does not support here-strings.
  FIX: Use a pipe: count=$(cat /tmp/dates.txt | wc -l)
       Or: count=$(wc -l < /tmp/dates.txt)

Line 26: sed -i'' 's/old/new/g' /etc/config.conf
  [WARNING] sed -i with an empty suffix is not portable
  After quote removal -i'' is plain -i; the BSD form -i '' (separate argument) fails on BusyBox.
  FIX: Use: sed -i 's/old/new/g' /etc/config.conf

Line 29: for server in ${servers[@]}; do
  [ERROR] Array expansion ${arr[@]} requires bash
  BusyBox ash does not support arrays.
  FIX: Use positional parameters with: for server in "$@"; do

Line 30: timeout 5 nc -z "$server" 22
  [WARNING] timeout command may not be available
  BusyBox includes timeout but behavior may differ.
  Verify with: busybox timeout --help

================================================================================
Summary:
  Errors:   8
  Warnings: 2
  Info:     0

This script will NOT work on Alpine Linux without modifications.
Run with --fix to see a corrected version.

Real World Outcome

After running your tool and fixing the issues, the corrected script:

#!/bin/sh

# Configuration (using separate variables instead of associative array)
config_host="prod.example.com"
config_port=22

# Servers using positional parameters
set -- web1 web2 web3

# Check if debug mode (POSIX-compatible test)
if [ "$DEBUG" = "true" ]; then
    set -x
fi

# Find and process logs (without -printf)
find /var/log -name "*.log" -exec basename {} \; | while IFS= read -r file; do
    grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}' "/var/log/$file" > /tmp/dates.txt

    # Get line count (without here-string)
    count=$(wc -l < /tmp/dates.txt)
    echo "Found $count date patterns in $file"
done

# In-place update (BusyBox syntax)
sed -i 's/old/new/g' /etc/config.conf

# Check status (using positional parameters)
for server in "$@"; do
    timeout 5 nc -z "$server" 22
done

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────────────────┐
│                     BusyBox Script Compatibility Checker                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Input Handler                                   │
│  • Parse command-line arguments (--json, --fix, --severity)                 │
│  • Read file(s) or stdin                                                    │
│  • Handle encoding (UTF-8)                                                  │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Pattern Database                                  │
│  ┌─────────────────────┐  ┌─────────────────────┐  ┌──────────────────────┐ │
│  │   Bash-ism Rules    │  │   GNU Option Rules  │  │   Shebang Rules      │ │
│  │   (syntax errors)   │  │   (command opts)    │  │   (interpreter)      │ │
│  └─────────────────────┘  └─────────────────────┘  └──────────────────────┘ │
│                                                                              │
│  Each rule contains:                                                         │
│  • Pattern (regex or string match)                                          │
│  • Severity (error, warning, info)                                          │
│  • Description                                                              │
│  • Fix suggestion                                                           │
│  • Reference URL                                                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Line Scanner                                    │
│  • For each line in file:                                                   │
│    - Skip comments (but check shebang on line 1)                            │
│    - Apply each rule pattern                                                │
│    - Record matches with line/column info                                   │
│  • Handle multi-line constructs                                             │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Report Generator                                  │
│  • Aggregate findings                                                        │
│  • Sort by severity/line number                                             │
│  • Format output (human-readable, JSON, or fixed script)                    │
│  • Set exit code                                                            │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                Output                                        │
│  • stdout: Report or fixed script                                           │
│  • stderr: Error messages                                                   │
│  • Exit code: 0 (ok), 1 (errors), 2 (warnings only)                         │
└─────────────────────────────────────────────────────────────────────────────┘

Key Components

1. Pattern Database Structure

Each compatibility rule is defined with:

Rule:
  id: unique identifier (e.g., "BASH_ARRAY")
  pattern: regex or literal string to match
  severity: error | warning | info
  message: human-readable description
  fix: suggested replacement or workaround
  reference: URL to documentation
  context: where to check (shebang, line, command)
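In a shell implementation the same fields can be passed as arguments to one generic function, which keeps each rule on a single line. The sketch below is an assumption about how that might look; apply_rule and run_rules are hypothetical names, and only three rules are shown.

```shell
#!/bin/sh
# Usage: apply_rule LINENO LINE ID SEVERITY PATTERN MESSAGE
# Prints one finding record (LINENO:SEVERITY:ID:MESSAGE) if PATTERN matches LINE.
apply_rule() {
    # -e protects patterns that begin with a dash
    if printf '%s\n' "$2" | grep -qE -e "$5"; then
        printf '%s:%s:%s:%s\n' "$1" "$4" "$3" "$6"
    fi
}

# Usage: run_rules LINENO LINE
# One apply_rule call per rule; adding a rule means adding one line here.
run_rules() {
    apply_rule "$1" "$2" BASH_DOUBLE_BRACKET error '\[\[.*\]\]' 'Extended test [[ ]] not supported'
    apply_rule "$1" "$2" BASH_HERE_STRING error '<<<' 'Here-string (<<<) not supported'
    apply_rule "$1" "$2" GNU_GREP_PERL error 'grep[[:space:]]+-[a-zA-Z]*P' 'grep -P not supported by BusyBox'
}

run_rules 12 'if [[ $DEBUG == "true" ]]; then'
# prints: 12:error:BASH_DOUBLE_BRACKET:Extended test [[ ]] not supported
```

Fix text and reference URLs could be looked up by rule id at report time instead of being carried in every record.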

2. Scanner Logic

For each file:
  Read all lines

  For line 1:
    Check shebang rules

  For each line:
    Skip if starts with # (comment) and line > 1

    For each bash-ism rule:
      If pattern matches:
        Record finding

    For each command pattern:
      If command with GNU option found:
        Record finding

  Return list of findings

Data Structures

Shell Implementation (using shell constructs):

# Findings stored as formatted strings
# Format: LINE:SEVERITY:RULE_ID:MESSAGE

# Example:
# "11:error:BASH_DOUBLE_BRACKET:Extended test [[ ]] not supported"
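Records in this format can be split with IFS=: and read. The sketch below counts findings by severity; summarize is a hypothetical name, and because read assigns the remainder of the line to the last variable, messages may contain colons.

```shell
#!/bin/sh
# summarize: read LINE:SEVERITY:RULE_ID:MESSAGE records on stdin, print totals.
summarize() {
    errors=0 warnings=0 infos=0
    while IFS=: read -r lineno severity rule_id message; do
        case "$severity" in
            error)   errors=$((errors + 1)) ;;
            warning) warnings=$((warnings + 1)) ;;
            info)    infos=$((infos + 1)) ;;
        esac
    done
    printf 'Errors: %d Warnings: %d Info: %d\n' "$errors" "$warnings" "$infos"
}

printf '%s\n' \
    '1:error:BASH_SHEBANG:Bash shebang detected' \
    '26:warning:SED_INPLACE:sed -i syntax differs' \
    '12:error:BASH_DOUBLE_BRACKET:Extended test [[ ]] not supported' |
    summarize
# prints: Errors: 2 Warnings: 1 Info: 0
```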

Python Implementation:

import re
from dataclasses import dataclass

@dataclass
class Finding:
    file_path: str
    line_number: int
    column: int
    severity: str  # "error", "warning", "info"
    rule_id: str
    message: str
    snippet: str
    fix: str
    reference: str

@dataclass
class Rule:
    id: str
    pattern: re.Pattern
    severity: str
    message: str
    fix: str
    reference: str = ""
    context: str = "line"  # "shebang", "line", "command"

Algorithm Overview

Main Algorithm:

function check_file(path):
    lines = read_file(path)
    findings = []

    # Check shebang
    if lines is not empty and lines[0] matches bash shebang:
        findings.append(bash_shebang_finding)

    # Check each line
    for i, line in enumerate(lines):
        # Skip comments (except shebang)
        if line starts with "#" and i > 0:
            continue

        # Check bash-isms
        for rule in bash_rules:
            if rule.pattern matches line:
                findings.append(create_finding(rule, line, i))

        # Check GNU options
        for cmd_rule in command_rules:
            if cmd_rule.pattern matches line:
                findings.append(create_finding(cmd_rule, line, i))

    return findings
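The comment-skipping step of this algorithm might look as follows in POSIX sh. This is a sketch under the assumption that only full-line comments are skipped; should_scan and list_scannable are hypothetical names.

```shell
#!/bin/sh
# Usage: should_scan LINENO LINE
# Succeeds unless LINE is a full-line comment after line 1.
should_scan() {
    if [ "$1" -eq 1 ]; then
        return 0    # line 1 is always scanned so the shebang rules can run
    fi
    # Skip full-line comments, allowing leading whitespace.
    ! printf '%s\n' "$2" | grep -qE '^[[:space:]]*#'
}

# list_scannable: print "LINENO:LINE" for each stdin line that would be scanned.
list_scannable() {
    n=0
    # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        if should_scan "$n" "$line"; then
            printf '%d:%s\n' "$n" "$line"
        fi
    done
}

printf '%s\n' '#!/bin/bash' '# a comment' 'arr=(a b)' '   # indented comment' |
    list_scannable
# prints:
# 1:#!/bin/bash
# 3:arr=(a b)
```

Trailing comments such as "cmd  # note" are still scanned in this sketch, so a rule can match text inside them.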

Implementation Guide

Development Environment Setup

Option 1: Shell-only development (recommended for learning)

# Alpine container for testing
docker run -it --rm alpine:latest sh

# Create work directory
mkdir -p /workspace
cd /workspace

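# Verify that sh is the BusyBox shell
readlink -f /bin/sh  # Should show: /bin/busybox

# Probe for a bash-only feature
arr=(a b c)
# Result: syntax error: unexpected "(" - arrays do not exist in ash
# Note: Alpine's ash is built with some bash-compatibility extensions, so
# simple [[ ]] tests may run there; they still fail on dash and are not POSIX.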

Option 2: Python development

# Create virtual environment
python3 -m venv venv
. venv/bin/activate  # POSIX spelling of 'source'

# No external dependencies needed for basic version
# Optional: install pytest for testing
pip install pytest

Test Container Setup:

# Dockerfile.test
FROM alpine:latest
RUN apk add --no-cache coreutils grep sed findutils
# Now we have both BusyBox AND GNU versions for comparison
WORKDIR /workspace
COPY . .

Project Structure

Shell Implementation:

alpine-check/
├── alpine-check.sh          # Main script
├── rules/
│   ├── bash-isms.sh         # Bash syntax patterns
│   └── gnu-options.sh       # GNU utility patterns
├── tests/
│   ├── test-samples/        # Sample scripts to check
│   │   ├── bash-only.sh     # Script with bash-isms
│   │   ├── gnu-heavy.sh     # Script with GNU options
│   │   └── posix-clean.sh   # POSIX-compliant script
│   └── run-tests.sh         # Test runner
└── README.md

Python Implementation:

alpine_check/
├── alpine_check/
│   ├── __init__.py
│   ├── main.py              # CLI entry point
│   ├── scanner.py           # File scanner
│   ├── rules.py             # Rule definitions
│   ├── reporter.py          # Output formatting
│   └── fixer.py             # Auto-fix suggestions
├── tests/
│   ├── test_scanner.py
│   ├── test_rules.py
│   └── fixtures/            # Sample scripts
├── pyproject.toml
└── README.md

The Core Question You’re Answering

“How do I know if my shell script will work on Alpine Linux before I deploy it?”

This is the gap you’re filling. Currently, developers must:

Write a script on their dev machine (Ubuntu, macOS)
Deploy to Alpine container
Watch it fail
Debug cryptic syntax errors
Search Stack Overflow
Repeat

Your tool provides immediate feedback during development.

Concepts to Understand First

Before implementing, ensure you can answer:

What is a shebang and how does it affect script execution?
- The #! line tells the kernel which interpreter to use
- #!/bin/bash explicitly requires bash; fails if bash is missing
What is the difference between [ and [[?
- [ is a command (actually /usr/bin/[ or built-in)
- [[ is bash syntax, not a command
- [[ allows ==, glob and regex matching, && and || inside the test, and unquoted variables without word splitting
Why do arrays fail in POSIX sh?
- Arrays are a bash extension
- POSIX only defines $* and $@ for positional parameters
What makes a regex “Perl-compatible”?
- PCRE includes features like \d, \w, lookahead, etc.
- BRE and ERE (POSIX) use [0-9], [a-zA-Z_], etc.
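The last point suggests a mechanical rewrite for the most common PCRE shorthands, which a fix suggestion could use. The sketch below is a heuristic, not a full translator: pcre_to_ere is a hypothetical name, it cannot express lookahead or other PCRE-only constructs, and it gives wrong results for shorthands inside a bracket expression such as [\d_].

```shell
#!/bin/sh
# Usage: pcre_to_ere PATTERN
# Rewrites \d, \w and \s into POSIX character classes for use with grep -E.
pcre_to_ere() {
    printf '%s\n' "$1" | sed \
        -e 's/\\d/[0-9]/g' \
        -e 's/\\w/[[:alnum:]_]/g' \
        -e 's/\\s/[[:space:]]/g'
}

pcre_to_ere '\d{4}-\d{2}-\d{2}'
# prints: [0-9]{4}-[0-9]{2}-[0-9]{2}
```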

Questions to Guide Your Design

Architecture Questions:

Should the tool be a single file or modular?
How do you handle multi-line constructs like here-documents?
Should you parse the AST or use regex on raw lines?

Feature Questions:

Should you detect issues inside strings? (echo "[[ test ]]" is fine)
How do you handle sourced/included files?
Should you support configuration files for custom rules?

Output Questions:

What output formats are needed (text, JSON, SARIF)?
How verbose should the default output be?
Should the tool suggest fixes or just report issues?
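For the question about matches inside strings, one low-cost option is to blank out quoted spans before applying syntax rules. The sketch below is a heuristic with known gaps: it does not handle escaped quotes or an apostrophe inside double quotes. The names strip_quoted and has_double_bracket are hypothetical.

```shell
#!/bin/sh
# Usage: strip_quoted LINE
# Replaces '...' and "..." spans with empty quotes so rules cannot match inside.
strip_quoted() {
    printf '%s\n' "$1" | sed -e "s/'[^']*'/''/g" -e 's/"[^"]*"/""/g'
}

# Usage: has_double_bracket LINE
# Succeeds only if [[ ... ]] appears outside quoted text.
has_double_bracket() {
    strip_quoted "$1" | grep -qE '\[\[.*\]\]'
}

strip_quoted 'echo "[[ test ]]"'    # prints: echo ""
has_double_bracket 'echo "[[ test ]]"' || echo "no finding"   # prints: no finding
```

Stripping should apply only to syntax rules. Option rules such as the sed -i'' check need the quotes intact.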

Thinking Exercise

Before writing code, trace through this script manually:

#!/bin/bash
files=($(find . -name "*.txt"))
for f in "${files[@]}"; do
    if [[ -f $f ]]; then
        count=$(wc -l <<< "$(cat $f)")
        echo "$f: $count lines"
    fi
done

Exercise:

List every line that would fail on Alpine
For each issue, identify the exact error ash would produce
Write a POSIX-compatible version

Expected Analysis:

Line 1: #!/bin/bash - fails, bash not installed
Line 2: files=(...) - syntax error, arrays not supported
Line 2: $(...) inside array - would work if arrays worked
Line 3: "${files[@]}" - array expansion, not supported
Line 4: [[ -f $f ]] - extended test, not supported
Line 5: <<< - here-string, not supported

Hints in Layers

Hint 1 - Starting Point: Begin with the simplest check: detecting bash shebangs. This is a literal string match:

check_shebang() {
    line="$1"
    case "$line" in
        "#!/bin/bash"*|"#!/usr/bin/env bash"*)
            echo "Bash shebang detected"
            ;;
    esac
}

Hint 2 - Pattern Matching Approach: For shell implementation, use case statements with glob patterns. For more complex patterns, use grep -E:

check_double_brackets() {
    line="$1"
    # Match [[ ... ]] anywhere in the line (this simple regex also matches inside quoted strings)
    echo "$line" | grep -qE '\[\[.*\]\]' && echo "Found [[ ]]"
}

Hint 3 - Command Option Detection: Extract the command first, then check its options:

check_grep_options() {
    line="$1"
    # Check if line contains grep
    case "$line" in
        *grep*)
            # Check for -P, alone or bundled with other short options (e.g. -oP)
            echo "$line" | grep -qE 'grep[[:space:]]+-[a-zA-Z]*P' && \
                echo "grep -P not supported"
            ;;
    esac
}

Hint 4 - Complete Rule Structure: For the shell version, store rules as functions with embedded metadata:

# Rule: BASH_ARRAY
# Severity: error
# Pattern: name=(...)
check_bash_array() {
    line="$1"
    lineno="$2"

    # Match var=( ... ), including var=($(cmd)); var=$(cmd) contains no "=(" and is not matched
    if echo "$line" | grep -qE '[a-zA-Z_][a-zA-Z0-9_]*=\('; then
        echo "$lineno:error:BASH_ARRAY:Array syntax not supported"
        echo "  FIX: Use positional parameters: set -- val1 val2 val3"
    fi
}

Interview Questions This Project Prepares You For

“What’s the difference between POSIX shell and Bash?”
- Answer: Bash extends POSIX with arrays, [[ ]], process substitution, here-strings, etc. POSIX sh is the portable subset.
“Why might a shell script work on Ubuntu but fail on Alpine?”
- Answer: Alpine uses BusyBox ash (POSIX sh) by default, not bash. Also, utilities are BusyBox versions with fewer options.
“How would you make a shell script portable across different Unix systems?”
- Answer: Use #!/bin/sh, avoid bash-isms, use only POSIX-defined utility options, test on multiple systems.
“What’s the difference between grep -E and grep -P?”
- Answer: -E uses Extended Regular Expressions (ERE, POSIX), -P uses Perl-Compatible Regular Expressions (PCRE). PCRE is not portable.
“How do you handle arrays in POSIX sh?”
- Answer: Use positional parameters (set -- val1 val2, then $1, $@) or newline-separated strings.
“What does BusyBox provide and why is it used?”
- Answer: BusyBox is a single binary providing 400+ Unix utilities. Used in embedded systems and Alpine for minimal size.
“How would you build a static analysis tool for shell scripts?”
- Answer: Either parse the shell grammar into an AST (complex) or use pattern matching on lines (simpler but less accurate).
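The positional-parameter answer is easier to defend with a concrete example. The sketch below shows set -- standing in for a small array; join_servers is a hypothetical helper, and the bash equivalents appear only in comments.

```shell
#!/bin/sh
# Emulating a small list with positional parameters (POSIX sh has no arrays).
join_servers() {
    # Inside a function, "$@" is the function's own argument list,
    # so the caller's positional parameters are untouched.
    count=$#
    first=$1
    result=""
    for s in "$@"; do
        result="${result:+$result,}$s"   # add a comma only between items
    done
    printf '%s items, first=%s, all=%s\n' "$count" "$first" "$result"
}

set -- web1 web2 web3        # bash: servers=(web1 web2 web3)
join_servers "$@"            # prints: 3 items, first=web1, all=web1,web2,web3
set -- "$@" web4             # bash: servers+=(web4)
echo "$#"                    # prints: 4
```

The main limitation is that a scope has only one such list at a time, which is why newline-separated strings are the usual fallback for a second list.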

Books That Will Help

Topic	Book	Chapter/Section
POSIX Shell	“Effective Shell” by Dave Kerr	Chapters on portability
Shell Scripting	“Classic Shell Scripting” by Robbins & Beebe	Chapter 6: Variables, Making Decisions
Regular Expressions	“Mastering Regular Expressions” by Friedl	BRE vs ERE vs PCRE
Unix Philosophy	“The Art of Unix Programming” by Raymond	Chapters 7-8: Multiprogramming, Minilanguages
Alpine Linux	Alpine Wiki	Comparison with other distros

Implementation Phases

Phase 1: Basic Shebang and Syntax Detection (2-3 hours)

Goals:

Detect bash shebangs
Detect [[ ]] extended tests
Detect array declarations

Shell Implementation:

#!/bin/sh
# alpine-check.sh - Check scripts for Alpine compatibility

# Check a single line for issues
check_line() {
    line="$1"
    lineno="$2"

    # Check for bash shebang (only line 1)
    if [ "$lineno" -eq 1 ]; then
        case "$line" in
            "#!/bin/bash"*|"#!/usr/bin/env bash"*)
                printf "Line %d: %s\n" "$lineno" "$line"
                printf "  [ERROR] Bash shebang - Alpine uses ash, not bash\n"
                printf "  FIX: Use #!/bin/sh or install: apk add bash\n\n"
                ;;
        esac
    fi

    # Check for [[ ]]
    if echo "$line" | grep -qE '\[\[.*\]\]'; then
        printf "Line %d: %s\n" "$lineno" "$line"
        printf "  [ERROR] Extended test [[ ]] not supported in ash\n"
        printf "  FIX: Use [ ] with proper quoting\n\n"
    fi

    # Check for arrays: var=( ... ), including var=($(cmd))
    if echo "$line" | grep -qE '^[[:space:]]*[a-zA-Z_][a-zA-Z0-9_]*=\('; then
        printf "Line %d: %s\n" "$lineno" "$line"
        printf "  [ERROR] Array syntax not supported in ash\n"
        printf "  FIX: Use positional parameters: set -- val1 val2\n\n"
    fi
}

# Main: read file and check each line
check_file() {
    file="$1"
    lineno=0

    printf "=== Checking: %s ===\n\n" "$file"

    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        check_line "$line" "$lineno"
    done < "$file"
}

# Entry point
for file in "$@"; do
    if [ -f "$file" ]; then
        check_file "$file"
    else
        printf "Error: %s not found\n" "$file" >&2
    fi
done

Test:

# Create test script
cat > /tmp/test.sh << 'EOF'
#!/bin/bash
arr=(one two three)
if [[ $x == "test" ]]; then
    echo "match"
fi
EOF

# Run checker
./alpine-check.sh /tmp/test.sh

Phase 2: Complete Bash-ism Detection (3-4 hours)

Goals:

Add all bash-ism patterns
Improve output formatting
Add severity levels

Additional patterns to implement:

# Here-strings
check_here_string() {
    if echo "$line" | grep -qE '<<<'; then
        report_error "$lineno" "Here-string (<<<) not supported"
        report_fix "Use echo and pipe: echo \"text\" | cmd"
    fi
}

# Process substitution
check_process_substitution() {
    if echo "$line" | grep -qE '<\(|>\('; then
        report_error "$lineno" "Process substitution <() >() not supported"
        report_fix "Use temporary files or named pipes"
    fi
}

# Brace expansion
check_brace_expansion() {
    if echo "$line" | grep -qE '\{[0-9a-zA-Z]+\.\.[0-9a-zA-Z]+\}|\{[^{},[:space:]]+(,[^{},[:space:]]+)+\}'; then
        report_error "$lineno" "Brace expansion not supported"
        report_fix "For numbers, use: seq 1 10"
        report_fix "For letters, list explicitly: a b c"
    fi
}

# == in test
check_double_equals() {
    if echo "$line" | grep -qE '\[[[:space:]].*=='; then
        report_warning "$lineno" "Use = not == for string comparison"
    fi
}

# source vs . (dot)
check_source() {
    if echo "$line" | grep -qE '^[[:space:]]*source[[:space:]]'; then
        report_warning "$lineno" "'source' may not work; use '.' instead"
    fi
}

# function keyword
check_function_keyword() {
    if echo "$line" | grep -qE '^[[:space:]]*function[[:space:]]+[a-zA-Z_]'; then
        report_warning "$lineno" "'function' keyword is not portable"
        report_fix "Use: name() { ... }"
    fi
}
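The check_* functions above read $line and $lineno as globals set by the caller and call report_error, report_warning, and report_fix, which are not defined in this section. A minimal sketch of those helpers follows; the output layout and the ERRORS and WARNINGS counters are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical reporting helpers assumed by the check_* functions.
ERRORS=0
WARNINGS=0

# Usage: report_error LINENO MESSAGE
report_error() {
    ERRORS=$((ERRORS + 1))
    printf 'Line %s:\n  [ERROR] %s\n' "$1" "$2"
}

# Usage: report_warning LINENO MESSAGE
report_warning() {
    WARNINGS=$((WARNINGS + 1))
    printf 'Line %s:\n  [WARNING] %s\n' "$1" "$2"
}

# Usage: report_fix SUGGESTION
report_fix() {
    printf '  FIX: %s\n' "$1"
}
```

The counters survive only when the checks run in the current shell. A loop fed by redirection (done < "$file") keeps them, whereas a loop at the end of a pipeline runs in a subshell in most shells and loses them.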

Phase 3: GNU Utility Detection (2-3 hours)

Goals:

Detect GNU-specific command options
Provide BusyBox alternatives or apk install suggestions

Command patterns:

# grep -P (Perl regex)
check_grep_perl() {
    if echo "$line" | grep -qE 'grep\s+(-[a-zA-Z]*P|-P|--perl-regexp)'; then
        report_error "$lineno" "grep -P not supported by BusyBox"
        report_fix "Install GNU grep: apk add grep"
        report_fix "Or use ERE: grep -E '[0-9]{4}'"
    fi
}

# find -printf
check_find_printf() {
    if echo "$line" | grep -qE 'find\s.*-printf'; then
        report_error "$lineno" "find -printf not supported by BusyBox"
        report_fix "Install GNU find: apk add findutils"
        report_fix "Or use: find ... -exec basename {} \\;"
    fi
}

# sed -i with an empty backup suffix (BSD/macOS style)
check_sed_inplace() {
    if echo "$line" | grep -qE "sed\s+-i\s*(''|\"\")"; then
        report_warning "$lineno" "sed -i '' (empty backup suffix) is BSD syntax; not portable to BusyBox"
        report_fix "Use: sed -i 's/old/new/' file"
    fi
}

# date -d
check_date_parse() {
    if echo "$line" | grep -qE 'date\s+(-d|--date)'; then
        report_warning "$lineno" "date -d syntax differs in BusyBox"
        report_fix "BusyBox uses: date -D FORMAT -d STRING"
        report_fix "Or install coreutils: apk add coreutils"
    fi
}

# cp --parents
check_cp_parents() {
    if echo "$line" | grep -qE 'cp\s+.*--parents'; then
        report_error "$lineno" "cp --parents not supported by BusyBox"
        report_fix "Install GNU coreutils: apk add coreutils"
    fi
}
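
The fix messages above repeat the same "apk add" hints by hand. A small lookup can centralize them. This is a minimal sketch: suggest_package and report_install_hint are hypothetical helpers, and the mapping only covers the packages already named in this phase.

```shell
#!/bin/sh
# Sketch only: suggest_package and report_install_hint are hypothetical helpers.
# The package names are the ones used in the fix messages above.
suggest_package() {
    case "$1" in
        grep)          echo "grep" ;;
        find)          echo "findutils" ;;
        cp|date|sort)  echo "coreutils" ;;
        *)             return 1 ;;   # no known package for this command
    esac
}

# Print an install hint for a command, or nothing if no package is known.
report_install_hint() {
    if pkg=$(suggest_package "$1"); then
        printf 'Install GNU %s: apk add %s\n' "$1" "$pkg"
    fi
}
```

A check function could then call report_install_hint find instead of hard-coding the package name in each message.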

Phase 4: Output Formatting and CLI (2-3 hours)

Goals:

Parse command-line options
Support JSON output
Generate summary statistics

CLI options:

usage() {
    cat << 'EOF'
Usage: alpine-check.sh [OPTIONS] FILE...

Options:
  -h, --help      Show this help message
  -j, --json      Output in JSON format
  -q, --quiet     Only show errors (no warnings/info)
  -v, --verbose   Show all findings including info
  -f, --fix       Show corrected script on stdout
  --severity LVL  Filter by severity (error, warning, info)
  --              End of options

Examples:
  alpine-check.sh script.sh
  alpine-check.sh -j *.sh
  alpine-check.sh --fix broken.sh > fixed.sh
EOF
}

# Parse options
while [ $# -gt 0 ]; do
    case "$1" in
        -h|--help)
            usage
            exit 0
            ;;
        -j|--json)
            OUTPUT_FORMAT="json"
            shift
            ;;
        -q|--quiet)
            VERBOSITY="quiet"
            shift
            ;;
        -v|--verbose)
            VERBOSITY="verbose"
            shift
            ;;
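        -f|--fix)
            MODE="fix"
            shift
            ;;
        --severity)
            if [ $# -lt 2 ]; then
                printf "Option --severity requires a value\n" >&2
                exit 2
            fi
            SEVERITY="$2"
            shift 2
            ;;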
        --)
            shift
            break
            ;;
        -*)
            printf "Unknown option: %s\n" "$1" >&2
            exit 2
            ;;
        *)
            break
            ;;
    esac
done
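
The JSON output goal needs string escaping, which is easy to get wrong in shell. The sketch below shows one way to emit a finding as a single JSON object per line. json_escape and emit_json_finding are hypothetical helper names, and the field names are assumptions, not a fixed schema.

```shell
#!/bin/sh
# Sketch only: json_escape and emit_json_finding are hypothetical helpers.

# Escape backslashes, double quotes, and tabs for use inside a JSON string.
# Assumes single-line input, which holds for per-line findings.
json_escape() {
    printf '%s' "$1" |
        sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e "s/$(printf '\t')/\\\\t/g"
}

# emit_json_finding FILE LINENO SEVERITY MESSAGE
# Prints one JSON object per finding (JSON Lines style).
emit_json_finding() {
    printf '{"file":"%s","line":%d,"severity":"%s","message":"%s"}\n' \
        "$(json_escape "$1")" "$2" "$(json_escape "$3")" "$(json_escape "$4")"
}
```

One object per line keeps the shell side simple because no comma bookkeeping is needed; a consumer that wants a single array can wrap the lines afterwards.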

Phase 5: Testing and Edge Cases (2-3 hours)

Goals:

Create comprehensive test suite
Handle edge cases
Ensure no false positives

Test cases to include:

# test_samples/false_positives.sh
#!/bin/sh
# These should NOT trigger warnings

# String containing [[
echo "Use [[ for bash"

# Comment with array syntax
# arr=(this is a comment)

# Quoted grep -P
echo 'grep -P is not portable'

# Inside single quotes
sed 's/\[\[/test/'

# Heredoc content
cat << 'EOF'
arr=(inside heredoc)
[[ inside heredoc ]]
EOF

# test_samples/all_bashisms.sh
#!/bin/bash

# Arrays
arr=(one two three)
declare -a indexed
declare -A assoc
arr+=("four")
echo "${arr[@]}"
echo "${arr[0]}"
echo "${#arr[@]}"

# Extended test
[[ $var == "test" ]]
[[ $var =~ ^[0-9]+$ ]]
[[ -f file && -r file ]]

# Here-strings
cat <<< "here string"
read var <<< "input"

# Process substitution
diff <(sort file1) <(sort file2)
tee >(cat > log)

# Brace expansion
echo {1..10}
echo {a,b,c}.txt
mkdir -p dir/{sub1,sub2}

# Substrings
echo "${var:0:5}"
echo "${var:5}"
echo "${var: -3}"

# Replacements
echo "${var//old/new}"
echo "${var/old/new}"
echo "${var/#prefix/}"
echo "${var/%suffix/}"

# source vs. . (dot)
source script.sh

# function keyword
function myfunc() {
    local -a arr
}

# Arithmetic
let x=1+2
(( x++ ))
echo $(( x + 1 ))  # POSIX arithmetic expansion; the checker should NOT flag this

# select loop
select opt in a b c; do
    break
done

# local typed
local -i num
local -a arr
local -A assoc

# read options
read -a arr
read -t 5 var
read -p "prompt" var

# printf -v
printf -v var "value"

Testing Strategy

Unit Tests

For shell implementation (using shell functions):

#!/bin/sh
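# test_rules.sh

# Load check_line and the rule functions first. This assumes the checker can be
# sourced safely: with no arguments its entry-point loop does nothing.
. ./alpine-check.sh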

test_check_double_brackets() {
    result=$(check_line '[[ $x == "test" ]]' 1)
    if echo "$result" | grep -q "ERROR"; then
        echo "PASS: Detects [[ ]]"
    else
        echo "FAIL: Should detect [[ ]]"
        return 1
    fi
}

test_false_positive_string() {
    result=$(check_line 'echo "Use [[ for bash"' 1)
    if echo "$result" | grep -q "ERROR"; then
        echo "FAIL: False positive in string"
        return 1
    else
        echo "PASS: No false positive in string"
    fi
}

test_array_detection() {
    result=$(check_line 'arr=(one two three)' 1)
    if echo "$result" | grep -q "Array"; then
        echo "PASS: Detects array"
    else
        echo "FAIL: Should detect array"
        return 1
    fi
}

test_command_substitution_not_array() {
    result=$(check_line 'var=$(command)' 1)
    if echo "$result" | grep -q "Array"; then
        echo "FAIL: False positive for command substitution"
        return 1
    else
        echo "PASS: Command substitution not detected as array"
    fi
}

# Run all tests
run_tests() {
    tests=0
    passed=0

    for test in test_check_double_brackets \
                test_false_positive_string \
                test_array_detection \
                test_command_substitution_not_array; do
        tests=$((tests + 1))
        if $test; then
            passed=$((passed + 1))
        fi
    done

    echo ""
    echo "Results: $passed/$tests passed"
    [ "$passed" -eq "$tests" ]
}

run_tests

Integration Tests

#!/bin/sh
# test_integration.sh

# Test 1: Script with known issues should report them
test_known_issues() {
    cat > /tmp/known_issues.sh << 'EOF'
#!/bin/bash
arr=(one two three)
[[ $x == "test" ]]
grep -P '\d+' file
EOF

    result=$(./alpine-check.sh /tmp/known_issues.sh)

    # Should find 4 issues: shebang, array, [[]], grep -P
    count=$(echo "$result" | grep -c '\[ERROR\]')
    if [ "$count" -ge 4 ]; then
        echo "PASS: Found expected errors"
    else
        echo "FAIL: Expected 4+ errors, found $count"
        return 1
    fi
}

# Test 2: Clean POSIX script should have no errors
test_clean_posix() {
    cat > /tmp/clean_posix.sh << 'EOF'
#!/bin/sh
var="test"
if [ "$var" = "test" ]; then
    echo "match"
fi
for x in one two three; do
    echo "$x"
done
EOF

    result=$(./alpine-check.sh /tmp/clean_posix.sh)

    count=$(echo "$result" | grep -c '\[ERROR\]')
    if [ "$count" -eq 0 ]; then
        echo "PASS: No false positives on clean script"
    else
        echo "FAIL: False positives on clean script"
        echo "$result"
        return 1
    fi
}

# Test 3: Exit codes
test_exit_codes() {
    ./alpine-check.sh /tmp/known_issues.sh > /dev/null 2>&1
    if [ $? -eq 1 ]; then
        echo "PASS: Exit code 1 for errors"
    else
        echo "FAIL: Wrong exit code for errors"
        return 1
    fi

    ./alpine-check.sh /tmp/clean_posix.sh > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        echo "PASS: Exit code 0 for clean script"
    else
        echo "FAIL: Wrong exit code for clean script"
        return 1
    fi
}

# Run all integration tests
run_integration_tests() {
    test_known_issues && test_clean_posix && test_exit_codes
}

run_integration_tests

Real-World Validation

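# Test against actual Alpine container
# Note: Alpine builds ash with bash-compatibility extras, so [[ ]] may be
# accepted there. Arrays are rejected reliably, so use them for the failing case.
test_on_alpine() {
    docker run --rm alpine:latest sh -c '
        # This should fail
        arr=(one two three)
    ' 2>&1
    # Expected: syntax error

    docker run --rm alpine:latest sh -c '
        # This should work
        [ "test" = "test" ]
    '
    # Expected: no error
}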

Common Pitfalls and Debugging

Pitfall 1: Overly Aggressive Pattern Matching

Problem: Detecting [[ inside strings or comments.

Example:

echo "Use [[ for conditional"  # This is fine!
# This comment [[ is also fine ]]

Solution: Implement context awareness:

# Simple: skip lines that are clearly comments
is_comment() {
    echo "$1" | grep -qE '^\s*#'
}

# Better: track quote state one character at a time (still simplified:
# only double quotes, no escapes)
in_quotes=0
rest=$line
while [ -n "$rest" ]; do
    char=${rest%"${rest#?}"}    # first character of $rest
    rest=${rest#?}
    case "$char" in
        \")
            in_quotes=$((1 - in_quotes))
            ;;
    esac
done
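
Tracking a flag is only half of the job; the patterns still have to ignore the quoted text. One way to do that is to strip quoted strings and trailing comments from a line before matching. The sketch below assumes a hypothetical helper named strip_quoted and handles single quotes, double quotes, backslash escapes, and comments on a single line only.

```shell
#!/bin/sh
# Sketch only: strip_quoted is a hypothetical helper. It does not understand
# $(...) nesting or heredocs.
strip_quoted() {
    rest=$1
    out=""
    state=plain              # plain | single | double
    prev=" "                 # "#" starts a comment only after whitespace
    tab=$(printf '\t')
    while [ -n "$rest" ]; do
        char=${rest%"${rest#?}"}     # first character of $rest
        rest=${rest#?}
        case "$state" in
            single)
                if [ "$char" = "'" ]; then state=plain; fi
                ;;
            double)
                case "$char" in
                    \\) rest=${rest#?} ;;    # skip the escaped character
                    \") state=plain ;;
                esac
                ;;
            plain)
                case "$char" in
                    \\)
                        next=${rest%"${rest#?}"}
                        out="$out$char$next"  # keep the escape pair as-is
                        rest=${rest#?}
                        ;;
                    "'") state=single ;;
                    \")  state=double ;;
                    "#")
                        case "$prev" in
                            " "|"$tab") break ;;  # unquoted comment: drop the rest
                        esac
                        out="$out$char"
                        ;;
                    *) out="$out$char" ;;
                esac
                prev=$char
                ;;
        esac
    done
    printf '%s\n' "$out"
}
```

Calling check_line "$(strip_quoted "$line")" "$lineno" would then keep text such as echo "Use [[ for bash" from triggering the [[ rule. The trade-off is that rules which need to see quoted content (for example a grep pattern argument) must run on the original line instead.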

Pitfall 2: Missing Multi-line Constructs

Problem: Here-documents span multiple lines.

Example:

cat << 'EOF'
This is inside a heredoc
arr=(this shouldn't trigger)
[[ neither should this ]]
EOF

Solution: Track heredoc state:

in_heredoc=0
heredoc_end=""
lineno=0

while IFS= read -r line; do
    lineno=$((lineno + 1))

    # Inside a heredoc: skip content until the terminator line
    if [ "$in_heredoc" -eq 1 ]; then
        [ "$line" = "$heredoc_end" ] && in_heredoc=0
        continue
    fi

    # Check for heredoc start; the opening line itself is still checked below
    if echo "$line" | grep -qE '<<\s*'"'"'?([A-Za-z_]+)'"'"'?\s*$'; then
        heredoc_end=$(echo "$line" | sed "s/.*<<\s*'\?\([A-Za-z_]*\)'\?.*/\1/")
        in_heredoc=1
    fi

    # Normal line checking
    check_line "$line" "$lineno"
done < "$file"

Pitfall 3: Regex Escaping

Problem: Special regex characters in patterns.

Example:

# Need to match: ${var[0]}
# Wrong: grep '[0-9]'          # [ ] form a bracket expression: matches any digit
# Right: grep -E '\[[0-9]+\]'  # escaped brackets around a digit class

Solution: Escape carefully or use fixed strings:

# Use grep -F for literal strings (fgrep is the deprecated spelling)
echo "$line" | grep -F '[[' && echo "found"

# Or properly escape regex
echo "$line" | grep '\[\[' && echo "found"

Pitfall 4: Command Spanning Lines

Problem: Commands can span multiple lines with backslash.

Example:

find /var \
    -name "*.log" \
    -printf "%f\n"

Solution: Join continued lines:

join_continued_lines() {
    awk '{
        if (/\\$/) {
            gsub(/\\$/, "")
            line = line $0
        } else {
            print line $0
            line = ""
        }
    }'
}
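
Joining lines changes the line count, so findings reported after the join would point at the wrong line. A variant that prefixes each logical line with the number of the physical line where it started avoids that. This is a sketch under stated assumptions: join_with_lineno and the START:text output format are made up for illustration.

```shell
#!/bin/sh
# Sketch only: join_with_lineno is a hypothetical variant of the function above.
# Output format is "START_LINE:joined text".
join_with_lineno() {
    awk '
        {
            if (start == 0) start = NR      # remember where this logical line began
            if (sub(/\\$/, "")) {           # trailing backslash: keep accumulating
                buf = buf $0
            } else {
                print start ":" buf $0
                buf = ""
                start = 0
            }
        }
        END { if (start != 0) print start ":" buf }   # input ended mid-continuation
    '
}
```

The caller can split each output line at the first colon to recover the original line number and the joined command text.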

Debugging Techniques

1. Test patterns in isolation:

# Test if pattern matches
pattern='grep\s+.*-P'
test_line='grep -P pattern file'
echo "$test_line" | grep -qE "$pattern" && echo "match" || echo "no match"

2. Verbose mode:

# Add debug output
if [ "$DEBUG" = "1" ]; then
    printf "DEBUG: Checking line %d: %s\n" "$lineno" "$line" >&2
fi

3. Compare with ShellCheck:

# ShellCheck can validate your findings
shellcheck -s sh script.sh
# Compare its output with yours

Extensions and Challenges

Extension 1: Auto-Fix Mode

Generate a corrected script with portable alternatives:

fix_line() {
    line="$1"

    # Fix [[ to [
    fixed=$(echo "$line" | sed 's/\[\[/[/g; s/\]\]/]/g; s/==/=/g')

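    # Fix here-strings: "cmd <<< text" becomes "echo text | cmd"
    # (caveat: piping into read runs it in a subshell, so the variable is lost)
    fixed=$(echo "$fixed" | sed 's/^\(.*\)<<<[[:space:]]*\(.*\)$/echo \2 | \1/')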

    echo "$fixed"
}

Extension 2: Dockerfile Analysis

Detect compatibility issues in Dockerfiles:

check_dockerfile() {
    file="$1"

    # Only analyze Alpine-based images; anything else is not an error
    grep -qE '^FROM\s+alpine' "$file" || return 0

    # Check RUN commands for shell issues (single-line RUN instructions only)
    grep -E '^RUN\s' "$file" | while IFS= read -r line; do
        # Extract shell commands
        cmd=$(echo "$line" | sed 's/^RUN\s*//')
        check_shell_command "$cmd"
    done
}

Extension 3: CI Integration

Create GitHub Action or GitLab CI configuration:

# .github/workflows/alpine-check.yml
name: Alpine Compatibility Check

on: [push, pull_request]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check shell scripts
        run: |
          # "+" batches the files and makes find propagate a failing exit status
          find . -name "*.sh" -exec ./alpine-check.sh {} +

      - name: Test on Alpine
        run: |
          docker run --rm -v "$PWD:/scripts" alpine:latest sh -c '
            for f in /scripts/*.sh; do
              sh -n "$f" || exit 1
            done
          '

Extension 4: Rule Configuration

Allow users to customize rules:

# ~/.alpine-check.conf or .alpine-check.yml
rules:
  BASH_DOUBLE_BRACKET:
    severity: error  # or warning, info, ignore
  SED_INPLACE:
    severity: warning

ignore_paths:
  - vendor/
  - node_modules/

ignore_patterns:
  - "*.bats"  # Skip bats test files
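
Parsing YAML from POSIX sh is awkward, so a shell implementation may prefer a flatter format. The sketch below assumes a simplified config with one RULE_ID=severity pair per line. That file format and the function name rule_severity are assumptions made for this example, not the layout shown above.

```shell
#!/bin/sh
# Sketch only: the flat "RULE_ID=severity" format and rule_severity are
# assumptions for illustration.

# rule_severity CONFIG_FILE RULE_ID DEFAULT
# Prints the configured severity for RULE_ID, or DEFAULT if none is set.
rule_severity() {
    config=$1
    rule=$2
    default=$3
    if [ -f "$config" ]; then
        while IFS='=' read -r key value; do
            case "$key" in
                ''|'#'*) continue ;;         # skip blank lines and comments
            esac
            if [ "$key" = "$rule" ]; then
                printf '%s\n' "$value"
                return 0
            fi
        done < "$config"
    fi
    printf '%s\n' "$default"
}
```

A rule would then ask for its severity before reporting, for example sev=$(rule_severity "$HOME/.alpine-check.conf" SED_INPLACE warning), and skip the report when the result is ignore.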

Challenge 1: Handle All Edge Cases

Make the tool handle:

Nested quotes
Escaped characters
Variable expansion in strings
Subshell contexts

Challenge 2: Python Implementation

Reimplement in Python with:

Quote-aware tokenizing using shlex (a lexer; a full AST needs a dedicated shell parser)
Click CLI framework
pytest test suite
Type hints throughout

Challenge 3: Real-time Editor Integration

Create plugins for:

VS Code (using Language Server Protocol)
Vim (using ALE or similar)
Emacs (using Flycheck)

Real-World Connections

Container Ecosystems

Why Docker loves Alpine:

5 MB base image vs 70+ MB for Ubuntu
Faster pulls, less storage, smaller attack surface
Your tool helps teams adopt Alpine confidently

Real usage:

Official Docker images often have Alpine variants
Kubernetes operators frequently use Alpine-based images
Some serverless deployments use Alpine-based images to reduce cold-start time

Embedded Systems

BusyBox origins:

Created for Debian installer to fit on floppy disk
Now standard in routers, IoT devices, and other embedded Linux systems (Android ships the similar Toybox instead)

Your tool’s relevance:

OpenWrt (router firmware) uses BusyBox
Yocto/Buildroot embedded Linux uses BusyBox
Scripts must be portable across these systems

CI/CD Pipelines

Common problem:

Developers test on macOS/Ubuntu
CI runs on Alpine containers
Scripts break in CI but pass locally

Your tool as a solution:

Run before committing
Integrate into pre-commit hooks
Catch issues before they reach CI
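
A pre-commit hook for the last two points could look roughly like the sketch below. It assumes alpine-check.sh sits at the repository root and exits non-zero when it finds errors; filter_shell_files and check_staged are illustrative names.

```shell
#!/bin/sh
# Sketch of a .git/hooks/pre-commit script. Assumes alpine-check.sh sits at the
# repository root and exits non-zero when it finds errors.

# Keep only names ending in .sh from a newline-separated list on stdin.
filter_shell_files() {
    grep -E '\.sh$' || true      # an empty result is not a failure
}

# Run the checker on every staged shell script.
# Returns 1 if any file fails, which lets the hook abort the commit.
check_staged() {
    status=0
    staged=$(git diff --cached --name-only --diff-filter=ACM | filter_shell_files)
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        ./alpine-check.sh "$f" || status=1
    done <<EOF
$staged
EOF
    return "$status"
}

# In the real hook, finish with:
#   check_staged || exit 1
```

Checking only staged files keeps the hook fast on large repositories. File names containing newlines are not handled by this simple loop.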

Security Auditing

Minimal attack surface:

Alpine’s small size means fewer vulnerabilities
BusyBox’s smaller codebase leaves less code to audit than the full GNU toolchain
Your tool helps maintain this by avoiding unnecessary packages

Resources

Documentation

POSIX Shell Command Language - The official standard
BusyBox Applet Documentation - Command reference
ShellCheck Wiki - Shell gotchas
Alpine Linux Wiki - Alpine specifics
Dash as /bin/sh - Ubuntu’s POSIX compliance

Tools

ShellCheck - Linter for shell scripts
checkbashisms - Debian’s bash-ism detector
shfmt - Shell formatter and parser
BATS - Bash Automated Testing System

Books

“Effective Shell” by Dave Kerr - Modern shell practices
“Classic Shell Scripting” by Robbins & Beebe - Portable scripting
“The UNIX Programming Environment” by Kernighan & Pike - Unix fundamentals

Self-Assessment Checklist

Before considering this project complete, verify you can:

Core Functionality:

Detect bash shebangs and suggest alternatives
Identify [[ ]] extended test syntax
Catch array declarations and expansions
Flag here-strings (<<<)
Detect process substitution (<(), >())
Find brace expansion ({1..10})
Identify GNU-specific grep, sed, find options

Code Quality:

Tool runs on Alpine/BusyBox without modifications
No false positives on clean POSIX scripts
Handles edge cases (strings, comments, heredocs)
Clear, actionable error messages
Proper exit codes (0=ok, 1=errors, 2=warnings)

Understanding:

Can explain difference between POSIX sh and bash
Know which GNU options lack BusyBox support
Understand why Alpine uses BusyBox
Can write portable shell scripts yourself

Testing:

Unit tests for each pattern
Integration tests with real scripts
Validated against actual Alpine container

Submission/Completion Criteria

Minimum Viable Product (MVP):

Shell script that detects at least 10 common bash-isms
Detects at least 5 GNU-specific command options
Provides clear error messages with fix suggestions
Correctly handles basic edge cases (comments, strings)
Works when run on Alpine Linux itself
Includes at least 10 test cases

Full Implementation:

All 20+ bash-isms from the pattern list
All 15+ GNU options from the pattern list
JSON output format option
Auto-fix suggestion mode
Recursive directory scanning
Configuration file support
50+ test cases with edge cases
Documentation with examples
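
For the recursive scanning item, one approach is to collect candidates with find and fall back to the shebang for scripts without an extension. A minimal sketch, assuming a hypothetical helper named find_shell_scripts and a deliberately loose shebang match:

```shell
#!/bin/sh
# Sketch only: find_shell_scripts is a hypothetical helper.
# Lists files under DIR that end in .sh or start with a shell-like shebang.
find_shell_scripts() {
    find "$1" -type f | while IFS= read -r f; do
        case "$f" in
            *.sh) printf '%s\n' "$f"; continue ;;
        esac
        # Fall back to the first line for extensionless scripts.
        first=""
        IFS= read -r first < "$f" || true
        case "$first" in
            '#!'*sh|'#!'*sh' '*) printf '%s\n' "$f" ;;
        esac
    done
}
```

The output can be fed to the checker one file per line. The shebang pattern matches any interpreter whose name ends in sh, so tighten it if that is too broad, and note that file names containing newlines are not supported.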

Stretch Goals:

Python implementation with AST parsing
VS Code extension
GitHub Action integration
Support for other minimal shells (dash, mksh)
Performance optimized for large codebases

Complete Bash-ism Reference

For implementation, here is the comprehensive list of bash-isms to detect:

Syntax Features

Feature	Example	POSIX Alternative
Extended test	`[[ $a == $b ]]`	`[ "$a" = "$b" ]`
Arrays	`arr=(a b c)`	`set -- a b c`
Associative arrays	`declare -A map`	Use files or separate variables
Here-strings	`cat <<< "text"`	`echo "text" | cat`
Process substitution	`diff <(cmd1) <(cmd2)`	Use temporary files
Brace expansion	`{1..10}`	`seq 1 10`
Brace lists	`{a,b,c}.txt`	`a.txt b.txt c.txt`
`$'...'` quoting	`$'\n'`	`nl=$(printf '\nx'); nl=${nl%x}` (a bare `$(printf '\n')` loses the newline)
Regex match	`[[ $s =~ regex ]]`	`echo "$s" | grep -E regex`
`==` in test	`[ "$a" == "$b" ]`	`[ "$a" = "$b" ]`
`function` keyword	`function name() {}`	`name() {}`
`source`	`source file.sh`	`. file.sh`
`let`	`let x=1+2`	`x=$((1+2))`
`(( ))`	`(( x++ ))`	`x=$((x+1))`
Substring	`${var:0:5}`	`echo "$var" | cut -c1-5`
Replacement	`${var//old/new}`	`echo "$var" | sed 's/old/new/g'`
Default value	`${var:-default}`	Supported (POSIX)
`local -a`	`local -a arr`	`local arr` (no type)
`read -a`	`read -a arr`	Read into separate variables
`read -p`	`read -p "prompt"`	`printf "prompt"; read`
`printf -v`	`printf -v var "%s"`	`var=$(printf "%s")`
`select`	`select opt in ...`	Use a `while`/`read` loop with a `case` statement
`coproc`	`coproc cmd`	Use named pipes

GNU Utility Options

Command	GNU Option	BusyBox Support	Alternative
grep	`-P` (Perl regex)	Not supported	Use `-E` (ERE)
grep	`--include`	Not supported	Use `find | xargs grep`
grep	`--exclude`	Not supported	Use `find | xargs grep`
sed	`-i''`	Different syntax	Use `-i` (no argument)
sed	`-z`	Not supported	Install GNU sed
find	`-printf`	Not supported	Use `-exec` with commands
find	`-regex`	Limited support	Use `-name` patterns
cp	`--parents`	Not supported	Install GNU coreutils
date	`-d "string"`	Different syntax	Use `-D format -d string`
stat	`-c`	Limited formats	Check available formats
xargs	`-r`	Different meaning	Test behavior
ls	`-G`	Not supported	Use `awk` to hide group
timeout	N/A	Included	Works but check options
realpath	N/A	May need install	Use `readlink -f`
mktemp	`-t`	Different handling	Test behavior
sort	`-V`	Not supported	Install GNU coreutils
head/tail	`-c +N`	May differ	Test behavior

This project bridges the gap between “works on my machine” and “works in production.” By understanding what makes Alpine different, you’ll write better, more portable shell scripts everywhere.