Project 4: Alpine Docker Optimization

Quick Reference

Attribute Details
Difficulty Level 2: Intermediate
Time Estimate Weekend
Primary Language Dockerfile
Alternative Languages Go, Rust, Python, Node.js
Knowledge Area Containers / Docker
Tools Required Docker, dive, trivy
Primary Reference Alpine Wiki, Docker Documentation

Learning Objectives

By completing this project, you will be able to:

  1. Cut Docker image sizes by 80-95% by switching from Ubuntu/Debian to Alpine base images
  2. Use virtual packages (.build-deps pattern) to install build dependencies, compile, and remove them in a single layer
  3. Implement multi-stage builds that separate build-time dependencies from runtime requirements
  4. Apply static linking strategies for Go, Rust, and C applications to create minimal final images
  5. Leverage Alpine’s --no-cache flag to eliminate package index bloat from images
  6. Evaluate security posture of Alpine-based images using vulnerability scanning
  7. Debug musl-related runtime issues when migrating from glibc-based images
  8. Optimize layer caching to speed up CI/CD builds while maintaining small final images

Theoretical Foundation

Core Concepts

Why Image Size Matters

Docker image size directly impacts:

Image Size Impact Chain:
┌─────────────────┐
│  Larger Images  │
└────────┬────────┘
         │
         ├──► Slower Pulls (network bandwidth)
         │    • 500 MB image @ 100 Mbps = 40+ seconds
         │    • 50 MB image @ 100 Mbps = 4 seconds
         │
         ├──► Slower Startup (disk I/O, layer extraction)
         │    • Critical for autoscaling, cold starts
         │    • Lambda/Cloud Run cold start penalty
         │
         ├──► Higher Storage Costs (registry, nodes)
         │    • Multiply by environments × versions × regions
         │    • 10x image size = 10x storage costs
         │
         └──► Larger Attack Surface (more CVEs)
             • More packages = more vulnerabilities
              • debian:bullseye ships roughly 100 packages
             • Alpine:latest has ~15 packages

The Alpine Advantage

Alpine Linux achieves its minimal size through four key design choices:

┌─────────────────────────────────────────────────────────────────┐
│                 WHY ALPINE IS SMALL                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. musl libc (~1 MB)     vs  glibc (~8 MB)                    │
│     ├── Smaller, simpler C library                              │
│     └── Stricter POSIX compliance                               │
│                                                                 │
│  2. BusyBox (~1 MB)       vs  GNU Coreutils (~20 MB)           │
│     ├── Single binary, 400+ utilities                           │
│     └── Stripped-down implementations                           │
│                                                                 │
│  3. apk package manager   vs  apt/dpkg                          │
│     ├── Single-file packages (.apk)                             │
│     └── No apt cache, dpkg database overhead                    │
│                                                                 │
│  4. No systemd            vs  systemd + dependencies            │
│     ├── OpenRC is simpler                                       │
│     └── Containers typically don't need init systems            │
│                                                                 │
│  Result: 5 MB base vs 70-120 MB base                           │
└─────────────────────────────────────────────────────────────────┘

Understanding Docker Layers

Each RUN, COPY, and ADD instruction creates a filesystem layer (other instructions add only metadata). Layers are additive: files deleted in a later layer are masked, not removed, so they still consume space:

Layer Behavior:
┌──────────────────────────────────────────────────────────────┐
│ RUN apt-get install -y build-essential     # +300 MB        │
├──────────────────────────────────────────────────────────────┤
│ RUN make && make install                   # +50 MB         │
├──────────────────────────────────────────────────────────────┤
│ RUN apt-get remove -y build-essential      # +1 MB (!)      │
│     └── Files are MASKED, not removed                       │
│         Previous layer still contains 300 MB                 │
├──────────────────────────────────────────────────────────────┤
│ Total Image Size: ~350 MB                                    │
└──────────────────────────────────────────────────────────────┘

VS (Same Layer Pattern):
┌──────────────────────────────────────────────────────────────┐
│ RUN apt-get install -y build-essential \                     │
│     && make && make install \                                │
│     && apt-get remove -y build-essential    # +50 MB only   │
│     └── Install + Build + Remove in ONE layer                │
├──────────────────────────────────────────────────────────────┤
│ Total Image Size: ~50 MB                                     │
└──────────────────────────────────────────────────────────────┘
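
You can verify the masking behavior directly. A minimal sketch (the layer-test tags and the 100 MB dummy file are arbitrary) that builds both patterns from inline Dockerfiles:

# Pattern 1: delete in a separate layer (file is masked, space kept)
docker build -t layer-test:separate - <<'EOF'
FROM alpine:latest
RUN dd if=/dev/zero of=/big.bin bs=1M count=100
RUN rm /big.bin
EOF

# Pattern 2: create and delete in the same layer (space reclaimed)
docker build -t layer-test:single - <<'EOF'
FROM alpine:latest
RUN dd if=/dev/zero of=/big.bin bs=1M count=100 && rm /big.bin
EOF

docker images layer-test             # :separate is ~100 MB larger than :single
docker history layer-test:separate   # the rm layer is ~0 B; the dd layer keeps its 100 MB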

Multi-Stage Builds: The Game Changer

Multi-stage builds allow complete separation of build and runtime environments:

Multi-Stage Build Flow:
┌─────────────────────────────────────────────────────────────┐
│ Stage 1: BUILD                                              │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ FROM golang:1.22-alpine AS builder                      │ │
│ │                                                         │ │
│ │ • Full compiler toolchain                               │ │
│ │ • All build dependencies                                │ │
│ │ • Source code                                           │ │
│ │ • 500+ MB                                               │ │
│ │                                                         │ │
│ │ Output: Single compiled binary                          │ │
│ └─────────────────────────────────────────────────────────┘ │
│                         │                                    │
│                         ▼                                    │
│ Stage 2: RUNTIME                                            │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ FROM alpine:latest                                      │ │
│ │                                                         │ │
│ │ • Just the binary                                       │ │
│ │ • Minimal runtime dependencies                          │ │
│ │ • 5-10 MB                                               │ │
│ │                                                         │ │
│ │ COPY --from=builder /app/binary /binary                 │ │
│ └─────────────────────────────────────────────────────────┘ │
│                                                             │
│ Final Image: ONLY Stage 2 contents (5-10 MB)                │
│ Stage 1 is discarded after COPY                             │
└─────────────────────────────────────────────────────────────┘
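
During development you can still build and inspect the intermediate stage. A sketch, assuming a stage named builder as in the diagram:

# Build only the builder stage (handy for debugging compile failures)
docker build --target builder -t myapp:builder-only .

# A full build keeps only the final stage; compare the two
docker build -t myapp:runtime .
docker images myapp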

Why This Matters

  1. Cost Reduction: At scale, image size directly impacts cloud costs. A 90% size reduction means 90% less storage, 90% less bandwidth.

  2. Security Posture: Fewer packages means fewer CVEs. Alpine’s minimal base means you inherit fewer vulnerabilities.

  3. Developer Experience: Faster builds, faster pulls, faster iteration cycles.

  4. Production Reliability: Smaller images pull faster during autoscaling events, reducing latency during traffic spikes.

Historical Context

Docker originally encouraged ubuntu:latest as a base image. As container adoption grew, the inefficiency became apparent:

  • 2013-2015: Ubuntu/Debian dominated, 500 MB+ images were common
  • 2016: Alpine gained traction (Docker Inc. officially endorsed it)
  • 2017: Multi-stage builds introduced in Docker 17.05
  • 2018-2020: Distroless and scratch images emerged for minimal footprints
  • 2021+: Security scanning became standard; smaller images = fewer CVEs

Common Misconceptions

Misconception 1: “Alpine is less stable because it’s smaller” Reality: Alpine has been used in production by major companies since 2016. Its simplicity often means fewer moving parts to break.

Misconception 2: “I need bash and common tools in my container” Reality: Production containers should run one process. Debugging tools can be added temporarily or via sidecar containers.

Misconception 3: “musl compatibility issues are common” Reality: Most issues occur with pre-compiled binaries. Code compiled on Alpine works fine. Language runtimes (Python, Node.js) have Alpine-native packages.

Misconception 4: “Multi-stage builds are only for compiled languages” Reality: Python, Node.js, and interpreted languages benefit significantly from multi-stage builds (installing dev dependencies, building wheels, etc.).


Project Specification

What You Will Build

A complete Docker optimization toolkit consisting of:

  1. Before/After Dockerfiles for 5 language stacks (Python, Go, Rust, Node.js, C)
  2. Optimization scripts that apply best practices automatically
  3. Size comparison reports documenting the reduction achieved
  4. Security scan comparisons showing CVE count reduction
  5. Build performance benchmarks measuring CI/CD improvements

Functional Requirements

FR-1: Create optimized Dockerfiles for at least 5 different application types
FR-2: Achieve minimum 75% size reduction compared to standard base images
FR-3: Use virtual packages pattern for all build dependencies
FR-4: Implement multi-stage builds for all compiled languages
FR-5: Generate comparison reports with image sizes and layer analysis
FR-6: Include security scanning with CVE count comparison
FR-7: Maintain functionality - optimized images must pass all tests

Non-Functional Requirements

NFR-1: Build time should not increase by more than 20%
NFR-2: Docker layer caching should be optimized for iterative development
NFR-3: All images should run as non-root users
NFR-4: No secrets or sensitive data in image layers
NFR-5: Documentation for each optimization technique applied

Example Usage/Output

# Run the optimization toolkit
$ ./optimize-docker.sh python-app/

Analyzing: python-app/Dockerfile.original
Original: python:3.11 base
Size: 1.2 GB
CVEs: 142 (12 Critical, 45 High)

Generating: python-app/Dockerfile.optimized
Optimized: python:3.11-alpine base
Size: 89 MB  (92.6% reduction)
CVEs: 8 (0 Critical, 2 High)

Layer Analysis:
┌────────────────────────────────────────────────────────────┐
│ LAYER                                SIZE     CACHED       │
├────────────────────────────────────────────────────────────┤
│ python:3.11-alpine                   48 MB    yes          │
│ apk add --virtual .build-deps...     0 MB     (removed)    │
│ pip install requirements.txt         35 MB    yes          │
│ COPY . /app                          6 MB     no           │
├────────────────────────────────────────────────────────────┤
│ TOTAL                                89 MB                 │
└────────────────────────────────────────────────────────────┘

Build Time: 45s (original) → 52s (optimized) [+15%]
Pull Time:  38s (original) → 4s (optimized)  [-89%]

✓ All tests passed on optimized image
✓ Report saved: reports/python-app-optimization.md

Real World Outcome

After completing this project, you will have:

  1. A collection of production-ready Dockerfile templates
  2. Scripts to analyze and optimize existing Dockerfiles
  3. Documentation explaining each optimization technique
  4. Metrics proving the business value of optimization
  5. Skills directly applicable to DevOps/SRE roles

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────────┐
│                     DOCKER OPTIMIZATION TOOLKIT                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐       │
│   │   Original    │    │   Analyzer    │    │  Optimizer    │       │
│   │  Dockerfile   │───▶│               │───▶│               │       │
│   │               │    │ • Size check  │    │ • Base swap   │       │
│   └───────────────┘    │ • Layer count │    │ • Multi-stage │       │
│                        │ • CVE scan    │    │ • Virtual pkg │       │
│                        └───────────────┘    └───────┬───────┘       │
│                                                     │               │
│                                                     ▼               │
│   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐       │
│   │   Optimized   │◀───│   Builder     │◀───│   Templates   │       │
│   │  Dockerfile   │    │               │    │               │       │
│   │               │    │ • docker build│    │ • Go          │       │
│   └───────────────┘    │ • dive analyze│    │ • Rust        │       │
│         │              │ • trivy scan  │    │ • Python      │       │
│         │              └───────────────┘    │ • Node.js     │       │
│         │                                   │ • C/C++       │       │
│         ▼                                   └───────────────┘       │
│   ┌───────────────────────────────────────────────────────┐         │
│   │                 COMPARISON REPORT                      │         │
│   │  • Size: Before/After                                  │         │
│   │  • CVEs: Before/After                                  │         │
│   │  • Build Time                                          │         │
│   │  • Layer Efficiency                                    │         │
│   └───────────────────────────────────────────────────────┘         │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Key Components

  1. Analyzer: Examines existing Dockerfiles and built images (a sketch follows this list)
    • Measures image size using docker images
    • Counts layers using docker history
    • Scans CVEs using trivy
    • Identifies optimization opportunities
  2. Template Library: Pre-built optimized Dockerfiles
    • Language-specific best practices
    • Virtual package patterns
    • Multi-stage configurations
  3. Builder: Constructs and validates optimized images
    • Builds both original and optimized versions
    • Runs test suites on both
    • Generates comparison metrics
  4. Reporter: Creates detailed comparison documents
    • Markdown reports with tables
    • JSON for CI/CD integration
    • Visual layer analysis
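
A minimal sketch of the Analyzer, assuming the image tag is passed as the first argument (the CVE figure here is a rough line count, not trivy's own summary):

#!/bin/bash
# scripts/analyze.sh -- size, layer count, and CVE estimate for one image
set -euo pipefail
IMAGE="$1"

size=$(docker images "$IMAGE" --format '{{.Size}}')
layers=$(docker history -q "$IMAGE" | wc -l)
cves=$(trivy image --quiet --severity HIGH,CRITICAL "$IMAGE" | grep -cE 'HIGH|CRITICAL' || true)

echo "Image:  $IMAGE"
echo "Size:   $size"
echo "Layers: $layers"
echo "HIGH/CRITICAL findings (approx): $cves"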

Data Structures

# Project directory structure
docker-optimization-toolkit/
├── templates/
│   ├── python/
│   │   ├── Dockerfile.original       # Standard python:3.11 approach
│   │   ├── Dockerfile.alpine         # Alpine with virtual packages
│   │   └── Dockerfile.multistage     # Multi-stage for wheels
│   ├── go/
│   │   ├── Dockerfile.original
│   │   ├── Dockerfile.alpine
│   │   └── Dockerfile.scratch        # FROM scratch final stage
│   ├── rust/
│   │   ├── Dockerfile.original
│   │   ├── Dockerfile.alpine-musl
│   │   └── Dockerfile.scratch
│   ├── nodejs/
│   │   ├── Dockerfile.original
│   │   ├── Dockerfile.alpine
│   │   └── Dockerfile.multistage
│   └── c-cpp/
│       ├── Dockerfile.original
│       ├── Dockerfile.alpine
│       └── Dockerfile.static
├── scripts/
│   ├── analyze.sh                    # Analyze existing Dockerfile
│   ├── optimize.sh                   # Apply optimization
│   ├── compare.sh                    # Generate comparison
│   └── scan.sh                       # Security scanning
├── examples/
│   ├── python-flask-app/
│   ├── go-api-server/
│   ├── rust-cli-tool/
│   ├── node-express-app/
│   └── c-nginx-module/
└── reports/
    └── *.md                          # Generated reports

Algorithm Overview

OPTIMIZATION_ALGORITHM:

1. ANALYZE original Dockerfile:
   - Identify base image (ubuntu, debian, python, node, etc.)
   - Parse RUN instructions for package installations
   - Identify build vs runtime dependencies
   - Measure current size and CVE count

2. SELECT optimization strategy:
   IF language is compiled (Go, Rust, C):
       APPLY multi-stage with scratch/alpine final
   ELIF language needs native extensions (Python with C):
       APPLY virtual packages + multi-stage wheel builder
   ELIF language is interpreted (Node.js):
        APPLY alpine base + npm ci --omit=dev
   END

3. GENERATE optimized Dockerfile:
   - Replace base image with Alpine variant
   - Group RUN commands for layer efficiency
   - Add virtual packages for build deps
   - Configure multi-stage if applicable
   - Add non-root user
   - Add .dockerignore

4. BUILD and VALIDATE:
   - Build both original and optimized
   - Run test suite on optimized image
   - Generate size comparison
   - Run security scan
   - Create report

5. ITERATE if tests fail:
   - Identify missing runtime dependencies
   - Add to final stage
   - Rebuild and retest

Implementation Guide

Development Environment Setup

# Prerequisites
docker --version       # 24.0+ recommended
docker buildx version  # For multi-platform builds

# Install analysis tools
# dive: Layer analysis tool
curl -Lo dive.deb https://github.com/wagoodman/dive/releases/download/v0.12.0/dive_0.12.0_linux_amd64.deb
sudo dpkg -i dive.deb

# trivy: Vulnerability scanner
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sudo sh -s -- -b /usr/local/bin

# Verify installations
dive --version
trivy --version

Project Structure

Create the project skeleton:

mkdir -p docker-optimization-toolkit/{templates/{python,go,rust,nodejs,c-cpp},scripts,examples,reports}
cd docker-optimization-toolkit

The Core Question You’re Answering

“How do I systematically reduce Docker image size while maintaining functionality and improving security?”

Concepts You Must Understand First

Before implementing, verify you can answer these questions:

  1. Why does deleting files in a later layer not reduce image size?
    • Answer: Layers are additive. Deleted files are masked, not removed.
  2. What is the difference between RUN commands on separate lines vs chained with &&?
    • Answer: Separate lines create separate layers. Chained commands create one layer.
  3. Why does Alpine use musl instead of glibc?
    • Answer: musl is smaller and simpler, with stricter POSIX compliance. The trade-off is compatibility with software that assumes glibc behavior.
  4. What happens to build stage layers in multi-stage builds?
    • Answer: They are discarded. Only the final stage contributes to image size.
  5. Why is --no-cache important in apk add?
    • Answer: It keeps the apk package index out of the layer, saving a few MB per image (demonstrated below).
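
A quick way to see the index cost (image tags arbitrary):

docker build -t apk-test:with-index - <<'EOF'
FROM alpine:latest
RUN apk update && apk add curl    # stores the index under /var/cache/apk
EOF

docker build -t apk-test:no-cache - <<'EOF'
FROM alpine:latest
RUN apk add --no-cache curl       # fetches the index but never writes it to the layer
EOF

docker images apk-test            # :with-index comes out a few MB larger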

Questions to Guide Your Design

Base Image Selection:

  • What is the absolute minimum my application needs to run?
  • Does my language have an official Alpine variant?
  • Can I use scratch as my final stage (for static binaries)?

Dependency Management:

  • Which dependencies are only needed at build time?
  • Which must be present at runtime?
  • Can I separate these into different stages?

Layer Optimization:

  • Which files change frequently vs rarely?
  • How can I order COPY instructions to maximize cache hits?
  • Are there any files I’m including unnecessarily?

Security:

  • Am I running as root unnecessarily?
  • Are there secrets in my build context?
  • Have I scanned for vulnerabilities?

Thinking Exercise

Before writing any Dockerfiles, trace through this exercise:

Given this application structure:
myapp/
├── app.py
├── requirements.txt
├── tests/
└── docs/

And these requirements.txt contents:
flask==3.0.0
gunicorn==21.0.0
cryptography==42.0.0  # Requires compilation

Trace the build process for both approaches:

APPROACH 1: Standard Debian
FROM python:3.11
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["gunicorn", "app:app"]

Questions:
1. What is the base image size?
2. What packages are installed for cryptography compilation?
3. Are those packages needed at runtime?
4. What is the final image size (estimate)?

APPROACH 2: Alpine Multi-Stage
FROM python:3.11-alpine AS builder
RUN apk add --no-cache --virtual .build-deps gcc musl-dev libffi-dev
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

FROM python:3.11-alpine
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
COPY app.py .
CMD ["gunicorn", "app:app"]

Questions:
1. Why use pip wheel in the build stage?
2. Why copy wheels instead of requirements.txt?
3. What happens to gcc, musl-dev, libffi-dev?
4. What is the final image size (estimate)?

Hints in Layers

Hint 1: Starting Point (Conceptual Direction) Start with the simplest optimization: replace your base image with its Alpine variant. Most official images have Alpine versions (e.g., python:3.11-alpine, node:20-alpine).

Hint 2: Next Level (More Specific Guidance) For packages that require compilation, use the virtual packages pattern:

RUN apk add --no-cache --virtual .build-deps \
        gcc musl-dev python3-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del .build-deps

Hint 3: Technical Details (Approach/Pseudocode) For multi-stage builds with compiled languages:

# Stage 1: Build
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main .

# Stage 2: Runtime
FROM scratch
COPY --from=builder /app/main /main
ENTRYPOINT ["/main"]

Hint 4: Tools/Debugging (Verification Methods) Use dive to analyze layer efficiency:

dive myimage:latest
# Look for:
# - Wasted space (files added then removed)
# - Large layers that could be optimized
# - Duplicate content across layers

Interview Questions They’ll Ask

  1. “How would you reduce a 1.2 GB Python Docker image to under 100 MB?”
    • Expected answer: Alpine base, virtual packages for build deps, multi-stage for wheel building, --no-cache everywhere
  2. “What’s the difference between ADD and COPY in a Dockerfile?”
    • Expected answer: ADD has extra features (URL fetch, tar extraction). COPY is preferred for simple file copying as it’s more explicit.
  3. “Why might an application work in a Debian container but fail in Alpine?”
    • Expected answer: musl vs glibc incompatibility. Solutions include static linking, gcompat, or recompiling for musl.
  4. “How do you debug a minimal container that has no shell?”
    • Expected answer: Multi-stage with debug stage, ephemeral sidecar containers, or docker debug (a Docker Desktop feature).
  5. “What security benefits does a smaller image provide?”
    • Expected answer: Fewer packages = fewer CVEs, smaller attack surface, easier to audit, faster to update.
  6. “How would you optimize Docker layer caching for a CI/CD pipeline?”
    • Expected answer: Order instructions by change frequency, separate dependency installation from code copy, use BuildKit cache mounts.

Books That Will Help

Topic               | Book/Resource                         | Relevant Section
--------------------|---------------------------------------|--------------------------------------------
Docker fundamentals | Docker Deep Dive by Nigel Poulton     | Chapters 8-9: Images and Containers
Container security  | Container Security by Liz Rice        | Chapter 6: Container Images
Alpine specifics    | Alpine Wiki                           | Running glibc programs, Package management
Production Docker   | Docker in Practice by Miell & Sayers  | Chapter 7: Image optimization
musl compatibility  | musl-libc Wiki                        | Functional differences from glibc

Implementation Phases

Phase 1: Python Flask Application (Day 1 Morning)

Original Dockerfile (Ubuntu-based):

# python-app/Dockerfile.original
FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    python3-dev \
    build-essential \
    libpq-dev

WORKDIR /app
COPY requirements.txt .
RUN pip3 install -r requirements.txt

COPY . .
EXPOSE 5000
CMD ["python3", "-m", "flask", "run", "--host=0.0.0.0"]

Optimized Dockerfile (Alpine + Virtual Packages):

# python-app/Dockerfile.optimized
FROM python:3.11-alpine

WORKDIR /app

# Runtime dependency (libpq stays in the image for psycopg2)
RUN apk add --no-cache libpq

# Install build deps, compile Python packages, and remove build deps
# in a SINGLE layer; a separate `RUN apk del .build-deps` would only
# mask the files while an earlier layer still carried the compilers
COPY requirements.txt .
RUN apk add --no-cache --virtual .build-deps \
        gcc \
        musl-dev \
        python3-dev \
        libffi-dev \
        postgresql-dev \
    && pip install --no-cache-dir -r requirements.txt \
    && apk del .build-deps

# Copy application
COPY . .

# Run as non-root user
RUN adduser -D appuser
USER appuser

EXPOSE 5000
CMD ["python", "-m", "flask", "run", "--host=0.0.0.0"]
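
A quick sanity check after building: the chained RUN layer should be roughly the size of the installed Python packages, with no trace of the compilers:

docker build -t python-app:optimized -f Dockerfile.optimized .
docker history python-app:optimized
# gcc, musl-dev, etc. never persist in any layer because they were
# added and deleted within the same RUN instruction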

Multi-Stage Optimized Dockerfile (Maximum Reduction):

# python-app/Dockerfile.multistage
# Stage 1: Build wheels
FROM python:3.11-alpine AS builder

RUN apk add --no-cache \
        gcc \
        musl-dev \
        python3-dev \
        libffi-dev \
        postgresql-dev

WORKDIR /build
COPY requirements.txt .

# Build wheels for all dependencies
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Stage 2: Runtime
FROM python:3.11-alpine

# Runtime dependencies only
RUN apk add --no-cache libpq

# Copy pre-built wheels
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels

WORKDIR /app
COPY . .

# Non-root user
RUN adduser -D appuser
USER appuser

EXPOSE 5000
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]

Comparison Script:

#!/bin/bash
# scripts/compare-python.sh

echo "Building original image..."
docker build -t python-app:original -f Dockerfile.original .

echo "Building optimized image..."
docker build -t python-app:optimized -f Dockerfile.multistage .

echo ""
echo "=== SIZE COMPARISON ==="
echo "Original:  $(docker images python-app:original --format '{{.Size}}')"
echo "Optimized: $(docker images python-app:optimized --format '{{.Size}}')"

echo ""
echo "=== LAYER COUNT ==="
echo "Original:  $(docker history python-app:original | wc -l) layers"
echo "Optimized: $(docker history python-app:optimized | wc -l) layers"

echo ""
echo "=== CVE SCAN ==="
echo "Original:"
trivy image --severity HIGH,CRITICAL python-app:original 2>/dev/null | grep -E "^Total:|HIGH|CRITICAL" | head -5
echo ""
echo "Optimized:"
trivy image --severity HIGH,CRITICAL python-app:optimized 2>/dev/null | grep -E "^Total:|HIGH|CRITICAL" | head -5

Phase 2: Go API Server (Day 1 Afternoon)

Original Dockerfile:

# go-app/Dockerfile.original
FROM golang:1.22

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN go build -o server .

EXPOSE 8080
CMD ["./server"]

Optimized Dockerfile (Static Binary + Scratch):

# go-app/Dockerfile.optimized
# Stage 1: Build
FROM golang:1.22-alpine AS builder

# Install CA certificates for HTTPS
RUN apk add --no-cache ca-certificates git

WORKDIR /app

# Cache dependencies
COPY go.mod go.sum ./
RUN go mod download

COPY . .

# Build static binary
# CGO_ENABLED=0: No C dependencies
# -a: Force rebuild
# -installsuffix cgo: Use separate directory for non-cgo builds
# -ldflags '-s -w': Strip debug symbols
RUN CGO_ENABLED=0 GOOS=linux go build \
    -a -installsuffix cgo \
    -ldflags '-s -w -extldflags "-static"' \
    -o server .

# Stage 2: Minimal runtime
FROM scratch

# Copy CA certificates for HTTPS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy binary
COPY --from=builder /app/server /server

EXPOSE 8080
ENTRYPOINT ["/server"]

Expected Results:

Original (golang:1.22):     ~850 MB
Optimized (scratch):        ~5-15 MB
Reduction:                  98%+
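
Because scratch has no shell, verify static linking in the builder stage instead. A sketch (ldd's exact wording varies; on a static binary it refuses to resolve libraries):

# Build only the builder stage and inspect the binary there
docker build --target builder -t go-app:builder -f Dockerfile.optimized .
docker run --rm go-app:builder sh -c 'ldd /app/server'
# Expected for a static binary: an error such as "not a dynamic executable"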

Phase 3: Rust CLI Tool (Day 1 Evening)

Original Dockerfile:

# rust-app/Dockerfile.original
FROM rust:1.75

WORKDIR /app
COPY . .
RUN cargo build --release

CMD ["./target/release/myapp"]

Optimized Dockerfile (musl + Scratch):

# rust-app/Dockerfile.optimized
# Stage 1: Build with musl for static linking
FROM rust:1.75-alpine AS builder

# Install musl tools
RUN apk add --no-cache musl-dev

WORKDIR /app
COPY Cargo.toml Cargo.lock ./
COPY src ./src

# Build for musl target (static binary)
RUN rustup target add x86_64-unknown-linux-musl
RUN cargo build --release --target x86_64-unknown-linux-musl

# Stage 2: Scratch runtime
FROM scratch

COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/myapp /myapp

ENTRYPOINT ["/myapp"]

Expected Results:

Original (rust:1.75):       ~1.4 GB
Optimized (scratch):        ~2-10 MB (depends on dependencies)
Reduction:                  99%+

Phase 4: Node.js Express Application (Day 2 Morning)

Original Dockerfile:

# node-app/Dockerfile.original
FROM node:20

WORKDIR /app
COPY package*.json ./
RUN npm install

COPY . .
EXPOSE 3000
CMD ["node", "server.js"]

Optimized Dockerfile (Alpine + Production Dependencies):

# node-app/Dockerfile.optimized
# Stage 1: Build (install all deps including devDependencies)
FROM node:20-alpine AS builder

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm ci

# Copy source and build (if using TypeScript/build step)
COPY . .
RUN npm run build --if-present

# Stage 2: Production runtime
FROM node:20-alpine

# Minimal runtime packages (if needed)
RUN apk add --no-cache tini

WORKDIR /app

# Copy package files and install production deps only
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Copy built application
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/server.js ./

# Non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
USER nodejs

EXPOSE 3000

# Use tini as init process
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]

Expected Results:

Original (node:20):         ~1.1 GB
Optimized (node:20-alpine): ~150-200 MB
Reduction:                  80-85%

Phase 5: C Application (Day 2 Afternoon)

Original Dockerfile:

# c-app/Dockerfile.original
FROM ubuntu:22.04

RUN apt-get update && apt-get install -y \
    build-essential \
    cmake

WORKDIR /app
COPY . .
RUN mkdir build && cd build && cmake .. && make

CMD ["./build/myapp"]

Optimized Dockerfile (Static Linking + Scratch):

# c-app/Dockerfile.optimized
# Stage 1: Build with static linking
FROM alpine:latest AS builder

RUN apk add --no-cache \
        build-base \
        cmake \
        musl-dev

WORKDIR /app
COPY . .

# Build with static linking
RUN mkdir build && cd build && \
    cmake -DCMAKE_EXE_LINKER_FLAGS="-static" .. && \
    make

# Stage 2: Scratch runtime
FROM scratch

COPY --from=builder /app/build/myapp /myapp

ENTRYPOINT ["/myapp"]

Expected Results:

Original (ubuntu:22.04):    ~400 MB
Optimized (scratch):        ~1-5 MB
Reduction:                  99%+

Key Implementation Decisions

  1. Virtual Package Naming Convention Always use descriptive names: .build-deps, .python-build-deps, .rust-build-deps

  2. Order of Dockerfile Instructions
    # Most stable (rarely changes) → Most volatile (changes often)
    FROM base-image                 # 1. Base image
    RUN install-system-packages     # 2. System packages
    WORKDIR /app                    # 3. Working directory
    COPY package*.json ./           # 4. Dependency manifests
    RUN install-dependencies        # 5. Dependencies
    COPY . .                        # 6. Application code (changes most)
    CMD ["start"]                   # 7. Entry point
    
  3. When to Use scratch vs alpine Final Stage
    • Use scratch: Static binaries (Go, Rust, C with static linking)
    • Use alpine: Need shell, runtime libraries, or debugging access
  4. Handling CA Certificates For scratch images making HTTPS requests:
    COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
    
  5. .dockerignore is Critical Always create .dockerignore (a verification sketch follows this list):
    .git
    .gitignore
    node_modules
    *.md
    Dockerfile*
    docker-compose*
    .env*
    __pycache__
    *.pyc
    target/
    build/
    
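
To confirm .dockerignore is effective, watch the build-context size BuildKit reports (output wording varies by Docker version):

docker build --progress=plain . 2>&1 | grep -i "transferring context"
# Re-run after adding .dockerignore; the transferred context should shrink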

Testing Strategy

Unit Tests

  1. Image Size Verification
    test_image_size() {
        # Strip the unit and decimal part so the integer test below works
        size=$(docker images myapp:optimized --format '{{.Size}}' | sed 's/MB//' | cut -d. -f1)
        if [ "$size" -lt 100 ]; then
            echo "PASS: Image size ${size}MB < 100MB"
        else
            echo "FAIL: Image size ${size}MB >= 100MB"
            exit 1
        fi
    }
    
  2. Layer Count Check
    test_layer_count() {
        layers=$(docker history -q myapp:optimized | wc -l)   # -q omits the header row
        if [ "$layers" -lt 15 ]; then
            echo "PASS: Layer count ${layers} < 15"
        else
            echo "FAIL: Layer count ${layers} >= 15"
            exit 1
        fi
    }
    
  3. Non-Root User Verification
    test_non_root() {
        user=$(docker run --rm myapp:optimized whoami 2>/dev/null || echo "root")
        if [ "$user" != "root" ]; then
            echo "PASS: Running as non-root ($user)"
        else
            echo "FAIL: Running as root"
            exit 1
        fi
    }
    

Integration Tests

  1. Application Functionality
    test_app_runs() {
        docker run -d --name test-app -p 8080:8080 myapp:optimized
        sleep 2
        response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)
        docker rm -f test-app
    
        if [ "$response" = "200" ]; then
            echo "PASS: App responds with 200"
        else
            echo "FAIL: App responds with $response"
            exit 1
        fi
    }
    
  2. Security Scan Threshold
    test_no_critical_cves() {
        # grep -c prints 0 itself on no match; `|| echo "0"` would append a second line
        critical=$(trivy image --severity CRITICAL myapp:optimized 2>/dev/null | grep -c CRITICAL || true)
        if [ "$critical" -eq 0 ]; then
            echo "PASS: No critical CVEs"
        else
            echo "FAIL: Found $critical critical CVEs"
            exit 1
        fi
    }
    

Validation Experiments

  1. musl Compatibility Test Run application-specific tests to verify musl doesn’t break functionality (a smoke-test sketch follows this list):
    • DNS resolution
    • Timezone handling
    • Locale processing
    • Threading behavior
  2. Build Cache Efficiency Measure cache hit rate across multiple builds:
    time docker build --no-cache -t myapp:cold .
    time docker build -t myapp:warm .
    # Warm build should be significantly faster
    
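
A smoke-test sketch for the musl checks in item 1, assuming the image includes a shell (scratch images have none; the tag and hostname are placeholders):

docker run --rm myapp:optimized sh -c 'nslookup example.com'     # DNS via musl's resolver
docker run --rm myapp:optimized sh -c 'TZ=America/New_York date'
# Without the tzdata package the second command silently falls back to UTC --
# exactly the kind of Alpine/musl difference these tests are meant to surface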

Common Pitfalls & Debugging

Pitfall 1: “not found” Error for Binaries

Symptom:

standard_init_linux.go:228: exec user process caused: no such file or directory

Cause: Binary was compiled with glibc (on Debian/Ubuntu) but running on musl (Alpine).

Solution:

# Option 1: Static linking (Go)
RUN CGO_ENABLED=0 go build -o myapp .

# Option 2: musl target (Rust)
RUN rustup target add x86_64-unknown-linux-musl
RUN cargo build --target x86_64-unknown-linux-musl

# Option 3: Compile on Alpine
FROM alpine:latest AS builder
RUN apk add --no-cache build-base
# ... build here

# Option 4: gcompat layer (last resort)
RUN apk add --no-cache gcompat

Pitfall 2: DNS Resolution Failures

Symptom:

dial tcp: lookup myhost.local: no such host

Cause: musl’s DNS resolver is simpler than glibc’s and handles some edge cases differently.

Solution:

# Ensure nsswitch.conf exists
RUN echo "hosts: files dns" > /etc/nsswitch.conf

# Or add DNS-related packages
RUN apk add --no-cache bind-tools

Pitfall 3: Missing CA Certificates in Scratch

Symptom:

x509: certificate signed by unknown authority

Cause: Scratch image has no certificates.

Solution:

# In builder stage
FROM alpine:latest AS builder
RUN apk add --no-cache ca-certificates

# Copy to scratch
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

Pitfall 4: Virtual Package Deletion Fails

Symptom:

ERROR: .build-deps: No such package

Cause: Virtual package name was mistyped or already deleted.

Solution:

# Use consistent naming
RUN apk add --no-cache --virtual .build-deps gcc musl-dev \
    && pip install package \
    && apk del .build-deps  # Exact same name

# Or combine in single RUN to avoid typos

Pitfall 5: Python Wheels Not Compatible

Symptom:

Could not find a version that satisfies the requirement package==1.0.0

Cause: Most pre-built wheels on PyPI target glibc (manylinux tags); musllinux wheels exist for some packages, but coverage is incomplete.

Solution:

# Build wheels from source in Alpine
RUN apk add --no-cache --virtual .build-deps \
        gcc musl-dev python3-dev \
    && pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

Pitfall 6: Timezone Data Missing

Symptom:

unknown time zone America/New_York

Cause: Alpine doesn’t include timezone data by default.

Solution:

RUN apk add --no-cache tzdata
ENV TZ=America/New_York

Debugging Commands

# Inspect image layers
dive myimage:latest

# Check file sizes
docker run --rm myimage:latest du -sh /*

# List dynamic library dependencies (needs a shell and ldd in the image)
docker run --rm myimage:latest /bin/sh -c "ldd /app/myapp"

# Check for glibc dependencies
docker run --rm myimage:latest readelf -d /app/myapp | grep NEEDED

# Interactive debugging
docker run --rm -it myimage:latest /bin/sh

# Copy files out for inspection
docker cp $(docker create myimage:latest):/app/myapp ./myapp-extracted

Extensions & Challenges

Extension 1: BuildKit Cache Mounts

Use BuildKit’s cache mounts for even faster builds:

# syntax=docker/dockerfile:1.4
FROM python:3.11-alpine AS builder

RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
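
Cache mounts require BuildKit, which is the default builder in recent Docker releases; to enable it explicitly:

DOCKER_BUILDKIT=1 docker build -t myapp:cached .
# Repeat builds reuse /root/.cache/pip across runs without the
# cache contents ever landing in an image layer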

Extension 2: Multi-Architecture Builds

Build for both AMD64 and ARM64:

docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:multi .

Extension 3: Distroless Comparison

Compare Alpine against Google’s distroless images:

FROM gcr.io/distroless/python3
COPY --from=builder /app /app
CMD ["app.py"]

Extension 4: Supply Chain Security

Implement image signing and verification:

# Sign with cosign
cosign sign myregistry/myapp:latest

# Verify
cosign verify myregistry/myapp:latest

Challenge: Sub-5MB Python Image

Create a Python application image under 5 MB using:

  • Statically compiled Python interpreter
  • Stripped dependencies
  • Aggressive optimization

Real-World Connections

How This Applies in Production

  1. CI/CD Pipeline Optimization
    • Faster builds mean faster feedback loops
    • Smaller images mean faster deployments
    • Security scans complete faster
  2. Kubernetes at Scale
    • Faster pod startup for autoscaling
    • Reduced registry storage costs
    • Lower network bandwidth during deployments
  3. Edge Computing
    • Minimal images essential for IoT/edge
    • Faster updates over limited bandwidth
    • Reduced storage on edge devices
  4. Security Compliance
    • Fewer CVEs to remediate
    • Smaller attack surface
    • Easier auditing

Industry Examples

  • Google: Uses distroless images in production, heavily influences minimal image practices
  • Docker Inc: Endorses Alpine as preferred base image
  • HashiCorp: Many of its tools ship Alpine-based images built with multi-stage Dockerfiles
  • GitLab: Runner images optimized for Alpine


Self-Assessment Checklist

Before considering this project complete, verify:

  • You can reduce a Python image from 1+ GB to under 100 MB
  • You understand when to use virtual packages vs multi-stage builds
  • You can create a Go binary that runs on scratch (< 10 MB total)
  • You know how to handle musl compatibility issues
  • You can explain why layer order matters for caching
  • You’ve used dive to analyze layer efficiency
  • You’ve run trivy and compared CVE counts
  • Your images run as non-root users
  • You have a .dockerignore for each project

Submission/Completion Criteria

To complete this project, deliver:

  1. Dockerfile Collection: At least 5 before/after Dockerfile pairs
  2. Comparison Report: Markdown document with:
    • Size comparisons (table format)
    • CVE count comparisons
    • Build time analysis
    • Layer efficiency analysis
  3. Scripts: Automation for building, comparing, and scanning
  4. Documentation: README explaining each optimization technique
  5. Test Results: Evidence that optimized images pass functional tests

Bonus Points:

  • Multi-architecture builds
  • BuildKit cache optimization
  • Integration with CI/CD pipeline
  • Supply chain security (signing/verification)

Alpine Linux’s minimalism isn’t a limitation - it’s a feature. By understanding Docker layers, musl compatibility, and multi-stage builds, you transform large, vulnerable images into lean, secure production artifacts.