Project 5: Satellite Land Cover Classifier

Build a repeatable remote-sensing pipeline that classifies land cover and quantifies change by administrative zone.

Quick Reference

Attribute	Value
Difficulty	Level 3: Advanced (The Engineer)
Time Estimate	28-40 hours
Main Programming Language	Python (Alternatives: R, Julia)
Coolness Level	Level 4: Hardcore Tech Flex
Business Potential	3. The “Service & Support” Model
Prerequisites	Raster fundamentals, CRS alignment, basic classification
Key Topics	Multi-band rasters, masking, change detection, zonal stats

1. Learning Objectives

Select and prepare comparable Sentinel-2 scenes.
Apply raster alignment and nodata/cloud masking correctly.
Compute spectral indices and classify core land-cover classes.
Summarize change by zones with coverage-aware confidence flags.
Explain uncertainty and false-change failure modes.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Raster Grid Integrity

Fundamentals

Raster analytics depend on CRS, transform, resolution, and nodata metadata.
Multi-date comparison requires common grid alignment.

Deep Dive

Most false change signals come from preprocessing mismatch, not true land transformation. Align both scenes to a canonical grid and choose resampling by variable type: nearest-neighbor for categorical layers such as cloud masks and class maps, and bilinear or cubic for continuous reflectance bands. Keep metadata checks explicit and fail fast if a mismatch is detected.

2.2 Masking and Spectral Indices

Fundamentals

Cloud and nodata masks must be applied before index calculations.
Spectral indices amplify specific land-cover signatures.

Deep Dive

Without masking, clouds and shadows can mimic dramatic change. NDVI-like index interpretation depends on scene comparability and radiometric consistency. Build QA metrics for valid-pixel coverage before reporting any change percentages.
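The mask-first order can be sketched with NumPy. This assumes Sentinel-2 L2A inputs, where Scene Classification Layer (SCL) codes 0 (no data), 1 (saturated/defective), 3 (cloud shadow), 8 and 9 (medium/high-probability cloud), and 10 (thin cirrus) are treated as invalid, and where a band value equal to `nodata` marks missing data. The helper names are hypothetical.

```python
import numpy as np

# Assumed SCL codes to exclude (Sentinel-2 L2A scene classification).
INVALID_SCL = (0, 1, 3, 8, 9, 10)


def valid_mask(scl: np.ndarray, *bands: np.ndarray, nodata: float = 0) -> np.ndarray:
    """True where the pixel is cloud/shadow-free and every band holds real data."""
    mask = ~np.isin(scl, INVALID_SCL)
    for band in bands:
        mask &= band != nodata
    return mask


def normalized_difference(a: np.ndarray, b: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """(a - b) / (a + b) on valid pixels only; masked or zero-sum pixels become NaN."""
    a = a.astype("float64")
    b = b.astype("float64")
    out = np.full(a.shape, np.nan)
    denom = a + b
    ok = mask & (denom != 0)
    out[ok] = (a[ok] - b[ok]) / denom[ok]
    return out


def valid_coverage(mask: np.ndarray) -> float:
    """Fraction of pixels that survived masking (the QA metric to report)."""
    return float(mask.mean()) if mask.size else 0.0
```

With Sentinel-2 band names, NDVI is `normalized_difference(b08, b04, mask)`, NDWI (McFeeters) is `normalized_difference(b03, b08, mask)`, and NDBI is `normalized_difference(b11, b08, mask)`. B11 is a 20 m band, so it has to be resampled onto the 10 m canonical grid first.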

2.3 Classification and Zonal Interpretation

Fundamentals

Classification uncertainty must be reported, not hidden.
Zonal summaries inherit polygon size and boundary effects.

Deep Dive

Per-zone change reports are decision-friendly, but can overstate confidence for small polygons or low valid-pixel coverage. Add confidence flags based on pixel support and class uncertainty. Publish both absolute and relative change to avoid misleading ranking behavior.
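One way to turn pixel support into an explicit flag is a small rule table. The thresholds (`min_pixels`, `min_coverage`), the flag labels, and the 100 m² default pixel area (one 10 m Sentinel-2 pixel) are illustrative assumptions, not calibrated values.

```python
from typing import Optional, Tuple


def confidence_flag(valid_pixels: int, total_pixels: int,
                    min_pixels: int = 500, min_coverage: float = 0.7) -> str:
    """Hypothetical rule: label a zone by pixel support first, then by coverage."""
    if total_pixels == 0:
        return "no_data"
    if valid_pixels < min_pixels:
        return "low_support"      # small polygons: a few pixels swing the percentages
    if valid_pixels / total_pixels < min_coverage:
        return "low_coverage"     # clouds or nodata removed too much of the zone
    return "ok"


def absolute_and_relative(px_t1: int, px_t2: int,
                          pixel_area_m2: float = 100.0) -> Tuple[float, Optional[float]]:
    """Absolute change in hectares plus relative change (None if the class was absent at t1)."""
    absolute_ha = (px_t2 - px_t1) * pixel_area_m2 / 10_000
    relative = (px_t2 - px_t1) / px_t1 if px_t1 > 0 else None
    return absolute_ha, relative
```

Reporting both numbers guards against rankings where a zone going from 2 to 6 built-up pixels (+200%) outranks a zone that gained several hectares.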

3. Project Specification

3.1 What You Will Build

A pipeline that:

queries and selects quality-controlled Sentinel-2 scenes,
computes indices and classifies land cover,
compares classes across dates,
outputs a zone-level change report and map.

Included:

deterministic scene IDs,
coverage-aware reporting,
interpretable class change outputs.

Excluded:

deep neural segmentation,
near-real-time stream processing,
global-scale processing.

3.2 Functional Requirements

Select two comparable scenes for an AOI.
Reproject/resample to canonical grid.
Apply cloud and nodata masks.
Compute spectral indices and assign land-cover classes.
Calculate zonal change metrics and export outputs.
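Selecting "two comparable scenes" can be made deterministic with explicit rules. This sketch assumes candidate metadata has already been fetched into hypothetical `Scene` records. The 15% cloud limit matches the live transcript; the 30-day seasonal window, the 300-day minimum gap, and the tie-breaking order are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Scene:
    scene_id: str
    acquired: date
    cloud_pct: float


def doy_distance(a: date, b: date) -> int:
    """Approximate day-of-year gap that wraps around the new year."""
    d = abs(a.timetuple().tm_yday - b.timetuple().tm_yday)
    return min(d, 365 - d)


def select_pair(scenes: Iterable[Scene], max_cloud: float = 15.0,
                season_window_days: int = 30,
                min_gap_days: int = 300) -> Optional[Tuple[Scene, Scene]]:
    """Pick the most cloud-free pair that is about a year apart and in the same season."""
    clean = sorted((s for s in scenes if s.cloud_pct <= max_cloud),
                   key=lambda s: (s.acquired, s.scene_id))
    best = None
    for i, t1 in enumerate(clean):
        for t2 in clean[i + 1:]:
            if (t2.acquired - t1.acquired).days < min_gap_days:
                continue  # too close in time to represent year-over-year change
            gap = doy_distance(t1.acquired, t2.acquired)
            if gap > season_window_days:
                continue  # phenology mismatch can masquerade as land-cover change
            # Deterministic ranking: cloud first, then seasonal gap, then IDs as tie-breakers.
            key = (t1.cloud_pct + t2.cloud_pct, gap, t1.scene_id, t2.scene_id)
            if best is None or key < best[0]:
                best = (key, t1, t2)
    return None if best is None else (best[1], best[2])
```

Because the ranking key ends with the scene IDs, re-running on the same candidate list always returns the same pair.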

3.3 Non-Functional Requirements

Reproducibility: Scene IDs and thresholds recorded.
Trustworthiness: Coverage and uncertainty metrics included.
Scalability: Windowed/tiled processing for memory stability.
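For memory stability, processing can iterate over fixed-size blocks instead of loading a full scene. `iter_tiles` is a hypothetical helper yielding `(row_off, col_off, height, width)`. With rasterio, each tuple maps onto `rasterio.windows.Window(col_off, row_off, width, height)` passed to `dataset.read(window=...)`. The 512-pixel tile size is an assumption.

```python
from typing import Iterator, Tuple


def iter_tiles(height: int, width: int, tile: int = 512) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row_off, col_off, tile_height, tile_width); edge tiles are clipped to the raster."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (row_off, col_off,
                   min(tile, height - row_off),
                   min(tile, width - col_off))
```

Per-zone pixel counts (valid pixels, pixels per class) are additive across tiles, so accumulate counts tile by tile and compute percentages once at the end; averaging per-tile percentages would weight small edge tiles incorrectly.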

3.4 Data Formats / Schemas

Zone change output columns:
- zone_id
- valid_pixel_coverage
- class_water_delta_pct
- class_vegetation_delta_pct
- class_built_up_delta_pct
- class_bare_soil_delta_pct
- confidence_flag
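A row matching this schema could be built from the two classified arrays for one zone. The sketch assumes hypothetical integer class codes (0 water, 1 vegetation, 2 built-up, 3 bare soil, 255 masked), defines each `*_delta_pct` as the change in a class's share of jointly valid pixels in percentage points, and reports `valid_pixel_coverage` as a fraction from 0 to 1. Pin these conventions down in your own spec.

```python
import numpy as np

# Hypothetical class codes; 255 marks pixels masked on that date.
CLASSES = {0: "water", 1: "vegetation", 2: "built_up", 3: "bare_soil"}
MASKED = 255


def zone_row(zone_id, cls_t1: np.ndarray, cls_t2: np.ndarray, flag: str) -> dict:
    """Build one output row from the pixels that fall inside a single zone."""
    both_valid = (cls_t1 != MASKED) & (cls_t2 != MASKED)
    n_valid = int(both_valid.sum())
    row = {"zone_id": zone_id,
           "valid_pixel_coverage": n_valid / cls_t1.size if cls_t1.size else 0.0}
    for code, name in CLASSES.items():
        if n_valid == 0:
            delta = float("nan")
        else:
            share_t1 = (cls_t1[both_valid] == code).sum() / n_valid
            share_t2 = (cls_t2[both_valid] == code).sum() / n_valid
            delta = 100.0 * (share_t2 - share_t1)  # percentage points
        row[f"class_{name}_delta_pct"] = float(delta)
    row["confidence_flag"] = flag  # e.g. from a coverage/support rule
    return row
```

Dict insertion order follows the schema, so a list of these rows can go straight to `csv.DictWriter`.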

3.5 Edge Cases

Cloud-heavy scene with low valid coverage
Seasonal mismatch between scenes
Tiny polygons with unstable class percentages
Nodata collisions during differencing
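Nodata collisions are easiest to avoid by combining both dates' validity masks before differencing. If nodata is encoded as 0 and a class is also coded 0, a naive `t1 != t2` comparison turns every nodata edge into apparent change. In this sketch (hypothetical helper, assumed -1/0/1 encoding), a pixel masked on either date is reported as not comparable instead.

```python
import numpy as np

NOT_COMPARABLE = -1


def class_change_map(cls_t1, cls_t2, valid_t1, valid_t2) -> np.ndarray:
    """1 = class changed, 0 = unchanged, -1 = masked on at least one date."""
    joint = np.asarray(valid_t1, dtype=bool) & np.asarray(valid_t2, dtype=bool)
    out = np.full(np.shape(cls_t1), NOT_COMPARABLE, dtype=np.int8)
    # Only compare classes where both dates hold real observations.
    out[joint] = (np.asarray(cls_t1)[joint] != np.asarray(cls_t2)[joint]).astype(np.int8)
    return out
```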

3.6 Real World Outcome

3.6.1 How to Run

$ python run_project5_landcover.py --aoi aoi.geojson --start 2024-05-01 --end 2025-05-31
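A minimal entry-point skeleton covering both invocation styles in this guide (a live AOI query and a fixture manifest) might look like the following. Only the flag names come from the commands shown; the mutual-exclusion rule and the date checks are assumptions.

```python
import argparse
from datetime import date


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Land-cover change by zone (sketch).")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--aoi", help="AOI polygon(s) as GeoJSON")
    source.add_argument("--fixture", help="JSON manifest of pre-selected scenes")
    # date.fromisoformat raises ValueError on bad input, which argparse reports cleanly.
    parser.add_argument("--start", type=date.fromisoformat, help="search start, YYYY-MM-DD")
    parser.add_argument("--end", type=date.fromisoformat, help="search end, YYYY-MM-DD")
    args = parser.parse_args(argv)
    if args.aoi and (args.start is None or args.end is None):
        parser.error("--aoi requires --start and --end")
    if args.start and args.end and args.start > args.end:
        parser.error("--start must not be after --end")
    return args
```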

3.6.2 Golden Path Demo

$ python run_project5_landcover.py --fixture fixtures/sentinel_pair_manifest.json
[INFO] scenes selected: t1=2024-06-03 t2=2025-06-07
[INFO] valid coverage median=0.87
[INFO] zones processed=84
[DONE] change outputs exported

3.6.3 Exact Terminal Transcript (Live)

$ python run_project5_landcover.py --aoi aoi.geojson --start 2024-05-01 --end 2025-05-31
[INFO] Querying scenes with cloud <= 15%
[INFO] Aligning rasters to canonical grid
[INFO] Computing NDVI/NDBI/NDWI
[INFO] Running land-cover classification
[INFO] Computing zonal change metrics
[INFO] Exporting outputs/landcover_change_by_zone.csv
[DONE] Completed in 11m 42s

4. Solution Architecture

4.1 High-Level Design

Scene Query -> Grid Alignment -> Masking -> Index + Classification -> Change Engine -> Zonal Reporter

4.2 Key Components

Component	Responsibility	Key Decision
Scene Selector	Choose comparable imagery	Cloud and season thresholds
Preprocessor	Align grids and masks	Canonical transform policy
Classifier	Assign land-cover classes	Rule-based vs supervised baseline
Change Engine	Compare class maps	Temporal pairing strategy
Zonal Reporter	Aggregate by polygons	Coverage and confidence policy
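As a rule-based baseline, the indices can be thresholded into the four schema classes. The threshold values and the precedence order (water, then built-up, then vegetation, else bare soil) are illustrative assumptions, not validated Sentinel-2 cutoffs. A supervised baseline could replace this function while keeping the same input and output contract.

```python
import numpy as np

# Hypothetical class codes shared with the reporting step.
WATER, VEGETATION, BUILT_UP, BARE_SOIL, MASKED = 0, 1, 2, 3, 255


def classify_rules(ndvi, ndwi, ndbi, valid, *, water_t=0.0, veg_t=0.3, built_t=0.0):
    """Assign one class per pixel; np.select uses the first condition that matches."""
    ndvi, ndwi, ndbi = (np.asarray(x, dtype="float64") for x in (ndvi, ndwi, ndbi))
    valid = np.asarray(valid, dtype=bool)
    conditions = [
        ~valid,                               # masked pixels never get a land-cover class
        ndwi > water_t,
        (ndbi > built_t) & (ndvi < veg_t),
        ndvi >= veg_t,
    ]
    choices = [MASKED, WATER, BUILT_UP, VEGETATION]
    return np.select(conditions, choices, default=BARE_SOIL).astype(np.uint8)
```

Keeping thresholds as keyword arguments makes them easy to record alongside the scene IDs for reproducibility.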

4.3 Algorithm Overview

Select valid scene pair.
Align to common grid.
Apply masks and compute indices.
Classify both dates.
Compute class deltas and zonal summaries.
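The delta step can start from a from-to transition matrix, which shows which classes converted into which, not just net change. This sketch assumes class maps coded 0 to `n_classes - 1` plus a boolean mask of pixels valid on both dates; `transition_matrix` and `net_delta_pct` are hypothetical helpers built on `np.bincount`.

```python
import numpy as np


def transition_matrix(cls_t1, cls_t2, joint_valid, n_classes: int = 4) -> np.ndarray:
    """counts[i, j] = number of jointly valid pixels that went from class i to class j."""
    a = np.asarray(cls_t1)[joint_valid].astype(np.int64)
    b = np.asarray(cls_t2)[joint_valid].astype(np.int64)
    if a.size and (max(a.max(), b.max()) >= n_classes or min(a.min(), b.min()) < 0):
        raise ValueError("class code outside 0..n_classes-1")
    # Encode each (from, to) pair as one integer, then count occurrences.
    flat = np.bincount(a * n_classes + b, minlength=n_classes * n_classes)
    return flat.reshape(n_classes, n_classes)


def net_delta_pct(counts: np.ndarray) -> np.ndarray:
    """Per-class change in share (percentage points): column sums (t2) minus row sums (t1)."""
    total = counts.sum()
    if total == 0:
        return np.full(counts.shape[0], np.nan)
    return 100.0 * (counts.sum(axis=0) - counts.sum(axis=1)) / total
```

The off-diagonal cells are also a quick false-change diagnostic: implausible transitions such as water to built-up in a single year usually point to masking or alignment problems.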

5. Implementation Guide

5.1 Development Environment Setup

$ mamba create -n geo-p05 python=3.11 rasterio geopandas numpy scipy -y
$ mamba activate geo-p05

5.2 Project Structure

project5/
├── src/
│   ├── scenes.py
│   ├── preprocess.py
│   ├── classify.py
│   ├── change.py
│   └── main.py
├── fixtures/
├── outputs/
└── tests/

5.3 The Core Question You Are Answering

“How do we convert satellite pixels into change signals that remain defensible after uncertainty checks?”

5.4 Concepts You Must Understand First

Raster alignment invariants.
Mask-first processing.
Coverage-aware zonal reporting.

5.5 Questions to Guide Your Design

How strict should the cloud threshold be for your AOI?
Which classes are operationally meaningful for stakeholders?
What minimum valid coverage is required per zone?

5.6 Thinking Exercise

Choose one zone with large observed change and list three alternative explanations unrelated to true land-cover change.

5.7 Interview Questions

Why is grid alignment mandatory before differencing?
How can nodata bias zonal change metrics?
What makes a land-cover report decision-grade?

5.8 Hints in Layers

Hint 1: Freeze scene IDs first.
Hint 2: Validate transform/resolution equality before index math.
Hint 3: Compute coverage before classification summary.
Hint 4: Flag low-confidence zones explicitly in output.
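Hint 1 and the reproducibility requirement can be met by serializing scene IDs and thresholds before any processing and hashing the result, so two runs can be compared by a single fingerprint. The manifest layout is an assumption, and the IDs in the usage below are placeholders, not real Sentinel-2 product names.

```python
import hashlib
import json
from typing import Optional


def freeze_manifest(t1_id: str, t2_id: str, thresholds: dict,
                    path: Optional[str] = None) -> str:
    """Serialize scene IDs and thresholds deterministically; return a SHA-256 fingerprint."""
    manifest = {"scenes": {"t1": t1_id, "t2": t2_id}, "thresholds": thresholds}
    # sort_keys + fixed separators make the text independent of dict insertion order.
    text = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    if path is not None:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

For example, `freeze_manifest("SCENE_A", "SCENE_B", {"max_cloud": 15, "min_coverage": 0.7})` returns the same digest regardless of the order in which the thresholds were listed.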

6. Testing Strategy

Category	Purpose
Unit	Mask logic, index calculations, class delta math
Integration	End-to-end scene-pair workflow
Edge Case	Cloud-heavy scenes, low-coverage zones

Critical tests:

Misaligned test rasters must fail fast with a clear error.
Applying the cloud mask to the cloud-heavy fixture should reduce false-change spikes.
Low-coverage zones should carry a confidence warning flag.

7. Common Pitfalls & Debugging

Pitfall	Symptom	Fix
Misaligned rasters	Checkerboard-like false change	Align both scenes to canonical grid
Mask omission	Implausible class transitions	Apply cloud/nodata masks before classification
Overconfident reporting	Small zones show extreme noisy deltas	Add minimum coverage thresholds and flags

8. Extensions & Challenges

Add multi-year trend pipeline.
Add class-specific uncertainty estimates.
Add temporal smoothing for noisy class transitions.

9. Real-World Connections

Urban growth and impervious-surface monitoring.
Conservation and vegetation-loss tracking.
Municipal land-use change reporting.

10. Resources

Sentinel-2 mission overview: https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-2
Rasterio docs: https://rasterio.readthedocs.io/en/stable/
GDAL docs: https://gdal.org/en/stable/

11. Self-Assessment Checklist

I can explain why scene pairing choices matter.
I can prove my rasters are aligned before differencing.
I can report valid-pixel coverage with every zone result.
I can articulate uncertainty sources in class-change outputs.

12. Submission / Completion Criteria

Minimum Viable Completion

Working class-change report for one AOI with deterministic scene IDs.

Full Completion

Includes coverage-aware zonal report and confidence flags.

Excellence

Includes uncertainty analysis and false-change diagnostics appendix.