Project 5: Satellite Image Land Cover Classifier

Build a pipeline that processes satellite imagery and classifies land cover types over time.


Quick Reference

| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Language | Python |
| Prerequisites | NumPy, basic ML concepts |
| Key Topics | raster data, spectral indices, classification, change detection |
| Output | Classified rasters + change report |

Learning Objectives

By completing this project, you will:

  1. Load multi-band raster imagery with Rasterio.
  2. Compute spectral indices like NDVI and NDWI.
  3. Train a baseline classifier for land cover.
  4. Produce a classified map with a legend.
  5. Quantify land cover change between two dates.
  6. Handle large rasters efficiently with windowed reads.

The Core Question You’re Answering

“How do I turn raw satellite pixels into meaningful land cover categories and track change over time?”

This question underlies much of remote sensing and environmental monitoring.


Concepts You Must Understand First

| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Raster data | Pixels + bands are the raw input | Rasterio docs |
| Spectral indices | NDVI separates vegetation | Remote sensing intro |
| Supervised classification | Turn labeled pixels into models | ML basics |
| Change detection | Compare classified rasters | Remote sensing guides |
| CRS and transforms | Preserve georeferencing | GIS basics |

Key Concepts Deep Dive

  1. Raster Bands
    • Sentinel-2 records 13 spectral bands, including blue, green, red, and near-infrared (NIR).
    • Each pixel is a vector, not a single value.
  2. NDVI Formula
    • (NIR - Red) / (NIR + Red)
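    • NDVI ranges from -1 to 1. Values near 1 indicate dense, healthy vegetation; values near 0 or below typically indicate bare ground, built-up surfaces, or water.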
  3. Classification Workflow
    • Labeled pixels -> feature matrix -> model -> prediction.
  4. Change Detection
    • Compare class counts across years.
    • Generate percent change per class.
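
The NDVI formula in item 2 can be sketched with NumPy. This is a minimal sketch under stated assumptions, not a reference implementation: the helper name `normalized_difference` and the sample reflectance values are invented for illustration, and NDWI is assumed to be the McFeeters form, (Green - NIR) / (Green + NIR), which the text above does not spell out.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with NaN where a + b == 0."""
    a = np.asarray(a, dtype="float32")
    b = np.asarray(b, dtype="float32")
    total = a + b
    out = np.full(total.shape, np.nan, dtype="float32")
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(a - b, total, out=out, where=total != 0)
    return out


# Hypothetical 2x2 reflectance tiles (values are made up for illustration).
red = np.array([[0.10, 0.30], [0.00, 0.05]])
green = np.array([[0.12, 0.28], [0.00, 0.10]])
nir = np.array([[0.50, 0.30], [0.00, 0.02]])

ndvi = normalized_difference(nir, red)    # (NIR - Red) / (NIR + Red)
ndwi = normalized_difference(green, nir)  # McFeeters NDWI (assumed variant)
```

Casting to float first matters because satellite bands are often stored as unsigned integers, where `NIR - Red` would wrap around instead of going negative.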

Theoretical Foundation

Raster Classification Pipeline

Raw Bands -> Indices -> Feature Stack -> Classifier -> Land Cover Map
                                                        |
                                                        v
                                                 Change Detection

Why Windowed Reads Matter

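Large rasters can exceed available memory when read in full. Reading and processing them tile by tile (window by window) keeps peak memory proportional to the tile size rather than the raster size.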
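
One way to process a raster tile by tile is sketched below. The function names and the 512-pixel tile size are assumptions chosen for illustration. The Rasterio import sits inside the function so that the tiling helper can be tried on its own without Rasterio installed.

```python
def iter_windows(width, height, tile=512):
    """Yield (col_off, row_off, win_width, win_height) tiles covering a raster."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (col_off, row_off,
                   min(tile, width - col_off),    # clip the last column of tiles
                   min(tile, height - row_off))   # clip the last row of tiles


def process_by_window(path, tile=512):
    """Read a raster tile by tile and return the number of pixels read."""
    # Imported here so iter_windows stays usable without rasterio installed.
    import rasterio
    from rasterio.windows import Window

    total_pixels = 0
    with rasterio.open(path) as src:
        for col_off, row_off, w, h in iter_windows(src.width, src.height, tile):
            window = Window(col_off, row_off, w, h)
            block = src.read(window=window)  # shape: (bands, h, w)
            total_pixels += block.shape[1] * block.shape[2]
    return total_pixels
```

Rasterio datasets also expose `block_windows()`, which iterates over the file's native block layout and may be a better fit when the GeoTIFF is internally tiled.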


Project Specification

What You Will Build

A pipeline that downloads satellite imagery for two dates, computes indices, trains a classifier, produces land cover maps, and reports changes.

Functional Requirements

  1. Load multi-band imagery (two dates).
  2. Compute NDVI and NDWI.
  3. Build training samples and train a classifier.
  4. Generate classified raster outputs.
  5. Compute change statistics by class.

Non-Functional Requirements

  • Memory safety: Windowed processing for large rasters.
  • Reproducibility: Save model parameters and metadata.
  • Clarity: Document class labels and assumptions.
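
The reproducibility and clarity requirements could be met by a small metadata record saved next to each output. The sketch below is only one possible layout: the field names, class labels, and parameter values are all hypothetical.

```python
import json


def build_run_metadata(model_params, class_labels, band_order, notes=""):
    """Collect what is needed to reproduce and interpret a classification run."""
    return {
        "model_params": model_params,
        # JSON object keys must be strings, so class ids are stringified.
        "class_labels": {str(k): v for k, v in class_labels.items()},
        "band_order": list(band_order),
        "notes": notes,
    }


metadata = build_run_metadata(
    model_params={"n_estimators": 100, "random_state": 42},
    class_labels={0: "water", 1: "vegetation", 2: "urban"},
    band_order=["red", "green", "nir", "ndvi", "ndwi"],
    notes="Labels digitized by hand; clouds not masked.",
)
metadata_json = json.dumps(metadata, indent=2, sort_keys=True)
```

Writing `metadata_json` to a file in the output directory keeps the class legend and model settings alongside the rasters they describe.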

Example Usage / Output

$ python classify_landcover.py --region "sf_bay" --year 2019 --year2 2024
Saved: output/landcover_2019.tif
Saved: output/landcover_2024.tif
Change report: output/change_report.txt
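
A command-line interface matching the flags in the example run might be sketched with `argparse` as follows. The `--output-dir` flag and the helper names are assumptions added for illustration; only `--region`, `--year`, and `--year2` appear in the example above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Classify land cover for two years and report change."
    )
    parser.add_argument("--region", required=True, help="Region name, e.g. sf_bay")
    parser.add_argument("--year", type=int, required=True, help="First year")
    parser.add_argument("--year2", type=int, required=True, help="Second year")
    # --output-dir is an assumed extra flag, not shown in the example run.
    parser.add_argument("--output-dir", default="output", help="Where to save results")
    return parser


def output_paths(args):
    """Return the file names that the example run reports."""
    return [
        f"{args.output_dir}/landcover_{args.year}.tif",
        f"{args.output_dir}/landcover_{args.year2}.tif",
        f"{args.output_dir}/change_report.txt",
    ]


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--region", "sf_bay", "--year", "2019", "--year2", "2024"]
)
paths = output_paths(args)
```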

Real World Outcome

You open the output maps and see:

  1. A classified land cover map for each year.
  2. A legend that maps colors to classes.
  3. A change report showing percent increases/decreases.

Solution Architecture

High-Level Design

Imagery -> Indices -> Training -> Model -> Classification -> Change Report

Key Components

| Component | Responsibility | Key Decisions |
|---|---|---|
| Loader | Read bands | Rasterio windowed reads |
| Feature builder | Compute indices | NDVI + NDWI |
| Trainer | Build model | Random forest baseline |
| Classifier | Predict per pixel | Windowed inference |
| Reporter | Change stats | Percent by class |

Data Structures

numpy.ndarray  # raster bands and features
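
The array shapes involved are worth making concrete. Rasterio returns bands as `(bands, rows, cols)`, while scikit-learn expects `(samples, features)`. The sketch below shows the round trip on a made-up array; the sizes and the placeholder predictions are assumptions for illustration.

```python
import numpy as np

# Hypothetical raster: 5 features (e.g. red, green, NIR, NDVI, NDWI), 3 rows, 4 columns.
n_features, height, width = 5, 3, 4
stack = np.arange(n_features * height * width, dtype="float32").reshape(
    n_features, height, width
)

# Flatten to one row per pixel: shape (height * width, n_features).
pixels = stack.reshape(n_features, -1).T

# After per-pixel prediction, a 1-D label vector folds back into the raster grid.
labels = np.zeros(height * width, dtype="uint8")  # placeholder predictions
label_map = labels.reshape(height, width)
```

Pixel `i` in the flattened matrix corresponds to row `i // width` and column `i % width` of the raster, so the final reshape restores the original layout.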

Algorithm Overview

  1. Load raster bands for two dates.
  2. Compute NDVI/NDWI and stack features.
  3. Train classifier using labeled pixels.
  4. Predict classes for each pixel.
  5. Compare class counts between dates.
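
Steps 3 and 4 can be sketched with scikit-learn's `RandomForestClassifier`. Everything about the data here is an assumption: the training pixels are synthetic, the two features stand in for NDVI and NDWI, and the two classes are chosen so they separate cleanly. Real labeled pixels will be far noisier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic training pixels with two features (NDVI, NDWI); values are invented.
# Class 0 = water (low NDVI, high NDWI), class 1 = vegetation (the reverse).
water = rng.normal(loc=[-0.3, 0.5], scale=0.05, size=(50, 2))
vegetation = rng.normal(loc=[0.7, -0.5], scale=0.05, size=(50, 2))
X_train = np.vstack([water, vegetation])
y_train = np.array([0] * 50 + [1] * 50)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Classify a tiny 2x2 "image" stored as (features, rows, cols).
image = np.array([
    [[-0.3, 0.7], [0.7, -0.3]],   # NDVI band
    [[0.5, -0.5], [-0.5, 0.5]],   # NDWI band
])
pixels = image.reshape(image.shape[0], -1).T      # (n_pixels, n_features)
label_map = model.predict(pixels).reshape(image.shape[1:])
```

Fixing `random_state` makes the trained forest repeatable, which helps when comparing runs across the two dates.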

Complexity

  • Feature computation: O(n) for n pixels (constant work per pixel).
  • Classification: O(n * trees * depth) for random forest prediction, where depth is the average tree depth.

Implementation Guide

Development Environment Setup

python -m venv geo-env
source geo-env/bin/activate
pip install rasterio numpy scikit-learn

Project Structure

project-root/
├── data/
│   ├── imagery_2019.tif
│   └── imagery_2024.tif
├── src/
│   ├── features.py
│   ├── train.py
│   └── classify.py
├── output/
│   ├── landcover_2019.tif
│   ├── landcover_2024.tif
│   └── change_report.txt
└── README.md

The Core Question You’re Answering

“How do I convert raw spectral data into land cover classes and quantify change?”

Concepts You Must Understand First

  1. Raster metadata: CRS, transform, band order.
  2. Indices: NDVI, NDWI meanings.
  3. Model validation: Confusion matrix and accuracy.
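
For the model validation concept, a confusion matrix and the accuracy figures derived from it can be computed in a few lines of NumPy. The labels below are made up, and the helper is a hand-rolled sketch; `sklearn.metrics.confusion_matrix` offers the same result if scikit-learn is already installed.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Encode each (true, pred) pair as one integer, then count occurrences.
    flat = y_true * n_classes + y_pred
    counts = np.bincount(flat, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)


# Made-up labels for six validation pixels and three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
overall_accuracy = np.trace(cm) / cm.sum()
# Per-class recall (producer's accuracy): correct / total true pixels per class.
per_class_recall = np.diag(cm) / cm.sum(axis=1)
```

Per-class figures matter more than overall accuracy when classes are imbalanced, since a dominant class can hide poor performance on a rare one.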

Questions to Guide Your Design

  1. How will you label training data?
  2. Which classes are most important to detect?
  3. What is an acceptable accuracy threshold?
  4. How will you handle cloud cover?

Thinking Exercise

Compute NDVI manually for one pixel and interpret the result. Then classify it as vegetation or not.
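
As a way to check your hand calculation, the exercise can be written out for one pixel. The reflectance values and the 0.3 vegetation cutoff are assumptions picked for illustration; a sensible threshold depends on sensor, season, and region.

```python
# Hypothetical surface reflectance for one pixel (values chosen for illustration).
nir = 0.45
red = 0.09

ndvi = (nir - red) / (nir + red)  # 0.36 / 0.54, roughly 0.667

# The 0.3 cutoff is an assumed rule of thumb, not a value from this guide.
VEGETATION_THRESHOLD = 0.3
is_vegetation = ndvi > VEGETATION_THRESHOLD

print(f"NDVI = {ndvi:.3f}, vegetation: {is_vegetation}")
```

The pixel reflects much more near-infrared than red light, which is the signature of leafy vegetation, so it lands well above the assumed cutoff.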

Interview Questions

  1. What is a raster and how is it different from vector data?
  2. How does NDVI detect vegetation?
  3. What are common sources of classification error?
  4. How do you evaluate a land cover classifier?
  5. Why does CRS metadata matter?
  6. How would you scale this to national datasets?

Hints in Layers

  • Hint 1: Start with a small raster subset.
  • Hint 2: Use a baseline random forest model.
  • Hint 3: Save CRS and transform when writing outputs.
  • Hint 4: Add cloud masks if available.

Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| Raster basics | Rasterio docs | Quickstart |
| Remote sensing | GIS programming | Remote sensing |
| ML classification | ML intro | Classification |

Implementation Phases

Phase 1: Data Prep (1 week)

  • Load imagery and inspect bands.
  • Compute indices.

Checkpoint: NDVI images are generated.

Phase 2: Training (1 week)

  • Build labeled dataset.
  • Train classifier and evaluate accuracy.

Checkpoint: Model meets the accuracy threshold you chose, measured on held-out labeled pixels.

Phase 3: Classification + Change (1-2 weeks)

  • Classify both dates.
  • Compute change statistics.

Checkpoint: Change report generated.
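
The change statistics for this phase could be computed along these lines. The class labels and the two tiny maps are invented for illustration, and the function name is hypothetical. The sketch compares class counts between dates, as described earlier, rather than tracking which pixels moved from one class to another.

```python
import numpy as np


def class_change(map_a, map_b, class_labels):
    """Return {label: (count_a, count_b, percent_change)} for two class maps."""
    n = max(class_labels) + 1
    counts_a = np.bincount(np.asarray(map_a).ravel(), minlength=n)
    counts_b = np.bincount(np.asarray(map_b).ravel(), minlength=n)
    report = {}
    for class_id, name in class_labels.items():
        a, b = int(counts_a[class_id]), int(counts_b[class_id])
        # Percent change is undefined when the class is absent at the first date.
        pct = None if a == 0 else 100.0 * (b - a) / a
        report[name] = (a, b, pct)
    return report


labels = {0: "water", 1: "vegetation", 2: "urban"}
map_2019 = np.array([[0, 1, 1, 1], [1, 1, 2, 2]])
map_2024 = np.array([[0, 1, 1, 2], [1, 2, 2, 2]])
report = class_change(map_2019, map_2024, labels)
```

Multiplying each count by the pixel area converts the report from pixels to hectares or square kilometres.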

Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Data source | Sentinel-2, Landsat | Sentinel-2 | Higher spatial resolution (10 m vs. 30 m for the visible and NIR bands) |
| Model | Random forest, SVM | Random forest | Strong baseline |
| Features | bands only, bands + indices | bands + indices | Better separability |

Testing Strategy

| Category | Purpose | Examples |
|---|---|---|
| Raster integrity | Valid metadata | CRS present |
| Model accuracy | Confusion matrix | Per-class accuracy |
| Output validity | Class values | Expected classes only |

Critical cases:

  • Cloud-covered pixels.
  • Missing bands.
  • Class imbalance.
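
Two of these checks, output validity and class imbalance, can be sketched as small helpers that a test suite could call. The function names, the nodata value of 255, and the sample arrays are assumptions for illustration.

```python
import numpy as np


def check_output_classes(label_map, expected, nodata=255):
    """Return the set of unexpected values found in a classified raster."""
    allowed = set(expected) | {nodata}
    found = set(np.unique(label_map).tolist())
    return found - allowed


def class_shares(labels):
    """Return each class's share of the training samples, to spot imbalance."""
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): float(c) / len(labels) for v, c in zip(values, counts)}


# Made-up classified tile containing one stray value (7) and one nodata pixel.
label_map = np.array([[0, 1, 255], [2, 1, 7]], dtype="uint8")
unexpected = check_output_classes(label_map, expected=[0, 1, 2])

# Made-up training labels where class 0 dominates.
shares = class_shares(np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 2]))
```

An empty `unexpected` set is the passing condition for the output validity check. A heavily skewed `shares` result suggests collecting more samples for the rare classes before training.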

Common Pitfalls and Debugging

| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong band order | Incorrect NDVI | Verify band indices |
| Memory errors | Crash | Windowed reads |
| Poor labels | Low accuracy | Improve training samples |

Extensions and Challenges

  • Add cloud masking.
  • Use a deep learning segmentation model.
  • Build a batch pipeline for multiple regions.

Real-World Connections

  • Agriculture monitoring.
  • Urban expansion tracking.
  • Climate research.

Resources

  • Rasterio: https://rasterio.readthedocs.io/
  • Sentinel Hub: https://www.sentinel-hub.com/
  • Scikit-learn: https://scikit-learn.org/

Self-Assessment Checklist

  • I can compute NDVI and explain it.
  • I can train and evaluate a classifier.
  • I can quantify change over time.

Submission / Completion Criteria

Minimum Viable Completion

  • NDVI computed and baseline classification produced.

Full Completion

  • Two time periods classified with change report.

Excellence

  • Cloud masking and advanced model.

This guide was generated from GEOSPATIAL_PYTHON_LEARNING_PROJECTS.md. For the complete learning path, see the parent directory GEOSPATIAL_PYTHON_LEARNING_PROJECTS/README.md.