Project 5: Satellite Image Land Cover Classifier
Build a pipeline that processes satellite imagery and classifies land cover types over time.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Language | Python |
| Prerequisites | NumPy, basic ML concepts |
| Key Topics | raster data, spectral indices, classification, change detection |
| Output | Classified rasters + change report |
Learning Objectives
By completing this project, you will:
- Load multi-band raster imagery with Rasterio.
- Compute spectral indices like NDVI and NDWI.
- Train a baseline classifier for land cover.
- Produce a classified map with a legend.
- Quantify land cover change between two dates.
- Handle large rasters efficiently with windowed reads.
The Core Question You’re Answering
“How do I turn raw satellite pixels into meaningful land cover categories and track change over time?”
This is the foundation of remote sensing and environmental monitoring.
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Raster data | Pixels + bands are the raw input | Rasterio docs |
| Spectral indices | NDVI separates vegetation | Remote sensing intro |
| Supervised classification | Turn labeled pixels into models | ML basics |
| Change detection | Compare classified rasters | Remote sensing guides |
| CRS and transforms | Preserve georeferencing | GIS basics |
Key Concepts Deep Dive
- Raster Bands
- Sentinel-2 records 13 spectral bands, including blue, green, red, and near-infrared (NIR) at 10 m resolution.
- Each pixel is a vector, not a single value.
- NDVI Formula
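- NDVI = (NIR - Red) / (NIR + Red), which ranges from -1 to 1.
- High NDVI (close to 1) indicates dense, healthy vegetation; values near zero or below typically indicate bare ground, built-up surfaces, or water.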
- Classification Workflow
- Labeled pixels -> feature matrix -> model -> prediction.
- Change Detection
- Compare class counts across years.
- Generate percent change per class.
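The index formulas above can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: the helper names and the sample reflectance values are hypothetical, and because this guide does not define NDWI, the sketch assumes the common (Green - NIR) / (Green + NIR) form.

```python
import numpy as np


def normalized_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = a.astype("float32")
    b = b.astype("float32")
    denom = a + b
    out = np.full(a.shape, np.nan, dtype="float32")
    # Divide only where the denominator is non-zero (for example, skip empty pixels).
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    return normalized_difference(nir, red)


def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    # Assumed definition: (Green - NIR) / (Green + NIR)
    return normalized_difference(green, nir)


# Hypothetical 2x2 reflectance values
nir = np.array([[0.50, 0.30], [0.05, 0.0]])
red = np.array([[0.10, 0.30], [0.10, 0.0]])
green = np.array([[0.10, 0.20], [0.15, 0.0]])

print(ndvi(nir, red))
print(ndwi(green, nir))
```

Casting to `float32` before the arithmetic matters when the bands are stored as unsigned integers, where `nir - red` would otherwise wrap around for pixels with more red than NIR.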
Theoretical Foundation
Raster Classification Pipeline
Raw Bands -> Indices -> Feature Stack -> Classifier -> Land Cover Map -> Change Detection
Why Windowed Reads Matter
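A large raster can exceed available memory when read in a single call: one Sentinel-2 band at 10 m resolution is 10,980 x 10,980 pixels. Read, process, and write the raster one window (tile) at a time so that memory use stays bounded.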
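One way to structure tile-by-tile processing is sketched below. The function names (`iter_tiles`, `process_in_windows`) and the 512-pixel tile size are assumptions made for illustration; the Rasterio calls used are `rasterio.open`, `rasterio.windows.Window`, and the `window=` argument of `read` and `write`. Rasterio is imported inside the function so that the tiling helper can be tried without it.

```python
from typing import Callable, Iterator, Tuple

import numpy as np


def iter_tiles(height: int, width: int, tile: int = 512) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row_off, col_off, n_rows, n_cols) tiles that cover the raster."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (row_off, col_off,
                   min(tile, height - row_off), min(tile, width - col_off))


def process_in_windows(src_path: str, dst_path: str,
                       fn: Callable[[np.ndarray], np.ndarray], tile: int = 512) -> None:
    """Apply fn to each tile of src_path and write a single-band uint8 raster.

    fn receives a (bands, n_rows, n_cols) array and must return (n_rows, n_cols).
    """
    import rasterio  # imported here so iter_tiles works without Rasterio installed
    from rasterio.windows import Window

    with rasterio.open(src_path) as src:
        profile = src.profile  # carries the CRS and transform over to the output
        # Drop the source nodata value, which may not fit in uint8.
        profile.update(count=1, dtype="uint8", nodata=None)
        with rasterio.open(dst_path, "w", **profile) as dst:
            for row_off, col_off, n_rows, n_cols in iter_tiles(src.height, src.width, tile):
                window = Window(col_off, row_off, n_cols, n_rows)
                block = src.read(window=window)  # shape: (bands, n_rows, n_cols)
                dst.write(fn(block).astype("uint8"), 1, window=window)


# Edge tiles are clipped to the raster bounds.
print(list(iter_tiles(5, 7, tile=4)))
```

Only one tile of input is held in memory at a time, so peak memory depends on the tile size rather than on the raster size.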
Project Specification
What You Will Build
A pipeline that loads satellite imagery for two dates, computes spectral indices, trains a classifier, produces a land cover map for each date, and reports the change between them.
Functional Requirements
- Load multi-band imagery (two dates).
- Compute NDVI and NDWI.
- Build training samples and train a classifier.
- Generate classified raster outputs.
- Compute change statistics by class.
Non-Functional Requirements
- Memory safety: Windowed processing for large rasters.
- Reproducibility: Save model parameters and metadata.
- Clarity: Document class labels and assumptions.
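The reproducibility and clarity requirements could be met with a small JSON sidecar written next to the outputs. The sketch below is one possible layout: the file name, the field names, the class scheme, and the band order are all hypothetical and should be replaced with whatever your run actually used.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical class scheme; document your own labels and assumptions here.
CLASS_LABELS = {1: "water", 2: "vegetation", 3: "built-up", 4: "bare soil"}


def save_run_metadata(path, model_params, class_labels, band_order, random_seed):
    """Write a JSON sidecar so a classification run can be reproduced."""
    record = {
        "model_params": model_params,
        # JSON object keys must be strings, so class codes are stored as text.
        "class_labels": {str(k): v for k, v in class_labels.items()},
        "band_order": band_order,
        "random_seed": random_seed,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return record


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output" / "run_metadata.json"
    save_run_metadata(
        out,
        model_params={"n_estimators": 100},
        class_labels=CLASS_LABELS,
        band_order=["blue", "green", "red", "nir"],
        random_seed=42,
    )
    loaded = json.loads(out.read_text())

print(loaded["class_labels"])
```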
Example Usage / Output
$ python classify_landcover.py --region "sf_bay" --year 2019 --year2 2024
Saved: output/landcover_2019.tif
Saved: output/landcover_2024.tif
Change report: output/change_report.txt
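A command-line entry point matching the invocation above might be parsed as follows. The flags `--region`, `--year`, and `--year2` come from the example; the `--output-dir` option and the helper names are assumptions added for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="classify_landcover.py",
        description="Classify land cover for two years and report the change.",
    )
    parser.add_argument("--region", required=True, help="Region name, e.g. sf_bay")
    parser.add_argument("--year", type=int, required=True, help="First year")
    parser.add_argument("--year2", type=int, required=True, help="Second year")
    # Hypothetical option, not part of the original example.
    parser.add_argument("--output-dir", type=Path, default=Path("output"))
    return parser


def output_paths(args: argparse.Namespace) -> dict:
    """Build the output file paths shown in the example run."""
    return {
        "first": args.output_dir / f"landcover_{args.year}.tif",
        "second": args.output_dir / f"landcover_{args.year2}.tif",
        "report": args.output_dir / "change_report.txt",
    }


args = build_parser().parse_args(["--region", "sf_bay", "--year", "2019", "--year2", "2024"])
for name, path in output_paths(args).items():
    print(name, path)
```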
Real World Outcome
You open the output maps and see:
- A classified land cover map for each year.
- A legend that maps colors to classes.
- A change report showing percent increases/decreases.
Solution Architecture
High-Level Design
Imagery -> Indices -> Training -> Model -> Classification -> Change Report
Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Loader | Read bands | Rasterio windowed reads |
| Feature builder | Compute indices | NDVI + NDWI |
| Trainer | Build model | Random forest baseline |
| Classifier | Predict per pixel | Windowed inference |
| Reporter | Change stats | Percent by class |
Data Structures
numpy.ndarray # raster bands as (bands, rows, cols); classifier features as (pixels, features)
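Rasterio returns bands as a `(bands, rows, cols)` array, while scikit-learn expects one row per sample, so the pipeline needs a reshape in each direction. The sketch below shows one way to do that; the function names are hypothetical.

```python
import numpy as np


def build_feature_stack(bands: np.ndarray, indices: list) -> np.ndarray:
    """Stack raw bands (n_bands, rows, cols) with index layers (rows, cols)."""
    layers = [bands.astype("float32")]
    layers += [idx[np.newaxis].astype("float32") for idx in indices]
    return np.concatenate(layers, axis=0)


def to_feature_matrix(stack: np.ndarray) -> np.ndarray:
    """(n_features, rows, cols) -> (rows * cols, n_features), one row per pixel."""
    n_features = stack.shape[0]
    return stack.reshape(n_features, -1).T


def to_raster(values: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """(rows * cols,) per-pixel values -> (rows, cols)."""
    return values.reshape(rows, cols)


# Hypothetical 4-band, 2x3 raster plus one index layer
bands = np.arange(4 * 2 * 3, dtype="float32").reshape(4, 2, 3)
ndvi_layer = np.zeros((2, 3), dtype="float32")

stack = build_feature_stack(bands, [ndvi_layer])
X = to_feature_matrix(stack)
print(stack.shape, X.shape)
```

Row `k` of the feature matrix corresponds to pixel `(k // cols, k % cols)`, so predictions can be reshaped straight back into a map with `to_raster`.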
Algorithm Overview
- Load raster bands for two dates.
- Compute NDVI/NDWI and stack features.
- Train classifier using labeled pixels.
- Predict classes for each pixel.
- Compare class counts between dates.
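The final step, comparing class counts between dates, can be sketched as below. The function names, class codes, and toy maps are hypothetical, and the sketch assumes both classified rasters share the same grid (same CRS, transform, and shape).

```python
import numpy as np


def class_counts(classified: np.ndarray, class_ids) -> dict:
    """Count the pixels assigned to each class."""
    return {c: int(np.count_nonzero(classified == c)) for c in class_ids}


def percent_change(before: np.ndarray, after: np.ndarray, class_ids) -> dict:
    """Percent change in pixel count per class; None when the class was absent before."""
    counts_before = class_counts(before, class_ids)
    counts_after = class_counts(after, class_ids)
    report = {}
    for c in class_ids:
        b, a = counts_before[c], counts_after[c]
        report[c] = None if b == 0 else 100.0 * (a - b) / b
    return report


# Hypothetical classified maps for the two dates
map_2019 = np.array([[1, 1, 2, 2], [2, 2, 2, 3]])
map_2024 = np.array([[1, 1, 2, 3], [2, 2, 3, 3]])

print(percent_change(map_2019, map_2024, [1, 2, 3]))
# prints {1: 0.0, 2: -40.0, 3: 200.0}
```

Multiplying each count by the pixel area (100 square metres for a 10 m pixel) turns the same counts into areas for the change report.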
Complexity
- Feature computation: O(n) for n pixels (constant work per pixel).
- Classification: O(n * trees * depth) for random forest prediction, where depth is the average tree depth.
Implementation Guide
Development Environment Setup
python -m venv geo-env
source geo-env/bin/activate
pip install rasterio numpy scikit-learn
Project Structure
project-root/
├── data/
│ ├── imagery_2019.tif
│ └── imagery_2024.tif
├── src/
│ ├── features.py
│ ├── train.py
│ └── classify.py
├── output/
│ ├── landcover_2019.tif
│ ├── landcover_2024.tif
│ └── change_report.txt
└── README.md
The Core Question You’re Answering
“How do I convert raw spectral data into land cover classes and quantify change?”
Concepts You Must Understand First
- Raster metadata: CRS, transform, band order.
- Indices: NDVI, NDWI meanings.
- Model validation: Confusion matrix and accuracy.
Questions to Guide Your Design
- How will you label training data?
- Which classes are most important to detect?
- What is an acceptable accuracy threshold?
- How will you handle cloud cover?
Thinking Exercise
Compute NDVI manually for one pixel and interpret the result. Then classify it as vegetation or not.
Interview Questions
- What is a raster and how is it different from vector data?
- How does NDVI detect vegetation?
- What are common sources of classification error?
- How do you evaluate a land cover classifier?
- Why does CRS metadata matter?
- How would you scale this to national datasets?
Hints in Layers
- Hint 1: Start with a small raster subset.
- Hint 2: Use a baseline random forest model.
- Hint 3: Save CRS and transform when writing outputs.
- Hint 4: Add cloud masks if available.
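For Hint 4, one possible approach with Sentinel-2 Level-2A products is to use the scene classification (SCL) layer as a cloud mask. The sketch below assumes the SCL layer has already been resampled to the same grid as the classified map; the choice of which SCL codes to exclude and the `NODATA_CLASS` value are assumptions.

```python
import numpy as np

# Sentinel-2 L2A SCL codes treated as unusable in this sketch:
# 3 = cloud shadow, 8 = cloud (medium probability),
# 9 = cloud (high probability), 10 = thin cirrus
CLOUDY_SCL_CODES = (3, 8, 9, 10)
NODATA_CLASS = 0  # hypothetical "unclassified" value in the output map


def cloud_mask(scl: np.ndarray) -> np.ndarray:
    """Boolean mask that is True where a pixel is cloud or cloud shadow."""
    return np.isin(scl, CLOUDY_SCL_CODES)


def mask_classified(classified: np.ndarray, scl: np.ndarray) -> np.ndarray:
    """Return a copy of the map with cloudy pixels set to NODATA_CLASS."""
    out = classified.copy()
    out[cloud_mask(scl)] = NODATA_CLASS
    return out


# Hypothetical 2x2 example
scl = np.array([[4, 8], [3, 6]])
classified = np.array([[2, 2], [1, 1]])
print(mask_classified(classified, scl))
```

When computing change statistics, pixels masked on either date should be excluded from both maps so that cloud cover is not reported as land cover change.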
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Raster basics | Rasterio docs | Quickstart |
| Remote sensing | GIS programming | Remote sensing |
| ML classification | ML intro | Classification |
Implementation Phases
Phase 1: Data Prep (1 week)
- Load imagery and inspect bands.
- Compute indices.
Checkpoint: NDVI images are generated.
Phase 2: Training (1 week)
- Build labeled dataset.
- Train classifier and evaluate accuracy.
Checkpoint: Model meets the accuracy threshold you chose, measured on held-out labeled pixels.
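A baseline for this phase might look like the sketch below, which trains a scikit-learn random forest on a held-out split. The labeled pixels here are synthetic stand-ins generated from assumed reflectance ranges, and the class codes are hypothetical; real training data would come from your own labeled samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for labeled pixels; columns are [red, nir].
n = 200
veg = np.column_stack([rng.uniform(0.02, 0.10, n), rng.uniform(0.30, 0.60, n)])
water = np.column_stack([rng.uniform(0.02, 0.10, n), rng.uniform(0.00, 0.05, n)])
bands = np.vstack([veg, water])

# Add NDVI as a third feature (bands + indices).
ndvi = (bands[:, 1] - bands[:, 0]) / (bands[:, 1] + bands[:, 0])
X = np.column_stack([bands, ndvi])
y = np.array([2] * n + [1] * n)  # hypothetical codes: 1 = water, 2 = vegetation

# Hold out 30% of the labeled pixels for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("accuracy:", accuracy)
print(confusion_matrix(y_test, y_pred, labels=[1, 2]))
```

A random pixel split tends to overstate accuracy on real imagery because neighbouring pixels are highly correlated; holding out whole labeled polygons or regions usually gives a more honest estimate.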
Phase 3: Classification + Change (1-2 weeks)
- Classify both dates.
- Compute change statistics.
Checkpoint: Change report generated.
Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Data source | Sentinel-2, Landsat | Sentinel-2 | Higher spatial resolution (10 m vs. 30 m) |
| Model | Random forest, SVM | Random forest | Strong baseline |
| Features | bands only, bands + indices | bands + indices | Better separability |
Testing Strategy
| Category | Purpose | Examples |
|---|---|---|
| Raster integrity | Valid metadata | CRS present |
| Model accuracy | Confusion matrix | Per-class accuracy |
| Output validity | Class values | Expected classes only |
Critical cases:
- Cloud-covered pixels.
- Missing bands.
- Class imbalance.
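Two of the checks in the table, output validity and per-class accuracy, can be sketched with NumPy alone. The helper names and sample labels below are hypothetical; per-class accuracy is computed here as the fraction of reference pixels of each class that were predicted correctly.

```python
import numpy as np


def unexpected_classes(classified: np.ndarray, expected) -> list:
    """Return class values present in the map but not in the expected set."""
    return sorted(set(np.unique(classified).tolist()) - set(expected))


def per_class_accuracy(y_true: np.ndarray, y_pred: np.ndarray, class_ids) -> dict:
    """Fraction of reference pixels of each class predicted correctly."""
    result = {}
    for c in class_ids:
        is_c = y_true == c
        n = int(is_c.sum())
        result[c] = None if n == 0 else float((y_pred[is_c] == c).mean())
    return result


# Hypothetical reference labels and predictions; class 3 is rare.
y_true = np.array([1, 1, 1, 2, 2, 3])
y_pred = np.array([1, 1, 2, 2, 2, 1])

print(unexpected_classes(np.array([[1, 2], [7, 3]]), {1, 2, 3}))
print(per_class_accuracy(y_true, y_pred, [1, 2, 3]))
print("overall:", float((y_true == y_pred).mean()))
```

In this toy case overall accuracy is 4 out of 6 while the rare class 3 is never predicted correctly, which is why class imbalance is easier to spot in per-class numbers than in a single overall score.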
Common Pitfalls and Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong band order | Incorrect NDVI | Verify band indices (Rasterio bands are 1-indexed) |
| Memory errors | Crash | Windowed reads |
| Poor labels | Low accuracy | Improve training samples |
Extensions and Challenges
- Add cloud masking.
- Use a deep learning segmentation model.
- Build a batch pipeline for multiple regions.
Real-World Connections
- Agriculture monitoring.
- Urban expansion tracking.
- Climate research.
Resources
- Rasterio: https://rasterio.readthedocs.io/
- Sentinel Hub: https://www.sentinel-hub.com/
- Scikit-learn: https://scikit-learn.org/
Self-Assessment Checklist
- I can compute NDVI and explain it.
- I can train and evaluate a classifier.
- I can quantify change over time.
Submission / Completion Criteria
Minimum Viable Completion
- NDVI computed and baseline classification produced.
Full Completion
- Two time periods classified with change report.
Excellence
- Cloud masking and advanced model.
This guide was generated from GEOSPATIAL_PYTHON_LEARNING_PROJECTS.md. For the complete learning path, see the parent directory GEOSPATIAL_PYTHON_LEARNING_PROJECTS/README.md.