Project 5: Wardriving WiFi Mapper

Build a mobile WiFi mapping tool that fuses scan results with GPS data and writes clean CSV logs suitable for mapping tools and Wigle imports.

Quick Reference

Attribute | Value
Difficulty | Advanced
Time Estimate | 2–3 weeks
Main Programming Language | C/C++ (ESP-IDF)
Alternative Programming Languages | Arduino
Coolness Level | High
Business Potential | Medium (site surveys, wireless planning)
Prerequisites | WiFi scanning basics, UART/GPS parsing, SD storage
Key Topics | channel hopping, NMEA parsing, data fusion, durable logging

1. Learning Objectives

By completing this project, you will:

  1. Perform continuous WiFi scanning with channel hopping.
  2. Parse GPS NMEA sentences and extract accurate timestamps.
  3. Fuse asynchronous WiFi and GPS streams into a coherent dataset.
  4. Build a CSV logging pipeline with durability and schema validation.
  5. Design a UI that displays live fix status and scan rate.

2. All Theory Needed (Per-Concept Breakdown)

2.1 WiFi Scanning, Channel Hopping, and RSSI Interpretation

Fundamentals

WiFi scanning is a periodic snapshot of visible networks on a channel. Because WiFi is distributed across channels, you must hop channels to cover the spectrum. Each hop is a tradeoff: spend too little time and you miss beacons; spend too long and your coverage is slow. RSSI is a noisy measure of signal strength; it fluctuates due to multipath, movement, and interference. A mapper should treat RSSI as a statistical measure rather than a precise distance estimate.

Deep Dive into the concept

The ESP32 WiFi driver provides scan APIs that sweep channels and return a list of APs with SSID, BSSID, channel, and RSSI. Active scanning sends probe requests and waits for responses, which usually finds APs faster than waiting for beacons but increases power use and makes the scanner visible on the air. Passive scanning only listens for beacons; it is slower but quieter. For wardriving, active scanning is often acceptable but must be balanced against battery life. The scan interval determines how frequently you update the network list. If you scan too often, you starve GPS parsing and SD logging; if you scan too slowly, you lose spatial resolution.

Channel hopping can be implemented either by the WiFi scan API (which hops internally) or by setting the channel manually and listening for a short dwell. Manual hopping gives more control and can be synchronized with GPS timestamps, but it requires careful timing and callback management. A stable system uses a fixed hop schedule, e.g., 200 ms per channel (about two beacon periods at the common interval of roughly 100 ms), and logs the channel for each captured network. RSSI values are typically reported in dBm and are negative; higher values (closer to 0) indicate stronger signals. Because RSSI fluctuates, you should smooth it or log multiple observations per AP and compute a moving average.
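
The smoothing step can be sketched with an exponential moving average, which is a cheaper stand-in for a windowed moving average because it needs no sample history. This is a minimal sketch, not ESP-IDF code: the type rssi_ema_t, the function rssi_ema_update, and the smoothing factor of 1/4 are all assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-AP smoothing state; not part of ESP-IDF. */
typedef struct {
    bool    primed;    /* false until the first sample arrives */
    int32_t avg_x16;   /* smoothed RSSI in 1/16 dBm fixed point */
} rssi_ema_t;

/* Feed one raw RSSI sample (dBm) and return the smoothed value (dBm).
 * Uses alpha = 1/4: new = old + (sample - old) / 4. */
static int rssi_ema_update(rssi_ema_t *s, int rssi_dbm)
{
    int32_t sample_x16 = (int32_t)rssi_dbm * 16;

    if (!s->primed) {
        s->avg_x16 = sample_x16;   /* seed with the first observation */
        s->primed = true;
    } else {
        s->avg_x16 += (sample_x16 - s->avg_x16) / 4;
    }
    return (int)(s->avg_x16 / 16);
}
```

Keep one rssi_ema_t per BSSID, zero-initialized, and call rssi_ema_update each time that AP appears in a scan. A smaller alpha smooths more but reacts more slowly while the device is moving.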

RSSI alone does not give accurate distance; it depends on environment, antenna orientation, and hardware variation. Therefore, a mapper should not over-interpret RSSI. Instead, you can use it to highlight strong vs weak signals and to filter duplicates. For example, you might log an AP only if the RSSI changes by more than 5 dB since last observation, reducing file size while preserving useful data.
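
The log-on-change rule above might look like the following. It is a sketch under assumptions: ap_log_state_t and ap_should_log are hypothetical names, and the 5 dB threshold is simply the example value from the text.

```c
#include <stdbool.h>
#include <stdlib.h>

#define RSSI_LOG_DELTA_DB 5   /* example threshold from the text; tune as needed */

/* Hypothetical per-AP record of what was last written to the log. */
typedef struct {
    bool logged_once;
    int  last_logged_rssi;   /* dBm */
} ap_log_state_t;

/* Returns true when this observation should be written. The first sighting
 * is always logged; later ones only when RSSI moved by more than the
 * threshold relative to the last logged value. */
static bool ap_should_log(ap_log_state_t *st, int rssi_dbm)
{
    if (!st->logged_once ||
        abs(rssi_dbm - st->last_logged_rssi) > RSSI_LOG_DELTA_DB) {
        st->logged_once = true;
        st->last_logged_rssi = rssi_dbm;
        return true;
    }
    return false;
}
```

Comparing against the last logged value, rather than the last observed one, keeps a slow drift from slipping through unlogged.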

How this fits in projects

This scanning knowledge builds directly on P01-wifi-packet-sniffer-network-analyzer.md and feeds into P08-complete-cardputer-security-toolkit.md.

Definitions & key terms

  • Active scan → sends probe requests to solicit responses
  • Passive scan → listens for beacons only
  • Channel hop → switching channels on a schedule
  • RSSI → received signal strength indicator (dBm)

Mental model diagram (ASCII)

[Channel 1] -> [Scan] -> [Results] -> [Log]
[Channel 6] -> [Scan] -> [Results] -> [Log]
[Channel 11]-> [Scan] -> [Results] -> [Log]

How it works (step-by-step, with invariants and failure modes)

  1. Start scan cycle on channel set.
  2. Collect AP list with RSSI and metadata.
  3. Tag results with channel and time.
  4. Filter duplicates and log significant changes.

Invariants: scan cadence stable; channels tracked; RSSI recorded consistently. Failure modes: scanning too often starves GPS parsing; hopping too fast misses APs.
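
A stable cadence is easiest to reason about when the hop schedule is pure arithmetic on elapsed time. The sketch below assumes a round-robin over 2.4 GHz channels 1 to 13 with the 200 ms example dwell; the channel count is a regional assumption, and hop_channel_at and hop_sweep_ms are hypothetical helper names.

```c
#include <stdint.h>

#define HOP_DWELL_MS     200u   /* example dwell from the text */
#define HOP_NUM_CHANNELS 13u    /* assumption: channels 1-13; use 11 where required */

/* Returns the 1-based channel a fixed round-robin schedule is on at
 * elapsed_ms after the schedule started. */
static uint8_t hop_channel_at(uint32_t elapsed_ms)
{
    return (uint8_t)(((elapsed_ms / HOP_DWELL_MS) % HOP_NUM_CHANNELS) + 1u);
}

/* Duration of one full sweep; this bounds how often a channel is revisited. */
static uint32_t hop_sweep_ms(void)
{
    return HOP_DWELL_MS * HOP_NUM_CHANNELS;
}
```

With these values a full sweep takes 2.6 seconds, so a moving device revisits each channel roughly every 2.6 seconds. On ESP-IDF, a manual hopper would pass the returned channel to esp_wifi_set_channel; whether that fits depends on the WiFi mode in use.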

Minimal concrete example

// Blocking scan (second argument true): returns once the sweep described by scan_cfg finishes.
esp_wifi_scan_start(&scan_cfg, true);

Common misconceptions

  • “RSSI equals distance.” → It is noisy and environment-dependent.
  • “Faster hopping is always better.” → Too fast misses beacons.
  • “Active scanning is harmless.” → It increases power and visibility.

Check-your-understanding questions

  1. Why does RSSI fluctuate even when you stand still?
  2. What is the tradeoff between active and passive scanning?
  3. How does hop dwell time affect accuracy?

Check-your-understanding answers

  1. Multipath fading and interference cause rapid changes.
  2. Active finds more networks but costs power and emits probes.
  3. Too short dwell misses beacons; too long reduces coverage rate.

Real-world applications

  • Wireless site surveys for businesses.
  • Field diagnostics for network operators.

Where you’ll apply it

  • This project: see §4.1 and §5.10 for scan scheduling.
  • Also used in: P01-wifi-packet-sniffer-network-analyzer.md, P08-complete-cardputer-security-toolkit.md.

References

  • ESP-IDF WiFi scan documentation.
  • Computer Networks – wireless chapters.

Key insight

A mapper measures trends, not absolute truth; treat RSSI as a noisy signal.

Summary

Scanning and hopping are about timing tradeoffs. Stable cadence and sensible filtering produce useful maps.

Homework/Exercises to practice the concept

  1. Log RSSI for a single AP every second for 2 minutes.
  2. Compare active vs passive scan results.

Solutions to the homework/exercises

  1. Plot RSSI and observe fluctuations of 3–10 dB.
  2. Active scans typically find more SSIDs but consume more power.

2.2 GPS NMEA Parsing and Time Synchronization

Fundamentals

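GPS modules output NMEA sentences over UART. These sentences contain time, latitude, longitude, altitude, and fix status. A mapper must parse at least $GPRMC (time, date, position) and $GPGGA (fix quality, satellites, altitude) to get timestamp and position. Multi-constellation receivers may emit the same sentence types under a different talker ID (for example $GNRMC), so match on the sentence type rather than the full prefix. The challenge is to parse efficiently without blocking and to handle cases where no fix is available.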

Deep Dive into the concept

NMEA sentences are ASCII lines starting with $ and ending with \r\n, with fields separated by commas. $GPRMC contains time, date, validity, latitude, longitude, speed, and course. $GPGGA includes fix quality, satellites, altitude, and HDOP. Parsing should be incremental: read bytes from UART into a ring buffer, detect full lines, then parse fields. This avoids blocking and allows you to handle partial lines or noise.
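
Incremental line detection can be sketched as a small state machine fed one byte at a time. This version uses a single line buffer rather than a full ring buffer, which is enough when each completed sentence is parsed before more bytes are fed. nmea_line_t and nmea_feed are hypothetical names, and the buffer size is an assumption based on the 82-character NMEA 0183 sentence limit.

```c
#include <stdbool.h>
#include <stddef.h>

#define NMEA_MAX_LINE 96   /* NMEA 0183 sentences are at most 82 chars; leave headroom */

/* Hypothetical line-assembly state; feed it bytes as they arrive from UART. */
typedef struct {
    char   buf[NMEA_MAX_LINE];
    size_t len;
    bool   in_sentence;   /* true after '$' until end of line */
} nmea_line_t;

/* Feed one byte. Returns true when buf holds a complete NUL-terminated
 * sentence (from '$' up to, but not including, CR/LF). */
static bool nmea_feed(nmea_line_t *p, char c)
{
    if (c == '$') {                  /* start (or restart) of a sentence */
        p->len = 0;
        p->in_sentence = true;
    }
    if (!p->in_sentence) {
        return false;                /* noise between sentences */
    }
    if (c == '\r' || c == '\n') {
        p->buf[p->len] = '\0';
        p->in_sentence = false;
        return p->len > 1;
    }
    if (p->len >= NMEA_MAX_LINE - 1) {
        p->in_sentence = false;      /* overlong line: drop it, resync on next '$' */
        return false;
    }
    p->buf[p->len++] = c;
    return false;
}
```

Because a '$' always restarts the buffer, a sentence truncated by noise is discarded automatically when the next one begins.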

Time synchronization is critical. WiFi scans produce their own timestamps (usually from the system clock), while GPS provides absolute UTC time. You should treat GPS time as the authority once a fix is valid. When GPS is not locked, you can still log WiFi scans with a “no fix” flag and later discard or interpolate them. To merge streams, you can store the latest GPS fix and associate it with each scan event. Because scans and GPS updates occur at different rates, you should record the time of each scan and the time of the latest GPS fix. If the fix is stale (older than a threshold), mark the record accordingly.

Parsing must be robust: fields can be empty, and a failed checksum indicates a corrupted sentence. Implement checksum validation and discard invalid lines. Convert the NMEA coordinate format (ddmm.mmmm for latitude, dddmm.mmmm for longitude) into decimal degrees, and apply the N/S/E/W hemisphere as a sign. Mistakes here will flip hemispheres or produce invalid coordinates. Parse the altitude field (meters) as a float and store it. For mapping, you can also store HDOP or accuracy estimates.
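
The coordinate conversion is short but easy to get wrong, so a sketch may help. The function name nmea_to_decimal_degrees and its out-parameter style are assumptions for illustration; the arithmetic is the standard degrees-plus-minutes/60 rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Convert an NMEA coordinate field (ddmm.mmmm or dddmm.mmmm) plus its
 * hemisphere letter into signed decimal degrees. Returns false, leaving
 * *out untouched, if the field is empty or the hemisphere is not N/S/E/W. */
static bool nmea_to_decimal_degrees(const char *field, char hemi, double *out)
{
    if (field == NULL || field[0] == '\0') {
        return false;                           /* empty field: no fix yet */
    }
    double raw = strtod(field, NULL);           /* e.g. 4807.038 */
    int degrees = (int)(raw / 100.0);           /* 48 */
    double minutes = raw - degrees * 100.0;     /* 7.038 */
    double value = degrees + minutes / 60.0;

    switch (hemi) {
    case 'N': case 'E': *out = value;  return true;
    case 'S': case 'W': *out = -value; return true;
    default:            return false;
    }
}
```

For example, the field 4807.038 with hemisphere N is 48 degrees plus 7.038 minutes, which is about 48.1173 decimal degrees.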

How this fits in projects

Parsing discipline and asynchronous data handling show up in P02-universal-ir-remote-with-learning-capability.md (timing capture) and in the capstone P08-complete-cardputer-security-toolkit.md where GPS may serve multiple tools.

Definitions & key terms

  • NMEA sentence → ASCII line with GPS fields
  • Fix → valid position solution (2D or 3D)
  • HDOP → horizontal dilution of precision (unitless satellite-geometry factor; lower means better accuracy)
  • UTC → Coordinated Universal Time

Mental model diagram (ASCII)

[UART Bytes] -> [Line Parser] -> [NMEA Fields] -> [GPS Fix State]

How it works (step-by-step, with invariants and failure modes)

  1. Read UART bytes into a buffer.
  2. Detect end-of-line and validate checksum.
  3. Parse fields into a GPS fix struct.
  4. Update latest fix with timestamp and quality.
  5. Mark validity and age.

Invariants: checksum must pass; fix quality >= 1. Failure modes: empty fields, malformed checksum, stale fixes.
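
Empty fields deserve particular care because strtok treats consecutive commas as a single delimiter, which silently shifts every later column. Below is a sketch of an in-place splitter that preserves empty fields; nmea_split is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Split a sentence in place on commas, keeping empty fields. A trailing
 * "*hh" checksum is cut off first, so validate the checksum before calling.
 * Returns the number of fields stored in fields[]. */
static size_t nmea_split(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;

    if (max_fields == 0) {
        return 0;
    }
    fields[n++] = line;
    for (char *p = line; *p != '\0'; p++) {
        if (*p == '*') {             /* checksum starts here: end of data */
            *p = '\0';
            break;
        }
        if (*p == ',') {
            *p = '\0';               /* terminate the previous field */
            if (n == max_fields) {
                break;               /* caller's array is full */
            }
            fields[n++] = p + 1;
        }
    }
    return n;
}
```

After splitting, an empty field is a pointer to an empty string, so a check such as fields[i][0] == '\0' distinguishes "no data" from a real value.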

Minimal concrete example

if (nmea_checksum_ok(line)) parse_gprmc(line, &fix);
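
One possible body for the nmea_checksum_ok helper used above is sketched here. The NMEA checksum is the XOR of every character between '$' and '*', written as two hex digits after the '*'. The signature is an assumption, since the original only shows the call site.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if line looks like "$...*hh" and hh matches the XOR of the
 * characters between '$' and '*'. Anything after the two hex digits is
 * ignored. */
static bool nmea_checksum_ok(const char *line)
{
    if (line == NULL || line[0] != '$') {
        return false;
    }
    const char *star = strchr(line, '*');
    if (star == NULL ||
        !isxdigit((unsigned char)star[1]) ||
        !isxdigit((unsigned char)star[2])) {
        return false;                /* missing or truncated checksum */
    }

    uint8_t sum = 0;
    for (const char *p = line + 1; p < star; p++) {
        sum ^= (uint8_t)*p;
    }

    char hex[3] = { star[1], star[2], '\0' };
    return sum == (uint8_t)strtoul(hex, NULL, 16);
}
```

A tiny hand-checkable case: in "$GP*17", 'G' is 0x47 and 'P' is 0x50, and their XOR is 0x17, so the sentence passes.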

Common misconceptions

  • “GPS always has a fix.” → It can take minutes or fail indoors.
  • “Parsing once is enough.” → You must update continuously.
  • “UTC time equals local time.” → You must convert if needed.

Check-your-understanding questions

  1. What does HDOP represent?
  2. Why must you validate checksums?
  3. How do you convert NMEA lat/long format?

Check-your-understanding answers

  1. A unitless factor describing how satellite geometry degrades horizontal accuracy (lower is better).
  2. To avoid corrupt data from noisy serial links.
  3. Convert degrees and minutes into decimal degrees and apply N/S/E/W sign.

Real-world applications

  • GPS logging for fitness and transportation.
  • Asset tracking and fleet management.

Where you’ll apply it

  • This project: see §4.2 and §5.10 Phase 2.
  • Also used in: P08-complete-cardputer-security-toolkit.md (global location service).

References

  • NMEA 0183 sentence reference.
  • GPS module datasheets (u-blox, etc.).

Key insight

GPS parsing is a streaming problem; treat sentences as a continuous stream, not a one-off read.

Summary

Robust NMEA parsing with checksum validation and fix age tracking is the backbone of reliable mapping.

Homework/Exercises to practice the concept

  1. Parse a recorded NMEA log and output CSV lat/lon/time.
  2. Simulate loss of fix and verify “no fix” flagging.

Solutions to the homework/exercises

  1. Split on commas (keeping empty fields) and convert ddmm.mmmm to decimal degrees.
  2. Set fix_valid=false when no recent valid sentence is received.

2.3 Data Fusion and Durable CSV Logging

Fundamentals

Wardriving produces two asynchronous data streams: WiFi scans and GPS fixes. Data fusion means pairing each scan with the closest GPS fix and logging them into a consistent schema. Logging must be durable: writing thousands of rows to SD requires batching and careful formatting to avoid corruption and fragmentation.

Deep Dive into the concept

Data fusion is essentially a join operation on time. The WiFi scan gives you a list of APs at time T. The GPS provides fixes at times T1, T2, … A simple fusion strategy uses the most recent fix at the time of the scan, as long as it is not stale (e.g., no older than 5 seconds). If it is stale, you still log the scan but leave the coordinate fields empty and set fix_valid=false. This preserves data integrity and allows later filtering.
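
The staleness check at the heart of this join can be sketched as follows. It assumes both timestamps come from the same local millisecond tick counter; fix_snapshot_t and fix_usable_for_scan are hypothetical names, and the 5-second limit is the example value from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIX_MAX_AGE_MS 5000u   /* 5 s example threshold from the text */

/* Trimmed-down fix record for this sketch; field names are illustrative. */
typedef struct {
    bool     valid;     /* receiver reported a fix */
    double   lat, lon;  /* signed decimal degrees */
    uint32_t time_ms;   /* local tick time when the fix was parsed */
} fix_snapshot_t;

/* Returns true if the fix may be attached to a scan taken at scan_ms.
 * Unsigned subtraction stays correct across a 32-bit tick wraparound.
 * A fix newer than the scan yields a huge age and is rejected. */
static bool fix_usable_for_scan(const fix_snapshot_t *fix, uint32_t scan_ms)
{
    if (!fix->valid) {
        return false;
    }
    uint32_t age_ms = scan_ms - fix->time_ms;
    return age_ms <= FIX_MAX_AGE_MS;
}
```

When the function returns false, the row is still emitted with empty coordinates and fix_valid=false, as described above.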

CSV schema design matters because tools like Wigle expect specific columns. A robust schema includes timestamp, SSID, BSSID, channel, RSSI, auth type, latitude, longitude, altitude, and accuracy. Use consistent ordering and include a header row. CSV should be strictly comma-separated, with proper quoting for SSIDs that include commas or quotes. If you fail to escape SSIDs, the CSV becomes invalid. For performance, avoid formatting and writing each line separately; append rows to a preallocated buffer instead. Batch writes in chunks (e.g., 4–8 KB) to reduce SD wear and latency.
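
SSID quoting can be sketched with the common RFC 4180 convention: wrap the field in double quotes when it contains a comma, quote, or line break, and double any embedded quotes. csv_escape is a hypothetical helper. It assumes the SSID is a NUL-terminated string, which is a simplification because real SSIDs may contain arbitrary bytes.

```c
#include <stddef.h>
#include <string.h>

/* Write src into dst as one CSV field. Returns the number of characters
 * written (excluding the NUL), or -1 if dst is too small. */
static int csv_escape(char *dst, size_t dst_size, const char *src)
{
    int needs_quotes = strpbrk(src, ",\"\r\n") != NULL;
    size_t n = 0;

    if (dst_size == 0) {
        return -1;
    }
    if (needs_quotes) {
        if (n + 1 >= dst_size) return -1;
        dst[n++] = '"';                          /* opening quote */
    }
    for (const char *p = src; *p != '\0'; p++) {
        size_t need = (*p == '"') ? 2 : 1;
        if (n + need >= dst_size) return -1;     /* keep room for the NUL */
        if (*p == '"') dst[n++] = '"';           /* double embedded quotes */
        dst[n++] = *p;
    }
    if (needs_quotes) {
        if (n + 1 >= dst_size) return -1;
        dst[n++] = '"';                          /* closing quote */
    }
    dst[n] = '\0';
    return (int)n;
}
```

An empty SSID, as reported for hidden networks, comes out as an empty field with return value 0, which is why failure is signalled with -1 rather than 0.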

Durability requires careful flush strategy. Write header once, then append lines. Flush periodically (e.g., every 1 second or every N rows). If power is lost, you may lose the last buffer but preserve most data. Optionally write a companion index file that records last flush time and row count. This can help detect incomplete logs and allow safe recovery.
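
A batching buffer with an explicit flush might look like the sketch below. It uses standard C file I/O so it stays portable; csv_buf_t, csv_buf_append, and csv_buf_flush are hypothetical names, and the 4 KB size is the low end of the range mentioned earlier.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CSV_BUF_SIZE 4096   /* low end of the 4-8 KB batch range */

typedef struct {
    char   data[CSV_BUF_SIZE];
    size_t len;
} csv_buf_t;

/* Write buffered bytes to the file and flush the stdio layer. On failure
 * the buffer is kept so the caller can retry or report the error. */
static bool csv_buf_flush(csv_buf_t *b, FILE *f)
{
    if (b->len == 0) {
        return true;
    }
    size_t written = fwrite(b->data, 1, b->len, f);
    bool ok = (written == b->len) && (fflush(f) == 0);
    if (ok) {
        b->len = 0;
    }
    return ok;
}

/* Append one finished row; flushes first if the row would not fit. */
static bool csv_buf_append(csv_buf_t *b, FILE *f, const char *row, size_t row_len)
{
    if (row_len > CSV_BUF_SIZE) {
        return false;                /* row can never fit */
    }
    if (b->len + row_len > CSV_BUF_SIZE && !csv_buf_flush(b, f)) {
        return false;
    }
    memcpy(b->data + b->len, row, row_len);
    b->len += row_len;
    return true;
}
```

Call csv_buf_flush from a periodic timer as well, so a quiet period does not leave rows sitting in RAM. Note that fflush only empties the C library's buffer; depending on the filesystem layer, an additional sync call may be needed before the data is safe on the card.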

Deduplication can reduce file size: if the same BSSID is seen repeatedly with minor RSSI changes, log only when RSSI changes beyond a threshold or after a time interval. However, deduplication must be explicit and configurable because some mapping tools prefer raw observations. Provide a “full log” and “filtered log” mode.

How this fits in projects

The logging and durability patterns mirror the PCAP writer in P01-wifi-packet-sniffer-network-analyzer.md and the storage discipline required in P08-complete-cardputer-security-toolkit.md.

Definitions & key terms

  • Data fusion → combining multiple data streams by time or identity
  • Schema → fixed column layout for CSV
  • Stale fix → GPS fix older than threshold
  • Escaping → quoting CSV values with commas/quotes

Mental model diagram (ASCII)

[WiFi Scan @ T] + [Latest GPS Fix] -> [Row Builder] -> [CSV Buffer] -> [SD]

How it works (step-by-step, with invariants and failure modes)

  1. Receive scan results with timestamp.
  2. Read latest GPS fix and check age.
  3. Build CSV rows for each AP.
  4. Escape SSID as needed and append to buffer.
  5. Flush buffer on threshold or timer.

Invariants: header written once; CSV columns fixed; GPS fix age tracked. Failure modes: unescaped SSIDs break CSV; SD delays cause drops; stale GPS leads to wrong coordinates.

Minimal concrete example

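// ssid_csv must already be CSV-escaped; columns follow the §3.5 header order.
snprintf(line, sizeof(line), "%s,%s,%s,%d,%d,%s,%.6f,%.6f,%.1f,%.1f,%s\n",
    ts, ssid_csv, bssid, ch, rssi, auth, lat, lon, alt_m, accuracy_m,
    fix_valid ? "true" : "false");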

Common misconceptions

  • “CSV is trivial.” → Escaping and consistency are critical.
  • “GPS fix is always valid.” → It can be stale or invalid.
  • “Logging every scan is always best.” → It creates huge files and wear.

Check-your-understanding questions

  1. Why should you track GPS fix age?
  2. What happens if SSID contains a comma?
  3. Why batch writes instead of writing each line?

Check-your-understanding answers

  1. Because stale fixes produce incorrect coordinates.
  2. It breaks CSV column alignment unless quoted.
  3. To reduce SD overhead and latency.

Real-world applications

  • Wireless coverage mapping.
  • Urban connectivity analysis.

Where you’ll apply it

  • This project: see §3.5 data format and §5.10 Phase 3.
  • Also used in: P01-wifi-packet-sniffer-network-analyzer.md, P08-complete-cardputer-security-toolkit.md.

References

  • Wigle CSV format references.
  • Making Embedded Systems – logging and storage chapters.

Key insight

Good maps come from clean data; clean data comes from careful fusion and logging.

Summary

Fuse scans with the latest valid GPS fix, log consistently, and batch writes for durability.

Homework/Exercises to practice the concept

  1. Design a CSV schema and validate it in a spreadsheet.
  2. Implement a 5-second fix-age timeout.

Solutions to the homework/exercises

  1. Ensure header matches each row and SSIDs are quoted.
  2. Store fix_time and compare to scan timestamp.

3. Project Specification

3.1 What You Will Build

A WiFi mapper that:

  • scans networks across channels,
  • reads GPS fixes via UART,
  • logs CSV rows with WiFi + GPS data,
  • displays live fix status and scan counts.

3.2 Functional Requirements

  1. WiFi scanning: continuous channel hops with configurable dwell time.
  2. GPS parsing: parse NMEA and track fix validity.
  3. Data fusion: associate scans with most recent fix.
  4. CSV logging: strict schema, header, and escaping.
  5. UI: show fix status, satellites, and last SSID.

3.3 Non-Functional Requirements

  • Performance: handle 30+ scans/min with no UI lag.
  • Reliability: logs remain valid after power cycles.
  • Usability: clear indication of fix validity.

3.4 Example Usage / Output

1) Connect GPS module to Grove UART.
2) Start mapping; device shows fix status.
3) After 10 minutes, remove SD and import CSV into a mapping tool.

3.5 Data Formats / Schemas / Protocols

CSV header:

timestamp,ssid,bssid,channel,rssi,auth,lat,lon,alt_m,accuracy_m,fix_valid

3.6 Edge Cases

  • No GPS fix (log with fix_valid=false).
  • SD full (stop logging, show error).
  • SSIDs with commas or quotes.

3.7 Real World Outcome

A successful build produces a CSV file that imports cleanly into mapping tools, with valid coordinates and consistent columns. The UI shows GPS fix and scan rate in real time.

3.7.1 How to Run (Copy/Paste)

idf.py set-target esp32s3
idf.py build
idf.py -p /dev/ttyUSB0 flash monitor

3.7.2 Golden Path Demo (Deterministic)

  • Use a stationary test with a GPS fix and two known APs.
  • Perform 5 scans and verify 10 rows logged.

Failure demo (deterministic):

  • Disconnect GPS module and start mapping for 30 seconds. Expected: UI shows “NO FIX,” CSV rows have fix_valid=false, and a warning is logged. Exit code: 2 when stopping (no-fix warning).

3.7.3 If CLI: exact terminal transcript

I (5200) gps: fix=3D sats=8 lat=37.7749 lon=-122.4194
I (5201) wifi: scan=5 ssids=12
I (5202) log: wrote 12 rows

Exit codes: 0 = success, 2 = no GPS fix, 3 = SD write error.

3.7.4 If Web App

Not applicable.

3.7.5 If API

Not applicable.

3.7.6 If Library

Not applicable.

3.7.7 If GUI / Desktop / Mobile

Not applicable.

3.7.8 If TUI

+----------------------------+
| Wardrive Mapper            |
| Fix: 3D (8 sats)           |
| Networks: 247              |
| Last: CoffeeShop (-42)     |
+----------------------------+

4. Solution Architecture

4.1 High-Level Design

[WiFi Scan] -> [Results] -> [Fusion] -> [CSV Buffer] -> [SD Writer]
                               ^
                               |
                           [GPS Fix]

4.2 Key Components

Component | Responsibility | Key Decisions
WiFi scanner | Scan/hop channels | active vs passive
GPS parser | Parse NMEA | checksum validation
Fusion engine | Combine data | fix age threshold
Logger | CSV buffering | batch size
UI | Fix status + stats | update cadence

4.3 Data Structures (No Full Code)

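#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool fix_valid;      // true only while the receiver reports a fix
    double lat, lon;     // signed decimal degrees (S and W negative)
    float hdop;          // unitless dilution of precision
    uint32_t fix_time;   // time this fix was parsed, on the same clock as scan timestamps
} gps_fix_t;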

4.4 Algorithm Overview

Key Algorithm: Scan-Fix Join

  1. Capture scan timestamp.
  2. Retrieve latest GPS fix.
  3. If fix age < threshold, use fix; else mark invalid.
  4. Emit CSV rows.

Complexity Analysis:

  • Time: O(n) per scan (n = AP count)
  • Space: O(1) extra per scan

5. Implementation Guide

5.1 Development Environment Setup

idf.py set-target esp32s3
idf.py build

5.2 Project Structure

project-root/
├── main/
│   ├── wifi_scan.c
│   ├── gps.c
│   ├── fusion.c
│   ├── csv_log.c
│   └── ui.c
└── README.md

5.3 The Core Question You’re Answering

“How do I merge WiFi and GPS streams into a clean, mappable dataset?”

5.4 Concepts You Must Understand First

  1. WiFi scanning and RSSI behavior.
  2. NMEA parsing and fix validity.
  3. CSV schema discipline.

5.5 Questions to Guide Your Design

  1. What fix age threshold is acceptable?
  2. How will you handle SSIDs with commas?
  3. Will you deduplicate scans or log all?

5.6 Thinking Exercise

Design a CSV schema that can be imported into Wigle. Which fields are mandatory?

5.7 The Interview Questions They Will Ask

  1. Why is RSSI unstable?
  2. How do you handle stale GPS data?
  3. How do you prevent SD wear in long logs?

5.8 Hints in Layers

Hint 1: Display scan results on screen first.

Hint 2: Parse GPS and show fix status.

Hint 3: Log CSV with a fixed header.

5.9 Books That Will Help

Topic | Book | Chapter
WiFi basics | Computer Networks | Ch. 6
Logging | Making Embedded Systems | Ch. 8

5.10 Implementation Phases

Phase 1: WiFi scanning (4–5 days)

Phase 2: GPS parsing + fusion (5–7 days)

Phase 3: Logging + UI (5–7 days)

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Scan mode | active, passive | active | more SSIDs, faster results
Fix age | 2s, 5s, 10s | 5s | balances accuracy and availability
Logging | full, dedup | configurable | user control

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | NMEA parsing | checksum validation
Integration Tests | scan+GPS+log | valid CSV output
Edge Tests | no fix | fix_valid=false rows

6.2 Critical Test Cases

  1. GPS no fix for 60s → rows flagged invalid.
  2. SSID with comma is quoted properly.
  3. SD removed → logging disabled, UI warning.

6.3 Test Data

Sample NMEA sentences
Mock scan results with SSID commas

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
No checksum validation | random coordinates | validate NMEA checksum
No quoting | CSV import fails | escape SSIDs
Too frequent scans | UI lag | increase scan interval

7.2 Debugging Strategies

  • Log raw NMEA lines when fix fails.
  • Validate CSV in a spreadsheet after each run.

7.3 Performance Traps

  • Writing each row individually to SD.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a “pause logging” button.

8.2 Intermediate Extensions

  • Add JSON export option.

8.3 Advanced Extensions

  • Add heatmap aggregation on device.

9. Real-World Connections

9.1 Industry Applications

  • Wireless planning and site surveys.
  • Municipal connectivity mapping.
  • Wigle data collectors.

9.3 Interview Relevance

  • Data fusion, asynchronous streams, and logging reliability.

10. Resources

10.1 Essential Reading

  • NMEA 0183 reference.

10.2 Video Resources

  • GPS parsing tutorials.

10.3 Tools & Documentation

  • ESP-IDF WiFi scan API.
  • P01-wifi-packet-sniffer-network-analyzer.md
  • P08-complete-cardputer-security-toolkit.md

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain active vs passive scanning.
  • I can parse NMEA sentences.

11.2 Implementation

  • CSV logs are consistent and importable.
  • GPS fix status is accurate.

11.3 Growth

  • I can explain data fusion strategies.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Scans WiFi and logs CSV rows with GPS fix or no-fix flag.

Full Completion:

  • Robust parsing, consistent schema, and stable UI.

Excellence (Going Above & Beyond):

  • Heatmap generation or Wigle-compatible export tools.