Project 5: Embedded Sensor State Machine (Arduino/STM32)
Build an embedded sensor controller with explicit states, safe error recovery, and low-power transitions.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust (embedded), C++ |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | Level 2: Micro-SaaS / Hardware Tool |
| Prerequisites | Basic embedded I/O, interrupts, timers |
| Key Topics | State machines, ISR discipline, watchdogs, power management |
1. Learning Objectives
By completing this project, you will:
- Model an embedded system as an explicit event-driven state machine.
- Separate ISR work from main-loop work safely.
- Implement debouncing and timing-based transitions.
- Add fault handling, watchdog safety, and recovery states.
- Demonstrate deterministic behavior under scripted events.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Event-Driven Embedded State Machines
Fundamentals
Embedded systems must respond to asynchronous events (timers, interrupts, sensor data). The safest way to manage this is an explicit state machine with named states such as INIT, ACTIVE, LOW_POWER, and ERROR_SAFE. Each event triggers a transition, and each state has clear invariants (e.g., sensors initialized, timers running, watchdog fed). Without explicit states, the system becomes a maze of flags and hidden assumptions. A state machine also makes error recovery deterministic: a sensor failure does not crash the system; instead, the system transitions into a safe state from which it can retry or shut down.
This approach makes power and safety policy explicit. You can declare which peripherals are enabled in each state and prove that no unsafe combinations exist. That clarity is essential when hardware constraints are tight and failures are costly.
Deep Dive into the concept
The embedded world is unforgiving because there is often no operating system to protect you. Any bug can hang the device or drain the battery. A state machine provides structure. Define explicit states, define the events that cause transitions, and define the actions performed on each transition. This makes your control flow understandable and testable. For example, the INIT state must configure GPIO, I2C, and timers before transitioning to ACTIVE. ACTIVE collects sensor data periodically. LOW_POWER disables peripherals and sleeps, waking only on a timer or external event. ERROR_SAFE disables outputs and attempts recovery after a backoff.
An event-driven design also separates concerns between interrupts and the main loop. Interrupts should be minimal and should only set flags or push events into a queue. The main loop reads these events and drives the state machine. This separation prevents reentrancy bugs: if your ISR performs I2C operations, and the main loop also uses I2C, you can deadlock or corrupt bus state. The state machine prevents this by ensuring that hardware operations occur only in the main loop, in defined states.
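The ISR-to-main-loop handoff described above can be sketched as a fixed-size ring buffer. This is a minimal single-producer, single-consumer sketch for the desktop simulation; the names `evq_push`, `evq_pop`, `EVT_NONE`, the event set, and the queue size are illustrative assumptions, not part of the project spec. Whether the index updates are safe against a real interrupt depends on the target, so check that before reusing it on hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVQ_SIZE 16u   /* assumed capacity; must be a power of two */

/* Hypothetical event set for illustration. */
typedef enum { EVT_NONE, EVT_TICK, EVT_SENSOR_READY } Event;

static volatile Event   evq_buf[EVQ_SIZE];
static volatile uint8_t evq_head;  /* written only by the producer (ISR) */
static volatile uint8_t evq_tail;  /* written only by the consumer (main loop) */

/* Called from the ISR: store the event and return immediately.
 * Returns false if the queue is full so the caller can count drops. */
bool evq_push(Event e) {
    uint8_t next = (uint8_t)((evq_head + 1u) & (EVQ_SIZE - 1u));
    if (next == evq_tail) return false;   /* full: one slot is kept empty */
    evq_buf[evq_head] = e;
    evq_head = next;
    return true;
}

/* Called from the main loop: returns EVT_NONE when nothing is pending. */
Event evq_pop(void) {
    if (evq_tail == evq_head) return EVT_NONE;
    Event e = evq_buf[evq_tail];
    evq_tail = (uint8_t)((evq_tail + 1u) & (EVQ_SIZE - 1u));
    return e;
}
```

The ISR only calls `evq_push`; all hardware access stays in the main loop, which drains the queue with `evq_pop` and feeds each event to the state machine.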
Event ordering is another subtlety. If multiple events arrive close together (for example, a timer tick and a sensor-ready interrupt), the order in which you process them can change behavior. A queued event model makes ordering explicit: you define whether timer events are processed before sensor events, and you can test both. This matters when transitions depend on counters like idle time or failure counts, because a different processing order can produce a different state sequence.
Transitions must be explicit and guarded. For example, a transition from ACTIVE to LOW_POWER should only happen after the system has been idle for N cycles. This is a temporal invariant: “idle for N cycles” must be true before the transition. Similarly, a transition from ERROR_SAFE to ACTIVE might require a successful sensor self-test. These guards are part of the state machine definition and should be encoded as conditions, not ad-hoc checks scattered in the code.
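One way to keep guards out of scattered ad-hoc checks is to write each one as a small predicate. The sketch below is illustrative only: `Ctx`, `guard_can_sleep`, `guard_can_resume`, `apply_guards`, and the value of `IDLE_TICKS_BEFORE_SLEEP` are assumptions made for this example.

```c
#include <stdbool.h>

typedef enum { INIT, ACTIVE, LOW_POWER, ERROR_SAFE } State;

#define IDLE_TICKS_BEFORE_SLEEP 50   /* assumed value of N */

typedef struct {
    State state;
    int   idle_ticks;        /* consecutive ticks with no activity */
    bool  self_test_passed;  /* result of the latest sensor self-test */
} Ctx;

/* Guard: ACTIVE -> LOW_POWER only after N idle ticks. */
static bool guard_can_sleep(const Ctx *c) {
    return c->state == ACTIVE && c->idle_ticks >= IDLE_TICKS_BEFORE_SLEEP;
}

/* Guard: ERROR_SAFE -> ACTIVE only after a successful self-test. */
static bool guard_can_resume(const Ctx *c) {
    return c->state == ERROR_SAFE && c->self_test_passed;
}

/* Apply at most one guarded transition; returns the resulting state. */
State apply_guards(Ctx *c) {
    if (guard_can_sleep(c)) {
        c->state = LOW_POWER;
    } else if (guard_can_resume(c)) {
        c->state = ACTIVE;
        c->idle_ticks = 0;
    }
    return c->state;
}
```

Because each guard is a pure function of the context, it can be unit-tested on the desktop without any hardware.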
State machines also make testing possible. You can simulate events in a desktop harness and verify that the state transitions happen as expected. This is important because real hardware is slow to iterate with. The simulator can replay a fixed sequence of events (sensor OK, sensor timeout, user wake) and you can check that the system enters and exits ERROR_SAFE correctly. This deterministic testing is essential for safety-critical systems.
Finally, embedded state machines must consider resource lifecycles. A sensor may require initialization and shutdown steps. A low-power mode may require saving configuration or disabling peripherals. If you transition to LOW_POWER without shutting down the sensor bus, you may draw too much current. If you transition out of ERROR_SAFE without reinitializing, you may have stale state. Explicit state transitions make these lifecycle steps predictable and auditable.
A well-designed state machine also encodes a cooperative scheduling policy. You decide which tasks are allowed to run in each state and keep the loop bounded in time, so no single task starves others. This ensures that safety-critical tasks like watchdog feeding or fault checks are never delayed by non-critical tasks like logging or telemetry.
How this fits into the project
The state machine is the central architecture of the project: all sensor events, power states, and error recoveries are modeled as transitions with explicit invariants.
Definitions & key terms
- ISR (Interrupt Service Routine): Code executed asynchronously on hardware interrupt.
- Event queue: Buffer of events processed by main loop.
- State guard: Condition that must be true for a transition.
- Safe state: State where outputs are disabled and system is stable.
Mental model diagram (ASCII)
INIT -> ACTIVE -> LOW_POWER
  |        |         |
  v        v         v
ERROR_SAFE <---------
How it works (step-by-step)
- On boot, enter INIT and configure peripherals.
- Transition to ACTIVE after init success.
- On idle timeout, transition to LOW_POWER.
- On sensor failure, transition to ERROR_SAFE.
- After recovery condition, transition back to ACTIVE.
Failure modes: running I2C in ISR, missing transition guards, outputs active in error state.
Minimal concrete example
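typedef enum { INIT, ACTIVE, LOW_POWER, ERROR_SAFE } State;
typedef enum { EVT_NONE, EVT_SENSOR_TIMEOUT } Event;

static State state = INIT;
volatile int evt_queue[16];      /* filled by ISRs, drained by the main loop */

Event pop_event(void);           /* returns EVT_NONE when the queue is empty */

void loop(void) {
    Event evt = pop_event();
    switch (state) {
    case ACTIVE:
        /* A sensor timeout is a fault: leave ACTIVE for the safe state. */
        if (evt == EVT_SENSOR_TIMEOUT) state = ERROR_SAFE;
        break;
    default:
        break;                   /* other states omitted for brevity */
    }
}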
Common misconceptions
- “ISRs can do anything.” They must be minimal and non-blocking.
- “State can be derived from flags.” Multiple flags create ambiguous states.
Check-your-understanding questions
- Why should ISR work be minimal?
- What is a state guard?
- Why is ERROR_SAFE a separate explicit state?
Check-your-understanding answers
- ISRs must be fast and cannot block; heavy work risks deadlocks and missed interrupts.
- A condition that must be true before a transition is allowed.
- It ensures outputs are disabled and recovery is deterministic.
Real-world applications
- IoT sensors and wearable devices
- Automotive controllers
- Industrial monitoring systems
Where you’ll apply it
- In this project: see §3.2 Functional Requirements and §5.10 Phase 1.
- Also used in: P01-modal-text-editor.md (state machines), P03-connection-pool.md (resource lifecycles).
References
- “Making Embedded Systems” by Elecia White (state machine design)
- “Embedded Systems Fundamentals” by Audsley (event-driven design)
Key insights
State machines turn unpredictable events into deterministic behavior.
Summary
Explicit states and guarded transitions are the only safe way to manage embedded control flow.
Homework/Exercises to practice the concept
- Draw a state machine for a door sensor with OPEN, CLOSED, ALARM.
- Simulate events and verify transitions manually.
Solutions to the homework/exercises
- CLOSED -> OPEN on open event; OPEN -> CLOSED on close; OPEN -> ALARM after timeout.
- Step through events and ensure guards are satisfied before transitions.
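The door-sensor solution can be written as a pure transition function, which makes manual stepping easy to check. This is a sketch under assumptions: the identifiers are made up for illustration, and because the exercise does not define how ALARM is cleared, this version simply stays in ALARM.

```c
typedef enum { DOOR_CLOSED, DOOR_OPEN, DOOR_ALARM } DoorState;
typedef enum { DOOR_EVT_OPEN, DOOR_EVT_CLOSE, DOOR_EVT_TIMEOUT } DoorEvent;

/* Pure transition function: (state, event) pairs that are not listed
 * leave the state unchanged. */
DoorState door_step(DoorState s, DoorEvent e) {
    switch (s) {
    case DOOR_CLOSED:
        return (e == DOOR_EVT_OPEN) ? DOOR_OPEN : s;
    case DOOR_OPEN:
        if (e == DOOR_EVT_CLOSE)   return DOOR_CLOSED;
        if (e == DOOR_EVT_TIMEOUT) return DOOR_ALARM;
        return s;
    case DOOR_ALARM:
        /* Assumption: no exit from ALARM is defined, so it latches. */
        return s;
    }
    return s;
}
```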
2.2 Timing, Interrupts, Debouncing, and Fault Handling
Fundamentals
Embedded systems depend on time. Sensors are read periodically, buttons bounce, and watchdogs must be fed on schedule. Timing is handled by timers and interrupts, but these must be integrated safely with your state machine. Debouncing prevents noisy inputs from causing false transitions. Fault handling ensures that when a sensor fails or returns invalid data, the system enters a safe state rather than misbehaving. Together, these timing and fault strategies enforce temporal invariants: “sensor read every 100ms,” “button must be stable for 20ms,” “watchdog fed every 1s.” Without them, the system will be unreliable or unsafe.
Because timing errors compound, you must treat time as a first-class input stream. Every transition that depends on time should be stated in terms of counters or deadlines, not implicit delays. This makes behavior deterministic and testable, both in simulation and on real hardware.
Deep Dive into the concept
Timing in embedded systems is often implemented with hardware timers that generate periodic interrupts. A common design is to have the timer ISR set a flag or enqueue a TICK event, then the main loop processes that event. This allows you to keep the ISR fast and avoids reentrancy problems. The main loop then uses a software timer or counter to decide when to read sensors, when to enter low power, and when to feed the watchdog. This is a temporal state machine layered on top of the explicit state machine.
Debouncing is crucial for physical inputs. A button press may produce multiple rapid on/off transitions due to mechanical bounce. If you use raw reads, you will trigger multiple state transitions for a single press. A standard approach is to require N consecutive identical readings before accepting a state change. This can be implemented with a counter that increments when readings are stable and resets when they are not. The debouncer itself is a state machine with states such as STABLE_HIGH, STABLE_LOW, and TRANSITIONING. Incorporating this into your main state machine prevents spurious transitions.
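A count-based debouncer along these lines might look like the sketch below. It assumes one raw sample per tick and a threshold of 3; `Debouncer`, `debounce_sample`, and `DEBOUNCE_N` are illustrative names, not part of the project spec.

```c
#include <stdbool.h>

#define DEBOUNCE_N 3   /* assumed: consecutive differing samples required */

typedef struct {
    bool stable;  /* last accepted (debounced) level */
    int  count;   /* consecutive samples that differ from stable */
} Debouncer;

/* Feed one raw sample per tick. Returns true only on the sample
 * where the debounced level actually changes. */
bool debounce_sample(Debouncer *d, bool raw) {
    if (raw == d->stable) {
        d->count = 0;          /* bounce back to the old level: start over */
        return false;
    }
    if (++d->count < DEBOUNCE_N) {
        return false;          /* not stable for long enough yet */
    }
    d->stable = raw;           /* accept the new level */
    d->count = 0;
    return true;
}
```

A single glitch resets the counter, so a bouncing contact produces exactly one accepted change once the input has settled.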
Fault handling depends on defining what constitutes a failure. For a sensor, failures might include I2C timeouts, invalid data ranges, or repeated checksum errors. The system should not attempt to use invalid data; instead, it should transition into ERROR_SAFE. In ERROR_SAFE, you may attempt recovery after a backoff, but you must ensure the system remains safe (e.g., motors off). This requires temporal logic: “after 3 consecutive failures, enter ERROR_SAFE” and “after 10 seconds, attempt re-init.” These counters are part of your state machine and must be reset on success.
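The two temporal rules quoted above (3 consecutive failures, 10-second backoff) can be sketched as a small tracker. The struct and function names are hypothetical, and the millisecond timestamp is assumed to come from whatever tick source the project uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAIL_THRESHOLD 3        /* consecutive failures before ERROR_SAFE */
#define BACKOFF_MS     10000u   /* wait before attempting re-init */

typedef struct {
    int      consecutive_failures;
    bool     in_error_safe;
    uint32_t error_entered_ms;  /* timestamp when ERROR_SAFE was entered */
} FaultTracker;

/* Record one sensor read result. Returns true if this result
 * caused entry into ERROR_SAFE. */
bool fault_record(FaultTracker *f, bool read_ok, uint32_t now_ms) {
    if (read_ok) {
        f->consecutive_failures = 0;   /* counters reset on success */
        return false;
    }
    if (f->in_error_safe) return false;
    if (++f->consecutive_failures >= FAIL_THRESHOLD) {
        f->in_error_safe = true;
        f->error_entered_ms = now_ms;
        return true;
    }
    return false;
}

/* True once the backoff has elapsed. Unsigned subtraction keeps the
 * comparison correct across one wraparound of the millisecond counter. */
bool fault_retry_due(const FaultTracker *f, uint32_t now_ms) {
    return f->in_error_safe &&
           (uint32_t)(now_ms - f->error_entered_ms) >= BACKOFF_MS;
}
```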
Watchdog timers add another layer of safety. A watchdog reset means your system failed to make progress within the watchdog period. To avoid spurious resets, you must feed the watchdog in every main loop path, including error states. This is another example of control flow discipline: like a resource release, the watchdog feed must happen on every path. If an error handler forgets to feed it, the system will reset repeatedly. This is a subtle bug that only shows up under faults, which is exactly why explicit invariants matter.
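One way to make "fed on every path" hold by construction is to give the loop body a single exit that performs the feed. In this sketch `wdt_feed` is a simulation stand-in that only counts calls (on hardware it would wrap the vendor's watchdog refresh), and the handler names are hypothetical.

```c
typedef enum { INIT, ACTIVE, LOW_POWER, ERROR_SAFE } State;

/* Simulation stand-in for the hardware watchdog refresh. */
static unsigned wdt_feed_count;
static void wdt_feed(void) { wdt_feed_count++; }

static void handle_active(void)     { /* periodic sensor reads go here */ }
static void handle_error_safe(void) { /* outputs off, recovery checks */ }

/* One main-loop iteration. No handler returns out of loop_once early,
 * so control always reaches the single wdt_feed() call at the end. */
void loop_once(State s) {
    switch (s) {
    case ACTIVE:     handle_active();     break;
    case ERROR_SAFE: handle_error_safe(); break;
    case INIT:
    case LOW_POWER:  break;
    }
    wdt_feed();   /* reached from every state, including ERROR_SAFE */
}
```

In the simulation, a test can compare `wdt_feed_count` with the number of iterations to confirm that no state skips the feed.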
Low-power transitions also rely on timing. You may want to enter sleep after N cycles of inactivity. But entering sleep requires disabling certain peripherals, saving state, and configuring wakeup sources. These steps must happen in a strict order, and you must ensure that no pending I2C transactions or timer events will be lost. This is best modeled as a transition with a precondition “no pending operations” and a postcondition “peripherals disabled, wake timer armed.” If the precondition is not satisfied, you should postpone the transition.
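The precondition and postcondition described above can be sketched as a function that either performs the whole transition or changes nothing. `PowerCtx` and `try_enter_low_power` are hypothetical names, and arming the wake timer before disabling peripherals is a choice made for this sketch, not a requirement taken from the text.

```c
#include <stdbool.h>

typedef struct {
    int  pending_bus_ops;    /* in-flight I2C transactions */
    int  queued_events;      /* events not yet processed */
    bool peripherals_on;
    bool wake_timer_armed;
} PowerCtx;

/* Try to enter low power. Returns false and changes nothing if the
 * precondition "no pending operations" does not hold, so the caller
 * can postpone the transition to a later tick. */
bool try_enter_low_power(PowerCtx *p) {
    if (p->pending_bus_ops > 0 || p->queued_events > 0) {
        return false;
    }
    p->wake_timer_armed = true;   /* arm the wake source first */
    p->peripherals_on = false;    /* then disable peripherals */
    return true;
}
```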
Testing timing logic is notoriously hard on real hardware. To make it deterministic, build a simulation mode where time advances in steps and events are injected in a fixed order. Use fixed timing values and log each state transition. This allows you to replay scenarios such as “sensor fails 3 times then recovers” or “button bounce for 10ms” and confirm correct behavior. Deterministic simulation is the only scalable way to validate temporal invariants.
How this fits into the project
This concept drives the timing and fault logic: how you debounce inputs, schedule sensor reads, manage watchdog feeding, and transition between ACTIVE, LOW_POWER, and ERROR_SAFE.
Definitions & key terms
- Debounce: Filtering of noisy input transitions.
- Watchdog: Hardware timer that resets system if not fed.
- Backoff: Delay before retrying a failed operation.
- Temporal invariant: Condition involving time or sequence.
Mental model diagram (ASCII)
Timer ISR -> TICK event -> Main loop
| |
v v
Debounce -> Sensor Read -> Fault Count
How it works (step-by-step)
- Timer ISR sets TICK event.
- Main loop processes TICK, increments counters.
- Debounce inputs by requiring stable readings.
- If sensor read fails N times, enter ERROR_SAFE.
- Feed watchdog in every iteration.
Failure modes: missed watchdog feed, noisy inputs causing false transitions, sleep entered with pending operations.
Minimal concrete example
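if (tick) {
    tick = 0;
    if (stable_readings >= 3) evt = EVT_BUTTON;   /* debounced press */
    if (sensor_ok) {           /* sensor_ok: result of this tick's read (assumed flag) */
        sensor_failures = 0;   /* any good read resets the count */
    } else if (++sensor_failures >= 3) {
        state = ERROR_SAFE;    /* three consecutive failures */
    }
}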
Common misconceptions
- “Debounce can be ignored.” Skipping it lets contact bounce generate random false events.
- “Watchdog only matters in normal state.” Error paths must feed it too.
Check-your-understanding questions
- Why should the ISR avoid doing I2C reads?
- How does debouncing prevent false transitions?
- What happens if you forget to feed the watchdog in ERROR_SAFE?
Check-your-understanding answers
- ISRs must be fast and non-blocking; I2C can block and break timing.
- It requires stable input for N cycles before accepting a change.
- The watchdog will reset the system repeatedly, preventing recovery.
Real-world applications
- Wearable device power management
- Automotive sensor controllers
- Industrial safety systems
Where you’ll apply it
- In this project: see §3.2 Functional Requirements and §5.10 Phases 2 and 3.
- Also used in: P02-http-1-1-parser.md (timeouts), P03-connection-pool.md (backoff and retries).
References
- “Making Embedded Systems” by Elecia White (timers, debouncing)
- Microcontroller datasheets for watchdog and sleep modes
Key insights
Temporal correctness is a state machine over time; timing events are just another input stream.
Summary
Debouncing, watchdogs, and timing-based transitions are non-negotiable in reliable embedded systems.
Homework/Exercises to practice the concept
- Implement a debounce function with a stability counter.
- Simulate three sensor timeouts and verify transition to ERROR_SAFE.
Solutions to the homework/exercises
- Increment a counter on stable reads, reset on changes, trigger when counter >= N.
- Trigger ERROR_SAFE after three consecutive failures and log the transition.
3. Project Specification
3.1 What You Will Build
An embedded sensor controller that reads a sensor periodically, handles failures, and enters low-power mode after inactivity. The controller is implemented as a state machine and includes a desktop simulation harness for deterministic testing.
Included:
- INIT, ACTIVE, LOW_POWER, ERROR_SAFE states
- Timer-based sensor read
- Debounced input to wake from sleep
- Watchdog feeding
Excluded:
- Full UI or networking
3.2 Functional Requirements
- State machine: explicit enum with transitions.
- Sensor read loop: periodic reads with retry logic.
- Debounce: stable input required for wake events.
- Low-power: enter sleep after idle timeout.
- Error safe: disable outputs and attempt recovery.
- Watchdog: fed in all states.
3.3 Non-Functional Requirements
- Performance: stable operation for 24-hour simulation.
- Reliability: no hangs on sensor failure.
- Power: in LOW_POWER, peripherals are disabled and the controller sleeps until a timer or external wake event.
3.4 Example Usage / Output
$ ./sensor_sim --events "OK,OK,OK,IDLE,WAKE"
[STATE] INIT -> ACTIVE
[STATE] ACTIVE -> LOW_POWER
[STATE] LOW_POWER -> ACTIVE
3.5 Data Formats / Schemas / Protocols
- Event script: CSV list of events for simulation.
- JSON error shape (if --json): { "error": { "code": "SENSOR_TIMEOUT", "message": "no response" } }
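A parser for the event script could look like the sketch below. The token names (OK, FAIL, IDLE, WAKE, RECOVER) are taken from the example scripts in this document; the `SimEvent` type, the function names, and the choice to reject unknown tokens with -1 are assumptions for illustration.

```c
#include <string.h>

typedef enum { SIM_OK, SIM_FAIL, SIM_IDLE, SIM_WAKE, SIM_RECOVER, SIM_UNKNOWN } SimEvent;

/* Map one token (not NUL-terminated, length len) to an event. */
static SimEvent sim_parse_token(const char *tok, size_t len) {
    static const struct { const char *name; SimEvent ev; } table[] = {
        { "OK", SIM_OK }, { "FAIL", SIM_FAIL }, { "IDLE", SIM_IDLE },
        { "WAKE", SIM_WAKE }, { "RECOVER", SIM_RECOVER },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strlen(table[i].name) == len && memcmp(tok, table[i].name, len) == 0) {
            return table[i].ev;
        }
    }
    return SIM_UNKNOWN;
}

/* Parse a comma-separated script into out[]. Returns the number of
 * events, or -1 on an unknown token or if more than max events appear. */
int sim_parse_script(const char *script, SimEvent *out, int max) {
    int n = 0;
    const char *p = script;
    while (*p != '\0') {
        size_t len = strcspn(p, ",");        /* length up to next comma or end */
        SimEvent ev = sim_parse_token(p, len);
        if (ev == SIM_UNKNOWN || n >= max) return -1;
        out[n++] = ev;
        p += len;
        if (*p == ',') p++;                  /* skip the separator */
    }
    return n;
}
```

Rejecting unknown tokens instead of skipping them keeps replays deterministic: a typo in a script fails loudly rather than silently changing the event sequence.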
3.6 Edge Cases
- Sensor disconnect while in LOW_POWER
- Button bounce causing multiple wake events
- Watchdog reset in ERROR_SAFE
3.7 Real World Outcome
Deterministic simulation uses fixed event scripts.
3.7.1 How to Run (Copy/Paste)
cc -std=c11 -O2 -o sensor_sim src/sensor_sim.c
./sensor_sim --events "OK,OK,OK,FAIL,FAIL,FAIL,RECOVER" --seed 42
3.7.2 Golden Path Demo (Deterministic)
Expected output:
[STATE] INIT -> ACTIVE
[STATE] ACTIVE -> ERROR_SAFE
[STATE] ERROR_SAFE -> ACTIVE
3.7.3 CLI Transcript (Success + Failure)
$ ./sensor_sim --events "OK,OK,OK"
[STATE] INIT -> ACTIVE
$ echo $?
0
$ ./sensor_sim --events "FAIL,FAIL,FAIL"
ERROR: SENSOR_TIMEOUT
$ echo $?
1
Exit codes:
- 0: success
- 1: sensor/error state
- 2: system error
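The exit codes can be derived from the final simulation state in one place. This is a sketch under assumptions: it treats "sensor/error state" as "the run ended in ERROR_SAFE" and "system error" as a harness-level failure such as an unparsable script; the names are hypothetical.

```c
#include <stdbool.h>

typedef enum { INIT, ACTIVE, LOW_POWER, ERROR_SAFE } State;

enum {
    SIM_EXIT_OK           = 0,  /* success */
    SIM_EXIT_SENSOR_ERROR = 1,  /* sensor/error state */
    SIM_EXIT_SYSTEM_ERROR = 2   /* system error */
};

/* Choose the process exit code after the event script has been replayed.
 * A system error takes priority over the controller's final state. */
int sim_exit_code(State final_state, bool system_error) {
    if (system_error)              return SIM_EXIT_SYSTEM_ERROR;
    if (final_state == ERROR_SAFE) return SIM_EXIT_SENSOR_ERROR;
    return SIM_EXIT_OK;
}
```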
4. Solution Architecture
4.1 High-Level Design
[Timer ISR] -> [Event Queue] -> [State Machine] -> [Sensor Driver]
                                       |
                                       v
                                [Power Manager]
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| State Machine | Handle transitions | Explicit enum + switch |
| Event Queue | Buffer ISR events | Fixed-size ring buffer |
| Sensor Driver | Read/validate sensor | Non-blocking in main loop |
| Power Manager | Enter/exit low power | Explicit enter/exit hooks |
4.3 Data Structures (No Full Code)
typedef struct {
    State state;
    int failure_count;
    int idle_ticks;
} Controller;
4.4 Algorithm Overview
Key Algorithm: Tick-Driven Control Loop
- On tick, enqueue EVT_TICK.
- Main loop processes events and updates state.
- On failure threshold, transition to ERROR_SAFE.
- On idle threshold, transition to LOW_POWER.
Complexity Analysis:
- Time: O(1) per event
- Space: O(1) fixed buffers
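The control loop above can be sketched as a single step function over the `Controller` struct from §4.3. The event names, both thresholds, and `controller_init` are assumptions for illustration; in particular, this sketch counts every tick as idle time and treats a wake event in ACTIVE as activity that resets the idle counter.

```c
typedef enum { INIT, ACTIVE, LOW_POWER, ERROR_SAFE } State;
typedef enum { EVT_TICK, EVT_SENSOR_OK, EVT_SENSOR_FAIL, EVT_WAKE, EVT_RECOVER } Event;

typedef struct {
    State state;
    int failure_count;
    int idle_ticks;
} Controller;

#define FAIL_THRESHOLD 3   /* assumed */
#define IDLE_THRESHOLD 5   /* assumed */

void controller_init(Controller *c) {
    c->state = ACTIVE;     /* sketch assumes peripheral init succeeded */
    c->failure_count = 0;
    c->idle_ticks = 0;
}

/* Process exactly one event: O(1) time, no allocation. */
void controller_step(Controller *c, Event e) {
    switch (c->state) {
    case ACTIVE:
        if (e == EVT_SENSOR_OK) {
            c->failure_count = 0;
        } else if (e == EVT_SENSOR_FAIL) {
            if (++c->failure_count >= FAIL_THRESHOLD) c->state = ERROR_SAFE;
        } else if (e == EVT_TICK) {
            if (++c->idle_ticks >= IDLE_THRESHOLD) c->state = LOW_POWER;
        } else if (e == EVT_WAKE) {
            c->idle_ticks = 0;              /* treated as activity */
        }
        break;
    case LOW_POWER:
        if (e == EVT_WAKE) {
            c->state = ACTIVE;
            c->idle_ticks = 0;
        }
        break;
    case ERROR_SAFE:
        if (e == EVT_RECOVER) {
            c->state = ACTIVE;
            c->failure_count = 0;
            c->idle_ticks = 0;
        }
        break;
    case INIT:
        break;                              /* handled by controller_init */
    }
}
```

Events that a state does not list are ignored, which answers the design question "which events are allowed in each state?" directly in the code.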
5. Implementation Guide
5.1 Development Environment Setup
cc --version
5.2 Project Structure
project-root/
├── src/
│ ├── controller.c
│ ├── controller.h
│ └── sensor_sim.c
├── tests/
│ └── test_controller.c
└── Makefile
5.3 The Core Question You’re Answering
“How do I keep an embedded system safe and responsive when events and failures arrive unpredictably?”
5.4 Concepts You Must Understand First
- Event-driven state machines
- ISR vs main loop separation
- Debouncing and watchdogs
5.5 Questions to Guide Your Design
- Which events are allowed in each state?
- What must always be true in ERROR_SAFE?
- How do you guarantee the watchdog is fed on all paths?
5.6 Thinking Exercise
Simulate: INIT -> ACTIVE, 3 failures, then recovery. What state transitions occur?
5.7 The Interview Questions They’ll Ask
- “Why should ISRs avoid heavy work?”
- “How do you debounce a button?”
- “What ensures safe recovery after a sensor fault?”
5.8 Hints in Layers
Hint 1: Start with a desktop simulation before hardware.
Hint 2: Implement state machine transitions first.
Hint 3: Add timing and debouncing after core logic works.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| State machines | “Making Embedded Systems” | Ch. 5 |
| Timers | “Making Embedded Systems” | Ch. 4 |
| Error handling | “Effective C” | Ch. 8 |
5.10 Implementation Phases
Phase 1: Core State Machine (3-4 days)
- Implement enum states and transitions.
- Add basic events in simulation.
Phase 2: Timing + Debounce (4-5 days)
- Add tick events and debouncing.
- Add idle timeout and wake events.
Phase 3: Fault Handling (4-5 days)
- Implement ERROR_SAFE and recovery logic.
- Add watchdog feeding.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Event handling | Direct flags, queue | Queue | Predictable ordering |
| Debounce | Time-based, count-based | Count-based | Deterministic in simulation |
| Recovery | Immediate, backoff | Backoff | Prevent thrashing |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | State transitions | INIT->ACTIVE |
| Integration Tests | Fault recovery | Sensor fail then recover |
| Edge Case Tests | Debounce | Noisy input |
6.2 Critical Test Cases
- Three consecutive failures trigger ERROR_SAFE.
- Debounce ignores rapid toggles.
- Watchdog fed in ERROR_SAFE.
6.3 Test Data
Events: OK,OK,FAIL,FAIL,FAIL,RECOVER
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| I2C in ISR | Hangs | Move to main loop |
| Missing debounce | False transitions | Add stability counter |
| No backoff | Thrashing | Add delay before recovery |
7.2 Debugging Strategies
- Log state transitions with timestamps.
- Use simulation harness for deterministic replay.
7.3 Performance Traps
Entering sleep without a configured wake source can lock the device; always arm a wake event before sleeping.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add LED indicators for each state.
- Add a manual reset command.
8.2 Intermediate Extensions
- Implement exponential backoff for recovery.
- Add multiple sensors with independent states.
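For the exponential backoff extension, a capped doubling schedule is one common approach. The base delay, the cap, and the function name below are assumptions chosen for illustration.

```c
#include <stdint.h>

#define BACKOFF_BASE_MS 1000u    /* assumed first delay */
#define BACKOFF_MAX_MS  60000u   /* assumed upper bound */

/* Delay before recovery attempt number `attempt` (0-based):
 * base * 2^attempt, capped at BACKOFF_MAX_MS. The loop stops doubling
 * once the cap is reached, so the value cannot overflow. */
uint32_t backoff_delay_ms(unsigned attempt) {
    uint32_t delay = BACKOFF_BASE_MS;
    while (attempt-- > 0 && delay < BACKOFF_MAX_MS) {
        delay *= 2u;
    }
    return delay > BACKOFF_MAX_MS ? BACKOFF_MAX_MS : delay;
}
```

Resetting the attempt counter after a successful recovery keeps a later, unrelated fault from starting at the maximum delay.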
8.3 Advanced Extensions
- Integrate with real hardware (Arduino/STM32).
- Add persistent error logs stored in flash.
9. Real-World Connections
9.1 Industry Applications
- IoT device firmware
- Safety-critical controllers
9.2 Related Open Source Projects
- FreeRTOS (task-based state machines)
- Zephyr (embedded event loops)
9.3 Interview Relevance
- Event-driven state machines and watchdogs are common embedded interview topics.
10. Resources
10.1 Essential Reading
- “Making Embedded Systems” by Elecia White
- Microcontroller datasheets for timers and watchdogs
10.2 Video Resources
- Embedded systems state machine tutorials
10.3 Tools & Documentation
- arduino-cli or STM32CubeIDE
10.4 Related Projects in This Series
- Project 3: Connection Pool for lifecycle invariants.
- Project 6: Git-like VCS for recovery patterns.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain each state and its invariants.
- I can explain why ISR work must be minimal.
- I can explain watchdog feeding requirements.
11.2 Implementation
- All functional requirements are met.
- Simulation tests pass deterministically.
- Recovery from faults works reliably.
11.3 Growth
- I can reason about timing-based transitions.
12. Submission / Completion Criteria
Minimum Viable Completion:
- State machine with INIT/ACTIVE/ERROR_SAFE works.
- Sensor failures trigger ERROR_SAFE.
Full Completion:
- Low-power mode, debounce, and watchdog implemented.
Excellence (Going Above & Beyond):
- Full hardware integration with logging and persistent error reports.
13. Additional Content Rules (Compliance)
- Deterministic demos in §3.7.
- Failure demo with exit codes included.
- Cross-links included in §2.1 and §2.2.