Project 10: The Micro-Edge (Micro-ROS on ESP32)
A Micro-ROS node on an ESP32 that publishes sensor data into a ROS 2 graph via the XRCE agent.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 3-4 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | C++, MicroPython |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 5. The Industry Disruptor |
| Prerequisites | Embedded C, ESP-IDF, UART/UDP basics |
| Key Topics | XRCE Client/Agent Model, Static Memory Pools, Embedded Transport Basics |
1. Learning Objectives
By completing this project, you will:
- Explain how the XRCE Client/Agent Model affects ROS 2 behavior in this project.
- Implement the core pipeline for Project 10 and validate it with a deterministic demo.
- Measure and document performance or correctness under at least one stress condition.
- Produce artifacts (configs, logs, scripts) that make the system reproducible.
2. All Theory Needed (Per-Concept Breakdown)
XRCE Client/Agent Model
Fundamentals
The XRCE Client/Agent Model is the split architecture in which a microcontroller client relies on an agent to speak DDS on its behalf. The client is too constrained to run a full DDS stack, so it speaks the lighter DDS-XRCE protocol to an agent running on a host, and the agent participates in the ROS 2 graph for it. At a minimum you should be able to name the primary entities involved (client, agent, session, streams, proxy entities), identify where configuration lives (transport and agent address in the firmware, transport and port on the agent command line), and explain how each side influences behavior. When you debug a system, inspect the session and its streams first, because an unopened session or a mismatched stream type surfaces problems before any topic-level symptom does. The practical goal is to build a mental map that connects the API knobs you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.
Deep Dive into the concept
A deeper look at the XRCE Client/Agent Model starts by tracing data from the API surface to the middleware. On the microcontroller, the rmw layer maps each rcl call onto XRCE client requests rather than onto DDS directly; the agent then maps those requests onto DDS entities that speak RTPS on the host network. The mapping is not always one-to-one: a single setting can affect several runtime behaviors, including buffering, matching, and timing. This is why a change to the session or stream configuration can stop a subscriber from receiving data even though the topic may still appear in the graph. The useful diagnostic strategy is to observe the graph (which proxy entities exist and who matched), then the transport (what packets or bytes appear between client and agent), and finally the runtime state (queues, deadlines, timers).
Failure modes cluster around mismatched assumptions. If a stream is configured incorrectly, for example best-effort where the application assumes reliable delivery, samples can be dropped without any error on the client. If the proxy entities are created with settings the rest of the graph does not accept, you will observe a graph that looks healthy but never transitions into active data flow. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.
How this fits on projects
This concept directly shapes how you implement and validate Project 10. You will configure it, observe it, and stress it under controlled conditions.
Definitions & key terms
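- XRCE client: the library on the microcontroller that implements the client side of the DDS-XRCE protocol. It holds no DDS entities itself; it asks the agent to create them.
- agent: a process on a host machine that accepts client connections and creates and operates DDS entities on each client's behalf.
- session: the logical connection between one client and the agent. Entities and streams belong to a session.
- stream: an ordered channel inside a session, either best-effort (no retransmission) or reliable (acknowledged and retried).
- proxy entities: the DDS participant, publishers, subscribers, data writers, and data readers that the agent creates to represent the client in the DDS data space.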
Mental model diagram (ASCII)
[User Code] -> [XRCE Client/Agent Model] -> [rmw/DDS] -> [Wire/Runtime Effects]
     |                    |                     |                  |
 Config/API            Policies              Entities        Observability
How it works (step-by-step, with invariants and failure modes)
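- The firmware configures the client through API calls or build-time configuration: transport, agent address, and entity limits.
- The client opens a session with the agent and requests the entities it needs (participant, publisher, data writer). Invariant: nothing exists in the ROS 2 graph until the session is established.
- The agent creates the corresponding proxy entities in DDS, where discovery and matching with other nodes take place.
- The client writes samples onto a stream and the agent forwards them through the proxy data writer. Failure mode: if the agent stops, the samples go nowhere, and the client must detect this and reconnect.
- Observability tools (agent logs, ROS 2 CLI, packet capture) confirm that the proxy entities exist and that data flows.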
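The connect, publish, and reconnect cycle implied by the steps above can be sketched as a small state machine. This is a minimal illustration, not micro-ROS API: the state names and `agent_next_state` are hypothetical, and the two boolean inputs stand in for "the agent answered a ping" and "session and entity creation succeeded", which real firmware would obtain from the client library.

```c
#include <stdbool.h>

/* Connection states for a client that depends on an external agent.
   All names here are hypothetical and chosen for illustration. */
typedef enum {
    WAITING_AGENT,      /* no agent seen yet: keep probing */
    AGENT_AVAILABLE,    /* agent answered: try to create session and entities */
    AGENT_CONNECTED,    /* session is up: publish normally */
    AGENT_DISCONNECTED  /* agent stopped answering: release entities, start over */
} agent_state_t;

/* Pure transition function: given the current state and what the firmware
   just observed, return the next state. The caller performs side effects
   (creating or destroying entities) when the state changes. */
agent_state_t agent_next_state(agent_state_t state,
                               bool agent_reachable,
                               bool setup_ok)
{
    switch (state) {
    case WAITING_AGENT:
        return agent_reachable ? AGENT_AVAILABLE : WAITING_AGENT;
    case AGENT_AVAILABLE:
        /* A failed setup sends the client back to probing. */
        return setup_ok ? AGENT_CONNECTED : WAITING_AGENT;
    case AGENT_CONNECTED:
        return agent_reachable ? AGENT_CONNECTED : AGENT_DISCONNECTED;
    case AGENT_DISCONNECTED:
        /* Entities are assumed released by the caller before re-probing. */
        return WAITING_AGENT;
    }
    return WAITING_AGENT;
}
```

Keeping the transition function free of side effects makes it testable on a host machine without hardware, which helps when you need a deterministic failure demo.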
Minimal concrete example
micro-ros-agent udp4 --port 8888
Common misconceptions
- Assuming defaults are identical across vendors.
- Believing that discovery implies data flow without validating compatibility.
Check-your-understanding questions
- Explain how the XRCE Client/Agent Model changes runtime behavior in ROS 2.
- Predict what happens if the client and the agent are configured for different transports or ports.
- Why might two nodes discover each other but still exchange no data?
Check-your-understanding answers
- It moves DDS participation off the microcontroller: the agent owns the DDS entities, so the client's reachability and stream settings decide whether data reaches the graph.
- The session never opens, so no proxy entities are created and the node never appears in the graph.
- QoS or policy mismatch prevents writer-reader matching or delivery.
Real-world applications
- microcontrollers in ROS 2
- resource-constrained robots
Where you’ll apply it
- You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
- Also used in: P11-multi-robot-swarm-global-data-space.md and other projects in this series.
References
- micro-ROS XRCE-DDS docs
- eProsima Micro XRCE-DDS
Key insights
- The agent, not the microcontroller, is the DDS participant: everything the ROS 2 graph sees of your ESP32 is a proxy entity that the agent created.
Summary
This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.
Homework/Exercises to practice the concept
- Capture or log a minimal trace where this concept is visible.
- Change one policy/setting and predict the system impact before running it.
- Explain the failure mode you expect if the configuration is wrong.
Solutions to the homework/exercises
- The trace should show the concept-specific fields or events you expect.
- Your prediction should name which endpoints match and how latency/loss changes.
- A wrong configuration should lead to mismatch, dropped data, or timeouts.
Static Memory Pools
Fundamentals
Static Memory Pools means pre-allocating all the memory an embedded ROS node needs, so that it never calls a dynamic allocator after startup. On a microcontroller this determines how many entities and in-flight messages the node can hold, and what happens when that limit is reached. At a minimum you should be able to name the primary entities involved (pools, blocks, and the entities that draw from them), identify where configuration lives (build-time constants and statically declared buffers), and explain how pool size and allocation-failure handling influence behavior. When you debug a system, check determinism and static buffer usage first, because an exhausted pool often shows up as a missing publisher or a dropped message rather than a crash. The practical goal is to build a mental map that connects the API knobs you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.
Deep Dive into the concept
A deeper look at Static Memory Pools starts by tracing a request for memory from the API surface down to the buffer that satisfies it. Every entity you create and every message you queue draws from a pool whose size was fixed at build time, so the configuration is a set of compile-time constants rather than a runtime policy. The mapping is not always one-to-one: a single constant can affect several runtime behaviors, including how many entities can be created, how deep the history can be, and how large a message can be sent. This is why raising the message size without raising the buffer size can cause publishes to fail, or why adding one more publisher can make initialization return an error. The useful diagnostic strategy is to check return codes at initialization (which entity failed to allocate), then pool usage at runtime (high-water mark, failure count), and finally timing (whether allocation cost stays constant).
Failure modes cluster around mismatched assumptions. If the static buffers are sized incorrectly, initialization may succeed on one build but fail on another, or messages may be truncated or rejected silently. If the configuration is too restrictive, the node starts but never creates all of its entities, so the graph looks partially healthy and data never flows. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.
How this fits on projects
This concept directly shapes how you implement and validate Project 10. You will configure it, observe it, and stress it under controlled conditions.
Definitions & key terms
- pool size: the number and size of blocks reserved at build time. It bounds how many entities or messages can exist at once.
- allocation failure: the outcome when a request finds the pool empty. The code must return an error instead of falling back to the heap.
- determinism: the property that allocation takes bounded, repeatable time and cannot fragment memory.
- static buffers: arrays with static storage duration whose size is known at link time, so RAM usage can be read from the linker map.
- configuration: the build-time constants that set pool sizes and entity limits.
Mental model diagram (ASCII)
[User Code] -> [Static Memory Pools] -> [rmw/DDS] -> [Wire/Runtime Effects]
     |                  |                   |                  |
 Config/API          Policies            Entities        Observability
How it works (step-by-step, with invariants and failure modes)
- The firmware sets pool sizes and entity limits through build-time configuration.
- At startup, each entity (node, publisher, executor handle) claims its memory from the pools. Invariant: no allocation happens after initialization completes.
- At runtime, messages are written into pre-allocated buffers. A message larger than its buffer is a failure, not a reallocation.
- When a pool is exhausted, the request fails with an error code that the caller must check and log.
- Observability tools (startup logs, linker map, high-water marks) confirm that the configuration matches actual usage.
Minimal concrete example
RCLC_EXECUTOR_HANDLE_NUMBER=4; static uint8_t pool[4096];
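To make the idea of a static pool concrete, here is a minimal fixed-block allocator sketch. It is not the allocator micro-ROS uses internally; `block_pool_t`, the `pool_*` functions, and both size constants are hypothetical names and values picked for illustration. The point it shows is that exhaustion returns `NULL` and is counted, rather than falling back to the heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BLOCK_SIZE  64u  /* bytes per block (hypothetical value) */
#define POOL_BLOCK_COUNT 8u   /* blocks in the pool (hypothetical value) */

typedef struct {
    /* Storage is the first member, so block 0 shares the struct's alignment
       (that of size_t). POOL_BLOCK_SIZE is a multiple of 8, so every later
       block keeps that alignment too. */
    uint8_t storage[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE];
    bool    in_use[POOL_BLOCK_COUNT];
    size_t  used;        /* blocks currently handed out */
    size_t  high_water;  /* largest value 'used' has reached */
    size_t  failures;    /* requests that found the pool empty */
} block_pool_t;

void pool_init(block_pool_t *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Returns a block, or NULL when the pool is exhausted. Never touches the heap. */
void *pool_alloc(block_pool_t *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            p->used++;
            if (p->used > p->high_water) {
                p->high_water = p->used;
            }
            return p->storage[i];
        }
    }
    p->failures++;
    return NULL;
}

/* Returns false if ptr is not a live block of this pool. */
bool pool_free(block_pool_t *p, void *ptr)
{
    assert(p != NULL);
    uintptr_t base = (uintptr_t)&p->storage[0][0];
    uintptr_t addr = (uintptr_t)ptr;
    if (addr < base || addr >= base + sizeof p->storage) {
        return false;
    }
    size_t offset = (size_t)(addr - base);
    if (offset % POOL_BLOCK_SIZE != 0) {
        return false;
    }
    size_t index = offset / POOL_BLOCK_SIZE;
    if (!p->in_use[index]) {
        return false;  /* double free */
    }
    p->in_use[index] = false;
    p->used--;
    return true;
}
```

In firmware you would typically declare the pool as `static block_pool_t pool;` so that it lands in `.bss`, then log `high_water` and `failures` periodically to check that the chosen sizes match real usage.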
Common misconceptions
- Assuming defaults are identical across vendors.
- Believing that discovery implies data flow without validating compatibility.
Check-your-understanding questions
- Explain how Static Memory Pools change runtime behavior in ROS 2.
- Predict what happens if the pool is smaller than the number of entities the firmware creates.
- Why might two nodes discover each other but still exchange no data?
Check-your-understanding answers
- They bound memory use and allocation time at build time, so resource exhaustion becomes an explicit error instead of heap fragmentation or an unpredictable crash.
- Entity creation fails once the pool is exhausted. If the return code is not checked, the node runs with missing publishers or subscriptions.
- QoS or policy mismatch prevents writer-reader matching or delivery.
Real-world applications
- real-time microcontrollers
- safety-critical firmware
Where you’ll apply it
- You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
- Also used in: P11-multi-robot-swarm-global-data-space.md and other projects in this series.
References
- Embedded systems design texts
- micro-ROS memory docs
Key insights
- Static Memory Pools turn memory exhaustion from an unpredictable runtime event into a sizing decision you make, and can verify, at build time.
Summary
This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.
Homework/Exercises to practice the concept
- Capture or log a minimal trace where this concept is visible.
- Change one policy/setting and predict the system impact before running it.
- Explain the failure mode you expect if the configuration is wrong.
Solutions to the homework/exercises
- The trace should show the concept-specific fields or events you expect.
- Your prediction should name which endpoints match and how latency/loss changes.
- A wrong configuration should lead to mismatch, dropped data, or timeouts.
Embedded Transport Basics
Fundamentals
Embedded Transport Basics covers the serial and UDP transports used to connect MCUs to ROS 2 via agents. The transport carries XRCE messages between client and agent, so it determines whether any higher-level behavior is possible at all. At a minimum you should be able to name the primary entities involved (the link, the framing layer, the agent endpoint), identify where configuration lives (device and baud rate for serial, address and port for UDP), and explain how UART and UDP differ in behavior. When you debug a system, inspect framing and retries first because those details surface mismatches early. The practical goal is to build a mental map that connects the API knobs you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.
Deep Dive into the concept
A deeper look at Embedded Transport Basics starts by tracing a message from the API surface to the wire. When you select UART or UDP, the XRCE client hands each serialized message to that transport, which is responsible for delivering it to the agent. The mapping is not always one-to-one: a single parameter can affect several runtime behaviors, including buffering, fragmentation, and timing. UDP preserves message boundaries but can lose datagrams; a UART is a byte stream with no boundaries, so it needs framing to mark where each message starts and ends. This is why a change in framing or baud rate can stop all data even though both ends are running. The useful diagnostic strategy is to observe the link (do bytes or packets arrive at all), then the session (did the agent accept the client), and finally the runtime state (queues, deadlines, timers).
Failure modes cluster around mismatched assumptions. If retries are configured incorrectly, you may see data on a clean link but not on a lossy one, or find that messages arrive but are rejected silently. If the MTU is too small, large messages must be fragmented or are dropped while small ones still pass, so the system looks healthy until the payload grows. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.
How this fits on projects
This concept directly shapes how you implement and validate Project 10. You will configure it, observe it, and stress it under controlled conditions.
Definitions & key terms
- UART: an asynchronous serial link that delivers a raw byte stream. Both ends must agree on baud rate and framing.
- UDP: a datagram protocol that preserves message boundaries but guarantees neither delivery nor ordering.
- framing: the delimiters, escaping, and checksum that let a receiver find message boundaries in a byte stream.
- retries: retransmission of messages that were not acknowledged, used by reliable streams on top of a lossy transport.
- MTU: the largest payload the transport carries in one unit. Larger messages must be fragmented or rejected.
Mental model diagram (ASCII)
[User Code] -> [Embedded Transport Basics] -> [rmw/DDS] -> [Wire/Runtime Effects]
     |                     |                      |                  |
 Config/API             Policies               Entities        Observability
How it works (step-by-step, with invariants and failure modes)
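- The firmware selects a transport and its parameters (serial device and baud rate, or agent IP address and port).
- The agent is started with the matching transport. Invariant: both ends must use the same transport type and parameters.
- On serial links, each XRCE message is wrapped in a frame; on UDP, each message travels in a datagram.
- Lost or corrupted messages are dropped. Reliable streams retry them; best-effort streams do not.
- Observability tools (agent logs, serial monitor, packet capture) confirm that messages cross the link and fit within the MTU.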
Minimal concrete example
/dev/ttyUSB0 @ 115200, XRCE framing
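Because a UART delivers bytes with no message boundaries, serial transports wrap each message in a frame. The sketch below shows only the byte-stuffing part of an HDLC-style scheme, using the conventional flag `0x7E`, escape `0x7D`, and XOR `0x20` values. It is a simplified illustration: a real XRCE serial frame also carries addressing, a length, and a CRC, all omitted here, and the `frame_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_FLAG 0x7Eu  /* marks the start and end of a frame */
#define FRAME_ESC  0x7Du  /* escape prefix for reserved bytes */
#define FRAME_XOR  0x20u  /* escaped byte = original ^ FRAME_XOR */

/* Writes FLAG + stuffed payload + FLAG into out.
   Returns the number of bytes written, or 0 if out is too small. */
size_t frame_encode(const uint8_t *in, size_t in_len,
                    uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    if (out_cap < 2) {
        return 0;
    }
    out[n++] = FRAME_FLAG;
    for (size_t i = 0; i < in_len; i++) {
        uint8_t b = in[i];
        if (b == FRAME_FLAG || b == FRAME_ESC) {
            if (n + 2 > out_cap) {
                return 0;
            }
            out[n++] = FRAME_ESC;
            out[n++] = (uint8_t)(b ^ FRAME_XOR);
        } else {
            if (n + 1 > out_cap) {
                return 0;
            }
            out[n++] = b;
        }
    }
    if (n + 1 > out_cap) {
        return 0;
    }
    out[n++] = FRAME_FLAG;
    return n;
}

/* Decodes exactly one complete frame (FLAG ... FLAG).
   Returns false on a malformed frame or if out is too small. */
bool frame_decode(const uint8_t *in, size_t in_len,
                  uint8_t *out, size_t out_cap, size_t *out_len)
{
    if (in_len < 2 || in[0] != FRAME_FLAG || in[in_len - 1] != FRAME_FLAG) {
        return false;
    }
    size_t n = 0;
    for (size_t i = 1; i + 1 < in_len; i++) {
        uint8_t b = in[i];
        if (b == FRAME_FLAG) {
            return false;              /* unescaped flag inside the frame */
        }
        if (b == FRAME_ESC) {
            if (i + 2 >= in_len) {
                return false;          /* escape with no byte after it */
            }
            b = (uint8_t)(in[++i] ^ FRAME_XOR);
        }
        if (n >= out_cap) {
            return false;
        }
        out[n++] = b;
    }
    *out_len = n;
    return true;
}
```

A round trip through `frame_encode` and `frame_decode` on a host machine is a cheap way to test framing logic before you involve the UART at all.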
Common misconceptions
- Assuming defaults are identical across vendors.
- Believing that discovery implies data flow without validating compatibility.
Check-your-understanding questions
- Explain how the choice of transport changes runtime behavior in ROS 2.
- Predict what happens if the client uses UART while the agent listens on UDP.
- Why might two nodes discover each other but still exchange no data?
Check-your-understanding answers
- It sets the loss, ordering, and size limits under which XRCE operates, which in turn bound the latency and reliability the ROS 2 graph can see.
- The agent never receives a valid message, so no session is created and the node never appears in the graph.
- QoS or policy mismatch prevents writer-reader matching or delivery.
Real-world applications
- sensor boards
- edge telemetry
Where you’ll apply it
- You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
- Also used in: P11-multi-robot-swarm-global-data-space.md and other projects in this series.
References
- Embedded serial comms guides
- micro-ROS transport docs
Key insights
- The transport sits below XRCE: if bytes do not cross the link intact, no session, topic, or QoS setting above it matters.
Summary
This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.
Homework/Exercises to practice the concept
- Capture or log a minimal trace where this concept is visible.
- Change one policy/setting and predict the system impact before running it.
- Explain the failure mode you expect if the configuration is wrong.
Solutions to the homework/exercises
- The trace should show the concept-specific fields or events you expect.
- Your prediction should name which endpoints match and how latency/loss changes.
- A wrong configuration should lead to mismatch, dropped data, or timeouts.
3. Project Specification
3.1 What You Will Build
A Micro-ROS node on an ESP32 that publishes sensor data into a ROS 2 graph via the XRCE agent.
Included features:
- Deterministic startup with explicit configuration.
- Observability (logs/CLI output) that exposes discovery/data flow.
- A reproducible demo and a failure case.
Excluded on purpose:
- Full robot control stacks or SLAM pipelines.
- Custom GUIs beyond CLI output.
3.2 Functional Requirements
- **Static memory constraints:** Avoid dynamic allocation.
- **Transport setup:** UART vs UDP.
- **Agent configuration:** Bridge into DDS.
- Deterministic startup: The project must start with a reproducible, logged configuration.
- Observability: Provide CLI or log output that confirms each major component is working.
3.3 Non-Functional Requirements
- Performance: Must meet the throughput/latency targets documented in the benchmark.
- Reliability: Must handle common network or runtime failures gracefully.
- Usability: CLI flags and logs must make configuration and diagnosis obvious.
3.4 Example Usage / Output
$ micro-ros-agent udp4 --port 8888
[INFO] client connected
$ ros2 topic echo /esp32/analog
3.5 Data Formats / Schemas / Protocols
micro-ros-agent udp4 --port 8888
ros2 topic echo /esp32/analog
3.6 Edge Cases
- Agent down
- Transport mismatch
- Memory pool exhausted
3.7 Real World Outcome
By the end of this project you will have a reproducible system that produces the same observable signals every time you run it. You will be able to point to console output, captured packets, or bag files and explain exactly why the result is correct. You will also be able to force a failure and demonstrate a clean error path.
3.7.1 How to Run (Copy/Paste)
# Build
colcon build --packages-select project_10
# Run
source install/setup.bash
# Start the main node/tool
./run_project_10.sh
3.7.2 Golden Path Demo (Deterministic)
$ micro-ros-agent udp4 --port 8888
[INFO] client connected
$ ros2 topic echo /esp32/analog
3.7.3 Failure Demo (Deterministic)
$ ros2 topic echo /esp32/analog
[WARN] no publisher (agent not running)
4. Solution Architecture
4.1 High-Level Design
[Input/Config] -> [Core Engine] -> [ROS 2/DDS] -> [Observability Output]
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| MCU Client | Micro-ROS node on ESP32 | Static memory setup |
| XRCE Agent | Bridge MCU to DDS | Transport configuration |
| Telemetry Topic | Publish sensor data | Rate limiting |
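The rate-limiting decision listed for the telemetry topic can be sketched as a small helper that decides whether enough time has passed since the last publish. This is an illustrative sketch: `rate_limiter_t` and `rate_limiter_allow` are hypothetical names, and the millisecond timestamp is assumed to come from whatever tick source your firmware provides.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t period_ms;  /* minimum spacing between publishes */
    uint32_t last_ms;    /* timestamp of the last allowed publish */
    bool     primed;     /* false until the first publish is allowed */
} rate_limiter_t;

/* Returns true if a publish is allowed at now_ms and records the time.
   Unsigned subtraction keeps the comparison correct when the 32-bit
   millisecond counter wraps around. */
bool rate_limiter_allow(rate_limiter_t *rl, uint32_t now_ms)
{
    if (!rl->primed || (uint32_t)(now_ms - rl->last_ms) >= rl->period_ms) {
        rl->last_ms = now_ms;
        rl->primed = true;
        return true;
    }
    return false;
}
```

Capping the publish rate on the client keeps the link and the agent from being flooded, and it makes the expected message count per second a known quantity you can check in tests.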
4.3 Data Structures (No Full Code)
struct SensorMsg {
float analog;
uint32_t seq;
};
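The `seq` field makes message loss measurable on the receiving side. As a hedged sketch of how a subscriber might use it, the tracker below counts gaps in the sequence; `seq_tracker_t` and `seq_tracker_update` are hypothetical names, and `SensorMsg` is redeclared as a typedef only so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float    analog;
    uint32_t seq;
} SensorMsg;

typedef struct {
    uint32_t expected;  /* next sequence number we expect to see */
    bool     started;   /* false until the first message arrives */
    uint32_t received;  /* every message passed to the tracker */
    uint32_t lost;      /* sequence numbers skipped (upper bound) */
    uint32_t stale;     /* messages older than 'expected' (late or duplicate) */
} seq_tracker_t;

/* Updates the counters for one received message. Modular arithmetic on
   uint32_t keeps gap detection correct across sequence wraparound. */
void seq_tracker_update(seq_tracker_t *t, const SensorMsg *m)
{
    assert(t != NULL && m != NULL);
    t->received++;
    if (t->started) {
        uint32_t gap = m->seq - t->expected;
        if (gap >= 0x80000000u) {
            /* Behind the expected value: a late arrival or a duplicate.
               It may already have been counted as lost, so 'lost' is an
               upper bound rather than an exact figure. */
            t->stale++;
            return;
        }
        t->lost += gap;
    }
    t->started = true;
    t->expected = m->seq + 1u;
}
```

Start from a zero-initialized tracker (`seq_tracker_t t = {0};`). Comparing `lost` against `received` under a best-effort stream gives you a concrete loss figure for the stress-condition measurement.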
4.4 Algorithm Overview
Key Algorithm: Core Pipeline
- Start agent
- Connect MCU
- Create node and publisher
- Publish samples
Complexity Analysis:
- Time: O(n) over messages/events processed
- Space: O(1) to O(n) depending on buffering
5. Implementation Guide
5.1 Development Environment Setup
# Install ROS 2 and dependencies
sudo apt-get update
sudo apt-get install -y ros-$ROS_DISTRO-ros-base python3-colcon-common-extensions
5.2 Project Structure
project-root/
|-- src/
| |-- main.cpp
| |-- config.yaml
| `-- utils.cpp
|-- scripts/
| `-- run_project.sh
|-- tests/
| `-- test_core.py
`-- README.md
5.3 The Core Question You’re Answering
“How can a microcontroller join a ROS 2 system without full DDS?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- XRCE Client/Agent Model
- What breaks if this is misconfigured?
- How will you observe it?
- Static Memory Pools
- What breaks if this is misconfigured?
- How will you observe it?
- Embedded Transport Basics
- What breaks if this is misconfigured?
- How will you observe it?
5.5 Questions to Guide Your Design
- Which transport is best for your hardware?
- How will you handle memory limits?
5.6 Thinking Exercise
Estimate the maximum number of publishers your MCU can handle with static pools.
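One way to approach the estimate is to write the arithmetic down as code. The model below is deliberately crude and every number you feed it is an assumption: `publisher_cost_t`, `publisher_bytes`, and `max_publishers` are hypothetical names, and real per-entity costs must be measured from your own build (for example from the linker map), not taken from this sketch.

```c
#include <stddef.h>

/* Assumed memory cost of one publisher. All fields are estimates. */
typedef struct {
    size_t entity_bytes;   /* bookkeeping per publisher */
    size_t msg_bytes;      /* size of one serialized message */
    size_t history_depth;  /* messages buffered per publisher */
} publisher_cost_t;

/* Bytes one publisher consumes under this model. */
size_t publisher_bytes(const publisher_cost_t *c)
{
    return c->entity_bytes + c->msg_bytes * c->history_depth;
}

/* Publishers that fit in ram_budget after fixed_overhead is subtracted.
   Returns 0 if the overhead alone exceeds the budget. */
size_t max_publishers(size_t ram_budget, size_t fixed_overhead,
                      const publisher_cost_t *c)
{
    size_t per_publisher = publisher_bytes(c);
    if (per_publisher == 0 || ram_budget <= fixed_overhead) {
        return 0;
    }
    return (ram_budget - fixed_overhead) / per_publisher;
}
```

For instance, with an assumed 32768-byte budget, 8192 bytes of fixed overhead, and publishers costing 512 + 64 × 4 = 768 bytes each, the model allows (32768 − 8192) / 768 = 32 publishers. The build-time entity limits of the middleware may cap the number lower than the memory estimate does.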
5.7 The Interview Questions They’ll Ask
- “Why does micro-ROS use XRCE?”
- “What is the role of the agent?”
5.8 Hints in Layers
Hint 1: Start agent first
Hint 2: Use a minimal publisher example
Hint 3: Pre-allocate buffers. Confirm pool sizes at startup and log allocation failures.
Hint 4: Verify transport with a loopback test. Send test bytes over UART/UDP before launching ROS 2 code.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Embedded | “Making Embedded Systems” | Ch. 1-3 |
5.10 Implementation Phases
Phase 1: Foundation (3-4 days)
Goals:
- Reproduce the baseline example from the original project outline.
- Validate toolchain, dependencies, and environment variables.
Tasks:
- Create the repository and baseline project structure.
- Run a minimal example to confirm discovery/data flow.
Checkpoint: You can reproduce the minimal example and collect logs.
Phase 2: Core Functionality (3-4 weeks)
Goals:
- Implement the full feature set from the requirements.
- Instrument key metrics and logs.
Tasks:
- Implement each component and integrate them.
- Add CLI/config flags for core parameters.
Checkpoint: Golden path demo succeeds with deterministic output.
Phase 3: Polish & Edge Cases (3-5 days)
Goals:
- Handle failure scenarios and document them.
- Create a short report/README describing results.
Tasks:
- Add error handling, timeouts, and validation.
- Capture failure demo output and metrics.
Checkpoint: Failure demo yields the expected errors and exit codes.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Transport | UDP over Wi-Fi, serial (UART) | UDP for baseline | Simplest to observe and debug |
| QoS | Default, tuned | Default then tune | Establish baseline before optimization |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate parsers and helpers | Packet decoder, config parser |
| Integration Tests | End-to-end ROS 2 flow | Publisher -> Subscriber -> Metrics |
| Edge Case Tests | Failures & mismatches | Wrong domain ID, missing config |
6.2 Critical Test Cases
- Test 1: Baseline message flow works end-to-end.
- Test 2: Configuration mismatch produces a clear, actionable error.
- Test 3: Performance/latency stays within documented bounds.
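Test 3 needs a concrete pass/fail rule. A minimal sketch, assuming you already collect per-message latency samples in microseconds: accumulate minimum, maximum, and mean, then compare the maximum against the documented bound. The `latency_*` names are hypothetical, and how you timestamp messages is outside the scope of this example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t min_us;
    uint32_t max_us;
    uint64_t sum_us;  /* 64-bit so long runs do not overflow */
    uint32_t count;
} latency_stats_t;

void latency_init(latency_stats_t *s)
{
    s->min_us = UINT32_MAX;
    s->max_us = 0;
    s->sum_us = 0;
    s->count = 0;
}

void latency_add(latency_stats_t *s, uint32_t sample_us)
{
    if (sample_us < s->min_us) {
        s->min_us = sample_us;
    }
    if (sample_us > s->max_us) {
        s->max_us = sample_us;
    }
    s->sum_us += sample_us;
    s->count++;
}

/* Mean latency in microseconds, or 0 when no samples were recorded. */
uint32_t latency_mean_us(const latency_stats_t *s)
{
    return s->count ? (uint32_t)(s->sum_us / s->count) : 0u;
}

/* Passes only if at least one sample exists and none exceeded the bound.
   An empty run is treated as a failure, since it proves nothing. */
bool latency_within(const latency_stats_t *s, uint32_t bound_us)
{
    return s->count > 0 && s->max_us <= bound_us;
}
```

Checking the maximum rather than the mean is a design choice: a bound on the mean can hide occasional long stalls, which are usually what matters on an embedded link.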
6.3 Test Data
Use a fixed dataset or fixed random seed to make metrics reproducible.
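A fixed seed only helps if the generator itself is deterministic across platforms. One option, sketched here under that assumption, is a tiny xorshift generator that you own, instead of the C library's `rand()`, whose sequence varies between implementations. `fake_adc_sample` is a hypothetical helper that maps the output to a simulated 12-bit ADC reading.

```c
#include <stdint.h>

/* 32-bit xorshift generator (shift constants 13, 17, 5).
   The state must be seeded with a non-zero value, because a zero
   state stays zero forever. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Simulated 12-bit ADC reading in the range 0..4095. */
uint16_t fake_adc_sample(uint32_t *state)
{
    return (uint16_t)(xorshift32(state) & 0x0FFFu);
}
```

Seeding two runs with the same value yields the same sample stream, so metrics computed from it can be compared run to run and machine to machine.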
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| QoS mismatch | Discovery works but no data | Align policies explicitly |
| Misconfigured env vars | No nodes discovered | Print and validate env on startup |
| Network filtering | Intermittent data | Check firewall and multicast settings |
7.2 Debugging Strategies
- Start from the graph: confirm discovery before tuning QoS.
- Capture packets: confirm that XRCE traffic from the ESP32 reaches the agent's UDP port (8888 in the examples) and that RTPS traffic appears on the host side.
7.3 Performance Traps
If throughput is low, check for unnecessary serialization, small history depth, or lack of shared memory.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add verbose logging and a dry-run mode.
- Add a simple configuration file parser.
8.2 Intermediate Extensions
- Add metrics export to CSV or JSON.
- Add automated regression tests.
8.3 Advanced Extensions
- Implement cross-vendor compatibility validation.
- Add chaos testing with randomized loss/latency patterns.
9. Real-World Connections
9.1 Industry Applications
- Fleet robotics where reliability must be guaranteed under lossy Wi-Fi.
- Industrial systems that require deterministic startup and clear failure modes.
9.2 Related Open Source Projects
- ROS 2 core repositories (rcl, rmw, rosidl)
- DDS vendors: Fast DDS, Cyclone DDS
9.3 Interview Relevance
- Explain QoS compatibility and discovery failures.
- Describe how to debug why nodes discover but do not communicate.
10. Resources
10.1 Essential Reading
- “Mastering ROS 2 for Robotics Programming” (focus on the sections related to XRCE Client/Agent Model)
- ROS 2 official docs for the specific APIs used in this project
10.2 Video Resources
- ROS 2 community talks on middleware and DDS
- Vendor tutorials on discovery and QoS
10.3 Tools & Documentation
- ROS 2 CLI and rclcpp/rclpy docs
- Wireshark or tcpdump for network visibility
10.4 Related Projects in This Series
- Project 9: Builds prerequisite concepts
- Project 11: Extends the middleware layer
11. Self-Assessment Checklist
11.1 Understanding
- I can explain XRCE Client/Agent Model without notes
- I can explain how QoS and discovery interact
- I understand why the system fails when policies mismatch
11.2 Implementation
- All functional requirements are met
- Golden path demo succeeds
- Failure demo produces expected errors
11.3 Growth
- I can explain this project in a technical interview
- I documented lessons learned and configs
- I can reproduce the results on another machine
12. Submission / Completion Criteria
Minimum Viable Completion:
- Golden path demo output matches documentation
- At least one failure scenario is documented
- Metrics or logs demonstrate correct behavior
Full Completion:
- All minimum criteria plus:
- Compatibility verified across at least two QoS settings
- Results written to a short report
Excellence (Going Above & Beyond):
- Automated regression tests for discovery/QoS behavior
- Clear compatibility matrix or benchmark chart