Project 9: The Encrypted Robot (SROS2)

A secure ROS 2 graph where only nodes with valid certificates can communicate.

Quick Reference

Attribute	Value
Difficulty	Level 4: Expert
Time Estimate	2-3 weeks
Main Programming Language	Bash
Alternative Programming Languages	XML
Coolness Level	Level 5: Pure Magic
Business Potential	3. The Service & Support Model
Prerequisites	PKI basics, ROS 2 security env vars, openssl
Key Topics	DDS-Security, Enclave Policies, PKI Basics

1. Learning Objectives

By completing this project, you will:

Explain how DDS-Security affects ROS 2 behavior in this project.
Implement the core pipeline for Project 9 and validate it with a deterministic demo.
Measure and document performance or correctness under at least one stress condition.
Produce artifacts (configs, logs, scripts) that make the system reproducible.

2. All Theory Needed (Per-Concept Breakdown)

DDS-Security

Fundamentals

DDS-Security is the OMG specification for a set of DDS plugins that provide authentication, access control, and encryption. In ROS 2 it determines which participants may join the graph, which topics each one may publish or subscribe to, and whether traffic is protected on the wire. At a minimum you should be able to name the primary entities involved (participants, certificate authorities, governance and permissions documents), identify where configuration lives (the keystore and the ROS_SECURITY_* environment variables), and explain how authentication and encryption influence behavior. When you debug a secure system, inspect access control and governance first, because a missing rule or a mismatched protection setting usually surfaces as nodes that start but never exchange data. The practical goal is to build a mental map that connects the settings you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.

Deep Dive into the concept

A deeper look at DDS-Security starts by tracing a secure node from startup to the middleware. ROS 2 does not implement the cryptography itself. When security is enabled, the rmw layer locates the enclave's files in the keystore and hands them to the DDS vendor's security plugins, which authenticate peers during discovery, check permissions when endpoints are created and matched, and protect RTPS traffic as governance requires. The mapping is not always one-to-one: a single governance setting can affect discovery, matching, and payload protection at once. This is why a simple change in access control can cause a subscriber to stop receiving data, or why two vendors can discover each other but never exchange payloads. A useful diagnostic order is to observe the graph (who matched), then the transport (what packets appear), and finally the runtime state (queues, deadlines, timers).

Failure modes cluster around mismatched assumptions. If governance is configured incorrectly, you may see data on one machine but not another, or discover that messages arrive but are rejected silently. If permissions are too restrictive, you will observe a graph that looks healthy but never transitions into active data flow. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.
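One way to "log the effective policy" is to record a hash of each policy file at startup, so a log can be tied to the exact documents that were loaded. The following is a minimal sketch under assumptions: log_policy_fingerprint is a hypothetical helper (not part of ROS 2 or SROS2), and it assumes sha256sum from GNU coreutils is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<sha256>  <path>" for each policy file so a run
# can be tied to the exact governance/permissions documents it used.
log_policy_fingerprint() {
    local file
    for file in "$@"; do
        if [ ! -r "$file" ]; then
            echo "policy file not readable: $file" >&2
            return 1
        fi
        sha256sum "$file" || return 1
    done
}
```

Calling it on the signed governance and permissions files of an enclave before launch, and saving the output next to the run's logs, gives a cheap version identifier for the policy.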

How this fits on projects

This concept directly shapes how you implement and validate Project 9. You will configure it, observe it, and stress it under controlled conditions.

Definitions & key terms

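authentication: Proof of a participant's identity. Each participant presents an X.509 certificate signed by a trusted identity CA and completes a handshake with its peer during discovery.
encryption: Cryptographic protection of RTPS traffic. Depending on governance settings, whole messages, submessages, or only serialized payloads are encrypted, signed, or left in the clear.
access control: The check of what an authenticated participant may do: join a domain, and publish or subscribe to specific topics.
governance: A signed, domain-wide document that states what must be protected, such as discovery traffic, liveliness, and individual topics.
permissions: A signed, per-participant document that binds a certificate subject name to allow and deny rules and a validity period.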

Mental model diagram (ASCII)

[User Code] -> [DDS-Security] -> [rmw/DDS] -> [Wire/Runtime Effects]
       |             |               |                 |
   Config/API     Policies        Entities         Observability

How it works (step-by-step, with invariants and failure modes)

A node is started with security enabled through the ROS_SECURITY_* environment variables and an enclave.
The rmw layer passes the enclave's certificate, private key, and signed governance and permissions documents to the DDS security plugins.
Peers authenticate each other during discovery, then apply access control using governance and permissions.
Permitted endpoints match and exchange protected traffic; endpoints that fail a check never match.
Observability tools (logs, CLI, packet capture) confirm which permissions took effect.

Minimal concrete example

ROS_SECURITY_ENABLE=true ROS_SECURITY_STRATEGY=Enforce
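Because a missing or misspelled variable tends to show up later as "no data" rather than as an error, a pre-flight check can help. This is a sketch, not an official tool: check_ros_security_env is a hypothetical function name, and it assumes this project always wants the Enforce strategy.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail fast when the ROS 2 security environment
# is incomplete, instead of discovering it later as missing data.
check_ros_security_env() {
    local status=0
    if [ "${ROS_SECURITY_ENABLE:-}" != "true" ]; then
        echo "ROS_SECURITY_ENABLE must be 'true' (got '${ROS_SECURITY_ENABLE:-}')" >&2
        status=1
    fi
    if [ "${ROS_SECURITY_STRATEGY:-}" != "Enforce" ]; then
        echo "ROS_SECURITY_STRATEGY must be 'Enforce' (got '${ROS_SECURITY_STRATEGY:-}')" >&2
        status=1
    fi
    if [ ! -d "${ROS_SECURITY_KEYSTORE:-}" ]; then
        echo "ROS_SECURITY_KEYSTORE is not a directory (got '${ROS_SECURITY_KEYSTORE:-}')" >&2
        status=1
    fi
    return "$status"
}
```

A launch script could call check_ros_security_env and exit on failure before it starts any node, which keeps the failure close to its cause.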

Common misconceptions

Assuming defaults are identical across vendors.
Believing that discovery implies data flow without validating compatibility.

Check-your-understanding questions

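Explain how DDS-Security changes runtime behavior in ROS 2.
Predict what happens when a participant with security enforced meets a participant that has no credentials.
Why might two nodes discover each other but still exchange no data?

Check-your-understanding answers

It adds an authentication handshake to discovery, permission checks when endpoints are created and matched, and cryptographic protection of RTPS traffic.
The secure participant refuses to match it unless governance explicitly allows unauthenticated participants, so no data flows between them.
Access control or topic protection settings differ between the two sides, or a plain QoS mismatch prevents writer-reader matching or delivery.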

Real-world applications

secure robot fleets
regulated environments

Where you’ll apply it

You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
Also used in: P10-the-micro-edge-micro-ros-on-esp32.md and other projects in this series.

References

DDS-Security spec
SROS2 docs

Key insights

DDS-Security is the lever that connects configuration to observable system behavior.

Summary

This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.

Homework/Exercises to practice the concept

Capture or log a minimal trace where this concept is visible.
Change one policy/setting and predict the system impact before running it.
Explain the failure mode you expect if the configuration is wrong.

Solutions to the homework/exercises

The trace should show the concept-specific fields or events you expect.
Your prediction should name which endpoints match and how latency/loss changes.
A wrong configuration should lead to mismatch, dropped data, or timeouts.

Enclave Policies

Fundamentals

Enclave policies are the SROS2 policy files that define which nodes can publish and subscribe, and to which topics. In ROS 2 an enclave groups the security artifacts that a process loads at startup, so the policy attached to an enclave decides what the nodes using it are allowed to do. At a minimum you should be able to name the primary entities involved (keystore, enclave, governance and permissions documents), identify where configuration lives (under the keystore's enclaves directory), and explain how governance.xml and permissions.xml influence behavior. When you debug a system, inspect the enclave name and the allow/deny rules first, because a wrong enclave path or a missing topic rule surfaces mismatches early. The practical goal is to build a mental map that connects the settings you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.

Deep Dive into the concept

A deeper look at enclave policies starts with the two documents that every enclave carries. governance.xml describes, for a whole domain, what must be protected: whether unauthenticated participants are tolerated, whether discovery and liveliness traffic is protected, and how each topic is treated. permissions.xml describes what one identity may do: its grants list the domains it may join and the topics it may publish or subscribe to. Both are signed by the permissions CA, and the DDS security plugins load the signed .p7s versions when the participant starts. The mapping from policy to behavior is not always one-to-one: a single rule can affect discovery, matching, and delivery. This is why a simple change in an enclave can cause a subscriber to stop receiving data, or why two vendors can discover each other but never exchange payloads. A useful diagnostic order is to observe the graph (who matched), then the transport (what packets appear), and finally the runtime state (queues, deadlines, timers).

Failure modes cluster around mismatched assumptions. If allow/deny rules are configured incorrectly, you may see data on one machine but not another, or discover that messages arrive but are rejected silently. If the configuration is too restrictive, you will observe a graph that looks healthy but never transitions into active data flow. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.

How this fits on projects

This concept directly shapes how you implement and validate Project 9. You will configure it, observe it, and stress it under controlled conditions.

Definitions & key terms

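governance.xml: The domain-wide policy source. It is signed into governance.p7s and states which domains, discovery traffic, and topics must be protected.
permissions.xml: The per-enclave policy source. It is signed into permissions.p7s and lists the grants for one certificate subject name.
enclave: A named directory in the keystore that holds the key, certificates, and signed policy files that a process loads. It is selected with --ros-args --enclave.
allow/deny rules: The <allow_rule> and <deny_rule> elements inside a grant. Each names domains and the topics that may or may not be published or subscribed to; a <default> element decides everything that no rule matches.
configuration: The ROS_SECURITY_KEYSTORE, ROS_SECURITY_ENABLE, and ROS_SECURITY_STRATEGY environment variables plus the enclave argument, which together tell a node which policy to load.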

Mental model diagram (ASCII)

[User Code] -> [Enclave Policies] -> [rmw/DDS] -> [Wire/Runtime Effects]
       |             |               |                 |
   Config/API     Policies        Entities         Observability

How it works (step-by-step, with invariants and failure modes)

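A node is started with --ros-args --enclave and the ROS_SECURITY_* environment variables.
The rmw layer resolves the enclave directory in the keystore and hands the signed governance and permissions files (governance.xml and permissions.xml after signing) to the DDS security plugins.
Each participant verifies the policy signatures, then applies the enclave's allow/deny rules when endpoints are created and matched.
Permitted endpoints exchange data; denied endpoints are refused or never match.
Observability tools (logs, CLI, packet capture) confirm that the configuration behaves as written.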
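The second step above only works if the enclave directory is complete. The sketch below checks for the files that `ros2 security create_enclave` generates on recent distributions; check_enclave_files is a hypothetical helper, and the file list is an assumption worth verifying against your own keystore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that <keystore>/enclaves/<enclave> holds the
# artifacts a secure participant loads at startup.
check_enclave_files() {
    local keystore="$1" enclave="$2"
    # Enclave names start with "/", for example /robot1; strip it for the path.
    local dir="${keystore}/enclaves/${enclave#/}"
    local missing=0 name
    for name in cert.pem key.pem identity_ca.cert.pem permissions_ca.cert.pem \
                governance.p7s permissions.p7s permissions.xml; do
        if [ ! -e "${dir}/${name}" ]; then
            echo "missing: ${dir}/${name}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_enclave_files ./keystore /robot1` returns non-zero and names each absent file, which is quicker to read than a middleware error at launch.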

Minimal concrete example

<allow_rule><domains><id>0</id></domains><publish><topics>...</topics></publish></allow_rule>
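To see the shape of a complete rule, the sketch below prints one allow rule for publishing a single topic. emit_publish_allow_rule is a hypothetical helper; it assumes the usual ROS 2 name mangling in which a ROS topic such as /chatter appears on the DDS side as rt/chatter. In practice the sros2 tooling generates this XML from a policy file, so treat this as an illustration of the structure only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one DDS-Security <allow_rule> that permits
# publishing a single ROS topic in a single domain.
emit_publish_allow_rule() {
    local domain_id="$1" ros_topic="$2"
    # ROS topic names carry an "rt" prefix on the DDS side: /chatter -> rt/chatter.
    local dds_topic="rt/${ros_topic#/}"
    cat <<EOF
<allow_rule>
  <domains>
    <id>${domain_id}</id>
  </domains>
  <publish>
    <topics>
      <topic>${dds_topic}</topic>
    </topics>
  </publish>
</allow_rule>
EOF
}
```

Running `emit_publish_allow_rule 0 /chatter` prints a rule whose only topic is rt/chatter in domain 0. A matching subscribe rule would use a <subscribe> element in place of <publish>.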

Common misconceptions

Assuming defaults are identical across vendors.
Believing that discovery implies data flow without validating compatibility.

Check-your-understanding questions

Explain how enclave policies change runtime behavior in ROS 2.
Predict what happens if governance.xml enables access control for a topic that permissions.xml never mentions.
Why might two nodes discover each other but still exchange no data?

Check-your-understanding answers

They decide which endpoints may be created and matched, and which traffic is protected on the wire.
The grant's default applies. With a default of DENY, the endpoint is refused and no data flows on that topic.
A rule is missing on one side, the two enclaves disagree on protection settings, or a plain QoS mismatch prevents writer-reader matching or delivery.

Real-world applications

multi-tenant robots
least-privilege systems

Where you’ll apply it

You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
Also used in: P10-the-micro-edge-micro-ros-on-esp32.md and other projects in this series.

References

SROS2 policy templates
DDS-Security governance docs

Key insights

Enclave policies are the lever that connects configuration to observable system behavior.

Summary

This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.

Homework/Exercises to practice the concept

Capture or log a minimal trace where this concept is visible.
Change one policy/setting and predict the system impact before running it.
Explain the failure mode you expect if the configuration is wrong.

Solutions to the homework/exercises

The trace should show the concept-specific fields or events you expect.
Your prediction should name which endpoints match and how latency/loss changes.
A wrong configuration should lead to mismatch, dropped data, or timeouts.

PKI Basics

Fundamentals

PKI basics are the public key infrastructure concepts needed to issue and validate DDS certificates. In ROS 2, PKI gives every enclave an identity that peers can verify without sharing secrets in advance. At a minimum you should be able to name the primary entities involved (certificate authority, certificate, private key), identify where configuration lives (the keystore's public, private, and enclaves directories), and explain how the CA and each certificate influence behavior. When you debug a system, inspect the certificate first (its subject, issuer, and validity dates) and then confirm that the private key matches it, because those details surface mismatches early. The practical goal is to build a mental map that connects the settings you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.

Deep Dive into the concept

A deeper look at PKI starts with the chain of signatures behind each identity. A CA holds a private key and a self-signed certificate. It signs each enclave's certificate, and in DDS-Security a permissions CA also signs the governance and permissions documents. ROS 2 does not validate certificates itself: the rmw layer passes the file locations to the DDS security plugins, which check every peer certificate against the trusted CA during the authentication handshake. A small change has wide effects: replacing a private key without reissuing the certificate, or regenerating the CA without re-signing the enclaves, can cause a subscriber to stop receiving data, or leave two vendors able to discover each other but never exchange payloads. A useful diagnostic order is to observe the graph (who matched), then the transport (what packets appear), and finally the runtime state (queues, deadlines, timers).

Failure modes cluster around mismatched assumptions. If a certificate is issued from the wrong CSR or carries the wrong subject name, you may see data on one machine but not another, or discover that messages arrive but are rejected silently. If the trust chain is broken, for example because two machines use different CAs, you will observe nodes that start normally but never transition into active data flow. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.

How this fits on projects

This concept directly shapes how you implement and validate Project 9. You will configure it, observe it, and stress it under controlled conditions.

Definitions & key terms

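CA: Certificate authority. The entity whose private key signs certificates; DDS-Security distinguishes an identity CA and a permissions CA, which may be the same.
certificate: An X.509 document that binds a public key to a subject name and is signed by a CA.
private key: The secret half of a key pair. It proves ownership of a certificate and must stay inside the node's enclave.
CSR: Certificate signing request. It carries a subject name and public key, is signed with the requester's private key, and is submitted to a CA.
trust chain: The sequence of signatures that links a certificate back to a CA the verifier already trusts.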

Mental model diagram (ASCII)

[User Code] -> [PKI Basics] -> [rmw/DDS] -> [Wire/Runtime Effects]
       |             |               |                 |
   Config/API     Policies        Entities         Observability

How it works (step-by-step, with invariants and failure modes)

A CA private key and a self-signed CA certificate are generated once for the keystore.
Each enclave gets a private key and a certificate signed by the CA; with plain openssl this goes through a CSR.
At startup, the rmw layer passes the enclave's certificate, private key, and the CA certificate to the DDS security plugins.
During discovery, peers exchange certificates and each side validates the other's against the CA it trusts.
Observability tools (logs, openssl, packet capture) confirm that the trust chain holds.

Minimal concrete example

openssl req -new -key node.key -out node.csr
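The CSR command above is one step in a longer flow: create a CA, create a node key, request a certificate, sign it, and verify the result. The sketch below runs that flow with plain openssl in a scratch directory. issue_demo_cert, the subject names, the curve, and the validity periods are all illustrative assumptions; SROS2 normally generates its own keystore, so this is for understanding the chain, not for production use.

```shell
#!/usr/bin/env bash
# Sketch of the CA -> key -> CSR -> signed certificate flow with plain openssl.
# All names and lifetimes here are illustrative assumptions.
issue_demo_cert() {
    local dir="$1"

    # 1. Self-signed CA with an EC P-256 key.
    openssl ecparam -name prime256v1 -genkey -noout -out "$dir/ca.key" || return 1
    openssl req -x509 -new -key "$dir/ca.key" -sha256 -days 30 \
        -subj "/CN=demo-ca" -out "$dir/ca.crt" || return 1

    # 2. Node private key and the CSR that carries its public key and subject.
    openssl ecparam -name prime256v1 -genkey -noout -out "$dir/node.key" || return 1
    openssl req -new -key "$dir/node.key" -subj "/CN=robot1" \
        -out "$dir/node.csr" || return 1

    # 3. The CA signs the CSR, producing the node certificate.
    openssl x509 -req -in "$dir/node.csr" -CA "$dir/ca.crt" -CAkey "$dir/ca.key" \
        -CAcreateserial -sha256 -days 7 -out "$dir/node.crt" || return 1

    # 4. Check that the node certificate chains back to the CA.
    openssl verify -CAfile "$dir/ca.crt" "$dir/node.crt"
}
```

Calling `issue_demo_cert "$(mktemp -d)"` should end with openssl reporting the node certificate as OK. Verifying the same certificate against a different CA file is a quick way to see a broken trust chain.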

Common misconceptions

Assuming defaults are identical across vendors.
Believing that discovery implies data flow without validating compatibility.

Check-your-understanding questions

Explain how PKI changes runtime behavior in ROS 2.
Predict what happens if a node's certificate was signed by a CA that its peers do not trust.
Why might two nodes discover each other but still exchange no data?

Check-your-understanding answers

Peers must complete a certificate-based handshake before they match, so identity problems appear as nodes that never connect.
Authentication fails during the handshake, the participants do not match, and no data flows.
An expired or mismatched certificate, or a policy or QoS mismatch after authentication, prevents writer-reader matching or delivery.

Real-world applications

secure node identity
certificate rotation

Where you’ll apply it

You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
Also used in: P10-the-micro-edge-micro-ros-on-esp32.md and other projects in this series.

References

Serious Cryptography
OpenSSL docs

Key insights

PKI is the lever that connects configuration to observable system behavior.

Summary

This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.

Homework/Exercises to practice the concept

Capture or log a minimal trace where this concept is visible.
Change one policy/setting and predict the system impact before running it.
Explain the failure mode you expect if the configuration is wrong.

Solutions to the homework/exercises

The trace should show the concept-specific fields or events you expect.
Your prediction should name which endpoints match and how latency/loss changes.
A wrong configuration should lead to mismatch, dropped data, or timeouts.

3. Project Specification

3.1 What You Will Build

A secure ROS 2 graph where only nodes with valid certificates can communicate.

Included features:

Deterministic startup with explicit configuration.
Observability (logs/CLI output) that exposes discovery/data flow.
A reproducible demo and a failure case.

Excluded on purpose:

Full robot control stacks or SLAM pipelines.
Custom GUIs beyond CLI output.

3.2 Functional Requirements

PKI setup: Create the CAs and certificates.
Governance/permissions: Define the policies.
Verification: Prove encryption on the wire.
Deterministic startup: The project must start with a reproducible, logged configuration.
Observability: Provide CLI or log output that confirms each major component is working.

3.3 Non-Functional Requirements

Performance: Must meet the throughput/latency targets documented in the benchmark.
Reliability: Must handle common network or runtime failures gracefully.
Usability: CLI flags and logs must make configuration and diagnosis obvious.

3.4 Example Usage / Output

$ ros2 run demo_nodes_cpp talker --ros-args --enclave /robot1
[INFO] secure graph established

3.5 Data Formats / Schemas / Protocols

ROS_SECURITY_ENABLE=true
ROS_SECURITY_STRATEGY=Enforce
ROS_SECURITY_KEYSTORE=./keystore

3.6 Edge Cases

Expired certs
Permissions missing topic
Mixed secure/insecure graph
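The expired-certificate case can be caught before launch. The sketch below wraps the `-checkend` option of `openssl x509`, which exits non-zero when a certificate expires within the given number of seconds. cert_valid_for is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the certificate stays valid for at
# least the given number of seconds from now.
cert_valid_for() {
    local cert="$1" seconds="$2"
    openssl x509 -in "$cert" -noout -checkend "$seconds" >/dev/null
}
```

For example, `cert_valid_for keystore/enclaves/robot1/cert.pem 86400` fails if the enclave's certificate expires within a day, which makes it usable as a pre-launch guard or a scheduled check.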

3.7 Real World Outcome

By the end of this project you will have a reproducible system that produces the same observable signals every time you run it. You will be able to point to console output, captured packets, or bag files and explain exactly why the result is correct. You will also be able to force a failure and demonstrate a clean error path.

3.7.1 How to Run (Copy/Paste)

# Build
colcon build --packages-select project_9
# Run
source install/setup.bash
# Start the main node/tool
./run_project_9.sh

3.7.2 Golden Path Demo (Deterministic)

$ ros2 run demo_nodes_cpp talker --ros-args --enclave /robot1
[INFO] secure graph established

3.7.3 Failure Demo (Deterministic)

$ ROS_SECURITY_STRATEGY=Enforce ros2 node list
[ERROR] permission denied for /unauthorized

4. Solution Architecture

4.1 High-Level Design

[Input/Config] -> [Core Engine] -> [ROS 2/DDS] -> [Observability Output]

4.2 Key Components

Component	Responsibility	Key Decisions
Keystore	Generate certs and keys for nodes	Repeatable build
Policy Files	Governance and permissions	Least-privilege rules
Secure Launch	Start nodes with enforced security	Verify encryption

4.3 Data Structures (No Full Code)

governance.xml / permissions.xml (DDS-Security)

4.4 Algorithm Overview

Key Algorithm: Core Pipeline

Generate keystore
Write policies
Launch with security env
Verify
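The four steps can be sketched as a script with a dry-run mode, so the plan can be reviewed before anything touches the keystore. run_step and secure_pipeline are hypothetical names, and the policy file path is supplied by the caller. The `ros2 security` subcommands are the ones provided by the sros2 package on recent distributions; check `ros2 security --help` on yours.

```shell
#!/usr/bin/env bash
# With DRY_RUN=1 (the default here) each command is printed, not executed.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch of the pipeline: keystore, enclave, policy, secure launch.
secure_pipeline() {
    local keystore="$1" enclave="$2" policy="$3"
    # 1. Generate the keystore and the enclave's keys and certificates.
    run_step ros2 security create_keystore "$keystore" || return 1
    run_step ros2 security create_enclave "$keystore" "$enclave" || return 1
    # 2. Turn the policy file into signed permissions for the enclave.
    run_step ros2 security create_permission "$keystore" "$enclave" "$policy" || return 1
    # 3. Launch with security enforced.
    run_step env ROS_SECURITY_ENABLE=true ROS_SECURITY_STRATEGY=Enforce \
        ROS_SECURITY_KEYSTORE="$keystore" \
        ros2 run demo_nodes_cpp talker --ros-args --enclave "$enclave"
    # 4. Verification (packet capture, failure demo) is done separately.
}
```

`secure_pipeline ./keystore /robot1 policy.xml` prints the commands it would run. Setting DRY_RUN=0 executes them, which requires a sourced ROS 2 environment.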

Complexity Analysis:

Time: O(n) over messages/events processed
Space: O(1) to O(n) depending on buffering

5. Implementation Guide

5.1 Development Environment Setup

# Install ROS 2 and dependencies
sudo apt-get update
sudo apt-get install -y ros-$ROS_DISTRO-ros-base python3-colcon-common-extensions

5.2 Project Structure

project-root/
|-- src/
|   |-- main.cpp
|   |-- config.yaml
|   `-- utils.cpp
|-- scripts/
|   `-- run_project.sh
|-- tests/
|   `-- test_core.py
`-- README.md

5.3 The Core Question You’re Answering

“How do I prevent rogue nodes from joining my robot network?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

DDS-Security
- What breaks if this is misconfigured?
- How will you observe it?
Enclave Policies
- What breaks if this is misconfigured?
- How will you observe it?
PKI Basics
- What breaks if this is misconfigured?
- How will you observe it?

5.5 Questions to Guide Your Design

What topics should be allowed in permissions?
Should discovery be encrypted?

5.6 Thinking Exercise

Write a permissions policy that allows only /cmd_vel publishing.

5.7 The Interview Questions They’ll Ask

“What files are required for an enclave?”
“How does DDS-Security enforce access control?”

5.8 Hints in Layers

Hint 1: Use sros2 keystore generation.
Hint 2: Enable ROS_SECURITY_ENABLE.
Hint 3: Run with ROS_SECURITY_STRATEGY=Enforce. Force the system to reject any node without valid credentials.
Hint 4: Use a packet capture. Verify that discovery/data payloads are encrypted when security is enabled.
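For the packet-capture hint, one simple check is to search a capture for a string you know the publisher sends, such as the "Hello World" text from the demo talker. The sketch below assumes a capture file already exists (for example one written with `tcpdump -w`, which usually needs elevated privileges); count_plaintext_hits is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines of a capture file that contain a known
# plaintext marker. A non-zero count means the payload is readable on the wire.
count_plaintext_hits() {
    local capture="$1" marker="$2"
    # -a: treat binary data as text, -F: fixed string, -c: print the match count.
    grep -a -c -F -- "$marker" "$capture"
}
```

Comparing the count for a capture taken without security against one taken with security enabled shows the difference directly. A count of zero only shows that the marker is not visible in cleartext; inspecting the RTPS submessages in Wireshark is a stronger check.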

5.9 Books That Will Help

Topic	Book	Chapter
Security	“Serious Cryptography”	Ch. 2-4

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

Reproduce the baseline example from the original project outline.
Validate toolchain, dependencies, and environment variables.

Tasks:

Create the repository and baseline project structure.
Run a minimal example to confirm discovery/data flow.

Checkpoint: You can reproduce the minimal example and collect logs.

Phase 2: Core Functionality (2-3 weeks)

Goals:

Implement the full feature set from the requirements.
Instrument key metrics and logs.

Tasks:

Implement each component and integrate them.
Add CLI/config flags for core parameters.

Checkpoint: Golden path demo succeeds with deterministic output.

Phase 3: Polish & Edge Cases (3-5 days)

Goals:

Handle failure scenarios and document them.
Create a short report/README describing results.

Tasks:

Add error handling, timeouts, and validation.
Capture failure demo output and metrics.

Checkpoint: Failure demo yields the expected errors and exit codes.
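The failure checkpoint is easier to automate with a small helper that treats a failing command as the expected result. This is a sketch: expect_failure is a hypothetical name, and the text to look for depends on what your middleware actually prints.

```shell
#!/usr/bin/env bash
# Hypothetical test helper: run a command that is supposed to fail, and pass
# only if it exits non-zero and its combined output contains the expected text.
expect_failure() {
    local expected="$1"
    shift
    local output status=0
    output="$("$@" 2>&1)" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "FAIL: command succeeded unexpectedly" >&2
        return 1
    fi
    case "$output" in
        *"$expected"*)
            echo "PASS: exit=$status"
            return 0
            ;;
        *)
            echo "FAIL: output did not contain '$expected'" >&2
            return 1
            ;;
    esac
}
```

Usage follows the pattern `expect_failure "<expected text>" <command> [args...]`, so the failure demo becomes a repeatable test rather than a manual observation.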

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Transport	UDP, shared memory, serial	UDP for baseline	Simplest to observe and debug
QoS	Default, tuned	Default then tune	Establish baseline before optimization

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Validate parsers and helpers	Packet decoder, config parser
Integration Tests	End-to-end ROS 2 flow	Publisher -> Subscriber -> Metrics
Edge Case Tests	Failures & mismatches	Wrong domain ID, missing config

6.2 Critical Test Cases

Test 1: Baseline message flow works end-to-end.
Test 2: Configuration mismatch produces a clear, actionable error.
Test 3: Performance/latency stays within documented bounds.

6.3 Test Data

Use a fixed dataset or fixed random seed to make metrics reproducible.

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
QoS mismatch	Discovery works but no data	Align policies explicitly
Misconfigured env vars	No nodes discovered	Print and validate env on startup
Network filtering	Intermittent data	Check firewall and multicast settings

7.2 Debugging Strategies

Start from the graph: confirm discovery before tuning QoS.
Capture packets: validate that RTPS traffic appears on expected ports.

7.3 Performance Traps

If throughput is low, check for unnecessary serialization, small history depth, or lack of shared memory.

8. Extensions & Challenges

8.1 Beginner Extensions

Add verbose logging and a dry-run mode.
Add a simple configuration file parser.

8.2 Intermediate Extensions

Add metrics export to CSV or JSON.
Add automated regression tests.

8.3 Advanced Extensions

Implement cross-vendor compatibility validation.
Add chaos testing with randomized loss/latency patterns.

9. Real-World Connections

9.1 Industry Applications

Fleet robotics where reliability must be guaranteed under lossy Wi-Fi.
Industrial systems that require deterministic startup and clear failure modes.

9.2 Related Open Source Projects

ROS 2 core repositories (rcl, rmw, rosidl)
DDS vendors: Fast DDS, Cyclone DDS

9.3 Interview Relevance

Explain QoS compatibility and discovery failures.
Describe how to debug why nodes discover but do not communicate.

10. Resources

10.1 Essential Reading

“Mastering ROS 2 for Robotics Programming” (focus on the sections related to DDS-Security)
ROS 2 official docs for the specific APIs used in this project

10.2 Video Resources

ROS 2 community talks on middleware and DDS
Vendor tutorials on discovery and QoS

10.3 Tools & Documentation

ROS 2 CLI and rclcpp/rclpy docs
Wireshark or tcpdump for network visibility

10.4 Related Projects in This Series

Project 8: Builds prerequisite concepts
Project 10: Extends the middleware layer

11. Self-Assessment Checklist

11.1 Understanding

I can explain DDS-Security without notes
I can explain how QoS and discovery interact
I understand why the system fails when policies mismatch

11.2 Implementation

All functional requirements are met
Golden path demo succeeds
Failure demo produces expected errors

11.3 Growth

I can explain this project in a technical interview
I documented lessons learned and configs
I can reproduce the results on another machine

12. Submission / Completion Criteria

Minimum Viable Completion:

Golden path demo output matches documentation
At least one failure scenario is documented
Metrics or logs demonstrate correct behavior

Full Completion:

All minimum criteria plus:
Compatibility verified across at least two QoS settings
Results written to a short report

Excellence (Going Above & Beyond):

Automated regression tests for discovery/QoS behavior
Clear compatibility matrix or benchmark chart
