Project 1: The RTPS Packet Sniffer (Seeing the “Handshake”)

A packet sniffer that listens on DDS discovery ports and decodes RTPS headers, GUIDs, and discovery submessages. It prints participant and endpoint metadata in real time.

Quick Reference

Attribute Value
Difficulty Level 3: Advanced
Time Estimate 2-3 weeks
Main Programming Language Python
Alternative Programming Languages C (libpcap), Go, Rust
Coolness Level Level 4: Hardcore Tech Flex
Business Potential 1. The “Resume Gold”
Prerequisites Linux CLI, UDP fundamentals, Wireshark basics, Python scripting
Key Topics RTPS Packet Structure, Port Mapping Formula, Multicast Discovery, DDS Discovery (PDP/EDP)

1. Learning Objectives

By completing this project, you will:

  1. Explain how RTPS Packet Structure affects ROS 2 behavior in this project.
  2. Implement the core pipeline for Project 1 and validate it with a deterministic demo.
  3. Measure and document performance or correctness under at least one stress condition.
  4. Produce artifacts (configs, logs, scripts) that make the system reproducible.

2. All Theory Needed (Per-Concept Breakdown)

RTPS Packet Structure

Fundamentals

RTPS Packet Structure is the wire-level layout of DDS-RTPS packets used for discovery and data exchange. In ROS 2, this concept defines how nodes coordinate, exchange data, and enforce guarantees. At a minimum you should be able to name the primary entities involved, identify where configuration lives, and explain how RTPS header and submessage IDs influence behavior. When you debug a system, you will almost always inspect endianness flag or GUID prefix first because those details surface mismatches early. The practical goal is to build a mental map that connects the API knobs you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.

Deep Dive into the concept

A deeper look at RTPS Packet Structure starts by tracing data from the API surface to the middleware. Every time you configure RTPS header or submessage IDs, ROS 2 expresses that intent in the rmw layer, which then maps the intent into DDS-RTPS structures. The mapping is not always one-to-one: a single policy or field can affect multiple runtime behaviors, including buffering, matching, and timing. This is why a simple change in endianness flag can cause a subscriber to stop receiving data, or why two vendors can discover each other but never exchange payloads. The useful diagnostic strategy is to observe the graph (who matched), then the transport (what packets appear), and finally the runtime state (queues, deadlines, timers).

Failure modes cluster around mismatched assumptions. If the GUID prefix is configured incorrectly, you may see data on one machine but not another, or discover that messages arrive but are rejected silently. If the vendor ID handling is too restrictive, you will observe a graph that looks healthy but never transitions into active data flow. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.

How this fits into the project

This concept directly shapes how you implement and validate Project 1. You will configure it, observe it, and stress it under controlled conditions.

Definitions & key terms

  • RTPS header: the fixed 20-byte preamble of every RTPS message, carrying the 'RTPS' magic, protocol version, vendor ID, and GUID prefix.
  • submessage IDs: one-byte codes identifying each submessage type (e.g. DATA = 0x15, HEARTBEAT = 0x07, INFO_TS = 0x09).
  • endianness flag: the low bit of each submessage's flags octet; when set, the submessage's fields are little-endian.
  • GUID prefix: the 12-byte identifier shared by all endpoints of one participant; combined with a 4-byte entity ID it forms a full GUID.
  • vendor ID: a 2-byte OMG-assigned code identifying which DDS implementation produced the packet.

Mental model diagram (ASCII)

[User Code] -> [RTPS Packet Structure] -> [rmw/DDS] -> [Wire/Runtime Effects]
       |             |               |                 |
   Config/API     Policies        Entities         Observability

How it works (step-by-step, with invariants and failure modes)

  1. A node configures the concept through API calls or config files.
  2. The rmw layer translates the settings into DDS/RTPS fields (RTPS header, submessage IDs).
  3. Peers evaluate compatibility, matching, or timing using endianness flag and GUID prefix.
  4. The runtime queues or state machines enforce the policy and emit data.
  5. Observability tools (logs, CLI, packet capture) confirm vendor ID behavior.

Minimal concrete example

RTPS | Protocol=2.3 | Vendor=0x010f
SUBMESSAGE: INFO_TS
SUBMESSAGE: DATA(WriterId=0x02c2, Seq=41)
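The decoded lines above can be produced by a few lines of header parsing. This is a minimal sketch; the function name and output format are illustrative, not part of the project skeleton:

```python
import struct

def parse_rtps_header(data: bytes) -> dict:
    """Decode the fixed 20-byte RTPS header: magic, version, vendor ID, GUID prefix."""
    if len(data) < 20 or data[:4] != b"RTPS":
        raise ValueError("not an RTPS packet")
    major, minor = data[4], data[5]
    vendor = struct.unpack(">H", data[6:8])[0]   # vendor ID is big-endian on the wire
    return {
        "version": f"{major}.{minor}",
        "vendor": f"0x{vendor:04x}",
        "guid_prefix": data[8:20].hex(":"),
    }
```

Feeding it the payload of a UDP datagram captured on port 7400 should yield the protocol/vendor line shown in the example above.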

Common misconceptions

  • Assuming defaults are identical across vendors.
  • Believing that discovery implies data flow without validating compatibility.

Check-your-understanding questions

  1. Explain how RTPS Packet Structure changes runtime behavior in ROS 2.
  2. Predict what happens if a decoder ignores the endianness flag when parsing submessages.
  3. Why might two nodes discover each other but still exchange no data?

Check-your-understanding answers

  1. It alters matching, buffering, or timing constraints expressed via DDS/RTPS.
  2. Multi-byte fields such as lengths and sequence numbers are read with the wrong byte order, so the parser misinterprets or rejects otherwise valid submessages.
  3. QoS or policy mismatch prevents writer-reader matching or delivery.

Real-world applications

  • packet sniffers for ROS 2
  • interop debugging between DDS vendors

Where you’ll apply it

  • You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
  • Also used in: P02-the-skeleton-node-c-no-boilerplate.md and other projects in this series.

References

  • OMG DDSI-RTPS Specification
  • Wireshark RTPS dissector docs

Key insights

  • RTPS Packet Structure is the lever that connects configuration to observable system behavior.

Summary

This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.

Homework/Exercises to practice the concept

  1. Capture or log a minimal trace where this concept is visible.
  2. Change one policy/setting and predict the system impact before running it.
  3. Explain the failure mode you expect if the configuration is wrong.

Solutions to the homework/exercises

  1. The trace should show the concept-specific fields or events you expect.
  2. Your prediction should name which endpoints match and how latency/loss changes.
  3. A wrong configuration should lead to mismatch, dropped data, or timeouts.

Port Mapping Formula

Fundamentals

Port Mapping Formula is the deterministic mapping from ROS_DOMAIN_ID / participant ID to UDP port numbers. In ROS 2, this concept defines how nodes coordinate, exchange data, and enforce guarantees. At a minimum you should be able to name the primary entities involved, identify where configuration lives, and explain how 7400 base port and domain ID influence behavior. When you debug a system, you will almost always inspect participant ID or multicast ports first because those details surface mismatches early. The practical goal is to build a mental map that connects the API knobs you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.

Deep Dive into the concept

A deeper look at Port Mapping Formula starts by tracing data from the API surface to the middleware. Every time you configure 7400 base port or domain ID, ROS 2 expresses that intent in the rmw layer, which then maps the intent into DDS-RTPS structures. The mapping is not always one-to-one: a single policy or field can affect multiple runtime behaviors, including buffering, matching, and timing. This is why a simple change in participant ID can cause a subscriber to stop receiving data, or why two vendors can discover each other but never exchange payloads. The useful diagnostic strategy is to observe the graph (who matched), then the transport (what packets appear), and finally the runtime state (queues, deadlines, timers).

Failure modes cluster around mismatched assumptions. If the multicast ports are configured incorrectly, you may see data on one machine but not another, or discover that messages arrive but are rejected silently. If the unicast ports are too restrictive, you will observe a graph that looks healthy but never transitions into active data flow. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.

How this fits into the project

This concept directly shapes how you implement and validate Project 1. You will configure it, observe it, and stress it under controlled conditions.

Definitions & key terms

  • 7400 base port (PB): the start of the DDS port range; every well-known port is an offset from it.
  • domain ID: the ROS_DOMAIN_ID; each domain shifts the port range by the domain gain DG = 250.
  • participant ID: the index of a participant on a host; each participant shifts its unicast ports by the participant gain PG = 2.
  • multicast ports: per-domain ports shared by all participants (e.g. 7400 for discovery in domain 0).
  • unicast ports: per-participant ports for directed traffic (e.g. 7410/7411 for participant 0 in domain 0).

Mental model diagram (ASCII)

[User Code] -> [Port Mapping Formula] -> [rmw/DDS] -> [Wire/Runtime Effects]
       |             |               |                 |
   Config/API     Policies        Entities         Observability

How it works (step-by-step, with invariants and failure modes)

  1. A node configures the concept through API calls or config files.
  2. The rmw layer translates the settings into DDS/RTPS fields (7400 base port, domain ID).
  3. Peers evaluate compatibility, matching, or timing using participant ID and multicast ports.
  4. The runtime queues or state machines enforce the policy and emit data.
  5. Observability tools (logs, CLI, packet capture) confirm unicast ports behavior.

Minimal concrete example

Domain 0, participant 0: 7400 (discovery multicast), 7410 (discovery unicast), 7411 (user unicast)
Domain 5: discovery multicast at 8650 (7400 + 250*5)
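The example above can be checked against the DDSI-RTPS well-known port formula. The sketch below uses the spec's default constants (PB=7400, DG=250, PG=2, offsets d0=0, d1=10, d2=1, d3=11); the function name is illustrative:

```python
# Default DDSI-RTPS port-mapping constants.
PB, DG, PG = 7400, 250, 2
D0, D1, D2, D3 = 0, 10, 1, 11

def rtps_ports(domain_id: int, participant_id: int = 0) -> dict:
    """Compute the four well-known UDP ports for a domain/participant pair."""
    base = PB + DG * domain_id
    return {
        "discovery_multicast": base + D0,
        "user_multicast":      base + D2,
        "discovery_unicast":   base + D1 + PG * participant_id,
        "user_unicast":        base + D3 + PG * participant_id,
    }
```

A second participant on the same host (participant ID 1) shifts only the unicast ports by 2, which is why `ros2 run` instances on one machine occupy 7412/7413, 7414/7415, and so on.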

Common misconceptions

  • Assuming defaults are identical across vendors.
  • Believing that discovery implies data flow without validating compatibility.

Check-your-understanding questions

  1. Explain how Port Mapping Formula changes runtime behavior in ROS 2.
  2. Predict what happens if two nodes run with different ROS_DOMAIN_ID values.
  3. Why might two nodes discover each other but still exchange no data?

Check-your-understanding answers

  1. It alters matching, buffering, or timing constraints expressed via DDS/RTPS.
  2. They compute different base ports, so discovery datagrams never reach each other and the nodes stay invisible to one another.
  3. QoS or policy mismatch prevents writer-reader matching or delivery.

Real-world applications

  • firewall rules for ROS 2
  • multi-robot domain isolation

Where you’ll apply it

  • You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
  • Also used in: P02-the-skeleton-node-c-no-boilerplate.md and other projects in this series.

References

  • DDSI-RTPS spec section on port calculation
  • ROS 2 networking guides

Key insights

  • Port Mapping Formula is the lever that connects configuration to observable system behavior.

Summary

This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.

Homework/Exercises to practice the concept

  1. Capture or log a minimal trace where this concept is visible.
  2. Change one policy/setting and predict the system impact before running it.
  3. Explain the failure mode you expect if the configuration is wrong.

Solutions to the homework/exercises

  1. The trace should show the concept-specific fields or events you expect.
  2. Your prediction should name which endpoints match and how latency/loss changes.
  3. A wrong configuration should lead to mismatch, dropped data, or timeouts.

Multicast Discovery

Fundamentals

Multicast Discovery is how ROS 2 participants use multicast to discover each other without a central server. In ROS 2, this concept defines how nodes coordinate, exchange data, and enforce guarantees. At a minimum you should be able to name the primary entities involved, identify where configuration lives, and explain how 239.255.0.1 and PDP announcements influence behavior. When you debug a system, you will almost always inspect IGMP or multicast TTL first because those details surface mismatches early. The practical goal is to build a mental map that connects the API knobs you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.

Deep Dive into the concept

A deeper look at Multicast Discovery starts by tracing data from the API surface to the middleware. Every time you configure 239.255.0.1 or PDP announcements, ROS 2 expresses that intent in the rmw layer, which then maps the intent into DDS-RTPS structures. The mapping is not always one-to-one: a single policy or field can affect multiple runtime behaviors, including buffering, matching, and timing. This is why a simple change in IGMP can cause a subscriber to stop receiving data, or why two vendors can discover each other but never exchange payloads. The useful diagnostic strategy is to observe the graph (who matched), then the transport (what packets appear), and finally the runtime state (queues, deadlines, timers).

Failure modes cluster around mismatched assumptions. If multicast TTL is configured incorrectly, you may see data on one machine but not another, or discover that messages arrive but are rejected silently. If switch filtering is too restrictive, you will observe a graph that looks healthy but never transitions into active data flow. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.

How this fits into the project

This concept directly shapes how you implement and validate Project 1. You will configure it, observe it, and stress it under controlled conditions.

Definitions & key terms

  • 239.255.0.1: the default multicast group that RTPS participants join to send and receive discovery announcements.
  • PDP announcements: periodic SPDP datagrams advertising a participant's GUID prefix, locators, and capabilities.
  • IGMP: the protocol hosts use to tell routers and switches which multicast groups they want to receive.
  • multicast TTL: the hop limit on multicast datagrams; a TTL of 1 confines discovery to the local subnet.
  • switch filtering: IGMP snooping on managed switches, which forwards multicast frames only to ports that joined the group.

Mental model diagram (ASCII)

[User Code] -> [Multicast Discovery] -> [rmw/DDS] -> [Wire/Runtime Effects]
       |             |               |                 |
   Config/API     Policies        Entities         Observability

How it works (step-by-step, with invariants and failure modes)

  1. A node configures the concept through API calls or config files.
  2. The rmw layer translates the settings into DDS/RTPS fields (239.255.0.1, PDP announcements).
  3. Peers evaluate compatibility, matching, or timing using IGMP and multicast TTL.
  4. The runtime queues or state machines enforce the policy and emit data.
  5. Observability tools (logs, CLI, packet capture) confirm switch filtering behavior.

Minimal concrete example

udp dst 239.255.0.1:7400 -> RTPS (PDP)
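Receiving those SPDP datagrams takes only a multicast join. A minimal sketch, assuming port 7400 is free and the default interface should receive the group (helper names are illustrative):

```python
import socket
import struct

SPDP_GROUP = "239.255.0.1"

def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure passed to IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))

def open_spdp_socket(port: int = 7400) -> socket.socket:
    """Bind a UDP socket and join the well-known SPDP multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(SPDP_GROUP))
    return sock

# data, addr = open_spdp_socket().recvfrom(65535)  # payload should start with b"RTPS"
```

SO_REUSEADDR matters here: DDS participants on the same host already bind 7400, and without it the sniffer cannot share the port.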

Common misconceptions

  • Assuming defaults are identical across vendors.
  • Believing that discovery implies data flow without validating compatibility.

Check-your-understanding questions

  1. Explain how Multicast Discovery changes runtime behavior in ROS 2.
  2. Predict what happens if a switch with IGMP snooping filters the 239.255.0.1 group.
  3. Why might two nodes discover each other but still exchange no data?

Check-your-understanding answers

  1. It alters matching, buffering, or timing constraints expressed via DDS/RTPS.
  2. PDP announcements never cross the switch, so participants on opposite ports never discover each other in the first place.
  3. QoS or policy mismatch prevents writer-reader matching or delivery.

Real-world applications

  • plug-and-play robots on LANs
  • lab networks with multicast enabled

Where you’ll apply it

  • You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
  • Also used in: P02-the-skeleton-node-c-no-boilerplate.md and other projects in this series.

References

  • TCP/IP Illustrated Vol 1 (multicast)
  • ROS 2 discovery documentation

Key insights

  • Multicast Discovery is the lever that connects configuration to observable system behavior.

Summary

This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.

Homework/Exercises to practice the concept

  1. Capture or log a minimal trace where this concept is visible.
  2. Change one policy/setting and predict the system impact before running it.
  3. Explain the failure mode you expect if the configuration is wrong.

Solutions to the homework/exercises

  1. The trace should show the concept-specific fields or events you expect.
  2. Your prediction should name which endpoints match and how latency/loss changes.
  3. A wrong configuration should lead to mismatch, dropped data, or timeouts.

DDS Discovery (PDP/EDP)

Fundamentals

DDS Discovery (PDP/EDP) is the two-stage discovery protocol that announces participants (PDP) and endpoints (EDP). In ROS 2, this concept defines how nodes coordinate, exchange data, and enforce guarantees. At a minimum you should be able to name the primary entities involved, identify where configuration lives, and explain how participant discovery and endpoint discovery influence behavior. When you debug a system, you will almost always inspect built-in endpoints or SPDP first because those details surface mismatches early. The practical goal is to build a mental map that connects the API knobs you change to the wire-level or runtime effects you observe. If you can explain this concept without naming a single ROS 2 command, you know it as a systems principle rather than a tooling trick, which is exactly what you need for production robotics.

Deep Dive into the concept

A deeper look at DDS Discovery (PDP/EDP) starts by tracing data from the API surface to the middleware. Every time you configure participant discovery or endpoint discovery, ROS 2 expresses that intent in the rmw layer, which then maps the intent into DDS-RTPS structures. The mapping is not always one-to-one: a single policy or field can affect multiple runtime behaviors, including buffering, matching, and timing. This is why a simple change in built-in endpoints can cause a subscriber to stop receiving data, or why two vendors can discover each other but never exchange payloads. The useful diagnostic strategy is to observe the graph (who matched), then the transport (what packets appear), and finally the runtime state (queues, deadlines, timers).

Failure modes cluster around mismatched assumptions. If SPDP is configured incorrectly, you may see data on one machine but not another, or discover that messages arrive but are rejected silently. If SEDP is too restrictive, you will observe a graph that looks healthy but never transitions into active data flow. In embedded settings, this can appear as missed deadlines or watchdog resets rather than explicit errors. A robust design therefore includes explicit validation: log the effective policy, emit version identifiers, and test a known-good baseline before you change parameters. This project forces that discipline because you will create repeatable experiments and capture deterministic outputs, so you can explain not only what happened but why it happened.

How this fits into the project

This concept directly shapes how you implement and validate Project 1. You will configure it, observe it, and stress it under controlled conditions.

Definitions & key terms

  • participant discovery: the first stage (PDP), in which participants announce their existence and locators.
  • endpoint discovery: the second stage (EDP), in which matched participants exchange their readers and writers.
  • built-in endpoints: the pre-defined writers and readers every participant creates to carry discovery traffic itself.
  • SPDP: the Simple Participant Discovery Protocol, the default best-effort multicast PDP implementation.
  • SEDP: the Simple Endpoint Discovery Protocol, the default reliable EDP implementation carried over unicast.

Mental model diagram (ASCII)

[User Code] -> [DDS Discovery (PDP/EDP)] -> [rmw/DDS] -> [Wire/Runtime Effects]
       |             |               |                 |
   Config/API     Policies        Entities         Observability

How it works (step-by-step, with invariants and failure modes)

  1. A node configures the concept through API calls or config files.
  2. The rmw layer translates the settings into DDS/RTPS fields (participant discovery, endpoint discovery).
  3. Peers evaluate compatibility, matching, or timing using built-in endpoints and SPDP.
  4. The runtime queues or state machines enforce the policy and emit data.
  5. Observability tools (logs, CLI, packet capture) confirm SEDP behavior.

Minimal concrete example

PDP: new participant GUID prefix announced
EDP: writer/reader endpoint data exchanged
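One way for the sniffer to tell PDP from EDP traffic is the built-in writer entity ID carried in a DATA submessage. The IDs below are the DDSI-RTPS spec's well-known values; the helper name is illustrative:

```python
# Built-in writer entity IDs from the DDSI-RTPS spec. A DATA submessage whose
# writer ID matches one of these is discovery traffic, not user data.
SPDP_PARTICIPANT_WRITER   = 0x000100C2
SEDP_PUBLICATIONS_WRITER  = 0x000003C2
SEDP_SUBSCRIPTIONS_WRITER = 0x000004C2

def classify_writer(entity_id: int) -> str:
    """Classify a DATA submessage by its writer entity ID."""
    if entity_id == SPDP_PARTICIPANT_WRITER:
        return "PDP"
    if entity_id in (SEDP_PUBLICATIONS_WRITER, SEDP_SUBSCRIPTIONS_WRITER):
        return "EDP"
    return "user-data"
```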

Common misconceptions

  • Assuming defaults are identical across vendors.
  • Believing that discovery implies data flow without validating compatibility.

Check-your-understanding questions

  1. Explain how DDS Discovery (PDP/EDP) changes runtime behavior in ROS 2.
  2. Predict what happens if PDP succeeds but SEDP traffic is dropped.
  3. Why might two nodes discover each other but still exchange no data?

Check-your-understanding answers

  1. It alters matching, buffering, or timing constraints expressed via DDS/RTPS.
  2. Participants appear in each other's graphs, but endpoints are never exchanged, so writers and readers never match and no user data flows.
  3. QoS or policy mismatch prevents writer-reader matching or delivery.

Real-world applications

  • diagnosing missing topics
  • scaling multi-robot graphs

Where you’ll apply it

  • You will apply it in Section 5.4 (Concepts You Must Understand First), Section 5.10 (Implementation Phases), and Section 6.2 (Critical Test Cases).
  • Also used in: P02-the-skeleton-node-c-no-boilerplate.md and other projects in this series.

References

  • DDSI-RTPS spec
  • Fast DDS discovery docs

Key insights

  • DDS Discovery (PDP/EDP) is the lever that connects configuration to observable system behavior.

Summary

This concept is the bridge between theory and runtime evidence. Mastery means you can predict outcomes, not just observe them.

Homework/Exercises to practice the concept

  1. Capture or log a minimal trace where this concept is visible.
  2. Change one policy/setting and predict the system impact before running it.
  3. Explain the failure mode you expect if the configuration is wrong.

Solutions to the homework/exercises

  1. The trace should show the concept-specific fields or events you expect.
  2. Your prediction should name which endpoints match and how latency/loss changes.
  3. A wrong configuration should lead to mismatch, dropped data, or timeouts.

3. Project Specification

3.1 What You Will Build

A packet sniffer that listens on DDS discovery ports and decodes RTPS headers, GUIDs, and discovery submessages. It prints participant and endpoint metadata in real time.

Included features:

  • Deterministic startup with explicit configuration.
  • Observability (logs/CLI output) that exposes discovery/data flow.
  • A reproducible demo and a failure case.

Excluded on purpose:

  • Full robot control stacks or SLAM pipelines.
  • Custom GUIs beyond CLI output.

3.2 Functional Requirements

  1. **UDP multicast parsing:** sniff discovery traffic without false positives.
  2. **RTPS header decoding:** extract the GUID prefix, vendor ID, and submessages.
  3. **Parameter list parsing:** decode participant and endpoint metadata.
  4. Deterministic startup: The project must start with a reproducible, logged configuration.
  5. Observability: Provide CLI or log output that confirms each major component is working.

3.3 Non-Functional Requirements

  • Performance: Must meet the throughput/latency targets documented in the benchmark.
  • Reliability: Must handle common network or runtime failures gracefully.
  • Usability: CLI flags and logs must make configuration and diagnosis obvious.

3.4 Example Usage / Output

$ sudo python3 rtps_sniff.py -i wlan0 --domain 0
[+] Listening on 7400/7410/7411
[PDP] GUID: 01:0f:45:... vendor: eProsima
[EDP] Writer /chatter type std_msgs/msg/String

3.5 Data Formats / Schemas / Protocols

RTPS Header (20 bytes)
- Magic: 'RTPS' (4 bytes)
- Version: major/minor (2 bytes)
- Vendor ID: 2 bytes
- GUID Prefix: 12 bytes

Submessage: [ID][Flags][Length][Payload...]
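The `[ID][Flags][Length][Payload...]` chain above can be walked with a small generator. This sketch honors the endianness flag when reading the length field and guards against malformed lengths; names are illustrative:

```python
import struct

def iter_submessages(data: bytes):
    """Yield (submessage_id, flags, payload) for each submessage after the
    fixed 20-byte RTPS header."""
    offset = 20                                  # skip the RTPS header
    while offset + 4 <= len(data):
        sub_id, flags = data[offset], data[offset + 1]
        fmt = "<H" if flags & 0x01 else ">H"     # E flag picks the byte order
        length = struct.unpack(fmt, data[offset + 2:offset + 4])[0]
        if length == 0:                          # 0 = "runs to the end of the packet"
            yield sub_id, flags, data[offset + 4:]
            return
        end = offset + 4 + length
        if end > len(data):                      # malformed length: stop, don't over-read
            return
        yield sub_id, flags, data[offset + 4:end]
        offset = end
```

Stopping on a bad length (rather than raising) keeps the sniffer alive when a truncated or garbled datagram arrives, which is one of the edge cases listed below.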

3.6 Edge Cases

  • Packets on non-default domain ID
  • Malformed submessage length
  • Multicast blocked

3.7 Real World Outcome

By the end of this project you will have a reproducible system that produces the same observable signals every time you run it. You will be able to point to console output, captured packets, or bag files and explain exactly why the result is correct. You will also be able to force a failure and demonstrate a clean error path.

3.7.1 How to Run (Copy/Paste)

# Build
colcon build --packages-select project_1
# Run
source install/setup.bash
# Start the main node/tool
./run_project_1.sh

3.7.2 Golden Path Demo (Deterministic)

$ sudo python3 rtps_sniff.py -i wlan0 --domain 0
[+] Listening on 7400/7410/7411
[PDP] GUID: 01:0f:45:... vendor: eProsima
[EDP] Writer /chatter type std_msgs/msg/String

3.7.3 Failure Demo (Deterministic)

$ sudo python3 rtps_sniff.py -i wlan0 --domain 42
[!] No RTPS traffic detected after 10s (check ROS_DOMAIN_ID)
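The 10-second timeout in the failure demo is easiest to get right with a monotonic deadline rather than a single `settimeout` call, since unrelated datagrams would otherwise reset the clock. A sketch (socket setup omitted; names illustrative):

```python
import socket
import time

def wait_for_rtps(sock: socket.socket, timeout_s: float = 10.0) -> bool:
    """Return True once an RTPS datagram arrives; False if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while (remaining := deadline - time.monotonic()) > 0:
        sock.settimeout(remaining)        # shrink the wait each iteration
        try:
            data, _ = sock.recvfrom(65535)
        except socket.timeout:
            break
        if data[:4] == b"RTPS":           # ignore non-RTPS noise on the port
            return True
    return False

# if not wait_for_rtps(sock):  # sock: a bound discovery socket, as elsewhere
#     print("[!] No RTPS traffic detected after 10s (check ROS_DOMAIN_ID)")
```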

4. Solution Architecture

4.1 High-Level Design

[Input/Config] -> [Core Engine] -> [ROS 2/DDS] -> [Observability Output]

4.2 Key Components

Component Responsibility Key Decisions
Capture Engine Sniff UDP discovery ports and timestamp packets Use libpcap/scapy for portability
RTPS Decoder Parse headers and submessages into structured events Handle endianness and padding
Event Reporter Print participant and endpoint events in a stable format Deterministic output ordering

4.3 Data Structures (No Full Code)

struct RtpsHeader {
  char magic[4];            /* 'RTPS' */
  uint8_t version[2];       /* major, minor */
  uint8_t vendor[2];        /* kept as bytes: wire order is big-endian */
  uint8_t guid_prefix[12];
};

4.4 Algorithm Overview

Key Algorithm: Core Pipeline

  1. Capture packet
  2. Validate RTPS header
  3. Iterate submessages
  4. Emit discovery event

Complexity Analysis:

  • Time: O(n) over messages/events processed
  • Space: O(1) to O(n) depending on buffering

5. Implementation Guide

5.1 Development Environment Setup

# Install ROS 2 and dependencies
sudo apt-get update
sudo apt-get install -y ros-$ROS_DISTRO-ros-base python3-colcon-common-extensions

5.2 Project Structure

project-root/
|-- src/
|   |-- main.cpp
|   |-- config.yaml
|   `-- utils.cpp
|-- scripts/
|   `-- run_project.sh
|-- tests/
|   `-- test_core.py
`-- README.md

5.3 The Core Question You’re Answering

“How does Node A discover Node B without a master, and how can I verify it on the wire?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. RTPS Packet Structure
    • What breaks if this is misconfigured?
    • How will you observe it?
  2. Port Mapping Formula
    • What breaks if this is misconfigured?
    • How will you observe it?
  3. Multicast Discovery
    • What breaks if this is misconfigured?
    • How will you observe it?
  4. DDS Discovery (PDP/EDP)
    • What breaks if this is misconfigured?
    • How will you observe it?

5.5 Questions to Guide Your Design

  1. Capture Strategy
    • Which UDP ports should be filtered for discovery vs data?
    • How will you differentiate PDP vs EDP packets?
  2. Parser Design
    • Will you use a binary parser or Scapy fields?
    • How will you handle unknown submessage types?
  3. Output Format
    • What metadata is most helpful for debugging?
    • Should you log vendor ID and domain ID?

5.6 Thinking Exercise

The GUID Trace: given the GUID prefix 01:0f:12:34:56:78:9a:bc:de:f0:11:22:

  • Which bytes identify the participant vs endpoint?
  • If a node launches two publishers, which GUID portion changes?

5.7 The Interview Questions They’ll Ask

  1. “Explain PDP vs EDP in DDS discovery.”
  2. “Why does ROS 2 use multicast for discovery?”
  3. “How does ROS_DOMAIN_ID affect port selection?”
  4. “What RTPS submessages are involved in discovery?”
  5. “How would you debug a node that can’t see peers?”

5.8 Hints in Layers

Hint 1: Start with Port 7400

from scapy.all import sniff  # requires scapy; raw capture needs root privileges
sniff(filter="udp port 7400", prn=process)

Hint 2: Look for the 'RTPS' magic

Check the first 4 bytes of the UDP payload.

Hint 3: Submessage Loop

Parse submessages until you find DATA or INFO_TS.

Hint 4: Vendor IDs

Compare the vendor ID bytes to known DDS vendor lists.
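For the vendor-ID hint, a tiny lookup table goes a long way. The entries below follow the OMG vendor-ID registry; the product names are informal labels, and the fallback string is illustrative:

```python
# Vendor IDs most often seen with ROS 2, per the OMG registry.
VENDORS = {
    (0x01, 0x01): "RTI Connext DDS",
    (0x01, 0x03): "OCI OpenDDS",
    (0x01, 0x0F): "eProsima Fast DDS",
    (0x01, 0x10): "Eclipse Cyclone DDS",
}

def vendor_name(vendor_bytes: bytes) -> str:
    """Map the 2-byte vendor ID from the RTPS header to a readable name."""
    return VENDORS.get((vendor_bytes[0], vendor_bytes[1]),
                       f"unknown ({vendor_bytes.hex()})")
```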

5.9 Books That Will Help

Topic Book Chapter
UDP & Multicast “TCP/IP Illustrated, Vol. 1” Ch. 10, 12
Binary Protocol Parsing “Computer Networks” by Tanenbaum Ch. 1-2

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Reproduce the baseline example from the original project outline.
  • Validate toolchain, dependencies, and environment variables.

Tasks:

  1. Create the repository and baseline project structure.
  2. Run a minimal example to confirm discovery/data flow.

Checkpoint: You can reproduce the minimal example and collect logs.

Phase 2: Core Functionality (2-3 weeks)

Goals:

  • Implement the full feature set from the requirements.
  • Instrument key metrics and logs.

Tasks:

  1. Implement each component and integrate them.
  2. Add CLI/config flags for core parameters.

Checkpoint: Golden path demo succeeds with deterministic output.

Phase 3: Polish & Edge Cases (3-5 days)

Goals:

  • Handle failure scenarios and document them.
  • Create a short report/README describing results.

Tasks:

  1. Add error handling, timeouts, and validation.
  2. Capture failure demo output and metrics.

Checkpoint: Failure demo yields the expected errors and exit codes.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Transport UDP, shared memory, serial UDP for baseline Simplest to observe and debug
QoS Default, tuned Default then tune Establish baseline before optimization

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Validate parsers and helpers Packet decoder, config parser
Integration Tests End-to-end ROS 2 flow Publisher -> Subscriber -> Metrics
Edge Case Tests Failures & mismatches Wrong domain ID, missing config

6.2 Critical Test Cases

  1. Test 1: Baseline message flow works end-to-end.
  2. Test 2: Configuration mismatch produces a clear, actionable error.
  3. Test 3: Performance/latency stays within documented bounds.

6.3 Test Data

Use a fixed dataset or fixed random seed to make metrics reproducible.

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
QoS mismatch Discovery works but no data Align policies explicitly
Misconfigured env vars No nodes discovered Print and validate env on startup
Network filtering Intermittent data Check firewall and multicast settings

7.2 Debugging Strategies

  • Start from the graph: confirm discovery before tuning QoS.
  • Capture packets: validate that RTPS traffic appears on expected ports.

7.3 Performance Traps

If throughput is low, check for unnecessary serialization, small history depth, or lack of shared memory.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add verbose logging and a dry-run mode.
  • Add a simple configuration file parser.

8.2 Intermediate Extensions

  • Add metrics export to CSV or JSON.
  • Add automated regression tests.

8.3 Advanced Extensions

  • Implement cross-vendor compatibility validation.
  • Add chaos testing with randomized loss/latency patterns.

9. Real-World Connections

9.1 Industry Applications

  • Fleet robotics where reliability must be guaranteed under lossy Wi-Fi.
  • Industrial systems that require deterministic startup and clear failure modes.

9.2 Open Source Projects to Study

  • ROS 2 core repositories (rcl, rmw, rosidl)
  • DDS implementations: Fast DDS, Cyclone DDS

9.3 Interview Relevance

  • Explain QoS compatibility and discovery failures.
  • Describe how to debug why nodes discover but do not communicate.

10. Resources

10.1 Essential Reading

  • “TCP/IP Illustrated, Volume 1” by Fall & Stevens (focus on the sections related to RTPS Packet Structure)
  • ROS 2 official docs for the specific APIs used in this project

10.2 Video Resources

  • ROS 2 community talks on middleware and DDS
  • Vendor tutorials on discovery and QoS

10.3 Tools & Documentation

  • ROS 2 CLI and rclcpp/rclpy docs
  • Wireshark or tcpdump for network visibility

10.4 Related Projects

  • Project 1: Builds prerequisite concepts
  • Project 2: Extends the middleware layer

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain RTPS Packet Structure without notes
  • I can explain how QoS and discovery interact
  • I understand why the system fails when policies mismatch

11.2 Implementation

  • All functional requirements are met
  • Golden path demo succeeds
  • Failure demo produces expected errors

11.3 Growth

  • I can explain this project in a technical interview
  • I documented lessons learned and configs
  • I can reproduce the results on another machine

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Golden path demo output matches documentation
  • At least one failure scenario is documented
  • Metrics or logs demonstrate correct behavior

Full Completion:

  • All minimum criteria plus:
  • Compatibility verified across at least two QoS settings
  • Results written to a short report

Excellence (Going Above & Beyond):

  • Automated regression tests for discovery/QoS behavior
  • Clear compatibility matrix or benchmark chart