Project 1: The Space Packet Parser (CCSDS Protocol)
Build a deterministic CLI parser that frames a raw byte stream into CCSDS Space Packets, validates header/length rules, tracks sequence gaps per APID, and emits machine-readable telemetry metadata.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, C++, Python |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | Level 3: Service & Support Tooling |
| Prerequisites | C bitwise ops, binary I/O, endian awareness, basic packet formats |
| Key Topics | CCSDS primary header, sequence control, stream framing, resynchronization |
1. Learning Objectives
By completing this project, you will:
- Decode CCSDS primary headers down to individual bits and validate packet length rules.
- Implement a robust stream framer that can resynchronize after corruption or byte loss.
- Track per-APID sequence counts and detect gaps, wraparound, and duplication.
- Emit deterministic, structured output (JSON/CSV) suitable for ground ops pipelines.
- Design test vectors and golden files that prove correctness under noisy inputs.
2. All Theory Needed (Per-Concept Breakdown)
This section is the complete theory primer you need before writing production-grade code.
CCSDS Space Packet Structure (Primary + Secondary Header)
Fundamentals CCSDS Space Packets are the lingua franca of satellite telemetry and telecommand. The primary header is a fixed 6-byte structure that tells you how to interpret every byte that follows: version, packet type, secondary header presence, APID, sequence flags, sequence count, and length. Understanding these bits is not optional; it is how you tell a health packet from a payload packet, how you find packet boundaries, and how you avoid corrupting higher-level decoding. The secondary header is mission-specific and often contains timestamps or service identifiers. A correct parser must respect the standard length rule: the “packet length” field encodes (total_bytes_after_primary_header - 1). If you do not internalize that off-by-one rule, every downstream check will be wrong.
Deep Dive into the concept The CCSDS Space Packet primary header is purposely compact because bandwidth and CPU cycles are scarce in flight systems. It is packed into 6 bytes, which means you must implement bit-accurate parsing with no ambiguity in endianness. The version field is 3 bits and is almost always 0 for current missions. The packet type is 1 bit (0 = telemetry, 1 = telecommand), which often determines how the ground system routes the packet. The secondary header flag tells you whether the packet uses a standard secondary header; many missions use it to indicate presence of a timecode or a service type. The APID (11 bits) is the logical channel; a single downlink stream can contain dozens of APIDs for different subsystems. The sequence flags (2 bits) indicate standalone vs segmented packets (first, continuation, last). The sequence count (14 bits) is your continuity indicator and is critical for gap detection and reassembly. Finally, the length field is a 16-bit value encoding the number of bytes following the primary header minus one. That means total_packet_length = length + 1 + 6. It is easy to get wrong, and it is the most common bug in first-time implementations.
In practice, you cannot just decode; you must validate. The parser must confirm that length does not exceed the remaining bytes in the stream buffer. It must detect impossible combinations (e.g., continuation segment without a prior start). It must handle packet segmentation: if sequence flags indicate the start of a segmented packet, you must buffer payload until the “last” flag arrives. If you are building a pure parser, you still must surface the segmentation state because downstream users need to know whether a payload is complete. Secondary headers are mission-specific, but you should design the parser to expose raw secondary header bytes or allow pluggable decoders. Do not hard-code mission details into the primary header parsing layer; keep that layer pure and stable.
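The transition check described above can be sketched as a tiny per-APID state machine. This is only an illustration: `seg_state_t` and `seg_update` are hypothetical names, and the policy of always updating the state after an illegal transition (so parsing can continue) is an assumption, not something the standard mandates. The numeric flag values are the ones the Space Packet primary header uses (00 continuation, 01 first, 10 last, 11 unsegmented).

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence flag values of the Space Packet primary header. */
enum { SEQ_CONTINUATION = 0, SEQ_FIRST = 1, SEQ_LAST = 2, SEQ_UNSEGMENTED = 3 };

/* Hypothetical per-APID reassembly state: is a segmented unit currently open? */
typedef struct {
    bool in_progress;
} seg_state_t;

/* Returns true if `flags` is a legal successor of the current state.
 * The state is updated even on an illegal transition so that the
 * parser can report the problem and keep going. */
static bool seg_update(seg_state_t *st, uint8_t flags)
{
    bool ok;
    switch (flags & 0x3u) {
    case SEQ_FIRST:
        ok = !st->in_progress;      /* "first" while open: prior unit lost its "last" */
        st->in_progress = true;
        return ok;
    case SEQ_CONTINUATION:
        return st->in_progress;     /* continuation without a start is illegal */
    case SEQ_LAST:
        ok = st->in_progress;       /* "last" without a start is illegal */
        st->in_progress = false;
        return ok;
    default:                        /* SEQ_UNSEGMENTED */
        ok = !st->in_progress;      /* standalone packet inside an open unit */
        st->in_progress = false;
        return ok;
    }
}
```

A pure parser can simply copy the boolean result into its output record, which keeps the primary header layer free of mission-specific reassembly logic.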
Another subtlety is endianness. The CCSDS primary header is defined in big-endian bit ordering. On little-endian systems, you cannot cast to a struct and expect correct results. You must extract fields by shifting and masking, or read as big-endian 16-bit words. A reliable approach is to read the first two bytes as a 16-bit value (big-endian), then mask the version/type/secondary/APID fields. Do the same for the sequence field (next two bytes) and length field (last two bytes). This makes your code deterministic across compilers and architectures.
Finally, recognize that CCSDS Space Packets are not the same as the transfer frame (e.g., CCSDS TM/TC frames). If your input stream is captured from a downlink, you may have frames that contain multiple packets or packet fragments. For this project you will assume a raw packet stream, but you should explicitly document that assumption and guard against accidental frame data by rejecting illegal headers. A “strict mode” that refuses invalid version numbers or lengths is a powerful debugging tool when integrating with ground systems.
How this fits into the project This concept drives Section 3.2 functional requirements, Section 4.1 architecture (decoder), and Section 6 testing for header validation and segmentation edge cases.
Definitions & key terms
- APID (Application Process ID) -> 11-bit logical channel identifying a telemetry source.
- Sequence Flags -> 2-bit indicator: standalone, first, continuation, last.
- Packet Length -> Number of bytes after the primary header minus one.
- Secondary Header -> Mission-specific header following the primary header.
Mental model diagram (ASCII)
| Primary Header (6B) | Secondary Header (0..N) | Payload (0..N) |
| V T S APID | SeqFlags SeqCount | Length |
How it works (step-by-step, with invariants and failure modes)
- Read 6 bytes for the primary header.
- Extract version/type/sec/APID/seq flags/seq count/length via masks.
- Compute total length = length + 1 + 6.
- Validate that total length fits available bytes.
- If sequence flags indicate segmentation, update reassembly state.
- Pass raw secondary header/payload to downstream or emit metadata.
Invariants: total length (length + 7) must be <= max_packet_size; version must be supported; the sequence count must advance by one modulo 16384 per APID unless reset explicitly.
Failure modes: off-by-one length errors, invalid seq flag transitions, buffer overrun when length exceeds stream, mis-decoding due to endianness.
Minimal concrete example
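/* Assumes buf is a uint8_t pointer to at least 6 readable bytes. */
/* Assemble three big-endian 16-bit words, independent of host byte order. */
uint16_t word0 = (uint16_t)((buf[0] << 8) | buf[1]);
uint16_t word1 = (uint16_t)((buf[2] << 8) | buf[3]);
uint16_t word2 = (uint16_t)((buf[4] << 8) | buf[5]);
uint8_t version = (word0 >> 13) & 0x7; /* bits 15..13 */
uint8_t type = (word0 >> 12) & 0x1; /* bit 12: 0 = telemetry, 1 = telecommand */
uint8_t sec = (word0 >> 11) & 0x1; /* bit 11: secondary header flag */
uint16_t apid = word0 & 0x7FF; /* bits 10..0 */
uint8_t seq_flags = (word1 >> 14) & 0x3; /* bits 15..14 */
uint16_t seq_count = word1 & 0x3FFF; /* bits 13..0 */
uint16_t pkt_len = word2; /* bytes after the primary header, minus one */
size_t total = (size_t)pkt_len + 1 + 6; /* full packet size including header */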
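The fragment above can be wrapped into a small function so it is easy to unit test. The sketch below is one possible shape, not a required API: `pkt_hdr_t`, `be16`, `decode_primary`, and `total_len` are hypothetical names introduced for illustration, and the caller is assumed to guarantee that six bytes are readable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded primary header fields. */
typedef struct {
    uint8_t  version;
    uint8_t  type;
    uint8_t  sec;
    uint16_t apid;
    uint8_t  seq_flags;
    uint16_t seq_count;
    uint16_t pkt_len;   /* raw length field: bytes after the header, minus one */
} pkt_hdr_t;

/* Read a big-endian 16-bit word regardless of host byte order. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Decode the 6-byte primary header at buf (caller guarantees 6 bytes). */
static pkt_hdr_t decode_primary(const uint8_t *buf)
{
    uint16_t w0 = be16(buf);
    uint16_t w1 = be16(buf + 2);
    pkt_hdr_t h;

    h.version   = (uint8_t)((w0 >> 13) & 0x7u);
    h.type      = (uint8_t)((w0 >> 12) & 0x1u);
    h.sec       = (uint8_t)((w0 >> 11) & 0x1u);
    h.apid      = (uint16_t)(w0 & 0x07FFu);
    h.seq_flags = (uint8_t)((w1 >> 14) & 0x3u);
    h.seq_count = (uint16_t)(w1 & 0x3FFFu);
    h.pkt_len   = be16(buf + 4);
    return h;
}

/* Total packet size in bytes: 6 header bytes plus (length field + 1). */
static size_t total_len(const pkt_hdr_t *h)
{
    return (size_t)h->pkt_len + 1u + 6u;
}
```

Feeding it the bytes `08 3A C0 01 00 05` should yield version 0, type 0, sec 1, APID 0x3A, sequence flags 3, sequence count 1, length 5, and a total size of 12 bytes.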
Common misconceptions
- “Packet length is total length” -> It is total minus primary header minus 1.
- “You can cast to a struct” -> Bit order and padding make that unsafe.
- “All packets are standalone” -> Segmentation flags are real and must be tracked.
Check-your-understanding questions
- Why is the packet length field defined as “bytes after header minus one”?
- What breaks if you treat the header as little-endian on an x86 host?
- How would you detect a missing “last” segment in a sequence?
Check-your-understanding answers
- It lets a 16-bit field represent 1..65536 bytes after the header; a packet must carry at least one byte after the primary header, so the zero value is not wasted on an impossible size.
- Bitfields land in the wrong positions; APID and flags become garbage.
- Track a reassembly state; if a new “first” arrives before “last”, the prior sequence is incomplete.
Real-world applications
- Ground systems that ingest raw downlink streams and triage telemetry.
- Flight software packet routers that multiplex payload and housekeeping data.
Where you’ll apply it
- See Section 3.2 (header decoding) and Section 6.2 (critical tests).
- Also used in: P07-priority-telemetry-scheduler-the-traffic-cop.md, P12-ground-station-command-console-the-hmi.md
References
- CCSDS 133.0-B-1 (Space Packet Protocol)
- Spacecraft Systems Engineering (Data Handling chapter)
Key insights The primary header is the contract; if you decode it wrong, everything else is fiction.
Summary Mastering the header layout gives you deterministic packet boundaries and reliable metadata.
Homework/Exercises to practice the concept
- Manually decode three CCSDS headers from hex dumps and verify the lengths.
Solutions to the homework/exercises
- Decode with masks and confirm total length = pkt_len + 7.
Stream Framing, Resynchronization, and Sequence Control
Fundamentals A downlink stream is just bytes. If you lose a byte, your parser shifts and every subsequent header is garbage. Framing is the discipline of re-establishing packet boundaries reliably. Because CCSDS packets do not include a fixed sync word, you must infer boundaries from the header itself: the version field, allowable APID ranges, sequence flags, and the declared length. Sequence control is your continuity checksum; it does not tell you where a packet starts, but it tells you if you missed one. Together, framing + sequence control give you operational confidence that your telemetry is complete and ordered.
Deep Dive into the concept In real missions, radio links drop bytes, reorder frames, or interleave packets from multiple APIDs. Your parser must treat the stream as untrusted. The simplest framer reads 6 bytes, parses the header, computes the packet length, and then consumes that many bytes. If the length is invalid (e.g., beyond buffer size or above a configured maximum), you must resynchronize. Resynchronization is an algorithmic search problem: slide forward one byte at a time and attempt to parse a header until you find a plausible one. “Plausible” is domain-specific; you should verify version == 0, APID in allowed range, seq flags not illegal, and length within configured bounds. This is not foolproof, but it is often sufficient to recover from isolated corruption.
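A plausibility check along these lines might look like the sketch below. The configuration struct `plaus_cfg_t`, the function name `header_plausible_cfg`, and the interpretation of the size limit as a bound on the total packet size are all assumptions made for illustration. Note that every 2-bit sequence flag value is legal on its own, so flag legality can only be judged against per-APID segmentation state and is left out here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mission configuration used to judge plausibility. */
typedef struct {
    uint16_t apid_min;   /* lowest APID expected on this link */
    uint16_t apid_max;   /* highest APID expected on this link */
    size_t   max_total;  /* largest accepted total packet size in bytes */
} plaus_cfg_t;

/* Returns true if the 6 bytes at p look like a primary header we expect.
 * The caller guarantees that 6 bytes are readable. */
static bool header_plausible_cfg(const uint8_t *p, const plaus_cfg_t *cfg)
{
    uint16_t w0   = (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
    uint16_t len  = (uint16_t)(((uint16_t)p[4] << 8) | p[5]);
    uint16_t apid = (uint16_t)(w0 & 0x07FFu);
    size_t total  = (size_t)len + 7u;   /* header + (length field + 1) */

    if ((w0 >> 13) != 0)
        return false;                   /* only version 0 is accepted */
    if (apid < cfg->apid_min || apid > cfg->apid_max)
        return false;                   /* APID outside the configured range */
    if (total > cfg->max_total)
        return false;                   /* declared size exceeds the limit */
    return true;
}
```

The tighter the APID range and size limit, the fewer random byte patterns pass as headers, at the cost of rejecting packets the configuration did not anticipate.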
Sequence control adds a second layer. For each APID you track the last seen sequence count. When a new packet arrives, you check whether the count increments by one modulo 16384. If not, you log a gap. This gap can be caused by dropped packets, but also by resets or mode changes. Therefore you should allow a controlled “reset” event per APID (triggered by a command or by seeing a “standalone” packet after a long gap). The key is to record ambiguity explicitly: a gap is an operational event. Your output should include fields like gap=true and expected_seq to make the problem visible to operators.
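The per-APID check can be sketched as follows. The names `seq_result_t` and `seq_track` are hypothetical, and the choice of a flat 2048-entry table (one slot per possible 11-bit APID) is just one simple option. Reset handling is deliberately left out; this sketch only reports what it sees.

```c
#include <stdbool.h>
#include <stdint.h>

#define APID_COUNT 2048u    /* 11-bit APID space */
#define SEQ_MOD    16384u   /* 14-bit sequence count */

typedef struct {
    uint16_t last_seq;
    int      has_last;      /* 0 until the first packet of this APID is seen */
} apid_state_t;

/* Hypothetical result record for one tracked packet. */
typedef struct {
    bool     gap;           /* true if seq != expected_seq */
    uint16_t expected_seq;  /* last_seq + 1 modulo 16384 */
    uint16_t distance;      /* forward distance from expected to seen, mod 16384;
                               a duplicate shows up as 16383 */
} seq_result_t;

/* Update the state for `apid` and report continuity.
 * `table` must have APID_COUNT entries, zero-initialized. */
static seq_result_t seq_track(apid_state_t table[], uint16_t apid, uint16_t seq)
{
    apid_state_t *st = &table[apid & 0x07FFu];
    seq_result_t r = { false, seq, 0 };   /* first packet: nothing to compare */

    if (st->has_last) {
        r.expected_seq = (uint16_t)((st->last_seq + 1u) % SEQ_MOD);
        r.gap = (seq != r.expected_seq);
        r.distance = (uint16_t)((seq + SEQ_MOD - r.expected_seq) % SEQ_MOD);
    }
    st->last_seq = seq;
    st->has_last = 1;
    return r;
}
```

Because the expected value is computed modulo 16384, the step from 16383 to 0 is reported as continuous rather than as a gap.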
Segmentation complicates sequence control because sequence counts usually apply to packet segments. If a packet is split into multiple segments, each segment increments the sequence count. That means you cannot assume that a gap corresponds to a missing complete payload; it may be a missing segment of a larger unit. A robust parser should surface segmentation status and keep partial reassembly state, but it should not silently “fix” gaps. In this project, you will implement tracking but leave reassembly optional. Document the chosen behavior clearly in output.
Resynchronization must be deterministic. Use a maximum scan window (e.g., 4 KB) and a deterministic rule for which candidate header to accept (the first plausible header that yields a length fully contained in the buffer). This ensures that two runs on the same file produce identical outputs. If no candidate is found within the window, drop bytes and emit a resync error with a counter.
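A bounded, deterministic scan could be sketched like this. The function names, the simplified plausibility rule (version 0 plus a size limit), and the convention of reporting the stop position through `*found` are assumptions for illustration; a real tool would plug in its own plausibility check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Total packet size declared by the header at p: length field + 7. */
static size_t packet_total(const uint8_t *p)
{
    return (((size_t)p[4] << 8) | p[5]) + 7u;
}

/* Simplified, hypothetical plausibility rule: version 0 and bounded size. */
static bool plausible(const uint8_t *p, size_t max_total)
{
    return (p[0] >> 5) == 0 && packet_total(p) <= max_total;
}

/* Scan at most `window` bytes forward from `start` and accept the FIRST
 * plausible header whose packet is fully contained in the buffer.
 * On success returns true with *found = header offset; otherwise returns
 * false with *found = offset where scanning stopped. */
static bool resync_scan(const uint8_t *buf, size_t size, size_t start,
                        size_t window, size_t max_total, size_t *found)
{
    size_t off, limit;

    if (start > size)
        start = size;
    limit = (window < size - start) ? start + window : size;

    for (off = start; off < limit && size - off >= 6; off++) {
        if (plausible(buf + off, max_total) &&
            packet_total(buf + off) <= size - off) {
            *found = off;
            return true;
        }
    }
    *found = off;
    return false;
}
```

The number of skipped bytes is `*found - start`, which is what a resync error record would report. Always taking the first acceptable candidate is what makes two runs over the same file agree.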
Finally, you need to test framing failures. Generate corrupted streams by flipping bits or removing bytes from a golden file. Then verify that your parser reports a resync event and resumes correct decoding after the error region. This is the difference between a parser and an operational tool.
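Two small helpers are enough to derive corrupted test inputs from a golden buffer held in memory. The names `drop_byte` and `flip_bit` are hypothetical; the sketch assumes the caller loads the golden file into a writable buffer and writes the result back out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Remove the byte at index pos, shifting the tail down by one.
 * Returns the new length (unchanged if pos is out of range). */
static size_t drop_byte(uint8_t *buf, size_t len, size_t pos)
{
    if (pos >= len)
        return len;
    memmove(buf + pos, buf + pos + 1, len - pos - 1);
    return len - 1;
}

/* Flip one bit (0 = least significant) of the byte at index pos.
 * Out-of-range arguments leave the buffer untouched. */
static void flip_bit(uint8_t *buf, size_t len, size_t pos, unsigned bit)
{
    if (pos < len && bit < 8)
        buf[pos] ^= (uint8_t)(1u << bit);
}
```

Using fixed positions rather than random ones keeps the corrupted files, and therefore the expected resync output, reproducible.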
How this fits into the project This concept drives Section 3.2 (sequence tracking), Section 4.1 (framer), and Section 7.1 (debugging strategies).
Definitions & key terms
- Framing -> Identifying packet boundaries in a byte stream.
- Resynchronization -> Searching for a plausible header after corruption.
- Sequence Gap -> A sequence count that is not the previous count plus one (modulo 16384) for the same APID.
- Segment -> A portion of a multi-part packet indicated by sequence flags.
Mental model diagram (ASCII)
Byte Stream -> [Scan 6B] -> Header? -> Len -> Consume -> Next
                               | invalid
                               v
                       Shift +1 and retry
How it works (step-by-step, with invariants and failure modes)
- Attempt to parse a header at the current offset.
- If invalid, increment offset by 1 and retry until max window.
- If valid, compute total length and check buffer bounds.
- Consume packet, update APID sequence state.
- Emit gap/resync events as needed.
Invariants: offset never decreases; accepted headers always satisfy version/length bounds; sequence tracking is per APID.
Failure modes: false positives on random bytes, endless scan loops, missing detection of wraparound, confusing segmentation with standalone packets.
Minimal concrete example
while (offset + 6 <= size) {
    if (!header_plausible(buf + offset)) { offset++; continue; }
    size_t total = packet_total(buf + offset);
    if (offset + total > size) break; // incomplete
    process_packet(buf + offset, total);
    offset += total;
}
Common misconceptions
- “Sequence gaps always mean corruption” -> They can also mean resets or APID reconfiguration.
- “Resync means start at the next byte” -> Sometimes you should bound the scan window to avoid false headers.
- “Segmentation isn’t used” -> Many missions use it for payload data.
Check-your-understanding questions
- What makes a header “plausible” for resync purposes?
- Why track sequence counts per APID instead of globally?
- How do you handle a wraparound from 16383 to 0?
Check-your-understanding answers
- Valid version, APID in range, reasonable length, and legal sequence flags.
- APIDs represent independent sources; their sequences are not correlated.
- Treat 0 as the next expected value after 16383.
Real-world applications
- Ground station decoders that must recover from RF dropouts.
- Onboard recorders that must validate captured data.
Where you’ll apply it
- See Section 3.2 (sequence tracking), Section 5.10 (implementation phases), Section 6.2 (gap tests).
- Also used in: P11-fdir-watchdog-the-dead-mans-switch.md, P07-priority-telemetry-scheduler-the-traffic-cop.md
References
- CCSDS 133.0-B-1 (Sequence control rules)
- NASA Mission Success Handbook (telemetry integrity practices)
Key insights A parser becomes an ops tool only when it can recover from bad bytes and explain the gaps.
Summary Framing and sequence control turn raw bytes into trustworthy telemetry timelines.
Homework/Exercises to practice the concept
- Corrupt a packet stream by deleting one byte and note how your resync algorithm recovers.
Solutions to the homework/exercises
- Use a scan loop with header plausibility checks and log the resync offset.
3. Project Specification
3.1 What You Will Build
A CLI tool that reads a binary CCSDS packet stream, decodes primary headers, validates length and sequence rules, and emits a deterministic JSON or CSV summary suitable for mission operations.
Included:
- Stream framing with resynchronization
- Header decoding and validation
- Per-APID sequence tracking
- Structured output with error events
Excluded:
- Full CCSDS transfer frame decoding
- Mission-specific secondary header decoding (optional plug-in stub only)
3.2 Functional Requirements
- Header decoding: Extract version, type, secondary flag, APID, sequence flags/count, packet length.
- Length validation: Compute total packet size and reject mismatches or overflows.
- Sequence tracking: Track per-APID sequence counts; report gaps and wraparound.
- Resynchronization: Recover from invalid headers by scanning forward with bounded window.
- Structured output: Emit JSON/CSV lines with deterministic ordering.
3.3 Non-Functional Requirements
- Performance: Sustain parsing at 1 Mbps on a laptop CPU.
- Reliability: Never crash on malformed input; emit errors and continue.
- Usability: Simple CLI flags: --input, --format, --max-pkt.
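One way to parse these three flags is with `getopt_long` from `<getopt.h>` (available on glibc and the BSDs, but not part of ISO C). The sketch below is illustrative only: `cli_opts_t` and `parse_args` are hypothetical names, and the defaults (JSON output, a size limit of 65542 bytes, which is the largest total a 16-bit length field can describe) are assumptions rather than requirements of this project.

```c
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed command-line options. */
typedef struct {
    const char   *input;    /* path given with --input (required) */
    const char   *format;   /* "json" or "csv" */
    unsigned long max_pkt;  /* largest accepted packet size in bytes */
} cli_opts_t;

/* Returns 0 on success, -1 on a missing or malformed option. */
static int parse_args(int argc, char **argv, cli_opts_t *o)
{
    static const struct option longopts[] = {
        { "input",   required_argument, NULL, 'i' },
        { "format",  required_argument, NULL, 'f' },
        { "max-pkt", required_argument, NULL, 'm' },
        { NULL, 0, NULL, 0 }
    };
    int c;

    o->input = NULL;
    o->format = "json";     /* assumed default */
    o->max_pkt = 65542ul;   /* 65535 + 7: largest possible total size */
    optind = 1;             /* restart scanning at argv[1] */

    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        char *end;
        switch (c) {
        case 'i': o->input = optarg; break;
        case 'f': o->format = optarg; break;
        case 'm':
            o->max_pkt = strtoul(optarg, &end, 10);
            if (*end != '\0' || o->max_pkt == 0)
                return -1;  /* not a positive decimal number */
            break;
        default:
            return -1;      /* unknown option or missing argument */
        }
    }
    if (o->input == NULL)
        return -1;
    if (strcmp(o->format, "json") != 0 && strcmp(o->format, "csv") != 0)
        return -1;
    return 0;
}
```

Rejecting unknown formats and non-numeric sizes up front keeps the rest of the tool free of argument checks.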
3.4 Example Usage / Output
$ ./ccsds_parse --input samples/golden.bin --format json
{"offset":0,"apid":929,"seq":124,"len":143,"gap":false}
{"offset":150,"apid":929,"seq":125,"len":143,"gap":false}
3.5 Data Formats / Schemas / Protocols
Primary header layout (16-bit words, big-endian):
word0: [ver:3][type:1][sec:1][apid:11]
word1: [seq_flags:2][seq_count:14]
word2: [length:16]
JSON output schema (one line per packet):
{
"offset": 0,
"apid": 929,
"type": 0,
"sec": 1,
"seq": 124,
"seq_flags": 3,
"length": 143,
"total_len": 150,
"gap": false,
"expected_seq": 124,
"error": null
}
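To keep the output byte-for-byte reproducible, the record can be formatted with a single fixed format string so the key order never varies. The sketch below assumes hypothetical names (`pkt_record_t`, `format_record`) and assumes error codes are plain identifiers such as RESYNC that need no JSON escaping; it writes the record on one line, as the per-packet output requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory form of one output record. */
typedef struct {
    size_t      offset;
    uint16_t    apid;
    uint8_t     type;
    uint8_t     sec;
    uint16_t    seq;
    uint8_t     seq_flags;
    uint16_t    length;        /* raw length field */
    int         gap;           /* nonzero if a gap was detected */
    uint16_t    expected_seq;
    const char *error;         /* NULL means "no error" */
} pkt_record_t;

/* Format one record as a single JSON line (no trailing newline).
 * Returns the snprintf result: the length the full line needs. */
static int format_record(char *out, size_t cap, const pkt_record_t *r)
{
    char err[64];

    /* Error codes are assumed to be simple identifiers; no escaping is done. */
    if (r->error != NULL)
        snprintf(err, sizeof err, "\"%s\"", r->error);
    else
        snprintf(err, sizeof err, "null");

    return snprintf(out, cap,
        "{\"offset\":%zu,\"apid\":%u,\"type\":%u,\"sec\":%u,\"seq\":%u,"
        "\"seq_flags\":%u,\"length\":%u,\"total_len\":%zu,\"gap\":%s,"
        "\"expected_seq\":%u,\"error\":%s}",
        r->offset, (unsigned)r->apid, (unsigned)r->type, (unsigned)r->sec,
        (unsigned)r->seq, (unsigned)r->seq_flags, (unsigned)r->length,
        (size_t)r->length + 7u, r->gap ? "true" : "false",
        (unsigned)r->expected_seq, err);
}
```

Deriving `total_len` from `length` inside the formatter means the two fields cannot disagree in the output.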
3.6 Edge Cases
- Length field smaller than minimum payload.
- Length field larger than remaining stream bytes.
- Sequence count wraparound (16383 -> 0).
- Segmented packets without a terminating segment.
- Corrupt headers that still look plausible.
3.7 Real World Outcome
A parser that an operator can use to inspect a downlink capture, immediately see APID continuity, and identify corrupted regions.
3.7.1 How to Run (Copy/Paste)
./ccsds_parse --input samples/golden.bin --format json --max-pkt 2048
3.7.2 Golden Path Demo (Deterministic)
- Use the provided samples/golden.bin file.
- Output must match samples/golden.expected.json exactly.
- All offsets and sequence counts must be deterministic.
3.7.3 Failure Demo (Deterministic)
Run on a deliberately corrupted stream:
./ccsds_parse --input samples/corrupt.bin --format json
Expected behavior: emit at least one error line with error="RESYNC" and continue parsing after the corrupted region.
3.7.4 If CLI: Exact Terminal Transcript
$ ./ccsds_parse --input samples/golden.bin --format json
{"offset":0,"apid":929,"seq":124,"len":143,"gap":false,"error":null}
{"offset":150,"apid":929,"seq":125,"len":143,"gap":false,"error":null}
$ ./ccsds_parse --input samples/corrupt.bin --format json
{"offset":0,"apid":929,"seq":124,"len":143,"gap":false,"error":null}
{"offset":150,"error":"RESYNC","skipped":3}
{"offset":153,"apid":929,"seq":125,"len":143,"gap":false,"error":null}
ExitCode=2
4. Solution Architecture
4.1 High-Level Design
Byte Stream -> Framer -> Header Decoder -> Validator -> Seq Tracker -> Output
                 ^                            |
                 |------- Resync Logic -------|
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Framer | Locate packet boundaries | Bounded resync window |
| Decoder | Extract header fields | Big-endian masks only |
| Validator | Check length/flags | Strict vs permissive mode |
| Seq Tracker | Track per-APID continuity | Wraparound handling |
| Output | Emit JSON/CSV | Stable ordering + error records |
4.3 Data Structures (No Full Code)
typedef struct {
    uint16_t apid;
    uint16_t seq;
    uint16_t length;
    uint8_t seq_flags;
    uint8_t type;
    uint8_t sec;
} ccsds_header_t;
typedef struct {
    uint16_t last_seq;
    int has_last;
} apid_state_t;
4.4 Algorithm Overview
Key Algorithm: Framing + validation
- Read 6 bytes.
- Parse header fields by masks.
- Compute total length.
- If invalid, resync by shifting 1 byte.
- If valid, process and emit.
Complexity Analysis:
- Time: O(n) for parsing, O(n * w) in worst-case resync with window w.
- Space: O(APIDs) for sequence tracking.
5. Implementation Guide
5.1 Development Environment Setup
cc -O2 -Wall -Wextra -o ccsds_parse src/*.c
5.2 Project Structure
project-root/
+-- src/
| +-- main.c
| +-- decode.c
| +-- resync.c
| +-- output.c
+-- tests/
| +-- test_decode.c
| +-- test_resync.c
+-- samples/
| +-- golden.bin
| +-- golden.expected.json
| +-- corrupt.bin
+-- README.md
5.3 The Core Question You’re Answering
“How do you turn unreliable raw bytes into trusted telemetry timelines?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- CCSDS primary header field layout and length rule.
- Big-endian parsing on little-endian hosts.
- Sequence flags and wraparound behavior.
5.5 Questions to Guide Your Design
- What is a plausible header and how do you define it?
- How will you bound resynchronization to avoid false positives?
- What error events will operators need to diagnose gaps?
5.6 Thinking Exercise
Sketch a byte stream that includes one missing byte. Mark how your parser shifts and where it should resynchronize.
5.7 The Interview Questions They’ll Ask
- “Why is the CCSDS length field off by one?”
- “How do you handle packet corruption without sync words?”
- “Why track sequences per APID rather than globally?”
5.8 Hints in Layers
Hint 1: Start with a header-only decoder that prints fields.
Hint 2: Add a strict length validator and reject packets that exceed --max-pkt.
Hint 3: Implement resync by sliding 1 byte at a time and checking plausibility.
Hint 4: Add per-APID sequence tracking and emit gap=true when counts jump.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| CCSDS packet format | CCSDS 133.0-B-1 | Primary header |
| Reliable parsing | “The Practice of Programming” (Kernighan & Pike) | Input parsing |
| Embedded data handling | “Spacecraft Systems Engineering” | Data handling |
5.10 Implementation Phases
Phase 1: Header Decoder (2-3 days)
Goals: decode primary header fields correctly.
Tasks:
- Implement mask/shift extraction.
- Add unit tests with known hex vectors.
Checkpoint: header fields match golden vectors.
Phase 2: Framer + Resync (3-4 days)
Goals: turn stream into packets and recover from corruption.
Tasks:
- Implement length calculation and bounds.
- Implement resync scanner with window limit.
Checkpoint: corrupted stream logs resync and continues.
Phase 3: Sequence Tracking + Output (3-4 days)
Goals: emit structured logs with gap detection.
Tasks:
- Track per-APID seq count, handle wraparound.
- Emit JSON/CSV with deterministic ordering.
Checkpoint: golden file output matches expected JSON.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Resync strategy | sliding window / hard reset | Sliding window | Recovers quickly with minimal data loss |
| Output format | JSON / CSV | JSON | Easier to extend with error fields |
| Sequence state | per-APID map / global | per-APID | Matches CCSDS semantics |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Field extraction | Known header vectors |
| Integration Tests | Stream parsing | golden.bin vs expected.json |
| Edge Case Tests | Corruption handling | missing byte, bad length |
6.2 Critical Test Cases
- Header decode: given hex 08 3A C0 01 00 05, fields match expected.
- Length mismatch: declared length larger than file triggers error.
- Gap detection: sequence count jumps from 10 -> 12 logs gap.
6.3 Test Data
hex: 08 3A C0 01 00 05
expected: ver=0 type=0 sec=1 apid=0x3A seq_flags=3 seq=1 len=5
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Off-by-one length | Parser stops early | Use total = len + 7 |
| Endianness bug | APID values look random | Mask/shift big-endian words |
| Unbounded resync | Infinite loop | Cap resync window |
7.2 Debugging Strategies
- Hex dumps: print header bytes before decoding to verify alignment.
- Golden vectors: keep a small set of known headers for tests.
7.3 Performance Traps
Avoid repeated per-byte reallocation; keep a fixed buffer and scan with indexes.
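A fixed buffer with start and end indexes might be sketched as below. The names `stream_buf_t` and `buf_refill` and the capacity of 128 KiB are assumptions; the only real constraint is that the buffer can hold at least one maximum-size packet (65542 bytes), otherwise a large packet could never be completed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BUF_CAP 131072u   /* assumed capacity; must exceed the largest packet */

/* Hypothetical fixed-size stream buffer addressed by indexes. */
typedef struct {
    uint8_t data[BUF_CAP];
    size_t  start;   /* index of the first unconsumed byte */
    size_t  end;     /* one past the last valid byte */
} stream_buf_t;

/* Slide unconsumed bytes to the front, then top up from the file.
 * Returns the number of bytes newly read (0 at end of file or on error). */
static size_t buf_refill(stream_buf_t *sb, FILE *f)
{
    size_t pending = sb->end - sb->start;
    size_t got;

    if (sb->start > 0) {
        memmove(sb->data, sb->data + sb->start, pending);
        sb->start = 0;
        sb->end = pending;
    }
    got = fread(sb->data + sb->end, 1, BUF_CAP - sb->end, f);
    sb->end += got;
    return got;
}
```

The parser advances `start` as it consumes packets or skips bytes, and calls `buf_refill` only when fewer bytes remain than the current header or packet needs, so no allocation happens in the scan loop.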
8. Extensions & Challenges
8.1 Beginner Extensions
- Add CSV output alongside JSON.
- Support reading from stdin.
8.2 Intermediate Extensions
- Implement segmented packet reassembly with timeouts.
- Add a “strict” vs “lenient” parsing mode.
8.3 Advanced Extensions
- Implement CCSDS transfer frame parsing and extraction.
- Add plugin hooks for mission-specific secondary headers.
9. Real-World Connections
9.1 Industry Applications
- Mission Operations: validating downlink captures for missing packets.
- Onboard Routing: filtering payload data into separate channels.
9.2 Related Open Source Projects
- cFS (core Flight System): CCSDS packet APIs and routers.
- OpenSatKit: ground systems that decode CCSDS packets.
9.3 Interview Relevance
- Packet parsing, bit-level manipulation, and robust stream handling are classic embedded interview topics.
10. Resources
10.1 Essential Reading
- CCSDS 133.0-B-1 - Primary header rules and length semantics.
- Spacecraft Systems Engineering - Data handling and telemetry chapters.
10.2 Video Resources
- CCSDS Space Packet tutorials (conference recordings)
10.3 Tools & Documentation
- xxd/hexdump: inspect raw packet bytes.
- Wireshark CCSDS plugins: compare outputs.
10.4 Related Projects in This Series
- P07-priority-telemetry-scheduler-the-traffic-cop.md - uses packet metadata for scheduling.
- P12-ground-station-command-console-the-hmi.md - uses decoded telemetry.
11. Self-Assessment Checklist
11.1 Understanding
- I can derive total packet length from the CCSDS length field.
- I can explain why sequence tracking is per APID.
- I can describe how resynchronization works after a byte drop.
11.2 Implementation
- All functional requirements are met.
- Golden output matches expected JSON exactly.
- Corrupt inputs produce deterministic error events.
11.3 Growth
- I can describe one improvement for robustness.
- I can explain this parser to a mission ops engineer.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Parses headers and outputs APID/seq/length.
- Detects length mismatches and exits non-zero.
- Passes the golden file test.
Full Completion:
- Includes resynchronization and gap detection.
- Produces structured JSON output.
Excellence (Going Above & Beyond):
- Supports segmented packet reassembly.
- Provides a strict/lenient validation mode with metrics.