
Learn Distributed Transactions (Sagas): From Zero to Saga Master

Goal: Deeply understand the challenges of maintaining data consistency in distributed systems. You will move beyond simple ACID transactions to master the Saga pattern—learning how to coordinate complex business processes across multiple microservices using both orchestration and choreography, handling failures gracefully with compensating transactions, and ensuring eventual consistency without the need for distributed locks or 2PC.


Why Distributed Transactions (Sagas) Matter

In the monolithic world, ACID transactions are our best friends. You open a transaction, update multiple tables, and either everything commits or everything rolls back. Life is simple.

But in a microservices architecture, your data is scattered. An “Order” might involve the Order Service, Payment Service, Inventory Service, and Shipping Service, each with its own database, so there is no single database to coordinate the transaction. Traditional distributed transactions like Two-Phase Commit (2PC) don’t scale well. Participants hold locks across network round trips, a coordinator failure can leave them blocked, and every participant must be available at the same time.

Sagas were introduced in 1987 by Hector Garcia-Molina and Kenneth Salem to solve the “Long Lived Transaction” (LLT) problem. Instead of one giant lock, a Saga breaks the process into a sequence of local transactions. If one step fails, the Saga executes “compensating transactions” to undo the work of previous steps.

Understanding Sagas is the difference between building a fragile system that leaves data in a corrupted state and building a resilient, enterprise-grade distributed system used by companies like Uber, Netflix, and Amazon.


Core Concept Analysis

The Failure of Atomic Transactions in Microservices

In a monolith:

[ Request ] ──▶ [ Start Transaction ]
                │  - Update Orders Table
                │  - Update Inventory Table
                │  - Update Payment Table
                [ Commit Transaction ] ──▶ [ Success ]

In microservices (The Problem):

[ Order Service ] --(RPC)--> [ Payment Service ] --(RPC)--> [ Inventory Service ]
      │                            │                             │
    [ DB ]                       [ DB ]                        [ DB ]

If the Inventory Service fails, how do you “rollback” the payment that already happened in the Payment Service? You can’t just ROLLBACK a remote database.

The Saga Pattern: Eventual Consistency

A Saga is a sequence of local transactions $T_1, T_2, \ldots, T_n$. Each $T_i$ is accompanied by a compensating transaction $C_i$ that undoes the changes made by $T_i$.

Happy Path: $T_1 \rightarrow T_2 \rightarrow T_3 \rightarrow \cdots \rightarrow T_n$ (All succeed)

Failure Path (at $T_3$): $T_1 \rightarrow T_2 \rightarrow T_3$ (Fails!) $\rightarrow C_2 \rightarrow C_1$ (Rollback)


The Two Coordination Flavors

1. Choreography (Event-Driven)

No central coordinator. Each service listens for events and decides when to execute its local transaction.

[ Order Service ]       [ Payment Service ]       [ Inventory Service ]
       │                         │                         │
       └─(OrderCreated)────────▶ │                         │
                                 │                         │
       ◀──(PaymentSuccess)───────┤                         │
                                 ├──(PaymentSuccess)─────▶ │
                                                           │
       ◀──(InventoryReserved)──────────────────────────────┘

Pros: Decoupled, simple for small sagas. Cons: Hard to track state, risk of cyclic dependencies, hard to debug.

2. Orchestration (Centralized)

A central “Orchestrator” (or Saga Manager) tells each service what to do and when.

        ┌────────────────────────┐
        │    Saga Orchestrator   │
        └───┬────┬───────┬────┬──┘
            │    ▲       │    ▲
            ▼    │       ▼    │
        ┌───┴────┴──┐  ┌─┴────┴─────┐
        │  Payment  │  │  Inventory │
        │  Service  │  │  Service   │
        └───────────┘  └────────────┘
        ▼ = Command (“do this”)    ▲ = Reply (“I did this”)

Pros: Centralized state, easier to reason about, no cyclic dependencies. Cons: Risk of over-centralizing logic, orchestrator becomes a bottleneck if poorly designed.


Transaction Types in Sagas

  1. Compensatable transactions: Transactions that can potentially be reversed by a compensating transaction.
  2. Pivot transaction: If the pivot transaction commits, the saga will run until completion. It’s neither compensatable nor retriable, or it is the last compensatable transaction.
  3. Retriable transactions: Transactions that follow the pivot transaction and are guaranteed to succeed.

Concept Summary Table

| Concept Cluster | What You Need to Internalize |
|---|---|
| Local Transactions | Each step in a Saga must be atomic within its own database context. |
| Compensating Transactions | These must be idempotent (running them twice has no extra effect) and commutative (they must still produce the right result if, in edge cases, messages arrive in a different order). |
| Eventual Consistency | Accepting that data will be inconsistent for a brief window during the Saga execution. |
| Idempotency | Services MUST handle the same message twice without causing double-payments or double-shipping. |
| Outbox Pattern | Ensuring that database updates and message sending happen atomically to avoid “ghost” events. |
| State Machines | Sagas are essentially state machines moving through defined business states. |

Deep Dive Reading by Concept

Foundational Theory

| Concept | Book & Chapter |
|---|---|
| Distributed Systems Logic | “Designing Data-Intensive Applications” by Martin Kleppmann, Ch. 9: “Consistency and Consensus” |
| Microservices Patterns | “Microservices Patterns” by Chris Richardson, Ch. 4: “Managing transactions with sagas” |
| Original Saga Paper | “Sagas” by Hector Garcia-Molina and Kenneth Salem (1987) |

Implementation Patterns

| Concept | Book & Chapter |
|---|---|
| Transactional Outbox | “Microservices Patterns” by Chris Richardson, Ch. 3: “Interprocess Communication” (Section 3.3.5) |
| Event Sourcing | “Building Microservices” by Sam Newman, Ch. 6: “Workflow and Orchestration” |
| Idempotent Consumers | “Enterprise Integration Patterns” by Gregor Hohpe, “Idempotent Receiver” |

Essential Reading Order

  1. The Context (Week 1):
    • Designing Data-Intensive Applications Ch. 9 (Why 2PC is hard).
    • Microservices Patterns Ch. 4 (The best modern guide to Sagas).
  2. The Mechanics (Week 2):
    • Enterprise Integration Patterns (Messaging and Idempotency).
    • Read the 1987 “Sagas” paper (it’s surprisingly readable).

Project List


Project 1: The “Simple Order” Choreography (Happy Path)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java, Node.js, Python
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Distributed Systems / Messaging
  • Software or Tool: RabbitMQ or Redis Pub/Sub
  • Main Book: “Microservices Patterns” by Chris Richardson

What you’ll build: A simple event-driven system where an OrderService emits an event, a PaymentService processes it, and an InventoryService reserves stock. This project focuses solely on the “Happy Path” where everything works.

Why it teaches Sagas: It introduces the concept of local transactions triggered by events. You’ll see how data flows through a distributed system without a central controller.

Core challenges you’ll face:

  • Message Reliability → How do you ensure the event actually reaches the next service?
  • Event Schema → What information does the PaymentService need from the OrderService?
  • Asynchronous Flow → How do you know when the whole “Transaction” is finished?

Key Concepts

  • Pub/Sub: “Designing Data-Intensive Applications” Ch. 11 - Martin Kleppmann
  • Event-Driven Architecture: “Building Microservices” Ch. 6 - Sam Newman

  • Difficulty: Beginner
  • Time estimate: Weekend
  • Prerequisites: Basic understanding of HTTP and a messaging broker like RabbitMQ.


Real World Outcome

You will have three separate services running. When you place an order, you’ll see a waterfall of logs across three terminals showing the sequence of events.

Example Output:

# Terminal 1: Order Service
[INFO] Received Order #101. Status: PENDING.
[INFO] Emitting 'OrderCreated' event.

# Terminal 2: Payment Service
[INFO] Received 'OrderCreated' for #101.
[INFO] Charging $50... Success!
[INFO] Emitting 'PaymentSuccess' event.

# Terminal 3: Inventory Service
[INFO] Received 'PaymentSuccess' for #101.
[INFO] Deducting 1x 'CoolWidget' from stock... Success!
[INFO] Emitting 'InventoryReserved' event.

The Core Question You’re Answering

“How can multiple independent services collaborate on a single business goal without talking to each other directly?”

Before you write any code, think about what happens if you just used REST calls. If the Payment Service is slow, does the Order Service wait? If you use events, the Order Service is free as soon as the event is published.


Concepts You Must Understand First

Stop and research these before coding:

  1. At-Least-Once Delivery
    • What happens if a message is sent twice?
    • What happens if the receiver crashes before processing?
    • Book Reference: “Designing Data-Intensive Applications” Ch. 11
  2. Event Schema Versioning
    • What happens if you add a field to ‘OrderCreated’?
    • Book Reference: “Building Microservices” Ch. 4

Questions to Guide Your Design

  1. Messaging
    • Will you use a Topic or a Queue?
    • What happens if the Payment Service is down when ‘OrderCreated’ is sent?
  2. Tracking
    • How does the Inventory Service know which Order the Payment belongs to? (Correlation ID)

Thinking Exercise

Trace the Event

Imagine you have a piece of paper (The Event). You put it in a mailbox.

  1. Does the mailman (Broker) guarantee it gets there?
  2. Does the recipient keep the paper or a copy?
  3. If they lose it, can they ask for another copy?

The Interview Questions They’ll Ask

  1. “What is the difference between Orchestration and Choreography?”
  2. “How do you handle message loss in an event-driven system?”
  3. “What is a Correlation ID and why is it mandatory in Sagas?”

Hints in Layers

Hint 1: Setup Start by running RabbitMQ in Docker. Create three separate Go programs that connect to it.

Hint 2: The Event Define a simple JSON struct for ‘OrderCreated’ that includes OrderID, CustomerID, and Amount.

Hint 3: Wiring Make OrderService publish to an exchange. PaymentService should bind a queue to that exchange.


Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| Event Driven Design | “Building Microservices” | Ch. 6 |
| RabbitMQ Basics | “RabbitMQ in Action” | Ch. 1-2 |

Implementation Hints

Focus on the message flow. Use an AMQP client library in Go, such as github.com/rabbitmq/amqp091-go. Make each service log heavily so you can follow the flow. Don’t worry about database persistence yet; use in-memory maps for this first project.


Project 2: The “Simple Order” Choreography (Compensation)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java, Node.js
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Error Handling in Distributed Systems
  • Software or Tool: RabbitMQ / Redis
  • Main Book: “Microservices Patterns” by Chris Richardson

What you’ll build: An extension of Project 1. Now, if the InventoryService finds it has no stock, it emits an InventoryFailed event. The PaymentService must listen for this and refund the customer.

Why it teaches Sagas: This is the heart of the Saga pattern—Compensating Transactions. You’ll learn that a “Rollback” in microservices is actually a new “Undo” operation.

Core challenges you’ll face:

  • Defining the Inverse → What is the opposite of “Charge Card”? (Refund Card)
  • State Transitions → The OrderService needs to move from PENDING to REJECTED.
  • Handling Partial Failures → What if the Refund fails?

Key Concepts

  • Compensating Transactions: “Sagas” (1987 Paper) - Garcia-Molina
  • Semantic Rollback: “Microservices Patterns” Ch. 4

Real World Outcome

You’ll trigger an order that you know will fail (e.g., ordering 1000 items when stock is 10). You’ll see the system automatically undoing its previous steps.

Example Output:

# Order Service
[INFO] Order #102 Created. Status: PENDING.
[INFO] OrderCreated event sent.

# Payment Service
[INFO] Charged $50 for Order #102. Success.
[INFO] PaymentSuccess event sent.

# Inventory Service
[ERROR] Stock insufficient for Order #102.
[INFO] Emitting 'InventoryFailed' event.

# Payment Service (Reaction)
[INFO] Received 'InventoryFailed'. Undoing Payment for Order #102...
[INFO] Refunded $50. Status: COMPENSATED.

# Order Service (Reaction)
[INFO] Received 'InventoryFailed'. Setting Order #102 to REJECTED.

The Core Question You’re Answering

“How do you ‘un-ring’ a bell in a distributed system?”

In Project 1, we assumed everything worked. In the real world, some fraction of requests will always fail. If you can’t undo, you lose money or customer trust.


Thinking Exercise

The ATM Analogy

Imagine an ATM. You ask for $100.

  1. The ATM checks your balance (Succeeds).
  2. The ATM debits your account (Succeeds).
  3. The ATM’s cash dispenser jams (Fails!).

What must the system do? Can it just say “Sorry”? No. It must run a compensating transaction that puts the $100 back in your account.

The Interview Questions They’ll Ask

  1. “What is a compensating transaction?”
  2. “Why can’t we just use a traditional ROLLBACK in microservices?”
  3. “What happens if a compensating transaction fails? How do you handle that?”

Hints in Layers

Hint 1: The Failure Trigger Add a hardcoded check in InventoryService: if (requestedQuantity > currentStock) emit InventoryFailed.

Hint 2: Listening for Failures The PaymentService now needs to listen to two queues: one for OrderCreated (forward path) and one for InventoryFailed (backward path).

Hint 3: Matching State When PaymentService receives InventoryFailed, it needs to find the specific payment record it created earlier and mark it as ‘Refunded’.
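Hints 2 and 3 boil down to a lookup by order ID followed by a state change. The following is a minimal sketch. It assumes an in-memory map in place of the Payment Service’s database and uses a hypothetical `PaymentService` type. The actual refund call to a payment provider is left as a comment.

```go
package main

import (
	"fmt"
	"sync"
)

type PaymentStatus string

const (
	Charged  PaymentStatus = "CHARGED"
	Refunded PaymentStatus = "REFUNDED"
)

type Payment struct {
	OrderID int
	Amount  int64 // cents
	Status  PaymentStatus
}

// PaymentService keeps payments in a map keyed by order ID, standing in for
// its local database.
type PaymentService struct {
	mu       sync.Mutex
	payments map[int]*Payment
}

func NewPaymentService() *PaymentService {
	return &PaymentService{payments: make(map[int]*Payment)}
}

// HandleOrderCreated is the forward step: charge and record the payment.
func (s *PaymentService) HandleOrderCreated(orderID int, amount int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// A real service would call the payment provider here.
	s.payments[orderID] = &Payment{OrderID: orderID, Amount: amount, Status: Charged}
}

// HandleInventoryFailed is the compensating step (Hint 3): find the payment made
// earlier for this order and mark it refunded.
func (s *PaymentService) HandleInventoryFailed(orderID int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.payments[orderID]
	if !ok {
		// e.g. the failure event overtook the charge; needs a retry or a human.
		return fmt.Errorf("no payment recorded for order %d", orderID)
	}
	if p.Status == Refunded {
		return nil // already compensated: a redelivered event is a no-op
	}
	// A real service would call the provider's refund API before this line.
	p.Status = Refunded
	fmt.Printf("[INFO] Refunded $%d.%02d for Order #%d. Status: COMPENSATED.\n", p.Amount/100, p.Amount%100, orderID)
	return nil
}

func main() {
	svc := NewPaymentService()
	svc.HandleOrderCreated(102, 5000)
	if err := svc.HandleInventoryFailed(102); err != nil {
		fmt.Println("[ERROR]", err)
	}
	_ = svc.HandleInventoryFailed(102) // duplicate delivery: ignored
}
```

The early return for an already-refunded payment keeps the compensation safe to redeliver. This matters once you reach Project 4.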


Project 3: The “Simple Order” Orchestrator (State Machine)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java (Spring State Machine), Node.js (Temporal.io)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Workflow Orchestration
  • Software or Tool: SQLite (for local state)
  • Main Book: “Microservices Patterns” by Chris Richardson

What you’ll build: A central OrderSagaOrchestrator. Instead of services talking to each other via events, they all talk to the Orchestrator. The Orchestrator maintains a State Machine for each order.

Why it teaches Sagas: It shows the alternative to choreography. You’ll learn how to centralize complex business logic while keeping services simple (they just execute commands).

Core challenges you’ll face:

  • State Persistence → If the Orchestrator crashes, how does it know where it was in the saga?
  • Command vs. Event → Understanding that Orchestrators send “Commands” (Do this!) and services send “Events” (I did this!).
  • Avoiding the “God Service” → Ensuring the Orchestrator only handles coordination, not business logic.

Real World Outcome

You will have an Orchestrator database table. You can query it mid-saga and see exactly which state the transaction is in (e.g., PAYMENT_COMPLETE, AWAITING_INVENTORY).

Example State Table:

| SagaID | Status | Step |
|--------|--------|------|
| 105 | IN_PROGRESS | Awaiting Inventory |
| 104 | COMPLETED | Done |
| 103 | REJECTED | Payment Failed |


Questions to Guide Your Design

  1. Failure Handling
    • If the Orchestrator sends “ChargePayment” and the network times out, what should it do? Retry? Fail?
  2. Persistence
    • At what point in the code do you save the saga state to the DB? (Before or after sending the command?)

The Interview Questions They’ll Ask

  1. “When would you choose Orchestration over Choreography?”
  2. “How do you handle ‘dual writes’ in an Orchestrator (saving state and sending a message)?”
  3. “What is a ‘State Machine’ in the context of a Saga?”

Hints in Layers

Hint 1: The Orchestrator Loop Create a Saga struct that has a nextStep() method.

Hint 2: The Command Channel The Orchestrator should publish to a payment_commands queue. The Payment Service should reply to a payment_replies queue.

Hint 3: Crash Recovery On startup, the Orchestrator should read all IN_PROGRESS sagas from the database and trigger their current step.
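One way to express Hints 1 and 3 is a transition table plus a recovery scan. The sketch below is hypothetical throughout. The state names are assumptions, and so is the `Reply` shape, which echoes the command it answers so that stale replies can be dropped. A map-backed `Store` stands in for the SQLite saga table.

```go
package main

import "fmt"

type SagaState string

const (
	AwaitingPayment   SagaState = "AWAITING_PAYMENT"
	AwaitingInventory SagaState = "AWAITING_INVENTORY"
	Refunding         SagaState = "REFUNDING"
	Completed         SagaState = "COMPLETED"
	Rejected          SagaState = "REJECTED"
)

// Reply arrives on a *_replies queue and names the command it answers.
type Reply struct {
	SagaID  int
	Command string
	Success bool
}

// Saga is one row of the orchestrator's state table.
type Saga struct {
	ID      int
	State   SagaState
	Pending string // command awaiting a reply; "" once the saga is finished
}

// nextStep is the transition table: (state, reply outcome) -> (state, command).
func (s Saga) nextStep(r Reply) (SagaState, string) {
	switch s.State {
	case AwaitingPayment:
		if r.Success {
			return AwaitingInventory, "ReserveInventory"
		}
		return Rejected, ""
	case AwaitingInventory:
		if r.Success {
			return Completed, ""
		}
		return Refunding, "RefundPayment" // compensate the earlier charge
	case Refunding:
		if r.Success {
			return Rejected, ""
		}
		return Refunding, "RefundPayment" // compensations are retried, not abandoned
	}
	return s.State, ""
}

// Store stands in for the saga table (e.g. in SQLite).
type Store map[int]Saga

// Handle persists the new state first and then returns the command to publish.
func (st Store) Handle(r Reply) string {
	s, ok := st[r.SagaID]
	if !ok || r.Command != s.Pending {
		return "" // unknown saga, or a stale/duplicate reply
	}
	s.State, s.Pending = s.nextStep(r)
	st[r.SagaID] = s
	return s.Pending
}

// Recover re-issues the pending command of every unfinished saga (Hint 3).
func (st Store) Recover() []string {
	var cmds []string
	for id, s := range st {
		if s.Pending != "" {
			cmds = append(cmds, fmt.Sprintf("saga %d: %s", id, s.Pending))
		}
	}
	return cmds
}

func main() {
	st := Store{105: {ID: 105, State: AwaitingPayment, Pending: "ChargePayment"}}
	fmt.Println(st.Handle(Reply{SagaID: 105, Command: "ChargePayment", Success: true}))     // ReserveInventory
	fmt.Println(st.Handle(Reply{SagaID: 105, Command: "ReserveInventory", Success: false})) // RefundPayment
	fmt.Println(st.Recover())                                                              // [saga 105: RefundPayment]
	st.Handle(Reply{SagaID: 105, Command: "RefundPayment", Success: true})
	fmt.Println(st[105].State) // REJECTED
}
```

`Handle` saves the new state before it hands back the command to publish. That answers the persistence question one way. A crash between the save and the publish means the command is re-sent on recovery, so participants must be idempotent. Saving and publishing are still two separate writes, and Project 5’s Transactional Outbox closes that gap.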


Project 4: The Idempotent Consumer (Avoiding Double Actions)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java, Node.js
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Reliability / Messaging
  • Software or Tool: PostgreSQL / Redis
  • Main Book: “Enterprise Integration Patterns” by Gregor Hohpe

What you’ll build: A consumer service that ensures it never processes the same message twice, even if the broker sends it multiple times.

Why it teaches Sagas: Sagas depend on “at-least-once” delivery, which means duplicates are inevitable. If your “Refund” operation isn’t idempotent, you might refund a customer twice. This project teaches the “Idempotent Receiver” pattern.

Core challenges you’ll face:

  • Unique Identification → How do you distinguish a retry from a new request?
  • The “Check-then-Act” Race Condition → What if two threads process the same duplicate message at the same time?
  • Handling Result Storage → If a message was already processed, what should you return to the caller/broker?

Real World Outcome

You will run a script that intentionally sends the same “OrderCreated” event 10 times. Your PaymentService will charge the credit card only once and log “ALREADY PROCESSED. Skipping.” for the other 9.

Example Output:

$ ./test_duplicates.sh
[SEND] Sending Message ID: abc-123 ...
[SEND] Sending Message ID: abc-123 (Duplicate) ...

# Payment Service
[INFO] Processing Message ID: abc-123. Charging card... DONE.
[WARN] Processing Message ID: abc-123. ALREADY PROCESSED. Skipping.
[WARN] Processing Message ID: abc-123. ALREADY PROCESSED. Skipping.

The Core Question You’re Answering

“In a world where messages are delivered ‘at-least-once’, how do we ensure we only act ‘exactly-once’?”


Thinking Exercise

The Doorbell

If someone rings your doorbell 5 times, how many times do you walk to the door?

  1. If you haven’t opened it yet, you’re already on your way.
  2. If you already opened it, you say “I’m already here”.
  3. How do you track that you’ve “handled” the ring?

The Interview Questions They’ll Ask

  1. “What is idempotency and why is it important in distributed systems?”
  2. “How would you implement an idempotent consumer using a database?”
  3. “What is a ‘Natural Key’ vs. a ‘Surrogate Key’ for idempotency?”

Hints in Layers

Hint 1: The Idempotency Table Create a table processed_messages with columns message_id and processed_at. Make message_id the primary key, which also enforces uniqueness.

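Hint 2: The Atomic Operation In the same database transaction as the business update, run INSERT INTO processed_messages ... If it fails due to the unique constraint, you know it’s a duplicate. Because both writes commit or roll back together, a message is never marked processed unless its work actually committed. (In PostgreSQL, a constraint violation aborts the whole transaction, so INSERT ... ON CONFLICT DO NOTHING plus a check of the affected row count is often cleaner.)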

Hint 3: Returning Results Even if it’s a duplicate, you might need to send back the same response you sent the first time (if using RPC-style messaging).
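The sketch below shows the idempotent-receiver logic with an in-memory map standing in for `processed_messages`. `ProcessedStore` and `Process` are hypothetical names. Holding one mutex across the check, the work, and the insert is what removes the check-then-act race. The cost is that all messages are serialized. With a database, the unique constraint inside the business transaction plays that role instead.

```go
package main

import (
	"fmt"
	"sync"
)

// ProcessedStore stands in for the processed_messages table; the map key plays
// the role of the primary-key constraint on message_id, and the value keeps the
// original response so duplicates can get the same answer (Hint 3).
type ProcessedStore struct {
	mu      sync.Mutex
	results map[string]string
}

// Process runs handle at most once per message ID. The lookup, the work and the
// insert all happen under one lock, so two goroutines carrying the same
// duplicate cannot both pass the check.
func (s *ProcessedStore) Process(msgID string, handle func() string) (result string, duplicate bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if r, ok := s.results[msgID]; ok {
		return r, true
	}
	r := handle()
	s.results[msgID] = r
	return r, false
}

func main() {
	store := &ProcessedStore{results: make(map[string]string)}
	charges := 0
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, dup := store.Process("abc-123", func() string {
				charges++ // only ever runs under store.mu
				return "charged $50"
			}); dup {
				fmt.Println("[WARN] Message ID abc-123 ALREADY PROCESSED. Skipping.")
			}
		}()
	}
	wg.Wait()
	fmt.Println("charges:", charges)
}
```

All ten goroutines race on the same message ID, but only one charge is recorded. The others get the stored response back.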


Project 5: Transactional Outbox (The Atomic Multi-Step)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java, C#
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Database Internals / Messaging
  • Software or Tool: PostgreSQL / CDC (Change Data Capture)
  • Main Book: “Microservices Patterns” by Chris Richardson

What you’ll build: A system that ensures a database update and a message publication happen atomically. You’ll use an “Outbox” table to store messages and a separate process to publish them.

Why it teaches Sagas: This is the most critical pattern for reliable Sagas. It solves the problem where a service updates its DB but crashes before it can send the next event, leaving the Saga “stuck”.

Core challenges you’ll face:

  • Transactional Integrity → How do you ensure the OUTBOX record is only created if the main business update succeeds?
  • The Polling vs. Tail-Log Debate → How do you detect new messages in the Outbox table efficiently?
  • Exactly-Once Publication → What if the publisher crashes after sending the message but before marking it SENT in (or deleting it from) the Outbox?

Real World Outcome

You’ll perform an “Update + Send” operation. You’ll then intentionally crash the “Send” part. When you restart the system, it will automatically “resume” and send the missing message.

Example Output:

# Order Service
[DB] Transaction Started.
[DB] Order #201 Inserted.
[DB] Outbox entry 'OrderCreated' inserted.
[DB] Transaction Committed.
[CRASH] Simulating process death before message publication...

# Restarter Process
[BOOT] Found 1 pending message in OUTBOX.
[SEND] Publishing 'OrderCreated' for #201 to RabbitMQ.
[DB] Marking OUTBOX entry #201 as SENT.

Questions to Guide Your Design

  1. Efficiency
    • If you poll the Outbox table every 100ms, what is the impact on DB performance?
  2. Debezium/CDC
    • How would using the DB transaction log (WAL) differ from a manual Outbox table?

The Interview Questions They’ll Ask

  1. “What problem does the Transactional Outbox pattern solve?”
  2. “Why can’t we just send a message inside a database transaction block?”
  3. “What are the trade-offs between polling and transaction log tailing?”

Hints in Layers

Hint 1: The Outbox Table Fields: id, payload, status (PENDING, SENT), created_at.

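Hint 2: The Producer Inside your Go code, call tx, err := db.Begin(), then tx.Exec("INSERT INTO orders..."), tx.Exec("INSERT INTO outbox..."), and finally tx.Commit() (or tx.Rollback() on any error). Both Exec calls must go through tx, not db, or they run outside the transaction.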

Hint 3: The Relay Create a separate goroutine (the “Message Relay”) that periodically runs SELECT * FROM outbox WHERE status = 'PENDING' ORDER BY created_at LIMIT 10, publishes each row, and then marks it SENT.


Project 6: Event Sourcing for Sagas (The Audit Log)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java, F#, Elixir
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Persistence Patterns
  • Software or Tool: EventStoreDB or custom JSON log
  • Main Book: “Microservices Patterns” by Chris Richardson (Ch. 6)

What you’ll build: A saga implementation where the state is not stored as a single row (e.g., status='PENDING'), but as a stream of events (e.g., OrderCreated, PaymentRequested, PaymentReceived).

Why it teaches Sagas: It provides a complete audit trail. You can reconstruct the history of any saga at any point in time. This is one reason event sourcing is popular in high-compliance domains such as banking and healthcare.

Core challenges you’ll face:

  • Replaying State → How do you calculate the current state from a list of past events?
  • Snapshotting → What if a saga has 10,000 events? Do you have to read them all every time?
  • Versioning Events → What happens when the business logic changes but old events remain in the store?
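Replaying and snapshotting can be sketched as a fold over the event stream. The event types reuse the examples above, while `PaymentFailed`, the `Seq` field, and the status strings are assumptions. A snapshot here is just the state plus the sequence number of the last event it includes, so replay can skip everything up to that point.

```go
package main

import "fmt"

// Event is one entry in a saga's event stream.
type Event struct {
	Seq  int
	Type string
}

// SagaView is state derived from events; LastSeq records how far it got,
// which is all a snapshot needs.
type SagaView struct {
	Status  string
	LastSeq int
}

// apply folds a single event into the state. Unknown types are ignored.
func apply(s SagaView, e Event) SagaView {
	switch e.Type {
	case "OrderCreated":
		s.Status = "PENDING"
	case "PaymentRequested":
		s.Status = "AWAITING_PAYMENT"
	case "PaymentReceived":
		s.Status = "PAID"
	case "PaymentFailed":
		s.Status = "REJECTED"
	}
	s.LastSeq = e.Seq
	return s
}

// Replay rebuilds state starting from a snapshot (use SagaView{} for none),
// applying only events newer than the snapshot.
func Replay(snapshot SagaView, events []Event) SagaView {
	s := snapshot
	for _, e := range events {
		if e.Seq > snapshot.LastSeq {
			s = apply(s, e)
		}
	}
	return s
}

func main() {
	stream := []Event{{1, "OrderCreated"}, {2, "PaymentRequested"}, {3, "PaymentReceived"}}
	full := Replay(SagaView{}, stream)     // from the beginning
	snap := Replay(SagaView{}, stream[:2]) // snapshot saved after event 2
	fromSnap := Replay(snap, stream)       // only event 3 is applied
	fmt.Println(full.Status, fromSnap.Status, full == fromSnap)
}
```

Ignoring unknown event types in `apply` is one simple versioning policy. Upcasting old events to the new schema before applying them is another.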

Thinking Exercise

The Bank Account

How do you know your current balance?

  1. Is it a single number in a cell? (State-based)
  2. Is it the sum of every transaction you ever made? (Event-sourced)

If someone asks “Why is my balance $50?”, which method provides the answer?

The Interview Questions They’ll Ask

  1. “What is Event Sourcing?”
  2. “How does Event Sourcing simplify or complicate Saga implementation?”
  3. “What is a projection in the context of event-sourced systems?”

Project 7: Handling Timeouts in Sagas (The Dead Letter)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java, Node.js
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Reliability / Operations
  • Software or Tool: RabbitMQ Dead Letter Exchanges (DLX)
  • Main Book: “Building Microservices” by Sam Newman

What you’ll build: A saga where if a service doesn’t respond within a certain time, the saga automatically triggers a compensation or moves to a “Manual Intervention” state.

Why it teaches Sagas: Real-world distributed systems have “silent failures”. A service might not crash, but it might be too slow. You’ll learn how to implement timeouts and retries using messaging infrastructure.

Core challenges you’ll face:

  • Defining “Too Long” → What is the threshold before you give up?
  • Zombie Sagas → What if a service responds AFTER the timeout has triggered a refund?
  • Visibility → How do you notify a human that a saga has failed and requires manual fixing?

Real World Outcome

You’ll slow down your PaymentService (using time.Sleep). Your OrderOrchestrator will wait for 5 seconds, then declare the payment “Timed Out” and trigger the “Cancel Order” flow.

Example Output:

[ORCHESTRATOR] Sent 'ChargeCard' to PaymentService.
[ORCHESTRATOR] Timer started: 5s.
...
[ORCHESTRATOR] Timer Expired! PaymentService took too long.
[ORCHESTRATOR] Transitioning to: FAILED_TIMEOUT.
[ORCHESTRATOR] Sending 'CancelOrder' to OrderService.

The Core Question You’re Answering

“In a distributed system, how do you distinguish between a ‘slow’ service and a ‘dead’ service?”


The Interview Questions They’ll Ask

  1. “What is a Dead Letter Exchange?”
  2. “How do you handle ‘late’ messages that arrive after a timeout has been processed?”
  3. “Why are retries dangerous in non-idempotent systems?”

Hints in Layers

Hint 1: Orchestrator Timer When the Orchestrator sends a command, it should also store an expires_at timestamp in the database.

Hint 2: The Reaper Create a background process that periodically checks for sagas where now() > expires_at and status is still IN_PROGRESS.

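Hint 3: DLX Give the command queue a message TTL (x-message-ttl) and a dead-letter exchange (x-dead-letter-exchange). A message that stays in the queue for more than X seconds is then routed to a failed_messages queue bound to that exchange.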


Project 8: Concurrent Sagas and Isolation (The Lost Update)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java, Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Concurrency Control
  • Software or Tool: Redis (Distributed Locks for testing)
  • Main Book: “Microservices Patterns” by Chris Richardson (Ch. 4.3)

What you’ll build: A scenario where two different Sagas try to modify the same record (e.g., updating the same inventory item or bank account) at the same time. You will implement countermeasures like “Semantic Locking”.

Why it teaches Sagas: Sagas lack ACID isolation. This project forces you to grapple with the “Anomalies” (Lost Updates, Dirty Reads) that occur when transactions aren’t isolated.

Core challenges you’ll face:

  • Detecting Conflicts → How do you know another saga is mid-flight on this record?
  • Semantic Locking → How do you flag a record as “Pending Update” without using a DB lock?
  • Commutative Updates → Can you design your operations so the order doesn’t matter (e.g., incrementing/decrementing)?

Thinking Exercise

The Overbooked Flight

Two people try to book the last seat.

  1. Saga A: Check seat (Free) -> Charge Card -> Reserve Seat.
  2. Saga B: Check seat (Free) -> Charge Card -> Reserve Seat.

Both check at the same time. Both see it’s free. Both charge the card. Only one can have the seat. How do you prevent this?

The Interview Questions They’ll Ask

  1. “What is the ‘Isolation’ problem in Sagas?”
  2. “Explain ‘Semantic Locking’.”
  3. “How can ‘Pessimistic View’ help manage concurrency in Sagas?”

Project 9: Monitoring Distributed Workflows (The Dashboard)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go + React
  • Alternative Programming Languages: Node.js + Vue, Python + Streamlit
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Observability
  • Software or Tool: OpenTelemetry / Jaeger
  • Main Book: “Building Microservices” by Sam Newman (Ch. 8)

What you’ll build: A web dashboard that visualizes the path of a single Saga across all services. It shows which steps succeeded, which failed, and the timing of each event.

Why it teaches Sagas: Sagas are hard to debug. Without visualization, you’re just looking at logs across 5 different services. This project teaches “Distributed Tracing”.

Core challenges you’ll face:

  • Context Propagation → How do you pass the trace_id from the Order Service to the Payment Service through the Message Broker?
  • Aggregation → How do you collect events from multiple services and correlate them in one UI?
  • Real-time Updates → Using WebSockets to show a Saga progressing live.
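For the context-propagation challenge, the key idea is that the trace ID must ride along in message headers across the broker hop. Purely for illustration, the sketch below hand-rolls the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`) over a hypothetical `Message` envelope. In practice, OpenTelemetry’s W3C TraceContext propagator injects and extracts this header for you. With RabbitMQ, the map would travel as AMQP message headers.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// Message is a broker-agnostic envelope; Headers would map to AMQP headers.
type Message struct {
	Headers map[string]string
	Body    []byte
}

// randomHex returns nBytes of randomness as 2*nBytes hex characters.
func randomHex(nBytes int) string {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// Inject writes "traceparent": version 00, 32-hex trace ID, a fresh 16-hex
// parent span ID, and flags 01 (sampled).
func Inject(m *Message, traceID string) {
	if m.Headers == nil {
		m.Headers = map[string]string{}
	}
	m.Headers["traceparent"] = fmt.Sprintf("00-%s-%s-01", traceID, randomHex(8))
}

// Extract recovers the trace ID so the consumer's spans join the same trace.
func Extract(m Message) (string, bool) {
	parts := strings.Split(m.Headers["traceparent"], "-")
	if len(parts) != 4 || len(parts[1]) != 32 {
		return "", false
	}
	return parts[1], true
}

func main() {
	traceID := randomHex(16) // started by the Order Service
	msg := Message{Body: []byte(`{"order_id":501}`)}
	Inject(&msg, traceID)
	got, ok := Extract(msg) // in the Payment Service, after the broker hop
	fmt.Println(ok, got == traceID)
}
```

A consumer that starts its spans with the extracted trace ID lets the tracing backend stitch both services into one timeline.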

Real World Outcome

A dashboard where you can enter an Order ID and see a “Timeline” view. Green dots for success, red dots for failure, and arrows showing the event flow.

Example Dashboard View:

Order #501
[O] OrderCreated (0ms) ----------> [P] PaymentSuccess (450ms)
                                         |
                                         V
[S] ShippingReserved (FAILED) <--- [I] InventoryReserved (1.2s)
  |
  +--- [P] REFUNDING (1.5s) -----> [P] COMPENSATED (1.8s)

The Interview Questions They’ll Ask

  1. “What is a Correlation ID and how is it different from a Trace ID?”
  2. “How do you monitor the health of a distributed transaction?”
  3. “What are the 3 pillars of Observability?”

Project 10: The “Super-Saga” (Dynamic Routing)

  • File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Java, Node.js
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: Advanced Workflow Engines
  • Software or Tool: Temporal.io or custom DSL
  • Main Book: “Microservices Patterns” by Chris Richardson

What you’ll build: A saga where the next step is not hardcoded, but decided by a rules engine at runtime. For example: “If Customer is VIP, skip payment check” or “If Order > $1000, require manual approval step”.

Why it teaches Sagas: It takes the pattern to its logical conclusion. You’re no longer building a hardcoded flow; you’re building a “Distributed Workflow Engine”.
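Here is a minimal sketch of runtime routing, assuming rules are plain Go predicates evaluated in order. `NextStep` picks the first applicable step that has not run yet. It is called again after every step, so the route is never fixed in advance. The `Order` fields, step names, and thresholds are hypothetical; they mirror the two example rules above, with amounts in cents.

```go
package main

import "fmt"

// Order holds the facts the hypothetical rules inspect.
type Order struct {
	ID    int
	Total int64 // cents
	VIP   bool
}

// Rule includes Step in the route when Applies returns true.
type Rule struct {
	Step    string
	Applies func(Order) bool
}

// defaultRules mirrors the examples in the text; order matters.
var defaultRules = []Rule{
	{"ManualApproval", func(o Order) bool { return o.Total > 100_000 }}, // over $1000
	{"PaymentCheck", func(o Order) bool { return !o.VIP }},              // VIPs skip it
	{"ReserveInventory", func(Order) bool { return true }},
	{"ArrangeShipping", func(Order) bool { return true }},
}

// NextStep returns the first applicable step that has not run yet, or "" when
// the saga is finished. It is re-evaluated after every step.
func NextStep(o Order, done map[string]bool, rules []Rule) string {
	for _, r := range rules {
		if !done[r.Step] && r.Applies(o) {
			return r.Step
		}
	}
	return ""
}

func main() {
	o := Order{ID: 9, Total: 150_000, VIP: true}
	done := map[string]bool{}
	for step := NextStep(o, done, defaultRules); step != ""; step = NextStep(o, done, defaultRules) {
		fmt.Println("executing", step)
		done[step] = true
	}
}
```

For the VIP order over $1000 in `main`, the payment check is skipped but manual approval still runs. A real engine would also persist `done` per saga and attach a compensation to each step.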


Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Happy Path | Level 1 | Weekend | ⭐ | 😃 |
| 2. Compensation | Level 2 | 1 Week | ⭐⭐ | 😎 |
| 3. Orchestrator | Level 2 | 1 Week | ⭐⭐⭐ | 🧐 |
| 4. Idempotency | Level 2 | Weekend | ⭐⭐⭐ | 🔒 |
| 5. Outbox | Level 3 | 2 Weeks | ⭐⭐⭐⭐ | 🏗️ |
| 6. Event Sourcing | Level 4 | 1 Month | ⭐⭐⭐⭐⭐ | 🧙‍♂️ |
| 7. Timeouts | Level 3 | 1 Week | ⭐⭐⭐ | ⏰ |
| 8. Concurrency | Level 4 | 2 Weeks | ⭐⭐⭐⭐⭐ | 🧠 |
| 9. Dashboard | Level 2 | 1 Week | ⭐⭐ | 📊 |
| 10. Super-Saga | Level 5 | 1 Month+ | ⭐⭐⭐⭐⭐ | 🚀 |

Recommendation

If you are new to distributed systems, start with Projects 1 and 2. These will give you the “Aha!” moment about event-driven design and the fundamental need for compensation.

If you are an experienced backend dev looking to level up your architectural skills, Project 5 (Transactional Outbox) is the most valuable “real-world” pattern you can learn.


Final Overall Project: The “Distributed E-Commerce Engine”

Combine everything you’ve learned into a single, production-grade system.

  • Goal: Build a system that handles Orders, Payments, Inventory, Shipping, and Customer Notifications.
  • Requirements:
    • Use an Orchestrator with a persistent State Machine.
    • Every service uses the Transactional Outbox pattern.
    • Every consumer is Idempotent.
    • Implement a “Dead Letter” strategy for all queues.
    • Build a React Dashboard to monitor the status of every order in real-time.
    • Implement Circuit Breakers on the RPC calls between services.

Summary

This learning path covers Distributed Transactions (Sagas) through 10 hands-on projects. Here’s the complete list:

| # | Project Name | Main Language | Difficulty | Time Estimate |
|---|---|---|---|---|
| 1 | Simple Order Choreography | Go | Level 1 | Weekend |
| 2 | Simple Order Compensation | Go | Level 2 | 1 Week |
| 3 | Simple Order Orchestrator | Go | Level 2 | 1 Week |
| 4 | Idempotent Consumer | Go | Level 2 | Weekend |
| 5 | Transactional Outbox | Go | Level 3 | 2 Weeks |
| 6 | Event Sourcing for Sagas | Go | Level 4 | 1 Month |
| 7 | Handling Timeouts | Go | Level 3 | 1 Week |
| 8 | Concurrent Sagas | Go | Level 4 | 2 Weeks |
| 9 | Saga Dashboard | Go/React | Level 2 | 1 Week |
| 10 | Super-Saga Engine | Go | Level 5 | 1 Month+ |

  • For beginners: Start with projects #1, #2, #4.
  • For intermediate: Focus on projects #3, #5, #7, #9.
  • For advanced: Focus on projects #6, #8, #10.

Expected Outcomes

After completing these projects, you will:

  • Master the difference between Orchestration and Choreography.
  • Be able to design and implement Compensating Transactions.
  • Understand how to guarantee Eventual Consistency without 2PC.
  • Implement critical infrastructure patterns like Transactional Outbox and Idempotency.
  • Build systems that are resilient to failure, network delays, and duplicates.

You’ll have built 10 working projects that demonstrate deep understanding of Distributed Transactions from first principles.