Learn Distributed Transactions (Sagas): From Zero to Saga Master
Goal: Deeply understand the challenges of maintaining data consistency in distributed systems. You will move beyond simple ACID transactions to master the Saga pattern: coordinating complex business processes across multiple microservices using both orchestration and choreography, handling failures gracefully with compensating transactions, and reaching eventual consistency without distributed locks or 2PC.
Why Distributed Transactions (Sagas) Matter
In the monolithic world, ACID transactions are our best friends. You open a transaction, update multiple tables, and either everything commits or everything rolls back. Life is simple.
But in a microservices architecture, your data is scattered. An "Order" might involve the Order Service, Payment Service, Inventory Service, and Shipping Service. There is no single database to coordinate the transaction. Traditional distributed transactions such as Two-Phase Commit (2PC) don't scale well: they add latency, hold locks while participants wait on the coordinator, and require all participants to be available at the same time.
Sagas were introduced in 1987 by Hector Garcia-Molina and Kenneth Salem to solve the "Long Lived Transaction" (LLT) problem. Instead of one giant lock, a Saga breaks the process into a sequence of local transactions. If one step fails, the Saga executes "compensating transactions" to undo the work of the previous steps.
Understanding Sagas is the difference between building a fragile system that leaves data in a corrupted state and building a resilient, enterprise-grade distributed system used by companies like Uber, Netflix, and Amazon.
Core Concept Analysis
The Failure of Atomic Transactions in Microservices
In a monolith:
[ Request ] --> [ Start Transaction ]
                  |  - Update Orders Table
                  |  - Update Inventory Table
                  |  - Update Payment Table
                [ Commit Transaction ] --> [ Success ]
In microservices (The Problem):
[ Order Service ] --(RPC)--> [ Payment Service ] --(RPC)--> [ Inventory Service ]
        |                             |                               |
     [ DB ]                        [ DB ]                          [ DB ]
If the Inventory Service fails, how do you roll back the payment that already happened in the Payment Service? You can't just ROLLBACK a remote database.
The Saga Pattern: Eventual Consistency
A Saga is a sequence of local transactions $T_1, T_2, \ldots, T_n$. Each $T_i$ is accompanied by a compensating transaction $C_i$ that undoes the changes made by $T_i$.
Happy Path: $T_1 \rightarrow T_2 \rightarrow T_3 \rightarrow \ldots \rightarrow T_n$ (All succeed)
Failure Path (at $T_3$): $T_1 \rightarrow T_2 \rightarrow T_3$ (Fails!) $\rightarrow C_2 \rightarrow C_1$ (Rollback)
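The failure path above can be sketched as a small runner. This is a minimal illustration, not a production saga engine: the `Step` type, `RunSaga`, and the helper functions `ok` and `fail` are hypothetical names, and compensation failures are ignored where a real system would need to retry them.

```go
package main

import (
	"errors"
	"fmt"
)

// Step pairs a local transaction T_i with its compensating transaction C_i.
type Step struct {
	Name       string
	Action     func() error
	Compensate func() error
}

// RunSaga executes the steps in order. If step i fails, it runs the
// compensations of steps i-1 down to 0 in reverse order and returns the
// original error. The returned log records what ran, for illustration only.
func RunSaga(steps []Step) ([]string, error) {
	var log []string
	for i, s := range steps {
		if err := s.Action(); err != nil {
			log = append(log, s.Name+" failed")
			for j := i - 1; j >= 0; j-- {
				// Assumption: compensations succeed. Real code must retry them.
				_ = steps[j].Compensate()
				log = append(log, "undo "+steps[j].Name)
			}
			return log, err
		}
		log = append(log, s.Name)
	}
	return log, nil
}

func ok() error   { return nil }
func fail() error { return errors.New("step failed") }

func main() {
	steps := []Step{
		{"T1", ok, ok},
		{"T2", ok, ok},
		{"T3", fail, ok},
	}
	log, err := RunSaga(steps)
	fmt.Println(log, err) // [T1 T2 T3 failed undo T2 undo T1] step failed
}
```

Note that the failed step itself is not compensated: a failed local transaction is assumed to have rolled back in its own database.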
The Two Coordination Flavors
1. Choreography (Event-Driven)
No central coordinator. Each service listens for events and decides when to execute its local transaction.
[ Order Service ]        [ Payment Service ]        [ Inventory Service ]
        |                         |                           |
        |----(OrderCreated)------>|                           |
        |                         |                           |
        |<---(PaymentSuccess)-----|                           |
        |                         |                           |
        |                         |----(PaymentSuccess)------>|
        |                         |                           |
        |<---------------(InventoryAllocated)-----------------|
Pros: Decoupled, simple for small sagas. Cons: Hard to track state, risk of cyclic dependencies, hard to debug.
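To make the choreography flow concrete, here is a minimal in-memory sketch. The `Bus` type is a hypothetical, synchronous stand-in for a real broker, and `Wire` is an illustrative name; the event names follow the diagram above.

```go
package main

import "fmt"

// Event is a hypothetical message; OrderID doubles as the correlation ID.
type Event struct {
	Type    string
	OrderID int
}

// Bus is a synchronous in-memory stand-in for a broker such as RabbitMQ.
type Bus struct {
	handlers map[string][]func(Event)
	Log      []string // every published event type, in order
}

func NewBus() *Bus { return &Bus{handlers: map[string][]func(Event){}} }

func (b *Bus) Subscribe(eventType string, h func(Event)) {
	b.handlers[eventType] = append(b.handlers[eventType], h)
}

func (b *Bus) Publish(e Event) {
	b.Log = append(b.Log, e.Type)
	for _, h := range b.handlers[e.Type] {
		h(e)
	}
}

// Wire registers each service's reaction. No service calls another directly;
// each one only knows which event it listens for and which it emits.
func Wire(b *Bus) {
	b.Subscribe("OrderCreated", func(e Event) { // Payment Service
		b.Publish(Event{"PaymentSuccess", e.OrderID})
	})
	b.Subscribe("PaymentSuccess", func(e Event) { // Inventory Service
		b.Publish(Event{"InventoryAllocated", e.OrderID})
	})
}

func main() {
	b := NewBus()
	Wire(b)
	b.Publish(Event{"OrderCreated", 101}) // Order Service starts the saga
	fmt.Println(b.Log)                    // [OrderCreated PaymentSuccess InventoryAllocated]
}
```

The "hard to track state" drawback is visible even here: the overall flow exists only implicitly, spread across the subscriptions.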
2. Orchestration (Centralized)
A central "Orchestrator" (or Saga Manager) tells each service what to do and when.
             +-----------------------+
             |   Saga Orchestrator   |
             +-----------+-----------+
                 |   ^       |   ^
       (Command) |   | (Reply)   | (Command / Reply)
                 v   |       v   |
          +-----------+     +-------------+
          |  Payment  |     |  Inventory  |
          |  Service  |     |   Service   |
          +-----------+     +-------------+
Pros: Centralized state, easier to reason about, no cyclic dependencies. Cons: Risk of over-centralizing logic, orchestrator becomes a bottleneck if poorly designed.
Transaction Types in Sagas
- Compensatable transactions: Transactions that can potentially be reversed by a compensating transaction.
- Pivot transaction: The go/no-go point of the saga. Once the pivot transaction commits, the saga runs to completion. It is either a transaction that is neither compensatable nor retriable, or it is the last compensatable transaction.
- Retriable transactions: Transactions that follow the pivot transaction and are guaranteed to succeed.
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Local Transactions | Each step in a Saga must be atomic within its own database context. |
| Compensating Transactions | These must be idempotent (running them twice has no extra effect) and commutative (order might vary in edge cases). |
| Eventual Consistency | Accepting that data will be inconsistent for a brief window during the Saga execution. |
| Idempotency | Services MUST handle the same message twice without causing double-payments or double-shipping. |
| Outbox Pattern | Ensuring that database updates and message sending happen atomically to avoid "ghost" events. |
| State Machines | Sagas are essentially state machines moving through defined business states. |
Deep Dive Reading by Concept
Foundational Theory
| Concept | Book & Chapter |
|---|---|
| Distributed Systems Logic | "Designing Data-Intensive Applications" by Martin Kleppmann - Ch. 9: "Consistency and Consensus" |
| Microservices Patterns | "Microservices Patterns" by Chris Richardson - Ch. 4: "Managing transactions with sagas" |
| Original Saga Paper | "Sagas" by Hector Garcia-Molina and Kenneth Salem (1987) |
Implementation Patterns
| Concept | Book & Chapter |
|---|---|
| Transactional Outbox | "Microservices Patterns" by Chris Richardson - Ch. 3: "Interprocess Communication" (Section 3.3.5) |
| Event Sourcing | "Building Microservices" by Sam Newman - Ch. 6: "Workflow and Orchestration" |
| Idempotent Consumers | "Enterprise Integration Patterns" by Gregor Hohpe - "Idempotent Receiver" |
Essential Reading Order
- The Context (Week 1):
- Designing Data-Intensive Applications Ch. 9 (Why 2PC is hard).
- Microservices Patterns Ch. 4 (The best modern guide to Sagas).
- The Mechanics (Week 2):
- Enterprise Integration Patterns (Messaging and Idempotency).
- Read the 1987 "Sagas" paper (it's surprisingly readable).
Project List
Project 1: The "Simple Order" Choreography (Happy Path)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java, Node.js, Python
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The "Resume Gold"
- Difficulty: Level 1: Beginner
- Knowledge Area: Distributed Systems / Messaging
- Software or Tool: RabbitMQ or Redis Pub/Sub
- Main Book: "Microservices Patterns" by Chris Richardson
What you'll build: A simple event-driven system where an OrderService emits an event, a PaymentService processes it, and an InventoryService reserves stock. This project focuses solely on the "Happy Path" where everything works.
Why it teaches Sagas: It introduces the concept of local transactions triggered by events. You'll see how data flows through a distributed system without a central controller.
Core challenges you'll face:
- Message Reliability: How do you ensure the event actually reaches the next service?
- Event Schema: What information does the PaymentService need from the OrderService?
- Asynchronous Flow: How do you know when the whole "Transaction" is finished?
Key Concepts
- Pub/Sub: "Designing Data-Intensive Applications" Ch. 11 - Martin Kleppmann
- Event-Driven Architecture: "Building Microservices" Ch. 6 - Sam Newman
Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic understanding of HTTP and a messaging broker like RabbitMQ.
Real World Outcome
You will have three separate services running. When you place an order, you'll see a waterfall of logs across three terminals showing the sequence of events.
Example Output:
# Terminal 1: Order Service
[INFO] Received Order #101. Status: PENDING.
[INFO] Emitting 'OrderCreated' event.
# Terminal 2: Payment Service
[INFO] Received 'OrderCreated' for #101.
[INFO] Charging $50... Success!
[INFO] Emitting 'PaymentSuccess' event.
# Terminal 3: Inventory Service
[INFO] Received 'PaymentSuccess' for #101.
[INFO] Deducting 1x 'CoolWidget' from stock... Success!
[INFO] Emitting 'InventoryReserved' event.
The Core Question You're Answering
"How can multiple independent services collaborate on a single business goal without talking to each other directly?"
Before you write any code, think about what happens if you just used REST calls. If the Payment Service is slow, does the Order Service wait? If you use events, the Order Service is free as soon as the event is published.
Concepts You Must Understand First
Stop and research these before coding:
- At-Least-Once Delivery
- What happens if a message is sent twice?
- What happens if the receiver crashes before processing?
- Book Reference: "Designing Data-Intensive Applications" Ch. 11
- Event Schema Versioning
- What happens if you add a field to "OrderCreated"?
- Book Reference: "Building Microservices" Ch. 4
Questions to Guide Your Design
- Messaging
- Will you use a Topic or a Queue?
- What happens if the Payment Service is down when "OrderCreated" is sent?
- Tracking
- How does the Inventory Service know which Order the Payment belongs to? (Correlation ID)
Thinking Exercise
Trace the Event
Imagine you have a piece of paper (The Event). You put it in a mailbox.
- Does the mailman (Broker) guarantee it gets there?
- Does the recipient keep the paper or a copy?
- If they lose it, can they ask for another copy?
The Interview Questions They'll Ask
- "What is the difference between Orchestration and Choreography?"
- "How do you handle message loss in an event-driven system?"
- "What is a Correlation ID and why is it mandatory in Sagas?"
Hints in Layers
Hint 1: Setup
Start by running RabbitMQ in Docker. Create three separate Go programs that connect to it.
Hint 2: The Event
Define a simple JSON struct for "OrderCreated" that includes OrderID, CustomerID, and Amount.
Hint 3: Wiring
Make OrderService publish to an exchange. PaymentService should bind a queue to that exchange.
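As a starting point for Hint 2, the event could look like the sketch below. The JSON field names are assumptions, and the amount is stored as integer cents (`AmountCents`) rather than a float to avoid rounding problems with money; adjust both to whatever your services agree on. `Encode` and `Decode` are illustrative helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OrderCreated carries the fields named in Hint 2. OrderID also serves as
// the correlation ID that downstream services echo back in their own events.
type OrderCreated struct {
	OrderID     string `json:"order_id"`
	CustomerID  string `json:"customer_id"`
	AmountCents int64  `json:"amount_cents"`
}

// Encode produces the message body the OrderService would publish.
func Encode(e OrderCreated) ([]byte, error) { return json.Marshal(e) }

// Decode parses the body on the consumer side (for example in PaymentService).
func Decode(body []byte) (OrderCreated, error) {
	var e OrderCreated
	err := json.Unmarshal(body, &e)
	return e, err
}

func main() {
	body, err := Encode(OrderCreated{OrderID: "101", CustomerID: "c-7", AmountCents: 5000})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"order_id":"101","customer_id":"c-7","amount_cents":5000}

	got, err := Decode(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(got.OrderID, got.AmountCents) // 101 5000
}
```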
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Event Driven Design | "Building Microservices" | Ch. 6 |
| RabbitMQ Basics | "RabbitMQ in Action" | Ch. 1-2 |
Implementation Hints
Focus on the message flow. Use an AMQP client library in Go, such as amqp091-go (github.com/rabbitmq/amqp091-go). Ensure each service logs heavily so you can see the flow. Don't worry about database persistence yet; use in-memory maps for this first project.
Project 2: The "Simple Order" Choreography (Compensation)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java, Node.js
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The "Resume Gold"
- Difficulty: Level 2: Intermediate
- Knowledge Area: Error Handling in Distributed Systems
- Software or Tool: RabbitMQ / Redis
- Main Book: "Microservices Patterns" by Chris Richardson
What you'll build: An extension of Project 1. Now, if the InventoryService finds it has no stock, it emits an InventoryFailed event. The PaymentService must listen for this and refund the customer.
Why it teaches Sagas: This is the heart of the Saga pattern: Compensating Transactions. You'll learn that a "Rollback" in microservices is actually a new "Undo" operation.
Core challenges you'll face:
- Defining the Inverse: What is the opposite of "Charge Card"? (Refund Card)
- State Transitions: The OrderService needs to move from PENDING to CANCELLED.
- Handling Partial Failures: What if the Refund fails?
Key Concepts
- Compensating Transactions: "Sagas" (1987 Paper) - Garcia-Molina
- Semantic Rollback: "Microservices Patterns" Ch. 4
Real World Outcome
You'll trigger an order that you know will fail (e.g., ordering 1000 items when stock is 10). You'll see the system automatically undoing its previous steps.
Example Output:
# Order Service
[INFO] Order #102 Created. Status: PENDING.
[INFO] OrderCreated event sent.
# Payment Service
[INFO] Charged $50 for Order #102. Success.
[INFO] PaymentSuccess event sent.
# Inventory Service
[ERROR] Stock insufficient for Order #102.
[INFO] Emitting 'InventoryFailed' event.
# Payment Service (Reaction)
[INFO] Received 'InventoryFailed'. Undoing Payment for Order #102...
[INFO] Refunded $50. Status: COMPENSATED.
# Order Service (Reaction)
[INFO] Received 'InventoryFailed'. Setting Order #102 to REJECTED.
The Core Question You're Answering
"How do you 'un-ring' a bell in a distributed system?"
In Project 1, we assumed everything worked. In the real world, some fraction of requests will always fail. If you can't undo, you lose money or customer trust.
Thinking Exercise
The ATM Analogy
Imagine an ATM. You ask for $100.
- The ATM checks your balance (Succeeds).
- The ATM debits your account (Succeeds).
- The ATM physical dispenser jams (Fails!).
What must the system do? Does it just say "Sorry"? No, it must run a compensating transaction to put the $100 back in your account.
The Interview Questions They'll Ask
- "What is a compensating transaction?"
- "Why can't we just use a traditional ROLLBACK in microservices?"
- "What happens if a compensating transaction fails? How do you handle that?"
Hints in Layers
Hint 1: The Failure Trigger
Add a hardcoded check in InventoryService: if (requestedQuantity > currentStock) emit InventoryFailed.
Hint 2: Listening for Failures
The PaymentService now needs to listen to two queues: one for OrderCreated (forward path) and one for InventoryFailed (backward path).
Hint 3: Matching State
When PaymentService receives InventoryFailed, it needs to find the specific payment record it created earlier and mark it as "Refunded".
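Hint 3 can be sketched with an in-memory map in place of a payments table. The type and method names (`PaymentService`, `OnOrderCreated`, `OnInventoryFailed`) and the status strings are assumptions for illustration. The point is that the compensation looks up the earlier payment by order ID and is safe to run twice.

```go
package main

import "fmt"

// PaymentService keeps one payment record per order ID.
type PaymentService struct {
	status  map[int]string // order ID -> "CHARGED" or "REFUNDED"
	Refunds int            // number of refunds actually issued
}

func NewPaymentService() *PaymentService {
	return &PaymentService{status: map[int]string{}}
}

// OnOrderCreated is the forward path: charge and remember the payment.
func (p *PaymentService) OnOrderCreated(orderID int) {
	p.status[orderID] = "CHARGED"
}

// OnInventoryFailed is the compensating transaction. It only refunds a
// payment that is currently CHARGED, so a redelivered event does nothing.
func (p *PaymentService) OnInventoryFailed(orderID int) string {
	switch p.status[orderID] {
	case "CHARGED":
		p.status[orderID] = "REFUNDED"
		p.Refunds++
		return "REFUNDED"
	case "REFUNDED":
		return "ALREADY_REFUNDED"
	default:
		return "NO_PAYMENT" // nothing was charged for this order
	}
}

func main() {
	p := NewPaymentService()
	p.OnOrderCreated(102)
	fmt.Println(p.OnInventoryFailed(102)) // REFUNDED
	fmt.Println(p.OnInventoryFailed(102)) // ALREADY_REFUNDED
	fmt.Println(p.Refunds)                // 1
}
```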
Project 3: The "Simple Order" Orchestrator (State Machine)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java (Spring State Machine), Node.js (Temporal.io)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The "Resume Gold"
- Difficulty: Level 2: Intermediate
- Knowledge Area: Workflow Orchestration
- Software or Tool: SQLite (for local state)
- Main Book: "Microservices Patterns" by Chris Richardson
What you'll build: A central OrderSagaOrchestrator. Instead of services talking to each other via events, they all talk to the Orchestrator. The Orchestrator maintains a State Machine for each order.
Why it teaches Sagas: It shows the alternative to choreography. You'll learn how to centralize complex business logic while keeping services simple (they just execute commands).
Core challenges you'll face:
- State Persistence: If the Orchestrator crashes, how does it know where it was in the saga?
- Command vs. Event: Understanding that Orchestrators send "Commands" (Do this!) and services send "Events" (I did this!).
- Avoiding the "God Service": Ensuring the Orchestrator only handles coordination, not business logic.
Real World Outcome
You will have an Orchestrator database table. You can query it mid-saga and see exactly which state the transaction is in (e.g., PAYMENT_COMPLETE, AWAITING_INVENTORY).
Example State Table:
| SagaID | Status | Step |
|---|---|---|
| 105 | IN_PROGRESS | Awaiting Inventory |
| 104 | COMPLETED | Done |
| 103 | REJECTED | Payment Failed |
Questions to Guide Your Design
- Failure Handling
- If the Orchestrator sends "ChargePayment" and the network times out, what should it do? Retry? Fail?
- Persistence
- At what point in the code do you save the saga state to the DB? (Before or after sending the command?)
The Interview Questions They'll Ask
- "When would you choose Orchestration over Choreography?"
- "How do you handle 'dual writes' in an Orchestrator (saving state and sending a message)?"
- "What is a 'State Machine' in the context of a Saga?"
Hints in Layers
Hint 1: The Orchestrator Loop
Create a Saga struct that has a nextStep() method.
Hint 2: The Command Channel
The Orchestrator should publish to a payment_commands queue. The Payment Service should reply to a payment_replies queue.
Hint 3: Crash Recovery
On startup, the Orchestrator should read all IN_PROGRESS sagas from the database and trigger their current step.
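One possible shape for the Saga struct from Hint 1 is sketched below. The state names, reply names, and command names are assumptions chosen for this example, and persistence is left out: in the real project you would save `State` to the database around each transition.

```go
package main

import "fmt"

type State string

const (
	AwaitingPayment   State = "AWAITING_PAYMENT"
	AwaitingInventory State = "AWAITING_INVENTORY"
	RefundingPayment  State = "REFUNDING_PAYMENT"
	Completed         State = "COMPLETED"
	Rejected          State = "REJECTED"
)

// Saga is one row of the orchestrator's state table.
type Saga struct {
	ID    int
	State State
}

// NextStep applies a service reply to the state machine and returns the
// next command to send, or "" when there is nothing to send. Replies that
// do not match the current state (duplicates, late replies) are ignored.
func (s *Saga) NextStep(reply string) string {
	switch {
	case s.State == AwaitingPayment && reply == "PaymentSucceeded":
		s.State = AwaitingInventory
		return "ReserveInventory"
	case s.State == AwaitingPayment && reply == "PaymentFailed":
		s.State = Rejected
	case s.State == AwaitingInventory && reply == "InventoryReserved":
		s.State = Completed
	case s.State == AwaitingInventory && reply == "InventoryFailed":
		s.State = RefundingPayment
		return "RefundPayment" // compensation for the earlier charge
	case s.State == RefundingPayment && reply == "PaymentRefunded":
		s.State = Rejected
	}
	return ""
}

func main() {
	s := &Saga{ID: 105, State: AwaitingPayment}
	fmt.Println(s.NextStep("PaymentSucceeded"), s.State) // ReserveInventory AWAITING_INVENTORY
	fmt.Println(s.NextStep("InventoryFailed"), s.State)  // RefundPayment REFUNDING_PAYMENT
	fmt.Println(s.NextStep("PaymentRefunded"), s.State)  //  REJECTED
}
```

Keeping every transition in one switch makes the whole flow readable in one place, which is the main advantage of orchestration over choreography.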
Project 4: The Idempotent Consumer (Avoiding Double Actions)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java, Node.js
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The "Service & Support" Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Reliability / Messaging
- Software or Tool: PostgreSQL / Redis
- Main Book: "Enterprise Integration Patterns" by Gregor Hohpe
What you'll build: A consumer service that ensures it never processes the same message twice, even if the broker sends it multiple times.
Why it teaches Sagas: Sagas depend on "at-least-once" delivery, which means duplicates are inevitable. If your "Refund" operation isn't idempotent, you might refund a customer twice. This project teaches the "Idempotent Receiver" pattern.
Core challenges you'll face:
- Unique Identification: How do you distinguish a retry from a new request?
- The "Check-then-Act" Race Condition: What if two threads process the same duplicate message at the same time?
- Handling Result Storage: If a message was already processed, what should you return to the caller/broker?
Real World Outcome
You will run a script that intentionally sends the same "OrderCreated" event 10 times. Your PaymentService will only charge the credit card once and log "Duplicate message ignored" for the other 9.
Example Output:
$ ./test_duplicates.sh
[SEND] Sending Message ID: abc-123 ...
[SEND] Sending Message ID: abc-123 (Duplicate) ...
# Payment Service
[INFO] Processing Message ID: abc-123. Charging card... DONE.
[WARN] Processing Message ID: abc-123. ALREADY PROCESSED. Skipping.
[WARN] Processing Message ID: abc-123. ALREADY PROCESSED. Skipping.
The Core Question You're Answering
"In a world where messages are delivered 'at-least-once', how do we ensure we only act 'exactly-once'?"
Thinking Exercise
The Doorbell
If someone rings your doorbell 5 times, how many times do you walk to the door?
- If you haven't opened it yet, you're already on your way.
- If you already opened it, you say "I'm already here".
- How do you track that you've "handled" the ring?
The Interview Questions They'll Ask
- "What is idempotency and why is it important in distributed systems?"
- "How would you implement an idempotent consumer using a database?"
- "What is a 'Natural Key' vs. a 'Surrogate Key' for idempotency?"
Hints in Layers
Hint 1: The Idempotency Table
Create a table processed_messages with columns message_id and processed_at. Make message_id the primary key so the database enforces uniqueness.
Hint 2: The Atomic Operation
Use a database transaction that also contains the business update. INSERT INTO processed_messages ...; if the insert fails due to the unique constraint, you know it's a duplicate and can roll back.
Hint 3: Returning Results
Even if it's a duplicate, you might need to send back the same response you sent the first time (if using RPC-style messaging).
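The check-then-act race can be explored without a database first. In this sketch a mutex-protected map stands in for the processed_messages table and its unique constraint; that substitution, and the names `Consumer` and `Handle`, are assumptions for illustration. The key property is that "have I seen this ID?" and "record this ID" happen as one atomic step.

```go
package main

import (
	"fmt"
	"sync"
)

// Consumer processes each message ID at most once.
type Consumer struct {
	mu        sync.Mutex
	processed map[string]bool // stand-in for the processed_messages table
	Charges   int             // how many times the card was charged
}

func NewConsumer() *Consumer {
	return &Consumer{processed: map[string]bool{}}
}

// Handle returns true if the message was processed and false if it was a
// duplicate. The lock makes the check and the insert a single atomic step,
// which is what the unique constraint provides in a real database.
func (c *Consumer) Handle(messageID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.processed[messageID] {
		return false
	}
	c.processed[messageID] = true
	c.Charges++ // the business action, done once per message ID
	return true
}

func main() {
	c := NewConsumer()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ { // ten concurrent deliveries of the same message
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Handle("abc-123")
		}()
	}
	wg.Wait()
	fmt.Println("charges:", c.Charges) // charges: 1
}
```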
Project 5: Transactional Outbox (The Atomic Multi-Step)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java, C#
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The "Open Core" Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: Database Internals / Messaging
- Software or Tool: PostgreSQL / CDC (Change Data Capture)
- Main Book: "Microservices Patterns" by Chris Richardson
What you'll build: A system that ensures a database update and a message publication happen atomically. You'll use an "Outbox" table to store messages and a separate process to publish them.
Why it teaches Sagas: This is the most critical pattern for reliable Sagas. It solves the problem where a service updates its DB but crashes before it can send the next event, leaving the Saga "stuck".
Core challenges you'll face:
- Transactional Integrity: How do you ensure the OUTBOX record is only created if the main business update succeeds?
- The Polling vs. Tail-Log Debate: How do you detect new messages in the Outbox table efficiently?
- Exactly-Once Publication: What if the publisher crashes after sending the message but before deleting it from the Outbox?
Real World Outcome
You'll perform an "Update + Send" operation. You'll then intentionally crash the "Send" part. When you restart the system, it will automatically "resume" and send the missing message.
Example Output:
# Order Service
[DB] Transaction Started.
[DB] Order #201 Inserted.
[DB] Outbox entry 'OrderCreated' inserted.
[DB] Transaction Committed.
[CRASH] Simulating process death before message publication...
# Restarter Process
[BOOT] Found 1 pending message in OUTBOX.
[SEND] Publishing 'OrderCreated' for #201 to RabbitMQ.
[DB] Marking OUTBOX entry #201 as SENT.
Questions to Guide Your Design
- Efficiency
- If you poll the Outbox table every 100ms, what is the impact on DB performance?
- Debezium/CDC
- How would using the DB transaction log (WAL) differ from a manual Outbox table?
The Interview Questions They'll Ask
- "What problem does the Transactional Outbox pattern solve?"
- "Why can't we just send a message inside a database transaction block?"
- "What are the trade-offs between polling and transaction log tailing?"
Hints in Layers
Hint 1: The Outbox Table
Fields: id, payload, status (PENDING, SENT), created_at.
Hint 2: The Producer
Inside your Go code: tx, err := db.Begin(), tx.Exec("INSERT INTO orders..."), tx.Exec("INSERT INTO outbox..."), tx.Commit(). Both inserts must go through the same tx so they commit or roll back together.
Hint 3: The Relay
Create a separate goroutine (the "Message Relay") that does SELECT * FROM outbox WHERE status = 'PENDING' ORDER BY id LIMIT 10.
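Before wiring up PostgreSQL, the relay logic can be prototyped in memory. This sketch is a simulation under stated assumptions: `Store` stands in for the database, `CreateOrder` for the single transaction that writes both rows, and `brokerDown` is a hypothetical publisher that always fails, standing in for a crash before publication.

```go
package main

import (
	"errors"
	"fmt"
)

// OutboxRow mirrors the fields from Hint 1 (created_at omitted for brevity).
type OutboxRow struct {
	ID      int
	Payload string
	Status  string // "PENDING" or "SENT"
}

// Store is an in-memory stand-in for the service's database.
type Store struct {
	Orders []string
	Outbox []OutboxRow
}

// CreateOrder mimics one database transaction: the order row and the outbox
// row are written together, so neither exists without the other.
func (s *Store) CreateOrder(orderID string) {
	s.Orders = append(s.Orders, orderID)
	s.Outbox = append(s.Outbox, OutboxRow{
		ID: len(s.Outbox) + 1, Payload: "OrderCreated:" + orderID, Status: "PENDING",
	})
}

// Relay publishes pending rows in order and marks each one SENT only after
// the publish succeeds. A crash between publish and mark means the row is
// published again later, so delivery is at-least-once, not exactly-once.
func (s *Store) Relay(publish func(payload string) error) int {
	sent := 0
	for i := range s.Outbox {
		row := &s.Outbox[i]
		if row.Status != "PENDING" {
			continue
		}
		if err := publish(row.Payload); err != nil {
			break // stop and retry on the next relay run
		}
		row.Status = "SENT"
		sent++
	}
	return sent
}

// brokerDown simulates a publisher that cannot reach the broker.
func brokerDown(string) error { return errors.New("broker unavailable") }

func main() {
	s := &Store{}
	s.CreateOrder("201")
	fmt.Println(s.Relay(brokerDown)) // 0 (message stays PENDING)
	fmt.Println(s.Relay(func(p string) error {
		fmt.Println("published", p)
		return nil
	})) // 1 (the restart picks up the pending message)
}
```

Because the relay can publish the same row twice, consumers downstream still need to be idempotent.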
Project 6: Event Sourcing for Sagas (The Audit Log)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java, F#, Elixir
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. The "Industry Disruptor"
- Difficulty: Level 4: Expert
- Knowledge Area: Persistence Patterns
- Software or Tool: EventStoreDB or custom JSON log
- Main Book: "Microservices Patterns" by Chris Richardson (Ch. 6)
What youâll build: A saga implementation where the state is not stored as a single row (e.g., status='PENDING'), but as a stream of events (e.g., OrderCreated, PaymentRequested, PaymentReceived).
Why it teaches Sagas: It provides a perfect audit trail. You can reconstruct the history of any saga at any point. This is how high-compliance systems (banking, healthcare) manage distributed transactions.
Core challenges you'll face:
- Replaying State: How do you calculate the current state from a list of past events?
- Snapshotting: What if a saga has 10,000 events? Do you have to read them all every time?
- Versioning Events: What happens when the business logic changes but old events remain in the store?
Thinking Exercise
The Bank Account
How do you know your current balance?
- Is it a single number in a cell? (State-based)
- Is it the sum of every transaction you ever made? (Event-sourced)
If someone asks "Why is my balance $50?", which method provides the answer?
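The same idea applied to a saga: the current state is never stored, only derived by folding over the event stream. This is a minimal sketch using the event names from the project description; the state names and the `Replay` function are assumptions for illustration.

```go
package main

import "fmt"

// Replay derives the saga's current state from its event stream.
// Unknown event types are skipped, which is one simple (assumed) way to
// tolerate events written by newer versions of the code.
func Replay(events []string) string {
	state := "NEW"
	for _, e := range events {
		switch e {
		case "OrderCreated":
			state = "PENDING"
		case "PaymentRequested":
			state = "AWAITING_PAYMENT"
		case "PaymentReceived":
			state = "PAID"
		}
	}
	return state
}

func main() {
	stream := []string{"OrderCreated", "PaymentRequested", "PaymentReceived"}

	fmt.Println(Replay(stream)) // PAID

	// Replaying a prefix answers "what state was this saga in back then?"
	fmt.Println(Replay(stream[:2])) // AWAITING_PAYMENT
}
```

A snapshot is then just a saved result of `Replay` for a prefix, so later reads only fold over the events after it.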
The Interview Questions They'll Ask
- "What is Event Sourcing?"
- "How does Event Sourcing simplify or complicate Saga implementation?"
- "What is a projection in the context of event-sourced systems?"
Project 7: Handling Timeouts in Sagas (The Dead Letter)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java, Node.js
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The "Service & Support" Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Reliability / Operations
- Software or Tool: RabbitMQ Dead Letter Exchanges (DLX)
- Main Book: "Building Microservices" by Sam Newman
What you'll build: A saga where if a service doesn't respond within a certain time, the saga automatically triggers a compensation or moves to a "Manual Intervention" state.
Why it teaches Sagas: Real-world distributed systems have "silent failures". A service might not crash, but it might be too slow. You'll learn how to implement timeouts and retries using messaging infrastructure.
Core challenges you'll face:
- Defining "Too Long": What is the threshold before you give up?
- Zombie Sagas: What if a service responds AFTER the timeout has triggered a refund?
- Visibility: How do you notify a human that a saga has failed and requires manual fixing?
Real World Outcome
You'll slow down your PaymentService (using time.Sleep). Your OrderOrchestrator will wait for 5 seconds, then declare the payment "Timed Out" and trigger the "Cancel Order" flow.
Example Output:
[ORCHESTRATOR] Sent 'ChargeCard' to PaymentService.
[ORCHESTRATOR] Timer started: 5s.
...
[ORCHESTRATOR] Timer Expired! PaymentService took too long.
[ORCHESTRATOR] Transitioning to: FAILED_TIMEOUT.
[ORCHESTRATOR] Sending 'CancelOrder' to OrderService.
The Core Question You're Answering
"In a distributed system, how do you distinguish between a 'slow' service and a 'dead' service?"
The Interview Questions They'll Ask
- "What is a Dead Letter Exchange?"
- "How do you handle 'late' messages that arrive after a timeout has been processed?"
- "Why are retries dangerous in non-idempotent systems?"
Hints in Layers
Hint 1: Orchestrator Timer
When the Orchestrator sends a command, it should also store an expires_at timestamp in the database.
Hint 2: The Reaper
Create a background process that periodically checks for sagas where now() > expires_at and status is still IN_PROGRESS.
Hint 3: DLX
Configure RabbitMQ so that if a message stays in a queue for more than X seconds (a message TTL), it's dead-lettered to a failed_messages queue.
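Hints 1 and 2, plus the "zombie saga" question, can be sketched together. The names `Saga`, `Reap`, and `OnPaymentReply` are hypothetical, and timestamps are plain Unix seconds so the logic is easy to test; a real implementation would run `Reap` as a query against the saga table.

```go
package main

import (
	"fmt"
	"time"
)

// Saga holds the fields the reaper needs; ExpiresAt is in Unix seconds.
type Saga struct {
	ID        int
	Status    string
	ExpiresAt int64
}

// Reap marks every IN_PROGRESS saga whose deadline has passed as
// FAILED_TIMEOUT and returns the IDs that need compensation.
func Reap(sagas []*Saga, now int64) []int {
	var expired []int
	for _, s := range sagas {
		if s.Status == "IN_PROGRESS" && now > s.ExpiresAt {
			s.Status = "FAILED_TIMEOUT"
			expired = append(expired, s.ID)
		}
	}
	return expired
}

// OnPaymentReply applies a payment reply only if the saga is still waiting.
// A late reply for a saga that already timed out returns false, so the
// caller can log it or issue a refund instead of resurrecting the saga.
func OnPaymentReply(s *Saga) bool {
	if s.Status != "IN_PROGRESS" {
		return false
	}
	s.Status = "PAYMENT_COMPLETE"
	return true
}

func main() {
	now := time.Now().Unix()
	slow := &Saga{ID: 301, Status: "IN_PROGRESS", ExpiresAt: now - 1}
	fast := &Saga{ID: 302, Status: "IN_PROGRESS", ExpiresAt: now + 5}

	fmt.Println(Reap([]*Saga{slow, fast}, now)) // [301]
	fmt.Println(OnPaymentReply(slow))           // false (late reply is rejected)
	fmt.Println(OnPaymentReply(fast))           // true
}
```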
Project 8: Concurrent Sagas and Isolation (The Lost Update)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java, Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The "Resume Gold"
- Difficulty: Level 4: Expert
- Knowledge Area: Concurrency Control
- Software or Tool: Redis (Distributed Locks for testing)
- Main Book: "Microservices Patterns" by Chris Richardson (Ch. 4.3)
What you'll build: A scenario where two different Sagas try to modify the same record (e.g., updating the same inventory item or bank account) at the same time. You will implement countermeasures like "Semantic Locking".
Why it teaches Sagas: Sagas lack ACID isolation. This project forces you to grapple with the "Anomalies" (Lost Updates, Dirty Reads) that occur when transactions aren't isolated.
Core challenges you'll face:
- Detecting Conflicts: How do you know another saga is mid-flight on this record?
- Semantic Locking: How do you flag a record as "Pending Update" without using a DB lock?
- Commutative Updates: Can you design your operations so the order doesn't matter (e.g., incrementing/decrementing)?
Thinking Exercise
The Overbooked Flight
Two people try to book the last seat.
- Saga A: Check seat (Free) -> Charge Card -> Reserve Seat.
- Saga B: Check seat (Free) -> Charge Card -> Reserve Seat.
Both check at the same time. Both see it's free. Both charge the card. Only one can have the seat. How do you prevent this?
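One possible answer is a semantic lock: make "hold the seat" the first step of the saga, flagging the record as PENDING for one saga before any card is charged. The sketch below is illustrative only. `Seat`, `TryHold`, `Confirm`, and `Release` are hypothetical names, and the mutex merely stands in for the atomicity of a single local database transaction (for example a conditional UPDATE); the PENDING flag itself is the semantic lock.

```go
package main

import (
	"fmt"
	"sync"
)

// Seat is a record guarded by an application-level (semantic) lock.
type Seat struct {
	mu    sync.Mutex
	State string // "FREE", "PENDING", or "RESERVED"
	Owner string // saga that holds the seat while PENDING or RESERVED
}

func NewSeat() *Seat { return &Seat{State: "FREE"} }

// TryHold moves FREE -> PENDING for one saga. Any other saga sees PENDING
// and must wait, retry, or fail before it charges anything.
func (s *Seat) TryHold(sagaID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.State != "FREE" {
		return false
	}
	s.State, s.Owner = "PENDING", sagaID
	return true
}

// Confirm moves PENDING -> RESERVED once the owner's payment succeeded.
func (s *Seat) Confirm(sagaID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.State != "PENDING" || s.Owner != sagaID {
		return false
	}
	s.State = "RESERVED"
	return true
}

// Release is the compensating transaction: PENDING -> FREE.
func (s *Seat) Release(sagaID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.State != "PENDING" || s.Owner != sagaID {
		return false
	}
	s.State, s.Owner = "FREE", ""
	return true
}

func main() {
	seat := NewSeat()
	fmt.Println(seat.TryHold("saga-A")) // true
	fmt.Println(seat.TryHold("saga-B")) // false (B stops before charging)
	fmt.Println(seat.Confirm("saga-A")) // true
	fmt.Println(seat.State, seat.Owner) // RESERVED saga-A
}
```

The design choice here is reordering the saga so the contested resource is claimed first; the card is only charged by the saga that holds the seat.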
The Interview Questions They'll Ask
- "What is the 'Isolation' problem in Sagas?"
- "Explain 'Semantic Locking'."
- "How can 'Pessimistic View' help manage concurrency in Sagas?"
Project 9: Monitoring Distributed Workflows (The Dashboard)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go + React
- Alternative Programming Languages: Node.js + Vue, Python + Streamlit
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The "Micro-SaaS / Pro Tool"
- Difficulty: Level 2: Intermediate
- Knowledge Area: Observability
- Software or Tool: OpenTelemetry / Jaeger
- Main Book: "Building Microservices" by Sam Newman (Ch. 8)
What you'll build: A web dashboard that visualizes the path of a single Saga across all services. It shows which steps succeeded, which failed, and the timing of each event.
Why it teaches Sagas: Sagas are hard to debug. Without visualization, you're just looking at logs across 5 different services. This project teaches "Distributed Tracing".
Core challenges you'll face:
- Context Propagation: How do you pass the trace_id from the Order Service to the Payment Service through the Message Broker?
- Aggregation: How do you collect events from multiple services and correlate them in one UI?
- Real-time Updates: Using WebSockets to show a Saga progressing live.
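For context propagation, the usual approach is to carry the ID in message headers rather than in the payload. The sketch below shows the idea with a hand-rolled `Message` type; it is not the OpenTelemetry API, and the header key, `NewMessage`, `TraceOf`, and `Reply` are assumed names for illustration.

```go
package main

import "fmt"

// TraceHeader is the (assumed) header key shared by all services.
const TraceHeader = "trace_id"

// Message models a broker message: headers travel alongside the body.
type Message struct {
	Headers map[string]string
	Body    string
}

// NewMessage starts or continues a trace by stamping the header.
func NewMessage(traceID, body string) Message {
	return Message{Headers: map[string]string{TraceHeader: traceID}, Body: body}
}

// TraceOf extracts the trace ID on the consumer side.
func TraceOf(m Message) (string, bool) {
	id, ok := m.Headers[TraceHeader]
	return id, ok
}

// Reply builds a follow-up message that carries the incoming trace ID
// forward, so every hop of the saga shares one ID.
func Reply(in Message, body string) Message {
	id, _ := TraceOf(in)
	return NewMessage(id, body)
}

func main() {
	created := NewMessage("trace-501", "OrderCreated") // Order Service
	paid := Reply(created, "PaymentSuccess")           // Payment Service
	reserved := Reply(paid, "InventoryReserved")       // Inventory Service

	id, _ := TraceOf(reserved)
	fmt.Println(id, reserved.Body) // trace-501 InventoryReserved
}
```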
Real World Outcome
A dashboard where you can enter an Order ID and see a "Timeline" view. Green dots for success, red dots for failure, and arrows showing the event flow.
Example Dashboard View:
Order #501
[O] OrderCreated (0ms) ----------> [P] PaymentSuccess (450ms)
|
V
[S] ShippingReserved (FAILED) <--- [I] InventoryReserved (1.2s)
|
+--- [P] REFUNDING (1.5s) -----> [P] COMPENSATED (1.8s)
The Interview Questions They'll Ask
- "What is a Correlation ID and how is it different from a Trace ID?"
- "How do you monitor the health of a distributed transaction?"
- "What are the 3 pillars of Observability?"
Project 10: The "Super-Saga" (Dynamic Routing)
- File: LEARN_DISTRIBUTED_TRANSACTIONS_SAGAS.md
- Main Programming Language: Go
- Alternative Programming Languages: Java, Node.js
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. The "Industry Disruptor"
- Difficulty: Level 5: Master
- Knowledge Area: Advanced Workflow Engines
- Software or Tool: Temporal.io or custom DSL
- Main Book: "Microservices Patterns" by Chris Richardson
What you'll build: A saga where the next step is not hardcoded, but decided by a rules engine at runtime. For example: "If Customer is VIP, skip payment check" or "If Order > $1000, require manual approval step".
Why it teaches Sagas: It takes the pattern to its logical conclusion. You're no longer building a hardcoded flow; you're building a "Distributed Workflow Engine".
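The two example rules can be expressed as a function that plans the step list per order. This is a deliberately tiny sketch: the `Order` type, the step names, and the final "ReserveInventory" step are assumptions, and a real rules engine would load its rules from configuration or a DSL instead of hardcoding them.

```go
package main

import "fmt"

// Order holds only the attributes the example rules look at.
type Order struct {
	VIP         bool
	AmountCents int64
}

// PlanSteps decides the saga's steps at runtime from the order's attributes.
func PlanSteps(o Order) []string {
	var steps []string
	if !o.VIP { // rule: VIP customers skip the payment check
		steps = append(steps, "PaymentCheck")
	}
	if o.AmountCents > 100000 { // rule: orders over $1000 need manual approval
		steps = append(steps, "ManualApproval")
	}
	// Assumed final step so every plan has at least one action.
	steps = append(steps, "ReserveInventory")
	return steps
}

func main() {
	fmt.Println(PlanSteps(Order{VIP: false, AmountCents: 5000}))   // [PaymentCheck ReserveInventory]
	fmt.Println(PlanSteps(Order{VIP: true, AmountCents: 250000}))  // [ManualApproval ReserveInventory]
	fmt.Println(PlanSteps(Order{VIP: false, AmountCents: 250000})) // [PaymentCheck ManualApproval ReserveInventory]
}
```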
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding |
|---|---|---|---|
| 1. Happy Path | Level 1 | Weekend | ⭐ |
| 2. Compensation | Level 2 | 1 Week | ⭐⭐ |
| 3. Orchestrator | Level 2 | 1 Week | ⭐⭐⭐ |
| 4. Idempotency | Level 2 | Weekend | ⭐⭐⭐ |
| 5. Outbox | Level 3 | 2 Weeks | ⭐⭐⭐⭐ |
| 6. Event Sourcing | Level 4 | 1 Month | ⭐⭐⭐⭐⭐ |
| 7. Timeouts | Level 3 | 1 Week | ⭐⭐⭐ |
| 8. Concurrency | Level 4 | 2 Weeks | ⭐⭐⭐⭐⭐ |
| 9. Dashboard | Level 2 | 1 Week | ⭐⭐ |
| 10. Super-Saga | Level 5 | 1 Month+ | ⭐⭐⭐⭐⭐ |
Recommendation
If you are new to distributed systems, start with Project 1 and 2. These will give you the "Aha!" moment about event-driven design and the fundamental need for compensation.
If you are an experienced backend dev looking to level up your architectural skills, Project 5 (Transactional Outbox) is the most valuable "real-world" pattern you can learn.
Final Overall Project: The "Distributed E-Commerce Engine"
Combine everything you've learned into a single, production-grade system.
- Goal: Build a system that handles Orders, Payments, Inventory, Shipping, and Customer Notifications.
- Requirements:
- Use an Orchestrator with a persistent State Machine.
- Every service uses the Transactional Outbox pattern.
- Every consumer is Idempotent.
- Implement a "Dead Letter" strategy for all queues.
- Build a React Dashboard to monitor the status of every order in real-time.
- Implement Circuit Breakers on the RPC calls between services.
Summary
This learning path covers Distributed Transactions (Sagas) through 10 hands-on projects. Here's the complete list:
| # | Project Name | Main Language | Difficulty | Time Estimate |
|---|---|---|---|---|
| 1 | Simple Order Choreography | Go | Level 1 | Weekend |
| 2 | Simple Order Compensation | Go | Level 2 | 1 Week |
| 3 | Simple Order Orchestrator | Go | Level 2 | 1 Week |
| 4 | Idempotent Consumer | Go | Level 2 | Weekend |
| 5 | Transactional Outbox | Go | Level 3 | 2 Weeks |
| 6 | Event Sourcing for Sagas | Go | Level 4 | 1 Month |
| 7 | Handling Timeouts | Go | Level 3 | 1 Week |
| 8 | Concurrent Sagas | Go | Level 4 | 2 Weeks |
| 9 | Saga Dashboard | Go/React | Level 2 | 1 Week |
| 10 | Super-Saga Engine | Go | Level 5 | 1 Month+ |
Recommended Learning Path
For beginners: Start with projects #1, #2, #4.
For intermediate: Focus on projects #3, #5, #7, #9.
For advanced: Focus on projects #6, #8, #10.
Expected Outcomes
After completing these projects, you will:
- Master the difference between Orchestration and Choreography.
- Be able to design and implement Compensating Transactions.
- Understand how to guarantee Eventual Consistency without 2PC.
- Implement critical infrastructure patterns like Transactional Outbox and Idempotency.
- Build systems that are resilient to failure, network delays, and duplicates.
You'll have built 10 working projects that demonstrate deep understanding of Distributed Transactions from first principles.