SERVICE MESH INTERNALS DEEP DIVE

Learn Service Mesh Internals: From Proxy to Control Plane

Goal: Deeply understand the internal mechanics of a Service Mesh—how traffic is intercepted, how mutual TLS (mTLS) is negotiated at the byte level, how dynamic configuration (xDS) works without restarts, and how to build the control plane logic that orchestrates thousands of sidecars.


Why Service Mesh Internals Matter

As microservices grew from a few dozen to thousands, the “network” became the most unreliable part of the system. Initially, developers baked retry logic, circuit breaking, and security into every single application. This was a nightmare to maintain across different languages and teams.

The Service Mesh emerged to pull these concerns out of the application and into the infrastructure. By understanding the internals, you move from “YAML engineer” to a systems architect who understands how to build resilient, secure, and observable distributed systems.

  • The “Sidecar” Revolution: Learn why moving the network stack out of the process was the biggest architectural shift since containers.
  • Zero Trust by Default: Understand how identities are minted (SPIFFE) and how mTLS works when the application doesn’t even know it’s happening.
  • Dynamic Configuration: See how Envoy’s xDS API lets a fleet of proxies update routing rules at runtime, without restarts and (with proper connection draining) without dropping in-flight requests.

Core Concept Analysis

1. The Data Plane vs. Control Plane

The most fundamental split in service mesh architecture.

      CONTROL PLANE (The Brain)
     ┌───────────────────────┐
     │  - Policy Engine      │
     │  - Identity Provider  │
     │  - Config Distributor │
     └──────────┬────────────┘
                │
    xDS Protocol│ (Dynamic Configuration)
                ▼
      DATA PLANE (The Muscle)
     ┌───────────────────────┐      ┌───────────────────────┐
     │   App A (Container)   │      │   App B (Container)   │
     └──────────▲────────────┘      └──────────▲────────────┘
                │                              │
     ┌──────────┴────────────┐      ┌──────────┴────────────┐
     │  ENVOY PROXY (Sidecar)│◄────►│  ENVOY PROXY (Sidecar)│
     └───────────────────────┘ mTLS └───────────────────────┘

2. Traffic Interception (The “Magic”)

How does traffic even get to the proxy? This is usually done via iptables or eBPF.

[ Application ]  --> [ Outbound Port ]
                          │
                  (iptables magic)
                          │
                          ▼
                 [ Envoy Proxy :15001 ]
                          │
                  (Service Discovery)
                          │
                          ▼
                 [ Destination Pod ]
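
One detail the diagram glosses over: after an iptables REDIRECT, the accepted socket's local address is the proxy's own port, so the proxy needs another way to learn where the application was trying to go. On Linux this is commonly done with the SO_ORIGINAL_DST socket option (background knowledge, not something stated above). The following is a minimal Python sketch of that lookup; the function names are hypothetical, and the option value 80 is the constant from the kernel's netfilter headers.

```python
import socket
import struct

# From <linux/netfilter_ipv4.h>; Python's socket module does not export it.
SO_ORIGINAL_DST = 80

def parse_sockaddr_in(raw: bytes) -> tuple[str, int]:
    """Decode a 16-byte struct sockaddr_in into (ip, port)."""
    # Layout: 2 bytes family (skipped), 2 bytes port (big-endian),
    # 4 bytes IPv4 address, 8 bytes padding.
    port, addr = struct.unpack("!2xH4s8x", raw[:16])
    return socket.inet_ntoa(addr), port

def original_destination(conn: socket.socket) -> tuple[str, int]:
    """Linux-only: ask the kernel where a redirected connection was headed."""
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

`original_destination` only returns something meaningful for a connection that was actually redirected by netfilter; `parse_sockaddr_in` can be exercised on its own.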

3. The xDS API Family

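In a mesh, Envoy is rarely driven by a hand-written static config file. Apart from a small bootstrap that tells it where the control plane lives, it receives its configuration through a family of discovery APIs: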

  • LDS (Listener Discovery Service): What ports/sockets to open.
  • RDS (Route Discovery Service): How to map paths to clusters.
  • CDS (Cluster Discovery Service): What backend groups exist.
  • EDS (Endpoint Discovery Service): What are the IP addresses of the individual pods.
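
To see how the four resource types reference each other, here is a toy Python model that resolves one request by following listener → route → cluster → endpoints. It is only a sketch: the dictionaries, names, and addresses are invented for illustration, and real xDS resources are protobuf messages with far more fields.

```python
# Toy in-memory stand-ins for the four xDS resource types (all values hypothetical).
LISTENERS = {10000: {"route_config": "local_routes"}}            # LDS: port -> route config
ROUTES = {"local_routes": [("/api", "api_cluster"),              # RDS: path prefix -> cluster
                           ("/", "web_cluster")]}
CLUSTERS = {"api_cluster": {"lb": "round_robin"},                # CDS: which backend groups exist
            "web_cluster": {"lb": "round_robin"}}
ENDPOINTS = {"api_cluster": ["10.0.1.5:8080", "10.0.1.6:8080"],  # EDS: cluster -> pod addresses
             "web_cluster": ["10.0.2.9:8080"]}

def resolve(port: int, path: str) -> list[str]:
    """Follow listener -> route -> cluster -> endpoints for one request."""
    route_name = LISTENERS[port]["route_config"]
    for prefix, cluster in ROUTES[route_name]:
        if path.startswith(prefix):        # first matching prefix wins
            if cluster not in CLUSTERS:    # a route must point at a known cluster
                raise KeyError(cluster)
            return ENDPOINTS[cluster]
    return []
```

Because each layer only refers to the next one by name, a control plane can replace the endpoint list for `api_cluster` without touching listeners or routes.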

4. Mutual TLS (mTLS) and SPIFFE

In a mesh, identity is not an IP address. It’s a URI.

  • SPIFFE: A standard for naming services (e.g., spiffe://cluster.local/ns/default/sa/app-a).
  • SVID: The certificate that proves this identity.
  • mTLS handshake: Both sidecars present certificates to each other, ensuring encrypted and authenticated communication.
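
As a small illustration of "identity is a URI", the sketch below splits the example SPIFFE ID into its parts using Python's standard library. The function name is hypothetical, and the `/ns/<namespace>/sa/<service-account>` path layout is assumed from the example above rather than required by SPIFFE itself.

```python
from urllib.parse import urlparse

def parse_spiffe_id(spiffe_id: str) -> dict:
    """Split a SPIFFE ID into trust domain, namespace, and service account."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    parts = parsed.path.strip("/").split("/")
    # Assumed layout, taken from the example ID: /ns/<namespace>/sa/<service-account>
    if len(parts) != 4 or parts[0] != "ns" or parts[2] != "sa":
        raise ValueError(f"unexpected path layout: {parsed.path!r}")
    return {
        "trust_domain": parsed.netloc,
        "namespace": parts[1],
        "service_account": parts[3],
    }
```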

Concept Summary Table

| Concept Cluster | What You Need to Internalize |
| --- | --- |
| The Proxy (Envoy) | The sidecar is a transparent network intermediary. It must be high performance, low footprint, and fully dynamic. |
| xDS Protocol | A set of discovery APIs (LDS, RDS, CDS, EDS) that allow the control plane to push config to the data plane in real-time. |
| Traffic Interception | Mechanisms (iptables/eBPF) that redirect application traffic to the proxy without application changes. |
| Identity & mTLS | Services have cryptographic identities (SPIFFE). mTLS provides encryption and mutual authentication; authorization policies are then built on the authenticated identity. |
| Resilience Patterns | Retries, timeouts, and circuit breaking implemented at the proxy level to prevent cascading failures. |
| Observability | Standardized telemetry (metrics, logs, traces) emitted by the proxy for every network request. |

Deep Dive Reading by Concept

This section maps each concept from above to specific book chapters for deeper understanding. Read these before or alongside the projects to build strong mental models.

Data Plane & Proxy Internals

| Concept | Book & Chapter |
| --- | --- |
| Envoy Architecture | “Istio in Action” by Christian Posta, Ch. 3: “Istio’s data plane: The Envoy proxy” |
| Proxy Fundamentals | “Service Mesh with Envoy and Istio” by Kasun Indrasiri, Ch. 2: “The Sidecar Pattern” |
| Envoy Configuration | “Service Mesh with Envoy and Istio” by Kasun Indrasiri, Ch. 4: “Envoy Proxy” |

Control Plane & xDS

| Concept | Book & Chapter |
| --- | --- |
| xDS API Design | “Istio in Action” by Christian Posta, Ch. 5: “Traffic control: Fine-grained traffic routing” |
| Dynamic Config | “Service Mesh with Envoy and Istio” by Kasun Indrasiri, Ch. 3: “Service Mesh Architecture” |

Security & mTLS

| Concept | Book & Chapter |
| --- | --- |
| mTLS & Identity | “Istio in Action” by Christian Posta, Ch. 9: “Securing microservice communication” |
| SPIFFE/SPIRE | “Solving the Bottom Turtle” by Daniel Feldman et al., Ch. 2-4: “Identity in Distributed Systems” |

Essential Reading Order

For maximum comprehension, read in this order:

  1. Foundation (Week 1):
    • Istio in Action Ch. 1 & 3 (The “Why” and the “How” of Proxies)
    • Service Mesh with Envoy and Istio Ch. 2 (Sidecar Pattern)
  2. Operations & Resilience (Week 2):
    • Istio in Action Ch. 5 & 6 (Routing and Resilience)
    • Istio in Action Ch. 9 (Security)

Project List

Projects are ordered from fundamental proxy configuration to building a fully custom control plane.


Project 1: The “Hello Mesh” Envoy Sidecar

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: YAML (Envoy Config)
  • Alternative Programming Languages: JSON
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Network Proxy / Reverse Proxy
  • Software or Tool: Envoy Proxy, Docker
  • Main Book: “Service Mesh with Envoy and Istio” by Kasun Indrasiri

What you’ll build: A Docker-composed environment where a simple Python web service is “wrapped” by an Envoy sidecar. You will manually configure Envoy to listen on a port and forward traffic to the local application.
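
For the application side, something as small as the following standard-library server is enough. This is only a sketch of the "simple Python web service": the class and function names are hypothetical, the response text mirrors the example output in this project, and port 8080 is assumed as the app port.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"Hello from the Python App!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet; Envoy's access log is the interesting one

def make_server(port: int = 8080, host: str = "0.0.0.0") -> HTTPServer:
    """Build (but do not start) the app server that Envoy will forward to."""
    return HTTPServer((host, port), HelloHandler)
```

Start it with `make_server().serve_forever()` in the app container. Note that the app itself sets no Envoy-specific headers, so anything extra you see through the proxy was added by Envoy.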

Why it teaches service mesh: This is the “atom” of a service mesh. You’ll understand the transition from “Client -> App” to “Client -> Proxy -> App”. It demystifies the sidecar pattern by showing that it’s just a separate process handling the networking.

Core challenges you’ll face:

  • Configuring listeners and filters → maps to Envoy’s processing pipeline
  • Mapping local upstream clusters → maps to how Envoy finds the app
  • Networking inside a container → maps to localhost vs container-ip boundaries

Key Concepts:

  • Static Configuration: “Envoy Proxy” (Ch. 4) - Kasun Indrasiri
  • Listener/Filter/Cluster: Envoy Documentation (Intro)

  • Difficulty: Beginner
  • Time estimate: Weekend
  • Prerequisites: Basic Docker, understanding of HTTP.


Real World Outcome

You’ll have a running environment where you can curl the proxy and see the application’s response, but with added headers injected by Envoy that the application didn’t send.

Example Output:

$ curl -v localhost:10000
* Connected to localhost (127.0.0.1) port 10000 (#0)
> GET / HTTP/1.1
> Host: localhost:10000
>
< HTTP/1.1 200 OK
< server: envoy
< x-envoy-upstream-service-time: 2
< x-my-injected-header: envoy-was-here
<
Hello from the Python App!

The Core Question You’re Answering

“If I move the networking logic out of my code, how does the proxy know where to send the bytes, and what does it add to the request?”

Before you write any code, sit with this question. Developers often think the proxy “magically” knows where the app is. In reality, it’s a precisely configured chain of listeners and clusters.


Concepts You Must Understand First

Stop and research these before coding:

  1. Envoy Listener
    • What is a downstream vs upstream connection?
    • How does a filter chain process a request?
    • Book Reference: “Istio in Action” Ch. 3
  2. Docker Networking
    • How do two containers in the same network talk?
    • What does 127.0.0.1 refer to inside a container?

Questions to Guide Your Design

Before implementing, think through these:

  1. Port Mapping
    • If my app runs on 8080, and Envoy runs on 10000, what port does the outside world see?
    • How do I prevent the outside world from bypassing Envoy and hitting 8080 directly?
  2. The Filter Chain
    • Where in the config would I add a rule to log every request?
    • How do I add a custom HTTP header to every response?

Thinking Exercise

Tracing the Packet

Draw a diagram of a request moving from your browser to a Python app. Label every transition:

  1. Browser TCP stack -> Envoy Port 10000
  2. Envoy Listener -> HTTP Connection Manager
  3. HTTP Connection Manager -> Router Filter
  4. Router Filter -> Cluster “app_service”
  5. Cluster “app_service” -> App Port 8080

Questions while tracing:

  • At which point is the x-envoy-upstream-service-time header added?
  • If the App is down, which component returns the 503 error?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the sidecar pattern and its pros/cons vs a central gateway.”
  2. “What is the difference between a Listener and a Cluster in Envoy?”
  3. “How does a proxy improve observability without changing application code?”
  4. “Why is Envoy often chosen over Nginx or HAProxy for service meshes?”
  5. “What is an Envoy Filter, and why is the order of filters important?”

Hints in Layers

Hint 1: The Config Structure
Envoy’s static config starts with static_resources:. You need a listeners: section and a clusters: section.

Hint 2: The Listener
The listener needs an address (e.g., 0.0.0.0 port 10000) and a filter_chains list. For HTTP, use the envoy.filters.network.http_connection_manager filter.

Hint 3: The Upstream
The cluster name referenced in route_config must match the name of an entry in the clusters section. Define the app’s address under the cluster’s load_assignment (endpoints, then lb_endpoints, then endpoint.address.socket_address).

Hint 4: Debugging
Run Envoy with -l debug to see connection and routing decisions. If you enabled the admin interface, query its /clusters endpoint and check that the app_service endpoint reports health_flags::healthy. If the cluster has no healthy endpoint, or the app is unreachable, Envoy returns a 503.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Envoy Configuration | “Service Mesh with Envoy and Istio” by Kasun Indrasiri | Ch. 4 |
| Proxy Mechanics | “Istio in Action” by Christian Posta | Ch. 3 |

Project 2: Manual Traffic Shifting (Canary Deployment)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: YAML (Envoy Config)
  • Alternative Programming Languages: JSON
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Traffic Engineering
  • Software or Tool: Envoy, Docker
  • Main Book: “Istio in Action” by Christian Posta

What you’ll build: A setup with two versions of an application (v1 and v2). You will configure Envoy to split traffic 90/10 between them using weighted clusters.

Why it teaches service mesh: This demonstrates the “Traffic Shifting” power of a mesh. You’ll learn that the proxy can decouple the “Service Address” from the “Deployment Version”. This is the foundation of Canary releases and Blue/Green deployments.

Core challenges you’ll face:

  • Weighted Cluster configuration → maps to Traffic splitting logic
  • Identifying versions → maps to how to tag backends
  • Verification of split → maps to observing statistical distribution

Key Concepts:

  • Weighted Clusters: Envoy Docs (HTTP Route)
  • Canary Deployment: “Traffic control” (Ch. 5) - Christian Posta

Real World Outcome

You will run a loop of 100 requests and see that approximately 10 requests hit version 2 while 90 hit version 1.

Example Output:

$ for i in {1..100}; do curl -s localhost:10000 | grep "Version"; done | sort | uniq -c
  91 Version 1.0 (Stable)
   9 Version 2.0 (Canary)

The Core Question You’re Answering

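“How can I release a new version of my code to only a small percentage of traffic without changing DNS or Load Balancer IPs?”

This project answers how the data plane splits requests between versions using weighted routing. Note that the split is per request, not per user, unless you add sticky routing on top.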


Thinking Exercise

The Weighting Game

Imagine you have 3 clusters: stable (weight 80), canary (weight 15), and beta (weight 5). If a request comes in, how does Envoy pick the destination? Research “WRR (Weighted Round Robin)” vs “Random weighted selection”.
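
To make the exercise concrete, here is a minimal Python sketch of random weighted selection using the weights from the exercise. It illustrates the general technique only and is not a copy of Envoy's implementation; the function names and the fixed seed are assumptions made for repeatability.

```python
import random
from collections import Counter

WEIGHTS = [("stable", 80), ("canary", 15), ("beta", 5)]  # from the exercise above

def pick(weights, r):
    """Return the cluster whose cumulative weight range contains r (0 <= r < total)."""
    upto = 0
    for name, weight in weights:
        upto += weight
        if r < upto:
            return name
    raise ValueError("r must be smaller than the total weight")

def simulate(weights, n, seed=0):
    """Draw n random selections and count how often each cluster is chosen."""
    rng = random.Random(seed)
    total = sum(w for _, w in weights)
    return Counter(pick(weights, rng.randrange(total)) for _ in range(n))
```

With these weights, values 0-79 map to stable, 80-94 to canary, and 95-99 to beta. Running `simulate(WEIGHTS, 100)` several times with different seeds shows why a 100-request test rarely lands on the exact configured ratio.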


Hints in Layers

Hint 1: The Cluster Section
You now need TWO clusters in your static_resources.clusters list: service_v1 and service_v2.

Hint 2: Weighted Clusters
Inside the route object of your route_config, instead of cluster: service_v1, use weighted_clusters:.

Hint 3: Syntax
weighted_clusters contains a clusters list; each entry has a name and a weight (integer).

Hint 4: Total Weight
Keep your weights adding up to 100 for easy mental math. Recent Envoy versions use the sum of the cluster weights as the total; the older total_weight field (default 100) is deprecated.


Project 3: The Circuit Breaker (Protecting the Backend)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: YAML (Envoy Config)
  • Alternative Programming Languages: JSON
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Resilience Engineering
  • Software or Tool: Envoy, Docker
  • Main Book: “Istio in Action” by Christian Posta

What you’ll build: A configuration that protects a “slow” or “unstable” backend. You will set a limit on the number of concurrent connections and a threshold for failing requests. When these are hit, Envoy will “open the circuit” and fail fast.

Why it teaches service mesh: You’ll understand how the proxy prevents cascading failures. Instead of the application getting overwhelmed and crashing (or timing out clients slowly), the proxy rejects traffic immediately once it detects the backend is struggling.

Core challenges you’ll face:

  • Defining thresholds → maps to capacity planning
  • Observing the state change → maps to Envoy metrics (circuit_breakers)
  • Simulating failure → maps to using slow backends

Key Concepts:

  • Circuit Breaking: “Resilience” (Ch. 6) - Christian Posta
  • Connection Pooling: Envoy Docs (Circuit Breaking)

Real World Outcome

When you run a high-concurrency test against a slow backend, you’ll see Envoy return 503 Service Unavailable immediately for excess requests, rather than them waiting and eventually timing out.

Example Output:

$ ab -n 100 -c 20 http://localhost:10000/
# Illustrative numbers, assuming a limit of 5 concurrent connections and a very small pending queue:
Complete requests:      100
Failed requests:        75  (Rejected by Envoy circuit breaker)
Non-2xx responses:      75

The Core Question You’re Answering

“How can I stop my entire system from going down just because one service is slow?”

This is the essence of the “Fail Fast” principle in distributed systems.


Hints in Layers

Hint 1: Thresholds
Circuit breaking in Envoy is defined at the cluster level, not the listener or route level.

Hint 2: Parameters
Look for the circuit_breakers field in the cluster config. Set max_connections and max_pending_requests together: with HTTP/1.1, requests beyond max_connections wait in the pending queue, and only requests beyond max_pending_requests are rejected immediately with a 503.

Hint 3: Outlier Detection
If you want to eject a specific pod from a cluster because it’s returning 5xx errors, you need outlier_detection.

Hint 4: Testing
Use a tool like hey or ab (Apache Benchmark) to generate enough concurrency to trigger the breaker.
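
The interplay between the connection limit and the pending queue is easy to get wrong, so here is a toy Python model of it. This is a sketch of the admission logic only, not of Envoy's internals: the class name is hypothetical, and the two constructor arguments are named after the max_connections and max_pending_requests thresholds.

```python
class ConnectionLimiter:
    """Toy model of per-cluster limits for a slow HTTP/1.1 backend."""

    def __init__(self, max_connections: int, max_pending_requests: int):
        self.max_connections = max_connections
        self.max_pending_requests = max_pending_requests
        self.active = 0     # requests currently holding a connection
        self.pending = 0    # requests waiting for a free connection
        self.overflow = 0   # requests rejected immediately

    def admit(self) -> str:
        if self.active < self.max_connections:
            self.active += 1
            return "forwarded"
        if self.pending < self.max_pending_requests:
            self.pending += 1
            return "queued"
        self.overflow += 1
        return "503"  # fail fast instead of waiting on the slow backend

    def finish(self) -> None:
        """A backend response frees a connection; a queued request takes it if one is waiting."""
        if self.pending:
            self.pending -= 1
        elif self.active:
            self.active -= 1
```

With `ConnectionLimiter(5, 2)` and 20 simultaneous requests against a backend that has not yet answered, 5 are forwarded, 2 are queued, and 13 are rejected. With a large pending queue nothing is rejected, which is why lowering max_connections alone rarely produces 503s.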


Project 4: Manual mTLS (The Byte-Level Handshake)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: OpenSSL / YAML (Envoy)
  • Alternative Programming Languages: Go (for cert generation)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Cryptography / Identity
  • Software or Tool: OpenSSL, Envoy
  • Main Book: “Solving the Bottom Turtle” by Daniel Feldman et al.

What you’ll build: You will manually generate a Certificate Authority (CA) and two sets of client/server certificates. You will then configure two Envoy proxies to require mutual TLS for all communication between them.

Why it teaches service mesh: This demystifies “Auto-mTLS”. You’ll see exactly what files Istio or Linkerd are moving around behind the scenes. You’ll learn how Envoy validates a peer’s certificate against a trusted root and checks for a specific identity in the SAN (Subject Alternative Name) field.

Core challenges you’ll face:

  • Generating correct SANs → maps to SPIFFE identity format
  • Configuring TLS Contexts → maps to Downstream vs Upstream TLS
  • Debugging handshake failures → maps to using ssldump or Envoy logs

Key Concepts:

  • Mutual TLS: “Securing microservice communication” (Ch. 9) - Christian Posta
  • SPIFFE ID: spiffe.io/docs/latest/spiffe-about/overview/

Real World Outcome

A curl from a “rogue” container without the correct certificate will be rejected during the TLS handshake (the client sees the connection reset or closed), while the authenticated sidecar will be allowed through.

Example Output:

# Rogue container (no cert)
$ curl http://envoy-peer:15001
curl: (56) Recv failure: Connection reset by peer

# Envoy sidecar logs (debug mode)
[debug][connection] [source/common/ssl/ssl_socket.cc:220] TLS error: 268435581:SSL routines:OPENSSL_internal:PEER_DID_NOT_RETURN_A_CERTIFICATE

The Core Question You’re Answering

“How can I prove that ‘Service A’ is actually ‘Service A’ without using passwords or API keys?”

This moves identity from the application layer to the transport layer.


Thinking Exercise

The Certificate Chain

If Envoy-A trusts CA-1, and Envoy-B presents a certificate signed by CA-2, will the handshake succeed? What if Envoy-A is given CA-2’s root certificate as a “trusted CA”? Research “Federated Trust” in service meshes.


Hints in Layers

Hint 1: Cert Generation
Use openssl req to create the CSR and openssl x509 to sign it. Ensure you add subjectAltName = URI:spiffe://cluster.local/ns/default/sa/app-a to the extensions config used when signing (for example, a file passed with -extfile).

Hint 2: Downstream TLS
In the receiving Envoy (the server sidecar), you need a transport_socket on the listener’s filter chain. Use envoy.transport_sockets.tls with a DownstreamTlsContext.

Hint 3: Upstream TLS
In the sending Envoy (the client sidecar), you need a transport_socket in the cluster section, with an UpstreamTlsContext.

Hint 4: Validation
Both sides need the trusted_ca (root cert) in their validation context. The server side needs require_client_certificate: true to make it mutual TLS.
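
The same two checks (require a client certificate, then look for a specific URI in the SAN) can be sketched with Python's standard ssl module, which may help when reasoning about what Envoy is doing. This is an analogy under stated assumptions, not Envoy configuration: the function names and file-path parameters are hypothetical, and the SAN check relies on the dict format returned by SSLSocket.getpeercert().

```python
import ssl

def server_context(cert=None, key=None, ca=None) -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED           # the "mutual" part of mTLS
    if ca:
        ctx.load_verify_locations(cafile=ca)      # trusted root, analogous to trusted_ca
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx

def peer_has_spiffe_id(peer_cert: dict, expected: str) -> bool:
    """Check a getpeercert()-style dict for an exact URI SAN match."""
    sans = peer_cert.get("subjectAltName", ())
    return ("URI", expected) in sans
```

Chain validation answers "was this certificate issued by a CA I trust?", while the SAN check answers "is this the specific workload I expect?". A mesh needs both.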


Project 5: The Programmable Proxy (Lua Filter)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: Lua
  • Alternative Programming Languages: WebAssembly (Wasm)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Programmable Infrastructure
  • Software or Tool: Envoy, Lua
  • Main Book: “Istio in Action” by Christian Posta

What you’ll build: A custom Envoy filter written in Lua that inspects an incoming request’s body or headers and makes a routing decision based on complex logic that YAML can’t express (e.g., checking a hash of a user ID).

Why it teaches service mesh: It shows how to extend the data plane. Most service mesh features are just “filters” in the Envoy chain. By writing your own, you understand the lifecycle of a request inside the proxy’s memory.

Core challenges you’ll face:

  • Manipulating headers in Lua → maps to Envoy’s Lua API
  • Handling asynchronous calls → maps to how filters yield
  • Performance implications → maps to latency costs of custom logic

Key Concepts:

  • Envoy Lua Filter: Envoy Docs (Lua Filter)
  • Extending the request path: “Extending Istio on the request path” (Ch. 14) - Christian Posta

Real World Outcome

You’ll be able to send requests with a specific header (e.g., x-user-id: 12345) and have the Lua filter hash the ID and decide whether to add a “privileged” flag or redirect to a specific cluster.
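
Before writing the Lua, it can help to pin down the bucketing logic in a language with a convenient hash library. The Python sketch below shows one way to map a user ID to a stable bucket; the function name, the choice of SHA-256, and the 10% default are all assumptions for illustration, and the filter itself would need an equivalent hash available in Lua.

```python
import hashlib

def bucket_for(user_id: str, experimental_percent: int = 10) -> str:
    """Map a user ID to 'experimental' or 'stable' deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:4], "big") % 100   # stable slot in 0..99
    return "experimental" if slot < experimental_percent else "stable"
```

Because the slot depends only on the ID, the same user always lands in the same bucket, which weighted routing on its own does not guarantee.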

Example Output:

$ curl -v -H "x-user-id: 999" localhost:10000
# Lua filter sees ID 999, calculates hash, decides it's 'Experimental Group'
< x-envoy-lua-processed: true
< x-user-bucket: experimental
Hello from Version 2!

Project 6: DIY Control Plane (The xDS Server)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Python (gRPC), Java
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Control Plane Design / gRPC
  • Software or Tool: Go-Control-Plane, Envoy
  • Main Book: “Istio in Action” by Christian Posta

What you’ll build: A minimal gRPC server that implements the Envoy xDS APIs (LDS, RDS, CDS, EDS). Instead of using a static YAML file, Envoy will connect to your Go server and “ask” for its configuration.

Why it teaches service mesh: This is the “Aha!” moment. You’ll move from managing one proxy to managing a fleet. You’ll understand how Istiod (Istio’s control plane) actually works—it’s just a gRPC server that translates high-level intent into Envoy’s low-level xDS protocol.

Core challenges you’ll face:

  • Implementing the gRPC stream → maps to xDS bidirectional streaming
  • Version management → maps to how Envoy knows config has changed
  • Resource snapshotting → maps to atomicity of config updates
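
The versioning and snapshotting challenges can be modeled in a few lines before touching gRPC. The Python sketch below is a toy, not the go-control-plane API: the class names and the poll method are hypothetical, and real xDS carries the version in DiscoveryRequest/DiscoveryResponse messages on a stream.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Snapshot:
    """One consistent set of resources, identified by a version string."""
    version: str
    resources: dict  # e.g. {"listeners": [...], "routes": [...], "clusters": [...], "endpoints": [...]}

@dataclass
class SnapshotCache:
    snapshots: dict = field(default_factory=dict)  # node_id -> Snapshot

    def set_snapshot(self, node_id: str, snapshot: Snapshot) -> None:
        # Swapping a single reference replaces all resource types together,
        # so a proxy never sees routes from one version and clusters from another.
        self.snapshots[node_id] = snapshot

    def poll(self, node_id: str, acked_version: str):
        """Return the current Snapshot if the proxy is behind, else None."""
        current = self.snapshots.get(node_id)
        if current is None or current.version == acked_version:
            return None
        return current
```

The version string is how the server decides whether a proxy needs an update: a proxy that has acknowledged "v1" gets nothing until a snapshot with a different version is stored for its node ID.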

Key Concepts:

  • xDS Protocol: Envoy Docs (xDS)
  • Go-Control-Plane: github.com/envoyproxy/go-control-plane

Real World Outcome

You’ll have an Envoy proxy running with only a minimal bootstrap (its node ID and how to reach your xDS server) and no listeners or routes of its own. You will then start your Go server, and Envoy will open a listener and start forwarding traffic. You can then change a variable in your Go code, and Envoy will update its routing without a restart.

Example Output:

# Envoy Logs
[info][config] [source/common/config/grpc_subscription_impl.cc:58] gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener accepted

# Controller Logs
[info] Pushing new RDS config version: v2
[info] Snapshot updated for node: envoy-sidecar-1

Project 7: The “Invisible” Proxy (eBPF Interception)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: C (eBPF)
  • Alternative Programming Languages: Rust (aya), Go (ebpf-go)
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: Kernel Networking / eBPF
  • Software or Tool: eBPF, Cilium, Envoy
  • Main Book: “Learning eBPF” by Liz Rice

What you’ll build: Instead of relying on iptables redirect rules, which send every packet through extra netfilter processing, you’ll write an eBPF program that hooks into the socket layer and transparently redirects traffic from an application to the Envoy proxy.

Why it teaches service mesh: You’ll understand the kind of kernel-level redirection that eBPF-based and “sidecarless” data planes such as Cilium or Istio Ambient can build on. You’ll see how sockmap redirection lets data move between two local sockets without traversing the full TCP/IP stack.

Core challenges you’ll face:

  • Verifying eBPF program safety → maps to the eBPF verifier
  • Socket-level redirection → maps to bpf_msg_redirect_hash
  • Handling connection state → maps to eBPF maps

Key Concepts:

  • eBPF Socket Maps: Cilium Documentation (eBPF Datapath)
  • Sockmap Redirection: “Learning eBPF” (Ch. 7) - Liz Rice

Real World Outcome

You’ll run an application that thinks it’s talking to google.com:80, but your eBPF program will intercept the connect() call and redirect it to your local Envoy proxy, which can then serve a cached response or apply security policies.

Example Output:

$ bpftool prog show
123: sock_ops  name app_interceptor  tag a1b2c3d4e5
    loaded_at 2025-12-28T10:00:00+0000  uid 0

$ curl http://google.com
# The request never left the machine! It was redirected to Envoy by eBPF.
Hello from Local Envoy Cache!

Project 8: Full Stack Observability (Tracing & Metrics)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: YAML / Go
  • Alternative Programming Languages: Prometheus, Jaeger
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Distributed Tracing / Telemetry
  • Software or Tool: Prometheus, Jaeger, Envoy
  • Main Book: “Istio in Action” by Christian Posta

What you’ll build: You will configure Envoy to emit Prometheus metrics and Zipkin/Jaeger traces. You’ll then build a dashboard that shows three of the four “Golden Signals” (Latency, Traffic, Errors) for your microservices. Metrics need no changes to the app code; complete traces only require the app to forward trace headers.

Why it teaches service mesh: You’ll learn how the mesh provides “Uniform Observability”. You’ll see how B3 or W3C Traceparent headers are propagated and how Envoy acts as the source of truth for the health of the network.

Core challenges you’ll face:

  • Trace ID Propagation → maps to header passing in apps
  • Metric scraping → maps to Envoy’s admin /stats/prometheus endpoint
  • Configuring samplers → maps to handling trace volume
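
The one thing the proxy cannot do for you is connect an inbound request to the outbound calls it triggers, which is why the app has to copy trace headers forward. A minimal Python sketch of that forwarding step follows; the function name is hypothetical, and the header list covers the common B3 and W3C names plus x-request-id, which you should check against your tracer's configuration.

```python
# Headers an app typically copies from the inbound request to its outbound calls.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags",
    "traceparent", "tracestate",
)

def propagate(incoming: dict) -> dict:
    """Return only the trace headers (lower-cased) found on the inbound request."""
    lowered = {k.lower(): v for k, v in incoming.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}
```

Merge the result into the headers of every outbound HTTP call made while handling that request; without this, each hop shows up as a separate, unrelated trace.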

Key Concepts:

  • Distributed Tracing: “Observability” (Ch. 7) - Christian Posta
  • Envoy Stats: Envoy Docs (Statistics)

Real World Outcome

You’ll open the Jaeger UI and see a waterfall diagram showing a request moving from Service A to Service B, including exactly how many milliseconds Envoy spent processing the request vs. the actual application time.


Project 9: Global Rate Limiting (RLS)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: Go / YAML
  • Alternative Programming Languages: Redis
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Distributed Systems / Rate Limiting
  • Software or Tool: Envoy RLS Service, Redis
  • Main Book: “Service Mesh with Envoy and Istio” by Kasun Indrasiri

What you’ll build: A global rate limiting service. You’ll configure Envoy to call an external gRPC service (that you’ll write) before allowing a request. This service will use Redis to track usage across a whole cluster of proxies.

Why it teaches service mesh: You’ll understand the “External Filter” pattern. You’ll see how Envoy offloads heavy or global state decisions to specialized services, keeping the data plane fast while allowing for global policy enforcement.

Core challenges you’ll face:

  • Implementing the RLS gRPC API → maps to Envoy RLS protocol
  • Redis atomic increments → maps to handling race conditions
  • Fail-open vs Fail-closed → maps to availability tradeoffs
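
The counting logic behind such a service can be prototyped without gRPC or Redis. The Python sketch below uses a fixed-window counter with an in-memory dict standing in for Redis; the class name, the key format, and the injectable clock are assumptions for illustration, and the real service would implement Envoy's rate limit gRPC API around this core.

```python
import time

class FixedWindowLimiter:
    """Fixed-window request counter; the dict is a stand-in for a shared Redis."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time, fail_open: bool = True):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.fail_open = fail_open
        self.counters = {}  # key -> count

    def _incr(self, key: str) -> int:
        # With Redis this would be an atomic INCR (plus EXPIRE on the first hit),
        # which is what keeps concurrent proxies from racing each other.
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

    def allow(self, descriptor: str) -> bool:
        window_id = int(self.clock() // self.window)
        try:
            count = self._incr(f"{descriptor}:{window_id}")
        except Exception:
            # Counter store unavailable: allow (fail-open) or reject (fail-closed).
            return self.fail_open
        return count <= self.limit
```

The `fail_open` flag is the availability tradeoff from the list above in one line: when the counter store is down, you either let all traffic through or reject all of it.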

Key Concepts:

  • Global Rate Limiting: Envoy Docs (Rate Limit Service)
  • Distributed Caching: “Redis in Action” - Josiah Carlson

Project 10: Next-Gen Extensibility (Envoy Wasm)

  • File: SERVICE_MESH_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: C++, Go (TinyGo)
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 4: Expert
  • Knowledge Area: WebAssembly / Proxy Extensions
  • Software or Tool: Envoy, Proxy-Wasm SDK
  • Main Book: “WebAssembly: The Definitive Guide” by Brian Sletten

What you’ll build: A high-performance Envoy extension using WebAssembly. You’ll write a plugin in Rust that performs real-time payload transformation (e.g., masking PII in JSON responses) and load it into Envoy without recompiling the proxy.

Why it teaches service mesh: Wasm is a portable, sandboxed extension mechanism for Envoy that Istio also exposes for extending the mesh. It teaches you how to run custom code inside a sandbox within the proxy, with performance that aims to approach native filters.

Core challenges you’ll face:

  • Handling the Wasm memory boundary → maps to host vs guest memory
  • The Proxy-Wasm ABI → maps to standardized hooks for proxies
  • Building a compact binary → maps to TinyGo or Rust no_std

Key Concepts:

  • Proxy-Wasm ABI: github.com/proxy-wasm/spec
  • Wasm in Envoy: Envoy Docs (Wasm Filter)

Real World Outcome

You’ll see Envoy automatically scrub sensitive data (like credit card numbers) from responses based on a regex defined in your Rust-Wasm plugin. Benchmark it against the Lua filter from Project 5 to compare the overhead of the two approaches.
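
It is worth settling the masking rule itself before dealing with the Wasm memory boundary. The Python sketch below prototypes one possible rule; the regex (16 digits in groups of four, optionally separated by spaces or hyphens) and the function names are assumptions for illustration, and the plugin would port the same logic to Rust.

```python
import re

# Assumed card format: 16 digits in groups of four, optionally separated by space or hyphen.
CARD_RE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def _mask(match: re.Match) -> str:
    """Replace every digit except the last four with '*', keeping separators."""
    text = match.group(0)
    total = sum(ch.isdigit() for ch in text)
    seen, out = 0, []
    for ch in text:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - 4 else "*")
        else:
            out.append(ch)
    return "".join(out)

def scrub(body: str) -> str:
    """Mask anything in the response body that looks like a card number."""
    return CARD_RE.sub(_mask, body)
```

Masking keeps the body length unchanged, which avoids having to rewrite Content-Length in the filter. A rule this simple will also match other 16-digit values, so treat it as a starting point.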


Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
| --- | --- | --- | --- | --- |
| 1. Hello Mesh | Level 1 | Weekend | High (Foundation) | ⭐⭐⭐ |
| 2. Traffic Shift | Level 2 | Weekend | High (Routing) | ⭐⭐⭐⭐ |
| 3. Circuit Breaker | Level 2 | Weekend | High (Resilience) | ⭐⭐⭐ |
| 4. Manual mTLS | Level 3 | 1 Week | Expert (Security) | ⭐⭐⭐⭐⭐ |
| 5. Lua Filter | Level 3 | 1 Week | Expert (Extensibility) | ⭐⭐⭐⭐ |
| 6. DIY xDS Server | Level 4 | 2 Weeks | Master (Architecture) | ⭐⭐⭐⭐⭐ |
| 7. eBPF Redirect | Level 5 | 1 Month | Master (Kernel) | ⭐⭐⭐⭐⭐ |
| 8. Observability | Level 2 | Weekend | Intermediate | ⭐⭐⭐ |
| 9. Global RLS | Level 3 | 1 Week | High (Distributed) | ⭐⭐⭐⭐ |
| 10. Wasm Plugin | Level 4 | 2 Weeks | Master (Performance) | ⭐⭐⭐⭐⭐ |

Recommendation

Start with Project 1 (Hello Mesh). It is critical to see Envoy running as a standalone process before you try to automate it. Once you can manually route a packet through Envoy, jump straight to Project 6 (DIY xDS Server) if you are a strong programmer. Building the control plane is the single fastest way to “get” how a service mesh works at scale.


Final Overall Project: “Mini-Mesh”

The Challenge: Build a complete, functional service mesh for a 3-tier microservice app (Web -> API -> DB).

Requirements:

  1. Sidecar Injection: Automate the starting of Envoy alongside each service.
  2. Control Plane: A central server that manages discovery and pushes xDS config.
  3. mTLS: Every service-to-service call must be encrypted and authenticated.
  4. Traffic Dashboard: A single UI (using Prometheus/Grafana) that shows the traffic map of your mesh.
  5. Chaos Mode: A CLI tool that tells the Control Plane to inject 10% failure rates or 2s latency into specific services via xDS.

This project combines everything you’ve learned into a system that mimics a production Istio or Linkerd installation.


Summary

This learning path covers Service Mesh Internals through 10 hands-on projects. Here’s the complete list:

| # | Project Name | Main Language | Difficulty | Time Estimate |
| --- | --- | --- | --- | --- |
| 1 | Hello Mesh Envoy | YAML | Level 1 | Weekend |
| 2 | Canary Deployment | YAML | Level 2 | Weekend |
| 3 | Circuit Breaking | YAML | Level 2 | Weekend |
| 4 | Manual mTLS | OpenSSL | Level 3 | 1 Week |
| 5 | Lua Request Filter | Lua | Level 3 | 1 Week |
| 6 | DIY xDS Control Plane | Go | Level 4 | 2 Weeks |
| 7 | eBPF Interception | C | Level 5 | 1 Month |
| 8 | Full Observability Stack | YAML/Go | Level 2 | Weekend |
| 9 | Global Rate Limiter | Go | Level 3 | 1 Week |
| 10 | Wasm Payload Scrubbing | Rust | Level 4 | 2 Weeks |

  • For beginners: Start with projects #1, #2, #3, #8
  • For intermediate: Focus on projects #4, #5, #9
  • For advanced: Master projects #6, #7, and #10

Expected Outcomes

After completing these projects, you will:

  • Deeply understand the Envoy processing pipeline and filter chain.
  • Master the xDS protocol and how dynamic configuration is pushed at scale.
  • Understand the cryptographic handshake and identity management of mTLS.
  • Be able to implement advanced traffic patterns like canary releases and circuit breaking.
  • Understand low-level interception techniques using iptables or eBPF.
  • Know how to extend the data plane using Lua or WebAssembly.

You’ll have built a working “Mini-Mesh” that demonstrates deep understanding of Service Mesh architecture from first principles.