Project 8: The Dependency Spaghetti Visualizer
Build a tool that extracts dependency data and visualizes team-to-team dependencies, identifying circular dependencies and bottleneck teams.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2 Weeks (25-35 hours) |
| Primary Language | Python (NetworkX) / Graphviz |
| Alternative Languages | JavaScript (Cytoscape.js), Neo4j |
| Prerequisites | Graph theory basics, distributed tracing concepts |
| Key Topics | Conway’s Law, Fracture Planes, System Architecture |
1. Learning Objectives
By completing this project, you will:
- Extract dependency data from distributed tracing or code analysis
- Build a dependency graph representing team-to-team relationships
- Identify architectural problems: circular dependencies, bottlenecks
- Visualize the “spaghetti” in an actionable way
- Recommend fracture planes for organizational redesign
2. Theoretical Foundation
2.1 Core Concepts
The Distributed Monolith Problem
EXPECTED (Microservices)    REALITY (Distributed Monolith)
┌────┐     ┌────┐             ┌────┐───────┌────┐
│ A  │────►│ B  │           ┌─┤ A  │◄─────►│ B  │─┐
└────┘     └────┘           │ └────┘       └────┘ │
                            │    ▲            ▲   │
┌────┐     ┌────┐           │    │            │   │
│ C  │────►│ D  │           │    ▼            ▼   │
└────┘     └────┘           │ ┌────┐       ┌────┐ │
                            └►│ C  │◄─────►│ D  ├─┘
Clean interfaces              └────┘       └────┘
Independent deploys         Circular deps
                            Can't deploy alone
Conway’s Law in Reverse
If your system looks like spaghetti, your organization probably looks like spaghetti too.
CODE DEPENDENCY                      TEAM DEPENDENCY
┌──────────────────────────┐         ┌──────────────────────────┐
│ checkout-service         │         │ Checkout Team            │
│   → payment-service      │   ==>   │   → Talks to Payments    │
│   → inventory-service    │         │   → Talks to Inventory   │
│   → user-service         │         │   → Talks to Identity    │
└──────────────────────────┘         └──────────────────────────┘
If checkout calls five services owned by five different teams, the Checkout team has to coordinate with five teams.
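The step from service calls to team coordination can be sketched in a few lines. The ownership table, the call list, and the `teams_to_coordinate_with` helper are illustrative assumptions that mirror the diagram above, not part of any existing tool.

```python
from typing import Set

# Hypothetical ownership table: service name -> owning team.
OWNERS = {
    "checkout-service": "Checkout",
    "payment-service": "Payments",
    "inventory-service": "Inventory",
    "user-service": "Identity",
}

# Hypothetical outbound calls observed for each service.
CALLS = {
    "checkout-service": ["payment-service", "inventory-service", "user-service"],
}


def teams_to_coordinate_with(service: str) -> Set[str]:
    """Return the other teams that own the services `service` calls."""
    own_team = OWNERS[service]
    return {OWNERS[callee] for callee in CALLS.get(service, [])} - {own_team}


print(sorted(teams_to_coordinate_with("checkout-service")))
# ['Identity', 'Inventory', 'Payments']
```

Calls to services owned by the same team drop out of the result, which is why the count of teams can be lower than the count of services called.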
Fracture Planes
Natural boundaries where a system can be split with minimal communication:
| Fracture Plane Type | Example |
|---|---|
| Business Domain | Payments vs. Shipping |
| Data Isolation | Each service owns its data |
| Technology | Python services vs. Go services |
| Regulation | PCI (payments) vs. non-PCI |
| Performance | Real-time vs. batch |
| Change Cadence | Fast-changing vs. stable |
Graph Metrics for Organizations
| Metric | Definition | What It Reveals |
|---|---|---|
| In-Degree | How many teams depend on you | Bottleneck risk |
| Out-Degree | How many teams you depend on | Coordination overhead |
| Cycles | Circular dependencies | Architecture smell |
| Centrality | How often a team sits on the dependency paths between other teams | Critical path risk |
| Clusters | Groups of teams that depend mostly on each other | Potential merge candidates |
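As a rough illustration of the first four metrics, the sketch below computes them with NetworkX on a toy graph. The team names are made up, and betweenness is only one of several centrality measures that could stand in for "centrality" here.

```python
import networkx as nx

# Toy team graph; an edge A -> Z means "team A depends on team Z".
G = nx.DiGraph()
G.add_edges_from([("A", "Z"), ("B", "Z"), ("C", "Z"), ("Z", "P")])

in_degree = dict(G.in_degree())    # how many teams depend on each team
out_degree = dict(G.out_degree())  # how many teams each team depends on

# Betweenness: the share of shortest paths between other teams
# that pass through a given team.
betweenness = nx.betweenness_centrality(G)

most_depended_on = max(in_degree, key=in_degree.get)
most_central = max(betweenness, key=betweenness.get)
has_cycles = not nx.is_directed_acyclic_graph(G)

print(most_depended_on, in_degree[most_depended_on])  # Z 3
print(most_central)                                   # Z
print(has_cycles)                                     # False
```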
2.2 Why This Matters
Dependencies are the “Enemy of Flow.”
DEPLOY PROCESS: Feature in Checkout
Without Clear Boundaries:
Checkout ─┬─► Payment Team (wait 2 days for approval)
          ├─► DB Team (wait 1 day for schema change)
          ├─► Security (wait 3 days for review)
          └─► Platform (wait 1 day for config)
Total: 7 days of waiting, 4 hours of work
With Clear Boundaries:
Checkout ──► Self-service deploy
Total: 4 hours of work, deployed same day
2.3 Historical Context
- Microservices Architecture (2011+): Promise of independent deployment
- Distributed Tracing (2010s): Jaeger, Zipkin made dependencies visible
- Team Topologies (2019): Formalized fracture planes concept
2.4 Common Misconceptions
| Misconception | Reality |
|---|---|
| “More services = more agility” | More services without ownership = more coordination |
| “APIs solve coupling” | You can have tight coupling over APIs |
| “Dependencies are just technical” | Technical deps create organizational deps |
| “We can refactor later” | Organizational dependencies are harder to change than code |
3. Project Specification
3.1 What You Will Build
- Dependency Extractor: Pull data from tracing, code, or surveys
- Graph Builder: Create team-to-team dependency graph
- Analyzer: Calculate metrics (cycles, bottlenecks, clusters)
- Visualizer: Interactive graph visualization
- Reporter: Recommendations for organizational change
3.2 Functional Requirements
- Data Sources
  - Distributed tracing (Jaeger/Zipkin spans)
  - Code analysis (import statements, API calls)
  - Service catalog (declared dependencies)
  - Manual input (for non-code deps)
- Graph Operations
  - Build directed graph of team dependencies
  - Calculate graph metrics
  - Detect cycles
  - Identify clusters
- Visualization
  - Interactive 2D/3D graph
  - Click to see dependency details
  - Filter by team, criticality, type
  - Highlight problems (cycles, bottlenecks)
- Reporting
  - Top 5 bottleneck teams
  - All circular dependencies
  - Recommended fracture planes
  - Comparison over time
3.3 Non-Functional Requirements
- Handle 50+ teams, 200+ services
- Visualization must be interactive (not just static image)
- Analysis must complete in < 30 seconds
- Support for incremental updates
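One way to read "incremental updates" is that each new batch of trace data is merged into running edge totals instead of rebuilding everything. The sketch below assumes edges are keyed by `(source_team, target_team)`; `apply_batch` is a hypothetical helper name.

```python
from collections import Counter

# Running totals of calls per (source_team, target_team) edge.
edge_calls = Counter({("checkout", "payments"): 100})


def apply_batch(totals: Counter, batch: dict) -> set:
    """Merge a new batch of call counts; return edges seen for the first time."""
    new_edges = {edge for edge in batch if edge not in totals}
    totals.update(batch)  # Counter.update adds counts rather than replacing them
    return new_edges


new = apply_batch(edge_calls, {("checkout", "payments"): 20,
                               ("checkout", "identity"): 5})
print(new)                                   # {('checkout', 'identity')}
print(edge_calls[("checkout", "payments")])  # 120
```

Returning the newly seen edges makes it cheap to decide whether cycle detection needs to be rerun after a batch.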
3.4 Example Usage / Output
Dependency Report:
# Dependency Analysis Report - Acme Corp
Generated: 2025-01-15
## Executive Summary
- **Teams Analyzed**: 12
- **Services Analyzed**: 45
- **Dependencies Found**: 128
- **Circular Dependencies**: 3 (HIGH RISK)
- **Bottleneck Teams**: 2
---
## Critical Findings
### 1. Circular Dependencies (MUST FIX)
| Cycle | Services Involved | Impact |
|-------|-------------------|--------|
| Cycle 1 | checkout → payments → promotions → checkout | Deployment deadlock |
| Cycle 2 | user → auth → session → user | Data consistency issues |
| Cycle 3 | order → inventory → shipping → order | Cannot test in isolation |
**Recommendation**: Break Cycle 1 by introducing event-driven communication between
promotions and checkout instead of synchronous API calls.
---
### 2. Bottleneck Teams (HIGH RISK)
| Team | In-Degree (calling services) | Impact |
|------|------------------------------|--------|
| Platform | 42 | Every team depends on Platform |
| Identity | 38 | Every user-facing service calls Identity |
**Recommendation**:
- Platform should expose more self-service capabilities
- Identity should be split into Auth (fast path) and Profile (heavy reads)
---
### 3. Dependency Clusters
Teams that only talk to each other (potential merge candidates):
| Cluster | Teams | Internal Deps | External Deps |
|---------|-------|---------------|---------------|
| Commerce | Checkout, Cart, Payments | 15 | 3 |
| Logistics | Shipping, Inventory, Warehouse | 12 | 2 |
**Recommendation**: Commerce cluster could be one "Commerce Domain" team
with internal service ownership.
---
## Visualization
[Interactive Graph: http://deps.acme.internal/graph]
CLI Output:
$ ./deps analyze --source jaeger --days 7
Loading trace data... 45,230 spans loaded
Mapping services to teams... 45 services, 12 teams
Building dependency graph...
=== ANALYSIS RESULTS ===
[GRAPH STATS]
Nodes (Teams): 12
Edges (Dependencies): 128
Average In-Degree: 10.7
Average Out-Degree: 10.7
[CYCLES DETECTED: 3]
⚠️ checkout → payments → promotions → checkout
⚠️ user → auth → session → user
⚠️ order → inventory → shipping → order
[BOTTLENECK TEAMS]
🔴 Platform: 42 services depend on this (CRITICAL)
🟡 Identity: 38 services depend on this (HIGH)
🟢 Data: 5 services depend on this (NORMAL)
[CLUSTER ANALYSIS]
📦 Commerce Cluster: Checkout, Cart, Payments
Internal density: 0.87 (very tight)
External deps: 3
Recommendation: Consider merging into one domain team
[FRACTURE PLANE SUGGESTIONS]
1. Commerce vs. Logistics (clean domain split)
2. Sync vs. Async services (communication style)
3. PCI vs. non-PCI (regulatory boundary)
Generating visualization...
Graph saved to: output/dependency-graph.html
Report saved to: output/dependency-report.md
Interactive Graph (Conceptual):
┌─────────────────────┐
│ PLATFORM │
│ (BOTTLENECK) │
│ In-Degree: 42 │
└─────────┬───────────┘
│
┌──────────────┬──────────────┼──────────────┬──────────────┐
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│Checkout │◄─►│ Payment │◄─►│ Cart │ │ Orders │ │Identity │
│ │ │ │ │ │ │ │ │(Bottlen)│
└────┬────┘ └────┬────┘ └─────────┘ └─────────┘ └─────────┘
│ │
└──────────────┴───► CYCLE DETECTED!
│
▼
┌─────────┐
│Promotions│
│ │
└─────────┘
3.5 Real World Outcome
After running this analysis:
- Identify 2-3 circular dependencies to break
- Find the 1-2 teams that are organizational bottlenecks
- Propose team restructuring based on natural boundaries
- Track dependency changes over time (is it getting better or worse?)
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────┐
│                 DEPENDENCY VISUALIZATION SYSTEM                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
           ▼                     ▼                     ▼
   ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
   │ DATA SOURCES  │     │  GRAPH BUILD  │     │   ANALYSIS    │
   │               │     │               │     │               │
   │ - Jaeger      │────►│ - NetworkX    │────►│ - Cycles      │
   │ - Zipkin      │     │ - Team map    │     │ - Centrality  │
   │ - Code scan   │     │ - Aggregation │     │ - Clusters    │
   │ - Manual      │     │               │     │               │
   └───────────────┘     └───────────────┘     └───────────────┘
                                 │                     │
                                 ▼                     ▼
                  ┌─────────────────────────────────────────┐
                  │              VISUALIZATION              │
                  │                                         │
                  │ - D3.js / Cytoscape.js                  │
                  │ - Interactive zoom/filter               │
                  │ - Color-coded by risk                   │
                  └─────────────────────────────────────────┘
4.2 Key Components
- Data Extractors: Plugins for different data sources
- Service-to-Team Mapper: Links services to owning teams
- Graph Builder: NetworkX-based graph construction
- Analyzer: Graph algorithms for metrics
- Visualizer: Interactive HTML/JavaScript output
- Reporter: Markdown report generator
4.3 Data Structures
# models.py
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Service:
    id: str
    name: str
    owner_team: str
    tier: int  # 1=critical, 2=important, 3=normal


@dataclass
class Dependency:
    source_service: str
    target_service: str
    call_count: int  # From tracing
    dependency_type: str  # runtime, build, data


@dataclass
class TeamDependency:
    source_team: str
    target_team: str
    services_involved: List[tuple]  # [(src_svc, tgt_svc), ...]
    total_calls: int
    is_bidirectional: bool


@dataclass
class AnalysisResult:
    teams: List[str]
    dependencies: List[TeamDependency]
    cycles: List[List[str]]  # Each cycle is list of teams
    bottlenecks: List[tuple]  # [(team, in_degree), ...]
    clusters: List[Set[str]]  # Groups of highly connected teams
# service-to-team.yaml
services:
  checkout-api:
    team: team-checkout
    tier: 1
  payment-gateway:
    team: team-payments
    tier: 1
  inventory-service:
    team: team-inventory
    tier: 2
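A stale or incomplete mapping is easy to miss, so it can help to check it against observed calls before building the graph. The sketch below uses a plain dict that mirrors the YAML structure above; in a real run that dict would come from parsing the file, for example with PyYAML's `yaml.safe_load`. The `promo-engine` service and the `find_unmapped` helper are hypothetical.

```python
# Mirrors the structure of service-to-team.yaml; normally loaded from the file.
services = {
    "checkout-api": {"team": "team-checkout", "tier": 1},
    "payment-gateway": {"team": "team-payments", "tier": 1},
    "inventory-service": {"team": "team-inventory", "tier": 2},
}

# Hypothetical observed calls as (source_service, target_service) pairs.
observed_calls = [
    ("checkout-api", "payment-gateway"),
    ("checkout-api", "promo-engine"),  # not present in the mapping
]


def find_unmapped(services: dict, calls: list) -> set:
    """Return service names that appear in calls but have no owning team."""
    seen = {name for pair in calls for name in pair}
    return seen - set(services)


unmapped = find_unmapped(services, observed_calls)
print(unmapped)  # {'promo-engine'}
```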
4.4 Algorithm Overview
import networkx as nx
from collections import defaultdict
from typing import List, Set

from networkx.algorithms.community import louvain_communities  # NetworkX 2.8+
from models import Dependency  # dataclass from models.py (section 4.3)


def build_team_graph(services: dict, dependencies: List[Dependency]) -> nx.DiGraph:
    """Collapse service-to-service calls into a weighted team-to-team graph."""
    G = nx.DiGraph()

    # Add all teams as nodes
    teams = set(s['team'] for s in services.values())
    G.add_nodes_from(teams)

    # Aggregate service deps to team deps
    team_edges = defaultdict(lambda: {'count': 0, 'services': []})
    for dep in dependencies:
        # Raises KeyError if a service is missing from the mapping
        src_team = services[dep.source_service]['team']
        tgt_team = services[dep.target_service]['team']
        if src_team != tgt_team:  # Skip intra-team
            key = (src_team, tgt_team)
            team_edges[key]['count'] += dep.call_count
            team_edges[key]['services'].append(
                (dep.source_service, dep.target_service)
            )

    # Add edges
    for (src, tgt), data in team_edges.items():
        G.add_edge(src, tgt, weight=data['count'], services=data['services'])
    return G


def find_cycles(G: nx.DiGraph) -> List[List[str]]:
    """Enumerate every simple cycle; the count can explode on dense graphs."""
    return list(nx.simple_cycles(G))


def find_bottlenecks(G: nx.DiGraph, threshold: int = 10) -> List[tuple]:
    """Teams whose in-degree exceeds threshold, highest in-degree first."""
    in_degrees = [(node, G.in_degree(node)) for node in G.nodes()]
    bottlenecks = [(n, d) for n, d in in_degrees if d > threshold]
    return sorted(bottlenecks, key=lambda x: -x[1])


def find_clusters(G: nx.DiGraph) -> List[Set[str]]:
    # Convert to undirected for community detection
    G_undirected = G.to_undirected()
    return list(louvain_communities(G_undirected))
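Because enumerating every simple cycle can become expensive on a tangled graph, one option is to first find the groups of teams that contain cycles at all. The sketch below uses strongly connected components for that; it is an optional pre-check, not a replacement for `find_cycles`, and it assumes the graph has no self-loops (which holds here because intra-team calls are skipped).

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("checkout", "payments"), ("payments", "promotions"),
    ("promotions", "checkout"),   # closes a cycle
    ("checkout", "identity"),     # identity is not part of any cycle
])

# Every cycle lies entirely inside one strongly connected component (SCC),
# so SCCs with more than one node are the groups that contain cycles.
tangled = [scc for scc in nx.strongly_connected_components(G) if len(scc) > 1]
print(tangled)  # one set containing checkout, payments, promotions

# Enumerate individual cycles only inside those groups.
cycles = [c for scc in tangled for c in nx.simple_cycles(G.subgraph(scc))]
print(len(cycles))  # 1
```

Reporting the tangled groups first also gives a shorter, more readable finding than a long list of overlapping cycles.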
5. Implementation Guide
5.1 Development Environment Setup
# Create project
mkdir dependency-visualizer && cd dependency-visualizer
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install networkx matplotlib pyvis pyyaml requests click
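# Optional Jaeger client libraries. Both packages emit traces rather than read them and are deprecated upstream;
# the extractor itself can query stored traces over HTTP with requests (installed above)
pip install jaeger-client opentelemetry-exporter-jaeger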
5.2 Project Structure
dependency-visualizer/
├── data/
│ ├── service-to-team.yaml
│ └── sample-traces.json
├── src/
│ ├── __init__.py
│ ├── extractors/
│ │ ├── jaeger.py
│ │ ├── zipkin.py
│ │ └── manual.py
│ ├── graph.py # Graph building
│ ├── analysis.py # Metrics calculation
│ ├── visualize.py # HTML/JS output
│ └── report.py # Markdown generation
├── output/
│ ├── dependency-graph.html
│ └── dependency-report.md
├── cli.py # Command-line interface
└── tests/
└── test_analysis.py
5.3 The Core Question You’re Answering
“Who is stopping us from moving faster, and is it a technical problem or an organizational one?”
Most “Technical Debt” is actually “Organizational Debt”—messy systems mirror messy team structures. Visualizing the spaghetti is the first step to untangling the architecture.
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Conway’s Law (The Inverse)
  - If the code is spaghetti, the team structure is likely spaghetti.
  - Book Reference: “Team Topologies” Ch. 2
- Fracture Planes
  - What are the natural ways to split a large system?
  - Book Reference: “Team Topologies” Ch. 6
- Graph Theory Basics
  - Directed vs. undirected graphs
  - Cycles, centrality, clustering
  - Reference: NetworkX documentation
5.5 Questions to Guide Your Design
Before implementing, think through these:
Types of Dependencies
- Is it a “Design” dependency (I need their approval)?
- Is it a “Runtime” dependency (My service calls their API)?
- Is it a “Data” dependency (We share a database)?
Visual Encoding
- How do you show the “strength” of a dependency (line thickness)?
- How do you show direction (arrows)?
- How do you highlight problems (color, animation)?
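One possible answer to these encoding questions is sketched below as Graphviz DOT output: line thickness for strength, arrows for direction, and red for edges that belong to a cycle. The `to_dot` helper, the width scaling, and the sample edges are assumptions for illustration; `penwidth` and `color` are standard DOT edge attributes.

```python
def to_dot(edges: dict, cycle_edges: set, max_width: float = 6.0) -> str:
    """Render {(src, tgt): call_count} as Graphviz DOT text."""
    heaviest = max(edges.values(), default=1) or 1
    lines = ["digraph teams {"]  # a digraph draws arrows by default
    for (src, tgt), calls in sorted(edges.items()):
        # Thickness scales linearly with call volume; 1.0 is the minimum width.
        width = 1.0 + (max_width - 1.0) * calls / heaviest
        color = "red" if (src, tgt) in cycle_edges else "gray40"
        lines.append(
            f'  "{src}" -> "{tgt}" [penwidth={width:.1f}, color={color}];'
        )
    lines.append("}")
    return "\n".join(lines)


edges = {
    ("checkout", "payments"): 10000,
    ("payments", "checkout"): 500,
    ("checkout", "identity"): 2000,
}
cycle_edges = {("checkout", "payments"), ("payments", "checkout")}
dot = to_dot(edges, cycle_edges)
print(dot)
```

Linear scaling lets one very busy edge flatten all the others; a logarithmic scale is a reasonable alternative when call volumes span several orders of magnitude.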
Scope
- Do you include non-code dependencies (tickets, approvals)?
- Do you show individual services or aggregate to teams?
5.6 Thinking Exercise
The “Feature Trace”
Pick a feature that was recently launched.
Questions:
- How many teams had to touch the code?
- How many teams had to “approve” PRs?
- How many teams had to be in a deployment meeting?
- If > 3 teams for any of these, why?
Write down:
- The feature name
- Every team involved and their role
- Which dependencies were necessary vs. artificial
5.7 Hints in Layers
Hint 1: Use Distributed Tracing Data
Jaeger/Zipkin already have service-to-service call data. Map services to teams.
Hint 2: Simplify the Nodes
Don’t visualize 200 services. Aggregate to 10 teams. If Team A’s services call Team B’s services 10,000 times/day, that’s one thick line.
Hint 3: Look for Boundary Crossing
Are there teams that only talk to each other? They should probably be one team.
Hint 4: Use NetworkX (Python)
Calculate centrality and detect cycles with a few lines:
cycles = list(nx.simple_cycles(G))
centrality = nx.in_degree_centrality(G)
5.8 The Interview Questions They’ll Ask
Prepare to answer these:
- “How do you identify a ‘bottleneck team’?”
  - High in-degree (many teams depend on them), long wait times for their services
- “What is a ‘Circular Dependency’ in an organizational context?”
  - Team A waits on Team B waits on Team C waits on Team A. Deadlock.
- “How can you use Conway’s Law to fix architecture?”
  - Restructure teams to match desired architecture. The code will follow.
- “Explain the concept of ‘Fracture Planes’.”
  - Natural boundaries where systems can be split with minimal coordination.
- “When should you merge two teams into one?”
  - When they’re in constant collaboration with no clear interface.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Fracture Planes | “Team Topologies” by Skelton & Pais | Ch. 6: Choose Team-First Boundaries |
| Evolutionary Architecture | “Building Evolutionary Architectures” | Ch. 3: Engineering Incremental Change |
| System Design | “Designing Data-Intensive Applications” | Ch. 4 (on coupling) |
5.10 Implementation Phases
Phase 1: Data Extraction (5-7 hours)
- Build Jaeger/Zipkin data extractor
- Parse spans to service-to-service calls
- Load service-to-team mapping
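The span-parsing step can be sketched as follows, assuming Zipkin v2-style span dicts with `traceId`, `id`, an optional `parentId`, and `localEndpoint.serviceName`. This is a simplification: it treats each span ID as unique within a trace and does not handle client and server sides that share one span ID. The `service_calls` name and the sample spans are hypothetical.

```python
from collections import Counter


def service_calls(spans: list) -> Counter:
    """Count cross-service calls as (caller_service, callee_service) pairs."""
    # Index spans by (trace, span id) so parents can be looked up.
    service_of = {
        (s["traceId"], s["id"]): s["localEndpoint"]["serviceName"] for s in spans
    }
    calls = Counter()
    for span in spans:
        parent = service_of.get((span["traceId"], span.get("parentId")))
        child = service_of[(span["traceId"], span["id"])]
        if parent and parent != child:  # skip root spans and in-service spans
            calls[(parent, child)] += 1
    return calls


spans = [
    {"traceId": "t1", "id": "a",
     "localEndpoint": {"serviceName": "checkout-api"}},
    {"traceId": "t1", "id": "b", "parentId": "a",
     "localEndpoint": {"serviceName": "payment-gateway"}},
    {"traceId": "t1", "id": "c", "parentId": "a",
     "localEndpoint": {"serviceName": "checkout-api"}},
]
calls = service_calls(spans)
print(calls)  # Counter({('checkout-api', 'payment-gateway'): 1})
```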
Phase 2: Graph Building (4-5 hours)
- Aggregate service deps to team deps
- Build NetworkX graph
- Calculate basic metrics
Phase 3: Analysis (4-5 hours)
- Implement cycle detection
- Implement bottleneck detection
- Implement cluster detection
Phase 4: Visualization (5-7 hours)
- Generate interactive HTML using PyVis or D3.js
- Add color coding for risk levels
- Add click-through to details
Phase 5: Reporting (3-4 hours)
- Generate Markdown report
- Include recommendations
- Add historical comparison
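A minimal sketch of the Markdown generation step, assuming the analysis produces cycles as lists of team names and bottlenecks as `(team, in_degree)` tuples as in section 4.3. The `render_report` function, its section headings, and the sample values are illustrative choices.

```python
def render_report(cycles: list, bottlenecks: list, top_n: int = 5) -> str:
    """Render cycles and [(team, in_degree), ...] as a Markdown report."""
    lines = ["## Circular Dependencies", ""]
    for cycle in cycles:
        # Repeat the first team at the end to show the loop closing.
        lines.append("- " + " → ".join(cycle + [cycle[0]]))
    lines += ["", "## Bottleneck Teams", "",
              "| Team | In-Degree |", "|------|-----------|"]
    ranked = sorted(bottlenecks, key=lambda item: -item[1])[:top_n]
    lines += [f"| {team} | {degree} |" for team, degree in ranked]
    return "\n".join(lines)


report = render_report(
    cycles=[["checkout", "payments", "promotions"]],
    bottlenecks=[("Data", 5), ("Platform", 9)],
)
print(report)
```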
5.11 Key Implementation Decisions
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Data source | Tracing only | Tracing + code scan | Tracing (runtime reality) |
| Visualization | Static PNG | Interactive HTML | Interactive (more useful) |
| Team aggregation | Always | Optional detail view | Both (zoom in/out) |
| Cycle highlighting | Text list | Graph animation | Both |
6. Testing Strategy
Unit Tests
import networkx as nx

# Assumes the functions from section 4.4 live in src/analysis.py
from src.analysis import find_bottlenecks, find_cycles


def test_cycle_detection():
    G = nx.DiGraph()
    G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'A')])
    cycles = find_cycles(G)
    assert len(cycles) == 1
    assert set(cycles[0]) == {'A', 'B', 'C'}


def test_bottleneck_detection():
    G = nx.DiGraph()
    G.add_edges_from([
        ('A', 'Z'), ('B', 'Z'), ('C', 'Z'), ('D', 'Z')
    ])
    bottlenecks = find_bottlenecks(G, threshold=2)
    assert bottlenecks[0][0] == 'Z'
    assert bottlenecks[0][1] == 4
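The aggregation step is worth a test of its own, since a mistake there silently distorts every later metric. To stay self-contained, the sketch below tests a simplified stand-in, `aggregate_team_edges`, rather than `build_team_graph` itself; the same assertions could be adapted to the real function.

```python
from collections import defaultdict


def aggregate_team_edges(owner: dict, calls: list) -> dict:
    """Stand-in for the aggregation step: sum call counts per team pair."""
    totals = defaultdict(int)
    for src_svc, tgt_svc, count in calls:
        src_team, tgt_team = owner[src_svc], owner[tgt_svc]
        if src_team != tgt_team:  # intra-team calls are not team dependencies
            totals[(src_team, tgt_team)] += count
    return dict(totals)


def test_aggregation_skips_intra_team_calls():
    owner = {"cart": "commerce", "checkout": "commerce", "auth": "identity"}
    calls = [("cart", "checkout", 50),   # same team, must be dropped
             ("cart", "auth", 10),
             ("checkout", "auth", 5)]
    assert aggregate_team_edges(owner, calls) == {("commerce", "identity"): 15}


test_aggregation_skips_intra_team_calls()
```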
Integration Tests
- Load real Jaeger data
- Verify graph matches expected structure
- Verify visualization renders without errors
Manual Validation
- Show graph to someone who knows the org
- Ask: “Does this match reality?”
7. Common Pitfalls & Debugging
| Problem | Symptom | Root Cause | Fix |
|---|---|---|---|
| Too many edges | Graph is unreadable | Showing service-level | Aggregate to teams |
| Missing services | Some deps not showing | Tracing gaps | Supplement with code scan |
| Wrong team mapping | Services assigned wrong | Stale service-to-team | Update mapping from catalog |
| Slow visualization | Browser hangs | Too many nodes/edges | Filter to top N deps |
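The "filter to top N deps" fix can be as small as the sketch below, assuming edges are held as a `{(src, tgt): call_count}` dict. The `top_n_edges` name and the sample data are hypothetical.

```python
import heapq


def top_n_edges(edges: dict, n: int) -> dict:
    """Keep only the n heaviest edges from {(src, tgt): call_count}."""
    heaviest = heapq.nlargest(n, edges.items(), key=lambda item: item[1])
    return dict(heaviest)


edges = {("a", "b"): 900, ("a", "c"): 40, ("b", "c"): 300, ("c", "d"): 5}
visible = top_n_edges(edges, 2)
print(visible)  # {('a', 'b'): 900, ('b', 'c'): 300}
```

Filtering is best applied only to what gets drawn: a low-traffic edge can still be the one that closes a cycle, so the analysis should run on the full graph.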
8. Extensions & Challenges
Extension 1: Time-Series Analysis
Track dependencies over 4 quarters. Is the graph getting simpler or more complex?
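A simple starting point is to diff the edge sets of two snapshots. The sketch below assumes each snapshot is stored as a set of `(source_team, target_team)` pairs; the quarter data is invented for illustration.

```python
def diff_snapshots(old_edges: set, new_edges: set) -> dict:
    """Compare two sets of (source_team, target_team) edges."""
    return {
        "added": sorted(new_edges - old_edges),
        "removed": sorted(old_edges - new_edges),
        "net_change": len(new_edges) - len(old_edges),
    }


q1 = {("checkout", "payments"), ("checkout", "identity"),
      ("promotions", "checkout")}
q2 = {("checkout", "payments"), ("checkout", "identity"),
      ("search", "identity")}
change = diff_snapshots(q1, q2)
print(change["added"])       # [('search', 'identity')]
print(change["removed"])     # [('promotions', 'checkout')]
print(change["net_change"])  # 0
```

A net change of zero can still hide churn, which is why the added and removed lists are reported separately.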
Extension 2: Cost Attribution
Calculate the “cost” of each dependency (based on wait times or incidents).
Extension 3: Simulation
“What if we moved Service X from Team A to Team B?” Show the new graph.
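The what-if question can be answered by recomputing team edges with one ownership entry changed. The sketch below assumes service-level calls as `(src, tgt)` pairs and an `owner` dict; all service and team names are hypothetical.

```python
def team_edges(owner: dict, calls: list) -> set:
    """Team-level edges implied by service-level (src, tgt) calls."""
    return {(owner[s], owner[t]) for s, t in calls if owner[s] != owner[t]}


def simulate_move(owner: dict, calls: list, service: str, new_team: str) -> set:
    """Recompute team edges as if `service` were owned by `new_team`."""
    proposed = {**owner, service: new_team}  # copy; the input is not mutated
    return team_edges(proposed, calls)


owner = {"promo-engine": "team-a", "checkout-api": "team-b", "cart-api": "team-b"}
calls = [("checkout-api", "promo-engine"),
         ("promo-engine", "checkout-api"),
         ("cart-api", "promo-engine")]

before = team_edges(owner, calls)
after = simulate_move(owner, calls, "promo-engine", "team-b")
print(sorted(before))  # [('team-a', 'team-b'), ('team-b', 'team-a')]
print(after)           # set()
```

In this toy case the move removes a two-team cycle entirely; on real data the same comparison shows which edges a proposed ownership change would add or remove.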
Extension 4: Integration with Catalog
Pull service-to-team mapping from Backstage/service catalog.
9. Real-World Connections
Tools:
- Jaeger - Distributed tracing
- Zipkin - Distributed tracing
- Kiali - Service mesh visualization
- Backstage - Service catalog
How Big Tech Does This:
- Uber: Built internal dependency mapping tools
- Netflix: Service mesh with dependency visualization
- Google: Automatic dependency tracking across all services
10. Resources
NetworkX
Visualization
- PyVis - Python library for network visualization
- D3.js Force Graph
- Cytoscape.js
Related Projects
- P01: Team Interaction Audit - Communication patterns
- P11: Service Catalog - Service metadata
11. Self-Assessment Checklist
Before considering this project complete, verify:
- I can explain Conway’s Law and fracture planes
- Data extraction works for at least one source
- Graph correctly aggregates services to teams
- Cycle detection finds all cycles
- Bottleneck detection identifies high in-degree nodes
- Visualization is interactive and readable
- Report includes actionable recommendations
- At least one stakeholder has validated the graph
12. Submission / Completion Criteria
This project is complete when you have:
- Data extraction from Jaeger/Zipkin or manual input
- Service-to-team mapping for all services
- NetworkX graph with calculated metrics
- Interactive visualization (HTML)
- Markdown report with:
  - All cycles identified
  - Top 5 bottleneck teams
  - Fracture plane recommendations
- Validation from someone who knows the organization