Project 8: The Dependency Spaghetti Visualizer

Build a tool that extracts dependency data and visualizes team-to-team dependencies, identifying circular dependencies and bottleneck teams.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	2 Weeks (25-35 hours)
Primary Language	Python (NetworkX) / Graphviz
Alternative Languages	JavaScript (Cytoscape.js), Neo4j
Prerequisites	Graph theory basics, distributed tracing concepts
Key Topics	Conway’s Law, Fracture Planes, System Architecture

1. Learning Objectives

By completing this project, you will:

Extract dependency data from distributed tracing or code analysis
Build a dependency graph representing team-to-team relationships
Identify architectural problems: circular dependencies, bottlenecks
Visualize the “spaghetti” in an actionable way
Recommend fracture planes for organizational redesign

2. Theoretical Foundation

2.1 Core Concepts

The Distributed Monolith Problem

EXPECTED (Microservices)          REALITY (Distributed Monolith)

   ┌────┐     ┌────┐                ┌────┐───────┌────┐
   │ A  │────►│ B  │              ┌─┤ A  │◄─────►│ B  │─┐
   └────┘     └────┘              │ └────┘       └────┘ │
                                  │    ▲            ▲   │
   ┌────┐     ┌────┐              │    │            │   │
   │ C  │────►│ D  │              │    ▼            ▼   │
   └────┘     └────┘              │ ┌────┐       ┌────┐│
                                  └►│ C  │◄─────►│ D  ├┘
   Clean interfaces                 └────┘       └────┘
   Independent deploys              Circular deps
                                    Can't deploy alone

Conway’s Law in Reverse

If your system looks like spaghetti, your organization probably looks like spaghetti too.

CODE DEPENDENCY                    TEAM DEPENDENCY
┌──────────────────────────┐      ┌──────────────────────────┐
│ checkout-service         │      │ Checkout Team            │
│   → payment-service      │ ==>  │   → Talks to Payments    │
│   → inventory-service    │      │   → Talks to Inventory   │
│   → user-service         │      │   → Talks to Identity    │
└──────────────────────────┘      └──────────────────────────┘

If checkout calls 5 services owned by 5 different teams, the Checkout team coordinates with 5 teams.
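
The translation from code dependencies to team dependencies can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `OWNERS` mapping whose names mirror the diagram above; in the real tool the mapping would come from the service-to-team file. Note that the number of teams to coordinate with is smaller than the number of services called whenever one team owns several of them.

```python
# Hypothetical ownership map (illustrative names taken from the diagram above).
OWNERS = {
    "checkout-service": "Checkout",
    "payment-service": "Payments",
    "inventory-service": "Inventory",
    "user-service": "Identity",
}

def teams_to_coordinate_with(service: str, calls: list[str]) -> set[str]:
    """Return the teams owning the services that `service` calls, minus its own team."""
    own_team = OWNERS[service]
    return {OWNERS[callee] for callee in calls} - {own_team}

coordination = teams_to_coordinate_with(
    "checkout-service",
    ["payment-service", "inventory-service", "user-service"],
)
print(sorted(coordination))  # ['Identity', 'Inventory', 'Payments']
```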

Fracture Planes

Natural boundaries where a system can be split with minimal communication:

Fracture Plane Type	Example
Business Domain	Payments vs. Shipping
Data Isolation	Each service owns its data
Technology	Python services vs. Go services
Regulation	PCI (payments) vs. non-PCI
Performance	Real-time vs. batch
Change Cadence	Fast-changing vs. stable
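
One rough way to compare candidate fracture planes is to count how much traffic would cross the proposed boundary: the lower the number, the less communication the split requires. The sketch below assumes hypothetical team-to-team call counts in `CALLS`; it is an illustration of the idea, not a complete scoring method.

```python
# Hypothetical team-to-team call counts: (source, target) -> calls per day.
CALLS = {
    ("Checkout", "Payments"): 9000,
    ("Checkout", "Cart"): 7000,
    ("Shipping", "Inventory"): 5000,
    ("Checkout", "Inventory"): 300,
    ("Payments", "Shipping"): 100,
}

def crossing_calls(calls: dict, side_a: set) -> int:
    """Sum the calls whose two endpoints fall on opposite sides of the split."""
    return sum(
        count
        for (src, tgt), count in calls.items()
        if (src in side_a) != (tgt in side_a)
    )

commerce_split = crossing_calls(CALLS, {"Checkout", "Payments", "Cart"})
arbitrary_split = crossing_calls(CALLS, {"Checkout", "Shipping"})
print(commerce_split)   # 400
print(arbitrary_split)  # 21400
```

In this toy data the business-domain split (Commerce on one side, everything else on the other) cuts far fewer calls than an arbitrary grouping, which is what a good fracture plane should look like.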

Graph Metrics for Organizations

Metric	Definition	What It Reveals
In-Degree	How many teams depend on you	Bottleneck risk
Out-Degree	How many teams you depend on	Coordination overhead
Cycles	Circular dependencies	Architecture smell
Centrality	How “central” a team is	Critical path risk
Clusters	Groups that only talk internally	Potential merge candidates
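
To make the first two metrics concrete, here is a dependency-free sketch that counts in-degree and out-degree from a plain edge list. The edges are hypothetical; with NetworkX the same numbers come from `G.in_degree(node)` and `G.out_degree(node)` on a `DiGraph`.

```python
from collections import Counter

# Hypothetical team-level edges: (dependent team, team it depends on).
EDGES = [
    ("Checkout", "Platform"),
    ("Payments", "Platform"),
    ("Orders", "Platform"),
    ("Checkout", "Payments"),
    ("Payments", "Checkout"),
]

# In-degree: how many teams depend on you. Out-degree: how many you depend on.
in_degree = Counter(target for _, target in EDGES)
out_degree = Counter(source for source, _ in EDGES)

print(in_degree["Platform"])     # 3
print(out_degree["Checkout"])    # 2
print(in_degree.most_common(1))  # [('Platform', 3)]
```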

2.2 Why This Matters

Dependencies are the “Enemy of Flow.”

DEPLOY PROCESS: Feature in Checkout

Without Clear Boundaries:
Checkout ─┬─► Payment Team (wait 2 days for approval)
          ├─► DB Team (wait 1 day for schema change)
          ├─► Security (wait 3 days for review)
          └─► Platform (wait 1 day for config)

Total: 7 days of waiting, 4 hours of work

With Clear Boundaries:
Checkout ──► Self-service deploy

Total: 4 hours of work, deployed same day
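
The arithmetic behind this comparison can be expressed as flow efficiency: hands-on time divided by total elapsed time. The sketch below uses the figures from the example and assumes that the waits happen one after another and that a working day is 8 hours; both are assumptions, not part of the original scenario.

```python
# Wait times and work hours come from the example above.
# Sequential waits and an 8-hour working day are assumptions.
HOURS_PER_WORKDAY = 8
wait_days = {"Payment Team": 2, "DB Team": 1, "Security": 3, "Platform": 1}
work_hours = 4

total_wait_days = sum(wait_days.values())
wait_hours = total_wait_days * HOURS_PER_WORKDAY
flow_efficiency = work_hours / (work_hours + wait_hours)

print(total_wait_days)           # 7
print(f"{flow_efficiency:.1%}")  # 6.7%
```

With clear boundaries the wait term drops to zero and flow efficiency approaches 100%.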

2.3 Historical Context

Microservices Architecture (2011+): Promise of independent deployment
Distributed Tracing (2010s): Jaeger, Zipkin made dependencies visible
Team Topologies (2019): Formalized fracture planes concept

2.4 Common Misconceptions

Misconception	Reality
“More services = more agility”	More services without ownership = more coordination
“APIs solve coupling”	You can have tight coupling over APIs
“Dependencies are just technical”	Technical deps create organizational deps
“We can refactor later”	Organizational dependencies are harder to change than code

3. Project Specification

3.1 What You Will Build

Dependency Extractor: Pull data from tracing, code, or surveys
Graph Builder: Create team-to-team dependency graph
Analyzer: Calculate metrics (cycles, bottlenecks, clusters)
Visualizer: Interactive graph visualization
Reporter: Recommendations for organizational change

3.2 Functional Requirements

Data Sources
- Distributed tracing (Jaeger/Zipkin spans)
- Code analysis (import statements, API calls)
- Service catalog (declared dependencies)
- Manual input (for non-code deps)
Graph Operations
- Build directed graph of team dependencies
- Calculate graph metrics
- Detect cycles
- Identify clusters
Visualization
- Interactive 2D/3D graph
- Click to see dependency details
- Filter by team, criticality, type
- Highlight problems (cycles, bottlenecks)
Reporting
- Top 5 bottleneck teams
- All circular dependencies
- Recommended fracture planes
- Comparison over time

3.3 Non-Functional Requirements

Handle 50+ teams, 200+ services
Visualization must be interactive (not just static image)
Analysis must complete in < 30 seconds
Support for incremental updates

3.4 Example Usage / Output

Dependency Report:

# Dependency Analysis Report - Acme Corp

Generated: 2025-01-15

## Executive Summary

- **Teams Analyzed**: 12
- **Services Analyzed**: 45
- **Dependencies Found**: 128
- **Circular Dependencies**: 3 (HIGH RISK)
- **Bottleneck Teams**: 2

---

## Critical Findings

### 1. Circular Dependencies (MUST FIX)

| Cycle | Services Involved | Impact |
|-------|-------------------|--------|
| Cycle 1 | checkout → payments → promotions → checkout | Deployment deadlock |
| Cycle 2 | user → auth → session → user | Data consistency issues |
| Cycle 3 | order → inventory → shipping → order | Cannot test in isolation |

**Recommendation**: Break Cycle 1 by introducing event-driven communication between
promotions and checkout instead of synchronous API calls.

---

### 2. Bottleneck Teams (HIGH RISK)

| Team | In-Degree | Impact |
|------|-----------|--------|
| Platform | 42 | Every team depends on Platform |
| Identity | 38 | Every user-facing service calls Identity |

**Recommendation**:
- Platform should expose more self-service capabilities
- Identity should be split into Auth (fast path) and Profile (heavy reads)

---

### 3. Dependency Clusters

Teams that only talk to each other (potential merge candidates):

| Cluster | Teams | Internal Deps | External Deps |
|---------|-------|---------------|---------------|
| Commerce | Checkout, Cart, Payments | 15 | 3 |
| Logistics | Shipping, Inventory, Warehouse | 12 | 2 |

**Recommendation**: Commerce cluster could be one "Commerce Domain" team
with internal service ownership.

---

## Visualization

[Interactive Graph: http://deps.acme.internal/graph]

CLI Output:

$ ./deps analyze --source jaeger --days 7

Loading trace data... 45,230 spans loaded
Mapping services to teams... 45 services, 12 teams
Building dependency graph...

=== ANALYSIS RESULTS ===

[GRAPH STATS]
Nodes (Teams): 12
Edges (Dependencies): 128
Average In-Degree: 10.7
Average Out-Degree: 10.7

[CYCLES DETECTED: 3]
  ⚠️  checkout → payments → promotions → checkout
  ⚠️  user → auth → session → user
  ⚠️  order → inventory → shipping → order

[BOTTLENECK TEAMS]
  🔴 Platform: 42 services depend on this (CRITICAL)
  🟡 Identity: 38 services depend on this (HIGH)
  🟢 Data: 5 services depend on this (NORMAL)

[CLUSTER ANALYSIS]
  📦 Commerce Cluster: Checkout, Cart, Payments
     Internal density: 0.87 (very tight)
     External deps: 3
     Recommendation: Consider merging into one domain team

[FRACTURE PLANE SUGGESTIONS]
  1. Commerce vs. Logistics (clean domain split)
  2. Sync vs. Async services (communication style)
  3. PCI vs. non-PCI (regulatory boundary)

Generating visualization...
Graph saved to: output/dependency-graph.html
Report saved to: output/dependency-report.md

Interactive Graph (Conceptual):

                         ┌─────────────────────┐
                         │     PLATFORM        │
                         │   (BOTTLENECK)      │
                         │   In-Degree: 42     │
                         └─────────┬───────────┘
                                   │
     ┌──────────────┬──────────────┼──────────────┬──────────────┐
     │              │              │              │              │
     ▼              ▼              ▼              ▼              ▼
┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
│Checkout │◄─►│ Payment │◄─►│  Cart   │   │ Orders  │   │Identity │
│         │   │         │   │         │   │         │   │(Bottlen)│
└────┬────┘   └────┬────┘   └─────────┘   └─────────┘   └─────────┘
     │              │
     └──────────────┴───► CYCLE DETECTED!
              │
              ▼
        ┌──────────┐
        │Promotions│
        │          │
        └──────────┘

3.5 Real World Outcome

After running this analysis:

Identify 2-3 circular dependencies to break
Find the 1-2 teams that are organizational bottlenecks
Propose team restructuring based on natural boundaries
Track dependency changes over time (is it getting better or worse?)

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                  DEPENDENCY VISUALIZATION SYSTEM                │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  DATA SOURCES │     │  GRAPH BUILD  │     │  ANALYSIS     │
│               │     │               │     │               │
│ - Jaeger      │────►│ - NetworkX    │────►│ - Cycles      │
│ - Zipkin      │     │ - Team map    │     │ - Centrality  │
│ - Code scan   │     │ - Aggregation │     │ - Clusters    │
│ - Manual      │     │               │     │               │
└───────────────┘     └───────────────┘     └───────────────┘
                              │                     │
                              ▼                     ▼
                    ┌─────────────────────────────────────────┐
                    │              VISUALIZATION              │
                    │                                         │
                    │  - D3.js / Cytoscape.js                 │
                    │  - Interactive zoom/filter              │
                    │  - Color-coded by risk                  │
                    └─────────────────────────────────────────┘

4.2 Key Components

Data Extractors: Plugins for different data sources
Service-to-Team Mapper: Links services to owning teams
Graph Builder: NetworkX-based graph construction
Analyzer: Graph algorithms for metrics
Visualizer: Interactive HTML/JavaScript output
Reporter: Markdown report generator

4.3 Data Structures

# models.py
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Service:
    id: str
    name: str
    owner_team: str
    tier: int  # 1=critical, 2=important, 3=normal

@dataclass
class Dependency:
    source_service: str
    target_service: str
    call_count: int  # From tracing
    dependency_type: str  # runtime, build, data

@dataclass
class TeamDependency:
    source_team: str
    target_team: str
    services_involved: List[tuple]  # [(src_svc, tgt_svc), ...]
    total_calls: int
    is_bidirectional: bool

@dataclass
class AnalysisResult:
    teams: List[str]
    dependencies: List[TeamDependency]
    cycles: List[List[str]]  # Each cycle is list of teams
    bottlenecks: List[tuple]  # [(team, in_degree), ...]
    clusters: List[Set[str]]  # Groups of highly connected teams

# service-to-team.yaml
services:
  checkout-api:
    team: team-checkout
    tier: 1
  payment-gateway:
    team: team-payments
    tier: 1
  inventory-service:
    team: team-inventory
    tier: 2
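
As an illustration of how the `TeamDependency` records above might be populated, the sketch below aggregates service-level calls into team-level records and sets `is_bidirectional` when the reverse edge also exists. The helper name `to_team_dependencies` and its tuple-based input are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class TeamDependency:  # Same fields as the dataclass above
    source_team: str
    target_team: str
    services_involved: List[tuple]
    total_calls: int
    is_bidirectional: bool

def to_team_dependencies(
    service_calls: List[Tuple[str, str, int]],  # (source_service, target_service, call_count)
    owners: Dict[str, str],                     # service -> owning team
) -> List[TeamDependency]:
    pairs = defaultdict(list)
    totals = defaultdict(int)
    for src, tgt, count in service_calls:
        key = (owners[src], owners[tgt])
        if key[0] == key[1]:
            continue  # Intra-team calls are not team dependencies
        pairs[key].append((src, tgt))
        totals[key] += count
    # An edge is bidirectional when the reversed team pair was also observed
    return [
        TeamDependency(s, t, pairs[(s, t)], totals[(s, t)], (t, s) in pairs)
        for (s, t) in pairs
    ]

owners = {
    "checkout-api": "team-checkout",
    "payment-gateway": "team-payments",
    "inventory-service": "team-inventory",
}
calls = [
    ("checkout-api", "payment-gateway", 120),
    ("payment-gateway", "checkout-api", 15),
    ("checkout-api", "inventory-service", 40),
]
deps = to_team_dependencies(calls, owners)
for dep in deps:
    print(dep.source_team, "->", dep.target_team, dep.total_calls, dep.is_bidirectional)
```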

4.4 Algorithm Overview

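from collections import defaultdict
from typing import List, Set

import networkx as nx
from networkx.algorithms.community import louvain_communities

from models import Dependency  # Dataclass defined in Section 4.3

def build_team_graph(services: dict, dependencies: List[Dependency]) -> nx.DiGraph:
    """Collapse service-to-service dependencies into a weighted team-to-team graph."""
    G = nx.DiGraph()

    # Add all teams as nodes
    teams = set(s['team'] for s in services.values())
    G.add_nodes_from(teams)

    # Aggregate service deps to team deps
    team_edges = defaultdict(lambda: {'count': 0, 'services': []})

    for dep in dependencies:
        # Skip calls involving services missing from the mapping
        # (e.g. third-party endpoints) instead of raising KeyError
        if dep.source_service not in services or dep.target_service not in services:
            continue

        src_team = services[dep.source_service]['team']
        tgt_team = services[dep.target_service]['team']

        if src_team != tgt_team:  # Skip intra-team
            key = (src_team, tgt_team)
            team_edges[key]['count'] += dep.call_count
            team_edges[key]['services'].append(
                (dep.source_service, dep.target_service)
            )

    # Add edges; weight is the total call count between the two teams
    for (src, tgt), data in team_edges.items():
        G.add_edge(src, tgt, weight=data['count'], services=data['services'])

    return G

def find_cycles(G: nx.DiGraph) -> List[List[str]]:
    # Enumerates every elementary cycle; fine for tens of teams,
    # but the count can grow quickly on dense graphs
    return list(nx.simple_cycles(G))

def find_bottlenecks(G: nx.DiGraph, threshold: int = 10) -> List[tuple]:
    # A bottleneck is any team with more than `threshold` dependent teams
    in_degrees = [(node, G.in_degree(node)) for node in G.nodes()]
    bottlenecks = [(n, d) for n, d in in_degrees if d > threshold]
    return sorted(bottlenecks, key=lambda x: -x[1])

def find_clusters(G: nx.DiGraph, seed: int = 42) -> List[Set[str]]:
    # Louvain works on undirected graphs, so edge direction is dropped here
    G_undirected = G.to_undirected()
    # A fixed seed keeps cluster assignments stable between runs
    return list(louvain_communities(G_undirected, weight='weight', seed=seed))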
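
Because enumerating every simple cycle can become expensive on a densely connected graph, one possible alternative is to first group teams into strongly connected components: any component with more than one team contains at least one cycle. The sketch below is a plain-Python version of Tarjan's algorithm, written without NetworkX so it runs on its own; it is a swapped-in technique for illustration, not the method used above. NetworkX offers the same grouping via `nx.strongly_connected_components`.

```python
from typing import Dict, List, Set

def strongly_connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Tarjan's algorithm: each returned set is a group of mutually reachable nodes."""
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[Set[str]] = []

    def visit(node: str) -> None:
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for neighbor in graph.get(node, ()):
            if neighbor not in index:
                visit(neighbor)
                lowlink[node] = min(lowlink[node], lowlink[neighbor])
            elif neighbor in on_stack:
                lowlink[node] = min(lowlink[node], index[neighbor])
        if lowlink[node] == index[node]:  # node is the root of a component
            component: Set[str] = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return components

def cyclic_groups(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Keep only components with more than one node, i.e. those containing a cycle."""
    return [c for c in strongly_connected_components(graph) if len(c) > 1]

# Hypothetical graph: node -> set of nodes it depends on.
GRAPH = {
    "checkout": {"payments"},
    "payments": {"promotions"},
    "promotions": {"checkout"},
    "orders": {"checkout"},
    "platform": set(),
}
groups = cyclic_groups(GRAPH)
print([sorted(g) for g in groups])  # [['checkout', 'payments', 'promotions']]
```

The recursive form is adequate for the 50+ teams targeted here; a much larger graph would need an iterative version to stay within Python's recursion limit.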

5. Implementation Guide

5.1 Development Environment Setup

# Create project
mkdir dependency-visualizer && cd dependency-visualizer
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install networkx matplotlib pyvis pyyaml requests click

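# For Jaeger data extraction, no extra package is needed: query the Jaeger
# query service over HTTP with requests (installed above). The jaeger-client and
# opentelemetry-exporter-jaeger packages send traces from instrumented apps;
# they do not read stored traces, and both are deprecated.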

5.2 Project Structure

dependency-visualizer/
├── data/
│   ├── service-to-team.yaml
│   └── sample-traces.json
├── src/
│   ├── __init__.py
│   ├── extractors/
│   │   ├── jaeger.py
│   │   ├── zipkin.py
│   │   └── manual.py
│   ├── graph.py          # Graph building
│   ├── analysis.py       # Metrics calculation
│   ├── visualize.py      # HTML/JS output
│   └── report.py         # Markdown generation
├── output/
│   ├── dependency-graph.html
│   └── dependency-report.md
├── cli.py                # Command-line interface
└── tests/
    └── test_analysis.py

5.3 The Core Question You’re Answering

“Who is stopping us from moving faster, and is it a technical problem or an organizational one?”

Most “Technical Debt” is actually “Organizational Debt”—messy systems mirror messy team structures. Visualizing the spaghetti is the first step to untangling the architecture.

5.4 Concepts You Must Understand First

Stop and research these before coding:

Conway’s Law (The Inverse)
- If the code is spaghetti, the team structure is likely spaghetti.
- Book Reference: “Team Topologies” Ch. 2
Fracture Planes
- What are the natural ways to split a large system?
- Book Reference: “Team Topologies” Ch. 6
Graph Theory Basics
- Directed vs. undirected graphs
- Cycles, centrality, clustering
- Reference: NetworkX documentation

5.5 Questions to Guide Your Design

Before implementing, think through these:

Types of Dependencies

Is it a “Design” dependency (I need their approval)?
Is it a “Runtime” dependency (My service calls their API)?
Is it a “Data” dependency (We share a database)?

Visual Encoding

How do you show the “strength” of a dependency (line thickness)?
How do you show direction (arrows)?
How do you highlight problems (color, animation)?
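
One possible answer to the encoding questions above is to map call volume to line width on a logarithmic scale, so that a 10,000-call edge is visibly thicker than a 10-call edge without dwarfing it, and to reserve color for problems. The function names, width range, and color choices below are illustrative assumptions.

```python
import math

def edge_width(calls: int, min_width: float = 1.0, max_width: float = 8.0,
               max_calls: int = 100_000) -> float:
    """Map a call count to a line width on a log scale, clamped to [min_width, max_width]."""
    if calls <= 1:
        return min_width
    fraction = min(math.log10(calls) / math.log10(max_calls), 1.0)
    return round(min_width + fraction * (max_width - min_width), 2)

def edge_color(in_cycle: bool, target_is_bottleneck: bool) -> str:
    """Cycles take priority over bottlenecks; everything else stays neutral."""
    if in_cycle:
        return "red"
    if target_is_bottleneck:
        return "orange"
    return "gray"

print(edge_width(10))          # 2.4
print(edge_width(10_000))      # 6.6
print(edge_width(10_000_000))  # 8.0
print(edge_color(in_cycle=False, target_is_bottleneck=True))  # orange
```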

Scope

Do you include non-code dependencies (tickets, approvals)?
Do you show individual services or aggregate to teams?

5.6 Thinking Exercise

The “Feature Trace”

Pick a feature that was recently launched.

Questions:

How many teams had to touch the code?
How many teams had to “approve” PRs?
How many teams had to be in a deployment meeting?
If > 3 teams for any of these, why?

Write down:

The feature name
Every team involved and their role
Which dependencies were necessary vs. artificial

5.7 Hints in Layers

Hint 1: Use Distributed Tracing Data
Jaeger/Zipkin already have service-to-service call data. Map services to teams.

Hint 2: Simplify the Nodes
Don’t visualize 200 services. Aggregate to 10 teams. If Team A’s services call Team B’s services 10,000 times/day, that’s one thick line.

Hint 3: Look for Boundary Crossing
Are there teams that only talk to each other? They should probably be one team.

Hint 4: Use NetworkX (Python)
Calculate centrality and detect cycles with a few lines:

cycles = list(nx.simple_cycles(G))
centrality = nx.in_degree_centrality(G)

5.8 The Interview Questions They’ll Ask

Prepare to answer these:

“How do you identify a ‘bottleneck team’?”
- High in-degree (many teams depend on them), long wait times for their services
“What is a ‘Circular Dependency’ in an organizational context?”
- Team A waits on Team B waits on Team C waits on Team A. Deadlock.
“How can you use Conway’s Law to fix architecture?”
- Restructure teams to match desired architecture. The code will follow.
“Explain the concept of ‘Fracture Planes’.”
- Natural boundaries where systems can be split with minimal coordination.
“When should you merge two teams into one?”
- When they’re in constant collaboration with no clear interface.

5.9 Books That Will Help

Topic	Book	Chapter
Fracture Planes	“Team Topologies” by Skelton & Pais	Ch. 6: Choose Team-First Boundaries
Evolutionary Architecture	“Building Evolutionary Architectures”	Ch. 3: Engineering Incremental Change
System Design	“Designing Data-Intensive Applications”	Ch. 4: Encoding and Evolution (on coupling between services)

5.10 Implementation Phases

Phase 1: Data Extraction (5-7 hours)

Build Jaeger/Zipkin data extractor
Parse spans to service-to-service calls
Load service-to-team mapping
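
The span-parsing step might look like the sketch below. It assumes a trace dictionary shaped like the JSON that Jaeger's query service returns (a `processes` map of process ID to `serviceName`, and `spans` carrying `spanID`, `processID`, and `references`); treat the field names as an assumption and verify them against your own deployment. The sample trace is made up.

```python
from collections import Counter

def service_calls_from_trace(trace: dict) -> Counter:
    """Count caller -> callee service pairs from parent/child span references."""
    service_of_process = {
        pid: proc["serviceName"] for pid, proc in trace["processes"].items()
    }
    service_of_span = {
        span["spanID"]: service_of_process[span["processID"]]
        for span in trace["spans"]
    }
    calls: Counter = Counter()
    for span in trace["spans"]:
        callee = service_of_span[span["spanID"]]
        for ref in span.get("references", []):
            if ref.get("refType") != "CHILD_OF":
                continue
            caller = service_of_span.get(ref["spanID"])
            if caller and caller != callee:  # Ignore child spans within one service
                calls[(caller, callee)] += 1
    return calls

# Made-up trace: checkout-api handles a request and calls payment-gateway once.
SAMPLE_TRACE = {
    "processes": {
        "p1": {"serviceName": "checkout-api"},
        "p2": {"serviceName": "payment-gateway"},
    },
    "spans": [
        {"spanID": "a", "processID": "p1", "references": []},
        {"spanID": "b", "processID": "p1",
         "references": [{"refType": "CHILD_OF", "spanID": "a"}]},
        {"spanID": "c", "processID": "p2",
         "references": [{"refType": "CHILD_OF", "spanID": "b"}]},
    ],
}
calls = service_calls_from_trace(SAMPLE_TRACE)
print(dict(calls))  # {('checkout-api', 'payment-gateway'): 1}
```

Summing these counters across all traces in the time window gives the `call_count` values that feed the graph builder.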

Phase 2: Graph Building (4-5 hours)

Aggregate service deps to team deps
Build NetworkX graph
Calculate basic metrics

Phase 3: Analysis (4-5 hours)

Implement cycle detection
Implement bottleneck detection
Implement cluster detection
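
Once clusters are detected, a simple way to judge how tight each one is compares the edges inside the cluster with the maximum possible, and counts the edges that cross its boundary. The sketch below assumes directed team edges without self-loops, so a cluster of n teams has at most n × (n − 1) internal edges; the data is hypothetical.

```python
def cluster_stats(edges: set, cluster: set) -> dict:
    """Return internal density and external edge count for one cluster of teams."""
    internal = sum(1 for s, t in edges if s in cluster and t in cluster)
    external = sum(1 for s, t in edges if (s in cluster) != (t in cluster))
    possible = len(cluster) * (len(cluster) - 1)  # Directed edges, no self-loops
    density = internal / possible if possible else 0.0
    return {"internal": internal, "external": external, "density": round(density, 2)}

# Hypothetical team-level edges.
EDGES = {
    ("Checkout", "Cart"), ("Cart", "Checkout"),
    ("Checkout", "Payments"), ("Payments", "Checkout"),
    ("Cart", "Payments"),
    ("Checkout", "Platform"), ("Shipping", "Payments"),
}
stats = cluster_stats(EDGES, {"Checkout", "Cart", "Payments"})
print(stats)  # {'internal': 5, 'external': 2, 'density': 0.83}
```

A high density combined with few external edges is the signal behind a "consider merging" recommendation.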

Phase 4: Visualization (5-7 hours)

Generate interactive HTML using PyVis or D3.js
Add color coding for risk levels
Add click-through to details
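
If you choose Cytoscape.js for the front end, the Python side only needs to emit the elements as JSON. The sketch below builds a list in the Cytoscape.js elements format (each element has a `data` object, edges carry `source` and `target`, and `classes` is a space-separated string for styling). The class names `bottleneck` and `cycle` are assumptions that your stylesheet would need to define.

```python
import json

def to_cytoscape_elements(edges: dict, bottlenecks: set, cycle_edges: set) -> list:
    """Convert team edges into Cytoscape.js node and edge elements."""
    teams = sorted({team for pair in edges for team in pair})
    nodes = [
        {"data": {"id": team, "label": team},
         "classes": "bottleneck" if team in bottlenecks else ""}
        for team in teams
    ]
    links = [
        {"data": {"id": f"{src}->{tgt}", "source": src, "target": tgt, "calls": calls},
         "classes": "cycle" if (src, tgt) in cycle_edges else ""}
        for (src, tgt), calls in sorted(edges.items())
    ]
    return nodes + links

# Hypothetical analysis output.
edges = {
    ("Checkout", "Payments"): 120,
    ("Payments", "Checkout"): 15,
    ("Checkout", "Platform"): 40,
}
elements = to_cytoscape_elements(
    edges,
    bottlenecks={"Platform"},
    cycle_edges={("Checkout", "Payments"), ("Payments", "Checkout")},
)
print(json.dumps(elements[:1]))
# [{"data": {"id": "Checkout", "label": "Checkout"}, "classes": ""}]
```

The resulting JSON can be embedded in the generated HTML page and passed to Cytoscape.js as its `elements` option.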

Phase 5: Reporting (3-4 hours)

Generate Markdown report
Include recommendations
Add historical comparison
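
Report generation is mostly string formatting. As a small sketch, the helper below renders the bottleneck section as a Markdown table in the same style as the example report; the function name and the `top_n` default are assumptions.

```python
def bottleneck_table(bottlenecks: list[tuple[str, int]], top_n: int = 5) -> str:
    """Render the top_n bottlenecks as a Markdown table, highest in-degree first."""
    rows = sorted(bottlenecks, key=lambda item: -item[1])[:top_n]
    lines = ["| Team | In-Degree |", "|------|-----------|"]
    lines += [f"| {team} | {degree} |" for team, degree in rows]
    return "\n".join(lines)

table = bottleneck_table([("Identity", 38), ("Platform", 42), ("Data", 5)], top_n=2)
print(table)
# | Team | In-Degree |
# |------|-----------|
# | Platform | 42 |
# | Identity | 38 |
```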

5.11 Key Implementation Decisions

Decision	Option A	Option B	Recommendation
Data source	Tracing only	Tracing + code scan	Tracing (runtime reality)
Visualization	Static PNG	Interactive HTML	Interactive (more useful)
Team aggregation	Always	Optional detail view	Both (zoom in/out)
Cycle highlighting	Text list	Graph animation	Both

6. Testing Strategy

Unit Tests

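import networkx as nx

from src.analysis import find_cycles, find_bottlenecks  # Assumes the layout in Section 5.2

def test_cycle_detection():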
    G = nx.DiGraph()
    G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'A')])
    cycles = find_cycles(G)
    assert len(cycles) == 1
    assert set(cycles[0]) == {'A', 'B', 'C'}

def test_bottleneck_detection():
    G = nx.DiGraph()
    G.add_edges_from([
        ('A', 'Z'), ('B', 'Z'), ('C', 'Z'), ('D', 'Z')
    ])
    bottlenecks = find_bottlenecks(G, threshold=2)
    assert bottlenecks[0][0] == 'Z'
    assert bottlenecks[0][1] == 4

Integration Tests

Load real Jaeger data
Verify graph matches expected structure
Verify visualization renders without errors

Manual Validation

Show graph to someone who knows the org
Ask: “Does this match reality?”

7. Common Pitfalls & Debugging

Problem	Symptom	Root Cause	Fix
Too many edges	Graph is unreadable	Showing service-level	Aggregate to teams
Missing services	Some deps not showing	Tracing gaps	Supplement with code scan
Wrong team mapping	Services assigned wrong	Stale service-to-team	Update mapping from catalog
Slow visualization	Browser hangs	Too many nodes/edges	Filter to top N deps
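
For the "filter to top N deps" fix, a minimal sketch using the standard library is shown below. It assumes edges are stored as a dictionary of `(source, target)` to call count, which is an illustrative choice rather than a required format.

```python
import heapq

def top_dependencies(edges: dict, n: int) -> list:
    """Return the n heaviest (source, target, calls) edges, heaviest first."""
    heaviest = heapq.nlargest(n, edges.items(), key=lambda item: item[1])
    return [(src, tgt, calls) for (src, tgt), calls in heaviest]

# Hypothetical team-level edges with call counts.
edges = {
    ("Checkout", "Payments"): 9000,
    ("Checkout", "Platform"): 40,
    ("Orders", "Inventory"): 5000,
    ("Cart", "Checkout"): 7000,
}
top_two = top_dependencies(edges, 2)
print(top_two)  # [('Checkout', 'Payments', 9000), ('Cart', 'Checkout', 7000)]
```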

8. Extensions & Challenges

Extension 1: Time-Series Analysis

Track dependencies over 4 quarters. Is the graph getting simpler or more complex?
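
A simple starting point is to store the set of team-level edges for each period and diff consecutive snapshots. The sketch below assumes each snapshot is a set of `(source, target)` tuples; the quarter data is made up.

```python
def diff_snapshots(previous: set, current: set) -> dict:
    """Compare two sets of team edges and summarize what changed."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
        "net_change": len(current) - len(previous),
    }

# Made-up snapshots for two consecutive quarters.
q1 = {("Checkout", "Payments"), ("Checkout", "Platform"), ("Orders", "Platform")}
q2 = {("Checkout", "Payments"), ("Orders", "Platform"),
      ("Orders", "Inventory"), ("Inventory", "Orders")}

change = diff_snapshots(q1, q2)
print(change["added"])       # [('Inventory', 'Orders'), ('Orders', 'Inventory')]
print(change["removed"])     # [('Checkout', 'Platform')]
print(change["net_change"])  # 1
```

A positive net change, especially one that introduces a two-way edge as in this example, suggests the graph is getting more tangled rather than simpler.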

Extension 2: Cost Attribution

Calculate the “cost” of each dependency (based on wait times or incidents).

Extension 3: Simulation

“What if we moved Service X from Team A to Team B?” Show the new graph.
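
The simulation can be sketched as re-running the aggregation with a modified ownership map and comparing the resulting team edges. The function names and sample data below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def team_edges(service_calls: List[Tuple[str, str]],
               owners: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Collapse service-to-service calls into cross-team edges."""
    return {
        (owners[src], owners[tgt])
        for src, tgt in service_calls
        if owners[src] != owners[tgt]
    }

def simulate_move(service_calls: List[Tuple[str, str]], owners: Dict[str, str],
                  service: str, new_team: str) -> Set[Tuple[str, str]]:
    """Return the team edges that would exist if `service` belonged to `new_team`."""
    proposed = {**owners, service: new_team}  # Copy; the real mapping is untouched
    return team_edges(service_calls, proposed)

# Hypothetical services and owners.
CALLS = [
    ("checkout-api", "promo-engine"),
    ("promo-engine", "checkout-api"),
    ("checkout-api", "payment-gateway"),
]
OWNERS = {
    "checkout-api": "Checkout",
    "promo-engine": "Promotions",
    "payment-gateway": "Payments",
}

before = team_edges(CALLS, OWNERS)
after = simulate_move(CALLS, OWNERS, "promo-engine", "Checkout")
print(len(before), len(after))  # 3 1
print(sorted(after))            # [('Checkout', 'Payments')]
```

In this toy case, moving the promotions service into the Checkout team removes the two-way dependency between the teams entirely.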

Extension 4: Integration with Catalog

Pull service-to-team mapping from Backstage/service catalog.

9. Real-World Connections

Tools:

Jaeger - Distributed tracing
Zipkin - Distributed tracing
Kiali - Service mesh visualization
Backstage - Service catalog

How Big Tech Does This:

Uber: Built internal dependency mapping tools
Netflix: Service mesh with dependency visualization
Google: Automatic dependency tracking across all services

10. Resources

NetworkX

Visualization

PyVis - Python library for network visualization
D3.js Force Graph
Cytoscape.js

Related Projects

P01: Team Interaction Audit - Communication patterns
P11: Service Catalog - Service metadata

11. Self-Assessment Checklist

Before considering this project complete, verify:

I can explain Conway’s Law and fracture planes
Data extraction works for at least one source
Graph correctly aggregates services to teams
Cycle detection finds all cycles
Bottleneck detection identifies high in-degree nodes
Visualization is interactive and readable
Report includes actionable recommendations
At least one stakeholder has validated the graph

12. Submission / Completion Criteria

This project is complete when you have:

Data extraction from Jaeger/Zipkin or manual input
Service-to-team mapping for all services
NetworkX graph with calculated metrics
Interactive visualization (HTML)
Markdown report with:
- All cycles identified
- Top 5 bottleneck teams
- Fracture plane recommendations
Validation from someone who knows the organization
