Project 8: The Dependency Spaghetti Visualizer

Build a tool that extracts dependency data and visualizes team-to-team dependencies, identifying circular dependencies and bottleneck teams.

Quick Reference

Attribute              Value
Difficulty             Advanced
Time Estimate          2 Weeks (25-35 hours)
Primary Language       Python (NetworkX) / Graphviz
Alternative Languages  JavaScript (Cytoscape.js), Neo4j
Prerequisites          Graph theory basics, distributed tracing concepts
Key Topics             Conway’s Law, Fracture Planes, System Architecture

1. Learning Objectives

By completing this project, you will:

  1. Extract dependency data from distributed tracing or code analysis
  2. Build a dependency graph representing team-to-team relationships
  3. Identify architectural problems: circular dependencies, bottlenecks
  4. Visualize the “spaghetti” in an actionable way
  5. Recommend fracture planes for organizational redesign

2. Theoretical Foundation

2.1 Core Concepts

The Distributed Monolith Problem

EXPECTED (Microservices)          REALITY (Distributed Monolith)

   ┌────┐     ┌────┐                ┌────┐───────┌────┐
   │ A  │────►│ B  │              ┌─┤ A  │◄─────►│ B  │─┐
   └────┘     └────┘              │ └────┘       └────┘ │
                                  │    ▲            ▲   │
   ┌────┐     ┌────┐              │    │            │   │
   │ C  │────►│ D  │              │    ▼            ▼   │
   └────┘     └────┘              │ ┌────┐       ┌────┐│
                                  └►│ C  │◄─────►│ D  ├┘
   Clean interfaces                 └────┘       └────┘
   Independent deploys              Circular deps
                                    Can't deploy alone

Conway’s Law in Reverse

If your system looks like spaghetti, your organization probably looks like spaghetti too.

CODE DEPENDENCY                    TEAM DEPENDENCY
┌──────────────────────────┐      ┌──────────────────────────┐
│ checkout-service         │      │ Checkout Team            │
│   → payment-service      │ ==>  │   → Talks to Payments    │
│   → inventory-service    │      │   → Talks to Inventory   │
│   → user-service         │      │   → Talks to Identity    │
└──────────────────────────┘      └──────────────────────────┘

If checkout calls five services owned by five different teams, the Checkout team has to coordinate with five teams.

Fracture Planes

Natural boundaries where a system can be split with minimal communication:

Fracture Plane Type  Example
Business Domain      Payments vs. Shipping
Data Isolation       Each service owns its data
Technology           Python services vs. Go services
Regulation           PCI (payments) vs. non-PCI
Performance          Real-time vs. batch
Change Cadence       Fast-changing vs. stable

Graph Metrics for Organizations

Metric      Definition                        What It Reveals
In-Degree   How many teams depend on you      Bottleneck risk
Out-Degree  How many teams you depend on      Coordination overhead
Cycles      Circular dependencies             Architecture smell
Centrality  How “central” a team is           Critical path risk
Clusters    Groups that only talk internally  Potential merge candidates
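
As a rough illustration of the degree and centrality metrics, the sketch below counts in-degree and out-degree from a plain edge list and normalizes in-degree by the number of other teams. The team names and edges are made up for the example. With NetworkX, `G.in_degree`, `G.out_degree`, and `nx.in_degree_centrality` should give the same numbers.

```python
from collections import Counter

# Hypothetical team-level edges: (dependent team, team it depends on).
edges = [
    ("checkout", "platform"),
    ("payments", "platform"),
    ("cart", "platform"),
    ("checkout", "payments"),
    ("cart", "checkout"),
]

teams = sorted({team for edge in edges for team in edge})

# In-degree: how many teams depend on you. Out-degree: how many you depend on.
in_degree = Counter(target for _, target in edges)
out_degree = Counter(source for source, _ in edges)

# In-degree centrality: in-degree divided by the number of other teams.
centrality = {t: in_degree[t] / (len(teams) - 1) for t in teams}

for team in teams:
    print(f"{team:10} in={in_degree[team]} out={out_degree[team]} "
          f"centrality={centrality[team]:.2f}")
```

In this toy graph every other team depends on `platform`, so its centrality is 1.0, which is the pattern the bottleneck metric is meant to surface.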

2.2 Why This Matters

Dependencies are the “Enemy of Flow.”

DEPLOY PROCESS: Feature in Checkout

Without Clear Boundaries:
Checkout ─┬─► Payment Team (wait 2 days for approval)
          ├─► DB Team (wait 1 day for schema change)
          ├─► Security (wait 3 days for review)
          └─► Platform (wait 1 day for config)

Total: 7 days of waiting, 4 hours of work

With Clear Boundaries:
Checkout ──► Self-service deploy

Total: 4 hours of work, deployed same day
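
The arithmetic behind this comparison can be made explicit as flow efficiency, the share of lead time spent on actual work. The sketch below assumes the four waits happen one after another and that a "day" means an 8-hour working day. Both are assumptions for illustration, not something the example states.

```python
# Wait times (in days) from the example above, assumed to be sequential.
waits_days = {"Payment Team": 2, "DB Team": 1, "Security": 3, "Platform": 1}
work_hours = 4
HOURS_PER_DAY = 8  # assumption: a "day" is one 8-hour working day

total_wait_days = sum(waits_days.values())
wait_hours = total_wait_days * HOURS_PER_DAY
lead_time_hours = wait_hours + work_hours

# Flow efficiency = time spent working / total elapsed lead time.
flow_efficiency = work_hours / lead_time_hours

print(f"Waiting: {total_wait_days} days ({wait_hours} h)")
print(f"Flow efficiency: {flow_efficiency:.1%}")
```

Under these assumptions only about one hour in fifteen goes into the feature itself. The rest is queueing on other teams.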

2.3 Historical Context

  • Microservices Architecture (2011+): Promise of independent deployment
  • Distributed Tracing (2010s): Jaeger, Zipkin made dependencies visible
  • Team Topologies (2019): Formalized fracture planes concept

2.4 Common Misconceptions

Misconception                      Reality
“More services = more agility”     More services without ownership = more coordination
“APIs solve coupling”              You can have tight coupling over APIs
“Dependencies are just technical”  Technical deps create organizational deps
“We can refactor later”            Organizational dependencies are harder to change than code

3. Project Specification

3.1 What You Will Build

  1. Dependency Extractor: Pull data from tracing, code, or surveys
  2. Graph Builder: Create team-to-team dependency graph
  3. Analyzer: Calculate metrics (cycles, bottlenecks, clusters)
  4. Visualizer: Interactive graph visualization
  5. Reporter: Recommendations for organizational change

3.2 Functional Requirements

  1. Data Sources
    • Distributed tracing (Jaeger/Zipkin spans)
    • Code analysis (import statements, API calls)
    • Service catalog (declared dependencies)
    • Manual input (for non-code deps)
  2. Graph Operations
    • Build directed graph of team dependencies
    • Calculate graph metrics
    • Detect cycles
    • Identify clusters
  3. Visualization
    • Interactive 2D/3D graph
    • Click to see dependency details
    • Filter by team, criticality, type
    • Highlight problems (cycles, bottlenecks)
  4. Reporting
    • Top 5 bottleneck teams
    • All circular dependencies
    • Recommended fracture planes
    • Comparison over time

3.3 Non-Functional Requirements

  • Handle 50+ teams, 200+ services
  • Visualization must be interactive (not just static image)
  • Analysis must complete in < 30 seconds
  • Support for incremental updates

3.4 Example Usage / Output

Dependency Report:

# Dependency Analysis Report - Acme Corp

Generated: 2025-01-15

## Executive Summary

- **Teams Analyzed**: 12
- **Services Analyzed**: 45
- **Dependencies Found**: 128
- **Circular Dependencies**: 3 (HIGH RISK)
- **Bottleneck Teams**: 2

---

## Critical Findings

### 1. Circular Dependencies (MUST FIX)

| Cycle | Services Involved | Impact |
|-------|-------------------|--------|
| Cycle 1 | checkout → payments → promotions → checkout | Deployment deadlock |
| Cycle 2 | user → auth → session → user | Data consistency issues |
| Cycle 3 | order → inventory → shipping → order | Cannot test in isolation |

**Recommendation**: Break Cycle 1 by introducing event-driven communication between
promotions and checkout instead of synchronous API calls.

---

### 2. Bottleneck Teams (HIGH RISK)

| Team | In-Degree (services) | Impact |
|------|----------------------|--------|
| Platform | 42 | Every team depends on Platform |
| Identity | 38 | Every user-facing service calls Identity |

**Recommendation**:
- Platform should expose more self-service capabilities
- Identity should be split into Auth (fast path) and Profile (heavy reads)

---

### 3. Dependency Clusters

Teams that only talk to each other (potential merge candidates):

| Cluster | Teams | Internal Deps | External Deps |
|---------|-------|---------------|---------------|
| Commerce | Checkout, Cart, Payments | 15 | 3 |
| Logistics | Shipping, Inventory, Warehouse | 12 | 2 |

**Recommendation**: Commerce cluster could be one "Commerce Domain" team
with internal service ownership.

---

## Visualization

[Interactive Graph: http://deps.acme.internal/graph]

CLI Output:

$ ./deps analyze --source jaeger --days 7

Loading trace data... 45,230 spans loaded
Mapping services to teams... 45 services, 12 teams
Building dependency graph...

=== ANALYSIS RESULTS ===

[GRAPH STATS]
Nodes (Teams): 12
Edges (Dependencies): 128
Average In-Degree: 10.7
Average Out-Degree: 10.7

[CYCLES DETECTED: 3]
  ⚠️  checkout → payments → promotions → checkout
  ⚠️  user → auth → session → user
  ⚠️  order → inventory → shipping → order

[BOTTLENECK TEAMS]
  🔴 Platform: 42 services depend on this (CRITICAL)
  🟡 Identity: 38 services depend on this (HIGH)
  🟢 Data: 5 services depend on this (NORMAL)

[CLUSTER ANALYSIS]
  📦 Commerce Cluster: Checkout, Cart, Payments
     Internal density: 0.87 (very tight)
     External deps: 3
     Recommendation: Consider merging into one domain team

[FRACTURE PLANE SUGGESTIONS]
  1. Commerce vs. Logistics (clean domain split)
  2. Sync vs. Async services (communication style)
  3. PCI vs. non-PCI (regulatory boundary)

Generating visualization...
Graph saved to: output/dependency-graph.html
Report saved to: output/dependency-report.md

Interactive Graph (Conceptual):

                         ┌─────────────────────┐
                         │     PLATFORM        │
                         │   (BOTTLENECK)      │
                         │   In-Degree: 42     │
                         └─────────┬───────────┘
                                   │
     ┌──────────────┬──────────────┼──────────────┬──────────────┐
     │              │              │              │              │
     ▼              ▼              ▼              ▼              ▼
┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
│Checkout │◄─►│ Payment │◄─►│  Cart   │   │ Orders  │   │Identity │
│         │   │         │   │         │   │         │   │(Bottlen)│
└────┬────┘   └────┬────┘   └─────────┘   └─────────┘   └─────────┘
     │              │
     └──────────────┴───► CYCLE DETECTED!
              │
              ▼
        ┌──────────┐
        │Promotions│
        │          │
        └──────────┘

3.5 Real World Outcome

After running this analysis:

  • Identify 2-3 circular dependencies to break
  • Find the 1-2 teams that are organizational bottlenecks
  • Propose team restructuring based on natural boundaries
  • Track dependency changes over time (is it getting better or worse?)

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                  DEPENDENCY VISUALIZATION SYSTEM                │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│  DATA SOURCES │     │  GRAPH BUILD  │     │  ANALYSIS     │
│               │     │               │     │               │
│ - Jaeger      │────►│ - NetworkX    │────►│ - Cycles      │
│ - Zipkin      │     │ - Team map    │     │ - Centrality  │
│ - Code scan   │     │ - Aggregation │     │ - Clusters    │
│ - Manual      │     │               │     │               │
└───────────────┘     └───────────────┘     └───────────────┘
                              │                     │
                              ▼                     ▼
                    ┌─────────────────────────────────────────┐
                    │              VISUALIZATION              │
                    │                                         │
                    │  - D3.js / Cytoscape.js                │
                    │  - Interactive zoom/filter              │
                    │  - Color-coded by risk                 │
                    └─────────────────────────────────────────┘

4.2 Key Components

  1. Data Extractors: Plugins for different data sources
  2. Service-to-Team Mapper: Links services to owning teams
  3. Graph Builder: NetworkX-based graph construction
  4. Analyzer: Graph algorithms for metrics
  5. Visualizer: Interactive HTML/JavaScript output
  6. Reporter: Markdown report generator

4.3 Data Structures

# models.py
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Service:
    id: str
    name: str
    owner_team: str
    tier: int  # 1=critical, 2=important, 3=normal

@dataclass
class Dependency:
    source_service: str
    target_service: str
    call_count: int  # From tracing
    dependency_type: str  # runtime, build, data

@dataclass
class TeamDependency:
    source_team: str
    target_team: str
    services_involved: List[tuple]  # [(src_svc, tgt_svc), ...]
    total_calls: int
    is_bidirectional: bool

@dataclass
class AnalysisResult:
    teams: List[str]
    dependencies: List[TeamDependency]
    cycles: List[List[str]]  # Each cycle is list of teams
    bottlenecks: List[tuple]  # [(team, in_degree), ...]
    clusters: List[Set[str]]  # Groups of highly connected teams

# service-to-team.yaml
services:
  checkout-api:
    team: team-checkout
    tier: 1
  payment-gateway:
    team: team-payments
    tier: 1
  inventory-service:
    team: team-inventory
    tier: 2
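
One way the `is_bidirectional` field could be filled in is to check whether the reverse team pair also appears among the aggregated edges. The sketch below repeats a trimmed-down `TeamDependency` so that it runs on its own. The helper name `mark_bidirectional` and the sample call counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TeamDependency:
    # Trimmed copy of the dataclass above, kept small for the example.
    source_team: str
    target_team: str
    total_calls: int
    is_bidirectional: bool = False


def mark_bidirectional(calls: Dict[Tuple[str, str], int]) -> List[TeamDependency]:
    """Build TeamDependency records, flagging pairs that call each other."""
    return [
        TeamDependency(src, tgt, count, is_bidirectional=(tgt, src) in calls)
        for (src, tgt), count in sorted(calls.items())
    ]


# Hypothetical aggregated call counts keyed by (source_team, target_team).
calls = {
    ("team-checkout", "team-payments"): 1200,
    ("team-payments", "team-checkout"): 300,
    ("team-checkout", "team-inventory"): 800,
}

deps = mark_bidirectional(calls)
for dep in deps:
    print(dep)
```

A bidirectional pair is the smallest possible cycle, so flagging it at build time gives the visualizer something to highlight before any graph algorithm has run.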

4.4 Algorithm Overview

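from collections import defaultdict
from typing import List, Set

import networkx as nx

from models import Dependency  # dataclass from models.py above; adjust the path to your layout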

def build_team_graph(services: dict, dependencies: List[Dependency]) -> nx.DiGraph:
    G = nx.DiGraph()

    # Add all teams as nodes
    teams = set(s['team'] for s in services.values())
    G.add_nodes_from(teams)

    # Aggregate service deps to team deps
    team_edges = defaultdict(lambda: {'count': 0, 'services': []})

    for dep in dependencies:
        src_team = services[dep.source_service]['team']
        tgt_team = services[dep.target_service]['team']

        if src_team != tgt_team:  # Skip intra-team
            key = (src_team, tgt_team)
            team_edges[key]['count'] += dep.call_count
            team_edges[key]['services'].append(
                (dep.source_service, dep.target_service)
            )

    # Add edges
    for (src, tgt), data in team_edges.items():
        G.add_edge(src, tgt, weight=data['count'], services=data['services'])

    return G

def find_cycles(G: nx.DiGraph) -> List[List[str]]:
    return list(nx.simple_cycles(G))

def find_bottlenecks(G: nx.DiGraph, threshold: int = 10) -> List[tuple]:
    in_degrees = [(node, G.in_degree(node)) for node in G.nodes()]
    bottlenecks = [(n, d) for n, d in in_degrees if d > threshold]
    return sorted(bottlenecks, key=lambda x: -x[1])

def find_clusters(G: nx.DiGraph) -> List[Set[str]]:
    # Convert to undirected for community detection
    G_undirected = G.to_undirected()
    from networkx.algorithms.community import louvain_communities
    return list(louvain_communities(G_undirected))
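
On a densely connected graph the number of simple cycles can grow very quickly, so listing every cycle may become slow or unreadable. A possible complement, sketched below, is to group teams into strongly connected components: every team in a component of size two or more sits on at least one cycle. The helper name `cyclic_groups` and the sample edges are made up. `nx.strongly_connected_components` is a standard NetworkX function.

```python
from typing import List, Set

import networkx as nx


def cyclic_groups(G: nx.DiGraph) -> List[Set[str]]:
    """Return groups of mutually reachable teams (components of size >= 2)."""
    groups = [set(c) for c in nx.strongly_connected_components(G) if len(c) > 1]
    # Largest tangles first, since they are usually the hardest to untangle.
    return sorted(groups, key=len, reverse=True)


# Hypothetical team graph: one three-team cycle, one mutual pair, one leaf.
G = nx.DiGraph()
G.add_edges_from([
    ("checkout", "payments"),
    ("payments", "promotions"),
    ("promotions", "checkout"),
    ("user", "auth"),
    ("auth", "user"),
    ("checkout", "platform"),
])

groups = cyclic_groups(G)
print(groups)
```

Each group can be reported as a single "tangle", with `nx.simple_cycles` run only inside a group when the individual cycles are needed.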

5. Implementation Guide

5.1 Development Environment Setup

# Create project
mkdir dependency-visualizer && cd dependency-visualizer
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install networkx matplotlib pyvis pyyaml requests click

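# Optional: Jaeger client/exporter libraries. These emit spans rather than read
# them; stored traces are read over HTTP from the Jaeger query service, which
# the requests package installed above already covers.
pip install jaeger-client opentelemetry-exporter-jaeger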

5.2 Project Structure

dependency-visualizer/
├── data/
│   ├── service-to-team.yaml
│   └── sample-traces.json
├── src/
│   ├── __init__.py
│   ├── extractors/
│   │   ├── jaeger.py
│   │   ├── zipkin.py
│   │   └── manual.py
│   ├── graph.py          # Graph building
│   ├── analysis.py       # Metrics calculation
│   ├── visualize.py      # HTML/JS output
│   └── report.py         # Markdown generation
├── output/
│   ├── graph.html
│   └── report.md
├── cli.py                # Command-line interface
└── tests/
    └── test_analysis.py

5.3 The Core Question You’re Answering

“Who is stopping us from moving faster, and is it a technical problem or an organizational one?”

Much “Technical Debt” is really “Organizational Debt”: messy systems mirror messy team structures. Visualizing the spaghetti is the first step to untangling the architecture.

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Conway’s Law (The Inverse)
    • If the code is spaghetti, the team structure is likely spaghetti.
    • Book Reference: “Team Topologies” Ch. 2
  2. Fracture Planes
    • What are the natural ways to split a large system?
    • Book Reference: “Team Topologies” Ch. 6
  3. Graph Theory Basics
    • Directed vs. undirected graphs
    • Cycles, centrality, clustering
    • Reference: NetworkX documentation

5.5 Questions to Guide Your Design

Before implementing, think through these:

Types of Dependencies

  • Is it a “Design” dependency (I need their approval)?
  • Is it a “Runtime” dependency (My service calls their API)?
  • Is it a “Data” dependency (We share a database)?

Visual Encoding

  • How do you show the “strength” of a dependency (line thickness)?
  • How do you show direction (arrows)?
  • How do you highlight problems (color, animation)?

Scope

  • Do you include non-code dependencies (tickets, approvals)?
  • Do you show individual services or aggregate to teams?

5.6 Thinking Exercise

The “Feature Trace”

Pick a feature that was recently launched.

Questions:

  1. How many teams had to touch the code?
  2. How many teams had to “approve” PRs?
  3. How many teams had to be in a deployment meeting?
  4. If > 3 teams for any of these, why?

Write down:

  • The feature name
  • Every team involved and their role
  • Which dependencies were necessary vs. artificial

5.7 Hints in Layers

Hint 1: Use Distributed Tracing Data

Jaeger/Zipkin already have service-to-service call data. Map services to teams.

Hint 2: Simplify the Nodes

Don’t visualize 200 services. Aggregate to 10 teams. If Team A’s services call Team B’s services 10,000 times/day, that’s one thick line.

Hint 3: Look for Boundary Crossing

Are there teams that only talk to each other? They should probably be one team.

Hint 4: Use NetworkX (Python)

Calculate centrality and detect cycles with a few lines:

cycles = list(nx.simple_cycles(G))
centrality = nx.in_degree_centrality(G)

5.8 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How do you identify a ‘bottleneck team’?”
    • High in-degree (many teams depend on them), long wait times for their services
  2. “What is a ‘Circular Dependency’ in an organizational context?”
    • Team A waits on Team B waits on Team C waits on Team A. Deadlock.
  3. “How can you use Conway’s Law to fix architecture?”
    • Restructure teams to match desired architecture. The code will follow.
  4. “Explain the concept of ‘Fracture Planes’.”
    • Natural boundaries where systems can be split with minimal coordination.
  5. “When should you merge two teams into one?”
    • When they’re in constant collaboration with no clear interface.

5.9 Books That Will Help

Topic                      Book                                     Chapter
Fracture Planes            “Team Topologies” by Skelton & Pais      Ch. 6: Choose Team-First Boundaries
Evolutionary Architecture  “Building Evolutionary Architectures”    Ch. 3: Engineering Incremental Change
System Design              “Designing Data-Intensive Applications”  Ch. 4 (on coupling)

5.10 Implementation Phases

Phase 1: Data Extraction (5-7 hours)

  1. Build Jaeger/Zipkin data extractor
  2. Parse spans to service-to-service calls
  3. Load service-to-team mapping
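
Step 2 of this phase might look like the sketch below. It assumes the JSON layout that the Jaeger query service returns for a trace: a `spans` list where each span carries `spanID`, `processID`, and `references`, plus a `processes` map with `serviceName`. Verify that against your own export before relying on it. The function name `service_calls` and the sample trace are hypothetical.

```python
from collections import Counter
from typing import Dict


def service_calls(trace: Dict) -> Counter:
    """Count cross-service parent -> child calls within one trace."""
    processes = trace["processes"]
    # Map each span ID to the service that emitted it.
    span_service = {
        span["spanID"]: processes[span["processID"]]["serviceName"]
        for span in trace["spans"]
    }
    calls = Counter()
    for span in trace["spans"]:
        callee = span_service[span["spanID"]]
        for ref in span.get("references", []):
            caller = span_service.get(ref["spanID"])
            # Skip unknown parents and calls that stay inside one service.
            if ref.get("refType") == "CHILD_OF" and caller and caller != callee:
                calls[(caller, callee)] += 1
    return calls


# Hypothetical trace: checkout-api handles a request and calls payment-gateway.
trace = {
    "processes": {
        "p1": {"serviceName": "checkout-api"},
        "p2": {"serviceName": "payment-gateway"},
    },
    "spans": [
        {"spanID": "a", "processID": "p1", "references": []},
        {"spanID": "b", "processID": "p1",
         "references": [{"refType": "CHILD_OF", "spanID": "a"}]},
        {"spanID": "c", "processID": "p2",
         "references": [{"refType": "CHILD_OF", "spanID": "b"}]},
    ],
}

calls = service_calls(trace)
print(calls)
```

Summing these counters across all traces in the time window gives the `call_count` values for the service-level `Dependency` records.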

Phase 2: Graph Building (4-5 hours)

  1. Aggregate service deps to team deps
  2. Build NetworkX graph
  3. Calculate basic metrics

Phase 3: Analysis (4-5 hours)

  1. Implement cycle detection
  2. Implement bottleneck detection
  3. Implement cluster detection

Phase 4: Visualization (5-7 hours)

  1. Generate interactive HTML using PyVis or D3.js
  2. Add color coding for risk levels
  3. Add click-through to details
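
As a static starting point before wiring up PyVis or D3.js, the visual encoding can be prototyped as Graphviz DOT text: line thickness for call volume, red for edges that sit on a cycle. The function name `to_dot`, the width scaling rule, and the sample data are assumptions for illustration.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def to_dot(weights: Dict[Edge, int], cycle_edges: Set[Edge]) -> str:
    """Render weighted team edges as a Graphviz DOT digraph."""
    heaviest = max(weights.values(), default=1)
    lines = ["digraph deps {", "  rankdir=LR;"]
    for (src, tgt), calls in sorted(weights.items()):
        # Scale pen width into the range 1..5 relative to the busiest edge.
        width = 1 + 4 * calls / heaviest
        color = "red" if (src, tgt) in cycle_edges else "gray40"
        lines.append(
            f'  "{src}" -> "{tgt}" '
            f'[penwidth={width:.1f}, color="{color}", label="{calls}"];'
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical daily call counts between teams.
weights = {
    ("checkout", "payments"): 10000,
    ("payments", "checkout"): 2500,
    ("checkout", "platform"): 500,
}
cycle_edges = {("checkout", "payments"), ("payments", "checkout")}

dot = to_dot(weights, cycle_edges)
print(dot)
```

Saved to a file, the output can be rendered with the Graphviz `dot` command, for example `dot -Tsvg deps.dot -o deps.svg`. The same width and color rules carry over to an interactive renderer later.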

Phase 5: Reporting (3-4 hours)

  1. Generate Markdown report
  2. Include recommendations
  3. Add historical comparison

5.11 Key Implementation Decisions

Decision            Option A      Option B              Recommendation
Data source         Tracing only  Tracing + code scan   Tracing (runtime reality)
Visualization       Static PNG    Interactive HTML      Interactive (more useful)
Team aggregation    Always        Optional detail view  Both (zoom in/out)
Cycle highlighting  Text list     Graph animation       Both

6. Testing Strategy

Unit Tests

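import networkx as nx

from src.analysis import find_cycles, find_bottlenecks  # adjust to your layout

def test_cycle_detection():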
    G = nx.DiGraph()
    G.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'A')])
    cycles = find_cycles(G)
    assert len(cycles) == 1
    assert set(cycles[0]) == {'A', 'B', 'C'}

def test_bottleneck_detection():
    G = nx.DiGraph()
    G.add_edges_from([
        ('A', 'Z'), ('B', 'Z'), ('C', 'Z'), ('D', 'Z')
    ])
    bottlenecks = find_bottlenecks(G, threshold=2)
    assert bottlenecks[0][0] == 'Z'
    assert bottlenecks[0][1] == 4

Integration Tests

  • Load real Jaeger data
  • Verify graph matches expected structure
  • Verify visualization renders without errors

Manual Validation

  • Show graph to someone who knows the org
  • Ask: “Does this match reality?”

7. Common Pitfalls & Debugging

Problem             Symptom                               Root Cause                     Fix
Too many edges      Graph is unreadable                   Showing service-level edges    Aggregate to teams
Missing services    Some deps not showing                 Tracing gaps                   Supplement with code scan
Wrong team mapping  Services assigned to the wrong team   Stale service-to-team mapping  Update mapping from catalog
Slow visualization  Browser hangs                         Too many nodes/edges           Filter to top N deps
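
The “Filter to top N deps” fix could be as small as the sketch below, which keeps only the heaviest edges by call count before rendering. The function name `top_n_edges` and the sample weights are hypothetical.

```python
import heapq
from typing import Dict, Tuple

Edge = Tuple[str, str]


def top_n_edges(weights: Dict[Edge, int], n: int) -> Dict[Edge, int]:
    """Keep only the n heaviest edges by call count."""
    return dict(heapq.nlargest(n, weights.items(), key=lambda item: item[1]))


# Hypothetical call counts between teams.
weights = {
    ("a", "b"): 50,
    ("a", "c"): 5,
    ("b", "c"): 500,
    ("c", "d"): 1,
}

kept = top_n_edges(weights, 2)
print(kept)
```

Filtering by weight can hide a low-volume edge that closes a cycle, so it is probably safer to run cycle detection on the full graph and apply the filter only to what gets drawn.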

8. Extensions & Challenges

Extension 1: Time-Series Analysis

Track dependencies over 4 quarters. Is the graph getting simpler or more complex?
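
A minimal way to start is to store each quarter's team edges as a set and diff consecutive snapshots. The function name `diff_snapshots` and the quarterly data below are made up for illustration.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def diff_snapshots(before: Set[Edge], after: Set[Edge]) -> Dict[str, Set[Edge]]:
    """Compare two team-edge snapshots taken at different times."""
    return {"added": after - before, "removed": before - after}


# Hypothetical quarterly snapshots of (source_team, target_team) edges.
q1 = {("checkout", "payments"), ("payments", "checkout"), ("cart", "platform")}
q2 = {("checkout", "payments"), ("cart", "platform"), ("orders", "platform")}

change = diff_snapshots(q1, q2)
net = len(q2) - len(q1)

print(f"added={sorted(change['added'])} "
      f"removed={sorted(change['removed'])} net={net:+d}")
```

Here the edge count is unchanged even though a mutual dependency between checkout and payments was removed, which suggests tracking cycle count alongside edge count.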

Extension 2: Cost Attribution

Calculate the “cost” of each dependency (based on wait times or incidents).

Extension 3: Simulation

“What if we moved Service X from Team A to Team B?” Show the new graph.
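
The simulation can be approximated by re-running the service-to-team aggregation with a modified ownership map. The sketch below uses plain sets instead of NetworkX. The helper name `team_edges`, the service names, and the call list are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def team_edges(owner: Dict[str, str],
               calls: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Aggregate service-to-service calls into cross-team edges."""
    edges = set()
    for src, tgt in calls:
        if owner[src] != owner[tgt]:  # calls inside one team are not edges
            edges.add((owner[src], owner[tgt]))
    return edges


# Hypothetical ownership map and observed service calls.
owner = {
    "checkout-api": "team-checkout",
    "promo-engine": "team-promotions",
    "payment-gateway": "team-payments",
}
calls = [
    ("checkout-api", "promo-engine"),
    ("promo-engine", "checkout-api"),
    ("checkout-api", "payment-gateway"),
]

current = team_edges(owner, calls)

# What if promo-engine moved to team-checkout?
proposed_owner = {**owner, "promo-engine": "team-checkout"}
proposed = team_edges(proposed_owner, calls)

print(f"before: {len(current)} edges, after: {len(proposed)} edges")
```

In this toy case the move removes the mutual dependency between the two teams. Whether that is a good trade depends on the load it adds to the receiving team, which the graph alone does not show.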

Extension 4: Integration with Catalog

Pull service-to-team mapping from Backstage/service catalog.


9. Real-World Connections

How Big Tech Does This:

  • Uber: Built internal dependency mapping tools
  • Netflix: Service mesh with dependency visualization
  • Google: Automatic dependency tracking across all services

10. Resources

NetworkX

Visualization


11. Self-Assessment Checklist

Before considering this project complete, verify:

  • I can explain Conway’s Law and fracture planes
  • Data extraction works for at least one source
  • Graph correctly aggregates services to teams
  • Cycle detection finds all cycles
  • Bottleneck detection identifies high in-degree nodes
  • Visualization is interactive and readable
  • Report includes actionable recommendations
  • At least one stakeholder has validated the graph

12. Submission / Completion Criteria

This project is complete when you have:

  1. Data extraction from Jaeger/Zipkin or manual input
  2. Service-to-team mapping for all services
  3. NetworkX graph with calculated metrics
  4. Interactive visualization (HTML)
  5. Markdown report with:
    • All cycles identified
    • Top 3 bottleneck teams
    • Fracture plane recommendations
  6. Validation from someone who knows the organization
