Project 3: Ownership Boundary Mapper (RACI 2.0)

Build a schema and validation tool that maps every technical asset to exactly one owning team and an escalation path.

Quick Reference

Attribute             | Value
----------------------|--------------------------------------------------
Difficulty            | Intermediate
Time Estimate         | 1 Week (15-20 hours)
Primary Language      | YAML / JSON
Alternative Languages | Python, Go
Prerequisites         | Basic scripting, understanding of cloud resources
Key Topics            | RACI, Bounded Contexts, Code Ownership

1. Learning Objectives

By completing this project, you will:

  1. Define ownership at the asset level (repo, service, bucket, queue)
  2. Create a machine-readable ownership registry
  3. Build a validation tool that flags orphaned resources
  4. Implement the rule: “If it exists, someone owns it”
  5. Connect ownership to on-call and escalation

2. Theoretical Foundation

2.1 Core Concepts

The Ownership Problem

WITHOUT CLEAR OWNERSHIP                 WITH CLEAR OWNERSHIP
┌─────────────────────────────────┐    ┌─────────────────────────────────┐
│ "Whose service is this?"        │    │ owner-check service-xyz         │
│ "I think Team A? Maybe B?"      │    │ => Team: Payments               │
│ "Let me Slack around..."        │    │ => On-call: @jane (PagerDuty)   │
│ [30 minutes of searching]       │    │ => Escalation: #payments-oncall │
│ "Actually, nobody knows"        │    │ [2 seconds]                     │
└─────────────────────────────────┘    └─────────────────────────────────┘

Every hour spent searching for ownership is an hour not spent fixing the problem.

The RACI Matrix Evolved

Traditional RACI defines roles:

  • Responsible: Who does the work
  • Accountable: Who is the decision-maker (only one!)
  • Consulted: Who provides input
  • Informed: Who needs to know

For operating models, we simplify:

  • Owner: The single team accountable for the asset
  • Contributors: Teams that can submit changes (PRs)
  • Consumers: Teams that depend on the asset
  • Escalation: Who to page when things break

Bounded Contexts (DDD)

Domain-Driven Design teaches that systems should be divided into “bounded contexts”—areas where a specific domain model applies. Ownership boundaries should align with these contexts.

┌──────────────────────────────────────────────────────────────┐
│                    ORDER DOMAIN                              │
│  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐ │
│  │  Order Service  │  │   Cart Service  │  │ Promo Engine │ │
│  │  Owner: Orders  │  │  Owner: Orders  │  │ Owner: Orders│ │
│  └─────────────────┘  └─────────────────┘  └──────────────┘ │
└──────────────────────────────────────────────────────────────┘
                              │
                    Shared DB (Owned by Orders)
                              │
┌──────────────────────────────────────────────────────────────┐
│                   PAYMENT DOMAIN                             │
│  ┌─────────────────┐  ┌─────────────────┐                   │
│  │ Payment Gateway │  │ Fraud Detection │                   │
│  │ Owner: Payments │  │ Owner: Payments │                   │
│  └─────────────────┘  └─────────────────┘                   │
└──────────────────────────────────────────────────────────────┘

2.2 Why This Matters

Incident ping-pong is one of the clearest symptoms of unclear ownership:

3:02 AM: Alert fires
3:05 AM: SRE pages Team A
3:15 AM: "Not our service" - Team A pages Team B
3:30 AM: "We just consume it" - Team B pages Team C
3:45 AM: "The original team left" - Team C pages... everyone?
4:00 AM: CTO joins call asking "WHO OWNS THIS?"

With an ownership registry, the 3:02 AM alert goes directly to the right team.

2.3 Historical Context

  • CODEOWNERS (GitHub, 2017): First mainstream ownership-in-code
  • Backstage (Spotify, 2020): Service catalog with ownership
  • OpsLevel (2019): Dedicated service ownership platform

2.4 Common Misconceptions

Misconception                      | Reality
-----------------------------------|-----------------------------------------------------
“Shared ownership works”           | “Shared” means “no one” when there’s a problem
“We can figure it out when needed” | You can’t figure it out at 3 AM under pressure
“Ownership = who wrote the code”   | Ownership = who maintains, operates, and evolves
“We need ownership per file”       | Start at service/repo level, go finer only if needed

3. Project Specification

3.1 What You Will Build

A CLI tool (owner-check) that:

  1. Reads a registry of teams and assets
  2. Validates that every asset has exactly one owner
  3. Flags orphaned resources and stale team references
  4. Outputs ownership information for any asset

3.2 Functional Requirements

  1. Team Registry
    • Store team metadata (ID, name, Slack, on-call link)
    • Support team lifecycle (active, deprecated, merged)
  2. Asset Registry
    • Map assets to owning teams
    • Support multiple asset types (repo, service, S3 bucket, queue)
    • Include escalation path
  3. Validation
    • Every asset must have exactly one owner
    • Owner team must exist and be active
    • Dependencies must reference valid assets
  4. Query Interface
    • owner-check <asset-id> → returns owner info
    • owner-check --team <team-id> → lists all assets
    • owner-check --orphans → lists unowned resources
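
The three query forms above could be wired up roughly as follows. This is a minimal sketch, not the required design: it uses argparse from the standard library instead of click to stay dependency-free, it assumes teams and assets are already parsed into lists of dictionaries, and the function name run and the exact message wording are illustrative assumptions.

```python
import argparse


def run(argv, teams, assets):
    """Answer one query and return the output lines (the caller prints them)."""
    parser = argparse.ArgumentParser(prog="owner-check")
    parser.add_argument("asset_id", nargs="?", help="asset to look up")
    parser.add_argument("--team", metavar="TEAM_ID", help="list a team's assets")
    parser.add_argument("--orphans", action="store_true", help="list unowned assets")
    args = parser.parse_args(argv)

    team_map = {t["id"]: t for t in teams}

    if args.orphans:
        out = []
        for a in assets:
            owner = a.get("owner")
            if not owner:
                out.append(f"[ERROR] {a['id']}: No owner defined")
            elif team_map.get(owner, {}).get("status") != "active":
                # Covers both unknown teams and deprecated ones.
                out.append(f"[ERROR] {a['id']}: Owner '{owner}' is not an active team")
        return out

    if args.team:
        return [f"  - {a['id']} ({a.get('type', 'unknown')})"
                for a in assets if a.get("owner") == args.team]

    if args.asset_id:
        asset = next((a for a in assets if a["id"] == args.asset_id), None)
        if asset is None:
            return [f"[ERROR] {args.asset_id}: not in registry"]
        team = team_map.get(asset.get("owner"), {})
        return [
            f"Asset: {asset['id']} ({asset.get('name', '')})",
            f"Owner: {asset.get('owner')} ({team.get('name', 'unknown team')})",
            f"On-call: {team.get('oncall', 'n/a')}",
        ]

    parser.error("give an asset id, --team, or --orphans")
```

An entry point script would call something like print("\n".join(run(sys.argv[1:], teams, assets))) after loading the two YAML files. Returning lines instead of printing keeps the query logic easy to unit test.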

3.3 Non-Functional Requirements

  • Schema must be version-controllable (YAML/JSON)
  • Must support 1000+ assets without performance issues
  • Should integrate with CI/CD (validate on PR)
  • Must be extensible for new asset types

3.4 Example Usage / Output

Input (teams.yaml):

teams:
  - id: team-identity
    name: Identity & Access
    slack: "#team-identity"
    oncall: https://pagerduty.com/services/identity
    status: active

  - id: team-payments
    name: Payments & Billing
    slack: "#team-payments"
    oncall: https://pagerduty.com/services/payments
    status: active

  - id: team-rocket
    name: Legacy Rocket Team
    status: deprecated
    merged_into: team-identity

Input (assets.yaml):

assets:
  - id: service-auth
    type: service
    name: Authentication Service
    owner: team-identity
    repo: github.com/company/auth-service
    dependencies:
      - asset_id: service-userdb
        type: runtime

  - id: service-payment-gateway
    type: service
    name: Payment Gateway
    owner: team-payments
    repo: github.com/company/payment-gateway

  - id: bucket-legacy-logs
    type: s3_bucket
    name: Legacy Log Archive
    owner: team-rocket  # Problem: team is deprecated!

  - id: queue-notifications
    type: sqs_queue
    name: Notification Queue
    # Problem: no owner defined!

CLI Output:

$ ./owner-check service-auth
Asset: service-auth (Authentication Service)
Type: service
Owner: team-identity (Identity & Access)
On-call: https://pagerduty.com/services/identity
Slack: #team-identity
Repo: github.com/company/auth-service
Dependencies:
  - service-userdb (runtime)

$ ./owner-check --orphans
[ERROR] queue-notifications: No owner defined
[ERROR] bucket-legacy-logs: Owner 'team-rocket' is deprecated

$ ./owner-check --team team-payments
Assets owned by team-payments (Payments & Billing):
  - service-payment-gateway (service)

$ ./owner-check --validate
Validating ownership registry...
[OK] 2 teams active
[OK] 2 assets fully defined
[ERROR] bucket-legacy-logs: owner team-rocket is deprecated (merged into team-identity)
[ERROR] queue-notifications: missing owner field

Validation FAILED: 2 errors found

3.5 Real World Outcome

After implementing this tool:

  • Incident response time decreases (right team paged immediately)
  • Orphaned resources are discovered and assigned
  • Offboarding is cleaner (assets transferred before team dissolves)
  • Audit/compliance is simpler (clear ownership trail)

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                     OWNERSHIP SYSTEM                            │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│ teams.yaml    │     │ assets.yaml   │     │ Infrastructure│
│               │     │               │     │ (AWS/GCP)     │
│ Team registry │     │ Asset->Team   │     │ Actual state  │
│ with metadata │     │ mapping       │     │ for comparison│
└───────────────┘     └───────────────┘     └───────────────┘
        │                     │                     │
        └─────────────────────┼─────────────────────┘
                              │
                              ▼
                    ┌───────────────────┐
                    │   owner-check     │
                    │   CLI Tool        │
                    │                   │
                    │ - validate        │
                    │ - query           │
                    │ - compare         │
                    └───────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │ Terminal │   │ CI/CD    │   │ Backstage│
        │ Output   │   │ Check    │   │ API      │
        └──────────┘   └──────────┘   └──────────┘

4.2 Key Components

  1. Team Registry (teams.yaml)
    • Source of truth for team metadata
    • Lifecycle management (active/deprecated/merged)
  2. Asset Registry (assets.yaml)
    • Maps every asset to one owner
    • Includes dependencies and escalation
  3. Validator
    • Checks referential integrity
    • Ensures every asset has valid owner
  4. Query Engine
    • Fast lookup by asset or team
    • Supports wildcards and filters
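
One way to get fast lookups plus wildcard support is to build two dictionaries once at load time and use shell-style patterns for matching. The sketch below assumes assets are parsed dictionaries with id, type, and owner keys; the class name Registry and its method names are hypothetical, not part of the specification.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class Registry:
    """In-memory index over parsed asset dictionaries (illustrative, not a fixed API)."""

    def __init__(self, assets):
        self.by_id = {a["id"]: a for a in assets}
        self.by_team = defaultdict(list)
        for a in assets:
            # Assets without an owner are grouped under the key None.
            self.by_team[a.get("owner")].append(a["id"])

    def owner_of(self, asset_id):
        """Return the owning team id, or None if the asset is unknown or unowned."""
        asset = self.by_id.get(asset_id)
        return asset.get("owner") if asset else None

    def assets_of(self, team_id):
        """Return the ids of all assets owned by the given team."""
        return list(self.by_team.get(team_id, []))

    def match(self, pattern, asset_type=None):
        """Shell-style wildcard lookup, optionally filtered by asset type."""
        return sorted(
            aid for aid, a in self.by_id.items()
            if fnmatchcase(aid, pattern)
            and (asset_type is None or a.get("type") == asset_type)
        )
```

Lookups by asset or team are dictionary accesses, so 1000+ assets should not be a concern; only match scans the whole registry.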

4.3 Data Structures

# Schema: teams.yaml
teams:
  - id: string          # Unique identifier (required)
    name: string        # Human-readable name (required)
    slack: string       # Slack channel for contact
    oncall: url         # PagerDuty/OpsGenie link
    status: enum        # active | deprecated | pending
    merged_into: string # If deprecated: ID of the team that takes over its assets
    members:            # Optional: team composition
      - email: string
        role: string

# Schema: assets.yaml
assets:
  - id: string          # Unique identifier (required)
    type: enum          # service | repo | s3_bucket | sqs_queue | rds | etc.
    name: string        # Human-readable name
    owner: string       # team-id (required, must exist)
    repo: url           # GitHub/GitLab repository
    documentation: url  # Link to docs
    criticality: enum   # tier-1 | tier-2 | tier-3
    dependencies:       # What this asset depends on
      - asset_id: string
        type: enum      # runtime | build | data
    slo_link: url       # Link to SLO dashboard

4.4 Algorithm Overview

Validation Algorithm:

1. Load teams.yaml → team_map
2. Load assets.yaml → asset_list
3. For each asset:
   a. Check owner field exists
   b. Check owner is in team_map
   c. Check owner.status == 'active'
   d. For each dependency:
      i. Check dependency asset exists
4. For each team:
   a. Check at least one asset is owned (warn if none)
5. Report all errors
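
The validation steps above can be sketched as follows. This is one possible shape under stated assumptions: teams and assets are lists of dictionaries following the schemas in 4.3, a team without a status field is treated as active, and the step 4 warning is limited to active teams. The name validate matches the tests in Section 6; teams_without_assets is a hypothetical helper.

```python
def validate(teams, assets):
    """Return a list of error strings; an empty list means the registry is valid."""
    team_map = {t["id"]: t for t in teams}       # step 1
    asset_ids = {a["id"] for a in assets}        # step 2, used for dependency checks
    errors = []
    for asset in assets:                         # step 3
        aid = asset["id"]
        owner = asset.get("owner")
        if not owner:                            # 3a
            errors.append(f"{aid}: missing owner field")
        elif owner not in team_map:              # 3b
            errors.append(f"{aid}: owner {owner} is not a known team")
        elif team_map[owner].get("status", "active") != "active":  # 3c
            team = team_map[owner]
            msg = f"{aid}: owner {owner} is {team['status']}"
            if team.get("merged_into"):
                msg += f" (merged into {team['merged_into']})"
            errors.append(msg)
        for dep in asset.get("dependencies", []):  # 3d
            if dep["asset_id"] not in asset_ids:
                errors.append(f"{aid}: dependency {dep['asset_id']} is not in the registry")
    return errors


def teams_without_assets(teams, assets):
    """Step 4: active teams that own nothing (a warning, not an error)."""
    owners = {a.get("owner") for a in assets}
    return [t["id"] for t in teams
            if t.get("status", "active") == "active" and t["id"] not in owners]
```

Run against the sample data from section 3.4, this sketch would also report the service-userdb dependency of service-auth, because that asset is referenced but never defined in the sample assets.yaml.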

Sync with Infrastructure (Advanced):

1. Scan AWS/GCP for resources with tags
2. Compare to assets.yaml
3. Flag resources not in registry
4. Flag registry entries not in infrastructure
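
Steps 2 to 4 of the sync reduce to a set comparison once the scan has produced a list of resource identifiers. The sketch below assumes the discovered identifiers have already been normalized to the same form as the registry ids; how the scan itself is done (cloud SDK, tag filters) is left open, and the function name diff_registry is hypothetical.

```python
def diff_registry(registry_ids, discovered_ids):
    """Compare registry entries with resources found in the cloud account."""
    registry, discovered = set(registry_ids), set(discovered_ids)
    return {
        # Exists in the cloud but nobody has claimed it in the registry.
        "unregistered": sorted(discovered - registry),
        # Claimed in the registry but no longer (or not yet) deployed.
        "missing": sorted(registry - discovered),
    }
```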

5. Implementation Guide

5.1 Development Environment Setup

# Python implementation
python3 -m venv venv
source venv/bin/activate
pip install pyyaml click

# Or Go implementation
go mod init owner-check
go get gopkg.in/yaml.v3
go get github.com/spf13/cobra

5.2 Project Structure

owner-check/
├── data/
│   ├── teams.yaml
│   └── assets.yaml
├── src/
│   ├── __init__.py
│   ├── models.py      # Data classes
│   ├── loader.py      # YAML parsing
│   ├── validator.py   # Validation logic
│   └── cli.py         # Command-line interface
├── tests/
│   ├── test_validator.py
│   └── fixtures/
│       ├── valid_teams.yaml
│       ├── valid_assets.yaml
│       └── invalid_assets.yaml
└── owner-check        # Entry point script

5.3 The Core Question You’re Answering

“If this service breaks at 3 AM, whose phone rings, and do they know it’s their problem?”

Ambiguous ownership is a common cause of “Incident Ping-Pong”: tickets bouncing between teams because no one is sure who owns the fix.

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Bounded Contexts (DDD)
    • How do you draw lines around code so it can be owned by one team?
    • What happens when two teams need to change the same file?
    • Book Reference: “Domain-Driven Design” by Eric Evans, Ch. 14
  2. The RACI Matrix
    • What’s the difference between Responsible and Accountable?
    • Why can there be multiple R’s but only one A?
    • Book Reference: Standard Project Management literature
  3. GitHub CODEOWNERS
    • How does GitHub enforce code review by owners?
    • What’s the difference between CODEOWNERS and service ownership?
    • Reference: GitHub documentation

5.5 Questions to Guide Your Design

Before implementing, think through these:

Granularity

  • Do you own at the “Repo” level, the “Microservice” level, or the “S3 Bucket” level?
  • What happens when one repo contains code for multiple services?
  • Should infrastructure (VPCs, subnets) have owners?

The Registry

  • Where do the “Teams” live? A YAML file? An LDAP group? A database?
  • How do you handle team renames or merges?
  • Who is allowed to change ownership?

Lifecycle

  • What happens when an owner leaves the company?
  • How do you transfer ownership gracefully?
  • What’s the process for deprecating a team?
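
For the merge and deprecation questions, one option is to let a deprecated team point at its successor through the merged_into field from the team schema, and to resolve the effective owner by following that link. The sketch below assumes team_map is a dictionary of parsed team entries keyed by id; resolve_owner is a hypothetical helper, and returning None for dead ends and loops is a design assumption.

```python
def resolve_owner(team_id, team_map):
    """Follow merged_into links until an active team is found, else return None."""
    seen = set()
    while team_id in team_map and team_id not in seen:
        seen.add(team_id)  # guards against A -> B -> A merge loops
        team = team_map[team_id]
        if team.get("status") == "active":
            return team_id
        team_id = team.get("merged_into")
    return None
```

A validator could use this to suggest a fix ("reassign to team-identity") instead of only reporting that the current owner is deprecated.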

5.6 Thinking Exercise

The “Burning Building” Trace

Take a random microservice in your system. Imagine it starts returning 500 errors at 3 AM.

Questions while analyzing:

  1. Who is the first person to get an alert?
  2. How do they know which team the alert belongs to?
  3. If they look at the source code, is there a clear “Contact Us” or “Owned By” header?
  4. If the owner isn’t listed, how many people do they have to ask before finding the owner?

Write down:

  • The path from “alert fires” to “correct human is paged”
  • Every point where someone had to guess or ask
  • Each guess/ask is a bug in your ownership model

5.7 Hints in Layers

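Hint 1: Use CODEOWNERS as Inspiration

GitHub’s CODEOWNERS file is a simple format worth borrowing from. The last matching pattern wins, so the catch-all fallback goes first:

# Pattern            Owners
# Fallback for anything not matched below
*                    @platform-team
/services/auth/      @team-identity
/services/payment/   @team-payments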

Start here, then extend to non-code assets.

Hint 2: Define the Team Schema First

Before mapping assets, define what a “team” is:

team:
  id: string
  name: string
  slack: string
  oncall: url
  status: active|deprecated

Hint 3: Create the Assets Schema

Map asset IDs to team IDs:

asset:
  id: string
  type: service|bucket|queue|...
  owner: team-id

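Hint 4: Write the Validator

The core logic is simple:

teams_by_id = {team.id: team for team in teams}
errors = []
for asset in assets:
    if not asset.owner:
        errors.append(f"{asset.id}: missing owner")
    elif asset.owner not in teams_by_id:
        errors.append(f"{asset.id}: unknown owner {asset.owner}")
    elif teams_by_id[asset.owner].status != "active":
        errors.append(f"{asset.id}: owner {asset.owner} is {teams_by_id[asset.owner].status}")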

5.8 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How do you handle shared infrastructure that multiple teams use?”
    • One owner, multiple consumers. Owner manages the resource, others have usage rights.
  2. “What are the dangers of ‘Shared Ownership’?”
    • No single point of accountability. “Everyone’s job” becomes “no one’s job.”
  3. “How do you transition ownership of a legacy system to a new team?”
    • Knowledge transfer period, documented handoff, ownership change in registry, redirect of alerts.
  4. “Should the person who writes the code always be the one who owns it in production?”
    • No. Ownership is about maintenance and operations, not original authorship.
  5. “What metrics can you use to prove that ownership boundaries are clear?”
    • MTTA (Mean Time To Acknowledge), incident reassignment rate, orphan resource count.
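
The three metrics named in the last answer are simple to compute once incident data is available. The sketch below assumes each incident record carries ack_minutes (time from alert to acknowledgement) and reassignments (number of times it changed teams); those field names and the function name ownership_metrics are hypothetical and would depend on the incident tool in use.

```python
from statistics import mean


def ownership_metrics(incidents, assets):
    """Summarize how clear ownership is, from incident records and the registry."""
    reassigned = sum(1 for i in incidents if i["reassignments"] > 0)
    return {
        # Mean Time To Acknowledge, in minutes.
        "mtta_minutes": mean(i["ack_minutes"] for i in incidents) if incidents else None,
        # Share of incidents that bounced between teams at least once.
        "reassignment_rate": reassigned / len(incidents) if incidents else None,
        # Assets in the registry with no owner declared.
        "orphan_count": sum(1 for a in assets if not a.get("owner")),
    }
```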

5.9 Books That Will Help

Topic              | Book                                          | Chapter
-------------------|-----------------------------------------------|------------------------------------
Bounded Contexts   | “Domain-Driven Design” by Eric Evans          | Ch. 14: Maintaining Model Integrity
Ownership Patterns | “Modern Software Engineering” by David Farley | Ch. 12
Incident Response  | “Site Reliability Engineering”                | Ch. 14: Managing Incidents

5.10 Implementation Phases

Phase 1: Schema Design (2-3 hours)

  1. Define team schema with required fields
  2. Define asset schema with required fields
  3. Document the relationship rules

Phase 2: Registry Bootstrap (3-4 hours)

  1. Create teams.yaml with 5-10 teams
  2. Create assets.yaml with 20-30 assets
  3. Validate manually that mappings are correct

Phase 3: CLI Tool (4-5 hours)

  1. Implement YAML loading
  2. Implement validation logic
  3. Implement query commands
  4. Add colorized output for errors

Phase 4: CI/CD Integration (2-3 hours)

  1. Add GitHub Action to validate on PR
  2. Block merge if validation fails
  3. Add badge to README showing status

5.11 Key Implementation Decisions

Decision        | Option A    | Option B       | Recommendation
----------------|-------------|----------------|----------------------------------
Storage         | YAML files  | Database       | YAML (version control)
Team source     | Manual YAML | LDAP sync      | Manual first, LDAP later
Asset discovery | Manual      | Cloud API scan | Manual first, scan for validation
Enforcement     | Advisory    | Blocking       | Start advisory, move to blocking

6. Testing Strategy

Unit Tests

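# Module paths follow the layout in section 5.2; run pytest from the repository root.
from pathlib import Path

from src.loader import load_assets, load_teams
from src.models import Asset, Team
from src.validator import validate

# Resolve fixtures relative to this file so the tests work from any directory.
FIXTURES = Path(__file__).parent / "fixtures"

def test_valid_ownership():
    teams = load_teams(FIXTURES / "valid_teams.yaml")
    assets = load_assets(FIXTURES / "valid_assets.yaml")
    errors = validate(teams, assets)
    assert errors == []

def test_missing_owner():
    assets = [Asset(id="test", owner=None)]
    errors = validate([], assets)
    assert "missing owner" in errors[0]

def test_deprecated_owner():
    teams = [Team(id="old", status="deprecated")]
    assets = [Asset(id="test", owner="old")]
    errors = validate(teams, assets)
    assert "deprecated" in errors[0]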

Integration Tests

  • Load real teams.yaml and assets.yaml
  • Validate all assets have valid owners
  • Check for circular dependencies
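
The circular dependency check can be done with a depth-first search that tracks which assets are on the current path. The sketch below assumes assets are parsed dictionaries whose dependencies use the asset_id key from the schema; find_cycle is a hypothetical name, and dependencies that point outside the registry are ignored here because the validator reports them separately.

```python
def find_cycle(assets):
    """Return one dependency cycle as a list of asset ids, or None if there is none."""
    graph = {a["id"]: [d["asset_id"] for d in a.get("dependencies", [])]
             for a in assets}
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            state = color.get(nxt, BLACK)  # unknown assets are treated as explored
            if state == GRAY:              # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

The recursion depth grows with the longest dependency chain, so a registry with chains approaching Python's default recursion limit would need an iterative version.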

Smoke Tests

  • Run owner-check --validate in CI
  • Fail build if errors found

7. Common Pitfalls & Debugging

Problem             | Symptom                              | Root Cause                              | Fix
--------------------|--------------------------------------|-----------------------------------------|----------------------------------------
Orphaned resources  | --validate finds unknown owners      | Team renamed/merged without updating    | Add owner to teams or update asset
Duplicate ownership | Multiple teams claim same asset      | No single source of truth               | Pick one owner, others become consumers
Stale registry      | Production resources not in registry | Manual process, no automation           | Add cloud scanning or PR requirement
Over-granular       | 1000+ assets, unmaintainable         | Mapped at file level instead of service | Aggregate to service/repo level

8. Extensions & Challenges

Extension 1: Cloud Resource Sync

Use AWS/GCP APIs to discover resources. Compare to registry. Flag orphans.

$ owner-check --sync-aws
Discovered 150 S3 buckets in AWS
Matched 142 to registry
[WARN] 8 buckets have no owner:
  - legacy-logs-2019
  - temp-data-export
  ...

Extension 2: Dependency Graph

Visualize asset dependencies. Highlight cross-team dependencies.
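
A possible starting point is to list the dependency edges whose two ends have different owners and emit the graph in Graphviz DOT text, with those edges highlighted. The sketch assumes parsed asset dictionaries; the function names and the choice of red for cross-team edges are illustrative, and rendering the DOT text is left to an external tool.

```python
def cross_team_dependencies(assets):
    """Return (asset, owner, dependency, dependency owner) where the owners differ."""
    owner_of = {a["id"]: a.get("owner") for a in assets}
    edges = []
    for a in assets:
        for dep in a.get("dependencies", []):
            target = dep["asset_id"]
            if target in owner_of and owner_of[target] != a.get("owner"):
                edges.append((a["id"], a.get("owner"), target, owner_of[target]))
    return edges


def to_dot(assets):
    """Render all dependencies as DOT text, coloring cross-team edges red."""
    cross = {(src, dst) for src, _, dst, _ in cross_team_dependencies(assets)}
    lines = ["digraph ownership {"]
    for a in assets:
        for dep in a.get("dependencies", []):
            style = ' [color="red"]' if (a["id"], dep["asset_id"]) in cross else ""
            lines.append(f'  "{a["id"]}" -> "{dep["asset_id"]}"{style};')
    lines.append("}")
    return "\n".join(lines)
```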

Extension 3: Ownership Cost Report

Integrate with billing. Show cost per team based on owned assets.
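
Once billing data is keyed by asset id, the report is a group-by on the owner field. The sketch below assumes monthly_cost is a dictionary from asset id to a cost figure obtained elsewhere; the function name cost_per_team and the "UNOWNED" bucket label are hypothetical.

```python
from collections import defaultdict


def cost_per_team(assets, monthly_cost):
    """Sum the cost of each team's assets; unowned assets land in 'UNOWNED'."""
    totals = defaultdict(float)
    for a in assets:
        totals[a.get("owner") or "UNOWNED"] += monthly_cost.get(a["id"], 0.0)
    return dict(totals)
```

Keeping a visible bucket for unowned spend gives orphaned resources a price tag, which tends to speed up their assignment.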

Extension 4: Slack Integration

Slash command: /owner service-auth returns owner info in Slack.


9. Real-World Connections

How Big Tech Does This:

  • Google: Service Mesh with mandatory owner metadata
  • Netflix: Ownership tags on every AWS resource
  • Spotify: Backstage service catalog with ownership

Tools in This Space:

  • Backstage: Open-source service catalog (catalog-info.yaml)
  • OpsLevel: Commercial service ownership and maturity platform
  • Cortex: Commercial internal developer portal

10. Resources

GitHub CODEOWNERS

Backstage


11. Self-Assessment Checklist

Before considering this project complete, verify:

  • I can explain the difference between Responsible and Accountable
  • teams.yaml has at least 5 teams with complete metadata
  • assets.yaml has at least 20 assets mapped to owners
  • owner-check --validate passes with no errors
  • owner-check <asset> returns owner info in under 1 second
  • There’s a CI check that validates on PR
  • I’ve found and assigned at least one orphaned resource

12. Submission / Completion Criteria

This project is complete when you have:

  1. teams.yaml with 5+ active teams
  2. assets.yaml with 20+ mapped assets
  3. owner-check CLI with validate, query, and orphan commands
  4. CI integration blocking merges on validation errors
  5. Documentation explaining how to add new assets/teams
  6. One orphan resolution - found an unowned resource and assigned it

Previous Project: P02: Team Service Interface
Next Project: P04: Escalation Logic Tree