Project 15: Prompt Registry + Versioning Service

Versioned prompt catalog with compatibility checks and audit history.

Quick Reference

  • Difficulty: Level 2: Intermediate
  • Time Estimate: 5-10 days (capstone: 3-5 weeks)
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Python, Go
  • Coolness Level: Level 3: Platform Discipline
  • Business Potential: 4. Internal Platform
  • Knowledge Area: Prompt Lifecycle Management
  • Software or Tool: Registry API + metadata store
  • Main Book: Accelerate (Forsgren et al.)
  • Concept Clusters: Evaluation, Rollouts, and Governance; Prompt Contracts and Output Typing

1. Learning Objectives

By completing this project, you will:

  1. Design a prompt registry API that treats prompts as versioned, immutable artifacts with content-addressable hashes, semantic version numbers, and rich metadata (author, timestamp, contract reference, model compatibility).
  2. Implement semantic versioning rules specific to prompts, where major versions signal breaking output changes, minor versions add new capabilities, and patch versions fix wording without altering behavior.
  3. Build a compatibility checking engine that detects breaking changes by diffing prompt versions against registered consumer contracts and blocks promotions that would break downstream systems.
  4. Design multi-stage approval workflows (author, reviewer, approver, promotion) with role-based access control so that only authorized maintainers can publish or promote prompt versions.
  5. Produce immutable audit logs that record who changed what, when, why, and the exact diff, satisfying compliance requirements for regulated environments.
  6. Implement migration windows that allow old and new prompt versions to coexist for a defined period, with automated deprecation enforcement and consumer impact notifications.

2. All Theory Needed (Per-Concept Breakdown)

Prompt Artifact Versioning

Fundamentals Prompt artifact versioning applies the discipline of software artifact management to prompt templates, treating each prompt as an immutable, content-addressable artifact with a unique identity, version lineage, and rich metadata envelope. In traditional software engineering, artifacts like compiled binaries, container images, and library packages go through registries (npm, Docker Hub, PyPI) where every published version is immutable and every change produces a new version. Prompt artifact versioning brings this same rigor to prompt templates. Without it, teams lose track of which prompt text is running in production, cannot reproduce past behavior, and have no safe way to roll back when a new prompt causes regressions. The registry becomes the single source of truth for what prompt content is deployed where.

Deep Dive into the concept A prompt artifact is the combination of the prompt template text, its metadata envelope, and its content-addressable identity. The template text is the actual instruction sent to the LLM, potentially with variable placeholders like {{user_query}} or {{context_documents}}. The metadata envelope includes the artifact name, semantic version, author, creation timestamp, model compatibility list, output schema reference, tags, and a pointer to the evaluation results that validated this version. The content-addressable identity is a cryptographic hash (SHA-256) of the canonical form of the template text plus critical metadata, ensuring that any change, no matter how small, produces a different identity.

Content-addressable storage is a pattern borrowed from Git and container image registries. In Git, every commit, tree, and blob is stored by its SHA-1 hash. Docker images are identified by content digests. The same principle applies to prompts: if two teams independently write the exact same prompt template with the same metadata, the registry recognizes them as the same artifact. This eliminates duplication and provides a strong guarantee that what you tested is exactly what you deploy. A registry that uses content-addressable storage never overwrites: publishing a “new version” always creates a new object with a new hash, and the old version remains accessible forever.
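The canonicalization and hashing described above can be sketched in TypeScript. This is a minimal sketch using Node's built-in crypto module; the specific canonicalization rules and the choice of which metadata fields are "critical" are illustrative assumptions, not a fixed specification:

```typescript
import { createHash } from "node:crypto";

// Critical metadata that participates in the artifact identity (assumed fields).
interface CriticalMetadata {
  modelCompatibility: string[];
  outputSchema: string;
}

// Canonical form: strip BOM, normalize line endings, drop trailing
// whitespace -- cosmetic differences must not create new identities.
function canonicalize(template: string): string {
  return template
    .replace(/^\uFEFF/, "")                      // strip byte-order mark
    .replace(/\r\n/g, "\n")                      // normalize CRLF to LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, ""))  // drop trailing spaces/tabs
    .join("\n");
}

// Content-addressable identity: SHA-256 over the canonical text plus
// critical metadata, serialized deterministically (sorted model list).
function contentHash(template: string, meta: CriticalMetadata): string {
  const payload = JSON.stringify({
    template: canonicalize(template),
    modelCompatibility: [...meta.modelCompatibility].sort(),
    outputSchema: meta.outputSchema,
  });
  return "sha256:" + createHash("sha256").update(payload, "utf8").digest("hex");
}
```

Note the deterministic serialization: sorting the model list before hashing means two publishes that list the same models in different orders still resolve to the same identity.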

The storage layer of a prompt registry typically has two tiers. The blob store holds the actual prompt content, indexed by content hash. The metadata store (a relational or document database) holds the version graph, tags, ownership, approval status, and pointers to the blob store. This separation mirrors how container registries work: the manifest (metadata) is separate from the layers (content blobs). The metadata store supports queries like “give me all versions of the refund_policy_assistant prompt” or “which prompts are compatible with claude-3.5-sonnet” while the blob store handles efficient storage and deduplication.

Version lineage tracks the parent-child relationships between prompt versions. When an author creates version 2.3.1 from 2.3.0, the registry records this lineage. This enables diff views (what changed between versions), blame tracking (who introduced a specific change), and rollback chains (which previous version to revert to). Lineage also supports branching: a team might maintain a v2.x line for existing consumers while developing v3.x with breaking changes.

Immutability is the non-negotiable property. Once a prompt version is published to the registry, its content and critical metadata cannot be changed. Mutable fields are limited to operational annotations like deprecation notices and consumer counts. This immutability guarantee is what makes the registry trustworthy: when a consumer pins to version 2.3.0, they know the content will never silently change. If someone needs to fix a typo, they publish 2.3.1. This discipline prevents the “it worked yesterday” debugging nightmare where prompt content is modified in place.

How this fits into the project Prompt artifact versioning is the central design concept for Project 15. Every API endpoint, every storage decision, and every workflow in the registry is shaped by the requirement that prompts are versioned, immutable, content-addressable artifacts. The compatibility checker, approval workflow, and audit trail all depend on the integrity guarantees provided by this versioning model.

Definitions & key terms

  • Content-addressable storage: Storage system where objects are identified by the cryptographic hash of their content rather than by a mutable path or name.
  • Artifact envelope: The combination of content blob and metadata that together constitute a versioned prompt artifact.
  • Version lineage: The directed graph of parent-child relationships between prompt versions, enabling diff, blame, and rollback.
  • Immutability guarantee: The property that once published, an artifact’s content and identity can never be changed, only superseded by new versions.
  • Blob store: The storage tier that holds raw prompt content indexed by content hash.
  • Metadata store: The database tier that holds version graphs, ownership, approval states, and query indexes.

Mental model diagram (ASCII)

Author writes prompt template v2.3.1
            |
            v
+---------------------------+      +---------------------+
|    Registry API           |----->|   Content Hasher    |
|  (publish, query, promote)|      |  SHA-256(template + |
+---------------------------+      |  critical metadata) |
            |                      +---------------------+
            |                               |
            v                               v
+---------------------------+      +---------------------+
|    Metadata Store         |      |    Blob Store       |
|  (Postgres / DynamoDB)    |      |  (S3 / filesystem)  |
|                           |      |                     |
|  prompt_name: refund_asst |      |  sha256:a1b2c3... ->|
|  version: 2.3.1           |      |    [prompt content] |
|  parent: 2.3.0            |      |                     |
|  author: alice@corp       |      |  sha256:d4e5f6... ->|
|  status: APPROVED         |      |    [prompt content] |
|  content_hash: a1b2c3...  |      +---------------------+
|  model_compat: [claude-3] |
|  output_schema: refund.v2 |
|  created_at: 2026-01-15   |
+---------------------------+
            |
            v
    Consumers pin to version
    or semver range (^2.3.0)

How it works (step-by-step, with invariants and failure modes)

  1. Author submits a prompt template and metadata to the registry API’s publish endpoint. Invariant: the request must include a prompt name, template text, and target version. Failure mode: missing required fields returns a validation error before any storage write.
  2. The registry computes the SHA-256 hash of the canonical template text plus critical metadata (model compatibility, output schema reference). Invariant: the canonical form strips whitespace-only differences to prevent trivial duplicates. Failure mode: if the hash already exists in the blob store and the version number differs, it means two versions have identical content, which is flagged as a warning.
  3. The blob store writes the content by hash (no-op if hash already exists due to deduplication). The metadata store creates a new version record linking the hash, version number, parent version, author, and initial status (DRAFT). Invariant: the metadata write and blob write are atomic (both succeed or both fail). Failure mode: partial write is prevented by wrapping both operations in a transaction or using a write-ahead log.
  4. The version record enters the approval workflow (DRAFT -> REVIEW -> APPROVED -> PROMOTED). Each state transition requires the appropriate role and is recorded in the audit log. Invariant: no version can reach PROMOTED status without passing through APPROVED. Failure mode: attempting to promote an unapproved version returns an authorization error.
  5. Consumers query the registry by name and version range (e.g., ^2.3.0). The registry resolves the range to the highest compatible PROMOTED version and returns the content hash plus metadata. Invariant: only PROMOTED versions are visible to consumer queries. Failure mode: if no promoted version matches the range, the registry returns a resolution error with the closest available versions.
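The publish flow above can be sketched with in-memory maps standing in for the blob and metadata tiers. This is a simplified sketch: field names, error messages, and the hashing shortcut are illustrative assumptions, and a real registry would wrap the two writes in a database transaction rather than rely on in-process ordering:

```typescript
import { createHash } from "node:crypto";

type Status = "DRAFT" | "REVIEW" | "APPROVED" | "PROMOTED";

interface PublishRequest {
  name: string;
  version: string;
  template: string;
  parentVersion?: string;
}

interface VersionRecord {
  name: string;
  version: string;
  contentHash: string;
  parentVersion?: string;
  status: Status;
  warning?: string;
}

class Registry {
  private blobs = new Map<string, string>();           // content hash -> content
  private versions = new Map<string, VersionRecord>(); // "name@version" -> record

  publish(req: PublishRequest): VersionRecord {
    // Step 1: validate required fields before any storage write.
    if (!req.name || !req.version || !req.template) {
      throw new Error("name, version, and template are required");
    }
    const key = `${req.name}@${req.version}`;
    if (this.versions.has(key)) {
      throw new Error(`${key} already exists (immutability: publish a new version)`);
    }
    // Step 2: compute the content hash (canonicalization elided in this sketch).
    const hash =
      "sha256:" + createHash("sha256").update(req.template, "utf8").digest("hex");
    // Flag identical content being published under a different version number.
    const warning = this.blobs.has(hash)
      ? "identical content already exists under another version"
      : undefined;
    // Step 3: blob write is a no-op on dedup; metadata write creates the record
    // in its initial DRAFT state.
    this.blobs.set(hash, req.template);
    const record: VersionRecord = {
      name: req.name,
      version: req.version,
      contentHash: hash,
      parentVersion: req.parentVersion,
      status: "DRAFT",
      warning,
    };
    this.versions.set(key, record);
    return record;
  }
}
```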

Minimal concrete example

Publish request:
  POST /v1/prompts
  {
    "name": "refund_policy_assistant",
    "version": "2.3.1",
    "parent_version": "2.3.0",
    "template": "You are a refund policy assistant.\n\nContext: {{context}}\nQuery: {{user_query}}\n\nRespond in JSON matching @schema:RefundResponse.",
    "model_compatibility": ["claude-3.5-sonnet", "gpt-4o"],
    "output_schema": "schemas/refund_response.v2.json",
    "tags": ["support", "refund"],
    "change_description": "Clarified context formatting for multi-document inputs"
  }

Registry response:
  {
    "id": "prm_00232",
    "name": "refund_policy_assistant",
    "version": "2.3.1",
    "content_hash": "sha256:a1b2c3d4e5f6...",
    "status": "DRAFT",
    "parent_version": "2.3.0",
    "created_at": "2026-01-15T10:30:00Z"
  }

Version resolution query:
  GET /v1/prompts/refund_policy_assistant?range=^2.3.0&status=PROMOTED

  {
    "resolved_version": "2.3.0",
    "content_hash": "sha256:f6e5d4c3b2a1...",
    "note": "2.3.1 exists but is in DRAFT status"
  }

Common misconceptions

  • “Prompt versioning is just saving files in a Git repo.” Git handles text versioning well, but a prompt registry adds structured metadata queries, consumer dependency tracking, approval workflows, content-addressable deduplication, and API-driven resolution that Git alone does not provide. Git is a reasonable backend for the blob store, but the metadata layer needs a queryable database.
  • “Content-addressable hashing is unnecessary overhead.” Without it, you cannot guarantee that the prompt you tested is the exact prompt running in production. A single whitespace change alters behavior with some models, and hash verification catches this.
  • “Immutability is too rigid; sometimes you just need to fix a typo.” Immutability is the foundation of trust. The fix for a typo is a new patch version (2.3.1), not an in-place edit. The old version remains available for consumers who have not yet migrated.
  • “Version numbers are enough to identify a prompt.” Version numbers are human-readable labels. Content hashes are machine-verifiable identities. Both are needed: versions for human communication, hashes for integrity verification.

Check-your-understanding questions

  1. Why does the registry need both a blob store and a metadata store instead of putting everything in one database?
  2. What happens if two authors publish versions with identical template text but different version numbers?
  3. Why must the canonical form of a prompt template normalize whitespace before hashing?
  4. How does immutability interact with the need to deprecate old versions?

Check-your-understanding answers

  1. The blob store provides content-addressable deduplication and efficient storage for potentially large template texts. The metadata store provides relational queries (find by name, filter by status, resolve version ranges) that blob stores cannot efficiently support. Separating them follows the same pattern as container registries (manifest vs layers).
  2. The registry stores both versions with the same content hash but different version metadata. It flags this as a warning because it suggests unnecessary version inflation. The content-addressable property means the blob store only stores the content once, but the metadata store has two distinct version records pointing to the same hash.
  3. LLMs may be sensitive to whitespace differences, but trivial formatting changes (trailing spaces, inconsistent line endings) should not create new artifact identities. The canonical form defines which whitespace is significant (e.g., newlines between sections) and which is not (trailing spaces, BOM characters). This prevents cosmetic differences from spawning spurious new artifact identities while preserving meaningful formatting.
  4. Deprecation is a mutable annotation on an immutable artifact. The prompt content and identity never change, but the metadata store can mark a version as DEPRECATED, add a deprecation message, and set an expiration date. Consumer queries can be configured to exclude deprecated versions or to return them with a warning header.

Real-world applications

  • MLflow’s Prompt Registry provides commit-based versioning with immutable versions, side-by-side diff views, and deployment pipelines for A/B testing and rollbacks, following the same content-addressable pattern used in their model registry.
  • Braintrust assigns content-addressable IDs to prompt versions so the same prompt always produces the same identifier, enabling environment-based deployment pipelines (dev -> staging -> production) that prevent untested changes from reaching production.
  • Container registries (Docker Hub, ECR, GCR) use the same two-tier architecture (manifest store + blob store) with content-addressable digests, providing a proven pattern for immutable artifact management at scale.
  • npm, PyPI, and Maven Central all enforce immutability for published packages: once a version is published, it cannot be changed, only yanked or superseded.

Where you’ll apply it

  • Phase 1 of this project: design the storage schema, implement the publish endpoint with content hashing, and build the version resolution query.
  • Phase 2: integrate the versioning model with the compatibility checker and approval workflow.

References

  • “Designing Data-Intensive Applications” by Martin Kleppmann - Chapter 4: Encoding and Evolution (schema evolution and compatibility)
  • “Accelerate” by Forsgren et al. - Chapters on version control and deployment automation
  • MLflow Prompt Registry documentation (commit-based versioning for GenAI prompts)
  • Docker Registry HTTP API V2 specification (content-addressable manifest and blob model)
  • Semantic Versioning 2.0.0 specification (semver.org)

Key insights A prompt registry is not a file server with version numbers; it is a trust infrastructure where content-addressable hashing guarantees integrity, immutability guarantees reproducibility, and structured metadata enables the queries that power compatibility checking and consumer dependency management.

Summary Prompt artifact versioning treats prompts as immutable, content-addressable artifacts stored in a two-tier system (blob store for content, metadata store for version graphs and queries). Every change creates a new version with a new hash, preserving full lineage and enabling reliable rollback, diff, and resolution. This model is borrowed from proven patterns in container registries and package managers, adapted for the unique needs of prompt templates (model compatibility, output schema references, evaluation result pointers).

Homework/Exercises to practice the concept

  • Design a metadata schema (as a table definition or document structure) for a prompt registry that supports: artifact name, semantic version, content hash, parent version, author, creation timestamp, status (DRAFT/REVIEW/APPROVED/PROMOTED/DEPRECATED), model compatibility list, output schema reference, tags, and change description. Identify which fields are immutable and which are mutable.
  • Write the pseudocode for a version resolution algorithm that takes a prompt name and a semver range (like ^2.3.0 or ~2.3.0) and returns the highest matching PROMOTED version. Handle the case where no promoted version matches.
  • Draw a sequence diagram showing the lifecycle of a prompt from initial publish through approval, promotion, consumer usage, deprecation, and eventual archival.

Solutions to the homework/exercises

  • The metadata schema should have immutable fields (name, version, content_hash, parent_version, author, created_at, change_description, template_text) and mutable fields (status, deprecation_message, deprecation_date, consumer_count). The content_hash serves as the link to the blob store. A composite unique index on (name, version) prevents duplicate version numbers, while content_hash deduplication happens at the blob store level.
  • The resolution algorithm queries the metadata store for all records matching the prompt name with status=PROMOTED, filters them by the semver range using standard range comparison (caret means compatible with major version, tilde means compatible with minor version), sorts descending by version, and returns the first result. If no match, return an error with the closest available versions (highest promoted version below the range floor and lowest above the range ceiling).
  • The sequence diagram should show: Author -> Registry (publish, status=DRAFT) -> Reviewer (approve, status=REVIEW -> APPROVED) -> Platform Lead (promote, status=PROMOTED) -> Consumer (resolve version range, receive content hash) -> Consumer (fetch content by hash). Later: Maintainer -> Registry (deprecate version, set expiration) -> Consumers (receive deprecation warnings on resolution) -> Registry (archive after expiration, remove from active resolution).

Semantic Versioning for Prompt Artifacts

Fundamentals Semantic versioning (semver) for prompt artifacts adapts the familiar MAJOR.MINOR.PATCH convention to the unique characteristics of prompt changes. In software libraries, a major version bump signals breaking API changes, a minor bump adds backward-compatible functionality, and a patch fixes bugs without changing the interface. For prompts, the “interface” is the observable behavior: the output schema, the set of fields populated, the tone and format conventions, and the model compatibility. Applying semver to prompts gives teams a shared language for communicating the risk level of changes and enables automated compatibility checking. Without semver discipline, every prompt change is equally scary, and teams either over-review trivial changes or under-review dangerous ones.

Deep Dive into the concept The challenge of applying semver to prompts is that prompt “behavior” is probabilistic rather than deterministic. A software library either compiles or does not; a prompt might produce slightly different outputs across runs even with identical text. This means that semver for prompts must be defined in terms of the prompt’s contract (its output schema, required fields, and behavioral guarantees) rather than its exact output text.

A major version bump (e.g., 2.x.x -> 3.0.0) signals that the prompt’s contract has changed in a way that will break existing consumers. Examples include: removing a required output field, changing the output schema structure (e.g., from a flat object to a nested one), switching the expected response format (e.g., from JSON to XML), removing support for a previously compatible model, or changing the semantic meaning of an existing field. When a major version is bumped, consumers must update their parsing and validation logic.

A minor version bump (e.g., 2.3.x -> 2.4.0) signals that the prompt’s capabilities have expanded without breaking existing consumers. Examples include: adding a new optional output field, supporting an additional model, adding a new variable placeholder that has a default value, or improving output quality in ways that maintain the existing schema. Consumers on version range ^2.3.0 will automatically receive minor updates.

A patch version bump (e.g., 2.3.0 -> 2.3.1) signals that the prompt wording was adjusted without any contract or capability change. Examples include: fixing a typo, clarifying instructions that were ambiguous, adjusting whitespace or formatting, or tweaking language that does not change output structure. Patches are low-risk and should flow through approval quickly.

The versioning decision process requires comparing the new prompt version against the previous one across three dimensions: structural (does the output schema change?), behavioral (do existing test cases still pass?), and contractual (do consumer version ranges still resolve correctly?). This comparison can be partially automated: structural changes are detectable by diffing the output schema references, behavioral changes are detectable by running the evaluation suite from the previous version against the new prompt, and contractual changes are detectable by checking whether any consumer’s pinned range would break.
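The structural dimension of this comparison can be sketched as a minimal change classifier. The snapshot shape below is a deliberate simplification (assumed field names, schemas already reduced to flat field lists); a real implementation would diff full JSON Schemas and also consult evaluation-suite results for the behavioral dimension:

```typescript
type Bump = "MAJOR" | "MINOR" | "PATCH";

// Simplified view of a version's contract (illustrative, not a real schema).
interface VersionSnapshot {
  requiredFields: string[];      // required output fields from the schema
  optionalFields: string[];      // optional output fields
  modelCompatibility: string[];  // models this version supports
}

// Returns the minimum version bump the structural diff justifies.
function minimumBump(parent: VersionSnapshot, next: VersionSnapshot): Bump {
  // Breaking: a required field removed or added, or a model dropped.
  const removedRequired = parent.requiredFields.some((f) => !next.requiredFields.includes(f));
  const addedRequired = next.requiredFields.some((f) => !parent.requiredFields.includes(f));
  const removedModel = parent.modelCompatibility.some((m) => !next.modelCompatibility.includes(m));
  if (removedRequired || addedRequired || removedModel) return "MAJOR";

  // Additive: a new optional field or newly supported model.
  const addedOptional = next.optionalFields.some((f) => !parent.optionalFields.includes(f));
  const addedModel = next.modelCompatibility.some((m) => !parent.modelCompatibility.includes(m));
  if (addedOptional || addedModel) return "MINOR";

  return "PATCH"; // no structural change detected
}
```

The registry would use this as a floor: an author may always declare a larger bump than the classifier requires, but never a smaller one.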

Pre-release versions (e.g., 2.4.0-beta.1) are useful for prompts that need testing in a staging environment before promotion. The registry can serve pre-release versions to staging consumers while keeping production consumers on the latest stable version. Build metadata (e.g., 2.3.1+eval.pass.20260115) can encode evaluation results or CI pipeline run IDs without affecting version precedence.

How this fits into the project Semantic versioning is the classification system that the compatibility checker uses to determine whether a new version is safe to promote. Every publish request to the registry must include a version number, and the registry validates that the version number correctly reflects the scope of changes (major, minor, or patch) by comparing against the previous version.

Definitions & key terms

  • Breaking change: A modification to the prompt’s contract that causes existing consumers to fail (e.g., removed output field, changed schema structure).
  • Backward-compatible change: A modification that adds capability without invalidating existing consumer expectations.
  • Version range: A semver expression (e.g., ^2.3.0, ~2.3.0, >=2.0.0 <3.0.0) that specifies which versions a consumer accepts.
  • Pre-release version: A version with a hyphenated suffix (e.g., 3.0.0-beta.1) indicating it is not yet stable for production use.
  • Version precedence: The ordering rules that determine which version is “higher” (3.0.0-beta.1 < 3.0.0 < 3.0.1).

Mental model diagram (ASCII)

Prompt Change Classification:

  +---------------------------------------------------+
  | What changed?                                     |
  +---------------------------------------------------+
  |                                                   |
  |  Output schema structure changed?                 |
  |  Required field removed or renamed?     --> MAJOR |
  |  Response format changed?                         |
  |  Model compatibility removed?                     |
  |                                                   |
  +---------------------------------------------------+
  |                                                   |
  |  New optional field added?                        |
  |  Additional model supported?            --> MINOR |
  |  New variable with default value?                 |
  |  Quality improvement, same contract?              |
  |                                                   |
  +---------------------------------------------------+
  |                                                   |
  |  Typo fix?                                        |
  |  Whitespace adjustment?                 --> PATCH |
  |  Clarification without behavior change?           |
  |  Comment or documentation update?                 |
  |                                                   |
  +---------------------------------------------------+

Version Range Resolution:

  Consumer pins to: ^2.3.0

  Available PROMOTED versions:
    2.2.0  -- too old (below range floor)
    2.3.0  -- matches
    2.3.1  -- matches (patch)
    2.4.0  -- matches (minor)
    3.0.0  -- too new (major break)

  Resolved: 2.4.0 (highest compatible)

How it works (step-by-step, with invariants and failure modes)

  1. Author submits a new prompt version with an explicit version number. Invariant: the version number must be strictly greater than the parent version. Failure mode: submitting 2.3.0 when 2.3.0 already exists returns a conflict error.
  2. The registry diffs the new version against its parent to classify the change scope. Invariant: if the output schema reference changed structurally, the version bump must be major. Failure mode: submitting a minor bump with a structural schema change triggers a version classification warning or error.
  3. If automated evaluation results are available, the registry runs the parent version’s test suite against the new version. Invariant: a patch version must pass all parent test cases. Failure mode: a patch that fails parent tests is flagged for re-classification as minor or major.
  4. The version enters the approval workflow with its classification metadata attached (MAJOR/MINOR/PATCH). Reviewers use this classification to apply appropriate scrutiny. Invariant: major versions require additional review (e.g., platform team approval in addition to team lead). Failure mode: if the approval workflow is not configured for the change scope, the registry blocks promotion until the workflow is updated.
  5. Consumers with version ranges automatically receive the new version when it reaches PROMOTED status, subject to range compatibility. Invariant: a consumer pinned to ^2.3.0 never receives a 3.x.x version. Failure mode: a lax or buggy range comparator could leak a breaking major version to a pinned consumer, so the resolution algorithm must strictly enforce range rules per the semver specification.

Minimal concrete example

Version history for "refund_policy_assistant":

  v1.0.0  -- Initial release. Output: { refund_eligible: bool, reason: string }
  v1.1.0  -- Added optional field: confidence_score. (MINOR)
  v1.1.1  -- Fixed typo in system prompt. (PATCH)
  v2.0.0  -- Changed output to nested: { decision: { eligible: bool, reason: string }, metadata: {...} }. (MAJOR)
  v2.1.0  -- Added support for claude-3.5-sonnet. (MINOR)
  v2.1.1  -- Clarified multi-document context formatting. (PATCH)

Consumer A pins to: ^1.0.0  --> resolves to 1.1.1
Consumer B pins to: ^2.0.0  --> resolves to 2.1.1
Consumer C pins to: ~2.1.0  --> resolves to 2.1.1
Consumer D pins to: >=1.0.0 --> resolves to 2.1.1 (gets latest)
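The resolution shown above can be sketched as a small resolver over the PROMOTED version list. This is a simplified sketch: pre-release tags, build metadata, and the special caret semantics for 0.x versions are deliberately ignored:

```typescript
type SemVer = [number, number, number];

function parse(v: string): SemVer {
  const [maj, min, pat] = v.split(".").map(Number);
  return [maj, min, pat];
}

// Standard component-wise precedence comparison.
function cmp(a: SemVer, b: SemVer): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Supports caret (^), tilde (~), floor (>=), and exact pins.
function satisfies(version: string, range: string): boolean {
  const v = parse(version);
  if (range.startsWith("^")) {
    const floor = parse(range.slice(1));
    return cmp(v, floor) >= 0 && v[0] === floor[0]; // same major
  }
  if (range.startsWith("~")) {
    const floor = parse(range.slice(1));
    return cmp(v, floor) >= 0 && v[0] === floor[0] && v[1] === floor[1]; // same minor
  }
  if (range.startsWith(">=")) {
    return cmp(v, parse(range.slice(2).trim())) >= 0;
  }
  return cmp(v, parse(range)) === 0; // exact pin
}

// Resolve to the highest PROMOTED version satisfying the range, or null.
function resolve(promoted: string[], range: string): string | null {
  const matches = promoted.filter((v) => satisfies(v, range));
  matches.sort((a, b) => cmp(parse(b), parse(a))); // descending precedence
  return matches[0] ?? null;
}
```

Running this against the version history above reproduces the consumer resolutions listed for consumers A through D.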

Common misconceptions

  • “Any wording change is a patch.” If the wording change alters the model’s output structure or behavior for existing test cases, it is a minor or major change regardless of how small the text diff is. A single word can change output format.
  • “Semver does not apply to prompts because outputs are non-deterministic.” Semver applies to the prompt’s contract, not to individual outputs. A contract specifies the output schema, required fields, and behavioral bounds. Non-determinism exists within those bounds.
  • “Consumers should always pin to exact versions.” Exact pinning prevents automatic patch and minor updates, creating maintenance burden. Range pinning with caret (^) is the right default for most consumers, giving them bug fixes and compatible improvements automatically.
  • “Pre-release versions are unnecessary for prompts.” Pre-release versions enable staging deployment and A/B testing without exposing untested prompts to production traffic. They are essential for major version migrations.
  • “Version numbers alone tell you what changed.” Version numbers indicate the scope of change (major/minor/patch) but not the nature. The change description in the metadata envelope provides the why; the diff between versions provides the what.

Check-your-understanding questions

  1. A prompt change adds a new required output field. What version bump is correct and why?
  2. How does the registry enforce that a declared patch version does not actually contain a breaking change?
  3. Why should major version bumps require a broader approval workflow than patches?
  4. What is the difference between ^2.3.0 and ~2.3.0 in terms of which versions a consumer accepts?

Check-your-understanding answers

  1. Adding a required output field is a MAJOR bump because existing consumers validate against the old contract: strict (closed) schemas will reject responses containing the unexpected field, and consumers that treat the old field set as exhaustive may mishandle it. If the field were optional, it would be a MINOR bump.
  2. The registry diffs the output schema references between the new version and its parent. If the schema has structural changes (added required fields, removed fields, changed types), the registry flags that the declared patch scope is insufficient. Additionally, running the parent version’s evaluation suite against the new version catches behavioral changes that the schema diff might miss.
  3. Major bumps break existing consumers. A patch can be reviewed by the team lead alone because it cannot break anyone. A major bump needs platform team review because it requires coordinating migration across all consumers, and the blast radius of a mistake is much higher.
  4. ^2.3.0 accepts any version >=2.3.0 and <3.0.0 (compatible with major version 2). ~2.3.0 accepts any version >=2.3.0 and <2.4.0 (compatible with minor version 2.3). Tilde is more conservative, only accepting patches within the same minor version.

Real-world applications

  • npm uses semver ranges extensively to manage dependency compatibility across millions of packages, and the same range resolution algorithms apply directly to prompt version queries.
  • Terraform provider registries enforce semver for provider versions, with automated breaking change detection based on schema diffs, the same approach used for prompt output schema comparison.
  • MLflow 3.0 extended its registry to handle GenAI artifacts with commit-based versioning, connecting prompt configurations to evaluation runs and deployment metadata.
  • API versioning in services like Stripe and GitHub follows semver principles where breaking changes require new major versions, with extended deprecation windows for the old versions.

Where you’ll apply it

  • Phase 1: implement the version parsing, comparison, and range resolution logic.
  • Phase 2: build the automated change classification that compares new versions against parent versions to validate the declared semver bump.

References

  • Semantic Versioning 2.0.0 specification (semver.org)
  • “Designing Data-Intensive Applications” by Martin Kleppmann - Chapter 4: Encoding and Evolution
  • npm semver range documentation (node-semver)
  • “Accelerate” by Forsgren et al. - Chapters on version control practices

Key insights Semantic versioning for prompts is not about labeling text changes; it is about communicating the risk level and consumer impact of every change through a machine-readable classification system that enables automated compatibility checking.

Summary Semver for prompts classifies changes as MAJOR (breaks consumers), MINOR (adds capability, safe for existing consumers), or PATCH (no contract change). The registry validates these classifications by diffing output schemas and running parent evaluation suites. Version ranges let consumers declare their compatibility requirements, and the resolution algorithm automatically selects the highest compatible promoted version. Pre-release versions enable staging deployment, and build metadata can encode evaluation results.

Homework/Exercises to practice the concept

  • Given a list of 10 prompt changes (described in natural language), classify each as MAJOR, MINOR, or PATCH and justify your classification.
  • Implement the pseudocode for a semver range resolver that handles caret (^), tilde (~), exact (=), and range (>=X <Y) specifiers.
  • Design the automated change classifier that compares two prompt versions and outputs the minimum required version bump. Define what signals it checks (schema diff, eval suite results, model compatibility changes).

Solutions to the homework/exercises

  • Classifications should be based on the contract impact, not the text diff size. A one-word change that removes a required output field is MAJOR. A paragraph rewrite that does not change the output schema or test results is PATCH. Adding a new optional field is MINOR even if the text diff is large.
  • The resolver pseudocode should: parse the range specifier into a comparator set, query all PROMOTED versions for the prompt name, filter versions that satisfy the comparator set, sort descending by version precedence, and return the first match. Edge cases: pre-release versions only match if the range explicitly includes pre-release tags; build metadata is ignored for precedence.
  • The automated classifier should check: (1) output schema reference changes (structural diff), (2) model compatibility list changes (removed models = MAJOR, added models = MINOR), (3) parent evaluation suite pass rate (any regression = at least MINOR), (4) variable placeholder changes (new required variables = MAJOR, new optional with defaults = MINOR). The minimum bump is the highest severity signal found.
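The resolver described in the second solution can be sketched in TypeScript. This is a simplified sketch, not a full node-semver implementation: it handles only caret (^), tilde (~), and exact (=) specifiers, ignores pre-release tags and build metadata, and all function names are illustrative.

```typescript
// Simplified semver range resolver: parse, compare, filter, sort descending.
type SemVer = { major: number; minor: number; patch: number };

function parse(v: string): SemVer {
  const [major, minor, patch] = v.split(".").map(Number);
  return { major, minor, patch };
}

// Precedence comparison: major, then minor, then patch.
function compare(a: SemVer, b: SemVer): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Does `v` satisfy a caret (^), tilde (~), or exact (=) range specifier?
function satisfies(v: SemVer, range: string): boolean {
  const base = parse(range.slice(1));
  if (range.startsWith("^")) {
    // >= base, same major: accepts 2.4.1 for ^2.3.0, rejects 3.0.0
    return compare(v, base) >= 0 && v.major === base.major;
  }
  if (range.startsWith("~")) {
    // >= base, same major.minor: accepts 2.3.5 for ~2.3.0, rejects 2.4.1
    return compare(v, base) >= 0 && v.major === base.major && v.minor === base.minor;
  }
  return compare(v, base) === 0; // exact, e.g. "=2.3.0"
}

// Resolve: highest PROMOTED version that satisfies the range, if any.
function resolve(promoted: string[], range: string): string | undefined {
  return promoted
    .filter((v) => satisfies(parse(v), range))
    .sort((a, b) => compare(parse(b), parse(a)))[0];
}
```

In a real registry, the `promoted` list would come from a query filtered by lifecycle state, so DRAFT and DEPRECATED versions never resolve.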

Compatibility Contracts and Migration Windows

Fundamentals Compatibility contracts and migration windows solve the coordination problem that arises when multiple consumers depend on the same prompt artifact. In microservice architectures, API contracts (OpenAPI specs, protobuf schemas) define the interface between producer and consumer. For prompts, the contract is the combination of the output schema, the set of variable placeholders, the model compatibility list, and the behavioral guarantees (e.g., “always returns valid JSON,” “never includes PII in the response”). When a prompt changes, the registry must determine which consumers are affected and whether the change is safe. Migration windows provide a time-bounded period during which both the old and new versions are available, giving consumers time to adapt before the old version is deprecated.

Deep Dive into the concept A compatibility contract is a formal declaration of what a consumer expects from a prompt. At minimum, it specifies: the prompt name, the version range the consumer supports (e.g., ^2.3.0), the output schema the consumer parses, and any behavioral assumptions the consumer makes (e.g., “response latency under 2 seconds,” “no tool calls in output”). The registry stores these consumer contracts alongside the prompt artifacts, creating a dependency graph: prompt A version 2.3.0 has consumers X, Y, Z each with their own version range and schema expectations.

When a new prompt version is published, the compatibility checker walks this dependency graph. For each consumer, it asks: does the new version satisfy the consumer’s version range? If yes, does the new version’s output schema remain compatible with the consumer’s expected schema? Schema compatibility has four modes, borrowed from Apache Avro and Confluent Schema Registry:

  • Backward compatible: new schema can read data written by old schema. For prompts, this means the new output can be parsed by consumers expecting the old schema.
  • Forward compatible: old schema can read data written by new schema. For prompts, this means the old consumer can still parse outputs from the new prompt.
  • Full compatible: both backward and forward compatible.
  • Breaking: neither backward nor forward compatible.

For prompt registries, backward compatibility is the primary concern because consumers need to parse the prompt’s output. Forward compatibility matters during migration windows when both old and new prompt versions are serving traffic simultaneously.
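The four modes can be made concrete with a minimal classifier over flat schemas (field name to type). This is an illustrative sketch only: real output schemas are nested JSON Schema documents, and the renamed-field ambiguity discussed later is not handled here.

```typescript
// Classify the relationship between an old and new flat schema.
type Schema = Record<string, string>; // field name -> type
type Mode = "FULL" | "BACKWARD" | "FORWARD" | "BREAKING";

function classify(oldSchema: Schema, newSchema: Schema): Mode {
  const removed = Object.keys(oldSchema).filter((f) => !(f in newSchema));
  const added = Object.keys(newSchema).filter((f) => !(f in oldSchema));
  const typeChanged = Object.keys(oldSchema).filter(
    (f) => f in newSchema && oldSchema[f] !== newSchema[f]
  );

  // A type change on a shared field breaks both directions.
  if (typeChanged.length > 0) return "BREAKING";
  if (removed.length === 0 && added.length === 0) return "FULL";
  // No removals: old consumers still find every field they expect
  // in the new output (backward compatible).
  if (removed.length === 0) return "BACKWARD";
  // No additions: consumers of the new schema can still parse old
  // output (forward compatible), but old consumers lose fields.
  if (added.length === 0) return "FORWARD";
  return "BREAKING";
}
```

Note the asymmetry this makes visible: adding fields preserves backward compatibility (the registry's primary concern), while removing fields only preserves forward compatibility.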

Migration windows are time-bounded periods during which both the old version and the new version are active. The registry manages this by allowing two PROMOTED versions of the same prompt to coexist. During the migration window, consumers are gradually shifted from the old version to the new version (often in conjunction with a canary rollout controller like Project 11). The window has a start date (when the new version is promoted), a notification date (when consumers receive deprecation warnings), and an end date (when the old version is archived and no longer resolvable).

Dependency tracking is the mechanism that makes compatibility checking possible. The registry maintains a table of consumer registrations: which service consumes which prompt at which version range. When a new version is published, the registry can immediately compute the impact: “4 consumers on ^2.3.0 will be affected by this 3.0.0 major release.” This impact report is attached to the approval workflow, giving reviewers concrete data about the blast radius of the change.
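The impact computation over consumer registrations can be sketched as below. This is a deliberate simplification (names and shapes are hypothetical): it treats a consumer as affected whenever a proposed version's major number escapes the consumer's caret range, which is exactly the "4 consumers on ^2.3.0 affected by 3.0.0" case; a full implementation would evaluate arbitrary range specifiers and schema compatibility too.

```typescript
// Hypothetical registration record and blast-radius query.
interface Registration {
  service: string;
  range: string; // caret ranges only in this sketch, e.g. "^2.3.0"
}

// Consumers whose caret range does not cover the proposed version's major.
function impactedConsumers(regs: Registration[], newVersion: string): string[] {
  const newMajor = Number(newVersion.split(".")[0]);
  return regs
    .filter((r) => Number(r.range.slice(1).split(".")[0]) !== newMajor)
    .map((r) => r.service);
}
```

The resulting list is what feeds the impact report attached to the approval workflow.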

Consumer notification is the communication layer. When a new version is promoted that falls within a consumer’s range, the consumer is notified (via webhook, message queue, or polling endpoint) to update their cached version. When a version enters deprecation, consumers receive warnings with the deprecation date and the recommended migration target. When the migration window closes, consumers still on the old version receive errors directing them to update.

How this fits into the project Compatibility contracts and migration windows are the core of the compatibility checking engine in Project 15. The registry API must support consumer registration, dependency queries, impact analysis, and migration window management. The approval workflow uses compatibility check results to gate promotions.

Definitions & key terms

  • Consumer contract: A formal declaration of what a consuming service expects from a prompt (version range, output schema, behavioral assumptions).
  • Dependency graph: The mapping from prompt artifacts to the consumers that depend on them, enabling impact analysis.
  • Schema compatibility mode: The classification of how two schemas relate (backward, forward, full, or breaking).
  • Migration window: A time-bounded period during which old and new prompt versions coexist, giving consumers time to transition.
  • Impact report: An automatically generated summary of which consumers are affected by a proposed prompt version change.

Mental model diagram (ASCII)

Prompt: refund_policy_assistant

  +-------------------+
  | v2.3.0 (current)  |----+---- Consumer A (^2.0.0, schema: refund.v2)
  | status: PROMOTED  |    |
  +-------------------+    +---- Consumer B (~2.3.0, schema: refund.v2)
                           |
                           +---- Consumer C (^2.3.0, schema: refund.v2)
  +-------------------+
  | v3.0.0 (new)      |
  | status: DRAFT     |
  | schema: refund.v3 |
  +-------------------+

  Compatibility Check for v3.0.0:
  +-------------------------------------------------------------------+
  | Consumer | Range    | Schema Match | Impact     | Action Needed   |
  |----------|----------|--------------|------------|-----------------|
  | A        | ^2.0.0   | NO (v3 != v2)| BREAKING   | Update range    |
  | B        | ~2.3.0   | NO (v3 != v2)| BREAKING   | Update range    |
  | C        | ^2.3.0   | NO (v3 != v2)| BREAKING   | Update range    |
  +-------------------------------------------------------------------+
  Result: PROMOTION BLOCKED -- all consumers on v2.x schema

  Migration Window Plan:
  +-------------------------------------------------------------------+
  | Phase      | Duration | v2.3.0 Status | v3.0.0 Status | Traffic  |
  |------------|----------|---------------|---------------|----------|
  | Announce   | Week 1   | PROMOTED      | STAGING       | 100%/0%  |
  | Canary     | Week 2-3 | PROMOTED      | PROMOTED      | 90%/10%  |
  | Ramp       | Week 4-5 | DEPRECATED    | PROMOTED      | 20%/80%  |
  | Complete   | Week 6   | ARCHIVED      | PROMOTED      | 0%/100%  |
  +-------------------------------------------------------------------+

How it works (step-by-step, with invariants and failure modes)

  1. Consumers register their dependency with the registry: prompt name, version range, expected output schema, and callback webhook. Invariant: every consumer registration must include a valid version range and schema reference. Failure mode: registrations with invalid ranges are rejected.
  2. When a new prompt version is published, the compatibility checker queries all consumer registrations for that prompt name. Invariant: the dependency graph is always up to date (consumers must re-register when they change their requirements). Failure mode: stale registrations lead to missed impact analysis; the registry sends periodic health checks to consumers to detect stale entries.
  3. For each consumer, the checker evaluates: (a) does the new version fall within the consumer’s range? (b) is the new version’s schema compatible with the consumer’s expected schema? The checker uses schema diff rules appropriate to the compatibility mode. Invariant: the checker never approves a promotion that would break a registered consumer without an explicit migration plan. Failure mode: if the schema diff produces an ambiguous result (e.g., a renamed field that might or might not break parsing), the checker flags it as NEEDS_REVIEW rather than automatically allowing or blocking.
  4. If incompatibilities are found, the registry generates an impact report listing affected consumers, the nature of each incompatibility, and recommended actions. This report is attached to the approval workflow. Invariant: no major version can be promoted without a migration plan that addresses every affected consumer. Failure mode: promoting without a migration plan leaves consumers on broken versions with no path forward.
  5. During the migration window, the registry serves both old and new versions. Consumer resolution queries specify which version they want (by range), and the registry returns the appropriate one. The registry tracks migration progress: how many consumers have updated their range to include the new version. Invariant: the migration window cannot close until all registered consumers have migrated or been explicitly exempted. Failure mode: closing the window prematurely orphans consumers still dependent on the old version.

Minimal concrete example

Consumer registration:
  POST /v1/consumers
  {
    "service_name": "refund-processor",
    "prompt_name": "refund_policy_assistant",
    "version_range": "^2.3.0",
    "expected_schema": "schemas/refund_response.v2.json",
    "webhook": "https://refund-processor.internal/prompt-updates"
  }

Impact report for v3.0.0:
  {
    "prompt_name": "refund_policy_assistant",
    "proposed_version": "3.0.0",
    "impact": [
      {
        "consumer": "refund-processor",
        "current_range": "^2.3.0",
        "schema_compatible": false,
        "breaking_fields": ["decision.eligible renamed from refund_eligible"],
        "recommended_action": "Update consumer to handle refund.v3 schema, then update range to ^3.0.0"
      },
      {
        "consumer": "support-dashboard",
        "current_range": "^2.0.0",
        "schema_compatible": false,
        "breaking_fields": ["flat object replaced by nested structure"],
        "recommended_action": "Rewrite parser for nested output, then update range to ^3.0.0"
      }
    ],
    "verdict": "PROMOTION_BLOCKED",
    "migration_plan_required": true
  }

Migration window creation:
  POST /v1/migrations
  {
    "prompt_name": "refund_policy_assistant",
    "from_version": "2.3.0",
    "to_version": "3.0.0",
    "window_start": "2026-02-01",
    "notification_date": "2026-02-08",
    "window_end": "2026-03-15",
    "affected_consumers": ["refund-processor", "support-dashboard"]
  }

Common misconceptions

  • “Compatibility checking is only needed for major versions.” Minor versions can introduce subtle behavioral changes that break consumers even if the schema has not changed structurally. Running the parent evaluation suite against new minor versions catches these.
  • “Consumers will just update when we tell them to.” Without tracked migration windows and automated deprecation enforcement, consumer teams deprioritize updates indefinitely. The registry must enforce deadlines.
  • “Schema compatibility is binary: compatible or not.” Schema compatibility has degrees (backward, forward, full, breaking) and ambiguous cases (renamed fields, changed constraints). The checker must handle nuance, not just yes/no.
  • “Migration windows are overhead we cannot afford.” The alternative is uncoordinated breaking changes that cause production incidents. Migration windows are cheaper than incident response.

Check-your-understanding questions

  1. Why does the registry need consumer registrations rather than just checking compatibility at publish time?
  2. What is the difference between backward and forward schema compatibility, and which matters more for prompt registries?
  3. How should the registry handle a consumer that refuses to migrate before the window closes?
  4. Why must the migration window track progress (how many consumers have migrated) rather than just enforcing a deadline?

Check-your-understanding answers

  1. Without consumer registrations, the registry does not know who uses which prompt at which version. It cannot compute impact reports, generate targeted notifications, or enforce migration deadlines. Checking compatibility at publish time only verifies the schema itself, not the actual consumer dependencies.
  2. Backward compatibility means new output can be parsed by old consumers. Forward compatibility means old output can be parsed by new consumers. For prompt registries, backward compatibility is primary because the prompt is the producer and consumers parse its output. Forward compatibility matters during migration windows when old and new prompts coexist.
  3. The registry should escalate through a defined ladder: warning notifications, then blocking the consumer’s resolution queries from returning the deprecated version (forcing them to explicitly request it with a bypass flag), and finally requiring platform team sign-off for any window extension. The consumer’s engineering lead should be notified at each step.
  4. A deadline alone does not reveal whether the migration is on track. If the window ends in one week but zero of five consumers have migrated, the team needs to know now (not at the deadline) so they can extend the window or escalate. Progress tracking enables early intervention.

Real-world applications

  • Confluent Schema Registry provides backward, forward, and full compatibility modes for Kafka message schemas, with automatic compatibility checking at registration time.
  • Google Cloud API versioning uses sunset dates and migration windows, with automated warnings to consumers using deprecated API versions.
  • Stripe API versioning maintains multiple API versions simultaneously, with each account pinned to a version and explicit upgrade paths documented for each breaking change.
  • Kubernetes API deprecation policy requires at least two minor releases between deprecation announcement and removal, functioning as a migration window.

Where you’ll apply it

  • Phase 2: build the consumer registration endpoint, the compatibility checker, and the impact report generator.
  • Phase 3: implement migration window management with progress tracking and deprecation enforcement.

References

  • “Designing Data-Intensive Applications” by Martin Kleppmann - Chapter 4: schema evolution and compatibility
  • Confluent Schema Registry documentation (compatibility modes)
  • Stripe API versioning guide (migration windows and sunset policies)
  • Apache Avro specification (schema resolution and compatibility rules)

Key insights Compatibility checking is not just a schema diff; it is a dependency graph traversal that connects every proposed change to its concrete consumer impact, enabling informed promotion decisions and coordinated migration windows.

Summary Compatibility contracts formalize what consumers expect from a prompt (version range, output schema, behavioral assumptions). The registry maintains a dependency graph of these contracts, enabling automated impact analysis when new versions are published. Migration windows provide time-bounded coexistence of old and new versions, with progress tracking, consumer notification, and deprecation enforcement. Schema compatibility checking follows established modes (backward, forward, full, breaking) adapted from message schema registries.

Homework/Exercises to practice the concept

  • Design the consumer registration data model and the dependency graph query that answers: “Which consumers would be affected by promoting version X.Y.Z of prompt P?”
  • Implement pseudocode for a schema compatibility checker that takes two JSON schemas (old and new) and classifies their relationship as backward-compatible, forward-compatible, full-compatible, or breaking. Handle added fields, removed fields, renamed fields, and type changes.
  • Create a migration window management plan for transitioning a prompt from v2.x to v3.x across 5 consumer services. Include the timeline, notification schedule, progress milestones, and escalation triggers.

Solutions to the homework/exercises

  • The consumer registration model stores: service_name (unique key), prompt_name, version_range, expected_schema_ref, webhook_url, registered_at, last_health_check. The dependency graph query filters registrations by prompt_name, then for each registration evaluates whether the new version falls within the range and whether the schema is compatible. The query returns a list of {consumer, impact_type, breaking_fields} tuples.
  • The schema compatibility checker should compare field sets: old-only fields indicate backward incompatibility (consumer expects them but new version might not produce them). New-only fields indicate forward incompatibility (old consumers do not expect them). Renamed fields are detected by heuristic (same type, similar name) and flagged as NEEDS_REVIEW. Type changes on shared fields are breaking. The overall classification is the most restrictive result across all fields.
  • The migration plan should have phases: Announcement (week 1, notify all 5 consumers), Canary (weeks 2-3, 10% traffic to v3), Ramp (weeks 4-5, progressive increase), Consumer checkpoint (week 4, 3 of 5 consumers must have migrated or the window extends), Completion (week 6, v2 deprecated), Archival (week 8, v2 no longer resolvable). Escalation triggers: fewer than 2 consumers migrated by week 3, any consumer health check failure during migration, any production incident attributed to version mismatch.

Approval Workflows and Audit Trails

Fundamentals Approval workflows and audit trails provide the governance layer that turns a prompt registry from a storage system into a trusted operational platform. In regulated industries (finance, healthcare, government), every change to a production system must be traceable to a specific person, approved by authorized reviewers, and recorded in an immutable log. Even in unregulated environments, approval workflows prevent accidental promotions of untested prompts and audit trails enable incident investigation. The approval workflow defines who can do what and when; the audit trail records what actually happened. Together, they provide both preventive control (blocking unauthorized changes) and detective control (identifying what went wrong after an incident).

Deep Dive into the concept An approval workflow for prompt artifacts is a state machine. Each prompt version moves through a defined set of states, and each state transition requires authorization from a specific role. A typical workflow has five states: DRAFT (author has published but not submitted for review), REVIEW (submitted and awaiting reviewer assessment), APPROVED (reviewer has signed off), PROMOTED (platform lead has released to production), and DEPRECATED (marked for retirement). Some organizations add intermediate states like STAGING (deployed to staging environment for integration testing) or CANARY (serving a fraction of production traffic).

Each state transition has preconditions. Moving from DRAFT to REVIEW requires: the author is not the reviewer (separation of duties), all required metadata fields are populated, and the automated compatibility check has been run. Moving from REVIEW to APPROVED requires: at least one reviewer has approved, no outstanding blocking comments exist, and the evaluation suite has passed. Moving from APPROVED to PROMOTED requires: the compatibility check shows no breaking changes for pinned consumers (or a migration plan exists), the platform team lead has approved (for major versions), and the promotion window is valid (not during a change freeze).
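A guard function over this transition table can be sketched as follows. The state names follow the text; the transition list is abridged and the precondition names are illustrative (a real implementation would evaluate each check against live registry data rather than accept a precomputed set).

```typescript
// Workflow state machine guard: role check plus named preconditions.
type State = "DRAFT" | "REVIEW" | "APPROVED" | "PROMOTED" | "DEPRECATED";

interface Transition {
  from: State;
  to: State;
  requiredRole: string;
  preconditions: string[]; // names of checks that must pass
}

const transitions: Transition[] = [
  { from: "DRAFT", to: "REVIEW", requiredRole: "AUTHOR",
    preconditions: ["metadata_complete", "compat_check_run"] },
  { from: "REVIEW", to: "APPROVED", requiredRole: "REVIEWER",
    preconditions: ["no_blocking_comments", "eval_suite_passed"] },
  { from: "APPROVED", to: "PROMOTED", requiredRole: "PLATFORM_LEAD",
    preconditions: ["no_change_freeze"] },
];

// Returns an error message, or null when the transition is allowed.
function checkTransition(
  from: State,
  to: State,
  role: string,
  passedChecks: Set<string>
): string | null {
  const t = transitions.find((x) => x.from === from && x.to === to);
  if (!t) return `no transition ${from} -> ${to}`;
  if (t.requiredRole !== role) return `requires role ${t.requiredRole}`;
  const missing = t.preconditions.filter((p) => !passedChecks.has(p));
  return missing.length ? `unmet preconditions: ${missing.join(", ")}` : null;
}
```

Encoding the workflow as data rather than code is what lets different organizations plug in stricter or looser configurations, as the workflow configuration example later in this section shows.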

Role-based access control (RBAC) maps organizational roles to workflow permissions. Common roles include: Author (can publish DRAFT versions), Reviewer (can approve or request changes), Platform Lead (can promote to production or initiate rollback), Auditor (read-only access to all versions and logs), and Admin (can modify workflow configuration and role assignments). The principle of least privilege applies: authors cannot promote their own prompts, reviewers cannot modify prompt content, and platform leads cannot bypass the review step.

Audit trails record every action taken in the registry with immutable, append-only entries. Each audit entry includes: the action type (publish, submit_review, approve, reject, promote, deprecate, rollback), the actor (user ID and role), the timestamp, the target artifact (prompt name and version), the previous state, the new state, the reason or comment, and a diff summary (what changed in the prompt content or metadata). Audit entries are never deleted or modified; corrections are recorded as new entries that reference the corrected entry.

For compliance purposes, the audit trail must be tamper-evident. One approach is to chain entries using cryptographic hashes: each entry includes the hash of the previous entry, creating a hash chain similar to a blockchain. If any entry is modified after the fact, the hash chain breaks, making tampering detectable. A simpler approach is to write audit entries to an append-only log store (like Amazon QLDB or a write-once S3 bucket) where the infrastructure guarantees immutability.
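The hash-chain approach can be sketched in a few lines of TypeScript using Node's built-in crypto module. The entry shape is reduced to two fields for clarity; a real entry would hash the full payload described above (actor, timestamp, target, states, diff summary).

```typescript
import { createHash } from "node:crypto";

// Each entry's hash covers its content plus the previous entry's hash,
// so modifying any past entry invalidates every later hash.
interface AuditEntry {
  action: string;
  actor: string;
  prevHash: string;
  hash: string;
}

function appendEntry(log: AuditEntry[], action: string, actor: string): AuditEntry {
  const prevHash = log.length ? log[log.length - 1].hash : "GENESIS";
  const hash = createHash("sha256")
    .update(prevHash + action + actor)
    .digest("hex");
  const entry = { action, actor, prevHash, hash };
  log.push(entry);
  return entry;
}

// Verify the chain: recompute every hash and check the linkage.
function verifyChain(log: AuditEntry[]): boolean {
  let prevHash = "GENESIS";
  for (const e of log) {
    const expected = createHash("sha256")
      .update(prevHash + e.action + e.actor)
      .digest("hex");
    if (e.prevHash !== prevHash || e.hash !== expected) return false;
    prevHash = e.hash;
  }
  return true;
}
```

Tamper evidence, not tamper prevention, is the guarantee: a modified entry is still readable, but `verifyChain` will flag the break.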

Approval workflows interact with CI/CD pipelines. When a developer submits a prompt for review, the CI pipeline runs automated checks: evaluation suite, compatibility analysis, security scan (for prompt injection patterns), and cost estimation. The results are attached to the review request, giving reviewers data-driven context. When a prompt is promoted, the CD pipeline deploys the new version to the prompt resolution infrastructure. This integration ensures that the approval workflow is not a bottleneck; automated checks handle routine validation, and human review focuses on judgment calls.

How this fits into the project Approval workflows and audit trails are the governance engine of Project 15. Every state transition in the prompt lifecycle passes through the approval workflow, and every mutation is recorded in the audit trail. The API endpoints for publish, review, approve, promote, deprecate, and rollback all write to the audit log and enforce workflow rules.

Definitions & key terms

  • State machine: A formal model where a prompt version exists in one of a finite set of states, with defined transitions between states triggered by authorized actions.
  • Separation of duties: The principle that the author of a change cannot also approve it, preventing unchecked self-promotion.
  • Hash chain: An audit trail structure where each entry includes the hash of the previous entry, creating a tamper-evident sequence.
  • Change freeze: A scheduled period during which no promotions are allowed, typically around major business events.
  • Promotion window: The set of time periods when promotions are permitted (excluding change freezes and maintenance windows).

Mental model diagram (ASCII)

Prompt Version Lifecycle (State Machine):

  +-------+   publish   +--------+   submit    +--------+
  | (new) | ----------> | DRAFT  | ----------> | REVIEW |
  +-------+             +--------+             +--------+
                             ^                   |    |
                             |            reject |    | approve
                             +----- (revise) ----+    |
                                                      v
                                                 +----------+
                                          +----->| APPROVED |
                                          |      +----------+
                                          |           |
                                          |           | promote
                                          |           v
                                          |      +----------+
                                          |      | PROMOTED |
                                          |      +----------+
                                          |           |
                                          |           | deprecate
                                          |           v
                                          |      +------------+
                                          |      | DEPRECATED |
                                          |      +------------+
                                          |           |
                                          |           | archive
                                rollback  |           v
                                (to prev  |      +----------+
                                 promoted)|      | ARCHIVED |
                                          |      +----------+
                                          |
                                     +----------+
                                     | ROLLBACK |
                                     +----------+

  Audit Trail Entry:
  +------------------------------------------------------------------+
  | entry_id: aud_00451                                              |
  | prev_hash: sha256:9f8e7d...                                     |
  | action: PROMOTE                                                  |
  | actor: platform-lead@corp (role: PLATFORM_LEAD)                  |
  | timestamp: 2026-01-15T14:30:00Z                                  |
  | target: refund_policy_assistant v2.3.1                           |
  | prev_state: APPROVED                                             |
  | new_state: PROMOTED                                              |
  | reason: "Eval suite pass rate 99.2%, no breaking consumers"      |
  | diff_summary: "Clarified context formatting (3 lines changed)"   |
  | eval_run_id: eval_run_0089                                       |
  | compatibility_report_id: compat_0045                             |
  | entry_hash: sha256:a1b2c3...                                    |
  +------------------------------------------------------------------+

How it works (step-by-step, with invariants and failure modes)

  1. Author publishes a DRAFT version via the API. The audit trail records the publish action with the full prompt content hash, metadata, and actor identity. Invariant: every mutation to the registry creates an audit entry before the mutation is committed. Failure mode: if the audit write fails, the mutation is rolled back (audit-first pattern).
  2. Author submits the version for review. The workflow engine verifies preconditions: all required metadata is populated, the automated compatibility check has been run, and the author is not self-reviewing. Invariant: the submit action includes the compatibility report ID and evaluation run ID. Failure mode: missing reports block the submission with a clear error listing what is needed.
  3. A reviewer with the REVIEWER role examines the change. They see the diff, the compatibility report, and the evaluation results. They can approve, request changes, or reject. Each action creates an audit entry with the reviewer’s comments. Invariant: at least one reviewer must approve before the version can be promoted. Failure mode: conflicting reviews (one approve, one reject) are resolved by requiring all blocking comments to be addressed.
  4. A platform lead promotes the approved version. The workflow engine verifies: the version is APPROVED, no change freeze is active, and the compatibility report shows either no breaking changes or an active migration plan. The audit trail records the promotion with the platform lead’s identity and reason. Invariant: the promotion is atomic (the version becomes resolvable by consumer queries at the exact moment the audit entry is committed). Failure mode: if the promotion write fails, the audit entry records the failure and the version remains APPROVED.
  5. If an incident occurs, the platform lead can initiate a rollback. The rollback action promotes the previous PROMOTED version and deprecates the current one. The audit trail records the rollback with the incident ID and reason. Invariant: rollback is always available regardless of workflow state (emergency bypass). Failure mode: if no previous PROMOTED version exists, the rollback fails with a clear error.
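The audit-first pattern from step 1 can be sketched as a small wrapper. The storage interface is hypothetical; the point is the ordering guarantee: the audit record is written before the mutation runs, and a failed audit write abandons the mutation entirely.

```typescript
// Audit-first transition: append the audit entry, then commit the change.
interface AuditStore {
  append(entry: string): void; // throws when the write fails
}

function transition(
  audit: AuditStore,
  commit: () => void, // applies the state change
  entry: string
): boolean {
  try {
    audit.append(entry); // audit first ...
  } catch {
    return false; // ... and if it fails, nothing is mutated
  }
  commit();
  return true;
}
```

A production version would also need to handle the reverse failure (audit written but commit failed) by recording a compensating audit entry, matching the invariant in step 4.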

Minimal concrete example

Approval workflow configuration:
  {
    "workflow_name": "standard_prompt_review",
    "states": ["DRAFT", "REVIEW", "APPROVED", "PROMOTED", "DEPRECATED", "ARCHIVED"],
    "transitions": [
      { "from": "DRAFT", "to": "REVIEW", "required_role": "AUTHOR", "preconditions": ["metadata_complete", "compat_check_run"] },
      { "from": "REVIEW", "to": "APPROVED", "required_role": "REVIEWER", "preconditions": ["no_blocking_comments", "eval_suite_passed"] },
      { "from": "REVIEW", "to": "DRAFT", "required_role": "REVIEWER", "preconditions": [] },
      { "from": "APPROVED", "to": "PROMOTED", "required_role": "PLATFORM_LEAD", "preconditions": ["no_change_freeze", "compat_report_clean_or_migration_plan"] },
      { "from": "PROMOTED", "to": "DEPRECATED", "required_role": "PLATFORM_LEAD", "preconditions": ["migration_window_active"] },
      { "from": "DEPRECATED", "to": "ARCHIVED", "required_role": "ADMIN", "preconditions": ["migration_window_closed", "no_active_consumers"] }
    ],
    "emergency_bypass": { "action": "ROLLBACK", "required_role": "PLATFORM_LEAD", "audit_required": true }
  }

Audit trail query:
  GET /v1/audit?prompt=refund_policy_assistant&from=2026-01-01&to=2026-01-31

  [
    { "entry_id": "aud_00449", "action": "PUBLISH", "actor": "alice@corp", "version": "2.3.1", "state_change": "-> DRAFT" },
    { "entry_id": "aud_00450", "action": "APPROVE", "actor": "bob@corp", "version": "2.3.1", "state_change": "REVIEW -> APPROVED" },
    { "entry_id": "aud_00451", "action": "PROMOTE", "actor": "carol@corp", "version": "2.3.1", "state_change": "APPROVED -> PROMOTED" }
  ]

Common misconceptions

  • “Approval workflows slow down iteration.” Well-designed workflows with automated precondition checks (eval suite, compatibility report) reduce review time by giving reviewers all the data they need upfront. The bottleneck is usually insufficient automation, not the workflow itself.
  • “Audit trails are only needed for compliance.” Audit trails are the primary tool for incident investigation. When a prompt regression causes a production issue, the audit trail tells you exactly what changed, when, and who approved it, cutting diagnosis time from hours to minutes.
  • “Separation of duties is too strict for small teams.” Even on a two-person team, having a second person review prompt changes catches errors. The workflow can be simplified (fewer states, fewer roles) but not eliminated.
  • “Emergency rollbacks should skip the audit trail.” Emergency rollbacks must be recorded in the audit trail with even more detail than normal changes (incident ID, timestamp, blast radius, reason). Skipping the audit trail during emergencies creates blind spots in exactly the situations where traceability matters most.
  • “Hash chains for audit trails are overkill.” For regulated environments, tamper evidence is a compliance requirement. Even for unregulated environments, hash chains are cheap to implement and provide strong guarantees against accidental or malicious audit modification.

Check-your-understanding questions

  1. Why must the audit entry be written before the state transition is committed (audit-first pattern)?
  2. What is the purpose of separation of duties in the prompt approval workflow?
  3. How does the hash chain in the audit trail detect tampering?
  4. Why should emergency rollbacks still follow the audit trail rather than bypassing it?
  5. What information should a reviewer have access to when evaluating a prompt version for approval?

Check-your-understanding answers

  1. If the state transition is committed first and the audit write fails, you have an unrecorded change. If the audit entry is written first and the state transition fails, you have a recorded failed attempt (which is useful information). The audit-first pattern ensures that the audit trail is always at least as complete as the actual state, never less.
  2. Separation of duties prevents a single person from introducing and approving a change, which could be accidental (author is blind to their own errors) or malicious (author intentionally introduces a harmful prompt). Requiring a second person provides an independent quality check.
  3. Each audit entry includes the hash of the previous entry. If an entry in the middle of the chain is modified (e.g., changing the actor or timestamp), its hash changes, which breaks the link from the next entry. Verification walks the chain from the first entry forward, recomputing hashes and comparing them to the stored next-entry reference.
  4. Emergency rollbacks are the highest-risk changes (made under time pressure, with limited analysis). Recording them in the audit trail with incident context enables post-incident review: was the rollback justified? Was it executed correctly? What should change to prevent the need for future emergency rollbacks?
  5. Reviewers should see: the diff between the new version and its parent, the automated compatibility report, the evaluation suite results (pass rate, regression details), the change description from the author, the list of affected consumers, and any security scan findings. This data-driven review reduces the chance of rubber-stamping or unnecessary delays.
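The audit-first pattern from answer 1 can be sketched as follows. `appendAudit` and `commitState` stand in for real storage operations and are assumptions for illustration; a production implementation would make these durable, likely asynchronous writes.

```typescript
// Sketch of the audit-first pattern: record the attempt before mutating state,
// so the audit trail is always at least as complete as the actual state.
// appendAudit and commitState are hypothetical storage hooks.
function transition(
  appendAudit: (e: { action: string; outcome: string }) => void,
  commitState: () => void,
  action: string,
): { ok: boolean; error?: string } {
  // 1. Record the attempt first: the trail is never behind reality.
  appendAudit({ action, outcome: "ATTEMPTED" });
  try {
    commitState(); // 2. Apply the state transition.
  } catch (err) {
    // 3. A failed transition is still recorded: useful information, not a blind spot.
    appendAudit({ action, outcome: "FAILED" });
    return { ok: false, error: String(err) };
  }
  appendAudit({ action, outcome: "COMMITTED" });
  return { ok: true };
}
```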

Real-world applications

  • SOC 2 compliance requires that all production changes be authorized, logged, and traceable, directly mapping to the approval workflow and audit trail in a prompt registry.
  • GitHub Pull Request workflows implement a simplified version of this pattern: authors create, reviewers approve, branch protection rules enforce required reviews before merge, and the Git log serves as the audit trail.
  • AWS CloudTrail records every API call with the actor, timestamp, and parameters, providing the same audit trail capability for infrastructure changes that we need for prompt changes.
  • Google’s Borg system requires change approval for production deployments, with automated checks running before human review, matching the CI-integrated approval workflow described here.

Where you’ll apply it

  • Phase 1: implement the state machine and RBAC enforcement for the prompt lifecycle.
  • Phase 2: build the audit trail with immutable entries and hash chain verification.
  • Phase 3: integrate the approval workflow with CI pipeline triggers (eval suite, compatibility check) and add the emergency rollback path.

References

  • “Site Reliability Engineering” by Google - Chapters on change management and incident response
  • “Accelerate” by Forsgren et al. - Chapters on deployment automation and lean management
  • SOC 2 Type II compliance requirements for change management
  • AWS CloudTrail documentation (audit trail design patterns)
  • “Designing Data-Intensive Applications” by Martin Kleppmann - Chapter 12: The Future of Data Systems (auditability)

Key insights The approval workflow is the preventive control that stops bad changes before they reach production; the audit trail is the detective control that explains what happened after an incident. Both are necessary, and neither is sufficient alone.

Summary Approval workflows enforce a state machine over prompt version lifecycles with role-based authorization, separation of duties, and automated precondition checks. Audit trails record every action immutably, with tamper-evident hash chains for compliance-grade traceability. Together, they transform the registry from a passive storage system into an active governance platform that prevents unauthorized changes and enables rapid incident investigation.

Homework/Exercises to practice the concept

  • Design the state machine for a prompt version lifecycle with at least 6 states and all transitions. For each transition, specify the required role, preconditions, and what the audit entry should contain.
  • Implement pseudocode for a tamper-evident audit trail using hash chains. Include the write function (append entry with prev_hash) and the verify function (walk the chain and check hash consistency).
  • Create an RBAC matrix for a 4-role system (Author, Reviewer, Platform Lead, Auditor) showing which actions each role can perform and which states they can transition.

Solutions to the homework/exercises

  • The state machine should include: DRAFT, REVIEW, APPROVED, STAGING (optional), PROMOTED, DEPRECATED, ARCHIVED, and ROLLBACK (transient). Key constraints: DRAFT -> REVIEW requires author != reviewer; REVIEW -> APPROVED requires eval_pass and compat_check; APPROVED -> PROMOTED requires platform_lead and no_change_freeze; PROMOTED -> DEPRECATED requires migration_window_active; DEPRECATED -> ARCHIVED requires no_active_consumers. Audit entries for each transition should include: entry_id, prev_hash, action, actor (id + role), timestamp, target (prompt + version), prev_state, new_state, reason, and references to supporting artifacts (eval_run_id, compat_report_id).
  • The hash chain pseudocode: write(entry): compute entry.prev_hash = hash(last_entry); compute entry.entry_hash = hash(entry_without_hash_field); append to log. verify(): for each entry in order, recompute hash(entry_without_hash_field) and compare to stored entry_hash; also check entry.prev_hash == previous_entry.entry_hash. If any mismatch, report the tampered entry index.
  • RBAC matrix: Author can PUBLISH, SUBMIT_REVIEW. Reviewer can APPROVE, REJECT, REQUEST_CHANGES. Platform Lead can PROMOTE, ROLLBACK, DEPRECATE. Auditor can READ_ALL, VERIFY_CHAIN (no write permissions). Admin (bonus role) can CONFIGURE_WORKFLOW, ASSIGN_ROLES, ARCHIVE.
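The hash-chain write/verify pseudocode from the second exercise can be fleshed out in TypeScript roughly as follows. This is a minimal in-memory sketch (entry contents reduced to a single `data` string); a real audit trail would hash the full canonical entry and persist to append-only storage.

```typescript
// Minimal hash-chain sketch of the write/verify exercise (illustrative only).
import { createHash } from "node:crypto";

type Entry = { data: string; prev_hash: string; entry_hash: string };

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// write(): link the new entry to the hash of the previous one.
function append(log: Entry[], data: string): void {
  const prev_hash = log.length ? log[log.length - 1].entry_hash : "GENESIS";
  log.push({ data, prev_hash, entry_hash: sha256(prev_hash + data) });
}

// verify(): recompute every hash and check each back-link; return the index
// of the first tampered entry, or -1 if the chain is intact.
function verify(log: Entry[]): number {
  for (let i = 0; i < log.length; i++) {
    const expectedPrev = i === 0 ? "GENESIS" : log[i - 1].entry_hash;
    if (log[i].prev_hash !== expectedPrev) return i;
    if (log[i].entry_hash !== sha256(log[i].prev_hash + log[i].data)) return i;
  }
  return -1;
}
```

Modifying any entry in the middle of the chain changes its recomputed hash, which `verify` catches immediately without consulting any external system.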

3. Project Specification

3.1 What You Will Build

A registry API for versioned prompt artifacts, compatibility checks, approvals, and audit history.

3.2 Functional Requirements

  1. Register prompt artifacts with semantic versions and contract references.
  2. Run compatibility checks against declared consumer ranges.
  3. Support approval workflow before version promotion.
  4. Expose audit log of who changed what and when.

3.3 Non-Functional Requirements

  • Performance: Registry write/read operations under 150 ms p95.
  • Reliability: Version comparisons and compatibility checks are deterministic.
  • Security/Policy: Only authorized maintainers can publish or promote versions.

3.4 Example Usage / Output

$ npm run dev --workspace p15-prompt-registry
[ready] listening on http://localhost:3000

$ curl -s http://localhost:3000/v1/prompts \
  -H 'content-type: application/json' \
  -d '{
  "name": "refund_policy_assistant",
  "version": "2.3.0",
  "contract": "contracts/refund.v2.json",
  "owner": "support-platform"
}' | jq
{
  "id": "prm_00231",
  "name": "refund_policy_assistant",
  "version": "2.3.0",
  "status": "REGISTERED",
  "compatibility": "PASS"
}


3.5 Data Formats / Schemas / Protocols

  • Prompt manifest JSON: name, version, contract ref, owner, tags.
  • Compatibility report JSON: impacted consumers and breaking fields.
  • Audit log JSONL with actor, action, timestamp, and diff summary.

3.6 Edge Cases

  • Two teams publish same semantic version simultaneously.
  • Contract reference points to missing artifact.
  • Consumer declares invalid version range.
  • Rollback to prior version with revoked policy metadata.

3.7 Real World Outcome

This project is complete when your API can serve valid requests with typed responses and reject invalid/high-risk requests with a unified error shape.

3.7.1 How to Run (Copy/Paste)

$ npm run dev --workspace p15-prompt-registry

3.7.2 Golden Path Demo (Deterministic)

Use fixed fixture payloads and verify the same response shape and decision fields every run.

3.7.3 API Endpoints

| Method | Endpoint | Purpose |
|--------|----------|---------|
| POST | /v1/prompts | Publish a new prompt version |
| GET | /v1/prompts/:name | Resolve version by name and range |
| POST | /v1/prompts/:name/:version/approve | Approve a prompt version |
| POST | /v1/prompts/:name/:version/promote | Promote to production |
| POST | /v1/consumers | Register a consumer dependency |
| GET | /v1/compatibility/:name/:version | Run compatibility check |
| GET | /v1/audit | Query audit trail |
| POST | /v1/migrations | Create migration window |

3.7.4 Success Response Example

$ curl -s http://localhost:3000/v1/prompts \
  -H 'content-type: application/json' \
  -d '{
  "name": "refund_policy_assistant",
  "version": "2.3.0",
  "contract": "contracts/refund.v2.json",
  "owner": "support-platform"
}' | jq
{
  "id": "prm_00231",
  "name": "refund_policy_assistant",
  "version": "2.3.0",
  "status": "REGISTERED",
  "compatibility": "PASS"
}

3.7.5 Error Response Example

$ curl -s http://localhost:3000/v1/prompts \
  -H 'content-type: application/json' \
  -d '{
  "name": "refund_policy_assistant",
  "version": "2.3.0",
  "contract": "contracts/refund.v3_breaking.json",
  "owner": "support-platform"
}' | jq
{
  "error": {
    "code": "COMPATIBILITY_FAIL",
    "message": "Breaking contract change detected for consumers on v2.x.",
    "trace_id": "trc_p15_311",
    "project": "P15"
  }
}

4. Solution Architecture

4.1 High-Level Design

                          +------------------------+
                          |   Registry API         |
                          |  (publish, resolve,    |
                          |   approve, promote)    |
                          +------------------------+
                           /         |          \
                          v          v           v
            +-------------+  +--------------+  +---------------+
            | Blob Store  |  | Metadata     |  | Audit Trail   |
            | (content-   |  | Store        |  | (append-only  |
            |  addressed) |  | (versions,   |  |  hash chain)  |
            +-------------+  |  consumers,  |  +---------------+
                             |  workflows)  |
                             +--------------+
                                    |
                                    v
                          +------------------------+
                          | Compatibility Checker  |
                          | (schema diff, range    |
                          |  resolution, impact)   |
                          +------------------------+
                                    |
                                    v
                          +------------------------+
                          | Approval Workflow      |
                          | (state machine, RBAC,  |
                          |  CI integration)       |
                          +------------------------+

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Registry API | Stores and retrieves prompt artifacts by version. | Semantic versioning is required and validated. |
| Blob Store | Content-addressable storage for prompt templates. | SHA-256 hashing with deduplication. |
| Metadata Store | Version graphs, consumer registrations, workflow state. | Queryable relational or document database. |
| Compatibility Checker | Detects breaking changes against consumers. | Fail promotion on unresolved breakage. |
| Approval Workflow | State machine enforcing RBAC and preconditions. | Separation of duties required. |
| Audit Trail | Persists immutable change history. | Hash-chained, append-only entries. |

4.3 Data Structures (No Full Code)

PromptArtifact:
- name: string
- version: semver
- content_hash: sha256
- parent_version: semver | null
- author: actor_id
- status: DRAFT | REVIEW | APPROVED | PROMOTED | DEPRECATED | ARCHIVED
- model_compatibility: string[]
- output_schema_ref: string
- tags: string[]
- change_description: string
- created_at: timestamp

ConsumerRegistration:
- service_name: string
- prompt_name: string
- version_range: semver_range
- expected_schema_ref: string
- webhook_url: string

AuditEntry:
- entry_id: string
- prev_hash: sha256
- action: PUBLISH | SUBMIT | APPROVE | REJECT | PROMOTE | DEPRECATE | ROLLBACK
- actor: { id: string, role: string }
- timestamp: iso8601
- target: { prompt_name: string, version: semver }
- prev_state: string
- new_state: string
- reason: string
- references: { eval_run_id?, compat_report_id? }
- entry_hash: sha256

4.4 Algorithm Overview

Key algorithm: Version resolution with compatibility checking

  1. Parse consumer’s version range into comparator set.
  2. Query all PROMOTED versions of the requested prompt name.
  3. Filter versions satisfying the comparator set.
  4. Sort descending by semver precedence.
  5. Return the highest matching version with content hash and metadata.
  6. For promotion requests: traverse the consumer dependency graph, run schema diffs against each consumer’s expected schema, and generate an impact report.

Complexity Analysis (conceptual):

  • Time: O(V) for version resolution where V is the number of promoted versions. O(V * C) for compatibility checking where C is the number of registered consumers.
  • Space: O(1) for resolution (streaming filter), O(C) for impact report storage.
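Steps 1-5 of the resolution algorithm can be sketched without dependencies for the caret case only; a production registry should use the battle-tested node-semver library (recommended in section 5.11) rather than a hand-rolled parser. Everything below is an illustrative simplification.

```typescript
// Dependency-free sketch of resolution steps 1-5 for caret ranges only
// (e.g. "^2.3.0" means >=2.3.0 and <3.0.0, assuming major >= 1).
// Use the node-semver library in real code; it handles all range syntaxes.
type V = [number, number, number];

const parse = (s: string): V => s.split(".").map(Number) as V;
const cmp = (a: V, b: V) => a[0] - b[0] || a[1] - b[1] || a[2] - b[2];

function satisfiesCaret(version: V, base: V): boolean {
  return version[0] === base[0] && cmp(version, base) >= 0;
}

// promoted: version strings of all PROMOTED versions of the prompt (step 2).
function resolve(range: string, promoted: string[]): string | null {
  if (!range.startsWith("^")) throw new Error("sketch handles caret ranges only");
  const base = parse(range.slice(1)); // step 1: parse the comparator
  const matches = promoted.map(parse).filter((v) => satisfiesCaret(v, base)); // step 3
  if (matches.length === 0) return null; // consumer range matches nothing
  matches.sort(cmp); // step 4: semver precedence
  return matches[matches.length - 1].join("."); // step 5: highest match
}
```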

5. Implementation Guide

5.1 Development Environment Setup

# 1) Install dependencies
# 2) Prepare fixtures under fixtures/
# 3) Run the project command(s) listed in section 3.7

5.2 Project Structure

p15/
├── src/
│   ├── api/              # Express/Hono route handlers
│   ├── registry/         # Publish, resolve, version logic
│   ├── compatibility/    # Schema diff, consumer tracking
│   ├── workflow/         # State machine, RBAC
│   ├── audit/            # Append-only log, hash chain
│   └── storage/          # Blob store, metadata store adapters
├── fixtures/
│   ├── prompts/          # Sample prompt templates
│   ├── schemas/          # Output schema fixtures
│   └── consumers/        # Consumer registration fixtures
├── policies/
├── out/
└── README.md

5.3 The Core Question You’re Answering

“How do teams collaborate on prompt artifacts without breaking downstream consumers?”

This question matters because it forces you to build a system where changes are tracked, impacts are analyzed, and promotions are gated, replacing the ad-hoc prompt editing that causes production regressions in most organizations.

5.4 Concepts You Must Understand First

  1. Content-addressable storage
    • How do container registries and Git use content hashing for immutable artifact storage?
    • Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 4
  2. Semantic versioning and range resolution
    • How do package managers resolve version ranges to concrete versions?
    • Book Reference: Semantic Versioning 2.0.0 specification
  3. Schema evolution and compatibility
    • How does Confluent Schema Registry classify schema changes as backward/forward/breaking?
    • Book Reference: “Designing Data-Intensive Applications” by Kleppmann - Ch. 4
  4. Change management and approval workflows
    • How do CI/CD systems integrate automated checks with human review?
    • Book Reference: “Accelerate” by Forsgren et al.

5.5 Questions to Guide Your Design

  1. Storage and identity
    • How will you compute and store content-addressable hashes?
    • What is the canonical form of a prompt template for hashing purposes?
    • How do you handle deduplication when two versions have identical content?
  2. Versioning and resolution
    • How will you implement semver range resolution (caret, tilde, exact, range)?
    • How do you validate that the declared version bump matches the actual change scope?
    • What happens when a consumer’s range matches no promoted version?
  3. Compatibility and migration
    • How will you diff output schemas to detect breaking changes?
    • How do you track consumer dependencies and compute impact reports?
    • How will you manage migration windows with progress tracking?
  4. Governance and auditability
    • How will you implement the state machine for the approval workflow?
    • How do you enforce separation of duties?
    • How will you make the audit trail tamper-evident?

5.6 Thinking Exercise

Pre-Mortem for Prompt Registry + Versioning Service

Before implementing, write down 10 ways this project can fail in production. Classify each failure into: storage, versioning, compatibility, governance, or operations.

Questions to answer:

  • Which failures can be prevented by schema validation at publish time?
  • Which failures require runtime monitoring (e.g., consumer health checks)?
  • Which failures are organizational rather than technical (e.g., teams bypassing the registry)?

5.7 The Interview Questions They’ll Ask

  1. “Why is semantic versioning useful for prompts when prompt behavior is probabilistic?”
  2. “How would you design a compatibility check for prompt output schemas?”
  3. “What qualifies as a breaking change in a prompt context?”
  4. “How do you enforce separation of duties without creating bottlenecks?”
  5. “Describe how you would implement a tamper-evident audit trail.”
  6. “How do migration windows interact with canary rollouts?”

5.8 Hints in Layers

Hint 1: Start with the blob store and metadata store Implement content-addressable publish and query before adding versioning logic. Verify that the same content always produces the same hash.

Hint 2: Add version resolution before compatibility checking Build the semver range resolver using known-good fixtures. Verify that ^2.3.0 resolves correctly against a set of test versions.

Hint 3: Layer in consumer registrations and impact reports You cannot reason about breakage without knowing who consumes what. Build the dependency graph before the compatibility checker.

Hint 4: Add the approval workflow as a state machine Model states and transitions explicitly. Use a configuration object (not if-else chains) to define the workflow, making it testable and modifiable.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Schema evolution | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 4: Encoding and Evolution |
| Engineering throughput | “Accelerate” by Forsgren et al. | Change management chapters |
| Service governance | “Building Microservices” by Sam Newman | API governance chapters |
| Production reliability | “Site Reliability Engineering” by Google | Change management chapters |

5.10 Implementation Phases

Phase 1: Foundation (Registry Core)

  • Implement the blob store with content-addressable hashing (SHA-256).
  • Build the metadata store schema for prompt artifacts and versions.
  • Implement the publish endpoint and version resolution query.
  • Checkpoint: Publishing a prompt returns a content hash, and resolving by name and range returns the correct version.
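The content-addressable core of Phase 1 reduces to two small functions: canonicalize, then hash. The canonical form below (normalize line endings, trim) is one possible choice for illustration; whatever form you pick must be documented and applied consistently, per the "hash collisions" pitfall in section 7.1.

```typescript
// Sketch of content-addressable publish: canonicalize, then SHA-256.
// The canonicalization rules here are an illustrative assumption.
import { createHash } from "node:crypto";

function canonicalize(template: string): string {
  return template.replace(/\r\n/g, "\n").trim();
}

function contentHash(template: string): string {
  return createHash("sha256").update(canonicalize(template), "utf8").digest("hex");
}

// Identical content always yields the same hash, enabling deduplication:
// two versions whose hashes match can share a single blob.
```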

Phase 2: Compatibility and Governance

  • Add consumer registration endpoint and dependency graph queries.
  • Build the schema compatibility checker (backward, forward, breaking classification).
  • Implement the approval workflow state machine with RBAC.
  • Generate impact reports for promotion requests.
  • Checkpoint: Publishing a breaking version triggers a compatibility failure, and only authorized roles can transition states.
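A first cut of the Phase 2 schema compatibility checker might classify changes like this. The `Schema` shape and the rules are deliberately simplified assumptions (top-level fields only, `required` carried but ignored); a real checker must recurse into nested objects and handle the full backward/forward/breaking taxonomy.

```typescript
// Simplified compatibility classifier sketch (illustrative assumptions):
// compares top-level fields of two JSON-Schema-like objects.
type Schema = { properties: Record<string, { type: string }>; required: string[] };
type Verdict = "COMPATIBLE" | "BACKWARD_COMPATIBLE" | "BREAKING";

function classify(oldS: Schema, newS: Schema): Verdict {
  // Removing a field or changing its type breaks consumers that read it.
  for (const name of Object.keys(oldS.properties)) {
    const after = newS.properties[name];
    if (!after || after.type !== oldS.properties[name].type) return "BREAKING";
  }
  // Fields only added: existing consumers can ignore them.
  const added = Object.keys(newS.properties).filter((f) => !(f in oldS.properties));
  return added.length > 0 ? "BACKWARD_COMPATIBLE" : "COMPATIBLE";
}
```

Wiring this verdict into the promotion precondition (`BREAKING` blocks unless a migration plan exists) is what turns the checker into a gate rather than a report.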

Phase 3: Operational Hardening

  • Build the audit trail with hash-chained immutable entries.
  • Implement migration window management with progress tracking.
  • Add emergency rollback path.
  • Document runbook and incident/debug flow.
  • Checkpoint: Full lifecycle (publish -> approve -> promote -> deprecate -> archive) works with audit entries for every transition.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Content hashing | SHA-256 vs SHA-1 vs BLAKE3 | SHA-256 | Standard in container registries, strong collision resistance, widely supported |
| Metadata store | Postgres vs SQLite vs DynamoDB | Postgres | Supports complex queries (range resolution, dependency graphs), ACID transactions |
| Audit storage | Same DB vs append-only log vs external service | Append-only table with hash chain | Tamper-evident without external dependencies |
| Version resolution | Custom vs npm/semver library | Use semver library | Battle-tested range resolution, avoid reimplementing edge cases |
| Workflow engine | Custom state machine vs off-the-shelf | Custom state machine | Domain-specific preconditions and RBAC rules require custom logic |

6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit Tests | Validate deterministic building blocks | Content hashing, semver range resolution, schema diff classification |
| Integration Tests | Verify end-to-end project path | Publish -> resolve, publish -> compatibility check -> impact report |
| Edge Case Tests | Ensure robust failure handling | Duplicate version publish, stale consumer, hash chain verification |

6.2 Critical Test Cases

  1. Publishing the same content with different version numbers produces the same content hash.
  2. A breaking schema change with a minor version bump is rejected.
  3. Version range ^2.3.0 correctly resolves to the highest compatible PROMOTED version.
  4. Unauthorized state transitions are blocked with clear error messages.
  5. The audit trail hash chain detects an artificially modified entry.
  6. Replay with same fixtures produces identical responses.

6.3 Test Data

fixtures/prompts/refund_assistant_v2.3.0.json
fixtures/prompts/refund_assistant_v2.3.1.json
fixtures/prompts/refund_assistant_v3.0.0_breaking.json
fixtures/schemas/refund_response.v2.json
fixtures/schemas/refund_response.v3.json
fixtures/consumers/refund_processor.json
fixtures/consumers/support_dashboard.json

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| “Consumers broke after upgrade” | Compatibility checks ignored schema structural changes. | Expand checker to compare field names, types, and required/optional status recursively. |
| “No one knows who changed the prompt” | Audit metadata is incomplete or missing actor identity. | Require actor/reason fields on every mutation; fail the mutation if audit write fails. |
| “Registry became source of confusion” | No ownership, lifecycle states, or clear naming conventions. | Add ownership metadata, enforce prompt naming conventions, and require explicit state transitions. |
| “Hash collisions between versions” | Canonical form is not consistent (whitespace, encoding). | Define and document the canonical form; normalize before hashing; add collision detection. |
| “Migration window expired with unmigrated consumers” | No progress tracking or escalation triggers. | Track consumer migration status; send reminders at 50%, 75%, and 90% of the window; escalate to team leads. |

7.2 Debugging Strategies

  • Re-run the publish and resolution flow with verbose logging enabled to see each step: hash computation, version matching, compatibility evaluation.
  • Compare the content hash of a problematic version against the expected hash by recomputing from the stored template text.
  • Walk the audit trail hash chain to find any inconsistencies that might indicate data corruption.
  • Query the dependency graph for the affected prompt to see which consumers reported issues.

7.3 Performance Traps

  • Version resolution scans all promoted versions; add a B-tree index on (prompt_name, status, version) for fast range queries.
  • Compatibility checking against many consumers can be slow; run checks in parallel and cache schema diff results keyed by (old_schema_hash, new_schema_hash).
  • Audit trail writes on the hot path (every API call) must be fast; use asynchronous writes with a write-ahead log to avoid blocking the response.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a web UI for browsing the prompt catalog with version history and diff views.
  • Add support for tagging prompt versions (e.g., “latest”, “stable”, “canary”).

8.2 Intermediate Extensions

  • Integrate with the evaluation harness (Project 1) to automatically run eval suites on publish.
  • Add automated version bump suggestion based on schema diff and eval results.
  • Build a migration progress dashboard showing consumer status per migration window.

8.3 Advanced Extensions

  • Implement federated registries where multiple teams run their own registries with cross-registry dependency resolution.
  • Add CI/CD pipeline integration that blocks merges if the prompt version is not registered and approved.
  • Build a dependency graph visualizer showing all prompts, consumers, and their version relationships.

9. Real-World Connections

9.1 Industry Applications

  • PromptOps platform teams operating AI features under compliance constraints (finance, healthcare).
  • Internal AI governance tooling for release safety and incident response at enterprise scale.
  • MLOps teams extending their model registries to include prompt artifacts as first-class versioned entities.
  • MLflow Prompt Registry: commit-based versioning for GenAI prompts with evaluation integration.
  • Braintrust: content-addressable prompt versioning with environment-based deployment pipelines.
  • Langfuse: prompt management with versioning and observability integration.
  • Confluent Schema Registry: schema compatibility checking patterns applicable to prompt output schemas.

9.2 Interview Relevance

  • Demonstrates understanding of artifact management patterns (content-addressable storage, immutability, version lineage).
  • Shows ability to design dependency tracking and automated compatibility checking.
  • Proves governance discipline: approval workflows, RBAC, audit trails, and compliance-grade traceability.

10. Resources

10.1 Essential Reading

  • Semantic Versioning 2.0.0 specification (semver.org)
  • “Designing Data-Intensive Applications” by Martin Kleppmann - Chapter 4: Encoding and Evolution
  • MLflow Prompt Registry documentation
  • Confluent Schema Registry documentation (compatibility modes)

10.2 Video Resources

  • Talks on MLOps model registry architecture and version management patterns.
  • Conference talks on schema evolution and backward compatibility in distributed systems.

10.3 Tools & Documentation

  • node-semver library documentation (npm semver range resolution)
  • Docker Registry HTTP API V2 specification (content-addressable storage patterns)
  • JSON Schema specification and JSON Schema diff tools.
  • Project 1 (Prompt Contract Harness): produces the contracts and evaluation suites that the registry references.
  • Project 11 (Canary Prompt Rollout Controller): uses the registry’s migration windows and promoted versions for traffic routing.
  • Project 14 (Adversarial Eval Forge): provides the eval suite that runs as a precondition in the approval workflow.
  • Project 18 (Production Prompt Platform Capstone): integrates the registry as a core subsystem in the full PromptOps platform.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain why prompts need content-addressable storage and how it works.
  • I can describe the semver rules for prompt artifacts and classify changes as MAJOR/MINOR/PATCH.
  • I can design a compatibility checker that uses consumer registrations and schema diffs.
  • I can explain the approval workflow state machine and separation of duties.
  • I can describe how a tamper-evident audit trail works using hash chains.

11.2 Implementation

  • Publishing a prompt returns a content hash and stores it immutably.
  • Version resolution correctly handles caret, tilde, and exact ranges.
  • Compatibility checking detects breaking schema changes and generates impact reports.
  • Approval workflow enforces RBAC and blocks unauthorized state transitions.
  • Audit trail records every mutation with hash chain integrity.

11.3 Growth

  • I can describe the tradeoffs between centralized and federated registry architectures.
  • I can explain this project’s design in an interview setting with concrete examples.
  • I can identify how the registry integrates with CI/CD pipelines and adjacent projects.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Prompt artifacts can be published with content-addressable hashes and semantic versions.
  • Version resolution by name and range works correctly for PROMOTED versions.
  • At least one failure-path scenario (breaking change, unauthorized promotion) returns a unified error.

Full Completion:

  • Consumer registration, compatibility checking, and impact reports are functional.
  • Approval workflow enforces state machine transitions with RBAC.
  • Audit trail with hash chain records every mutation.
  • Migration window management with progress tracking.

Excellence (Above & Beyond):

  • Integrates with CI pipeline to automate eval suite and compatibility check on publish.
  • Federated registry support or cross-registry dependency resolution.
  • Demonstrates incident drill: rollback path with audit trail verification.