Project 17: Metadata Optimization and Discoverability Evals
Build a measurable optimization loop for app-directory metadata, prompt invocation quality, and first-action conversion.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Main Programming Language | N/A (evaluation + analytics) |
| Alternative Programming Languages | TypeScript, Python |
| Coolness Level | Level 3 |
| Business Potential | High acquisition leverage |
| Prerequisites | Submission-ready listing, telemetry basics |
| Key Topics | Metadata variants, prompt-set evals, activation funnel metrics |
1. Learning Objectives
- Design high-signal listing metadata variants.
- Validate invocation quality with representative prompt sets.
- Measure discovery and activation independently.
- Select metadata improvements with evidence-based criteria.
2. All Theory Needed (Per-Concept Breakdown)
Discoverability as Contract Optimization
Fundamentals: Metadata is functional interface design. It influences discovery, invocation quality, and user trust.
Deep Dive: Optimize metadata using a closed loop: hypothesis -> variant -> evaluation -> rollout -> measurement. Start by defining target jobs-to-be-done and mapping those jobs to concise listing language.
Then run prompt-set evaluations that test clear intents, ambiguous intents, and out-of-scope intents. Measure invocation correctness and completion quality, not just click volume.
Separate discovery metrics (impressions, connect rate) from activation metrics (first successful action, recovery completion). A variant that increases clicks but decreases first-value completion is a regression.
Document each metadata change with rationale and measured effect. Because listing updates can impact review and user expectations, treat metadata revisions as governed releases.
Minimal concrete example:

```text
variant_v3:
  invocation_correctness=87.5%
  connect_rate_delta=+12%
  first_action_delta=+9%
  recommendation=promote
```
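The decision step of the loop above can be sketched as a simple gate. This is a minimal illustration; the thresholds and function name are assumptions, not prescribed values.

```python
# Decision gate for a metadata variant, mirroring the example above.
# The 0.80 correctness floor is an illustrative assumption.
def recommend(invocation_correctness: float,
              connect_rate_delta: float,
              first_action_delta: float) -> str:
    """Promote only when discovery AND activation both improve."""
    if invocation_correctness < 0.80:
        return "hold"  # matching quality is below the quality bar
    if connect_rate_delta > 0 and first_action_delta > 0:
        return "promote"
    if connect_rate_delta > 0 and first_action_delta <= 0:
        return "regress"  # more clicks but worse first-value completion
    return "hold"

print(recommend(0.875, 0.12, 0.09))  # variant_v3 above -> prints "promote"
```

Note that the gate encodes the regression rule from the theory section: a click gain with a first-action loss is treated as a regression, never a win.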
3. Project Specification
3.1 What You Will Build
A metadata experiment framework with prompt-set evaluation, funnel metrics, and change governance.
3.2 Functional Requirements
- Create at least three listing variants.
- Build prompt-set evals for 30+ intents.
- Measure discovery and activation per variant.
- Promote best variant with changelog evidence.
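A prompt-set eval for the second requirement can be structured as labeled intent cases spanning the three categories from the theory section. The cases and the router interface below are placeholders for your own app logic.

```python
# Minimal prompt-set evaluation sketch. The intents and the router
# interface are hypothetical stand-ins for your app's actual routing.
from dataclasses import dataclass

@dataclass
class IntentCase:
    prompt: str
    category: str        # "clear" | "ambiguous" | "out_of_scope"
    expect_invoke: bool  # should the app be invoked for this prompt?

CASES = [
    IntentCase("track my package from yesterday", "clear", True),
    IntentCase("where is my stuff", "ambiguous", True),
    IntentCase("write me a poem", "out_of_scope", False),
    # ...extend to 30+ intents, balanced across all three categories
]

def evaluate(router, cases) -> float:
    """Return invocation correctness: fraction of cases routed as expected."""
    correct = sum(router(c.prompt) == c.expect_invoke for c in cases)
    return correct / len(cases)
```

Keeping the out-of-scope cases in the set is what lets the eval catch overbroad listing copy, not just missed invocations.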
3.3 Real World Outcome
```console
$ npm run metadata:eval
[ok] variants tested: 3
[ok] prompt intents evaluated: 40
[ok] best invocation correctness: 87.5%
[ok] connect rate improvement: +12%
[ok] first action completion improvement: +9%
winner_variant=v3
```
4. Solution Architecture
Metadata Variant -> Prompt Eval -> Listing Rollout -> Funnel Metrics -> Decision Gate
5. Implementation Guide
5.1 The Core Question You’re Answering
“How do we improve discoverability without sacrificing match quality and user success?”
5.2 Concepts You Must Understand First
- Job-to-be-done messaging.
- Prompt evaluation design.
- Conversion and activation analytics.
5.3 Questions to Guide Your Design
- Which user job should the listing prioritize?
- Which intents currently misroute?
- What threshold defines a successful variant?
5.4 Thinking Exercise
Create a scorecard that weights invocation correctness, connect rate, and first-action completion for variant selection.
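One possible shape for such a scorecard is a weighted sum over the three metrics. The weights below are illustrative assumptions you should tune against your own thresholds.

```python
# Weighted variant scorecard; the weights are illustrative assumptions.
WEIGHTS = {"invocation_correctness": 0.5,
           "connect_rate_delta": 0.2,
           "first_action_delta": 0.3}

def score(variant: dict) -> float:
    """Weighted sum of the three selection metrics."""
    return sum(WEIGHTS[k] * variant[k] for k in WEIGHTS)

variants = {
    "v1": {"invocation_correctness": 0.81,
           "connect_rate_delta": 0.04, "first_action_delta": 0.02},
    "v3": {"invocation_correctness": 0.875,
           "connect_rate_delta": 0.12, "first_action_delta": 0.09},
}
winner = max(variants, key=lambda v: score(variants[v]))
print(winner)  # prints "v3"
```

Weighting invocation correctness highest reflects the section's thesis: discovery gains that degrade match quality should not win.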
5.5 The Interview Questions They’ll Ask
- How do you test metadata quality objectively?
- Which metrics reveal false-positive discovery gains?
- How do you prevent overfitting to one prompt set?
- How do listing changes affect review strategy?
- What cadence should metadata iterations follow?
5.6 Hints in Layers
- Hint 1: Use one primary user job per variant.
- Hint 2: Keep prompt sets balanced and realistic.
- Hint 3: Treat conversion and activation as separate gates.
- Hint 4: Keep a changelog with before/after metrics.
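A changelog entry satisfying Hint 4 might look like the sketch below; the field names are a suggestion, not a required schema.

```yaml
# Hypothetical variant changelog entry; field names are a suggestion.
- variant: v3
  hypothesis: "Lead with a single package-tracking job-to-be-done"
  metrics:
    invocation_correctness: 87.5%
    connect_rate_delta: +12%
    first_action_delta: +9%
  decision: promote
```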
5.7 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Iterative improvement | “The Pragmatic Programmer” | Feedback loops |
| Interface validation | “API Design Patterns” | Contract testing mindset |
| Measurement discipline | “Code Complete” | Quality metrics |
6. Testing Strategy
- Prompt-set invocation accuracy tests.
- Funnel instrumentation validation.
- Regression checks for onboarding/listing alignment.
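The first two strategies can be sketched as pytest-style tests. The `route` function and the funnel counts below are hypothetical stand-ins for your app's router and instrumentation.

```python
# Pytest-style sketches of a prompt-set check and a funnel sanity check.
# `route` and the funnel counts are hypothetical stand-ins.

def route(prompt: str) -> bool:
    """Toy router: invoke only for shipping-related prompts."""
    return "package" in prompt or "shipment" in prompt

def test_out_of_scope_prompts_do_not_invoke():
    for prompt in ["write me a poem", "plan my vacation"]:
        assert route(prompt) is False

def test_funnel_events_are_ordered():
    # Activation events can never exceed upstream discovery events;
    # a violation means the instrumentation is broken, not the variant.
    funnel = {"impression": 1000, "connect": 200, "first_action": 90}
    assert funnel["impression"] >= funnel["connect"] >= funnel["first_action"]
```

The funnel-ordering check is worth running before comparing variants: broken instrumentation produces plausible-looking but meaningless deltas.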
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Overbroad listing copy | Wrong audience connects | Focus on one clear job-to-be-done |
| Click-only optimization | Low completion quality | Gate on first-action and recovery metrics |
| Untracked changes | No learning over time | Maintain variant changelog with evidence |
8. Extensions & Challenges
- Add segment-specific metadata variants.
- Add weekly automated experimentation report.
- Add long-term retention analysis by variant cohort.
9. Real-World Connections
- Marketplace growth optimization
- App discovery strategy
- Product analytics and conversion engineering
10. Resources
- OpenAI Apps SDK: Optimize metadata
- OpenAI Apps SDK: App submission guidelines
- OpenAI blog: Developers can now submit apps to ChatGPT
11. Self-Assessment Checklist
- I can design and evaluate metadata variants systematically.
- I can separate discovery gains from activation gains.
- I can justify rollout decisions with measurable evidence.
12. Submission / Completion Criteria
Minimum Viable Completion
- Three metadata variants and one prompt-set evaluation.
Full Completion
- Full funnel instrumentation, governance-ready changelog, and measurable conversion improvement.