Project Expansion Engine: Build an AI-Powered Learning Guide Expander
Goal: You will learn how to turn raw project idea lists into complete, teachable, project-based learning guides that read like a mini-book and ship as ready-to-publish Markdown. You will master instructional design fundamentals (backward design, scaffolding, cognitive load), apply them to build structured learning content, and operationalize the process with a retrieval-augmented generation (RAG) pipeline. By the end, you will be able to build an end-to-end expander that analyzes a source file, enriches it with verified sources, generates detailed project deep-dives, and validates the result with a quality rubric. You will also gain the mental models to reason about quality, depth, and correctness, not just text generation.
Introduction: What This Guide Covers
Project Expansion is the process of transforming a short list of project ideas into a complete learning guide with theory, structured scaffolding, and detailed, buildable projects. In practice, it combines instructional design, technical writing, and automated content generation into a single pipeline.
What you will build (by the end of this guide):
- A structured expander that converts idea lists into full-length learning guides
- A concept map + prerequisite analyzer that drives project sequencing
- A RAG-backed citation system that grounds claims in primary sources
- A quality linter that enforces completeness and pedagogical depth
- An end-to-end CLI that outputs publish-ready Markdown
Scope (what is included):
- Instructional design theory used by your expander
- A practical architecture for RAG-powered content generation
- A complete set of expansion projects with verification steps
Out of scope (for this guide):
- Building a full web UI or authoring platform
- Fine-tuning LLMs or training custom foundation models
- Creating proprietary datasets or copyrighted content
The Big Picture (Mental Model)
Idea List
|
v
[Parse + Normalize] -> [Concept Map] -> [Learning Objectives]
| | |
v v v
[Source Retrieval] ---> [Theory Primer] -> [Project Drafts]
| | |
v v v
[Quality Linter] <----- [Rubric + DoD] -> [Publishable Guide]
Key Terms You Will See Everywhere
- Project Expansion: Turning short project ideas into complete learning guides.
- Learning Objective: A measurable outcome describing what the learner can do.
- Backward Design: Start with outcomes, then define evidence, then design activities.
- Scaffolding: Temporary support to help learners master a task they cannot yet do alone.
- RAG (Retrieval-Augmented Generation): Combine retrieval of external sources with generation.
- Definition of Done (DoD): A checklist of explicit, testable completion criteria.
How to Use This Guide
- Read the Theory Primer first. It builds the mental models you will encode into your expander.
- Build the projects in order. Each project adds a layer to the pipeline.
- Use the checklists. Every project has a Definition of Done and pitfalls.
- Iterate with examples. Feed your tool a real idea list and refine the output.
- Measure quality. Use the rubric and linting checks to enforce depth and correctness.
Prerequisites & Background Knowledge
Before starting these projects, you should have a foundational understanding of the following areas:
Essential Prerequisites (Must Have)
Programming Skills:
- Proficiency in at least one language (Python, Go, or TypeScript)
- Ability to read and write structured text (Markdown, YAML, JSON)
- Comfort with CLI workflows and file system operations
Software Design Fundamentals:
- Decomposition into modules, interfaces, and data flow
- Basic error handling and logging
- Recommended Reading: “Clean Architecture” by Robert C. Martin - Ch. 1-5
Algorithms & Data Structures Basics:
- Arrays, maps, graphs, and basic traversal
- Recommended Reading: “Algorithms, Fourth Edition” by Sedgewick & Wayne - Ch. 1-4
Helpful But Not Required
Natural Language Processing Basics:
- Tokenization, embeddings, and similarity search
- Can learn during: Project 3, Project 4
Instructional Design Knowledge:
- Learning objectives, scaffolding, and assessment
- Can learn during: Theory Primer chapters and Project 2
Self-Assessment Questions
Before starting, ask yourself:
- Can I parse and transform Markdown reliably?
- Can I design a pipeline with clear stages and outputs?
- Can I write or evaluate a rubric with measurable criteria?
- Do I understand the difference between “content” and “assessment”?
- Can I validate outputs with repeatable tests?
If you answered “no” to questions 1-3: Spend 1-2 weeks reviewing software design and basic parsing.
If you answered “yes” to all 5: You’re ready to begin.
Development Environment Setup
To complete these projects, you’ll need:
Required Tools:
- A Linux or macOS environment
- Python 3.11+ (or Node 20+)
- rg (ripgrep) for fast file search
- Git for version control
Recommended Tools:
- jq for JSON inspection
- pandoc for Markdown conversions
- A local vector database (SQLite + sqlite-vss, or an in-memory FAISS index)
Testing Your Setup:
$ python --version
Python 3.11.6
$ rg --version
ripgrep 13.0.0
Time Investment
- Simple projects (1, 2): Weekend (4-8 hours each)
- Moderate projects (3, 4, 5): 1-2 weeks (10-20 hours each)
- Complex projects (6, 7): 2+ weeks (20-40 hours each)
- Total sprint: 6-10 weeks if doing all projects sequentially
Important Reality Check
This is a system-design and content-quality challenge, not just a coding exercise. Expect to iterate:
- First pass: Build a working pipeline
- Second pass: Improve completeness and correctness
- Third pass: Tighten outputs with rubrics and automated checks
- Fourth pass: Refine for clarity, depth, and teaching quality
Big Picture / Mental Model (Diagram First)
INPUTS CORE PIPELINE OUTPUTS
Idea List -> Parse/Normalize -> Concept Graph -> Source Retrieval -> Draft
| | | | |
v v v v v
Existing Guides Metadata Map Learning Objectives Evidence Pack Structured Guide
| |
v v
Quality Rubric <------------------------- Linter ---------------------- Publishable MD
Theory Primer (Mini-Book)
Chapter 1: Project-Based Learning (PBL) Foundations
Definitions & Key Terms
- Project-Based Learning (PBL): A teaching method where learners gain knowledge and skills by investigating and responding to an authentic, complex question or challenge over an extended period (see PBLWorks definition and Gold Standard PBL).
- Driving Question: The central problem or question that anchors the project.
- Public Product: A final artifact shared with a real audience.
Source note: PBLWorks provides a widely used definition and the Gold Standard PBL elements used in this guide.
PBLWorks defines Gold Standard PBL using seven essential project design elements: a challenging problem or question, sustained inquiry, authenticity, student voice and choice, reflection, critique and revision, and a public product. These elements should shape how you expand and structure projects in your guide.
Mental Model Diagram
Driving Question
|
v
Sustained Inquiry -> Evidence -> Iteration -> Public Product
| ^
v |
Authenticity + Voice + Reflection + Critique
How It Works (Step-by-Step)
- Start with a real problem. The driving question frames the entire project.
- Sustain inquiry. Learners research, test, and refine ideas over time.
- Create artifacts. Outputs should be tangible (code, docs, demo).
- Expose to feedback. Critique and revision raise quality.
- Make it public. A public product increases rigor and accountability.
Trade-offs
- Pros: Deep learning, strong transfer, higher engagement.
- Cons: Higher upfront design cost, harder assessment, risk of scope creep.
Minimal Concrete Example
Project: "Build a CLI that turns idea lists into full learning guides"
Driving Question: "How can we automate the creation of high-quality learning guides?"
Public Product: A publishable Markdown guide plus a demo walkthrough
Common Misconceptions
- “Any project is PBL” (false: PBL requires inquiry, authenticity, and public output)
- “PBL is unstructured” (false: PBL is structured but learner-driven)
Check-Your-Understanding Questions
- What is the difference between a driving question and a task list?
- Which PBL element ensures quality improvement over time?
Where You Will Apply It
- Project 1 (Guide Inventory)
- Project 4 (Guide Generator)
- Project 7 (End-to-End CLI)
Chapter 2: Backward Design (Understanding by Design)
Definitions & Key Terms
- Backward Design: Start with desired results, determine acceptable evidence, then design learning experiences.
- Learning Objectives: Observable, measurable outcomes.
Wiggins and McTighe describe three stages: (1) identify desired results, (2) determine acceptable evidence, (3) plan learning experiences and instruction. This is the core scaffold for your expander (see Understanding by Design / UbD summaries). Source note: The University of Florida IFAS EDIS summary provides a concise description of the UbD three-stage model.
Mental Model Diagram
Desired Results -> Evidence -> Learning Experiences
| | |
v v v
Objectives Rubrics/DoD Projects + Theory
How It Works (Step-by-Step)
- Write learning objectives. Use action verbs.
- Define evidence. Rubrics, checklists, test outputs.
- Design activities. Projects that force evidence to appear.
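In code, backward design reduces to a small record your expander can validate before generating anything. A minimal sketch in Python (the field names are illustrative, not a required schema):
from dataclasses import dataclass, field

@dataclass
class ObjectiveSpec:
    objective: str                                      # Stage 1: desired result, written with an action verb
    evidence: list[str] = field(default_factory=list)   # Stage 2: rubric rows, DoD items, test outputs
    activity: str = ""                                  # Stage 3: the project that forces the evidence to appear

spec = ObjectiveSpec(
    objective="Learner can design a RAG pipeline and explain trade-offs",
    evidence=["retrieval metrics report", "failure analysis write-up"],
    activity="Build a RAG retriever with an evaluation harness",
)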
Trade-offs
- Pros: Alignment, clarity, measurable outcomes.
- Cons: Can feel rigid if objectives are too narrow.
Minimal Concrete Example
Objective: "Learner can design a RAG pipeline and explain trade-offs."
Evidence: A project with retrieval metrics + failure analysis.
Activity: Build a RAG retriever with evaluation harness.
Common Misconceptions
- “Objectives are only for teachers” (false: they guide tool design)
- “Evidence is just a quiz” (false: evidence can be real artifacts)
Check-Your-Understanding Questions
- What evidence would prove a learner can design a pipeline?
- How do objectives change project sequencing?
Where You Will Apply It
- Project 2 (Concept Map)
- Project 6 (Quality Linter)
- Project 7 (CLI Orchestrator)
Chapter 3: Bloom’s Revised Taxonomy and Assessment Evidence
Definitions & Key Terms
Bloom’s revised taxonomy orders cognitive processes: Remember, Understand, Apply, Analyze, Evaluate, Create. The revision (Anderson & Krathwohl, 2001) shifted focus toward action verbs and placed “Create” at the top (see revised Bloom taxonomy summaries). Source note: The University of Delaware revised Bloom taxonomy summary is a concise reference for the 2001 update.
Mental Model Diagram
Remember -> Understand -> Apply -> Analyze -> Evaluate -> Create
| | | | | |
v v v v v v
Recall Explain Use Break apart Judge Build
How It Works (Step-by-Step)
- Tag each section with the intended cognitive level.
- Ensure projects reach “Create” by requiring real artifacts.
- Align rubrics with the level (e.g., evaluate = justify trade-offs).
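A tiny sketch of level tagging, assuming your generator keeps a plan of sections before writing them (the section names and tags are examples):
BLOOM = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

section_levels = {
    "Define RAG": "remember",
    "Implement a retriever": "apply",
    "Compare chunking strategies": "analyze",
    "Build the expander CLI": "create",
}

assert all(level in BLOOM for level in section_levels.values())           # only real Bloom levels allowed
assert "create" in section_levels.values(), "guide never reaches Create"  # require a real artifact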
Trade-offs
- Pros: Ensures depth and progression.
- Cons: Can oversimplify messy learning paths.
Minimal Concrete Example
Remember: Define RAG
Apply: Implement a retriever
Analyze: Compare chunking strategies
Evaluate: Justify retrieval metrics
Create: Build an end-to-end expander CLI
Common Misconceptions
- “Bloom is linear” (false: learners jump levels)
- “Create is only art” (false: design and systems architecture count)
Check-Your-Understanding Questions
- Which taxonomy level best fits “compare chunking strategies”?
- How would you assess “evaluate” in a project?
Where You Will Apply It
- Project 4 (Guide Generator)
- Project 6 (Quality Linter)
Chapter 4: Scaffolding, ZPD, and Cognitive Load
Definitions & Key Terms
- Scaffolding: Temporary support that enables learners to do tasks they cannot yet do alone. Wood, Bruner, and Ross introduced the term in 1976 (see instructional scaffolding overviews).
- Zone of Proximal Development (ZPD): The gap between what a learner can do alone and what they can do with support (see ZPD summaries).
- Cognitive Load Theory: Instruction should manage intrinsic load, reduce extraneous load, and promote germane load (see cognitive load theory overviews).
Source note: Cognitive load theory distinguishes intrinsic load (task complexity), extraneous load (inefficient presentation), and germane load (schema construction). Teaching center summaries provide applied guidance for these categories.
Mental Model Diagram
Current Skill -------- ZPD -------- Target Skill
| | |
v v v
Independent With Scaffolds Independent
How It Works (Step-by-Step)
- Break tasks into steps. Reduce intrinsic load.
- Remove distractions. Reduce extraneous load.
- Add prompts and examples. Increase germane load.
- Fade supports. Learner becomes independent.
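One way an expander can encode fading is to store hints in order and reveal them only as needed. A minimal sketch (the hint texts and the one-hint-per-attempt policy are assumptions):
hints = [
    "Show the expected file layout",         # broad, keeps extraneous load low
    "Provide pseudocode for the core loop",  # narrower support
    "Give a minimal working code snippet",   # most specific, used last
]

def next_hint(attempts_so_far):
    if attempts_so_far >= len(hints):
        return None  # supports fully faded: learner continues independently
    return hints[attempts_so_far]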
Trade-offs
- Pros: Prevents overwhelm, improves retention.
- Cons: Over-scaffolding can reduce autonomy.
Minimal Concrete Example
Layered hints:
Hint 1: Show file layout
Hint 2: Provide pseudocode
Hint 3: Give a minimal code snippet
Common Misconceptions
- “Scaffolding is hand-holding” (false: it is temporary and faded)
- “More detail is always better” (false: overload kills learning)
Check-Your-Understanding Questions
- What is the difference between intrinsic and extraneous load?
- When should scaffolds be removed?
Where You Will Apply It
- Project 4 (Guide Generator)
- Project 5 (Project Expander)
- Project 6 (Quality Linter)
Chapter 5: Experiential Learning Cycle (Kolb)
Definitions & Key Terms
Kolb’s cycle: Concrete Experience -> Reflective Observation -> Abstract Conceptualization -> Active Experimentation (see experiential learning cycle summaries). Source note: University resources on Kolb’s experiential learning cycle provide the four-stage model used here.
Mental Model Diagram
Experience -> Reflect -> Concept -> Experiment -> Experience
How It Works (Step-by-Step)
- Experience: Build something concrete.
- Reflect: Analyze what worked and failed.
- Conceptualize: Extract general principles.
- Experiment: Apply new understanding.
Trade-offs
- Pros: Strong retention, practical transfer.
- Cons: Slower than lecture-only approaches.
Minimal Concrete Example
Build a parser -> Review failures -> Formalize rules -> Improve parser
Common Misconceptions
- “Reflection is optional” (false: reflection is core to learning)
- “Cycle must be linear” (false: entry can be at any point)
Check-Your-Understanding Questions
- Which step turns experience into general knowledge?
- How does this cycle map to project iteration?
Where You Will Apply It
- Project 7 (End-to-End CLI)
- Project 6 (Quality Linter)
Chapter 6: Retrieval-Augmented Generation (RAG) for Grounded Content
Definitions & Key Terms
- RAG: Combine retrieval from external sources with generation to produce grounded outputs. RAG was formalized as a model combining parametric and non-parametric memory in 2020 (see the RAG paper and modern RAG pipeline docs).
- Indexing: Load, split, embed, and store documents.
- Retrieval: Search the index for relevant chunks at runtime.
Source note: The 2020 RAG paper introduced a model that pairs a parametric generator with non-parametric retrieval, while modern RAG guides (e.g., LangChain) describe indexing as load -> split -> embed -> store.
Mental Model Diagram
Query -> Retrieve -> Context Pack -> Generate -> Answer + Citations
How It Works (Step-by-Step)
- Index sources. Split documents, store embeddings.
- Retrieve at query time. Select top-k relevant chunks.
- Generate with context. The model cites provided evidence.
- Validate citations. Ensure claims are traceable.
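A toy sketch of the retrieve step that runs with no external services, using a bag-of-words vector and cosine similarity as a stand-in for real embeddings and a vector store:
import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())  # crude stand-in for an embedding model

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "Scaffolding is temporary support that is gradually removed.",
    "RAG combines retrieval of external sources with generation.",
]
query = "Define scaffolding"
ranked = sorted(chunks, key=lambda c: cosine(vectorize(query), vectorize(c)), reverse=True)
print(ranked[0])  # the top chunk becomes the context pack passed to generation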
Trade-offs
- Pros: Up-to-date, domain-specific, grounded answers.
- Cons: Retrieval errors propagate to generation.
Minimal Concrete Example
Input: "Define scaffolding"
Retrieve: Pedagogy sources
Generate: Definition with cited sources
Common Misconceptions
- “RAG removes hallucinations” (false: it reduces but does not eliminate)
- “More context is always better” (false: context noise reduces quality)
Check-Your-Understanding Questions
- Why is chunking strategy critical?
- How do you validate a citation is real?
Where You Will Apply It
- Project 3 (Source Retrieval)
- Project 4 (Guide Generator)
- Project 5 (Project Expander)
Chapter 7: Quality Assurance for Learning Guides
Definitions & Key Terms
- Rubric: A scored set of criteria to judge completeness and quality.
- Linting: Automated validation of structure, sections, and evidence.
- Definition of Done (DoD): A checklist that proves completion.
Mental Model Diagram
Draft -> Lint -> Rubric Score -> Fix -> Publish
How It Works (Step-by-Step)
- Define rubric criteria. Depth, accuracy, structure.
- Write lint checks. Missing sections, empty tables, no outputs.
- Score and flag. Enforce minimum thresholds.
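A minimal sketch of one fatal lint rule, assuming the draft is available as plain Markdown text (the rule and the sample text are illustrative):
def lint_definition_of_done(guide_text):
    errors = []
    if "Definition of Done" not in guide_text:
        errors.append("missing: Definition of Done")  # fatal: blocks publishing
    return errors

sample = "## Real World Outcome\n...\n## Definition of Done\n- Scanner reports missing sections"
print(lint_definition_of_done(sample))  # [] means the rule passes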
Trade-offs
- Pros: Consistency, repeatability.
- Cons: Can over-penalize creative outputs.
Minimal Concrete Example
Rule: "Every project must include Definition of Done"
Lint: Fail if missing or empty
Common Misconceptions
- “Quality is subjective” (false: many aspects are measurable)
- “Rubrics reduce creativity” (false: they clarify expectations)
Check-Your-Understanding Questions
- What should be in a DoD checklist?
- Which errors should block publishing?
Where You Will Apply It
- Project 6 (Quality Linter)
- Project 7 (End-to-End CLI)
Glossary (High-Signal)
- Artifact: A tangible output (code, report, dataset, demo) produced by a project.
- Chunking: Splitting documents into smaller pieces for retrieval.
- Concept Map: A graph of ideas and dependencies.
- Driving Question: The central inquiry that defines a project.
- Evidence Pack: Curated sources used to justify factual claims.
- Hint Layering: Progressive assistance from broad to specific.
- Public Product: Final output shared beyond the author.
- Rubric: Scored criteria for evaluating outputs.
- Scaffolding: Temporary support that is gradually removed.
- ZPD: The skill zone where support enables growth.
Why Project Expansion Matters
Modern project-based learning is highly effective because it aligns authentic work with deep inquiry and public output. A 2023 meta-analysis of 66 experimental and quasi-experimental studies (190 effect sizes) reported positive effects of PBL on student learning outcomes, including academic achievement and thinking skills. The same study reported stronger effects in engineering/technology subjects, in lab-oriented classes, with small group sizes (4-5 learners), and with project durations around 9-18 weeks.
Well-designed PBL uses elements like sustained inquiry, authenticity, critique, and public product to increase engagement and quality. These elements also translate directly into better learning guides.
Old Approach vs New Approach
OLD APPROACH NEW APPROACH
Short idea list Full learning guide
No objectives Clear outcomes + evidence
No scaffolding Layered hints + DoD
No sources Grounded citations
Sources & Evidence Pack (Suggested)
- PBLWorks Gold Standard PBL design elements: https://www.pblworks.org/for/gold_standard_pbl
- PBLWorks Gold Standard blog explainer: https://www.pblworks.org/blog/gold_standard_pbl_essential_project_design_elements
- Understanding by Design (UbD) stages summary: https://edis.ifas.ufl.edu/publication/WC322
- Revised Bloom taxonomy (2001) summary: https://www1.udel.edu/educ/gottfredson/451/revisedbloom
- Cognitive Load Theory overview: https://www.sfasu.edu/ctl/resources/learning-design/cognitive-load
- Instructional scaffolding overview: https://en.wikipedia.org/wiki/Instructional_scaffolding
- Zone of Proximal Development overview: https://www.nysed.gov/bilingual-ed/topic-brief-4-zone-proximal-development-affirmative-perspective-teaching-ells-and-mls
- Kolb experiential learning cycle: https://tlc.uthsc.edu/experiential-learning/
- RAG paper (Lewis et al., 2020): https://arxiv.org/abs/2005.11401
- RAG pipeline concepts: https://python.langchain.com/docs/tutorials/rag/
- PBL meta-analysis (Zhang & Ma, 2023): https://doaj.org/article/1b9babf4d78a4868918d0ab4224004a0
Concept Summary Table
| Concept | What You Must Internalize | Where It Appears |
|---|---|---|
| PBL Elements | Driving question, inquiry, authenticity, public product | Projects 1, 4, 7 |
| Backward Design | Outcomes -> evidence -> activities | Projects 2, 6 |
| Bloom’s Taxonomy | Depth progression and assessment levels | Projects 4, 6 |
| Scaffolding + ZPD | Support and fade strategy | Projects 4, 5 |
| Cognitive Load | Manage intrinsic/extraneous/germane | Projects 4, 5 |
| Kolb Cycle | Experience -> reflection -> concept -> experiment | Projects 6, 7 |
| RAG Pipeline | Index -> retrieve -> generate -> cite | Projects 3, 4, 5 |
| QA Rubrics | Linting + measurable completeness | Projects 6, 7 |
Project-to-Concept Map
| Project | Core Concepts |
|---|---|
| 1. Guide Inventory & Diff Scanner | PBL elements, backward design |
| 2. Concept Graph & Prerequisite Mapper | Backward design, Bloom taxonomy |
| 3. Source Retrieval & Citation Packager | RAG pipeline, grounding |
| 4. Template-Driven Guide Generator | Scaffolding, cognitive load |
| 5. Project Deep-Dive Expander | Scaffolding, Bloom taxonomy |
| 6. Quality Linter & Rubric Scorer | QA, evidence, assessment |
| 7. End-to-End CLI Orchestrator | Kolb cycle, system integration |
Deep Dive Reading by Concept
| Concept | Book | Chapter(s) | Why This Matters |
|---|---|---|---|
| Learning objectives | “Understanding by Design” (Wiggins & McTighe) | Ch. 1-3 | Backward design for objectives and evidence |
| Scaffolding and load | “How People Learn” (Bransford et al.) | Ch. 2-4 | How learners build knowledge |
| Evaluation and rubrics | “Classroom Assessment Techniques” (Angelo & Cross) | Ch. 1-2 | Practical assessment design |
| Software architecture | “Clean Architecture” (Martin) | Ch. 1-5 | Pipeline modularity and boundaries |
| Tooling workflows | “The Pragmatic Programmer” (Hunt & Thomas) | Ch. 3-5 | Automation and feedback loops |
Quick Start (First 48 Hours)
Day 1: Understanding the pipeline
- Read Chapters 1-3 of the Theory Primer
- Sketch the pipeline diagram on paper
- Define 3 learning objectives for a sample guide
Day 2: First working output
- Build a minimal parser that loads an idea list
- Create a single expanded project template
- Generate one project with real output and DoD
Recommended Learning Paths
Path A: Educator or Instructional Designer
- Chapters 1-5 (PBL, backward design, Bloom, scaffolding)
- Projects 1, 2, 4, 6
Path B: LLM Engineer / Tool Builder
- Chapters 6-7 (RAG, QA)
- Projects 3, 4, 5, 7
Path C: Full-Stack Builder
- Read everything
- Build projects in order 1-7
Success Metrics
- You can generate a guide with all required sections in < 2 minutes
- Each project has exact output examples and a DoD checklist
- Every factual claim is grounded in a source pack
- Linter catches missing sections and empty tables
- At least 3 example guides pass the rubric with > 90% score
Optional Appendices
Appendix A: Guide Section Checklist
- Goal
- Introduction + big picture diagram
- How to use this guide
- Prerequisites + self-assessment
- Theory primer chapters
- Glossary
- Why topic matters + stats
- Concept summary table
- Project-to-concept map
- Reading list by concept
- Quick start + learning paths
- Success metrics
Appendix B: Quality Rubric (Sample)
| Criterion | Description | Pass Threshold |
|---|---|---|
| Completeness | All required sections present | 100% |
| Depth | Each concept has definitions + diagram + example | 90% |
| Grounding | Claims trace to citations | 95% |
| Practicality | Every project has runnable output | 100% |
Projects
Project 1: Guide Inventory & Diff Scanner
Build a tool that parses a directory of guides and identifies missing or inconsistent sections.
Real World Outcome
You can run a command like:
$ expander scan ./project_based_ideas
[scan] 427 guides found
[scan] 112 missing: "Concept Summary Table"
[scan] 89 missing: "Project-to-Concept Map"
[scan] 37 missing: "Definition of Done" sections
[scan] Report written to reports/guide-audit-2025-12-31.json
The Core Question You’re Answering
How can we systematically detect gaps in learning guides so we know exactly what to expand?
Concepts You Must Understand First
- Markdown parsing and ASTs
- Section normalization and templates
- Backward design evidence mapping
- Reading: “Clean Architecture” Ch. 1-3
Questions to Guide Your Design
- How will you detect headings with inconsistent names?
- What is your minimal schema for a “complete” guide?
- How do you handle partial or empty sections?
Thinking Exercise
Sketch a schema that represents required sections as a JSON structure. Then manually map two existing guides into that schema and list the differences.
The Interview Questions They’ll Ask
- How did you normalize headings across inconsistent files?
- What strategy did you use for partial matches?
- How do you minimize false positives in linting?
- How would you scale this to thousands of files?
Hints in Layers
Hint 1: Start with a heading index.
if line.startswith("#"):
headings.append(line.strip("# "))
Hint 2: Normalize text to compare.
def norm(s):
return " ".join(s.lower().split())
Hint 3: Build a schema validator.
missing = [h for h in REQUIRED if h not in headings]
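Putting the three hints together, a minimal sketch of the scanner loop, assuming a flat directory of Markdown guides and an illustrative REQUIRED list:
import json
from pathlib import Path

REQUIRED = ["goal", "concept summary table", "project-to-concept map", "definition of done"]

def norm(s):
    return " ".join(s.lower().split())

report = {}
for path in Path("./project_based_ideas").glob("*.md"):
    lines = path.read_text().splitlines()
    headings = [norm(line.lstrip("#")) for line in lines if line.startswith("#")]  # accept levels 1-6
    missing = [h for h in REQUIRED if h not in headings]
    if missing:
        report[path.name] = missing

Path("reports").mkdir(exist_ok=True)
Path("reports/guide-audit.json").write_text(json.dumps(report, indent=2))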
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Clean Code” | Ch. 3 | Parsing and normalization clarity |
| “Clean Architecture” | Ch. 1-3 | Separation of concerns |
| “Refactoring” | Ch. 2 | Improving parsing logic |
Common Pitfalls & Debugging
Problem: “Heading not detected”
- Why: Markdown heading uses ## but the parser expects #
- Fix: Accept 1-6 hash levels
- Quick test: Parse a file with multiple heading levels
Problem: “False missing sections”
- Why: Section is present but heading is renamed
- Fix: Add synonyms mapping
- Quick test: Add a custom mapping for “Learning Goals” -> “Goal”
Definition of Done
- Scanner reports missing sections with correct counts
- Normalizes headings reliably across at least 10 guides
- Produces JSON report with file-level diagnostics
- Handles empty sections without crashing
Project 2: Concept Graph & Prerequisite Mapper
Build a concept map engine that extracts concepts from a guide and infers prerequisite relationships.
Real World Outcome
$ expander concepts ./project_based_ideas/LEARN_LLM_MEMORY.md
[concepts] 18 concepts found
[graph] 27 edges inferred
[graph] Wrote graph to graphs/learn_llm_memory.dot
The Core Question You’re Answering
How do we decide the order in which concepts should appear to maximize learning?
Concepts You Must Understand First
- Learning objectives and backward design
- Bloom taxonomy levels
- Graph modeling (nodes, edges)
- Reading: “Algorithms, Fourth Edition” Ch. 4 (graphs)
Questions to Guide Your Design
- How will you extract concept candidates from text?
- How will you infer dependencies without hallucination?
- How do you represent uncertainty in the graph?
Thinking Exercise
Take 5 concepts (RAG, retrieval, embeddings, chunking, evaluation) and draw a dependency graph. Explain each edge in one sentence.
The Interview Questions They’ll Ask
- How do you avoid cycles in prerequisite graphs?
- What is your confidence scoring method?
- How do you validate the graph is pedagogically sound?
- How would you use the graph to reorder sections?
Hints in Layers
Hint 1: Start with keyword extraction.
import re
concepts = set(re.findall(r"\b[A-Z][a-z]+\b", text))
Hint 2: Add a manual synonym map.
ALIASES = {"RAG": "Retrieval Augmented Generation"}
Hint 3: Use heuristics for edges.
if "embeddings" in sentence and "retrieval" in sentence:
add_edge("Embeddings", "Retrieval")
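A small sketch that stores edges with confidence scores and refuses cycle-forming edges (the data shapes and the add_edge/reachable helpers are illustrative, not a fixed API):
edges = {}  # (prerequisite concept, dependent concept) -> confidence score

def reachable(start, target):
    stack, seen = [start], set()
    while stack:  # depth-first search over existing edges
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(dst for (src, dst) in edges if src == node)
    return False

def add_edge(src, dst, confidence=0.5):
    if reachable(dst, src):
        return False  # refuse edges that would close a prerequisite cycle
    edges[(src, dst)] = confidence
    return True

add_edge("Embeddings", "Retrieval", confidence=0.8)
add_edge("Retrieval", "Embeddings")  # returns False: would create a cycle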
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Algorithms, Fourth Edition” | Ch. 4 | Graph modeling |
| “Clean Architecture” | Ch. 4 | Dependency rules |
| “Design Patterns” | Ch. 1 | Reusable graph patterns |
Common Pitfalls & Debugging
Problem: “Concept explosion”
- Why: Overly broad extraction
- Fix: Apply a whitelist or frequency threshold
- Quick test: Limit to top-30 terms
Problem: “Cycles everywhere”
- Why: Naive dependency inference
- Fix: Add direction rules and prune low-confidence edges
- Quick test: Run cycle detection and log removals
Definition of Done
- Extracts concepts with <= 20% noise on sample guides
- Produces a DOT or JSON graph with edges
- Includes confidence scores per edge
- Supports manual overrides for critical concepts
Project 3: Source Retrieval & Citation Packager (RAG)
Build a RAG pipeline that retrieves authoritative sources and bundles them for downstream generation.
Real World Outcome
$ expander sources --topic "project-based learning" --limit 8
[sources] 8 sources retrieved
[sources] 5 primary sources, 3 secondary
[sources] Pack saved: packs/pbl-2025-12-31.json
The Core Question You’re Answering
How do we ground generated content in verifiable sources without overwhelming the model?
Concepts You Must Understand First
- RAG pipeline (indexing + retrieval)
- Chunking and similarity search
- Citation formatting
- Reading: “The Pragmatic Programmer” Ch. 3-4
Questions to Guide Your Design
- What qualifies as a “primary” source?
- How do you prevent duplicate or low-quality sources?
- How do you store the evidence pack so it is reusable?
Thinking Exercise
Take three sources about PBL and extract one sentence each that could be cited in a guide. Rank them by authority.
The Interview Questions They’ll Ask
- How do you evaluate retrieval quality?
- How do you handle outdated or conflicting sources?
- What is your chunking strategy and why?
- How do you stop citation drift?
Hints in Layers
Hint 1: Store sources with minimal schema.
{"title": "...", "url": "...", "year": 2023, "quotes": []}
Hint 2: Use a two-stage filter.
if domain in TRUSTED and year >= 2018:
keep(source)
Hint 3: Add per-claim citation links.
claim["citations"].append(source_id)
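A compact sketch that combines the schema, the two-stage filter, and pack writing; the allowlist, candidate sources, and output filename are placeholders:
import json

TRUSTED = {"pblworks.org", "arxiv.org", "edis.ifas.ufl.edu"}  # example allowlist only

def keep(source):
    return source["domain"] in TRUSTED and source.get("year", 0) >= 2018

candidates = [
    {"title": "RAG paper", "url": "https://arxiv.org/abs/2005.11401",
     "domain": "arxiv.org", "year": 2020, "quotes": []},
    {"title": "Unvetted blog post", "url": "https://example.com/post",
     "domain": "example.com", "year": 2016, "quotes": []},
]

pack = [s for s in candidates if keep(s)]  # only the arXiv source survives both filters
with open("evidence-pack.json", "w") as f:
    json.dump(pack, f, indent=2)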
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Clean Code” | Ch. 4 | Structuring data pipelines |
| “Refactoring” | Ch. 3 | Improving retrieval logic |
| “Fundamentals of Software Architecture” | Ch. 8 | Designing pipelines |
Common Pitfalls & Debugging
Problem: “Sources are low quality”
- Why: Unfiltered web search
- Fix: Add domain allowlist + manual curation
- Quick test: Log source domains and counts
Problem: “Citation mismatch”
- Why: Claim text doesn’t align to retrieved evidence
- Fix: Store snippet offsets with citations
- Quick test: Validate citations during generation
Definition of Done
- Retrieves at least 5 high-quality sources per topic
- Stores a reusable evidence pack in JSON
- Includes citations with snippets and dates
- Supports domain allowlists and blacklists
Project 4: Template-Driven Guide Generator
Build a generator that converts an idea list + evidence pack into a structured mini-book guide.
Real World Outcome
$ expander generate ./ideas/LLM_MEMORY.md --evidence packs/llm-memory.json
[generate] Sections written: 14
[generate] Projects expanded: 7
[generate] Output: docs/LLM_MEMORY.md
The Core Question You’re Answering
How do we guarantee consistent, high-depth guides regardless of input quality?
Concepts You Must Understand First
- Scaffolding and cognitive load
- Backward design alignment
- Template systems and slot filling
- Reading: “Clean Architecture” Ch. 6-8
Questions to Guide Your Design
- How do you prevent template sections from becoming generic?
- Which sections require sourced facts vs. original synthesis?
- How do you enforce exact output examples?
Thinking Exercise
Take one project idea and write the “Real World Outcome” as if the CLI already exists. Include realistic output.
The Interview Questions They’ll Ask
- How do you validate that each section has depth?
- How do you handle missing prerequisites?
- How do you avoid contradictory content across sections?
- How do you support different domains (ML vs systems)?
Hints in Layers
Hint 1: Use a strict template.
## Introduction
[definition]
[scope]
Hint 2: Create a section completeness check.
assert "## Glossary" in output
Hint 3: Inject evidence into key claims.
content = insert_citations(content, evidence_pack)
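A sketch of a completeness gate on the assembled draft; the REQUIRED_SECTIONS list mirrors Appendix A and is an assumption about your template, not a standard:
REQUIRED_SECTIONS = ["## Introduction", "## Theory Primer", "## Glossary",
                     "## Concept Summary Table", "## Projects"]

def check_completeness(output):
    return [s for s in REQUIRED_SECTIONS if s not in output]  # missing sections, in template order

draft = "## Introduction\n...\n## Glossary\n..."
missing = check_completeness(draft)
if missing:
    raise ValueError(f"generator produced an incomplete guide, missing: {missing}")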
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Clean Architecture” | Ch. 6-8 | Boundary and data flow design |
| “Code Complete” | Ch. 5 | Structured construction |
| “Refactoring” | Ch. 1 | Clean generator code |
Common Pitfalls & Debugging
Problem: “Sections are shallow”
- Why: Template placeholders not expanded
- Fix: Enforce minimum word counts + required subsections
- Quick test: Run a length check per section
Problem: “Output missing CLI examples”
- Why: No detection of project type
- Fix: Classify project type and inject examples
- Quick test: Run a diff to ensure examples exist
Definition of Done
- Generates all required sections in correct order
- Inserts citations for all factual claims
- Ensures each project has real-world output examples
- Produces publishable Markdown without manual fixes
Project 5: Project Deep-Dive Expander
Build a module that expands each project into a full deep-dive file with theory, steps, and validation.
Real World Outcome
$ expander expand-projects ./ideas/LLM_MEMORY.md
[expand] 7 projects detected
[expand] Wrote: ./ideas/LLM_MEMORY/P01-memory-cache.md
[expand] Wrote: ./ideas/LLM_MEMORY/P02-embeddings-store.md
[expand] ...
The Core Question You’re Answering
How do we turn a short project description into a fully teachable, step-by-step learning unit?
Concepts You Must Understand First
- Scaffolding with layered hints
- Bloom taxonomy depth
- Backward design evidence alignment
- Reading: “The Pragmatic Programmer” Ch. 5-7
Questions to Guide Your Design
- How many steps should a deep-dive include?
- How do you link a project back to the theory chapters?
- How do you prevent duplication across projects?
Thinking Exercise
Take a 3-line project idea and expand it into:
- A core question
- 4 milestones
- 3 pitfalls
- A DoD checklist
The Interview Questions They’ll Ask
- How do you choose the correct level of detail?
- How do you prevent repetition across projects?
- How do you structure layered hints?
- How would you add diagrams automatically?
Hints in Layers
Hint 1: Use a consistent project scaffold.
## Real World Outcome
## Core Question
## Definition of Done
Hint 2: Add a milestone structure.
M1: Parse
M2: Generate
M3: Validate
Hint 3: Auto-link to primer sections.
project["concepts"].append("RAG Pipeline")
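A minimal sketch of the per-project writer, assuming each project is a small dict with a title and linked concepts (file naming and scaffold text are placeholders):
from pathlib import Path

SCAFFOLD = "## Real World Outcome\n\n## Core Question\n\n## Definition of Done\n"

def expand_project(index, project, out_dir):
    slug = project["title"].lower().replace(" ", "-")
    body = f"# {project['title']}\n\nConcepts: {', '.join(project['concepts'])}\n\n" + SCAFFOLD
    path = Path(out_dir) / f"P{index:02d}-{slug}.md"  # e.g. P01-memory-cache.md
    path.write_text(body)
    return path

expand_project(1, {"title": "Memory Cache", "concepts": ["RAG Pipeline"]}, ".")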
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Clean Code” | Ch. 7 | Clear, reusable templates |
| “Refactoring” | Ch. 4 | Iterative improvement |
| “Fundamentals of Software Architecture” | Ch. 10 | Decomposition strategy |
Common Pitfalls & Debugging
Problem: “Projects feel copy-pasted”
- Why: Template reuse without adaptation
- Fix: Inject project-specific outcomes and tools
- Quick test: Compare first 200 words across projects
Problem: “No measurable outcome”
- Why: Missing DoD or output example
- Fix: Force a Real World Outcome section
- Quick test: Lint for output code blocks
Definition of Done
- Each project expanded into its own file
- Includes concept references and unique outcomes
- Contains pitfalls, hints, and DoD
- Output is publishable without manual edits
Project 6: Quality Linter & Rubric Scorer
Build a quality checker that scores guides against a rubric and blocks publishing if thresholds fail.
Real World Outcome
$ expander lint ./docs/LLM_MEMORY.md
[lint] Completeness: 100%
[lint] Depth: 92%
[lint] Grounding: 96%
[lint] Practicality: 100%
[lint] PASS (min 90%)
The Core Question You’re Answering
How do we enforce consistent quality across hundreds of generated guides?
Concepts You Must Understand First
- Rubric design and scoring
- Evidence requirements
- Automated validation
- Reading: “Clean Architecture” Ch. 12
Questions to Guide Your Design
- What is your minimum acceptable score per category?
- Which errors should be fatal vs warnings?
- How do you measure “depth” automatically?
Thinking Exercise
Define five rubric criteria and write one automatic check for each.
The Interview Questions They’ll Ask
- How do you quantify qualitative metrics?
- How do you avoid gaming the rubric?
- How do you integrate linting into CI?
- What do you do when sources conflict?
Hints in Layers
Hint 1: Create a rubric schema.
{"completeness": 0.3, "depth": 0.3, "grounding": 0.2, "practicality": 0.2}
Hint 2: Check for required sections.
for h in REQUIRED:
if h not in headings: score -= 10
Hint 3: Add per-section length checks.
if len(section_text) < 300: warn("shallow")
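Combining the hints into a weighted score with per-category thresholds; the weights and thresholds below reuse the sample values from Appendix B and are not a standard:
WEIGHTS = {"completeness": 0.3, "depth": 0.3, "grounding": 0.2, "practicality": 0.2}
THRESHOLDS = {"completeness": 1.0, "depth": 0.9, "grounding": 0.95, "practicality": 1.0}

def score_guide(scores):
    failed = [c for c, t in THRESHOLDS.items() if scores[c] < t]  # any failed category blocks publishing
    total = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
    return total, failed

total, failed = score_guide({"completeness": 1.0, "depth": 0.92, "grounding": 0.96, "practicality": 1.0})
print(f"score={total:.0%}", "PASS" if not failed else f"FAIL: {failed}")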
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Refactoring” | Ch. 6 | Improving validation logic |
| “Code Complete” | Ch. 10 | Defensive checks |
| “The Pragmatic Programmer” | Ch. 7 | Automation mindset |
Common Pitfalls & Debugging
Problem: “False fails”
- Why: Overly strict thresholds
- Fix: Tune rubric with real examples
- Quick test: Re-score 5 known-good guides
Problem: “Shallow but passes”
- Why: Only length checks
- Fix: Add semantic checks (keywords, subheadings)
- Quick test: Require diagrams in theory primer
Definition of Done
- Rubric scores are reproducible
- Linter blocks missing critical sections
- Produces a summary report with diagnostics
- Integrates with CLI pipeline
Project 7: End-to-End CLI Orchestrator
Combine all modules into a single CLI that expands guides from idea list to publish-ready output.
Real World Outcome
$ expander run ./ideas/LLM_MEMORY.md --out ./docs
[run] Parsed ideas: 9
[run] Concept graph built
[run] Evidence pack retrieved
[run] Guide generated: docs/LLM_MEMORY.md
[run] 9 projects expanded into ./ideas/LLM_MEMORY/
[run] Quality score: 94% (PASS)
The Core Question You’re Answering
How do we orchestrate a reliable, repeatable pipeline that produces publishable guides at scale?
Concepts You Must Understand First
- Pipeline orchestration
- Dependency management
- Logging and error handling
- Reading: “Fundamentals of Software Architecture” Ch. 12
Questions to Guide Your Design
- How do you ensure deterministic outputs?
- How do you cache results between steps?
- What is your rollback strategy when a step fails?
Thinking Exercise
Design the CLI command structure for all pipeline steps. Identify which steps can be skipped when cached.
The Interview Questions They’ll Ask
- How do you orchestrate tasks safely?
- How do you make the pipeline reproducible?
- What metrics do you log for observability?
- How do you validate end-to-end correctness?
Hints in Layers
Hint 1: Start with a task graph.
scan -> concepts -> sources -> generate -> expand -> lint
Hint 2: Add caching by file hash.
import hashlib
cache_key = hashlib.sha256(input_text.encode("utf-8")).hexdigest()
Hint 3: Use structured logging.
log.info("stage=generate status=ok duration_ms=1200")
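A skeletal orchestrator tying the task graph, hash-based caching, and structured logging together; the stage functions are stubs you would replace with the real modules:
import hashlib, logging, time
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("expander")
CACHE = Path(".cache")
CACHE.mkdir(exist_ok=True)

def run_stage(name, func, payload):
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    cached = CACHE / f"{name}-{key}.txt"  # cache by content hash so reruns can skip unchanged stages
    if cached.exists():
        log.info(f"stage={name} status=cached")
        return cached.read_text()
    start = time.monotonic()
    result = func(payload)
    cached.write_text(result)
    log.info(f"stage={name} status=ok duration_ms={int((time.monotonic() - start) * 1000)}")
    return result

stages = [("scan", lambda t: t), ("generate", lambda t: t), ("lint", lambda t: t)]  # stub task graph
text = "idea list contents"
for name, func in stages:
    text = run_stage(name, func, text)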
Books That Will Help
| Book | Chapter | Why |
|---|---|---|
| “Fundamentals of Software Architecture” | Ch. 12 | Pipeline orchestration |
| “Clean Architecture” | Ch. 14 | Dependency direction |
| “The Pragmatic Programmer” | Ch. 8 | Automation + tooling |
Common Pitfalls & Debugging
Problem: “Pipeline is flaky”
- Why: Uncontrolled randomness in generation
- Fix: Fix seeds and store prompts
- Quick test: Re-run pipeline and compare hashes
Problem: “Slow retrieval”
- Why: No caching or parallelism
- Fix: Cache evidence packs and parallelize retrieval
- Quick test: Time retrieval with and without cache
Definition of Done
- One command runs the entire pipeline
- Outputs are deterministic across runs
- Each stage logs duration and status
- Pipeline fails fast on critical errors